Session 2 – Capacity and skills issues

The notes from this session are available.

Welcome to the web forum for this session within the JISC Innovation Forum’s ‘Data’ strand. The (invitation-only) face-to-face component of this forum will be on 15th July 1500-1630. For that to be a success, we need your comments and feedback.

The sector is at a relatively early stage in terms of thinking about what skills and careers are needed to enable data to be managed / curated well, except in limited, fairly well-defined areas where a lot of progress has been made. Hence, we’ve anticipated the face-to-face session being something of an ‘informed brainstorm’. To kick this off, and – we hope – spark some online discussion before the face-to-face session, we’d like you to share:

1. a *very* brief summary outlining specifically how your work relates to skills development and capacity building for research data management

2. your reactions to these questions, which are intended to prompt discussion:
• What are the current data management skills deficits and capacity building possibilities?
• What are the longer term requirements and implications for the research community?
• What is the value of and possibilities for accrediting data management training programmes?
• How might formal education for data management be progressed?

Please use the comments facility below to share your contribution by Thursday 10th July.

Relevant background reports include:
Publication and Quality Assurance of Research Data Outputs
Dealing with data

Many thanks
Joy Davidson and Neil Jacobs

10 thoughts on “Session 2 – Capacity and skills issues”

  1. Neil Geddes

    1- We store scientific data and work with researchers and IT support staff to share best practice in data storage and management. This is done through collaborative development and through regular coordination meetings.

    2
    * What are the current data management skills deficits and capacity building possibilities?
    – lack of lightweight end-user tools which can be integrated effectively into professionally run curation services
    – experience of managing very large data sets within the research community. Commercial providers such as Google and Amazon are far ahead of academic practice in this area and are not in the business of sharing their expertise.

    • What are the longer term requirements and implications for the research community?
    – need to understand how and when to leverage commercial offerings and how to bridge or fill the gaps.

  2. Joy Davidson

    Enabling International Access to Scientific Data Sets: Creation of the Distributed Data Curation Center (D2C2) by James L. Mullins, Purdue University

    http://www.iatul.org/doclibrary/public/Conf_Proceedings/2007/Mullins_J_full.pdf

    This paper provides the following description of the range of roles involved in data curation, which may be of value to the discussions in this session.

    ‘The individual roles that are most relevant to this paper are the definitions of data authors, data managers, and data scientists. In simple terms these three roles break down as follows: data authors – domain scientists, educators and students who have a vested interest involved in the research generated from the data; data managers – information technologists, computer scientists, and information scientists responsible for the computing, storage and access of the data for analysis; and data scientists – curators, expert annotators, librarians and archivists, among others. The Data Scientists have responsibility to undertake creative inquiry and analysis to enhance the undertaking of research by the data authors, and to apply a consistent methodology and best practices to the curation of data.’

  3. Alma Swan

    James Mullins has also quoted the NSF’s definition of a data scientist as someone who could:
    “conduct creative inquiry and analysis; enhance through consultation, collaboration, and coordination the ability of others to conduct research and education using digital data collections; be at the forefront in developing innovative concepts in database technology and information sciences, including methods for data visualization and information discovery, and applying these in the fields of science and education relevant to the collection; implement best practices and technology; serve as a mentor to beginning or transitioning investigators, students, and others interested in pursuing data science; and design and implement education and outreach programs that make the benefits of data collections and digital information science available to the broadest possible range of researchers, educators, students, and the general public”

    This is considerably wider than the Mullins definition above and extends the role from ‘curators, expert annotators, librarians and archivists’ to people with other positions. It is worth some discussion.

  4. Norman Gray

    1. I work on a number of projects connected with the international Virtual Observatory movement, enabling planet-scale interoperability and sharing of data.

    2. From this point of view the ‘skills deficits’ are how you make it easy for disparate projects to make their data available interoperably, given that they’re already managing that data.

    We don’t necessarily need to use commercial expertise at this larger-scale end. Google isn’t _that_ big — its holdings are probably only 1PB in size, and there are several present or planned scientific datasets larger than that.

  5. Sam Pepler

    1. a *very* brief summary outlining specifically how your work relates to skills development and capacity building for research data management

    I am a data centre curation manager, so I am directly responsible for the development of the data scientists running the data centre. We also try to educate the scientists in data management techniques and issues.

    2. your reactions to these questions, which are intended to prompt discussion:
    • What are the current data management skills deficits and capacity building possibilities?
    – IPR and legal issues
    – Data selection and retention
    – Writing documentation

    • What are the longer term requirements and implications for the research community?
    – Poorly documented data that no-one wants is available
    – Good data lost

    • What is the value of and possibilities for accrediting data management training programmes?
    – Data management is ill-defined. Training courses force you to define it and make it visible.

    • How might formal education for data management be progressed?
    – Records management has been active for decades. What do they do?

  6. Robin Rice

    I’m interested in this topic both through my ‘day job’ as a data librarian for a research university, http://datalib.ed.ac.uk/, and as project manager for DISC-UK DataShare, http://www.disc-uk.org/datashare.html .

    In the Data Library, we are finding increasingly that users need help less with finding sources of data (though they still may need help and training in using it), and more help in managing data. This reflects the changing model in librarianship more generally from information scarcity to information abundance. In order to assist our users, we need to become more knowledgeable ourselves. We also want to be in a position to help our users meet funders’ demands for data sharing plans in research proposals.

    This leads on to DISC-UK’s interest in sharing knowledge between data librarians, data managers and repository managers, and librarians to envisage better services for our users, in this case, building capacity for institutional repositories to manage research data as well as publications.

    I am turned off by the ‘debate’ about who should be trained to curate data. The experience of IASSIST (International Association for Social Science Information Services and Technology) is that ‘data people’ come from a variety of backgrounds – librarianship, technologists, researchers – and can form a community around common needs and interests, within a common domain (social sciences in this case). The conference in May at Stanford was particularly vibrant: Technology of Data: Collection, Communication, Access and Preservation. https://www.stanford.edu/group/ADS/cgi-bin/drupal/conference-schedule

    There are advantages to having people from different backgrounds at the table, but one disadvantage is that this community, which has been in existence for about 35 years, has not managed to find a path to accreditation, and as a result the profession, such as it is, remains marginalised (also because its members are so dispersed).

  7. Joy Davidson

    Thanks to everyone for their views so far. I think we should have quite a lively session.

    It does seem a daunting enough task to simply define what data curation involves and who should be involved so it may well prove extremely difficult to pin down a set of core or minimum requirements for formal education and accredited training. However, it does seem that there is value in professionalising the role of data curator (but perhaps a better job title is required!). I am particularly keen to see some indications of career progression so that the role of data curator is seen as a dead-end job.

  8. Robin Rice

    Joy – I’m sure you mean NOT seen as a dead-end job. 😉

    At the moment there’s an interesting parallel situation with regard to career paths for institutional repository managers.

    To what extent is the IR a side-show from the ‘main business’ of the library? How many IR managers are on short-term contracts and face job insecurity when the JISC project funding runs out? Do they get support from their Library senior managers in terms of technical support and political advocacy within the institution?

    A hard-hitting and colourful analysis of this situation (in the US, at least) has been written by Dorothea Salo, Innkeeper at the Roach Motel. She also covers other issues currently vexing IR managers such as open source software communities.

    The Repository Fringe event in Edinburgh at the end of this month welcomes Dorothea as our keynote speaker. Dave DeRoure will be giving the closing plenary address.

  9. Rachel Bruce

    This is not in reply to the questions but I skimmed the thread and recalled Lorcan Dempsey’s note to the JISC repositories list – so just as an example:

    The “Master of Science–Concentration in Data Curation” from GSLIS at
    the University of Illinois at Urbana Champaign is interesting in this
    context.

    http://www.lis.uiuc.edu/programs/ms/data_curation.html

    From the blurb:

    “The Data Curation Education Program (DCEP) concentration within our
    ALA-accredited master of science offers a focus on data collection and
    management, knowledge representation, digital preservation and
    archiving, data standards, and policy. Data curation is the active and
    on-going management of data through its lifecycle of interest and
    usefulness to scholarship, science, and education. Data curation
    activities enable data discovery and retrieval, maintain its quality,
    add value, and provide for re-use over time, and this new field includes
    authentication, archiving, management, preservation, retrieval, and
    representation. Our program will provide a strong focus on the theory
    and skills necessary to work directly with academic and industry
    researchers who need data curation expertise.”

  10. Rachel Bruce

    I have just had a meeting with Adrian Burton from Australian National University, APSR. In reconfiguring their activity to support research infrastructure, they have thought about this issue and ways that they can try to address it. They are hoping to help define elements of a curriculum that can support the development of the skills required for data.

    The key areas where they think they need to be influencing the curriculum are:

    Library science = what generic data management skills are needed? are these “data scientists”?

    IT, computer science = can the addition of certain data focussed aspects on these courses create the skills of the “data scientist”?

    Researchers = what is it we need to ensure is part of a research post-grad (or before?) that will make sure researchers are data literate?
