Monday, 28 October 2013

Introduction to Research Data Management (RDM)

This is an attempt at ‘RDM in a (Coco)nutshell’, primarily for researchers rather than information management practitioners.

Key Concepts:

1. RDM is good research practice

2. RDM is concerned with looking after your data throughout the research project.

3. RDM involves long-term preservation of some of your data after the project.

4. Metadata is data documentation and is essential for RDM.

5. RDM makes it possible to share your data – if you want to or have to.

6. A Data Management Plan sets out your arrangements for managing your research data.


1. RDM is good research practice – it is mostly common sense. Most researchers will be aware of many of the issues involved and will already be practising some of the procedures. Here, ‘data’ refers to digital material collected during the research process, which will be analysed to test the research hypothesis – the same principles apply to non-digital data (see the earlier blog post about research data).

2. RDM is concerned with looking after your data throughout the research project from the project planning stage to the publication stage. Looking after your data involves making sure that the data are stored securely and backed up regularly. It involves ensuring only the right people have access to the data, during the research process and afterwards when the data are archived.
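
One way to check that a backup really is a faithful copy is to compare checksums of the original and the backup. The sketch below is only an illustration: the function names sha256_of and backup_matches are made up for this post, and it assumes both copies are ordinary files you can read from the same machine.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True if the backup exists and has the same content as the original."""
    return backup.is_file() and sha256_of(original) == sha256_of(backup)
```

A check like this could be run after each scheduled backup; a False result means the copy is missing or differs from the original and should be investigated.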

When planning research, risks to the data will need to be assessed and procedures agreed upon to minimise these risks. What would you do if you lost your research data tomorrow? Can your data be recreated or are they, for example, observational data that can’t be captured again?

At the planning stage, consideration needs to be given to choosing appropriate file formats for your digital data – will the software necessary for interacting with the data be available in the future? Consideration should also be given to the organisation of the data – the arrangement of folder structures and protocols for naming files and folders.
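
A file naming protocol is easier to follow if the names are generated rather than typed by hand. The following is a minimal sketch of one possible convention (project, short description, date and version number, separated by underscores). The convention itself and the function name make_filename are assumptions for illustration, not a standard; your project or funder may specify something different.

```python
import re
from datetime import date


def make_filename(project: str, description: str, created: date,
                  version: int, extension: str) -> str:
    """Build a file name of the form project_description_YYYYMMDD_vNN.ext."""

    def slug(text: str) -> str:
        # Replace runs of anything other than letters and digits with a hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", text.strip()).strip("-").lower()

    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{created:%Y%m%d}_v{version:02d}.{ext}"


print(make_filename("Coral Survey", "site 3 temperature", date(2013, 10, 28), 2, "csv"))
# prints: coral-survey_site-3-temperature_20131028_v02.csv
```

Putting the date in year-month-day order and padding the version number means that files sort sensibly in a folder listing.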

3. Long-term preservation (or curation) of some of your data may be required by the funder.
At the end of the research project, decisions need to have been made about which data will be preserved, how they will be processed for preservation and which will be disposed of. This may well have been determined during the planning stage by agreement with the research funder. The UK research funders have different requirements for long-term curation of research data – details can be found at the DCC website.

Where should your research data be curated? Many research funders and researcher communities have established Data Centres and Discipline-based Repositories; a list of these may be found at Datacite or at Databib. A number of HE institutions are developing their own Data Repositories, or have widened the capabilities of their Institutional Repositories to include research datasets as well as research papers.

4. Metadata is data documentation and is essential for RDM.
From the planning stage to the data preservation stage, metadata will need to be collected. Metadata identifies specific units of research data and their purpose in the research process.

When data are created, it is essential that details such as instrument settings, experimental protocols and conditions are recorded, for it may be difficult to recall these at a later stage. Many instruments will record such metadata during the process – the more automated the recording of metadata is, the better.

Many metadata elements will be recorded as attributes of the digital file when the data are created or collected – file format, size and creation date, for example. Data should be given an appropriate filename or title; different versions of files will need naming to distinguish them; files may be collected together in appropriately named folders. Some metadata elements will be common to all research data created by a research project – Creator name, Project name, Institution and Grant number, for example.

Where necessary, metadata will need to be recorded, perhaps in text files or Word documents, and placed in a folder with the data that it documents. This doesn't necessarily need to be structured or in a standard format, although that would be best.
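
As an illustration of keeping metadata alongside the data, the sketch below writes a small JSON file into a data folder. It combines project-level fields that you supply (Creator name, Project name and so on) with attributes read from each file. The file name metadata.json, the field names and the functions describe_file and write_metadata are all assumptions made for this example, not a recognised metadata standard.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect basic attributes that the file system already records."""
    info = path.stat()
    return {
        "filename": path.name,
        "format": path.suffix.lstrip(".").lower(),
        "size_bytes": info.st_size,
        # Last-modified time is used because creation time is not
        # available consistently across operating systems.
        "last_modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


def write_metadata(folder: Path, project_fields: dict,
                   sidecar_name: str = "metadata.json") -> Path:
    """Write project fields plus per-file attributes into the data folder."""
    files = [
        describe_file(p)
        for p in sorted(folder.iterdir())
        if p.is_file() and p.name != sidecar_name
    ]
    record = dict(project_fields, files=files)
    target = folder / sidecar_name
    target.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return target
```

It might be called as write_metadata(Path("survey_data"), {"creator": "A. Researcher", "project": "Example project"}), where the folder name and field values are placeholders. Details such as instrument settings or experimental conditions would still need adding by hand or from the instrument's own output.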

5. Sharing your data – if you want to or have to – is made possible through RDM practice.
Perhaps the most contentious issue for researchers is the concept of making their research data openly accessible.

Good Research Data Management does not require your research data to be ‘Open Data’ or openly accessible! 
Although it is true that some of the drivers in developing RDM good practice are the concepts of ‘Data Sharing and Data Reuse’ and ‘Data driven Science’, making research data openly accessible is NOT a requirement of RDM, but it is made possible through RDM.

You may be required by your funder to make your long-term curated research data accessible, either to anyone or through registered access. In line with the RCUK Principles on Data Policy, which advocate open access to data resulting from publicly funded research, most UK funders have published data sharing policies – details are available at the DCC website.

You may of course be willing to share research data, openly or on a restricted basis. Your research data may be cited in the same way as other research publications and are now accepted as a research output for the REF2014 assessment (output type S). There is a growing body of research that indicates the benefit of sharing research data: Piwowar et al. (2007) and Henneken & Accomazzi (2011) find increased citation rates for articles associated with accessible data. Data sharing has been common practice for many years in a number of disciplines – Molecular Biology for example.

Before publishing data, it is important to check with Research and Innovation Services – as the University of Sheffield Research Data Management Policy states:

5. Unless the terms of research grants or contracts provide otherwise, data generated by research projects are the property of the University of Sheffield. Researchers should exercise care in assigning rights in data to publishers or other external agencies.
To publish and share research data, they will need to be submitted to one of the Data Centres or Discipline-based Repositories listed at Datacite or at Databib. These are, of course, also the source of other projects’ data that you may wish to reuse.

6. A Data Management Plan (DMP) will need to be drawn up, setting out your arrangements for managing your research data. Most research funders require the submission of a DMP (which the AHRC calls a Technical Appendix) during the grant application. Information about and resources for creating a DMP can be found at the DCC website.

Useful Resources:
Datacite - Repository list http://datacite.org/repolist
Databib http://databib.org/
Henneken, E. & Accomazzi, A. (2011) Linking to Data - Effect on Citation Rates in Astronomy http://arxiv.org/abs/1111.3618
Piwowar, et al (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308
RCUK - Principles on Data policy http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
The University of Sheffield - Research Data Management Policy http://www.shef.ac.uk/ris/other/gov-ethics/grippolicy/practices/all/rdmpolicy
