Research Data Management (RDM): Documentation and metadata

Purpose of documentation

“Documentation is a love letter that you write to your future self.” Damian Conway (2005)

Data documentation can be defined as the clear description of everything that a new "data user" or "your future self" would need to know in order to find, understand, reproduce, and reuse your data, independently and without the risk of misinterpretation. It should clearly describe how you generated or used the data, why, and where to find the associated files. It can also serve as onboarding documentation for new colleagues, and it remains usable even after the responsible researcher has left the project. Developing a data management plan (DMP) can make this documentation easier to write.

Data documentation is necessary at two levels: documentation about the entire study or project, and documentation about individual records, observations, or data points. (Adapted from ELIXIR Belgium, 2020)

The following elements help ensure proper documentation. They facilitate information retrieval, data sharing within the research team, and potential reuse, and they keep the data usable throughout their useful life (Université de Sherbrooke, 2022):

  • Formulated hypotheses
  • Methodological approach used
  • Analysis types and procedures
  • Description of the collected data
  • Data dictionary defining the variables used and the objects
  • Key concepts, vocabularies, classification systems
  • Units of measurement
  • Tools or software used, source code
  • Information about the people who worked on the project and performed each task
  • README files that describe the context of the datasets, their structure, rights, usage limitations, etc.

Purpose of metadata

Metadata is highly structured documentation. Machine-readable or machine-actionable metadata makes your data, and the metadata itself, more discoverable, accessible, interoperable, and reusable. It enhances the quality and visibility of data within the scientific community, thereby increasing the data's potential for reuse and recognition. It is also useful for the long-term digital preservation of data.
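
As a minimal sketch of what "machine-readable" can mean in practice, the Python snippet below serializes a few descriptive fields as JSON and reads them back. The field names and sample values are illustrative assumptions, not a metadata standard; a real project would normally follow the schema required by its discipline or its data repository.

```python
import json

# Hypothetical descriptive metadata. The field names are illustrative only
# and are not taken from any specific metadata standard.
metadata = {
    "title": "Example dataset",
    "creator": "Jane Doe",
    "date_collected": "2023-05-01/2023-06-15",
    "keywords": ["example", "research data management"],
    "language": "en",
    "license": "CC-BY-4.0",
}


def to_json(record: dict) -> str:
    """Serialize a metadata record as indented JSON, keeping accented characters."""
    return json.dumps(record, indent=2, ensure_ascii=False, sort_keys=True)


def from_json(text: str) -> dict:
    """Parse a JSON metadata record back into a dictionary."""
    return json.loads(text)


if __name__ == "__main__":
    print(to_json(metadata))
```

Because the record is structured, a program can retrieve a single field (for example the license) without a person having to read free-form text.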

Metadata should be accompanied by sufficient documentation (such as software manuals, survey designs, and user guides) prepared by the person who created the data, to enable the use of the resource by others. (Adapted from Digital Preservation Coalition, 2021)

Readme File

The README file is a plain-text file, usually named README.txt or LISEZ-MOI.txt and saved in the open .txt format, which presents and explains a project. It is part of the data documentation that should be produced at the beginning of the project. It lets you record, as free-form text, any information that cannot be stored in a highly structured manner.

Potential users of the project's data typically consult it before accessing the data.

Code hosting services such as GitHub, Bitbucket, and GitLab will also search for and display your README file along with the list of files and directories in your project.

When should I create a README file?

Create it before showing a project to others or making it public, ideally as early as the planning phase. It is recommended to make it the first file you create in a new project.

Additionally, some data repositories may require a README file to be deposited along with the list of files related to the project's data and any relevant information. Creating a README file at the beginning of each project will save you time later on.

Where should I put it and what should it contain?

Depending on the number of folders/files you have and the number of years you will keep them, you can create a README file for your top-level directory or for each created folder and subfolder in your directory to document specific parts of your data.

It is recommended to have a README file in the top-level directory of the project, since that is where someone unfamiliar with your project will start browsing. The README file placed at this level should contain general information about the project and the data organization system used.

If a README file is placed in a subfolder containing raw or processed data, it should contain descriptive information for that data.
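
The placement advice above can be checked with a short script. The following Python sketch lists the folders of a project that have no README file. The accepted file names are limited to the two mentioned in this guide, and the function names are hypothetical; adapt both to your own conventions.

```python
from pathlib import Path

# File names treated as a README, compared case-insensitively.
# This list is an assumption for the sketch; extend it as needed.
README_NAMES = {"readme.txt", "lisez-moi.txt"}


def has_readme(folder: Path) -> bool:
    """Return True if the folder directly contains a README file."""
    return any(
        p.is_file() and p.name.lower() in README_NAMES for p in folder.iterdir()
    )


def folders_missing_readme(root: Path) -> list[Path]:
    """List the root folder and every subfolder that has no README file."""
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    return [folder for folder in folders if not has_readme(folder)]
```

Calling folders_missing_readme(Path("my_project")), where "my_project" is a placeholder for your top-level directory, returns the folders that still need a README. Not every subfolder requires one, so treat the result as a reminder rather than as an error list.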

Make sure your README files are not too long.

(Adapted from ELIXIR Belgium, 2020 and Make a README, 2018)

Recommended contents of the README file

The recommended minimum content for data reuse is shown in bold (Cornell, 2023).
 

General information

  1. Provide a title for the dataset
  2. Name/institution/address/e-mail information for
    • Principal investigator (or person responsible for data collection)
    • Associate or co-investigators
    • Contact person for questions
  3. Date of data collection (can be a single date or a range)
  4. Information on geographical location of data collection
  5. Keywords used to describe the subject of the data
  6. Language information
  7. Information on funding sources that supported data collection

 

Data and file overview

  1. For each file name, a brief description of the data it contains
  2. File format if not obvious from file name
  3. If the dataset comprises several related files, the relationship between the files, or a description of the file structure that contains them (possible terminology could include "dataset", "study" or "data package").
  4. File creation date
  5. Date(s) on which file(s) were updated (versioned) and nature of update(s), if any
  6. Information on associated data collected but not included in the described dataset
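
Part of the file overview above can be drafted automatically. The Python sketch below walks a project folder and records, for each file, its relative path, its format (taken from the file extension), its size, and its last-modified date. It relies on the last-modified date because the creation date is not reported consistently across operating systems; the function name and the dictionary keys are hypothetical. The description of what each file contains still has to be written by hand.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_overview(root: Path) -> list[dict]:
    """Describe each file under root: path, format, size, last-modified date."""
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        # Convert the modification timestamp to a UTC calendar date.
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        entries.append(
            {
                "file": path.relative_to(root).as_posix(),
                "format": path.suffix.lstrip(".").lower() or "unknown",
                "size_bytes": stat.st_size,
                "last_modified": modified.date().isoformat(),
            }
        )
    return entries
```

Each dictionary in the returned list can be pasted into the README as one line per file and then completed with a brief description.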

 

Sharing and accessing information

  1. Licenses or restrictions imposed on data
  2. Links to publications that cite or use the data
  3. Links to other publicly accessible data locations (see data sharing best practices for more information on identifying repositories)
  4. Recommended citation for data (see best practices for data citation)

 

Methodological information

  1. Description of data collection or generation methods (include links or references to publications or other documents containing the experimental design or protocols used)
  2. Description of data processing methods (describe how data were generated from raw or collected data)
  3. Any software or instrument-specific information needed to understand or interpret the data, including software and hardware version numbers.
  4. Standards and calibration information, if applicable
  5. Describe any quality assurance procedures applied to data
  6. Definitions of codes or symbols used to note or characterize poor quality/discreditable/aberrant values that people should be aware of
  7. Persons involved in sample collection, processing, analysis and/or submission.

 

Data-specific information
*Repeat this section if necessary for each data set (or file, if applicable)*.

  1. Number of variables and number of observations or rows
  2. List of variables, including full names and definitions (spell out abbreviated words) of column headings for tabular data
  3. Units of measurement
  4. Definitions of codes or symbols used to record missing data
  5. Specialized formats or other abbreviations used
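
For tabular data, the counts requested in this section can be computed rather than typed by hand. The Python sketch below reads CSV text, counts the variables and observations, and counts missing values per column. The missing-value codes (an empty cell or "NA") and the function names are assumptions for illustration; use the codes that your own dataset documents.

```python
import csv
import io


def describe_table(csv_text: str, missing_codes=("", "NA")) -> dict:
    """Count variables, observations, and missing values per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    variables = list(reader.fieldnames or [])
    rows = list(reader)
    missing = {
        name: sum(1 for row in rows if (row[name] or "").strip() in missing_codes)
        for name in variables
    }
    return {
        "variables": variables,
        "n_variables": len(variables),
        "n_observations": len(rows),
        "missing": missing,
    }


def format_section(file_name: str, summary: dict) -> str:
    """Render the summary as plain-text lines for a README file."""
    lines = [
        f"File: {file_name}",
        f"Number of variables: {summary['n_variables']}",
        f"Number of observations: {summary['n_observations']}",
        "Variables:",
    ]
    for name in summary["variables"]:
        # The placeholder reminds the author to add the definition and unit.
        lines.append(
            f"  {name}: <full name, definition, unit> "
            f"(missing values: {summary['missing'][name]})"
        )
    return "\n".join(lines)
```

The generated text is only a starting point: the full variable names, definitions, and units of measurement must still be filled in by someone who knows the data.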

 

Useful resources

Guidelines and templates for README files

Best practices in brief for formatting a README

  • Make it the first document you create at the start of the project
  • Save it at the top level of the project directory
  • Give it a name that will be easily associated with the data files it describes (
  • Write your README document as a text file - (avoid proprietary formats like MS Word) - open formats are always more durable.
  • Create one README for each data file, or one for each set of related, identically formatted data files
  • Format all your README files identically (use the same terminology).
  • Use standardized date formats
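
As one possible reading of "standardized date formats", the Python sketch below formats dates according to ISO 8601 (YYYY-MM-DD), a widely used standard. The choice of ISO 8601 is an assumption, since this guide does not name a specific standard, and the function names are hypothetical.

```python
from datetime import date, datetime, timezone


def iso_date(d: date) -> str:
    """Format a single date as YYYY-MM-DD."""
    return d.isoformat()


def iso_range(start: date, end: date) -> str:
    """Format a collection period as start/end, both as YYYY-MM-DD."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"{start.isoformat()}/{end.isoformat()}"


def iso_timestamp_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp ending in Z."""
    if dt.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(iso_date(date(2023, 5, 1)))
    print(iso_range(date(2023, 5, 1), date(2023, 6, 15)))
```

Writing dates with the year first avoids the ambiguity between day-month and month-day orders, and it makes file names and table columns sort chronologically.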

References

ELIXIR Belgium (2020). RDM guide. Retrieved from https://rdm.elixir-belgium.org/about_DMP

Digital Preservation Coalition (DPC). (2021). Digital Preservation Handbook. Retrieved from https://www.dpconline.org/docs/digital-preservation/handbook

Université de Sherbrooke (2022). Research Data Management: Documenting your project and processes. Retrieved from https://libguides.biblio.usherbrooke.ca/gdr/documenter

Make a README (2018). Retrieved from https://www.makeareadme.com/