“Documentation is a love letter that you write to your future self.” Damian Conway (2005)
Data documentation can be defined as a clear description of everything that a new data user, or your future self, would need to know to find, understand, reproduce, and reuse your data independently and without risk of misinterpretation. It should clearly describe how and why you generated or used the data, and where to find the associated files. It also serves as onboarding documentation for new colleagues and keeps this knowledge available even if the responsible researcher leaves the project. Writing a data management plan (DMP) can make it easier to develop this documentation.
Data documentation is necessary at two levels: documentation about the entire study or project, and documentation about individual records, observations, or data points. (Adapted from ELIXIR Belgium, 2020)
Here is a list of important elements to consider for ensuring proper documentation, that is, documentation that facilitates information retrieval, data sharing within the research team, and potential reuse, while keeping the data usable throughout their useful life (Université de Sherbrooke, 2022):
Metadata are highly structured documentation. Machine-readable (or actionable) metadata make your data and metadata more findable, accessible, interoperable, and reusable (FAIR). They enhance the quality and visibility of data within the scientific community, thereby increasing their potential for reuse and recognition. They are also useful for the long-term digital preservation of data.
Metadata should be accompanied by sufficient documentation, prepared by the person who created the data (such as software manuals, survey designs, and user guides), to enable others to use the resource. (Adapted from Digital Preservation Coalition, 2021)
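As an illustration of machine-readable metadata, the sketch below builds a minimal dataset description in JSON-LD using the schema.org `Dataset` vocabulary. The choice of schema.org, the function name `build_dataset_metadata`, and all example values are assumptions for illustration only; your repository may expect a different standard (for example DataCite or Dublin Core) and more fields.

```python
import json


def build_dataset_metadata(name, description, creators, date_created,
                           license_url, keywords):
    """Return a minimal schema.org "Dataset" record serialized as JSON-LD.

    Only a few schema.org properties are used; a repository or a
    discipline-specific metadata standard may require other fields.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Each creator is described as a schema.org Person.
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "dateCreated": date_created,  # ISO 8601 date, e.g. "2023-05-01"
        "license": license_url,
        "keywords": list(keywords),
    }
    # ensure_ascii=False keeps accented characters (e.g. French names) readable.
    return json.dumps(record, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(build_dataset_metadata(
        name="Example survey dataset",
        description="Responses to a hypothetical survey.",
        creators=["Jane Doe"],
        date_created="2023-05-01",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["survey", "example"],
    ))
```

Saved next to the data (for instance as `metadata.json`, a hypothetical file name), such a record can be read by software, while the README and other documentation carry the free-form context.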
A README file is a plain-text file, usually named README.txt (or LISEZ-MOI.txt in French) and saved in the open .txt format, that presents and explains a project. It is part of the data documentation and should be produced at the beginning of the project. It records, as free-form text, any information that cannot be stored in a highly structured form.
Potential users of the project's data typically consult it before accessing the data.
Code hosting services such as GitHub, Bitbucket, and GitLab will also search for and display your README file along with the list of files and directories in your project.
Write it before showing the project to others or making it public, and ideally during the planning phase: it is recommended to make it the first file you create in a new project.
Some data repositories also require a README file, listing the project's data files and any other relevant information, to be deposited with the data. Creating a README file at the beginning of each project will save you time later on.
Depending on how many folders and files you have and how many years you will keep them, you can create a README file for your top-level directory only, or for each folder and subfolder, to document specific parts of your data.
A README file is recommended in the project's top-level directory, since that is where someone unfamiliar with your project will start looking. The README file at this level should contain general information about the project and describe how the data are organized.
If a README file is placed in a subfolder containing raw or processed data, it should describe that data.
Keep your README files concise.
(Adapted from ELIXIR Belgium, 2020 and Make a README, 2018)
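Following the placement advice above, a small script can create a README stub at the top level and in each subfolder that does not yet have one. This is a sketch under assumptions: the function `create_readme_stubs`, the file name `README.txt`, and the stub wording are hypothetical choices, not a prescribed format.

```python
import tempfile
from pathlib import Path

# Hypothetical stub contents: the top-level README describes the project and
# its data organization; subfolder READMEs describe the data they contain.
TOP_LEVEL_STUB = (
    "Project title:\n"
    "General description of the project:\n"
    "Data organization (folder structure, file naming):\n"
)
SUBFOLDER_STUB = (
    "Contents of this folder:\n"
    "Description of the data (raw or processed, how produced):\n"
)


def create_readme_stubs(root, filename="README.txt"):
    """Create a README stub in root and in each subfolder lacking one.

    Existing README files are never overwritten. Returns the created paths.
    """
    root = Path(root)
    # The top-level folder first, then every nested folder in sorted order.
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    created = []
    for folder in folders:
        readme = folder / filename
        if readme.exists():
            continue  # keep what the researcher already wrote
        stub = TOP_LEVEL_STUB if folder == root else SUBFOLDER_STUB
        readme.write_text(stub, encoding="utf-8")
        created.append(readme)
    return created


if __name__ == "__main__":
    # Demonstrate on a throwaway folder tree.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "raw").mkdir()
        (Path(tmp) / "processed").mkdir()
        for path in create_readme_stubs(tmp):
            print(path.relative_to(tmp))
```

Skipping folders that already contain a README keeps the script safe to rerun as the project grows; the stubs are only starting points to be completed by hand.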
The README content below follows the structure of Cornell's template (Cornell, 2023), in which the recommended minimum content for data reuse is shown in bold.
General information
Data and file overview
Sharing and accessing information
Methodological information
Data-specific information
*Repeat this section if necessary for each data set (or file, if applicable)*.
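The section structure above can be turned into a reusable plain-text skeleton. The sketch assumes numbered, upper-case headings and repeats the data-specific section once per dataset file; the function name `readme_skeleton` and the `[to complete]` placeholders are hypothetical, and the detailed fields of the template still need to be filled in by hand.

```python
# Section headings from the README template above; the data-specific section
# is handled separately because it may be repeated per dataset or file.
SECTIONS = [
    "General information",
    "Data and file overview",
    "Sharing and accessing information",
    "Methodological information",
]


def readme_skeleton(title, dataset_files):
    """Return plain-text README content with one numbered heading per section.

    "Data-specific information" is repeated once for each name in
    dataset_files. The "[to complete]" placeholders are hypothetical.
    """
    lines = [title.upper(), ""]
    for number, heading in enumerate(SECTIONS, start=1):
        lines += [f"{number}. {heading.upper()}", "[to complete]", ""]
    for offset, name in enumerate(dataset_files):
        number = len(SECTIONS) + 1 + offset
        lines += [f"{number}. DATA-SPECIFIC INFORMATION FOR: {name}",
                  "[to complete]", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names; write the result to README.txt if desired.
    print(readme_skeleton("Example project", ["survey_raw.csv", "survey_clean.csv"]))
```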
Guidelines and templates for README files
ELIXIR Belgium (2020). RDM guide. Retrieved from https://rdm.elixir-belgium.org/about_DMP
Digital Preservation Coalition (DPC) (2021). Digital Preservation Handbook. Retrieved from https://www.dpconline.org/docs/digital-preservation/handbook
Université de Sherbrooke (2022). Research Data Management: Documenting your project and processes. Retrieved from https://libguides.biblio.usherbrooke.ca/gdr/documenter
Make a README (2018). Retrieved from https://www.makeareadme.com/