
Research Data Management (RDM): Write a README file

What's a README file?

A "README" file is a guide to your dataset. It is typically a plain-text file, which maximizes ease of use and long-term preservation potential. Its purpose is to help other researchers (or your future self) understand your dataset: its content, origin, license and how to interact with it. README files are commonly named README, LISEZ-MOI, readme.txt, lisezmoi.txt or read-me.md.

The name "README" signals that the file contains important information, and the plain-text format (.txt) can be opened by a wide range of software, which makes the content widely accessible.

A README file is included as a component of the dataset it describes.

README file or metadata for data repositories?

When you deposit your data in a repository (e.g., Borealis or FRDR), you are asked to provide metadata. A README file complements, but does not replace, repository metadata.

The best practice is to record information in both the repository metadata and the README file. Repository metadata facilitates searching within and across data repositories, while the README file travels with the dataset and continues to describe it after it has been separated from its original context. In all cases, use the conventions of your discipline to record information about your dataset.
 

Exercise 1 - Find the meaning of a dataset

Open this dataset using the DOI link in its citation:

Clark, Luke, 2019, “Role Reversal: The Influence of Slot Machine Gambling on Subsequent Alcohol Consumption”, https://doi.org/10.5683/SP2/SLOY0N, Borealis, V1, UNF:6:zsehCAz4agntvPwDZF03OA== [fileUNF]

Select and download the data file "Gambling_Alcohol_Study 1_Archive.tab" in the original file format.

Looking at the data, try to answer the following questions:

  • Describe the different playing conditions in this study.
  • What does a variable called "ResultingBAC" mean?
  • How was the data collected?

What do you observe?

Recommended contents of the README file

The recommended minimum content for data reuse is shown in bold (Cornell, 2023).
 

General information

  1. Provide a title for the dataset
  2. Name/institution/address/e-mail information for
    • Principal investigator (or person responsible for data collection)
    • Associate or co-investigators
    • Contact person for questions
  3. Date of data collection (can be a single date or a range)
  4. Information on geographical location of data collection
  5. Keywords used to describe the subject of the data
  6. Language information
  7. Information on funding sources that supported data collection

 

Data and file overview

  1. For each file name, a brief description of the data it contains
  2. File format if not obvious from file name
  3. If the dataset comprises several related files, the relationship between the files or a description of the file structure that contains them (terms such as "dataset", "study" or "data package" may be used).
  4. File creation date
  5. Date(s) on which file(s) were updated (versioned) and nature of update(s), if any
  6. Information on associated data that were collected but are not included in the described dataset.
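Parts of this overview, such as file names, formats and dates, can be drafted automatically. The Python sketch below is one possible helper, not an ÉTS tool: the function name `file_inventory` is hypothetical, it records the last-modified date because file creation dates are not available in a portable way across operating systems, and the description of each file must still be written by hand.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_inventory(project_dir: str) -> list[dict]:
    """Draft the name, format, size and last-modified date of every file.

    Hypothetical helper: descriptions of each file are not generated and
    must be added by hand in the README.
    """
    root = Path(project_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip folders and the README itself (any file name starting with "readme").
        if not path.is_file() or path.name.lower().startswith("readme"):
            continue
        info = path.stat()
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "format": path.suffix.lstrip(".").lower() or "(none)",
            "size_bytes": info.st_size,
            # Modification time is used because creation time is not
            # reported portably on every operating system.
            "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).date().isoformat(),
        })
    return rows
```

For example, looping over `file_inventory("my_project")` and printing each entry gives a starting list that you can paste into the README and complete with a short description per file.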

 

Sharing and accessing information

  1. Licenses or restrictions imposed on data
  2. Links to publications that cite or use the data
  3. Links to other publicly accessible data locations (see data sharing best practices for more information on identifying repositories)
  4. Recommended citation for data (see best practices for data citation)

 

Methodological information

  1. Description of data collection or generation methods (include links or references to publications or other documents containing the experimental design or protocols used)
  2. Description of data processing methods (describe how data were generated from raw or collected data)
  3. Any software or instrument-specific information needed to understand or interpret the data, including software and hardware version numbers.
  4. Standards and calibration information, if applicable
  5. Describe any quality assurance procedures applied to data
  6. Definitions of codes or symbols used to flag or characterize poor-quality, questionable or aberrant values that users should be aware of
  7. Persons involved in sample collection, processing, analysis and/or submission.

 

Data-specific information
Repeat this section, if necessary, for each dataset (or file, if applicable).

  1. Number of variables and number of observations or rows
  2. List of variables, including full names and definitions (spell out abbreviated words) of column headings for tabular data
  3. Units of measurement
  4. Definitions of codes or symbols used to record missing data
  5. Specialized formats or other abbreviations used
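Counting variables and rows, listing column headings and spotting missing values (items 1, 2 and 4) can be done with a short script. The following Python sketch assumes a comma-separated file encoded in UTF-8 with a single header row; the function `describe_table` and the example missing-data codes (an empty cell and "NA") are hypothetical, and full variable names, definitions and units still have to be written by hand.

```python
import csv
from pathlib import Path


def describe_table(csv_path: str, missing_codes=("", "NA")) -> dict:
    """Draft the data-specific facts for one CSV file.

    Assumes comma-separated values, UTF-8 encoding and one header row.
    `missing_codes` are example codes; use the ones your README defines.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # column headings = variable names
        missing = {name: 0 for name in header}
        n_rows = 0
        for row in reader:
            n_rows += 1
            # Count cells whose value matches one of the missing-data codes.
            for name, value in zip(header, row):
                if value.strip() in missing_codes:
                    missing[name] += 1
    return {
        "n_variables": len(header),
        "n_rows": n_rows,
        "variables": header,
        "missing_per_variable": missing,
    }
```

For example, `describe_table("study.csv", missing_codes=("", "NA", "-999"))` would also count a sentinel value of -999 as missing, if that is the code your dataset uses; the README should then state that code explicitly.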

 

Style and process

THE STYLE

The way you write your README is as important as the information you include. Remember to be as clear as possible. Here are some best practices for documenting data:

  • Don't use jargon;
  • Define terms and acronyms;
  • Make documentation machine-readable (avoid special characters).

 

[Image: an example of README content]

 

The ÉTS README template is available through the link in the online version of this guide.

For more information, see the README section of this guide.

 

THE PROCESS

"Document your work as you go along, so you don't lose any details. If you wait until the end of your project, you may have already lost or forgotten valuable information."

 

You can create a README using any text editor (e.g. TextEdit, Notepad++, Atom.io, Sublime Text) or word processor (e.g. Word, LibreOffice).

If you use a word processor, save your README as UTF-8 encoded plain text (.txt) rather than in the program's default format. Plain text preserves your information because it relies on sustainable, open standards rather than proprietary formats. If you're using GitHub, write your README in Markdown syntax (readme.md).

Store the README at the top level of the project folder on your computer, next to the project files.
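As an illustration, the Python sketch below creates an empty README skeleton that follows this advice: it writes UTF-8 plain text at the top level of a project folder and uses the recommended sections of this guide as headings. The function name `write_readme_skeleton`, the file name README.txt and the heading layout are assumptions for illustration; you may prefer to start from the ÉTS README template instead.

```python
from pathlib import Path

# Headings follow the recommended README contents of this guide;
# the underlining style is only one possible layout.
SECTIONS = [
    "GENERAL INFORMATION",
    "DATA AND FILE OVERVIEW",
    "SHARING AND ACCESSING INFORMATION",
    "METHODOLOGICAL INFORMATION",
    "DATA-SPECIFIC INFORMATION",
]


def write_readme_skeleton(project_dir: str, title: str, filename: str = "README.txt") -> Path:
    """Create a plain-text README at the top level of project_dir (hypothetical helper)."""
    readme = Path(project_dir) / filename
    lines = [title, "=" * len(title), ""]
    for section in SECTIONS:
        lines += [section, "-" * len(section), "", "(to be completed)", ""]
    # Mode "x" raises FileExistsError instead of overwriting an existing README;
    # the text is encoded as UTF-8 with Unix line endings.
    with readme.open("x", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines))
    return readme
```

Calling `write_readme_skeleton("my_project", "My dataset title")` creates my_project/README.txt; because the file is opened in "x" mode, a second call fails rather than overwriting a README you have already filled in.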

Exercise 2 - Fill in a README

Download the ÉTS README template and choose a data project you're currently working on. Spend 5 to 7 minutes filling it in.

Pay particular attention to the list of variables. A dataset whose variables are not named and defined is of little use: how would your peers know what a variable named "Data.VF.1" means?

Congratulations!

Now you're ready to write a good README file so that other researchers can understand your dataset without a hitch!

 

Photo by Vasily Koloda on Unsplash

References

This page is an adaptation of the guide Introduction to Research Data Management: File Naming by UBC Library Research Commons, licensed under a Creative Commons Attribution 4.0 International licence (CC BY). Research Data Management: File naming by ÉTS Library is licensed under a Creative Commons Attribution 4.0 International licence (CC BY).

Kristin Briney (2023). Chapter 2 - Documentation. In The Research Data Management Workbook. Caltech Library. https://doi.org/10.7907/z6czh-7zx60

Alert icon: icon found on Flaticon. Alert icons created by Pixel perfect - Flaticon (https://www.flaticon.com/free-icons/alert).