
Research Data Management (RDM): File Formats

What is a file format?

The file format is the structure of a file that tells a program how to display its content. For example, a Microsoft Word document saved in the .DOC file format displays best in Microsoft Word. Even though another program may be able to open the file, it may not have all the necessary features to properly interpret the document. (Computer Hope, 2022)

Each type of file (text, image, or sound) can be saved in several file formats. The format is usually indicated by the file name extension (for example, .txt, .png, or .wav).
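The extension is only a naming convention: renaming a file does not change its content. Many formats also begin with a fixed byte sequence, called a signature or "magic number", that software can check. The Python sketch below compares a file's extension with a few well-known signatures. The signature table is a small illustrative subset, and `identify` is a hypothetical helper, not a standard tool.

```python
from pathlib import Path

# A small, illustrative subset of well-known file signatures ("magic numbers")
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container (also used by .docx and .xlsx)",
    b"\xffWPC": "WordPerfect document",
}


def identify(path: str) -> tuple[str, str]:
    """Return (extension, format detected from the first bytes) for a file."""
    p = Path(path)
    with p.open("rb") as f:
        header = f.read(16)  # every signature above is shorter than 16 bytes
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return p.suffix.lower(), name
    return p.suffix.lower(), "unknown"
```

A PDF renamed to `report.txt` would still be reported as `(".txt", "PDF document")`, which shows why the extension alone is only a hint.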

 Illustration by Alfan Subekti found on Vecteezy

Here are some of the most commonly used formats:

  • .xls (Microsoft Excel)
  • .mp3 (for digital audio)
  • .docx (Microsoft Word)
  • .gdoc (Google Docs)

Exercise #1

A dance club in Châteauguay has kept its documents since the early 2000s. Recently, a club member wanted to consult an agenda from July 5, 2003 (see the WordPerfect file below) but could not open it.

Can you download this document and attempt to recover it on your computer?

Photo by David Pupăză found on Unsplash

Finding: This is an old proprietary file that can no longer be opened. The same problem could arise with other proprietary formats such as Microsoft Word or Google Docs.

What are proprietary formats?

Proprietary formats are usually restricted by:

  • software patents;
  • a lack of detailed format specifications;
  • built-in encryption that prevents the public from using them freely.

To use a proprietary format, you must use specific software provided by a single vendor.

Photo by Roth Melinda on Unsplash

In some cases, an industry may treat specific file formats as a de facto standard even if the formats are proprietary and rely on expensive software.

What are open formats?

Open formats:

  • are not proprietary;
  • are freely available for everyone to use;
  • allow open-source developers to use the published format specifications to write software that reads the format, even if a particular vendor no longer supports it;
  • may decrease the risk of technical obsolescence by removing the dependency on a single vendor's technology.
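Because an open format's specification is published, anyone can write a reader for it without the original vendor's software. As a minimal sketch, the Python function below reads the width and height of a PNG image directly from its bytes, following the layout defined in the PNG specification: an 8-byte signature, then an IHDR chunk whose data begins with two 4-byte big-endian integers. The name `png_dimensions` is hypothetical, and the sketch does not verify the chunk's CRC.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Each chunk starts with a 4-byte length and a 4-byte type; IHDR comes first
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    # IHDR data: width and height as unsigned 32-bit big-endian integers
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Usage: `with open("photo.png", "rb") as f: print(png_dimensions(f.read(24)))`. Only the first 24 bytes are needed, and no image software is involved.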

Photo by valérie faiola on Unsplash

Please favor open formats because they are more durable and easier to preserve in the long term.

Quality or Size?

In addition to choosing between open and proprietary formats, you must consider file quality, that is, how faithfully the file represents the characteristics of the original item.

In simple terms, higher image quality requires storing more information, for example more pixels at a higher resolution or less aggressive compression. However, the file will then take up more storage space and be less convenient to share with others.
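This trade-off can be estimated with simple arithmetic: before compression, a raster image needs width × height × channels × bits per channel ÷ 8 bytes. The Python sketch below assumes 3 channels (RGB) at 8 bits each. Real TIFF, PNG or JPEG files also contain metadata and are usually compressed, so these figures describe raw pixel data, not the size of a specific file.

```python
def uncompressed_size_bytes(width: int, height: int,
                            channels: int = 3, bits_per_channel: int = 8) -> int:
    """Storage needed for raw pixel data, before any compression or metadata."""
    return width * height * channels * bits_per_channel // 8


for w, h in [(1500, 1000), (3000, 2000), (6000, 4000)]:
    size_mb = uncompressed_size_bytes(w, h) / 1_000_000
    print(f"{w} x {h} px: {size_mb:.1f} MB")
```

Doubling both dimensions quadruples the raw data: the loop prints 4.5 MB, 18.0 MB and 72.0 MB.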

Photo by Kier in Sight on Unsplash

Common file formats recommended for long-term preservation

Because of their inherent usage restrictions, proprietary formats are not durable and are therefore not recommended for long-term data preservation.

When data analysis is complete and the data must be prepared for long-term storage, consider converting it. Using open, standard, interchangeable, and durable formats helps keep the data usable over the long term. The same practice is recommended for backups (UK Data Service, 2021).

The following table lists the recommended file formats for data sharing, reuse, and preservation:

| File type | Recommended formats |
|---|---|
| Text | XML, ASCII, TXT, PDF |
| Images | TIFF, JPEG2000, PNG, JPEG/JFIF |
| Video | MOV, MPEG-2 |
| Audio | PCM, WAVE, DSD |
| Dataset | CSV, TSV, .db, .sqlite, Shapefile |
| Web data | JSON, XML, HTML |
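To check a project folder against this table, you can compare each file's extension with the extensions usually associated with the recommended formats. The mapping in this Python sketch is an assumption made for illustration (for example, .tif/.tiff for TIFF, .jp2 for JPEG2000, .dsf/.dff for DSD). A Shapefile is really a set of files (.shp, .shx, .dbf and others), and PCM audio is typically stored inside a WAVE file. `flag_non_recommended` is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping from the table's recommended formats to common extensions
RECOMMENDED_EXTENSIONS = {
    "Text": {".xml", ".txt", ".pdf"},
    "Images": {".tif", ".tiff", ".jp2", ".png", ".jpg", ".jpeg"},
    "Video": {".mov", ".mpg", ".mpeg"},
    "Audio": {".wav", ".dsf", ".dff"},
    "Dataset": {".csv", ".tsv", ".db", ".sqlite", ".shp", ".shx", ".dbf"},
    "Web data": {".json", ".xml", ".html", ".htm"},
}
ALL_RECOMMENDED = set().union(*RECOMMENDED_EXTENSIONS.values())


def flag_non_recommended(folder: str) -> list[str]:
    """List files under `folder` (recursively) whose extension is not recommended."""
    return sorted(
        str(path.relative_to(folder))
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() not in ALL_RECOMMENDED
    )
```

Files it flags, such as .xlsx or .docx, are candidates for conversion before deposit; a flagged file is not necessarily unusable, only outside this list.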

For further guidance on recommended file formats, please refer to the Resources section of this guide.

Exercise #2

Practical cases of long-term data preservation:

Access this dataset:

Koralesky, Katherine; Sirovica, Lara; Hendricks, Jillian; Moulins, Katelyn; von Keyserlingk, Marina; Weary, Daniel, 2022, "Social Acceptance of Genetic Engineering Technology", https://doi.org/10.5683/SP3/NX3LZ9, Borealis, V2

  • Download an Excel file (.xlsx) and convert it to CSV.
  • Download a Word file (.docx) and convert it to PDF or .txt.
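The usual way to convert is to open the file in its original software and use Save As (for example, choosing CSV in Excel). The conversion can also be scripted: an .xlsx file is a ZIP archive of XML files whose structure is documented in the ECMA-376 standard. The Python sketch below uses only the standard library and assumes the worksheet to export is stored as xl/worksheets/sheet1.xml, which is common but not guaranteed. It exports the last saved value of each formula and leaves dates as Excel serial numbers. `xlsx_to_csv` is a hypothetical helper; a maintained library such as openpyxl handles far more cases (dates, multiple sheets).

```python
import csv
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used inside .xlsx files (ECMA-376)
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T_TAG = "{%s}t" % NS["m"]


def column_index(cell_ref: str) -> int:
    """Convert a cell reference such as 'C7' to a zero-based column index (2)."""
    index = 0
    for ch in re.match(r"[A-Z]+", cell_ref).group():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def xlsx_to_csv(xlsx_path: str, csv_path: str,
                sheet_member: str = "xl/worksheets/sheet1.xml") -> None:
    """Write the stored cell values of one worksheet to a UTF-8 CSV file."""
    with zipfile.ZipFile(xlsx_path) as zf:
        shared = []  # text cells usually point into this shared-strings table
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # A shared string may be split into several formatted runs
                shared.append("".join(t.text or "" for t in si.iter(T_TAG)))
        sheet = ET.fromstring(zf.read(sheet_member))

    rows = []
    for row in sheet.iterfind("m:sheetData/m:row", NS):
        # Empty rows are not stored, so pad up to this row's 1-based number "r"
        while row.get("r") and len(rows) < int(row.get("r")) - 1:
            rows.append([])
        values, next_col = {}, 0
        for cell in row.findall("m:c", NS):
            kind = cell.get("t")  # "s" = shared string, "inlineStr", or a value type
            if kind == "inlineStr":
                text = "".join(t.text or "" for t in cell.iter(T_TAG))
            else:
                v = cell.find("m:v", NS)
                text = (v.text or "") if v is not None else ""
                if kind == "s" and text:
                    text = shared[int(text)]
            ref = cell.get("r")  # a reference such as "B2" is optional in the format
            col = column_index(ref) if ref else next_col
            values[col] = text
            next_col = col + 1
        width = max(values) + 1 if values else 0
        rows.append([values.get(i, "") for i in range(width)])

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

For example, `xlsx_to_csv("data.xlsx", "data.csv")` writes the values of the sheet stored as sheet1.xml. Compare the CSV with the original workbook before relying on it.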

Photo by Compare Fibre on Unsplash

Well done!

You now know which file formats are appropriate for data preservation, so your research data can be stored over the long term!

Photo by Vasily Koloda on Unsplash