The file format is the structure of a file that tells a program how to display its content. For example, a Microsoft Word document saved in the .DOC file format displays best in Microsoft Word. Even though another program may be able to open the file, it may not have all the necessary features to properly interpret the document (Computer Hope, 2022).
Each type of file (text, image, or sound) has multiple file formats. The format is indicated by the file name extension.
Illustration by Alfan Subekti found on Vecteezy
Here are some of the most commonly used formats:
A dance club in Châteauguay has kept its documents since the early 2000s. Recently, a club member wanted to refer back to an agenda from July 5, 2003 (please refer to the WordPerfect file below) but was unable to open it.
Can you download this document and attempt to recover it on your computer?
Photo by David Pupăză found on Unsplash
Finding: This is an old proprietary file that can no longer be opened. This inaccessibility could also occur with other proprietary formats such as Microsoft Word or Google Docs.
Proprietary formats usually are limited by:
You must use specific software provided by one vendor to use the proprietary format.
Photo by Roth Melinda on Unsplash
In some cases, an industry may treat specific file formats as a de facto standard even if the formats are proprietary and rely on expensive software.
Open formats:
Photo by valérie faiola on Unsplash
Please favor open formats because they are more durable and easier to preserve in the long term.
In addition to choosing between open and proprietary formats, consider file quality, that is, how faithfully the file represents the characteristics of the original item. Quality is a large part of the file format decision.
In simple terms, higher image quality means the file has to encode more information, for example a higher resolution. As a result, the file takes up more storage space and is less convenient to share with others.
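The trade-off between quality and size can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes an uncompressed bitmap at 24 bits per pixel, and the function name `uncompressed_image_bytes` is hypothetical. Formats such as PNG or JPEG compress the data, so actual files are usually smaller.

```python
def uncompressed_image_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Return the raw (uncompressed) size in bytes of a bitmap image."""
    # Each pixel needs bits_per_pixel bits; 8 bits make one byte.
    return width_px * height_px * bits_per_pixel // 8


# Two common resolutions, both assumed to use 24 bits per pixel.
full_hd = uncompressed_image_bytes(1920, 1080)
ultra_hd = uncompressed_image_bytes(3840, 2160)

print(f"1920 x 1080: {full_hd / 1_000_000:.1f} MB")
print(f"3840 x 2160: {ultra_hd / 1_000_000:.1f} MB")
print(f"Size ratio: {ultra_hd / full_hd:.0f}x")
```

Doubling both the width and the height quadruples the amount of raw data, which is why higher resolutions quickly become costly to store and share.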
Photo by Kier in Sight on Unsplash
Proprietary formats are not durable due to inherent usage restrictions. Therefore, it is not recommended to use them for long-term data preservation.
When data analysis is complete and data needs to be prepared for long-term storage, data conversion should be considered. Using open, standard, interchangeable, and durable formats ensures ease of long-term data usability. This practice is also recommended for backups (UK Data Service, 2021).
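As a minimal sketch of such a conversion, the example below writes tabular data to CSV using only the Python standard library. It assumes the data is already loaded in memory as a list of dictionaries. The function name `rows_to_csv` and the sample values are hypothetical and used for illustration only. Reading data out of a proprietary format generally requires the original software or a dedicated library, which is not shown here.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps line endings predictable across platforms.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for the results of an analysis.
sample = [
    {"id": 1, "answer": "yes"},
    {"id": 2, "answer": "no"},
]

csv_text = rows_to_csv(sample, ["id", "answer"])
print(csv_text)
```

The resulting text can then be saved with an explicit encoding such as UTF-8, so that the file remains readable without any particular software.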
The following table lists the recommended file formats for data sharing, reuse, and preservation:
File type | Recommended formats |
Text | XML, ASCII, TXT, PDF |
Images | TIFF, JPEG2000, PNG, JPEG/JFIF |
Video | MOV, MPEG-2 |
Audio | PCM, WAVE, DSD |
Dataset | CSV, TSV, .db, .sqlite, Shapefile |
Web Data | JSON, XML, HTML |
For further guidance on recommended file formats, please refer to the Resources section of this guide.
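The table above can be turned into a quick check on a file before it is deposited. The following sketch is an assumption-laden illustration rather than an official tool. The table lists format names, so the mapping from formats to file name extensions (for example, TIFF to `.tif` and `.tiff`) is an assumption made here, and formats without a single common extension are left out. The names `RECOMMENDED` and `recommended_types` are hypothetical.

```python
from pathlib import Path

# Assumed mapping from the table's format names to common extensions.
RECOMMENDED = {
    "Text": {".xml", ".txt", ".pdf"},
    "Images": {".tif", ".tiff", ".jp2", ".png", ".jpg", ".jpeg"},
    "Video": {".mov"},
    "Audio": {".wav"},
    "Dataset": {".csv", ".tsv", ".db", ".sqlite", ".shp"},
    "Web Data": {".json", ".xml", ".html"},
}


def recommended_types(filename):
    """Return the file types for which this extension is a recommended format."""
    extension = Path(filename).suffix.lower()
    return [file_type for file_type, extensions in RECOMMENDED.items()
            if extension in extensions]


for name in ["survey.CSV", "page.xml", "agenda.wpd"]:
    matches = recommended_types(name)
    print(name, "->", matches if matches else "not in the recommended list")
```

An extension is only a naming hint, so a check like this does not prove that the content of the file actually follows the format.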
Practical cases of long-term data preservation:
Access this dataset:
Koralesky, Katherine; Sirovica, Lara; Hendricks, Jillian; Moulins, Katelyn; von Keyserlingk, Marina; Weary, Daniel, 2022, "Social Acceptance of Genetic Engineering Technology", https://doi.org/10.5683/SP3/NX3LZ9, Borealis, V2
Photo by Compare Fibre on Unsplash
Well done! You now know which file formats are appropriate for data preservation so that your research data can be stored in the long term!
Photo by Vasily Koloda on Unsplash
This page is an adaptation of the guide Introduction to Research Data Management : File Formats for Data Curation by UBC Library Research Commons licensed under a Creative Commons Attribution 4.0 International licence (CC BY). Research Data Management : File format is licensed under a Creative Commons Attribution 4.0 International licence (CC BY) by ÉTS Library.
Computer Hope (2022). File format. Retrieved from https://www.computerhope.com/jargon/f/file-format.htm
UK Data Service (2021). File formats. Retrieved from https://ukdataservice.ac.uk/learning-hub/research-data-management/format-your-data/file-formats/