For your data to be usable by you, your colleagues, and future researchers, it must be documented. Data documentation (also known as metadata) describes the content, formats, and internal relationships of your data in detail, and enables other researchers to find, use, and properly cite your data.

It is critical to start documenting your data at the very beginning of the research project, before data collection begins. Doing so will make documentation easier and reduce the likelihood that aspects of the data will be forgotten later in the research project.

What to Document

Research Project Documentation

  • Context of data collection
  • Data collection methods
  • Structure and organization of data files
  • Data sources used
  • Data validation, quality assurance
  • Transformations of data from the raw data through analysis
  • Information on confidentiality, access, and use conditions

Data Documentation

  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data
  • File format and software (including version) used
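
The variable-level items above are often collected in a data dictionary. The following is a minimal sketch in Python, assuming a hypothetical survey dataset; the variable names, codes, column layout, and the function name data_dictionary_csv are illustrative assumptions, not part of any metadata standard.

```python
import csv
import io

# Hypothetical variables, for illustration only.
DATA_DICTIONARY = [
    {
        "variable": "respondent_id",
        "description": "Unique identifier assigned to each respondent",
        "type": "integer",
    },
    {
        "variable": "age",
        "description": "Age at time of interview",
        "type": "integer",
        "units": "years",
        "codes": "-9 = not reported",
    },
    {
        "variable": "employment_status",
        "description": "Self-reported employment status",
        "type": "categorical",
        "codes": "1 = employed; 2 = unemployed; 3 = not in labor force",
    },
]

# Assumed column layout: one row per variable.
FIELDNAMES = ["variable", "description", "type", "units", "codes"]


def data_dictionary_csv(entries):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        # Fields missing from an entry are written as empty cells.
        writer.writerow({name: entry.get(name, "") for name in FIELDNAMES})
    return buffer.getvalue()


if __name__ == "__main__":
    print(data_dictionary_csv(DATA_DICTIONARY), end="")
```

Keeping the dictionary in a plain CSV file means it can be opened without special software and stored next to the data files it describes.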

Researchers can choose among various metadata standards, many of them tailored to a particular discipline or file format. The Digital Curation Centre (DCC) maintains a directory of discipline-specific metadata standards.

Data centers and subject-specific repositories may require specific metadata before they accept a deposit. Check the requirements of any repository you intend to use before you begin outlining the metadata plan for your data. At minimum, store documentation in a “readme.txt” file.
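
As one possible starting point, the sketch below builds a plain-text readme from a set of headings drawn from the project-level items listed earlier. The headings, the sample text, the placeholder wording, and the function names build_readme and write_readme are assumptions made for illustration; a repository may prescribe its own template.

```python
from pathlib import Path

# Hypothetical project details, for illustration only.
# Empty values are left as placeholders to fill in later.
PROJECT_FIELDS = {
    "Context of data collection": "Pilot survey of commuting habits, spring term.",
    "Data collection methods": "Online questionnaire, self-administered.",
    "Structure and organization of data files": "",
    "Data validation and quality assurance": "",
    "Confidentiality, access, and use conditions": "",
}


def build_readme(fields):
    """Return readme text: each heading in capitals, then its description."""
    lines = []
    for heading, text in fields.items():
        lines.append(heading.upper())
        lines.append(text if text else "[to be completed]")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_readme(directory, fields):
    """Write readme.txt into the given directory and return its path."""
    path = Path(directory) / "readme.txt"
    path.write_text(build_readme(fields), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(build_readme(PROJECT_FIELDS))
```

Creating the file at the start of the project, with placeholders for sections not yet known, makes it easier to fill in details as the work proceeds.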