If a dataset is to be archived, it must be organised in such a way that other people can read it. If you have been following the data management recommendations, the decisions required at this point will already have been made, implemented, monitored and documented.
Final Data File Preparation
Ideally, the dataset should be accessible using a standard statistical package, such as SAS, SPSS, or Stata. There are three broad choices: (a) provide the data in raw (ASCII) form along with documentation and let users prepare their own programs; (b) store the data in ASCII form, along with setup files to read them into a standard program; or (c) store the data as a "system file" in an analysis package. Each alternative has advantages and disadvantages, which are explained in greater detail in, for example, the ICPSR Guide (pp. 25-26).
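As a rough illustration of option (a), a user-written program for raw ASCII data might look like the minimal Python sketch below. The column layout, variable names and sample records are hypothetical; in practice they would be taken from the documentation supplied with the dataset.

```python
# Hypothetical fixed-width layout as it might appear in a codebook:
# variable name -> (start column, end column), 0-based, end exclusive.
LAYOUT = {
    "case_id": (0, 4),
    "sex": (4, 5),
    "age": (5, 8),
}


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width ASCII record into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def read_ascii(lines, layout=LAYOUT):
    """Parse every non-blank line of a raw data file."""
    return [parse_record(line.rstrip("\n"), layout) for line in lines if line.strip()]


# Made-up sample records standing in for the contents of a raw data file.
sample = ["00011 34\n", "00022 51\n"]
records = read_ascii(sample)
```

Every user has to write and verify a reader of this kind when only raw data and documentation are supplied, which is the effort that ready-made setup files or a system file are meant to save.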
In general, data are ready for deposit when the following conditions are met:
- The dataset is technically suitable, and the documentation is sufficient to allow archiving and secondary use.
- Copyright has been cleared.
- There are no other legal impediments to archiving.
- The original purpose of data collection does not prevent archiving.
Stages of Deposit
The stages of deposit common to most archives are outlined here.
- The depositor completes a data submission form and returns it to the archive (see the ESDS's data review form for an example).
- The archive carries out an "acquisitions review" to ensure that the data are complete and ready for deposit. The aim of these preliminary checks is a dataset in which every case is complete, meaningfully identified and fully consistent with the coding frame. The archive might check the following:
  - Completeness of primary documents: Is the questionnaire present? Is the coding frame complete? Is the documentary material sufficient for a methodological description of the study?
  - Checking of the storage medium: Is the storage medium readable and virus-checked?
  - Checking of the data: Has the correct dataset been deposited? Is the number of cases correct? Are the questionnaire, coding frame and data consistent? Are there any undefined (wild) codes or duplicate cases?
- If the data are accepted, the depositor then completes a set of forms to return with the dataset. Typically, these will include:
  - A data collection deposit form (or, for example, a dataset description form for the Finnish Data Archive), which collects the information needed to construct the catalogue record (allowing discovery and proper citation) and for internal administrative purposes.
  - A licence agreement, which specifies the rights and responsibilities of both parties and authorises the archive to preserve and distribute the data collection under the terms and conditions specified (see the Intellectual Property Rights section for more details).
Each archive has its own access categories, one of which is assigned to each dataset. These range from open access to access to parts of the data only, and then only under strictly monitored conditions. These are also described in the Rights & Confidentiality section.
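The data checks in the acquisitions review above (number of cases, wild codes, duplicate cases) lend themselves to automation. The following Python sketch shows one possible approach and is not any archive's actual procedure; the coding frame, variable names and records are invented for illustration.

```python
from collections import Counter

# Hypothetical coding frame: the set of valid codes for each variable.
CODING_FRAME = {
    "sex": {"1", "2", "9"},
    "region": {"1", "2", "3", "4"},
}


def check_dataset(records, coding_frame, expected_cases, id_field="case_id"):
    """Report case-count mismatches, wild codes and duplicate case IDs."""
    wild = []
    for rec in records:
        for var, valid in coding_frame.items():
            # A value outside the coding frame (or a missing one) is a wild code.
            if rec.get(var) not in valid:
                wild.append((rec.get(id_field), var, rec.get(var)))
    counts = Counter(rec.get(id_field) for rec in records)
    duplicates = [cid for cid, n in counts.items() if n > 1]
    return {
        "case_count_ok": len(records) == expected_cases,
        "wild_codes": wild,
        "duplicate_ids": duplicates,
    }


# Made-up records: one wild code and one repeated case ID.
records = [
    {"case_id": "0001", "sex": "1", "region": "2"},
    {"case_id": "0002", "sex": "7", "region": "4"},  # "7" is not in the frame
    {"case_id": "0002", "sex": "2", "region": "1"},  # case ID appears twice
]
report = check_dataset(records, CODING_FRAME, expected_cases=3)
```

A report of this kind only flags candidates for follow-up; deciding whether a flagged value is a genuine error still requires the questionnaire and coding frame.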