Processing
Once the data have been formally accepted by the archive and the full set of data materials has been received from the depositor, the archive begins its processing, or ingest, activities:
1. Checking the Integrity of the Data and Metadata
- A series of checks is made to ensure that the data are what they purport to be and make sense. Any errors or queries uncovered are usually referred back to the producer; the archive will rarely, if ever, "correct" the data itself.
- Further checks are made to ensure that the data are well-documented so that the potential future user can understand what the data represent and how they were collected and constructed.
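As an illustration of the kind of check described above, the following minimal sketch compares data values against a documented codebook and collects queries for the producer instead of changing anything. The function name, the codebook layout and the sample values are all assumptions made for this example; they are not taken from any archive's actual tooling.

```python
# Hypothetical codebook: variable name -> set of documented codes.
def check_against_codebook(rows, codebook):
    """Return a list of queries for the producer; the data are never modified."""
    queries = []
    for i, row in enumerate(rows, start=1):
        for name, value in row.items():
            if name not in codebook:
                # The variable appears in the data but not in the documentation.
                queries.append(f"row {i}: variable '{name}' is not documented")
            elif value not in codebook[name]:
                # The value is not one of the documented codes.
                queries.append(f"row {i}: '{name}' has undocumented value {value!r}")
    return queries


# Made-up sample data, for illustration only.
codebook = {"sex": {1, 2}, "region": {1, 2, 3}}
rows = [
    {"sex": 1, "region": 2},
    {"sex": 3, "region": 2},            # 3 is not a documented code for 'sex'
    {"sex": 2, "region": 1, "inc": 500},  # 'inc' is missing from the codebook
]
queries = check_against_codebook(rows, codebook)
for q in queries:
    print(q)
```

The sketch only reports discrepancies, which mirrors the practice of referring problems back to the producer.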
2. Disclosure Control Checking
- Checks for confidentiality and potential disclosure risk are made. A first set of checks looks for obvious disclosure issues (such as the inclusion of names or individual contact details); any that are found are referred back to the depositor for correction or resolution.
- Checks are also made for clusters of variables that, in combination, might compromise confidentiality; these are again referred back to the depositor.
- Any other legal issues are also considered.
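The two disclosure checks described above could be sketched roughly as follows. This is a simplified illustration, not an archive's actual procedure: the list of name hints, the choice of variables to combine, and the threshold are all assumptions, and a real review would rely on human judgement as well.

```python
from collections import Counter

# Illustrative hints only; a real check would use a fuller, reviewed list.
DIRECT_IDENTIFIER_HINTS = ("name", "email", "phone", "address")


def flag_direct_identifiers(variable_names):
    """Return variable names that look like they hold direct identifiers."""
    return [
        v for v in variable_names
        if any(hint in v.lower() for hint in DIRECT_IDENTIFIER_HINTS)
    ]


def rare_combinations(rows, variables, threshold):
    """Return value combinations shared by fewer than `threshold` records."""
    counts = Counter(tuple(row[v] for v in variables) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up sample data, for illustration only.
variables = ["id", "Surname", "age", "email_addr"]
rows = [
    {"age": 34, "region": "N"},
    {"age": 34, "region": "N"},
    {"age": 71, "region": "S"},  # the only record with this combination
]
flagged = flag_direct_identifiers(variables)
rare = rare_combinations(rows, ["age", "region"], threshold=2)
print(flagged)
print(rare)
```

Both functions only produce a list of items to raise with the depositor; neither alters the data.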
3. Production of a Catalogue Record
- The archive will, in most cases, produce an online catalogue record of the data collection so that users can locate and order the materials. The catalogue record describes the content of the data collection together with details of its provenance. For many CESSDA archives the record is based on the Data Documentation Initiative (DDI) metadata standard, an XML-based standard used to capture and store catalogue records. The DDI allows metadata to be exchanged with other archives through a common cross-institutional catalogue such as the CESSDA web catalogue. It includes all the key Dublin Core fields and can be mapped to ISAD(G).
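To give a feel for what an XML-based catalogue record looks like, the sketch below builds a small record from a few Dublin Core fields using Python's standard library. It is deliberately not a DDI document: the `record` wrapper element, the function name and the field values are invented for this example, and a real DDI record has a much richer structure.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a minimal XML record from a mapping of Dublin Core field -> value."""
    root = ET.Element("record")  # hypothetical wrapper element, not part of DDI
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# Made-up values, for illustration only.
record = build_record({
    "title": "Example Survey, 2001",
    "creator": "Example Research Unit",
    "description": "Illustrative description of content and provenance.",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the fields are stored as namespaced XML elements, another system that understands Dublin Core can read them without knowing anything about the archive that produced the record.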
4. Conversion of Data and Metadata to Dissemination Formats
- Once all checks have been satisfactorily completed and any problems resolved, the data materials are converted to a suitable dissemination format that is more user-friendly for both the data and the documentation. This usually means converting the data to the software formats most commonly used by the archive's user community, and converting the metadata to PDF for download from the archive's website and the CESSDA website.
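The conversion step depends on the formats each archive supports, which the text does not specify. As a minimal sketch, assuming delimited text is one of the target formats, the same data could be exported in more than one variant as follows; the function name and sample values are hypothetical.

```python
import csv
import io


def export_delimited(rows, fieldnames, delimiter):
    """Serialise rows as delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data, for illustration only.
rows = [{"id": 1, "sex": 2}, {"id": 2, "sex": 1}]
tab_text = export_delimited(rows, ["id", "sex"], "\t")
csv_text = export_delimited(rows, ["id", "sex"], ",")
print(tab_text)
print(csv_text)
```

Conversion to statistical package formats would need dedicated software or libraries and is outside the scope of this sketch.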