Data Deposit Information and Forms
The following guidelines aim to set out the minimum standards required for the deposit of data at the South African
When depositing data with SADA, the depositor must provide the following:
- A depositor's form, providing information about the principal investigator and depositor, the study description, and technical issues and specifications.
- A copy of the codebook or code lists.
- Copies of the data files in a machine-readable format (e.g. ASCII or SPSS), and the data definition files detailing the characteristics of records and variables assigned.
- A copy of the data collection instrument(s) (e.g. questionnaire and interviewer manual, if available).
Ownership and Copyright
It is important to note that the depositor always retains full copyright and ownership of the data. The data do not
in any way belong to SADA. SADA stores, preserves, administers and controls access to the data. In addition, the data
are only available to other researchers subject to conditions laid down by the depositor. Depositors may provide
unrestricted access to their data through SADA or may specify restrictions, to which SADA adheres.
Depositor's form information
Part 1: Information about the investigator(s) and depositor
The names, contact details and institutional affiliations of the principal investigators, co-investigators and
depositor of the study are required in this section.
Part 2: Study description
The study description should include the following:
- The title of the study.
- The time period covered, that is, the start and end date of the field work period.
- Geographic coverage of the data, for example, national.
- The name(s) of organisation(s) responsible for data collection.
- The details of any funding organisation(s).
- The type of data collection, for example, survey or census.
- The units of observation, for example, individuals, households or groups.
- The number of observations or cases.
- The number of variables.
- The overall response rate.
- Weighting procedures (if applicable).
- Time dimensions, for example, cross sectional, longitudinal, panel or trend.
- A brief study description, listing the objectives of the study.
- The original language(s) employed in the study.
- The data collection methods used, for example, face-to-face interviews or telephone surveys.
- The type of questionnaire used (if applicable), for example, open-ended or structured.
- The sampling method, for example, random, cluster or quota sampling.
- A list of both published and unpublished papers or reports.
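As an illustration, the checklist above could be kept as a simple record and checked for gaps before the form is submitted. This is a minimal sketch only: the field names, the choice of which items to treat as required, and the example values are all assumptions made for illustration and are not part of the SADA form.

```python
# Hypothetical field names paraphrasing the study description checklist.
# Items marked "if applicable" in the checklist are left out of this list.
REQUIRED_FIELDS = [
    "title",
    "fieldwork_start",
    "fieldwork_end",
    "geographic_coverage",
    "data_collectors",
    "collection_type",
    "units_of_observation",
    "number_of_cases",
    "number_of_variables",
    "response_rate",
    "time_dimension",
    "description",
    "languages",
    "collection_methods",
    "sampling_method",
]


def missing_fields(study: dict) -> list:
    """Return the required fields that are absent or empty in a study description."""
    return [name for name in REQUIRED_FIELDS if study.get(name) in (None, "", [])]


# Invented example values, for illustration only.
example_study = {
    "title": "Example Household Survey",
    "fieldwork_start": "2001-03-01",
    "fieldwork_end": "2001-04-30",
    "geographic_coverage": "national",
    "data_collectors": ["Example Research Organisation"],
    "collection_type": "survey",
    "units_of_observation": "households",
    "number_of_cases": 2500,
    "number_of_variables": 180,
    "time_dimension": "cross sectional",
    "description": "Objectives of the study would be listed here.",
    "languages": ["English"],
    "collection_methods": ["face-to-face interviews"],
}

print(missing_fields(example_study))
```

Here the check reports the two items that were left out of the example record, the response rate and the sampling method.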
Part 3: Technical issues and specifications
This section should include:
- The number of data files included.
- Details on whether the data files are compressed or uncompressed, zipped or unzipped and if they can be merged.
- Storage media, such as disks and CDs, should be clearly labelled, ensuring that the external labels and filenames
- Variable names and value labels, where possible.
- Information on derived variables and other recoding. The depositor should specify the following:
  - The source variables.
  - The question numbers and names to which the original variables relate.
  - The new variable label and value labels.
  - If a derived variable is created for only part of a sample, this should be made clear.
- Procedures taken to correct errors. The documentation accompanying a dataset should describe how errors were
  checked for and corrected. If different versions of the data were produced, each must be adequately described.
- A brief description of all data and documentation files.
Please complete the electronic depositor's form
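The derived-variable details requested in Part 3 can be recorded in a small, uniform structure so that every recode is documented in the same way. The sketch below is one possible layout, not a SADA requirement; the class, its field names and the AGEGRP example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DerivedVariable:
    name: str                 # new variable name
    label: str                # new variable label
    value_labels: dict        # code -> description for the new variable
    source_variables: list    # variables the new one was computed from
    source_questions: list    # question numbers/names behind the sources
    applies_to: str = "all cases"  # say so here if only part of the sample is covered

    def describe(self) -> str:
        """Render a plain-text documentation entry for this variable."""
        lines = [
            f"{self.name}: {self.label}",
            f"  Source variables: {', '.join(self.source_variables)}",
            f"  Source questions: {', '.join(self.source_questions)}",
            f"  Applies to: {self.applies_to}",
            "  Values:",
        ]
        for code, text in sorted(self.value_labels.items()):
            lines.append(f"    {code} = {text}")
        return "\n".join(lines)


# Invented example: an age-group recode of a hypothetical AGE variable.
age_group = DerivedVariable(
    name="AGEGRP",
    label="Age group of respondent",
    value_labels={1: "18-34", 2: "35-59", 3: "60 and older", 9: "Missing"},
    source_variables=["AGE"],
    source_questions=["Q2 (age at last birthday)"],
    applies_to="respondents aged 18 and older",
)

print(age_group.describe())
```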
Codebook or codelist information
A codebook can be defined as documentation for a study which includes the complete technical description of each
question or variable, as well as the actual location of the question in the data record.
The following information must be included in the codebook:
Identifying names or numbers of variables: These are included when the data are prepared with certain software
systems (for example, SAS, SPSS or Excel). A variable name is an abbreviation or summary for each question.
Variable numbers are usually assigned sequentially to each question.
Location of variables: Each variable must have a data location. If a card format is used, card and column numbers
must be assigned. If a logical record format is used, only column numbers are given.
Questionnaire text: The complete text of each question should be recorded. The use of abbreviated names may
cause confusion since the name may not adequately convey what was asked of the respondent.
Explanatory text: The coding conventions employed and any interviewer instructions should be included with the
codebook. Information contained on flash or show cards should also be part of the explanatory text.
Code categories: All coded fields of information, together with a description of each coded value, must be recorded.
If abbreviations or other standardisations are systematically used, they should be defined in the codebook.
Wild codes should be documented as wild codes.
Missing data: Missing data values for each variable should be defined clearly. If certain questions are applicable
only to a subset of the population, that subset needs to be described in appropriate text or code description.
There are two types of missing data:
Item non-response: The documentation should outline the reasons why specific items are missing; note whether
specific conventions, such as blanks, were used for "not applicable"; detail whether new values were estimated
for missing codes and, if so, how they were calculated; and state whether estimated values were flagged for
identification purposes.
Case non-response: The documentation should state whether reasons were recorded for missing cases; whether cases
with only partial information were retained; whether weights were used to compensate for non-response or sample
design, and what types of weights were used; and, if weights were used, whether weighted data can be
distinguished from non-weighted data.
Derived variables: These should be clearly marked and documented.
Confidentiality procedures: It is essential that the confidentiality and anonymity of data subjects be maintained.
This may be difficult, especially in the case when geographic identifiers can be used to breach confidentiality.
Thus, a full statement of confidentiality procedures (e.g. excluding explicit references to persons, households
or institutions) should be included.
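To make the codebook requirements concrete, the sketch below shows how variable names, column locations, question text, code categories and missing-data codes might be held in machine-readable form, and how undefined (wild) codes could then be found in a fixed-width ASCII file with a logical record format. The variables, codes and records are invented for illustration, and the dictionary layout is an assumption rather than a format prescribed by SADA.

```python
# Hypothetical codebook: one entry per variable.
CODEBOOK = {
    "SEX": {
        "question": "Q1. What is your sex?",
        "columns": (1, 1),  # start and end column, 1-based and inclusive
        "codes": {"1": "Male", "2": "Female"},
        "missing": {"9": "No answer"},
    },
    "AGEGRP": {
        "question": "Derived from Q2. Age group of respondent",
        "columns": (2, 2),
        "codes": {"1": "18-34", "2": "35-59", "3": "60 and older"},
        "missing": {"9": "No answer"},
    },
}


def read_record(line, codebook):
    """Slice one fixed-width record into raw string values by column location."""
    values = {}
    for name, spec in codebook.items():
        start, end = spec["columns"]
        values[name] = line[start - 1:end]
    return values


def wild_codes(records, codebook):
    """Return (record number, variable, value) for values the codebook does not define."""
    found = []
    for number, line in enumerate(records, start=1):
        for name, value in read_record(line, codebook).items():
            spec = codebook[name]
            if value not in spec["codes"] and value not in spec["missing"]:
                found.append((number, name, value))
    return found


# Three invented records: the third carries an undefined code for SEX.
records = ["12", "29", "73"]
print(wild_codes(records, CODEBOOK))
```

Keeping missing-data codes separate from substantive codes, as in this layout, lets the same check distinguish a documented "No answer" from a value that should be reported as a wild code.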
Information on data files
Data should be adequately documented. The ideal format of the data is one in which the data are written in a
standard format (e.g. ASCII or SPSS), and accompanied by a data definition file for the software used. SADA accepts
data in ASCII, SPSS, SAS or Excel formats.
A separate code should be assigned to missing data, and this should always be made clear in the accompanying
documentation. Blanks, or assigning zero as a code, should be avoided as far as possible, as this often causes