UC DATA is primarily an archive for machine-readable datasets. By and large, we do not archive the results or findings of a study, but rather its raw data. This data is stored as strings of numbers or characters in files on tape, cartridge, or CD. As a result, in addition to identifying which study or dataset may help answer your questions, you will need to be prepared to locate and access the data files, work out what the numbers in each position of a file represent, and analyze them using a statistical package. To use the data archive effectively, therefore, you must know how to find the right dataset, how to get the access information for it, and how to get from raw data to analysis.

Finding the right dataset

The UC DATA holdings that can be searched include both current holdings (the datasets we already have) and datasets we can obtain from the Inter-university Consortium for Political and Social Research (ICPSR) or other sources. ICPSR holdings may be searched by study number, title keyword, or study description, or browsed as an alphabetical title list. If you are casting a broad net for possible data sources, searching by study description or title keyword may be most appropriate. If you think a specific dataset may have what you want, you may wish to search by title or study number and then examine the study description. Studies distributed through ICPSR that are not part of our current holdings may be ordered, but they may take a week to 10 days to become available. These studies may be used only by UC Berkeley students, faculty, and staff. This and other collection-specific search engines are organized with our on-line data holdings under the "Data" link in the main menu to your left. Contact the data archive for further information. In addition to searching for data at UC DATA, you may also want to search other social science Web sites; a good collection of these can be found under "Links" in the main menu to your left.

Getting the access information for your dataset

Our holdings reside on a number of different media, including 9-track tape, CD-ROM, 3480 tape cartridge, and the campus UNIX system. Data on the UNIX system can be retrieved with simple copy commands by anyone with a Socrates account on this campus, and by anonymous FTP (from the command line or a browser) by anyone logged on from within the berkeley.edu domain. Data on tapes and cartridges require that you contact the archive so that we can read them into our on-line collection. This may appreciably lengthen delivery time, because the campus CMS system is currently being dismantled owing to its age. Some of these tapes and cartridges are physically located at Evans Hall and can be mounted more easily than others; the rest are located at the Survey Research Center and must be sent to Evans Hall before use. Eventually (in 1999), all access to the CMS collection will be discontinued. If you know you will need data that is currently part of our CMS collection, ask us to make it available as soon as possible.

The information needed to mount tapes and access the files on them includes the slot ID, internal tape ID, file number, record length, and block size. You may e-mail the archive staff (archive@ucdata.berkeley.edu) to request tape access information; include the UC DATA study number and study title in your request.
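The record length and block size matter because, in the fixed-block (FB) format common on IBM mainframe tapes, each physical block holds a whole number of fixed-length logical records. The minimal Python sketch below shows how logical records can be recovered from such data. It assumes that the file was copied byte-for-byte from the tape, that the block size is a multiple of the record length, and that the text is EBCDIC in code page 037; that code page is a common choice for IBM tapes but is only an assumption here, so check the study documentation. The function name `deblock` and the sample values are hypothetical.

```python
def deblock(raw: bytes, lrecl: int, blksize: int, encoding: str = "cp037") -> list[str]:
    """Split fixed-block (FB) tape data into fixed-length logical records.

    Assumes every block is blksize bytes long (except possibly a short
    final block) and that blksize is a multiple of lrecl. The default
    EBCDIC code page (cp037) is an assumption; pass another codec if needed.
    """
    if lrecl <= 0 or blksize % lrecl != 0:
        raise ValueError("block size must be a positive multiple of record length")
    records = []
    # Walk the data one physical block at a time.
    for start in range(0, len(raw), blksize):
        block = raw[start:start + blksize]
        if len(block) % lrecl != 0:
            raise ValueError("final block does not hold a whole number of records")
        # Each full block holds blksize // lrecl records.
        for offset in range(0, len(block), lrecl):
            records.append(block[offset:offset + lrecl].decode(encoding))
    return records


if __name__ == "__main__":
    # Hypothetical data: two 80-byte records packed into one 160-byte block.
    data = ("A" * 80 + "B" * 80).encode("cp037")
    recs = deblock(data, lrecl=80, blksize=160)
    print(len(recs), recs[1][:3])  # prints: 2 BBB
```

Because the block size is a multiple of the record length, this is equivalent to cutting the data every `lrecl` bytes; walking block by block simply makes the role of each tape parameter explicit.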

Although we do not provide consulting on the utilities used to mount tapes, some examples of IBATCH jobs that mount tapes and cartridges are available.

From Raw Data to Analysis

The way data is organized within a file can vary widely. All of the data for a single observation may be on one record or, as is common with older datasets, spread across several 80-character records. All of the records in a file may be of the same type, with each variable in the same column locations on every record, or record types may be mixed within a file, as is the case with the Census Public Use Microdata Sample. To discover how the data is organized within a data file, you will need to consult either a codebook or a data dictionary for the study. Many studies have only machine-readable documentation, which you will need to copy to your hard disk or print. We also have printed codebooks for a large number of studies, which you may consult in the archive office at the Survey Research Center, 2538 Channing Way, Berkeley, during our normal operating hours: 9:00 to 12:00 and 1:00 to 5:00, Monday through Thursday, and 9:00 to 12:00 on Fridays.
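To illustrate the two layouts described above (one observation spread over several records, and mixed record types in one file), here is a minimal Python sketch. It assumes that the data file is already on disk as lines of text and that column positions are 1-based and inclusive, as they typically appear in codebooks. Every layout, variable name, and record-type code below is hypothetical; the real values must come from the study's codebook or data dictionary.

```python
# Hypothetical layouts: in a real study, take column positions from the codebook.
# Columns are 1-based and inclusive, e.g. (6, 7) means columns 6 through 7.

def field(record: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive) of one record."""
    return record[first - 1:last]


def read_multicard(lines: list[str], cards_per_case: int,
                   layout: dict[str, tuple[int, int, int]]) -> list[dict[str, str]]:
    """Read cases that are each spread over several records ("cards").

    layout maps variable name -> (card number, first column, last column).
    """
    if len(lines) % cards_per_case != 0:
        raise ValueError("file ends with an incomplete case")
    cases = []
    for i in range(0, len(lines), cards_per_case):
        cards = lines[i:i + cards_per_case]  # all records for one case
        cases.append({name: field(cards[card - 1], first, last).strip()
                      for name, (card, first, last) in layout.items()})
    return cases


def read_mixed(lines: list[str], type_cols: tuple[int, int],
               layouts: dict[str, dict[str, tuple[int, int]]]
               ) -> list[tuple[str, dict[str, str]]]:
    """Read a file that mixes record types (e.g. household and person records).

    type_cols gives the columns holding the record-type code; layouts maps
    each code to {variable name: (first column, last column)} for that type.
    An unknown record-type code raises KeyError.
    """
    parsed = []
    for line in lines:
        rtype = field(line, *type_cols)
        layout = layouts[rtype]
        parsed.append((rtype, {name: field(line, first, last).strip()
                               for name, (first, last) in layout.items()}))
    return parsed


if __name__ == "__main__":
    # Two cards per case: ID in columns 1-4 and age in 6-7 of card 1; sex in column 5 of card 2.
    cards = ["0001 25", "0001M", "0002 61", "0002F"]
    print(read_multicard(cards, 2, {"id": (1, 1, 4), "age": (1, 6, 7), "sex": (2, 5, 5)}))
    # Record-type code in column 1: "H" = household, "P" = person (hypothetical codes).
    mixed = ["H0001 3", "P0001 25", "P0001 61"]
    print(read_mixed(mixed, (1, 1), {"H": {"serial": (2, 5), "persons": (7, 7)},
                                     "P": {"serial": (2, 5), "age": (7, 8)}}))
```

In a hierarchical file, where person records follow their household record, you would normally also carry the household identifier forward onto each person record before analysis; the sketch only separates and decodes the records.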

Many datasets are accompanied by SAS or SPSS setup files that ease the creation of an analysis file; these contain variable names, column locations, variable labels, and value labels. OSIRIS dictionaries, which can be used by SAS and SPSS, may also accompany a dataset. If none of these is available, or if you are using another software package for analysis, you will need to rely on the codebooks, data dictionaries, and the documentation for your analysis software to proceed.
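If you are not using SAS or SPSS, the column locations in a setup file can still be reused. The minimal Python sketch below reads the variable list of an SPSS `DATA LIST ... FIXED` command, in which `/n` starts the variables on record `n`, `NAME 6-7` gives a column range, and `NAME 5` gives a single column. It handles only this simplified subset: it does not cover `TO` variable lists, SAS `INPUT` syntax (such as `$` or `#n`), or formatted input, and the function name `parse_column_specs` and the sample specification are hypothetical.

```python
import re

# Matches a "/2" record marker, or a "NAME 6-7" / "NAME 5" column spec,
# optionally followed by a format in parentheses such as "(A)".
TOKEN = re.compile(
    r"/(\d+)|([A-Za-z][A-Za-z0-9_]*)\s+(\d+)(?:\s*-\s*(\d+))?(?:\s*\([^)]*\))?"
)


def parse_column_specs(spec: str) -> dict[str, tuple[int, int, int]]:
    """Turn a simplified DATA LIST variable list into a layout of
    variable name -> (record number, first column, last column)."""
    layout = {}
    record = 1  # variables belong to record 1 until a "/n" marker appears
    for m in TOKEN.finditer(spec):
        if m.group(1):  # "/n": following variables are on record n
            record = int(m.group(1))
        else:
            first = int(m.group(3))
            last = int(m.group(4)) if m.group(4) else first  # single column
            layout[m.group(2)] = (record, first, last)
    return layout


if __name__ == "__main__":
    spec = "/1 CASEID 1-4 AGE 6-7 /2 SEX 5 (A)"
    print(parse_column_specs(spec))
    # prints: {'CASEID': (1, 1, 4), 'AGE': (1, 6, 7), 'SEX': (2, 5, 5)}
```

The resulting layout still has to be checked against the codebook, since setup files occasionally contain errors or describe a different version of the data file.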

We anticipate adding on-line extract and analysis capability to a subset of our data collection using SDA, beginning in January 1999.

While UC DATA provides statistical consulting only to paying clients, statistical consulting for UC Berkeley students and staff is available from the Statistical Consulting Service, 443 Evans Hall; call the main statistics office at 642-2781 for hours. Some example SPSS and SAS command files are also available.

http://ucdata.berkeley.edu/new_web/newhelp.html
Last revision 12/23/98
Copyright 1995 UC DATA