UC DATA is primarily an
archive for machine-readable datasets. This means that we do not, by and
large, archive the results or findings from a study, but rather the raw
data. This data is stored as a string of numbers or characters in a file
on a tape, cartridge, or CD. As a result, in addition to identifying which
study or dataset may help to answer your questions, you will
also need to be prepared to locate and access the data files, identify
what the numbers in different places in the file represent, and analyze
them using a statistical package. To use the data archive effectively,
therefore, you must know:
- How to find what studies or datasets may be of potential use
- Where to find the tape location and access information for the dataset
- The record layout of the data file
- How to use a statistical package (e.g. SPSS, SAS) to analyze your data
Finding the right dataset
The holdings at UC DATA which
can be searched include both current holdings (the datasets we already
have) and datasets we can obtain from the Inter-university Consortium for Political
and Social Research (ICPSR) or other sources. ICPSR holdings may be searched
by study number, title keyword, study description, or as an alphabetical
title list. If you are casting a broad net for possible data sources, browsing
by study description or title keyword may be most appropriate. If you
think a specific dataset may have what you want, you may wish to search
by title or study number, then examine the study description. Studies distributed
through ICPSR which are not part of our current holdings may be ordered,
but may require a week to 10 days before they become available. These studies
may only be used by UC Berkeley students, faculty and staff.
This and other collection-specific search engines are organized with our
on-line data holdings under the "Data" link to your left along the main
menu. Contact the data archive for further information. In addition
to searching for data at UC DATA, you may also consider searching other
social science Web sites; a good collection of these can be found under
"Links" to your left along the main menu.
Getting the access information for your dataset
Our holdings reside on a number
of different media, including 9-track tape, CD-ROM, and 3480 tape cartridge,
as well as on the campus UNIX system. Data on the UNIX system is accessible
by simple copy commands to anyone with a socrates account on this campus,
and by anonymous ftp (command line and/or browser integrated) to anyone
logged on from within the ~.berkeley.edu domain. For data on tapes and
cartridges, you will need to contact the archive to have it read and placed
in our on-line collection. This may appreciably affect delivery time, as the
campus CMS system is currently being dismantled because of its age. Some of
these tapes and cartridges are physically located at Evans Hall, and can be
mounted more easily than others. Others are physically located at the Survey
Research Center, and must be sent to Evans Hall prior to use. Eventually (in
1999), all access will be discontinued. If you know you will need
data that is currently part of our CMS collection, you should ask us
to make it available as soon as possible.
The information necessary
to mount tapes and access files on those tapes includes the Slot ID, internal
tape ID, file number, record length and block size. You may e-mail the
archive staff (archive@ucdata.berkeley.edu)
to request tape access information. You must include the UC DATA study
number and study title in your request.
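Record length and block size together determine how many records sit in each physical block on a tape. As a rough illustration only, assuming fixed-length blocked records, the relationship can be sketched in Python. The function name and the sample values are hypothetical and are not taken from any UC DATA study.

```python
def records_per_block(record_length: int, block_size: int) -> int:
    """Return the number of records per block for fixed-length, blocked records.

    Assumes the block size is an exact multiple of the record length,
    which is the usual arrangement for fixed-block tape files.
    """
    if record_length <= 0 or block_size <= 0:
        raise ValueError("record length and block size must be positive")
    if block_size % record_length != 0:
        raise ValueError("block size is not a multiple of record length")
    return block_size // record_length


# Hypothetical values: 80-character records written in 8000-byte blocks.
print(records_per_block(80, 8000))
```

A block size that is not a multiple of the record length usually means one of the two values was copied down incorrectly, so the sketch raises an error rather than rounding.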
Although we do not provide
consulting on the utilities to mount tapes, some examples of IBATCH jobs
to mount tapes and cartridges are available.
From Raw Data to Analysis
The manner in which data is
organized within a file can vary widely. All of the data for a single observation
may be on one record or, as is common with older datasets, spread across
several 80-character records. All of the records in a file may be of the
same type, with the same variable represented in particular column locations
for all records, or record types may be mixed within a file, as is the
case with the Census Public Use Microdata Sample. In order to discover how the
data is organized within a data file, you will need to look at either a
codebook for the study or a data dictionary. Many studies have only
machine-readable documentation, which you will need to copy to your hard disk or
print. We also have printed codebooks for a large number of studies. You
may look at these codebooks in the archive office, located in the Survey
Research Center at 2538 Channing Way in Berkeley, during our normal operating
hours (9:00 - 12:00 in the morning and 1:00 - 5:00 in the afternoon, Monday
through Thursday, and 9:00 - 12:00 on Fridays).
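To make the idea of a record layout concrete, here is a minimal Python sketch of reading a file in which each observation is spread across several 80-character records. The layout, the variable names, the number of records per case, and the sample data are all invented for illustration; a real study's codebook or data dictionary supplies the actual values.

```python
# Hypothetical layout: variable name -> (card number, first column, last column).
# Columns are 1-based and inclusive, as codebooks usually list them.
LAYOUT = {
    "case_id": (1, 1, 4),
    "age": (1, 5, 6),
    "region": (2, 5, 5),
}
CARDS_PER_CASE = 2  # assumption: each observation spans two 80-character records


def parse_case(cards):
    """Extract the variables for one observation from its list of records."""
    values = {}
    for name, (card, first, last) in LAYOUT.items():
        record = cards[card - 1].ljust(80)  # pad short records to 80 columns
        values[name] = record[first - 1:last].strip()
    return values


def read_cases(lines):
    """Group consecutive records into observations and parse each one."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % CARDS_PER_CASE != 0:
        raise ValueError("file does not contain a whole number of cases")
    for start in range(0, len(lines), CARDS_PER_CASE):
        yield parse_case(lines[start:start + CARDS_PER_CASE])


# Made-up sample: two observations, two records each.
sample = [
    "000134",
    "00012",
    "000258",
    "00021",
]
for case in read_cases(sample):
    print(case)
```

Values are kept as strings here, since whether a column holds a number, a code, or a missing-data marker is something only the codebook can tell you.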
Many datasets are accompanied
by either SAS or SPSS setup files designed to ease the creation of an analysis
file and containing variable names, column locations, variable labels,
and value labels. OSIRIS dictionaries, which can be used by SAS and SPSS,
may also accompany the dataset. If none of these are available, or if you
are using some other software package for analysis, you will need to refer
to the codebooks, data dictionaries, and the documentation for the software
package you are using for analysis to proceed further.
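When no setup file exists, the variable labels and value labels it would have supplied have to be transcribed from the codebook by hand. The following Python sketch shows one way this might look. The variable, its codes, and the labels are hypothetical, and this is not how SAS or SPSS store the same information.

```python
# Hypothetical codebook entries transcribed by hand; not from any real study.
VARIABLE_LABELS = {"region": "Region of residence"}
VALUE_LABELS = {"region": {"1": "North", "2": "South", "9": "Missing"}}


def describe(variable, code):
    """Return a readable 'variable label: value label' string for one coded value.

    Falls back to the raw variable name or raw code when no label is known.
    """
    label = VARIABLE_LABELS.get(variable, variable)
    value = VALUE_LABELS.get(variable, {}).get(code, code)
    return f"{label}: {value}"


print(describe("region", "2"))
```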
We anticipate adding on-line
extract and analysis capability to a subset of our data collection using
SDA beginning in January of 1999.
While UC DATA provides statistical
consulting only to paying clients, statistical consulting for UC Berkeley
students and staff is available from the statistical consulting service, #443
Evans Hall; call the main statistics office at 642-2781 for hours. Some
example SPSS and SAS command files are also available.
http://ucdata.berkeley.edu/new_web/newhelp.html
Last revision 12/23/98
Copyright 1995 UC DATA