Statement of Work for FY 2001-2004
UC Data Archive & Technical Assistance
Between 1973 and 1996, Lawrence Berkeley Laboratory (LBL) amassed an enormous collection of numeric social science statistical data used for government planning purposes by the U.S. Departments of Labor and Energy and the Army Corps of Engineers. This rich data collection is in imminent danger of extinction. The collection includes numerous invaluable historical electronic files (such as 1960 Census summary information and 1970 Census tract digitized boundary files) found nowhere else in the country. Support for the database at LBL ended in 1998, and the last remaining computer on which these data reside is being kept alive at the Bureau of the Census; this computer (a 1980s-vintage Digital Equipment VAX) could cease operations at any time. The UCB Library and UC DATA have been working to rescue and preserve these data and make them available to researchers at UCB and elsewhere through the Government and Social Science Information (GSSI) web site. Data of this type have been widely used on the UCB campus for research in Demography, Political Science, City and Regional Planning, Agricultural and Resource Economics, and Epidemiology.
SCOPE OF PROJECT:
The UCB Library and UC DATA propose a two-phase effort to rescue 1970 and 1980 Census data, as well as 1960 county-level data and historic census tract boundary files.
Phase I: Convert data from original compressed format, archive and make available for electronic storage and transfer.
These data were stored in a unique compression format by Lawrence Berkeley Laboratory during the 1970s and 1980s. The Historical Census Rescue Project will decompress these data into ASCII format for FTP access at the GovDocs section of the University of California Social Science and Government Data Library web site.
Phase II: Install data for web-accessible retrieval, display, and selective downloading.
Data extracted and archived will be installed into a WWW interface for selection by area and data item, then retrieved and displayed as HTML tables, with an option for downloading in most modern data formats (e.g., Excel spreadsheets). Users will then be able, for example, to select all census tracts with a Latino population greater than 15% and display or download family composition and median family income statistics. This phase will require some computer programming effort on the part of UC DATA and the UCB Library. We expect to utilize software already developed by the California Digital Library Counting California project, which covers 1990 and 2000 Census data for
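The example query in the Phase II description (tracts with a Latino population above 15%) can be sketched in Python as follows. This is a minimal illustration only, not the Counting California software: the field names (tract, total_pop, latino_pop, median_family_income), the helper functions, and the sample values are all hypothetical.

```python
import csv
import io

# Hypothetical tract-level records; field names and values are invented
# for illustration and do not come from the LBL files.
SAMPLE_TRACTS = [
    {"tract": "0001", "total_pop": 4000, "latino_pop": 800, "median_family_income": 21000},
    {"tract": "0002", "total_pop": 5000, "latino_pop": 500, "median_family_income": 25000},
    {"tract": "0003", "total_pop": 0, "latino_pop": 0, "median_family_income": 0},
]


def select_tracts(tracts, min_share=0.15):
    """Return tracts whose Latino population share is strictly greater than min_share."""
    selected = []
    for row in tracts:
        total = row["total_pop"]
        # Skip unpopulated tracts to avoid dividing by zero.
        if total > 0 and row["latino_pop"] / total > min_share:
            selected.append(row)
    return selected


def to_csv(rows, columns):
    """Serialize only the requested columns as CSV text for download."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    chosen = select_tracts(SAMPLE_TRACTS)
    print(to_csv(chosen, ["tract", "median_family_income"]))
```

CSV is used here only because it opens directly in spreadsheet programs such as Excel; the download formats actually offered would be decided during Phase II.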
PHASE I: CENSUS FILES TO BE DECOMPRESSED AND MADE AVAILABLE VIA FTP
Priority 1) 1960 Census of Population
This file has 1,000 subject matter cells for each county. Geographic coverage is counties as designated by 1960 census county codes and 1960 FIPS county codes.
Priority 2) 1980 Census of Population
Priority 3) 1970 Census of Population
Priority 4) 1970 Census Tract Map Boundary Files
During the 1970s, Lawrence Berkeley Laboratory digitized the map boundary files for 1970 Census tracts in Metropolitan Statistical Areas. This boundary file was used in production of the 1970 Urban Atlas series, a joint project between the Bureau of the Census and the Department of Labor. About 35,000 polygon vector lat-long records are available.
Priority 5) Other Datasets:
Other datasets which should be actively considered for rescue are:
Each dataset will need to have a human-readable version of the following associated documentation:
PHASE II: ACCESS SOFTWARE AND METADATA:
UC DATA proposes to develop software that will facilitate interactive geographic area selection and display of data as tables. All metadata describing the data will be created or converted to XML DTD specifications of the summary tables, following an emerging documentation standard for social science data under development at the University of Michigan's Inter-university Consortium for Political and Social Research. With the application of XML, XSL style sheets will be developed for multidimensional table display. We expect to use the California Digital Library (CDL) Counting California project's collection of database and SAS programs for the task of data request and display, and thus avoid original software development as much as possible. However, the project will require a half-time programmer analyst to adapt the CDL software to retrieve data for sub-county and sub-city geography and to design and load new databases accordingly.
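As a rough illustration of how XML table metadata could drive table display, the sketch below reads a small metadata fragment and renders one row of values as an HTML table. Python stands in here for the XSL style sheet step named above. The element and attribute names (table, cell, col, label, title) and the sample values are hypothetical; they are not taken from the ICPSR standard or from the CDL software.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical metadata for one summary table; the element and attribute
# names are invented for illustration only.
SAMPLE_XML = """
<table id="T1" title="Persons by sex">
  <cell col="1" label="Male"/>
  <cell col="2" label="Female"/>
</table>
"""


def render_table(xml_text, values):
    """Render one row of cell values as an HTML table, labeled from the metadata.

    `values` maps a column number (int) to the value for that cell.
    """
    root = ET.fromstring(xml_text.strip())
    # Order the cells by their declared column position.
    cells = sorted(root.findall("cell"), key=lambda c: int(c.get("col")))
    header = "".join(f"<th>{escape(c.get('label'))}</th>" for c in cells)
    body = "".join(f"<td>{values[int(c.get('col'))]}</td>" for c in cells)
    caption = escape(root.get("title"))
    return (
        f"<table><caption>{caption}</caption>"
        f"<tr>{header}</tr><tr>{body}</tr></table>"
    )


if __name__ == "__main__":
    print(render_table(SAMPLE_XML, {1: 10, 2: 12}))
```

Keeping the labels in the metadata rather than in the display code means the same rendering step can serve any summary table once its description has been converted to XML.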
PROJECT MANAGEMENT: Project management will be under the direction of Dr. Fred Gey, assistant director of UC DATA, who worked at LBL for many years as database manager for the LBL data collection before coming to the UCB campus in 1989. Dr. Gey is familiar with the unique LBL data formats. He directed a data rescue effort that recently made available 1970 Census data for nearly 300,000 areas (about 90 million pieces of data). He will be assisted by principal data archivist Ilona Einowski, director of User Services at UC DATA, who has worked on the Counting California project as an XML content expert.