CENSUS RESCUE PROJECT (2002-2004)

Current status as of April 30, 2004

The LBL archive was examined for historical datasets that would be useful to University of California researchers.  Two kinds of datasets were identified: those installed in the LBL information system SEEDIS (Socio-Economic-Environmental Demographic Information System), for which the compressed data structure has been completely identified and decoded; and other datasets, originally stored on the Control Data supercomputer (CDC-7600), for which the storage architecture (GSS, or “general storage system”) has not been completely identified and decoded.

 

List of Relevant Data Sets originally installed in SEEDIS by geographic level

 

List of Relevant Data Sets in GSS format by geographic level (not installed in SEEDIS)

 

List of physical file locations for selected 1970 census datasets (password protected)

 

Programs (C code) have been written that decode and decompress SEEDIS datasets.  This code supports multiple ASCII output formats (fixed-length fields in fixed-length records; comma-separated variable-length fields; and one field per line).
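
As an illustration of the three output formats, the following is a minimal sketch and not the project's actual code. It assumes a record has already been decoded into an array of integer values; the names format_record, out_format and the FMT_ constants are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output modes mirroring the three ASCII formats. */
enum out_format { FMT_FIXED, FMT_CSV, FMT_ONE_PER_LINE };

/*
 * Format one decoded record (an array of integer field values) into buf.
 * width is used only by FMT_FIXED: each field is right-justified in that
 * many columns (a value wider than width would overflow its column).
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_record(char *buf, size_t cap, const long *fields, size_t n,
                  enum out_format fmt, int width)
{
    size_t used = 0;

    assert(buf != NULL);
    if (cap == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int len;
        switch (fmt) {
        case FMT_FIXED:
            len = snprintf(buf + used, cap - used, "%*ld", width, fields[i]);
            break;
        case FMT_CSV:
            len = snprintf(buf + used, cap - used, "%s%ld",
                           i ? "," : "", fields[i]);
            break;
        default: /* FMT_ONE_PER_LINE */
            len = snprintf(buf + used, cap - used, "%ld\n", fields[i]);
            break;
        }
        if (len < 0 || (size_t)len >= cap - used)
            return -1;          /* output was truncated */
        used += (size_t)len;
    }

    /* Fixed-width and CSV records end with a single newline. */
    if (fmt != FMT_ONE_PER_LINE) {
        if (cap - used < 2)
            return -1;
        buf[used++] = '\n';
        buf[used] = '\0';
    }
    return (int)used;
}
```

With the values 12, 345 and 6 and a width of 5, FMT_FIXED yields one 15-column line, FMT_CSV yields "12,345,6", and FMT_ONE_PER_LINE yields three lines.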

 

Programs have been written, but not fully debugged, that decode and decompress data in the CDC GSS format.  The current code does not always unpack all fields, nor can it reliably recognize record boundaries.  All code is currently compiled on a Sun Unix (Solaris) system using the GNU gcc compiler.

 

Perl scripts were written to transform the SEEDIS data definitions into definitions for the SPSS statistical software, so that data can be listed and examined for accuracy.
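
The kind of transformation involved can be sketched as follows. The project's scripts were written in Perl; this sketch uses C only to match the decoders, and it is not the project's code. It assumes a simplified, hypothetical field definition (name, starting column, width, label) rather than the real SEEDIS definition layout, and it emits SPSS DATA LIST and VARIABLE LABELS commands for a fixed-format ASCII file. The names field_def, appendf and spss_syntax are invented for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified field definition; the real SEEDIS layout differs. */
struct field_def {
    const char *name;   /* variable name */
    int start;          /* first column, 1-based */
    int width;          /* number of columns */
    const char *label;  /* descriptive label, assumed to contain no quotes */
};

/* Append formatted text to buf; returns -1 if it does not fit. */
static int appendf(char *buf, size_t cap, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(buf + *used, cap - *used, fmt, ap);
    va_end(ap);
    if (len < 0 || (size_t)len >= cap - *used)
        return -1;
    *used += (size_t)len;
    return 0;
}

/*
 * Write SPSS syntax describing n fixed-column fields of datafile into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int spss_syntax(char *buf, size_t cap, const char *datafile,
                const struct field_def *f, size_t n)
{
    size_t used = 0;

    assert(buf != NULL && cap > 0 && n > 0);
    if (appendf(buf, cap, &used, "DATA LIST FILE='%s' FIXED\n /", datafile) < 0)
        return -1;
    /* One "name first-last" column specification per field. */
    for (size_t i = 0; i < n; i++)
        if (appendf(buf, cap, &used, "%s%s %d-%d", i ? "\n  " : "",
                    f[i].name, f[i].start, f[i].start + f[i].width - 1) < 0)
            return -1;
    if (appendf(buf, cap, &used, ".\nVARIABLE LABELS") < 0)
        return -1;
    /* One label per field, separated by slashes. */
    for (size_t i = 0; i < n; i++)
        if (appendf(buf, cap, &used, "\n %s%s '%s'", i ? "/" : " ",
                    f[i].name, f[i].label) < 0)
            return -1;
    if (appendf(buf, cap, &used, ".\n") < 0)
        return -1;
    return (int)used;
}
```

Running the generated syntax in SPSS and then listing cases is one way to inspect the decoded values by eye.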

 

A select set of datasets has been decoded and decompressed, and their data (for California counties and places) and original metadata have been placed on UC DATA’s census rescue web site.

 

For one dataset (County Data Book 1947-77), the metadata has been converted to Counting California’s MUDD format, and the data has been decompressed and installed into Counting California.  Every data value was quality-control checked against the values published in six volumes of the Census Bureau’s City-County Data Book (1947, 1952, 1956, 1962, 1967, 1977).

 

For one other dataset (1960 Census of Population), attempts were made to convert the SEEDIS metadata to MUDD format.  These attempts led to the discovery of errors in the original metadata (identical labels and headers for different data elements).  Some attempt has been made to reconcile these errors.
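
A check for this kind of metadata error could look like the sketch below. It is illustrative only: it assumes the metadata has been reduced to a hypothetical list of (element identifier, label) pairs, and the names element and find_label_collisions are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical metadata entry: an element identifier and its label. */
struct element {
    const char *id;
    const char *label;
};

/*
 * Report every pair of distinct elements that share an identical label.
 * Writes one line per pair to out (skipped when out is NULL) and returns
 * the number of colliding pairs. The pairwise comparison is quadratic,
 * which is adequate for a few thousand elements.
 */
size_t find_label_collisions(const struct element *e, size_t n, FILE *out)
{
    size_t pairs = 0;

    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(e[i].label, e[j].label) == 0) {
                pairs++;
                if (out)
                    fprintf(out, "%s and %s share label \"%s\"\n",
                            e[i].id, e[j].id, e[i].label);
            }
        }
    }
    return pairs;
}
```

Each reported pair still has to be reconciled by hand against the original documentation, since an identical label does not say which element is mislabeled.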

 

For one other dataset (1970 Census of Population Second Count), some table information has been extracted as a prelude to developing a MUDD.