Data Systems and Statistical Requirements for the Personal Responsibility and Work Opportunity Act of 1996

Henry E. Brady and Barbara West Snow
University of California Data Archive & Technical Assistance (UC DATA)
University of California, Berkeley
 

October 14, 1996

Web Version Date: 12/10/96












Prepared for the Committee on National Statistics of the National Research Council --- National Academy of Sciences
 

DRAFT
Please do not cite or quote without permission.


Henry E. Brady is Director, UC DATA and Professor of Political Science and Public Policy at the University of California.

Barbara West Snow is Research Director, UC DATA and Director of the California Work Pays Demonstration Project.

We would like to thank the staffs of UC DATA and of the Research Branch of the California Department of Social Services with whom we have worked on many of the projects described in this paper. We have learned a lot from them. Two people have made especially important contributions to our thinking about datasets for monitoring social programs. Dr. Fred Gey of UC DATA has instructed us on the technical aspects of database design, construction, and management. Werner Schink, the Chief of the CDSS Research Branch, has provided the vision and leadership that is necessary to bring new data systems to fruition.
 

Points of view or opinions expressed in this document are those of the authors and do not necessarily represent the official position or policies of the Regents of the University of California or the California Department of Social Services.
 
 









Table of Contents



I. The Personal Responsibility and Work Opportunity Act of 1996
 

A. Landmark Legislation in the Use of Statistical Data
B. What the Act Does---Goals and Programs


II. Program Implementation and Statistical Needs
 

III. Generic Problems of Monitoring and Evaluation and Some Solutions
 


IV. Specific Examples of What Might be Done
 

V. Challenges to Getting Data Collection Strategies On-Line
 

VI. Conclusions
 
 
 
 
 
 
 
 

Data Systems and Statistical Requirements for the Personal
Responsibility and Work Opportunity Act of 1996
 

Overview

In this paper, we explore the statistical needs for planning, monitoring, research, and evaluation that grow out of the Personal Responsibility and Work Opportunity Act of 1996. In the first section we provide a quick overview of the goals and the programs of the Act with an eye towards their implications for data collection and statistical reporting. This tells us what the Congress and the President intend to happen, but it does not tell us what will happen. We turn, therefore, in the second section to a discussion of how we think the Act's program performance standards and case management requirements, the features most closely related to data collection and statistical reporting, will be implemented. This provides us with a picture of the statistical data collection systems that will likely be in place after the implementation of the Act. This is where we must begin if we are to develop a useful data collection and analysis system.

In the third section, we go beyond the Act to discuss a number of recurring problems facing those who create statistical data systems for monitoring the effects of social programs. We also discuss some solutions to these problems. Then in the fourth section we turn to a number of specific examples of what could be done through the creative use of surveys and administrative data systems by building upon the existing systems and the opportunities provided by the new Act. In the fifth section we touch on some of the problems of resources, interagency coordination, and confidentiality that must be faced to achieve a better nation-wide system for monitoring and evaluating social program participation and the status of the least fortunate Americans. We end with some conclusions.
 





I. The Personal Responsibility and Work Opportunity Act of 1996


 


A. Landmark Legislation in the Use of Statistical Data

The Personal Responsibility and Work Opportunity Reconciliation Act of 1996 is not only a landmark in the development of American social policy; it is also a landmark in the governmental uses of statistical data and information. Major innovations in data collection and in database construction will be required to meet the goals of the Act. For example, to avoid substantial reductions in funding, states must meet strict outcome standards for the employment of welfare recipients. This will require measuring and recording in extraordinary detail the work experience of those receiving aid. States must also set five-year (or stricter) cumulative time limits on the receipt of welfare, and they must get recipients back to work by the time they have accumulated two years of aid. Meeting these goals will require, for the first time, tracking recipients over long periods of time. Longitudinal databases of unprecedented scope will have to be constructed for this purpose. States must strictly enforce child support laws, and several databases, including state and national registries of new hires, must be created that can be updated quickly and made available to many governmental social service agencies. Several new studies are called for, including a national survey of children who are at risk of child abuse or neglect. This study must be longitudinal, must yield data at the State level for as many states as possible, must summarize the out-of-home placements of the child, and must determine the frequency of contact with State or local agencies. Any one of these tasks alone would pose a substantial challenge. Together they are formidable indeed.
 
 

In fact, as we read the bill, we were charmed to find that the legislative staff members who drafted it and the legislators who approved it had such faith in the ability of public administrators, survey researchers, database managers, and statisticians to track people over time, to update databases regularly and accurately, to measure work effort in enough detail to develop weekly logs of the number of hours and kinds of work undertaken by someone on assistance, and to keep track of the complicated living arrangements of modern American households in this era of single parent families. The truth is that the two major ways that we obtain reliable information, the sample survey and administrative databases, will be stretched to their current limits --- and perhaps beyond them --- as they are called upon to do these things. The bill itself recognizes some of these problems, and there is a small section that calls for a report on "what would be required to establish [an automated data processing] system capable of (A) tracking participants in public programs over time; and (B) checking the case records of the States to determine whether individuals are participating in public programs of two or more states" (pages 62-3). (EN#2) There is another section which calls for the study of "outcomes measures for evaluating the success of the States in moving individuals out of the welfare system through employment as an alternative to the minimum participation rates" described in the new Act (page 63). These sections suggest what is obvious to anyone who reads this bill: its implementation will require a new level of sophistication in the provision of social statistics.
 

In this paper we describe the statistical needs created by the passage of this legislation, and we make some suggestions about what can be done to meet those needs. We start with a discussion of the legislation, but we broaden our field of inquiry to consider statistical needs that are not explicit in the legislation. We will also provide very concrete examples of the problems of collecting statistical data in this area and the possibilities for using administrative data, often in concert with sample surveys, to improve our information about the effects of the legislation.
 
 

B. What the Act Does---Goals and Programs
There are many useful summaries of the new Act, and this is not the place to go into detail about what it does. (EN#3) The goals of the Act are listed in Figure 1, and many of the major titles contribute in some way to these goals by requiring work, facilitating the establishment of paternity, enforcing child support payments, providing child care, and penalizing teenage pregnancy. As shown in Figure 2, the Temporary Assistance for Needy Families (TANF) program created by Title I provides time-limited welfare through block grants to the states. Title III greatly strengthens child support enforcement and tracking. Title VI provides child care, and Title VIII cuts food stamp benefits and adds a work requirement to the program for adults between the ages of 18 and 50 who are not supporting minor children. Title V makes only relatively minor changes in child protection programs, but it funds a major study of children at risk for child abuse. Title II narrows the scope of Supplemental Security Income for disabled children, and Title IV restricts welfare and public benefits for aliens.
 

By its goals, the Act suggests that we should monitor needy families to see if children are being cared for in their own homes and to see if they have adequate living arrangements, especially satisfactory child care, while their parents are working. We should see if poor parents do get jobs and stay married. We should see if out-of-wedlock pregnancies are reduced, and we should keep a close watch on child abuse, neglect, and abandonment. In addition, the various Titles of the Act suggest that we should monitor the circumstances of disabled children who might be affected by changes in SSI, the situation of aliens who will receive reduced benefits or be denied benefits altogether, and the circumstances of those on food stamps who face a modified program with work requirements and reduced benefits. These are substantial tasks. Where should we begin?
 



II. Program Implementation and Statistical Needs


 



One way to think about the planning, monitoring, research, and evaluation needs created by the Personal Responsibility and Work Opportunity Act of 1996 is to focus, as we just did, on its goals, the programmatic changes it makes, and the resulting impacts on poor families. Certainly the purposes of the Act must be addressed in any satisfactory monitoring system, but we know that political agendas, resource constraints, and the weight of history will largely shape the statistical system that grows out of this legislation. It seems reasonable, therefore, to consider how the implementation of the program will structure the kinds of data that are collected. Let us turn to this kind of analysis for a moment.
 

The implementation perspective looks at a program and tries to understand how the incentives offered to those asked to implement it will affect the final shape of the program. This approach assumes, for example, that goals with money attached are more likely to be implemented than those without resources, no matter what the intent of the legislation. It assumes that tasks that are explicitly demanded will drive out those that are not, even if this detracts from achieving the goals of the program. And it assumes that powerful actors will bend programs to their agendas. Because the new welfare legislation is a block grant program, the states are the major players, and we must understand what they are likely to do.
 

From a statistical standpoint, the legislation operates at two major levels. At the programmatic level, it sets a number of program performance standards with substantial penalties for failures to meet them. These include standards for work participation and for enforcing child support. At the case management level, the legislation requires five-year limits on the receipt of welfare, involvement in work preparation programs and work readiness before accumulating two years of welfare assistance, tracking and locating those who do not pay child support, extensive redeterminations of eligibility for Supplemental Security Income, and changes in eligibility for food stamps. These programmatic and case management imperatives will determine the shape of the information systems. In addition, the existing data systems for those programs that have not been changed, such as Medicaid and Unemployment Insurance, will provide the background against which new data systems will be developed. Indeed, linkages with these and other existing systems will be required to implement certain provisions in the Act. Many social services data systems must be changed in some way, although special funding for such changes is provided only in the Medicaid and child support areas.
 

A. Program Performance Standards
Program Performance Standards for TANF --- The most conspicuous and important feature of Title I of the Act is the mandatory work requirements. Figure 3 summarizes these requirements. They have three components: (1) a requirement that a certain fraction (which increases over time) of the total heads of households on Temporary Assistance be engaged in work activities as defined by the Act, (2) a definition of the allowable work activities, and (3) a set of standards for the minimum number of hours that the head of the household must be engaged in "work activities" and for the minimum number of hours that the head must be engaged in a specific subset of these activities. (EN#4) A little reflection suggests that it will be a daunting task to devise a system for keeping track of all this information. Yet the Act requires Quarterly Reports which include the information necessary to calculate participation rates, and it penalizes those states which do not meet the work performance standards. (EN#5) The Quarterly Reports require not only the information in Figure 3, but also the additional information described in Figure 4. Note that one of the requirements of the law is a sample of closed as well as open cases. Not only do these reports require a great deal of information, they must also be submitted quickly. To avoid a penalty of four percent of the block grant, reports must be submitted no later than the end of the quarter following the one for which the data are collected.
 

How Program Performance will be Measured for TANF --- How will the states provide this information? Our best guess is that a quarterly survey will be used by many states for collecting much of this information because existing administrative systems will not be able to provide it, and the Act authorizes "the use of scientifically acceptable sampling methods" to collect it (page 48). In effect, such a survey would be a state-wide "mini-Survey of Income and Program Participation" of a sample of current or recent recipients of TANF that will be similar to the nation-wide survey of a sample of the entire population undertaken by the Census Bureau in its Survey of Income and Program Participation (SIPP). (EN#6)
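To make the sampling idea concrete, the following is a minimal sketch in Python of how a state might estimate a participation rate, with a standard error, from a simple random sample of cases. It is an illustration only: the function name, the input layout, and the 20-hour threshold in the usage note are assumptions, since the actual hour requirements are those summarized in Figure 3, and a real state survey would probably use a more complex design than simple random sampling.

```python
import math

def estimate_participation(weekly_hours, required_hours):
    """Estimate a participation rate from a simple random sample of cases.

    weekly_hours   -- average weekly hours of countable work activity,
                      one value per sampled head of household (hypothetical layout)
    required_hours -- hypothetical minimum; the real thresholds are in Figure 3
    Returns (rate, standard_error).
    """
    n = len(weekly_hours)
    if n < 2:
        raise ValueError("need at least two sampled cases")
    participating = sum(1 for h in weekly_hours if h >= required_hours)
    rate = participating / n
    # Standard error of a proportion under simple random sampling,
    # ignoring the finite population correction.
    se = math.sqrt(rate * (1 - rate) / (n - 1))
    return rate, se
```

For example, `estimate_participation([0, 10, 20, 25, 30, 0, 40, 5], 20)` treats four of the eight sampled heads of household as participating. The standard error matters because a state near the required rate would want to know whether its sample is large enough to show that it has met the standard.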
 

Getting a quarterly reporting process underway poses some major challenges for the states. In the State of California, for example, it appears that a quarterly survey will be a major component of this process. There are already two different offices, the Review and Evaluation Bureau (REB) in DSS and the Statistical Services Bureau in the Department of Health and Welfare, which are possible candidates for carrying out this mission. REB has traditionally done the federally mandated checking of case files to make sure that federal standards for grant calculation and eligibility are met. In California, there are about 300 employees of this group who now, under the new legislation, have no specific mandate. They are trained in statistical sampling, in checking the files of recipients, and in doing follow-up investigations, but they are not experts in survey interviewing, although they have done "characteristics surveys" every two years to determine the characteristics of those receiving AFDC. The Statistical Services Bureau has traditionally obtained data from the county administrative systems and published reports on welfare in California. These units can both lay claim to having expertise in developing data on welfare, but it seems likely that they might take different approaches to producing Quarterly Reports.
 

Surveys designed to supply information for the Quarterly Reports can provide part of the foundation of a statistical system for monitoring the impact of TANF. Their usefulness, however, will depend crucially upon their content, their design, and their quality. (EN#7) It is possible to envision a bare bones cross-sectional survey that would allow the states to calculate their participation rates but which would be of limited usefulness for assessing the impact of TANF. With a bit of effort, however, it might be possible to develop a strategy for rolling panels, expanded content, and linkage to administrative data. The rolling panels would allow those monitoring the program to follow families over time and to observe what happens to them when they leave the program. They would also provide the statistical power of a longitudinal design. Expanded content could include issues such as adequacy of parenting, housing, health-care, and nutrition; the sexual and fertility behavior of recipients; the school plans and performance of minors and young adults; the establishment of paternity and the collection of child support; and other subjects. Linkage to administrative data may be essential to monitor TANF payments and food stamp amounts, homeless and child care assistance, and time on aid. It could also provide a historical picture of the family's experience with welfare before and after the survey.
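A rolling panel design can be sketched as a simple interview schedule. The Python fragment below is a minimal illustration under stated assumptions: a new panel is drawn every quarter and each panel is followed for a fixed number of quarters. The function name and the four-quarter panel length in the usage note are hypothetical choices, not features of the Act or of any state's plan.

```python
def rolling_panel_schedule(num_quarters, panel_length):
    """Return {quarter: [ids of the panels interviewed in that quarter]}.

    Panel k is first interviewed in quarter k and stays in the survey
    for panel_length consecutive quarters (an assumed design).
    """
    schedule = {}
    for quarter in range(1, num_quarters + 1):
        oldest_active = max(1, quarter - panel_length + 1)
        schedule[quarter] = list(range(oldest_active, quarter + 1))
    return schedule
```

With `rolling_panel_schedule(5, 4)`, the fifth quarter interviews panels 2 through 5: panel 1 has rotated out and panel 5 has just been drawn. Each quarter therefore supplies a fresh cross-section for the Quarterly Report while the older panels supply the longitudinal follow-up, including families who have since left aid.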
 

The importance of the work participation standards, the substantial penalties for failure to submit Quarterly Reports on time, and the penalties for failure to meet the performance standards suggest that states will develop a method for reporting the data listed in Figure 4. (EN#8) States that had a centralized state-wide AFDC program and data system to begin with, as well as states with smaller populations, may find it possible to base their reports upon administrative data systems alone, although the breadth and depth of the required information for the quarterly reports suggest that surveys might be needed even in these situations. (EN#9) Presumably the mention of specific data elements in the Act and their importance for comparing one state versus another will create some incentives for the Federal government to get the states to agree upon a common set of definitions to avoid the reporting of incomparable data. There is good reason, then, to suppose that there will be a core survey effort with some common definitions to study the impacts of TANF.
 

Those who want to expand this enterprise must convince the states that the marginal costs of more sophisticated designs and more extensive content are relatively small. These resources, though relatively small, may be hard to come by. Although there is plenty in TANF and the rest of the Act for Democratic and Republican governors and legislators to fight over as they develop state programs and enabling legislation, it seems possible that one source of dispute might be the content and design of such surveys. Conservatives will no doubt prefer surveys which focus on work effort while liberals will want to include questions about quality of life for welfare recipients. Conservatives have the advantage of statutory language (Figure 4) which emphasizes collecting information on work effort and program participation, but both liberals and conservatives might be able to agree that it would be useful to know how much children are affected by the new programs. Furthermore, some of the information obtained from these surveys might be of interest to the many groups (such as the powerful cities and counties in California and elsewhere) concerned about the budgetary implications of the Act. (EN#10) The tracking of those who leave TANF could provide information about the likely impacts of time-limited welfare on General Assistance and on Foster Care -- two programs that could balloon as families time out of TANF. (EN#11) It might even provide a way to determine whether child abuse, crime, or other unwanted behaviors will be affected by the legislation. A bi-partisan coalition of legislators might be built around the promise that these data could provide an early warning system of burgeoning costs or unexpected problems.
 

Other Program Performance Standards --- The Act also includes some other program performance standards regarding decreasing out-of-wedlock births, (EN#12) improving the enforcement of child support, and more generally measuring state performance so as to achieve the goals of the Act. (EN#13) Bonuses are provided to states which reduce out-of-wedlock births, and penalties are imposed for failure to improve the rates of paternity establishment, but there is nothing comparable to the Quarterly Reports required of the agencies administering TANF in the states. (EN#14) Bonuses are only offered for about five years, but penalties will continue. While it is not clear that the bonuses are large enough to foster a substantial effort by the states to collect data on these issues, the surveys for the Quarterly Reports might provide some of this information.

B. Case Management Data

Case Management Data for TANF and Food Stamps --- The Act also requires information about each case to check on eligibility for the new program and, most importantly, to keep track of time on welfare and work effort. From a statistics and data management perspective, the introduction of time-linked limitations on receipt of aid is probably the most significant innovation in this legislation. The three major limitations are the following:

 
(1) No TANF is to be provided to families which include an adult member (EN#15) who has received assistance in any State program for sixty months, whether consecutive or cumulative (page 35). (EN#16)
(2) A parent or caretaker receiving assistance under TANF must engage in work after no more than 24 months of aid, whether or not consecutive (page 10).
(3) Food stamps can be provided for no more than three months of the last 36 months to individuals between the ages of 18 and 50 who do not have responsibility for a minor child, who are not disabled, and who have not worked for 20 hours or more per week (page 227).
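The three limitations listed above imply two different kinds of clock: a lifetime cumulative count for TANF and a rolling window for food stamps. The following Python sketch shows the difference. It is a simplified illustration only: the function names and the integer month index are hypothetical, months are counted per person without the family-relationship rules in the Act, and whether the current month falls inside the 36-month window is an assumption.

```python
TANF_LIFETIME_LIMIT = 60   # months of assistance, consecutive or cumulative
WORK_TRIGGER = 24          # months of aid after which work is required
FS_LIMIT, FS_WINDOW = 3, 36

def month_index(year, month):
    """Map a calendar month to a single integer so months can be compared."""
    return year * 12 + (month - 1)

def tanf_status(aid_months):
    """aid_months: month indexes in which the adult received TANF in any state."""
    used = len(set(aid_months))          # cumulative, so gaps do not reset the clock
    return {"months_used": used,
            "work_required": used >= WORK_TRIGGER,
            "limit_reached": used >= TANF_LIFETIME_LIMIT}

def food_stamp_months_in_window(fs_months, current):
    """Count food stamp months in the 36 months ending with `current`
    (including the current month is an assumption of this sketch)."""
    start = current - FS_WINDOW + 1
    return sum(1 for m in set(fs_months) if start <= m <= current)

def food_stamp_limit_reached(fs_months, current):
    return food_stamp_months_in_window(fs_months, current) >= FS_LIMIT
```

The design point is that the TANF count never forgets a month, so the underlying records must be kept for a lifetime, whereas the food stamp count only needs the most recent 36 months. Both, however, require that months received in another county or state be visible to the system doing the counting.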

Of these three time-limitations on benefits, the most novel and complex is the five-year limitation on the receipt of TANF because it involves keeping historical records of benefits, work-history, and other information over a lifetime; taking into account individual relationships to families over that lifetime; and creating an absolute limitation on program participation without a chance to restart program eligibility. Other social programs have kept earnings and benefit histories over long periods of time (e.g., Social Security), provided time-limited benefits--but with a chance to restart program eligibility after some time (e.g., Workers' Compensation, Unemployment Insurance), and considered current and past family structure in the determination of eligibility or benefits (e.g., Social Security), but none to our knowledge has combined as many of these features as does the five-year lifetime limitation on the receipt of TANF. All others provide opportunities to restart eligibility. Taken together, the special features of time-limited welfare pose some substantial problems for those designing database systems for case management in the TANF program.
 

Most states or counties (EN#17) currently have computerized data systems which collect case-level information on the structure of the family, income from work, child support status and payments, and other data needed to calculate AFDC (or food stamp) grant amounts. In California this information is only available at the county level, but it is collected in state-wide systems in some states. States or counties also have information about participation in JOBS programs, but this is often kept in separate data systems. For example, in California, most large counties have completely different information systems for keeping track of AFDC benefit calculations and GAIN (the California JOBS program) participation. Furthermore, these systems vary in structure and data elements by county.
 

TANF envisions a situation in which these systems are combined or at least communicate with one another. (EN#18) To enforce the five-year limit in TANF and the other time-limits in the Act, at first a statewide system must be created, and eventually a national registry must be constructed as well. At the moment, however, we are far from having such systems. Even within a single California county, it is a great challenge to get computer systems working together so that a case manager will know at any given moment whether a head of household is in GAIN or receiving welfare, but it is an even bigger challenge to add the new data elements required to implement TANF, to develop a common format across systems separated by geography and by bureaucratic task, and to link them over long periods of time as case composition and individual names change. Most systems currently only keep information for a few months, or at most three years, and even those states or counties that do keep historical information have almost never developed an ongoing process of linking it over time and across political jurisdictions. Yet TANF and the changes to the food stamp program require careful record-keeping of months of receiving aid across spells of aid and periods of sanctions with varying lengths and application to different family members. There are penalties for states which fail to comply.  (EN#19)
 

Indeed, the complexity of the TANF provisions -- for example, months on aid by unmarried persons under eighteen years of age do not count towards the 60-month time limit and certain kinds of educational experiences count as "work activities," although sometimes for only limited periods of time -- means that it will be necessary to keep current and historical information available for all people in a case, including their ages, educations, workfare program participation dates and hours, school attendance, etc., in addition to case-specific data such as payments, child support, food stamp amounts, and Medicaid eligibility. Accurate and up-to-date information will be necessary on who is or is not attending school, or participating in an approved employment or community service activity, and for how long. And since a maximum of four consecutive weeks of job search is permitted, and only twelve months of vocational education, these must be recorded separately from other work activities. Data systems must track all this, and do so accurately.
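Two of these rules can be sketched in a few lines of Python to show why person-level, dated records are needed. This is an illustration under stated assumptions, not a statement of the statutory rules: the record layout and function names are hypothetical, the exemption is coded exactly as paraphrased in the paragraph above (months as an unmarried person under eighteen do not count), and the four-week job search cap is read here as a cap on consecutive weeks.

```python
def countable_months(monthly_records):
    """Count months on aid that run against the 60-month clock.

    monthly_records: one dict per month on aid with the person's 'age' and
    'married' status in that month (hypothetical layout). Months as an
    unmarried person under eighteen are excluded, per the text's paraphrase.
    """
    return sum(1 for r in monthly_records
               if not (r["age"] < 18 and not r["married"]))

def longest_consecutive_run(weeks):
    """Length of the longest run of consecutive week numbers with job search."""
    longest = current = 0
    previous = None
    for week in sorted(set(weeks)):
        if previous is not None and week == previous + 1:
            current += 1
        else:
            current = 1
        longest = max(longest, current)
        previous = week
    return longest

def job_search_within_cap(weeks, cap=4):
    """True if no run of consecutive job search weeks exceeds the cap."""
    return longest_consecutive_run(weeks) <= cap
```

Neither function can be computed from a case-level monthly payment file alone. The first needs each person's age and marital status in every month on aid, and the second needs activity records dated by week and coded by type of activity.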
 

The goal here, however, is not at all the reality. We are far from being able to do this, and the Act appears to recognize this by calling for two studies that would be first steps towards creating a synoptic data system. (1) The Secretary of Health and Human Services must submit a report to the Congress on the status of the automated data processing systems operated by the States to assist management in the administration of State programs. This report must describe what would be required to establish a system capable of tracking participants in public programs over time and of checking the case records of the States to determine whether individuals are participating in public programs of two or more States. The report should contain a plan for building on the automated data processing systems that exist, and a time estimate for establishing the new system. (2) In addition, the Act calls for the Commissioner of Social Security to develop a prototype counterfeit-resistant social security card and to study the feasibility of issuing this card for all individuals over 3, 5, or 10 years. Social security number matching, which has been relatively rare in the past because agencies relied predominantly upon their agency case or person numbers for tracking, could be facilitated by the development of a counterfeit-resistant social security card, although the extensive use of social security numbers raises numerous questions about privacy and confidentiality.
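The cross-state check described in the first study amounts to matching person-month records from two states on a common identifier. The Python sketch below uses the social security number for this purpose. It is a minimal illustration only: the function names and record layout are hypothetical, the numbers in the usage note are made up, and a production system would need to handle missing, mistyped, and shared numbers as well as the confidentiality safeguards noted above.

```python
def normalize_ssn(raw):
    """Strip punctuation; return None unless exactly nine digits remain."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits if len(digits) == 9 else None

def dual_participants(state_a, state_b):
    """Return (ssn, month) pairs that appear in both states' files.

    Each argument is a list of (ssn, month) tuples, with month as a
    'YYYY-MM' string (hypothetical layout). Records with unusable
    numbers are dropped rather than matched.
    """
    a = {(normalize_ssn(s), m) for s, m in state_a}
    b = {(normalize_ssn(s), m) for s, m in state_b}
    return sorted(pair for pair in a & b if pair[0] is not None)
```

For example, if one state records "123-45-6789" on aid in 1997-03 and another records "123456789" in the same month, the pair is reported once in normalized form. The match is only as good as the identifier, which is one reason the Act's interest in a counterfeit-resistant card bears directly on data quality.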
 

It is a long way, however, from these reports to a true national tracking capability. Indeed, many states are a long way from being able to track welfare recipients over time. In California, for example, most of the information about welfare or JOBS resides in different county databases, and the only statewide welfare database, the MEDS system, would have to be significantly redesigned to accommodate the needs of the TANF legislation.
 

Case Management Data for Child Support --- The Act also requires the creation of two other detailed databases to aid in child support enforcement. The automated State case registry will contain records on support orders established or modified in the state after October 1, 1998 and records on each case in which services are provided under the State plan for child support enforcement (p. 105 of the Act). Child support enforcement actions are required for those receiving benefits from TANF, food stamps, Medicaid, or foster care maintenance payments, and for any child of an individual who applies for services. The registry must have standardized data elements for identification of parents (such as names, social security numbers, dates of birth, and case identification numbers) and detailed information on case status, on the amounts of support owed and support paid, and on administrative and judicial actions and proceedings. Information from the State case registry will be made available to a Federal Case Registry.
 

The State Directory of New Hires must be in operation by October 1, 1997, and employers and labor organizations in the state must furnish reports for each newly hired employee. Through a W-4 form, employers must furnish to the State Directory of New Hires the name, address, and social security number of the employee, together with the name, address, and identification number of the employer, within 20 days of the date of hire. This directory will be used to locate individuals for the purposes of establishing paternity and enforcing child support obligations. The State Directory of New Hires must also report quarterly to the National Directory of New Hires information on wages and unemployment compensation, and new hire information can also be disclosed to the state agencies administering TANF, Medicaid, Unemployment Compensation, Food Stamps, and SSI. (EN#20)
 

These two data systems, the State Case Registry and the Directory of New Hires, cover quite different universes, and they will be useful for answering quite different questions. The Directory of New Hires is designed to cover all employment so that it will be somewhat broader than the existing data from the Unemployment Insurance program. This may help to fill in some gaps that currently exist in UI data. The State Case Registry might be useful for producing statistics on the establishment of paternity and child support agreements for those receiving social welfare, but it will not cover the universe or even a well-defined demographic (as opposed to programmatic) group.
 

How Will Automated Case Management Systems Be Constructed? --- Modern database technology makes it possible to construct the massive databases contemplated in the Act. Networking makes it possible for field units to be in direct contact with centralized computers so that information can be input or queried on an ongoing basis. Individual computer tapes now hold up to 40 gigabytes of data, which is enough room for hundreds of millions of records. Fast computers make it possible to sort and link these millions of records. And relational databases have created a powerful tool for organizing and accessing data.
 

Relational databases (EN#21) are essentially linked tables of information. There can be a table of individual characteristics such as age, sex, race, and marital status, which lists everyone in the database; a table of family relationships, which shows how these people are related to one another; a table listing all instances of one kind of event along with the characteristics of the event, such as receiving a TANF check of some amount during a specific month; a table of another kind of event along with its characteristics, such as being enrolled in a work program of some sort; and perhaps still a third table of events, such as receiving wages for an average of 20 hours or more per week in the last month. The structured query languages used in these databases make it possible to ask for information on, for example, all cases in which the head of household was 18 years or older, which had already accumulated 24 months of welfare assistance, in which the head of household was not enrolled in a work program, and in which the head of household had not worked an average of 20 hours or more in the last month. Once the database is constructed, it is very simple to construct these inquiries, and this makes it possible to "cut" the data in many different ways without an excessive amount of programming each time a different slice of the data is needed.
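The example inquiry in the preceding paragraph can be written out as a short query. The sketch below uses Python's standard sqlite3 module with an in-memory database. All table names, column names, and rows are hypothetical and are chosen only to mirror the tables described above; for brevity, the person table carries the case number directly, which a production design would probably avoid.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database; every name and row is made up."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, case_id INTEGER,
                             age INTEGER, is_head INTEGER);
        CREATE TABLE tanf_payment (case_id INTEGER, month TEXT, amount REAL);
        CREATE TABLE work_program (person_id INTEGER, month TEXT);
        CREATE TABLE wages (person_id INTEGER, month TEXT, avg_weekly_hours REAL);
    """)
    months = [f"{1997 + i // 12}-{i % 12 + 1:02d}" for i in range(24)]
    con.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                    [(1, 1, 30, 1), (2, 2, 25, 1), (3, 3, 40, 1), (4, 4, 28, 1)])
    # Cases 1, 2 and 4 have 24 months of aid; case 3 has only 10.
    payments = [(c, m, 500.0) for c in (1, 2, 4) for m in months]
    payments += [(3, m, 500.0) for m in months[-10:]]
    con.executemany("INSERT INTO tanf_payment VALUES (?, ?, ?)", payments)
    con.execute("INSERT INTO work_program VALUES (2, '1998-12')")  # head of case 2 enrolled
    con.execute("INSERT INTO wages VALUES (4, '1998-12', 25.0)")   # head of case 4 working
    return con

QUERY = """
SELECT p.case_id
FROM person AS p
WHERE p.is_head = 1
  AND p.age >= 18
  AND (SELECT COUNT(DISTINCT t.month) FROM tanf_payment AS t
       WHERE t.case_id = p.case_id) >= 24
  AND NOT EXISTS (SELECT 1 FROM work_program AS w
                  WHERE w.person_id = p.person_id AND w.month = :last_month)
  AND NOT EXISTS (SELECT 1 FROM wages AS g
                  WHERE g.person_id = p.person_id AND g.month = :last_month
                    AND g.avg_weekly_hours >= 20)
ORDER BY p.case_id
"""

def cases_needing_attention(con, last_month):
    """Cases with an adult head, 24+ months of aid, no work program, under 20 hours."""
    return [row[0] for row in con.execute(QUERY, {"last_month": last_month})]
```

Calling `cases_needing_attention(build_demo_db(), "1998-12")` returns only case 1 in this made-up data. Changing the question, for instance to a different month threshold or hours cutoff, means editing one condition in the query rather than writing a new program, which is the sense in which the data can be "cut" in many ways.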
 

Relational databases also have another feature that makes them useful for organizing data. They enforce a kind of discipline called "normalization" (EN#22) on how tables are constructed that reduces ambiguity and simplifies the process of updating the files. This could prove to be especially useful in the construction of social program databases where considerable confusion results from having cases composed of persons who can move in and out of cases and who can form new cases. Many social program databases utilize cases as their basic unit of analysis and computer systems store information by these cases. Because the case is the basic unit of concern, information on the persons within the cases is sometimes not collected in a very useful form or at all. This can bedevil anyone who wants to follow individuals over time because they are sometimes not identifiable within the case, or even if they are identifiable, their relationship to other members of the case is not always clear. In California, for example, it has been very hard to identify teenage mothers "nested" within cases with their own mothers or other relatives because it is impossible to distinguish a case where a baby is the biological offspring of the teenager's mother from a case where a baby is the biological offspring of the teenager. A well-designed and maintained relational database would make it harder for these kinds of problems to arise. Another kind of problem arises when a child within a case gets SSI because of a disability. In that circumstance, the child disappears from the AFDC case and forms a new SSI case. Finally, cases may dissolve or form in new ways as adults get married or divorced.
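A minimal sketch may help show how a normalized design avoids the teenage-mother ambiguity described above. It assumes three hypothetical tables: persons, case membership, and an explicit parentage table that records each child's mother. The names, the identifiers, and the reference year of 1996 are illustrative assumptions, not features of any existing California file.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, birth_year INTEGER);
-- Persons belong to cases through a separate table, so a person can
-- leave one case and join another without losing his or her identity.
CREATE TABLE case_membership (
    case_id INTEGER,
    person_id INTEGER REFERENCES persons(person_id),
    PRIMARY KEY (case_id, person_id));
-- The mother of each child is stored once, independent of any case.
CREATE TABLE parentage (
    child_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
    mother_id INTEGER REFERENCES persons(person_id));
""")

# Case 1: adult 10, her teenage daughter 11, and baby 12, whose mother is the teenager.
# Case 2: adult 20, her teenage daughter 21, and baby 22, whose mother is the adult.
db.executemany("INSERT INTO persons VALUES (?, ?)",
               [(10, 1958), (11, 1980), (12, 1996), (20, 1961), (21, 1981), (22, 1996)])
db.executemany("INSERT INTO case_membership VALUES (?, ?)",
               [(1, 10), (1, 11), (1, 12), (2, 20), (2, 21), (2, 22)])
db.executemany("INSERT INTO parentage VALUES (?, ?)",
               [(11, 10), (12, 11), (21, 20), (22, 20)])

# Mothers under age 20 in the reference year who share a case with their own child.
NESTED_TEEN_MOTHERS = """
SELECT mm.case_id, pr.mother_id, pr.child_id
FROM parentage pr
JOIN persons mom ON mom.person_id = pr.mother_id
JOIN case_membership mm ON mm.person_id = pr.mother_id
JOIN case_membership mc ON mc.person_id = pr.child_id AND mc.case_id = mm.case_id
WHERE :year - mom.birth_year < 20
ORDER BY mm.case_id
"""
nested_teen_mothers = db.execute(NESTED_TEEN_MOTHERS, {"year": 1996}).fetchall()
print(nested_teen_mothers)
```

Both cases contain an adult, a teenager, and a baby, so a case-level record alone cannot tell them apart. Because parentage is stored as its own table, the query distinguishes them directly.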
 

These anomalies often make it very hard to follow people, especially children, in these files. Indeed, there is a basic paradox embedded in many of these databases. Although there is a great deal of concern with the welfare of children, the bulk of the attention is placed on the adults, and the databases are often designed to track adults much more readily than to track children. If we are really going to be concerned with outcomes for children, we must make sure that we design data systems that allow us to follow children and to measure the outcomes of our programs for them.
 

These comments suggest that much could be gained from integrating and redesigning our current data systems. But this is easier said than done. In our work with the State of California, we have constructed separate longitudinal persons and cases files for four counties (Los Angeles and San Bernardino in Southern California and San Joaquin and Alameda in Northern California) from information supplied by them since December 1992. We have also used data supplied by the state from the Medi-Cal eligibility file to construct welfare history files for these same persons and cases back to 1987. To put together the persons and cases files for the counties we have had to process between four and eight files from each county (see Figure 5). In doing this, we have encountered different database management systems (none of them modern relational databases) for each county, some of them dating back 20 years, and all of them requiring substantial translation and reformatting before we could construct our own four-county database. But it is not only the age of these systems that makes them unwieldy. They also have quite different data elements and different definitions of the same data elements, and some of the files are much richer than others. This has led us to create separate persons and cases files for Los Angeles. Figure 6 shows a table of the variable list for the six files we have created. A comparison of the columns for the Los Angeles County Case file and the Four County Case file demonstrates the additional richness of the Los Angeles file. But this chart does not begin to reveal the work that went into creating variables that were comparable across just these four counties.
In summary, then, the creation of a uniform database requires both substantial computer talents to create datasets that are in a common computer format and significant social program knowledge and re-processing of data to ensure that ostensibly similar data really are measuring the same variable.
 

The difficulty here is that there are so many different databases that must be pieced together to create a useful longitudinal dataset. One solution to the problem would be to develop an entirely new system that uses more modern technology. The Act provides the opportunity to do this for TANF because it places expenditures for "information technology and computerization needed for tracking or monitoring" (page 21) outside the fifteen percent cap on administrative expenditures. This may help to fund new systems, but it could also lead to a battle between liberals and conservatives. Liberals may choose not to spend TANF resources on information systems which they may see as mechanisms only for better enforcement of the time-limits, and conservatives may favor spending because they view enforcement as the essence of the new program. Perhaps, as with the surveys for the quarterly reports, a bipartisan coalition can be put together that is concerned with monitoring the long-term impacts of the Act.

C. Existing Systems
 

From a statistical perspective, one of the advantages of this Act is that it did not drastically change the eligibility for two major social welfare programs, Medicaid and food stamps. (EN#23) Consequently, the data systems and the universe of program participants for these programs will not be substantially changed, and they will be available as baselines for studying the impact of TANF. Thus, a sample of those eligible for Medicaid before and after the introduction of TANF could be studied to see how the program has affected their lives. Similarly, a sample of food stamp families with minor children could be studied in the same way.
 

Some other existing systems include Unemployment Insurance data, Vital Statistics, foster care data, and tax data. The Unemployment Insurance system provides quarterly wages for individuals and information about employers. Vital Statistics include dates of birth and death and other basic demographic information. Foster care data in California (EN#24) include information on the children (birth date, sex, ethnicity, relation removed from) and the placement (removal-reason, location of placement, start-date, facility type, end-date, reason for exit). Tax data provide detailed information on wages and other income, receipt of the Earned Income Tax Credit, and some information on family structure. These tax data are very rich sources of information although they are not easily accessible because of confidentiality safeguards.
 

Another way to think of these various databases is that they cover three broad areas. Vital statistics, foster care data, and the new State Registry for Child Support may provide information on basic family structure. Unemployment Insurance, tax data, and the new Directory of New Hires can provide information on employment and wages. Finally, Medicaid, food stamps, TANF, and SSI can provide information on means-tested social program participation.
 

Because these datasets often contain complementary information, much can be learned by linking them. At UC DATA, for example, we have worked with the California Department of Social Services and the State Franchise Tax Board to link tax records, Unemployment Insurance data, and AFDC data as part of a study of the take-up rate for the Earned Income Tax Credit. (EN#25) We are now in the process of linking foster care data and AFDC information. (EN#26)

D. Conclusions from the Implementation Approach
 

Survey and administrative data produced to implement the Personal Responsibility and Work Opportunity Act of 1996 could be very useful in monitoring and evaluating the new programs established as part of the Act. If surveys are used by states to produce their quarterly reports, then these surveys could provide a flexible vehicle for monitoring the impacts of the legislation. Existing data systems such as Medicaid or food stamps may provide opportunities for tracking needy families who might otherwise disappear from welfare rolls because of the changes from AFDC to TANF. Existing data systems can also be used to produce ancillary information on family structure, employment, and wages that can be very useful in monitoring the impact of the legislation. Finally, as new case management systems are developed, efforts can be made to ensure that common data elements and common definitions of data elements are used across the counties and states.

III. Generic Problems of Monitoring and Evaluation and Some Solutions
 

The implementation perspective treats legislation much as we might trace the possible courses of a great river in flood. We examine the topography and presume that the legislation will follow the contours of the land. Fatalists might stop there, but with a bit of engineering, we can sometimes channel even the most rambunctious river. In the development of statistical data systems, there are two types of engineering that must be done. We must deal with the technical problems of constructing useful databases, and we must deal with the political and bureaucratic ones of coordinating their construction. In this section we try to identify some of the basic technical problems of monitoring and evaluation, many of them mentioned already, to see if we can find ways to overcome them. In the fifth section, we summarize our thoughts on the political and bureaucratic problems.

A. The Problems and Some Solutions
 

Choosing the Units and Universe for Analysis --- To answer any statistical question, the starting point is defining the unit of analysis. We have already discussed the complexities of monitoring individuals, especially children, in a system that operates in terms of cases. The problem is exacerbated by the fact that there are so many different definitions of a case. An AFDC case, food stamps household, or tax return might cover different subsets of people within the same "family." Certainly, one of the ongoing challenges is to improve our ability to sort out these different definitions and to develop ways to track children and adults within these programs.
 

Once we have decided upon the basic units of interest, say children or adults or families, then we must describe the universe that we wish to study. This is difficult for at least two reasons. First, it matters whether we sample the stock of people on welfare or the flow of people into it (or off it). It is well-known that a cross-section of people on welfare has a longer average spell-length and is less likely to get off welfare in the next month than a sample of new entrants to welfare. Yet, it is easy to sample a cross-section because administrative systems are usually designed as repeatedly updated cross-sections, but it is hard to sample by length of time on welfare because welfare databases have not typically kept track of this information. It can only be obtained by laboriously linking repeated cross-sections to create a longitudinal database. Second, we often care about demographic groups such as all legal immigrants, all disabled children, or all people below the poverty line instead of programmatic groups such as all legal immigrants on welfare or all disabled children on SSI, or all people on welfare. We often care about the demographic groups because we want to know the fraction of a population that is served by a program or what happens to those who are not served by it. This is often called the problem of obtaining "denominator data," but it is also the problem of getting some variation in the treatment so that we can determine what happens to those who get the program (the "treatment") and what happens to those who do not. Another closely related problem is comparing those in one program, say AFDC, with those in another program such as TANF.
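The difference between sampling the stock and sampling the flow can be illustrated with a deliberately contrived caseload. In the sketch below, every month three short spells (two months each) and one long spell (twenty-four months) begin. These numbers are assumptions made up for the illustration and are not estimates from any welfare file.

```python
from statistics import mean

# Each spell is (start_month, length_in_months). Every month, three
# 2-month spells and one 24-month spell begin (contrived numbers).
spells = []
for start in range(48):
    spells += [(start, 2)] * 3
    spells.append((start, 24))

def entry_cohort(spells, month):
    """Flow sample: lengths of all spells that begin in the given month."""
    return [length for start, length in spells if start == month]

def cross_section(spells, month):
    """Stock sample: lengths of all spells in progress during the given month."""
    return [length for start, length in spells if start <= month < start + length]

flow_mean = mean(entry_cohort(spells, 30))
stock_mean = mean(cross_section(spells, 30))
print(flow_mean, stock_mean)
```

Among new entrants in month 30 the mean spell length is 7.5 months, while among everyone on aid in month 30 it is 19.6 months. The cross-section over-represents long spells simply because long spells are in progress during more months.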
 

The sampling of stocks and flows can be facilitated by constructing longitudinal relational databases from which cross-sections, new entrants, new exits, or any other group can be easily hived off. If all we care about is rates of participation in programs, we can often solve the "denominator" problem by taking census or other data on some demographic groups of interest and comparing the number of people in each demographic group in the census with the number in our programs. This solution sometimes founders because the definition of a group in one source of data is different from that in another, but with some effort and a bit of artifice this can often be overcome. The problem of getting some variation in the treatment is often a difficult one (and once solved there is often the additional problem of selection into the treatment), but it can sometimes be solved by finding a data source that can be linked with the program participation data and which is either a superset of those in the program or an intersecting set which includes some people in the program and some outside of it. As noted earlier, because Medicaid has not been substantially changed by the new legislation, a sample of people receiving Medicaid before the welfare program changes should be similar (EN#27) to a sample of people receiving Medicaid after the changes. By comparing AFDC usage in the Medicaid sample selected before the changes in welfare with TANF usage in the Medicaid sample selected after the changes, we should be able to see in what ways a group of needy individuals (as indicated by their enrollment in Medicaid) is affected by the changes in the welfare program. (EN#28)
 

Describing the Treatment --- A remarkable feature of the Act is that it says a great deal about what it wants to achieve, but very little about how it wants to achieve it (except perhaps in the section on child support). There is virtually no specification of what programs should be implemented to move families from welfare to work, or what should be done to reduce teenage pregnancies beyond abstinence education. This provides the states with a tremendous opportunity to innovate, but it also presents those monitoring these programs with a tremendous problem of knowing what the treatment is.
 

There are two levels to this problem. On a state by state level, some effort must be put into keeping track of the programs that are devised. This in itself may be a substantial job, as indicated by those who have tried to document the details of the federal waiver process. Unfortunately, however, even if this can be done, it will not be enough because programs may differ substantially from person to person within a state. This may be because some people will get more services than others, because some counties offer different programs, or because eligibility for programs will be tailored to the individuals. In any case, this means that individual level data will be needed to assess the real impact of the new welfare programs. This poses a tremendous challenge to those monitoring the program because of the difficulties of linking data on community service or employment and other programmatic information with participation in welfare. For example, in the Cal-Learn program in California for pregnant and parenting teens, the treatment consists of case management services and monetary sanctions and rewards to encourage teens to finish high school. To describe the Cal-Learn treatment fully, UC DATA has had to obtain data from AFDC files where there are records of monetary sanctions and rewards, from GAIN files where there are records of supportive services and of recommendations to provide bonuses or sanctions, and from the Adolescent Family Life Program files where there are records of case management services.
 

Linking over Time, Programs, and Space --- Linking over time, programs, and space can greatly increase the power of a statistical system. Yet each kind of linkage presents particular problems and opportunities.
 

Linking over time creates a longitudinal database which is especially useful for understanding the dynamics of program participation. This might seem straightforward, but it requires some rules for following cases and some understanding of the ways that files are updated. As for cases, what should be done when a case splits up or seems to disappear? We have followed the rule of searching for the youngest child in the original case and continuing with that person on the grounds that the youngest child is most likely to continue receiving assistance. But other, and possibly better, rules are possible. Understanding the way files are updated is important because cases may regularly disappear at some calendar date only because bureaucratic routines call for cleaning out discontinued cases at that time. Or updating may lead to some clerical errors in identifiers so that attempts must be made to search for cases that have continued but with a different identifier. Or cases may be assigned different case numbers from one spell of welfare to the next.
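The youngest-child rule can be sketched in a few lines. The sketch assumes that each monthly cross-section has already been reduced to person-level records with a stable person identifier; the field names and the sample records are hypothetical. It picks the youngest member of the original case by birth date and then reports the case in which that person appears in each later month.

```python
def youngest_member(records, case_id, month):
    """Return the person_id of the youngest person in a case in a given month."""
    members = [r for r in records
               if r["case_id"] == case_id and r["month"] == month]
    # ISO-formatted dates sort correctly as strings; the latest birth date is the youngest.
    return max(members, key=lambda r: r["birth_date"])["person_id"]

def follow_case(records, case_id, start_month):
    """Follow the youngest member of the original case through later months."""
    anchor = youngest_member(records, case_id, start_month)
    path = {r["month"]: r["case_id"] for r in records
            if r["person_id"] == anchor and r["month"] >= start_month}
    return anchor, dict(sorted(path.items()))

# Made-up records: case A splits in the second month and the youngest
# child (P3) reappears in a new case B with another adult (P4).
records = [
    {"month": "1993-01", "case_id": "A", "person_id": "P1", "birth_date": "1965-03-01"},
    {"month": "1993-01", "case_id": "A", "person_id": "P2", "birth_date": "1988-06-15"},
    {"month": "1993-01", "case_id": "A", "person_id": "P3", "birth_date": "1991-09-30"},
    {"month": "1993-02", "case_id": "A", "person_id": "P1", "birth_date": "1965-03-01"},
    {"month": "1993-02", "case_id": "A", "person_id": "P2", "birth_date": "1988-06-15"},
    {"month": "1993-02", "case_id": "B", "person_id": "P4", "birth_date": "1960-11-20"},
    {"month": "1993-02", "case_id": "B", "person_id": "P3", "birth_date": "1991-09-30"},
]

anchor, path = follow_case(records, "A", "1993-01")
print(anchor, path)
```

Months in which the anchor person does not appear at all are simply absent from the result. Whether such a gap is a true exit or an artifact of file updating has to be settled separately, as the paragraph above notes.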
 

Linking across programs or datasets can greatly increase the possibilities for analysis, but it usually requires linkage of identifiers, such as names, that might be recorded in quite different ways. The field of probabilistic matching has developed a great deal in the last decade, so there are now very useful algorithms for determining the likelihood that one case is the same as another based upon the degree to which a set of identifiers is the same in the two cases. (EN#29) This still requires that the designer of the system choose the set of identifiers and that the designer decide how to use information about the likelihood of a match. In one model, a match is considered to have been made if the likelihood exceeds a threshold and from then on the records from the two files are treated as if they were about the same case. Alternatively, if a subset of the records can be matched accurately (which may be possible through intensive examination and investigation), then this information can be used to build a model for imputation and editing the rest of the data.
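The threshold model can be sketched as follows. This is one common formulation of probabilistic matching, usually associated with Fellegi and Sunter, and is not necessarily the algorithm used in any system discussed here. For each identifier, m is the assumed probability that the field agrees when two records truly refer to the same person, and u is the assumed probability that it agrees by chance when they do not. The m and u values, the choice of fields, and the threshold below are all illustrative assumptions; in practice they would be estimated from the data.

```python
import math

# field: (m, u). Illustrative values only, not estimates from real files.
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_date": (0.97, 0.003),
    "sex": (0.99, 0.5),
}

def match_weight(a, b):
    """Sum of log2 likelihood ratios over the identifiers."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total

def is_match(a, b, threshold=8.0):
    """Treat two records as the same person if the weight reaches the threshold."""
    return match_weight(a, b) >= threshold

rec_a = {"last_name": "GARCIA", "first_name": "MARIA", "birth_date": "1970-04-12", "sex": "F"}
rec_b = {"last_name": "GARCIA", "first_name": "MARY", "birth_date": "1970-04-12", "sex": "F"}
rec_c = {"last_name": "NGUYEN", "first_name": "LAN", "birth_date": "1968-01-05", "sex": "F"}

print(is_match(rec_a, rec_b), is_match(rec_a, rec_c))
```

Here the first pair is accepted despite the differently recorded first name, because agreement on the rarer identifiers outweighs one disagreement, while the second pair agrees only on sex and is rejected.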
 

Linkage across datasets could be dramatically improved if efforts were made to develop common identifiers. We have already noted that the Act provides for a substantial amount of matching by Social Security number, and it calls for a study of counterfeit-resistant Social Security cards. The use of Social Security numbers, of course, is not foolproof because of mispunches and other problems that can arise. An alternative or complementary approach is to require records to have enough individual information such as name, sex, date of birth, mother's maiden name, or other information to facilitate matching. The California Health Information Policy Project has championed this approach and gotten some support for it. All of these methods, of course, raise sensitive issues of confidentiality.
 

Linkage across programs sometimes provides multiple sources of information on the same data element. This makes it possible to get a better understanding of how the method of data collection affects the data element. For example, Henry E. Brady and Samantha Luks have explored the differences between survey responses on the length of welfare spells and administrative data on the receipt of welfare. (EN#30) They have found evidence for social desirability bias in survey responses, with respondents reporting shorter spells than recorded in the administrative data, and evidence for administrative churning in the administrative data, in which one- to two-month interruptions in aid reflect late or misplaced paperwork rather than true interruptions. In a study of the Earned Income Tax Credit, information on earnings from tax records, AFDC files, and surveys provides a chance to see if respondents report different earnings because of different incentives in the programs. (EN#31)
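One way an analyst might allow for administrative churning when building spells is to bridge short gaps. The sketch below is an illustration under stated assumptions, not the procedure used in the study cited above: months are numbered consecutively, and the `max_gap` parameter (hypothetical) sets how many consecutive missing months are treated as paperwork delays rather than true exits.

```python
def spells_from_months(months_on_aid, max_gap=0):
    """Collapse months of aid receipt into (first_month, last_month) spells.

    Gaps of max_gap months or fewer are bridged and treated as part of one spell.
    """
    spells = []
    for m in sorted(set(months_on_aid)):
        if spells and m - spells[-1][1] - 1 <= max_gap:
            spells[-1][1] = m      # extend the current spell
        else:
            spells.append([m, m])  # start a new spell
    return [(first, last) for first, last in spells]

# Aid received in months 1-4, 6-8, and 12-13 (made-up history).
history = [1, 2, 3, 4, 6, 7, 8, 12, 13]

raw_spells = spells_from_months(history)
bridged_spells = spells_from_months(history, max_gap=2)
print(raw_spells, bridged_spells)
```

Taken literally, the history contains three spells. Bridging gaps of up to two months merges the first two into a single spell covering months 1 through 8, while the three-month gap before month 12 still counts as a true exit.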
 

There are two ways that files can be linked across space. One is simply to look for the same individuals in different jurisdictions so that they can be followed if they move. This is essentially another version of the matching problem described above. A second way that data can be linked across space is to connect Census or other information that is available on a geographic basis with individual records. This can be done by geocoding addresses (which raises additional problems in matching), by using Zip codes, or by using other information about geographic location. These kinds of information can help us understand how context affects individual behavior. Hilary Hoynes, using UC DATA information, has shown how the availability of jobs in local areas affects the employment prospects of welfare recipients and their ability to get off welfare. (EN#32)
 

Gathering Outcome Data and "Control" Variables --- Getting people off welfare is an explicit goal of TANF, and the Act proposes to do this by preparing them for work. In the past, without time-limited welfare, a transition off welfare was clearly a good thing because it indicated that assistance was no longer needed, but with time-limited welfare, people may leave welfare without being prepared for work, without having any job available, or without having overcome the difficulties that led them to seek assistance in the first place. This means that data on outcomes other than simply leaving welfare must be collected. In fact, information on the reason for leaving welfare is required for the Quarterly Reports (see Figure 4), but ideally we would like to have additional information on job prospects and current quality of life.
 

Quality of life information is especially important with respect to the children in the case. As we have already noted, welfare information systems have often neglected to collect much information on children. Yet, it is of utmost importance to know what happens to them. The Act seems to recognize this and there is a provision for studies of the circumstances of children of families that timed-off welfare and of teenage parents and their children (page 53). The studies have to consider the incomes, educational attainment, employment, criminal behavior, fertility, and social program participation of these groups. Outcomes that are not mentioned, but which might be equally important, are nutrition, adequacy of housing, adequacy of health care, child abuse and neglect, and movement to foster care or adoption.
 

As well as getting outcome variables, it is very important to record the characteristics of recipients which might increase or decrease their ability to leave welfare or to be successful once they leave. Educational attainment, job training, disabilities, marital status, and number of children are some of the most important characteristics. Many of these are often very badly measured by administrative data systems so that it is hard to do analyses which take them into account. One of the advantages of surveys is that they can capture this information.
 

Data Quality and Missing Data --- Missing data, in the form of either unit non-response or item non-response, (EN#33) has always been a major problem for survey researchers, but it may be an even bigger problem for those designing administrative data systems. Administrative data systems often have tremendous gaps in the reporting of some items --- especially those that are unrelated to the business purpose of the data system, and individuals or cases sometimes get lost because of faulty matching. In addition, administrative data systems often suffer from severe problems of non-comparable data, poor documentation, and unreliable data. These problems are reduced in sample surveys through the use of a uniform instrument, careful documentation, and the thorough training of interviewers and coders. There are well-known ways to deal with missing data, but non-comparable data pose even greater challenges.

B. Survey Data, Administrative Data, and Linking
 

Surveys versus Administrative Data --- There is no one-time fix-up to all of these problems, and no one means of data collection is unequivocally better than another. Administrative data, for example, may have its weaknesses, but it also has great strengths such as large sample sizes and being an excellent record of certain kinds of events. Figure 7, reproduced from a study that UC DATA did for the Division of Workers' Compensation of the State of California, (EN#34) compares administrative data with sample surveys along three dimensions. "Data" refers to the amount and type of data that can be collected. "Cases" refers to the number of observations and the degree to which they are representative of the universe of interest. "Times" refers to the frequency and schedule of data collection. (EN#35)
 

Administrative databases often have only a small amount of information on each case compared to surveys, (EN#36) but what is there for business purposes is often of superior quality. For example, the kinds of data that people often have trouble remembering in an interview, such as the exact amount of their benefits or the dates on which they received assistance, are carefully recorded in administrative databases which are designed to keep track of these facts. Unfortunately, those data which are collected in administrative databases but which are not essential for business purposes, for example educational attainment or race, are often of inferior quality. (EN#37) Administrative databases are often richer in the description of services --- receipt of benefits, leaving welfare, preparation for work, job training, child care --- than in two other important types of information. They often contain little on the characteristics of people, situations, or events such as educational attainment, job history, or disability that might explain why the individual needs the service, and they seldom contain outcome data, such as quality of life measures concerning the adequacy of parenting, health care, nutrition, or housing, that might more fully characterize the situation of the individual. It is true that the receipt of some services, such as job training, might explain why some others, such as welfare assistance, are eventually no longer needed, but by and large, surveys must be used to collect background information and detailed outcome measures. Surveys are also useful if information needs change over time because it is much easier and less costly to rewrite a survey instrument than to change an ongoing administrative database.
 

With respect to cases, administrative databases are usually superior to surveys because they include information on an entire universe of cases, although this can present problems of confidentiality. Finally, administrative databases and surveys differ in the timing of data collection. Administrative systems collect data as part of the ongoing administrative process. This is an advantage insofar as it ensures that these events are recorded in a timely manner before memory loss or other events obscure them. But it is a disadvantage insofar as it means that there is often no observation of the case during a "normal" period between important administrative events. This means that important changes in the case can remain invisible to these systems.
 

Linking Datasets --- What would the best possible dataset look like? Let us describe datasets by how much of the three-dimensional space they fill up in Figure 8. In this figure, the vertical axis is the number of variables or the amount of data, the axis into the plane of the picture is the number of observations or cases, and the horizontal axis is time. The best dataset would provide us with as much data, as many cases, and as many time periods as possible --- it would fill up the entire space. Neither administrative data nor surveys can do all of these things at once.
 

This suggests a hybrid approach. Why not link surveys and administrative data to get the advantages of each method? In fact, why not link several surveys to one another and several administrative datasets to one another, and then link the surveys to the administrative data? Figure 8 is a schematic of how we have done this to create the California Welfare Research Databases. The picture (which one wag described as "Downtown Pittsburgh") shows how UC DATA and the California Department of Social Services have linked state level administrative data (the Longitudinal Data Base), to county level administrative data about 15,000 research families from four counties (the County Welfare Administrative Data Base), and to several in-depth panel surveys (the Panel Survey Data Base) from a 15% sample of those in the four county database and from a foreign language survey of all language groups comprising one percent or more of the entire 15,000 research families. Taken alone, each of these datasets was constructed by putting a number of files together, and they required a great deal of linking, editing, cleaning, and documenting. We describe them and their uses in more detail in the next section.
 

IV. Specific Examples of What Might be Done

A. The California Experience
 

The California Work Pays Demonstration Project --- In the last four years, the Research Branch of the California Department of Social Services (CDSS), University of California Data Archive and Technical Assistance (UC DATA), and the Survey Research Center (SRC) at the University of California, Berkeley have been working together on California's federal AFDC waiver, the California Work Pays Demonstration Project (CWPDP). It would take us too far afield to describe California's waiver in detail, but the waiver has two major components: (1) changes in the calculation of grants meant to encourage work effort such as waiving the 100 hour work rule for unemployed parent cases and rescinding the four month limitation on the $30 and 1/3 income disregard and (2) the Cal-Learn program to encourage pregnant and parenting teens to finish high school by providing monetary incentives for good grades and disincentives for getting bad grades or dropping out of school, and case management services to help teen parents get access to services and manage their lives.
 

When the first parts of the California waiver came into effect on December 1, 1992, UC DATA was asked to design and implement a series of data collection strategies for an experimental evaluation of the work incentives feature of the waiver. The many elements of this design are summarized in Figure 9. (EN#39) Its central feature (see the rectangle with a dashed line around it near the top of the picture on the left) is the designation of 15,000 cases on AFDC in four counties (Alameda, Los Angeles, San Bernardino, and San Joaquin) as research cases. Choosing these cases was greatly simplified by the existence of the state-wide MEDS file (see the upper left-hand corner of Figure 9) which has records on all Californians eligible for Medi-Cal. MEDS delineated our universe of potential research subjects because all AFDC recipients automatically appeared on it through their eligibility for Medi-Cal.
 

Ten thousand of the 15,000 research cases were placed in an experimental group which, like the rest of the welfare population, was subject to the rule changes, and five thousand were kept on the rules that were fixed as of September of 1992. For them it was as if the welfare law never changed. Though the U (unemployed parent) cases, which are two-parent families, constitute only about 5-7% of the welfare caseload, they were oversampled for the experiment because, as a group, they were expected to be more responsive to the work incentive features of the program. (Adults in this type of case tend to be employed more.) One third of the CWPDP sample was drawn from the unemployed parent cases. (EN#40)
 

A supplemental group of 4,000 pregnant and parenting teens has been drawn to evaluate the Cal-Learn component of the CWPDP. Each of these 4,000 is being assigned to one of four cells in a two-factor experimental design. One of the two factors is case management services and the other is monetary sanctions and incentives. Teens assigned to one cell get both case management services and monetary sanctions and incentives, those assigned to another cell get neither, and those assigned to one of the other two cells get one of the treatments but not the other. Thus, the impacts of both case management services and monetary sanctions or incentives can be evaluated with this design.
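The two-factor layout can be sketched as a small assignment routine. The sketch assumes simple random assignment with equal cell sizes and uses made-up identifiers 0 through 3,999; the actual Cal-Learn assignment procedure and cell sizes are not described here and may differ.

```python
import random
from collections import Counter
from itertools import product

# The four cells: (case_management, sanctions_and_incentives).
CELLS = list(product([True, False], repeat=2))

def assign_cells(ids, seed=0):
    """Shuffle the ids and deal them evenly across the four cells."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return {pid: CELLS[i % len(CELLS)] for i, pid in enumerate(shuffled)}

assignment = assign_cells(range(4000))
cell_counts = Counter(assignment.values())
print(cell_counts)
```

With this layout, the effect of case management can be estimated by comparing the two cells that receive it with the two that do not, and likewise for sanctions and incentives. The cell that receives both treatments also allows a check on whether the two interact.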
 

The CWPDP Datasets --- In order to develop the best possible tools to evaluate the CWPDP, a number of datasets have been constructed based upon the MEDS file and the over 15,000 research families in the four counties. These datasets (EN#41) are:
 
 
 

Longitudinal Databases (LDB) --- Ten percent and one percent samples of cases and persons have been taken from the MEDS file from 1987 to 1995. (See the upper right hand corner of Figure 9.) These samples are of all Californians who are enrolled in Medi-Cal, and they are constructed to be continuously updated rolling cross-sections with continuous monitoring of families once they get on aid. This continuous follow-up provides the longitudinal component to the data. Data on quarterly earnings from the state Unemployment Insurance data files have been added to confidential versions of these files. The major features and data elements in these files are summarized in Figure 10.
 
 
 

Research Sample Longitudinal Database (Sample LDB) --- The MEDS files have also been used to construct a longitudinal database for the 15,000 research cases. This file has the same information as the LDB described above.
 
 
 

County Welfare Administrative Database --- This dataset provides information derived from county AFDC and food stamps databases on the 15,000 research subjects. On Figure 9 it appears near the bottom of the page on the left where it is called the Uniform Database. Our initial hope had been that we could create a truly uniform set of codes and variables across all four counties, but as shown earlier in Figures 5 and 6, the county AFDC and food stamp case management systems were simply too different to make this possible.
 
 
 

Panel Surveys --- Two waves of in-depth telephone interviews with a 15% subsample of the 15,000 original research cases, or about 2,250 female heads of assistance units who speak English or Spanish, have been conducted. In addition, two waves of a parallel foreign language survey of 1,350 people who speak Armenian, Cambodian, Laotian, or Vietnamese have been finished. These four language groups were chosen because out of the 15,000 research cases, each of them constituted one percent or more of the sample. The English-Spanish and Foreign Language surveys ask basically the same questions, but the Foreign Language survey includes some additional items about refugee status, including ESL classes and camp experiences. The content of the surveys is summarized in Figure 11.
 

These surveys include background information and outcome information that is almost never available from administrative data systems. This includes questions about education, AFDC history, work history, housing quality and stability, economic hardship, hunger, respondent's and child's health and disabilities, labor market activities of partner/spouse, income, child support, knowledge and use of child care, and knowledge of work incentives. The rate of interview refusal is extraordinarily low, and the greatest problem with conducting the interviews is locating the respondents.
 
 
 

Cal-Learn Studies --- A number of datasets are being developed for the Cal-Learn study. Administrative data from a variety of sources, including AFDC, GAIN (the California JOBS program), and Adolescent Family Life Programs, have been put together into the Cal-Learn Administrative database. There are two Cal-Learn surveys. There is a "Retrospective Survey" of those in the Cal-Learn program who have already had children (see the left-hand side of Figure 9) and a "Prospective Survey" (see the bottom right of Figure 9) for those teens at risk of becoming teenage parents. In the prospective survey, teens who are potentially, but not yet, eligible for Cal-Learn (i.e., eleven to seventeen year old sons and daughters of adults in the WPDP English/Spanish telephone survey) will be interviewed by telephone.
 
 

Our experience in California (and the experience of others around the country) has demonstrated the possibility of linking administrative datafiles to construct research quality longitudinal datasets, and the added benefits to be gained by conducting surveys that can be linked with the administrative data. We have found that administrative datafiles become more and more useful as they are extended in time to create longitudinal datasets, as they are linked together to provide more variables, and as they are cleaned and documented to make them readily accessible. Our datasets have been designed so that they can be linked, so that they complement one another, and so that they provide information on important policy issues such as teenage parenting, quality of life for welfare recipients, disabilities, job preparation, and employment. We have found that they provide the basis for monitoring many aspects of the welfare system and for answering very diverse research and evaluation questions. We consider some of them in the next section.

B. Answering Specific Questions
 

In this section we will consider how three important questions --- disability and AFDC, the use of the Earned Income Tax Credit by welfare recipients, and the role that job availability plays in exiting from welfare --- have been approached using the California Work Pays Demonstration Datasets.
 

Disability and AFDC --- Although TANF allows states to exempt up to 20 percent of their caseload from the five year limit on assistance due to conditions such as disability, we know surprisingly little about the actual impacts of disability on the receipt of welfare. Probably the major reason for this is that there are very few datasets which link information on disability to information about AFDC receipt. Henry Brady, Marcia Meyers, and Sam Luks are using the CWPDP data to investigate the impact of child and adult disabilities on the duration of welfare spells. (EN#42)
 

The CWPDP surveys were designed to ask detailed questions about disabilities of children and mothers. This alone, however, would not have been that useful because the surveys only reliably indicate AFDC status at a point in time. Linking the panel surveys with the Research Sample Longitudinal Database provides reliable retrospective AFDC history back to 1987. Linking the panel surveys with the County Welfare Administrative Database (CWAD) provides reliable information on AFDC history after the interview dates. In addition, efforts are being made to link the County Welfare Administrative Database to state Supplemental Security Income files to check on SSI status for each of the families in the CWAD (and in the panel surveys which are nested within the CWAD). At the moment, the researchers are relying upon survey responses about SSI status.
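
The linkage just described can be pictured with a small sketch. It assumes that each survey respondent carries a case identifier that also appears in the administrative files, and that those files list the months in which a case received aid. The records, field names, and identifiers below are hypothetical and are not the actual CWPDP file layouts.

```python
from datetime import date

# Hypothetical records; the field names are illustrative only.
survey = [
    {"case_id": "A1", "interview_month": date(1994, 3, 1)},
    {"case_id": "B2", "interview_month": date(1994, 4, 1)},
]
# Months on aid from the longitudinal database (retrospective history).
ldb_months = {"A1": [date(1993, 11, 1), date(1993, 12, 1), date(1994, 1, 1)]}
# Months on aid from the county administrative database (follow-up history).
cwad_months = {"A1": [date(1994, 4, 1), date(1994, 5, 1)]}

def link_history(respondent, before, after):
    """Attach pre- and post-interview aid months to one survey record."""
    cid = respondent["case_id"]
    when = respondent["interview_month"]
    return {
        "case_id": cid,
        "months_on_aid_before": sorted(m for m in before.get(cid, []) if m < when),
        "months_on_aid_after": sorted(m for m in after.get(cid, []) if m > when),
    }

linked = [link_history(r, ldb_months, cwad_months) for r in survey]
```

A respondent with no match in either file, such as case "B2" here, simply ends up with empty histories; in practice, unmatched cases would need to be examined rather than treated as having no time on aid.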
 

There are few studies which directly examine the relationship between disabilities, work, and welfare receipt, but Acs and Loprest have used 1990 SIPP data to show that mothers with severe or multiple limitations are less likely than others to leave welfare for a job. They find little consistent evidence that the disability status of children affects these transitions. This may be because it matters what transitions are being studied. Using the CWPDP data, Brady, Meyers and Luks show that the disabilities of mothers and children do not seem to predict exits from AFDC, but they do predict the kind of exits that will occur. Simply put, disabilities of both mothers and children appear to simultaneously increase time on AFDC and to increase the likelihood that the case will move from AFDC to SSI. These two competing effects cancel one another out when we only look at exits from AFDC because families can exit into SSI or completely off AFDC and SSI.
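
A small numerical sketch may make the cancellation easier to see. The monthly exit counts below are made up for illustration and are not estimates from the CWPDP data. They are chosen so that disability lowers the rate of leaving aid altogether and raises the rate of moving to SSI by the same amount.

```python
# Made-up monthly exits per 1,000 AFDC cases; not estimates from the CWPDP data.
EXITS_PER_1000 = {
    "no_disability": {"off_aid": 50, "to_ssi": 2},
    "disability": {"off_aid": 37, "to_ssi": 15},
}

def summarize(exits):
    """Combine the two exit routes into the overall exit rate from AFDC."""
    total = exits["off_aid"] + exits["to_ssi"]
    return {"any_exit": total, "share_to_ssi": exits["to_ssi"] / total}

summary = {group: summarize(exits) for group, exits in EXITS_PER_1000.items()}

for group, row in summary.items():
    print(group, row["any_exit"], round(row["share_to_ssi"], 2))
```

In this sketch the overall exit rate from AFDC is identical for the two groups, so an analysis that counts every exit alike would show no effect of disability, even though the destinations of those exits differ sharply.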
 

Take-up of the Earned Income Tax Credit --- The EITC project has linked most of the CWPDP databases to State tax records to get a better understanding of how many welfare families take up the EITC. This study is relying upon the detailed income and earnings data available in the CWAD and in the LDB after it has been linked to UI data. There are very few reliable studies of take-up rates among poor people. The CWPDP databases provide a very large sample of poor people along with detailed information on their incomes. This makes it possible to do a detailed study of this subject.
 

Job Availability and Exits from Welfare --- One of the most important questions facing those implementing the new welfare bill is whether there will be jobs for those who seek to make the transition to work. In a recent paper using CWPDP data, (EN#43) Hilary Hoynes has demonstrated how transitions off welfare are facilitated by strong demand for labor and impeded by weak labor markets. Her study uses the LDB linked to local area data through zip codes and county of residence. The LDB provides a large enough sample to determine whether exits are affected by local labor market conditions.
 

V. Challenges to Getting Data Collection Strategies On-Line



We hope that the reader is convinced by this time that there are many opportunities for improving welfare data systems for monitoring, evaluation, planning and research. It should also be clear, however, that there are some real challenges to creating an improved system. In this section, we summarize some of those challenges.

A. Technical Challenges
 

Advanced Case Management Systems --- With time limits and other program mandates in TANF, it will be necessary to keep current disaggregated information available for all people in a case, including their ages, educations, workfare program participation dates and hours, school attendance, etc. in addition to case-specific data such as payments, child support, food stamp amounts, and Medi-Cal eligibility. Simply implementing the different mandates of the program will require a capability to make benefits contingent on a number of changes in the status of a case, or of the individuals in a case. Accurate and up-to-date information will be necessary on who is or is not attending school, or participating in an approved employment or community service activity, and for how long. As we described earlier, there are hours per week minimum standards for all TANF recipients engaged in work, workfare, or community service. Data systems must track all this, and do so accurately.
 

Information Sharing --- Extensive information sharing and/or data system interface will be necessary to meet the data reporting requirements of the Personal Responsibility and Work Opportunity Reconciliation Act of 1996. The sharing of information is mandated at two levels: Federal and State. Generally, all of the programs addressed in the Act will be required to share information about individuals and families with other government agencies. Child support agencies will be the recipients of data from the INS, government licensing bureaus, a multiplicity of government agencies, businesses, credit bureaus, the UI program and banks. Social security number matching, which has been relatively rare in a past that predominantly relied on agency case numbers for tracking, may become standard. If the provisions of this legislation are implemented, data needed to operate single programs will be located in the files of different agencies. (EN#44)
 
 

Survey Support --- If surveys are selected as a primary means of acquiring quarterly report information, one of the main challenges will be finding welfare recipients for interview within the required time frame. This task is frequently a time-consuming and labor-intensive process, and time is what States don't have with respect to quarterly reports. Survey response rates among welfare recipients are frequently problematic even under ideal conditions in which time is not a consideration, because recipients move frequently and information on their current addresses and telephone numbers is not updated. So even if surveys are conducted, the only way reports could be finished in time is if a sampling frame with current information is continuously updated and submitted to a centralized data bank as soon as it becomes available. This suggests that a centralized source of welfare data will be necessary to support a survey effort, as well as to collect additional program data. (EN#45)

B. Resources
 

Possibly the best approach to improving welfare data systems would be a dedicated TANF information system that would be designed specifically for the information needs of the TANF program. There are some real advantages to a new system. It is often easier to implement entirely new software designs than to update old programs that may have been written in the languages of long ago for different purposes entirely. A new system provides the opportunity to use the new technologies that we have mentioned in this paper. And a new system could be designed to maintain good locator information on TANF recipients which could facilitate the collection of information on recipients through quarterly surveys. But a new system will be very costly. The resources to create a new system do not have to meet the limit of fifteen percent on administrative costs, but they will have to be taken from resources that would otherwise go to the program itself. This suggests that TANF systems may be built upon existing systems or upon other systems that are better funded.
 

Although it is very unlikely that states would locate welfare data in child support agencies, the Child Support and Medicaid programs have special Federal funding for data system development, whereas TANF does not. State Medicaid systems together are allocated $500 million over the first 12 quarters of TANF to change their information systems because of the unlinking of welfare and Medicaid. The Child Support enforcement agencies will be given 90% of the planned cost of their new information systems submitted before the Personal Responsibility bill was signed. If information systems were located in either of these agencies, access to special funding would mean that states might not have to use funds from cash benefits or from employment or pregnancy prevention programs to develop their TANF information systems.
 

We do not know what strategy states will follow, but we do know that resource constraints will figure prominently in what they decide to do. One of our hopes is that by analyzing the possibilities for broadening state efforts beyond merely meeting the letter of the law, it will be possible to create systems that serve statistical needs as well as case management needs.

C. Political Will
 

Whether or not to use block grant funds for the development of surveys or new case management systems is partly a political choice. We have noted that liberals and conservatives might differ over the content of surveys and the utility of creating new case tracking systems. But we have also noted that there are strong incentives in TANF for the development of some sort of surveys and some sort of case management system. Furthermore, there are excellent bi-partisan reasons to want to know what happens to TANF recipients as they reach the end of their aid. Very few politicians want big surprises --- especially those that break the bank. A good statistical monitoring system can ensure that there is an early warning system.
 
 
 
 

D. Inter-Agency Agreement
 

The sharing of information between agencies is always a more or less challenging affair. Agencies differ in their confidentiality provisions, in their financial arrangements for data sharing, and in their responsiveness to one another concerning data requests. It helps a great deal to have personal contacts within the agency whose data is needed. It also helps to be prepared to pay for the requested data files, especially if programming time is required to create them. In California, the Medicaid eligibility system has provided extensive longitudinal files for research on welfare spell durations and for social program evaluation. Several studies based on the Medi-Cal records have included shared data from other organizations, as we discussed earlier. One of the major challenges in the development of new statistical systems is the negotiation of interagency agreements. One way to facilitate this would be to develop standard interagency agreements with similar specifications, so that authorized users of information may be held to a single standard. The problems here are formidable. There are many different agencies with very different enabling legislation.
 
 

E. Confidentiality
 

One of the major impediments to interagency agreement will be legitimate concerns with confidentiality. The Act seems to be of two minds on this issue. On the one hand it restricts disclosure and access to information in a number of places. (EN#46) But on the other hand the TANF portion of the Act has few references to privacy or confidentiality issues, except that as part of each State plan, a provision must be included that addresses how the State will, "Take such reasonable steps as the State deems necessary to restrict the use and disclosure of information about individuals and families receiving assistance under the program attributable to funds provided by the Federal Government" (page 10). (EN#47) Furthermore, as we have seen, the Act includes extensive provisions for matching data across datasets using Social Security numbers. It also has a provision for a study of a prototype counterfeit-resistant social security card so as to provide individuals with reliable proof of citizenship.
 

As researchers, we have a strong interest in access to data, but we have no interest in compromising the privacy of individuals. Hence, while we might see exciting research opportunities in the ability to link different datasets, we also see dangers of violations of confidentiality. We are not experts in this field, but we believe that one of the greatest challenges facing those who want to develop new systems for monitoring welfare reform is protecting people's privacy while maintaining a clear picture of how their lives have been affected by the new legislation.
 

As information is power, and the linkage of information systems is required under this legislation, much careful thought must go into developing, not only comprehensive, easy to understand, and uniform confidentiality requirements, but also ways to allow access to data for research and evaluation. Providing accurate information and good research on the many social groups affected by the Act can make an important contribution to improving policies in ways that can assist poor people.
 

VI. Conclusions



The concept of block grants to States carries a connotation of reduced paperwork, reporting, bureaucracy, and red tape. However, in the Personal Responsibility and Work Opportunity Reconciliation Act of 1996, there is a large and increasingly important Federal role in compiling information from the States for specially mandated studies, and for assuring that the programs defined and implemented differently in the States remain accountable to the intent of the Federal legislation. Because of the quantity and specificity of the Act's reporting requirements and because States will suffer serious financial penalties for not complying with them, the many changes under block grants may very well increase the need for data collection.
 

In this paper we have attempted to integrate an implementation perspective with the planning, monitoring, research and evaluation perspective. The implementation perspective looks at the way new programs are likely to be implemented and tries to develop systems that are compatible with the programs. This perspective tries to understand the motivations of the various groups and agencies involved in the implementation of the programs in order to be realistic about what can be accomplished. It has the defect of getting locked into the presuppositions of the legislation, or the world view of the actors involved.
 

In adopting the planning, monitoring, research and evaluation perspective, we have attempted to describe what needs to be done to get a reasonable strategy or strategies for data collection and a statistical system in place. This is a useful contrast to the implementation perspective, but it can falter because it fails to understand the intent of the legislation and the goals of the various actors.
 

We believe that both perspectives make a case for the approach we have taken at UC DATA in our work with the California Department of Social Services. By linking many different datasets, each with its own strengths and weaknesses, we can produce a more complete picture of what is happening in the lives of groups of people affected by welfare reform. As we move to a block granted welfare system and see an increasingly diverse set of state programs, let us not forget that there are many reasons why many of the nation's poorest families may end up outside of the new system. Some may be success stories; others may not be so successful. It is a task of welfare researchers in the next century to document where both groups end up.