Data Collection

Access to newly available Census Bureau data is a particular advantage of this study. We use four highly restricted Census Bureau datasets that are housed in nine Research Data Centers (RDCs) nationwide. To construct the dataset for this project, we merge administrative and survey data from the U.S. Census Bureau, the Bureau of Labor Statistics, and the National Center for Education Statistics. We also combine this dataset with detailed information on state ECEC regulations for family daycare homes, childcare centers, and prekindergarten programs. The databases are described below:

A. ECEC Availability Data

Combining the datasets below allows a fairly comprehensive examination of ECEC providers, including family daycare homes, centers, and pre-kindergarten programs.

  • Longitudinal Business Database:
    The Longitudinal Business Database (LBD), constructed by the Census Bureau, covers all non-farm private sector establishments as well as public sector firms in most industries, including ECEC. The data are compiled from tax return information and include any operating business that has employees and files a tax return, including government entities and tax-exempt establishments. Because it covers the entire universe of U.S. businesses with employees, the LBD offers a unique opportunity to provide a comprehensive portrait of employer childcare businesses that has not been available before.
  • Integrated Longitudinal Business Database:
    A key limitation of federal data for studies of ECEC is the exclusion of many family daycare home providers from datasets that track employees. The LBD includes only businesses with employees (known in Census terminology as employer businesses). An advantage of this study is our ability to link the LBD with the newly available Integrated Longitudinal Business Database (ILBD), which includes information on non-employer businesses (those without any employees, i.e., owner-operated businesses). Non-employer businesses make up 89 percent of the establishments in the childcare industry, so their omission from many previous studies of early childhood supply may be an important one (O’Neill & O’Connell, 2001). Because so much of the childcare industry consists of family daycare homes, which are often run by their owners and have no employees, adding the ILBD offers a more inclusive picture of the national childcare market than has previously been available.
  • Common Core of Data:
    Together, the LBD and the ILBD encompass the bulk of ECEC providers. However, to paint a complete picture of the availability of care, we must include preschool settings within public schools. To account for these programs, we will supplement the Census data with public school pre-kindergarten enrollment data from the Common Core of Data (CCD), a program of the National Center for Education Statistics (NCES). The CCD, which is available for every year of the proposed study, includes detailed data about every public school and district in the country. It reports annually the number of pre-kindergarten children enrolled at each school and the number of pre-kindergarten teachers in each district.
  • Longitudinal Employer-Household Dynamics:
    To examine the supply of childcare workers, we will include a third administrative Census dataset, the Longitudinal Employer-Household Dynamics (LEHD) Infrastructure Files. The LEHD provides detailed longitudinal data about childcare workers. Data in the LEHD come from unemployment insurance records, supplemented with information extracted from the Social Security Administration’s Numident file (the database containing application information for Social Security Numbers) and from the Statistical Administrative Records System (StARS), a Census Bureau dataset created to track home and work address information on U.S. residents. The data provide demographic information, including sex, age, place of birth, citizenship, and race, as well as place of residence, for all employees earning at least $1 from 1990 to 2005 in 40 participating states.
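
As a rough illustration of how the provider sources above might be combined, the sketch below stacks area-by-year counts of employer establishments (as in the LBD), non-employer establishments (as in the ILBD), and public pre-kindergarten enrollment (as in the CCD). The record layout, field names, area codes, and values are all assumptions made for illustration; they do not reflect the actual layout or contents of the restricted files.

```python
from collections import defaultdict

# Hypothetical area-year records. Field names, area codes, and values are
# illustrative only and do not come from the LBD, ILBD, or CCD.
lbd_rows = [  # employer establishments
    {"area": "C1", "year": 2000, "establishments": 4},
    {"area": "C2", "year": 2000, "establishments": 10},
]
ilbd_rows = [  # non-employer (owner-operated) establishments
    {"area": "C1", "year": 2000, "establishments": 36},
    {"area": "C2", "year": 2000, "establishments": 70},
]
ccd_rows = [  # public school pre-kindergarten enrollment
    {"area": "C1", "year": 2000, "prek_enrollment": 120},
]


def build_availability_panel(lbd, ilbd, ccd):
    """Combine the three sources into one record per (area, year)."""
    panel = defaultdict(
        lambda: {"employer": 0, "nonemployer": 0, "prek_enrollment": 0}
    )
    for row in lbd:
        panel[(row["area"], row["year"])]["employer"] += row["establishments"]
    for row in ilbd:
        panel[(row["area"], row["year"])]["nonemployer"] += row["establishments"]
    for row in ccd:
        panel[(row["area"], row["year"])]["prek_enrollment"] += row["prek_enrollment"]
    return dict(panel)


def nonemployer_share(cell):
    """Share of establishments in a cell that have no employees."""
    total = cell["employer"] + cell["nonemployer"]
    return cell["nonemployer"] / total if total else None


panel = build_availability_panel(lbd_rows, ilbd_rows, ccd_rows)
```

In this sketch an area with no CCD record receives an enrollment of zero, which conflates "no public pre-kindergarten" with "not reported"; a real analysis would likely need to keep those cases distinct.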

B. ECEC Regulations Data
This study draws on state policy and regulation information from a number of sources.

  • First, several researchers who have studied ECEC regulations (Blau, Hotz, Kilburn, and Xiao) generously shared their data on state regulations related to both childcare centers and family daycare homes. These data cover the years 1983 to 2000 (Hotz & Xiao, 2005). The regulations address a variety of structural characteristics pertaining to ECEC workers, including specific coursework, degree, training, and experience requirements. Other regulations include staff-to-child ratios and classroom size limits. We are in the process of updating these data through 2005 with information from the Child Care Licensing Study and the Family Child Care Licensing Study, reports produced on an annual or biennial basis by the Children's Foundation and the National Association of Regulation Research. Once the compilation is complete, we will have detailed information on state childcare regulations from 1983 through 2005.
  • Regulations for state preschools are often more stringent than those that govern other childcare centers or family daycare homes. Because these preschools now represent a significant presence in the ECEC sector in many states, it is important to account for the stronger regulations they face. To do this, we supplement the regulations data described above with information on preschool regulations reported in the State of Preschool Yearbooks, produced by the National Institute for Early Education Research, for the years 2003 to 2008. The Yearbooks include information on the standards and regulations in state preschool programs, including education and training requirements, class sizes, and staff-to-child ratios. Because the Yearbooks only began collecting information on state pre-kindergarten programs in 2003, they cover only the most recent years of our study, but they provide a valuable supplement to the data on state policies that we collect from other sources.
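
One way to picture the compilation described above is as a state-by-year panel assembled from sources that cover different periods. The sketch below is a minimal illustration under stated assumptions: the state code, field names, ratio values, and source labels are hypothetical, and the rule for overlapping years (keep the shared historical record) is an assumption rather than the procedure used in this study.

```python
# Hypothetical regulation records keyed by (state, year). The state code
# "S1", the field names, and all values are illustrative, not real data.
shared_regs = {  # shared historical data on centers and family daycare homes
    ("S1", 1999): {"center_ratio": 12},
    ("S1", 2000): {"center_ratio": 12},
}
licensing_regs = {  # updates compiled from the licensing studies
    ("S1", 2000): {"center_ratio": 11},
    ("S1", 2002): {"center_ratio": 11},
    ("S1", 2003): {"center_ratio": 10},
}
preschool_regs = {  # state preschool standards, available from 2003 onward
    ("S1", 2003): {"prek_ratio": 10},
    ("S1", 2004): {"prek_ratio": 10},
}


def build_regulation_panel(shared, updates, preschool):
    """Assemble one record per (state, year), tracking where it came from."""
    panel = {}
    for key, regs in shared.items():
        panel[key] = {**regs, "source": "shared"}
    for key, regs in updates.items():
        # Assumed rule: where both sources cover a state-year, keep the
        # shared historical record and ignore the update.
        panel.setdefault(key, {**regs, "source": "licensing_study"})
    for key, regs in preschool.items():
        # Preschool standards are added alongside any existing fields.
        row = panel.setdefault(key, {"source": "yearbook_only"})
        row.update(regs)
    return panel


reg_panel = build_regulation_panel(shared_regs, licensing_regs, preschool_regs)
```

State-years that no source covers (2001 in this toy example) are simply absent from the panel; the sketch does not interpolate across gaps left by biennial reports.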

C. Child Outcomes Data
The data on child outcomes for this study come from three sources: the National Longitudinal Survey of Youth 1979 Child Survey (NLSY79C), conducted by the Bureau of Labor Statistics; the Early Childhood Longitudinal Studies – Kindergarten Cohort (ECLS-K), conducted by the National Center for Education Statistics (NCES); and the Early Childhood Longitudinal Studies – Birth Cohort (ECLS-B), also conducted by NCES. All three sources offer nationally representative samples of children and include important measures of child outcomes. Because the studies were conducted differently, we outline the data collection methods and describe the key measures used in each in turn.

  • National Longitudinal Survey of Youth 1979 Cohort, Children's Sample:
    In 1979, the Bureau of Labor Statistics (BLS) began collecting data on a nationally representative sample of 12,686 men and women between the ages of 14 and 22. The data comprise what is known as the National Longitudinal Survey of Youth 1979 Cohort (NLSY79). The sample was selected in two parts: one subsample representative of the entire civilian population of the U.S. and another that oversampled minorities and economically disadvantaged non-minorities. Sampling for the first subsample was done by first identifying households within geographically delineated segments (subsets of block groups within primary sampling units). All households that were screened and found to have appropriately aged men and women in residence were given a base-year survey. The oversampling of minorities and economically disadvantaged youth was done in a similar fashion, by randomly sampling from a set of segments derived from primary sampling units selected on the socioeconomic characteristics of their residents. Seven years after the NLSY79 began, the BLS began collecting data on the children born to the women of the original NLSY79 sample. When the appropriate sample weights are used, these data, known as the Child Supplement (here NLSY79C), are representative of American children born to the population of women born in 1957 through 1964 and living in the United States in 1979.
  • Early Childhood Longitudinal Studies – Kindergarten Cohort (ECLS-K):
    The ECLS-K includes information on a nationally representative sample of kindergarteners in the 1998-1999 school year. A total of 21,260 children completed either the child assessment or the parent interview in the fall or spring of kindergarten. To ensure a nationally representative sample, the NCES selected students using a multistage probability sample design. In the first stage, counties or groups of counties (the primary sampling units) were chosen based on their size and on the race-ethnicity and per capita income of their residents. In the second stage, schools within the primary sampling units were selected based on whether they offered kindergarten (or something similar, such as transitional first grade) and in proportion to their size. Finally, in the third stage, the target was to sample approximately 24 students per school.
  • Early Childhood Longitudinal Studies – Birth Cohort (ECLS-B):
    Data collection for the ECLS-B began in 2001, when the children in the study were 9 months old. When sampling for the ECLS-B, the NCES used a list frame design in which it sampled births within primary sampling units (or secondary sampling units when necessary). The study was designed to be representative of all children in the U.S. born in 2001, except children born to mothers younger than 15 years of age and children who either died or were adopted prior to the nine-month assessment.
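
The role of sample weights in designs that oversample some groups, such as the NLSY79 described above, can be illustrated with a small worked example. The sketch below is a simplified illustration, not the weighting procedure of any of these surveys: the two groups, their population sizes, and the scores are hypothetical, and each weight is taken to be the number of people in the population that a respondent stands in for.

```python
# Hypothetical population with two groups; all numbers are illustrative.
# Group B is oversampled: 20% of the population but 50% of the sample.
population_size = {"A": 8000, "B": 2000}
sample = (
    [{"group": "A", "score": 100.0}] * 50
    + [{"group": "B", "score": 90.0}] * 50
)


def attach_weights(rows, pop_size):
    """Weight each respondent by population count / sample count of the group."""
    n_sampled = {}
    for r in rows:
        n_sampled[r["group"]] = n_sampled.get(r["group"], 0) + 1
    return [
        {**r, "weight": pop_size[r["group"]] / n_sampled[r["group"]]}
        for r in rows
    ]


def weighted_mean(rows, value="score", weight="weight"):
    """Weighted average of `value`, using `weight` as the sample weight."""
    total_weight = sum(r[weight] for r in rows)
    return sum(r[value] * r[weight] for r in rows) / total_weight


weighted_sample = attach_weights(sample, population_size)
unweighted = sum(r["score"] for r in sample) / len(sample)
weighted = weighted_mean(weighted_sample)
```

Here the unweighted mean is 95.0 because the oversampled group counts for half the sample, while the weighted mean is 98.0, matching the population mix of 80 percent group A and 20 percent group B.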

D. Community Characteristics

  • Decennial Census Data:
    We will supplement all other datasets in this study with data on the demographic characteristics of the areas in which providers operate. We have access to the restricted-use versions of the Decennial Census from 1990 and 2000. Each is a representative 1-in-6 sample of the population. The survey collects detailed information about respondents, such as gender, age, race, marital status, school enrollment, language spoken, citizenship, employment and wages, industry and occupation, income from various sources, and housing characteristics. Our access to the restricted-use versions also gives us very detailed information (down to the Census Block) about the geographic location of respondents' places of residence and employment.