Principal Research Scientist at Educational Testing Service
A common problem in applied educational research is estimating the causal effects of educational interventions or instructional experiences from non-experimental data. For example, teacher value-added estimation relies on statistical modeling to control for student background characteristics when estimating the effects of individual teachers on student achievement. An open question in this and related applications is whether and how measurement error in test scores from prior years should be accounted for when estimating treatment effects. The answer depends on whether the observed test scores and other student covariates are sufficient to account for relevant differences across observational treatment groups. This generally cannot be determined from the observed data alone. If analysts had access to direct measures or other proxy measures of achievement from earlier grades, such as student work, class grades, or teacher judgments of student proficiency, they could use these measures to test whether controlling for the information commonly available in longitudinal data systems is sufficient to balance groups with respect to prior achievement. Unfortunately, such measures are rarely included in the large-scale longitudinal databases available to researchers for secondary analysis.
We develop a novel method for using item-level response data from standardized tests, which are increasingly archived in longitudinal data systems, to proxy for such measures. The approach exploits the fact that item-level data are commonly not used as efficiently as possible to create the test scores reported to students, parents, and educators, and so contain information about latent achievement that the reported scores do not capture. We present empirical results from applying the method to data from a large urban school system, where we test the selection-on-observables assumptions about student-teacher assignments that are commonly made in teacher value-added estimation. We find these assumptions to be widely rejected. We consider the implications of these findings for how best to use observed test scores to balance non-experimental groups with respect to prior achievement. This is work in progress and is joint with Daniel F. McCaffrey at ETS.