The SGP package is a statistical analysis package that uses data derived from standardized tests to evaluate teachers’ effects on student achievement. It enables users to conduct several different types of analyses, which are typically carried out in two steps: the first is importing a LONG-format data set or an INSTRUCTOR-STUDENT lookup file, and the second is running the analysis.

The results of these analyses are presented in a variety of formats, including graphs and tables. In addition, they include information about the reliability of the estimates.

Measurement error is a common source of bias in teacher-level aggregated SGPs. It arises because prior test scores are error-prone measures of students’ latent achievement, so when the estimated teacher effect is regressed on prior test scores and student background variables (e.g., ethnicity and socioeconomic status), spurious relationships appear. This bias can be reduced through measurement error corrections such as those discussed in Wooldridge (2002). However, if true SGPs are correlated with student background characteristics, such corrections will not remove those relationships.
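The attenuation mechanism behind this bias can be illustrated with a small simulation (a sketch with made-up parameters, not the model used in the study): regressing current scores on an error-contaminated prior score shrinks the slope toward zero by the reliability ratio, and dividing by that ratio applies the classical disattenuation correction of the kind Wooldridge (2002) discusses.

```python
import random
import statistics

random.seed(42)
n = 50_000
beta = 1.0        # true effect of prior achievement on the current score
err_var = 0.25    # variance of the measurement error in the prior score

# Latent prior achievement and its error-contaminated observed version.
prior_true = [random.gauss(0.0, 1.0) for _ in range(n)]
prior_obs = [x + random.gauss(0.0, err_var ** 0.5) for x in prior_true]
current = [beta * x + random.gauss(0.0, 1.0) for x in prior_true]

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

naive = ols_slope(prior_obs, current)

# Classical attenuation: the slope shrinks by the reliability ratio
# var(true) / var(observed) = 1 / (1 + err_var) = 0.8 here.
reliability = 1.0 / (1.0 + err_var)
corrected = naive / reliability   # disattenuated estimate

print(f"naive slope:     {naive:.3f}")      # close to beta * 0.8
print(f"corrected slope: {corrected:.3f}")  # close to beta
```

With noisier prior scores the reliability ratio falls and the naive slope shrinks further, which is why error-prone conditioning variables leave systematic bias in the aggregated estimates.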

Our analyses of data from the National Longitudinal Survey of Youth, 2001 and a cross-section of school districts in the United States show that the variance from this source of bias is large enough to account for a substantial fraction of the spread between quantiles of student achievement. In particular, the spread between the 0.10 and 0.90 quantiles is (46.9, 53.5) in Figure 2, compared with (44.8, 55.4) in Figure 3.

It is also possible to mitigate this source of bias by conditioning on proxies for the students’ latent traits rather than on the observed prior scores alone. Although the variance due to this source of bias cannot be eliminated entirely, our analysis shows that this approach significantly lowers the RMSE for conditional mean estimates of e4,2,i in a model that regresses on the prior test scores, teacher fixed effects, and latent variables.
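Why conditioning on better proxies for the latent trait lowers RMSE can be sketched with an illustrative normal shrinkage model (assumed parameters, not the study’s actual e4,2,i model): the conditional mean of a latent trait given k noisy test scores has an error that shrinks as k grows.

```python
import random

random.seed(7)
n = 20_000
err_var = 0.5   # variance of the error in each individual test score

def rmse_of_conditional_mean(k):
    """Empirical RMSE of the conditional (posterior) mean of a standard
    normal latent trait given k noisy scores of it."""
    sq = 0.0
    for _ in range(n):
        theta = random.gauss(0.0, 1.0)
        scores = [theta + random.gauss(0.0, err_var ** 0.5) for _ in range(k)]
        sbar = sum(scores) / k
        est = sbar * k / (k + err_var)   # normal-theory shrinkage estimate
        sq += (est - theta) ** 2
    return (sq / n) ** 0.5

# Theory: RMSE = sqrt(err_var / (err_var + k)), so about 0.577 for k=1
# and about 0.333 for k=4.
print(rmse_of_conditional_mean(1))
print(rmse_of_conditional_mean(4))
```

The shrinkage factor k / (k + err_var) is the posterior-mean weight when the trait has unit variance; the same logic is why richer proxies for the latent trait yield more accurate conditional mean estimates.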

We also find that the reliability of e4,2,i is much higher when conditioning on all of the students’ test scores than when conditioning on only a subset of them. These findings matter when weighing the interpretation and transparency benefits of aggregated SGPs, and should be considered when selecting a method to improve their accuracy.
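The reliability pattern can be checked with the same kind of toy model (assumed normal scores with equal error variance, not the study’s data): the squared correlation between the latent trait and the mean of k scores rises with k, matching the classical ratio k / (k + err_var).

```python
import random

random.seed(3)
n = 20_000
err_var = 0.5   # variance of the error in each individual test score

def reliability(k):
    """Squared correlation between a standard normal latent trait and
    the mean of k noisy scores of it; theory predicts k / (k + err_var)."""
    thetas, means = [], []
    for _ in range(n):
        t = random.gauss(0.0, 1.0)
        thetas.append(t)
        means.append(sum(t + random.gauss(0.0, err_var ** 0.5)
                         for _ in range(k)) / k)
    mt = sum(thetas) / n
    mm = sum(means) / n
    cov = sum((a - mt) * (b - mm) for a, b in zip(thetas, means))
    vt = sum((a - mt) ** 2 for a in thetas)
    vm = sum((b - mm) ** 2 for b in means)
    return cov * cov / (vt * vm)

# Theory: about 1/1.5 = 0.667 for k=1 and 4/4.5 = 0.889 for k=4.
print(reliability(1))
print(reliability(4))
```

Averaging over more scores suppresses the measurement-error component of the variance, which is the same reason conditioning on all of a student’s test scores yields more reliable estimates than conditioning on a subset.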

In addition, the reliability of e4,2,i is higher for the grade 7 and 8 cohorts than for the grade 4 cohort. This suggests that the underlying latent trait may have a stronger influence on the performance of students in the higher grades than in the lower ones.

The results of this study suggest that the value-added modeling approaches described above may be a better way to evaluate teacher effectiveness than aggregated SGPs. Because these methods can remove the bias in aggregated SGPs that arises from measurement error, they offer a stronger basis for assessing teacher performance.

Because removing this source of bias from aggregated SGPs is so important, the costs and benefits of alternative modeling approaches should be evaluated before one is selected for implementation. Reducing the variance from this source of bias can be costly, so the potential interpretation and transparency gains should be weighed against the costs of leaving the bias in the model.