Rating Teachers: The Trouble with Value-Added Analysis

Most people assume teachers are held accountable for student learning, which seems obvious considering why we have teachers in the first place. In fact, until last year, five states had laws explicitly banning the use of student achievement data in teacher evaluations, and only four states required that a teacher's evaluation be based primarily on students' test scores. But since the launch last summer of the federal "Race to the Top" program, in which states compete for grant money by adopting education reforms, 12 states have passed legislation to improve their teacher evaluations, and all the data "firewalls" are gone.

But in a sign that the push to improve teacher evaluations is moving at once too fast and too slow, the Los Angeles Times last month published a searchable database of the "value-added" scores of 6,000 Los Angeles teachers, boiling down each educator's classroom effectiveness to a single stat. Researchers, teachers' unions, nonprofit advocates and even Secretary of Education Arne Duncan rushed to praise or condemn the landmark data dump. The numbers had been obtained through California's Public Records Act and then crunched by a seasoned education analyst. Most parents, however, were baffled: What does "value-added" mean, anyway?

The last decade has yielded an explosion of data about student performance. In many places, these data can be used to build a year-over-year analysis of how much a teacher advanced the learning of an individual student. Value-added models can control for other factors that affect test scores, the most important being whether a student arrived in a teacher's classroom several grade levels behind. As a result, this method can offer a more accurate estimate of how well a particular teacher is teaching than a look at the latest set of student test scores alone. High-flying teachers can be recognized, and low performers can be identified before they spend years doing a disservice to kids. Science and technology to the rescue again!
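For readers curious what "controlling for" a student's starting point looks like, here is a minimal sketch in Python. It assumes the simplest possible model: this year's score is predicted from last year's score with a single straight-line fit, and a teacher's estimate is the average amount by which that teacher's students beat (or miss) their predicted scores. The scores and the teachers "A" and "B" are invented for illustration; real value-added models use more controls and statistical adjustments than this.

```python
from collections import defaultdict
from statistics import mean

# Invented scores for two hypothetical teachers: (teacher, last_year, this_year).
STUDENTS = [
    ("A", 40, 52), ("A", 45, 55), ("A", 50, 61),
    ("B", 70, 74), ("B", 75, 78), ("B", 80, 84),
]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def raw_average(students):
    """The naive measure: each teacher's average of this year's scores."""
    scores = defaultdict(list)
    for teacher, _, current in students:
        scores[teacher].append(current)
    return {t: mean(s) for t, s in scores.items()}


def value_added(students):
    """Average amount by which each teacher's students beat their predicted score."""
    intercept, slope = fit_line([p for _, p, _ in students],
                                [c for _, _, c in students])
    residuals = defaultdict(list)
    for teacher, prior, current in students:
        predicted = intercept + slope * prior  # expected score given last year's
        residuals[teacher].append(current - predicted)
    return {t: mean(r) for t, r in residuals.items()}


if __name__ == "__main__":
    print("raw averages:", raw_average(STUDENTS))
    print("value-added: ", value_added(STUDENTS))
```

On these made-up numbers, teacher B's students post far higher raw scores, but once last year's scores are taken into account, A's students land slightly above their predicted scores and B's slightly below. The single-predictor straight line is an assumption made for brevity, not a description of any particular state's or newspaper's model.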

Unfortunately, it's not so straightforward in practice. The tests used in many places are a poor match for value-added methodology, which is far more complicated than subtracting last year's score from this year's. Meanwhile, different value-added models can yield different conclusions about the same teacher. A small detail like that matters a lot if you're going to use these data to start firing people.
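How much the choice of measure matters can be seen even in a toy case. The sketch below, again with invented scores, compares two hypothetical measures: a plain gain score (this year minus last year) and a crude peer comparison that measures each student against classmates who started in the same score band. Neither is presented as what any state or district actually uses; the teachers "X" and "Y", the two bands and all the numbers are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Invented scores: (teacher, score_band, last_year, this_year).
# Hypothetical teacher X mostly has students who started low; Y mostly high.
STUDENTS = [
    ("X", "low", 40, 54), ("X", "low", 40, 56), ("X", "low", 40, 55),
    ("X", "high", 80, 82),
    ("Y", "low", 40, 60),
    ("Y", "high", 80, 85), ("Y", "high", 80, 86), ("Y", "high", 80, 84),
]


def gain_score(students):
    """Measure 1: average of (this year - last year) per teacher."""
    gains = defaultdict(list)
    for teacher, _, prior, current in students:
        gains[teacher].append(current - prior)
    return {t: mean(g) for t, g in gains.items()}


def peer_comparison(students):
    """Measure 2: average gap between each student and their band's mean score this year."""
    by_band = defaultdict(list)
    for _, band, _, current in students:
        by_band[band].append(current)
    band_mean = {b: mean(s) for b, s in by_band.items()}
    gaps = defaultdict(list)
    for teacher, band, _, current in students:
        gaps[teacher].append(current - band_mean[band])
    return {t: mean(g) for t, g in gaps.items()}


def ranking(scores):
    """Teachers ordered from highest to lowest estimate."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print("gain score:     ", ranking(gain_score(STUDENTS)))
    print("peer comparison:", ranking(peer_comparison(STUDENTS)))
```

The gain score ranks X first, because students who start low tend to post big gains regardless of who teaches them. The peer comparison ranks Y first, because each of Y's students, low or high, did at least about as well as others who started in the same place. Same students, same scores, opposite verdicts.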

In addition, though you wouldn't know it from all the noise about testing, most of the nation's teachers teach subjects, often electives, in which students do not take standardized tests. Even subjects like science frequently are not tested every year. End result: even the best value-added model leaves out the many teachers who don't teach math or language arts.

These and other complications are why, ultimately, the education field will end up using the same imperfect evaluation strategy used by most professions: a blend of quantifiable data and managerial judgment.

Getting from here to there will be bloody. With few exceptions, teachers' unions fight efforts to ground teacher evaluation in data while also resisting giving administrators the discretion to remove teachers. That pretty much relegates evaluations to the realm of Ouija boards. (Disclosure: I'm making this criticism at the same time I'm being paid by one of the unions to help strategize in a different area.) Currently, the unions' favored remedy is "peer review," letting teachers evaluate one another. The few initiatives that have tried this method have produced strikingly modest results (especially compared with the rhetoric surrounding them), and overall there is no evidence that self-regulation works any better in education than it did in the financial sector.

Given the importance of effective teaching, the rush to do something, anything, is understandable. The Los Angeles Times published the value-added data along with a series of thoughtful articles about how teachers are evaluated in the city. What's astounding is that the Los Angeles school system had been sitting on this value-added data all along and had done nothing with it. That inaction certainly gave the newspaper reason to use the data to highlight the problem and hold the district accountable.

But attaching teachers' names to their scores? Letting parents (or anyone else) search a database to see whether Mr. Smith is a good teacher or a bad one? Unless you'd be comfortable with a newspaper publishing an incomplete evaluation of your job performance, that step should give you pause. The persistent problem of low-performing teachers is exasperating but should not cause us to lose perspective.

Likewise, when I was a member of the state board of education in Virginia, the lack of attention we paid to teacher effectiveness was ridiculous. Yet I would not have supported a wholesale rush into value-added analysis, either, because Virginia was badly positioned to do so without sparking chaos — and probably litigation.

The answer, then, is to develop better training for supervisors and better methods of evaluating teachers, drawing on value-added analysis but also on classroom observations and other tools. Parents should, of course, have access to the results of comprehensive evaluations: not something as oversimplified as a computer spitting out a ranking of teachers, but a system that is more durable and far-reaching. Building the right tools, and a culture that values evaluation, is the hard work ahead, and it is necessary to create a genuine profession for teachers and to produce more consistent outcomes for students.

Andrew J. Rotherham, who writes the blog Eduwonk, is a co-founder and partner at Bellwether Education, a nonprofit working to improve educational outcomes for low-income students. School of Thought, his education column for TIME.com, appears every Thursday.

Time.com
