Thursday, March 29, 2007

How Standardized Testing is Killing American Education: Reason #6

5th in a series.

6) Ranked scoring doesn't tell you anything about learning: If every student in the state dramatically improved their learning and scored much higher on the test, would you expect the average scores to go up? Likewise, if every student in the state were accidentally given the test in Arabic instead of English, would you expect the average scores to go down? Guess what? They wouldn't! Not in either case. In both cases, the average score would be "600" no matter how many more questions were answered correctly (or incorrectly), even if the test were an incomprehensible graduate-level neuroscience exam given to 3rd graders.

How can this be? Well, the "scores" that students get are not directly based on the number of answers they get right. The highest possible score is "1000," but that doesn't mean a "600" denotes 60% of the questions answered correctly. A student with a score of 600 may have gotten 10%, or 50%, or even 90% of the questions correct. The scores published for these tests are "scaled scores," and a scaled score reflects how many other students you outscored, not how many questions you answered correctly.

Imagine that there are 100 9th graders in California. After taking the test, the students are lined up in order based on the number of questions they answered correctly. The person at the front of the line might have answered 2%, or 20% (which is what you'd expect from a student guessing randomly), or even 80% of the questions correctly. Similarly, the students at the end of the line might have answered 50%, or 75%, or 98% of the questions correctly. It doesn't matter: they're just lined up in order. Once they're in order, they are divided into 5 equal groups. The first 20% of the students (20 students in our example) all receive a score of "200." The next 20 students all receive a score of "400," then "600" for the next 20, "800" for the next 20, and "1000" for the 20 students with the most questions answered correctly.
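To make the mechanics concrete, here's a minimal sketch of that quintile scheme in Python (the function name and the sample data are mine, invented for illustration; this isn't the state's actual scoring code). Notice that the average scaled score comes out to 600 no matter how well or how poorly the whole group did:

```python
def scaled_scores(raw_percent_correct):
    """Rank students by raw score, split them into five equal groups,
    and assign the quintile scores 200/400/600/800/1000."""
    n = len(raw_percent_correct)
    # Line the students up from lowest raw score to highest.
    order = sorted(range(n), key=lambda i: raw_percent_correct[i])
    scores = [0] * n
    for rank, student in enumerate(order):
        quintile = min(rank * 5 // n, 4)  # 0..4
        scores[student] = (quintile + 1) * 200
    return scores

# 100 students who all did poorly on the raw test...
low = [10 + i * 0.3 for i in range(100)]   # raw scores from 10% to 39.7%
# ...and 100 students who all did well.
high = [60 + i * 0.3 for i in range(100)]  # raw scores from 60% to 89.7%

for raw in (low, high):
    print(sum(scaled_scores(raw)) / len(raw))  # prints 600.0 both times
```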

A school's API (Academic Performance Index) score is the average of these "quintile" scores from all of its students. So what's the problem?

Well, let's look at our imaginary 100 9th graders. Student #3 could have answered 4% of the questions correctly, and student #18 could have answered 43% of the questions correctly, but they both get the same score: 200. Likewise, student #20 could have answered 42% of the questions correctly and student #21 could have answered 43% of the questions correctly, but student #20 will get a score of 200 while student #21 will get a score of 400... twice as many points! Can you see how these numbers can be misleading?
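That boundary effect is easy to reproduce with the sketch above (this snippet reuses the scaled_scores function from it, and the raw scores here are invented so the list is already in ranked order): students one raw point apart can land two hundred scaled points apart.

```python
# Reusing the scaled_scores sketch from above.
raw = list(range(100))  # invented raw scores: student i answered i% correctly
s = scaled_scores(raw)
print(raw[19], s[19])  # 19 200 -- last student in the bottom quintile
print(raw[20], s[20])  # 20 400 -- one more question right, double the score
```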

Another problem: schools receive a score of 1-10 based on their ranking, similar to the students. The bottom 10% get a "1," the top 10% get a "10," and so on. But the number of students who would need to move up a quintile for a school to climb from a "1" to a "2" is significantly higher than the number who would have to move up a quintile for a school to climb from a "4" to a "5." The schools in the middle are bunched together very closely, so movement between those rankings doesn't necessarily indicate a large number of students scoring differently. Movement at the bottom (and at the top), on the other hand, requires large numbers of students to improve their scores, and the improvement in learning represented by moving from a "1" to a "2" is significantly greater than that of a school moving from a "5" to a "6"... yet the numerical value given to the change is the same.
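Here's a rough sketch of that bunching, assuming (my assumption, purely for illustration) that school scores fall in a roughly bell-shaped distribution:

```python
import random

random.seed(1)
# 1,000 hypothetical schools with roughly bell-shaped API scores
# (the mean and spread here are invented for illustration).
apis = sorted(random.gauss(650, 100) for _ in range(1000))

# Deciles: schools 0-99 are rank "1", 100-199 are rank "2", and so on.
for d in range(10):
    band = apis[d * 100 : (d + 1) * 100]
    print(f"rank {d + 1}: spans {band[-1] - band[0]:.0f} API points")
# The middle bands span only ~25 points while the bottom and top
# bands span 100+, so climbing from a "4" to a "5" takes far less
# real improvement than climbing from a "1" to a "2".
```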

Most importantly, the only way for a student to move up in the rankings is to improve disproportionately more than a student who previously scored higher. There will always be 20% of the students in the bottom quintile: that's how the system works. If a student moves up, another student has to move down. Therefore, we are requiring schools with low scores to teach their students more than schools with high scores. The system is fundamentally competitive: you don't necessarily have to improve your students' learning. Rather, you need to hope that some other school does a worse job than yours. This puts educators (and students and parents, for that matter) in the position of hoping for other schools to educate their students poorly. I don't know about you, but I find that reprehensible. We should never put our children in a position where their success is measured in such a way that it depends on the failure of others. If our schools are training our children to engage in life as a zero-sum game where their well-being is predicated on the misfortune or failure of others, we are setting them up to take the messed-up world they've inherited from us and make it all the more hellish.

- "'Welcome to Hell.' "Oh, thanks. That means a lot, coming from you.'"

3 comments:

bethany said...

i work at a special education center in lausd for students 12-22 with moderate to severe developmental disabilities. almost none of the students at this school can read, and some are not able to care for their most basic needs on their own (eating, toileting, walking, etc.). we also have standardized testing. it feels very odd to be administering them, let me tell you.

Anonymous said...

Which standardized tests are scale scored, and which are not? Are SAT tests scale scored? I had no clue about any of this.

Mr. Mac said...

Ah, my most loyal reader. Welcome back, anonymous!
Anyway, the short answer is "yes." SAT tests are scale scored. What's interesting is that a few years back, when there was a lot of press about how the SAT got "easier" in how it was scored and high scores didn't mean as much... the scores actually meant more, if by "more" you mean "more accurate." The old system based the scaled score on a small sample of predominantly white, upper-middle-class, East Coast kids, assuming that their scores were representative of the scores of kids nationwide. When a more geographically, ethnically, and socio-economically diverse group of students was used to set the scale, the average scores dropped and the standard deviation jumped. The thing is, if the old system gave you a score indicating that you scored better than 85% of the kids in the country, what it really meant was that you scored better than 85% of rich white kids in Connecticut. A score reporting you as below average could actually be given to a student whose performance was above the national average. The new system is more accurate. Still screwed up, but less so.