
Thursday, March 29, 2007

How Standardized Testing is Killing American Education: Reason #6

Fifth in a series.

6) Ranked scoring doesn't tell you anything about learning: If every student in the state dramatically improved their learning and scored much higher on the test, would you expect the average scores to go up? Likewise, if every student in the state was accidentally given the test in Arabic instead of English, would you expect the average scores to go down? Guess what? They wouldn't, in either case. The average score would be "600" no matter how many more questions were answered correctly (or incorrectly), even if the test were an incomprehensible graduate-level neuroscience exam given to 3rd graders.

How can this be? Well, the "scores" that students get are not directly based on the number of answers they get right. The highest possible score is "1000," but that doesn't mean a score of "600" denotes 60% of the questions answered correctly. A student with a score of 600 may have gotten 10%, or 50%, or even 90% of the questions correct. The scores that are published for these tests are "scaled scores." A scaled score reflects how many other students you scored better than on the test, not how many questions you got correct.

Imagine that there are 100 9th graders in California. After taking the test, the students are lined up in order based on the number of questions they answered correctly. The person at the beginning of the line might have answered 2%, or 20% (which is what you'd expect a student randomly guessing to get), or even 80% of the questions correct. Similarly, the students at the end of the line might have answered 50%, or 75%, or 98% of the questions correctly. It doesn't matter: they're just lined up in order. After they are in order, they are divided into 5 equal groups. The first 20% of the students (20 students in our example) all receive a score of "200". The next 20 students all receive a score of "400", then "600" for the next 20, "800" for the next 20, and "1000" for the 20 students with the most questions answered correctly.
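The lineup-and-split scheme above can be sketched in a few lines of Python. This is an illustrative sketch of the ranking mechanism as described here, not the state's actual scoring code:

```python
def quintile_scores(raw_scores):
    """Line students up by raw score, split them into 5 equal groups,
    and give every student in a group the same scaled score
    (200, 400, 600, 800, or 1000), as described above.
    Illustrative only; assumes the roster divides evenly by 5 and
    ignores how ties are actually broken."""
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    scaled = [0] * n
    for rank, student in enumerate(order):
        scaled[student] = 200 * (5 * rank // n + 1)
    return scaled

# 100 students whose raw percent-correct scores happen to run from 1 to 100
raw = list(range(1, 101))
scaled = quintile_scores(raw)
print(sum(scaled) / len(scaled))   # 600.0 -- the state average, always

# If every student answers 10 more questions correctly, nothing changes:
assert quintile_scores([r + 10 for r in raw]) == scaled
```

Notice that the final assertion passes no matter how much every raw score improves: the ordering is all that matters, so the scaled scores (and their 600 average) never budge.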

A school's API (Academic Performance Index) score is the average of these "quintile" scores from all of its students. So what's the problem?

Well, let's look at our imaginary 100 9th graders. Student #3 could have answered 4% of the questions correctly, and student #18 could have answered 43% of the questions correctly, but they both get the same score: 200. Likewise, student #20 could have answered 42% of the questions correctly and student #21 could have answered 43% of the questions correctly, but student #20 will get a score of 200 while student #21 will get a score of 400... twice as many points! Can you see how these numbers can be misleading?

Another problem: schools receive a score of 1-10 based on their ranking, similar to the students. The bottom 10% get a "1," the top 10% get a "10," and so on. But the number of students who would need to move up one quintile for a school to move from a "1" to a "2" is significantly higher than the number who would have to move up one quintile for a school to go from a "4" to a "5." The schools in the middle are bunched together very closely, and movement between those rankings doesn't necessarily indicate a large number of students scoring differently. Movement at the bottom (and at the top), on the other hand, requires large numbers of students to improve their scores, so the improvement in learning behind a move from a "1" to a "2" is significantly greater than that behind a move from a "5" to a "6"... yet the numerical value given to each one-point gain is the same.
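To see why the middle rankings are bunched together, picture school scores as roughly bell-shaped: the decile cut points are then much closer together near the median than at the tails. A quick sketch, assuming (purely for illustration) normally distributed school scores with mean 600 and standard deviation 100:

```python
from statistics import NormalDist

scores = NormalDist(mu=600, sigma=100)  # hypothetical bell curve of school scores

# Decile cut points: the score needed to reach the 10th, 20th, ... 90th percentile
cuts = [scores.inv_cdf(k / 10) for k in range(1, 10)]
gaps = [round(hi - lo, 1) for lo, hi in zip(cuts, cuts[1:])]

print(gaps)
# The gap between the 40th- and 50th-percentile cut points is a little more
# than half the gap between the 10th and 20th: moving up a rank in the
# middle takes far fewer improved scores than moving up a rank at the ends.
```

Under this assumption, a school near the median can climb a decile with a handful of improved scores, while a school at the bottom needs nearly twice the raw-score gain for the same one-point bump.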

Most importantly, the only way for a student to move up in the rankings is to improve disproportionately more than a student who previously scored higher. There will always be 20% of the students in the bottom quintile: that's how the system works. If a student moves up, another student has to move down. Therefore, we are requiring schools with low scores to teach their students more than schools with high scores. The system is fundamentally competitive: you don't necessarily have to improve your students' learning. Rather, you need to hope that some other school does a worse job than yours. This puts educators (and students and parents, for that matter) in a position of hoping for other schools to educate their students poorly. I don't know about you, but I find that reprehensible. We should never put our children in a position where their success is measured in such a way that it depends on the failure of others. If our schools are training our children to engage in life as a zero-sum game where their well-being is predicated on the misfortune or failure of others, we are setting them up to take the messed-up world they've inherited from us and make it all the more hellish.

- "'Welcome to Hell.' 'Oh, thanks. That means a lot, coming from you.'"

Friday, March 16, 2007

How Standardized Testing is Killing American Education: Reason #8

Third in a series.

8) "Scaled scores" don't tell you anything about student learning: Standardized test scores are given as "scaled scores." This means that your score is not based directly on how many questions you got right: a student who answered 25% of the questions correctly would not receive a score half that of a student who answered 50% of the questions correctly. Rather, the scores tell you how many other students who took the test scored worse than you did. A student who scores in the 35th percentile did not necessarily get 35% of the questions correct. What happened is that 35% of the students who took the same test answered fewer questions correctly than that student did. It's possible that they got 35% of the questions correct, but it's just as possible that they got 5% of the questions correct, or 75%, or even 90%. A scaled score doesn't tell us anything about the number of questions answered correctly.
Likewise, improvement on a scaled score doesn't necessarily indicate improvement in learning. A student could answer 45% of the questions correctly one year and 55% the next year. Their scaled score could improve, or drop, or stay the same, depending on whether other students improved similarly or not. Ideally, we want all students to improve, don't we? Well, if that happens at the same rate, our scaled scores will not change at all, and will give no indication that the outcome we most desire is actually taking place!
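A toy example (with made-up peer scores) shows how a student's raw score can rise while their percentile falls:

```python
def percentile(score, cohort):
    """Percent of the cohort scoring strictly below `score` (illustrative)."""
    return 100 * sum(s < score for s in cohort) / len(cohort)

# Year 1: our student answers 45% of questions correctly (peer scores made up)
peers_year1 = [30, 35, 40, 42, 44, 50, 60, 70, 80]
p1 = percentile(45, peers_year1)

# Year 2: the student improves to 55%, but every peer improves by 15 points
peers_year2 = [s + 15 for s in peers_year1]
p2 = percentile(55, peers_year2)

print(round(p1), round(p2))   # 56 22 -- more learning, lower scaled score
```

The student genuinely learned more from one year to the next, yet the only number anyone publishes went down.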
Scaled scores are deceptive on several counts. First of all, it is not uncommon for someone to think that a student with a scaled score under 50% has mastered less than 50% of the material. That is not true. A student with a scaled score of 50% scored higher than 50% of the students who took the same test. In other words, this is a totally average student. Right in the middle. Typical of American students in general. This student's actual score could tell us a lot about the state of American education: if an average student has an actual score of 20%, we would be disappointed; likewise, an "average" actual score of 85% would be very encouraging. Unfortunately, the only score we're ever exposed to is the scaled score, which doesn't tell us much about what (or whether) students are actually learning.

(Resources: 1 2)

-"USA Today has come out with a new survey: Apparently three out of four people make up 75 percent of the population."

How Standardized Testing is Killing American Education: Reason #9

Second in a series.

9) Norming is biased: "Norming" refers to comparing one student's results against all other students' to determine how they compare to the population at large. Most of the time, however, the "population at large" that scores are compared to is actually a smaller sample that is judged to be representative of the entire population. This smaller sample is given the test early, and those results are used to set up a virtual spread of scores.
So you have two problems. The first: how do you assure that your sample is truly representative of the larger population? You can select for race, number of years in the country, socio-economic status, parents' education, region of the country, gender, age, and a host of other variables that may or may not have some bearing on results, but no matter how big your sample is, you're always going to have sampling error. Choosing a representative sample is also really hard and expensive, so instead, samples tend to be less representative in favor of choosing students from the same geographical area, often close to the location of the test-makers' offices. For the SAT, that meant that upper-middle-class, predominantly white students were the sample the test was normed against for years. Remember a few years back when they "rescaled" the scores and people complained that they were lowering the bar by making grading "easier"? What actually happened was that a more representative sample was used, and the College Board realized that their sample had been skewing the norm high for years. The new scores are more accurate because they're based on a more representative sample.
The second big problem is just regular old sampling error. You can't get away from it. When you compound the sampling error inherent in choosing test questions with the sampling error from the group used to set the Norm, the reliability of the test results becomes shakier and shakier.
Several years ago, as Reformed Math made Integrated courses more popular, California debuted Integrated Math standardized tests as options for schools. For several years, the results were impossible to norm: that is to say, results did not fit a normal distribution, as you would expect from an unbiased test. Results had to be fiddled with and forced artificially into a normal curve. You'd think that this would reveal a flaw in the testing (even beyond the normal level of error, which is considerable), and that states and districts might hold back on making major decisions based on these scores. No such luck. Bureaucracy reigns supreme, and the wheels of progress have too much inertia to stop turning, even if it means innocent students are crushed underneath.

(Resources: 1 2)

- "He uses statistics as a drunken man uses lampposts—for support rather than for illumination."

The Annual Standardized Testing Rant: First in a series!

Welcome back to my favorite topic: how standardized tests are killing American education. I've tackled this topic before, so this year I'm going to go for a series of the main reasons I detest standardized testing so much, in the form of a "top ten" list. Here we go (drum roll, please!):

10) Sampling error makes it impossible to get accurate results: "Sampling Error" refers to the inherent error that exists when you choose a small sample of all possible items to evaluate mastery of the entire set. For standardized tests, there are millions of possible questions that could be asked to assess students' mastery of the standards that students are supposed to learn in a given year. To create a usable test, a small number of those possible questions must be chosen. The assumption is that the questions are chosen carefully enough so that they are representative of all possible questions. In other words, if a student answers 70% of the sample questions correctly, the assumption is that they would have answered 70% of all possible questions correctly.
"Sampling error" is a mathematical term that refers to the probability that the sample score is close (usually 90% or 95% confidence is checked for) to the actual score the student would have received if tested on all questions. You see this number when political poll results are reported; it's called the "margin of error." So if candidate A is polled at 40% and candidate B is polled at 45%, but the margin of error is plus or minus 7%, you would say that they are in a statistical tie. The margin of sampling error is greater than the difference between the results, meaning that the poll doesn't really indicate a clear advantage for either candidate.
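The poll arithmetic can be checked directly. A minimal sketch, assuming a simple random sample of 200 voters (my number, chosen to roughly reproduce the 7% figure) and the standard normal approximation for a proportion's 95% margin of error:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion, normal approximation."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll of 200 voters: candidate A at 40%, candidate B at 45%
moe = margin_of_error(0.45, 200)
print(round(100 * moe, 1))   # 6.9 -- about the "plus or minus 7%" in the example

# The 5-point gap between the candidates is smaller than the margin of
# error, so the poll is a statistical tie.
assert 0.45 - 0.40 < moe
```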
For the California standards tests, students' scores are grouped into "quintiles": a student in the bottom 20% is in quintile 1, students in the next 20% (the 21st to 40th percentiles) are in quintile 2, and so on. A student in the 3rd percentile is in quintile 1 and receives a score of 200. A student in the 19th percentile is also in quintile 1 and also receives a score of 200. A student in the 22nd percentile is in quintile 2 and receives a score of 400. Quintile 3 gets 600, quintile 4 gets 800, and quintile 5 gets 1000. The problem is, if you compare the average number of questions answered correctly by students in quintile 3 with that of students in quintile 4, the difference is less than the margin of error due to sampling! Students could go up or down a quintile just because different questions were chosen for the test, without any additional learning or skills on the students' part.
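The same normal-approximation formula shows how large the sampling error is for a single student's raw score. A sketch, assuming a hypothetical 60-question test and a student with true mastery of 55% of the material (the 5-point quintile gap below is likewise a made-up illustrative figure, not a published cut score):

```python
import math

# A hypothetical 60-question test and a student whose true mastery is 55%:
n_questions, mastery = 60, 0.55
moe = 1.96 * math.sqrt(mastery * (1 - mastery) / n_questions)
print(round(100 * moe, 1))   # 12.6 -- percentage points, at 95% confidence

# If the middle quintile boundaries sit only ~5 raw percentage points
# apart (a made-up but plausible figure), the error from the choice of
# questions alone can move a student across a boundary.
assert 100 * moe > 5
```

In other words, the uncertainty from which questions happened to be on the test can be more than twice the width of a middle quintile, so the same student could plausibly land in quintile 3 one day and quintile 4 the next.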
It seems immoral to me to attach such high stakes to tests that suffer from this tragic flaw from the outset. I think that we can use these tests as long as we acknowledge their limited ability to give us accurate data. When we make major funding decisions as if these results are objective fact and not broadly fallible approximations, we are playing Russian Roulette with our kids' education and future. Our kids deserve better than that.

(Resources: 1 2)

-"Definition of Statistics: The science of producing unreliable facts from reliable figures."

Friday, April 07, 2006

Whoops! We CAN do better!

In November, I wrote a post about how the state's goal for schools' API scores was 800, and how it was impossible for more than 40% of the schools to meet that goal. You know what? I was wrong... sort of. There is a way for 60% of schools to reach that target, but I still maintain that it's gonna be nigh impossible to get there.

Why? Well, imagine that there are 100 students in the country, and we rank them according to their test scores into quintiles (what's a quintile, you ask? Check this post out for an explanation). That would mean that 20 kids got a score of 200, 20 got a score of 400, 20 got a score of 600, 20 got a score of 800, and 20 got a score of 1000. You can see that only 40% of the students got 800 or above. A school's API, however, depends on the average of all of the students in the school. So it's possible to arrange the students so that more than 40% of the schools have an average at or above 800.

How? Well, it's those kids that scored 1000. We can use their extra points to balance out some other kids who scored under 800. The simplest example would be to match up one student with a score of 1000 with one student with a score of 600. If those two students made up the entire school, that school would have an average API of 800. See? Easy, huh?

Unfortunately, only 20 percent of the students have these extra points, so if we populate 40% of our schools with exactly half of their students in the 600 range and half in the 1000 range, we can bring those schools up to an average of 800. Keep all of the students who scored 800 together and their schools keep their 800 score, giving us another 20% of our schools meeting the target. Those kids who scored 200 or 400? Well, the problem is that we'd have to waste 2 or 3 1000-scoring kids on each one of those students to bring the average up to 800, which would mean fewer schools overall would have the average that we want. Sorry, bottom-percentilers. You lose, but America wins... right?
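The rearrangement works out exactly. A sketch with the 100 hypothetical students grouped into 10 equally sized schools (the grouping is my own illustration of the scheme above):

```python
# 100 students, 20 at each scaled score
students = [200] * 20 + [400] * 20 + [600] * 20 + [800] * 20 + [1000] * 20

# Regroup them into 10 schools of 10 students each:
schools = (
    [[600] * 5 + [1000] * 5] * 4            # 4 schools: half 600s, half 1000s
    + [[800] * 10] * 2                      # 2 schools: all the 800 scorers
    + [[200] * 10] * 2 + [[400] * 10] * 2   # 4 schools: everyone else
)
assert sorted(sum(schools, [])) == sorted(students)  # same kids, reshuffled

api = [sum(school) / len(school) for school in schools]
print(sum(a >= 800 for a in api) / len(api))   # 0.6 -- 60% of schools at 800+
```

Not a single student scored a point higher, yet the fraction of schools at the 800 target jumped from 40% to 60% purely through reshuffling.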

Not really. Notice that the only thing I'm doing is rearranging which schools these kids are going to. No increase in learning or improvement in instruction has to happen for more schools to come up to the statewide goal, we just need to mix the students up a little bit.

Actually, that's not a bad idea. I am not alone among educators in thinking that a heterogeneous population in a school and in individual classrooms makes for a better learning environment and improves learning for all students, even those at the top (of course, this won't necessarily improve test scores, since they don't measure learning, just ranking). So why do I say it's still impossible? Well, you'd probably be able to convince parents and students at a school averaging 600 that it's a good idea for some students from their school to be transferred to the school across town that averages 1000. The problem comes when you try to get parents and students (mostly parents) of the students at school 1000 to move over to school 600.

Well, I have a proposal that could actually make the standardized testing system marginally worthwhile. Not worthwhile enough to keep doing it the way we do, but less of a complete waste of time, energy, resources, and money. What if a federal mandate demanded that all schools must be made up of a student population with an average API of exactly 600? Testing would be given in 3rd grade, 6th grade, 9th grade, and 12th grade. Transfers would only be permissible in the 4th grade, 7th grade, and 10th grade, and those transfers would have to rebalance the API averages in those schools back to 600. (It's cruel, ineffective, and nonsensical to administer standardized tests to kids before 3rd grade. They can't read well enough or sit still and focus long enough to make it worth the effort.) That way, every "Elementary B" school ("Elementary A" being kindergarten through 3rd grade) would start off with the same average level of test-takers (notice I don't say that they're at the same level as far as actual knowledge or skill... just test-taking), and then the test they take 3 years later would actually show whether their test-taking improved as a result of 3 years at that school or not (again, this wouldn't necessarily tell us anything about their learning during those 3 years, just their test-taking ability). Likewise for 7th-9th grade "Middle Schools" and 10th-12th grade "High Schools." Any school that scored above 600 would have improved its students' test-taking ability relative to the average improvement across the nation. Lower than 600? Well, that would be bad, wouldn't it?

It still wouldn't tell us much about whether the kids are learning anything, but at least it would be a fair comparison of the schools' test-prep abilities. It's a far cry from assessing actual knowledge, but isn't it better than testing for socio-economic status and race, which is what the tests as they're currently set up actually test for?

- "That's impossible, no one can give more than one hundred percent, by definition that is the most anyone can give."

Monday, April 11, 2005

What standardized test scores really tell us

The API numbers are out, and guess what? They tell you (once again) which students go to which schools.

What's that? You thought that they were supposed to tell you how well the students were being educated by their schools? You mean you bought that line? Let me tell you how the system really works:

Performance on standardized tests (including the CAT-6 and SAT) can be predicted very reliably by a few factors... none of which is the school the student attends.

Regardless of which schools students attend, the most reliable factor is the parents' level of education. More highly educated parents have kids who get better test scores. I wonder if that has anything to do with the fact that the average student spends less than 1,500 hours a year under a teacher's supervision in a classroom, and more than 3,500 hours a year under their parents' supervision outside of school. Who do you think has a greater impact on how they spend their time, especially in middle school and high school, where those 1,500 hours are split up among 5-7 different teachers?

Another factor that affects test scores more than which school a student attends is socio-economic status. Poor students do worse than rich students, no matter where they go to school. Not surprisingly, there's a lot of overlap between parents' socio-economic status and level of education. An interesting exception that proves the rule is the fact that children of teachers tend to do better than other students whose parents are in the same income bracket. If anything, this might be used as evidence that teachers are underpaid...

Another unsurprising trend is that minority students do worse than white students. What might be surprising is that this trend has more to do with socio-economic factors than race. It just so happens that more minorities are poor. Rich minority students with highly educated parents do nearly as well as their white counterparts, and the same convergence shows up at the other end: poor white children of poorly educated parents do as poorly as their minority counterparts.

So, what do API scores tell us? They tell us the level of education of the parents of the students at that school, they tell us about the socio-economic status of the students at that school, and to a limited extent, they can give us an idea of the likelihood of minorities being over- or under-represented at the school. What they can't tell us is how well that school is educating students.

What to do? How about de-segregating schools? We've been trying to do it for 40 years, but schools are still segregated. It's less a question of race, however, than of socio-economics. We need to integrate the schools with rich kids and poor kids together. We need highly educated parents' kids in school with less educated parents' kids. That's the only way that test scores will be useful in the ways we try to use them.

I expect this will happen soon... as soon as the Devil ice-skates to work and farmers need airplanes to herd their swine. Until then, by all means let's punish schools for being willing to educate the poor and needy. I mean, what's the use of being rich and well-educated if your kids don't get preferential treatment?

- "Wait a minute. Are you telling me that we're so far behind the other students that we're going to catch up with them by going SLOWER than them?" (I'm kind of cheating, this is from a TV show and I couldn't find the exact quote...)