Most of the information comes from the technical report. It's scary to look at, but it's mostly skippable data tables, so it's not a terrible read. There's also the API Information Guide. I'll try to remember to cite when I can, but if something seems wonky, call me on it and I'll verify.
In Part 1, I'll explain the basics of test construction and API results.
In Part 2, I'll discuss the few useful pieces of information I've been able to extract from test results.
I don't think there will be a Part 3, but if I get enough questions I'll see what I can do.
If you don't feel like reading, skip to "What are valid score comparisons?" That's the part you'll want to know. A more appropriate heading would be "What aren't valid score comparisons?"
How is API calculated?
The short version: every student's performance level on every test converts to a point value, and the API is a weighted average of those points across content areas. There are adjustments, too; they're spelled out in the API Info Guide (page 6). In high school, the CAHSEE (our exit exam) is also factored in. The arithmetically minded may notice the large drop-off in points from Below Basic to Far Below Basic.
There is an Excel spreadsheet to help you estimate your API.
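If you'd rather see the arithmetic than poke at the spreadsheet, here's a minimal sketch in Python. The band-to-point values (1000/875/700/500/200) are the published ones as far as I can tell; the content-area weights and student counts below are completely made up, so check the Info Guide for your school type's real weights.

```python
# Rough API estimate: each student's performance band converts to points,
# each content area's average is weighted, and the weighted sum is the API.
# Point values match the published band-to-point table (to my knowledge);
# the weights and student counts below are invented for illustration.

BAND_POINTS = {
    "Advanced": 1000,
    "Proficient": 875,
    "Basic": 700,
    "Below Basic": 500,      # note the 300-point cliff below this band...
    "Far Below Basic": 200,  # ...which is why moving FBB kids up pays off
}

# Made-up counts of students in each band, per content area.
results = {
    "ELA":  {"Advanced": 20, "Proficient": 35, "Basic": 25,
             "Below Basic": 12, "Far Below Basic": 8},
    "Math": {"Advanced": 15, "Proficient": 30, "Basic": 30,
             "Below Basic": 15, "Far Below Basic": 10},
}

# Hypothetical content-area weights (real weights vary by school type and year).
WEIGHTS = {"ELA": 0.6, "Math": 0.4}

def area_average(bands):
    """Average points per student within one content area."""
    students = sum(bands.values())
    points = sum(BAND_POINTS[band] * n for band, n in bands.items())
    return points / students

api = sum(WEIGHTS[area] * area_average(bands) for area, bands in results.items())
print(round(api))  # an estimated API on the 200-1000 scale (741 here)
```

The cliff at the bottom means a Far Below Basic student moving up one band is worth 300 points, more than any other one-band jump.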
How is the test constructed and scored?
It's a lot, so I'll give you the highlights. Tara Richerson has an excellent series on test construction, and she's got actual experience with it. Pay attention to how the tests are anchored to the previous year's test.
There are two things that really interested me. The first was how the cut scores for the different levels of proficiency were created. The details are in the technical report (page 257); here's the nutshell.
The Modified Angoff method is used for the ELA tests and the Bookmark Method for the rest. With the Bookmark Method (science, for example), the questions are put in order from easiest to hardest, and each panelist says, "I think a barely proficient person would get it right up to this question about 2/3 of the time and miss the ones after it about 2/3 of the time." The median of the panelists' bookmarks becomes the cut score, and the same is done for basic, advanced, and so on. ELA works basically the same way, except the panelists rate each question individually and the cut score is computed from those ratings. The cut scores and all raw scores are then matched to a table to align the scale scores from year to year (actually, they only really align in two-year pairs). This isn't useful to know at all, but I just find it really interesting.
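If the median business is hard to picture, here's a toy version of the Bookmark step; the panelist bookmarks are made-up numbers, and the real panel sizes and procedures are in the technical report.

```python
# Toy Bookmark Method: each panelist marks the last question (with questions
# ordered easiest to hardest) that a barely-proficient student would answer
# correctly about 2/3 of the time. The median bookmark becomes the cut score.
from statistics import median

panelist_bookmarks = [38, 41, 40, 44, 39, 42, 40]  # made-up judgments
cut = median(panelist_bookmarks)
print(f"Proficient cut falls after question {cut}")  # -> 40
```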
The second thing I'm pointing out is actually useful. Based on the test results, CA has generated proficiency level descriptors. If I recall correctly, these were generated from a few years of test results, so they're supposed to describe things that, for example, a Proficient science student actually knows. These are useful, especially for those of us who need to decide on the level of depth for our standards. It's located here, and the good stuff starts in Appendix A. 8th grade science starts at A-102.
What are valid score comparisons?
There are two main ways people (teachers, parents, admin, everyone) mess this up: thinking you can compare scale scores from year to year, and thinking you can compare API scores from year to year. You can't do either. This is crucial to understand.
In the example that got this started, Student A got a perfect 600 in 7th grade and a 550 in 8th grade. It's natural to ask why the student dropped from 7th to 8th. You can't, though. California does not vertically align its scores. A 550 in one year has no relation to a 550 in another. Likewise, a 550 in one content area has no relation to a 550 in a different content area, even in the same year. You CANNOT make this comparison.
Horse's mouth: the Technical Report, page 6.
You are fine comparing the same year/content to other classes, schools, districts, or the state. Anything else, and I mean anything else, isn't valid. Jennifer tweeted this link out earlier. If you take a look at the graph, you'll see that some tests' cut scores are harder to reach than others'. MS math scores will be lower than elementary scores because our tests are harder to score proficient on.
If you go back up to how the test cut scores are created, you'll notice they're defined for a "proficient algebra student" or a "proficient science student." They are not scored based on growth from the previous year. Some states do that.1
The API results are similarly misleading. You'd think you could just look at your school's API each year and see if it goes up. Turns out, you can't, because how the API is calculated varies from year to year: the weights of the different tests change, and so does which tests are included. So a 2006 API score can't be compared to a 2011 one. It makes sense when you think about it, but it's completely unintuitive, and everybody in the entire world thinks you can make a line graph and see how your school is doing.
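To see why, run the same (made-up) student results through two hypothetical years' formulas. The "drop" below comes entirely from the formula change, not the kids.

```python
# Same content-area averages, two different years' (invented) weights.
area_scores = {"ELA": 760, "Math": 700, "Science": 680}

weights_2006 = {"ELA": 0.5, "Math": 0.5, "Science": 0.0}    # science not counted
weights_2011 = {"ELA": 0.5, "Math": 0.25, "Science": 0.25}  # science added in

def api(scores, weights):
    return sum(scores[area] * w for area, w in weights.items())

print(api(area_scores, weights_2006))  # 730.0
print(api(area_scores, weights_2011))  # 725.0 -- identical kids, "lower" API
```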
You CAN compare Base and Growth APIs within the same cycle; they're calculated the same way and matched on purpose (page 14 of the Info Guide). And you CAN compare the growth from year to year: take the Growth API and subtract the Base API (also on page 14).
Repeating myself in case you missed it: Base and Growth scores in the same cycle compare different years' test scores using the same calculation method. If they are not in the same cycle, they could (and likely do) use a different calculation method. It is not valid to compare API scores across different cycles.
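In code form, with made-up numbers:

```python
# Within one cycle, Base and Growth APIs use the same formula, so the
# difference is a meaningful growth number. All values here are invented.
base_api_2010 = 742    # 2010 Base API, computed with the 2010-11 method
growth_api_2011 = 768  # 2011 Growth API, same 2010-11 method
print(growth_api_2011 - base_api_2010)  # 26 points of growth

# Comparing growth numbers across cycles is fine; comparing raw APIs
# across cycles (e.g., growth_api_2011 vs. a 2006 Base API) is not.
```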
Does anyone know this? No. Everyone, understandably, compares scores year to year. This is important to know, though, because if your scores take a dip, it might be because the calculation methods changed. For example, until the 2010-2011 cycle, high school APIs didn't include the CMA, the modified test usually taken by students in special day classes (SDC).
Summary: You can compare student scores only within the same year, grade level, and content area. You can compare API scores within a cycle (Base to Growth), and you can compare growth between cycles. THAT. IS. IT.
In Part 2, I'll write about what useful (for me) information you can get out of all this.
1: Vertically aligned scores usually come in two flavors. In the first, the same score indicates an equivalent level from year to year: if you score a 550 one year and a 550 the next, you made the equivalent of one year of growth. In the second, the score works like the "reading level" reports we get, with all students on one continuous scale: one year you get a 400 and the next a 520, so you made 120 points of growth that year. Smarter states make one year equal 100 points so you can easily see whether you made a year of growth.
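In the second flavor, checking for a year of growth is one subtraction (made-up numbers again):

```python
# Hypothetical vertically aligned scale where one grade level = 100 points.
last_year, this_year = 400, 520
print((this_year - last_year) / 100)  # 1.2 -> a bit more than a year of growth
```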
As I was typing a response to a teacher at my school who asked, "I have my students' scores. What am I looking at? Increases? Decreases? No change? What is it all about?", my daughter said Jason has a post about CST scores.
You went into much more detail than I did - so for the simpler people -
API is the state accountability requirement - it involves growth. It is valid for schools and districts. One way to improve your school's API score is to raise the ELA scores of the FBB and BB students. ELA is weighted more heavily than math, science, or social studies, and moving low kids up gets you a bigger API increase.
AYP is the federal accountability requirement, and it is a number chosen by someone. The biggest factor in AYP is the number of students who are at least Proficient on the CST tests. For the 2011 tests, the number they are looking for is about 68% (not happening in my school). By 2014, the plan is to have 100% of students Proficient. Hmmm.
But API and AYP are not really for individual students, as you pointed out. And it may not be possible to compare scaled or raw scores from one year to another. BUT I think it is valid to compare levels.
If my own kid went from Proficient to Below Basic in a year, I'd be talking to the school about it. Unfortunately, if my kid went from Basic to Advanced, I probably wouldn't speak to the school :-(
As I look at my students' scores, I do look for changes in levels. And I acknowledge growth when I see them at school.