Saturday, August 13, 2011

Information about the California Standards Test Part 1

I was going to do a post on questioning routines but I got distracted by a Twitter convo with David Cox and Jennifer Borgioli. It was about how the results of the California Standards Test (CST) can be used. This information is specifically for my California peeps.

Most of the information comes from the technical report. It's scary to look at but it's mostly skippable data tables so it's not a terrible read. There's also the API Information Guide. I'll try to remember to cite when I can but if something seems wonky, call me on it and I'll verify.

In Part 1, I'll explain the basics of test construction and API results.
In Part 2, I'll discuss the few useful pieces of information I've been able to extract from test results.
I don't think there will be a part 3 but if I get enough questions I'll see what I can do.

If you don't feel like reading, skip to "What are valid score comparisons?" That's the part you'll want to know. The more appropriate heading would be, "What aren't valid score comparisons?"

How is API calculated? 

There are adjustments for certain populations (need to look into this more. Adjustments may just be in order to norm base/growth years. Edit again: The population adjustments are just for finding Similar Schools.), but it basically comes down to a straight mean. Your kids score Advanced, Proficient, Basic, Below Basic, or Far Below Basic. Advanced and Proficient are good; the rest are bad. Advanced earns 1000 points, Proficient 875, Basic 700, Below Basic 500, and Far Below Basic 200. As far as I know, the only weird thing is that a student not taking Algebra in 8th grade (i.e., taking the General Math test) gets bumped down a level in API points. So if she scored Advanced on the General Math test, she would earn 875 points for the school. (Edit: A ninth grader taking the General Math test gets bumped down two levels.) Additionally, 7th graders do not get a bump up for taking Algebra in 7th, and the same applies to 8th graders taking Geometry. After that, each test is weighted and the mean is calculated. The CAPA and CMA follow the same weighting rules as the CST. (edit: added)

From the API info guide (page 6):
Content Area Weights

In high school, the CAHSEE (our exit exam) is also factored in. The arithmetically minded may notice the large drop-off in points from Below Basic to Far Below Basic.

There is an Excel spreadsheet to help you estimate your API.
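To make the arithmetic concrete, here's a minimal sketch of the calculation described above. The level point values are from the post; the content-area weights and student results are made-up placeholders (the real weights are in the API Information Guide), and the General Math bump-down is not modeled.

```python
# Minimal API estimate: map each student's level to points, average
# within each content area, then take a weighted mean across areas.
# Point values are from the post; weights below are placeholders.

LEVEL_POINTS = {
    "Advanced": 1000,
    "Proficient": 875,
    "Basic": 700,
    "Below Basic": 500,
    "Far Below Basic": 200,
}

def api_estimate(results, weights):
    """results: {area: [level, ...]}, weights: {area: weight}."""
    total = 0.0
    weight_sum = 0.0
    for area, levels in results.items():
        area_mean = sum(LEVEL_POINTS[lv] for lv in levels) / len(levels)
        total += weights[area] * area_mean
        weight_sum += weights[area]
    return round(total / weight_sum)

# Made-up example: six students across two content areas.
results = {
    "ELA": ["Proficient", "Basic", "Advanced"],
    "Math": ["Basic", "Below Basic", "Proficient"],
}
weights = {"ELA": 0.5, "Math": 0.5}  # placeholders, NOT the real weights
print(api_estimate(results, weights))  # 775
```

Playing with the inputs shows the drop-off mentioned above: moving one student from Below Basic to Far Below Basic costs 300 points of that student's contribution, far more than any other one-level drop.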

How is the test constructed and scored?

It's a lot. I'll give you the highlights. Tara Richerson has an excellent series on test construction and she's got actual experience at it. Pay attention to how they're anchored to the previous year's test.

There are two things that really interested me. The first was how cut scores for the different levels of proficiency were created. I'm just going to snip and let you read. From the technical report (257):

The Modified Angoff method is used for the ELA tests and the Bookmark Method for the rest. Nutshell: for the Bookmark Method (science, math, etc.), the questions are ordered from easiest to hardest. Each panelist then says, "I think a barely proficient person would get the questions up to this one right about 2/3 of the time and miss the ones after it about 2/3 of the time." The median of the panelists' bookmarks becomes the cut score. ELA works basically the same way, except panelists rate each question individually and the cut score is computed from those ratings. The cut scores and all raw scores are then matched to a table to align the scale scores from year to year (actually, they only really align in two-year pairs). This isn't useful to know at all, but I just find it really interesting.
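If it helps, the median step of the Bookmark Method fits in a couple of lines (the panelist bookmark positions here are invented for illustration):

```python
# Bookmark Method, median step: items are ordered easiest to hardest,
# and each panelist bookmarks the last item a barely-proficient student
# would answer correctly about 2/3 of the time. The median bookmark
# becomes the cut. (Positions below are made up.)
from statistics import median

panelist_bookmarks = [23, 25, 24, 27, 22]
cut_item = median(panelist_bookmarks)
print(cut_item)  # 24
```

The raw cut that results is then run through the year's conversion table to become a scale-score cut, per the alignment step described above.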

The second thing I'm pointing out is actually useful.  Based on the test results, CA has generated proficiency level descriptors. If I recall correctly, these were generated based on a few years of test results and so are supposed to be things that, for example, a Proficient science student actually knows. These are useful, especially for those of us who need to decide on the level of depth for our standards. It's located here and the good stuff starts in appendix A. 8th grade science starts at A-102. Here's an example:

What are valid score comparisons?

There are two main ways people (teachers, parents, admin, everyone) mess this up. People think you can compare scale scores from year to year and that you can compare API scores year to year. You can't do either. This is crucial to understand.

In the example that got this started, Student A got a perfect 600 in 7th grade and a 550 in 8th grade. It's natural to ask why the student dropped from 7th to 8th. You can't, though. California does not vertically align its scores. A 550 in one year has no relation to a 550 in another. Additionally, a 550 in one content area has no relation to a 550 in a different content area, even in the same year. You CANNOT make this comparison.

Horse's mouth (Technical Report, 6):

You are fine comparing the same year/content to other classes, schools, districts, the state. Anything else, and I mean anything else, isn't valid. Jennifer tweeted this link out earlier. If you take a look at the graph you'll see certain test cut scores are harder than others. MS math scores will be lower than elementary scores because our tests are harder to score proficient on.

If you go back up to how the test cut scores are created, you'll notice they're defined for a "Proficient Algebra student" or a "Proficient Science student." They are not scored based on growth from the previous year. Some states do that.1

The API results are similarly misleading. You'd think you could just look at your school's API each year and see if it goes up. Turns out, you can't. That's because how the API is calculated varies from year to year: for example, the weights of the different tests and which tests are included. So a 2006 API score can't be compared to a 2011 score. It makes sense when you think about it, but it's completely unintuitive, and everybody in the entire world thinks you can create a line graph and see how your school is doing.

You CAN compare between Base and Growth APIs. These will be matched (page 14 of the Info Guide)

and you CAN compare the growth from year to year: take the Growth API and subtract the Base API. Also on page 14.

Repeating myself in case you missed it: Base and Growth scores in the same cycle compare different years' test scores using the same calculation method. If they are not in the same cycle, they could (and likely do) use different calculation methods. It is not valid to compare API scores from different cycles.

Does anyone know this? No. Everyone, understandably, compares scores year to year. This is important to know though because if your scores take a dip, it might be because the calculation methods have changed. For example, until the 2010-2011 cycle, high school APIs didn't include the CMA, which is the modified test usually taken by students in SDC.

Summary: You can compare your student scores only within the same grade level and content area. You can compare API scores within cycles (Base to Growth) and you can compare growth between cycles. THAT. IS. IT.

In part 2, I'll write about what useful (for me) information you can get out of it.

1: Vertically aligned scores usually come in two flavors. Either the same score indicates the same equivalent level: if you score a 550 one year and a 550 the next, that means you made the equivalent of one year of growth. Or the score works like the "Reading Level" reports we get, and all students are scored on one continuous scale: one year you get a 400 and the next a 520, so you've made 120 points of growth that year. Smarter states will make one year equal 100 points so you can easily see if you made a year of growth.
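For the curious, here's a toy illustration (all numbers invented) of the two flavors described in the footnote. California's CST uses neither, which is exactly why the year-to-year comparisons above aren't valid.

```python
# Flavor 1: vertically equated scales -- matching or beating last
# year's score means you made at least one year of growth.
def made_a_year_of_growth(score_y1, score_y2):
    return score_y2 >= score_y1

# Flavor 2: one continuous scale across grades -- growth is just the
# difference, like a reading-level report.
def points_of_growth(score_y1, score_y2):
    return score_y2 - score_y1

print(made_a_year_of_growth(550, 550))  # True
print(points_of_growth(400, 520))       # 120
```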

1 comment:

  1. As I was typing a response to a teacher at my school who said "I have my students' scores what am I looking at? increases? decreases? no change? what is it all about" my daughter said Jason has a post about CST scores.
    You went into much more detail than I did - so for the simpler people -
    API is the state accountability requirement - it involves growth. It is valid for schools and districts. One way to improve your school's API score is to raise the ELA scores of the FBB and BB students. ELA is weighted more heavily than math, science or social studies and moving low kids up gets you a bigger API increase.
    AYP is the federal accountability requirement and it is a number chosen by someone. The biggest factor in AYP is the number of students who are at least proficient on the CST tests. For the 2011 tests the number they are looking for is about 68% (not happening in my school). By 2014 the plan is to have 100% of the students Proficient. hmmm
    But API and AYP are not really for individual students as you pointed out. And it may not be possible to compare scaled or raw scores from one year to another. BUT I think it is valid to compare levels.
    If my own kid went from Proficient to Below Basic in a year, I'd be talking to the school about it. Unfortunately, if my kid went from Basic to Advanced I probably wouldn't speak to the school :-(
    As I look at the scores of my students I do look to see changes in levels. And I acknowledge growth as I see them at school.