Here's a story: I'm driving in my car. I check my odometer and I've just gone 25 miles. I check it again and I've now gone 50 miles total. I check again and I'm 75 miles away. I stop when I'm 100 miles away. If I average the mileage from each time I checked, I get 62.5 miles.
Totally unrelated story: I'm taking a test. The first time I get a 25%. I take it again and I get 50%. The next time I get 75%. Finally, I get 100%. If I average my scores from each time I took that test, I get 62.5%.
Learning is a journey. You cannot average different stages of the trip in any meaningful way. Not only is it an inappropriate use of averaging but it sends the wrong message. It tells students that the 100% they got the last time was nothing more than experimental error. It dismisses the growth that has happened.
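To put numbers on it, here's a tiny sketch of the arithmetic from the test story. The variable names are mine and only for illustration. It just shows how far the average sits from where the student actually ended up.

```python
from statistics import mean

# Scores from the four attempts in the story, in the order they happened.
attempts = [25, 50, 75, 100]

# Averaging treats every attempt as if it measured the same moment in time.
average_of_attempts = mean(attempts)

# The most recent attempt is where the student actually is now.
most_recent = attempts[-1]

print(f"Average of all attempts: {average_of_attempts}%")
print(f"Most recent attempt: {most_recent}%")
```

The average reports 62.5% for a student who just scored 100%.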
I usually disapprove of number crunching for grades in general. But I understand that some people are required to do it.
So when can you average?
Going back to my car trip: I stop my car and get out. I look at the odometer. I take a GPS reading. I check the road signs. I check my map. I've now got four different measurements for how far I am at this exact moment.
Multiple measures for standards-based grading are good. It is in fact a requirement that you take multiple and varied measurements in any good assessment system. Ideally these would all occur at the same time, but realistically they'd be within a few days of each other.
In this case, it is acceptable to average your results as long as you don't do it mindlessly. Not all assessments are created equal. I wouldn't even think of averaging my GPS results with the ones I got by using a ruler and a map.
If you have to average multiple assessments, they should meet two criteria:
- The assessments all need to be quality measures of the learning goal. A lab called "Measuring Motion" isn't a valid assessment of that learning goal just because it's got it in the name. Check every assessment against your learning goals. Make sure you're assessing what you think you're assessing.
- The assessments all need to measure the same point in the learning progression. Usually this means temporal proximity. Don't average two assessments that occurred three weeks apart.
The criteria must be evaluated on a per-student basis.
A given assessment is not a quality assessment for each and every student, and the time span it takes to render an assessment obsolete varies by student. This relates directly to the statement by Chris Ludwig I quoted in my last post.
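If you're stuck doing this in a gradebook or spreadsheet script, the two criteria could be sketched roughly like this. Everything here is hypothetical: the `Assessment` fields, the `average_if_appropriate` helper, the `max_gap_days` cutoff, and the sample scores are all made up for illustration. The validity flag and the cutoff are still your judgment calls, set per student.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Assessment:
    name: str
    score: float          # score against the learning goal, on whatever scale you use
    day: int              # day number the assessment happened (hypothetical bookkeeping)
    valid_for_goal: bool  # your judgment for THIS student, not a property of the title


def average_if_appropriate(assessments: List[Assessment], max_gap_days: int) -> Optional[float]:
    """Return a mean only when both criteria hold; otherwise return None."""
    # Criterion 1: keep only assessments judged to be quality measures of the goal.
    usable = [a for a in assessments if a.valid_for_goal]
    if not usable:
        return None
    # Criterion 2: the usable ones must sit close together in time.
    days = [a.day for a in usable]
    if max(days) - min(days) > max_gap_days:
        return None  # too spread out to average; look at the evidence yourself
    return mean(a.score for a in usable)


# Made-up evidence for one student.
student = [
    Assessment("Quiz", 3.0, day=10, valid_for_goal=True),
    Assessment("Lab: Measuring Motion", 1.0, day=11, valid_for_goal=False),
    Assessment("Interview", 4.0, day=12, valid_for_goal=True),
]
close_together = average_if_appropriate(student, max_gap_days=3)

# Add a valid assessment from three weeks later: the helper now declines to average.
spread_out = average_if_appropriate(
    student + [Assessment("Retake", 4.0, day=33, valid_for_goal=True)],
    max_gap_days=3,
)
print(close_together, spread_out)
```

Note that the helper returns `None` rather than a number when the criteria fail. That's on purpose: it hands the decision back to you instead of making it for you.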
Your grades come from weighing the total body of evidence you've gathered against the standards you've set and communicated. Use averaging if it will help you make a better decision but don't let it make the decision for you.
To quote @johntspencer: "A simple glimpse at Star Trek reminds me that Data is meant to inform rather than drive." [source]
Data is useful. Data is good for advice. But Picard is the captain. Be the captain. Don't mindlessly average.
1: O'Connor says that if you must use the mean, also take a look at the median and mode to see if the mean is giving you a true picture of mastery.
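A quick way to make that comparison, assuming you have the scores in a list. The scores below are made up, and this is only a sketch of the side-by-side look, not a procedure taken from O'Connor.

```python
from statistics import mean, median, multimode

# Made-up scores for one student on one learning goal.
scores = [2, 4, 4, 4]

summary = {
    "mean": mean(scores),        # pulled down by the single low score
    "median": median(scores),    # the middle of the sorted scores
    "modes": multimode(scores),  # the most common score(s), as a list
}
print(summary)
```

When the mean disagrees with the median and mode like this, that's a signal to go back and look at the evidence itself.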
Data image from: http://upload.wikimedia.org/wikipedia/en/0/09/DataTNG.jpg
Picard image from: http://upload.wikimedia.org/wikipedia/en/6/6d/JeanLucPicard.jpg
Post publishing note: This was probably the first post all summer where I didn't link to Shawn's blog. I publish this, check my Reader... and he also has a picture of Data! I swear, we're not the same person. He's much cooler than me. Literally. He curls in his backyard.