Rethinking assessment

When we moved online this spring and switched to mandatory S/Cr/NC grading (basically, Carleton’s version of pass-fail), I vastly simplified the way I graded projects and other major course assessments. Here’s what I wrote in my syllabus:

Each assessment that you hand in will be evaluated against a checklist related to one or more of the course learning objectives. I will rank each learning objective, and the overall submission, according to a three-point scale: Does not meet expectations; Meets expectations; Exceeds expectations. If an assignment does not meet expectations overall, you (and your team, where applicable) will have the opportunity to revise and resubmit it to be re-evaluated.

… You will earn an S in the course if at least 70% of your evaluated work (after revision, if applicable) is marked as “Meets expectations” or “Exceeds expectations.” You will earn a Cr in the course if between 60 and 70% of your evaluated work (after revision, if applicable) is marked as “Meets expectations.” You will earn an NC in the course if less than 60% of your evaluated work is marked as “Meets expectations.”

CS 257, Spring 2020 syllabus
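The spring rule comes down to a threshold on the share of work rated acceptable after revision. Below is a minimal Python sketch. The quoted wording says “Meets” or “Exceeds” for the S but only “Meets” for Cr and NC, so the sketch assumes “Exceeds” counts toward every threshold. It also assumes the Cr band runs from 60% up to, but not including, 70%. The names `Rating` and `course_outcome` are illustrative.

```python
from enum import Enum


class Rating(Enum):
    """The three-point scale from the spring syllabus."""
    DOES_NOT_MEET = 0
    MEETS = 1
    EXCEEDS = 2


def course_outcome(final_ratings: list[Rating]) -> str:
    """Return "S", "Cr", or "NC" from each assessment's final rating.

    Assumes the ratings are already the post-revision ones and that
    "Exceeds" counts toward every threshold, not just the S.
    """
    if not final_ratings:
        raise ValueError("no evaluated work yet")
    total = len(final_ratings)
    acceptable = sum(r is not Rating.DOES_NOT_MEET for r in final_ratings)
    # Integer comparisons avoid floating-point surprises at the 70%/60% cutoffs.
    if acceptable * 10 >= total * 7:
        return "S"
    if acceptable * 10 >= total * 6:
        return "Cr"
    return "NC"


ratings = [Rating.MEETS] * 6 + [Rating.EXCEEDS] + [Rating.DOES_NOT_MEET] * 3
print(course_outcome(ratings))  # S (7 of 10 acceptable = 70%)
```

Revision enters only through `final_ratings`: a resubmitted assignment replaces its earlier rating before the threshold is checked.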

I’d heard the term “specifications grading,” and I knew that what I’d be doing in the spring was in the general spirit of specifications grading (if you squint hard enough). And it worked surprisingly well. Students knew exactly what they had to do to earn a particular grade in the course and, thanks to targeted rubrics, on individual assignments. Allowing revision on any major assessment meant that students could recover from the inevitable hiccups of a pandemic term (and a term marked by grief, loss, and protests over George Floyd’s murder). And at the end of the term, when some students just could not give any more to their studies because of everything happening around them, the system extended some much-needed grace: if they’d already met the threshold for an S, they could bow out of or step back from the final assessment, assuming their teams were on board with their decisions.

I wondered: what would it take to do something like this during a graded academic term? I wanted some more guidance.

And so, I did what I always do when I want to learn more about something: I hit the books. Two books, in particular: Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time, by Linda B. Nilson (2014); and Grading for Equity: What It Is, Why It Matters, and How It Can Transform Schools and Classrooms, by Joe Feldman (2018).

Both books tackle the same general problem: grades and grading are imperfect, biased, and measure lots of things other than how well students achieved learning outcomes. Nilson solves the problem with a straightforward up-or-out approach: work is either acceptable, or it’s not. Feldman’s solution is a bit more nuanced: work sits somewhere on a (short!) continuum between “insufficient evidence” and “exceeds learning targets”, and nothing resembling a formative assessment or a “life skill” gets a grade.

At its heart, Specifications Grading is faculty-centered. A key goal of specifications grading is to save faculty time while still maintaining high-quality feedback to students. The approach is two-pronged. First, all individual assessment grades are pass-fail: the assignment either meets the standard of acceptability, or it does not. No partial credit, no wrangling over how many points something is worth. The standards of acceptability are spelled out in a detailed rubric, or checklist, so that students know exactly what constitutes an acceptable submission. Second, a student earns a particular course grade (A, B, etc.) either by completing a specified set of activities (“bundles” or “modules”) or by demonstrating more advanced mastery of the course learning outcomes (“jumping higher hurdles”). The bundles/modules/hurdles are spelled out in detail in the syllabus, so that it’s crystal clear how a student earns a particular grade. In my spring course, for example, the bundles were simply percentages of course assessments acceptably completed. In a graded course, a bundle is often more complex: extra assessments, for instance, or more challenging assignments. While setting up the bundles/modules/hurdles seems like a really time-consuming process, the work is front-loaded, done before the course starts, so that the grading itself during the term is more streamlined. Basically, the instructor decides what constitutes meeting the learning outcomes and constructs the assessments and bundles/modules/hurdles accordingly. At the end of the course, then, the grade indicates the level of mastery of course learning outcomes more closely than a traditional partial-credit-focused grade does.
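The bundle mechanic can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the category names, the required counts, and the fallback to “F” are illustrative and do not come from Nilson’s book or any real syllabus. Each item is pass-fail. The course grade is the highest letter whose entire bundle has been completed.

```python
# Hypothetical bundles, listed from highest grade to lowest. Each maps a
# category of pass-fail work to the number of acceptable items required.
BUNDLES: dict[str, dict[str, int]] = {
    "A": {"projects": 4, "challenge_assignments": 2, "reflections": 5},
    "B": {"projects": 4, "challenge_assignments": 1, "reflections": 4},
    "C": {"projects": 3, "challenge_assignments": 0, "reflections": 3},
}


def letter_grade(passed: dict[str, int]) -> str:
    """Return the highest letter whose whole bundle is met.

    `passed` counts acceptable (pass) submissions per category; there is
    no partial credit, so an unacceptable item simply does not count.
    """
    for letter, required in BUNDLES.items():  # dicts keep insertion order
        if all(passed.get(category, 0) >= n for category, n in required.items()):
            return letter
    return "F"  # below the lowest bundle (a hypothetical fallback)


print(letter_grade({"projects": 4, "challenge_assignments": 1, "reflections": 5}))  # B
```

Every requirement is a count of passing items, so there is nothing to haggle over. In the usage line, one more acceptable challenge assignment is the entire difference between the B and the A.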

Grading for Equity is, I would say, more student-focused. (And more K-12 focused, although I certainly found enough in the book worth considering for the college context.) Grading for equity rests on three pillars. First, grades should accurately reflect student achievement towards learning outcomes. This means grades should not include things like formative assessments (homework, in-class activities), extra credit, behavior, or “soft skills”. They should reflect only the results of summative assessments, and only the most recent result of each summative assessment. Feldman also cautions against using the typical 100-point scale, which is skewed towards failure (an F spans everything from 0 to 59). He favors more compact scales (a 4-point scale, for instance) or a minimum score. Second, grades should be bias-resistant. They should not reflect a teacher’s impression of student behavior, which is flawed for many reasons. Nor should they reflect a student’s life circumstances (for instance, their ability to complete homework outside of school hours). Third, grades should be motivational. It should be transparent to students what counts as mastery of a learning objective and how to achieve a particular grade. Formative feedback should not penalize mistakes, because penalizing mistakes promotes a fixed mindset rather than a growth mindset. For transparency, Feldman is a fan of detailed rubrics and the four-point scale (something like “Exceeds expectations”, “Meets expectations”, “Partially meets expectations”, “Insufficient evidence”).
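Feldman’s point about the 100-point scale is easiest to see with a little arithmetic. Here is a minimal Python sketch. It assumes common 90/80/70/60 letter cutoffs and round-half-up on the 4-point scale, and both are assumptions because schools vary. Under those assumptions, the same two A’s and one zero average to a D on the 100-point scale but to a B on the 4-point scale.

```python
from statistics import mean

# Assumed letter cutoffs; real schools vary.
PERCENT_CUTOFFS = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
LETTER_FOR_POINTS = {points: letter for letter, points in POINTS.items()}


def letter_from_percent(p: float) -> str:
    """Map a 0-100 score to a letter using the assumed cutoffs."""
    for cutoff, letter in PERCENT_CUTOFFS:
        if p >= cutoff:
            return letter
    return "F"


def average_on_100_point_scale(scores: list[float]) -> str:
    """Average the raw percentages, then convert the average to a letter."""
    return letter_from_percent(mean(scores))


def average_on_4_point_scale(scores: list[float]) -> str:
    """Convert each score to 0-4 first, average, then round half up."""
    avg = mean(POINTS[letter_from_percent(s)] for s in scores)
    return LETTER_FOR_POINTS[int(avg + 0.5)]


print(average_on_100_point_scale([90, 90, 0]))  # D (mean is 60)
print(average_on_4_point_scale([90, 90, 0]))    # B (mean is 8/3, about 2.67)
```

On the 100-point scale, a single zero sits six letter-widths below the D cutoff. On the 4-point scale, it sits only one step below a D.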

Both books agree that students learn at different rates, and that any summative assessment should take this into account. Both systems thus allow for retakes and redos. Specifications grading puts some limits around redos to make things easier on the professor, recommending some kind of “token” system in which students get a limited number of redos/late passes over the course of the term/semester. Grading for equity favors as many retakes (up to the end of the term/semester) as a student needs or wants in order to meet learning targets. (Theoretically, anyway; the book acknowledges that there could be a snowball effect, particularly when later work depends on earlier work, and suggests that time limits on retakes would be appropriate in this context.) Grading for equity argues that later assessments should replace earlier grades rather than, say, being averaged with them. It also gets into the weeds a bit on faculty’s freedom to count anything that demonstrates a learning objective as an appropriate assessment of that objective, including things like discussions in office hours. I get the spirit of this, but that kind of informal evidence seems ripe for bias.
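This Python sketch shows the difference between replacing and averaging, along with the token cap that specifications grading puts on redos. It is a minimal illustration. The names `averaged_grade`, `most_recent_grade`, and `TokenBank` are hypothetical, and so is the 0-3 numeric coding of the rating scale. Neither book prescribes code.

```python
def averaged_grade(attempts: list[int]) -> float:
    """Average every attempt, so an early miss keeps pulling the grade down."""
    if not attempts:
        raise ValueError("no attempts yet")
    return sum(attempts) / len(attempts)


def most_recent_grade(attempts: list[int]) -> int:
    """Grading-for-equity style: the latest attempt replaces earlier ones."""
    if not attempts:
        raise ValueError("no attempts yet")
    return attempts[-1]


class TokenBank:
    """Specifications-grading style cap on redos and late passes."""

    def __init__(self, tokens: int) -> None:
        self.remaining = tokens

    def spend(self) -> bool:
        """Use one token; return False (and spend nothing) if none are left."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


# Ratings on a hypothetical 0-3 scale: "partially meets" first, then "exceeds" on the redo.
attempts = [1, 3]
print(averaged_grade(attempts), most_recent_grade(attempts))  # 2.0 3

bank = TokenBank(tokens=2)
print([bank.spend() for _ in range(3)])  # [True, True, False]
```

Averaging leaves the student at “meets” even after they clearly exceeded expectations on the redo. Replacement records the level they actually reached.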

So, how am I using what I learned from these books about assessment as I plan my fall course, a CS elective? And what am I struggling with?

Plan: Retain the meets/exceeds expectations scale, with minor changes. I really liked the ease and clarity of the three-point scale in the spring. Grading for equity makes a compelling argument for including a “not yet demonstrated” category, which lets teachers distinguish between “handed in and not sufficient” and “not handed in”. So I may move to a 4-point scale for some assessments (“insufficient evidence”, “partially meets expectations”, “meets expectations”, “exceeds expectations”). In my head, “partially meets” roughly equates to C-level work, “meets” to B-level work, and “exceeds” to A-level work. Moodle likes to convert everything to percentages, which is not very useful for this type of grading. I need to figure out how to hack Moodle to show students something closer to this scale rather than “you’ve met expectations so you’ve earned 50% on this assignment”.
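This Python sketch shows why a percentage display misleads for this kind of scale. It assumes a hypothetical 0-3 numeric coding of the four levels. The exact percentage a gradebook shows depends on how the levels are coded. The sketch does not use Moodle or its API. It compares a points-to-percent conversion with showing the rubric label directly.

```python
from typing import Optional

# Hypothetical 0-3 coding of the 4-point scale; labels and rough letter
# equivalents follow the plan above. This is not Moodle code.
LEVELS: dict[int, tuple[str, Optional[str]]] = {
    0: ("Insufficient evidence", None),
    1: ("Partially meets expectations", "C"),
    2: ("Meets expectations", "B"),
    3: ("Exceeds expectations", "A"),
}
MAX_LEVEL = max(LEVELS)


def as_percentage(level: int) -> float:
    """What a points-to-percent conversion would show: level / max level."""
    return 100 * level / MAX_LEVEL


def as_label(level: int) -> str:
    """What students should see instead: the rubric label itself."""
    label, letter = LEVELS[level]
    return f"{label} (roughly {letter}-level work)" if letter else label


for level in LEVELS:
    print(f"{as_percentage(level):5.1f}%  {as_label(level)}")
```

Under this coding, a “Meets” submission (B-level work in the plan above) shows up as about 67%. To anyone used to 90/80/70/60 cutoffs, that reads like a D.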

Plan: Allow for revisions and be flexible with deadlines. We’re still in the midst of a pandemic. We still live in a white supremacist society. And the 2020 elections… well, need I say more? Fall will be tough emotionally and mentally for many of us. Extending flexibility and grace to my students, being willing to meet them where they are, is the least I can do. I’ve used revisions on exams before with pretty good results, and I’m eager to extend that to all major assessments, as I did in the spring, with later grades replacing earlier grades. I still need to figure out what revision looks like for my “deconstructed exam questions”, though.

Struggle: Not grading homework. While specifications grading allows forms of preparing for class to count towards a bundle, grading for equity adamantly opposes the idea — even just giving points for students completing the homework before class, regardless of correctness. (Which has been my policy for years.) Again, the argument is compelling — homework is formative, not summative; grading homework for correctness penalizes making mistakes in the learning process; there are many good reasons students can’t complete homework outside of class. But I still want students to take preparing for class seriously, so that once we get into class we can be an effective learning community, ready to engage with ideas. Feldman indicates late in the book that keeping track of homework submission can be valuable in pointing out effective and less effective learning strategies to students, and that awarding a small percentage of the overall grade to homework is not horrible. So I think I will compromise and continue to grade homework for completion only, but reduce the percentage it’s worth (maybe from 10% to 5%?).

Struggle: Bundles and modules and objectives, oh my! I’ll admit that I had to put Specifications Grading down for a bit and come back to it later once it got into specific examples of bundles. Ditto when I tried to wrap my head around deconstructing my usual assessments around course learning outcomes, as grading for equity describes. I got stuck on how I’d translate this to my elective. On reflection, it will take a lot of up-front work, but the increased transparency will be worth it. I plan to use the “higher hurdles” approach from specifications grading, measuring hurdle height with the 4-point scale from grading for equity. I’m still not sure I can fully separate learning objectives in the grade book when an assessment covers several of them. I may have to let that go for now and try to separate them more cleanly in a future term.

There are other struggles, of course — grading for equity has me puzzling over my approach to grading group work, for instance — but these are the key ones on my mind as I piece my course together. I’m eager to continue my experiments with assessment and curious to apply what I’ve learned from my spring experiences and from these two books.