Introducing CS 1 students to algorithmic bias via the Ethical Engine lab

There’s a lot of recent interest around the ethics of technology. From recent popular press books like Algorithms of Oppression, Automating Inequality, and Technically Wrong*, to news stories about algorithmic bias, it seems like everyone is grappling with the ethical impacts of technology. In the computer science education community, we’re having our own discussions (and have been for some time, although there seems to be an uptick in interest there) on where ethics “belongs” in the curriculum, and how we can incorporate ethics across the curriculum, including in introductory courses.

One initiative aimed at touching on ethical issues in CS 1 particularly caught my attention. In July 2017, Evan Peck, at Bucknell University, posted about a programming project he and Gabbi LaBorwit developed based on MIT’s Moral Machine, a reworking of the classic Trolley Problem for self-driving cars. This project, the Ethical Engine, had students design and implement an algorithm for the “brains” of a self-driving car, specifically how the car would react if it could only save its passengers or the pedestrians in the car’s way. After implementing and testing their own algorithms, students audited the algorithms other students in the class designed.

Justin Li at Occidental College built upon this lab, making some changes to the code and formalizing the reflection questions and analysis. He wrote about his experiences here. In particular, Justin’s edits focused more on student self-reflection, having them compare their algorithm’s decisions against their manual decisions and reflect on the extent to which their algorithm’s decisions matched their priorities.

I was intrigued by the idea of this lab, and Justin’s version seemed like it would fit well with Carleton students and with my learning goals for my intro course. I decided to integrate it into my fall term section of intro CS.

Like Evan and Justin, I’ve made my code and lab writeup freely available on GitHub. Here are links to all three code repositories:

Framework

Based on Justin’s and Evan’s writeups, I made several modifications to the code.

  • In the Person class, I added “nonbinary” as a third gender option. I went back and forth for a bit on how I wanted to phrase this option, and whether “nonbinary” captured enough of the nuance without getting us into the weeds, but ultimately decided this would be appropriate enough.
  • Also in the Person class, I removed “homeless” and “criminal” as occupations, since they didn’t really fit in that category, and made them boolean attributes, similar to “pregnant”. Any human could be homeless, but only adults could have the “criminal” attribute associated with them.
  • In the Scenario class, I removed the “crossing is illegal” and “pedestrians are in your lane” messages from the screen output, since in this version of the code these things are always true.

I also made it a bit clearer in the code where the students should make changes and add their implementation of the decision making algorithm they designed.
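To make the modifications above concrete, here is a minimal sketch of what a Person class with those changes might look like. This is not the actual lab code: the constructor signature, the age categories, and the choice to treat only the "adult" category as eligible for the “criminal” attribute are all assumptions made for illustration.

```python
# Hypothetical sketch, not the lab's real Person class.
# Assumed age categories and gender options:
AGES = ["baby", "child", "adult", "elderly"]
GENDERS = ["male", "female", "nonbinary"]


class Person:
    def __init__(self, age, gender, occupation=None,
                 pregnant=False, homeless=False, criminal=False):
        if age not in AGES:
            raise ValueError("unknown age category: " + str(age))
        if gender not in GENDERS:
            raise ValueError("unknown gender: " + str(gender))
        self.age = age
        self.gender = gender
        self.occupation = occupation
        # Boolean attributes rather than occupations
        self.pregnant = pregnant
        # Any human can be homeless
        self.homeless = homeless
        # Assumption: only the "adult" category can carry this attribute
        self.criminal = criminal and age == "adult"


child = Person("child", "nonbinary", homeless=True, criminal=True)
print(child.homeless, child.criminal)  # the criminal flag is dropped for a child
```

Silently dropping the flag for non-adults is one design choice; raising an error instead would make the restriction more visible to students reading the class.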

Execution

I scheduled the lab during Week 8 of our 10-week course, just after completing our unit on writing classes. We take a modified “objects-early” approach at Carleton in CS 1, meaning students use objects of predefined classes starting almost immediately, and learn to write their own classes later in the term. The lab mainly required students to use classes written by others, accessing the data and calling the methods in these classes, which conceivably they could have done earlier in the term. However, I found that slotting the lab in at this point in the term meant that students had a better understanding of the structure of the Person and Scenario classes, and could engage with the classes on a deeper level.

I spread the lab over two class periods, which seemed appropriate in terms of lab length. (In fact, one of the class periods was shortened because I gave a quiz that day, and the majority of the students had not finished the lab by the end of class, which leads me to believe that 2 whole class meeting periods at Carleton, or 140 minutes, would be appropriate for this lab.) As they do in all our class activities, students worked in assigned pairs using pair programming.

On the first day, students made their manual choices and designed their algorithm on paper. To ensure they did this without starting with the code, I required them to show their paper design to either my prefect (course TA) or me. A few pairs were able to start implementing the code at the end of Day 1. On the second day, students implemented and tested their algorithms, and started working through the lab questions for their writeups. Most groups did not complete the lab in class and had to finish it on their own outside of class.

At the end of the first day, students submitted their manual log files. To complete the lab, students submitted their algorithm implementation, the manual and automatic logs, and a lab writeup.

Observations

Unexpectedly, students struggled the most with figuring out how to access the attributes of individual passengers and pedestrians. I quickly realized this is because I instruct students to access instance variables using accessor and mutator methods, but the code I gave them did not contain accessor/mutator methods. This is a change I plan to make in the code before I use this lab again. I also plan to look a bit more closely at the description of the Person and Scenario classes in the lab, since students sometimes got confused about which attributes belonged to Scenarios and which belonged to Persons.
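As a rough illustration of the accessor/mutator style described above, here is a small sketch. The method names (get_age, is_pregnant, get_passengers, and so on) are hypothetical and do not come from the lab code; they only show the kind of interface students in this course would expect to find.

```python
# Hypothetical sketch of accessor/mutator methods; names are illustrative only.
class Person:
    def __init__(self, age, gender, pregnant=False):
        self._age = age
        self._gender = gender
        self._pregnant = pregnant

    # Accessors
    def get_age(self):
        return self._age

    def get_gender(self):
        return self._gender

    def is_pregnant(self):
        return self._pregnant

    # Mutator
    def set_pregnant(self, value):
        self._pregnant = bool(value)


class Scenario:
    def __init__(self, passengers, pedestrians):
        self._passengers = list(passengers)
        self._pedestrians = list(pedestrians)

    def get_passengers(self):
        # Return a copy so callers cannot change the scenario by accident
        return list(self._passengers)

    def get_pedestrians(self):
        return list(self._pedestrians)


def count_by_age(people, age):
    """Count the people in a list whose age category matches `age`."""
    return sum(1 for person in people if person.get_age() == age)


scenario = Scenario(
    passengers=[Person("adult", "female", pregnant=True)],
    pedestrians=[Person("child", "male"), Person("child", "nonbinary")],
)
print(count_by_age(scenario.get_pedestrians(), "child"))
```

Keeping the per-person attributes behind Person accessors and the two groups behind Scenario accessors may also help with the confusion over which attributes belong to which class.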

Students exhibited a clear bias towards younger people, often coding this into their algorithms explicitly. One pair mentioned that while their algorithm explicitly favored younger people over the elderly, in their manual decisions they did “think of our grandmas”, which led to differences in their manual and automatic decisions in some places. A fair number of students in this class came from cultures where elders traditionally hold higher status than in the US, so the fact that this bias appeared so strongly surprised me somewhat. Pregnant women also got a boost in many students’ algorithms, which then had the effect of overfavoring women in the decisions — which many students noted in their writeups. While nearly all pairs explicitly favored humans over pets, a few pairs did give a small boost to dogs over cats, while no one gave any boost to cats. I’m not sure why this class was so biased against cats.
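The way a pregnancy boost can end up favoring women overall is easier to see with a toy example. The sketch below is not any student’s algorithm and does not use the lab’s classes: people are plain dictionaries, and the weights, function names, and tie-breaking rule are all assumptions made for illustration. Gender never appears in the scoring function, yet the audit can still show a gender skew.

```python
from collections import Counter

# Hypothetical weights: younger people score higher, pregnancy adds a bonus.
AGE_WEIGHTS = {"baby": 3, "child": 3, "adult": 2, "elderly": 1}
PREGNANCY_BONUS = 2


def score(person):
    """Score one person; note that gender is never consulted."""
    total = AGE_WEIGHTS[person["age"]]
    if person.get("pregnant", False):
        total += PREGNANCY_BONUS
    return total


def decide(passengers, pedestrians):
    """Return which group to save; ties go to the passengers (an assumption)."""
    passenger_total = sum(score(p) for p in passengers)
    pedestrian_total = sum(score(p) for p in pedestrians)
    return "passengers" if passenger_total >= pedestrian_total else "pedestrians"


def audit_by_gender(scenarios):
    """Return the fraction of people of each gender who were saved."""
    seen, saved = Counter(), Counter()
    for passengers, pedestrians in scenarios:
        winners = passengers if decide(passengers, pedestrians) == "passengers" else pedestrians
        for person in passengers + pedestrians:
            seen[person["gender"]] += 1
        for person in winners:
            saved[person["gender"]] += 1
    return {gender: saved[gender] / seen[gender] for gender in seen}


scenarios = [
    ([{"age": "adult", "gender": "female", "pregnant": True}],
     [{"age": "adult", "gender": "male"}]),
    ([{"age": "adult", "gender": "male"}],
     [{"age": "adult", "gender": "female", "pregnant": True}]),
]
print(audit_by_gender(scenarios))
```

In this tiny data set only women are ever pregnant, so the bonus works as an indirect preference for women, which is the kind of side effect students noticed when they audited their own results.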

I was impressed by the thoughtfulness and nuance in many of the lab writeups. Most students were able to identify unexpected biases and reason appropriately about them. Many thoughtfully weighed in on differences in their algorithm’s choices versus the choices of their classmates’ algorithms, one pair even going so far as to reason about which type of self-driving car would be more marketable.

In the reflection question about the challenges of programming ethical self-driving cars, many students got hung up on the feasibility of a car “knowing” your gender, age, profession, etc., not to mention the same characteristics of random pedestrians, and being able to use these to make a split-second decision about whom to save. This is a fair point, and in the future I’ll do a better job framing this (although to be honest I’m not 100% sure what this will end up looking like).

One of the lab questions asked students to reflect on whether the use of attributes in the decision process is ethical, moral, or fair. Two separate pairs pointed out that the selection of attributes can make the decision fair, but not ethical; one pair pointed out the converse, that a decision could be ethical but not necessarily fair. I was impressed to see this recognition in student answers. Students who favored and used simpler decision making processes also provided some interesting thoughts about the limitations of both “simpler is better” and more nuanced decision-making processes, both of which may show unexpected bias in different ways.

Conclusions and takeaway points

Ten weeks is a very limited time for a course, so for any activity I add or contemplate in any course I teach, I weigh whether the learning outcomes are worth the time spent on the activity. In this case, they are. From a course concept perspective, the lab gave the students additional practice utilizing objects and developing and testing algorithms, using a real-world problem as context. This alone is worth the time spent. But the addition of the ethical analysis portion was also completely worth it. While I have yet to read my evaluations for the course, students informally commented during and after the exercise that they found the lab interesting and thought-provoking, and that it challenged their thinking in ways they did not expect going into an intro course. I worried a bit about students not taking the exercise seriously, and while I think that was true in a few cases, by and large the students engaged seriously with the lab and in discussions with their classmates.

I teach intro again in spring term, and I’m eager to try this lab again. The lab has already sparked some interest among my colleagues, and I’m hoping we can experiment with using this lab more broadly in our intro course sections, as a way to introduce ethics in computing early in our curriculum.

*all of which are excellent books, which you should definitely read if you haven’t done so already!

Reuniting with an old familiar course after a long layoff

As you could probably tell from the radio silence, things have been crazy around here. December and the first part of January were a blur of grant writing (and frantically finishing up simulations/analysis to generate data for the grant proposal) and job applications, and oh yeah, some holidays and travel. And in the midst of this craziness, class prep for a course I last taught in Spring Term 2012 (almost 3 years ago!): Intro to Computer Science.

Intro CS used to be my bread-and-butter course. I taught at least one, and typically 2, sections of intro each year through most of my time here. Intro is probably one of the most challenging courses to teach, partly because students come in with wildly varying backgrounds and partly because there’s so much to learn and grasp early on—the learning curve can be steep, and trying to keep track of all the syntax while also learning to think in a completely different way about problem solving is tricky and can be daunting. But it’s precisely because of the challenge, and because the students learn so much and grow so much over the course of the term, that it’s one of my favorite courses to teach.

Recently, we’ve handed over much of the teaching of intro to our visiting faculty. Part of this is because we often haven’t hired our visitors by the time we have to craft the next year’s schedule, so it’s easy to assume that whoever we eventually hire can teach intro. Part of this is also to give our new and visiting faculty a break: by teaching multiple sections of a course over the year, they are doing fewer new-to-them preps, which eases their burden. And our visitors tend to do a nice job with the course. The price of this, unfortunately, is that old fogies like myself don’t get the pleasure and the privilege of introducing students to the discipline like we used to.

Last year, when I was making the schedule for this year (one of the “perks”(?) of being chair), and weighing everyone’s teaching preferences, I saw that I had an opportunity to teach a section of intro, so I scheduled myself for one of the sections.

The re-entry has been a bit rough. Fortunately a lot of what I used to do and a lot of my old intuition about how to approach various topics has come back as I’ve reviewed my old class notes and my sample code. We’ve switched from Python 2 to Python 3 since I last taught, which I’ve taken as an opportunity to rewrite most of my sample code (which also helps with the recall). However, I tend to over- or underestimate what we can get done in the course of a 70-minute class (mostly overestimating at this point), and I’ve forgotten just how much trouble students have with a few key concepts early on in the course. My timing is off, too: I feel like I’m spending too much time explaining things and not leaving enough time for coding and practice in class. But I think I’m starting to get a better handle on that mix of “talk” and “do”.

There have been some benefits to the long layoff, though. I have some new ideas that I’ve been trying out (for instance, starting class by having students work on a problem by hand for 10-15 minutes, to get the intuition behind whatever we’re coding up in class that day) that I might not have considered if I had been teaching intro more consistently. I’m reading the textbook more carefully (because none of the readings are familiar anymore and I’ve switched textbook editions), so I have a better sense of the level of preparation students have when they come into class after completing the daily targeted readings and practice problems. I’ve done more live-coding in class, because as I’ve been re-working my code examples I’ve noticed places where it would benefit students to see me code and think out loud in real time, rather than just walking them through pre-written code. Basically, I get to see the course with fresh eyes, without all the stress of it being a completely new prep.

So I’m immensely enjoying the intro experience again, and while on balance the layoff was partly beneficial, I hope that I don’t go quite such a long time between teaching intro sections again.