Introducing CS 1 students to algorithmic bias via the Ethical Engine lab

There’s a lot of recent interest in the ethics of technology. From recent popular press books like Algorithms of Oppression, Automating Inequality, and Technically Wrong*, to news stories about algorithmic bias, it seems like everyone is grappling with the ethical impacts of technology. In the computer science education community, we’re having our own discussions (and have been for some time, although interest seems to have picked up recently) on where ethics “belongs” in the curriculum, and how we can incorporate ethics across the curriculum, including in introductory courses.

One initiative aimed at touching on ethical issues in CS 1 particularly caught my attention. In July 2017, Evan Peck, at Bucknell University, posted about a programming project he and Gabbi LaBorwit developed based on MIT’s Moral Machine, a reworking of the classic Trolley Problem for self-driving cars. This project, the Ethical Engine, had students design and implement an algorithm for the “brains” of a self-driving car, specifically how the car would react if it could only save its passengers or the pedestrians in the car’s way. After implementing and testing their own algorithms, students audited the algorithms other students in the class designed.

Justin Li at Occidental College built upon this lab, making some changes to the code and formalizing the reflection questions and analysis. He wrote about his experiences here. In particular, Justin’s edits focused more on student self-reflection: students compare their algorithm’s decisions against their manual decisions and reflect on the extent to which the algorithm’s decisions match their priorities.

I was intrigued by the idea of this lab, and Justin’s version seemed like it would fit well with Carleton students and with my learning goals for my intro course. I decided to integrate it into my fall term section of intro CS.

Like Evan and Justin, I’ve made my code and lab writeup freely available on GitHub. Here are links to all three code repositories:

Framework

Based on Justin’s and Evan’s writeups, I made several modifications to the code.

  • In the Person class, I added “nonbinary” as a third gender option. I went back and forth for a bit on how I wanted to phrase this option, and whether “nonbinary” captured enough of the nuance without getting us into the weeds, but ultimately decided this would be appropriate enough.
  • Also in the Person class, I removed “homeless” and “criminal” as occupations, since they didn’t really fit in that category, and made them boolean attributes, similar to “pregnant”. Any human could be homeless, but only adults could have the “criminal” attribute associated with them.
  • In the Scenario class, I removed the “crossing is illegal” and “pedestrians are in your lane” messages from the screen output, since in this version of the code these things are always true.

I also made it a bit clearer in the code where the students should make changes and add their implementation of the decision making algorithm they designed.
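As a rough illustration of the attribute changes described above, a Person class along these lines would treat homelessness and criminality as boolean flags rather than occupations. This is a minimal sketch, not the code from any of the three repositories: the class layout, the field names, the age cutoff of 18 for adulthood, and the choice to raise an error on an invalid combination are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed values for illustration; the lab's own categories may differ.
GENDERS = ("female", "male", "nonbinary")
ADULT_AGE = 18


@dataclass
class Person:
    """Hypothetical stand-in for the lab's Person class."""
    age: int
    gender: str
    occupation: str = "unknown"
    pregnant: bool = False
    homeless: bool = False   # a flag, no longer an occupation
    criminal: bool = False   # a flag, no longer an occupation

    def __post_init__(self):
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender: {self.gender!r}")
        # Any person may be homeless, but only adults may be flagged criminal.
        if self.criminal and self.age < ADULT_AGE:
            raise ValueError("only adults can have the criminal attribute")


# Example: a homeless adult with an occupation-independent flag.
rider = Person(age=30, gender="nonbinary", occupation="doctor", homeless=True)
```

Keeping these as flags means a person can be, say, both a doctor and homeless, which the occupation-based version could not express.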

Execution

I scheduled the lab during Week 8 of our 10-week course, just after completing our unit on writing classes. We take a modified “objects-early” approach at Carleton in CS 1, meaning students use objects of predefined classes starting almost immediately, and learn to write their own classes later in the term. The lab mainly required students to use classes written by others, accessing the data and calling the methods in these classes, which conceivably they could have done earlier in the term. However, I found that slotting the lab in at this point in the term meant that students had a better understanding of the structure of the Person and Scenario classes, and could engage with them on a deeper level.

I spread the lab over two class periods, which seemed about right in terms of lab length. One of the two periods was shortened because I gave a quiz that day, and the majority of the students had not finished the lab by the end of class, which leads me to believe that two full class meeting periods at Carleton, or 140 minutes, would be appropriate for this lab. As they do in all our class activities, students worked in assigned pairs using pair programming.

On the first day, students made their manual choices and designed their algorithm on paper. To ensure they did this without starting with the code, I required them to show their paper design to either my prefect (course TA) or me. A few pairs were able to start implementing the code at the end of Day 1. On the second day, students implemented and tested their algorithms, and started working through the lab questions for their writeups. Most groups did not complete the lab in class and had to finish it on their own outside of class.

At the end of the first day, students submitted their manual log files. To complete the lab, students submitted their algorithm implementation, the manual and automatic logs, and a lab writeup.
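One way to picture the comparison between the manual and automatic logs is as a simple agreement check. The sketch below assumes each log has already been reduced to a list of decisions, one string per scenario, in the same scenario order; the real log files are likely formatted differently, and the function names here are invented for the example.

```python
def agreement_rate(manual, automatic):
    """Fraction of scenarios where the manual and automatic choices match."""
    if len(manual) != len(automatic):
        raise ValueError("logs must cover the same scenarios")
    if not manual:
        return 0.0
    matches = sum(m == a for m, a in zip(manual, automatic))
    return matches / len(manual)


def disagreements(manual, automatic):
    """Indices of scenarios where the two logs differ."""
    return [i for i, (m, a) in enumerate(zip(manual, automatic)) if m != a]


# Hypothetical logs: who was saved in each of four scenarios.
manual_log = ["passengers", "pedestrians", "pedestrians", "passengers"]
auto_log = ["passengers", "pedestrians", "passengers", "passengers"]

print(agreement_rate(manual_log, auto_log))  # 0.75
print(disagreements(manual_log, auto_log))   # [2]
```

The scenarios where the two logs differ are the interesting ones to revisit in the writeup, since they are where the coded priorities and the gut decisions part ways.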

Observations

Unexpectedly, students struggled the most with figuring out how to access the attributes of individual passengers and pedestrians. I quickly realized this was because I instruct students to access instance variables using accessor and mutator methods, but the code I gave them did not contain accessor/mutator methods. This is a change I plan to make in the code before I use this lab again. I also plan to look a bit more closely at the description of the Person and Scenario classes in the lab, since students sometimes got confused about which attributes belonged to Scenarios and which belonged to Persons.
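For readers unfamiliar with the accessor/mutator style, the planned change might look something like the following. This is only a sketch of the idea: the attribute and method names are hypothetical and the classes are trimmed down to a couple of fields, so it should not be read as the lab's actual code.

```python
class Person:
    """Hypothetical, trimmed-down Person with accessor and mutator methods."""

    def __init__(self, age, pregnant=False):
        self._age = age
        self._pregnant = pregnant

    def get_age(self):
        """Accessor: return this person's age."""
        return self._age

    def is_pregnant(self):
        """Accessor: return True if this person is pregnant."""
        return self._pregnant

    def set_pregnant(self, pregnant):
        """Mutator: update the pregnancy flag."""
        self._pregnant = bool(pregnant)


class Scenario:
    """Hypothetical Scenario that owns the two groups of people."""

    def __init__(self, passengers, pedestrians):
        self._passengers = list(passengers)
        self._pedestrians = list(pedestrians)

    def get_passengers(self):
        """Accessor: return a copy of the passenger list."""
        return list(self._passengers)

    def get_pedestrians(self):
        """Accessor: return a copy of the pedestrian list."""
        return list(self._pedestrians)


scenario = Scenario([Person(40)], [Person(8), Person(29, pregnant=True)])
pedestrian_ages = [p.get_age() for p in scenario.get_pedestrians()]
print(pedestrian_ages)  # [8, 29]
```

Written this way, the split is also easier to see: the Scenario hands out groups of people, and each Person answers questions about themselves.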

Students exhibited a clear bias towards younger people, often coding this into their algorithms explicitly. One pair mentioned that while their algorithm explicitly favored younger people over the elderly, in their manual decisions they did “think of our grandmas”, which led to differences in their manual and automatic decisions in some places. A fair number of students in this class came from cultures where elders traditionally hold higher status than in the US, so the fact that this bias appeared so strongly surprised me somewhat. Pregnant women also got a boost in many students’ algorithms, which then had the effect of overfavoring women in the decisions, something many students noted in their writeups. While nearly all pairs explicitly favored humans over pets, a few pairs did give a small boost to dogs over cats, while no one gave any boost to cats. I’m not sure why this class was so biased against cats.
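To make that side effect concrete, here is a toy scoring rule of the kind described above. It is a sketch, not any pair’s actual submission: the weights, the age thresholds, the dictionary keys, and the tie-breaking choice are all invented for illustration.

```python
def person_score(person):
    """Score one person; a higher score means the rule favors saving them."""
    score = 1.0
    if person["age"] < 18:
        score += 1.0    # explicit preference for the young
    elif person["age"] >= 65:
        score -= 0.5    # explicit penalty for the elderly
    if person.get("pregnant", False):
        score += 1.0    # boost for pregnancy
    return score


def decide(passengers, pedestrians):
    """Return which group to save; ties go to the passengers (an assumption)."""
    passenger_total = sum(person_score(p) for p in passengers)
    pedestrian_total = sum(person_score(p) for p in pedestrians)
    return "passengers" if passenger_total >= pedestrian_total else "pedestrians"


# Two otherwise identical adults: the rule never looks at gender,
# yet the pregnancy boost decides the outcome.
passengers = [{"age": 30, "gender": "male"}]
pedestrians = [{"age": 30, "gender": "female", "pregnant": True}]
print(decide(passengers, pedestrians))  # pedestrians
```

Nothing in the rule mentions gender, but if only women are ever marked as pregnant in the generated scenarios, the pregnancy boost will tilt the overall results towards saving women, which is the kind of indirect bias students spotted when auditing their own output.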

I was impressed by the thoughtfulness and nuance in many of the lab writeups. Most students were able to identify unexpected biases and reason appropriately about them. Many thoughtfully weighed in on differences in their algorithm’s choices versus the choices of their classmates’ algorithms, one pair even going so far as to reason about which type of self-driving car would be more marketable.

In the reflection question about the challenges of programming ethical self-driving cars, many students got hung up on the feasibility of a car “knowing” your gender, age, profession, etc., not to mention the same characteristics of random pedestrians, and being able to use these to make a split-second decision about whom to save. This is a fair point, and in the future I’ll do a better job framing this (although to be honest I’m not 100% sure what this will end up looking like).

One of the lab questions asked students to reflect on whether the use of attributes in the decision process is ethical, moral, or fair. Two separate pairs pointed out that the selection of attributes can make the decision fair, but not ethical; one pair pointed out the converse, that a decision could be ethical but not necessarily fair. I was impressed to see this recognition in student answers. Students who favored and used simpler decision making processes also provided some interesting thoughts about the limitations of both “simpler is better” and more nuanced decision-making processes, both of which may show unexpected bias in different ways.

Conclusions and takeaway points

Ten weeks is a very limited time for a course, so for any activity I add or contemplate in any course I teach, I weigh whether the learning outcomes are worth the time spent on the activity. In this case, they are. From a course concept perspective, the lab gave the students additional practice utilizing objects and developing and testing algorithms, using a real-world problem as context. This alone is worth the time spent. But the addition of the ethical analysis portion was also completely worth it. While I have yet to read my evaluations for the course, students informally commented during and after the exercise that they found the lab interesting and thought-provoking, and that it challenged their thinking in ways they did not expect going into an intro course. I worried a bit about students not taking the exercise seriously, and while I think that was true in a few cases, by and large the students engaged seriously with the lab and in discussions with their classmates.

I teach intro again in spring term, and I’m eager to try this lab again. The lab has already sparked some interest among my colleagues, and I’m hoping we can experiment with using this lab more broadly in our intro course sections, as a way to introduce ethics in computing early in our curriculum.

*all of which are excellent books, which you should definitely read if you haven’t done so already!