Crowdsourcing an Ethical Dilemma

January 5th, 2009 by Lukas Biewald

Stalin said, “A single death is a tragedy; a million deaths is a statistic.” So what about one hundred deaths? What about five?

We tested this experimentally, asking people on Amazon Mechanical Turk to decide three versions of the classic philosophical conundrum, the Trolley Problem, in which you decide whether to kill one person so that several others may live.

There is no clear consensus on what to do; in psychological experiments, subjects disagree. But how does our decision change with the number of people who will die? We varied the number of people saved between 1 and 1000 to see whether that changed subjects’ ethical calculus.

Here are the frequencies of responses for three different scenarios (plotted on a log scale with a loess fit):

Here are the sample scenario descriptions (with five people):

Scenario A

A trolley is running out of control down a track. In its path are 5 people who have been tied to the track. Fortunately, you can flip a switch, which will lead the trolley down a different track to safety. Unfortunately, there is a single person tied to that track. Should you flip the switch?

Scenario B

As before, a trolley is hurtling down a track towards five people. You are on a bridge under which it will pass, and you can stop it by dropping a heavy weight in front of it. As it happens, there is a very fat man next to you - your only way to stop the trolley is to push him over the bridge and onto the track, killing him to save five. Should you proceed?

Scenario C

A brilliant transplant surgeon has five patients, each in need of a different organ, each of whom will die without that organ. Unfortunately, there are no organs available to perform any of these five transplant operations. A healthy young traveler, just passing through the city the doctor works in, comes in for a routine checkup. In the course of doing the checkup, the doctor discovers that his organs are compatible with all five of his dying patients. Suppose further that if the young man were to disappear, no one would suspect the doctor. Should the doctor sacrifice the man to save his other patients?


Each Turker was asked all three questions for one randomly chosen number of people saved, but Turkers could answer the questions multiple times if they wanted. Rational decision making would show monotonicity, i.e. if you would switch the track to save 10 people, you would also switch the track to save 30 people. Human beings are not always rational in this way, as the following chart shows.
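To make the monotonicity idea concrete, here is a minimal sketch of how one Turker's answers to one scenario could be checked. The function name `is_monotone` and the `(n_saved, answered_yes)` pair format are assumptions made for illustration, not the code or data layout used in the experiment. It also makes one strict choice: a "yes" and a "no" at the same number counts as a violation.

```python
def is_monotone(responses):
    """Check one worker's answers to one scenario for monotonicity.

    responses: list of (n_saved, answered_yes) pairs (hypothetical format).
    Returns True if, once the worker says "yes" at some number of people
    saved, they never say "no" at that number or any larger one.
    """
    seen_yes = False
    # Sort by number saved; at equal numbers, put "yes" answers first so
    # that a yes/no conflict at the same number is caught as a violation.
    for n_saved, answered_yes in sorted(responses, key=lambda r: (r[0], not r[1])):
        if answered_yes:
            seen_yes = True
        elif seen_yes:
            return False  # a "no" after a "yes" at a smaller or equal number
    return True
```

For example, a worker who says "no" at 5 and "yes" at 10 and 30 passes the check, while one who says "yes" at 10 but "no" at 30 fails it.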

Each line in the following plot represents a single Turker (so denser lines mean a Turker answered the question more times). The horizontal axis is the number of people that would be saved by answering “yes”. The red dots are where the Turker responded “no”; the blue dots are where the Turker responded “yes”. The Turkers are sorted by the number of people at which they are first willing to answer “yes”.
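The row ordering described above can be sketched as follows. This is only an illustration: the names `first_yes` and `sort_workers`, the dictionary layout, and the choice to put Turkers who never answer “yes” at the end are all assumptions, not details taken from the original analysis.

```python
import math

def first_yes(responses):
    """Smallest number of people saved at which the worker answered "yes".

    responses: list of (n_saved, answered_yes) pairs (hypothetical format).
    Workers who never answered "yes" get infinity, so they sort last.
    """
    yes_values = [n_saved for n_saved, answered_yes in responses if answered_yes]
    return min(yes_values) if yes_values else math.inf

def sort_workers(by_worker):
    """Order worker ids by their first "yes" threshold, lowest first.

    by_worker: dict mapping a worker id to that worker's response list.
    """
    return sorted(by_worker, key=lambda worker: first_yes(by_worker[worker]))
```

With workers `{"w1": [(5, False), (100, True)], "w2": [(2, True)], "w3": [(50, False)]}`, this ordering gives `w2`, then `w1`, then `w3`.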

-Lukas and Brendan

Original idea and post by Brendan at http://anyall.org/blog/2008/01/moral-psychology-on-amazon-mechanical-turk/

Trolley image from http://www.unc.edu/~prinz/pictures/

16 Responses to “Crowdsourcing an Ethical Dilemma”

  1. Bill Mill Says:

    Scenario C has no question; is that a typo, or how it was on the survey?

  2. Bill Mill Says:

    ahha, I missed the link to the post by brendan. The last line of Scenario C is: “Should the doctor sacrifice the man to save his other patients?”

  3. Paul Says:

    Is there a correlation between a specific Turker’s answer on all three questions? Can you say that someone responding yes on A for a low number (i.e. more directly comparing the number of lives saved in each of the two outcomes) is more likely to answer yes on B? One expects a correlation…

  4. lukas Says:

    Paul - there is definitely a very strong correlation. Our second chart tries to show this (I realize it’s a little confusing, but it actually lays out all the raw data in a graphical form). One worry I have is that the correlation may be artificially inflated by the fact that people had to answer all three questions at once.

  5. brendano Says:

    Bill: yeah that was a typo for this post.

  6. Mosh Says:

    Luke, your last point about people having to answer all three questions at once makes me think there might be an Innocent Bystanders Threshold that’s been exceeded by the time people get to the last question. “Ooh, I already killed some people. I think that’s enough for today.” Did you guys ask the questions in the same order every time?

  7. Travis Kalanick Says:

    Couple things:
    1) For each hypothetical number of lives saved, can you provide the number of turker samples you got? For example, can you provide the number of turkers who answered Scenario A for 10 lives saved?

    2) This is a really cool post, but would be nice if either at the beginning or the end, you provided 3 or 4 bullets for key takeaways. . . would be easier for me to synthesize for discussions at cocktail parties :)

  8. Idiocy of the Crowds Says:

    Crowdsourcing ethical dilemmas is an incredibly dumb idea. The “wisdom of the crowds” has jumped the shark. Firstly, yes, you might get data on what people will do, but that still doesn’t answer what you should do, which is the whole point of ethics. Secondly, the whole experiment is logically flawed. It’s an appeal to popular opinion, which is a fallacy. Just because a crowd believes in something, doesn’t make whatever position they hold true.

  9. Andy Eggers Says:

    The lack of monotonicity is the most interesting and troubling thing here: in many cases it seems that the turkers were responding purely randomly, which makes sense since this was the fastest way to get paid. This makes me think that mturk can really only be used seriously for tasks that can be validated via ESP game-like checks.

  10. Lukas Says:

    Andy - I agree the lack of monotonicity is troubling but I disagree with your conclusion that “mturk can really only be used seriously for tasks that can be validated via ESP game-like checks.”

    Each individual seems illogical, but the aggregate is a monotonically increasing function in the number of people sacrificed, and there is a clear distinction between the three scenarios. So the aggregate looks surprisingly logical. This indicates that most workers weren’t clicking randomly.

    Thinking about the problem for myself: if you asked me if I wanted to switch the trolley track to kill 5 people, I’m not sure I would give the same answer each time. Do you think that you could remain completely internally consistent if presented with this question over and over with different values of N?

    -Lukas

  11. Andy Eggers Says:

    Yeah, Lukas, I think on reflection I would seriously moderate what I said, not so much because I think the data is fine but because I think that even if some turkers were not taking the questions seriously you can still get some good estimates out of the data. For example, even if you don’t know exactly what the “bad guys” were doing (and thus your estimate of the frequency of yes’s in a particular scenario was biased), as long as you think they were doing the same thing in answering the different scenarios you could still get an unbiased estimate of the difference in the mean proportions between scenarios. (Although you would need to at least estimate how much of the data was bad.)

    Anyway, I concede that good experiments could be done on turkers, but as with all research it’s hard to come up with good questions. Personally I’m not so interested in this one but I will keep an eye out for some.

  12. Cezary Says:

    You assume what is a rational decision too quickly. I imagine that for tested people it might be a problem of what other people would think of him taking any decision. Have you tested your turks for religious beliefs? Why would 30 people be better for me than 10? $30 is better than $10, but people? They behave differently, they might all destroy your house as very stressed survivors. Wouldn’t they be more likely to attribute their survival to some supernatural force and not the decisionmaker not being thankful at all? Wouldn’t they want after the survival help from my taxes from the government? There are many issues to analyze here.

  13. Carey Says:

    Idiocy of the Crowds (#8), I think you’ve read far too much into this. I don’t see anywhere that states what should be done, or what’s true or false. You can’t have a fallacious argument if there is no argument at all. I see this as simply presenting some data, which may or may not be interesting to the reader.

  14. Norman Creaney Says:

    @Lukas wrt monotonicity - the irrationality of individuals is a given - it’s the aggregate that matters.
    Norman

  15. johnfromukiah Says:

    I heard this same experiment on NPR’s Radio Lab and the same correlations were found. If I remember correctly, one conclusion made by the hosts was that it was perhaps the technology, the remote control if you will, of the lever in the first scenario that made it easier to say “yes” (implications for using drones over Pakistan?). However, after taking a bit of time to think about it, I think there is another subtle difference between the scenarios that can help explain the different choices made. In scenario A, there is a significant delay between pulling the switch and the train reaching the man. Scenario B involves less time, but there is still time between pushing the man and the death of the man. Scenario C has the least amount of time between your physical action and the death of the man. I would venture to say that if you shortened the distance in scenario A such that the man would instantly die when you pulled the lever, fewer people would choose “yes”. Moreover, I think that the reason why these time intervals matter is that despite the foregone conclusions of each scenario (that the man will die) logic states that the longer the interval between your choice and the death, the more likely something else may happen that will prevent the death (e.g. the man will untie himself from the tracks, the man’s friend will rescue him, the train will be able to stop).

  16. Chris MacDonald Says:

    Not to be picky, but is this really “crowdsourcing?” Is a survey — even a sophisticated one like this, really an instance of crowdsourcing? Doesn’t crowdsourcing involve getting some job done? The only way a survey could be taken as getting something done would be if the results were taken to be normative, or as implying the “right” answer to a question. But then Comment #8 above needs a serious answer.
