[Introduction to mass collaboration], [Human computation],
[Open call], [Distributed data collection],
[Fragile Families Challenge]
Matthew J. Salganik
Department of Sociology
Princeton University
1) Introduction
2) Observing behavior
3) Asking questions
4) Running experiments
5) Mass collaboration
6) Ethics
7) The future
Fig 5.4 (Salganik 2018)
Human computation:
- Easy-task, big-scale problems where humans are better than computers
- Split-apply-combine strategy
- Human effort can be magnified with supervised learning
- Increasingly important as we move from numeric survey data to text, images, movies, and audio
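A minimal sketch of the split-apply-combine strategy, assuming toy data: the problem is split into small chunks, several workers label every item in each chunk, and the redundant labels are combined by simple majority vote. The worker functions, item names, and chunk size are hypothetical and only illustrate the pattern.

```python
from collections import Counter

def split(items, chunk_size):
    """Split a big problem into small chunks that one person can handle."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def apply_workers(chunk, workers):
    """Have every worker label every item in the chunk."""
    return [(item, worker(item)) for item in chunk for worker in workers]

def combine(labelled):
    """Combine redundant labels for each item by simple majority vote."""
    votes = {}
    for item, label in labelled:
        votes.setdefault(item, Counter())[label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}

# Hypothetical workers: two careful ones and one who always answers "elliptical".
def careful(item):
    return "spiral" if item.startswith("s") else "elliptical"

def sloppy(item):
    return "elliptical"

galaxies = ["s1", "e1", "s2", "e2", "s3"]
workers = [careful, careful, sloppy]

labelled = []
for chunk in split(galaxies, chunk_size=2):
    labelled.extend(apply_workers(chunk, workers))

consensus = combine(labelled)
print(consensus)
```

Redundancy is the design choice that matters here: because each item gets several labels, one unreliable worker is outvoted.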
Galaxy Zoo
Astronomers are interested in understanding the relationship between the shape and
color of galaxies
(a) Elliptical (b) Spiral
Galaxy Zoo
Needed hand-classified galaxies, so Schawinski worked seven 12-hour days to classify
50,000 galaxies
That was only 5% of the ∼1 million galaxies in the Sloan Digital Sky Survey. A new approach
was needed ...
Figure 1. Main analysis page from the Galaxy Zoo web site.
Galaxy Zoo
- Volunteers completed a ∼5-minute training and passed a quiz
- They categorized as many or as few galaxies as they wished
- Much of the recruiting happened through the media
Galaxy Zoo
(a) Classifications over time (b) Classifications per user
Galaxy Zoo
From 40 million classifications to consensus labels (Lintott et al., 2011)
1. Cleaning
   - Only the first classification that a volunteer made of a specific galaxy was used in the analysis
   - Anyone who classified more than 2 galaxies more than 5 times each had all their classifications discarded
2. De-biasing
   - Volunteers tended to classify faraway spiral galaxies as elliptical galaxies (Bamford et al., 2009)
3. Combining (∼40 classifications per galaxy)
   - Use a classifier/classification matrix to upweight good classifiers
Produces data comparable in quality to expert coders (Lintott et al., 2011), but at
much greater scale
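The cleaning and combining steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure of Lintott et al.: the record layout, the function names, and the weighting rule (each volunteer's weight is their rate of agreement with the current consensus, recomputed for a few rounds) are all hypothetical. The de-biasing step is not covered.

```python
from collections import Counter, defaultdict

def clean(records, max_galaxies=2, max_repeats=5):
    """records: list of (volunteer, galaxy, label) tuples in time order."""
    repeats = Counter((v, g) for v, g, _ in records)
    # For each volunteer, count galaxies classified more than max_repeats times.
    over = Counter(v for (v, g), n in repeats.items() if n > max_repeats)
    banned = {v for v, n in over.items() if n > max_galaxies}
    seen, kept = set(), []
    for v, g, label in records:
        if v in banned or (v, g) in seen:
            continue  # drop banned volunteers and repeat classifications
        seen.add((v, g))
        kept.append((v, g, label))
    return kept

def vote(records, weights):
    """Weighted vote per galaxy; returns the label with the largest total."""
    tallies = defaultdict(Counter)
    for v, g, label in records:
        tallies[g][label] += weights[v]
    return {g: max(c, key=c.get) for g, c in tallies.items()}

def combine(records, rounds=3):
    """Assumed scheme: upweight volunteers who agree with the consensus."""
    weights = {v: 1.0 for v, _, _ in records}
    consensus = vote(records, weights)
    for _ in range(rounds):
        agree, total = Counter(), Counter()
        for v, g, label in records:
            total[v] += 1
            agree[v] += label == consensus[g]
        weights = {v: agree[v] / total[v] for v in total}
        consensus = vote(records, weights)
    return consensus, weights

# Toy data: three volunteers, one repeat classification, and one spammer.
records = [
    ("a", "g1", "spiral"), ("a", "g2", "elliptical"), ("a", "g3", "spiral"),
    ("b", "g1", "spiral"), ("b", "g2", "elliptical"), ("b", "g3", "spiral"),
    ("c", "g1", "elliptical"), ("c", "g2", "spiral"), ("c", "g3", "spiral"),
    ("a", "g1", "elliptical"),  # a's second look at g1 is ignored
]
records += [("spam", g, "elliptical") for g in ("g1", "g2", "g3") for _ in range(6)]

cleaned = clean(records)
consensus, weights = combine(cleaned)
print(consensus)
print(weights)
```

In this toy data the spammer's 18 classifications are discarded before voting, and volunteer "c", who often disagrees with the consensus, ends up with a lower weight than "a" and "b".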
Galaxy Zoo
From millions to billions to trillions . . . .
Galaxy Zoo
Fig 5.4 (Salganik 2018), inspired by Banerji et al. (2010)
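A minimal sketch of how human labels can be magnified with supervised learning: a model is trained on the items that people labelled and then applied to items that no person has looked at. The two numeric features and all values are made up, and a simple nearest-centroid rule stands in for whatever model a real project would use.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

def train(features, labels):
    """Learn one centroid per human-assigned label."""
    by_label = {}
    for x, y in zip(features, labels):
        by_label.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_label.items()}

def predict(model, x):
    """Assign the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sq_dist(model[y]))

# Hypothetical features for galaxies that volunteers have already labelled.
human_features = [(0.9, 0.8), (0.8, 0.9), (0.2, 0.3), (0.3, 0.2)]
human_labels = ["elliptical", "elliptical", "spiral", "spiral"]

model = train(human_features, human_labels)

# Galaxies that no human has classified get machine labels.
unlabelled = [(0.85, 0.75), (0.25, 0.35), (0.1, 0.2)]
machine_labels = [predict(model, x) for x in unlabelled]
print(machine_labels)
```

The human effort is spent once on the training set; the trained model can then be applied to as many new items as needed.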
https://www.zooniverse.org/
Benoit et al. (2016)
Here’s a piece of the 2010 manifesto of the Labour Party in the United Kingdom:
“Millions of people working in our public services embody the best values of
Britain, helping empower people to make the most of their own lives while
protecting them from the risks they should not have to bear on their own.
Just as we need to be bolder about the role of government in making markets
work fairly, we also need to be bold reformers of government.”
What I like about Benoit et al. (2016)
- Better, not cheaper
- Experts are a bug, not a feature
Wrapping up:
- Easy-task, big-scale problems where humans are better than computers
- Split-apply-combine strategy
- Human effort can be magnified with supervised learning
- Increasingly important as we move from numeric survey data to text, images, movies, and audio
What to read next:
- Human computation (Law and von Ahn, 2011)
- reCAPTCHA (von Ahn et al., 2008)
- Background about Amazon Mechanical Turk (Bohannon, 2016)