The Crowd and the Cosmos: Adventures in the Zooniverse
quite-irony searching for the remains of Toby the pig, the first pig
to visit the Antarctic twice.*
But there was no escaping the fact that one of the things that
might be threatening the well-being of the charismatic, if stink-
ing, penguins was our own presence alongside them. Part of the
point of our mission was to try and understand human impact
* Toby had sailed south with a Uruguayan expedition, who sold him to a French ship on the way home. Toby’s second visit to the Antarctic was thus part of an expedition organized by the French explorer Jean-Baptiste Charcot, whose crew were among the first to deliberately overwinter on the Peninsula. Charcot is remembered now for his superior planning, which allowed for a modicum of comfort that he believed was important for the crew’s well-being. There’s a photo of him breakfasting on the ice at a carefully laid table, butler and champagne on hand, and each morning the ship’s crew enjoyed the daily paper from Paris, distributed precisely a year late.
Too Many Penguins 145
on the landscape. Tourism was in evidence whenever we were
there, for obvious reasons, and science—either by our own
efforts or those of the research bases we visited—also had a
visible impact. Two hidden threats need to be taken into account,
though.
The first is the one you’re thinking about. It’s hard to mention
‘ice’, now, without immediately thinking of the melting induced
by climate change that is afflicting the world’s ice caps and gla-
ciers. The last few summers have been brutal in the polar regions,
with the Arctic ice cap frequently so far from its usual extent for
the time of year that it’s obvious even to the most casual observer
looking at satellite photographs. The Antarctic Peninsula, too, is
warming, although the complex flow of water around here has
complicated the picture somewhat. The other threat
lies offshore, and comes as something of a shock to those who
have come here precisely because it is an unspoilt wilderness.
‘Fishing?’ they say, as Tom explains his idea that the penguin
colonies we are visiting are suffering from the effects of
overzealous human fisheries. The colonies look robust enough, but
data from the camera traps show that the numbers of one species
in particular are declining. It is the species of penguin that depends most of all on krill, the diverse and nutritious tiny crustaceans
that swarm throughout the Southern Ocean.
Krill have been harvested seriously in these waters for the last
couple of decades. That may sound surprising, since you’ve probably never
ordered a krill burger, but in addition to providing food for fish farms the
bountiful harvest produces gallons of sub-standard cod-liver
and fish oil. Buy a generic supplement from your pharmacy, and
without reading very carefully you’ll be competing with the
Antarctic penguins for their primary foodstuff.
There is a lot of krill to go round. One source, the Commission
for the Conservation of Antarctic Marine Living Resources,
points out that the mass of all the Antarctic krill in the world
outweighs us humans. Unlike human flesh, though, most of the
biomass which exists in the form of krill is eaten each year and
then replaced. This rapid turnover makes it even harder to believe
that human intervention could make much of a difference, but it
actually makes the problem worse.
As the krill disappear, harvested for our consumption, those
penguin species which aren’t able to adjust to find other food-
stuffs suffer. Tom’s camera network has already picked up a dif-
ference between the resilient gentoo penguins, which have been
able to switch from krill to other food, and populations of less
versatile chinstraps.
Cameras can only tell you so much, which is why I found
myself reaching tentatively into a fridge full of a summer’s worth
of guano samples. Tom and team have long been in the habit of
collecting penguin poop from most of their sites, hoping to
marry serious lab analysis with camera data. The cameras tell us
how the colony is doing, and the lab work will tell us how at least
some individual penguins are doing, bearing information on
diet, on health, and on any infections the penguins carry.
First though, we had to get the samples back to Oxford, and as
the ship sailed around the peninsula there was a rare opportun-
ity. One of our stops was Port Lockroy, a British base more than
a hundred years old and home to the only functioning Post Office
on the continent. The volunteer staff from the charity UK
Antarctic Heritage Trust, who run the base, act as curators,
maintenance staff, wildlife recorders, and more during their stint there,
but during the tourist season they also run a thriving gift shop.
Must-have items include an Antarctic tartan tie, but what every-
one really wants is to send a postcard back to friends at home.
There is, therefore, a working Post Office, though you have to
wait for the next ship before mail leaves the base. If Tom and I
could package the samples that had been languishing in his
fridge, they could be dispatched directly from the base to Oxford.
That meant diluting each sample with a stabilizing chemical,
which meant donning latex gloves and squeezing into the tiny
bathroom attached to Tom’s cabin.
I’ve already mentioned the unappetizing smell of the penguin
colonies, and the stench of their droppings within the small cabin
was unforgettable. It got into one’s nostrils instantly, onto
clothes, and, I feared, into my skin. After a while on the produc-
tion line, handling tubes Tom passed to me while pointing them
as far from my nose as possible, I pleaded for a break and headed
out to get coffee.
I got back to the cabin to notice one of the ship’s efficient crew
fiddling with pipes in the corridor outside. I was just explaining
this to Tom when a knock on the door revealed the ship’s purser,
in pursuit of an unearthly smell that was disturbing those paying
passengers who were trying to have the holiday of a lifetime
around our scientific expedition. Apparently he wanted to see
Tom’s bathroom, which he feared was the source of the smell now
working its way through the ship’s air conditioning.
I don’t really know how to describe what we looked like, two
unshaven researchers with blue hospital gloves, inane grins, and
the realization of what we’d done slowly showing up in our
expressions. Somehow, between our shock and his confusion,
we agreed he should come back and inspect the plumbing later
and got on with the job. I’ve never worked faster in my life, and
somehow we got the samples safely packaged before a more for-
cible intervention arrived.
Reflecting on the morning’s events in the ship’s bar later, won-
dering if people were avoiding me because of a lingering stench,
I realized just how close to astrophysics Tom’s research was.
Obviously, I’m rarely called in to deal with galaxy excreta, at least
directly, but Tom (with my inept help) had become, while still
doing science, a professional and an expert in data collection. His
command of the moving network of ships and people, and the
resulting spread of cameras and data, reminded me of the unsung
heroes of the Sloan Digital Sky Survey, the engineers and astron-
omers who spend huge amounts of time gathering the data on
which the rest of us depend.
The fact that we’d rushed to treat the samples before the smell
contaminated a cruise ship won’t ever be mentioned in a scien-
tific paper. The careful day-to-day diplomacy that ensured that a
ship employed for quite another purpose delivered the team
to each of their cameras is reflected in an unbroken data series,
but won’t ever be commented on in formal publication. And the
fact that Tom was able to leap up that hill, and I wasn’t, won’t be
recorded anywhere but in the pages of this book.
Similarly, without people who understand how to make the
telescope perform and keep the camera operating at peak
performance, all the science that uses this data is measurably poorer.
Without people who are really good at being Sloan’s ‘cold observer’*,
my measurements of galaxy properties would be less accurate.
It’s rare, in science, to pay much attention to these hidden parts
of the process, which since the nineteenth century have become
increasingly professionalized and, for most of us as we’ve entered
the digital age, increasingly remote from our day-to-day lives.
When you go to that much effort—whether in the surpris-
ingly chilly New Mexico night or the much more predictable
cold of the Antarctic—it’s important to make the best of all of
the data you can obtain. For Tom and his team that means shar-
ing the images his camera network takes, and which they go to
* This is a real job title; the cold observer is out with the telescope while the
‘warm observer’ is inside with the electronics.
such lengths to return to Oxford, with the entire world, via a
Zooniverse project called Penguin Watch.
Penguin Watch is perhaps the simplest of all our projects, so
straightforward that I know 5-year-old children have taken part.
Presented with an image from the cameras, all you have to do is
count and then click on the penguins. This information seems
almost banal in the context of a single image, but over time we
learn how colonies shrink and expand, how much time individ-
ual penguins are spending out at sea, and even what their feeding
behaviour is like. These details can then be compared to weather
and climate records, to changes in fishing permits and protected
areas, and to visitor numbers to get a sense of what’s really hap-
pening.
For that to work, the data must be accurate. An individual pen-
guin counter can make a mistake, and so the key is to combine
everyone’s penguin counts to produce a consensus. All of our
citizen science projects depend on multiple people looking at the
same data, but there are a few maddening quirks about penguins
that make it more difficult. First—and you might have to take my
word for it—there are a lot of them. We, of course, are after an
accurate, scientific count, but some of the cameras capture hun-
dreds of penguins in a single image. That’s not ideal, obviously,
but the cameras are set up once and then left for the year and so
when things change, so does their view.
The problem with having too many penguins is that people
baulk at counting them. Being presented with an image of hun-
dreds of the critters is, for many people, more annoying than
interesting. Our designers and developers know this, and so
Penguin Watch reassures you that after your penguin count
reaches thirty it’s OK to move on. (It turns out there are people
out there, many of them attracted to Penguin Watch, who deeply
resent this message. They are people who like order, who like
completing a challenge no matter how many penguins it involves.
And so we learn once again that people are complicated.) For
these busy images, therefore, few people cover every penguin.
Some click the front row, others a cluster near the back, and so
on and so forth.
What we’re left with once many people have seen each image
is a mosaic of things that at least one person thought was a pen-
guin. What the team need is a list of likely penguins, ideally with
some sense of how likely each possibility is to be real. This
requires some careful data handling, but the basic idea is sim-
ple—if two people mark a penguin in roughly the same place,
then each marking counts as a ‘vote’ that there’s a penguin there.
The nice thing about this is that the researcher working on the
data has only got to make two decisions. The first concerns how
close together two markings have to be for them to be in ‘roughly’
the same place. That can be found by trial and error. If you have a
few images that experts have gone through, then you can just
adjust the parameter until you get results that look pretty good.
If you fail, then you need more people to look at each image so
that you get more data. This is essentially what we do when test-
ing a project.
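The two-step recipe above can be sketched in a few lines of Python. This is an illustrative sketch, not the Zooniverse’s own aggregation code: the function names `cluster_marks` and `tune_radius`, the choice of single-linkage clustering, the pixel coordinates, and the placeholder vote threshold of two are all assumptions made for the example. Marks closer together than a chosen radius are merged, each mark in a group counts as one vote, and the radius is picked by trying a few values against images whose penguins an expert has already counted.

```python
import math

Point = tuple[float, float]


def cluster_marks(marks: list[Point], radius: float) -> list[tuple[Point, int]]:
    """Merge marks lying within `radius` pixels of one another (single-linkage,
    via a small union-find) and return (centroid, votes) for each group."""
    parent = list(range(len(marks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Any two marks within `radius` of each other are taken to be the same penguin.
    for i in range(len(marks)):
        for j in range(i + 1, len(marks)):
            if math.dist(marks[i], marks[j]) <= radius:
                parent[find(i)] = find(j)

    groups: dict[int, list[Point]] = {}
    for i, p in enumerate(marks):
        groups.setdefault(find(i), []).append(p)

    # Each mark in a group is one vote for a penguin at the group's centroid.
    clusters = []
    for members in groups.values():
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        clusters.append(((cx, cy), len(members)))
    return clusters


def tune_radius(images, candidates, min_votes=2):
    """Return the candidate radius whose consensus counts best match expert counts.

    images: list of (marks, expert_count) pairs for expert-checked images.
    min_votes: hypothetical placeholder for the vote threshold.
    """
    def total_error(radius):
        error = 0
        for marks, expert_count in images:
            found = sum(1 for _, votes in cluster_marks(marks, radius) if votes >= min_votes)
            error += abs(found - expert_count)
        return error

    return min(candidates, key=total_error)


# Three people mark one penguin, two mark another, one clicks on a rock.
marks = [(10, 10), (11, 10), (10, 12), (50, 50), (51, 49), (90, 10)]
print(sorted(votes for _, votes in cluster_marks(marks, radius=5)))  # [1, 2, 3]
print(tune_radius([(marks, 2)], candidates=[0.5, 5, 100]))  # 5
```

Single-linkage is the simplest choice, but it chains: in a tightly packed colony a run of marks can link two neighbouring penguins into one, which is one reason the clustering literature offers more elaborate alternatives. A fuller version would also make sure that one volunteer’s two clicks on the same bird count as a single vote.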
That’s the easy part. If you want to make it complicated, there
are reams of computer science papers that deal with this sort of
clustering problem, and plenty of researchers who will make it
more complex for you. The degree of proximity required to have
the algorithm decide that two marks refer to the same penguin
need not be a constant, for example, but could depend on how far
the markings are from the camera, or the time of day, or how
many other penguins are in the image, or a host of other varia-
bles.
In general, though, because Penguin Watch volunteers are
pretty good, there’s not much need for a complex solution to the
problem. The second decision you have to make is much more
difficult. How many people have to have marked the same spot
for us to conclude there is a penguin there?
We want accurate data, so the temptation is to say that we
need lots of people for each penguin. That’ll produce a set of
places where we’re really, really sure a penguin is—but we’ll miss
most of them. If we want complete data—if we want to catch
every penguin going, even the ones at the back disguising them-
selves effectively as rocks—then we need to relax, make the algo-
rithm less picky, and include places where only a few people say
there’s a penguin in the final list.
That will decrease the accuracy, so there’s a trade-off to be
made between accuracy and completeness; in a problem like
this, you have to choose which you care more about. This turns
out to be a general feature of this type of problem, and for different
scientific problems you might pick different combinations.
If you wanted a few excellent images of penguins, for example,
you might go for high accuracy and low completeness. If your
research called for an upper limit on the number of surviving
penguins, then completeness becomes more important than
accuracy.
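The trade-off can be made concrete with a small, hedged Python sketch. Here consensus penguins arrive as (position, votes) pairs, one per cluster of marks, and an expert’s positions for the same image serve as the truth. For each minimum number of votes the function reports accuracy (the fraction of accepted penguins that are real, which computer scientists call precision) and completeness (the fraction of real penguins recovered, or recall). The function name, the 3-pixel matching radius, and the toy data are hypothetical, and the matching is deliberately loose rather than one-to-one.

```python
import math

Point = tuple[float, float]


def accuracy_completeness(clusters, expert, min_votes, match_radius=3.0):
    """clusters: list of ((x, y), votes); expert: list of true (x, y) positions.

    Accepts only clusters with at least `min_votes` votes, then returns
    (accuracy, completeness). A point counts as matched if anything on the
    other list lies within `match_radius`; this is a loose, not one-to-one, match.
    """
    kept = [centre for centre, votes in clusters if votes >= min_votes]

    def near(p: Point, others: list[Point]) -> bool:
        return any(math.dist(p, q) <= match_radius for q in others)

    # Accuracy (precision): what fraction of accepted "penguins" are real?
    accuracy = sum(near(c, expert) for c in kept) / len(kept) if kept else math.nan
    # Completeness (recall): what fraction of real penguins did we recover?
    completeness = sum(near(e, kept) for e in expert) / len(expert) if expert else math.nan
    return accuracy, completeness


# Toy data: two easy penguins, one faint one at the back that only one
# person spotted, and one stray click on a rock.
clusters = [((10, 10), 5), ((50, 50), 3), ((30, 70), 1), ((90, 10), 1)]
expert = [(10, 10), (50, 50), (30, 70)]

for k in range(1, 6):
    acc, comp = accuracy_completeness(clusters, expert, k)
    print(f"min_votes={k}: accuracy={acc:.2f}, completeness={comp:.2f}")
```

Raising the threshold from one vote to three pushes accuracy from 0.75 to 1.0 but drops completeness from 1.0 to about 0.67, because the single-vote penguin at the back is thrown away along with the stray click on the rock.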
But this isn’t the end of the story. We haven’t used all of the
information we have to hand, and in trying to squeeze more
from the data, in order to do justice to all the hard work that went
into collecting it, things get interesting. So far, we’ve treated everyone’s classifications as being of equal value, but it must be true
that some will be better, or more diligent, at penguin counting
than others. For example, we know a large proportion of people
who take part in Penguin Watch in particular have ‘Mom’, ‘Mum’,
or ‘Dad’ in their usernames; it seems reasonable to assume that
these represent households where participating in science by
counting penguins is a family activity, involving children too
young to have their own account.
It’s possible too that these kids are less good than their older
counterparts at the task, and thus we should pay less attention to
them. On the other hand, I could easily believe that the combin-
ation of growing up as digital natives, smaller hands, and the
insatiable desire for repetition that characterizes most 5-year-olds
of my acquaintance might make them killer classifiers. We don’t
have to decide in advance; with only a small number of images
labelled by experts, we can find the people who are best at pen-
guin counting, and pay attention to them.
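One simple way to find the good penguin counters is sketched below in Python, as an assumption-laden illustration rather than the method the Zooniverse actually uses. Each volunteer’s clicks on expert-labelled images are scored by how often they land near a real penguin, with add-one smoothing so that a single lucky click does not earn a perfect score; a cluster’s support is then the sum of its voters’ scores instead of a plain head count. The names `estimate_skill` and `weighted_vote`, the usernames, the 3-pixel radius, and the default weight of 0.5 for volunteers who have never seen an expert-labelled image are all hypothetical.

```python
import math
from collections import defaultdict

Point = tuple[float, float]


def estimate_skill(gold_marks, expert_by_image, radius=3.0):
    """Score volunteers on expert-labelled ("gold") images.

    gold_marks: iterable of (user, image_id, (x, y)) clicks.
    expert_by_image: image_id -> list of expert penguin positions.
    Returns user -> smoothed fraction of that user's clicks that hit a real penguin.
    """
    hits: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for user, image_id, point in gold_marks:
        total[user] += 1
        if any(math.dist(point, e) <= radius for e in expert_by_image[image_id]):
            hits[user] += 1
    # Add-one (Laplace) smoothing: (hits + 1) / (clicks + 2) keeps scores away from 0 and 1.
    return {user: (hits[user] + 1) / (total[user] + 2) for user in total}


def weighted_vote(voters, skill, default=0.5):
    """Support for one cluster: the sum of its voters' skill scores.
    Volunteers never seen on a gold image get the (hypothetical) default weight."""
    return sum(skill.get(user, default) for user in voters)


expert_by_image = {"img1": [(10, 10), (50, 50)]}
gold_marks = [
    ("MumAndKids", "img1", (10, 11)),
    ("MumAndKids", "img1", (50, 49)),
    ("volunteer_b", "img1", (80, 80)),  # a click on a rock
    ("volunteer_b", "img1", (10, 10)),
]
skill = estimate_skill(gold_marks, expert_by_image)
print(skill)  # {'MumAndKids': 0.75, 'volunteer_b': 0.5}
print(weighted_vote(["MumAndKids", "newcomer"], skill))  # 1.25
```

This score only rewards clicking on real penguins; it says nothing about the penguins a volunteer skipped, which a fuller model would also need to track.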
It’s a simple and obvious idea. The ability of websites to under-
stand us based solely on our interactions with them is one of the
things that drives the digital world. If Google can discover where I
work just by watching the information I happen to give it (after
a brief, embarrassing period during which it insisted the Lamb
and Flag pub was my office), and if Facebook can serve up the
memory of the long-forgotten party I want to see each morning,
then surely discovering whether I’m any good at penguin count-
ing by asking me to count penguins is the digital equivalent of
child’s play.
Once we start thinking like this, there’s much we can do to