
Here Comes Everybody


by Clay Shirky


  Wikipedia’s Content

  Mere volume would be useless if Wikipedia articles weren’t any good, however. By way of example, the article on Pluto as of May 2007 begins:

  Pluto, also designated 134340 Pluto, is the second-largest known dwarf planet in the Solar System and the tenth-largest body observed directly orbiting the Sun. Originally considered a planet, Pluto has since been recognized as the largest member of a distinct region called the Kuiper belt. Like other members of the belt, it is primarily composed of rock and ice and is relatively small; approximately a fifth the mass of the Earth’s Moon and a third its volume. It has an eccentric orbit that takes it from 29 to 49 AU from the Sun, and is highly inclined with respect to the planets. As a result, Pluto occasionally comes closer to the Sun than the planet Neptune.

  That paragraph includes ten links to other Wikipedia articles on the solar system, astronomical units (AU), and so on. The article goes on for five thousand words and ends with an extensive list of links to other sites with information about Pluto. This kind of thing—a quick overview, followed by broad and sometimes quite lengthy descriptions, ending with pointers to more information—is pretty much what you’d want in an encyclopedia.

  The Pluto article is not unusual; you can find articles of similarly high quality all over the site:

  The Okeechobee Hurricane, or Hurricane San Felipe Segundo, was a deadly hurricane that struck the Leeward Islands, Puerto Rico, the Bahamas, and Florida in September of the 1928 Atlantic hurricane season. It was the first recorded hurricane to reach Category 5 status on the Saffir-Simpson Hurricane Scale in the Atlantic basin.

  or

  Ludwig Josef Johann Wittgenstein (April 26, 1889 in Vienna, Austria—April 29, 1951 in Cambridge, England) was an Austrian philosopher who contributed several ground-breaking ideas to philosophy, primarily in the foundations of logic, the philosophy of mathematics, the philosophy of language, and the philosophy of mind. His influence has been wide-ranging, placing him among the most significant philosophers of the 20th century.

  And so on. There are hundreds of thousands of articles whose value is both relied on and improved daily.

  The most common criticism of Wikipedia over the years stemmed from simple disbelief: “That can’t work.” Sanger understood this objection and titled an early essay on the growth of Wikipedia “Wikipedia is wide open. Why is it growing so fast? Why isn’t it full of nonsense?” In that article he ascribed at least part of the answer to group editing:

  Wikipedia’s self-correction process (Wikipedia co-founder Jimmy Wales calls it “self-healing”) is very robust. There is considerable value created by the public review process that is continually ongoing on Wikipedia—value that is very easy to underestimate, for those who have not experienced it adequately.

  One other fateful choice, which actually predates the founding of Wikipedia itself, was the name, or rather the “-pedia” suffix. Wikipedia, like all social tools, is the way it is in part because of the way the software works and in part because of the way the community works. Though wikis can be used for many kinds of writing, the early users were guided by the rhetorical models of existing encyclopedias, which helped synchronize the early work: there was a shared awareness of the kind of writing that should go into a project called Wikipedia. This helped coordinate the users in ways that were not part of the software but were part of the community that used the software.

  Wikipedia has now transcended the traditional functions of an encyclopedia. Within minutes of the bombs going off in the London transit system, someone created a Wikipedia page called “7 July 2005 London bombings.” The article’s first incarnation was five sentences long and attributed the explosions to a power surge in the Underground, one of the early theories floated before the bus bombing was linked to the Underground explosions. The Wikipedia page received more than a thousand edits in its first four hours of existence, as additional news came in; users added numerous pointers to traditional news sources (more symbiosis) and a list of contact numbers for people either trying to track loved ones or simply figuring out how to get home. What was conceived as an open encyclopedia in 2001 has become a general-purpose tool for gathering and distributing information quickly, a use that further cemented Wikipedia in people’s minds as a useful reference work. Note the virtuous circle at work here: because enough people thought of using Wikipedia as a coordinating resource, it became one, and because it became one, more people learned to think of it as a coordinating resource. This evolution was made possible precisely because the community had gotten the narrower version of an encyclopedia right earlier, which provided a high-visibility platform for further experimentation.

  Skepticism about Wikipedia’s basic viability made some sense back in 2001; there was no way to predict, even with the first rush of articles, that the rate of creation and the average quality would both remain high, but today those objections have taken on the flavor of the apocryphal farmer beholding his first giraffe and exclaiming, “Ain’t no such animal!” Wikipedia’s daily utility for millions of users is no longer in question; the interesting questions are elsewhere.

  Unmanaged Division of Labor

  It’s easy to understand how Cunningham’s original wiki functioned; a small group that knows one another presents organizational challenges no worse than getting a neighborhood poker game going. But Wikipedia doesn’t operate at the scale of a neighborhood poker game; it operates at the scale of a Vegas casino. Something this big seems like it should require managers, a budget, a formal work-flow process. Without those things how could it possibly work? The simple but surprising answer is: spontaneous division of labor. Division of labor is usually associated with highly managed settings, but it’s implemented here in a far more unmanaged way. Wikipedia is able to aggregate individual and often tiny contributions, hundreds of millions of them annually, made by millions of contributors, all performing different functions.

  Here’s how it works. Someone decides that an article on, say, asphalt should exist and creates it. The article’s creator doesn’t need to know everything (or indeed much of anything) about asphalt. As a result, such articles often have a “well, duh” quality to them. The original asphalt article read, in full, “Asphalt is a material used for road coverings.” The article was created in March 2001, at the dawn of Wikipedia, by a user named Cdani, as little more than a placeholder saying, “We should have an article on asphalt here.” (Wikipedians call this a “stub.”)

  Once an article exists, it starts to get readers. Soon a self-selecting group of those readers decide to become contributors. Some of them add new text, some edit the existing article, some add references to other articles or external sources, and some fix typos and grammatical errors. None of these people needs to know everything about asphalt; all contributions can be incremental. And not all edits are improvements: added material can clutter a sentence, intended corrections can unintentionally introduce new errors, and so on. But every edit is itself provisional. This works to Wikipedia’s benefit partly because bad changes can be rooted out faster, but also partly because human knowledge is provisional. During 2006 a debate broke out among astronomers on whether to consider Pluto a planet or to relegate it to another category; as the debate went on, Wikipedia’s Pluto page was updated to reflect the controversy, and once Pluto was demoted to the status of “dwarf planet,” the Pluto entry was updated to reflect that almost immediately.

  A Wikipedia article is a process, not a product, and as a result, it is never finished. For a Wikipedia article to improve, the good edits simply have to outweigh the bad ones. Rather than filtering contributions before they appear in public (the process that helped kill Nupedia), Wikipedia assumes that new errors will be introduced less frequently than existing ones will be corrected. This assumption has proven correct; despite occasional vandalism, Wikipedia articles get better, on average, over time.

  It’s easy to understand division of labor in industrial settings. A car comes into being as it passes down an assembly line from one group of specialists to the next—first the axle, then the wheels. A wiki’s division of labor is nothing like that. As of 2007, the asphalt article has had 129 different contributors, who have subdivided it into two separate articles, one on asphalt, the petroleum derivative, and another on asphalt concrete, the road covering. To each of these articles, the contributors have added or edited sections on the chemistry, history, and geographic distribution of asphalt deposits, on different types of asphalt road surfaces, and even on the etymology of the word “asphalt,” transforming the original eight-word entry into a pair of detailed and informative articles. No one person was responsible for doing or even managing the work, and yet researching, writing, editing, and proofreading have all unfolded over the course of six years. This pattern also exists across Wikipedia as a whole: one person can write new text on asphalt, fix misspellings in Pluto, and add external references for Wittgenstein in a single day. This system also allows great variability of effort—of the 129 contributors on the subject of asphalt, a hundred of them contributed only one edit each, while the half-dozen most active editors contributed nearly fifty edits among them, almost a quarter of the total. The most active contributor on the subject of asphalt, a user going by SCEhardt, is ten times more active than the average contributor and over a hundred times more active than the least active contributor.

  This situation is almost comically chaotic—a car company would go out of business in weeks if it let its workers simply work on what they wanted to, when they wanted to. A car company has two jobs. The obvious one is making cars, but the other job is being a company. It’s hard work to be a company; it requires a great deal of effort and a great deal of predictability. The inability to count on an employee’s particular area of expertise, or even on their steady presence, would doom such an enterprise from the start. There is simply no commercially viable way to let employees work on what interests them as the mood strikes. There is, however, a noncommercial way to do so, which involves being effective without worrying about being efficient.

  Wikis avoid the institutional dilemma. Because contributors aren’t employees, a wiki can take a staggering amount of input with a minimum of overhead. This is key to its success: it does not need to make sure its contributors are competent, or producing steadily, or even showing up. Mandated specialization of talent and consistency of effort, seemingly the hallmarks of large-scale work, actually have little to do with division of labor itself. A business needs employee A and employee B to put in the same effort if they are doing the same job, because it needs interchangeability and because it needs to reduce friction between energetic and lazy workers. By this measure, most contributors to Wikipedia are lazy. The majority of contributors edit only one article, once, while the majority of the effort comes from a much smaller and more active group. (The two asphalt articles, with a quarter of the work coming from six contributors, are a microcosm of this general phenomenon.) Since no one is being paid, the energetic and occasional contributors happily coexist in the same ecosystem.

  The freedom of contributors to jump from article to article and from task to task makes the work on any given article unpredictable, but since there are no shareholders or managers or even customers, predictability of that sort doesn’t matter. Furthermore, since anyone can act, the ability of the people in charge to kill initiatives through inaction is destroyed. This is what befell Nupedia; because everyone working on that project understood that only experts were to write articles, no one would even begin an article they knew little about, and as long as the experts did nothing (which, on Nupedia, is mostly what they did), nothing happened. In an expert-driven system, an article on asphalt that read “Asphalt is a material used for road coverings” would never appear, even as a stub. So short! So uninformative! Why, anyone could have written that! Which, of course, is the principal advantage of Wikipedia.

  In a system where anyone is free to get something started, however badly, a short, uninformative article can be the anchor for the good article that will eventually appear. Its very inadequacy motivates people to improve it; many more people are willing to make a bad article better than are willing to start a good article from scratch. In 1991 Richard Gabriel, a software engineer at Sun Microsystems, wrote an essay that included a section called “Worse Is Better,” describing this effect. He contrasted two programming languages, one elegant but complex, the other awkward but simple. The belief at the time was that the elegant solution would eventually triumph; Gabriel instead predicted, correctly, that the simpler language would spread faster, and as a result, more people would come to care about improving the simple language than improving the complex one. The early successes of a simple model created exactly the incentives (attention, the desire to see your work spread) needed to create serious improvements. These kinds of incentives help ensure that, despite the day-to-day chaos, a predictable pattern emerges over time: readers continue to read, some of them become contributors, Wikipedia continues to grow, and articles continue to improve. The process is more like creating a coral reef, the sum of millions of individual actions, than creating a car. And the key to creating those individual actions is to hand as much freedom as possible to the average user.

  A Predictable Imbalance

  Anything that increases our ability to share, coordinate, or act increases our freedom to pursue our goals in congress with one another. Never have so many people been so free to say and do so many things with so many other people. The freedom driving mass amateurization removes the technological obstacles to participation. Given that everyone now has the tools to contribute equally, you might expect a huge increase in equality of participation. You’d be wrong.

  You may have noticed a great imbalance of participation in many examples in this book. The Wikipedia articles for asphalt had 129 contributors making 205 total edits, but the bulk of the work was contributed by a small fraction of participants, and just six accounted for about a quarter of the edits. A similar pattern appears on Flickr: 118 photographers contributed over three thousand Mermaid Parade photos, but the top tenth contributed half of those, and the most active photographer, Czarina, contributed 238 photos (about one in twelve) working alone. This shape, called a power law distribution, is shown in Figure 5-1.

  Figure 5-1: The distribution of photographers contributing photos of the 2005 Coney Island Mermaid Parade.

  Five points are shown on this graph. The two leftmost data points are the most and second-most active photographers. The most active photographer is far more active than the second most active, and they are both far more active than most of the rest of the photographers. The average number of photos taken (all photos divided among all photographers) is twenty-six, while the median (the middle photographer) took eleven photos, and the mode (the number of photos that appeared most frequently) is a single photo.

  Note the sharp drop-off in the number of photos between the top few contributors and most of the participants. Notice too that because of the disproportionate contributions of these few photographers, three-quarters of the photographers contributed a below-average number of pictures. This pattern is general to social media: on mailing lists with more than a couple dozen participants, the most active writer is generally much more active than the person in the number-two slot, and far more active than average. The longest conversation goes on much longer than the second-longest one, and much longer than average, and so on. Bloggers, Wikipedia contributors, photographers, people conversing on mailing lists, and social participation in many other large-scale systems all exhibit a similar pattern.
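
  By way of illustration only, here is a minimal Python sketch with invented photo counts (not the actual Mermaid Parade data), skewed in roughly the way the figure describes, showing how that skew pulls the mean above the median and leaves most contributors below average:

```python
from statistics import mean, median, mode

# Hypothetical photo counts per photographer, skewed the way the figure
# describes: one very active contributor, a steep drop, and a long tail.
# These numbers are invented for illustration, not the real Flickr data.
photo_counts = [238, 90, 50, 35, 28, 22, 18, 15, 13, 11, 9, 8,
                7, 6, 5, 4, 3, 2, 1, 1, 1, 1, 1, 1]

avg = mean(photo_counts)
print("mean:  ", round(avg, 1))           # well above the median
print("median:", median(photo_counts))    # the middle contributor
print("mode:  ", mode(photo_counts))      # a single photo

# Because a few heavy contributors pull the average up, most people land below it.
below = sum(1 for count in photo_counts if count < avg)
print(f"{below} of {len(photo_counts)} photographers are below average")
```

  For this made-up sample the mean is about 24, the median 7.5, and the mode 1, with 19 of 24 photographers below the average, echoing the pattern in the figure.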

  There are two big surprises here. The first is that the imbalance is the same shape across a huge number of different kinds of behaviors. A graph of the distribution of photo labels (or “tags”) on Flickr is the same shape as the graph of readers-per-weblog and contributions-per-user to Wikipedia. The general form of a power law distribution appears in social settings when some set of items—users, pictures, tags—is ranked by frequency of occurrence. You can rank a group of Flickr users by the number of pictures they submit. You can rank a collection of pictures by the number of viewers. You can rank tags by the number of pictures they are applied to. All of these graphs will be in the rough shape of a power law distribution.
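
  As a rough sketch of what “ranked by frequency of occurrence” looks like in practice, the following Python snippet simulates tag usage with invented tags and popularity weights (none of this is Flickr data), counts the occurrences, and ranks them:

```python
from collections import Counter
from random import choices, seed

# A toy version of ranking by frequency: simulate tag usage with invented
# tags and popularity weights (not Flickr data), count how often each tag
# appears, and rank the tags by that count.
seed(42)
tags = ["parade", "mermaid", "coneyisland", "costume", "beach", "nyc", "summer", "crowd"]
weights = [1 / (i + 1) for i in range(len(tags))]   # power-law-ish popularity
usage = choices(tags, weights=weights, k=10_000)    # 10,000 simulated taggings

for rank, (tag, count) in enumerate(Counter(usage).most_common(), start=1):
    print(f"{rank:>2}. {tag:<12} {count}")

# The counts drop steeply from the top rank and then flatten into a long
# tail, which is the characteristic rough shape of a power law distribution.
```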

  The second surprise is that the imbalance drives large social systems rather than damaging them. Fewer than two percent of Wikipedia users ever contribute, yet that is enough to create profound value for millions of users. And among those contributors, no effort is made to even out their contributions. The spontaneous division of labor driving Wikipedia wouldn’t be possible if there were concern for reducing inequality. On the contrary, most large social experiments are engines for harnessing inequality rather than limiting it. Though the word “ecosystem” is overused as a way to make simple situations seem more complex, it is merited here, because large social systems cannot be understood as a simple aggregation of the behavior of some nonexistent “average” user.

  The most salient characteristic of a power law is that the imbalance becomes more extreme the higher the ranking. The operative math is simple—a power law describes data in which the item in the nth position has roughly 1/n of the value of the item in the first position. In a pure power law distribution, the gap between the first and second position is larger than the gap between the second and third, and so on. In Wikipedia article edits, for example, you would expect the second most active user to have committed only half as many edits as the most active user, and the tenth most active to have committed one-tenth as many. This is the shape behind the so-called 80/20 rule, where, for example, 20 percent of a store’s inventory accounts for 80 percent of its revenues. It has been part of social science literature since Vilfredo Pareto, an Italian economist working in the early 1900s, found a power law distribution of wealth in every country he studied; the pattern was so common that he called it “a predictable imbalance.” This is also the shape behind Chris Anderson’s discussion in The Long Tail: most items offered at online retailers like iTunes and Amazon don’t sell well, but in aggregate they generate considerable income. The pattern applies not just to goods but to social interactions as well. Real-world distributions are only an approximation of this formula, but the imbalance it creates appears in an astonishing number of places in large social systems.
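
  The 1/n relationship is easy to check with a toy example. The sketch below builds an idealized power law for a hypothetical group of one hundred editors (illustrative numbers, not Wikipedia data), verifies that the second most active editor has half the edits of the first and the tenth has one-tenth, and computes how much of the total comes from the top fifth; for this pure 1/n series the top fifth comes out near 70 percent, a reminder that real data only approximate the rounded 80/20 figure:

```python
# An idealized power law: the nth most active of 100 hypothetical editors
# makes 1/n as many edits as the most active one. Illustrative numbers only,
# not actual Wikipedia data.
N = 100
top_edits = 100
edits = [top_edits / n for n in range(1, N + 1)]    # ranks 1, 2, ..., N

print(edits[1] / edits[0])    # 0.5 -> second place has half the edits of first
print(edits[9] / edits[0])    # 0.1 -> tenth place has one-tenth

# Concentration of effort: the share contributed by the top 20 percent of
# editors. For this pure 1/n series it comes to roughly 70 percent; real
# distributions only approximate the familiar 80/20 split.
share = sum(edits[:N // 5]) / sum(edits)
print(f"top 20% of editors account for {share:.0%} of all edits")
```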

 
