The Most Human Human
Carnegie Mellon computer scientist Guy Blelloch suggests the following:
One might think that lossy text compression would be unacceptable because they are imagining missing or switched characters. Consider instead a system that reworded sentences into a more standard form, or replaced words with synonyms so that the file can be better compressed. Technically the compression would be lossy since the text has changed, but the “meaning” and clarity of the message might be fully maintained, or even improved.
But—Frost—“poetry is what gets lost in translation.” And—doesn’t it seem—what gets lost in compression?
Establishing “standard” and “nonstandard” ways of using a language necessarily involves some degree of browbeating. (David Foster Wallace’s excellent essay “Authority and American Usage” shows how this plays out in dictionary publishing.) I think that “standard” English—along with its subregions of conformity: “academic English,” specific fields’ and journals’ style rules, and so on—has always been a matter of half clarity, half shibboleth. (That “standard” English is not the modally spoken version should be enough to argue for its nonstandardness, enough to argue that there is some hegemonic force at work, even if unwittingly or benevolently.)
But often within communities of speakers and writers, these deviations have gone unnoticed, let alone unpunished: if everyone around you says “ain’t,” then the idea that “ain’t” ain’t a word seems ridiculous, and correctly so. The modern, globalized world is changing that, however. If American English dominates the Internet, and British-originating searches return mostly American-originating results, then all of a sudden British youths are faced with a daily assault of u-less colors and flavors and neighbors like no other generation of Brits before them. Also, consider Microsoft Word: some person or group at Microsoft decided at some point in time which words were in its dictionary and which were not, subtly imposing their own vocabulary on users worldwide.23 Never before did, say, Baltimorean stevedores or Houston chemists have to care if their vocabulary got the stamp of approval from Seattle-area software engineers: who cared? Now the vocabulary of one group intervenes in communications between members of other groups, flagging perfectly intelligible and standardized terms as mistakes. On the other hand, as long as you can spell it, you can write it (and subsequently force the dictionary to stop red-underlining it). The software doesn’t actually stop people from typing what they want.
That is, as long as those people are using computers, not phones. Once we’re talking about mobile phones, where text prediction schemes rule, things get scarier. In some cases it may be literally impossible to write words the phone doesn’t have in its library.
Compression, as noted above, relies on bias—because making expected patterns easier to represent necessarily makes unexpected patterns harder to represent. The yay-for-the-consumer ease of “normal” language use also means there’s a penalty for going outside those lines. (For a typewriter-written poem not to capitalize the beginnings of lines, the beginnings of sentences, or the word “I” may be either a sign of laziness or an active aesthetic stand taken by the author—but for users subject to auto-”correction,” it can only be the latter.)
The more helpful our phones get, the harder it is to be ourselves. For everyone out there fighting to write idiosyncratic, high-entropy, unpredictable, unruly text, swimming upstream of spell-check and predictive auto-completion: Don’t let them banalize you. Keep fighting.24
Compression and the Concept of Time
Systems which involve large amounts of data that go through relatively small changes—a version control system, handling successive versions of a document, or a video compressor, handling successive frames of a film—lend themselves to something called “delta compression.” In delta compression, instead of storing a new copy of the data each time, the compressor stores only the original, along with files of the successive changes. These files are referred to as “deltas” or “diffs.” Video compression has its own sub-jargon: delta compression goes by “motion compensation,” fully stored frames are “key frames” or “I-frames” (intra-coded frames), and the diffs are called “P-frames” (predictive frames).
The idea, in video compression, is that most frames bear some marked resemblance to the previous frame—say, the lead actor’s mouth and eyebrow have moved very slightly, but the static background is exactly the same—thus instead of encoding the entire picture (as with the I-frames), you just (with the P-frames) encode the diffs between the last frame and the new one. When the entire scene cuts, you might as well use a new I-frame, because it bears no resemblance to the last frame, so encoding all the diffs will take as long as or longer than just encoding the new image itself. Camera edits tend to contain the same spike and decay of entropy that words do in the Shannon Game.
As with most compression, lowered redundancy means increased fragility: if the original, initial document or key frame is damaged, the diffs become almost worthless and all is lost. In general, errors or noise tends to stick around longer. Also, it’s much harder to jump into the middle of a video that’s using motion compensation, because in order to render the frame you’re jumping to, the decoder must wheel around and look backward for the most recent key frame, prepare that, and then make all of the changes between that frame and the one you want. Indeed, if you’ve ever wondered what makes streamed online video behave so cantankerously when you try to jump ahead in it, this is a big part of the answer.25
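The key-frame-plus-diffs scheme is simple enough to sketch in code. A minimal illustration in Python, where frames are flat lists of pixel values; the `key_interval` parameter and the (index, value) diff format are illustrative assumptions only (real codecs like H.264 use motion vectors over blocks of pixels, not per-pixel diffs):

```python
# Toy sketch of delta compression with key frames (I-frames) and
# diff-only frames (P-frames). A frame is a flat list of pixel values.

def encode(frames, key_interval=5):
    """Store a full copy every key_interval frames; otherwise store only
    the (index, new_value) pairs that changed since the previous frame."""
    encoded = []
    prev = None
    for i, frame in enumerate(frames):
        if i % key_interval == 0 or prev is None:
            encoded.append(("I", list(frame)))   # key frame: full copy
        else:
            diff = [(j, v) for j, (u, v) in enumerate(zip(prev, frame))
                    if u != v]
            encoded.append(("P", diff))          # P-frame: changes only
        prev = frame
    return encoded

def seek(encoded, n):
    """Render frame n: rewind to the latest key frame at or before n,
    then replay every diff up to n -- the source of sluggish seeking."""
    start = max(i for i in range(n + 1) if encoded[i][0] == "I")
    frame = list(encoded[start][1])
    for i in range(start + 1, n + 1):
        for j, v in encoded[i][1]:
            frame[j] = v
    return frame
```

With a mostly static picture, each P-frame records only a handful of changed pixels instead of a full frame, and `seek` makes the cantankerous-streaming problem visible: to show frame *n* it must first find the last key frame and replay every diff in between.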
But would it be going too far to suggest that delta compression is changing our very understanding of time? The frames of a film, each bumped downward by the next; the frames of a View-Master reel, each bumped leftward by the next … but these metaphors for motion—each instant in time knocked out of the present by its successor, like bullet casings kicked out of the chamber of an automatic weapon—don’t apply to compressed video. Time no longer passes. The future, rather than displacing it, revises the present, spackles over it, touches it up. The past is not the shot-sideways belt of spent moments but the blurry underlayers of the palimpsest, the hues buried in overpainting, the ancient Rome underfoot of the contemporary one. Thought of in this way, a video seems to heap upward, one infinitesimally thin layer at a time, toward the eye.
Diffs and Marketing, Personhood
A movie poster gives you one still out of the 172,800-ish frames that make up a feature film, a billboard distills the experience of a week in the Bahamas to a single word, a blurb tries to spear the dozen hours it will take to read a new novel using a trident of just three adjectives. Marketing may be lossy compression pushed to the breaking point. It can teach us things about grammar by cutting the sentence down to its key word. But if we look specifically at the way art is marketed, we see a pattern very similar to I-frames and P-frames, but in this case it’s a cliché and a diff. Or a genre and a diff.
When artists participate in a stylistic and/or narrative tradition (which is always), we can—and often do—describe their achievement as a diff. Your typical love story, with a twist: ____________________. Or, just like the sound of ________ but with a note of _____________. Or, ________ meets _______.
Children become diffs of their parents. Loves, diffs of old loves. Aesthetics become diffs of aesthetics. This instant: a diff of the one just gone.
Kundera:
What is unique about the “I” hides itself exactly in what is unimaginable about a person. All we are able to imagine is what makes everyone like everyone else, what people have in common. The individual “I” is what differs from the common stock, that is, what cannot be guessed at or calculated, what must be unveiled.
When the diff between two frames of video is too large (this often occurs across edits or cuts), it’s often easier to build a new I-frame than to enumerate all the differences. The analogy to human experience is the moment when, giving up on the insufficient “It’s like ____ meets ____” mode of explanation-by-diff, we say “It’d take me longer to explain it to you than to just show you it” or “I can’t explain it really, you just have to see it.” This, perhaps, as a definition of the sublime?
Diffs and Morality
Thomas Jefferson owned slaves; Aristotle was sexist. Yet we consider them wise? Honorable? Enlightened? But to own slaves in a slave-owning society and to be sexist in a sexist society are low-entropy personality traits. In a compressed biography, we leave those out. But we also tend on the whole to pass less judgment on the low-entropy aspects of someone’s personality compared to the high-entropy aspects. The diffs between them and their society are, one could argue, by and large wise and honorable. Does this suggest, then, a moral dimension to compression?
Putting In and Pulling Out: The Eros of Entropy
Douglas Hofstadter:
We feel quite comfortable with the idea that a record contains the same information as a piece of music, because of the existence of record players, which can “read” records and convert the groove-patterns into sounds … It is natural, then, to think … decoding mechanisms … simply reveal information which is intrinsically inside the structures, waiting to be “pulled out.” This leads to the idea that for each structure, there are certain pieces of information which can be pulled out of it, while there are other pieces of information which cannot be pulled out of it. But what does this phrase “pull out” really mean? How hard are you allowed to pull? There are cases where by investing sufficient effort, you can pull very recondite pieces of information out of certain structures. In fact, the pulling-out may involve such complicated operations that it makes you feel you are putting in more information than you are pulling out.
The strange, foggy turf between a decoder pulling information out and putting it in, between implication and inference, is a thriving ground for art criticism and literary translation, as well as that interesting compression technique known as innuendo, which thrives on the deniability latent in this in-between space. There’s a kind of eros in this, too—I don’t know where you (intention) end and I (interpretation) begin—as the mere act of listening ropes us into duet.
Men and Women: (Merely?) Players
Part of the question of how good, say, a compressed MP3 sounds is how much of the original uncompressed data is preserved; the other part is how good the MP3 player (which is usually also the decompressor) is at guessing, interpolating the values that weren’t preserved. To talk about the quality of a file, we must consider its relationship to the player.
Likewise, any compression contests or competitions in the computer science community require that participants include the size of the decompressor along with their compressed file. Otherwise you get the “jukebox effect”—“Hey, look, I’ve compressed Mahler’s Second Symphony down to just two bytes! The characters ‘A7’! Just punch ’em in and listen!” You can see the song hasn’t been compressed at all, but simply moved inside the decompressor.
With humans, however, it works a little differently. The size of our decompressor is fixed—about a hundred billion neurons. Namely, it’s huge. So we might as well use it. Why read a book with the detachment of a laser scanning an optical disc? When we engage with art, the world, each other, let us mesh all of our gears, let us seek that which takes maximum advantage of the player—that which calls on our full humanity.
I think the reason novels are regarded to have so much more “information” than films is that they outsource the scenic design and the cinematography to the reader. If characters are said to be “eating eggs,” we as readers fill in the plate, silverware, table, chairs, skillet, spatula … Granted, each reader’s spatula may look different, whereas the film pins it down: this spatula, this very one. These specifications demand detailed visual data (ergo, the larger file size of video) but frequently don’t matter (ergo, the greater experienced complexity of the novel).
This, for me, is a powerful argument for the value and potency of literature specifically. Movies don’t demand as much from the player. Most people know this; at the end of the day you can be too beat to read but not yet too beat to watch television or listen to music. What’s less talked about is the fragility of language: when you watch a foreign film with subtitles, notice that only the words have been translated; the cinematography and the soundtrack are perfectly “legible” to you. Even without “translation” of any kind, one can still enjoy and to a large extent appreciate foreign songs and films and sculptures. But that culture’s books are just so many squiggles: you try to read a novel in Japanese, for instance, and you get virtually nothing out of the experience. All of this points to how, one might say, personal language is. Film and music’s power comes in large part from their universality; language’s doggedly nonuniversal quality points to a different kind of power altogether.
Pursuit of the Unimaginable
Kundera:
Isn’t making love merely an eternal repetition of the same? Not at all. There is always the small part that is unimaginable. When he saw a woman in her clothes, he could naturally imagine more or less what she would look like naked …, but between the approximation of the idea and the precision of reality there was a small gap of the unimaginable, and it was this hiatus that gave him no rest. And then, the pursuit of the unimaginable does not stop with the revelations of nudity; it goes much further: How would she behave while undressing? What would she say when he made love to her? How would her sighs sound? How would her face distort at the moment of orgasm? … He was not obsessed with women; he was obsessed with what in each of them is unimaginable … So it was a desire not for pleasure (the pleasure came as an extra, a bonus) but for possession of the world.
The pursuit of the unimaginable, the “will to information,” as an argument for womanizing? Breadth before depth? Hardly. But a reminder, I think, that a durable love is one that’s dynamic, not static; long-running, not long-standing; a river we step into every day and not twice. We must dare to find new ways to be ourselves, new ways to discover the unimaginable aspects of ourselves and those closest to us.
Our first months of life, we’re in a state of perpetual dumbfoundedness. Then, like a film, like a word, things go—though not without exception—from inscrutable to scrutable to familiar to dull. Unless we are vigilant: this tendency, I believe, can be fought.26 Maybe it’s not so much about possession of the world as a kind of understanding of it. A glint of its insane detail and complexity.
The highest ethical calling, it strikes me, is curiosity. The greatest reverence, the greatest rapture, are in it. My parents tell the story that as a child I went through a few months when just about all I did was point to things and shout, “What’s it!” “Ta-ble-cloth.” “What’s it!” “Nap-kin.” “What’s it!” “Cup-board.” “What’s it!” “Pea-nut-but-ter.” “What’s it!” … Bless them, they conferred early on and made the decision to answer every single time with as much enthusiasm as they could muster, never to shut down or silence my inquiry no matter how it grated on them. I started a collection of exceptional sticks by the corner of the driveway that soon came to hold every stick I found that week. How can I stay so irrepressibly curious? How can we keep the bit rate of our lives up?
Heather McHugh: We don’t care how a poet looks; we care how a poet looks.
Forrest Gander: “Maybe the best we can do is try to leave ourselves unprotected. To keep brushing off habits, how we see things and what we expect, as they crust around us. Brushing the green flies of the usual off the tablecloth. To pay attention.”
The Entropy of English
What the Shannon Game—played over a large enough body of texts and by a large enough group of people—allows us to do is actually quantify the information entropy of written English. Compression relies on probability, as we saw with the coin example, and so English speakers’ ability to anticipate the words in a passage correlates to how compressible the text should be.
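One can get a feel for this by letting a simple statistical model play the Shannon Game. A crude sketch in Python, under the assumption of a model that guesses the next character in order of how often each candidate has followed the same one-character context earlier in the text (Shannon’s human subjects, of course, bring vastly richer models to bear):

```python
from collections import Counter, defaultdict

def shannon_game(text, context_len=1):
    """Crude Shannon Game player: guess each next character by trying
    candidates in order of how often they followed the same context in
    the text seen so far. Returns the average number of guesses needed --
    lower averages mean more predictable, lower-entropy text."""
    followers = defaultdict(Counter)
    total_guesses = 0
    for i in range(context_len, len(text)):
        context = text[i - context_len:i]
        ranked = [c for c, _ in followers[context].most_common()]
        actual = text[i]
        # cost = rank of the true next character (unseen candidates last)
        total_guesses += (ranked.index(actual) + 1 if actual in ranked
                          else len(ranked) + 1)
        followers[context][actual] += 1
    return total_guesses / (len(text) - context_len)
```

On perfectly repetitive text the average number of guesses falls to 1; the less predictable the text, the higher it climbs. Shannon showed how the distribution of such guess counts bounds the entropy of the source.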
Most compression schemes use a kind of pattern matching at the binary level: essentially a kind of find and replace, where long strings of digits that recur in a file are swapped out for shorter strings, and then a kind of “dictionary” is maintained that tells the decompressor how and where to swap the long strings back in. The beauty of this approach is that the compressor looks only at the binary—the algorithm works essentially the same way when compressing audio, text, video, still image, and even computer code itself. When English speakers play the Shannon Game, though, something far trickier is happening. Large and sometimes very abstract things—from spelling to grammar to register to genre—start guiding the reader’s guesses. The ideal compression algorithm would know that adjectives tend to come before nouns, and that there are patterns that appear frequently in spelling—“u after q” being a good example, a pairing so common they had to alter Scrabble to accommodate it—all of which reduce the entropy of English. And the ideal compressor would know that “pearlescent” and “dudes” almost never pop up in the same sentence.27 And that one-word sentences, no matter the word, are too curt, tonally, for legal briefs. And maybe even that twenty-first-century prose tends to use shorter sentences than nineteenth-century prose.
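LZW is a classic instance of this dictionary approach, and a toy version fits on a page. A sketch in Python; real compressors add refinements omitted here, such as capping the dictionary’s size and packing codes into variable-width bits. Note that in this variant the decompressor rebuilds the very same dictionary as it reads, so the dictionary itself never has to be shipped alongside the file:

```python
def lzw_compress(data):
    """Toy LZW: build a dictionary of recurring byte strings on the fly
    and emit short codes in place of the longer strings they stand for."""
    dictionary = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                              # keep extending the match
        else:
            out.append(dictionary[w])           # emit code for longest match
            dictionary[wc] = len(dictionary)    # remember the longer string
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes):
    """Rebuild the dictionary while decoding, mirroring the compressor."""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                                   # the one tricky LZW corner case
            entry = w + w[:1]
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)
```

Run on repetitive input—`b"TOBEORNOTTOBEORTOBEORNOT"`, say—the compressor emits fewer codes than there are bytes, because recurring strings like `TO` and `BE` get replaced by single dictionary codes.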
So, what, you may be wondering, is the entropy of English? Well, if we restrict ourselves to twenty-six uppercase letters plus the space, we get twenty-seven characters, which, uncompressed, requires roughly 4.75 bits per character.28 But, according to Shannon’s 1951 paper “Prediction and Entropy of Printed English,” the average entropy of a letter as determined by native speakers playing the Shannon Game comes out to somewhere between 0.6 and 1.3 bits. That is to say, on average, a reader can guess the next letter correctly half the time. (Or, from the writer’s perspective, as Shannon put it: “When we write English half of what we write is determined by the structure of the language and half is chosen freely.”) That is to say, a letter contains, on average, the same amount of information—1 bit—as a coin flip.
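The arithmetic is easy to check with Shannon’s entropy formula, H = −Σ p·log₂ p. A quick calculation in Python of both figures above—the fixed-length cost of a twenty-seven-character alphabet, and the one bit carried by a fair coin flip:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin flip carries exactly 1 bit:
coin = entropy([0.5, 0.5])                 # = 1.0

# 27 equally likely characters (A-Z plus the space) need log2(27) bits
# each under a fixed-length code -- the "uncompressed" figure above:
uniform_27 = entropy([1 / 27] * 27)        # = log2(27), roughly 4.75
```

The gap between the two—roughly 4.75 bits of raw encoding versus the ~1 bit of actual information Shannon measured—is, in principle, how much slack a compressor of English has to work with.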