Just like we can use language to be elitist, we can also use language to show solidarity, like politicians who suddenly adopt a folksy way of talking on the campaign trail. In some cases, shifting language is practically universal: none of us talk to a dog the same way we talk to our coworkers (“Who’s a good boss! Do you want to go for walkies and also give me a raise?”). In other cases, our linguistic styles are bound up in a specific identity: William Labov studied residents of Martha’s Vineyard and found that those who identified strongly with traditional island culture had stronger local accents than those who didn’t. More recent research has shown that intonation in particular is related to social identity: young men in Washington, D.C., with one black and one white parent talk differently depending on whether they identify as black or biracial; the speech patterns of people living in Appalachia depend on how “rooted” they feel in the local community; and the speech of Jewish women in Ohio and New Jersey varies depending on their relationship with their Jewish identity.
In still other cases, the alignment is less about showing that you’re part of the same group and more about borrowing coolness from another group. Research on youth language in several countries shows a parallel trend: there are distinctive linguistic forms associated with economically and racially marginalized youth in contexts ranging from the American inner city to the banlieues of Paris to the favelas of Rio de Janeiro. Elements of their language then get picked up by white middle-class youth. They don’t adopt enough to make them no longer seem comfortably middle class, but just enough to strike a note of autonomy from parents, teachers, and other authority figures. Of course, when a word like “lit” or “bae” gets sufficiently associated with mainstream culture—and especially when it gets picked up by brands capitalizing on trends—it then loses its appeal to hip insiders, prompting the cycle to begin again.
In English, the association of words from African American English with coolness and their subsequent appropriation by non–African Americans is much older than the internet. Terms associated with African American music, including blues, jazz, rock and roll, and rap, have all made their way into broader Western culture, while the speakers who originated them continue to be stigmatized for the way they talk. One thing that changes with the decentralization of online media is that the original speakers can become more visible. While a white person in the sixties listening to Elvis might have had no idea that he was singing a style heavily influenced by black performers like B.B. King and Sister Rosetta Tharpe, it’s easier to see that mainstream America’s adoption of “on fleek” came from a post on Vine (a now defunct service for sharing short videos) by the user Peaches Monroee. Still, it’s tempting to mislabel the many words currently being appropriated into general American pop culture from African American English as “social media words” simply because they’re used by young people, and young people are on social media, without giving due credit to the words’ true origins. Fittingly, the internet has come up with a word for this: columbusing, or white people claiming to discover something that was already well established in another community, by analogy with how Columbus gets credit for discovering America despite the millions of people who already lived there.
In other languages, English itself is often a source of trendy new linguistic influence, one that signals interest in a broader, global culture rather than a smaller local one. The situation in Arabic is particularly interesting, because it involves multiple languages, multiple dialects, and multiple scripts. Most Arabic speakers know two varieties of Arabic: Modern Standard Arabic, which is the standardized, multinational version based on Classical Arabic that people learn to write in school but speak only rarely, and a local dialect, such as Egyptian or Moroccan Arabic, which is the language of everyday speech and doesn’t have an official written form. Back when Arabic speakers, like most of the rest of the world, associated writing with formality and speech with informality, this worked fine. Sure, you’d have news anchors speaking the standard and advertisements written in the vernacular to add a bit of local color, but for the most part, Arabic was comfortably settled in what linguists call a diglossia: when a society has two languages or dialects that almost everyone speaks, each of which serves a distinct social function.
Then personal computers and the internet arrived, and things got really complicated, really quickly. Early computers and websites were in English and were often used by people at universities who spoke English to communicate with the rest of the world. And, importantly, these new devices generally came with English keyboards and English displays, rather than Arabic ones. So speakers figured out a way of writing Arabic sounds using the Latin alphabet, a system known by various names, such as ASCII Arabic, the Arabic chat alphabet, Franco-Arabic, Araby, Arabizi, and Arabish.
Arabizi has some distinct advantages. Most official Romanizations of Arabic use “kh” to represent the Arabic letter خ, a sound that may be familiar to English speakers as the “ch” in Scottish “loch” or the “x” in the Spanish pronunciation of “Mexico.”* But “kh” is actually a rather confusing way of representing this sound, because it looks the same as simply the /k/ sound followed by the /h/ sound, a sequence which is rare in English (found only in compounds like “cookhouse”) but fairly common in Arabic. So informal writers use a different convention. Based on the similarity in shape, people instead write it as the number 5 or 7' (that’s 7 with an apostrophe), which looks sort of like the خ in a mirror. They don’t use plain 7, though, because that’s already in use to represent ح (its dotless equivalent), another sound that’s hard to transcribe—many systems use “h” for it, because it sounds kind of like a throatier /h/, but that’s a problem because Arabic also has the more common /h/ sound that’s in English. Using 7 instead solves the problem of one letter representing two sounds.
By similar logic, the numbers 9' and 9 can be used for the letters ض and ص, the numbers 6' and 6 for ظ and ط, and the numbers 3' and 3 for the letters غ and ﻉ—all representing sounds that don’t have ready equivalents in the Latin alphabet. What’s important about Arabizi is that it assumes familiarity with Arabic already: it’s a grassroots system based on the priorities of literate native speakers that each of these different sounds should be represented by a distinct symbol. Other Romanizations tend to do the opposite, rendering the same letters as variants of “d” and “s,” “dh” and “t,” “gh” and a backwards apostrophe (or simply omitting it altogether, as in the word “Arabic” itself, which is technically “3arabi”), based on what they sound like to non–Arabic speakers. Sure, sometimes it’s useful to be able to interact on a more globalized level, like when writing about names and locations in Arabic-speaking countries for an English-language newspaper, but sometimes you also care about the local. To Arabic speakers, these distinctions are completely vital, and omitting them is like trying to convince English speakers to spell “sing” and “thing” the same way because French doesn’t care about that weird English “th” sound.
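The digit conventions described above amount to a small lookup table, and a short sketch can make that concrete. The function name `expand_arabizi_digits` is hypothetical, and the sketch assumes the trailing-apostrophe spelling described in the text (7' rather than '7). It swaps only the digit symbols for their Arabic letters and leaves every Latin letter untouched, so it is an illustration of the convention rather than a real transliterator.

```python
# Digit conventions as described in the text; the apostrophe marks the dotted letter.
# Code points are written as escapes to avoid right-to-left display confusion.
ARABIZI_DIGITS = {
    "7'": "\u062E",  # khah (dotted), also written as 5
    "5": "\u062E",   # khah
    "7": "\u062D",   # hah (dotless)
    "9'": "\u0636",  # dad
    "9": "\u0635",   # sad
    "6'": "\u0638",  # zah
    "6": "\u0637",   # tah
    "3'": "\u063A",  # ghain
    "3": "\u0639",   # ain
}


def expand_arabizi_digits(text: str) -> str:
    """Replace Arabizi digit symbols with Arabic letters, longest match first."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ARABIZI_DIGITS:
            # A digit followed by an apostrophe is one symbol, not two.
            out.append(ARABIZI_DIGITS[pair])
            i += 2
        elif text[i] in ARABIZI_DIGITS:
            out.append(ARABIZI_DIGITS[text[i]])
            i += 1
        else:
            # Latin letters and everything else pass through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Checking the two-character symbols before the single digits is what keeps 7' and 7 distinct, which is the whole point of the convention: one symbol per sound.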
Although Arabizi was initially made necessary because computers didn’t support the Arabic alphabet, it’s now taken on a social dimension. A paper by David Palfreyman and Muhamed Al Khalil, analyzing chat conversations between students at an English-speaking university in the United Arab Emirates, gave an example of a cartoon that one student drew to represent other students in her class. One student was labeled with the name “Sheikha,” using the official Romanization of the university. But the nickname version of the same name, which doesn’t have an officially sanctioned spelling, was written in the cartoon as “shwee5,” using Arabizi “5” to represent the same sound as the official “kh.” It’s a hand-drawn cartoon: there’s no technological reason for either name to be written in the Latin alphabet. But at least for some people, it’s become cool: participants in the study commented that “we feel that only ppl of our age could understand such symbols” and that it makes “the word sound more like ‘Arabic’ pronunciation rather than English. For example, we would type the name ('7awla) instead of (Khawla). It sounds more Arabic this way.”
In particular, with advancements in keyboarding meaning that it’s easier to type the Arabic alphabet than it was in the 1990s, people generally use the official alphabet for the Standard variety, with its established writing system, and can now use either alphabet in a grassroots fashion for the local varieties. A study of the linguistic choices of prominent Egyptians on Twitter gives us some examples of how people decide which one to use. A politician tweeted predominantly in Modern Standard, reflecting his older age and the traditional expectation of politicians to speak the standard. A popular singer tweeted mostly in Egyptian Colloquial Arabic with some Modern Standard, both written in Arabic script, reflecting his younger age and fanbase, as well as the language his songs were in. A fancy restaurant tweeted in English and Egyptian Arabic written in Arabizi, to appeal to a wealthy, cosmopolitan clientele who would have been educated abroad. A cultural center tweeted in English and Modern Standard, to appeal to an educated regional and international audience. Egyptian Twitter users could thus potentially see four different linguistic conventions on their one feed: English and Modern Standard Arabic in their respective scripts, and Egyptian Arabic in both. And they could pick and choose between them for their own messages, depending on who they are and who they’re trying to talk to.
While we may not all have multiple alphabets to choose from, we do all make linguistic choices based on our audience. Jacob Eisenstein, the linguist who was Twitter-mapping “yinz” and “hella,” and his collaborator Umashanthi Pavalanathan at Georgia Tech decided to split up English tweets in a different way. Rather than look at location, language, or script, they looked at the difference between tweets about a particular topic, say the Oscars, versus tweets in conversation with another person. As it happens, Twitter has an easy way of automatically grouping these two kinds of tweets. If you put a hashtag in your tweet, like #oscars, then other people who are also interested in the Oscars know that they can click on or search that hashtag to find other tweets that also contain #oscars. If you put someone’s Twitter username after an @ sign, like @Beyonce, then that user will get a notification about your message and hopefully reply to you the same way.
Since # and @ are distinct symbols, it’s easy enough to automatically sort a giant pile of tweets, discarding the ones that contain both or neither. Sure, it’s a bit rough—people probably aren’t searching through sarcastic hashtags like #sorrynotsorry for topical information, and Beyoncé probably won’t tweet you back (uh, #sorry)—but it works pretty well at a large scale. What Eisenstein and Pavalanathan found was that people used regionalisms like “hella,” slang like “nah” and “cuz,” emoticons like :), and other informal language more in the tweets that @mentioned another user, while the same people used a more standardized, formal style in their tweets with hashtags. They theorized that, just as in person we’d generally talk more formally when addressing a roomful of people than when talking one-on-one, we’re directing a tweet with a hashtag towards a large group of people. Our @mentions, on the other hand, are more informal, only noticed by a select few—and we adjust our language electronically the same way we do out loud.
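The sorting step described above can be sketched in a few lines. This is only an illustration of the idea, not the researchers’ actual pipeline: the function name `classify_tweet` and the two regular expressions are assumptions, and real tweets would need more careful handling.

```python
import re

# A "#" or "@" counts only when it starts a token, so "a@b.com" is not a mention.
HASHTAG = re.compile(r"(?<!\w)#\w+")
MENTION = re.compile(r"(?<!\w)@\w+")


def classify_tweet(text):
    """Return "hashtag", "mention", or None for tweets with both or neither."""
    has_tag = bool(HASHTAG.search(text))
    has_mention = bool(MENTION.search(text))
    if has_tag == has_mention:
        # Both or neither: discard, as the text describes.
        return None
    return "hashtag" if has_tag else "mention"


def split_tweets(tweets):
    """Sort a pile of tweets into the two groups, dropping the rest."""
    groups = {"hashtag": [], "mention": []}
    for tweet in tweets:
        kind = classify_tweet(tweet)
        if kind is not None:
            groups[kind].append(tweet)
    return groups
```

With the pile split this way, the informal features in each group can be counted and compared.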
Studies of people who tweet in other languages show a similar pattern. A study of people in the Netherlands who tweet in both the locally dominant language, Dutch, and a local minority language, Frisian or Limburgish, found that tweets with hashtags were more likely to be written in Dutch, so as to reach a broader audience, but that users would often switch to a minority language when they were replying to someone else’s tweet. The inverse was less common: few people would start in a smaller language for the hashtagged tweet and switch to the larger language for the one-on-one reply.
Another study investigated how people use informal language in Indonesian, comparing how they write in private, one-on-one text messages versus public tweets. For example, the Indonesian word sip means “okay, yeah, good,” but to emphasize it, you can respell it siiippp, and “thank you” is terima kasih, but if you want to try to match the pronunciation of the popular Jakarta dialect, you can respell it makasi. If @replies on Twitter are slightly more casual than messages broadcast in hashtags, then texts are more intimate still, and sure enough, Indonesians used informal respellings like this almost four times more often in texts than in tweets. Tweets were also nearly twice as long as texts, on average, and contained more complex sentences and a larger variety of words.
From an internet linguistics perspective, language variation online is important not so much because it’s new (language has always varied), but because it’s only rarely been written down. Literature favors a few elite languages and dialects, even though there are around seven thousand languages in the world and at least half of the world’s population speaks more than one language. So this glorious variety masks a digital divide: people who switch between languages or who speak a less written linguistic variety run into difficulties with many of the automated linguistic tools that internet residents rely on, such as search, voice recognition, automatic language detection, and machine translation. These tools are trained on large corpora, often from formal sources like books, newspapers, and radio, which are biased towards the forms of language that are already well documented. One method of bridging this gap uses public social media writing itself as training input—a promising avenue, considering that the quantity of informal writing produced on the internet exceeds the volume of formal writing many times over.
There aren’t very many quadrilingual Arabic-Frisian-Indonesian-English speakers: I wouldn’t expect to see a study of tweets switching between all four anytime soon. But regardless of the specific linguistic circles we hang out with online, we’re all speakers of internet language because the shape of our language is influenced by the internet as a cultural context. Every language online is becoming decentralized, getting more of its informal register written down. Every speaker is learning how to write exquisite layers of social nuance that we once reserved for speech, whether we mark them by switching alphabets, switching languages, or respelling words.
All our texting and tweeting is making us better at expressing ourselves in writing. Researcher Ivan Smirnov analyzed posts by nearly a million users in St. Petersburg on the Russian equivalent of Facebook, a social media site called VK, from 2008 to 2016. He found that average word length, a measure of complexity, increases as people get older and as they get more education, as we might expect. But Smirnov also found that messages overall have been getting more complex over time. As he put it: “15-year-old users in 2016 wrote more complex posts than users of any age in 2008.”
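Average word length is simple to compute, which is part of its appeal as a rough measure. The sketch below is one possible way to calculate it, assuming that a “word” is any unbroken run of letters; the function name `average_word_length` is hypothetical, and the study’s exact tokenization is not specified here.

```python
import re

# A run of letters in any script; digits, underscores, and punctuation are excluded.
WORD = re.compile(r"[^\W\d_]+")


def average_word_length(text: str) -> float:
    """Mean number of letters per word, or 0.0 if the text has no words."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)
```

Because the pattern matches letters from any alphabet, the same function works on Russian posts as well as English ones.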
No one who writes “u” does it because they’re unaware that “you” is an option. A literacy study by Michelle Drouin and Claire Davis points out that the idea that textisms might interfere with our ability to produce the formal standard just doesn’t fit with what we know about how memory works. Slang and abbreviations are for very common words: “u” for “you,” “ur” for “your,” “idk” or “dunno” for “I don’t know,” and so on. That’s the point—the sender saves a bit of effort, and the receiver can interpret them because they’re so frequent. We don’t get internet abbreviations for longer, rarer words and phrases, like “pterodactyl” or “do you wanna start a band?” In psychological terms, shortcuts are for ideas that we’ve overlearned. You might forget how to find a fancy restaurant that you only go to occasionally, but you can get from your bed to your bathroom even when you’re half asleep. If we were going to forget any part of language, it would be the rare, two-dollar words like “grandiloquent” or “sedulous” that we memorize with flashcards for the sake of a test, not the short words we learned as tiny children and keep encountering every day in both their abbreviated and non-abbreviated forms.
Just as conversation and public speaking have coexisted throughout human history, informal writing online can share space with more formal styles. Formal internet genres like ebooks and news sites and company websites no more resemble your quickly dashed-off text message than print books and newspapers and company brochures resembled a hastily scribbled note on the kitchen table. Several studies show that people who use a lot of internet abbreviations perform, at worst, just as well on spelling tests, formal essays, and other measures of literacy as people who never use abbreviations, and sometimes even better.
Instead, what people are doing with internet slang is a good deal more subtle. The linguists Sali Tagliamonte and Derek Denis got seventy-one teenagers to donate the written records of their instant messaging conversations so that they could disentangle what they were actually doing. They found that the teens weren’t actually using internet slang all that much. Unlike examples from hyperbolic articles, where almost every word is replaced with slang (r u gna b on teh interwebz l8r?), only 2.4 percent of the actual teens’ messages were slang. (I’m reminded of the surveys of perception versus reality for other kinds of youth behavior, where everyone thinks everyone else is drinking more and having more sex than them.) What the teens were doing instead was more sophisticated: they intermixed the very informal features, like smiley faces and acronyms, with very formal ones, words like “must” and “shall” that are rare in speech. Here are a few snippets from various conversations:
aaaaaaaaagh the show tonight shall rock some serious jam
Jeff says “lyk omgod omgod omgodzzzzzZZZzzzzz!!!11one”
heheh okieee! must finish it now ill ttyl