Researchers at Oxford University have concluded that what others hear as a foreign accent is actually a speech impediment that is mistakenly interpreted as an accent. For example, a patient with FAS might find that her attempt to pronounce the sound “is” comes out as “eez.” It is also common for patients who have suffered trauma or mild stroke to put an even stress on every syllable, which is very unlike an English speaker (but very like a French speaker). Put the two together and you could conceivably get an accent that sounds a bit Pepé Le Pew.
Anne’s patient most likely lost the ability to pronounce the letter w, which came out like a v. As for the changes in cadence, some scientists have suggested that once patients realize they’re sounding foreign, it’s easier to switch than to fight, so they adopt—subconsciously or otherwise—the language’s cadence. In fact, some go a step further and start to adopt the mannerisms (or at least the perceived mannerisms, often exaggerated) of their “new” culture, becoming real-life Zeligs.
“Is she seeking treatment?” I ask Anne.
“There is no treatment.”
“I see.”
Anne must notice my brow furrowing in thought. “What is it?”
“I was just wondering . . . is her German better than my French?”
* It’s the latter.
** You may, however, commit a faux pas if you order a côté voiture in France, where the drink, which may well have originated in Paris, is nevertheless more likely to be called (Académie française members, brace yourselves) un side-car.
Fruit Flies When You’re Having Fun
“What’s that fish doing in my ear?”
“It’s a Babel fish. It’s translating for you.”
— DOUGLAS ADAMS, The Hitchhiker’s Guide to the Galaxy, 1979
Tell me if you’ve heard this one: God, according to the book of Genesis, which religious scholars date to the fifth or sixth century BC, is feeling a tad insecure because the city of Babel is erecting a skyscraper, taller than anything ever seen round those parts. The tower is rising alarmingly close to heaven, so close that the workers are only a few floors short of knocking on heaven’s gate. Really, the last thing God needs is to have his minions construct a stairway to heaven. You have to earn your way there.
Something has to be done, but what? Now, this is where a lot of people get the story wrong: God does not destroy the Tower of Babel; He merely halts further construction, like a powerful union boss. Now, He could’ve opted to do this in any number of ways. This is, after all, God, the same God who stopped the sun dead in its tracks for twenty-four hours so the Israelites could finish off the Amorites at Gibeon. But God recognizes that the tower isn’t so much the problem as the symptom of a problem.
The real issue is that the peoples of the world “have one language, and nothing will be withholden from them which they purpose to do.” The gift of language is so mighty it poses a threat to God! So He famously says, “Come, let us go down and confound their speech.”
Hang on a second. “Us?” Who’s the “us”? Either God has company up there or He is using the royal “we,” which, you’ll remember, is descended from the Latin usage of vos/nos and led to the French power semantic, the vous/tu distinction.
In any event, He does (or they do) a thoroughly good and lasting job of confounding language. Unable to communicate, the workers abandon tower and city alike and go back to their previous business of waging war against one another, and to this day the world’s been grappling with the problems of the planet’s confounded tongues. Undoubtedly we haven’t been nearly as close to heaven since.
Linguists and sadistic high school French teachers are not the only ones who have tried to deal with God’s handiwork at Babel. Scientists have also taken a crack at this, and it’s about time. Society, by which I mean the entertainment industry, by which I mean science fiction movies and television, by which I probably mean Star Trek, has been dangling the promise of a computerized, universal translation device for four decades—most of my life. I honestly expected that by now I’d be wearing a Bluetooth-earpiece type of device that would translate spoken French into English and vice versa, and that would be that. I could travel anywhere in the world, from France to China to Boston, and understand and be understood without hundreds or thousands of hours of study.
More knowledgeable people than I have shared the same expectation. As recently as 2006, IBM included in its annual “5 in 5” list—five technologies they expect to see within five years—the prediction that 80 percent of the world’s population would own mobile devices “capable of precise language translation.” They envisioned “a technology equivalent to Star Trek’s universal translator—allowing two people speaking different languages to understand each other fluidly.”
Wow. It’s as if they were reading my mind, not to mention watching the same TV shows. But the company known as Big Blue didn’t make the prediction out of the blue; IBM was already working on such a device. And you’d think that if anyone could pull off this technological feat, it would be IBM, the company that built Watson, the computer that absolutely creamed the two greatest Jeopardy! champions the world has ever known.
Well, the five years are up, and my French is down, so instead of spending another morning trying to remember that in the upside-down world of French a literal “young girl” (jeune fille) is older than a “girl” (fille), I’ve come to visit Watson and his colleagues at the IBM Thomas J. Watson Research Center, not far from my home in the Hudson Valley, to see firsthand how far IBM has come toward realizing their prediction of an earpiece of Babel—and whether the technology might soon bail me and my French out.
Watson, massive and silent behind a glass wall in a darkened room like a high-tech Buddha, is resting, but Bowen Zhou, the manager of the speech-to-speech translation project at the Watson labs, is here to greet me. IBM has been working on computerized language translation for decades, for not only is communication important to this prototypical global company—after all, the I in IBM stands for “International”—but there is also potentially big prestige and bigger money to be won here. The European Union alone spends a billion euros a year on human translators.
Since well before Watson’s day, decades before there was such a thing as Google, computer scientists have been working on this problem. The origin of machine translation virtually coincides with the invention of the digital computer. The first published paper on the subject appeared in 1949. It was almost as if scientists said, “Okay, we have this thing we’re calling a computer. What the heck are we going to do with it?” and someone piped up, “I’m going to Italy next year and don’t speak a word. How about we get it to translate?”
Why not? This would seem to be the type of thing computers excel at, but in reality it turns out to be a deceptively difficult task, one that defeated the best and brightest minds of two generations. Before a computer can translate between two languages, it first has to “learn” each of them. This means programming each language, its syntax and lexicon and rules, into the computer and then constructing a mapping between the languages.
Even while computers were flying airplanes and defeating chess grand masters (the checkmating computer known as Deep Blue is an IBM machine as well), computerized language translation continued to thwart scientists for all the reasons that French has been thwarting me: language is recursive (“John went to the store in the middle of the town that he moved to last year”); language is full of exceptions (“la nouvelle cuisine”; “la cuisine chaude”); language is ambiguous (“John threw Paul the ball”; “John threw up”).
Consider the two nearly identical sentences “The pen is in the box” and “The box is in the pen.” The words for a writing instrument and an enclosure for animals happen to be the same in English, but that is unlikely to be the case in any other language; thus the machine needs to possess knowledge of the sizes of things in the physical world before it can supply a correct translation. Not surprisingly, early efforts at machine translation produced as many howlers as they did accurate, or even understandable, translations.
Feeling that there had to be a better way, in 1990 Peter Brown and three colleagues at the Watson Research Center published a groundbreaking paper that would fundamentally change the way that scientists approach computerized language translation. Brown argued that we should forget trying to teach the rules and syntax of language to a computer; that’s a fool’s errand. His approach didn’t rely on any knowledge of language at all. Rather, he proposed a statistical, data-driven model, in which the computer would have at its disposal a large database that recorded how professional translators—that is, real human beings—had previously translated words, phrases, and sentences from one language into another, and then the computer would search this database to find the most statistically probable match to the phrase being translated.
Take the phrase “good morning.” The computer searches its database and finds that “good morning” appears one hundred times. Eighty-two times it has been translated (by humans) as bonjour; twelve times, as bon matin; and six times, as bonne matinée. The computer chooses bonjour as the most likely translation.
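The counting logic in that example can be sketched in a few lines of Python. The counts are the ones given above; the `most_likely_translation` helper is my own illustration, not IBM’s actual code:

```python
from collections import Counter

# Hypothetical tallies of how human translators rendered "good morning"
# in the database (numbers taken from the example in the text).
translation_counts = Counter({
    "bonjour": 82,
    "bon matin": 12,
    "bonne matinée": 6,
})

def most_likely_translation(counts):
    """Pick the translation that human translators used most often."""
    translation, _ = counts.most_common(1)[0]
    return translation

print(most_likely_translation(translation_counts))  # → bonjour
```

This is the whole idea in miniature: no grammar, no dictionary definitions, just a vote among the humans who translated the phrase before.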
Most likely, but not always correct. If the full sentence is “Good morning, sir,” the translation bonjour is correct, but if the context is, “You know it’s going to be a good morning when the newspaper lands near the door,” the statistically improbable bon matin is correct. If the speaker is leaving the boulangerie and wants to wish the clerk a good remainder of the morning, the even less statistically likely bonne matinée is the right translation.
Some of this information can be inferred from the phrase’s placement in the sentence, so you must have your software parse out and match the word, phrase, and sentence alignments (something called parameter estimation) and know something about the order of phrases in both the source and the target languages to get it right. As IBM’s Bowen Zhou explains to me, “You learn the word alignments first, then you extract patterns. We are trying to understand the hierarchical structure of languages. You cannot treat a language as just a string of words.”
In other words, it’s not merely a game of probabilities. Still, this statistical model of language translation that Brown was suggesting in 1990 was a totally novel approach, a far cry from trying to teach a computer the syntax and conjugations of a language. It was so radical that, backed by Brown’s promising early results, it startled, then changed, the field of machine translation.
When you think about it, the approach makes sense. It’s not dissimilar to the way a child learns a first language. He hears all this noise around him and may hear words in different contexts, with different representations, but over time he learns which meanings are the most likely for these new sounds. He also hears a lot of incorrect and fragmentary language and is able to filter out the good from the bad, partly because the good outnumbers—is more statistically probable than—the bad. The toddler’s database is the world of voices around him. But what’s the database for a computer? Where do you find enough quality translations between two languages for even a pilot project?
Canada. By law, all the proceedings of the Canadian Parliament must be recorded in both English and French. By 1990, all these transcripts, known as the Hansard, were available in computer-readable format, some hundred million words that Brown was able to obtain from the Canadian government. From that treasure trove his team was able to extract three million matching English-French sentence pairs, a modern-day Rosetta Stone.
Of course, in their test translations they didn’t always or even usually find an exact match for the input sentence (remember my earlier statement that it is likely that every sentence on this page has never appeared in print before). When they didn’t find a sentence match (which was about 95 percent of the time), they would drop down to the next level, trying to match a cluster of words in the sentence. If they found a cluster match, they would save it and move on to the next cluster. Once all the clusters were resolved, it was time to do the hardest part: construct a grammatically correct sentence in French from these French clusters.
When the computer doesn’t find a match for even a cluster of words, it seeks a match for each single word in the cluster. Here’s where things get really hazardous. For example, if you want to translate the word “hear” into French, you’re probably going to be using some form of the verb entendre. Yet in Brown’s early tests, the computer bizarrely translated “hear” into bravo every time, because when searching the database for “hear,” forms of entendre came up less than 1 percent of the time. The other 99 percent of the time “hear” was translated as bravo.
What gives? Well, in the Canadian Parliament, when a speaker says something members approve of, they respond with shouts—dutifully recorded by the stenographer—of “Hear, hear!” while the Québécois delegates shout “Bravo!” so in this context the translation is quite correct. But it skews the database, ludicrously translating “I hear you” into je vous bravo, one of those howlers I referred to earlier. Well, how do you get around this? By telling the computer that one “hear” means entendre and two of them back-to-back mean bravo.
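A minimal sketch of that fix, assuming a toy phrase table in which longer matches are tried before single words (the table entries and the `translate` helper are my own illustration, not Brown’s actual system):

```python
# Toy phrase table: the two-word idiom "hear hear" is matched before
# the single word "hear", so the idiom wins. Entries are illustrative.
phrase_table = {
    ("hear", "hear"): "bravo",
    ("hear",): "entendre",
    ("i",): "je",
    ("you",): "vous",
}

def translate(words, table, max_len=2):
    out, i = [], 0
    while i < len(words):
        # Greedily try the longest phrase starting at position i.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(out)

print(translate(["hear", "hear"], phrase_table))      # → bravo
print(translate(["i", "hear", "you"], phrase_table))  # → je entendre vous
```

A real system would then reorder and rescore the fragments against a model of the target language, which is why my sketch still produces word salad like je entendre vous rather than a grammatical French sentence.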
Another way you get around this problem is by increasing the size of your database, drawing from a wider body and variety of translations. And by the middle of the next decade, the Internet would make that not only possible but relatively easy. And who is better at scouring the Internet for data than Google? Building on IBM’s research on the statistical model of language translation, Google ditched their old rules-based program in 2006, launching their own statistically based product, Google Translate, which soon became the most popular translation engine in the world, displacing such early entries as BabelFish.
What websites does Google visit to get reliable translations? “All official United Nations documents are required to be published in the six official UN languages,” Jeff Chin, lead product manager for Google Translate, told me over the phone. As are all documents of the European Union, which has twenty-three official languages. It turns out that there are a surprising number of translated documents on the Web, from online books to political reports to newspapers to blogs, and the first thing that statistical translation software does is “document alignment,” identifying which documents are translations of others, filtering out spam, deliberately malicious translations, and the like before proceeding with sentence, phrase, and then word alignment.
The explosion of the Web has allowed Google to include to date 80 languages that can be translated into any of the others, offering 6,320 potential language pairs. Of course, the pairs with the greatest number of available translated documents are going to produce the most accurate matches, so English-Spanish translations tend to be more reliable than, say, Vietnamese-Yiddish. In fact, there may be almost no Vietnamese works translated into or from Yiddish, but there are English works, from novels to news stories, that have been translated into both Vietnamese and Yiddish, and those two corresponding documents can be aligned.
The chief limitation with the Google approach is that it is all done on their server farms, requiring an Internet connection. IBM has taken a different approach. “We envision that speech translation is something that you carry with you all the time,” Bowen Zhou says to me as we sit in his IBM lab. “Think of where you are when you’re doing speech translation. You are often in a foreign country, far away from home, maybe in the middle of nowhere. You may have no Internet connection, or if you do, it may be expensive.” He pulls an off-the-shelf smartphone out of his pocket and places it in airplane mode, disabling its cellular and wi-fi connections. “This is my everyday phone,” the Chinese-born Zhou says, “with our program loaded on it.” Then he presses the volume-up button and says something in Chinese. A moment later I hear a natural-sounding male voice say, “I enjoyed my stay in Tokyo and I really like this meeting,” putting emphasis on “really,” while the words appear on the screen.
Zhou hands me the phone, and I ask for a coffee, which he confirms is correctly translated into Chinese, and then he says something in Chinese in return, and we have a brief conversation, not perfect, but not bad, with the smartphone as intermediary. Although I never do get the coffee.
The speech recognition is impressive as well and improves with use as it learns your voice. Harkening back to my days as a computer programmer, I wonder aloud, “How on earth did you squeeze all of this onto a smartphone?”
“It comes with a compromise,” Zhou says, explaining that they have intentionally limited the vocabulary and scope in order to do a limited thing well, instead of trying to do everything for everyone. The software, which was first deployed in Iraq under a Defense Department contract, is strong on military and infrastructure language, but weak on, say, sports.
Zhou encourages me to challenge the device, so I say, “Time flies when you’re having fun.” This isn’t quite as cheeky as it may seem; I chose the phrase “time flies” because, surprisingly, my French pen pal, Sylvie, with her limited command of English, had used it in a recent note. The software recognizes my English correctly but doesn’t know the idiom. Zhou says the Chinese it returned is too literal a translation, as if time had wings, but is otherwise correct and understandable to a Chinese speaker. Undeterred, I give it the acid test. “Fruit flies like spoiled peaches.” He shakes his head. It translated it as fruit flying through the air. Apparently it hasn’t been trained in entomology, either.
Flirting with French: How a Language Charmed Me, Seduced Me, and Nearly Broke My Heart Page 14