Winds of Change.NET: Liberty. Discovery. Humanity. Victory.


Machine Translation and Online Communities: Let Me Tell You A Story


My earlier post on the potential for machine translation (MT) in the blogosphere has drawn some great comments, with ideas that I'll collate and use as the basis for some more research and a follow-on post - and of course others are invited to grab the ball and run as well. But in the meantime, let me tell you a war story that may help explain why I think this notion has potential:

Once upon a long time ago - 1994-6, to be precise - I was an exec at the pioneering online service CompuServe. When I arrived to take on part of the R&D efforts of the time, I inherited an existing project to test out machine translation as a way to encourage multilingual conversations within our 'Forums' (most people would call them bulletin boards today). This resulted in a vibrant multilingual online community of thousands, with some outcomes highly relevant to the notion of translating the blogosphere. I'm largely working from memory, since I turned in my notebooks when I left the company....

To give credit where it's due, the project was originated by Sandy Trevor (now here) and was led throughout by Dr. Mary Flanagan (now here). (I'll send an e-mail inviting Mary to correct or elaborate in the comments.) The MT technology used was a now-vanished rule-driven engine called Transcend, by Intergraph. (Update/Correction: Transcend was first sold to Transparent Language and then resold to SDL, which still supports it.) A contemporary trade press description of the project is here. Fair warning: Some of the systems created were patented (US Pat #5,715,466).

We first set up the machine translator to run on a technical support forum for the service's Mac-based proprietary client. As commenters below noted, restricting the translation domain, particularly to a technical field with a lot of jargon, makes the translation more effective. Every few minutes the translator would run through the list of posts, translate each new English post into French and German, and insert the translations on parallel forums in those languages. It also grabbed new posts originating in French and German, translated them, and reposted them in English. (Note that we never had the non-English language pairings - all of these projects were 'stars' around English.)
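For the technically inclined, here is a rough sketch in Python of that periodic sweep. It is an illustration reconstructed from the description above, not the original CompuServe code: the names (Post, Forum, sweep, targets_for) are made up for this example, the translate function is a stand-in for whatever MT engine is plugged in, and the rule that machine-translated copies are never translated again is my assumption.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple

HUB = "en"               # every translation pair has English on one side
SPOKES = ("fr", "de")    # the other two languages in the first trial


@dataclass
class Post:
    post_id: int
    lang: str
    text: str
    source_id: Optional[int] = None   # set only on machine-translated copies


@dataclass
class Forum:
    lang: str
    posts: List[Post] = field(default_factory=list)


def targets_for(lang: str) -> Tuple[str, ...]:
    """English fans out to the spokes; a spoke translates into English only."""
    return SPOKES if lang == HUB else (HUB,)


def sweep(forums: Dict[str, Forum],
          translate: Callable[[str, str, str], str],
          seen: Set[int],
          ids: Iterator[int]) -> int:
    """One periodic pass: translate each unseen original post and repost it.

    Returns the number of translated copies inserted.
    """
    inserted = 0
    for forum in list(forums.values()):
        for post in list(forum.posts):        # snapshot, since sweeps append to forum lists
            if post.post_id in seen:
                continue
            seen.add(post.post_id)
            if post.source_id is not None:    # assumption: copies are not re-translated
                continue
            for dst in targets_for(post.lang):
                copy = Post(next(ids), dst,
                            translate(post.text, post.lang, dst),
                            source_id=post.post_id)
                forums[dst].posts.append(copy)
                inserted += 1
    return inserted


def fake_translate(text: str, src: str, dst: str) -> str:
    """Hypothetical placeholder for a real MT engine; it only tags the text."""
    return f"[{src}->{dst}] {text}"


forums = {lang: Forum(lang) for lang in (HUB, *SPOKES)}
ids = count(1)
seen: Set[int] = set()
forums["en"].posts.append(Post(next(ids), "en", "My modem will not connect."))
forums["de"].posts.append(Post(next(ids), "de", "Welche Version benutzen Sie?"))

first = sweep(forums, fake_translate, seen, ids)    # picks up both new posts
second = sweep(forums, fake_translate, seen, ids)   # nothing new to translate
print(first, second)
```

In a live system the sweep would be triggered by a timer every few minutes. Marking copies with source_id is one simple way to keep a translated post from bouncing back and forth between forums.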

This went pretty well, in spite of some squawking and mocking by a group of professional translators on the service, and we moved on to bigger things. Adding Spanish to the list of languages, we created four parallel forum bulletin boards which were collectively called the 'World Community Forum'. We seeded them with some guesses at topics that people would want to talk about across cultures - e.g., Sports, Politics, Travel. We were now venturing out of safe technical topics, and the technology was going to be challenged more severely. One of the joys of online services then, as with the Internet now, was the ability to just 'go live' with your experiments. And since the revenue model of the day was pay-per-connect time, we could even try to make money doing so. After running a few tests, we made the forums public, and started promoting them on the "What's New" part of the main CompuServe menu.

The World Community Forum was a roaring success. It attracted hundreds and soon thousands of visitors, went profitable within a few weeks, and soon was regularly appearing in the top 50 products list of the entire service. (You'd have thought the profits would have been added to my R&D budget, but noooo....) After some weeks of dealing with translation accuracy and with scaling issues - we'd brought the whole project up on a single server in a field office - there was enough time to run some analytics and dig into the conversation patterns and topics that were evolving in the forums.

The results were very interesting. While there was significant use of the translators, the majority of posts and responses were in the same language, even though we could tell from the idents of the users that many were posting from different native language territories. Digging into the texts, we found some users ridiculing and trying to correct the machine translations, but more were people deliberately using their second languages. Germans tried out their schoolbook English on the Americans and Brits, and Americans used their rotten Spanish to try and converse with Latin Americans. And all sides cheerfully corrected each other's mistakes as the conversation flowed.

What had begun as a technology trial had in fact resulted in a marketing experiment. Just by the way we promoted the forums, we had drawn in those most interested in cross-cultural meetups, from a service that was already the most international of that time. Topic drift being multicultural, of course they proceeded to talk about anything that came to mind, largely foiling any idea of using specific vocabularies in particular topic areas. (I clearly recall reading one thread in which German hausfraus and wives of US troops posted there had a lengthy and detailed discussion of their different approaches to drying and folding towels. Go figure.)

I take these results as optimistic for the idea of translated blogs as cultural bridges, in several ways. The known deficiencies of machine translation didn't kill the experiment. Real people were able to repair the dialog in almost all cases of error, and found some common cause in doing so. Providing the facility attracted motivated cultural 'bridgers' who in many cases supported each other in language skills directly, as well as carrying on enjoyable and wide ranging conversations. Few of them were on world saving topics, but a number of close friendships resulted.

'Translating' this experience from a closed, proprietary, and highly structured system into the loosely linked, open, and dynamic blogosphere is going to be a bit of a challenge. It's likely to become a social system as much as a technology infrastructure, so the two need to start modestly and co-evolve. But this long-ago and already-paid-for experiment suggests it is workable and worthwhile.

7 Comments

Very interesting. I'd heard/read some rumors about hand-held English-Arabic translators being made available to troops in Iraq in small quantities early on in OIF, but never anything further. Was it true, do you know?

Hmm, interesting stuff. Anybody wanna ask Clay Shirky what he thinks about this?

It's called the "Phraselator." Basically the user speaks / selects a phrase and it finds the foreign language equiv. and it "speaks" the desired phrase out of a speaker.

They started using them in limited quantities during OEF.

MH

Phraselator is an interesting story itself. It was still nominally in development at DARPA (see 2nd item at link) when the project was pushed into operational status for Afghanistan, and a few contractors got a much more exciting field trial than they had likely anticipated. As we were discussing on the other thread, natural language processing does better when you control the domain of discourse, and Phraselator is no exception. IIRC, the first versions sent over had a vocabulary suitable for handling and interrogating prisoners, and subsequent versions expanded to medical assistance, etc., as the conflict ended and reconstruction began.

(As an aside, DARPA is one of the most respected government research organizations in the VC community. A lot of their systems level work obviously doesn't fit the civilian market, but their basic technology work is always smart, in the sense of being clearly relevant to an outcome, and fully leveraging what's already been done.)

Have a look at http://www.multilingualblog.com/index.php/weblog/translation_in_the_blogosphere/ for some thoughts on other translation projects in this space.

I am hugely disappointed at how slowly MT is coming. It should be an EU open source/ X prize kind of competition, with different teams (national teams?) for some 10 million Euros or something significant.

And lots of open source results after.

There should be a better bi-lingual blogging tool; I know that the Dissident Frogman and Merde in France do theirs manually (I'd like one for Slovak).

If somebody had a contact to JK Rowling, it might be interesting to get her books, on-line, as part of a corpus of translated text. (I'm now reading my 10th HP book: Harry Potter #5 in Slovak)

Google might well be interested -- finding "phrases that are often asked about", and having more phrases in phrase DBs (much, much larger than dictionaries).

Kerry's complaint about the lack of translators was a strong one -- and a big failure of Clinton as well as a HUGE failure, after 9/11, of Bush. Every college in America with a foreign language dept. should be encouraged to increase their Arabic (and Farsi? Persian) ME languages.

I'm surprised that the military hasn't added some bonus pay to soldiers learning Arabic, all over America.

The Linguistic User Interface (LUI)

"Perhaps the most underappreciated accelerating transition we are participating in today is the emergence of the Linguistic User Interface or LUI. The LUI is the natural language front end to an increasingly intelligent and profoundly humanizing and malleable Internet. LUIs exist today in primitive form in interfaces like Google, but will be increasingly powerful in coming years. So what will Windows 2015 look like? For one thing, it seems clear now that it will have some very sophisticated software simulations of human beings as part of the interface."

http://www.usnews.com/usnews/tech/nextnews/archive/next040423.htm

See also:
http://www.singularitywatch.com/lui.html
http://divedi.blogspot.com/2004/04/linguistic-user-interface-or-lui.html
