- Global Voices Summit 2012 - https://summit2012.globalvoices.org -

Keeping Endangered Languages Alive Online

Categories: Languages, Summit Panels

Keeping Endangered Languages Alive #GV2012 [1]

Will we all converge on English and Chinese online? Or will technology help sustain our many mother tongues?

Eddie Avila, Director of Rising Voices [2], notes the difference between endangered languages and underrepresented languages. Three of the panelists work with the latter: African languages whose presence online is not proportional to their number of speakers. Young people need to see their language reflected on the internet and in localized software (like OpenOffice) to understand that their language belongs to the future, and not to history.

Boukary Konate

Boukary (@fasokan [3], blogging at Fasokan.com [4]) is from Mali, and speaks Bambara [5], one of maybe fifteen languages in Mali. Bambara is spoken by some 80% of the people. The educational system begins with the mother tongue, followed by French and English. He thinks it's critical to keep his native language online, so he blogs in Bambara and French. Language is passed on in villages through literacy lessons, so Boukary has designed lessons to encourage children to write and tell stories in their native language.

Bambara depends on four letters that Western keyboards don't feature. But the standard QWERTY keyboard has letters Bambara does not need, like the letter Q itself, so those keys can be manually reassigned to the missing letters. Someone developed a Facebook / Twitter application that supports Bambara, allowing people to post statuses in their native tongue.
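The key-substitution idea can be sketched in a few lines of Python. This is only an illustration: the talk does not say which letters are involved or which keys were reassigned, so the three letters used here (ɛ, ɔ, ɲ) and the choice of q, x, and v as spare keys are assumptions, and `remap` is a hypothetical helper name.

```python
# Hypothetical key assignments. Only the idea of reusing Q comes from the
# talk; the specific letters and the x and v keys are assumptions.
SPARE_KEY_TO_LETTER = {
    "q": "ɛ",
    "x": "ɔ",
    "v": "ɲ",
}


def build_table(pairs):
    """Build a str.translate table covering lowercase and uppercase keys."""
    table = {}
    for key, letter in pairs.items():
        table[ord(key)] = letter
        table[ord(key.upper())] = letter.upper()
    return table


TABLE = build_table(SPARE_KEY_TO_LETTER)


def remap(typed):
    """Replace each spare key in the typed text with its assigned letter."""
    return typed.translate(TABLE)
```

Calling `remap("aqb")` swaps the `q` for `ɛ` and leaves the other characters alone. A fourth letter would need one more spare key added to the dictionary.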

Might on-screen, software-based keyboards help us overcome the limitations of physical keyboards designed for dominant languages? (See ANLOC's language keyboards page [6] for more.) Accentuate.us [7] is another solution. It's a great Firefox extension that “allows you to type quickly and easily in more than 100 languages without extra keystrokes or a special keyboard.”

Abdoulaye Bah

Abdoulaye is a Global Voices blogger in the French-speaking group. He is from Guinea, but lives in Italy and France. Over the course of his life, he has spoken eight languages, but the one he knows least is his own, Fulani [8]. He has used the other languages more because practical needs demanded them. Fulani is spoken as a first or second language in as many as 18 African countries, including Guinea, where 40% of the people speak it, and Mauritania, Cameroon, Chad, and parts of Ethiopia.

Many of the problems the language faces stem from the fact that it isn't taught in school. People learn to write it through individual lessons, or not at all. Abdoulaye sees blogging as one of the few ways to keep the language alive. There are many blogs in Fulani. Online videos are another way the language is represented online, and they sidestep the problem that more people speak the language than write it.

Oliver Stegen

Oliver is originally from Germany, but has been in East Africa for many years (his mother tongue is most definitely not endangered). He came to Global Voices not as a blogger, but as a linguist with SIL. They work with mother tongues and translation.

Rangi [9] is one of 120 languages spoken in Tanzania, and is historically only a spoken language. The global linguistic community wanted to document a written version, so Oliver spent years in Tanzania talking to government officials and teachers to record the language. He wrote linguistic articles in academic journals, but quickly realized that this form of publication would not help this endangered language survive. Instead, he worked with teachers to establish literacy classes in villages. But the language still didn't gain much ground.

Come 2006, Oliver discovered citizen media. He began posting Facebook statuses in Rangi, to the dismay of his friends. He couldn't even convince Rangi speakers to use Rangi online, because many of their own friends spoke English. Languages appear to benefit and suffer from the same network effects as social networking sites themselves.

indigenoustweets.com Twitter List [10]

Oliver then began writing tweets in Rangi, and discovered indigenoustweets.com [11], where Kevin Scannell identifies and traces tweets posted in small languages. The list represents well over 100 small languages, and the most influential tweeters in each language (as measured by followers). Oliver's Rangi tweets made the list. He provided Kevin with 150 pages of Rangi text, which was included in the software so the language can now be recognized when it is used online.
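The write-up does not describe how the software recognizes a language from sample text. One common approach, shown here purely as a sketch, is to build a character-trigram profile from a body of text in each language and pick the profile most similar to the new message. The function names and the tiny English and Swahili training strings below are stand-ins invented for illustration, and a real system would train on far more text (such as the 150 pages mentioned above).

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping three-character sequences in lowercased text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(text, profiles):
    """Return the name of the profile most similar to the text."""
    sample = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(sample, profiles[lang]))


# Toy training text, far too small for real use (illustrative only).
PROFILES = {
    "english": trigram_profile(
        "the quick brown fox jumps over the lazy dog and the cat sat on the mat"
    ),
    "swahili": trigram_profile(
        "habari ya asubuhi asante sana karibu tanzania hakuna matata"
    ),
}
```

With these profiles, `identify("asante sana rafiki", PROFILES)` picks the Swahili profile because the message shares many more trigrams with it. A small language can only be added to such a scheme once someone supplies text to build its profile from.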

The African Network for Localization [12] (ANLOC) organizes forums in local languages [13].

Oliver started a Facebook Group for Rangi speakers, and after years of trying, the group grew to 26 members within 2 hours. The earlier literacy efforts had paid off, in terms of people being able to write the language. The group now has well over 300 members, with daily usage across Tanzania and worldwide. A Rangi-speaker in Miami was able to use the group to connect with a Rangi-speaker in Minnesota who was traveling through on a business trip. Challenges include the special characters of the Rangi language, which are hard to represent online. Accentuate.us [7] is a great service, but Rangi relies on unique characters.

Oliver is also on the language committee of Wikimedia and an editor at the Swahili Wikipedia. Swahili is a subject of debate: it is certainly not endangered, but it is underrepresented online. Global Voices has had challenges finding Swahili translators, resulting in months without fresh Swahili content on the site.

EndangeredLanguages.com screenshot of languages by location [14]

Last week, Google launched their EndangeredLanguages.com [14] initiative with the Alliance for Linguistic Diversity. The first challenge we face is counting endangered languages. The Google initiative lists over 3,000 languages, some of which may be underrepresented rather than endangered. Only 285 languages have a Wikipedia edition at all (much less a robust edition). The gap between Google's 3,054 languages and the 285 Wikipedia editions shows the scale of the challenge of getting endangered languages online.

A group is working in Romania to localize software, and faces challenges like inventing new words for modern objects like the computer. Rising Voices offers microgrants [15] to amplify the voices of people in underrepresented communities, and this includes three grants to people working to revive underrepresented languages.

Lingual Convergence Online

It appears that with language online, people default to the most common and dominant languages. One person wrote on a friend's wall in Quechua, but the friend replied in Spanish. Especially on Facebook, people write in the language that the majority of their friends will understand. Oliver suggests that one way around this phenomenon is to create a dedicated space online where an underrepresented language can be practiced. If people come into the Rangi forums writing only in Swahili, they're told off. Language norms can be enforced in online fora just like other community norms.

As the world becomes more global, people seek to conform. In Africa, people are moving to cities and learning English and especially Chinese. Even languages that weren't previously endangered are becoming so, because newly urban children don't learn or use their mother tongue. We all speak English on these panels, and the concept of a lingua franca [17] is nothing new.

An audience member shares a linguistic joke with us: Finnish is such a difficult language that when two Finns want to tell each other something quickly, they use English instead.

Another joke: Which language is closer to heaven, Hungarian or Finnish?
It doesn't matter, they both take eternity to learn.

Is automated translation an option?

Google Chrome, and now Gmail, offer inline translation. Bing provides Facebook status translation. It's not perfect, but it helps us get the gist of what the message is about. And it works OK between Hungarian and Finnish. But it's terrible for Swahili. We need many human translators to improve the automated algorithms. We're unlikely to ever have an automated option for endangered languages, as we don't even have a corpus to train the machine.