Techrecipe

Wikipedia, between machine translation and reliability

Wikipedia exists in many language editions. However, some articles translated from English Wikipedia are hard to understand because they were machine translated, which can leave readers confused.

Wikipedia’s policies and guidelines state that its goal is to create a free, reliable encyclopedia that is the largest ever in both quality and quantity. However, even a term that is well known in one language area can read as puzzling once it is machine translated for readers of a language in which it is unfamiliar.

Wikipedia’s largest community is the English one. Of roughly 300 language editions, only 15 have more than 1 million articles. In fact, the most common languages on the Internet are English and Chinese. For this reason, the gap between the English edition and an edition in a reader’s native language can be severe.

Wikipedia said it plans to translate articles in partnership with Google Translate. According to a Wikimedia press release, Zulu is spoken by more than 12 million people, yet the Zulu Wikipedia has only about 1,100 articles. The partnership is a strategy to expand coverage in more languages.

The tool in question is the CTT (Content Translation Tool), developed with Google. As of July, it is available only in a limited beta version. Even so, about 400,000 Wikipedia articles are said to have been translated with it so far. The press release clearly states that text is translated with the help of machine translation, including Google Translate. With this support, content can be translated into 121 languages.

CTT seems convenient, but it also appears to carry risks. For example, a Portuguese editor reported that the quality of the machine translation was poor, citing a case in which “village pump” came out as dropping a bomb on the village.

In recent discussion of machine translation, the term “human parity” is often used. It means that the quality of machine translation has reached the level of human translators. In practice, claims of human-level quality are often based only on extremely limited experimental results, and many languages have not yet reached this level.

On the Indonesian Wikipedia, there has even been an official request to ban the use of translation tools. Some worry that this problem will damage Wikipedia’s credibility. Google’s translation quality is improving, but there still seems to be a large difference in quality from one language to another. Experts say it is worth discussing ways to improve the machine learning behind these tools by bringing communities together, because laborious work such as correcting machine translation output can cancel out the efficiency gains. Translating a language or a term without taking its cultural background into account, through simple literal translation, can cause problems. That is why mistakes still appear even in short translated documents.

lswcap

Starting in the era of the monthly magazines AHC PC and HowPC, he has watched the “technology age” unfold at online IT media such as ZDNet, as Internet manager at the Electronic Newspaper, editor of Consumer Journal Ivers, publisher of TechHolic, and editor of Venture Square. He remains curious about this market, which is still full of vitality.
