
Automated Translation

Author: Christian Boitet
Source: Université Joseph Fourier / GETA, CLIPS, IMAG.

Abstract

This article reviews several kinds of AT (automated translation) systems.

General statement of the problem

Machine Translation was the first non-numerical application of computers. After initially promising demonstrations in the US around 1954, it was realized that HQFAMT (high quality fully automatic machine translation) would in general be impossible. Less ambitious tasks were then attacked by lowering one or more requirements. The result was several kinds of AT (automated translation) systems. There exist many LQFAMT (low quality fully automatic machine translation) systems, producing "rough" translations and used for accessing information in foreign languages. HQFAMT systems for very restricted typologies (kinds of texts) are less common but do exist. There are also HQMT (high quality machine translation) systems for restricted typologies, which produce "raw" translations good enough to be revised cost-efficiently by professional revisers. HQMT can also be obtained by asking end users to assist the system. Finally, TA (translation aids), combining online dictionaries, term banks, and translation memories, are used extensively by professionals.

HUMAN TRANSLATION

Translation is difficult.

Although it can be argued that the common nature of natural languages makes it possible to translate between any two languages, the task is much more difficult than usually believed, because of differences between languages, unavoidable ambiguities, and insufficient contextual knowledge.

Natural languages reflect different points of view about the world. The "percepts" (things) we speak about may differ. English, French, and Russian have one word for "wall" (mur, stena/СТЕНА), while German, Spanish, and Italian distinguish between the volume and the surface (Mauer/Wand, muro/pared, muro/parete). The "concepts" (ideas) also reflect cultural differences. For example, Japanese has not only different verb forms but actually different verbs for the concept "to be" (da/familiar, desu/neutral, gozaimasu/polite) according to the attitude of the speaker. In Japanese, the subject of a verb must be volitional, so that it is incorrect to say "the typhoon destroyed the house": one must use another point of view, and say "the house collapsed/was destroyed due to the typhoon".

Another source of the differences which make translation difficult is that, while all types of meanings can be expressed in all languages, some elements of meaning must be expressed in some languages, while they are usually unexpressed in others. In languages without articles (the, a, an), such as Russian, Chinese, Japanese, or Thai, definiteness is "underspecified" (usually not expressed explicitly). Similarly, aspect is underspecified in French as compared with Russian or even English. Number and gender (or sex) are likewise necessary in some languages and not usually expressed in others. For example, "Tanaka-san" in Japanese means "Mr Tanaka" as well as "Mrs Tanaka" or "Miss Tanaka".

It is not possible in general to translate by parts of speech: a noun is not always translatable as a noun, an adjective by an adjective, etc. One reason is that languages may not have the same parts of speech. For example, Thai has no adjectives. This variability extends to phrases: gerund or infinitive clauses don’t exist in all languages. Other important differences appear in the grouping or structuring of words and phrases inside utterances. For example, in German "er schwimmt gern" (he willingly swims) means "he likes to swim", where the syntactic relations are reversed.

All natural languages are inherently ambiguous, at all levels (sounds, words, phrases, functions, propositional meaning, intentions). For example, "recognize speech" can be understood as "wreck a nice beach"; "The time to learn about computers" can mean the time it takes to learn about computers, or the right moment to learn about computers; and "I didn’t hit you on purpose" may mean "I hit you by accident", or "I purposely avoided hitting you". If there are "black notebooks and folders", the folders may or may not be black - and so on. An important point here is that humans rarely notice the ambiguity, because they use their background knowledge and their current anticipations to go straight to one particular interpretation, and then stop. But different humans arrive at different interpretations, so misunderstandings and accidents sometimes occur.
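The "black notebooks and folders" case can be made concrete with a small sketch. It assumes a toy tree notation (the labels `MOD` and `AND` and the function names are hypothetical, chosen only for illustration): the same four words receive two different structures, and each structure gives a different answer to the question "which things are black?".

```python
from typing import Union

# A tree is either a word or a labelled tuple (toy notation, not a real parser format).
Tree = Union[str, tuple]

# Reading 1: the adjective scopes over the whole coordination.
WIDE = ("MOD", "black", ("AND", "notebooks", "folders"))
# Reading 2: the adjective attaches to the first noun only.
NARROW = ("AND", ("MOD", "black", "notebooks"), "folders")


def words(tree: Tree) -> list:
    """Flatten a tree back into its surface word sequence."""
    if isinstance(tree, str):
        return [tree]
    if tree[0] == "MOD":
        return [tree[1]] + words(tree[2])
    # tree[0] == "AND"
    return words(tree[1]) + ["and"] + words(tree[2])


def black_nouns(tree: Tree, under_black: bool = False) -> set:
    """Return the nouns that this reading describes as black."""
    if isinstance(tree, str):
        return {tree} if under_black else set()
    if tree[0] == "MOD":
        _, adjective, head = tree
        return black_nouns(head, under_black or adjective == "black")
    # tree[0] == "AND": collect from both conjuncts
    _, left, right = tree
    return black_nouns(left, under_black) | black_nouns(right, under_black)


# Both readings share one surface string but differ in meaning.
print(" ".join(words(WIDE)), "|", " ".join(words(NARROW)))
print(sorted(black_nouns(WIDE)), sorted(black_nouns(NARROW)))
```

A human reader silently picks one of the two trees; a translation system has to choose between them explicitly whenever the target language forces the adjective to agree with, or be repeated for, each noun.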

There are also unambiguous words or utterances that become ambiguous in translation, because the target language must separate meanings indistinguishable in the source language. This is often the case with elements of meaning systematically underspecified in the source (such as definiteness, modality, aspect, and number).

Translation is a catchword!

According to the situation, the required translation may be 1-to-1, that is, from one language into another one (e.g. Russian-English), or 1-to-N (from one source language into many target languages, e.g. for disseminating technical documentation), or M-to-1 (from many source languages into a single target language, e.g. for a monolingual reader), or M-to-N (from many languages into many other languages, as for international organisations).

It is often said that perfect understanding is needed to produce high quality translations, with the frequent implicit assumption that human translation is perfect. But "perfect understanding" is extremely rare. Not even a good bilingual engineer-turned-translator can maintain a perfect grasp on new developments in his field. In reality, admittedly imperfect but very high quality translations can in fact be produced with less than full understanding, or even with minimal understanding, by trained translators used to translating within a particular field and very familiar with its technical terms.

In any case, junior translators usually don’t produce very high quality results; so their first draft, a "raw" translation, must usually be revised by senior translators. Typically, one hour is needed to translate a standard page of 250 words, and 20 minutes to revise it. Given the low price of human translation, professional translators must produce results in the time available, even if the quality suffers: they can’t spend more time if they want to earn a living. In addition, they must often work with stringent deadlines, and within fields that they don’t know very well, so that much time is spent looking up terms.
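The per-page figures above translate directly into an effort estimate. The sketch below only restates that arithmetic; the 25,000-word document and the function name `effort_hours` are hypothetical, introduced for illustration.

```python
# Figures from the text: one hour to translate a standard 250-word page,
# 20 minutes to revise it.
WORDS_PER_PAGE = 250
TRANSLATE_MIN_PER_PAGE = 60
REVISE_MIN_PER_PAGE = 20


def effort_hours(word_count: int, revise: bool = True) -> float:
    """Estimated human effort, in hours, for a document of word_count words."""
    pages = word_count / WORDS_PER_PAGE
    minutes = pages * TRANSLATE_MIN_PER_PAGE
    if revise:
        minutes += pages * REVISE_MIN_PER_PAGE
    return minutes / 60


# Hypothetical 25,000-word manual (100 standard pages).
print(effort_hours(25_000, revise=False))  # translation only
print(effort_hours(25_000))                # translation plus revision
```

Under these figures, revision adds one third to the translation time, so a 100-page document needs about 100 hours of drafting and a further 33 hours or so of revision.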

Automation is needed.

The automation of translation and interpretation has been necessary since the 1950s, as new needs have appeared.

First, there can never be enough translators to cover the perceived needs. For example, not even the US army could train enough translators to skim through all Soviet literature in the Cold War era. With increasing globalisation and the growth of the Internet, the need for all kinds of translation, from rough to raw to refined, is growing. The European Community now has 11 official languages, but still about 1200 in-house translators, the same number it had 25 years ago, when there were only 8 official languages.
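To see why a constant number of translators falls behind as languages are added, one can count translation directions. The sketch assumes, purely for illustration, that every official language may need to be translated into every other one; the text does not state which pairs are actually served, and the function name is hypothetical.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n_languages languages."""
    return n_languages * (n_languages - 1)


for n in (8, 11):
    print(n, "languages ->", directed_pairs(n), "translation directions")
```

Under that assumption, going from 8 to 11 languages nearly doubles the number of directions (56 to 110), while the head count quoted above stays the same.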

Second, many translation tasks are so boring or stressful that translators want to escape them. For example, the famous METEO system came about because translators working with the Canadian Meteorological Centre heard of the TAUM MT project at Montreal University and went there to ask for a system which would free them for some interesting work (revision, as opposed to repetitious translation of routine weather reporting formulae). Automation can thus be seen as a way to free translators from menial tasks and promote them to revisers and co-developers of the automated systems.

Third, there is also an increasing need for interpretation, especially to assist travellers abroad. There are many situations where sufficient knowledge of a common language is lacking: visiting a doctor, booking tickets for travel or leisure, asking for help on the roadside, calling motels, etc.

Translation may be automated with various goals and users in mind. Low quality is sufficient for end users wanting to track information, provided it is fast and cheap. Translators need either good support tools to do their job, or high quality machine output to replace a first draft.

Conclusions

Despite considerable investment over the past 50 years, only a small number of language pairs is covered by MT systems designed for information access, and even fewer are capable of quality translation or speech translation. To open the door toward MT of adequate quality for all languages (at least in principle), four keys are needed.

On the technical side, one should:


