Machine Translation 12: 39–51, 1997.
© 1997 Kluwer Academic Publishers. Printed in the Netherlands.
Bricks and Skeletons: Some Ideas for the Near Future of MAHT
JEAN-MARC LANGÉ
IBM France, Dept. 3099, 68, quai de la Rapée, F-75592 Paris Cedex 12
jml@vnet.ibm.com
ÉRIC GAUSSIER
Rank Xerox Research Centre, 6, chemin de Maupertuis, F-38240 Meylan
eric.gaussier@grenoble.rxrc.xerox.com
BÉATRICE DAILLE
IRIN, Université de Nantes, F-44072 Nantes Cedex 03
Beatrice.Daille@irin.univ-nantes.fr
Abstract. This paper sets forth some ideas for the evolution of translation tools in the near future.
The proposed improvements consist in a closer integration of terminology and sentence databases. In
particular, we suggest that bilingual sentence databases (translation memories) could be refined by
splitting sentences into large “bricks” of text. We also propose a mechanism through which bilingual
sentence databases could be generalized by replacing the known terms by variable placeholders, thus
yielding technical sentence “skeletons”.
This latter idea is the most original and, in our view, the most promising for future implementations. Although these ideas are not yet supported by experiments, we believe that they can be
implemented using simple techniques, following the general philosophy that such tools should go as
far as possible while remaining robust and useful for the human translator.
Key words: Machine-Aided Human Translation (MAHT), Translation Memory, terminology, Example-
Based Translation, sentence skeleton
1. Introduction
The market for MT and other translation aids has not seen the “boom” that was foretold for the 1990s. In view of this situation, we are tempted to follow the
trend that originated in the late 1980s, when there was a shift of interest in favor of
practical, robust solutions, and to focus on the development of simple improvements
to existing systems, always keeping in mind The Proper Place of Men and Machines
in Language Translation, as Martin Kay had it.
From this perspective, we propose in this paper a tighter integration of the two
main components in today’s translation utilities, terminology and translation memory. Although we hope these ideas will raise the interest of the MAHT community,
we want to stress that to date no experiment has been carried out in order to confirm
or deny their usefulness.
We will deal mainly with Machine-Aided Human Translation (MAHT) products, intended for professional (as opposed to casual) translation work. Typical users, whom we will call “translators”, are either professional translators or professionals who devote part of their working time to translation.
We will first take a look at today’s market for MAHT products, then at the products themselves, in particular those involving term or sentence databases. We will then propose two ideas that could help improve these existing components and integrate them more closely.
2. The State of the Market
There are two main types of products1 available today on the translation-tools market:
– Machine Translation (MT) products, with different technological levels.
– Machine-Aided Human Translation (MAHT): The products we will concentrate on in this paper are essentially based on dynamic access to two databases, one of terms and one of previously translated sentences, the latter being known as “translation memory”, or formerly as “repetition file”.
Other MAHT tools include online bilingual dictionaries and concordance or textual-search tools (possibly multilingual); the latter, although useful, are to our knowledge less used by translators, probably because of the limited availability of (bilingual) textual databases.
Some products offer an integration of sorts between MT and MAHT: For example, in Eurolang Optimizer, an MT engine can provide raw translations when
nothing is available in the translation memory; IBM Personal Translator, an MT
system, comes with a translation memory.
Still, translators do not rush to buy these products. After some years spent on
development, marketing and sales of MT and MAHT products, we are now sitting
back and reflecting on the reasons for this relative failure.
Prices? This should be no problem since mass-market MT systems are now
sold at very low prices, typically $200–500 US. As for MAHT systems, the entry
products are listed from $1000 US, a price which can bring a very fast return on
investment. Although it is clear that $50 systems would sell by the thousands, we
know that in most cases (in particular for cheap MT products) they are shelved
after a few attempts.
Quality? It is true that MT systems generally yield a quality that is far too poor
for translators’ requirements, in spite of decades of research and development: The
only excuse we can give is the extreme difficulty of the task of translation. This is
why, in the 1980s and early 1990s, it was suggested that MAHT could be a
reasonable alternative for translators.
If price and quality are not the (only) issues, we are left with two other explanations: Translators might be resistant to technological change; or they do not find in
our products all the features they would like. Resistance to change does exist, but
nowadays most translators have made the technological jump: They are equipped
with modems, etc. That leaves us with the technological features of the products, and their failure to satisfy the market’s needs. We will concentrate on these, and try
to see what new features we could integrate into our products in the near future,
to improve the state of the market by efficient use of the state of the art. Let us
first review the features of present-day MAHT products in detail, in order to locate
potential for improvement.
3. A Closer Look at Today’s MAHT Products and their Philosophy
As already noted, the main functionalities of MAHT products on the market are
dynamic terminology lookup and translation memory. We are more interested in
these than in concordancing or textual-search tools, because the latter are rather consultation tools, requiring human intervention in order to sort out useful items,
while the former yield ready-to-use chunks of translated text.
Terminology lookup usually involves some simple source morphology (e.g.
the entry floppy disk is found if the text to be translated contains floppy disks) and
proposes one or several translations from one or several databases. Terminology
database management functions are included in the system.
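By way of illustration only, the following sketch shows what such a lookup might look like. The toy term database, the crude plural-stripping rule and all function names are our own assumptions for this example and are not taken from any particular product.

```python
# Illustrative sketch of term lookup with trivial source-side morphology.
# The term database and the plural-stripping rule are invented for this example.
TERM_DB = {
    "floppy disk": "disquette",
    "hard disk": "disque dur",
}

def normalize(word: str) -> str:
    """Very crude singularization: strip a final 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def lookup_terms(sentence: str, term_db=TERM_DB, max_len=4):
    """Return (term, translation) pairs whose normalized form occurs in the sentence."""
    words = [normalize(w.lower().strip(".,;:")) for w in sentence.split()]
    found = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + 1 + max_len, len(words) + 1)):
            candidate = " ".join(words[i:j])
            if candidate in term_db:
                found.append((candidate, term_db[candidate]))
    return found

print(lookup_terms("Insert the floppy disks into the drive."))
# [('floppy disk', 'disquette')]
```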
Translation memory facilities consist of lookup in a database of previously
translated sentences for exact or approximate equivalents of the sentence to be
translated; users can pick up the translation of their choice and re-use it. Major
MAHT products now also include the so-called “alignment” of existing source files
and their translations in order to build a translation memory automatically.
The basic idea of these products is therefore to store translators’ knowledge,
either at the word/term or sentence level. An essential difference between translation
memories and term databases is that the former are fed automatically each time the
translator validates a new translated sentence, while terminology has to be entered
into the latter manually when the need arises. The reason for this is simply that
it is fairly easy for a system to recognize a sentence, while recognizing a term
involves what we would call “linguistic intelligence”. This is unfortunate, because
terms tend to be more repetitive than sentences in texts: Sentences, being generally
much longer than terms, are more likely to vary, and the number of possible
sentences in a technical domain is potentially infinite, while there is presumably a
finite number of domain terms. We will come back to this phenomenon, which we
call sentence variability. As a consequence, there are more translation proposals for
terms than for sentences in MAHT products (provided the term databases are kept
updated). This is partially balanced by the fact that copying a sentence proposed by
the translation memory saves much more typing than copying a term translation.
Note that terminology and translation memory have no point of contact, and this is
one of the enhancements we will suggest later on.
The MAHT products are well in line with Martin Kay’s ideas about the proper
place of humans and computers, and indeed these products are indebted to the first
system of its kind, developed by ALPS at the beginning of the 1980s. The idea is
still the same: Use computers for what they are good at (e.g. fast searching among
large quantities of data), and let humans take the baton when it comes to the “génie
de la langue”. Computer chess systems do not do much more; they can, however, compete on an equal footing with the best human chess players.
4. So What?
So far, we have assumed that translators do not buy MAHT products partly because
of a lack of certain features, and we have seen the kind of features proposed by
current products. What are the missing features? The obvious first step to answer
that question would be to ask the customer. Well, as far as we know, customers have very simple demands. But they often ask questions which can be interpreted
as visions of the perfect translator’s workstation. For example, they ask whether
the translation memory is “able to recognize only part of a sentence”. The answer
is a partial “Yes” for the current products, thanks to the fuzzy match capability.
Such tools as concordancing or bilingual keyword-in-context (bi-kwic) (Isabelle et
al., 1993; Langé and Bonnet, 1994; Macklovitch, 1994) also offer a partial answer.
However, their usefulness is limited since there is no indication of which portion
of the translated sentence corresponds to the phrase that was found in the source
sentence. Take the following example: Say we are looking for the translation of an
expression such as so to speak; we can retrieve and display all source sentences
containing this phrase, together with their translations. But while so to speak can
be highlighted in the source sentence, its French translation cannot be highlighted
because in most cases the system cannot determine which particular words of the target sentence are involved in the translation of the expression.
5. The “Building Blocks” Translation Memory
What is needed in the aforementioned case is the ability for the system to decide
which specific words in a translated sentence are the translation of certain source
sentence words. This issue of word alignment has been addressed by researchers
(Brown et al., 1993; Gale and Church, 1991; Gaussier et al., 1992) but the solutions
work only at word level, and still require amounts of data and processing power
that are beyond the reach of present-day desktop systems.
However, we see here one possibility for future systems: To offer an extended translation-memory capability that would deal not only with complete sentences, but also with the elementary “bricks” that sentences are made of, including terms, phrases or clauses.
As an example, suppose we have to translate (1):
(1) Proceed with installation checking once you are done with installation.
and the TM contains (among others) the sentence pairs (2)–(4):
(2) Once you are done with installation, you can go to Chapter 3, “Customization”.
Une fois l’installation terminée, passez au chapitre 3 qui traite de la
personnalisation de votre poste de travail.
(3) Proceed with customization.
Passez à l’étape de personnalisation.
(4) 2.1.2 Installation checking
2.1.2 Vérification de l’installation
A sufficiently smart system could locate two text blocks in (1), retrieve the equivalent translated “bricks” from (2)–(4), and propose them in the same order as in the original sentence, as in (5):
(5) Passez à l’étape de vérification de l’installation une fois l’installation
terminée.
The translator could then reorder the bricks or modify their contents.
This Lego approach, which we will call “Building Block Translation Memory” (BBTM), fills the gap between the word level of terminology databases and
the sentence level of translation memories. It bears immediate resemblance to the
example-based approach (EBMT) currently advocated mainly by Japanese research
groups (see Section 6.4). However, the EBMT approach is known to pose several
problems (what size of elementary bricks to use, how to parse a source sentence
into its elementary bricks, how to relate source and target bricks, how to organize
target bricks into something that makes sense, and how to choose between multiple
solutions when the database size grows). In most descriptions, EBMT is a sophisticated process which necessarily involves a large amount of linguistic or statistical processing. This very sophistication is the main drawback of the approach.
In our view, the BBTM should operate at very shallow levels. We do not need a facility that attempts to work in every case, but one that is robust. Therefore we envisage
that it will be triggered only in simple cases, for example when the splitting of a
sentence into two bricks is made easier by the presence of an unambiguous marker
such as a conjunction, or punctuation marks. Aligning source and target bricks
can be achieved by using the word-alignment information or the existing bilingual
terminology. A preliminary experiment in this direction can be found in Meunier
(1993).
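To make the idea concrete, here is a minimal sketch of such a “shy” splitter, under our own assumptions: the marker inventory is a toy list, and only the first unambiguous marker is used; anything else is left as one big brick. Pairing the resulting source and target bricks would then rely on the word-alignment or terminology information mentioned above.

```python
import re

# Illustrative sketch: split a sentence into "bricks" only at unambiguous markers.
# The marker list is a toy example; a real system would use a curated inventory.
MARKERS = r"\b(once|when|if|because|while)\b|[;:,]"

def split_into_bricks(sentence: str):
    """Split at the first unambiguous marker; otherwise return the sentence whole."""
    match = re.search(MARKERS, sentence, flags=re.IGNORECASE)
    if not match:
        return [sentence.strip()]              # no safe split point: one big brick
    left = sentence[:match.start()].strip()
    right = sentence[match.start():].strip()
    return [brick for brick in (left, right) if brick]

print(split_into_bricks(
    "Proceed with installation checking once you are done with installation."))
# ['Proceed with installation checking', 'once you are done with installation.']
```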
BBTM does not claim to perform automatic translation, and therefore it calls for interaction with the user: The bricks could be represented graphically by a set of
color blocks which the translator could mouse-drag to their proper place in the final
sentence. To be operational, the system should propose as few bricks as possible,
i.e. the bricks should be large (in the ideal case one big brick, a fully translated
sentence, is retrieved, as in a normal translation memory).
Since the system should only attempt what is feasible, this feature would help
users in a number of simple cases, while letting them deal with the more complex
ones. Likewise, semi-constructed target sentences could be completed manually by the translator when only some of the building blocks are available.
6. Putting Bricks in Holes in the Wall: The Skeleton-Sentence Approach
We pointed out that the translation and term databases did not communicate much.
There are, however, ways in which they could be more closely integrated. After
all, both provide what in the preceding section we have called “building blocks,”
although of different sizes. One of the functions provided in some products is
particularly interesting because it could be extended to bring terminology and
translation memory databases to cooperate, suggesting a new and, in our view,
important direction to explore for MAHT systems.
6.1. “TRANSWORDS” IN CURRENT MAHT PRODUCTS
A feature of IBM’s MAHT product TranslationManager is called “automatic
replacement in fuzzy matches”. The idea is that a number of “words” are kept
unchanged in the process of translating: Numbers, dates, proper nouns, etc. These
have been dubbed “transwords” in Gaussier et al. (1992). Etymology aside, transwords are a subset of “cognates” as defined by Simard et al. (1992), and can be
seen as identical cognates.
A good example of transwords is given by the strings DOS and 7.1 in the
following trilingual sample:
(6) The current version of DOS is 7.1
DOS en est à sa version 7.1
La versión actual del DOS es la 7.1
In fuzzy translation-memory matches, the product is able to detect the differences between the source sentence and its look-alikes found in the translation
memory, and to show where deletions, insertions or replacements took place. Now,
if the system finds that the source sentence and its match in the translation memory
only differ by a string that also appears in the translation, it can decide to mirror
this difference in the new translation. An example will make this clear: if one has to translate (7),
(7) Go to chapter 3.
and the translation memory contains the pair (8)
(8) Go to chapter 1.
Passez au chapitre 1.
the system detects that, between the English sentences, 1 became 3; since the French sentence also contains 1, the system changes it to 3, and instead of a fuzzy match (in which the user would have to input changes) it proposes the perfect translation
(9):
(9) Passez au chapitre 3.
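The sketch below illustrates this mirroring step in its simplest form: the new sentence and the memory entry differ by exactly one token, and that token occurs exactly once in the stored translation. The restriction to same-length sentences and the function name are our own simplifications, not a description of the product's actual algorithm.

```python
# Illustrative sketch of "automatic replacement in fuzzy matches":
# if the new sentence and a memory entry differ by a single token, and that
# token also occurs exactly once in the stored translation, mirror the change.
def mirror_transword(new_src: str, mem_src: str, mem_tgt: str):
    a, b = new_src.split(), mem_src.split()
    if len(a) != len(b):
        return None                        # keep it simple: same length only
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) != 1:
        return None                        # handle only the one-difference case
    new_tok, old_tok = diffs[0]
    tgt = mem_tgt.split()
    if tgt.count(old_tok) != 1:
        return None                        # ambiguous or absent in the translation
    return " ".join(new_tok if t == old_tok else t for t in tgt)

print(mirror_transword("Go to chapter 3.", "Go to chapter 1.", "Passez au chapitre 1."))
# Passez au chapitre 3.
```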
Now what all this amounts to, in a certain way, is the implicit use of sentence
skeletons containing variables, or placeholders (Go to chapter X). We will now
proceed to extend this idea.
6.2. FROM TRANSWORDS TO TERMS TO VARIABLES: SKELETON SENTENCES
Transwords are useful because they give us the quasi-certainty that the source word
is aligned with the target word. There is a one-to-one equivalence between the
source and the target item, unlike in EBMT where this equivalence is not 100 per
cent certain. As such, transwords are a source of bilingual knowledge. Now, a
similar source of information is readily available in most systems: The terminology
database, which usually establishes one-to-one relations between terms in source
and target language.
In the examples we have given for the BBTM, we can see that different terms
may appear in the same context (10)–(11):
(10) Proceed with customization.
(11) Proceed with installation checking.
We can represent sentences (10) and (11) as in (12):
(12) Proceed with X.
where X stands for a term, either customization or installation checking.
It is in fact possible to be more general without loss of precision, and simply
say that X stands for any term in the terminology database. By “without loss of
precision,” we simply mean that this generalization, per se, should not induce
errors.
The sentences in the translation memory could thus be encoded as “skeletons”
in which some variables need to be instantiated. Since terms and transwords do not
require the same treatment (terms are translated using the terminology database
whereas transwords need not be translated), we can use two different variables to
represent them. Thus, if our translation memory contained the aligned sentences in
(13),
(13) Examples of link budget calculations are given in annex II.
Des exemples de calculs de bilan de liaison sont donnés dans l’annexe II.
they should be replaced by the aligned skeletons in (14):
(14) Examples of X1 are given in annex Y1.
Des exemples de TX1 sont donnés dans l’annexe Y1.
where Xi/TXi is any term pair in the terminology database and Yi is a transword representing a number. The suffixes uniquely identify the translations of the source terms in the target sentence.
Note that a sentence that has been “skeletonized” to include variable parts is
more general, and should therefore be found more frequently in the translation
memory than fully instantiated sentences. Such a gap-filling system can therefore
be a partial answer to the aforementioned sentence-variability problem.
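A minimal sketch of such a skeletonization step is given below. The variable syntax (<X1>, <TX1>, <Y1>), the toy term list and the simple number pattern used to detect transwords are illustrative assumptions of ours, not part of any existing product.

```python
import re

# Illustrative skeletonization of an aligned sentence pair, using an existing
# bilingual term list (X/TX variables) and a number pattern (Y transwords).
TERMS = {"link budget calculations": "calculs de bilan de liaison"}
NUMBER = re.compile(r"\b\d+(?:\.\d+)*\b|\b[IVXLC]+\b")   # digits or roman numerals

def skeletonize(src: str, tgt: str, terms=TERMS):
    i = 0
    for en, fr in terms.items():
        if en in src and fr in tgt:
            i += 1
            src, tgt = src.replace(en, f"<X{i}>"), tgt.replace(fr, f"<TX{i}>")
    # transwords: strings (here, numbers) copied unchanged get the same Y variable
    j = 0
    for match in NUMBER.finditer(src):
        token = match.group(0)
        if token in tgt:
            j += 1
            src = src.replace(token, f"<Y{j}>", 1)
            tgt = tgt.replace(token, f"<Y{j}>", 1)
    return src, tgt

print(skeletonize(
    "Examples of link budget calculations are given in annex II.",
    "Des exemples de calculs de bilan de liaison sont donnés dans l'annexe II."))
# ('Examples of <X1> are given in annex <Y1>.',
#  "Des exemples de <TX1> sont donnés dans l'annexe <Y1>.")
```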
6.3. HOW TO USE A SKELETON SENTENCE-BASED TRANSLATION MEMORY
Having a translation memory consisting of skeletons rather than full sentences
leads us to the following translation process given an input sentence:
– identify terms that are present both in the terminology database and in the sentence;
– replace all the terms identified in the source sentence with variables in order to obtain the skeleton of the sentence;
– try to match this skeleton in the translation memory;
– if the match succeeds,
  – get the target skeleton from the translation memory;
  – replace all term variables in the target skeleton with the translations of the source terms, as found in the terminology database;
  – replace transword variables in the target skeleton with their corresponding values in the input sentence;
– if the match does not succeed, try another combination of instantiated/non-instantiated terms in the source sentence.
We will discuss the problems raised by this process below. Let us give a simple
example of such a process: Suppose our terminology database contains the term
pair (Esc, Échap), and that our translation memory contains the skeleton sentence
pair (15).
(15) Press the X1 key to continue.
Appuyez sur TX1 pour continuer.
Now, if we are presented with the source sentence (16),
(16) Press the Esc key to continue
the system would locate the term Esc, look for the skeleton (15) in the memory,
retrieve the target skeleton, replace TX1 by Échap and yield the translated sentence
(17).
(17) Appuyez sur Échap pour continuer.
This could be performed easily in current products. All it takes is re-use of
existing terminology and translation-memory-lookup components.
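The sketch below puts the steps of the process together on this very example. The exact-match lookup of the skeleton, the variable syntax and all names are illustrative assumptions of ours; a real component would of course add fuzzy matching and transword handling as described above.

```python
# Illustrative sketch of translating with a skeleton-sentence memory.
# Terminology, memory contents and variable syntax are invented for this example.
TERM_DB = {"Esc": "Échap", "Enter": "Entrée"}
SKELETON_MEMORY = {
    "Press the <X1> key to continue.": "Appuyez sur <TX1> pour continuer.",
}

def translate_with_skeletons(sentence, term_db=TERM_DB, memory=SKELETON_MEMORY):
    # 1. identify known terms in the input and build its skeleton
    found = []
    skeleton = sentence
    for term in term_db:
        if term in skeleton:
            found.append(term)
            skeleton = skeleton.replace(term, f"<X{len(found)}>")
    # 2. try to match the skeleton in the memory (exact match only, for simplicity)
    target_skeleton = memory.get(skeleton)
    if target_skeleton is None:
        return None                       # fall back to ordinary (fuzzy) matching
    # 3. instantiate term variables with translations from the terminology database
    target = target_skeleton
    for i, term in enumerate(found, start=1):
        target = target.replace(f"<TX{i}>", term_db[term])
    return target

print(translate_with_skeletons("Press the Esc key to continue."))
# Appuyez sur Échap pour continuer.
```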
6.4. THE SKELETON APPROACH VERSUS PURE TRANSLATION MEMORY AND EBMT
The skeleton-sentence approach is in fact an intermediate stage between translation memory and EBMT. On the one hand, it allows for more flexibility than the former
insofar as the system does not try to match complete sentences. On the other hand,
it captures less generality than EBMT, since the latter relies on complete parses
of sentences. Its main advantage over EBMT is that it can be based on robust
techniques, as will be seen below.
In the description of EBMT given by Sato and Nagao (1990), each sentence in
the example base is represented by a dependency tree in which leaves are either
words or semantic features. When the system is given a sentence for translation, it
first builds a representation of the sentence according to the representations used
in the example base, i.e. a particular dependency tree, and then tries to find which
representations in the example base are closest to the representation of the input
sentence. This second step involves estimating the distances between the different
semantic features or syntactic structures. Lastly, a third step is needed to generate
the translation.
The skeleton approach tries to find a realistic balance between translation memory and EBMT. A skeleton ideally represents the linear syntactic structure of a
sentence in which some of the arguments, the variables, are not instantiated and
not semantically specified. As stated above, these variables can be instantiated by
any of the terms of the domain. Thus, the comparison between an input sentence
and a sentence in the memory will take two steps:
– terminological generalization, similar to the semantic generalization of EBMT, for the lexical differences;
– pattern matching (i.e. the translation-memory strategy), for the detection of a similar syntactic structure.
This enables the use of robust techniques (pattern matching instead of syntactic
analysis), and the capture of lexical generalization through the domain terminology,
a resource which is more easily available than the complete set of semantic features
needed for EBMT.
Finally, we want to point out some similarities between the skeleton approach and the BBTM: the terms abstracted into placeholders are just a particular kind of building brick, with the advantage of being already perfectly defined since
they have been entered by humans. This suggests that the two approaches could be
combined.
6.5. SOME PROBLEMS WITH THE SKELETON APPROACH, AND SOME SUGGESTIONS FOR IMPROVEMENT OF TERMINOLOGY-LOOKUP COMPONENTS
Although we have not implemented the ideas proposed herein regarding the skeleton approach, we think its implementation could be rather straightforward, provided a few issues are considered.
The first problem we can think of is linked to the degree of variability of terms
in technical texts, which is known to be considerable in certain cases (see the
next section for more on this matter). We think that a first experiment should
only rely on simple pattern matching, as is done by current MAHT products.
This will not enable us to obtain general skeletons in all cases, but will prevent
ill-considered generalization. More complex procedures could be invoked to deal
with cases such as the insertion of an adjective, an adverb or a noun inside a term,
but these procedures imply linguistic processes which are not, to our knowledge,
implemented in available translation tools.
A related problem is that of overlapping terms: what should the system do with
the sentence Install the receiving antenna support, if the term database contains
both receiving antenna and antenna support? Since we aim at robustness, any
solution privileging one term over the other would only result in loss of generality
(i.e. less recall from the memory), but not in loss of quality (since no wrong solution
is proposed).
Another problem we can think of is how to select the actual translation of a term
when there are several possibilities which are not simple variants of a base form.
Two solutions can be envisaged to cope with this problem:
– If one wants to rely on a fully automatic process, further information should be added to the terminology database, such as domain markers or information on relative frequency of use, so that the system can propose the relevant solution, or at least the most probable one.
– We can also adopt a semi-automatic strategy and ask the system to display all possible translations, then let the translator make the appropriate choice (which is the case in current MAHT products when a source term has several translations).
Finally, as always with organ transplants, there will be instances of incompatibility and rejection: Since we have performed a very shallow syntactic analysis by abstracting a term (typically a noun phrase) to a variable, the process of grafting the target term in place of the variable is a matter of generation, and therefore we are bound to experience agreement problems (noun–adjective, subject–verb, etc.). In order to remain robust, we see no alternative here but to rely on the human translator to perform the necessary adjustments.
It should be noted that several of these issues are not specific to the implementation of the skeleton approach, and we think they should be addressed in any case in the terminology-identification and lookup components of current MAHT products.
7. Cooking the Bricks: The Problem of Data Acquisition
The components described above make use of data and engines. The necessary
engines can include a certain amount of linguistic intelligence, but this has to be
considered carefully, the state of the art in computational linguistics being such that
not even syntactic analysis with large coverage and good quality can be achieved
(which hinders, for example, the possibilities of current EBMT implementations).
However, as already noted, one can design “shy” engines that will only be triggered
when no ambiguity jeopardizes the quality of the final result.
Rather than the engines, what we are mostly concerned with here is data. Data
is indispensable in all cases. So far, we have seen that data in current MAHT
products is found in two databases, terminology and translation memory. We have
also noted that while the latter was self-feeding, the terminology database needed
manual feeding (even in the case of conversion from other databases, these are most
likely hand-constructed). Having well-stocked terminology databases is important
for the productivity and quality of the work of MAHT users, but terminology
acquisition is a costly process. What could we do to help here?
A simple translation memory, loaded into a word processor, provides for quick manual terminology identification: When one identifies a term in the source sentence, one is almost sure that the equivalent of the term will be found in the corresponding target sentence. It is therefore easy to browse through the translation memory with a mouse and copy terms and their translations to any other application, such as a database.
It is possible to automate this process partially. We have worked extensively
on the subject of automatic terminology acquisition, and refer the reader to Daille
et al. (1994) and Gaussier and Langé (1994, 1995) for details. We have found
statistical models and procedures that would partially ease the terminologists’
work by providing them with a pre-list of bilingual terminological entries that
they could clean up later. It must be noted, though, that a given term (and its
translation) can vary considerably in texts (through different modifiers, insertion
of determiners, etc.). Some studies set this degree of variability as high as 30
per cent of the cases (Macklovitch, 1995). While some of these variations can be
dealt with automatically (e.g. a simple grammar can trace antenne parabolique de
réception as one particular instance of the term antenne de réception), others, such
as metaphors or discourse ellipses, would be harder to track. This is another reason
for being practical and leaving it up to humans to deal with such phenomena.
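By way of illustration only, the sketch below ranks candidate term pairs by a simple co-occurrence (Dice) score over aligned sentence pairs, assuming that monolingual candidates have already been extracted on each side. This is not the statistical model of Daille et al. (1994) or Gaussier and Langé (1994, 1995), merely the general flavor of such procedures.

```python
from collections import Counter
from itertools import product

# Toy illustration of co-occurrence-based ranking of bilingual term candidates.
# Each corpus entry lists the (already extracted) candidate terms of one aligned
# sentence pair; candidate pairings are scored with the Dice coefficient.
def dice_scores(aligned_pairs):
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned_pairs:
        src_freq.update(set(src_terms))
        tgt_freq.update(set(tgt_terms))
        pair_freq.update(product(set(src_terms), set(tgt_terms)))
    return {(s, t): 2 * n / (src_freq[s] + tgt_freq[t])
            for (s, t), n in pair_freq.items()}

corpus = [
    (["receiving antenna"], ["antenne de réception"]),
    (["receiving antenna", "link budget"], ["antenne de réception", "bilan de liaison"]),
    (["link budget"], ["bilan de liaison"]),
]
for pair, score in sorted(dice_scores(corpus).items(), key=lambda x: (-x[1], x[0])):
    print(pair, round(score, 2))
# The correct pairings come out with score 1.0, the crossed pairings with 0.5;
# a terminologist would then validate or discard the proposed pairs.
```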
Although not fully automatic, such utilities would at least speed up terminology
acquisition. One would have to carefully design an interface where the bilingual candidates are presented in context (that is, in the pairs of bilingual sentences where they are found), and users only need one click to validate a pair of terms. One such interface can be found in Termight (Dagan and Church, 1997).
It is most interesting that the fuel for this improved terminology acquisition
is provided by an existing resource, translation memories. This gives us further grounds for advocating a closer integration of the two components.
8. A Word on MT
We have mentioned that translators did not like MT very much. Perhaps one day
MT will reach the expected level of quality (but this day is remote). Meanwhile, the
usual arguments remain valid: MT can be of some help in restricted domains, and
one can make use of it if one knows the limits of such systems. In the framework
of MAHT, it seems to us that MT can achieve wider acceptance since its results
(1) are not imposed on the translator, just suggested as another available resource,
and (2) only come to the forefront if no other resource (e.g. a proposal from the
translation memory) is found. It has also been noted that the quality of MT output
can improve dramatically when accurate and complete terminology is available,
a problem for which we have proposed solutions above. Furthermore, those MT
systems based on the use of bilingual corpora (such as EBMT) are more likely to
satisfy users if they are derived from these users’ own corpora. That is just what
translation memories could also be used for. MT is therefore another component
of future MAHT systems that could profitably communicate with the two other
components, terminology and translation memory.
9. Conclusion: Keep Within the Limits, but Keep Moving
We have seen a number of points where current MAHT products can be improved.
Such improvements rely on a better use of the available resources, and integration between these resources: Translation memories, which are built automatically
as a result of the human translator’s work, can be a valuable source of data for
improved terminology acquisition, or for a certain class of MT system. Translation
memories and terminology databases can be integrated for an enhanced version
of the translation memory where skeleton sentences are retrieved. Finally, some
linguistics-based engines could provide for the re-use of text bricks that are intermediate in size between the terms found in the terminology database and the sentences found in the translation memory.
We have tried to show that the necessary engines and interfaces do not call
for sophisticated resources such as high-level grammatical analysis. The idea is
to respect the limit where computer power has to leave it up to humans to do the
job. The consequence is of course that such systems will not provide an answer in
all cases. But this is the condition for high-quality results, which at the same time
should ensure optimal user acceptance.
This limit, which defines the proper place of humans and computers, will move
as progress is made in domains such as computational linguistics. Then will come
the time to think of smarter applications. What the art of MAHT comes down to is
an awareness of its limits at a given time.
Note
1. All trademarks are hereby acknowledged.
References
Brown, P., Della Pietra, S., Della Pietra, V., and Mercer, R.: 1993, ‘The Mathematics of Statistical Machine Translation: Parameter Estimation’, Computational Linguistics 19, 263–311.
Dagan, I. and Church, K.: 1997, ‘Termight: Coordinating Man and Machine in Bilingual Terminology
Acquisition’, this issue, pp. 89–107.
Daille, B., Gaussier, E., and Langé, J.-M.: 1994, ‘Towards Automatic Extraction of Monolingual and
Bilingual Terminology’, in Proceedings of the 15th International Conference on Computational
Linguistics, COLING-94, Kyoto, Japan, pp. 515–521.
Gale, W. and Church, K.: 1991, ‘Identifying Word Correspondences in Parallel Texts’, in Proceedings
of the Fourth DARPA Speech and Natural Language Workshop, Pacific Grove, California.
Gaussier, E. and Langé, J.-M.: 1994, ‘Some Methods for the Extraction of Bilingual Terminology’,
in International Conference on New Methods in Language Processing (NeMLaP), Manchester,
UK, pp. 224–228.
Gaussier, E. and Langé, J.-M.: 1995, ‘Modèles statistiques pour l’extraction de lexiques bilingues’,
T.A.L. (Traitement automatique des langues) 36, 133–155.
Gaussier, E., Langé, J.-M., and Meunier, F.: 1992, ‘Towards Bilingual Terminology’, in Proceedings
of the ALLC/ACH Conference, Oxford, England, pp. 121–124.
Isabelle, P., Dymetman, M., Foster, G., Jutras, J.-M., Macklovitch, E., Perrault, F., Ren, X., and
Simard, M.: 1993, ‘Translation Analysis and Translation Automation’, in Proceedings of the
Fifth International Conference on Theoretical and Methodological Issues in Machine Translation
TMI’93, Kyoto, Japan, pp. 201–217.
Langé, J.-M. and Bonnet, E.: 1994, ‘The Multiple Uses of Parallel Corpora’, in Proceedings of
Teaching and Language Corpora (TALC’ 94), Lancaster, U.K.
Macklovitch, E.: 1994, ‘Using Bi-textual Alignment for Translation Validation: the TransCheck
System’, in Technology Partnerships for Crossing the Language Barrier: Proceedings of the First
Conference of the Association for Machine Translation in the Americas, Columbia, Maryland,
pp. 157–168.
Macklovitch, E.: 1995, ‘Qu’est-ce que c’est au juste que la cohérence terminologique’, in Actes des
IVèmes Journées scientifiques de l’AUPELF-UREF: Lexicomatiques et dictionnairiques, Lyon,
France.
Meunier, F.: 1993, Découpage de phrases et alignement de sous-phrases dans un corpus bilingue,
Rapport de DEA en Informatique Fondamentale, Université Paris 7.
Sato, S. and Nagao, M.: 1990, ‘Towards Memory-Based Translation’, in COLING-90: Papers presented to the 13th International Conference on Computational Linguistics, Helsinki, Vol. 3, pp. 247–252.
Simard, M., Foster, G., and Isabelle, P.: 1992, ‘Using Cognates to Align Sentences in Bilingual Corpora’, in Fourth International Conference on Theoretical and Methodological Issues in Machine Translation: Empiricist vs. Rationalist Methods in MT, TMI-92, Montréal, Canada, pp. 67–82.