Machine translation has been the Holy Grail of computational linguistics since the early days of computer science. Like the Holy Grail, it has remained elusive. It is the contention of this article that the primary reason for the failure of commercial and experimental machine translation systems to consistently deliver usable (let alone good) results lies in a number of misconceptions, not only in respect of the translation process, but also in respect of the nature of language itself. The staggering variety of syntactic manifestations across languages has driven researchers to hunt for language universals to provide a framework to which utterances in any language can be reduced, and hence reformulated in a different language. With a few notable exceptions, such language universals are held to be classes or categories to which “words” can be assigned and which interact with each other in defined ways. Translations should then be a relatively simple matter of substituting the appropriate words and applying the specific syntax of the target language.
This article attempts to use some simple examples from the day-to-day experience of a well-versed translator to illustrate some of the fallacies that underpin such approaches. If language universals exist, they almost certainly do not exist in the forms in which they are sought.
There is a widespread perception that the job of a translator is largely mechanical, simply converting streams of words in one language to similarly formed streams of words in another language. Insofar as there is any appreciation of the difference between a good translator and a poor translator, it is largely understood to be that the good translator has a larger stock of words from which to select appropriate matches and puts a little more effort into fine-tuning the result so that it reads more elegantly. Some clients are also appreciative of the fact that good translators show a good understanding of the particular field in which the client is engaged.
Even among translators themselves, it appears to me that there is only a vague understanding of the processes that lead to a good translation and the skills that define the very few truly great translators. Nevertheless, there is considerable agreement among good translators when reading a good translation or a poor translation. The highest praise bestowed on a translator is usually that they have “really freed themselves from the original text”. And yet, it is my experience that even the very best are largely unaware of the precise mechanisms involved in producing an entirely readable and convincing target text while retaining all the salient information from the original.
By investigating a few short examples from my own experience, I shall attempt to highlight some of the mechanisms by which good translations evolve, and in the process question some fundamental assumptions often made in respect of language universals.
Of necessity, the examples I shall use will be taken from German, with English as the target language. Although the languages are closely related on many levels, not least syntactically, the discrepancies and disjuncts that will be revealed are of a remarkably fundamental nature, and it is to be expected that such disjuncts will be even more widespread across language pairs from entirely different families.
Lest it be misunderstood, the author does not rank himself among the truly great translators and looks with awe and reverence on those who appear to achieve with consummate ease that which he struggles to produce with any consistency.
This example goes back to the very early days of my professional translating career. Although seemingly innocuous, it has haunted me ever since.
In the context of a general article on the nature of business administration, the following sentence appeared:
Sinn und Zweck eines jeden Unternehmens ist die Optimierung des Gewinns.
For the benefit of those who do not read German, this can be simply rendered as
The purpose of every company is the optimization of profit.
This retains virtually all the syntax of the original German. A few slight alterations would provide a more satisfactory translation, which would, in all probability, have been similar to the translation I used at the time:
The purpose of any company is to optimize profits.
In fact, this translation is unexceptionable, if not riveting. But this can hardly be claimed of the German either.
The syntactic and lexical adjustments compared with the German or with a more literal rendering are relatively minor and of the type that most good translators would not think twice about.
Firstly, there is the choice of “any” to render “eines jeden”, which is essentially one of focus, emphasis and readability. Alternatives could be “all” or “each and every”. While the second of these could be regarded as being closer to the import of the original German, it is perfectly possible to claim that the use of “each and every” would shift the focus of the sentence far more strongly than the German does.
The second substantive change in the translation is the use of the plural “profits” in place of the (genitive) singular German “Gewinns”. I suspect that experienced translators of German to English, particularly those who regularly translate business-related material and read original English annual reports, the Economist, and other original business publications, would be entirely unaware of having pluralized the noun. The fact is that there is a slight disjunct in the use of the words “profit” and “Gewinn” when referring to the overall profits made by a company rather than the profit made from an individual transaction or project. (As always, the reality is in fact a little more complex than this, but this explanation will suffice at this point.)
It is the third substantive change that is of interest to me in this article and that has caused me to consider and reconsider the nature of the translation process and, indeed, the nature of language, language divergence and linguistic relativism for the past twenty years or so.
The shift from the noun “optimization” to the non-finite verb “optimize” initially appears innocuous. It is my contention that the shift is far from innocuous, but actually reveals a fundamental disjunct within this language pair (which is well known to translators), and that embracing this disjunct has far-reaching implications.
But first, a few rather mundane observations:
- “Optimierung” in German and “optimization” in English are grammatically equivalent. They are both nouns and can fulfil a largely unrestricted number of noun roles.
- “Optimierung” and “optimization” are both derived from an adjectival root using analogous derivational morphology (adjective -> verb -> noun).
- Semantically, there is no significant difference between “Optimierung” and “optimization”.
At first sight, this would appear to suggest that there is no strong argument not to translate the noun “Optimierung” as the noun “optimization”. Indeed, the initial rendering of the sentence
The purpose of every company is the optimization of profit.
is grammatically well-formed and comprehensible.
Of course, any translator of German to English who has undergone any training whatsoever will have had it drilled into them that German, and formal German in particular, has a tendency to use “nominal structures” and that these are usually best rendered with “verbal structures”. On the face of it, this is true, and those of us unfortunate enough to regularly encounter academic German texts will have struggled with the problem plenty of times.
Often enough, substitution of nouns in German with non-finite verb forms in English (infinitives, gerunds, …) as in the above example (“to optimize”) will produce an adequate and readable rendering. But, leaving aside the pragmatic issue of whether a given text needs to be rendered particularly elegantly and “deserves the full treatment”, is such a rendering of noun-heavy German constructions with non-finite verbal constructions in English justified? And if so, is it sufficient?
Linguistic relativism has had a chequered history. Although the concept of linguistic relativism dates back at the very least to Plato, it is most usually associated with Benjamin Lee Whorf and his work in the middle of the last century. Indeed, it is often referred to as the “Sapir-Whorf hypothesis”, or “Whorfianism”. Whorf’s work, which initially suggested that a person’s language constrains and determines the way in which that person is able to think, generated considerable interest, although two variants of the “hypothesis” soon emerged. Essentially, strong versions of the hypothesis claim that the linguistic categories and mechanisms available in a language constrain and limit the cognitive categories and mechanisms available to the speaker, whereas weak versions claim that such linguistic categories and mechanisms merely influence, but do not constrain, cognitive behaviour. As it became clear that at least parts of Whorf’s data were incorrect and some of his deductions untenable, debate about linguistic relativism subsided, only to re-emerge as a result of the work of cognitive linguists in the 1980s.
I shall not here rehearse the arguments in favour of linguistic relativism. Others have done that far more eloquently than I ever could, in particular George Lakoff in his monumental “Women, Fire and Dangerous Things”. Suffice it to say that some 30 years of exposure to a language that is not my native language, and to the culture associated with that language, have entirely convinced me that the language one speaks has a significant influence on the way in which one conceptualizes reality. German-speaking people genuinely think differently from English-speaking people, in the same way that Chinese-speakers think differently from Russian-speakers.
This does not relate merely to cultural and geographical discrepancies between languages, where most languages lack, for instance, an adequate concise rendering of the English “croft”, and where the German “Alm” is only unsatisfactorily rendered by “alpine meadow”. Nor does it simply refer to the famous “untranslatables”, conceptual lacunae in one language from the perspective of another language (“Gemütlichkeit”, “Schadenfreude” and “besinnliche Feiertage” spring to my mind at this moment). Neither does it refer to discrepancies in semantic scope, such as the English word “house”, which has a different semantic scope for English speakers from that of the very close German equivalent “Haus”.
While all the above examples illustrate the different ways in which different languages “carve up” the semantic universe, and also condition speakers’ conceptualizations of the world around them, there are other, even deeper elements of language that have a far more radical impact on conceptualization.
In the case of the language pair English and German, it is very noticeable that German has a marked tendency to use nouns in ways in which English would not. The example sentence is just a relatively straightforward example of this.
But is this significant? As we have said, good translators are fully aware of this tendency and instinctively compensate for it.
And yet, there is an argument that in compensating for such tendencies, we do a disservice to the original text. If we assume that the language we speak does indeed shape our conceptualization of the world around us, and if we assume, as most theories of meaning do, that there is a fundamental difference between nouns and verbs, then there is a significant conceptual disjunct between the use of the noun “Optimierung” and the verb “optimize”.
There is considerable anecdotal evidence, and a small body of research, indicating that English-speakers and German-speakers encounter significant difficulties when they collaborate on projects in a business environment. Some of the reported difficulties are undoubtedly purely the result of different cultural backgrounds and working practices that have become established in the different environments. These can include such things as attitudes to working hours, the importance of social interaction in the workplace and between colleagues outside the workplace, the significance of and respect for managerial hierarchies, flexibility in respect of deadlines and agreed content and so on. While cultural differences such as these can have considerable impact on the success or otherwise of a collaborative project, linguistic relativism suggests that there may be even more fundamental differences between the ways in which speakers of different languages perceive and conceptualize the world around them and, indeed, the task in hand.
If we return to our example and accept the noted tendency of German-speakers to prefer nouns to verbs in certain circumstances, it is perfectly plausible to propose that German-speakers would conceptualize the core message of “die Optimierung des Gewinns” differently from the way in which English-speakers would conceptualize “to optimize profits”. If a noun is a fundamentally different animal from a verb, it would be no surprise to identify such a discrepancy. If “Optimierung” (or, for that matter, “optimization”) has the essential nominal qualities of being discrete and essentially static (a “thing”) whereas “optimize” has the essential verbal characteristics of action, change of state, and so on, it would seem likely that the nominal form “Optimierung” can be regarded as an “end state”, an objective that can be realized or, to put it more provocatively, something that can be ticked off from a checklist. The verbal form, on the other hand, may well be conceived as a process with no clearly defined end state or, indeed, end point. It would be an activity that stretches over time.
From my own purely subjective experience of the word “Optimierung”, I would contend that this is indeed the case. I suspect that, over the past quarter of a century, there have been very few weeks in which I have not had to translate the word “Optimierung”, and my overwhelming impression from the contexts in which it has occurred is indeed that it is largely regarded as an achievable objective. This contrasts sharply with my reading of original English texts, where verbal forms are more common and the understanding I deduce from the context is one of an ongoing activity.
If my intuition and sensibilities are correct, there could well be an argument for retaining such nominal forms on the grounds that they actually convey semantic information that crucially modifies the concepts being conveyed by the language. If there is a genuine understanding in the mind of a German-speaker that “Optimierung” is something that can be treated as an achievable end state, and if it is the intention of the German-speaker to convey this information, the use of a nominal form in English would be eminently sensible.
Such an approach, of course, throws up a wealth of questions.
It was the hugely influential linguist Roman Jakobson who, as early as 1959, noted that “languages differ essentially in what they must convey and not in what they may convey”. It seems to me that it took a long time before linguists began to understand the concepts of linguistic relativism in these terms. It is patently obvious that a language that, for instance, features no tense distinctions in the verb does not prevent its speakers from conceiving of present, past and future time. Indeed, grammatically, English does not mark the future tense in the verb, but nobody would contend that English speakers cannot conceive of future time, nor that they cannot conceive of the six tense distinctions used, for instance, in Kalaw Lagaw Ya. But, crucially, speakers of Kalaw Lagaw Ya are constrained to frame any utterance in respect of past time in terms of the remote past, the recent past or the today past. When translating a past narrative from Kalaw Lagaw Ya into English, does it make sense to make an attempt to constantly reference the timeframe, or is it sufficient to allow the reader/hearer to infer the timeframe from context in the same way that we do in English narratives? Does it provide additional insight into the mind of the speaker to explicitly convey such information constantly?
To take another example, when relating a past event in Turkish, it is necessary to provide in every verb the information as to whether the speaker actually witnessed the event or came by the information in some other way. Of course, English and other languages that have no inferential form of the verb (the vast bulk of languages) are perfectly able to convey such information, and regularly do (“it has been reported that Ed Miliband made an unequivocal policy statement last week” – but I did not witness this myself), but English-speakers do not habitually make such distinctions. It is an interesting thought experiment to consider the impact on the perceived reliability of news reporting were English to impose such a constraint.
A similar, amusingly presented, example can be found in a 2010 article from the New York Times:
Some languages, like Matses in Peru, oblige their speakers, like the finickiest of lawyers, to specify exactly how they came to know about the facts they are reporting. You cannot simply say, as in English, “An animal passed here.” You have to specify, using a different verbal form, whether this was directly experienced (you saw the animal passing), inferred (you saw footprints), conjectured (animals generally pass there that time of day), hearsay or such. If a statement is reported with the incorrect “evidentiality,” it is considered a lie. So if, for instance, you ask a Matses man how many wives he has, unless he can actually see his wives at that very moment, he would have to answer in the past tense and would say something like “There were two last time I checked.” After all, given that the wives are not present, he cannot be absolutely certain that one of them hasn’t died or run off with another man since he last saw them, even if this was only five minutes ago. So he cannot report it as a certain fact in the present tense.
There is a wealth of examples of such fundamental linguistic disjuncts, and there is an increasing body of research suggesting that such phenomena do indeed shape the way we think in very profound ways. One of the foremost researchers in this field at present is Lera Boroditsky, and her article “How does our language shape the way we think?” offers a very brief overview of the directions some of this research is taking.
But even if there is considerable evidence that speakers of different languages actually conceptualize events and situations differently, should translators attempt to reconstruct such a conceptualization in the mind of the reader or attempt to reflect the putative conceptualization that a speaker of the target language would have constructed when confronted with the same events and circumstances as the original speaker?
It appears to me that the accepted wisdom among translators largely favours the second approach, and such an approach undoubtedly results in far more accessible and readable translations. But to what extent do important aspects of the original conceptualization “get lost in translation” as a result of this approach? Clearly, where information that is explicit in one language can be and is inferred from context in another language, it is often sufficient to ensure that that context is provided in the target language, and this is common practice among good translators. But what of a pervasive disjunct such as that which I have hypothesized in respect of nouns and verbs between German and English? If I am correct in thinking that nominal constructions lead speakers who use such constructions to conceive of the things about which they are speaking in more static and concrete terms, and if (of which there is no doubt in my mind) German does indeed use far more such constructions than English, would a verbal translation of a nominal construction lead to fundamentally different understandings of the issues under discussion?
It is perfectly possible that this is the case, but, at least for our example of nominal and verbal constructions, there is another line of thinking that I have been pursuing recently.
Again, this example comes from the ancient past, when I was a bright-eyed and bushy-tailed translator. The following German monstrosity destroyed any naïve innocence I may have had:
Das Ausschalten des Geräts erfolgt durch Betätigen des Ein-/Aus-Schalters.
Again, for the benefit of those who do not read German, I shall not spare you a literal rendering:
The switching off of the device happens through actuation of the on/off switch.
Heaven only knows how I might have rendered that in those days! Any number of options cross my mind now. Subject to the constraints of the style employed in the rest of the manual concerned and any preferences the client may have expressed, the following sounds good to me nowadays:
Use the on/off switch to power down the unit.
I freely admit that, even to German ears and taking account of the tendency in German to use nominal constructions, the German sentence is ugly and clumsy beyond redemption. But such constructions are by no means rare. Any translator of German will encounter them many times in any working day. And any good translator of German to English will find ways to get the verbs into the translation.
Recently, however, it began to dawn on me that by introducing words into the English translation that belong to the grammatical class of verbs, I was not actually introducing the concept of verbality into the translation. It is, in fact, already present in the German.
Even German does not survive grammatically without verbs. As in English, the overwhelming majority of utterances in German rely on some form of interaction between nouns and verbs. But crucially, the verbs used in many German sentences that appear to us as English speakers to be “noun-heavy” do not actually carry the primary verbal import of the utterance. Hence, in the above sentence, the verb “erfolgen” does not bear the primary verbal import of the utterance (“switch off”). Instead, this is borne by the (deverbal) noun “Ausschalten”.
And German appears to have a veritable cornucopia of such verbs that actually carry little semantic import. “Erfolgen”, “bedürfen”, “unterziehen”, “stehen”, “geben”: all of these and many more can be used with very little semantic import of their own. Of course, the insight that such verbs exist is not new. The concept goes under many names, including “light verbs”, “semantically weak verbs”, “delexical verbs”, to name but a few. And, of course, they are also common in English (“have a smoke”, “take a look”). And yet, it appears to me that many of the German verbs used in such roles somehow seem “meatier” than light verbs commonly used in similar English constructions. As a result, we translators agonize over whether we should attempt to render whatever semantic import the German verb may have. In the two sample sentences I have used so far, this has not been an issue, but take the following sentence:
Das Produkt wurde einer strengen Prüfung unterzogen.
The product was subjected to stringent testing.
Yes, English can also use “subject to” in exactly this way, but is it necessary? Does the German not mean “the product was thoroughly tested”? Is it not perhaps the case that the tendency of German to use nominal constructions somewhat restricts the adverbial constructions available in German, which further accentuates the existing tendency? Translators from English to German also regularly struggle to render some English adverbial expressions in German. My own opinion is that there is no need whatsoever to translate the above sentence with “was subjected to”. “Unterziehen” is acting as a light verb, and the verbal content is carried by “Prüfung”.
In the second example, “erfolgen” is also a light verb. And in my very first example? It is the copular verb “sein” (“ist”) that is the light verb here. In conjunction with a light verb, “Optimierung” carries verbal import.
And there is a further issue: all the pertinent nouns in the examples given above are in fact deverbal nouns. They derive from verbs. And my intuition tells me that, in the vast majority of nominal constructions that give translators of German to English so much grief, this will almost invariably be the case. German deverbal nouns (possibly in contrast to English deverbal nouns) retain their essential verbality, which is “activated” by light verbs.
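The light-verb analysis can be made concrete with a small sketch. The Python fragment below is purely illustrative and makes no claim to being a translation system: the two lookup tables, the function name `verbal_core` and the pre-segmented input format are assumptions introduced here, and the entries cover only the three example sentences discussed in this article.

```python
# Toy illustration of the light-verb analysis. All names and table entries are
# hypothetical and cover only the three example sentences in this article.

# German verbs that, in these sentences, carry little semantic import of their own.
LIGHT_VERBS = {"erfolgen", "unterziehen", "sein"}

# Deverbal nouns mapped to an English verb expressing their verbal import.
DEVERBAL_NOUNS = {
    "Optimierung": "optimize",
    "Ausschalten": "switch off",
    "Prüfung": "test",
}


def verbal_core(finite_verb, nouns):
    """Return the English verb carrying the primary verbal import.

    finite_verb: lemma of the German finite verb (e.g. "erfolgen").
    nouns: the nouns of the clause, in order of appearance.
    Returns None when the finite verb is not in the light-verb table or no
    listed deverbal noun is found, i.e. when this toy analysis has nothing to say.
    """
    if finite_verb not in LIGHT_VERBS:
        return None
    for noun in nouns:
        if noun in DEVERBAL_NOUNS:
            return DEVERBAL_NOUNS[noun]
    return None


examples = [
    ("sein", ["Sinn", "Zweck", "Unternehmen", "Optimierung", "Gewinn"]),
    ("erfolgen", ["Ausschalten", "Gerät", "Betätigen", "Ein-/Aus-Schalter"]),
    ("unterziehen", ["Produkt", "Prüfung"]),
]

for verb, nouns in examples:
    print(verb, "->", verbal_core(verb, nouns))
```

The sketch deliberately sidesteps the hard part. Deciding when a verb is acting as a light verb, and which noun has had its verbality “activated”, is exactly the judgement that translators make intuitively and that a lookup table cannot supply.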
So perhaps the best translation for my original example sentence would look something like:
Companies optimize profits. That’s what they’re there for.
But then again, I still believe that German-speakers really do have a fundamentally different conceptualization of “Optimierung”. And they really do seem to be very good at making (and selling) “things” compared with the British, for instance!
The points above only scratch the surface of the issues relating to nominal constructions in German. There are plenty more instances in which nominal phrases have to be rendered verbally despite the absence of any verb, even a light one. And, as far as the problems for translators are concerned, more often than not, they revolve around multiple adjectival qualification of a noun phrase that would be better rendered verbally. But these few examples nevertheless suffice to illustrate the point I am making.
Ultimately, I suspect that the reality of conceptualization is a blend of the aspects touched on above (and, undoubtedly, many others). For translation practice, it means that good translators will continue to rely on their intuition to find readable, elegant and faithful renderings.
So what about functional typology?
The title of this article refers to “functional typology”, and the introduction wittered on about machine translation. What does all the above have to do with functional typology or machine translation?
There have been innumerable attempts over the past 60 years or so to describe the grammar of languages in a way that is applicable to all human languages and that allows manipulation of a language in the field of computational linguistics. These have taken the form of generative grammars of various types, dependency grammars, also of various types, and more recently construction grammars, alongside several others that do not fall neatly into any of these categories. Most claim to be applicable to all languages, and most claim to rely on certain fundamental similarities between the structures of languages, i.e. on language universals. Construction grammars differ somewhat from most generative and dependency approaches in this last respect.
Most approaches to machine translation rely on an assumption that, if one strips away the detailed grammatical structures, which are specific to a given language, an utterance in one language is essentially the same as an equivalent utterance in another language at some highly abstracted level. And this is the domain of language universals. Claims have been made, for instance, that all languages have nouns, and that all languages have verbs. Others reject these claims for some languages, even claiming that some languages have no grammatical categories that can reasonably be equated to parts of speech. Others have concentrated on the function of words rather than their grammatical embodiment, but still we usually see a fundamental assumption that complex meaning is only possible when words with a verb function interact with words with a noun function.
Crucially, it is assumed that a concept that functions verbally in one language will also function verbally in another language, in other words, that the component ideas in an utterance in two different languages essentially have the same relationships between them.
The analyses above certainly call into question the assumption that, on a grammatical level, the subject – verb – object (to take just one example) relationship in an utterance in one language can be mapped to a subject – verb – object relationship between corresponding concepts in a different language.
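The assumption under question can be stated as a simple check. The following Python sketch is an illustration only: the word alignment and the category labels are hand-assigned here for the first example sentence and its rendering “The purpose of any company is to optimize profits”, and the names `Aligned` and `category_mismatches` are hypothetical.

```python
from typing import NamedTuple


class Aligned(NamedTuple):
    source: str           # German word
    source_category: str  # grammatical category in the German sentence
    target: str           # aligned English word
    target_category: str  # grammatical category in the English sentence


# Hand-made alignment (an assumption for illustration, not the output of any tool).
ALIGNMENT = [
    Aligned("Zweck", "noun", "purpose", "noun"),
    Aligned("Unternehmens", "noun", "company", "noun"),
    Aligned("ist", "verb", "is", "verb"),
    Aligned("Optimierung", "noun", "optimize", "verb"),
    Aligned("Gewinns", "noun", "profits", "noun"),
]


def category_mismatches(alignment):
    """Return the aligned pairs whose grammatical categories differ."""
    return [pair for pair in alignment if pair.source_category != pair.target_category]


for pair in category_mismatches(ALIGNMENT):
    print(f"{pair.source} ({pair.source_category}) -> {pair.target} ({pair.target_category})")
```

A system that expects every aligned pair to share a category would treat the “Optimierung”/“optimize” pair as an error, whereas a good translator treats it as the natural rendering.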
Even on a functional level, the actor – verb – patient relationship may not pertain between concepts across different languages if, for instance, the actor in one language manifests itself grammatically and functionally as a verb in another language. It may well be that speakers of all languages construe complex meaning in terms of noun-like concepts interacting with verb-like concepts (and I personally believe this to be so). But this does not entail the assumption that the individual element of an utterance that is conceptualized as, say, a verb by speakers of one language will also be conceptualized with verbal function (as opposed to grammatical form) by speakers of another language. It is perfectly feasible, for instance, to imagine a conceptual universe where events are not construed in terms of an actor, a verb and possibly a patient (“Harry kicked the ball”), but in terms of a resulting state caused by a moving force (“the state of the ball having been kicked was caused by Harry”). Far-fetched? “The switched-offness of the device is caused by the actuatedness of the on/off switch.”
Although what is conceptualized as being essentially verbal in one language may well usually correlate closely with what is conceptualized as essentially verbal in another language, there is no reason why this should always be so.
The bulk of this article has been devoted to presenting two different approaches to resolving categorial discrepancies between (grammatical) verbs and nouns in German and English. One approach suggested that there is a fundamental discrepancy in conceptualization between “Optimierung” as an “achievable end state” (and essentially nominal in nature) and “optimize” as a process with no clearly defined end point (and essentially verbal in nature). The other approach suggested that the discrepancy is merely grammatical and that, on a functional level, “Optimierung” retains the functional qualities of a verb.
The first of these approaches effectively precludes the existence of a universal functional typology that includes or relies on a functional distinction between noun concepts and verb concepts and that expects categorial correlation across languages. The second admits the potential of such universal functional categories, but falls foul of the probability that I am basing functional categories on the way in which I, as an English speaker, happen to conceptualize the world around me and that I am shoehorning discrepant conceptualizations into my own schema with a few elegant tricks.
For machine translation, the upshot is that, even at an extremely abstracted functional level, I cannot rely on the assumption that speakers of two different languages will construe the same perceived event as “A is doing something to B”.
While it may still be possible to make claims such as “all languages have mechanisms to express actions and state transitions” or even “all languages have mechanisms to express copular import”, there remains no reliable way of mapping the lexical and grammatical elements of language A and language B to a single purely functional schema that will necessarily be identical for both languages for the same perceived event. Any universals such as “all languages have noun-like thingies” may possibly apply, but the actual instantiation of the universals will vary from language to language. What is construed as essentially nominal in one language may well be construed as essentially verbal in another.
Like the Holy Grail, language universals, at least with any practical purpose for computational linguistics, may well not exist. If they do, we may be hunting for entirely the wrong thing.
It never ceases to amaze me, when I sit down of an evening to watch German TV for an hour or so, that I hear and understand this stream of sounds without any resort to my own language whatsoever. I am not aware of the many disjuncts between the two languages because I am operating on one language alone. How do I do it? Actually, I have no idea. But I remain convinced that the mechanisms involved can be codified and that one day we shall be able to shed light on the inner workings of the mind of one of the most peculiar animals in the world’s bestiary, the translator.