Ionic Column -- September 1997
Englishman David Parry lived in Tokyo from 1980 to 1994 and was a TPC member from 1986. A frequent contributor to the AJ, he was Newsletter Publisher from late 1988 to early 1990 and began the Ionic Column in 1992. This column has won a prize and an honorable mention in newsletter awards. To the Tokyo BBS community, he now lives in virtual cyberspace and teleports textually over the ether. On the physical level, he currently lives and works in Düsseldorf, that part of Germany that most resembles Japan.
This month I will say more about machine translation and translation aids, if only because I do not have enough material yet for a raunchy cybersex column. But it is probably only a matter of time, what with e-mail inviting me to make a free visit to the likes of www.cyberpimp.com. While I await a more scintillating subject, I will regale you with the joys of machine translation.
The Hot 100
A goodly part of August was spent sweltering in the heat of a belated but sultry Rhenish summer, cursing the Windows gremlins that foiled the installation of an in-house "translation aid" program from Alpnet. The company in question has offices around Germany and even one in Tokyo, as I discovered when I followed up the URL for NTT's Townpage telephone directory and asked for details of all the translators in Tokyo. I was able to download the details I wanted a screenful at a time, but only up to the first 100 items out of about 270. Like many of these online directories, the search engine limits you to simple search categories such as area, so I could not ask for companies with names starting with J and going up to S, for example. As always, it proves that it is best to have a business name near the beginning of the alphabet if you wish the world to beat a path to your telephone.
Missing in action
Back to the gremlins. First I installed the Windows 3.1 32-bit extensions provided along with the new program. The program installed itself on my main PC but stubbornly refused to run. I was just a bit perplexed to find that Windows could not find the executable file the moment it had been installed, even though the path proved to be correct when I checked it. One of the more aggravating problems with the Quantum drive on my main system with the SCSI controller has been file errors such as being unable to find a file - more precisely, to recognise a file name. I then installed the same software on the second PC, the one that recovered from a near-death experience a few weeks ago. The program loaded. Oh joy. I then loaded a text file. So far, so good. I then started the procedure to process the file, and got a fatal error immediately. I then reinstalled everything a couple of times after scooping out the old version with Uninstaller, and got exactly the same error. The techies at Alpnet were helpful, but clearly puzzled.
All that Jaz
In a way, this episode is the straw that broke the camel's back. I think that the best and cheapest thing for me to do is to remove and sell the 3.2 GB Tempest drive, and get a Jaz drive to use for additional high-speed storage on a semi-permanent basis. I could easily run all my programs off the 1 GB of my old IDE drive, even if I split it down the middle and install Windows 95 in one half. Taking matters a little further, I am wondering if the best way to run Windows 95 would be to run it off a Jaz drive. Just a few points: my copy of Windows 95 is still gathering dust, and I will only install it once I have System Commander up and running. And if I have some time...my favorite mantra for the past few months. Last but not least, would I need to make the Jaz drive bootable to start Windows 95 from it? I have a feeling that it would not be easy.
Giving SCSI the boot
Making the drive bootable means that the SCSI BIOS is still active. I wonder if my disk problems would go away on an IDE system in which the SCSI controller has the BIOS disabled so that it cannot interfere with the PC BIOS. I just do not know what else can account for the problems on a PC that worked just fine with two IDE hard disks and the OEM version of an NCR SCSI card with no BIOS that drives my scanner. I may yet be able to report back on what this "translation aid" program does. But from what I heard, it is a kind of vocabulary list program that does a smart version of search and replace from a glossary file within a pre-prepared text file.
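As a rough illustration of that idea, and emphatically not of Alpnet's actual program (which I have not seen running), a glossary-driven search and replace might look like the Python sketch below. The function name and the glossary entries are invented for the example; the only "smart" parts assumed here are matching whole words and trying longer terms first.

```python
import re

def apply_glossary(text, glossary):
    """Replace whole-word glossary terms in text (hypothetical helper).

    Longer terms are tried first so that a multi-word entry wins over
    a shorter entry contained in it.
    """
    if not glossary:
        return text
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b"
    )
    return pattern.sub(lambda match: glossary[match.group(1)], text)

# Invented sample glossary; a real one would be loaded from a file.
glossary = {"Schraube": "bolt", "Mutter": "nut"}
print(apply_glossary("Schraube und Mutter anziehen.", glossary))
```

Note that the plural "Schrauben" is left alone by this sketch, which hints at why a real tool needs to know something about word forms as well.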
Looking at translation programs, I have a little knowledge of three that relate to European languages. Power Translator from Globalink was advertised in the Microwarehouse catalog and was a mere $119 for the basic program. I bought it, and found that it was useless for commercial work without the additional dictionaries, which is in fact true for any such program. The program is available for the four main European languages, to and from English.
Adding new entries proved to be quite a chore, as you have to tell the system all about the new word you wish to include, and cross-check it with the translation in the reverse direction, since many words can be translated in different ways. To give an example: in German there are two words for "bolt", one of which (Schraube) is often used for "screw" as well. In many cases I cannot tell which is meant except by reading through the rest of the text and looking at the illustrations, if any. If you translate a text with the word "Schraube" and back-translate it to check the accuracy, it may emerge as either "bolt" or "screw." How is the program to decide?
Programs such as Power Translator take a lot of getting used to. You need to isolate words in the text file that are not to be translated by encapsulating them with double angle brackets or the like, which is a slow and tedious process. If you do not do this, the computer will manfully produce linguistic howlers such as "Bowel Town" for Darmstadt (near Frankfurt) and "White House" for Casablanca. The text has to be reduced to ASCII, so all formatting is lost, and the program is also unable to make sense of headers, since they have no punctuation. Pre-editing a file to do all this was very time-consuming.
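Some of that pre-editing could in principle be scripted. The Python sketch below wraps a list of names in double angle brackets before translation and strips the brackets afterwards. The `<<...>>` form is only my assumption of what "double angle brackets or the like" looks like, and the function names and the list of names are made up for the example.

```python
import re

def protect(text, names):
    """Wrap each listed name in << >> so that it is left untranslated."""
    if not names:
        return text
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in ordered) + r")\b"
    )
    return pattern.sub(r"<<\1>>", text)

def unprotect(text):
    """Remove the << >> markers again after translation."""
    return re.sub(r"<<(.*?)>>", r"\1", text)

# Hypothetical list of names that must not be translated.
names = ["Darmstadt", "Casablanca"]
marked = protect("Von Darmstadt nach Casablanca", names)
print(marked)
print(unprotect(marked))
```

The list of names still has to be compiled by hand, of course, so this only takes the tedium out of the typing, not out of the thinking.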
I tested the program over a year ago when I only had 8 MB RAM on my PC, running Windows 3.1 then as now. Power Translator quickly ran out of memory, so I had to cut up text files into digestible portions of about ten pages of ASCII text. It was not all that quick on my Pentium/100, taking about five minutes. The text was distinctly rough, but the program can make sense of a fairly straightforward document such as a letter or a newspaper article. However, even with the supplementary technical dictionaries, the program is too limited to be of any use for my purposes. A DOS version of the program was available, but has probably been discontinued by now.
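Cutting a file into portions by hand is another chore that lends itself to a small script. The Python sketch below splits plain text at blank lines between paragraphs so that no portion exceeds a given size unless a single paragraph is itself longer. The limit of 30,000 characters is my own rough stand-in for "about ten pages" and is not a figure from Globalink; the function name is invented.

```python
def split_into_portions(text, max_chars=30000):
    """Split text at paragraph breaks into portions of at most max_chars.

    A single paragraph longer than max_chars is kept whole in its own
    portion rather than being cut in mid-sentence.
    """
    portions, current, size = [], [], 0
    for paragraph in text.split("\n\n"):
        added = len(paragraph) + 2  # allow for the blank line separator
        if current and size + added > max_chars:
            portions.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += added
    if current:
        portions.append("\n\n".join(current))
    return portions

# Five short paragraphs and a tiny limit, purely to show the behaviour.
sample = "\n\n".join(["a" * 10] * 5)
for portion in split_into_portions(sample, max_chars=25):
    print(len(portion))
```

Joining the portions again with blank lines gives back the original text, so the translated pieces can be reassembled in the same way.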
A bigger program that I have heard of several times is Trados, which is a good deal more expensive. It weighs in at several thousand dollars by the time you have bought the basic modules and enough technical dictionaries to make it usable for serious work. I picked up a brochure for Trados at CeBIT last year, but cannot find it now to give their address. I do not know what file formats are supported or which operating systems, but I would guess that it runs under Windows 3.1 / 95 and perhaps NT, and can handle a number of Windows-based word processing formats plus ASCII. I have not seen the program in action, but it is perhaps the best-known one in the translation business over here. That is not to say that it is widely used: I do not know of any person or agency regularly doing the bulk of their work on any machine translation system.
The third program is Logica, from Logica Systems. It is at the top end of this market and seemed to me to be the only one that could be described as "industrial strength." I went to a seminar held by Logica about 18 months ago and got an overview of the capabilities and limitations of this system. By and large, these apply to any machine translation (MT) system, and Logica was at least prepared to confront them honestly.
The Logica program is a client/server system that runs on a Unix workstation, although it is possible to use a PC client running Windows to feed in the text. This alone reduces the potential clientele, since you need to run one of the main flavors of Unix such as SCO, AT&T / Bell Labs or Sun Solaris. Not being a Unix nerd, I am a little hazy as to who sells the "original" Unix that came from Bell Labs, so I apologize to all and sundry. I did ask if the Logica program runs under Linux, and predictably it does not. This means that it would be a good deal more expensive and complex to set up than a PC with Windows. I do not recall a statement from Logica to the effect that a Windows NT version was planned, so it may be Unix only in future as well.
Logica are aware of this cost problem, and were thinking of offering an online service to translators, whereby they could upload their text to Logica together with a glossary, Logica would run the text through the machine, and send it back to the user. Together with a bill, of course, which would be not unreasonable providing you got passable results first time. At least you would not have the capital investment of the hardware and software. Since I went to the meeting as the guest of another company, I have not been following this closely and cannot say what has come of this project.
The program had one huge merit for today's translation market: it could accept formatted text from a variety of programs such as Word for Windows and DTP programs. This implies that it recognises headers and other bits of detached text. It also means that you do not have to copy the translated text in the form of ASCII back onto the original formatted file to produce a translation with the original formatting, which is what I do now when I dictate a translation.
Concerning which, I find that I have to do this process with ASCII text, otherwise the attributes of the text that I am pasting in will override the attributes of the text already there. In general, I would prefer to have precisely the opposite, but this seems to be a feature of WinWord and many other programs. The only way to avoid this is to use ASCII text for the copying operation.
One of the most interesting points of the Logica seminar was the lecture by a user, the Dutch company Oce, which produces office equipment such as copiers. They had used a number of translation agencies to produce their instruction manuals in a number of European languages, and had constant problems with consistency. This was partly due to switching agencies and translators - even though in some cases they might get the same translator even if they switched agency, and they would often ask to get the same translator again. But it was also due to two other problems: creating and using a standard glossary for each language, and creating a consistent and high-quality original text. Most texts I get have inconsistencies of various kinds, which I can usually correct for. A machine cannot.
Oce found that the key to good results with MT was to produce good original text. This took a great deal of work, but it forced them to really look hard at what they were writing, and to make it clear and lucid. One problem was to resolve ambiguities in compound descriptive terms such as for various kinds of levers (I forget their precise example) and to refer to the same item with the same word(s) each time. This alone greatly improves the accuracy of the translation.
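Checking that the same item is always called by the same name is something a machine can at least help with. The Python sketch below counts how often each variant of a term occurs in a source text, so that a writer can see at a glance when two names are in use for one part. The lever terms are invented, since I cannot recall Oce's own example, and this is in no way a description of the tools Oce actually used.

```python
import re

def variant_counts(text, variants):
    """Count whole-phrase occurrences of each variant, ignoring case.

    Only variants that actually occur are returned; more than one key
    in the result means the text is using competing names.
    """
    lowered = text.lower()
    counts = {}
    for variant in variants:
        pattern = r"\b" + re.escape(variant.lower()) + r"\b"
        found = len(re.findall(pattern, lowered))
        if found:
            counts[variant] = found
    return counts

# Hypothetical manual text and hypothetical names for the same part.
manual = ("Pull the release lever. Then push the release handle back. "
          "The release lever locks.")
used = variant_counts(manual, ["release lever", "release handle", "unlocking lever"])
print(used)
if len(used) > 1:
    print("Inconsistent terminology: pick one term and stick to it.")
```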
As I said, Oce is a Dutch company, based close to the border with Germany, and it exports its products throughout the world. So, in which language do they write the documentation? Did you say "Dutch"? Try again. The original is produced in English and translated into Dutch for the local market.
How successful has MT been at Oce? The speaker said that the system now worked very well and they got a productivity increase of around 50%, allowing for all the pre- and post-editing. But they also got far better quality in terms of consistency, in all the languages. The key consideration with them is that they can translate the same text into several different languages and be sure of virtually the same quality each time. This is the real benefit of MT, they implied.
It also implies that a simple one-off translation of a text into one foreign language requires too much additional work to be really justified. And I have been talking about technical translation here. In general, MT is never going to be much good on "unfiltered" language because it cannot recognise the context. This is the main source of the oft-quoted howlers such as "the vodka is strong but the meat is rotten" (the famous rendition of "the spirit is willing but the flesh is weak" into Russian by an MT system some years ago).
I do not see that MT will put me out of business, but I do foresee that "translation aid" programs or intelligent glossaries will speed up the basic chore of searching for equivalent words or expressions. I will have more on this topic next month, when I will look at dictionaries on CD-ROM.
Comments or feedback or more information?
Contact me on CompuServe on 100575,2573
© Algorithmica Japonica Copyright Notice: Copyright of material rests with the individual author. Articles may be reprinted by other user groups if the author and original publication are credited. Any other reproduction or use of material herein is prohibited without prior written permission from TPC. The mention of names of products without indication of Trademark or Registered Trademark status in no way implies that these products are not so protected by law.
The Newsletter of the Tokyo PC Users Group
Tokyo PC Users Group, Post Office Box 103, Shibuya-Ku, Tokyo 150-8691, JAPAN