
7/06/2010

CAT and Statistical Methods are again proving worth

Many years have passed since we first started hearing about this technological marvel called Computer-Aided Translation (CAT) (although in some countries it is still not widely recognized), and the best is yet to come. We have seen many companies come and go, and others come, grow, and stay, and translation memories (TMs) have stopped being a luxury, or just an aid, and have become a genuine asset. On the foggier side of the mirror, every now and then we see automatic translation (AT), which boomed some years ago with promises of Star Trek-like translations and then fell into oblivion for some parts of our ever-growing society of producers and consumers. These two sides of the mirror are frequently confused or misunderstood in many countries, which in most cases leads to outright rejection of CAT.

I have been an IT consultant in the language industry (although some years ago that seemed more like a hobby than a job in Latin America), and most of the time I have heard things like: "CAT? Sorry, computers can't translate"; "Translation memory? A database? That just makes translation work even more complicated"; or my favorite, "Oh yes, yes, I know about CAT, I use Word and Babylon; going beyond that is useless and expensive." Nothing could be further from the truth.

CAT has brought a brand-new way of seeing translation and an amazing tool for recycling and reusing translation work. Still, as with all technological tools, it is a double-edged sword: if you reproduce trash, you will only get trash, and if your TM management, usage, and maintenance are poor, you are just making the monster of chaos bigger. In the same way, AT is not the ultimate tool; it cannot replace a human being, and in most cases it will not be able to for a long time yet. However, it is a great tool for dealing with some kinds of text, especially short, simple sentences, lists, and tables.
These two technologies have finally met at the crossroads and will take the next step together. Recently, many companies have added the option of applying a TM first and then an AT engine whenever the TM returns no relevant results, thus leveraging translation work and reducing effort. Even so, there is still a lot of resistance in many countries to combining these two forces, which together could make a huge difference.
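The TM-first, MT-fallback workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CAT tool works: it assumes the TM is a plain dictionary of source/target segments, uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in for a real fuzzy-match score, picks an arbitrary 75% threshold, and calls a hypothetical `machine_translate` function in place of a real MT engine.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple


def best_tm_match(source: str, tm: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Return the stored translation whose source segment is most similar to `source`,
    together with a similarity score between 0.0 and 1.0."""
    best_target: Optional[str] = None
    best_score = 0.0
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_target, best_score = tm_target, score
    return best_target, best_score


def translate_segment(
    source: str,
    tm: Dict[str, str],
    machine_translate: Callable[[str], str],  # hypothetical MT engine hook
    threshold: float = 0.75,                  # assumed fuzzy-match cut-off
) -> Tuple[str, str]:
    """Use the TM when a good enough (fuzzy) match exists; otherwise fall back to MT.

    Returns (translation, origin), where origin looks like "TM 98%" or "MT"."""
    target, score = best_tm_match(source, tm)
    if target is not None and score >= threshold:
        return target, f"TM {score:.0%}"
    return machine_translate(source), "MT"


if __name__ == "__main__":
    tm = {
        "Click the Save button.": "Haga clic en el botón Guardar.",
        "Close the window.": "Cierre la ventana.",
    }

    def fake_mt(text: str) -> str:
        # Placeholder standing in for a real MT engine.
        return f"[MT] {text}"

    for sentence in ["Click the Save button", "Restart the computer now."]:
        print(sentence, "->", translate_segment(sentence, tm, fake_mt))
```

Commercial tools compute their fuzzy-match percentages in their own ways, so the threshold here is only a tuning knob, not a standard value.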

So the verdict seems easy: a few select gurus have grasped the full potential of these technologies and are taking advantage of them, while most other translators are still fighting with Windows Me or Vista (both equally bad, in my opinion), trying to understand the new layout of the awkward Office ribbon that replaced the traditional menus, hoping to migrate to Mac in search of a more stable land, learning to blog, tweet, use the ProZ website, subscribe to forums, synchronize calendars, and a long, long list of etceteras.

Amid this craziness, another technological breakthrough has caught us off guard: machine translation based on statistical analysis seems to have finally evolved enough to become a useful tool. But what is the difference between old-fashioned AT and statistical translation (ST)? Mainly that, instead of only following predefined rules to translate, ST compares thousands of existing translations between two languages and makes a "best guess". And this is where CAT and AT meet, because there is no better source material for that comparison than a well-segmented, well-populated translation memory.
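To make the "best guess" idea concrete, here is a toy sketch of IBM Model 1, one of the classic word-level statistical translation models, trained with expectation-maximization (EM) on a handful of aligned sentence pairs like those a TM holds. It is an illustration only, under simplifying assumptions: whitespace tokenization, no NULL word, and none of the phrase tables or target-language models that real statistical systems use. The function names `train_ibm_model1` and `best_guess` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]
Table = Dict[Tuple[str, str], float]  # (target_word, source_word) -> t(target | source)


def train_ibm_model1(pairs: List[Pair], iterations: int = 10) -> Table:
    """Estimate t(target_word | source_word) from aligned sentence pairs with EM."""
    corpus = [(src.lower().split(), tgt.lower().split()) for src, tgt in pairs]
    target_vocab = {word for _, tgt in corpus for word in tgt}
    # Uniform start: every target word is equally likely for every source word.
    t: Table = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count: Table = defaultdict(float)            # expected c(target, source)
        total: Dict[str, float] = defaultdict(float)  # expected c(source)
        for src, tgt in corpus:
            for tgt_word in tgt:
                # E-step: share this target word's credit among the source words
                # of the same sentence, in proportion to the current t(target | source).
                norm = sum(t[(tgt_word, src_word)] for src_word in src)
                for src_word in src:
                    delta = t[(tgt_word, src_word)] / norm
                    count[(tgt_word, src_word)] += delta
                    total[src_word] += delta
        # M-step: turn expected counts back into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_guess(source_word: str, t: Table) -> Optional[str]:
    """Return the most probable target word for `source_word`, or None if unseen."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    # Toy parallel corpus (English -> German), the kind of pairs a TM stores.
    pairs = [("the house", "das haus"), ("the book", "das buch"), ("a book", "ein buch")]
    t = train_ibm_model1(pairs)
    for word in ["the", "house", "book", "a"]:
        guess = best_guess(word, t)
        print(word, "->", guess, round(t[(guess, word)], 3))
```

No rule ever says that "house" means "haus"; the words shared across sentences ("the"/"das", "book"/"buch") let the model explain away the other pairings, so the probabilities drift toward the right guesses after a few iterations.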

The ongoing project in this area (called SMART) will no doubt have an impact on CAT and the language industry, but for obvious reasons (and some fear), it will reach the social stratum before the commercial one. Still, convergence between the two strata is not hard to find. For example, a well-organized translation department in a company that has been producing high-quality TMs could use them to feed its own internal ST engine. Through an end-user interface, that engine could in turn deliver quick "high-quality translations" to users who just need the gist of a text to move on with their work, instead of having to wait until a translation request is sent, approved, done, and returned.
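Feeding a TM into an ST engine usually starts by exporting it as aligned sentence pairs. TMX (Translation Memory eXchange) is the common XML interchange format for TMs, so a minimal sketch of that first step could look like the following. It assumes a TMX document whose `<tuv>` elements carry `xml:lang` (as in TMX 1.4) or, in older files, a plain `lang` attribute; it matches languages by their primary subtag (`en-US` counts as `en`); and, for simplicity, it drops the content of inline code elements such as `<bpt>`/`<ept>` while keeping the surrounding text. The function names and the sample TM are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg: ET.Element) -> str:
    """Plain text of a <seg>, skipping the content of inline elements (<bpt>, <ept>, <ph>, ...)."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")  # text that follows each inline element
    return " ".join("".join(parts).split())  # normalize whitespace


def primary(lang: str) -> str:
    """Primary language subtag, e.g. "en-US" -> "en"."""
    return lang.lower().split("-")[0]


def tmx_to_pairs(tmx_text: str, src_lang: str, tgt_lang: str) -> List[Tuple[str, str]]:
    """Extract (source, target) segment pairs from a TMX document."""
    root = ET.fromstring(tmx_text)
    src, tgt = primary(src_lang), primary(tgt_lang)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                segments[primary(lang)] = seg_text(seg)
        # Keep only translation units that have non-empty text in both languages.
        if segments.get(src) and segments.get(tgt):
            pairs.append((segments[src], segments[tgt]))
    return pairs


SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US" datatype="plaintext" segtype="sentence"
          adminlang="en-US" o-tmf="none" creationtool="sketch" creationtoolversion="0.1"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Click the <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept> button.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Haga clic en el botón <bpt i="1">&lt;b&gt;</bpt>Guardar<ept i="1">&lt;/b&gt;</ept>.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Cierre la ventana.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for source, target in tmx_to_pairs(SAMPLE_TMX, "en", "es"):
        print(source, "=>", target)
```

Each resulting `(source, target)` pair is the kind of input a statistical engine learns from; a real pipeline would also clean, deduplicate, and tokenize the segments before training.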

On the other hand, these systems are also used to retrieve information very effectively, so another interface could be built into the CAT software already in use to run a very fast search on each sentence and deliver pertinent results without the translator ever leaving the translation interface (if software developers do this, I hope they give me the credit and maybe another job, he he he). This interface could work simply, either by retrieving statistically relevant information in the same language, or by cross-referencing the original sentence with a TM to obtain relevant matches and then analyzing the equivalent sentence in the target language to retrieve relevant information about it in turn. Thus, through the best-guess method, translators would get relevant sentences that help them put terms in context in a matter of seconds (omg, I don't know if I should be writing or patenting this, hehehe).
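The cross-referencing idea in the previous paragraph can be sketched as a three-step lookup: fuzzy-match the source sentence against the TM, take the stored target-language equivalent, and then rank sentences from a target-language reference corpus by how many words they share with that equivalent. This is only a toy under stated assumptions: the TM is a dictionary, `difflib.get_close_matches` stands in for a proper fuzzy matcher, Jaccard word overlap stands in for real statistical relevance scoring, and `context_for` and the sample corpus are hypothetical.

```python
import difflib
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    # In Python 3, \w matches Unicode letters, so "botón" stays a single token.
    return set(re.findall(r"\w+", text.lower()))


def context_for(
    source: str,
    tm: Dict[str, str],
    target_corpus: List[str],
    top_n: int = 2,
    cutoff: float = 0.6,
) -> List[str]:
    """Return up to `top_n` target-language sentences that may help contextualize `source`."""
    # Step 1: cross-reference the source sentence with the TM (closest fuzzy match).
    matches = difflib.get_close_matches(source, list(tm), n=1, cutoff=cutoff)
    if not matches:
        return []
    # Step 2: take the equivalent segment stored in the target language.
    equivalent = tokens(tm[matches[0]])
    # Step 3: rank reference sentences by word overlap (Jaccard) with that equivalent.
    scored = []
    for sentence in target_corpus:
        words = tokens(sentence)
        union = equivalent | words
        overlap = len(equivalent & words) / len(union) if union else 0.0
        if overlap > 0:
            scored.append((overlap, sentence))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sentence for _, sentence in scored[:top_n]]


if __name__ == "__main__":
    tm = {
        "Click the Save button.": "Haga clic en el botón Guardar.",
        "Close the window.": "Cierre la ventana.",
    }
    corpus = [
        "Pulse el botón Guardar para conservar los cambios.",
        "La ventana se cierra automáticamente.",
        "El precio incluye IVA.",
    ]
    print(context_for("Click the Save button", tm, corpus))
```

Plain word overlap also rewards function words such as "el", which is one reason a real system would weight terms statistically rather than count them equally.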

All this leads to one conclusion. On the end-user side, technological tools are very important, useful, and needed by the translation community. Still, unless we want to go crazy and spend more time configuring solutions than translating, we should always choose the ones that are most relevant to us, that do not cause hassles on our desktop, and that are most cost-effective. On the developer side, we need more resources embedded in a single solution: TM, ST, AT, glossaries, and autotexts (and all the others you can think of). We also have to be aware that this could make these solutions more expensive, but let us be fair: how much have you saved or gained by leveraging your work with CAT or AT? I'm sure the answer would be a lot, and even though I'm a supporter of open and even free software, I have to say that I will put my marbles in the best bag. So be it, and may the best fighter win this battle of CAT solutions. In the end, the only winner will be the end user. For more information, visit the GOV MONITOR or SDL Trados websites.
CAT and Statistical Methods are again proving worth by Rodrigo Vasquez is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
