Project: Creating a Context-based Translator Using Data Mining
In line with parsing languages, I decided that my next project would be a translator. I’ve been learning French, and I’ve noticed that Google Translate really isn’t that good at it. When I do have to translate a sentence, I use Reverso. It lets me search for my sentence and returns pre-translated sentences (from other sources) that best match it. It often matches only a quarter or half of the sentence, or even less, so it’s not very automatic and needs fine-tuning from me with further searches.
I thought that I would solve this problem. The first thing I checked was whether Reverso had an API that I could use. Unfortunately, it did not, so I would have to scrape the results pages instead. I vaguely remembered a Python library called BeautifulSoup that could parse the HTML. This was my first time scraping a page, and I found it interesting that I was parsing HTML to get results from a natural-language search in order to translate text.
I had trouble with the library at first, but eventually (with my Google-fu) I managed to work out how to get the first result from a search.
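To give an idea of what pulling out the first result looks like, here is a minimal sketch using BeautifulSoup on a small hand-written HTML snippet. The markup, the class names (`example`, `src`, `trg`, `text`) and the function name `first_translation` are all made up for illustration; they are not Reverso's real page structure, which you would have to inspect yourself.

```python
from bs4 import BeautifulSoup

# Hypothetical results page: NOT Reverso's real markup, just a stand-in
# with one source/target pair per "example" block.
SAMPLE_HTML = """
<div class="example">
  <div class="src"><span class="text">the <em>dog</em></span></div>
  <div class="trg"><span class="text">le <em>chien</em></span></div>
</div>
<div class="example">
  <div class="src"><span class="text">a <em>dog</em></span></div>
  <div class="trg"><span class="text">un <em>chien</em></span></div>
</div>
"""


def first_translation(html):
    """Return the text of the first translated example, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns the first element matching the CSS selector, or None.
    node = soup.select_one("div.example div.trg span.text")
    if node is None:
        return None
    # Join the text fragments with a space so "le <em>chien</em>" stays two words.
    return node.get_text(" ", strip=True)


print(first_translation(SAMPLE_HTML))
```

On a real page the HTML would come from an HTTP request rather than a string, and the selector would need to match whatever the site actually serves.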
Now that I had that, I just needed to take an input sentence and work this magic on it. My plan was to search for the whole sentence, then the sentence minus a word, and so on, until a search term completely matched an existing translated text. Keeping each matched piece as long as possible should make the translation as accurate as possible.
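One way to read that plan is as a greedy longest-match loop: try the longest remaining chunk, drop a word from the end until something matches, then carry on from where the match stopped. The sketch below is my own simplified version of that idea, not the final code. The names `translate_greedy` and `lookup` are hypothetical, and `TOY_MEMORY` is a tiny stand-in dictionary that replaces the real web search so the example runs offline.

```python
from typing import Callable, List, Optional

# Stand-in for the search step: maps a phrase to an existing translation.
# In the real project this would be a web lookup; this dictionary is an assumption.
TOY_MEMORY = {
    "the dog": "le chien",
    "plays with the ball": "se joue de la balle",
}


def translate_greedy(sentence: str, lookup: Callable[[str], Optional[str]]) -> str:
    """Translate by repeatedly matching the longest phrase that has a known translation."""
    words = sentence.split()
    pieces: List[str] = []
    start = 0
    while start < len(words):
        # Try the longest remaining chunk first, then drop one word at a time.
        for end in range(len(words), start, -1):
            found = lookup(" ".join(words[start:end]))
            if found is not None:
                pieces.append(found)
                start = end  # continue after the matched chunk
                break
        else:
            # Not even a single word matched: keep it untranslated and move on.
            pieces.append(words[start])
            start += 1
    return " ".join(pieces)


print(translate_greedy("the dog plays with the ball", TOY_MEMORY.get))
```

Passing the lookup in as a function keeps the matching logic separate from the scraping, so the loop can be tried out with a plain dictionary before pointing it at a real search.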
It took a while, but here is the test output for “the dog plays with the ball”:
le chien
le chien se joue de la balle
Surprisingly, the conjugations come out fine most of the time. The translator is pretty good, but sometimes the translated text needs a bit of cleaning up. One problem is input that is part of an idiom, which messes up the translation. After cleaning up the code and adding comments and an interface (so you can choose which language to translate into), here is the final code: