I started researching chatbots and how they work. It was actually slightly disappointing because, as far as I can tell, chatbots (at least the basic ones) basically use statistics and algorithms to decide how to respond.
Anyway, I still decided to play around with this module. It’s pretty cool because you can train it as you speak to it. However… when I tried that feature, it was very VERY bad. It repeated itself and me, and it made no sense at all. So I decided to give it a little bit of pre-training.
Next, I went looking for corpora of English dialogue and, after a few unusable databases, eventually found this. It’s an ESOL site and it was just PERFECT. It gave the dialogue in an easy A: B: format and I figured I could use BeautifulSoup to parse it pretty easily.
That was when I ran into a problem, because I simply couldn’t get BeautifulSoup to parse it. Instead, I had to use regular expressions, which I learnt way, way back and had to look up again. After a nice stroll down memory lane, I came up with these expressions:
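To give a rough idea of the approach, here is a minimal sketch of how A: B: dialogue can be pulled out of a page with regular expressions. These are not my exact expressions: the pattern names, the `extract_turns` helper, and the sample markup are all hypothetical, and I'm assuming every turn sits on its own line and starts with `A:` or `B:`.

```python
import re

# Hypothetical pattern: strip any HTML tags, turning each into a line break.
TAG_RE = re.compile(r"<[^>]+>")

# Hypothetical pattern: a speaker label (A or B), a colon, then the utterance.
TURN_RE = re.compile(r"^\s*([AB]):\s*(.+?)\s*$", re.MULTILINE)


def extract_turns(raw_html):
    """Return a list of (speaker, utterance) tuples found in the markup."""
    text = TAG_RE.sub("\n", raw_html)
    return TURN_RE.findall(text)


# Made-up sample markup, only for illustration.
sample = "<p>A: Hi, how are you?<br>B: Fine, thanks. And you?<br>A: Not bad.</p>"
turns = extract_turns(sample)
for speaker, utterance in turns:
    print(speaker, "->", utterance)
```

Replacing tags with newlines first means the `^` and `$` anchors can do the work of splitting turns, which keeps the second pattern short.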
The rest of the code was pretty easy actually; it just took a bit of time. The amount of dialogue is absolutely massive and I had a go at processing all of it today (I got up to about line 6000 of dialogue), but it took too long and I gave up. I’ll do it tomorrow, but here’s the functioning code:
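Since the full run was so slow, one option is to turn the dialogue into (prompt, reply) pairs and feed them through in small chunks rather than all at once. This is only a sketch under my own assumptions: `to_pairs`, `batched`, and the sample lines are hypothetical, and the actual training call is left out because it depends on the module.

```python
from itertools import islice


def to_pairs(utterances):
    """Yield (prompt, reply) pairs from consecutive lines of dialogue."""
    previous = None
    for line in utterances:
        if previous is not None:
            yield previous, line
        previous = line


def batched(iterable, size):
    """Yield lists of at most `size` items, so work can be done chunk by chunk."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Made-up sample dialogue, only for illustration.
dialogue = ["Hi, how are you?", "Fine, thanks. And you?", "Not bad.", "Good to hear."]
pairs = list(to_pairs(dialogue))
batches = list(batched(pairs, 2))

for number, chunk in enumerate(batches, start=1):
    # A real run would pass each chunk to the chatbot's training routine here.
    print("batch", number, "has", len(chunk), "pairs")
```

Both helpers are generators, so nothing forces the whole corpus into memory before the first chunk is ready.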