I heard about the Travelling Salesman Problem a while ago, but hadn’t looked into it till now. I wanted to try solving it with a genetic algorithm, to see if I could.
Basically, here’s the problem:
You are given a list of cities and the distance between them.
The salesman has to visit each city exactly once and return to the original city.
Find the shortest distance required.
So, I looked for data sets online for the TSP and found this site. I chose Djibouti as it only had 38 cities, which I thought was more than enough to start with. With the data set provided, as well as the optimum solution, I went to work…
Shown above is the graphed version; the data points themselves are given here.
Unfortunately, I found the data set hard to use in its raw form, so I did this to convert it into a form that was usable:
I converted the string of numbers into a list of entries, each of which was then split into node number, x-coordinate, and y-coordinate. The node number was unnecessary, so it was removed, and the remaining strings were converted into floats.
The floats were then manipulated so that they could be used as coordinates by pygame. Subtracting the same number from every coordinate shifted the points while leaving the distances between them unchanged.
I further manipulated the points by dividing all co-ordinates by 3, so that they could fit on the screen.
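Roughly, the conversion looked something like this. This is only a sketch; the function name, the offset values, and the exact file handling are mine, not the original code’s:

```python
# A sketch of the conversion (offsets here are stand-ins, not the exact
# numbers used): parse "node x y" lines, drop the node number, convert
# to floats, shift every point by the same amount, then scale down by 3.
def parse_cities(lines, x_offset=11000.0, y_offset=41900.0):
    cities = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that isn't a data row
        _, x, y = parts                      # node number is unnecessary
        x, y = float(x), float(y)            # strings -> floats
        # subtracting a constant leaves the distances unchanged;
        # dividing by 3 shrinks the layout to fit the pygame window
        cities.append(((x - x_offset) / 3, (y - y_offset) / 3))
    return cities
```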
Once I rendered the nodes in Pygame, however, they didn’t match up with the layout in the image. I realised the difference was that their render used coordinates starting at the bottom left, whereas pygame’s origin is the top left, so mine came out flipped. Since that doesn’t change any of the distances, I just kept it that way.
I created each route in the population as a list that picked every city exactly once, in a random order, including the starting city.
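As a sketch (the function name is mine), each route could be built with random.sample, which draws every city exactly once in a random order:

```python
import random

# A sketch of route initialisation: a random permutation of all the cities,
# including the starting city, so each city appears exactly once.
def random_route(cities):
    return random.sample(cities, len(cities))
```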
The total distance of each route was then calculated using the distance formula (which I decided to implement myself this time, instead of importing numpy):
Since self.route didn’t repeat the original city at the end, it also had to be added back when working out the distance of the full loop.
The fitness function took a little tinkering, but by taking the reciprocal of the distance, scaling it up to a number greater than 1, and then raising it to the power of 15, I made it so that the lower the distance, the greater the fitness.
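Put together, the distance and fitness calculations might look roughly like this, written here as standalone functions rather than class methods, and with a placeholder scaling constant rather than whatever value the original code used:

```python
# A sketch of the distance and fitness calculation.
def route_distance(route):
    loop = route + [route[0]]  # add the starting city back on the end
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(loop, loop[1:]))

def route_fitness(route):
    # reciprocal of the distance, scaled past 1, then raised to the 15th
    # power so that shorter routes get disproportionately higher fitness
    return (10000.0 / route_distance(route)) ** 15
```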
Now here was the really hard part. The customary crossover function wouldn’t have worked, since each city can only be visited once: if the parents were split at a midpoint and recombined, the child would likely end up with some cities appearing more than once in its genes.
Hence, I thought of keeping a “used choices” list, storing the choices already made. Instead of picking a midpoint, each index would be chosen randomly from the two parents, unless the candidate city was already on the list of used choices.
I thought this was all good and worked until I ran into a problem:
Parent 1: A B C
Parent 2: B C A
Possible child: A C (unable to continue)
Using this algorithm, it would’ve chosen randomly at the first index, between A and B. If it chose A, then A would be placed on the used-choices list. The next index would also be randomly chosen, but because only A was on the list, it would still pick from both options. Thus, if it chose C, the final choice couldn’t be made, as both C and A were already on the used-choices list.
What to do then? I thought of keeping two lists instead: one of the choices already made, and one of the choices that had previously gone unchosen. If one of the two candidates at an index had previously gone unchosen, that one would be forced as the pick. Only if neither had would it pick randomly, checking against the list of choices already made.
1st choice: A (randomly chosen)
List of unchosen: B
2nd choice: B (forced)
List of unchosen: B, C
3rd choice: C (forced)
List of unchosen: B, C, A
Maybe that was obvious, or maybe someone’s already done it that way, but I’m still proud of it and proud that I figured it out.
And yet, it still didn’t always work, which I realised was a fault of this algorithm again, and one that couldn’t be fixed without going completely back to basics. So I remedied the few remaining duplicates by setting up a deadlock list that collects those stuck indices and then, at the end, randomly assigns them the cities that were never chosen. I realise this isn’t a true crossover between the two parents, but since the duplicates are quite low in number, it shouldn’t make that much of a difference.
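Putting the whole scheme together, the crossover might look roughly like this. This is a sketch based on my reading of the description above, not the original code, and the names are mine:

```python
import random

# A sketch of the final crossover: force a previously-unchosen city where
# possible, skip cities already used, and park any deadlocked index to be
# filled in with the leftover cities at the end.
def crossover(parent1, parent2):
    child = [None] * len(parent1)
    chosen, unchosen, deadlocked = [], [], []
    for i, (a, b) in enumerate(zip(parent1, parent2)):
        options = [c for c in (a, b) if c not in chosen]
        if not options:
            deadlocked.append(i)  # both candidates already used elsewhere
            continue
        forced = [c for c in options if c in unchosen]
        pick = random.choice(forced or options)
        chosen.append(pick)
        for c in (a, b):
            if c != pick and c not in unchosen:
                unchosen.append(c)  # remember the city that lost out
        child[i] = pick
    # deadlocked slots get a random assignment of the cities never chosen
    leftovers = [c for c in parent1 if c not in chosen]
    random.shuffle(leftovers)
    for i, c in zip(deadlocked, leftovers):
        child[i] = c
    return child
```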
Again, the generic mutation function wouldn’t work if I just copied it: randomly mutating one city into another would create duplicates. This was easily solved by simply swapping the two cities instead. The mutation rate had to be lowered, though, since each mutation now results in two changes.
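A minimal sketch of that swap mutation (the rate here is a placeholder, not the value actually used):

```python
import random

# Swap mutation: instead of overwriting a city, which would create a
# duplicate, two positions in the route trade places, keeping it valid.
def mutate(route, rate=0.01):
    for i in range(len(route)):
        if random.random() < rate:
            j = random.randrange(len(route))
            route[i], route[j] = route[j], route[i]
    return route
```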
Other than the above functions, the rest of the genetic algorithm was fairly simple to implement. One thing of note was that the lines were drawn using a list comprehension:
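Something along these lines, assuming `screen` is the pygame display surface and the route is a list of (x, y) coordinates (the names are mine):

```python
import pygame

# A sketch of the drawing step: one list comprehension draws every leg of
# the loop, including the return trip to the starting city.
def draw_route(screen, route):
    loop = route + [route[0]]
    [pygame.draw.line(screen, (255, 255, 255), loop[i], loop[i + 1])
     for i in range(len(loop) - 1)]
```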
When I ran it the first time, I thought it worked and I was so happy. But then it kept slowing down, until around the seventh generation it ground to a halt. It took me longer than I’d like to admit to find the reason. I tried optimising little things, moving stuff around, using xrange, but nothing helped. I finally realised the fitness function was at fault. The lower the distance, the greater the fitness value, and it worked too well: the fitness values eventually went through the roof, which blew up the length of the mating list until it was far too long. I simply adjusted the fitness function like so, and it worked:
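I can only guess at the exact adjustment here, but one way to stop the blow-up, reusing route_distance from the earlier sketch, is to normalise each distance against the generation’s best before exponentiating, so the mating list stays a sensible length:

```python
# A guess at the kind of adjustment (not necessarily the original fix):
# the ratio is at most 1 for the best route, so the 15th power is bounded.
def route_fitness(route, best_distance):
    return (best_distance / route_distance(route)) ** 15

def build_mating_pool(population):
    best = min(route_distance(r) for r in population)
    pool = []
    for r in population:
        # each route is added in proportion to its (now bounded) fitness
        pool.extend([r] * int(route_fitness(r, best) * 100))
    return pool
```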
The program still doesn’t reach the optimum distance of 6600, but it gets to around 7100, which is pretty close. Here’s the full code, and I’ll try to see if I can optimise it even further tomorrow.