Author: William Cook
Traveling Salesman Problem Using Genetic Algorithms

Source of information: Source article


I have developed a solution to the Traveling Salesman Problem (TSP) using a Genetic Algorithm (GA) . In the Traveling Salesman Problem, the goal is to find the shortest distance between N different cities. The path that the salesman takes is called a tour .

Testing every possibility for an N city tour would be N! math additions. A 30 city tour would have to measure the total distance of be 2.65 X 10 32 different tours. Assuming a trillion additions per second, this would take 252,333,390,232,297 years. Adding one more city would cause the time to increase by a factor of 31. Obviously, this is an impossible solution.

A genetic algorithm can be used to find a solution is much less time. Although it might not find the best solution, it can find a near perfect solution for a 100 city tour in less than a minute. There are a couple of basic steps to solving the traveling salesman problem using a GA. First, create a group of many random tours in what is called a population . This algorithm uses a greedy initial propulation that gives preference to linking cities that are close to each other.

Second, pick 2 of the better (shorter) tours parents in the population and combine them to make 2 new child tours. Hopefully, these children tour will be better than either parent.

A small percentage of the time, the child tours are mutated . This is done to prevent all tours in the population from looking identical.

The new child tours are inserted into the population replacing two of the longer tours. The size of the population remains the same.

New children tours are repeatedly created until the desired goal is reached.

As the name implies, Genetic Algorithms mimic nature and evolution using the principles of Survival of the Fittest .

The two complex issues with using a Genetic Algorithm to solve the Traveling Salesman Problem are the encoding of the tour and the crossover algorithm that is used to combine the two parent tours to make the child tours.

In a standard Genetic Algorithm, the encoding is a simple sequence of numbers and Crossover is performed by picking a random point in the parent's sequences and switching every number in the sequence after that point. In this example, the crossover point is between the 3 rd and 4 th item in the list. To create the children, every item in the parent's sequence after the crossover point is swapped.

Parent 1 FAB | ECGD
Parent 2 DEA | CGBF
Child 1 FAB | CGBF
Child 2 DEA | ECGD

The difficulty with the Traveling Salesman Problem is that every city can only be used once in a tour. If the letters in the above example represented cities, this child tours created by this crossover operation would be invalid. Child 1 goes to city F & B twice, and never goes to cities D or E.

The encoding cannot simply be the list of cities in the order they are traveled. Other encoding methods have been created that solve the crossover problem. Although these methods will not create invalid tours, they do not take into account the fact that the tour "ABCDEFG" is the same as "GFEDCBA". To solve the problem properly the crossover algorithm will have to get much more complicated.

My solution stores the links in both directions for each tour. In the above tour example, Parent 1 would be stored as:

City First Connection Second Connection
A F B
B A E
C E G
D G F
E B C
F D A
G C D

The crossover operation is more complicated than combining 2 strings. The crossover will take every link that exists in both parents and place those links in both children. Then, for Child 1 it alternates between taking links that appear in Parent 2 and then Parent 1. For Child 2, it alternates between Parent 2 and Parent 1 taking a different set of links. For either child, there is a chance that a link could create an invalid tour where instead of a single path in the tour there are several disconnected paths. These links must be rejected. To fill in the remaining missing links, cities are chosen at random. Since the crossover is not completely random, this is considered a greedy crossover.

Eventually, this GA would make every solution look identical. This is not ideal. Once every tour in the population is identical, the GA will not be able to find a better solution. There are two ways around this. The first is to use a very large initial population so that it takes the GA longer to make all of the solutions the same. The second method is mutation , where some child tours are randomly altered to produce a new unique tour.

This Genetic Algorithm also uses a greedy initial population. The city links in the initial tours are not completely random. The GA will prefer to make links between cities that are close to each other. This is not done 100% of the time, becuase that would cause every tour in the initial population to be very similar.

There are 6 parameters to control the operation of the Genetic Algorithm:

Population Size - The population size is the initial number of random tours that are created when the algorithm starts. A large population takes longer to find a result. A smaller population increases the chance that every tour in the population will eventually look the same. This increases the chance that the best solution will not be found.

Neighborhood / Group Size - Each generation, this number of tours are randomly chosen from the population. The best 2 tours are the parents. The worst 2 tours get replaced by the children. For group size, a high number will increase the likelyhood that the really good tours will be selected as parents, but it will also cause many tours to never be used as parents. A large group size will cause the algorithm to run faster, but it might not find the best solution.

Mutation % - The percentage that each child after crossover will undergo mutation When a tour is mutated, one of the cities is randomly moved from one point in the tour to another.

# Nearby Cities - As part of a greedy initial population, the GA will prefer to link cities that are close to each other to make the initial tours. When creating the initial population this is the number of cities that are considered to be close.

Nearby City Odds % - This is the percent chance that any one link in a random tour in the initial population will prefer to use a nearby city instead of a completely random city. If the GA chooses to use a nearby city, then there is an equally random chance that it will be any one of the cities from the previous parameter.

Maximum Generations - How many crossovers are run before the algorithm is terminated.

The other options that can be configured are (note: these are only available in the downloadable version):

Random Seed - This is the seed for the random number generator. By having a fixed instead of a random seed, you can duplicate previous results as long as all other parameters are the same. This is very helpful when looking for errors in the algorithm.

City List - The downloadable version allows you to import city lists from XML files. Again, when debugging problems it is useful to be able to run the algorithm with the same exact parameters.

The starting parameter values are:

Parameter Initial Value
Population Size 10,000
Group Size 5
Mutation 3 %
# Nearby Cities 5
Nearby City Odds 90 %

Note: I originally wrote this program in 1995 in straight C. The tours in the population were stored as an array of 32 bit int's, where each bit indicated a connection. Ex: If tour[0] = 00000000000001000000010000000000 in binary, then city 0 connected to city 11 and 19. That implementation was much faster than the current C# version. The greedy part of crossover could be performed by doing a binary AND on the two tours. While that code was very fast, it had a lot of binary operations, was limited in the number of cities it could support, and the code wasn't readable. Hopefully, this new version will allow for more re-use.