Simulated annealing is an optimisation metaheuristic whose goal is to find the global minimum/maximum of a function in a large search space. It is a random-search technique inspired from annealing in metallurgy, that’s why in this article you will see references to notions such as the energy or the temperature.

Compared to an optimisation technique such as gradient descent, you don’t need to compute any derivative of the function you want to optimize. Therefore, it can be applied to non-differentiable functions. Moreover, it has less chance to get stuck in a local optima. The tradeoff for this feature is of course computational time.

In this article we’ll first present simulated annealing’s main concepts. Then, in a second time we’ll apply it to solve the traveling salesman problem.

# Main concepts

If we do an analogy with physics, the goal of simulated annealing is to minimize the energy of a system. In an optimization problem it means minimizing a cost function.

To address this problem we start by defining two parameters:

- The energy E, which is the value returned by the cost function you want to optimize;
- The temperature T, which we are going to explain the meaning later.

We start from an initial solution S0 that can be generated randomly. We associate an energy E0 and a temperature T0 to this solution.

At each iteration, we generate a new solution Sn that slightly differs from the previous solution Sn-1. In the case of the traveling salesman problem that we are going to solve later, it consists in switching 2 cities in our solution.

Once we have generated a new solution, we can decide to accept or refuse it. The rules to take this decision are based on the Metropolis-Hastings algorithm.

We always accept a new solution that decreases the global energy of our system since it means that it is better than the previous one.

When En > En-1 we don’t automatically reject it, even though it means that the new solution is worse than the previous one. We accept the new solution with a probability of .

## Why should we accept a worse solution ?

You may wonder why we want to accept a solution that is worse than the previous one, which at first sight may seem unlogical. The reason is to avoid local optima. Indeed, imagine on the picture below you start near point B. Then after few iterations you will converge to B. Since all of the new solutions will be worse, then you will get stuck in B, even though it is not the global minima. However, if you allow your algorithm to sometimes take “bad” decisions, it may enable you to leave B’s neighbourhood to go to A.

## At which frequency should we allow “bad” decisions

There are two parameters that influence this probability:

- The difference of energy ΔE between the two solutions;
- The temperature T of our system.

The only parameter we can influence is the temperature. The more we increase it, the more likely we are going to allow a solution that increases the global energy of our system. On the opposite, the smaller it is, the more likely we are going to refuse worse solutions.

Concerning the value to give to the temperature, it can set to a fixed value to the temperature or you can use a strategy that makes it vary over time. In the next part we will define a cooling rate parameter. After each iteration we will decrease our temperature by multiplying it with this rate.

The reason we do this is similar to the Boltzman selection in genetic algorithms. It tends to favor exploration at the begining, whereas in the end it focus on a local optima.

## Algorithm

To sum up, the general principle of the algorithm is to start from an initial solution. Then at each iteration we generate a slightly different solution. If it is better, then we automatically accept it. Otherwise we accept it with a probability of .

We repeat the iteration process until a stopping criterion is reached. It may be:

- A minimum value of the temperature;
- A number of iterations that has passed without acceptance of a new solution;
- A fixed number of iterations;

# Traveling salesman problem

In this part we apply simulated annealing to the traveling salesman problem. We already solved this problem in a previous article (in french) using genetic algorithms.

The problem is, given a set of cities, to find the shortest path to visit all of the cities exactly once.

We start by defining a City class.

Then we define two classes: Tour and TourManager.

Finally we create a SimulatedAnnealing class whose goal will be to run the simulated annealing algorithm. We need to pass the value of the initial temperature as well as the cooling rate to instantiate it.

We explain briefly the methods of this class: The accept_solution method is responsible for deciding whether we want to accept a new solution or not. As we explained in the first part of this article, when the new solution is better than the previous one we always accept it. When that is not the case, we accept it with a probability of .

The evolve_tour method is an iteration of our simulated annealing algorithm. We generate a new solution by inverting the place of 2 cities randomly selected. We use the accept_solution method to decide whether or not we keep the new generated solution.

Finally, the run method is responsible for iterating while the temperature is greater than 1. After each iteration we multiply the temperature by 1 - cooling rate, which as we said previously, tends to favor exploration at the beginning.

We use the following main to run our algorithm. The cities of our example are the same french cities that we used for the genetic algorithm:

By running the program we obtain a solution of 4615 km to visit all of our cities.

The graph below represents the distance in kilometers against the number of iterations. As we can see, it is quite chaotic at the beginning and tends to stabilize in the end, when the temperature decreases.

If we compare it with the graph below obtained by running a genetic algorithm, we can see the difference in the way the two algorithms converges.