
An Introduction to a Powerful Optimization Technique: Simulated Annealing
By Hennie de Harder, March 2023



It’s all about the temperature. Image by Dall-E 2.

Explanation, parameters, strengths, weaknesses and use cases

Simulated annealing is an optimization technique that tries to find the global optimum for a mathematical optimization problem. It is a great technique and you can apply it to a wide range of problems. In this post we’ll dive into the definition, strengths, weaknesses and use cases of simulated annealing.

You are not supposed to have favorite machine learning models or optimization techniques. But people do have their favorites; take LightGBM, for example. In optimization, one technique I like is simulated annealing. It's easy to understand and works for many different types of problems.

This post requires knowledge about mathematical optimization problems and local search. If you don’t have this knowledge, this post is a good start.

Simulated annealing (SA) is a stochastic optimization algorithm used to find the global minimum of a cost function. It is a meta-heuristic built on local search that addresses the problem of getting stuck in local minima. The algorithm is based on the physical process of annealing in metallurgy, where metal is heated and then slowly cooled; by doing this, the metal is softened and prepared for further work. The algorithm was introduced in 1983 by Kirkpatrick, Gelatt and Vecchi.

In SA, the algorithm starts with an initial solution to the problem and a high initial temperature. The solution is perturbed by making a random change to it, and the resulting new solution is evaluated using an objective function. If the new solution is better (i.e., has a lower cost), it is accepted as the new current solution. If the new solution is worse (i.e., has a higher cost), it may still be accepted with a probability that depends on the temperature and the degree of worsening; this probability is called the acceptance value. The idea is to allow the algorithm to occasionally accept worse solutions early on in the process, in order to explore a wider range of solutions and avoid getting stuck in local minima.

The Metropolis acceptance criterion is normally used to calculate the acceptance value. The inputs are the temperature and the difference between the score of the neighbor solution (the new solution) and the score of the current solution. So first, calculate the difference; if it is below zero, accept the new solution. If it is above zero, calculate the acceptance value, i.e. the probability of accepting the solution even though it is worse:
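In its standard form, the Metropolis criterion accepts a worse neighbor with probability exp(-delta / temperature), where delta is the score difference and temperature is the current temperature. A minimal Python sketch of this check for a minimization problem (the function name is illustrative):

import math
import random

def metropolis_accept(delta: float, temperature: float) -> bool:
    # A better neighbor (delta < 0) is always accepted
    if delta < 0:
        return True
    # A worse neighbor is accepted with probability exp(-delta / T)
    return random.random() < math.exp(-delta / temperature)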

As the algorithm progresses, the temperature is gradually decreased according to a cooling schedule. This reduces the likelihood of accepting worse solutions and allows the algorithm to converge towards the global minimum. There are different types of cooling schedules that can be used, such as linear, logarithmic, or geometric cooling.

SA terminates either when a satisfactory solution is found, or when a maximum number of iterations or a convergence criterion is met.

Pseudocode and functions used in a basic version of Simulated Annealing.
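As a minimal sketch, a basic version of the algorithm could look like this in Python; the helper names objective and random_neighbor, as well as the geometric cooling factor and default parameter values, are illustrative assumptions:

import math
import random

def basic_simulated_annealing(initial_solution, objective, random_neighbor,
                              start_temp=100.0, cooling_rate=0.95,
                              n_iterations=1000):
    # Minimization: lower objective values are better
    current = initial_solution
    current_score = objective(current)
    best, best_score = current, current_score
    temperature = start_temp

    for _ in range(n_iterations):
        neighbor = random_neighbor(current)
        delta = objective(neighbor) - current_score
        # Always accept better solutions; accept worse ones with probability exp(-delta / T)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            current, current_score = neighbor, current_score + delta
            if current_score < best_score:
                best, best_score = current, current_score
        temperature *= cooling_rate  # geometric cooling
    return best, best_score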

If you want to get the best out of your simulated annealing implementation (and obviously you do), it’s important to pay attention to its parameters. Here are the most important ones:

Temperature and Cooling Schedule

The initial temperature is the starting temperature of the system and determines the probability of accepting a worse solution. If the initial temperature is too high, the algorithm may accept too many bad solutions, while if it is too low, the algorithm may get stuck in a local minimum.

Directly related is the cooling schedule: it determines how fast the temperature decreases during the annealing process. If the cooling schedule is too slow, the algorithm may not converge to a good solution in a reasonable amount of time, while if it is too fast, the algorithm may converge to a suboptimal solution.

Different cooling schedules to determine the temperature for a given iteration. Starting temperature set to 100 and iteration limit of 100. Image by author.
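As a rough sketch of how such curves can be computed (the exact decay constants used in the figure are not known, so the values below are assumptions):

import numpy as np

T0, n_iter = 100.0, 100
iterations = np.arange(n_iter)

linear_temp = T0 - (T0 / n_iter) * iterations        # drops by a fixed amount per iteration
geometric_temp = T0 * 0.95 ** iterations             # multiplied by a constant factor per iteration
logarithmic_temp = T0 / np.log(iterations + 2)       # decreases very slowly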

Acceptance Criterion

The acceptance criterion determines the threshold for accepting or rejecting a new solution. The value of the acceptance criterion is the probability that a new solution is accepted even though it scores worse than the current solution. It is based on the difference between the new solution score and the current solution score.

Below are the acceptance values for different deltas (1, 10 and 100) under the different cooling schedules:

Acceptance values for different deltas and cooling schedules. Image by author.

If we look at one example, iteration 75 with delta 10, the differences in acceptance probability between the schedules are quite large. That's why it's important to test different cooling schedules with different parameters.

Number of Iterations

The number of iterations should be set high enough to allow the algorithm to converge to a good solution, but not so high that the algorithm takes too long to run.

Neighborhood Function

This is the set of moves that can be applied to the current solution to generate a new one. The neighborhood function should be chosen carefully to balance the exploration of the search space with the exploitation of good solutions. For every problem, you can construct it in different ways. Below are 4 different types of local moves for the traveling salesman problem:

Common local moves for the traveling salesman problem. Image by author.
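For example, a 2-opt style move, which reverses a segment of the tour, is one commonly used option. A minimal sketch:

import random

def two_opt_move(tour):
    # Pick two distinct positions and reverse the segment between them
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]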

SA is powerful and effective. Here are five strengths of the meta-heuristic:

Strength 1. Effective in finding global optima

Instead of getting stuck in local optima (as plain local search does), SA is designed to find the global minimum of a cost function. By allowing for random moves and probabilistic acceptance of worse solutions, it can effectively search large solution spaces and find global optima, even in the presence of local optima.

Below you can see what happens when you apply local search without a meta-heuristic like SA. You start somewhere and you only accept better solutions. Then you end up in a local minimum:

Local search. The final solution depends on the starting solution. Gif by author.

Simulated annealing can escape local minima because it sometimes accepts worse solutions. If we start with the same solution, this can happen with SA:

Simulated annealing search for the optimal solution. Gif by author.

Besides SA, you can try other meta-heuristics to escape local minima, like iterated local search or tabu search.

Strength 2. Robustness

Simulated annealing can handle a wide range of optimization problems with different types of cost functions, constraints, and search spaces. It does not rely on assumptions about the nature of the cost function, and is robust to noise and uncertainty in the input.

Strength 3. Flexibility

Another strength is that SA can be easily customized and adapted to different problem domains and search spaces. The cooling schedule, neighborhood structure, and acceptance criterion can be tuned to suit the specific problem being solved.

Strength 4. Efficient

Generally, the algorithm is efficient in terms of the computational resources required to find a solution. It can often find good solutions quickly, particularly in the early stages of the algorithm when large, random moves are allowed. If the problem is large, you can save a lot of time by using SA instead of exact approaches like mixed-integer programming (MIP), while still getting good quality solutions.

Strength 5. Parallelizable

Parallelizing allows for faster optimization and the ability to search a wider range of solutions simultaneously. SA is easy to parallelize. Below is a code snippet that sets up parallelization:

import multiprocessing as mp
import numpy as np


class Problem:
    def __init__(self):
        # Set up the problem
        pass

    def is_better(self, s, best_solution):
        # Lower scores are better in this minimization example
        return s < best_solution


def simulated_annealing(problem: Problem, seed: int):
    # Set up problem-specific variables
    # Initialize a starting solution
    # Run the simulated annealing algorithm
    # Return the best solution found (here: a random stand-in)
    rng = np.random.default_rng(seed)
    best_solution = rng.integers(0, 30)
    return best_solution


def parallel_simulated_annealing(problem: Problem, n_processes: int = 5):
    # Run independent SA searches in parallel, each with its own seed
    seeds = np.random.randint(0, 2**31 - 1, size=n_processes)
    pool = mp.Pool(n_processes)
    results = [pool.apply_async(simulated_annealing, args=(problem, seed))
               for seed in seeds]
    pool.close()
    pool.join()
    # Keep the best of the returned solutions
    best_solution = results[0].get()
    for r in results[1:]:
        s = r.get()
        if problem.is_better(s, best_solution):
            best_solution = s
    return best_solution


if __name__ == '__main__':
    solution = parallel_simulated_annealing(Problem())

Beware, the implementations of simulated_annealing and the Problem class are highly simplified and not realistic: a random number is returned and the lowest one is kept as the best solution.

As with any optimization technique, there are some downsides too. Here are the most important ones:

Weakness 1. Tuning

Tuning is required: the cooling schedule, neighborhood structure, and acceptance criterion all need tuning in order to achieve good results. To fine-tune these parameters, it's best to use a combination of domain knowledge and experimentation. It's not only important to tune for different problems; problems with the same structure but of a different size are probably also better off with different parameters.

Weakness 2. No guarantee of optimality

Simulated annealing cannot guarantee finding the global optimum, as it relies on probabilistic moves and acceptance of worse solutions. It can only guarantee finding a locally optimal solution. With SA, you don't know the gap between the optimal solution and the current best solution, whereas this is possible with an exact algorithm.

Gurobi log with % gap between the current solution and the best possible solution. Image by author.

Weakness 3. Slow convergence

In some cases, SA can converge slowly, particularly for complex problems with large search spaces. This can make it inefficient for problems with tight time constraints. One way to solve this is to specify a time limit.
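A minimal sketch of such a time-based stopping criterion, assuming sa_iteration performs one step of the search (both names are illustrative):

import time

def run_with_time_limit(sa_iteration, time_limit_seconds=60):
    # Keep performing SA iterations until the wall-clock budget is used up
    start = time.time()
    while time.time() - start < time_limit_seconds:
        sa_iteration()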

There are several use cases that are a good fit for simulated annealing. Combinatorial optimization problems with a large search space and a non-convex objective function are well suited for simulated annealing. Examples are the minimum spanning tree problem, the traveling salesman problem, the capacitated vehicle routing problem, and the graph coloring problem.

Simulated annealing can also be effective for nonlinear optimization problems. If these problems have multiple local optima, traditional gradient-based optimization techniques may fail. For example, simulated annealing has been used for protein folding and molecular conformation problems.

On the other hand, for certain types of problems you shouldn’t consider simulated annealing. It may not be the best choice for convex optimization problems, where traditional gradient-based optimization techniques are often more efficient and effective.

Simulated annealing can be computationally expensive and slow for large-scale optimization problems with high-dimensional search spaces. In these cases, other optimization techniques such as genetic algorithms or particle swarm optimization may be more effective, because they look at multiple solutions at once.

Protein folding. Photo by National Cancer Institute on Unsplash

Simulated annealing is a stochastic optimization algorithm based on the physical process of annealing in metallurgy. It can be used to find the global minimum of a cost function by allowing for random moves and probabilistic acceptance of worse solutions, thus effectively searching large solution spaces and avoiding getting stuck in local minima. SA is powerful, robust, flexible, efficient, and parallelizable. However, it requires tuning, cannot guarantee finding the global optimum, and can converge slowly in some cases. To get the best results, it’s important to pay attention to its parameters, such as the initial temperature, cooling schedule, neighborhood function, acceptance criterion, and number of iterations. For some use cases, you should consider other techniques like genetic algorithms or particle swarm optimization.
