
Genetic Algorithm — An Optimization Approach | by Prasun Biswas | Nov, 2022

An introduction to optimization using genetic algorithms and implementations in R

Photo: Unsplash

Optimization is an important concept in any business domain, be it retail, finance, automotive, or healthcare. In simple words, the purpose of optimization is to find a point, or set of points, in the search space that minimizes/maximizes a loss/cost function and thus gives us the optimal solution to the problem at hand. Formally, we try to minimize/maximize an objective function f(x) subject to one or more constraints, such as:

Minimize f(x)
Subject to:
g(x) = 0 (equality constraint)
h(x) <= 0 (inequality constraint)
lb <= x <= ub (lb: lower bound, ub: upper bound)
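To make this concrete, here is a minimal, hypothetical formulation written as plain R functions; the names f, g, h, lb, ub and is_feasible, and the toy expressions inside them, are illustrative assumptions rather than anything prescribed by this article:

# A toy formulation (illustrative only): minimize f(x) subject to an
# equality constraint g(x) = 0, an inequality constraint h(x) <= 0,
# and box bounds lb <= x <= ub.
f <- function(x) sum((x - 3)^2)      # objective to minimize
g <- function(x) sum(x) - 10         # equality constraint: g(x) = 0
h <- function(x) x[1] * x[2] - 20    # inequality constraint: h(x) <= 0
lb <- c(0, 0)                        # lower bounds
ub <- c(10, 10)                      # upper bounds

# A candidate solution is feasible if it satisfies all constraints:
is_feasible <- function(x) {
  abs(g(x)) < 1e-6 && h(x) <= 0 && all(x >= lb) && all(x <= ub)
}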

This article will help you understand how an optimization problem can be solved using a genetic algorithm (GA), one of the simplest evolutionary algorithms (EAs).

What is a Genetic Algorithm and how does it work?

The basic intuition behind an evolutionary algorithm is to select the best individuals from the population as parents and have them reproduce to create the next generation. During reproduction, genes from both parents are combined (crossover), and occasionally an uninherited change occurs, known as mutation. The offspring then reproduce in turn, and the process continues. Evolutionary algorithms are inspired by this interplay of crossover and mutation: crossover creates new solutions from the genetic information already in the population, while mutation introduces new information, maintains diversity within the population, and prevents premature convergence, making the search more robust.

A Genetic Algorithm (GA) is a search-based optimization technique based on the principles of biological evolution: genetics and natural selection. It is commonly used to find optimal or near-optimal solutions to problems that would otherwise take a significant amount of time to solve. Now, let’s talk about the fundamentals of how the algorithm works.

Image by Author

Initiate the Population

Let us first understand what population means in GA.

Population (or Generation): A set of candidate solutions used to initiate the search. GA iterates over multiple generations until it finds an optimized solution.

Chromosome: One particular candidate solution present in the population.

Gene: A single element of the decision variable; each gene carries a value (allele) and a position (locus) within its chromosome.

Image by Author

The first generation is usually randomly initiated (random initialization): the algorithm starts with a randomly generated population. The population size depends on the nature of the problem and is typically guided by the size of the decision variable; the same population size is maintained across all generations. Alternatively, the population can be initiated using a known heuristic for the problem (heuristic initialization), but this is usually not recommended, as it can result in a population with very similar solutions and little diversity.
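As a minimal sketch (assuming real-valued decision variables and the toy bounds from the earlier snippet), random initialization can be written as follows; each row of the matrix is one chromosome and each column one gene:

# Random initialization: each row is a chromosome (candidate solution),
# each column a gene (one decision variable), drawn uniformly within bounds.
init_population <- function(pop_size, lb, ub) {
  n_genes <- length(lb)
  matrix(runif(pop_size * n_genes,
               min = rep(lb, each = pop_size),
               max = rep(ub, each = pop_size)),
         nrow = pop_size, ncol = n_genes)
}

set.seed(42)
population <- init_population(pop_size = 50, lb = c(0, 0), ub = c(10, 10))
dim(population)   # 50 chromosomes x 2 genes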

Evaluate the Fitness

The fitness function evaluates how fit a candidate solution is for the objective. It assigns a fitness score (often normalized to a probability) to each individual, based on which that individual is selected for reproduction.

Usually, the fitness function can be thought of as the objective function, but for complex problems with multiple objectives and constraints, the fitness function can be designed differently.

A fitness function should have the following characteristics (a minimal sketch follows the list below):

· The fitness function should be sufficiently fast to compute.

· It must quantitatively measure how fit a given solution is or how fit individuals can be produced from the given solution.
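As promised above, here is a minimal sketch of a fitness function for the toy problem from the earlier snippets; treating fitness as a normalized, “higher is better” version of the objective is one common choice, not the only one, and the names objective and evaluate_fitness are illustrative:

# Fitness evaluation (illustrative): fitness is the negated objective,
# shifted to be positive and normalized so the scores can be used as
# selection probabilities.
objective <- function(x) sum((x - 3)^2)    # toy objective to minimize

evaluate_fitness <- function(population) {
  raw <- apply(population, 1, function(x) -objective(x))   # higher is better
  shifted <- raw - min(raw) + 1e-9                         # make strictly positive
  shifted / sum(shifted)                                   # normalize to probabilities
}

fitness <- evaluate_fitness(population)    # population from the earlier sketch
head(fitness)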

Select Parents

Parent selection is the process of identifying the fittest solutions in the population, which then reproduce the next generation of solutions. It helps the next generation inherit the “good” features naturally. Since there are no offspring at generation 0, parents are selected from the initial population.

This selection step is critical, as it drives individuals toward better and fitter solutions and hence leads to convergence of the algorithm. At the same time, parent selection should maintain good diversity in the population. There are different methods to select parents, such as roulette-wheel, rank, and tournament selection.
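A minimal sketch of one such method, roulette-wheel (fitness-proportionate) selection, assuming the fitness scores computed above; this is just one possible scheme:

# Roulette-wheel selection: each chromosome is picked as a parent with
# probability proportional to its fitness score.
select_parents <- function(population, fitness, n_parents = 2) {
  idx <- sample(seq_len(nrow(population)), size = n_parents,
                replace = TRUE, prob = fitness)
  population[idx, , drop = FALSE]
}

parents <- select_parents(population, fitness)   # two parent chromosomes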

Crossover

Crossover helps generate new offspring. Since GA is a randomized EA, the share of genes carried from each parent is random. Certain genes from both parent chromosomes are mixed to produce the new generation. Because the offspring results from the crossover of the parent chromosomes, it inherits characteristics from both parents. Sometimes the offspring takes half of its genes from one parent and half from the other, and sometimes the proportions differ.

There are different methods, such as single-point crossover, double-point crossover, and uniform crossover; uniform crossover is a popular choice. Here, genes are randomly chosen from either parent chromosome: the genes inherited from one parent are marked 1 and the others 0, with the mask drawn from a uniform distribution (call these values unif_value). The unif_value mask is then multiplied with the gene values of each parent chromosome to assemble the offspring. The parent selection and crossover operations are repeated until the number of solutions in the next generation reaches the population size (the same population size is maintained across all generations).
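A minimal sketch of the uniform crossover just described, where a random 0/1 mask (the unif_value) decides which parent contributes each gene; the function name and the parents object come from the earlier illustrative sketches:

# Uniform crossover: a 0/1 mask (unif_value) decides, gene by gene,
# which parent each offspring inherits from.
uniform_crossover <- function(parent1, parent2) {
  unif_value <- rbinom(length(parent1), size = 1, prob = 0.5)
  child1 <- unif_value * parent1 + (1 - unif_value) * parent2
  child2 <- (1 - unif_value) * parent1 + unif_value * parent2
  rbind(child1, child2)
}

offspring <- uniform_crossover(parents[1, ], parents[2, ])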

Image by Author

Mutation

Mutation introduces new characteristics into the genes, which maintains diversity within the population and prevents premature convergence of the process. Mutation is the part of the GA associated with “exploration” of the search space. In this sense, mutation is often considered essential to the convergence of the GA, while crossover is not.
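A minimal mutation sketch for real-valued genes, assuming a small per-gene mutation probability and a Gaussian perturbation clipped back to the bounds; the rate and step size are arbitrary assumptions:

# Mutation: with a small probability, perturb each gene with Gaussian noise,
# then clip the result back into the allowed bounds.
mutate <- function(chromosome, lb, ub, rate = 0.1, sd = 0.5) {
  mutated <- chromosome
  flip <- runif(length(chromosome)) < rate
  mutated[flip] <- mutated[flip] + rnorm(sum(flip), mean = 0, sd = sd)
  pmin(pmax(mutated, lb), ub)    # keep genes within [lb, ub]
}

offspring[1, ] <- mutate(offspring[1, ], lb = c(0, 0), ub = c(10, 10))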

Criteria Satisfied

The algorithm terminates when it has converged, i.e., when it no longer produces offspring that are significantly different from the previous generation. Usually, the algorithm terminates if any of the following criteria is satisfied (a loop sketch combining the earlier pieces follows the list):

· If there is no fitness improvement in the population for X consecutive generations

· If the objective function has attained the optimal or a pre-defined value

· If the maximum number of generations has been reached
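As referenced above, the pieces from the earlier sketches can be combined into a bare-bones generational loop with two of these stopping rules (a stall limit and a maximum number of generations); this is an illustrative sketch, not a production implementation:

# A bare-bones GA loop combining the earlier sketches. It stops when either
# the maximum number of generations is reached or the best objective value
# has not improved for `patience` consecutive generations.
run_ga <- function(pop_size = 50, max_generations = 200, patience = 25,
                   lb = c(0, 0), ub = c(10, 10)) {
  population <- init_population(pop_size, lb, ub)
  best_obj <- Inf
  stall <- 0
  for (gen in seq_len(max_generations)) {
    fitness <- evaluate_fitness(population)
    gen_best <- min(apply(population, 1, objective))
    if (gen_best < best_obj - 1e-8) {
      best_obj <- gen_best
      stall <- 0
    } else {
      stall <- stall + 1
    }
    if (stall >= patience) break     # no improvement for `patience` generations
    next_pop <- matrix(NA_real_, nrow = pop_size, ncol = length(lb))
    for (i in seq(1, pop_size, by = 2)) {
      parents <- select_parents(population, fitness)
      children <- uniform_crossover(parents[1, ], parents[2, ])
      next_pop[i, ] <- mutate(children[1, ], lb, ub)
      next_pop[min(i + 1, pop_size), ] <- mutate(children[2, ], lb, ub)
    }
    population <- next_pop
  }
  list(best_objective = best_obj, generations = gen)
}

run_ga()   # returns the best objective value found and the generations used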

Why Genetic Algorithm?

A Genetic Algorithm (GA) can provide a “good-enough” solution “fast enough” in large-scale problems, where traditional algorithms might fail to deliver a solution. It provides a generic framework for solving complex optimization problems. Below are a few advantages of using a GA:

a) Overcomes the failures of traditional optimization algorithms

A function is differentiable if we can calculate its gradient at any given point. Traditional algorithms like gradient descent and Newton’s method use derivatives to reach an optimal solution: they start at a random point and move in the direction of the gradient until they reach a peak (or valley). This technique is efficient and works very well for problems like linear regression, where the objective function has a single peak. But real-world problems are often complex, with multiple peaks and many valleys (non-convex objective functions). Here, traditional algorithms have an inherent tendency to get stuck at a local optimum.
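To see this concretely, the snippet below runs base R’s gradient-based optim() on the Rastrigin function, a standard multimodal test function whose global minimum is 0 at the origin; started away from the origin, it typically stops at a nearby local minimum instead:

# Rastrigin: a standard multimodal test function with many local minima;
# its global minimum is 0 at x = (0, 0).
rastrigin <- function(x) 10 * length(x) + sum(x^2 - 10 * cos(2 * pi * x))

# A gradient-based method started away from the origin usually gets trapped
# in a nearby local minimum instead of reaching the global one.
res <- optim(par = c(3.3, -2.7), fn = rastrigin, method = "BFGS")
res$par     # typically near a local minimum such as (3, -3), not (0, 0)
res$value   # well above the global minimum of 0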

Image by Author

The genetic algorithm, on the other hand, does not require the gradient of the objective function. It can be applied to a variety of optimization problems where the objective function is discontinuous, non-differentiable, stochastic, or highly nonlinear.

b) It can be easily parallelized.

c) GA is quite fast and explores the search space in a relatively short computation time.

d) Multiple complex optimization objectives can be included

Limitations

a) GA might not converge to the optimal solution if not implemented properly.

b) It might be computationally expensive, as the fitness function is calculated repeatedly.

c) GA sometimes cannot handle hard constraints directly, so they need to be passed as penalties in the objective function. The penalty function reduces the fitness of infeasible solutions, in proportion to the number of constraints violated or the distance from the feasible region (a small sketch follows the list).

A Few Example Use Cases

a) Inventory optimization: supporting replenishment decisions to avoid stock-out conditions and, therefore, lost sales.

b) Store space optimization: increasing store sales by reshuffling the space, with some factors kept as constraints.

c) Power supply optimization

Implementation: Code Snippet in R

Define the functions (the bodies of the demand and constraint functions are problem-specific and left as stubs here; fill them in for your own data):

pred_eval_fun_P <- function(x) {}   # function for calculating predicted demand
orig_eval_fun_P <- function(x) {}   # function for calculating actual demand
eval_g_ineq1 <- function(x) {}      # inequality constraint for the space condition
eval_g_ineq2 <- function(x) {}      # constraint to keep profit above the current profit

# Objective (fitness) function: since the solver cannot take hard constraints,
# the constraints are added as penalty terms. The penalty expressions below are
# one possible choice (an assumption); scale them as needed for your problem.
fitness_P <- function(x) {
  f <- pred_eval_fun_P(x)             # quantity to maximize
  p <- sqrt(.Machine$double.xmax)     # large penalty weight
  pel_int  <- abs(eval_g_ineq1(x))    # violation of the space constraint
  pel_int1 <- abs(eval_g_ineq2(x))    # violation of the profit constraint
  penalty1 <- p * pel_int             # assumed penalty terms
  penalty2 <- p * pel_int1
  f - penalty1 - penalty2             # penalized fitness value
}

# Also define, for your problem: lb (lower bounds), ub (upper bounds),
# starting values, max_gen (maximum generations) and hard_gen (hard generation limit).

library(parallel)   # for detectCores()
library(snow)
library(rgenoud)

no_cores <- detectCores() - 1
cl1 <- makeCluster(no_cores, type = "SOCK")   # or makePSOCKcluster(no_cores)
clusterExport(cl1, c("fitness_P", "lb", "ub", "pred_eval_fun_P",
                     "eval_g_ineq1", "eval_g_ineq2", "starting",
                     "max_gen", "hard_gen"))

GA_P1 <- genoud(fitness_P, Domains = cbind(lb, ub), nvars = length(lb),
                pop.size = 100, max = TRUE, wait.generations = 50,
                max.generations = max_gen, hard.generation.limit = hard_gen,
                data.type.int = TRUE, cluster = cl1,
                print.level = 1, solution.tolerance = 0.001)

final_reco <- as.numeric(as.character(GA_P1$par))   # recommended decision values
stopCluster(cl1)

Disclaimer: The views expressed in this article are the opinion of the author in his personal capacity and not of his employer.

