
Heuristics as Warm Start for Mixed Integer Programming (MIP) Models
By Bernardo Furtado | Apr 2023



Photo by Nils Geldner on Unsplash

In computer science, heuristics are techniques used to find a feasible solution to a given problem, typically faster than exact methods but without a guarantee of optimality. Exact methods, on the other hand, are much more computationally expensive, but they guarantee the optimal solution.

Modeling a problem as a Mixed Integer Program (MIP) and solving it using a solver may give you the optimal solution. Usually, those solvers use a branch-and-bound algorithm behind the scenes. Branch and Bound (B&B) is considered an exact method to solve optimization problems.

As the name suggests, it enumerates the solutions as a tree (branching). The main difference between B&B and exhaustive search, often called brute force, is the bounding phase. During the search, each node's bound is compared against the best solution found so far. If a branch is proven not to yield a better result than the best already found, it is pruned and the algorithm moves on to another branch.

The goal of this article is not to go into the details of the Branch and Bound algorithm; a much deeper explanation can be found here. But we do have to define a couple of terms:

  • Incumbent: the best feasible solution found so far during the search. In a minimization problem, it is an upper bound on the optimal objective.
  • Best Bound: a valid lower bound on the optimal objective.
  • Gap (%): the relative difference between the incumbent and the best bound. When the incumbent equals the best bound, the gap is 0 and the solution is proven optimal (see the small example below).
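
As a quick illustration, the relative gap can be computed as in the sketch below; this is roughly how solvers such as Gurobi report their relative gap (the MIPGap attribute), and the two objective values are made-up numbers used only for the example.

def relative_gap(incumbent, best_bound):
    # Relative MIP gap for a minimization problem:
    # |incumbent - best_bound| / |incumbent|
    return abs(incumbent - best_bound) / abs(incumbent)

# Hypothetical values: incumbent = 891, best bound = 817 -> gap of roughly 8.3%
print(f"{relative_gap(891, 817):.1%}")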

State-of-the-art solvers have different methods to compute an incumbent and a best bound for any MIP. But to the solver, our MIP is just a set of equations and an objective function. Since we know the exact structure of our problem, wouldn't it be helpful to develop an ad hoc algorithm and provide its result as the initial incumbent? Yes, and that's where heuristics and warm starts come into the picture.

The permutation flow shop scheduling problem is one of the most classic optimization problems in the literature. It can be briefly described as follows: given a set of m machines and a set of n jobs, schedule the jobs such that every job goes through all the machines in the same order. The objective is to minimize the completion time of the last job.

Let's suppose there are 3 machines (M1, M2 and M3) and 3 jobs (J1, J2 and J3). Every job has to visit the machines in that order. A possible feasible solution is shown below in the form of a Gantt chart:

Feasible solution for Permutation Flow shop

The completion time, or Makespan, of this schedule is 34 and the solution/order is J1, J2, J3.

This problem can be formulated as a MIP model. Let's call m the set of machines and n the set of jobs. The jobs have to go through the machines in the same order {0, 1, …, m-1}. The processing time of job j in machine i is given by p_ij. Let's create those parameters in Python. We'll start with a set of 15 jobs and 5 machines. Processing times are generated randomly between 1 and 100. M is an upper bound on the solution; a good upper bound is the sum of the processing times of all jobs in all machines.

import random
import numpy as np

random.seed(10)

# Number of jobs
n = 15
# Number of machines
m = 5
# Max time for random input generator
max_time = 100

# Generate processing times randomly
times = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        times[i][j] = random.randint(1, max_time)

# Total processing time per job - sum across machines
tot_processing_job = np.sum(times, axis=0)

# Upper bound of the solution - sum of all processing times
M = sum(times[i][j] for i in range(m) for j in range(n))

There are two sets of variables. Variable x_ji, a continuous variable, is the starting time of job j in machine i. Variable y_jk is a binary variable equal to 1 if job j is executed before job k, and 0 otherwise. Cmax is the makespan of the schedule. Now let's define them using gurobipy:

import gurobipy as grb

opt_model = grb.Model(name="Flow shop scheduling")

# Start time of job j in machine i
x = {(j, i): opt_model.addVar(vtype=grb.GRB.CONTINUOUS, lb=0, name="x_{0}_{1}".format(j, i))
     for j in range(n) for i in range(m)}

# 1 if job j is executed before job k, 0 otherwise
y = {(j, k): opt_model.addVar(vtype=grb.GRB.BINARY, name="y_{0}_{1}".format(j, k))
     for j in range(n) for k in range(n) if j != k}

# Makespan - completion time of the last job in the last machine
c = opt_model.addVar(vtype=grb.GRB.CONTINUOUS, lb=0, name="c")

Constraints are shown below:

(1) ensures that the processing of job j in machine i can start only when it has finished in machine i-1. (2) and (3) are the disjunctive constraints: if job j is executed after job k, it must start after k's completion in the given machine. (4) defines the makespan, which is the completion time of the last job in machine m (the last machine).
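
Written out with the notation above (these mirror the gurobipy implementation that follows), the four constraint sets are:

x_{j,i} - x_{j,i-1} >= p_{i-1,j}                  for all j, i = 1, ..., m-1    (1)
x_{j,i} - x_{k,i} + M * y_{j,k} >= p_{i,k}        for all j != k, for all i     (2)
-x_{j,i} + x_{k,i} - M * y_{j,k} >= p_{i,j} - M   for all j != k, for all i     (3)
Cmax >= x_{j,m-1} + p_{m-1,j}                     for all j                     (4)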

# Job j in machine i can start only when it is finished in machine i-1
c1 = {(j, i): opt_model.addConstr(x[j, i] - x[j, i - 1] >= times[i - 1][j],
                                  name="c1_{0}_{1}".format(j, i))
      for j in range(n) for i in range(1, m)}

# Disjunctive constraints - if job j is after k, it should start after its completion
c2 = {(j, k, i): opt_model.addConstr(x[j, i] - x[k, i] + M * y[j, k] >= times[i][k],
                                     name="c2_{0}_{1}_{2}".format(j, k, i))
      for j in range(n) for k in range(n) for i in range(m) if k != j}

c3 = {(j, k, i): opt_model.addConstr(-x[j, i] + x[k, i] - M * y[j, k] >= times[i][j] - M,
                                     name="c3_{0}_{1}_{2}".format(j, k, i))
      for j in range(n) for k in range(n) for i in range(m) if k != j}

# Makespan is the completion time of the last job in the last machine
c4 = {j: opt_model.addConstr(c >= x[j, m - 1] + times[m - 1][j],
                             name="c4_{0}".format(j))
      for j in range(n)}

The objective function is to minimize the makespan:

opt_model.ModelSense = grb.GRB.MINIMIZE
opt_model.setObjective(c)
opt_model.setParam('MIPGap', 0.018)
opt_model.optimize()

If we set the MIP gap tolerance to 1.8%, the best objective found is 832. The convergence of the Branch-and-Bound algorithm is shown in the chart below:

Convergence curve for Permutation Flow shop example

The incumbent starts at around 1,200 (the first solution Gurobi was able to find) and converges to 832. What if we could find a better initial solution for the problem and use it as the incumbent?
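
As a side note, the incumbent, best bound, and gap can be read directly from the model once optimize() returns; ObjVal, ObjBound, and MIPGap are standard gurobipy model attributes (a minimal sketch):

# After opt_model.optimize() has finished:
print("Incumbent (ObjVal):", opt_model.ObjVal)       # best feasible objective found
print("Best bound (ObjBound):", opt_model.ObjBound)  # best proven lower bound
print("Relative gap (MIPGap):", opt_model.MIPGap)    # |ObjVal - ObjBound| / |ObjVal|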

The NEH heuristic is one of the best-known heuristics for the flow shop scheduling problem. It is named NEH after its authors (Nawaz, Enscore, and Ham). It is a constructive heuristic: it starts from an empty solution and builds the schedule iteratively until all the jobs are assigned.

A solution representation of the permutation flow shop problem is a list of n elements. The list is ordered by start time: the first job in the list is the first to be executed and the last job is the latest. The first step is to create a function that receives the ordered list (a solution) as input and computes the makespan associated with it.

def get_makespan(solution):
    '''Calculate the completion times of a sequence of jobs (the makespan is the last entry).
    - A job can start in machine i only after it has finished in machine i-1 and
      the machine is not processing any other job
    - The finish time of a job in a given machine is its start time plus its processing time in that machine
    '''
    end_time = np.zeros((m, len(solution) + 1))
    for j in range(1, len(solution) + 1):
        end_time[0][j] = end_time[0][j - 1] + times[0][solution[j - 1]]
    for i in range(1, m):
        for j in range(1, len(solution) + 1):
            end_time[i][j] = max(end_time[i - 1][j], end_time[i][j - 1]) + times[i][solution[j - 1]]
    return end_time

NEH starts by sorting the jobs in decreasing order of total processing time (summed over all machines). The first iteration places the job with the highest total processing time at the beginning of the schedule. For each remaining job, it tries to insert it into every possible position of the current partial solution, compares the (partial) makespan of each, and keeps the best one. This process is repeated until all the jobs are assigned. The function is shown below:

def neh():
    '''Heuristic NEH (Nawaz, Enscore & Ham) for flow shop scheduling
    1 - Start from an empty schedule
    2 - Add first the job with the highest sum of processing times
    3 - Go through the list of unassigned jobs, testing each one in all possible positions of the current solution
    4 - Insert the job at the position with the lowest (partial) makespan
    5 - Repeat (3) and (4) until no job is left unassigned
    '''
    initial_solution = np.argsort(-tot_processing_job)
    current_solution = [initial_solution[0]]
    for i in range(1, n):
        best_cmax = float('inf')
        for j in range(0, i + 1):
            temp_solution = current_solution[:]
            temp_solution.insert(j, initial_solution[i])
            temp_cmax = get_makespan(temp_solution)[m - 1][len(temp_solution)]
            if best_cmax > temp_cmax:
                best_seq = temp_solution
                best_cmax = temp_cmax
        current_solution = best_seq
    return current_solution, get_makespan(current_solution)[m - 1][n]

The solution found by NEH has a makespan of 891, much better than the 1,200 found by Gurobi as the first incumbent.

If we start from this incumbent, Gurobi's B&B can skip a lot of branches whose solutions would be higher than 891, as in the chart below:

Convergence curve for Permutation Flow shop example against NEH solution

Using a warm start in Gurobi is relatively easy. The only challenge is to transform the data structure returned by the heuristic into values of the MIP variables. For our problem, we must transform an ordered list and a makespan value into values for the variables y, x, and c.
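
Here, sequence_heuristic and cmax_heuristic are simply the two values returned by neh(); a minimal sketch of how they might be obtained:

# Run the heuristic: an ordered list of job indices and its makespan
sequence_heuristic, cmax_heuristic = neh()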

cmax_heuristic and the variable c have the same meaning, so the only task is assigning the variable's start value.

c.Start = cmax_heuristic

Transforming the ordered list (sequence_heuristic) into the y variables is also straightforward. One just needs to iterate through the list: if job j comes before job k in the list, y_jk = 1, and 0 otherwise.

for j in range(n):
    for k in range(n):
        if j != k:
            y[j, k].Start = 0

for i in range(0, len(sequence_heuristic) - 1):
    for j in range(i + 1, len(sequence_heuristic)):
        j1 = sequence_heuristic[i]
        j2 = sequence_heuristic[j]
        y[j1, j2].Start = 1

We don't need to assign values to the x variables; Gurobi can infer them from the constraints and complete the partial MIP start. Once the model is executed, the user will see the following message:

It means the solution is feasible, its objective is 891, and it will be used as the MIP start. What if we try to provide an infeasible MIP start? For example, setting y_jk = 0 for all j and k. Of course, this is not feasible, since all jobs must be on the schedule. In that case, the user would see the following message:

Having a heuristic that finds a good starting point for the MIP is not the only use of the MIP start functionality. There are a couple of others:

  • In a real-world problem, the user could feed the model with a scenario that actually happened; that baseline would be the model's starting point. For example, if a company wants to build a model for its routing process and historical data is available, the routes driven during one day of operation could be a MIP start, as long as they respect the constraints of the mathematical model. This is also a good exercise to check whether the designed constraints are respected in reality.
  • As already explained, to the solver our model is just a bunch of equations. If the user doesn't want to spend time developing a heuristic, even a very simple/naive start can help. For example, an ordered list of jobs by their index ([1, 2, 3, …, n]) is a feasible solution and could be useful if the solver is struggling to find an initial point (see the sketch after this list).
  • If a very good known solution is available and one wants to prove it is optimal, or measure how far it is from optimal.
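
As a minimal sketch of that naive start (reusing the y variables defined earlier), keeping the jobs in index order could be set as follows:

# Naive warm start: process jobs in index order 0, 1, ..., n-1
naive_sequence = list(range(n))
for pos, job_j in enumerate(naive_sequence):
    for job_k in naive_sequence[pos + 1:]:
        y[job_j, job_k].Start = 1  # job_j is scheduled before job_k
        y[job_k, job_j].Start = 0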

Last but not least, a MIP start, even if it is better than the solver's own initial solution, is not guaranteed to improve convergence and/or run time. This is because the solver may take an entirely different path through the search tree. The case we showed before is a good example. The convergence curve of the MIP using a MIP start is shown below:

Convergence curve for Permutation Flow shop example using NEH as start

It indeed starts from a better solution, but it gets stuck at values close to 891.

Finally, the MIP start is a very useful resource, and almost all commercial solvers offer this functionality. But its value will depend greatly on the problem, the dataset, and the time available to develop a good heuristic.

All images unless otherwise noted are by the author. Full code is available on GitHub.

