
XGBoost: Transfer business knowledge using monotonic constraints | by Saupin Guillaume | Nov, 2022



Photo by 愚木混株 cdd20 on Unsplash

A few days ago, I was discussing with a good friend of mine, Julia Simon, how to take business knowledge into account in a decision-tree-based model.

She had in mind a very simple problem, where the value to predict was strictly increasing with a given feature, and she wanted to know whether it was possible to force the model to respect this constraint.

The answer is yes: the feature was added to XGBoost a long time ago (around December 2017, according to the XGBoost changelogs), but it remains one of its lesser-known capabilities: the monotonic constraint.

Let’s see how it has been implemented, what the underlying mathematics are, and how it works.

Let’s start by defining monotonic constraint. In mathematics, monotonic is a term that applies to functions and means that as the input of the function increases, the output consistently increases or consistently decreases.

The function x³ for instance is strictly monotonic:

x³ is strictly monotonic. Plot by the author.

In contrast, the x² function is not monotonic, at least not on its whole domain R:

x² is not monotonic on R. Plot by the author.

Restricted to R+, x² is monotonic, and the same holds for R-.

Mathematically speaking, saying that f is strictly monotonic means that

f(x_1) > f(x_2) if x_1 > x_2 in the case of increasing monotonicity,

or

f(x_1) < f(x_2) if x_1 > x_2 in the case of decreasing monotonicity.
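For completeness, here are both definitions written out, distinguishing the non-strict case (which, as we will see, is what XGBoost's constraint actually enforces on predictions) from the strict one:

```latex
% Non-strict monotonicity:
f~\text{increasing}: \quad x_1 \le x_2 \implies f(x_1) \le f(x_2)
f~\text{decreasing}: \quad x_1 \le x_2 \implies f(x_1) \ge f(x_2)
% Strict monotonicity replaces \le and \ge on the outputs with < and >.
```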

On many occasions, data scientists have prior knowledge of the relation between the value to predict and some features. For instance, sales of bottled water tend to increase with temperature, hence it could be interesting to enforce this relation in a model that predicts sales of bottled water.

Using monotonic constraints is an easy way to add this kind of constraint to an XGBoost model.

Let’s look at a simple example. Let’s say that we are trying to model the following equation, where the value to predict y depends on x as follows:

y = 3*x.

This is a very simple relation, where y is strictly proportional to x. However, when collecting data in real life, noise is introduced, and this can lead to data points that locally do not respect the relation. In these cases, it can be desirable to ensure that the model is monotonic, as the theoretical formula is.

The code below shows how to use XGBoost with monotonic constraints:
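The original embedded snippet is not reproduced here, so what follows is a minimal sketch, assuming the y = 3x relation from the text; the sample size, noise level, and hyperparameters are illustrative choices, not the author's:

```python
import numpy as np
import xgboost as xgb

# Noisy observations of the underlying relation y = 3 * x.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0.0, 10.0, size=200))
y = 3.0 * x + rng.normal(0.0, 2.0, size=200)  # noise can break local monotonicity

# A single feature, with an increasing monotonic constraint on it.
model = xgb.XGBRegressor(
    n_estimators=100,
    monotone_constraints=(1,),  # 1 = increasing, -1 = decreasing, 0 = none
)
model.fit(x.reshape(-1, 1), y)
```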

I’ve shown in a previous article how to implement Gradient Boosting for decision trees from scratch:

That code could easily be modified to integrate monotonic constraints. Handling constraints in code usually requires developing a solver, which is generally quite complex. Various approaches are possible; you can see in this article how such a solver can be implemented using an iterative approach based on geometry:

However, in the case of Gradient Boosting applied to decision trees, monotonic constraints can be implemented quite easily. The simplicity of the implementation comes from the use of binary decision trees as the underlying model.

Indeed, the decision handled by each node is a comparison between a feature value and a threshold. Hence enforcing monotonicity simply requires that the monotonicity property be respected at the decision-node level.

For instance, if the left node contains the rows whose feature A is less than a threshold T, then under an increasing constraint on A, the value (leaf weight) of the left node must not exceed the value of the right node.

How does XGBoost handle monotonic constraints?

To see how this kind of constraint can be implemented, let’s look at how XGBoost does it in its C++ code:

Extract from XGBoost code.

The code is in fact pretty simple. When evaluating a candidate split, it checks that monotonicity is respected by the resulting child values; if it is not, the code artificially sets the split’s gain to negative infinity to ensure that the split will not be kept.

Hence splits that would break monotonicity are discarded.
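The C++ extract is not reproduced above, but the underlying logic can be sketched in a few lines of Python. This is a simplified illustration, not XGBoost's actual code; the leaf-weight formula -G / (H + lambda) is the standard XGBoost expression, and the function names are hypothetical:

```python
NEG_INF = float("-inf")

def leaf_weight(grad_sum: float, hess_sum: float, lambda_: float = 1.0) -> float:
    # Optimal leaf weight in XGBoost: -G / (H + lambda).
    return -grad_sum / (hess_sum + lambda_)

def constrained_gain(gain: float,
                     left_grad: float, left_hess: float,
                     right_grad: float, right_hess: float,
                     constraint: int) -> float:
    # constraint: +1 increasing, -1 decreasing, 0 unconstrained.
    w_left = leaf_weight(left_grad, left_hess)     # rows with feature < threshold
    w_right = leaf_weight(right_grad, right_hess)  # rows with feature >= threshold
    if constraint > 0 and w_left > w_right:  # increasing: left must be <= right
        return NEG_INF                       # split will never be selected
    if constraint < 0 and w_left < w_right:  # decreasing: left must be >= right
        return NEG_INF
    return gain
```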

The code snippet below shows how to add monotonic constraints to an XGBoost model:

Train an XGBoost model with monotonic constraints. Code by the author
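The author's snippet is not reproduced here; the sketch below reconstructs it from the description that follows. The 6.66 slope and the strictly negative noise come from the text, while the seed, sample size, and hyperparameters are assumptions:

```python
import numpy as np
import xgboost as xgb

# Noisy observations of y = 6.66 * x; the noise is strictly negative,
# so the training data itself is not monotone.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, size=300))
y = 6.66 * x - rng.uniform(0.0, 3.0, size=300)
X = x.reshape(-1, 1)

# First model: no constraint.
unconstrained = xgb.XGBRegressor(n_estimators=200)
unconstrained.fit(X, y)

# Second model: increasing monotonic constraint on the single feature.
constrained = xgb.XGBRegressor(n_estimators=200, monotone_constraints=(1,))
constrained.fit(X, y)

# The constrained predictions never decrease along x; the free ones may.
pred_mono = constrained.predict(X)
assert np.all(np.diff(pred_mono) >= 0)
```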

In this educational example, two XGBoost models are trained to learn a simple theoretical model where y = 6.66x. Some strictly negative noise has been added to ensure that the training data are not monotone, i.e. sometimes y_j < y_i even though x_i < x_j.

The first model is trained without any constraint, whereas the second one adds a monotonic constraint.

Note that this is enforced by setting the parameter monotone_constraints. This parameter is a tuple that must contain as many items as there are features in the model.

When the item c_i associated with the feature f_i is 0, no constraint is applied. When c_i = 1, an increasing monotonic constraint is enforced, whereas when c_i = -1, a decreasing monotonic constraint is enforced.
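For example, for a hypothetical three-feature model where the first feature should act increasingly, the second is unconstrained, and the third should act decreasingly:

```python
import xgboost as xgb

# (1, 0, -1): increasing on feature 0, unconstrained on feature 1,
# decreasing on feature 2.
model = xgb.XGBRegressor(monotone_constraints=(1, 0, -1))
```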

The resulting predictions are displayed in this plot:

Raw data, unconstrained and constrained predictions. Plot by the author.

Zooming in on the plot offers a better picture of the effect of the constraint:

The constrained prediction, in green, is monotonically increasing. Plot by the author.

It clearly shows that the model without the constraint does not ensure monotonicity, as its predictions are not always increasing. By contrast, the constrained model's predictions never decrease.


