Understanding The Hyperplane Of scikit-learn’s SVC Model | by Jacob Toftgaard Rasmussen | May, 2022
How to interpret the coef_ attribute of the linear SVC from scikit-learn for a binary classification problem
This post will teach you how to interpret the coef_ and intercept_ attributes of scikit-learn’s SVC, and how they can be used to make predictions for new data points.
I recently finished a project where I had to deploy an SVC in C. I trained the SVC in Python, letting a high-level language do the heavy lifting of finding the hyperplane, and then I extracted the necessary values from that model.
In that process I found it a bit difficult to understand exactly how the values in the coef_ and intercept_ attributes should be interpreted, so that is exactly what I will show you in this post.
NOTE: This post will not include all the details of the math behind the SVC; instead, it aims to give you the intuition and practical understanding of what is going on when using the model from scikit-learn.
The predicting function
Fitting an SVC means that we are solving an optimization problem: we are trying to maximize the margin between the hyperplane and the support vectors of the different labels. Once the optimal margin and hyperplane have been found, we can use the following decision rule to predict the label for a new data point: predict 1 if w · x + b ≥ 0, and -1 if w · x + b < 0.
Where w is the coefficient vector coming from the fitted model’s coef_ attribute, x is the vector of the new data point that we want to classify, and b is a bias term that we get from the model’s intercept_ attribute.
Remember that a single hyperplane is essentially just a line (in two dimensions), and it can therefore only separate two classes, one on either side of it. Mathematically we can represent these as 1 and -1 (or 1 and 0, it doesn’t really matter), as seen in the decision rule above.
The equation works in the following way: We take the dot product of the coefficient and the new point and then we add the bias. If the result is greater than or equal to 0 then we classify the new point as label 1. Otherwise, if the result is below 0 then we classify the new point as label -1.
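That rule can be sketched in a few lines of Python. Note that the weight vector and bias below are made-up values just for illustration, not values from a fitted model:

```python
import numpy as np

def predict_label(w, b, x):
    """Classify x as 1 if w . x + b >= 0, otherwise as -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical weight vector and bias, for illustration only
w = np.array([0.5, -0.5])
b = 0.1

print(predict_label(w, b, np.array([2.0, 1.0])))   # 0.5*2 - 0.5*1 + 0.1 = 0.6  -> 1
print(predict_label(w, b, np.array([-2.0, 1.0])))  # 0.5*(-2) - 0.5*1 + 0.1 = -1.4 -> -1
```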
Example: SVC for a binary problem
To demonstrate the math that we have just seen and to get a first look at how we can extract the coefficients from the fitted model let’s take a look at an example:
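The code gist embedded in the original post is not reproduced here, but a minimal sketch of such a setup could look like the following. The dummy data below is invented for illustration, so the fitted coefficients will not exactly match the numbers printed later in the post:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical dummy data: two clearly linearly separable clusters
X = np.array([[-2.0, 2.0], [-1.5, 1.5], [-2.5, 1.0], [-1.0, 2.0],
              [1.5, -1.5], [2.0, -2.0], [1.0, -2.5], [2.5, -1.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A linear kernel keeps the decision boundary a straight hyperplane,
# which is what makes coef_ and intercept_ available on the fitted model
clf = SVC(kernel="linear")
clf.fit(X, y)

# (The original gist also plots the points, hyperplane, and support vectors)
print(clf.coef_, clf.intercept_)
```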
The above code snippet creates some dummy data points that are clearly linearly separable and divided into two different classes. After fitting an SVC to the data in the variable clf, the data points and the hyperplane with its support vectors are plotted. This is the resulting plot:
NOTE: sklearn.inspection.DecisionBoundaryDisplay is pretty cool and can be used to draw the hyperplane and support vectors for a binary classification problem (two labels).
Now let’s take a look at the coef_ and intercept_ attributes of the fitted clf model from before.
print(clf.coef_)
print(clf.intercept_)

>> [[ 0.39344262 -0.32786885]] # This is the w from the equation
>> [-0.1147541] # This is the b from the equation
We will come back to them shortly, but first let’s introduce two new data points that we are going to classify.
new_point_1 = np.array([[-1.0, 2.5]])
new_point_2 = np.array([[2, -2.5]])

plt.scatter(new_point_1[:, 0], new_point_1[:, 1], c='blue', s=20)
plt.scatter(new_point_2[:, 0], new_point_2[:, 1], c='red', s=20)
plt.show()
If we execute this code in continuation of the first code gist shown in the post, we get the following plot, which includes the two new data points colored blue (new_point_1, top left) and red (new_point_2, bottom right).
Using the fitted model we can classify these points by calling the predict function.
print(clf.predict(new_point_1))
print(clf.predict(new_point_2))

>> [0] # Purple (result is less than 0)
>> [1] # Yellow (result is greater than or equal to 0)
A manual calculation mimicking the predict function
In order to make that classification the model uses the equation we have seen previously. We can make a calculation “by hand” to see if we get the same results.
Reminder:
coef_ was [[ 0.39344262 -0.32786885]]
intercept_ was [-0.1147541]
new_point_1 was [[-1.0, 2.5]]
new_point_2 was [[2.0, -2.5]]
Calculating the dot product and adding the bias can be done like this:
print(np.dot(clf.coef_[0], new_point_1[0]) + clf.intercept_)
print(np.dot(clf.coef_[0], new_point_2[0]) + clf.intercept_)

>> [-1.32786885] # Purple (result is less than 0)
>> [1.49180328] # Yellow (result is greater than or equal to 0)
Voilà! We make the same classifications as the predict function did.
I hope that was clear and easy to follow. This was not a deep look into how the SVC model works, but just enough to get the essential understanding of what is going on when making a classification.
Things become more complicated when the classification problem is not binary but multiclass. I will be writing a follow-up post where I explain how to interpret the coefficients of such a model.
If you have any feedback or questions then please don’t hesitate to reach out to me.
Thanks for reading!