
Understanding The Hyperplane Of scikit-learn’s SVC Model
by Jacob Toftgaard Rasmussen | May 2022



How to interpret the coef_ attribute of the linear SVC from scikit-learn for a binary classification problem

Photo by Lisa Vanthournout on Unsplash

This post will teach you how to interpret the coef_ and intercept_ attributes of scikit-learn’s SVC, and how they can be used to make predictions for new data points.

I recently finished a project where I had to deploy an SVC in C. I trained an SVC in Python in order to do the heavy lifting of finding the hyperplanes in a high-level language, and then I extracted the necessary values from that model.

In that process I found it a bit difficult to understand exactly how the values in the coef_ and intercept_ attributes should be interpreted, so that is exactly what I will show you in this post.

NOTE: This post will not include all the details of the math behind the SVC; instead, it aims to give you an intuitive, practical understanding of what is going on when using the model from scikit-learn.

The predicting function

Fitting an SVC means that we are solving an optimization problem. In other words, we are trying to maximize the margin between the hyperplane and the support vectors of the different labels. Once the optimal margin and hyperplane have been found, we can use the following equations to predict the label for a new data point:

w · x + b ≥ 0 → predict label 1
w · x + b < 0 → predict label -1

Here w is the weight (coefficient) vector that comes from the fitted model’s coef_ attribute, x is the feature vector of the new data point that we want to classify, and b is a bias term that we get from the model’s intercept_ attribute.

Remember that in two dimensions a hyperplane is essentially just a line, and a single hyperplane can therefore only separate two classes, one on either side of it. Mathematically we can represent these as 1 and -1 (or 1 and 0, it doesn’t really matter), as seen in the equations above.

The equation works in the following way: we take the dot product of the coefficient vector and the new point’s feature vector and then add the bias. If the result is greater than or equal to 0, we classify the new point as label 1. Otherwise, if the result is below 0, we classify the new point as label -1.
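As a minimal sketch, the decision rule above can be written directly in Python (the helper name predict_label is just for illustration, it is not part of scikit-learn):

import numpy as np

def predict_label(w, x, b):
    # Classify as 1 when the point lies on or above the hyperplane, otherwise as -1
    return 1 if np.dot(w, x) + b >= 0 else -1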

Example: SVC for a binary problem

To demonstrate the math that we have just seen and to get a first look at how we can extract the coefficients from the fitted model let’s take a look at an example:

Code example for a binary classification scenario using scikit-learn’s SVC — Created by author
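Since the original gist is embedded as an image, here is a minimal sketch of what it plausibly contained. The specific data points (and therefore the exact fitted coefficients) are my assumption; only the overall structure — dummy data, a linear SVC stored in clf, and a plot of the hyperplane with its support vectors — follows the article.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.inspection import DecisionBoundaryDisplay

# Dummy, linearly separable 2D data (the exact points are an assumption,
# so the fitted coefficients may differ slightly from the values shown below)
X = np.array([[-2.0, 2.0], [-1.5, 1.0], [-1.0, 3.0], [-2.5, 0.5],
              [2.0, -1.0], [1.5, -2.0], [2.5, 0.0], [3.0, -1.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit a linear SVC
clf = svm.SVC(kernel="linear", C=1000)
clf.fit(X, y)

# Plot the data points, coloured by label (0 = purple, 1 = yellow with the default colormap)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30)

# Draw the separating hyperplane (level 0) and the margins (levels -1 and 1)
ax = plt.gca()
DecisionBoundaryDisplay.from_estimator(
    clf, X, ax=ax,
    plot_method="contour",
    response_method="decision_function",
    levels=[-1, 0, 1],
    linestyles=["--", "-", "--"],
    colors="k",
)

# Circle the support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=100, facecolors="none", edgecolors="k")
plt.show()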

The above code snippet creates some dummy data points that are clearly linearly separable and are divided into two different classes. After an SVC is fitted to the data and stored in the variable clf, the data points and the hyperplane with its support vectors are plotted. This is the resulting plot:

Plot of the binary classification problem’s data points and hyperplane — Created by author

NOTE: sklearn.inspection.DecisionBoundaryDisplay (available since scikit-learn 1.1) is pretty cool and can be used to draw the hyperplane and its margins for a binary classification problem (two labels).

Now let’s take a look at the coef_ and intercept_ attributes of the fitted clf model from before.

print(clf.coef_)
print(clf.intercept_)
>> [[ 0.39344262 -0.32786885]] #This is the w from the equation
>> [-0.1147541] #This is the b from the equation

We will come back to them shortly, but first let’s introduce two new data points that we are going to classify.

new_point_1 = np.array([[-1.0, 2.5]])
new_point_2 = np.array([[2, -2.5]])
plt.scatter(new_point_1[:, 0], new_point_1[:, 1], c='blue', s=20)
plt.scatter(new_point_2[:, 0], new_point_2[:, 1], c='red', s=20)
plt.show()

If we execute this code in continuation of the first code gist shown in the post then we get the following plot that includes the two new data points colored blue (new_point_1, top left) and red (new_point_2, bottom right).

Plot of the binary classification problem’s data points, hyperplane, and two new points — Created by author

Using the fitted model we can classify these points by calling the predict function.

print(clf.predict(new_point_1))
print(clf.predict(new_point_2))
>> [0] #Purple (result is less than 0)
>> [1] #Yellow (result is greater than or equal to 0)

A manual calculation mimicking the predict function

In order to make that classification the model uses the equation we have seen previously. We can make a calculation “by hand” to see if we get the same results.

Reminder:
coef_ was [[ 0.39344262 -0.32786885]]
intercept_ was [-0.1147541]
new_point_1 was [[-1.0, 2.5]]
new_point_2 was [[2.0, -2.5]]

Calculating the dot product and adding the bias can be done like this:

print(np.dot(clf.coef_[0], new_point_1[0]) + clf.intercept_)
print(np.dot(clf.coef_[0], new_point_2[0]) + clf.intercept_)
>> [-1.32786885] #Purple (result is less than 0)
>> [1.49180328] #Yellow (result is greater than or equal to 0)
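Written out with the numbers from above, the two calculations are:

new_point_1: 0.39344262 * (-1.0) + (-0.32786885) * 2.5 + (-0.1147541) ≈ -1.32786885, which is below 0, so label 0 (purple)
new_point_2: 0.39344262 * 2.0 + (-0.32786885) * (-2.5) + (-0.1147541) ≈ 1.49180328, which is at least 0, so label 1 (yellow)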

Voilà! We make the same classifications as the predict function did.

I hope that was clear and easy to follow. This was not a deep look into how the SVC model works, but just enough to get the essential understanding of what is going on when making a classification.

Things become more complicated when the classification problem is not binary but multiclass instead. I will be writing a follow up post where I explain how to interpret the coefficients of such a model.

If you have any feedback or questions then please don’t hesitate to reach out to me.

Thanks for reading!

