Tweaking a model for lower False Predictions | by Gustavo Santos | Dec, 2022


When creating a classification model, many algorithms offer the predict_proba() function, which gives us the probability of an observation being classified under each class. Thus, it is common to see an output like this:

[0.925, 0.075]

In the previous case, the model is 92.5% sure that the observation belongs to class 0, and gives it only a 7.5% chance of being from class 1.

If we, therefore, request this same model to give us a binary prediction using the predict() function, we’ll just get a [0] as the result, correct?

In this example, we most likely would not want the model to predict the observation as class 1, given its small chance of being so. But let’s say we have a prediction for another observation and the result is as follows:

[0.480, 0.520]

Now what?

Certainly, the rough-cut prediction from most models will give us the result [1]. But is that the best decision? Sometimes, yes. Other times, not so much.
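To make the default behavior concrete, here is a small sketch of mine (not from the original post) using the two probability vectors above. It shows how the default prediction just picks the class with the highest probability, while a custom cut lets us decide differently:

# A minimal illustration with numpy only
import numpy as np

# Two observations: [P(class 0), P(class 1)]
proba = np.array([[0.925, 0.075],
                  [0.480, 0.520]])

# Default behavior: pick the class with the highest probability (a 50% cut)
default_preds = proba.argmax(axis=1)                      # -> [0, 1]

# Custom behavior: only call it class 1 when P(class 1) clears our own cut
custom_cut = 0.70
custom_preds = (proba[:, 1] >= custom_cut).astype(int)    # -> [0, 0]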

In this post, we will learn how to use the catboost package in Python to find the best threshold value for a classification, based on the False Positive Rate [FPR] or False Negative Rate [FNR] that we consider acceptable for our use case.

To contextualize this article, let’s understand why we would want to change the threshold from the default 50% cut to another number.

The best example comes from the healthcare industry. We know that many lab exams and diagnostic tests rely on machine learning to help specialists come up with the most precise answer. After all, in this industry, every percentage point counts for someone’s life.

So let’s say that we are working with data to diagnose breast cancer. Talking to the stakeholders, we have agreed that we want our model to produce at most 1% false negatives. We want to be very sure that a person is healthy before saying the result is negative for breast cancer. If there is any doubt, we will classify it as positive and recommend a second examination or a different confirmation test.

As you might have concluded already, doing this will reduce our model’s accuracy, since we will increase the number of false positives. That is acceptable, though, because the person can always take further examinations to confirm whether it is a true positive or not. On the other hand, we won’t miss anyone who has the disease but received a negative result.

Photo by Towfiqu barbhuiya on Unsplash

You will find the entire code for this exercise in my GitHub repository, here.

To install catboost, use pip install catboost. Some imports needed are listed next.

# Basics
import pandas as pd
import numpy as np
# Visualizations
import plotly.express as px
# CatBoost
from catboost import CatBoostClassifier
from catboost import Pool
# Train test
from sklearn.model_selection import train_test_split
# Metrics
from sklearn.metrics import confusion_matrix, f1_score

Dataset

The data to be used is the famous Breast Cancer toy dataset, which is native to sklearn.

# Dataset
from sklearn.datasets import load_breast_cancer

# Load data
data = load_breast_cancer()

# X
X = pd.DataFrame(data.data, columns=data.feature_names)
# y
y = data.target

As you might or might not know, this dataset is fairly ready to roll. There’s not much to be explored or transformed before modeling. And that is not our purpose here anyway, so I will just move on with the code.

Train Test Split

Let’s split the data for training and test.

# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f'Train shapes: {X_train.shape} | {y_train.shape}')
print(f'Test shapes: {X_test.shape} | {y_test.shape}')

Train shapes: (455, 30) | (455,)
Test shapes: (114, 30) | (114,)

First Model

Next, we will train the first model with CatBoostClassifier.

# Creating a Pool for training and validation sets
train_pool = Pool( data=X_train, label=y_train)
test_pool = Pool( data=X_test, label=y_test)

# Fit
model = CatBoostClassifier(iterations=500)
model.fit(train_pool, eval_set=test_pool, verbose=100)

Next, here is the resulting F1 score: about 97%.

# Predict
preds = model.predict(X_test)
f1_score(y_test, preds)

0.971830985915493

Excellent. But our model is a bit complex, given it uses all 30 features. Let’s try to reduce that without losing too much performance. CatBoost has the feature_importances_ attribute that can help us determine the best ones to keep.

# Feature importances to dataframe
feature_importances = (
    pd.DataFrame({'feature': data.feature_names,
                  'importance': model.feature_importances_})
    .sort_values(by='importance', ascending=False)
)
# Plot
px.bar(feature_importances,
       x='feature', y='importance',
       height=600, width=1000).update_layout(xaxis={'categoryorder': 'total descending'})
Cut on importances under 3. Image by the author.

Without using any fancy technique, I just arbitrarily chose to keep any feature with 3+ importance. This kept 10 of them, to the left of the red line.
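For reference, a quick programmatic way to apply the same cut is shown next. This snippet is mine, not from the original notebook; it assumes the feature_importances dataframe built above and, with the importances from the author's run, should select the same 10 features.

# Keep every feature whose importance is at least 3
selected = feature_importances.loc[feature_importances.importance >= 3, 'feature']
print(len(selected), list(selected))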

Simpler Model

Let’s train the simpler model and evaluate the score.

# Simpler model
features = feature_importances.feature[:10]
# Creating a Pool for training and validation sets
train_pool2 = Pool( data=X_train[features], label=y_train)
test_pool2 = Pool( data=X_test[features], label=y_test)

# Model
model2 = CatBoostClassifier(iterations=600)
model2.fit(train_pool2, eval_set=test_pool2, verbose=100)

# Score
preds2 = model2.predict(test_pool2)
f1_score(y_test, preds2)

0.979020979020979

Nice. Practically the same F1 score, around 98%, with a third of the features.

As we are working with a medical diagnosis, we should not be very tolerant of false negatives. We want our model to say that a patient is healthy only when there is very high certainty that they actually are.

But we know that the CatBoost classifier uses the standard 50% threshold to predict the outcome. This means that if the positive probability is under 50%, the patient will be diagnosed as negative for breast cancer. We can tweak that number so that the model gives a negative prediction only with a higher degree of certainty.

Let’s see how that’s done. Here are a few predictions from our model.

# Regular predictions
default_preds = pd.DataFrame(model2.predict_proba(test_pool2).round(3))
default_preds['classification'] = model2.predict(test_pool2)
default_preds.sample(10)
Predictions probabilities from our model with 50% threshold. Image by the author.

Notice that observation 82 has a 63.4% chance of being negative, but it also has a 36.6% chance of being positive, which could be considered high by medical standards. We want this case to be classified as positive, even knowing that it may be a false positive, so that we can send this person for another test at a later date. So let’s set our False Negative Rate [FNR] tolerance to 1%.

from catboost.utils import select_threshold
# Finding the right threshold
print(select_threshold(model2, test_pool2, FNR=0.01))

0.1420309044590601

Great. Now that CatBoost has calculated the number, the new threshold to be classified as negative is 1 - 0.142 = 0.858. In simpler terms, the probability of class 0 must be over 85.8% for the observation to be marked as 0; otherwise it will be classified as 1.
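To see what that cut means in code, here is a short illustration of mine (not from the original notebook) that applies it manually to the predicted probabilities:

# Column 0 holds P(class 0) and column 1 holds P(class 1)
proba = model2.predict_proba(test_pool2)

# Classify as 1 whenever P(class 1) exceeds the 0.142 cut; that is,
# classify as 0 only when P(class 0) is at least 0.858
cut = 0.1420309044590601
manual_preds = (proba[:, 1] > cut).astype(int)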

Ok. So I have created a custom function, predict_threshold(df, threshold, rate_type) (visit my GitHub to check out the code), that takes as input the data with the explanatory variables, the tolerated error rate (the threshold argument) and the rate type (FNR or FPR), and returns the classifications using the new cut.
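The exact implementation is in the repository. Below is a minimal sketch of what such a helper could look like; it is my own guess, assuming the function simply wraps select_threshold and predict_proba and uses model2 from the enclosing scope.

from catboost.utils import select_threshold

def predict_threshold(df, threshold, rate_type="FNR"):
    # df is a catboost Pool; threshold is the tolerated error rate (e.g. 0.01)
    if rate_type == "FNR":
        cut = select_threshold(model2, df, FNR=threshold)
    else:
        cut = select_threshold(model2, df, FPR=threshold)
    # Column 1 holds P(class 1); predict 1 whenever it clears the cut
    return (model2.predict_proba(df)[:, 1] > cut).astype(int)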

# Predict
new_predictions = predict_threshold(df=test_pool2,
                                    threshold=0.01,
                                    rate_type="FNR")

# Standard predictions
normal_predictions = model2.predict(test_pool2)

That same observation at index 82, previously classified as negative (0) with a 63% probability, is now classified as positive (1).

That same observation #82 is now a positive. Image by the author.

Here is the confusion matrix with the standard 50% threshold.

# Confusion Matrix 50% standard threshold
pd.DataFrame( confusion_matrix(y_true=y_test, y_pred=normal_predictions) )
Classification 50% threshold. Image by the author.

And this is the new classification with the updated threshold.

# Confusion Matrix 1% of false negatives allowed threshold
pd.DataFrame( confusion_matrix(y_true=y_test, y_pred=new_predictions) )
Classification 85.8% threshold. Image by the author.

Observe the bottom-left cell [true=1, pred=0, the false negatives] in both confusion matrices. The first one shows one false negative: a person who actually has cancer but whom the model classified as negative. That problem is solved with the new threshold, where there are no false negatives. The flip side is that the number of false positives also increased, by one. So it is all about trade-offs, like many things in Data Science.

The FPR (Type I error) and the FNR (Type II error) trade off against each other: as you push one down by moving the threshold, the other will tend to go up.
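As a sanity check, the two rates can be computed directly from the confusion matrices. This helper is mine, not from the original post, and follows the post's convention that class 1 is the positive (cancer) class:

from sklearn.metrics import confusion_matrix

def error_rates(y_true, y_pred):
    # sklearn convention: rows are true classes, columns are predicted classes
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    fpr = fp / (fp + tn)   # Type I error: negatives wrongly flagged as positive
    fnr = fn / (fn + tp)   # Type II error: positives missed by the model
    return fpr, fnr

print(error_rates(y_test, normal_predictions))
print(error_rates(y_test, new_predictions))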

The same method can be applied to decrease the FPR, if a very low number of false positives is what your project requires.
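For example, asking CatBoost for the cut that keeps the False Positive Rate at or below 1% mirrors the earlier call (this line is my illustration, using the select_threshold function already imported above):

# Finding the threshold that tolerates at most 1% of false positives
print(select_threshold(model2, test_pool2, FPR=0.01))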

In summary, what we have learned in this post was:

  • The default cut-off threshold for classification is a probability of 50%.
  • This number can be tweaked to decrease the number of false positives or false negatives.
  • FPR (Type I error) and FNR (Type II error) trade off against each other. Decreasing one will tend to increase the other.
  • The catboost package (select_threshold) can calculate the probability cut-off for classification.
  • Ex: predict_threshold(test_pool2, threshold=0.01, rate_type="FNR")

If you liked this content, follow my blog or find me on LinkedIn.

Become a Medium member using this referral code (part of your subscription will come to me and motivate me to keep creating content).


