
A Practical Approach to Evaluating Positive-Unlabeled (PU) Classifiers in Real-World Business Analytics | by Volodymyr Holomb | Mar, 2023



Made by DALL-E-2 according to the author’s description

As businesses increasingly apply machine learning to the data they collect, one challenge that arises is the presence of positive-unlabeled (PU) datasets. These datasets contain only a small portion of labelled data, with the remaining samples being unlabeled. While unlabeled samples are typically treated as negative, some of them may in fact be positive. PU datasets arise in various business contexts, such as predicting customer churn or upsell opportunities, forecasting sales, and detecting fraud.

Evaluating machine learning algorithms on PU datasets can be difficult because traditional metrics may not accurately reflect the model’s performance. For example, simply holding out the positive samples for testing and adding unlabeled entries as the negative class can result in a highly skewed confusion matrix inflated by false positives. This happens when the model correctly flags unlabeled positives in the testing set: because those samples carry negative labels, the predictions are counted as false positives.

To address this issue, our team adopted a practical approach that estimates standard binary classification metrics on PU datasets by using information about the expected frequency of positive samples. Our approach uses the prior probability of the positive class (estimated while fitting the self-learning classifier) to adjust the false positives and true positives observed on the test set. This enables a more accurate evaluation of the model’s performance on PU datasets, even when the positive class is significantly underrepresented.

To demonstrate the efficacy of our approach and run an experiment in a controlled setting, we first created a synthetic binary classification dataset using scikit-learn’s make_classification function. The positive samples represent the minority class in the data, and a PU learning scenario is simulated by randomly selecting a subset of the positive samples and removing their labels.
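A minimal sketch of such a setup is shown below; the specific parameters and the choice to hide half of the positive labels are ours for illustration, not necessarily the author’s settings.

```python
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.default_rng(42)

# Imbalanced binary dataset: the positive class is the minority class.
X, y = make_classification(
    n_samples=10_000,
    n_features=20,
    weights=[0.6, 0.4],   # ~40% positives, in line with the churn example below
    random_state=42,
)

# Simulate the PU setting: hide the labels of a random half of the positives.
y_pu = y.copy()
pos_idx = np.flatnonzero(y == 1)
hidden = rng.choice(pos_idx, size=len(pos_idx) // 2, replace=False)
y_pu[hidden] = 0   # these positives now look like (unlabeled) negatives
```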

In a real-world business scenario, the dataset typically comes with such a fixed ratio of labelled to unlabelled entries. For example, a dataset used to predict customer churn for the coming year may contain labelled customers from the previous year who did not sign a new yearly contract, as well as current customers who share the churned customers’ characteristics but have not yet churned. In this case, the dataset may contain up to 40% churned customers, but only half of them will be labelled as such (yielding an observed annual churn rate of 20%).
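To make that arithmetic explicit, here is a back-of-the-envelope sketch under the usual “selected completely at random” assumption; the variable names are ours, not the author’s.

```python
# True share of churners in the data and the fraction of them that are labelled.
prior = 0.40          # pi: true probability of churn
label_freq = 0.50     # c: P(labelled | churner)

# Observed rate of labelled churners, i.e. the visible annual churn rate.
observed_churn_rate = prior * label_freq                                   # 0.20

# Probability that an "unlabeled" customer is in fact a churner, P(y=1 | s=0).
p_churn_given_unlabeled = prior * (1 - label_freq) / (1 - observed_churn_rate)  # 0.25
```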

Image by the author

We then split the data into training and testing sets using the train_test_split function. The features X and a pseudo-labelled version of the target variable, y_pu, are passed to the classifier for training. To evaluate the classifier’s performance, we compute standard machine learning metrics such as accuracy, precision, and recall on the unlabeled version of the testing set, and then compare them with the corresponding metrics computed on the original labelled version.
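A sketch of that evaluation loop, continuing the synthetic example above and using a plain scikit-learn gradient boosting model as a stand-in for the author’s self-learning classifier:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Keep the true labels y alongside the PU labels y_pu for the comparison.
X_train, X_test, y_pu_train, y_pu_test, y_train, y_test = train_test_split(
    X, y_pu, y, test_size=0.3, random_state=42, stratify=y_pu
)

# Any probabilistic classifier works as a stand-in here.
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_pu_train)
y_pred = model.predict(X_test)

# Compare the same predictions against the PU labels and the true labels.
for name, target in [("PU labels", y_pu_test), ("true labels", y_test)]:
    print(
        f"{name}: accuracy={accuracy_score(target, y_pred):.3f} "
        f"precision={precision_score(target, y_pred):.3f} "
        f"recall={recall_score(target, y_pred):.3f}"
    )
```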

Below we provide a code snippet that demonstrates the implementation of our proposed approach for evaluating classifier performance on PU datasets.

Our compute_confusion_matrix function first determines the size of the testing data and identifies the indices of the labelled positive samples in the training set. It then obtains the model’s probability estimates for those samples and computes their mean, which represents the probability that a positive sample is labelled.

Next, the function applies the fitted ImPULSE model to predict the probabilities of the positive class for the testing data and builds a confusion matrix using scikit-learn’s confusion_matrix function. If the model’s prior probability of the positive class is greater than zero, the function adjusts the confusion matrix to account for the potential presence of unlabeled positive samples in the testing data: it estimates the expected number of false positives and true positives due to unlabeled entries and adjusts the matrix accordingly.

To ensure that the resulting confusion matrix matches the size of the testing data, the function rounds and rescales it, adjusting the number of true negatives if needed.
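Putting these steps together, a possible implementation might look like the following. This is our reconstruction from the description rather than the author’s code: the ImPULSE model is replaced by any classifier exposing predict_proba, the class prior is passed in as an argument (in the original it is estimated while fitting the self-learning classifier), and the adjustment formula shown here, a Bayes-rule reallocation of the apparent false positives under the “selected completely at random” assumption, may differ in detail from the original.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def compute_confusion_matrix(model, X_train, y_pu_train, X_test, y_pu_test, prior):
    """Confusion matrix on a PU test set, adjusted for hidden positives (sketch)."""
    n_test = len(y_pu_test)

    # c = P(labelled | positive): mean predicted probability of the labelled
    # positives in the training set (an Elkan & Noto style estimate).
    pos_idx = np.flatnonzero(y_pu_train == 1)
    c = model.predict_proba(X_train[pos_idx])[:, 1].mean()

    # Plain confusion matrix against the PU labels of the test set
    # (0.5 threshold on the predicted positive-class probability).
    y_pred = (model.predict_proba(X_test)[:, 1] >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_pu_test, y_pred).ravel()

    if prior > 0:
        # Expected share of hidden positives among unlabeled samples, P(y=1 | s=0).
        p_hidden = prior * (1 - c) / (1 - prior * c)

        # Some of the apparent false positives are in fact unlabeled positives:
        # move their expected number into the true-positive cell.
        hidden_tp = fp * p_hidden
        tp, fp = tp + hidden_tp, fp - hidden_tp

    # Round the adjusted counts and absorb any rounding drift into the
    # true-negative cell so the matrix still sums to the test-set size.
    cm = np.rint([[tn, fp], [fn, tp]]).astype(int)
    cm[0, 0] += n_test - cm.sum()
    return cm


# Example usage, continuing the sketch above (the prior is assumed to be
# known or estimated during training of the self-learning classifier):
cm_adjusted = compute_confusion_matrix(
    model, X_train, y_pu_train, X_test, y_pu_test, prior=0.4
)
```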

After obtaining the adjusted confusion matrix, we can use it to calculate standard machine learning metrics and get as accurate a picture of the model’s performance as the data allows.
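Continuing the sketch above, where cm_adjusted is the matrix returned by compute_confusion_matrix, the usual metrics follow directly from the adjusted counts (scikit-learn’s ravel() ordering is tn, fp, fn, tp):

```python
# Unpack the adjusted confusion matrix and derive the standard metrics.
tn, fp, fn, tp = cm_adjusted.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```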

You can find the corresponding demo notebook on Jovian and the full code in the GitHub repo.

We have proposed a practical approach for evaluating machine learning models on positive-unlabeled (PU) datasets, which are common in business scenarios. Traditional evaluation metrics may not accurately reflect a model’s performance on such data. Our approach estimates standard binary classification metrics by using the prior probability of the positive class, enabling a more accurate evaluation of the model’s performance.



