
Rethinking Survival Analysis: How to Make Your Model Produce Survival Curves

by Marco Cerliani | Nov 2022



Time-to-Event Forecasting with Simple ML Approaches

Photo by Markus Spiske on Unsplash

In data-driven companies, time-to-event applications play a crucial role in decision-making (often a bigger one than we might imagine). Time-to-event analysis refers to all the techniques used to measure the time that elapses until some event of interest happens. This straightforward definition already outlines the benefits of developing time-to-event applications in business contexts (and beyond).

Time-to-event analysis originated in the medical field, to answer questions like: “how long do the individuals under analysis live?”. For this reason, the terms survival analysis and time-to-event analysis are usually used as synonyms. Nowadays, with the large-scale adoption of machine learning, it’s common to find survival methodologies applied in companies outside the medical/clinical sector. A manufacturer may be interested in estimating the expected life of its engines; a service provider may need to compute the expected lifetime of its customers; a financial institution may evaluate a borrower’s insolvency risk over time.

Practically speaking, there is a proper set of methodologies for modeling a time-to-event problem. From classical linear statistical methods, through more sophisticated machine learning approaches, up to cutting-edge deep-learning solutions, many survival frameworks have been released. All of them are valuable, but they must respect the assumptions of survival modeling theory, which can limit their adaptability in real use cases. For these reasons, a convenient way to deal with survival analysis is to treat time-to-event modeling as a classification problem. The idea is not new and can also be found in these two works [1], [2].

In this post, we propose a generalization of that idea to carry out survival analysis with predictive capabilities. We model the time elapsed between a starting point and the event of interest as a set of binary classification problems. With simple post-processing, we can then retrieve reliable and robust individual survival curves. We can do this with our favorite classification algorithm, searching parameters as usual, and optionally calibrating the outcomes to make them more trustworthy.

Arranging the data at our disposal to implement a predictive survival application is straightforward and doesn’t require particular effort. We need some input features (numerical or categorical) and a raw target, as in a standard tabular regression/classification task. In this context, the target represents how much time elapsed from the start of monitoring to the occurrence of the event.

Event time grid (on the left). Sorted event time grid (on the right). [image by the author]

Let’s imagine being a company that offers an online subscription service. We may be interested in computing the expected lifetime of our customers at subscription time. In other words, when a new customer lands on our platform and subscribes to our services, we would like to know how long she/he will remain our client. We can carry out this task by developing a survival approach that outputs probability survival curves (one for each customer). Survival curves are sequences of monotonically decreasing probabilities: for each time step, a number between 0 and 1 states the likelihood that the event of interest (the end of the subscription, in our case) has not yet happened in that particular time range.

A graphical representation of survival function. [image by the author]
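For reference, the survival function of classical survival analysis is S(t) = P(T > t): the probability that the random event time T exceeds t. By construction it starts near 1 and can only decrease as t grows, which is exactly the monotonicity we will enforce on our model outputs later on.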

We simulate some numerical input features and a target representing how long individuals remain our customers after their first subscription. In our simulation, most customers leave soon after their engagement (left side of the histogram below). This is a realistic dynamic for most companies, where many clients churn after a short while. At the same time, a group of loyal subscribers keeps using our services (right side of the histogram below). In our case, we cap the maximum observable subscription time at 700 periods (let’s say days). Bounding the observation horizon is mandatory for our approach, since it requires a finite number of temporal bins.

Event time distribution (on the left). Binary label proportion (on the right). [image by the author]
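As a minimal sketch of such a simulation (the post does not show its data-generating process, so the shapes and distributions below are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(33)
n_samples, n_features, max_time = 5_000, 10, 700

# Illustrative numerical input features.
X = rng.normal(size=(n_samples, n_features))

# Right-skewed event times: many early churners plus a small loyal tail
# that reaches the maximum observable horizon of 700 periods.
event_time = np.clip(
    rng.exponential(scale=150, size=n_samples), 1, max_time
).astype(int)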

We start by binning the leaving times into intervals of regular length (bins). For each customer under analysis, we end up with a categorical target whose number of unique classes equals the number of bins. At this point, we can transform the target with one-hot encoding, obtaining a multidimensional binary target of zeros and ones, where the single one identifies the temporal range in which each customer leaves (the “leaving bin”). As a final step, we replace with ones all the zeros to the left of the leaving bin. This step gives the target the temporal structure we want to model: ones mark the ranges in which a customer is still active, while zeros mark the ranges in which she/he has already left.

One-hot encoded binned event time (on the left). Cumulative one-hot encoded binned event time (on the right). [image by the author]
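A minimal sketch of this target construction, continuing the simulated data from above (the bin width, the exact handling of the leaving bin itself, and the train/test split are our assumptions, not details from the post):

from sklearn.model_selection import train_test_split

bin_size = 20                    # assumed bin width, in periods
n_bins = max_time // bin_size    # 700 // 20 = 35 temporal bins

# Index of the bin in which each customer leaves; customers still active
# at the horizon (event_time == max_time) fall just beyond the grid.
leaving_bin = event_time // bin_size

# Cumulative binary target: y[i, t] = 1 while customer i is still
# subscribed (all the bins before the leaving bin), 0 once she/he has
# left. Equivalent to left-filling the one-hot encoded leaving bin.
y = (leaving_bin[:, None] > np.arange(n_bins)[None, :]).astype(int)

# Assumed split; the post does not show how the data were partitioned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33
)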

Now we have all that we need in the correct format: a set of features and a multidimensional binary target. In other words, we simply have to solve a multilabel binary classification task. One possibility is to use a native scikit-learn methodology: ClassifierChain.

from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression

# One logistic regression per temporal bin, linked in a chain; cv=5 feeds
# each link the out-of-fold predictions of the previous bins rather than
# their true labels, reducing overfitting along the chain.
model = ClassifierChain(
    LogisticRegression(random_state=33, max_iter=2000), cv=5
)
model.fit(X_train, y_train)
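The introduction also mentioned the option of calibrating the outcomes. One way this could look (wrapping the base estimator in scikit-learn’s CalibratedClassifierCV is our assumption, not necessarily the author’s setup):

from sklearn.calibration import CalibratedClassifierCV

# Each per-bin classifier is calibrated on held-out folds before entering
# the chain; method="sigmoid" (Platt scaling) is the common alternative.
calibrated_model = ClassifierChain(
    CalibratedClassifierCV(
        LogisticRegression(random_state=33, max_iter=2000),
        method="isotonic", cv=3,
    ),
    cv=5,
)
calibrated_model.fit(X_train, y_train)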

With a classifier chain, we model our multioutput classification target as separate but dependent binary classification tasks. We say dependent because the output of each step is concatenated with the initial features and used as input for the next classifier in the chain.
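To make the “dependent” part concrete, here is roughly what the k-th link of the chain is trained on (a sketch of ClassifierChain’s mechanics, reusing the arrays defined above; with cv=None the true previous labels are used, while cv=5 replaces them with out-of-fold predictions):

k = 3  # an arbitrary position along the chain, for illustration

# The k-th classifier sees the original features augmented with the
# targets (or out-of-fold predictions) of the previous k temporal bins.
X_augmented = np.hstack([X_train, y_train[:, :k]])
print(X_augmented.shape)  # (n_train_samples, n_features + k)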

After the training phase, we end up with a set of dependent binary classifiers, each providing a probability that is one piece of the final individual survival curves. These probabilities are guaranteed to lie between 0 and 1, but nothing guarantees the monotonicity constraint proper to survival functions: the probability of survival in the first temporal bin must be at least as high as the ones obtained in the following temporal ranges. To satisfy this requirement, we post-process the probabilities produced by our classifier chain at the customer level.

from sklearn.isotonic import IsotonicRegression
from joblib import Parallel, delayed

# Our targets are 1 while a customer is still active, so each predicted
# curve is a survival curve: it may only decrease over the temporal bins.
isoreg = IsotonicRegression(y_min=0, y_max=1, increasing=False)
x = np.arange(0, n_bins)

# Raw per-bin probabilities P(still subscribed in bin t), one row per customer.
proba = model.predict_proba(X_test)

# Project each customer's curve onto the closest monotonic sequence.
proba = Parallel(n_jobs=-1, verbose=1)(
    delayed(isoreg.fit_transform)(x, p)
    for p in proba
)
proba = np.asarray(proba)

By simply applying an isotonic regression to each customer’s probabilities, we obtain monotonic survival curves as final outputs.

Samples of predicted survival curves. [image by the author]
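The curves also answer the question we started from: the expected customer lifetime. For a non-negative, binned event time, the expected lifetime is approximately the sum of the survival probabilities over the bins, scaled by the bin width. A minimal sketch, reusing the proba array and the assumed bin_size from above:

# E[T] ≈ sum over bins of S(t) * bin_width, on a bounded horizon.
expected_lifetime = proba.sum(axis=1) * bin_size  # one estimate per customer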

In the end, we can measure errors as in standard supervised tasks, using the metrics of interest: for example, the Brier score or the more standard logistic loss, computed per customer over the temporal bins.

from sklearn.metrics import brier_score_loss, log_loss

# One score per customer, comparing the binary survival labels with the
# predicted survival curve across the temporal bins.
brier_scores = Parallel(n_jobs=-1, verbose=1)(
    delayed(brier_score_loss)(true, pred, pos_label=1)
    for true, pred in zip(y_test, proba)
)

logloss_scores = Parallel(n_jobs=-1, verbose=1)(
    delayed(log_loss)(true, pred, labels=[0, 1])
    for true, pred in zip(y_test, proba)
)

Brier score distribution of test data (on the left). LogLoss distribution of test data (on the right). [image by the author]

In this post, we introduced a simple and effective method to produce survival curves with standard machine learning classifiers of our choice. We showed that we can predict survival curves by modeling the observed event times as sequences of binary targets. With straightforward probability post-processing, we obtained reliable probabilistic outputs. The proposed methodology is easily generalizable and applicable in many contexts (censored observations can also be included, when their addition can reasonably improve performance).

