Better Churn Prediction — using survival analysis | by Iyar Lin | Oct, 2022


Answering the “when” question

Photo by Markus Spiske on Unsplash

In a previous post I made the case that survival analysis is essential for better churn prediction. My main argument was that churn is not a question of “who” but rather of “when”.

The “when” question asks: when will a subscriber churn? Put differently, how long does a subscriber stay subscribed on average? Answering it lets us tackle one of the most important questions: what is the average subscriber lifetime value?

Let’s roll up our sleeves and dive right in. The survival curve S(t) measures the probability that a subscriber will “survive” (not churn) until time t after starting their subscription. For example, S(3)=0.8 means a subscriber has an 80% chance of not churning by the 3rd month of subscription.

The most common way of estimating S(t) is the Kaplan-Meier curve, whose formula is given by:

$$S(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

where t_i are the times at which at least one subscriber churned, d_i is the number of subscribers who churned at time t_i, and n_i is the number of subscribers who were still subscribed just before t_i (that is, whose observed time is at least t_i). We can think of the term d_i/n_i as the churn rate at time t_i.
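As a minimal sketch of that definition, the estimator can be written in a few lines of plain Python. The function name `kaplan_meier`, its arguments, and the toy data are assumptions made for illustration; they are not from the original post or from any library.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(t_i, S(t_i)), ...] for every time t_i with at least one churn.

    durations[k] is how long subscriber k has been observed;
    churned[k] is True if that subscriber churned at durations[k],
    False if they are still subscribed.
    """
    # d_i: number of churn events at each distinct time
    churn_counts = Counter(t for t, c in zip(durations, churned) if c)

    curve = []
    survival = 1.0  # S(t) = 1 before the first churn
    for t_i in sorted(churn_counts):
        d_i = churn_counts[t_i]
        # n_i: subscribers observed for at least t_i, i.e. still at risk
        n_i = sum(1 for t in durations if t >= t_i)
        survival *= 1 - d_i / n_i
        curve.append((t_i, survival))
    return curve


# Made-up data: four subscribers, two of whom churned (at months 1 and 3)
toy_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, False])
print(toy_curve)
```

Subscribers who have not churned yet still count towards n_i for every churn time up to their observed duration, which is how the estimator uses partial information from them.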

To illustrate, let’s calculate the survival curve for the following subscriber data:

Image by author

The column t denotes how long a user has been subscribed as of today. If the user churned, it is the time until they churned.

We have 2 times at which churn events happened: t_i = {2,6}.

For t < 2 we have S(t)=1 since no one churned up to that point.

At t_1=2 we have d_1=2 (subscribers 3 and 6) and n_1=5 (all subscribers but 4). Using the above formula we get:

$$S(2) = 1 - \frac{d_1}{n_1} = 1 - \frac{2}{5} = 0.6$$

At t_2=6 we have d_2=1 (subscriber 2) and n_2=1 (again, just subscriber 2).

We thus have:

$$S(6) = S(2) \left(1 - \frac{d_2}{n_2}\right) = 0.6 \left(1 - \frac{1}{1}\right) = 0$$
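The two steps above amount to a running product over the churn times. Here is a small sketch that uses only the (t_i, d_i, n_i) values stated in the text; the variable names are my own.

```python
# (t_i, d_i, n_i) taken from the worked example:
# at t=2, 2 of 5 at-risk subscribers churn; at t=6, 1 of 1 churns.
rows = [(2, 2, 5), (6, 1, 1)]

survival = 1.0  # S(t) = 1 for t < 2
curve = []
for t_i, d_i, n_i in rows:
    survival *= 1 - d_i / n_i  # multiply by the share that did not churn
    curve.append((t_i, survival))

for t_i, s in curve:
    print(f"S({t_i}) = {s:.1f}")
```

Each factor is simply one minus the churn rate at that time, so the curve can only stay flat or step down.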

Let’s plot that curve:

Image by author

One thing to notice here is that at every point along the curve we only consider subscribers who survived up to that point. If a subscriber joined very recently (e.g. subscriber 4), they won’t play a major role in the calculation.

In practice you’d be better off using the survival curve implementation in the R survival package or the Python lifelines library.

So why go through the hassle of calculating S(t) in the first place? It turns out that the expected lifetime is the area under the survival curve (I won’t go into proving that here).

So in our example above:

$$\int_0^{6} S(t)\,dt = 1 \cdot (2 - 0) + 0.6 \cdot (6 - 2) = 4.4 \text{ months}$$

If a user’s monthly bill is, for example, $10, then their expected LTV (lifetime value) is $10 × 4.4 months = $44.
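Because the Kaplan-Meier curve is a step function, the area under it is a sum of rectangles. The sketch below assumes the curve is given as (t_i, S(t_i)) pairs and that it reaches 0 at the last churn time, as it does in this example; the helper name `expected_lifetime` and the `monthly_bill` variable are hypothetical.

```python
def expected_lifetime(curve):
    """Area under a step survival curve given as [(t_i, S(t_i)), ...].

    S(t) is taken to be 1 before the first t_i and constant between steps.
    If the curve does not end at 0, the result only covers time up to the
    last t_i and therefore understates the true expected lifetime.
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t_i, s_i in sorted(curve):
        area += prev_s * (t_i - prev_t)  # rectangle: height S, width of interval
        prev_t, prev_s = t_i, s_i
    return area


# Survival curve from the example: S(2) = 0.6, S(6) = 0
example_curve = [(2, 0.6), (6, 0.0)]
months = expected_lifetime(example_curve)

monthly_bill = 10  # the $10 plan used in the text
ltv = monthly_bill * months
print(f"expected lifetime: {months:.1f} months, LTV: ${ltv:.0f}")
```

In real data the curve often has not dropped to 0 by the end of the observation window, in which case this number is a lower bound rather than the full expected lifetime.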

In this post we’ve seen how survival curves let us answer the “when” question: how long is the average subscription? We also saw that the answer can be used to estimate the dollar value of a subscriber.

Sometimes we may actually be interested in the “who” question as well. For example: “Which subscribers are most likely to churn within the first month of subscription?” In my next post I’ll show that survival curves help us answer that question better as well!

