
Using Propensity-Score Matching to Build Leading Indicators
by Jordan Gomes, March 2023



In a previous article, I talked about the Input > Output > Outcome framework, and how “output” is the central piece but not necessarily the easiest to define: you want it to be movable by your inputs, while at the same time having a causal link with your outcome.

User activation metrics fall under this category. “Activation” is the third stage of the Pirate Metrics framework designed by Dave McClure (the famous AAARRR framework: Awareness, Acquisition, Activation, Retention, Referral, Revenue). It is usually defined as the point where a user has passed the first set of frictions, started using your product, received some value from it, and is now more likely to be retained in the longer term.

Some examples of product activation metrics:
Loom: Sharing a loom¹
Zapier: Setting a zap¹
Zoom: Completing a Zoom meeting within 7 days of signup¹
Slack: Sending 2,000+ team messages in the first 30 days²
Dropbox: Uploading 1 file in 1 folder on 1 device within 1 hour²
HubSpot: Using 5 features within 60 days²

¹2022 Product Benchmarks from OpenView:
https://openviewpartners.com/2022-product-benchmarks/

²Stage 2 Capital: the science of scaling:
https://www.stage2.capital/science-of-scaling

Measuring activation is important because it helps you understand how well your product is resonating with new users and whether you are effectively getting them to become “active” users. It is the very first step toward user loyalty — this is the stage where you know if your users are likely to stick around for the long haul. If activation is low, it can indicate that there is a problem with the product or the onboarding process, and it may be necessary to make changes to improve the user experience and increase activation.

  • You want Activation to be a good predictor of Retention, but at the same time you want it to be simple enough, since it should be an easy first step for your users to take.
  • Basically, you are looking for the smallest action a user can take that showcases the product’s value to them, while still having a causal link with retention (however you define it).
  • As with any ‘leading’ indicator, the causality piece (“doing action Y leads to long-term retention”) is hard. You usually start with observational data, and traditional data analysis might not give you the full picture, as it can overlook confounding factors that can impact activation/retention.

Using a cohort analysis, you can start building some intuition around which user actions could be good candidates for your activation metric.

The idea is to:

  • Group your users based on when they signed up for your product
  • Separate them based on whether they made it to the retention stage or not
  • Look for the actions that are overwhelmingly done by the users who made it to the retention stage, but not so much by the users who didn’t.

Let’s say you run a fitness app. You start creating monthly cohorts, and you notice that 70% of users who upload at least one workout within the first week of signing up are still engaged with the app a year later, versus 40% of those who don’t. This can be a first idea for an activation metric.
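To make this concrete, here is a minimal pandas sketch of that cohort comparison. The file and column names (signup_date, uploaded_workout_first_week, retained_1y) are hypothetical placeholders for whatever your own tracking produces:

```python
# A minimal sketch of the cohort comparison above. The dataset and column
# names are hypothetical placeholders for your own tracking data.
import pandas as pd

users = pd.read_csv("users.csv", parse_dates=["signup_date"])
users["signup_month"] = users["signup_date"].dt.to_period("M")

# One-year retention rate per signup cohort, split on whether the
# candidate activation action was taken within the first week.
cohort_view = (
    users.groupby(["signup_month", "uploaded_workout_first_week"])["retained_1y"]
    .mean()
    .unstack("uploaded_workout_first_week")
)
print(cohort_view)  # e.g. roughly 0.40 vs 0.70 in the fitness-app example
```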

A prerequisite here is having an idea of which action to study. In the example above, you had to think of looking at who uploaded their workouts. This is where quant meets qual, and where your ‘user acumen’/common sense comes into play. Or your networking skills, if you want to ask for the help of other subject-matter experts.

Some advice:

  • You might want to come up with just a few ideas of potential actions rather than looking into too many of them, because, as the adage goes, “if you torture the data long enough, it will confess to anything” (Ronald H. Coase). The more actions you test, the more likely you are to find something, but at a high risk of it being a false positive. Sticking to what makes sense and is not too far-fetched is a good rule of thumb.
  • You might want to adopt a principled approach and only look at actions you believe you would actually be able to move. If you come up with something too complicated or niche, you might not be able to move it, which would defeat the purpose of the whole exercise.

With propensity score matching, you can confirm or refute your previous intuitions

Once you’ve identified your potential activation signals, the next step is to make sure they hold up. That’s where propensity score matching can come in handy: it helps you understand whether the correlation you found could actually be causation. It is not the only existing solution, and it does require a bit of knowledge about your users (which might not always be available), but it is relatively easy to implement and can give you more confidence in your results, at least until you triangulate further with more robust approaches such as A/B testing.

The idea behind propensity score matching is the following:

  • In order to find the causal link between taking the action and retention, ideally you would clone the users who took the action, have each clone not take the action, and compare the results.
  • Since this is not possible (yet?), the next best thing is to look inside your data and find users who are very similar (almost identical) to the users who took the action, but who didn’t take it.

Propensity score matching is a methodology that allows you to find those very similar users and pair them. Concretely speaking, it is about:

  • Training a model to predict the likelihood of your users taking the action you defined (their propensity).
  • Matching users based on that predicted likelihood (the matching part).

(Note: there are different ways to go about both steps, and some great guidelines are available online on how to select a model, which variables to include, which matching algorithm to use, and so on. For more information, see Caliendo and Kopeinig’s “Some Practical Guidance for the Implementation of Propensity Score Matching”.)
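As an illustration of the first step, here is a hedged sketch of a propensity model using scikit-learn’s logistic regression (the classic choice, though any well-calibrated classifier works), continuing from the users DataFrame above. The covariate names are hypothetical; in practice you want characteristics observed before the action that plausibly influence both the action and retention:

```python
# Step 1: model each user's propensity to take the candidate action.
# Covariate names are illustrative; use pre-action features that could
# drive both the action and retention (signup source, device, etc.).
from sklearn.linear_model import LogisticRegression

covariates = ["age", "came_from_referral", "device_ios"]
X = users[covariates]
y = users["uploaded_workout_first_week"].astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
users["propensity"] = model.predict_proba(X)[:, 1]
```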

Taking our fitness app example again:

  • You’ve identified that 70% of users who upload at least one workout within the first week of signing up are still engaged with the app a year later, versus 40% of those who don’t.
  • You train a model to predict the likelihood of a user uploading a workout within a week of signing up, and you find that this likelihood is very high for users who downloaded the app via a referral link from a large fitness website.
  • You rank your users by that likelihood and do a simple 1:1 matching (the highest-likelihood user who took the action is paired with the highest-likelihood user who didn’t, and so on), as sketched after this list.
  • Post-matching, you see the difference shrink considerably, but it remains large enough for you to consider the action a potential candidate for an activation metric!
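Here is a sketch of that rank-based 1:1 matching, continuing from the propensity model above. Note that greedy nearest-neighbour matching (often with a caliper on the score distance) is the more standard choice; the rank-based pairing is kept deliberately simple to mirror the description in the list:

```python
# Step 2: pair users rank-by-rank on the propensity score, then compare
# retention within the matched sample.
treated = users[users["uploaded_workout_first_week"]].sort_values(
    "propensity", ascending=False
)
control = users[~users["uploaded_workout_first_week"]].sort_values(
    "propensity", ascending=False
)

# 1:1 rank matching: the highest-propensity "taker" is paired with the
# highest-propensity "non-taker", and so on down the ranking.
n = min(len(treated), len(control))
effect = (
    treated["retained_1y"].head(n).mean()
    - control["retained_1y"].head(n).mean()
)
print(f"Post-matching retention gap: {effect:.1%}")
```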

Cohort analysis + Propensity score matching can help you isolate the impact of a specific action on user behavior, which is essential for defining accurate activation metrics.

But this methodology is not a panacea: it comes with a set of assumptions, and you will need to fine-tune it and validate it to make sure it works for your use case.

In particular, the efficacy of PSM depends heavily on how well you can model the self-selection. If you are missing key features and the bias from unobserved characteristics is large, then the estimates from PSM can be heavily biased and not very helpful.
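One common way to validate the matching, continuing the sketch above, is to check covariate balance between the matched groups via standardized mean differences; absolute values below roughly 0.1 are conventionally taken as acceptable balance:

```python
import numpy as np

# Standardized mean difference (SMD) for each covariate, computed on the
# matched sample; |SMD| < 0.1 is a common rule of thumb for balance.
def smd(a, b):
    pooled_std = np.sqrt((a.var() + b.var()) / 2)
    return (a.mean() - b.mean()) / pooled_std

matched_treated, matched_control = treated.head(n), control.head(n)
for cov in covariates:
    print(cov, round(smd(matched_treated[cov], matched_control[cov]), 3))
```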

All this being said, using this methodology, even in an imperfect way, can help you take a more data-driven approach to metric selection and get you started on ‘what to focus on’, until you reach the stage of running A/B tests and gain a better understanding of what drives long-term success.

Hope you enjoyed reading this piece! Do you have any tips you’d want to share? Let everyone know in the comment section!

And if you want to read more from me, here are a few other articles you might like:


