
E-Commerce: Who Is Likely to Convert? | by Anastasia Reusova | Jan, 2023



Photo by Robert Katzki on Unsplash

Product manager, analyst, growth hacker, designer, or marketer — understanding user personas is invaluable in any of these roles. While third-party tools can be helpful for gaining initial insights, having your hands on the raw data is what lets you build accurate models of user behavior. This blog post discusses a methodology for building a relative scoring system for individual users, which you can use to identify the most active users in your e-commerce business and keep them engaged. I have already shared how we approached in-app user journey analytics in e-commerce — feel free to check those posts out:

Looking for Power User Journeys in E-commerce

Find Your Power Users Using BigQuery Firebase Data

In a nutshell, you want to get your hands busy understanding your users’ behavior. Third-party tools are great for exploratory purposes, but they often lack flexibility, and getting access to raw data through them can be challenging — and raw data is ultimately what you want. Once you have it, you can start modeling user behavior and use those models across the systems in your e-commerce business that are not directly connected — product and CRM (by creating a comprehensive user journey), product and marketing, operations, post-order service, and so on. You can think of these models as a meta-layer that can be fed into your systems.

If I have not managed to interest you yet — user-generated signals are a type of implicit feedback that can be fed into a personalization layer, which, as is well known by now, can help increase business revenue.

When I say “models”, I don’t mean machine learning models. In fact, if machine learning is not part of the organizational DNA, it may be hard to advocate for the use of ML. There are multiple reasons for that; to highlight a few: you don’t have a big data science team, so new projects may get backlogged or take longer than you wish; the business may not understand the limitations and results of ML models; and your current infrastructure may not be ready for deploying and maintaining ML solutions.

Building something simple as a starter, based on common business knowledge, may pave the way to more complex and intricate approaches — once your assumptions about user behavior have been proven correct or incorrect and iterated upon.

You can think about finding and working with signals as feature engineering. For example, in a typical e-commerce funnel, the important steps include:

  1. Homepage landing
  2. Product listing page (PLP) view
  3. Product detail page (PDP) view
  4. Add to cart
  5. Cart view
  6. Checkout start
  7. Purchase
As users go down the funnel, their conversion rate increases, while the share of users who go deeper decreases. Image by author.

It’s easy to imagine how those events form a funnel. It’s also fair to say that some events are rarer than others. For example, a typical e-commerce user conversion rate can be benchmarked at around 2.5–3% in business-as-usual times. A purchase, being the ultimate “destination” of the funnel, is therefore also an indicator of the funnel’s efficiency. Going bottom-up, fewer users will start a checkout than viewed a cart. You’ll also have some “bouncer” users who only visit the homepage and take no “meaningful” actions on the website, such as PLP or PDP views. It wouldn’t be a far reach to say that the rarer an event is in the high-level e-commerce funnel, the more weight it carries as a purchase signal. Moreover, if a user has entered the bottom of the funnel and simply dropped off, we may have a strong reason to reach out and try to facilitate movement down the funnel.

As users go down the funnel, what essentially happens in terms of conversion is that the denominator shrinks while the numerator stays essentially the same. Everyone who makes a purchase has to go through the funnel, but not everyone who starts at the first step completes a transaction. For example, if 30 purchases come from 1,000 homepage visitors, the homepage-to-purchase rate is 3%; if 100 of those visitors started checkout, the checkout-to-purchase rate is 30%: same numerator, smaller denominator.

Once you’ve collected the data on a user level, you can aggregate it like so:

Aggregating user signals from the e-commerce funnel. Image by author.
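
Here is a minimal pandas sketch of that aggregation step. It assumes a flat event log with a user ID and an event name per row, roughly what a Firebase/BigQuery export yields; the column and event names are illustrative placeholders.

```python
import pandas as pd

# Hypothetical flat event log, one row per tracked event.
# Adapt the schema to whatever your analytics export actually looks like.
events = pd.DataFrame({
    "user_id": ["A", "A", "A", "B", "B", "C"],
    "event": ["plp_view", "pdp_view", "cart_view",
              "pdp_view", "add_to_cart", "checkout_start"],
})

# One row per user, one column per funnel signal, values are event counts.
signals = events.pivot_table(index="user_id", columns="event",
                             aggfunc="size", fill_value=0)
print(signals)
```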

In the table above, User A looks like a window shopper who is nonetheless engaged — 50% of their PLP views turn into PDP views. They have not added anything to a cart, but they may have had something there from previous sessions, which is indicated by 1 cart view.

User B is likely a customer who is actively trying to make a choice. They may be preparing their cart for a transaction but have not started checking out yet.

User C went much deeper into PLP browsing. Like User B, they show signs of being ready to commit to a purchase and even started checking out once. However, they did not complete a transaction. Possibly, they dropped off in search of coupon codes or better deals elsewhere. Or maybe, given the high PLP view count, they were deep into searching but did not manage to find the products they were interested in.

User D probably knew what they wanted, which is indicated by a relatively high ratio of PLP to PDP views and a high ratio of PDP views to add-to-carts. They viewed their cart multiple times, reviewing it, but somehow never started the checkout. This could be a perfect candidate for an abandoned-cart campaign.

User E is exceptional. They are most probably a returning customer who came back shortly after another session. They skipped past the homepage, PLPs, PDPs, and add-to-carts, yet they were actively viewing their cart. Unless it’s a tracking bug, you may want to evaluate their cart and offer them an incentive to check out.

To make analysis easier, instead of working with raw numbers you can normalize them and score users based on:

  1. an individual signal (signal X)
  2. all the signals combined, via an overall score

The frequency and the timeframe of the calculations depend on the business model and the action you expect to take. For example, if you know that users take a month to consider a purchase, updating the segments on a 30-day rolling basis makes sense. If you want to communicate with your customers daily, daily morning updates could be something to consider. And if you are in a higher-frequency business where the user makes the critical decision within a shorter period, adjust your update cadence accordingly.

This approach, as simple as it is, may help you re-engage with more active customers before they slip through the cracks of your funnel — while reaching out to them still matters.

To describe customers’ behavior more holistically, it makes sense to build two scoring systems: one for individual signals and one for overall behavior.

Individual Signals

Talking about individual signals is easier, so let’s start with them. Individual signals are simply events that are positively associated with conversion. From what we discussed before, these could be the number of daily PLP views, PDP views, add-to-carts, etc.

Understanding how a user scores on each of them can help identify which parts of the conversion funnel a user did not cover. As in the examples above, different gaps may require different actions from the business if we still want to engage the user.

Without a machine learning approach or any expert input, customer signals can be scored relative to the overall customer base. One of the simpler approaches is to compute Q33 and Q66 (the 33rd and 66th percentiles) of the customer signal distribution and assign each customer to a group: below Q33, between Q33 and Q66, or above Q66. What you get as a result is a very simple segmentation with three segments: below average, average, and above average customers in terms of their activity on the app. Additionally, you can single out customers who did not signal at all (0-score users).
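
A minimal numpy/pandas sketch of this tercile segmentation, with made-up signal values, might look like this:

```python
import numpy as np
import pandas as pd

# Daily PDP views per user; the numbers are illustrative only.
pdp_views = pd.Series([0, 0, 1, 2, 3, 5, 8, 13, 21], index=list("ABCDEFGHI"))

# Set 0-score users aside so they don't drag the cutoffs down
# (a judgment call; see the considerations below).
active = pdp_views[pdp_views > 0]
q33, q66 = np.percentile(active, [33, 66])

def segment(views: int) -> str:
    """Assign a user to a tercile-based activity segment."""
    if views == 0:
        return "no signal"
    if views < q33:
        return "below average"
    if views <= q66:
        return "average"
    return "above average"

print(pdp_views.map(segment))
```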

While rather simple, this approach comes with a few points to consider:

  • How can seasonality affect the distribution of signals? When you calculate Q33 and Q66, should you use same-day data or a rolling window? If your business has strong within-week seasonality, or business events that strongly impact user signals, you may want to smooth the signals with a rolling metric (see the sketch after this list).
  • Overall, do Q33 and Q66 make sense as cutoffs?
  • Do we need 3 segments, or would we like to go more granular?
  • What do we do with 0-score users? Should they be part of the scoring, or can we exclude them during benchmarking?
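
On the smoothing point above, here is a small sketch of computing rolling Q33/Q66 cutoffs. The data is synthetic, and the 7-day window is an assumption to be matched to your own seasonality.

```python
import numpy as np
import pandas as pd

# Synthetic per-user, per-day signal counts (made-up data for illustration).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=14).repeat(50)
daily_signals = pd.DataFrame({
    "date": dates,
    "pdp_views": rng.poisson(3, size=len(dates)),
})

# Per-day Q33/Q66 cutoffs, then a 7-day rolling mean to damp seasonal spikes.
cutoffs = (
    daily_signals.groupby("date")["pdp_views"]
    .quantile([0.33, 0.66])
    .unstack()                          # columns: 0.33 and 0.66
    .rolling(window=7, min_periods=1)
    .mean()
)
print(cutoffs.tail())
```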

Overall Signals

You can score a user on every signal and combine those into an overall score. Image by author.

The easiest way to aggregate user signals into one score is to assign a weight to every signal and add the weighted signals up.

Aggregating signals into one score can be a creative process. It’s important to note that each signal carries a different value when it comes to signaling the probability of conversion, because as users go deeper down the funnel, they usually demonstrate a stronger intent to convert. One way to assign weights is expert attribution, where a weight is agreed upon for each action such as product impressions, product clicks, or add-to-cart events. For example, a rare action like an add to cart would be assigned a higher weight, such as 50%, while a common action like a product impression would get a lower one, such as 10%.
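
A sketch of such an expert-weighted score; the weights here are illustrative placeholders, not recommendations, and should be agreed upon with your own domain experts:

```python
# Expert-assigned weights per funnel signal (illustrative values).
weights = {
    "plp_view": 0.05,
    "pdp_view": 0.10,
    "add_to_cart": 0.50,
    "checkout_start": 0.35,
}

def overall_score(user_signals: dict) -> float:
    """Weighted sum of a user's daily signal counts."""
    return sum(weights[event] * count for event, count in user_signals.items())

# 0.05 * 10 + 0.10 * 4 + 0.50 * 1 = 1.4
print(overall_score({"plp_view": 10, "pdp_view": 4, "add_to_cart": 1}))
```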

Another way of assigning weights is to look at historical data, taking into account how customer behavior and the seasonality of the business may affect the signaling potential of certain actions. We automated the weight assignment using the share of customers who performed a certain action on our app or website: exploratory data analysis showed that the rarer an event is, the higher the conversion rate associated with it.

Speaking of seasonality, some events may become more frequent during significant e-commerce events such as Black Friday, when more users engage in window shopping and use their cart as a wishlist.

To smooth out this effect, we decided to take a three-day rolling estimate of the share of users who showed any of the signals used in our funnel. This share represents the frequency of an event, so to quantify rarity, we simply subtract the percentage from 100%.

As an example, if 99% of users engage in PLP (product listing page) browsing, then the weight of the PLP impression signal would be 100% − 99% = 1%. If 30% of users add items to cart, then the weight of the add-to-cart signal would be 100% − 30% = 70%. And so on.

Once the weights are assigned, the weighted signals can be added up into a single score. This score has no fixed range and can fluctuate from day to day, which makes scores hard to compare across days. To avoid this, the score can be normalized to between 0 and 1.
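
Putting the last few steps together, here is a minimal sketch: rarity-based weights from the share of users showing each signal (a single snapshot here for brevity, where the article uses a 3-day rolling share), a weighted sum, and min-max normalization to [0, 1]. All numbers are illustrative.

```python
import pandas as pd

# Per-user signal counts, shaped like the aggregation table built earlier.
signals = pd.DataFrame(
    {"plp_view": [12, 3, 9, 7], "pdp_view": [5, 2, 1, 6],
     "add_to_cart": [0, 1, 0, 3], "checkout_start": [0, 0, 0, 1]},
    index=list("ABCD"),
)

# Share of users showing each signal at least once.
share = (signals > 0).mean()

# Rarity-based weight: 100% minus the share of users showing the signal.
weights = 1 - share

# Weighted sum per user, then min-max normalization to [0, 1]
# so scores are comparable across days.
raw = signals.mul(weights, axis=1).sum(axis=1)
score = (raw - raw.min()) / (raw.max() - raw.min())
print(score.sort_values(ascending=False))
```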

While this method may not be perfect, it generates a reasonable differentiation between customer groups in terms of conversion rate and revenue per user, and it can serve as a benchmark for other models, including data science models, which typically take more time to develop and tune.

Simple weighting of signals can help you get the overall score. Image by author.

Segmentation

Three segments are an easy starting point and will allow you to prioritize customer communications, for example. It is clear that if a highly active user did not convert within the period expected from the exploratory data analysis, we want to prioritize them in our comms and maybe offer a monetary reward. Below-average users may benefit from a lighter-touch approach instead.

When we implemented this approach to customer segmentation and mapped user conversion rates and revenue per user for each segment, the results were quite promising. The conversion rate (CR) results were expected; however, seeing higher revenue per converted user (RPU) was not. This could mean that more active users are more engaged with the product and ready to shop cross-category, increasing basket size by buying more items or items at a higher price point.

Active segments are related to higher conversions and revenue per user. Image by author.

Of course, you want to make this — as any — model actionable. The point of keeping it simple is that you can iterate fast and show business results, which will hopefully confirm you’re moving in the right direction.

In our case, we decided to start testing with CRM communications that did not promise anything extra to the customer; instead, we structured the communications around the time the user was last seen on the app. For us the timing is quite important, and it may be even more important in yours, as users may make their purchase decision even before they land on your app. You may reach similar conclusions from your own exploratory data analysis.

Ideally, your first iteration and every following one should be launched within an A/B test, so you can evaluate the real impact of reaching out to those customers one extra time. As your iterations progress, you can start increasing the reward per customer based on the expected uplift in average order value and conversion rate.

When running an A/B test, it is ideal to exclude the control group from the rest of the communications and expose the target group only to this campaign, to get the purest results possible, unless you can decouple the results of different campaigns. And because these users were active within a short period of time, pay additional attention to how they were targeted by CRM or marketing campaigns before you reach out to them again. One way to keep the split clean and reproducible is sketched below.
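
A common approach to such a clean, reproducible split is deterministic hashing of user IDs. A minimal sketch, where the salt value is a hypothetical example:

```python
import hashlib

def ab_group(user_id: str, salt: str = "reengage-test-1") -> str:
    """Deterministic 50/50 split. The salt isolates this experiment
    from other campaigns that also hash user IDs."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "target" if int(digest, 16) % 2 == 0 else "control"

print(ab_group("user_123"))
```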

Why not opt for a DS model?

There are a few reasons why it may be better to opt for a simple data model instead of a more complex data science model.

One reason is a lack of data. Data science models often require large amounts of data to be trained effectively. If you don’t have enough data, a simple model may be more appropriate because it will be less prone to overfitting and may be more likely to generalize to new data.

Another reason is interpretability. Simple models are often easier to understand and interpret, especially for people who are not data scientists. This can be especially important in situations where the results of the model need to be explained to non-technical stakeholders.

Finally, simple models can also be faster to deliver. Data science models can take a long time to train, especially if you have a lot of data. Simple models can often be trained and implemented much more quickly, which can be important if you need to get a solution in place in a short amount of time.

Overall, there are trade-offs to be considered when deciding between a simple data model and a more complex data science model. In some cases, a simple model may be more appropriate, while in other cases a more complex model may be needed to achieve the desired level of accuracy and performance.

