The Best SQL Template for Customer Lifetime Value | by Josh Berry | Aug, 2022

By Jessie Hobb On Aug 24, 2022

Reusable SQL for RFM and CLV Analysis

Photo by energepic.com: https://www.pexels.com/photo/black-payment-terminal-2988232/

Introduction

Over the past 10 years of my career, I’ve worked on Customer Lifetime Value (CLV) models with a variety of clients, industries, and datasets. There are many different ways to solve the problem, but this article is not about that. Instead, I want to explain the data setup, which is often neglected in the blogosphere.

In my last blog post, Lets go further with SQL, I talked about how excited I am about the potential for combining jinja and SQL, in order to leverage semantic abstractions that sit above the SQL code itself. This post is the first of many ideas that have been floating around in my head recently. My goal is to (1) show you that the data prep for a CLV model can be done in SQL, but more importantly, (2) give you a template that you can apply elsewhere, so you don’t have to actually write the SQL from scratch.

Getting started

The most commonly used probabilistic model for predicting CLV is the Beta Geometric/Negative Binomial Distribution model (often abbreviated BG/NBD). It has a popular cousin, the Pareto/NBD model. These models try to predict the future transactions of a given customer. The Gamma-Gamma extension model focuses on the monetary aspect of how much those transactions will be.

This year, hundreds, if not thousands, of data scientists across the world are going to build this kind of model.

How will they start? How would you start? If you’re being honest, the first step is probably to Google it, read a few blogs, and then follow along with your own data. If you can find a good enough example online, you can methodically manipulate your data in a way that is similar to the blog post, until you’ve got it close enough that you’re off and running.

A familiar pattern emerges

If you read enough of these blogs, you’ll begin to realize that there is a familiar pattern. All of them start with manipulating the data to a specific shape — often called an RFM table .

There are other methods to calculate CLV, and I’m not going to cover those. My main focus is the RFM table because not only is it a very popular approach, it is also very repeatable. If you’re not familiar with an RFM Table, check out my friend Olubukunola Akinsola’s blog post about it.

The basics are:

R — the most recent date a customer purchased from the business
F — how frequently does a customer purchases from the business
M — the monetary value of the customer’s purchase.

Choosing SQL

If you’ve read any of my past blog posts, then you already know that I’m passionate about utilizing SQL in order to take advantage of the data warehouse’s processing power. In order to create the RFM table, we need to aggregate the data from the raw transaction level, which is likely one of the largest tables in our database. While it is true that there are plenty of python solutions available, why bother going through the effort to offload millions or billions of transactions, which might not even fit in my python memory, when I have a perfectly good data warehouse?

Making it repeatable

In the following sections, I’m going to walk through an example that creates an RFM table from a sales dataset that I found online. However, that means that anyone who stumbles upon this post in the future, is still going to rewrite the SQL carefully to match their own database.

This is where jinja comes into play. Jinja is a 14 year old templating engine that is mainly used to dynamically generate HTML, XML, and other markup formats. Recently, it has been shown to be a powerful partner to SQL, being leveraged by both dbt and Rasgo. If you want to read up on the basics, Julien Kervizic wrote a prophetic post back in 2019 called Jinja the SQL way of the Ninja.

So, instead of just sharing the SQL code and logic, I’m going to take it to the next level — I’m going to share a template that you can use to dynamically create the SQL for your own data.

Walking through an example

Let’s take some example sales transactions from the UCI Machine Learning Repository.

The way I approached this problem was by first constructing the SQL. This part is obviously challenging, but I did it so that you don’t have to. Here is the working SQL.

Reusable SQL for RFM and CLV Analysis

Photo by energepic.com: https://www.pexels.com/photo/black-payment-terminal-2988232/

Introduction

Getting started

This year, hundreds, if not thousands, of data scientists across the world are going to build this kind of model.

A familiar pattern emerges

If you read enough of these blogs, you’ll begin to realize that there is a familiar pattern. All of them start with manipulating the data to a specific shape — often called an RFM table .

The basics are:

R — the most recent date a customer purchased from the business
F — how frequently does a customer purchases from the business
M — the monetary value of the customer’s purchase.

Choosing SQL

Making it repeatable

So, instead of just sharing the SQL code and logic, I’m going to take it to the next level — I’m going to share a template that you can use to dynamically create the SQL for your own data.

Walking through an example

Let’s take some example sales transactions from the UCI Machine Learning Repository.

The way I approached this problem was by first constructing the SQL. This part is obviously challenging, but I did it so that you don’t have to. Here is the working SQL.

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.

The Best SQL Template for Customer Lifetime Value | by Josh Berry | Aug, 2022

Reusable SQL for RFM and CLV Analysis

Introduction

Getting started

A familiar pattern emerges

Choosing SQL

Making it repeatable

Walking through an example

Being a jinja ninja

Conclusion

Reusable SQL for RFM and CLV Analysis

Introduction

Getting started

A familiar pattern emerges

Choosing SQL

Making it repeatable

Walking through an example

Being a jinja ninja

Conclusion