Techno Blender
Digitally Yours.

SQL users: Halve the length of (some of) your queries with this one trick | by Matt Chapman | Mar, 2023

0 47


Image by Joanna Kosinska on Unsplash

In this article, I’m going to introduce you to the SQL QUALIFY clause: a trick that will help you write shorter and more readable SQL queries.

If you work in Data Science/Analytics or Machine Learning, this is a must-have SQL tool for four reasons:

  1. First, because it can help you write much shorter queries, simplifying your data collection pipelines. In my experience, QUALIFY reduces the length of many queries by up to 50%.
  2. Second, because it will help you write much more succinct queries. This matters because, as Paul Graham famously said, succinctness is power in that it makes your code more readable and maintainable.
  3. Third, QUALIFY helps you write more efficient queries which reduce costs and speed up development time.
  4. Fourth, it’s an easy way to show off. I’m not saying that you’re going to win any innovation prizes for using QUALIFY, but because it’s a relatively new clause, chances are that your boss might not have heard of it. At any rate, it’ll be fun to show your team a new trick.

Simply put, the QUALIFY clause enables you to filter the results of window functions without needing to use nested subqueries or the WITH clause. It allows you to replace these bulky code blocks with a single line of code. It’s sort of like how the HAVING clause enables you to filter the results of GROUP BY aggregations.

the QUALIFY clause enables you to filter the results of window functions

If that sounds like gobbledygook, give me 2 minutes of your time and I’ll try my best to explain.

If you’re not familiar with window functions, you might want to check out this article, which introduces SQL window functions through 4 real-world use cases. But, in a nutshell, window functions can be used to perform calculations on a set of rows in a table without aggregating or collapsing the rows, which is very useful when you want to preserve the original structure of the table.

For example, let’s say we have a table named orders which lists all the product orders a company receives. Each row represents an order and records the date the order was placed, the customer who placed the order, and the order_amount (in dollars).

Image by author

Using a window function, we could add in a new column total_order_amount which represents the total order amount per customer. We could simply write:

SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders
ORDER BY date

which would return:

Image by author

As you can see, the window function SUM(order_amount) OVER(PARTITION BY customer) effectively partitioned our table into different “windows” (one for each customer) and then calculated the total_order_amount for each of these windows. All of this was achieved without using a GROUP BY aggregation, allowing us to retain the same number of rows.

Now that we’ve got our window function set up, let’s imagine that we want to send a Thank You email to our highest-spending customers. Before we can do this, we need to filter this table to see the orders of customers whose total_order_amount exceeded $5,000. In other words, we want to produce something like this:

Image by author

A typical way of producing this would be to use a WITH clause:

WITH my_order_table AS (
SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders
)

SELECT
*
FROM my_order_table
WHERE total_order_amount > 5000
ORDER BY date

… or a nested subquery:

SELECT
*
FROM
(SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders) AS A

WHERE A.total_order_amount > 5000
ORDER BY A.date

However, while both of these are perfectly acceptable solutions, the QUALIFY clause enables us to simplify and shorten this code significantly. We could simply write:

SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders
QUALIFY total_order_amount > 5000

which would return the exact same result.

The great thing about the QUALIFY clause is that it is evaluated after window functions are computed. This means that you can reference the window function column using the alias you assigned to it. In the example above, I demonstrated this by writing QUALIFY total_order_amount.., referencing the alias total_order_amount which I assigned at the point of creating the original window function.

Image by author

It’s also not necessary to write the window function in the SELECT clause. For example, if I didn’t want to actually show the total order amount as a separate column, but still needed to filter on this window function, I could write:

SELECT
date,
customer,
order_amount,
FROM orders
QUALIFY SUM(order_amount) OVER(PARTITION BY customer) > 5000

which would return the same result, but without the total_order_amount column:

Image by author

On top of all this, the QUALIFY clause can also be slightly more efficient than using subqueries or WITH. Not only will this speed up your development time; it also helps a ton with cost optimisation.

A common use case of the QUALIFY clause is to select the top-N-per-group. For example, imagine that we had the following table sales which records every sale that a shop makes on a given day. Each row represents a single transaction and contains the timestamp of the sale, the product purchased, and the total amount of money spent in that sale.

Using the window function:

SELECT
Timestamp,
Product,
Sale_value,
ROW_NUMBER() OVER(PARTITION BY Product ORDER BY Timestamp ASC) as row_num
FROM sales
ORDER BY Product, row_num

we could assign a row number to each sale, within each product category:

Image by author

and then, using a QUALIFY statement, we could filter the result to only show the first row in each category:

SELECT
Timestamp,
Product,
Sale_value,
ROW_NUMBER() OVER(PARTITION BY Product ORDER BY Timestamp ASC) as row_num
FROM sales
QUALIFY row_num = 1
ORDER BY Product
Image by author

And there you have it — the magic of QUALIFY.

The QUALIFY clause is not yet available in all dialects of SQL. At the time of writing, it’s supported by many of the big platforms including BigQuery, Databricks, Snowflake, and H2. If your platform/dialect is not compatible, you’ll have to rely on subqueries/WITH for now, but you can keep an eye on Modern SQL to see when it becomes available.


Image by Joanna Kosinska on Unsplash

In this article, I’m going to introduce you to the SQL QUALIFY clause: a trick that will help you write shorter and more readable SQL queries.

If you work in Data Science/Analytics or Machine Learning, this is a must-have SQL tool for four reasons:

  1. First, because it can help you write much shorter queries, simplifying your data collection pipelines. In my experience, QUALIFY reduces the length of many queries by up to 50%.
  2. Second, because it will help you write much more succinct queries. This matters because, as Paul Graham famously said, succinctness is power in that it makes your code more readable and maintainable.
  3. Third, QUALIFY helps you write more efficient queries which reduce costs and speed up development time.
  4. Fourth, it’s an easy way to show off. I’m not saying that you’re going to win any innovation prizes for using QUALIFY, but because it’s a relatively new clause, chances are that your boss might not have heard of it. At any rate, it’ll be fun to show your team a new trick.

Simply put, the QUALIFY clause enables you to filter the results of window functions without needing to use nested subqueries or the WITH clause. It allows you to replace these bulky code blocks with a single line of code. It’s sort of like how the HAVING clause enables you to filter the results of GROUP BY aggregations.

the QUALIFY clause enables you to filter the results of window functions

If that sounds like gobbledygook, give me 2 minutes of your time and I’ll try my best to explain.

If you’re not familiar with window functions, you might want to check out this article, which introduces SQL window functions through 4 real-world use cases. But, in a nutshell, window functions can be used to perform calculations on a set of rows in a table without aggregating or collapsing the rows, which is very useful when you want to preserve the original structure of the table.

For example, let’s say we have a table named orders which lists all the product orders a company receives. Each row represents an order and records the date the order was placed, the customer who placed the order, and the order_amount (in dollars).

Image by author

Using a window function, we could add in a new column total_order_amount which represents the total order amount per customer. We could simply write:

SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders
ORDER BY date

which would return:

Image by author

As you can see, the window function SUM(order_amount) OVER(PARTITION BY customer) effectively partitioned our table into different “windows” (one for each customer) and then calculated the total_order_amount for each of these windows. All of this was achieved without using a GROUP BY aggregation, allowing us to retain the same number of rows.

Now that we’ve got our window function set up, let’s imagine that we want to send a Thank You email to our highest-spending customers. Before we can do this, we need to filter this table to see the orders of customers whose total_order_amount exceeded $5,000. In other words, we want to produce something like this:

Image by author

A typical way of producing this would be to use a WITH clause:

WITH my_order_table AS (
SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders
)

SELECT
*
FROM my_order_table
WHERE total_order_amount > 5000
ORDER BY date

… or a nested subquery:

SELECT
*
FROM
(SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders) AS A

WHERE A.total_order_amount > 5000
ORDER BY A.date

However, while both of these are perfectly acceptable solutions, the QUALIFY clause enables us to simplify and shorten this code significantly. We could simply write:

SELECT
date,
customer,
order_amount,
SUM(order_amount) OVER(PARTITION BY customer) AS total_order_amount
FROM orders
QUALIFY total_order_amount > 5000

which would return the exact same result.

The great thing about the QUALIFY clause is that it is evaluated after window functions are computed. This means that you can reference the window function column using the alias you assigned to it. In the example above, I demonstrated this by writing QUALIFY total_order_amount.., referencing the alias total_order_amount which I assigned at the point of creating the original window function.

Image by author

It’s also not necessary to write the window function in the SELECT clause. For example, if I didn’t want to actually show the total order amount as a separate column, but still needed to filter on this window function, I could write:

SELECT
date,
customer,
order_amount,
FROM orders
QUALIFY SUM(order_amount) OVER(PARTITION BY customer) > 5000

which would return the same result, but without the total_order_amount column:

Image by author

On top of all this, the QUALIFY clause can also be slightly more efficient than using subqueries or WITH. Not only will this speed up your development time; it also helps a ton with cost optimisation.

A common use case of the QUALIFY clause is to select the top-N-per-group. For example, imagine that we had the following table sales which records every sale that a shop makes on a given day. Each row represents a single transaction and contains the timestamp of the sale, the product purchased, and the total amount of money spent in that sale.

Using the window function:

SELECT
Timestamp,
Product,
Sale_value,
ROW_NUMBER() OVER(PARTITION BY Product ORDER BY Timestamp ASC) as row_num
FROM sales
ORDER BY Product, row_num

we could assign a row number to each sale, within each product category:

Image by author

and then, using a QUALIFY statement, we could filter the result to only show the first row in each category:

SELECT
Timestamp,
Product,
Sale_value,
ROW_NUMBER() OVER(PARTITION BY Product ORDER BY Timestamp ASC) as row_num
FROM sales
QUALIFY row_num = 1
ORDER BY Product
Image by author

And there you have it — the magic of QUALIFY.

The QUALIFY clause is not yet available in all dialects of SQL. At the time of writing, it’s supported by many of the big platforms including BigQuery, Databricks, Snowflake, and H2. If your platform/dialect is not compatible, you’ll have to rely on subqueries/WITH for now, but you can keep an eye on Modern SQL to see when it becomes available.

FOLLOW US ON GOOGLE NEWS

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.
Leave a comment