Techno Blender
Digitally Yours.

Exploring Query Folding in Power Query | by Salvatore Cagliari | Aug, 2022

0 229


Query folding is an important feature in Power Query to offload processing steps to the Source. With this feature, Power Query can reduce CPU and RAM consumption. Let’s look at how we can use it.

Photo by Element5 Digital on Unsplash

Query folding is the mechanism to push Power Query transformations back to the source to reduce the load of Power Query.

This works only for some data sources, namely relational databases and sources which work like relational databases.

For example, look at the following diagram, which shows Power BI, which gets data from a database:

Data from a database (Figure by the Author)
Figure 1 — Data from a database (Figure by the Author)

When adding transformation steps in Power Query, we want to avoid unnecessary load on Power Query.

It would be nice to push back such transformations to the source database in the native database language: SQL.

But why should we want to have this?

As you might know, Relation database engines are very efficient in managing data.

The operations at which a relational database engine excels are:

  • Sorting data (ORDER BY)
  • Combining data (JOIN / Merge)
  • Execute calculation on a large amount of data (Calculated columns)
  • Do only what is necessary (Optimization of Queries)

But the most crucial point is that relational database engines run on server hardware, which is optimised to handle a large amount of data. These servers have much more CPU cores and memory than our humble laptops or PCs and can handle the data with much fewer restrictions than we have.

All data sources with support for SQL can support query folding. Besides the obvious relational databases (like SQL Server, Oracle, DB/2, etc.), query folding is supported by OData sources, Exchange, Active Directory, etc.

Let’s take a closer look.

Now, I use my usual Contoso database to see query folding in action.

I switch to Power Query and use the Azure SQL database source to get some tables from my database in Import mode:

Source tables (Figure by the Author)
Figure 2 — Source tables (Figure by the Author)

After loading the tables into Power Query, you can right-click on one of the applied steps and look if the item “View Native Query” is active:

Get Native Query (Figure by the Author)
Figure 3 — Get Native Query (Figure by the Author)

If it’s active, then Query folding is used, and you can see the query that Power Query generates and send to the source database:

First Native query (Figure by the Author)
Figure 4 — First Native query (Figure by the Author)

In this case, the query is simple and sent straightforwardly to the source database.

Now, let’s apply some modifications:

  • Remove some columns
  • Change the name of columns
  • Change the data type of columns

When I get the Native query, I see this:

Native query after Modifications (Figure by the Author)
Figure 5 — Native query after Modifications (Figure by the Author)

As you can see, the query has changed to reflect the steps in Power Query.

You can notice that the query doesn’t include the removed columns anymore and converts the DateKey column directly with the query.

This is much more efficient than doing this in Power Query, which consumes CPU and memory in our computer.

Now, let’s do a more complex transformation.

To describe a product, we have the following hierarchy:

Product Hierarchy (Figure by the Author)
Figure 6 — Product Hierarchy (Figure by the Author)

As you might know, Power BI doesn’t like such a hierarchy of tables. Power BI works best with a one-step relationship between the Fact and Dimension tables.

The target is to merge these tables into a single Product table.

We can use the Merge transformation in Power Query to Merge the Product-Category into Product-Subcategory and the result into Product (Yes, we can do this directly from the Product table, as Power Query recognises the existing relationships in the database, but let’s look at what happens when we’re doing this manually):

Native query with merged tables (Figure by the Author)
Figure 7 — Native query with merged tables (Figure by the Author)

The query is much more complicated, as it has to merge (Join) the source tables to form the result.

But the database engine is optimised for doing such stuff and will perform this task much better than Power Query.

Unfortunately, not all transformations can be folded.

But it is simple to determine which will not fold:

  1. Transformations which affect more than one source (database)
    For example, when you want to merge a table from a database and a table from a text file, query folding is impossible as the text file doesn’t understand SQL.
    Sadly, the same applies when merging data from two different databases, even when they reside on the same database server
  2. When you apply complex transformations which cannot be translated into SQL
    For example, when you are using M functions to manipulate data.

You can find more information on non-foldable transformations in the articles in the References section below.

You must be aware of these limitations. Consequently, you must perform as many transformations as possible before query folding isn’t possible anymore. This approach increases load performance by offloading the computations to the database engine.

We can write our own queries when retrieving data from a database.

For example, let’s assume that we write a SQL query to join the three product tables together to import the result into Power Query:

SELECT [P].[ProductKey]
,[P].[ProductLabel]
,[P].[ProductName]
,[P].[ProductDescription]
,[PSC].[ProductSubcategoryName]
,[PSC].[ProductSubcategoryLabel]
,[PSC].[ProductSubcategoryDescription]
,[PC].[ProductCategoryName]
,[PC].[ProductCategoryLabel]
,[PC].[ProductCategoryDescription]
,[P].[Manufacturer]
,[P].[BrandName]
,[P].[ClassID]
,[P].[ClassName]
,[P].[StyleID]
,[P].[StyleName]
,[P].[ColorID]
,[P].[ColorName]
,[P].[Size]
,[P].[SizeRange]
,[P].[SizeUnitMeasureID]
,[P].[Weight]
,[P].[WeightUnitMeasureID]
,[P].[UnitOfMeasureID]
,[P].[UnitOfMeasureName]
,[P].[StockTypeID]
,[P].[StockTypeName]
,[P].[UnitCost]
,[P].[UnitPrice]
,[P].[AvailableForSaleDate]
,[P].[StopSaleDate]
,[P].[Status]
FROM [dbo].[DimProduct] AS [P]
INNER JOIN [dbo].[DimProductSubcategory] AS [PSC]
ON [PSC].[ProductSubcategoryKey] = [P].[ProductSubcategoryKey]
INNER JOIN [dbo].[DimProductCategory] AS [PC]
ON [PC].[ProductCategoryKey] = [PSC].[ProductCategoryKey];

In Power Query, this looks like that:

Import table with native query (Figure by the Author)
Figure 8 — Import table with native query (Figure by the Author)

When doing this, query folding is automatically disabled:

Query folding disabled with Native query (Figure by the Author)
Figure 9 — Query folding disabled with Native query (Figure by the Author)

To enable query folding, you have to change the M-Code, as described in the Article referenced below (Query Folding On SQL Queries In Power Query Using Value.NativeQuery() and EnableFolding=true (Chris Webb’s Blog)).

In our case, I have to change the M-Script to the following:

let
Source = Sql.Databases(“sca4tvddemo.database.windows.net”),
ContosoRetailDW = Source
{[Name = “ContosoRetailDW”]}
[Data],
RunSQL = Value.NativeQuery(
ContosoRetailDW,
“SELECT [P].[ProductKey]
,[P].[ProductLabel]
,[P].[ProductName]
,[P].[ProductDescription]
,[PSC].[ProductSubcategoryName]
,[PSC].[ProductSubcategoryLabel]
,[PSC].[ProductSubcategoryDescription]
,[PC].[ProductCategoryName]
,[PC].[ProductCategoryLabel]
,[PC].[ProductCategoryDescription]
,[P].[Manufacturer]
,[P].[BrandName]
,[P].[ClassID]
,[P].[ClassName]
,[P].[StyleID]
,[P].[StyleName]
,[P].[ColorID]
,[P].[ColorName]
,[P].[Size]
,[P].[SizeRange]
,[P].[SizeUnitMeasureID]
,[P].[Weight]
,[P].[WeightUnitMeasureID]
,[P].[UnitOfMeasureID]
,[P].[UnitOfMeasureName]
,[P].[StockTypeID]
,[P].[StockTypeName]
,[P].[UnitCost]
,[P].[UnitPrice]
,[P].[AvailableForSaleDate]
,[P].[StopSaleDate]
,[P].[Status]
FROM [dbo].[DimProduct] AS [P]
INNER JOIN [dbo].[DimProductSubcategory] AS [PSC]
ON [PSC].[ProductSubcategoryKey] = [P].[ProductSubcategoryKey]
INNER JOIN [dbo].[DimProductCategory] AS [PC]
ON [PC].[ProductCategoryKey] = [PSC].[ProductCategoryKey]”,
null,
[EnableFolding = true]
)
in
RunSQL

Notice that I have removed the Semicolon at the end of the query.

This is necessary as Power query will enclose this query within an outer query when performing query folding.

This means that this approach works only with database engines that don’t enforce the semicolon at the end of each query, like SQL Server and Postgres and others.

Now the Native Query Option is available and is still available after adding transformations. So, these transformations are folded into the source database:

Get Native Query after modifying the M-Code (Figure by the Author)
Figure 10 — Get Native Query after modifying the M-Code (Figure by the Author)

As soon as you access a database, you should know how Query folding works to optimise the data load’s performance.

Then, as soon as you combine data from one source with data from another source, you should take care of the order of your transformations to push as much transformation as possible to the source database.

It doesn’t matter if you have only a few thousand rows. But, as soon as you load millions of rows, performance can become critical.

As you have seen, it is unnecessary to know SQL to leverage the Power of Query folding. It’s enough to see if the option to see the Native query is available or not to see if Query folding happens.

Of course, you can start to learn SQL and do all the transformations directly at query time.

You decide on what is best for you.

Thank you for reading.

Photo by Jon Tyson on Unsplash

Look at the following resources for detailed information on how Power Query and query folding work:

Make sure to not miss the articles of @Nikola Ilic about query folding.

I use the Contoso sample dataset, like in my previous articles. The dataset can be freely used under the MIT License, as described here.

The Download link is here: Download Microsoft Contoso BI Demo Dataset for Retail Industry from Official Microsoft Download Center.

Or, you can use SQLBI’s tool Contoso Data Generator to generate your copy of the Contoso dataset.


Query folding is an important feature in Power Query to offload processing steps to the Source. With this feature, Power Query can reduce CPU and RAM consumption. Let’s look at how we can use it.

Photo by Element5 Digital on Unsplash

Query folding is the mechanism to push Power Query transformations back to the source to reduce the load of Power Query.

This works only for some data sources, namely relational databases and sources which work like relational databases.

For example, look at the following diagram, which shows Power BI, which gets data from a database:

Data from a database (Figure by the Author)
Figure 1 — Data from a database (Figure by the Author)

When adding transformation steps in Power Query, we want to avoid unnecessary load on Power Query.

It would be nice to push back such transformations to the source database in the native database language: SQL.

But why should we want to have this?

As you might know, Relation database engines are very efficient in managing data.

The operations at which a relational database engine excels are:

  • Sorting data (ORDER BY)
  • Combining data (JOIN / Merge)
  • Execute calculation on a large amount of data (Calculated columns)
  • Do only what is necessary (Optimization of Queries)

But the most crucial point is that relational database engines run on server hardware, which is optimised to handle a large amount of data. These servers have much more CPU cores and memory than our humble laptops or PCs and can handle the data with much fewer restrictions than we have.

All data sources with support for SQL can support query folding. Besides the obvious relational databases (like SQL Server, Oracle, DB/2, etc.), query folding is supported by OData sources, Exchange, Active Directory, etc.

Let’s take a closer look.

Now, I use my usual Contoso database to see query folding in action.

I switch to Power Query and use the Azure SQL database source to get some tables from my database in Import mode:

Source tables (Figure by the Author)
Figure 2 — Source tables (Figure by the Author)

After loading the tables into Power Query, you can right-click on one of the applied steps and look if the item “View Native Query” is active:

Get Native Query (Figure by the Author)
Figure 3 — Get Native Query (Figure by the Author)

If it’s active, then Query folding is used, and you can see the query that Power Query generates and send to the source database:

First Native query (Figure by the Author)
Figure 4 — First Native query (Figure by the Author)

In this case, the query is simple and sent straightforwardly to the source database.

Now, let’s apply some modifications:

  • Remove some columns
  • Change the name of columns
  • Change the data type of columns

When I get the Native query, I see this:

Native query after Modifications (Figure by the Author)
Figure 5 — Native query after Modifications (Figure by the Author)

As you can see, the query has changed to reflect the steps in Power Query.

You can notice that the query doesn’t include the removed columns anymore and converts the DateKey column directly with the query.

This is much more efficient than doing this in Power Query, which consumes CPU and memory in our computer.

Now, let’s do a more complex transformation.

To describe a product, we have the following hierarchy:

Product Hierarchy (Figure by the Author)
Figure 6 — Product Hierarchy (Figure by the Author)

As you might know, Power BI doesn’t like such a hierarchy of tables. Power BI works best with a one-step relationship between the Fact and Dimension tables.

The target is to merge these tables into a single Product table.

We can use the Merge transformation in Power Query to Merge the Product-Category into Product-Subcategory and the result into Product (Yes, we can do this directly from the Product table, as Power Query recognises the existing relationships in the database, but let’s look at what happens when we’re doing this manually):

Native query with merged tables (Figure by the Author)
Figure 7 — Native query with merged tables (Figure by the Author)

The query is much more complicated, as it has to merge (Join) the source tables to form the result.

But the database engine is optimised for doing such stuff and will perform this task much better than Power Query.

Unfortunately, not all transformations can be folded.

But it is simple to determine which will not fold:

  1. Transformations which affect more than one source (database)
    For example, when you want to merge a table from a database and a table from a text file, query folding is impossible as the text file doesn’t understand SQL.
    Sadly, the same applies when merging data from two different databases, even when they reside on the same database server
  2. When you apply complex transformations which cannot be translated into SQL
    For example, when you are using M functions to manipulate data.

You can find more information on non-foldable transformations in the articles in the References section below.

You must be aware of these limitations. Consequently, you must perform as many transformations as possible before query folding isn’t possible anymore. This approach increases load performance by offloading the computations to the database engine.

We can write our own queries when retrieving data from a database.

For example, let’s assume that we write a SQL query to join the three product tables together to import the result into Power Query:

SELECT [P].[ProductKey]
,[P].[ProductLabel]
,[P].[ProductName]
,[P].[ProductDescription]
,[PSC].[ProductSubcategoryName]
,[PSC].[ProductSubcategoryLabel]
,[PSC].[ProductSubcategoryDescription]
,[PC].[ProductCategoryName]
,[PC].[ProductCategoryLabel]
,[PC].[ProductCategoryDescription]
,[P].[Manufacturer]
,[P].[BrandName]
,[P].[ClassID]
,[P].[ClassName]
,[P].[StyleID]
,[P].[StyleName]
,[P].[ColorID]
,[P].[ColorName]
,[P].[Size]
,[P].[SizeRange]
,[P].[SizeUnitMeasureID]
,[P].[Weight]
,[P].[WeightUnitMeasureID]
,[P].[UnitOfMeasureID]
,[P].[UnitOfMeasureName]
,[P].[StockTypeID]
,[P].[StockTypeName]
,[P].[UnitCost]
,[P].[UnitPrice]
,[P].[AvailableForSaleDate]
,[P].[StopSaleDate]
,[P].[Status]
FROM [dbo].[DimProduct] AS [P]
INNER JOIN [dbo].[DimProductSubcategory] AS [PSC]
ON [PSC].[ProductSubcategoryKey] = [P].[ProductSubcategoryKey]
INNER JOIN [dbo].[DimProductCategory] AS [PC]
ON [PC].[ProductCategoryKey] = [PSC].[ProductCategoryKey];

In Power Query, this looks like that:

Import table with native query (Figure by the Author)
Figure 8 — Import table with native query (Figure by the Author)

When doing this, query folding is automatically disabled:

Query folding disabled with Native query (Figure by the Author)
Figure 9 — Query folding disabled with Native query (Figure by the Author)

To enable query folding, you have to change the M-Code, as described in the Article referenced below (Query Folding On SQL Queries In Power Query Using Value.NativeQuery() and EnableFolding=true (Chris Webb’s Blog)).

In our case, I have to change the M-Script to the following:

let
Source = Sql.Databases(“sca4tvddemo.database.windows.net”),
ContosoRetailDW = Source
{[Name = “ContosoRetailDW”]}
[Data],
RunSQL = Value.NativeQuery(
ContosoRetailDW,
“SELECT [P].[ProductKey]
,[P].[ProductLabel]
,[P].[ProductName]
,[P].[ProductDescription]
,[PSC].[ProductSubcategoryName]
,[PSC].[ProductSubcategoryLabel]
,[PSC].[ProductSubcategoryDescription]
,[PC].[ProductCategoryName]
,[PC].[ProductCategoryLabel]
,[PC].[ProductCategoryDescription]
,[P].[Manufacturer]
,[P].[BrandName]
,[P].[ClassID]
,[P].[ClassName]
,[P].[StyleID]
,[P].[StyleName]
,[P].[ColorID]
,[P].[ColorName]
,[P].[Size]
,[P].[SizeRange]
,[P].[SizeUnitMeasureID]
,[P].[Weight]
,[P].[WeightUnitMeasureID]
,[P].[UnitOfMeasureID]
,[P].[UnitOfMeasureName]
,[P].[StockTypeID]
,[P].[StockTypeName]
,[P].[UnitCost]
,[P].[UnitPrice]
,[P].[AvailableForSaleDate]
,[P].[StopSaleDate]
,[P].[Status]
FROM [dbo].[DimProduct] AS [P]
INNER JOIN [dbo].[DimProductSubcategory] AS [PSC]
ON [PSC].[ProductSubcategoryKey] = [P].[ProductSubcategoryKey]
INNER JOIN [dbo].[DimProductCategory] AS [PC]
ON [PC].[ProductCategoryKey] = [PSC].[ProductCategoryKey]”,
null,
[EnableFolding = true]
)
in
RunSQL

Notice that I have removed the Semicolon at the end of the query.

This is necessary as Power query will enclose this query within an outer query when performing query folding.

This means that this approach works only with database engines that don’t enforce the semicolon at the end of each query, like SQL Server and Postgres and others.

Now the Native Query Option is available and is still available after adding transformations. So, these transformations are folded into the source database:

Get Native Query after modifying the M-Code (Figure by the Author)
Figure 10 — Get Native Query after modifying the M-Code (Figure by the Author)

As soon as you access a database, you should know how Query folding works to optimise the data load’s performance.

Then, as soon as you combine data from one source with data from another source, you should take care of the order of your transformations to push as much transformation as possible to the source database.

It doesn’t matter if you have only a few thousand rows. But, as soon as you load millions of rows, performance can become critical.

As you have seen, it is unnecessary to know SQL to leverage the Power of Query folding. It’s enough to see if the option to see the Native query is available or not to see if Query folding happens.

Of course, you can start to learn SQL and do all the transformations directly at query time.

You decide on what is best for you.

Thank you for reading.

Photo by Jon Tyson on Unsplash

Look at the following resources for detailed information on how Power Query and query folding work:

Make sure to not miss the articles of @Nikola Ilic about query folding.

I use the Contoso sample dataset, like in my previous articles. The dataset can be freely used under the MIT License, as described here.

The Download link is here: Download Microsoft Contoso BI Demo Dataset for Retail Industry from Official Microsoft Download Center.

Or, you can use SQLBI’s tool Contoso Data Generator to generate your copy of the Contoso dataset.

FOLLOW US ON GOOGLE NEWS

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.

Leave a comment