Tableau Data Blending Tutorial — A Step-By-Step Guide For Beginners | by Zoumana Keita | Jan, 2023


Image by Lukas Blazek on Unsplash

Nowadays, companies use data from different sources to solve their business problems. Being able to efficiently collect and combine those data has become an essential skill for all Data Scientists and Data Engineers in order to assist organizations with smart decision-making.

In this tutorial, we will first build your understanding of a powerful data combination approach called Data Blending before exploring the benefits. Then we will explore some of its drawbacks and finally cover some of the top questions related to Data Blending using Tableau.

In real life, you will be dealing with information coming from multiple sources such as Excel Spreadsheet, SQL, CSV, etc. As Data Scientist, you will need to interconnect them in order to generate global business insights.

Tableau can help tackle this issue using two different approaches: joining, blending, and relationships.

In this section, we will focus on understanding the difference between the first two approaches.

What are Joins in Tableau?

If you have been working with SQL before, you might be familiar with the main joins concepts: left, right, inner, cross, and full outer joins. The same applies when using Tableau.

These Joins aim to combine different tables from the same source, based on some logical column relationships between those tables. For instance, trying to combine an Excel file and a SQL table will break because they do not come from the same source.

Also, when joining tables, the columns used in the two tables must be the same and changing these data type lead the join to break.

Finally, it is not possible to remove duplicate columns from the joins in Tableau.

What is Blending in Tableau?

Unlike Joins and Relationships, Data Blending aggregates data from multiple sources such as databases, business intelligence systems, cloud systems, flat files, web services, etc. to create a single data for better visualizations.

Why use Tableau for Data Blending?

First of all, What is Tableau?

Tableau is a no-code Business Intelligence tool, that provides an intuitive drag-and-drop interface for analytics and visualization. The non-technical aspect makes it stand out in the industry.

In addition, it is fast and provides the capability to interconnect data from multiple sources such as spreadsheets, SQL databases, web services, etc. whether from the cloud or on-premise.

Data Blending with Tableau provides the same flexibility and makes it easier to combine data from multiple sources without any programming skills.

Step-by-Step Guide on How to Blend Data in Tableau.

For a better understanding of the Data Blending concept, this section walks you through a step-by-step process to perform Data Blending with Tableau.

  1. Understand the Use Case

This case is about analyzing the 2020 and 2021 revenues of two departments from Zoom.AI, an AI company that operates in different African capitals (Big Cities) as shown in the screenshots below.

Revenue Data From Department 1 and Department 2 (Image by Author)

2. Understand the data columns

We notice that both data have the same format and convey the same message. However, the first two columns’ names are different (Period, Capitals for Department 1, and Year, Cities for Department 2).

Trying to apply Joins to the current format of the Data will fail because the columns involved in the Join need to be the same. We will see how Data Blending can smartly combine those data without any prior column normalization.

3. Create Data Connections

Knowing that our data is stored in different files, we need to create a space in Tableau, where they can be regrouped, and that is done by creating Data Connections in Tableau as shown below:

Importing data into Tableau (Animation by Author)

Our departments’ data appear in the Data tab, in the left-top corner, meaning they have been successfully uploaded.

4. Start Data Blending

Let’s start by visualizing how the Revenue varies per Capitals for the first department. This can be done with the following two steps:

  • drag and drop the Revenue column in the Columns section of Tableau.
  • drag and drop the Capitals column in the Rows section of Tableau.
Visualization of Revenue for 2020 (Animation by Author)

The goal is to use Data Blending in order to compare the departments’ revenue. For this purpose, we need to perform the same drag-and-drop process for the second department’s data, which has the Cities column instead, as shown below.

The result for department 2 has been associated with the orange color for clarification purposes.

Adding the Revenue for Department 2 (Animation by Author)

The visualization broke💥💥💥, Why?!

The reason for that is that we haven’t specified to Tableau that the Capitals column has the same level of information as the Cities column, which leads to having the star signs for the Cities column in the visualization.

This is where Data Blending comes in handy by fixing such an issue, which can be done by specifying that the Cities and Capitals columns mean the same thing. Below is the illustration.

Proper blending of the data (Animation by Author)

After we solved the issue, we can see an orange chain⛓ sign on the Cities column on the left. This means that the Cities column has been used as the linking field.

More Data Blending

Now that you understand how to properly blend data, let’s perform the same analysis at the year/period level. To do so, we just need to create a link between the Period and Year columns, otherwise, we will face the same issue as the one before.

Proper blending of the data (Animation by Author)

Congrats on making it to this level! Now that you know how to perform Data Blending, you can learn how to create advanced visualizations using Tableau from this course.

What are the benefits of Data Blending in Tableau?

As you can observe from the previous tutorial, using data blending provides a lot of benefits compared to Joins. Some of them are shown below:

  1. No preprocessing of data columns

We don’t need to perform any costly preprocessing of columns to be able to combine different data with Data Blending, because a simple specification of the columns of interest is enough for the Data Blending algorithm to understand how to efficiently combine the data.

2. A better way to solve the granularity issue

Data Blending provides better flexibility when it comes to combining data with different levels of granularity. Let’s imagine that our departments’ revenue data were organized at a monthly level instead of yearly.

Using simple joins will lead to an inaccurate result because the yearly rows will be duplicated for each monthly row as illustrated below. Data Blending can solve this issue by aggregating the months’ information to the yearly level.

Data Blending at different levels of granularity (Image by Author)

Team productivity

Let’s consider the previous granularity case. Without Data Blending, we would have to perform some data preprocessing such as aggregations, and duplicate removal. When dealing with huge data sets, this kind of preprocessing can quickly become time-consuming.

Quick analysis for better decision-making

Accurate decision-making relies on having a better overview of the data. It took a few clicks and drag and drop to capture the essence of the two departments. Using Joins could lead to inaccurate results, thus inaccurate decision-making.

Tableau Data Blending Limitations.

Despite the advantages of data blending, it also has some downsides as shown below:

  • Data Blending works with the left join under the hood, and it does not perform any other types of joins.
  • The order matters when trying to blend data with different granularity. The secondary data always have to have the smallest level of granularity, because it is the one that is subject to aggregation.
  • Non-additive aggregates such as COUNT, MEAN, MEDIAN, and SUM suffer from data blending.

Conclusion

This tutorial has covered the main aspects related to the motivations behind using Data Blending, some of its benefits for Data Scientists compared to joins, and also a hands-on practice to better understand how it works.

I hope you found this benchmarking analysis useful for making a good choice!

Also, If you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $ 5-a-month commitment, you unlock unlimited access to stories on Medium.

Feel free to follow me on Medium, Twitter, and YouTube, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!




Image by Lukas Blazek on Unsplash

Nowadays, companies use data from different sources to solve their business problems. Being able to efficiently collect and combine those data has become an essential skill for all Data Scientists and Data Engineers in order to assist organizations with smart decision-making.

In this tutorial, we will first build your understanding of a powerful data combination approach called Data Blending before exploring the benefits. Then we will explore some of its drawbacks and finally cover some of the top questions related to Data Blending using Tableau.

In real life, you will be dealing with information coming from multiple sources such as Excel Spreadsheet, SQL, CSV, etc. As Data Scientist, you will need to interconnect them in order to generate global business insights.

Tableau can help tackle this issue using two different approaches: joining, blending, and relationships.

In this section, we will focus on understanding the difference between the first two approaches.

What are Joins in Tableau?

If you have been working with SQL before, you might be familiar with the main joins concepts: left, right, inner, cross, and full outer joins. The same applies when using Tableau.

These Joins aim to combine different tables from the same source, based on some logical column relationships between those tables. For instance, trying to combine an Excel file and a SQL table will break because they do not come from the same source.

Also, when joining tables, the columns used in the two tables must be the same and changing these data type lead the join to break.

Finally, it is not possible to remove duplicate columns from the joins in Tableau.

What is Blending in Tableau?

Unlike Joins and Relationships, Data Blending aggregates data from multiple sources such as databases, business intelligence systems, cloud systems, flat files, web services, etc. to create a single data for better visualizations.

Why use Tableau for Data Blending?

First of all, What is Tableau?

Tableau is a no-code Business Intelligence tool, that provides an intuitive drag-and-drop interface for analytics and visualization. The non-technical aspect makes it stand out in the industry.

In addition, it is fast and provides the capability to interconnect data from multiple sources such as spreadsheets, SQL databases, web services, etc. whether from the cloud or on-premise.

Data Blending with Tableau provides the same flexibility and makes it easier to combine data from multiple sources without any programming skills.

Step-by-Step Guide on How to Blend Data in Tableau.

For a better understanding of the Data Blending concept, this section walks you through a step-by-step process to perform Data Blending with Tableau.

  1. Understand the Use Case

This case is about analyzing the 2020 and 2021 revenues of two departments from Zoom.AI, an AI company that operates in different African capitals (Big Cities) as shown in the screenshots below.

Revenue Data From Department 1 and Department 2 (Image by Author)

2. Understand the data columns

We notice that both data have the same format and convey the same message. However, the first two columns’ names are different (Period, Capitals for Department 1, and Year, Cities for Department 2).

Trying to apply Joins to the current format of the Data will fail because the columns involved in the Join need to be the same. We will see how Data Blending can smartly combine those data without any prior column normalization.

3. Create Data Connections

Knowing that our data is stored in different files, we need to create a space in Tableau, where they can be regrouped, and that is done by creating Data Connections in Tableau as shown below:

Importing data into Tableau (Animation by Author)

Our departments’ data appear in the Data tab, in the left-top corner, meaning they have been successfully uploaded.

4. Start Data Blending

Let’s start by visualizing how the Revenue varies per Capitals for the first department. This can be done with the following two steps:

  • drag and drop the Revenue column in the Columns section of Tableau.
  • drag and drop the Capitals column in the Rows section of Tableau.
Visualization of Revenue for 2020 (Animation by Author)

The goal is to use Data Blending in order to compare the departments’ revenue. For this purpose, we need to perform the same drag-and-drop process for the second department’s data, which has the Cities column instead, as shown below.

The result for department 2 has been associated with the orange color for clarification purposes.

Adding the Revenue for Department 2 (Animation by Author)

The visualization broke💥💥💥, Why?!

The reason for that is that we haven’t specified to Tableau that the Capitals column has the same level of information as the Cities column, which leads to having the star signs for the Cities column in the visualization.

This is where Data Blending comes in handy by fixing such an issue, which can be done by specifying that the Cities and Capitals columns mean the same thing. Below is the illustration.

Proper blending of the data (Animation by Author)

After we solved the issue, we can see an orange chain⛓ sign on the Cities column on the left. This means that the Cities column has been used as the linking field.

More Data Blending

Now that you understand how to properly blend data, let’s perform the same analysis at the year/period level. To do so, we just need to create a link between the Period and Year columns, otherwise, we will face the same issue as the one before.

Proper blending of the data (Animation by Author)

Congrats on making it to this level! Now that you know how to perform Data Blending, you can learn how to create advanced visualizations using Tableau from this course.

What are the benefits of Data Blending in Tableau?

As you can observe from the previous tutorial, using data blending provides a lot of benefits compared to Joins. Some of them are shown below:

  1. No preprocessing of data columns

We don’t need to perform any costly preprocessing of columns to be able to combine different data with Data Blending, because a simple specification of the columns of interest is enough for the Data Blending algorithm to understand how to efficiently combine the data.

2. A better way to solve the granularity issue

Data Blending provides better flexibility when it comes to combining data with different levels of granularity. Let’s imagine that our departments’ revenue data were organized at a monthly level instead of yearly.

Using simple joins will lead to an inaccurate result because the yearly rows will be duplicated for each monthly row as illustrated below. Data Blending can solve this issue by aggregating the months’ information to the yearly level.

Data Blending at different levels of granularity (Image by Author)

Team productivity

Let’s consider the previous granularity case. Without Data Blending, we would have to perform some data preprocessing such as aggregations, and duplicate removal. When dealing with huge data sets, this kind of preprocessing can quickly become time-consuming.

Quick analysis for better decision-making

Accurate decision-making relies on having a better overview of the data. It took a few clicks and drag and drop to capture the essence of the two departments. Using Joins could lead to inaccurate results, thus inaccurate decision-making.

Tableau Data Blending Limitations.

Despite the advantages of data blending, it also has some downsides as shown below:

  • Data Blending works with the left join under the hood, and it does not perform any other types of joins.
  • The order matters when trying to blend data with different granularity. The secondary data always have to have the smallest level of granularity, because it is the one that is subject to aggregation.
  • Non-additive aggregates such as COUNT, MEAN, MEDIAN, and SUM suffer from data blending.

Conclusion

This tutorial has covered the main aspects related to the motivations behind using Data Blending, some of its benefits for Data Scientists compared to joins, and also a hands-on practice to better understand how it works.

I hope you found this benchmarking analysis useful for making a good choice!

Also, If you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $ 5-a-month commitment, you unlock unlimited access to stories on Medium.

Feel free to follow me on Medium, Twitter, and YouTube, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!

FOLLOW US ON GOOGLE NEWS

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – admin@technoblender.com. The content will be deleted within 24 hours.
Ai NewsbeginnersblendingDataGuideJanKeitamachine learningStepbySteptableauTechnoblenderTutorialZoumana
Comments (0)
Add Comment