
Big data experience with Tableau dashboards — challenges and learnings | by Alle Sravani



Photo by Myriam Jessier on Unsplash

My humble dashboard creation journey began with Excel. Since then, I have used multiple tools, including Qlik, Sisense, Power BI, and Tableau. Tableau is still my favorite because it never gets dull: it's easy to use and learn, but it can also get complex very quickly, and the satisfaction of a task well done is priceless. I have had the opportunity to collaborate with many Tableau experts, and I gradually picked up tips and tricks for making powerful visuals. Even though I've become proficient with the tool, I think I've only scratched the surface.

One of the biggest challenges I faced recently was building a dashboard on big data. I had to design a dashboard to track the effectiveness of push notifications sent to mobile app users. Push notifications are sent for many reasons: status updates, new offers, reminders, campaigns, and so on. Depending on the number of users and notifications sent, this data can become extremely large. When I started working on this project, there were already three months' worth of data, more than 50 million records, in the source table!

Best practices for the dashboard creation process. Image created by author using PowerPoint.

Before creating a new dashboard, one should gather all the requirements from the stakeholders and end users. It's crucial to understand what to expect from the dashboard: what it's for, which metrics should be tracked, what level of granularity is needed, which filters are required, and so on. If at all possible, all of this should be documented, to avoid multiple change requests and wasted time developing something that will never be used. I had several meetings with stakeholders while working on the project to ensure that all requirements were met, and I also involved the engineering team to ensure that the required data was readily available in our data warehouse.

The next step is to set up the engineering foundation: in the case of Tableau, a proper data extract to which the dashboard can connect. What is an extract? It is simply a saved subset of your original data that can be refreshed regularly. The biggest advantage of an extract is improved performance: you reduce the total number of queries to your primary data source, which is hit only during refreshes. Furthermore, you can aggregate, keep only the required fields, and apply filters to reduce the extract's size. A well-designed extract can also serve as the data source for multiple workbooks, saving the time of creating a new dataset each time. You can read the complete documentation of Tableau extracts here.
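The core idea behind an extract can be illustrated in plain SQL: aggregate the raw data once, to the granularity the dashboard actually needs, and serve every dashboard query from the smaller result. Here is a minimal sketch using Python's built-in sqlite3; the table and column names are hypothetical, not the actual warehouse schema from the project:

```python
import sqlite3

# Hypothetical raw source: one row per push notification sent.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE notifications (
        user_id INTEGER, sent_date TEXT, campaign TEXT, opened INTEGER
    )
""")
rows = [
    (1, "2023-01-01", "offer", 1),
    (1, "2023-01-01", "offer", 0),
    (2, "2023-01-01", "reminder", 1),
    (2, "2023-01-02", "offer", 0),
]
conn.executemany("INSERT INTO notifications VALUES (?, ?, ?, ?)", rows)

# The "extract": pre-aggregate to the granularity the dashboard needs,
# so the dashboard never has to scan the raw event-level table.
conn.execute("""
    CREATE TABLE extract_daily AS
    SELECT sent_date, campaign,
           COUNT(*)    AS sends,
           SUM(opened) AS opens
    FROM notifications
    GROUP BY sent_date, campaign
""")

raw_count = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
extract_count = conn.execute("SELECT COUNT(*) FROM extract_daily").fetchone()[0]
print(raw_count, extract_count)
```

The row-count reduction here is tiny, but on event-level data with millions of rows per day, aggregating to day and campaign level is exactly what shrinks an extract from unmanageable to fast.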

There are multiple ways of creating an extract:

  • Create a data model using Tableau Desktop and then publish it to the server as an extract. This is easy and works well for small datasets.
  • Create a data model using Tableau Desktop, publish it to the server as a "live" connection, and then change it to an extract on the server. This is best for medium-sized datasets, since it saves the time of creating the extract locally.

This is where I encountered my first stumbling block: I was unable to create an extract because the data was too large. I made a live connection to the dataset to see if I could still work without an extract. It was a nightmare. The dashboard was extremely slow, and whenever I tried to change something in the view, it would either take 20 seconds or more to load or the application would stop responding entirely. I began looking for other options and came across a useful article here that demonstrates how to create an empty extract and trick the server into refreshing the data.

Success message after an extract is published to Tableau. Screenshot taken by author.

I was thrilled to have successfully created an extract. I built the dashboards according to the specifications, and after a thorough QA (quality assurance) check, the dashboard was made available to everyone. My success was short-lived, however. After a few months, the extract failed to refresh on the server, and the connection automatically switched from extract to live. The data had surpassed 200 million records, and every attempt to extract it, even using the empty-extract trick, failed. The dashboard was slow again, and I began receiving requests from various users to improve its performance.

Lesson 1 — Temporary fixes don’t last

I knew that the only way to improve performance was to reduce the size of the data and turn it into an extract. Some best practices had already been followed when the dataset was created in our data warehouse: to avoid costly joins, all of the properties required for analysis (for both users and push notifications) were consolidated into one big table (OBT). There is a common belief that a star schema is always better than an OBT, since the normalised tables require less storage and are easier to understand. For big-data analytical reporting, however, joins between the fact table and multiple dimension tables can have a big impact on performance when retrieving data. The tradeoffs between a star schema and an OBT are well explained in this blog here.
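To make the star schema vs OBT tradeoff concrete, here is a small sqlite3 sketch with hypothetical tables: both layouts answer the same question, but the star schema pays for a join on every query, while the OBT pays once, at build time, with extra storage:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Star schema: a fact table of sends plus a user dimension (hypothetical columns).
conn.executescript("""
    CREATE TABLE fact_sends (user_id INTEGER, opened INTEGER);
    CREATE TABLE dim_users (user_id INTEGER, country TEXT);
    INSERT INTO fact_sends VALUES (1, 1), (1, 0), (2, 1);
    INSERT INTO dim_users VALUES (1, 'DE'), (2, 'FR');

    -- OBT: the same data denormalised into one big table, joined once up front.
    CREATE TABLE obt_sends AS
    SELECT f.user_id, f.opened, u.country
    FROM fact_sends f JOIN dim_users u ON f.user_id = u.user_id;
""")

# Same metric, two ways: the star schema joins at query time...
star = conn.execute("""
    SELECT u.country, SUM(f.opened)
    FROM fact_sends f JOIN dim_users u ON f.user_id = u.user_id
    GROUP BY u.country ORDER BY u.country
""").fetchall()

# ...while the OBT reads a single table with no join.
obt = conn.execute("""
    SELECT country, SUM(opened)
    FROM obt_sends GROUP BY country ORDER BY country
""").fetchall()

print(star == obt)  # identical results; the difference is where the join cost is paid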

Any manipulation I could do was limited to the dataset I had created for the dashboard. Now that the dashboards were completed, I noticed that some metrics were specific to the users while others were specific to the push notifications. This opened up new possibilities:

  • I could create two different datasets, one for the users and the other for the push notifications.
  • I could also use CTEs (common table expressions) and window queries to compute additional metrics upstream, avoiding some calculated fields in Tableau.
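As a sketch of the second point, here is the kind of query meant, run through sqlite3 (which supports window functions in the SQLite versions bundled with modern Python). The schema and the running-opens metric are hypothetical examples, not the project's actual fields; the point is that the metric arrives precomputed as an ordinary column, so Tableau needs no calculated field for it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sends (user_id INTEGER, sent_date TEXT, opened INTEGER);
    INSERT INTO sends VALUES
        (1, '2023-01-01', 1), (1, '2023-01-02', 0),
        (1, '2023-01-03', 1), (2, '2023-01-01', 1);
""")

# A CTE feeds a window function that precomputes a per-user running
# open count, so the dashboard reads it as a plain column.
rows = conn.execute("""
    WITH user_sends AS (
        SELECT user_id, sent_date, opened FROM sends
    )
    SELECT user_id, sent_date,
           SUM(opened) OVER (
               PARTITION BY user_id ORDER BY sent_date
           ) AS running_opens
    FROM user_sends
    ORDER BY user_id, sent_date
""").fetchall()

for row in rows:
    print(row)
```

Pushing this work into the warehouse query both shrinks the extract and spares Tableau from recomputing the metric on every interaction.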

With the above changes, the combined size of the new aggregated datasets was only 3% of the original! All that remained was to replace the old data sources in the dashboard with the new ones.

Lesson 2 — Always work with extracts on Tableau

The second roadblock I encountered: while Tableau allows you to replace a data source, you cannot replace one data source with multiple data sources, and there is no way to selectively assign a data source to specific sheets. This meant I would have to rebuild at least half of the dashboard from scratch with one of the new datasets. Given the number of custom calculations, parameters, and formatting tricks I had already implemented, starting over was my last resort. I looked online for a solution but couldn't find one. Then a Tableau expert I know suggested a clever workaround: copying sheets from another workbook.

Here is how it works:

  • Make a copy of your original workbook (call it workbook1) and delete any sheets whose data source you do not want to replace.
  • Replace the data source in workbook1 with one of the new data sources.
  • Copy and paste the workbook1 sheets into the original workbook. (To copy a sheet, right-click its name on the bottom ribbon and select 'copy'. Then return to the original workbook, right-click on the ribbon, and select 'paste'.)
  • Now comes the magic: you'll have two sets of sheets, one from workbook1 pointing at the new data source and one from the original workbook still pointing at the old one.
  • All that remains is to swap the sheets. Go to the dashboard and use 'swap sheets', making sure you swap in the correct ones. After swapping, delete the sheets that are no longer required from the original workbook.
  • Replace the original workbook's old data source with the other new data source. This changes the data source for the remaining sheets.
  • Congratulations! You have successfully converted a single-data-source workbook into a two-data-source workbook.

The entire process took an hour, whereas manually building half the dashboard might have taken me a whole day.

Lesson 3 — Make friends with Tableau experts

It has been a while since the new version of the dashboards went live, and there have been no reports of bugs or performance issues, so I guess the work here is done. It's funny how sometimes you expect a very complicated solution to your problem, yet the solution that works best is also the simplest. I hope you found something useful here. If you are also a Tableau user, I would love to hear the best hacks you have used recently!

Keep learning, don’t hesitate to seek help. Photo by John Schnobrich on Unsplash


