
Why and How to Deprecate Dashboards | by Sarah Krasnik | Dec, 2022



When there are too many dashboards to sift through, establish an automated deprecation process to keep your BI instance clean.

Your dashboards aren’t this critical. Photo from Unsplash

How many dashboards can one person reasonably gain insight from and act upon? While I don’t know the answer (this would actually be a great survey), I hope we can all agree the limit does exist. When we run before we walk and create a dashboard for every question asked without analyzing the question itself, that’s when dashboard bloat happens.

Definition: Dashboard bloat is the effect on an organization when time is wasted finding the relevant visual to answer a question or recreating visuals that already exist.

The cost in this case is time (and thus efficiency). When the holy grail Looker instance is poorly organized, the Head of Growth spends more time looking for that one useful graph on CAC segmented by paid versus free users than actually altering the growth strategy. If cleanliness and organization have already gone down the drain, the analytics team recreates visualizations that already exist.

Dashboard bloat can be the result of a poorly implemented self-serve strategy or the direct antithesis of self-serve, depending on your viewpoint. Some have been saying dashboards aren’t the answer since 2013, but coming up on 10 years later they’re as used as ever. While there certainly are new trends like data apps, automated insights, and operationalized data — dashboards aren’t going anywhere for a long time.

Under the premise they’re here to stay — how do we organize them?

They say startups win because they move fast, and the same holds for analytics teams within them. There is usually so much to do that strategy gets deprioritized in favor of constant execution of everything and anything, until the tech debt has already piled up.

On the off chance anyone reading is just starting to think about BI tool strategy, this section is for you. Usually, strategy talks come only after going too far too fast and needing to retrace your steps. In that case, answering these questions is about how you want the world to look, not how it looks today.

Dashboard bloat can only be avoided by answering high-level questions about how you, as a data leader, envision BI working within your organization, and by actively course-correcting to stay as close to that vision as possible.

💡 Think about overall analytics strategy.

  • How do you decide which data products to prioritize and how they will be delivered?
  • Who is responsible for modeling data and building visualizations?

💡 Get specific with rules of the game for BI.

  • How are the folders in the instance organized?
  • Who has view, edit, and admin permissions in the organization?
  • When do visualizations reach end-of-life status and are no longer supported?

Answers to these questions vary depending on company size (50 vs 500 people?), culture (centralized or distributed analytics team?), business type (B2B or B2C?), and more. That last question, though, is the most important.

Let’s start from the beginning: here is the basic strategy I start with.

Analytics strategy

  1. Make sure each data product has a purpose significant enough to justify its ongoing maintenance. The purpose should tie to a larger company goal.
  2. Identify and document SLAs for each data product. Does attribution source per user really need to be exposed via an API and will it actually be used in near real-time? (No. It definitely won’t.)
  3. Each data product should answer a question, not make a statement (i.e., instead of “CAC of free vs paid users”, ask “When is CAC lowest between free and paid users?”).

BI instance organization

  1. Organize your instance in folders by team. A “general” folder can include company-wide metrics but shouldn’t be overused.
  2. Each team should have edit access to only their team folder and view access to everything else (with exceptions, like team performance or PII).
  3. Clean up your BI instance automatically — this is so important that it deserves its own section.

As data professionals, we’ve all been on the receiving end of an ask that’s claimed to be mission critical. They say beauty is in the eye of the beholder; so is criticality. When we then discover the dashboard that’s updating every 15 minutes isn’t actually used, sure, it can be deleted once. But what’s preventing the same issue from happening again?

Strategy is only as good as its maintenance, and a critical part of maintenance is deprecation.

While manual review might be the most politically palatable approach within a company, it’s also the most labor-intensive and the hardest to prioritize. I propose an automated dashboard deprecation strategy that’s built once and needs minimal human support.

An automated dashboard deprecation strategy fetches all BI metadata and automatically deletes visuals that haven’t been used in some time.

I’ll walk through pseudo-code samples in Python using Looker, since it’s popular and prone to bloat, with pointers for other enterprise tools along the way (this can be done with virtually any BI tool).

Let’s get technical.

1. Write a script to dump all BI metadata into the warehouse.

Initialize access to the instance. With Looker this happens via a Python SDK. Set environment variables for the instance URL, client ID, and client secret. Other BI tools may have an official SDK (e.g., Tableau), an unofficial SDK (e.g., Domo), or you may find it more convenient to call the REST API directly (e.g., Power BI).

import looker_sdk
# Reads LOOKERSDK_BASE_URL, LOOKERSDK_CLIENT_ID, LOOKERSDK_CLIENT_SECRET from the environment
sdk = looker_sdk.init31()

Fetch all the data you can through the SDK directly. For Looker, the most useful information comes from fetching all dashboards, Looks, and users. For Tableau Server, fetch workbooks, views, and users. No matter the BI tool, you’ll need to clean the response, either by serializing it to JSON or by extracting only the fields of relevance (like ID, name, created date, and user); a sketch of the latter approach follows the snippet below.

dashboards = sdk.all_dashboards()
looks = sdk.all_looks()
users = sdk.all_users()
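
For the field-extraction route, here’s a minimal sketch of what the cleaning step might look like; the attribute names follow the Looker SDK’s dashboard and Look models, but treat them as assumptions and verify them against your own instance:

dashboard_rows = [
    {"id": d.id, "name": d.title, "user_id": d.user_id, "created_at": str(d.created_at)}
    for d in dashboards
]
look_rows = [
    {"id": l.id, "name": l.title, "user_id": l.user_id, "created_at": str(l.created_at)}
    for l in looks
]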

Fetch data from internal usage stats reports. Many tools don’t expose usage and history through their API directly. However, they do expose raw datasets, as with Looker (i__looker metadata) and Power BI (raw audit logs), or pre-built admin reports, as with Tableau and Domo. Any BI report can itself be exported as a dataset in Python. All we really need is a report with the latest access date for each visual.

history_data = sdk.run_look(my_admin_look_id, result_format="json")

Write all these outputs into warehouse tables. I won’t bother you with code snippets for this one. Overwrite tables with data dumps (like all visuals), and append data that builds over time (like historical access). This process should be idempotent and run on a schedule (I suggest daily) to be fully automated.
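
That said, purely for illustration, here is a minimal sketch of the load step using pandas and SQLAlchemy, reusing the `dashboard_rows` list from the earlier sketch (the connection string and table names are assumptions):

import json
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical warehouse connection string -- swap in your own
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")

# Full dumps get overwritten each run; access history gets appended so it accumulates
pd.DataFrame(dashboard_rows).to_sql("looker_dashboard", engine, if_exists="replace", index=False)
pd.DataFrame(json.loads(history_data)).to_sql("looker_historical_access", engine, if_exists="append", index=False)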

2. Model the data to make sense of usage.

No matter how you choose to transform data, the raw tables need to be joined, pivoted, and aggregated to make any sense.

Let’s recap the sources we have:

  • Tables for each visual type (dashboards and Looks in the Looker example). Call them `looker_dashboard` and `looker_look`
  • Table of users. Call it `looker_user`
  • Table of historical access (either raw or aggregated to latest access date per visual). Call it `looker_historical_access`

The result we need is a table with one row per visual, when it was created, the user who created it, and the last date it was viewed or edited. A rough query could look something like this:

with history as (
    select
        visual_id,
        max(access_date) as latest_access_date
    from looker_historical_access
    group by visual_id
), dashboards as (
    select
        id as visual_id,
        name as visual_name,
        user_id as visual_created_user_id,
        created_at as visual_created_at,
        'dashboard' as type
    from looker_dashboard
), looks as (
    select
        id as visual_id,
        name as visual_name,
        user_id as visual_created_user_id,
        created_at as visual_created_at,
        'look' as type
    from looker_look
), visuals as (
    select * from dashboards
    union all
    select * from looks
)
select
    v.*,
    coalesce(h.latest_access_date, v.visual_created_at) as latest_access_date,
    u.email
from visuals as v
left join history as h on h.visual_id = v.visual_id
left join looker_user as u on u.id = v.visual_created_user_id;

A few things I’ve glossed over:

  • The history access table may or may not be already aggregated, depending on your source report. It will also likely not contain a clean `visual_id`, so that will have to be derived.
  • You’ll have to union different visual information (whether it be dashboards and looks for Looker or workbooks and views for Tableau).
  • Sometimes, creating the visual doesn’t count as accessing it, so you’ll need to make sure recently created visuals aren’t flagged for deletion.
  • The data can get much more complicated when you start to bring in user access data, folder structures, and more. If building on this process you can structure this more rigidly in different dbt models, but I’ve taken the easiest approach for starters.
  • To alert users via Slack, you’ll need to map the creator’s email to a Slack username; a sketch of that lookup follows this list.
  • If you materialize this model as a table rather than a view, refresh it on a schedule.
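
A minimal sketch of that email-to-Slack lookup, assuming the slack_sdk package and a bot token with the users:read.email scope (the token value is a placeholder and error handling is simplified):

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

slack_client = WebClient(token="xoxb-your-bot-token")  # placeholder token

def slack_user_id_for_email(email):
    # Return the Slack user ID for an email address, or None if there's no match
    try:
        response = slack_client.users_lookupByEmail(email=email)
        return response["user"]["id"]
    except SlackApiError:
        return None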

3. Automatically warn users before deprecation, then delete visuals.

So we’ve got all the data in the warehouse and we know which visuals haven’t been used recently (I usually recommend 60 or 90 days as the threshold for “not recent”). The BI tool is usually used heavily outside of the data team, so how should this effort be communicated?

Communicate the reason for the effort. Organizational communication is always the hardest step. Before beginning the deprecation effort, document and communicate the benefits of keeping a clean BI instance to the broader organization (…or circulate this post if you feel so inclined). The purpose is not to delete others’ work; it’s to enable everyone in the company to get insights from data faster.

Create a deprecation Slack channel for automated communication. Anyone who is a user of the BI tool should be added to this channel.

Query visuals that haven’t been accessed in X-7 days and send a Slack message. If deleting at 60 days of idle time, warn about visuals unused for 53 days; if deleting at 90 days, warn at 83. Send a Slack message into the channel for each visual, tagging the user who created it.

# Everything below is pseudo-code, with utility methods abstracted away
deprecation_days = 60
warn_visuals = get_warehouse_data(  # Pseudo method
    f'''
    select visual_name, created_by_user
    from modeled_looker_data
    where current_date - last_accessed_date = {deprecation_days - 7}
    ''')
slack_message_template = '''
Visual {{visual_name}} created by @{{slack_username}} will be
deprecated in 7 days. If this is incorrect, please contact the
analytics team.
'''
for visual in warn_visuals:
    send_slack_message(slack_message_template, visual)  # Pseudo method

Query visuals that are ready for deletion and delete them programmatically. After getting the list of visuals ready to be deleted, iterate over them and delete each one. Different types of visuals will likely require different deletion methods, so store the type when modeling the data and use it in this step. Each of these visuals should already have received a warning message in the previous step.

deprecation_days = 60
delete_visuals = get_warehouse_data(  # Pseudo method
    f'''
    select visual_id, type
    from modeled_looker_data
    where current_date - last_accessed_date >= {deprecation_days}
    ''')
for visual in delete_visuals:
    visual_id = visual['visual_id']
    if visual['type'] == 'look':
        sdk.delete_look(visual_id)
    else:
        sdk.delete_dashboard(visual_id)

As they say in woodworking: measure twice, cut once. When it comes to deleting things, run the automated process for a few weeks by commenting out the actual deletion to ensure the logic is sound.
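
One way to do that without commenting code in and out, as a sketch of my own rather than part of the process above, is a dry-run guard around the destructive calls:

DRY_RUN = True  # flip to False once a few weeks of logs look correct

for visual in delete_visuals:
    visual_id = visual['visual_id']
    if DRY_RUN:
        print(f"Would delete {visual['type']} {visual_id}")  # log instead of deleting
        continue
    if visual['type'] == 'look':
        sdk.delete_look(visual_id)
    else:
        sdk.delete_dashboard(visual_id)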

The title of this post was purposeful: I find cleaning thrilling, and that includes automatically cleaning BI instances. They say productivity is heightened when your workspace is neat and tidy, so why wouldn’t your BI instance be just as neat?

Finally, I would be remiss if I didn’t make a few additional comments below.

Most enterprise tools don’t have free tiers, which is why I included a rough code outline instead of specific code snippets that would be hard to continually test. I also haven’t mentioned newer or code-heavy BI tools like Superset, Lightdash, Metabase, and the like. While I recommend this approach no matter the tool, the specific endpoints exposed in the API may vary.

It’s important to mention the role of data catalogs in metadata efforts. While most modern data catalogs connect to BI tools and collect the metadata for you, they’re not quite built (yet) to be proactive and delete visuals. A middle ground in the interim, however, could be to export the BI metadata directly from a centralized data catalog and write the deprecation logic yourself. This approach still requires dealing with APIs and the like.

Originally published on sarahsnewsletter.substack.com.

