Use-Case-Driven Data Readiness as an Instrument to Manage and Control Data Quality | by Murali Kashaboina | Dec, 2022


A data readiness-based approach to addressing data quality

Photo by Michael Dziedzic on Unsplash

TL;DR: Data quality should be viewed through its readiness for specific consumption use cases. This article proposes a data readiness approach that establishes a specific data consumption context through a specific use case to improve data quality. Data readiness for a specific use case can be tested just as a software use case is tested with various test-case scenarios. Like data dictionaries, a data readiness artifact must also be cataloged. The question is no longer the clichéd one of whether data is of the highest quality, but for which use cases the data has been readied for consumption.

The Problem

Data quality has always been a hot topic and is debated widely. It is not limited to one industry but is a problem prevalent across all sectors. It is easy to imagine the topic being on the mind of every data officer, who eventually rushes to solve it. The general pattern is that everybody assumes everyone else is addressing it, so they rush to address it themselves, with a below-par success rate in most cases. The problem lies in the rushing. It is as if somebody drinks the Kool-Aid too fast, too early.

Can time-tested software quality practices help?

While data quality is a tormenting issue, it should not be treated as a standalone concern. It will be hard to rally around a cause to fix it if it is seen as a standalone problem. Conceivably, approaches different from the incumbent ones should be incubated. Perhaps we can take a cue from how software quality typically gets addressed; such well-established practices may shed some light here. Software quality is primarily addressed in terms of the software's readiness for consumption, both functional and non-functional. Typically, use cases or user stories represent the functional requirements, and the software's readiness is evaluated by validating whether it meets those requirements. Likewise, user requirements such as response times and request-processing throughput define the non-functional characteristics that get validated against the software. While no software is perfect, such readiness assessment is a means to an end to ensure better quality. Effectively, functional and non-functional use cases drive a software's readiness assessment. Readiness thus reasons about the software's state of quality by providing a consumption context, and people rally around fixing the quality to improve the readiness.

A similar philosophy can be applied to data. As soon as data is viewed through the lens of a specific consumption use case, its consumption readiness becomes evident immediately. Poor data collection practices, missing data, inaccessible storage mechanisms, poor semantics and structures, lack of standards, lack of intellectual property rights, missing stewardship, unclear ownership, and limited security and privacy protections all hinder the consumption readiness of data. Such issues are also evident signs of poor data quality. Effectively, data readiness indicates how usable, complete, reliable, trustworthy, and meaningful the data is before insightful knowledge and intelligence can be extracted from it to support specific use cases.

Proposed Approach

The proposal is to leverage a data readiness approach that establishes a specific data consumption context through a specific use case to improve data quality. Data readiness for a specific use case can be tested just as a software use case is tested with various test-case scenarios. Such an approach also promotes concepts such as data-product, data-as-a-feature, and data-as-a-product, concepts the data-mesh aficionados would love.
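As a minimal sketch of what such use-case-driven readiness testing could look like in practice, the checks for one consumption use case can be written as test-case-style assertions over the data. The use case ("churn prediction"), field names, and rules below are purely illustrative assumptions, not part of the original proposal:

```python
# Hypothetical sketch: readiness checks for one consumption use case,
# written like software test cases. Field names and rules are assumptions.

def check_readiness_for_churn_model(rows):
    """Return readiness failures for a hypothetical 'churn prediction'
    use case; an empty list means the data is ready for that use case."""
    failures = []
    required = {"customer_id", "tenure_months", "churned"}
    for i, row in enumerate(rows):
        missing = required - row.keys()
        if missing:
            failures.append(f"row {i}: missing fields {sorted(missing)}")
            continue
        if row["tenure_months"] < 0:
            failures.append(f"row {i}: negative tenure_months")
        if row["churned"] is None:
            failures.append(f"row {i}: missing churn label")
    ids = [r.get("customer_id") for r in rows]
    if len(ids) != len(set(ids)):
        failures.append("duplicate customer_id values")
    return failures

sample = [
    {"customer_id": 1, "tenure_months": 12, "churned": True},
    {"customer_id": 2, "tenure_months": -3, "churned": False},
    {"customer_id": 2, "tenure_months": 5, "churned": None},
]
print(check_readiness_for_churn_model(sample))
```

Each failure message pinpoints a concrete readiness gap for this one use case, which is exactly the consumption context the approach aims to establish.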

However, there is a caveat. Data readiness for one use case may not meet the needs of another. Therefore, it is crucial to build data readiness incrementally, covering all the potential use cases in a given business domain. When new use cases are discovered, it becomes critical to re-assess the data readiness for new and existing use cases collectively. Such re-assessment is akin to regression testing of software when new features are added or existing features are updated.
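The regression-testing analogy above can be sketched as a registry of per-use-case readiness checks that is re-run in full whenever a new use case is registered. All names and checks here are illustrative assumptions:

```python
# Hypothetical sketch: a registry of readiness checks, one per use case,
# re-assessed collectively whenever a new use case is added, analogous
# to a software regression suite. Use cases and rules are assumptions.

registry = {}

def register_use_case(name, check):
    registry[name] = check
    # Re-assess readiness for ALL registered use cases, not just the new one.
    return reassess_all()

def reassess_all():
    return {name: check() for name, check in registry.items()}

dataset = [{"amount": 10.0}, {"amount": None}]

register_use_case(
    "monthly_revenue_report",
    lambda: all(r["amount"] is not None for r in dataset),
)
results = register_use_case(
    "fraud_detection",
    lambda: len(dataset) >= 2,  # toy data-volume requirement
)
print(results)  # {'monthly_revenue_report': False, 'fraud_detection': True}
```

Registering the second use case re-runs the first check too, surfacing that data deemed adequate earlier still fails a readiness requirement.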

The data readiness assessment for a specific use case spans the entire data life-cycle, addressing use case requirements from data creation through the data processing operations employed during preparation, cleaning, transformation, standardization, canonicalization, and storage. This is again similar to how software gets tested end-to-end to ensure readiness at each component or sub-system in the path to meeting specific use case requirements.
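One way to picture this life-cycle-spanning assessment is to attach a readiness check to each pipeline stage, so a violation is reported at the stage where it arises. The stage, transformation, and check below are illustrative assumptions:

```python
# Hypothetical sketch: readiness checks attached to each stage of a toy
# pipeline, so failures pinpoint where in the data life-cycle a use-case
# requirement was violated. Stage names and rules are assumptions.

def standardize(record):
    # Illustrative standardization: trim whitespace and lowercase strings.
    return {k: v.strip().lower() if isinstance(v, str) else v
            for k, v in record.items()}

def check_no_blank_email(record):
    # Illustrative per-record readiness rule for a downstream use case.
    return ["blank email"] if not record.get("email") else []

def run_pipeline(records, stages):
    """Apply each (stage_name, transform, check) in order and collect
    readiness violations per stage."""
    report = {}
    for name, transform, check in stages:
        records = [transform(r) for r in records]
        report[name] = [msg for r in records for msg in check(r)]
    return records, report

records = [{"email": "  A@Example.COM "}, {"email": ""}]
cleaned, report = run_pipeline(
    records, [("standardize", standardize, check_no_blank_email)]
)
print(report)  # violations attributed to the 'standardize' stage
```

Because each stage reports its own violations, the readiness assessment covers the processing path end-to-end rather than only the final stored data.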

Cataloging Data Readiness Information

Just as metadata such as data dictionaries is cataloged, a data readiness artifact must also be cataloged. The artifact should capture the use cases supported and the readiness assessment results for each use case. It can serve as an authentic information source highlighting which data quality operations were performed and how those operations ensured that the data meets specific use case requirements. The data readiness artifact becomes a record of truth controlled by the corresponding data steward. Such information could reduce repetitive data exploration and analysis, enable easy data auditing, and increase data accountability and trustworthiness. Essentially, it is another way of effectuating data stewardship.
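A cataloged readiness artifact might look something like the structure below. This is a sketch only; the field names are assumptions, not a standard schema, and a real catalog would define its own:

```python
# Hypothetical sketch: a data-readiness artifact recorded alongside a
# data dictionary entry in a catalog. Field names are assumptions.
import json

readiness_artifact = {
    "dataset": "customer_profiles",
    "steward": "jane.doe",          # accountable data steward
    "assessed_on": "2022-12-01",
    "use_cases": [
        {
            "name": "churn_prediction",
            "status": "ready",
            "checks_passed": 14,
            "checks_failed": 0,
            "quality_operations": ["deduplication", "label backfill"],
        },
        {
            "name": "lifetime_value_scoring",
            "status": "not_ready",
            "checks_passed": 9,
            "checks_failed": 3,
            "quality_operations": [],
        },
    ],
}
print(json.dumps(readiness_artifact, indent=2))
```

Recording which quality operations were performed, and the per-use-case pass/fail results, is what lets the artifact serve as an auditable record of truth.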

Final Remark

Effectively, the question is no longer the clichéd one of whether data is of the highest quality, but for which use cases the data has been readied for consumption. The proposed data readiness-driven approach changes the dialog among the stakeholders and offers a new perspective on how data is viewed and how data quality is managed.

References

Afzal, S., Rajmohan, C., Kesarwani, M., Mehta, S., & Patel, H. (2021). Data readiness report. 2021 IEEE International Conference on Smart Data Services (SMDS), 42–51. https://doi.org/10.1109/SMDS53860.2021.00016

Cheng, G., Li, Y., Gao, Z., & Liu, X. (2017). Cloud data governance maturity model. 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), 517–520. https://doi.org/10.1109/ICSESS.2017.8342968

Lawrence, N. D. (2017). Data readiness levels. arXiv. https://doi.org/10.48550/arXiv.1705.02245

