
Managing a Federated Data Product Ecosystem | by Eric Broda | Jan, 2023



Photo by L B on Unsplash

We all know that data volumes, variety, and complexity are increasing exponentially. Yet our current approach — centralized data management — is failing. Enterprises are offered the illusion of greater control, yet see slow, inflexible, and bureaucratic processes that hinder innovation.

So, out of necessity, it is no surprise to see enterprises experimenting with different data management approaches. Data Mesh is one of these “experiments”. And the early indications are that things are working.

But Data Mesh is still evolving and maturing. And data federation — upon which Data Mesh is based — still has some obstacles on its path.

This article describes how Data Mesh embraces federation of data — and federation of data management — to dramatically increase speed, foster agility, and enable local autonomy. It also identifies obstacles to success and offers ways to overcome them. Along these lines, several topics will be discussed, including:

  • Data Mesh as a crucial enabler of federated data and federated data management
  • Latent challenges in scaling Data Mesh
  • New principles to federate data in a Data Mesh
  • New principles for scaling data federation with Data Mesh

Data Mesh is a relatively new approach that offers a new set of principles for data management. Simplistically, these principles treat data as a first-class product with clear ownership and accountabilities, supported by a self-serve platform and federated governance. Much has been written about Data Mesh elsewhere, so I won’t deep dive into it specifically.

Rather, I suggest that there is something even more fundamental about Data Mesh that is worth discussing. Simply put, the promise of Data Mesh is that it enables federation of data and federation of data management, which fosters local autonomy, which drives the speed, agility, and innovation required in today’s fast changing market.

But let’s start at the beginning: What does it mean to have “federated data” and “federated data management”? And what makes it so much better?

Let’s start with the basics: Data is the “Lego building block” of the modern enterprise. Like a single Lego block, individual data elements provide limited value. But when combined, blocks can be assembled into bigger components called “data products”.

Figure 1, Data Mesh — the Lego Blocks of Enterprise Data

Data products, when combined with the Data Mesh principles of clear boundaries, empowered owners, and self-serve capabilities, enable the “federation” of data products within the enterprise. Data products are distributed across the enterprise, with no central process or team to bind them.

Today, many enterprises have several data products running in vibrant but small ecosystems. But as these ecosystems grow, we find that data products become:

  • Hard to find, as there is no “registry” that acts as a searchable directory of data products. And once found, they have little or inconsistent documentation, making data products hard to understand, especially as their usage evolves beyond the original creator group.
  • Hard to consume and access, since there are few simple or consistent methods to acquire the permissions to access data. There is no name resolution service that translates a data product identity into endpoints, as DNS does for the internet. And once access is granted, data products are hard to consume, since there are few consistent mechanisms to access diverse data products.
  • Hard to operate, observe, and secure, as each data product has diverse security needs, and even more diverse implementations. Instrumentation for observability is inconsistent, diverse security requirements lead to inordinate complexity, and operability is delayed until production problems demand resolution.
  • Hard to trust, as the lineage of data, its transformations, and the inevitable errors in the data supply chain cast doubt upon the quality and trustworthiness of data products. And for regulated industries that require a deep understanding of their data, data products are, unfortunately, hard to govern, as few consistent statistics and metrics are exposed by data products, leading to increased manual processes.
Figure 2, Data is Hard to Find, Consume, Trust, and Share
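To make the “registry” and name-resolution gaps above concrete, here is a minimal sketch of what such a component could look like. This is illustrative only: the `DataProduct` fields, the class names, and the endpoint format are hypothetical assumptions of mine, not part of any standard Data Mesh tooling.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A minimal registry entry for a data product (fields are hypothetical)."""
    name: str
    owner: str
    description: str
    endpoint: str                 # where the product is consumed, e.g. a URL
    tags: list[str] = field(default_factory=list)

class DataProductRegistry:
    """A toy searchable directory that also resolves names to endpoints."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def resolve(self, name: str) -> str:
        # Acts like DNS: translate a data product identity into an endpoint.
        return self._products[name].endpoint

    def search(self, term: str) -> list[DataProduct]:
        # Naive substring search over name, description, and tags.
        term = term.lower()
        return [
            p for p in self._products.values()
            if term in p.name.lower()
            or term in p.description.lower()
            or any(term in t.lower() for t in p.tags)
        ]
```

A real registry would add access control, documentation links, and lineage, but even this toy version shows the two missing services named above: a searchable directory and DNS-style name resolution.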

These problems are symptomatic of so-called “problems of scale”. Success — in this case, the plethora of data products sprouting up in organizations — clearly has some downsides. But how can we tame the unruly evolution of federation and overcome these scaling issues?

Based upon my experience in growing large data ecosystems, I think a new set of principles is emerging that will enable federated data management to scale in a practical and efficient manner. These principles are organized into two categories: One set applies to data products and their owners, and another complementary set fosters rapid growth.

Federating data products relies on many things: clarity of vision, practical weighing of trade-offs, and a relentless focus on implementation excellence. But the key success factor for growing your federated ecosystem of data products is the practical implementation and institutionalization of one single notion: The Data Product Owner Reigns Supreme.

So, practically, what does this mean? I suppose it means what it says: the data product owner has final decision making — and veto power — over all elements of their data product. Yes, all decisions. And yes, they need to work within the normal bounds of good enterprise behaviour — they must obey senior management policies, regulatory constraints, and in some cases, profit & loss accountability. But they get to decide how to implement those policies, how to fit within regulatory constraints, and how to hit their revenue (or cost) targets. They decide!

So, make no mistake: where data product owners are empowered and have real decision rights, you will find a successful data product ecosystem and a growing set of data products.

Figure 3, Data Product Owner Principles

So, to bring to life “data product owner supremacy”, I offer the following new data product owner principles:

  1. Data Product Owners “own” their technology decisions: A data product owner can use any technology they feel will be most effective — even those against current enterprise standards — to build their data product. For example, the desire to optimize for agility, speed, and autonomy takes precedence over a traditional focus on cost and enterprise standards adoption. Now, to be clear, owners should obviously consider existing enterprise-standard products first, but they are not beholden to them. And yes, they may need to incur the overhead required to bring in a new technology. But, again, they decide.
  2. Data Product Owners “manage” their data supply chain: Data product owners vouch for the integrity of their data supply chain. They own all the levers of data ingestion: the data product team, and not a central pipeline team, owns the specifications of its data ingestion pipelines, and the team has the skills to design and build those pipelines. And from a practical perspective, data product owners drive investment in management practices, tools, and instrumentation to proactively identify, diagnose, and resolve quality, stability, or availability issues in their data supply chain.
  3. Data Product Owners “certify” their data demand chain: Data product owners attest — or certify — to the safety, trustworthiness, and quality of their data products, and to their adherence to service levels. Data product owners dictate consumption contracts (obviously in partnership with consumers). They also determine how those contracts are implemented and how they will meet service level expectations. And they proactively measure and publicly report on quality metrics so that data expectations are met.
  4. Data Product Owners “sell” their data products: Data products, like any other product the enterprise offers to consumers, have a lifecycle. Now, while the lifecycles are similar, the emphasis within enterprises is almost always radically different: while owners of traditional products (those offered to, and paid for by, consumers) are expected to attain revenue and cost targets, this is almost never the case for internal products. Nevertheless, data product owners need to act like traditional product owners — they need to hustle, they need to sell, and they need to build as much support and momentum in the organization as they can so that, minimally, their products become self-sustaining, recognized, and funded.
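The “certify” idea implies a consumption contract that can be checked mechanically, not just asserted. Here is one hedged sketch of how an owner might publish service levels and check observed metrics against them; the metric names, thresholds, and the `certify` function are illustrative assumptions, not a standard contract format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConsumptionContract:
    """Hypothetical service levels a data product owner commits to."""
    product: str
    consumer: str
    max_age_hours: int           # freshness: maximum age of data served
    min_completeness_pct: float  # minimum share of required fields populated
    min_availability_pct: float  # uptime service level

def certify(contract: ConsumptionContract, observed: dict) -> dict:
    """Compare observed metrics to the contract; True means the term is met."""
    return {
        "freshness": observed["age_hours"] <= contract.max_age_hours,
        "completeness": observed["completeness_pct"] >= contract.min_completeness_pct,
        "availability": observed["availability_pct"] >= contract.min_availability_pct,
    }

# Example: a contract with marketing, checked against one reporting period.
contract = ConsumptionContract("customer360", "marketing", 24, 99.0, 99.9)
report = certify(contract, {"age_hours": 6, "completeness_pct": 99.4,
                            "availability_pct": 99.95})
```

An owner could run a check like this on every reporting cycle and publish the result alongside the data product, turning “proactively measure and publicly report” into a routine, automatable step.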

Still, even with empowered owners, things can go astray. And where they do, one finds gaps in understanding as well as unnecessary constraints that hinder the growth of a data product ecosystem.

So, let me offer a few principles that have worked well for me. To some, these will sound contrarian; to some, impractical; and to others, inconsistent with enterprise mandates. But they work!

These principles prioritize agility and speed over cost containment. They choose testing and learning over the need for perfection. They offer innovation and accelerated time to market over unwarranted consistency.

Figure 4, Principles to Foster the Growth of Federated Data Products

So, here are these new principles, and here is how they foster the growth of federated data products.

  1. “Let a thousand flowers bloom”: We must encourage data product growth and diversity, allowing a wide range of ideas and approaches to flourish rather than being suppressed or restricted. To make it easy to “let a thousand flowers bloom”, an enterprise should make it easy to create safe, secure, observable, and operable data products. It should strongly advocate a “test and learn” approach that is tolerant of experimentation.
  2. “Weed containment is a secondary priority”: Continuing our analogy… in a field of a thousand flowers, there inevitably will be weeds. Pruning the weeds is less important than providing food and light (yes, the analogy is getting stretched too far) to the flowers. It is crucial to nurture the most promising and valuable data products while also allowing the natural process of selection — and the learning about what works and what does not — to take place. Yes, poorly constructed data products will not be used, will lose funding, and should eventually die off. But, more importantly, valuable data products will replace them and hopefully thrive.
  3. Make data products easy to find, understand, consume, and trust: Enterprises must establish a consistent “registry” of data products that exposes the necessary data glossary, knowledge graph, governance information, and feedback. Enterprises, however, must also ensure that it is easy for data producers to create, certify, and govern data products in the registry. To make data trustworthy, data product consumers (and producers) must be able to provide feedback to data product owners. And, to use the modern vernacular, they should be able to “like”, “up-vote”, or “star” data products. This “crowdsourcing” model has worked exceedingly well in software (GitHub “stars”) and social media (Facebook “likes”), and it gives data product owners a macro-level view of data product quality and trustworthiness along with invaluable detailed feedback. It almost goes without saying that the role of the “enterprise” is to provide the tools and utilities that make it easy to give that feedback.
  4. Simplify and streamline data product governance: Enterprises should define an absolute minimum set of attributes, metrics, and service levels that each data product owner must provide. But enterprises must also provide the supporting tools and utilities to make it easy for data product owners to establish and publish this information.
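The “crowdsourcing” mechanism in principle 3 above can be sketched in a few lines. This is a toy model under my own assumptions (class name, one-star-per-user rule, and the ranking method are all invented for illustration), not a description of any existing registry product.

```python
from collections import defaultdict

class FeedbackBoard:
    """A toy 'stars and comments' feedback channel for data products."""

    def __init__(self) -> None:
        self._stars: dict[str, set[str]] = defaultdict(set)
        self._comments: dict[str, list[str]] = defaultdict(list)

    def star(self, product: str, user: str) -> None:
        # A set keeps at most one star per user per product.
        self._stars[product].add(user)

    def comment(self, product: str, user: str, text: str) -> None:
        self._comments[product].append(f"{user}: {text}")

    def star_count(self, product: str) -> int:
        return len(self._stars[product])

    def top_products(self, n: int = 3) -> list[str]:
        # The macro-level view: most-starred data products first.
        return sorted(self._stars,
                      key=lambda p: len(self._stars[p]),
                      reverse=True)[:n]
```

Surfacing `top_products` in the registry gives consumers a quick trust signal, while the comments flow back to owners as the detailed feedback the principle calls for.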

This article has described how Data Mesh embraces federation of data — and of data management — to dramatically increase speed, foster agility, and enable local autonomy. We have shown that Data Mesh is a crucial enabler of federated data and federated data management, and we have identified latent challenges in scaling Data Mesh. In response, we have offered several new principles to federate data in a Data Mesh, along with new principles for scaling data federation with Data Mesh.

And it is with this that I hope to not only enable your enterprise data mesh journey but, more importantly, accelerate its growth!

***

All images in this document except where otherwise noted have been created by Eric Broda (the author of this article). All icons used in the images are stock PowerPoint icons and/or are free from copyrights.

The opinions expressed in this article are mine alone and do not necessarily reflect the views of my clients.


