Start-Up Data Mesh Blueprint: 3 Steps for Becoming a Data-Driven Start-Up through the Data Mesh | by Sven Balnojan | May, 2022


A dead-simple 3-step blueprint for start-ups to get data-driven right through the data mesh

The 3-step blueprint for becoming a data meshed data-driven start-up. Image by Author.

The term “data mesh” is the top trend in the data industry. Common wisdom currently seems to be that it’s something for big companies.

But that’s wrong IMHO. Even better, start-ups have the unique chance to leverage the data mesh from the very beginning of their existence to become more data-driven than any single incumbent in their market. The simple truth is that transitioning to a data mesh is hard work because it involves organizational change. Start-ups have the unique chance to lay the organizational foundations right from the beginning, and thus avoid the lock-in to a legacy data organization that large incumbent companies face.

I’ve done a bunch of in-depth studies of data journeys to deliver a dead-simple blueprint most start-ups can follow. The gaming start-up Kolibri Games, with their hit game “Idle Miner Tycoon”, in particular inspired much of it, because they shared particularly deep insights into their data journey, reaching from the founding of the company to today.

The blueprint is simple, both organizationally and technologically, and consists of three steps:

Step 1: Focus on using data in decisions, until you maximize the value per piece of data.

Step 2: Focus on building data into decisions, until you maximize the amount of data produced.

Step 3: Focus on the speed of deriving decisions from data.

Let’s first understand why I think start-ups have a unique opportunity here.

Each stage of the blueprint, as depicted above and below, has three “columns”:

  • one technology column, on the left side of the picture,
  • one people & processes column, in the middle,
  • and one “manifesto” column, on the right.

The technology column, while important, is actually the least important one in the data mesh context. Technology alone will not do anything for you. In fact, you can use the same technologies to build just another modern data stack without getting an inch closer to being data-driven.

What matters first and foremost is the commitment to a “manifesto”: a clear commitment to using data in every decision at the company. Then it’s the people who are hired to support that. And it’s the processes that clearly dictate that the decision-makers & their support teams own the data.

This is the only way to ensure that data will be built into decisions, and that it is really “owned”.

Changing from a traditional “data is just some by-product, analytics teams do some analysis” approach towards true responsibility & ownership of data is hard, really hard. It involves moving people around, hiring or upskilling, and changing lots of processes. That is why, if you do it right from the beginning, you as a start-up will have a unique edge over your competition.

A data mesh is not a technology; it’s more of a mindset that says…

“The organizational unit that produces the data also owns it; that means it is responsible for serving it to others and transforming it into something useful.”

(That’s a very short version though!)

So if I am the product manager for a piece of software, I am also responsible for the data it produces. After all, I need to know whether the software works, right? If I make a change or implement something new, without data I won’t know whether it was the right thing to do. So owning the data as well seems like the most natural thing in the world to me. I am responsible for having the data captured, and I am responsible for providing this data to other teams, just like I am for any other API my team builds.
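To make the “data as an API” idea concrete, here is a minimal sketch of what serving a team’s own data to other teams could look like. It is purely illustrative: the endpoint path, metric names, and the get_daily_metrics stub are all hypothetical, not part of any real product.

```python
# Hypothetical sketch: a product team serving its own data to other teams,
# the same way it serves any other API. Endpoint and metric names invented.
from flask import Flask, jsonify

app = Flask(__name__)

# In a real setup this would query the team's own store; here it's a stub.
def get_daily_metrics(day: str) -> dict:
    return {"day": day, "players": 12_400, "avg_game_time_min": 23.5}

@app.route("/data/game-metrics/<day>")
def game_metrics(day: str):
    # The producing team owns this contract: schema, freshness, availability.
    return jsonify(get_daily_metrics(day))

if __name__ == "__main__":
    app.run(port=5000)
```

The point is less the web framework and more the ownership: the producing team publishes its data behind an interface it maintains, like any other service it runs.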

The catch is that this wasn’t always the case; in fact, most software developers and even product managers won’t share that perspective today.

But that just means one thing: if you’re a start-up and you adopt this mentality right from the start, you will be able to make more right decisions and be more data-driven than most companies in your market.

Now let us take a look at our example product manager Eve.

Eve is the product manager for an online game. She wants to increase the total time played, which is the number of players multiplied by their average game time. The game currently consists of three rounds.

Her idea is that increasing the game from 3 to 4 rounds will increase total time played: the number of players should stay roughly constant, while their average game time goes up (which in turn is related to the churn rate in the game).
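To make the arithmetic concrete, here is a tiny sketch of how Eve could compare the two scenarios. All numbers are invented for illustration:

```python
# Hypothetical sketch: evaluating Eve's change with invented numbers.
def total_time_played(players: int, avg_game_time_min: float) -> float:
    """Total time played = number of players x average game time."""
    return players * avg_game_time_min

before = total_time_played(players=10_000, avg_game_time_min=21.0)  # 3 rounds
after = total_time_played(players=9_800, avg_game_time_min=24.0)    # 4 rounds

# In this toy example, a small drop in players is outweighed by longer sessions.
uplift = (after - before) / before
print(f"before={before:,.0f} min, after={after:,.0f} min, uplift={uplift:.1%}")
```

Trivial as it is, even this comparison requires that both numbers are actually being captured somewhere, which is exactly what the next three points are about.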

If Eve implements that change, our start-up needs three things to make this decision data-driven, and then to learn from it:

  1. We need some kind of guidance for Eve, so she knows that she should do some research for her decision and not simply base it on gut feeling. => We call that the “manifesto”, which can take many forms.
  2. We need some kind of technology that allows someone to collect the data points Eve will need, like the number of players and the average game time. => We call that the technology platform.
  3. Finally, we need processes and people that help Eve focus on making decisions. => We call that the “processes & people” part.

As painful as it is, the starting point, and one alluded to by Kolibri Games, is simply to start using data, even if there is next to no infrastructure at all. That means that within the first 10–20 people in your company, there should be 1–2 analysts supporting decisions.

If you look at Airbnb, they hired their first data scientist within the first 20 employees. At Spotify, one of the first hires went on to build the recommendation system, a key stepping stone in Spotify’s growth to its current size.

But even data-minded product managers & founders will do the job, as long as you have people who are starting to use data, wherever it may come from.

What follows is just one possible blueprint. I’ve seen it implemented a bunch of times and it’s quite flexible, but of course, there are others that fit better depending on your situation. But as said, this is supposed to make things simpler, not harder, so I’ll stick with one flexible blueprint for start-up data meshes.

We can summarize step 1 in the following bullet points:

  • 3rd party tools all the way;
  • simplest possible data integration, or none at all;
  • not even one full employee needed to take care of the infrastructure;
  • multiple decision-makers using data;
  • data used in most decisions.

The picture below describes this in an example.

Blueprint Step 1 example. Image by Author.

Goal: Get people to base their decisions on data. But of course, the data needs to be there, so extend your use of 3rd party tools all over the place, until you “cannot bear it anymore”. Really max out this step by putting the analytics people in the right places, close to the decision-makers, and by getting the decision-makers to use data all the way.

The reasoning behind step 1: Step 2 will be to get data out of the 3rd parties, integrate it in the simplest form possible, and put up some kind of access to it. These are 2–3 major technological items on your list, and they simply are expensive. They require permanent manpower and money. So, unless you are already deriving value from data, they simply will not be worth it. Focus on step 1 first.

The key at the beginning of a start-up, and in building up a data mesh from the very beginning, is simple: focus on generating value from the data, not on generating more data.

But it’s deceptively simple because focusing on this means ignoring something else. And you should ignore something else!

You should strip everything away on the technology side, keeping almost nothing in your own domain. Use 3rd party tools everywhere, in their hosted cloud versions. Don’t own a database; don’t even try to use one.

Your goal should be to have no one, or at least no full-time employee, working on the data stack.

You’re doing this to focus on bringing data to the people, to the decision-makers. Hire 1–2 analysts and place them close to the decision-makers, say a marketing analyst and a product analyst. Keep them there; don’t try to move them into some central “analyst unit”.

Use your values, your manifesto, or whatever core communication medium you have to make data the core of every decision. Kolibri Games later had a manifesto that read “90% of all decisions should be based on data”. It’s a good example of this.

Optionally, if your situation requires it, get someone with a bit more technical expertise to write 1–2 small scripts to push data from one 3rd party tool into another.
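Such a glue script can be tiny. Here is a minimal sketch of what pushing data from one hosted tool into another could look like; the URLs, field names, and API keys are all hypothetical placeholders, since every tool’s actual API differs:

```python
# Hypothetical glue script: pull yesterday's numbers from tool A and push
# them into tool B. All endpoints and fields are invented placeholders.
import os
import requests

SOURCE_URL = "https://api.tool-a.example.com/v1/daily-metrics"
TARGET_URL = "https://api.tool-b.example.com/v1/import"

def sync_daily_metrics() -> None:
    # Extract: fetch the latest metrics from the source tool.
    resp = requests.get(
        SOURCE_URL,
        headers={"Authorization": f"Bearer {os.environ['TOOL_A_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()

    # Load: push the same records into the target tool, unchanged.
    requests.post(
        TARGET_URL,
        headers={"Authorization": f"Bearer {os.environ['TOOL_B_KEY']}"},
        json=resp.json(),
        timeout=30,
    ).raise_for_status()

if __name__ == "__main__":
    sync_daily_metrics()
```

A script like this runs on a cron job and needs no infrastructure of its own, which is exactly the point of step 1.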

Possible technology choices: any 3rd party tool, frontend and backend tracking tools, simple Bash/Python/R scripts.

We can summarize step 2 in the following bullet points:

  • 3rd party tools most of the way,
  • but get the data out, store it, and make it accessible;
  • ~1 full employee takes care of the infrastructure;
  • data used to validate decisions & changes;
  • data users become more technical & able to do deeper analyses.

The graphic below depicts an example of a step 2 start-up.

Blueprint step 2. Image by Author.

The goal of step 2 is to get decision-makers to be able to use data to validate decisions, not just base decisions on it. That usually requires two things. The first is access to integrated data, meaning multiple data sources at the same time. The second is access to data targeted straight at one new feature, one change.

On the mindset side of things, that means we could, for instance, widen our manifesto with a clause like “90% of all changes need to be validated by data”. This one is again taken from Kolibri Games, but you can find similar ones at Zynga, coincidentally also in the gaming industry.

On the people side of things, integrated data requires more technical skills. That means people will have to be Excel & Power BI ninjas, able to write some SQL and possibly some Python code. We need those people, again, close to our decision-makers. These might be the same people as before, or we might add one “data scientist” to the company.

How you structure the interaction between developers, analysts & decision-makers is pretty much up to you. One option could be to let analysts & product managers model out “events”, say by fixing the Google Analytics categories, naming conventions, and the like. These events are then implemented by a dev team together with the new feature.
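What such an “event” contract might look like in practice is sketched below. The event name, fields, and categories are invented for illustration, not an actual Google Analytics schema:

```python
# Hypothetical event contract, agreed between analyst & product manager,
# then implemented by the dev team alongside the feature. Names invented.
from dataclasses import dataclass, asdict

@dataclass
class RoundCompletedEvent:
    category: str      # fixed taxonomy, e.g. "gameplay"
    action: str        # fixed naming, e.g. "round_completed"
    player_id: str
    round_number: int  # now 1..4 after Eve's change
    duration_sec: float

event = RoundCompletedEvent(
    category="gameplay",
    action="round_completed",
    player_id="p-123",
    round_number=4,
    duration_sec=312.0,
)
print(asdict(event))  # what the dev team would send to the tracking tool
```

The value here is the agreement itself: analysts get consistently named, analyzable events, and developers get an unambiguous spec to implement with the feature.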

You can also let the analysts work closely with the development teams, if those teams already take responsibility for their data.

On the least important front, the technology front, we will have to do quite a bit of heavy lifting. One, we need some way of getting data out of our 3rd parties. Two, we need someplace to put & integrate it; we can choose a hosted solution here. Three, we need some way to access the stored data; again, we should opt for a hosted solution.

For one, we can choose a lot of things: simple scripts in Python, hosted ingestion tools like Stitch, Airbyte, Meltano, or many other solutions. If we’ve already built a small “integration script” before, we might simply be able to use that as a starting point.
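As a flavor of the “simple scripts” option, here is a minimal extract-and-load sketch. The API endpoint and response shape are invented, and SQLite stands in for whatever storage you pick in point two:

```python
# Hypothetical extract-and-load script: 3rd party API -> local SQLite.
# The endpoint and response shape are invented placeholders.
import sqlite3
import requests

API_URL = "https://api.some-tool.example.com/v1/players"

def extract() -> list[dict]:
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()
    return resp.json()  # assumed: a list of {"id": ..., "game_time_min": ...}

def load(rows: list[dict]) -> None:
    con = sqlite3.connect("warehouse.db")
    con.execute(
        "CREATE TABLE IF NOT EXISTS players (id TEXT PRIMARY KEY, game_time_min REAL)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO players VALUES (:id, :game_time_min)", rows
    )
    con.commit()
    con.close()

if __name__ == "__main__":
    load(extract())
```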

For two, we can choose some kind of object storage like AWS S3, although I’d always first go for a database, because it makes access rather simple.

For three, we can default to Excel & some basic SQL editors, although I’d go the extra mile and use a BI/reporting tool that is able to do some “custom data manipulations”. Tools that fit that description are Looker, Redash, and Metabase.
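Even before a BI tool is in place, access can be as simple as a notebook query against the store. Below is a sketch using pandas, reusing the hypothetical table from the loader above:

```python
# Hypothetical access sketch: an analyst querying the store from a notebook.
import sqlite3
import pandas as pd

con = sqlite3.connect("warehouse.db")
df = pd.read_sql_query(
    "SELECT COUNT(*) AS players, AVG(game_time_min) AS avg_game_time FROM players",
    con,
)
con.close()

# Eve's metric, computed straight from the stored raw data.
print(df.assign(total_time_played=df.players * df.avg_game_time))
```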

Optional components: event streaming solutions. Depending on your developers’ maturity, you can also include, or even base most of your work on, event streaming solutions like Kafka. The benefit is the ease of data integration; the downside is the required maturity of the dev teams.
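For the event streaming route, the producing side can stay small. Below is a minimal sketch using the kafka-python client; the topic name and payload are invented, and a broker is assumed to be running on localhost:

```python
# Hypothetical event streaming sketch using the kafka-python client.
# Topic name and payload invented; assumes a broker on localhost:9092.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The dev team emits the agreed-upon event alongside the feature itself.
producer.send(
    "gameplay-events",
    {"action": "round_completed", "player_id": "p-123", "round_number": 4},
)
producer.flush()
```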

Note: As mentioned, whether you end up with a data mesh that supercharges your company or with just another data stack lies in the process. Here we deliberately leave out a part that is commonly called “transformation”. We want end-users to get their hands dirty, so that we keep the responsibility for the data in the right place. We simply enable them to do much more! We do that until, as before, they cannot bear it anymore but have built up the chops & technical personnel to take the next step: to also transform data themselves.

We can summarize step 3 in the following bullet points:

  • simplest possible integrations;
  • 1–2 full employees take care of the infrastructure;
  • multiple decision-makers use data;
  • data used in most decisions.

The graphic below depicts an example of a step 3 start-up.

Blueprint step 3. Image by Author.

Goal: The third (and for now final) step is to optimize for speed while keeping quality constant. Here the true power of the decentralized model unfolds. If you haven’t built up the knowledge & people in the decentralized units, sales, marketing, and product, you will now have a serious problem.

In fact, I don’t see a single way to ever achieve this goal without this structure. The simple truth is that data questions are generated decentrally, so you need to invest in this decentralized data knowledge.

To optimize for speed, our central data platform team provides “transformation” tooling as a service for the decentralized teams. Popular choices could be a hosted Trino cluster with the option of checking in materialized views, dbt, or even Databricks: anything that makes it easy to transform raw data into new information.
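What “transformation tooling as a service” can mean concretely: the platform team runs the cluster, and a decentralized team checks in view definitions against it. Below is a minimal sketch using the trino Python client; the host, catalog, schema, and table names are invented, and whether you use plain or materialized views depends on your connector:

```python
# Hypothetical sketch: a decentralized team registering a transformation
# as a view on the central platform team's Trino cluster. Names invented.
import trino

conn = trino.dbapi.connect(
    host="trino.internal.example.com",
    port=8080,
    user="product-analytics",
    catalog="lake",
    schema="product",
)

# The team owns this transformation: raw events in, useful information out.
conn.cursor().execute("""
    CREATE OR REPLACE VIEW product.daily_play_time AS
    SELECT date(event_time) AS day,
           COUNT(DISTINCT player_id) AS players,
           AVG(duration_sec) / 60.0 AS avg_game_time_min
    FROM lake.raw.gameplay_events
    GROUP BY 1
""")
```

The platform team never needs to understand this particular view; it only guarantees that the cluster is up and that raw data lands in it.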

On the people side, we add data-capable people to the decentralized teams. In our example, we chose to add “analytics engineers” on the end-user side, but we could just as well bring the same capability to the development teams. The choice is up to you and depends largely on whether you’re currently more focused on “sharing data between domains” or on “working inside one data domain”. If it’s the latter, choose the analytics engineers; if it’s the former, give the development teams the right capabilities.

On the meta level, we’re focusing on company-wide SLAs regarding data. Therefore it’s essential to have gone through steps 1–2 to build up the data maturity. These company-wide SLAs should be about speed. In our example, we chose “90% of all data questions can be answered within 1 hour”, which implies this should happen in self-service, without a call to the analytics department. This SLA means that dev teams have to be really thorough in building data into all their products, and it also means the central platform team has to provide proper tooling to make all data available somehow.
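Whether you actually hit such an SLA is itself a data question. Here is a minimal sketch of how you could measure it, assuming you log when each data question is asked and answered; the log below is invented:

```python
# Hypothetical SLA check: share of data questions answered within 1 hour.
# The question log is invented for illustration.
from datetime import datetime, timedelta

questions = [  # (asked_at, answered_at)
    (datetime(2022, 5, 2, 9, 0), datetime(2022, 5, 2, 9, 20)),
    (datetime(2022, 5, 2, 10, 0), datetime(2022, 5, 2, 12, 30)),
    (datetime(2022, 5, 2, 14, 0), datetime(2022, 5, 2, 14, 45)),
]

SLA = timedelta(hours=1)
within = sum(1 for asked, answered in questions if answered - asked <= SLA)
share = within / len(questions)

print(f"{share:.0%} of data questions answered within 1h (target: 90%)")
# -> 67% in this toy log, so this team would miss the 90% target.
```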

And that’s it. But what about the fancy data mesh words, you say?

True, I didn’t use any of those fancy words, not even “socio-technological paradigm shift”, because I don’t think that overhead is needed at all at this stage. Step 1 is to be as lean as possible, and I think that even in step 3 you’re only scratching the surface of the value of your data, although you will have come much further than many already.

What comes next is very hard to forecast, so I excluded it from the linear blueprint. You now have a few options, like building up machine learning solutions on the still technically centralized infrastructure, or scaling up the rest of your data operation.

Whatever you do, you will soon notice that you need to break out parts of these data streams and turn them into actual “data products”.

But the good news is that you’ve already built an organization that will be in favor of doing such a thing! You will have development teams that have handled data from day one and treat it as a first-class citizen, and you’ll have managers and decision-makers all around who know that they own the data for their decisions: people who know it is necessary to collect data for every step they take.

So going from here to a mesh of (actual) data products will be a simple step, whereas your incumbent competition will be very slow to react at all.

Finally, you will also start to encounter governance issues; things like data quality will become much more important.

If you’re reading this and want to apply it, I highly recommend checking out the journey Kolibri Games went through. Other than that, if you feel you’re somewhere inside this journey, I’d love it if you pinged me on LinkedIn or Twitter to share a part of your story! Enjoy.

