In this article, Pascal Coggia, Managing Partner at Artefact UK, explains what Data Mesh is and what it isn’t, why it’s as much a mindset as an approach, and its use cases, benefits and challenges.
What is Data Mesh? How is it different from a data lake?
The term was coined by Zhamak Dehghani, a Thoughtworks consultant and evangelist for data decentralization. In simple terms, Data Mesh is a distributed architecture approach for managing analytical data. It allows end-users to easily access and query data where it resides, without first transporting it to a data lake or warehouse. A decentralized Data Mesh strategy treats data as a product and gives domain-specific teams ownership of their data through a self-service platform with embedded data governance.
Data Lakes are minimally governed storage areas for raw domain data. They were meant to provide unlimited access to data in an attempt to avoid the bottleneck of centralized, tightly-governed data warehouses, but they tended to suffer from poor data quality and discoverability issues. Certain governed data lake projects have addressed these issues with a modicum of success, but they tend to reduce the relative accessibility of the data as a result. Data Mesh aims to solve these challenges through decentralization, thereby avoiding the poorly governed, hard-to-navigate lakes known as “data swamps” entirely.
What is meant by “data as a product”?
I think about it a bit like the app store. You just download an app when you want to do something else. Why shouldn’t it be that way with data? Think about it structurally: what are the components of a data product?
All of this suggests that a data product sits on a fabric that allows it to interact. It’s not in isolation. You can’t just throw some data together and stick it in an S3 bucket and call it a data product. You have to wrap ownership and governance around it.
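As a rough illustration of that last point, here is a minimal Python sketch of what "wrapping ownership and governance" around a dataset could look like. The `DataProduct` class, its fields, the bucket path and the role names are all hypothetical and chosen for illustration only; they are not part of any Data Mesh standard or platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataProduct:
    """Hypothetical descriptor: the data plus the ownership and governance around it."""
    name: str
    domain: str
    storage_uri: str                                        # where the data resides
    owner: str = ""                                         # accountable domain team
    schema: Dict[str, str] = field(default_factory=dict)    # column name -> type
    allowed_roles: Set[str] = field(default_factory=set)    # roles permitted to read

    def missing_requirements(self) -> List[str]:
        """List what is still missing before this counts as a product."""
        missing = []
        if not self.owner:
            missing.append("owner")
        if not self.schema:
            missing.append("schema")
        if not self.allowed_roles:
            missing.append("access policy")
        return missing

    def is_publishable(self) -> bool:
        return not self.missing_requirements()

    def can_read(self, role: str) -> bool:
        # Access is only granted on products that are fully described.
        return self.is_publishable() and role in self.allowed_roles


# Data dropped into a bucket with nothing wrapped around it (illustrative path).
raw = DataProduct(
    name="payments_raw",
    domain="payments",
    storage_uri="s3://example-bucket/payments/",
)

# The same data with ownership and governance attached (illustrative values).
governed = DataProduct(
    name="payments",
    domain="payments",
    storage_uri="s3://example-bucket/payments/",
    owner="payments-team",
    schema={"txn_id": "string", "amount": "decimal"},
    allowed_roles={"fraud-analyst"},
)

print(raw.missing_requirements())
print(governed.is_publishable())
```

In this sketch the first object is just data in a bucket, so it reports everything it lacks, while the second carries an owner, a schema and an access policy and can be published and read by the roles it names.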
What are the benefits for businesses?
There are many benefits Data Mesh can offer to organizations and cross-functional domain teams:
What are the challenges to Data Mesh adoption?
It’s important to remember that Data Mesh requires not just a technological shift but a mindset shift. Organizations have to learn to think about data as a product, and about data governance and ownership. Shifting businesses from centralized to decentralized ownership, and moving organizations from pipelines to products where data domains are the first-class concern, is going to take some doing.
A few other issues include these cited by Deloitte:
When is a company ready to adopt a Data Mesh strategy?
It depends on how prepared the company is. But it also depends on who you’re talking to. A Chief Data Officer who’s built a massive central organization may not be ready for Data Mesh because they will need to first establish how to federate those functions. But most business leaders understand the need to democratize the data asset towards the edges and the business because they’re often frustrated with the centralized approach.
You also need to know what has to happen at an engineering level to be able to control and govern the mesh, because if you don’t set it out correctly, it can turn into the Wild West. So there’s a series of steps to follow.
Transitioning to a Data Mesh is an incremental journey because all the elements you already have, such as data lakes and data warehouses, need to connect to the Data Mesh; they can’t be discarded. People will still want that data, along with the value and governance that’s already wrapped around it.
What kinds of companies are successfully deploying Data Mesh?
Right now, Data Mesh is being successfully adopted in the financial services sector. ING is a good example. It makes sense for banks to use Data Mesh – it supports stronger data governance, so it offers increased security. With Data Mesh, fraud detection systems don’t need to connect to other systems and pull the same data every day. Instead, organizations can create domain-focused data products that their anomaly detection experts can use to create better models and outcomes.
Zalando, Europe’s leading online platform for fashion, decentralized their data in 2020 and turned their massive data lake into a Data Mesh. As for other sectors, we’ll have to see how it goes on a case-by-case basis, because any business case you create for Data Mesh will need to be tailored to the specific challenges of the organization and its sector, and those are in constant flux.
Data management strategies are always evolving and organizations need to be prepared to adapt to changes in order to stay competitive. Data Mesh is a way to break down the silos of unwieldy monolithic architecture systems and decentralize data for end-to-end accountability and scalability. Whether Data Mesh is right for your business – or not, or not yet – is the question.