The Data Mesh concept is a counterpoint to the large, centralized data platforms of the past few decades. Adopting it requires a major shift in thinking, according to Zhamak Dehghani, principal technology consultant at ThoughtWorks. Dehghani identified several shortcomings of traditional data architectures, which rely on centralized, monolithic platforms to consolidate varied data types:
They are inadequate for large organizations with numerous data sources and large numbers of people who might need access to the data
The teams that govern enterprise-wide data tend to be siloed and encumbered by their own set of processes, policies and technologies
The Data Mesh concept is an alternative approach designed to handle the rapidly increasing amounts of data and the growing number of data use cases that organizations face. It is intended to enable rapid access to datasets across a large organization with distributed data-related technologies. As such, it addresses the shortcomings of traditional centralized architecture, including the problems of both the data warehouse and the data lake.
From centralized to localized ownership
The Data Mesh challenges the entire assumption that, in order for data to be useful, it has to be centrally located or even centrally managed. Instead, the theory behind the Data Mesh holds that data has more potential for innovation if it is owned by the departments that are most likely to use it. CRM data, for example, should be owned and maintained by sales and marketing, while supply chain data should be owned and maintained by logistics and procurement functions.
The Data Mesh, while highly federated, is no free-for-all. It relies on consistent policies being followed across the various departments to ensure that data is discoverable, trustworthy, secure, in compliance with regulations, and sufficiently enriched with metadata so that its function and history are clear. Only under these conditions will the Data Mesh succeed in making data-derived insights rapidly available organization-wide from data that remains highly decentralized.
The Data Mesh, therefore, employs several foundational principles, including:
Decentralization of control over data to department-level teams
Enterprise-wide self-service access to data
Enterprise-wide federated yet uniform governance over the data
The Data Mesh will need to address, at massive scale, such details as:
Encryption
Schema
Discovery
Governance
Standardization
Lineage
Data quality
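As a rough illustration of what uniform governance over decentralized data could look like in practice, the sketch below checks a domain's dataset description against one organization-wide policy. It is only a minimal sketch: the `DataProduct` class, the `REQUIRED_FIELDS` list, and the metadata keys are hypothetical names chosen for this example, not part of any Data Mesh standard or product.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide policy: every dataset must declare these items.
REQUIRED_FIELDS = ("owner", "schema", "lineage", "encrypted_at_rest")


@dataclass
class DataProduct:
    """A dataset published by one domain (department) team."""
    name: str
    domain: str
    metadata: dict = field(default_factory=dict)


def policy_violations(product: DataProduct) -> list[str]:
    """Return the policy items that are missing, empty, or not satisfied."""
    violations = []
    for key in REQUIRED_FIELDS:
        value = product.metadata.get(key)
        # Missing or empty declarations count as violations.
        if value in (None, "", [], {}):
            violations.append(key)
    # Encryption must be switched on, not merely declared.
    if product.metadata.get("encrypted_at_rest") is False:
        violations.append("encrypted_at_rest")
    return violations


crm = DataProduct(
    name="crm_accounts",
    domain="sales",
    metadata={
        "owner": "sales-data@example.com",
        "schema": {"account_id": "string", "region": "string"},
        "lineage": ["crm_export"],
        "encrypted_at_rest": True,
    },
)
print(policy_violations(crm))
```

The point of the design is that each domain fills in its own metadata, while the check itself is defined once and applied everywhere.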
This approach requires that roles for data ownership be assigned within the various department teams, and that the teams be incentivized to take full ownership of their data and govern it in such a way that it will benefit the entire organization. For example, the teams or ‘domains’ may be expected to adhere to SLAs based on data quality measures.
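An SLA based on data quality measures might be checked along the following lines. This is a sketch under stated assumptions: the `QualitySLA` class, its two thresholds, and the choice of "fraction of missing values" as the quality measure are all illustrative, and a real domain team would pick measures that suit its own data.

```python
from dataclasses import dataclass


@dataclass
class QualitySLA:
    """Hypothetical quality targets a domain team commits to."""
    max_null_fraction: float  # e.g. 0.05 means at most 5% missing values
    min_row_count: int        # smallest acceptable dataset size


def null_fraction(rows: list[dict], column: str) -> float:
    """Fraction of rows in which `column` is missing or None."""
    if not rows:
        return 1.0  # an empty dataset is treated as entirely missing
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def meets_sla(rows: list[dict], column: str, sla: QualitySLA) -> bool:
    """True when the dataset satisfies both thresholds of the SLA."""
    return (
        len(rows) >= sla.min_row_count
        and null_fraction(rows, column) <= sla.max_null_fraction
    )


shipments = [
    {"sku": "A-100"},
    {"sku": None},
    {"sku": "B-200"},
    {"sku": "C-300"},
]
print(meets_sla(shipments, "sku", QualitySLA(max_null_fraction=0.25, min_row_count=3)))
```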
The local teams are responsible for managing their own ETL pipelines, as well as aggregating their data assets. Meanwhile, the Data Mesh employs a ‘layer of connectivity’ that enables universal access to data across the organization. For instance, the sales or marketing teams may need access to logistics data to answer business questions that depend on data held in supply chain repositories. This requires some kind of software platform that virtualizes access to the entirety of an organization’s data, regardless of where it resides.
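To make the idea of a connectivity layer more concrete, here is a toy sketch in which each domain registers a reader function for its datasets and consumers look data up by name without knowing where it is stored. The `ConnectivityLayer` class and its methods are hypothetical and exist only for this example; a real virtualization platform would also handle query pushdown, access control, and much more.

```python
from typing import Callable, Dict, Iterable, List

# A reader is any zero-argument function that yields rows as dictionaries.
# In practice it might wrap a warehouse query or a data lake scan.
Reader = Callable[[], Iterable[dict]]


class ConnectivityLayer:
    """Toy registry that hides where each domain keeps its data."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}

    def register(self, domain: str, dataset: str, reader: Reader) -> None:
        """Called by the owning domain team to publish a dataset."""
        self._readers[f"{domain}.{dataset}"] = reader

    def list_datasets(self) -> List[str]:
        """Discovery: every published dataset, in sorted order."""
        return sorted(self._readers)

    def read(self, qualified_name: str) -> List[dict]:
        """Fetch rows by 'domain.dataset' name, whatever the backend."""
        if qualified_name not in self._readers:
            raise KeyError(f"unknown dataset: {qualified_name}")
        return list(self._readers[qualified_name]())


mesh = ConnectivityLayer()
# The lambdas stand in for real storage backends owned by each domain.
mesh.register("logistics", "shipments", lambda: [{"order_id": 1, "status": "delivered"}])
mesh.register("sales", "accounts", lambda: [{"account_id": "ACME", "region": "EMEA"}])

# A marketing analyst reads logistics data without knowing its location.
print(mesh.list_datasets())
print(mesh.read("logistics.shipments"))
```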
Interestingly, this architecture doesn’t necessarily replace data warehouses and data lakes. Rather, it sits on top of them. If one department wants to keep its data in Hadoop, and another in Snowflake, that’s fine, as long as both adhere to the governing principles of the organization’s Data Mesh. By allowing greater autonomy and flexibility for data owners, the Data Mesh paradigm encourages more experimentation and innovation.
Is the Data Mesh practical?
It depends on the capabilities of the ‘connectivity layer’.
Anyone who has to manage or lead a team may rightfully ask whether local teams are really up to the job. Considering how specialized data engineering is, can this be passed along to the departments? To answer this, we need to pay close attention to the ‘layer of connectivity’ that we just mentioned. For the Data Mesh to be successful, as much work as possible needs to be removed from local teams. A virtual layer that lifts much of the burden for standardization and governance from the local teams is required to take the Data Mesh concept out of the realm of theory and make it a practical reality.
451 Research published A Beginner’s Guide to Data Mesh.