WHITEPAPER
Data Mesh
A Market Primer
This paper addresses the why, what, and how of data
mesh, including the market need, a detailed description of
data mesh architecture, and a look at its core capabilities,
objectives, challenges, use cases, and more.
What is Data Mesh? A Market Primer 1
Why Data Mesh
The simple premise of data mesh is that
business domains should be able to
define, access, and control their own data
products.
The thinking is that business stakeholders in a specific
domain understand their data needs better than
anybody else. And when business people are forced to
work with data engineers or data scientists outside their
domain, provisioning the right data, to the right data
consumers, at the right time, becomes time-consuming, often
error-prone, and ultimately ineffective.
Data Mesh Principles
The data mesh architecture has emerged to address the
following key data management principles:
• A single source of truth is a must, but it’s incredibly challenging when data is scattered among hundreds of disparate legacy, cloud, and hybrid systems.

• The volume of data is growing exponentially, with increasing demand for instant data access and faster response times.

• Data needs to be made available to everyone, all the time, without the need for technical expertise or any involvement of IT.

• Effective data management requires collaboration among data engineers, data scientists, business analysts, and operational data consumers.
What is Data Mesh?
Data mesh, an emerging data architecture for organizing and delivering
enterprise data, is founded on 4 key concepts:
• Data as a product, where data products – comprised of clean, fresh, and complete data – are delivered to any data consumer, anytime, anywhere, based on permissions and roles

• Business domain-driven data ownership, which reduces the reliance on centralized data teams (often including data engineers and data scientists)

• Instant access to data, enabled by new levels of abstraction and automation – designed to share relevant data cross-functionally, on demand

• Distributed data governance, where each domain governs its own data products, but is reliant on central control of data modeling, security policies, and compliance

In the data mesh implementation, every business domain retains control over all aspects of its data products for both analytical and operational use cases – in terms of quality, freshness, privacy compliance, etc. – and is responsible for sharing them with other domains (departments in the enterprise).
What are Data Products?
Data products are produced to be consumed with a specific purpose in mind. A data product may assume a variety of
forms, based on the specific business domain, or use case to be addressed. A data product will often correspond to a
business entity – such as a customer, asset, supplier, order, credit card, campaign, etc. – that data consumers would like
to access for analytical and operational workloads. The data for the product will typically be fragmented across dozens
of siloed source systems, often of different technologies, structures, formats, and terminologies.
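To make the fragmentation concrete, here is a minimal sketch (in Python, with hypothetical source systems, field names, and mapping rules) of how records for one customer entity, scattered across a CRM and a billing system with different structures and terminologies, might be unified into a single view:

```python
# Illustrative sketch only: the sources, fields, and mapping rules are hypothetical.

crm_record = {"cust_id": "C-1001", "full_name": "Ada Lovelace", "tier": "gold"}
billing_record = {"customerNo": "C-1001", "balanceDue": 120.50, "currency": "USD"}

def unify_customer(crm: dict, billing: dict) -> dict:
    """Map each source system's terminology onto one shared customer schema."""
    return {
        "customer_id": crm["cust_id"],                 # CRM owns identity fields
        "name": crm["full_name"],
        "loyalty_tier": crm["tier"],
        "outstanding_balance": billing["balanceDue"],  # billing owns financial fields
        "currency": billing["currency"],
    }

customer = unify_customer(crm_record, billing_record)
print(customer["customer_id"], customer["outstanding_balance"])
```

A real pipeline would also reconcile conflicting values and handle records that exist in only one source; the point here is simply that each source speaks its own schema, and the data product presents one.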
A data product, therefore, encapsulates everything that a data consumer requires in
order to derive value from the business entity’s data. This includes the product’s:
• Metadata, both static and active (usage and performance)

• Algorithms, for processing the ingested, raw data

• Data, post-processing (i.e., unified, cleansed, masked, enriched, etc.)

• Access methods, such as SQL, JDBC, web services, streaming, CDC, and more

• Synchronization rules, defining how and when the data is synced with the source systems

• Orchestrated data flows, as visualized in a modern data catalog

• Lineage, to the source systems

• Audit log, of data changes

• Access controls, including credential checking and authentication

The data product is created by applying a cross-functional, product lifecycle methodology to data. The data product delivery lifecycle adheres to the agile principles of being short and iterative, to deliver quick, incremental value to data consumers. A data product approach entails:

• Definition and design
Data product requirements are defined in the context of business objectives, data privacy and governance constraints, and existing data asset inventories. Data product design depends on how the data will be structured, and how it will be componentized as a product, for consumption via services.

• Engineering
Data products are engineered by identifying, integrating, and collating the data from its sources, and then masking it as needed. Web service APIs are created to provide consuming applications with the authority to access the data product, and pipelines are secured for delivering the data to its constituents.

• Quality assurance
The data is tested and validated to ensure that it’s complete, compliant, and fresh – and that it can be securely consumed by applications at massive scale.

• Support and maintenance
Data usage, pipeline performance, and reliability are continually monitored, by local authorities and data engineers, so that issues can be addressed as they arise.

• Management
Just as a software product manager is responsible for defining user needs, prioritizing them, and then working with development and QA teams to ensure delivery, the data product approach calls for a similar role. The data product manager is responsible for delivering business value and ROI, where measurable objectives – such as response times for operational insights, or the pace of application development – have definitive goals, or timelines, based on SLAs reached between business and IT.
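As a rough illustration (not any specific vendor's API), the components of a data product could be captured in a simple descriptor. The sketch below is a Python dataclass with hypothetical field names, showing how metadata, lineage, an audit log, a processing algorithm, and a basic access-control check might hang together:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DataProduct:
    """Sketch of a data product descriptor; all fields are illustrative."""
    name: str                                           # the business entity, e.g. "customer"
    metadata: dict = field(default_factory=dict)        # static and active (usage/performance) metadata
    access_methods: list = field(default_factory=list)  # e.g. ["SQL", "web service", "CDC"]
    sync_rule: str = "on_demand"                        # how/when data syncs with source systems
    lineage: list = field(default_factory=list)         # upstream source systems
    audit_log: list = field(default_factory=list)       # record of data changes and access
    transform: Callable = lambda raw: raw               # algorithm applied to ingested raw data

    def serve(self, raw: dict, role: str, allowed_roles: set) -> dict:
        """Apply the product's processing, with a simple role-based access check."""
        if role not in allowed_roles:
            raise PermissionError(f"role {role!r} may not access {self.name}")
        self.audit_log.append(f"served to {role}")
        return self.transform(raw)

customer = DataProduct(
    name="customer",
    access_methods=["SQL", "web service"],
    lineage=["CRM", "billing"],
    transform=lambda raw: {k: v for k, v in raw.items() if k != "ssn"},  # mask PII field
)
print(customer.serve({"id": 1, "ssn": "000-00-0000"}, role="analyst", allowed_roles={"analyst"}))
```

In a real implementation, each of these fields would be a substantial subsystem (a catalog entry, a pipeline, an IAM integration); the descriptor simply shows that a data product bundles them as one unit of delivery.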
What’s a Mesh?
Data product mindset
Innovative data product practices combine the concepts of Design Thinking, for breaking down the organizational silos that often impede cross-functional innovation, and the Jobs to be Done Theory, which defines the product’s ultimate purpose in fulfilling specific data consumer goals.

The meaning of mesh
In tech terms, a mesh is a network topology in which a group of non-hierarchical nodes work together collaboratively. Some common examples of mesh include:

• WiFi mesh – Routers and extenders working in tandem to improve Internet access

• 5G mesh – A series of wireless clients, routers, and gateways assuring reliable cell phone connections

• Service mesh – A unified way to control decentralized microservices

Similar to the above, data mesh represents a decentralized way of distributing data across virtual and physical networks, spanning great distances. Where legacy data integration tools require a highly centralized infrastructure, a data mesh operates across on-premise, single-cloud, multi-cloud, and edge environments.

Decentralization
With the meteoric rise of cloud-based applications, application architectures are transitioning away from centralized IT, towards distributed microservices (or a service mesh). Data architecture is following the same trend, with data being distributed across a wide range of physical sites, spanning many locations (or a data mesh). Although a monolithic, centralized data architecture is often simpler to create and maintain, in an IT world racing to the cloud, there are many good reasons to adopt a modular, decentralized data management system.

Distributed security
When data is highly distributed and decentralized, security plays a critical role. Distributed systems must delegate authentication and authorization activities out to a host of different users, with different levels of access. Key data mesh capabilities include:
• Data encryption, at rest and in motion
• Data masking, for effective PII obfuscation
• Data privacy management, in all its forms
• Compliance with GDPR, CCPA, and other data privacy legislation
• Identity management, including LDAP/IAM-type
services
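As one common masking technique (an illustrative sketch, not a complete privacy solution), PII can be tokenized deterministically, so that the same input always maps to the same token: joins across data products still line up, while the raw value stays hidden. The salt and token format below are placeholders:

```python
import hashlib

def mask_pii(value: str, salt: str = "domain-secret") -> str:
    """Deterministically mask a PII value via a salted hash.
    The same input always yields the same token, preserving referential
    integrity across domains. The salt here is a placeholder; a real
    deployment would manage it in a secrets store."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "tok_" + digest[:12]

a = mask_pii("ada@example.com")
b = mask_pii("ada@example.com")   # same input -> same token
c = mask_pii("grace@example.com") # different input -> different token
print(a, a == b, a == c)
```

Deterministic masking is one point on a spectrum; format-preserving encryption or random tokenization with a lookup vault are alternatives when reversibility or stricter unlinkability is required.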
Data Mesh Objectives
The objectives of data mesh are to:
• Exchange data products between data producers and
data consumers
• Simplify the way data is processed, organized, and
governed
• Democratize data with a self-service approach that
minimizes dependence on IT
The table below compares the features of traditional data management
platforms to data mesh architectures.
Traditional data management platforms | Data mesh architectures
Serve a centralized data team that supports multiple domains | Serve autonomous domain teams
Manage code, data, and policies as a single unit | Manage code and pipelines independently
Require separate stacks for operational and analytical workloads | Provide a single platform for operational and analytical workloads
Cater to IT, with little regard for Business | Cater to IT and Business, alike
Centralize the platform for optimized control | Decentralize the platform for optimized scale
Force domain awareness | Remain domain-agnostic
The left-hand side of the table describes most monolithic data platforms. They serve a centralized IT team, and are optimized for control. Operational stacks used to run enterprise software are completely separated from the clusters managing the analytical data.

The data mesh dictates greater autonomy in the management of data flows, data pipelines, and policies. At the end of the day, data mesh is an architecture based on decentralized thinking that can be applied to any domain.
Data Mesh Challenges
The main challenges of a data mesh stem from the complexities inherent to managing
multiple data products (and their dependencies) across multiple autonomous domains.
Here are the key considerations:
• Multi-domain data duplication
Redundancy, which may occur when the data of one domain is repurposed to serve the business needs of another domain, could potentially impact resource utilization and data management costs.

• Federated data governance and quality assurance
Different domains have different governance and quality requirements, which must be taken into account when data products and pipelines are shared commodities. The resulting deltas must be identified and federated.

• Change management
Decentralizing data management to adopt a data mesh approach requires significant change management in highly centralized data management practices.

• Cost and risk
Existing data and analytics tools should be adapted and augmented to support a data mesh architecture. Establishing a data management infrastructure to support a data mesh – including data integration, virtualization, preparation, masking, governance, orchestration, cataloging, and delivery – can be a very large, costly, and risky undertaking.

• Cross-domain analytics
An enterprise-wide data model must be defined to consolidate the various data products and make them available to authorized users in one central location.
Data Mesh Benefits
The benefits of a data mesh are significant, and include the following:
• Agility and scalability
Data mesh improves business domain agility, scalability, and speed to value from data. It decentralizes data operations, and provisions data infrastructure as a service. As a result, it reduces IT backlog, and enables business teams to operate independently and focus on the data products relevant to their needs.

• Cross-functional domain teams
As opposed to traditional data architecture approaches, in which highly-skilled technical teams are often involved in creating and maintaining data pipelines, data mesh puts the control of the data in the hands of the domain experts. With increased IT-business cooperation, domain knowledge is enhanced, and business agility is extended.

• Faster data delivery
Data mesh makes data accessible to authorized data consumers in a self-service manner, hiding the underlying data complexities from users.

• Strong central governance and compliance
With the ever-growing number of data sources and formats, data lakes and DWHs often fail at massive-scale data integration and ingestion. Domain-based data operations, coupled with strict data governance guidelines, promote easier access to fresh, high-quality data. With data mesh, bulk data dumps into data lakes are things of the past.
Data Mesh Capabilities
Data mesh supports the following functional capabilities:

• Data catalog
Discovers, classifies, and creates an inventory of data assets, and visually displays information supply chains

• Data engineering
Rapid creation of scalable and reliable data pipelines that support analytical and operational workloads. Common data preparation flows are productized for reuse by the domains.

• Data governance
Distributes certain quality assurance, privacy compliance, and data availability policies and enforcement to the business domains, whilst maintaining centralized governance over company-wide data policies.

• Data preparation and orchestration
Enables quick orchestration of source-to-target data flows, including data cleansing, transformation, masking, validation, and enrichment

• Data integration and delivery
Accesses data from any source and pipelines it to any target, in any method: ETL (bulk), messaging, CDC, virtualization, and APIs

• Data persistence layer
Selectively stores and/or caches data in the hub, or within the domains, to improve data access performance.

Data mesh also addresses the following non-functional capabilities:

• Data scale, volume, and performance
Scales both up and down, dynamically, seamlessly, and at high speed, regardless of data volume

• Accessibility
Supports all data source types, access modes, formats, and technologies, and integrates master and transactional data, at rest or in motion

• Distribution
Deploys on-premise, in the cloud, or in hybrid environments, with complete transactional integrity

• Security
Encrypts and masks data, to comply with privacy regulations, and checks user credentials, to ensure authorized access is maintained
Data Mesh Use Cases
Data mesh supports many different operational and analytical
use cases, across multiple domains.
Here are a few examples:
• Customer 360 view, to support customer care in reducing average handle time, increasing first contact resolution, and improving customer satisfaction. A single view of the customer may also be deployed by marketing for predictive churn modeling or next-best-offer decisioning

• Hyper-segmentation, to enable marketing teams to deliver the right campaign to the right customer, at the right time, and via the right channel
• Data privacy management to protect customer
data by complying with ever-emerging regional data
privacy laws, like VCDPA, prior to making it available
to data consumers in the business domains
• IoT device monitoring, providing product teams
with insights into edge device usage patterns, to
continually improve product adoption and profitability
• Federated data preparation, enabling domains to
quickly provision quality, trusted data for their data
analytics workloads
Implementing Data Mesh with
an Entity-Based Data Fabric
Based on the concepts of “business entity” and “data as a product”, an entity-based
data fabric is the optimal implementation for the data mesh design pattern.
A data fabric creates an integrated layer of connected data across disparate data sources to deliver a real-time and holistic view of the business to operational and analytical workloads.

An entity-based data fabric centralizes the semantic definition of the various data products that are important to the business. It also sets up the data ingestion methods, and the needed central governance policies, that protect and secure the data, in the data products, in accordance with regulations.

Additional data fabric nodes are deployed in alignment with the business domains, providing the domains with local control of data services and pipelines to access and govern the data products for their respective data consumers.
Here’s what a data mesh implementation looks like
based on an entity-based data fabric.
In this sense, an entity-based data fabric – that manages, prepares, and delivers data in the form of business entities
– becomes the data mesh core.
While data mesh architecture design introduces technology and implementation challenges, these are neatly
addressed via an entity-based data fabric:
Data mesh implementation issues, and how they are resolved by an entity-based data fabric:

Issue: Need for data integration expertise. Domain-specific data pipelining requires distributed expertise in complex data integration and modeling of multiple disparate source systems across the enterprise.
Resolution: Data products as business entities. When a data product is a business entity managed in a virtual data layer, domains don’t have to deal with the underlying source systems.

Issue: Independence vs. confederacy. Striking the right balance between domain independence and reliance on central data teams isn’t trivial.
Resolution: Cross-functional collaboration. Centralized data teams collaborate with domain-specific teams to produce the data products. The domain-specific teams create APIs and pipelines for their respective data consumers, govern and control access rights, and monitor usage.

Issue: Real-time and batch data delivery. Trusted data products need to be provisioned to both online and offline data consumers, efficiently and securely, on a single platform.
Resolution: Operational and analytical workloads. An entity-based data fabric ingests and processes data from underlying systems, to deliver data products on demand, for operational and analytical use cases.
About K2View
At K2View, we believe that every enterprise should be able to use its data to be as disruptive and agile
as Google, Amazon, and Netflix.
We make this possible by transforming all your data – wherever it is – into business-driven data
products, which are defined and managed by business domains.
Data products could be customers, products, suppliers, orders – or anything else that’s important
to your business. We manage every individual data product in its own secure Micro-Database™,
continuously in sync with all source systems, and instantly accessible to everyone.
This is all made possible by our operational data fabric, which delivers a trusted, real-time view of
any data product. K2View Fabric deploys in weeks, scales linearly, and adapts to change on the fly. It
supports modern data architectures, such as data mesh, data hub, and multi-domain MDM – in on-
premise, cloud, or hybrid environments.
This one platform drives many use cases, including application modernization, cloud migration,
customer 360, data privacy, data testing, and more – to deliver business outcomes in less than half
the time, and at half the cost, of any other alternative.
© 2022 K2View. All rights reserved. K2View Fabric and Micro-Database are trademarks of K2View.
Content subject to change