Data Mesh Architecture

Shashi Shankar

Mar 21, 2023

Distributed Data Management Architecture for Large Enterprises

What is a Data Mesh Architecture?

Data Mesh is an architectural approach for organizing and managing data in large and complex organizations. It aims to decentralize data ownership and processing by treating data as a product and enabling domain-oriented, self-serve data access and analytics.

In practice, this means building enterprise-level data platforms in which analytics is spread across multiple platforms and implementation teams, which helps the analytics ecosystem scale.

The key characteristics of Data Mesh include:

Domain-Oriented Ownership:

Data Mesh advocates for assigning data ownership to individual domains or business units within an organization. Each domain is responsible for managing its own data, including ingestion, storage, processing, and governance.

Data as a Product:

In the Data Mesh paradigm, data is treated as a product that is created, managed, and consumed by different domains. This approach encourages data producers to focus on creating high-quality, reusable data products that meet the specific needs of data consumers.
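To make "data as a product" more concrete, here is a minimal sketch of the kind of descriptor a domain team might publish alongside its data. The class name, field names, and the sample "sales" product are all hypothetical and are not part of any Data Mesh standard; they only illustrate that a product carries an owner, a schema, and a description its consumers can rely on.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product (illustrative only)."""
    name: str
    owner_domain: str          # the domain accountable for this product
    schema: dict               # column name -> type name
    description: str = ""
    tags: tuple = ()

    def qualified_name(self) -> str:
        # Prefix with the owning domain so names stay unique across domains.
        return f"{self.owner_domain}.{self.name}"


# Example product owned by an assumed "sales" domain.
orders = DataProduct(
    name="daily_orders",
    owner_domain="sales",
    schema={"order_id": "string", "amount": "decimal", "order_date": "date"},
    description="One row per confirmed order, refreshed daily.",
    tags=("curated",),
)

print(orders.qualified_name())
```

Keeping the owner and schema in the descriptor itself means a consumer in another domain can tell who is responsible for the product and what shape to expect without asking the producing team.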

Decentralized Data Architecture:

Data Mesh promotes a decentralized architecture where data processing and analytics capabilities are distributed across multiple domains or teams. This enables greater agility, autonomy, and scalability in data operations.

Self-Serve Data Infrastructure:

Data Mesh emphasizes the importance of providing self-serve data infrastructure and tools to enable domain teams to access, analyze, and derive insights from data autonomously. This includes data catalogs, data pipelines, data lakes, and analytics platforms that are designed for ease of use and interoperability.

Data Mesh Governance:

Data Mesh advocates for establishing governance mechanisms that ensure consistency, quality, security, and compliance across data products and domains. This includes defining data standards, policies, and guidelines, as well as implementing monitoring and auditing processes.
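One way such standards could be enforced is with an automated check that every data product runs before it is published. The sketch below assumes two made-up rules (required metadata fields and snake_case column names); real governance policies would be defined by the organization, and the function and constant names here are hypothetical.

```python
import re

# Assumed organization-wide standards (illustrative, not prescriptive).
REQUIRED_FIELDS = ("name", "owner_domain", "description", "schema")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def check_product(metadata: dict) -> list:
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    for field_name in REQUIRED_FIELDS:
        if not metadata.get(field_name):
            violations.append(f"missing required field: {field_name}")
    # Treat a missing or empty schema as having no columns to check.
    for column in (metadata.get("schema") or {}):
        if not SNAKE_CASE.match(column):
            violations.append(f"column not snake_case: {column}")
    return violations


compliant = {
    "name": "daily_orders",
    "owner_domain": "sales",
    "description": "One row per confirmed order.",
    "schema": {"order_id": "string"},
}
non_compliant = {
    "name": "orders_raw",
    "owner_domain": "sales",
    "schema": {"OrderID": "string"},
}

print(check_product(compliant))
print(check_product(non_compliant))
```

Because the rules live in one shared function while each domain runs the check on its own products, this keeps the policy central and the enforcement decentralized.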

Cross-Functional Collaboration:

Data Mesh encourages cross-functional collaboration and alignment between data engineering, data science, domain experts, and business stakeholders. This fosters a culture of data-driven decision-making and innovation across the organization.

Scalability and Resilience:

Data Mesh is designed to scale and adapt to evolving business requirements and technological advancements. It enables organizations to handle large volumes of data, diverse data sources, and complex analytics workloads with resilience and efficiency.

Components of Data Mesh Architecture

In a data mesh architecture, there are several key components that work together to enable decentralized data management and governance across an organization:

Data Domains:

Domains represent distinct business units, functional areas, or subject areas within the organization. Each domain is responsible for managing and governing its own data assets, including defining schemas, data models, and access policies.

Data Products:

Data products are the outputs of domain-specific data processing pipelines. They can include raw data, curated datasets, analytical models, or any other data asset that provides value to the organization. Data products are created, managed, and consumed within individual domains.

Data Mesh Framework:

The data mesh framework defines the principles, standards, and guidelines for implementing a decentralized data architecture. It includes governance policies, data sharing agreements, and technical standards that enable seamless integration and collaboration across domains.

Data Infrastructure:

The data infrastructure consists of the technical components and systems that support data processing, storage, and analytics. This may include data lakes, data warehouses, data streaming platforms, and other tools and technologies for managing and analyzing data.

Data Mesh Platform:

The data mesh platform provides the foundational technology stack for implementing the data mesh architecture. It includes tools and services for data discovery, data cataloging, metadata management, data lineage tracking, and other capabilities that enable collaboration and interoperability across domains.
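The discovery and lineage capabilities described above can be illustrated with a small in-memory catalog. This is only a sketch: the Catalog class, its methods, and the product names are hypothetical, and a real platform would back this with a metadata service rather than Python dictionaries.

```python
from collections import defaultdict


class Catalog:
    """Toy catalog: registers products, supports tag search and lineage."""

    def __init__(self):
        self._entries = {}
        self._upstream = defaultdict(set)  # product -> direct upstream products

    def register(self, name, domain, tags=(), upstream=()):
        self._entries[name] = {"domain": domain, "tags": set(tags)}
        self._upstream[name].update(upstream)

    def search(self, tag):
        # Discovery: find every registered product carrying the tag.
        return sorted(n for n, e in self._entries.items() if tag in e["tags"])

    def lineage(self, name):
        # Lineage: walk upstream links transitively, visiting each node once.
        seen = set()
        stack = list(self._upstream.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._upstream.get(current, ()))
        return seen


catalog = Catalog()
catalog.register("sales.orders", "sales", tags=["orders"])
catalog.register("finance.revenue", "finance", tags=["revenue"],
                 upstream=["sales.orders"])
catalog.register("exec.kpis", "exec", tags=["revenue"],
                 upstream=["finance.revenue"])

print(catalog.search("revenue"))
print(catalog.lineage("exec.kpis"))
```

Tracking upstream links at registration time is what lets a consumer of a downstream product see which domains it ultimately depends on.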

Master Data Management (MDM) in Data Mesh Architecture

In a Data Mesh architecture, multiple versions of the same data may exist across domains. The same data can be inconsistent from one domain to another, and its quality can vary. Robust Master Data Management (MDM) should be implemented to prevent these data quality problems. Master identification numbers link the mastered data to the corresponding data in each domain.
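As a rough illustration of how master identification numbers can tie domain records back to mastered data, the sketch below groups records from two domains under the mastered record they reference. The function name, the master_id field, and the sample customer data are assumptions made for the example, not a description of any particular MDM product.

```python
def link_by_master_id(mastered, *domain_datasets):
    """Group domain records under the mastered ("golden") record they reference.

    mastered: dict mapping master ID -> golden record
    domain_datasets: (domain_name, list_of_records) pairs
    """
    linked = {
        master_id: {"golden": record, "sources": []}
        for master_id, record in mastered.items()
    }
    unmatched = []
    for domain, records in domain_datasets:
        for record in records:
            master_id = record.get("master_id")
            if master_id in linked:
                linked[master_id]["sources"].append((domain, record))
            else:
                # No known master ID: flag for data stewards to review.
                unmatched.append((domain, record))
    return linked, unmatched


mastered_customers = {"M-001": {"name": "Acme Corporation"}}
sales = ("sales", [{"master_id": "M-001", "name": "ACME Corp."}])
support = ("support", [
    {"master_id": "M-001", "name": "Acme"},
    {"master_id": None, "name": "Unknown Ltd"},
])

linked, unmatched = link_by_master_id(mastered_customers, sales, support)
print(len(linked["M-001"]["sources"]))
print(unmatched)
```

Each domain keeps its own spelling of the customer name, but the shared master ID lets consumers resolve all of them to one golden record.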

Hub & Spoke Data Mesh Implementation

In a hub-and-spoke model of data mesh architecture, data infrastructure is organized around a central hub that serves as a core repository or coordination point for data. Spoke systems or domains, representing different business units or functions, connect to this hub to exchange data. This architecture fosters decentralization by distributing data ownership and governance to individual spokes while maintaining centralized control and coordination through the hub.

The data hub provides generic platform services and operations for the data domains, such as:

· self-service data publishing to managed locations

· data cataloging, lineage, audit, and access control via Unity Catalog

· data management services such as time travel
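The hub services listed above can be sketched as a small in-memory model. This is not the Unity Catalog API or any real product interface; the DataHub class and its methods are hypothetical and only show how publishing to a managed location, access control, auditing, and version-based time travel might fit together in a central hub.

```python
class DataHub:
    """Toy hub: versioned publishing, read grants, and an audit trail."""

    def __init__(self):
        self._versions = {}   # location -> list of snapshots (index = version)
        self._grants = {}     # location -> set of domains allowed to read
        self.audit_log = []

    def publish(self, domain, product, rows):
        # The hub, not the spoke, decides the managed location.
        location = f"{domain}/{product}"
        self._versions.setdefault(location, []).append(list(rows))
        self._grants.setdefault(location, {domain})
        version = len(self._versions[location]) - 1
        self.audit_log.append(("publish", domain, location, version))
        return version

    def grant(self, location, domain):
        self._grants[location].add(domain)

    def read(self, domain, location, version=None):
        if domain not in self._grants.get(location, set()):
            raise PermissionError(f"{domain} may not read {location}")
        snapshots = self._versions[location]
        # Time travel: read an older snapshot when a version is given.
        chosen = len(snapshots) - 1 if version is None else version
        self.audit_log.append(("read", domain, location, chosen))
        return snapshots[chosen]


hub = DataHub()
hub.publish("sales", "orders", [{"id": 1}])
hub.publish("sales", "orders", [{"id": 1}, {"id": 2}])
hub.grant("sales/orders", "finance")

print(hub.read("finance", "sales/orders"))             # latest version
print(hub.read("finance", "sales/orders", version=0))  # earlier snapshot
```

The spoke (here, the assumed "sales" domain) still owns what it publishes and who may read it, while the hub supplies the shared mechanics, which mirrors the split between decentralized ownership and central coordination described above.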