Fernando Velez
Fernando is the chief data technologist at Persistent Systems, a global solutions provider with more than 13,000 employees. He is a visionary software technologist with an extensive career in data management, database systems, analytics and information management. After a period as a researcher in France, he built and delivered products to each of these domain markets at companies such as DoubleClick (now part of Google), Business Objects and SAP.

Data mesh is a hot architectural concept, now listed as a dominant market trend. It is a reaction against the slow delivery of data for decision-making in large, data-hungry organizations, where many data sources, use cases and user types across ever-changing, complex data landscapes must be reckoned with. It blames centralized data management for its reliance on a single data team that must develop many data pipelines feeding a central database — a warehouse, lake or master data management (MDM) hub.

Data mesh organizes its software teams by domain: Each team handles its own pipelines and provides its data via APIs as “products,” moving from a single to multiple points of consumption. The architecture’s technology-agnostic principles also address two hurdles arising in decentralized systems: duplication of efforts to maintain pipelines and infrastructure (through a central self-service platform), and non-interoperable data products (via federated governance policies such as standardizing data formats, metadata representation and global identity). These two “layers” work in tandem: Such policies and standards should be expressed using computational facilities available within the platform.
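As a minimal illustration of what a “computational” policy could look like, the following Python sketch checks a hypothetical data product descriptor against two example standards (required metadata fields and allowed output formats). The field names and rules are assumptions made for the sake of the example, not part of any data mesh specification.

```python
# Minimal sketch of a governance policy expressed as code, assuming each data
# product publishes a metadata descriptor as a plain dictionary. Field names
# and rules are hypothetical illustrations, not a standard.

REQUIRED_FIELDS = {"name", "owner_domain", "output_format", "schema", "global_id_scheme"}
ALLOWED_FORMATS = {"parquet", "json", "avro"}  # example format standard

def check_product_descriptor(descriptor: dict) -> list[str]:
    """Return a list of governance violations for one data product descriptor."""
    violations = []
    missing = REQUIRED_FIELDS - descriptor.keys()
    if missing:
        violations.append(f"missing metadata fields: {sorted(missing)}")
    if descriptor.get("output_format") not in ALLOWED_FORMATS:
        violations.append(f"non-standard output format: {descriptor.get('output_format')}")
    return violations

# Example: a platform job could run this check over every registered product.
orders = {
    "name": "orders",
    "owner_domain": "sales",
    "output_format": "csv",  # violates the hypothetical format standard
    "schema": {"order_id": "string", "region_code": "string"},
}
print(check_product_descriptor(orders))  # flags the missing field and the non-standard format
```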

We believe data mesh’s popularity is deserved: It addresses a real problem with sound principles. However, in our own experience and as reported here, the business frequently needs combined insights from its various units, which translates into a need that goes beyond basic interoperability based on a data product’s API.

For data products to be useful for analysis, query access must be opened for consumption. For the data to be fully understood — and joinable to other data products — and relied upon, much agreement is needed on common data elements, beyond what data mesh prescribes in the platform and federated governance layers (model descriptions and formats).

Indeed, as we discuss below, a data mesh should include some important capabilities that data management systems have provided over the years to deal with the difficult data integration problem. This article suggests capabilities that could be adapted within a data mesh platform.

The Quest for Data Integration

Simply put, data integration involves combining data from several disparate sources into a unified view, enabling the analysis of combined datasets. Businesses very often want insights across, or to test hypotheses involving, multiple business processes. For example, a consumer products company may want to find the impact of defects in products of some category on sales, customer satisfaction and call center personnel, by region.

This requires data managed by processes in different business units, which generally use different operational systems. Obtaining this insight calls for data integration capabilities, and data mesh should support them.
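As a minimal illustration of what such a combined view looks like, here is a Python/pandas sketch, assuming three hypothetical datasets already extracted from the quality, sales and customer support domains. All column names and figures are illustrative only.

```python
# Minimal sketch of the cross-business-process analysis described above,
# assuming three hypothetical datasets from the quality, sales and
# customer-support domains. Column names and values are illustrative.
import pandas as pd

defects = pd.DataFrame({
    "product_category": ["appliances", "appliances"],
    "region": ["EMEA", "NA"],
    "defect_count": [120, 45],
})
sales = pd.DataFrame({
    "product_category": ["appliances", "appliances"],
    "region": ["EMEA", "NA"],
    "revenue": [1_200_000, 2_500_000],
})
satisfaction = pd.DataFrame({
    "product_category": ["appliances", "appliances"],
    "region": ["EMEA", "NA"],
    "csat_score": [3.4, 4.1],
})

# The join only works if all three domains agree on region codes and product
# categories -- the "common data elements" discussed later in this article.
impact = (
    defects
    .merge(sales, on=["product_category", "region"])
    .merge(satisfaction, on=["product_category", "region"])
)
print(impact)
```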

But data integration is hard. We believe this is because of the following challenges:

  1. Data heterogeneity. In our company example above, disparate operational systems house data in disparate formats and with inconsistent semantics.
  2. Data source explosion, in terms of variety and velocity, driven by the pervasive use of new technologies such as cloud computing, IoT devices and mobility.
  3. Poor data quality. On top of the previous problems, a root cause is weak ownership of data. Data quality has been found to be very deficient in most organizations.
  4. Mergers and acquisitions. Many organizations grow this way, inheriting the mess of the companies they acquire.

Because of its versatility, data warehousing has been the most popular form of integration for analytics. It strives to homogenize every aspect of this heterogeneity problem by moving data to central storage and providing a common unified view across all functional areas. However, according to this survey, only 22% of organizations have fully realized data warehouse ROI over the past two years. Most data warehousing project failures can be attributed to the challenges described above.


Data lakes ingest data requiring no conformance to a unified view; modelling and conformance via cleansing and transformation are delayed until there is a business need. Data lakes don’t solve the problem: Today, adoption has waned, and they also show high failure rates — over 85%, according to Gartner, which cites lack of governance and outdated, irrelevant data as failure scenarios to prevent.

Federation, or virtualization, systems implement another form of integration: They leave data in place and present a virtual model as the unified view. But they have limitations in query processing capabilities and scale, uniform history management, and the ability to cleanse data without updating it, so usage has been low. However, this is changing, as explained below.

Finally, we have master data management systems, which help organizations integrate a subset of the data that has the most data quality problems: their official master data assets, those “big-level” entities — customers, products or places — that appear in more than one operational application.

They provide a central, authoritative hub where individual entity records are standardized under common, unified models, with rules to keep incorrect (i.e., dirty, duplicate, inconsistent) data from entering the hub. MDM projects have also been associated with high failure rates, although implementations seem to be maturing.

How Data Mesh Helps Data Integration

By organizing software teams by domain and by decentralizing ownership, data mesh squarely addresses the data quality challenge. Teams are closer to the source of the data, understand it better and should know what their consumers expect, so they are in the best position to fix the main root causes for bad data quality.

The federated governance team determines what to measure — data quality dimensions and KPIs — and the data platform team supplies the “how” part — technology and best practices for automating measurement.
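As an illustration of this division of labor, here is a minimal Python sketch of the “how” part, assuming the governance team has picked two simple data quality KPIs (completeness and uniqueness). The dataset and column names are hypothetical.

```python
# Minimal sketch of automated data quality measurement, assuming the
# federated governance team has chosen two simple KPIs (completeness and
# uniqueness) and the platform runs the checks on each data product's output.
import pandas as pd

def completeness(df: pd.DataFrame, column: str) -> float:
    """Share of non-null values in a column."""
    return 1.0 - df[column].isna().mean()

def uniqueness(df: pd.DataFrame, column: str) -> float:
    """Share of distinct values among the non-null values in a column."""
    non_null = df[column].dropna()
    return non_null.nunique() / len(non_null) if len(non_null) else 0.0

# Hypothetical output of a "customers" data product.
customers = pd.DataFrame({"customer_id": ["C1", "C2", "C2", None],
                          "email": ["a@x.com", None, "b@x.com", "c@x.com"]})

kpis = {
    "customer_id completeness": completeness(customers, "customer_id"),
    "customer_id uniqueness": uniqueness(customers, "customer_id"),
    "email completeness": completeness(customers, "email"),
}
print(kpis)  # the platform could publish these scores alongside the product's metadata
```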

With no central data delivery team, consumers now must look for data within data products, find the data of interest and, if needed, perform their own integration — much like the shadow-IT integration that already happens in many companies beyond the work of the central team.

However, the situation is much better thanks to the “data as products” principle and federated governance standards: Data products are self-describing via standard metadata, discoverable, accessed uniformly via APIs and securely by global access control, and benefit from managed federated identity.
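To make “self-describing and discoverable” more concrete, here is a minimal Python sketch of a product registry the platform could offer. The descriptor fields and the discovery helper are hypothetical illustrations, not a standard.

```python
# Minimal sketch of data product discovery, assuming the platform keeps a
# simple registry of self-describing product descriptors. All fields and
# endpoints are hypothetical.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    owner_domain: str
    api_endpoint: str
    tags: set = field(default_factory=set)

REGISTRY: list[DataProduct] = []

def register(product: DataProduct) -> None:
    """Add a product's standard metadata to the platform registry."""
    REGISTRY.append(product)

def discover(tag: str) -> list[DataProduct]:
    """Find products advertising a given tag in their metadata."""
    return [p for p in REGISTRY if tag in p.tags]

register(DataProduct("orders", "sales", "https://mesh.example/sales/orders", {"sales", "orders"}))
register(DataProduct("tickets", "support", "https://mesh.example/support/tickets", {"support"}))

print([p.name for p in discover("sales")])  # ['orders']
```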

Developers doing their own integration — aggregating domains or aligning with consuming use cases — must nevertheless build a model for it. This is simpler than building a global model for an MDM hub or a data mart, let alone a warehouse, because such models are built iteratively and organically rather than upfront and with a cross-organization aim. Developers must also address data quality within the new data product, including potential inconsistencies among records referring to the same entity.

Improving Data Integration Capabilities in a Data Mesh

MDM system capabilities

Let’s start with MDM system capabilities, as they help with the entity resolution problem and with addressing inconsistencies. But first, we believe it is important to highlight the differences between the MDM domain and data mesh domain concepts. As mentioned above, master data domains refer to “big-level” entities managed across multiple applications and organizational units.

Instead, a data mesh domain arises from decomposing a complex system into “bounded contexts” along organizational units. The dominant factor drawing boundaries between contexts is human culture. An MDM domain then corresponds to several data mesh domains dealing with the “same concept”:

  • Some domains are aligned with the source (e.g., per service line or business unit).
  • Aggregate data domains can be created when some degree of aggregation of source domains into a more holistic concept is needed.

The qualifiers “some” and “some degree of aggregation” in the previous bullets are important. Data mesh architect Zhamak Dehghani explicitly warns against creating ambitious, MDM-style aggregate domains that capture all facets of a particular concept, such as “Customer 360,” and imply a unified cross-organizational model, especially if teams from different cultures participate.

If capturing all facets of a complex domain to publish a trusted master data hub is one of your requirements, then data mesh will not help you. It will even impose an extra level of burden — setting up a self-service infrastructure, implementing data as products — that will go against delivery speed.


Identity

In her data mesh principles article, Dehghani identifies global identity as a basic concern for the platform and federated governance layers. Identity sounds simple, but for complex domains such as Customer, it requires the functionality of a matching engine that performs pair-wise record similarity computations through authored rules or machine learning methods, using data element inputs such as person names, organization names, addresses, identifiers and dates.

The market provides general-purpose matching engines (IBM, SAP, Informatica), specialized engines (D&B’s corporation data matching) and registry-style MDM systems providing a matching capability.

Such a service could also deliver global identifiers for the domain’s entities by wrapping these engines with batch matching and real-time “lookup before creation” APIs. Both source-aligned and aggregate domains could call the service, each choosing its own matching policy.
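As a rough illustration of what sits behind such a service, the following Python sketch computes a pair-wise similarity score for two hypothetical customer records using authored rules. Real matching engines apply far richer rules or machine learning models; the weights and threshold here are assumptions.

```python
# Minimal sketch of a rule-based matching step behind an identity service,
# assuming two hypothetical customer records from different source-aligned
# domains. Weights and the decision threshold are illustrative.
import re
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivially different spellings compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted pairwise similarity over name and address."""
    name_sim = SequenceMatcher(None, normalize(rec_a["name"]), normalize(rec_b["name"])).ratio()
    addr_sim = SequenceMatcher(None, normalize(rec_a["address"]), normalize(rec_b["address"])).ratio()
    return 0.6 * name_sim + 0.4 * addr_sim

crm_record = {"name": "ACME, Inc.", "address": "12 Main St, Springfield"}
billing_record = {"name": "Acme Incorporated", "address": "12 Main Street, Springfield"}

score = match_score(crm_record, billing_record)
print(round(score, 2))
# A domain-chosen threshold (say 0.85) decides whether both records
# receive the same global identifier.
```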

Consistency of copies in aggregate domains

When two records refer to the same entity and belong to the same (data mesh) domain, the domain must decide which record, or which record values, should be retained. Most-recent-update is a frequently used policy in MDM systems. When the records for the same entity belong to different domains and their models overlap, the inconsistency problem appears when the records don’t agree on the overlapping part.

Aggregate domains could borrow capabilities from consolidation-style MDM systems: They could model the overlap, detect same-entity inconsistent records and decide, based on a predefined survivorship policy, how to resolve the conflict. Frequent policies include domain source, value set and, again, update recency. In fact, aggregate domains could be implemented by wrapping consolidation-style MDM systems.
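Here is a minimal Python sketch of such a survivorship step, assuming same-entity records carry a source domain and a last-updated timestamp. The precedence order and field names are hypothetical.

```python
# Minimal sketch of a survivorship policy in an aggregate domain. The source
# precedence order, field names and records are hypothetical.
from datetime import datetime

SOURCE_PRECEDENCE = ["billing", "crm", "support"]  # most trusted first

def survive(records: list[dict], policy: str = "recency") -> dict:
    """Pick the surviving record among same-entity records from different domains."""
    if policy == "recency":
        return max(records, key=lambda r: r["updated_at"])
    if policy == "source":
        return min(records, key=lambda r: SOURCE_PRECEDENCE.index(r["source"]))
    raise ValueError(f"unknown policy: {policy}")

records = [
    {"source": "crm", "email": "new@x.com", "updated_at": datetime(2022, 3, 2)},
    {"source": "billing", "email": "old@x.com", "updated_at": datetime(2022, 1, 10)},
]
print(survive(records)["email"])            # 'new@x.com' (most recent update wins)
print(survive(records, "source")["email"])  # 'old@x.com' (billing outranks crm)
```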

Synchronization of copies within source-aligned domains

The previous section talked about the consistency problem at the aggregate domain level, but it remains unsolved in source-aligned domains. Other more ambitious MDM styles and capabilities were introduced to solve this problem:

  • Co-existence-style systems, which help in synchronizing changes within operational systems, and
  • Transaction-style systems, where the MDM hub becomes the system of record for the master domain(s), providing read access to all operational systems via services; the latter no longer create or edit master data.

If the inconsistency of copies in source-aligned domains generates too much pain, a solution is to add co-existence-style synchronization services to a mesh platform and use them as follows (a minimal sketch appears after the list):

  • Source-aligned domains subscribe to entity changes occurring within an aggregate domain,
  • Upon a change, the aggregate domain pushes the changed entity to domains that subscribed, and
  • Race conditions on concurrent updates are solved by governance-level synchronization rules.
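The following Python sketch illustrates that flow, assuming the aggregate domain pushes changes to subscribed source-aligned domains through simple callbacks and resolves races with a version-based rule. All names and the conflict rule are hypothetical.

```python
# Minimal sketch of co-existence-style synchronization: source-aligned domains
# subscribe, the aggregate domain pushes changes, and a governance-level rule
# (highest version wins) resolves concurrent updates. Names are hypothetical.
from typing import Callable

class AggregateDomain:
    def __init__(self) -> None:
        self.subscribers: list[Callable[[dict], None]] = []
        self.entities: dict[str, dict] = {}

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self.subscribers.append(callback)

    def apply_change(self, entity: dict) -> None:
        """Apply a change only if it is newer, then push it to subscribers."""
        current = self.entities.get(entity["id"])
        if current and current["version"] >= entity["version"]:
            return  # stale concurrent update: ignore it
        self.entities[entity["id"]] = entity
        for push in self.subscribers:  # push the change to source-aligned domains
            push(entity)

customer_360 = AggregateDomain()
customer_360.subscribe(lambda e: print("crm received", e["id"], "v", e["version"]))
customer_360.subscribe(lambda e: print("billing received", e["id"], "v", e["version"]))

customer_360.apply_change({"id": "C1", "version": 2, "email": "new@x.com"})
customer_360.apply_change({"id": "C1", "version": 1, "email": "old@x.com"})  # ignored as stale
```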

Transaction-style hubs and data meshes are poles apart; we recommend against simultaneously implementing both.

Analysts want query access

The basis for interoperable data products is providing access through APIs, ensuring developers in one domain can connect to other domains. Data product developers should provide both operational APIs and analytical APIs, the latter allowing consumer developers to, for instance, get all current domain records, filter records by criteria and get historical data over time. APIs are both good and necessary because they are technology agnostic; via encapsulation, they limit the cascading effects of internal data structure changes; and they promote security.
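As a concrete, hypothetical rendering of such an analytical API surface, here is a minimal Python sketch of the interface a consumer developer might code against. The method names are illustrative, not part of any data mesh standard.

```python
# Minimal sketch of an analytical API surface for a data product, expressed
# as a Python protocol. Method names are hypothetical.
from datetime import date
from typing import Iterable, Protocol

class AnalyticalAPI(Protocol):
    def all_records(self) -> Iterable[dict]:
        """Return the full current contents of the data product."""
        ...

    def filter_records(self, **criteria) -> Iterable[dict]:
        """Return records matching simple equality criteria."""
        ...

    def history(self, since: date) -> Iterable[dict]:
        """Return record versions over time, starting at a given date."""
        ...

def copy_into_new_product(source: AnalyticalAPI) -> list[dict]:
    # Without query access, a consumer must pull everything through the API
    # and move it into its own data product before analyzing it.
    return list(source.all_records())
```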

However, API-only access for analysis is adequate only for data science workloads or small datasets. BI-type analysts prefer ad hoc query access: Data providers can’t anticipate all the ways analysts will want to filter, slice and dice the data. Without direct query access, consumers must read the entire contents via the API and move them to another data product.

If data from a domain is needed for analysis, query access should be opened to consumers. Note that all business intelligence (BI) tools on the market require a database connection, and API-based sources outnumber the available connectors for BI tools by several orders of magnitude.

Dealing with the impact of changes

If query access is opened, data model changes internal to a data product will have a cascading effect on all consuming queries (consider, for example, de-normalizing a model). A way to control the impact of such changes is through the data lineage technology available in today’s data catalog products. These products gather metadata from BI tools, databases and data pipeline middleware, stitch it together and establish dependency chains of queries on tables or views, down to the column level. Such functionality is a welcome addition to a data platform. Of course, there’s another obvious reason to adopt data catalogs in the data platform: to support data product discovery.
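The following Python sketch illustrates the kind of column-level impact analysis such lineage metadata enables, assuming the catalog exposes lineage as simple “consumed by” edges. The asset and column names are hypothetical.

```python
# Minimal sketch of column-level impact analysis over lineage metadata.
# Edges map an upstream column or view to the assets that consume it.
lineage = {
    "orders.customer_id": ["v_sales_by_region", "rpt_weekly_sales"],
    "orders.region_code": ["v_sales_by_region"],
    "v_sales_by_region": ["rpt_weekly_sales", "dashboard_exec"],
}

def impacted(asset, seen=None):
    """Transitively collect everything that would break if `asset` changes."""
    seen = set() if seen is None else seen
    for downstream in lineage.get(asset, []):
        if downstream not in seen:
            seen.add(downstream)
            impacted(downstream, seen)
    return seen

print(sorted(impacted("orders.customer_id")))
# ['dashboard_exec', 'rpt_weekly_sales', 'v_sales_by_region']
```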

Supporting cross-business process analysis

We turn now to analysis over multiple data products, which is how the original quest for data integration is formulated on a data mesh. Our example above — the impact of product defects on sales and customer satisfaction — requires identifying common data elements that have significance across the data products supporting the sales and customer support business processes. These include region codes, time calendars and product hierarchies.


Supporting data interoperability implies the ability to relate KPIs sliced on, say, the same region and product category. This requires agreement on the values of these common data elements (the equivalent of Kimball’s conformed dimensions). Authoring, tracking changes to, and distributing reference data values to data products must be governed: This capability, akin to, and supported by, MDM systems, is a candidate to include in a data mesh platform.
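As a minimal illustration, the following Python sketch distributes a governed, versioned set of region codes and flags records that do not conform to it. The codes, version number and records are hypothetical.

```python
# Minimal sketch of governed reference data: the platform distributes a
# versioned value set that every data product must conform to. The codes
# and version are hypothetical.
REFERENCE_DATA = {
    "region_codes": {"version": 7, "values": {"EMEA", "NA", "APAC", "LATAM"}},
}

def non_conforming(records: list[dict], element: str, column: str) -> list[dict]:
    """Return records whose column value is not in the governed value set."""
    allowed = REFERENCE_DATA[element]["values"]
    return [r for r in records if r.get(column) not in allowed]

sales_rows = [{"region": "EMEA", "revenue": 100}, {"region": "Europe", "revenue": 40}]
print(non_conforming(sales_rows, "region_codes", "region"))
# [{'region': 'Europe', 'revenue': 40}] -- 'Europe' must be recoded to 'EMEA'
# before KPIs from sales and support products can be sliced consistently.
```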

In a decentralized data mesh, domain teams have the autonomy to store their data products in the systems of their choice. We mentioned above that the use of virtualized systems has been limited, but recent open source query engines — such as Trino, commercialized by Starburst — have made significant progress toward becoming agnostic to storage technology while scalably processing data with parallel processing techniques.
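For illustration, here is a minimal sketch of a cross-domain query through such an engine, using the Trino Python client. The host, catalogs and table names are hypothetical; the point is that one SQL statement can join data products stored in different systems.

```python
# Minimal sketch of a federated cross-domain query, assuming a Trino cluster
# with two hypothetical catalogs: "sales" on a warehouse and "support" on a
# data lake. Host, catalog, schema and table names are illustrative.
from trino.dbapi import connect  # Trino Python client (pip install trino)

conn = connect(host="trino.mesh.example", port=8080, user="analyst",
               catalog="sales", schema="public")
cur = conn.cursor()
cur.execute("""
    SELECT s.region_code,
           s.product_category,
           SUM(s.revenue)    AS revenue,
           AVG(t.csat_score) AS avg_csat
    FROM sales.public.orders s
    JOIN support.public.tickets t
      ON s.region_code = t.region_code
     AND s.product_category = t.product_category
    GROUP BY s.region_code, s.product_category
""")
for row in cur.fetchall():
    print(row)
```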

Some organizations can be less liberal and have teams store their data in assigned areas of the same underlying data engine while remaining faithful to data mesh’s decentralized ownership principles, as discussed here. In this case, query engines have less data to move during processing, so better performance can be expected.

Concluding note

Data mesh has a good chance of succeeding because its principles are strong and sound, and because enabling technology has matured over the years to solve many related data management problems. However, for data mesh to fully establish itself, there needs to be evidence that its projects can overcome the data integration problem. A data mesh with a data platform layer containing data management capabilities like the ones described here would be a good place to start.
