
WHITE PAPER

Data Mesh

By David Jerrim, Senior Director, Teradata, and Martin Willcox, Vice President of Technology, Teradata

Table of Contents

2 Introduction
3 Aligning data products with real-world business requirements
4 Understanding the characteristics of great data products
4 Infrastructure is not the real challenge
5 Six features of successful approaches to Data Mesh implementation
10 Architectural considerations and deployment best-practices
13 Conclusions
14 About Teradata

Introduction

There is clear value in knowledge workers being able to access, combine and share data from across the enterprise - but data integration can be complex, time-consuming and requires that stakeholders from across the organization come together. The Data Mesh concept proposed by Dehghani1 advocates a divide-and-conquer approach to the delivery of data and analytic products by aligning with domain-driven design principles and patterns.

We are enthusiastic about the Data Mesh concept because we observe that there are essentially three strategies for successfully delivering data and analytic products faster in large and complex organizations:

1. Intelligent use of decomposition, so that large and complex products can be created in parallel by multiple, loosely coupled development groups;
2. The use of agile, business-led development methods
and teams to eliminate unnecessary work;
3. The automation of as much as possible of the
necessary development and testing work that
remains.
These approaches are not mutually exclusive – and by
employing all three strategies, organizations with mature
“DataOps” processes and toolsets can build complex
data products in 4–6-week sprints. This is true even
when those data products require the acquisition and
integration of multiple datasets.

Whilst dividing large problem spaces into smaller, more tractable components is fundamental to success, the unthinking application of "divide and conquer" approaches to data product development is creating a new generation of overlapping, redundant data silos in some organizations.

This risks creating more technical debt precisely when organizations seek the rapid rollout of digital transformation programmes and increased agility.
1 https://martinfowler.com/articles/data-monolith-to-mesh.html


It is also the case that despite several decades of intensive academic research, distributed query optimization is still relatively complex and unproven at scale – and that improvements in the performance of multi-core CPUs continue to outpace increases in performance of network and storage sub-systems.

Whilst development of usable and useful data products is invariably business-led, understanding and respecting these engineering fundamentals when architecting and designing domains remains critical to success.

Finally, although "Data is the new oil" has become a cliché, when Clive Humby coined the phrase in 2006 he was also pointing out that raw data – the crude oil in the analogy – must be refined before high-value data products are created. Raw, un-sessionized weblogs are, by themselves, neither terribly interesting nor remotely comprehensible to most business users. However, when the raw data have been refined through the removal of the web-bot traffic and the identification of user sessions, the resulting data are a powerful predictor of customer intent – let's call this 'diesel fuel.' When the sessionized web data are combined with interaction data across channels and re-socialized, we have even more powerful predictors – let's call these 'gasoline.' And when these behavioral data are combined with customer, transaction history and demographic data, yet more powerful predictors – 'kerosene' or 'jet fuel' – can be created.

Successful data-driven organizations ensure that data products like these can be discovered and connected so that the jet-fuel that powers their digital transformation initiatives can be created quickly and efficiently, and so that complex value chains can be optimized.

As large enterprises operating across multiple geographies continue to embrace cloud deployment models and multiple service providers, we believe that what we term "the connected data warehouse" model will be fundamental to successful Data Mesh implementation. Co-location of multiple schemas aligned to specific business domains within a single, scalable database instance provides a natural platform for at-scale Data Mesh deployment, with lightweight governance processes providing interoperability.

The remainder of this paper outlines practical steps to deploying the Data Mesh concept as an effective foundation for enterprise analytics.

Aligning data products with real-world business requirements

The development of data and analytic products is inherently complex, for at least three reasons:

1. Data products often require that data is reused for purposes that were not foreseen when the processes that generate it were created – necessitating complex data transformation, cleansing and integration;
2. Requirements are often ambiguous and incompletely defined at the start of the project – and are frequently fluid thereafter;
3. Integrating analytic insights into business processes demands that complex trade-offs are revealed, understood and assessed.

For example, we may be able to improve the predictive accuracy of a fraud detection model by training it on a larger set of features, but at the cost of increased run-times when the model is scored in production. An increase in decision latency from 150ms to 200ms might be an acceptable price to pay for a 20% increase in the lift of a fraud detection model – or it might not. But we are unlikely to know at the start of the project whether an improvement is possible or not – and even less likely to be able to quantify the response time "cost" or the lift "benefit" so that one can be weighed against the other.


Agile, incremental approaches to the development of data and analytic platforms and products have proven to be extremely successful in addressing these issues. Embedding business stakeholders in the development process ensures what gets built is what is needed to change the business. Not only is a "minimum viable product" lens critical to avoiding unnecessary development work, but it also reduces testing and maintenance effort. Time-to-market is reduced, and the accumulation of technical debt is slowed.

It is also the case that data products come in a variety of shapes and sizes, ranging from lightly curated source-oriented extract files to denormalized star schemas optimized for slice-and-dice reporting. Good data products address a specific requirement at a specific point in-time; great data products are re-usable and extensible. Successful organizations therefore anticipate that data products will need to be adapted as business requirements evolve, rather than remaining static over the course of their lifetime. Managing the data product development lifecycle effectively requires that data products are designed to support modularity and abstraction, and are packaged with appropriate metadata describing provenance and lineage. Our desire to create "minimum viable products" should not lead us to over-rotate on minimum requirements at the expense of medium-term viability.

Understanding the characteristics of great data products

The principal motivation for moving to Data Mesh based architectures is the desire to deliver high-quality data products more quickly and at-scale. Understanding the characteristics of those data products can help organizations to make intelligent technology choices. Large-scale, enterprise analytics typically share one or more of the following characteristics:

• Stateful processing of historical transaction, interaction and event data;
• Complex processing that requires multiple datasets to be combined in order that sophisticated measures can be derived;
• Repeated execution against data that are continuously changing, so that static caching strategies may have limited value;
• Embedded deployment in mission-critical business processes, so that eventual consistency models may be inappropriate.

Opportunistic technology vendors are rushing to jump on the Data Mesh bandwagon by claiming that their virtualization, federation, and even BI application technologies represent "magic middleware" that will enable data to be discovered, relationships to be inferred, and complex joins across distributed datasets to be executed at-scale.

These claims should be treated with extreme scepticism, especially for use-cases where complex processing and high levels of concurrency are concerned. In this regard, note that even at the low-end a typical data platform in a global 3,000 organization today often supports 50+ analytic applications and 1 billion queries per annum. Many commentators anticipate increases in query volumes of two orders of magnitude as Machine Learning becomes ubiquitous over the next decade.

Infrastructure is not the real challenge

We note also that some of the discussions about the Data Mesh on professional social media appear to suggest that rapid deployment of containerized infrastructure to support domain initiatives is the critical ingredient in a successful Data Mesh. For us, this misses the point. Provisioning infrastructure was rarely the "long pole in the tent" for the development of sophisticated data products – even when that meant procuring and installing physical infrastructure in the data centre. And as organizations migrate to the cloud, provisioning computing infrastructure is anyway becoming even simpler and (much) quicker.

Wrangling disparate data so that it can be reliably compared and combined remains the long pole in the tent when developing analytic data products – and one of us has argued elsewhere that this issue is becoming more critical as organizations seek to deploy Machine Learning more widely.


Meyer and Madrigal's2 recent experience in managing COVID data in the US is instructive. As they described in a recent essay in The Atlantic, the initial response of the Federal Government in the US to the COVID epidemic rested on the assumption that COVID infection data were fundamentally sound. In fact, they were not. Different states were collecting what looked like the same data according to different policies and processes – so that what appeared to be the 'same' data could not in fact be reliably compared. Models fed with bad data made bad predictions – and the result was bad public policy. You may not have to deal with 50 states – but you are fortunate indeed if all of the data from your manufacturing plants is created to the same standards and supplied on the same schedule, if you sell products in the same quantity and using the same identifier that you use to order them, etc. These are not problems that, by themselves, distributed architectures, Kubernetes clusters, and CI/CD development pipelines will solve because they are not technology problems in the first place.

Six features of successful approaches to Data Mesh

In practice, we observe six critical success factors for reducing time-to-market for the development of new data and analytic products whilst also preserving cross-domain interoperability.

1. Business-driven decomposition; or "subject areas versus domains"

One of the central concepts of domain-driven design that is often misunderstood by organizations pursuing distributed data architectures is "bounded context." Decomposition of a large and complex problem space into a collection of smaller models is not a new idea in science, technology or software engineering – and is central to domain-driven design (DDD). But good DDD also requires precise definition of explicit boundaries and interrelationships between domains. It is tempting to consider the decomposition of the building of complex data products on a subject-area-by-subject-area basis, for example 'event,' 'agreement' or 'product.' Whilst this approach can simplify data re-use, in many organizations it can imply lengthy negotiation and discussion in pursuit of a common understanding of data elements and products that in practice may be shared only infrequently. It often makes more sense to decompose the problem space into domains that are aligned with key business processes and to allow each domain to implement the subject areas applicable to its own activities. This is illustrated below in figure 1.

Figure 1: Data subject areas vs business domains. [The figure contrasts Subject Areas – a database concept that spans multiple domains, is data centric, and whose ELDM is overwhelming to the business – with Domains – a business unit concept, self-contained from a business point-of-view, business-area centric, and implemented as small-to-medium size schemas – using retail examples such as Demand, Promotions, Transactions, Pricing, Plan/Forecast, Customer, Product, Inventory, Logistics, Location, Store Operations, Finance, Call Center, Labor, Shipping, Distribution Center and HR, together with a small order-processing flow (order received, stock check, invoicing).]

2 https://www.theatlantic.com/science/archive/2021/03/americas-coronavirus-catastrophe-began-with-data/618287/


This model works well where each domain is defined with an explicit boundary and all users within that domain are working towards a common purpose and use a consistent business vocabulary. This can be further enhanced using global standards, for example, for the use of surrogation to obfuscate PII data in natural keys. Identification, definition, and sizing of domains is also a critical consideration. If domains are defined with too large a context, agility is sacrificed due to the number of products that must be built and maintained within the domain, and the number of people required to do so. Conversely, where domains are drawn with too narrow a focus, organizations find themselves forever creating additional cross-domain, enterprise teams that risk redundancy and duplication. For us, the "two pizza rule" of agile systems development remains a good guide; if the team building a data product cannot be fed with two extra-large pizzas, it may be too big – and further decomposition should be considered.

All of this implies some degree of management and co-ordination between different development teams. Lightweight governance processes ensure minimum levels of co-ordination across domains, providing bounded context. Published meta-data ensure that data products' high-value data elements, refined at significant cost to the organization, can be discovered and re-used in other contexts.

A simplified retail banking scenario of business-driven decomposition

Consider a simplified retail banking scenario. There are multiple attributes of a mortgage product that are of limited value outside of the mortgage domain. Loan-to-value ratios, the type of survey on which a property valuation was based, and the date of that valuation all represent important information to the mortgage function. But they have limited value in other domains from across the Bank. Decisions about how these data are captured, cleansed, modelled, transformed, managed, and exploited should therefore be delegated exclusively to the mortgage domain. By contrast, information about customer salaries and mortgage debt will be highly relevant in other domains including unsecured loans, credit card, and risk. Ensuring that these data can be shared and combined across domains is not only highly desirable, but probably essential to ensure regulatory compliance in most geographies. And if delinquency codes can be standardized across all the domains that extend credit to customers, then the task of understanding which customers have or are likely to default across multiple product lines will be greatly simplified.

2. Separate schemas by domain to provide agility

One of the primary advantages of embracing domain-driven design is agility: loosely coupled teams working more-or-less independently, and each focusing within their specific areas of business expertise, are able to deliver data products in parallel.

Our recommended approach to implementation of Data Mesh based architectures is to create separate schemas for each domain. Responsibility for data stewardship, data modeling, and population of the schema content is owned by experts with business knowledge about the specific domain under construction. This approach removes many of the bottlenecks associated with attempting to implement a single, centralized consolidation of all enterprise data into a single schema. The domain-oriented schemas provide a collection of data products aligned to areas of business focus within the enterprise. In our simplified retail bank scenario, for example, the mortgage domain may have a legitimate and urgent requirement to create a new data product to understand and measure the impact of the COVID-19 pandemic on the demand for larger suburban properties. At a minimum, this new data product will probably require the roll-up of mortgage product sales by a new and different geographical hierarchy from that used by the rest of the organization.

A domain-aligned development process and schema makes this possible without lengthy discussion and negotiation across the rest of the organization, so long as interoperability standards that also enable total sales of loan products to be rolled-up according to corporate reporting hierarchies exist and are respected.
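To make this idea concrete, the sketch below shows one way the separate-schemas-per-domain approach might look in practice. It is a minimal illustration only: SQLite's ATTACH mechanism stands in for the domain schemas that a single, scalable warehouse instance would provide, and the table and column names (loan, card_account, customer_id and so on) are invented for the example rather than taken from any reference model.

    # Illustrative only: SQLite's ATTACH stands in for domain schemas that would
    # normally live side-by-side in a single, scalable database instance.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("ATTACH DATABASE ':memory:' AS mortgage")      # owned by the mortgage domain team
    con.execute("ATTACH DATABASE ':memory:' AS credit_card")   # owned by the credit-card domain team

    # Each domain models and populates its own schema independently...
    con.execute("""
        CREATE TABLE mortgage.loan (
            account_id       INTEGER PRIMARY KEY,  -- conformed key, agreed as an interoperability standard
            customer_id      INTEGER NOT NULL,     -- conformed key shared across domains
            loan_to_value    REAL,                 -- domain-specific attribute
            valuation_dt     TEXT                  -- domain-specific attribute
        )""")
    con.execute("""
        CREATE TABLE credit_card.card_account (
            account_id       INTEGER PRIMARY KEY,
            customer_id      INTEGER NOT NULL,
            credit_limit_amt REAL
        )""")

    # ...and can extend its schema (for example, a new geographical roll-up for the
    # mortgage domain) without negotiating the change with any other domain.
    con.execute("ALTER TABLE mortgage.loan ADD COLUMN suburb_region_cd TEXT")

    # Because the schemas are co-located and share conformed customer keys, a
    # cross-domain query remains a single, optimizable SQL statement.
    rows = con.execute("""
        SELECT m.customer_id, m.loan_to_value, c.credit_limit_amt
        FROM mortgage.loan AS m
        JOIN credit_card.card_account AS c ON c.customer_id = m.customer_id
    """).fetchall()

The essential design point the sketch is meant to illustrate is that domain teams own their schemas and evolve them independently, while the conformed keys are the small, centrally agreed surface that keeps the schemas joinable.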


3. Integration across domains

Many business outcomes can be optimized in the context of a single domain. Many end-to-end business process optimization opportunities, however, require data to be combined across geographical, functional and domain boundaries and these analytic use-cases often drive disproportionately high business value. Organizations need to have an explicit strategy for enabling cross-domain sharing that includes:

• Understanding, defining and enforcing the minimum set of PK / FK relationships required to join and compare data across different domains reliably and accurately.
• Defining appropriate business, technical and operational meta-data that enables data and data products to be discovered and re-used.
• Appropriate Master Data Management that ensures critical attributes, frequently reused and shared across multiple domains, are consistently defined and updated.
• A role-based access control policy and framework that ensures data are accessed and shared appropriately, internally and externally.

4. Support for enterprise data products

Enterprise data products present a multi-domain view of aggregated data products or encapsulate a common enterprise standard. They support optimization of end-to-end business processes, such as customer lifetime value, demand-driven forecasting, and network planning. They are often cross-functional by design, typically require the aggregation of multiple sources of data – and often have value across multiple use-cases and applications. Consequently, they will be frequently reused across multiple domains, as illustrated in figure 2.

Many organizations continue to strive to deliver a 360-degree view of customers and operations to support cross-business activities. For a telecommunications provider this would include understanding a customer's recent network experience in terms of streaming behavior, coverage areas, mobility – but also their value to the organization in terms of product subscriptions, out of bundle charges and likelihood to churn. Factor in behavioral indicators regarding channel interaction, sentiment and changes to calling circles and you have a requirement to be able to build and manage a data product that is sourced from multiple domains.

Figure 2: A multi-domain view of aggregated data products. [The figure shows enterprise data products layered over domain-aligned schemas such as Demand/Deposit, Savings, Credit Cards and Mortgages.]
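A minimal sketch of how a multi-domain product of this kind might be assembled follows. It is illustrative only: the domain products, their metadata fields and the conformed customer_id key are assumptions made for the example, not part of any particular implementation.

    # Illustrative sketch only: each domain publishes a data product as data plus
    # metadata (owner, conformed keys, lineage); an enterprise product is then
    # assembled by joining on the shared customer key. Field names are assumptions.
    network_experience = {
        "metadata": {"owner": "network-domain", "keys": ["customer_id"],
                     "lineage": ["probe_events", "coverage_model"]},
        "rows": {1001: {"streaming_hours": 22.5, "coverage_score": 0.81}},
    }
    billing = {
        "metadata": {"owner": "billing-domain", "keys": ["customer_id"],
                     "lineage": ["invoices", "product_subscriptions"]},
        "rows": {1001: {"subscriptions": 3, "out_of_bundle_amt": 17.40}},
    }

    def build_customer_360(*products):
        """Join domain data products on the conformed customer_id key."""
        assert all("customer_id" in p["metadata"]["keys"] for p in products)
        customer_ids = set().union(*(p["rows"].keys() for p in products))
        view = {}
        for cid in customer_ids:
            record = {}
            for product in products:
                record.update(product["rows"].get(cid, {}))
            view[cid] = record
        return view

    customer_360 = build_customer_360(network_experience, billing)
    # {1001: {'streaming_hours': 22.5, 'coverage_score': 0.81, 'subscriptions': 3, ...}}

The published metadata is what makes the enterprise product practical to build: the conformed keys declare how the domain products can be joined, and the lineage records where each element came from.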


Re-use is therefore about much more than merely avoiding the creation of 99 different versions of a nearly identical report. Ultimately, it is about creating layered data architectures that enable high-value data elements to be discovered and re-used to efficiently create multiple data products that support multiple business processes. Interoperability across the domains requires the definition of consistent primary and foreign key relationships and global standards for data typing, naming conventions, and quality metrics.

Layered data architectures that enable highly refined, cross-domain data products to be built from the products created by "downstream" domains can introduce dependencies. But the alternative is that multiple domains are forced to acquire the same data and to create redundant pipelines and overlapping data products. This is not just expensive and error prone, it also creates significant technical debt that will act as a serious drag on innovation and digital transformation programmes in the near future. Effective re-use of data products created in different domains can slash the time-to-market for the development of these enterprise data products whilst also improving quality and consistency, which is why successful organizations place such a high premium on ensuring that data products can be discovered and re-used.

In theory – and assuming adequate PK / FK relationships have been defined – it is possible to implement a "union" operator across separate business domains to get an enterprise view of the data. In our experience, however, joining data across LAN and WAN segments with virtualization technologies seldom scales or performs well for complex workloads, with exponential degradation not uncommon as the number of federated systems increases. Instead, it will often be appropriate to create enterprise domains to support the realization of these enterprise data products with the active support and collaboration of domain product owners.

Robust governance and agile, incremental approaches to delivery can co-exist. Where they do, combined with the use of appropriate automation tools and "DataOps" processes, they often lead to an order of magnitude improvement in the time taken to acquire and integrate data and to deliver complex, enterprise data products.

5. Supertypes and subtypes

As we have already discussed, to deliver enterprise data products successfully and efficiently we need the domain teams concerned to be able to reliably combine and aggregate data across multiple domains. In the banking scenario described earlier it would be useful to have an enterprise schema to capture information about customers' accounts across all the products they have with the bank.

One approach to achieve this would be to create a "Supertype" account enterprise data product that is populated from across all of the domains. This data product contains attributes that are common across all the domains. Each domain then manages its own subtype account data product with additional attributes specific to the business domain data product. This approach drives a degree of consistency across domains, since primary keys to support the join back to the supertype table must be enforced, but also allows flexibility for business area domains to extend the subtype data product as required. This is illustrated in figure 3 below:

Figure 3: Supertypes and subtypes to allow both consistency and flexibility. [Enterprise domain – Account: account_id (PK), account_type_cd, open_dt, balance_amt, account_status_cd. Credit card domain – Account: account_id (PK), account_interest_amt, late_pay_fee_amt, reward_points_qty. Checking domain – Account: account_id (PK), check_limit_amt, overdraft_fee_amt, min_balance_amt.]
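The sketch below expresses the supertype / subtype pattern of figure 3 as code. The column names follow the figure; the data types, the use of SQLite and the sample values are assumptions made purely for illustration.

    # Illustrative sketch of the supertype/subtype pattern from figure 3 (SQLite is
    # used only for demonstration; data types and sample values are assumptions).
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")

    # Enterprise domain: the supertype holds attributes common to every account.
    con.execute("""
        CREATE TABLE account (
            account_id        INTEGER PRIMARY KEY,
            account_type_cd   TEXT,
            open_dt           TEXT,
            balance_amt       REAL,
            account_status_cd TEXT
        )""")

    # Each business domain manages its own subtype, keyed back to the supertype,
    # and is free to extend it with domain-specific attributes.
    con.execute("""
        CREATE TABLE credit_card_account (
            account_id           INTEGER PRIMARY KEY REFERENCES account(account_id),
            account_interest_amt REAL,
            late_pay_fee_amt     REAL,
            reward_points_qty    INTEGER
        )""")
    con.execute("""
        CREATE TABLE checking_account (
            account_id        INTEGER PRIMARY KEY REFERENCES account(account_id),
            check_limit_amt   REAL,
            overdraft_fee_amt REAL,
            min_balance_amt   REAL
        )""")

    con.execute("INSERT INTO account VALUES (1, 'CARD', '2021-01-15', 250.0, 'OPEN')")
    con.execute("INSERT INTO credit_card_account VALUES (1, 12.5, 0.0, 1800)")

    # The enforced primary/foreign keys make the join back to the supertype reliable.
    rows = con.execute("""
        SELECT a.account_id, a.balance_amt, c.reward_points_qty
        FROM account AS a
        JOIN credit_card_account AS c ON c.account_id = a.account_id
    """).fetchall()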


Options for the actual implementation of enterprise data products need careful consideration, taking into account, amongst other factors, frequency of access, location of data products, and performance expectations. An enterprise domain could provide a consolidated data product by processing each domain's individual account data products, either through virtualization or by copying data out of the domain. However, in the case of a database-hosted data product, a virtualized SQL UNION ALL across several data platforms may not perform as desired when taking into account factors such as network bandwidth. In this case, it may make more sense to have (denormalized) copies of the supertype information duplicated into a business domain's data product, especially if file-based access is relevant. This consideration is an exercise in physical design optimization and will need to be repeated for each potential enterprise data product.

Customer and prospect information embedded in each business domain almost always increases in value when promoted to an enterprise domain to facilitate a customer 360 view for enterprise marketing, risk, and other analytics. It is important that in our desire to embrace increased productivity in analytic teams we do not back-slide to the bad old days of the 80s and 90s, when siloed information systems prevented many B2C organizations from treating customers holistically.

6. Good governance – and the right standards

One of the hotly debated topics when adopting Data Mesh principles is how much – or how little – governance should be applied centrally, rather than delegated to individual domains.

Digital transformation and modern business initiatives are driving the need for more, not less, integration across domains. Providing the coherent, cross-functional view across operations required by modern businesses requires that data are not merely technically connected, but also that they are semantically linked. The consistency in implementation across domains required to make this happen does not just spontaneously emerge; rather, it requires a co-ordinated, business-driven approach to data governance. Furthermore, it is still the case that specialist expertise is required to successfully leverage data management tools and effectively implement cross-domain data governance. Each domain will need support to develop data quality processes, data structure design, data reconciliation, and other elements of the architecture in a way that not only works for the near-term application and analytics use cases, but also enables scalability and extensibility. Doing this well takes specialized training and more than a little experience.

There is, nevertheless, an important balance to be struck here. Strong governance and standards by themselves do not guarantee success. Precisely because the semantic alignment that underpins data integration is complex and time-consuming, it is important to be selective – both about which data are aligned and integrated and the extent to which they are modeled. Integration costs serious time and effort – and the game is not always worth the candle.

'Engineered' levels of modeling and integration should be deferred until there is a sophisticated understanding of which data will need to be frequently and reliably compared and/or joined with one another. Since this level of understanding exists only rarely when the first few MVP data products are being developed, organizations should take care to avoid over-investing in data modeling and data engineering during the early stages of a new programme or project by adopting a 'Light Integration' approach, like Teradata's LIMA framework.
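Lightweight, selectively applied governance of this kind also lends itself to automation. The sketch below is one purely illustrative example of such a check: it validates that a domain's published data-product metadata declares the agreed conformed keys and follows shared naming conventions. The specific rules, field names and column-suffix conventions are invented for the example and are not taken from any particular framework.

    # Purely illustrative: a lightweight, automated conformance check that a domain
    # data product declares the agreed conformed keys and follows naming standards.
    import re

    INTEROP_STANDARDS = {
        "required_keys": {"customer_id"},   # conformed keys agreed across domains
        "column_pattern": re.compile(r"^[a-z][a-z0-9_]*(_cd|_dt|_amt|_qty|_id)$"),
        "required_metadata": {"owner", "lineage", "refresh_schedule"},
    }

    def conformance_issues(product):
        """Return a list of human-readable issues; an empty list means the product passes."""
        issues = []
        missing_meta = INTEROP_STANDARDS["required_metadata"] - product["metadata"].keys()
        if missing_meta:
            issues.append(f"missing metadata: {sorted(missing_meta)}")
        missing_keys = INTEROP_STANDARDS["required_keys"] - set(product["keys"])
        if missing_keys:
            issues.append(f"missing conformed keys: {sorted(missing_keys)}")
        for col in product["columns"]:
            if not INTEROP_STANDARDS["column_pattern"].match(col):
                issues.append(f"column '{col}' breaks the naming convention")
        return issues

    mortgage_product = {
        "metadata": {"owner": "mortgage-domain", "lineage": ["loan_origination"],
                     "refresh_schedule": "daily"},
        "keys": ["customer_id", "account_id"],
        "columns": ["account_id", "customer_id", "loan_to_value_qty", "valuation_dt"],
    }
    print(conformance_issues(mortgage_product))   # [] -> conformant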


A recent data platform development programme at a large Asian Telco provides a simple example of the application of a lightweight, de-centralized governance model. Multiple development teams at the telco were able to work largely autonomously whilst ensuring that inter-domain relationships and dependencies were modeled and understood. Agile delivery teams building-out the data platform worked in parallel in domain-oriented teams. The teams managed cross-domain impacts via a weekly planning meeting. Each team would add Post-It notes on a shared planning board for each data subject area they were working with for that sprint. Where multiple teams were leveraging the same subject areas, the impacts and changes were discussed so that all teams were clear on the changes that were being made. There are many good tools available to those seeking to digitize this process, but this example highlights that collaborative development processes and practices that prioritize knowledge capture and sharing are the real keys to success. By contrast, in our experience far too many organizations lack even basic Wiki pages (or similar) to describe their data platforms, so that the vast majority of organizational knowledge about data products walks out the door at the end of each project and each contract.

Architectural considerations and deployment best-practices

Federating the development of complex data products does not automatically imply the federation of their deployment. In fact, a spectrum of deployment options is available to organizations deploying Data Mesh solutions.

Different strategies are associated with fundamentally different engineering trade-offs, so it is important that organizations frame these choices correctly and are intentional about their decisions.

In general terms, there are three different strategies for deploying schemas within a Data Mesh:

1. Co-location,
2. Connection,
3. Isolation.

These are not mutually exclusive, and many real-world implementations use a combination of these approaches.

Figure 4: Strategies for deploying schemas within a Data Mesh. [The figure contrasts isolated schemas, connected and co-located schemas, and connected and distributed schemas linked by a virtualization fabric such as QueryGrid.]
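As a rough illustration of the engineering difference between these strategies, the sketch below performs the same cross-domain join twice: once across two separate instances, where the join happens in the application tier and data must first be moved, and once inside a single co-located instance, where the join is a single SQL statement. SQLite stands in for the platforms, and the tables and values are invented for the example.

    # Illustrative contrast between connected/distributed and co-located deployments.
    # SQLite stands in for the platforms; tables and data are invented.
    import sqlite3

    # Connected / distributed: two separate instances; the join happens in the
    # application (or middleware) tier, so one side's data must be moved first.
    sales_db = sqlite3.connect(":memory:")
    inventory_db = sqlite3.connect(":memory:")
    sales_db.execute("CREATE TABLE sales (product_id INTEGER, qty INTEGER)")
    inventory_db.execute("CREATE TABLE inventory (product_id INTEGER, on_hand INTEGER)")
    sales_db.execute("INSERT INTO sales VALUES (1, 10), (2, 4)")
    inventory_db.execute("INSERT INTO inventory VALUES (1, 70), (2, 12)")

    on_hand = dict(inventory_db.execute("SELECT product_id, on_hand FROM inventory"))  # data movement
    app_tier_join = [
        (product_id, qty, on_hand.get(product_id))
        for product_id, qty in sales_db.execute("SELECT product_id, qty FROM sales")
    ]

    # Co-located: both schemas live in one instance, so the same join is a single
    # SQL statement that the database can optimize without bulk data movement.
    colocated = sqlite3.connect(":memory:")
    colocated.executescript("""
        CREATE TABLE sales (product_id INTEGER, qty INTEGER);
        CREATE TABLE inventory (product_id INTEGER, on_hand INTEGER);
        INSERT INTO sales VALUES (1, 10), (2, 4);
        INSERT INTO inventory VALUES (1, 70), (2, 12);
    """)
    in_db_join = colocated.execute("""
        SELECT s.product_id, s.qty, i.on_hand
        FROM sales AS s JOIN inventory AS i ON i.product_id = s.product_id
    """).fetchall()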


Co-located schemas

The co-located approach to deployment places domains aligned to different schemas under the management of a single database instance on a common technology platform. Domains are free to manage and evolve their schemas independently according to whatever policies and standards have been agreed centrally.

By itself, co-location does not guarantee interoperability – that must still be designed-in using the approaches outlined earlier in this paper. But deploying on a single platform can have very important performance and scalability advantages, especially for cross-domain workloads that otherwise require data to be assembled across multiple database instances or even multiple clouds. Specifically, co-location can eliminate large-scale data movement operations across relatively slow WAN, LAN, and storage networks and can permit improved query optimization. Even marginal improvements in query optimization can mean very significant improvements in query performance and concurrency.

Where data platforms are supporting billions of queries per annum, the cumulative impact of optimization is substantial improvement in query throughput and reduced total-cost-of-ownership.

Connected schemas

Whilst the physical co-location of schemas generally contributes to better performance and lower TCO when data across multiple domains is combined, there are many cases where it does not make sense to co-locate all schemas under a single database instance. For example, sovereignty laws may require that data created by a business unit within a specific country must remain in that country. For a multi-national company this means that there will be multiple schemas deployed across different geographies – even if the database technology used is the same. There are other reasons why schemas may not be optimally co-located under the umbrella of a single database instance. These include the data gravity created by applications producing data in different clouds, and the use of fit-for-purpose database technologies optimized for particular data types or workloads. Virtualization and federation technologies can still enable cross-domain analytics in the connected scenario. But by themselves these technologies do not guarantee interoperability any more than co-location does.

It is the reconciling and aligning of data so that they can be reliably combined and compared that makes data integration complex. These hard yards must be run whether the data are co-located and joined in a database management system or are physically partitioned across multiple platforms and joined in an application or middleware tier.

Isolated schemas

When using the isolated technique, the implication is that a data product is completely self-contained within a single domain. The schemas used with the isolated technique are usually narrow in scope and service operational reporting requirements rather than enterprise analytics. Isolated domains typically have more autonomy in their deployment – both in data modeling and technology selection. Sometimes isolated domains are proposed based on the need for extreme security (e.g., HR data). However, more often than not, the real reason has to do with politics and the desire for organizational independence. It is rare that truly useful data does not amplify its value when combined with other data, so even where isolation is desirable or necessary, consideration should be given to using agreed message formats and pub/sub frameworks and/or APIs to enable the exchange of critical data.

A simplified example

This simplified retail scenario illustrates these different approaches and choices. In a retail business, analyzing sales data enables understanding of product performance; analyzing order data enables understanding of supplier performance; and analyzing inventory data enables costs to be managed. But by putting detailed sales, order, and inventory data together and sharing it with partners and suppliers, Wal-Mart dominated grocery retail in the 90s by creating a demand-driven supply chain that simultaneously improved on-shelf availability, sales, and customer experience whilst also crushing costs.


Amazon similarly dominates retail today by combining purchase data with behavioral data to understand what customers want better than its competitors do. By enabling partners to leverage the platform it has created, it generates even more data about even more customers in a virtuous circle. Data integration is critical to both business models.

Product performance, supplier performance, and cost management domains can build out dedicated data products in parallel, but unless these domains engage in the kind of lightweight collaboration and governance described earlier in this document, development of one of the high-value analytic applications – demand forecasting – will be significantly more complex and more time consuming. Furthermore, because that application will require large volumes of product, sales, and order data to be routinely and frequently combined, co-locating these data products on the same platform to avoid unnecessary data movement – and so improve performance and scalability – is likely to be highly desirable.

Figure 5 shows a grossly simplified schematic representation of a Retail architecture, illustrating the concepts of co-location, connection and isolation in the development of data products. The Sales, Orders, and Inventory products are domain-oriented and developed in parallel, but domain interrelationships are defined and the products are co-located on the same platform so that data can be combined to create a scalable and high-performance Demand Forecast data product.

The Customer Experience data products are also domain-oriented and are also built-out in parallel, but on a separate platform that enables these data products to be run-time connected to improve both the Customer and the Demand Forecast data products; whilst integration is deferred until run-time, it still requires interrelationships to be defined and modeled. A conscious decision is taken to de-couple the Activity Based Costing data product from the Product Performance data product, at the expense of potential inconsistency in Sales reporting and increased technical debt.

Figure 5: A simplified architecture illustrating co-location, connection, and isolation. [A data virtualization layer spans a Supply Chain Platform (a Demand Forecast product over co-located Sales, Orders and Inventory products), a Customer Experience Platform (a Behavioral product over Customer and Clickstream data), and a Finance Analytics Platform (an Activity Based Costing product over Sales, Finance and HR data); an object lake and an enterprise message queue / log underpin the platforms.]


Additional domains and workstreams can build out the customer experience analytic products, also in parallel. But it is hard to imagine a useful customer analytics product that does not involve combining customer demographic data with sales data. The Demand Forecast product will be substantially improved if it can also leverage behavioral data from online channels – a leading, rather than a lagging, indicator.

For the purposes of our stylized example, let us assume that these data will need to be combined less frequently, so that co-location is less important. However, even if this were the case, it is much harder to argue that the products created by these domains should not be linked at all. At a minimum, we should conclude that these domains need to be connected. As we have already discussed, connecting these data products assumes that the domains concerned have defined and implemented the necessary interrelationships and that a scalable, high-performance virtualization layer – like Teradata's QueryGrid framework – has been deployed to enable them to be run-time connected.

For the purposes of our simplified example, let us assume that another set of domain-oriented teams are tasked to build-out finance and HR data and analytic products, including an Activity Based Costing (ABC) data product. The ABC data product will also need to leverage sales data. If the ABC domain acquire this sales data from the product performance data product, they become a consumer of that product – and a dependency is introduced into the architecture. If they acquire that data themselves then they eliminate that dependency at the expense of additional technical debt – because their redundant sales data product and pipeline will need to be developed and maintained – and they also risk introducing inconsistency, so that sales numbers can no longer be reliably compared across the organization. Neither of these choices is automatically right or wrong, but clearly organizations should be intentional about them to avoid inadvertently deploying hard-to-change and hard-to-maintain 'accidental' architectures.

Conclusions

We are enthusiastic about the Data Mesh concept because it places intelligent decomposition front-and-centre in the rapid development of complex data products and platforms. Our own experience leads us to conclude that when smart decomposition is combined with agile, incremental development methods and appropriate use of DevOps processes and automation tools, time-to-market for the development of complex data products can in some cases be reduced from months to weeks.

The single most critical success factor for the development of data products remains alignment with funded business initiatives. Agile, incremental, business-led approaches to data and analytic product development matter because they help organizations to focus on delivering high-quality data products that solve a real-world business problem. They therefore avoid unnecessary development and testing work. That adds up to less cost, reduced maintenance overhead, and greater business benefit.

The ultimate "eliminate unnecessary work" play is to extend and reuse existing data products. Designing data products for reuse, creating discoverable data services to enable those data products to be accessed, and ensuring that useful, usable, and searchable "crowd-sourced" business meta-data are available to end-users are all critical to avoiding the constant re-invention of similar, overlapping data products.

Data love data and are frequently exponentially more valuable when they are aligned and combined across domains. Lightweight governance models can provide for the interoperability that is required to optimize end-to-end business processes. Done right, this amounts to another reuse strategy, because it ensures that existing domain data products are leveraged in the creation of aggregate data products. This is a faster and cheaper approach to creating these data products. It is better too, since it reduces the data duplication that can otherwise drive inconsistency, complexity and increased technical debt.


Whilst we believe that development of data products should be federated along domain lines by default, we encourage organizations to proceed more cautiously before federating the deployment of those data products.

In general, both the co-location and connection patterns provide for better performance, scalability, and TCO than does the isolation pattern, with the co-located pattern scaling and performing best of all in the vast majority of circumstances. All three of the deployment strategies have a place – and most large and complex organizations will find that they need to deploy all of them in different parts of the business. Doing so intentionally, and with a clear-eyed understanding of their strengths and weaknesses, is key to avoiding "accidental architectures" that inhibit, rather than enable, change.

Most large enterprises already operate across multiple geographies – and are increasingly leveraging multiple cloud service providers. That makes the connected data warehouse fundamental to at-scale Data Mesh implementation. Within a Cloud Service Provider and within a geography, co-location of multiple schemas aligned to specific business domains within a single, scalable database instance gives the best of two worlds: agility in implementation and high-performance in execution.

About Teradata

Teradata is the connected multi-cloud data platform company. Our enterprise analytics solve business challenges from start to scale. Only Teradata gives you the flexibility to handle the massive and mixed data workloads of the future, today. The Teradata Vantage architecture is cloud native, delivered as-a-service, and built on an open ecosystem. These design features make Vantage the ideal platform to optimize price performance in a multi-cloud environment. Learn more at Teradata.com.

About the Authors

By David Jerrim, Senior Director, Teradata, and Martin Willcox, Vice President of Technology, Teradata.

The authors would like to acknowledge the contributions of Jean-Marc Bonnet, Jim Bougatsias, Stephen Brobst, Kevin Lewis and Nathan Green. With grateful thanks to Stephen Brobst for providing the illustrations.

17095 Via Del Campo, San Diego, CA 92127     Teradata.com

The Teradata logo is a trademark, and Teradata is a registered trademark of Teradata Corporation and/or its affiliates in the U.S. and worldwide. Teradata continually
improves products as new technologies and components become available. Teradata, therefore, reserves the right to change specifications without prior notice. All features,
functions and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or Teradata.com for more information.

© 2021 Teradata Corporation    All Rights Reserved.    Produced in U.S.A.    08.21
