
Understanding Data Contracts

Julekha Khatun
Affiliation not available

June 07, 2024


Posted on 7 Jun 2024 — CC-BY 4.0 — https://doi.org/10.36227/techrxiv.171779368.80821952/v1 — e-Prints posted on TechRxiv are preliminary reports that are not peer reviewed. They should not b...

Abstract
In today’s data-driven world, seamless data exchange between different systems, teams, and organizations is crucial for
operational efficiency and informed decision-making. Data contracts have emerged as a vital tool to ensure that data is exchanged
reliably, consistently, and with high quality. This article thoroughly examines data contracts - formal agreements that specify
how data is structured, formatted, and shared between different systems or parties. Data contracts define the schema, semantics,
quality, and terms of use for data exchange, ensuring a common understanding between data providers and consumers.

Figure 1: Overall Data flow

So, let’s understand what a Data Contract is


A data contract is a formal agreement that specifies how data are structured, formatted, and shared among
different systems or parties. It defines the schema, semantics, quality, and terms of use for data exchange,
ensuring that all parties understand the shared data. Data contracts are essential for maintaining data
consistency, interoperability, and reliability across diverse systems and organizations.

To implement data contracts with access control and data quality checks, you can use a YAML file to define
the schema, constraints, and access rules. Below is a sample YAML file that demonstrates how to set up a
data contract with these features:

```yaml
table_name: customer_bookings
version: 1.1
owner: jack dawson
schema:
  - column_name: tx_date
    type: timestamp
    constraints:
      not_null: true
      no_future_dates: true
  - column_name: customer_email
    type: string
    constraints:
      not_null: true
      check_pii: true
  - column_name: sales_amt
    type: decimal
    constraints:
      not_negative: true
  - column_name: revenue_amt
    type: decimal
    constraints:
      not_negative: true
  - column_name: booking_type
    type: string
access_control:
  roles:
    - role: data_analyst
      permissions:
        - read
    - role: data_engineer
      permissions:
        - read
        - write
    - role: data_scientist
      permissions:
        - read
        - analyze
data_quality_checks:
  - check: duplicate_check
    description: Ensure no duplicate records
  - check: freshness
    description: Table to be updated every 30 mins
  - check: email_format
    description: Ensure email format is correct
  - check: pii_masking
    description: Mask customer address and email
```

Explanation

1. Table Metadata:
- `table_name`: The name of the table.
- `version`: The version of the data contract.
- `owner`: The owner of the data contract.

2. Schema Definition:
- `column_name`: The name of the column.
- `type`: The data type of the column.
- `constraints`: Constraints applied to the column, such as `not_null`, `no_future_dates`, `check_pii`, and
`not_negative`.

3. Access Control:
- `roles`: Defines different roles and their permissions.
- `role`: The name of the role.
- `permissions`: The permissions granted to the role, such as `read`, `write`, and `analyze`.

4. Data Quality Checks:
- `check`: The type of data quality check.
- `description`: A brief description of the check.
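
To make the column constraints concrete, the following is a minimal Python sketch of how they might be evaluated against a single row. It is an illustration under stated assumptions, not part of the contract format itself: the `SCHEMA` list is a hand-written stand-in for the parsed YAML, the key names are written with underscores, and `check_row` is a hypothetical helper. The `check_pii` rule is listed but not evaluated here.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical in-memory form of the contract's schema section (assumption:
# in practice this would be loaded from the YAML file with a YAML parser).
SCHEMA = [
    {"column_name": "tx_date", "constraints": {"not_null": True, "no_future_dates": True}},
    {"column_name": "customer_email", "constraints": {"not_null": True, "check_pii": True}},
    {"column_name": "sales_amt", "constraints": {"not_negative": True}},
    {"column_name": "revenue_amt", "constraints": {"not_negative": True}},
    {"column_name": "booking_type", "constraints": {}},
]


def check_row(row, schema=SCHEMA, now=None):
    """Return a list of human-readable constraint violations for one row."""
    now = now or datetime.now(timezone.utc)
    violations = []
    for column in schema:
        name = column["column_name"]
        rules = column.get("constraints", {})
        value = row.get(name)
        if value is None:
            if rules.get("not_null"):
                violations.append(f"{name}: must not be null")
            continue  # the remaining rules only apply to values that are present
        if rules.get("no_future_dates") and value > now:
            violations.append(f"{name}: must not be in the future")
        if rules.get("not_negative") and value < 0:
            violations.append(f"{name}: must not be negative")
    return violations


row = {
    "tx_date": datetime(2024, 6, 1, tzinfo=timezone.utc),
    "customer_email": None,
    "sales_amt": Decimal("-5.00"),
    "revenue_amt": Decimal("12.50"),
    "booking_type": "online",
}
print(check_row(row))
```

Returning a list of violations rather than raising on the first one lets a caller report every problem in a row at once.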

Let’s look at the Implementation Steps


1. Define the Data Contract: Create a YAML file with the schema, access control, and data quality checks
as shown above.
2. Enforce the Data Contract: Use a CI/CD pipeline to enforce the data contract. For example, you can use
GitHub Actions to validate the data against the data contract before merging any changes.
3. Monitor and Maintain: Continuously monitor the data quality and access control to ensure compliance
with the data contract. Use tools like dbt, Great Expectations, or custom scripts to automate these checks.

By following these steps and using the provided YAML template, you can implement data contracts that
ensure data quality and proper access control within your data architecture.
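
As one possible shape for the "custom scripts" option, here is a hedged Python sketch of three of the data quality checks (duplicates, freshness, and email format). Everything in it is an assumption made for illustration: the function names, the choice of `tx_date` plus `customer_email` as the duplicate key, and the deliberately simple email pattern. It does not use dbt or Great Expectations.

```python
import re
from datetime import datetime, timedelta, timezone

# Deliberately simple pattern (assumption): one "@", no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_duplicates(rows, key_columns):
    """Return the set of key tuples that occur more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


def is_fresh(last_updated, now, max_age=timedelta(minutes=30)):
    """True if the table was updated within the allowed window."""
    return now - last_updated <= max_age


def invalid_emails(rows, column="customer_email"):
    """Return the email values that do not match the simple pattern."""
    return [row[column] for row in rows if not EMAIL_PATTERN.match(row[column])]


def run_checks(rows, last_updated, now):
    """Return a mapping of check name to pass/fail."""
    return {
        "duplicate_check": not find_duplicates(rows, ["tx_date", "customer_email"]),
        "freshness": is_fresh(last_updated, now),
        "email_format": not invalid_emails(rows),
    }


now = datetime(2024, 6, 7, 12, 0, tzinfo=timezone.utc)
rows = [
    {"tx_date": "2024-06-07", "customer_email": "a@example.com"},
    {"tx_date": "2024-06-07", "customer_email": "not-an-email"},
]
print(run_checks(rows, last_updated=now - timedelta(minutes=10), now=now))
```

A CI job could run such a script and exit with a non-zero status when any value in the result is `False`, which would block the merge.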
Posted on 7 Jun 2024 — CC-BY 4.0 — https://doi.org/10.36227/techrxiv.171779368.80821952/v1 — e-Prints posted on TechRxiv are preliminary reports that are not peer reviewed. They should not b...

Figure 2: How Data Contracts Work

Let’s review some real-life examples of Data Contracts

Example 1: An E-commerce Platform


Consider an e-commerce platform that integrates multiple third-party vendors for inventory management,
payment processing, and shipping. A data contract between the e-commerce platform and the payment processor
specifies the structure and format of transaction data, including fields such as transaction ID, amount,
currency, and timestamp. This contract ensures that both parties understand the data being exchanged
and can process it accurately.
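
A rough sketch of what the consuming side of such a contract could look like in Python is shown below. The field names follow the four fields mentioned above; the `Transaction` class, the `parse_transaction` helper, the three-letter uppercase currency rule, and the ISO 8601 timestamp format are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("transaction_id", "amount", "currency", "timestamp")


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    amount: Decimal
    currency: str
    timestamp: datetime


def parse_transaction(payload):
    """Build a Transaction from a dict, raising ValueError on a contract violation."""
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    try:
        amount = Decimal(str(payload["amount"]))
    except InvalidOperation as exc:
        raise ValueError("amount is not a decimal number") from exc
    currency = str(payload["currency"])
    # Assumed rule: a three-letter uppercase currency code such as "EUR".
    if len(currency) != 3 or not currency.isalpha() or not currency.isupper():
        raise ValueError("currency must be a three-letter uppercase code")
    # datetime.fromisoformat raises ValueError for malformed timestamps.
    timestamp = datetime.fromisoformat(payload["timestamp"])
    return Transaction(str(payload["transaction_id"]), amount, currency, timestamp)


tx = parse_transaction({
    "transaction_id": "T-1001",
    "amount": "19.99",
    "currency": "EUR",
    "timestamp": "2024-06-07T12:00:00+00:00",
})
print(tx)
```

Rejecting a malformed payload at the boundary keeps the error close to its source instead of letting it surface later in downstream processing.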

Example 2: A Healthcare Data Exchange

Data contracts facilitate patient information exchange among hospitals, laboratories, and insurance
companies in the healthcare industry. A data contract may define the structure of patient records, including
fields for patient ID, name, diagnosis, treatment, and insurance details. This ensures that all parties can
interpret and use the data correctly, which improves patient care and reduces administrative errors.

Now, let’s understand some of the challenges in Implementing Data Contracts


One of the most common challenges in implementing data contracts is resistance to change. Teams may
be accustomed to existing processes and reluctant to adopt new standards and practices. Such resistance
can hinder the successful implementation of data contracts.

Effective data contracts require collaboration and trust between teams and organizations. A lack of trust
and poor communication can lead to misunderstandings and conflicts, complicating the establishment and
enforcement of data contracts.

Data requirements often evolve over time, necessitating updates to the data contracts. Managing these
changes while ensuring backward compatibility and minimizing disruptions can be challenging.

So, how do we overcome these challenges? Let us have a look:


To overcome resistance to change, all relevant stakeholders should be involved in the design of the data
contracts. This collaborative design approach ensures that the contract meets everyone’s expectations and
reduces misunderstanding. Regular workshops and discussions can help build consensus and foster a
collaborative culture.

Clear documentation of data contracts is essential for effective communication. Provide examples to
illustrate the various contract components and use standardized terminology. Regular updates and reviews of
contracts can help maintain trust and ensure that all parties are on the same page.

Implement a versioning strategy to manage changes in data requirements. Each version of the data
contract should detail the modifications made, to ensure backward compatibility and a smooth transition for
stakeholders. This approach allows flexibility while maintaining the integrity of the data exchange process.
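
One way to support such a versioning strategy is to compare the schema of a new contract version with the previous one before it is released. The sketch below is a minimal illustration in Python. It assumes schemas are reduced to `{column_name: type}` mappings and treats removed columns and changed types as breaking, while added columns are treated as compatible. That rule, the `breaking_changes` helper, and the `booking_channel` column are assumptions, not something prescribed by the contract format.

```python
def breaking_changes(old_schema, new_schema):
    """Compare two schema versions given as {column_name: type} mappings.

    Returns a list of changes that would break existing consumers.
    """
    problems = []
    for name, old_type in old_schema.items():
        if name not in new_schema:
            problems.append(f"column removed: {name}")
        elif new_schema[name] != old_type:
            problems.append(f"type changed: {name} ({old_type} -> {new_schema[name]})")
    return problems


v1_1 = {"tx_date": "timestamp", "customer_email": "string", "sales_amt": "decimal"}
v1_2 = {**v1_1, "booking_channel": "string"}            # additive change only
v2_0 = {"tx_date": "date", "customer_email": "string"}  # type change and removal

print(breaking_changes(v1_1, v1_2))  # []
print(breaking_changes(v1_1, v2_0))
```

An empty result suggests the change could ship as a minor version, while a non-empty result signals that consumers need a migration path and the contract likely needs a major version bump.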

Figure 3: How we can solve Data Contracts challenges

Now that we have understood the challenges and how to overcome them, let’s discuss how Data Contracts Help with Data Quality

Data contracts play a crucial role in ensuring data quality by formalizing the expectations and requirements
for data exchange. They help in several ways:

By Defining Data Structure and Format - Data contracts specify the schema and format of the data, ensuring
that data are consistently structured and formatted across different systems. This reduces the risk of data
errors and discrepancies.

By Setting Quality Standards - Data contracts include data quality standards and validation rules, ensuring
that incoming data meet the predefined criteria. This helps to detect and address data quality issues early
in the data processing pipeline.

By Facilitating Data Governance - Data contracts promote effective data governance by defining the roles,
responsibilities, and accountability for data quality. This ensures that data are appropriately managed across
the organization.

By Enhancing Collaboration and Communication - By providing a common understanding of data expectations,
data contracts improve communication and cooperation between data producers and consumers. This
leads to more effective data analysis and decision making.
Keywords: Data, DataContracts, DataQuality, DataGovernance, AI

Figure 4: how data contracts look

Conclusion

Data contracts ensure reliable, consistent, and high-quality data exchanges in today’s interconnected world.
Data contracts provide a common understanding between data providers and consumers by defining data
structure, format, semantics, and quality. Although challenges such as resistance to change, lack of trust, and
evolving data requirements exist, they can be overcome through collaborative design, clear documentation,
and versioning strategies.

By implementing data contracts effectively, organizations can improve data governance, build trust between
teams, and ensure that data are used efficiently and accurately, ultimately driving better business outcomes.

