Understanding Data Contracts
Julekha Khatun
Affiliation not available
Abstract
In today’s data-driven world, seamless data exchange between different systems, teams, and organizations is crucial for operational
efficiency and informed decision-making. Data contracts have emerged as a vital tool to ensure that data is exchanged
reliably, consistently, and with high quality. This article thoroughly examines data contracts - formal agreements that specify
how data is structured, formatted, and shared between different systems or parties. Data contracts define the schema, semantics,
quality, and terms of use for data exchange, ensuring a common understanding between data providers and consumers.
```yaml
table_name: customer_bookings
version: 1.1
owner: jack dawson
schema:
  - column_name: tx_date
    type: timestamp
    constraints:
      not_null: true
      no_future_dates: true
  - column_name: customer_email
    type: string
    constraints:
      not_null: true
      check_pii: true
```
Posted on 7 Jun 2024 — CC-BY 4.0 — https://doi.org/10.36227/techrxiv.171779368.80821952/v1 — e-Prints posted on TechRxiv are preliminary reports that are not peer reviewed. They should not b...
```yaml
    description: Ensure no duplicate records
  - check: freshness
    description: Table to be updated every 30 mins
  - check: email_format
    description: Ensure email format is correct
  - check: pii_masking
    description: Mask customer address and email
```
Explanation
1. Table Metadata:
- `table_name`: The name of the table.
- `version`: The version of the data contract.
- `owner`: The owner of the data contract.
2. Schema Definition:
- `column_name`: The name of the column.
- `type`: The data type of the column.
- `constraints`: Constraints applied to the column, such as `not_null`, `no_future_dates`, `check_pii`, and `not_negative`.
3. Access Control:
- `roles`: Defines different roles and their permissions.
- `role`: The name of the role.
- `permissions`: The permissions granted to the role, such as `read`, `write`, and `analyze`.
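
The access-control fields described above could be enforced with a small lookup. The following is a minimal sketch, assuming the contract has already been loaded into a Python dictionary with snake_case keys; the role names `data_engineer` and `data_analyst` and the helper `is_allowed` are hypothetical and do not come from the template.

```python
# Hypothetical roles for illustration; the template's own role entries
# are not reproduced here.
ACCESS_CONTROL = {
    "roles": [
        {"role": "data_engineer", "permissions": ["read", "write"]},
        {"role": "data_analyst", "permissions": ["read", "analyze"]},
    ]
}


def is_allowed(contract: dict, role_name: str, action: str) -> bool:
    """Return True if the named role is granted the requested permission."""
    for entry in contract.get("roles", []):
        if entry["role"] == role_name:
            return action in entry["permissions"]
    # Roles that the contract does not list get no access.
    return False


print(is_allowed(ACCESS_CONTROL, "data_analyst", "read"))   # True
print(is_allowed(ACCESS_CONTROL, "data_analyst", "write"))  # False
```

Denying unknown roles by default is a design choice of this sketch, not something the contract format itself prescribes.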
By following these steps and using the provided YAML template, you can implement data contracts that
ensure data quality and proper access control within your data architecture.
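
The quality checks named in the YAML template (duplicates, freshness, email format, and PII masking) might be implemented roughly as follows. This is only a sketch under stated assumptions: rows are plain dictionaries, the duplicate key is the pair of `tx_date` and `customer_email`, the email pattern is deliberately simple, and all function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified pattern (assumption): something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_duplicates(rows, key_fields):
    """Return rows whose key was already seen earlier in the batch."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            duplicates.append(row)
        seen.add(key)
    return duplicates


def is_fresh(last_updated, now, max_age=timedelta(minutes=30)):
    """True if the table was updated within the allowed window (30 mins)."""
    return now - last_updated <= max_age


def has_valid_email(row):
    """True if customer_email matches the simplified pattern."""
    return bool(EMAIL_PATTERN.match(row["customer_email"]))


def mask_email(email):
    """Keep the first character and the domain, mask the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


rows = [
    {"tx_date": "2024-06-01T10:00:00", "customer_email": "rose@example.com"},
    {"tx_date": "2024-06-01T10:00:00", "customer_email": "rose@example.com"},
    {"tx_date": "2024-06-01T11:00:00", "customer_email": "not-an-email"},
]
print(len(find_duplicates(rows, ("tx_date", "customer_email"))))  # 1
print([has_valid_email(r) for r in rows])  # [True, True, False]
print(mask_email("rose@example.com"))      # r***@example.com
```

In practice these checks would usually run inside a scheduler or a data quality tool; the functions above only show the logic each check name implies.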
Effective data contracts require collaboration and trust between teams and organizations. A lack of trust
and poor communication can lead to misunderstandings and conflicts, complicating the establishment and
enforcement of data contracts.
Data requirements often evolve over time, necessitating updates to data contracts. Managing these
changes while ensuring backward compatibility and minimizing disruption can be challenging.
Clear documentation of data contracts is essential for effective communication. Provide examples to
illustrate the various contract components and use standardized terminology. Regular updates and reviews of
contracts can help maintain trust and ensure that all parties are on the same page.
Implement a versioning strategy to manage changes in data requirements. Each version of the data
contract should detail the modifications made, to ensure backward compatibility and a smooth transition for
stakeholders. This approach allows flexibility while maintaining the integrity of the data exchange process.
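
One way to make the versioning strategy concrete is to compare two schema versions and flag changes that could break existing consumers. The sketch below assumes a schema is reduced to a mapping of column name to type, treats removed columns and changed types as breaking, and bumps a `major.minor` version accordingly. The bump rule, the `booking_channel` column, and both function names are assumptions made for illustration.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes that could break consumers of the old version."""
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"column removed: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} ({old_type} -> {new_schema[column]})"
            )
    # Columns that only exist in new_schema are treated as compatible.
    return problems


def next_version(current: str, is_breaking: bool) -> str:
    """Bump major on breaking changes, minor otherwise (assumed rule)."""
    major, minor = (int(part) for part in current.split("."))
    return f"{major + 1}.0" if is_breaking else f"{major}.{minor + 1}"


v1_1 = {"tx_date": "timestamp", "customer_email": "string"}

# Adding a (hypothetical) column keeps old consumers working.
additive = dict(v1_1, booking_channel="string")
print(breaking_changes(v1_1, additive))                           # []
print(next_version("1.1", bool(breaking_changes(v1_1, additive))))  # 1.2

# Dropping a column is flagged as breaking.
reduced = {"tx_date": "timestamp"}
print(breaking_changes(v1_1, reduced))  # ['column removed: customer_email']
print(next_version("1.1", bool(breaking_changes(v1_1, reduced))))   # 2.0
```

The list returned by `breaking_changes` can double as the per-version change summary that stakeholders review before a new contract version is published.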
Now that we have covered the challenges of implementing data contracts and the steps to overcome them,
let’s discuss how data contracts help with data quality.
Data contracts play a crucial role in ensuring data quality by formalizing the expectations and requirements
for data exchange. They help in several ways:
By Defining Data Structure and Format - Data contracts specify the schema and format of the data, ensuring
that data are consistently structured and formatted across different systems. This reduces the risk of data
errors and discrepancies.
By Setting Quality Standards - Data contracts include data quality standards and validation rules, ensuring
that incoming data meet the predefined criteria. This helps to detect and address data quality issues early
in the data processing pipeline.
By Facilitating Data Governance - Data contracts promote effective data governance by defining the roles,
responsibilities, and accountability for data quality. This ensures that data are appropriately managed across
the organization.
By Enhancing Collaboration and Communication - By providing a common understanding of data expectations,
data contracts improve communication and cooperation between data producers and consumers. This
leads to more effective data analysis and decision making.
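
To illustrate how validation rules can catch problems early in a pipeline, the sketch below checks incoming records against the column constraints from the contract (`not_null` and `no_future_dates`) and separates accepted records from rejected ones. It assumes the contract is available as a Python structure with snake_case keys and that `tx_date` values are timezone-aware `datetime` objects; `violations` and `split_batch` are hypothetical helper names.

```python
from datetime import datetime, timezone

# Column rules mirroring the contract's schema section (assumed layout).
CONTRACT_COLUMNS = [
    {"column_name": "tx_date", "type": "timestamp",
     "constraints": {"not_null": True, "no_future_dates": True}},
    {"column_name": "customer_email", "type": "string",
     "constraints": {"not_null": True}},
]


def violations(record, columns, now):
    """Return a list of human-readable constraint violations."""
    found = []
    for column in columns:
        name = column["column_name"]
        rules = column.get("constraints", {})
        value = record.get(name)
        if value is None:
            if rules.get("not_null"):
                found.append(f"{name}: null not allowed")
            continue  # remaining rules need a value to inspect
        if rules.get("no_future_dates") and value > now:
            found.append(f"{name}: date is in the future")
    return found


def split_batch(records, columns, now):
    """Separate clean records from those that break the contract."""
    accepted, rejected = [], []
    for record in records:
        problems = violations(record, columns, now)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


now = datetime(2024, 6, 7, 12, 0, tzinfo=timezone.utc)
batch = [
    {"tx_date": datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc),
     "customer_email": "rose@example.com"},
    {"tx_date": datetime(2024, 6, 8, 9, 0, tzinfo=timezone.utc),
     "customer_email": None},
]
accepted, rejected = split_batch(batch, CONTRACT_COLUMNS, now)
print(len(accepted), len(rejected))  # 1 1
print(rejected[0][1])
```

Rejected records carry their list of violations, so the producing team can see which contract rule was broken rather than only that a load failed.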
Keywords: Data, DataContracts, DataQuality, DataGovernance, AI
Conclusion
Data contracts ensure reliable, consistent, and high-quality data exchanges in today’s interconnected world.
Data contracts provide a common understanding between data providers and consumers by defining data
structure, format, semantics, and quality. Although challenges such as resistance to change, lack of trust, and
evolving data requirements exist, they can be overcome through collaborative design, clear documentation,
and versioning strategies.
By implementing data contracts effectively, organizations can improve data governance, build trust between
teams, and ensure that data are used efficiently and accurately, ultimately driving better business outcomes.