MCSD1053
Data Science Governance
Chapter 1: Introduction to Data Governance
Key Concept
What is Governance?
Execution and enforcement of authority over
the management of data and data-related
assets.
Data governance is, and should be, in
alignment with corporate governance
policies as well as within the IT governance
framework.
What is Data Governance (DG)?
• Definition:
• “The exercise of authority, control, and
shared decision making (planning,
monitoring and enforcement) over the
management of data assets.”
--Data Management Body of Knowledge (DMBOK)--
What is Data Governance (DG)?
• Alternate Definition:
• “The organization and implementation of
policies, procedures, structures, roles and
responsibilities which outline and enforce
rules of engagement, decision rights, and
accountabilities for the effective
management of information assets.”
--John Ladley, 2020--
What does it (DG) do?
• DG provides guardrails for using and taking
care of the data assets.
• Data scientists and analysts use the data,
and systems create and change the data;
DG makes sure everyone abides by the
rules.
• An important requirement for success if an
organization wants to be “data driven”.
Source: John Ladley, 2020
7 Components of DG
Source: John Ladley, 2020
Overview of the DG in
Zachman Framework,
a common framework
used by enterprise
architects and
planners.
Source: John Ladley, 2020
What is Data Stewardship?
• Formalization of accountability for
the management of data and
data-related assets.
• Role of data stewards is critical to
operational integrity and success of
a big data governance framework.
Who is the Data Steward?
Two Perspectives:
Perspective 1:
1. Everyone in the organization who deals with
data is accountable for how they treat
data. This can be anyone in the organization,
whether business minded or technical.
2. They are tasked with the responsibility and
accountability for what they do with the data as
they define, produce and/or use data as part of
their work.
Who is the Data Steward?
Perspective 2:
1. Someone whose sole responsibility is to be
accountable for their organization’s treatment
of data.
2. Someone who defines, produces or uses data
as part of their job and has a defined level of
accountability for assuring quality in the
definition, production or usage of that data.
DG Best Practices
• Promote a partnership between IT and other functional areas, divisions
and lines of business.
• Shared responsibilities across the enterprise, not solely an IT
responsibility.
• Every employee is a data steward, responsible for use, storage and
protection of data.
• Apply formal accountability and set the bar for the right behaviours
towards data governance in an organization.
• DG framework must support existing and new processes to ensure the
proper management of action and usage of data through its life cycle.
DG Must Support 3-D Data Analytics
Temporal Dimensions in Data Analysis
DG Must Support Both
Business Intelligence and Data Analytics
DG Must Support Both
Data Warehouse and Data Lakes
DG Must Cover 4-Layers
Data Analytics Framework
Data Connection Layer
• Set up data ingestion pipelines and data
connectors to access data
• Make an inventory of where the data is
created and stored
• Apply methods to identify meta-data in all
source data repositories.
• Implement Extract, Transform, and Load (ETL)
software tools to extract data from their source
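The steps in this layer (ingest, inventory, identify metadata, ETL) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a replacement for a real ETL tool. The sample CSV text, the source name `visits.csv`, the table name `visits`, and the helper functions `extract`, `describe`, `transform` and `load` are all hypothetical names chosen for this example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from a file, API or database.
SOURCE_CSV = """patient_id,name,visit_date
1, Alice ,2020-01-05
2,Bob,2020-02-11
"""


def extract(text):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def describe(source_name, rows):
    """Record basic metadata about a source for the data inventory."""
    return {
        "source": source_name,
        "columns": list(rows[0].keys()) if rows else [],
        "row_count": len(rows),
    }


def transform(rows):
    """Transform: trim stray whitespace and convert the id to an integer."""
    return [
        (int(r["patient_id"]), r["name"].strip(), r["visit_date"].strip())
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits "
        "(patient_id INTEGER, name TEXT, visit_date TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", records)
    conn.commit()


rows = extract(SOURCE_CSV)
inventory = [describe("visits.csv", rows)]  # one inventory entry per source
conn = sqlite3.connect(":memory:")          # stand-in for the Data Management Layer
load(conn, transform(rows))
loaded = conn.execute(
    "SELECT patient_id, name, visit_date FROM visits ORDER BY patient_id"
).fetchall()
```

Keeping the inventory entry next to the extract step means every source that enters the pipeline is recorded with its columns and row count, which is the kind of metadata a governance team would want to review.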
Find out about the tools that can be
used to transfer data to the Data
Management Layer such as Talend and
data exchange standards such as X.12
CLASS ACTIVITY 1
Data Management Layer
• Store data (might need normalization) in
certain database architectures to improve data
query and access by the analytics layer.
• Apply security & privacy controls; this is the
layer where we must pay attention to them.
• Apply data cleansing programs, such as tools
written to de-duplicate (remove duplicate
records) and resolve any data inconsistencies.
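The cleansing step above can be illustrated with a short Python sketch. It assumes that two records describe the same entity when their normalized email addresses match; real matching rules are usually richer. The field names, the `COUNTRY_ALIASES` table and the functions `normalize` and `deduplicate` are hypothetical and exist only for this example.

```python
# Hypothetical alias table; a real one would come from governed reference data.
COUNTRY_ALIASES = {"MY": "Malaysia", "MALAYSIA": "Malaysia"}


def normalize(record):
    """Resolve inconsistencies by putting each field into a canonical form."""
    country = record["country"].strip()
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()).title(),
        "country": COUNTRY_ALIASES.get(country.upper(), country),
    }


def deduplicate(records, key="email"):
    """Keep the first record seen for each key value, after normalization."""
    seen = set()
    unique = []
    for record in map(normalize, records):
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


raw_records = [
    {"email": "Ali@Example.com ", "name": "ali  bin abu", "country": "MY"},
    {"email": "ali@example.com", "name": "Ali Bin Abu", "country": "Malaysia"},
    {"email": "siti@example.com", "name": "Siti Aminah", "country": "malaysia"},
]
clean_records = deduplicate(raw_records)
```

Normalizing before comparing matters here: the first two raw records differ in case and spacing, so a naive equality check would treat them as distinct.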
Find out about
• Taxonomy of database tools including SQL, NoSQL,
Hadoop, Shark, Cassandra, Lucene, SOLR, Hive, Spark
• HIPAA standards for security and privacy
CLASS ACTIVITY 2
Analytics Layer
• Use analytical engines to support
analytics application
• Include engines for optimization,
machine learning, natural
language processing, predictive
modelling, pattern recognition,
classification, inferencing and
semantic analysis
Find out about
• Machine Learning, Descriptive Analysis, Predictive
Analysis, Prescriptive Analysis and Deep Learning
CLASS ACTIVITY 3
Presentation Layer
• Build dashboards and other
user-facing applications to
display results of analytics engines
Find out about
• Rapid data visualization programs including Tableau,
QlikView
• Data visualization tools
• Difference between rapid data visualization
programs like Tableau
CLASS ACTIVITY 4
Data Management Body of Knowledge (DMBOK)
Data Management Association (DAMA International)
• organization to advance data management principles, guidelines and best practices.
• source for data management concepts, education and collaboration on international scale.
• Published a set of guidelines that are publicly available under Data Management Body of
Knowledge Version 2.0 (DMBOK 2.0)
DMBOK2 (2017)
• collection of processes, principles & knowledge areas about proper management of data
• includes many of the best practices and standards in data management
• may be used as a blueprint to develop strategies for data management for the entire
enterprise or even at division level
• For use by IT professionals, consultants, data stewards, analysts, and data managers.
https://www.dama.org/content/body-knowledge
Domains of Data Management Body of Knowledge (DMBOK)
9 domains
https://www.dama.org/content/body-knowledge
Maturity Model
• Capability Maturity Model Integration (CMMI)
• was developed at Carnegie Mellon University as a process improvement and appraisal
program to assess an organization’s capability maturity.
• Data Maturity Model (DMM), formally named the Data Management Maturity (DMM) model
• released by CMMI Institute
• a set of principles that enable organizations to improve data management practices
across the full spectrum of their business
• provides organizations with a standard set of best practices to build a better data
management structure and align data management with the organization’s business goals.
Data Maturity Model (DMM)
• includes a Data Management Maturity Portfolio plus supporting services, training,
partnerships, assessment methods and professional certifications.
• DMM was modeled after the CMMI maturity model to measure the process maturity of an
organization, improve efficiency and productivity, and reduce risks and costs associated
with enterprise data.
• Excellent framework to answer questions such as:
• How mature are your data management processes?
• What level of DMM maturity is your organization at?
• How do you raise your maturity level?
• How do I capture the maximum value and benefits from data for the business?
Data Maturity Model (DMM) Levels
Data Maturity Model (DMM) Domains
A supply chain
metaphor
Source: John Ladley, 2020
Why DG?
1. Compliance and fewer regulatory problems. Without
metadata and quality data, compliance and regulatory
reports are unreliable.
2. Increased assurance and dependability of knowledge
assets. With governance, you can trust your data and base
decisions on solid data.
3. Improved information security & privacy. Data Governance
will reduce the risks and exposures associated with data
loss or breaches.
4. Enterprise-wide accountability. Data Governance
enables more accountability and consistency of data
handling across the enterprise.
5. Consistent data quality. Improved quality reduces rework,
waste and delays. It improves the quality of decision making.
6. Maximizing asset potential. With Data Governance, the
assets are known, discoverable and usable across the
enterprise, thus raising the value and return on data (ROD).
3 Levels of DG
Operational Data Governance
• focuses on daily operations and the safeguarding of data, security and
privacy, and on implementing policies. It is typically influenced by the
IT organization with little involvement from business owners of data.
Data is not treated as a strategic asset. There is no staff dedicated to
the roles of data stewards or data guardians in the organization.
Tactical Data Governance
• concerned with the current and immediate issues associated with the
management of data, the implementation of governance, and
accountability for data governance.
Strategic Data Governance
• long term and future perspective of data governance.
• treats data as a corporate asset
• brings a holistic view of standardization into the conversation,
including evaluating the current organization’s standard terms and
ontology, identifying the gaps and including advanced tools and
techniques such as semantic ontologies and semantic concepts
across the enterprise
• has roles such as Chief Data Officer (CDO), Accountable Executive
(AE), Data Risk Officers (DRO) and a central data council (also
referred to as Data Governance Council)