Data Management Fundamentals
DATA MANAGEMENT
Data Governance
Ms. Wendy C. Canillo
INTRODUCTION
• Data modeling is the process of discovering, analyzing, and scoping data
requirements, and then representing and communicating these data
requirements in a precise form called the data model.
• Data modeling is a critical component of data management.
• The modeling process requires that organizations discover and
document how their data fits together.
• Data models contain Metadata
essential to data consumers.
• Much of this Metadata uncovered
during the data modeling process is
essential to other data management
functions.
• For example, definitions support
data governance, and lineage supports
data warehousing and analytics.
BUSINESS DRIVERS
Data models are critical to effective management of data. They:
• Provide a common vocabulary around data
• Capture and document explicit knowledge about an organization’s data
and systems
• Serve as a primary communications tool during projects
• Provide the starting point for customization, integration, or even
replacement of an application
GOALS AND PRINCIPLES
• The goal of data modeling is to confirm and document understanding of
different perspectives, which leads to applications that more closely align
with current and future business requirements, and creates a foundation
to successfully complete broad-scoped initiatives such as Master Data
Management and data governance programs. Proper data modeling lowers
support costs and increases the opportunities for reuse in future
initiatives, thereby reducing the costs of building new applications.
Data models are an important form of Metadata.
DATA MODELING AND DATA MODELS
• Data modeling is most frequently performed in the context of systems
development and maintenance efforts, known as the system
development lifecycle (SDLC). Data modeling can also be performed for
broad-scoped initiatives (e.g., Business and Data Architecture, Master
Data Management, and data governance initiatives) where the immediate
end result is not a database but an understanding of organizational data.
• A model is a representation of something that exists or a pattern for
something to be made. A model can contain one or more diagrams. Model
diagrams make use of standard symbols that allow one to understand
content. Maps, organization charts, and building blueprints are examples
of models in use every day.
TYPES OF DATA THAT ARE MODELED
Four main types of data can be modeled (Edvinsson, 2013). The types of
data being modeled in any given organization reflect the priorities of the
organization or the project that requires a data model:
• Category information: Data used to classify and assign types to things.
For example, customers classified by market categories or business
sectors; products classified by color, model, size, etc.; orders classified by
whether they are open or closed.
• Resource information: Basic profiles of resources needed to conduct
operational processes such as Product, Customer, Supplier, Facility,
Organization, and Account. Among IT professionals, resource entities are
sometimes referred to as Reference Data.
• Business event information: Data created while operational processes
are in progress. Examples include Customer Orders, Supplier Invoices,
Cash Withdrawal, and Business Meetings. Among IT professionals, event
entities are sometimes referred to as transactional business data.
• Detail transaction information: Detailed transaction information is
often produced through point-of-sale systems (either in stores or online).
It is also produced through social media systems, other Internet
interactions (clickstream, etc.), and by sensors in machines, which can be
parts of vessels and vehicles, industrial components, or personal devices
(GPS, RFID, Wi-Fi, etc.). This type of detailed information can be
aggregated, used to derive other data, and analyzed for trends, similar to
how business event information is used. This type of data (large
volume and/or rapidly changing) is usually referred to as Big Data.
ENTITY
ENTITY ALIASES
• The generic term entity can go by other names. The most common is
entity-type, because a type of something is being represented (e.g., Jane
is of type Employee). Under that usage, Jane is the entity and Employee is
the entity type. In widespread use today, however, the term entity refers
to Employee and the term entity instance refers to Jane.
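
The distinction between an entity and an entity instance can be illustrated with a minimal Python sketch. It follows the widespread usage described above, where Employee is the entity and Jane is an entity instance. The attribute names employee_id and name are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """The entity: the kind of thing being modeled."""
    employee_id: int  # hypothetical identifier attribute
    name: str         # hypothetical descriptive attribute


# An entity instance: one specific occurrence of the Employee entity.
jane = Employee(employee_id=1, name="Jane")

print(type(jane).__name__)  # the entity the instance belongs to
print(jane.name)            # a value held by the entity instance
```

The class plays the role of the entity, and each object created from it plays the role of an entity instance.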
GRAPHIC REPRESENTATION OF ENTITIES
DEFINITION OF ENTITIES
Entity definitions are essential contributors to the business value of any
data model. They are core Metadata. High quality definitions clarify the
meaning of business vocabulary and provide rigor to the business rules
governing entity relationships. They assist business and IT professionals in
making intelligent business and application design decisions. High quality
data definitions exhibit three essential characteristics:
• Clarity: The definition should be easy to read and grasp. Use simple,
well-written sentences without obscure acronyms or unexplained ambiguous
terms such as "sometimes" or "normally".
• Accuracy: The definition is a precise and correct description of the
entity. Definitions should be reviewed by experts in the relevant business
areas to ensure that they are accurate.
• Completeness: All of the parts of the definition are present. For example,
in defining a code, examples of the code values are included. In defining
an identifier, the scope of uniqueness is included in the definition.
RELATIONSHIP
• A relationship is an association between entities (Chen, 1976). A
relationship captures the high-level interactions between conceptual
entities, the detailed interactions between logical entities, and the
constraints between physical entities.
RELATIONSHIP ALIASES
• The generic term relationship can go by other names. Relationship
aliases can vary based on scheme. For example, in relational schemes the
term relationship is often used, in dimensional schemes the term
navigation path is often used, and in NoSQL schemes terms such as edge or
link are used. Relationship aliases can also vary based on level of detail.
A relationship at the conceptual and logical levels is called a
relationship, but a relationship at the physical level may be called by
other names, such as constraint or reference, depending on the database
technology.
GRAPHIC REPRESENTATION OF RELATIONSHIPS
RELATIONSHIP CARDINALITY
• In a relationship between two entities, cardinality captures how many instances
of one entity participate in the relationship with how many instances of the
other entity. Cardinality is represented by the symbols that appear on both ends
of a relationship line. Data rules are specified and enforced through cardinality.
Without cardinality, the most one can say about a relationship is that two entities
are connected in some way.
• For cardinality, the choices are simple: zero, one, or many. Each side of a
relationship can have any combination of zero, one, or many (‘many’ means possibly
more than ‘one’). Specifying zero or one captures whether or not an entity
instance is required in a relationship. Specifying one or many captures how many
instances of an entity participate in a given relationship.
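
The zero, one, or many choices can be sketched as a minimum and a maximum on each side of a relationship. The following Python sketch is an illustration under stated assumptions, not a standard notation: the Cardinality class, its allows method, and the Order and Order Line rule in the comments are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    minimum: int            # 0 = participation optional, 1 = participation required
    maximum: Optional[int]  # 1 = at most one, None = many

    def allows(self, count: int) -> bool:
        """Return True if `count` participating instances satisfies the rule."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


ZERO_OR_ONE = Cardinality(0, 1)
EXACTLY_ONE = Cardinality(1, 1)
ZERO_OR_MANY = Cardinality(0, None)
ONE_OR_MANY = Cardinality(1, None)

# Hypothetical data rule: each Order must contain one or many Order Lines.
print(ONE_OR_MANY.allows(0))  # False: at least one Order Line is required
print(ONE_OR_MANY.allows(3))  # True
print(ZERO_OR_ONE.allows(2))  # False: at most one instance is allowed
```

The minimum corresponds to the zero-or-one choice (is participation required?) and the maximum corresponds to the one-or-many choice (how many instances may participate?).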
UNARY (RECURSIVE) RELATIONSHIP
• A unary (also known as a recursive or self-referencing) relationship
involves only one entity. A one-to-many recursive relationship describes
a hierarchy, whereas a many-to-many relationship describes a network
or graph. In a hierarchy, an entity instance has at most one parent (or
higher-level entity). In relational modeling, child entities are on the many
side of the relationship, with parent entities on the one side of the
relationship. In a network, an entity instance can have more than one
parent.
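
The difference between a hierarchy and a network can be sketched with plain Python dictionaries. The sample data is hypothetical: an Employee "reports to" Employee relationship for the hierarchy, and a Part "is a component of" Part relationship for the network.

```python
from typing import Dict, List, Optional, Set

# Hierarchy (one-to-many recursive): each instance has at most one parent.
reports_to: Dict[str, Optional[str]] = {
    "Ana": None,   # root of the hierarchy, no parent
    "Ben": "Ana",
    "Cy": "Ana",
    "Dee": "Ben",
}


def chain_of_command(employee: str) -> List[str]:
    """Walk up the hierarchy from an employee to the root."""
    chain = []
    parent = reports_to[employee]
    while parent is not None:
        chain.append(parent)
        parent = reports_to[parent]
    return chain


# Network (many-to-many recursive): an instance can have more than one parent.
component_of: Dict[str, Set[str]] = {
    "bolt": {"wheel", "frame"},  # two parents
    "wheel": {"bicycle"},
    "frame": {"bicycle"},
    "bicycle": set(),
}

print(chain_of_command("Dee"))       # ['Ben', 'Ana']
print(sorted(component_of["bolt"]))  # ['frame', 'wheel']
```

A single parent per instance is enough for the hierarchy, whereas the network needs a set of parents per instance.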
BINARY RELATIONSHIP
TERNARY RELATIONSHIP
FOREIGN KEY
• A foreign key is used in physical and sometimes logical relational data
modeling schemes to represent a relationship. A foreign key may be
created implicitly when a relationship is defined between two entities,
depending on the database technology or data modeling tool, and
whether the two entities involved have mutual dependencies.
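
As a minimal sketch of a foreign key at the physical level, the following example uses Python's built-in sqlite3 module. The department and employee tables and their columns are hypothetical. SQLite is chosen only because it needs no setup; other database technologies declare and enforce foreign keys differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        -- The foreign key represents the relationship to department.
        department_id INTEGER NOT NULL REFERENCES department (department_id)
    )
    """
)

conn.execute("INSERT INTO department VALUES (10, 'Finance')")
conn.execute("INSERT INTO employee VALUES (1, 'Jane', 10)")  # parent row exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Raj', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True: the row without a matching parent is refused
```

Declaring department_id as NOT NULL makes participation in the relationship required; allowing NULL would make it optional.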