1. Introduction to Data Management for Analytics
(1) Data Management Frameworks
Note: Review PDF 1.0, then do a Google search on the frameworks
- DAMA-DMBOK
(2) Types of Data Storage in Cloud Computing
2. Data Modelling & Design
- Data Modelling
● Process of learning about the data and constructing a visual representation of its parts
● Goal is to show the relationships between structures, data points, data groupings and attributes of the data
- Functional Dependencies
● Definition: Constraint that determines the relation of one attribute to another attribute in a database
- Denoted by an arrow →
- Example: X → Y
- In the example below, Employee Name/Salary/City are all functionally dependent on Employee Number, so we can say Employee Number → Employee Name/Salary/City
● Multivalued Dependency
- Definition: Occurs when two or more independent attributes in a table are each dependent on another attribute
- Example: maf_year and color are independent of each other
- However, both are dependent on car_model
- Therefore, both columns/attributes are ‘multivalued dependent’ on car_model
- We can denote the r/s as car_model →→ maf_year and car_model →→ color
● Trivial Functional Dependency
- Definition: Occurs when the attribute that has the dependency is a subset of the attribute it is dependent on.
- Example: X → Y is a trivial functional dependency if Y is a subset of X
- Example: (Emp_id, Emp_name) → Emp_id is a trivial functional dependency as Emp_id is a subset of (Emp_id, Emp_name)
● Non-Trivial Functional Dependency
- Definition: Basically where the attribute that has the dependency is not a subset of the attribute it is dependent on
- Example: Company → CEO. CEO is not a subset of Company. We must know the company before we know the CEO
- Similarly, CEO → Age. We must know who the CEO is before we
can tell his/her age
● Transitive Dependency
- Definition: Occurs between 3 or more attributes. Essentially, it’s an indirect non-trivial dependency
- Example: Company → Age is a transitive dependency
- We know Company → CEO and CEO → Age
- Therefore, Company → Age; we must know the company before we know the CEO, and we must know who the CEO is before we know the age (see the SQL sketch below)
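A minimal SQL sketch of the dependencies above, using hypothetical Employee and CompanyInfo tables (names and types are illustrative, not from the course data set):

```sql
-- Hypothetical Employee table: EmpNumber → EmpName, Salary, City
CREATE TABLE Employee (
    EmpNumber INT PRIMARY KEY,   -- determinant: fixes every other column
    EmpName   VARCHAR(100),
    Salary    DECIMAL(10, 2),
    City      VARCHAR(50)
);

-- Transitive dependency: Company → CEO and CEO → Age, therefore Company → Age
CREATE TABLE CompanyInfo (
    Company VARCHAR(100) PRIMARY KEY,
    CEO     VARCHAR(100),   -- Company → CEO
    CEOAge  INT             -- CEO → Age; normalization would move CEO/Age into their own table
);
```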
- Normalization
● Refers to the process of restructuring tables to remove redundant data and undesirable dependencies (partial and transitive)
● Purpose is to avoid data redundancy and insertion, update & deletion anomalies
- Process of Normalization
● From 0NF to 1NF
- MatrixNum is the unique key for all the records
- We can say that the rest of the columns are ‘functionally dependent’ on MatrixNum
- From 0NF to 1NF, we remove the nesting/grouping so that the unique key associates 1-to-1 with the rest of the columns
● From 1NF to 2NF
- Essentially you want to remove partial functional dependencies in the table
- A partial functional dependency arises when the table needs more than one key column working together (a composite key) to uniquely identify a record, but some attributes depend on only part of that key
- See example below → MatrixNum relates to Name/Programme. EnrollNum relates to Semester, AcadYear, Course, CourseName.
- MatrixNum + EnrollNum together relate to Result
- To normalize the data → You need to split it into separate tables
● From 2NF to 3NF
- Idea here is that you want to remove transitive dependencies from the table
- In the 2NF table below, EnrollNum is associated with Course and Course is associated with CourseName
- So the transitive dependency is EnrollNum → Course → CourseName
- To normalize the data → You need to split Course & CourseName into a separate mapping table (see the SQL sketch below)
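A sketch of the 2NF and 3NF splits in SQL, assuming the column names used in the lecture example (MatrixNum, EnrollNum, etc.); data types are illustrative:

```sql
-- 2NF: split the partial dependencies into their own tables
CREATE TABLE Student (
    MatrixNum VARCHAR(10) PRIMARY KEY,
    Name      VARCHAR(100),
    Programme VARCHAR(100)
);

CREATE TABLE Enrolment (
    EnrollNum  VARCHAR(10) PRIMARY KEY,
    Semester   VARCHAR(10),
    AcadYear   VARCHAR(10),
    Course     VARCHAR(10),
    CourseName VARCHAR(100)   -- still transitively dependent: EnrollNum → Course → CourseName
);

CREATE TABLE Result (
    MatrixNum VARCHAR(10) REFERENCES Student(MatrixNum),
    EnrollNum VARCHAR(10) REFERENCES Enrolment(EnrollNum),
    Result    VARCHAR(5),
    PRIMARY KEY (MatrixNum, EnrollNum)   -- Result depends on the pair of keys
);

-- 3NF: remove the transitive dependency by splitting Course/CourseName into a mapping table
CREATE TABLE Course (
    Course     VARCHAR(10) PRIMARY KEY,
    CourseName VARCHAR(100)
);
-- Enrolment then keeps only Course as a foreign key and drops CourseName.
```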
- Data Model & Data Modelling Notation
● Chen Notation
- Lecturer Comments: Very old and overly technical
● Crow’s Feet Notation
- Lecturer’s Comments: This is the industry standard. We will use this for purposes of the course as
well
● Unified Modelling Language (UML) Notation
- Lecturer Comments: Usually used by software developers
- Designing a Database (Phases of Database Design)
● Data Model: Plan/blueprint for a database design; More generalized and abstract than a database
design
● Phases of a Database Design:
- The actual implementation is (1) You come up w the data model, (2) You write SQL scripts to
represent the data model & create the tables, (3) You run the SQL scripts and insert the actual data
● Conceptual → You think of / come up w the tables/fields you require
● Logical → You determine the r/s between the different tables
● Physical → You come up w the details for the different fields/tables (e.g. set character limits/data types, etc.); see the SQL sketch below
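A minimal sketch of steps (2) and (3) above, where the physical design pins down data types and character limits; the Customer table and its fields are hypothetical:

```sql
-- Step (2): SQL script representing the physical design (types, sizes, constraints)
CREATE TABLE Customer (
    CustomerId   INT PRIMARY KEY,
    CustomerName VARCHAR(100) NOT NULL,   -- character limit decided at the physical phase
    JoinedOn     DATE
);

-- Step (3): run the script, then insert the actual data
INSERT INTO Customer (CustomerId, CustomerName, JoinedOn)
VALUES (1, 'Alice Tan', '2024-01-15');
```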
- Entity Relationship Diagram (ER Diagram)
● Basically a structural diagram used in database design; contains entity and maps out the relationships
between entities
- Aspects of the ER Model
● Entity
- Basically the ‘table’ in a database
- Represented by a name and a rectangle, with its attributes listed in the body of
the rectangle
● Entity Attributes
- Refers to the property/characteristic of the entity/table
- For databases, it has a name and the data type/size of the attribute
● Primary Key
- Refers to a special entity attribute that uniquely defines a record in a table
- This means the values in the PK column/field must not repeat within the table
- Example: the id field of a given table
● Foreign Key
- Essentially a PK of another table, referenced to link the two tables
- A FK need not be unique in the table where it is not the PK (see the SQL sketch below)
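A short sketch of a PK/FK pair, using hypothetical Team and Player tables (this also previews the one-to-many cardinality example in the next section):

```sql
CREATE TABLE Team (
    TeamId   INT PRIMARY KEY,              -- PK: value cannot repeat within Team
    TeamName VARCHAR(100)
);

CREATE TABLE Player (
    PlayerId INT PRIMARY KEY,
    TeamId   INT REFERENCES Team(TeamId),  -- FK: the PK of Team; may repeat here (many players per team)
    Name     VARCHAR(100)
);
```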
- Concept of Cardinality for ER-Model
● ER Model relationships are classified by their cardinality
● Cardinality refers to the possible number of occurrences in one entity which is associated with the
number of occurrences in another
- E.g. ONE team has MANY players, we can then say Team has a one-to-many cardinality with Player
● Notations:
● Reading/Interpreting a ER Diagram/Relation
- Between Customer & Pizza, to establish the relationship from Customer → Pizza, we look at the crow’s feet notation attached to Pizza
- In this case, it’s 0/Many attached to Pizza → means a customer can order 0 pizzas or many pizzas
- For the converse, it’s also 0/Many attached to Customer → means a pizza can be ordered by 0 customers or many customers
- Weak Entity
● Defined as an entity that does not have attributes/columns that can identify its records uniquely
- I.e. no primary key of its own
- Example: Intermediate tables → where the table consists of the PKs of two other tables
- Database Design Misc Example
● In this example, COMPANY & PART tables can’t be joined to each other
- Solution is to create an intermediary table mapping CompanyName & PartNumber to allow both tables to join
- Lecturer: Would be good to create a PK in the COMPANY_PART_INT table for ease of reference (see the SQL sketch below)
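A sketch of the intermediary (junction) table, with the surrogate PK the lecturer suggests; column types are assumptions:

```sql
CREATE TABLE COMPANY (
    CompanyName VARCHAR(100) PRIMARY KEY
);

CREATE TABLE PART (
    PartNumber VARCHAR(20) PRIMARY KEY
);

-- Intermediary table holding the PKs of both tables, so COMPANY and PART can be joined through it
CREATE TABLE COMPANY_PART_INT (
    CompanyPartId INT PRIMARY KEY,                              -- surrogate PK for ease of reference
    CompanyName   VARCHAR(100) REFERENCES COMPANY(CompanyName),
    PartNumber    VARCHAR(20)  REFERENCES PART(PartNumber)
);
```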
- Database Property → ACID: Atomicity, Consistency, Isolation, Durability
● Properties that all transactions should possess
○ Atomicity
- Relates to the ‘all or nothing’ property
- The transaction must be an indivisible unit that is either performed in its entirety or not performed at all (see the transaction sketch below)
○ Consistency
- DB transaction must transform the database from one consistent state to another consistent
state
○ Isolation
- Transactions must be able to execute independently from one another
- I.e. One incomplete transaction must not affect another transaction
○ Durability
- Effects of a successful transaction must be permanently recorded in the database and not lost
in a subsequent failure
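A minimal transaction sketch illustrating the ‘all or nothing’ property; the Account table and amounts are hypothetical:

```sql
-- Transfer $100 between two accounts: either both updates happen, or neither does
BEGIN;

UPDATE Account SET Balance = Balance - 100 WHERE AccountId = 1;
UPDATE Account SET Balance = Balance + 100 WHERE AccountId = 2;

COMMIT;      -- durability: once committed, the change survives subsequent failures
-- ROLLBACK; -- would undo any partial work if something went wrong before the commit
```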
- Types of Data
● Transactional Data
- Refers to the data that is captured from transactions
- Example: Time of transaction, place, price, payment method employed, etc…
- Usually captured at point of sale through a POS system
● Analytical Data
- Transaction data that is transformed via calculations/analysis
● Master Data
- Refers to the actual critical business objects upon which transactions are performed
- Data Warehouse
● The general idea for data storage in Data Warehouse is to provide information & knowledge to support
decision making in your org
● Keeping the data in normalized/OLTP form is usually not ideal for analytics, as it can be computationally expensive to join data together to perform analysis
● Therefore, data in a Data Warehouse is often stored in a de-normalized form
- OLTP vs OLAP
- Dimensionality Modelling (Converting data from OLTP to OLAP)
● Basically, you are de-normalizing the data for high performance access (fewer joins)
● You can represent the denormalized data via 2 schemas:
○ Star Schema
- Structure that contains a fact table in the center (fact tables are tables that contain transactional data, e.g. Sales); see the SQL sketch below
- The fact table is surrounded by dimension tables containing reference data (dimension tables are tables that contain reference information, e.g. Store_id to Store Name)
○ Snowflake Schema
- Variant of the star schema where dimension tables do not contain denormalized data (they are further normalized)
● Terminology
- Dimension Tables: Tables connected to fact table containing reference/static data
- Attribute: Non-key fields in Dimension tables
- Fact Table: Central table in a dimensional model containing facts/transactional data
- Facts: Business measures/metrics
- Grain/Granularity: Level of detail/frequency at which data in Fact table is recorded
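A sketch of a small star schema for sales data; the table and column names and the daily grain are illustrative:

```sql
-- Dimension tables: reference/static data surrounding the fact table
CREATE TABLE DimStore (
    StoreId   INT PRIMARY KEY,
    StoreName VARCHAR(100),
    City      VARCHAR(50)
);

CREATE TABLE DimProduct (
    ProductId   INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Category    VARCHAR(50)
);

-- Fact table: transactional data; grain = one row per store, product and day
CREATE TABLE FactSales (
    SaleDate  DATE,
    StoreId   INT REFERENCES DimStore(StoreId),
    ProductId INT REFERENCES DimProduct(ProductId),
    SalesAmt  DECIMAL(12, 2),   -- facts / business measures
    UnitsSold INT
);
```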
3. Data Integration and Interoperability
- Data Integration
● Process of bringing data from disparate sources together to provide users w a unified view
● Purpose: To make data more easily available and easier to consume by systems/end-users
● Benefits: Free-up resources, improve data quality, improved operational efficiency, can gain valuable
insight through data
- Data Integration Tools
- Set Theory for Data Joins (see the SQL sketch below)
1. Outer Join
2. Left/Right Join
3. Inner Join
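A sketch of the three join types against hypothetical Customer and Orders tables (FULL OUTER JOIN is standard SQL but not supported by every engine, e.g. MySQL):

```sql
-- Inner join: only customers that have at least one matching order
SELECT c.CustomerId, o.OrderId
FROM Customer c
INNER JOIN Orders o ON o.CustomerId = c.CustomerId;

-- Left join: every customer, with NULL order columns where no match exists
SELECT c.CustomerId, o.OrderId
FROM Customer c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId;

-- Full outer join: rows from both sides, NULLs filled in wherever either side is missing
SELECT c.CustomerId, o.OrderId
FROM Customer c
FULL OUTER JOIN Orders o ON o.CustomerId = c.CustomerId;
```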
- Data Acquisition and Extraction
● Data Acquisition
- Process of capturing, integrating, transforming, aggregating and loading the data into the data warehouse after assuring data quality
- This process is more inclusive/comprehensive than ETL (Extract Transform Load) / ELT (Extract Load Transform)
● ETL/ELT
○ ETL
○ ELT
● ETL vs ELT
- If you transform first, you are in practice fixing the schema in advance, and the stored data might be too inflexible for later use
- If you load first, the risk is that untransformed data piles up and becomes ‘rubbish’
● Data Warehouse
- Structured data is loaded into the data warehouse for analytical use
● Data Lake
- Data lake is the place where all sorts of data is stored
- Structured, textual, unstructured data
● Data Lakehouse
- Similar to Data Lake but with data management architecture baked in to index/cache all forms of data
stored in the data lake
- Data Consolidation
● Basically consolidating data from different silos to a single place
- Data Virtualization
● Bring all data from different sources/places to one platform
● One platform to access/combine/analyze the dataset; reduce access cost
- Data Federation
● A software/platform that allows multiple databases to function as one
● Data from multiple sources are combined into a common model; i.e. can query/join using a common
platform/schema
- Data Replication
● Data is intentionally stored in > 1 site/server
● Purpose is to allow data to be available in case of downtime/heavy traffic; idea of improving data
accessibility/uptime
- Data Harmonization
- Data Pipeline
- Data Engineering
- Data Fabric
4. Data Project Implementation
5. Data Governance - Data Quality, Security & Privacy
- Types of Data for Data Governance
- Data Governance Implementation: Sales Analytics
- Data Quality
● Quality is assessed with respect to the data’s fitness for the purpose it was intended for
- High quality means it accurately represents the real-world constructs it describes
- Bad data will result in low information quality; as it moves up the management hierarchy, it leads to bad business decisions
- Measuring/Assessing Data Quality - Data Quality Checks
1. Data Sampling
(i) Random
(ii) Sampling with Fixed Criteria
2. Data Profiling
- Process where data is examined & analyzed to generate summary statistics
- Purpose: Give an overview of the data so any discrepancies/risks/trends are spotted (see the SQL sketch below)
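A sketch of both checks against a hypothetical Sales table; the random-sampling syntax is the PostgreSQL/SQLite form (other engines use RAND() or TABLESAMPLE):

```sql
-- 1. Data sampling (random): pull a random subset of rows to inspect
SELECT *
FROM Sales
ORDER BY RANDOM()
LIMIT 100;

-- 2. Data profiling: summary statistics to surface discrepancies, risks and trends
SELECT COUNT(*)                                              AS row_count,
       COUNT(DISTINCT customer_id)                           AS distinct_customers,
       MIN(sale_amount)                                      AS min_amount,
       MAX(sale_amount)                                      AS max_amount,
       AVG(sale_amount)                                      AS avg_amount,
       SUM(CASE WHEN sale_amount IS NULL THEN 1 ELSE 0 END)  AS missing_amounts
FROM Sales;
```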
- Data Dictionary
● Specification/description of the data structures in a database/data model/data source
● Contains a list of entities/tables/datasets and their fields/columns/data elements
● Information may include: data type, description, relationships, aliases, constraints, sources, etc… (see the SQL sketch below)
● Data Catalog - Distinct from Data Dictionary - basically an inventory of the data objects in your organization
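Most relational databases expose their own data dictionary through system views; a sketch using the standard information_schema (the 'public' schema name is an assumption, adjust for your database):

```sql
-- Basic data dictionary: every table and column with its data type, length and nullability
SELECT table_name,
       column_name,
       data_type,
       character_maximum_length,
       is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'   -- assumed schema name
ORDER BY table_name, ordinal_position;
```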
- Data Mapping
● Definition: R/s between 2/more datasets and matching/connecting fields from one dataset to another
● Purpose: Link data fields across areas to create standardized accurate data
- Data Privacy: Data Confidentiality, Anonymization, Masking
1. Data Masking
- Definition: Technique that scrambles data to create an inauthentic copy for non-production purposes
- After masking, data retains the characteristics & integrity of production data
- Masked data usually used for analytics/training/testing
2. Data Redaction
- Definition: Data masking technique that replaces data with chosen redaction characters
- E.g. S9300000J → XXXXX000J
- Purpose: Used as a secrecy control/privacy control, usually used for hiding personal identifiable information
(PII)
3. Data Encryption
- Definition: Translate data into another form → Only people w access to a secret key/password can read it
4. Data Masking/Redaction vs Encryption
- Data Masking/Redaction is used more frequently as it allows the organization to maintain the usability of customer data; usually used as the standard solution for pseudonymisation (see the SQL sketch below)
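A sketch of redaction and simple masking in SQL, reusing the NRIC example above; the table/column names are hypothetical and the functions used (CONCAT, REPEAT, RIGHT, LENGTH) hold in MySQL/PostgreSQL:

```sql
-- Redaction: replace all but the last 4 characters with a redaction character
-- e.g. S9300000J -> XXXXX000J
SELECT CONCAT(REPEAT('X', LENGTH(nric) - 4), RIGHT(nric, 4)) AS redacted_nric
FROM Customer;

-- Masking for a non-production copy: overwrite real names with inauthentic but realistic values
UPDATE Customer_Staging
SET customer_name = CONCAT('Customer_', customer_id);
```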
- Aspects of Data Security
1. Data Access
- Authentication/authorization of access
- Data access is recorded and will be audited
- Data access must necessarily relate to location where data is stored; on-prem vs cloud
2. Data Classification (User Role)
3. Data Lineage
- Need to understand how changes upstream may affect downstream sources
- E.g. Upstream data source gets an update, downstream data might be impacted
4. Data Encryption
- Whether the data is encrypted at rest or encrypted while in-transit
- Data Classification
● Definition: Process of organizing information/data assets using an agreed-upon categorization logic
- Result usually is a large repository of metadata useful to make further decision/to facilitate use and
governance of data
- E.g. Can make decisions on the value/security/access rights/usage rights/privacy/storage
location/quality/retention period of the data
● Example - GDPR Classification Tags
- Data Lineage