Lecture 1

Uploaded by

Francisca Santos
Gestão e Armazenamento de Dados (Data Management and Storage)

Instituto Superior de Estatística e Gestão de Informação


Universidade Nova de Lisboa
Overview

Theoretical & Practical


Lecturer: Mijail Naranjo-Zolotov
Email: [email protected]
Office: Office 26, second floor, NOVA IMS building
Office hours: Wednesday, 16:00 to 17:30 (schedule in advance by e-mail)

Practical (night class only):

Lecturers: Mijail Naranjo-Zolotov & Yuri Binev ([email protected])

Overview – Learning units

1. Introduction to databases.
2. Normal forms in relational database modelling.
3. Theory + Lab 1: Creating a database using MySQL.
4. Theory + Lab 2: SQL queries (aggregation, sorting). Introduction to AI tools.
5. Theory + Lab 3: SQL joins. Working with AI tools.
6. Theory + Lab 4: SQL views.
7. Types of NoSQL databases. CAP theorem. Wrap-up.

Overview – Grading policy

Regular examination period (1st epoch)
• A group project (35%).
• Final exam (65%).

Resit examination period (2nd epoch)
• A group project (35%).
• Final exam (65%).


Rules:
• The exam score must be at least 9.0 (out of 20).
• Teams are made up of 4-5 students.
• Late project deliveries will be penalized by 1 point for each day late, up to a maximum of 5 points.

Choose your team in Moodle

Lecture 1: Introduction to Databases

What is a database?

• A database can be defined as a collection of related data items within a specific business process or problem setting.
• A database management system (DBMS) is the software package used to define, create, use, and maintain a database.
• The combination of a DBMS and a database is then often called a database system.

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management
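
The four DBMS tasks named above (define, create, use, maintain) can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the supplier table, its columns, and the sample rows are all made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in DBMS (assumption: the course itself uses MySQL).
conn = sqlite3.connect(":memory:")

# Define and create: the DBMS records the structure of the data.
# The supplier table and its columns are hypothetical.
conn.execute(
    "CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Use: insert related data items and query them.
conn.executemany(
    "INSERT INTO supplier (name, city) VALUES (?, ?)",
    [("Acme", "Lisboa"), ("Globex", "Porto")],
)
names = [row[0] for row in conn.execute("SELECT name FROM supplier ORDER BY name")]

# Maintain: change existing data through the DBMS.
conn.execute("UPDATE supplier SET city = 'Braga' WHERE name = 'Globex'")
globex_city = conn.execute(
    "SELECT city FROM supplier WHERE name = 'Globex'"
).fetchone()[0]

print(names, globex_city)
```

Here the SQLite library plays the role of the DBMS and the supplier table with its rows is the database; together they form a (very small) database system.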

Data created

[Figure: volume of data created worldwide, in zettabytes]

Source: https://www.statista.com/statistics/871513/worldwide-data-created/

DBMS value

[Figure: size of the database management system (DBMS) market worldwide from 2017 to 2021, in billion U.S. dollars]

Note: Cloud DBMS accounted for the majority of the overall market growth, as database systems are migrating to cloud platforms.

Source: https://www.statista.com/statistics/724611/worldwide-database-market/
DBMS ranking

Source: https://db-engines.com/en/ranking
Method of calculating the scores: https://db-engines.com/en/ranking_definition


Source: https://insights.stackoverflow.com/survey/2021#most-popular-technologies-database

Why do we need a database?

Fictional case study: SOBER, a start-up taxi company that is 1000% driven by technology.

SOBER - 1000% Driven by technology

Services offered:
• Ride-hailing
• Ride-sharing


For business management, which data does SOBER need to store?

For each ride-hailing service:
• time of pick-up and drop-off
• location of pick-up and drop-off
• ride duration
• distance
• number of passengers
• fee
• type of request (via Sober App or hand-waving)
• number and name of the lead customer (the one who pays)
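
The list of ride-hailing data above can be written down as a single record type. The sketch below is one possible reading, not SOBER's actual design: the class name, field names, units (kilometres, euros), and sample values are all assumptions. It also shows that ride duration does not have to be stored separately, because it can be derived from the pick-up and drop-off times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RideHailService:
    """One ride-hailing service; all field names are hypothetical."""
    pickup_time: datetime
    dropoff_time: datetime
    pickup_location: str
    dropoff_location: str
    distance_km: float          # unit is an assumption
    passengers: int
    fee_eur: float              # currency is an assumption
    request_type: str           # "app" (Sober App) or "hand-waving"
    lead_customer_number: int   # the customer who pays
    lead_customer_name: str

    @property
    def duration(self) -> timedelta:
        # Derived from the two timestamps instead of being stored twice.
        return self.dropoff_time - self.pickup_time


ride = RideHailService(
    pickup_time=datetime(2024, 3, 1, 22, 5),
    dropoff_time=datetime(2024, 3, 1, 22, 30),
    pickup_location="Campolide",
    dropoff_location="Alfama",
    distance_km=6.4,
    passengers=2,
    fee_eur=11.5,
    request_type="app",
    lead_customer_number=1001,
    lead_customer_name="Ana",
)
print(ride.duration)
```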

For each ride-sharing service:
• time of pick-up and drop-off
• location of pick-up and drop-off
• ride duration
• distance
• number and names of all customers
• the upfront negotiated fee.
What about accidents?
• accident dates
• accident locations
• damage amounts per car

SOBER is a start-up company and must carefully decide how it will manage all its data. The company is thinking about storing all its data in Word documents, Excel files, and maybe some other files (e.g., Notepad) as well.

Is it a good idea? No.


Why do we need a database? File-based approach

What is the main problem with file systems? Duplicate data.


What is the solution?

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management
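
A small sketch of why duplicate data is a problem in the file-based approach. The two "files" are plain Python lists that stand in for, say, two separate Excel sheets kept by different applications; the file names, record layout, and customer data are invented for the illustration.

```python
# File-based approach: two applications each keep their own copy of the customer data.
invoicing_file = [{"customer_id": 1, "name": "Ana", "address": "Rua A, Lisboa"}]
rides_file = [{"customer_id": 1, "name": "Ana", "address": "Rua A, Lisboa"}]

# The customer moves, but only the invoicing application is updated.
invoicing_file[0]["address"] = "Rua B, Porto"


def inconsistent_customers(file_a, file_b):
    """Return the ids of customers whose copies disagree between the two files."""
    by_id = {record["customer_id"]: record for record in file_b}
    return [
        record["customer_id"]
        for record in file_a
        if record["customer_id"] in by_id and record != by_id[record["customer_id"]]
    ]


conflicts = inconsistent_customers(invoicing_file, rides_file)
print(conflicts)
```

With the data stored once and managed by a DBMS, there is only one copy of the address to update, so this kind of disagreement cannot arise.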
Database approach

The DBMS must support the ACID properties (Atomicity, Consistency, Isolation, Durability).

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management
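
Atomicity, the "A" in ACID, means that a transaction is applied completely or not at all. The sketch below illustrates this with Python's built-in sqlite3 module standing in for MySQL; the account table, the transfer function, and the amounts are hypothetical. A CHECK constraint makes the second update fail, and the first update is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("customer", 50), ("sober", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# 80 exceeds the customer's balance: the debit violates the CHECK constraint,
# so the credit that was already applied is rolled back as well.
failed = transfer(conn, "customer", "sober", 80)
balances_after_failure = dict(conn.execute("SELECT name, balance FROM account"))

succeeded = transfer(conn, "customer", "sober", 30)
balances_after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(failed, balances_after_failure, succeeded, balances_after_success)
```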
The Three-Layer Architecture

The three-layer architecture is an essential element of every database application and describes how the different underlying data models are related.

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management

The conceptual and logical data models focus on the data items, their characteristics, and relationships.

• user-friendly
• implementation-independent

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management

The external data model offers a window on a selected part of the logical data model.

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management
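
In relational systems, an external data model is commonly realized with views. The sketch below is an illustration under assumptions: it uses SQLite through Python's sqlite3 module instead of MySQL, and the employee table, the employee_directory view, and the sample rows are made up. The view exposes only the columns a directory application needs and hides the salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical data model: the full employee table (hypothetical).
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, department, salary) VALUES (?, ?, ?)",
    [("Ana", "Sales", 2100.0), ("Rui", "IT", 2500.0)],
)

# External data model: a window on a selected part of the logical model.
conn.execute("CREATE VIEW employee_directory AS SELECT name, department FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```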

The internal data model specifies how the data are stored or organized
physically.

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management
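
One way to see the internal data model at work is to add an index: the physical organization changes, but the query and its result stay the same. The sketch below assumes SQLite (via Python's sqlite3 module) rather than MySQL; the supplier table and the index name idx_supplier_name are hypothetical. EXPLAIN QUERY PLAN is SQLite-specific and is used here only to show that the index is picked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO supplier (name, city) VALUES (?, ?)",
    [("Acme", "Lisboa"), ("Globex", "Porto")],
)

query = "SELECT city FROM supplier WHERE name = ?"
before = conn.execute(query, ("Globex",)).fetchall()

# Change at the internal (physical) level only: add an index on name.
conn.execute("CREATE INDEX idx_supplier_name ON supplier (name)")
after = conn.execute(query, ("Globex",)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
plan = " ".join(
    str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Globex",))
)
print(before, after, plan)
```

The query text did not change when the index was added, which is the point of separating the logical from the internal data model.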
The Three-Layer Architecture - Example

Three-layer architecture for a procurement business process

Source: Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018). Principles of Database Management
Database users

The business user will run applications to perform specific database operations. He/she can also directly query the database using interactive querying facilities for reporting purposes.

Roles in Data Management

The information architect (or information analyst) designs the conceptual data model, preferably in dialogue with the business users.

The database designer translates the conceptual data model into a logical and internal data model.

The data owner has the authority to ultimately decide on the access to, and usage of, the data.


Data stewards are the data quality experts in charge of ensuring the quality of both the actual business data and the corresponding metadata.

The database administrator (DBA) is responsible for the implementation and monitoring of the database.

The data scientist analyzes data using state-of-the-art analytical techniques to provide new insights into, for example, customer behavior.
Database design process
What is a model?

The database design process

The aim is to understand the different steps and data needs of the process. Techniques: interviews, surveys, inspections of documents, etc.


The information architect and the business user formalize the requirements in a conceptual data model. This is a high-level model, easy to understand for the business user and formal enough for the database designer who will use it in the next step.

The logical data model is based upon the implementation environment. At this stage it is already known what type of DBMS (e.g., RDBMS, OODBMS, etc.) will be used, but the product itself (e.g., Microsoft, IBM, Oracle) has not been decided yet.
Finally: The logical data model can be mapped to an internal data model by the database designer. In this step, the DBMS product is known. The database can then be populated with data and is ready for use.

The entity-relationship model

An ENTITY TYPE represents a business concept with an unambiguous meaning to a particular set of users.
Examples?

“ENTITY TYPE is anything that might deserve its own table in your database model” (Tekstenuitleg.net).


An entity is one particular occurrence or instance of an entity type.


An attribute type represents a property of an entity type. As an example, name and address are attribute types of the entity type supplier.

An attribute is an instance of an attribute type.
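
A loose programming analogy may help with the four terms (entity type, entity, attribute type, attribute). The sketch below is only an analogy, not part of the ER notation: a Python dataclass plays the role of the entity type supplier, its fields play the role of attribute types, one object is an entity, and the value of a field on that object is an attribute. The sample values are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class Supplier:      # entity type
    name: str        # attribute type
    address: str     # attribute type


# One particular occurrence of the entity type: an entity.
acme = Supplier(name="Acme", address="Rua A, Lisboa")

# Attribute types belong to the entity type ...
attribute_types = [field.name for field in fields(Supplier)]

# ... while an attribute is the value an individual entity has for one of them.
acme_name = acme.name
print(attribute_types, acme_name)
```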

Relationship

A relationship represents an association between two or more entities. A relationship type then defines a set of relationships among instances of one, two, or more entity types.

Cardinalities

Every relationship type can be characterized in terms of its cardinalities, which specify the minimum and maximum number of relationship instances that an individual entity can participate in.

Example: A student is enrolled in a minimum of 1 and a maximum of M (many) courses.
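
The student/course example can be checked against data. The sketch below is an assumption-laden illustration: it uses SQLite through Python's sqlite3 module rather than MySQL, and the tables student, course, and enrollment and their rows are invented. It counts how many enrollment relationships each student participates in and lists the students who violate a minimum cardinality of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One row per relationship instance between a student and a course.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(id),
    course_id  INTEGER NOT NULL REFERENCES course(id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Rui'), (3, 'Eva');
INSERT INTO course  VALUES (10, 'Databases'), (20, 'Statistics');
INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# Number of enrollment relationships each student participates in.
courses_per_student = dict(conn.execute("""
    SELECT s.name, COUNT(e.course_id)
    FROM student s
    LEFT JOIN enrollment e ON e.student_id = s.id
    GROUP BY s.id, s.name
"""))

# Students that break the minimum cardinality of 1.
violations = sorted(name for name, count in courses_per_student.items() if count < 1)
print(courses_per_student, violations)
```

The maximum cardinality of "many" shows up in the composite primary key: a student may appear in any number of enrollment rows, but only once per course.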


Small exercise

Design an entity relationship model (conceptual)

Sober has decided to invest in a new database and begin a database design process. As a first step, it wants to formalize the data requirements in a conceptual data model. Sober asks you to build an entity-relationship model for its business setting.

Think about the ENTITY TYPES that you may need for SOBER.
Think about the RELATIONSHIPS between those ENTITY TYPES.
Draw the diagram on paper.

How to install MySQL
• MySQL installation will NOT be done in labs.
• You can download MySQL Community Edition from here (choose Windows (x86, 64-bit), MSI Installer):
https://dev.mysql.com/downloads/mysql/
• If the installer requests the C++ redistributable package, you can download it from here:
https://aka.ms/vs/17/release/vc_redist.x64.exe
• You can download MySQL Workbench from here: https://dev.mysql.com/downloads/workbench/
Installation videos:
• Windows: https://www.youtube.com/watch?v=u96rVINbAUI
https://www.youtube.com/watch?v=6Gxm6mZ4c7w
• Mac: https://www.youtube.com/watch?v=-BDbOOY9jsc
• https://www.youtube.com/watch?v=5tjKVkbWglY
• Linux: https://www.youtube.com/watch?v=ohln8gMWxYg
More info about installation:
• The Workbench: https://dev.mysql.com/doc/workbench/en/wb-installing.html
• The MySQL server: https://dev.mysql.com/doc/refman/8.0/en/installing.html

END OF LECTURE 1
