Lecture Notes

Data Science involves gathering, analyzing, and making decisions based on data to identify patterns and predict future outcomes. It is applicable in various fields such as logistics, e-commerce, and politics, and requires skills in machine learning, statistics, and programming. Data can be structured or unstructured, and structuring data is essential for effective analysis.

Uploaded by Hidingstar

28/3/2025

ST2195
Introduction to Data Science

What is Data Science?

Data Science is about data gathering, analysis and decision-making.
Data Science is about finding patterns in data through analysis and making future predictions.
By using Data Science, companies are able to make:
• Better decisions (should we choose A or B?)
• Predictive analysis (what will happen next?)
• Pattern discoveries (find patterns, or maybe hidden information, in the data)


Where is Data Science Needed?

• For route planning: to discover the best routes to ship
• To foresee delays for flights, ships, trains, etc. (through predictive analysis)
• To create promotional offers
• To find the best-suited time to deliver goods
• To forecast next year's revenue for a company
• To analyze the health benefits of training
• To predict who will win elections

Applications of Data Science

• Consumer goods
• Stock markets
• Industry
• Politics
• Logistics companies
• E-commerce


How Does a Data Scientist Work?

A Data Scientist requires expertise in several backgrounds:
• Machine Learning
• Statistics
• Programming (Python or R)
• Mathematics
• Databases

How a Data Scientist Works:

1. Ask the right questions - To understand the business problem.
2. Explore and collect data - From databases, web logs, customer feedback, etc.
3. Extract the data - Transform the data to a standardized format.
4. Clean the data - Remove erroneous values from the data.
5. Find and replace missing values - Check for missing values and replace them with a suitable value (e.g. an average value).
6. Normalize data - Scale the values to a practical range (e.g. 140 cm is smaller than 1.8 m, yet the number 140 is larger than 1.8, so scaling is important).
7. Analyze data, find patterns and make future predictions.
8. Represent the result - Present the result with useful insights in a way the "company" can understand.
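
Steps 5 and 6 above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the sample pulse values, the helper names fill_missing_with_mean and min_max_scale, and the choice of min-max scaling are all assumptions made for the example.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    average = mean(known)
    return [average if v is None else v for v in values]


def min_max_scale(values):
    """Scale values linearly into the range 0..1."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical pulse readings with one missing value (None).
pulse = [80, 90, None, 100]
filled = fill_missing_with_mean(pulse)  # None becomes the mean of 80, 90, 100
scaled = min_max_scale(filled)

# Heights recorded in mixed units are converted to one unit before comparing:
# 140 cm stays as it is, 1.8 m is expressed in cm.
heights_cm = [140, 1.8 * 100]

print(filled)
print(scaled)
print(heights_cm[0] < heights_cm[1])
```

Once both heights are in the same unit, the comparison gives the expected answer: 140 cm is the smaller value.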


DATA

What is Data?

Data is a collection of information.
One purpose of Data Science is to structure data, making it interpretable and easy to work with.
Data can be categorized into two groups:
• Structured data
• Unstructured data


Unstructured Data
Unstructured data is not organized. We must organize the data for analysis purposes.

Structured Data

• Structured data is organized and easier to work with.
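
As a rough illustration of organizing unstructured data, the sketch below pulls two fields out of free-text notes into a list of dictionaries. The note texts and the regular expression are invented for this example; real unstructured data usually needs far more careful handling.

```python
import re

# Hypothetical free-text workout notes (unstructured).
notes = [
    "Ran for 30 minutes, average pulse 80",
    "Trained 45 minutes with an average pulse 90",
]

# Assumed pattern: a duration in minutes, followed later by an average pulse.
pattern = re.compile(r"(\d+) minutes.*?average pulse (\d+)")

records = []
for note in notes:
    match = pattern.search(note)
    if match:  # skip notes that do not fit the assumed pattern
        records.append({
            "Duration": int(match.group(1)),
            "Average_Pulse": int(match.group(2)),
        })

print(records)
```

Each dictionary now has the same named fields, which is what makes the data structured and easier to work with.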


How to Structure Data?

We can use an array or a database table to structure or present data.
Example of an array:
• [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
Example (array in Python):
Array = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
print(Array)
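
Once the values are in an array, they are easy to work with. The short sketch below uses a Python list holding the same values as the example above and shows a few basic operations; the lowercase variable name and the choice of operations are assumptions made for illustration.

```python
from statistics import mean

array = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

print(len(array))   # number of values in the array
print(array[0])     # first value (indexing starts at 0)
print(array[-1])    # last value
print(mean(array))  # average of all values
```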

Database Table
A database table is a table with structured data.

• The following table shows a database table with health data extracted from a sports watch:

Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
30        80             120        240              10          7
30        85             120        250              10          7
45        90             130        260              8           7
45        95             130        270              8           7
45        100            140        280              0           7
60        105            140        290              7           8
60        110            145        300              7           8
60        115            145        310              8           8
75        120            150        320              0           8
75        125            150        330              8           8
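
To make the idea of a database table concrete, the sketch below loads the first three rows of the table above into an in-memory SQLite database using Python's built-in sqlite3 module and runs one query. The table name "health" and the query are assumptions chosen for the example.

```python
import sqlite3

# Column names and the first three rows of the table above.
columns = ["Duration", "Average_Pulse", "Max_Pulse",
           "Calorie_Burnage", "Hours_Work", "Hours_Sleep"]
rows = [
    (30, 80, 120, 240, 10, 7),
    (30, 85, 120, 250, 10, 7),
    (45, 90, 130, 260, 8, 7),
]

conn = sqlite3.connect(":memory:")  # temporary in-memory database
# The table name "health" is an assumption made for this example.
conn.execute(
    "CREATE TABLE health (" + ", ".join(f"{c} INTEGER" for c in columns) + ")"
)
conn.executemany("INSERT INTO health VALUES (?, ?, ?, ?, ?, ?)", rows)

# Select the sessions that lasted 30 minutes, ordered by average pulse.
short_sessions = conn.execute(
    "SELECT Average_Pulse, Calorie_Burnage FROM health "
    "WHERE Duration = 30 ORDER BY Average_Pulse"
).fetchall()
print(short_sessions)
conn.close()
```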


Database Table Structure

• A database table consists of column(s) and row(s):

        Column 1  Column 2       Column 3   Column 4         Column 5    Column 6
        Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
Row 1   30        80             120        240              10          7
Row 2   30        85             120        250              10          7
Row 3   45        90             130        260              8           7
Row 4   45        95             130        270              8           7
Row 5   45        100            140        280              0           7
Row 6   60        105            140        290              7           8
Row 7   60        110            145        300              7           8
Row 8   60        115            145        310              8           8
Row 9   75        120            150        320              0           8
Row 10  75        125            150        330              8           8
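
The row and column layout above maps naturally onto a list of lists in Python. The sketch below is one possible representation, using only the first three rows, and shows how to read a single row and a single column.

```python
columns = ["Duration", "Average_Pulse", "Max_Pulse",
           "Calorie_Burnage", "Hours_Work", "Hours_Sleep"]

# Rows 1 to 3 of the table above; each inner list is one row.
table = [
    [30, 80, 120, 240, 10, 7],
    [30, 85, 120, 250, 10, 7],
    [45, 90, 130, 260, 8, 7],
]

# A row is one inner list (Row 3 is at index 2, since indexing starts at 0).
row_3 = table[2]

# A column is the value at the same position taken from every row.
max_pulse_index = columns.index("Max_Pulse")
max_pulse = [row[max_pulse_index] for row in table]

print(row_3)
print(max_pulse)
```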

Variables

A variable is defined as something that can be measured or counted.
Examples can be characters, numbers or time.
• In the example below, we can observe that each column represents a variable.
There are 6 columns, meaning that there are 6 variables (Duration, Average_Pulse, Max_Pulse, Calorie_Burnage, Hours_Work, Hours_Sleep).
• There are 11 rows including the header row, which holds the variable names, so each variable has 10 observations.


Variables
Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
30        80             120        240              10          7
30        85             120        250              10          7
45        90             130        260              8           7
45        95             130        270              8           7
45        100            140        280              0           7
60        105            140        290              7           8
60        110            145        300              7           8
60        115            145        310              8           8
75        120            150        320              0           8
75        125            150        330              8           8
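
The counts of variables and observations can be checked with a short script. The sketch below assumes the table has been saved as comma-separated text (an assumption made for this example) and uses Python's built-in csv module to separate the header row from the observations.

```python
import csv
import io

# The table above as comma-separated text.
# The first line is the header row holding the variable names.
text = """Duration,Average_Pulse,Max_Pulse,Calorie_Burnage,Hours_Work,Hours_Sleep
30,80,120,240,10,7
30,85,120,250,10,7
45,90,130,260,8,7
45,95,130,270,8,7
45,100,140,280,0,7
60,105,140,290,7,8
60,110,145,300,7,8
60,115,145,310,8,8
75,120,150,320,0,8
75,125,150,330,8,8
"""

rows = list(csv.reader(io.StringIO(text)))
header, observations = rows[0], rows[1:]

n_rows = len(rows)                  # all rows, including the header
n_variables = len(header)           # one variable per column
n_observations = len(observations)  # rows minus the header row

print(n_rows, n_variables, n_observations)
```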
