
Introduction To Data Science - Lect 1

The document provides an introduction to Data Science, defining it as an interdisciplinary field that utilizes scientific methods and algorithms to extract insights from data. It outlines the data science process, including data collection, cleaning, exploration, feature engineering, model building, evaluation, and deployment. Additionally, it discusses types of data, data analysis objectives, and the importance of exploratory data analysis (EDA) in understanding data characteristics.

Uploaded by

Isaac Acquah
Copyright: © All Rights Reserved

Introduction to Data Science

DR. NANA YAW DUODU


SYSTEM UNIT
COMPUTER SCIENCE DEPARTMENT

● Demonstrate a thorough knowledge of data science concepts.

● Demonstrate an understanding of data science processes.

● Explain data analysis.

15 July 2025 FACULTY OF APPLIED SCIENCES 2


What is Data Science?


DATA IS THE NEW OIL


What is Data?

• In Data Science, data refers to raw facts, figures, or information collected from various
sources, which are then processed and analyzed to generate meaningful insights or
knowledge.



What is Data Science?

● Definition: Data Science is an interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
● Core Components:
○ Statistics,
○ Machine Learning,
○ Data Engineering,
○ Domain Expertise, and
○ Data Visualization.



DATA SCIENCE TOOLS

Tools you will need in data science

● Math: Probability, Statistics, Linear Algebra, Calculus
● Programming: Python, NumPy, Pandas, SciPy, Matplotlib
● Machine Learning: Regression, Classification, Clustering
● Deep Learning: Neural Networks, Segmentation, NLP
● Analytics: Power BI, Tableau, Qlik Sense


1. Types of Data: Based on Structure

a. Structured Data
• Organized in rows and columns (e.g., databases, spreadsheets)
• Easy to store, query, and analyze
• Examples: Excel sheets, SQL tables
b. Unstructured Data
• Does not follow a predefined format
• Difficult to search and process
• Examples: Text files, images, audio, videos, emails
c. Semi-Structured Data
• Contains both structured and unstructured elements
• Often marked with tags or metadata
• Examples: JSON, XML, HTML files, NoSQL databases
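
As a minimal sketch of the difference between structured and semi-structured data, the Python snippet below loads a small CSV text and a small JSON text. The sample records and field names are invented for illustration and do not come from the lecture.

```python
import csv
import io
import json

# Structured: every row follows the same columns (hypothetical sample).
csv_text = "id,name,age\n1,Ama,21\n2,Kofi,23\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: fields are tagged with keys, but records may differ in shape.
json_text = (
    '[{"id": 1, "name": "Ama", "tags": ["student"]},'
    ' {"id": 2, "name": "Kofi", "address": {"city": "Accra"}}]'
)
records = json.loads(json_text)

# Collect the distinct "shapes" found in each source.
csv_shapes = {tuple(row.keys()) for row in rows}
json_shapes = {tuple(sorted(rec.keys())) for rec in records}

print("CSV shapes:", csv_shapes)    # one shape shared by all rows
print("JSON shapes:", json_shapes)  # one shape per record here
```

Because every CSV row shares one shape, it maps directly onto a table. The JSON records need extra handling (for example, flattening the nested address) before they fit into rows and columns.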



2. Types of Data: Based on Measurement Level

a. Nominal Data (Categorical)
• Labels or names with no specific order
• Examples: Gender, Country, Blood type
b. Ordinal Data
• Categorical data with a meaningful order
• No consistent difference between levels
• Examples: Education level (High school, Bachelor's, Master's), Ratings (Good, Fair, Poor)
c. Discrete Data
• Countable values, usually whole numbers
• Examples: Number of students, cars, transactions
d. Continuous Data
• Measurable and can take any value within a range
• Examples: Height, weight, temperature, salary
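
The measurement level decides which summaries are meaningful. The sketch below, using made-up sample values and Python's standard library, applies one suitable operation to each of the four levels. The variable names and the rank mapping are assumptions for illustration only.

```python
from statistics import mean, mode

# Hypothetical sample values, one list per measurement level.
blood_types = ["O", "A", "O", "B"]                     # nominal
education = ["Master's", "High school", "Bachelor's"]  # ordinal
students_per_class = [32, 28, 35]                      # discrete
heights_cm = [162.5, 170.2, 158.9]                     # continuous

# Nominal: labels can only be counted, so the mode is a valid summary.
most_common_blood_type = mode(blood_types)

# Ordinal: values can be ranked, but the gap between ranks is not measured.
EDUCATION_RANK = {"High school": 0, "Bachelor's": 1, "Master's": 2}
education_sorted = sorted(education, key=lambda level: EDUCATION_RANK[level])

# Discrete: counts can be added up.
total_students = sum(students_per_class)

# Continuous: averages are meaningful.
average_height = mean(heights_cm)

print(most_common_blood_type, education_sorted, total_students, average_height)
```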



3. Types of Data: Based on Nature

a. Quantitative Data
• Numeric and measurable
• Includes both discrete and continuous data
b. Qualitative Data
• Descriptive and categorical
• Includes nominal and ordinal data
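
A rough way to separate the two natures in code is to check whether every value in a column is a number. The helper `nature_of` and the sample table below are hypothetical. The check is only a heuristic: numeric codes such as phone numbers are still qualitative even though they look like numbers.

```python
def nature_of(values):
    """Return 'quantitative' if every value is a number, else 'qualitative'."""
    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    return "quantitative" if all(is_number(v) for v in values) else "qualitative"


# Hypothetical columns for illustration.
table = {
    "country": ["Ghana", "Kenya", "Ghana"],  # nominal
    "rating": ["Good", "Fair", "Poor"],      # ordinal
    "transactions": [3, 7, 2],               # discrete
    "temperature_c": [29.5, 24.1, 30.2],     # continuous
}

nature = {name: nature_of(column) for name, column in table.items()}
print(nature)
```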

Data Science Process


DATA SCIENCE WORKFLOW

01 BUSINESS UNDERSTANDING
Identify the problem that must be considered in the study.

02 DATA COLLECTION
Collect data that serves your study's objectives.

03 DATA CLEANING
Fix inconsistencies in the data and handle missing values.

04 FEATURE ENGINEERING
Transform your raw data into relevant and meaningful features.

05 PREDICTIVE MODELS
Build models: train machine/deep learning models, evaluate their performance, and use them to make predictions.

06 DATA VISUALIZATION
Communicate the findings to stakeholders and illustrate them with interactive visualizations.

Workflow figure source: CS316: Introduction to Data Science, Anis Koubaa, 2024

Data Science Process

1. Data Collection: Gathering raw data from various sources (databases, APIs, sensors, etc.).
2. Data Cleaning: Handling missing data, outliers, and formatting issues.
3. Data Exploration: Analyzing data patterns, distributions, and anomalies through descriptive statistics and visualizations.
4. Feature Engineering: Transforming raw data into meaningful inputs for machine learning models.
5. Model Building: Developing predictive or inferential models using statistical learning methods.
6. Model Evaluation: Measuring model performance through metrics like accuracy, precision, recall, etc.
7. Deployment & Monitoring: Integrating the model into production and continuously monitoring performance.
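
The sketch below walks a toy dataset through a few of these steps: collection, cleaning, model building, and evaluation. Everything in it is an assumption made for illustration: the records are invented, the "model" is a simple threshold on one feature rather than a real machine learning method, and the exploration, feature engineering, and deployment steps are left out to keep it short.

```python
from statistics import mean

# Step 1, data collection: hypothetical records of (hours_studied, passed).
raw = [(1.0, 0), (2.0, 0), (None, 1), (4.0, 1), (5.0, 1),
       (3.0, 0), (6.0, 1), (2.5, 1), (1.5, 0), (4.5, 1)]

# Step 2, data cleaning: drop records whose feature value is missing.
clean = [(hours, label) for hours, label in raw if hours is not None]

# Hold out the last three records so evaluation uses unseen data.
train, test = clean[:6], clean[6:]

# Step 5, model building: a one-feature threshold classifier.
# The threshold is the midpoint between the mean hours of each class.
mean_fail = mean(hours for hours, label in train if label == 0)
mean_pass = mean(hours for hours, label in train if label == 1)
threshold = (mean_fail + mean_pass) / 2


def predict(hours):
    """Predict 1 (pass) when hours exceed the learned threshold."""
    return 1 if hours > threshold else 0


# Step 6, model evaluation: count outcomes on the held-out records.
tp = sum(1 for h, y in test if predict(h) == 1 and y == 1)
fp = sum(1 for h, y in test if predict(h) == 1 and y == 0)
fn = sum(1 for h, y in test if predict(h) == 0 and y == 1)
tn = sum(1 for h, y in test if predict(h) == 0 and y == 0)

accuracy = (tp + tn) / len(test)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"threshold={threshold}, accuracy={accuracy:.2f}, "
      f"precision={precision:.2f}, recall={recall:.2f}")
```

Accuracy is the share of all predictions that are correct, precision is the share of predicted passes that are real passes, and recall is the share of real passes that the model found.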
Course Data Analytics

Missing Data

Student score sheet with missing entries. Column labels recovered from the slide: Quiz 1, Quiz 2, Assignments 1 to 6, Programming Assignment total (Real), Project total (Real), Course total (Real), Final Exam (OpenCV, ROS, TF), Final Exam Total, Major Exam total (Real). A "-" or a row with fewer values than the others marks a missing entry.

7.65 10 9.11 10 10 18.71 18.5 10 19.25 19.5 19.5 58 15 14 9 38 96
8.77 10 8.67 10 18.97 20 10 20 19.75 20 59 11 10 5 26 85
8.89 10 9.33 10 10 19.29 18.5 10 19.25 17 17 56 12 12 5 29 85
2.41 10 - 10 10 16.2 20 10 20 11 11 48 10 8 4 22 70
7.16 10 10 10 10 18.86 10 20 17 17 56 14 13 7 34 90
7.53 10 7.56 10 10 18.03 20 10 20 16 16 55 11 11 6 28 83
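
A minimal sketch of detecting and filling a missing score is shown below. It uses the third value from the four complete rows of the sheet, where one cell is "-". Reading that column as Quiz 2 and filling the gap with the mean of the observed values are both assumptions made for illustration; the lecture does not prescribe a specific method.

```python
from statistics import mean

# Third value of rows 1, 3, 4 and 6 of the sheet; "-" marks the missing entry.
quiz2_raw = ["9.11", "9.33", "-", "7.56"]


def parse_score(cell):
    """Convert a cell to float, or None when the cell is empty or '-'."""
    cell = cell.strip()
    return None if cell in {"", "-"} else float(cell)


quiz2 = [parse_score(cell) for cell in quiz2_raw]

# Detect: count how many entries are missing.
missing_count = sum(score is None for score in quiz2)

# Handle: replace each missing entry with the mean of the observed ones.
observed = [score for score in quiz2 if score is not None]
fill_value = mean(observed)
quiz2_filled = [fill_value if score is None else score for score in quiz2]

print(missing_count, quiz2_filled)
```

Mean imputation keeps the column average unchanged but shrinks its spread, so dropping the record or using the median can be better choices depending on the data.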


Data analysis

• Data analysis is the process of examining, transforming, and interpreting raw data in order
to extract useful information, identify patterns, and support decision-making.



Objectives of Data analysis

i. To understand what the data is showing

ii. To identify trends, patterns, and relationships

iii. To test hypotheses or assumptions

iv. To inform business or research decisions



● Descriptive analytics: uses historical data to identify trends and relationships.

● Predictive analytics: uses statistical models and machine learning techniques to predict future trends.

● Prescriptive analytics: analyzes data and provides instant recommendations on how to optimize business practices to suit multiple predicted outcomes.
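
To make the contrast between looking at historical data and predicting future trends concrete, the sketch below summarizes a short sales history and then extrapolates it with a least-squares line. The sales figures are invented, and a straight-line fit is only one simple choice of statistical model.

```python
# Hypothetical monthly sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]

# Descriptive: summarize what has already happened.
average_sales = sum(sales) / len(sales)
monthly_changes = [later - earlier for earlier, later in zip(sales, sales[1:])]

# Predictive: fit sales = intercept + slope * month by least squares.
mean_x = sum(months) / len(months)
mean_y = average_sales
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
sxx = sum((x - mean_x) ** 2 for x in months)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Extrapolate the fitted line to the next month.
forecast_month_5 = intercept + slope * 5

print(average_sales, monthly_changes, forecast_month_5)
```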


Exploratory Data Analysis (EDA)

• Exploratory Data Analysis (EDA) is a crucial step in the data science process where a data
scientist explores datasets to understand their structure, detect patterns, spot anomalies, and
test assumptions before building models.
Definition:
• EDA is the process of analyzing data sets visually and statistically to summarize their
main characteristics and gain initial insights.


● Purpose
○ Understand the structure and content of the data
○ Identify patterns, trends, and relationships
○ Detect missing values, outliers, and inconsistencies
○ Guide feature selection and hypothesis formation
○ Support data cleaning and preprocessing
○ Understand data characteristics before modeling
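
One common way to detect outliers during EDA is the interquartile range (IQR) rule, sketched below with Python's standard library. The rule, the 1.5 multiplier, and the sample values are assumptions chosen for illustration; the lecture does not name a specific outlier method.

```python
from statistics import quantiles

# Hypothetical measurements with one suspicious value.
values = [10, 12, 11, 13, 12, 11, 95]

# quantiles(..., n=4) returns the three quartile cut points.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Values more than 1.5 * IQR outside the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(outliers)
```

A flagged value is a prompt to investigate, not proof of an error: it may be a typo, a unit mismatch, or a genuine extreme case.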


● Techniques

○ Descriptive Statistics: Mean, median, variance, etc.

○ Data Visualization: Histograms, scatter plots, box plots, etc.

○ Correlation Analysis: Identifying relationships between variables.

● Tools

○ Pandas, Matplotlib, Seaborn (Python); ggplot2, dplyr (R).
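
The descriptive statistics and correlation analysis listed above can be sketched as follows. To stay self-contained, the example uses Python's standard library rather than Pandas, and the two small columns (`hours`, `scores`) are invented for illustration.

```python
from math import sqrt
from statistics import mean, median, variance

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

# Descriptive statistics for one variable (variance is the sample variance).
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "variance": variance(scores),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(spread_x * spread_y)


# Correlation analysis: values near +1 indicate a strong positive linear link.
r = pearson(hours, scores)

print(summary, round(r, 3))
```

In practice the same summaries are usually obtained from a Pandas DataFrame, with Matplotlib or Seaborn used to draw the histograms, scatter plots, and box plots.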
