Introduction to Data Science
DR. NANA YAW DUODU
SYSTEM UNIT
COMPUTER SCIENCE DEPARTMENT
● Demonstrate a thorough knowledge of data science concepts.
● Demonstrate an understanding of data science processes.
● Explain data analysis.
15 July 2025 FACULTY OF APPLIED SCIENCES 2
What is Data Science?
DATA IS THE NEW OIL
What is Data?
• In Data Science, data refers to raw facts, figures, or information collected from various
sources, which are then processed and analyzed to generate meaningful insights or
knowledge.
What is Data Science?
● Definition: Data Science is an interdisciplinary
field that uses scientific methods, algorithms,
and systems to extract knowledge and insights
from structured and unstructured data.
● Core Components:
○ Statistics,
○ Machine Learning,
○ Data Engineering,
○ Domain Expertise, and
○ Data Visualization.
DATA SCIENCE TOOLS
Tools you will need in data science
● Math: Probability, Statistics, Linear Algebra, Calculus
● Programming: Python, NumPy, Pandas, Matplotlib, SciPy
● Machine Learning: Regression, Classification, Clustering
● Deep Learning: Neural Networks, Segmentation, NLP
● Analytics: Power BI, Tableau, Qlik Sense
Types of Data: Based on Structure
1. a. Structured Data
• Organized in rows and columns (e.g., databases, spreadsheets)
• Easy to store, query, and analyze
• Examples: Excel sheets, SQL tables
b. Unstructured Data
• Does not follow a predefined format
• Difficult to search and process
• Examples: Text files, images, audio, videos, emails
c. Semi-Structured Data
• Contains both structured and unstructured elements
• Often marked with tags or metadata
• Examples: JSON, XML, HTML files, NoSQL databases
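The difference between the three structures shows up in how much work is needed before the data can be analyzed. The following is a minimal sketch using only the Python standard library; the CSV text, JSON text, and free-text note are hypothetical samples made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical CSV).
csv_text = "name,age\nAma,21\nKofi,23\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
ages = [int(row["age"]) for row in rows]

# Semi-structured: fields are tagged, but records may differ in shape
# (hypothetical JSON; the second record has no "skills" key).
json_text = '[{"name": "Ama", "skills": ["Python", "SQL"]}, {"name": "Kofi"}]'
records = json.loads(json_text)
skill_counts = [len(rec.get("skills", [])) for rec in records]

# Unstructured: free text with no predefined fields; any structure
# (here, a simple word count) has to be imposed by the analyst.
note = "Ama submitted the report late because the server was down."
word_count = len(note.split())

print(ages)
print(skill_counts)
print(word_count)
```

The structured rows can be queried by column name directly, the semi-structured records need a default for absent keys, and the unstructured note yields nothing until some processing rule is chosen.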
Types of Data: Based on Measurement Level
2. a. Nominal Data (Categorical)
• Labels or names with no specific order
• Examples: Gender, Country, Blood type
b. Ordinal Data
• Categorical data with a meaningful order
• Intervals between levels are not necessarily equal
• Examples: Education level (High school, Bachelor's, Master's), Ratings (Good, Fair, Poor)
c. Discrete Data
• Countable values, typically whole numbers
• Examples: Number of students, cars, transactions
d. Continuous Data
• Measurable and can take any value within a range
• Examples: Height, weight, temperature, salary
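The data type determines which summary statistics are meaningful. The sketch below assumes four small hypothetical samples, one per type, and picks a summary that suits each; the rank mapping for the ratings is an assumption made for the example.

```python
import statistics

# Hypothetical sample values, one list per data type.
blood_types = ["O", "A", "O", "B", "O", "AB"]        # nominal
ratings = ["Poor", "Good", "Fair", "Good", "Good"]   # ordinal
students_per_class = [30, 28, 35, 30]                # discrete
heights_cm = [160.5, 172.0, 168.2, 181.3]            # continuous

# Nominal: categories have no order, so only counting makes sense (mode).
nominal_mode = statistics.mode(blood_types)

# Ordinal: order matters, so map categories to ranks and take the median.
rank = {"Poor": 0, "Fair": 1, "Good": 2}
label = {value: key for key, value in rank.items()}
ordinal_median = label[statistics.median_low([rank[r] for r in ratings])]

# Discrete: counts can be summed exactly.
discrete_total = sum(students_per_class)

# Continuous: measurements support the mean and standard deviation.
continuous_mean = statistics.mean(heights_cm)
continuous_stdev = statistics.stdev(heights_cm)

print(nominal_mode, ordinal_median, discrete_total, continuous_mean)
```

Taking the mean of the rating ranks would also run, but it would treat the gaps between Poor, Fair, and Good as equal, which ordinal data does not guarantee.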
Types of Data: Based on Nature
3. a. Quantitative Data
• Numeric and measurable
• Includes both discrete and continuous data
b. Qualitative Data
• Descriptive and categorical
• Includes nominal and ordinal data
Data Science Process
DATA SCIENCE WORKFLOW
01 BUSINESS UNDERSTANDING: Identify the problem that the study must address.
02 DATA COLLECTION: Collect data that serves your study’s objectives.
03 DATA CLEANING: Fix inconsistencies in the data and handle missing values.
04 FEATURE ENGINEERING: Transform your raw data into relevant and meaningful features.
05 PREDICTIVE MODELS: Train machine/deep learning models, evaluate their performance, and use them to make predictions.
06 DATA VISUALIZATION: Communicate the findings to stakeholders and illustrate them with interactive visualizations.
Workflow figure credit: CS316: Introduction to Data Science, Anis Koubaa, 2024.
Data Science Process
1. Data Collection: Gathering raw data from various sources
(databases, APIs, sensors, etc.).
2. Data Cleaning: Handling missing data, outliers, and formatting
issues.
3. Data Exploration: Analyzing data patterns, distributions,
and anomalies through descriptive statistics and
visualizations.
4. Feature Engineering: Transforming raw data into
meaningful inputs for machine learning models.
5. Model Building: Developing predictive or inferential models
using statistical learning methods.
6. Model Evaluation: Measuring model performance through
metrics like accuracy, precision, recall, etc.
7. Deployment & Monitoring: Integrating the model into production
and continuously monitoring performance.
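Steps 1 to 6 above can be sketched end to end on a toy problem. This is an illustrative sketch only: the study-hours records, the `feature` and `training_accuracy` helpers, and the single-threshold "model" are all assumptions made for the example, and a real project would normally use a library such as scikit-learn.

```python
import statistics

# 1. Data collection: hypothetical records (hours studied, pass/fail label).
raw = [
    {"hours": 1.0, "passed": 0}, {"hours": 2.0, "passed": 0},
    {"hours": None, "passed": 1},  # a record with a missing value
    {"hours": 3.0, "passed": 0}, {"hours": 4.0, "passed": 1},
    {"hours": 5.0, "passed": 1}, {"hours": 6.0, "passed": 1},
    {"hours": 2.5, "passed": 0}, {"hours": 4.5, "passed": 1},
    {"hours": 3.5, "passed": 1}, {"hours": 5.5, "passed": 1},
    {"hours": 1.5, "passed": 0},
]

# 2. Data cleaning: drop records whose feature is missing.
clean = [r for r in raw if r["hours"] is not None]

# 3. Data exploration: a quick descriptive statistic.
mean_hours = statistics.mean(r["hours"] for r in clean)

# Hold out the last five clean records for evaluation.
train, test = clean[:6], clean[6:]

# 4. Feature engineering: scale hours by the training maximum.
max_hours = max(r["hours"] for r in train)

def feature(record):
    return record["hours"] / max_hours

# 5. Model building: pick the threshold with the best training accuracy.
def training_accuracy(threshold):
    hits = sum((feature(r) >= threshold) == bool(r["passed"]) for r in train)
    return hits / len(train)

threshold = max((feature(r) for r in train), key=training_accuracy)

# 6. Model evaluation on the held-out records.
predicted = [int(feature(r) >= threshold) for r in test]
actual = [r["passed"] for r in test]
tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Step 7 (deployment and monitoring) is left out because it depends on the production environment. Note that the model is evaluated on records it never saw during training, which is what makes the reported metrics meaningful.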
Course Data Analytics
Missing Data
Table: sample course gradebook (quiz, assignment, project, and final exam scores, with totals). Rows containing a "-" entry or fewer values than the others have missing data.
7.65 10 9.11 10 10 18.71 18.5 10 19.25 19.5 19.5 58 15 14 9 38 96
8.77 10 8.67 10 18.97 20 10 20 19.75 20 59 11 10 5 26 85
8.89 10 9.33 10 10 19.29 18.5 10 19.25 17 17 56 12 12 5 29 85
2.41 10 - 10 10 16.2 20 10 20 11 11 48 10 8 4 22 70
7.16 10 10 10 10 18.86 10 20 17 17 56 14 13 7 34 90
7.53 10 7.56 10 10 18.03 20 10 20 16 16 55 11 11 6 28 83
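Two common ways to handle missing entries like those in the gradebook are to drop them or to fill them in. The sketch below is a minimal illustration; the `raw_cells` values are sample scores in the style of the table, and the `parse` helper and the choice of mean imputation are assumptions made for the example.

```python
import statistics

# Illustrative score column; "-" and "" stand in for missing entries.
raw_cells = ["9.11", "8.67", "9.33", "-", "10", "7.56"]

def parse(cell):
    """Return a float, or None when the cell is blank or a placeholder."""
    cell = cell.strip()
    return None if cell in ("", "-") else float(cell)

scores = [parse(cell) for cell in raw_cells]

# Detect: which positions are missing?
missing_positions = [i for i, s in enumerate(scores) if s is None]

# Option 1: deletion. Keep only the observed values.
observed = [s for s in scores if s is not None]

# Option 2: mean imputation. Replace each gap with the observed mean.
fill_value = statistics.mean(observed)
imputed = [fill_value if s is None else s for s in scores]

print(missing_positions)
print(len(observed), len(imputed))
```

Deletion shrinks the dataset, while mean imputation keeps every row but reduces the variance of the column, so the choice depends on how much data is missing and why.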
Data Analysis
• Data analysis is the process of examining, transforming, and interpreting raw data in order
to extract useful information, identify patterns, and support decision-making.
Objectives of Data Analysis
i. To understand what the data is showing
ii. To identify trends, patterns, and relationships
iii. To test hypotheses or assumptions
iv. To inform business or research decisions
• Descriptive analytics: uses historical data to identify trends and relationships.
• Predictive analytics: uses statistical models and machine learning techniques to predict future trends.
• Prescriptive analytics: analyzes data and provides instant recommendations on how to optimize business practices to suit multiple predicted outcomes.
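The three kinds of analysis described above (summarizing historical data, predicting future trends, and recommending an action) can be contrasted on one small dataset. This is a sketch under stated assumptions: the monthly sales figures, the `stock_on_hand` value, and the reorder rule are hypothetical, and the prediction uses a plain least-squares line rather than any particular library.

```python
# Hypothetical monthly sales figures (units sold), oldest first.
sales = [100, 110, 120, 130, 140, 150]
months = list(range(len(sales)))
n = len(sales)

# Summarize history: what has already happened?
average_sales = sum(sales) / n

# Predict: fit a least-squares line and extrapolate one month ahead.
mean_x = sum(months) / n
mean_y = average_sales
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x
forecast = intercept + slope * n

# Recommend: turn the forecast into an action (hypothetical rule).
stock_on_hand = 145
recommendation = "reorder" if forecast > stock_on_hand else "hold"

print(average_sales, forecast, recommendation)
```

Each stage builds on the previous one: the recommendation depends on the forecast, and the forecast depends on the historical record.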
Exploratory Data Analysis (EDA)
• Exploratory Data Analysis (EDA) is a crucial step in the data science process where a data
scientist explores datasets to understand their structure, detect patterns, spot anomalies, and
test assumptions before building models.
Definition:
• EDA is the process of analyzing data sets visually and statistically to summarize their
main characteristics and gain initial insights.
● Purpose
○ Understand the structure and content of the data
○ Identify patterns, trends, and relationships
○ Detect missing values, outliers, and inconsistencies
○ Guide feature selection and hypothesis formation
○ Support data cleaning and preprocessing
○ Understand data characteristics before modeling
● Techniques
○ Descriptive Statistics: Mean, median, variance, etc.
○ Data Visualization: Histograms, scatter plots, box plots, etc.
○ Correlation Analysis: Identifying relationships between variables.
● Tools
○ Pandas, Matplotlib, Seaborn (Python); ggplot2, dplyr (R).
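The descriptive statistics and correlation techniques listed above can be tried without any third-party packages. The sketch below is illustrative: the `hours` and `scores` values are hypothetical, the `pearson` helper is written out by hand, and the 1.5 × IQR rule is one common (assumed) convention for flagging outliers.

```python
import math
import statistics

# Hypothetical dataset: hours studied and exam score for eight students.
hours = [2, 3, 4, 5, 6, 7, 8, 9]
scores = [52, 55, 61, 64, 70, 73, 79, 10]  # the last score looks anomalous

# Descriptive statistics.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
variance_score = statistics.variance(scores)  # sample variance

# Outlier detection with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [s for s in scores if s < low or s > high]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Correlation analysis, with and without the flagged outlier.
r_all = pearson(hours, scores)
kept = [(h, s) for h, s in zip(hours, scores) if s not in outliers]
r_clean = pearson([h for h, _ in kept], [s for _, s in kept])

print(f"median={median_score} outliers={outliers}")
print(f"r with outlier={r_all:.2f}, without={r_clean:.2f}")
```

A single anomalous value is enough to change the sign of the correlation here, which is why EDA comes before modeling. With the tools named above, the same checks are typically done with pandas `DataFrame.describe()` and `DataFrame.corr()` plus a Matplotlib or Seaborn box plot.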