STATISTICAL ANALYSIS AND DATA PRE-PROCESSING
Dr. Nana Yaw Duodu
Computer Science Department
Accra Technical University
STATISTICAL ANALYSIS
COMPUTER SCIENCE DEPARTMENT
7/15/2025 FACULTY OF APPLIED SCIENCES 2
• A statistic is any quantity computed from the values in a sample and used for a
statistical purpose.
• Statistical purposes include estimating a population parameter, describing a sample, and evaluating
a hypothesis.
• Analysis is the act of studying or examining a phenomenon in detail in order to
discover or understand more about it.
• This understanding leads to the formulation of opinions and judgments.
• Example: "Our financial experts conducted an independent analysis of the investment plan's performance."
• Statistical Analysis is the process of collecting large volumes
of data and then using statistics and other data analysis
techniques to identify trends, patterns, and insights.
• Put another way, it is the procedure of gathering and
evaluating large volumes of data in order to identify trends
and derive useful insights.
• Statistics are applied every day by data scientists and others
in research, industry, and government.
• In professional practice, Statistical Analysts take raw data and identify
correlations among variables to reveal trends and patterns to stakeholders.
• Statistical Analysts work in many fields, where their findings contribute to new
scientific discoveries.
• Their work helps improve the health of communities and supports informed business
decisions.
TYPES OF STATISTICAL ANALYSIS
1. Descriptive Analysis: Descriptive Statistical Analysis involves gathering, analysing, and summarising
data and presenting it in the form of charts, diagrams, and tables. Rather than drawing conclusions, it
makes complicated data easy to read and understand.
2. Inferential Analysis: Inferential Statistical Analysis focuses on drawing conclusions about an entire
population from the sample data analysed. It examines the relationships between variables and
makes predictions for the population as a whole.
3. Predictive Analysis: Predictive Statistical Analysis examines data to identify past trends and
forecast future events based on them. It uses Machine Learning (ML) algorithms, Data Modelling,
Data Mining, and Artificial Intelligence (AI).
4. Prescriptive Analysis: Prescriptive Analysis analyses data and recommends the best action based on
the outcomes. It assists you in making informed decisions.
5. Exploratory Data Analysis: Exploratory Analysis resembles Inferential Analysis, but it focuses on
discovering previously unknown associations in the data. It explores the potential
relationships within the data.
6. Causal Analysis: Causal Statistical Analysis determines the cause-and-effect relationships
between variables within the raw data. Put simply, it explains why something occurs and how it
affects other variables. Businesses can use it to identify the reason for a failure.
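To make the difference between descriptive and inferential analysis concrete, here is a minimal sketch using only Python's standard library. The sample values are hypothetical. The 95% interval uses a normal approximation, which is an assumption made for brevity; a t-distribution would be more appropriate for a sample this small.

```python
import math
import statistics

# Hypothetical sample: exam scores of 10 students drawn from a larger class
sample = [62, 70, 55, 81, 74, 68, 90, 59, 77, 64]

# Descriptive analysis: summarise the sample itself
n = len(sample)
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential analysis: estimate the population mean from the sample
# with a 95% confidence interval (normal approximation, an assumption)
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
margin = z * stdev / math.sqrt(n)
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean:.1f}, median={median:.1f}, stdev={stdev:.2f}")
print(f"95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f})")
```

The first group of values describes only the ten observed scores. The interval is a statement about the wider class they were drawn from.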
BENEFITS OF STATISTICAL ANALYSIS IN DATA SCIENCE
1. Data-driven Decision-making: Statistical Analysis helps in making informed decisions based on
data instead of relying solely on intuition, leading to a systematic and objective approach to
decision-making.
2. Pattern Recognition: Statistical Analysis reveals patterns in a sample of data and allows you to draw
conclusions about populations or phenomena from that sample. This enables you to make broader
generalisations and predictions from limited observations.
3. Risk Assessment: It helps assess and quantify risks by analysing historical data or simulations. This
is crucial in fields like finance and insurance, where risk management is a key concern.
4. Data Validation and Cleaning: Statistical Analysis helps identify data quality issues such as missing
values, outliers, or errors. This is essential for ensuring the reliability of the analysis.
5. Resource Optimisation: In business and operations, Statistical Analysis can optimise resource
allocation by, for example, minimising costs, maximising profit, or optimising the production
process.
6. Quality Control: Statistical quality control techniques help in monitoring and improving product or
process quality. They identify deviations from desired standards so that corrective action can be taken.
METHODS OF STATISTICAL ANALYSIS IN DATA SCIENCE
• Statistical Analysis includes various methods and techniques to examine and interpret data. Some
common methods of Statistical Analysis are:
1. Mean: The mean, or average, is one of the most widespread Statistical Analysis methods. It describes the central
tendency of the data and is very easy to compute. The mean is calculated by adding the numbers in the data set and dividing
the sum by the number of data points.
2. Standard Deviation: Standard deviation is another widely used statistical tool. It measures how far the data points
typically deviate from the mean of the whole data set, that is, how the data set is spread around the mean.
3. Regression: Regression is a statistical tool that models the relationship between a dependent variable and one or more
independent variables. On its own it shows association rather than proof of cause and effect. It is typically used to forecast
future trends and events.
4. Hypothesis Testing: Hypothesis testing helps in testing the validity of an argument or a conclusion against a
data set. A hypothesis is an assumption made at the start of the research.
5. Sample Size Determination and Sampling: Sample size determination decides how many observations to collect, and data
sampling is the method used to draw a representative sample from the population. This approach is used when the
population is too large to study in full. You can choose among different data sampling methods, such as snowball,
convenience, and random sampling.
STATISTICAL ANALYSIS - Descriptive Analysis
• Mean:
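As a small worked example with hypothetical values, the mean is the sum of the values divided by how many values there are. The sketch computes it by hand and then checks the result against the standard library.

```python
import statistics

# Hypothetical data set
values = [4, 8, 15, 16, 23, 42]

# Mean = sum of the values / number of values (here 108 / 6)
manual_mean = sum(values) / len(values)

# The standard library gives the same result
library_mean = statistics.mean(values)

print(manual_mean, library_mean)
```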
STATISTICAL ANALYSIS - Inferential Analysis
• Standard Deviation:
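A short worked example with hypothetical values: the standard deviation is the square root of the average squared deviation from the mean. The sketch shows both the population form (divide by n) and the sample form (divide by n - 1), then cross-checks them against the standard library.

```python
import math
import statistics

# Hypothetical data set
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)  # 40 / 8

# Squared deviation of each value from the mean
squared_devs = [(x - mean) ** 2 for x in values]

# Population standard deviation: divide by n
population_sd = math.sqrt(sum(squared_devs) / len(values))

# Sample standard deviation: divide by n - 1 (Bessel's correction)
sample_sd = math.sqrt(sum(squared_devs) / (len(values) - 1))

# Cross-check against the standard library
assert math.isclose(population_sd, statistics.pstdev(values))
assert math.isclose(sample_sd, statistics.stdev(values))

print(f"population SD = {population_sd:.3f}, sample SD = {sample_sd:.3f}")
```

Use the sample form when the data is a sample from a larger population, which is the usual case in practice.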
STATISTICAL ANALYSIS - Predictive Analysis
STRUCTURE OF DATASET
• Structured data is highly specific and is stored in a predefined format, whereas
unstructured data is a collection of many varied types of data that are stored in
their native formats.
• This means that structured data relies on schema-on-write (the schema is applied
when the data is stored), while unstructured data relies on schema-on-read
(structure is applied only when the data is read).
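The contrast between schema-on-write and schema-on-read can be illustrated with a small standard-library sketch. The `Customer` schema, the helper functions, and the sample records are all hypothetical and stand in for a real database or data lake.

```python
import json
from dataclasses import dataclass


# Schema-on-write: the structure is fixed and checked before data is stored.
@dataclass
class Customer:  # hypothetical schema
    customer_id: int
    name: str
    age: int


def write_structured(store: list, record: dict) -> None:
    """Coerce a record to the Customer schema, then store it.

    Raises KeyError or ValueError if the record does not fit the schema.
    """
    customer = Customer(
        customer_id=int(record["customer_id"]),
        name=str(record["name"]),
        age=int(record["age"]),
    )
    store.append(customer)


# Schema-on-read: raw items are stored as-is and interpreted only when read.
def read_unstructured(raw_items: list) -> list:
    """Interpret each raw item as JSON if possible, else keep it as text."""
    parsed = []
    for item in raw_items:
        try:
            parsed.append(json.loads(item))
        except json.JSONDecodeError:
            parsed.append({"text": item})
    return parsed


table = []
write_structured(table, {"customer_id": "1", "name": "Ama", "age": "34"})

raw_store = ['{"customer_id": 2, "note": "called twice"}', "free-form email body"]
records = read_unstructured(raw_store)

print(table)
print(records)
```

A record that does not fit the schema is rejected at write time in the first case. In the second case anything can be stored, and the cost of making sense of it is paid at read time.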
SOURCES OF DATASET
NATURE OF DATASET
INTRODUCTION TO DATA PRE-PROCESSING
• Data pre-processing, a component of data preparation, describes any type of
processing performed on raw data to prepare it for another data processing
procedure.
• It has traditionally been an important preliminary step for data mining.
• Raw data is often incomplete, inconsistent, and noisy. Data pre-processing cleans
and prepares it for analysis or machine learning.
• It improves data quality and consistency, which in turn tends to improve modelling accuracy.
PRE-PROCESSING STEPS IN PYTHON
1. Handling missing values (e.g., dropna, fillna)
2. Encoding categorical variables (LabelEncoder, OneHotEncoder)
3. Feature scaling (StandardScaler, MinMaxScaler)
4. Removing duplicates and irrelevant features
5. Normalization and transformation
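The steps above can be sketched on a small hypothetical DataFrame. The column names and values are invented for illustration. `OneHotEncoder` is applied to an input feature, while `LabelEncoder` is applied to the target column, which is the use scikit-learn documents it for. The log transform is just one possible transformation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, OneHotEncoder

# Hypothetical raw data with a missing value, a duplicate row,
# a categorical feature ("city") and a categorical target ("churned")
df = pd.DataFrame({
    "age": [25.0, 32.0, None, 41.0, 41.0],
    "income": [3000.0, 4500.0, 5200.0, 8000.0, 8000.0],
    "city": ["Accra", "Kumasi", "Accra", "Tamale", "Tamale"],
    "churned": ["no", "yes", "no", "yes", "yes"],
})

# Handling missing values: fill the numeric gap with the column mean
df["age"] = df["age"].fillna(df["age"].mean())

# Removing duplicates: drop rows that are exact copies of an earlier row
df = df.drop_duplicates().reset_index(drop=True)

# Encoding categorical variables
city_encoder = OneHotEncoder()
city_onehot = city_encoder.fit_transform(df[["city"]]).toarray()
y = LabelEncoder().fit_transform(df["churned"])

# Feature scaling: rescale numeric columns to the [0, 1] range
scaled = MinMaxScaler().fit_transform(df[["age", "income"]])

# Transformation: a log transform to reduce skew in income
df["log_income"] = np.log1p(df["income"])

print(df)
print(city_onehot)
print(scaled)
print(y)
```

In a real project the encoders and scalers would be fitted on the training split only and then applied to the test split, so that no information leaks from test data into the model.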
PYTHON LIBRARIES FOR PRE-PROCESSING
1. pandas: data loading, missing value handling
2. NumPy: array operations and numeric computing
3. scikit-learn: preprocessing tools and encoders
4. matplotlib/seaborn: data visualization
PYTHON LIBRARIES - SAMPLE CODE
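import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the raw data
df = pd.read_csv("data.csv")
# Fill missing values in numeric columns with each column's mean
# (numeric_only=True avoids an error when the file has text columns)
df = df.fillna(df.mean(numeric_only=True))
# Standardise the selected features to zero mean and unit variance
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[["age", "income"]])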
HANDS-ON EXERCISES
• Use the dataset provided to complete the following:
1. Calculate basic statistics.
2. Create relevant visualisations.
3. Identify outliers and missing values.
4. Interpret relationships between variables.
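One possible starting point for the exercise is sketched below on a small hypothetical DataFrame, because the course dataset is not shown here. The column names and the 1.5 × IQR outlier rule are assumptions, and the visualisation step is left as a commented hint.

```python
import pandas as pd

# Hypothetical stand-in for the provided dataset
df = pd.DataFrame({
    "hours_studied": [2, 4, 5, 7, 8, 10, 11, 40],
    "exam_score": [50, 55, 60, 68, 70, 80, None, 85],
})

# Basic statistics (count, mean, std, quartiles) per numeric column
summary = df.describe()

# Missing values per column
missing_counts = df.isna().sum()

# Outliers in one column using the 1.5 * IQR rule
q1 = df["hours_studied"].quantile(0.25)
q3 = df["hours_studied"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["hours_studied"] < q1 - 1.5 * iqr) | (
    df["hours_studied"] > q3 + 1.5 * iqr
)
outliers = df[is_outlier]

# Relationships between variables: pairwise Pearson correlation
correlations = df.corr()

# A visualisation could be added with, for example (requires matplotlib):
#   df.plot(kind="scatter", x="hours_studied", y="exam_score")

print(summary)
print(missing_counts)
print(outliers)
print(correlations)
```

The correlation is computed on the rows where both values are present, so the missing exam score is skipped rather than treated as zero.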
NATURE OF VISUALIZATION