Data Analysis and Data Visualization Basics 2

The document provides an overview of data analysis and visualization basics, focusing on visualization tools, statistics, and correlation trends. It highlights the importance of visual data representation, the types of statistics, and measures of central tendency and dispersion. Additionally, it discusses correlation and trends, emphasizing their applications and limitations in various fields.

Uploaded by

ezepraise080

Data Analysis and Data Visualization Basics (Pt2)

Overview
Visualization Tools

Statistics.

Statistical Summaries.

Correlation and Trends.

Visualization Tools


Statistics on Visualization

Our brains value visuals over any other type of information.

90% of the information transmitted to the brain is visual – (MIT)

The human brain can process an image in just 13 milliseconds – (MIT)

50% of the brain is active in visual processing - (Piktochart)

Human brains process visuals 60,000 times faster than they do text – (University of Minnesota)
Visualization Tools

Tool | Features | Best For
Matplotlib | Basic charts and plots | Foundational plotting in Python
Seaborn | Advanced statistical plots | Visualizing relationships and trends
ggplot | Grammar of Graphics framework | Data visualization in R
Tableau | Interactive dashboards | Business intelligence and analytics
Power BI | Real-time data analytics | Dynamic reporting in enterprises

Power BI Visualization
Install Power BI Desktop and get started.

Tableau Visualization
Install Tableau Public and get started.

Seaborn, Matplotlib, ggplot

Statistics


What is Statistics?

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data to make informed decisions.

It is a foundational tool in many fields, including business, healthcare, engineering, social sciences, and natural sciences.

It is a foundation for data analysis and data science.

Types of Statistics

Descriptive Statistics
• Summarizes and describes the main features of a dataset.
• Does not make predictions or infer conclusions beyond the data.
• Common techniques include:
  • Measures of central tendency
  • Measures of dispersion
  • Data visualizations

Inferential Statistics
• Makes predictions, inferences, and generalizations about a population based on a sample.
• Involves probability theory and hypothesis testing.
• Key concepts include:
  • Estimation
  • Hypothesis Testing
  • Regression and Correlation Analysis

Key Concepts in Statistics

Population
The entire group of individuals or items being studied

Sample
A subset of the population used for analysis.

Probability
A measure of how likely an event is to occur, ranging from 0 (impossible) to 1 (certain).

Outliers
Data points significantly different from others. Can distort statistical measures like mean and variance.

Statistical Methods

Data Collection
Surveys, experiments, observational studies, and simulations.

Data Analysis
Techniques for summarizing and exploring data.

Hypothesis Testing
• A systematic method to test assumptions about a dataset.
• Includes null and alternative hypotheses, evaluated with statistical tests.

Regression Analysis
• Models the relationship between variables
• Simple Linear Regression and Multiple Regression.
Applications of Statistics

Healthcare
Social Sciences
Engineering
Education
Business

Challenges in Statistics

01 Data Quality
02 Bias
03 Interpretation

Statistical Summaries

Measures of Central Tendency

Measures of central tendency are statistical metrics that represent the center or typical value of a dataset.

They provide a summary of the data by identifying a single value that reflects the overall distribution.

Mean
Median
Mode

Mean

The sum of all data points divided by the number of data points.

Affected by all values in the dataset, including outliers.

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Where:
$x_i$ : Each data point
$n$ : Total number of data points

Mean (Example 1)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Mean = (15 + 16 + 16 + 17 + 18 + 16 + 17 + 16 + 25 + 17) / 10 = 173 / 10

Mean = 17.3

Mean (Example 2)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 50, 17

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Mean = (15 + 16 + 16 + 17 + 18 + 16 + 17 + 16 + 50 + 17) / 10 = 198 / 10

Mean = 19.8
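
The two mean examples can be reproduced with a short Python sketch. It assumes the standard-library `statistics` module; the variable names are illustrative and not part of the slides. It shows how replacing a single value (25 with 50) pulls the mean upward.

```python
from statistics import mean

# Ages from Example 1 and Example 2; only the ninth value differs.
ages_example_1 = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]
ages_example_2 = [15, 16, 16, 17, 18, 16, 17, 16, 50, 17]

# mean() returns the sum of the values divided by how many there are.
mean_1 = mean(ages_example_1)
mean_2 = mean(ages_example_2)

print("Example 1 mean:", mean_1)
print("Example 2 mean:", mean_2)
print("Shift caused by the single outlier:", mean_2 - mean_1)
```
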

Mode

The value(s) that occur most frequently in a dataset.

Applicable for both numerical and categorical data.

Not influenced by extreme values.

A dataset can be:
• Unimodal: One mode.
• Bimodal: Two modes.
• Multimodal: More than two modes.

Mode (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Value | Frequency
15 | 1
16 | 4
17 | 3
18 | 1
25 | 1

Mode = 16
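
The frequency count above can be sketched in Python as follows. This assumes Python 3.8 or later for `statistics.multimode`, which returns every value tied for the highest count, so it also covers bimodal and multimodal data. Variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

# Count how often each value occurs (the frequency table).
frequencies = Counter(ages)

# multimode() returns a list with all most-frequent values.
modes = multimode(ages)

for value in sorted(frequencies):
    print(value, frequencies[value])
print("Mode(s):", modes)
```
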
Median

The middle value in a sorted dataset.

Robust to outliers and skewed data.

Best used for datasets with extreme values or non-symmetrical distributions.

If the dataset has an even number of observations, the median is the average of the two middle values.
Median (Example)

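Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Even number of observations (n = 10):
Sorted: 15, 16, 16, 16, 16, 17, 17, 17, 18, 25
Median = (16 + 17) / 2 = 16.5

Odd number of observations (n = 9, the same data without 25):
Sorted: 15, 16, 16, 16, 16, 17, 17, 17, 18
Median = 16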
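
Both median cases can be checked with a minimal Python sketch, assuming the standard-library `statistics.median`, which sorts the data and averages the two middle values when the count is even. The list names are illustrative.

```python
from statistics import median

# Ten observations: the median is the average of the 5th and 6th sorted values.
ages_even = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

# Nine observations (same data without 25): the median is the 5th sorted value.
ages_odd = [15, 16, 16, 17, 18, 16, 17, 16, 17]

median_even = median(ages_even)
median_odd = median(ages_odd)

print("Median (n = 10):", median_even)
print("Median (n = 9):", median_odd)
```
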
Advantages and Disadvantages

Measure | Advantages | Disadvantages
Mean | Easy to calculate; uses all data. | Sensitive to outliers and skewed distributions.
Median | Not affected by extreme values. | Ignores some data points; less informative.
Mode | Easy to understand; works for any data. | May not exist or may not be unique.

Choosing the Right Measure

• Mean: Use when the data is symmetrically distributed without outliers.

• Median: Use when the data is skewed or contains outliers.

• Mode: Use for categorical data or to identify the most common value(s) in numerical data.
Measures of Dispersion

• Measures of dispersion quantify the spread or variability of data in a dataset.

• They indicate how much the data points differ from each other and from the measure of central tendency (mean, median, mode).

Types of Measures

Range

Variance

Standard Deviation

Interquartile Range
Range

• The difference between the maximum and minimum values in a dataset.

• Only considers the extremes, ignoring the distribution of the data.

Range = Maximum Value − Minimum Value

Range (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Sorted: 15 (Min), 16, 16, 16, 16, 17, 17, 17, 18, 25 (Max)

Range = Maximum Value − Minimum Value

Range = 25 − 15 = 10
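
As a small Python sketch of the same calculation, using only the built-in `max` and `min` functions (the variable names are illustrative):

```python
ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

# The range only looks at the two extreme values.
data_range = max(ages) - min(ages)

print("Min:", min(ages), "Max:", max(ages), "Range:", data_range)
```
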
Variance

The average of the squared differences from the mean.

Measures how far data points are spread around the mean.

Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Variance (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean = 17.3

Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Variance = [(15 − 17.3)² + (16 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (18 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (16 − 17.3)² + (25 − 17.3)² + (17 − 17.3)²] / 10 = 72.1 / 10

Variance = 7.21
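
The variance example divides by n, which is the population variance. A minimal Python sketch, assuming the standard-library `statistics` module, reproduces it step by step and with `pvariance`. For comparison it also computes the sample variance (`statistics.variance`), which divides by n − 1 and is not the formula used on these slides.

```python
from statistics import mean, pvariance, variance

ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

# Step by step: average of the squared differences from the mean.
x_bar = mean(ages)
squared_diffs = [(x - x_bar) ** 2 for x in ages]
manual_variance = sum(squared_diffs) / len(ages)

# Library equivalents: population (divide by n) and sample (divide by n - 1).
population_variance = pvariance(ages)
sample_variance = variance(ages)

print("Manual population variance:", round(manual_variance, 2))
print("pvariance:", round(population_variance, 2))
print("Sample variance (n - 1):", round(sample_variance, 2))
```
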
Standard Deviation

The square root of the variance, providing a measure of dispersion in the same units as the data.

Indicates the typical distance of data points from the mean.

Preferred over variance for interpretability.

Standard Deviation: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Standard Deviation (Example 1)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean = 17.3

Standard Deviation: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Standard Deviation = √{[(15 − 17.3)² + (16 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (18 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (16 − 17.3)² + (25 − 17.3)² + (17 − 17.3)²] / 10}

Standard Deviation = √7.21 ≈ 2.69

Standard Deviation (Example 2)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean = 17.3
Variance = 7.21

Standard Deviation: $\sigma = \sqrt{\text{Variance}}$

Standard Deviation = √7.21 ≈ 2.69
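
Both routes to the standard deviation can be sketched in Python, assuming the standard-library `statistics` and `math` modules. As in the examples, this is the population form (dividing by n); the variable names are illustrative.

```python
import math
from statistics import pstdev, pvariance

ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

# Route 1: take the square root of the population variance.
sd_from_variance = math.sqrt(pvariance(ages))

# Route 2: let the library compute the population standard deviation directly.
sd_direct = pstdev(ages)

print("Square root of variance:", sd_from_variance)
print("pstdev:", sd_direct)
```

Both routes give roughly 2.685, in the same units (years) as the ages themselves.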

Interquartile Range

The range of the middle 50% of the data

Robust to outliers.

Useful for understanding data spread in non-symmetrical distributions.

Calculated as the difference between the 75th percentile (Q3) and the 25th percentile (Q1).

IQR = Q3 – Q1
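Interquartile Range (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Sorted: 15, 16, 16, 16, 16, 17, 17, 17, 18, 25

Lower half: 15, 16, 16, 16, 16, so Q1 = 16
Upper half: 17, 17, 17, 18, 25, so Q3 = 17

Q1 = 16, Q2 (median) = 16.5, Q3 = 17, Q4 (maximum) = 25

IQR = Q3 – Q1
IQR = 17 – 16
IQR = 1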
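
One way to reproduce this example is the median-of-halves method sketched below in Python. The function name `quartiles` is hypothetical, and the method is an assumption about how the slide obtained Q1 and Q3. Statistical libraries often interpolate between data points instead and can return slightly different quartiles on other datasets.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves.

    Intended for datasets with at least two values. For an odd count,
    the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)

ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

q1, q2, q3 = quartiles(ages)
iqr = q3 - q1

print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
```

The outlier 25 has no effect on the result, which is why the IQR is described as robust.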
Advantages and Disadvantages

Measure | Advantages | Disadvantages
Range | Simple to compute. | Affected by outliers; ignores data distribution.
Variance | Accounts for all data points. | Units are squared, making interpretation harder.
Standard Deviation | Easy to interpret; uses same units as data. | Sensitive to outliers.
IQR | Robust to outliers. | Ignores data outside Q1 and Q3.

Choosing the Right Measure

• Range: Quick and simple but sensitive to outliers.

• Variance/Standard Deviation: Best for understanding variability around the mean.

• IQR: Effective for skewed data or datasets with outliers.

Applications

Finance

Quality Control

Healthcare

Education
Correlation and Trends

Correlation

• Correlation measures the strength and direction of the relationship between two variables.

• It indicates whether and how strongly pairs of variables are related.

• Helps identify relationships and dependencies between variables, which is crucial for predictive modeling.

• Measured by a coefficient ranging from −1 to +1.

Limitations

Does not imply causation.

Sensitive to outliers, which can distort the correlation coefficient r.

Assumes a linear relationship; does not capture non-linear ones.

Types of Correlation

Positive Correlation
Both variables move in the same direction.

Negative Correlation
One variable increases while the other decreases.

No Correlation
No discernible relationship between the variables.
Scatterplot

• A visual tool to represent correlation.

[Figure: example scatterplots. Positive correlation: wage vs. hours. Negative correlation: energy vs. hours. Further panels: age vs. temperature, wage vs. hours, price vs. quantity; the surviving panel label is "No Correlation".]
Correlation Coefficient (r)

• A numerical measure that quantifies correlation

• Range: −1 ≤ r ≤ 1

• Interpretation:
r = 1: Perfect positive correlation.
r = −1: Perfect negative correlation.
r = 0: No correlation.

[Figure: correlation strength scale from −1.0 to 1.0, running from strong, moderate, and weak negative correlation, through no correlation at 0, to weak, moderate, and strong positive correlation.]
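
The slides do not give a formula for r. A common choice is the Pearson coefficient, $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$, sketched below in plain Python. The function name `pearson_r` and the hours, wage, and energy values are hypothetical illustrations, not data from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how x and y vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data: wage rises with hours worked, energy falls.
hours = [2, 4, 6, 8, 10]
wage = [1000, 2000, 3000, 4000, 5000]
energy = [50, 40, 30, 20, 10]

r_positive = pearson_r(hours, wage)
r_negative = pearson_r(hours, energy)

print("hours vs. wage:", r_positive)
print("hours vs. energy:", r_negative)
```

Because both made-up relationships are perfectly linear, the results sit at the two ends of the scale; real data usually falls somewhere in between.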
Applications of Correlation

Business

Education

Healthcare
Trends

Trends indicate the general direction in which data values change over time or another independent variable.

They guide decision-making by revealing past patterns and helping to anticipate future behavior.

Limitations
Can be influenced by random fluctuations or external factors.

Long-term trends may mask short-term variations.

Types of Trends

Upward Trend
Values increase over time. Example: Increasing adoption of AI

Downward Trend
Values decrease over time. Example: Decline in open defecation

Sideways Trend
A variable remains relatively stable over time


No Trend
Data shows no consistent pattern.
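
One possible way to label a series as upward, downward, or sideways is to fit a straight line and look at the sign of its slope, as in the Python sketch below. The function names, the `tolerance` threshold, and the sample series are all hypothetical. A slope near zero cannot by itself tell a stable series from one with no consistent pattern, so a plot is still needed.

```python
def trend_slope(values):
    """Least-squares slope of values against time steps 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    denominator = sum((t - mean_t) ** 2 for t in range(n))
    return numerator / denominator

def classify_trend(values, tolerance=0.1):
    """Label a series by its slope; tolerance is an assumed cut-off."""
    slope = trend_slope(values)
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "sideways"

# Hypothetical series, one per trend type.
rising = [10, 12, 15, 19, 24]
falling = [30, 26, 21, 18, 12]
flat = [20, 21, 20, 21, 20]

for name, series in [("rising", rising), ("falling", falling), ("flat", flat)]:
    print(name, classify_trend(series), trend_slope(series))
```
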
Applications of Trends

Business

Climate Science

Healthcare
Differences between Correlation and Trends

Feature | Correlation | Trends
Focus | Relationship between two variables. | Direction of change in one variable.
Visual Tool | Scatter plot. | Line chart, time-series plot.
Key Metric | Correlation coefficient (r). | Slope or pattern direction.
Application | Studies dependencies (e.g., height vs. weight). | Studies changes over time (e.g., sales growth).
Conclusion

Statistics is a foundation of data analysis and data science.

Measures of central tendency describe the center of the data.

Measures of dispersion describe the spread.

Correlation shows a relationship, not causation.

Trends show how values change over time.

Assignment

1. Research all the visualization tools discussed today. Which do you think is better, and why?

2. When should you not visualize data?

3. Which type of correlation best describes each of the following, and why?

i. Age and height
ii. Salary and years in an organization
