Data Analysis and Data Visualization Basics 2

The document provides an overview of data analysis and visualization basics, focusing on visualization tools, statistics, and correlation trends. It highlights the importance of visual data representation, the types of statistics, and measures of central tendency and dispersion. Additionally, it discusses correlation and trends, emphasizing their applications and limitations in various fields.

Uploaded by

ezepraise080

Data Analysis and Data Visualization Basics (Pt2)

Overview
Visualization Tools

Statistics

Statistical Summaries

Correlation and Trends

Visualization Tools

Statistics on Visualization

Our brains value visuals over any other type of information.

90% of the information transmitted to the brain is visual – (MIT)

The human brain can process an image in just 13 milliseconds – (MIT)

50% of the brain is active in visual processing - (Piktochart)

Human brains process visuals 60,000 times faster than they do text – (University of Minnesota)
Visualization Tools

Tool         Features                        Best For
Matplotlib   Basic charts and plots          Foundational plotting in Python
Seaborn      Advanced statistical plots      Visualizing relationships and trends
ggplot       Grammar of Graphics framework   Data visualization in R
Tableau      Interactive dashboards          Business intelligence and analytics
Power BI     Real-time data analytics        Dynamic reporting in enterprises

Power BI Visualization: Install Power BI Desktop and get started.

Tableau Visualization: Install Tableau Public and get started.

Seaborn, Matplotlib, ggplot

Statistics

What is Statistics?

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data to make informed decisions.

It is a foundational tool in many fields, including business, healthcare, engineering, social sciences, and natural sciences.

It is a foundation for data analysis and data science.

Types of Statistics

Descriptive Statistics
• Summarizes and describes the main features of a dataset.
• Does not make predictions or infer conclusions beyond the data.
• Common techniques include:
  • Measures of central tendency
  • Measures of dispersion
  • Data visualizations

Inferential Statistics
• Makes predictions, inferences, and generalizations about a population based on a sample.
• Involves probability theory and hypothesis testing.
• Key concepts include:
  • Estimation
  • Hypothesis Testing
  • Regression and Correlation Analysis
Key Concepts in Statistics

Population
The entire group of individuals or items being studied

Sample
A subset of the population used for analysis.

Probability
A measure of how likely an event is to occur, expressed as a number between 0 and 1.

Outliers
Data points significantly different from others. Can distort statistical measures like mean and variance.
Statistical Methods

Data Collection
Surveys, experiments, observational studies, and simulations.

Data Analysis
Techniques for summarizing and exploring data.

Hypothesis Testing
• A systematic method to test assumptions about a dataset.
• Involves stating a null and an alternative hypothesis and evaluating them with statistical tests.

Regression Analysis
• Models the relationship between variables
• Simple Linear Regression and Multiple Regression.
Applications of Statistics

Healthcare

Social Sciences

Engineering

Education

Business
Challenges in Statistics

01 Data Quality
02 Bias
03 Interpretation

Statistical Summaries

Measures of Central Tendency

Measures of central tendency are statistical metrics that represent the center or typical value of a dataset.

They provide a summary of the data by identifying a single value that reflects the overall distribution.

Mean

Median

Mode
Mean

The sum of all data points divided by the number of data points.

Affected by all values in the dataset, including outliers.

Mean = $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Where:
$x_i$ : Each data point
$n$ : Total number of data points
Mean (Example 1)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean = $\frac{\sum_{i=1}^{n} x_i}{n}$

Mean = (15 + 16 + 16 + 17 + 18 + 16 + 17 + 16 + 25 + 17) / 10 = 173 / 10

Mean = 17.3
Mean (Example 2)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 50, 17

Mean = $\frac{\sum_{i=1}^{n} x_i}{n}$

Mean = (15 + 16 + 16 + 17 + 18 + 16 + 17 + 16 + 50 + 17) / 10 = 198 / 10

Mean = 19.8
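
The two mean examples above can be reproduced with a short Python sketch. The function name `arithmetic_mean` and the variable names are illustrative choices, not part of the slides; the data are the two age lists from the examples.

```python
# Ages from Example 1 and Example 2 (Example 2 swaps 25 for the outlier 50).
ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]
ages_with_outlier = [15, 16, 16, 17, 18, 16, 17, 16, 50, 17]


def arithmetic_mean(values):
    """Sum of all data points divided by the number of data points."""
    return sum(values) / len(values)


mean_1 = arithmetic_mean(ages)
mean_2 = arithmetic_mean(ages_with_outlier)

print(mean_1)  # 17.3
print(mean_2)  # 19.8
```

Changing a single value from 25 to 50 moves the mean by 2.5 years, which illustrates how strongly one outlier can pull the mean.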
Mode

The value(s) that occur most frequently in a dataset.

Applicable for both numerical and categorical data.

Not influenced by extreme values.

A dataset can be:
• Unimodal: One mode.
• Bimodal: Two modes.
• Multimodal: More than two modes.
Mode (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Value   Frequency
15      1
16      4
17      3
18      1
25      1

Mode = 16
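
The frequency count in the mode example can be sketched in Python with `collections.Counter` from the standard library. The variable names are illustrative; returning a list of modes is a design choice made here so that bimodal and multimodal data are handled as well.

```python
from collections import Counter

ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

# Count how often each value occurs.
counts = Counter(ages)

# Keep every value that reaches the highest frequency,
# so a bimodal or multimodal dataset yields more than one mode.
highest = max(counts.values())
modes = sorted(value for value, freq in counts.items() if freq == highest)

print(dict(sorted(counts.items())))  # {15: 1, 16: 4, 17: 3, 18: 1, 25: 1}
print(modes)                         # [16]
```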
Median

The middle value in a sorted dataset.

Robust to outliers and skewed data.

Best used for datasets with extreme values or non-symmetrical distributions.

If the dataset has an even number of observations, the median is the average of the two middle values.
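Median (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Even number of observations (all 10 values, sorted): 15, 16, 16, 16, 16, 17, 17, 17, 18, 25
Median = (16 + 17) / 2 = 16.5

Odd number of observations (9 values, with 25 left out): 15, 16, 16, 16, 16, 17, 17, 17, 18
Median = 16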
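
A minimal sketch of the median rule, covering both the even and the odd case from the example above. The function name `median_of` is an illustrative choice; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle
    values when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ages_even = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]  # 10 observations
ages_odd = [15, 16, 16, 17, 18, 16, 17, 16, 17]       # 9 observations, 25 left out

median_even = median_of(ages_even)
median_odd = median_of(ages_odd)

print(median_even)  # 16.5
print(median_odd)   # 16
```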
Advantages and Disadvantages

Measure   Advantages                                Disadvantages
Mean      Easy to calculate; uses all data.         Sensitive to outliers and skewed distributions.
Median    Not affected by extreme values.           Ignores some data points; less informative.
Mode      Easy to understand; works for any data.   May not exist or may not be unique.

Choosing the Right Measure

• Mean: Use when the data is symmetrically distributed without outliers.

• Median: Use when the data is skewed or contains outliers.

• Mode: Use for categorical data or to identify the most common value(s) in numerical data.
Measures of Dispersion

• Measures of dispersion quantify the spread or variability of data in a dataset.

• They indicate how much the data points differ from each other and from the central tendency (mean, median, mode).

Types of Measures

Range

Variance

Standard Deviation

Interquartile Range
Range

• The difference between the maximum and minimum values in a dataset.

• Only considers the extremes, ignoring the distribution of the data.

Range = Maximum Value − Minimum Value

Range (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Sorted: 15 (Min), 16, 16, 16, 16, 17, 17, 17, 18, 25 (Max)

Range = Maximum Value − Minimum Value

Range = 25 – 15 = 10
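
The range calculation above takes only a line of Python. In this sketch the second list, which reuses the outlier value 50 from Mean (Example 2), is added purely to illustrate how a single extreme value changes the range; the variable names are illustrative.

```python
ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]
ages_with_outlier = [15, 16, 16, 17, 18, 16, 17, 16, 50, 17]

# Range = maximum value - minimum value.
range_1 = max(ages) - min(ages)
range_2 = max(ages_with_outlier) - min(ages_with_outlier)

print(range_1)  # 10
print(range_2)  # 35
```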
Variance

The average of the squared differences from the mean.

Measures how far data points are spread around the mean.

Variance = $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
Variance (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean = 17.3

Variance = $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

Variance = [(15 − 17.3)² + (16 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (18 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (16 − 17.3)² + (25 − 17.3)² + (17 − 17.3)²] / 10 = 72.1 / 10

Variance = 7.21
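
The variance example can be checked with the sketch below. It divides by n, as the slides do, which is usually called the population variance; the function name `population_variance` is an illustrative choice. Python's `statistics.pvariance` computes the same quantity, while `statistics.variance` divides by n − 1 (the sample variance) and therefore gives a slightly larger value.

```python
from statistics import pvariance

ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]


def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


variance = population_variance(ages)

print(round(variance, 2))         # 7.21
print(round(pvariance(ages), 2))  # 7.21, same definition from the standard library
```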
Standard Deviation

The square root of the variance, providing a measure of dispersion in the same
units as the data.

Indicates the average distance from the mean.

Preferred over variance for interpretability.

Standard Deviation = $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$

Standard Deviation (Example 1)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean = 17.3

Standard Deviation = $\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$

Standard Deviation = √{[(15 − 17.3)² + (16 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (18 − 17.3)² + (16 − 17.3)² + (17 − 17.3)² + (16 − 17.3)² + (25 − 17.3)² + (17 − 17.3)²] / 10} = √(72.1 / 10)

Standard Deviation ≈ 2.69

Standard Deviation (Example 2)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Mean = 17.3
Variance = 7.21

Standard Deviation = $\sqrt{\text{Variance}}$ = √7.21

Standard Deviation ≈ 2.69
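
Both routes to the standard deviation (from the raw data, and from the variance) can be sketched as follows. `statistics.pstdev` is the standard-library function for the population standard deviation, which divides by n as the slides do; the variable names are illustrative.

```python
import math
from statistics import pstdev

ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]

# Route 1: directly from the data (population standard deviation).
std_from_data = pstdev(ages)

# Route 2: square root of the variance computed in the variance example.
variance = 7.21
std_from_variance = math.sqrt(variance)

print(round(std_from_data, 4))      # 2.6851
print(round(std_from_variance, 4))  # 2.6851
```

To two decimal places the square root of 7.21 is 2.69, and the result is in years, the same unit as the ages.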

Interquartile Range

The range of the middle 50% of the data

Robust to outliers.

Useful for understanding data spread in non-symmetrical distributions.

Calculated as the difference between the 75th percentile (Q3) and the 25th percentile (Q1).

IQR = Q3 – Q1
Interquartile Range (Example)

Age: 15, 16, 16, 17, 18, 16, 17, 16, 25, 17

Sorted: 15, 16, 16, 16, 16, 17, 17, 17, 18, 25

Q1 = 16, Q2 = 16.5, Q3 = 17, Q4 = 25

IQR = Q3 – Q1
IQR = 17 – 16
IQR = 1
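
The quartiles in the example are consistent with the "median of halves" convention: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half. The sketch below assumes that convention; other quartile definitions (for example, different interpolation methods in statistical libraries) can give slightly different values on other datasets. The function name `quartiles` is an illustrative choice.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.
    For an odd number of observations the middle value is left out
    of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]
    return median(lower), median(ordered), median(upper)


ages = [15, 16, 16, 17, 18, 16, 17, 16, 25, 17]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1

print(q1, q2, q3)  # 16 16.5 17
print(iqr)         # 1
```

The outlier 25 sits above Q3, so it has no effect on the IQR, whereas it accounts for most of the range of 10.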
Advantages and Disadvantages

Measure              Advantages                                    Disadvantages
Range                Simple to compute.                            Affected by outliers; ignores data distribution.
Variance             Accounts for all data points.                 Units are squared, making interpretation harder.
Standard Deviation   Easy to interpret; uses same units as data.   Sensitive to outliers.
IQR                  Robust to outliers.                           Ignores data outside Q1 and Q3.

Choosing the Right Measure

• Range: Quick and simple but sensitive to outliers.

• Variance/Standard Deviation: Best for understanding variability around the mean.

• IQR: Effective for skewed data or datasets with outliers.

Applications

Finance

Quality Control

Healthcare

Education
Correlation and Trends

Correlation

• Correlation measures the strength and direction of the relationship between two variables.

• It indicates whether and how strongly pairs of variables are related.

• Helps identify relationships and dependencies between variables, which is crucial for predictive modeling.

• Measured with numbers ranging from -1 to +1.

Limitations

Does not imply causation.

Sensitive to outliers, which can distort 𝑟.

Assumes linear relationships; does not capture non-linear ones.

Types of Correlation

Positive Correlation
Both variables move in the same direction.

Negative Correlation
One variable increases while the other decreases.

No Correlation
No discernible relationship between the variables.
Scatterplot

• A visual tool to represent correlation.

[Figure: example scatterplots. Positive correlation: wage against hours. Negative correlation: energy against hours. Further panels (age against temperature, wage against hours, price against quantity) accompany the "No Correlation" label.]
Correlation Coefficient (r)

• A numerical measure that quantifies correlation

• Range: −1 ≤ r ≤ 1

• Interpretation:
r = 1: Perfect positive correlation.
r = −1: Perfect negative correlation.
r = 0: No correlation.

[Figure: scale of r from −1.0 to 1.0, running from strong negative through moderate negative and weak negative, no correlation at 0.0, then weak positive, moderate positive, and strong positive.]
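
The slides do not say which coefficient r refers to; the sketch below assumes the common Pearson correlation coefficient. The data are made-up illustrative values loosely based on the axis labels of the scatterplot slide (hours, wage, energy), not measurements from the source, and the function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data for illustration only.
hours = [2, 4, 6, 8, 10]
wage = [1000, 2000, 3000, 4000, 5000]  # rises with hours
energy = [50, 40, 30, 20, 10]          # falls with hours

r_positive = pearson_r(hours, wage)
r_negative = pearson_r(hours, energy)

print(r_positive)  # 1.0
print(r_negative)  # -1.0
```

Because these illustrative points lie exactly on straight lines, r reaches the extremes of its range; real data usually fall somewhere in between.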
Applications of Correlation

Business

Education

Healthcare
Trends

Trends indicate the general direction in which data values change over time or another independent variable.

They guide decision-making by revealing past patterns and helping anticipate future behavior.

Limitations
Can be influenced by random fluctuations or external factors.

Long-term trends may mask short-term variations.

Types of Trends

Upward Trend
Values increase over time. Example: Increasing adoption of AI

Downward Trend
Values decrease over time. Example: Decline in open defecation

Sideways Trend
A variable remains relatively stable over time.

No Trend
Data shows no consistent pattern.
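
One simple way to label a trend, sketched here under stated assumptions, is to fit a least-squares line through the values in time order and look at the sign of its slope. The function name `trend_direction`, the `tolerance` threshold, and the three data series are all hypothetical and chosen for illustration. A slope near zero cannot by itself distinguish a sideways trend from data with no consistent pattern, so a plot of the series is still worth checking.

```python
def trend_direction(values, tolerance=0.1):
    """Classify a series by the slope of its least-squares line.
    `tolerance` is an arbitrary illustrative threshold below which
    the slope is treated as flat. Needs at least two values."""
    n = len(values)
    times = range(n)  # equally spaced time steps 0, 1, 2, ...
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    slope = (
        sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
        / sum((t - mean_t) ** 2 for t in times)
    )
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "sideways"


# Hypothetical series for illustration only.
rising = [10, 12, 15, 19, 24]
falling = [30, 26, 21, 17, 12]
stable = [20, 21, 20, 21, 20]

print(trend_direction(rising))   # upward
print(trend_direction(falling))  # downward
print(trend_direction(stable))   # sideways
```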
Applications of Trends

Business

Climate Science

Healthcare
Differences between Correlation and Trends

Feature       Correlation                                       Trends
Focus         Relationship between two variables.               Direction of change in one variable.
Visual Tool   Scatter plot.                                     Line chart, time-series plot.
Key Metric    Correlation coefficient (r).                      Slope or pattern direction.
Application   Studies dependencies (e.g., height vs. weight).   Studies changes over time (e.g., sales growth).
Conclusion

Statistics is a foundation of data analysis and data science.

Measures of central tendency describe the center of the data.

Measures of dispersion describe the spread.

Correlation shows a relationship, not causation.

Trends show how values change over time.

Assignment

1. Research all the visualization tools discussed today. Which do you think is best, and why?

2. When should you not visualize data?

3. Which type of correlation best describes each of the following, and why?

i. Age and height
ii. Salary and years in an organization
