Data Visualization

Uploaded by

mariumzahida83
Chapter 3- Introduction To Data Visualization

Data and Statistics
◦ Data
◦ Data sources
◦ Types of statistics
◦ Nature of data
◦ Levels of measurement

Definition
◦ The word statistics has two meanings. In the most common usage, statistics refers
to numerical facts.

◦ The second meaning of statistics refers to the field or discipline of study. In this
sense, statistics is a group of methods used to collect, organize, analyze, and
interpret data and make decisions.

◦ Or, statistics is both the science of uncertainty and the technology of extracting
information from data.

Why Do We Need Statistics?
◦ The study of statistics has become more popular than ever over the past four decades or so. The
increasing availability of computers and statistical packages has enlarged the role of statistics as a
tool of empirical research.

◦ Like almost all fields of study, statistics has two aspects: theoretical and applied. The former, also
called mathematical statistics, deals with the development of theorems, formulas, rules, and laws.
The latter involves the application of those theorems, formulas, rules, and laws.

◦ The main aim of this lecture is to introduce statistics including the nature of data as well as the
levels of measurement that can be used.

◦ Statistics is used to help us make decisions. This is especially important in health care and public
health.
Why Do We Need Statistics? Cont.
◦ Example: CDC & the Flu Vaccine

◦ During the year, the CDC in the USA collects, organizes, analyzes, and interprets numerical
information and data.

◦ They extract information from the data and make decisions about what to include in next
year’s flu vaccine.

Why Do We Need Statistics? Cont.
The importance of statistics:
◦ Statistics helps in gathering information about the appropriate quantitative data.
◦ It presents complex data in graphical, tabular, and diagrammatic form so that it is
easier to understand.
◦ It provides an exact description and a better understanding of sample data.
◦ It helps in the effective and proper planning of a statistical inquiry in
any field (an integral part of research methodology).
◦ It gives valid inferences, with reliability measures, about the population
parameters from the sample data.
◦ It helps to understand the variability pattern through quantitative observations.
Functions of Statistics
Statistics provides methods for:

◦ Design: Planning and carrying out research studies

◦ Description: Summarizing and exploring data

◦ Inference: Making predictions and generalizations about phenomena represented
by sample data.

Definitions
◦ Data are the facts and figures collected, summarized, analyzed, and interpreted.

◦ The data collected in a particular study are referred to as the data set.

◦ The elements are the entities on which data are collected.

◦ A variable is a characteristic/subject of interest for the elements.

◦ The set of measurements collected for a particular element is called an
observation.

Using data collected on student exam scores as an example:

Exam Scores
Exam 1 Exam 2 Exam 3 Exam 4
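
The terms data set, element, variable, and observation can be illustrated with a small sketch. The student names and scores below are hypothetical, made up only to show the structure; they are not taken from the original slide.

```python
# Hypothetical exam scores: names and numbers are invented for illustration.
data_set = {
    "Student A": {"Exam 1": 78, "Exam 2": 85, "Exam 3": 90, "Exam 4": 72},
    "Student B": {"Exam 1": 64, "Exam 2": 70, "Exam 3": 58, "Exam 4": 81},
    "Student C": {"Exam 1": 92, "Exam 2": 88, "Exam 3": 95, "Exam 4": 90},
}

# Elements: the entities on which data are collected (here, the students).
elements = list(data_set)

# Variables: the characteristics of interest (here, the four exams).
variables = list(data_set["Student A"])

# Observation: the set of measurements collected for one element.
observation = data_set["Student B"]

print("Elements:", elements)
print("Variables:", variables)
print("Observation for Student B:", observation)
```

In this layout each row of the table is an observation and each column is a variable.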

In statistics, we commonly use these key terms:

◦ Population is the complete collection of elements to be studied.

◦ Sample is a subcollection of elements drawn from a population.

◦ Variables may be numerical or categorical.

◦ Data are the actual values of the variable. They may be numbers or words.

What is a Sample?
Definition
◦ A sample is a small portion/subset of the population.
◦ It can be a representative sample (it represents the population).
◦ But it can also be a biased sample (not representative).

Example
◦ Only survey ICU nurses at Square hospital: not a representative sample.
◦ At least one nurse from each department of Square hospital: a more
representative sample.
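
The contrast between a biased and a more representative sample can be sketched in a few lines of Python. The staff list, department names, and the fixed random seed below are all assumptions made for illustration; this is one simple way to pick one nurse per department, not a prescribed sampling procedure.

```python
import random

# Hypothetical population: 20 nurses spread evenly over four departments.
departments = ["ICU", "Emergency", "Pediatrics", "Surgery"]
population = [(f"nurse_{i:02d}", departments[i % 4]) for i in range(20)]

# Biased sample: only ICU nurses are surveyed.
biased_sample = [nurse for nurse in population if nurse[1] == "ICU"]

# More representative sample: one randomly chosen nurse per department.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
stratified_sample = []
for dept in departments:
    members = [nurse for nurse in population if nurse[1] == dept]
    stratified_sample.append(rng.choice(members))

print("Departments in biased sample:", {d for _, d in biased_sample})
print("Departments in stratified sample:", {d for _, d in stratified_sample})
```

The biased sample covers a single department, while the second sample includes every department at least once.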

Population Vs. Sample Data
Population Data
◦ In population data, data from every individual/element in the population is
available/collected.
◦ Entire population = census
◦ Example: survey all the professors at Brac University

Sample Data
◦ In sample data, data is only collected from some of the individuals/elements
in the population.
◦ Very commonly used in research studies of patients

Types of Statistics
◦ Broadly speaking, statistics can be divided into two areas: descriptive
statistics and inferential statistics.

◦ Descriptive statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures from samples and
populations.

◦ Inferential statistics consists of methods that use information from a
sample to draw conclusions, make decisions, or make predictions about a
population.
Data Sources
Existing Sources:
◦ Within an organization – almost any department

◦ Database services – NCBI

◦ Government agencies- Bangladesh Bureau of Statistics

◦ Industry associations – Bangladesh Association of Pharmaceutical Industries

◦ Special-interest organizations – Pharmacy Council of Bangladesh

◦ Internet – more and more organizations/firms

Statistical Studies

◦ In experimental studies, the variable of interest is first identified. Then one or
more other variables are identified and controlled, so that data can be obtained
about how they influence the variable of interest.

◦ In observational (non-experimental) studies, no attempt is made to control or
influence the variables of interest, e.g., a survey.

Nature of Data
Two types of data can be identified as qualitative (categorical) and
quantitative (numerical) data.
Qualitative data deals with characteristics and descriptors that cannot be easily
measured. (Categorical data)
◦ It can be separated into different categories that are distinguished by some
non-numerical characteristics.

Qualitative data are the result of categorizing or describing attributes of a
population. Ethnic group, hair colour, blood type are all types of qualitative
data. They are generally described by words or letters.
Categorical (qualitative) data
• Eye color (brown, black, blue)
• Ethnic group (African, Asian, Caucasian)
• Blood type (A+, B+ etc.)

Numerical (quantitative) data
• Discrete data (counts)
• Continuous data (measurements)

*A good common rule for deciding whether data is continuous or discrete: if the unit of measurement can be cut
in half and still make sense, the data is continuous.



Quantitative: Discrete Vs. Continuous Data

◦ Quantitative data consist of numbers representing counts or
measurements.
◦ Discrete data is countable, while continuous data is measurable.

◦ The similarity is that both are types of quantitative
data, also called numerical data.
◦ However, in practice, many statistical tests and decisions depend on
whether the basic data is discrete or continuous.
Quantitative: Discrete Data
◦ Discrete data is a count that involves integers. Only a limited number of
values is possible. The discrete values cannot be subdivided into parts.
◦ For example, the number of children in a school is discrete data. You can
count whole individuals. However, you cannot count 1.5 kids.
Discrete data key characteristics:

• You can count the data. It is usually units counted in whole numbers.
• The values cannot be divided into smaller pieces that carry additional meaning.
• You cannot measure the data. By nature, discrete data is counted rather than measured. For example, you
measure your weight with the help of a scale, so your weight is not discrete data.
• It has a limited number of possible values, e.g., days of the month.
• Discrete data is graphically displayed by a bar graph.
Quantitative: Discrete Data (cont.)
Examples of discrete data:

• The number of students in a class.
• The number of workers in a company.
• The number of parts damaged during transportation.
• Shoe sizes.
• The number of languages an individual speaks.
• The number of test questions you answered correctly.
• The number of instruments on a shelf.
• The number of siblings a randomly selected individual has.

Quantitative: Continuous Data
◦ Continuous data is information that could be meaningfully divided into finer levels.
It can be measured on a scale or continuum and can have almost any numeric value.
◦ For example, you can measure your height at very precise scales: meters,
centimeters, millimeters, etc.

◦ Continuous variables can take any value between two numbers. For example,
between 50 and 72 inches, there are infinitely many possible heights: 52.04762
inches, 69.948376 inches and so on.

◦ You can record continuous data for many different kinds of measurements: width,
temperature, time, etc. This is where the key difference from discrete data lies.

Quantitative: Continuous Data (cont.)
Continuous data key characteristics:

• In general, continuous variables are not counted.
• The values can be subdivided into smaller and smaller pieces, and they have additional
meaning.
• The continuous data is measurable.
• It has an infinite number of possible values within an interval.
• Continuous data is graphically displayed by histograms.
Quantitative: Continuous Data (cont.)
Examples of continuous data:

• The amount of time required to complete a project.
• The height of children.
• The amount of time it takes to sell shoes.
• The amount of rain, in inches, that falls in a storm.
• The square footage of a house.
• The weight of a truck.
• The speed of cars.
• Time to wake up.

Quantitative: Discrete Vs. Continuous Data

Levels of measurement of data
Any variable/data, either quantitative or qualitative, has one of four
different levels of measurement.
The way a set of data is measured is called its level of measurement.
Data can be classified into four levels of measurement. They are:

1. Nominal scale level
2. Ordinal scale level
3. Interval scale level
4. Ratio scale level
1. Nominal scale level:
◦ Data that is measured using a nominal scale is qualitative. It is characterized by
data that consists of names, labels or categories.

◦ Nominal data commonly identify which group a member belongs to, often one of
two groups, e.g. male or female, left or right, yes or no, etc. Nominal data are
not ordered and cannot be used in calculations.

2. Ordinal scale level:
◦ This scale is similar to the nominal scale, but it differs in that the data can be ordered.

◦ For example, responses can be ordered from the most desired response to the
least desired one: excellent, good, satisfactory, unsatisfactory.

* Like the nominal scale, ordinal scale data cannot be used in calculations.
3. The interval scale level
◦ Like the ordinal, with the additional property that meaningful amounts of
differences between data can be determined. However, there is no natural
zero starting point. In other words, the interval scale has a definite ordering,
the difference between interval scale data can be measured, but there is no
starting point.

Example: Temperature scales like Celsius (C) and Fahrenheit (F) are measured by using the
interval scale. In both scales, 40 degrees is equal to 100 degrees minus
60 degrees. Differences make sense. pH is also an example of an interval scale.

* Zero is not the absolute lowest temperature.


*This kind of data can be used in calculations.
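
A short sketch can show why differences are meaningful on an interval scale while ratios are not. It assumes the standard conversion from Celsius to Kelvin (add 273.15); Kelvin is used here only as a comparison scale with a true zero and does not appear in the original slide.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15

warm_c, cool_c = 40.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 40 C is not "twice as hot" as 20 C, because 0 C is an
# arbitrary zero. The ratio changes once the scale has a true zero.
naive_ratio = warm_c / cool_c
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference in Celsius: {difference_c:.1f}")
print(f"Difference in Kelvin:  {difference_k:.1f}")
print(f"Ratio in Celsius: {naive_ratio:.3f}")
print(f"Ratio in Kelvin:  {kelvin_ratio:.3f}")
```

The difference is the same on both scales, but the ratio is not, which is the practical meaning of "no natural zero starting point".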
4. The Ratio scale level
◦ Like the interval level but, in addition, it has a 0 point and ratios can be
calculated. For example, the final exam scores are 18, 15, 10 and 9 (out of
20). This scale must contain a zero value that indicates that nothing exists for
the variable at the zero point.
*The data can be put in order: 9, 10, 15 and 18.
*The differences between data have meaning:
◦ The difference between the scores 18 and 9 is 9 points.
◦ Ratios can be calculated: the smallest possible score is 0.
◦ So, 18 is twice 9. The score of 18 is better than the score of 9.

Interval and ratio measurement levels are the most desirable as we can use the
more powerful statistical procedures available for means and standard deviations.

Exercise: What type of measurement scale is
being used?
1. A satisfaction survey of a social website by number:

◦ 1= Very satisfied, 2= somewhat satisfied, 3= not satisfied

2. Incomes measured in dollars

3. The dates: 1866, 1920, 2010

4. The gender of students

5. Baking temperatures for different dishes: 350, 200, 400, 250, 300

Summarizing Qualitative Data
◦ Frequency Distribution
◦ Relative Frequency Distribution
◦ Percent Frequency Distribution
◦ Bar Graphs
◦ Pie Charts

Frequency Distribution
Frequency is the number of times a data value occurs.
For example, if ten students score 80 in statistics, then the score of 80 has a
frequency of 10. Frequency is often represented by the letter f.

Frequency Distribution: A frequency distribution is a tabular or graphical
summary of data that displays the number of observations within a given category
or interval.
◦ The objective is to provide insights about the data that cannot be quickly obtained by
looking only at raw data.

Can you make any inferences from just
looking at this raw data?

Raw data:
Malay    Chinese
Indian   Others
Malay    Chinese
Indian   Others
Malay    Chinese
Indian   Others
Malay    Others

Observation   Frequency
Malay         4
Chinese       3
Indian        3
Others        4
Total         14
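
Counting raw categorical values into a frequency table can be sketched with Python's `collections.Counter`. The list below is the fourteen raw observations from this slide; the variable names are only illustrative.

```python
from collections import Counter

# The fourteen raw observations from the slide, read row by row.
raw_data = [
    "Malay", "Chinese", "Indian", "Others",
    "Malay", "Chinese", "Indian", "Others",
    "Malay", "Chinese", "Indian", "Others",
    "Malay", "Others",
]

# Counter tallies how many times each category occurs.
frequency = Counter(raw_data)

print(f"{'Observation':<12} Frequency")
for category, count in frequency.items():
    print(f"{category:<12} {count}")
print(f"{'Total':<12} {sum(frequency.values())}")
```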

For example, when data from a classroom are summarized
by gender and by ethnic group, the resulting frequency tables
look like this:

Observation   Frequency
Male          28
Female        22
Total         50

Observation   Frequency
Malay         33
Chinese       9
Indian        6
Others        2
Total         50
Relative Frequency
The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class. A relative frequency distribution is a
tabular summary of a set of data showing the relative frequency for
each class.

Percent Frequency
The percent frequency of a class is the percentage of
the total number of data items belonging to the class.

Ex: Ten people were asked about their marital status:

Single, Single, Single, Divorced, Single, Married, Widowed, Married,
Widowed, Divorced
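
As a minimal sketch, the relative and percent frequencies for these ten responses can be computed by dividing each count by the total. The variable names are illustrative only.

```python
from collections import Counter

# The ten responses from the example.
responses = [
    "Single", "Single", "Single", "Divorced", "Single",
    "Married", "Widowed", "Married", "Widowed", "Divorced",
]

frequency = Counter(responses)
total = len(responses)

# Relative frequency: fraction of all items belonging to the class.
relative = {status: count / total for status, count in frequency.items()}

# Percent frequency: the same proportion expressed as a percentage.
percent = {status: 100 * count / total for status, count in frequency.items()}

print(f"{'Status':<10} {'f':>2} {'Relative':>9} {'Percent':>8}")
for status, count in frequency.items():
    print(f"{status:<10} {count:>2} {relative[status]:>9.2f} {percent[status]:>7.0f}%")
```

The relative frequencies always add up to 1 and the percent frequencies to 100.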

Bar Chart
A bar chart is used to display the frequency distribution in graphical form.
It consists of two orthogonal axes.
One of the axes represents the observations while the other one represents
the frequency or % frequency of the observations. The frequency of each observation is
represented by a bar.

Pie Chart
Pie Chart is used to display the frequency distribution.
It displays the ratio of the observations according to
the percentage of frequencies in each category of
the distribution.

Summarizing Quantitative Data
◦ Frequency Distribution
◦ Relative Frequency Distribution
◦ Percent Frequency Distribution
◦ Histogram

Frequency Distribution for Quantitative Data
To summarize quantitative data, we use a
frequency distribution just like those for qualitative
data. However, since these data have no natural
categories, we divide the data into classes.
Let's say we have the following eighteen numerical data values
from a study:
4, 2, 6, 6, 5, 9, 10, 11, 11, 13, 14, 14, 13, 12, 10, 18, 17, 19
Classes are intervals of equal width that cover all values that
are observed in the data set.

The lower class limit of a class is the smallest value that
appears in that class.

The upper class limit of a class is the largest value that
appears in that class.

The class width is the difference between consecutive lower
class limits.
Guidelines for Choosing Classes
There are many ways to construct a frequency distribution, and they will differ
depending on the classes chosen. Following are guidelines for choosing the classes.

• Every observation must fall into one of the classes.

• The classes must not overlap.

• The classes must be of equal width.

• There must be no gaps between classes. Even if there
are no observations in a class, it must be included in the
frequency distribution.
Constructing a Frequency Distribution
Following are the general steps for constructing a frequency distribution.
Data: 4, 2, 6, 6, 5, 9, 10, 11, 11, 13, 14, 14, 13, 12, 10, 18, 17, 19

Step 1: Choose a class width.

Step 2: Choose a lower-class limit for the first class. This should
be a convenient number that is slightly less than the
minimum data value.

Step 3: Compute the lower limit for the second class, by adding
the class width to the lower limit for the first class:

Lower limit for second class = Lower limit for first class +
Class width

Step 4: Compute the lower limits for each of the remaining
classes, by adding the class width to the lower limit of
the preceding class. Stop when the largest data value is
included in a class.

Step 5: Count the number of observations in each class, and
construct the frequency distribution.
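
The five steps can be sketched in Python on the eighteen data values given earlier. The class width of 5 and the first lower limit of 0 are assumptions chosen for illustration; the slides do not specify them for this data set.

```python
data = [4, 2, 6, 6, 5, 9, 10, 11, 11, 13, 14, 14, 13, 12, 10, 18, 17, 19]

# Step 1: choose a class width (assumed value, for illustration).
class_width = 5

# Step 2: choose a convenient lower limit slightly below the minimum (2).
first_lower_limit = 0

# Steps 3 and 4: add the class width repeatedly until the largest
# data value is covered by a class.
lower_limits = [first_lower_limit]
while lower_limits[-1] + class_width <= max(data):
    lower_limits.append(lower_limits[-1] + class_width)

# Step 5: count the observations in each class [lower, lower + width).
frequencies = [
    sum(1 for x in data if lower <= x < lower + class_width)
    for lower in lower_limits
]

print(f"{'Class':<8} Frequency")
for lower, count in zip(lower_limits, frequencies):
    upper = lower + class_width - 1  # upper class limit for integer data
    print(f"{lower:>2} - {upper:<3} {count}")
```

Each value falls into exactly one class, so the frequencies add up to the number of observations.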
Example: Frequency Distribution
The emissions for 65 vehicles, in units of grams of
particles per gallon of fuel, are given. Construct a
frequency distribution using a class width of 1.
Example: Frequency Distribution (Continued 1)
Since the smallest value in the data set is 0.25, we choose 0.00 as the lower
limit for the first class.

The class width is 1 and the first lower class limit is 0.00, so the lower limit for
the second class is 0.00 + 1 = 1.00.
The remaining lower class limits are as follows.
1.00 + 1 = 2.00
2.00 + 1 = 3.00
3.00 + 1 = 4.00
4.00 + 1 = 5.00
5.00 + 1 = 6.00
6.00 + 1 = 7.00
Since the largest data value is 6.64, every data value is now contained in a
class.
Example: Frequency Distribution (Continued 2)
Lastly, we count the number of observations in each
class to obtain the frequency distribution.

Class Frequency
0.00 – 0.99 9
1.00 – 1.99 26
2.00 – 2.99 11
3.00 – 3.99 13
4.00 – 4.99 3
5.00 – 5.99 1
6.00 – 6.99 2
Relative Frequency Distribution

Class          Frequency   Relative Frequency
0.00 – 0.99    9           0.138
1.00 – 1.99    26          0.400
2.00 – 2.99    11          0.169
3.00 – 3.99    13          0.200
4.00 – 4.99    3           0.046
5.00 – 5.99    1           0.015
6.00 – 6.99    2           0.031
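
As a sketch, the relative frequency column is obtained by dividing each class frequency by the total number of vehicles (65). The frequencies are those from the emissions table; the variable names are illustrative.

```python
classes = [
    "0.00 - 0.99", "1.00 - 1.99", "2.00 - 2.99", "3.00 - 3.99",
    "4.00 - 4.99", "5.00 - 5.99", "6.00 - 6.99",
]
frequencies = [9, 26, 11, 13, 3, 1, 2]

# The total should equal the number of vehicles in the data set.
total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative_frequencies = [f / total for f in frequencies]

print(f"{'Class':<13} {'Frequency':>9} {'Relative Frequency':>19}")
for label, f, rel in zip(classes, frequencies, relative_frequencies):
    print(f"{label:<13} {f:>9} {rel:>19.3f}")
print(f"{'Total':<13} {total:>9} {sum(relative_frequencies):>19.3f}")
```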
Histogram
Once we have a frequency distribution or a relative
frequency distribution, we can put the information in
graphical form by constructing a histogram.
A histogram is constructed by drawing a rectangle for
each class. The heights of the rectangles are equal to
the frequencies or the relative frequencies, and the
widths are equal to the class width.
Example: Histogram
The frequency histogram and relative frequency
histogram are given for the particulate emissions
data.
Note that the two histograms have the same shape.
The only difference is the scale on the vertical axis.
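
A rough text rendering of the frequency histogram can be produced without any plotting library. This is only a sketch: each class is drawn as a row of `#` characters whose length equals the class frequency, so the bars run sideways rather than upward. The helper name `text_histogram` is hypothetical.

```python
def text_histogram(labels, frequencies):
    """Return one line per class; bar length equals the class frequency."""
    return [f"{label} | {'#' * f} ({f})" for label, f in zip(labels, frequencies)]

classes = [
    "0.00 - 0.99", "1.00 - 1.99", "2.00 - 2.99", "3.00 - 3.99",
    "4.00 - 4.99", "5.00 - 5.99", "6.00 - 6.99",
]
frequencies = [9, 26, 11, 13, 3, 1, 2]

lines = text_histogram(classes, frequencies)
for line in lines:
    print(line)
```

Dividing every frequency by the total would rescale all bars by the same factor, which is why the relative frequency histogram has the same shape.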
Choosing the Number of Classes
There are no hard and fast rules for choosing the number of
classes. In general, it is good to have more classes rather than
fewer, but it is also good to have reasonably large frequencies
in some of the classes. There are two principles that can guide
the choice.

◦ Too few classes produce a histogram lacking in detail.

◦ Too many classes produce a histogram with too much detail,
so that the main features of the data are obscured.
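
The effect of the number of classes can be sketched by counting the eighteen data values from the earlier example with three different class widths. The widths 20, 5, and 1, the starting limit of 0, and the helper name `frequency_counts` are assumptions made for illustration.

```python
data = [4, 2, 6, 6, 5, 9, 10, 11, 11, 13, 14, 14, 13, 12, 10, 18, 17, 19]

def frequency_counts(values, width, first_lower=0):
    """Count values in consecutive classes [lower, lower + width)."""
    counts = []
    lower = first_lower
    while lower <= max(values):
        counts.append(sum(1 for x in values if lower <= x < lower + width))
        lower += width
    return counts

too_few = frequency_counts(data, width=20)   # one class: no detail at all
moderate = frequency_counts(data, width=5)   # four classes: shape is visible
too_many = frequency_counts(data, width=1)   # twenty classes: mostly 0s and 1s

print("Width 20:", too_few)
print("Width 5: ", moderate)
print("Width 1: ", too_many)
```

With a single class the distribution has no visible shape, and with twenty classes no class holds more than two observations, so the concentration of values between 10 and 14 is harder to see.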
