Descriptive Statistics Is Useful in Many Different Areas
Descriptive statistics is useful in many different jobs and activities. Having a good
understanding of descriptive statistics will help anyone working in:
Business Analytics
Data Analysis
Data Engineering
Product Management
and many other fields. Once you have finished this course, you'll be able to
apply its concepts in ways you hadn't even thought about before.
Data is defined as distinct pieces of information, and it can come in many forms. From numbers
in a spreadsheet and records in a database to text, video, images, and audio recordings, using data in its
different forms is the new way of the world.
Data is used to understand and improve nearly every facet of our lives. So, no matter what field
you are in, you can utilize data to make better decisions and accomplish your goals.
We will start this lesson with an overview of data types and the most common statistics
used when analyzing data.
We'll discuss:
Measures of center and spread
Common shapes that data takes on and how to handle outliers
How to use spreadsheets to handle these calculations
How to build visuals to communicate calculations
Data Types
In this video, two data types are introduced: Quantitative and Categorical.
Quantitative data takes on numeric values that allow us to perform mathematical operations
(like the number of dogs).
Categorical is used to label a group or set of items (like dog breeds - Collies, Labs, Poodles,
etc.).
Data Types (Ordinal vs. Nominal)
Categorical Ordinal vs. Categorical Nominal
We can divide categorical data further into two types: Ordinal and Nominal.
Categorical Ordinal data take on a ranked ordering (like a ranked interaction on a scale
from Very Poor to Very Good with the dogs).
Categorical Nominal data do not have an order or ranking (like the breeds of the dog).
Data Types (Continuous vs. Discrete)
Continuous vs. Discrete
We can think of quantitative data as being either continuous or discrete.
o Continuous data can be split into smaller and smaller units, and still a smaller unit
exists.
An example of this is the age of the dog - we can measure the units of the age in
years, months, days, hours, seconds, but there are still smaller units that could
be associated with the age.
o Discrete data only takes on countable values. The number of dogs we interact with is
an example of a discrete data type.
Summary
The table below summarizes our data types. To expand on the information in
the table, you can look through the text that follows.
Data Types
Quantitative:  Continuous                    Discrete
               Height, Age, Income           Pages in a Book, Trees in Yard, Dogs at a Coffee Shop
Categorical:   Ordinal                       Nominal
               Letter Grade, Survey Rating   Gender, Marital Status, Breakfast Items
Below is a little more detail of the information shared in the above table.
Another Look
To break down our data types, there are two main blocks:
Quantitative and Categorical
Quantitative data can be further divided into Continuous or Discrete.
Categorical data can be divided into Ordinal or Nominal.
You should now be able to tell which types of data in the world around us fall
into each of these four buckets: Discrete, Continuous, Nominal, and Ordinal.
In the next sections, we will work through the numeric summaries that relate
specifically to quantitative variables.
Quantitative vs. Categorical
Some of these can be a bit tricky - notice even though zip codes are a
number, they aren’t really a quantitative variable. If we add two zip codes
together, we do not obtain any useful information from this new value.
Therefore, this is a categorical variable.
Height, Age, the Number of Pages in a Book, and Annual Income all take
on values that we can add, subtract and perform other operations with to gain
useful insight. Hence, these are quantitative.
Gender, Letter Grade, Breakfast Type, Marital Status, and Zip Code can
be thought of as labels for a group of items or individuals. Hence, these
are categorical.
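The zip code point can be illustrated with a short Python sketch. The zip codes below are hypothetical values chosen for illustration, and storing them as strings is one reasonable convention rather than the only option.

```python
# Hypothetical zip codes, stored as strings because they are labels, not amounts.
zip_codes = ["02139", "94103", "02139", "60614"]

# Treating a zip code as a number loses information: the leading zero disappears.
as_number = int(zip_codes[0])

# Arithmetic on zip codes runs without error, but the result has no real-world meaning.
meaningless_sum = int(zip_codes[0]) + int(zip_codes[1])

# What is meaningful for a categorical variable is the set of labels that occur.
distinct_zip_codes = sorted(set(zip_codes))

print(as_number)           # 2139, no longer a valid five-digit zip code
print(distinct_zip_codes)
```

Keeping such variables as text makes it harder to average or sum them by accident.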
Continuous vs. Discrete
To decide whether we have continuous or discrete data, we should see if we can
split our data into smaller and smaller units. Consider time - we could
measure an event in years, months, days, hours, minutes, or seconds, and
even at seconds we know there are smaller units we could measure time in.
Therefore, we know this data type is continuous. Height, age,
and income are all examples of continuous data. Alternatively,
the number of pages in a book, dogs I count outside a coffee shop,
or trees in a yard are discrete data. We would not want to split our dogs
in half.
Ordinal vs. Nominal
In looking at categorical variables, we found that Gender, Marital Status, Zip
Code, and Breakfast Items are nominal variables: there is no rank
ordering associated with this type of data. Whether you ate cereal, toast,
eggs, or only coffee for breakfast, there is no rank ordering associated with
your breakfast.
Alternatively, Letter Grade and Survey Rating have a rank ordering
associated with them, which makes them ordinal data. If you receive an A, this is higher than
an A-. An A- is ranked higher than a B+, and so on. Ordinal variables
frequently occur on rating scales from very poor to very good. In many cases,
we turn these ordinal variables into numbers so that we can more easily analyze
them, but more on this later!
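As a preview, here is a minimal Python sketch of one way to turn an ordinal rating into numbers. Only the endpoints "Very Poor" and "Very Good" come from the text above. The three middle labels, the 1-to-5 codes, and the survey responses are assumptions made for illustration.

```python
# Hypothetical five-point scale, listed from lowest to highest.
# Only the two endpoints appear in the lesson; the middle labels are assumed.
RATING_SCALE = ["Very Poor", "Poor", "Fair", "Good", "Very Good"]

# Map each label to its position in the scale (1 = lowest, 5 = highest).
rating_to_code = {label: rank for rank, label in enumerate(RATING_SCALE, start=1)}

# Made-up survey responses.
responses = ["Good", "Very Poor", "Very Good", "Good", "Fair"]

# Replace each label with its numeric code, preserving the rank ordering.
codes = [rating_to_code[response] for response in responses]

print(codes)  # [4, 1, 5, 4, 3]
```

The codes preserve the order of the labels, but they also treat the steps between ratings as equally sized, which the original labels do not guarantee.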
Analyzing Quantitative Data
Four Aspects for Quantitative Data
There are four main aspects to analyzing Quantitative data.
1. Measures of Center
2. Measures of Spread
3. The Shape of the data.
4. Outliers
Analyzing Categorical Data
Though not discussed in the video, analyzing categorical data has fewer parts
to consider. Categorical data is usually analyzed by looking at the counts or
proportions of individuals that fall into each group. For example, if we were
looking at the breeds of the dogs, we would care about how many dogs are of
each breed, or what proportion of dogs are of each breed.
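The counts and proportions described above can be sketched in Python as follows. The list of breeds is made-up data, and `collections.Counter` from the standard library is just one convenient way to tally it.

```python
from collections import Counter

# Made-up observations of dog breeds.
breeds = ["Collie", "Lab", "Poodle", "Lab", "Lab", "Collie"]

# Count how many dogs fall into each breed.
counts = Counter(breeds)

# Convert each count into a proportion of all observed dogs.
total = len(breeds)
proportions = {breed: count / total for breed, count in counts.items()}

for breed in sorted(counts):
    print(f"{breed}: count={counts[breed]}, proportion={proportions[breed]:.2f}")
```

With this data, Labs make up 3 of the 6 dogs, so their proportion is 0.5.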
Measures of Center
There are three measures of center:
1. Mean
2. Median
3. Mode
The Mean
In this video, we focused on the calculation of the mean. The mean is often
called the average or the expected value in mathematics. We calculate the
mean by adding all of our values together and dividing by the number of
values in our dataset.
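The calculation just described can be written as a small Python function. This is a sketch for illustration. The function name `mean` and the sample values are assumptions, not part of the course material.

```python
def mean(values):
    """Add all values together and divide by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Made-up counts of dogs seen on four different days.
dogs_per_day = [2, 4, 6, 8]

print(mean(dogs_per_day))  # (2 + 4 + 6 + 8) / 4 = 5.0
```

Python's standard library also provides `statistics.mean`, which performs the same calculation.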
The remaining measures of the median and mode will be discussed in detail
in the upcoming quizzes and videos.
The Median
The median splits our data so that 50% of our values are lower and 50% are
higher. We found in this video that how we calculate the median depends on whether
we have an even number of observations or an odd number of observations.
Median for Odd Values
If we have an odd number of observations, the median is simply the number
in the direct middle. For example, if we have 7 observations, the median is
the fourth value when our numbers are ordered from smallest to largest. If we
have 9 observations, the median is the fifth value.
Median for Even Values
If we have an even number of observations, the median is the average of
the two values in the middle. For example, if we have 8 observations, we
average the fourth and fifth values together when our numbers are ordered
from smallest to largest.
In order to compute the median, we MUST sort our values first.
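The odd and even cases above can be combined into one short Python function. This is a sketch under the rules stated in this section. The function name `median` and the sample datasets are made up for illustration.

```python
def median(values):
    """Return the middle value of the sorted data (odd count),
    or the average of the two middle values (even count)."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)          # the values MUST be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the value in the direct middle.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


seven_values = [7, 1, 3, 9, 5, 11, 13]    # sorted: 1, 3, 5, 7, 9, 11, 13
eight_values = [8, 1, 3, 5, 7, 2, 4, 6]   # sorted: 1, 2, 3, 4, 5, 6, 7, 8

print(median(seven_values))  # fourth sorted value: 7
print(median(eight_values))  # average of fourth and fifth sorted values: 4.5
```

Sorting a copy with `sorted` leaves the original list unchanged, so the caller's data keeps its order.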
Whether we use the mean or median to describe a dataset is largely
dependent on the shape of our dataset and if there are any outliers. We will
talk about this in just a bit!
The Mode
The mode is the most frequently observed value in our dataset.
There might be multiple modes for a particular dataset or no mode at all.
No Mode
If all observations in our dataset are observed with the same frequency, there
is no mode. If we have the dataset:
1, 1, 2, 2, 3, 3, 4, 4
There is no mode because all observations occur the same number of times.
Many Modes
If two (or more) numbers share the maximum frequency, then there is more than
one mode. If we have the dataset:
1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9
There are two modes, 3 and 6, because these values share the maximum
frequency of 3 occurrences, while all other values appear only once.
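Both cases can be sketched in Python using this section's convention that a dataset has no mode when every value occurs equally often. The function name `modes` is hypothetical. Treating a dataset with only one distinct value as having that value as its mode is an assumption the text does not address.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values.

    Follows the convention in this lesson: if several distinct values
    all occur the same number of times, there is no mode (empty list).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumption: "no mode" applies only when there is more than one distinct value.
    if len(counts) > 1 and all(count == highest for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 1, 2, 2, 3, 3, 4, 4]))                 # [] (no mode)
print(modes([1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9]))  # [3, 6]
```

Conventions differ between tools: Python's `statistics.multimode`, for example, returns every value when all are tied instead of reporting no mode.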