Data Analytic I
O. J. Akintande, Ph.D
Personal Webpage:
https://ojakintande.github.io/
Prerequisite information
● You should have a computer
● You should have basic data imputation knowledge
● You should know how to use MS Excel, to a basic level of data entry and query
● You must attend 75% of the lectures
● You must submit 100% of your practical assignments.
Course Intro
[Data analytic I]
The Origin of Data | Traditional Statistics
The word “data” derives from the Latin word
“datum” (singular), which means the “thing given”.
Data is any information that has been translated into
different forms to be processed, analyzed, managed,
and transferred.
The first use of data goes back to around 19,000 BC, when
our Palaeolithic ancestors used the Ishango bone, a tool made
from a baboon bone, to perform simple calculations.
In the 1640s, John Graunt, a hat maker, started
collecting information regarding deaths in London.
He noted down statistics such as:
➔ The number of deaths
➔ The mortality rate among age groups
➔ The causes of death
The Origin of Data | Traditional Statistics
One day back in the 1880s, the German-American
statistician Herman Hollerith saw a train conductor
punching train tickets for passengers. That’s how the
idea of using punch cards in writing and processing
data was born.
Hollerith started working on the design of the
tabulation machine that uses punch cards, based on a
previous model invented by the silk weaver Joseph
Jacquard in the 1800s.
Typically, punch cards are made of stiff paper with
holes punched in precise places by a machine. To
read the data electronically, the cards are moved
between brass rods.
After years of trial and error, Hollerith's tabulating
machine enabled the US government to complete the 1890
census far faster than the previous one.
The Origin of Data | Traditional Statistics
In 1928, Fritz Pfleumer, a German engineer, patented
a magnetic tape that he utilized to store data instead
of wire recording.
The relational model of data, which organizes data into
tables, was first proposed by computer scientist Edgar Codd.
In the 1960s, Codd started working on a model that
describes data attributes in columns and individual
records in rows. He published it in 1970.
The Origin of Data | Traditional Statistics
The rise of the internet ignited the rise of big
data. Thanks to Sir Tim Berners-Lee, hypertext
and hyperlinks made it easy to share
information and connect resources.
With the establishment of Google in 1998, data
became even more widely available to
everyone with access to a computer or mobile
device.
Although this marks the end of our
time-traveling adventure for today, the history
of data timeline does not end here.
Every technological advancement, whether it is
in data science, machine learning, or artificial
intelligence, brings with it new methods for
producing and disseminating information.
The Origin of Data | Summary
Image Source: Web search
Data understanding in Statistics
➔ Collection
➔ Preparation & Processing
➔ Exploration & Analysis
➔ Interpretation & Inferences (Generalization)
What is Data and how do we see it
in Computational Statistics
Data
● Categorical: Nominal, Ordinal
● Metric: Ratio, Interval
● TAP: Text, Audio, Picture/Image
Binary / Non-Binary
Source: Instructor
Data Analytic | Domain
● Computer Science
● Statistical/Data Science
● Computational Statistics
● Machine Learning
Image Source: Instructor Image Source: Web search
What is Data and how do we see it
in Computational Statistics
Algorithm
Categorical Metric TAP
Statistics, Machine Learning & Data Science
Computer Science, Machine Learning, & Artificial Intelligence
Image Source: Instructor
Data Analytic | Definition
Image Source: Web search
Course | Industry Relevance
1 Data Analysis skills
2 Decision Science skills
3 Algorithmic Development skills
4 Data Engineering skills
5 Data Entrepreneurship skills
Course Tools | Uncovering the differences
Lecture II
[Descriptive Analysis]
Descriptive | Statistics Measures of central tendency
Mean
Median
Mode
The average, the middle value, and the most frequent value
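As a minimal sketch of the three measures, the snippet below uses Python's standard `statistics` module. The `scores` list is hypothetical data invented for illustration (14 values, to mirror a range such as A2:A15); it is not the data set used in the lecture.

```python
import statistics

# Hypothetical examination scores (illustrative only, not the lecture's data).
scores = [45, 52, 52, 58, 60, 63, 67, 70, 70, 70, 74, 78, 81, 90]


def central_tendency(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),      # the average
        "median": statistics.median(values),  # the middle value
        "mode": statistics.mode(values),      # the most frequent value
    }


result = central_tendency(scores)
print(result)
```

With an even number of values, the median is the average of the two middle values, which is why it need not be one of the observed scores.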
Descriptive | Statistics in MS Excel Measures of central tendency
Given the examination scores of students, generate descriptive
statistics for these scores with the following steps:
● On the Data tab, in the Analysis group, click Data Analysis.
● Select Descriptive Statistics and click OK.
● Select the range A2:A15 as the Input Range.
● Select cell C1 as the Output Range.
● Make sure Summary statistics is checked.
● Click OK.
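For readers who want to cross-check the Excel output, the sketch below reproduces part of the Summary statistics table in Python. It assumes the scores sit in a plain list; the function name `summary_statistics` and the `scores` values are hypothetical and chosen only for illustration. Skewness and kurtosis are left out to keep the sketch short.

```python
import math
import statistics


def summary_statistics(values):
    """Compute a subset of the rows shown in Excel's Summary statistics output."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical examination scores standing in for cells A2:A15.
scores = [45, 52, 52, 58, 60, 63, 67, 70, 70, 70, 74, 78, 81, 90]

for label, value in summary_statistics(scores).items():
    print(f"{label:<20}{value:>10.2f}")
```

Each row is printed with a label and a value, which makes it easy to compare against the table Excel writes at the chosen Output Range.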
Interpretation
Descriptive | Statistics in MS Excel
Descriptive | Statistics in MS Excel Real-life example
Twitter feedback on GTBank page
Measure      Positive   Negative   Neutral
Statistics   60%        30%        10%
• Most feedback is positive (+ve)
• About 1 out of every 3 feedbacks is negative (-ve)
• 1 out of every 7 positive-or-neutral feedbacks is neutral
• 1 out of every 4 negative-or-neutral feedbacks is neutral
By implication
1. Having 60% +ve feedback is not good overall feedback.
2. It suggests that only 20 - 30% of the customers are likely to continue
with the company's services.
3. Even though most feedback is positive, the company's faithful
customer base is less than or equal to 30%.
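The ratios quoted for this example can be checked with a few lines of arithmetic. The sketch below assumes that the "1 out of every 7" and "1 out of every 4" figures compare neutral feedback against the positive-plus-neutral and negative-plus-neutral groups respectively. The `net_sentiment` figure (positive minus negative) is only one possible reading of the 20 - 30% claim, which the slide does not derive; treat it as an assumption.

```python
from fractions import Fraction

# Feedback shares from the slide, in percent.
feedback = {"positive": 60, "negative": 30, "neutral": 10}


def neutral_share_with(category):
    """Fraction of neutral feedback within `category` and neutral combined."""
    neutral = feedback["neutral"]
    return Fraction(neutral, feedback[category] + neutral)


# Share of all feedback that is negative: 3 in 10, roughly 1 in 3.
negative_share = Fraction(feedback["negative"], sum(feedback.values()))

# Assumption: net sentiment is positive minus negative (not defined on the slide).
net_sentiment = feedback["positive"] - feedback["negative"]

print(negative_share)                  # 3/10
print(neutral_share_with("positive"))  # 1/7
print(neutral_share_with("negative"))  # 1/4
print(net_sentiment)                   # 30
```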
Descriptive | Statistics Measures of Spread
Variance
Standard deviation
Standard error
Mean squared distance from the mean, its square root, and
the standard deviation of the sample mean
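A minimal sketch of the three spread measures follows, using the sample (n - 1) versions, which is what Excel's Descriptive Statistics tool labels "Sample Variance" and "Standard Deviation". The data values and the function name `spread` are hypothetical.

```python
import math
import statistics


def spread(values):
    """Return sample variance, standard deviation, and standard error."""
    n = len(values)
    variance = statistics.variance(values)  # mean squared distance, n - 1 denominator
    sd = statistics.stdev(values)           # square root of the variance
    se = sd / math.sqrt(n)                  # standard deviation of the sample mean
    return {"variance": variance, "standard_deviation": sd, "standard_error": se}


# Hypothetical data for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(spread(values))
```

The standard error shrinks as the sample grows, because the same standard deviation is divided by a larger square root of n.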
Descriptive | Statistics in MS Excel Measures of Spread
Given the examination scores of students, generate descriptive
statistics for these scores with the following steps:
● On the Data tab, in the Analysis group, click Data Analysis.
● Select Descriptive Statistics and click OK.
● Select the range A2:A15 as the Input Range.
● Select cell C1 as the Output Range.
● Make sure Summary statistics is checked.
● Click OK.
Interpretation
Descriptive | Statistics in MS Excel
Descriptive | Statistics Measures of Partition
Quartile
Decile
Percentile
Quarters, tenths, and hundredths of the ordered data;
cut-off points for the tail and head of a distribution
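The cut points can be sketched with `statistics.quantiles` from Python's standard library (Python 3.8 or later). The `method="inclusive"` option treats the smallest and largest observations as the 0th and 100th percentiles, which should correspond to Excel's QUARTILE.INC and PERCENTILE.INC; that correspondence is stated here as an assumption, not something taken from the lecture. The data and the function name `partition` are hypothetical.

```python
import statistics


def partition(values):
    """Return the quartile, decile, and percentile cut points of the data."""
    return {
        "quartiles": statistics.quantiles(values, n=4, method="inclusive"),      # 3 cut points
        "deciles": statistics.quantiles(values, n=10, method="inclusive"),       # 9 cut points
        "percentiles": statistics.quantiles(values, n=100, method="inclusive"),  # 99 cut points
    }


# Hypothetical data for illustration.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts = partition(values)
print(cuts["quartiles"])  # [3.0, 5.0, 7.0]
```

The second quartile is the median, the 5th decile, and the 50th percentile all at once, which is a quick sanity check on any output.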
Descriptive | Statistics in MS Excel Measures of Partition
Given the examination scores of students, generate descriptive
statistics for these scores with the following steps:
● On the Data tab, in the Analysis group, click Data Analysis.
● Select Descriptive Statistics and click OK.
● Select the range A2:A15 as the Input Range.
● Select cell C1 as the Output Range.
● Make sure Summary statistics is checked.
● Click OK.
● The summary output does not list quartiles, deciles, or percentiles;
compute these with worksheet functions such as QUARTILE.INC and
PERCENTILE.INC.
Interpretation
Descriptive | Statistics in MS Excel
Descriptive | Statistics Measures of Shapes
Skewness
Kurtosis
Distributions
Distribution, peakedness, and patterns
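The sketch below computes skewness and excess kurtosis with the bias-adjusted sample formulas, which are the ones Excel's SKEW and KURT functions are documented to use. The function names and the data are hypothetical, and the formulas need at least 3 values for skewness and 4 for kurtosis.

```python
import statistics


def sample_skewness(values):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum(z^3)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    total = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    total = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * total - correction


# Hypothetical data: symmetric values, then values with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_tailed))
```

A skewness near zero indicates a symmetric distribution, a positive value a longer right tail, and a negative value a longer left tail. Negative excess kurtosis indicates a flatter shape than the normal distribution.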
Descriptive | Statistics in MS Excel Measures of Shapes
Given the examination scores of students, generate descriptive
statistics for these scores with the following steps:
● On the Data tab, in the Analysis group, click Data Analysis.
● Select Descriptive Statistics and click OK.
● Select the range A2:A15 as the Input Range.
● Select cell C1 as the Output Range.
● Make sure Summary statistics is checked.
● Click OK.
Interpretation
Descriptive | Statistics in MS Excel
Skewness: flow/horizontal distribution
Kurtosis: peakedness/vertical distribution
Lecture III
[Regression Models]
Lecture IV
[Correlation Analysis]