Data Science Using R

The document provides an overview of data science, emphasizing its role in data gathering, analysis, and decision-making across various industries. It covers key concepts in statistics, including descriptive statistics, probability distributions, and hypothesis testing, along with various statistical tests like Z-Test, T-Test, and ANOVA. The document highlights the importance of data visualization techniques and statistical methods in making informed business decisions.

Uploaded by

Dhana R

Data science using R
By Abinaya

INTRODUCTION
• Data Science is about data gathering, analysis, and decision-making.
• Data Science is about finding patterns in data through analysis and using them to make predictions.

By using Data Science, companies are able to make:

• Better decisions (should we choose A or B?)
• Predictive analyses (what will happen next?)
• Pattern discoveries (finding patterns or hidden information in the data)


Where is data science needed
Data Science is used in many industries in the world today, e.g. banking, consultancy, healthcare, and manufacturing.

Examples of where Data Science is needed:

• Route planning: to discover the best shipping routes
• To foresee delays for flights, ships, trains, etc. (through predictive analysis)
• To create promotional offers
• To find the best time to deliver goods
• To forecast the next year's revenue for a company
• To analyze the health benefits of training
• To predict who will win elections

Data Science can be applied in nearly every part of a business where data is available.
Examples are:

• Consumer goods
• Stock markets
• Industry
• Politics
• Logistics companies
• E-commerce
Introduction to statistics
• Statistics is a field of mathematics that deals with the collection, tabulation, and interpretation of numerical data. In simple words, statistics is an area of applied mathematics concerned with data collection, analysis, interpretation, and presentation.
• It is a form of mathematical analysis that uses quantitative models to represent and summarize experimental data or real-life studies. Statistics deals with how data can be used to solve complex problems.
Some people consider statistics to be a distinct mathematical science rather than a branch of mathematics.

Statistics simplifies work by summarizing it, giving a clear picture of the activities you carry out on a regular basis.

Statistics is used in a variety of sciences and has many applications, including weather forecasting, the study of the stock market, insurance, the betting industry, and data science.
Descriptive statistics

Descriptive statistics describe the data set being analyzed, that is, they summarize the data quantitatively. The data can be represented in two ways:

1. Graphical Representation
2. Tabular Representation

Graphical statistics

Plot Type | Variable Type | Description
--- | --- | ---
Bar Plot | One categorical variable, or one categorical variable and one continuous measure | A bar plot is a chart that presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. Visually represents a frequency distribution.
Stacked Bar Plot | Two categorical variables | A stacked bar chart, also known as a stacked bar graph, breaks down one category by another and compares parts of a whole. Each bar represents one category as a whole, and the segments in the bar represent different parts or categories of that whole. Visually represents cross-tabulation data.
Histogram | One continuous variable | A histogram is an approximate representation of the distribution of numerical data. It is created by converting a continuous variable into a categorical one by binning (bucketing) it.
Distribution Plot (Density Plot) | One continuous variable | A density plot represents the distribution of a numeric variable. It uses a kernel density estimate to show the probability density function of the variable and is a smoothed version of the histogram. Visually shows skewness in data.
Box Plot (Box and Whisker Plot) | One continuous variable, or one continuous and one categorical variable | The box plot is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. The minimum and maximum drawn in a box plot act as a Lower Control Limit (LCL) and Upper Control Limit (UCL), commonly placed at Q1 - 1.5 x IQR and Q3 + 1.5 x IQR rather than at the extreme data values. Any data point beyond the LCL or UCL is typically considered an outlier. Quickly helps find outliers in data.
Line Plot | One dimension must be time and the second a continuous variable | A line plot displays information as a series of data points called "markers" connected by straight line segments. Visually shows trends in time series data.
Scatter Plot | Two continuous variables | A graph in which the values of two variables are plotted along two axes. The pattern of the resulting points visually depicts the existence of correlation between the two variables. Quickly helps find correlation.
Pie Chart | One categorical variable associated with a continuous variable | A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. Quickly helps compare parts of a whole.
[Figures: example bar plot, stacked bar plot, histogram, distribution plot, box and whisker plot, line plot, scatter plot, and pie chart]
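
The box plot description can be made concrete with a small calculation. The following is a minimal Python sketch, not taken from the slides, that computes the five-number summary and flags outliers. The sample values are hypothetical, and the 1.5 x IQR fence rule is an assumed (though common) convention for the limits.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

def find_outliers(values):
    """Return the points outside the 1.5 * IQR fences (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical sample containing one unusually large value.
sample = [12, 15, 14, 10, 13, 16, 14, 15, 11, 40]
print(five_number_summary(sample))
print(find_outliers(sample))
```

Different software packages compute quartiles slightly differently, so the exact fence positions can vary between tools.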
Tabular statistics
• The tabular method of data presentation is widespread in all spheres of human life. These methods are used to summarize data from a sample or population in table format.
• Data is grouped into categories and the number (or frequency) of observations in each category is obtained.
• A frequency distribution is one type of tabular method: a tabular summary of data showing the frequency of items in each of several non-overlapping classes.
• The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
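
As an illustration of a frequency distribution, here is a minimal Python sketch under assumed data. The `colors` and `ages` lists and both helper names are hypothetical. The first helper counts categories, and the second counts numeric values in non-overlapping classes of equal width.

```python
from collections import Counter

def frequency_table(observations):
    """Count how often each category occurs, most frequent first."""
    return Counter(observations).most_common()

def binned_frequencies(values, start, width, bins):
    """Count values in `bins` non-overlapping classes [low, high)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:  # values outside the classes are ignored
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical observations.
colors = ["red", "blue", "red", "green", "blue", "red"]
ages = [21, 25, 34, 38, 39, 41, 47, 52, 29, 33]

for category, count in frequency_table(colors):
    print(category, count)
for low, high, count in binned_frequencies(ages, start=20, width=10, bins=4):
    print(f"{low}-{high}: {count}")
```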
Probability
• Probability denotes the likelihood of an outcome of a random event.
• It measures the extent to which an event is likely to happen.
• For example, when we flip a coin in the air, what is the probability of getting a head? The answer depends on the number of possible outcomes.
• Here there are two equally likely outcomes, head or tail, so the probability of getting a head is 1/2.
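
The coin example follows the rule "favorable outcomes divided by all possible outcomes". The following minimal Python sketch applies that rule. It assumes every outcome is equally likely, and the `probability` helper is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """P(event) = favorable outcomes / all outcomes (equally likely outcomes assumed)."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

# One flip: the sample space is {H, T}.
coin = ["H", "T"]
p_head = probability(lambda outcome: outcome == "H", coin)

# Two flips: the sample space is every ordered pair of results.
two_flips = list(product(coin, repeat=2))
p_at_least_one_head = probability(lambda outcome: "H" in outcome, two_flips)

print(p_head)               # 1/2
print(p_at_least_one_head)  # 3/4
```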
Probability distribution
• A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range.
• This range is bounded by the minimum and maximum possible values, but precisely where a possible value is likely to be plotted on the probability distribution depends on a number of factors.
• These factors include the distribution's mean (average), standard deviation, skewness, and kurtosis.
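
The four shape factors named above can be estimated from a sample. Below is a minimal Python sketch using hypothetical data. It assumes the population (divide-by-n) form of each moment, and the kurtosis is reported without subtracting 3, so a normal distribution would score about 3.

```python
import statistics

def describe_shape(values):
    """Return the mean, standard deviation, skewness, and kurtosis of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Skewness and kurtosis are the third and fourth standardized moments.
    skewness = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / sd) ** 4 for v in values) / n
    return {"mean": mean, "sd": sd, "skewness": skewness, "kurtosis": kurtosis}

# Hypothetical sample with a longer right tail.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
shape = describe_shape(sample)
print(shape)
```

A positive skewness, as in this sample, indicates that the longer tail lies to the right of the mean.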
Key takeaways
• A probability distribution depicts the expected outcomes of possible values for a given data-generating process.
• Probability distributions come in many shapes with different characteristics, as defined by the mean, standard deviation, skewness, and kurtosis.
• Investors use probability distributions to anticipate returns on assets such as stocks over time and to hedge their risk.
How probability distribution works
• Perhaps the most common probability distribution is the normal distribution, or "bell curve," although several other distributions are commonly used.
• Typically, the data-generating process of some phenomenon dictates its probability distribution. For a continuous variable, the function that describes this distribution is called the probability density function (PDF).
• Probability distributions can also be used to create cumulative distribution functions (CDFs), which add up the probability of occurrences cumulatively and always start at zero and end at 100%.
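
To see the density and cumulative functions of a normal distribution in practice, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The mean of 170 and standard deviation of 10 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution, e.g. heights in centimetres.
heights = NormalDist(mu=170, sigma=10)

# The density is highest at the mean of the bell curve.
density_at_mean = heights.pdf(170)

# The CDF gives the probability of observing a value at or below a point.
below_160 = heights.cdf(160)
between_160_and_180 = heights.cdf(180) - heights.cdf(160)

print(round(density_at_mean, 4))
print(round(below_160, 4))
print(round(between_160_and_180, 4))

# Far to the left the CDF is near 0; far to the right it is near 1 (100%).
print(heights.cdf(70), heights.cdf(270))
```

The interval from 160 to 180 spans one standard deviation on either side of the mean, which covers roughly 68% of a normal distribution.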
Hypothesis testing
Hypothesis testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is used to estimate the relationship between two statistical variables.
Let's discuss a few real-life examples of statistical hypotheses:
• A teacher assumes that 60% of his college's students come from lower-middle-class families.
• A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.
Now that you know about hypothesis testing, look at the two types of hypothesis testing in statistics.
Statistical tests
Statistical tests are used in hypothesis testing. They can be used to:
• Determine whether a predictor variable has a statistically significant relationship with an outcome variable.
• Estimate the difference between two or more groups.
• Statistical tests assume a null hypothesis of no relationship or no difference between groups.
• They then determine whether the observed data fall outside the range of values predicted by the null hypothesis.
• If you already know what types of variables you're dealing with, you can use the flowchart to choose the right statistical test for your data.
Types of statistical tests
• Z-Test
• T-Test
  • Paired T-Test
  • Independent T-Test
  • One sample T-Test
• ANOVA Test
• Non-parametric statistical test
  • Chi-Square test
Z-Test
• A z-test is a statistical test used to determine whether two population means are different when the variances are known and the sample size is large.
• In a z-test, the mean of the population is compared. The parameters used are the population mean and the population standard deviation.
• A z-test is used to validate the hypothesis that the sample drawn belongs to the same population.
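
The last point, checking whether a sample plausibly comes from a population with known mean and standard deviation, can be sketched as a one-sample z-test. This minimal Python example is an illustration under assumed numbers. The sample, the population mean of 50, and the population standard deviation of 6 are all hypothetical.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(sample, population_mean, population_sd):
    """Return (z, two-sided p-value) for a sample against a known population."""
    n = len(sample)
    standard_error = population_sd / math.sqrt(n)
    z = (fmean(sample) - population_mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample of 36 observations with a mean of 52.
sample = [50, 54] * 18
z, p_value = one_sample_z_test(sample, population_mean=50, population_sd=6)
print(round(z, 3), round(p_value, 4))
```

With a significance level of 0.05, a p-value below 0.05 would lead to rejecting the hypothesis that the sample belongs to that population.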
T-Test
• In a t-test, the means of two given samples are compared.
• A t-test is used when the population parameters (mean and standard deviation) are not known.

Paired T-Test
• Tests for the difference between two variables measured on the same subjects (e.g. pre- and post-test scores).
• For example, the performance score of a trainee before and after completing a training program.
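
The before-and-after training example can be sketched as a paired t-test, which works on the per-subject differences. This minimal Python example uses hypothetical scores for five trainees. It computes only the t statistic and degrees of freedom, because the standard library has no t distribution for the p-value.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.fmean(differences)
    # The sample standard deviation of the differences uses n - 1.
    sd_diff = statistics.stdev(differences)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for five trainees.
before = [60, 65, 70, 58, 62]
after = [68, 70, 75, 63, 70]
t, df = paired_t_statistic(before, after)
print(round(t, 3), df)

# For df = 4, a standard t table gives a two-sided 5% critical value of 2.776.
print(abs(t) > 2.776)
```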
Independent T-Test
• The independent t-test, also called the two-sample t-test or Student's t-test, is a statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups.
• For example, comparing boys and girls in a population.
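
A minimal Python sketch of the independent t-test follows. It assumes the pooled-variance (Student) form, which treats the two groups as having equal variances, and the two groups of scores are hypothetical.

```python
import math
import statistics

def independent_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) using a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
    return t, df

# Hypothetical scores for two unrelated groups.
group_a = [20, 22, 24, 26, 28]
group_b = [15, 17, 19, 21, 23]
t, df = independent_t_statistic(group_a, group_b)
print(round(t, 3), df)

# For df = 8, a standard t table gives a two-sided 5% critical value of 2.306.
print(abs(t) > 2.306)
```

When the two groups have clearly different variances, Welch's version of the test is the usual alternative to the pooled form sketched here.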
ANOVA Test
• Analysis of variance (ANOVA) is a statistical technique used to check whether the means of two or more groups are significantly different from each other.
• ANOVA checks the impact of one or more factors by comparing the means of different samples.
• When there are more than two samples, running repeated t-tests instead of ANOVA is not reliable, because the chance of a false positive grows with each additional comparison.
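
The comparison of several group means can be sketched as a one-way ANOVA, which compares the variation between groups with the variation within groups. The three groups in this minimal Python example are hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.fmean(all_values)
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic, df_between, df_within = one_way_anova_f(groups)
print(round(f_statistic, 3), df_between, df_within)
```

A large F value means the group means differ by more than the within-group spread would explain. The value is compared against a critical value from the F distribution with these two degrees of freedom.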
Non-Parametric statistical test
Non-parametric tests are used when the data is not normally distributed. Non-parametric tests include the chi-square test.
Chi-square Test
• The chi-square test is used to compare two categorical variables.
• Calculating the chi-square statistic and comparing it against a critical value from the chi-square distribution allows us to assess whether the observed frequencies are significantly different from the expected frequencies.
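
The observed-versus-expected comparison can be sketched for a contingency table of two categorical variables. In this minimal Python example the 2 x 2 table of counts is hypothetical, and the expected counts assume the two variables are independent.

```python
def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * column_totals[j] / total
            chi_square += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df

# Hypothetical counts: rows are two groups, columns are two responses.
observed = [[30, 10],
            [20, 40]]
chi_square, df = chi_square_statistic(observed)
print(round(chi_square, 3), df)

# For df = 1, the 5% critical value of the chi-square distribution is 3.841.
print(chi_square > 3.841)
```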

