Unit 2 DSRP

Uploaded by

foredu48
UNIT-2

•Descriptive Statistics
•Basic Statistical Analysis
Descriptive Statistics
• Measures of central tendency
• Measures of location and dispersion
• Practice and analysis with R
Measures of Central Tendency & Dispersion
• Measures that indicate the approximate
center of a distribution are called measures of
central tendency
• Measures that describe the spread of the
data are measures of dispersion
• These measures include the mean, median,
mode, range, upper and lower quartiles,
variance, and standard deviation
Process of Descriptive Analysis
Measure of central tendency
• It represents the whole set of data by a single
value. It gives us the location of central points.
There are three main measures of central
tendency:
• Mean
• Mode
• Median
Measure of Variability (Measure of Dispersion)
A measure of variability describes the spread of the
data, that is, how widely the values are distributed
around the center. The most common variability measures are:
• Range
• Variance
• Standard deviation
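As a small illustration of the measures listed above, the following sketch computes each of them on a short vector. The ages vector is hypothetical sample data made up for this example, and the mode is found by tabulating the values because base R has no built-in function for the statistical mode.

```r
# Hypothetical sample of ages (illustration only)
ages <- c(21, 25, 25, 30, 34)

# Measures of central tendency
age_mean   <- mean(ages)     # arithmetic mean
age_median <- median(ages)   # middle value of the sorted data
# Tabulate the values and keep the most frequent one(s) as the mode
age_counts <- table(ages)
age_mode   <- as.numeric(names(age_counts)[age_counts == max(age_counts)])

# Measures of dispersion
age_range <- max(ages) - min(ages)   # range as a single number
age_var   <- var(ages)               # sample variance (divides by n - 1)
age_sd    <- sd(ages)                # sample standard deviation

print(c(mean = age_mean, median = age_median, mode = age_mode))
print(c(range = age_range, variance = age_var, sd = age_sd))
```

Note that var() and sd() use the sample (n - 1) formulas, so they differ slightly from the population versions on small data sets.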
Practice and analysis with R
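• getwd()   # show the current working directory
• setwd("C:/Users/USHARAM/Desktop/R-Practice")
• mydata <- read.csv("CGF.csv")
• print(head(mydata))
• age_mean <- mean(mydata$Age)   # separate names avoid masking built-ins such as mean()
• print(age_mean)
• age_median <- median(mydata$Age)
• print(age_median)
• install.packages("modeest")   # needed only once per R installation
• library(modeest)
• age_mode <- mfv(mydata$Age)   # most frequent value(s)
• print(age_mode)
• age_max <- max(mydata$Age)
• age_min <- min(mydata$Age)
• age_range <- age_max - age_min
• cat("Range is:\n")
• print(age_range)
• age_limits <- range(mydata$Age)   # range() returns c(min, max), not the difference
• print(age_limits)
• age_variance <- var(mydata$Age)
• print(age_variance)
• age_sd <- sd(mydata$Age)
• print(age_sd)
• age_quartiles <- quantile(mydata$Age)
• print(age_quartiles)
• age_iqr <- IQR(mydata$Age)
• print(age_iqr)
• age_summary <- summary(mydata$Age)
• print(age_summary)
• q()   # quit the R session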
Basic Statistical Analysis
• Statistical hypothesis generation and testing
• Chi-Square test
• t-Test
• Analysis of variance
• Correlation analysis
• Maximum likelihood test
• Practice and analysis with R
Hypothesis Testing in R Programming
• A hypothesis is a statement researchers make about
the data collected for an experiment or data
set.
• A hypothesis is an assumption; it is not
necessarily true.
• The hypothesis concerns the population, while the
evidence for or against it comes from the data
collected.
• Hypothesis testing in R is the process of
checking whether the sample data support the
hypothesis made by the researcher.
• To perform hypothesis testing, a random
sample of data is taken from the population
and a test is performed. Based on the results
of the test, the hypothesis is either rejected or
not rejected. Drawing conclusions about a population
from a sample in this way is known as statistical
inference.
This section covers:
• The four-step process of hypothesis testing
• One-sample t-test
• Two-sample t-test
• Directional hypotheses
• One-sample U-test
• Two-sample U-test
• Correlation test in R
Two-Sample t-Test with Unequal Variance
• The general way to use the t.test() command
is to compare two vectors of numeric values.
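A minimal sketch of this default usage is shown here. The two vectors are hypothetical measurements invented for illustration; with no extra arguments, t.test() runs the Welch version, which does not assume equal variances.

```r
# Hypothetical measurements for two independent groups (illustration only)
group_a <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3)
group_b <- c(6.8, 7.1, 5.9, 7.4, 6.5, 7.0, 6.2)

# Default call: Welch two-sample t-test (var.equal = FALSE)
welch_result <- t.test(group_a, group_b)
print(welch_result)

# Individual parts of the result can be extracted by name
welch_result$statistic   # t-value
welch_result$parameter   # Welch-adjusted degrees of freedom
welch_result$p.value     # two-sided p-value
```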
Two-Sample t-Test with Equal Variance
• You can override the default and use the
classic t-test by adding the var.equal = TRUE
instruction, which forces the command to
assume that the variance of the two samples
is equal.
• The calculation of the t-value uses pooled
variance and the degrees of freedom are
unmodified; as a result, the p-value is slightly
different from the Welch version:
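The difference can be seen by running both versions on the same data. This is a sketch on hypothetical vectors made up for illustration; the point is that the pooled test reports n1 + n2 - 2 degrees of freedom while the Welch test adjusts them downward.

```r
# Hypothetical measurements for two independent groups (illustration only)
group_a <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3)
group_b <- c(6.8, 7.1, 5.9, 7.4, 6.5, 7.0, 6.2)

# Classic t-test: pooled variance, unmodified degrees of freedom
pooled_result <- t.test(group_a, group_b, var.equal = TRUE)
print(pooled_result)

# Welch version (the default) for comparison
welch_result <- t.test(group_a, group_b)

# Degrees of freedom and p-values side by side
c(pooled_df = unname(pooled_result$parameter),
  welch_df  = unname(welch_result$parameter))
c(pooled_p = pooled_result$p.value,
  welch_p  = welch_result$p.value)
```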
One-Sample t-Testing
• You can also carry out a one-sample t-test. In
this version you supply the name of a single
vector and the mean to compare it to (this
defaults to 0):
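For instance, a one-sample test might look like the sketch below. The vector and the comparison value of 5 are hypothetical choices for illustration; the mu argument carries the mean being tested.

```r
# Hypothetical sample (illustration only)
sample_values <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3)

# With no mu given, the sample mean is compared to 0
default_result <- t.test(sample_values)

# Compare the sample mean to a hypothesized mean of 5 instead
one_sample_result <- t.test(sample_values, mu = 5)
print(one_sample_result)
```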
Using Directional Hypotheses
• You can also specify a “direction” to your
hypothesis. In many cases you are simply testing
to see if the means of two samples are different,
but you may want to know if a sample mean is
lower than another sample mean (or greater).
• You can use the alternative = instruction to switch
the emphasis from a two-sided test (the default)
to a one-sided test. The choices you have are
between “two.sided”, “less”, or “greater”, and
your choice can be abbreviated.
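A short sketch of the three choices follows, again on hypothetical vectors invented for illustration. The alternative refers to the first sample relative to the second, so "less" asks whether the mean of group_a is lower than the mean of group_b.

```r
# Hypothetical measurements for two independent groups (illustration only)
group_a <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3)
group_b <- c(6.8, 7.1, 5.9, 7.4, 6.5, 7.0, 6.2)

# Two-sided test (the default): are the means different?
two_sided_result <- t.test(group_a, group_b)

# One-sided test: is the mean of group_a lower than that of group_b?
less_result <- t.test(group_a, group_b, alternative = "less")

# One-sided test in the other direction, using an abbreviation of "greater"
greater_result <- t.test(group_a, group_b, alternative = "g")

c(two_sided = two_sided_result$p.value,
  less      = less_result$p.value,
  greater   = greater_result$p.value)
```

The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them.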
U-test
• The U-test is used for comparing the median
values of two samples. You use it when the
data are not normally distributed, so it is
described as a non-parametric test.
• The U-test is often called the Mann-Whitney
U-test but is generally attributed to Wilcoxon
(Wilcoxon Rank Sum test), hence in R the
command is wilcox.test().
• When you have two samples to compare and
your data are not normally distributed, you can
use the U-test.
• This goes by various names and may be known
as the Mann-Whitney U-test or the Wilcoxon rank
sum test (the Wilcoxon signed rank test is the
one-sample and paired version). You use the
wilcox.test() command to carry out the analysis.
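A minimal two-sample U-test might look like the following sketch. Both vectors are hypothetical data invented for illustration and contain no tied values, so R can compute an exact p-value without warnings.

```r
# Hypothetical samples with no tied values (illustration only)
sample_x <- c(3, 5, 5.5, 6, 8, 10.5, 12)
sample_y <- c(7, 9, 11, 13, 14.5, 16, 20)

# Two-sample U-test (Wilcoxon rank sum test)
u_result <- wilcox.test(sample_x, sample_y)
print(u_result)

u_result$statistic   # W, the number of (x, y) pairs in which x exceeds y
u_result$p.value     # two-sided p-value
```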
Using Directional Hypotheses
• Both one- and two-sample tests use an alternative
hypothesis that the location shift is not equal to 0 as
their default. This is essentially a two-sided
hypothesis.
• You can change this by using the alternative =
instruction, where you can select “two.sided”, “less”,
or “greater” as your alternative hypothesis (an
abbreviation is acceptable but you still need quotes,
single or double).
• You can also specify mu, the location shift. By default
mu = 0. In the following example the hypothesis
is set to something other than 0:
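One way this could look is sketched below. The data and the value mu = 4.2 are hypothetical choices for illustration, picked so that no ties occur; the first call is a one-sample test against a non-zero location, the second a directional two-sample test.

```r
# Hypothetical samples (illustration only)
sample_x <- c(3, 5, 5.5, 6, 8, 10.5, 12)
sample_y <- c(7, 9, 11, 13, 14.5, 16, 20)

# One-sample test: is the location of sample_x greater than 4.2?
shift_result <- wilcox.test(sample_x, mu = 4.2, alternative = "greater")
print(shift_result)

# Two-sample test: is sample_x shifted below sample_y?
less_result <- wilcox.test(sample_x, sample_y, alternative = "less")
print(less_result)
```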
Paired tests
• The t-test and the U-test can both be used when
your data are in matched pairs. Sometimes this
kind of test is also called a repeated measures test
(depending on circumstance). You can run the test
by adding paired = TRUE to the appropriate
command.
• Here is an example where the data show the
effectiveness of greenhouse sticky traps in
catching whitefly. Each trap has a white side and a
yellow side. To compare white and yellow we can
use a matched pair.
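The trap counts themselves are not reproduced in the text, so the sketch below uses invented numbers purely for illustration. Each position in the two vectors is assumed to hold the white-side and yellow-side counts from the same trap.

```r
# Hypothetical whitefly counts, one pair per trap (illustration only)
white  <- c(12,  9, 15, 7, 11,  8, 14, 10)
yellow <- c(18, 16, 20, 9, 19, 12, 17, 11)

# Paired t-test: tests whether the mean of the within-trap differences is 0
paired_t <- t.test(white, yellow, paired = TRUE)
print(paired_t)

# Paired U-test (Wilcoxon signed rank test) on the same matched pairs
paired_u <- wilcox.test(white, yellow, paired = TRUE)
print(paired_u)
```

Both vectors must have the same length and the same ordering, because the test works on the differences white[i] - yellow[i].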
CORRELATION AND COVARIANCE
• When you have two continuous variables you can look for a
link between them; this link is called a correlation.
• You can go about finding this several ways using R. The cor()
command determines correlations between two vectors, all
the columns of a data frame (or matrix), or two data frames
(or matrix objects). The cov() command examines
covariance.
• By default the Pearson product moment (that is regular
parametric correlation) is used but Spearman (rho) and
Kendall (tau) methods (both non-parametric correlation)
can be specified instead. The cor.test() command carries
out a test of significance of the correlation.
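The commands named above can be sketched on two short vectors. The height and weight values are hypothetical data made up for illustration; the method argument selects between the Pearson, Spearman and Kendall coefficients.

```r
# Hypothetical paired measurements (illustration only)
height_cm <- c(150, 155, 160, 165, 170, 175, 180)
weight_kg <- c(52, 56, 59, 65, 68, 74, 79)

# Correlation coefficients
pearson_r   <- cor(height_cm, weight_kg)                       # default: Pearson
spearman_r  <- cor(height_cm, weight_kg, method = "spearman")  # rank-based (rho)
kendall_tau <- cor(height_cm, weight_kg, method = "kendall")   # rank-based (tau)

# Covariance of the two variables
xy_cov <- cov(height_cm, weight_kg)

# Significance test of the Pearson correlation
cor_test_result <- cor.test(height_cm, weight_kg)
print(cor_test_result)
```

Because both vectors increase together without exception, the two rank-based coefficients equal 1 here, while the Pearson coefficient is slightly below 1.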
Simple Correlation
• Simple correlations are between two
continuous variables and you can use the cor()
command to obtain a correlation coefficient
like so:
• If your vectors are contained within a data
frame or some other object, you need to
extract them in a different fashion. Look at the
women data frame. This comes as example
data with your distribution of R.
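With the women data frame, the columns can be pulled out with the $ operator or the whole data frame can be passed to cor(). This is a minimal sketch that relies only on the built-in data set, which has the columns height and weight.

```r
# Inspect the built-in example data
head(women)

# Extract the two columns with $ and correlate them
height_weight_r <- cor(women$height, women$weight)
print(height_weight_r)

# Passing the whole data frame returns a correlation matrix of all columns
cor_matrix <- cor(women)
print(cor_matrix)

# with() avoids repeating the data frame name in the significance test
women_test <- with(women, cor.test(height, weight))
print(women_test)
```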
1. https://www.youtube.com/watch?v=ZcaKgqXsEbA
2. https://www.youtube.com/watch?v=ua-CiDNNj30
3. https://www.youtube.com/watch?v=xiEC5oFsq2s
