Summary Statistics for a Single Set of Data (with Vectors)
Examples for Vectors and Data Frames
# Sample vector of daily sales for a retail store (in dollars)
daily_sales <- c(5000, 6200, 4800, 5500, 7200, 6300, 5100, 4800, 5400, 6200, 5800, 7000,
6800, 5500, 6100, 5300, 4700, 5900, 6200, 6500, 7200, 6800, 5600, 4800, 5200, 6100, 5800,
7200, 6900, 5500, 6100)
# Calculate single-value summary statistics
mean_sales <- mean(daily_sales)
median_sales <- median(daily_sales)
std_deviation <- sd(daily_sales)
min_sales <- min(daily_sales)
max_sales <- max(daily_sales)
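If you only need a quick overview, R's built-in summary() reports the minimum, quartiles, median, and mean in one call; a minimal sketch using the daily_sales vector above:
# One-call overview of the vector: min, 1st quartile, median, mean, 3rd quartile, max
summary(daily_sales)
# Display the single-value statistics computed above
mean_sales
std_deviation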
Summary Statistics with a Data Frame (a data frame is a data structure that holds two or more sets of data as columns)
NOTE:
There are two ways of creating a data frame.
1. Create each set of data separately as a vector, for example:
Data1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 20)
Data2 = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
Here there are two sets of data that you now want in table format. Assign data.frame() to any variable name, e.g.:
df = data.frame(Data1 = Data1, Data2 = Data2)   # press Enter
df                                              # press Enter to display the table
2. The other way is to write the values directly inside one command:
df = data.frame("Name" = c("Amiya", "Rosy", "Asish"),
                "Gender" = c("Male", "Female", "Male"))
df
(The difference is that instead of creating each vector separately, you write everything directly in one command.)
Here are some more examples:
data_frame_data <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Math = c(85, 92, 78),
Science = c(88, 90, 85),
History = c(75, 82, 90))
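To get summary statistics from a data frame such as data_frame_data above, summary() can be applied to the whole data frame, or a single function to one column; a short sketch:
# Summary statistics for every column of the data frame
summary(data_frame_data)
# A single statistic for one column, e.g. the mean Math score
mean(data_frame_data$Math)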
#Contingency table
df = data.frame("Name" = c("Amiya", "Rosy", "Asish"),"Gender" = c("Male", "Female",
"Male"))
> table(df)
Output:
Gender
Name Female Male
Amiya 0 1
Asish 0 1
Rosy 1 0
# Sample data
gender <- c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female",
"Male", "Female")
brand <- c("Apple", "Samsung", "Samsung", "Apple", "Samsung", "Google", "Apple",
"Google", "Samsung", "Other")
# Note: when using strings (character values), each value must be enclosed in quotation marks inside c()
# Create a data frame
data_df <- data.frame(Gender = gender, Brand = brand)
# Using the xtabs() function
cross_tab2 <- xtabs(~ Gender + Brand, data = data_df)
print(cross_tab2)
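If you also want row and column totals, or proportions instead of counts, base R's addmargins() and prop.table() work directly on the table produced above; a small sketch using cross_tab2:
# Add row and column totals to the cross-tabulation
addmargins(cross_tab2)
# Convert the counts to proportions of the grand total
prop.table(cross_tab2)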
Importing Data from an Excel or CSV File
Finding the file and loading it in R:
getwd()   # press Enter; shows the current working directory, e.g. "C:/Users/Hp/Documents"
data1 = read.csv(file.choose())   # press Enter; a file dialog opens so you can pick the CSV file
data1   # press Enter to display the imported data
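read.csv() handles CSV files; for a native Excel workbook (.xlsx) one common option is the readxl package. A sketch, assuming readxl is installed and the first row of the sheet contains column names:
# install.packages("readxl")   # run once if the package is not installed
library(readxl)
data2 = read_excel(file.choose())   # pick the .xlsx file in the dialog
data2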
COMMANDS FOR DATA ANALYSIS FOR DIFFERENT TESTS
(For each test below: the test name, the sample data, and the commands to run.)
Shapiro-Wilk normality test

Data:
ch1_data <- c(7, 7, 8, 8, 9, 10, 11, 11, 12, 12, 12, 13, 14, 15, 17, 17, 17, 18, 18, 19)
ch2_data <- c(36, 21, 27, 39, 33, 42, 25, 30, 31, 37, 35, 29, 23, 34, 41, 23, 32, 32, 30, 39)

# Summary statistics for channel 1
summary(ch1_data)
# Summary statistics for channel 2
summary(ch2_data)

# Step 3: Create a histogram for each channel
hist(ch1_data, main = "Delivery Times Ch1", xlab = "Delivery Time (hrs)", col = "lightblue", border = "black")
hist(ch2_data, main = "Delivery Times Ch2", xlab = "Delivery Time (hrs)", col = "lightblue", border = "black")

# Step 4: Density plot to visualize the data for channels 1 and 2
dens = density(ch1_data)
plot(dens$x, dens$y)
# For channel 2
dens = density(ch2_data)
plot(dens$x, dens$y)

# Step 5: Conduct the Shapiro-Wilk normality test
shapiro_test_result = shapiro.test(ch1_data)
print(shapiro_test_result)

# Step 6: Kolmogorov-Smirnov test to compare the two channels
ks_test_result = ks.test(ch1_data, ch2_data)
print(ks_test_result)

# Step 7: Create a QQ plot to visually assess the goodness-of-fit
qqnorm(ch1_data)
qqline(ch1_data)
qqnorm(ch2_data)
qqline(ch2_data)

# Step 8: Draw a qqplot to compare the two channels
qp = qqplot(ch1_data, ch2_data)
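To read the Shapiro-Wilk result in code rather than by eye, the p-value is stored in the returned object; a minimal sketch, where the 0.05 cutoff is an assumed significance level, not part of the original notes:
# Compare the stored p-value with an assumed 0.05 significance level
if (shapiro_test_result$p.value < 0.05) {
  print("Reject normality: the data do not look normally distributed")
} else {
  print("Fail to reject normality: the data are consistent with a normal distribution")
}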
Student's t-test

Data:
group_new <- c(80, 85, 88, 92, 78, 90, 84, 88, 85, 89)
group_traditional <- c(75, 82, 79, 88, 70, 81, 75, 80, 78, 83)

t_test_result <- t.test(group_new, group_traditional)
print(t_test_result)
One-sample t-test

Data (assume the hypothesized population mean test score is 75):
sample_data <- c(72, 74, 78, 70, 76, 73, 77, 75, 79, 71, 74, 76, 80, 72, 74,
                 75, 73, 75, 78, 76, 73, 74, 76, 77, 75, 72, 78, 74, 76, 75)

t_test_result <- t.test(sample_data, mu = 75)   # mu is the hypothesized population mean
print(t_test_result)
Two-sample t-test with unequal variances (Welch's t-test)

Data:
group1 <- c(22, 24, 25, 28, 26)
group2 <- c(30, 32, 31, 35, 33)

# Perform a two-sample t-test with unequal variances (Welch's t-test)
t_test_result <- t.test(group1, group2, var.equal = FALSE)

# Print the results
print(t_test_result)

(Note: in this analysis we use unequal variances, so var.equal = FALSE is written explicitly; this is also t.test()'s default. If you want the pooled equal-variance test instead, set var.equal = TRUE.)
One-tailed paired samples t-test

Data:
before_training <- c(50, 55, 48, 52, 45, 47, 53, 49, 51, 50)
after_training <- c(58, 62, 55, 60, 54, 56, 61, 57, 59, 58)

# paired = TRUE treats the two vectors as before/after measurements on the same subjects
t_test_result <- t.test(after_training, before_training, paired = TRUE, alternative = "greater")
print(t_test_result)
Two-sample paired Wilcoxon test (with paired = TRUE this is the Wilcoxon signed-rank test)

Data:
after = c(4, 3, 4, 2, 3)
before = c(6, 7, 8, 5, 7)

result = wilcox.test(after, before, paired = TRUE)
print(result)
Covariance (without xl file)

data <- data.frame(
  Student = 1:10,
  Hours_Studied = c(2, 3, 1, 4, 5, 2, 3, 1, 4, 5),
  Exam_Score = c(65, 75, 60, 80, 90, 70, 75, 55, 85, 95)
)

# Calculate the covariance between Hours_Studied and Exam_Score
# (cov() of two vectors returns a single number, not a matrix)
covariance_value <- cov(data$Hours_Studied, data$Exam_Score)

# Print the covariance
covariance_value
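cov() can also be applied to several numeric columns at once, in which case it really does return a covariance matrix (variances on the diagonal); a short sketch using the data frame above:
# Covariance matrix for the two numeric columns
cov(data[, c("Hours_Studied", "Exam_Score")])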
Correlation (without xl file)

Data:
tv_ad_spend <- c(5000, 5500, 6000, 5500, 5800, 6200, 6500, 7000, 7500, 7200)
sales_revenue <- c(75000, 78000, 82000, 76000, 80000, 84000, 87000, 91000, 95000, 93000)

# Calculate the Pearson correlation coefficient
correlation_coefficient <- cor(tv_ad_spend, sales_revenue, method = "pearson")

# Print the correlation coefficient
correlation_coefficient
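To test whether the correlation is statistically significant rather than only reporting its size, base R's cor.test() gives the coefficient together with a p-value and confidence interval; a minimal sketch with the same two vectors:
# Significance test for the Pearson correlation between ad spend and revenue
cor.test(tv_ad_spend, sales_revenue, method = "pearson")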
Test for association using the chi-squared test (without xl file)

# Create a data frame with demographic data
data <- data.frame(
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"),
  Education_Level = c("High School", "College", "High School", "College", "Graduate",
                      "High School", "College", "Graduate", "High School", "Graduate")
)

# Create a contingency table (cross-tabulation) of the two variables
contingency_table <- table(data$Gender, data$Education_Level)

# Check what the contingency table looks like
contingency_table

# Perform a chi-squared test for independence
chisq.test(contingency_table)
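The expected counts behind the test are stored in the returned object, which is worth checking because the chi-squared approximation is unreliable when expected counts are very small; a small sketch using the contingency_table above:
# Store the test result and inspect the expected counts under independence
chi_result <- chisq.test(contingency_table)
chi_result$expected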
Test for association using the chi-squared test (with xl file)

# Step 1: Load the CSV file
data1 = read.csv(file.choose())
data1

# Step 2: Generate a contingency table summing up the total respondents for each price-rating combination
contingency_table = xtabs(Number.of.respondents ~ Price + Rating, data = data1)
contingency_table

# Step 4: Run the chi-squared test to test the hypothesis
chisq.test(contingency_table)
One-Way ANOVA (with xl file)

# Step 1: Load the CSV file
students_data = read.csv(file.choose())
students_data

# Step 2: Visualize the group means using a boxplot
boxplot(Test_Score ~ Teaching_Method, data = students_data, col = "lightblue", pch = 18,
        main = "Distribution of Test Scores by Teaching Method",
        xlab = "Teaching Method", ylab = "Test Score")

# Step 3: One-way ANOVA
anova_result <- aov(Test_Score ~ Teaching_Method, data = students_data)
summary(anova_result)

# Step 4: Tukey HSD post hoc testing
tukey_results <- TukeyHSD(anova_result)
print(tukey_results)
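One-way ANOVA assumes roughly normal residuals with similar spread across groups; a brief optional check using the fitted anova_result from Step 3:
# Optional check: normality of residuals and a residuals-vs-fitted plot
shapiro.test(residuals(anova_result))
plot(anova_result, which = 1)   # residuals vs fitted values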
Two-Way ANOVA

# Step 1: Load the CSV file and organize it into a data frame
GTL = read.csv(file.choose())
GTL

# Step 2: Using a boxplot, visualize light vs temperature for the different glass types
boxplot(Light ~ Temp * Glass, data = GTL, col = c("lightblue", "lightgreen"),
        main = "Boxplot of Light vs Temperature for Different Glass Types",
        xlab = "Temperature", ylab = "Light")

# Step 3: Formulate a hypothesis about the effect of glass type and temperature on light output, then run the two-way ANOVA
anova_result = aov(Light ~ Glass * Temp, data = GTL)
summary(anova_result)

# Step 4: Conduct post-hoc testing
TukeyHSD(anova_result)
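Because the model includes a Glass * Temp interaction, an interaction plot helps when reading the ANOVA table; a sketch with base R's interaction.plot(), assuming GTL has the Temp, Glass, and Light columns used above:
# Mean light output at each temperature, one line per glass type
interaction.plot(GTL$Temp, GTL$Glass, GTL$Light,
                 xlab = "Temperature", ylab = "Mean Light", trace.label = "Glass")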
Linear Regression

# Step 1: Load the data and organize it into a data frame
height = c(65, 62, 60, 64, 68, 70, 68, 65)
weight = c(75, 70, 65, 72, 75, 80, 72, 64)
student_data = data.frame(Height = height, Weight = weight)
student_data

# Create a simple regression of Weight vs Height
reg = lm(Weight ~ Height, data = student_data)
summary(reg)

# Find the correlation coefficient (the intercept and slope are reported in the regression summary above)
correlation = cor(student_data$Height, student_data$Weight)
correlation
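To see the fitted line against the raw data, plot the points and add the regression line with abline(); a short sketch using the reg model above:
# Scatterplot of the data with the fitted regression line
plot(student_data$Height, student_data$Weight, xlab = "Height", ylab = "Weight",
     main = "Weight vs Height")
abline(reg, col = "red")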
Multiple Regression

Data:
sales = c(10, 15, 12, 18, 20, 22, 25, 28, 39, 32)
advertising = c(5, 6, 6, 8, 10, 12, 15, 16, 18, 20)
pricing = c(20, 18, 16, 15, 14, 13, 12, 11, 10, 9)
competitor_pricing = c(18, 17, 16, 16, 15, 14, 13, 12, 11, 10)

# Step 1: Combine the vectors into a data frame
sales_data = data.frame(Sales = sales, Advertising = advertising, Pricing = pricing,
                        Competitor_Pricing = competitor_pricing)
sales_data

Note: if you enter the data as vectors (that is, without an xl file) for regression or any analysis that involves several columns, you need to combine the vectors into a data frame, as above.

# Step 2: Create a regression model
reg_model = lm(Sales ~ Advertising + Pricing + Competitor_Pricing, data = sales_data)
reg_model
summary(reg_model)
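Once the model is fitted, predict() can estimate sales for new combinations of the predictors; the values below are made up purely for illustration:
# Predict sales for a hypothetical new observation (illustrative values only)
new_obs <- data.frame(Advertising = 14, Pricing = 12, Competitor_Pricing = 13)
predict(reg_model, newdata = new_obs)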