R Commands
To get the working directory getwd()
To set the working directory setwd("path/to/folder") (replace the path with your own folder)
To create a vector v <- c(1, 2, 3)
To create a matrix (filled column-wise by default) M1 <- matrix(1:20, nrow = 4, ncol = 5)
To fill the matrix row-wise M1 <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)
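The difference between the two fill orders is easiest to see by comparing the first row of each matrix. This is a minimal sketch; the names M_col and M_row are chosen here for illustration only.

```r
# Same values, two fill orders
M_col <- matrix(1:20, nrow = 4, ncol = 5)                # default: fill down the columns
M_row <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)  # fill across the rows

M_col[1, ]  # 1 5 9 13 17
M_row[1, ]  # 1 2 3 4 5
```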
To get the value of a vector at a particular position v[2]
To load an installed package library(ggplot2)
To list the objects currently defined in the workspace ls()
To get a dataset from the datasets package datasets::mtcars
To view a dataset View(mtcars)
To store a dataset in your own object File1 <- mtcars
View(File1)
To find out which class an object belongs to class(File1)
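One detail worth checking when copying a built-in dataset: data() loads the dataset by name and returns that name as text, so assigning its result does not store the data itself. The sketch below compares both assignments; name_only is a name made up for this example.

```r
# data() returns the dataset's name, not the data
name_only <- data(mtcars)
# Assigning the dataset itself gives a real copy
File1 <- datasets::mtcars

class(name_only)  # "character"
class(File1)      # "data.frame"
```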
To import data from a file without a header row Myfile <- read.csv(file.choose(), sep = "", header = FALSE)
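To try the import without picking a file by hand, the sketch below writes two lines to a temporary file and reads them back. The temporary file and its contents are made up for illustration; with header = FALSE, R names the columns V1, V2, V3 itself.

```r
# Create a small whitespace-separated file with no header row
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6"), tmp)

# sep = "" means "split on any whitespace"
Myfile <- read.csv(tmp, sep = "", header = FALSE)
names(Myfile)  # "V1" "V2" "V3"
Myfile

unlink(tmp)  # remove the temporary file
```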
To copy a dataset into another data frame Myfile <- as.data.frame(mtcars)
This stores the data from the mtcars data frame in Myfile.
To show the first 6 rows of the data head(Myfile)
To show the last 6 rows of the data tail(Myfile)
To show the structure of the data frame str(Myfile)
To get descriptive statistics for the whole data frame summary(Myfile)
To get descriptive statistics for one variable in the data frame summary(Myfile$mpg)
The $ operator selects the variable whose statistics you want. Here mpg is a variable in the data frame named Myfile.
To get the variance of one variable in the data frame var(Myfile$mpg)
To get the standard deviation of one variable sqrt(var(Myfile$mpg)), or equivalently sd(Myfile$mpg)
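The square root of the variance is the standard deviation, which R also provides directly as sd(). A short check on the built-in mtcars data (the names mpg, v and s are chosen here for illustration):

```r
mpg <- datasets::mtcars$mpg

v <- var(mpg)  # sample variance
s <- sqrt(v)   # standard deviation computed from the variance

all.equal(s, sd(mpg))  # TRUE: both routes agree
```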
To import the data file Advertising.txt Advertising <- read.csv(file.choose(), sep = " ")
Then type the object name to print it:
Advertising
To import an Excel file, first install the readxl package:
install.packages("readxl")
Then load the package:
library("readxl")
To open the Excel file:
my_data <- read_excel(file.choose())
Cleaning of Data
Mismatched Data
Missing Values
Irrelevant Data
Outliers
Infeasible value (Ex: Age can never be negative)
Redundant Data
To check for missing values in the whole dataset (named Advertising) is.na(Advertising)
To check whether the dataset contains any missing values any(is.na(Advertising))
The output is either TRUE or FALSE.
To count the total number of missing values in the dataset sum(is.na(Advertising))
To find the positions of the missing values which(is.na(Advertising))
Add arr.ind = TRUE to get row and column numbers instead of single positions: which(is.na(Advertising), arr.ind = TRUE)
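The missing-value checks can be tried on a small made-up data frame. The data frame toy and its values are assumptions for illustration and are not taken from the Advertising file.

```r
# Toy data with two missing values
toy <- data.frame(
  TVAds = c(10, NA, 30),
  NewspaperAds = c(5, 7, NA)
)

is.na(toy)       # TRUE/FALSE for every cell
any(is.na(toy))  # TRUE
sum(is.na(toy))  # 2

# Positions counted down the columns: 2 and 6
which(is.na(toy))

# Row and column numbers of each missing value
loc <- which(is.na(toy), arr.ind = TRUE)
loc
```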
Remove Missing Value
If the dataset has missing values, we can handle them in two ways:
Remove the whole row that contains the missing value
Replace the missing value with the mean value
To remove the rows that have missing values Advertising_new <- na.omit(Advertising)
Advertising_new
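A quick way to see what na.omit() does is to count rows before and after. The data frame toy below is made up for this sketch.

```r
# Toy data: the second row has a missing value
toy <- data.frame(
  City = c("A", "B", "C"),
  NewspaperAds = c(5, NA, 9)
)

toy_new <- na.omit(toy)  # drops every row that contains an NA

nrow(toy)      # 3
nrow(toy_new)  # 2
toy_new
```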
To replace the missing value with the mean value at the position where NA is written:
sum(Advertising$NewspaperAds, na.rm = TRUE)
This gives the total of all observations other than NA.
mean(Advertising$NewspaperAds, na.rm = TRUE)
This gives the mean of all observations under NewspaperAds, not including NA.
Avg_newspaper <- mean(Advertising$NewspaperAds, na.rm = TRUE)
This stores the mean in the variable Avg_newspaper.
Advertising$NewspaperAds[is.na(Advertising$NewspaperAds)] <- Avg_newspaper
This puts the mean at every position where NA is written.
View(Advertising$NewspaperAds)
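The replacement steps can be followed end to end on a small made-up column. The data frame toy and its values are assumptions for illustration only.

```r
# Toy column with two missing values
toy <- data.frame(NewspaperAds = c(10, NA, 30, NA, 20))

# Mean of the observed values only: (10 + 30 + 20) / 3 = 20
avg_newspaper <- mean(toy$NewspaperAds, na.rm = TRUE)

# Overwrite only the NA positions with that mean
toy$NewspaperAds[is.na(toy$NewspaperAds)] <- avg_newspaper

toy$NewspaperAds  # 10 20 30 20 20
```

Note that filling with the mean keeps the column mean unchanged but reduces its variance, so it is a simple choice rather than the only one.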
To Rename the Column Headers
To rename the header of every column names(Advertising) <- c("City", "TVAds", "RadioAds", "NewspaperAds", "Sales", "StoreType")
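When renaming all headers at once, the vector of names needs one entry per column. The sketch below uses a made-up three-column data frame, and also shows one way to rename a single column; the name TelevisionAds is hypothetical.

```r
# Toy data with default-style column names
toy <- data.frame(
  V1 = c("Pune", "Delhi"),
  V2 = c(100, 150),
  V3 = c(20, 25)
)

# Rename every column: one new name per existing column
names(toy) <- c("City", "TVAds", "RadioAds")
names(toy)  # "City" "TVAds" "RadioAds"

# Rename just one column by matching its current name
names(toy)[names(toy) == "TVAds"] <- "TelevisionAds"
names(toy)  # "City" "TelevisionAds" "RadioAds"
```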
Convert into a different data type
Converting numeric into integer. Example:
V1 <- 10
This is not an integer; it is stored as numeric (double).
To convert it into an integer:
V1 <- as.integer(10)
Now it is stored as an integer.
OR
Put "L" after the number to create it as an integer:
V1 <- 10L
Now the value is stored as an integer.
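The storage type can be confirmed with class(). The sketch below uses the hypothetical names V2 and V3 for the converted values, and also shows that as.integer() drops the fractional part rather than rounding.

```r
V1 <- 10
class(V1)  # "numeric"

V2 <- as.integer(V1)  # explicit conversion
V3 <- 10L             # integer literal
class(V2)  # "integer"
class(V3)  # "integer"

# Conversion truncates toward zero; it does not round
as.integer(3.9)  # 3
```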
Visualization
To use the ggplot2 plotting functions, first load the package with the command:
library(ggplot2)
The hist() and plot() commands below are part of base R and also work without loading it.
To create a histogram hist(Advertising$RadioAds)
To create a scatter plot (values plotted against their row index) plot(Advertising$RadioAds)
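Because the Advertising file is not available here, the sketch below uses the built-in mtcars data instead; the choice of the mpg and wt columns is an assumption for illustration. It also shows the two-variable form of plot(), which draws one variable against another.

```r
cars <- datasets::mtcars

# Histogram of one numeric variable; hist() also returns the bin counts
h <- hist(cars$mpg, main = "Miles per gallon", xlab = "mpg")
h$counts

# Scatter plot of two variables: x first, then y
plot(cars$wt, cars$mpg,
     xlab = "Weight", ylab = "Miles per gallon",
     main = "mpg against weight")
```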
Binding the rows/columns
To bind two vectors column-wise x <- 21:23
y <- 7:9
cbind(x, y)
The vectors should be the same length; otherwise R recycles the shorter one (with a warning if the lengths are not multiples of each other).
The output will be:
21 7
22 8
23 9
To bind two vectors row-wise x <- 21:23
y <- 7:9
rbind(x, y)
The vectors should be the same length; otherwise R recycles the shorter one (with a warning if the lengths are not multiples of each other).
The output will be:
21 22 23
7 8 9
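The two results have the same values in different shapes, which dim() makes explicit. In this sketch the names cb and rb are chosen for illustration.

```r
x <- 21:23
y <- 7:9

cb <- cbind(x, y)  # each vector becomes a column
rb <- rbind(x, y)  # each vector becomes a row

dim(cb)       # 3 2
dim(rb)       # 2 3
colnames(cb)  # "x" "y"
rownames(rb)  # "x" "y"
```

The row-bound matrix is the transpose of the column-bound one, so t(cb) holds the same values as rb.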
Dimension Names of a Matrix
To assign names to the rows and columns of a matrix m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), c("c", "d", "e")))
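Once a matrix has dimension names, its cells can be selected by name as well as by position. A short sketch using the same matrix:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("c", "d", "e")))

rownames(m)  # "a" "b"
colnames(m)  # "c" "d" "e"

# The matrix is filled column-wise, so column "d" holds 3 and 4
m["a", "d"]  # 3
m["b", "e"]  # 6
```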