UNIT 1 Notes
Basic Fundamentals
R is an environment for data manipulation, statistical computing, graphical display,
and data analysis. It supports effective data handling and storage of output.
Simple as well as complicated calculations are possible.
Introduction to R: R is a programming language and environment designed for
statistical computing and graphics. Developed by Ross Ihaka and Robert
Gentleman in the 1990s, it's widely used in data analysis, statistical modeling,
and visualization.
History and Applications: R has its roots in the S language created at Bell
Laboratories. It's used in various fields such as academia, healthcare, and
finance for data analysis and visualization.
Key Features
Comprehensive collection of statistical tools.
Rich graphical capabilities.
Extensible through packages.
Community support and active development.
Advantages of R
Open-source and free to use.
Wide variety of libraries and packages for data analysis and visualization.
Active community and extensive online resources.
Platform-independent (works on Windows, macOS, and Linux).
Excellent for statistical analysis and data manipulation.
Integration with other programming languages like Python and C++.
R is an interpreted computer language.
Installation and Use of Software
Downloading R: You can download R from the Comprehensive R Archive
Network (CRAN) at https://cran.r-project.org/. Follow the instructions on the
website to install R for your operating system (Windows, macOS, Linux).
Installing RStudio: RStudio is an integrated development environment (IDE) for
R. It provides a user-friendly interface for writing and executing R code.
Download RStudio from https://rstudio.com/.
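Once R is installed, a quick check from the R console (or the RStudio console) can confirm that the installation works. The sketch below uses only objects that ship with base R; the version it prints depends on your own installation, and the variable name major is chosen here for illustration.

```r
# Print the version of R that is currently running
print(R.version.string)

# Store the major version number as text (for example "4" for R 4.x)
major <- R.version$major
print(major)
```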
Creating a vector
Vector entries can be calculations or previously stored items, including other
vectors.
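myvec<-c(1,3,4,2,4) #c() combines values into a vector
myvec
foo<-32.1
newvec<-c(1,2,3,4,foo) #a stored value used as a vector entry
myvec3<-c(myvec,newvec) #two vectors joined into one longer vector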
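Sequences with the colon operator
3:27 #integers from 3 to 27 in steps of 1
foo=5.3
bar=foo:(-47+1.5) #counts down from 5.3 in steps of 1, stopping before it passes -45.5
bar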
Sequences with seq
seq(from=3,to=27,by=3)
seq(from=3,to=27,length.out=40)
length.out is the number of values in the resulting sequence. Here 40 evenly spaced values are produced, so 39 gaps are formed between them.
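To see how length.out relates to the number of values and the number of gaps, the following sketch counts both for the call above. The variable names s, n_values, n_gaps, and step are chosen here only for illustration.

```r
# 40 evenly spaced values from 3 to 27
s <- seq(from = 3, to = 27, length.out = 40)

n_values <- length(s)        # number of values in the sequence
n_gaps <- length(diff(s))    # number of gaps between neighbouring values
step <- (27 - 3) / (40 - 1)  # size of each gap: total range divided by the number of gaps

print(n_values)
print(n_gaps)
print(step)
```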
Repetition with rep
rep(x=1,times=4)
x=c(3,4,2)
rep(x,times=3)
#3 4 2 3 4 2 3 4 2
rep(x,each=2)
#3 3 4 4 2 2
rep(x,each=2,times=3)
#3 3 4 4 2 2 3 3 4 4 2 2 3 3 4 4 2 2
Sorting with sort
sort(x=c(2,4,5,19,33),decreasing=FALSE)
#for ascending order
foo=seq(from=4.3,to=5.5,length.out=8)
bar=sort(x=foo,decreasing=TRUE)
Vector length
length(x=c(1,2,3,4))
Subsetting and Element Extraction
myvec<-c(5,-2,3,4,4,4,-8)
length(x=myvec)
#7
myvec[1]
#5
foo<-myvec[2]
foo
#-2
myvec[length(myvec)]
#-8
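The square brackets also accept a vector of positions, so several elements can be extracted at once. The following is a small sketch using the same myvec; the names first_three, picked, and without_first are illustrative only.

```r
myvec <- c(5, -2, 3, 4, 4, 4, -8)

first_three <- myvec[1:3]      # elements 1 to 3
picked <- myvec[c(1, 3, 7)]    # elements 1, 3 and 7
without_first <- myvec[-1]     # a negative index drops that element

print(first_three)    # 5 -2 3
print(picked)         # 5 3 -8
print(without_first)  # -2 3 4 4 4 -8
```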
Command Line vs Script
Execution of commands in R is not menu driven.
We need to type the commands.
Both single-line and multi-line commands can be written.
When writing a multi-line program, it is useful to use a text editor rather than
executing everything directly at the command line.
There are 2 options:
1. Use R’s own built-in editor. It is accessible from the R GUI menu bar.
2. Use the RStudio software.
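As a rough illustration of the script approach, the sketch below saves a few commands to a file and then runs them all at once with source(), a base R function. The temporary file and the names script_path, a, b, and total are assumptions made for this example; in practice the script would be a file you saved from an editor.

```r
# Write a small three-line script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "b <- 3", "total <- a + b"), script_path)

# Run every line of the script in one step instead of typing each command
source(script_path)
print(total)  # 5
```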
Introduction to RStudio
It is an interface between R and the user.
It is especially useful for beginners.
It makes coding easier.
There are 4 windows in RStudio:
Window 1- Script section, Win 2- Console (calculations take place here),
Win 3- Environment window (all the variables and objects used
in the program appear here), Win 4- Output window (output appears here).
Cleaning up the windows
We assign names to variables when analyzing any data. It is good practice to
remove the variable names given to any data frame at the end of each session in
R.
To remove objects we use rm().
detach() removes an object from the search path of available R objects. Usually this is either
a data frame which has been attached or a package which was attached by
library().
x=3
y=4
rm(x,y)
To get rid of everything, including data frames, type rm(list=ls())
library(splines) #loads the package splines
detach(package:splines) #detaches the package splines
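One way to confirm that the cleanup worked is to inspect the workspace with ls() and the search path with search(), both base R functions. This is a minimal sketch; the names before, after, attached, and detached are illustrative.

```r
x <- 3
y <- 4
before <- c("x", "y") %in% ls()   # both names are in the workspace
rm(x, y)
after <- c("x", "y") %in% ls()    # both names are gone after rm()

library(splines)
attached <- "package:splines" %in% search()     # the package is on the search path
detach(package:splines)
detached <- !("package:splines" %in% search())  # and removed again after detach()

print(before)
print(after)
print(attached)
print(detached)
```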
To plot the Histograms
hist(v, main, xlab, xlim, ylim, breaks, col, border)
v: Numerical values used in the histogram.
main: Title of the chart.
xlab: Label for the horizontal axis.
xlim: Range of values for the x-axis.
ylim: Range of values for the y-axis.
breaks: Suggested number of bars (bins), or a vector of breakpoints between bars.
col: Color of the bars.
border: Border color of each bar
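The arguments listed above can be combined in a single call. The following sketch uses a made-up numeric vector v and arbitrary titles, limits, and colors, so every value in it is an assumption for illustration rather than data from these notes.

```r
# Hypothetical sample values
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# Draw the histogram and keep the object that hist() returns
h <- hist(v,
          main = "Histogram of v",  # title of the chart
          xlab = "Value",           # label for the horizontal axis
          xlim = c(0, 50),          # range of the x-axis
          ylim = c(0, 5),           # range of the y-axis
          breaks = 5,               # suggested number of bars
          col = "lightblue",        # fill color of the bars
          border = "black")         # border color of the bars

# The returned object stores the breakpoints and the count in each bar
print(h$breaks)
print(h$counts)
```

Because breaks = 5 is only a suggestion, R may choose a slightly different number of bars so that the breakpoints fall on round values.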
3. Data Editing
Use the data.frame or matrix structure to store tabular data.
Example:
data <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
print(data)
Use edit(data) to edit data interactively. edit() returns the modified copy, so assign the result (data <- edit(data)) to keep the changes.
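Data can also be edited without the interactive editor by assigning to rows, columns, or cells directly. The sketch below is one possible approach; the extra row for "Carol" and the matrix m are hypothetical values added for illustration.

```r
data <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# Change one value: set Bob's age to 31
data$Age[data$Name == "Bob"] <- 31

# Add a new row (hypothetical person)
data <- rbind(data, data.frame(Name = "Carol", Age = 28))
print(data)

# A matrix stores tabular data of a single type; cells are edited by [row, column]
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2] <- 10
print(m)
```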
4. Using R as a Calculator
Basic arithmetic:
5 + 3 # Addition
5 - 3 # Subtraction
5 * 3 # Multiplication
5 / 3 # Division
5^3 # Exponentiation
Functions for math operations:
sqrt(16) # Square root
log(10) # Natural logarithm
log10(100) # Base-10 logarithm
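A few more calculator-style operations are sketched below using base R operators only. The variable names are illustrative.

```r
int_div <- 5 %/% 3             # integer division
remainder <- 5 %% 3            # remainder after division
combined <- (5 + 3) * 2 / 4    # parentheses control the order of evaluation
log_base2 <- log(8, base = 2)  # logarithm with a chosen base

print(int_div)    # 1
print(remainder)  # 2
print(combined)   # 4
print(log_base2)  # 3
```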
5. Functions and Assignments
Defining Functions
Creating reusable blocks of code (functions)
add <- function(a, b) {
  return(a + b)
}
result <- add(5, 3)
print(result)
Assignments
Use <- or = to assign values to variables for computation and data
manipulation.
x <- 10
y = 20
z <- x + y
print(z)
6. R Packages
Packages are collections of pre-built functions and datasets that extend R’s
capabilities for specialized tasks.
To install a package:
install.packages("ggplot2")
Load a package:
library(ggplot2)
Check installed packages:
installed.packages()
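installed.packages() returns a large matrix with one row per package, so it is often easier to test for a single package by name. The helper is_installed() below is a hypothetical function written for this example, not part of R.

```r
# Package names are stored as the row names of the returned matrix
installed <- rownames(installed.packages())

# Hypothetical helper: TRUE if the named package is installed
is_installed <- function(pkg) {
  pkg %in% installed
}

has_stats <- is_installed("stats")      # stats ships with every R installation
has_ggplot2 <- is_installed("ggplot2")  # depends on whether it was installed earlier

print(has_stats)
print(has_ggplot2)
```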
7. Expressions, Objects, Symbols, and Functions
Expressions
An expression is any syntactically valid combination of R code that produces a
result when evaluated.
x <- 5 + 3 # Expression
Objects
Everything in R is an object (e.g., vectors, matrices, lists).
Objects are data entities that store information.
Create objects using assignments.
my_vector <- c(1, 2, 3)
Symbols
Names assigned to objects.
x <- 10 # 'x' is a symbol for the value 10
Functions
Built-in or user-defined reusable operations that perform specific tasks.
mean(c(1, 2, 3)) # Built-in function
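The four ideas above can be seen together by asking R what kind of object each name refers to. This is a small sketch using the base function class(); the names value_class, vector_class, and function_class are illustrative.

```r
x <- 5 + 3                 # the expression 5 + 3 is evaluated and stored under the symbol x
my_vector <- c(1, 2, 3)    # a vector object

value_class <- class(x)           # "numeric"
vector_class <- class(my_vector)  # "numeric"
function_class <- class(mean)     # "function": functions are objects too

print(value_class)
print(vector_class)
print(function_class)
```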
8. Special Values
Special Constants
Unique constants in R (e.g., NA, NULL, Inf, NaN) that represent missing data, absence of value, or undefined
operations.
NA: Missing value.
NULL: Absence of value.
Inf: Infinity (e.g., 1/0).
NaN: Not a Number (e.g., 0/0).
Examples
x <- c(1, 2, NA, 4)
mean(x, na.rm = TRUE) # Remove NA for calculations
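Each special value has a matching test function in base R. The sketch below shows how they behave; the variable names are chosen for illustration.

```r
x <- c(1, 2, NA, 4)

mean_with_na <- mean(x)                   # NA: a missing value makes the result missing
mean_without_na <- mean(x, na.rm = TRUE)  # mean of 1, 2 and 4
missing_flags <- is.na(x)                 # FALSE FALSE TRUE FALSE

pos_inf <- 1 / 0        # Inf
not_a_number <- 0 / 0   # NaN
empty <- NULL           # an object with no value

print(is.infinite(pos_inf))  # TRUE
print(is.nan(not_a_number))  # TRUE
print(is.null(empty))        # TRUE
print(length(empty))         # 0
```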