Data Types in R
Introduction
In any programming language, you use variables to store information. A variable is
a name for a reserved memory location that holds a value, so creating a variable
reserves some space in memory.
You may want to store information of various data types such as character, wide
character, integer, floating point, double floating point and Boolean. Based on the
data type of a variable, the operating system allocates memory and decides what
can be stored in the reserved memory.
In contrast to languages like C and Java, variables in R are not declared with a
data type. A variable is assigned an R-object, and the data type of the R-object
becomes the data type of the variable. There are many types of R-objects. The
frequently used ones are:
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
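As a minimal sketch of how a variable takes its type from the object assigned to it, the snippet below reuses one name for two different objects. The variable names (x, class_before, class_after) are only illustrative.

```r
# A variable has no declared type; it takes the class of the object it holds.
x <- 42L                    # x refers to an integer object
class_before <- class(x)

x <- "forty-two"            # the same name now refers to a character object
class_after <- class(x)

print(class_before)         # "integer"
print(class_after)          # "character"
```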
In R programming, the most basic R-objects are vectors. All elements of a vector
share one class, and the common classes are Logical, Numeric, Integer, Complex and
Character.
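The five classes named above can be checked with class(). This is a small sketch; the sample values are arbitrary and chosen only for illustration.

```r
# One sample value per class; the names of the list are the expected classes.
values <- list(
  logical   = TRUE,
  numeric   = 23.5,
  integer   = 2L,      # the L suffix makes an integer instead of a numeric
  complex   = 2+5i,
  character = "TRUE"   # quoted, so it is text and not a logical value
)

classes <- sapply(values, class)
print(classes)
```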
Vectors
Ex:
> v = c(2,5,7,10,12,15)
> v
[1]  2  5  7 10 12 15
> class(v)
[1] "numeric"
> v = c("r","e","s")
> class(v)
[1] "character"
> v=(2,3,"w","r")
Error: unexpected ',' in "v=(2,"
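The error above comes from leaving out c(). When c() is used with mixed numbers and strings, R does not fail; it converts every element to a common class, here character. A short sketch:

```r
# c() combines the values; mixing numbers and strings coerces all to character.
v <- c(2, 3, "w", "r")

print(v)          # "2" "3" "w" "r"
print(class(v))   # "character"
```

If the numbers must stay numeric alongside the strings, a list is the usual choice instead of a vector.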
Lists
A list is an R-object that can contain elements of many different types, such as
vectors, functions and even other lists.
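To make this concrete, here is a small sketch of one list holding a vector, a function and another list. The element names (ages, f, nested) are made up for the illustration.

```r
# A list with three different kinds of elements.
inner <- list(1, "a")
mixed <- list(ages = c(20, 21, 30), f = mean, nested = inner)

print(sapply(mixed, class))         # class of each element

avg <- mixed$f(mixed$ages)          # call the stored function on the stored vector
first_nested <- mixed$nested[[1]]   # [[ ]] extracts a single element

print(avg)
print(first_nested)
```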
> # Lists in R
> # A list holds a collection of items. For example, take customers grouped by
> # arrival time: five came at 7 am, with ages (20,21,30,40,50); another four came
> # at 8 am, with ages (12,30,20,30); at 10 am four members came, with names
> # (A,B,C,D). These can be stored in a list as follows:
> lis = list(c(20,21,30,40,50), c(12,30,20,30), c("A","B","C","D"))
> lis
[[1]]
[1] 20 21 30 40 50
[[2]]
[1] 12 30 20 30
[[3]]
[1] "A" "B" "C" "D"
> lapply(lis,mean)
[[1]]
[1] 32.2
[[2]]
[1] 23
[[3]]
[1] NA
> List = list(1,2,c("r","v","b"),c(3,5,6))
> List
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] "r" "v" "b"
[[4]]
[1] 3 5 6
> mean(List)
[1] NA
> lapply(List,mean)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] NA
[[4]]
[1] 4.666667
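The NA results above appear because mean() only works on numeric (or logical) input: a character vector, or the list as a whole, gives NA with a warning. One possible way around this, sketched below, is to pick out the numeric elements first. The helper names is_num and means are illustrative.

```r
List <- list(1, 2, c("r", "v", "b"), c(3, 5, 6))

# Flag the elements that are numeric, then average only those.
is_num <- sapply(List, is.numeric)
means  <- lapply(List[is_num], mean)

print(is_num)
print(means)
```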
Matrices
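A matrix is a two-dimensional arrangement of elements that all share one data type.
It is created using the matrix() function.
> M = matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
> M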
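As a quick sketch of what matrix() produces, the snippet below builds a 2 x 3 character matrix filled by row and, for comparison, a numeric matrix N (a hypothetical second example) filled by column, which is the default.

```r
# byrow = TRUE fills the matrix one row at a time.
M <- matrix(c('a', 'a', 'b', 'c', 'b', 'a'), nrow = 2, ncol = 3, byrow = TRUE)
print(dim(M))      # 2 3
print(M[2, 1])     # "c": row 2, column 1

# Without byrow, values go down the columns first.
N <- matrix(1:6, nrow = 2)
print(N[1, ])      # 1 3 5
```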
Arrays
While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array() function takes a dim argument that gives the size of each
dimension.
> A = array(c(3,2),dim=c(3,3,4))
> A
, , 1
     [,1] [,2] [,3]
[1,]    3    2    3
[2,]    2    3    2
[3,]    3    2    3
, , 2
     [,1] [,2] [,3]
[1,]    2    3    2
[2,]    3    2    3
[3,]    2    3    2
, , 3 …
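The alternating pattern in the output comes from recycling: the two values 3 and 2 are repeated until all 3 x 3 x 4 = 36 cells are filled, column by column. A short sketch of inspecting such an array:

```r
A <- array(c(3, 2), dim = c(3, 3, 4))

print(dim(A))       # 3 3 4
print(length(A))    # 36 cells in total
print(A[1, 2, 1])   # row 1, column 2 of the first slice: 2
print(A[, , 2])     # the whole second slice as a 3 x 3 matrix
```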
Factors
Factors are R-objects created from a vector.
A factor stores the vector along with the distinct values of its elements as labels
(levels).
The labels are always character, irrespective of whether the input vector is
numeric, character, Boolean, etc. Factors are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the
count of levels.
> data <- c("East","West","East","North","North","East","West","West","West","East",
+           "North")
> data
[1] "East" "West" "East" "North" "North" "East" "West" "West" "West" "East"
"North"
> factordata = factor(data)
> factordata
[1] East West East North North East West West West East North
Levels: East North West
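The following sketch shows levels() and nlevels() on the same data, and also that a factor stores integer codes pointing into its sorted levels.

```r
data <- c("East", "West", "East", "North", "North", "East",
          "West", "West", "West", "East", "North")
factordata <- factor(data)

print(levels(factordata))       # "East" "North" "West" (sorted alphabetically)
print(nlevels(factordata))      # 3
print(as.integer(factordata))   # codes: 1 = East, 2 = North, 3 = West
print(table(factordata))        # how often each level occurs
```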
Data Frames
Data frames are tabular data objects.
Unlike in a matrix, each column of a data frame can contain a different mode of
data: the first column can be numeric while the second is character and the third
is logical. A data frame is a list of vectors of equal length.
Data frames are created using the data.frame() function.
Ex:
> BMI <- data.frame(
+ gender = c("Male", "Male","Female"),
+ height = c(152, 171.5, 165),
+ weight = c(81,93, 78),
+ Age = c(42,38,26)
+ )
> print(BMI)
  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
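To see that each column keeps its own class, and that a data frame is a list underneath, a sketch along these lines can be used. stringsAsFactors = FALSE is set explicitly here (an addition to the example above) so that the gender column stays character in R versions before 4.0.0 as well.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age    = c(42, 38, 26),
  stringsAsFactors = FALSE
)

col_classes <- sapply(BMI, class)   # class of every column
print(col_classes)
print(is.list(BMI))                 # TRUE: a list of equal-length vectors
print(dim(BMI))                     # 3 rows, 4 columns
print(BMI$height[2])                # 171.5
```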
Tables
Tables are often essential for organizing and summarizing your data, especially with
categorical variables.
When you create a table, R treats it as a specific type of object (called “table”),
which is very similar to a data frame.
This may seem strange, since datasets are stored as data frames, but it means that
working with tables is very easy.
Key functions for forming and analysing tables are
table() ; prop.table() ; xtabs() ; margin.table() ;
chisq.test() ; fisher.test()
Note on margin.table(): from R 4.0.0 the same function is also available under the
name marginSums(), and prop.table() under the name proportions().
> m = matrix(2:5,2)
> m
     [,1] [,2]
[1,]    2    4
[2,]    3    5
> marginSums(m)
[1] 14
> marginSums(m,1)
[1] 6 8
> marginSums(m,2)
[1] 5 9
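The margin argument picks the dimension that is kept: 1 gives one sum per row, 2 one sum per column, and no margin gives the grand total. The sketch below repeats the example and compares it with rowSums() and colSums(); it assumes R 4.0.0 or later, where marginSums() is available.

```r
m <- matrix(2:5, 2)

total <- marginSums(m)      # grand total of all cells
rows  <- marginSums(m, 1)   # one sum per row
cols  <- marginSums(m, 2)   # one sum per column

print(total)
print(rows)
print(cols)

# For a plain matrix these agree with rowSums() and colSums().
print(all(rows == rowSums(m)))
print(all(cols == colSums(m)))
```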
Example for Margins and Margin sums
> # tel is a data set with the columns ed and churn; its printout is omitted here
> tel ; attach(tel)
> t = table(ed,churn) ; t ; margin.table(t)
                              churn
ed                              No Yes
  College degree               142  92
  Did not complete high school 172  32
  High school degree           224  63
  Post-undergraduate degree     38  28
  Some college                 150  59
[1] 1000
> margin.table(t,1) ; margin.table(t,2)
ed
              College degree Did not complete high school
                         234                          204
          High school degree    Post-undergraduate degree
                         287                           66
                Some college
                         209
churn
 No Yes
726 274
Another example
> UCBAdmissions
, , Dept = A
          Gender
Admit      Male Female
  Admitted  512     89
  Rejected  313     19
, , Dept = B
          Gender
Admit      Male Female
  Admitted  353     17
  Rejected  207      8
...
> DF <- as.data.frame(UCBAdmissions)
> DF
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
...
> tbl <- xtabs(Freq ~ Gender + Admit, DF)
> tbl
        Admit
Gender   Admitted Rejected
  Male       1198     1493
  Female      557     1278
> prop.table(tbl)   # each cell as a share of the grand total (output not shown)
> marginSums(tbl,1)
Gender
  Male Female
  2691   1835
> marginSums(tbl,"Gender")
Gender
  Male Female
  2691   1835
> marginSums(tbl,2)
Admit
Admitted Rejected
    1755     2771
> proportions(tbl,"Gender")
        Admit
Gender    Admitted  Rejected
  Male   0.4451877 0.5548123
  Female 0.3035422 0.6964578
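In the last result each cell is divided by its row total, so every Gender row adds up to 1. The sketch below checks this on the same table, using the numeric margin 1 in place of the name "Gender"; it assumes R 4.0.0 or later for proportions().

```r
DF  <- as.data.frame(UCBAdmissions)
tbl <- xtabs(Freq ~ Gender + Admit, DF)

by_gender <- proportions(tbl, 1)   # divide each cell by its row (Gender) total
overall   <- proportions(tbl)      # divide each cell by the grand total

print(round(by_gender, 3))
print(rowSums(by_gender))          # each row sums to 1
print(sum(overall))                # all cells together sum to 1
```

With no margin, proportions() gives the share of the grand total, which is what prop.table(tbl) computes as well.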