Tidy Data

20CS2058 / BASICS OF DATA ANALYTICS - R PROGRAMMING AND TABLEAU

Module 2
Tidy Data with tidyr
Introduction

• You can represent the same underlying data in multiple ways.
• Each dataset below ships with the tidyr package and shows the same
values of four variables (country, year, population, and cases),
but each organizes the values in a different way:
• table1
• table2
• table3
• table4a
• table4b
Interrelated rules
• There are three interrelated rules which make a dataset tidy:
1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.
Example 1

• library(tidyr)   # provides table1
• library(dplyr)   # provides mutate(), count() and %>%
• # Compute rate per 10,000
• table1 %>%
    mutate(rate = cases / population * 10000)
OUTPUT
Example 2

• # Compute cases per year
• table1 %>%
    count(year, wt = cases)
• # Without wt, count() tallies the rows for each (year, cases)
  # pair instead of summing cases
• table1 %>%
    count(year, cases)
OUTPUT
Visualization
• library(ggplot2)
• ggplot(table1, aes(year, cases)) +
geom_line(aes(group = country), color = "red") +
geom_point(aes(color = country))
Output
Gathering

• tidy4a <- table4a %>%
    gather(`1999`, `2000`, key = "year", value = "cases")
• tidy4b <- table4b %>%
    gather(`1999`, `2000`, key = "year", value = "population")
• left_join(tidy4a, tidy4b)
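The same reshaping can also be sketched with pivot_longer(), which tidyr 1.0.0 and later offer as the successor to gather(). This is an alternative to the slide's code, not what the slides use. Naming the join columns in by is an addition made here so that left_join() does not have to guess them.

```r
library(tidyr)
library(dplyr)

# Turn the two year columns of table4a into (year, cases) pairs
tidy4a <- table4a %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")

# Same reshaping for table4b, this time producing (year, population) pairs
tidy4b <- table4b %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "population")

# Join on the shared columns; stating them explicitly documents the intent
tidy4 <- left_join(tidy4a, tidy4b, by = c("country", "year"))
tidy4
```

Note that year comes out as a character column in both approaches, because its values were originally column names.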
Output
Cntd…
Spreading

• Spreading is the opposite of gathering.
• You use it when an observation is scattered across multiple rows.
• Example:
  spread(table2, key = type, value = count)
Output
Cntd…
gather() vs spread()

• gather() makes wide tables narrower and longer;
spread() makes long tables shorter and wider.
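A minimal sketch of that contrast, using a small hand-made table. The tibble named wide and its values are hypothetical, chosen only to make the change in shape easy to see.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per country, one column per year
wide <- tibble(
  country = c("A", "B"),
  `1999` = c(10, 30),
  `2000` = c(20, 40)
)

# gather(): the two year columns become key/value pairs (longer, narrower)
long <- wide %>%
  gather(`1999`, `2000`, key = "year", value = "cases")

# spread(): the key/value pairs go back to one column per year (shorter, wider)
wide_again <- long %>%
  spread(key = year, value = cases)

dim(wide)        # rows and columns before gathering
dim(long)        # more rows, same number of columns here
dim(wide_again)  # back to the original shape
```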
separate()

• separate() pulls apart one column into multiple columns by
splitting wherever a separator character appears.
Example

• table3 %>%
separate(rate, into = c("cases", "population"))
Cntd..
Rewrite the preceding code

• By default, separate() splits values wherever it sees a
non-alphanumeric character.
• If you wish to use a specific character to separate a column,
you can pass the character to the sep argument of separate().
• table3 %>%
    separate(rate, into = c("cases", "population"), sep = "/")
Convert to better types

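• By default, separate() leaves the new columns as character;
convert = TRUE asks it to try converting them to better types.
• table3 %>%
    separate(
      rate,
      into = c("cases", "population"),
      convert = TRUE
    )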
OUTPUT
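One way to check the effect of convert = TRUE is to compare the column classes with and without it. The variable names as_text and as_number are made up for this sketch.

```r
library(tidyr)
library(dplyr)

# Default: the new columns keep the character type of the rate column
as_text <- table3 %>%
  separate(rate, into = c("cases", "population"))

# convert = TRUE: separate() tries to convert the new columns to better types
as_number <- table3 %>%
  separate(rate, into = c("cases", "population"), convert = TRUE)

class(as_text$cases)
class(as_number$cases)
```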
Cntd…

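• You can also pass a vector of integers to sep; separate()
interprets the integers as positions to split at.
• When using integers to separate strings, the length of sep
should be one less than the number of names in into.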
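As a sketch of the integer form, the call below splits each year in table3 after its second character. There are two names in into, so sep holds a single position. The name split_year is chosen here for illustration.

```r
library(tidyr)
library(dplyr)

# sep = 2 splits after the second character: 1999 -> "19" and "99"
split_year <- table3 %>%
  separate(year, into = c("century", "year"), sep = 2)

split_year
```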
Unite

• unite() is the inverse of separate(): it combines multiple
columns into a single column.
• You can use unite() to rejoin the century and year columns
that we created in the last example.
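A minimal sketch of that round trip, assuming the century and year columns come from splitting the year column of table3 after its second character. The names split_year, with_underscore and rejoined are made up for this sketch.

```r
library(tidyr)
library(dplyr)

# Recreate the century/year split by cutting year after its second character
split_year <- table3 %>%
  separate(year, into = c("century", "year"), sep = 2)

# Default separator: the values are joined with an underscore
with_underscore <- split_year %>%
  unite(new, century, year)

# sep = "": the values are joined directly, restoring the four-digit year
rejoined <- split_year %>%
  unite(new, century, year, sep = "")

with_underscore$new[1]
rejoined$new[1]
```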
Example
Cntd…
Example
• By default, unite() places an underscore (_) between the
values from different columns.
• To join the values with no separator, specify sep = "":
• table5 %>%
    unite(new, century, year, sep = "")
