Tidy Data

20CS2058 / BASICS OF DATA ANALYTICS - R PROGRAMMING AND TABLEAU

Module 2
Tidy Data with tidyr
Introduction

• The same underlying data can be represented in multiple ways.

• Each dataset shows the same values of four variables, country,
year, population, and cases, but each dataset organizes the
values in a different way:
• table1
• table2
• table3
• table4a
• table4b
Interrelated rules
• There are three interrelated rules which make a dataset tidy:
1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.
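
To see how the five tables differ when checked against the three rules, it can help to compare their column names and row counts. A minimal sketch, assuming the tidyr package is installed (the example tables ship with it); the names tables, column_names and row_counts are illustrative only:

```r
library(tidyr)  # the example tables ship with tidyr

# Collect the five tables in a named list so they can be compared side by side.
tables <- list(
  table1 = table1, table2 = table2, table3 = table3,
  table4a = table4a, table4b = table4b
)

# Column names of each table: only table1 has one column per variable.
column_names <- lapply(tables, names)
print(column_names)

# Number of rows in each table.
row_counts <- vapply(tables, nrow, integer(1))
print(row_counts)
```

Only table1 is tidy. table2 spreads each observation over two rows, table3 packs two values into one rate cell, and table4a/table4b store values of year as column names.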
Example 1

# Compute rate per 10,000
library(dplyr)   # mutate(), count(), %>%
library(tidyr)   # table1

table1 %>%
  mutate(rate = cases / population * 10000)
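Example 2

# Compute cases per year: wt = cases sums the cases column within each year
table1 %>%
  count(year, wt = cases)

# Without wt, count() tallies the rows for each (year, cases) combination
table1 %>%
  count(year, cases)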
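
The weighted count is shorthand for grouping and summing. A minimal sketch of that equivalence, assuming dplyr and tidyr are installed; the names by_count and by_summary are illustrative only:

```r
library(dplyr)
library(tidyr)  # provides table1

# count() with wt sums the weight column within each group.
by_count <- table1 %>%
  count(year, wt = cases)

# The same totals written out with group_by() and summarise().
by_summary <- table1 %>%
  group_by(year) %>%
  summarise(n = sum(cases))

print(by_count)
print(by_summary)
```

Both results have one row per year with the total number of cases in column n.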
Visualization
library(ggplot2)

ggplot(table1, aes(year, cases)) +
  geom_line(aes(group = country), color = "red") +
  geom_point(aes(color = country))
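Gathering

• Gathering collects columns whose names are really values of a variable (here the years 1999 and 2000) into a key column and a value column.

tidy4a <- table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")
tidy4b <- table4b %>%
  gather(`1999`, `2000`, key = "year", value = "population")

# Join the two tidied tables on their shared columns
left_join(tidy4a, tidy4b, by = c("country", "year"))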
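
Newer tidyr releases (1.0.0 and later) offer pivot_longer() as the recommended replacement for gather(), which still works. The sketch below swaps in pivot_longer() for the same gathering step; it is not part of the original slides, and the names long4a, long4b and combined are illustrative only:

```r
library(dplyr)
library(tidyr)

# names_to plays the role of key, values_to the role of value.
long4a <- table4a %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")
long4b <- table4b %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "population")

# Join on the shared identifying columns.
combined <- left_join(long4a, long4b, by = c("country", "year"))
print(combined)
```

As with gather(), the new year column is character because it was built from column names.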
Spreading

• Spreading is the opposite of gathering.

• You use it when an observation is scattered across
multiple rows.
• Example:
spread(table2, key = type, value = count)
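
In tidyr 1.0.0 and later, pivot_wider() is the recommended replacement for spread(). A minimal sketch of the same spreading step with pivot_wider() swapped in; this is not from the original slides, and the name wide2 is illustrative only:

```r
library(tidyr)

# names_from plays the role of key, values_from the role of value.
wide2 <- table2 %>%
  pivot_wider(names_from = type, values_from = count)

print(wide2)
```

The result has one row per country and year, with cases and population as separate columns, matching the layout of table1.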
gather() vs spread()

• gather() makes wide tables narrower and longer; spread() makes long tables shorter and wider.
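
One way to check that the two functions undo each other is a round trip on table4a. A minimal sketch, assuming tidyr is installed; the names long and wide are illustrative only:

```r
library(tidyr)

# Gather the two year columns into key/value pairs: rows double from 3 to 6.
long <- table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")

# Spread them back out: one column per year again.
wide <- long %>%
  spread(key = year, value = cases)

print(dim(table4a))
print(dim(long))
print(dim(wide))
```

With only two year columns the gathered table is longer but not narrower; the narrowing becomes visible once more than two columns are gathered.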
separate()

• separate() pulls apart one column into multiple columns by splitting wherever a separator character appears.
• By default, it splits wherever it finds a non-alphanumeric character.

Example

table3 %>%
  separate(rate, into = c("cases", "population"))
Rewrite the preceding code

• If you wish to use a specific character to separate a column, you can pass the character to the sep argument of separate().

table3 %>%
  separate(rate, into = c("cases", "population"), sep = "/")
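Convert to better types

• separate() leaves the new columns as character; convert = TRUE asks it to convert them to better types.

table3 %>%
  separate(
    rate,
    into = c("cases", "population"),
    convert = TRUE
  )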
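
The effect of convert can be checked by comparing column classes with and without it. A minimal sketch, assuming tidyr is installed; the names as_text and as_numbers are illustrative only:

```r
library(tidyr)

# Default: the split pieces stay as character strings.
as_text <- table3 %>%
  separate(rate, into = c("cases", "population"))

# convert = TRUE: the pieces are converted to a more suitable type.
as_numbers <- table3 %>%
  separate(rate, into = c("cases", "population"), convert = TRUE)

print(sapply(as_text[c("cases", "population")], class))
print(sapply(as_numbers[c("cases", "population")], class))
```

Arithmetic such as cases / population only works on the converted version.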

• When sep is a vector of integers, separate() treats them as positions to split at, so the length of sep should be one less than the number of names in into.
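
A minimal sketch of splitting by position, assuming tidyr is installed: one split position (sep = 2) yields two new columns, so length(sep) is one less than length(into). The name split_year is illustrative only:

```r
library(tidyr)

# Split the four-digit year after its second character:
# the first two digits become century, the last two become year.
split_year <- table3 %>%
  separate(year, into = c("century", "year"), sep = 2)

print(split_year)
```

The new century and year columns are character, because separate() works on the text of each value.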
unite()

• unite() is the inverse of separate(): it combines multiple columns into a single column.
• We can use unite() to rejoin the century and year columns that we created in the last example.
Example

• The default will place an underscore (_) between the values from different columns.
• To join the values with no separator, pass sep = "":

table5 %>%
  unite(new, century, year, sep = "")

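The difference between the default separator and an empty one can be seen by running both calls on table5, which ships with tidyr and already has separate century and year columns. A minimal sketch; the names with_underscore and joined are illustrative only:

```r
library(tidyr)

# Default separator: values are joined with an underscore.
with_underscore <- table5 %>%
  unite(new, century, year)

# Empty separator: values are pasted together directly.
joined <- table5 %>%
  unite(new, century, year, sep = "")

print(with_underscore$new)
print(joined$new)
```

In both cases the century and year columns are replaced by the single column new.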