Assignment R

This document outlines a series of tasks from the 'Introduction to R Programming for Data Science' course by IBM on Coursera, focusing on practical exercises with R programming. It includes tasks related to extracting and processing COVID-19 data from Wikipedia, analyzing the data, and performing various calculations and comparisons. The document is structured into ten tasks, each addressing different aspects of data manipulation and analysis using R.

Uploaded by badrtaha897

Assignment HTML

Mohammed Hamad

2024-07-16
# R Markdown setup chunk (include=FALSE): show code in the rendered output
knitr::opts_chunk$set(echo = TRUE)

Introduction
This document contains a series of tasks that are part of the “Introduction to R
Programming for Data Science” course offered by IBM on Coursera. These tasks serve as
practical exercises to reinforce the concepts covered in the course modules.

R Markdown
Course Information
• Course Name: Introduction to R Programming for Data Science
• Provider: IBM on Coursera

Objective
The primary objective of these tasks is to provide hands-on experience with R
programming techniques essential for data science. Each task is designed to cover specific
aspects of data manipulation, visualization, and analysis using R.

Structure of the Document

This document consists of 10 tasks. Each task focuses on a different aspect of R
programming and data science concepts taught in the course. The tasks are designed to be
completed sequentially, following the course modules.

To install the packages (uncomment if they are not installed yet):
#install.packages("httr")
#install.packages("rvest")

To load the libraries:

library(httr)
library(rvest)
TASK 1: Get a COVID-19 pandemic Wiki page using HTTP request

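To write the GET function:

get_wiki_covid19_page <- function(url, param) {
  # Wikipedia expects the page name in the "title" query parameter
  query_param <- list(title = param)
  # Send the HTTP GET request with the query string attached
  response <- GET(url, query = query_param)
  return(response)
}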

To call get_wiki_covid19_page:
response <- get_wiki_covid19_page("https://en.wikipedia.org/w/index.php",
                                  "Template:COVID-19_testing_by_country")
response
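To see which URL `GET()` will request, the same `query` list can be passed to `httr::modify_url()`, which builds the URL without sending anything over the network. This is a minimal sketch for checking the request offline; the exact percent-encoding of characters such as `:` is left to httr.

```r
library(httr)

# Same base URL and page title as in Task 1
base_url <- "https://en.wikipedia.org/w/index.php"
query_param <- list(title = "Template:COVID-19_testing_by_country")

# Build the full request URL offline (no HTTP request is made)
request_url <- modify_url(base_url, query = query_param)
print(request_url)
```

After a real request, checking that `status_code(response)` is 200 before parsing the page is a cheap safeguard.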

TASK 2: Extract COVID-19 testing data table from the wiki HTML page
Now use the read_html function from the rvest library to get the root HTML node. Here the page is read directly from its URL, which points to the same page requested in Task 1.
library(rvest)
url <- "https://en.wikipedia.org/w/index.php?title=Template:COVID-19_testing_by_country"
root_node <- read_html(url)
root_node

Get the tables in the HTML root node using the html_nodes function:
table_node <- html_nodes(root_node, "table")
table_node

Note that we need index [2], which is the wikitable containing the testing data.

Hint: read table_node with index 2 (e.g., table_node[2]).
data_frame <- as.data.frame(html_table(table_node[2]))
head(data_frame)
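Why index 2, and where names such as `Country.or.region` come from, can be checked offline on a tiny page. This is a minimal sketch: the HTML string below is hypothetical, not the real Wikipedia markup. Converting the list returned by `html_table()` with `as.data.frame()` passes the headers through `make.names()`, which is why the preprocessing step refers to `Country.or.region`. Selecting the table by its CSS class with `html_node()` is shown as an alternative to relying on its position.

```r
library(rvest)

# Hypothetical miniature page: a navigation table followed by a "wikitable"
html <- paste0(
  "<html><body>",
  "<table><tr><th>Navigation</th></tr><tr><td>menu</td></tr></table>",
  "<table class='wikitable'>",
  "<tr><th>Country or region</th><th>Tested</th></tr>",
  "<tr><td>Jordan</td><td>1,000</td></tr>",
  "</table></body></html>"
)
page <- read_html(html)

# Every <table> on the page, in document order
tables <- html_nodes(page, "table")

# The data table is the second one, as with table_node[2] above
df <- as.data.frame(html_table(tables[2]))
names(df)  # make.names() turns "Country or region" into "Country.or.region"

# Alternative: select the table by its CSS class instead of its position
wiki <- html_node(page, "table.wikitable")
```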

TASK 3: Pre-process and export the extracted data frame

The goal of task 3 is to pre-process the data frame extracted in the previous step and export it as a CSV file.
Let’s get a summary of the data frame:
summary(data_frame)

preprocess_covid_data_frame <- function(data_frame) {

  # Original dimensions (rows, columns), kept for reference
  shape <- dim(data_frame)

  # Remove the World row
  data_frame <- data_frame[!(data_frame$`Country.or.region` == "World"), ]
  # Remove the last row (without hard-coding the row count)
  data_frame <- data_frame[-nrow(data_frame), ]

  # We don't need the Units and Ref columns, so they can be removed
  data_frame["Ref."] <- NULL
  data_frame["Units.b."] <- NULL

  # Rename the columns
  names(data_frame) <- c("country", "date", "tested", "confirmed",
                         "confirmed.tested.ratio", "tested.population.ratio",
                         "confirmed.population.ratio")

  # Convert column data types; commas are stripped before numeric conversion
  data_frame$country <- as.factor(data_frame$country)
  data_frame$date <- as.factor(data_frame$date)
  data_frame$tested <- as.numeric(gsub(",", "", data_frame$tested))
  data_frame$confirmed <- as.numeric(gsub(",", "", data_frame$confirmed))
  data_frame$confirmed.tested.ratio <-
    as.numeric(gsub(",", "", data_frame$confirmed.tested.ratio))
  data_frame$tested.population.ratio <-
    as.numeric(gsub(",", "", data_frame$tested.population.ratio))
  data_frame$confirmed.population.ratio <-
    as.numeric(gsub(",", "", data_frame$confirmed.population.ratio))

  return(data_frame)
}
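The `gsub()` + `as.numeric()` pattern in the function strips thousands separators before converting text to numbers. A minimal sketch on a hypothetical character vector shows the effect, including what happens to a cell that is not a number (for example a placeholder such as "N/A"): it becomes `NA`, and R emits a coercion warning, suppressed here.

```r
# Hypothetical cell values as scraped from the HTML table (all character)
raw_tested <- c("1,234", "56", "7,890,123", "N/A")

# Remove every comma, then convert; non-numeric cells become NA
clean_tested <- suppressWarnings(as.numeric(gsub(",", "", raw_tested)))
clean_tested  # 1234, 56, 7890123, NA

# Count how many cells could not be converted
sum(is.na(clean_tested))
```

Without the `gsub()` step, `as.numeric("1,234")` would also return `NA`, so every value above 999 would be lost.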

Call the preprocess_covid_data_frame function

proper_data_frame <- preprocess_covid_data_frame(data_frame)
head(proper_data_frame)

Get the summary of proper_data_frame

summary(proper_data_frame)

To save the data frame as a CSV file named covid-19(2024).csv:

write.csv(proper_data_frame, file = "covid-19(2024).csv", row.names = FALSE)

To check that the file is available:

# Get the working directory
wd <- getwd()
# Build the path of the exported file (same name as used in write.csv)
file_path <- file.path(wd, "covid-19(2024).csv")
# Print the file path
print(file_path)
file.exists(file_path)
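Building the path with `file.path()` rather than pasting strings by hand avoids mismatches between the name written by `write.csv()` and the name checked with `file.exists()`. The sketch below does a round trip through a temporary directory so it can run anywhere; the small data frame is a hypothetical stand-in for `proper_data_frame`.

```r
# Hypothetical stand-in for proper_data_frame
demo_df <- data.frame(country = c("Jordan", "United States"),
                      confirmed = c(10, 20))

# Write to a temporary directory instead of the working directory
out_path <- file.path(tempdir(), "covid-19(2024).csv")
write.csv(demo_df, file = out_path, row.names = FALSE)

# The check uses exactly the same path that was written
exists_now <- file.exists(out_path)

# Reading it back gives the same columns and values
round_trip <- read.csv(out_path)
```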

TASK 4: Get a subset of the extracted data frame
The goal of task 4 is to get the 5th to 10th rows from the data frame, with only the country and confirmed columns selected.

# Read the data frame back from the exported CSV file

covid_19 <- read.csv("covid-19(2024).csv")

covid_data <- as.data.frame(covid_19)
# Get the 5th to 10th rows, keeping only the "country" and "confirmed" columns

covid_data[5:10,c('country','confirmed')]
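Row and column selection with `[` can be checked on a small hypothetical data frame: `5:10` is an inclusive range, so the result has six rows, and the column vector keeps only the named columns in the order given. The original row names are preserved.

```r
# Hypothetical data frame with 12 rows
toy <- data.frame(country = paste("Country", 1:12),
                  tested = seq(100, 1200, by = 100),
                  confirmed = 1:12)

# Rows 5 through 10 (inclusive) and two named columns
subset_5_10 <- toy[5:10, c("country", "confirmed")]
dim(subset_5_10)  # 6 rows, 2 columns
```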

TASK 5: Calculate worldwide COVID testing positive ratio

The goal of task 5 is to get the total confirmed and tested cases worldwide, and to compute the overall positive ratio as confirmed cases / tested cases.
# Get the total confirmed cases worldwide
total_confirmed <- sum(covid_data[,'confirmed'])
total_confirmed
# Get the total tested cases worldwide
total_tested <- sum(covid_data[,'tested'])
total_tested
# Get the positive ratio (confirmed / tested)
positive_ratio <- total_confirmed/total_tested
positive_ratio
round(positive_ratio,2)
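If any value in the scraped columns became `NA` during `as.numeric()`, `sum()` returns `NA`. One option, sketched here with hypothetical numbers, is to keep only rows where both counts are present, so that the numerator and denominator come from the same countries.

```r
# Hypothetical counts; the second country did not report a tested figure
toy <- data.frame(tested = c(1000, NA, 3000),
                  confirmed = c(100, 50, 300))

naive_total <- sum(toy$tested)  # NA, because one value is missing

# Keep rows where both values are present
complete <- toy[!is.na(toy$tested) & !is.na(toy$confirmed), ]
ratio <- sum(complete$confirmed) / sum(complete$tested)
round(ratio, 2)  # 400 / 4000 = 0.1
```

Using `sum(..., na.rm = TRUE)` on each column separately would also avoid the `NA`, but it could divide confirmed cases from one set of countries by tests from another.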

TASK 6: Get a country list which reported their testing data

The goal of task 6 is to get a catalog or sorted list of countries that have reported their COVID-19 testing data.
# Get the `country` column
covid_data[,'country']
# Check its class: "factor" if the CSV was read with stringsAsFactors = TRUE,
# "character" by default since R 4.0.0
class(covid_data$country)
# Convert the country column into character so that you can easily sort it
covid_data$country <- as.character(covid_data$country)
class(covid_data$country)
# Sort the countries A to Z
sort(covid_data$country)
# Sort the countries Z to A
desc_country <- sort(covid_data$country, decreasing=TRUE)
# Print the sorted Z to A list
print(desc_country)
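`sort()` orders a single vector. To reorder whole rows of the data frame by country, `order()` returns the row indices instead, so each country keeps its own numbers. A sketch on hypothetical rows:

```r
toy <- data.frame(country = c("Jordan", "Albania", "Zambia"),
                  confirmed = c(20, 10, 30),
                  stringsAsFactors = FALSE)

sort(toy$country)                      # "Albania" "Jordan" "Zambia"
sort(toy$country, decreasing = TRUE)   # "Zambia" "Jordan" "Albania"

# Reorder entire rows, Z to A, keeping each country with its own values
by_country_desc <- toy[order(toy$country, decreasing = TRUE), ]
```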
TASK 7: Identify country names with a specific pattern
The goal of task 7 is to use a regular expression to find any countries whose names start with United.
# Use a regular expression `United.+` to find matches
country_matches <- regexpr('United.+', covid_data$country)

# Print the matched country names

regmatches(covid_data$country, country_matches)
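The pattern `United.+` matches "United" anywhere in a name, and `regmatches()` returns only the matched part. If only names that start with "United" are wanted, anchoring the pattern with `^` is stricter, and `grepl()` gives a logical vector that can also filter rows. The vector below is hypothetical.

```r
countries <- c("United States", "Jordan", "United Kingdom",
               "Tanzania, United Republic of")

# Unanchored: matches "United..." anywhere and returns the matched substring
unanchored <- regmatches(countries, regexpr("United.+", countries))

# Anchored: only names that begin with "United", returned in full
anchored <- countries[grepl("^United", countries)]
```

Here `unanchored` also contains "United Republic of", a fragment of the last name, while `anchored` keeps only the first and third entries.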

TASK 8: Pick two countries you are interested in, and then review their testing data
The goal of task 8 is to compare the COVID-19 test data between two countries. You will need to select two rows from the data frame, and select the country, confirmed, and confirmed-population-ratio columns.
# Select a subset (should be only one row) of the data frame based on a
# selected country name and columns

jordan <- covid_data[covid_data$country == 'Jordan',
                     c('country', 'tested', 'confirmed', 'confirmed.population.ratio')]

# Select a subset (should be only one row) of the data frame based on a
# selected country name and columns

united_states <- covid_data[covid_data$country == 'United States',
                            c('country', 'tested', 'confirmed', 'confirmed.population.ratio')]

jordan
united_states

To make the comparison easier, I added this code to combine the two tables:

# Extract data for Jordan
jordan <- covid_data[covid_data$country == 'Jordan', c('country',
'tested', 'confirmed', 'confirmed.population.ratio')]

# Extract data for United States

united_states <- covid_data[covid_data$country == 'United States',
c('country', 'tested', 'confirmed', 'confirmed.population.ratio')]

# Combine the two data frames vertically

combined_data <- rbind(jordan, united_states)
# Print the combined data
print(combined_data)
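The two lookups and the `rbind()` can also be done in one step with `%in%`, which keeps every row whose country appears in the given vector. A sketch on hypothetical rows; note that the result follows the row order of the data frame, not the order of the vector.

```r
toy <- data.frame(country = c("Jordan", "Albania", "United States"),
                  tested = c(100, 200, 300),
                  confirmed = c(10, 20, 30),
                  confirmed.population.ratio = c(0.5, 1.5, 2.5),
                  stringsAsFactors = FALSE)

cols <- c("country", "tested", "confirmed", "confirmed.population.ratio")

# One subset call selects both countries and the four columns
pair <- toy[toy$country %in% c("Jordan", "United States"), cols]
```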

Comparative Analysis of COVID-19 Testing and Confirmed Cases: United States vs. Jordan
# Did the United States run more tests than Jordan?
united_states$tested > jordan$tested
# Did the United States report more confirmed cases than Jordan?
united_states$confirmed > jordan$confirmed
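The comparisons above return only `TRUE` or `FALSE`. Dividing one value by the other also shows how large the gap is. The figures below are hypothetical placeholders, not values from the table.

```r
# Hypothetical one-row frames standing in for united_states and jordan
us <- data.frame(tested = 5000, confirmed = 500)
jo <- data.frame(tested = 1000, confirmed = 200)

tested_ratio <- us$tested / jo$tested           # 5: five times as many tests
confirmed_ratio <- us$confirmed / jo$confirmed  # 2.5 times as many cases
```

Raw totals depend heavily on population size, which is why Task 9 compares the population-normalized ratio instead.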

TASK 9: Compare which one of the selected countries has a larger ratio of confirmed cases to population
The goal of task 9 is to find out which of the two selected countries has the larger ratio of confirmed cases to population, which may indicate that the country has a higher COVID-19 infection risk.
# Use an if-else statement
if (united_states$confirmed.population.ratio >
    jordan$confirmed.population.ratio) {
  print('United States has higher covid-19 risk')
} else {
  print('Jordan has higher covid-19 risk')
}
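The `if`/`else` above reports Jordan whenever the United States' ratio is not strictly larger, including when the two ratios are equal. A small helper, hypothetical and not part of the course, makes the tie explicit and works for any two one-row subsets:

```r
# Hypothetical helper: compare confirmed.population.ratio of two one-row frames
higher_risk <- function(a, b) {
  ra <- a$confirmed.population.ratio
  rb <- b$confirmed.population.ratio
  if (ra > rb) {
    a$country
  } else if (rb > ra) {
    b$country
  } else {
    "tie"
  }
}

us <- data.frame(country = "United States", confirmed.population.ratio = 2.5,
                 stringsAsFactors = FALSE)
jo <- data.frame(country = "Jordan", confirmed.population.ratio = 1.5,
                 stringsAsFactors = FALSE)
higher_risk(us, jo)  # "United States"
```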

TASK 10: Find countries with a confirmed-to-population ratio below a threshold
The goal of task 10 is to find out which countries have a confirmed-to-population ratio of less than 1%, which may indicate that the risk in those countries is relatively low.
# Get a subset of any countries with `confirmed.population.ratio` less
# than the threshold (the column is in percent, so 1 means 1%)
new_df <- covid_data[(covid_data$confirmed.population.ratio < 1), ]
new_df
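If `confirmed.population.ratio` contains `NA` values, logical indexing with `<` keeps those positions as rows filled with `NA`. Wrapping the condition in `which()` drops them, because `which()` returns only the indices where the condition is `TRUE`. A sketch on hypothetical values:

```r
toy <- data.frame(country = c("A", "B", "C"),
                  confirmed.population.ratio = c(0.5, NA, 2.0),
                  stringsAsFactors = FALSE)

threshold <- 1  # the ratio column is in percent, so 1 means 1%

# Logical indexing: the NA comparison produces an all-NA row
with_na_rows <- toy[toy$confirmed.population.ratio < threshold, ]

# which() keeps only rows where the comparison is TRUE
clean_rows <- toy[which(toy$confirmed.population.ratio < threshold), ]
```

`subset(toy, confirmed.population.ratio < threshold)` behaves like the `which()` version and also drops the `NA` rows.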
