Assignment R

This document outlines a series of tasks from the 'Introduction to R Programming for Data Science' course by IBM on Coursera, focusing on practical exercises with R programming. It includes tasks related to extracting and processing COVID-19 data from Wikipedia, analyzing the data, and performing various calculations and comparisons. The document is structured into ten tasks, each addressing different aspects of data manipulation and analysis using R.


Assignment HTML

Mohammed Hamad

2024-07-16
{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE)

Introduction
This document contains a series of tasks that are part of the “Introduction to R
Programming for Data Science” course offered by IBM on Coursera. These tasks serve as
practical exercises to reinforce the concepts covered in the course modules.

R Markdown
Course Information
• Course Name: Introduction to R Programming for Data Science
• Provider: IBM on Coursera

Objective
The primary objective of these tasks is to provide hands-on experience with R
programming techniques essential for data science. Each task is designed to cover specific
aspects of data manipulation, visualization, and analysis using R.

Structure of the Document

This document consists of 10 tasks. Each task focuses on a different aspect of R
programming and data science concepts taught in the course. The tasks are designed to be
completed sequentially, following the course modules.

To install the packages (uncomment on first use):
#install.packages("httr")
#install.packages("rvest")

To load the libraries:

library(httr)
library(rvest)
TASK 1: Get a COVID-19 pandemic Wiki page using HTTP request

To write the GET function:

get_wiki_covid19_page <- function(url, param) {
  # Pass the page title as the `title` query parameter
  query_param <- list(title = param)
  response <- GET(url, query = query_param)
  return(response)
}

To call get_wiki_covid19_page:
get_wiki_covid19_page("https://en.wikipedia.org/w/index.php", "Template:COVID-19_testing_by_country")
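
To see what the `query` argument contributes, here is a small base-R sketch that composes the same kind of URL by hand. The helper name `build_wiki_url` is hypothetical and is not part of httr; httr builds the query string itself and may percent-encode some characters differently.

```r
# Hypothetical helper: append a `title` query parameter to a base URL.
build_wiki_url <- function(base_url, title) {
  # URLencode() with its defaults leaves the characters in this title unchanged
  paste0(base_url, "?title=", utils::URLencode(title))
}

wiki_url <- build_wiki_url("https://en.wikipedia.org/w/index.php",
                           "Template:COVID-19_testing_by_country")
print(wiki_url)
```

The printed URL has the same form as the one passed to read_html in Task 2.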

TASK 2: Extract COVID-19 testing data table from the wiki HTML page
Now use the read_html function in the rvest library to get the root HTML node of the page
library(rvest)
url <- "https://en.wikipedia.org/w/index.php?title=Template:COVID-19_testing_by_country"
root_node <- read_html(url)
root_node

Get the tables in the HTML root node using the html_nodes function
table_node <- html_nodes(root_node, "table")
table_node

Notice that the table we need is the second one, table_node[2], which is the wikitable

Hint: read table_node with index 2 (e.g. table_node[2]).
data_frame <- as.data.frame(html_table(table_node[2]))
head(data_frame)
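
Picking the table by position works only as long as the page layout stays the same. As a hedged alternative, the sketch below chooses a table by looking at its column names instead. The helper `pick_table` and the toy tables are assumptions made for illustration; in the real workflow the list would come from html_table(table_node).

```r
# Hypothetical helper: return the first table whose column names match a pattern.
pick_table <- function(tables, pattern) {
  hits <- vapply(tables,
                 function(tbl) any(grepl(pattern, names(tbl))),
                 logical(1))
  if (!any(hits)) stop("no table has a column matching: ", pattern)
  tables[[which(hits)[1]]]
}

# Toy stand-ins for the list of tables extracted from a page (made-up values)
tables <- list(
  data.frame(Note = "navigation box"),
  data.frame(Country.or.region = c("A", "B"), Tested = c("1,000", "2,500"))
)

covid_table <- pick_table(tables, "^Country")
head(covid_table)
```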

TASK 3: Pre-process and export the extracted data frame

The goal of task 3 is to pre-process the extracted data frame from the previous
step, and export it as a csv file
Let’s get a summary of the data frame
summary(data_frame)

preprocess_covid_data_frame <- function(data_frame) {

  shape <- dim(data_frame)

  # Remove the World row
  data_frame <- data_frame[!(data_frame$`Country.or.region` == "World"), ]
  # Remove the last row (keeps the first 172 rows of the table)
  data_frame <- data_frame[1:172, ]

  # We don't need the Units and Ref columns, so they can be removed
  data_frame["Ref."] <- NULL
  data_frame["Units.b."] <- NULL

  # Rename the columns
  names(data_frame) <- c("country", "date", "tested", "confirmed",
                         "confirmed.tested.ratio", "tested.population.ratio",
                         "confirmed.population.ratio")

  # Convert column data types (strip thousands separators before as.numeric)
  data_frame$country <- as.factor(data_frame$country)
  data_frame$date <- as.factor(data_frame$date)
  data_frame$tested <- as.numeric(gsub(",", "", data_frame$tested))
  data_frame$confirmed <- as.numeric(gsub(",", "", data_frame$confirmed))
  data_frame$confirmed.tested.ratio <- as.numeric(gsub(",", "", data_frame$confirmed.tested.ratio))
  data_frame$tested.population.ratio <- as.numeric(gsub(",", "", data_frame$tested.population.ratio))
  data_frame$confirmed.population.ratio <- as.numeric(gsub(",", "", data_frame$confirmed.population.ratio))

  return(data_frame)
}
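
The function above hard-codes the row range 1:172 and repeats the same gsub call for each numeric column. A minimal sketch of the same ideas on toy data follows, assuming the unwanted row is always the last one. The helper `to_number` and the toy values are made up for illustration and are not from the Wikipedia table.

```r
# Hypothetical helper: strip thousands separators, then convert to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x))

# Toy data frame (made-up values) with a World row and a trailing junk row
toy <- data.frame(country = c("A", "B", "World", "footnote row"),
                  tested = c("1,234", "56", "1,290", ""),
                  stringsAsFactors = FALSE)

toy <- toy[toy$country != "World", ]   # remove the World row
toy <- head(toy, -1)                   # drop the last row without a fixed index
toy$tested <- to_number(toy$tested)
toy
```

Using head(x, -1) keeps working if the number of rows in the table changes.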

Call the preprocess_covid_data_frame function

proper_data_frame <- preprocess_covid_data_frame(data_frame)
head(proper_data_frame)

Get the summary of proper_data_frame

summary(proper_data_frame)

To save the data frame under this name: covid-19(2024).csv

write.csv(proper_data_frame, file = 'covid-19(2024).csv', row.names = FALSE)

To check that the file is available:

# Get working directory
wd <- getwd()
# Build the path of the exported file
file_path <- paste(wd, "/covid-19(2024).csv", sep = "")
# Print the file path and check that the file exists
print(file_path)
file.exists(file_path)
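
The export-and-check step can also be tried on a throwaway file first. This sketch uses file.path and tempdir() so that nothing is written to the working directory. The file name and the toy values are assumptions for illustration only.

```r
# Toy data frame (made-up values) standing in for proper_data_frame
toy <- data.frame(country = c("A", "B"), confirmed = c(10, 20))

# Build the path with file.path() instead of pasting "/" by hand
csv_path <- file.path(tempdir(), "covid-19-toy.csv")
write.csv(toy, file = csv_path, row.names = FALSE)

# Confirm the file exists, then read it back
exported_ok <- file.exists(csv_path)
round_trip <- read.csv(csv_path)
print(exported_ok)
round_trip
```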

TASK 4: Get a subset of the extracted data frame
The goal of task 4 is to get the 5th to 10th rows from the data frame with only the country and confirmed columns selected
# Read covid_data_frame_csv from the csv file
#read.csv("covid-19(2023).csv")

covid_19 <- read.csv("covid-19(2024).csv")

covid_data <- as.data.frame(covid_19)
# Get the 5th to 10th rows, with two "country" "confirmed" columns

covid_data[5:10,c('country','confirmed')]

TASK 5: Calculate worldwide COVID testing positive ratio

The goal of task 5 is to get the total confirmed and tested cases worldwide, and
to work out the overall positive ratio as confirmed cases / tested cases
# Get the total confirmed cases worldwide
total_confirmed <- sum(covid_data[,'confirmed'])
total_confirmed
# Get the total tested cases worldwide
total_tested <- sum(covid_data[,'tested'])
total_tested
# Get the positive ratio (confirmed / tested)
positive_ratio <- total_confirmed/total_tested
positive_ratio
round(positive_ratio,2)
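
If either column contains missing values, sum returns NA and so does the ratio. A hedged sketch of one way to handle that is shown below: keep only the rows where both counts are present. The toy values are made up, and whether dropping such rows is appropriate for the real data is an assumption.

```r
# Toy data (made-up values): the second row has no confirmed count
toy <- data.frame(confirmed = c(10, NA, 30), tested = c(100, 200, 300))

# Keep rows where both counts are available
complete <- complete.cases(toy[, c("confirmed", "tested")])

positive_ratio_toy <- sum(toy$confirmed[complete]) / sum(toy$tested[complete])
round(positive_ratio_toy, 2)
```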

TASK 6: Get a country list which reported their testing data

The goal of task 6 is to get a catalog or sorted list of countries that have
reported their COVID-19 testing data
# Get the `country` column
covid_data[,'country']
# Check its class (factor in R < 4.0; character in R >= 4.0, where read.csv no longer converts strings to factors by default)
class(covid_data$country)
# Convert the country column into character so that you can easily sort them
covid_data$country <- as.character(covid_data$country)
class(covid_data$country)
# Sort the countries A to Z
sort(covid_data$country)
# Sort the countries Z to A
desc_country <- sort(covid_data$country, decreasing=TRUE)
# Print the sorted Z to A list
print(desc_country)
TASK 7: Identify country names with a specific pattern
The goal of task 7 is to use a regular expression to find any countries whose names start with
United
# Use a regular expression `United.+` to find matches
country_matches <- regexpr('United.+', covid_data$country)

# Print the matched country names

regmatches(covid_data$country, country_matches)
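
Note that the pattern `United.+` is not anchored, so it matches United anywhere in a name. The sketch below compares it with the anchored pattern `^United.+` on a toy vector; the names are illustrative and are not taken from the extracted table.

```r
# Toy vector of names (illustrative only)
countries <- c("United States", "Tanzania, United Republic of",
               "United Kingdom", "Jordan")

# Unanchored: returns the matched part, even from the middle of a name
unanchored <- regmatches(countries, regexpr("United.+", countries))

# Anchored with ^: keeps only names that start with United
anchored <- countries[grepl("^United.+", countries)]

print(unanchored)
print(anchored)
```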

TASK 8: Pick two countries you are interested in, and then review their testing
data
The goal of task 8 is to compare the COVID-19 test data between two countries.
You will need to select two rows from the data frame, and select the country,
confirmed, and confirmed.population.ratio columns (the code here also keeps tested)
# Select a subset (should be only one row) of data frame based on a selected country name and columns

jordan <- covid_data[covid_data$country == 'Jordan', c('country', 'tested', 'confirmed', 'confirmed.population.ratio')]

# Select a subset (should be only one row) of data frame based on a selected country name and columns

united_states <- covid_data[covid_data$country == 'United States', c('country', 'tested', 'confirmed', 'confirmed.population.ratio')]

jordan
united_states

I added this code to make it easier to combine those two tables:

# Extract data for Jordan
jordan <- covid_data[covid_data$country == 'Jordan', c('country',
'tested', 'confirmed', 'confirmed.population.ratio')]

# Extract data for United States

united_states <- covid_data[covid_data$country == 'United States',
c('country', 'tested', 'confirmed', 'confirmed.population.ratio')]

# Combine the two data frames vertically

combined_data <- rbind(jordan, united_states)
# Print the combined data
print(combined_data)
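
As an alternative to extracting each country and calling rbind, both rows can be selected in one step with %in%. This is a sketch on toy data with made-up values; the rows come back in the order they appear in the data frame.

```r
# Toy data frame (made-up values) standing in for covid_data
toy <- data.frame(country = c("Jordan", "Other", "United States"),
                  tested = c(100, 200, 300),
                  confirmed = c(10, 20, 30),
                  stringsAsFactors = FALSE)

# Select both countries at once, keeping two columns
selected <- toy[toy$country %in% c("Jordan", "United States"),
                c("country", "confirmed")]
print(selected)
```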

Comparative Analysis of COVID-19 Testing and Confirmed Cases: United States vs. Jordan
# Difference in testing
united_states$tested > jordan$tested
# Difference in confirmed
united_states$confirmed > jordan$confirmed

TASK 9: Compare which one of the selected countries has a larger ratio of
confirmed cases to population
The goal of task 9 is to find out which of the countries you selected has the
larger ratio of confirmed cases to population, which may indicate that the country
has a higher COVID-19 infection risk
# Use if-else statement
if (united_states$confirmed.population.ratio >
    jordan$confirmed.population.ratio) {
  print('United States has higher covid-19 risk')
} else {
  print('Jordan has higher covid-19 risk')
}
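
The if statement fails with an error when a country name matches no row, because the condition then has length zero. Here is a hedged sketch of a small wrapper that checks for this first. The function name `higher_ratio_country` and the toy values are assumptions for illustration.

```r
# Hypothetical helper: name the country with the larger confirmed/population ratio.
higher_ratio_country <- function(df, a, b) {
  ratio_a <- df$confirmed.population.ratio[df$country == a]
  ratio_b <- df$confirmed.population.ratio[df$country == b]
  # Guard against a missing or duplicated country before comparing
  if (length(ratio_a) != 1 || length(ratio_b) != 1) {
    stop("each country must match exactly one row")
  }
  if (ratio_a > ratio_b) a else b
}

# Toy data frame (made-up values)
toy <- data.frame(country = c("A", "B"),
                  confirmed.population.ratio = c(2.5, 7.1),
                  stringsAsFactors = FALSE)

result <- higher_ratio_country(toy, "A", "B")
print(result)
```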

TASK 10: Find countries with a confirmed-to-population ratio less than a
threshold
The goal of task 10 is to find out which countries have a confirmed-to-population
ratio of less than 1%, which may indicate that the risk in those countries is
relatively low
# Get a subset of any countries with `confirmed.population.ratio` less than the threshold
new_df <- covid_data[(covid_data$`confirmed.population.ratio` < 1), ]
new_df
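
When the ratio column contains NA, logical subsetting returns extra rows filled with NA. A minimal sketch that avoids this by wrapping the condition in which() and naming the threshold is shown below. The toy values are made up.

```r
# Threshold in the same unit as the column (percent)
threshold <- 1

# Toy data frame (made-up values); country B has no ratio
toy <- data.frame(country = c("A", "B", "C", "D"),
                  confirmed.population.ratio = c(0.5, NA, 2.3, 0.9),
                  stringsAsFactors = FALSE)

# which() drops the NA comparison instead of returning an all-NA row
low_risk <- toy[which(toy$confirmed.population.ratio < threshold), ]
print(low_risk)
```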
