noaase is an R package developed as the capstone project of the
"Mastering Software Development in R" Specialization.
The goal of the project is to build an R package that cleans and visualizes data from the "NOAA Significant Earthquake Database".
library(devtools)
devtools::install_github('KrainskiL/noaase', build_vignettes = TRUE)
library(noaase)
Data cleaning:
eq_clean_data produces a cleaned DATE column, converts LATITUDE and LONGITUDE to numeric type, and cleans LOCATION_NAME using the eq_location_clean function.
eq_location_clean removes the country name from LOCATION_NAME and converts the text to title case.
ggplot2 geoms:
geom_timeline creates a timeline of earthquakes showing magnitude and the number of associated deaths within the selected countries.
geom_timeline_label adds annotations to the n_max largest earthquakes on the timeline.
leaflet maps:
eq_map creates an interactive map of earthquake epicenters, annotated with the values of a dataset column, e.g. the date.
eq_create_label creates an HTML label ("Location", "Total deaths", "Magnitude") to use as an annotation on the leaflet map.
Download and load the data from the NOAA website, or use the data delivered with the package:
filename = system.file("extdata", "eq.txt", package="noaase")
library(readr)
df = readr::read_delim(filename, delim = "\t")
Clean the data using the eq_clean_data function:
clean_df = eq_clean_data(df)
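For intuition about what the cleaning step involves, here is a minimal base-R sketch. It is not the package's implementation: the helper names `sketch_location_clean` and `sketch_clean_data` are hypothetical, and the sketch assumes that the raw data has YEAR, MONTH and DAY columns, that a missing month or day can default to 1, that all years are positive, and that LOCATION_NAME has the form "COUNTRY: PLACE".

```r
# Hypothetical helper: strip a leading "COUNTRY: " prefix, then title-case the rest.
sketch_location_clean <- function(x) {
  x <- sub("^[^:]*:[[:space:]]*", "", x)
  # Lower-case everything, then upper-case the first letter of each word.
  gsub("(^|[[:space:]])([a-z])", "\\1\\U\\2", tolower(x), perl = TRUE)
}

# Hypothetical helper: build DATE, convert coordinates, clean location names.
sketch_clean_data <- function(df) {
  # Assumption: a missing month or day falls back to 1.
  month <- ifelse(is.na(df$MONTH), 1L, as.integer(df$MONTH))
  day <- ifelse(is.na(df$DAY), 1L, as.integer(df$DAY))
  # Assumption: positive years only; dates before year 1 are not handled here.
  df$DATE <- as.Date(sprintf("%04d-%02d-%02d", as.integer(df$YEAR), month, day))
  df$LATITUDE <- as.numeric(df$LATITUDE)
  df$LONGITUDE <- as.numeric(df$LONGITUDE)
  df$LOCATION_NAME <- sketch_location_clean(df$LOCATION_NAME)
  df
}

# Illustrative row, made up for this sketch.
raw <- data.frame(YEAR = 2017, MONTH = 9, DAY = NA,
                  LATITUDE = "15.022", LONGITUDE = "-93.899",
                  LOCATION_NAME = "MEXICO: OAXACA, CHIAPAS",
                  stringsAsFactors = FALSE)
cleaned <- sketch_clean_data(raw)
```

The real eq_clean_data may treat missing date parts and ancient dates differently, so use the sketch only as a reading aid.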
To produce an interactive leaflet map of earthquake epicenters (annotated with the date) in Mexico since 2000, use the eq_map function:
library(magrittr)  # provides the %>% pipe
clean_df %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(DATE) >= 2000) %>%
  eq_map(annot_col = "DATE")
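A map of this kind can be approximated directly with the leaflet package. The sketch below is an assumption about the general approach, not the code behind eq_map: `sketch_eq_map` is a hypothetical name, and sizing the circles by EQ_PRIMARY is an illustrative choice.

```r
# Hypothetical helper: circle markers at each epicenter, with a popup taken
# from the column named in annot_col. Requires the leaflet package.
sketch_eq_map <- function(data, annot_col) {
  map <- leaflet::leaflet(data)
  map <- leaflet::addTiles(map)  # default OpenStreetMap background tiles
  leaflet::addCircleMarkers(
    map,
    lng = ~LONGITUDE,
    lat = ~LATITUDE,
    radius = ~EQ_PRIMARY,  # assumption: marker radius follows magnitude
    weight = 1,
    popup = as.character(data[[annot_col]])
  )
}

# Illustrative rows, made up for this sketch.
quakes <- data.frame(LONGITUDE = c(-93.9, -98.4),
                     LATITUDE = c(15.0, 18.6),
                     EQ_PRIMARY = c(8.2, 7.1),
                     DATE = as.Date(c("2017-09-08", "2017-09-19")))
m <- sketch_eq_map(quakes, annot_col = "DATE")
```

Printing `m` in an interactive session renders the map in the viewer or browser.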
To produce a map similar to the example above, but annotated with Location, Total deaths and Magnitude, call eq_create_label before eq_map:
clean_df %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(DATE) >= 2000) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")
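As a rough illustration of how such an HTML label could be assembled, here is a base-R sketch. The name `sketch_create_label` is hypothetical, and skipping fields whose value is missing is an assumption; the package's eq_create_label may format its output differently.

```r
# Hypothetical helper: one HTML string per row, built from LOCATION_NAME,
# TOTAL_DEATHS and EQ_PRIMARY.
sketch_create_label <- function(df) {
  field <- function(title, value) {
    # Assumption: a missing value drops the whole field from the label.
    ifelse(is.na(value), "", paste0("<b>", title, ":</b> ", value, "<br/>"))
  }
  paste0(field("Location", df$LOCATION_NAME),
         field("Total deaths", df$TOTAL_DEATHS),
         field("Magnitude", df$EQ_PRIMARY))
}

# Illustrative row, made up for this sketch.
example <- data.frame(LOCATION_NAME = "Oaxaca", TOTAL_DEATHS = NA,
                      EQ_PRIMARY = 8.2, stringsAsFactors = FALSE)
label <- sketch_create_label(example)
```

Because the function is vectorised over rows, it can be used inside dplyr::mutate in the same way as in the example above.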
To create a timeline of earthquakes showing magnitude, death toll and date of occurrence, use ggplot with the geom_timeline geom:
library(ggplot2)
clean_df %>%
  # %in% tests membership; == would recycle the vector and silently drop rows
  dplyr::filter(COUNTRY %in% c("MEXICO", "USA") & lubridate::year(DATE) >= 2010) %>%
  ggplot(aes(x = DATE,
             y = COUNTRY,
             color = TOTAL_DEATHS,
             size = EQ_PRIMARY)) +
  geom_timeline(alpha = .7) +
  theme(legend.position = "bottom", legend.box = "horizontal", plot.title = element_text(hjust = 0.5)) +
  ggtitle("Earthquakes timeline in USA and Mexico") +
  labs(size = "Richter scale value", color = "# deaths")
Use the geom_timeline_label geom to add annotations to the n_max largest earthquakes:
clean_df %>%
  # %in% tests membership; == would recycle the vector and silently drop rows
  dplyr::filter(COUNTRY %in% c("MEXICO", "USA") & lubridate::year(DATE) >= 2010) %>%
  ggplot(aes(x = DATE,
             y = COUNTRY,
             color = TOTAL_DEATHS,
             size = EQ_PRIMARY)) +
  geom_timeline(alpha = .7) +
  geom_timeline_label(aes(label = LOCATION_NAME), n_max = 4) +
  theme(legend.position = "bottom", legend.box = "horizontal", plot.title = element_text(hjust = 0.5)) +
  ggtitle("Earthquakes timeline in USA and Mexico") +
  labs(size = "Richter scale value", color = "# deaths")
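The selection of the n_max largest earthquakes can be pictured with a small base-R sketch. This is an assumption about the idea, not the geom's internals: `sketch_top_n` is a hypothetical name, and picking the top rows separately for each country is an illustrative choice.

```r
# Hypothetical helper: keep the n_max highest-magnitude rows per country.
sketch_top_n <- function(df, n_max = 4) {
  groups <- split(df, df$COUNTRY)
  top <- lapply(groups, function(g) {
    # Sort by magnitude, largest first (missing values go last), keep n_max rows.
    head(g[order(g$EQ_PRIMARY, decreasing = TRUE), , drop = FALSE], n_max)
  })
  do.call(rbind, top)
}

# Illustrative rows, made up for this sketch.
quakes <- data.frame(COUNTRY = c("MEXICO", "MEXICO", "MEXICO", "USA"),
                     EQ_PRIMARY = c(6.1, 8.2, 7.1, 5.8),
                     stringsAsFactors = FALSE)
largest <- sketch_top_n(quakes, n_max = 2)
```

Only the rows that survive this kind of selection would receive a label; all earthquakes still appear as points on the timeline.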