Data Science Individual 4

The document outlines an R script for analyzing and visualizing the trends in Sakura full bloom dates over time, utilizing libraries like ggplot2 and dplyr. It includes data cleaning, reshaping, and various plots to illustrate the shift in bloom dates, which have advanced earlier in the year, likely due to climate change. Additionally, it discusses forecasting future bloom dates using an ARIMA model and highlights the implications of these trends on tourism and environmental factors.

Uploaded by

shaashwath.sankaresh

R Script

# Load libraries
library(ggplot2)
library(dplyr)
library(readr)
library(tidyr)
library(lubridate)

# Read and clean data

data <- read_csv("full.csv")
cleaned_data <- na.omit(data)
cleaned_data[ , c("Currently Being Observed", "30 Year Average 1981-2010", "Notes")] <-
list(NULL)
View(cleaned_data)

# Reshape data from wide to long format

cleaned_data <- gather(cleaned_data, key = year, value = "value", -"Site Name")
cleaned_data$year <- as.numeric(cleaned_data$year)
View(cleaned_data)

# Extract year, month, and day of month from the date string (YYYY-MM-DD)

cleaned_data$year <- substr(cleaned_data$value, 1, 4)

cleaned_data$month <- substr(cleaned_data$value, 6, 7)
cleaned_data$date <- substr(cleaned_data$value, 9, 10)

# Day of year (DOY) of each full bloom date

cleaned_data$doy <- yday(cleaned_data$value)

# Convert year, month, and day of month to numeric

cleaned_data$year <- as.numeric(cleaned_data$year)

cleaned_data$month <- as.numeric(cleaned_data$month)
cleaned_data$date <- as.numeric(cleaned_data$date)

# Create the time series plot

plot(cleaned_data$year, cleaned_data$doy,
     type = "l",
     main = "Sakura Full Bloom Over Time",
     xlab = "Time",
     ylab = "Day of Year",
     col = "purple")

# Scatter plot of bloom DOY against year with a linear trend line (no confidence band)
ggplot(cleaned_data, aes(x = year, y = doy)) +
  geom_point(size = 3, alpha = 0.5) + # fixed size belongs outside aes()
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Shift in Asahikawa and Obihiro Sakura full blooms",
       x = "Year", y = "Full Bloom Date (DOY)") +
  theme_bw() # Apply a black and white theme

# Same plot with the default 95% confidence band around the trend line
ggplot(cleaned_data, aes(x = year, y = doy)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Shift in Sakura Full Bloom Date",
       x = "Year",
       y = "Full Bloom Date (DOY)")

# Create base plot

p <- ggplot(cleaned_data, aes(x = year, y = doy)) + # ChatGPT fixed: data_cleaned instead of cleaned_data
  geom_line(color = "purple") +

  # Shade April (DOY 91 to 121)
  annotate("rect",
           xmin = min(cleaned_data$year), xmax = max(cleaned_data$year), # ChatGPT fixed: minn typo
           ymin = 91, ymax = 121,
           fill = "lightgreen", alpha = 0.5) +
  annotate("text", x = 1960, y = 109, label = "April", color = "darkgreen", size = 6) +

  # Shade May (DOY 121 to 151)
  annotate("rect",
           xmin = min(cleaned_data$year), xmax = max(cleaned_data$year), # ChatGPT fixed: datacleaned typo
           ymin = 121, ymax = 151,
           fill = "lightpink", alpha = 0.5) +
  annotate("text", x = 1960, y = 142, label = "May", color = "darkred", size = 6) +

  # Annotate special bloom years
  annotate("text", x = 1984, y = 146, label = "most delayed bloom in 1984") + # ChatGPT fixed: missing +
  annotate("text", x = 2008, y = 146, label = "2nd most delayed \nbloom in 2013", size = 3) +
  annotate("text", x = 2016, y = 104, label = "earliest bloom in 2023", size = 3) +

  # Annotate arrows pointing from each label to its data point
  annotate("curve", x = c(2008, 1980, 2018), xend = c(2012, 1983, 2022), y = c(144, 145, 105),
           yend = c(138, 140, 109), linewidth = 0.2, curvature = 0.2,
           arrow = arrow(angle = 20, length = unit(1.5, "mm"), type = "closed")) +

  # Mark some specific years
  annotate("text", x = 1983, y = 116, label = "1983", size = 3) +
  annotate("text", x = 1998, y = 112, label = "1998", size = 3) +
  annotate("text", x = 2002, y = 112, label = "2002", size = 3) +
  annotate("text", x = 2008, y = 113, label = "2008", size = 3) +

  # Customize axis and labels
  scale_x_continuous(breaks = seq(min(cleaned_data$year), max(cleaned_data$year), by = 10)) +
  labs(title = "Sakura Full Bloom Over Time",
       x = "Year", y = "Day of Year (DOY)") +
  theme_minimal()

print(p)

# FORECASTING

# Subset data for a single site

Asahikawa <- subset(cleaned_data, `Site Name` == "Asahikawa") # ChatGPT told me to use backticks for this
View(Asahikawa)

# Keep only the univariate series (DOY)

Asahikawa[, c("Site Name", "year", "value", "month", "date")] <- list(NULL)

# We predict from all years, so no years are removed

# Convert it to a time series object

Asahikawa_ts_actual <- ts(Asahikawa$doy, start = c(1953, 1), frequency = 1)

# Print the time series data

print(Asahikawa_ts_actual)

# Install the forecast package only if it is missing, then load it
if (!requireNamespace("forecast", quietly = TRUE)) install.packages("forecast", dependencies = TRUE)
library(forecast)

# Fit the model with auto.arima

model <- auto.arima(Asahikawa_ts_actual) # ChatGPT told me to change from pred to actual since part of the graph was missing

# Next 7 forecasted values

forecast_data <- forecast(model, h = 7)
print(forecast_data)
plot(forecast_data, main = "Forecasting for Sakura Full Bloom at Asahikawa")
lines(Asahikawa_ts_actual, type = "l", col = "red") # red color didn't work, unable to fix
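
The wide-to-long reshaping and day-of-year steps in the script can be checked on a small table. The sketch below uses only base R and made-up values. The site names, the dates, and the names `wide`, `year_cols` and `long` are assumptions for illustration and are not taken from full.csv.

```r
# Toy wide table: one row per site, one column per year (values are made up).
wide <- data.frame(
  site = c("Site A", "Site B"),
  `2021` = c("2021-05-03", "2021-05-06"),
  `2022` = c("2022-04-30", "2022-05-04"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

year_cols <- setdiff(names(wide), "site")

# Long table: one row per site-year pair, columns stacked one after another.
long <- data.frame(
  site  = rep(wide$site, times = length(year_cols)),
  value = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Derive the year and the day of year from the date itself.
long$year <- as.integer(format(as.Date(long$value), "%Y"))
long$doy  <- as.integer(format(as.Date(long$value), "%j"))

print(long)
```

Parsing the year from the date, rather than from the column name, keeps the year and the day of year consistent even if a column is mislabelled.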

Plot 1:
This plot shows the date (as day of the year) on which Sakura full bloom was observed between
the years 1960 and 2020. The data have a large variance and oscillate up and down from year to
year. However, there is still a steady trend of Sakura reaching full bloom earlier in the year
as time passes. This effect could be caused by global warming.

Plot 1B:
This is the same as the previous plot, except that April is shaded green and May is shaded pink
to help the viewer identify the month. Most full blooms historically occurred in May, but
recently they have been shifting towards April. Important points such as the earliest and
latest blooms are also marked.
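
The day-of-year limits used for the shaded bands can be derived from calendar dates. This is a minimal sketch in base R; the years 2023 and 2024 are chosen only as examples of a non-leap and a leap year.

```r
# Day of year for the month boundaries in a non-leap year.
boundaries <- as.Date(c("2023-04-01", "2023-05-01", "2023-05-31"))
doy <- as.integer(format(boundaries, "%j"))
names(doy) <- c("1 April", "1 May", "31 May")
print(doy)

# The same calendar dates fall one day later in a leap year.
leap <- as.integer(format(as.Date(c("2024-04-01", "2024-05-01", "2024-05-31")), "%j"))
print(leap)
```

In a non-leap year April therefore covers DOY 91 to 120 and May covers DOY 121 to 151, so the two bands in the script meet at DOY 121.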

Plot 2:
This plot shows the recorded date (day of year) of Sakura full bloom in each year from
1950-2010 in various cities across Japan. The graph is titled "Shift in Asahikawa and Obihiro",
though it includes other cities as well. This is because those two cities have a continuous
record, while other cities have mostly null values. Thus the graph is mostly representative of
those two cities. The trend line clearly shows the trend of earlier blooming that was also
mentioned for the previous graph.
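
The slope of such a trend line can be expressed in days per decade. The sketch below fits a linear model to a synthetic series, not to the real data: the 1.2 days per decade advance and the alternating 3-day wobble are assumptions chosen for illustration.

```r
# Synthetic series (NOT the real data): a 1.2-day-per-decade advance plus
# an alternating +/-3 day wobble standing in for year-to-year variation.
year <- 1953:2023
doy  <- 130 - 0.12 * (year - 1953) + rep(c(3, -3), length.out = length(year))

fit <- lm(doy ~ year)

# The slope is in days per year; multiply by 10 for days per decade.
slope_per_decade <- unname(coef(fit)["year"]) * 10
print(round(slope_per_decade, 2))

# Total shift implied over the whole record.
total_shift <- slope_per_decade / 10 * (max(year) - min(year))
print(round(total_shift, 1))
```

A negative slope means the bloom date is moving earlier in the year. On the real data the same two lines would be applied to the `doy` and `year` columns of one site at a time.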

Plot 3:
This is the same as Plot 2, but we have added a confidence interval. It is a 95% confidence
interval for the fitted trend line, that is, for the mean full bloom date in a given year, and
not for individual bloom dates. This is why many blooms occurred outside the shaded grey area:
the band describes the uncertainty of the trend, not the spread of single years. A band meant
to contain individual blooms would be a prediction interval, which is much wider.
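
The difference between the two kinds of interval can be shown with `predict()` on a linear model. This is a sketch on synthetic data, not the real series; the trend, the noise level and the year 2000 are assumptions for illustration.

```r
# Synthetic data (NOT the real series): linear trend plus random noise.
set.seed(1)
year <- 1953:2023
doy  <- 130 - 0.12 * (year - 1953) + rnorm(length(year), sd = 4)
fit  <- lm(doy ~ year)

new_year <- data.frame(year = 2000)

# Interval for the MEAN bloom date in 2000 (what a grey trend band shows).
conf_int <- predict(fit, new_year, interval = "confidence", level = 0.95)

# Interval for a SINGLE observed bloom date in 2000.
pred_int <- predict(fit, new_year, interval = "prediction", level = 0.95)

conf_width <- unname(conf_int[1, "upr"] - conf_int[1, "lwr"])
pred_width <- unname(pred_int[1, "upr"] - pred_int[1, "lwr"])
print(c(confidence = conf_width, prediction = pred_width))

# Share of observations that fall inside the confidence band.
band   <- predict(fit, interval = "confidence", level = 0.95)
inside <- mean(doy >= band[, "lwr"] & doy <= band[, "upr"])
print(inside)
```

The share of points inside the confidence band is expected to be well below 95%, which matches what is seen in the plot.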
Plot 4:

Storytelling:

Around 50 years ago, cherry blossoms could be observed in full bloom around the 130th day of
the year, or around 10 May. More recently, Sakura full bloom may occur as early as mid-March.
The average date of Sakura first bloom has advanced by 1.2 days per decade since 1953
(odotonline.org). The graph reflects this with a downward trend, showing that full bloom is
occurring earlier each decade. Global warming is speculated to be a reason for this. The
speculation is supported by the fact that the advance of sakura bloom is larger in more
urbanized areas (odotonline.org).
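
Converting between a day of year and a calendar date makes statements like these easy to check. The helper `doy_to_date` below is a hypothetical name introduced for this sketch, and the years are examples only.

```r
# Hypothetical helper: calendar date of a given day of year.
doy_to_date <- function(doy, year) {
  as.Date(doy - 1, origin = paste0(year, "-01-01"))
}

# Day 130 in a non-leap year and in a leap year.
print(doy_to_date(130, 2023))
print(doy_to_date(130, 2024))

# Day of year of 15 March in a non-leap year.
mid_march <- as.integer(format(as.Date("2023-03-15"), "%j"))
print(mid_march)
```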

Over the past years, the average temperature in Japan has increased steadily. Japan's
average temperature in 2024 was 1.48 degrees C above the 30-year average for the period up to
2020, indicating the advance of climate change (nippon.com). A study has shown that without
any human influence, the sakura bloom in Kyoto would have been 11 days later (metoffice.com).

The forecast in the graph shows that this general trend of advancing bloom will continue. This
is expected, as global temperatures are projected to continue to increase. By 2030, global
average temperatures may rise more than 2.0 degrees C past pre-industrial levels (WMO.org).
In addition to earlier bloom, there are other issues of concern. For example, if it does not
become cold enough, the flower buds may stay dormant and not bloom in the next spring
(odotonline.com). Furthermore, blooming earlier makes sakura more vulnerable to cold snaps, as
these are more likely to happen earlier in the year (axios.com). Disruption to Sakura schedules
and a lack of blooming are expected to have an impact on tourism (axios.com).

5. Accuracy and Interpretation:

The black line (it should be red) shows the historical Sakura full bloom dates between the
years 1960 and 2023. We forecast the bloom for 2024 to 2030 with the ARIMA model and compare it
with the historical data. The projection shows the trend going downward, but does not
accurately predict the day of the year. I think it is hard to forecast with this historical
data, because there are a lot of random fluctuations and the variance is large. The 80% and 95%
prediction intervals are also shown on the graph. Because of the fluctuations they are very
wide, showing that the reliability of the forecast is low. The point forecast is also smooth
and does not follow the previous pattern of random fluctuations every 1-2 years, so I think the
individual yearly predictions are inaccurate. However, the ARIMA model correctly captures the
trend of full bloom becoming earlier as the years pass.
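
The 80% and 95% intervals can be reproduced from the point forecasts and their standard errors. The sketch below is a base R illustration on synthetic data, not the real series. It uses `stats::arima()` with a fixed ARIMA(0,1,1) order as an assumption, whereas the script lets `auto.arima()` from the forecast package choose the order.

```r
# Synthetic annual series (NOT the real data): slow advance plus noise.
set.seed(42)
years <- 1953:2023
doy   <- 130 - 0.12 * (years - 1953) + rnorm(length(years), sd = 4)

# Build the series from a plain numeric vector so the time axis is in years.
y <- ts(doy, start = 1953, frequency = 1)

# Fixed ARIMA(0,1,1), chosen here only for illustration.
fit <- arima(y, order = c(0, 1, 1))

# Point forecasts and standard errors for the next 7 years.
fc <- predict(fit, n.ahead = 7)

# Normal-theory prediction intervals at 80% and 95%.
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)
intervals <- data.frame(
  year  = as.numeric(time(fc$pred)),
  point = as.numeric(fc$pred),
  lo80  = as.numeric(fc$pred - z80 * fc$se),
  hi80  = as.numeric(fc$pred + z80 * fc$se),
  lo95  = as.numeric(fc$pred - z95 * fc$se),
  hi95  = as.numeric(fc$pred + z95 * fc$se)
)
print(round(intervals, 1))
```

The point forecast is the expected value for each future year, so it is smooth by construction. The year-to-year noise shows up in the width of the intervals rather than in the forecast line itself.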
