R Script
# Load libraries
library(ggplot2)
library(dplyr)
library(readr)
library(tidyr)
library(lubridate)
# Read and clean data
data <- read_csv("full.csv")
cleaned_data <- na.omit(data)
cleaned_data[ , c("Currently Being Observed", "30 Year Average 1981-2010", "Notes")] <-
list(NULL)
View(cleaned_data)
# Reshape data from wide to long format
cleaned_data <- gather(cleaned_data, key = year, value = "value", -"Site Name")
cleaned_data$year <- as.numeric(cleaned_data$year)
View(cleaned_data)
# Extract year, month, day of month and day of year (DOY) from the full-bloom date
cleaned_data$value <- as.Date(cleaned_data$value)
cleaned_data$year <- year(cleaned_data$value)
cleaned_data$month <- month(cleaned_data$value)
cleaned_data$date <- day(cleaned_data$value)
cleaned_data$doy <- yday(cleaned_data$value)
# Create the time series plot
plot(cleaned_data$year, cleaned_data$doy,
type = "l",
main = "Sakura Full Bloom Over Time",
xlab = "Time",
ylab = "Day of Year",
col = "purple")
# Scatter plot of full-bloom day of year against year, with a linear trend line
ggplot(cleaned_data, aes(x = year, y = doy)) +
geom_point(size = 3, alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE) +
labs(title = 'Shift in Asahikawa and Obihiro Sakura full blooms',
x = 'Year', y = "Full Bloom Date (DOY)") +
theme_bw() # Apply a black and white theme
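# Same scatter plot, with the 95% confidence band of the linear fit (se = TRUE by default)
ggplot(cleaned_data, aes(x = year, y = doy)) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Shift in Sakura Full Bloom Dates",
x = "Year",
y = "Full Bloom Date (DOY)"
)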
# Create base plot
p <- ggplot(cleaned_data, aes(x = year, y = doy)) + #chat gpt fixed data_cleaned instead of cleaned_data
geom_line(color = "purple") +
#Shade April (DOY 91 to 121)
annotate ("rect",
xmin = min(cleaned_data$year), xmax = max(cleaned_data$year), #chatgpt fixed minn typo
ymin =91, ymax =121,
fill = "lightgreen", alpha=0.5) +
annotate("text", x=1960, y=109, label = "April", color = "darkgreen", size =6) +
#Shade May (DOY 121 to 151)
annotate("rect",
xmin = min(cleaned_data$year), xmax = max(cleaned_data$year), #chatgpt fixed datacleaned typo
ymin = 121, ymax = 151,
fill = "lightpink", alpha =0.5) +
annotate("text", x= 1960, y= 142, label = "May", color= "darkred", size=6)+
#Annotate special bloom years
annotate("text", x = 1984, y = 146, label = "most delayed bloom at 1984") + #chatgpt fixed missing +
annotate("text", x=2008, y=146, label ="2nd most delayed \nbloom at 2013", size = 3) +
annotate("text", x=2016, y=104, label ="earliest bloom at 2023", size = 3)+
#Annotate arrow
annotate("curve", x= c(2008, 1980, 2018), xend = c(2012, 1983, 2022), y=c(144,145,105),
yend = c(138, 140, 109), linewidth = 0.2, curvature = 0.2,
arrow =arrow(angle=20, length =unit(1.5,"mm"),type="closed"))+
#Mark some specific years
annotate("text",x = 1983, y= 116, label = "1983", size =3) +
annotate("text",x= 1998, y= 112, label ="1998", size =3 ) +
annotate("text", x= 2002, y= 112, label = "2002", size = 3) +
annotate("text", x= 2008, y= 113, label = "2008", size = 3) +
# customize axis and labels
scale_x_continuous(breaks = seq(min(cleaned_data$year), max(cleaned_data$year), by = 10)) +
labs(title = "Sakura Full Bloom Over Time",
x = "Year", y = "Day of Year (DOY)") +
theme_minimal()
print(p)
#FORECASTING
# Subset the data for a single site (Asahikawa)
Asahikawa <- subset(cleaned_data, `Site Name` == "Asahikawa") #chatGPT told me to use backticks for this
View(Asahikawa)
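# Sort by year, then keep only the day-of-year column (a univariate series)
Asahikawa <- Asahikawa[order(Asahikawa$year), ]
Asahikawa[, c("Site Name", "year", "value", "month", "date")] <- list(NULL)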
# All years are used for fitting, so no years are held out
# Convert the day-of-year values to an annual time series object
Asahikawa_ts_actual <- ts(Asahikawa$doy, start = c(1953, 1), frequency = 1)
# Print the time series data
print(Asahikawa_ts_actual)
if (!requireNamespace("forecast", quietly = TRUE)) install.packages("forecast", dependencies = TRUE)
library(forecast)
# Fit a model using auto.arima (the ARIMA order is chosen automatically)
# chat gpt told to change from pred to actual since part of graph was missing
model <- auto.arima(Asahikawa_ts_actual)
# Forecast the next 7 values (one per year)
forecast_data <- forecast(model, 7)
print(forecast_data)
plot(forecast_data, main = "Forecasting for Sakura Full Bloom at Asahikawa")
lines(Asahikawa_ts_actual, col = "red") # red color didn't work, unable to fix
Plot 1:
This plot shows the date (represented as the day of the year) on which sakura full bloom was
observed between 1960 and 2020. The data has a large variance and oscillates up and down from
year to year. However, there is still a steady trend of sakura reaching full bloom earlier in the
year as time passes. This effect could possibly be caused by global warming.
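To put a number on this trend, one option is to fit a straight line of day of year against year and convert the slope to days per decade. The sketch below uses base R's `lm()` on a synthetic series (not the real observations), built with a known slope of -0.12 days per year plus alternating ±1 day noise so that the recovered rate can be checked; the names `toy` and `slope_per_decade` are illustrative only.

```r
# Synthetic, illustrative data only: a known trend of -0.12 days/year
# (about -1.2 days per decade) plus alternating +/-1 day "noise"
years <- 1953:2022
toy <- data.frame(
  year = years,
  doy  = 140 - 0.12 * (years - 1953) + rep(c(1, -1), length.out = length(years))
)

# Ordinary least-squares fit of day of year on calendar year
fit <- lm(doy ~ year, data = toy)

slope_per_year   <- unname(coef(fit)["year"])
slope_per_decade <- 10 * slope_per_year

# A negative slope means full bloom arrives earlier as the years pass
cat(sprintf("Trend: %.2f days per decade\n", slope_per_decade))
```

With the real data, `summary(lm(doy ~ year, data = cleaned_data))` would also report a standard error for the slope, which shows how clearly the trend stands out from the year-to-year scatter.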
Plot 1B:
This is the same as the previous plot, except that April is shaded green and May pink to help
the viewer identify the month. Historically, most full blooms occurred in May, but recently they
have been shifting towards April. Important points, such as the earliest and latest blooms, are
also marked.
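The shaded bands in the script use fixed day-of-year limits (91 to 121 for April, 121 to 151 for May). As a quick check of where the month boundaries actually fall, the base R sketch below converts dates to day of year. The helper `doy()` is a hypothetical stand-in for `lubridate::yday()`, and 2023 and 2024 are just example common and leap years.

```r
# Day of year (DOY) for a date, using base R only.
# POSIXlt counts yday from 0, so add 1 to match lubridate::yday().
doy <- function(x) as.POSIXlt(as.Date(x))$yday + 1

# First and last day of April and May in a common year (2023)
# and in a leap year (2024); columns are named after the years
bounds <- sapply(c("2023", "2024"), function(y) {
  doy(paste0(y, c("-04-01", "-04-30", "-05-01", "-05-31")))
})
rownames(bounds) <- c("Apr 1", "Apr 30", "May 1", "May 31")
print(bounds)
```

In a common year April runs from DOY 91 to 120 and May from 121 to 151; in a leap year each boundary moves one day later, so the fixed bands are accurate to within about a day.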
Plot 2:
This plot shows the recorded full-bloom date (day of year) of sakura in each year from 1950 to
2010 in various cities across Japan. The graph is titled "Shift in Asahikawa and Obihiro" even
though it includes other cities as well, because those two cities have a continuous record while
the other cities have mostly missing values. The graph is therefore mostly representative of
those two cities. The trend line clearly shows the shift towards earlier blooming that was also
seen in the previous plot.
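How much each site contributes depends on how many years it has on record. The sketch below counts non-missing years per site on a small hypothetical wide table (site names and dates are placeholders, not real observations) and shows that `na.omit()`, as used in the cleaning step, keeps only sites with no missing values at all. In the script `na.omit()` runs before the `Notes` column is removed, so a site with an empty note would also be dropped.

```r
# Hypothetical wide table: one row per site, one column per year.
# Site names and dates are placeholders, not real observations.
wide <- data.frame(
  site   = c("Site A", "Site B", "Site C"),
  `1953` = c("1953-05-10", "1953-05-12", NA),
  `1954` = c("1954-05-08", NA, NA),
  `1955` = c("1955-05-11", "1955-05-09", "1955-05-15"),
  check.names = FALSE
)

# Number of years with a recorded bloom date for each site
records <- rowSums(!is.na(wide[, -1]))
names(records) <- wide$site
print(records)

# na.omit() drops any row containing at least one NA,
# so only sites with a record in every year survive
complete_sites <- na.omit(wide)$site
print(complete_sites)
```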
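Plot 3:
This is the same as Plot 2, but with the confidence band of the trend line added. The shaded grey
area is a 95% confidence interval for the fitted regression line, i.e. for the average full-bloom
date in a given year, not for the bloom date of any individual year. That is why many individual
blooms fall outside the shaded area: the band only describes the uncertainty in the position of
the trend line. A 95% prediction interval, which also accounts for the year-to-year scatter, would
be much wider and is the interval that should contain about 95% of individual bloom dates.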
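The difference between the two kinds of interval can be seen directly with base R's `predict()` on a linear model: `interval = "confidence"` gives the kind of band that `geom_smooth(method = "lm")` shades (95% by default), while `interval = "prediction"` also includes the year-to-year scatter. The data below are synthetic (a linear trend plus random noise with an assumed standard deviation of 8 days), so the exact numbers are illustrative only.

```r
# Synthetic illustration only: linear trend plus random scatter (sd = 8 days)
set.seed(1)
years <- 1953:2022
toy <- data.frame(year = years,
                  doy  = 140 - 0.12 * (years - 1953) + rnorm(length(years), sd = 8))

fit <- lm(doy ~ year, data = toy)

# 95% interval for the fitted mean (the kind of band geom_smooth(method = "lm") draws)
conf_iv <- predict(fit, newdata = toy, interval = "confidence", level = 0.95)
# 95% interval for an individual year's bloom date
pred_iv <- predict(fit, newdata = toy, interval = "prediction", level = 0.95)

# Average width of each band, in days
conf_width <- mean(conf_iv[, "upr"] - conf_iv[, "lwr"])
pred_width <- mean(pred_iv[, "upr"] - pred_iv[, "lwr"])

# Share of observed years that fall inside each band
inside <- function(iv) mean(toy$doy >= iv[, "lwr"] & toy$doy <= iv[, "upr"])
conf_cover <- inside(conf_iv)
pred_cover <- inside(pred_iv)

cat(sprintf("confidence band: %.1f days wide, covers %.0f%% of years\n", conf_width, 100 * conf_cover))
cat(sprintf("prediction band: %.1f days wide, covers %.0f%% of years\n", pred_width, 100 * pred_cover))
```

Many points lying outside the confidence band is therefore expected, as long as roughly 95% of them lie inside the prediction band.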
Plot 4:
Storytelling:
Around 50 years ago, cherry blossoms could be observed in full bloom around the 130th day of
the year (about 10 May). More recently, sakura full bloom may occur as early as mid-March. The
average date of sakura first bloom has moved 1.2 days earlier per decade since 1953
(odotonline.org). The graph reflects this with a downward trend, showing that full bloom is
occurring earlier each decade. Global warming is thought to be one reason for this. This is
supported by the fact that the advance of sakura blooming is greater in more urbanized areas
(odotonline.org).
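The day-of-year figures above can be turned back into calendar dates, and the cited rate turned into a total shift, with a few lines of base R. The helper `doy_to_date()` is hypothetical, the example years 1972 and 1973 simply stand in for "about 50 years ago", and the total assumes the cited 1.2 days per decade held steadily over 1953 to 2023.

```r
# Convert a day-of-year number back to a calendar date.
# DOY 1 is 1 January, so subtract 1 before adding to the origin.
doy_to_date <- function(doy, year) as.Date(doy - 1, origin = paste0(year, "-01-01"))

print(doy_to_date(130, 1973))  # 10 May in a common year
print(doy_to_date(130, 1972))  # 9 May in a leap year

# Cumulative effect of a steady advance of 1.2 days per decade (assumed constant)
rate_per_decade <- 1.2
decades <- (2023 - 1953) / 10
total_shift <- rate_per_decade * decades
cat("Total advance 1953-2023:", total_shift, "days\n")
```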
In recent decades, the average temperature in Japan has increased steadily. Japan’s average
temperature in 2024 was 1.48 °C above the 30-year average for the period up to 2020, indicating
the advance of climate change (nippon.com). A study has shown that without any human
influence, the sakura bloom in Kyoto would have been 11 days later (metoffice.com).
The forecast in the graph suggests that this general trend of advancing bloom will continue. This
is expected, as global temperatures are projected to keep increasing: by 2030, global average
temperatures may rise more than 2.0 °C above pre-industrial levels (WMO.org).
Besides the shift towards earlier blooming, there are also other concerns.
For example, if the winter does not get cold enough, the flower buds may not properly break
dormancy and may fail to bloom the following spring (odotonline.com). Furthermore, blooming
earlier makes sakura more vulnerable to cold snaps, which are more likely earlier in the year
(axios.com). Disruption to sakura schedules and a lack of blooming are expected to affect
tourism (axios.com).
5. Accuracy and Interpretation:
The black line (which should have been red) shows the historical sakura full-bloom dates between
1960 and 2023. We forecast the bloom dates for 2024 to 2030 with an ARIMA model and plot the
forecast alongside the historical data. The projection continues the downward trend, but it
cannot pin down the bloom date for any particular year. I think this historical data is hard to
forecast because it contains a lot of random fluctuation and its variance is large. The 80% and
95% prediction intervals are also shown on this graph. Because of the fluctuations they are very
wide, which shows that the forecast for any single year is not reliable. Also, the forecast does
not follow the previous pattern of up-and-down swings every 1-2 years. This is expected, because
an ARIMA point forecast is the average of the possible outcomes, so the swings cancel out; it does
mean the forecast will not match individual years closely. However, the ARIMA model does capture
the trend of full bloom becoming earlier as the years pass.
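The shape of the forecast plot (a smooth central line inside bands that widen with the horizon) can be reproduced with base R's `arima()` and `predict()`. This sketch is not the script's `auto.arima()` model: it fits a fixed ARIMA(0,1,1) with no drift to a synthetic series standing in for the Asahikawa data, and builds Gaussian 80% and 95% intervals from the forecast standard errors, similar to the intervals that `forecast()` reports by default.

```r
# Synthetic stand-in for the Asahikawa series (NOT real data):
# 71 annual values for 1953-2023 with a downward drift and random noise
set.seed(42)
y <- ts(140 - 0.12 * (0:70) + rnorm(71, sd = 6), start = 1953, frequency = 1)

# Fixed ARIMA(0,1,1); with d = 1, stats::arima fits no mean or drift term
fit <- arima(y, order = c(0, 1, 1))

# Seven-step-ahead forecast (2024-2030) with standard errors
fc <- predict(fit, n.ahead = 7)

# Gaussian 80% and 95% prediction intervals
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)
out <- data.frame(
  year = as.numeric(time(fc$pred)),
  mean = as.numeric(fc$pred),
  lo80 = as.numeric(fc$pred - z80 * fc$se),
  hi80 = as.numeric(fc$pred + z80 * fc$se),
  lo95 = as.numeric(fc$pred - z95 * fc$se),
  hi95 = as.numeric(fc$pred + z95 * fc$se)
)
print(round(out, 1))
```

Because this model has no drift term, its point forecast is a flat line: the point forecast is an expected value, so the random up-and-down swings average out and only show up as the growing width of the intervals. `auto.arima()` may choose a different order and can include a drift term, which is what produces a sloping forecast.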