Data Science Individual 4

The document outlines an R script for analyzing and visualizing the trends in Sakura full bloom dates over time, utilizing libraries like ggplot2 and dplyr. It includes data cleaning, reshaping, and various plots to illustrate the shift in bloom dates, which have advanced earlier in the year, likely due to climate change. Additionally, it discusses forecasting future bloom dates using an ARIMA model and highlights the implications of these trends on tourism and environmental factors.

R Script​

# Load libraries
library(ggplot2)
library(dplyr)
library(readr)
library(tidyr)
library(lubridate)

# Read and clean data
data <- read_csv("full.csv")
cleaned_data <- na.omit(data)
# Drop columns that are not needed for the analysis
cleaned_data[, c("Currently Being Observed", "30 Year Average 1981-2010", "Notes")] <- list(NULL)
View(cleaned_data)

# Reshape data from wide to long format (one row per site and year)
cleaned_data <- gather(cleaned_data, key = "year", value = "value", -`Site Name`)
cleaned_data$year <- as.numeric(cleaned_data$year)
View(cleaned_data)

# Extract year, month, day of month, and day of year (DOY) from the bloom date
cleaned_data$year <- substr(cleaned_data$value, 1, 4)
cleaned_data$month <- substr(cleaned_data$value, 6, 7)
cleaned_data$date <- substr(cleaned_data$value, 9, 10)
cleaned_data$doy <- yday(cleaned_data$value)

# Convert year, month, and date to numeric
cleaned_data$year <- as.numeric(cleaned_data$year)
cleaned_data$month <- as.numeric(cleaned_data$month)
cleaned_data$date <- as.numeric(cleaned_data$date)

# Create the time series plot
plot(cleaned_data$year, cleaned_data$doy,
     type = "l",
     main = "Sakura Full Bloom Over Time",
     xlab = "Time",
     ylab = "Day of Year",
     col = "purple")

# Scatter plot with year on the x axis, day of year on the y axis, and a linear trend line
ggplot(cleaned_data, aes(x = year, y = doy)) +
  geom_point(size = 3, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Shift in Asahikawa and Obihiro Sakura full blooms",
       x = "Year", y = "Full Bloom Date (DOY)") +
  theme_bw() # Apply a black and white theme

# Same scatter plot, with the default 95% confidence band around the trend line
ggplot(cleaned_data, aes(x = year, y = doy)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Shift in Sakura Full Bloom Date",
       x = "Year",
       y = "Full Bloom Date (DOY)")

# Create base plot
p <- ggplot(cleaned_data, aes(x = year, y = doy)) + # ChatGPT fix: data_cleaned -> cleaned_data
  geom_line(color = "purple") +

  # Shade April (DOY 91 to 121)
  annotate("rect",
           xmin = min(cleaned_data$year), xmax = max(cleaned_data$year), # ChatGPT fix: "minn" typo
           ymin = 91, ymax = 121,
           fill = "lightgreen", alpha = 0.5) +
  annotate("text", x = 1960, y = 109, label = "April", color = "darkgreen", size = 6) +

  # Shade May (DOY 121 to 151)
  annotate("rect",
           xmin = min(cleaned_data$year), xmax = max(cleaned_data$year), # ChatGPT fix: "datacleaned" typo
           ymin = 121, ymax = 151,
           fill = "lightpink", alpha = 0.5) +
  annotate("text", x = 1960, y = 142, label = "May", color = "darkred", size = 6) +

  # Annotate special bloom years
  annotate("text", x = 1984, y = 146, label = "most delayed bloom at 1984") + # ChatGPT fix: missing +
  annotate("text", x = 2008, y = 146, label = "2nd most delayed \nbloom at 2013", size = 3) +
  annotate("text", x = 2016, y = 104, label = "earliest bloom at 2023", size = 3) +

  # Arrows from the labels to the annotated points
  annotate("curve", x = c(2008, 1980, 2018), xend = c(2012, 1983, 2022),
           y = c(144, 145, 105), yend = c(138, 140, 109),
           linewidth = 0.2, curvature = 0.2,
           arrow = arrow(angle = 20, length = unit(1.5, "mm"), type = "closed")) +

  # Mark some specific years
  annotate("text", x = 1983, y = 116, label = "1983", size = 3) +
  annotate("text", x = 1998, y = 112, label = "1998", size = 3) +
  annotate("text", x = 2002, y = 112, label = "2002", size = 3) +
  annotate("text", x = 2008, y = 113, label = "2008", size = 3) +

  # Customize axis and labels
  scale_x_continuous(breaks = seq(min(cleaned_data$year), max(cleaned_data$year), by = 10)) +
  labs(title = "Sakura Full Bloom Over Time",
       x = "Year", y = "Day of Year (DOY)") +
  theme_minimal()

print(p)

#FORECASTING

# Subset data for a single site
Asahikawa <- subset(cleaned_data, `Site Name` == "Asahikawa") # ChatGPT suggested backticks for the column name
View(Asahikawa)

# Keep only the univariate series (day of year)
Asahikawa[, c("Site Name", "year", "value", "month", "date")] <- list(NULL)

# We forecast beyond the full record, so no years are removed

# Convert to a time series object
Asahikawa_ts_actual <- ts(Asahikawa$doy, start = c(1953, 1), frequency = 1)
# No years are held out, so the prediction series is the same as the actual series
Asahikawa_ts_pred <- Asahikawa_ts_actual

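# Print the time series data
print(Asahikawa_ts_actual)
print(Asahikawa_ts_pred)

# Install the forecast package only if it is missing, then load it
if (!requireNamespace("forecast", quietly = TRUE)) {
  install.packages("forecast", dependencies = TRUE)
}
library(forecast)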

# Fit the model with auto.arima
# ChatGPT suggested switching from pred to actual because part of the graph was missing
model <- auto.arima(Asahikawa_ts_actual)

# Forecast the next 7 values
forecast_data <- forecast(model, h = 7)
print(forecast_data)
plot(forecast_data, main = "Forecasting for Sakura Full Bloom at Asahikawa")
# Intended to draw the observed series in red; in the original run it still rendered black
lines(Asahikawa_ts_actual, type = "l", col = "red")

Plot 1:
This plot shows the date (as day of the year) on which sakura full bloom was observed between
the years 1960 and 2020. The data have a large variance and oscillate from year to year. Even
so, there is a steady trend of sakura reaching full bloom earlier in the year as time passes.
This effect could be caused by global warming.

Plot 1B:
This is the same as the previous plot, except that April is shaded green and May is shaded pink
to help the viewer identify the month. Historically most full blooms occurred in May, but
recently they have been shifting towards April. Notable points such as the earliest and latest
blooms are also marked.
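
The day-of-year limits used for the shaded bands can be checked with a short base R sketch. It assumes a non-leap year (2023 is an arbitrary choice for illustration); in a leap year every date after February falls one day later.

```r
# Day of year (DOY) for the month boundaries used in the shaded bands.
# Assumption: a non-leap year; 2023 is chosen only for illustration.
boundaries <- as.Date(c("2023-04-01", "2023-05-01", "2023-05-31"))

# "%j" formats a date as its day of year, zero-padded to three digits.
doy <- as.integer(format(boundaries, "%j"))
names(doy) <- c("April 1", "May 1", "May 31")

print(doy)
```

So the green band (91 to 121) covers April and the pink band (121 to 151) covers May.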

Plot 2:
This plot shows the recorded date (day of year) of sakura full bloom in each year from 1950 to
2010 in various cities across Japan. The graph is titled "Shift in Asahikawa and Obihiro",
though it includes other cities as well. This is because those two cities have a continuous
record, while other cities have mostly null values. Thus the graph is mostly representative of
those two cities. The trend line clearly shows the same trend of earlier blooming seen in the
previous graph.
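
The steepness of such a trend line can be expressed in days per decade. The sketch below is only an illustration: the `bloom` data frame is made-up data with an assumed trend of -0.12 days per year plus noise, not the records from full.csv.

```r
# Hypothetical bloom records: an assumed trend of -0.12 days/year plus noise.
# These numbers are invented for illustration, not taken from full.csv.
set.seed(42)
bloom <- data.frame(year = 1953:2023)
bloom$doy <- 130 - 0.12 * (bloom$year - 1953) + rnorm(nrow(bloom), sd = 4)

# Ordinary least squares, the same kind of fit geom_smooth(method = "lm") draws.
fit <- lm(doy ~ year, data = bloom)

# The slope is in days per year; multiply by 10 for days per decade.
slope_per_decade <- unname(coef(fit)["year"]) * 10
print(round(slope_per_decade, 2))
```

A negative value means full bloom is arriving earlier over time.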

Plot 3:
This is the same as Plot 2, but with a confidence interval added. It is a 95% confidence
interval for the fitted trend line: it expresses the uncertainty in the estimated average full
bloom date for each year, not the range within which individual blooms are expected to fall.
This is why many blooms actually occurred outside the shaded grey area, and the band should not
be read as a prediction of any single year's bloom date.
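
The difference between a confidence band for the trend line and a prediction band for individual observations can be sketched in base R. The data here are synthetic (an assumed linear trend plus noise), so the printed shares are illustrative only.

```r
# Synthetic data: an assumed linear trend plus noise (not the real records).
set.seed(1)
year <- 1953:2023
doy <- 130 - 0.12 * (year - 1953) + rnorm(length(year), sd = 4)

fit <- lm(doy ~ year)
new_years <- data.frame(year = year)

# Confidence band: uncertainty of the fitted mean (the shaded area in the plot).
conf_band <- predict(fit, new_years, interval = "confidence", level = 0.95)
# Prediction band: range expected to contain individual observations.
pred_band <- predict(fit, new_years, interval = "prediction", level = 0.95)

# Share of observed points that fall inside each band.
inside <- function(band) mean(doy >= band[, "lwr"] & doy <= band[, "upr"])
cat("share inside confidence band:", inside(conf_band), "\n")
cat("share inside prediction band:", inside(pred_band), "\n")
```

The prediction band is always wider than the confidence band, because it adds the scatter of individual years to the uncertainty of the fitted line.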
Plot (4)​

Storytelling:

Around 50 years ago, cherry blossoms could be observed in full bloom around the 130th day of
the year, or mid-May. More recently, sakura full bloom may occur as early as mid-March. The
average date of sakura first bloom has moved 1.2 days earlier per decade since 1953
(odotonline.org). The graph reflects this with a downward trend, showing that full bloom is
occurring earlier each decade. Global warming is suspected to be a reason for this. This is
supported by the fact that the advance of sakura is larger in more urbanized areas
(odotonline.org).

Over the past years, the average temperature in Japan has increased steadily. Japan's average
temperature in 2024 was 1.48 degrees above the 30-year average for the period up to 2020,
indicating the advance of climate change (nippon.com). A study has shown that without any human
influence, the sakura bloom in Kyoto would have been 11 days later (metoffice.com).

The forecast in the graph shows that this general trend of advancing bloom will continue. This
is expected, as global temperatures are projected to continue to increase. By 2030, global
average temperatures may rise more than 2.0 degrees C above pre-industrial levels (WMO.org).
In addition to earlier bloom, there are other issues of concern. For example, if it does not
become cold enough in winter, the flower buds may remain dormant and fail to bloom the
following spring (odotonline.com). Furthermore, blooming earlier makes sakura more vulnerable to
cold snaps, as these are more likely to happen earlier in the year (axios.com). Disruption to
sakura schedules and lack of blooming are expected to have an impact on tourism (axios.com).

5. Accuracy and Interpretation:


The black line (intended to be red) shows the historical data of the sakura full bloom date
between the years 1960 and 2023. We forecast the bloom for 2024 to 2030 with the ARIMA model and
compare it with the historical data. The projection shows the trend going downward, but does not
accurately predict the day of the year. I think it is hard to forecast with this historical
data, because there are a lot of random fluctuations and the variance is large. The 80% and 95%
prediction intervals are also shown on this graph. Due to the fluctuations they are very wide,
showing that the reliability of the forecast is low. The forecast also does not reproduce the
previous pattern of random fluctuations every 1-2 years, so I think it is inaccurate for
individual years. However, the ARIMA model correctly captures the trend of full bloom becoming
earlier as the years pass.
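
One way to put a number on forecast accuracy is to hold out the last few years, forecast them, and compare with what was observed. The sketch below is not the analysis above: it uses synthetic data and base R's `arima()` with an assumed ARIMA(0,1,1) order, whereas `auto.arima()` may select a different model on the real series.

```r
# Synthetic series: an assumed linear trend plus noise (not the real records).
set.seed(7)
years <- 1953:2023
doy <- 130 - 0.12 * (years - 1953) + rnorm(length(years), sd = 4)
series <- ts(doy, start = 1953, frequency = 1)

# Hold out the last h years for testing.
h <- 7
train <- window(series, end = 2023 - h)
test <- window(series, start = 2023 - h + 1)

# ARIMA(0,1,1) is an assumed order chosen for illustration only.
fit <- arima(train, order = c(0, 1, 1))
pred <- predict(fit, n.ahead = h)$pred

# Mean absolute error in days between forecast and held-out observations.
mae <- mean(abs(as.numeric(test) - as.numeric(pred)))
cat("MAE over", h, "held-out years:", round(mae, 2), "days\n")
```

A mean absolute error of several days would support the impression that the model captures the overall direction better than the bloom date of any single year.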
