A complete end-to-end data science project that analyses 175 years of global surface temperature data (1850–2026) and builds multiple forecasting models to predict future climate trends.
- Zero data preprocessing headaches — uses the NOAA GCAG dataset with no missing values
- 5 models compared — Linear Regression, Random Forest, SVR, ARIMA, and Facebook Prophet
- Hyperparameter tuning via GridSearchCV (not arbitrary defaults)
- Proper time-series methodology — no-shuffle train/test split, lag features, stationarity testing
- Future forecast up to 2030 with a correctly implemented rolling prediction loop
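
The time-series points above (lag features, a no-shuffle split, and a rolling prediction loop) can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the helper names `make_lag_matrix`, `fit_linear`, and `rolling_forecast` are hypothetical, the lag set `(1, 2, 12)` is an assumption, a plain least-squares fit stands in for the tuned models, and the series is synthetic.

```python
import numpy as np

def make_lag_matrix(series, lags):
    """Return (X, y) where each row of X holds series[t - lag] for each lag."""
    series = np.asarray(series, dtype=float)
    max_lag = max(lags)
    X = np.column_stack(
        [series[max_lag - lag : len(series) - lag] for lag in lags]
    )
    y = series[max_lag:]
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in for any regressor)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def rolling_forecast(history, coef, lags, steps):
    """Predict one step at a time, feeding each prediction back in as a lag."""
    values = list(history)
    for _ in range(steps):
        features = [values[-lag] for lag in lags]
        values.append(coef[0] + float(np.dot(coef[1:], features)))
    return np.array(values[len(history):])

# Synthetic monthly series: linear trend plus a 12-month cycle (not NOAA data).
t = np.arange(240)
series = 0.002 * t + 0.1 * np.sin(2 * np.pi * t / 12)

lags = (1, 2, 12)  # assumed lag set, chosen for illustration
X, y = make_lag_matrix(series, lags)

# Chronological split: the last 24 rows are held out, nothing is shuffled.
split = len(X) - 24
coef = fit_linear(X[:split], y[:split])

# Forecast 12 months past the end of the series.
forecast = rolling_forecast(series, coef, lags, steps=12)
```

The key detail in the loop is that each forecast is appended to `values` before the next step, so later predictions use earlier predictions as their lag inputs rather than peeking at unseen data.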
| Property | Details |
|---|---|
| Source | NOAA GCAG via datahub.io |
| Coverage | January 1850 — present (updated monthly) |
| Rows | ~2,100 (one per month) |
| Missing values | None |
| Target variable | Temperature anomaly (°C deviation from 20th-century average) |
No download required. The dataset is fetched automatically inside the notebook via a single URL.
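
A loading step along these lines might look like the sketch below. The function name `load_anomalies` and the column names `Date` and `Mean` are assumptions made for illustration, and the in-memory CSV holds made-up values so the example runs offline. In the notebook, the dataset URL would be passed in place of `sample`, since `pandas.read_csv` accepts a URL directly.

```python
import io
import pandas as pd

def load_anomalies(source, date_col="Date", value_col="Mean"):
    """Read a monthly anomaly CSV from a URL, path, or file-like object."""
    df = pd.read_csv(source, parse_dates=[date_col])
    # Index by date and sort so the series is in chronological order.
    return df.set_index(date_col)[value_col].sort_index()

# Tiny in-memory stand-in with made-up values (column names are assumed).
sample = io.StringIO(
    "Date,Mean\n"
    "1850-03-01,-0.30\n"
    "1850-01-01,-0.40\n"
    "1850-02-01,-0.20\n"
)
anomalies = load_anomalies(sample)

# The same check can back the "no missing values" claim on the real file.
assert anomalies.isna().sum() == 0
```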
| Model | Type | Notes |
|---|---|---|
| Linear Regression | ML Baseline | Interpretable, fast |
| Random Forest | ML Ensemble | Best accuracy; tuned with GridSearchCV |
| SVR | ML Kernel | Robust to outliers |
| ARIMA(2,1,2) | Classical Time-Series | Principled sequential model |
| Facebook Prophet | Time-Series | Handles trend + seasonality automatically |
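
In the ARIMA(2,1,2) order, the three numbers are the autoregressive order (2), the degree of differencing (1), and the moving-average order (2). The sketch below illustrates only the middle term, showing how a single round of differencing removes a steady trend. It uses a synthetic series, not the NOAA data.

```python
import numpy as np

# Synthetic series with a steady upward trend plus a 12-month cycle.
t = np.arange(120)
series = 0.01 * t + 0.05 * np.sin(2 * np.pi * t / 12)

# d = 1: replace each value with its change from the previous month.
diff = np.diff(series, n=1)

# Least-squares slope of each series against time.
trend_before = np.polyfit(t, series, 1)[0]   # close to the 0.01 trend
trend_after = np.polyfit(t[1:], diff, 1)[0]  # close to zero

print(f"slope before differencing: {trend_before:.5f}")
print(f"slope after differencing:  {trend_after:.5f}")
```

After differencing, the trend shows up as a roughly constant mean instead of a slope, which is the kind of series the AR and MA terms are designed to model.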
global-temperature-forecasting/
│
├── Weather_Forecasting_NOAA_Portfolio.ipynb
├── requirements.txt
├── README.md
└── .gitignore
- Global surface temperatures have risen to roughly 1.3°C above the 20th-century baseline
- Warming rate since 1980 is approximately +0.2°C per decade
- Random Forest achieved the best predictive accuracy on test data
- The 12-month lag feature was the most important predictor, capturing annual seasonality
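
A per-decade warming rate like the one quoted above is typically the slope of a straight-line fit, multiplied by ten. The sketch below shows that arithmetic on a synthetic series built with a known trend. The helper `trend_per_decade` is hypothetical and the numbers are not NOAA data.

```python
import numpy as np

def trend_per_decade(years, anomalies):
    """Least-squares slope of anomaly against time, scaled to °C per decade."""
    slope_per_year, _intercept = np.polyfit(years, anomalies, deg=1)
    return 10.0 * slope_per_year

# Synthetic monthly series: 0.02 °C/year trend plus noise, 40 years from 1980.
rng = np.random.default_rng(0)
years = 1980 + np.arange(480) / 12.0
anomalies = 0.02 * (years - 1980) + rng.normal(0.0, 0.1, size=years.size)

rate = trend_per_decade(years, anomalies)
print(f"estimated trend: {rate:+.2f} °C per decade")
```

Applied to the real series, the same function would be called on the rows from 1980 onward only.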
- Python 3.10+
- `pandas`, `numpy`: data manipulation
- `matplotlib`, `seaborn`: visualisation
- `scikit-learn`: ML models and evaluation
- `statsmodels`: ARIMA, ADF test, seasonal decomposition
- `prophet`: Facebook Prophet forecasting