juaAI/ENTSOE-Hackathon

Electricity Price Forecasting - ENTSO-E Data

Problem Statement

Objective: Forecast day-ahead electricity prices for the Germany-Luxembourg (DE-LU) bidding zone.

Electricity prices are highly volatile and influenced by multiple factors including renewable generation (wind, solar), fossil fuel availability, cross-border electricity flows, and demand patterns. Accurate price forecasting is critical for:

  • Energy traders optimizing market positions
  • Grid operators balancing supply and demand
  • Renewable generators maximizing revenue
  • Consumers managing energy costs

This project combines ENTSO-E market data with atmospheric forecast latent states to predict electricity prices, capturing the relationship between weather patterns and market dynamics.

Core Question: "Given a weather forecast valid at a specific time, what will the electricity price (or other market variable) be at that same time?"

To address this, you are provided with two data files:

  • entsoe_data_2023.csv - ENTSO-E market data (electricity demand, generation, prices, cross-border flows)
  • latent_states_enabling-muskox.zarr - Latent forecast weather dataset (encoded atmospheric predictions)

ENTSO-E Dataset (entsoe_data_2023.csv)

The ENTSO-E dataset contains data for the DE-LU zone at hourly or 15-minute resolution, depending on the variable, covering all of 2023. It includes the following variables:

Electricity Price (...what will the electricity prices be?)

  • price_eur_mwh: Day-ahead electricity market price in EUR per megawatt-hour (hourly resolution)

Demand (Grid-Level Consumption)

  • load_actual_mw: Total electricity consumption across the entire DE-LU zone (households, businesses, industry)

Generation by Fuel Type (Actual Production)

Production columns (*_actual_aggregated_mw): Electricity generated and injected into the grid by all production units of that type:

  • gen_actual_biomass_actual_aggregated_mw: Biomass generation
  • gen_actual_fossil_brown_coal_lignite_actual_aggregated_mw: Lignite (brown coal) generation
  • gen_actual_fossil_hard_coal_actual_aggregated_mw: Hard coal generation
  • gen_actual_fossil_coal_derived_gas_actual_aggregated_mw: Coal-derived gas generation
  • gen_actual_fossil_gas_actual_aggregated_mw: Natural gas generation
  • gen_actual_fossil_oil_actual_aggregated_mw: Oil-based generation
  • gen_actual_nuclear_actual_aggregated_mw: Nuclear power generation
  • gen_actual_solar_actual_aggregated_mw: Solar photovoltaic generation
  • gen_actual_wind_onshore_actual_aggregated_mw: Onshore wind generation
  • gen_actual_wind_offshore_actual_aggregated_mw: Offshore wind generation
  • gen_actual_hydro_run_of_river_and_poundage_actual_aggregated_mw: Run-of-river hydroelectric generation
  • gen_actual_hydro_water_reservoir_actual_aggregated_mw: Reservoir hydroelectric generation
  • gen_actual_hydro_pumped_storage_actual_aggregated_mw: Pumped storage generation (when releasing stored energy)
  • gen_actual_geothermal_actual_aggregated_mw: Geothermal generation
  • gen_actual_waste_actual_aggregated_mw: Waste-to-energy generation
  • gen_actual_other_actual_aggregated_mw: Other generation sources
  • gen_actual_other_renewable_actual_aggregated_mw: Other renewable sources

Consumption columns (*_actual_consumption_mw): Electricity drawn by generation facilities themselves, either for auxiliary operations (parasitic load) or, for pumped storage, to pump water:

  • gen_actual_hydro_pumped_storage_actual_consumption_mw: Power used to pump water uphill (storing energy for later release)
  • gen_actual_solar_actual_consumption_mw: Auxiliary power for solar farm operations (cooling, controls, monitoring)
  • gen_actual_wind_onshore_actual_consumption_mw: Power for wind farm systems (yaw motors, heating, blade de-icing, controls)
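
The pumped-storage columns come as a generation/consumption pair, and the file mixes hourly and 15-minute series. The following is a minimal sketch, using synthetic values and assuming the CSV is loaded into a DataFrame with a DatetimeIndex, of how one might compute a net pumped-storage series and average it to hourly resolution to match the price column. The column name pumped_storage_net_mw is introduced here for illustration and is not part of the dataset.

```python
import pandas as pd

# Synthetic 15-minute frame standing in for two columns of entsoe_data_2023.csv
idx = pd.date_range("2023-01-01 00:00", periods=8, freq="15min")
df = pd.DataFrame(
    {
        "gen_actual_hydro_pumped_storage_actual_aggregated_mw": [100.0, 120.0, 80.0, 100.0, 0.0, 0.0, 0.0, 0.0],
        "gen_actual_hydro_pumped_storage_actual_consumption_mw": [0.0, 0.0, 0.0, 0.0, 200.0, 220.0, 180.0, 200.0],
    },
    index=idx,
)

# Net output: positive when releasing stored energy, negative when pumping
df["pumped_storage_net_mw"] = (
    df["gen_actual_hydro_pumped_storage_actual_aggregated_mw"]
    - df["gen_actual_hydro_pumped_storage_actual_consumption_mw"]
)

# Average the 15-minute values within each hour to align with hourly prices
hourly = df.resample("1h").mean()
print(hourly["pumped_storage_net_mw"])
```

Averaging is appropriate for power values in MW; summing would only make sense after converting to energy (MWh).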

Cross-Border Electricity Flows

⚠️ IMPORTANT LIMITATION: Current data contains exports only (Germany→neighbors). Imports are NOT included, which is a significant gap since Germany frequently imports electricity from France and other neighbors.

Cross-border flows represent electricity exports FROM Germany-Luxembourg TO neighboring countries:

  • flow_fr_mw: Exports to France (MW) - Note: Germany often imports FROM France (not captured)
  • flow_nl_mw: Exports to Netherlands (MW)
  • flow_pl_mw: Exports to Poland (MW)
  • flow_cz_mw: Exports to Czech Republic (MW)
  • flow_at_mw: Exports to Austria (MW)
  • flow_dk1_mw: Exports to Denmark West (MW)
  • flow_dk2_mw: Exports to Denmark East (MW)
  • flow_se4_mw: Exports to Sweden SE4 (MW)
  • flow_ch_mw: Exports to Switzerland (MW)

Current Data Interpretation:

  • Values represent electricity exported FROM Germany-Luxembourg TO each neighbor
  • All values are ≥0 (no negative values)
  • Low/zero values may indicate either no export OR reverse flow (import) - you can't tell which!

To get complete bidirectional flows, query both directions and calculate net flow:

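# Assumes `client` is an entsoe-py EntsoePandasClient and `start`/`end` are timezone-aware pandas Timestamps
export_flow = client.query_crossborder_flows(DE_LU, neighbor, start=start, end=end)  # Germany → Neighbor
import_flow = client.query_crossborder_flows(neighbor, DE_LU, start=start, end=end)  # Neighbor → Germany
net_flow = export_flow - import_flow  # positive=net export, negative=net import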

Example: For France in June 2023:

  • flow_fr_mw (export) averages ~140 MW (small)
  • Missing import flow averages ~1,887 MW (large)
  • Germany was a net importer from France (not visible in current data)
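
Once both directions are available, the net-flow calculation can be sketched offline with pandas. The values below are synthetic and only illustrate the sign convention; treating a missing direction as zero flow is an assumption of this sketch, not something the dataset guarantees.

```python
import pandas as pd

idx = pd.date_range("2023-06-01 00:00", periods=4, freq="1h")

# Synthetic values for illustration; real series would come from the two directional queries
export_flow = pd.Series([150.0, 0.0, 300.0, 100.0], index=idx)    # DE-LU -> FR
import_flow = pd.Series([1800.0, 2000.0, 0.0, 100.0], index=idx)  # FR -> DE-LU

# Align on timestamps; a timestamp missing from one direction is treated as zero flow
net_flow = export_flow.sub(import_flow, fill_value=0.0)

# True where DE-LU is a net importer in that hour
net_importer = net_flow < 0
print(net_flow)
```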

Latent Weather Dataset (latent_states_enabling-muskox.zarr)

Input Features: Atmospheric Forecast Latent States (Given a weather forecast valid at time t...)

Atmospheric forecast latent states (64 channels, 3 compressed pressure levels, 90 compressed latitudes, 180 compressed longitudes) covering all of 2023.

Note: The latent inputs have global coverage (not just Europe).

What are these?

  • Pre-computed compressed representations of weather forecasts from the Jua EPT2 model
  • Each latent state represents a weather forecast: given the initial weather state at time t₀, the EPT2 model predicts the weather state at time t = t₀ + leadtime, and this prediction is encoded using a Variational Auto-Encoder (VAE)
  • The timestamps in the dataset represent the valid time (t₀ + leadtime), not the initialization time
  • Example: A latent state with timestamp "2023-01-01 18:00:00" represents a weather forecast valid at 6 PM on January 1st
  • Capture patterns relevant to renewable energy generation (wind speeds, solar radiation, temperature, pressure, etc.)
  • Storage location: s3://entsoe-datasets/datasets/latent_states_enabling-muskox.zarr (requires R2 credentials - see "How to Run" section)

Zarr Dataset Structure:

Dimensions: (time_idx: 10093, channel: 64, level: 3, y: 90, x: 180)
Data variables:
  - latent_states_input   (time_idx, channel, level, y, x) float16  - EPT2 forecasted weather encoded by VAE
  - latent_states_target  (time_idx, channel, level, y, x) float16  - Actual weather state at target time (DO NOT USE)
  - timestamps            (time_idx) datetime64[ns]                  - Forecast initialization times
  - leadtimes             (time_idx) float32                         - Forecast lead times in normalized hours (N/12)
  - running_mean          (channel) float32                          - Channel-wise normalization means
  - running_variance      (channel) float32                          - Channel-wise normalization variances
  - z_original_surf       (time_idx) float32                         - Surface geopotential height
  - z_original_atmos      (time_idx) float32                         - Atmospheric reference height

  • latent_states_input: EPT2 weather predictions encoded by VAE (this is what you should use)
  • latent_states_target: Actual atmospheric state at the forecast target time (NOT for use in this task - included only for reference)
  • 64 channels: Learned atmospheric features from VAE encoder
  • 3 levels: Vertical atmospheric layers (surface, mid-level, upper level)
  • 90 × 180 grid: Global spatial coverage at reduced resolution (approximately 2° × 2°)
  • 10,093 timesteps: Hourly data throughout 2023 (note: 1,440 samples are repeated timestamps; the dataset class handles deduplication automatically)
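
The dataset class handles repeated timestamps for you, but if you read the Zarr store directly you need to deduplicate yourself. The sketch below uses a small synthetic timestamp array and keeps the first occurrence of each timestamp; that rule is an assumption, and the provided class may choose differently.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the `timestamps` variable: hourly stamps with two repeats
timestamps = pd.to_datetime(
    ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 01:00",
     "2023-01-01 02:00", "2023-01-01 00:00"]
)

# Boolean mask that is True for the first occurrence of each timestamp
keep_mask = ~pd.Index(timestamps).duplicated(keep="first")

# Integer positions along time_idx that survive deduplication
keep_idx = np.flatnonzero(keep_mask)
unique_timestamps = timestamps[keep_idx]
print(keep_idx, list(unique_timestamps))
```

With an opened xarray dataset, positions like keep_idx could be passed to isel(time_idx=keep_idx) to select the deduplicated samples.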

Sample Dataset and Code

PyTorch Dataset Example

The provided latent_price_dataset.py demonstrates how to combine these two data sources:

Input: Single latent weather forecast state (already valid at timestamp + leadtime)
Output: Corresponding ENTSO-E variable(s) at the same valid time

Key Concept: The latent states are already forecast states from the model. Each latent state represents a weather forecast valid at a specific time (initialization time + leadtime). The dataset simply maps each latent state to the corresponding ENTSO-E variable at that same time.

Usage Example

from latent_price_dataset import LatentForecastDataset

# Single target variable (e.g., electricity price)
dataset = LatentForecastDataset(
    latent_zarr_path="s3://entsoe-datasets/datasets/latent_states_enabling-muskox.zarr",
    entsoe_csv_path="entsoe_data_2023.csv",
    target_variable="price_eur_mwh",  # Can be any column from the CSV
    start_date="2023-01-01",
    end_date="2023-12-25",
    channels=None,  # Use all 64 channels (or select subset like [0, 1, 2])
    normalize_target=True
)

# Get a sample
latent_input, target_value, metadata = dataset[0]
print(f"Latent input shape: {latent_input.shape}")   # (64, 3, 90, 180) - single state
print(f"Target value shape: {target_value.shape}")   # () - scalar
print(f"Target value: {target_value.item():.2f}")    # e.g., -0.45 (normalized)
print(f"Timestamp: {metadata['timestamp']}")         # e.g., 2023-01-01 06:00:00

# Multiple target variables (e.g., price + solar + load)
dataset_multi = LatentForecastDataset(
    latent_zarr_path="s3://entsoe-datasets/datasets/latent_states_enabling-muskox.zarr",
    entsoe_csv_path="entsoe_data_2023.csv",
    target_variable=["price_eur_mwh", "gen_actual_solar_actual_aggregated_mw", "load_actual_mw"],
    start_date="2023-01-01",
    end_date="2023-12-25",
    channels=None,
    normalize_target=True
)

latent_input, target_values, metadata = dataset_multi[0]
print(f"Latent input shape: {latent_input.shape}")   # (64, 3, 90, 180) - single state
print(f"Target values shape: {target_values.shape}") # (3,) - one value per target

Note: LatentPriceDataset is still available for backward compatibility but is deprecated. Use LatentForecastDataset instead.

Dataset Design Philosophy

The dataset implements a direct mapping approach: each sample consists of a single latent weather forecast state and its corresponding ENTSO-E value at the same valid time. This design reflects the fact that:

  1. Latent states are already forecasts: Each latent state from the EPT2 model represents a weather prediction valid at a specific time (initialization + leadtime)
  2. Simple, clean I/O: One weather state → one market value makes training straightforward
  3. Flexible composition: You can create sequences or forecast horizons in your model architecture, not in the dataset

Shape Reference:

  • Single target: latent (C, L, H, W) → target () scalar value
  • Multi-target: latent (C, L, H, W) → target (N,) one value per target

where C=channels, L=3 levels, H=90 height, W=180 width, N=num_targets
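
To make these shapes concrete, here is a minimal baseline sketch in NumPy: average each channel over levels and the spatial grid, then fit ordinary least squares on the pooled features. The data is random and the grid is shrunk to keep the example light. Global pooling discards all spatial information, including where Europe sits in the grid, so treat this as a starting point rather than a recommended model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic batch shaped like stacked dataset samples: (batch, C, L, H, W)
# Real samples are (64, 3, 90, 180); smaller C/H/W are used here for speed
latents = rng.standard_normal((32, 8, 3, 9, 18)).astype(np.float32)
targets = rng.standard_normal(32).astype(np.float32)

# Average over levels and the spatial grid -> one feature per channel
features = latents.mean(axis=(2, 3, 4))  # (batch, C)

# Ordinary least squares with an appended bias column
X = np.hstack([features, np.ones((features.shape[0], 1), dtype=features.dtype)])
coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
predictions = X @ coef  # (batch,)

print(features.shape, coef.shape, predictions.shape)
```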

What the dataset does:

  1. Loads and aligns data sources:
# From __init__ method
self.ds = xr.open_zarr(latent_zarr_path)  # Load weather latents
self.entsoe_df = pd.read_csv(entsoe_csv_path, index_col=0, parse_dates=True)  # Load ENTSO-E data (assumes timestamps are in the first column)
self.entsoe_df = self.entsoe_df.dropna(subset=self.target_variables)  # Keep only complete data

# Match latent timestamps with ENTSO-E timestamps
valid_entsoe_times = set(self.entsoe_df.index)
entsoe_mask = np.array([ts in valid_entsoe_times for ts in self.latent_timestamps])
self.latent_timestamps = self.latent_timestamps[entsoe_mask]  # Keep only matching times
  2. Returns a single latent state and the corresponding target value(s):
# From __getitem__ method
timestamp = self.latent_timestamps[idx]
zarr_idx = self.latent_time_indices[idx]

# Load single latent state at this timestamp
latent_data = self.ds.latent_states_input.isel(
    time_idx=zarr_idx,
    channel=self.channels,
).values  # Shape: (channels, levels, height, width)

# Get target value(s) at the same timestamp
target_data = self.entsoe_df.loc[timestamp, self.target_variables].values

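# Normalize if requested (per-variable statistics, in target order)
if self.normalize_target:
    means = np.array([self.target_stats[v]["mean"] for v in self.target_variables])
    stds = np.array([self.target_stats[v]["std"] for v in self.target_variables])
    target_data = (target_data - means) / stds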
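
When targets are normalized, model outputs have to be mapped back to physical units before they can be read as prices or MW. The sketch below inverts the (x - mean) / std transform. The statistics are made-up numbers arranged in the same layout as target_stats in the excerpt, and denormalize is a hypothetical helper, not a function provided by latent_price_dataset.py.

```python
import numpy as np

# Hypothetical statistics in the same layout as `target_stats` (values are made up)
target_stats = {
    "price_eur_mwh": {"mean": 95.0, "std": 50.0},
    "load_actual_mw": {"mean": 52000.0, "std": 9000.0},
}
target_variables = ["price_eur_mwh", "load_actual_mw"]


def denormalize(values, variables, stats):
    """Invert (x - mean) / std for each target variable, in order."""
    means = np.array([stats[v]["mean"] for v in variables])
    stds = np.array([stats[v]["std"] for v in variables])
    return np.asarray(values) * stds + means


normalized_prediction = np.array([-0.5, 1.0])
restored = denormalize(normalized_prediction, target_variables, target_stats)
print(restored)
```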

Key Features:

  • Flexible target variables: Use any column(s) from the ENTSO-E CSV as prediction targets
  • Single or multi-target: Predict one variable or multiple variables simultaneously
  • Direct time matching: Each latent state maps to its corresponding ENTSO-E value at the same time
  • Clean I/O: Single latent state in → single (or multi) value out
  • Optional normalization for stable training (per-variable statistics)
  • Returns scalar for single target, 1D array for multiple targets

Code Structure

  • latent_price_dataset.py: PyTorch Dataset class that loads latent weather forecasts and aligns them with any ENTSO-E target variable(s)
  • test_dataset_integration.py: Comprehensive integration tests for the dataset (works with synthetic data, no credentials needed)
  • test_dataset_refactor.py: Unit tests for dataset logic and functionality
  • load_entsoe_data.py: Script to download and plot historical ENTSO-E market data (generation, demand, flows, prices) via API
    • ⚠️ Data has already been downloaded for you, but feel free to look at the code or download more if needed
  • plot_entsoe_data.py: Visualization functions for market data overview

Running Tests

Test the dataset implementation without requiring cloud credentials:

uv run python test_dataset_integration.py

This runs comprehensive tests including:

  • Single and multiple target variables
  • Normalized and non-normalized outputs
  • Backward compatibility checks
  • Edge cases and robustness
  • Optional real data test (if credentials available)

Important Notes

⚠️ This is a SAMPLE DATASET for demonstration purposes only

  • The latent states are FORECASTED atmospheric conditions, not historical observations
  • This dataset shows one possible approach: using weather forecasts to predict price movements
  • The weather-price relationship is particularly relevant for Germany with high renewable penetration

🔓 Participants are encouraged to extend this dataset

You should feel free to:

  • Add HISTORICAL data from ENTSO-E (past generation, demand, prices, flows)
  • Incorporate time-based features (hour of day, day of week, seasonality)
  • Use lagged price values (autoregressive features)
  • Add fuel prices, carbon prices, or other market indicators
  • Engineer features from cross-border flow patterns
  • Combine multiple data sources in creative ways
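
As one possible starting point for the time-based and autoregressive ideas above, the sketch below adds calendar features and a 24-hour price lag with pandas. The price values are synthetic, the feature column names are illustrative, and shift(24) assumes a gap-free hourly index. Before using any lag, check that the lagged value would actually be known at the time the forecast is made.

```python
import numpy as np
import pandas as pd

# Synthetic hourly prices standing in for the price_eur_mwh column
idx = pd.date_range("2023-01-01 00:00", periods=72, freq="1h")
df = pd.DataFrame({"price_eur_mwh": np.arange(72, dtype=float)}, index=idx)

# Calendar features
df["hour"] = df.index.hour
df["day_of_week"] = df.index.dayofweek  # Monday=0, Sunday=6

# Cyclical encoding so that hour 23 sits next to hour 0
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)

# Lagged price: value from 24 rows (hours) earlier
df["price_lag_24h"] = df["price_eur_mwh"].shift(24)

# Drop the first day, which has no lagged value
features = df.dropna()
print(features.head())
```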

The provided code in load_entsoe_data.py shows how to fetch historical market data from the ENTSO-E API.


Time Period

Training data covers: January 2023 - December 2023


How to Run

Prerequisites

  1. Set up Cloudflare R2 credentials (required to access the latent states dataset):

    export R2_ACCESS_KEY=your_access_key_here
    export R2_SECRET_KEY=your_secret_key_here
  2. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh

    or

    wget -qO- https://astral.sh/uv/install.sh | sh
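
If you want to open the Zarr store yourself rather than through the provided dataset class, the exported credentials have to reach the S3 client somehow. The following is a sketch under assumptions: it builds an s3fs-style options dictionary from the two variables above, and R2_ENDPOINT_URL is a hypothetical variable name introduced here because this README does not state the endpoint. The provided code may handle credentials differently.

```python
import os


def r2_storage_options(env=None):
    """Build s3fs-style storage options from environment variables."""
    env = os.environ if env is None else env
    options = {
        "key": env["R2_ACCESS_KEY"],
        "secret": env["R2_SECRET_KEY"],
    }
    # R2_ENDPOINT_URL is a hypothetical variable name, not defined by this README
    endpoint = env.get("R2_ENDPOINT_URL")
    if endpoint:
        options["client_kwargs"] = {"endpoint_url": endpoint}
    return options


# Example with placeholder values instead of real credentials
example = r2_storage_options(
    {"R2_ACCESS_KEY": "your_access_key_here", "R2_SECRET_KEY": "your_secret_key_here"}
)
print(sorted(example))
```

A dictionary like this could then be passed as storage_options to xr.open_zarr together with the s3:// path.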

Running the Dataset Demo

Run the provided dataset example to load and visualize the latent-price dataset:

uv run python3 latent_price_dataset.py

This will:

  • Load the latent weather states from S3/R2
  • Load the ENTSO-E price data from CSV
  • Create PyTorch datasets (single and multi-target examples)
  • Print dataset statistics and sample information
  • Demonstrate both single-target and multi-target usage

Note: The demo shows how latent weather forecasts at specific times map to corresponding electricity market variables at those same times.

Alternative: Using Raw Weather Data

If you prefer to work with raw weather forecast data instead of the pre-computed latent states, you can access:

🌍 NOAA GFS (Global Forecast System) - Real-time global weather forecasts

  • Source: dynamical.org
  • Coverage: Global, 0.25° resolution (~20km)
  • Variables: Temperature, wind, precipitation, pressure, humidity, radiation, and more
  • Forecast horizon: 0-384 hours (16 days)

🌍 ERA5 ARCO (ECMWF Reanalysis) - Historical weather reanalysis

  • Source: Google Cloud Public Datasets
  • Coverage: Global, 0.25° resolution
  • Variables: 100+ atmospheric variables at multiple pressure levels
  • Time range: 1940-present, hourly resolution

Code Examples

Loading GFS forecast data for 2023:

import xarray as xr

# Open GFS forecast dataset
ds_gfs = xr.open_zarr(
    "https://data.dynamical.org/noaa/gfs/forecast/[email protected]",
    chunks="auto"
)

# Filter to 2023 date range
ds_gfs_2023 = ds_gfs.sel(init_time=slice("2023-01-01", "2023-12-31"))

print(f"Available variables: {list(ds_gfs_2023.data_vars)}")
# Access specific variables like temperature_2m, wind_u_10m, precipitation_surface, etc.

Loading ERA5 ARCO data for 2023:

import xarray as xr

# Open ERA5 ARCO dataset (surface variables plus variables on 37 pressure levels)
ds_era5 = xr.open_zarr(
    "gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3",
    chunks="auto",
    storage_options={"token": "anon"}  # Anonymous access
)

# Filter to 2023 date range
ds_era5_2023 = ds_era5.sel(time=slice("2023-01-01", "2023-12-31"))

print(f"Available variables: {list(ds_era5_2023.data_vars)}")
# Access variables like wind components, temperature, pressure, etc.

These raw weather datasets can be combined with ENTSO-E market data to create custom features for price forecasting.
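
One way such a combination might look, sketched with synthetic data in pandas: compute each forecast's valid time as initialization time plus lead time, keep one forecast per valid time, and join it to prices on the timestamp. The wind_speed_10m feature and all values are hypothetical, and both tables are assumed to use the same time zone.

```python
import pandas as pd

# Synthetic forecast table: one row per (init_time, lead time) pair
forecast = pd.DataFrame(
    {
        "init_time": pd.to_datetime(["2023-01-01 00:00"] * 3 + ["2023-01-01 06:00"] * 3),
        "lead_hours": [6, 7, 8, 0, 1, 2],
        "wind_speed_10m": [5.0, 5.5, 6.0, 4.8, 5.6, 6.3],  # hypothetical feature
    }
)
forecast["valid_time"] = forecast["init_time"] + pd.to_timedelta(forecast["lead_hours"], unit="h")

# For each valid time, keep the forecast from the most recent initialization
latest = (
    forecast.sort_values("init_time")
    .groupby("valid_time")
    .tail(1)
    .set_index("valid_time")
    .sort_index()
)

# Synthetic prices indexed by the same valid times
prices = pd.DataFrame(
    {"price_eur_mwh": [90.0, 85.0, 80.0]},
    index=pd.to_datetime(["2023-01-01 06:00", "2023-01-01 07:00", "2023-01-01 08:00"]),
)

joined = prices.join(latest[["wind_speed_10m"]], how="inner")
print(joined)
```

Picking the most recent initialization is convenient for exploration, but for a realistic day-ahead setup you would restrict to forecasts initialized before the time the price prediction has to be made.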
