git clone https://github.com/Manuel3567/master-thesis.git
cd master-thesis
python -m venv .venv
.venv\Scripts\activate
If you are using a CUDA-compatible GPU:
pip install -r requirements_gpu.txt
Otherwise:
pip install -r requirements.txt
pip install -e .
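To quickly verify that the editable install worked, a minimal import check can be run (a sketch; it only assumes the package is importable as `analysis`, which is how it is imported throughout this README):

```python
# Minimal sanity check after `pip install -e .`:
# the submodules used in this README should be importable.
import importlib

for module in ["analysis.downloads", "analysis.datasets", "analysis.preprocessor"]:
    importlib.import_module(module)
    print(f"{module} imported successfully")
```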
Four data sources are used. Sources 1 to 3 are downloaded manually via the links provided below; the files then have to be saved in a specific directory structure (details below). Source 4 is downloaded by running the code below in a Jupyter notebook.
The data sources need to be downloaded into a data folder in the following order:
- entsoe
- netztransparenz
- opendatasoft
- open meteo
project_root/
└── data/
    ├── entsoe/            # Raw aggregated wind power data
    ├── netztransparenz/   # Wind park data (EEG payments, master data)
    ├── opendatasoft/      # PLZ list of Germany
    └── open_meteo/        # Wind speed data
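The folder skeleton can be created up front, for example with a short sketch like this (folder names taken from the tree above; run from the project root):

```python
# Create the expected data directory skeleton (folder names from the tree above).
from pathlib import Path

for source in ["entsoe", "netztransparenz", "opendatasoft", "open_meteo"]:
    Path("data", source).mkdir(parents=True, exist_ok=True)
```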
- entsoe
  - login > Export Data > "Actual Generation per Production Type (year, CSV)"
- netztransparenz
  - download the EEG data (make sure that underscores are used as separators instead of spaces in the file name)
  - download the legend
- opendatasoft
  - Flat file formats > CSV
  - download the PLZ list
- open_meteo (start and end dates have to be set)
  - run this code in a new Jupyter notebook:

from analysis.downloads import download_open_meteo_wind_speeds_of_10_biggest_wind_park_locations_and_at_geographic_mean

download_open_meteo_wind_speeds_of_10_biggest_wind_park_locations_and_at_geographic_mean("2016-01-01", "2024-12-31")
project_root/
└── data/
    ├── entsoe/
    │   ├── Actual Generation per Production Type_201601010000-201701010000.csv
    │   ├── Actual Generation per Production Type_201701010000-201801010000.csv
    │   ├── ...
    │   └── Actual Generation per Production Type_202401010000-202501010000.csv
    ├── netztransparenz/
    │   ├── 50Hertz_Transmission_GmbH_EEG-Zahlungen_Stammdaten_2023.csv
    │   └── anlagenstammdaten_legende.xslx
    ├── open_meteo/
    │   └── historical/
    │       └── top_10_biggest_wind_parks_50hertz.json
    └── opendatasoft/
        └── georef-germany-postleitzahl.csv
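An optional sanity check that the manually downloaded files ended up in the expected places could look like this (paths taken from the tree above; the yearly ENTSO-E CSVs are only counted, not listed individually):

```python
# Optional check: report whether the expected input files are present
# (paths taken from the directory tree above).
from pathlib import Path

data = Path("data")
fixed_files = [
    data / "netztransparenz" / "50Hertz_Transmission_GmbH_EEG-Zahlungen_Stammdaten_2023.csv",
    data / "open_meteo" / "historical" / "top_10_biggest_wind_parks_50hertz.json",
    data / "opendatasoft" / "georef-germany-postleitzahl.csv",
]
for path in fixed_files:
    print(f"{'OK' if path.exists() else 'MISSING'}  {path}")

# one ENTSO-E export per year (2016 through 2024) is expected
n_entsoe = len(list((data / "entsoe").glob("Actual Generation per Production Type_*.csv")))
print(f"entsoe CSV files found: {n_entsoe} (expected 9)")
```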
Specify (create) a root results directory and replace output_dir below with your own path. For each of the three models, a subfolder with the name of the model is created automatically if it does not yet exist.
Run in a Jupyter notebook:

output_dir = r"C:\Users\Manuel\Documents\results"  # raw string so the backslashes are not treated as escape sequences
id = 1
from analysis.baseline_model import *
run_baseline_model(id, output_dir)

The output of this is written to "C:\Users\Manuel\Documents\results\baseline_model".
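The run stores its results as pickle files (see the output structure further below). A hedged sketch for loading one result for inspection, assuming standard pickle serialization and the baseline/experiment_1.pkl layout shown in that structure:

```python
# Load one baseline result for inspection (assumes standard pickle serialization
# and the baseline/experiment_1.pkl layout shown in the output structure below).
import pickle
from pathlib import Path

result_path = Path(r"C:\Users\Manuel\Documents\results") / "baseline" / "experiment_1.pkl"
with open(result_path, "rb") as f:
    result = pickle.load(f)
print(type(result))
```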
Run in a Jupyter notebook
from analysis.ngboost import *
from analysis.datasets import load_entsoe
from ngboost.distns import Normal  # distribution used for dist=Normal below, in case it is not re-exported by analysis.ngboost

output_dir = r"C:\Users\Manuel\Documents\results"
entsoe = load_entsoe()
evaluate_ngboost_model(
entsoe,
target_column='power',
dist=Normal,
case=1,
n_estimators=100,
learning_rate=0.03,
random_state=42,
output_file=output_dir,
train_start = "2016-01-01",
train_end = "2022-12-31",
validation_start = "2023-01-01",
validation_end = "2023-12-31"
)
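evaluate_ngboost_model writes its results as Excel files (see the output structure below). A small sketch for reading one of them back with pandas, assuming the ngboost/full_year/case1.xlsx layout from that structure:

```python
# Read one NGBoost result file back into a DataFrame
# (assumes the ngboost/full_year/case1.xlsx layout shown in the output structure below).
import pandas as pd
from pathlib import Path

ngboost_dir = Path(r"C:\Users\Manuel\Documents\results") / "ngboost"
results = pd.read_excel(ngboost_dir / "full_year" / "case1.xlsx")
print(results.head())
```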
id = 1
output_dir = r"C:\Users\Manuel\Documents\results"
from analysis.TabPFN import *
run_tabpfn(id, output_dir)
id = 1
output_dir = r"C:\Users\Manuel\Documents\results"
from analysis.baseline_model import *
calculate_scores_baseline(id, output_dir)
For NGBoost, the evaluation is already included in evaluate_ngboost_model().
id = 1
output_dir = r"C:\Users\Manuel\Documents\results"
from analysis.tabpfn import *
calculate_scores_tabpfn(id, output_dir)
output_dir/
├── baseline/
│   ├── experiment_1.pkl
│   ├── experiment_2.pkl
│   ├── ...
│   └── quantiles/
│       ├── experiment_results_1.pkl
│       ├── experiment_results_2.pkl
│       └── ...
├── ngboost/
│   ├── full_year/
│   │   ├── case1.xlsx
│   │   ├── case2.xlsx
│   │   ├── ...
│   │   └── Merged_sheet.xlsx
│   └── q4_train/
│       ├── case1.xlsx
│       ├── case2.xlsx
│       ├── ...
│       └── Merged_sheet.xlsx
└── tabpfn/
    ├── experiment_1.pkl
    ├── experiment_2.pkl
    ├── ...
    └── quantiles/
        ├── experiment_results_1.pkl
        ├── experiment_results_2.pkl
        └── ...
from analysis.datasets import get_coordinates_of_grid_operators
from analysis.plots import plot_installed_capacity_scatter
aggregated_df = get_coordinates_of_grid_operators()
plot_installed_capacity_scatter(aggregated_df)
from analysis.preprocessor import *
import matplotlib.pyplot as plt
import seaborn as sns
preprocessor = DataPreprocessor()
preprocessor.load_data()
entsoe = preprocessor.df
plt.figure(figsize=(6, 4))
# Plot histogram with a kernel density estimate
sns.histplot(entsoe['power'], bins=150, color="lightblue", kde=True, label="power")
plt.title("Marginal Distribution of Power")
plt.xlabel("Power [MW]")
plt.ylabel("Frequency")
plt.legend()
plt.show()
from analysis.preprocessor import *
import matplotlib.pyplot as plt
import seaborn as sns
preprocessor = DataPreprocessor()
preprocessor.load_data()
preprocessor = preprocessor.transform_power()
entsoe = preprocessor.df
plt.figure(figsize=(6, 4))
sns.histplot(entsoe['power'], bins=150, color="lightblue", label="log-transformed power")
plt.title("Marginal Distribution of Log Power")
plt.xlabel("ln(Power/Power_max + eps)")
plt.ylabel("Frequency")
plt.legend()
plt.show()
| Method | ID | Features | Split |
|---|---|---|---|
| Baseline | 1 | power, mean ws | Q1 2022 / Q1 2023 |
| Baseline | 23 | p_t-96 | Q1 2022 / Q1 2023 |
| Baseline | 24 | p_t-96 | Q4 2022 / FY 2023 |
| Baseline | 25 | 2 mean, power_t-96 | Q4 2022 / FY 2023 |
| Baseline | 26 | ws 10 loc 10m + 100, P_t-96 | Q4 2022 / FY 2023 |
| Baseline | 27 | 2 mean, ws 10 loc 10m + 100, P_t-96 | Q4 2022 / FY 2023 |
| Baseline | 28 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index | Q4 2022 / FY 2023 |
| Baseline | 29 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index | 2016-2022 / FY 2023 |
| Baseline | 30 | 2 mean, ws 10 loc 10m + 100, P_t-96 | 2016-2022 / FY 2023 |
| Baseline | 31 | ws 10 loc 10m + 100, P_t-96 | 2016-2022 / FY 2023 |
| Baseline | 32 | 2 mean, power_t-96 | 2016-2022 / FY 2023 |
| Baseline | 33 | p_t-96 | 2016-2022 / FY 2023 |
The same IDs are also used for the 2016-2022 training split.
| Method | ID | Features | Split |
|---|---|---|---|
| NGBoost | 1 | p_t-96, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 2 | p_t-96, Loss function = LogScore | Q4 2022 / FY 2023 |
| NGBoost | 3 | 2 mean ws, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 4 | 2 mean ws, Loss function = LogScore | Q4 2022 / FY 2023 |
| NGBoost | 5 | 2 mean, power_t-96, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 6 | 2 mean, power_t-96, Loss function = LogScore | Q4 2022 / FY 2023 |
| NGBoost | 7 | ws 10 loc 10m + 100, P_t-96, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 8 | ws 10 loc 10m + 100, P_t-96, Loss function = LogScore | Q4 2022 / FY 2023 |
| NGBoost | 9 | 2 mean, ws 10 loc 10m + 100, P_t-96, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 10 | 2 mean, ws 10 loc 10m + 100, P_t-96, Loss function = LogScore | Q4 2022 / FY 2023 |
| NGBoost | 11 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 12 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index, Loss function = LogScore | Q4 2022 / FY 2023 |
| NGBoost | 13 | P_t-96, time index, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 14 | P_t-96, time index, Loss function = LogScore | Q4 2022 / FY 2023 |
| NGBoost | 15 | 2 mean, time index, Loss function = CRPScore | Q4 2022 / FY 2023 |
| NGBoost | 16 | 2 mean, time index, Loss function = LogScore | Q4 2022 / FY 2023 |
TabPFN
| ID | Features | Split |
|---|---|---|
| 1 | P(t-96), 2 mean ws | Q1 2022 / Q1 2023 |
| 2 | P(t-96), 2 mean ws | Q2 2022 / Q2 2023 |
| 3 | P(t-96), 2 mean ws | Q3 2022 / Q3 2023 |
| 4 | P(t-96), 2 mean ws | Q4 2022 / Q4 2023 |
| 5 | P(t-96), 2 mean ws | Q4 2022 / Q1 2023 |
| 6 | P(t-96), 2 mean ws | Q4 2022 / Q2 2023 |
| 7 | P(t-96), 2 mean ws | Q4 2022 / Q3 2023 |
| 8 | P(t-96), 10 ws | Q4 2022 / H1 2023 |
| 9 | P(t-96), 10 ws | Q4 2022 / H2 2023 |
| 10 | P(t-96), 2 mean+10 ws | Q4 2022 / H1 2023 |
| 11 | P(t-96), 2 mean+10 ws | Q4 2022 / H2 2023 |
| 12 | (all) | Q4 2022 / H1 2023 |
| 13 | (all) | Q4 2022 / H2 2023 |
| 14 | P(t-96), 2 mean ws | Q1 2022 / Q2 2023 |
| 15 | P(t-96), 2 mean ws | Q3 2022 / Q2 2023 |
| 16 | P(t-96), 2 mean ws | Q1 2022 / Q4 2023 |
| 17 | P(t-96), 2 mean ws | Q1 2022 / Q3 2023 |
| 18 | P(t-96), 2 mean ws | Q2 2022 / Q1 2023 |
| 19 | P(t-96), 2 mean ws | 2022-08-01 - 2022-12-31 / FY 2023 |
| 20 | P(t-96), 2 mean ws | Q2 2022 / Q4 2023 |
| 21 | P(t-96), 2 mean ws | Q3 2022 / Q1 2023 |
| 22 | P(t-96), 2 mean ws | Q3 2022 / Q4 2023 |
| 34 | P(t-96), 2 mean ws | 2022-09-01 - 2022-12-31 / Q1 2023 |
| 35 | P(t-96), 2 mean ws | 2022-08-01 - 2022-12-31 / Q1 2023 |
| 36 | P(t-96), 2 mean ws | H2 2022 / Q1 2023 |
| 37 | P(t-96), 2 mean ws | FY 2022 / Q1 2023 |
| 38 | power, all ws, time bin | FY 2022 / Q1 2023 |
| 39 | power, all ws, time bin | FY 2022 / Q2 2023 |
| 40 | power, all ws, time bin | FY 2022 / Q3 2023 |
| 41 | power, all ws, time bin | FY 2022 / Q4 2023 |
| 42 | power, mean ws | FY 2022 / Q1 2023 |
| 43 | power, mean ws | FY 2022 / Q2 2023 |
| 44 | power, mean ws | FY 2022 / Q3 2023 |
| 45 | power, mean ws | FY 2022 / Q4 2023 |
| 46 | power, ws at 10 loc | FY 2022 / Q1 2023 |
| 47 | power, ws at 10 loc | FY 2022 / Q2 2023 |
| 48 | power, ws at 10 loc | FY 2022 / Q3 2023 |
| 49 | power, ws at 10 loc | FY 2022 / Q4 2023 |
| 50 | power, all ws | FY 2022 / Q1 2023 |
| 51 | power, all ws | FY 2022 / Q2 2023 |
| 52 | power, all ws | FY 2022 / Q3 2023 |
| 53 | power, all ws | FY 2022 / Q4 2023 |
| 54 | power | FY 2022 / Q1 2023 |
| 55 | power | FY 2022 / Q2 2023 |
| 56 | power | FY 2022 / Q3 2023 |
| 57 | power | FY 2022 / Q4 2023 |
| 58 | power | Q4 2022 / H1 2023 |
| 59 | power | Q4 2022 / H2 2023 |
The results of the baseline model can be found in the Jupyter notebook "notebooks/022_models.ipynb".
- Markdown "Reproduce Figure 5.1" in 022 models gives the code to produce figure 5.1
- Markdown "Reproduce Table 5.1, 5.2" in 022 models gives the code to produce the entries of tables 5.1 and 5.2. The mapping of table entries to experiment ID is given in the table below:
| Feature | 2016–2022 Training, mean and quantiles | Q4 2022 Training, mean |
|---|---|---|
| Power | 33 | 24 |
| Power, mean ws | 32 | 25 |
| Power, ws at 10 loc | 31 | 26 |
| Power, all ws | 30 | 27 |
| Power, all ws, t-bin | 29 | 28 |
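A short sketch for computing all scores needed for Tables 5.1 and 5.2 in one go, using the IDs from the mapping above and calculate_scores_baseline as introduced earlier:

```python
# Compute baseline scores for every experiment ID listed in the mapping above.
from analysis.baseline_model import *  # provides calculate_scores_baseline

output_dir = r"C:\Users\Manuel\Documents\results"
for experiment_id in [24, 25, 26, 27, 28, 29, 30, 31, 32, 33]:
    calculate_scores_baseline(experiment_id, output_dir)
```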
The results can be obtained by reading the Excel files. It is easier, however, to use the Python code in the Jupyter notebook "notebooks/022_models.ipynb"; be careful to change the pkl_file_path.
- Markdown "Reproduce Figure 5.7" in 022 models gives the code to produce figure 5.7
- Markdown "Reproduce Figure 5.8" in 022 models gives the code to produce figure 5.8
- Markdown "Reproduce Tables 5.7" in 022 models gives the code to produce Tables 5.7 The following mapping table between experiment ID and table entries has been used
| Features \ Loss function | NLL | CRPS |
|---|---|---|
| Power | 2 | 1 |
| Power, mean ws | 6 | 5 |
| Power, ws at 10 loc | 8 | 7 |
| Power, all ws | 10 | 9 |
| Power, all ws, t-bin | 12 | 11 |
- Markdown "Reproduce Tables 5.8, 5.9" in 022 models gives the code to produce Tables 5.8, 5.9
| Feature | 2016–2022 Training, mean and quantiles | Q4 2022 Training, mean |
|---|---|---|
| Power | 2 | 2 |
| Power, mean ws | 6 | 6 |
| Power, ws at 10 loc | 8 | 8 |
| Power, all ws | 10 | 10 |
| Power, all ws, t-bin | 12 | 12 |
- Markdown "reproduce table 5.3" in the same notebook produces table 5.3
- Table 5.4 can be reproduced by running "calculate_scores_tabpfn" with the following IDs; the Mean column is calculated by averaging across each row (see the sketch after the table):
| Train \ Test | 23Q1 | 23Q2 | 23Q3 | 23Q4 | Mean |
|---|---|---|---|---|---|
| 22Q1 | 1 | 14 | 17 | 16 | |
| 22Q2 | 18 | 2 | 19 | 20 | |
| 22Q3 | 21 | 15 | 3 | 22 | |
| 22Q4 | 5 | 6 | 7 | 4 |
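A minimal sketch for looping over all IDs in the grid above with calculate_scores_tabpfn as introduced earlier:

```python
# Compute TabPFN scores for every experiment ID in the train/test grid above.
from analysis.tabpfn import *  # provides calculate_scores_tabpfn

output_dir = r"C:\Users\Manuel\Documents\results"
for experiment_id in [1, 14, 17, 16, 18, 2, 19, 20, 21, 15, 3, 22, 5, 6, 7, 4]:
    calculate_scores_tabpfn(experiment_id, output_dir)
```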
- Tables 5.5 and 5.6 can be reproduced by running "calculate_scores_tabpfn" with the following IDs:
| Feature | 2022 (Mean, 5%, 25%, 75%, 95%) | Q4 2022 Mean |
|---|---|---|
| Power | 54, 55, 56, 57 | 58, 59 |
| Power, mean ws | 42, 43, 44, 45 | 4, 5, 6, 7 |
| Power, ws at 10 loc | 46, 47, 48, 49 | 8, 9 |
| Power, all ws | 50, 51, 52, 53 | 10, 11 |
| Power, all ws, t-bin | 38, 39, 40, 41 | 12, 13 |
- Markdown "Reproduce Figure 5.5" in 022 models gives the code to produce figure 5.5
- Markdown "# Reproduce figure 5.6" in 022 models gives the code to produce figure 5.6
- The CDF and PDF figures in figures 5.2 - 5.4 can be found in the same notebook under the markdowns "reproduce pdf/cdf". Note that the user can specify the sample (0, 500) and whether the figure should be stored.
These can be obtained from the individual model results.