Manuel3567/master-thesis


Setup

1. Clone the Repository

git clone https://github.com/Manuel3567/master-thesis.git
cd master-thesis

2. Set up a Python virtual environment

python -m venv .venv
.venv\Scripts\activate        # Windows
# Linux/macOS: source .venv/bin/activate

3. Install dependencies

If you are using a CUDA-compatible GPU:

pip install -r requirements_gpu.txt

Otherwise:

pip install -r requirements.txt

4. Install the project in editable mode

pip install -e .
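A quick sanity check that the editable install succeeded; a minimal sketch (the package name analysis is taken from the import statements used later in this README):

```python
import importlib.util

# If "pip install -e ." succeeded, the analysis package should be discoverable.
spec = importlib.util.find_spec("analysis")
print("analysis package found:", spec is not None)
```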

Prepare the data structure

Four data sources are used. Sources 1 to 3 are downloaded manually via the links provided below; the files then have to be saved in a specific directory layout (details below). Source 4 is downloaded by running the code below in a Jupyter notebook.

1. entsoe: 2016-2024

2. netztransparenz: EEG, legend

3. opendatasoft: PLZ

4. open_meteo

The data sources need to be downloaded into a data folder in the following order:

  1. entsoe
  2. netztransparenz
  3. opendatasoft
  4. open_meteo

project_root/
├── data/
│   ├── entsoe/               # Raw aggregated wind power data
│   ├── netztransparenz/      # Wind park master data (EEG, legend)
│   ├── opendatasoft/         # PLZ (postal code) list of Germany
│   └── open_meteo/           # Wind speed data
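The folder skeleton above can be created up front; a minimal sketch using only the standard library (run from the project root):

```python
from pathlib import Path

# Create the expected data directory layout under the project root.
for sub in ["entsoe", "netztransparenz", "opendatasoft", "open_meteo/historical"]:
    Path("data", sub).mkdir(parents=True, exist_ok=True)
```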

1. entsoe: 2016-2024 (LOGIN REQUIRED)

2. netztransparenz: EEG, legend

3. opendatasoft:

  • Flat file formats > CSV

4. open_meteo:

  • The start and end dates have to be set.
  • Run this code in a new Jupyter notebook:

    from analysis.downloads import download_open_meteo_wind_speeds_of_10_biggest_wind_park_locations_and_at_geographic_mean
    download_open_meteo_wind_speeds_of_10_biggest_wind_park_locations_and_at_geographic_mean("2016-01-01", "2024-12-31")

The final data structure should look like this:

project_root/
├── data/
│   ├── entsoe/
│   │   ├── Actual Generation per Production Type_201601010000-201701010000.csv
│   │   ├── Actual Generation per Production Type_201701010000-201801010000.csv
│   │   ├── ...
│   │   └── Actual Generation per Production Type_202401010000-202501010000.csv
│   ├── netztransparenz/
│   │   ├── 50Hertz_Transmission_GmbH_EEG-Zahlungen_Stammdaten_2023.csv
│   │   └── anlagenstammdaten_legende.xslx
│   ├── open_meteo/
│   │   └── historical/
│   │       └── top_10_biggest_wind_parks_50hertz.json
│   └── opendatasoft/
│       └── georef-germany-postleitzahl.csv

Train models

Specify (create) a root output directory and replace output_dir with your own path. For each of the three models, a subfolder named after the model is created automatically if it does not yet exist.

5.1 Baseline

Run in a Jupyter notebook

from analysis.baseline_model import *

output_dir = r"C:\Users\Manuel\Documents\results"  # raw string keeps backslashes literal
id = 1
run_baseline_model(id, output_dir)

The output is written to "C:\Users\Manuel\Documents\results\baseline_model".

5.2 NGBoost

Run in a Jupyter notebook

from analysis.ngboost import *
from analysis.datasets import load_entsoe
output_dir = r"C:\Users\Manuel\Documents\results"  # raw string keeps backslashes literal
entsoe = load_entsoe()

evaluate_ngboost_model(
    entsoe, 
    target_column='power', 
    dist=Normal, 
    case=1, 
    n_estimators=100, 
    learning_rate=0.03, 
    random_state=42, 
    output_file=output_dir,
    train_start="2016-01-01",
    train_end="2022-12-31",
    validation_start="2023-01-01",
    validation_end="2023-12-31"
)

5.3 TabPFN

from analysis.TabPFN import *

id = 1
output_dir = r"C:\Users\Manuel\Documents\results"
run_tabpfn(id, output_dir)

Evaluate models

6.1 Baseline

from analysis.baseline_model import *

id = 1
output_dir = r"C:\Users\Manuel\Documents\results"
calculate_scores_baseline(id, output_dir)

6.2 NGBoost

Evaluation is included in evaluate_ngboost_model().

6.3 TabPFN

from analysis.tabpfn import *

id = 1
output_dir = r"C:\Users\Manuel\Documents\results"
calculate_scores_tabpfn(id, output_dir)

Output

output_dir/
├── baseline/
│   ├── experiment_1.pkl
│   ├── experiment_2.pkl
│   ├── ...
│   └── quantiles/
│       ├── experiment_results_1.pkl
│       ├── experiment_results_2.pkl
│       └── ...
├── ngboost/
│   ├── full_year/
│   │   ├── case1.xlsx
│   │   ├── case2.xlsx
│   │   ├── ...
│   │   └── Merged_sheet.xlsx
│   └── q4_train/
│       ├── case1.xlsx
│       ├── case2.xlsx
│       ├── ...
│       └── Merged_sheet.xlsx
└── tabpfn/
    ├── experiment_1.pkl
    ├── experiment_2.pkl
    ├── ...
    └── quantiles/
        ├── experiment_results_1.pkl
        ├── experiment_results_2.pkl
        └── ...
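To check which result files were actually produced, the tree can be listed with the standard library; a minimal sketch (replace output_dir with your own path):

```python
from pathlib import Path

output_dir = Path(r"C:\Users\Manuel\Documents\results")  # your own results root

# Print every pickle and Excel result file, relative to the results root.
for f in sorted(output_dir.rglob("*")):
    if f.suffix in {".pkl", ".xlsx"}:
        print(f.relative_to(output_dir))
```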

Plots

Geographic location of 50Hertz wind parks

from analysis.datasets import get_coordinates_of_grid_operators
from analysis.plots import plot_installed_capacity_scatter
aggregated_df = get_coordinates_of_grid_operators()
plot_installed_capacity_scatter(aggregated_df)

Marginal distribution of power

from analysis.preprocessor import *
import matplotlib.pyplot as plt
import seaborn as sns

preprocessor = DataPreprocessor()
preprocessor.load_data()
entsoe = preprocessor.df

plt.figure(figsize=(6, 4))
# Plot histogram of the power distribution
sns.histplot(entsoe['power'], bins=150, color="lightblue")
plt.title("Marginal Distribution of Power")
plt.xlabel("Power [MW]")
plt.ylabel("Frequency")
plt.show()

Marginal distribution of Log power

from analysis.preprocessor import *
import matplotlib.pyplot as plt
import seaborn as sns

preprocessor = DataPreprocessor()
preprocessor.load_data()
preprocessor = preprocessor.transform_power()
entsoe = preprocessor.df

plt.figure(figsize=(6, 4))
sns.histplot(entsoe['power'], bins=150, color="lightblue")
plt.title("Marginal Distribution of Log Power")
plt.xlabel("ln(Power/Power_max + eps)")
plt.ylabel("Frequency")
plt.show()
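The transform behind the x-axis label can be written out explicitly; a minimal sketch, where power_max and eps are illustrative assumptions (the actual values used by transform_power() may differ):

```python
import numpy as np

def log_transform_power(power: np.ndarray, power_max: float, eps: float = 1e-3) -> np.ndarray:
    """Map power to ln(power / power_max + eps), mirroring the histogram's x-axis label."""
    return np.log(power / power_max + eps)

# Zero power maps to ln(eps); power near power_max maps to roughly ln(1 + eps), i.e. ~0.
p = np.array([0.0, 500.0, 1000.0])
print(log_transform_power(p, power_max=1000.0))
```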

Map of Experiment ID to training details

ID Map

| Method   | ID | Features                                        | Split               |
|----------|----|-------------------------------------------------|---------------------|
| Baseline | 1  | power, mean ws                                  | Q1 2022 / Q1 2023   |
| Baseline | 23 | p_t-96                                          | Q1 2022 / Q1 2023   |
| Baseline | 24 | p_t-96                                          | Q4 2022 / FY 2023   |
| Baseline | 25 | 2 mean, power_t-96                              | Q4 2022 / FY 2023   |
| Baseline | 26 | ws 10 loc 10m + 100, P_t-96                     | Q4 2022 / FY 2023   |
| Baseline | 27 | 2 mean, ws 10 loc 10m + 100, P_t-96             | Q4 2022 / FY 2023   |
| Baseline | 28 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index | Q4 2022 / FY 2023   |
| Baseline | 29 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index | 2016-2022 / FY 2023 |
| Baseline | 30 | 2 mean, ws 10 loc 10m + 100, P_t-96             | 2016-2022 / FY 2023 |
| Baseline | 31 | ws 10 loc 10m + 100, P_t-96                     | 2016-2022 / FY 2023 |
| Baseline | 32 | 2 mean, power_t-96                              | 2016-2022 / FY 2023 |
| Baseline | 33 | p_t-96                                          | 2016-2022 / FY 2023 |

IDs 29-33 use the same features as IDs 24-28, but with the 2016-2022 training split.

| Method  | ID | Features                                        | Loss function | Split             |
|---------|----|-------------------------------------------------|---------------|-------------------|
| NGBoost | 1  | p_t-96                                          | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 2  | p_t-96                                          | LogScore      | Q4 2022 / FY 2023 |
| NGBoost | 3  | 2 mean ws                                       | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 4  | 2 mean ws                                       | LogScore      | Q4 2022 / FY 2023 |
| NGBoost | 5  | 2 mean, power_t-96                              | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 6  | 2 mean, power_t-96                              | LogScore      | Q4 2022 / FY 2023 |
| NGBoost | 7  | ws 10 loc 10m + 100, P_t-96                     | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 8  | ws 10 loc 10m + 100, P_t-96                     | LogScore      | Q4 2022 / FY 2023 |
| NGBoost | 9  | 2 mean, ws 10 loc 10m + 100, P_t-96             | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 10 | 2 mean, ws 10 loc 10m + 100, P_t-96             | LogScore      | Q4 2022 / FY 2023 |
| NGBoost | 11 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 12 | 2 mean, ws 10 loc 10m + 100, P_t-96, time index | LogScore      | Q4 2022 / FY 2023 |
| NGBoost | 13 | P_t-96, time index                              | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 14 | P_t-96, time index                              | LogScore      | Q4 2022 / FY 2023 |
| NGBoost | 15 | 2 mean, time index                              | CRPScore      | Q4 2022 / FY 2023 |
| NGBoost | 16 | 2 mean, time index                              | LogScore      | Q4 2022 / FY 2023 |

TabPFN

| ID | Features               | Split                             |
|----|------------------------|-----------------------------------|
| 1  | P(t-96), 2 mean ws     | Q1 2022 / Q1 2023                 |
| 2  | P(t-96), 2 mean ws     | Q2 2022 / Q2 2023                 |
| 3  | P(t-96), 2 mean ws     | Q3 2022 / Q3 2023                 |
| 4  | P(t-96), 2 mean ws     | Q4 2022 / Q4 2023                 |
| 5  | P(t-96), 2 mean ws     | Q4 2022 / Q1 2023                 |
| 6  | P(t-96), 2 mean ws     | Q4 2022 / Q2 2023                 |
| 7  | P(t-96), 2 mean ws     | Q4 2022 / Q3 2023                 |
| 8  | P(t-96), 10 ws         | Q4 2022 / H1 2023                 |
| 9  | P(t-96), 10 ws         | Q4 2022 / H2 2023                 |
| 10 | P(t-96), 2 mean+10 ws  | Q4 2022 / H1 2023                 |
| 11 | P(t-96), 2 mean+10 ws  | Q4 2022 / H2 2023                 |
| 12 | (all)                  | Q4 2022 / H1 2023                 |
| 13 | (all)                  | Q4 2022 / H2 2023                 |
| 14 | P(t-96), 2 mean ws     | Q1 2022 / Q2 2023                 |
| 15 | P(t-96), 2 mean ws     | Q3 2022 / Q2 2023                 |
| 16 | P(t-96), 2 mean ws     | Q1 2022 / Q4 2023                 |
| 17 | P(t-96), 2 mean ws     | Q1 2022 / Q3 2023                 |
| 18 | P(t-96), 2 mean ws     | Q2 2022 / Q1 2023                 |
| 19 | P(t-96), 2 mean ws     | 2022-08-01 - 2022-12-31 / FY 2023 |
| 20 | P(t-96), 2 mean ws     | Q2 2022 / Q4 2023                 |
| 21 | P(t-96), 2 mean ws     | Q3 2022 / Q1 2023                 |
| 22 | P(t-96), 2 mean ws     | Q3 2022 / Q4 2023                 |
| 34 | P(t-96), 2 mean ws     | 2022-09-01 - 2022-12-31 / Q1 2023 |
| 35 | P(t-96), 2 mean ws     | 2022-08-01 - 2022-12-31 / Q1 2023 |
| 36 | P(t-96), 2 mean ws     | H2 2022 / Q1 2023                 |
| 37 | P(t-96), 2 mean ws     | FY 2022 / Q1 2023                 |
| 38 | power, all ws, time bin | FY 2022 / Q1 2023                |
| 39 | power, all ws, time bin | FY 2022 / Q2 2023                |
| 40 | power, all ws, time bin | FY 2022 / Q3 2023                |
| 41 | power, all ws, time bin | FY 2022 / Q4 2023                |
| 42 | power, mean ws         | FY 2022 / Q1 2023                 |
| 43 | power, mean ws         | FY 2022 / Q2 2023                 |
| 44 | power, mean ws         | FY 2022 / Q3 2023                 |
| 45 | power, mean ws         | FY 2022 / Q4 2023                 |
| 46 | power, ws at 10 loc    | FY 2022 / Q1 2023                 |
| 47 | power, ws at 10 loc    | FY 2022 / Q2 2023                 |
| 48 | power, ws at 10 loc    | FY 2022 / Q3 2023                 |
| 49 | power, ws at 10 loc    | FY 2022 / Q4 2023                 |
| 50 | power, all ws          | FY 2022 / Q1 2023                 |
| 51 | power, all ws          | FY 2022 / Q2 2023                 |
| 52 | power, all ws          | FY 2022 / Q3 2023                 |
| 53 | power, all ws          | FY 2022 / Q4 2023                 |
| 54 | power                  | FY 2022 / Q1 2023                 |
| 55 | power                  | FY 2022 / Q2 2023                 |
| 56 | power                  | FY 2022 / Q3 2023                 |
| 57 | power                  | FY 2022 / Q4 2023                 |
| 58 | power                  | Q4 2022 / H1 2023                 |
| 59 | power                  | Q4 2022 / H2 2023                 |

Reproducibility

Baseline

The results of the baseline model can be found in the Jupyter notebook "notebooks/022_models.ipynb".

  1. The markdown cell "Reproduce Figure 5.1" in 022_models gives the code to produce Figure 5.1.
  2. The markdown cell "Reproduce Table 5.1, 5.2" in 022_models gives the code to produce the entries of Tables 5.1 and 5.2. The mapping of table entries to experiment IDs is given below:

| Feature              | 2016–2022 Training (mean and quantiles) | Q4 2022 Training (mean) |
|----------------------|-----------------------------------------|-------------------------|
| Power                | 33                                      | 24                      |
| Power, mean ws       | 32                                      | 25                      |
| Power, ws at 10 loc  | 31                                      | 26                      |
| Power, all ws        | 30                                      | 27                      |
| Power, all ws, t-bin | 29                                      | 28                      |

NGBoost

The results can be obtained by reading the Excel files. It is easier, however, to use the Python code in the Jupyter notebook "notebooks/022_models.ipynb". Be careful to change the pkl_file_path.

  1. The markdown cell "Reproduce Figure 5.7" in 022_models gives the code to produce Figure 5.7.
  2. The markdown cell "Reproduce Figure 5.8" in 022_models gives the code to produce Figure 5.8.
  3. The markdown cell "Reproduce Tables 5.7" in 022_models gives the code to produce Table 5.7. The following mapping between experiment IDs and table entries is used:

| Features             | NLL | CRPS |
|----------------------|-----|------|
| Power                | 2   | 1    |
| Power, mean ws       | 6   | 5    |
| Power, ws at 10 loc  | 8   | 7    |
| Power, all ws        | 10  | 9    |
| Power, all ws, t-bin | 12  | 11   |
  4. The markdown cell "Reproduce Tables 5.8, 5.9" in 022_models gives the code to produce Tables 5.8 and 5.9:

| Feature              | 2016–2022 Training (mean and quantiles) | Q4 2022 Training (mean) |
|----------------------|-----------------------------------------|-------------------------|
| Power                | 2                                       | 2                       |
| Power, mean ws       | 6                                       | 6                       |
| Power, ws at 10 loc  | 8                                       | 8                       |
| Power, all ws        | 10                                      | 10                      |
| Power, all ws, t-bin | 12                                      | 12                      |

TabPFN

  1. The markdown cell "reproduce table 5.3" in the same notebook produces Table 5.3.
  2. Table 5.4 can be reproduced by running calculate_scores_tabpfn with the following IDs; the Mean column is calculated by averaging across each row:

| Train \ Test | 23Q1 | 23Q2 | 23Q3 | 23Q4 | Mean |
|--------------|------|------|------|------|------|
| 22Q1         | 1    | 14   | 17   | 16   |      |
| 22Q2         | 18   | 2    | 19   | 20   |      |
| 22Q3         | 21   | 15   | 3    | 22   |      |
| 22Q4         | 5    | 6    | 7    | 4    |      |
  3. Tables 5.5 and 5.6 can be reproduced by running calculate_scores_tabpfn with the following IDs:

| Feature              | 2022 (Mean, 5%, 25%, 75%, 95%) | Q4 2022 (Mean) |
|----------------------|--------------------------------|----------------|
| Power                | 54, 55, 56, 57                 | 58, 59         |
| Power, mean ws       | 42, 43, 44, 45                 | 4, 5, 6, 7     |
| Power, ws at 10 loc  | 46, 47, 48, 49                 | 8, 9           |
| Power, all ws        | 50, 51, 52, 53                 | 10, 11         |
| Power, all ws, t-bin | 38, 39, 40, 41                 | 12, 13         |
  4. The markdown cell "Reproduce Figure 5.5" in 022_models gives the code to produce Figure 5.5.

  5. The markdown cell "# Reproduce figure 5.6" in 022_models gives the code to produce Figure 5.6.

  6. The CDF and PDF figures in Figures 5.2-5.4 can be found in the same notebook under the markdown cells "reproduce pdf/cdf". Note that the user can specify the sample (0-500) and whether the figure should be stored.

Summary tables and figures

These can be obtained from the individual model results.
