Project to predict the outcome of Pokémon battles from the early phases of a match. Raw data (JSONL files) are loaded into a SQLite DB, preprocessed, and used to train and evaluate various ML models.
- Creates a DB connection and registers a new Dataset (Train or Test).
- For each match in the JSONL:
  - Inserts match metadata via `insert_battle`.
  - Inserts team Pokémon with `load_pokemon` and `load_team`.
  - Inserts turns with `insert_turn`, which in turn calls `insert_state_move` to save states and moves.
- Maintains consistency with `INSERT OR IGNORE` for reference tables (type, status, moves).
- Final commit of the data into the DB.
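The steps above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the function name `load_matches`, the table names, and the JSONL field names are assumptions; the real schema lives in `analisi/create_db.sql`.

```python
import json
import sqlite3

def load_matches(db_path: str, jsonl_path: str, dataset: str) -> None:
    """Load battles from a JSONL file into SQLite (illustrative schema)."""
    con = sqlite3.connect(db_path)
    cur = con.cursor()
    cur.executescript("""
        CREATE TABLE IF NOT EXISTS type   (name TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS battle (id TEXT PRIMARY KEY, dataset TEXT, winner TEXT);
        CREATE TABLE IF NOT EXISTS pokemon(battle_id TEXT, name TEXT, type TEXT);
    """)
    with open(jsonl_path) as fh:
        for line in fh:
            match = json.loads(line)
            cur.execute("INSERT OR IGNORE INTO battle VALUES (?, ?, ?)",
                        (match["id"], dataset, match.get("winner")))
            for mon in match["team"]:
                # INSERT OR IGNORE keeps the reference table consistent even
                # when the same type appears in many matches.
                cur.execute("INSERT OR IGNORE INTO type VALUES (?)", (mon["type"],))
                cur.execute("INSERT INTO pokemon VALUES (?, ?, ?)",
                            (match["id"], mon["name"], mon["type"]))
    con.commit()  # single final commit, as in the pipeline above
    con.close()
```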
The data_analyzer/ folder contains code to extract, preprocess, select models and run experiments:
- `data_analyzer/__init__.py` — exports the main functions from `lib.py`.
- `data_analyzer/lib.py` — utilities and the preprocessing / I/O pipeline:
  - `get_datapoints` — builds the dataset with all normalized features (as described in the report).
  - `load_datapoints` — reads the preprocessed tables (`Input`, `Output`, `TestInput`, `TestOutput`).
  - `create_submission` — generates submission CSVs from predictions.
  - `load_best_model` — reconstructs a model from the information in `models.json`.
- `data_analyzer/model_selection.py` — classes and helpers for hyperparameter search and validation:
  - `ModelTrainer` and its implementations (`LogisticRegressionTrainer`, `RandomForestClassifierTrainer`, `XGBClassifierTrainer`, etc.) for cross-validation and hyperparameter search.
  - `plot_history` — saves validation plots.
- `data_analyzer/main.py` — CLI for analysis operations (save dataset, PCA, training, ensemble, etc.). Runs routines that use functions from `lib.py` and `model_selection.py`.
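A CLI with subcommands like those of `data_analyzer/main.py` can be sketched with `argparse`; the subcommand names and flags below are illustrative, not the project's actual interface.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative CLI with subcommands for the analysis operations."""
    parser = argparse.ArgumentParser(prog="data_analyzer")
    sub = parser.add_subparsers(dest="command", required=True)

    save = sub.add_parser("save-dataset", help="preprocess and store datapoints")
    save.add_argument("--db", default="battles.db")

    train = sub.add_parser("train", help="run hyperparameter search")
    train.add_argument("--model", choices=["logreg", "rf", "xgb"], default="logreg")

    sub.add_parser("pca", help="run PCA on the feature matrix")
    return parser

# e.g. `python -m data_analyzer train --model xgb`
args = build_parser().parse_args(["train", "--model", "xgb"])
```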
- main.py contains the entire pipeline to reproduce the results; it was later converted into a notebook for the Kaggle challenge.
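For the Kaggle submission step, a CSV writer in the spirit of `create_submission` could look like the sketch below; the column names and the function signature are assumptions, not the project's actual API.

```python
import csv

def create_submission(ids, predictions, path="submission.csv"):
    """Write a Kaggle-style submission CSV (illustrative column names)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["battle_id", "player_won"])
        for match_id, pred in zip(ids, predictions):
            writer.writerow([match_id, int(pred)])
```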
- Main script: `main.py`
- DB creation: `analisi/create_db.sql`
- LaTeX report template: `Latex/`
- Saved models info: `models.json`
- PCA importance/features: `pca.json`
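The cross-validation and hyperparameter search performed by the trainer classes in `model_selection.py` follows a standard pattern; here is a minimal sketch with scikit-learn's `GridSearchCV` on synthetic data. The class name matches `LogisticRegressionTrainer` from above, but the parameter grid and method names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

class LogisticRegressionTrainer:
    """Illustrative trainer: cross-validated hyperparameter search."""
    param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # assumed grid

    def search(self, X, y, cv=5):
        grid = GridSearchCV(LogisticRegression(max_iter=1000),
                            self.param_grid, cv=cv, scoring="accuracy")
        grid.fit(X, y)
        return grid.best_params_, grid.best_score_

# Synthetic stand-in for the preprocessed battle features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
best_params, best_score = LogisticRegressionTrainer().search(X, y)
```

The real trainers presumably share a common `ModelTrainer` interface so that each estimator only has to declare its own grid.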