Begin by downloading the games.json file from the Steam Games Dataset on Kaggle. This dataset provides comprehensive information about Steam games, which will serve as the foundation for analysis and modeling.
- Locate the downloaded games.json.zip file.
- Extract the contents of the .zip archive to retrieve the games.json file.
- Place the games.json file in the storage/ directory of this repository.
These steps ensure the dataset is properly organized and ready for further processing.
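The extraction steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `extract_dataset` and its defaults are illustrative and not part of this repository, and it assumes the archive contains `games.json` at its top level.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir="storage", member="games.json"):
    """Extract `member` from the archive at `zip_path` into `dest_dir`.

    Hypothetical helper: the repository itself only documents the manual steps.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Fail early with a clear message if the archive layout differs
        # from the assumed one (games.json at the top level).
        if member not in archive.namelist():
            raise FileNotFoundError(f"{member} not found in {zip_path}")
        archive.extract(member, path=dest)
    return dest / member
```

For example, `extract_dataset("games.json.zip")` run from the repository root would place the file at storage/games.json, provided the downloaded archive sits in the current directory.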
Install the required Python dependencies using the following command:
pip install -r requirements.txt
This ensures that all necessary libraries are installed in your environment. The project uses Python 3.9.
Note
For the dashboard to work, a Chromium-based browser must be installed on your system.
Install it using the command:
playwright install chromium
Apply the trained model to external data and generate predictions.
Command:
python run.py -mode "inference" -file "./path_to_input.csv"
Arguments:
- -mode: Set to "inference" to run inference with the best model.
- -file: Path to the input CSV file.
Output:
A new CSV file with predictions added as a predict column, saved in the folder specified by infer_folder in config.yaml.
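After an inference run, it can be useful to confirm that the output file really contains the predict column. The sketch below uses only the standard library; `summarize_predictions` is a hypothetical helper, and the path you pass to it depends on the infer_folder value in your config.yaml.

```python
import csv


def summarize_predictions(csv_path, column="predict"):
    """Return (row_count, distinct_values) for the prediction column.

    Hypothetical helper, not part of this repository.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # DictReader reads the header lazily; fieldnames is None for an empty file.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path}")
        values = [row[column] for row in reader]
    # Values are returned as strings, exactly as stored in the CSV.
    return len(values), sorted(set(values))
```

The function raises a `KeyError` when the column is missing, which makes a failed or partial inference run easy to spot in a script.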
Fetch the next batch of data, preprocess it, and retrain the model.
Command:
python run.py -mode "update"
Arguments:
- -mode: Set to "update" to process the next batch and retrain the model.
Output:
Updated model and metrics saved in the storage/results/ folder. Data quality and EDA reports saved in the storage/results/reports and storage/results/eda folders.
Generate a monitoring report summarizing data quality, model metrics, and hyperparameters.
Command:
python run.py -mode "summary"
Arguments:
- -mode: Set to "summary" to generate the monitoring report.
Output:
A monitoring_report.txt file saved in the report_storage folder.
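The three modes described above can be driven from a small wrapper script, for instance to run "update" and then "summary" on a schedule. This is a sketch under the assumption that run.py accepts exactly the -mode and -file flags shown in this README; `build_command` and `run_mode` are hypothetical names, not functions provided by the repository.

```python
import subprocess
import sys

# Modes documented in this README.
VALID_MODES = ("inference", "update", "summary")


def build_command(mode, input_file=None, script="run.py"):
    """Build the argument list for one invocation of run.py."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    command = [sys.executable, script, "-mode", mode]
    if mode == "inference":
        # Only the inference mode takes an input CSV file.
        if input_file is None:
            raise ValueError("inference mode requires an input CSV path")
        command += ["-file", str(input_file)]
    return command


def run_mode(mode, input_file=None):
    """Run one mode; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(mode, input_file), check=True)
```

Calling `run_mode("update")` followed by `run_mode("summary")` from the repository root would retrain on the next batch and then produce the monitoring report. Passing the arguments as a list avoids shell quoting issues with file paths.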
Launch the dashboard with:
python dashboard/app.py