2nd place solution for Sberbank Data Science Journey 2018 AutoML competition

antklen/sdsj2018_solution

Sberbank Data Science Journey 2018: AutoML

Main scripts:

  • train.py - model training
  • predict.py - prediction on the test data

Preprocessing

Apart from basic preprocessing (extracting datetime features, encoding categorical variables, dropping constant features), the following steps are used:

  • is_holiday flag for each datetime feature
  • mean target encoding for categorical features
  • to reduce memory usage:
    • read a small part of the data to determine column types, then read the entire dataset using float32 instead of float64
    • parse datetime columns while reading
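A minimal sketch of mean target encoding in plain Python. The function names and the `smoothing` parameter are hypothetical: the README does not say whether the encoding was smoothed or computed out-of-fold, and fitting it on the same rows it is applied to can leak the target, so treat this as an illustration of the idea only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def fit_mean_target_encoding(
    categories: Sequence[Hashable],
    target: Sequence[float],
    smoothing: float = 10.0,
) -> Dict[Hashable, float]:
    """Map each category to a (smoothed) mean of the target.

    The smoothing term is an assumption, not taken from the original solution.
    It shrinks rare categories towards the global mean:
        enc(c) = (sum_c + smoothing * prior) / (count_c + smoothing)
    With smoothing=0 this is the plain per-category mean.
    """
    prior = sum(target) / len(target)
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + smoothing * prior) / (counts[cat] + smoothing)
        for cat in counts
    }


def apply_mean_target_encoding(
    categories: Sequence[Hashable],
    encoding: Dict[Hashable, float],
    default: float,
) -> List[float]:
    """Replace each category by its encoded value; unseen categories get `default`."""
    return [encoding.get(cat, default) for cat in categories]


if __name__ == "__main__":
    enc = fit_mean_target_encoding(["a", "a", "b", "c"], [1.0, 0.0, 1.0, 1.0], smoothing=0.0)
    # "z" was never seen in training, so it falls back to the global mean 0.75
    print(apply_mean_target_encoding(["a", "b", "z"], enc, default=0.75))  # [0.5, 1.0, 0.75]
```

Using the global target mean as the fallback for unseen categories is one common choice; the competition code may handle them differently.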

Machine learning approach

  • LightGBM
  • Hyperopt for hyperparameter tuning
    • after each iteration, check that the time limit has not been exceeded before continuing
  • ensemble (blending) of the best models found by hyperopt
    • during the hyperopt iterations, keep every model that was trained
    • at the end, choose the 5 best models
    • blend them with stepwise blending, following Caruana et al. (2004), "Ensemble Selection from Libraries of Models"
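The time-limited tuning loop can be sketched as below. This is not Hyperopt: random search through a hypothetical `sample_params` callback stands in for Hyperopt's TPE so that the example runs with the standard library only, and `objective` is assumed to train one model and return `(validation_loss, model)`. As described above, the clock is checked after each iteration and every trained model is kept for later blending.

```python
import random
import time
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


def tune_with_time_limit(
    objective: Callable[[Params], Tuple[float, Any]],
    sample_params: Callable[[random.Random], Params],
    time_limit: float,
    max_iters: int = 1000,
    seed: int = 0,
) -> List[Tuple[float, Params, Any]]:
    """Run tuning iterations until the time budget (in seconds) is used up.

    Random search is used here as a stand-in for Hyperopt (an assumption).
    Returns every (loss, params, model) triple, sorted by loss, best first,
    so the top models can be picked for blending.
    """
    rng = random.Random(seed)
    start = time.monotonic()
    history: List[Tuple[float, Params, Any]] = []
    for _ in range(max_iters):
        params = sample_params(rng)
        loss, model = objective(params)
        history.append((loss, params, model))  # remember every trained model
        # After each step, stop if the time limit has been reached.
        if time.monotonic() - start >= time_limit:
            break
    history.sort(key=lambda item: item[0])
    return history


if __name__ == "__main__":
    def toy_objective(params: Params) -> Tuple[float, Any]:
        # Hypothetical "model": just the parameter itself; loss is distance from 3.
        return (params["x"] - 3.0) ** 2, ("model", params["x"])

    def toy_sampler(rng: random.Random) -> Params:
        return {"x": rng.uniform(0.0, 6.0)}

    best_five = tune_with_time_limit(toy_objective, toy_sampler, time_limit=1.0, max_iters=50)[:5]
    print([round(loss, 4) for loss, _, _ in best_five])
```

With Hyperopt itself, a similar effect should be achievable by calling `fmin` repeatedly with the same `Trials` object and a growing `max_evals`, checking the elapsed time between calls.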
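A simplified sketch of stepwise blending in the spirit of Caruana et al. (2004): greedy forward selection with replacement. At each step, the model whose addition most reduces the validation loss of the averaged prediction is added (possibly again), and the selection counts become the blending weights. The function name, the `n_steps` default and the MSE loss are illustrative assumptions, and the paper's refinements (sorted initialization, bagged selection) are omitted.

```python
from typing import Callable, List, Sequence

Loss = Callable[[Sequence[float], Sequence[float]], float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, used here as an example validation loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_selection(
    predictions: Sequence[Sequence[float]],
    y_true: Sequence[float],
    loss: Loss,
    n_steps: int = 20,
) -> List[float]:
    """Greedy forward ensemble selection with replacement.

    predictions[m][i] is model m's validation prediction for sample i
    (e.g. the 5 best models from tuning). Returns one weight per model;
    the weights sum to 1.
    """
    n_models = len(predictions)
    counts = [0] * n_models
    blend_sum = [0.0] * len(y_true)  # sum of predictions of the models selected so far
    for step in range(1, n_steps + 1):
        best_model, best_loss = 0, float("inf")
        for m in range(n_models):
            # Average of the current selection plus candidate m (step models in total).
            candidate = [(s + p) / step for s, p in zip(blend_sum, predictions[m])]
            cand_loss = loss(y_true, candidate)
            if cand_loss < best_loss:
                best_model, best_loss = m, cand_loss
        counts[best_model] += 1
        blend_sum = [s + p for s, p in zip(blend_sum, predictions[best_model])]
    return [c / n_steps for c in counts]


if __name__ == "__main__":
    # Two models with opposite errors: blending them equally cancels the error.
    weights = ensemble_selection([[1.0, 1.0], [-1.0, -1.0]], [0.0, 0.0], mse, n_steps=4)
    print(weights)  # [0.5, 0.5]
```

Selecting with replacement lets a strong model receive a larger weight simply by being chosen several times, which is what makes the result a weighted blend rather than a plain average.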

Special cases

  • Very small data
    • don't use target encoding (to prevent overfitting)
    • don't tune hyperparameters at all (to prevent overfitting)
    • train several models (LightGBM, XGBoost, random forest, extra trees) with random parameters and average their predictions
  • Very big data
    • simple feature selection based on LightGBM feature importance
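One possible form of simple importance-based feature selection, assuming the importances have already been read from a trained model (for example via LightGBM's `Booster.feature_name()` and `Booster.feature_importance()`). The cumulative-importance threshold and the name `select_by_importance` are hypothetical; the README does not specify which rule was used.

```python
from typing import Dict, List


def select_by_importance(
    importances: Dict[str, float],
    keep_fraction: float = 0.95,
) -> List[str]:
    """Keep the top features until they cover `keep_fraction` of total importance.

    Features with zero importance are never kept. This threshold rule is an
    illustrative assumption, not necessarily the one used in the solution.
    """
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(value for _, value in ranked)
    if total <= 0:
        return []
    selected: List[str] = []
    cumulative = 0.0
    for name, value in ranked:
        # Stop at the first useless feature or once enough importance is covered.
        if value <= 0 or cumulative >= keep_fraction * total:
            break
        selected.append(name)
        cumulative += value
    return selected


if __name__ == "__main__":
    imps = {"a": 50.0, "b": 30.0, "c": 15.0, "d": 5.0, "e": 0.0}
    print(select_by_importance(imps, keep_fraction=0.5))  # ['a']
```

On very big data the point is to retrain on fewer columns, so a cheap first model (e.g. on a subsample) is a natural source for the importances, though the README does not say how they were obtained.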
