
IEEE Second International Conference on Data Stream Mining & Processing

August 21-25, 2018, Lviv, Ukraine

Using Stacking Approaches for Machine Learning Models
Bohdan Pavlyshenko
SoftServe, Inc., Ivan Franko National University of Lviv
Lviv, Ukraine
[email protected], [email protected]

Abstract—In this paper, we study the use of a stacking approach for building ensembles of machine learning models. The cases of time series forecasting and logistic regression are considered. The results show that using stacking techniques, we can improve the performance of predictive models in the considered cases.

Index Terms—machine learning, stacking, forecasting, classification, regression

I. INTRODUCTION

One of the effective approaches in machine learning classification and regression problems is stacking. The main idea of stacking is to use the predictions of machine learning models from the previous level as input variables for models on the next level. Using multilevel models with a stacking approach is very popular among the participants of the Kaggle [1] community. On the Kaggle platform, different companies propose their problems with datasets for data science competitions to develop predictive models with the best accuracy. Time series can be analysed by different approaches such as ARIMA, linear models, and machine learning models [2]. In this study, we consider the application of a stacking approach to predictive models for time series and for logistic regression.

II. USING LINEAR REGRESSION FOR MODELS STACKING

We are going to consider several simple approaches to sales time series forecasting. For our study, we used the data set from the Kaggle competition 'Rossmann Store Sales' [3]. By combining different predictive models with different sets of features into one ensemble, one can improve the resulting accuracy. There are two main approaches for model ensembling: bagging and stacking. Bagging is a simple approach in which we use a weighted blending of different model predictions. Such models use different types of classifiers with different sets of features and metaparameters. If the forecasting errors of these models are weakly correlated, then these errors compensate each other under the weighted blending. The weaker the correlation between model errors, the more precise the forecasting result we will receive. Let us consider the stacking technique [4] for building an ensemble of predictive models. In such an approach, the results of predictions on the validation set are treated as input regressors for the next-level models. As the next-level model, we can consider a linear model or another type of classifier, e.g. a Random Forest classifier or a neural network. In our study, the linear regression and machine learning models were from the scikit-learn Python package, and the neural network was from the Keras Python package. It is important to mention that in the case of time series prediction, we cannot use a conventional cross-validation approach; we have to split the historical data set into a training set and a validation set by time, so that the training data lie in the first time period and the validation data in the next one. Fig. 1 shows the time series forecasting on the validation sets obtained using different models. Predictions on the validation sets are treated as regressors for the linear model with Lasso regularization. Fig. 2 shows the results obtained on the second level with the linear regularized model. Only two models from the first level (GradientBoosting and ExtraTree) have nonzero coefficients for their results. For other sales datasets, the results can be different, and other models from the first level can play a more essential role in the forecasting.

Fig. 1. Different methods for time series forecasting

Fig. 2. Coefficients for stacking linear regression
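As a minimal sketch of this scheme, the snippet below fits a few first-level scikit-learn regressors on a time-ordered training period, collects their validation predictions, and uses them as regressors for a Lasso model. The synthetic data and model settings are illustrative assumptions, not the paper's actual setup.

```python
# Minimal stacking sketch: time-based split, first-level regressors,
# second-level Lasso on validation predictions. Data is synthetic.
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))            # stand-in features, ordered by time
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=500)

# Time split: no shuffling; training period first, validation period next.
n_train = int(len(X) * 0.8)
X_train, X_val = X[:n_train], X[n_train:]
y_train, y_val = y[:n_train], y[n_train:]

# First level: several models predict on the validation period.
level1 = [RandomForestRegressor(n_estimators=100, random_state=0),
          ExtraTreesRegressor(n_estimators=100, random_state=0),
          GradientBoostingRegressor(random_state=0)]
val_preds = np.column_stack([m.fit(X_train, y_train).predict(X_val)
                             for m in level1])

# Second level: validation predictions become regressors for Lasso; its
# coefficients show which first-level models contribute (cf. Fig. 2).
stacker = Lasso(alpha=0.1).fit(val_preds, y_val)
print(stacker.coef_)
```

In practice, the fitted Lasso would then be applied to the first-level predictions on the test period to produce the final forecast.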

III. SALES TIME SERIES FORECASTING

The company Grupo Bimbo organized the Kaggle competition 'Grupo Bimbo Inventory Demand' [5]. In this competition, Grupo Bimbo invited Kagglers to develop a model to accurately forecast the inventory demand based on historical sales data. I had the pleasure of being a teammate of the great team 'The Slippery Appraisals', which won this competition among nearly two thousand teams. We proposed the best-scoring solution for sales prediction in more than 800,000 stores for more than 1000 products. Our first-place solution can be found at [6]. To build our final multilevel model, we used an AWS server with 128 cores and 2 TB of RAM. For our solution, we used a multilevel model, which consists of three levels (Fig. 3). We built a lot of models on the first level. The training method of most first-level models was XGBoost. On the second level, we used a stacking approach in which the results from the first-level classifiers were treated as the features for the classifiers on the second level. For the second level, we used an ExtraTrees classifier, the linear model from the Python scikit-learn package, and neural networks. On the third level, we applied a weighted average to the second-level results. The most important features are based on the lags of the target variable grouped by factors and their combinations, aggregated features (min, max, mean, sum) of the target variable grouped by factors and their combinations, and frequency features of factor variables. One of the main ideas in our approach is that it is important to know what the previous week's sales were. If during the previous week too many products were supplied and they were not sold, the amount of this product supplied to the same store next week will be decreased. So, it is very important to include lagged values of the target variable as features to predict the next sales. More details about our team's winning solution are at [6]. A simplified version of the R script is at [8]. Our winning solution may seem too complicated, but our goal was to win the competition, and even a small improvement in the forecasting score required a substantial number of machine learning models in the final ensemble. Real business cases can often be solved with sufficient accuracy by simpler models.
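A sketch of these feature types on a hypothetical sales frame is given below; the column names (week, store, product, demand) and lag choices are assumptions for illustration, not the competition's actual schema.

```python
# Sketch of the feature types named above, on a hypothetical sales frame
# with columns ["week", "store", "product", "demand"].
import pandas as pd

def add_features(df: pd.DataFrame, lags=(1, 2, 3)) -> pd.DataFrame:
    df = df.sort_values("week").copy()
    # Lags of the target variable grouped by a factor combination
    # (assumes one row per store-product-week).
    grp = df.groupby(["store", "product"])["demand"]
    for lag in lags:
        df[f"demand_lag_{lag}"] = grp.shift(lag)
    # Aggregated features (min, max, mean, sum) of the target grouped by a
    # factor; a real pipeline would compute these on past data only to
    # avoid leakage.
    aggs = df.groupby("product")["demand"].agg(["min", "max", "mean", "sum"])
    df = df.join(aggs.add_prefix("product_demand_"), on="product")
    # Frequency features of factor variables.
    df["store_freq"] = df["store"].map(df["store"].value_counts())
    return df
```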
IV. STACKING APPROACH FOR LOGISTIC REGRESSION

Let us consider using a stacking approach for logistic regression problems. In the Kaggle competition 'Bosch Production Line Performance' [11], the problem of internal failures on assembly lines was considered. The data set consists of measurements for components on assembly lines. This case is a type of logistic regression problem with highly imbalanced classes. The problem lies in predicting which parts will fail quality control. In the work [12], a logistic regression approach to manufacturing failure detection was considered. As a data set for the analysis, we used the data from the Kaggle competition 'Bosch Production Line Performance' [11]. The data set has a lot of anonymized features. For modeling, we used linear, machine learning, and Bayesian approaches. To find the influence of different factors, we used a generalized linear model. Using a Bayesian approach for logistic regression, we can get the probability distribution function for the model parameters. Having the statistical distribution, we can make risk assessments. To build machine learning models, we used the XGBoost classifier from the R package 'xgboost' [7], [9], [10]. The data in this set have highly imbalanced classes. To mitigate this problem, we used an undersampling approach. The samples with value 1 for the target variable were retained without changes. The samples with value 0 for the target variable were randomly sampled, so the total number of data items was reduced. For categorical features, we used one-hot encoding. The result of the classification is the probability of a positive response. Combining machine learning models and linear or Bayesian models on different levels can give us improved results for logistic regression. Fig. 4 shows a diagram of such a possible stacking model. On the first level, there are different XGBoost classifiers with different sets of features and subsets of samples. On the second level, the probabilities predicted on the first level can be blended with appropriate weights using linear or Bayesian regression. To evaluate classification performance, we used the Matthews correlation coefficient (MCC):

MCC = (TP · TN − FP · FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. Let us consider the use of a generalized linear model for stacking logistic regression, with independent variables which are the probabilities predicted by the XGBoost models on the first level.
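The sketch below illustrates this pipeline, with the Python xgboost and scikit-learn packages standing in for the R tooling used in the paper: undersampled training data, first-level XGBoost probabilities, a logistic regression (a GLM) stacked on top, and MCC as the score. Data, parameters, and split are illustrative assumptions.

```python
# Sketch: undersampling, first-level XGBoost probabilities, second-level
# logistic regression (GLM), MCC evaluation. Data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from xgboost import XGBClassifier

X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.98], random_state=0)
X_train, X_val = X[:15000], X[15000:]
y_train, y_val = y[:15000], y[15000:]

# Undersample: keep all positives, sample an equal number of negatives.
rng = np.random.default_rng(0)
pos = np.where(y_train == 1)[0]
neg = rng.choice(np.where(y_train == 0)[0], size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
X_bal, y_bal = X_train[idx], y_train[idx]

# First level: a few XGBoost classifiers with different parameters.
models = [XGBClassifier(max_depth=d, n_estimators=100) for d in (5, 10, 15)]
probs = np.column_stack([m.fit(X_bal, y_bal).predict_proba(X_val)[:, 1]
                         for m in models])

# Second level: a GLM whose covariates are the first-level probabilities.
# (In practice it would be fit on a separate fold, not the scoring set.)
glm = LogisticRegression().fit(probs, y_val)
pred = glm.predict_proba(probs)[:, 1] > 0.5
print("MCC:", matthews_corrcoef(y_val, pred))
```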

Fig. 3. Multilevel machine learning model for sales time series forecasting (Level 1: Models 1…n, mostly XGBoost; Level 2: ExtraTree, Linear, and Neural Network models; Level 3: weighted average w1·ET + w2·LM + w3·NN)
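The third level in Fig. 3 is just a weighted average of the second-level predictions; a tiny sketch with illustrative weights:

```python
# Third-level blend from Fig. 3: w1*ET + w2*LM + w3*NN.
import numpy as np

def blend(pred_et, pred_lm, pred_nn, w=(0.5, 0.2, 0.3)):
    # Weights would be tuned on a validation period; these are placeholders.
    return (w[0] * np.asarray(pred_et) + w[1] * np.asarray(pred_lm)
            + w[2] * np.asarray(pred_nn))
```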

Fig. 4. Stacking model for logistic regression (Level 1: XGBoost models with Parameters Sets 1…n and Samples Sets 1…n; Level 2: Linear and Bayesian models)

We used different sets of parameters for 3 XGBoost models: set 1: max.depth = 15, colsample_bytree = 0.7; set 2: max.depth = 5, colsample_bytree = 0.7; set 3: max.depth = 15, colsample_bytree = 0.3. For these 3 models, we used the same subset of samples. Figs. 5 and 6 show the dependence of the Matthews correlation coefficient on the probability threshold for different subsets of features, where feature set 2 is feature set 1 with 4 added 'magic' features. The so-called magic features, which are based on the IDs of samples, were considered by the participants of the competition at [13]–[15]. For the Bayesian models, we used the same 3 sets of parameters with different subsets of samples. As shown above, for different sample subsets, we received slightly different results for the Matthews correlation coefficient. These differences can be taken into account using a Bayesian model. For Bayesian inference, we used the JAGS sampling software [16], [17]. We used a Bayesian model for logistic regression. As covariates, we used the probabilities predicted by the three XGBoost models. Fig. 7 shows the boxplots for the coefficients of the probabilities predicted by different XGBoost models.

Fig. 5. Matthews coefficient for different XGBoost parameter sets (feature set 1)

Fig. 6. Matthews coefficient for different XGBoost parameter sets (feature set 2)

Fig. 7. Boxplots for coefficients of probabilities predicted by different XGBoost models
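The listed parameter sets and the MCC-vs-threshold dependence in Figs. 5 and 6 can be reproduced in outline as below, assuming train/validation arrays like those in the earlier logistic regression sketch; the setup is illustrative, not the paper's R code.

```python
# Sketch: three XGBoost parameter sets and an MCC-vs-threshold sweep,
# assuming X_train, y_train, X_val, y_val as in the earlier sketch.
import numpy as np
from sklearn.metrics import matthews_corrcoef
from xgboost import XGBClassifier

param_sets = [dict(max_depth=15, colsample_bytree=0.7),
              dict(max_depth=5, colsample_bytree=0.7),
              dict(max_depth=15, colsample_bytree=0.3)]

for i, params in enumerate(param_sets, start=1):
    model = XGBClassifier(n_estimators=100, **params).fit(X_train, y_train)
    proba = model.predict_proba(X_val)[:, 1]
    # Sweep the probability threshold and record MCC at each point.
    thresholds = np.linspace(0.05, 0.95, 19)
    mcc = [matthews_corrcoef(y_val, proba > t) for t in thresholds]
    best = int(np.argmax(mcc))
    print(f"set {i}: best threshold {thresholds[best]:.2f}, "
          f"MCC {mcc[best]:.3f}")
```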
V. CONCLUSION

In our study, we considered stacking approaches for time series forecasting and for logistic regression with highly imbalanced data. Using multilevel stacking models, one can receive more precise results in comparison with single models. For stacking machine learning models, linear regression with Lasso regularization, another machine learning model, or a Bayesian model can be used. Using a stacking model on the second level, with covariates that are predicted by the machine learning models on the first level, makes it possible to take into account the differences in the results of machine learning models received for different sets of parameters and subsets of samples. As the obtained results show, using a stacking approach for machine learning models, we can improve the performance of predictive models.

REFERENCES

[1] Kaggle: Your Home for Data Science. URL: http://kaggle.com
[2] B. M. Pavlyshenko, "Linear, machine learning and probabilistic approaches for time series analysis," in IEEE First International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, pp. 377-381, August 23-27, 2016.
[3] "Rossmann Store Sales," Kaggle.com. URL: http://www.kaggle.com/c/rossmann-store-sales
[4] D. H. Wolpert, "Stacked generalization," Neural Networks, 5(2), pp. 241-259, 1992.
[5] Kaggle competition "Grupo Bimbo Inventory Demand". URL: https://www.kaggle.com/c/grupo-bimbo-inventory-demand
[6] Kaggle competition "Grupo Bimbo Inventory Demand", #1 Place Solution of The Slippery Appraisals team. URL: https://www.kaggle.com/c/grupo-bimbo-inventory-demand/discussion/23863
[7] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 785-794.
[8] Kaggle competition "Grupo Bimbo Inventory Demand", Bimbo XGBoost R script LB:0.457. URL: https://www.kaggle.com/bpavlyshenko/bimbo-xgboost-r-script-lb-0-457
[9] J. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, 29(5):1189-1232, 2001.
[10] J. Friedman, "Stochastic gradient boosting," Computational Statistics & Data Analysis, 38(4):367-378, 2002.
[11] Kaggle competition "Bosch Production Line Performance". URL: https://www.kaggle.com/c/bosch-production-line-performance
[12] B. Pavlyshenko, "Machine learning, linear and Bayesian models for logistic regression in failure detection problems," in IEEE International Conference on Big Data (Big Data), Washington D.C., USA, pp. 2046-2050, December 5-8, 2016.
[13] Kaggle competition "Bosch Production Line Performance", The Magical Feature: from LB 0.3- to 0.4+. URL: https://www.kaggle.com/c/bosch-production-line-performance/forums/t/24065/the-magical-feature-from-lb-0-3-to-0-4
[14] Kaggle competition "Bosch Production Line Performance", Road-2-0.4+. URL: https://www.kaggle.com/mmueller/bosch-production-line-performance/road-2-0-4
[15] Kaggle competition "Bosch Production Line Performance", Road-2-0.4+ --> FeatureSet++. URL: https://www.kaggle.com/alexxanderlarko/bosch-production-line-performance/road-2-0-4-featureset
[16] J. Kruschke, Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.
[17] M. Plummer, JAGS Version 3.4.0 user manual. URL: http://sourceforge.net/projects/mcmcjags/files/Manuals/3.x/jags_user_manual.pdf