
IEEE Second International Conference on Data Stream Mining & Processing

August 21-25, 2018, Lviv, Ukraine

Using Stacking Approaches for Machine Learning Models
Bohdan Pavlyshenko
SoftServe, Inc., Ivan Franko National University of Lviv
Lviv, Ukraine
[email protected], [email protected]

Abstract—In this paper, we study the use of a stacking approach for building ensembles of machine learning models. The cases of time series forecasting and logistic regression are considered. The results show that using stacking techniques, we can improve the performance of predictive models in the considered cases.

Index Terms—machine learning, stacking, forecasting, classification, regression

I. INTRODUCTION

One of the effective approaches in machine learning classification and regression problems is stacking. The main idea of stacking is to use the predictions of machine learning models from the previous level as input variables for models on the next level. Using multilevel models with a stacking approach is very popular among the participants of the Kaggle [1] community. On the Kaggle platform, different companies propose their problems with datasets for data science competitions to develop predictive models with the best accuracy. Time series can be analysed by different approaches such as ARIMA, linear models, and machine learning models [2]. In this study, we consider the application of a stacking approach to predictive models for time series and for logistic regression.

II. USING LINEAR REGRESSION FOR MODELS STACKING

We are going to consider several simple approaches to sales time series forecasting. For our study, we used the data set from the Kaggle competition 'Rossmann Store Sales' [3]. By combining different predictive models with different sets of features into one ensemble, one can improve the resulting accuracy. There are two main approaches for model ensembling: bagging and stacking. Bagging is a simple approach in which we use a weighted blending of different model predictions. Such models use different types of classifiers with different sets of features and metaparameters. If the forecasting errors of these models are weakly correlated, then these errors compensate each other under the weighted blending. The weaker the correlation between model errors, the more precise the forecasting result we will receive. Let us consider the stacking technique [4] for building an ensemble of predictive models. In such an approach, the results of predictions on the validation set are treated as input regressors for the next-level models. As the next-level model, we can consider a linear model or another type of classifier, e.g. a Random Forest classifier or a neural network. In our study, the linear regression and machine learning models were from the scikit-learn Python package, and the neural network was from the Keras Python package. It is important to mention that in the case of time series prediction, we cannot use a conventional cross-validation approach; we have to split the historical data set into a training set and a validation set by time, so that the training data lie in the first time period and the validation data in the next one. Fig. 1 shows the time series forecasting on the validation sets obtained using different models. Predictions on the validation sets are treated as regressors for the linear model with Lasso regularization. Fig. 2 shows the results obtained on the second level with the linear regularized model. Only two models from the first level (GradientBoosting and ExtraTree) have nonzero coefficients for their results. For other sales datasets, the results can be different, and other models from the first level can play a more essential role in the forecasting.

Fig. 1. Different methods for time series forecasting

Fig. 2. Coefficients for stacking linear regression
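As a minimal sketch of this scheme, the snippet below fits a few first-level scikit-learn regressors on a time-ordered training period, collects their validation predictions, and uses them as regressors for a Lasso model. The synthetic data and model settings are illustrative assumptions, not the paper's actual setup.

```python
# Minimal stacking sketch: time-based split, first-level regressors,
# second-level Lasso on validation predictions. Data is synthetic.
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))            # stand-in features, ordered by time
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=500)

# Time split: no shuffling; training period first, validation period next.
n_train = int(len(X) * 0.8)
X_train, X_val = X[:n_train], X[n_train:]
y_train, y_val = y[:n_train], y[n_train:]

# First level: several models predict on the validation period.
level1 = [RandomForestRegressor(n_estimators=100, random_state=0),
          ExtraTreesRegressor(n_estimators=100, random_state=0),
          GradientBoostingRegressor(random_state=0)]
val_preds = np.column_stack([m.fit(X_train, y_train).predict(X_val)
                             for m in level1])

# Second level: validation predictions become regressors for Lasso; its
# coefficients show which first-level models contribute (cf. Fig. 2).
stacker = Lasso(alpha=0.1).fit(val_preds, y_val)
print(stacker.coef_)
```

In practice, the fitted Lasso would then be applied to the first-level predictions on the test period to produce the final forecast.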

III. SALES TIME SERIES FORECASTING

The company Grupo Bimbo organized the Kaggle competition 'Grupo Bimbo Inventory Demand' [5]. In this competition, Grupo Bimbo invited Kagglers to develop a model to accurately forecast the inventory demand based on historical sales data. I had the pleasure of being a teammate of the great team 'The Slippery Appraisals', which won this competition among nearly two thousand teams. We proposed the best-scoring solution for sales prediction in more than 800,000 stores for more than 1000 products. Our first-place solution can be found at [6]. To build our final multilevel model, we used an AWS server with 128 cores and 2 TB of RAM. For our solution, we used a multilevel model, which consists of three levels (Fig. 3). We built a lot of models on the first level. The training method of most first-level models was XGBoost. On the second level, we used a stacking approach in which the results from the first-level classifiers were treated as the features for the classifiers on the second level. For the second level, we used an ExtraTrees classifier, the linear model from the Python scikit-learn package, and neural networks. On the third level, we applied a weighted average to the second-level results. The most important features are based on the lags of the target variable grouped by factors and their combinations, aggregated features (min, max, mean, sum) of the target variable grouped by factors and their combinations, and frequency features of factor variables. One of the main ideas in our approach is that it is important to know what the previous week's sales were. If during the previous week too many products were supplied and they were not sold, the amount of this product supplied to the same store next week will be decreased. So, it is very important to include lagged values of the target variable as features to predict the next sales. More details about our team's winning solution are at [6]. A simplified version of the R script is at [8]. Our winning solution may seem too complicated, but our goal was to win the competition, and even a small improvement in the forecasting score required a substantial number of machine learning models in the final ensemble. Real business cases can often be solved with sufficient accuracy by simpler models.
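A sketch of these feature types on a hypothetical sales frame is given below; the column names (week, store, product, demand) and lag choices are assumptions for illustration, not the competition's actual schema.

```python
# Sketch of the feature types named above, on a hypothetical sales frame
# with columns ["week", "store", "product", "demand"].
import pandas as pd

def add_features(df: pd.DataFrame, lags=(1, 2, 3)) -> pd.DataFrame:
    df = df.sort_values("week").copy()
    # Lags of the target variable grouped by a factor combination
    # (assumes one row per store-product-week).
    grp = df.groupby(["store", "product"])["demand"]
    for lag in lags:
        df[f"demand_lag_{lag}"] = grp.shift(lag)
    # Aggregated features (min, max, mean, sum) of the target grouped by a
    # factor; a real pipeline would compute these on past data only to
    # avoid leakage.
    aggs = df.groupby("product")["demand"].agg(["min", "max", "mean", "sum"])
    df = df.join(aggs.add_prefix("product_demand_"), on="product")
    # Frequency features of factor variables.
    df["store_freq"] = df["store"].map(df["store"].value_counts())
    return df
```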
IV. STACKING APPROACH FOR LOGISTIC REGRESSION

Let us consider using a stacking approach for logistic regression problems. In the Kaggle competition 'Bosch Production Line Performance' [11], the problem of internal failures on assembly lines was considered. The data set consists of measurements for components on assembly lines. This case is a type of logistic regression problem with highly imbalanced classes. The problem lies in predicting which parts will fail quality control. In the work [12], a logistic regression approach to manufacturing failure detection was considered. As a data set for the analysis, we used the data from the Kaggle competition 'Bosch Production Line Performance' [11]. The data set has a lot of anonymized features. For modeling, we used linear, machine learning, and Bayesian approaches. To find the influence of different factors, we used a generalized linear model. Using a Bayesian approach for logistic regression, we can get the probability distribution function for the model parameters. Having the statistical distribution, we can make risk assessments. To build machine learning models, we used the XGBoost classifier from the R package 'xgboost' [7], [9], [10]. The data in this set have highly imbalanced classes. To mitigate this problem, we used an undersampling approach. The samples with value 1 for the target variable were retained without changes. The samples with value 0 for the target variable were randomly sampled, so the total number of data items was reduced. For categorical features, we used one-hot encoding. The result of the classification is the probability of a positive response. Combining machine learning models and linear or Bayesian models on different levels can give us improved results for logistic regression. Fig. 4 shows a diagram of such a possible stacking model. On the first level, there are different XGBoost classifiers with different sets of features and subsets of samples. On the second level, the probabilities predicted on the first level can be blended with appropriate weights using linear or Bayesian regression. To evaluate classification performance, we used the Matthews correlation coefficient (MCC):

MCC = (TP · TN − FP · FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. Let us consider the use of a generalized linear model for stacking logistic regression, with independent variables which are the probabilities predicted by the XGBoost models on the first level.
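The sketch below illustrates this pipeline, with the Python xgboost and scikit-learn packages standing in for the R tooling used in the paper: undersampled training data, first-level XGBoost probabilities, a logistic regression (a GLM) stacked on top, and MCC as the score. Data, parameters, and split are illustrative assumptions.

```python
# Sketch: undersampling, first-level XGBoost probabilities, second-level
# logistic regression (GLM), MCC evaluation. Data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from xgboost import XGBClassifier

X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.98], random_state=0)
X_train, X_val = X[:15000], X[15000:]
y_train, y_val = y[:15000], y[15000:]

# Undersample: keep all positives, sample an equal number of negatives.
rng = np.random.default_rng(0)
pos = np.where(y_train == 1)[0]
neg = rng.choice(np.where(y_train == 0)[0], size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
X_bal, y_bal = X_train[idx], y_train[idx]

# First level: a few XGBoost classifiers with different parameters.
models = [XGBClassifier(max_depth=d, n_estimators=100) for d in (5, 10, 15)]
probs = np.column_stack([m.fit(X_bal, y_bal).predict_proba(X_val)[:, 1]
                         for m in models])

# Second level: a GLM whose covariates are the first-level probabilities.
# (In practice it would be fit on a separate fold, not the scoring set.)
glm = LogisticRegression().fit(probs, y_val)
pred = glm.predict_proba(probs)[:, 1] > 0.5
print("MCC:", matthews_corrcoef(y_val, pred))
```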

Fig. 3. Multilevel machine learning model for sales time series forecasting (Level 1: Models 1…n, mostly XGBoost; Level 2: ExtraTree, Linear, and Neural Network models; Level 3: weighted average w1·ET + w2·LM + w3·NN)
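The third level in Fig. 3 is just a weighted average of the second-level predictions; a tiny sketch with illustrative weights:

```python
# Third-level blend from Fig. 3: w1*ET + w2*LM + w3*NN.
import numpy as np

def blend(pred_et, pred_lm, pred_nn, w=(0.5, 0.2, 0.3)):
    # Weights would be tuned on a validation period; these are placeholders.
    return (w[0] * np.asarray(pred_et) + w[1] * np.asarray(pred_lm)
            + w[2] * np.asarray(pred_nn))
```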

Fig. 4. Stacking model for logistic regression (Level 1: XGBoost models with Parameters Sets 1…n and Samples Sets 1…n; Level 2: Linear and Bayesian models)

We used different sets of parameters for 3 XGBoost models: set 1: max.depth = 15, colsample_bytree = 0.7; set 2: max.depth = 5, colsample_bytree = 0.7; set 3: max.depth = 15, colsample_bytree = 0.3. For these 3 models, we used the same subset of samples. Figs. 5 and 6 show the dependence of the Matthews correlation coefficient on the probability threshold for different subsets of features, where feature set 2 is feature set 1 with 4 added 'magic' features. The so-called magic features, which are based on the IDs of samples, were considered by the participants of the competition at [13]–[15]. For the Bayesian models, we used the same 3 sets of parameters with different subsets of samples. As shown above, for different sample subsets, we received slightly different results for the Matthews correlation coefficient. These differences can be taken into account using a Bayesian model. For Bayesian inference, we used the JAGS sampling software [16], [17]. We used a Bayesian model for logistic regression. As covariates, we used the probabilities predicted by the three XGBoost models. Fig. 7 shows the boxplots for the coefficients of the probabilities predicted by different XGBoost models.

Fig. 5. Matthews coefficient for different XGBoost parameter sets (feature set 1)

Fig. 6. Matthews coefficient for different XGBoost parameter sets (feature set 2)

Fig. 7. Boxplots for coefficients of probabilities predicted by different XGBoost models
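The listed parameter sets and the MCC-vs-threshold dependence in Figs. 5 and 6 can be reproduced in outline as below, assuming train/validation arrays like those in the earlier logistic regression sketch; the setup is illustrative, not the paper's R code.

```python
# Sketch: three XGBoost parameter sets and an MCC-vs-threshold sweep,
# assuming X_train, y_train, X_val, y_val as in the earlier sketch.
import numpy as np
from sklearn.metrics import matthews_corrcoef
from xgboost import XGBClassifier

param_sets = [dict(max_depth=15, colsample_bytree=0.7),
              dict(max_depth=5, colsample_bytree=0.7),
              dict(max_depth=15, colsample_bytree=0.3)]

for i, params in enumerate(param_sets, start=1):
    model = XGBClassifier(n_estimators=100, **params).fit(X_train, y_train)
    proba = model.predict_proba(X_val)[:, 1]
    # Sweep the probability threshold and record MCC at each point.
    thresholds = np.linspace(0.05, 0.95, 19)
    mcc = [matthews_corrcoef(y_val, proba > t) for t in thresholds]
    best = int(np.argmax(mcc))
    print(f"set {i}: best threshold {thresholds[best]:.2f}, "
          f"MCC {mcc[best]:.3f}")
```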
V. CONCLUSION

In our study, we considered stacking approaches for time series forecasting and for logistic regression with highly imbalanced data. Using multilevel stacking models, one can receive more precise results in comparison with single models. For stacking machine learning models, linear regression with Lasso regularization, another machine learning model, or a Bayesian model can be used. Using a stacking model on the second level, with covariates that are predicted by the machine learning models on the first level, makes it possible to take into account the differences in the results of machine learning models received for different sets of parameters and subsets of samples. As the obtained results show, using a stacking approach for machine learning models, we can improve the performance of predictive models.

REFERENCES

[1] Kaggle: Your Home for Data Science. URL: http://kaggle.com
[2] B. M. Pavlyshenko, "Linear, machine learning and probabilistic approaches for time series analysis," in IEEE First International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, pp. 377-381, August 23-27, 2016.
[3] "Rossmann Store Sales," Kaggle.com. URL: http://www.kaggle.com/c/rossmann-store-sales
[4] D. H. Wolpert, "Stacked generalization," Neural Networks, 5(2), pp. 241-259, 1992.
[5] Kaggle competition "Grupo Bimbo Inventory Demand". URL: https://www.kaggle.com/c/grupo-bimbo-inventory-demand
[6] Kaggle competition "Grupo Bimbo Inventory Demand", #1 Place Solution of The Slippery Appraisals team. URL: https://www.kaggle.com/c/grupo-bimbo-inventory-demand/discussion/23863
[7] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 785-794.
[8] Kaggle competition "Grupo Bimbo Inventory Demand", Bimbo XGBoost R script LB:0.457. URL: https://www.kaggle.com/bpavlyshenko/bimbo-xgboost-r-script-lb-0-457
[9] J. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, 29(5):1189-1232, 2001.
[10] J. Friedman, "Stochastic gradient boosting," Computational Statistics & Data Analysis, 38(4):367-378, 2002.
[11] Kaggle competition "Bosch Production Line Performance". URL: https://www.kaggle.com/c/bosch-production-line-performance
[12] B. Pavlyshenko, "Machine learning, linear and Bayesian models for logistic regression in failure detection problems," in IEEE International Conference on Big Data (Big Data), Washington D.C., USA, pp. 2046-2050, December 5-8, 2016.
[13] Kaggle competition "Bosch Production Line Performance", The Magical Feature: from LB 0.3- to 0.4+. URL: https://www.kaggle.com/c/bosch-production-line-performance/forums/t/24065/the-magical-feature-from-lb-0-3-to-0-4
[14] Kaggle competition "Bosch Production Line Performance", Road-2-0.4+. URL: https://www.kaggle.com/mmueller/bosch-production-line-performance/road-2-0-4
[15] Kaggle competition "Bosch Production Line Performance", Road-2-0.4+ --> FeatureSet++. URL: https://www.kaggle.com/alexxanderlarko/bosch-production-line-performance/road-2-0-4-featureset
[16] J. Kruschke, Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.
[17] M. Plummer, JAGS Version 3.4.0 user manual. URL: http://sourceforge.net/projects/mcmcjags/files/Manuals/3.x/jags_user_manual.pdf