International Journal of Computational Intelligence Systems (2023) 16:93

https://doi.org/10.1007/s44196-023-00276-9

RESEARCH ARTICLE

Using Social Network Sentiment Analysis and Genetic Algorithm to Improve the Stock Prediction Accuracy of the Deep Learning‑Based Approach
Jia‑Yen Huang1 · Chun‑Liang Tung1 · Wei‑Zhen Lin1

Received: 27 December 2022 / Accepted: 21 May 2023


© The Author(s) 2023

Abstract
Traditionally, most investment tools used to predict stocks are based on quantitative variables, such as finance and capital
flow. With the widespread impact of the Internet, investors and investment institutions designing investment strategies
are also referring to online comments and discussions. However, multiple information sources, along with uncertainties
accompanying international political and economic events and the recent pandemic, have left investors concerned about
information interpretation approaches that could aid investment decision-making. To this end, this study proposes a method
that combines social media sentiment, genetic algorithm (GA), and deep learning to predict changes in stock prices. First, it
employs a hybrid genetic algorithm (HGA) combined with machine learning to identify chip-based indicators closely related
to fluctuations in stock prices and then uses them as input for long short-term memory (LSTM) to establish a prediction
model. Next, this study proposes five sentiment variables to analyze PTT social media discussion of TSMC’s stock and performs
a grey relational analysis (GRA) to identify the sentiment variables most closely related to stock price fluctuations. The
sentiment variables are then combined with the selected chip-based indicators as input to build the LSTM prediction model. To
improve the efficiency of the LSTM analysis, this study applies the Taguchi method to optimize the hyper-parameters. The
results show that the proposed method of using HGA-screened chip-based variables and social media sentiment variables
as input to establish an LSTM prediction model can effectively improve the prediction accuracy of stock price fluctuations.

Keywords Stock prediction accuracy · Genetic algorithm · Social media sentiment · COVID-19 pandemic · Deep learning ·
Taguchi method

* Chun‑Liang Tung
Jia‑Yen Huang
Wei‑Zhen Lin

1 Department of Information Management, National Chin-Yi University of Technology, No.57, Sec. 2, Zhongshan Rd., Taiping District, Taichung City 41170, Taiwan, ROC

Abbreviations
ACC       Prediction accuracy
ANN       Artificial neural network
ANSI      Aggregate news sentiment index
ARIMA     Autoregressive integrated moving average model
ATNN      Adaptive time delay neural networks
CNN       Convolution neural networks
COVID-19  Coronavirus disease 2019
DB        Dealer buy
DBN       Deep belief networks
DJIA      Dow Jones Industrial Average
DNN       Deep neural networks
DOMPSL    Day offset of margin purchasing and short selling
DS        Dealer sell
DTN       Day trading number
DTT       Day trading turnover
DTV       Day trading volume
FCIB      Foreign/Chinese investors buy
FCIS      Foreign/Chinese investors sell
FN        False negative
FP        False positive
GA        Genetic algorithm
GRA       Grey relational analysis
HGA       Hybrid genetic algorithm
HHPIC     Taiwan Hon-Hai Precision Industry Co., Ltd.
HL        Huang and Liu sentiment score
ISSI      Islamic Sharia stock index
JS        Jing sentiment score
KNN       K-nearest neighbor
KOSPI     Korean stock price index
LR        Logistic regression
LSTM      Long short-term memory
MBL       Margin balance long
MBS       Margin balance short
MLB       Margin loan buy
MLCP      Margin loan cash pay
MLS       Margin loan sell
NBSTII    Net buy/sell of three institutional investors
NS        Negative sentiment score
NST       Number of shares traded
PNS       Positive and negative score
PS        Positive sentiment score
PTT       PTT bulletin board system
PFL       Price fluctuation limit
RAF       Random forests
RNN       Recurrent neural networks
RSI       Relative strength index
RWT       Random walk theory
SITB      Securities investment trust buy
SITS      Securities investment trust sell
SLB       Stock loan buy
SLCP      Stock loan cash pay
SLS       Stock loan sell
SSE       Shanghai stock exchange
SVM       Support vector machine
TDNN      Time delay neural networks
TIV       Turnover in value
TN        True negative
TP        True positive
TSMC      Taiwan Semiconductor Manufacturing Company
TV        Trading volume
TVIX      Taiwan volatility index

1 Introduction

Taiwan’s stock market continues to grow in scale, with the number of listed companies increasing from 698 to 954 between 2007 and 2021. Investors must, thus, make more rational investment decisions to efficiently select stocks worthy of investment from numerous targets. Early research has mainly adopted random walk theory (RWT), which claims that stock prices follow a random walk model and that prediction accuracy cannot exceed 50% [21]. Chen [10], however, pointed out that RWT is not suitable for the Taiwan Weighted Stock Index, and investors can predict the direction of future stock price movement from past stock price data. Schumaker et al. [39] stated that, using statistical and mathematical methods, value investment targets can be identified from a large amount of historical data to obtain stable, sustained, and above-average investment returns.

Data for various quantitative technical indicators, such as trading volume, earnings per share, and price–earnings ratio, are openly available to the public. These indicators are not only the basis for investors to gauge the movement direction of stock prices but are also often used by academicians to study stock price forecasts [52]. Studies have applied various machine learning tools to analyze changes in the stock market using technical indicator data. However, an excess of indicators is not necessarily conducive to prediction. Given that GA is a powerful machine learning tool for global searches, this study uses GA to screen important indicators leading to stock price fluctuations and further applies the recently emerged deep learning (DL) to predict stock prices.

Major international political and economic events, from the 2008 financial crisis to the COVID-19 pandemic in recent years, as well as the US–China confrontation and the Ukrainian–Russian war, have affected stock price volatility, rendering forecasting increasingly difficult. News related to such events or company operations often influences investors’ stock investment decisions. Several relevant studies have shown that, in addition to the quantitative variables of finance-related indicators, news information affects investors’ investment decisions, which in turn affects the change in stock prices [6, 7, 14, 43]. In addition to information from traditional media, investors are referring to opinions on social media or sharing their views on various online platforms. Thus, in addition to finance-related indicators, investor sentiment is an important variable when estimating stock prices.

In recent years, information technologies, such as text mining and web crawlers, have enabled social media comments to be automatically and objectively collected and processed on a large scale. Many scholars have converted news data from social media into quantitative sentiment information to predict stock price changes using machine learning [45]. Sentiment analysis of Chinese-language text can be difficult, given the need for text segmentation. Moreover, the sentiment dictionary resources are incomplete and difficult to construct [31]. Therefore, this study focuses on a sophisticated sentiment analysis technique that converts text from social media into predictive variables to help improve the forecasting accuracy for stock prices.

Stock price forecasting is a challenging task, given the volatile and non-linear nature of financial stock markets. To tackle this complex problem, this study proposes a complete and innovative forecasting framework comprising feature selection (FS) that combines GA and machine learning tools. In addition, it uses the Taguchi method to find the optimum configuration of the DL network architecture, identifies the sentiment variables most strongly correlated with stock fluctuations, integrates the technical indicators and sentiment variables in the DL model, and assesses the impact of COVID-19 on prediction accuracy.

The remainder of this paper is organized as follows. Section 2 reviews machine learning prediction models that are based on technical indicators related to the stock market and the use of social media mining to improve prediction models. This section also illustrates the importance of feature selection in forecast accuracy and reviews the literature on predictive modeling using the results of feature selection. Section 3 details opinion mining of social media and the proposed sentiment variables. It then develops the method of applying HGA to screen chip-based indicators that cause fluctuations in Taiwan Semiconductor Manufacturing Company’s (TSMC) stock prices. Section 4 uses the selected indicators to determine the highest prediction accuracy using LSTM. This section performs a grey relational analysis (GRA) to select the sentiment variables most closely associated with stock price volatility and then adds the variables to the LSTM model to evaluate their effect on improving prediction accuracy. Section 5 presents conclusions and recommendations for future research.

2 Literature Review

This section reviews research that applied machine learning techniques, feature selection, and social media mining to predict stock prices.

2.1 Machine Learning Prediction Model Based on Technical Indicators Related to Stock Markets

Early research primarily used statistics and machine learning techniques to forecast stock prices. However, the forecasting performance of statistical methods is not satisfactory. The autoregressive integrated moving average (ARIMA) model, for example, often requires more historical data to satisfy the statistical assumptions of normality. Empirical results show that machine learning techniques have superior predictive power over statistical models because they can identify hidden relationships between factors affecting stock markets and capture complex patterns in data without prior knowledge from input data [2].

There are various tools that can be employed in machine learning to predict stock market changes on the basis of technical indicator data, and each tool has its own advantages. Common tools include support vector machine (SVM), logistic regression (LR), artificial neural network (ANN), and deep learning [35]. Ballings et al. [5] compared various classifiers used to predict the fluctuation in stock prices and concluded that the use of tools for forecasting has positive implications for investments. LR, which classifies data by probability value, has the advantages of strong adaptability, robustness, and good model interpretability. Huang and Liu [21] selected more than 50 technical indexes, used LR to screen variables, and accordingly, established a predictive stock price model. SVM has high prediction accuracy and reports good performance in solving problems, such as small sample sizes, nonlinearity, and high dimensionality. Endri et al. [15] used SVM to develop an early warning system to predict the delisting of Islamic stocks (ISSI). Chen and Hao [12] proposed a hybrid framework using feature-weighted SVM and K-nearest neighbor (KNN) to predict stock market indices.

The use of evolutionary computation, especially GA, is not uncommon in the financial literature [3]. GA searches for optimal global solutions by imitating the concept of survival of the fittest, and its application range continues to expand extensively [3]. Since the algorithm is a variant of adaptive probabilistic search technologies, its selection, crossover, and mutation operations are performed from a probability viewpoint, thereby increasing the flexibility of its search process. Chung and Shin [13] used GA to optimize the parameter settings of neural networks, and Sezer et al. [40] and Sezer et al. [41] applied GA to optimize the technical indicators of stock markets. However, these studies adopt the simplistic approach of basing buying and selling points on technical indicators for the relative strength index (RSI).

Given its excellent predictability in image classification and natural language processing, deep learning has attracted widespread attention and has been applied to stock prediction in recent research. Representative methods of deep learning include deep belief networks (DBN), convolution neural networks (CNN), recurrent neural networks (RNN), and LSTM. LSTM is an improved RNN, which is one of the most advanced deep learning algorithms. However, relevant studies are insufficient in the field of financial forecasting. Li and Tam [30] used SVM and LSTM to predict the performance of various stocks listed on the Shanghai Stock Exchange (SSE 50 Index). Their results showed that SVM is suitable when forecasting low volatility stocks, and LSTM has the best overall performance for stocks ranging between medium and high volatility. Zhao et al. [54] used the Random Forest (RAF) regression algorithm for feature screening and proposed a hybrid model that combined an economic model with a deep learning model to improve the prediction accuracy of the option pricing model of CSI 300 ETF.

Singh and Srivastava [43] argued that stock market forecasting is a chaotic, complex, volatile, and dynamic time series problem. Given the failure of existing ANN methods to provide encouraging results, they proposed that deep learning can improve the accuracy of stock market forecasts.
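The selection, crossover, and mutation operations attributed to GA earlier in this subsection can be illustrated with a minimal sketch of GA-based feature screening. This is not the authors' HGA: the binary-mask chromosome, tournament selection, one-point crossover, elitism, every parameter value, and the function names `ga_select` and `toy_fitness` are assumptions made for illustration. In practice the fitness would presumably be the accuracy of a classifier trained on the selected indicators; here a made-up toy fitness stands in for it.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = indicator kept, 0 = indicator dropped


def tournament(pop: Sequence[Mask], scores: Sequence[float],
               rng: random.Random, k: int = 3) -> Mask:
    """Selection: return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def ga_select(n_features: int,
              fitness: Callable[[Mask], float],
              pop_size: int = 20,
              generations: int = 30,
              p_cross: float = 0.8,
              p_mut: float = 0.05,
              seed: int = 0) -> Mask:
    """Evolve binary feature masks and return the best one found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(m) for m in pop]
        nxt = [best[:]]  # elitism: the best mask survives unchanged
        while len(nxt) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Mutation: flip each gene independently with probability p_mut.
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


# Hypothetical stand-in for "accuracy of a model trained on the mask":
# reward three made-up informative indicators, penalize large subsets.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


N_INDICATORS = 25  # the paper collects 25 chip-based indicators
best_mask = ga_select(N_INDICATORS, toy_fitness)
print([i for i, g in enumerate(best_mask) if g])  # indices of kept indicators
```

Because the best mask of each generation is carried forward unchanged, the best fitness in this sketch never decreases from one generation to the next. Replacing `toy_fitness` with a cross-validated classifier score is the natural next step, at the cost of one model fit per fitness evaluation.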

Fischer and Krauss [16] used the LSTM network to predict the price movement direction of S&P 500 constituent stocks from 1992 to 2015 and showed that the LSTM network outperforms RAF, deep neural networks (DNN), LR, and other classifiers. Bao et al. [6] proposed a novel deep learning framework that incorporates methods, such as stacked autoencoders, wavelet transforms, and LSTMs, to predict stock prices.

Chung and Shin [13] pointed out that although LSTMs are powerful tools for solving time series and pattern recognition problems, they are subject to certain shortcomings. First, LSTMs cannot provide specific explanations for their prediction results. Second, like other neural network models, LSTMs have many parameters that must be tuned, such as the number of layers, neurons per layer, and the number of time lags. However, time and computational constraints make it impossible to examine all parameter spaces and find the optimal parameter set, and thus, the setting of these control parameters often depends on the researcher’s experience. To solve this problem, Kim and Shin [28] applied GA to assist both adaptive time delay neural networks (ATNN) and time delay neural networks (TDNN) to optimize time delay values and network architecture. Chung and Shin [13] used GA to optimize the temporal pattern of daily data for the Korean Stock Price Index (KOSPI) and detection-related architectural factors, such as time window size and the number of LSTM units in hidden layers. Chung and Shin [13] claimed that their LSTM network comprises two hidden layers and is characterized by deep architecture that can effectively express the non-linear and complex nature of stock markets. However, the optimal number of neurons tends to differ when the number of hidden layers varies.

2.2 Feature Selection

In recent years, numerous studies from a wide range of fields have applied FS to improve the accuracy and efficiency of predictive models [42]. This section briefly reviews research on GA-based feature engineering in non-financial fields. Yang et al. [51] present a fault diagnosis approach comprising an FS method based on GA and then use selected features as input attributes for classification. Using a typical classification method, such as SVM and random forest (RF), they show that the optimized feature space is superior to the original feature space. Pei et al. [36] combine GA with cascaded RF for FS to expedite image classification and improve recognition accuracy. Khan et al. [27] apply GA to a texture-based feature descriptor to remove possible redundant features. Evaluating the proposed classification framework on a standard 7-class microstructural image dataset offers impressive outcomes, confirming its superiority over certain state-of-the-art methods.

FS can be applied in numerous ways to improve forecasting accuracy in finance-related fields. Yuan et al. [53], for example, use RF and SVM-recursive feature elimination (SVM-RFE) in their feature engineering analysis to predict the Chinese stock market. Zhao et al. [54] use RF in their regression algorithm for FS and propose a hybrid model that combines an economic model with a DL model to improve the prediction accuracy of an option pricing model for CSI 300 ETF.

In addition to the above-mentioned machine learning methods, many studies have used GA with FS to improve prediction accuracy in finance-related fields. Abraham et al. [1] use stock prices recorded over the past 10 days and the historical data of four international stock indices (S&P 500, NIKKEI 225, CAC40, DAX) as the chromosomes for their feature engineering with GA and establish a prediction model with RF. They select 15 stocks across three sectors (technology, finance, and health) for stock price forecasting and find that the most relevant indicator for stock price changes is S&P 500. This result is not surprising given that most of the 15 stocks are US-based companies. Moreover, the authors treat stock movement prediction as a binary classification problem. They classify stock trends as an uptrend if the change in daily return price is greater than 0.5% and a non-uptrend otherwise. While the data for their study were collected for more than 17 months, the four stock indices rose more and fell less during their analysis period, creating an imbalanced dataset.

GA is one of the most widely used meta-heuristic and evolutionary algorithms in various domains, such as stock trend prediction and determination of optimal model configuration for DL. Sharma et al. [42] propose a method with ANN and GA for stock market forecasting. Their study uses historical data for DOW 30 and NASDAQ 100 and reveals that the accuracy of the hybrid model is greater than that of a single ANN technique. Considering the significant dependence of LSTM performance on architectural design, Suddle et al. [44] develop GA-based algorithms to automatically optimize the LSTM architecture for a sentiment analysis. Oyedele et al. [33] developed a GA-tuned framework to obtain a generalized prediction for the daily closing prices of cryptocurrency.

The literature on stock market forecasting commonly adopts feature selection algorithms, such as principal component analysis (PCA), stepwise regression analysis (SRS), and information gain and decision tree (DT). However, these feature selection algorithms are unable to determine the direct influence of stock features on stock price [4]. Researchers have also adopted other feature selection methods. Peng et al. [37], for example, discuss feature selection in the context of DNN models to predict stock price direction. They apply three feature selection methods to reduce the feature dimension from a set of 124 technical analysis

indicators, namely the sequential forward floating selection (SFFS) algorithm, tournament screening (TS) algorithm, and least absolute shrinkage and selection operator (LASSO). Among these, TS is analogous to a genetic algorithm. Given the success of GA in solving complex optimization problems, researchers from various fields, including image classification, medical gene identification, and visual human action recognition, have used GA for feature selection [38]. However, few studies discuss the application of GA to feature selection in stock market prediction.

The work of Huang and Liu [21] is the most analogous to our study. Focusing on stock price forecasting, their research uses chip-based indicators as predictive variables and conducts feature selection. They select more than 50 technical indexes, use LR to screen variables, and accordingly, establish a predictive stock price model. However, it is difficult to describe the moving tendency of stock prices given their non-linear, non-stationary, and noisy characteristics [4]. In view of the nonlinearity in stock data, a model developed using a traditional or single intelligent technique may not accurately forecast results [42]. LR cannot solve nonlinear problems and is sensitive to multi-collinear data; thus, many studies have stated that it is inappropriate to use LR for feature selection. Wu et al. [50], for example, highlight the difficulty of using LR to screen features and note that the quality of feature engineering cannot be guaranteed. Therefore, they propose a gradient-boosting decision tree algorithm to improve the quality of feature engineering.

The literature review presented thus far highlights the positive impact of GA in feature engineering across fields. Thus, this study adopts the feature selection method with GA. Another important step in performing a GA analysis is chromosome setting. Each study uses different variables to set chromosomes, and this autonomy allows for research creativity. This study is the first to use chip-based indices as the genes that make up the chromosome.

2.3 Integration of Social Media Mining and Stock Price Prediction

In addition to chip-based indicators, stock markets are affected by other uncertainties, such as opinions shared through news and social media. News and social media have become valuable resources when mining public sentiment using text sentiment analysis technology. Predicting stock price fluctuations by mining news texts is an emerging field of data mining research. Several studies have abandoned the traditional method of predicting stock prices on the basis of technical variables and simply explored the relationship between news sentiment and stock prices or combined technical variables and news sentiment to predict stock prices [6]. Nti et al. [32] presented a detailed review of research combining technical variables and social media sentiment to predict stock market trends.

Schumaker et al. [39] developed a stock price prediction engine to analyze financial news sentiment. The experimental results showed that the investment performance of this system is better than that of the market average. Peng et al. [37] applied features extracted from historical price data and financial news to DNNs to predict stock movements and showed that financial news could significantly improve forecasting accuracy. Cakra and Trisedya [8] examined the sentiment of Twitter posts to predict the direction of Indonesian stock market volatility. Pagolu et al. [34] studied the correlation between the emotional state of Twitter users and the price of the Dow Jones Industrial Average (DJIA) and found a strong correlation between the two.

Vargas et al. [46] applied deep learning to predict daily stock price movements on the basis of financial news headlines and technical indicators. In addition to comparing two sets of technical indicators, their study compared a financial news hybrid model composed of CNN, an LSTM hybrid model for technical indicators, and an I-RNN model comprising technical indicators. Their results showed that CNN is better than RNN in extracting semantics from text, while RNN performs better in capturing contextual information and in simulating complex temporal characteristics. In addition, financial news plays a major role in obtaining stable results, while the two sets of technical indicators have little effect on forecast accuracy. Wu et al. [50] combined stock forum posts, financial news, technical indicators, and stock historical transaction data as the feature set for stock price prediction and adopted LSTM to predict the China Shanghai A-share market. Since only three technical indicators were considered in their study, namely the stochastic oscillator index, Williams index, and relative strength index, their accuracy is still far from sufficient [29].

Since information on stock purchases and sales in Taiwan is publicly available, it is not uncommon for academicians to use technical variables to make stock selection decisions in the context of the Taiwan Stock Exchange. Huang et al. [24], for example, used a wrapper approach to select the best subset of features from an original feature set containing 23 technical indicators and then applied a voting scheme combining different classification algorithms to predict the trend of the Korean and Taiwanese stock markets. Wei et al. [47] constructed an aggregate news sentiment index (ANSI) on the basis of Chinese financial news related to all companies listed on the Taiwan Stock Exchange. A rise in the index is accompanied by an increased trade value and a reduction in investor fear, as indicated by the Taiwan volatility index (TVIX). Their findings proved that sentiment levels reflected in news reports could be effectively used as a reference for investment decisions. Chang et al. [9] examined 50 Taiwanese constituent stocks using a vector auto-regression

model and showed that Taiwan’s large-capitalization stocks are most affected by corporate and foreign investments. Wu et al. [49] showed that news variables provide information that is useful in predicting Taiwan stock market returns, and the prediction accuracy is higher when the stock market is booming. Huang and Liu [21] used binary LR to establish a forecasting model for three price-level changes (0%, 0.5%, and 1.0% and above) for the stocks of Taiwan Hon-Hai Precision Industry Co., Ltd. (HHPIC). However, their analysis results were not statistically significant owing to the relatively small dataset. Huang and Liu [21] integrated chip-based indicators and sentiment variables with their LR forecasting model and claimed that sentiment variables could be used to improve forecast accuracy. However, they did not evaluate the impact of the selected variables on forecast accuracy, and the selected variables changed when predicting the percentage of price fluctuation for different stocks.

In sum, the accuracy of stock prediction is closely related to the selection of predictor variables. Thus, this study proposes the use of GA to screen chip-based indicators and combines the results with sentiment variables as input features for deep learning to build a prediction model. Since the collected data span the period before and after the pandemic, this study also compares changes in important chip-based indicators and social media sentiment before and after the pandemic.

3 Methodology

3.1 Research Framework

This research references two types of data: chip-based indicators and social media data. The first is obtained from the Taiwan Stock Market Observation Post System for TSMC, and the second is reviews and corresponding replies posted about TSMC on PTT’s bulletin board. Both datasets were collected between January 1, 2019, and the end of April 2021. This study performed preprocessing and word segmentation on PTT’s corpus and then extracted opinion words to establish quantitative variables for social media sentiment. It then applied a hybrid GA to screen the chip-based variables related to price fluctuations for TSMC’s stock and used LSTM to establish the prediction model. Figure 1 illustrates the research framework.

Fig. 1  Research framework
[Flowchart; recoverable box labels: Collect stock price and chip-based data for TSMC; Data normalization; Start of GA; Machine learning prediction; End of GA; DL; Optimal prediction variables and model; Collect reviews and replies related to TSMC; Phrase rules for reviews and …; Sentiment scores; GRA; Taguchi method]

3.2 Data for Chip‑Based Indicators

Data collected for TSMC’s 25 chip-based indicators can be divided into four categories: transaction information, buy/sell information for three major institutional investors, margin/stock loan, and day trading information (see Table 1).

Referencing Hong et al. [17], this study sets February 21, 2020, as the cut-off date for the occurrence of COVID-19. Therefore, the period before the pandemic is from January 1, 2019, to February 20, 2020, and that after the pandemic is between February 21, 2020, and April 29, 2021. The overall period includes 561 trading days, excluding the dates when the stock market was closed. The period after the pandemic has 290 trading days.

3.3 Opinion Mining of Social Media

Globally well-known social media platforms, such as Facebook, YouTube, Line, and Instagram, have their own main functions and features that attract users. PTT is built on the resources of Taiwan’s academic network and serves as an instant online discussion platform. It is one of the most influential and widely used online forums in Taiwan [11], with good features for instant interactions and rapid dissemination speed. The platform has 1.5 million registered users who contribute more than 20,000 reviews and 500,000 replies per day. The most frequent users belong to the age group of 20–45 years, and most of them are employed and are financially able.

Articles on PTT are composed of two parts: reviews and replies to the reviews. The author of an article comments on a specific subject in the form of a review, and users post responses to the review as replies. Both parts use different grammatical structures given the purpose and professional quality of users. Most review authors have

13
International Journal of Computational Intelligence Systems (2023) 16:93 Page 7 of 14 93

Table 1  List of collected chip-based indicators


Categories Indicators
Transaction information Turnover in value (TIV) Trading volume (TV) Number of shares traded Price fluctuation limit (PFL)
(NST)

Buy/sell of three major Foreign/Chinese investors Securities investment trust Dealer buy (DB) Net buy/sell of three institu-
institutional investors buy (FCIB) buy (SITB) tional investors (NBSTII)
Foreign/Chinese investors Securities investment trust Dealer sell (DS)
sell (FCIS) sell (SITS)
Margin loan/ Stock loan Margin loan buy (MLB) Stock loan buy (SLB) Margin loan cash pay Margin loan balance (MLB)
(MLCP)
Margin loan sell (MLS) Stock loan sell (SLS) Stock loan cash pay (SLCP) Stock loan balance (SLB)
Margin balance long (MBL) Margin balance short Day offset of margin pur-
(MBS) chasing and short selling
(DOMPSL)
Day trading information Day trading turnover (DTT) Day trading volume (DTV) Day trading Number (DTN)

strong professional literacy in the stock market and express their views on current affairs with longer and more rigorous phrases and professional words. Replies, on the other hand, reflect users’ feelings in response to a review. Their content is relatively brief and casual, and the grammatical structure differs considerably from that of reviews. To analyze sentiments on social media, this study referred to Huang and Liu’s [21] method to establish phrase rules for reviews and to Huang and Lu’s [20] approach to determine phrase rules for replies.

This study collected TSMC-related information from 2,177 reviews and about 30,000 replies on PTT. Since the occurrence of degree words in replies is relatively low, this study set degree words to three levels (i.e., over, very, and extreme) and assigned the levels 1, 2, and 3 points, respectively. When negation words, such as “no” and “does not have”, appeared in a sentence, the positive or negative meaning of the sentence was likely to be reversed. If a negation word was used before or after a positive opinion word, the sentence was classified as expressing negative emotion. Conversely, if a negation word appeared before or after a negative opinion word, the sentence was classified as expressing positive emotion.

Based on the assumption that all replies reflect the same emotional direction as the review, Huang and Liu [21] estimated an article’s sentiment score as the review’s sentiment score multiplied by the number of replies. However, although replies respond to the review, their emotional polarity may not point in the same direction. Therefore, it is worth re-evaluating whether the number of replies is an appropriate weight in calculating sentiment scores. To select suitable sentiment scores to enhance prediction accuracy, this study proposed five sentiment variables (see Table 2 for an example).

Table 2  Example of PTT’s sentiment score calculation for TSMC

Degree (score) | Number of reviews and replies that match phrase rules: positive number | Number of reviews and replies that match phrase rules: negative number | Scores after considering degree words: positive score | Scores after considering degree words: negative score
Over (1) | 3 | 4 | 3 | −4
Very (2) | 4 | 1 | 8 | −2
Extreme (3) | 2 | 3 | 6 | −9
PS score | 17
NS score | −15
PNS score | 2

(1) Positive sentiment score (PS): This variable assumes that the direction of TSMC’s stock movement is associated with positive sentiments on PTT, and its value is the sum of positive scores for reviews and replies on that day. As shown in Table 2, this variable scores 17 points on the basis of all reviews and replies that satisfy the phrase rules on that day.
(2) Negative sentiment score (NS): This variable assumes that the direction of TSMC’s stock movement is related to negative sentiments on PTT, and its value is the sum of negative scores for reviews and replies on that day. The variable reports a score of −15 points on the basis of all reviews and replies that meet the phrase rules on that day (Table 2).
(3) Sum of positive and negative scores (PNS): This variable summarizes whether the overall reaction of PTT users on a given day is positive or negative toward TSMC’s stock price. Its value is the sum of the positive
and negative sentiment scores on that day. The score for this indicator on that day is 2 points (Table 2).
(4) Huang and Liu sentiment score (HL): This variable, proposed by Huang and Liu [21], is based on the sentiment of a review, and the number of corresponding replies is used as the weight. The variable does not quantify reply sentiment. For instance, if the review sentiment score is 10 points and the number of replies is 30, the total sentiment score for the article is 300. The sentiment score for this variable is the sum of sentiment scores for all articles on a given day.
(5) Jing sentiment score (JS): This variable follows the scoring method of Jing et al. [26], which records the number of positive and negative reviews on a given day and uses the total number of reviews on the day as the denominator to calculate the sentiment score of the day. It is estimated as the total number of positive reviews minus the total number of negative reviews, divided by the total number of review articles on the day. This variable does not account for the sentiment or the number of replies.

Among the above-mentioned sentiment variables, this study used GRA to identify the variables most related to fluctuations in TSMC’s stock price. Accordingly, it incorporated these variables into the forecast model to evaluate their effect on forecast accuracy. One of the major advantages of GRA is its ability to identify major correlations among factors of a system [19]. The steps for GRA are as follows:

(1) Definition of reference sequence and comparability sequences: In this study, the reference sequence, $Y = (y_1, y_2, \ldots, y_n)$, is the daily rise or fall record for TSMC’s stock price, and the comparability sequences, $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,n})$, are the daily scores for sentiment variable $i$, where $i = 1, 2, \ldots, 5$ indexes the five above-mentioned sentiment variables.
(2) Normalization of comparability sequences: Using Eq. (1), $x_{i,j}$ was converted into a number between 0 and 1, where $x_i^{\max}$ and $x_i^{\min}$ denote the maximum and minimum value of each sentiment variable.

$$x_{i,j} = \frac{x_{i,j} - x_i^{\min}}{x_i^{\max} - x_i^{\min}}. \tag{1}$$

(3) Calculation of grey relational coefficient $\varsigma_{ij}$ for each sentiment variable:

$$\varsigma_{ij} = \frac{\Delta_{\min} + \xi \times \Delta_{\max}}{\Delta_{ij} + \xi \times \Delta_{\max}}, \tag{2}$$

where $\Delta_{ij}$ is the deviation sequence of the reference sequence ($Y$) and comparability sequence ($X_i$), that is, $\Delta_{ij} = \lVert y_j - x_{i,j} \rVert$, where $i$ stands for the sentiment variable and $j$ denotes the date. The distinguishing coefficient $\xi$ is set at 0.3. $\Delta_{\max}$ is the largest value of $\Delta_{ij}$, and $\Delta_{\min}$ is the smallest value of $\Delta_{ij}$.

(4) Computation of grey relational grade: The grey relational grade $\Gamma_i$ of each sentiment variable was estimated by averaging the grey relational coefficients corresponding to that sentiment variable:

$$\Gamma_i = \frac{1}{n} \sum_{j=1}^{n} \varsigma_{ij}, \quad i = 1, 2, \ldots, 5. \tag{3}$$

3.4 Hybrid Genetic Algorithm

This study focused on predicting whether the increase in TSMC’s stock price is greater than 0%, which is a binary classification problem. Thus, it selected common machine learning classifiers to establish prediction models, including decision tree (DT), LR, and SVM. The study combined GA with these machine learning tools to find the optimal combination of chip-based indicators; the resulting method is therefore called a hybrid genetic algorithm (HGA).

In the present HGA process, the initial population of chromosomes is randomly generated bit by bit. If the gene corresponding to a certain chip-based indicator is set to 0, the variable is not included in the classifier analysis; if it is set to 1, the variable is included. The selection mechanism simulates the evolutionary process; that is, the best chromosomes have more copies in the next generation, and the worst ones perish. The priority order of each chromosome during the evolution is based on its fitness function value, which is set to the prediction accuracy of the machine learning classifier. This study adopted the roulette wheel mechanism to select chromosomes for reproduction in the HGA. Following Huang and Yao’s [25] elitism strategy, it also directly copied the 10% of chromosomes with the best fitness values to the next generation. The step-by-step procedure for the HGA is as follows:

Step 1. Input the dataset and GA parameters with a chromosome length of 25 (i.e., the number of chip-based indicators); a crossover rate of 0.8; and a mutation rate of 0.003846, which is obtained following Huang and Wu’s [23] suggestion of 1/(chromosome length + 1). Set gen = 1 and Rec.dat = ∅.

Step 2. Randomly generate binary strings for the chromosomes in the initial population, whose size is set to 100.

Step 3. Simulate the evolutionary process by applying selection and reproduction, followed by crossover and mutation, to the current population to generate a new population in the next generation.
Step 4. Check whether each chromosome exists in Rec.dat. If all the chromosomes are recorded in Rec.dat, proceed to Step 5. Chromosomes that are not stored in Rec.dat are used by machine learning classifiers to build prediction models. Each chromosome is evaluated with ten-fold cross-validation, and its average prediction accuracy is used as the fitness function value.

Step 5. If the termination condition is satisfied, stop the evolutionary process and select the solution with the largest objective value in Rec.dat as the best solution. Otherwise, set gen = gen + 1 and go back to Step 3. The GA operation is terminated when the number of generations reaches 300 or the best solution has not improved in the last 50 generations.

Prediction accuracy (ACC) in Eq. (4) is based on the classification table of prediction models (Table 3) and is calculated by comparing the predictions with the actual values.

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{4}$$

When the predicted value is the same as the actual value, the prediction is correct and can be subdivided into true positive (TP) and true negative (TN), as shown in Table 3. There are also two cases of inaccurate predictions: false positive (FP) and false negative (FN). FP is the number of cases incorrectly predicted as a stock price increase of more than 0%, and FN is the number of cases incorrectly predicted as not increasing by more than 0%.

Table 3  Confusion matrix

 | Prediction |
Actual | TP | FN
 | FP | TN

3.5 Analysis of LSTM Prediction Model Incorporated with Taguchi Method

This study used the combination of chip-based indicators obtained by HGA as the input features for LSTM to evaluate the effect of deep learning on improving prediction accuracy. To address the vanishing gradient problem in training on long sequences, LSTM uses three control gates (input gate, output gate, and forget gate) to determine when to update the memory. The input gate decides which data should enter long-term memory; the output gate identifies which results to output; and the forget gate uses a sigmoid function to establish whether to retain or forget each feature at a specific time step in the cell. Since LSTM generally includes multiple hidden layers, and each hidden layer has multiple neurons, the number of parameters is often very large. Given the computational limitations, it is impractical to use a brute force method to find the best parameter combination. Thus, this study adopted the Taguchi method to optimize the selection of LSTM parameters.

Table 4  Factors and levels of Taguchi orthogonal arrays

Factors | Level 1 | Level 2 | Level 3 | Level 4
Neurons | 16 | 32 | 64 | 128
Batch size | 8 | 16 | 24 | 32
Epoch | 250 | 500 | 750 | 1000
Time steps | 30 | 45 | 60 | 75

Table 5  $L_{16}(4^5)$ orthogonal array

Exp | 1 (Neurons) | 2 (Batch size) | 3 (Blank) | 4 (Epoch) | 5 (Time steps)
1 | 16 | 8 | | 250 | 30
2 | 16 | 16 | | 500 | 45
3 | 16 | 24 | | 750 | 60
4 | 16 | 32 | | 1000 | 75
5 | 32 | 8 | | 750 | 75
6 | 32 | 16 | | 1000 | 60
7 | 32 | 24 | | 250 | 45
8 | 32 | 32 | | 500 | 30
9 | 64 | 8 | | 1000 | 45
10 | 64 | 16 | | 750 | 30
11 | 64 | 24 | | 500 | 75
12 | 64 | 32 | | 250 | 60
13 | 128 | 8 | | 500 | 60
14 | 128 | 16 | | 250 | 75
15 | 128 | 24 | | 1000 | 30
16 | 128 | 32 | | 750 | 45

The Taguchi method uses orthogonal arrays to obtain effective statistical data with fewer experiments. Huang and Tsai [22] used the Taguchi method to determine the optimal combination of factors that may affect the prediction accuracy of SVM. Hsieh et al. [18] used orthogonal arrays to find appropriate hyper-parameters for a back-propagation neural network (BPNN), including the number of neurons in the hidden layer, the learning rate, and the momentum. To avoid inefficient trial and error, this study utilized orthogonal arrays to optimize LSTM hyper-parameter combinations, including the number of hidden layer neurons, learning rate, batch size, number of epochs, and time steps (Table 4). This study used an $L_{16}(4^5)$ orthogonal array (Table 5). Each factor is set to four levels, and the quality characteristic is prediction accuracy. To avoid interaction effects, the third column of the orthogonal array is intentionally left blank. Each experiment is conducted 3–5 times, and the value
for signal-to-noise ratio (S/N) is used to create a response table and diagram to determine the optimal combination.

4 Analysis and Discussion

4.1 Screening Results for Chip‑Based Indicators

This study used GA and three machine learning tools to screen 25 chip-based indicators for the subsequent LSTM analysis. It employed Scikit-Learn, a machine learning library for Python, to perform machine learning. Apart from the parameters left at the software defaults, the study performed a grid search to identify the best parameter settings for each method (see Table 6).

Table 6  Parameter settings for machine learning tools

Machine learning tool | Parameter settings
SVM | C = 495.0, kernel = 'rbf', gamma = 1/(number of features + 1)
LR | penalty = 'l2', C = 22.0
DT | criterion = 'gini', splitter = 'best'

To identify the impact of the pandemic on important chip-based indicators affecting stock prices, this study divided the data into two periods: the overall period (from January 1, 2019, to April 29, 2021) and the period after the pandemic (from February 21, 2020, to April 29, 2021). Table 7 presents the HGA classification results for the two periods. SVM outperforms the other two methods in both periods. For SVM and LR, models built on post-pandemic data achieve higher prediction accuracy (59.65% and 57.93%, compared with 57.22% and 54.36% for the overall period), whereas DT shows no improvement (56.55% compared with 57.20%). Overall, Table 7 suggests that modeling the post-pandemic period separately can help improve prediction performance.

Table 7  Prediction results for HGA using different data periods

 | SVM (%) | LR (%) | DT (%)
Overall period | 57.22 | 54.36 | 57.20
Post-pandemic period | 59.65 | 57.93 | 56.55

4.2 LSTM Analysis Results Using the Taguchi Method to Adjust Hyper‑Parameters

This study used the chip-based indicators screened using HGA as the training features for LSTM. Then, adopting the Taguchi method to determine hyper-parameters, it explored the degree to which LSTM can improve prediction accuracy. After repeated experiments, LSTM was found to converge more easily with five hidden layers. Therefore, this study used five hidden layers when determining the most suitable hyper-parameter combination for modeling. Indicator combinations and sets of optimal hyper-parameters differ by machine learning approach, and thus, Taguchi experiments must be conducted separately. In this study, each experiment with the Taguchi orthogonal array was conducted ten times, and the accuracy was averaged to avoid statistical bias. Owing to space limitations, this study only presents the results for HGA using SVM as the classifier in the overall period (Fig. 2).

Fig. 2  Response diagram for HGA using SVM as the classifier

Among the 16 experiments on the $L_{16}(4^5)$ orthogonal array, the best prediction accuracy of 58.89% was reported in the 10th experiment, which was a combination of [A3, B2, C3, D1] (i.e., [64, 16, 750, 30]). A larger S/N value is preferable for the quality characteristic, that is, prediction accuracy. Accordingly, [A4, B2, C3, D1] (i.e., [128, 16, 750, 30]) was selected from Fig. 2 as the most suitable combination of hyper-parameters to build the LSTM models. However, since this combination did not exist in the 16 Taguchi experiments, a confirmation test was required. In the confirmation test, this combination reached a prediction accuracy of 60.00%, which was indeed an improvement over the previous best of 58.89%. Thus, the Taguchi method can effectively optimize LSTM hyper-parameters.

4.3 Quantitative Analysis of Sentiment Variables

To use sentiment variables as predictors of stock price fluctuations, this study adopted GRA to evaluate the correlation between the five above-mentioned sentiment variables and stock price changes. As shown in Table 8, among the five sentiment variables, the negative sentiment score has the highest grey relational grade, followed by the positive sentiment score.
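The grey relational grade defined by Eqs. (1)–(3) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors’ implementation: the function names, the 0/1 coding of the daily rise or fall record, the treatment of constant series, the choice to take Δmin and Δmax over all variables and all days, and the toy numbers are all assumptions made for illustration. Only the distinguishing coefficient of 0.3 is taken from the text.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Eq. (1): rescale one sentiment variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def grey_relational_grades(
    reference: Sequence[float],
    comparisons: Sequence[Sequence[float]],
    xi: float = 0.3,
) -> List[float]:
    """Return the grey relational grade of each comparability sequence."""
    normalized = [min_max_normalize(seq) for seq in comparisons]
    # Deviation sequences: deltas[i][j] = |y_j - x_ij|.
    deltas = [[abs(y - x) for y, x in zip(reference, seq)] for seq in normalized]
    # Assumption: the extremes are taken over all variables and all days.
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every normalized sequence coincides with the reference.
        return [1.0 for _ in deltas]
    grades = []
    for row in deltas:
        coeffs = [(d_min + xi * d_max) / (d + xi * d_max) for d in row]  # Eq. (2)
        grades.append(sum(coeffs) / len(coeffs))  # Eq. (3)
    return grades


if __name__ == "__main__":
    # Toy data, invented for illustration only (not the paper's data).
    rise = [1, 0, 1, 1, 0]        # 1 = price rose, 0 = it did not
    ps = [17, 3, 12, 20, 5]       # hypothetical daily PS scores
    ns = [-15, -20, -4, -6, -18]  # hypothetical daily NS scores
    for name, grade in zip(["PS", "NS"], grey_relational_grades(rise, [ps, ns])):
        print(f"{name}: {grade:.3f}")
```

Ranking the returned grades from largest to smallest gives the kind of ordering reported for the five sentiment variables; a variable whose normalized series matches the reference exactly receives a grade of 1.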

Table 8  GRA analysis results for five sentiment variables

Sentiment variables | Grey relational grade
PS | 0.550
NS | 0.572
PNS | 0.512
HL | 0.460
JS | 0.458

Table 9  Changes in sentiment score before and after the pandemic

Period | Number of articles | PS score | NS score
Pre-pandemic | 485 | 5732 | −4968
Post-pandemic | 1692 | 16,546 | −14,517
Summation | 2177 | 22,278 | −19,485

Table 9 shows that after the outbreak of the pandemic, the number of discussions about TSMC on PTT increased, and the overall number of articles and total sentiment scores were significantly greater than those before the outbreak. Irrespective of the period, the scores for positive and negative sentiments were similar in magnitude, and the total scores for positive sentiment were slightly higher than those for negative sentiment. The PNS variable ranked third, although simply adding the positive and negative scores of the day does not seem to be an ideal sentiment variable. This is because when a major financial or political event occurs, PTT users tend to speak more enthusiastically, and both positive and negative sentiment scores are high; however, the sum of both scores may be low. This result may be indistinguishable from the case when users express few opinions on the PTT platform. Since NS only represents negative sentiment, and the grey relational grade of PS is close to that of NS, this study subsequently added both variables to LSTM to model predictions.

4.4 Prediction Model Performance of Sentiment Variables in LSTM

As mentioned in subsection 4.1, using SVM as a classifier in GA can help obtain the best combination of chip-based indicators, including PFL, SITB, SITS, DS, MLS, MLCP, SLB, and SLB. In addition to these eight variables, this study added two sentiment variables, NS and PS, selected using GRA. A total of 10 indicators were included in LSTM to rebuild the prediction model. Table 10 shows that after adding the sentiment indicators, prediction accuracy can reach 62.22%, which is 2.22 percentage points higher than the prediction accuracy estimated using the prediction model without sentiment variables. Therefore, adding the two sentiment variables proposed in this study can effectively improve the prediction accuracy of the LSTM model.

Table 10  Differences in prediction accuracy after adding sentiment variables

 | Prediction accuracy (%)
LSTM without sentiment variables | 60.00
LSTM with sentiment variables | 62.22
Improvement | +2.22

4.5 Prediction Performance Evaluation

Several studies have discussed stock market forecasting, and each uses different forecasting variables and tools. However, unlike research on image recognition, which generally uses the same database as the basis for comparison, studies on the accuracy of stock market forecasts do not share a common database as a benchmark. This is because research objects and time periods differ by study. Thus, it is difficult to objectively compare forecast performance with that presented in other papers. In addition to demonstrating the improvement in prediction accuracy from applying LSTM and social media sentiment, this study also employs different machine learning tools, including DT, LR, and SVM, in the GA-based feature selection process to compare the performance of different classifiers. Moreover, we focus on the comparison with Huang and Liu [21] since their work is the most analogous to our study.

Huang and Liu [21] also include chip-based indicators and social media sentiment in the prediction model. Their article uses HHPIC as the analysis object, whereas this study focuses on TSMC. The analysis period (110 days) in their article does not account for the COVID outbreak, whereas our study covers the period before and after the outbreak (from January 1, 2019 to April 29, 2021, 561 days). While their article also considers the sentiment score as a predictive variable, they only use a single sentiment variable. Furthermore, they use chip-based indicators as predictive variables but employ only LR as a predictive and feature screening tool.

Although Huang and Liu’s prediction accuracy appears marginally higher than that in this study, the higher prediction accuracy can be attributed to the small amount of analyzed data. Their study uses a small number of days for the analysis. For example, their study shows P = 1% (stock price increased more than 1%) and a prediction accuracy of as high as 78%. During the analysis period, HHPIC showed an increase of more than 1% for only 40 days, which is
significantly less than the number of days it reported for P = 0% (110 days). Thus, it is difficult to use LR to tackle the issue of imbalanced data, which may easily occur when analyzing data for 40 days. Therefore, while the accuracy of their analysis is impressive, several concerns remain. According to Westreich et al. [48], if the number of observations is less than the number of features, using LR may lead to overfitting. Therefore, as stated in Sect. 2 of our paper, Huang and Liu’s analysis results are not statistically significant owing to the relatively small dataset.

5 Conclusions

This study makes four major contributions. First, it integrated GA with three machine learning tools to identify the best combination of indicators, which were then incorporated into LSTM to improve prediction accuracy. Second, after the pandemic, the social media sentiment score significantly increased, and the composition of the best chip-based indicators for the LSTM prediction model changed. This study found that separating the data into periods before and after the pandemic can help improve prediction performance. Third, using Taguchi’s method to find the best combination of LSTM hyper-parameters can efficiently enhance LSTM prediction performance. Finally, using GRA, this study identified that sentiment variables NS and PS are most strongly correlated with fluctuations of TSMC stocks, and adding the two sentiment variables to the LSTM model can improve the prediction performance for stock price fluctuation.

This study is subject to certain limitations that warrant further consideration. First, it only used articles on PTT as a source of information. However, investors do not solely rely on PTT as an information source. Future research may explore other social media platforms to ensure information diversity. Second, internet language and slang change rapidly, and many users write ironically or sarcastically. Thus, it may be inaccurate to judge emotions using the current phrase rule method. To judge the sentiment of online messages more efficiently, artificial intelligence methods can be used in future analyses. Third, this research only made binary predictions on the rise or fall of stock prices on the next day. Further research is needed on formulating actual trading strategies on the basis of the analysis results.

Author Contributions J-YH and C-LT wrote the main manuscript text and participated in the conceptualization, design, and analysis and interpretation of data for the study. W-ZL prepared all figures, tables, and data analysis. All authors reviewed the manuscript.

Funding 1. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. 2. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Availability of Data and Materials The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declarations

Conflict of Interest The authors declare that they have no competing interests.

Ethical Approval and Consent to Participate Not applicable.

Consent for Publication Not applicable.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Abraham, R., Samad, M.E., Bakhach, A.M., El-Chaarani, H., Sardouk, A., Nemar, S.E., Jaber, D.: Forecasting a stock trend using genetic algorithm and random forest. J. Risk Financ. Manag. 15(5), 188 (2022)
2. Adebiyi, A.A., Adewumi, A.O., Ayo, C.K.: Comparison of ARIMA and artificial neural networks models for stock price prediction. J. Appl. Math. 2014, 614342 (2014)
3. Aguilar-Rivera, R., Valenzuela-Rendón, M., Rodríguez-Ortiz, J.J.: Genetic algorithms and Darwinian approaches in financial applications: a survey. Expert Syst. Appl. 42(21), 7684–7697 (2015)
4. Alhnaity, B., Abbod, M.: A new hybrid financial time series prediction model. Eng. Appl. Artif. Intell. 95, 103873 (2020)
5. Ballings, M., Van den Poel, D., Hespeels, N., Gryp, R.: Evaluating multiple classifiers for stock price direction prediction. Expert Syst. Appl. 42(20), 7046–7056 (2015)
6. Bao, W., Yue, J., Rao, Y.: A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 12(7), e180944 (2017)
7. Bogle, S.A., Potter, W.D.: SentAMaL-a sentiment analysis machine learning stock predictive model. In: Proceedings on the International Conference on Artificial Intelligence (ICAI). The Steering Committee of the World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p. 610 (2015)
8. Cakra, Y.E., Trisedya, B.D.: Stock price prediction using linear regression based on sentiment analysis. In: 2015 International
Conference on Advanced Computer Science and Information Systems (ICACSIS), IEEE, pp. 147–154 (2015)
9. Chang, W.S., Chang, C.Y., Chang, S.H.: An empirical study on stock returns of the Taiwan 50 constituent stocks with technical indicators and chip analysis using the vector autoregression model. J. Glob. Bus. Oper. Manag. 10, 177–187 (2018)
10. Chen, T.H.: Is the Taiwan stock market efficient? Evidence from a TAR model with an autoregressive unit root. Int. Res. J. Financ. Econ. 77, 74–83 (2011)
11. Chen, B.: Sovereignty or identity? The significance of the Diaoyutai/Senkaku Islands dispute for Taiwan. Perceptions 19(1), 107 (2014)
12. Chen, Y., Hao, Y.: A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction. Expert Syst. Appl. 80, 340–355 (2017)
13. Chung, H., Shin, K.S.: Genetic algorithm-optimized long short-term memory network for stock market prediction. Sustainability 10(10), 3765 (2018)
14. Ding, X., Zhang, Y., Liu, T., Duan, J.: Deep learning for event-driven stock prediction. In: Twenty-Fourth International Joint Conference on Artificial Intelligence, pp. 2327–2333 (2015)
15. Endri, E., Kasmir, K., Syarif, A.: Delisting sharia stock prediction model based on financial information: support vector machine. Decis. Sci. Lett. 9(2), 207–214 (2020)
16. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 270(2), 654–669 (2018)
17. Hong, H., Bian, Z., Lee, C.C.: COVID-19 and instability of stock market performance: evidence from the US. Financ. Innov. 7(1), 1–18 (2021)
18. Hsieh, L.F., Hsieh, S.C., Tai, P.H.: Enhanced stock price variation prediction via DOE and BPNN-based optimization. Expert Syst. Appl. 38(11), 14178–14184 (2011)
19. Huang, J.Y.: Patent network analysis of cloud computing by text mining. J. Technol. 31(2), 127–146 (2016)
20. Huang, L.: Study on the phrase rule of PTT replies. In: 2018 International Conference of Annual Meeting of the Operations Research Society of Taiwan and 16th Conference on Sustainable Operation and Development, 2018, Taichung, Taiwan (2018)
21. Huang, J.Y., Liu, J.H.: Using social media mining technology to improve stock price forecast accuracy. J. Forecast. 39(1), 104–116 (2020)
22. Huang, J.Y., Tsai, P.C.: Determination of order quantity for perishable products using the support vector machine. J. Chin. Inst. Ind. Eng. 28(6), 425–436 (2011)
23. Huang, S.C., Wu, T.K.: Integrating GA-based time-scale feature extractions with SVMs for stock index forecasting. Expert Syst. Appl. 35(4), 2080–2088 (2008)
24. Huang, C.J., Yang, D.X., Chuang, Y.T.: Application of wrapper approach and composite classifier to the stock trend prediction. Expert Syst. Appl. 34(4), 2870–2878 (2008)
25. Huang, J.Y., Yao, M.J.: A genetic algorithm for solving economic lot scheduling problem in flow shops. Int. J. Prod. Res. 46(14), 3737–3761 (2008)
26. Jing, N., Wu, Z., Wang, H.: A hybrid model integrating deep learning with investor sentiment analysis for stock price prediction. Expert Syst. Appl. 178, 115019 (2021)
29. Koukaras, P., Nousi, C., Tjortjis, C.: Stock market prediction using microblogging sentiment analysis and machine learning. Telecom 3(2), 358–378 (2022)
30. Li, Z., Tam, V.: A comparative study of a recurrent neural network and support vector machine for predicting price movements of stocks of different volatilities. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, pp. 1–8 (2017)
31. Lizhen, L., Wei, S., Hanshi, W., Chuchu, L., Jingli, L.: A novel feature-based method for sentiment analysis of Chinese product reviews. China Commun. 11(3), 154–164 (2014)
32. Nti, I.K., Adekoya, A.F., Weyori, B.A.: A systematic review of fundamental and technical analysis of stock market predictions. Artif. Intell. Rev. 53(4), 3007–3057 (2020)
33. Oyedele, A.A., Ajayi, A.O., Oyedele, L.O., Bello, S.A., Jimoh, K.O.: Performance evaluation of deep learning and boosted trees for cryptocurrency closing price prediction. Expert Syst. Appl. 213, 119233 (2023)
34. Pagolu, V.S., Reddy, K.N., Panda, G., Majhi, B.: Sentiment analysis of twitter data for predicting stock market movements. In: 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), IEEE, pp. 1345–1350 (2016)
35. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl. 42(1), 259–268 (2015)
36. Pei, L., Shen, J., Liu, R.: Deep feature of image screened by improved clustering algorithm cascaded with genetic algorithm. In: 2017 29th Chinese Control and Decision Conference (CCDC), IEEE, pp. 452–455 (2017)
37. Peng, Y., Albuquerque, P.H.M., Kimura, H., Saavedra, C.A.P.B.: Feature selection and deep neural networks for stock price direction forecasting using technical analysis indicators. Mach. Learn. Appl. 5, 100060 (2021)
38. Sarkar, A., Hossain, S.S., Sarkar, R.: Human activity recognition from sensor data using spatial attention-aided CNN with genetic algorithm. Neural Comput. Appl. 35, 5165–5191 (2023)
39. Schumaker, R.P., Zhang, Y., Huang, C.N., Chen, H.: Evaluating sentiment in financial news articles. Decis. Support Syst. 53(3), 458–464 (2012)
40. Sezer, O.B., Ozbayoglu, A.M., Dogdu, E.: An artificial neural network-based stock trading system using technical analysis and big data framework. In: Proceedings of the Southeast Conference, pp. 223–226 (2017)
41. Sezer, O.B., Ozbayoglu, M., Dogdu, E.: A deep neural-network based stock trading system based on evolutionary optimized technical analysis parameters. Proced. Comput. Sci. 114, 473–480 (2017)
42. Sharma, D.K., Hota, H.S., Brown, K., Handa, R.: Integration of genetic algorithm with artificial neural network for stock market forecasting. Int. J. Syst. Assur. Eng. Manag. 13(Suppl 2), 828–841 (2022)
43. Singh, R., Srivastava, S.: Stock prediction using deep learning. Multimed. Tools Appl. 76(18), 18569–18584 (2017)
44. Suddle, M.K., Bashir, M.: Metaheuristics based long short term memory optimization for sentiment analysis. Appl. Soft Comput. 131, 109794 (2022)
27. Khan, A.H., Sarkar, S.S., Mali, K., Sarkar, R.: A genetic algorithm 45. Tuarob, S., Wettayakorn, P., Phetchai, P., Traivijitkhun, S., Lim,
based feature selection approach for microstructural image clas- S., Noraset, T., Thaipisutikul, T.: DAViS: a unified solution for
sification. Exp. Tech. 46, 335–347 (2022) data collection, analyzation, and visualization in real-time stock
28. Kim, H.J., Shin, K.S.: A hybrid approach based on neural net- market prediction. Financ. Innov. 7(1), 1–32 (2021)
works and genetic algorithms for detecting temporal patterns in 46. Vargas, M.R., dos Anjos, C.E., Bichara, G.L., Evsukoff, A.G.:
stock markets. Appl. Soft Comput. 7(2), 569–576 (2007) Deep learning for stock market prediction using technical indi-
cators and financial news articles. In: 2018 International Joint
Conference on Neural Networks (IJCNN), IEEE, pp. 1–8 (2018)


47. Wei, Y.C., Lu, Y.C., Chen, J.N., Hsu, Y.J.: Informativeness of the market news sentiment in the Taiwan stock market. N. Am. J. Econ. Finance 39, 158–181 (2017)
48. Westreich, D., Lessler, J., Funk, M.J.: Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J. Clin. Epidemiol. 63(8), 826–833 (2010)
49. Wu, G.G.R., Hou, T.C.T., Lin, J.L.: Can economic news predict Taiwan stock market returns? Asia Pac. Manag. Rev. 24(1), 54–59 (2019)
50. Wu, S., Liu, Y., Zou, Z., Weng, T.H.: S_I_LSTM: stock price prediction based on multiple data sources and sentiment analysis. Connect. Sci. 34(1), 44–62 (2022)
51. Yang, Y., Suliang, M., Jianwen, W., Bowen, J., Weixin, L., Xiaowu, L.: Fault diagnosis in gas insulated switchgear based on genetic algorithm and density-based spatial clustering of applications with noise. IEEE Sens. J. 21(2), 965–973 (2019)
52. Yu, H., Chen, R., Zhang, G.: A SVM stock selection model within PCA. Proced. Comput. Sci. 31, 406–412 (2014)
53. Yuan, X., Yuan, J., Jiang, T., Ain, Q.U.: Integrated long-term stock selection models based on feature selection and machine learning algorithms for China stock market. IEEE Access 8, 22672–22685 (2020)
54. Zhao, K., Zhang, J., Liu, Q.: Dual-hybrid modeling for option pricing of CSI 300ETF. Information 13(1), 36 (2022)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
