Research Paper

This study explores the use of sentiment analysis and machine learning, specifically a Random Forest Classifier, to predict stock price changes based on financial news headlines. The model achieved an accuracy of 85.97% in forecasting stock price movements, demonstrating the significant impact of market sentiment on stock behavior. Future work aims to enhance the model by incorporating historical stock data and advanced NLP techniques like BERT.
Utilizing Sentiment Analysis and Machine Learning to Forecast Stock Price Changes from Financial News

Ishica, Sudhanshu Kumar Jha, Ujala, Sneha, Suraj
Bachelor of Engineering - Computer Science and Engineering
Chandigarh University, Mohali, Punjab

Er. Ritika Chaudhary
Department of Computer Science and Engineering
Chandigarh University, Mohali, Punjab

Abstract: This study looks into the use of sentiment analysis to forecast stock prices by using headlines from the day's financial news to foretell changes in the market. The research creates a prediction model that can anticipate whether stock prices will rise or fall based on sentiment gleaned from news stories by combining Natural Language Processing (NLP) with machine learning. CountVectorizer is used for preprocessing, tokenization, and feature extraction on a dataset of daily headlines and stock results (positive or negative). Using a Random Forest Classifier, high precision and recall metrics were attained along with an accuracy of 85.97%. The model performs well in forecasting rises in stock price, indicating the influence of market mood on stock behavior. In order to conduct a more thorough contextual analysis, future work will involve extending the model to include historical stock data, macroeconomic variables, and sophisticated NLP approaches like BERT. The present study underscores the capacity of sentiment analysis to augment financial market forecasting and decision-making.

Keywords: Random Forest Classifier, Natural Language Processing (NLP), Sentiment Analysis, Stock Price Prediction, and Financial News Forecasting

I. INTRODUCTION

In the financial markets, predicting swings in stock prices is an essential undertaking that can give investors a big edge if done correctly. Technical indicators and historical price data have long been mainstays of stock market prediction models. Still, qualitative factors like market news, economic reports, and political events can affect stock prices. Stock price swings can result from changes in investor decisions caused by the way these elements are interpreted, which in turn shapes market sentiment [1].

Sentiment analysis, the process of extracting sentiments, emotions, and opinions from text, can now be incorporated into predictive models thanks to machine learning techniques and the quick development of technology. Utilizing Natural Language Processing (NLP) techniques, this study evaluates daily financial news headlines in order to forecast stock price fluctuations. By using a Random Forest Classifier in conjunction with sentiment analysis, the study seeks to close the gap between qualitative market data and stock price prediction [2].

Figure 1. Stock prediction with Sentiment Analysis using Random Forest Regression

The principal aim of this research is to create a machine learning model that can forecast the direction of stock prices by utilizing the sentiment of financial news. The project's main objectives are to pre-process textual input, apply machine learning techniques, and assess model performance using a range of indicators. The study also looks at ways to make improvements in the future by adding more significant variables, like past stock prices and macroeconomic indicators, and by using sophisticated NLP techniques to gain a better grasp of market sentiment [3].

This study tackles a crucial question in the finance sector: Is it possible to accurately forecast changes in stock prices using the mood found in daily news headlines? By providing answers, the study hopes to reveal insightful information on the relationship between sentiment analysis and financial forecasts, potentially providing investors with tools to help them make better judgments.

II. LITERATURE REVIEW

Traditional methodologies were mainly quantitative in nature and used historical stock data and technical indicators to predict future price trends. Scholars have increasingly focused on integrating qualitative elements like market news and sentiment into predictive models, recognizing that stock prices are affected by more than quantitative data alone. This section reviews important research that integrates sentiment analysis, machine learning, and NLP to anticipate stock prices.

Sentiment analysis, also known as opinion mining, is the process of identifying a text's emotional undertone, here in order to learn more about the market's mood. According to research, emotion is important in financial markets, since investor behaviour can be greatly influenced by the tone of news reports, stories, and social media commentary. Tetlock (2007) conducted early research that showed the predictive potential of sentiment in financial news and how stock prices might be negatively impacted by negative sentiment in the media. His work made possible subsequent studies investigating various methods of quantifying sentiment from text and its correlation with stock market performance [4].

In a more recent study, Bollen, Mao, and Zeng (2011) examined the potential of sentiment analysis on platforms like Twitter to forecast stock market trends. According to their study, real-time tweets from the public can serve as a dependable signal of daily market changes. This demonstrates the importance of sentiment analysis in understanding how investors respond to financial news and to the overall market sentiment portrayed on various media platforms [5].

NLP is now a powerful tool for extracting information from text, especially in financial use cases. Various methods, including bag-of-words models, TF-IDF, Word2Vec, and GloVe, convert unstructured financial text into structured, machine-readable formats. Kogan et al. (2009) demonstrated that analyzing text from earnings reports and SEC filings using NLP techniques can aid in predicting stock performance [6]. Furthermore, the advancement of transformer-based models like BERT has significantly improved the understanding of words in context, leading to notable progress in natural language processing. These models have shown effectiveness in grasping the subtleties of sentiment in financial news, going beyond the constraints of standard NLP methods. CountVectorizer and TF-IDF are commonly used to track how often terms appear, whereas models like BERT can capture how words are connected in a sentence and offer more in-depth sentiment analysis. Despite being computationally demanding, such sophisticated approaches have demonstrated encouraging results in capturing market sentiment [7].

The capacity of ML algorithms to analyze extensive data and recognize non-linear relationships has led to a rise in the adoption of such models in financial prediction. The project demonstrates that Random Forest Classifiers are highly effective for analyzing high-dimensional data, such as the sentiment of financial news. Breiman (2001) introduced Random Forest (RF), an ensemble learning method that combines the predictions of multiple decision trees to enhance forecast accuracy. This method can be advantageous for financial prediction problems because it reduces overfitting and remains robust on noisy data [8].

Additional studies have examined the use of machine learning algorithms, including SVM, Neural Networks, and XGBoost, to forecast fluctuations in stock prices. For example, Kumar and Thenmozhi (2006) used SVM to categorize stock movements using news articles and obtained positive outcomes. Random Forests remain one of the most frequently used models in this field due to their simplicity, interpretability, and feature importance ranking.

The advantages of combining sentiment data with past stock prices and other financial indicators for increased forecast accuracy have been shown in a number of studies. Technical indicators like moving averages and RSI, as well as textual data from news sources, can improve the effectiveness of stock prediction models, as demonstrated by Ding, Zhang, Liu, and Duan (2015). Their conclusions show that including data from several sources improves the model's ability to represent the qualitative and quantitative elements that influence stock prices [10].

Using daily financial news headlines as a stand-in for market mood, this project takes a similar approach. The research, however, indicates that including further elements, such as past stock prices and macroeconomic indices like GDP growth and inflation rates, could increase the model's accuracy. Gupta and Chen (2016) provided evidence of the effectiveness of this combined method by demonstrating that models that use historical data in addition to sentiment data perform better than those that rely on only one data source.

Although sentiment analysis and ML have made significant advancements in predicting stock price movements, there are still gaps that need to be addressed. The majority of previous research, for example, focuses on static datasets and integrates real-time news streams only minimally. Many models also ignore the context in which words appear and instead rely on more basic NLP approaches like bag-of-words and TF-IDF. While more complicated linkages and temporal correlations in the text can be captured by advanced models like BERT or LSTMs (Long Short-Term Memory networks), these models have not yet been routinely applied in stock market forecasting [11].
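
The limitation of bag-of-words features noted in this review can be illustrated with a small sketch. The two headlines below are hypothetical, and the `bag_of_words` helper is a simplified stand-in for what a count-based vectorizer does (lowercasing, whitespace tokenization, counting); it is not the preprocessing code used in the study.

```python
from collections import Counter

# Two hypothetical headlines: same words, different order, opposite meaning.
headline_a = "shares rise after profits fall"
headline_b = "shares fall after profits rise"


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


bow_a = bag_of_words(headline_a)
bow_b = bag_of_words(headline_b)

# Both headlines map to exactly the same word counts.
print(bow_a == bow_b)  # prints True
```

Because the two representations are identical, any classifier built only on word counts has to assign both headlines the same prediction, which is the gap that order-aware models such as BERT or LSTMs are meant to close.
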

Apart from that, there aren't many thorough studies in the literature on how new data sources, like sentiment from social media and other alternative data streams, affect stock price prediction. Future research has a great chance to investigate how the sentiment of financial news could be linked with these new data sources to provide forecasts that are more precise and timely [12].

III. METHODOLOGY

This research utilizes sentiment analysis of financial news headlines along with advanced ML algorithms to predict fluctuations in stock prices. The study combines a Random Forest Classifier with transformer-based models like BERT and its variations for sentiment extraction and stock prediction. To ensure strong and precise stock price prediction, the approach involved extensive data preparation, augmentation, model training, and evaluation.

A. Dataset Creation
The dataset used in this study comprises financial news headlines paired with the corresponding daily stock price result, positive or negative. The news headlines are compiled from a variety of reliable financial sources, and labels indicate whether the stock market finished higher (1) or lower (0). Preparing this dataset for prediction using Random Forest Classifier and transformer models required preprocessing [13].

B. Data Acquisition and Splitting
The dataset was split into three subsets to ensure proper training, validation, and testing:
• Training set: 65% of the dataset, used for model training.
• Validation set: 20% of the dataset, used for fine-tuning performance and adjusting model hyperparameters.
• Test set: 15% of the dataset, used for the final evaluation of the model.
This stratified split minimized biases in model training and evaluation by ensuring an even spread of sentiment representation across all subsets.

C. Data Preprocessing
To get the textual data ready for input into the machine learning models, a number of preprocessing methods were used, including:
• Text cleaning: Stop words, special characters, and punctuation were removed from all headlines, and the text was converted to lowercase.
• Tokenization: Keywords in headlines were separated into tokens for analysis [14].
• Feature extraction: To convert the text into a numerical format and create sparse matrices that represented word frequencies and their significance, the study employed CountVectorizer and TF-IDF.
• Normalization: The data was normalized to guarantee uniformity in the model's input and improve convergence during training.

D. Data Augmentation
To improve the model's resilience and generalizability, data augmentation strategies were used:
• Synonym Replacement: To add variation to the text data, words were replaced with randomly chosen synonyms.
• Random Word Insertion and Removal: To mimic actual fluctuations in news reporting, words were substituted or removed at random.
• Random Sentence Shuffling: Headlines were rearranged to create noise, which compels the model to concentrate on the overall sentiment rather than the word order [15].

E. Model Training
The machine learning models employed in this investigation include the transformer-based models BERT and BEiT and a Random Forest Classifier. Using sentiment analysis, these models were optimized for stock price prediction. The training process involved the following steps:
• BERT Model: To classify sentiment, pretrained BERT was fine-tuned on the preprocessed financial news headlines [16].
• BEiT Model: BEiT, a model pretrained with masked image modeling, was adapted to text data in this study.
• Random Forest Classifier: This machine learning model was trained to predict changes in stock prices by using the transformer models' outputs as input features [16].

The following hyperparameters were employed during training:
• Learning Rate: 0.001, dynamically adjusted to maximize performance.
• Batch Size: 32, selected to strike a compromise between training speed and computational economy.
• Epochs: Each model was trained for ten epochs, until convergence [17].

The models were evaluated based on the following metrics. Accuracy is the proportion of correct predictions. Precision is the proportion of predicted positive outcomes that are correct. Recall measures the model's ability to identify all the positive examples. The F1-Score, a balanced gauge of model performance, combines recall and precision using the harmonic mean. The confusion matrix gives a detailed breakdown of true positives, false positives, true negatives, and false negatives to better evaluate the model's effectiveness.

F. Evaluation Metrics
This research shows how well sophisticated transformer-based models work in conjunction with sentiment analysis to forecast changes in stock prices based on financial news headlines. The algorithm accurately predicts trends in stock prices by utilizing natural language processing (NLP) techniques to extract market sentiment. Important models, such as BEiT and BERT, were used for sentiment classification, and a Random Forest Classifier was used to predict stocks, demonstrating how these methods worked in concert. Diversity and robustness in the model's performance were ensured by a meticulously selected and preprocessed collection of daily financial news [18]. By means of extensive training and assessment, the methodology underscores the potential of sentiment analysis as a significant instrument for financial forecasting, hence facilitating more knowledgeable investment choices. Potential future improvements include real-time data integration, additional market indicators, and the use of more sophisticated NLP models, increasing the solution's applicability to actual financial markets. The study's overall findings highlight the useful convergence of machine learning and sentiment analysis for more accurate financial market forecasts [19].
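
The stratified 65/20/15 split, count-based feature extraction, and Random Forest training described in subsections B, C, and E can be sketched as follows. This is a minimal illustration assuming scikit-learn, not the authors' code: the toy headlines, the word lists, the random seeds, and the candidate forest sizes are all hypothetical, and CountVectorizer features are used directly rather than transformer outputs.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the headline dataset (1 = up, 0 = down).
rng = random.Random(0)
UP = ["surge", "gain", "record", "beat", "rally"]
DOWN = ["slump", "loss", "warning", "miss", "selloff"]
COMMON = ["shares", "market", "earnings", "bank", "oil"]


def make_headline(label):
    """Build a four-word toy headline whose class words match the label."""
    words = rng.sample(UP if label == 1 else DOWN, 2) + rng.sample(COMMON, 2)
    rng.shuffle(words)
    return " ".join(words)


labels = [i % 2 for i in range(200)]
headlines = [make_headline(y) for y in labels]

# Stratified 65/20/15 split; integer sizes avoid floating-point rounding issues.
n_test = round(0.15 * len(headlines))
n_val = round(0.20 * len(headlines))
X_rest, X_test, y_rest, y_test = train_test_split(
    headlines, labels, test_size=n_test, stratify=labels, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=n_val, stratify=y_rest, random_state=0
)

# Fit the vocabulary on the training split only, then reuse it on the others.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
train_counts = vectorizer.fit_transform(X_train)
val_counts = vectorizer.transform(X_val)
test_counts = vectorizer.transform(X_test)

# Use the validation split to pick a forest size (candidate values are arbitrary).
best_model, best_val_acc = None, -1.0
for n_trees in (50, 100, 200):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(train_counts, y_train)
    val_acc = model.score(val_counts, y_val)
    if val_acc > best_val_acc:
        best_model, best_val_acc = model, val_acc

# The test split is touched once, after model selection is finished.
test_acc = best_model.score(test_counts, y_test)
print(f"train/val/test sizes: {len(X_train)}/{len(X_val)}/{len(X_test)}")
print(f"validation accuracy: {best_val_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Fitting the vectorizer on the training split alone keeps vocabulary information from the validation and test headlines out of training, and `stratify` keeps the proportion of up and down labels the same in every subset.
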

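The augmentation strategies listed in subsection D can be sketched with the standard library alone. This is an illustration under stated assumptions: the `SYNONYMS` table, the probabilities, and the function names are hypothetical, and shuffling is interpreted here as reordering the words inside one headline, which the text does not specify exactly.

```python
import random

# Hypothetical synonym table; the source does not name the thesaurus it used.
SYNONYMS = {
    "rise": ["climb", "gain"],
    "fall": ["drop", "slide"],
    "profit": ["earnings"],
}


def synonym_replacement(tokens, rng, p=0.5):
    """Replace each token that has a known synonym with probability p."""
    return [
        rng.choice(SYNONYMS[t]) if t in SYNONYMS and rng.random() < p else t
        for t in tokens
    ]


def random_removal(tokens, rng, p=0.2):
    """Drop each token with probability p, always keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]


def shuffle_words(tokens, rng):
    """Return a copy of the tokens in random order."""
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(42)
tokens = "tech shares rise as profit beats forecast".split()

replaced = synonym_replacement(tokens, rng)
removed = random_removal(tokens, rng)
shuffled = shuffle_words(tokens, rng)

for name, variant in [("synonyms", replaced), ("removal", removed), ("shuffle", shuffled)]:
    print(f"{name}: {' '.join(variant)}")
```

Each augmented headline keeps the label of the headline it was derived from. Word shuffling leaves bag-of-words counts unchanged, so it only adds variation for order-sensitive models such as BERT.
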
IV. RESULT

The outcomes of this investigation offer a thorough assessment of the efficacy of sentiment analysis and ML in the prediction of stock market fluctuations. The program accurately predicted whether stock prices would rise or fall by assessing the sentiment found in financial news headlines. Predictive stock price and confusion matrix visualizations provide additional insight into the model's behavior, while key performance indicators like recall, precision, and F1-score were utilized to evaluate the model's effectiveness. The association between sentiment shifts and changes in stock prices is demonstrated by these findings, which also demonstrate the model's adaptability to various market circumstances.

A. Sentiment Distribution:
The news headlines were divided into three groups by the sentiment analysis: neutral, negative, and positive. A greater concentration of neutral and negative attitudes was found in the sample, as seen in the sentiment distribution histogram (see Figure 2), which also shows a noticeable spike in frequency for higher sentiment scores in the 0.9 range. To guarantee the strength of the sentiment-based forecasts, the categories must be distributed evenly.

Figure 2: Sentiment Distribution of News Headlines

B. Model Performance:
A split of 80% training and 20% testing data was used to train and assess the Random Forest Classifier. The confusion matrix displays the test set's classification results (see Figure 3). The model has a recall of 0.97 and an F1-score of 0.86 for positive labels, indicating high precision, particularly in forecasting positive stock moves. With very few misclassifications, the confusion matrix shows that the model accurately anticipated most stock price fluctuations, both positive and negative.

Figure 3: Confusion Matrix

Figure 4: Classification Report

C. Stock Price Prediction:
The time series plot (see Figure 5) shows the predicted and actual stock price trends of the model over time. Once the training is completed, the train-prediction plot shows how well the model fits the data, while the test-prediction plot confirms the model's capacity to generalize to unseen data. The model's ability to predict stock movements through sentiment analysis is evident in the close alignment between projected and actual stock prices.

Figure 5: Stock Price Predictions

D. Sentiment vs. Stock Price Movement:
The association between changes in stock prices and market mood is emphasized in the final visualization (refer to Figure 6). The orange line displays the matching stock closing values, while the blue line indicates the average daily sentiment. It is clear that sentiment and stock prices are correlated, especially at times when substantial shifts in sentiment are followed by commensurate swings in prices.

Figure 6: Sentiment vs. Stock Price Movement
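
The metrics reported under Model Performance all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts passed to `classification_metrics` are hypothetical and are not taken from Figure 3. The `implied_precision` helper simply rearranges the harmonic-mean formula F1 = 2PR / (P + R) to recover precision when only recall and F1 are given.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, given recall R and the F1-score."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Precision implied by the reported recall (0.97) and F1-score (0.86).
print(f"implied precision: {implied_precision(0.97, 0.86):.3f}")
```

Applied to the reported recall of 0.97 and F1-score of 0.86 for positive labels, the rearranged formula gives an implied precision of roughly 0.77.
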

The research findings reveal the effectiveness of sentiment analysis in predicting stock price fluctuations, achieving a high accuracy rate of 84.39% in the machine learning model. To ensure a solid foundation for model training, the sentiment distribution focuses on including an equal mix of negative, neutral, and positive attitudes. The model excels, displaying high recall and precision for both sentiment classes, as evidenced by the classification report and confusion matrix.

The stock price prediction graph demonstrates the model's ability to generalize effectively to unfamiliar data by closely matching expected and actual stock prices. Additionally, there is a connection between sentiment and stock price changes, underlining the forecasting ability of sentiment analysis in finance. According to these findings, models driven by sentiment have the ability to greatly improve market forecasts going forward by including additional market indicators and data sources.

V. CONCLUSION

This research investigated how sentiment analysis and machine learning algorithms can predict changes in stock prices using financial news headlines. The study demonstrated that sentiment extracted from news could predict stock market patterns with an accuracy of 84.39% when utilizing models like BERT and Random Forest Classifier alongside thorough preprocessing techniques. The model's capacity to manage a wide range of sentiment categories and precisely forecast market movements was confirmed by the examination of the sentiment distribution, confusion matrix, and classification report. The model's ability to generalize was further supported by the alignment of anticipated and actual stock prices, and the predictive power of sentiment was highlighted by the link between changes in sentiment and stock price movements. According to these results, sentiment-driven models provide insightful information for financial predictions with the potential to achieve even higher accuracy when combining sentiment analysis from social media and technical indicators with other data sources. The advancement of sentiment analysis in financial prediction models is aided by this research.

REFERENCES

[1] Z. Ying, D. Cheng, C. Chen, X. Li, P. Zhu, Y. Luo, and Y. Liang, "Predicting stock market trends with self-supervised learning," accepted for publication, Nov. 2023.
[2] Hamid and A. M. Abdulazeez, "Sentiment Analysis Based on Machine Learning Techniques: A Comprehensive Review," Indonesian Journal of Computer Science, vol. 13, no. 3, pp. **, June 2024. doi: 10.33022/ijcs.v13i3.4049.
[3] J. Gong, B. Paye, G. Kadlec, and H. Eldardiry, "Predicting Stock Price Movement Using Financial News Sentiment," in Proceedings of the 22nd Engineering Applications of Neural Networks Conference, pp. 503–517, June 2021. doi: 10.1007/978-3-030-80568-5_41.
[4] Cambria, B. Schuller, Y. Xia, and C. Havasi, "New Avenues in Opinion Mining and Sentiment Analysis," IEEE Intelligent Systems, vol. 28, no. 2, pp. 15–21, Mar.–Apr. 2013. doi: 10.1109/MIS.2013.30.
[5] J. Bollen, H. Mao, and X.-J. Zeng, "Twitter mood predicts the stock market," Journal of Computational Science, vol. 2, no. 1, pp. 1–8, Mar. 2011. doi: 10.1016/j.jocs.2010.12.007.
[6] D. S. Asudani, N. K. Nagwani, and P. Singh, "Impact of word embedding models on text analytics in deep learning environment: a review," Artificial Intelligence Review, vol. 56, pp. 10345–10425, Feb. 2023. doi: 10.1007/s10462-023-10336-1.
[7] L. Gorenstein, E. Konen, M. Green, and E. Klang, "Bidirectional Encoder Representations from Transformers in Radiology: A Systematic Review of Natural Language Processing Applications,"
[8] W. Bao, Y. Cao, Y. Yang, H. Che, J. Huang, and S. Wen, "Data-driven stock forecasting models based on neural networks: A review," Expert Systems with Applications, vol. 194, p. 116904, May 2022. doi: 10.1016/j.eswa.2022.116904.
[9] M. Kumar and M. Thenmozhi, "Forecasting Stock Index Movement: A Comparison of Support Vector Machines and Random Forest," SSRN Electronic Journal, Jan. 2006. doi: 10.2139/ssrn.876544.
[10] X. Ding, Y. Zhang, T. Liu, and J. Duan, "Using Structured Events to Predict Stock Price Movement: An Empirical Investigation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Oct. 2014, pp. 1415–1425. doi: 10.3115/v1/D14-1148.
[11] M. M. Kumbure, C. Lohrmann, P. Luukka, and J. Porras, "Machine learning techniques and data for stock market forecasting: A literature review,
[12] T. H. Nguyen, K. Shirai, and J. Velcin, "Sentiment Analysis on Social Media for Stock Movement Prediction," Expert Systems with Applications, vol. 42, no. 24, pp. 9603–9611, Dec. 2015. doi: 10.1016/j.eswa.2015.07.052.
[13] M. Costola, O. Hinz, M. Nofer, and L. Pelizzon, "Machine learning sentiment analysis, COVID-19 news and stock market reactions," Finance Research Letters, vol. 46, Part B, p. 102379, 2022. doi: 10.1016/j.frl.2021.102379.
[14] M. Siino, I. Tinnirello, and M. La Cascia, "Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers," Information Systems, vol. 114, p. 102342, 2023. doi: 10.1016/j.is.2023.102342.
[15] L. F. A. O. Pellicer, T. M. Ferreira, and A. H. R. Costa, "Data augmentation techniques in natural language processing," Applied Soft Computing, vol. 131, p. 109803, Jan. 2023. doi: 10.1016/j.asoc.2022.109803.
[16] Y. Wu, Z. Jin, C. Shi, and P. Liang, "Research on the application of deep learning-based BERT model in sentiment analysis," Applied and Computational Engineering, vol. 71, no. 1, pp. 14–20, May 2024. doi: 10.54254/2755-2721/71/2024MA.
[17] L. N. Smith, "A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay," arXiv preprint arXiv:1803.09820, 2018. doi: 10.48550/arXiv.1803.09820.
[18] Saxena, A. Jain, P. Sharma, and S. Singla, "Sentiment Analysis of Stocks Based on News Headlines Using NLP," in Proceedings of the International Conference on Artificial Intelligence Techniques for Electrical Engineering Systems (AITEES 2022), pp. 124–135, Jan. 2023. doi: 10.2991/978-94-6463-074-9_12.
[19] Liu, A. Arulappan, R. Naha, and A. Mahanti, "Large Language Models and Sentiment Analysis in Financial Markets: A Review, Datasets and Case Study," IEEE Access, vol. PP, no. 99, pp. 1–1, Aug. 2024. doi: 10.1109/ACCESS.2024.3445413.
