Monitoring Fake Review of Products: Ensuring Authenticity in
Consumer Feedback
Elakkiya U, Assistant Professor, Sri Ramakrishna Institute of Technology, Coimbatore, India
Swathy M, Information Technology, Sri Ramakrishna Institute of Technology, Coimbatore, India
Kaviyasri R, Information Technology, Sri Ramakrishna Institute of Technology, Coimbatore, India
Shakthi R, Information Technology, Sri Ramakrishna Institute of Technology, Coimbatore, India
Abstract - In the digital age, customer reviews play a pivotal role in shaping consumer decisions and building brand credibility. However, the increasing prevalence of fake reviews, which are artificially generated to either boost or tarnish product ratings, threatens the authenticity of online feedback and leads to mistrust among consumers. This study presents an approach to monitoring and detecting fake reviews using both the Logistic Regression algorithm and Long Short-Term Memory (LSTM) networks. The method analyzes text-based review features, user behavior, and product metadata to identify patterns characteristic of inauthentic feedback. Logistic Regression, a robust and interpretable machine learning algorithm, models the probability of a review being fake, while LSTM captures deep semantic patterns and contextual dependencies in the review text. The models are trained on a labeled dataset of verified and unverified reviews to improve prediction accuracy, and the results show that combining Logistic Regression and LSTM achieves high precision and recall in distinguishing fake from real reviews. Performance is further optimized through feature extraction and sequence modeling, resulting in a balanced approach that minimizes both false positives and false negatives. This research offers a scalable solution for e-commerce platforms and promotes transparency in customer feedback. Future work includes deeper NLP integration and adapting the models to evolving review manipulation tactics.

Keywords - Fake Review Detection, Machine Learning, LSTM, Tokenization, Natural Language Processing, Fraud Detection.

I. INTRODUCTION

In today’s digital age, online reviews have become a critical factor in consumer decision-making. Before purchasing a product or service, buyers often rely on user-generated reviews to assess the quality, reliability, and effectiveness of an item. Online marketplaces such as Amazon, Flipkart, Yelp, and eBay provide a platform for customers to express their opinions, which in turn influence other buyers and affect business sales. However, not all reviews are genuine. Fake reviews have become a serious issue in e-commerce, as businesses, competitors, and unethical marketers often manipulate online ratings to their advantage. Fake reviews are typically posted either to boost sales by generating false positive feedback for a product or to damage competitors by leaving negative, misleading reviews. These fraudulent reviews mislead customers into purchasing low-quality products or avoiding legitimate businesses, causing financial losses and eroding trust in e-commerce platforms. This growing challenge has led to increased interest in automated systems that use machine learning and natural language processing to detect and eliminate deceptive reviews, thereby ensuring fair competition and customer satisfaction in online marketplaces.

A. Historical Background

The presence of fake reviews has increasingly compromised the credibility of online marketplaces. Initially, review monitoring was conducted manually, relying on moderators and rule-based filtering techniques to identify fraudulent content. However, these methods were inefficient, prone to human error, and unable to scale with the vast number of user-generated reviews. With the rise of machine learning (ML) and natural language processing (NLP), automated review detection systems have been developed. Early techniques, such as keyword-based filtering and sentiment analysis, struggled to differentiate between genuine and deceptive reviews. Later systems adopted TF-IDF vectorization to extract meaningful text features and strengthen classification models. This shift towards ML-based automation has significantly improved scalability, precision, and real-time fraud detection on online platforms.

B. Problem Statement

Online reviews have become a powerful tool for consumers to evaluate products and services. Websites such as Amazon, eBay, Flipkart, TripAdvisor, and Yelp rely on user-generated feedback to build credibility, enhance customer trust, and influence purchasing decisions. However, with the rise of artificial intelligence (AI) and text-generation models, the authenticity of online reviews is under serious threat. The widespread presence of fake reviews on e-commerce platforms has made it difficult for customers to make informed purchasing decisions, and the sheer volume of online reviews makes manual identification impractical, necessitating automated fraud detection mechanisms. This study introduces an ML-based fake review detection model using LSTM and vectorization. The proposed approach aims to enhance classification accuracy, scalability, and adaptability to combat fraudulent review practices.

C. Application

The proposed fake review detection system has several practical applications, including:

• E-commerce Platforms: Identifying fake product reviews on sites such as Amazon and Flipkart.
• App Stores: Filtering misleading ratings on Google Play and the Apple App Store.
• Consumer Awareness: Helping users make informed purchasing decisions based on authentic feedback.
D. Scope of the Project

This project aims to develop an automated fake review detection system that classifies reviews as genuine or fraudulent based on linguistic patterns, metadata, and behavioural indicators. By leveraging Logistic Regression and TF-IDF vectorization, the system improves the accuracy of fraudulent content detection across online platforms. The model is designed to be scalable and adaptable, allowing effective deployment across different sectors, including e-commerce, travel, and app marketplaces. The system integrates real-time data processing, enabling continuous monitoring and classification of new reviews as they are posted. This approach not only improves fraud detection but also minimizes the impact of fake reviews on consumer decision-making. Furthermore, the project incorporates data visualization tools, such as confusion matrices and bar charts, to provide clear insights into the model’s performance. Future enhancements include deep learning models (LSTM), multilingual review analysis, and user behavioural tracking, which will further improve the system’s efficiency and accuracy. By implementing this solution, businesses can maintain transparency, protect consumer trust, and ensure fair competition in digital marketplaces. Additionally, regulatory bodies can use this system to monitor and take action against deceptive marketing practices.

E. Existing System

Currently, fake review detection on online platforms relies primarily on manual moderation and rule-based filtering. While some platforms implement basic keyword detection and sentiment analysis, these methods struggle to identify sophisticated fraudulent content. User reports and flagging mechanisms are also employed to detect suspicious reviews, but this approach depends on subjective human intervention and can be inconsistent. Some existing machine learning-based models attempt to classify fake reviews; however, they often suffer from limited feature extraction, high false positive rates, and difficulty generalizing across different review platforms. Many models rely solely on textual analysis, ignoring metadata and behavioural patterns that are crucial for detecting fraudulent activity. Furthermore, current systems lack real-time processing capabilities, leading to delays in identifying and removing deceptive reviews. The absence of scalable and adaptable automated solutions presents a significant challenge for e-commerce, travel, and service-based platforms, highlighting the need for a more robust and comprehensive detection approach [1].

II. METHODOLOGY

A. Data Collection and Data Description

Product reviews play a significant role in shaping consumer decisions and influencing brand reputations. However, the increasing presence of fake reviews has made it difficult for consumers to trust online feedback. To address this challenge, this project aims to detect fake reviews using LSTM, leveraging a dataset of genuine and computer-generated reviews. The dataset consists of product reviews sourced from publicly available e-commerce databases, particularly Kaggle’s Amazon Review Dataset. It includes both genuine reviews written by real users and fake reviews generated by bots or manipulated through fraudulent means. The dataset is labelled into two categories to facilitate supervised learning:

• Original Review (OR) (Label: 0) – These are verified, authentic product reviews written by real users.
• Fake Review (Label: 1) – These are artificially created or misleading reviews designed to manipulate product ratings.

To ensure balanced model training and evaluation, the dataset was divided into two subsets:

• Training Set (80%) – Used to train the machine learning models and learn textual patterns.
• Test Set (20%) – Used to evaluate the final model's performance on unseen data.

B. Data Preprocessing

Data preprocessing is essential to standardize textual input and ensure consistency across the dataset. In this project, review text was cleaned, tokenized, and transformed into a structured format suitable for machine learning models. The key preprocessing steps include:

• Text Cleaning – Removal of special characters, punctuation, and HTML tags to reduce noise in the dataset.
• Lowercasing – Converting all text to lowercase to maintain uniformity and avoid duplicate representations of the same word.

C. Feature Extraction

Once the data is preprocessed, the next step is to extract meaningful features that the model can use for classification. For deep learning models such as Long Short-Term Memory (LSTM) networks, feature extraction differs significantly from traditional approaches such as Term Frequency-Inverse Document Frequency (TF-IDF). Traditional methods rely on manually computed term statistics that often ignore the order and contextual semantics of words. In contrast, LSTM models learn features automatically from raw text through sequential modelling and word embeddings, so they do not require manual feature engineering. The key steps involved in feature extraction for LSTM models include:

• Tokenization: Converts each word in the text into a corresponding integer index using a vocabulary built from the dataset.
• Sequence padding: Ensures that all input sequences are of equal length by padding shorter ones with zeros, allowing consistent batch processing.
• Embedding layer: Maps each word index to a dense vector representation that captures semantic relationships between words.

These steps transform raw text into a format suitable for input into the LSTM network, enabling the model to learn directly from the data. By leveraging word embeddings and sequence modelling, LSTM networks extract nuanced patterns that traditional methods might overlook. This ability to detect subtle cues makes them especially suitable for tasks such as fake review detection, where deceptive content may closely resemble genuine language on the surface but differ in structure and tone.
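The preprocessing and feature-extraction steps described above (text cleaning, lowercasing, tokenization and zero-padding) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the vocabulary cap `max_words`, the reserved out-of-vocabulary index `OOV_INDEX` and the sample reviews are hypothetical, and the padding mirrors the defaults of Keras's `pad_sequences` (zeros are added, and excess tokens removed, at the front of each sequence). The embedding layer is learned inside the network during training and is therefore not shown.

```python
import re
from collections import Counter

# Hypothetical reserved indices (the paper does not specify them):
PAD_INDEX = 0  # used only for zero-padding, as described in the text
OOV_INDEX = 1  # assumed index for words that are missing from the vocabulary


def clean_text(text: str) -> str:
    """Text cleaning + lowercasing: strip HTML tags, special characters and punctuation."""
    text = re.sub(r"<[^>]+>", " ", text)         # remove HTML tags
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # remove punctuation and special characters
    text = re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace
    return text.lower()


def build_vocab(texts: list[str], max_words: int = 10_000) -> dict[str, int]:
    """Map the max_words most frequent words to integer indices starting at 2."""
    counts = Counter(word for t in texts for word in clean_text(t).split())
    # Ties in frequency keep the order in which words were first encountered.
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def texts_to_sequences(texts: list[str], vocab: dict[str, int]) -> list[list[int]]:
    """Tokenization: replace each word with its vocabulary index (OOV_INDEX if unseen)."""
    return [[vocab.get(w, OOV_INDEX) for w in clean_text(t).split()] for t in texts]


def pad_sequences(seqs: list[list[int]], maxlen: int) -> list[list[int]]:
    """Sequence padding: left-pad with zeros, or keep only the last maxlen tokens."""
    padded = []
    for s in seqs:
        s = s[max(len(s) - maxlen, 0):]                     # truncate from the front if too long
        padded.append([PAD_INDEX] * (maxlen - len(s)) + s)  # pre-pad shorter sequences with zeros
    return padded


if __name__ == "__main__":
    reviews = ["<p>Great product!!! Works great.</p>", "Terrible. Broke after one day."]
    vocab = build_vocab(reviews)
    seqs = texts_to_sequences(reviews, vocab)
    print(pad_sequences(seqs, maxlen=6))  # [[0, 0, 2, 3, 4, 2], [0, 5, 6, 7, 8, 9]]
```

In a Keras-based pipeline, the padded sequences would typically be fed to an `Embedding` layer whose input dimension is at least `len(vocab) + 2`, so that the reserved indices 0 and 1 are also covered.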
D. Model Development

The model is developed using a Long Short-Term Memory (LSTM) neural network, a type of recurrent neural network (RNN) well suited to learning sequential dependencies in natural language data. LSTM networks effectively capture both short-term and long-term contextual relationships between words, making them powerful tools for analysing patterns often found in fake reviews.

Fig. 1. Model Architecture

The model begins with an embedding layer, which transforms word indices into dense, trainable vectors. These embeddings capture semantic meaning and enable the model to recognize contextual similarities between words; for example, words used in similar contexts tend to have similar vector representations. To quantify word similarity in the embedding space, cosine similarity is used:

cosine_similarity(v1, v2) = (v1 · v2) / (||v1|| * ||v2||)    (1)

Where:
- v1 · v2 = Dot product of the vectors.
- ||v1|| and ||v2|| = Magnitudes (Euclidean norms) of the vectors.

Higher values indicate greater semantic similarity between words. After embedding, the input passes through the LSTM layer, which processes the word sequence in order, learning contextual and temporal patterns. This allows the model to detect linguistic cues, such as unusual or inconsistent phrasing, overuse of sentiment-heavy words, and repetitive structures, that are often associated with fake reviews. The LSTM output is then passed through several additional layers:

• Dropout Layer: Randomly disables a fraction of neurons during training to prevent overfitting and improve generalization.
• Dense (Fully Connected) Layer: Maps the learned features into an output space suitable for classification. This layer aggregates information and contributes to the final decision-making process.
• Sigmoid Activation Layer: Finally, the output is passed to a dense layer with a sigmoid activation function. This layer combines all the learned information and produces a probability score between 0 and 1: a score close to 1 indicates that the review is likely fake, while a score close to 0 indicates that it is likely genuine.

σ(x) = 1 / (1 + e^(-x))    (2)

- σ(x) = Output of the sigmoid function (probability between 0 and 1).
- x = Input value from the previous layer (Dense layer).
- e = Euler’s number (approx. 2.718).

E. Evaluation Metrics

The model’s performance was evaluated using several metrics to assess its classification capabilities. Accuracy, precision, recall, F1-score, and the confusion matrix were employed to provide a comprehensive evaluation.

Fig. 2. Classification Report

Evaluation metrics include:

• Accuracy – Measures overall classification correctness.
• Precision – Evaluates the accuracy of positive predictions.
• Recall – Assesses the model’s ability to identify positive cases.
• F1-Score – Balances precision and recall in a single effectiveness measure.
• Confusion Matrix – Visualizes the distribution of correct and incorrect classifications across classes.

These metrics offer insight into the model’s strengths and the areas requiring improvement.

III. RESULT AND DISCUSSION

A. Classification Report

A classification report is a performance evaluation summary used to assess the effectiveness of a classification model. It reports key metrics such as precision, recall, F1-score, and support (the number of true instances of each class) for every class in a binary or multi-class classification problem. This report helps in understanding how well the model predicts each class, highlighting both strengths and weaknesses.

• Accuracy – Accuracy measures the overall correctness of the model as the proportion of correctly predicted instances out of the total number of predictions.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3)
• Precision – Precision measures the proportion of correctly predicted positive instances out of all instances predicted as positive. It focuses on how accurate the model’s positive predictions are, making it crucial in situations where false positives are costly.

Precision = TP / (TP + FP)    (4)

• Recall – Recall measures the proportion of actual positive instances that the model correctly identifies. It emphasizes how well the model captures all positive cases, which is vital in applications such as medical diagnosis, where missing a positive case (a false negative) can be dangerous.

Recall = TP / (TP + FN)    (5)

• F1 Score – The F1 score is the harmonic mean of precision and recall, providing a balance between the two metrics. It is especially useful when the data is imbalanced or when both false positives and false negatives carry significant costs.

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)    (6)

B. Confusion Matrix

A confusion matrix is a performance measurement tool for classification problems that provides a detailed breakdown of how well the model’s predictions align with the true class labels.

Fig. 3. Confusion Matrix

It presents the results in a tabular format, where:

• Rows represent the actual classes of the data.
• Columns represent the classes predicted by the model.
• True Positive (TP): The element in the first row and first column, taking the class in the first row and column as the positive class.
• False Positive (FP): The sum of the elements in the first column, excluding TP.
• False Negative (FN): The sum of the elements in the first row, excluding TP.
• True Negative (TN): The sum of all elements in the matrix minus the sum of TP, FP, and FN.

IV. CONCLUSION AND FUTURE SCOPE

A. Conclusion

The proposed system for detecting computer-generated (CG) reviews using a hybrid approach of Logistic Regression and LSTM represents a significant step toward ensuring the authenticity of customer feedback on online platforms. By integrating efficient data collection, preprocessing and TF-IDF vectorization, and robust feature extraction with both interpretable and deep learning models, this methodology effectively addresses the challenges of identifying deceptive reviews. The advantages of the system, including scalability, real-time analysis, and deep contextual understanding via LSTM with embedding and padding layers, make it well suited for deployment across diverse review-driven environments. The inclusion of TF-IDF helps highlight critical textual patterns, while the LSTM model captures sequential and linguistic nuances often missed by traditional models. Emphasizing interpretability and adaptability, the system ensures transparency for stakeholders and incorporates feedback loops for continuous learning, allowing it to remain effective against evolving fraudulent tactics. This not only enhances review credibility but also supports consumers in making informed purchasing decisions. In conclusion, the proposed methodology provides a strong foundation for combating CG reviews, ensuring transparency, and reinforcing consumer trust in digital ecosystems.

B. Future Work

Future improvements for CG review detection systems can focus on enhancing accuracy, adaptability, and transparency. One key advancement involves integrating deep learning techniques such as LSTM and attention mechanisms to better capture complex linguistic patterns and context, improving the system’s ability to distinguish between genuine and deceptive reviews. Real-time monitoring capabilities can also be added, allowing platforms to detect and flag CG reviews as they are posted and minimizing their impact on consumer perception. Another important direction is expanding the system’s reach through multi-language support and cross-platform integration, ensuring global scalability and consistency in review analysis. Enhanced user profiling and behavioural analytics can help detect suspicious activity, such as repetitive posting, unnatural sentiment shifts, or coordinated review bursts. Incorporating natural language processing tools such as advanced sentiment analysis and embeddings like Word2Vec can deepen the understanding of textual content. To ensure fairness and trust, explainable AI techniques will allow users and platform moderators to understand why certain reviews are flagged. Ethical AI practices and transparent review-handling policies will also foster user confidence. By incorporating these enhancements, the system can evolve into a more robust and intelligent solution, staying ahead of emerging fraud tactics while promoting credibility and trust in online review ecosystems.
ACKNOWLEDGEMENT
We express our sincere gratitude to Sri Ramakrishna Institute of
Technology for providing us with the opportunity and necessary
resources to carry out this project on Monitoring Review of
Product: Ensuring Authenticity in Consumer Feedback. We
extend our heartfelt thanks to our Principal, Dr. J. David
Rathnaraj, for his constant encouragement and support. We are
deeply indebted to Dr. J. J. Adri Jovin, Head of the Department
of Information Technology, for his invaluable guidance and
motivation throughout the project. Our sincere appreciation goes
to our Project Coordinator, Dr. T. C. Ezhil Selvan, for his
continuous support and insightful suggestions. A special thanks
to our Project Supervisor, Ms. U. Elakkiya, for her expert advice,
constructive feedback, and encouragement, which helped shape
our research work efficiently. We would also like to express our
gratitude to all faculty members of the Department of
Information Technology for their technical support and
knowledge-sharing during the course of this project. Finally, we
extend our heartfelt thanks to our parents, friends, and well-
wishers, whose unwavering support and encouragement played
a crucial role in the successful completion of this project.
REFERENCES
[1] Wang, J., Tang, S., & Zhang, H. Fake Review Detection Based on
Multiple Feature Fusion and Rolling Collaborative Training. IEEE
Access, Vol. 15, Issue 8, pp. 2075–2089, 2020.
[2] Tang, S., Liu, Y., & Zhang, J. Fraud Detection in Online Product
Review Systems via Heterogeneous Graph Transformer. IEEE Access,
Vol. 32, Issue 3, pp. 1234–1245, 2021.
[3] Mohawesh, R., Mulyana, T., & Abduallah, M. A Survey of Fake
Review Detection Techniques in E-Commerce. Journal of Electronic
Commerce Research, Vol. 22, Issue 4, pp. 45–61, 2021.
[4] Tufail, H., H. Fake Reviews Detection in Online Shopping Platforms
During the COVID-19 Era. Journal of Computer Science and
Technology, Vol. 39, Issue 11, pp. 302–315, 2024.
[5] Liu, M., Zhang, L., & Wu, Q. Detecting Fake Reviews Using
Multidimensional Representations with Fine-Grained Aspects Plan.
IEEE Transactions on Neural Networks and Learning Systems, Vol. 33,
Issue 9, pp. 2765–2779, 2021.
[6] Tufail, H. The Effect of Fake Reviews on e-Commerce During and
After Covid-19 Pandemic: SKL-Based Fake Reviews Detection. IEEE
Access, Vol. 10, pp. 25555–25564, 2022.
[7] Abulqader, M., Sadeghi, M., & Abdullah, S. Unified Fake Review
Detection Using Deception Theories. Springer Journal of Artificial
Intelligence Research, Vol. 70, Issue 5, pp. 599–613, 2022.
[8] Ennaouri, M., Benazzouz, A., & Boudraa, R. A Comprehensive Review
of Sentiment Analysis Techniques in Fake Review Detection. Springer
Journal of Computational Intelligence, Vol. 21, Issue 4, pp. 1482–1495,
2023.
[9] Zhang, S., Wei, Z., & Li, K.-C. Building Fake Review Detection Model
Based on Sentiment Intensity and PU Learning. IEEE Transactions on
Neural Networks and Learning Systems, Vol. 34, Issue 10, pp. 6926–
6939, 2023.
[10] Abedin, E., & Boudraa, R. Understanding the Credibility of Online
Drug Reviews. Springer Journal of Health Informatics, Vol. 25, Issue
6, pp. 1324–1342, 2024.