CHAPTER 1
INTRODUCTION
While hundreds of millions of individuals have benefited from the immense sources of information made available by the Internet and social media, there has also been a massive increase in cybercrime. According to a 2019 report in the Economic Times, India experienced a 457% increase in cybercrime between 2011 and 2016.
Many attribute this to the influence of social media platforms such as Instagram on our daily lives. While these platforms clearly aid in the formation of a solid social network, creating a user account on them usually requires only an email address. In contrast to the real world, where laws and regulations require people to identify themselves uniquely (for example, when a passport or driver's licence is issued), no such identification is required to enter the virtual world of social media.
In this project, we look at Instagram accounts in particular and try to determine whether
they are phony or real. In today’s digital landscape, social media has become an integral
part of our lives, connecting billions of people globally. While these platforms offer
significant opportunities for interaction, they also face challenges, including the
proliferation of fake profiles. These fraudulent accounts, often created with malicious
intent, can undermine trust, facilitate scams, and spread misinformation.
The rapid growth of social media platforms has brought numerous benefits to how people
connect and communicate, but it has also led to an alarming rise in fake social media
profiles. These profiles, often created with false or misleading information, can serve
malicious purposes such as spreading misinformation, phishing, scamming, impersonating
individuals, or inflating engagement metrics artificially. Detecting and managing fake
profiles is crucial for maintaining the authenticity of online interactions, combating
cybercrimes, and enhancing user trust.
The Fake Social Media Profile Detection and Reporting project aims to identify and report
fraudulent or suspicious user profiles on social media platforms by analysing behavioural
patterns, profile inconsistencies, and content anomalies using machine learning and data
analysis techniques. With the increasing misuse of social platforms for spreading
misinformation, scams, and impersonation, this project provides an automated and scalable
solution to enhance user safety and trust. It involves gathering data from social networks.
1.2 OBJECTIVE
The objective of this project is to design and implement an automated and efficient system
for detecting and reporting fake social media profiles. Fake profiles are often used for
malicious purposes, including spreading misinformation, scamming individuals,
impersonating others, and conducting cyberbullying, which compromises the security and
integrity of social media platforms. This system will focus on leveraging advanced
technologies such as artificial intelligence (AI) and machine learning (ML) to analyze user
behaviour, account activity patterns, metadata, and content to differentiate fake accounts
from genuine users with high accuracy.
The solution aims to reduce the prevalence of fraudulent accounts, protect users from
potential harm, and enhance the overall trustworthiness of social media platforms. It will
also incorporate a seamless and user-friendly reporting mechanism that allows both users
and administrators to flag suspicious accounts for further review or immediate action.
Furthermore, the system will ensure scalability to handle large datasets, adaptability to
evolving fraudulent tactics, and compliance with privacy and data protection standards to
maintain user trust.
The aim of the proposed system is to develop a robust and intelligent solution for detecting
and reporting fake social media profiles in order to enhance platform security, protect user
privacy, and foster trust among users. By leveraging advanced technologies such as
artificial intelligence, machine learning, and data analytics, the system seeks to accurately
identify fake accounts based on behavioural patterns, profile metadata, and content
analysis. Additionally, the system aims to provide an efficient and user-friendly reporting
mechanism, enabling timely action against fraudulent accounts while ensuring compliance
with ethical, legal, and privacy standards. This will contribute to creating a safer, more
secure, and trustworthy social media ecosystem.
One of the critical aspects of this project lies in real-time detection and adaptability. Fake
accounts often evolve in behavior to avoid detection, mimicking the activities of genuine
users. Therefore, the proposed system must not only rely on static rules but also incorporate
dynamic learning mechanisms that adapt as new patterns emerge. By continuously updating
the model with new data, such as flagged accounts, user feedback, and detection results, the system remains effective against newly created and more sophisticated fake profiles.
This kind of self-improving framework is vital for keeping pace with the rapidly changing
digital landscape.
Another key consideration is user empowerment and engagement in the detection process.
While automation is essential, user input adds a valuable human layer to the system.
Integrating community-based features such as the ability for users to provide feedback on
flagged accounts or participate in crowdsourced reporting can improve both the accuracy
of detection and the sense of shared responsibility. When users feel actively involved in
keeping their platform safe, it not only boosts the effectiveness of the system but also builds
a stronger, more engaged online community.
Lastly, the system’s ethical and transparency components are just as important as its
technical performance. Users must understand why an account has been flagged or
restricted to maintain fairness and avoid unnecessary panic or misuse. Providing
explanations for decisions, offering appeal mechanisms, and respecting user rights are
essential to ensuring that the technology is used responsibly. These measures not only
protect users from false accusations but also strengthen the system’s credibility and
encourage widespread acceptance of its use across various platforms.
1.3 MOTIVATION
The rise of fake social media profiles has introduced challenges that existing detection
systems struggle to address effectively. Traditional methods for identifying fake profiles,
such as manual reviews or rule-based algorithms, have proven to be limited in scalability
and adaptability. Manual detection is labour-intensive, time-consuming, and prone to
human error, while rule-based systems often rely on predefined heuristics that are static and
incapable of addressing evolving tactics employed by fraudsters. For instance,
sophisticated fake profiles can now mimic legitimate user behaviour, making it difficult for
traditional approaches to differentiate between real and fake accounts.
The motivation for employing machine learning lies in its ability to overcome the
limitations of earlier systems by offering scalability, adaptability, and efficiency. By
leveraging advanced techniques such as natural language processing, behavioural analysis,
and anomaly detection, machine learning-based solutions can identify fake profiles more
accurately and proactively. This enables faster detection, reduces the burden on platform
administrators, and enhances the overall security and trustworthiness of social media
platforms, ensuring a safer online environment for users.
Beyond technical challenges, the social and psychological impact of fake profiles is significant and often overlooked. These fraudulent accounts can be used to harass individuals, manipulate opinions during elections, spread false narratives, or deceive users into financial scams. The emotional toll on victims, ranging from anxiety to loss of trust in online communities, highlights the need for more human-centered solutions. An effective
detection system, therefore, does more than just flag suspicious activity; it serves as a
protective barrier for vulnerable users and helps maintain a respectful and trustworthy
digital environment.
Finally, collaboration across platforms and sectors could significantly strengthen the fight
against fake profiles. While individual platforms may develop their own detection systems,
the problem of fake accounts is often cross-platform in nature. Sharing anonymized data,
detection techniques, or threat intelligence among tech companies, cybersecurity firms, and
regulatory bodies could enhance the collective ability to identify and respond to threats.
Such cooperation can lead to industry-wide standards for fake profile detection, ensuring a
more consistent and coordinated defense against digital deception.
Detection Systems:
Binary Classifiers:
These systems analyse profile information to classify accounts as genuine or fake. Algorithms such as Support Vector Machines (SVM), Neural Networks (NN), and Random Forests are commonly used. For instance, a GitHub project demonstrates the use of these algorithms for fake profile detection.
Ensemble Methods:
Techniques like XGBoost and LightGBM combine multiple models to improve detection
accuracy. A study comparing various classifiers found that XGBoost outperformed others in identifying fake profiles.
Deep Learning Approaches:
Long Short-Term Memory (LSTM) networks and multilayer neural networks have also been applied. In addition to detection, systems have been developed to facilitate the reporting of fake profiles.
User Reporting Interfaces:
These systems allow users to report suspicious profiles, providing evidence and details that
are managed by a reporting system. Machine learning models then analyse the reported
profiles to classify them as genuine or fake.
Real-Time Alerts:
Some systems are designed to monitor user activity in real time, flagging suspicious profiles and generating alerts for further investigation.
Manual Verification:
Many current methods rely heavily on manual verification by human moderators, which is time-consuming, error-prone, and cannot scale to handle the vast number of profiles.
Limited Feature Consideration:
Existing systems may not consider a comprehensive set of features that can accurately
distinguish between real and fake profiles, leading to reduced detection accuracy.
Privacy Concerns:
Some existing methods collect and store sensitive user data, raising privacy concerns. Systems may also incorrectly classify genuine profiles as fake or fail to detect fake ones, which negatively impacts the user experience.
Limited Accuracy and Precision:
False Positives: Genuine profiles are sometimes mistakenly flagged as fake, leading to user
dissatisfaction.
False Negatives: Sophisticated fake profiles, especially those created with AI tools, often
bypass detection systems.
ALGORITHMS USED
1. Logistic Regression
Logistic Regression is a simple and effective classification algorithm used to predict binary
outcomes—in this case, whether a profile is real (0) or fake (1). It calculates the probability
of a profile being fake based on features such as username patterns, profile completeness,
number of followers, post frequency, and engagement rates. This algorithm works well for
linearly separable data and is easy to interpret, making it suitable for an initial baseline
model.
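To make this concrete, the following minimal Python sketch trains a logistic regression baseline with scikit-learn. The toy feature values and their meanings are assumptions for illustration only, not the project's actual dataset.

# Minimal logistic-regression baseline (illustrative sketch, hypothetical data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy features: [username_length, has_profile_pic, followers, posts_per_week]
X = np.array([[8, 1, 540, 3], [23, 0, 4, 60], [11, 1, 120, 5],
              [19, 0, 2, 40], [9, 1, 310, 2], [25, 0, 1, 55]])
y = np.array([0, 1, 0, 1, 0, 1])  # 0 = real, 1 = fake

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.predict_proba(X_test)[:, 1])  # probability that each profile is fake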
2. Decision Tree
A Decision Tree splits the dataset into branches based on feature values and makes a
prediction at each leaf node. It is useful for handling both numerical and categorical data
and can model complex decision-making paths such as “If the profile has no picture AND
has 0 followers, then it's likely fake.” However, Decision Trees can overfit on small
datasets, so they are often used in ensembles.
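A small sketch, using two hypothetical features, shows how such a rule emerges from training; export_text prints the learned branches in readable form.

# Illustrative decision tree on two hypothetical features.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 540], [0, 0], [1, 120], [0, 2]]  # [has_profile_pic, followers]
y = [0, 1, 0, 1]                          # 0 = real, 1 = fake

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["has_profile_pic", "followers"]))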
3. Random Forest
Random Forest is an ensemble learning method that builds multiple decision trees and
merges their outputs to improve accuracy and reduce overfitting. It is highly effective for
fake profile detection because it can capture complex interactions between features like
suspicious activity patterns, repetition in bio text, or sudden follower growth. Random
Forest is robust and works well on imbalanced datasets with some tuning.
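One common tuning step for imbalance is class weighting, sketched below on synthetic data that stands in for real profile features; the 90/10 class split mimics the relative rarity of fake profiles.

# Random forest with class weighting on an imbalanced synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))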
4. Support Vector Machine (SVM)
SVM is a powerful algorithm for binary classification problems. It works by finding the
hyperplane that best separates fake and real profiles in a high-dimensional space. Using
kernel functions, SVM can also handle non-linear data patterns, such as the subtle textual
differences in bios or activity behavior between real and fake profiles.
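Because SVMs are distance-based, feature scaling matters in practice; the sketch below pairs a standard scaler with an RBF kernel on synthetic stand-in data.

# SVM with an RBF kernel; features are standardized first (illustrative only).
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=6, random_state=1)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
print(clf.predict(X[:5]))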
5. Naive Bayes
Naive Bayes is especially useful when working with textual features like profile bios,
usernames, or posts. It applies Bayes' theorem with the assumption of feature independence
and performs well in natural language processing tasks. For example, it can flag profiles
that use certain fake-promoting keywords or unnatural patterns in their bio descriptions.
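A minimal text-classification sketch follows; the bios and labels are invented for illustration, and a bag-of-words representation feeds a multinomial Naive Bayes model.

# Naive Bayes over hypothetical profile bios (bag-of-words features).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

bios = ["free followers click link now", "photographer and coffee lover",
        "win cash fast dm me", "phd student sharing research notes"]
labels = [1, 0, 1, 0]  # 1 = fake, 0 = real

text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(bios, labels)
print(text_clf.predict(["click now for free followers"]))  # likely flags as fake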
6. K-Nearest Neighbors (KNN)
KNN is a lazy learning algorithm that classifies a new profile based on the majority class
of its ‘K’ nearest neighbors in the feature space. If a new profile is similar to previously
flagged fake ones, KNN can detect it based on its closeness to those examples. It works
best with normalized datasets and smaller-scale applications due to its higher
computational cost.
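The sketch below shows the normalization step explicitly; without it, a large-range feature such as follower count would dominate the distance computation.

# KNN with min-max scaling, since KNN relies on feature distances.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=300, n_features=5, random_state=2)
knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X, y)
print(knn.predict(X[:3]))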
7. XGBoost (Extreme Gradient Boosting)
XGBoost is a high-performance boosting algorithm that combines multiple weak learners
(typically decision trees) to create a strong model. It is well-suited for complex datasets
and offers high accuracy, speed, and robustness. XGBoost can be trained to prioritize
important features like fake engagement patterns or content duplication.
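A brief sketch, assuming the xgboost package is installed; scale_pos_weight is one conventional way to make the booster pay more attention to the rare fake class.

# XGBoost sketch on imbalanced synthetic data (illustrative parameters).
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9], random_state=3)
ratio = (y == 0).sum() / (y == 1).sum()  # negatives per positive

booster = XGBClassifier(n_estimators=300, max_depth=4,
                        scale_pos_weight=ratio, eval_metric="logloss")
booster.fit(X, y)
print(booster.predict_proba(X[:3])[:, 1])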
The proposed system emphasizes behavioral analytics as a core element for detecting fake
profiles. Instead of relying solely on static profile information such as names, bios, or
profile pictures, the system continuously monitors dynamic behavioral signals. These
include patterns in posting frequency, follower-to-following ratios, engagement
irregularities, and response times. For instance, accounts that post content at unusually
consistent intervals or have disproportionate interaction metrics may raise red flags. By
examining these subtleties, the system can detect even sophisticated fake profiles that
mimic human behavior.
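Such behavioural signals are typically derived from raw activity counts before modelling; a small pandas sketch, with hypothetical column names and values, illustrates the idea.

# Deriving behavioural features from raw counts (hypothetical columns).
import pandas as pd

profiles = pd.DataFrame({
    "followers": [540, 4, 120, 2],
    "following": [300, 2100, 90, 1800],
    "posts_last_30d": [12, 0, 25, 240],
})
profiles["follow_ratio"] = profiles["followers"] / (profiles["following"] + 1)
profiles["posts_per_day"] = profiles["posts_last_30d"] / 30
# A very low follow_ratio combined with an unusually high posting rate is a red flag.
print(profiles)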
Another integral part of the system is the use of network analysis, which examines how a
profile is connected to others in the social media space. Real users typically exhibit organic
and diverse social networks, whereas fake profiles often show clusters of artificial or
bot-like connections. The system can evaluate these patterns through graph-based
techniques, identifying suspicious clusters or anomalies within a user’s friend or follower
network. This approach enhances the system's ability to detect coordinated fake profile
campaigns or networks of bots working together.
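A toy illustration of the graph-based intuition, using networkx: a loose chain of users versus a tight, isolated clique. The node names and the density reading are assumptions for this sketch, not the system's actual criteria.

# Dense, isolated clusters can indicate coordinated fake accounts.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])   # organic chain
G.add_edges_from([("x1", "x2"), ("x1", "x3"), ("x1", "x4"),
                  ("x2", "x3"), ("x2", "x4"), ("x3", "x4")])         # tight clique

for component in nx.connected_components(G):
    sub = G.subgraph(component)
    print(sorted(component), "density:", round(nx.density(sub), 2))
# A density near 1.0 in a cluster with no outside ties may warrant review.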
To ensure real-time detection and response, the system incorporates streaming data
pipelines that process user activity as it happens. This is crucial for minimizing damage
caused by fake profiles before they reach large audiences. The architecture is designed to
handle high-throughput data environments, allowing social media platforms to respond
instantly when suspicious activity is detected. This real-time capability is essential for
platforms dealing with high volumes of content and users, enabling timely interventions
such as flagging, limiting, or disabling accounts.
The system also includes a feedback and learning loop, allowing it to evolve and improve
over time. User reports, moderator actions, and system decisions feed back into the learning
model, helping refine detection accuracy with each cycle. This adaptive mechanism ensures
that the system stays current with new tactics employed by malicious actors. Moreover, it
minimizes false positives by learning from past detection errors and adjusting thresholds
accordingly, making the system more reliable and less disruptive to genuine users.
Finally, the proposed system supports modular deployment and scalability, allowing it to
be integrated into existing platform infrastructures with minimal disruption. Each
component—data collection, preprocessing, detection engine, reporting, and user
interface—is built as a standalone module. This design makes it easier to upgrade or replace
individual parts without affecting the entire system. It also supports expansion, so as user
bases grow or new platforms are added, the system can scale horizontally and maintain
performance levels.
1.6 SCOPE AND PURPOSE
The primary purpose of the "Fake Social Media Profile Detection and Reporting" project
is to develop a robust, automated system that can identify, detect, and report fake or
fraudulent profiles across social media platforms. These fake profiles are often used for
malicious activities such as identity theft, phishing, scams, and spreading misinformation.
The system aims to enhance user safety, improve platform integrity, and maintain trust in
social media interactions by providing an effective mechanism for identifying and
managing suspicious accounts.
The scope and purpose of fake social media profile detection and reporting focus on
maintaining the integrity, security, and trustworthiness of online platforms. These systems
aim to identify and remove fake accounts, bots, and impersonators to protect users from
scams, identity theft, and harassment while ensuring authentic interactions and accurate
analytics for businesses.
Additionally, such systems support compliance with regulatory frameworks like the GDPR or the Digital Services Act and mitigate the misuse of emerging technologies, such as AI-generated profiles or bots, for malicious purposes. Ultimately, the purpose is to create a more secure and credible digital ecosystem for users and organizations. The primary purpose of this project is to develop an intelligent system that can accurately detect and report fake or malicious social media profiles in order to enhance user safety, trust, and platform integrity.
With the rapid growth of social media, fake profiles are increasingly being used for harmful
purposes such as phishing, misinformation, identity theft, spamming, political
manipulation, and financial scams. By leveraging artificial intelligence, data analytics, and
cybersecurity technologies, this system aims to automatically identify suspicious behaviour
and flag or remove such accounts before they can cause harm. The ultimate goal is to create
a safer and more authentic online environment where users can interact without fear of
deception or exploitation.
The scope of this project covers the design, development, and deployment of a detection
framework that uses machine learning models, natural language processing, image
verification, and network analysis to identify fake profiles. It includes the collection of
social media data (such as posts, follower statistics, profile information, and user
behaviour), training of algorithms to classify profiles, and implementation of automated
and manual reporting tools. The project also explores the integration of real-time monitoring tools for continuous protection, and a user-friendly interface for administrators or users to review flagged accounts.
An important dimension of this project’s scope is its focus on cross-platform applicability.
Fake profiles are not confined to a single social media site; they often operate across
multiple platforms to maximize reach and impact. Therefore, the system aims to be
adaptable and scalable so it can integrate with various social networks, regardless of their
underlying technologies or data structures. This flexibility ensures that detection efforts are
consistent and comprehensive, helping to close gaps that fraudsters might exploit when
moving between platforms.
The project also emphasizes the need for collaborative intelligence and data sharing within
the digital ecosystem. By incorporating mechanisms for anonymized data exchange and
threat intelligence sharing, the system can benefit from collective insights across platforms
and organizations. This approach enables faster identification of emerging fraudulent
tactics and coordinated responses to large-scale attacks, making it harder for fake profiles
to proliferate unchecked. It also supports industry-wide standards for reporting and
response, fostering a unified front against online deception.
Finally, the purpose extends beyond mere detection to include educating users and raising
awareness about the risks posed by fake profiles. An informed user base plays a vital role
in early identification and prevention of fraudulent activity. The project aims to integrate
user-friendly educational tools and alerts that help users recognize suspicious behavior
themselves, empowering them to take appropriate precautions.
CHAPTER 2
LITERATURE SURVEY
Michael Fire et al. (2012), "Strangers intrusion detection-detecting spammers and fake profiles in social networks based on topology anomalies," Human Journal 1(1): 26-39; Günther, F. and S. Fritsch (2010), IEEE Conference on Machine Learning and IoT
Fake and clone profiles create dangerous security problems for social network users. Cloning of user profiles is one serious threat, where an existing user's details are stolen to create a duplicate profile, which is then misused to damage the identity of the original profile owner. Attackers can even launch threats like phishing, stalking, and spamming. A fake profile is a profile created in the name of a person or a company that does not really exist on social media, in order to carry out malicious activities. In this paper, a detection method is proposed that can detect fake and clone profiles on Twitter. Fake profiles are detected based on the number of abuse reports, the number of comments per day, and the number of rejected friend requests associated with an account. For profile clone detection, two machine learning algorithms are used: the Random Forest classification algorithm and the Support Vector Machine algorithm. The authors also experimented with other ML algorithms, and those training and testing results are included in the paper.
The retrieval decision is made by comparing the terms of the query with the index terms (important words or phrases) appearing in the document itself. The decision may be binary (retrieve/reject), or it may involve estimating the degree of relevance that the document has to the query. Unfortunately, the words that appear in documents and in queries often have many structural variants. So, before information is retrieved from the documents, data preprocessing techniques are applied to the target data set to reduce its size, which increases the effectiveness of the IR system. The objective of this study is to analyse the issues of preprocessing methods such as tokenization, stop word removal, and stemming for text documents. Keywords: Text Mining, NLP, IR, Stemming.
Shalinda Adikari and Kaushik Dutta, Identifying Fake Profiles in LinkedIn, PACIS
2014 Proceedings, AISeL
As organizations increasingly rely on professionally oriented networks such as LinkedIn
(the largest such social network) for building business connections, there is increasing
value in having one's profile noticed within the network. As this value increases, so does
the temptation to misuse the network for unethical purposes. Fake profiles have an adverse
effect on the trustworthiness of the network as a whole, and can represent significant costs
in time and effort in building a connection based on fake information. Unfortunately, fake
profiles are difficult to identify. Approaches have been proposed for some social networks;
however, these generally rely on data that are not publicly available for LinkedIn profiles.
In this research, we identify the minimal set of profile data necessary for identifying fake
profiles in LinkedIn, and propose an appropriate data mining approach for fake profile
identification. We demonstrate that, even with limited profile data, our approach can
identify fake profiles with 87% accuracy and 94% True Negative Rate, which is comparable
to the results obtained based on larger data sets and more expansive profile information.
Further, when compared to approaches using similar amounts and types of data, our method
provides an improvement of approximately 14% accuracy.
Computer Networks and Information Technology (ICCNIT), 2011 International Conference on, July, pp. 35–390.
The social network, a crucial part of our life, is plagued by online impersonation and fake accounts. Facebook, Instagram, and Snapchat are the most well-known social networking sites. Fake profiles are mostly used by intruders to carry out malicious activities such as harming individuals, identity theft, and privacy intrusion in online social networks (OSNs).
Hence, recognizing whether an account is genuine or fake is one of the basic issues in OSNs. This work proposes a model that can be used to classify an account as fake or genuine. The model uses the random forest method as a classification technique and can process an enormous dataset of accounts at once, eliminating the need to evaluate each account manually. The problem can be framed as either a classification or a clustering problem. As this is an automatic detection method, it can be applied effectively by online social networks that have a large number of profiles, which cannot be inspected manually.
Stein T, Chen E, Mangla K, "Facebook immune system," in Proceedings of the 4th Workshop on Social Network Systems, ACM, 2011, pp.
Popular Internet sites are under attack all the time from phishers, fraudsters, and spammers. They aim to steal user information and expose users to unwanted spam. The attackers have vast resources at their disposal. They are well-funded, with full-time skilled labor, control over compromised and infected accounts, and access to global botnets. Protecting our users is a challenging adversarial learning problem with extreme scale and load requirements. Over the past several years we have built and deployed a coherent, scalable, and extensible real-time system to protect our users and the social graph. This Immune System performs real-time checks and classifications on every read and write action. As of March 2011, this is 25B checks per day, reaching 650K per second at peak. The system also generates signals for use as feedback in classifiers and other components.
Saeed Abu-Nimeh, T. M. Chen, and O. Alzubi, "Malicious and Spam Posts in Online Social Networks," Computer, vol. 44, no. 9, IEEE, 2011, pp. 23–28.
Many people today use social networking sites as a part of their everyday lives. They create
their own profiles on the social network platforms every day, and they interact with others
regardless of their location and time. In addition to providing users with advantages, social
networking sites also present security concerns for them and their information. We
need to classify the social network profiles of the users to figure out who is encouraging
threats on social networks. From the classification, we can figure out which profiles are
genuine and which are fake. As far as detecting fake profiles on social networks is
concerned, we currently have different classification methods. However, we must improve
the accuracy of detecting fake profiles in social networks. We propose the use of a machine
learning algorithm and Natural Language Processing (NLP) technique in this paper so as
to increase the detection rate of fake profiles. This can be achieved using Support Vector
Machines (SVM) and Naïve Bayes algorithms.
of profile visits over a period of 90 days for users in the Peking University Renren network
and use statistics of profile visits to study issues of user profile popularity, reciprocity of
profile visits, and the impact of content updates on user popularity. We find that latent
interactions are much more prevalent and frequent than active events, are nonreciprocal in
nature, and that profile popularity is correlated with page views of content rather than with
quantity of content updates. Finally, we construct latent interaction graphs as models of
user browsing behavior and compare their structural properties, evolution, community
structure, and mixing times against those of both active interaction graphs and social graphs.
CHAPTER 3
SYSTEM REQUIREMENTS
1. System Requirements
Operating System: Windows 10/11, Linux, or macOS
RAM: Minimum 4 GB (8 GB recommended)
Python Version: Python 3.8 or higher
Storage: At least 500 MB free space for libraries and datasets
2. Programming Tools
Python IDE: VS Code / PyCharm / Jupyter Notebook
Package Manager: pip (Python's package installer)
Version Control (optional): Git (for source code tracking and collaboration)
3. Python Libraries
Based on machine learning + web scraping + data processing:
Core Libraries:
• matplotlib / seaborn – data visualization
Machine Learning:
• scikit-learn – ML algorithms (Random Forest, SVM, etc.)
• xgboost (optional) – gradient boosting
• joblib – for saving ML models
Text Processing / NLP:
• nltk or spaCy – for profile description analysis
• re – regular expressions for data cleaning
Web Scraping / API (if needed):
• requests – handling HTTP requests
• beautifulsoup4 – parsing HTML for scraping
• selenium – dynamic web scraping
Web Interface (optional):
• Flask – lightweight web framework for the user interface
Database:
• SQLite (via sqlite3) or MySQL for storing profiles and reports
Report Generation:
• FPDF or ReportLab for generating reports in PDF format
Logging:
• Python logging module to track detections
Hardware Requirements
1. Processor (CPU):
A multi-core processor (Intel i5 or AMD Ryzen 5 or higher) is recommended for faster data processing and machine learning model training.
2. Memory (RAM):
Minimum 4 GB of RAM is required, but 8 GB or more is recommended to efficiently handle large datasets and run multiple libraries simultaneously.
3. Storage:
At least 500 MB of free disk space is needed for installing Python, libraries, and saving models or data; SSD storage is preferred for faster access.
4. Graphics Card (Optional):
A dedicated GPU (such as an NVIDIA GTX series card) is not necessary unless deep learning or large-scale data processing is involved.
5. Internet Connectivity:
Required for downloading libraries, datasets, or accessing online APIs and web scraping sources.
6. Display Monitor:
A monitor with at least 1366×768 resolution is recommended for a clear view of the development environment, data visualizations, and web interface if implemented.
7. Input Devices:
A standard keyboard and mouse are necessary for coding, testing, and interacting with the development tools and web interface.
CHAPTER 4
SYSTEM DESIGN
The system architecture for fake profile detection typically follows a multi-layered and modular design, ensuring clarity, scalability, and ease of maintenance. At its core, the architecture consists of four main layers: the data collection layer, the processing and detection layer, the reporting and decision-making layer, and the user interaction layer. Each layer has distinct responsibilities, allowing for better performance optimization and separation of concerns. This layered approach ensures that updates or changes in one component do not disrupt the entire system. At the data collection layer, the system gathers data from various social media platforms via APIs or authorized scraping methods. This data includes profile metadata (username, bio, profile picture), likes, comments, posting frequency, and content data (text, hashtags, media). The collected data is securely stored in a centralized or distributed database, depending on the system's scale.
The processing and detection layer is where the core intelligence of the system resides.
This layer handles data preprocessing and feeds the refined data into machine learning
models. These models are trained to detect patterns commonly found in fake profiles such
as automated behavior, repetitive content, and anomalous connection patterns. In more
advanced setups, this layer may also use techniques like natural language processing and
graph-based analysis to improve accuracy and context awareness.
The reporting and decision-making layer acts on the output from the detection models. If a
profile is flagged as suspicious, the system either auto-generates a report or sends it to a
moderation queue, depending on the confidence level. A scoring system can be applied to
indicate how likely the profile is fake. This layer also supports feedback mechanisms where
human moderators or users can approve, dispute, or review flagged results, allowing the
system to continuously improve its accuracy and decision logic.
Finally, the user interaction layer consists of user-facing components like dashboards,
reporting tools, and alert systems. This layer provides interfaces for general users to report
suspicious accounts and for administrators to view analytics, flagged profiles, and system
performance.
4.2 UML DIAGRAMS
UML is managed, and was created by, the Object Management Group. The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a metamodel and a notation. In the future, some form of method or process may also be added to, or associated with, UML.
UML diagrams serve as blueprints for software systems, enabling developers, analysts, and
stakeholders to communicate ideas clearly and effectively. They help bridge the gap
between technical and non-technical team members by providing a visual representation of
how the system works. By modeling key components such as system structure, user
interactions, data flow, and processes, UML diagrams reduce ambiguity and ensure that
everyone involved in the project has a shared understanding of the system's design. This is
especially helpful in complex systems where clear communication is vital to avoid costly
design errors.
There are two broad categories of UML diagrams: structural diagrams and behavioral diagrams. Structural diagrams, like class diagrams, object diagrams, and component diagrams, represent the static aspects of the system. These diagrams show how components relate to each other, including inheritance, associations, and dependencies. Behavioral diagrams, such as use case diagrams, sequence diagrams, and activity diagrams, focus on the dynamic behavior of the system, illustrating how components interact over time and
how the system responds to various inputs. Both types are essential for fully understanding
and designing the system.
In the context of a Fake Profile Detection System, UML diagrams can play a critical role
in mapping the workflow. For instance, a use case diagram can depict how users and
administrators interact with the system such as submitting reports, viewing flagged profiles,
or verifying suspicious accounts. A sequence diagram can illustrate the step-by-step process
of data collection, model analysis, and reporting actions. These diagrams provide clarity on
how the system handles requests and processes data from end to end, ensuring all scenarios
are accounted for.
Furthermore, UML facilitates modular and maintainable design, which is key in projects
that evolve over time. As requirements change or new features (like image recognition or
multilingual support) are added, UML diagrams can be updated to reflect the new structure
or behavior. This makes the software easier to maintain, refactor, or extend in the future.
UML also supports better documentation, which is helpful when onboarding new
developers or handing over the project to different teams.
Overall, UML diagrams are more than just drawings; they are an integral part of the software
engineering lifecycle, from planning and analysis to implementation and testing. For any
serious software project, especially one involving complex AI and data pipelines like fake
profile detection, using UML ensures the system is well-structured, scalable, and easy to
understand at every stage of development.
UML (Unified Modeling Language) diagrams are standardized visual representations used in software engineering to illustrate the structure, behaviour, and interactions of a system. They help developers and stakeholders understand system design, architecture, and workflow. Common types include class diagrams, which show structure; use case diagrams, which show user interactions; sequence diagrams, which show object interactions over time; and activity diagrams, which show workflows.
4.2.1 USE CASE DIAGRAM
A Use Case Diagram is a visual representation that illustrates the interactions between users
(actors) and a system, highlighting the system's functionalities (use cases). It helps identify
the requirements of a system by showing what users can do with it and how they interact
with various features. Use Case Diagrams are valuable for understanding user needs and
guiding system design.
A Use Case Diagram visually represents how users (actors) interact with a system through
different functionalities (use cases). It includes actors, use cases, system boundaries, and
relationships like association, include, and extend. This diagram helps in requirement
analysis, system design, and communication between stakeholders. It is widely used in
software documentation to clarify system behaviour and interactions.
In the Fake Social Media Profile Detection and Reporting project, the use case diagram
involves three primary actors: the user, the admin, and the detection system. Users can
register or log in, submit suspicious profiles for analysis, view detection results, and report
fake profiles directly. The admin oversees the system by reviewing reported profiles,
validating detection results, and taking necessary actions such as banning or deleting
confirmed fake accounts. The detection system, powered by machine learning, analyses
profile data and classifies them as real or fake. These interactions collectively ensure a
streamlined process for identifying and managing fake social media profiles effectively.
A use case in software engineering describes how a user or actor interacts with a system to
achieve a specific goal. It outlines the system's functional requirements by detailing the
steps involved in each interaction, helping developers understand user needs and system
behaviour. Use cases are typically represented in use case diagrams, which visually map
the relationships between actors and their associated tasks, making it easier to design and
validate system functionality.
Fig 4.2.1 Use case diagram
4.2.2 CLASS DIAGRAM
In the Fake Social Media Profile Detection and Reporting project, the class diagram
consists of several key classes that represent the main components of the system. The
primary classes include User, Profile, Report, Detection System, and Admin. The User class
holds attributes like username, password, and report history, along with methods for
registration, login, and profile submission. The Profile class contains information about the
profile being analyzed, such as profile details, activity patterns, and detection results.
Report represents the user-submitted fake profile reports, holding data like report ID, date
submitted, and status (pending, resolved), with methods for report generation and status
updates.
The Detection System class is responsible for analysing profiles using machine learning
algorithms. It contains methods for processing profile data, flagging fake profiles, and
generating results based on model predictions. The Admin class handles administrative
tasks, including managing user reports, validating detection outcomes, and taking actions
like banning fake accounts or generating system activity logs. Relationships between these
classes include associations such as a user having multiple reports and a detection system
analyzing many profiles. The system’s structure ensures smooth interaction between the
components, allowing for an efficient detection and reporting process.
A class diagram is a type of UML diagram that shows the static structure of a system by
illustrating its classes, attributes, methods, and the relationships between them. It represents
the blueprint of the system, helping developers understand how different entities interact
and are organized. Class diagrams are essential in object-oriented design, as they define the
system’s building blocks and their connections, supporting code development, system
organization, and maintenance.
A class diagram provides a detailed blueprint of a system’s classes, showing not only their
attributes (data members) and methods (functions or operations) but also the various
relationships among classes, such as inheritance (generalization), association, aggregation,
and composition. It captures how objects of different classes collaborate and interact within
the system. For example, inheritance shows a parent-child hierarchy where subclasses
inherit properties and behaviours from a superclass, while associations represent
connections or dependencies between classes. Class diagrams help in designing and
visualizing the system’s architecture before coding, ensuring a clear understanding of data
structures, system logic, and how components fit together.
4.2.3 SEQUENCE DIAGRAM
In the Fake Social Media Profile Detection and Reporting project, the sequence diagram
illustrates the flow of interactions between the user, the detection system, and the admin.
The sequence begins when the user registers or logs in to the system. Once logged in, the
user submits a profile for verification. The system processes the profile data, invoking
machine learning algorithms to analyze the profile's authenticity. The detection system then
returns the result (real or fake) to the user, who can view the detection outcome. If the
profile is flagged as fake, the user has the option to report it, which creates a new report
entry in the system.
After the user submits the report, the admin is notified and can access the reported profile.
The admin reviews the report and the profile's detection results. If the admin confirms the
profile is fake, they can take actions such as banning or deleting the profile. The admin can
also generate activity reports for tracking system performance and managing fake profiles.
The sequence diagram helps visualize the interaction flow between these entities, ensuring
an efficient process for profile detection, reporting, and administrative action.
4.2.4 ACTIVITY DIAGRAM
Activity diagrams are valuable for modeling workflows, business processes, and complex operations, helping stakeholders understand the dynamics of a system and identify potential improvements by showing the sequence of activities and decisions involved in a process.
An activity diagram in UML (Unified Modeling Language) is used to model the flow of
control or behavior within a system. It graphically represents the sequence of activities or
workflows, decisions, and parallel processes involved in a system. Activity diagrams are
particularly useful for visualizing business processes or the logic of a System’s
functionality.
The activity diagram in the Fake Social Media Profile Detection and Reporting project
visually represents the workflow of actions taken by users, the detection system, and
administrators. It begins with the user login or registration process, which is the entry point
for interacting with the system. After successful authentication, users can choose to submit
a profile for fake detection or manually report a suspicious profile. If the user chooses to
submit a profile, the system collects relevant profile data and sends it to the detection
module for analysis.
In the next stage of the activity flow, the detection system processes the profile using
machine learning and natural language processing techniques. The activity diagram shows
the steps involved in data preprocessing, feature extraction, and prediction. Once the
detection is complete, the system returns a result classifying the profile as "Real" or "Fake."
This result is then presented to the user, who can decide to take further action. If the profile
is marked as fake, the user has the option to submit a report. This report is saved in the
system and marked as "Pending Review" for the admin.
The final part of the activity diagram involves admin actions. The admin accesses all
pending reports and reviews the associated profiles and detection results. Based on this
review, the admin can choose to ignore the report, flag the profile for closer monitoring, or
take corrective action like banning or deleting the account. The admin also has the ability
to generate reports and monitor system performance. The activity diagram thus captures
the entire lifecycle from user interaction to system processing and admin decision-making, ensuring a clear understanding of the system's operational flow.
4.2.5 DEPLOYMENT DIAGRAM
A Deployment Diagram is a type of UML (Unified Modeling Language) diagram used in software engineering to visualize the physical deployment of artifacts (such as software components, applications, or services) on hardware nodes. It shows how different parts of a system are deployed, how they interact, and the physical resources involved, including how components are distributed across different environments.
CHAPTER 5
IMPLEMENTATION
The rapid expansion of social media platforms has led to an exponential increase in the
creation of user profiles. Unfortunately, many of these profiles are fake, created with malicious intent such as impersonation, spreading misinformation, fraud, or phishing. This
project aims to design and implement a robust system that can detect and report fake social
media profiles using machine learning techniques and pattern analysis.
The core objective is to protect genuine users by identifying suspicious behavior and
anomalies in user profiles, thereby enhancing the overall trust and integrity of the platform.
The solution is envisioned to be deployable across different platforms with minimal
modification, depending on available APIs and data access policies.
The first step in implementing the system involves collecting user data from a social media
platform. Since direct access to real data is restricted due to privacy concerns, the project
may utilize publicly available datasets or synthetically generated data. Key features used
for detection include account age, posting frequency, follower-following ratio, content
originality, sentiment of posts, profile completeness, and interaction patterns (likes,
comments, shares). These features are preprocessed and normalized to build a training
dataset. Natural Language Processing (NLP) tools can also be applied to analyze the
language used in posts or bios, where spammy, overly generic, or promotional content
might indicate a fake profile.
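As a small illustration of the preprocessing step, the sketch below normalizes a few of the features named above; the column names and values are invented for this example.

# Normalizing assumed numeric profile features to a common [0, 1] range.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = pd.DataFrame({
    "account_age_days": [1200, 3, 400, 7],
    "posting_frequency": [0.4, 30.0, 1.2, 55.0],
    "follower_following_ratio": [1.8, 0.002, 1.1, 0.001],
    "profile_completeness": [0.9, 0.2, 0.8, 0.1],
})
scaled = MinMaxScaler().fit_transform(data)
print(scaled.round(2))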
The labeled dataset is split into training and testing sets, and cross-validation is used to
ensure accuracy. Performance is evaluated based on metrics like precision, recall, F1-score,
and ROC-AUC. In addition, unsupervised techniques such as clustering or anomaly
detection (e.g., DBSCAN, Isolation Forest) can help identify outliers that may indicate
suspicious profiles in the absence of labeled data.
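The evaluation and anomaly-detection steps can be sketched together as follows, on synthetic stand-in data; the classifier choice and the contamination rate are assumptions for illustration.

# Supervised metrics plus an unsupervised outlier check (illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))        # precision, recall, F1
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

iso = IsolationForest(contamination=0.1, random_state=0).fit(X_tr)
print("outliers flagged:", (iso.predict(X_te) == -1).sum())  # -1 marks anomalies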
Once a reliable model is trained, it is integrated into the social media platform’s backend
to run detection in real time. Each time a new profile is created or an existing one is updated,
the system analyzes the data and assigns a trust score. Profiles with scores below a certain
threshold are flagged for review. A dashboard for moderators displays flagged profiles, the
reasoning behind the classification, and options for manual verification or automatic action
(e.g., temporary suspension, user challenge, or deletion). A reporting feature is also
available for users to flag suspicious accounts, which further refines the model using
feedback loops.
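The trust-score thresholding described above might be routed as in the following sketch; the cut-off values and actions are illustrative policy assumptions, not fixed rules.

# Mapping a model's fake-probability to a trust score and an action (assumed thresholds).
def route_profile(fake_probability: float) -> str:
    trust_score = 1.0 - fake_probability
    if trust_score < 0.3:
        return "auto-flag for moderator review"
    if trust_score < 0.6:
        return "challenge user (e.g. verification step)"
    return "no action"

for p in (0.95, 0.55, 0.10):
    print(f"P(fake)={p:.2f} ->", route_profile(p))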
A critical aspect of the system is its ability to generate detailed and transparent reports for
every detection case. These reports include feature weights, prediction confidence, and
recommendations for action. Ethical considerations are taken into account, such as
avoiding bias in training data, respecting user privacy, and ensuring compliance with data
protection regulations like GDPR. False positives are minimized through careful model
tuning and incorporating user verification processes before taking action. The system
should also include an appeal process for users wrongly flagged as fake, along with
automated logs to ensure accountability and fairness.
For scalability and performance, the system can be deployed using a microservices
architecture with components for data ingestion, feature processing, model inference, and
reporting services. Cloud platforms such as AWS, Azure, or Google Cloud can be used for
hosting and scalability. In the future, integrating deep learning models like BERT for
content analysis and graph-based analysis (e.g., Graph Neural Networks) for network
behavior modeling could enhance accuracy.
5.1 MODULES SPLIT UP
1. User Management Module
• Sends alerts to users and admins (e.g., report status updates, detection results).
• Can use email, in-app notifications, or logs for communication.
9. Logging and Monitoring Module
• Tracks system activity, errors, and user actions for auditing and debugging.
• Helps in analysing system performance and identifying unusual patterns or abuse.
4. Flask
Flask is a lightweight Python web framework used to build the user interface. It allows
users to log in, submit profiles for detection, view results, and report fake profiles. Flask
also connects the frontend with the backend detection logic and database.
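A minimal sketch of how such a Flask endpoint can be wired up follows; the route name, the analyze_profile helper, and the returned fields are illustrative assumptions rather than the project's exact code.

# Minimal Flask endpoint for the check-username flow (names are illustrative).
from flask import Flask, request

app = Flask(__name__)

def analyze_profile(username: str) -> dict:
    # Stub standing in for the trained model; a real system would load it here.
    return {"username": username, "fake": username[-1].isdigit()}

@app.route("/check", methods=["POST"])
def check_username():
    result = analyze_profile(request.form["username"])
    return result  # Flask (1.1+) serializes the returned dict to JSON

if __name__ == "__main__":
    app.run(debug=True)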
5. SQLite
SQLite is a simple and efficient database system used to store user credentials, profile
submissions, detection results, and report records. It is lightweight and does not require a
server, making it ideal for small to medium-sized applications like this one.
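As an illustration, report records could be stored with the built-in sqlite3 module as sketched below; the table schema and column names are assumptions for this example.

# Storing user reports in SQLite (schema is an illustrative assumption).
import sqlite3

conn = sqlite3.connect("reports.db")
conn.execute("""CREATE TABLE IF NOT EXISTS reports (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    username TEXT NOT NULL,
                    fake INTEGER,                       -- 1 = fake, 0 = real
                    status TEXT DEFAULT 'Pending Review',
                    created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
conn.execute("INSERT INTO reports (username, fake) VALUES (?, ?)", ("justin95912", 1))
conn.commit()
print(conn.execute("SELECT username, status FROM reports").fetchall())
conn.close()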
Fake social media profile detection uses a combination of Machine Learning (ML), Natural
Language Processing (NLP), and bot detection systems to identify suspicious accounts. ML
algorithms such as Random Forest, SVM, and deep neural networks analyze features like
account age, post frequency, follower/following ratios, and engagement patterns to classify
accounts as genuine or fake. NLP techniques process textual content—like bios, posts, and
comments—to detect spam, repetition, and sentiment patterns using tools like keyword
analysis and named entity recognition. Image analysis is also crucial, employing reverse
image search and deepfake detection to identify AI-generated or stolen profile pictures.
In addition to content and behaviour analysis, graph-based network analysis is used to find
clusters of fake profiles and detect unnatural interaction patterns. Cybersecurity methods
like IP tracking, device fingerprinting, and honeypots help detect bots and prevent
impersonation. Real-time data processing tools such as Apache Kafka and Apache Spark
enable continuous monitoring of user activity. Big data technologies and scalable storage
solutions support the massive volume of social media data.
CHAPTER 6
RESULTS
Fig 6.1(a)
This first screen outlines the skeletal structure of the user interface using template logic,
likely from a Flask-based web application. It includes a text box labeled "Enter
Username" where users can input a username they suspect to be fake. A "Search" button
triggers the backend logic to check the entered username. The use of template tags like
{% if result %} and {% for key, value in user_data.items() %} suggests the dynamic
rendering of results returned by the backend. This screen is essential as it forms the
foundation of the user interface, showing how the front-end will display user information
once it’s retrieved from the fake account detection system.
The image shows a web interface for a "Check Username" feature, most likely built using
the Flask web framework with Jinja2 templating. It contains a form where users can enter
a username and click a "Search" button to retrieve and display related user data. The user
details section is conditionally rendered using Jinja2 syntax, which displays a dictionary
of user data in a readable format if results are found. There's also a "Report User" button
that appears when user data is available, possibly to flag inappropriate or suspicious
profiles.
Fig 6.1(b)
In this screen, a user has entered a sample username, "justin95912", into the input field and
is about to click the “Search” button. This represents the interaction phase, where the user
engages with the tool by providing the suspicious username. When the “Search” button is
clicked, a request is sent to the server-side logic that analyzes this username using
predefined criteria such as name length, profile picture availability, follower/following
ratio, and more. This screen signifies the start of the actual detection process and is crucial
for initiating the evaluation of potential fake accounts.
This web interface is likely part of a larger application aimed at monitoring or validating
user identities, which could be useful in platforms such as forums, social media tools, or
internal dashboards for admin users. When the "Search" button is clicked, the backend
would typically receive the inputted username (justin95912 in this case), process it through
a server-side function, and query a database or API to retrieve any corresponding user
information. If results are found, the page may update dynamically (possibly through
template rendering or AJAX) to show user attributes like name, email, activity history, or
status.
Fig 6.1(c)
After the backend processes the username, this screen displays the detailed analysis. For
the user "justin95912", the tool outputs multiple attributes such as username length, full
name structure, profile picture link, number of posts, followers, and whether the profile is
private. It concludes with a result: "justin95912 is a Fake Account", marked by the Fake:
TRUE field. The data-driven approach shown here is at the core of your detection
mechanism. The tool evaluates a combination of metadata and behavioral patterns to flag
suspicious profiles. A red "Report User" button is also available, allowing users to escalate
the case by reporting it to the platform or system administrator.
The image displays the result of a username check for "justin95912", where the system has
identified the account as fake. A detailed table of user attributes is shown, including values
such as Username length (11), fullname words (2), and Description length (100). It also
includes the profile picture URL, an external link (https://fkvryqaz.com), and engagement
metrics like 65 posts, 8194 followers, and 2680 follows. Notably, fields like Name equals
username and Private are marked FALSE, and the final Fake field is marked TRUE,
indicating that the account does not meet the criteria for authenticity. A red "Report User"
button is provided for taking further action.
Fig 6.1(d)
This final screen confirms that the user has reported the suspicious account. After clicking
the “Report User” button on the previous screen, the interface provides feedback via a green
notification that says “Report sent successfully!”. This serves as an acknowledgment that
the system has logged or forwarded the report for further action. This functionality is vital
for closing the loop in the fake detection process, providing users with a sense of
completion and helping administrators take the next steps in moderating or reviewing
flagged accounts.
The image shows the confirmation screen of a web-based username verification system
after a report has been submitted. At the top of the interface is the heading "Check
Username", followed by an empty input field and a blue "Search" button. Below that, a
green success message reading "Report sent successfully!" is prominently displayed,
confirming that the user has successfully flagged a suspicious or fake account. This
message is likely triggered by a backend action, such as recording the report in a database
or notifying administrators.
The design of this confirmation interface is clean and user-friendly, ensuring that the reporting process feels intuitive and complete. The use of color (blue for the button and green for success feedback) helps communicate interaction states effectively. Such a feature is essential in systems where user-generated content must be reviewed and moderated.
CHAPTER 7
TESTING
TEST CASES
1. User Registration Test
• Expected Output: The model should classify profiles correctly with at least the target
accuracy (e.g., >85%).
7. Detection Failure Handling
The testing process for fake social media profile detection and reporting involves
evaluating the system’s ability to identify and classify fraudulent accounts accurately. This
is typically done by creating a well-balanced dataset consisting of genuine and fake
profiles, including synthetic accounts, bots, or manually flagged suspicious users. Various
machine learning models or rule-based systems are trained using features such as profile
completeness, activity patterns, posting frequency, follower-following ratios, and
engagement metrics. During testing, this dataset is divided into training and testing subsets
to validate the model’s accuracy, precision, recall, and F1-score. Cross-validation
techniques are also used to ensure the robustness and generalizability of the detection
model across different data samples.
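To make this evaluation pipeline concrete, the short sketch below shows one way the split, the four metrics, and cross-validation could be wired together with scikit-learn. The synthetic dataset and the random-forest model are illustrative assumptions rather than the project's fixed configuration.

# Evaluation sketch: synthetic data stands in for the real profile features
# (profile completeness, posting frequency, follower-following ratio, ...).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Stand-in dataset: 1000 profiles, 8 features, label 1 = fake (assumption)
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)

# Stratified split keeps the fake/genuine balance identical in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))

# 5-fold cross-validation checks robustness across different data samples
print("CV F1:", cross_val_score(model, X, y, cv=5, scoring='f1').mean())

Stratifying the split matters here because fake profiles are usually the minority class; without it, precision and recall can swing widely between runs.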
Beyond accuracy-focused testing, it is crucial to conduct functional testing to ensure each
component of the system works as intended. This includes validating the end-to-end
pipeline from data input (like URLs or usernames), through preprocessing and model
inference, to the generation of detection scores and report submissions. Each module, such
as the user interface for reporting, the backend API for data processing, and the alert
generation system, should be tested individually (unit testing) and in combination
(integration testing). This ensures the system not only makes correct predictions but also
delivers those predictions effectively to users or administrators in real-world conditions.
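As a sketch of what such unit and integration tests could look like, the pytest snippet below drives the routes from the sample-code listing through Flask's built-in test client. The module name app and the username form field are assumptions about the project layout.

# Functional-test sketch (pytest). Assumes the Flask app object is importable
# from app.py and that index.html renders the result and flashed messages.
import pytest
from app import app

@pytest.fixture
def client():
    app.config['TESTING'] = True
    with app.test_client() as client:
        yield client

def test_index_page_loads(client):
    # Unit-level: the search page responds
    assert client.get('/').status_code == 200

def test_search_returns_verdict(client):
    # Integration-level: username in, verdict (or "no data" message) out
    response = client.post('/', data={'username': 'justin95912'})
    assert response.status_code == 200
    assert b'Account' in response.data or b'No data found' in response.data

def test_report_confirmation(client):
    # End-to-end: reporting redirects back with the success flash message
    response = client.post('/report', follow_redirects=True)
    assert b'Report sent successfully!' in response.data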
Another critical phase involves stress and performance testing, especially for platforms
expecting high user traffic or real-time detection. The system should be tested under
simulated high-load conditions to evaluate its response time, stability, and scalability.
Performance testing helps identify bottlenecks in model inference time, database querying
speed, and server throughput. These insights are vital for optimizing infrastructure,
ensuring that the system remains responsive even when processing thousands of profiles or
handling concurrent report submissions.
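Under those constraints, the hedged sketch below simulates concurrent search traffic against a locally running instance and reports average and 95th-percentile latency. The URL, form field, and request counts are placeholders; a dedicated tool such as Locust or JMeter would be the natural choice for thorough performance testing.

# Minimal load-test sketch: 200 concurrent POSTs against the search endpoint.
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://127.0.0.1:5000/"          # assumed local dev server
PAYLOAD = {"username": "justin95912"}   # assumed form field

def timed_request(_):
    start = time.perf_counter()
    requests.post(URL, data=PAYLOAD, timeout=10)
    return time.perf_counter() - start

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=50) as pool:
        latencies = sorted(pool.map(timed_request, range(200)))
    print(f"avg: {sum(latencies) / len(latencies):.3f}s, "
          f"p95: {latencies[int(0.95 * len(latencies))]:.3f}s")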
Additionally, usability and user acceptance testing (UAT) are necessary to ensure the
system is practical and user-friendly for both regular users and platform moderators. This
involves gathering feedback from test users on the clarity of detection reports, ease of
navigation, and effectiveness of alert mechanisms. It also includes validating edge cases
such as how the system behaves when given incomplete data, obscure profiles, or accounts
with borderline behavior. Ensuring the system is intuitive and trusted by end-users helps in
its adoption and real-world impact, ultimately contributing to a safer and more reliable
online environment.
CHAPTER 8
CONCLUSION
In today's digital age, the prevalence of fake social media profiles poses significant
challenges to online security, privacy, and trust. The Fake Social Media Profile Detection
and Reporting System addresses this pressing issue by providing a robust and efficient
mechanism for identifying and reporting fraudulent accounts.
Through advanced techniques such as machine learning, natural language processing
(NLP), and pattern analysis, the system analyzes user profiles, behavioral patterns, and
content authenticity to distinguish genuine accounts from fake ones. By automating the
detection process, the project not only reduces the time and effort required for manual
identification but also minimizes human error.
While the project lays a strong foundation, future enhancements could include integrating
real-time detection, multi-platform support, and improving the algorithm's accuracy
through large-scale datasets. Overall, this project is a step forward in combating online
impersonation and restoring trust in digital interactions.
The project on fake social media profile detection and reporting using machine learning
has successfully demonstrated the effectiveness of leveraging advanced algorithms to
identify and combat fraudulent accounts. By analyzing features such as profile activity,
behavioral patterns, and metadata, the system achieved high accuracy in distinguishing fake
profiles from genuine ones while integrating automated reporting mechanisms to
streamline action against malicious accounts. This approach enhances platform security,
mitigates the spread of misinformation, and improves user trust. Despite challenges like
dataset bias, evolving tactics of fake profiles, and privacy concerns, the project lays a strong
foundation for scalable, real-time solutions and underscores the importance of ethical
considerations in future advancements.
CHAPTER 9
FUTURE ENHANCEMENT
To further improve the efficiency and effectiveness of the fake social media profile
detection and reporting system, several future enhancements can be considered:
1. Real-Time Detection
Implement real-time monitoring and detection capabilities to identify fake profiles as they
are created or when suspicious activities are detected.
2. Advanced Machine Learning Models
Upgrade the detection algorithms by incorporating state-of-the-art machine learning
techniques, such as deep learning, to enhance accuracy in identifying sophisticated fake
profiles.
3. Cross-Platform Integration
Extend the system's capabilities to support multiple social media platforms, enabling a
unified approach to detecting and reporting fake profiles across networks.
4. Behavioral Analysis
Integrate behavioral analysis to detect unusual activities, such as bulk friend requests,
spamming, or irregular posting patterns, that may indicate fake profiles.
5. Image and Video Verification
Employ AI-powered tools for reverse image search and deepfake detection to identify
stolen or manipulated profile pictures and videos.
6. Natural Language Processing (NLP)
Enhance text analysis capabilities to detect linguistic patterns commonly used in fake
profiles, such as generic bios, repetitive messages, or unnatural text.
7. User Reporting and Feedback
Include a user-friendly reporting mechanism that allows users to flag suspicious profiles,
which can then be verified by the system for authenticity.
8. Blockchain for Data Integrity
Utilize blockchain technology to maintain a tamper-proof record of profile verification and
reporting, ensuring transparency and accountability.
9. Enhanced Privacy Protections
Ensure compliance with global data protection regulations (e.g., GDPR, CCPA) by
anonymizing user data and maintaining privacy during detection and reporting processes.
10. Global Threat Intelligence
Build a centralized threat intelligence database to share insights on fake profiles and
fraudulent activities across platforms, aiding in faster detection and response.
11. Educational Campaigns
Incorporate user education tools and awareness campaigns to help users identify and avoid
fake profiles independently.
12. Adaptive Algorithms
Develop adaptive detection systems that evolve to counter emerging tactics used by fake
profile creators, ensuring long-term effectiveness.
By incorporating these future enhancements, the system can achieve greater scalability,
reliability, and precision, creating a safer and more trustworthy social media environment.
CHAPTER 10
REFERENCES
[1] Chakraborty, P., Shazan, M., Nahid, M., Ahmed, M., Talukder, P. Fake Profile Detection Using Machine Learning Techniques (2022). https://scholar.google.com/scholar?q=Fake+Profile+Detection+Using+Machine+Learning+Techniques+Chakraborty+2022

[3] Agravat, A., Makwana, U., Mehta, S., Mondal, D., Gawade, S. Fake Social Media Profile Detection and Reporting Using Machine Learning. https://scholar.google.com/scholar?q=Fake+Social+Media+Profile+Detection+and+Reporting+Using+Machine+Learning

[5] Alzahrani, A. A., Alzahrani, M. A. Anomaly Detection in Social Media Profiles Using Machine Learning. https://scholar.google.com/scholar?q=Anomaly+Detection+in+Social+Media+Profiles+Using+Machine+Learning

[6] Tiwari, V. Analysis and Detection of Fake Profile Over Social Network. https://scholar.google.com/scholar?q=Analysis+and+Detection+of+Fake+Profile+Over+Social+Network+Vijay+Tiwari

[7] Aydin, I., Sevi, M., Salur, M. U. Detection of Fake Twitter Accounts with Machine Learning Algorithms. https://scholar.google.com/scholar?q=Detection+of+Fake+Twitter+Accounts+with+Machine+Learning+Algorithms
SAMPLE CODE
# --- Listing 1: Instagram profile analyser (Flask + instaloader) ---
import instaloader
from flask import Flask, render_template, request

app = Flask(__name__)
loader = instaloader.Instaloader()

def get_instagram_profile(username):
    try:
        profile = instaloader.Profile.from_username(loader.context, username)
        # Feature extraction. The head of this function was missing from the
        # source listing; these definitions are reconstructed assumptions that
        # make the scoring rules below runnable.
        followers = profile.followers
        following = profile.followees
        posts = profile.mediacount
        bio_length = len(profile.biography or "")
        external_url = profile.external_url
        is_verified = profile.is_verified
        business_category = profile.business_category_name or "N/A"
        has_highlight_reels = profile.has_highlight_reels
        has_profile_pic = 1 if profile.profile_pic_url else 0
        username_numbers_ratio = sum(c.isdigit() for c in username) / len(username)
        engagement_ratio = followers / (following + 1)  # assumed definition
        score = 0

        # Positive signals
        if external_url: score += 5
        if is_verified: score += 10
        if business_category != "N/A": score += 5
        if posts > 50: score += 10
        elif posts > 10: score += 5
        if has_highlight_reels: score += 5
        if username_numbers_ratio < 0.2: score += 5

        # Negative signals
        if has_profile_pic == 0: score -= 10
        if followers < 100: score -= 10
        if engagement_ratio < 0.1: score -= 10
        if username_numbers_ratio > 0.5: score -= 5
        if bio_length == 0: score -= 5
        if not external_url: score -= 3
        if not is_verified: score -= 3
        if business_category == "N/A": score -= 3
        if posts < 5: score -= 5
        if not has_highlight_reels: score -= 3
        if following > 1000 and followers < 200: score -= 5
        if following / (followers + 1) > 10: score -= 5
        if followers > 1000 and posts < 10: score -= 5
        if bio_length < 10 and not external_url: score -= 5

        # Placeholders for extra conditions; leave at 0 until computed
        avg_followers_per_post = followers / (posts + 1)
        suspicious_username = 0
        bio_has_suspicious_keywords = 0
        if avg_followers_per_post > 500 and posts < 5: score -= 5
        if suspicious_username: score -= 5
        if bio_has_suspicious_keywords: score -= 5

        # Final verdict based on score
        if score >= 50:
            verdict = "Highly Legitimate Account!"
        elif score >= 20:
            verdict = "Legitimate Account"
        else:
            verdict = "Fake Account Detected!"

        # Return all feature column data + score + verdict
        return {
            "username": username, "profile_pic": profile.profile_pic_url,
            "followers": followers, "following": following, "posts": posts,
            "username_numbers_ratio": username_numbers_ratio,
            "bio_length": bio_length,
            "external_url": profile.external_url or "N/A",
            "is_verified": "Yes" if is_verified else "No",
            "business_category": business_category,
            "engagement_ratio": engagement_ratio,
            "has_highlight_reels": "Yes" if has_highlight_reels else "No",
            "bio": profile.biography or "No bio",
            "score": score, "verdict": verdict,
        }
    except instaloader.exceptions.ProfileNotExistsException:
        return {"error": "Profile does not exist"}
    except instaloader.exceptions.ConnectionException:
        return {"error": "Unable to connect to Instagram"}
    except Exception as e:
        return {"error": str(e)}

@app.route('/', methods=['GET', 'POST'])
def index():
    result = None
    if request.method == 'POST':
        username = request.form.get('username')
        result = get_instagram_profile(username)
    return render_template('index.html', result=result)

if __name__ == '__main__':
    app.run(debug=True)
# --- Listing 2: CSV-based username checker with a report endpoint ---
from flask import Flask, render_template, request, redirect, url_for, flash
import csv

app = Flask(__name__)
app.secret_key = 'your_secret_key'
CSV_FILE = ("C:\\Users\\Ishwaryareddy\\OneDrive\\Dokumen\\instagram-fake-account-"
            "detection[1]\\instagram-fake-account-detection\\user_datasets.csv")

def search_user(username):
    # Linear scan of the dataset for a case-insensitive username match
    with open(CSV_FILE, 'r', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            if row['username'].strip().lower() == username.strip().lower():
                return row
    return None

@app.route('/', methods=['GET', 'POST'])
def index():
    result = None
    user_data = None
    if request.method == 'POST':
        username = request.form['username']
        user_data = search_user(username)
        if user_data:
            if user_data['fake'].lower() == 'true':
                result = f"{username} is a Fake Account"
            else:
                result = f"{username} seems Legitimate"
        else:
            result = f"No data found for {username}"
    return render_template('index.html', result=result, user_data=user_data)

@app.route('/report', methods=['POST'])
def report():
    flash('Report sent successfully!', 'success')
    return redirect(url_for('index'))

if __name__ == '__main__':
    app.run(debug=True)

# --- Listing 3: synthetic dataset generator ---
import requests
import random
import string
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed

def random_url():
    return f"https://{''.join(random.choices(string.ascii_lowercase, k=8))}.com"

def fetch_user_data():
    api_url = 'https://api.api-ninjas.com/v1/randomuser'
    api_key = 'g9jLB+tA2T6dcxdPBqsTpw==jgtEJ2fnR1SmbdH0'
    try:
        response = requests.get(api_url, headers={'X-Api-Key': api_key}, timeout=5)
        if response.status_code == 200:
            return response.json()
        print("Error:", response.status_code, response.text)
        return None
    except Exception as e:
        print("Request failed:", e)
        return None

def generate_profile(user_data):
    username = user_data['username']
    fullname = user_data['name']
    gender = user_data['sex']
    profile_pic = ('https://xsgames.co/randomusers/avatar.php?g=male'
                   if gender == 'M'
                   else 'https://xsgames.co/randomusers/avatar.php?g=female')
    custom_username = username + str(random.randint(100, 999))
    return [profile_pic, custom_username, len(custom_username),
            len(fullname.strip().split()), len(fullname.replace(" ", "")),
            custom_username.lower() == fullname.replace(" ", "").lower(),
            random.randint(0, 150),
            random_url() if random.choice([True, False]) else '',
            random.choice([True, False]), random.randint(0, 500),
            random.randint(0, 10000), random.randint(0, 5000),
            random.choice([True, False])]

headers = ["profile_pic", "username", "username_length", "fullname_words",
           "fullname_length", "name_equals_username", "description_length",
           "external_url", "private", "posts", "followers", "follows", "fake"]
data_list = []
num_profiles = 100

# The executor body and CSV writer setup were garbled in the source listing;
# the completion below is a minimal reconstruction of the intended flow.
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch_user_data) for _ in range(num_profiles)]
    for future in as_completed(futures):
        user_data = future.result()
        if user_data:
            data_list.append(generate_profile(user_data))

with open('user_datasets.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(data_list)
print(f"CSV generation complete. {len(data_list)} profiles saved to 'user_datasets.csv'.")
DATA SETS
In the Fake Social Media Profile Detection and Reporting project, various datasets are used
to train and test the system for detecting fake profiles. These datasets typically contain
information such as profile details, user behavior, and interactions. The types of datasets
used may include:
1. Profile Data (CSV)
• Sample columns: user_id, followers_count, activity_level, bio_length,
post_frequency, is_fake (1 for fake, 0 for real); a minimal loading sketch follows at the
end of this list.
6. Social Media Scraped Data (Optional, CSV or JSON)
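As a small illustration of how the profile dataset from item 1 could be loaded and separated into features and labels, the pandas sketch below assumes a file named profile_data.csv containing exactly the sample columns listed above.

# Hedged loading sketch for the profile dataset (assumed file name and schema)
import pandas as pd

df = pd.read_csv("profile_data.csv")

y = df["is_fake"]                                   # 1 = fake, 0 = real
X = df[["followers_count", "activity_level",
        "bio_length", "post_frequency"]]            # numeric feature columns

print(y.value_counts())  # check class balance before training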