Group 1 Report-2
Project Report
on
BACHELOR OF TECHNOLOGY
DEGREE
SESSION 2023-24
in
We hereby declare that this submission is our own work and that, to the best of our
knowledge and belief, it contains no material previously published or written by another
person nor material which to a substantial extent has been accepted for the award of any
other degree or diploma of the university or other institute of higher learning, except where
due acknowledgment has been made in the text.
Signature Signature
Date: Date:
CERTIFICATE
(Designation) (Professor)
Date:
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the B.Tech project undertaken
during the B.Tech final year. We owe a special debt of gratitude to Prof. Neha Yadav,
Department of Computer Science & Engineering, KIET, Ghaziabad, for her constant support
and guidance throughout the course of our work. Her sincerity, thoroughness and
perseverance have been a constant source of inspiration for us. It is only through her cognizant
efforts that our endeavors have seen the light of day.
We also take the opportunity to acknowledge the contribution of Dr. Vineet Sharma, Head
of the Department of Computer Science & Engineering, KIET, Ghaziabad, for his full
support and assistance during the development of the project. We also take this opportunity
to acknowledge the contribution of all the faculty members of the department for their kind
assistance and cooperation during the development of our project. Last but not least, we
acknowledge our friends for their contribution to the completion of the project.
Date: Date:
Signature: Signature:
ABSTRACT
TABLE OF CONTENTS
Page No.
DECLARATION……………………………………………………………………. ii
CERTIFICATE……………………………………………………………………… iii
ACKNOWLEDGEMENTS…………………………………………………………. iv
ABSTRACT………………………………………………………………………..... v
LIST OF FIGURES…………………………………………………………………. viii
LIST OF TABLES…………………………………………………………………… ix
LIST OF ABBREVIATIONS………………………………………………………. x
CHAPTER 1 (INTRODUCTION)…………………………………………………. 1
1.1. Introduction……………………………………………………………………... 1
1.2. Project Description……………………………………………………………… 2
REFERENCES…………………………………………………………………….... 38
APPENDIX…………................................................................................................. 40
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
In the current digital era, the process of recruitment has evolved beyond the traditional
methods of evaluating candidates purely on qualifications and face-to-face interviews.
Today, understanding the personality traits of candidates is increasingly recognized as a
critical factor in ensuring optimal job fit and organizational harmony. One such innovation
is the Personality Prediction System through CV Analysis, a tool designed to leverage
machine learning (ML) and natural language processing (NLP) for the analysis of resumes
or CVs.
The Personality Prediction System is built on a Flask-based web interface that allows users
to upload resumes in various file formats, which are then processed to extract textual data.
The goal of this work is to use the Big Five model and machine learning algorithms to
determine an individual's personality. A person's personality has a large impact on both their
personal and professional life. These days, many companies have also begun to shortlist
applicants based on their personality, as this boosts productivity: the individual is doing what
they are good at rather than what they are compelled to do.
The OCEAN model, also known as the Big Five model or the Five-Factor Model (FFM),
was established in the early 1980s on the basis of several psychological studies. When
statistical analysis is applied to personality survey data, a small set of terms emerges to
characterize an individual, and these terms provide an accurate description of the person's
overall personality or character. The word personality comes from the Latin persona, meaning
a characterization of a person's actions or disposition; the meaning of personality is thus
reflected in the distinct attitude that sets one individual apart from others.
"The dynamic organization within the individual of those psychological systems that
determine his characteristic behavior and thought" is how Hall and Lindzey define
personality. Personality thus determines the distinct manner in which an individual adjusts to
their surroundings. A person's personality is characterized by their sense of self, which
shapes their behavior in a distinctive and dynamic way. This behavior might alter as a result
of experience, education, learning, and so on.
This viewpoint echoes Setiadi's theory, which holds that personality is the dynamic
systemic organization that specifically dictates how an individual adapts to their
surroundings. The project mines user characteristic data and looks for patterns using learning
algorithms and sophisticated data mining methods. Large volumes of behavioral personal
data are accessible in certain domains; by applying automated personality prediction and
categorization, this data can be used to classify individuals.
Five characteristics of individuals, commonly known as the Big Five characteristics,
namely openness, neuroticism, conscientiousness, agreeableness, and extraversion, are
stored in a dataset and used for training. Based on this training, the personality of individuals
is predicted using data mining concepts. Before testing, the dataset is pre-processed using
data mining techniques such as handling missing values and data normalization. This
pre-processed data can then be used to classify and predict user personality based on past
classifications: the system analyses user characteristics and behaviors, then predicts a new
user's personality from the personality data stored by classifying previous users' data. The
model used to predict the test dataset is the Random Forest Classifier, because it is an
effective model for predicting output class labels for dependent categorical data.
The system streamlines the recruitment process. It is capable of processing multiple resume
formats and offers functionalities such as text extraction, personality trait prediction, and
detailed AI-generated trait descriptions. The Personality Prediction System through CV
Analysis uses a blend of web development technologies and artificial intelligence to create an
innovative tool for recruitment. Here is a detailed explanation of its components and
functionalities:
The system utilizes:
Frontend Technologies:
HTML: Serves as the backbone of the webpage, structuring the content of the web interface
where users can upload resumes.
CSS: Styles the webpages, making the interface visually appealing and easy to navigate.
JavaScript: Enhances interactivity, handling events like resume uploads and displaying the
results of the personality trait predictions.
The frontend acts as the point of interaction for users. It includes forms for uploading
resumes and panels or dashboards where the results (personality traits and descriptions) are
displayed after analysis.
Python with Flask: Flask is a lightweight web framework for Python, chosen for its
simplicity and effectiveness in setting up a web server quickly. Python's extensive
library ecosystem and its prowess in data handling and machine learning make it ideal for
the backend.
The backend handles the processing logic: receiving uploaded files, managing data flow,
storing results temporarily, and interfacing with machine learning models for personality
prediction.
Pandas: Used for handling and manipulating structured data. In this context, it could be used
to manage and analyze data extracted from resumes.
NLTK: Stands for Natural Language Toolkit, and it provides tools for building Python
programs to work with human language data. It could be used for text processing and feature
extraction from resumes.
PyPDF2: A library for PDF file manipulation, allowing the system to read and extract text
from resumes in PDF format.
These libraries provide essential tools that facilitate various operations from data
manipulation to complex text processing, which are critical in processing resumes and
extracting meaningful information for further analysis.
PyCharm is used to write, test, and debug the code that makes up the backend and helps in
integrating the frontend components.
Key Features:
1. Resume Processing:
The system extracts text from uploaded resumes, which can be in various formats such as
PDFs or Word documents.
It then uses machine learning models to analyze the text and predict personality traits based
on the content, such as expressions, skills listed, and the general tone of the resume.
2. Web Interface:
3. AI Descriptions:
Utilizes advanced generative AI, possibly leveraging models like those from Google, to
generate detailed descriptions of the predicted personality traits. This can provide deeper
insights and explanations that can help employers understand the implications of these traits
in a professional setting.
*******
CHAPTER 2
LITERATURE REVIEW
Ayub Zubeda et al. [6] worked on a design to rank CVs using Natural Language Processing
and Machine Learning. The system ranks CVs in any format according to the company's
criteria. The authors also propose considering a candidate's GitHub and LinkedIn profiles
to get a better understanding, making it easier for the company to find a suitable match
based on skill sets, capability and, most importantly, personality.
Liden et al. published The General Factor of Personality, in which the interrelations among
the Big Five personality factors (Openness, Conscientiousness, Extraversion, Agreeableness,
and Neuroticism) were analyzed to test for the existence of a General Factor of Personality
(GFP). The meta-analysis provides evidence for a GFP at the highest hierarchical level, and
the paper concludes that the GFP has a substantive component, as it is related to
supervisor-rated job performance. However, the authors also note that the existence of a GFP
does not mean that personality factors lower in the hierarchy lose their relevance.
The studies reviewed above establish a robust foundation for the use of ML and NLP in
personality prediction from textual data. The integration of these technologies into a
user-friendly web platform, as in the Personality Prediction System, aligns with
contemporary research and addresses both the practical and ethical complexities of modern
recruitment technologies. This review not only validates the approach taken in this project
but also highlights the innovative potential of combining these technologies for enhanced
recruitment processes.
Based on the literature review provided, several key insights and recommendations can be
drawn to enhance the Personality Prediction System through CV Analysis:
Incorporate Personality Assessment in Recruitment Process:
The studies highlighted the significance of personality assessment in predicting job success
and organizational fit. Therefore, integrating personality analysis into the recruitment
process can provide valuable insights for employers in selecting candidates who align with
the company culture and job requirements.
Utilize Machine Learning for Personality Prediction:
Leveraging machine learning algorithms, as proposed by Suraj Mali [3] and Ayub Zubeda et
al [6], can enhance the accuracy and efficiency of personality prediction. By training models
on large datasets of resumes and corresponding personality assessments, the system can learn
to identify patterns and correlations between resume content and personality traits.
Consider Multiple Data Sources:
As suggested by Kalghatgi et al. [4], considering additional data sources beyond resumes,
such as social media activity (e.g., Twitter, LinkedIn) and online profiles (e.g., GitHub), can
provide a more comprehensive understanding of candidates' personalities and skills.
Integrating Natural Language Processing (NLP) techniques to analyze textual data from
these sources can further enrich the personality assessment process.
Customize Assessment Criteria:
Providing flexibility for administrators, as proposed by Suraj Mali [3], to customize aptitude
and personality test questions based on organizational requirements can enhance the
relevance and effectiveness of the assessment process. This customization allows
organizations to tailor the assessment criteria to specific job roles and company culture.
Implement Fair and Transparent Ranking Mechanisms:
Allan Robey et al. [5] emphasized the importance of fairness and legality in the shortlisting
process. Implementing transparent ranking mechanisms ensures that candidates are
evaluated objectively based on merit and suitability for the role. Additionally, conducting
aptitude and personality tests alongside CV analysis can provide a more holistic evaluation
of candidates' capabilities and traits.
Continuous Model Improvement and Evaluation:
It's crucial to continuously update and refine the machine learning models used for
personality prediction based on feedback and performance evaluation. Regularly evaluating
model accuracy, bias, and generalization capabilities helps maintain the system's
effectiveness and reliability over time.
Considerations for General Factor of Personality (GFP):
Liden et al.'s study [7] highlights the existence of a General Factor of Personality (GFP) and
its relevance to job performance. While incorporating the Big Five personality factors
(Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) into the
assessment, it's essential to recognize the interrelations among these factors and their
implications for job-related outcomes.
In summary, the Personality Prediction System should aim to integrate machine learning
techniques, consider multiple data sources, customize assessment criteria, ensure fairness
and transparency in ranking mechanisms, and continuously evaluate and refine models to
enhance its effectiveness in predicting candidates' personalities and job suitability.
*******
CHAPTER 3
PROPOSED METHODOLOGY
The proposed methodology for the Personality Prediction System through CV Analysis
outlines a systematic approach integrating machine learning (ML), natural language
processing (NLP), and web development technologies to analyze resumes and predict
personality traits. This chapter details each component of the methodology, including data
collection, preprocessing, model development, and deployment.
Resume Processing and Trait Assignment:
1. Text Extraction:
Utilizes Python libraries such as PyPDF2, textract, and docx to extract textual data from
resumes regardless of their format (PDF or DOCX).
Handles potential exceptions, such as PDFs encrypted with passwords, using appropriate
error handling mechanisms.
2. Text Preprocessing:
Preprocesses the extracted text by removing punctuation, tokenizing it into words, and
lemmatizing each word to its base form using NLTK (Natural Language Toolkit).
Removes stopwords (commonly occurring words like "the", "is", "and") to focus on
meaningful content.
3. Trait Assignment:
Assigns personality traits to candidates based on the extracted skills and predefined
associations stored in 'traits.txt'.
Matches extracted skills with traits and assigns relevant traits to each candidate based on
their skill set.
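Steps 1-3 above can be sketched in miniature. To keep the example self-contained, a tiny hand-rolled stopword list stands in for NLTK's, and the `SKILL_TRAITS` mapping is a hypothetical stand-in for the associations stored in 'traits.txt':

```python
# Simplified sketch of the resume pipeline: clean the extracted text,
# then map recognized skills to personality traits.
import string

# Minimal stand-in for NLTK's stopword list.
STOPWORDS = {"the", "is", "and", "a", "an", "in", "with", "of"}

# Hypothetical contents of traits.txt: skill -> associated trait.
SKILL_TRAITS = {
    "python": "openness",
    "teamwork": "agreeableness",
    "leadership": "extraversion",
    "testing": "conscientiousness",
}

def preprocess(text):
    """Lower-case, strip punctuation, tokenize, and drop stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]

def assign_traits(tokens):
    """Return the traits whose associated skills appear in the text."""
    return sorted({SKILL_TRAITS[t] for t in tokens if t in SKILL_TRAITS})

tokens = preprocess("Skilled in Python, testing, and teamwork.")
traits = assign_traits(tokens)
print(traits)  # ['agreeableness', 'conscientiousness', 'openness']
```

The real system additionally lemmatizes each token with NLTK before matching, which normalizes inflected forms ("testing" vs "tested") to one key.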
5. Resume Upload and Analysis:
Allows users to upload resumes through a file input field on the web interface.
Handles file uploads securely, processing each uploaded resume immediately upon
submission.
3.1 TECHNIQUES USED
1. Feature Engineering:
• Objective: Enhance predictive model performance by creating new features from raw data.
• Application: Extracting relevant features from resume text, such as word frequency,
sentence length, and syntactic patterns, to improve personality trait prediction accuracy.
• Methods Used:
• Tokenization: Breaking text into tokens (words or sentences) for analysis.
• Stemming and Lemmatization: Reducing words to their base form to normalize text.
• Part-of-Speech (POS) Tagging: Identifying the grammatical components of words in
context.
• Application: Preprocessing resume text to prepare it for feature extraction and model
training.
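The feature ideas listed above (word frequency, sentence length) can be computed directly. NLTK offers richer tokenizers and POS taggers; plain string operations are used here only to keep the sketch dependency-free:

```python
# Sketch of simple text features for a resume: sentence count,
# average sentence length in words, and word frequencies.
from collections import Counter

def extract_features(text):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    words = text.lower().replace(".", " ").split()
    return {
        "num_sentences": len(sentences),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "word_freq": Counter(words),
    }

feats = extract_features("Led a team of five. Shipped two products. Loves testing.")
print(feats["num_sentences"], round(feats["avg_sentence_len"], 2))
```

Each resume thus becomes a numeric feature vector that can be fed to the classifier alongside the skill-derived traits.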
1. PyPDF2
Purpose: PyPDF2 is a Python library for working with PDF files. It allows users to read,
merge, split, crop, and extract text and metadata from PDF documents.
Features:
Text Extraction: PyPDF2 enables users to extract text content from PDF files, making it
accessible for further processing or analysis.
Document Manipulation: Users can perform various operations on PDF documents, such as
merging multiple PDFs into one, splitting a PDF into multiple documents, or extracting
specific pages.
Metadata Access: PyPDF2 allows access to metadata information stored within PDF files,
including author, title, subject, and creation/modification dates.
Use Cases: PyPDF2 is commonly used in applications requiring PDF manipulation and text
extraction, such as document management systems, data extraction pipelines, and text
analysis tools.
2. textract
Purpose: textract is a Python library for extracting text from various document formats,
including PDFs, Microsoft Office documents (e.g., Word, PowerPoint), and other common
formats like EPUB, RTF, and HTML.
Features:
Document Parsing: textract supports parsing text from a wide range of document formats,
making it versatile for extracting content from different sources.
Pluggable Architecture: The library utilizes external command-line utilities (e.g., pdftotext,
antiword) to extract text from specific file types, providing robust support for different
formats.
Simple API: textract offers a straightforward API for extracting text, abstracting away the
complexities of interacting with different file formats and external dependencies.
Use Cases: textract is commonly used in applications requiring text extraction from diverse
document formats, such as content indexing, information retrieval, and data analysis
pipelines.
3. docx (Python-docx)
Purpose: The docx library, also known as Python-docx, is a Python library for creating,
modifying, and extracting text from Microsoft Word (.docx) documents.
Features:
Document Manipulation: docx allows users to create new Word documents, modify existing
documents, and extract text content from .docx files.
Text Formatting: Users can apply various text formatting options (e.g., font styles, colors,
alignment) to document content programmatically.
Table Support: docx supports working with tables in Word documents, enabling users to
create, modify, and extract tabular data.
Use Cases: docx is commonly used in applications requiring interaction with Word
documents, such as document generation, report automation, and content extraction.
4. PyCharm
Purpose: PyCharm is an Integrated Development Environment (IDE) specifically designed
for Python development. It provides a comprehensive set of features for writing, debugging,
and deploying Python applications.
Features:
Code Editor: PyCharm offers a powerful code editor with syntax highlighting, code
completion, and intelligent code analysis features, enhancing productivity and code quality.
Debugger: The built-in debugger allows users to step through code, set breakpoints, and
inspect variables, making it easier to identify and resolve issues in Python code.
Version Control Integration: PyCharm seamlessly integrates with version control systems
like Git, enabling collaborative development and efficient code management.
Project Management: The IDE includes project management tools for organizing files,
dependencies, and configurations, streamlining development workflows.
3.2 BACKGROUND OF PERSONALITY PERCEPTION
The Big Five Personality Traits model is based on findings from several independent
researchers, and it dates back to the late 1950s. But the model as we know it now began to
take shape in the 1990s.
Lewis Goldberg, a researcher at the Oregon Research Institute, is credited with naming the
model "The Big Five." It is now considered to be an accurate and respected personality scale,
which is routinely used by businesses and in psychological research.
The Big Five Personality Traits Model measures five key dimensions of people's
personalities:
Conscientiousness: this looks at the level of care that you take in your life and work. If you
score highly in conscientiousness, you'll likely be organized and thorough, and know how to
make plans and follow them through. If you score low, you'll likely be lax and disorganized.
Agreeableness: this dimension measures how well you get on with other people. Are you
considerate, helpful and willing to compromise? Or do you tend to put your needs before
others'?
3.3 MACHINE LEARNING
Machine Learning: Machine learning enables computer systems to learn automatically
without the need for explicit programming. How does a machine learning system operate?
It can be explained by the machine learning lifecycle, a cyclical process for building an
effective machine learning project, whose primary goal is to find a solution for the problem
at hand. With machine learning (ML), artificial intelligence (AI) systems can learn
automatically and improve over time without explicit programming. ML focuses on
developing computer algorithms that can obtain data and use it to learn on their own.
Much like a human, the computer gets better at its assigned task the more data, or
"experience", it accumulates. Learning starts with observations or data, such as examples,
first-hand experience, or instruction, in order to find patterns in the data and make better
decisions in the future based on the examples provided. The main goal is to let computers
learn on their own, without human assistance or intervention, and adjust their actions
accordingly.
The following categories apply to ML algorithms:
Supervised Learning: Supervised machine learning algorithms apply what has been learned
from labeled examples in the past to new data in order to predict future events. Starting
from the analysis of a known training dataset, the learning algorithm produces an inferred
function to predict output values. After sufficient training, the system can provide targets
for any new input. The learning algorithm can also compare its output with the correct,
intended output and find errors, allowing the model to be modified accordingly.
Unsupervised Learning: When the training data are neither classified nor labeled,
unsupervised machine learning techniques are employed. Unsupervised learning studies how
systems can infer a function to describe a hidden structure from unlabeled data. The system
explores the data and can draw inferences from datasets to describe hidden structures, but it
does not determine a correct output.
Reinforcement Learning: Reinforcement learning algorithms are a training method that
interacts with its environment by producing actions and discovering errors or rewards.
Trial and error, search, and delayed reward are the most relevant characteristics of the
reinforcement technique. This method allows machines and software agents to
automatically determine the ideal behavior within a specific context in order to maximize
their performance.
3.4 EXPLANATION
This section explains the Flask web application for analyzing resumes and predicting
personality traits based on the extracted information. Let's break down the code and explain
each part:
1. Imports:
The code imports the necessary modules and functions from Flask, datetime, os, and pandas,
along with custom modules (ai_prediction and resume_extraction) for AI prediction and
resume extraction.
2. Flask App Setup:
Sets up the Flask application and specifies the folder where uploaded resumes will be
saved.
3. Routes and Functions:
3.1. Index Route:
4. Helper Functions:
4.1. Saving to CSV:
This Flask application provides a user-friendly interface for uploading resumes, extracting
relevant details, predicting personality traits, and displaying the results. It also offers
functionalities for managing historical data, exporting data, and clearing history, enhancing
the overall user experience.
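The wiring just described can be sketched as a minimal Flask app. The `predict_traits` helper below is a stub standing in for the real `resume_extraction` and `ai_prediction` modules, and the route names are illustrative:

```python
# Minimal sketch of the Flask backend: an index route and an /upload
# endpoint that saves the resume and returns a (stubbed) prediction.
import io
import os
import tempfile
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = tempfile.mkdtemp()  # folder for uploaded resumes

def predict_traits(text):
    # Placeholder for the real ML pipeline (resume_extraction + ai_prediction).
    return ["openness", "conscientiousness"]

@app.route("/")
def index():
    return "Upload a resume to /upload"

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["resume"]
    path = os.path.join(app.config["UPLOAD_FOLDER"], secure_filename(f.filename))
    f.save(path)  # store the upload, then analyse it
    with open(path, encoding="utf-8", errors="ignore") as fh:
        traits = predict_traits(fh.read())
    return {"traits": traits}  # Flask serializes the dict to JSON

# Exercise the app with Flask's built-in test client.
client = app.test_client()
resp = client.post("/upload", data={
    "resume": (io.BytesIO(b"Skilled in Python"), "resume.txt"),
})
print(resp.get_json())
```

`secure_filename` guards against path-traversal filenames, one of the "handles file uploads securely" concerns mentioned earlier.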
This Python code defines a conversational agent named "Luna" that interacts with Google's
GenerativeAI service to generate descriptive text about a candidate's personality traits. Let's
break down the code and explain each part:
1. Import Statements:
The code imports the necessary modules: genai for accessing Google's GenerativeAI service
and personality traits for retrieving personality traits.
2. API Configuration:
Configures the GenerativeAI service with the API key. This API key should be obtained
from the Google Maker Suite platform.
3. Generation Configuration:
Specifies the generation configuration settings for the GenerativeAI model, such as
temperature, top-p, top-k, and max output tokens.
4. Generative Model Initialization:
Initializes the Generative Model with the specified name ("gemini-pro") and generation
configuration.
5. Chat Function:
Defines a function chat(query) that interacts with the GenerativeAI model to generate
descriptive text based on the input query. It retries up to three times if an error occurs.
6. Auxiliary Functions:
Defines auxiliary functions: say(text) for printing Luna's responses and takeCommand() for
generating the query based on predefined personality traits.
7. Main Execution:
Executes the main part of the script. It generates a query using takeCommand(), interacts
with the GenerativeAI model using chat(query), and prints Luna's response.
In summary, this code sets up a conversational agent named "Luna" that utilizes Google's
GenerativeAI service to generate descriptive text about a candidate's personality traits
based on predefined traits. It demonstrates how AI models can be integrated into
conversational systems to provide meaningful responses to user queries.
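The retry behaviour of the `chat()` helper can be sketched with the model call injected as a parameter and stubbed, so the example runs without the google-generativeai package or an API key; in the real code, `send_message` would call the "gemini-pro" model:

```python
# Sketch of Luna's retry logic: attempt the generative call up to
# max_retries times before giving up. The model call is injectable so a
# stub can simulate transient API errors.
def chat(query, send_message, max_retries=3):
    """Try the generative model up to max_retries times."""
    last_error = None
    for _ in range(max_retries):
        try:
            return send_message(query)
        except Exception as exc:
            last_error = exc
    return f"Luna could not answer: {last_error}"

calls = {"n": 0}
def flaky_model(query):
    # Stub standing in for Gemini: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return f"Description for: {query}"

result = chat("openness, extraversion", flaky_model)
print(result)
```

Separating the retry policy from the API client also makes the conversational agent straightforward to unit-test.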
Flask is a web application framework written in Python. It has several modules that make it
easier for a web developer to write applications without worrying about details like protocol
and thread management. Flask provides the tools and libraries needed to build a web
application, along with a range of options for constructing web applications. Here we
develop a web application that integrates with the model constructed earlier: a user interface
is offered where input values for prediction are entered; the saved model receives these
values, and the prediction is displayed on the user interface. The trained machine learning
model is serialized to a pickle file and loaded by the Flask backend. The project folder
contains: a Python file named app.py; the file containing the machine learning code (for
instance, personality_prediction.py or personality_prediction.ipynb); a model file such as
Personality Prediction.pkl; and a templates folder containing the home.html page.
Sanitizing the dataset
Machine learning's second stage is pre-processing the dataset. Before training a better model,
it is vital to remove noisy data, fill in null (empty) values, replace garbage data, and use
algorithms to discover unknown columns.
We use Python library functions, such as those from NumPy and Pandas, to clean the
dataset.
Visualize the dataset: Prior to training a model, it's critical to comprehend the dataset and
select the machine learning approach. Defining the dataset includes organizing the dataset
columns for training, identifying trends, eliminating outliers, and visualizing the rows and
columns into graphs.
Since unsupervised learning is the foundation of our problem, we chose the clustering
technique.
Clustering the dataset: We must determine the number of clusters to include before training
the model. Determining the number of clusters is crucial in order to identify every unique
characteristic inside the dataset. To determine the optimal number of clusters that best suit
the dataset, we employ the elbow approach.
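The elbow approach can be sketched as follows: fit K-means for several cluster counts and record the distortion (inertia); the "elbow" in that curve suggests the best k. Synthetic 2-D blobs stand in here for the trait dataset:

```python
# Elbow-method sketch: inertia (within-cluster sum of squares) for
# k = 1..6 on data with three well-separated blobs, so the elbow
# should appear around k = 3.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(30, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(30, 2)),
    rng.normal(loc=(0, 5), scale=0.3, size=(30, 2)),
])

inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    inertias.append(km.inertia_)

print([round(i, 1) for i in inertias])  # sharp drop until k = 3, then flat
```

Plotting `inertias` against k produces the distortion-score curve shown in Fig 3.1; with the OCEAN model, k = 5 is expected for the real trait data.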
Algorithmic strategy:
A. Decision Trees: Decision Trees are a type of supervised machine learning (that is, you
explain what the input is and what the corresponding output is in the training data) in which
the data is continuously split according to a certain parameter. The tree can be explained by
two entities, namely decision nodes and leaves. The leaves are the decisions or final
outcomes, such as "fit" or "unfit"; in this case it is a binary classification problem (a yes/no
type problem). The decision nodes are where the data is split.
We build such a decision tree using the ID3 algorithm, which performs the following tasks
recursively:
1. Create a root node for the tree.
2. If all examples are positive, return the leaf node "positive".
3. Else, if all examples are negative, return the leaf node "negative".
4. Calculate the entropy of the current state, H(S).
5. For each attribute, calculate the entropy with respect to the attribute x, denoted H(S, x).
6. Select the attribute with the maximum value of IG(S, x).
7. Remove the attribute that offers the highest IG from the set of attributes.
8. Repeat until we run out of attributes, or the decision tree consists only of leaf nodes.
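Steps 4-6 above can be made concrete with a small sketch of entropy and information gain, computed on a toy "fit / unfit" dataset invented for illustration:

```python
# Entropy H(S) and information gain IG(S, x) from the ID3 steps above.
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum p_i * log2(p_i) over the class proportions."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    """IG(S, x) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(attr_values):
        subset = [lab for v, lab in zip(attr_values, labels) if v == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Toy data: the attribute "exercises daily" perfectly predicts the outcome.
exercises = ["yes", "yes", "no", "no"]
outcome   = ["fit", "fit", "unfit", "unfit"]

print(entropy(outcome))                        # 1.0: classes perfectly mixed
print(information_gain(exercises, outcome))    # 1.0: attribute splits perfectly
```

ID3 would therefore choose "exercises daily" as the root split, since it has the maximum IG.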
B. Logistic Regression:
Logistic regression models the probability of the default class (e.g. the first class). For
example, if we are modeling people's sex as male or female from their height, then the first
class could be male, and the logistic regression model could be written as the probability of
male given a person's height, or more formally: P(sex=male | height). Written another way,
we are modeling the probability that an input (X) belongs to the default class (Y=1), which
we can write formally as: P(X) = P(Y=1|X). We are predicting probabilities.
Although logistic regression is a classification algorithm, note that the probability
prediction must be transformed into binary values (0 or 1) in order to actually make a class
prediction.
Logistic regression is a linear method, but the predictions are transformed using the logistic
function. Continuing on from above, the model can be stated as:
p(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X)). Without diving into the math too much, we can
rearrange the above equation (remembering that e can be removed from one side by taking
the natural logarithm (ln) of the other): ln(p(X) / (1 - p(X))) = b0 + b1*X. This is useful
because we can see that the calculation of the output on the right is linear again (just like
linear regression), and the input on the left is the log of the probability of the default class.
The ratio on the left is called the odds of the default class (it is historical that we use odds;
for example, odds are used in horse racing rather than probabilities).
Odds are calculated as a ratio of the probability of the event divided by the probability of not
the event, e.g. 0.8/(1-0.8) which has the odds of 4.
So we could instead write: ln(odds) = b0 + b1 * X Because the odds are log transformed, we
call this left hand side the log-odds or the profit.
It is possible to use other types of functions for the transform (which is out of scope_, but as
such it is common to refer to the transform that relates the linear regression equation to the
probabilities as the link function, e.g. the profit link function. We can move the exponent
back to the right and write it as: odds = e^(b0 + b1 * X) .
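These formulas can be checked numerically with a short sketch; the coefficients b0 and b1 below are made-up values rather than fitted ones.

```python
import math

def p_of_x(b0, b1, x):
    """p(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))"""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

b0, b1 = -100.0, 0.6   # made-up coefficients for height in cm
x = 170.0              # so b0 + b1*x = 2.0

p = p_of_x(b0, b1, x)
log_odds = math.log(p / (1 - p))   # ln(p(X) / (1 - p(X))) recovers b0 + b1*X
label = 1 if p >= 0.5 else 0       # crisp 0/1 class from the probability

# The odds example from the text: 0.8 / (1 - 0.8) = 4
odds = 0.8 / (1 - 0.8)
```

Note how the log-odds come back to exactly b0 + b1*x, confirming that the model is linear on the logit scale.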
Fig 3.1 Distortion Score
Following the OCEAN model, there will be five primary clusters representing the five
distinct personality types.
Train the Model: The primary function of machine learning is model training. Using a
dataset, train many algorithms and determine which yields the best accuracy.
Numerous techniques, such as K-means and partitioning, are utilized for clustering. When the algorithm is given the specified dataset and number of clusters, it trains the model.
Utilizing metrics like accuracy, ROC curve, and confusion matrix, assess the model.
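A minimal version of the K-means training step described above can be sketched as follows (Lloyd's algorithm on toy 2-D points; real use would rely on a library such as scikit-learn):

```python
import math

def kmeans(points, k, iters=10):
    """Minimal Lloyd's algorithm: assign each point to the nearest
    centroid, recompute centroids as cluster means, repeat."""
    centroids = points[:k]  # naive initialization: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated toy groups; with k=2 each group becomes a cluster.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

In the personality setting, each point would be a candidate's feature vector and k would be five, one cluster per OCEAN trait group.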
Visualize the Model: This is a useful tool for determining whether or not the trained model
will perform properly. Identifying mistakes and outliers is the goal of model visualization.
Visualizing the model involves presenting the model's internal workings, structure, and
performance metrics in a graphical format. In the context of the Personality Prediction
System, here's how the model can be visualized:
1. Dimensionality Reduction:
PCA Visualization: After preprocessing the text data, Principal Component Analysis (PCA)
can be applied to reduce the dimensionality of the feature space while preserving most of the
variance. Visualizing the data in a lower-dimensional space allows for easier interpretation
and identification of clusters corresponding to different personality traits.
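Assuming NumPy is available, the PCA projection described above can be sketched as follows; the 5-dimensional OCEAN-style scores are invented for illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD
    of the mean-centred data matrix, preserving most of the variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # shape (n_samples, n_components)

# Invented 5-dimensional OCEAN-style scores for four candidates:
# rows 0-1 form one personality group, rows 2-3 another.
X = np.array([[0.9, 0.1, 0.8, 0.2, 0.5],
              [0.8, 0.2, 0.9, 0.1, 0.4],
              [0.1, 0.9, 0.2, 0.8, 0.6],
              [0.2, 0.8, 0.1, 0.9, 0.5]])
Z = pca_project(X)  # 2-D coordinates ready for a scatter plot
```

The two groups land on opposite sides of the first principal component, which is exactly what makes the lower-dimensional scatter plot easy to interpret.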
2. Clustering Visualization:
Cluster Plot: Using clustering algorithms such as K-means or hierarchical clustering, the
system can cluster candidates into distinct groups based on their personality traits.
Visualizing these clusters on a scatter plot can provide insights into the distribution and
separation of personality types within the dataset.
3. Model Evaluation:
ROC Curve: If the model involves binary classification tasks (e.g., predicting whether a
candidate possesses a specific personality trait), Receiver Operating Characteristic (ROC)
curves can be plotted to visualize the trade-off between true positive rate and false positive
rate at different threshold settings.
Confusion Matrix: For multi-class classification tasks (e.g., predicting personality traits
across multiple dimensions), a confusion matrix can be visualized to display the number of
true positives, true negatives, false positives, and false negatives for each class, offering
insights into the model's classification performance.
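Both evaluation views can be computed from first principles; the labels, predictions, and scores below are toy values, not project results.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]

def roc_point(y_true, scores, threshold):
    """One (FPR, TPR) point of the ROC curve at the given threshold."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return fp / neg, tp / pos

y_true = [1, 1, 0, 0]            # toy trait labels
y_pred = [1, 0, 0, 0]            # toy hard predictions
cm = confusion_matrix(y_true, y_pred, classes=[0, 1])
fpr, tpr = roc_point(y_true, scores=[0.9, 0.6, 0.4, 0.2], threshold=0.5)
```

Sweeping the threshold from 1 down to 0 and collecting these (FPR, TPR) points traces the full ROC curve described above.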
4. Textual Representation:
Word Clouds: To gain insights into the most prominent words or phrases associated with
each personality trait, word clouds can be generated to visualize the frequency of terms
extracted from the resumes. This provides a qualitative understanding of the textual patterns
characteristic of different personality types.
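The term frequencies that a word cloud renders can be sketched with the standard library; the sample resume text and stop-word list are illustrative assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "of", "the", "in", "with", "to"}

def top_terms(text, n=3):
    """Term frequencies that a word-cloud renderer would scale by:
    the bigger the count, the bigger the word in the cloud."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)

resume = ("Led a team of engineers. Team leadership and communication "
          "skills. Organized team events and communication workshops.")
```

A word-cloud library would then draw each term at a size proportional to its count, giving the qualitative view of textual patterns described above.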
5. GenerativeAI Output:
Textual Descriptions: The descriptions generated by the GenerativeAI service can be
presented alongside the corresponding personality traits, providing users with contextual
explanations and insights into the implications of each trait. These descriptions may be
displayed in a user-friendly format within the web interface.
6. Interactive Dashboard:
Interactive Visualization: Incorporating interactive elements such as dropdown menus,
sliders, or checkboxes into the web interface allows users to dynamically explore different
aspects of the model's output.
7. Model Parameters:
Parameter Tuning Plot: If the model involves hyperparameter tuning or optimization,
visualizing the performance metrics (e.g., accuracy, loss) as a function of different parameter
values can aid in selecting the optimal configuration for the model.
Fig 3.2 Personality Traits after PCA
We employed PCA to lower the dimensionality and to linearly decorrelate the features, because our dataset has millions of rows.
Test the Model: Testing the model is just as crucial as training it. After the model has been trained, unlabeled data is supplied, and the trained model predicts the data's labels and produces results.
If the outcomes are inaccurate, the model is trained again using a different approach until it produces predictions with good accuracy.
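The retrain-until-accurate loop described above amounts to comparing candidate models on held-out data and keeping the best; the tiny 1-nearest-neighbour model and toy personality labels below are stand-ins, not the project's classifiers.

```python
import math

def knn_predict(train_X, train_y, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    i = min(range(len(train_X)), key=lambda j: math.dist(train_X[j], x))
    return train_y[i]

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

# Toy 2-D features with invented personality labels.
train_X = [(0, 0), (0, 1), (5, 5), (5, 6)]
train_y = ["introvert", "introvert", "extrovert", "extrovert"]
test_X = [(0, 0.5), (5, 5.5)]
test_y = ["introvert", "extrovert"]

# Candidate "approaches": keep whichever scores best on held-out data.
models = {
    "majority": lambda x: "introvert",
    "1-nn": lambda x: knn_predict(train_X, train_y, x),
}
best_name = max(models, key=lambda name: accuracy(models[name], test_X, test_y))
```

In the full system the candidate models would be the algorithms compared earlier (random forest, kNN, logistic regression, SVM, Naive Bayes) rather than these stand-ins.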
In summary, the forecasting process yields detailed insights into the dataset's structure,
facilitates the identification of distinct personality clusters, and enables the development of
accurate predictive models. These outcomes are crucial for informing decision-making
processes in various domains, including recruitment, organizational development, and
individual assessment.
*******
CHAPTER 4
4.1 SNAPSHOTS:
2- CV Analysis Results
3- Personality Traits
The results of the project report on "Personality Prediction Through CV Analysis Using ML"
demonstrate the successful implementation of a sophisticated system for automating
personality assessments from resumes. Here's a detailed summary of the results:
2. Trait Assignment:
Personality traits are assigned to candidates based on extracted skills and predefined
associations stored in 'traits.txt', ensuring accuracy and relevance in trait assignment.
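A minimal sketch of this trait-assignment step follows; the skill-to-trait map below is a hypothetical stand-in for the associations stored in 'traits.txt'.

```python
# Hypothetical skill-to-trait associations standing in for 'traits.txt'.
TRAIT_MAP = {
    "teamwork": "Agreeableness",
    "creativity": "Openness",
    "planning": "Conscientiousness",
    "public speaking": "Extraversion",
}

def assign_traits(extracted_skills):
    """Map skills extracted from a CV to Big Five traits, ignoring
    skills with no known association."""
    return sorted({TRAIT_MAP[s] for s in extracted_skills if s in TRAIT_MAP})

traits = assign_traits(["teamwork", "planning", "welding"])
```

Keeping the associations in a data file rather than in code, as the project does, means the mapping can be revised without touching the pipeline.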
4. AI Interaction for Trait Description:
The system interacts with Google's GenerativeAI service to generate descriptive text about
a candidate's personality based on their assigned traits, providing deeper insights.
4.2 DISCUSSION
The discussion section of the project report on "Personality Prediction Through CV Analysis
Using ML" provides an opportunity to delve deeper into the implications, limitations, and
future directions of the project. Here's a structured discussion based on the key components
of the project:
1. Implications of the Project:
Recruitment Efficiency: The automation of personality assessments from resumes
streamlines the recruitment process, reducing manual workload for recruiters and
accelerating candidate screening.
Informed Decision-Making: Personality insights gleaned from the system empower
employers to make more informed hiring decisions, aligning candidates with organizational
values and culture.
Ethical Considerations: The project addresses ethical concerns surrounding data privacy,
bias mitigation, and transparency, promoting fairness and equality in the recruitment process.
2. Limitations and Challenges:
Algorithmic Bias: Despite efforts to mitigate bias, the system may still exhibit biases
inherent in the training data or algorithmic decisions, requiring ongoing monitoring and
refinement.
Generalization and Scalability: The system's performance may vary across different
datasets or industries, necessitating further research to enhance generalization capabilities
and scalability.
Interpretability: The interpretability of personality predictions and trait descriptions
generated by AI models may be limited, posing challenges for user understanding and trust.
3. Future Directions:
Refinement of Models: Continuous refinement of machine learning models based on
feedback and integration of advanced algorithms to improve prediction accuracy and
generalization.
Enhanced Features: Exploration of additional features such as personalized trait
descriptions, career guidance based on personality insights, and integration with HR systems
for comprehensive recruitment solutions.
Research and Development: Further research into the intersection of AI, psychology, and
recruitment practices to deepen understanding and improve the efficacy of personality
prediction systems.
4. Real-World Applications:
Industry Adoption: The project lays the groundwork for industry adoption of automated
personality assessment tools, offering potential benefits for various sectors such as human
resources, talent acquisition, and career counseling.
Academic and Research Impact: The project contributes to academic and research
endeavors in the fields of machine learning, natural language processing, and organizational
psychology, fostering interdisciplinary collaboration and knowledge exchange.
5. Conclusion:
Summary of Contributions: The discussion concludes by summarizing the project's
contributions to recruitment technology, ethical AI deployment, and future research
directions.
6. Call to Action: Encourages stakeholders to leverage the insights gained from the project
to drive positive change in recruitment practices, foster diversity and inclusion, and promote
responsible AI innovation.
Nowadays, the corporate world not only prioritizes an individual's skills but also their
personality traits, as they play a crucial role in achieving success both professionally and
personally. Therefore, recruiters must have knowledge of potential employees' personality
traits. However, due to the significant increase in job seekers and the decline in job
availability, it is challenging to manually select the most suitable candidate by just reviewing
their resume. This analysis aims to explore various machine learning techniques for
predicting personality traits effectively by analyzing resumes through Natural Language
Processing (NLP) methods. The research demonstrates that the Random Forest algorithm
outperforms other approaches such as k-Nearest Neighbors (kNN), Logistic Regression,
SVM, and Naive Bayes in terms of accuracy.
7. SWOT ANALYSIS
Strengths:
• Interactive and easy to use.
• Extracts all the important features of a resume in seconds.
• Easily predicts the personality of the applicant.
Weaknesses:
• It does not store the predicted personality data.
• Bulk CVs cannot be parsed in one go.
Opportunities:
• It can be extended for commercial use.
• It can be made more interactive so that bulk data can be easily handled and represented.
• The training model can be improved with various additional features that help predict more accurate results.
• Instead of directly asking for the five characteristic values, we can add a questionnaire that asks some multiple-choice questions and auto-calculates the various values.
Threats:
• There is no security added in the app yet that gives different rights to different users.
• There are a lot of companies in the world, and their hiring systems differ from sector to sector, so the app needs to be adapted to each company's requirements, which can be complicated.
*******
32
CHAPTER 5
5.1 CONCLUSION
The Personality Prediction System leverages machine learning, NLP, and web development
technologies to create a sophisticated tool for the recruitment process. By automating
personality assessment and potentially utilizing AI for trait descriptions, the system
empowers recruiters and employers to make more informed hiring decisions based on a
candidate's potential cultural fit.
Personality prediction models include the MBTI and OCEAN models. However, the
OCEAN model has a larger and more accurate dataset than the MBTI model. Using the
provided dataset, we are able to achieve an accuracy of 89.13% in this project. The machine
learning model can be trained and tested using random forest classifier techniques.
Additionally, we can use the Flask library to deploy our model. This library offers a user
interface that makes it simple for the user to submit data, and in the back end, our machine
learning model uses the data to predict the user's personality and display it.
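A minimal sketch of such a Flask deployment follows; the route name and the predict_personality() stub are assumptions, not the project's actual code.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_personality(features):
    """Stand-in for the trained model; the real system would load and
    call the fitted random forest classifier here."""
    return "Openness" if features.get("creativity", 0) > 0.5 else "Conscientiousness"

@app.route("/predict", methods=["POST"])
def predict():
    # The user interface submits candidate features as JSON; the back
    # end predicts the personality and returns it for display.
    features = request.get_json()
    return jsonify({"personality": predict_personality(features)})
```

Running `app.run()` would serve this endpoint locally, with the web form posting its data to /predict and rendering the returned trait.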
The future scope of the Personality Prediction System through CV Analysis holds immense
potential for further innovation and advancement in recruitment technology. This chapter
explores avenues for future development, including enhancements to the system's
functionality, expansion into new domains, and ongoing research directions.
Enhanced Functionality
Performance Analytics:
Industry-Specific Applications:
• Adapting the system to cater to the unique recruitment needs of specific industries, such as
healthcare, finance, or technology, by customizing personality trait models and analysis
criteria.
The future scope of the Personality Prediction System through CV Analysis is brimming
with possibilities for further innovation and impact in the field of recruitment technology.
By embracing enhanced functionality, integrating with HR systems, expanding into new
domains, and pursuing ongoing research, the system can continue to evolve and address the
evolving needs of recruiters and candidates alike. As the landscape of talent acquisition
continues to evolve, this project stands poised to lead the way in shaping the future of
recruitment practices.
Human personality has played a vital role in an individual's life as well as in the development of an organization. One of the ways to judge human personality is by using standard questionnaires or by analyzing the Curriculum Vitae (CV). Traditionally, recruiters manually shortlist/filter a candidate's CV as per their requirements. In this paper, we present a system that automates the eligibility check and aptitude evaluation of candidates in a recruitment process. To meet this need, an online application is developed for the analysis of aptitude or personality tests and the candidate's CV. The system analyzes professional eligibility based on the uploaded CV. The system employs a machine learning approach using the TF-IDF algorithm. The output of our system gives a decision for candidate recommendation. Further, the resulting scores help in evaluating the qualities of the candidates by analyzing the scores obtained in different areas. The graphical analysis of the performance of any candidate makes it easier to evaluate his/her personality.
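The TF-IDF scoring mentioned above can be sketched in a few lines; the token lists stand in for tokenized resumes.

```python
import math
from collections import Counter

def tf_idf(docs):
    """tf-idf(t, d) = tf(t, d) * ln(N / df(t)) over a small corpus,
    where df(t) is the number of documents containing term t."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{term: count * math.log(n_docs / df[term])
             for term, count in Counter(doc).items()}
            for doc in docs]

# Token lists standing in for three tokenized resumes.
docs = [["python", "ml", "python"],
        ["java", "ml"],
        ["python", "java"]]
scores = tf_idf(docs)
```

Terms that are frequent in one resume but rare across the corpus get the highest weights, which is what makes TF-IDF useful for distinguishing candidates.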
During the pandemic it was hard for organizations to conduct face-to-face interviews, so online recruitment was carried out instead. It is hard to judge an interviewee's personality online. Our intention is therefore to identify an individual's personality by taking photographs of the interviewee at random times during the recruitment process. There is adequate confirmation that expressions and expressive signals in a human face offer hints about a person's character. With that, we deduce an individual's personality using a Convolutional Neural Network (CNN) algorithm and the facial width-to-height ratio (fWHR). Additionally, a questionnaire with situational questions is used to identify the level of each personality trait within a person. Questionnaire-based personality prediction is achieved using the K-Means clustering algorithm. By combining the conclusions of both approaches, we predict the strongest trait that a person possesses among the big five personality traits. Thus, it will be a great help for the recruitment process. It is more advantageous for organizations and interviewees to conduct recruitment online, because anyone from anywhere can attend interviews, saving time and money. We therefore support organizations in continuing with online recruitment even after the pandemic.
Personality is an important parameter as it differentiates individuals from one another. Personality prediction is an evergreen area of research. Predicting personality with the help of data from social media is a promising approach, as this method does not require any questionnaires to be filled out by users, thus reducing time and increasing credibility. Having knowledge of personality is therefore an interesting domain for researchers to work on. Predicting personality has many applications in the real world. Use of social media is increasing day by day, and huge amounts of textual data as well as images continue to pour onto the web daily. The current work focuses on Linear Discriminant Analysis, Multinomial Naive Bayes, and AdaBoost over a standard Twitter dataset.
With the development of social networks, a large variety of approaches have been developed
to define users’ personalities based on their social activities and language use habits.
Particular approaches differ with regard to different machine learning algorithms, data
sources, and feature sets. The goal of this paper is to investigate the predictability of the
personality traits of Facebook users based on different features and measures of the Big 5
model. We examine the presence of structures of social networks and linguistic features
relative to personality interactions using the myPersonality project data set. We analyze and
compare four machine learning models and perform the correlation between each of the
feature sets and personality traits. The results for the prediction accuracy show that even if
tested under the same data set, the personality prediction system built on the XGBoost
classifier outperforms the average baseline for all the feature sets, with a highest prediction
accuracy of 74.2%. The best prediction performance was reached for the extraversion trait
by using the individual social network analysis features set, which achieved a higher
personality prediction accuracy of 78.6 %.
Integrate with existing applicant tracking systems (ATS) for seamless workflow. Refine
personality prediction models for improved accuracy.
Incorporate additional data sources like cover letters or social media profiles (with proper
consent) for a more holistic candidate assessment.
Address potential ethical considerations and biases within the personality prediction
algorithms.
*******
REFERENCES
1. Shankarwar, Tanuj, "Machine Learning to Predict Personality via CV", 2023.