
Big Data Research 39 (2025) 100509

Contents lists available at ScienceDirect

Big Data Research


journal homepage: www.elsevier.com/locate/bdr

A novel approach for job matching and skill recommendation using transformers and the O*NET database

Rubén Alonso a,d, Danilo Dessí b, Antonello Meloni c, Diego Reforgiato Recupero a,c,∗
a R2M Solution s.r.l., ICT Division, Polo Tecnologico di Pavia, Pavia, Italy
b Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, United Arab Emirates
c Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
d Programa de Doctorado, Centro de Automática y Robótica, CSIC-Universidad Politécnica de Madrid, Madrid, Spain

A R T I C L E I N F O

Keywords:
Information extraction
Transformers
Online enrolling process
Natural language processing
Course recommendation

A B S T R A C T

Today, tons of information regarding job supply and demand are posted on the web every day, which has heavily affected the job market. The online enrolling process has thus become efficient for applicants as it allows them to present their resumes over the Internet and, as such, simultaneously to numerous organizations. Online systems such as Monster.com, OfferZen, and LinkedIn contain millions of job offers and resumes of potential candidates, leaving to companies the hard task of facing an enormous amount of data in order to select the most suitable applicant. The task of assessing the resumes of candidates and providing automatic recommendations on which one suits a particular position best has, therefore, become essential to speed up the hiring process. Similarly, it is important to help applicants quickly find a job appropriate to their skills and provide recommendations about what they need to master to become eligible for certain jobs. Our approach lies in this context and proposes a new method to identify skills from candidates' resumes and match resumes with job descriptions. We employed the O*NET database entities related to the different skills and abilities required by different jobs; moreover, we leveraged deep learning technologies to compute the semantic similarity between O*NET entities and parts of text extracted from candidates' resumes. The ultimate goal is to identify the most suitable job for a certain resume according to the information contained therein. We have defined two scenarios: i) given a resume, identify the top O*NET occupations with the highest match with the resume; ii) given a candidate's resume and a set of job descriptions, identify which one of the input jobs is the most suitable for the candidate. The evaluation that has been carried out indicates that the proposed approach outperforms the baselines in the two scenarios. Finally, we provide a use case for candidates where it is possible to recommend courses with the goal of filling certain skill gaps and making them qualified for a certain job.

1. Introduction

The job market has been heavily influenced by the tons of information that are posted on the Web every day regarding job supply and demand. The online enrolling process has become efficient for applicants as it allows them to present their resumes using the Internet and, as such, simultaneously to numerous organizations (companies or research centers). One problem that arises from the application to multiple online systems is that there is no universal standard format to adopt when filling in resume information (although systems such as Europass¹ or LinkedIn² have some methods to automatically generate structured profiles). It follows that resumes are all very different from each other in terms of structure, design, and format, hindering an efficient and fast analysis by the people working within the human resources divisions of

* Corresponding author at: Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy.
E-mail addresses: [email protected] (R. Alonso), [email protected] (D. Dessí), [email protected] (A. Meloni),
[email protected] (D. Reforgiato Recupero).
¹ https://europa.eu/europass/en.
² http://www.linkedin.com.

https://doi.org/10.1016/j.bdr.2025.100509
Received 18 August 2022; Received in revised form 22 January 2025; Accepted 6 February 2025

Available online 7 February 2025


2214-5796/© 2025 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
the different organizations. Studies on job-resume matching have highlighted the challenges posed by the lack of standardization in resumes, which might lead to inconsistencies in candidate evaluation [1,2].

The simplicity and speed of sending online applications hide one more problem related to resume analysis. When an organization posts a certain job, there is a number of required job-specific and mandatory skills. However, several applicants who do not satisfy them keep sending their resumes ``just to give it a try''. This happens because there are no risks in sending non-eligible applications (although, in certain situations, they might end up in some blacklist of low-quality applications³). Online systems such as Monster.com⁴, OfferZen⁵, and LinkedIn contain millions of job offers and resumes of potential candidates, leaving to companies the hard task of facing enormous amounts of data with the goal of selecting the most suitable applicant for their needs⁶. Quickly assessing the resumes of candidates and providing automatic recommendations on which one best suits a particular position have, therefore, become essential to speed up the screening and hiring process for companies [3-5]. Similarly, it is important to help applicants quickly find a job appropriate to their skills and provide recommendations about what they need to master to become eligible for their desired jobs [6].

The first problem to solve has been the text extraction from resumes represented by non-textual documents (e.g., images or PDF files). This problem has already been discussed and faced in other domains as well, leading to the creation of systems with very accurate extraction of body text [7-9]. For instance, the study presented in [10] discusses methods for extracting information from documents like resumes, highlighting advancements in this area.

Once the body text has been extracted, the next challenge is to identify words or compound expressions that match the candidate's skills peculiar to the underlying job. One resource that has been created to support this problem is the O*NET database [11]⁷. It contains a rich set of variables that describe work and worker characteristics, including skill requirements. It also contains hundreds of descriptors on almost a thousand occupations related to the United States. Thus, the O*NET database represents a valuable resource to identify skills in a candidate's resume and associate them with related jobs. Using O*NET, the challenge of finding words indicating job skills is therefore narrowed to the identification of words or expressions within a certain resume which correspond to those described within the O*NET database. For example, a resume might contain the word programming which, in O*NET, corresponds to a skill for several jobs. Simple syntactic rules (i.e., string matching, approximate string matching using the Levenshtein distance, etc.) may not be enough; in fact, the presence of word synonyms and the different forms in which a sentence can be formulated hinder their use. As a further example, in O*NET the description associated with the technology skill Google Drive, which, in turn, is one of the skills required for the job Marketing Managers, is Cloud-based data access and sharing software. Hardly ever will a resume contain the same formulation; more likely it will include different expressions conveying the same meaning such as use of cloud-based software or use of sharing applications, to name a few.

With the advancements of Machine Learning technologies, we have today very powerful methods able to compute the semantic similarity of two pieces of text (documents, paragraphs, expressions). Furthermore, the evolution of Deep Learning transformers [12,13], enabled by the huge amount of publicly available annotated datasets and the availability of powerful graphical processing units (GPUs) to run neural networks in parallel, has provided cutting-edge solutions for a great number of problems, including those revolving around the Semantic domain [14]. Transformers are able to efficiently solve several tasks such as sequence classification, question answering, language modeling, text generation, named entity recognition, and, most of all, semantic similarity between two texts. Therefore, thanks to the advancements in Natural Language Processing (NLP), the Semantic Web, Artificial Intelligence, Deep Learning, and Information Extraction, we have a plethora of technologies (software and hardware) to efficiently extract and identify valuable and relevant information from candidates' resumes. For instance, recent work has proposed the use of Large Language Models (LLMs) to produce personalized job descriptions given input CVs [15], leveraged semantic-enhanced transformers to parse job descriptions and align them with user skills, improving the accuracy of recommendations [16], and handled job recommendation as a resource allocation task to study the fairness of machine learning-based models [17]. Despite these advancements, LLM- and deep learning-based recommenders face a significant challenge: they frequently introduce biases and fail to provide clear, interpretable justifications for why a particular CV is matched with a given job, raising concerns about fairness and accountability in the recommendation process.

Our paper lies in this context and proposes a new approach to identify skills from candidates' resumes and job postings and match the two of them. We consider the O*NET database information related to technology skills, skills, knowledge, abilities, work activities, task ratings, and tools used. Then, given the resume of a candidate, we run different NLP tools to extract information that is then matched against the O*NET entities and returns the most suitable job for the underlying resume. We consider two scenarios, one useful for companies that need to screen several resumes and another one useful for candidates looking for a job among several job postings:

• First scenario: a resume is provided as input. The goal is to identify the O*NET occupation with the highest match with the resume;
• Second scenario: a resume and a list of job postings are given as input. The goal is to find which of the input job postings best matches the candidate's skills present in the resume.

The contributions of our paper are therefore the following:

• we employ Deep Learning transformers to identify the pieces of text in a resume with high semantic similarity with the entries of corresponding O*NET entities;
• using ad-hoc metrics, we identify the O*NET job most related to the resume of the underlying candidate;
• we provide two complementary scenarios where we applied our approach;
• for the first scenario, we tested our approach on a dataset of 105 resumes and outperformed two baseline methods we have defined; for such a purpose, three independent experts have annotated the dataset, which we have publicly released;
• for the second scenario, we tested our approach on a dataset of 100 resumes where each of them was associated with 10 different job postings (the resume was qualified for only one of them); our approach outperformed two more baselines we have defined; again, for such a purpose, three independent experts have annotated the dataset, which we have publicly released;
• we provide a use case where we apply our approach to recommend courses and lectures to candidates with the goal of acquiring certain skills and thus becoming qualified for certain jobs they had targeted;
• we release the source code of our approach for both the scenarios and the recommendation task in a public repository⁸;

³ https://www.monster.com/career-advice/article/things-that-will-blacklist-you-from-job.
⁴ http://www.monster.com.
⁵ https://www.offerzen.com/.
⁶ https://economicgraph.linkedin.com/resources/linkedin-workforce-report-april-2022.
⁷ https://www.onetcenter.org/database.html.
⁸ Scenarios source code: https://gitlab.com/hri_lab1/using-transformers-and-o-net-to-match-jobs-to-applicants-resumes.
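The limitation of syntactic matching noted above can be made concrete. The sketch below is our own illustration, not code from the paper's repository: it implements the Levenshtein distance and shows that the O*NET description of the Google Drive skill and a plausible resume phrasing are far apart in edit-distance terms even though they convey the same meaning.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance
    (minimum number of insertions, deletions, and substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# O*NET's description of the Google Drive technology skill vs. a
# plausible resume phrasing carrying the same meaning:
onet_description = "cloud-based data access and sharing software"
resume_phrase = "use of cloud-based software"

print(levenshtein("kitten", "sitting"))              # 3
print(levenshtein(onet_description, resume_phrase))  # large, despite semantic closeness
```

This is exactly why the approach relies on semantic similarity rather than string distance: two paraphrases score poorly under any edit-distance threshold.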

• we provide an interactive demo⁹ and its source code¹⁰ that implements our proposed approach to match applicants' resumes with the O*NET database jobs to measure the eligibility of a user for a given job.

The remainder of this paper is organized as follows. Section 2 discusses related works on resume extraction and job recommendation. Section 3 details the tools we have used in this paper as well as the O*NET database. The proposed approach is presented in Section 4 whereas Section 5 introduces the scenarios we have considered. Section 6 illustrates the evaluation we have carried out for the two scenarios. A use case we propose in the paper, which can leverage the proposed approach to recommend skills to job seekers, is shown in Section 7. Finally, Section 8 ends the paper with conclusions, limitations, and the future work we are headed towards.

2. Related work

The last two decades have seen the development of online recruiting platforms, a topic that has recently acquired increasing attention. Authors in [18,19] presented two surveys of existing recommendation approaches that have been proposed to create recommendation systems for job seekers and recruiters. On the one hand, extracting features from resumes and from job postings has always been challenging. In particular, user profiling deals with acquiring, extracting, and representing the features of users [20]. On the other hand, job profiling is a representation of job descriptions and their requirements. Usually, job descriptions come in unstructured text with no attribute names with well-defined values. It follows that the skill set for a particular job includes skills with a Boolean value: True if the skill is required for that job, False otherwise.

A kind of approach that achieved great success in recommending jobs is collaborative filtering. It is based on the assumption that if users A and B have similar behaviors they will rate other items similarly [21-23]. Authors in [23] developed a job recommendation system using a model-based collaborative algorithm with clustering algorithms. Latent Semantic Analysis (LSA) and Singular Value Decomposition (SVD) have been adopted to create a lower-rank matrix with information about skills and positions. Moreover, the inverse cosine similarity was employed as a distance to perform agglomerative clustering and create clusters of positions. Authors in [22] created a system for job recommendation by making clusters of users based on skills extracted from different websites. The Euclidean distance has been leveraged as a measure of similarity between the skills of users. Then, identified skills with low occurrence were removed from the list and a classification using Naive Bayes was employed to rank the final set of recommendations for a user.

The Labor Market Explorer interactive dashboard for job seekers has been presented by the authors in [24]. It was built with a careful user-centered design process where both job seekers and job mediators were involved so that the matching process between jobs and job seekers could be optimized. The dashboard enables an exploration of the job market in a personalized way based on the skills and competencies of the applicants. Efforts related to career exploration and the detection of training needs have also been and are being carried out. For example, the STAR¹¹ project has the goal of designing new technologies to enable the deployment of standard-based secure, safe, reliable, and trusted human-centric AI systems in manufacturing environments. There, the Workers' Training Platform is being developed. This platform allows workers to self-assess themselves and detect training needs related to skills or knowledge while offering them training recommendations for those skills. All in an anonymous way and based on public occupation databases such as O*NET or ESCO¹².

Other authors in [25] created a system to suggest jobs based on users' profiles. Users and jobs have been treated as text documents, and a model that incorporates job transitions, trained on the career progressions of a set of users, has been adopted. The authors also showed that combining career transitions with cosine similarity outperforms the system using just career transitions. The evaluation proving the statements above has been carried out on a dataset of 2,400 LinkedIn users with the task of predicting users' current positions by looking at their profiles and their job history.

The works illustrated above do not leverage a widely recognized taxonomy of skills or other similar entities. Sometimes the set of considered skills is too large and approaches to reduce the dimensional space need to be adopted, with the drawback of losing precision. Differently from the previous approaches, we use the O*NET taxonomy by including not just skills but several other entities of a different kind (i.e., knowledge, abilities, technology skills, etc.). O*NET is one of the main occupational databases, and is almost a reference for occupation analysis and worker requirements. It is the primary source of occupational information in the United States and it includes regularly updated occupational characteristics based on questionnaires administered to several hundred workers. In doing so, we inject into our approach cognitive, interpersonal, and physical skill knowledge representing worker and job requirements coming from a large number of workers and companies, thus making the person-job matching more robust. Finally, our work differs from previous studies [21-25] because it does not rely on previous applicants for a specific job position like collaborative filtering approaches do. Therefore, each resume is used as it is and is matched against jobs, thus allowing us to avoid biases as well as the cold start problem [26]. Last but not least, to match each of these entities with applicants' resumes and job postings we make use of sentence transformers, thus leveraging the semantics of each word, sentence, and context. This differs from the existing studies, which often make use of only well-defined attributes mentioned in the applicants' resumes or job postings, thus making it possible for our approach to deal with the complexity of the natural language text contained in applicants' resumes.

3. Task definition and the used material

In this section, we provide a formal definition of the task addressed in this work, alongside details about the transformers model and the O*NET database leveraged in the proposed approach.

3.1. Task definition

The objective of our approach is to develop a system that automatically identifies and matches relevant skills and qualifications between candidates' resumes and job postings. Formally, given:

• Resume Information (R): a structured or semi-structured textual representation of a candidate's skills, experiences, education, and qualifications.
• Job Information (J): a set of job classes derived from O*NET, comprising multiple entities such as technology skills, general skills, knowledge, abilities, work activities, task ratings, and tools used.

The task is defined as a mapping function f : R → J, where for each resume r ∈ R, the function f returns a ranked list of job classes {j1, j2, ..., jn} ⊂ J, ordered by their relevance to the candidate's profile.

The relevance score is calculated by leveraging semantic similarity to align the extracted elements from the resume with the O*NET entities corresponding to each job. For each matched element, a job-specific

⁹ Demo: http://192.167.149.11:8000.
¹⁰ Demo source code: https://gitlab.com/hri_lab1/onet-db26-transformers-demo.
¹¹ https://www.star-ai.eu.
¹² https://ec.europa.eu/esco/.
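The mapping function f defined in Section 3.1 can be sketched as follows. All names here are ours, and the word-overlap similarity is only a toy stand-in for the transformer-based semantic similarity the approach actually uses.

```python
from typing import Callable, Dict, List, Tuple

def rank_jobs(resume_elements: List[str],
              jobs: Dict[str, List[str]],
              similarity: Callable[[str, str], float],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Return job classes ranked by aggregate similarity between the
    elements extracted from a resume and each job's O*NET entity texts."""
    scores = {}
    for job, entities in jobs.items():
        # For each resume element keep its best-matching O*NET entity,
        # then average those best scores over the resume elements.
        best = [max(similarity(el, ent) for ent in entities)
                for el in resume_elements]
        scores[job] = sum(best) / len(best)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy similarity: Jaccard word overlap (a stand-in for cosine similarity
# between transformer embeddings).
def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

jobs = {
    "Marketing Managers": ["cloud-based data access and sharing software",
                           "sales and marketing"],
    "Computer Systems Engineers": ["programming", "computer hardware"],
}
ranked = rank_jobs(["programming in Python", "computer hardware design"],
                   jobs, overlap)
print(ranked)  # 'Computer Systems Engineers' ranks first with score 0.5
```

Swapping `overlap` for an embedding-based cosine similarity turns this skeleton into the semantic version of the task; the aggregation strategy (best match per element, then mean) is one illustrative choice, not necessarily the paper's exact metric.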


Fig. 1. The figure illustrates the structure of a Siamese network designed for text comparison. Each branch of the network consists of a text input processed through a BERT model that encodes the text into vectorial representations, and a pooling layer that aggregates the token-level features into a fixed-size vector representation capturing the most salient information. The resulting vectors from both branches, u and v, are then compared using cosine similarity to measure their similarity.
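The pooling and comparison steps of Fig. 1 can be illustrated with a minimal pure-Python sketch; the tiny hand-made vectors below stand in for real BERT token embeddings, which are 768-dimensional.

```python
from math import sqrt

def mean_pool(token_embeddings):
    """Aggregate token-level vectors into one fixed-size sentence
    vector by averaging each dimension (mean pooling)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two pooled sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Two "sentences", each a list of 3-dimensional token embeddings.
sent_u = [[1.0, 0.0, 1.0], [1.0, 2.0, 1.0]]
sent_v = [[2.0, 1.0, 2.0], [0.0, 1.0, 0.0]]
u, v = mean_pool(sent_u), mean_pool(sent_v)
print(cosine(u, v))  # ≈ 1.0: the pooled vectors point in the same direction
```

In the real pipeline each branch's token embeddings come from BERT; mean pooling is one of the pooling strategies SentenceTransformers supports.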

score derived from O*NET is assigned across various categories, including ``skills'', ``knowledge'', ``abilities'', ``work activities'', and ``tasks''. Additionally, conventional scores are applied for ``technology skills'' and ``tools used''.

The approach operates in two specific scenarios:

1. Resume-to-Job Matching (R2J): assisting companies in screening multiple resumes to identify the most suitable candidates for a specific job posting.
2. Job-to-Resume Matching (J2R): helping candidates find the most relevant job postings based on their resumes.

By leveraging NLP advancements and semantic similarity computation, the proposed system aims to address challenges such as entity disambiguation, contextual understanding of skills and qualifications, and alignment between heterogeneous textual data from resumes and job descriptions.

3.2. Sentence transformers

SentenceTransformers¹³ [27] is the state-of-the-art Python framework for text embedding generation. The framework leverages models that are built on top of the original BERT model [28] or its further developments such as RoBERTa [29], MPNet [30], and ALBERT [31]. It provides transformer models that use siamese and triplet network structures to transform sentences into embeddings. Then, these can be compared by using metrics (e.g., the cosine similarity) to perform tasks such as information retrieval, clustering, and semantic search. The first models were trained on Natural Language Inference (NLI) datasets [32,33] and successfully became the state of the art for solving Semantic Textual Similarity (STS) tasks.

Today, SentenceTransformers pre-trained models are built as extensions of Huggingface Transformers models¹⁴ by applying pooling layers within siamese structures. The reader can observe an example of a common siamese structure with two BERT models to compute the cosine similarity between two texts in Fig. 1. The pool of SentenceTransformers models can be found at https://huggingface.co/models.

Within the proposed system, we have chosen to embed the all-mpnet-base-v2 model, which achieves ρ = 0.69 on encoding sentences over 14 diverse tasks, and ρ = 0.57 on 6 diverse tasks for semantic search such as the encoding of queries and questions. According to the official Sentence Transformers documentation¹⁵, this model is specifically noted for achieving higher performance compared to a range of other pre-trained models, including standard BERT-based and RoBERTa-based architectures. The documentation emphasizes that Sentence Transformers models are optimized for sentence-level tasks, such as semantic similarity, making them particularly well-suited for our application, which involves comparing resumes, job descriptions, and O*NET entities. In addition, our use of Sentence Transformers aligns with the nature of our task, where capturing nuanced semantic similarity between textual data is critical. These models are specifically designed to deliver superior results for tasks involving sentence embeddings and pairwise comparisons, which are central to our approach. The reader should note that ρ corresponds to the Spearman rank correlation coefficient [34] between the ranking of sentence pairs using the cosine similarity as score and the gold standard ranking for various semantic textual similarity tasks. The Spearman rank correlation coefficient gives a score in the continuous range [−1, 1], where a value of 1 indicates a perfect correlation between the two rankings, a value of 0 indicates a weak correlation, and a value of −1 means a perfect negative correlation.

The all-mpnet-base-v2 model has a size of 420 MB, a maximum sequence length of 384 tokens, and an embedding speed of 2,800 sentences/sec on a V100 GPU. The generated embeddings are 768-dimensional vectors. The suitable score functions are the dot product, the cosine similarity, and the Euclidean distance.

3.3. The O*NET database

The O*NET Program¹⁶ is the primary source of occupational information for the United States. The goal of its creation was to understand the rapidly changing nature of work and how it impacts the workforce and the U.S. economy. One of the main outcomes of the program is the O*NET database, which includes hundreds of standardized and occupation-specific descriptors on almost a thousand occupations covering the entire U.S. economy. The database is freely available under a Creative Commons license and is continuously updated on a quarterly basis by different institutions. It has already been used by millions of people for career exploration and to discover which training is necessary to be eligible for a position, and by employers to find skilled workers to better compete in the marketplace.

Each occupation included in the O*NET database requires a disparate mix of knowledge, skills, and abilities and is performed using a variety of activities and tasks. We have used the O*NET database version 26.2, which includes 1,016 occupations, 52 Abilities, 33 Knowledge, 35 Skills, 41 Work Activities, 4,127 Tools Used, 17,975 Task Ratings, and 8,761 Technology Skills grouped into 135 categories. In the remainder of the paper, we will refer to Abilities, Knowledge, Skills, Work Activities, Tools Used, Task Ratings, and Technology Skills as O*NET entities. Each occupation is associated with a unique identifier, a title, and a related description. For example, the occupation Computer and Information Research Scientist has the description Conduct research into fundamental computer and information science as theorists, designers, or inventors. Develop solutions to problems in the field of computer hardware and software.

The Abilities consist of a set of capacities related to each occupation. One occupation may be associated with many abilities, each with a related score. A given ability (as for Knowledge, Skills, and Work Activities) might be present in different occupations with different scores.

¹³ https://www.sbert.net/.
¹⁴ https://huggingface.co/docs/transformers/index.
¹⁵ https://www.sbert.net/docs/pretrained_models.html.
¹⁶ https://www.onetcenter.org/.
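The Spearman ρ used above to score all-mpnet-base-v2 can be computed as the Pearson correlation of the two rank sequences. A minimal sketch of our own follows; it ignores tied values, which the general formula handles by averaging their ranks.

```python
from math import sqrt

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes no tied values for simplicity."""
    def ranks(seq):
        order = sorted(range(len(seq)), key=lambda i: seq[i])
        r = [0] * len(seq)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

gold = [0.9, 0.1, 0.5, 0.7]       # gold-standard similarity scores
predicted = [0.8, 0.2, 0.4, 0.6]  # cosine similarities from a model
print(spearman_rho(gold, predicted))           # ≈ 1.0: rankings agree perfectly
print(spearman_rho(gold, [-s for s in gold]))  # ≈ -1.0: rankings perfectly reversed
```

Note that ρ only cares about the ordering of the scores, not their magnitudes, which is why it is the standard metric for semantic textual similarity benchmarks.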


Fig. 2. O*NET ``Technology Skills'' and ``Tasks'' Entities - First 50 elements by frequency.

One example is Oral Comprehension which, among others, appears in the one more field, (hot technology), meaning whether the underlying tech­
occupations Chief Executives and Biostatisticians with a score value of, re­ nology skill is hot or not.
spectively, 4.5 and 4. The score r­flects the importance of that ability The reader notices that the last three described O*NET entities, Task
with respect to the associated occupation and has a value in the [1-6] Ratings, Tools Used, and Technology Skills, only occur when they are
continuous range. The higher the score, the more important that ability required for the underlying job. Moreover, Tools Used and Technology
is for the job it refers to. Skills do not have a score value associated with an occupation, Task
Knowledge represents the required area of expertise for the under­ Ratings does. Overall, as previously mentioned, there are 4,127 Tools
lying job and has scores in the [1-7] continuous range. An example is Used, 17,975 Task Ratings, and 8761 distinct Technology Skills. A cer­
provided by Mathematics which has a score of 6.83 for the occupation Mathematicians. Similarly to the abilities, Knowledge also has many-to-many relations with the occupations and each different pair (occupation, knowledge) has a different score value.
Skills are the competencies required for each occupation and have a score in the [1-6] continuous range. One example is Programming, associated with the job Computer Systems Engineers/Architects with a score value of 3.38.
For Work Activities (values range in the [1-7] continuous interval), an example is given by Analyzing Data or Information, with a score of 6.61 with respect to the occupation Financial Quantitative Analysts. Both Skills and Work Activities have many-to-many relations with different score values for each different job.
One example of Task Ratings is Direct and coordinate activities involving sales of manufactured products, services, commodities, real estate or other subjects of sale, related to the job Sales Manager with an importance score of 4.22 (scores of Task Ratings are in the interval [1-5]).
An example of Tools Used is Personal Computer, associated with several jobs (almost all of them).
Finally, an example of Technology Skills is Atlassian JIRA, related to the occupation Administrative Services Managers; Technology Skills have

tain O*NET job will require a subset of each of the three entities. For example, the O*NET job Chief Executives requires 7 Tools Used, 31 Task Ratings, and 49 Technology Skills. Conversely, each O*NET occupation is always associated with a fixed number for each of the first four described entities. Hence, any O*NET occupation will be associated with a vector of 52 Abilities, a vector of 33 Knowledge, a vector of 35 Skills, and a vector of 41 Work Activities.
One more table worth mentioning is the content model reference, which contains complete descriptions of all the elements included in the entities of the O*NET database previously described. For example, one of the Abilities' elements is Cognitive Abilities and its description is Abilities that influence the acquisition and application of knowledge in problem solving.
To provide the reader with some statistics, Figs. 2 and 3 show the first 50 elements, in decreasing order by the number of times occurring within the dataset's jobs, of the O*NET entities with a variable number of elements per job, that is ``Technology Skills'', ``Tasks'', and ``Tools Used''. For example, Microsoft Excel is a technology skill required by 834 O*NET jobs whereas Personal Computers is a Tool required by 656 occupations.
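The fixed-dimension representation described above (a vector of 52 Abilities, 33 Knowledge, 35 Skills, and 41 Work Activities per occupation) can be sketched as follows. This is an illustrative snippet of ours, not the authors' code; the function name and the sample scores are invented:

```python
# Illustrative sketch (not the authors' implementation): every O*NET
# occupation maps to one fixed-length score vector per entity, so that
# occupations can be compared component-wise.
ENTITY_DIMS = {
    "Abilities": 52,
    "Knowledge": 33,
    "Skills": 35,
    "Work Activities": 41,
}

def occupation_vectors(scores_by_entity):
    """Turn sparse per-entity ratings into fixed-length vectors.

    Elements not rated for this occupation default to 0.0, so every
    occupation yields vectors of identical shape.
    """
    vectors = {}
    for entity, dim in ENTITY_DIMS.items():
        scores = list(scores_by_entity.get(entity, []))
        if len(scores) > dim:
            raise ValueError(f"{entity}: at most {dim} elements expected")
        vectors[entity] = scores + [0.0] * (dim - len(scores))
    return vectors

# Hypothetical partial ratings for one occupation:
vecs = occupation_vectors({"Skills": [3.38, 2.1], "Knowledge": [4.87]})
```

With fixed shapes, any two occupations (or an occupation and a resume profile) can be compared entity by entity.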

R. Alonso, D. Dessí, A. Meloni et al.
Big Data Research 39 (2025) 100509
Similarly, Figs. 4 and 5 show the average values of the scores, calculated over all the O*NET jobs, of the elements of the O*NET entities with a fixed number of elements per job. So, for example, Active Listening is the value of the entity Skills with the highest average score (3.60) whereas English Language is the value of the Knowledge entity with the highest average score (3.66). Similar considerations can be made for the values of the entities in Fig. 5.

Fig. 3. O*NET ``Tools Used'' Entity - First 50 elements by frequency.

4. The proposed approach

This section describes the proposed approach used to match resumes and jobs.

4.1. Adopted O*NET entities

We used the following O*NET entities:

• ``Abilities'',
• ``Knowledge'',
• ``Skills'',
• ``Work Activities'',
• ``Task Ratings'',
• ``Technology Skills'',
• ``Tools Used''.

Table 1
Dimension of the vector space for the first four O*NET entities.

Name             Dimension  Example elements
Abilities        52         Oral Comprehension, Written Comprehension
Knowledge        33         Sales and Marketing, Computers and Electronics
Skills           35         Science, Active Learning, Service Orientation, Speaking
Work Activities  41         Analyzing Data or Information, Thinking Creatively

We have mapped the first four entities to n-dimensional vectors in which the components are the scores (already mentioned in Section 3) attributed to the importance of the i-th element of the underlying vector (whose size is reported in Table 1) of one of the entities with respect to the 1,016 different jobs present in the O*NET database. For the sake of clarity, let us remark that, as discussed in Section 3, there is a difference among the entities. The first four entities have a fixed dimension for each job as indicated in Table 1. The other three entities have dimensions that vary depending on the job. For Task Ratings, for example, the job Sales Manager includes 17 of them (each with a score or data value, as shown in Fig. 6) whereas the job Wind Energy Engineers includes 16 other different tasks.

The reader notices that for tasks we have considered the scores associated with the scale importance, which lies within the [1-5] interval. As mentioned in Section 3.3, in the O*NET database, the only elements that are not weighted differently based on their relative importance for job matching are technology skills and tools. Our system utilizes all the information provided by the O*NET database as it is, without altering or interpreting the design choices made by its developers, in order to preserve the consistency and structure of the database. Specifically, Technology Skills have no values relative to their importance but are classified by O*NET as ``hot technology'' or not. Hence, assuming we extract an element from the resume that is matched against a certain Technology Skill, we decided to assign a score of 1 if the recognized item is classified by O*NET as ``hot'' and 0.75 otherwise. Finally, for a certain Tool Used identified in a resume, we simply assign a score of 1.0 if it is identified and 0 if it has not been detected.

4.2. Transformer-based comparison

To collect information to compare against the O*NET entities, we extracted sentences, nouns (e.g., network, mathematics, archaeology), and noun phrases (e.g., system administrator, server management, machine learning) from the text of resumes or job descriptions using the TextBlob library17. It was selected for its simplicity, efficiency, and reliability in extracting nouns and noun phrases, which align well with the structured elements of O*NET entities. This library allowed us to effectively preprocess the textual data, ensuring compatibility with the downstream tasks of semantic similarity and job matching.

The nouns and noun phrases extracted are compared with the values of the elements of the O*NET entities, while the sentences are compared with the descriptions of these elements. The descriptions of elements have been introduced in Section 3 and exist for the entities Abilities, Knowledge, Skills, Work Activities, and Task Ratings. The reason why we chose nouns and noun phrases to be matched with the entities' elements is the size of the latter (in general they consist of a few tokens). In order not to lose the information given by the descriptions of some of the entities we have considered, we have also compared the descriptions with the entire sentences extracted from the resumes or job descriptions. We used a pre-trained neural network model from Sentence Transformers (`all-mpnet-base-v2') to get text embeddings and leverage the cosine similarity to generate, given a certain resume or job posting, one similarity matrix for each pair (job, entity).

17 https://textblob.readthedocs.io/en/dev/.
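The extraction step above relies on TextBlob's sentence- and noun-phrase facilities. As a rough, dependency-free illustration of the same preprocessing (our stand-in, not TextBlob itself), sentences can be split on punctuation and candidate phrases approximated with word bigrams:

```python
import re

def extract_units(text):
    """Crude stand-in for the TextBlob-based preprocessing: returns
    (sentences, words, bigrams). TextBlob would instead provide proper
    POS-based nouns and noun phrases; bigrams only approximate them."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z][a-zA-Z+#]*", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return sentences, words, bigrams

sents, words, phrases = extract_units(
    "Responsible for server management. Studied mathematics and machine learning."
)
```

In the actual pipeline, the extracted units would then be embedded with the Sentence Transformers model and compared against the O*NET elements and descriptions.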


Fig. 4. O*NET ``Skills'' and ``Knowledge'' Entities - Average value of the elements.

Fig. 5. O*NET ``Abilities'' and ``Work Activities'' Entities - Average value of the elements.


Fig. 6. Task Ratings for the job Sales Manager.

Given a similarity matrix for a pair (job, entity), the columns of the matrix are the information extracted from the resume (or job posting) and the rows are the elements of the O*NET entity which might occur in the job being considered. For example, when analyzing the entity Abilities for a certain job, the associated similarity matrix will have 52 rows and a number of columns depending on the information (nouns, sentences, noun phrases) extracted from the resume or job posting. We fill each cell of the matrix with the similarity value between the extracted information from the resume/job posting (column c) and the elements of the O*NET entities (row r). From each column, the system extracts the element with the maximum value, which corresponds to the element of the O*NET entity with the highest similarity value with respect to the extracted element of the resume/job posting represented in the underlying column. If the similarity value is greater than an empirically found threshold, the score of the element of the O*NET entity is added to the resume/job posting score for the current job.
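The per-column maximum and the threshold test just described can be sketched as follows. This is a toy version of ours, not the authors' code: real embeddings come from all-mpnet-base-v2, while here we use tiny hand-made vectors (the threshold value 0.65 is the one reported in the text):

```python
import math

THRESHOLD = 0.65  # the empirically chosen similarity threshold

def cosine(u, v):
    """Cosine similarity between two plain Python vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def column_matches(entity_rows, extracted_cols):
    """For each extracted item (column), pick the O*NET element (row) with
    the highest cosine similarity; keep it only if it clears the threshold.
    Returns one (row_index, similarity) tuple or None per column."""
    result = []
    for col in extracted_cols:
        sims = [cosine(row, col) for row in entity_rows]
        best = max(range(len(sims)), key=sims.__getitem__)
        result.append((best, sims[best]) if sims[best] > THRESHOLD else None)
    return result

# Toy 2-dimensional "embeddings": two entity elements, two extracted items.
matches = column_matches([[1.0, 0.0], [0.0, 1.0]],
                         [[0.9, 0.1], [-1.0, 0.0]])
```

The first extracted item matches the first entity element well above the threshold; the second matches nothing and is discarded.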
This threshold was determined after conducting numerous experiments, and it was adjusted to identify the value that maximized classification accuracy. We tested different values and found that 0.65 provided the best balance between correctly identifying relevant O*NET entities and minimizing misclassifications.

Fig. 7. System diagram.

The system then calculates the maximum score obtainable for the current job j and normalizes the score obtained from the resume/job posting by applying the formula:

score(j) = Σ_{i=1}^{num_entities} ( (Σ_{k ∈ entity_i} score_k(j)) / score_max_i(j) + 0.5 · Σ_{k ∈ entity_i} score_k(j) )    (1)

where num_entities is equal to seven and corresponds to the entities that have been previously described, score_k(j) is the score of the element k (identified in the resume/job posting) of the entity i of the job j that is being checked, whereas score_max_i(j) is the maximum score that the job j would obtain if all the related elements of the entity i were detected in a resume/job posting. The term (0.5 · Σ_{k ∈ entity_i} score_k(j)) acts as a corrective factor, empirically determined to prevent the misjudgment of job categories with a large number of entities associated in O*NET. Without this correction, the formula would only normalize the scores by dividing by the maximum possible score, which could unfairly disadvantage jobs with many associated entities. For example, if a job category has 25 entities associated in O*NET and only 5 entities are matched in the resume, the score would be normalized by dividing by the sum of the 25 scores, yielding a lower score compared to a job with only 10 associated entities in O*NET. This imbalance could lead to the underrepresentation of jobs with a higher number of associated entities, such as IT-related jobs, which are linked to many tools and technical skills in O*NET, even if a significant portion of their relevant entities is detected in the resume. The corrective factor is necessary to address this issue, ensuring that jobs with a large number of associated entities do not receive disproportionately low scores due to the normalization process. We conducted several experiments adjusting this parameter (e.g., testing values like 0.4, 0.3, etc.), and found that 0.5 provided the best balance for correct classification across diverse job types. Changing this value would influence the classification results, as a smaller corrective factor could lead to the misclassification of jobs with many associated entities, while increasing the factor could cause underrepresentation of jobs with fewer associated entities. Therefore, the value of 0.5 was chosen based on its ability to yield the highest accuracy across various datasets.
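Equation (1), including the 0.5 corrective term, can be sketched as follows (this is our reading of the formula; the function and variable names are ours):

```python
def job_score(matched, score_max):
    """Sketch of Equation (1): for each entity, normalize the summed scores
    of the elements detected in the resume/job posting by the maximum
    achievable score for that entity, and add the 0.5 corrective term.

    matched:   dict entity -> scores of the detected elements for job j
    score_max: dict entity -> maximum total score of that entity for job j
    """
    total = 0.0
    for entity, scores in matched.items():
        s = sum(scores)
        total += s / score_max[entity] + 0.5 * s
    return total

# Toy example: one entity, half of its maximum total score detected,
# giving 4/8 + 0.5 * 4 = 2.5.
value = job_score({"Skills": [2.0, 2.0]}, {"Skills": 8.0})
```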


This corrective factor can be considered a hyperparameter, and its optimal value was determined through extensive experimentation.

The described procedure is performed for each job j and returns the first five jobs that have obtained the highest scores. Fig. 7 shows the flow diagram of the algorithm just illustrated.

Fig. 8. Example of similarity matrix for the Knowledge entity.

Fig. 8 shows an example of a similarity matrix for the O*NET entity Knowledge (with 33 rows) with Java Programmer as job and Computers, training, maths, English, sales, administrator, engineer as information extracted from an input resume. The reader should note that in the similarity matrix, for each piece of information extracted from the resume (column), we have a cell with the maximum value on the row corresponding to an element of the Knowledge entity (as previously mentioned, Knowledge has 33 rows). For example, for the information Computers, this maximum value corresponds to the Knowledge element Computers and Electronics, but the corresponding similarity value of 0.6389 is below the threshold (empirically fixed for all the entity values at 0.65). This represents a typical ``failure case'', where the match is incorrectly excluded despite being highly relevant. The O*NET score of 4.87 (associated with the value Computers and Electronics for the job Java Programmer) for the entity value Computers and Electronics will not, therefore, be added to the total score of the resume. The normalized and averaged score of 9.26 (at the bottom of the figure and corresponding to the computation of the member between the parentheses of Equation (1) for the Knowledge entity) will then be summed with the other normalized and averaged scores of the other O*NET entities. Their similarity matrices, used to calculate the other normalized and averaged scores, work in the same way. What changes is the number of rows and columns, according to the elements of the underlying O*NET entity and the information extracted from the underlying resume or job posting.

5. Scenarios

In this section, we will show the two scenarios that we have considered for job matching. They are related to the identification of the best match between a resume and a certain job description. They are also complementary, in the sense that the first one is company-oriented whereas the second one is candidate-oriented.

5.1. First scenario

The input of the first scenario is a resume. Here the task is to identify the O*NET occupations with the highest match with the given resume. This might be very useful for companies that need to quickly assess different resumes. With an automatic tool like this, companies would be able to categorize plenty of resumes with very high precision in a matter of seconds. In such a case, the application of our approach is straightforward. We would first compute the entities of the resume as described in Section 4. Next, we would compute the score of each job as indicated in Equation (1) and would consider the job whose score is the highest.

5.2. Second scenario

The input of the second scenario we have envisaged consists of a resume and a set of job descriptions.
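Assuming a scorer implementing Equation (1), the selection logic of the two scenarios can be sketched as follows (an illustrative snippet of ours, not the authors' code; in the second scenario, each job description is first mapped to its best O*NET occupation, and the resume is then matched only against that restricted set):

```python
def best_match(score_of, candidate_jobs):
    """Return the occupation with the highest score among the candidates."""
    return max(candidate_jobs, key=score_of)

def scenario_one(resume_score_of, all_onet_jobs):
    # Scenario 1: rank every O*NET occupation against the resume.
    return best_match(resume_score_of, all_onet_jobs)

def scenario_two(resume_score_of, jobdesc_score_fns, all_onet_jobs):
    # Scenario 2: map each job description to its best occupation
    # (the OCC set), then match the resume only against OCC.
    occ = {best_match(fn, all_onet_jobs) for fn in jobdesc_score_fns}
    return best_match(resume_score_of, sorted(occ))

# Toy scorers (plain dicts standing in for the Equation (1) scorer):
jobs = ["Data Scientists", "Lawyers", "Sales Managers"]
resume = {"Data Scientists": 9, "Lawyers": 1, "Sales Managers": 5}.get
desc_a = {"Data Scientists": 0, "Lawyers": 3, "Sales Managers": 1}.get
desc_b = {"Data Scientists": 0, "Lawyers": 1, "Sales Managers": 4}.get
```

Note how the restriction matters: the resume's overall best occupation may fall outside the OCC set derived from the job descriptions, in which case the best occupation within OCC is returned instead.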

Here the goal is to return the job whose description matches the most with the candidate's skills present in his/her resume. If the first scenario was thought to help companies in the fast screening of resumes, the second scenario is more applicant-oriented and allows to quickly assess which jobs among a list of many are more suitable to the given resume. To adapt our method for this case we first apply our approach to each job description. For a given job description jdesc present in the input set, after having collected its related entities vectors, it returns the O*NET occupation with the highest match with jdesc. Please note that at this stage the returned O*NET occupations OCC are less than or equal to the number of job descriptions present in our set (it might be less if multiple job descriptions are mapped to the same O*NET occupation). Finally, we apply once more our approach to the input resume with the difference that, instead of looking for the best possible match over all the O*NET occupations, we limit the search to the OCC set we identified in the step before. Therefore, one of the O*NET occupations present in OCC will be returned as the one that matches the most with the input resume.

Fig. 9. A resume within the category Web Designing.

6. Evaluation

In this section, we will illustrate the evaluation assessment we have performed for each scenario. The reader notices that the scoring mechanism that we have introduced for Scenario 1 has been used as well for Scenario 2.

6.1. Evaluating scenario 1

As far as the first scenario is concerned, to check the performances of our approach we used the Resume Dataset version 118. The dataset consists of 963 resumes from 25 different categories. An example of a resume is shown in Fig. 9. Several resumes had duplicates and some categories contained only a few of them. To create the test dataset so that we could analyze the performances on its categories, we fixed as a constraint that at least five resumes should occur for each category. Only 21 categories satisfied such a constraint. Therefore, we selected a set of 105 resumes from 21 different categories (5 resumes per category) and ran our method which, for each resume, returned the top five O*NET occupations with the highest match with the underlying resume. The 21 classes were ``Data Science'', ``Human Resources'', ``Advocate'', ``Mechanical Engineer'', ``Sales'', ``Health and fitness'', ``Civil Engineer'', ``Java Developer'', ``Business Analyst'', ``SAP Developer'', ``Automation Testing'', ``Electrical Engineering'', ``Python Developer'', ``DevOps Engineer'', ``Network Security Engineer'', ``Database'', ``Hadoop'', ``ETL Developer'', ``DotNet Developer'', ``Blockchain'', ``Testing''.

To evaluate our method, we performed an extensive manual assessment. The relevance of each suggested job was determined by adopting a similar approach presented by [35]. We rely on a five-point relevance scale. As such, human assessors assign a relevance score to each suggested job. The score can be any of the following values:

• non-relevant (score 1): The predicted job is completely different from the original one.
• ordinary (score 2): The predicted job has an ambiguous or unfair match with the original one.
• marginally relevant (score 3): The predicted job is more generic or somewhat related with the original one or vice-versa.
• relevant (score 4): The predicted job fairly matches with the original one.
• highly relevant (score 5): The predicted job totally matches with the original one.

Although precision, recall, and F1-scores are widely used metrics for performance evaluation, we deliberately adopted the described methodology to ensure a more nuanced and context-aware assessment of job relevance. This approach allows human annotators to capture qualitative aspects of the predictions, which are not fully accounted for by traditional quantitative metrics.

We used the mentioned ratings for two different annotations (annotation1 and annotation2). Annotation1 assesses the first predicted job. Annotation2 assesses the five predicted jobs for a resume. More in detail, annotators were asked to give a score by picking the most matching job according to their expertise among the 5 proposed by our approach. Therefore, if the resume's label was Data Science and the five returned O*NET occupations were, respectively, Models, Data Scientists, Logisticians, Computer Programmers, and Mechanical Engineers, one annotator should assign 1 to annotation1 (because Models is not relevant to Data Science) and 5 to annotation2 (because among the first five results, we have Data Scientists which completely matches the original Data Science label). It is straightforward to note that the following statement holds:

score(annotation2) ≥ score(annotation1)

Three independent recruiters have annotated the two scores returned by our algorithm for the 105 resumes. To decide which annotation to pick, we applied the majority voting algorithm. If three annotations for a particular resume were all different from each other (so that the majority voting algorithm could not be applied) we computed the average and rounded it to the closest integer score (for example, if three scores were 1, 3, and 4 the final score would be round((1 + 3 + 4)/3) = 3). The inter-annotator agreement scores computed according to Fleiss' kappa [36] were 0.37 and 0.27 for annotation1 and annotation2, respectively, as reported in Table 2, indicating a fair/moderate agreement among the annotators, despite the elevated number of classes to pick from for each sample. Then, we applied the majority voting and scored the results of the proposed algorithm accordingly. In particular, we obtained an average score of 3.8 for annotation1 and an average score of 4.2 for annotation2, as indicated in Table 3. We compared these results against those obtained using two naive unsupervised approaches which would constitute our baselines.

Table 2
Inter-agreement evaluation of the three experts on our approach and the two baselines.

              Annotation1  Annotation2
Our Approach  0.37         0.27
Baseline1     0.56         0.43
Baseline2     0.39         0.37

The two baselines do not take into account any entity of the O*NET database and work as follows. First, each input resume is broken down into nouns, noun phrases, and sentences. For each element, we compute the cosine similarity against each of the 21 class names of jobs (baseline1) and against each of the 1,016 O*NET occupation names (baseline2).

18 https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset.
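The annotation-aggregation rule described above (majority vote over the three annotators, falling back to a rounded average when all three scores differ) can be sketched as:

```python
from collections import Counter

def consensus(scores):
    """Majority voting over the annotators' scores; when all scores differ,
    fall back to the average rounded to the closest integer."""
    value, freq = Counter(scores).most_common(1)[0]
    if freq > 1:
        return value
    return round(sum(scores) / len(scores))

# The paper's example: scores 1, 3, 4 have no majority, so round(8/3) = 3.
```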

In both cases, we return the five job classes with the highest semantic similarity. Those classes have been annotated by the three experts with the same scoring approach mentioned above and a majority voting strategy has been applied. We computed the inter-annotator agreement scores [36] among the three annotators for the baselines as well. The inter-agreement values for the two baselines are reported in Table 2. The reader might notice a much higher inter-agreement value for baseline1 than the others. The reason is that baseline1 returns elements from a set of 21 classes which are much easier to rate for the annotators with respect to 1,016 classes. Finally, as similarly performed for our approach, we consider the majority voting result that we report in Table 3. We can observe that for baseline1, for the annotation annotation2, the obtained score is very high. This happens for the same reason as the high inter-annotator agreement value mentioned earlier. Because the set of considered classes for this baseline is 21, it is very likely that out of 21 candidate classes, one of the first five returned jobs is correct (equal to the resume label).

Table 3
Results of our approach and the two baselines.

              Annotation1  Annotation2
Our Approach  3.8          4.2
Baseline1     3.4          4.9
Baseline2     3.0          3.8

6.2. Evaluating scenario 2

For this second scenario, we employed a dataset of 10k job posts of the CareerBuilder job dataset from the UK in 201919. A subset of 8948 of these 10k job posts are in English and belong to 299 distinct categories. Each category might contain one or more jobs. Each job belongs to one category only. To prepare the dataset to be used with our system, we selected the posts belonging to categories with cardinality greater than or equal to 5 (i.e., each job category contained at least five jobs). We then selected 100 resumes (5 for each category) from the dataset used in scenario 1, belonging to the following 20 categories: Data Science, Human Resources, Advocate, Mechanical Engineer, Sales, Health and fitness, Pharmacists, Java Developer, Business Analyst, SAP Developer, Automation Testing, Electrical Engineering, Python Developer, DevOps Engineer, Network Security Engineer, Database, Hadoop, ETL Developer, DotNet Developer, Testing. Then we selected the most semantically similar job post category for each of the 20 resume categories: Computer and Information Research Scientists, Human Resources Specialists, Lawyers, Industrial Engineers, Sales Managers, Health Educators, Pharmacists, Computer Programmers, Management Analysts, Software Developers, Applications, Software Quality Assurance Engineers and Testers, Electrical Engineers, Computer Programmers, Computer Systems Engineers/Architects, Network and Computer Systems Administrators, Database Administrators, Computer Systems Analysts, Software Developers, Applications, Software Developers, Applications, Software Quality Assurance Engineers and Testers.20

Each of the 100 resumes has been associated with 10 different job posts, the last of which belongs to the corresponding resume category. The extraction of job posts was random and each extracted post could occur just once.

Therefore, each sample contained a resume, ten job postings and their related categories included in the CareerBuilder job dataset. For a given sample, the ten job postings included are always of different categories. The last job posting of each sample corresponds to the same category of the resume. The same job posting is never present more than once in the entire collection. Because the related categories of each job posting are not always matched with the occupations found within the O*NET database, we applied our method to each job posting in order to identify the O*NET occupation that matches the most the related job posting. Therefore, for each sample, we also added ten O*NET job categories. Each one is related to each job posting in the sample.

We defined two baselines. The first one works as follows. Given one sample, it starts by extracting noun phrases, nouns, and sentences from the resume. Then for each extracted element it computes the cosine similarity against the 20 job labels (the job posting categories of the CareerBuilder dataset and those from the O*NET database calculated by using our approach) and returns the five job classes whose semantic similarity with any of the extracted elements is the highest.

The second baseline uses the same O*NET entities we have defined within our approach but without using the scores. From the resume, it identifies all the values for each O*NET entity and builds one binary vector for each entity. Thus, the entity ent will consist of a vector of 0s and 1s: 0 if the i-th feature of ent is not included in the resume, 1 otherwise. Note that we considered all the values of each entity so that vectors of a certain entity will always have the same size. We perform the same operation for the 20 jobs present in each sample (thus we obtain binary entity vectors for each job in the sample) and then compute the Euclidean distance between binary vectors. We take the 5 jobs with the smallest distance with respect to the resume.

Finally, in a given sample, our approach is applied to the resume by computing the entities using the method already illustrated in Section 4 and looking for the best O*NET job label among the twenty we have.

Three human annotators filled two annotations for the two baselines and our approach. The first annotation assesses the first predicted job post whereas the second annotation assesses the top-5 predicted job posts, similarly as defined within the first scenario. Note that a certain prediction is compared against the list of the 10 original job labels and against the list of the derived 10 O*NET job labels. So, for example, if a baseline contains ``Architectural and Engineering Managers'' as its first prediction, annotation1 gets a 5 if that category is either the tenth among the original job categories or if that is the O*NET category corresponding to the tenth job description. We computed the final annotations by using a majority voting strategy on the three annotators' scores similarly as performed within the first evaluation scenario. The inter-annotator agreement scores according to Fleiss' kappa [36] among the three annotators are reported in Table 4, indicating a substantial agreement.

Table 4
Inter-agreement score of the three experts on our approach and the two baselines for the second scenario.

              Annotation1  Annotation2
Our Approach  0.69         0.66
Baseline1     0.81         0.63
Baseline2     0.66         0.65

The reader notices that such values are much higher than those obtained within scenario 1. The reason is that, in this case, each method had to choose among 10 different jobs only. This low number and the fact that they are different from each other helped the role of the annotators that were in agreement more often.

Table 5
Results of our approach and the two baselines for the second scenario.

              Annotation1  Annotation2
Our Approach  4.17         4.95
Baseline1     3.8          4.87
Baseline2     2.03         3.75

19 https://data.world/promptcloud/jobs-on-careerbuilder-uk.
20 The reader notices that such a list is presented in order of similarity with the list of resume categories shown above. That is, the first element, Computer and Information Research Scientists is the most semantically similar job category to the resume category Data Science, etc.
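The second baseline can be sketched as follows (an illustrative snippet of ours, not the authors' code, with toy entity values):

```python
import math

def binary_vector(all_values, detected):
    """One 0/1 component per possible value of an entity: 1 if the value was
    identified in the resume/job posting, 0 otherwise."""
    return [1 if v in detected else 0 for v in all_values]

def closest_jobs(resume_vec, job_vecs, k=5):
    """Rank jobs by Euclidean distance between binary vectors and keep the
    k closest, as the second baseline does with the 20 jobs of a sample."""
    dist = lambda v: math.sqrt(sum((a - b) ** 2 for a, b in zip(resume_vec, v)))
    return [name for name, v in sorted(job_vecs, key=lambda kv: dist(kv[1]))][:k]

# Toy entity with three possible values:
values = ["Programming", "Mathematics", "Sales and Marketing"]
resume = binary_vector(values, {"Programming", "Mathematics"})
ranking = closest_jobs(resume, [
    ("Computer Programmers", [1, 1, 0]),
    ("Sales Managers", [0, 0, 1]),
    ("Lawyers", [0, 0, 0]),
], k=2)
```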

Table 5 shows the results we have obtained for the three methods, where it is possible to note how ours outperforms the two baselines for both annotation1 and annotation2, showing again its efficiency. The data related to the evaluation that has been carried out for both scenarios is freely available at https://gitlab.com/hri_lab1/using-transformers-and-o-net-to-match-jobs-to-applicants-resumes.

7. Use case: recommending skills to job seekers

Our approach is well-suited for a recommendation task for job seekers. In this section, we will show an extension of our method in this direction. It works as follows. The user of the system, that is a person looking for a job, should just provide his/her resume and a list of job names he/she is interested in. First of all, the job names are matched against the O*NET occupations so that a map can be defined between input job names and O*NET occupations. We already performed a similar computation within the evaluation of the proposed second scenario. Then, for each identified O*NET job, the approach would compute a score of the resume against the input O*NET categories according to Equation (1) introduced in Section 4.2.

7.1. Job seeking and skills recommendation

The reader can find an implementation of this use case at http://192.167.149.11:8000/ under the first demo link. Let us assume that the user gives as an input one job name only, and thus only one job from O*NET is selected. Then the user uploads his/her resume. The system elaborates on it. To assess whether the score obtained from the resume for the requested job makes the resume eligible, we compare it with an experimentally established threshold value of 34.62. The value of 34.62 for eligibility represents the average score that resumes from the correct job category obtained during the testing phase. It was chosen as a natural cutoff, distinguishing eligible candidates from ineligible ones based on the distribution of scores across different classes. This value reflects the typical score achieved by resumes correctly classified, ensuring that only resumes with a high likelihood of being relevant to the job are considered eligible. It is calculated by averaging the scores obtained from all elements of the Resume Dataset used in the previous scenarios, under the assumption (confirmed among the release notes of the dataset) that each resume in that dataset is eligible for the O*NET occupation that was selected.

When the user's resume is not eligible for the input job, the system returns the list of the O*NET entity elements not found within the input resume in decreasing order of importance. For the entities with scores (Abilities, Knowledge, Skills, Work Activities, Task Ratings), it means returning their elements in decreasing order of scores. For the other entities (Technology Skills and Tools Used), they are returned in alphabetical order (for the technology skills we first return the entries with hot technologies and then the others). For each element, the system indicates the O*NET score, the name of the element, a brief description (when available), and the value that would be added to the overall CV score if it was found in the resume (that is the value between the parentheses of Equation (1) for a fixed i, k and j). If the user notices that he/she has simply failed to include some of his/her skills

There are several ways that can be employed for such a purpose. The first one is to leverage existing APIs of well-known massive open online course providers such as Udemy21 or Coursera22. In such a case we would simply leverage the search engine of those providers by feeding them with the entities to be mastered by the applicant.

As an example, we submitted for the job Mechanical Engineer a resume from the Resume Dataset belonging to the category Electrical Engineering. Our system returns a score of 19.33, which is below the threshold (34.62) by 15.29 points. Hence, the system does not consider the resume eligible for the Mechanical Engineer job and lists the elements of the O*NET entities that have not been identified in the CV, including for example: Reading comprehension, Active Listening, Mathematics and Critical Thinking. For each item, up to 5 Udemy courses are listed, when available. In our example, the Become Active Reader & Master Your Reading Comprehension Skills course is suggested for the first element, the Giving full attention to what other people are saying, taking time to understand the points being made, asking questions as appropriate, and not interrupting at inappropriate times course for the second, the Math and Perfect your Mathematics skills courses for the third, and the Using logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions, or approaches to problems course for the fourth. Then the candidate should read or attend those lectures and improve his/her resume with the acquired competencies. Afterward, by adding the missing elements listed above to the resume and resubmitting the application to the system, the augmented resume obtains a score of 39.85 and passes the eligibility threshold.

Besides relying on online providers, we can also exploit dumps of resources with courses' information [37,38]. For example, one resource that can be leveraged is the COCO [38] dataset, a large collection of courses that includes lessons, teachers, instructors, and learners' ratings collected with the purpose of developing AI-based e-learning applications on top. For the courses, it provides metadata such as short and long descriptions, the necessary requirements, and the expected skills acquired by successfully attending them. These can be leveraged by our system to find out which courses would support an applicant to master new skills by matching the detected missing skills of a resume. Our system would suggest a list of courses from the COCO dataset similar to what is performed by e-learning providers' search engines. The advantage of using such a dataset lies in the opportunity to develop customized AI-based applications (for example leveraging transformers or other language models) giving freedom to the developers to build their own solutions to find the best matches with the missing skills.

Finally, with the aim to recommend lectures or articles, our approach can also be easily fed with modern state-of-the-art resources such as the ``Academia/Industry DynAmic'' (AIDA) [39] or ``The Computer Science Knowledge Graph'' (CS-KG) [40-43]. AIDA is a collection of metadata about 21M publications and 8M patents categorized within a taxonomy of computer science topics from the Computer Science Ontology (CSO) [44]. CS-KG is an automatically generated knowledge graph that describes the content of 6.7M scientific papers with 10M research entities and 41M relationships among them within the computer science domain. These resources can be exploited to recommend patents and
or knowledge or some previous work activity, he/she can add it to the research papers that might be of interest for the applicants to discover
resume and resubmit it. Otherwise, he/she would need to get more com­ where and how state-of-the-art technologies are used to improve their
petencies and, consequently, raise the score of his/her resume over the skills and be ready for the job application. Considering the example
pre-established threshold. In addition to this, the demonstration also al­ above, if an applicant has to acquire the skill Critical Thinking for the
lows us to identify the skills, technical abilities, and other qual­fications Mechanical Engineering job, the system can be easily extended to look
needed for a job and can recommend job opportunities that align with for research papers that study it within the CS-KG SparQL endpoint23 ,
the user’s characteristics (second and third demo). suggesting to the applicant interesting reads to make stronger his/her
application. For example, CS-KG suggests ``Serious games on environ­
7.2. Acquiring new skills

To acquire new skills, a system that embeds our approach can cover 21
https://www.udemy.com.
the different values present in the entities so that the candidate becomes 22
https://www.coursera.org.
23
eligible for that specific job. https://scholkg.kmi.open.ac.uk/sparql/.
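The eligibility check and the ordering of the missing O*NET elements described in Section 7.1 can be sketched as follows. The 34.62 threshold and the ordering rules come from the text; the data structures and function names are illustrative assumptions, not the system's actual implementation.

```python
# Sketch of the eligibility check and of the ordering of missing O*NET
# elements returned to the user; names and structures are assumptions.

ELIGIBILITY_THRESHOLD = 34.62  # average score of correctly classified resumes


def is_eligible(resume_score):
    """A resume is eligible if its score reaches the experimental threshold."""
    return resume_score >= ELIGIBILITY_THRESHOLD


def order_missing_elements(scored, technical, tools):
    """Order the missing elements as the demo does:
    - scored entities (Abilities, Knowledge, Skills, Work Activities,
      Task Ratings): elements in decreasing order of O*NET score;
    - Technical Skills: alphabetical, hot-technology entries first;
    - Tools Used: alphabetical.
    `scored` maps entity name -> list of (element_name, onet_score) pairs;
    `technical` is a list of (name, is_hot) pairs; `tools` is a list of names.
    """
    ordered = {
        entity: sorted(elements, key=lambda e: e[1], reverse=True)
        for entity, elements in scored.items()
    }
    # (not is_hot, name) sorts hot entries (False) before the others (True),
    # alphabetically within each group.
    ordered["Technical Skills"] = [
        name for name, is_hot in sorted(technical, key=lambda t: (not t[1], t[0]))
    ]
    ordered["Tools Used"] = sorted(tools)
    return ordered
```

For instance, the augmented resume of the example above would pass (`is_eligible(39.85)` is true) while the original one would not.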

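The course-matching idea of Section 7.2 can be sketched with a simple overlap score between the missing skills of a resume and the skills a course claims to teach. The field names (`title`, `expected_skills`) are assumptions about a COCO-like record layout, and the scoring here is a deliberately minimal stand-in for a transformer-based matcher.

```python
# Sketch of matching missing skills against COCO-style course metadata.
# Field names and the overlap scoring are illustrative assumptions.

def recommend_courses(missing_skills, courses, top_k=3):
    """Rank courses by how many of the missing skills they claim to teach."""
    missing = {s.lower() for s in missing_skills}
    ranked = []
    for course in courses:
        taught = {s.lower() for s in course.get("expected_skills", [])}
        covered = missing & taught
        if covered:
            # (coverage, title, skills) tuples sort by coverage first
            ranked.append((len(covered), course["title"], sorted(covered)))
    ranked.sort(reverse=True)
    return ranked[:top_k]
```

In a full system, the set intersection would be replaced by the semantic similarity machinery used elsewhere in the paper, so that near-synonymous skill labels also match.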
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cskg: <http://scholkg.kmi.open.ac.uk/cskg/resource/>
PREFIX cskg-ont: <http://scholkg.kmi.open.ac.uk/cskg/ontology#>
PREFIX provo: <http://www.w3.org/ns/prov#>
PREFIX cso: <http://cso.kmi.open.ac.uk/schema/cso#>
PREFIX dcterm: <http://purl.org/dc/terms/>

SELECT ?sub ?pre ?obj ?paperTitle ?doi
FROM <http://scholkg.kmi.open.ac.uk/cskg>
WHERE {
  ?t rdf:subject cskg:critical_thinking .
  ?t rdf:subject ?sub .
  ?t rdf:predicate ?pre .
  ?t rdf:object ?obj .
  ?t provo:wasDerivedFrom ?paperID .
  ?paperID dcterm:title ?paperTitle ;
           cskg-ont:hasDOI ?doi .
  ?t cskg-ont:hasSupport ?sup
}
ORDER BY DESC(?sup)

Fig. 10. SparQL query to find papers related to the Critical Thinking skill.
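The query of Fig. 10 can be generated for any skill and submitted programmatically. The sketch below uses only Python's standard library and assumes the endpoint accepts GET requests with a `query` parameter and JSON results (as typical SPARQL endpoints do); the `skill_to_resource` normalization is our assumption about how CS-KG names its resources.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://scholkg.kmi.open.ac.uk/sparql/"

QUERY_TEMPLATE = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cskg: <http://scholkg.kmi.open.ac.uk/cskg/resource/>
PREFIX cskg-ont: <http://scholkg.kmi.open.ac.uk/cskg/ontology#>
PREFIX provo: <http://www.w3.org/ns/prov#>
PREFIX dcterm: <http://purl.org/dc/terms/>

SELECT ?sub ?pre ?obj ?paperTitle ?doi
FROM <http://scholkg.kmi.open.ac.uk/cskg>
WHERE {{
  ?t rdf:subject cskg:{resource} .
  ?t rdf:subject ?sub .
  ?t rdf:predicate ?pre .
  ?t rdf:object ?obj .
  ?t provo:wasDerivedFrom ?paperID .
  ?paperID dcterm:title ?paperTitle ;
           cskg-ont:hasDOI ?doi .
  ?t cskg-ont:hasSupport ?sup
}}
ORDER BY DESC(?sup)
"""


def skill_to_resource(skill):
    """Map a label such as 'Critical Thinking' to a CS-KG resource name
    (lower case, underscores) -- an assumed naming convention."""
    return skill.strip().lower().replace(" ", "_")


def build_query(skill):
    """Instantiate the Fig. 10 query for an arbitrary skill."""
    return QUERY_TEMPLATE.format(resource=skill_to_resource(skill))


def fetch_papers(skill):
    """Submit the query and return (title, doi) pairs; requires network access."""
    params = urllib.parse.urlencode(
        {"query": build_query(skill), "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{ENDPOINT}?{params}") as resp:
        results = json.load(resp)
    return [
        (b["paperTitle"]["value"], b["doi"]["value"])
        for b in results["results"]["bindings"]
    ]
```

Calling `fetch_papers("Critical Thinking")` would then return the ranked paper titles and DOIs, provided the endpoint is reachable.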

mental management”24 and “Teaching Flooding Attack to the SDN Data Plane with POGIL”25 as relevant papers to master the Critical Thinking skill. Fig. 10 shows the SparQL query to be executed on the CS-KG SparQL endpoint to retrieve the papers related to the Critical Thinking skill.

8. Conclusions and future directions

In this paper, we have proposed an approach for job recommendation based on the information extracted from applicants’ resumes and job postings. The information is then matched against different O*NET entities using the semantic similarity of deep learning transformers in order to identify the O*NET job most suitable to the underlying resume or job posting. Two scenarios have been considered: one useful for companies with the goal of quickly screening several resumes, and the other useful for applicants with the goal of quickly identifying the job posting most suitable to their skills.

For the first scenario, an extensive evaluation on the Resume Dataset version 1 has been carried out considering 105 resumes from 21 different categories. We have defined a scoring mechanism with five different values from 1 to 5. Moreover, we came up with two baselines (one returning the output over the 21 categories and the other returning the output over the 1,016 possible jobs) to compare our approach against, and manually annotated the results of the three methods with the scoring mechanism we defined. The scoring of the three methods has been applied to two outputs (the job with the highest score and the first five jobs with the highest scores). Our approach outperformed the others in three cases out of four.

For the second scenario, we considered 1,000 job postings from the CareerBuilder job dataset and 100 resumes from the Resume Dataset used within the first scenario. With scoring metrics similar to those used within the first scenario, we showed how our method obtained much better results than two more baselines that we defined.

Finally, as a use case of the proposed approach, we discussed a recommendation task for job seekers where, given as input a resume and a list of occupations, the system returns, for each occupation, a list of courses that need to be attended to acquire the missing skills and become eligible for that specific job.

As future directions, our aim is to refine our prototype and create a real system that performs all the tasks we have illustrated in this work, including the use case discussed at the end of the manuscript. We will release the APIs so that everyone will be able to play with them and integrate them into existing systems. The APIs will also be employed to collect users’ feedback and create a gold standard to evaluate the accuracy of the system at scale. We believe this system might either be used standalone or be incorporated into well-known platforms for job seekers. One feature we would like to add is a generator of resumes starting from a list of skills and abilities: this feature would leverage new text generation models with the goal of preparing a complete resume in natural language. Furthermore, we would like to study the injection of O*NET database knowledge and transformer models into existing recommendation methodologies (e.g., collaborative filtering) with the goal of creating a full-fledged platform for job recommendation that benefits from domain knowledge using O*NET, transformers, and past candidate experiences.

One of the areas where the authors are continuing their research focuses on providing recommendations for courses and training materials. These recommendations aim to help job seekers acquire the skills or competencies they lack. Additionally, the focus will shift towards applying the lessons learned in the job-seeking domain to worker reskilling initiatives within organizations, especially in the context of Industry 5.0. In this context, LLMs and retrieval-augmented generation (RAG) systems will be explored for the development of study guides based on the organization’s training documentation. Another case study will involve adapting the system for document classification. This adaptation will aim to identify the specific skills and abilities that a job seeker or student could improve through the study of the corresponding document. Finally, we plan to explore fine-tuning strategies and further improvements to the system’s performance.

CRediT authorship contribution statement

Rubén Alonso: Conceptualization, Funding acquisition, Visualization, Writing – review & editing. Danilo Dessí: Formal analysis, Investigation, Validation, Writing – review & editing. Antonello Meloni: Data curation, Investigation, Software, Writing – original draft. Diego Reforgiato Recupero: Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing – review & editing.

Declaration of competing interest

No conflict of interest exists.

Acknowledgements

This research was partially funded by the European Union project STAR - Novel AI technology for dynamic and unpredictable manufacturing environments (grant number 956573).

24 https://doi.org/10.1016/j.scs.2016.11.007.
25 https://doi.org/10.1145/3368308.3415406.

Data availability

Data will be made available on request.

References

[1] S.D. Risavy, C. Robie, P.A. Fisher, S. Rasheed, Resumes vs. application forms: why the stubborn reliance on resumes?, Frontiers in Psychology 13 (2022) 884205.
[2] S. Rojas-Galeano, J. Posada, E. Ordoñez, A bibliometric perspective on AI research for job-résumé matching, The Scientific World Journal 2022 (1) (2022) 8002363.
[3] U. Goyal, A. Negi, A. Adhikari, S.C. Gupta, T. Choudhury, Resume data extraction using NLP, in: J. Singh, S. Kumar, U. Choudhury (Eds.), Innovations in Cyber Physical Systems, Springer, Singapore, 2021, pp. 465–474.
[4] T.M. Harsha, G.S. Moukthika, D.S. Sai, M.N.R. Pravallika, S. Anamalamudi, M. Enduri, Automated resume screener using natural language processing (NLP), in: 2022 6th International Conference on Trends in Electronics and Informatics (ICOEI), 2022, pp. 1772–1777.
[5] A. Deshmukh, A. Raut, Applying BERT-based NLP for automated resume screening and candidate ranking, Annals of Data Science (2024) 1–13.
[6] S. Westman, J. Kauttonen, A. Klemetti, N. Korhonen, M. Manninen, A. Mononen, S. Niittymäki, H. Paananen, Artificial intelligence for career guidance – current requirements and prospects for the future, IAFOR Journal of Education 9 (4) (2021) 43–62.
[7] C. Yu, C. Zhang, J. Wang, Extracting body text from academic PDF documents for text mining, in: KDIR, 2020, pp. 229–236.
[8] C. Stahl, S. Young, D. Herrmannova, R. Patton, J. Wells, DeepPDF: a deep learning approach to extracting text from PDFs, in: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA), Paris, France, 2018.
[9] J. Tiedemann, Improved text extraction from PDF documents for large-scale natural language processing, in: A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2014, pp. 102–112.
[10] Z. Cheng, P. Zhang, C. Li, Q. Liang, Y. Xu, P. Li, S. Pu, Y. Niu, F. Wu, TRIE++: towards end-to-end information extraction from visually rich documents, arXiv preprint, arXiv:2207.06744, 2022.
[11] M.J. Handel, The O*NET content model: strengths and limitations, Journal for Labour Market Research 49 (2) (2016) 157–176, https://doi.org/10.1007/s12651-016-0199-8.
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 6000–6010.
[13] C. Helwe, C. Clavel, F.M. Suchanek, Reasoning with transformer-based models: deep learning, but shallow reasoning, in: 3rd Conference on Automated Knowledge Base Construction, 2021, https://openreview.net/forum?id=Ozp1WrgtF5_.
[14] A. Meroño-Peñuela, D. Spagnuelo, Can a transformer assist in scientific writing? Generating semantic web paper snippets with GPT-2, in: A. Harth, V. Presutti, R. Troncy, M. Acosta, A. Polleres, J.D. Fernández, J. Xavier Parreira, O. Hartig, K. Hose, M. Cochez (Eds.), The Semantic Web: ESWC 2020 Satellite Events, Springer International Publishing, Cham, 2020, pp. 158–163.
[15] Z. Zheng, Z. Qiu, X. Hu, L. Wu, H. Zhu, H. Xiong, Generative job recommendations with large language model, arXiv preprint, arXiv:2307.02157, 2023.
[16] Z. Guan, J.-Q. Yang, Y. Yang, H. Zhu, W. Li, H. Xiong, JobFormer: skill-aware job recommendation with semantic-enhanced transformer, ACM Transactions on Knowledge Discovery from Data (2024).
[17] Y. Li, M. Yamashita, H. Chen, D. Lee, Y. Zhang, Fairness in job recommendation under quantity constraints, in: AAAI-23 Workshop on AI for Web Advertising, 2023.
[18] J. Dhameliya, N. Desai, Job recommender systems: a survey, in: 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), vol. 1, 2019, pp. 1–5.
[19] R. Mishra, S. Rathi, Efficient and scalable job recommender system using collaborative filtering, in: A. Kumar, M. Paprzycki, V.K. Gunjan (Eds.), ICDSMLA 2019, Springer, Singapore, 2020, pp. 842–856.
[20] Y. Lu, S. El Helou, D. Gillet, A recommender system for job seeking and recruiting website, in: Proceedings of the 22nd International Conference on World Wide Web, WWW ’13 Companion, Association for Computing Machinery, New York, NY, USA, 2013, pp. 963–966.
[21] W. Shalaby, B. AlAila, M. Korayem, L. Pournajaf, K. Aljadda, S. Quinn, W. Zadrozny, Help me find a job: a graph-based approach for job recommendation at scale, in: Proceedings of the International Conference on Big Data (Big Data), 2017, pp. 1544–1553.
[22] S. Choudhary, S. Koul, S. Mishra, A. Thakur, R. Jain, Collaborative job prediction based on Naïve Bayes classifier using Python platform, in: 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), 2016, pp. 302–306.
[23] G. Domeniconi, G. Moro, A. Pagliarani, K. Pasini, R. Pasolini, Job recommendation from semantic similarity of LinkedIn users’ skills, in: Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2016, SCITEPRESS - Science and Technology Publications, Lda, Setubal, PRT, 2016, pp. 270–277.
[24] F. Gutiérrez, S. Charleer, R. De Croon, N.N. Htun, G. Goetschalckx, K. Verbert, Explaining and exploring job recommendations: a user-driven approach for interacting with knowledge-based job recommender systems, in: Proceedings of the 13th ACM Conference on Recommender Systems, RecSys ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 60–68.
[25] B. Heap, A. Krzywicki, W. Wobcke, M. Bain, P. Compton, Combining career progression and profile matching in a job recommender system, in: D.-N. Pham, S.-B. Park (Eds.), PRICAI 2014: Trends in Artificial Intelligence, Springer International Publishing, Cham, 2014, pp. 396–408.
[26] X. Zhao, Cold-start collaborative filtering, SIGIR Forum 50 (1) (2016) 99–100, https://doi.org/10.1145/2964797.2964819.
[27] N. Reimers, I. Gurevych, Sentence-BERT: sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019, https://arxiv.org/abs/1908.10084.
[28] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, arXiv preprint, arXiv:1810.04805, 2018.
[29] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: a robustly optimized BERT pretraining approach, arXiv preprint, arXiv:1907.11692, 2019.
[30] K. Song, X. Tan, T. Qin, J. Lu, T.-Y. Liu, MPNet: masked and permuted pre-training for language understanding, Advances in Neural Information Processing Systems 33 (2020) 16857–16867.
[31] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: a lite BERT for self-supervised learning of language representations, arXiv preprint, arXiv:1909.11942, 2019.
[32] S.R. Bowman, G. Angeli, C. Potts, C.D. Manning, A large annotated corpus for learning natural language inference, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, 2015, pp. 632–642, https://aclanthology.org/D15-1075.
[33] A. Williams, N. Nangia, S. Bowman, A broad-coverage challenge corpus for sentence understanding through inference, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 1112–1122, https://aclanthology.org/N18-1101.
[34] C. Spearman, The proof and measurement of association between two things, The American Journal of Psychology 100 (3/4) (1987) 441–471, http://www.jstor.org/stable/1422689.
[35] A. Konjengbam, N. Kumar, M. Singh, Unsupervised tag recommendation for popular and cold products, Journal of Intelligent Information Systems 54 (2019) 545–566.
[36] J.L. Fleiss, J.C. Nee, J.R. Landis, Large sample variance of kappa in the case of different sets of raters, Psychological Bulletin 86 (5) (1979) 974.
[37] D. Dessì, G. Fenu, M. Marras, D. Reforgiato Recupero, COCO: semantic-enriched collection of online courses at scale with experimental use cases, in: Á. Rocha, H. Adeli, L.P. Reis, S. Costanzo (Eds.), Trends and Advances in Information Systems and Technologies, Springer International Publishing, Cham, 2018, pp. 1386–1396.
[38] D. Dessì, G. Fenu, M. Marras, D. Reforgiato Recupero, Bridging learning analytics and cognitive computing for big data classification in micro-learning video collections, Computers in Human Behavior 92 (2019) 468–477, https://doi.org/10.1016/j.chb.2018.03.004.
[39] S. Angioni, A.A. Salatino, F. Osborne, D.R. Recupero, E. Motta, AIDA: a knowledge graph about research dynamics in academia and industry, Quantitative Science Studies 2 (4) (2021) 1356–1398, https://doi.org/10.1162/qss_a_00162.
[40] D. Dessì, F. Osborne, D.R. Recupero, D. Buscaldi, E. Motta, H. Sack, AI-KG: an automatically generated knowledge graph of artificial intelligence, in: The Semantic Web – ISWC 2020 – 19th International Semantic Web Conference, in: Lecture Notes in Computer Science, vol. 12507, Springer, 2020, pp. 127–143.
[41] D. Dessì, F. Osborne, D.R. Recupero, D. Buscaldi, E. Motta, Generating knowledge graphs by employing natural language processing and machine learning techniques within the scholarly domain, CoRR, arXiv:2011.01103, 2020.
[42] D. Dessì, F. Osborne, D.R. Recupero, D. Buscaldi, E. Motta, SCICERO: a deep learning and NLP approach for generating scientific knowledge graphs in the computer science domain, Knowledge-Based Systems 258 (2022) 109945.
[43] D. Dessì, F. Osborne, D. Reforgiato Recupero, D. Buscaldi, E. Motta, CS-KG: a large-scale knowledge graph of research entities and claims in computer science, in: International Semantic Web Conference, Springer, 2022, pp. 678–696.
[44] A.A. Salatino, T. Thanapalasingam, A. Mannocci, F. Osborne, E. Motta, The computer science ontology: a large-scale taxonomy of research areas, in: International Semantic Web Conference, Springer, 2018, pp. 187–205.
