
2017 2nd IEEE International Conference on Computational Intelligence and Applications

Hybrid Deep Collaborative Filtering for Job Recommendation

Weijian Chen, Xingming Zhang, Haoxiang Wang*, Hongjie Xu


School of Computer Science and Engineering, South China University of Technology
Guangzhou 510006, China
e-mail: *[email protected]

Abstract—In recent years, deep learning has achieved great success in a series of areas, but there is little published work on deep learning for job recommendation. Most researchers focus on the application of traditional algorithms, most of which still rely on collaborative filtering and content-based filtering. In this paper, we study and improve an existing recommender algorithm based on deep learning and apply it to the field of job recommendation, hoping to solve the problems of traditional recommender algorithms. We collect information on candidates and jobs from a human resources business system, perform pre-processing operations on the collected data, such as data cleaning, data transforming and data reduction, and obtain a human resources data warehouse for the job recommender algorithm. In addition, we propose the Hybrid Deep Collaborative Filtering (HDCF) algorithm based on the Collaborative Deep Learning (CDL) algorithm. With the help of the feature extraction ability of deep learning, HDCF overcomes the shortcomings of traditional collaborative filtering algorithms when dealing with sparse data and cold-start items. Experimental results show that HDCF achieves better recommendation performance than traditional recommender algorithms such as Probabilistic Matrix Factorization (PMF) and Content-Based Filtering (CBF).

Keywords-Job recommendation, deep learning, Stacked Denoising Auto-Encoder, sparse data, cold start

I. INTRODUCTION

In recent years, more and more candidates hunt for jobs on the Internet, bringing an unprecedented increase in human resources information, which leads to the problem of information overload in human resources services. Recommender systems quickly became the main tool to solve the information overload problem, with the ability to locate and push content of interest to the user from the overloaded information. Recruitment sites cannot connect candidates and companies without the support of a personalized recommender system.

In 2000, Rafter et al. developed a job recommender system called CASPER, which combined automated collaborative filtering and matching retrieval [1]. Malinowski proposed a two-way human resources recommender system in 2006, in which a candidate-oriented recommender system and a recruiter-oriented recommender system work together to achieve better performance than a single recommender system [2]. In 2011, Hutterer combined the explicit and implicit feedback information contained in the recommendation results into the user model to generate a hybrid user model, using the user's comprehensive feedback to enhance the performance of the job recommender system [3]. Paparrizos et al. defined the question of human resource recommendation as a supervised machine learning problem; they used the employment history and information of employees to build a forecasting model that predicts their future work [4]. Song Qingqing proposed a job recommender system based on ontology, focusing on how to build an ontology-based user model for job recommendation [5]. Meng Fanlian studied a job recommender engine using a hybrid recommender algorithm; he combined PLSA and the content-based filtering algorithm to obtain a two-way job recommendation algorithm that can recommend both jobs and candidates [6]. Liu Shunwen proposed an improved Slope One algorithm combined with the similarity of human resource content information, and used distributed coding to realize the algorithm and run it on a distributed cluster to help improve the performance of the job recommender system [7].

Recommender systems currently mainly use traditional recommender algorithms like collaborative filtering or content-based filtering. The traditional recommender algorithms are simple, but they all have their own shortcomings. In recent years, deep learning has achieved great success in a series of areas such as computer vision, natural language processing and semantic recognition, and some researchers have been using deep learning to improve the performance of recommender algorithms. In 2007, Salakhutdinov et al. [8] for the first time used the Restricted Boltzmann Machine to solve the recommendation problem; other researchers then began to use other deep learning models such as CNN [9], RNN [10] and SDAE [11-13] in recommender systems. Deep learning has become a breakthrough in recommender system technology. Based on the work of Wang et al. [11], this paper puts forward a job recommender algorithm based on deep learning according to the characteristics of human resource recommendation, named the Hybrid Deep Collaborative Filtering (HDCF) algorithm.

II. ALGORITHM DESIGN

The working flow of the job recommender algorithm based on deep learning is shown in Fig. 1. Early preparation of the algorithm includes data collection and data preprocessing.

Figure 1. Flow chart of recommender algorithm based on deep learning.



The data collection phase is responsible for collecting information on candidates and jobs from a human resource business system, while the data preprocessing phase is responsible for performing data cleaning, data converting and data reduction on the acquired data and storing it in a data warehouse. After the data preparation is completed, the user-item rating matrix and the text feature vectors of items are constructed as inputs of the algorithm. The rating matrix is constructed by transforming user actions into ratings according to certain rules, and the text feature vectors are constructed from the descriptions of jobs.

HDCF consists of two sub-algorithms, namely the deep model algorithm and the content-based filtering algorithm. The body of the deep model algorithm is composed of a probabilistic stacked denoising auto-encoder (PSDAE) and a probabilistic matrix factorization (PMF) model. PSDAE uses its feature extraction ability to extract low-dimensional hidden feature vectors from the high-dimensional text feature vectors of items, and feeds these hidden feature vectors into the PMF model. The PMF model uses both the low-dimensional feature vectors of items and the original rating matrix to learn the semantic vectors of users and items. Subsequently, the predicted rating matrix of not-cold-start (NCS) items is calculated using (1):

$\hat{R}_{i,j} = u_i^T v_j = \sum_{k} U_{i,k} V_{j,k}$    (1)

where $\hat{R}_{i,j}$ denotes the predicted rating of user i for job j, and $u_i$ and $v_j$ denote the semantic vectors of user i and item j respectively.
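As a concrete illustration of (1), the sketch below assumes the user and job semantic vectors have already been learned and stacked into matrices; all names and sizes are illustrative, not taken from the paper:

```python
import numpy as np

# Minimal sketch of (1): predicted ratings are dot products of semantic vectors.
K = 50                           # assumed dimensionality of the semantic vectors
U = np.random.rand(4692, K)      # one semantic vector u_i per candidate (row)
V = np.random.rand(8791, K)      # one semantic vector v_j per NCS job (row)

R_hat_ncs = U @ V.T              # entry (i, j) equals u_i . v_j

i, j = 10, 42
assert np.isclose(R_hat_ncs[i, j], U[i] @ V[j])   # same value, by construction
```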
After obtaining the predicted rating matrix of the NCS items (R̂_NCS), the rating matrix of the cold-start items (R̂_CS) can be figured out using a content-based filtering method. Combining these two matrices, for a target user i we obtain a predicted rating vector and generate a top-N recommendation list.

III. DATA PREPROCESSING OF HUMAN RESOURCE DATA

The data used in this paper is collected from a human resources employment platform and can be divided into two parts. One part is the basic information of candidates and jobs stored in the database of the employment platform, which consists of structured data records. Candidates' data includes fields such as gender, age, job intention and salary requirements, as well as long text fields like educational background and work experience. Jobs' data includes fields such as category, number of visits, salary and welfare, as well as long text fields like the job description. The other part of the data consists of users' behavior records collected from the client by the logging system, most of which are log files. The log files record four kinds of candidate behavior, i.e. browsing, collecting, applying and not interested, which reflect the candidate's preference for a job.

Data preprocessing involves repetitive data remediation, including data cleaning, data transforming and data reduction, and the processing rules need to be continually improved during this process. The following lists some problems of the original human resources data and the corresponding processing rules (rule (3) is illustrated by the sketch after this list):

(1) Some fields of the job data, such as wages and the number of visits, are partially missing. The corresponding processing rule is to set these fields to default values, e.g., the wage is set to the average and the number of visits is set to 0.

(2) Companies may publish multiple recruitment records for a single job, which creates duplicate records in the database and affects the accuracy of the recommendation results. The corresponding processing rule is to merge duplicate job records, while merging counting fields such as the number of visits.

(3) A candidate may have multiple behavior records for a job at different times. As the candidates' behavior reflects their degree of preference, these different records may lead to data inconsistencies. The corresponding processing rule is to sort the behavior records according to the priority order of the behaviors, i.e., applying, collecting, not interested and browsing, and then keep only the highest-priority behavior record.

After the human resource data is cleaned, the basic attribute fields of candidates and jobs can be digitized according to certain conversion and mapping rules. The mapping operation creates dictionary tables for the non-numerical fields in the candidate table and the job table, and maps string-type fields to numerical fields. After the above data pre-processing work is completed, we get a human resources data warehouse.
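As an illustration of rule (3), a minimal pandas sketch (column names are hypothetical) that keeps only each candidate's highest-priority action on a job might look like this:

```python
import pandas as pd

# Toy behavior log; column names are assumptions, not the paper's schema.
log = pd.DataFrame({
    "candidate_id": [1, 1, 1, 2, 2],
    "job_id":       [7, 7, 9, 7, 7],
    "behavior":     ["browsing", "applying", "collecting", "browsing", "not interested"],
})

# Priority order from rule (3): applying > collecting > not interested > browsing.
priority = {"applying": 3, "collecting": 2, "not interested": 1, "browsing": 0}
log["prio"] = log["behavior"].map(priority)

# Keep only the highest-priority record per (candidate, job) pair.
dedup = (log.sort_values("prio", ascending=False)
            .drop_duplicates(["candidate_id", "job_id"])
            .drop(columns="prio"))
print(dedup)
```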

IV. HYBRID DEEP COLLABORATIVE FILTERING

After constructing the human resources data warehouse, this paper realizes a job recommender algorithm based on CDL [11]. CDL overcomes the problem of data sparseness to some extent by integrating a PSDAE model. But in practice, we find that a single CDL model has some limitations. The specific problems are listed as follows:

(1) Text sparseness of job data. CDL utilizes the textual information of jobs to assist in predicting the missing ratings of the original rating matrix, which overcomes the sparseness of the rating data. But in the real data set, some text fields of jobs are too short to provide enough descriptive information about the jobs. Due to the lack of sufficient corpus information, the word-bag vectors constructed from these texts cannot yield implicit features that represent the information of the jobs. Thus, the CDL model cannot guarantee accuracy when using the implicit features to help predict the missing ratings.

(2) Cold start problem of items. The task of a recommender system is to predict the unknown values in the rating matrix, which can be divided into two types of prediction, as shown in Fig. 2. The left plot a) shows in-matrix prediction, which means all the jobs have at least one rating, and traditional collaborative filtering algorithms can solve this kind of problem quite well. The right plot b) shows out-of-matrix prediction: the fourth and fifth jobs in plot b) do not have any rating, and we call these items cold start items. The traditional collaborative filtering algorithms are not good at predicting the ratings of such cold start items, because their principle is to use the existing ratings of jobs to predict the unknown ones [14]. CDL is based on the PMF algorithm (a collaborative filtering model) and therefore has the same problem.

(3) Real-time problem of CDL. In the CDL model, the locally optimal implicit semantic matrices U and V are obtained by alternately training the PSDAE and PMF modules. When the system gets a new post, the change can only be reflected in the jobs' implicit semantic matrix after re-training the whole model. In a real business scenario, new jobs can therefore only appear in users' recommended lists after the offline update is finished [15], which cannot meet the real-time requirements of recruitment information.

In order to overcome the shortcomings of CDL when solving the problem of job recommendation, this paper proposes HDCF, which is described as follows.

A. Construct Hybrid Features

In the recommendation system, companies must provide the category of the job, the wage, welfare and other basic information when posting a recruitment requirement, which affects candidates' degree of preference for the job to some extent. Compared with the jobs' text fields, which are offered optionally, the basic information of the jobs has a higher degree of integrity. If this basic information and the text fields can be used together in the recommender algorithm, the basic information will help express the content characteristics of jobs, which alleviates the degradation of the algorithm caused by the problem of text sparseness to some extent. Therefore, this paper designs hybrid feature vectors of jobs, replacing the original word-bag vectors of the jobs as the input of PSDAE. As shown in Fig. 3, the hybrid feature vectors consist of the word-bag vectors and the structured attribute vectors of jobs.

Figure 3. Construction of hybrid feature.

We define the word-bag vector set as X_w, the structured attribute vector set as X_s and the hybrid feature matrix as X_c; then the operation in Fig. 3 can be expressed as:

$X_c = \left[\, X_w \,;\, X_s \,\right]$    (2)

which means each hybrid feature vector is the concatenation of a job's word-bag vector and its structured attribute vector. HDCF uses the hybrid feature matrix X_c as the input of PSDAE in order to extract the hidden features of jobs.
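A minimal sketch of this construction follows; the dimensions, orientation (one row per job) and array names are assumptions for illustration only:

```python
import numpy as np

n_jobs = 100    # toy number of jobs
vocab = 500     # word-bag dimensionality (assumed)
n_attr = 12     # number of digitized structured attributes (assumed)

X_w = np.random.rand(n_jobs, vocab)     # word-bag vectors of jobs
X_s = np.random.rand(n_jobs, n_attr)    # structured attribute vectors of jobs

# Equation (2): each hybrid feature vector concatenates the two parts.
X_c = np.hstack([X_w, X_s])
assert X_c.shape == (n_jobs, vocab + n_attr)
```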
Then the changed graphical model of CDL is shown in Fig. 4, where $\lambda_w$, $\lambda_n$, $\lambda_u$, $\lambda_v$ and $\gamma$ denote hyperparameters of the model, and $\gamma$ adjusts the weight of the structured attributes in the hybrid features. Gray circles represent the observed data: the original input, the corrupted input and the rating matrix R. Blue circles represent the variables of the model: the collection of all layers' weight matrices and biases, the output X_{L/2} of the middle layer (i.e. the output of the encoder) of an L-layer network, the semantic vector set U of the I candidates, and the semantic vector set V of the J jobs.

Figure 4. The graphical model of the modified CDL [11].

As X_c is composed of the word-bag vector set (i.e. X_w) and the structured attribute vector set (i.e. X_s), HDCF introduces a weight parameter $\gamma$ to adjust the weight of the structured attributes in the hybrid features. Then the corresponding objective function of the model can be expressed as:

$\mathcal{L} = -\frac{\lambda_w}{2}\sum_{l}\left(\|W_l\|_F^2 + \|b_l\|_2^2\right) - \frac{\lambda_u}{2}\sum_{i}\|u_i\|_2^2 - \frac{\lambda_v}{2}\sum_{j}\left\|v_j - X_{L/2,j*}^T\right\|_2^2 - \frac{\lambda_n}{2}\left(\sum_{j}\left\|Y_{L,j*} - Y_{0,j*}\right\|_2^2 + \gamma\sum_{j}\left\|Z_{L,j*} - Z_{0,j*}\right\|_2^2\right) - \sum_{i,j}\frac{C_{i,j}}{2}\left(R_{i,j} - u_i^T v_j\right)^2$    (3)

where Y and Z denote the reconstructed versions of the word-bag vector set and the structured attribute vector set respectively.
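Under this reading of (3), and assuming (as in CDL [11]) a confidence weight C on observed versus missing ratings, the objective could be evaluated roughly as in the following sketch; every array name is an assumption, and the function returns the cost whose negative is L:

```python
import numpy as np

def hdcf_objective(R, C, U, V, X_half, Y0, YL, Z0, ZL, Ws, bs,
                   lam_w, lam_u, lam_v, lam_n, gamma):
    """Rough sketch of the negated objective (3); smaller is better when minimizing."""
    net_reg = 0.5 * lam_w * (sum(np.sum(W * W) for W in Ws) +
                             sum(np.sum(b * b) for b in bs))
    user_reg = 0.5 * lam_u * np.sum(U * U)
    # Tie each item vector v_j to the encoder output of its hybrid features.
    item_tie = 0.5 * lam_v * np.sum((V - X_half) ** 2)
    # Reconstruction error, with gamma weighting the structured-attribute part.
    recon = 0.5 * lam_n * (np.sum((YL - Y0) ** 2) + gamma * np.sum((ZL - Z0) ** 2))
    # Confidence-weighted squared rating error.
    rating = 0.5 * np.sum(C * (R - U @ V.T) ** 2)
    return net_reg + user_reg + item_tie + recon + rating
```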
Figure 2. Two kinds of prediction in recommender systems.

B. Deep Model Training on NCS Items

First, we extract the rating matrix and the hybrid feature matrix of the NCS items from the whole dataset. As shown in Fig. 5, we detect the zero vectors in the original rating matrix and record the id of each cold start item, then extract the ratings of the NCS items (i.e. R_NCS) with these ids. Correspondingly, we find and drop the vectors of the cold-start items from the hybrid feature matrix according to the ids, and obtain the hybrid feature matrix of the NCS items. HDCF uses the clipped data to train a CDL model and obtains the predicted rating matrix of the NCS items (i.e. R̂_NCS), as shown in Fig. 5.

Figure 5. Predicting the rating matrix R̂_NCS with the dataset of NCS items.
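A sketch of this splitting step, assuming items are columns of the rating matrix and rows of the hybrid feature matrix (array names and toy values are illustrative):

```python
import numpy as np

# Toy data: 4 users, 5 jobs; all-zero columns are cold-start jobs.
R = np.array([[5, 0, 3, 0, 0],
              [0, 0, 4, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 2, 0, 0]])
X_c = np.random.rand(5, 8)                  # hybrid feature vectors, one row per job

has_rating = (R != 0).any(axis=0)           # jobs with at least one rating
cs_ids = np.where(~has_rating)[0]           # cold-start job ids
ncs_ids = np.where(has_rating)[0]           # NCS job ids

R_ncs = R[:, ncs_ids]                       # ratings used to train the CDL part
X_ncs = X_c[ncs_ids]                        # matching hybrid features, cold-start rows dropped
print(cs_ids, R_ncs.shape, X_ncs.shape)     # [1 3 4] (4, 2) (2, 8)
```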
C. CBF on CS Items

After obtaining the predicted ratings of the NCS items (R̂_NCS), we can calculate the ratings of the remaining cold start items using a content-based filtering algorithm. Firstly, based on the structured attributes of jobs, we calculate the similarity (cosine similarity) between the cold start items and the NCS items, and obtain the similarity matrix $\mathrm{Sim} \in \mathbb{R}^{N_{CS} \times N_{NCS}}$ (where $N_{CS}$ and $N_{NCS}$ denote the number of cold-start items and NCS items respectively). Then, for a cold start item j, we select the M items most similar to item j as a nearest neighbor set S(j) according to the similarity matrix. Finally, the predicted ratings of item j are calculated using (4):

$\hat{R}_{u,j} = \dfrac{\sum_{i \in S(j)} \mathrm{sim}(i,j)\,\hat{R}_{u,i}}{\sum_{i \in S(j)} \left|\mathrm{sim}(i,j)\right|}$    (4)

where sim(i, j) denotes the similarity of NCS item i and cold start item j, and $\hat{R}_{u,i}$ denotes the predicted rating of NCS item i for user u.
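A small sketch of (4) follows; the cosine similarities are computed on the structured attributes, and names such as S_cs and S_ncs are assumptions:

```python
import numpy as np

def predict_cold_start(R_hat_ncs, S_cs, S_ncs, M=10):
    """R_hat_ncs: (n_users, n_ncs) predicted ratings of NCS jobs.
    S_cs / S_ncs: structured attribute vectors of cold-start / NCS jobs (rows)."""
    # Cosine similarity between every cold-start job and every NCS job.
    a = S_cs / np.linalg.norm(S_cs, axis=1, keepdims=True)
    b = S_ncs / np.linalg.norm(S_ncs, axis=1, keepdims=True)
    sim = a @ b.T                                  # shape (n_cs, n_ncs)

    R_hat_cs = np.zeros((R_hat_ncs.shape[0], sim.shape[0]))
    for j in range(sim.shape[0]):
        nn = np.argsort(-sim[j])[:M]               # M most similar NCS jobs: S(j)
        w = sim[j, nn]
        # Equation (4): similarity-weighted average of predicted NCS ratings.
        R_hat_cs[:, j] = (R_hat_ncs[:, nn] @ w) / np.abs(w).sum()
    return R_hat_cs
```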
Once the predicted rating matrix of all items (i.e. R̂) is figured out, the recommender system can perform top-N recommendation based on the matrix R̂. Note that when a new job (a cold-start item) joins the recommender system, we can also use CBF to predict its ratings: the recommender system only needs to update the similarity matrix (Sim) of jobs, calculate the new job's similarities with all NCS items, and apply (4), so that the new job can be added to the users' recommended lists without re-training the deep model. Therefore, HDCF can solve the real-time problem of the job recommender system in practice.
V. EXPERIMENT RESULTS AND DISCUSSION

In order to evaluate the recommendation performance of the job recommender algorithm (HDCF), experiments are conducted on a real-world dataset, collected from a human resources business system. The dataset contains 4692 candidates, 15000 jobs and 170844 user behavior records, which means 0.24% of the user-item rating matrix entries contain ratings. In this paper, the behavior records of the candidates are ordered by time and divided into two parts: the training set accounts for 90% of the data, and the test set accounts for 10%. HDCF divides the human resource data into a cold-start part and an NCS part, then performs different algorithmic processing on them. Table I shows the simple statistics of these two parts of data.
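A sketch of this time-ordered 90/10 split (the DataFrame and its column names are toy assumptions):

```python
import pandas as pd

# Toy behavior records; in the paper these come from the HR log system.
records = pd.DataFrame({
    "candidate_id": [1, 2, 1, 3, 2, 1, 3, 2, 1, 3],
    "job_id":       [5, 5, 7, 9, 7, 9, 5, 9, 8, 7],
    "rating":       [3, 1, 4, 2, 3, 4, 1, 2, 3, 4],
    "ts": pd.date_range("2017-01-01", periods=10, freq="D"),
})

records = records.sort_values("ts")          # order records by time, oldest first
cut = int(len(records) * 0.9)                # earliest 90% -> training set
train, test = records.iloc[:cut], records.iloc[cut:]
print(len(train), len(test))                 # 9 1
```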

TABLE I. STATISTICS OF THE TWO PARTS OF THE HUMAN RESOURCE DATA

Item category    Number of items    Number of records in test set
CS               6209               1791
NCS              8791               14790
All              15000              16581

As in [11, 14, 16], this paper uses recall as the performance measure. For each user i, we sort the predicted ratings R̂_i and recommend the top N items to the user. Then the recall for user i can be defined as (5):

$\mathrm{recall@}N = \dfrac{|R_i \cap \mathrm{TRUE}_i|}{|\mathrm{TRUE}_i|}$    (5)

where $R_i$ denotes the recommended list, N denotes its size, and $\mathrm{TRUE}_i$ denotes the set of items that user i likes in the test set. In the job recommender system, a higher recall with a fixed N represents better performance.
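For reference, a small sketch of how recall@N in (5) can be computed for one user (names follow the equation; the input arrays are illustrative):

```python
import numpy as np

def recall_at_n(r_hat_u, true_items, N=10):
    """r_hat_u: predicted ratings of one user over all items;
    true_items: set of item ids the user liked in the test set."""
    top_n = np.argsort(-r_hat_u)[:N]              # recommended list R_i of size N
    hits = len(set(top_n.tolist()) & set(true_items))
    return hits / len(true_items)                 # |R_i intersect TRUE_i| / |TRUE_i|

print(recall_at_n(np.array([0.1, 0.9, 0.4, 0.8, 0.2]), {1, 2}, N=2))  # 0.5
```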
The CDL algorithm combines the deep learning model (PSDAE) and the traditional collaborative filtering model (PMF). Further, in this paper HDCF combines CDL and CBF, hoping to overcome the shortcomings of CDL on the cold start problem by using the CBF algorithm. Noting that HDCF is essentially a combination of PMF and CBF, this paper chooses these two traditional recommender algorithms for comparison. This paper realizes the PMF and CBF algorithms on the human resource dataset, and the experimental results are shown in Fig. 6.

As shown in the left plot of Fig. 6, the overall recall of CDL and HDCF, which are based on deep learning, is higher than that of the traditional CBF and PMF algorithms. At the same time, the recall of HDCF is higher than that of CDL, which shows that HDCF improves the performance of the traditional recommender algorithms with the feature extraction ability of deep learning models, and that HDCF performs better than CDL by using the hybrid feature vectors.

As shown in the right plot of Fig. 6, PMF and CDL achieve recalls close to 0 when dealing with cold-start items, which means they do not have the ability to solve the cold start problem in recommendation. However, CBF and HDCF achieve a relatively good recall for cold start items, which shows that CBF and HDCF can use the content information of items to discover items that users may like.

Figure 6. Performance comparison of CDL, HDCF, CBF and PMF based on overall recall@N and recall@N of cold-start items.

VI. CONCLUSION

In this paper, we study and improve an existing recommender algorithm based on deep learning and apply it to the field of job recommendation, hoping to solve the problems existing in traditional recommender algorithms. We collect the information of candidates and jobs from a human resources business system, perform pre-processing operations on the collected data, such as data cleaning, data transforming and data reduction, and obtain a human resources data warehouse for the job recommender algorithm. In addition, we propose HDCF based on the CDL algorithm. With the help of the feature extraction ability of deep learning, HDCF overcomes the shortcomings of traditional collaborative filtering algorithms when dealing with sparse data and cold-start items. Experimental results show that HDCF has better recommendation performance than traditional recommender algorithms such as PMF and CBF.

ACKNOWLEDGEMENT

This work is financially supported by the Guangdong Provincial Science and Technology Plan (No. 2016B030308002).

REFERENCES

[1] Rafter R, Bradley K, Smyth B. Personalised Retrieval for Online Recruitment Services[C]. 22nd Annual Colloquium on Information Retrieval (IRSG2000). 2000:382.
[2] Malinowski J, Keim T, Wendt O, et al. Matching People and Jobs: A Bilateral Recommendation Approach[J]. 2006, 6:137c.
[3] Hutterer M. Enhancing a Job Recommender with Implicit User Feedback[D]. Fakultät für Informatik der Technischen Universität Wien, 2011.
[4] Paparrizos I, Cambazoglu B B, Gionis A. Machine learned job recommendation[C]. 2011:325-328.
[5] Song Q. Research on User Modeling of Personalized Recommender System Based on Ontology[D]. Nanjing University of Aeronautics and Astronautics, 2009.
[6] Meng F. Application of Improved Slope One Algorithm in Personalized Employment Recommendation System[D]. Beijing University of Technology, 2014.
[7] Liu S. Application of Improved Slope One Algorithm in Personalized Employment Recommendation System[J]. Computer Knowledge and Technology: Academic Exchange, 2016, 12(4X):84-85.
[8] Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering[C]. International Conference on Machine Learning. ACM, 2007:791-798.
[9] Oord A V D, Dieleman S, Schrauwen B. Deep content-based music recommendation[C]. Neural Information Processing Systems Conference (NIPS 2013). Neural Information Processing Systems Foundation (NIPS), 2013:2643-2651.
[10] Florez O U. Deep Learning of Semantic Word Representations to Implement a Content-Based Recommender for the RecSys Challenge'14[J]. Communications in Computer & Information Science, 2014, 475:199-204.
[11] Wang H, Wang N, Yeung D Y. Collaborative Deep Learning for Recommender Systems[J]. 2014:1235-1244.
[12] Li S, Kawale J, Fu Y. Deep Collaborative Filtering via Marginalized Denoising Auto-encoder[C]. ACM International Conference on Information and Knowledge Management. ACM, 2015:811-820.
[13] Wei J, He J, Chen K, et al. Collaborative filtering and deep learning based recommendation system for cold start items[J]. Expert Systems with Applications, 2016, 69:29-39.
[14] Wang C, Blei D M. Collaborative topic modeling for recommending scientific articles[C]. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August. DBLP, 2011:448-456.
[15] Xiang L. Practice in Recommender System[M]. The People's Posts & Telecommunications Press, 2012.
[16] Purushotham S, Liu Y, Kuo C C J. Collaborative Topic Regression with Social Matrix Factorization for Recommendation Systems[J]. Computer Science, 2012.
