Chen 2017
Abstract-In recent years, deep learning has achieved great success in a series of areas, but there is little published work on deep learning for job recommendation. Most researchers focus on applying traditional algorithms such as collaborative filtering and content-based filtering. In this paper, we study and improve an existing deep-learning-based recommender algorithm and apply it to the field of job recommendation, aiming to solve the problems of traditional recommender algorithms. We collect information about candidates and jobs from a human resources business system, then perform pre-processing operations on the collected data, such as data cleaning, data transformation and data reduction, to obtain a human resources data warehouse for the job recommender algorithm. In addition, we propose the Hybrid Deep Collaborative Filtering (HDCF) algorithm, based on the Collaborative Deep Learning (CDL) algorithm. With the help of the feature extraction ability of deep learning, HDCF overcomes the shortcomings of traditional collaborative filtering algorithms when dealing with sparse data and cold-start items. Experimental results show that HDCF has better recommendation performance than traditional recommender algorithms such as Probabilistic Matrix Factorization (PMF) and Content-Based Filtering (CBF).

Keywords-Job recommendation, deep learning, Stacked Denoising Auto-Encoder, sparse data, cold start

I. INTRODUCTION
In recent years, more and more candidates search for jobs on the Internet, bringing an unprecedented increase in human resources information, which leads to the problem of information overload in human resources services. Recommender systems quickly became the main tool for solving the information overload problem, thanks to their ability to locate content of interest in the overloaded information and push it to the user. Recruitment sites cannot connect candidates and companies without the support of a personalized recommender system.

In 2000, Rafter et al. developed a job recommender system called CASPER, which combined automated collaborative filtering and matching retrieval [1]. In 2006, Malinowski proposed a two-way human resources recommender system in which a candidate-oriented recommender system and a recruiter-oriented recommender system work together to achieve better performance than a single recommender system [2]. In 2011, Hutterer combined the explicit and implicit feedback information contained in the recommendation results into the user model to generate a hybrid user model, using the user's comprehensive feedback to enhance the performance of the job recommender system [3]. Paparrizos et al. defined the problem of human resource recommendation as a supervised machine learning problem. They used the employment history and information of employees to build a model that predicts their future employment [4]. Song Qingqing proposed an ontology-based job recommender system, focusing on how to build an ontology-based user model for job recommendation [5]. Meng Fanlian studied a job recommender engine using a hybrid recommender algorithm, combining PLSA and content-based filtering to obtain a two-way job recommendation algorithm that can recommend both jobs and candidates [6]. Liu Shunwen proposed an improved Slope One algorithm that incorporates the similarity of human resource content information, and implemented it with distributed code running on a distributed cluster to help improve the performance of the job recommender system [7].

Recommender systems currently rely mainly on traditional recommender algorithms such as collaborative filtering or content-based filtering. These traditional algorithms are simple, but each has its own shortcomings. In recent years, deep learning has achieved great success in a series of areas such as computer vision, natural language processing and semantic recognition, and some researchers have been using deep learning to improve the performance of recommender algorithms. In 2007, Salakhutdinov et al. [8] were the first to use restricted Boltzmann machines to solve the recommendation problem; other researchers then began to use other deep learning models such as CNN [9], RNN [10] and SDAE [11-13] in recommender systems. Deep learning has become a breakthrough in recommender system technology. Building on the work of Wang et al. [11], this paper puts forward a deep-learning-based job recommender algorithm tailored to the characteristics of human resource recommendation, named the Hybrid Deep Collaborative Filtering (HDCF) algorithm.

II. ALGORITHM DESIGN
The workflow of the deep-learning-based job recommender algorithm is shown in Fig.1. Early preparation of the algorithm includes data collection and data preprocessing.

The data collection phase is responsible for collecting information about candidates and jobs from a human resource business system, while the data preprocessing phase is responsible for performing data cleaning, data conversion and data reduction on the acquired data and storing it in a data warehouse. After the data preparation is completed, the
[11]. CDL overcomes the problem of data sparseness to some extent by integrating a PSDAE model. In practice, however, we find that a single CDL model has some limitations. The specific problems are as follows:

(1) Text sparseness of job data. CDL uses the textual information of jobs to help predict the missing ratings of the original rating matrix, which mitigates the sparseness of rating data. In real data sets, however, some text fields of jobs are too short to describe the jobs adequately. Because the corpus information is insufficient, the word-bag vectors constructed from these texts cannot yield implicit features that represent the jobs. Thus, the CDL model cannot guarantee accuracy when using the implicit features to help predict the missing ratings.

(2) Cold-start problem of items. The task of a recommender system is to predict unknown values in the rating matrix, which can be divided into two types of prediction, as shown in Fig.2. The left figure a) shows in-matrix prediction, in which every job has at least one rating; traditional collaborative filtering algorithms solve this kind of problem quite well. The right figure b) shows out-of-matrix prediction. The fourth and fifth jobs in figure b) do not have any ratings, and we call such items cold-start items. Traditional collaborative filtering algorithms are not good at predicting the ratings of cold-start items, because their principle is to use the existing ratings of jobs to predict the unknown ones [14]. CDL is based on the PMF algorithm (a collaborative filtering model) and therefore has the same problem.

(3) Real-time problem of CDL. In the CDL model, the locally optimal implicit semantic matrices U and V are obtained by alternately training the PSDAE and PMF modules. When the system receives a new job posting, the jobs' implicit semantic matrix can only be updated by re-training the whole model. In a real business scenario, new jobs can therefore only enter users' recommendation lists after the offline update has finished [15], which cannot meet the real-time requirements of recruitment information.

In order to overcome the shortcomings of CDL when solving the problem of job recommendation, this paper proposes HDCF, which is described as follows.

A. Construct Hybrid Features
In the recommendation system, companies must provide the job category, wage, benefits and other basic information when posting a recruitment requirement, and this information affects candidates' degree of preference for the job to some extent. Compared with the jobs' text fields, which are optional, the basic information of the jobs is more complete. If the basic information and the text fields are used together in the recommender algorithm, the basic information helps express the content characteristics of jobs, which alleviates to some extent the degradation of the algorithm caused by text sparseness. Therefore, this paper designs hybrid feature vectors of jobs, which replace the original word-bag vectors as the input of PSDAE. As shown in Fig.3, the hybrid feature vectors consist of the word-bag vectors and the structured attribute vectors of jobs.

We define the word-bag vector set as the matrix $Y$, the structured attribute vector set as the matrix $Z$, and the hybrid feature matrix as $X$. The operation in Fig.3 can then be expressed as:

$$X = [\,Y,\ Z\,]$$

which means that the feature dimension of $X$ is the sum of the feature dimensions of $Y$ and $Z$. HDCF uses the hybrid feature matrix $X$ as the input of PSDAE in order to extract hidden features of jobs. The modified graphical model of CDL is shown in Fig.4, where, in the notation of CDL [11], $\lambda_u$, $\lambda_v$, $\lambda_w$ and $\lambda_n$ denote hyperparameters of the model, and a weight parameter, written $\alpha$ here, adjusts the weight of structured attributes in hybrid features. Gray circles represent the observed data: $X_c$ denotes the original input, $X_0$ denotes the corrupted input, and $R$ denotes the rating matrix. Blue circles represent the variables of the model: $W^+$ denotes the collection of all layers of weight matrices and biases, $X_{L/2}$ denotes the output of the middle layer (i.e. the output of the encoder) of an L-layer network, $U$ denotes the semantic vector set of I candidates, and $V$ denotes the semantic vector set of J jobs.

As $X$ is composed of the word-bag vector set (i.e. $Y$) and the structured attribute vectors (i.e. $Z$), HDCF introduces the weight parameter $\alpha$ to adjust the weight of the structured attributes in hybrid features. In the notation of CDL [11], the corresponding objective function of the model can be expressed as:

$$\mathcal{L} = -\frac{\lambda_w}{2}\sum_{l}\left(\|W_l\|_F^2 + \|b_l\|_2^2\right) - \frac{\lambda_u}{2}\sum_{i}\|u_i\|_2^2 - \frac{\lambda_v}{2}\sum_{j}\|v_j - X_{L/2,j*}^{T}\|_2^2 - \frac{\lambda_n}{2}\left(\sum_{j}\|Y_{L,j*} - Y_{c,j*}\|_2^2 + \alpha\sum_{j}\|Z_{L,j*} - Z_{c,j*}\|_2^2\right) - \sum_{i,j}\frac{C_{i,j}}{2}\left(R_{i,j} - u_i^{T}v_j\right)^2$$
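The construction of the hybrid features and the weighted reconstruction part of the objective can be illustrated with a minimal sketch. It assumes one row per job, plain concatenation of the two feature blocks, and a single scalar weight on the structured-attribute block; the function names, the parameter `alpha` and the toy values are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def build_hybrid_features(word_bag: Matrix, attributes: Matrix) -> Matrix:
    """Concatenate each job's word-bag vector with its structured attribute vector."""
    if len(word_bag) != len(attributes):
        raise ValueError("both matrices need one row per job")
    return [list(y_row) + list(z_row) for y_row, z_row in zip(word_bag, attributes)]


def weighted_reconstruction_error(clean: Matrix, reconstructed: Matrix,
                                  n_text: int, alpha: float) -> float:
    """Squared reconstruction error with the attribute block scaled by alpha.

    Columns [0, n_text) are treated as the word-bag block, the remaining
    columns as the structured-attribute block. Only the reconstruction term
    is computed here, not the full objective.
    """
    text_err = 0.0
    attr_err = 0.0
    for clean_row, rec_row in zip(clean, reconstructed):
        for col, (c, r) in enumerate(zip(clean_row, rec_row)):
            squared = (r - c) ** 2
            if col < n_text:
                text_err += squared
            else:
                attr_err += squared
    return text_err + alpha * attr_err


# Toy data (invented): 2 jobs, 3 word-bag features, 2 structured attributes.
Y = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
Z = [[0.5, 1.0], [0.2, 0.0]]
X = build_hybrid_features(Y, Z)
```

Setting `alpha` above 1 makes errors on the structured attributes count more than errors on the text block, which is the role the weight parameter plays in the objective.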
B. Deep Model Training on NCS Items
First, we extract the rating matrix and the hybrid feature matrix of non-cold-start (NCS) items from the whole dataset. As shown in Fig.5, we detect the zero vectors in the original rating matrix and record the id of each cold-start item, then use these ids to separate out the ratings of NCS items (i.e. $R_{NCS}$). Correspondingly, we find and drop the vectors of cold-start items from the hybrid feature matrix according to the ids, obtaining the hybrid feature matrix of NCS items. HDCF uses the clipped data to train a CDL model and obtains the predicted rating matrix of NCS items (i.e. $\hat{R}_{NCS}$), as shown in Fig.5.
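The split into cold-start and NCS items described above can be sketched as follows. The sketch assumes that the rating matrix has one row per candidate and one column per job, and that a zero entry means "no rating"; the function name and the matrix layout are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_cold_start(ratings: Matrix, features: Matrix
                     ) -> Tuple[List[int], List[int], Matrix, Matrix]:
    """Split jobs into cold-start (CS) and non-cold-start (NCS) items.

    ratings  : one row per candidate, one column per job; 0 means no rating.
    features : hybrid feature matrix, one row per job.
    """
    n_jobs = len(features)
    # A job is cold-start when its whole rating column is zero.
    cs_ids = [j for j in range(n_jobs) if all(row[j] == 0 for row in ratings)]
    cs_set = set(cs_ids)
    ncs_ids = [j for j in range(n_jobs) if j not in cs_set]
    # Keep only the NCS columns of the ratings and the NCS rows of the features.
    ratings_ncs = [[row[j] for j in ncs_ids] for row in ratings]
    features_ncs = [features[j] for j in ncs_ids]
    return cs_ids, ncs_ids, ratings_ncs, features_ncs
```

The returned id lists make it possible to map the predictions for NCS items back to their positions in the full rating matrix later on.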
Figure 3. Construction of hybrid feature.

C. CBF on CS Items
After obtaining the predicted ratings of NCS items ($\hat{R}_{NCS}$), we can calculate the ratings of the remaining cold-start (CS) items using a content-based filtering algorithm. First, based on the structured attributes of jobs, we calculate the cosine similarity between the cold-start items and the NCS items, obtaining the similarity matrix $Sim \in \mathbb{R}^{J_{CS} \times J_{NCS}}$ (where $J_{CS}$ and $J_{NCS}$ denote the number of cold-start items and NCS items respectively). Then, for a cold-start item j, we select the M items most similar to item j as its nearest neighbor set $N_j$ according to the similarity matrix. Finally, the predicted ratings of item j are calculated using (4).

$$\hat{R}_{i,j} = \frac{\sum_{k \in N_j} Sim(j,k) \ast \hat{R}_{NCS,i,k}}{\sum_{k \in N_j} |Sim(j,k)|} \qquad (4)$$
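The three steps above (cosine similarity on structured attributes, selection of the M nearest NCS neighbors, similarity-weighted average as in (4)) can be sketched for a single cold-start job. The function and parameter names are illustrative assumptions, and returning 0 when all similarities are zero is a choice made for this sketch only.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two attribute vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_cold_start_ratings(cs_attrs: Vector, ncs_attrs: List[Vector],
                               ncs_pred: List[List[float]], m: int) -> List[float]:
    """Predict every user's rating of one cold-start job.

    cs_attrs : structured attributes of the cold-start job.
    ncs_attrs: structured attributes of the NCS jobs, one row per job.
    ncs_pred : predicted ratings of the NCS jobs, one row per user.
    m        : size of the nearest-neighbor set.
    """
    sims = [cosine_similarity(cs_attrs, attrs) for attrs in ncs_attrs]
    # Indices of the m NCS jobs most similar to the cold-start job.
    neighbors = sorted(range(len(sims)), key=lambda k: sims[k], reverse=True)[:m]
    denom = sum(abs(sims[k]) for k in neighbors)
    if denom == 0.0:
        return [0.0 for _ in ncs_pred]
    return [sum(sims[k] * user_row[k] for k in neighbors) / denom
            for user_row in ncs_pred]
```

Because only structured attributes are used for the similarity, this step works for jobs whose text fields are short or empty.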
TABLE I. STATISTICS OF TWO PARTS OF HUMAN RESOURCE DATA

Item Category    Number    Number of records in test set
CS               6209      1791
NCS              8791      14790
All              15000     16581

As in [11, 14, 16], this paper uses recall as the performance measure. For each user i, we sort the predicted ratings and recommend the top N items to the user. The recall for user i can then be defined as (5).

$$\mathrm{recall}@N = \frac{|R_i \cap TRUE_i|}{|TRUE_i|} \qquad (5)$$

where $R_i$ denotes the recommended list, N denotes its size, and $TRUE_i$ denotes the set of items that user i likes in the test set. In the job recommender system, a higher recall at a fixed N represents better performance.

The CDL algorithm combines a deep learning model (PSDAE) and a traditional collaborative filtering model (PMF). HDCF further combines CDL and CBF, aiming to overcome the shortcomings of CDL on the cold-start problem by using the CBF algorithm. Since HDCF is essentially a combination of PMF and CBF, this paper chooses these two traditional recommender algorithms for comparison. This paper implements the PMF and CBF algorithms on the human resource dataset, and the experimental results are shown in Fig.6.

As shown in the left plot of Fig.6, the overall recall of CDL and HDCF, which are based on deep learning, is higher than that of the traditional CBF and PMF algorithms. At the same time, the recall of HDCF is higher than that of CDL. This shows that HDCF improves on the performance of traditional recommender algorithms through the feature extraction ability of deep learning models, and that HDCF performs better than CDL by using the hybrid feature vectors.

As shown in the right plot of Fig.6, PMF and CDL obtain recalls close to 0 when dealing with cold-start items, which means they cannot solve the cold-start problem in recommendation. However, CBF and HDCF achieve a relatively good recall for cold-start items, which shows that CBF and HDCF can use the content information of items to discover items that users may like.

VI. CONCLUSION
In this paper, we study and improve an existing deep-learning-based recommender algorithm and apply it to the field of job recommendation, aiming to solve the problems of traditional recommender algorithms. We collect information about candidates and jobs from a human resources business system, then perform pre-processing operations on the collected data, such as data cleaning, data transformation and data reduction, to obtain a human resources data warehouse for the job recommender algorithm. In addition, we propose HDCF, based on the CDL algorithm. With the help of the feature extraction ability of deep learning, HDCF overcomes the shortcomings of traditional collaborative filtering algorithms when dealing with sparse data and cold-start items. Experimental results show that HDCF has better recommendation performance than traditional recommender algorithms such as PMF and CBF.
Figure 6. Performance comparison of CDL, HDCF, CBF and PMF based on overall recall@N and recall@N of cold-start items.
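The recall@N measure reported in Fig.6 can be sketched for a single user as follows. The dictionary-based representation of predicted ratings and the job identifiers are illustrative assumptions; the sketch only mirrors the definition in (5).

```python
from typing import Dict, Set


def recall_at_n(predicted: Dict[str, float], liked: Set[str], n: int) -> float:
    """recall@N for one user: share of liked test items found in the top-N list.

    predicted: predicted rating for every candidate job of this user.
    liked    : jobs the user likes in the test set (must be non-empty).
    n        : length of the recommendation list.
    """
    if not liked:
        raise ValueError("the user has no liked items in the test set")
    # Recommend the n jobs with the highest predicted rating.
    top_n = sorted(predicted, key=predicted.get, reverse=True)[:n]
    return len(set(top_n) & liked) / len(liked)


# Toy example (invented values).
scores = {"job_a": 0.9, "job_b": 0.1, "job_c": 0.7, "job_d": 0.4}
likes = {"job_a", "job_d"}
```

Averaging this value over all users gives the overall recall@N; restricting `liked` to cold-start jobs gives the cold-start variant.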
ACKNOWLEDGEMENT
This work is financially supported by Guangdong Provincial Science and Technology Plan (No. 2016B030308002).

REFERENCES
[1] Rafter R, Bradley K, Smyth B. Personalised Retrieval for Online Recruitment Services[C]. 22nd Annual Colloquium on Information Retrieval (IRSG2000). 2000:382.
[2] Malinowski J, Keim T, Wendt O, et al. Matching People and Jobs: A Bilateral Recommendation Approach[J]. 2006, 6:137c.
[3] Hutterer M. Enhancing a Job Recommender with Implicit User Feedback[D]. Fakultät für Informatik der Technischen Universität Wien, 2011.
[4] Paparrizos I, Cambazoglu B B, Gionis A. Machine learned job recommendation[C]. 2011:325-328.
[5] Song Q. Research on User Modeling of Personalized Recommender System Based on Ontology[D]. Nanjing University of Aeronautics and Astronautics, 2009.
[6] Meng F. Application of Improved Slope One Algorithm in Personalized Employment Recommendation System[D]. Beijing University of Technology, 2014.
[7] Liu S. Application of Improved Slope One Algorithm in Personalized Employment Recommendation System[J]. Computer Knowledge and Technology: Academic Exchange, 2016, 12(4X):84-85.
[8] Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering[C]. International Conference on Machine Learning. ACM, 2007:791-798.
[9] Oord A V D, Dieleman S, Schrauwen B. Deep content-based music recommendation[C]. Neural Information Processing Systems Conference (NIPS 2013). Neural Information Processing Systems Foundation (NIPS), 2013:2643-2651.
[10] Florez O U. Deep Learning of Semantic Word Representations to Implement a Content-Based Recommender for the RecSys Challenge’14[J]. Communications in Computer & Information Science, 2014, 475:199-204.
[11] Wang H, Wang N, Yeung D Y. Collaborative Deep Learning for Recommender Systems[J]. 2014:1235-1244.
[12] Li S, Kawale J, Fu Y. Deep Collaborative Filtering via Marginalized Denoising Auto-encoder[C]. The, ACM International. ACM, 2015:811-820.
[13] Wei J, He J, Chen K, et al. Collaborative filtering and deep learning based recommendation system for cold start items[J]. Expert Systems with Applications, 2016, 69:29-39.
[14] Wang C, Blei D M. Collaborative topic modeling for recommending scientific articles[C]. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August. DBLP, 2011:448-456.
[15] Xiang L. Practice in Recommender System[M]. The People's Posts & Telecommunications Press, 2012.
[16] Purushotham S, Liu Y, Kuo C C J. Collaborative Topic Regression with Social Matrix Factorization for Recommendation Systems[J]. Computer Science, 2012.