Discovering user interests from Web browsing behavior: An application to Internet news services

Conference Paper · February 2002


DOI: 10.1109/HICSS.2002.994214 · Source: IEEE Xplore

Proceedings of the 35th Hawaii International Conference on System Sciences - 2002

Discovering User Interests from Web Browsing Behavior: An Application to Internet News Services
Ting-Peng Liang and Hung-Jen Lai
Department of Information Management
National Sun Yat-sen University
Kaohsiung, Taiwan
Email: [email protected]

Abstract

Discovering user interests is a very important task for providing personalized services in electronic commerce. A popular approach is to develop customer profiles from their browsing behavior. In this paper, we present an approach that analyzes the browsing content and time to determine user interests. An empirical study using actual news provided by the China Times shows that the proposed system outperforms the traditional headline news compiled by the news editor in both objective performance indices and customer satisfaction.

1. Introduction

The rapid propagation of the Internet, along with the evolution of information technologies, has changed the nature of many businesses. The large amount of transactional data collected from the use of information systems allows a company to better understand customer needs and to integrate the knowledge into their product design and marketing plans. For physical products (e.g., computers and televisions), mass customization and fast response to market needs become critical to remaining competitive. For digital products and services (e.g., news services and other Internet content providers, ICP), personalized services that offer the tailored content to different clients based on their interests become feasible and necessary.

In this paper, we propose an approach that builds customer profiles from their browsing behavior recorded by the computer and recommends personal services delivered on the web based on the profiles. The approach is then applied to the Internet news services to evaluate its applicability. The news recommendation system includes components for news structure analysis, customer profile analysis, and personal recommendation mechanism. An empirical study was performed to evaluate the proposed approach.

The remainder of the paper is organized as follows. First, literature related to information filtering and recommendation will be reviewed briefly. This is followed by the presentation of our approach and its application in Internet news services. Section 4 presents the experimental design. Findings are shown in Section 5. Section 6 concludes the paper.

2. Literature Review

The wide spread of the Internet has created an efficient channel for information dissemination. The information overload, however, becomes a problem. How to reduce unnecessary information and provide customized services becomes an important issue.

A few filtering mechanisms have been proposed in the past. A typical one is to ask the reader to report his interest after reading. The system can then build a profile of the reader and make recommendation accordingly. For example, Mock and Vemuri [10] presented the Intelligent News Filtering Organization System (INFOS) that asked each reader to indicate whether he liked the report. The system reorganizes the order of news based on the revealed preference. Results from a pilot test show that INFOS can effectively reduce the reader’s search load.

Another approach is behavior-based. For instance, Sakagami and Kamba [17] developed the ANATAGONOMY that learns reading preference from the browsing behavior (e.g., scroll, enlarge windows, etc.) of a user. The system has a learning engine and a scoring engine to produce personalized web news.

Information filtering and recommendation can also be performed based on feedback from others. For instance, Konstan, et al. [5] proposed a system, called GroupLens, which summarized the feedback from previous readers to allow the next reader to determine whether to read it. This is called collaborative filtering. A system proposed by Balabanovic and Shoham [1] combines content analysis and collaborative filtering. It takes into account the association between a reader and the theme of a report to identify the discrepancy between different individuals.

0-7695-1435-9/02 $17.00 (c) 2002 IEEE 1


Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS-35’02)
0-7695-1435-9/02 $17.00 © 2002 IEEE
Rucker and Polanco [16] proposed a system that analyzes the structure of bookmarks to determine the interests of an individual. In fact, bookmarks reveal not only the person’s interest but also the way information is organized.

3. A Time-based Approach to User Profiling

The key to information filtering and recommendation is user profiling. In general, user profiles can be obtained from self-reporting or analysis of browsing behavior. Although self-reporting may be considered more accurate in some cases, it is often tedious and difficult to deal with dynamic changes. Therefore, much research has focused on identifying user interests from the browsing data collected on-line. In this section, we present a time-based approach that determines user interests based on the time they spent viewing objects with known attributes. The underlying assumption of the method is that the more an object contains the information of interest to a user, the longer the user would view the object. Because errors may exist when the browsing time is too long or too short, we use the average reading speed and recency weight to adjust the interest level. The method can be described briefly as follows.

The reader profile is represented as a combination of attributes and interest levels. The interest level of a particular attribute is determined by the time previously spent browsing items that have the attribute, and may be adjusted by recency and other factors. Therefore, given an object $R_j[A_{ij}(p_{ij})]$, the approach is defined as follows:

Definition 1: Interest level of an object
The interest level of an object is an indicator of the extent to which a user is interested in the object. The interest level is calculated by the following equation:
$\sigma_j(R_j) = f(T_j / T_j^*)$,
where:
$\sigma_j(R_j)$ is the interest level of the object $R_j$;
$T_j$ is the time spent by the user on reading $R_j$, with $T_0 \le T_j \le T_u$, where $T_0$ is the lower bound for a browsing time to be considered reasonable and $T_u$ is the upper bound for a browsing time to be considered reasonable;
$T_j^*$ is the estimated reasonable reading time based on the previous average reading speed;
$f$ is a function that calculates the interest level. It may be a linear or a sigmoid function.

Definition 2: Interest level of an attribute
The interest level of an attribute is the aggregation of the interest levels of objects that have the attribute. It is calculated as follows:
$\sigma_i = \sum_j [\sigma_j(R_j) \cdot A_{ij}(p_{ij})]$,
where:
$\sigma_i$ is the interest level of attribute $A_i$;
$A_{ij}(p_{ij})$ shows that the likelihood of object $R_j$ having the attribute $A_i$ is $p_{ij}$.

Definition 3: Recency adjustment
The recency adjustment gives higher weights to objects accessed recently than to those accessed earlier. The recency weight of an object can be calculated by the following equation:
$\gamma_j(R_j) = g(D_j)$,
where:
$\gamma_j(R_j)$ is the recency weight of the object $R_j$;
$D_j$ is the number of days elapsed since reading $R_j$, with $D_0 \le D_j \le D_u$, where $D_0$ is the lower bound for an elapsed day to be considered for adjustment and $D_u$ is the upper bound for an elapsed day to be considered for adjustment;
$g$ is a function that calculates the recency weight. It may be a linear or a sigmoid function.

Definition 4: Adjusted interest level of an attribute
If the interest level needs to be adjusted by the recency weight, then the interest level becomes:
$\sigma_i = \sum_j [\sigma_j(R_j) \cdot \gamma_j(R_j) \cdot A_{ij}(p_{ij})]$.

Definition 5: User profile
The profile of a user is a combination of object attributes and their associated interest levels. It can be represented as $U([A_i(\sigma_i)])$, where $U$ is a user and the $A_i$ form a set of attributes.

4. Application to Personal News Recommendation

In this section, an application of the time-based mechanism to personal news recommendation over the Internet is described. News services are popular because the Internet provides an efficient way for news distribution. It can also be personalized at a very low cost. Therefore, it is an excellent domain for testing the method. Since each news report contains certain characteristics, the system needs a module to determine the attributes of the content that are of interest. Therefore, four modules are essential: structure analysis, reader profile analysis, rating for recommendation, and learning (as shown in Figure 1).

4.1 Structure analysis

The first step for personalized news service is to analyze the product, i.e., the structure of the news. The foundation of structure analysis is to identify keywords in a report and to build the keyword dictionary. Keywords in each report are identified based on the property of the words. Miller [9] developed a comprehensive keyword dictionary (called WordNet) that includes many nouns and verbs. McQuail [8] points out that keywords are those words associated with who, what, where, when, and why. Names mentioned in a report are also important keywords [14,15,19].

[Figure 1. Architecture of the News Recommendation System. Components shown: News, Structure Analysis, Rating, Recommended News, Browsing Behavior, Reader Profile, Threshold Value, Learning.]

In our mechanism, we use nouns with a particular emphasis on the role and issues mentioned in the report. In the following “State of the Web” report, for example, the keywords marked in italics can be identified.

    State of the Web: Inside.com Says 'No Thanks' to Yahoo!
    Could this be the beginning of the end of the portal strategy as we know it?
    By James J. Cramer

    You might not have noticed that Inside.com isn't on Yahoo! (Nasdaq: YHOO - news) anymore. You may never even have heard of Inside.com. But you have to understand that this lone decision, by Steve Brill, the head of Inside.com, is sending shock waves throughout the portal world. Here's why.

    When the Web first started to be commercial, outfits like Yahoo! and America Online needed to have content to wrap around their ads. They first tried to grow it and pay for it. Then, an epiphany struck Bob Pittman at AOL: Content providers needed eyeballs so badly that they would pay AOL to be there! That shift in strategy was the death knell for almost all original content providers on the Web because if you didn't have money, you couldn't pay, and the only people who could pay were established players and players that tapped the public markets.
    (Source: http://www.yahoo.com/)

After identifying keywords, we further analyze the position and frequency of keywords. Major steps include:
(1) Determine whether synonyms exist, including America Online = AOL, President Clinton = Bill Clinton, Yahoo = YHOO, etc.
(2) Calculate the frequency of keywords. In the previous example, the words Inside.com and Yahoo appear four times and AOL appears three times.
(3) Position adjustment: Since the title and first paragraph often contain more important information in a report, the keywords that appeared in the title are multiplied by 10 and those that appeared in the first paragraph are multiplied by 3. As a result, the adjusted frequency of Inside.com is 19, while Yahoo is 17, and AOL is 3. This significantly differentiates the relative importance of different keywords in the report.
(4) Noise elimination: To simplify the structure, a minimum frequency may be set to remove unimportant words. For instance, we may set a rule that keywords whose adjusted frequency is lower than 20% of that of the most important keyword are removed. In the example, 20% of 19 is 3.8, so only keywords whose adjusted frequencies are higher than 4 are considered valid and will be recorded to represent the structure of the report.
(5) Conversion to ratios: All keyword frequencies are then converted into a ratio, which is the frequency of a keyword/sum(frequencies of all keywords). The structure of the report is the collection of valid keywords along with their respective frequency ratios. The structure of the above example is [Inside.com (.31), Yahoo (.28), portal (.21), web (.20)].

4.2 Analysis of Reader Profile

Based on the algorithm specified in Section 3, we can analyze the interest profile of a user in the following procedures:
(1) Calculate the average reading speed of the user: The computer keeps a record of the time a user spent reading each report. These data are aggregated and adjusted by the length of the reports. The average reading speed is calculated by dividing the total number of words by the total reading time.
(2) Calculate the interest level: The interest level of a report is calculated from the time spent in reading it. Interest level is represented by the ratio of the actual reading time to the estimated reading time, where the estimated reading time = total words in the report/average reading speed. A mapping table is built to determine the interest levels. In our system, we set a range between 3 and 250 seconds as the reasonable range for reading a news report. The incidents outside this range are considered exceptional and are assigned an interest level of 0. If the time ratio of a reading is below .25 (i.e., the actual reading time is less than 25% of the estimated reading time), the case is considered a fast browsing and is assigned an interest level of 1. The ratio between .25 and .75 is assigned the value of 2, between .75 and 1.25 is assigned 3, between 1.25 and 1.75 is assigned 4, and above 1.75 is assigned 5.
(3) Conduct recency adjustment: Since we can reasonably assume that reports read recently can more accurately reflect a reader’s interest, the system gives a weight of 2 to reports that were read within D1 days, 1.75 to reports read between D1 and D2 days, 1.5 to those read between D2 and D3, and 1 to those longer than D3.
(4) Calculate the adjusted interest level: The adjusted interest level of a reader on a report is the product of the interest level from step (2) and the recency weight from step (3).
(5) Build the profile: The interest profile of a user is obtained by multiplying the news structure of each report by its interest level and summing over the reports. For example, suppose a user has read the two reports whose structures are given below, and his interest levels in reports A001 and A015 are 4 and 2, respectively. Then the resulting interest profile is [Web (2.0), Yahoo (1.72), Inside.com (1.24), Portal (.84), Merger (.20)]. This indicates that the user is most interested in reports related to Web, followed by Yahoo, Inside.com, Portal, and Merger.

[Example] The following are the structures of two reports that had been read by a user:
[A001: Inside.com (.31), Yahoo (.28), Portal (.21), Web (.20)]
[A015: Web (.60), Yahoo (.30), Merger (.10)]

4.3 Rating and Recommendation

Rating and recommendation determine the matching between a new report and a reader. If the matching level is higher than the threshold value, the report will be recommended to the reader. Otherwise, it is dropped. Steps for matching reports with a reader include:
(1) Determine the structure of the report. For example, we have a new report A032, whose structure is [A032: Portal (0.7), Merger (0.3)].
(2) Calculate the matching level: The matching level is calculated by aggregating the interest levels of different keywords. In the example, the matching level is 0.648 (=0.7*0.84+0.3*0.2).
(3) Recommend news based on matching levels: A hurdle can be set to screen out reports with low matching levels. The reports whose matching levels are higher than the threshold value will be recommended. In this step, guidelines on the number of news recommended and distribution of news among different categories can be used to enhance the accuracy of recommendation.

4.4 Learning

The learning module is designed to adjust various weights. It is not the focus of the paper and hence is omitted.

5. Empirical Study

In order to evaluate the news recommendation mechanism, an experimental study was performed. The benchmarks were the regular headline approach (HLA) and the self-reported interests (SRI) approach. Prototype systems that present news by HLA, SRI, and browsing behavior analysis (BBA) approach were developed for the experiment.
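Returning to the structure analysis of Section 4.1, the five steps can be illustrated with a short Python sketch. This is not the authors' implementation: the function name `report_structure`, the explicit body weight of 1, and the occurrence list are assumptions made for illustration. The occurrence list is one possible reading of the "State of the Web" excerpt, chosen so that the adjusted frequencies come out as the published 19, 17, and 3; the counts for portal and web are inferred from the published ratios rather than stated in the paper.

```python
from collections import Counter

# Step 1: synonym table (two of the pairs named in the paper).
SYNONYMS = {"America Online": "AOL", "YHOO": "Yahoo"}
# Step 3: position weights; the body weight of 1 is an assumption.
POSITION_WEIGHT = {"title": 10, "first_paragraph": 3, "body": 1}


def report_structure(occurrences, noise_cutoff=0.20):
    """Return {keyword: ratio} for a list of (keyword, position) occurrences."""
    adjusted = Counter()
    for keyword, position in occurrences:
        canonical = SYNONYMS.get(keyword, keyword)        # step 1: merge synonyms
        adjusted[canonical] += POSITION_WEIGHT[position]  # steps 2-3: weighted count
    if not adjusted:
        return {}
    floor = noise_cutoff * max(adjusted.values())         # step 4: noise threshold
    valid = {k: f for k, f in adjusted.items() if f >= floor}
    total = sum(valid.values())
    return {k: round(f / total, 2) for k, f in valid.items()}  # step 5: ratios


# Hypothetical occurrence list for the excerpt (an assumed reading, see above).
occurrences = (
    [("Inside.com", "title")] + [("Inside.com", "first_paragraph")] * 3
    + [("Yahoo", "title"), ("Yahoo", "first_paragraph"),
       ("YHOO", "first_paragraph"), ("Yahoo", "body")]
    + [("America Online", "body"), ("AOL", "body"), ("AOL", "body")]
    + [("portal", "title"), ("portal", "first_paragraph")]
    + [("web", "title"), ("web", "body"), ("web", "body")]
)
structure = report_structure(occurrences)
print(structure)
```

Under these assumed counts, AOL (adjusted frequency 3) falls below the 20% floor of 3.8 and is dropped, and the remaining ratios agree with the structure quoted in step (5).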

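The reader-profile procedure of Section 4.2 can be sketched in the same way. The function names, the half-open handling of the ratio boundaries, and the parameters `d1`, `d2`, `d3` (the paper does not give values for D1, D2, D3) are assumptions; the report structures and the interest levels 4 and 2 are taken from the worked example, with a recency weight of 1 assumed for both reports.

```python
from collections import defaultdict


def average_reading_speed(history):
    """history: list of (words_in_report, seconds_spent). Returns words per second."""
    total_words = sum(words for words, _ in history)
    total_seconds = sum(seconds for _, seconds in history)
    return total_words / total_seconds


def interest_level(seconds, words, speed, t_min=3.0, t_max=250.0):
    """Map the actual/estimated reading-time ratio to a level from 0 to 5."""
    if not t_min <= seconds <= t_max:
        return 0  # outside the reasonable range: treated as exceptional
    ratio = seconds / (words / speed)
    # Boundary handling (strictly below each upper bound) is an assumption.
    for upper, level in ((0.25, 1), (0.75, 2), (1.25, 3), (1.75, 4)):
        if ratio < upper:
            return level
    return 5


def recency_weight(days_ago, d1, d2, d3):
    """Weights 2, 1.75, 1.5, 1 from step (3); d1 < d2 < d3 are caller-supplied."""
    if days_ago <= d1:
        return 2.0
    if days_ago <= d2:
        return 1.75
    if days_ago <= d3:
        return 1.5
    return 1.0


def build_profile(readings):
    """readings: list of (structure, interest_level, recency_weight)."""
    profile = defaultdict(float)
    for structure, level, weight in readings:
        for keyword, ratio in structure.items():
            profile[keyword] += ratio * level * weight
    return {keyword: round(value, 2) for keyword, value in profile.items()}


a001 = {"Inside.com": 0.31, "Yahoo": 0.28, "Portal": 0.21, "Web": 0.20}
a015 = {"Web": 0.60, "Yahoo": 0.30, "Merger": 0.10}
profile = build_profile([(a001, 4, 1.0), (a015, 2, 1.0)])
print(profile)
```

With these inputs the sketch yields Web 2.0, Yahoo 1.72, Inside.com 1.24, Portal 0.84, and Merger 0.2, which matches the profile given in step (5).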


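The rating step of Section 4.3 reduces to a weighted sum followed by a threshold test. The sketch below is a minimal illustration under stated assumptions: the names `matching_level` and `recommend`, the threshold of 0.5, and the second report A033 are hypothetical, while the profile and report A032 come from the worked examples.

```python
def matching_level(structure, profile):
    """Sum of keyword ratio times the reader's interest level for that keyword."""
    return sum(ratio * profile.get(keyword, 0.0)
               for keyword, ratio in structure.items())


def recommend(reports, profile, threshold):
    """Return report ids whose matching level exceeds the threshold, best first."""
    scored = [(matching_level(structure, profile), report_id)
              for report_id, structure in reports.items()]
    return [report_id for score, report_id in sorted(scored, reverse=True)
            if score > threshold]


profile = {"Web": 2.0, "Yahoo": 1.72, "Inside.com": 1.24, "Portal": 0.84, "Merger": 0.20}
reports = {
    "A032": {"Portal": 0.7, "Merger": 0.3},  # from the paper's example
    "A033": {"Sports": 1.0},                 # hypothetical report with no overlap
}
print(round(matching_level(reports["A032"], profile), 3))
print(recommend(reports, profile, threshold=0.5))  # threshold value is an assumption
```

Keywords absent from the profile contribute nothing, so a report on an unseen topic scores 0 and is dropped regardless of the threshold chosen.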

5.1 Experimental variables and hypotheses

The independent variable of the study is different news recommendation mechanisms, which include BBA, SRI, and the traditional HLA. The dependent variable is the evaluation of the systems, which includes objective performance measures and subjective user satisfaction. They will be described in Section 5.3.

According to the framework, we would like to examine whether news recommendation mechanisms perform better than the traditional HLA, and whether BBA outperforms SRI. The null hypotheses are:
H1: BBA and SRI perform equally well with HLA,
H2: BBA performs equally well with SRI.

5.2 Experimental Systems

Three experimental websites that provide news services were designed. The HLA system copies the regular headline news approach. That is, the home page outlines the titles of the headline news that were determined by the editor. Other news is organized into categories with the category names (such as sport, stock, etc.) shown on the homepage. The reader has to click a category name to get into the second level web pages in order to read the titles within the category. The design of the SRI and BBA systems is the same except that their home pages are customized for each individual reader. When a user logs into the SRI or BBA system, the computer identifies his reading interests from the user profile and then composes the recommended news into the homepage. Those that are not recommended remain in the second level, i.e., their category names are shown on the homepage but the reader needs to click on the category name to see news titles.

All systems have an automated recording module that keeps track of the following data: (1) which news the subject clicks to read, (2) time spent in reading a particular news, (3) time spent in choosing a news to read (i.e., the time that the subject logs on the system but not for reading the news).

5.3 Measurement Instruments

(1) Background questionnaire
The background questionnaire collects data about the subject. It includes 11 questions: (1) most interesting category of news (a total of 13 categories), (2) least interesting category of news, (3) if you have 20 minutes, would you browse titles or read news in a news website, (4) frequency of using news websites in a week, (5) average time spent on a news website in a visit, (6) which is your most common news source (print media, TV, or web), (7) experiences in using the web, (8) motivation for using news websites, (9) importance of receiving news and information to you, (10) preference in filtering news by the system, (11) choosing news by interests or location of the title.

(2) Objective performance measures
Both objective performance and user satisfaction of the subjects were measured. Two indices are common for measuring objective performance: precision and recall [18]. Precision measures the portion of recommended news that is relevant (i.e., read by the subject) and recall measures the portion of relevant news that is recommended. To explore further detail of the performance, five indices have been developed for the experiment: acceptance rate, hit rate by the number of news, hit rate by the reading time, effective usage rate, and effective reading rate. The definition of these indices is summarized in Table 2.

Table 2. Definition of Objective Performance Measures

Measure                 Definition
Acceptance rate         No. of recommended and read / No. of recommended
Number hit rate         No. of recommended and read / Total No. of read
Time hit rate           Total time reading recommended news / Total reading time
Effective use rate      Total time of selecting and reading news / Total time
Effective reading rate  Total time of reading / Total time of selecting and reading

The acceptance rate (AR) shows the portion of recommended reports that are read by the subject. Number hit rate (NHR) shows the portion of the reports read by the subject that are among the recommended list. The time hit rate (THR) shows the portion of reading time that is spent on the recommended news. Effective use rate (EUR) shows the availability of the system, that is, the portion of time spent with the system that is actually selecting and reading news. It excludes the time for transmitting and processing news by the system. Effective reading rate (ERR) shows the portion of available time spent on reading news. The higher these indices are, the better.

(3) User Satisfaction
The instrument for measuring user satisfaction includes four dimensions: information content, customized services, user interface, and system value. Satisfaction on information content is measured by three questions adapted from Doll [3]: (1) whether the system finds the news that the reader wants to read, (2) whether the system filters out the news that the reader does not want, and (3) whether the system captures the right category of interest to the reader.

Satisfaction on customized services is measured by three questions adapted from the personalized service portion of SERVQUAL [12]. They are: (1) whether the system provides personal attention, (2) whether the system captures my interests, and (3) whether the system provides customized services. Satisfaction on user interface is measured by four questions adapted from Doll [3]. They are (1) whether the system is easy to use, (2) whether the system is friendly, (3) whether the interface is properly formatted, and (4) whether the presentation is clear. System value asks about whether the system is useful and is quick to find interesting news. Table 3 summarizes the measured dimensions. Finally, a question is designed to assess the overall satisfaction of the user on the system. All questions are on a 7-point Likert scale with 1 being least agreed and 7 being most agreed.

Table 3. Dimensions for Measuring System Satisfaction

Information content     Customized service    User interface        System value
- Find the wanted       - Personal attention  - Ease of use         - Useful
- Filter the unwanted   - Capture interests   - Friendly            - Quicker
- Find right category   - Customized service  - Right format
                                              - Clear presentation

The reliability test of the questionnaire using Cronbach’s alpha shows that the instrument is generally acceptable because most alpha values are higher than 0.6 (in Table 4).

Table 4. The Reliability Data

Dimension            Cronbach’s α
Information content  0.7018
Customized services  0.7714
User interface       0.8861
System value         0.6792

5.4 Experimental Design and Procedures

Since collecting browsing data needs to have consecutive uses of a website, the subjects were asked to participate in the experiment for four days. A total of 96 volunteered subjects were recruited at the beginning. They were divided into two groups, one of which viewed HLA and SRI (Group I) and the other viewed HLA and BBA (Group II). Nine of them dropped out during the experiment. So, we had a total of 87 effective subjects, with 43 in Group I and 44 in Group II.

Subjects in both groups were asked to view HLA in the first three days and fill out a satisfaction questionnaire after the second day. On the fourth day, subjects in Group I viewed SRI and those in Group II viewed BBA. They all filled out questionnaires again to indicate their satisfaction with the experimental system. Due to the difference in the recommendation approach, subjects in Group I needed to indicate their interests in the report on a 1-7 scale (7 to be the most interesting) after each reading, while the subjects in Group II did not have to do so.

In order to be close to the real world, the news adopted for the experiment was actual news provided by China Times (www.chinatimes.com.tw). During the experimental period (June 7 – June 10, 2000), all news available on the website of China Times before 9:00 am were downloaded to the experimental system, organized based on different approaches, and then presented to the experimental subjects. The average number of news per day was 255, distributed into 13 categories, with an average of 44 chosen as headline news by the editors and put in the homepage in the HLA approach. Table 5 shows the distribution of reports in the 13 categories: headline news (HDL), politics, social, international, China, finance, stock, technology, medical, entertainment, sports, art, and comments.

The experimental procedures include the following:
Day 1: The subject logged onto the website to read the description of the experiment (approximately 5 minutes), filled out personal data (5 minutes), learned the system (5 minutes), and read the news (20 minutes) arranged by the HLA approach.
Day 2: Logged onto the system, read the news for 20 minutes, and filled out the satisfaction questionnaire.
Day 3: Logged onto the system, read the news for 20 minutes.
Day 4: Logged onto the system, read the news for 20 minutes that were arranged by the SRI approach for Group I and by the BBA approach for Group II, and then filled out the satisfaction questionnaire.
Table 5. Distribution of News in the Experiment

Date  HDL  Polit  Social  Int’l  China  Finance  Stock  Tech  Med  Ent.  Sport  Art  Comm  Total
6/7   41   12     23      13     13     45       12     8     10   14    15     10   14    230
6/8   45   9      24      7      16     49       36     13    8    13    21     10   14    265
6/9   45   11     22      22     16     48       33     11    10   17    24     9    11    279
6/10  44   11     21      19     16     33       22     9     9    14    21     11   14    244
Avg   44   11     23      15     15     44       26     10    9    15    20     10   13    255
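The totals in Table 5 can be recomputed from the category counts as a quick consistency check. The sketch below only transcribes the four daily rows; the variable names are illustrative.

```python
# Daily counts per category, transcribed from Table 5 (Total column excluded).
table5 = {
    "6/7":  [41, 12, 23, 13, 13, 45, 12, 8, 10, 14, 15, 10, 14],
    "6/8":  [45, 9, 24, 7, 16, 49, 36, 13, 8, 13, 21, 10, 14],
    "6/9":  [45, 11, 22, 22, 16, 48, 33, 11, 10, 17, 24, 9, 11],
    "6/10": [44, 11, 21, 19, 16, 33, 22, 9, 9, 14, 21, 11, 14],
}

daily_totals = {day: sum(counts) for day, counts in table5.items()}
average_per_day = sum(daily_totals.values()) / len(daily_totals)

print(daily_totals)
print(average_per_day)
```

The daily sums reproduce the Total column (230, 265, 279, 244), and their mean of 254.5 agrees with the average of 255 news items per day stated in the text.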

5.5 Experimental Results

(1) Descriptive statistics

Major data collected from the experiment include (1) distribution of reading interests, (2) motivation for reading from websites, (3) major news source, (4) reading behavior, and (5) performance of the experiment. They are illustrated in Tables 6-8.
Table 6. Most Interesting News Category as Indicated by the Subject

     Ent   HDL   Social  Sport  Polit  Int’l  Med  China  Finance  Tech  Stock  Art  Comm  Total
No.  24    13    13      12     6      6      6    2      2        2     1      0    0     87
%    27.6  14.9  14.9    13.8   6.9    6.9    6.9  2.3    2.3      2.3   1.1    0    0     100

Table 7. Motivation for Using News Websites

     Free news  Love web  Social interaction  Time killing  Total
No.  39         32        9                   7             87
%    44.8       36.8      10.4                8             100

Table 8. Major News Sources

     Web   TV    Print media  Radio  Total
No.  35    31    18           3      87
%    40.2  35.6  20.7         3.4    100

Table 6 shows that the most preferred category was entertainment news, chosen by 27.6% of the subjects. The least preferred categories were art and comments. Table 7 shows that the major reason for the subjects to use web news was that it was free. The second reason was their fondness for using the web. Table 8 shows that the web was the most favored news source for 40% of the subjects, whereas TV was the next most favored.

Descriptive data (mean and standard deviation) of the browsing behavior are shown in Table 9. The system recorded the number of reports read by the subjects (NRR), number of news accepted by the subjects (NRA), number of news shown on the homepage (NNS), system processing time (SPT), selecting time (ST), reading time (RT), and time for rating the read news (TRN, for HLA and SRI).

Table 9. Browsing Statistics of the Subjects

     HLA               SRI               BBA
     Mean    St. dev.  Mean    St. dev.  Mean    St. dev.
NRR  14.02   6.32      14.53   5.99      14.34   5.21
NRA  2.16    2.49      6.33    3.51      6.27    3.55
NNS  41      0         17.77   7.42      17.61   5.97
SPT  38.46   35.52     16.88   5.72      16.30   6.57
ST   213.51  89.03     188.04  66.67     176.66  61.05
RT   597.13  108.74    651.86  80.75     707.05  60.94
TRN  50.88   20.83     43.20   22.56     0       0

The statistics indicate that the HLA system presents 41 news titles to all subjects in the homepage,
whereas SRI and BBA present an average of 17.77 and 17.61 recommended news titles in the
homepage to each subject, respectively. Standard deviations among the subjects are 7.42 and 5.97 for
SRI and BBA. That is, the recommendation systems are more selective than the standard headline
news version. This allows the subject to spend more time on reading news (RT) and less time on
selecting news (ST). The objective performance indices calculated from the browsing behavior are
shown in Table 10.
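As a rough cross-check, the two time-based rates can be recomputed from the mean times in Table 9. The sketch below assumes that total time is the sum of processing, selecting, reading, and rating time; the paper does not spell this out, and the function name is illustrative. Because it divides means rather than averaging per-subject ratios, agreement with Table 10 can only be approximate.

```python
# Mean times per system, transcribed from Table 9.
table9 = {
    "HLA": {"SPT": 38.46, "ST": 213.51, "RT": 597.13, "TRN": 50.88},
    "SRI": {"SPT": 16.88, "ST": 188.04, "RT": 651.86, "TRN": 43.20},
    "BBA": {"SPT": 16.30, "ST": 176.66, "RT": 707.05, "TRN": 0.0},
}


def usage_rates(times):
    """Effective use rate and effective reading rate from mean times."""
    active = times["ST"] + times["RT"]            # selecting + reading
    # Assumption: rating time (TRN) counts toward total time.
    total = times["SPT"] + active + times["TRN"]
    return {"EUR": active / total, "ERR": times["RT"] / active}


rates = {system: usage_rates(times) for system, times in table9.items()}
for system, r in rates.items():
    print(f"{system}: EUR={r['EUR']:.4f}  ERR={r['ERR']:.4f}")
```

Under this assumption the EUR values come out at about 0.9007, 0.9332, and 0.9819, which agree with Table 10 to within rounding. The ERR values (about 0.7366, 0.7761, and 0.8001) are close to, but not identical with, the tabulated means, as expected when a ratio of means is compared with a mean of ratios.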

Table 10. Performance Indices of the Three Systems

0-7695-1435-9/02 $17.00 (c) 2002 IEEE 7


Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS-35’02)
0-7695-1435-9/02 $17.00 © 2002 IEEE
Proceedings of the 35th Hawaii International Conference on System Sciences - 2002

            HLA                  SRI                  BBA
       Mean    St. dev.     Mean    St. dev.     Mean    St. dev.
AR     .0532    .0609       .3943    .2340       .3787    .2051
NHR    .1710    .2119       .4539    .2304       .4573    .2354
THR    .2115    .2449       .4469    .2432       .4622    .2803
EUR    .9007    .0486       .9332    .0270       .9818    .0073
ERR    .7343    .1159       .7749    .0829       .8001    .0689
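
As a rough consistency check between Tables 9 and 10, the two time-based indices can be approximated from the mean times in Table 9. The formulas below are assumptions inferred from the reported numbers, not definitions quoted from this section: EUR is taken as the share of total session time spent selecting and reading, and ERR as the share of selecting-plus-reading time spent reading. Because the sketch uses ratios of group means rather than means of per-subject ratios, small deviations from Table 10 are expected.

```python
# Mean times from Table 9, in the order (SPT, ST, RT, TRN).
TABLE9_MEANS = {
    "HLA": (38.46, 213.51, 597.13, 50.88),
    "SRI": (16.88, 188.04, 651.86, 43.20),
    "BBA": (16.30, 176.66, 707.05, 0.0),
}


def time_indices(spt, st, rt, trn):
    """Return (EUR, ERR) under the assumed definitions.

    EUR = (ST + RT) / (SPT + ST + RT + TRN)   # assumed formula
    ERR = RT / (ST + RT)                      # assumed formula
    """
    total = spt + st + rt + trn
    eur = (st + rt) / total
    err = rt / (st + rt)
    return eur, err


indices = {name: time_indices(*times) for name, times in TABLE9_MEANS.items()}

for name, (eur, err) in indices.items():
    print(f"{name}: EUR={eur:.4f}  ERR={err:.4f}")
```

Under these assumed definitions the computed values land close to the EUR and ERR rows of Table 10, which suggests the tables are internally consistent on these two indices.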

(2) Hypothesis Testing

Since all subjects used the HLA system before using SRI or BBA, the paired t-test is used to test
the performance difference between SRI and HLA, and between BBA and HLA, whereas the independent
t-test is used to test the difference between SRI and BBA.
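
The paired comparison described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the per-subject acceptance rates are hypothetical values invented for the example, not data from the study, and the function name `paired_t` is likewise an assumption.

```python
import math
from statistics import mean, stdev


def paired_t(baseline, treatment):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(baseline) != len(treatment):
        raise ValueError("paired samples must have the same length")
    # Each subject contributes one difference (treatment minus baseline).
    diffs = [t - b for b, t in zip(baseline, treatment)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical acceptance rates for four subjects (illustration only).
hla_rates = [0.05, 0.10, 0.00, 0.08]
sri_rates = [0.40, 0.35, 0.30, 0.45]

t_value, df = paired_t(hla_rates, sri_rates)
print(f"t = {t_value:.3f}, df = {df}")
```

Because every subject is measured under both conditions, the test works on within-subject differences, which removes between-subject variation from the error term. That is why it suits the comparisons against HLA but not the comparison between the SRI and BBA groups.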

H1: SRI and BBA perform as well as HLA


The results of the paired t-test as shown in Tables 11 and 12 indicate that SRI outperforms HLA
significantly in all objective indices and user satisfaction. The results in Tables 13 and 14 indicate
that BBA also outperforms HLA significantly in most objective indices (ERR is near significant) and
user satisfaction (except the perceived value). Therefore, we can safely conclude that the
recommendation systems outperform the traditional headline approach. The null hypothesis is
rejected.

Table 11. Results of paired t-test on Performance for SRI and HLA (df=42)

        Mean (SRI)   Mean (HLA)   Difference   t-value   Significance
AR        0.3943       0.0530     0.3416***     8.820       0.000
NHR       0.4539       0.1710     0.2829***     6.117       0.000
THR       0.4469       0.2115     0.2354***     4.786       0.000
EUR       0.9332       0.9007     0.033***      5.084       0.000
ERR       0.7749       0.7343     0.041**       2.451       0.018
Note: ** denotes p<0.05; *** denotes p<0.01

Table 12. Results of paired t-test on Satisfaction for SRI and HLA (df=42)

                  Mean (SRI)   Mean (HLA)   Difference   t-value   Significance
Content             5.8488       5.2558     0.5930***     4.379       0.000
Customization       5.7558       4.6628     1.0930***     7.474       0.000
Interface           5.6802       5.4244     0.2558**      2.632       0.012
Value               5.9767       5.3488     0.6279***     3.699       0.001
Overall             5.8605       5.2558     0.6047***     6.800       0.000
Note: ** denotes p<0.05; *** denotes p<0.01

Table 13. Results of paired t-test on Performance for BBA and HLA (df=43)

        Mean (BBA)   Mean (HLA)   Difference   t-value   Significance
AR        0.3786       0.0690     0.3099***     9.244       0.000
NHR       0.4573       0.2236     0.2337***     3.939       0.000
THR       0.4622       0.2440     0.2183***     3.173       0.003
EUR       0.9819       0.9729     0.009***      5.15        0.000
ERR       0.8001       0.7711     0.029*        1.92        0.061
Note: * denotes p<0.10; *** denotes p<0.01


Table 14. Results of paired t-test on Satisfaction for BBA and HLA (df=42)

                  Mean (BBA)   Mean (HLA)   Difference   t-value   Significance
Content             5.6250       5.3068     0.3182*       1.956       0.057
Customization       5.4091       4.5682     0.8409***     4.650       0.000
Interface           5.5739       5.3125     0.2614**      2.168       0.036
Value               5.5455       5.4773     0.0678        0.380       0.706
Overall             5.7727       5.3409     0.4318***     3.772       0.000
Note: * denotes p<0.10; ** denotes p<0.05; *** denotes p<0.01

H2: SRI and BBA perform equally well


Tables 15 and 16 show the performance differences between SRI and BBA, which use two
different recommendation mechanisms. BBA performs better than SRI in the rate of effective use
(EUR), but worse than SRI in customization and perceived value. The overall satisfaction shows no
significant difference. Therefore, the null hypothesis cannot be rejected. We conclude that both
recommendation mechanisms are equally good.

Table 15. Results of t-test on Performance for SRI and BBA (df=85)

        Mean (SRI)   Mean (BBA)   Difference   t-value   Significance
AR        0.3943       0.3787     0.016         0.332       0.741
NHR       0.4539       0.4573     0.003         0.069       0.945
THR       0.4469       0.4622     0.0153        0.272       0.786
EUR       0.9332       0.9819     0.048***     11.608       0.000
ERR       0.7749       0.8001     0.025         1.546       0.126
Note: * denotes p<0.10; *** denotes p<0.01
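
The independent-samples comparison can be approximated from summary statistics alone. The sketch below computes a pooled-variance t statistic for the EUR row, using the means from Table 15 and the standard deviations from Table 10. The group sizes (43 for SRI, 44 for BBA) are an assumption inferred from the degrees of freedom reported in Tables 11 and 13, and the paper does not state whether a pooled or an unequal-variance test was used, so the pooled form is also an assumption.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# EUR: BBA versus SRI. Group sizes are assumed, not reported directly.
t_eur, df_eur = pooled_t(0.9819, 0.0073, 44, 0.9332, 0.0270, 43)
print(f"EUR: t = {t_eur:.3f}, df = {df_eur}")
```

With these inputs the degrees of freedom match the 85 reported in Table 15, and the t statistic comes out close to the reported 11.608. The remaining gap is plausibly due to rounding in the published means and standard deviations.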

Table 16. Results of t-test on Satisfaction for SRI and BBA (df=42)

                  Mean (SRI)   Mean (BBA)   Difference   t-value   Significance
Content             5.8488       5.6250     0.2238        1.506       0.136
Customization       5.7558       5.4091     0.3467**      2.004       0.048
Interface           5.6802       5.5739     0.1064        0.633       0.529
Value               5.9767       5.5455     0.2933*       1.740       0.086
Overall             5.8605       5.7727     0.087         0.377       0.512
Note: * denotes p<0.10; ** denotes p<0.05; *** denotes p<0.01

(3) Effect of Individual Difference

In addition to the effect of recommendation mechanisms, we also examine the possible effect of
individual differences. Two effects are found to be significant in ANOVA tests: (1) motivation for
using web news has a significant impact on overall satisfaction (F=5.227, p<0.01) and (2) the
preferred news source has a significant impact on the acceptance rate of recommended news
(F=2.790, p<0.05).
Readers with a motivation for more social interaction (mean=6.3333) are more satisfied with the
system than all other groups. Those with motivations of web loving (mean=5.8438) and free news
(mean=5.7436) are more satisfied than the time killing group (mean=5.1429). This may be because
members of the social interaction group enjoy reading the news for communicating with their
friends. They are more selective in what news they would like to read. The members of the time
killing group read the news for no major purpose and hence are not very selective.
Subjects whose major news source is the print media have the highest rate of acceptance
(mean=0.4983), which is significantly higher than that of the TV group (mean=0.3231). This may be
because web news is still presented in a format similar to print media. Therefore, the print media
group is more accustomed to web news.

6. Concluding Remarks

In this paper, we have presented a time-based approach to analyzing user profiles and its
application to Internet news recommendation services. Empirical results show that the proposed
approach performs as well as the self-reported interests approach and significantly better than
the headline news approach.
