
Experiments on Related Entity Finding Track at TREC 2009

Qing Yang, Peng Jiang, Chunxia Zhang, Zhendong Niu

School of Computer, Beijing Institute of Technology


{yangqing2005, jp, cxzhang, zniu}@bit.edu.cn

Abstract. Our goal in participating in the TREC 2009 Entity Track is to study
whether QA list techniques can help improve the accuracy of the entity finding
task. We also investigate homepage finding, identifying the homepages of an
entity by training a maximum entropy classifier and a logistic regression
model for each of the three entity types.

1. Introduction
This is Beijing Institute of Technology's first year participating in TREC. For the
related entity finding track, we mainly focus on modeling the track with a pipeline
architecture, indexing and retrieving with Indri, and using OpenNLP's maximum
entropy (ME) classifier to identify the extracted entities' homepages.

2. Related Entity Finding Task


The related entity finding task is newly proposed by NIST this year. The task is
defined as follows: given an input entity, identified by its name and homepage, the
type of the target entity, and the nature of their relation, described in free text, find
related entities that are of the target type and stand in the required relation to the
input entity.
This task shares similarities with both expert finding (in that we need to return
more than "just" documents) and homepage finding (since entities are uniquely
identified by their homepages). However, approaches to this task need to generalize
to multiple types of entities (beyond just people) and return the homepages of
multiple entities, not just one. Also, the topic defines a focal entity to which the
returned homepages should be related [1].

2.1. System Overview

We implement our experimental system as a pipeline architecture derived from
OpenEphyra's framework.
Fig. 1. The Related Entity Finding System Architecture.

We outline the retrieval framework above. From the TREC-supplied query topics,
we first analyze the narrative of each topic and extract keywords and terms. Second,
we employ BagofWordsGenerator and QueryReformationGenerator to rewrite the
query strings. Then we send the query strings to the Indri search engine and retrieve
results whose granularity is the focused text snippet rather than the document. From
the focused text snippets, we employ OpenNLP components and Stanford's parser to
identify named entities of the target type. We rank entities in descending order of
their number of occurrences in the focused text snippets at the sentence level. We
take the top 150 entities and submit each entity's name to the Indri search engine.
A maximum entropy classifier scores the returned web pages as homepage
candidates for the related entity, and we rank each entity's homepage candidates by
score in descending order. Finally, we rerank the entities by filtering out those that
have no homepage candidate scoring above a threshold. A minimal end-to-end
sketch of this pipeline follows.
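The following self-contained sketch summarizes the pipeline; the retrieval, tagging
and scoring steps are toy stand-ins for Indri, the OpenNLP/Stanford taggers and the
ME classifier, and all helper names are illustrative rather than taken from our
implementation.

from collections import Counter

def run_pipeline(query_strings, search, tag_entities, homepage_score,
                 top_n=150, threshold=0.5):
    """Retrieve snippets -> extract entities -> rank by occurrence ->
    score homepage candidates -> filter entities without homepages."""
    snippets = [s for q in query_strings for s in search(q)]
    counts = Counter(e for s in snippets for e in tag_entities(s))
    ranked = [name for name, _ in counts.most_common(top_n)]

    results = []
    for name in ranked:
        pages = search(name)                            # homepage candidates
        best = sorted(pages, key=homepage_score, reverse=True)[:3]
        if best and homepage_score(best[0]) > threshold:
            results.append((name, best))
    return results

# Toy stand-ins so the sketch runs end to end:
docs = {"blackberry carriers": ["Verizon sells BlackBerry phones",
                                "Sprint offers BlackBerry plans"],
        "Verizon": ["http://www.verizon.com/"],
        "Sprint": ["http://www.sprint.com/"]}
search = lambda q: docs.get(q, [])
tag = lambda s: [w for w in s.split() if w[0].isupper() and w != "BlackBerry"]
score = lambda url: 0.9 if url.endswith(".com/") else 0.1

print(run_pipeline(["blackberry carriers"], search, tag, score))
# [('Verizon', ['http://www.verizon.com/']), ('Sprint', ['http://www.sprint.com/'])]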

2.2. Query Topic Parsing

For question answering, Ephyra spends much effort analyzing the question
syntactically and semantically; to identify the answer type, it trains on answer
patterns with a machine learning scheme. This tricky phase is not necessary for the
related entity track because the target entity type is explicitly given. OpenEphyra
also employs WordNet to expand query terms, and the expanded query string
contains some irrelevant terms, which we remove manually to avoid topic drift. As
for the explicit entity name, we simply add it to the query string without expanding
it. Taking the first query topic as an example, OpenEphyra generates two query
strings. One consists of the focus words, i.e., blackberry Carriers makes phones,
with weight 1.0. The query sent to Indri is the following:
#combine[passage100:50](blackberry Carriers makes phones).
The passage length and increment size are set to 100 and 50 respectively, chosen
experimentally. Different window lengths produce different results, and it is
challenging to decide on a reasonable window size. The passage constructs a context
for locating related entities; its length represents the proximity between the source
entity and the target related entity.
The other consists of the expanded terms, i.e., blackberry (Carriers OR toter OR
bearer) makes (phones OR telephone OR "telephone set"). Obviously, toter and
bearer are synonyms of carrier in WordNet, but they are not suitable for this query
topic, so we simply remove them manually. The converted query string is the
following:
#combine[passage100:50](blackberry Carriers #or(phones telephone
"telephone set")).
This query string has weight 1.5. The weight expresses the degree to which the
generated query string matches the narrative of the query topic; it can be considered
a degree of proximity between the target entity and the input entity. The sketch
below shows how the two weighted query strings are assembled.
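A minimal sketch of the query construction; the synonym groups would come from
WordNet after manual filtering, and the function names are illustrative.

def focus_query(terms, weight=1.0):
    # Focus words as a plain #combine over passages of length 100, step 50.
    return weight, "#combine[passage100:50](%s)" % " ".join(terms)

def expanded_query(groups, weight=1.5):
    """groups: synonym groups; singletons stay bare, larger groups
    become #or(...) clauses with multi-word terms quoted."""
    parts = [g[0] if len(g) == 1 else
             "#or(%s)" % " ".join('"%s"' % t if " " in t else t for t in g)
             for g in groups]
    return weight, "#combine[passage100:50](%s)" % " ".join(parts)

print(focus_query(["blackberry", "Carriers", "makes", "phones"])[1])
# #combine[passage100:50](blackberry Carriers makes phones)
print(expanded_query([["blackberry"], ["Carriers"],
                      ["phones", "telephone", "telephone set"]])[1])
# #combine[passage100:50](blackberry Carriers #or(phones telephone "telephone set"))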

2.3. Named Entity Identification

In this task, the type of the target entity is restricted to three types: person,
organization and product. Generally speaking, the first two types are easier to
identify from focused snippets; products, however, are rather difficult to identify
correctly. To deal with this issue, we resort to the Wikipedia online knowledge base,
whose pages always carry category labels. We found that pages under category
labels such as introductions, productions, products, games, software and hardware
almost always describe products. This allowed us to extract 43,393 product names;
using the same method, we also extracted 18,181 organization names and 118,002
person names.
In this experimental system, we employ OpenEphyra's NETagger for named entity
identification. OpenEphyra's NETagger combines model-based, pattern-based and
list-based named entity taggers, so it is natural for us to add a product list in the hope
of improving product identification performance. The other, easier-to-identify entity
types are handled by StanfordNETagger.
For StanfordNETagger, we load the ner-eng-ie.crf-3-all2006-distsim.ser.gz
serialized model, which labels PERSON, ORGANIZATION and LOCATION
entities and is trained on data from CoNLL, MUC-6, MUC-7 and ACE.
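Not our actual NETagger, but a minimal sketch of how a list-based product tagger
can back up the model-based taggers: token spans are matched longest-first against
the Wikipedia-derived product list (the list contents here are toy data).

def list_tagger(tokens, product_list, max_len=5):
    """Longest-match lookup of token spans against a product name list."""
    products, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span.lower() in product_list:
                products.append(span)
                i += n
                break
        else:
            i += 1
    return products

product_list = {"blackberry bold", "iphone"}   # 43,393 names in practice
tokens = "The BlackBerry Bold outsold the iPhone".split()
print(list_tagger(tokens, product_list))       # ['BlackBerry Bold', 'iPhone']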

2.4. Related Entity Candidates Ranking

It is well known that the ranking of search results does not necessarily correspond
to the ranking of the extracted entities. We apply the following formula from a
probabilistic model to rank related entity candidates:
Er = αQscore + βRscore + γNredundancy
Q refers to the query string score, which represents how well the generated query
string expresses the nature of the relation between the related entity and the source
entity. R refers to the search result score, which represents the relevancy of the
retrieved result (in this context, a passage) to the query string. N refers to the
number of redundant focused snippets in which the same entity occurs. α, β and γ
are the respective coefficients; in this experimental setting we simply set all
coefficients to 1. Besides that, a result's score is normalized to 1 if the result is in
the top N. Note that the effect of the result ranking is taken into account implicitly,
since we extract entities only from the top-N result passages; results outside the
top N are ignored entirely. A worked example of the formula follows.
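A worked sketch of the formula with all coefficients set to 1, as in our runs; R is
normalized to 1 for results inside the top N.

def entity_score(q_score, r_score, n_redundancy, a=1.0, b=1.0, g=1.0):
    # Er = a*Qscore + b*Rscore + g*Nredundancy
    return a * q_score + b * r_score + g * n_redundancy

# e.g. an entity found via the expanded query (weight 1.5), in a top-N
# result (R normalized to 1), occurring in 4 distinct focused snippets:
print(entity_score(q_score=1.5, r_score=1.0, n_redundancy=4))  # 6.5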

2.5. ME-Based Entity Homepages Classifier

We model entity homepage identification as a binary classification problem. Our
aim is to estimate a probability representing the likelihood that a URL is a
homepage, rather than to make a binary yes-or-no decision. [4] proposed a machine
learning approach to predict the correct homepage in response to a user's homepage
finding query, generating a binary decision tree to predict whether a URL is a
homepage URL or not. Obviously, a probability is more suitable than a binary
decision in this task. Table 2.1 lists the attributes.

Table 2.1. Attribute vector

URL length: the number of characters in the URL
URL depth: the number of slashes in the URL
URL type: one of four URL types (root, subroot, path, file), as proposed by
UTwente/TNO in the TREC-2001 homepage finding track
Entity in URL: whether the specified entity name is present in the URL
Variants of entity in URL: we manually devise variants of the entity name that web
designers are likely to use, and check whether one of these variants occurs in the URL
Position type: refers to the URL types defined above; represents in which part of the
URL the entity name or one of its variants occurs
Entity in page title: whether the entity name or one of its variants occurs in the
page title
Keyword: whether the page title contains one of the keywords "official", "home",
"homepage"
Length of title: the number of characters in the title
Occurrence of entity in title: the number of occurrences of the entity name in the title
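A self-contained sketch of how most of these attributes can be computed; for
brevity the URL-variant and position-type attributes are folded into a single
entity-in-URL test, and all names are illustrative.

from urllib.parse import urlparse

KEYWORDS = ("official", "home", "homepage")

def url_type(path):
    """Root / subroot / path / file, after UTwente/TNO (TREC-2001)."""
    if path in ("", "/"):
        return "root"
    if path.endswith("/"):
        return "subroot" if path.count("/") == 2 else "path"
    return "file"

def homepage_features(url, title, entity, variants):
    path = urlparse(url).path
    names = [entity.lower()] + [v.lower() for v in variants]
    return {
        "url_length": len(url),
        "url_depth": url.count("/"),
        "url_type": url_type(path),
        "entity_in_url": any(n in url.lower() for n in names),
        "entity_in_title": any(n in title.lower() for n in names),
        "keyword_in_title": any(k in title.lower() for k in KEYWORDS),
        "title_length": len(title),
        "entity_count_in_title": title.lower().count(entity.lower()),
    }

print(homepage_features("http://www.blackberry.com/",
                        "BlackBerry - Official Home",
                        "BlackBerry", ["black_berry"]))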
The ME model is then applied to the results returned by the mixture of context
language models retrieval system, in the hope of filtering out most of the irrelevant
web pages in the returned lists.
Additionally, before extracting the URL type, we normalize URLs using the
BasicURLNormalizer extracted from Nutch 0.9. As for variants of the entity name,
we analyzed homepage_en.nt from DBpedia and derived the following rules for
generating variants of a specified entity name.

Table 2.2. Variant form rules

1. Replace blank spaces with "_", "+", "%20" or "" respectively.
2. Replace "'s" with "s" or "" respectively.
3. The number of characters in an abbreviation must be three or more.
4. Concatenate the first character of every word in the specified entity name,
including or excluding stopwords respectively.
5. For two-word entity names, combine the first three to five characters of each
word to generate abbreviations.

A sketch of these rules follows.
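A minimal sketch of rules 1-4 (rule 5, the three-to-five character abbreviations, is
analogous); the stopword list here is an illustrative subset.

STOPWORDS = {"of", "the", "and"}            # illustrative subset

def name_variants(name):
    variants = set()
    for sep in ("_", "+", "%20", ""):        # rule 1: blank space substitutes
        variants.add(name.replace(" ", sep))
    for sub in ("s", ""):                    # rule 2: possessive 's
        variants.add(name.replace("'s", sub).replace(" ", ""))
    words = name.split()
    for keep_stop in (True, False):          # rule 4: acronyms (rule 3: >= 3 chars)
        acro = "".join(w[0] for w in words
                       if keep_stop or w.lower() not in STOPWORDS)
        if len(acro) >= 3:
            variants.add(acro)
    return variants

print(sorted(name_variants("Bank of America")))
# ['Bank%20of%20America', 'Bank+of+America', 'Bank_of_America',
#  'BankofAmerica', 'BoA']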

Using these variants of the specified entity names, we find the entity name in 99.9%
of the entities' homepages. This indicates that web designers almost always name
homepage URLs after the related entity names.

2.6. Logistic Regression Model for Homepage Finding

The model an ME classifier learns from the training material is not easy to interpret,
so we also leverage a logistic regression model for homepage finding. The
performance of the two models is compared in Section 2.10.
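Not our trainer, but an illustration of the interpretability argument: a logistic
regression over Table 2.1 attributes exposes one readable weight per attribute. Here
scikit-learn and the tiny training matrix are stand-ins, assumed for the sketch.

import numpy as np
from sklearn.linear_model import LogisticRegression

# toy training matrix: [url_length, url_depth, entity_in_url, keyword_in_title]
X = np.array([[26, 3, 1, 1], [25, 3, 1, 0], [80, 6, 0, 0], [95, 7, 0, 0]])
y = np.array([1, 1, 0, 0])   # 1 = homepage, 0 = not a homepage

model = LogisticRegression().fit(X, y)
for name, w in zip(["url_length", "url_depth", "entity_in_url",
                    "keyword_in_title"], model.coef_[0]):
    print("%-18s %+0.3f" % (name, w))   # sign/size of each attribute's pull
print(model.predict_proba([[30, 3, 1, 1]])[0, 1])  # P(homepage) for a new URL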

2.7. Entity Homepage Finding

The procedure for identifying the extracted entities' homepages involves two phases.
In the first phase, we generate a maximum entropy homepage classification model to
predict the probability that a URL is the specified entity's homepage. The second
phase employs a mixture of context language models, which is easily expressed in
the Indri query language, to find the web page most relevant to the specified entity.
For example, for the entity name "BlackBerry", the following query is constructed:
#wsum(5.0 #1(BlackBerry).(title) 3.0 #1(BlackBerry).(anchor) 1.0
#1(BlackBerry))
After sending the constructed query to Indri, we obtain many homepage candidates.
We then employ the previously generated homepage model to predict the probability
that each webpage is the specified entity's homepage. Since the task requires
returning at most three URLs per entity name, we rank the homepage candidates by
predicted probability in descending order and select the top 3, as in the sketch below.
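A sketch of this phase, assuming a callable that returns the classifier's probability
for a URL (function names are illustrative):

def homepage_query(name, w_title=5.0, w_anchor=3.0, w_body=1.0):
    # Builds the weighted #wsum query shown above for a given entity name.
    t = "#1(%s)" % name
    return "#wsum(%.1f %s.(title) %.1f %s.(anchor) %.1f %s)" % (
        w_title, t, w_anchor, t, w_body, t)

def top_homepages(candidates, probability, k=3):
    """candidates: URLs returned by Indri; probability: the ME model."""
    return sorted(candidates, key=probability, reverse=True)[:k]

print(homepage_query("BlackBerry"))
# #wsum(5.0 #1(BlackBerry).(title) 3.0 #1(BlackBerry).(anchor) 1.0 #1(BlackBerry))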

2.8. Related Entity Reranking

At this stage, we have already extracted related entities and their homepages. We
take a simple approach to reranking: we filter out entities all of whose homepage
probability scores are below 0.5, which represents that the entity has no homepage
at all. Since, by definition, every entity has at least one homepage, entities judged to
have no homepage are dropped. Of course, this decision depends on the precision of
the homepage ME classifier and on the coverage of the corpus used. The filter
reduces to the sketch below.
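A minimal sketch of the filter (entity names and scores are toy data):

def rerank(entities):
    """entities: list of (name, [homepage probability scores]).
    Keep an entity only if some candidate scores at least 0.5."""
    return [(name, scores) for name, scores in entities
            if any(p >= 0.5 for p in scores)]

print(rerank([("Verizon", [0.91, 0.40]), ("Foo Corp", [0.21, 0.08])]))
# [('Verizon', [0.91, 0.4])]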

2.9. Experimental Setup

We ran our index builds and queries on an IBM 366 server, with the data located on
SCSI disks in a RAID5 array. For convenient handling, we divided the index into six
sections, which occupy 649 GB of disk space in total. The index size is the total size
of the index on disk, including both the inverted file and the compressed collection.
In total, 50,220,423 documents were indexed. Because of the slow indexing speed,
we did not index anchor text, only the title and heading fields. Documents are
stemmed with the Krovetz stemmer and stopped using a standard list of 421 common
terms. The indexed metadata comprise docno and url.
We used the full collection and simply handled all documents as HTML documents.
That is, we did not apply any special treatment of document types, nor did we
exploit any internal document structure that may be present; instead, we represented
all documents as plain text. We did not give special consideration to the Wikipedia
documents in the corpus.
Supporting documents. When we extract context from a document, we also store
its document id and the entity the context belongs to. We return up to 3 supporting
documents from this set for each entity.

2.10. Results and Discussions

The official test set contained only 20 queries. Given that this is a new task with a
new collection and a relatively small number of topics, evaluation primarily focuses
on analyzing the results and runs on a per-topic basis rather than on average
measures. The normalized discounted cumulative gain (nDCG) is an important
measure in the official results. There are several ways to evaluate the system; for
example, by the relevance of the normal homepages, the Wikipedia homepages, or
the names of entities. Table 2.3 lists the evaluation results of our two runs (we only
submitted BITDLDE09Run) evaluated with different output fields. NAME denotes
the names of result entities, HP denotes the normal homepages of result entities and
WIKI denotes the Wikipedia pages of result entities. The official results are the
values in the first row: the runs were evaluated according to the relevance of the
entities' primary homepages, without taking the Wikipedia homepages or entity
names into consideration. In the other rows, we combine different output fields to
test the change in performance. The results show that the performance of each run
improves when Wikipedia homepages are considered in the evaluation; the reason is
that Wikipedia pages are of high quality and make it easy for the system to find the
homepage. On the other hand, using entity names in the evaluation decreases
performance, which shows that directly finding entity names is more difficult than
returning whole web pages.

Table 2.3. nDCG and P@10 results of our runs with different output fields in the Entity track

RunID          BITDLDE09Run        LogisticRegressionRun
               nDCG     P@10       nDCG     P@10
HP             0.0416   0.0200     0.0499   0.0400
HP+NAME        0.0379   0.0200     0.0471   0.0400
HP+WIKI        0.0705   0.1250     0.0895   0.1150
HP+NAME+WIKI   0.0731   0.1250     0.0879   0.1150
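For reference, a minimal sketch of the two reported metrics over a single topic's
ranking; this self-normalized nDCG (ideal ranking taken from the returned gains
only) is a simplification of the official measure, and the gain values are toy data.

import math

def ndcg(gains, k=None):
    """Normalized discounted cumulative gain over a ranked gain list."""
    gains = gains[:k]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

def p_at_10(gains):
    # Fraction of the top 10 returned entities with any relevance.
    return sum(1 for g in gains[:10] if g > 0) / 10.0

ranking = [2, 0, 1, 0, 0, 0, 0, 0, 1, 0]     # gains of returned entities
print(ndcg(ranking), p_at_10(ranking))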

We computed retrieval statistics without discarding any homepages for the identified
entities. They show that the retrieval ratio is low (0.3024 for all relevant documents
and 0.0898 for primary homepages). Obviously, the query formulations need to be
improved to raise recall substantially; the low recall greatly hurts the performance of
the subsequent homepage finding step.
In future work, there are a number of things for us to explore. First, we will explore
more efficient ways to construct queries automatically so as to greatly improve
recall. This may be based on the assumption described by Clarke: query terms are
likely to appear in close proximity to each other within relevant documents. This
technique was applied by Donald Metzler et al. in the TREC 2004 Terabyte track,
which was used to evaluate the Indri search engine [2]. Second is homepage finding
for a specified named entity: [3] found a prior based on the type of the URL to be a
very effective source of information, and our maximum entropy classifier will take
the different distributions of the different entity types into consideration. Third, we
will employ language models to construct an entity model from the selected
homepage candidates to improve recall.

Acknowledgments. This work is supported by a grant from the Chinese National
Natural Science Foundation (No. 60705022).

3. References

[1] TREC 2009 Entity Track guidelines. http://ilps.science.uva.nl/trec-entity/guidelines/
[2] D. Metzler, T. Strohman, H. Turtle, and W. B. Croft. Indri at TREC 2004:
Terabyte Track. In Proceedings of the Thirteenth Text REtrieval Conference
(TREC 2004), 2004.
[3] W. Kraaij, T. Westerveld, and D. Hiemstra. The importance of prior probabilities
for entry page search. In Proceedings of the Twenty-Fifth Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR '02), 2002.
