Term Vector Model for Home Sales

This document calculates a term vector model to rank documents for a given query. It shows the term frequency-inverse document frequency (TF-IDF) calculation for the terms in four documents and one query, then ranks the documents by the cosine similarity between the query vector and each document vector.

Uploaded by

Bini Teflon Ankh

Term vector model based on Wi=TFi*IDFi

Query Q=Home sales


Doc 1 = New home to home sales forecasts
Doc 2 = Rise in home sales in July
Doc 3 = Home purchase rise in July for new homes
Doc 4 = Month of new sales rise
D=4 (number of documents)
IDFi=log10(D/Dfi)
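The IDF values in the table that follows are consistent with a base-10 logarithm. A minimal sketch of the formula, assuming D = 4 and the Dfi values 1, 2 and 3 that appear in the sheet (the function name `idf` is illustrative, not from the original):

```python
import math

D = 4  # number of documents in the collection


def idf(dfi: int, d: int = D) -> float:
    """IDFi = log10(D / Dfi), as in the formula above."""
    return math.log10(d / dfi)


# Print the IDF for each Dfi value used in the sheet's table.
for dfi in (1, 2, 3):
    print(f"Dfi={dfi}  D/Dfi={D / dfi:.2f}  IDFi={idf(dfi):.6f}")
```

The "homes" row is not reproduced here: the sheet lists D/Dfi = 1.00 and IDFi = 0 for it, and those values are taken as given.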

Terms       Counts (TF)              Dfi  D/Dfi  IDFi      Wi=TFi*IDFi
            Q   D1  D2  D3  D4                             Q
homes       1   2   1   2   0        5    1.00   0         0
in          0   0   2   1   0        3    1.33   0.124939  0
rise        0   0   1   1   1        3    1.33   0.124939  0
sales       1   1   1   0   1        3    1.33   0.124939  0.124939
to          0   1   0   0   0        1    4.00   0.60206   0
for         0   0   0   1   0        1    4.00   0.60206   0
forecasts   0   1   0   0   0        1    4.00   0.60206   0
july        0   0   1   1   0        2    2.00   0.30103   0
month       0   0   0   0   1        1    4.00   0.60206   0
new         0   1   0   1   1        3    1.33   0.124939  0
of          0   0   0   0   1        1    4.00   0.60206   0
purchase    0   0   0   1   0        1    4.00   0.60206   0

Terms      Q         D1        D2        D3        D4        D1i^2     D2i^2     D3i^2     D4i^2     Qi^2
homes      0         0         0         0         0         0         0         0         0         0
in         0         0         0.249877  0.124939  0         0         0.062439  0.01561   0         0
rise       0         0         0.124939  0.124939  0.124939  0         0.01561   0.01561   0.01561   0
sales      0.124939  0.124939  0.124939  0         0.124939  0.01561   0.01561   0         0.01561   0.01561
to         0         0.60206   0         0         0         0.362476  0         0         0         0
for        0         0         0         0.60206   0         0         0         0.362476  0         0
forecasts  0         0.60206   0         0         0         0.362476  0         0         0         0
july       0         0         0.30103   0.30103   0         0         0.090619  0.090619  0         0
month      0         0         0         0         0.60206   0         0         0         0.362476  0
new        0         0.124939  0         0.124939  0.124939  0.01561   0         0.01561   0.01561   0
of         0         0         0         0         0.60206   0         0         0         0.362476  0
purchase   0         0         0         0.60206   0         0         0         0.362476  0         0
sum                                                          0.756172  0.184277  0.862401  0.771782  0.01561
                                                             |D1|      |D2|      |D3|      |D4|      |Q|
sqrt                                                         0.869581  0.429275  0.928655  0.878511  0.124939
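The weights and vector lengths above can be reproduced with a short sketch. It assumes the raw counts and the IDF column exactly as listed in the sheet (including IDFi = 0 for "homes"). The variable names are illustrative.

```python
import math

terms = ["homes", "in", "rise", "sales", "to", "for",
         "forecasts", "july", "month", "new", "of", "purchase"]

# Raw term counts (TF) per vector, in the same term order as the sheet.
tf = {
    "Q":  [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "D1": [2, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0],
    "D2": [1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "D3": [2, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1],
    "D4": [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0],
}

# IDF column copied from the sheet (assumption: values taken as given).
idf = [0.0, 0.124939, 0.124939, 0.124939, 0.60206, 0.60206,
       0.60206, 0.30103, 0.60206, 0.124939, 0.60206, 0.60206]

# Wi = TFi * IDFi for every term of every vector.
weights = {name: [c * w for c, w in zip(counts, idf)]
           for name, counts in tf.items()}

# Vector length = square root of the sum of squared weights.
norms = {name: math.sqrt(sum(w * w for w in vec))
         for name, vec in weights.items()}

for name, length in norms.items():
    print(f"|{name}| = {length:.6f}")
```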

dot products
Only "sales" has a nonzero weight in Q, so each dot product reduces to the Q weight for "sales" times the document's weight for "sales".
Q*D1      Q*D2      Q*D3      Q*D4
0.01561   0.01561   0         0.01561

similarity score
Sim(q,d1) Sim(q,d2) Sim(q,d3) Sim(q,d4)
0.143677 0.291046 0 0.142216
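The similarity scores follow the usual cosine formula, Sim(Q,Di) = (Q*Di) / (|Q|*|Di|). A minimal sketch, assuming the nonzero Wi values from the sheet stored as sparse dictionaries. The helper name `cosine` is illustrative.

```python
import math

# Nonzero TF-IDF weights copied from the sheet (zero entries omitted).
Q = {"sales": 0.124939}
docs = {
    "D1": {"sales": 0.124939, "to": 0.60206, "forecasts": 0.60206, "new": 0.124939},
    "D2": {"in": 0.249877, "rise": 0.124939, "sales": 0.124939, "july": 0.30103},
    "D3": {"in": 0.124939, "rise": 0.124939, "for": 0.60206,
           "july": 0.30103, "new": 0.124939, "purchase": 0.60206},
    "D4": {"rise": 0.124939, "sales": 0.124939, "month": 0.60206,
           "new": 0.124939, "of": 0.60206},
}


def cosine(q: dict, d: dict) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_q * norm_d)


sims = {name: cosine(Q, vec) for name, vec in docs.items()}
for name, score in sims.items():
    print(f"Sim(Q,{name}) = {score:.6f}")
```

D3 scores 0 because it has no nonzero weight on "sales", the only term carrying weight in Q.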

Rank (1 = most similar to the query)
Doc   Sim(Q,Di)   Rank
D1    0.143677    2
D2    0.291046    1
D3    0           4
D4    0.142216    3

Preferred document is Doc 2
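Ranking by cosine similarity means sorting the scores in descending order, since a larger cosine indicates a closer match to the query. A small sketch, assuming the four values from the "similarity score" row:

```python
# Similarity scores as listed in the "similarity score" row of the sheet.
scores = {"D1": 0.143677, "D2": 0.291046, "D3": 0.0, "D4": 0.142216}

# Sort documents from most to least similar to the query.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for rank, (doc, score) in enumerate(ranking, start=1):
    print(f"{rank}. {doc}  {score:.6f}")
```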


Wi=TFi*IDFi
Terms      D1        D2        D3        D4
homes      0         0         0         0
in         0         0.249877  0.124939  0
rise       0         0.124939  0.124939  0.124939
sales      0.124939  0.124939  0         0.124939
to         0.60206   0         0         0
for        0         0         0.60206   0
forecasts  0.60206   0         0         0
july       0         0.30103   0.30103   0
month      0         0         0         0.60206
new        0.124939  0         0.124939  0.124939
of         0         0         0         0.60206
purchase   0         0         0.60206   0

Doc   |Q|*|Di|    Sim(Q,Di)
D1    0.108644    0.143677
D2    0.053633    0.291046
D3    0.116025    0
D4    0.10976     0.142216
