Ranks are above one #20

poppingtonic · 2017-07-28T14:24:59Z

postgres=# CREATE TRIGGER tsvectorupdate                                                                                                                                                         
postgres-# BEFORE UPDATE OR INSERT ON test_rum                                                                                                                                                   
postgres-# FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');                                                                                               
postgres=# INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');                                                                                                                   
INSERT 0 1                                                                                                                                                                                       
postgres=# INSERT INTO test_rum(t) VALUES ('It is a beautiful');                                                                                                                                 
INSERT 0 1                                                                                                                                                                                       
postgres=# INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');                                                                                                                   
INSERT 0 1                                                                                                                                                                                       
postgres=# select * from test_rum ;                                                                                                                                                              
                t                |                   a                                                                                                                                           
---------------------------------+----------------------------------------                                                                                                                       
 The situation is most beautiful | 'beauti':5 'situat':2                                                                                                                                         
 It is a beautiful               | 'beauti':4                                                                                                                                                    
 It looks like a beautiful place | 'beauti':5 'like':3 'look':2 'place':6                                                                                                                        
(3 rows)                                                                                                                                                                                         
postgres=# WITH tsq as (SELECT to_tsquery('english', 'place|beautiful') AS query) SELECT t, rank FROM (SELECT t, a <=> query AS rank FROM test_rum, tsq WHERE a @@ query = true) matches ORDER BY rank ASC LIMIT 10;
                t                |  rank   
---------------------------------+---------
 It looks like a beautiful place | 8.22467
 The situation is most beautiful | 16.4493
 It is a beautiful               | 16.4493
(3 rows)

poppingtonic · 2017-07-28T14:30:58Z

Also, is there an explanation for why the last two items in the results have the same rank?

za-arthur · 2017-07-30T09:08:13Z

Hello,
What do you mean by the sentence "Ranks are above zero"? Do you expect that they all should be zero?

The last two items have the same rank because the word "beautiful" occurs exactly once in each of them and "place" occurs in neither. The first item has a higher rank (that is, a lower distance) because it also contains the word "place". If you want the length of the documents to be taken into account, you can use the function rum_ts_distance with a normalization argument, but in this case the index won't be used for ordering. For example:

=# WITH tsq as (SELECT to_tsquery('english', 'place|beautiful') AS query) SELECT t, rank FROM (SELECT t, rum_ts_distance(a, query, 1) AS rank FROM test_rum, tsq WHERE a @@ query = true) matches ORDER BY rank ASC LIMIT 10;
                t                |  rank   
---------------------------------+---------
 It is a beautiful               | 11.4018
 It looks like a beautiful place | 13.2371
 The situation is most beautiful | 18.0714
(3 rows)
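
To see how the normalization argument changes the ordering, a query along the following lines puts the <=> distance next to rum_ts_distance for several normalization values. This is only a sketch: it reuses the test_rum table and the query from above, and it assumes that the third argument follows the same flag convention as the normalization argument of PostgreSQL's ts_rank (0 ignores document length, 1 divides by 1 + the logarithm of the length, 2 divides by the length), which this thread does not confirm.

```sql
-- Compare the default <=> distance with rum_ts_distance under
-- different normalization values. The meaning of 0, 1 and 2 is an
-- assumption borrowed from ts_rank's normalization flags.
WITH tsq AS (
    SELECT to_tsquery('english', 'place|beautiful') AS query
)
SELECT t,
       a <=> query                  AS dist_default,
       rum_ts_distance(a, query, 0) AS dist_norm_0,
       rum_ts_distance(a, query, 1) AS dist_norm_1,
       rum_ts_distance(a, query, 2) AS dist_norm_2
FROM test_rum, tsq
WHERE a @@ query
ORDER BY dist_default ASC;
```

Reading the columns side by side should make it easier to pick the normalization that best matches the intended notion of relevance.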

poppingtonic · 2017-07-31T08:18:04Z

Sorry, that should have been 'one'. I was noting the difference between the ranks in the initial examples in README.md and what I get when I run those examples. I'll test with rum_ts_distance and see if there's any improvement. Is there any expected benefit in weighting (multiplying) the <=> operator's results by rum_ts_distance?

poppingtonic · 2017-07-31T08:35:39Z

Reading the code, I see in rum_ts_utils.c that there are three different implementations of the <=> operator, with different normalization methods. How do I configure the <=> operator to work with each, so I can test which is best for my use case?

za-arthur · 2017-07-31T09:14:06Z

Oh, I see your point. I updated README.md. The current version of RUM uses a different function for ranking; the old version used ts_rank, and the two give different results. rum_ts_distance returns the distance between a text and a query: a smaller distance means higher relevance.
Thank you for pointing out this mistake!
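
Because the operator returns a distance, the most relevant rows come first when sorting in ascending order. If a "higher is better" number is easier to read, one option is to take the reciprocal in the query. A minimal sketch, assuming the test_rum table from the first comment; the column name score is only an illustrative alias, and the reciprocal is not claimed to equal what ts_rank would return.

```sql
-- Show the distance and an illustrative "higher is better" score.
-- NULLIF guards against division by zero if a distance were ever 0.
WITH tsq AS (
    SELECT to_tsquery('english', 'place|beautiful') AS query
)
SELECT t,
       a <=> query                             AS distance,
       1.0 / NULLIF(a <=> query, 0::float4)    AS score
FROM test_rum, tsq
WHERE a @@ query
ORDER BY distance ASC;
```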

za-arthur · 2017-07-31T09:22:44Z

> Reading the code, I see in rum_ts_utils.c that there are three different implementations of the <=> operator, with different normalization methods. How do I configure the <=> operator to work with each, so I can test which is best for my use case?

You can use rum_distance_query. For example:

=# CREATE TABLE test_rum( t text, a tsvector );
=# CREATE TRIGGER tsvectorupdate
 BEFORE UPDATE OR INSERT ON test_rum
 FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');
=# CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
=# \copy test_rum(t) from 'rum/data/rum.data'
=# explain analyze SELECT a <=> row(to_tsquery('pg_catalog.english', 'way & (go | half)'), 1)::rum_distance_query, *
FROM test_rum
WHERE a @@ to_tsquery('pg_catalog.english', 'way & (go | half)')
ORDER BY a <=> row(to_tsquery('pg_catalog.english', 'way & (go | half)'), 1)::rum_distance_query;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=20.02..20.03 rows=1 width=143) (actual time=0.166..0.168 rows=2 loops=1)
   Sort Key: ((a <=> ROW('''way'' & ( ''go'' | ''half'' )'::tsquery, 1)::rum_distance_query))
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using rumidx on test_rum  (cost=12.00..20.01 rows=1 width=143) (actual time=0.123..0.144 rows=2 loops=1)
         Index Cond: (a @@ '''way'' & ( ''go'' | ''half'' )'::tsquery)
 Planning time: 0.541 ms
 Execution time: 0.278 ms

But as you can see, it doesn't use the index for sorting (note the separate Sort node above the Index Scan), and neither does rum_ts_distance. The operator classes need to be fixed to support this.
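
For comparison, it can be useful to look at the plan for the same query ordered by the plain tsvector <=> tsquery form. This is a sketch, assuming the table and the rumidx index created above; the thing to check in the output is whether a separate Sort node still appears above the index scan or whether the ordering is handled inside it. No output is shown here because it depends on the installed RUM version.

```sql
-- Same filter as above, but ordered by the plain <=> operator
-- instead of the rum_distance_query form.
EXPLAIN ANALYZE
SELECT a <=> to_tsquery('pg_catalog.english', 'way & (go | half)') AS distance, *
FROM test_rum
WHERE a @@ to_tsquery('pg_catalog.english', 'way & (go | half)')
ORDER BY a <=> to_tsquery('pg_catalog.english', 'way & (go | half)');
```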

za-arthur added the question label Jul 30, 2017

poppingtonic changed the title ~~Ranks are above zero~~ Ranks are above one Jul 31, 2017

za-arthur added a commit that referenced this issue Jul 31, 2017

Issue #20. Fix README.md. That README show result of ts_rank() function.

10e2ea6

Current version of RUM uses another function.

za-arthur closed this as completed Aug 7, 2017
