Ranks are above one #20

Closed
poppingtonic opened this issue Jul 28, 2017 · 6 comments

@poppingtonic

postgres=# CREATE TRIGGER tsvectorupdate                                                                                                                                                         
postgres-# BEFORE UPDATE OR INSERT ON test_rum                                                                                                                                                   
postgres-# FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');                                                                                               
postgres=# INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');                                                                                                                   
INSERT 0 1                                                                                                                                                                                       
postgres=# INSERT INTO test_rum(t) VALUES ('It is a beautiful');                                                                                                                                 
INSERT 0 1                                                                                                                                                                                       
postgres=# INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');                                                                                                                   
INSERT 0 1                                                                                                                                                                                       
postgres=# select * from test_rum ;                                                                                                                                                              
                t                |                   a                                                                                                                                           
---------------------------------+----------------------------------------                                                                                                                       
 The situation is most beautiful | 'beauti':5 'situat':2                                                                                                                                         
 It is a beautiful               | 'beauti':4                                                                                                                                                    
 It looks like a beautiful place | 'beauti':5 'like':3 'look':2 'place':6                                                                                                                        
(3 rows)                                                                                                                                                                                         
postgres=# WITH tsq as (SELECT to_tsquery('english', 'place|beautiful') AS query) SELECT t, rank FROM (SELECT t, a <=> query AS rank FROM test_rum, tsq WHERE a @@ query = true) matches ORDER BY rank ASC LIMIT 10;
                t                |  rank   
---------------------------------+---------
 It looks like a beautiful place | 8.22467
 The situation is most beautiful | 16.4493
 It is a beautiful               | 16.4493
(3 rows)

@poppingtonic
Author

Also, is there an explanation for why the last two items in the results have the same rank?

@za-arthur
Contributor

Hello,
What do you mean by the sentence "Ranks are above zero"? Do you expect that they all should be zero?

The last two items have the same rank because the only matching word in each of them, "beautiful", occurs exactly once. The first item has a higher rank (and a lower distance) because the word "place" also appears in it. If you want to take the length of the documents into account, you can use the function rum_ts_distance, but in that case an index won't be used. For example:

=# WITH tsq as (SELECT to_tsquery('english', 'place|beautiful') AS query) SELECT t, rank FROM (SELECT t, rum_ts_distance(a, query, 1) AS rank FROM test_rum, tsq WHERE a @@ query = true) matches ORDER BY rank ASC LIMIT 10;
                t                |  rank   
---------------------------------+---------
 It is a beautiful               | 11.4018
 It looks like a beautiful place | 13.2371
 The situation is most beautiful | 18.0714
(3 rows)
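
To see the effect of the normalization argument row by row, the two distances can be selected side by side. This is only a sketch that reuses the test_rum table and the query from this thread; the column aliases are made up for illustration, and no particular output is implied.

```sql
-- Sketch: compare the default <=> distance with rum_ts_distance(a, query, 1)
-- for the same rows. Aliases are illustrative, not part of RUM.
WITH tsq AS (
    SELECT to_tsquery('english', 'place|beautiful') AS query
)
SELECT t,
       a <=> query                  AS default_distance,
       rum_ts_distance(a, query, 1) AS normalized_distance
FROM test_rum, tsq
WHERE a @@ query
ORDER BY default_distance ASC
LIMIT 10;
```

Rows that tie on default_distance but differ on normalized_distance are the ones where document length changes the ordering.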

@poppingtonic poppingtonic changed the title Ranks are above zero Ranks are above one Jul 31, 2017
@poppingtonic
Author

poppingtonic commented Jul 31, 2017

Sorry, that should have been 'one'. I was noting the difference between the ranks in the initial examples in README.md and what I get when I run those examples. I'll test with rum_ts_distance and see if there's any improvement. Is there any expected benefit in weighting (multiplying) the <=> operator's results by rum_ts_distance?

@poppingtonic
Author

Reading the code, I see in rum_ts_utils.c that there are three different implementations of the <=> operator, with different normalization methods. How do I configure the <=> operator to work with each, so I can test which is best for my use case?

za-arthur added a commit that referenced this issue Jul 31, 2017
Current version of RUM uses another function.
@za-arthur
Contributor

Oh, I see your point. I have updated README.md. The current version of RUM uses a different function for ranking; the old version used ts_rank, and the two give different results. rum_ts_distance returns the distance between a text and a query, so a smaller distance means higher relevance.
Thank you for pointing out this mistake!
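
The difference in direction can be checked by selecting both values for the same rows. The sketch below assumes the test_rum table from the first comment and uses PostgreSQL's built-in ts_rank next to RUM's <=> operator; the exact numbers depend on the installed RUM version, so none are shown.

```sql
-- Sketch: ts_rank is a relevance score (higher is better),
-- while <=> is a distance (lower is better).
WITH tsq AS (
    SELECT to_tsquery('english', 'place|beautiful') AS query
)
SELECT t,
       ts_rank(a, query) AS rank,
       a <=> query       AS distance
FROM test_rum, tsq
WHERE a @@ query
ORDER BY distance ASC;
```

Ordering by distance ASC here plays the role that ordering by rank DESC plays with ts_rank, which is why values above one are not an error in themselves.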

@za-arthur
Contributor

> Reading the code, I see in rum_ts_utils.c that there are three different implementations of the <=> operator, with different normalization methods. How do I configure the <=> operator to work with each, so I can test which is best for my use case?

You can use rum_distance_query. For example:

=# CREATE TABLE test_rum( t text, a tsvector );
=# CREATE TRIGGER tsvectorupdate
 BEFORE UPDATE OR INSERT ON test_rum
 FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');
=# CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
=# \copy test_rum(t) from 'rum/data/rum.data'
=# explain analyze SELECT a <=> row(to_tsquery('pg_catalog.english', 'way & (go | half)'), 1)::rum_distance_query, *
FROM test_rum
WHERE a @@ to_tsquery('pg_catalog.english', 'way & (go | half)')
ORDER BY a <=> row(to_tsquery('pg_catalog.english', 'way & (go | half)'), 1)::rum_distance_query;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=20.02..20.03 rows=1 width=143) (actual time=0.166..0.168 rows=2 loops=1)
   Sort Key: ((a <=> ROW('''way'' & ( ''go'' | ''half'' )'::tsquery, 1)::rum_distance_query))
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using rumidx on test_rum  (cost=12.00..20.01 rows=1 width=143) (actual time=0.123..0.144 rows=2 loops=1)
         Index Cond: (a @@ '''way'' & ( ''go'' | ''half'' )'::tsquery)
 Planning time: 0.541 ms
 Execution time: 0.278 ms

But as you can see, it doesn't use the index for sorting, and neither does rum_ts_distance. The operator classes need to be fixed to make that possible.
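
For comparison, the same query can be planned with a plain tsquery on the right-hand side of <=> instead of a rum_distance_query. This is a sketch against the table and index created above; whether the ordering is then served by the index depends on the RUM version and operator class, so the plan has to be inspected rather than assumed.

```sql
-- Sketch: order by the plain <=> distance and inspect the plan.
-- A separate Sort node above the index scan means the index was
-- not used for ordering.
EXPLAIN ANALYZE
SELECT a <=> to_tsquery('pg_catalog.english', 'way & (go | half)') AS distance, *
FROM test_rum
WHERE a @@ to_tsquery('pg_catalog.english', 'way & (go | half)')
ORDER BY a <=> to_tsquery('pg_catalog.english', 'way & (go | half)');
```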
