Fixed the ONNX encoder truncation bug: the [SEP] token at the end of long queries is truncated. #3037
Thanks @lilyjge - can you link to the issues where we discussed this?
I can't seem to find the issue but here is the PR: #2936
hi @FarmersWrap can you run the BEIR regressions here https://github.com/castorini/anserini/blob/master/docs/fatjar-regressions/fatjar-regressions-v1.3.0.md#beir Scores (for arguana at least) must have changed. Let's record the changes as part of the PR?
@FarmersWrap and please add a test case?
| `robust04` | 0.4070 | 0.4070 | 0.4952 | 0.4435 | 0.4437 |
| `arguana` | 0.3970 | 0.4142 | 0.4845 | 0.6228 | 0.6228 |
| `arguana` | 0.3970 | 0.4142 | 0.4862 | 0.6375 | 0.6375 |
| `webis-touche2020` | 0.4422 | 0.3673 | 0.3086 | 0.2571 | 0.2571 |
This shouldn't change fatjar-regressions-v1.3.0.md.
Please create a new file called fatjar-regressions-v1.3.1-SNAPSHOT.md and change it there? This will get renamed when the next release goes out.
Thanks for the feedback. I haven't created a new file because I haven't checked MS MARCO and BRIGHT yet.
To keep the review simple, I changed fatjar-regressions-v1.3.0.md directly.
That's fine, please create fatjar-regressions-v1.3.1-SNAPSHOT.md anyway - it's only a SNAPSHOT, we'll check when 1.4 gets released, and rename fatjar-regressions-v1.3.1-SNAPSHOT.md to fatjar-regressions-v1.4.0.md
// Tests the convertTokensToIds method with empty query.
@Test
public void testConvertTokensToIdsEmptyQuery() throws Exception {
  try (io.anserini.encoder.dense.BgeBaseEn15Encoder encoder =
Keep on a single line - I'm okay with long lines.
This and below.
Force-pushed from 507112d to db25aa7.
@@ -0,0 +1,471 @@
# Anserini Fatjar Regresions (v1.3.0)
Please rename to anserini-1.3.1-SNAPSHOT.md to match what's in target/?
My bad.
I left `wget https://repo1.maven.org/maven2/io/anserini/anserini/1.3.0/anserini-1.3.0-fatjar.jar` in there. It will need to be changed in the future.
Truncating the trailing [SEP] token leads to bad encodings for long queries:
[CLS] LONG SENTENCE [SEP] -> [CLS] LONG
A fix for "Reconcile BGE flat faiss and flat Lucene results for BEIR".
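To make the failure mode concrete, here is a minimal sketch of the difference between cutting the id sequence at the maximum length and truncating while keeping the final [SEP]. It is not the code from this PR: the class name, the method names, and the token ids (101 for [CLS], 102 for [SEP], as in the standard BERT uncased vocabulary) are assumptions made for illustration only.

```java
import java.util.Arrays;

// Illustrative sketch only; not the actual Anserini encoder code.
final class SepPreservingTruncation {
  // Assumed special-token ids (BERT uncased vocabulary convention).
  static final long CLS_ID = 101L;
  static final long SEP_ID = 102L;

  // Naive truncation: cut the whole sequence at maxLen.
  // For long inputs this silently drops the trailing [SEP].
  static long[] naiveTruncate(long[] ids, int maxLen) {
    return ids.length <= maxLen ? ids.clone() : Arrays.copyOf(ids, maxLen);
  }

  // Truncation that keeps the last token of the input (assumed to be [SEP])
  // in the final slot of the truncated sequence.
  static long[] truncateKeepingSep(long[] ids, int maxLen) {
    if (maxLen < 2) {
      throw new IllegalArgumentException("maxLen must leave room for [CLS] and [SEP]");
    }
    if (ids.length <= maxLen) {
      return ids.clone();
    }
    long[] out = Arrays.copyOf(ids, maxLen);
    out[maxLen - 1] = ids[ids.length - 1];
    return out;
  }

  public static void main(String[] args) {
    long[] ids = {CLS_ID, 7, 8, 9, 10, 11, SEP_ID};
    System.out.println(Arrays.toString(naiveTruncate(ids, 4)));      // [101, 7, 8, 9]
    System.out.println(Arrays.toString(truncateKeepingSep(ids, 4))); // [101, 7, 8, 102]
  }
}
```

The second variant spends one slot of the length budget on [SEP], so one fewer content token survives, but the model sees a well-formed `[CLS] ... [SEP]` input like the ones it was trained on.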