Conversation

@FarmersWrap
Contributor

@FarmersWrap FarmersWrap commented Nov 17, 2025

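When a long query is truncated to the maximum length, the trailing [SEP] token is cut off together with the overflow. This leads to bad encodings for long queries:
[CLS] LONG SENTENCE [SEP] -> [CLS] LONG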

A fix for the issue "Reconcile BGE flat faiss and flat Lucene results for BEIR".
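
The truncation problem described above can be illustrated with a minimal sketch. This is not the Anserini encoder code: the class name `TruncationSketch`, both method names, and the token IDs (101 for [CLS], 102 for [SEP], as in common BERT vocabularies) are assumptions made for illustration only. The sketch contrasts cutting the sequence after the special tokens were added with reserving two slots for them before cutting.

```java
import java.util.ArrayList;
import java.util.List;

public class TruncationSketch {
  // Assumed special-token IDs (typical BERT vocabulary values, not taken from the PR).
  static final long CLS_ID = 101L;
  static final long SEP_ID = 102L;

  // Problematic order: add [CLS] and [SEP] first, then cut to maxLength.
  // For long inputs the cut removes the trailing [SEP].
  static List<Long> truncateNaively(List<Long> tokenIds, int maxLength) {
    List<Long> ids = new ArrayList<>();
    ids.add(CLS_ID);
    ids.addAll(tokenIds);
    ids.add(SEP_ID);
    return new ArrayList<>(ids.subList(0, Math.min(ids.size(), maxLength)));
  }

  // Safer order: reserve two slots for [CLS] and [SEP], cut the content, then wrap it.
  static List<Long> truncateKeepingSep(List<Long> tokenIds, int maxLength) {
    if (maxLength < 2) {
      throw new IllegalArgumentException("maxLength must be at least 2");
    }
    int keep = Math.min(tokenIds.size(), maxLength - 2);
    List<Long> ids = new ArrayList<>();
    ids.add(CLS_ID);
    ids.addAll(tokenIds.subList(0, keep));
    ids.add(SEP_ID);
    return ids;
  }

  public static void main(String[] args) {
    // Six hypothetical content token IDs, longer than the limit of five.
    List<Long> query = List.of(2001L, 2002L, 2003L, 2004L, 2005L, 2006L);
    int maxLength = 5;
    System.out.println("naive: " + truncateNaively(query, maxLength));    // [101, 2001, 2002, 2003, 2004]
    System.out.println("fixed: " + truncateKeepingSep(query, maxLength)); // [101, 2001, 2002, 2003, 102]
  }
}
```

Both variants return at most `maxLength` IDs; only the second one always ends with [SEP], which is the property a regression test for this kind of fix would likely want to check.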

@FarmersWrap FarmersWrap changed the title from "fixed the ONNX encoder truncation bug: [SEP] at the end of long queries are truncated." to "Fixed the ONNX encoder truncation bug: [SEP] at the end of long queries are truncated." on Nov 17, 2025
@lintool
Member

lintool commented Nov 17, 2025

@lilyjge @clides I recall you having grappled with a related issue before?

@lilyjge
Member

lilyjge commented Nov 17, 2025

@lintool
Member

lintool commented Nov 17, 2025

Thanks @lilyjge - can you link to the issues where we discussed this?

@lilyjge
Member

lilyjge commented Nov 17, 2025

I can't seem to find the issue but here is the PR: #2936

@lintool
Member

lintool commented Nov 17, 2025

hi @FarmersWrap can you run the BEIR regressions here https://github.com/castorini/anserini/blob/master/docs/fatjar-regressions/fatjar-regressions-v1.3.0.md#beir

Scores (for arguana at least) must have changed. Let's record the changes as part of the PR?

@lintool
Member

lintool commented Nov 17, 2025

@FarmersWrap and please add a test case?

  | `robust04` | 0.4070 | 0.4070 | 0.4952 | 0.4435 | 0.4437 |
- | `arguana` | 0.3970 | 0.4142 | 0.4845 | 0.6228 | 0.6228 |
+ | `arguana` | 0.3970 | 0.4142 | 0.4862 | 0.6375 | 0.6375 |
  | `webis-touche2020` | 0.4422 | 0.3673 | 0.3086 | 0.2571 | 0.2571 |
Member

This shouldn't change fatjar-regressions-v1.3.0.md.

Please create a new file called fatjar-regressions-v1.3.1-SNAPSHOT.md and change it there? This will get renamed when the next release goes out.

Contributor Author

Thanks for the feedback. I didn't create a new file because I haven't checked MS MARCO and BRIGHT.
To keep the review simple, I changed fatjar-regressions-v1.3.0.md directly.

Member

That's fine, please create fatjar-regressions-v1.3.1-SNAPSHOT.md anyway. It's only a SNAPSHOT; we'll check when 1.4 gets released and rename fatjar-regressions-v1.3.1-SNAPSHOT.md to fatjar-regressions-v1.4.0.md.

// Tests the convertTokensToIds method with empty query.
@Test
public void testConvertTokensToIdsEmptyQuery() throws Exception {
try (io.anserini.encoder.dense.BgeBaseEn15Encoder encoder =
Member

Keep on a single line - I'm okay with long lines.

This and below.

@@ -0,0 +1,471 @@
# Anserini Fatjar Regresions (v1.3.0)
Member

Please rename to anserini-1.3.1-SNAPSHOT.md to match what's in target/?

Contributor Author

My bad.

Contributor Author

I left wget https://repo1.maven.org/maven2/io/anserini/anserini/1.3.0/anserini-1.3.0-fatjar.jar in there. It will need to be changed in the future.

@FarmersWrap FarmersWrap marked this pull request as ready for review November 19, 2025 20:01
@lintool lintool merged commit a992c62 into castorini:master Nov 19, 2025
1 check passed