@lolipopshock
Collaborator

When a paper contains a verbatim occurrence of a special token (e.g., [SEP] or [BLK]), the current code cannot handle it properly after #29. One interesting example, reported in #31: parsing our own VILA paper fails on page 2, where the text [BLK] appears multiple times. This PR proposes a simple fix: remove the square brackets [ and ] from the text before passing it into the models.
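
A minimal sketch of the idea, not the code merged in this PR: strip the square brackets from each word so that a literal [BLK] or [SEP] in the paper is no longer read as a special token. The function names `strip_square_brackets` and `sanitize_words` are hypothetical, and the fallback for bracket-only words is an assumption added here so the word list keeps its length.

```python
from typing import List


def strip_square_brackets(word: str) -> str:
    """Remove every '[' and ']' character from a single word."""
    return word.replace("[", "").replace("]", "")


def sanitize_words(words: List[str]) -> List[str]:
    """Return a copy of `words` with square brackets removed.

    Assumption (not from the PR): a word made only of brackets is kept
    unchanged, so the output stays aligned one-to-one with the input
    (e.g. with per-word bounding boxes).
    """
    cleaned = []
    for word in words:
        stripped = strip_square_brackets(word)
        cleaned.append(stripped if stripped else word)
    return cleaned


if __name__ == "__main__":
    page_words = ["the", "[BLK]", "token", "and", "[SEP]"]
    print(sanitize_words(page_words))
```

The trade-off is that legitimate bracketed text, such as a citation marker like [12], also loses its brackets before reaching the model.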

@lolipopshock lolipopshock changed the title from "Replace special tokens to normal tokens before passing into models" to "Replace special tokens to normal text before passing into models" Nov 14, 2022
@lolipopshock lolipopshock merged commit 5c431e7 into main Nov 14, 2022
@lolipopshock lolipopshock mentioned this pull request Dec 6, 2022
