Allow user to define unknown token symbol #10461
    vocab = {invalid_key: invalid_label}
    new_vocab = True
elif unknown_token:
    new_vocab = True
Hi, there is a situation where users have their own dictionary, say dict = {'a': 1, 'b': 2, 'c': 3}, where 'a', 'b', and 'c' are the frequent tokens the user cares about. All the remaining rare tokens are treated as a single unknown token (say the user defines it as 'UNK'), so encoding returns a list such as [[1,2,3],[2,3,0]], and the key-value pair 'UNK': 0 is added to the dictionary.
But the previous version raises an error in this case, because by default it assumes that the user will provide a complete vocabulary.
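To make the use case concrete, here is a minimal standalone sketch of the intended behaviour. It is not the code in this PR: the helper name `encode_with_unk` and its `unknown_label` parameter are hypothetical and exist only for illustration.

```python
def encode_with_unk(sentences, vocab, unknown_token="UNK", unknown_label=0):
    """Encode token sequences; tokens missing from `vocab` map to `unknown_token`.

    Hypothetical helper for illustration only, not part of any library.
    """
    vocab = dict(vocab)  # copy so the caller's dictionary is not mutated
    encoded = []
    for sentence in sentences:
        row = []
        for token in sentence:
            if token not in vocab:
                # Register the unknown symbol once, then reuse its label.
                vocab.setdefault(unknown_token, unknown_label)
                token = unknown_token
            row.append(vocab[token])
        encoded.append(row)
    return encoded, vocab


encoded, vocab = encode_with_unk(["abc", "bcd"], {"a": 1, "b": 2, "c": 3})
print(encoded)  # [[1, 2, 3], [2, 3, 0]]
print(vocab)    # {'a': 1, 'b': 2, 'c': 3, 'UNK': 0}
```

Here 'd' is not in the user's dictionary, so it is encoded with the label of 'UNK' instead of triggering an error.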
You should change the assertion to ignore cases where unknown_token is given, instead of setting new_vocab to True.
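A rough sketch of what the assertion-based approach could look like is below. The names `vocab`, `invalid_key`, `invalid_label`, `new_vocab`, and `unknown_token` come from the diff; the function name `encode_sentences_sketch`, the `sentences` and `start_label` parameters, and the choice to substitute the unknown symbol only when the vocabulary is user-supplied are assumptions made for illustration, not necessarily what was merged.

```python
def encode_sentences_sketch(sentences, vocab=None, invalid_label=-1,
                            invalid_key="\n", start_label=0, unknown_token=None):
    """Illustrative sketch only; not the library implementation."""
    idx = start_label
    if vocab is None:
        vocab = {invalid_key: invalid_label}
        new_vocab = True
    else:
        new_vocab = False  # stays False even when unknown_token is given
    res = []
    for sent in sentences:
        coded = []
        for word in sent:
            if word not in vocab:
                # Fail only when the vocabulary is fixed AND no unknown symbol was given.
                assert new_vocab or unknown_token, "Unknown token %s" % word
                if not new_vocab:
                    # Assumption: with a fixed vocabulary, fold rare words into the unknown symbol.
                    word = unknown_token
                if word not in vocab:
                    if idx == invalid_label:
                        idx += 1
                    vocab[word] = idx
                    idx += 1
            coded.append(vocab[word])
        res.append(coded)
    return res, vocab


res, vocab = encode_sentences_sketch(["abc", "bcd"], {"a": 1, "b": 2, "c": 3},
                                     unknown_token="UNK")
print(res)  # [[1, 2, 3], [2, 3, 0]]
```

Note that this sketch does not check whether `start_label` collides with a label already used in the supplied vocabulary; a caller would need to pick a free value.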
Sorry for the late reply (I was in exams). I have just fixed it according to your suggestion.
@ShootingSpace thanks for adding a test
test case added
Description
Adds a new feature for issue #10068. It allows an unknown token to be added to the vocab if the user provides a vocabulary and specifies a symbol (e.g. 'UNK'). It also introduces a new default behaviour of ignoring unknown tokens, instead of the present behaviour of throwing an error.