This is an excellent article. I have one question: how do you handle tokens that appear during tokenization but were never seen in the training set? Because of extreme events in time series, the test set is likely to contain values (and thus tokens) that never occurred in the training data.
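To make the concern concrete, here is a minimal sketch (my own illustration, not from the article) of a quantile-bin tokenizer, which many time-series models use as a tokenization scheme. Out-of-range test values are clipped into the training range, so an extreme value still maps to an existing edge token rather than an unseen one; all names here are hypothetical.

```python
import numpy as np

class BinTokenizer:
    """Hypothetical quantile-bin tokenizer for illustration."""

    def __init__(self, num_bins=16):
        self.num_bins = num_bins
        self.edges = None

    def fit(self, train_values):
        # Bin edges come from training-set quantiles, so the
        # vocabulary is fixed by the training distribution.
        qs = np.linspace(0, 1, self.num_bins + 1)
        self.edges = np.quantile(train_values, qs)

    def tokenize(self, values):
        # Clip extreme test values into the training range, then
        # assign each value a bin index (token id) in [0, num_bins - 1].
        clipped = np.clip(values, self.edges[0], self.edges[-1])
        return np.digitize(clipped, self.edges[1:-1])

train = np.random.default_rng(0).normal(size=1000)
tok = BinTokenizer()
tok.fit(train)
# An extreme out-of-range value (e.g. 100.0) still gets a valid edge token.
tokens = tok.tokenize(np.array([0.0, 100.0, -100.0]))
```

Clipping keeps every token in-vocabulary but collapses all extreme events onto the edge bins, which loses information about their magnitude; a dedicated rare/overflow token is another common option. I would be curious which approach the authors take.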