@tamewild tamewild commented Mar 28, 2025

It was mistakenly added in 4a66f8b.

image

@Datta0
Collaborator

Datta0 commented Apr 5, 2025

Hey @tamewild, can you please add samples from before and after the change?

@mmathew23
Collaborator

@danielhanchen @Datta0 This is valid. next(iter(dataset)) gives you the first example, and indexing it with dataset_text_field gives you the text to tokenize. We don't want the first character; we want the whole string, so that we can check whether it starts with the bos_token (via startswith).
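
A minimal before/after sketch of the behavior described above. It is not the repository's actual code: the list of dicts standing in for the dataset, the "text" field name, and the "<s>" BOS token are all assumptions made for illustration.

```python
# Hypothetical stand-in for a dataset: any iterable of dict-like examples.
dataset = [{"text": "<s>Hello world"}, {"text": "<s>Second example"}]
dataset_text_field = "text"  # assumed field name
bos_token = "<s>"            # assumed BOS token, for illustration only

# First example in the dataset.
first_example = next(iter(dataset))

# Before: the extra [0] keeps only the first character of the text,
# so the check can never match a multi-character BOS token.
first_char = first_example[dataset_text_field][0]
check_before = first_char.startswith(bos_token)

# After: the whole string is checked against the BOS token.
full_text = first_example[dataset_text_field]
check_after = full_text.startswith(bos_token)

print(repr(first_char), check_before)
print(repr(full_text), check_after)
```

With a multi-character BOS token, the single-character check reports that the token is absent even when the text begins with it, while the full-string check detects it.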
