wojiaodoubao
Contributor

No description provided.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 1, 2025
@wojiaodoubao
Contributor Author

Hi @jackye1995, could you help review this document update when you have time? Thanks very much!


The full text search index supports multiple tokenizer types for different text processing needs:
The full text search index supports multiple tokenizer types for different text processing needs.
There are two different tokenizer configurations: ```lance_tokenizer``` and ```base_tokenizer```.
Contributor

No need for triple backquotes; single backquotes are fine.


#### Text Tokenizer
Text Tokenizer is responsible for handling TEXT-type data, which is Utf8, LargeUtf8 or List of them in arrow format.
The Text Tokenizer behaves consistently in both "query" and "document parsing" scenarios, which means that if a document
Contributor

I don't think we need double quotes for "query" and "document parsing"

Contributor

Also, the same comment applies to a few other cases below.

#### Text Tokenizer
Text Tokenizer is responsible for handling TEXT-type data, which is Utf8, LargeUtf8 or List of them in arrow format.
The Text Tokenizer behaves consistently in both "query" and "document parsing" scenarios, which means that if a document
contains the word "lance," we can retrieve it using a query with "lance."
Contributor

Should these be "lance", and "lance". (punctuation outside the quotes)?
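To illustrate why the tokenizer must behave the same way at indexing and query time, here is a minimal Python sketch. The `tokenize`, `build_index`, and `search` functions are hypothetical stand-ins (lowercase, then split on non-alphanumeric characters), not Lance's actual Text Tokenizer; they only show that trailing punctuation such as the comma in "lance," is stripped on both sides, so a query for "lance." finds the document.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, then split on runs of non-alphanumeric chars."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Toy inverted index mapping each token to the ids of the docs containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, doc in enumerate(docs):
        for tok in tokenize(doc):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of docs containing every query token, using the SAME tokenizer."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the intersection does not mutate the index.
    result = set(index.get(tokens[0], set()))
    for tok in tokens[1:]:
        result &= index.get(tok, set())
    return result


docs = ["We store vectors in lance, a columnar format.", "Parquet files"]
index = build_index(docs)
print(search(index, "lance."))  # {0}
```

If the query side skipped this normalization, the raw query string "lance." would never equal the indexed token "lance", which is the inconsistency the document text rules out.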

age,number,30
address.city,str,San
address.city,str,Francisco
address.zip,number,94102
Contributor

We should define and handle the case where the document path contains `.` or `:`, for both parsing and querying.
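The `path,type,value` triplets in the diff hunk above can be reproduced by flattening a nested JSON document. Below is a minimal Python sketch under several assumptions: the input `{"age": 30, "address": {"city": "San Francisco", "zip": 94102}}` is hypothetical (the actual example document is not shown in this thread), nested keys are joined with `.`, and string values are split on whitespace. It is not Lance's implementation. It also demonstrates the concern raised in this comment: without an escaping rule, a key that itself contains `.` produces the same path as a nested object.

```python
import json


def flatten(value, path=""):
    """Yield (path, type, token) triplets for a parsed JSON value.

    Hypothetical scheme: nested keys are joined with '.', numbers are tagged
    'number', and strings are tagged 'str' and split on whitespace.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from flatten(child, child_path)
    elif isinstance(value, list):
        # Assumption: list elements share their parent's path.
        for item in value:
            yield from flatten(item, path)
    elif isinstance(value, str):
        for word in value.split():
            yield (path, "str", word)
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        yield (path, "number", str(value))
    # Other values (null, booleans) are skipped in this sketch.


doc = json.loads('{"age": 30, "address": {"city": "San Francisco", "zip": 94102}}')
for triplet in flatten(doc):
    print(",".join(triplet))

# Ambiguity: a literal "a.b" key and a nested a -> b object flatten identically.
a = list(flatten(json.loads('{"a.b": "x"}')))
b = list(flatten(json.loads('{"a": {"b": "x"}}')))
print(a == b)  # True
```

Because the two documents are indistinguishable by path, a parser needs some escaping or quoting rule for `.` in keys; `:` would need a similar rule wherever it acts as a separator in the query syntax. The exact scheme adopted in this PR is not shown here.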

Contributor Author

Thanks for your nice suggestion! I have updated the example, adding `.` and `:` to the JSON text along with the corresponding triplets. I also added a unit test for `.` and `:`.

@github-actions github-actions bot added the python label Oct 3, 2025
@wojiaodoubao wojiaodoubao force-pushed the fts-json-doc branch 2 times, most recently from e1bbd02 to 9e66d90 on October 3, 2025 08:52
Labels
documentation Improvements or additions to documentation python
2 participants