
Crash on particular emoji with detect_multiple_languages #203

@PalmerAL

Hi, thanks for writing this library, it's really useful!

I'm seeing a crash with particular emoji input on the latest version installed from PyPI. Here's a test case:

```python
from lingua import Language, LanguageDetectorBuilder
langdetector = LanguageDetectorBuilder.from_all_languages().build()

langdetector.detect_multiple_languages_of('test 🙈')
```

```text
thread '<unnamed>' panicked at 'byte index 6 is not a char boundary; it is inside '🙈' (bytes 5..9) of `test 🙈`', src/lib.rs:436:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "[...]/crash_repro.py", line 4, in <module>
    langdetector.detect_multiple_languages_of('test 🙈')
pyo3_runtime.PanicException: byte index 6 is not a char boundary; it is inside '🙈' (bytes 5..9) of `test 🙈`
```
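The numbers in the panic message follow from UTF-8 encoding. `test ` occupies bytes 0..5, and 🙈 (U+1F648) is a 4-byte sequence at bytes 5..9, so any byte-based slice that starts or ends at index 6 lands inside the emoji. Rust string slicing panics in that case. The following is a minimal Python sketch, not code from lingua, that lists each character's byte span and mirrors the rule behind Rust's `str::is_char_boundary`. The helper names `char_byte_spans` and `is_char_boundary` are hypothetical, and the sketch only reproduces the arithmetic from the error message. It makes no claim about what `src/lib.rs:436` actually does.

```python
def char_byte_spans(text: str) -> list[tuple[str, int, int]]:
    """Hypothetical helper: (char, start, end) UTF-8 byte span of each code point."""
    spans = []
    offset = 0
    for ch in text:
        width = len(ch.encode("utf-8"))  # 1 to 4 bytes per code point
        spans.append((ch, offset, offset + width))
        offset += width
    return spans


def is_char_boundary(text: str, byte_index: int) -> bool:
    """Hypothetical helper mirroring the rule of Rust's str::is_char_boundary."""
    data = text.encode("utf-8")
    if byte_index == 0 or byte_index == len(data):
        return True
    if byte_index > len(data):
        return False
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx;
    # an index pointing at one is in the middle of a character.
    return (data[byte_index] & 0b1100_0000) != 0b1000_0000


text = "test \U0001F648"  # the string from the repro: 'test 🙈'
for ch, start, end in char_byte_spans(text):
    print(repr(ch), start, end)   # last line: '🙈' 5 9
print(is_char_boundary(text, 6))  # → False, matching the panic
```

Indices 5 and 9 are valid boundaries for this string, while 6, 7 and 8 fall inside the emoji's continuation bytes.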
