
Wrong identification of text with mixed languages #76

@nicolabertoldi


I noticed that Lingua misidentifies text that includes a portion of foreign words.

Here is an example of Korean text that includes the string "CA", which is not Korean (it probably represents a person's initials):
( 웃음 ) CA : 실패하는군요 . 안타깝네요 .

This text is identified as Romanian.
This is surprising, since it contains three Korean tokens and only one (probably) non-Korean token.

Moreover, the punctuation marks should also count as Korean.
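The token count above can be checked with a simple per-token script tally. This is a minimal Python sketch, not Lingua's detection logic: the helper names `script_of` and `count_scripts` are hypothetical, and they classify each whitespace-separated token only by the Unicode character names of its letters (via the standard `unicodedata` module).

```python
import unicodedata


def script_of(token: str) -> str:
    """Classify a token as 'hangul', 'latin', or 'other' from its letters."""
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        # Tokens without letters (punctuation, digits) are treated as neutral.
        return "other"
    if all("HANGUL" in unicodedata.name(ch, "") for ch in letters):
        return "hangul"
    if all("LATIN" in unicodedata.name(ch, "") for ch in letters):
        return "latin"
    return "other"


def count_scripts(text: str) -> dict:
    """Count whitespace-separated tokens per script class."""
    counts: dict = {}
    for token in text.split():
        script = script_of(token)
        counts[script] = counts.get(script, 0) + 1
    return counts


sample = "( 웃음 ) CA : 실패하는군요 . 안타깝네요 ."
print(count_scripts(sample))
```

On the sample, the tally finds three Hangul tokens, one Latin token, and five punctuation-only tokens, matching the counts in the report. The punctuation here is plain ASCII, so a script-based count like this one cannot attribute it to Korean by itself.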

Any idea why this misidentification occurs?

Labels: bug (Something isn't working)
