Detect Toki Pona in non-latin text #3

@gregdan3

Description

Currently, the library can only detect Toki Pona written in the Latin alphabet or in UCSUR; text in any other writing system is treated as not Toki Pona, even though it is perfectly reasonable to render Toki Pona in almost any writing system.

To support another script as fully as my preferred config does for Latin-alphabet text, I would need the following per script:

  • A list of the words in the dictionary rendered in the target script (for each Dictionary filter)
  • A regex which matches words built from valid syllables in the target script (Syllabic filter)
  • A list of all the characters in the target script which may be used to render Toki Pona (Alphabetic filter)
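
As a rough illustration of those three per-script pieces, here is a minimal Python sketch. It is not the library's API: `ScriptProfile`, its fields, and `LATIN` are hypothetical names, the dictionary is a five-word sample, and the syllable regex is simplified (it does not reject the forbidden syllables ji, ti, wo, wu).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptProfile:
    """Hypothetical bundle of the three per-script resources listed above."""

    name: str
    dictionary: frozenset   # known words rendered in this script
    syllabic: re.Pattern    # matches a whole word made of valid syllables
    alphabet: frozenset     # every character usable in this script

    def in_dictionary(self, word: str) -> bool:
        return word in self.dictionary

    def is_syllabic(self, word: str) -> bool:
        return self.syllabic.fullmatch(word) is not None

    def is_alphabetic(self, word: str) -> bool:
        return bool(word) and all(ch in self.alphabet for ch in word)


# Latin-alphabet example; another script would supply its own three values.
LATIN = ScriptProfile(
    name="latin",
    dictionary=frozenset({"toki", "pona", "mi", "sina", "li"}),  # tiny sample
    # (C)V(n) first syllable, then CV(n) syllables; simplified on purpose.
    syllabic=re.compile(r"[jklmnpstw]?[aeiou]n?(?:[jklmnpstw][aeiou]n?)*"),
    alphabet=frozenset("aeijklmnopstuw"),
)
```

The checks get looser from top to bottom: `LATIN.in_dictionary("toki")` needs an exact entry, `LATIN.is_syllabic("kijetesantakalu")` only needs valid syllables, and `LATIN.is_alphabetic("tpk")` only needs valid characters.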

While the alphabetic filter would be relatively easy (even though its name would be a misnomer for, say, Japanese), the dictionary and syllabic filters would be challenging for writing systems which have multiple ways to write approximately the same Toki Pona sound. For example, jan Niwe (@Nerd1729 on Discord) provided me with this list for Greek:

α = /a/
ε = αι = /e/
η = ι = υ = ει = οι = υι = /i/
γη = γι = γυ = γει = γοι = γυι = /j/
κ = /k/
λ = /l/
μ = /m/
ν = /n/
ο = ω = /o/
π = /p/
σ = /s/
τ = /t/
ου = ȣ = /u/
β = γου = /w/

Labels: enhancement (New feature or request)
