Multi-sentence scorer implementation for `are_toki_pona`

NOTE: If you're here looking to score each sentence in a message rather than the message as a whole, but your use case doesn't actually require sentence-level results, checking the entire message with `is_toki_pona` is guaranteed to be more accurate!

I don't have any scorers that cover all the sentences in a message, only scorers for individual sentences. How to combine the per-sentence results is largely left up to the user.

I do provide the following example scorers:

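```py
from numbers import Number

# ILO is the already-configured scorer instance that provides `are_toki_pona`.

def all_must_pass(message: str) -> bool:
    # Every sentence in the message must pass on its own.
    return all(ILO.are_toki_pona(message))

def portion_must_pass(message: str, score: Number = 0.8) -> bool:
    # At least `score` (as a fraction) of the sentences must pass.
    results = ILO.are_toki_pona(message)
    sent_count = len(results)
    if sent_count == 0:
        return False  # no sentences to score; avoids a ZeroDivisionError
    passing = results.count(True)
    return (passing / sent_count) >= score
```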

But these implementations are pretty weak. The first one is a no-go, because it would fail an entire message for ending with a string like `":D"`. The sentence tokenizer doesn't consider emoticons, so this would become at least two sentences, where the second is a single token `"D"` that would fail on its own.
The second implementation is less severe, but it is still sensitive to the output of the sentence tokenizer, and especially so for short messages. It would fail for, say, `"ni li musi :D"`: for the same reason as above, this input splits into one sentence that passes and one that fails, so only 50% of its sentences pass, which is below the default threshold of 0.8.

It may become necessary to build a particularly lenient meta-scorer, or to be selective about which sentences are considered in the first place. For example, it may be reasonable for the sentence meta-scorer to ignore sentences of length 1 (except where the message has only one sentence), since these would almost certainly be artifacts of the sentence tokenizer.
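
As a rough illustration of the length-1 filter suggested above, here is a minimal sketch. It assumes the message has already been split into sentences, each a list of tokens, and that "length 1" means a single token. The names `lenient_portion_must_pass`, `sentence_passes`, `KNOWN_WORDS`, and `toy_sentence_passes` are hypothetical and not part of the library; the toy scorer and the tokenized example are made up to stand in for the real per-sentence scorer and tokenizer.

```python
from typing import Callable, Sequence

Sentence = Sequence[str]  # one sentence, represented as a list of tokens


def lenient_portion_must_pass(
    sentences: Sequence[Sentence],
    sentence_passes: Callable[[Sentence], bool],
    threshold: float = 0.8,
) -> bool:
    """Score a message while ignoring likely sentence-tokenizer artifacts."""
    if not sentences:
        return False  # assumption: an empty message does not pass

    considered = list(sentences)
    if len(sentences) > 1:
        # Drop single-token sentences, but only when the message has more
        # than one sentence, so a genuine one-word message is still scored.
        considered = [s for s in sentences if len(s) > 1]
    if not considered:
        # Assumption: if every sentence was a single token, score them all.
        considered = list(sentences)

    passing = sum(1 for s in considered if sentence_passes(s))
    return passing / len(considered) >= threshold


# Toy stand-in for a real per-sentence scorer, for illustration only.
KNOWN_WORDS = {"ni", "li", "musi", "mi", "pona"}


def toy_sentence_passes(sentence: Sentence) -> bool:
    return all(token in KNOWN_WORDS for token in sentence)


# Hypothetical tokenizer output for "ni li musi :D".
example = [["ni", "li", "musi"], ["D"]]
result = lenient_portion_must_pass(example, toy_sentence_passes)
```

With the stray `"D"` sentence ignored, the only sentence left in `example` passes, so the message as a whole passes. A message consisting of a single one-token sentence is still scored on that sentence, matching the exception noted above.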


