Simply simplify language understandability score only

Inspired by https://github.com/machinelearningZH/zix_understandability-index by the Statistisches Amt Kanton Zürich. Many thanks for it.

I just wanted the understandability score, so I stripped everything else out and made a simple API for it.

It works on German text only - preferably Swiss High German.

Usage

Run the app locally

Create a Conda environment: conda create -n simplify python=3.12
Activate environment: conda activate simplify
Clone this repo.
Change into the project directory: cd understandability_score/
Install packages: pip install -r requirements.txt
Install the upstream index package: pip install git+https://github.com/machinelearningZH/zix_understandability-index
Install the spaCy language model: python -m spacy download de_core_news_sm
Start app: uvicorn main:app --port 8005 --reload

Run with docker

docker build -t simply-understandability-score .
docker run -p 8005:8005 simply-understandability-score

Call the API

curl --location 'localhost:8005/zix' \
--header 'Content-Type: application/json' \
--data '{"text":"Die Abteilung «Data» ist zum einen Anlaufstelle für Personen, die Daten zum Kanton Zürich und seinen Regionen nutzen wollen. Sie berät Nutzende und fördert das Wissen rund um Daten. Zum anderen koordiniert sie die kantonale Data Governance und bietet Expertise im Bereich Data Science."}'
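
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the app is running on localhost:8005 as described above and that the endpoint answers with JSON; the response fields are not documented here, so the result is returned as parsed without picking out any particular key. The function names build_zix_request and fetch_zix are made up for this example.

```python
import json
import urllib.request


def build_zix_request(text: str, base_url: str = "http://localhost:8005") -> urllib.request.Request:
    """Build the POST request for the /zix endpoint (mirrors the curl call)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/zix",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_zix(text: str, base_url: str = "http://localhost:8005"):
    """Send the request and return the parsed body (assumed to be JSON)."""
    request = build_zix_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, fetch_zix("Sie berät Nutzende und fördert das Wissen rund um Daten.") should return whatever the API sends back for that text.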

How does this score work?

This metric was created, and is still being tested and continuously improved, during a pilot project at the administration of the Canton of Zurich.

The index was created using a dataset of complex legal and administrative texts, as well as many samples of Einfache and Leichte Sprache (Plain and Simple Language). The authors trained a classification model to differentiate between complex and simple texts. By selecting the most significant model coefficients, they devised a formula to estimate a text's understandability (not just its readability). This pragmatic metric has proven useful in that pilot project and seems to work well in practice for administrative texts.

The score takes into account sentence length, the readability metric RIX, and the occurrence of common words.
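
To give a feel for one of these ingredients: RIX is commonly defined as the number of long words (seven or more letters) divided by the number of sentences. The sketch below illustrates that definition only. It is not the code used by this package, and the regex-based sentence and word splitting is a simplifying assumption made for the example.

```python
import re


def rix(text: str) -> float:
    """Return RIX: long words (7 or more letters) per sentence."""
    # Naive sentence split after ., ! or ? followed by whitespace
    # (an assumption for illustration, not the package's tokenizer).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    # Words are runs of letters; digits and underscores are excluded.
    words = re.findall(r"[^\W\d_]+", text)
    long_words = [w for w in words if len(w) >= 7]
    return len(long_words) / len(sentences)


print(rix("Die Abteilung koordiniert Daten. Sie bietet Expertise."))  # 1.5
```

The sample text has three long words (Abteilung, koordiniert, Expertise) in two sentences, hence 1.5. Higher values indicate denser, harder text.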

Important

This package assumes the Swiss ss in your texts rather than the Standard German ß. You'll get somewhat worse scores if your text contains ß. The difference shouldn't be substantial, but we want you to be aware of it.

At the moment the score does not take into account other language properties that are essential for Einfache or Leichte Sprache, such as use of the passive voice, subjunctives, or complex structures in short sentences.

Also be aware that the mapping to CEFR levels A1 to C2 is a pragmatic approach that gives an indication and seems to work well in practice. It is by no means an ‘official’ or safe measure.
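
If your texts use ß, one option is to convert it to ss before sending them to the API. This is a minimal sketch of such a preprocessing step; the helper name normalize_eszett is made up for this example, and whether the conversion is appropriate for your texts is your call.

```python
def normalize_eszett(text: str) -> str:
    """Replace ß with ss (and capital ẞ with SS), following Swiss spelling."""
    return text.replace("ß", "ss").replace("ẞ", "SS")


print(normalize_eszett("Die Straße ist groß."))  # Die Strasse ist gross.
```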

What does the score mean?

Texts with scores below 13 will be really hard to understand (this is classic «Behördendeutsch» or legal text territory).
News, Wikipedia and many books have scores between 13 and 16.
Anything higher than 16: You're on the right track. 👍 Keep editing. And validate with users!
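
The bands above can be expressed as a small helper, for example to label scores in a script. This is a sketch only: the function name interpret_score and the wording of the labels are made up for this example, and treating exactly 13 and exactly 16 as part of the middle band is an assumption.

```python
def interpret_score(score: float) -> str:
    """Map an understandability score to the rough bands described above."""
    if score < 13:
        return "hard to understand"
    if score <= 16:
        return "typical of news, Wikipedia and many books"
    return "on the right track"


for value in (11.5, 14.0, 17.2):
    print(value, "->", interpret_score(value))
```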

