🌍📚⚙️
**Etymologyneering**
What if you could learn English (or other languages) by understanding how words are engineered?
An educational experiment that combines mental-models thinking, Python, and Large Language Models (LLMs) accessed through APIs to teach 1,056 English words via 38 Proto-Indo-European root stems.
Etymologyneering teaches English from first principles: the Proto-Indo-European (PIE) roots that act as the atomic units, the bedrock, of meaning (see the Wikipedia article on the reconstructed language: https://en.wikipedia.org/wiki/Proto-Indo-European_language). These roots are few, yet they generate thousands of words, which feels like a compression of semantic space: each PIE root captures a core meaning (like “to carry,” “to shine,” “to bind”), and a few hundred such roots generate thousands of words across the Indo-European languages (Greek, Latin, Germanic, Balto-Slavic, and Indo-Iranian languages such as Sanskrit) through systematic transformations (prefixes, suffixes, metaphorical shifts).
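As a toy illustration of this generative structure, here is how one such root fans out (the root *bher- and its derivatives are standard textbook examples; the data structure itself is purely illustrative):

```python
# A PIE root as a compact "semantic seed": one core meaning
# surfaces as many words via affixes and sound shifts.
pie_root = {
    "stem": "*bher-",
    "core_meaning": "to carry",
    "derivatives": {
        "bear":     "Germanic: to carry, hence also to give birth",
        "transfer": "Latin trans- ('across') + ferre ('to carry')",
        "fertile":  "Latin ferre: land that 'carries' crops",
        "metaphor": "Greek meta- ('beyond') + pherein ('to carry')",
    },
}

for word, path in pie_root["derivatives"].items():
    print(f"{pie_root['stem']} ({pie_root['core_meaning']}) -> {word}: {path}")
```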
Thinking analogically, PIE roots to me mirror how ML models compress data: surface complexity emerging from simpler underlying features. Their natures differ, however (a sketch of the computed side follows this list):
- Dimensionality reduction is computed: algorithmic extraction of latent features from observed data
- PIE roots are reconstructed: ancestral forms of an ancient and unwritten language inferred through comparative linguistics
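For the computed side of that contrast, a minimal PCA sketch (using NumPy and scikit-learn, which are not part of this project's toolchain) shows latent features being extracted algorithmically from observed data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Observed data: 100 samples in 10 dimensions that secretly vary
# along only 2 latent directions (plus a little noise).
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))       # the hidden "roots"
mixing = rng.normal(size=(2, 10))        # how they surface
observed = latent @ mixing + 0.05 * rng.normal(size=(100, 10))

# PCA *computes* the latent features from the data alone.
pca = PCA(n_components=2).fit(observed)
print(pca.explained_variance_ratio_)     # nearly all variance in 2 components
```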
This work builds on publicly available etymological data.
The OpenAI GPT-4o API generated structured entries with the following sections (a request sketch follows the list):
- Greek Translation – Greek translation for Greek learners.
- Phonetic Spelling – pronunciation guidance in IPA format.
- Part of Speech – e.g., noun, verb, adjective.
- Etymology – historical origin and linguistic evolution.
- Nowadays Meaning – the modern sense.
- Connection to the PIE Root Stem:
  - Literal – how concrete/physical meanings evolved into abstract concepts.
  - Interplay – how prefix, root, and suffix fuse into meaning.
- Example Sentences – three per entry.
- Conclusion – summarizes the evolution from root to modern meaning.
- One-Line Intuitive Link – a memorable mnemonic compressing the entire etymology.
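A minimal sketch of how such an entry can be requested (the prompt wording, helper name, and PIE inputs are illustrative, not the project's actual code):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SECTIONS = [
    "Greek Translation", "Phonetic Spelling", "Part of Speech",
    "Etymology", "Nowadays Meaning", "Connection to the PIE Root Stem",
    "Example Sentences", "Conclusion", "One-Line Intuitive Link",
]

def generate_entry(word: str, pie_parts: dict[str, str]) -> str:
    """Request a fixed 9-section entry, supplying the PIE form of
    every component to reduce hallucinations (see the quality
    controls below)."""
    parts = "; ".join(f"{p}: {root}" for p, root in pie_parts.items())
    prompt = (
        f"Write an etymology entry for '{word}'. Use exactly these "
        f"section headings, in order: {', '.join(SECTIONS)}. "
        f"PIE components: {parts}. Keep every sentence under 20 words."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(generate_entry("transfer", {"trans- (prefix)": "*tere-", "-fer (stem)": "*bher-"}))
```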
I acted as the human-in-the-loop, reviewing all content against these quality controls (a hypothetical checker sketch follows the list):
- Missing PIE input for prefixes or stems can lead to hallucinations: when the model is not given the Proto-Indo-European (PIE) root of every component (e.g., in a word with two prefixes and one stem), it sometimes invents or distorts meanings for the missing parts. Supplying the PIE form of every part (prefixes and stem) is therefore imperative.
- Schema constraints: fixed 9-section format with exact headings.
- Content separation: no mixing of historical and modern meaning.
- Style rules: concise sentences (≤ 20 words), academic tone.
- Evidence discipline: expand abbreviations; trace prefixes to PIE.
- Reasoning scaffolds: mandatory Literal → Metaphorical Bridge; compact PIE → Modern recap.
- Output consistency: one-line summary ≤ 25 words.
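Several of these controls are mechanically checkable. A hypothetical checker (the function name and heuristics are mine, not the project's):

```python
import re

REQUIRED_HEADINGS = [
    "Greek Translation", "Phonetic Spelling", "Part of Speech",
    "Etymology", "Nowadays Meaning", "Connection to the PIE Root Stem",
    "Example Sentences", "Conclusion", "One-Line Intuitive Link",
]

def validate_entry(text: str) -> list[str]:
    """Return a list of quality-control violations for one entry."""
    problems = []
    # Schema constraint: all nine headings present and in order.
    positions = [text.find(h) for h in REQUIRED_HEADINGS]
    if -1 in positions or positions != sorted(positions):
        problems.append("missing or misordered section headings")
    # Output consistency: one-line summary within 25 words.
    summary = text.split("One-Line Intuitive Link")[-1].strip()
    if len(summary.split()) > 25:
        problems.append("one-line summary exceeds 25 words")
    # Style rule: flag sentences longer than 20 words.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence.split()) > 20:
            problems.append(f"long sentence: {sentence[:40]}...")
    return problems
```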
Prompts for text-to-image generation were created with Meta-LLaMA-4-Scout-17b-16e-Instruct via the Groq API (an inference accelerator), as sketched below.
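A sketch of such a call through Groq's OpenAI-compatible Python client (the model ID follows Groq's published naming; the prompt and helper are illustrative):

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def image_prompt_for(word: str, literal_sense: str) -> str:
    """Turn a word's literal root sense into a text-to-image prompt."""
    resp = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": (
                f"Write one vivid text-to-image prompt depicting the "
                f"literal root sense of '{word}': {literal_sense}."
            ),
        }],
    )
    return resp.choices[0].message.content
```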
Images were generated by the Black-Forest-Labs FLUX.1-Schnell model through the Hugging Face API (see the sketch below). During the review phase of the books, some images were updated directly using ChatGPT-5 through the chat UI.
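A corresponding sketch using huggingface_hub's InferenceClient (illustrative; the production pipeline may differ):

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # reads HF_TOKEN from the environment

# text_to_image returns a PIL.Image for supported diffusion models.
image = client.text_to_image(
    "A traveler carrying a bundle across a river at dawn",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("transfer.png")
```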
All captions were LLM-generated, either through Meta-LLaMA-4-Scout-17b-16e-Instruct or, during my review of the books, using ChatGPT-5 through the chat UI.
Supporting Python libraries:
- BeautifulSoup – scraping etymological data.
- Matplotlib – visualizing clusters of derivative words around each PIE stem (see the sketch below).
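A minimal hub-and-spoke sketch of such a cluster diagram (layout, styling, and word choices are illustrative, not the books' actual figures):

```python
import numpy as np
import matplotlib.pyplot as plt

root = "*bher-\n(to carry)"
words = ["bear", "transfer", "fertile", "metaphor", "periphery", "burden"]

fig, ax = plt.subplots(figsize=(6, 6))
angles = np.linspace(0, 2 * np.pi, len(words), endpoint=False)
for word, theta in zip(words, angles):
    x, y = np.cos(theta), np.sin(theta)
    ax.plot([0, x], [0, y], color="gray", zorder=1)            # spoke
    ax.text(1.15 * x, 1.15 * y, word, ha="center", va="center")
ax.text(0, 0, root, ha="center", va="center",
        bbox=dict(boxstyle="round", facecolor="lightyellow"))  # hub
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.axis("off")
plt.savefig("bher_cluster.png", dpi=150)
```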
For fast learners
- Speed-read the "PIE Root Connection" section (Literal + Interplay)
- Glance at the image + 3 example sentences
- Check the cluster diagram
- Move to next word
For deep learners
- Speed-read each entry across all sections.
- Check the cluster diagram
- Move to next word
Etymologyneering reveals how words are engineered: the mechanics of meaning.
Neuroscience suggests the brain stores knowledge as networks, not lists.
Each PIE root sits at the center of a branching cluster, mirroring this associative structure.
This visual hierarchy reflects how ideas interconnect, making comprehension intuitive.
The format draws on Dual-Coding Theory (Paivio): the mind processes verbal and visual information through two linked systems.
Combining both enhances memory retention and understanding, building stronger neural connections than words alone.
Seeking open etymological datasets or APIs (English, Greek, French) with:
- Word stems and PIE root mappings
- Historical explanations and derivatives
- No commercial licensing restrictions
Goal: Attempt to build an interactive app for etymological language learning.
How to help:
- Email: [[email protected]]
- Know someone working in digital humanities or computational linguistics? Please connect us.
Future directions for me include expanding to other domains of foundational knowledge, such as mathematics and code, to explore how mental-models thinking (e.g., Assumption Testing, the 5 Whys), Python, LLMs, and collaboration can accelerate studying.
Author: Pantelis (Léon) Ladopoulos
Project Type: LLM-assisted English language learning through the Proto-Indo-European (PIE) roots
Languages & Tools: Python | OpenAI API | Meta Llama API | Groq | Hugging Face API | Matplotlib | BeautifulSoup
You can read or download the Etymologyneering volumes (PDF) below.
Each volume explores English words derived from Proto-Indo-European (PIE) roots,
featuring imagery, semantic clusters, and explanations showing how prefixes and stems fuse into meaning.
- 📗 Letter L — Etymologyneering_EN_L (PDF)
- 📘 Letter U — Etymologyneering_EN_U (PDF)
- 📕 Letter W — Etymologyneering_EN_W (PDF)
- 📙 Letter Y — Etymologyneering_EN_Y (PDF)
- Books, Text, and Images: CC BY 4.0 License. You are free to share and adapt this material with proper credit to Pantelis Ladopoulos.
- Code: MIT License. Free to use, modify, and distribute with attribution.