🌍📚⚙️
**Etymologyneering**
What if you could learn English (or other languages) by understanding how words are engineered?
An educational experiment that combines mental-models thinking, Python, and Large Language Models (LLMs) accessed through APIs to teach 1,056 English words via 38 Proto-Indo-European root stems.
Etymologyneering teaches English from first principles: the Proto-Indo-European (PIE) roots that act as the atomic units, the bedrock, of meaning (see the Wikipedia article on the reconstructed language: https://en.wikipedia.org/wiki/Proto-Indo-European_language). These roots are few, yet they generate thousands of words, which feels like a compression of semantic space: each PIE root captures a core meaning (like “to carry,” “to shine,” “to bind”), and a few hundred such roots generate thousands of words across the Indo-European languages (Greek, Latin, Germanic, Balto-Slavic, and Indo-Iranian languages such as Sanskrit) through systematic transformations (prefixes, suffixes, metaphorical shifts).
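As a toy illustration of this generative structure, here is how one such root fans out (the root *bher- and its derivatives are standard textbook examples; the data structure itself is purely illustrative):

```python
# A PIE root as a compact "semantic seed": one core meaning
# surfaces as many words via affixes and sound shifts.
pie_root = {
    "stem": "*bher-",
    "core_meaning": "to carry",
    "derivatives": {
        "bear":     "Germanic: to carry, hence also to give birth",
        "transfer": "Latin trans- ('across') + ferre ('to carry')",
        "fertile":  "Latin ferre: land that 'carries' crops",
        "metaphor": "Greek meta- ('beyond') + pherein ('to carry')",
    },
}

for word, path in pie_root["derivatives"].items():
    print(f"{pie_root['stem']} ({pie_root['core_meaning']}) -> {word}: {path}")
```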
Thinking analogically, PIE roots to me mirror how ML models compress data: surface complexity emerging from simpler underlying features. Their natures differ, however (a sketch of the computed side follows this list):
- Dimensionality reduction is computed: algorithmic extraction of latent features from observed data
- PIE roots are reconstructed: ancestral forms of an ancient and unwritten language inferred through comparative linguistics
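For the computed side of that contrast, a minimal PCA sketch (using NumPy and scikit-learn, which are not part of this project's toolchain) shows latent features being extracted algorithmically from observed data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Observed data: 100 samples in 10 dimensions that secretly vary
# along only 2 latent directions (plus a little noise).
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))       # the hidden "roots"
mixing = rng.normal(size=(2, 10))        # how they surface
observed = latent @ mixing + 0.05 * rng.normal(size=(100, 10))

# PCA *computes* the latent features from the data alone.
pca = PCA(n_components=2).fit(observed)
print(pca.explained_variance_ratio_)     # nearly all variance in 2 components
```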
This work builds on publicly available etymological data.
The OpenAI GPT-4o API generated structured entries with the following sections (a request sketch follows the list):
- Greek Translation – Greek translation for Greek learners.
- Phonetic Spelling – pronunciation guidance in IPA format.
- Part of Speech – e.g., noun, verb, adjective.
- Etymology – historical origin and linguistic evolution.
- Nowadays Meaning – the modern sense.
- Connection to the PIE Root Stem:
  - Literal – how concrete/physical meanings evolved into abstract concepts.
  - Interplay – how prefix, root, and suffix fuse into meaning.
- Example Sentences – three per entry.
- Conclusion – summarizes the evolution from root to modern meaning.
- One-Line Intuitive Link – a memorable mnemonic compressing the entire etymology.
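A minimal sketch of how such an entry can be requested (the prompt wording, helper name, and PIE inputs are illustrative, not the project's actual code):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SECTIONS = [
    "Greek Translation", "Phonetic Spelling", "Part of Speech",
    "Etymology", "Nowadays Meaning", "Connection to the PIE Root Stem",
    "Example Sentences", "Conclusion", "One-Line Intuitive Link",
]

def generate_entry(word: str, pie_parts: dict[str, str]) -> str:
    """Request a fixed 9-section entry, supplying the PIE form of
    every component to reduce hallucinations (see the quality
    controls below)."""
    parts = "; ".join(f"{p}: {root}" for p, root in pie_parts.items())
    prompt = (
        f"Write an etymology entry for '{word}'. Use exactly these "
        f"section headings, in order: {', '.join(SECTIONS)}. "
        f"PIE components: {parts}. Keep every sentence under 20 words."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(generate_entry("transfer", {"trans- (prefix)": "*tere-", "-fer (stem)": "*bher-"}))
```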
I acted as the human-in-the-loop, reviewing all content against these quality controls (a hypothetical checker sketch follows the list):
- Missing PIE input for prefixes or stems can lead to hallucinations: when the model is not given the Proto-Indo-European (PIE) root of every component (e.g., in a word with two prefixes and one stem), it sometimes invents or distorts meanings for the missing parts. Supplying the PIE form of every part (prefixes and stem) is therefore imperative.
- Schema constraints: fixed 9-section format with exact headings.
- Content separation: no mixing of historical and modern meaning.
- Style rules: concise sentences (≤ 20 words), academic tone.
- Evidence discipline: expand abbreviations; trace prefixes to PIE.
- Reasoning scaffolds: mandatory Literal → Metaphorical Bridge; compact PIE → Modern recap.
- Output consistency: one-line summary ≤ 25 words.
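Several of these controls are mechanically checkable. A hypothetical checker (the function name and heuristics are mine, not the project's):

```python
import re

REQUIRED_HEADINGS = [
    "Greek Translation", "Phonetic Spelling", "Part of Speech",
    "Etymology", "Nowadays Meaning", "Connection to the PIE Root Stem",
    "Example Sentences", "Conclusion", "One-Line Intuitive Link",
]

def validate_entry(text: str) -> list[str]:
    """Return a list of quality-control violations for one entry."""
    problems = []
    # Schema constraint: all nine headings present and in order.
    positions = [text.find(h) for h in REQUIRED_HEADINGS]
    if -1 in positions or positions != sorted(positions):
        problems.append("missing or misordered section headings")
    # Output consistency: one-line summary within 25 words.
    summary = text.split("One-Line Intuitive Link")[-1].strip()
    if len(summary.split()) > 25:
        problems.append("one-line summary exceeds 25 words")
    # Style rule: flag sentences longer than 20 words.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence.split()) > 20:
            problems.append(f"long sentence: {sentence[:40]}...")
    return problems
```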
Prompts for text-to-image generation were created with Meta-LLaMA-4-Scout-17b-16e-Instruct via the Groq API (an inference accelerator), as sketched below.
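A sketch of such a call through Groq's OpenAI-compatible Python client (the model ID follows Groq's published naming; the prompt and helper are illustrative):

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def image_prompt_for(word: str, literal_sense: str) -> str:
    """Turn a word's literal root sense into a text-to-image prompt."""
    resp = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": (
                f"Write one vivid text-to-image prompt depicting the "
                f"literal root sense of '{word}': {literal_sense}."
            ),
        }],
    )
    return resp.choices[0].message.content
```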
Images were generated by the Black-Forest-Labs FLUX.1-Schnell model through the Hugging Face API (see the sketch below). During the review phase of the books, some images were updated directly using ChatGPT-5 through the chat UI.
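A corresponding sketch using huggingface_hub's InferenceClient (illustrative; the production pipeline may differ):

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # reads HF_TOKEN from the environment

# text_to_image returns a PIL.Image for supported diffusion models.
image = client.text_to_image(
    "A traveler carrying a bundle across a river at dawn",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("transfer.png")
```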
All captions were LLM-generated, either through Meta-LLaMA-4-Scout-17b-16e-Instruct or, during my review of the books, using ChatGPT-5 through the chat UI.
Supporting Python libraries:
- BeautifulSoup – scraping etymological data.
- Matplotlib – visualizing clusters of derivative words around each PIE stem (see the sketch below).
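A minimal hub-and-spoke sketch of such a cluster diagram (layout, styling, and word choices are illustrative, not the books' actual figures):

```python
import numpy as np
import matplotlib.pyplot as plt

root = "*bher-\n(to carry)"
words = ["bear", "transfer", "fertile", "metaphor", "periphery", "burden"]

fig, ax = plt.subplots(figsize=(6, 6))
angles = np.linspace(0, 2 * np.pi, len(words), endpoint=False)
for word, theta in zip(words, angles):
    x, y = np.cos(theta), np.sin(theta)
    ax.plot([0, x], [0, y], color="gray", zorder=1)            # spoke
    ax.text(1.15 * x, 1.15 * y, word, ha="center", va="center")
ax.text(0, 0, root, ha="center", va="center",
        bbox=dict(boxstyle="round", facecolor="lightyellow"))  # hub
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.axis("off")
plt.savefig("bher_cluster.png", dpi=150)
```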
For fast learners
- Speed-read the "PIE Root Connection" section (Literal + Interplay)
- Glance at the image + 3 example sentences
- Check the cluster diagram
- Move to next word
For deep learners
- Speed-read each entry across all sections.
- Check the cluster diagram
- Move to next word
Etymologyneering reveals how words are engineered: the mechanics of meaning.
Neuroscience suggests the brain stores knowledge as networks, not lists.
Each PIE root sits at the center of a branching cluster, mirroring this associative structure.
This visual hierarchy reflects how ideas interconnect, making comprehension intuitive.
The format draws on Dual-Coding Theory (Paivio): the mind processes verbal and visual information through two linked systems.
Combining both enhances memory retention and understanding, building stronger neural connections than words alone.
Seeking open etymological datasets or APIs (English, Greek, French) with:
- Word stems and PIE root mappings
- Historical explanations and derivatives
- No commercial licensing restrictions
Goal: Attempt to build an interactive app for etymological language learning.
How to help:
- Email: [[email protected]]
- Know someone working in digital humanities or computational linguistics? Please connect us.
Future directions for me include expanding to other domains of foundational knowledge, such as mathematics and code, to explore how mental-models thinking (e.g., Assumption Testing, the 5 Whys), Python, LLMs, and collaboration can accelerate studying.
Author: Pantelis (Léon) Ladopoulos
Project Type: LLM-assisted English language learning through the Proto-Indo-European (PIE) roots
Languages & Tools: Python | OpenAI API | Meta Llama API | Groq | Hugging Face API | Matplotlib | BeautifulSoup
You can read or download the Etymologyneering volumes (PDF) below.
Each volume explores English words derived from Proto-Indo-European (PIE) roots,
featuring imagery, semantic clusters, and explanations showing how prefixes and stems fuse into meaning.
- 📗 Letter L — Etymologyneering_EN_L (PDF)
- 📘 Letter U — Etymologyneering_EN_U (PDF)
- 📕 Letter W — Etymologyneering_EN_W (PDF)
- 📙 Letter Y — Etymologyneering_EN_Y (PDF)
- Books, Text, and Images: CC BY 4.0 License. You are free to share and adapt this material with proper credit to Pantelis Ladopoulos.
- Code: MIT License. Free to use, modify, and distribute with attribution.