KoSPA is a modular FastAPI application for real-time Korean pronunciation analysis. Users record directly in the browser, upload audio to the backend, and receive instant scoring with targeted feedback.
- Modular architecture – Clean separation of concerns with dedicated modules for config, database, utilities, and routes
- Docker deployment – One-command deployment with PostgreSQL included
- Real-time analysis – 2-second recordings captured in the browser via the MediaRecorder API and analyzed instantly
- Vowel engine – Extracts formants (F1–F3), compares against native speaker references, scores within ±1.5σ
- Consonant engine – Measures VOT, frication, and nasal energy with the same scoring threshold
- Diphthong analysis – Time-series trajectory extraction with DTW (Dynamic Time Warping) scoring, weighted start 25%, end 25%, direction 20%, magnitude 15%, DTW 15% (see the sketch after this list)
- Visual feedback – Vowel space plots overlay learner samples on native target regions
- User progress – Track improvement over time with personal calibration
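As a rough illustration of how those diphthong weights could combine into a single score, here is a minimal sketch; the component names and example values are hypothetical, not taken from the project's code, and each component is assumed to already be normalised to 0–100:

```python
# Hypothetical weighted combination of diphthong sub-scores (0-100 each).
# Component names and the example values are illustrative only.
DIPHTHONG_WEIGHTS = {
    "start": 0.25,      # match of the trajectory's starting formants
    "end": 0.25,        # match of the ending formants
    "direction": 0.20,  # direction of movement through the vowel space
    "magnitude": 0.15,  # how far the formants travel
    "dtw": 0.15,        # DTW similarity over the whole trajectory
}

def diphthong_score(components: dict[str, float]) -> float:
    """Combine per-component scores into one weighted 0-100 total."""
    return sum(DIPHTHONG_WEIGHTS[name] * components[name] for name in DIPHTHONG_WEIGHTS)

print(diphthong_score({"start": 90, "end": 80, "direction": 100,
                       "magnitude": 70, "dtw": 85}))  # 85.75
```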
Prerequisites:
- Docker and Docker Compose installed
- Windows/Mac: Docker Desktop
- Linux:
curl -fsSL https://get.docker.com | sh
For local (non-Docker) development:
- Python 3.10+ (tested on 3.11/3.12)
- ffmpeg – `sudo apt install ffmpeg`
- Python packages – `pip install -r requirements.txt`
- PostgreSQL – Local instance or cloud database
Quick start with Docker:

```bash
# Copy environment configuration
cp .env.docker .env

# Build and run containers
docker compose up --build

# Run in background
docker compose up -d --build
```

Open http://localhost:8000 in your browser.
Useful commands:
```bash
docker compose logs -f app   # View logs
docker compose down          # Stop containers
docker compose down -v       # Reset database
```

Local development setup:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
./run.sh
```

Or: `uvicorn main:app --reload --host 0.0.0.0 --port 8000`
CAPSTONE/
├── main.py # FastAPI app initialization, router registration
├── config.py # Environment variables, constants, Korean mappings
├── database.py # PostgreSQL connection and query functions
├── utils.py # Audio processing, analysis orchestration
├── routes/ # API endpoint modules
│ ├── __init__.py # Router exports
│ ├── pages.py # HTML pages (/, /sound, /health)
│ ├── auth.py # Authentication (/api/auth/*, /api/progress)
│ └── analysis.py # Sound analysis (/api/analyze-sound*)
├── analysis/ # Pronunciation analysis engines
│ ├── vowel_v2.py # Vowel formant extraction and scoring
│ ├── consonant.py # Consonant dispatcher (routes to specific analyzers)
│ ├── stops.py # Stop consonant analysis (VOT, F0z, place)
│ ├── fricative.py # Fricative analysis (spectral centroid, ㅅ/ㅆ/ㅎ)
│ ├── affricate.py # Affricate analysis (VOT + frication, ㅈ/ㅉ/ㅊ)
│ ├── nasal.py # Nasal consonant analysis (ㄴ/ㅁ)
│ ├── liquid.py # Liquid consonant analysis (ㄹ)
│ ├── config.py # Analysis parameters
│ ├── debug_*.py # Debugging utilities for each consonant type
│ └── README.md # Engine documentation
├── static/ # Frontend assets (CSS, JS, images)
├── templates/ # Jinja2 HTML templates
├── Dockerfile # Container image definition
├── docker-compose.yml # Multi-container setup (app + PostgreSQL)
├── init.sql # Database schema initialization
├── .env.docker # Environment variables template
├── requirements.txt # Python dependencies
└── run.sh # Development server script
- `GET /` – Home page with sound selection grid
- `GET /sound?s=ㅏ` – Sound practice page
- `GET /health` – Health check for monitoring

- `POST /api/auth/signup` – Create account
- `POST /api/auth/login` – Log in and get a session
- `POST /api/auth/change-password` – Update password
- `GET /api/progress?username=...` – Get user's scores
- `GET /api/formants?userid=...` – Get calibration data

- `POST /api/calibration` – Save calibration recording
- `POST /api/analyze-sound` – Analyze with progress tracking
- `POST /api/analyze-sound-guest` – Analyze without login
- `GET /api/info` – API information and endpoints
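The request schema isn't spelled out here; as a sketch only, assuming the guest analysis endpoint accepts a multipart audio upload plus the target sound (the field names `file` and `sound` are assumptions):

```python
# Hypothetical call to the guest analysis endpoint; field names and the
# response shape are assumptions, not the documented request schema.
import requests

with open("recording.webm", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/analyze-sound-guest",
        files={"file": ("recording.webm", f, "audio/webm")},
        data={"sound": "ㅏ"},
    )

resp.raise_for_status()
print(resp.json())  # score and feedback payload
```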
config.py – Central configuration, including:
- `DB_URL` – Database connection string from environment
- `VOWEL_SYMBOL_TO_KEY` – Maps Hangul jamo to analysis keys
- `CONSONANT_SYMBOL_TO_SYLLABLE` – Maps jamo to example syllables
- `SOUND_DESCRIPTIONS` – Pronunciation guidance for each sound
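Roughly, entries of this shape could be expected; the sample values below are illustrative placeholders, not the project's actual mappings:

```python
# Illustrative sketch of config.py entries; the sample values are
# placeholders, not KoSPA's actual mappings.
import os

DB_URL = os.environ.get("DB_URL", "postgresql://user:pass@localhost:5432/kospa")

VOWEL_SYMBOL_TO_KEY = {"ㅏ": "a", "ㅓ": "eo", "ㅗ": "o", "ㅜ": "u"}
CONSONANT_SYMBOL_TO_SYLLABLE = {"ㄱ": "가", "ㅅ": "사", "ㄴ": "나"}
SOUND_DESCRIPTIONS = {
    "ㅏ": "Open vowel: drop the jaw and keep the tongue low.",
}
```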
database.py – PostgreSQL operations with context-managed connections:
- User CRUD (create, authenticate, update password)
- Progress tracking (get/update high scores)
- Calibration data (formant storage)
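A minimal sketch of the context-managed connection pattern, assuming psycopg2 and the `DB_URL` from config.py; the helper and table names are illustrative, not database.py's actual API:

```python
# Illustrative context-managed PostgreSQL helper; function and table
# names are assumptions, not database.py's actual API.
from contextlib import contextmanager

import psycopg2

from config import DB_URL

@contextmanager
def get_connection():
    """Open a connection, commit on success, roll back on error."""
    conn = psycopg2.connect(DB_URL)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

def get_high_score(username: str, sound: str):
    """Fetch a user's stored high score for one sound, or None."""
    with get_connection() as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT high_score FROM progress WHERE username = %s AND sound = %s",
            (username, sound),
        )
        row = cur.fetchone()
        return row[0] if row else None
```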
utils.py – Audio processing and analysis orchestration:
- File handling (temp files, cleanup)
- Type conversions (safe_float, normalise_score)
- Analysis functions (vowel/consonant analysis)
- Main entry point: analyse_uploaded_audio()
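The conversion helpers likely behave along these lines; a sketch under that assumption, not the actual utils.py implementation:

```python
# Assumed behaviour of the conversion helpers; the real utils.py may differ.
def safe_float(value, default: float = 0.0) -> float:
    """Convert to float, returning a default instead of raising."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

def normalise_score(score) -> float:
    """Clamp a raw score into the 0-100 range."""
    return max(0.0, min(100.0, safe_float(score)))

print(safe_float("3.14"), safe_float(None))          # 3.14 0.0
print(normalise_score(112.5), normalise_score(-4))   # 100.0 0.0
```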
routes/ – Organized API endpoint modules:
- pages.py – HTML rendering with Jinja2 templates
- auth.py – User authentication and data retrieval
- analysis.py – Sound analysis and calibration
- Threshold: ±1.5σ (standard deviations) from native speaker mean
- Perfect score: 100 points when within threshold
- Penalty: 60 points per σ beyond threshold (linear decrease from 1.5σ to 3σ)
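Expressed as code, the rule reads roughly as follows (a sketch: the clamp at 0 past roughly 3σ is an assumption, everything else follows the bullets above):

```python
# Sketch of the ±1.5σ scoring rule; clamping to 0 past ~3σ is assumed.
def deviation_score(z: float, threshold: float = 1.5, penalty_per_sigma: float = 60.0) -> float:
    """Score a measurement that sits |z| standard deviations from the native mean."""
    excess = abs(z) - threshold
    if excess <= 0:
        return 100.0                                  # within ±1.5σ: perfect score
    return max(0.0, 100.0 - penalty_per_sigma * excess)

print(deviation_score(1.0))   # 100.0
print(deviation_score(2.0))   # 70.0  (0.5σ beyond the threshold)
print(deviation_score(3.0))   # 10.0
```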
| Type | Sounds | Features | Visualization |
|---|---|---|---|
| Vowels | ㅏ ㅓ ㅗ ㅜ ... | F1/F2/F3 formants | Formant chart + Articulatory map |
| Stops | ㄱ ㄲ ㅋ ㄷ ㄸ ㅌ ㅂ ㅃ ㅍ | VOT, F0z, Place | Place chart + VOT-F0z chart |
| Fricatives | ㅅ ㅆ ㅎ | Spectral centroid, HF contrast | Slider (ㅆ-ㅅ-ㅎ) |
| Affricates | ㅈ ㅉ ㅊ | VOT, Frication duration | Slider (ㅉ-ㅈ-ㅊ) |
| Nasals | ㄴ ㅁ | Nasal energy, formants | Feedback only |
| Liquids | ㄹ | Tap/lateral features | Feedback only |
- Launch EC2 instance (Ubuntu 22.04, t2.small or larger)
- Install Docker:
```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
```
- Clone and run:
```bash
git clone <repository-url>
cd CAPSTONE
docker compose up -d --build
```
- Configure Security Group: Allow inbound TCP port 8000
- Access via: http://<EC2-Public-IP>:8000
HTTPS is required for microphone access – browsers only expose getUserMedia() on secure origins.
- Set up DuckDNS (free domain):
  - Go to https://www.duckdns.org and create a subdomain
  - Point it to your EC2 Public IP
- Install Docker on EC2:
```bash
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-v2
sudo systemctl start docker && sudo systemctl enable docker
sudo usermod -aG docker $USER
newgrp docker
```
- Configure Security Group:
  - Allow inbound TCP port 80 (HTTP, for Let's Encrypt)
  - Allow inbound TCP port 443 (HTTPS)
- Clone and set up:
```bash
git clone <repository-url>
cd CAPSTONE
```
- Get the SSL certificate:
```bash
chmod +x init-ssl.sh
./init-ssl.sh your-domain.duckdns.org [email protected]
```
- Start the production server:
```bash
docker compose -f docker-compose.prod.yml up -d --build
```
- Access via: https://your-domain.duckdns.org
- Commit and push code changes to the main branch on GitHub
- SSH into the EC2 instance
- Pull changes and restart:
```bash
cd ~/CAPSTONE
git pull origin main

# For development:
docker compose up -d --build

# For production (HTTPS):
docker compose -f docker-compose.prod.yml up -d --build
```
| File | Description |
|---|---|
| `docker-compose.prod.yml` | Production setup with Nginx + Certbot |
| `nginx/nginx.prod.conf` | HTTPS reverse proxy configuration |
| `init-ssl.sh` | SSL certificate initialization script |
- Microphone requires HTTPS - use production deployment for full functionality
- SSL certificates auto-renew via Certbot (90-day validity)
- Add a plot cleanup cron job for /static/images/analysis/
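One possible shape for that cleanup job, as a sketch only; the 7-day retention, the PNG-only glob, and the script name are assumptions:

```python
# cleanup_plots.py – illustrative cleanup of generated analysis plots.
# Retention period and file pattern are assumptions, not project settings.
import time
from pathlib import Path

PLOT_DIR = Path("static/images/analysis")
MAX_AGE_SECONDS = 7 * 24 * 3600  # keep plots for one week

def cleanup() -> None:
    """Delete plot images older than the retention window."""
    cutoff = time.time() - MAX_AGE_SECONDS
    for plot in PLOT_DIR.glob("*.png"):
        if plot.stat().st_mtime < cutoff:
            plot.unlink()

if __name__ == "__main__":
    cleanup()
```

A daily cron entry (or a scheduled task inside the container) could run this script from the project root.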