KoSPA - Korean Speech Pronunciation Analyzer

KoSPA is a modular FastAPI application for real-time Korean pronunciation analysis. Users record directly in the browser, upload audio to the backend, and receive instant scoring with targeted feedback.

Features

Modular architecture – Clean separation of concerns with dedicated modules for config, database, utilities, and routes
Docker deployment – One-command deployment with PostgreSQL included
Real-time analysis – 2-second recordings captured in the browser with the MediaRecorder API and analyzed immediately after upload
Vowel engine – Extracts formants (F1–F3), compares them against native speaker references, and awards full marks within ±1.5σ of the native mean
Consonant engine – Measures voice onset time (VOT), frication, and nasal energy, scored with the same ±1.5σ threshold
Diphthong analysis – Extracts a time-series trajectory and scores it as a weighted sum: start point 25%, end point 25%, direction 20%, magnitude 15%, DTW (Dynamic Time Warping) similarity 15%
Visual feedback – Vowel space plots overlay learner samples on native target regions
User progress – Track improvement over time with personal calibration
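Below is a minimal sketch of one way to combine the five diphthong components. It assumes each component is already a 0-100 sub-score and that a trajectory is a list of (F1, F2) points. `dtw_distance`, `combine_diphthong_score` and the point representation are hypothetical. The README does not say how the engine turns a DTW distance into a sub-score.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # one (F1, F2) sample in Hz (assumed representation)

# Weights from the Features list: start 25%, end 25%, direction 20%,
# magnitude 15%, DTW 15% (they sum to 1.0).
DIPHTHONG_WEIGHTS: Dict[str, float] = {
    "start": 0.25,
    "end": 0.25,
    "direction": 0.20,
    "magnitude": 0.15,
    "dtw": 0.15,
}


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic O(n*m) dynamic time warping with Euclidean local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is matched again
                cost[i][j - 1],      # b advances, a[i-1] is matched again
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def combine_diphthong_score(sub_scores: Dict[str, float]) -> float:
    """Weighted sum of 0-100 sub-scores; raises KeyError if one is missing."""
    return sum(w * sub_scores[name] for name, w in DIPHTHONG_WEIGHTS.items())
```

DTW aligns the two trajectories before comparing them. A learner who glides more slowly than the reference speaker is therefore not penalized for timing alone.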

Prerequisites

Option A – Docker (Recommended)

Docker and Docker Compose installed
- Windows/Mac: Docker Desktop
- Linux: curl -fsSL https://get.docker.com | sh

Option B – Local Development

Python 3.10+ (tested on 3.11/3.12)
ffmpeg – sudo apt install ffmpeg (Debian/Ubuntu)
Python packages – pip install -r requirements.txt
PostgreSQL – Local instance or cloud database
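For Option B, a small script can confirm that the Python version and ffmpeg are in place. `check_prerequisites` is a hypothetical helper. It only checks that `ffmpeg` is on `PATH` and does not test the PostgreSQL connection.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report whether the local-development prerequisites are present."""
    return {
        # Python 3.10+ is required (3.11/3.12 are the tested versions).
        "python": tuple(sys.version_info[:2]) >= min_python,
        # Only checks that an ffmpeg executable is on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:8} {'ok' if ok else 'MISSING'}")
```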

Running the App

Option 1 – Docker (Recommended)

# Copy environment configuration
cp .env.docker .env

# Build and run containers (foreground)
docker compose up --build

# Or build and run in the background (detached)
docker compose up -d --build

Open http://localhost:8000 in your browser.

Useful commands:

docker compose logs -f app     # View logs
docker compose down            # Stop containers
docker compose down -v         # Stop containers and delete volumes (resets the database)

Option 2 – Local Development

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
./run.sh

Or: uvicorn main:app --reload --host 0.0.0.0 --port 8000

Project Structure

CAPSTONE/
├── main.py              # FastAPI app initialization, router registration
├── config.py            # Environment variables, constants, Korean mappings
├── database.py          # PostgreSQL connection and query functions
├── utils.py             # Audio processing, analysis orchestration
├── routes/              # API endpoint modules
│   ├── __init__.py      # Router exports
│   ├── pages.py         # HTML pages (/, /sound, /health)
│   ├── auth.py          # Authentication (/api/auth/*, /api/progress)
│   └── analysis.py      # Sound analysis (/api/analyze-sound*)
├── analysis/            # Pronunciation analysis engines
│   ├── vowel_v2.py      # Vowel formant extraction and scoring
│   ├── consonant.py     # Consonant dispatcher (routes to specific analyzers)
│   ├── stops.py         # Stop consonant analysis (VOT, F0z, place)
│   ├── fricative.py     # Fricative analysis (spectral centroid, ㅅ/ㅆ/ㅎ)
│   ├── affricate.py     # Affricate analysis (VOT + frication, ㅈ/ㅉ/ㅊ)
│   ├── nasal.py         # Nasal consonant analysis (ㄴ/ㅁ)
│   ├── liquid.py        # Liquid consonant analysis (ㄹ)
│   ├── config.py        # Analysis parameters
│   ├── debug_*.py       # Debugging utilities for each consonant type
│   └── README.md        # Engine documentation
├── static/              # Frontend assets (CSS, JS, images)
├── templates/           # Jinja2 HTML templates
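├── Dockerfile           # Container image definition
├── docker-compose.yml   # Multi-container setup (app + PostgreSQL)
├── docker-compose.prod.yml  # Production setup with Nginx + Certbot
├── nginx/               # Nginx config (nginx.prod.conf: HTTPS reverse proxy)
├── init-ssl.sh          # SSL certificate initialization script
├── init.sql             # Database schema initialization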
├── .env.docker          # Environment variables template
├── requirements.txt     # Python dependencies
└── run.sh               # Development server script

API Endpoints

Pages

GET / – Home page with sound selection grid
GET /sound?s=ㅏ – Sound practice page
GET /health – Health check for monitoring
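The /health endpoint works well as a readiness check, e.g. after `docker compose up -d --build`. The sketch below polls it using only the standard library. It assumes a healthy app answers with HTTP 200. `wait_for_health` and its retry defaults are hypothetical.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8000",
                    attempts: int = 30, delay: float = 1.0,
                    timeout: float = 2.0) -> bool:
    """Poll GET /health until it answers 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError and timeouts are all OSError subclasses:
            # the container may still be starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```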

Authentication

POST /api/auth/signup – Create account
POST /api/auth/login – Log in and obtain a session
POST /api/auth/change-password – Update password
GET /api/progress?username=... – Get user's scores
GET /api/formants?userid=... – Get calibration data
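Here is a sketch of calling the authentication endpoints from Python with the standard library. It assumes JSON request bodies with `username` and `password` fields. The README does not document these, so check routes/auth.py for the real schema. `build_json_post` and `build_progress_get` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for EC2 / HTTPS deployments


def build_json_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an /api/auth/* endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_progress_get(username: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /api/progress?username=... with the username URL-encoded."""
    query = urllib.parse.urlencode({"username": username})
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/progress?{query}")


# Hypothetical field names: the request bodies are not documented here.
signup = build_json_post("/api/auth/signup", {"username": "learner1", "password": "s3cret"})
# Send with: urllib.request.urlopen(signup)
```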

Analysis

POST /api/calibration – Save calibration recording
POST /api/analyze-sound – Analyze with progress tracking
POST /api/analyze-sound-guest – Analyze without login
GET /api/info – API information and endpoints
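The browser uploads its recording as a file, so a non-browser client has to send multipart/form-data. The sketch builds such a request for the guest endpoint. The form field names `sound` and `file`, the `audio/webm` content type and the helper names are all assumptions, not the documented contract of routes/analysis.py.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes, str]]) -> Tuple[bytes, str]:
    """Encode form fields and files as multipart/form-data (RFC 7578)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(value.encode("utf-8") + b"\r\n")
    for name, (filename, content, ctype) in files.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n".encode()
        )
        parts.append(content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_guest_analysis(audio: bytes, sound: str,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST to /api/analyze-sound-guest (field names are hypothetical)."""
    body, ctype = encode_multipart(
        {"sound": sound},
        {"file": ("recording.webm", audio, "audio/webm")},
    )
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/analyze-sound-guest",
        data=body, headers={"Content-Type": ctype}, method="POST",
    )
```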

Module Descriptions

config.py

Central configuration including:

DB_URL – Database connection string from environment
VOWEL_SYMBOL_TO_KEY – Maps Hangul jamo to analysis keys
CONSONANT_SYMBOL_TO_SYLLABLE – Maps jamo to example syllables
SOUND_DESCRIPTIONS – Pronunciation guidance for each sound
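A minimal sketch of the kind of lookups config.py centralizes, assuming DB_URL is a plain environment variable. `load_db_url` and `vowel_key` are hypothetical. The romanized keys in the sample VOWEL_SYMBOL_TO_KEY subset are placeholders rather than the project's actual keys.

```python
import os
from typing import Mapping, Optional


def load_db_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Read DB_URL from the environment, failing fast if it is missing."""
    env = os.environ if env is None else env
    url = env.get("DB_URL", "").strip()
    if not url:
        raise RuntimeError("DB_URL is not set")
    return url


# Hypothetical subset with placeholder keys; the real mapping lives in config.py.
VOWEL_SYMBOL_TO_KEY = {"ㅏ": "a", "ㅓ": "eo", "ㅗ": "o", "ㅜ": "u"}


def vowel_key(symbol: str) -> str:
    """Translate a jamo such as the ?s= query value into an analysis key."""
    if symbol not in VOWEL_SYMBOL_TO_KEY:
        raise ValueError(f"unknown vowel symbol: {symbol!r}")
    return VOWEL_SYMBOL_TO_KEY[symbol]
```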

database.py

PostgreSQL operations with context-managed connections:

User management (create, authenticate, update password)
Progress tracking (get/update high scores)
Calibration data (formant storage)
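Below is one possible shape for the context-managed connection pattern and a keep-the-best high-score update. It is a sketch. Stdlib `sqlite3` stands in for PostgreSQL so the code runs anywhere. With psycopg2 the placeholders would be `%s` and the SQL would use `GREATEST`. The `progress` table layout is hypothetical; the real schema is in init.sql.

```python
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def get_connection(connect: Callable[[], sqlite3.Connection]) -> Iterator[sqlite3.Connection]:
    """Open a connection, commit on success, roll back on error, always close."""
    conn = connect()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


# Hypothetical table; PostgreSQL would use GREATEST(...) instead of
# SQLite's two-argument scalar MAX(...).
UPSERT_HIGH_SCORE = """
INSERT INTO progress (username, sound, high_score) VALUES (?, ?, ?)
ON CONFLICT (username, sound)
DO UPDATE SET high_score = MAX(high_score, excluded.high_score)
"""


def update_high_score(conn: sqlite3.Connection, username: str, sound: str, score: float) -> None:
    """Insert a score, or keep whichever of old/new is higher."""
    conn.execute(UPSERT_HIGH_SCORE, (username, sound, score))


def get_high_score(conn: sqlite3.Connection, username: str, sound: str) -> Optional[float]:
    row = conn.execute(
        "SELECT high_score FROM progress WHERE username = ? AND sound = ?",
        (username, sound),
    ).fetchone()
    return None if row is None else row[0]
```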

utils.py

Audio processing and analysis orchestration:

File handling (temp files, cleanup)
Type conversions (safe_float, normalise_score)
Analysis functions (vowel/consonant analysis)
Main entry point: analyse_uploaded_audio()
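The README names safe_float and normalise_score but does not describe their behavior. What follows is a hypothetical re-implementation under stated assumptions. Invalid or non-finite input becomes a default, and scores are clamped to 0-100 and rounded to one decimal. A temp-file helper illustrates the "temp files, cleanup" responsibility.

```python
import math
import os
import tempfile
from contextlib import contextmanager
from typing import Any, Iterator, Optional


def safe_float(value: Any, default: Optional[float] = None) -> Optional[float]:
    """Convert to float; return `default` for None, junk, NaN or infinity."""
    try:
        result = float(value)
    except (TypeError, ValueError):
        return default
    return result if math.isfinite(result) else default


def normalise_score(score: Any) -> float:
    """Clamp a raw score into the 0-100 range and round to one decimal."""
    value = safe_float(score)
    if value is None:
        return 0.0
    return round(min(100.0, max(0.0, value)), 1)


@contextmanager
def temp_audio_file(data: bytes, suffix: str = ".webm") -> Iterator[str]:
    """Write upload bytes to a temp file and always delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```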

routes/

Organized API endpoints:

pages.py – HTML rendering with Jinja2 templates
auth.py – User authentication and data retrieval
analysis.py – Sound analysis and calibration

Scoring System

Threshold: ±1.5σ (standard deviations) from the native speaker mean
Full score: 100 points for any value within the threshold
Penalty: 60 points deducted per σ beyond the threshold, so the score falls linearly from 100 at 1.5σ to 10 at 3σ
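The scoring rule can be written as a small function of the z-score |x - μ| / σ. This is a sketch for a single feature. The function name and the clamp at 0 (reached at about 3.17σ) are assumptions. The README also does not say how per-feature scores (e.g. F1, F2, F3) are combined.

```python
THRESHOLD_SIGMA = 1.5     # full marks inside +/-1.5 sigma of the native mean
PENALTY_PER_SIGMA = 60.0  # points lost per sigma beyond the threshold


def feature_score(value: float, native_mean: float, native_sd: float) -> float:
    """Score one acoustic feature (e.g. F1 in Hz) against native statistics."""
    if native_sd <= 0:
        raise ValueError("native_sd must be positive")
    z = abs(value - native_mean) / native_sd  # distance in standard deviations
    if z <= THRESHOLD_SIGMA:
        return 100.0
    score = 100.0 - PENALTY_PER_SIGMA * (z - THRESHOLD_SIGMA)
    return max(0.0, score)  # assumption: never below 0
```

Worked example with illustrative numbers: take a native F1 mean of 700 Hz and a standard deviation of 50 Hz. A learner F1 of 800 Hz lies 2σ away and scores 100 - 60 × (2 - 1.5) = 70.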

Analysis by Sound Type

| Type | Sounds | Features | Visualization |
|---|---|---|---|
| Vowels | ㅏ ㅓ ㅗ ㅜ ... | F1/F2/F3 formants | Formant chart + Articulatory map |
| Stops | ㄱ ㄲ ㅋ ㄷ ㄸ ㅌ ㅂ ㅃ ㅍ | VOT, F0z, Place | Place chart + VOT-F0z chart |
| Fricatives | ㅅ ㅆ ㅎ | Spectral centroid, HF contrast | Slider (ㅆ-ㅅ-ㅎ) |
| Affricates | ㅈ ㅉ ㅊ | VOT, Frication duration | Slider (ㅉ-ㅈ-ㅊ) |
| Nasals | ㄴ ㅁ | Nasal energy, formants | Feedback only |
| Liquids | ㄹ | Tap/lateral features | Feedback only |
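The table maps naturally onto a symbol-to-analyzer lookup. This sketch builds one from the table rows, using only the four vowels listed there, and pairs each family with a module from analysis/. It illustrates the dispatch idea only; the real logic in consonant.py may differ. `SOUND_TYPES`, `ANALYZER_MODULE` and `sound_type` are hypothetical names.

```python
from typing import Dict

# Sound groups copied from the table (the vowel row is abbreviated there too).
_GROUPS: Dict[str, str] = {
    "vowel": "ㅏㅓㅗㅜ",
    "stop": "ㄱㄲㅋㄷㄸㅌㅂㅃㅍ",
    "fricative": "ㅅㅆㅎ",
    "affricate": "ㅈㅉㅊ",
    "nasal": "ㄴㅁ",
    "liquid": "ㄹ",
}

# One entry per jamo, e.g. "ㄲ" -> "stop".
SOUND_TYPES: Dict[str, str] = {
    symbol: group for group, symbols in _GROUPS.items() for symbol in symbols
}

# Module names taken from the analysis/ directory in the project structure.
ANALYZER_MODULE: Dict[str, str] = {
    "vowel": "analysis.vowel_v2",
    "stop": "analysis.stops",
    "fricative": "analysis.fricative",
    "affricate": "analysis.affricate",
    "nasal": "analysis.nasal",
    "liquid": "analysis.liquid",
}


def sound_type(symbol: str) -> str:
    """Return the analyzer family for a jamo, or raise ValueError."""
    try:
        return SOUND_TYPES[symbol]
    except KeyError:
        raise ValueError(f"unsupported sound: {symbol!r}") from None
```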

AWS EC2 Deployment

Option 1 – Development (HTTP only)

Launch an EC2 instance (Ubuntu 22.04, t2.small or larger)

Install Docker:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

Clone and run:

git clone <repository-url>
cd CAPSTONE
docker compose up -d --build

Configure Security Group: Allow inbound TCP port 8000
Access via: http://<EC2-Public-IP>:8000

Option 2 – Production (HTTPS with Nginx + SSL)

HTTPS is required for microphone access: browsers expose getUserMedia() only in secure contexts (HTTPS or localhost).

Set up DuckDNS (free domain)
- Go to https://www.duckdns.org and create a subdomain
- Point it to your EC2 Public IP

Install Docker on EC2:

sudo apt-get update
sudo apt-get install -y docker.io docker-compose-v2
sudo systemctl start docker && sudo systemctl enable docker
sudo usermod -aG docker $USER
newgrp docker

Configure Security Group:
- Allow inbound TCP port 80 (HTTP - for Let's Encrypt)
- Allow inbound TCP port 443 (HTTPS)

Clone and set up:
```
git clone <repository-url>
cd CAPSTONE
```

Get SSL Certificate:

chmod +x init-ssl.sh
./init-ssl.sh your-domain.duckdns.org your-email@example.com

Start production server:

docker compose -f docker-compose.prod.yml up -d --build

Access via: https://your-domain.duckdns.org

Manual Update Steps

Commit and push code changes to the main branch on GitHub
SSH into EC2 instance

Pull changes and restart:

cd ~/CAPSTONE
git pull origin main
# For development:
docker compose up -d --build
# For production (HTTPS):
docker compose -f docker-compose.prod.yml up -d --build

Production Files

| File | Description |
|---|---|
| `docker-compose.prod.yml` | Production setup with Nginx + Certbot |
| `nginx/nginx.prod.conf` | HTTPS reverse proxy configuration |
| `init-ssl.sh` | SSL certificate initialization script |

Production Notes

Microphone access requires HTTPS (localhost excepted); use the production deployment for full functionality
SSL certificates auto-renew via Certbot (90-day validity)
Set up a cron job to periodically delete generated plots in /static/images/analysis/
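Below is a minimal cleanup script for the plot directory. It assumes plots are saved as PNG files and that 24 hours is an acceptable retention period. The script name cleanup_plots.py and the crontab line are hypothetical. If the images are written inside the app container rather than to a host-mounted path, run the script through `docker compose exec app` instead of directly on the host.

```python
import time
from pathlib import Path
from typing import Optional

# Example crontab entry (hypothetical path), running hourly:
# 0 * * * * /usr/bin/python3 /home/ubuntu/CAPSTONE/cleanup_plots.py
PLOT_DIR = "static/images/analysis"


def cleanup_plots(directory: str = PLOT_DIR, max_age_hours: float = 24.0,
                  now: Optional[float] = None) -> int:
    """Delete PNG files older than max_age_hours; return how many were removed."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    removed = 0
    for path in Path(directory).glob("*.png"):  # a missing dir yields nothing
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    print(f"removed {cleanup_plots()} old plot(s)")
```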

Repository: SammyDuDu/CAPSTONE
