An intelligent chatbot that translates and teaches the Materano dialect, helping preserve the linguistic and cultural heritage of Matera.
ChatMT = Chat + MT (province of Matera) - your personal assistant for the Materano dialect!
The project aims to build a conversational assistant that:
- Translates between Italian and the Materano dialect
- Teaches the dialect's grammar and pronunciation
- Provides cultural and historical context
- Recounts traditions and anecdotes from the Sassi
User Input → LangGraph Workflow → Response
                   ↓
[Preprocessing] → [Dictionary Lookup] → [Response Formatting]
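The three stages in the diagram can be sketched in plain Python before any LangGraph wiring exists. This is a minimal stand-in, not the project's workflow: the state class, the function names, and the one-entry `SAMPLE_DICTIONARY` are all hypothetical, and each function plays the role that a graph node would play.

```python
from dataclasses import dataclass, field

# Hypothetical one-entry dictionary used only for illustration; real entries
# would come from the processed dictionary data.
SAMPLE_DICTIONARY = {
    "abbinì": "hai da venire, devi venire",
}


@dataclass
class TranslationState:
    """State handed from one stage to the next."""
    user_input: str
    tokens: list = field(default_factory=list)
    matches: dict = field(default_factory=dict)
    response: str = ""


def preprocess(state):
    # Split on whitespace, strip sentence punctuation, lowercase each word.
    words = (w.strip(".,;:!?").lower() for w in state.user_input.split())
    state.tokens = [w for w in words if w]
    return state


def dictionary_lookup(state, dictionary=SAMPLE_DICTIONARY):
    # Keep only the tokens that have an exact dictionary entry.
    state.matches = {t: dictionary[t] for t in state.tokens if t in dictionary}
    return state


def format_response(state):
    # Render one "term = translation" line per match.
    if state.matches:
        lines = [f"{term} = {meaning}" for term, meaning in state.matches.items()]
        state.response = "\n".join(lines)
    else:
        state.response = "No dictionary entry found."
    return state


def run_pipeline(text):
    # Run the stages in the order shown in the diagram.
    state = TranslationState(user_input=text)
    for stage in (preprocess, dictionary_lookup, format_response):
        state = stage(state)
    return state.response
```

Keeping every stage as a function from state to state means the same functions could later be registered as workflow nodes without changing their bodies.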
User Input → Chat Manager → Specialized Agents → Coordinated Response
                   ↓
                   ├─ Materano Translator (LangGraph)
                   ├─ Cultural Storyteller
                   ├─ Matera Tourist Guide
                   └─ Dialect Teacher
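One way to picture the Chat Manager is as a router that picks one or more specialized agents and joins their replies. The sketch below assumes keyword-based routing purely for illustration; the hint lists, the agent stubs, and the function names are hypothetical, and a real manager would more likely ask an LLM to classify the request.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]


def make_stub(name: str) -> Agent:
    # Stand-in for a real agent: it only tags the message with its own name.
    def agent(message: str) -> str:
        return f"[{name}] {message}"
    return agent


# The four agents from the diagram, as stubs.
AGENTS: Dict[str, Agent] = {
    name: make_stub(name)
    for name in ("translator", "storyteller", "guide", "teacher")
}

# Hypothetical keyword hints; the translator is the fallback.
ROUTING_HINTS: Dict[str, Tuple[str, ...]] = {
    "storyteller": ("story", "tradition", "anecdote"),
    "guide": ("visit", "tour", "museum"),
    "teacher": ("grammar", "pronunciation", "lesson"),
}


def select_agents(message: str) -> List[str]:
    # Pick every agent whose hints appear in the message.
    text = message.lower()
    selected = [
        name for name, hints in ROUTING_HINTS.items()
        if any(hint in text for hint in hints)
    ]
    return selected or ["translator"]


def chat_manager(message: str) -> str:
    # Collect the replies of the selected agents into one coordinated response.
    return "\n".join(AGENTS[name](message) for name in select_agents(message))
```

For example, `chat_manager("A grammar lesson before my visit")` calls both the guide and the teacher stubs, while a message with no matching hint goes to the translator.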
- Materano Dictionary (Antonio D'Ercole - "Voci di Sassi")
  - 80-page PDF
  - Format: Materano → Italian
  - Usage examples and context
- The Materano dialect (phonetic features)
- SassiTour: articles on typical expressions
- Wikipedia: detailed linguistic analysis
- Angelo Sarra: "Dizionario 'Na chedd' di parole in disuso" (with audio CD)
- Python 3.12+
- LangGraph - Workflow orchestration
- LangChain - LLM integration
- Ollama - Local LLM models
- OpenAI API - Cloud LLM models
- pdfplumber - PDF text extraction
- BeautifulSoup4 - Web scraping WikiMatera
- pandas - Data manipulation
- Redis - Vector database and caching
- redis-py - Redis Python client
- Streamlit or Gradio - Chat interface
- FastAPI - Backend API
# Redis data structure for dialect terms
KEY: "term:materano:{term_id}"
VALUE: {
"materano_term": "abbinì",
"italian_translation": "hai da venire, devi venire",
"category": "verbi",
"examples": ["abbinì appess a miéì"],
"cultural_notes": "formula di invito tipica",
"source": "dizionario_dercole",
"embedding": [0.1, 0.2, ...], # Vector for semantic search
"created_at": "2025-01-01T00:00:00Z"
}
# Search indexes
SET: "terms:by_category:{category}" → {term_id1, term_id2, ...}
SET: "terms:by_source:{source}" → {term_id1, term_id2, ...}
ZSET: "terms:by_popularity" → term_id (score = usage_count)
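The key layout above could be written with redis-py roughly as follows. This is a sketch under assumptions: it stores each term as a Redis hash (the schema does not say whether the value is a hash or a JSON string), JSON-encodes list fields because hash values are flat strings, and uses hypothetical helper names. It does not address how the embedding should be encoded for vector search.

```python
import json


def term_key(term_id):
    # Key pattern taken from the schema: "term:materano:{term_id}"
    return f"term:materano:{term_id}"


def to_hash_fields(record):
    # Hash fields must be flat strings, so non-string values are JSON-encoded.
    return {
        name: value if isinstance(value, str)
        else json.dumps(value, ensure_ascii=False)
        for name, value in record.items()
    }


def store_term(client, term_id, record):
    """Write one term and register it in the three indexes.

    `client` is expected to behave like a redis-py client, i.e. to expose
    hset(name, mapping=...), sadd(name, *values) and
    zincrby(name, amount, value).
    """
    client.hset(term_key(term_id), mapping=to_hash_fields(record))
    client.sadd(f"terms:by_category:{record['category']}", term_id)
    client.sadd(f"terms:by_source:{record['source']}", term_id)
    # Incrementing by 0 adds the member with score 0 if it is new and
    # leaves an existing usage count untouched.
    client.zincrby("terms:by_popularity", 0, term_id)


def record_usage(client, term_id):
    # Bump the popularity score each time a term is served.
    client.zincrby("terms:by_popularity", 1, term_id)
```

With a running server, `client` would typically be `redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)`; passing the client in as an argument also makes the functions easy to exercise against a fake in unit tests.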
git clone [repository-url]
cd ChatMT
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Copy and edit environment file
cp .env.example .env
# .env file
OPENAI_API_KEY=your_openai_key
OLLAMA_MODEL=llama3.1 # or preferred local model
USE_LOCAL_MODEL=true # or false for OpenAI
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
REDIS_PASSWORD= # if required
LOG_LEVEL=INFO
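A small settings object can turn these variables into typed values. The sketch below is one possible shape for it, not the project's `config.py`: the `Settings` class and its field names are hypothetical, and it assumes the sample values shown above are also the defaults when a variable is unset or empty.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    use_local_model: bool
    ollama_model: str
    openai_api_key: Optional[str]
    redis_host: str
    redis_port: int
    redis_db: int
    redis_password: Optional[str]
    log_level: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        def get(name: str, default: str = "") -> str:
            # Treat unset and empty variables the same way.
            return env.get(name, "").strip() or default

        return cls(
            use_local_model=get("USE_LOCAL_MODEL", "true").lower()
            in ("1", "true", "yes"),
            ollama_model=get("OLLAMA_MODEL", "llama3.1"),
            openai_api_key=get("OPENAI_API_KEY") or None,
            redis_host=get("REDIS_HOST", "localhost"),
            redis_port=int(get("REDIS_PORT", "6379")),
            redis_db=int(get("REDIS_DB", "0")),
            redis_password=get("REDIS_PASSWORD") or None,
            log_level=get("LOG_LEVEL", "INFO").upper(),
        )
```

`Settings.from_env()` reads the process environment, while passing a plain dict (for example `Settings.from_env({"USE_LOCAL_MODEL": "false"})`) keeps tests independent of the machine they run on. The sketch reads values as given, so any trailing inline comment in the `.env` file must already have been stripped by whatever loads the file.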
# Install Redis (if not already installed)
# macOS: brew install redis
# Ubuntu: sudo apt install redis-server
# Start Redis: redis-server
# Data processing (development-driven approach)
uv run src/data/pdf_extractor.py
uv run src/data/web_scraper.py
# Note: Database setup and schema will be implemented during development
# CLI version
uv run main.py
# Web interface (Phase 3+)
uv run streamlit run src/interface/streamlit_app.py
ChatMT/
├── pyproject.toml  # UV dependency management
├── uv.lock  # UV lockfile
├── README.md
├── ROADMAP.md
├── .env.example
├── .gitignore
├── main.py  # Entry point CLI
│
├── data/  # All data resources
│   ├── raw/  # Original, immutable resources
│   │   ├── dizionario_materano.pdf
│   │   └── images/  # Dialect-related images from WikiMatera
│   │
│   ├── scraped/  # Web-scraped content
│   │   ├── wikimatera_pages.json
│   │   ├── proverbi.json
│   │   ├── grammatica.json
│   │   ├── numeri.json
│   │   ├── poesie.json
│   │   ├── preghiere.json
│   │   ├── soprannomi.json
│   │   └── parole_antiche.json
│   │
│   ├── processed/  # Cleaned, structured data
│   │   ├── dictionary_terms.json
│   │   ├── phonetic_rules.json
│   │   ├── cultural_content.json
│   │   └── training_examples.json
│   │
│   └── exports/  # Generated datasets for training
│       ├── translation_pairs.csv
│       ├── conversation_examples.json
│       └── validation_set.json
│
├── src/  # Source code
│   ├── __init__.py
│   │
│   ├── core/  # Core application logic
│   │   ├── __init__.py
│   │   ├── config.py  # Configuration management
│   │   ├── exceptions.py  # Custom exceptions
│   │   └── constants.py  # Application constants
│   │
│   ├── models/  # LLM and data models
│   │   ├── __init__.py
│   │   ├── model_factory.py  # Ollama/OpenAI factory
│   │   ├── schemas.py  # Pydantic models
│   │   └── prompts.py  # Prompt templates
│   │
│   ├── data/  # Data processing and management
│   │   ├── __init__.py
│   │   ├── pdf_extractor.py  # PDF dictionary processing
│   │   ├── web_scraper.py  # WikiMatera scraping
│   │   ├── redis_manager.py  # Redis operations and vector search
│   │   ├── text_processor.py  # Text cleaning and normalization
│   │   └── knowledge_builder.py  # Knowledge base construction
│   │
│   ├── workflows/  # LangGraph workflows
│   │   ├── __init__.py
│   │   ├── translation_workflow.py  # Core translation logic
│   │   ├── chat_workflow.py  # Conversational flow
│   │   ├── teaching_workflow.py  # Educational interactions
│   │   └── nodes/  # Workflow nodes
│   │       ├── __init__.py
│   │       ├── language_detection.py
│   │       ├── dictionary_lookup.py
│   │       ├── phonetic_rules.py
│   │       ├── cultural_context.py
│   │       └── response_formatter.py
│   │
│   ├── agents/  # Multi-agent system (Phase 4)
│   │   ├── __init__.py
│   │   ├── base_agent.py  # Abstract base agent
│   │   ├── chat_manager.py  # Main orchestrator
│   │   ├── translator.py  # Translation specialist
│   │   ├── storyteller.py  # Cultural narratives
│   │   ├── teacher.py  # Grammar and lessons
│   │   └── guide.py  # Matera tourism info
│   │
│   ├── interface/  # User interfaces
│   │   ├── __init__.py
│   │   ├── cli.py  # Command line interface
│   │   ├── streamlit_app.py  # Web chat interface
│   │   └── gradio_app.py  # Alternative web interface
│   │
│   ├── services/  # External service integrations
│   │   ├── __init__.py
│   │   ├── ollama_service.py  # Ollama integration
│   │   └── openai_service.py  # OpenAI integration
│   │
│   └── utils/  # Utility functions
│       ├── __init__.py
│       ├── logging_config.py  # Logging setup
│       ├── validation.py  # Data validation
│       ├── text_utils.py  # Text processing utilities
│       └── performance.py  # Performance monitoring
│
├── scripts/  # Development utilities (as needed)
│   └── __init__.py
│
├── tests/  # Test suite
│   ├── __init__.py
│   ├── conftest.py  # Pytest configuration
│   ├── unit/  # Unit tests
│   │   ├── test_data_processing.py
│   │   ├── test_workflows.py
│   │   ├── test_models.py
│   │   └── test_utils.py
│   ├── integration/  # Integration tests
│   │   ├── test_translation_flow.py
│   │   ├── test_redis_operations.py
│   │   └── test_agents.py
│   └── fixtures/  # Test data
│       ├── sample_dictionary.json
│       ├── test_conversations.json
│       └── validation_cases.json
│
├── config/  # Configuration files
│   ├── logging.yaml  # Logging configuration
│   ├── redis.yaml  # Redis connection and schema config
│   └── agents.yaml  # Agent configurations
│
├── notebooks/  # Jupyter notebooks for analysis
│   ├── 01_dictionary_analysis.ipynb
│   ├── 02_phonetic_patterns.ipynb
│   ├── 03_cultural_content_exploration.ipynb
│   └── 04_model_evaluation.ipynb
│
├── docs/  # Documentation
│   ├── architecture.md  # System architecture
│   ├── data_sources.md  # Data documentation
│   ├── api_reference.md  # API documentation
│   ├── deployment.md  # Deployment guide
│   └── examples/  # Usage examples
│       ├── basic_translation.py
│       ├── chat_examples.py
│       └── agent_workflows.py
│
└── deployment/  # Deployment configurations
    ├── Dockerfile
    ├── docker-compose.yml
    ├── requirements-prod.txt  # Production dependencies
    └── k8s/  # Kubernetes configs (future)
        ├── deployment.yaml
        └── service.yaml
- Preserve the Materano dialect for future generations
- Provide an interactive learning tool
- Document linguistic variants and nuances
- Experiment with LangGraph for complex workflows
- Implement conversational multi-agent systems
- Integrate heterogeneous textual resources
- Promote Matera's cultural heritage
- Build a bridge between tradition and technological innovation
- Develop a tool for cultural tourism
The project is open to contributions from:
- Native Materano speakers, for linguistic validation
- Developers interested in regional dialects
- Experts in NLP and conversational systems
- Enthusiasts of Materano culture
[To be defined - consider an open-source license for the technical part]
- Antonio D'Ercole for the "Dizionario Materano"
- WikiMatera.it for its cultural resources
- The Matera community for preserving the dialect