An intelligent chatbot for translating and teaching the Matera dialect (dialetto materano), preserving the linguistic and cultural heritage of Matera.
ChatMT = Chat + MT (province of Matera): your personal assistant for the Matera dialect!
The project aims to build a conversational assistant that:
- Translates between Italian and the Matera dialect
- Teaches the grammar and pronunciation of the dialect
- Provides cultural and historical context
- Shares traditions and anecdotes from the Sassi
User Input → LangGraph Workflow → Response
    ↓
[Preprocessing] → [Dictionary Lookup] → [Response Formatting]
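The three stages above can be sketched as plain functions that pass a shared state along. This is a minimal plain-Python stand-in, not the LangGraph API and not the project's actual code: `TranslationState`, `GLOSSARY`, `run_pipeline` and the stage functions are hypothetical names, and the one-entry glossary only reuses the sample term from this README.

```python
from dataclasses import dataclass, field

# Hypothetical one-entry glossary; real data would come from the dictionary sources.
GLOSSARY = {"abbinì": "hai da venire, devi venire"}


@dataclass
class TranslationState:
    """State handed from one stage to the next."""
    user_input: str
    tokens: list = field(default_factory=list)
    matches: dict = field(default_factory=dict)
    response: str = ""


def preprocess(state: TranslationState) -> TranslationState:
    # Lowercase, split on whitespace, strip surrounding punctuation.
    raw = state.user_input.lower().split()
    state.tokens = [t.strip(".,;:!?") for t in raw if t.strip(".,;:!?")]
    return state


def dictionary_lookup(state: TranslationState) -> TranslationState:
    # Keep only the tokens that have a glossary entry.
    state.matches = {t: GLOSSARY[t] for t in state.tokens if t in GLOSSARY}
    return state


def format_response(state: TranslationState) -> TranslationState:
    if state.matches:
        lines = [f"{term} -> {meaning}" for term, meaning in state.matches.items()]
        state.response = "\n".join(lines)
    else:
        state.response = "No dictionary entry found."
    return state


def run_pipeline(text: str) -> str:
    """Run Preprocessing -> Dictionary Lookup -> Response Formatting in order."""
    state = TranslationState(user_input=text)
    for stage in (preprocess, dictionary_lookup, format_response):
        state = stage(state)
    return state.response
```

Keeping each stage as a function from state to state means the same functions could later be registered as workflow nodes without changing their bodies.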
User Input → Chat Manager → Specialized Agents → Coordinated Response
                   ↓
    ├── Materano Translator (LangGraph)
    ├── Cultural Storyteller
    ├── Matera Tourist Guide
    └── Dialect Teacher
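One way to picture the Chat Manager is as a dispatcher that picks one of the four specialized agents per message. The sketch below is only an illustration under assumptions: keyword matching is a placeholder for whatever routing the project ends up using, and `ROUTES`, `chat_manager` and the agent functions are hypothetical names that return tagged strings instead of calling a model.

```python
from typing import Callable, List, Tuple

Agent = Callable[[str], str]


def translator(message: str) -> str:
    return f"[translator] {message}"


def storyteller(message: str) -> str:
    return f"[storyteller] {message}"


def guide(message: str) -> str:
    return f"[guide] {message}"


def teacher(message: str) -> str:
    return f"[teacher] {message}"


# Hypothetical keyword table: the first agent with a matching keyword wins.
ROUTES: List[Tuple[Tuple[str, ...], Agent]] = [
    (("translate", "how do you say"), translator),
    (("story", "tradition", "anecdote"), storyteller),
    (("visit", "where", "tour"), guide),
    (("grammar", "pronounce", "pronunciation", "lesson"), teacher),
]


def chat_manager(message: str) -> str:
    """Send the message to the first matching agent, else to the translator."""
    lowered = message.lower()
    for keywords, agent in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return agent(message)
    return translator(message)
```

Falling back to the translator reflects the assumption that translation is the most common request; a real manager might instead ask the user to clarify.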
- Dizionario Materano (Antonio D'Ercole, "Voci di Sassi")
- 80 pages, PDF
- Format: Materano → Italian
- Usage examples and context
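Once text has been extracted from the PDF, each entry has to be split into a dialect term and its Italian translation. The sketch below assumes, purely for illustration, that every entry sits on one line as `term: translation`; the actual layout of the dictionary is not described here, so `ENTRY_PATTERN` and `parse_entries` are hypothetical and the pattern would need adjusting to the real pages.

```python
import re

# Assumed entry layout: "term: translation" on a single line (not confirmed by the source).
ENTRY_PATTERN = re.compile(r"^\s*(?P<term>[^:]+?)\s*:\s*(?P<translation>.+?)\s*$")


def parse_entries(lines):
    """Turn extracted text lines into a list of term/translation dictionaries."""
    entries = []
    for line in lines:
        match = ENTRY_PATTERN.match(line)
        if match is None:
            # Skip blank lines, page numbers and anything without a separator.
            continue
        entries.append(
            {
                "materano_term": match["term"],
                "italian_translation": match["translation"],
            }
        )
    return entries
```

The field names match the ones used in the Redis record layout of this README, so parsed entries can be stored without renaming.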
 
- Il dialetto materano (phonetic features)
- SassiTour: articles on typical expressions
- Wikipedia: detailed linguistic analysis
- Angelo Sarra: "Dizionario 'Na chedd' di parole in disuso" (with audio CD)
- Python 3.12+
- LangGraph - Workflow orchestration
- LangChain - LLM integration
- Ollama - Local LLM models
- OpenAI API - Cloud LLM models
- pdfplumber - PDF text extraction
- BeautifulSoup4 - Web scraping WikiMatera
- pandas - Data manipulation
- Redis - Vector database and caching
- redis-py - Redis Python client
- Streamlit or Gradio - Chat interface
- FastAPI - Backend API
# Redis data structure for dialect terms
KEY: "term:materano:{term_id}"
VALUE: {
    "materano_term": "abbinì",
    "italian_translation": "hai da venire, devi venire", 
    "category": "verbi",
    "examples": ["abbinì appess a miéì"],
    "cultural_notes": "formula di invito tipica",
    "source": "dizionario_dercole",
    "embedding": [0.1, 0.2, ...],  # Vector for semantic search
    "created_at": "2025-01-01T00:00:00Z"
}
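A minimal sketch of writing one such record, together with the category, source and popularity indexes this README defines, might look like the following. It assumes the record is stored as a JSON string under its key, which is only one option. `store_term` is a hypothetical helper, and `InMemoryRedis` is a tiny stand-in so the example runs without a server; it mimics only the `set`, `sadd` and `zincrby` methods of a redis-py client, which could be passed in instead.

```python
import json
from collections import defaultdict


class InMemoryRedis:
    """Stand-in exposing the three redis-py methods used by store_term."""

    def __init__(self):
        self.strings = {}
        self.sets = defaultdict(set)
        self.zsets = defaultdict(dict)

    def set(self, name, value):
        self.strings[name] = value
        return True

    def sadd(self, name, *values):
        before = len(self.sets[name])
        self.sets[name].update(values)
        return len(self.sets[name]) - before

    def zincrby(self, name, amount, value):
        self.zsets[name][value] = self.zsets[name].get(value, 0.0) + amount
        return self.zsets[name][value]


def store_term(client, term_id, record):
    """Write the record and register term_id in the three indexes."""
    key = f"term:materano:{term_id}"
    # Assumption: the whole record is serialized as one JSON string.
    client.set(key, json.dumps(record, ensure_ascii=False))
    client.sadd(f"terms:by_category:{record['category']}", term_id)
    client.sadd(f"terms:by_source:{record['source']}", term_id)
    # An increment of 0 creates the member with score 0 if it is missing.
    client.zincrby("terms:by_popularity", 0, term_id)
    return key


sample = {
    "materano_term": "abbinì",
    "italian_translation": "hai da venire, devi venire",
    "category": "verbi",
    "source": "dizionario_dercole",
}
client = InMemoryRedis()
stored_key = store_term(client, "42", sample)
```

The embedding field is left out of the sample because how vectors are stored and indexed depends on the Redis vector search setup, which is not specified here.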
# Search indexes
SET: "terms:by_category:{category}" → {term_id1, term_id2, ...}
SET: "terms:by_source:{source}" → {term_id1, term_id2, ...}
ZSET: "terms:by_popularity" → term_id (score = usage_count)
git clone [repository-url]
cd ChatMT
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Copy and edit environment file
cp .env.example .env
# .env file
OPENAI_API_KEY=your_openai_key
OLLAMA_MODEL=llama3.1  # or preferred local model
USE_LOCAL_MODEL=true   # or false for OpenAI
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
REDIS_PASSWORD=  # if required
LOG_LEVEL=INFO
# Install Redis (if not already installed)
# macOS: brew install redis
# Ubuntu: sudo apt install redis-server
# Start Redis: redis-server
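The `.env` settings above could be read into one typed object at startup. The sketch below is an assumption about how that might be done, not the project's `config.py`: `Settings` and `load_settings` are hypothetical names, the defaults simply mirror the sample values, and the variables are expected to be in the process environment already (for example exported by the shell or loaded by a dotenv tool).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    use_local_model: bool
    ollama_model: str
    openai_api_key: Optional[str]
    redis_host: str
    redis_port: int
    redis_db: int
    redis_password: Optional[str]
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of variables (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        # Anything other than "true" (case-insensitive) selects the OpenAI path.
        use_local_model=env.get("USE_LOCAL_MODEL", "true").strip().lower() == "true",
        ollama_model=env.get("OLLAMA_MODEL", "llama3.1"),
        # Empty strings are treated as "not set".
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_db=int(env.get("REDIS_DB", "0")),
        redis_password=env.get("REDIS_PASSWORD") or None,
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.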
# Data processing (development-driven approach)
uv run src/data/pdf_extractor.py
uv run src/data/web_scraper.py
# Note: Database setup and schema will be implemented during development
# CLI version
uv run main.py
# Web interface (Phase 3+)
uv run streamlit run src/interface/streamlit_app.py
ChatMT/
├── pyproject.toml              # UV dependency management
├── uv.lock                     # UV lockfile
├── README.md
├── ROADMAP.md
├── .env.example
├── .gitignore
├── main.py                     # Entry point CLI
│
├── data/                       # All data resources
│   ├── raw/                    # Original, immutable resources
│   │   ├── dizionario_materano.pdf
│   │   └── images/             # Dialect-related images from WikiMatera
│   │
│   ├── scraped/                # Web-scraped content
│   │   ├── wikimatera_pages.json
│   │   ├── proverbi.json
│   │   ├── grammatica.json
│   │   ├── numeri.json
│   │   ├── poesie.json
│   │   ├── preghiere.json
│   │   ├── soprannomi.json
│   │   └── parole_antiche.json
│   │
│   ├── processed/              # Cleaned, structured data
│   │   ├── dictionary_terms.json
│   │   ├── phonetic_rules.json
│   │   ├── cultural_content.json
│   │   └── training_examples.json
│   │
│   └── exports/                # Generated datasets for training
│       ├── translation_pairs.csv
│       ├── conversation_examples.json
│       └── validation_set.json
│
├── src/                        # Source code
│   ├── __init__.py
│   │
│   ├── core/                   # Core application logic
│   │   ├── __init__.py
│   │   ├── config.py           # Configuration management
│   │   ├── exceptions.py       # Custom exceptions
│   │   └── constants.py        # Application constants
│   │
│   ├── models/                 # LLM and data models
│   │   ├── __init__.py
│   │   ├── model_factory.py    # Ollama/OpenAI factory
│   │   ├── schemas.py          # Pydantic models
│   │   └── prompts.py          # Prompt templates
│   │
│   ├── data/                   # Data processing and management
│   │   ├── __init__.py
│   │   ├── pdf_extractor.py    # PDF dictionary processing
│   │   ├── web_scraper.py      # WikiMatera scraping
│   │   ├── redis_manager.py    # Redis operations and vector search
│   │   ├── text_processor.py   # Text cleaning and normalization
│   │   └── knowledge_builder.py # Knowledge base construction
│   │
│   ├── workflows/              # LangGraph workflows
│   │   ├── __init__.py
│   │   ├── translation_workflow.py  # Core translation logic
│   │   ├── chat_workflow.py         # Conversational flow
│   │   ├── teaching_workflow.py     # Educational interactions
│   │   └── nodes/                   # Workflow nodes
│   │       ├── __init__.py
│   │       ├── language_detection.py
│   │       ├── dictionary_lookup.py
│   │       ├── phonetic_rules.py
│   │       ├── cultural_context.py
│   │       └── response_formatter.py
│   │
│   ├── agents/                 # Multi-agent system (Phase 4)
│   │   ├── __init__.py
│   │   ├── base_agent.py       # Abstract base agent
│   │   ├── chat_manager.py     # Main orchestrator
│   │   ├── translator.py       # Translation specialist
│   │   ├── storyteller.py      # Cultural narratives
│   │   ├── teacher.py          # Grammar and lessons
│   │   └── guide.py            # Matera tourism info
│   │
│   ├── interface/              # User interfaces
│   │   ├── __init__.py
│   │   ├── cli.py              # Command line interface
│   │   ├── streamlit_app.py    # Web chat interface
│   │   └── gradio_app.py       # Alternative web interface
│   │
│   ├── services/               # External service integrations
│   │   ├── __init__.py
│   │   ├── ollama_service.py   # Ollama integration
│   │   └── openai_service.py   # OpenAI integration
│   │
│   └── utils/                  # Utility functions
│       ├── __init__.py
│       ├── logging_config.py   # Logging setup
│       ├── validation.py       # Data validation
│       ├── text_utils.py       # Text processing utilities
│       └── performance.py      # Performance monitoring
│
├── scripts/                    # Development utilities (as needed)
│   └── __init__.py
│
├── tests/                      # Test suite
│   ├── __init__.py
│   ├── conftest.py             # Pytest configuration
│   ├── unit/                   # Unit tests
│   │   ├── test_data_processing.py
│   │   ├── test_workflows.py
│   │   ├── test_models.py
│   │   └── test_utils.py
│   ├── integration/            # Integration tests
│   │   ├── test_translation_flow.py
│   │   ├── test_redis_operations.py
│   │   └── test_agents.py
│   └── fixtures/               # Test data
│       ├── sample_dictionary.json
│       ├── test_conversations.json
│       └── validation_cases.json
│
├── config/                     # Configuration files
│   ├── logging.yaml            # Logging configuration
│   ├── redis.yaml              # Redis connection and schema config
│   └── agents.yaml             # Agent configurations
│
├── notebooks/                  # Jupyter notebooks for analysis
│   ├── 01_dictionary_analysis.ipynb
│   ├── 02_phonetic_patterns.ipynb
│   ├── 03_cultural_content_exploration.ipynb
│   └── 04_model_evaluation.ipynb
│
├── docs/                       # Documentation
│   ├── architecture.md         # System architecture
│   ├── data_sources.md         # Data documentation
│   ├── api_reference.md        # API documentation
│   ├── deployment.md           # Deployment guide
│   └── examples/               # Usage examples
│       ├── basic_translation.py
│       ├── chat_examples.py
│       └── agent_workflows.py
│
└── deployment/                 # Deployment configurations
    ├── Dockerfile
    ├── docker-compose.yml
    ├── requirements-prod.txt   # Production dependencies
    └── k8s/                    # Kubernetes configs (future)
        ├── deployment.yaml
        └── service.yaml
- Preserve the Matera dialect for future generations
- Provide an interactive learning tool
- Document linguistic variants and nuances
- Experiment with LangGraph for complex workflows
- Implement conversational multi-agent systems
- Integrate heterogeneous text resources
- Promote Matera's cultural heritage
- Build a bridge between tradition and technological innovation
- Develop a tool for cultural tourism
The project welcomes contributions from:
- Native speakers of the Matera dialect, for linguistic validation
- Developers interested in regional dialects
- Experts in NLP and conversational systems
- Enthusiasts of Matera's culture
[To be defined: consider an open source license for the technical part]
- Antonio D'Ercole for the "Dizionario Materano"
- WikiMatera.it for the cultural resources
- The Matera community for preserving the dialect