Available on Google Play Store for easy installation and automatic updates
LLM Hub is an open-source Android application that brings the power of Large Language Models (LLMs) directly to your mobile device. Experience AI conversations with state-of-the-art models like Gemma, Llama, and Phi, all running locally on your phone for maximum privacy and offline access.
- 🤖 Multiple LLM Models: Support for Gemma-3, Llama-3.2, Phi-4, and Gemma-3n
- 📱 On-Device Processing: Complete privacy - no internet required for inference
- 🖼️ Vision Support: Multimodal models that understand text, images, and audio input
- 🎙️ Audio Input: Voice recording support for Gemma-3n models with speech recognition
- 🔊 Text-to-Speech (TTS): AI responses can be read aloud with natural voice output
  - Auto-readout mode for hands-free conversations
  - Manual playback control for each message
  - Multi-language support with automatic language detection
  - Adjustable speech rate and pitch
- ⚡ GPU Acceleration: Optimized performance on supported devices (8GB+ RAM)
- 💾 Offline Usage: Chat without an internet connection after model download
- 🔒 Privacy First: Your conversations never leave your device
- ✍️ Writing Aid: Enhance your writing with AI-powered assistance
  - Summarize, expand, rewrite, or improve text
  - Generate code from descriptions
  - Professional tone adjustment
  - Grammar and style suggestions
- 🌐 Translator: Real-time language translation
  - Support for 30+ languages
  - Text-to-text translation
  - Image-to-text translation (OCR + translate)
  - Audio-to-text translation (speech recognition + translate)
  - Offline translation with on-device models
- 🎤 Transcriber: Audio transcription
  - Convert speech to text
  - Support for multiple audio formats
  - Works with Gemma-3n audio-capable models
  - Offline transcription
- 🛡️ Scam Detector: AI-powered scam detection
  - Analyze text messages, emails, and images
  - Detect phishing attempts and fraudulent content
  - Vision support for screenshot analysis
  - Real-time risk assessment
- 🎨 Modern UI: Clean, intuitive Material Design interface
- 📥 Direct Downloads: Download models directly from HuggingFace
- 🧠 RAG Memory: Global context memory for enhanced responses
- 🔍 Web Search: Optional web search integration for fact-checking
Multi-turn conversations with advanced features:
- Context awareness: Maintains conversation history
- RAG Memory: Access global knowledge base
- Web Search: Optional internet search for real-time information
- Multimodal input: Text, images, and audio (model-dependent)
- Code highlighting: Syntax highlighting for programming languages
- Text-to-Speech: Listen to AI responses with natural voice output (see the sketch after this list)
  - Auto-readout mode: Automatically plays responses as they're generated
  - Manual playback: Tap the speaker icon to play any message
  - Language detection: Automatically detects and uses the appropriate voice
  - Playback controls: Stop playback anytime with a single tap
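These readout features map onto Android's standard `TextToSpeech` API. A minimal sketch, assuming a small wrapper class (`SpeechReader` is an illustrative name, not a class from the LLM Hub codebase):

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Minimal text-to-speech helper; SpeechReader is illustrative only.
class SpeechReader(context: Context) {
    private var ready = false
    private val tts = TextToSpeech(context) { status ->
        ready = (status == TextToSpeech.SUCCESS)
    }

    // Speak one assistant message, choosing a voice for the given language.
    fun speak(message: String, language: Locale = Locale.getDefault()) {
        if (!ready) return
        tts.setLanguage(language)  // language detection would supply this Locale
        tts.setSpeechRate(1.0f)    // user-adjustable rate
        tts.setPitch(1.0f)         // user-adjustable pitch
        tts.speak(message, TextToSpeech.QUEUE_FLUSH, null, "message-id")
    }

    // Single-tap stop for the playback controls.
    fun stop() {
        tts.stop()
    }
}
```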
Professional writing assistance powered by AI:
- Modes: Summarize, Expand, Rewrite, Improve, Code Generation (a prompt-template sketch follows this list)
- Use cases:
  - Create concise summaries of long documents
  - Expand bullet points into full paragraphs
  - Rewrite content in different styles
  - Improve grammar and clarity
  - Generate code from natural language descriptions
- Customizable: Adjust temperature and creativity settings
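Each mode plausibly reduces to a prompt template wrapped around the user's text before it is sent to the local model. A hypothetical sketch; the template wording is an assumption for illustration, not the app's actual prompts:

```kotlin
// Hypothetical mapping of Writing Aid modes to prompt templates;
// the instruction strings are assumptions, not the app's real prompts.
enum class WritingMode(val instruction: String) {
    SUMMARIZE("Summarize the following text concisely:"),
    EXPAND("Expand the following notes into full paragraphs:"),
    REWRITE("Rewrite the following text in a different style:"),
    IMPROVE("Fix grammar and improve the clarity of the following text:"),
    CODE("Write code that implements the following description:")
}

fun buildPrompt(mode: WritingMode, userText: String): String =
    "${mode.instruction}\n\n$userText"
```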
Comprehensive translation tool with multiple input methods:
- 30+ languages: Major world languages supported
- Input methods:
  - Text input: Type or paste text to translate
  - Image translation: Upload images with text (OCR + translate)
  - Audio translation: Record speech and translate (with Gemma-3n)
- Offline capable: Works without internet using on-device models
- Bidirectional: Translate in both directions
Convert audio to text with high accuracy:
- Audio formats: WAV, MP3, and other common formats
- Real-time processing: Quick transcription on-device
- Multimodal models: Requires Gemma-3n audio-capable models
- Privacy-focused: Audio never leaves your device
Protect yourself from fraud and phishing:
- Text analysis: Detect suspicious patterns in messages and emails
- Image analysis: Scan screenshots for phishing indicators
- Risk assessment: Clear risk level indicators (High/Medium/Low)
- Detailed explanation: Understand why something is flagged as suspicious
- Use cases:
  - Verify suspicious emails
  - Check text messages for scams
  - Analyze social media messages
  - Review website screenshots
- Gemma-3 1B Series (Google)
  - INT4 quantization - 2k context
  - INT8 quantization - 1.2k context
  - INT8 quantization - 2k context
  - INT8 quantization - 4k context
- Llama-3.2 Series (Meta)
  - 1B model - 1.2k context
  - 3B model - 1.2k context
- Phi-4 Mini (Microsoft)
  - INT8 quantization - 4k context
- Gemma-3n E2B - Supports text, images, and audio input (4k context)
- Gemma-3n E4B - Supports text, images, and audio input (4k context)
- Gecko-110M Series - Compact embeddings (64D, 256D, 512D, 1024D)
  - Quantized and Float32 variants available
  - Optimized for on-device semantic search
- EmbeddingGemma-300M Series - High-quality text embeddings
  - 256, 512, 1024, and 2048 sequence length variants
  - Mixed-precision for optimal performance
  - Ideal for RAG applications and document search
Memory & RAG (Global Context)
- On-device RAG & Embeddings: The app performs retrieval-augmented generation (RAG) locally on the device. Embeddings and semantic search are implemented using the app's RAG manager and embedding models (see `RagServiceManager`, `MemoryProcessor`, and the compact Gecko embedding entry in `ModelData.kt`).
- Global Memory (import-only): Users can upload or paste documents into a single global memory store. This is a global context used for RAG lookups; it is not a per-conversation conversational memory. The global memory is managed via the Room database (`memoryDao`) and exposed in the Settings and Memory screens.
- Chunking & Persistence: Uploaded documents are split into chunks; chunk embeddings are computed and persisted. On startup the app restores persisted chunk embeddings from the database and repopulates the in-memory RAG index.
- RAG Flow in Chat: The chat pipeline queries the RAG index (both per-chat documents and optional global memory) to build a RAG context that is inserted into the prompt (the code assembles a "USER MEMORY FACTS" block before the assistant prompt; a simplified sketch follows this list). See `ChatViewModel` for the exact integration points where embeddings are generated (`generateEmbedding`) and searched (`searchRelevantContext`, `searchGlobalContext`).
- Controls & Settings: Embeddings and RAG can be enabled or disabled in Settings, and the user can choose the embedding model used for semantic search (the UI exposes embedding model selection via the settings and `ThemeViewModel`).
- Local-only: All embeddings, RAG searches, and document chunk storage happen locally (Room DB + in-memory index). No external endpoints are used for RAG or memory lookups.
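Conceptually, the retrieval step embeds the query, ranks stored chunks by similarity, and prepends the top hits as the "USER MEMORY FACTS" block. A simplified sketch under those assumptions; `MemoryChunk`, `cosine`, and `buildRagContext` are illustrative names, while the real integration lives in `ChatViewModel` and `RagServiceManager`:

```kotlin
import kotlin.math.sqrt

// Illustrative chunk record; the app persists real chunks in Room via memoryDao.
data class MemoryChunk(val text: String, val embedding: FloatArray)

// Cosine similarity between two embedding vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb) + 1e-8f)
}

// Rank stored chunks against the query embedding and assemble the
// memory-facts block that gets inserted before the assistant prompt.
fun buildRagContext(
    queryEmbedding: FloatArray,   // from the selected embedding model (e.g. Gecko)
    chunks: List<MemoryChunk>,
    topK: Int = 3,
): String {
    val hits = chunks
        .sortedByDescending { cosine(queryEmbedding, it.embedding) }
        .take(topK)
    if (hits.isEmpty()) return ""
    return buildString {
        appendLine("USER MEMORY FACTS:")
        hits.forEach { appendLine("- ${it.text}") }
    }
}
```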
- Language: Kotlin
- UI Framework: Jetpack Compose
- AI Runtime: MediaPipe & LiteRT (formerly TensorFlow Lite)
- Model Optimization: INT4/INT8 quantization
- GPU Acceleration: LiteRT XNNPACK delegate
- Model Source: HuggingFace & Google repositories
- Android 8.0 (API level 26) or higher
- RAM:
  - Minimum 2GB for small models
  - 6GB+ recommended for better performance
- Storage: 1GB - 5GB depending on selected models
- Internet: Required only for model downloads
- Download APK: Get the latest release from Releases
- Install: Enable "Unknown Sources" and install the APK
- Download Models: Use the in-app model downloader to get your desired models
```bash
# Clone the repository
git clone https://github.com/timmyy123/LLM-Hub.git

# Navigate to the project directory
cd LLM-Hub

# Build the project
./gradlew assembleDebug

# Install on a connected device
./gradlew installDebug
```
- Launch the app and explore the home screen
- Go to Settings → Download Models to get AI models
- Select and download your preferred model based on device capabilities
- Choose your AI tool:
  - Chat: Multi-turn conversations with context memory
  - Writing Aid: Improve, summarize, or generate text
  - Translator: Translate text, images, or audio across 30+ languages
  - Transcriber: Convert audio to text
  - Scam Detector: Analyze suspicious messages or images
- For vision models: Tap the image icon to upload photos for image understanding
- For audio models: Use the microphone icon to record audio input
LLM Hub supports importing external models in MediaPipe-compatible formats:
- Supported formats: `.task` and `.litertlm` files
- How to import:
  - Go to Settings → Download Models
  - Tap the "Import Model" button (folder icon)
  - Select your `.task` or `.litertlm` file from device storage
  - The model will be copied to the app's model directory (see the sketch below)
  - Access your imported model from the model selection screen
- Compatible models: Any model converted to MediaPipe format using:
  - MediaPipe Model Maker
  - AI Edge Converter
  - LiteRT model conversion tools
- Note: Imported models appear under the "Custom" source in your model list
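The copy step amounts to streaming the picked document into app-private storage. A minimal sketch using a content `Uri` from `ACTION_OPEN_DOCUMENT`; the `models` directory name and the `importModel` helper are assumptions, not the app's actual code:

```kotlin
import android.content.Context
import android.net.Uri
import java.io.File

// Copy a user-selected .task/.litertlm file (picked via ACTION_OPEN_DOCUMENT)
// into the app's private model directory. "models" is an assumed folder name.
fun importModel(context: Context, uri: Uri, fileName: String): File {
    val modelsDir = File(context.filesDir, "models").apply { mkdirs() }
    val target = File(modelsDir, fileName)
    context.contentResolver.openInputStream(uri)?.use { input ->
        target.outputStream().use { output -> input.copyTo(output) }
    } ?: error("Could not open $uri")
    return target
}
```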
LLM Hub uses Google's MediaPipe framework with LiteRT to run quantized AI models directly on your Android device. The app:
- Downloads pre-optimized `.task` files from HuggingFace
- Loads models into MediaPipe's LLM Inference API
- Processes your input locally using CPU or GPU
- Generates responses without sending data to external servers (see the sketch below)
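In code, this pipeline corresponds roughly to MediaPipe's `LlmInference` task API. A minimal sketch; the model path and `maxTokens` value are placeholders, not the app's actual configuration:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Load a downloaded .task model and run fully local inference via MediaPipe.
fun runLocalInference(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/model.task") // placeholder path
        .setMaxTokens(1024)                          // placeholder limit
        .build()
    val llm = LlmInference.createFromOptions(context, options)
    val response = llm.generateResponse(prompt)      // no network involved
    llm.close()
    return response
}
```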
- Gemma-3 1B models: recommend at least 4GB RAM for GPU acceleration
- Gemma-3n models: recommend at least 8GB RAM for GPU acceleration
- Phi-4 Mini: GPU supported on 8GB+ RAM devices (recommended for best performance)
- Llama models: CPU only (compatibility issues)
Choose models based on your device capabilities (a RAM-detection sketch follows this list):
- 2GB RAM: Gemma-3 1B INT4
- 4GB RAM: Gemma-3 1B INT8, Llama-3.2 1B
- 6GB+ RAM: Gemma-3n, Llama-3.2 3B
- 8GB+ RAM: Phi-4 Mini with GPU acceleration (recommended)
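Total device RAM can be read with `ActivityManager` to pre-select a sensible default. A minimal sketch applying the cut-offs from the list above; `suggestModel` is an illustrative helper, not part of the app:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Suggest a default model tier from total device RAM, using the same
// RAM cut-offs as the recommendations above.
fun suggestModel(context: Context): String {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val totalGb = info.totalMem / (1024.0 * 1024.0 * 1024.0)
    return when {
        totalGb >= 8 -> "Phi-4 Mini (GPU)"
        totalGb >= 6 -> "Gemma-3n / Llama-3.2 3B"
        totalGb >= 4 -> "Gemma-3 1B INT8 / Llama-3.2 1B"
        else -> "Gemma-3 1B INT4"
    }
}
```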
- Built-in web search: LLM Hub includes an on-device web search integration used for document lookups and optional augmentation of model responses. The implementation is a DuckDuckGo-based service (`WebSearchService` / `DuckDuckGoSearchService`) bundled with the app.
- How it works: The search service first attempts content-aware searches: it detects whether a query contains a URL and, if so, fetches that page's content directly. For general queries it:
  - tries the DuckDuckGo Instant Answer API (JSON) for short answers and definitions (sketched below),
  - falls back to scraping DuckDuckGo HTML search results when needed,
  - optionally fetches result pages and extracts text to return richer snippets to the app.
- Privacy & limits: Searches use public DuckDuckGo endpoints (no API key required). The app performs HTTP requests from the device; network access is required for web search and content fetching. The web search implementation includes timeouts and result limits to avoid excessive requests.
- Usage in app: Search results are returned as title/snippet/URL tuples and can be used by the chat UI or RAG/document upload flows to provide external context, or to fetch page content when users paste a URL.
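The first step of that flow, the Instant Answer lookup, might look like the following simplified sketch; this is an illustration of the public endpoint, not the bundled `DuckDuckGoSearchService`:

```kotlin
import java.net.HttpURLConnection
import java.net.URL
import java.net.URLEncoder
import org.json.JSONObject

// Query DuckDuckGo's public Instant Answer endpoint for a short answer.
// Simplified: the real service adds HTML fallback and page-content fetching.
fun instantAnswer(query: String): String? {
    val q = URLEncoder.encode(query, "UTF-8")
    val url = URL("https://api.duckduckgo.com/?q=$q&format=json&no_html=1")
    val conn = url.openConnection() as HttpURLConnection
    conn.connectTimeout = 5_000   // keep requests bounded
    conn.readTimeout = 5_000
    return try {
        val body = conn.inputStream.bufferedReader().use { it.readText() }
        JSONObject(body).optString("AbstractText").ifBlank { null }
    } finally {
        conn.disconnect()
    }
}
```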
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install Android Studio
# Open the project in Android Studio
# Sync Gradle files
# Run on a device or emulator
```
This project is licensed under the MIT License - see the LICENSE file for details.
- Google for Gemma models and MediaPipe framework
- Meta for Llama models
- Microsoft for Phi models
- HuggingFace for model hosting and community
- Android Community for development tools and libraries
- Email: [email protected]
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ by Timmy
Bringing AI to your pocket, privately and securely.
To use private or gated models, you need to provide your Hugging Face (HF) access token. This project is set up to securely load your token from your local machine using `local.properties` (never commit your token to source control).

- Open or create `local.properties` in your project root.
  - This file is usually already present and is ignored by git by default.
- Add your Hugging Face token:

  ```properties
  HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  ```

  Replace `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` with your actual token from https://huggingface.co/settings/tokens
- Sync Gradle:
  - In Android Studio, click "Sync Project with Gradle Files" after saving `local.properties`.
- How it works:
  - The build system injects your token into the app at build time as `BuildConfig.HF_TOKEN` (see the Gradle sketch below).
  - The app uses this token for authenticated model downloads.

Note:
- Never commit your `local.properties` file or your token to version control.
- If you change your token, update `local.properties` and re-sync Gradle.
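The token injection described under "How it works" is typically a few lines of Gradle. A sketch of what the relevant part of a module's `build.gradle.kts` might look like; the exact wiring in this repository may differ:

```kotlin
import java.util.Properties

// Read HF_TOKEN from local.properties and expose it as BuildConfig.HF_TOKEN.
// Sketch only; the module's actual build script may differ.
val localProps = Properties().apply {
    val f = rootProject.file("local.properties")
    if (f.exists()) f.inputStream().use { load(it) }
}

android {
    buildFeatures {
        buildConfig = true  // required for buildConfigField on recent AGP
    }
    defaultConfig {
        buildConfigField(
            "String",
            "HF_TOKEN",
            "\"${localProps.getProperty("HF_TOKEN", "")}\""
        )
    }
}
```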