Available on Google Play Store for easy installation and automatic updates
LLM Hub is an open-source Android application that brings the power of Large Language Models (LLMs) directly to your mobile device. Experience AI conversations with state-of-the-art models like Gemma, Llama, and Phi, all running locally on your phone for maximum privacy and offline access.
- 🤖 Multiple LLM Models: Support for Gemma-3, Llama-3.2, Phi-4, and Gemma-3n
- 📱 On-Device Processing: Complete privacy - no internet required for inference
- 🖼️ Vision Support: Multimodal models that understand text, images, and audio input
- 🎙️ Audio Input: Voice recording support for Gemma-3n models with speech recognition
- 🔊 Text-to-Speech (TTS): AI responses can be read aloud with natural voice output
  - Auto-readout mode for hands-free conversations
  - Manual playback control for each message
  - Multi-language support with automatic language detection
  - Adjustable speech rate and pitch
- ⚡ GPU Acceleration: Optimized performance on supported devices (8GB+ RAM)
- 💾 Offline Usage: Chat without an internet connection after model download
- 🔒 Privacy First: Your conversations never leave your device
- ✍️ Writing Aid: Enhance your writing with AI-powered assistance
  - Summarize, expand, rewrite, or improve text
  - Generate code from descriptions
  - Professional tone adjustment
  - Grammar and style suggestions
- 🌐 Translator: Real-time language translation
  - Support for 30+ languages
  - Text-to-text translation
  - Image-to-text translation (OCR + translate)
  - Audio-to-text translation (speech recognition + translate)
  - Offline translation with on-device models
- 🎤 Transcriber: Audio transcription
  - Convert speech to text
  - Support for multiple audio formats
  - Works with Gemma-3n audio-capable models
  - Offline transcription
- 🛡️ Scam Detector: AI-powered scam detection
  - Analyze text messages, emails, and images
  - Detect phishing attempts and fraudulent content
  - Vision support for screenshot analysis
  - Real-time risk assessment
- 🎨 Modern UI: Clean, intuitive Material Design interface
- 📥 Direct Downloads: Download models directly from HuggingFace
- 🧠 RAG Memory: Global context memory for enhanced responses
- 🔍 Web Search: Optional web search integration for fact-checking
Multi-turn conversations with advanced features:
- Context awareness: Maintains conversation history
- RAG Memory: Access global knowledge base
- Web Search: Optional internet search for real-time information
- Multimodal input: Text, images, and audio (model-dependent)
- Code highlighting: Syntax highlighting for programming languages
- Text-to-Speech: Listen to AI responses with natural voice output (see the sketch after this list)
  - Auto-readout mode: Automatically plays responses as they're generated
  - Manual playback: Tap the speaker icon to play any message
  - Language detection: Automatically detects and uses the appropriate voice
  - Playback controls: Stop playback anytime with a single tap
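These readout features map onto Android's standard `TextToSpeech` API. A minimal sketch, assuming a small wrapper class (`SpeechReader` is an illustrative name, not a class from the LLM Hub codebase):

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Minimal text-to-speech helper; SpeechReader is illustrative only.
class SpeechReader(context: Context) {
    private var ready = false
    private val tts = TextToSpeech(context) { status ->
        ready = (status == TextToSpeech.SUCCESS)
    }

    // Speak one assistant message, choosing a voice for the given language.
    fun speak(message: String, language: Locale = Locale.getDefault()) {
        if (!ready) return
        tts.setLanguage(language)  // language detection would supply this Locale
        tts.setSpeechRate(1.0f)    // user-adjustable rate
        tts.setPitch(1.0f)         // user-adjustable pitch
        tts.speak(message, TextToSpeech.QUEUE_FLUSH, null, "message-id")
    }

    // Single-tap stop for the playback controls.
    fun stop() {
        tts.stop()
    }
}
```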
Professional writing assistance powered by AI:
- Modes: Summarize, Expand, Rewrite, Improve, Code Generation (a prompt-template sketch follows this list)
- Use cases:
  - Create concise summaries of long documents
  - Expand bullet points into full paragraphs
  - Rewrite content in different styles
  - Improve grammar and clarity
  - Generate code from natural language descriptions
- Customizable: Adjust temperature and creativity settings
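Each mode plausibly reduces to a prompt template wrapped around the user's text before it is sent to the local model. A hypothetical sketch; the template wording is an assumption for illustration, not the app's actual prompts:

```kotlin
// Hypothetical mapping of Writing Aid modes to prompt templates;
// the instruction strings are assumptions, not the app's real prompts.
enum class WritingMode(val instruction: String) {
    SUMMARIZE("Summarize the following text concisely:"),
    EXPAND("Expand the following notes into full paragraphs:"),
    REWRITE("Rewrite the following text in a different style:"),
    IMPROVE("Fix grammar and improve the clarity of the following text:"),
    CODE("Write code that implements the following description:")
}

fun buildPrompt(mode: WritingMode, userText: String): String =
    "${mode.instruction}\n\n$userText"
```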
Comprehensive translation tool with multiple input methods:
- 30+ languages: Major world languages supported
- Input methods:
  - Text input: Type or paste text to translate
  - Image translation: Upload images with text (OCR + translate)
  - Audio translation: Record speech and translate (with Gemma-3n)
- Offline capable: Works without internet using on-device models
- Bidirectional: Translate in both directions
Convert audio to text with high accuracy:
- Audio formats: WAV, MP3, and other common formats
- Real-time processing: Quick transcription on-device
- Multimodal models: Requires Gemma-3n audio-capable models
- Privacy-focused: Audio never leaves your device
Protect yourself from fraud and phishing:
- Text analysis: Detect suspicious patterns in messages and emails
- Image analysis: Scan screenshots for phishing indicators
- Risk assessment: Clear risk level indicators (High/Medium/Low)
- Detailed explanation: Understand why something is flagged as suspicious
- Use cases:
  - Verify suspicious emails
  - Check text messages for scams
  - Analyze social media messages
  - Review website screenshots
- Gemma-3 1B Series (Google)
  - INT4 quantization - 2k context
  - INT8 quantization - 1.2k context
  - INT8 quantization - 2k context
  - INT8 quantization - 4k context
- Llama-3.2 Series (Meta)
  - 1B model - 1.2k context
  - 3B model - 1.2k context
- Phi-4 Mini (Microsoft)
  - INT8 quantization - 4k context
- Gemma-3n E2B - Supports text, images, and audio input (4k context)
- Gemma-3n E4B - Supports text, images, and audio input (4k context)
- Gecko-110M Series - Compact embeddings (64D, 256D, 512D, 1024D)
  - Quantized and Float32 variants available
  - Optimized for on-device semantic search
- EmbeddingGemma-300M Series - High-quality text embeddings
  - 256, 512, 1024, and 2048 sequence length variants
  - Mixed-precision for optimal performance
  - Ideal for RAG applications and document search
Memory & RAG (Global Context)
- On-device RAG & Embeddings: The app performs retrieval-augmented generation (RAG) locally on the device. Embeddings and semantic search are implemented using the app's RAG manager and embedding models (see `RagServiceManager`, `MemoryProcessor`, and the compact Gecko embedding entry in `ModelData.kt`).
- Global Memory (import-only): Users can upload or paste documents into a single global memory store. This is a global context used for RAG lookups; it is not a per-conversation conversational memory. The global memory is managed via the Room database (`memoryDao`) and exposed in the Settings and Memory screens.
- Chunking & Persistence: Uploaded documents are split into chunks; chunk embeddings are computed and persisted. On startup the app restores persisted chunk embeddings from the database and repopulates the in-memory RAG index.
- RAG Flow in Chat: The chat pipeline queries the RAG index (both per-chat documents and optional global memory) to build a RAG context that is inserted into the prompt (the code assembles a "USER MEMORY FACTS" block before the assistant prompt; a simplified sketch follows this list). See `ChatViewModel` for the exact integration points where embeddings are generated (`generateEmbedding`) and searched (`searchRelevantContext`, `searchGlobalContext`).
- Controls & Settings: Embeddings and RAG can be enabled or disabled in Settings, and the user can choose the embedding model used for semantic search (the UI exposes embedding model selection via the settings and `ThemeViewModel`).
- Local-only: All embeddings, RAG searches, and document chunk storage happen locally (Room DB + in-memory index). No external endpoints are used for RAG or memory lookups.
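Conceptually, the retrieval step embeds the query, ranks stored chunks by similarity, and prepends the top hits as the "USER MEMORY FACTS" block. A simplified sketch under those assumptions; `MemoryChunk`, `cosine`, and `buildRagContext` are illustrative names, while the real integration lives in `ChatViewModel` and `RagServiceManager`:

```kotlin
import kotlin.math.sqrt

// Illustrative chunk record; the app persists real chunks in Room via memoryDao.
data class MemoryChunk(val text: String, val embedding: FloatArray)

// Cosine similarity between two embedding vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb) + 1e-8f)
}

// Rank stored chunks against the query embedding and assemble the
// memory-facts block that gets inserted before the assistant prompt.
fun buildRagContext(
    queryEmbedding: FloatArray,   // from the selected embedding model (e.g. Gecko)
    chunks: List<MemoryChunk>,
    topK: Int = 3,
): String {
    val hits = chunks
        .sortedByDescending { cosine(queryEmbedding, it.embedding) }
        .take(topK)
    if (hits.isEmpty()) return ""
    return buildString {
        appendLine("USER MEMORY FACTS:")
        hits.forEach { appendLine("- ${it.text}") }
    }
}
```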
- Language: Kotlin
- UI Framework: Jetpack Compose
- AI Runtime: MediaPipe & LiteRT (formerly TensorFlow Lite)
- Model Optimization: INT4/INT8 quantization
- GPU Acceleration: LiteRT XNNPACK delegate
- Model Source: HuggingFace & Google repositories
- Android 8.0 (API level 26) or higher
- RAM:
  - Minimum 2GB for small models
  - 6GB+ recommended for better performance
- Storage: 1GB - 5GB depending on selected models
- Internet: Required only for model downloads
- Download APK: Get the latest release from Releases
- Install: Enable "Unknown Sources" and install the APK
- Download Models: Use the in-app model downloader to get your desired models
```bash
# Clone the repository
git clone https://github.com/timmyy123/LLM-Hub.git

# Navigate to the project directory
cd LLM-Hub

# Build the project
./gradlew assembleDebug

# Install on a connected device
./gradlew installDebug
```
- Launch the app and explore the home screen
- Go to Settings → Download Models to get AI models
- Select and download your preferred model based on device capabilities
- Choose your AI tool:
  - Chat: Multi-turn conversations with context memory
  - Writing Aid: Improve, summarize, or generate text
  - Translator: Translate text, images, or audio across 30+ languages
  - Transcriber: Convert audio to text
  - Scam Detector: Analyze suspicious messages or images
- For vision models: Tap the image icon to upload photos for image understanding
- For audio models: Use the microphone icon to record audio input
LLM Hub supports importing external models in MediaPipe-compatible formats:
- Supported formats: `.task` and `.litertlm` files
- How to import:
  - Go to Settings → Download Models
  - Tap the "Import Model" button (folder icon)
  - Select your `.task` or `.litertlm` file from device storage
  - The model will be copied to the app's model directory (see the sketch below)
  - Access your imported model from the model selection screen
- Compatible models: Any model converted to MediaPipe format using:
  - MediaPipe Model Maker
  - AI Edge Converter
  - LiteRT model conversion tools
- Note: Imported models appear under the "Custom" source in your model list
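The copy step amounts to streaming the picked document into app-private storage. A minimal sketch using a content `Uri` from `ACTION_OPEN_DOCUMENT`; the `models` directory name and the `importModel` helper are assumptions, not the app's actual code:

```kotlin
import android.content.Context
import android.net.Uri
import java.io.File

// Copy a user-selected .task/.litertlm file (picked via ACTION_OPEN_DOCUMENT)
// into the app's private model directory. "models" is an assumed folder name.
fun importModel(context: Context, uri: Uri, fileName: String): File {
    val modelsDir = File(context.filesDir, "models").apply { mkdirs() }
    val target = File(modelsDir, fileName)
    context.contentResolver.openInputStream(uri)?.use { input ->
        target.outputStream().use { output -> input.copyTo(output) }
    } ?: error("Could not open $uri")
    return target
}
```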
LLM Hub uses Google's MediaPipe framework with LiteRT to run quantized AI models directly on your Android device. The app:
- Downloads pre-optimized `.task` files from HuggingFace
- Loads models into MediaPipe's LLM Inference API
- Processes your input locally using CPU or GPU
- Generates responses without sending data to external servers (see the sketch below)
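In code, this pipeline corresponds roughly to MediaPipe's `LlmInference` task API. A minimal sketch; the model path and `maxTokens` value are placeholders, not the app's actual configuration:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Load a downloaded .task model and run fully local inference via MediaPipe.
fun runLocalInference(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/model.task") // placeholder path
        .setMaxTokens(1024)                          // placeholder limit
        .build()
    val llm = LlmInference.createFromOptions(context, options)
    val response = llm.generateResponse(prompt)      // no network involved
    llm.close()
    return response
}
```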
- Gemma-3 1B models: recommend at least 4GB RAM for GPU acceleration
- Gemma-3n models: recommend at least 8GB RAM for GPU acceleration
- Phi-4 Mini: GPU supported on 8GB+ RAM devices (recommended for best performance)
- Llama models: CPU only (compatibility issues)
Choose models based on your device capabilities (a RAM-detection sketch follows this list):
- 2GB RAM: Gemma-3 1B INT4
- 4GB RAM: Gemma-3 1B INT8, Llama-3.2 1B
- 6GB+ RAM: Gemma-3n, Llama-3.2 3B
- 8GB+ RAM: Phi-4 Mini with GPU acceleration (recommended)
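Total device RAM can be read with `ActivityManager` to pre-select a sensible default. A minimal sketch applying the cut-offs from the list above; `suggestModel` is an illustrative helper, not part of the app:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Suggest a default model tier from total device RAM, using the same
// RAM cut-offs as the recommendations above.
fun suggestModel(context: Context): String {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val totalGb = info.totalMem / (1024.0 * 1024.0 * 1024.0)
    return when {
        totalGb >= 8 -> "Phi-4 Mini (GPU)"
        totalGb >= 6 -> "Gemma-3n / Llama-3.2 3B"
        totalGb >= 4 -> "Gemma-3 1B INT8 / Llama-3.2 1B"
        else -> "Gemma-3 1B INT4"
    }
}
```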
- Built-in web search: LLM Hub includes an on-device web search integration used for document lookups and optional augmentation of model responses. The implementation is a DuckDuckGo-based service (`WebSearchService` / `DuckDuckGoSearchService`) bundled with the app.
- How it works: The search service first attempts content-aware searches: it detects whether a query contains a URL and, if so, fetches that page's content directly. For general queries it:
  - tries the DuckDuckGo Instant Answer API (JSON) for short answers and definitions (sketched below),
  - falls back to scraping DuckDuckGo HTML search results when needed,
  - optionally fetches result pages and extracts text to return richer snippets to the app.
- Privacy & limits: Searches use public DuckDuckGo endpoints (no API key required). The app performs HTTP requests from the device; network access is required for web search and content fetching. The web search implementation includes timeouts and result limits to avoid excessive requests.
- Usage in app: Search results are returned as title/snippet/URL tuples and can be used by the chat UI or RAG/document upload flows to provide external context, or to fetch page content when users paste a URL.
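The first step of that flow, the Instant Answer lookup, might look like the following simplified sketch; this is an illustration of the public endpoint, not the bundled `DuckDuckGoSearchService`:

```kotlin
import java.net.HttpURLConnection
import java.net.URL
import java.net.URLEncoder
import org.json.JSONObject

// Query DuckDuckGo's public Instant Answer endpoint for a short answer.
// Simplified: the real service adds HTML fallback and page-content fetching.
fun instantAnswer(query: String): String? {
    val q = URLEncoder.encode(query, "UTF-8")
    val url = URL("https://api.duckduckgo.com/?q=$q&format=json&no_html=1")
    val conn = url.openConnection() as HttpURLConnection
    conn.connectTimeout = 5_000   // keep requests bounded
    conn.readTimeout = 5_000
    return try {
        val body = conn.inputStream.bufferedReader().use { it.readText() }
        JSONObject(body).optString("AbstractText").ifBlank { null }
    } finally {
        conn.disconnect()
    }
}
```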
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install Android Studio
# Open the project in Android Studio
# Sync Gradle files
# Run on a device or emulator
```
This project is licensed under the MIT License - see the LICENSE file for details.
- Google for Gemma models and MediaPipe framework
- Meta for Llama models
- Microsoft for Phi models
- HuggingFace for model hosting and community
- Android Community for development tools and libraries
- Email: [email protected]
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ by Timmy
Bringing AI to your pocket, privately and securely.
To use private or gated models, you need to provide your Hugging Face (HF) access token. This project is set up to securely load your token from your local machine using `local.properties` (never commit your token to source control).

- Open or create `local.properties` in your project root.
  - This file is usually already present and is ignored by git by default.
- Add your Hugging Face token:

  ```properties
  HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  ```

  Replace `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` with your actual token from https://huggingface.co/settings/tokens
- Sync Gradle:
  - In Android Studio, click "Sync Project with Gradle Files" after saving `local.properties`.
- How it works:
  - The build system injects your token into the app at build time as `BuildConfig.HF_TOKEN` (see the Gradle sketch below).
  - The app uses this token for authenticated model downloads.

Note:
- Never commit your `local.properties` file or your token to version control.
- If you change your token, update `local.properties` and re-sync Gradle.
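The token injection described under "How it works" is typically a few lines of Gradle. A sketch of what the relevant part of a module's `build.gradle.kts` might look like; the exact wiring in this repository may differ:

```kotlin
import java.util.Properties

// Read HF_TOKEN from local.properties and expose it as BuildConfig.HF_TOKEN.
// Sketch only; the module's actual build script may differ.
val localProps = Properties().apply {
    val f = rootProject.file("local.properties")
    if (f.exists()) f.inputStream().use { load(it) }
}

android {
    buildFeatures {
        buildConfig = true  // required for buildConfigField on recent AGP
    }
    defaultConfig {
        buildConfigField(
            "String",
            "HF_TOKEN",
            "\"${localProps.getProperty("HF_TOKEN", "")}\""
        )
    }
}
```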