ImageAI on GitHub
Desktop + CLI for multi-provider AI image and video generation with enterprise auth, prompt tools, and MIDI-synced karaoke/video workflows.
Version 0.24.0
See LelandGreen.com for links to other code and free stuff. Under construction. Implementing social links soon.
- LelandGreen.com on Discord - The Intersection of Art and AI Discord
- Facebook Page: Leland Green Productions
- Facebook Group: The Intersection of Art and AI Community
Created with JetBrains PyCharm and AI assistance from Junie, Claude Code, and Codex, with Google auth guidance from Gemini CLI.
ImageAI is a powerful desktop application and CLI tool for AI image generation supporting multiple providers including Google's Gemini API, OpenAI's DALL·E models, Stability AI's Stable Diffusion, and local Stable Diffusion models. It features enterprise-grade authentication options, secure credential management, and works seamlessly across Windows, macOS, and Linux.
- Project Review & Recommendations
- Requirements
- Authentication Setup
- Installation
- Running the Application
- Authentication Management
- CLI Reference
- GUI Features
- Image Management
- Examples and Templates
- Advanced Features
- Utility Scripts
- Frequently Asked Questions
- Pricing and Cost Comparison
- API Reference
- Development
- Changelog
- Screenshots Gallery
- Google Gemini - Access to latest Gemini models for image generation
- OpenAI DALL·E - Support for DALL·E-3 and DALL·E-2 models
- Stability AI - Stable Diffusion XL, SD 2.1, and more via API
- Local Stable Diffusion - Run models locally without API keys (GPU recommended)
- Easy provider switching in both GUI and CLI
- Support for custom Hugging Face models
- Model browser and downloader for Local SD models
- Popular model recommendations with descriptions
- API Key Authentication - Simple setup for individual users
- Google Cloud Authentication - Enterprise-ready with Application Default Credentials
- Hugging Face Authentication - Built-in token management for model downloads
- Secure credential storage in platform-specific directories
- Real-ESRGAN - State-of-the-art AI upscaling for enhanced image quality
- GPU Acceleration - Automatic NVIDIA GPU detection for faster processing
- GUI Installation - One-click installation directly from the application
- Multiple Methods - Choose between AI upscaling, Lanczos, or cloud services
- Smart upscaling when target resolution exceeds provider capabilities
- Environment variable support for CI/CD integration
- Per-provider API key management
- Modern GUI - User-friendly desktop interface built with Qt/PySide6
- Video Project - Full-featured 🎬 Video tab for creating AI-powered videos with version control
- Powerful CLI - Full-featured command-line interface for automation
- Cross-platform support (Windows, macOS, Linux)
- Responsive layout with resizable panels
- Reference Image Support - Start with an existing image (provider-dependent):
- Select reference image from file
- Enable/disable reference with checkbox
- Clear reference with one click
- Thumbnail preview of selected image
- Provider-optimized reference handling
- Automatic prompt enhancement with reference context
 
- Enhanced Aspect Ratio Selector - Interactive preview rectangles with custom input support:
- Visual preset buttons for common ratios (1:1, 3:4, 4:3, 16:9, 9:16, 21:9)
- Custom aspect ratio input - Enter any ratio like "16:10" or decimal "1.6"
- Clear mode indicator showing "Using Aspect Ratio" or "Using Resolution"
- Automatic resolution calculation based on provider capabilities
 
- Smart Resolution System - Dual-mode resolution control:
- Auto mode - Resolution calculated from selected aspect ratio
- Manual mode - Direct resolution selection overrides aspect ratio
- Provider-optimized presets (DALL·E, Gemini, Stability AI)
- Visual feedback showing which mode is active (green for AR, blue for resolution)
- Social Media Sizes Dialog - Quick access to platform-specific image dimensions for Instagram, Twitter/X, Facebook, LinkedIn, YouTube, TikTok, and more
 
- Quality & Style Options - Standard/HD quality, style presets for different looks
- Batch Generation - Generate multiple variations at once
- Cost Estimation - Real-time cost calculation for all providers
- Advanced Settings Panel - Fine-tune generation parameters:
- Inference steps (1-50)
- Guidance scale (CFG 0-20)
- Scheduler selection
- Seed control for reproducibility
- Negative prompts
- Prompt rewriting/enhancement
 
- Detailed History Table - View all generations with:
- Date and time stamps
- Provider and model used
- Resolution information
- Cost tracking
- Original prompts
 
- Session Persistence - All UI settings saved between sessions
- Metadata Sidecars - JSON files with complete generation details
- Disk History Scanning - Automatically finds previous generations
- Quick History Access - Click to reload prompts and settings
- Auto-save generated images with metadata sidecars
- Template system with placeholder substitution
- Customizable output paths and filenames
- Auto-copy filename to clipboard option
- Smart filename generation from prompts
- Comprehensive keyboard navigation - Full keyboard control for all features
- Button mnemonics - Alt+key shortcuts for all buttons (e.g., Alt+G for Generate)
- Global shortcuts:
- Ctrl+Enter - Generate image from anywhere in the Generate tab
- Ctrl+S - Save generated image
- Ctrl+Shift+C - Copy image to clipboard
- F1 - Jump to Help tab
 
- Text field support - Ctrl+Enter works even when typing in the prompt field
- Tooltips - All buttons show their keyboard shortcuts
- Screen reader compatible - Proper labels and navigation order
- Image format detection and optimization
- Preview scaling with aspect ratio preservation
- Global LLM Provider Selection - Unified provider and model selection across tabs:
- Provider dropdown syncs between Image and Video tabs
- Model list updates automatically per provider
- Remembers selections between sessions
- Project-specific provider settings
 
- Multi-Provider Support - OpenAI GPT-5, Claude, Gemini, Ollama, LM Studio
- GPT-5 Model Support - gpt-5-chat-latest (auto-updating to newest version)
- Correctly uses the max_completion_tokens parameter for GPT-5 and GPT-4+ models
- GPT-5 Specific Controls (UI ready for future API support):
- Reasoning effort selector (low/medium/high) - prepared for when API supports it
- Verbosity control (low/medium/high) - prepared for when API supports it
- Auto-shows/hides based on selected model
 
 
- Prompt Enhancement - One-click prompt improvement using selected LLM
- Automatic fallback when LLM returns empty response
- Configurable temperature and max tokens for fine-tuning
- Works across both Image and Video tabs with shared enhancement engine
 
- Ask Questions About Prompts - Interactive Q&A dialog for prompt analysis
- Pre-defined questions for quick insights
- Custom question support with detailed answers
- User-adjustable temperature (0-2) and max tokens (100-4000)
- GPT-5 reasoning and verbosity controls when using GPT-5 models
- Session persistence - remembers last question, settings, and GPT-5 parameters
- History tracking of all Q&A interactions
 
- Automatic Model Syncing - Provider and model selections sync between tabs
- Smart Model Detection - Automatically populates available models per provider
- Reference Image Panel - UI controls for starting with reference images:
- File selection dialog
- Reference image thumbnail
- Enable/disable checkbox
- Clear button
- Provider compatibility indicators
 
- Status Bar - Real-time status messages and provider connection feedback
- Console Output Window - Terminal-style log with color-coded messages:
- Timestamp for each operation
- Color coding: Green (success), Red (errors), Blue (progress), Yellow (responses)
- Visual separators between operations
- Resizable with splitter control
 
- Improved Startup - Progress messages during initialization
- Lazy Video Tab Loading - Faster startup by loading video features on-demand
- Complete Video Creation Pipeline - Text to video with AI scene generation
- Version Control System - Event sourcing with time-travel capabilities
- Dual Rendering Engines - FFmpeg slideshow and Google Veo AI (coming soon)
- Multi-Provider LLM Integration - Enhance prompts with GPT-5, Gemini, Claude, and more
- Visual Continuity System - Maintain consistency across scenes with provider-specific techniques
- Smart Lyric Processing - Automatic detection and visual scene creation from song lyrics
- Professional Effects - Ken Burns, transitions, audio sync
- Project History - Complete audit trail with restore points
- MIDI-Based Timing - Perfect beat/measure alignment for scene transitions
- Musical Structure Detection - Identify verses, choruses, bridges from MIDI
- Karaoke Overlays - Bouncing ball, highlighting, fade-in styles
- Lyric Export Formats - Generate LRC, SRT, and ASS subtitle files
- Word-Level Synchronization - Extract timing from MIDI lyric events
- Adjustable Snap Strength - Control how tightly scenes align to beats
- Audio Track Support - Link MP3, WAV, M4A files without copying
- Volume & Fade Controls - Professional audio mixing options
- Modular architecture with provider abstraction
- Worker threads for non-blocking generation
- Comprehensive error handling and recovery
- Progress tracking and status updates
- Event-driven architecture with Qt signals
- Extensible provider system for new services
- Python 3.9+ (3.9 to 3.13 supported)
- Internet connection
- Google account (for Gemini) or OpenAI account (for DALL·E)
- Dependencies (auto-installed via requirements.txt):
- google-genai - Google Gemini API client
- google-cloud-aiplatform - Google Cloud authentication support
- openai - OpenAI API client
- PySide6 - GUI framework (optional for CLI-only usage)
- pillow - Image processing
- protobuf - Protocol buffer support
- pretty-midi - MIDI file analysis and timing extraction
- mido - Low-level MIDI manipulation for lyrics and events
- moviepy - Video processing and assembly
- litellm - Unified LLM provider interface (for video prompts)
 
You have two options for authenticating with Google's Gemini API:
- Get your API key:
- Visit Google AI Studio
- Create a new API key or use an existing one
- Copy the key (keep it secure!)

- Review documentation:

- Enable billing if required:
- Some regions/models require billing
- Visit Google AI Pricing
 
- Install Google Cloud CLI:
- Download from Google Cloud SDK
- Windows: Use the interactive installer
- macOS: brew install google-cloud-sdk
- Linux: Follow distribution-specific instructions

- Set up Google Cloud project:
- Create/select project at Cloud Console
- Note your Project ID

- Authenticate:
# Login to Google account
gcloud auth application-default login
# Set your project
gcloud config set project YOUR_PROJECT_ID
# Verify authentication
gcloud auth list

- Enable required APIs:
gcloud services enable aiplatform.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
Or enable via Cloud Console

- Enable billing:
- Visit Cloud Billing
- New accounts may have free credits
 
- Get your API key:
- Sign in at OpenAI Platform
- Create API key at API Keys page

- Review documentation:
# Clone the repository
git clone https://github.com/yourusername/ImageAI.git
cd ImageAI
# Create virtual environment (recommended)
python -m venv .venv
# Activate virtual environment
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# Windows Command Prompt:
.venv\Scripts\activate.bat
# macOS/Linux:
source .venv/bin/activate
# Install core dependencies
pip install -r requirements.txt
# Optional: Install Local Stable Diffusion support
# For CPU-only:
pip install -r requirements-local-sd.txt
# For GPU support (CUDA):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements-local-sd.txt
- Windows: Ensure Python is added to PATH during installation
- macOS: You may need to install Xcode Command Line Tools
- Linux: Install python3-venv if not present: sudo apt install python3-venv
# Launch the graphical interface
python main.py
# Show help
python main.py -h
# Quick examples
python main.py -p "A majestic mountain landscape at sunset" -o mountain.png
python main.py --provider openai -m dall-e-3 -p "Futuristic cityscape" -o city.png
# Google Gemini with API key
python main.py -s -k "YOUR_GOOGLE_API_KEY"  # Save key
python main.py -p "Beautiful ocean sunset" -o ocean.png  # Generate
# OpenAI with API key
python main.py --provider openai -s -k "YOUR_OPENAI_API_KEY"  # Save key
python main.py --provider openai -m dall-e-3 -p "Abstract art" -o art.png  # Generate
# Using environment variables
export GOOGLE_API_KEY="YOUR_KEY"  # Linux/macOS
$env:GOOGLE_API_KEY = "YOUR_KEY"  # Windows PowerShell
python main.py -p "Mountain landscape"  # Uses env variable
# First-time setup
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
# Generate images
python main.py --auth-mode gcloud -p "Tropical paradise" -o paradise.png
# Test authentication
python main.py --auth-mode gcloud -t
# Save key from file
python main.py -s -K /path/to/key.txt
# Use key file directly (one-time)
python main.py -K /path/to/key.txt -p "Desert oasis" -o oasis.png
Configuration and keys are stored in platform-specific directories:
- Windows: %APPDATA%\ImageAI\config.json
- macOS: ~/Library/Application Support/ImageAI/config.json
- Linux: ~/.config/ImageAI/config.json
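The three locations above can be computed from the platform name. The following is a minimal sketch of that lookup, not ImageAI's actual code: the function name `config_path` and its parameters are hypothetical, and only the directory layout comes from the list above.

```python
import os
import sys
from pathlib import Path


def config_path(platform=sys.platform, env=None, home=None):
    """Return where config.json is expected to live (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # %APPDATA% normally points at the user's AppData\Roaming folder.
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unix-like systems, per the list above.
        base = home / ".config"
    return base / "ImageAI" / "config.json"


if __name__ == "__main__":
    print(config_path())
```

Passing `platform`, `env`, and `home` explicitly makes the helper easy to check on any machine without touching real user directories.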
For each provider, the authentication order is:
- Command-line key (-k or -K flags)
- Stored configuration
- Environment variables (GOOGLE_API_KEY, OPENAI_API_KEY)
- Google Cloud ADC (for Google provider with --auth-mode gcloud)
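The precedence above can be pictured as a short chain of fallbacks. This is a sketch under assumptions rather than the application's real resolver: `resolve_api_key` is a hypothetical name, and the `<provider>_api_key` config keys are inferred from the example config shown in this README.

```python
import os

# Environment variable names taken from the list above.
ENV_VARS = {"google": "GOOGLE_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_api_key(provider, cli_key=None, cli_key_file=None, config=None, env=None):
    """Return (key, source) following the documented order (hypothetical helper)."""
    env = os.environ if env is None else env
    config = config or {}
    # 1. Command-line key: -k (string) or -K (file).
    if cli_key:
        return cli_key.strip(), "command line (-k)"
    if cli_key_file:
        with open(cli_key_file, encoding="utf-8") as fh:
            return fh.read().strip(), "key file (-K)"
    # 2. Stored configuration (key name is an assumption).
    stored = config.get(f"{provider}_api_key")
    if stored:
        return stored, "stored configuration"
    # 3. Environment variables.
    var = ENV_VARS.get(provider)
    if var and env.get(var):
        return env[var], f"environment ({var})"
    # 4. No key found; with --auth-mode gcloud the Google provider would use ADC instead.
    return None, "no key found"


if __name__ == "__main__":
    print(resolve_api_key("google", config={"google_api_key": "stored-key"}))
```

Returning the source alongside the key makes it easier to report which credential was actually used when testing authentication.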
- Never commit API keys to version control
- Use environment variables for CI/CD
- Rotate keys regularly
- Use Google Cloud authentication for enterprise deployments
- Store keys in secure password managers
-h, --help              Show help message
-p, --prompt TEXT       Prompt for image generation
-o, --out PATH          Output path for generated image
-m, --model TEXT        Model to use (provider-specific)
-t, --test              Test authentication
-k, --api-key TEXT      API key string
-K, --api-key-file PATH Path to file containing API key
-s, --set-key           Save the provided key
--auth-mode {api-key|gcloud}  Google auth mode (default: api-key)
--provider {google|openai|stability|local_sd}  Provider to use (default: google)
- Google: gemini-2.5-flash-image-preview
- OpenAI: dall-e-3
- Stability AI: stable-diffusion-xl-1024-v1-0
- Local SD: stabilityai/stable-diffusion-2-1
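For readers who want to script against the CLI, the flags and default models above map naturally onto `argparse`. The sketch below is a reconstruction from this reference only; it is not copied from `main.py`, and the names `build_parser`, `parse`, and `DEFAULT_MODELS` are hypothetical.

```python
import argparse

# Default model per provider, as listed above.
DEFAULT_MODELS = {
    "google": "gemini-2.5-flash-image-preview",
    "openai": "dall-e-3",
    "stability": "stable-diffusion-xl-1024-v1-0",
    "local_sd": "stabilityai/stable-diffusion-2-1",
}


def build_parser():
    """Build a parser mirroring the documented options (-h is added by argparse)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-p", "--prompt", help="Prompt for image generation")
    parser.add_argument("-o", "--out", help="Output path for generated image")
    parser.add_argument("-m", "--model", help="Model to use (provider-specific)")
    parser.add_argument("-t", "--test", action="store_true", help="Test authentication")
    parser.add_argument("-k", "--api-key", help="API key string")
    parser.add_argument("-K", "--api-key-file", help="Path to file containing API key")
    parser.add_argument("-s", "--set-key", action="store_true", help="Save the provided key")
    parser.add_argument("--auth-mode", choices=["api-key", "gcloud"], default="api-key")
    parser.add_argument("--provider", choices=sorted(DEFAULT_MODELS), default="google")
    return parser


def parse(argv):
    """Parse arguments and fill in the provider's default model when -m is omitted."""
    args = build_parser().parse_args(argv)
    if args.model is None:
        args.model = DEFAULT_MODELS[args.provider]
    return args


if __name__ == "__main__":
    print(parse(["--provider", "openai", "-p", "Abstract art", "-o", "art.png"]))
```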
# Test authentication
python main.py -t
python main.py --provider openai -t
python main.py --provider stability -t
python main.py --provider local_sd -t  # Check if ML deps installed
python main.py --auth-mode gcloud -t
# Generate with different providers
python main.py -p "Sunset over mountains" -o sunset.png
python main.py --provider openai -m dall-e-2 -p "Abstract art" -o abstract.png
python main.py --provider stability -p "Fantasy landscape" -o fantasy.png
python main.py --provider local_sd -p "Cyberpunk city" -o cyber.png
python main.py --auth-mode gcloud -p "Space station" -o space.png
# Save and use API keys
python main.py -s -k "YOUR_KEY"  # Save to config
python main.py --provider stability -s -k "YOUR_STABILITY_KEY"
python main.py -K ~/keys/api.txt -p "Ocean waves"  # Use from file
# Basic video generation with slideshow
python main.py video --in lyrics.txt --provider gemini --slideshow \
  --audio /path/to/music.mp3 --out video.mp4
# MIDI-synchronized video with beat alignment
python main.py video --in lyrics.txt --midi /path/to/song.mid \
  --audio /path/to/song.mp3 --sync-mode measure --snap-strength 0.9 \
  --out synced_video.mp4
# Video with karaoke overlay
python main.py video --in lyrics.txt --midi song.mid --audio song.mp3 \
  --karaoke --karaoke-style bouncing_ball \
  --export-lrc --export-srt --export-ass \
  --out karaoke_video.mp4
# Using Veo AI for video generation (when available)
python main.py video --in script.txt --veo-model veo-3.0-generate-001 \
  --audio soundtrack.mp3 --out ai_video.mp4
- Model Selection: Dropdown with provider-specific models
- Prompt Input: Multi-line text area with built-in search (Ctrl+F)
- AI Prompt Tools:
- Enhance Prompt: Improve your prompt with AI assistance
- Generate Prompts: Create multiple prompt variations with history tracking
- Ask AI Anything: Interactive AI assistant for prompt help or general questions
- Works with or without a prompt
- Continuous conversation mode with context retention
- Editable prompt field with clear edit controls
- Conversation history saved across sessions
 
- Reference Image Analysis: Analyze images to generate detailed descriptions
- Upload any image for AI analysis
- Customizable analysis prompts
- Copy descriptions directly to main prompt
 
 
- Generate Button: Start image generation with progress tracking
- Image Display: High-quality preview with automatic scaling
- Generation Controls:
- Aspect ratio selector with visual previews
- Resolution selector with provider-optimized presets
- Quality settings (Standard/HD)
- Batch size selector (1-4 images)
- Cost estimator showing real-time pricing
 
- Reference Image (Google Gemini only):
- Select a starting image to guide generation
- Visual preview thumbnail with controls
- Style options: Natural blend, Blurred edges, In circle, In frame, As background, etc.
- Position options: Auto, Left, Center, Right, Top, Bottom, corners
- Auto-inserts natural language instructions into prompt
- Shows preview of what will be inserted
- Tip: Press Ctrl+F to search text in prompt field
- Includes resolution info automatically
- Enable/disable checkbox
- Clear button to remove reference
 
- Advanced Settings (collapsible panel):
- Inference steps slider (1-50)
- Guidance scale (CFG) control
- Scheduler selection
- Seed input for reproducibility
- Negative prompt field
- Prompt rewriting toggle
 
- Output Text: Live generation status and file paths
- Examples Button: Access curated prompts library
- Predefined Templates: Ready-to-use prompts with customizable placeholders
- Quick Generation: Jump-start your creativity with proven prompt patterns
- Placeholder System: Customize templates with your own variables
The Video Project feature provides comprehensive tools for creating AI-powered videos from text, with advanced version control and multiple rendering options.
Project Management:
- Workspace and History Tabs: Dual-tab interface for active editing and version control
- Project Operations: Create, open, save, and manage multiple video projects
- Auto-save: Automatic project saving after generation operations
- Project Directory: Organized storage in ~/.imageai/video_projects/
Text Processing:
- Multiple Input Formats:
- Timestamped lyrics: [00:30] First verse lyrics
- Structured sections: # Verse 1, # Chorus, # Bridge
- Plain text with intelligent scene detection
- Custom scene markers for precise control
 
- Smart Scene Detection: Automatic breaking of text into scenes based on:
- Timestamps in lyrics
- Section headers
- Paragraph breaks
- Semantic analysis
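To make the first two input formats concrete, here is a small sketch that splits text into scenes using `[mm:ss]` timestamps and `# Section` headers. It is an illustration only: the `Scene` class and `split_scenes` function are hypothetical, each non-empty line is treated as one scene, and paragraph-based or semantic detection is not attempted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# [mm:ss] or [mm:ss.fff] at the start of a line, followed by the lyric text.
TIMESTAMP = re.compile(r"^\[(\d{1,2}):(\d{2})(?:\.(\d{1,3}))?\]\s*(.*)$")
# "# Verse 1" style section headers.
SECTION = re.compile(r"^#\s*(.+)$")


@dataclass
class Scene:
    text: str
    start: Optional[float] = None   # seconds, when a timestamp was present
    section: Optional[str] = None   # most recent section header, if any


def split_scenes(text: str) -> List[Scene]:
    """Turn lyric text into scenes, one per non-empty line (simplifying assumption)."""
    scenes: List[Scene] = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION.match(line)
        if header:
            section = header.group(1).strip()
            continue
        stamped = TIMESTAMP.match(line)
        if stamped:
            minutes, seconds, frac, body = stamped.groups()
            start = int(minutes) * 60 + int(seconds) + (float("0." + frac) if frac else 0.0)
            scenes.append(Scene(body, start, section))
        else:
            scenes.append(Scene(line, None, section))
    return scenes


if __name__ == "__main__":
    for scene in split_scenes("# Verse 1\n[00:30] First verse lyrics"):
        print(scene)
```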
 
Storyboard & Scene Management:
- Interactive Scene Table: Edit titles, durations, and prompts directly
- Timing Controls: Adjustable duration for each scene (0.5-30 seconds)
- Scene Reordering: Drag-and-drop to rearrange scenes
- Batch Operations: Apply settings to multiple scenes at once
- Preview: Real-time preview of scene timings and transitions
AI-Powered Enhancement:
- Multi-Provider LLM Support:
- OpenAI GPT-5 and GPT-4o
- Anthropic Claude 3.5 Sonnet, Opus, Haiku
- Google Gemini 2.0 Flash and Pro models
- Local Ollama models
 
- Prompt Styles:
- Cinematic: Movie-like dramatic scenes
- Artistic: Painterly and stylized visuals
- Photorealistic: High-fidelity realistic images
- Animated: Cartoon and animation styles
- Documentary: Authentic, journalistic look
- Abstract: Experimental and artistic
 
- Batch Enhancement: Process all scenes with consistent style
Image Generation:
- Multi-Provider Support: Generate with Google Gemini, OpenAI DALL-E, Stability AI
- Variant Generation: Create 1-4 variations per scene
- Concurrent Processing: Generate multiple scenes in parallel
- Smart Caching: Hash-based caching to avoid regenerating identical prompts
- Thumbnail System: Automatic thumbnail generation with composite previews
- Cost Tracking: Real-time cost estimation and tracking per scene
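Hash-based caching, as mentioned in the list above, usually means deriving a stable key from everything that influences the output. The sketch below shows one way to do that; `cache_key` and `ImageCache` are hypothetical names, and the choice of SHA-256 over a canonical JSON payload is an assumption, not a description of ImageAI's cache.

```python
import hashlib
import json


def cache_key(prompt, provider, model, **params):
    """Build a deterministic key from the prompt and generation settings."""
    payload = json.dumps(
        {"prompt": prompt.strip(), "provider": provider, "model": model, "params": params},
        sort_keys=True,          # same settings in any order give the same key
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ImageCache:
    """In-memory map from cache key to a generated image path (illustrative only)."""

    def __init__(self):
        self._paths = {}

    def get_or_generate(self, generate, prompt, provider, model, **params):
        key = cache_key(prompt, provider, model, **params)
        if key not in self._paths:
            # Only call the (possibly expensive) generator on a cache miss.
            self._paths[key] = generate(prompt)
        return self._paths[key]


if __name__ == "__main__":
    cache = ImageCache()
    make = lambda p: f"{abs(hash(p))}.png"
    print(cache.get_or_generate(make, "a lake", "google", "some-model", seed=1))
```

Including the seed and other parameters in the key keeps deliberate variations from colliding with each other.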
Video Rendering Options:
- FFmpeg Slideshow Rendering:
- Ken Burns Effects: Automatic pan and zoom animations
- Transitions: Smooth crossfade between scenes
- Multiple Aspect Ratios: 16:9, 4:3, 9:16 (vertical), 1:1 (square)
- Resolution Options: Up to 4K output
- Frame Rate Control: 24, 30, or 60 fps
- Audio Integration: Sync with audio tracks
 
- Google Veo 3.1 AI Video (NEW! - Version 0.23.1):
- Continuous Video Generation: Automatically grab end frame and use as start frame for next scene
- Seamless Scene Transitions: Maintain visual continuity across multi-scene videos
- Optional End Frame Control: Set custom end frames for precise scene endings
- Start/End Frame Support: Google released start/end frame control on October 15, 2025
- True AI Video: Generate motion video from text prompts
- Model Selection: Veo 3.0 and Veo 3.1 models with frame-to-frame continuity
- Duration Control: 4, 6, or 8 second clips per scene (auto-snapped to provider limits)
- Advanced Features: Camera movements, visual styles, physics-accurate motion
- Duration Enforcement: Automatic snapping to 8-second duration for Veo 3.0/3.1 compliance
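The list above states the duration rule two ways (4, 6, or 8 second clips, and snapping to 8 seconds). The sketch below assumes the first reading and snaps a requested scene length to the nearest allowed clip length; `snap_duration` and `ALLOWED_SECONDS` are hypothetical names, and the tie-breaking rule is a choice made for this example.

```python
ALLOWED_SECONDS = (4, 6, 8)  # clip lengths named in the list above


def snap_duration(seconds, allowed=ALLOWED_SECONDS):
    """Return the allowed clip length closest to the requested duration.

    Ties (for example 5 seconds, equally far from 4 and 6) go to the shorter clip.
    """
    return min(allowed, key=lambda d: (abs(d - seconds), d))


if __name__ == "__main__":
    for requested in (0.5, 5, 5.1, 7.2, 30):
        print(requested, "->", snap_duration(requested))
```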
 
Version Control and History:
- Event Sourcing Architecture: Complete history of all changes
- Time Travel: Restore project to any point in history
- History Timeline: Visual timeline with event markers
- Event Types Tracked:
- Project creation and saves
- Scene additions and edits
- Prompt enhancements
- Image generations
- Video renders
 
- Filtering and Search: Find specific events quickly
- Restore Points: One-click restoration to previous states
- Diff Viewer: See exact changes between versions
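Event sourcing means the project state is never stored directly; it is rebuilt by replaying a log of events, which is what makes restoring to any earlier point possible. The following is a minimal sketch of that idea. The event kinds, the `Event` class, and the `apply_event` and `replay` functions are hypothetical and far simpler than a real project history.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    kind: str
    payload: Dict[str, Any]


def apply_event(state, event):
    """Return a new state with one event applied; the old state is left untouched."""
    scenes = dict(state["scenes"])
    if event.kind == "scene_added":
        scenes[event.payload["id"]] = {"prompt": event.payload["prompt"]}
    elif event.kind == "prompt_enhanced":
        scene = dict(scenes[event.payload["id"]])
        scene["prompt"] = event.payload["prompt"]
        scenes[event.payload["id"]] = scene
    return {"scenes": scenes}


def replay(events: List[Event], upto: Optional[int] = None):
    """Rebuild state from the first `upto` events (all events when None)."""
    state = {"scenes": {}}
    for event in events[:upto]:
        state = apply_event(state, event)
    return state


if __name__ == "__main__":
    log = [
        Event("scene_added", {"id": "s1", "prompt": "a lake"}),
        Event("prompt_enhanced", {"id": "s1", "prompt": "a misty lake at dawn"}),
    ]
    print(replay(log, upto=1))  # state before the enhancement
    print(replay(log))          # current state
```

Because `replay` takes a cut-off index, a restore point is just a position in the log, and a diff between two versions can be computed by comparing two replays.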
Audio & MIDI Support:
- Audio Track Integration: Link MP3/WAV/M4A/OGG files without copying
- MIDI Synchronization: Load MIDI files for precise timing control
- Beat grid alignment for scene transitions
- Measure and section synchronization
- Tempo and time signature display
- Musical structure detection (verse, chorus, bridge)
 
- Karaoke Features:
- Bouncing ball, highlighting, and fade-in styles
- Export to LRC, SRT, and ASS formats
- Word-level timing from MIDI lyrics
- Customizable font size and position
 
- Audio Controls:
- Volume adjustment with real-time preview
- Fade in/out transitions
- Trim controls for start/end offsets
 
- Sync Options:
- None, Beat, Measure, or Section alignment
- Adjustable snap strength (0-100%)
- Extract lyrics from MIDI files
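One plausible reading of "snap strength" is linear interpolation between a scene's original time and the nearest beat: 0% leaves the time alone and 100% moves it exactly onto the beat. The sketch below implements that interpretation for a constant-tempo grid. It is an assumption about the behavior, not the application's algorithm, and `beat_grid` and `snap_time` are hypothetical names.

```python
import bisect


def beat_grid(tempo_bpm, total_seconds):
    """Beat times in seconds for a constant tempo, starting at 0."""
    step = 60.0 / tempo_bpm
    count = int(total_seconds / step) + 1
    return [i * step for i in range(count)]


def snap_time(t, beats, strength):
    """Move time t toward the nearest beat; strength is 0.0 (off) to 1.0 (full snap)."""
    if not beats:
        return t
    strength = min(max(strength, 0.0), 1.0)
    # The nearest beat is one of the two neighbours around the insertion point.
    i = bisect.bisect_left(beats, t)
    candidates = beats[max(i - 1, 0): i + 1]
    nearest = min(candidates, key=lambda b: abs(b - t))
    return t + strength * (nearest - t)


if __name__ == "__main__":
    beats = beat_grid(120, 4)          # a beat every 0.5 s
    print(snap_time(1.2, beats, 0.9))  # mostly pulled toward the beat at 1.0 s
```

The CLI examples pass `--snap-strength 0.9`, while the GUI describes a 0-100% slider; this sketch uses the 0.0 to 1.0 form.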
 
Advanced Settings:
- Generation Settings:
- Image provider and model selection
- Quality settings per provider
- Batch size and concurrency limits
 
- Render Settings:
- Output format (MP4, AVI, MOV)
- Codec selection
- Bitrate control
- Metadata embedding
 
- Performance Options:
- GPU acceleration toggle
- Memory usage limits
- Cache management
 
Video Project Workflow Examples:
- Simple Lyrics-to-Video Workflow:
- Open Video tab → Create New Project
- Paste song lyrics or structured text
- Click "Split Text" to auto-detect scenes
- Select LLM provider and enhance all prompts
- Generate images for each scene
- Choose FFmpeg slideshow for simple rendering
- Render with Ken Burns effects
 
- Music-Synced Video Workflow (Advanced):
- Create project with song lyrics
- Load audio file (MP3/WAV/M4A)
- Load MIDI file for perfect timing
- Set sync mode to "Beat" or "Measure"
- Auto-detect musical structure
- Generate scenes synced to beats
- Render final video with audio integration
- Export karaoke overlays (LRC/SRT/ASS) if needed
 
- AI Motion Video Workflow (Google Veo 3.1):
- Create project with scene descriptions
- Select Veo 3.1 model (requires Google Gemini 2.0 API)
- Generate initial frame for first scene
- Let system auto-grab end frame for next scene's start
- Optionally set custom end frames for precise transitions
- Generate motion videos for each scene
- Automatic 8-second duration enforcement
- Combine videos with crossfades
 
- Commercial/Professional Workflow:
- Create project from detailed storyboard
- Use GPT-5 for ultra-realistic prompts
- Generate multiple variants per scene for selection
- Apply consistent style across scenes (cinematic/photorealistic)
- Render to 4K with 60fps for professional output
- Add metadata and titles
- Export as commercial-ready MP4
 
Video Tab Troubleshooting:
| Issue | Solution | 
|---|---|
| "Scene detection failed" | Ensure text has clear line breaks or timestamps | 
| "MIDI sync not working" | Verify MIDI file format (SMF 0 or 1) and load audio first | 
| "Veo model not available" | Ensure Google API key is set and Gemini 2.0 API is enabled | 
| "Duration mismatch" | System auto-snaps to 4, 6, or 8 seconds; manual adjustment limited | 
| "Video rendering slow" | Reduce resolution or disable Ken Burns effects | 
| "Memory errors during generation" | Reduce batch size or number of concurrent scene generations | 
| "Audio sync issues" | Adjust "snap strength" slider (0-100%) for timing flexibility | 
| "Karaoke export blank" | Ensure MIDI file contains lyric events and is properly formatted | 
| "Generated images don't match style" | Use consistent LLM provider and set strong style descriptors in prompts | 
- Provider Selection: Switch between Google, OpenAI, Stability AI, and Local SD
- Authentication Mode (Google only):
- API Key mode with key input field
- Google Cloud Account mode with status display
 
- Helper Buttons:
- Get API Key - Opens provider's key page
- Load from File - Import key from text file
- Check Status - Verify authentication
- Cloud Console - Open Google Cloud Console (Google only)
 
- Auto-save Options:
- Auto-save generated images toggle
- Copy filename to clipboard option
- Custom output directory selection
 
- Local SD Settings (when selected):
- Model browser and downloader
- Hugging Face authentication
- Cache directory management
- GPU/CPU device selection
 
- Predefined prompt templates with placeholders
- Variable substitution system with live preview
- Template categories:
- Art Style
- Photography
- Design
- Character
- Scene
- Product
- Marketing
 
- Append or replace current prompt
- Custom template creation and saving
- Detailed History Table with columns:
- Date and time
- Provider used
- Model name
- Resolution
- Cost (when applicable)
- Original prompt
 
- Quick Actions:
- Click to reload prompt and settings
- Open image file location
- View metadata sidecar
 
- Search and Filter:
- Filter by provider
- Search by prompt text
- Sort by date, cost, or model
 
- Embedded README documentation with full content
- Interactive Search:
- Search box with real-time results
- Navigate between matches with Previous/Next buttons
- Match counter showing current/total results
- Keyboard shortcuts (F3 for next, Shift+F3 for previous)
 
- Navigation controls:
- Back/Forward buttons for history
- Home button to return to top
- Keyboard navigation (Alt+Left/Right, Backspace)
 
- Quick reference guide
- Keyboard shortcuts reference
- Provider-specific tips
- Troubleshooting guide
- New Generation (Ctrl+N)
- Save Image As... (Ctrl+S)
- Open Output Directory
- Recent Files
- Exit (Ctrl+Q)
- Copy Prompt (Ctrl+C)
- Paste Prompt (Ctrl+V)
- Clear All (Ctrl+Shift+C)
- Copy Image to Clipboard
- Copy Filename
- Show/Hide History Panel
- Show/Hide Advanced Settings
- Full Screen Mode (F11)
- Reset Layout
- Zoom In/Out
- Model Browser (Local SD)
- Batch Generator
- Template Editor
- Settings Manager
- Documentation (F1)
- Keyboard Shortcuts
- Check for Updates
- About ImageAI
Generated images are automatically saved to:
- Windows: %APPDATA%\ImageAI\generated\
- macOS: ~/Library/Application Support/ImageAI/generated/
- Linux: ~/.config/ImageAI/generated/
- Filenames derived from prompt (sanitized)
- Timestamp added for uniqueness
- Format: prompt_words_YYYYMMDD_HHMMSS.png
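The naming scheme above (sanitized prompt words plus a timestamp) can be sketched in a few lines. This is an illustration, not the application's implementation: `filename_from_prompt` is a hypothetical name, and lowercasing, the six-word limit, and the `image` fallback are assumptions.

```python
import re
from datetime import datetime


def filename_from_prompt(prompt, now=None, max_words=6, ext="png"):
    """Build prompt_words_YYYYMMDD_HHMMSS.png from a prompt (hypothetical helper)."""
    now = now or datetime.now()
    # Keep only letters and digits so the name is safe on every platform.
    words = re.findall(r"[A-Za-z0-9]+", prompt.lower())[:max_words]
    stem = "_".join(words) or "image"
    return f"{stem}_{now:%Y%m%d_%H%M%S}.{ext}"


if __name__ == "__main__":
    print(filename_from_prompt("A majestic mountain landscape at sunset"))
```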
Each image gets a .json sidecar file containing:
{
  "prompt": "User's prompt text",
  "model": "Model used",
  "provider": "Provider name",
  "created_at": "ISO timestamp",
  "app_version": "Version number",
  "output_text": "Any text output",
  "template": "Template data if used"
}
- In-session history of generated images
- Persistent history across sessions
- Quick access to recent generations
- Metadata search and filtering
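A metadata sidecar with the fields shown earlier in this section can be written with the standard library alone. The sketch below assumes the sidecar shares the image's name with a `.json` extension; that naming rule and the function name `write_sidecar` are assumptions, while the field names come from the example JSON.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(image_path, prompt, model, provider, app_version,
                  output_text="", template=None):
    """Write <image name>.json next to the image and return its path."""
    image_path = Path(image_path)
    sidecar = image_path.with_suffix(".json")  # assumed naming convention
    metadata = {
        "prompt": prompt,
        "model": model,
        "provider": provider,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "app_version": app_version,
        "output_text": output_text,
        "template": template,
    }
    sidecar.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as folder:
        print(write_sidecar(Path(folder) / "cat.png", "a cat", "dall-e-3", "openai", "0.24.0"))
```

Because the sidecar is plain JSON, history scanning and metadata search can be built by loading these files from the output directory.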
"Oil painting of a serene mountain lake at golden hour, impressionist style"
"Cyberpunk street scene with neon lights and rain reflections, ultra-detailed"
"Watercolor portrait of a wise owl in autumn forest, soft pastels"
"Professional photograph of a modern minimalist living room, magazine quality"
"Macro shot of dewdrops on a spider web at sunrise, shallow depth of field"
"Aerial view of tropical islands with crystal clear water, drone photography"
"Steampunk airship floating above Victorian London, brass and copper details"
"Bioluminescent underwater cave with glowing creatures, fantasy art"
"Isometric cutaway of a cozy treehouse library with magical elements"
The template system allows you to create consistent, reusable prompts with variable placeholders.
- Select a Template: Choose from the dropdown in the Templates tab
- Fill Placeholders: Enter values for each variable field (optional)
- Preview: See the assembled prompt update in real-time
- Insert: Click "Insert into Prompt" to use the template
- Append Option: Check "Append to existing" to add to current prompt
Portrait Photography:
"[style] portrait of [subject] with [expression], [lighting] lighting, [background] background"
Landscape Scene:
"[time_of_day] landscape of [location] with [features], [weather] weather, [style] style"
Product Shot:
"Product photography of [item] on [surface], [lighting] lighting, [angle] angle, commercial quality"
Fantasy Art:
"[character] [action] in [setting], [magic_effect], [art_style] fantasy art style"
Architectural:
"[building_type] in [architectural_style] style, [time_of_day], [weather], [perspective] view"
- Variables are defined with square brackets: [variable_name]
- Leaving a field empty removes it from the final prompt
- Multiple instances of the same variable use the same value
- Commas are automatically managed when variables are empty
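The placeholder rules above can be approximated with a regular-expression substitution followed by comma cleanup. This is a rough sketch, not the application's template engine: `fill_template` is a hypothetical name, and it only tidies commas and whitespace, so fixed words next to an empty placeholder (such as "with" before an empty `[expression]`) are left in place.

```python
import re

# [variable_name] placeholders, as described above.
PLACEHOLDER = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def fill_template(template, values):
    """Substitute placeholders; empty or missing values are dropped."""
    # Every occurrence of the same variable receives the same value.
    text = PLACEHOLDER.sub(lambda m: values.get(m.group(1), "").strip(), template)
    text = re.sub(r"\s+", " ", text)        # collapse runs of whitespace
    text = re.sub(r"\s+,", ",", text)       # no space before a comma
    text = re.sub(r",(\s*,)+", ",", text)   # merge commas left by empty fields
    return text.strip(" ,")


if __name__ == "__main__":
    template = "[character] [action] in [setting], [magic_effect], [art_style] fantasy art style"
    print(fill_template(template, {
        "character": "A dragon",
        "action": "sleeping",
        "setting": "a crystal cave",
        "art_style": "painterly",
    }))
```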
Be Specific: Instead of "a cat", try "a fluffy orange tabby cat sitting on a windowsill"
Include Style: Add artistic style like "oil painting", "photorealistic", "cartoon style"
Describe Mood: Include lighting and atmosphere like "golden hour", "dramatic lighting", "cozy"
Add Details: More details generally produce better results
Composition Tips:
- Use camera angles: "aerial view", "close-up", "wide angle"
- Specify perspective: "first-person view", "isometric", "side profile"
- Include depth: "shallow depth of field", "bokeh background", "infinite focus"
Quality Modifiers:
- "highly detailed", "ultra-realistic", "4K", "HD"
- "professional photography", "award-winning"
- "trending on artstation", "masterpiece"
GUI Mode:
- Ctrl+Enter: Generate image
- Ctrl+F: Find/search in prompt text
- F3: Find next match
- Shift+F3: Find previous match
- Escape: Close find dialog
- Ctrl+S: Save current image as...
- Ctrl+Q: Quit application
- Ctrl+N: Clear prompt
- Ctrl+A: Select all text
- Ctrl+C/V/X: Copy/Paste/Cut
- Tab: Switch between tabs
- F1: Show help
- F3: Find next in help documentation
- Shift+F3: Find previous in help documentation
- Ctrl+F: Focus search box in help tab
- Alt+Left/Backspace: Navigate back in help
- Alt+Right: Navigate forward in help
- Ctrl+Home: Go to top of help
Generate multiple images with variations:
# Generate 3 variations of the same prompt
for i in {1..3}; do python main.py -p "Sunset landscape" -o "sunset_$i.png"; done
# Generate from a list of prompts
while IFS= read -r prompt; do
  python main.py -p "$prompt" -o "${prompt// /_}.png"
done < prompts.txt
Set defaults with environment variables:
# Set default provider
export IMAGEAI_PROVIDER="openai"
export OPENAI_API_KEY="your-key"
# Set default model
export IMAGEAI_MODEL="dall-e-3"
# Set output directory
export IMAGEAI_OUTPUT_DIR="/path/to/images"
Edit config file directly:
- Windows: %APPDATA%\ImageAI\config.json
- macOS: ~/Library/Application Support/ImageAI/config.json
- Linux: ~/.config/ImageAI/config.json
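For scripts that need to read the same file, the per-platform locations above can be resolved roughly as follows. This is a minimal sketch, not the application's own loader; `config_path` and `load_config` are hypothetical helper names.

```python
import json
import os
import sys
from pathlib import Path


def config_path(platform: str = sys.platform) -> Path:
    """Return the config.json location for the given platform string."""
    if platform.startswith("win"):
        # %APPDATA% normally points at ...\AppData\Roaming
        base = Path(os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:  # Linux and other Unix-likes
        base = Path.home() / ".config"
    return base / "ImageAI" / "config.json"


def load_config(path: Path) -> dict:
    """Load the config, or return an empty dict if the file does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


print(config_path())
```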
Example config:
{
  "provider": "google",
  "google_api_key": "your-key",
  "openai_api_key": "your-key",
  "stability_api_key": "your-key",
  "auto_save": true,
  "output_format": "png",
  "jpeg_quality": 95,
  "default_model": {
    "google": "gemini-2.5-flash-image-preview",
    "openai": "dall-e-3",
    "stability": "stable-diffusion-xl-1024-v1-0"
  }
}
When using Local SD provider, advanced settings are available:
Inference Steps: Number of denoising steps (1-50)
- More steps = better quality but slower
- Turbo models: 1-4 steps
- Standard models: 20-50 steps
- Real-time preview of step count impact
Guidance Scale (CFG): How closely to follow prompt (0-20)
- Lower (1-5): More creative/artistic
- Medium (7-8): Balanced
- Higher (10-15): More literal prompt following
- Visual indicator shows optimal range per model
Resolution: Output image dimensions
- SD 1.5/2.1: 512x512 optimal
- SDXL: 1024x1024 optimal
- Custom sizes supported but may affect quality
- Aspect ratio preservation with smart presets
Scheduler: Sampling algorithm
- DPM++ 2M Karras: Good balance
- Euler A: Fast, good for most cases
- DPM++ SDE Karras: Higher quality, slower
- DDIM: Deterministic, good for reproducibility
- LMS: Classic scheduler
- PNDM: Fast convergence
Additional Controls:
- Seed: Set specific seed for reproducible results
- Negative Prompt: Specify what to avoid in generation
- VAE Selection: Choose different VAE models for style
- Attention Slicing: Memory optimization for large images
- CPU Offload: Move models to CPU when not in use
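As a rough illustration of how the ranges above fit together, the sketch below clamps steps and guidance scale to the documented limits and picks a starting step count by model type. The `LocalSDSettings` class, its defaults, and the "turbo" name check are assumptions for this example, not ImageAI's internal data model.

```python
from dataclasses import dataclass, replace
from typing import Optional

STEP_RANGE = (1, 50)      # documented range for inference steps
CFG_RANGE = (0.0, 20.0)   # documented range for guidance scale


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class LocalSDSettings:
    steps: int = 20
    guidance_scale: float = 7.5   # "balanced" region of the CFG scale
    width: int = 512
    height: int = 512
    seed: Optional[int] = None    # None means a random seed
    negative_prompt: str = ""

    def validated(self) -> "LocalSDSettings":
        """Return a copy with steps and CFG clamped into the documented ranges."""
        return replace(
            self,
            steps=clamp(self.steps, *STEP_RANGE),
            guidance_scale=clamp(self.guidance_scale, *CFG_RANGE),
        )


def default_steps(model_id: str) -> int:
    """Turbo models run in 1-4 steps; standard models start at 20."""
    return 4 if "turbo" in model_id.lower() else 20


print(LocalSDSettings(steps=80, guidance_scale=25).validated())
print(default_steps("stabilityai/sdxl-turbo"))
```

Matching on the model name is a heuristic. A model whose ID does not contain "turbo" but still expects very few steps would need an explicit override.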
Using Custom Models:
# Download a specific model
python main.py --provider local_sd -m "runwayml/stable-diffusion-v1-5" -p "Test" -o test.png
# Use downloaded model (cached)
python main.py --provider local_sd -m "runwayml/stable-diffusion-v1-5" -p "Art" -o art.png
Popular Models:
- stabilityai/stable-diffusion-2-1: Balanced quality/speed
- runwayml/stable-diffusion-v1-5: Classic, widely compatible
- stabilityai/stable-diffusion-xl-base-1.0: High quality, 1024x1024
- segmind/SSD-1B: Fast SDXL variant
- stabilityai/sdxl-turbo: Ultra-fast 1-4 step generation
Model Cache Location:
- Default: ~/.cache/huggingface/hub/
- Size: Models range from 2GB to 7GB
- First use downloads the model
- Subsequent uses load from cache
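To check whether a model is already on disk before starting a multi-gigabyte download, a script can look for its folder in the cache. The sketch below assumes the usual Hugging Face Hub layout (a `models--<org>--<name>` folder per model, with `HF_HUB_CACHE` or `HF_HOME` overriding the default location); the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_dir() -> Path:
    """Resolve the Hugging Face hub cache directory."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


def model_cache_path(model_id: str, cache_dir: Optional[Path] = None) -> Path:
    """Map 'org/name' to its expected cache folder, 'models--org--name'."""
    root = cache_dir if cache_dir is not None else hub_cache_dir()
    return root / ("models--" + model_id.replace("/", "--"))


def is_cached(model_id: str, cache_dir: Optional[Path] = None) -> bool:
    """True if a cache folder for the model exists (it may still be partial)."""
    return model_cache_path(model_id, cache_dir).is_dir()


print(model_cache_path("runwayml/stable-diffusion-v1-5"))
print(is_cached("runwayml/stable-diffusion-v1-5"))
```

A folder can exist after an interrupted download, so this is only a quick check, not proof that the model is complete.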
GPU Acceleration (Local SD only):
# Install CUDA support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Verify GPU is detected
python -c "import torch; print(torch.cuda.is_available())"
Memory Management:
- Close other applications when using Local SD
- Use smaller models for limited VRAM (< 6GB)
- Enable attention slicing for large images
- Reduce batch size if out of memory
The application may take a few seconds to start as it loads AI provider libraries:
- Google Cloud AI: ~10-15 seconds on first load (largest library)
- Google Gemini: ~2 seconds
- OpenAI: ~3 seconds
You'll see "Loading provider: [name]..." in the console during startup. Provider libraries are cached after first load, so switching between providers is faster after the initial import.
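The load-once behavior described above can be approximated with a cached import helper. This is a sketch of the general technique, not ImageAI's actual loader; `load_provider_module` is a hypothetical name, and the standard-library `json` module stands in for a heavy provider SDK.

```python
import importlib
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_provider_module(module_name: str):
    """Import a provider library on first use; repeat calls return the cached module."""
    print(f"Loading provider: {module_name}...")
    return importlib.import_module(module_name)


start = time.perf_counter()
load_provider_module("json")   # first call performs the import
first = time.perf_counter() - start

start = time.perf_counter()
load_provider_module("json")   # second call is served from the cache
second = time.perf_counter() - start

print(f"first: {first:.6f}s, second: {second:.6f}s")
```

Deferring imports this way keeps startup cost proportional to the providers actually used, which matters most for the larger SDKs.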
Google API Key Issues:
- Verify key at AI Studio
- Check billing is enabled
- Ensure API is not restricted by IP
Google Cloud Auth Issues:
# Verify authentication
gcloud auth list
gcloud auth application-default print-access-token
# Enable required APIs
gcloud services enable aiplatform.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
# Check project
gcloud config get-value project
OpenAI Issues:
- Verify key at OpenAI Platform
- Check rate limits and quotas
- Ensure billing is active
Windows PowerShell:
# If scripts are blocked
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
# If gcloud not found
$env:PATH += ";C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin"
Module Import Errors:
# Reinstall specific package
pip install --upgrade google-genai
pip install --upgrade PySide6
# Clear pip cache
pip cache purge
PySide6 not loading:
# Install with specific version
pip install PySide6==6.5.3
# Linux: Install system dependencies
sudo apt-get install python3-pyside6
Display scaling issues:
# Set Qt scaling
export QT_AUTO_SCREEN_SCALE_FACTOR=1
export QT_SCALE_FACTOR=1.25
| Error | Solution | 
|---|---|
| "API key not found" | Set key via Settings or use -s -k YOUR_KEY | 
| "Quota exceeded" | Check billing and quotas in provider console | 
| "Invalid prompt" | Avoid restricted content, check provider policies | 
| "Module not found" | Run pip install -r requirements.txt | 
| "gcloud not found" | Install Google Cloud SDK or use API key mode | 
migrate_config.py migrates old configuration formats to the current structure and optionally secures API keys.
Usage:
# Dry run to see what changes would be made
python migrate_config.py --dry-run
# Perform migration
python migrate_config.py
# Migration without securing keys
python migrate_config.py --no-secure
What it does:
- Removes legacy api_key field from root level
- Fixes incorrect keys.<provider> structure
- Moves all API keys to proper providers.<provider>.api_key format
- Attempts to secure keys in system keyring when available
- Creates timestamped backups before making changes
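The key-restructuring steps above amount to a small dictionary transformation. The sketch below is not the code in migrate_config.py: the shape of the legacy `keys` section (a flat provider-to-key mapping) and the rule that a root-level `api_key` belongs to the configured `provider` (falling back to `google`) are assumptions made for illustration. Keyring storage and backups are left out.

```python
import copy


def migrate_config(old: dict) -> dict:
    """Return a migrated copy of the config; the input is left untouched."""
    new = copy.deepcopy(old)
    providers = new.setdefault("providers", {})

    # Assumed legacy layout 1: keys.<provider> = "<api key>"
    for name, key in new.pop("keys", {}).items():
        providers.setdefault(name, {}).setdefault("api_key", key)

    # Assumed legacy layout 2: a single root-level api_key.
    legacy_key = new.pop("api_key", None)
    if legacy_key:
        owner = new.get("provider", "google")  # assumption: key belongs to the active provider
        providers.setdefault(owner, {}).setdefault("api_key", legacy_key)

    return new


old_config = {"provider": "openai", "api_key": "sk-old", "keys": {"google": "g-old"}}
print(migrate_config(old_config))
```

Using `setdefault` means a key already stored under `providers.<provider>.api_key` is never overwritten by a legacy value.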
secure_keys.py (Windows only): Moves API keys from plaintext config.json to Windows Credential Manager, where the OS encrypts them.
Usage (run in Windows PowerShell/Command Prompt, not WSL):
cd D:\path\to\ImageAI
python secure_keys.py
What it does:
- Reads API keys from config.json
- Stores them securely in Windows Credential Manager (encrypted by OS)
- Removes plaintext keys from config.json
- Creates backup before modification
- Keys are automatically retrieved by the app when needed
Benefits:
- API keys are encrypted by Windows
- Keys survive config.json deletion/corruption
- More secure than plaintext storage
- Works seamlessly with the application
Q: Which provider should I use?
A: It depends on your needs:
- Google Gemini: Best for general purpose, good quality, reliable
- OpenAI DALL-E: Best for creative/artistic images
- Stability AI: Best for photorealistic and detailed images
- Local SD: Best for privacy, unlimited generation, customization
Q: Can I use multiple providers?
A: Yes! You can switch providers anytime in Settings or via CLI --provider flag.
Q: Are my prompts and images private?
A:
- API providers (Google, OpenAI, Stability) process prompts on their servers
- Local SD runs entirely on your machine - fully private
- Generated images are saved locally only
Q: How much does it cost?
A: See the detailed pricing table below for current rates across all providers.
Q: "API key not found" error
A:
- Make sure you've entered the key in Settings
- Click "Save & Test" to save it
- Check the key is valid on provider's website
Q: "Module not found" error
A: Run pip install -r requirements.txt in your terminal
Q: GUI won't start on Linux
A: Install system Qt libraries: sudo apt-get install python3-pyside6
Q: Local SD says "No module named 'diffusers'"
A: Install Local SD dependencies: pip install -r requirements-local-sd.txt
Q: "Safety filter triggered" or "Blocked" message
A: Your prompt may contain restricted content. Try:
- Rephrasing your prompt
- Avoiding violence, explicit content, or real people's names
- Using more general terms
Q: Images are low quality
A:
- Add quality modifiers: "high quality", "detailed", "4K"
- Try different models (DALL-E 3, SDXL)
- For Local SD: increase steps and guidance scale
Q: Generation is very slow
A:
- API providers: Network speed dependent, typically 5-20 seconds
- Local SD on CPU: Can take 2-10 minutes
- Local SD on GPU: Usually 10-60 seconds
- Use turbo models for faster generation
Q: "Out of memory" with Local SD
A:
- Use smaller models (SD 1.5 instead of SDXL)
- Reduce image resolution
- Close other applications
- Consider upgrading GPU VRAM
Q: Can I edit existing images?
A: Not yet in the current version. Planned features include inpainting and image-to-image.
Q: Can I generate multiple images at once?
A: Currently one at a time in the GUI. Use the CLI with shell scripts for batch generation.
Q: Can I use my own Stable Diffusion models?
A: Yes! With Local SD, enter any Hugging Face model ID or use the Model Browser.
Q: Is there a web version?
A: Not currently. This is a desktop application. A web interface is planned for the future.
Q: Can I use this commercially?
A: Check each provider's terms:
- Google, OpenAI: Commercial use allowed with paid plans
- Stability AI: Commercial use allowed
- Local SD: Depends on specific model license
| Provider | Model | Free Tier | API Cost per Image | Subscription Plans | 
|---|---|---|---|---|
| Google Gemini | Gemini 2.5 Flash (Image Preview) | 25 images/day; 5 requests/min; via AI Studio (free) | $0.039 @ 1024x1024 ($30 per 1M tokens) | Gemini Advanced: $19.99/mo; 100-150 images/day; priority access |
| OpenAI | DALL·E 3 | 2 images/day (via ChatGPT free) | Standard: $0.04 @ 1024x1024; HD: $0.08 @ 1024x1024; Large: $0.08-0.12 @ 1024x1792 | ChatGPT Plus: $20/mo; unlimited within caps; GPT-4 access included |
| OpenAI | DALL·E 2 | None | $0.02 @ 1024x1024; $0.018 @ 512x512; $0.016 @ 256x256 | Included in Plus |
| OpenAI | GPT-4o Image | 40 images/month (via API free tier) | $0.035 @ 1024x1024 | Same as above |
| Stability AI | Stable Diffusion XL | 25 free credits on signup | ~$0.01-0.02 per image (credit-based) | Stable Assistant: $9/mo; Commercial: $20/mo; Enterprise: Custom |
| Stability AI | SD 3.5 Large | Limited free credits | ~$0.068 @ 1024x1024 (via third-party APIs) | Included in plans |
| Local SD | Any Hugging Face Model | Unlimited | $0 (your hardware) | N/A |
- Free Tier: 25 requests/day, 5 RPM via AI Studio (completely free)
- Token Pricing: Images consume ~1,290 tokens for 1024x1024
- Rate Limits: Preview models have stricter limits
- Limitations: Currently only generates square (1:1) images regardless of aspect ratio settings
- Best For: Free testing, development, and moderate usage
- Free Access: ChatGPT free users get 2 DALL·E images/day
- API Free Tier: New accounts get 40 images/month free
- Quality Tiers: Standard vs HD (2x price for better quality)
- Resolution Impact: Larger sizes cost 2-3x more
- Best For: High-quality artistic images, professional use
- Credit System: 6.5 credits per image, 0.1 credit per message
- Community License: FREE for businesses <$1M annual revenue
- Enterprise License: Required for >$1M revenue (custom pricing)
- Open Source: Models can be self-hosted for free
- Best For: High-volume generation, customization needs
- Completely Free: No API costs, unlimited generation
- Hardware Requirements:
  - Minimum: 8GB RAM (CPU mode, slow)
  - Recommended: 6GB+ VRAM GPU
  - Optimal: 12GB+ VRAM for SDXL models
- Generation Speed:
  - CPU: 2-10 minutes per image
  - GPU: 10-60 seconds per image
- Best For: Privacy, unlimited generation, experimentation
- For Hobbyists: Use Google Gemini free tier (25 images/day)
- For Developers: Start with free tiers, upgrade as needed
- For Production: Compare subscription vs API costs based on volume
- For Privacy: Use Local SD with one-time hardware investment
- For Quality: OpenAI DALL·E 3 HD or Stability AI enterprise
| Usage Pattern | Best Option | Monthly Cost | 
|---|---|---|
| <25 images/day | Google Gemini Free | $0 | 
| 50-100 images/day | ChatGPT Plus | $20 | 
| 500 images/month | Mix free tiers + API | ~$10-20 | 
| 1000+ images/month | Stability AI subscription | $20 | 
| Unlimited | Local SD | $0 (after setup) | 
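To sanity-check figures like these for your own volume, the arithmetic is simple enough to script. The sketch below uses only the per-image API prices quoted in this section; the dictionary keys and function names are made up for the example, and current provider pricing should be confirmed before budgeting.

```python
# Per-image API prices (USD) as quoted in the pricing table of this section.
PRICE_PER_IMAGE = {
    "gemini-2.5-flash-image": 0.039,
    "dall-e-3-standard": 0.04,
    "dall-e-3-hd": 0.08,
    "dall-e-2": 0.02,
}


def gemini_image_cost(tokens: int = 1290, usd_per_million_tokens: float = 30.0) -> float:
    """Token-based cost of one image: tokens x price per token."""
    return tokens * usd_per_million_tokens / 1_000_000


def monthly_api_cost(images_per_month: int, price_per_image: float) -> float:
    """Pay-as-you-go cost for a month, rounded to cents."""
    return round(images_per_month * price_per_image, 2)


# 1,290 tokens at $30 per 1M tokens is about $0.039 per image.
print(round(gemini_image_cost(), 4))

for name, price in PRICE_PER_IMAGE.items():
    print(f"{name}: 500 images/month -> ${monthly_api_cost(500, price):.2f}")
```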
- Google Cloud: Custom Vertex AI pricing, volume discounts
- OpenAI: Enterprise agreements available, contact sales
- Stability AI: Custom pricing for >$1M revenue companies
- Azure OpenAI: Enterprise SLAs, regional deployment options
Since ImageAI's Video Project feature is designed to work with Google Veo for AI video generation, here's the current pricing:
| Model | Model ID | Duration | Audio | Gemini API Price | Features | 
|---|---|---|---|---|---|
| Veo 3 | veo-3.0-generate-001 | 8 sec | Yes | $0.75/second ($6.00/video) | Best quality, physics-accurate | 
| Veo 3 Fast | veo-3.0-fast-generate-001 | 8 sec | Yes | $0.40/second ($3.20/video) | Optimized for speed | 
| Veo 2 | veo-2.0-generate-001 | 5-8 sec | No | $0.35/second ($2.10-2.80/video) | 4K support, no audio | 
| Plan | Price/Month | Video Credits | Best For | 
|---|---|---|---|
| Google AI Pro | $19.99 | 90 Veo 3 Fast videos | Individual creators | 
| Google AI Ultra | $249.99 | Higher limits + Veo 3 | Professional use | 
| API Pay-as-you-go | Usage-based | Unlimited | Developers | 
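Because clips top out at 8 seconds, longer footage means chaining several clips, and API cost scales with the number of clips. The sketch below estimates both from the per-second prices in the model table. It assumes every clip is generated at the full 8 seconds, which may overestimate cost for models that allow shorter clips; `veo_cost` is a hypothetical helper.

```python
import math

# Gemini API prices per generated second (USD), from the model table above.
VEO_PRICE_PER_SECOND = {
    "veo-3.0-generate-001": 0.75,
    "veo-3.0-fast-generate-001": 0.40,
    "veo-2.0-generate-001": 0.35,
}
MAX_CLIP_SECONDS = 8


def veo_cost(model_id: str, total_seconds: float) -> tuple:
    """Return (clips needed, estimated USD cost) for the requested footage length."""
    clips = math.ceil(total_seconds / MAX_CLIP_SECONDS)
    # Assumption: each clip is billed as a full 8-second generation.
    cost = round(clips * MAX_CLIP_SECONDS * VEO_PRICE_PER_SECOND[model_id], 2)
    return clips, cost


for model in VEO_PRICE_PER_SECOND:
    clips, cost = veo_cost(model, 30)
    print(f"{model}: {clips} clips, about ${cost:.2f} for 30 seconds")
```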
Notes:
- Videos are 8 seconds max (chain for longer content)
- Generated videos stored for 2 days only
- All videos include SynthID watermark
- Available in US (consumer plans), global via Vertex AI
- gemini-2.5-flash-image-preview (default)
- gemini-2.0-flash-lite-preview-02-05
- gemini-2.0-flash-thinking-exp-01-21
- dall-e-3 (default) - Best quality, 1024x1024
- dall-e-2 - Good quality, multiple sizes
- stable-diffusion-xl-1024-v1-0 (default) - SDXL, best quality
- stable-diffusion-v1-6 - SD 1.6, balanced
- stable-diffusion-512-v2-1 - SD 2.1, faster
- stable-diffusion-xl-beta-v2-2-2 - SDXL beta
- stabilityai/stable-diffusion-2-1 (default)
- runwayml/stable-diffusion-v1-5 - Popular SD 1.5
- stabilityai/stable-diffusion-xl-base-1.0 - SDXL base
- segmind/SSD-1B - Fast SDXL variant
- Custom models from Hugging Face Hub
| Provider | Requests/Min | Daily Limit | Notes | 
|---|---|---|---|
| Google (Free) | 60 | 1,500 | Varies by region | 
| Google (Paid) | 360 | Unlimited | Billing required | 
| OpenAI | Varies | By tier | Check dashboard | 
| Stability AI | 150 | By credits | Pay per generation | 
| Local SD | Unlimited | Unlimited | Limited by hardware | 
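When scripting batch jobs against an API provider, spacing requests evenly is a simple way to stay under a requests-per-minute limit. The sketch below is a generic client-side throttle, not part of ImageAI; the `RateLimiter` class is hypothetical, and even spacing is stricter than most providers' windowed quotas.

```python
import time


class RateLimiter:
    """Space calls so that at most `requests_per_minute` start per minute."""

    def __init__(self, requests_per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


limiter = RateLimiter(requests_per_minute=150)  # Stability AI row in the table above
for prompt in ["Sunset landscape", "Misty forest"]:
    limiter.wait()
    print(f"would send request for: {prompt}")
```

The clock and sleep functions are injectable so the limiter can be tested without real waiting.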
All API providers return images as base64-encoded PNG data or URLs, automatically decoded and saved by the application. Local SD generates images directly as PIL Image objects.
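For the base64 case, decoding comes down to one standard-library call plus a sanity check. This is a minimal sketch of the idea, not ImageAI's own code; `decode_png` and `save_base64_image` are hypothetical helpers, and the payload in the example is a stand-in rather than a real image.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_png(b64_data: str) -> bytes:
    """Decode base64 text and verify that the result looks like a PNG."""
    raw = base64.b64decode(b64_data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw


def save_base64_image(b64_data: str, out_path: str) -> Path:
    """Decode a base64 PNG payload and write it to out_path."""
    path = Path(out_path)
    path.write_bytes(decode_png(b64_data))
    return path


# Stand-in payload: a PNG signature followed by placeholder bytes.
payload = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")
print(len(decode_png(payload)), "bytes decoded")
```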
ImageAI/
├── main.py                   # Main entry point
├── cli/                      # CLI interface
├── gui/                      # GUI interface
├── core/                     # Core functionality
├── providers/                # Provider implementations
│   ├── base.py              # Base provider interface
│   ├── google.py            # Google Gemini provider
│   ├── openai.py            # OpenAI DALL-E provider
│   ├── stability.py         # Stability AI provider
│   └── local_sd.py          # Local Stable Diffusion
├── templates/                # Prompt templates
├── requirements.txt          # Core dependencies
├── requirements-local-sd.txt # Local SD dependencies
├── README.md                 # This file
├── CLAUDE.md                 # Claude AI guidance
├── GEMINI.md                 # Gemini setup guide
├── Plans/                    # Future development plans
│   ├── GoogleCloudAuth.md
│   ├── NewProviders.md
│   └── ProviderIntegration.md
└── .gitignore                # Git ignore rules
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Stability AI integration (completed)
- Local Stable Diffusion support (completed)
- Video Project System with version control (completed)
- Google Veo AI video generation (in progress)
- Advanced video transitions and effects
- Multi-track audio support
- Image editing capabilities (inpainting, outpainting) - Partially implemented
- Local model management GUI (Phase 3 in progress)
- Batch processing improvements
- Plugin system for custom providers
- Additional providers (Midjourney API, Adobe Firefly, etc.)
- Web interface option
- Mobile app companion
See CHANGELOG.md for the complete version history and release notes.
Sizes are automatically set when using the "Social Sizes" button!
Starting point with an empty canvas, ready for your creative prompts.
Clean interface state before generating any images - shows the prompt input area and configuration options
Creating images using AI-powered prompt generation for optimal results.
Using the AI prompt generation feature to create detailed, optimized prompts for better image quality
Improving and refining your prompts with AI assistance for more detailed outputs.
The AI enhancement feature taking simple prompts and expanding them with artistic details and style descriptions
The screenshots showcase:
- Multi-Provider Support: Switch seamlessly between Google Gemini, OpenAI DALL·E, Stability AI, and Local SD
- AI-Powered Prompts: Generate and enhance prompts using GPT-5, Claude, Gemini, or local LLMs
- Flexible Resolution Control: Aspect ratio presets, custom ratios, and direct resolution input
- Real-time Console: Color-coded status messages showing generation progress and API interactions
- Session Persistence: All settings and history maintained between sessions
- Cost Tracking: Real-time cost estimation for all providers
- Tabbed Navigation: Easy access to Image, Templates, Video, Settings, and Help sections
- History Tracking: Visual timeline of all generated images with metadata
Created by Leland Green | LelandGreen.com Contact: | [email protected]
Built with:
- JetBrains PyCharm
- PySide6/Qt Framework
- Google Gemini API
- OpenAI API
See LICENSE file for details.
For issues, feature requests, or questions:
- GitHub Issues: Create an issue
- Support: [email protected]
- LelandGreen.com on Discord -- The Intersection of Art and AI Discord
- Fun stuff:
  - Original Boogie Woogie! [Leland Green's Boogie Woogie (HQ) on SoundCloud](https://soundcloud.com/aboogieman/leland-greens-boogie-woogie-hq)
  - Leland's Old Art
  - Art Gallery: Art by Leland
Happy Creating! 🎨