This project is a feature-rich, browser-based application designed to interact with multiple AI language models. It provides a user interface inspired by ChatGPT, enabling conversations with various models including:
- OpenAI models: `gpt-4o` (using the `/v1/responses` API) and `o3-mini` (using the `/v1/chat/completions` API)
- Google Gemini models: Support for Gemini models via the Google AI API
- xAI models: Support for Grok models via the X.AI API
The application now features cloud storage with Supabase, allowing users to create accounts and access their data across devices. All chat history, API settings, and Custom GPT configurations are stored in the Supabase database when logged in, with localStorage as a fallback when offline or not logged in.
Key features include:
- User Authentication: Create accounts, log in, and securely access your data from any device using Supabase authentication.
- Custom GPT functionality: Create, edit, and manage personalized GPTs with specific instructions, capabilities (like web search), and knowledge bases (uploaded TXT, MD, PDF files).
- Text-to-Speech (TTS): Play back AI responses using OpenAI's `gpt-4o-mini-tts` model, with customizable instructions (tone, speed) via the settings panel.
- Image Generation: Generate images using DALL-E 3 based on user prompts when `gpt-4o` is the effective model.
- Multi-modal Input: Supports text, image uploads (JPG/PNG for `gpt-4o`), and per-message file attachments (TXT/MD/PDF).
Note: The application previously stored data only in localStorage; it now also offers cloud storage through Supabase. When you are logged in, API keys are stored in the Supabase database rather than only in the browser, although they are still sent from the browser directly to each AI provider.
Features:
- Multi-Model Support:
  - OpenAI Models: Routes requests to OpenAI's `/v1/responses` API for `gpt-4o` or the `/v1/chat/completions` API for `o3-mini`.
  - Google Gemini Models: Supports Gemini models through the Google AI API.
  - xAI Models: Supports Grok models through the X.AI API.
  - Handles streaming responses from all supported API endpoints.
  - Selects the appropriate API based on the chosen model.
- User Authentication & Cloud Storage:
  - User Accounts: Create accounts, sign in, and access your data from any device.
  - Secure API Key Storage: API keys are stored in the Supabase database when logged in.
  - Cross-Device Access: Access your chats, custom GPTs, and settings from anywhere.
  - Offline Fallback: Falls back to localStorage when offline or not logged in.
- Input Methods:
  - Standard text input with automatic resizing.
  - Image upload (JPG/PNG) for `gpt-4o` prompts (via image icon).
  - Per-message file attachments (TXT, MD, PDF) for context injection (via paperclip icon).
- Output & Interaction:
  - Streaming AI responses displayed incrementally.
  - Markdown rendering for formatted AI responses (using `marked.js`).
  - Copy-to-clipboard action for AI message text (or image URL).
  - Regenerate response action for AI messages.
  - Text-to-Speech (TTS): Play back AI responses using `gpt-4o-mini-tts` via a speaker icon on each message. TTS instructions (e.g., tone, speed) can be configured in General Settings. Manages audio playback state (loading, playing, stopped).
  - Image Generation Display: Renders images generated by DALL-E 3 directly in the chat, along with any revised prompt used by the model.
- Custom GPT Management:
  - Create, edit, and delete Custom GPT configurations via a dedicated modal accessed from the sidebar.
  - Define: Name, Description, Instructions (System Prompt).
  - Enable Capabilities: Web Search toggle (requires `gpt-4o`).
  - Upload Knowledge Files: Attach TXT, MD, or PDF files (content read, validated, and stored in the database when logged in).
  - Activate Custom GPTs from the sidebar list to tailor conversations. The active GPT's name is displayed in the header.
- Persistence (Supabase & localStorage):
  - Cloud Storage: When logged in, all data is stored in the Supabase database.
  - Local Fallback: When offline or not logged in, falls back to localStorage.
  - Saves/loads individual chat conversations.
  - Automatically saves the current chat when switching contexts (new chat, load chat).
  - Dynamically lists saved chats in the sidebar for easy access and deletion.
  - Persists Custom GPT configurations, including instructions and knowledge file content.
- Settings & Configuration:
  - General Settings Modal: Configure API keys for multiple providers (OpenAI, Google Gemini, xAI).
  - Set the default model used when no Custom GPT is active.
  - Configure custom instructions for TTS playback.
  - Settings are synchronized across devices when logged in.
- UI Components:
  - Responsive layout with collapsible sidebar.
  - Main chat interface with user/AI message bubbles.
  - Authentication modal for sign up, sign in, and password reset.
  - Separate modals for General Settings and Custom GPT creation/editing.
  - Toast notifications for user feedback (success, error, info).
  - Input toolbar with buttons/toggles for image upload, file attachment, web search, and image generation mode. Button states update based on the effective model.
  - Header displays the default model selector (dropdown) and the name of the currently active Custom GPT (if any).
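The model-based routing described under Multi-Model Support amounts to a lookup from the selected model to a provider endpoint. The following is a minimal sketch, not the project's `api.js`: the function name `resolveEndpoint` and the prefix rules (`gemini-`, `grok-`) are assumptions, while the URLs are the providers' public REST endpoints (Gemini's `streamGenerateContent` with `alt=sse`, and xAI's OpenAI-compatible chat completions).

```javascript
// Hypothetical sketch of model-based routing; not the project's api.js.
// The URLs are the providers' public REST endpoints; the prefix rules are assumptions.
const OPENAI_BASE = "https://api.openai.com/v1";
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta";
const XAI_BASE = "https://api.x.ai/v1";

/**
 * Maps a model ID to the provider and streaming endpoint used to call it.
 * @param {string} model e.g. "gpt-4o", "o3-mini", "gemini-1.5-pro", "grok-2"
 * @returns {{provider: string, url: string}}
 */
function resolveEndpoint(model) {
  if (model === "gpt-4o") {
    // gpt-4o goes through the Responses API.
    return { provider: "openai", url: `${OPENAI_BASE}/responses` };
  }
  if (model === "o3-mini") {
    // o3-mini goes through the Chat Completions API.
    return { provider: "openai", url: `${OPENAI_BASE}/chat/completions` };
  }
  if (model.startsWith("gemini-")) {
    // Gemini streams via :streamGenerateContent; alt=sse asks for server-sent events.
    const id = encodeURIComponent(model);
    return { provider: "gemini", url: `${GEMINI_BASE}/models/${id}:streamGenerateContent?alt=sse` };
  }
  if (model.startsWith("grok-")) {
    // xAI offers an OpenAI-compatible Chat Completions endpoint.
    return { provider: "xai", url: `${XAI_BASE}/chat/completions` };
  }
  throw new Error(`Unsupported model: ${model}`);
}
```

For example, `resolveEndpoint("o3-mini").url` yields `https://api.openai.com/v1/chat/completions`. Keeping the mapping in one function means adding a provider touches a single place.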
Tech stack:
- Frontend: HTML5, CSS3, JavaScript (Vanilla ES Modules)
- Markdown Rendering: `marked.js` (via CDN)
- PDF Text Extraction: `pdf.js` (via CDN)
- APIs:
  - OpenAI API (`/v1/chat/completions`, `/v1/responses`, `/v1/audio/speech`, `/v1/images/generations`)
  - Google AI API (Gemini models)
  - X.AI API (Grok models)
- Backend & Authentication: Supabase (PostgreSQL database, Auth, Row Level Security)
- Storage:
  - Primary: Supabase database (when logged in)
  - Fallback: Browser `localStorage` (when offline or not logged in)
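PDF text extraction with `pdf.js` (used for knowledge files and per-message attachments) typically loads the document, walks its pages, and joins each page's text items. This sketch assumes the pdf.js library object is passed in (for example the CDN global `pdfjsLib`), that `GlobalWorkerOptions.workerSrc` has already been set, and that pages are joined with blank lines; the function name `extractPdfText` is hypothetical and the project's `utils.js` may differ.

```javascript
// Sketch of PDF text extraction with pdf.js. The library object is passed in so the
// function works with the CDN global (window.pdfjsLib) or a module import; setting
// pdfjsLib.GlobalWorkerOptions.workerSrc is assumed to be done by the caller.
async function extractPdfText(pdfjsLib, arrayBuffer) {
  const loadingTask = pdfjsLib.getDocument({ data: new Uint8Array(arrayBuffer) });
  // Rejects (with a PasswordException) for PDFs that require a password to open.
  const pdf = await loadingTask.promise;
  const pages = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber); // pdf.js page numbers start at 1
    const content = await page.getTextContent();
    // Items without a string `str` are marked-content markers, not text runs.
    const text = content.items
      .filter((item) => typeof item.str === "string")
      .map((item) => item.str)
      .join(" ");
    pages.push(text);
  }
  return pages.join("\n\n");
}
```

In the browser this would be called with the bytes from `file.arrayBuffer()`. Because encrypted files make `loadingTask.promise` reject, the caller can catch that case and show a clear error instead of an empty result.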
Project structure:
.
│
├── index.html # Main HTML file, structure of the page
│
├── css/
│ ├── base.css # Base styles, resets, CSS variables
│ ├── components.css # Styles for individual UI components (buttons, messages, modals, etc.)
│ └── layout.css # Styles for page structure (header, sidebar, chat area, input area)
│
├── js/
│ ├── main.js # Main application entry point, initialization logic
│ ├── api.js # Handles API calls (Chat, Responses, TTS, Image Gen) & routing logic
│ ├── state.js # Manages active session state (settings, history, active GPT, toggles, etc.)
│ ├── chatStore.js # Manages persistent CHAT storage (localStorage)
│ ├── parser.js # Parses Markdown using 'marked', handles streaming text accumulation
│ ├── utils.js # Utility functions (escapeHTML, copy, base64, file reading, PDF processing, ID generation)
│ │
│ ├── components/ # Modules for specific UI parts
│ │ ├── header.js # Header logic (default model dropdown, active GPT display, settings button)
│ │ ├── chatInput.js # Input area, image/file upload, toolbar logic (search, image gen toggles)
│ │ ├── messageList.js # Message rendering, streaming updates, typing indicator, TTS button logic, history rendering
│ │ ├── notification.js # Displaying temporary toast notifications
│ │ ├── settingsModal.js # General settings modal logic (API key, default model, TTS instructions)
│ │ ├── sidebar.js # Sidebar logic (visibility, chat & GPT lists, loading/deleting chats/GPTs, new chat)
│ │ └── welcomeScreen.js # Initial welcome screen logic, example prompts
│ │
│ └── customGpt/ # Modules for Custom GPT functionality
│ ├── gptStore.js # Manages persistent CUSTOM GPT CONFIG storage (localStorage), handles size limits
│ ├── knowledgeHandler.js # Handles processing/validation of knowledge files for the creator modal
│ └── creatorScreen.js # Logic for the Custom GPT creator/editor modal UI
│
└── README.md # This file
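`gptStore.js` is described as handling `localStorage` size limits. Below is a minimal sketch of such a check, assuming the roughly 5MB budget is approximated as 2 bytes per UTF-16 code unit of every key and value (actual browser quotas vary), with a `QuotaExceededError` catch as a backstop. The names `estimateStorageBytes`, `fitsInStorage`, and `saveConfig` are hypothetical.

```javascript
// Sketch of a pre-save size check for localStorage, in the spirit of gptStore.js.
// Assumption: ~5 MB budget, estimated as 2 bytes per UTF-16 code unit of keys + values.
const DEFAULT_QUOTA_BYTES = 5 * 1024 * 1024;

/** Estimates the bytes currently used by all keys and values in a Storage object. */
function estimateStorageBytes(storage) {
  let bytes = 0;
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    const value = storage.getItem(key) ?? "";
    bytes += (key.length + value.length) * 2;
  }
  return bytes;
}

/** Returns true if writing `value` (JSON-serialized) under `key` stays within the quota. */
function fitsInStorage(storage, key, value, quotaBytes = DEFAULT_QUOTA_BYTES) {
  const serialized = JSON.stringify(value);
  const existing = storage.getItem(key);
  // Overwriting a key frees its old value, so exclude it from the current total.
  const freed = existing === null ? 0 : (key.length + existing.length) * 2;
  const current = estimateStorageBytes(storage) - freed;
  return current + (key.length + serialized.length) * 2 <= quotaBytes;
}

/** Saves a Custom GPT config, reporting why a save was refused. */
function saveConfig(storage, key, config) {
  if (!fitsInStorage(storage, key, config)) return { ok: false, reason: "estimated-quota" };
  try {
    storage.setItem(key, JSON.stringify(config));
    return { ok: true };
  } catch (err) {
    // Browsers throw a DOMException named "QuotaExceededError" when the real limit is hit.
    if (err && err.name === "QuotaExceededError") return { ok: false, reason: "quota-exceeded" };
    throw err;
  }
}
```

`storage` can be `window.localStorage` in the browser; any object with `length`, `key()`, `getItem()`, and `setItem()` works, which keeps the check testable outside a browser.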
How it works:
- API Routing (`api.js`): Determines whether to call the Chat Completions (`o3-mini`), Responses (`gpt-4o`), Image Generations (DALL-E 3), or TTS (`gpt-4o-mini-tts`) API based on the effective model and user actions. If a Custom GPT is active, its instructions, knowledge content, and capability settings are retrieved from `state.js` and injected into the appropriate API request payload. Handles streaming for chat/responses.
- State Management (`state.js`): Holds the current session's data: API key, default model setting, TTS instructions, current chat history, active chat ID (from `chatStore`), active Custom GPT configuration (loaded from `gptStore`), staged image/files for the next message, web search toggle state, image generation mode state, last generated image URL, and the `previous_response_id` for ongoing `/v1/responses` conversations.
- Persistence (`chatStore.js`, `gptStore.js`):
  - `chatStore.js`: Handles saving/loading/deleting individual chat histories to/from `localStorage`. Manages the list of chat metadata.
  - `gptStore.js`: Handles saving/loading/deleting Custom GPT configurations. Stores the entire config, including name, description, instructions, capabilities, and knowledge file content, in `localStorage`. Includes checks to prevent exceeding typical `localStorage` size limits (around 5MB). Manages the list of config metadata.
- Chat History (`chatStore.js`, `sidebar.js`, `messageList.js`): Regular chats are saved automatically when switching contexts (new chat, load chat). The sidebar lists saved chats, allowing users to load or delete them. `messageList.js` renders the history from `state.js` upon loading.
- Custom GPTs (`customGpt/` modules, `sidebar.js`, `state.js`, `api.js`, `header.js`):
  - `creatorScreen.js`: Manages the UI modal for creating/editing configs (name, description, instructions, capabilities, knowledge file list). Uses `knowledgeHandler.js` for file processing and `gptStore.js` for saving/updating.
  - `knowledgeHandler.js`: Processes uploaded files (TXT, MD, PDF) within the creator modal, validating type/size, reading content, and returning results to `creatorScreen.js`.
  - `gptStore.js`: Saves/loads/deletes the complete configuration (including file content) in `localStorage`.
  - `sidebar.js`: Lists available Custom GPTs from `gptStore`. Handles activation (loads the config into `state.js` and clears the chat), edit (opens `creatorScreen.js`), and delete actions.
  - `state.js`: Stores the currently active Custom GPT configuration object.
  - `api.js`: Injects the active config's instructions and knowledge content into the prompt sent to the OpenAI API.
  - `header.js`: Updates the header display to show the name of the active Custom GPT.
- Text-to-Speech (`messageList.js`, `api.js`, `state.js`):
  - A "Listen" button (speaker icon) appears on completed AI text messages.
  - `messageList.js::handleListenClick`: Stops previous audio, sets the loading state, and retrieves custom TTS instructions from `state.js`.
  - Calls `api.js::fetchSpeech`, passing the message text and instructions to the `/v1/audio/speech` endpoint (using `gpt-4o-mini-tts`).
  - Receives an audio Blob, creates an object URL, and plays it using the browser's `Audio` API.
  - Manages playback state (loading, playing, error, ended) and cleans up resources (`stopCurrentAudio`).
- Image Generation (`chatInput.js`, `api.js`, `state.js`, `messageList.js`):
  - `chatInput.js`: Toggles image generation mode via a button and updates the input placeholder. On send, checks the mode and prompt.
  - `api.js::fetchImageGeneration`: Calls `/v1/images/generations` (DALL-E 3) with the prompt.
  - `messageList.js`: Renders the generated image and any revised prompt in an AI message bubble. Disables regenerate/listen actions for image messages.
  - `state.js`: Stores the `isImageGenerationMode` flag and the `lastGeneratedImageUrl` (used by `api.js` if the next user message should reference the generated image).
- Input Handling (`chatInput.js`): Manages the text area, image preview/removal (for user uploads), per-message file attachment/preview/removal, and the state/availability of toolbar buttons (Web Search, Image Generation, Image Upload, File Add) based on the effective model (default or Custom GPT). Calls `api.routeApiCall` on send.
- Streaming & Parsing (`api.js`, `parser.js`, `messageList.js`):
  - `api.js`: Reads streaming responses from both the Chat Completions and Responses APIs.
  - `parser.js`: Accumulates raw text (`accumulateChunkAndGetEscaped`), returning escaped chunks for immediate display. Provides the final parsed HTML (`parseFinalHtml`) using `marked.js`.
  - `messageList.js`: Creates the AI message container, appends escaped chunks via `appendAIMessageContent`, finalizes it with parsed HTML via `finalizeAIMessageContent`, and sets up action buttons.
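The streaming path described above (read chunks, show escaped text immediately, render Markdown once at the end) can be sketched as follows. It assumes server-sent events with one JSON payload per `data:` line and extracts text deltas from the Chat Completions format (`choices[0].delta.content`, also used by xAI), Responses API `response.output_text.delta` events, and Gemini `alt=sse` chunks (`candidates[0].content.parts[0].text`). The function names are illustrative, not the project's `api.js`/`parser.js` exports.

```javascript
// Sketch of streaming display: read server-sent events, emit HTML-escaped text deltas
// for immediate display, and return the raw text for a final Markdown render.
// Function names are illustrative, not the project's api.js/parser.js exports.

function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

/** Returns the text delta carried by one parsed SSE payload, or "" if there is none. */
function textDeltaFrom(event) {
  // Responses API: {"type":"response.output_text.delta","delta":"..."}
  if (event.type === "response.output_text.delta") return event.delta ?? "";
  // Chat Completions (OpenAI, xAI): {"choices":[{"delta":{"content":"..."}}]}
  const chat = event.choices?.[0]?.delta?.content;
  if (typeof chat === "string") return chat;
  // Gemini with alt=sse: {"candidates":[{"content":{"parts":[{"text":"..."}]}}]}
  return event.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}

/**
 * Reads an SSE byte stream (e.g. fetch's response.body), calling onChunk with each
 * escaped delta. Resolves with the full unescaped text when the stream ends or sends [DONE].
 */
async function readTextStream(body, onChunk) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let fullText = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // the last piece may be an incomplete line; keep it for the next read
    for (const line of lines) {
      if (!line.startsWith("data:")) continue; // skip "event:" lines, comments, blank separators
      const data = line.slice("data:".length).trim();
      if (data === "[DONE]") return fullText;
      const delta = textDeltaFrom(JSON.parse(data));
      if (delta) {
        fullText += delta;
        onChunk(escapeHtml(delta));
      }
    }
  }
  return fullText;
}
```

When the promise resolves, the full raw text could be passed once to `marked.parse` to produce the final HTML, mirroring the split between `accumulateChunkAndGetEscaped` and `parseFinalHtml`.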
Setup:
- Clone or download this repository.
- Serve the project directory with a local web server and open `index.html` in a modern web browser.
  - Note: Due to browser security policies (CORS), ES Modules and the `pdf.js` worker cannot be loaded from local files (`file://`), so you must serve the files over HTTP. Many tools can do this, e.g.:
    - Using Python: `python -m http.server 8000` (or `python3 ...`) in the project directory.
    - Using Node.js: Install `serve` with `npm install -g serve`, then run `serve .` in the project directory.
    - Using the VS Code Live Server extension.
- Access the application via `http://localhost:8000` (or the appropriate port configured by your server).
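If neither Python nor the `serve` package is available, a small Node.js script can serve the project. This is a sketch under stated assumptions (Node 18+, no dependencies; the names `createStaticServer`, `resolveSafePath`, and `contentTypeFor` are hypothetical). It sets a JavaScript MIME type for `.js` files, since browsers refuse to run module scripts served with a non-JavaScript type, and rejects paths that escape the project root.

```javascript
// Minimal zero-dependency static server for local development (Node 18+).
// Module scripts need a JavaScript MIME type, so Content-Type is set from the extension.
const http = require("node:http");
const fs = require("node:fs/promises");
const path = require("node:path");

const MIME_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".mjs": "text/javascript; charset=utf-8",
  ".json": "application/json; charset=utf-8",
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".svg": "image/svg+xml",
};

function contentTypeFor(filePath) {
  return MIME_TYPES[path.extname(filePath).toLowerCase()] ?? "application/octet-stream";
}

/** Resolves a request URL inside rootDir; returns null for malformed or escaping paths. */
function resolveSafePath(rootDir, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(requestUrl.split("?")[0]);
  } catch {
    return null; // malformed percent-encoding
  }
  const relative = pathname === "/" ? "index.html" : pathname.replace(/^\/+/, "");
  const fullPath = path.resolve(rootDir, relative);
  return fullPath.startsWith(rootDir + path.sep) ? fullPath : null;
}

function createStaticServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer(async (req, res) => {
    const filePath = resolveSafePath(root, req.url);
    if (!filePath) {
      res.writeHead(403).end("Forbidden");
      return;
    }
    try {
      const body = await fs.readFile(filePath);
      res.writeHead(200, { "Content-Type": contentTypeFor(filePath) }).end(body);
    } catch {
      res.writeHead(404).end("Not found"); // missing file, or a directory path
    }
  });
}
```

To run it, save it as, for example, `server.js` in the project directory, add `createStaticServer(".").listen(8000);` at the end, run `node server.js`, and open `http://localhost:8000`.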
Usage:
- Create an Account or Sign In:
  - Click the "Log in" button in the sidebar.
  - Create a new account or sign in with your existing credentials.
  - Verify your email if creating a new account.
- Open Settings: Click the "Settings" button in the sidebar footer or the settings icon in the header.
- Enter API Keys:
  - Paste your OpenAI API key into the designated field.
  - (Optional) Add a Google Gemini API key to use Gemini models.
  - (Optional) Add an X.AI API key to use Grok models.
- Set Defaults:
  - Choose your preferred default model from the available options.
  - (Optional) Enter default instructions for Text-to-Speech playback (e.g., "Speak clearly and calmly.").
- Save Settings: Your keys and preferences are stored in the Supabase database when logged in, or locally in the browser when not logged in.
- Start Chatting: Type messages in the input box and press Enter or click the send button.
- Use Features:
  - Model Selection: Choose from OpenAI, Gemini, or Grok models using the dropdown in the header.
  - Image Upload: Click the image icon (requires a compatible model such as `gpt-4o`).
  - File Attachment: Click the paperclip icon to attach TXT/MD/PDF files to the next message.
  - Web Search: Click the "Search" button to toggle web search for the next message (requires a compatible model).
  - Image Generation: Click the "Generate Image" button to toggle image generation mode (requires a compatible model), then enter a prompt and send.
  - TTS: Click the speaker icon on a completed AI text response to hear it read aloud using your configured instructions.
- Manage Chats: Use the sidebar ("Chats" section) to start new chats or load/delete previous conversations. Your chats are synchronized across devices when logged in.
- Manage Custom GPTs:
  - Create: Click the "+" button in the "Custom GPTs" sidebar section to open the creator modal.
  - Configure: Define the Name, Description, and Instructions. Toggle Capabilities (Web Search). Upload Knowledge Files (TXT, MD, PDF).
  - Save: Click "Save". The configuration (including file content) is stored in the database when logged in.
  - Activate: Click a saved Custom GPT in the sidebar list. The header will update, and the chat context will reset for this GPT.
  - Edit/Delete: Hover over a Custom GPT in the list to reveal Edit (pencil) and Delete (trash) buttons. Edit opens the creator modal pre-filled. Delete prompts for confirmation.
- Access Across Devices: When logged in, all your data (chats, custom GPTs, settings) is available on any device where you sign in to your account.
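The TTS feature sends the message text and optional instructions to `/v1/audio/speech` and plays the returned audio. Below is a minimal sketch, assuming the `alloy` voice and MP3 output (neither is specified above); `buildSpeechRequest`, `fetchSpeechBlob`, and `playBlob` are hypothetical names rather than the project's `fetchSpeech`/`handleListenClick`, and `fetchImpl` is injectable so the request logic can be tested outside a browser.

```javascript
// Sketch of the TTS flow; names, the "alloy" voice, and MP3 output are assumptions.
const SPEECH_URL = "https://api.openai.com/v1/audio/speech";

/** Builds fetch() options for a gpt-4o-mini-tts request. */
function buildSpeechRequest(apiKey, text, instructions) {
  const body = { model: "gpt-4o-mini-tts", input: text, voice: "alloy", response_format: "mp3" };
  if (instructions) body.instructions = instructions; // e.g. "Speak clearly and calmly."
  return {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

/** Requests speech and resolves with the audio Blob; fetchImpl defaults to the global fetch. */
async function fetchSpeechBlob(apiKey, text, instructions, fetchImpl = fetch) {
  const response = await fetchImpl(SPEECH_URL, buildSpeechRequest(apiKey, text, instructions));
  if (!response.ok) throw new Error(`TTS request failed with status ${response.status}`);
  return response.blob();
}

// Browser-only playback: one clip at a time, object URLs revoked after use.
let currentAudio = null;

function stopCurrentAudio() {
  if (!currentAudio) return;
  currentAudio.pause();
  URL.revokeObjectURL(currentAudio.src);
  currentAudio = null;
}

function playBlob(blob) {
  stopCurrentAudio();
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  const cleanup = () => {
    URL.revokeObjectURL(url);
    if (currentAudio === audio) currentAudio = null;
  };
  audio.addEventListener("ended", cleanup, { once: true });
  audio.addEventListener("error", cleanup, { once: true });
  currentAudio = audio;
  return audio.play(); // a Promise that rejects if the browser blocks playback
}
```

In the browser, `playBlob(await fetchSpeechBlob(key, text, instructions))` starts playback; calling it again stops the previous clip first, and the object URL is revoked when playback ends, fails, or is stopped.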
Configuration:
- API Keys:
  - OpenAI API Key: Required for OpenAI models. Entered in the General Settings modal.
  - Google Gemini API Key: Optional. Required only if you want to use Gemini models.
  - X.AI API Key: Optional. Required only if you want to use Grok models.
  - All API keys are stored in the Supabase database when logged in, or in localStorage when not logged in.
- Default Model: Selected in the General Settings modal or the header dropdown. This is the fallback when no Custom GPT is active.
- TTS Instructions: Optional. Entered in the General Settings modal. Affects voice characteristics such as tone and speed.
- Authentication: User accounts are managed through Supabase Authentication. Email verification is required for new accounts.
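The rule that data goes to Supabase when logged in and to localStorage otherwise (or when Supabase is unavailable) can be expressed as a small adapter. This sketch assumes a hypothetical `cloud` object with async `save`/`load` methods; with supabase-js v2 it might wrap a table `upsert`/`select`, and `isLoggedIn` could check `(await supabase.auth.getSession()).data.session`. All names here are illustrative.

```javascript
// Sketch of cloud-first persistence with a localStorage fallback.
// `cloud` is a hypothetical adapter (e.g. wrapping supabase.from(...).upsert/select);
// `local` is anything with getItem/setItem, such as window.localStorage.
function createPersistence({ cloud, local, isLoggedIn, isOnline = () => true }) {
  // Cloud storage is used only when online and a user session exists.
  const canUseCloud = async () => isOnline() && (await isLoggedIn());

  return {
    /** Saves a JSON-serializable value; resolves with "cloud" or "local". */
    async save(key, value) {
      if (await canUseCloud()) {
        try {
          await cloud.save(key, value);
          return "cloud";
        } catch (err) {
          console.warn("Cloud save failed, falling back to localStorage:", err);
        }
      }
      local.setItem(key, JSON.stringify(value));
      return "local";
    },

    /** Loads a value from the cloud when possible, otherwise from localStorage. */
    async load(key) {
      if (await canUseCloud()) {
        try {
          return await cloud.load(key);
        } catch (err) {
          console.warn("Cloud load failed, falling back to localStorage:", err);
        }
      }
      const raw = local.getItem(key);
      return raw === null ? null : JSON.parse(raw);
    },
  };
}
```

The return value of `save` (`"cloud"` or `"local"`) lets the UI show a toast when data was only saved locally. In the browser, `isOnline` could be `() => navigator.onLine`.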
Notes and limitations:
- AI Voice Disclosure: The application includes notes stating that AI voices are generated by OpenAI, as required by OpenAI's policy.
- API Key Security: Although API keys are stored in the Supabase database when logged in, they are still loaded into the browser and sent directly from the client to the respective AI providers, where they can be inspected. For production environments, consider routing requests through a proxy server that holds the keys.
- PDF Processing: Relies on the `pdf.js` library and its worker script loaded from a CDN. Network issues or CDN changes could affect PDF reading. Files must be served over HTTP(S) because browsers restrict loading workers from local files. Password-protected PDFs are not supported.
- Model Compatibility: Not all features are available with all models. For example, image upload and generation are only available with certain OpenAI models.
- Basic Error Handling: While some API errors are caught, complex issues or unexpected API responses might not be handled gracefully.
- Supabase Dependency: The application depends on Supabase for authentication and cloud storage. If Supabase is unavailable, the application falls back to localStorage, but some features (such as cross-device sync) may be limited.
- Features Not Implemented: Some buttons, such as "Deep Research", "Voice Input", and "Dark Mode", are placeholders and show an "unimplemented" notification.
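Following the API key security note, a proxy keeps provider keys on a server and lets the browser call it instead of the providers. Below is a minimal sketch, assuming Node 18+ (global `fetch`), an `OPENAI_API_KEY` environment variable, and a hypothetical `/api/openai/<endpoint>` path scheme restricted to the four OpenAI endpoints this project uses. It relays the response body as it arrives so streamed responses stay incremental; authenticating the caller (for example by verifying the Supabase session token) and CORS headers are left out.

```javascript
// Sketch of a server-side proxy that keeps the OpenAI key off the client (Node 18+).
// The /api/openai/<endpoint> scheme and OPENAI_API_KEY variable are assumptions.
const http = require("node:http");

// Only the OpenAI endpoints this project calls are forwarded.
const ALLOWED_ENDPOINTS = new Set(["responses", "chat/completions", "audio/speech", "images/generations"]);

/** Maps "/api/openai/<endpoint>" to the upstream URL, or null if not allowlisted. */
function upstreamUrlFor(pathname) {
  const prefix = "/api/openai/";
  if (!pathname.startsWith(prefix)) return null;
  const endpoint = pathname.slice(prefix.length);
  return ALLOWED_ENDPOINTS.has(endpoint) ? `https://api.openai.com/v1/${endpoint}` : null;
}

function createProxyServer(apiKey = process.env.OPENAI_API_KEY) {
  if (!apiKey) throw new Error("OPENAI_API_KEY is not set");
  return http.createServer(async (req, res) => {
    const upstreamUrl = upstreamUrlFor(new URL(req.url, "http://localhost").pathname);
    if (req.method !== "POST" || !upstreamUrl) {
      res.writeHead(404).end();
      return;
    }
    // Buffer the JSON request body sent by the browser.
    const chunks = [];
    for await (const chunk of req) chunks.push(chunk);
    try {
      const upstream = await fetch(upstreamUrl, {
        method: "POST",
        headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
        body: Buffer.concat(chunks),
      });
      res.writeHead(upstream.status, {
        "Content-Type": upstream.headers.get("content-type") ?? "application/octet-stream",
      });
      // Relay chunks as they arrive so SSE streams and audio stay incremental.
      if (upstream.body) {
        for await (const chunk of upstream.body) res.write(chunk);
      }
      res.end();
    } catch {
      if (!res.headersSent) res.writeHead(502);
      res.end("Upstream request failed");
    }
  });
}
```

Start it with `createProxyServer().listen(3000);` and change the client to call `/api/openai/...` on the proxy instead of `https://api.openai.com/v1/...`.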
Future enhancements:
- Add support for more AI models and providers.
- Enhance the authentication system with social login options (Google, GitHub, etc.).
- Implement data encryption for API keys stored in the database.
- Add support for sharing custom GPTs between users.
- Implement collaborative chat sessions.
- Add Chat Folders for better organization.
- Implement UI Themes (e.g., Dark Mode toggle).
- Allow selection of different TTS voices.
- Implement Voice Input using browser SpeechRecognition API.
- More robust error handling and user feedback.
- Option to export/import Custom GPT configurations.