Dynamic Transcription Model Support, Settings Fix & Improved Clipboard Handling #1
Open
jpierzchala wants to merge 22 commits into TomFrankly:main from
Conversation
• Added a new static method safe_open_clipboard that attempts to open the clipboard repeatedly (with configurable retries and delay) to handle transient access issues.
• Replaced direct win32clipboard.OpenClipboard calls in _paste_with_clipboard_preservation with safe_open_clipboard to safely preserve and later restore clipboard data.
• Updated win32clipboard.SetClipboardText to use the CF_UNICODETEXT flag to ensure proper Unicode formatting.
• Wrapped the clipboard closing calls in a try/except block to guarantee cleanup even if errors occur.
• Added error messages if the clipboard cannot be opened for preservation or restoration.

These changes improve the robustness and reliability of clipboard operations during simulated input actions.
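A rough sketch of how such a retry helper could look with pywin32 (the retry count, delay, and error handling here are illustrative assumptions, not the PR's exact implementation; in the PR the helper is a static method rather than a module-level function):

```python
import time

import win32clipboard  # pywin32


def safe_open_clipboard(retries: int = 5, delay: float = 0.05) -> bool:
    """Try to open the clipboard, retrying on transient access errors."""
    for _ in range(retries):
        try:
            win32clipboard.OpenClipboard()
            return True
        except Exception:
            # Another process may briefly hold the clipboard; back off and retry.
            time.sleep(delay)
    return False


# Usage sketch for the paste-with-preservation flow:
if safe_open_clipboard():
    try:
        win32clipboard.EmptyClipboard()
        # CF_UNICODETEXT keeps non-ASCII text intact.
        win32clipboard.SetClipboardText("text to paste", win32clipboard.CF_UNICODETEXT)
    finally:
        try:
            win32clipboard.CloseClipboard()
        except Exception:
            pass  # guarantee cleanup even if closing fails
else:
    print("Could not open clipboard for preservation.")
```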
…tion Body: Previously, when reading a file in the settings window, the file's contents were appended to the existing text in the text editor, unexpectedly modifying the displayed text. With this change, the editor retains its current text without appending content from the file; file contents are only appended for the API call.
…or gpt-4o-transcribe Body:
• Update config_schema.yaml to include a new transcription model option, gpt-4o-transcribe.
• Modify transcription.py to obtain the model from api_options instead of hardcoding "whisper-1".
• Update the logging message to reflect the chosen model, ensuring transparency during API requests.

These changes enable dynamic selection of OpenAI transcription models based on the configuration.
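A minimal sketch of the idea (the api_options lookup, fallback, and client call shown here are illustrative assumptions, not the repository's exact code):

```python
from openai import OpenAI


def transcribe_audio(audio_path: str, api_options: dict) -> str:
    """Transcribe a file with whichever model the configuration selects."""
    client = OpenAI()
    # Read the model from api_options instead of hardcoding "whisper-1".
    model = api_options.get("model", "whisper-1")
    print(f"Transcribing with OpenAI model: {model}")  # log the chosen model
    with open(audio_path, "rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    return response.text
```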
…transcription-with-fallback Retry transcription on failure
This commit introduces a retry mechanism for the audio transcription process. The ResultThread will now attempt to transcribe the audio up to 3 times if the initial attempt fails or returns an empty result. This improves the robustness of the transcription process by handling intermittent errors or empty transcriptions.

Key changes:
- Added a loop to retry transcription up to 3 times.
- Included a 1-second pause between retries.
- Logged detailed information for each transcription attempt.
- If all attempts fail, the audio is saved and a transcription_failed signal is emitted.
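A simplified sketch of the retry loop described above (the function names and the failure callback are placeholders, not the actual ResultThread code):

```python
import time


def transcribe_with_retry(transcribe, audio_data, on_all_attempts_failed,
                          max_attempts: int = 3, pause_seconds: float = 1.0) -> str:
    """Retry transcription on errors or empty results, pausing between attempts."""
    result = ""
    for attempt in range(1, max_attempts + 1):
        print(f"Transcription attempt {attempt}/{max_attempts}")
        try:
            result = transcribe(audio_data)
        except Exception as exc:
            print(f"Attempt {attempt} raised an error: {exc}")
            result = ""
        if result and result.strip():
            return result
        if attempt < max_attempts:
            time.sleep(pause_seconds)
    # Every attempt failed or returned an empty result: save the audio and
    # signal the failure (the transcription_failed signal in the real ResultThread).
    on_all_attempts_failed(audio_data)
    return result
```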
* Initial plan

* Enhance failed audio saving with validation and detailed logging
  Co-authored-by: jpierzchala <[email protected]>

* Add implementation summary and complete failed audio saving improvements
  Co-authored-by: jpierzchala <[email protected]>

* Enhance failed audio saving validation and add comprehensive unit tests
  Co-authored-by: jpierzchala <[email protected]>

* fix: resolve test failures in audio saving and result thread tests
  - Fix ConfigManager mock in test_result_thread.py by adding missing initialize() method
  - Improve module mocking and cleanup in test_failed_audio_simple.py to prevent import conflicts
  - Add proper module state management with backup/restore in tests
  - Ensure clean import state for each test run to avoid cross-test contamination

  All 7 tests now pass successfully. The fixes address:
  - AttributeError: MockConfigManager missing 'initialize' attribute
  - Message capture failures due to improper module mocking
  - Module import system conflicts between tests

  Tests now properly validate audio data validation, transcription retry logic, error handling, and failed audio file saving functionality.

* feat: add comprehensive AI agent testing requirements and documentation
  - Create AGENTS.md with mandatory test execution requirements for AI agents
  - Add testing section to README.md Contributing guidelines
  - Configure VS Code pytest integration in .vscode/settings.json
  - Fix existing test failures in audio validation and result thread tests

  Key changes:
  - AGENTS.md: Establishes critical requirement that AI agents MUST run tests before completion
  - README.md: Adds "Running Tests" section with clear pytest commands and AI agent notice
  - VS Code: Enables automatic test discovery and pytest integration
  - Tests: Resolve ConfigManager mock issues and module import conflicts

  All 7 tests now pass successfully. This ensures AI agents (GitHub Copilot, OpenAI Codex, VS Code Copilot Agent) will automatically validate changes before completing work, preventing regressions in audio processing, transcription retry logic, and error handling.

  Commands for AI agents:
  - pytest tests/ -v (run all tests)
  - All tests must pass before work completion

* Fix audio saving for failed transcriptions
  - Modified condition in result_thread.py to check both empty and whitespace-only results
  - Changed from `if not result:` to `if not result or not result.strip():`
  - Replaced debug print statements with proper console logging in transcription.py
  - Ensures failed audio files are saved when transcription returns empty strings after post-processing
  - Addresses issue where quota exceeded errors weren't triggering audio file saves

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
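For illustration, the difference the stricter result check makes on a whitespace-only transcription:

```python
result = "  \n"             # whitespace-only transcription after post-processing
print(not result)           # False -> the old `if not result:` check misses it
print(not result.strip())   # True  -> the new check treats it as a failed result
```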
* Initial plan

* Add Azure OpenAI Whisper support and English language requirement
  Co-authored-by: jpierzchala <[email protected]>

* Update AGENTS.md with comprehensive dependency installation instructions
  Co-authored-by: jpierzchala <[email protected]>

* Remove implementation summary leftover from previous PR

* fix: add Azure OpenAI API key support in keyring system
  - Fix missing azure_openai_api_key handling in settings_window.py
  - Add Azure OpenAI key loading from keyring when displaying settings
  - Add Azure OpenAI key saving to keyring when saving settings
  - Add Azure OpenAI key removal from config.yaml after keyring save
  - Create migrate_azure_key.py script to migrate existing keys from config to keyring
  - Resolve "Azure OpenAI API key not found in keyring" error

  Fixes issue where Azure OpenAI transcription failed despite API key being set in options

* feat(llm): add Azure OpenAI support for LLM post-processing
  - Add azure_openai as a new API type option in config schema
  - Implement Azure OpenAI LLM processor with endpoint and deployment support
  - Add configuration fields for Azure OpenAI LLM credentials and settings
  - Update settings UI to show/hide provider-specific options dynamically
  - Add Azure OpenAI LLM API key handling in keyring management
  - Support both cleanup and instruction modes with Azure OpenAI models

* test: Add comprehensive test coverage for Azure OpenAI features
  - Add 31 new tests covering all Azure OpenAI functionality changes
  - Test Azure OpenAI LLM processor initialization and text processing
  - Test Azure OpenAI transcription provider integration
  - Test keyring manager integration for API key storage
  - Test Azure key migration script functionality
  - Test UI integration for Azure OpenAI settings
  - Test end-to-end workflows and error handling
  - Resolve test isolation issues with proper mocking strategy
  - Achieve 100% test coverage for branch changes (40/40 tests passing)

  Tests added:
  - test_azure_openai_llm.py (5 tests) - LLM processor core functionality
  - test_azure_openai_llm_integration.py (5 tests) - Integration tests
  - test_azure_key_migration.py (6 tests) - Key migration functionality
  - test_azure_ui_integration.py (10 tests) - UI integration tests
  - test_azure_end_to_end.py (6 tests) - End-to-end workflow tests

  All tests pass successfully, ensuring robust coverage of Azure OpenAI features including transcription, LLM processing, keyring integration, and configuration management.

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
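A minimal sketch of the keyring handling described above (the service and entry names are assumptions; the real key names used in settings_window.py and migrate_azure_key.py may differ):

```python
import keyring

SERVICE_NAME = "whisperwriter"        # assumed keyring service name
AZURE_ENTRY = "azure_openai_api_key"  # entry named in the commit message


def load_azure_openai_key():
    """Fetch the Azure OpenAI key from the system keyring, if present."""
    return keyring.get_password(SERVICE_NAME, AZURE_ENTRY)


def save_azure_openai_key(value: str) -> None:
    """Store the key in the keyring so it can be removed from config.yaml."""
    keyring.set_password(SERVICE_NAME, AZURE_ENTRY, value)
```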
🎯 Summary

This PR introduces two major enhancements to WhisperWriter: Windows Autostart functionality and a comprehensive logging system with verbose mode. These features significantly improve user experience by enabling automatic application startup and providing better debugging capabilities.

✨ New Features

1. Windows Autostart Functionality
   - Automatic Startup: WhisperWriter can now automatically start when Windows boots up
   - GUI Integration: Added checkbox in Settings window to easily enable/disable autostart
   - Smart Executable Detection: Automatically detects whether to use run_project.bat or run.py for startup
   - Windows-Only Feature: Gracefully handles non-Windows systems with appropriate messaging
   - Shortcut Management: Creates and manages Windows shortcuts in the startup folder using PowerShell

2. Comprehensive Logging System
   - Verbose Mode: New -V or --verbose command-line flag for detailed debugging output
   - File Logging: Optional logging to file with configurable path (~/.whisperwriter/logs/whisperwriter.log by default)
   - Console Control: Configurable console output (can be disabled when using file logging)
   - LLM Debugging: Full logging of prompts, system messages, and API responses in verbose mode
   - Centralized Logging: All output goes through ConfigManager.console_print() for consistent handling

🔧 Technical Implementation

AutostartManager (autostart_manager.py)
- Platform detection for Windows-only functionality
- PowerShell integration for reliable shortcut creation
- Robust error handling and user feedback
- Working directory and executable path management

Configuration Schema Updates
- autostart_on_login: Boolean setting for autostart preference
- log_to_file: Boolean setting to enable file logging
- log_file_path: Configurable path for log file (optional)
- verbose_mode: Boolean setting for verbose output
- print_to_terminal: Boolean setting to control console output

Enhanced Utils (utils.py)
- New console_print() method with verbose filtering
- File logging setup with proper encoding and formatting
- Dynamic logging configuration on config changes
- set_verbose_mode() for runtime verbose control

🧪 Testing
- Comprehensive Test Suite: 234 lines of new test code
- test_autostart.py: Tests all autostart functionality scenarios
- test_autostart_checkbox.py: Tests GUI integration
- Platform-specific test handling for Windows/non-Windows environments
- Mock testing for PowerShell and file system operations

🔄 Updated Components
- Main Application: Integration of verbose mode from command-line arguments
- Settings Window: New autostart checkbox with proper state management
- LLM Processor: Enhanced logging for debugging API interactions
- Result Thread: Improved error logging and status reporting
- Run Script: Verbose flag propagation from run.py to main application

📋 Configuration Changes

All new settings are backward-compatible with sensible defaults.
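As a rough sketch of how the centralized, verbose-aware console_print() described above might behave with these settings (the class, attribute, and parameter names here are illustrative, not the actual ConfigManager implementation):

```python
import logging


class ConsoleOutput:
    """Illustrative stand-in for ConfigManager.console_print() behavior."""

    def __init__(self, verbose_mode=False, print_to_terminal=True,
                 log_file_path=None):
        self.verbose_mode = verbose_mode
        self.print_to_terminal = print_to_terminal
        self.logger = None
        if log_file_path:
            handler = logging.FileHandler(log_file_path, encoding="utf-8")
            handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
            self.logger = logging.getLogger("whisperwriter")
            self.logger.setLevel(logging.INFO)
            self.logger.addHandler(handler)

    def set_verbose_mode(self, enabled: bool) -> None:
        """Toggle verbose output at runtime."""
        self.verbose_mode = enabled

    def console_print(self, message: str, verbose_only: bool = False) -> None:
        # Verbose-only messages are dropped unless verbose mode is enabled.
        if verbose_only and not self.verbose_mode:
            return
        if self.print_to_terminal:
            print(message)
        if self.logger is not None:
            self.logger.info(message)
```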
I was hoping there would be a way to start this up automatically. I was just going to do it by using pyenv and a startup script, but integrating it into the options is a much better idea! (I also never got around to seeing if my idea would work 😅)
Author
@Lord-Memester, I should have made a pull request from a branch instead of from main, because Tom hasn't touched my pull request here at all since March, and quite a lot has happened in the repository since then. My latest version starts automatically with the system, simply by adding a shortcut to the Windows startup. Simple, but it works.
…er; remove debug logs (#7)
* fix(pynput): suppress mouse scroll beeps by adding no-op scroll handler; remove debug logs

* feat(azure-llm): structured outputs via chat/completions response_format (json_schema) when supported; parse processed_and_cleaned_transcript; robust optional SDK imports; add migrate_azure_key.py; stabilize console_print in tests

* refactor: address review comments (remove noqa on ollama import, add _safe_console_print helper, improve API version gating and JSON parse logging); tidy migrate_azure_key nested dict access
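A sketch of the structured-output request described in the second commit. Apart from the processed_and_cleaned_transcript field named above, the endpoint, API version, deployment name, schema wrapper, and prompts are placeholders for illustration:

```python
import json

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example-resource.openai.azure.com",  # placeholder
    api_key="...",                       # loaded from the keyring in the real code
    api_version="2024-08-01-preview",    # assumed: a version that supports structured outputs
)

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "cleaned_transcript",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"processed_and_cleaned_transcript": {"type": "string"}},
            "required": ["processed_and_cleaned_transcript"],
            "additionalProperties": False,
        },
    },
}

response = client.chat.completions.create(
    model="my-deployment-name",  # Azure deployment name, placeholder
    messages=[
        {"role": "system", "content": "Clean up the transcription."},
        {"role": "user", "content": "raw transcription text"},
    ],
    response_format=response_format,
)
payload = json.loads(response.choices[0].message.content)
cleaned = payload["processed_and_cleaned_transcript"]
```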
* Adds GPT-5.1 via Responses API; improves model UI

  Integrates the OpenAI Responses API for GPT-5.1 models, forcing reasoning effort to "none" for lower-latency editing and automatically routing applicable requests. Adds robust parsing of Responses payloads with graceful fallbacks.

  Improves the settings UI by replacing free-text model inputs with editable dropdowns populated with curated OpenAI model IDs and refreshed options. Preserves custom selections and provides sensible defaults when no models are returned.

  Enhances testability by introducing a lightweight BaseWindow for mocked PyQt environments and guarding widget type checks and icon usage to avoid crashes under mocks.

  Updates configuration descriptions to reflect GPT-5.1 support and fixes console logging inconsistencies.

* Hardens default prompt and UI layout handling

  Updates system prompt fallback to safely read from configuration using defensive lookups and an empty-string fallback, dropping mode-specific branching to avoid missing-key errors and ensure consistent behavior.

  Improves UI object naming by detecting and validating layouts before accessing the first child, preventing attribute errors when the widget is not a layout or has no items.

  Enhances resilience against schema changes and reduces potential crashes in both processing and settings UI.
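A sketch of routing a request through the Responses API with reasoning effort forced to "none", as the first commit describes (the model ID, prompts, and use of the output_text convenience accessor are illustrative, not the PR's exact code):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.1",               # placeholder reasoning-capable model ID
    reasoning={"effort": "none"},  # lower-latency editing, per the commit message
    input=[
        {"role": "system", "content": "Clean up this transcription."},
        {"role": "user", "content": "raw transcription text"},
    ],
)
# Convenience accessor; the PR also parses the raw payload as a graceful fallback.
print(response.output_text)
```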
* Adopt 'openai'; Azure Responses & UX improvements

  Renames the LLM API type from "chatgpt" to "openai" across docs, schema, code, and tests, and makes "openai" the default provider.

  Routes reasoning models (gpt-5/o1) through the OpenAI and Azure Responses APIs with reasoning effort set to none, and avoids sending temperature to those models. Adds detection for Azure API versions that support structured outputs and requests a strict JSON schema when available.

  Introduces separate Azure deployment names for cleanup and instruction modes with sensible fallback to the legacy deployment field, improving configuration flexibility.

  Improves the settings UI with provider-friendly labels, data-backed comboboxes, explicit Azure deployment labels, and dynamic hiding of the temperature field when reasoning models are selected. Updates model list handling and provider toggles accordingly.

  Updates tests to reflect the provider rename, new Azure fields, combobox helper behavior, and temperature visibility logic.

  Benefits: clearer provider naming, proper handling of reasoning-capable models, safer parameter usage, more flexible Azure configuration, and a cleaner UI experience.

* Maps deprecated chatgpt to openai

  Treats the legacy api type "chatgpt" as "openai" and logs a deprecation notice to guide users toward the updated setting. Attempts to migrate the related config key to avoid future mismatches, failing silently if unavailable. Improves backward compatibility and reduces misconfiguration risk.
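A minimal sketch of the backward-compatibility mapping in the second commit (the config accessor names are hypothetical):

```python
def resolve_llm_api_type(config_manager) -> str:
    """Treat the legacy 'chatgpt' api type as 'openai', logging a deprecation notice."""
    api_type = config_manager.get_value("llm_post_processing.api_type")  # hypothetical accessor
    if api_type == "chatgpt":
        print("Deprecation notice: api_type 'chatgpt' is now 'openai'; please update your settings.")
        api_type = "openai"
        try:
            # Best-effort migration of the stored value; fail silently if unavailable.
            config_manager.set_value("llm_post_processing.api_type", "openai")
        except Exception:
            pass
    return api_type
```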
* Adds opt-in cleanup prompt logging

  Prevents verbose output from dumping cleanup system prompts by default. Introduces a separate toggle to allow prompt logging when needed for debugging.

* Improves Azure deployment and cleanup logs

  Clarifies Azure OpenAI requests by logging the resolved deployment name and distinguishing it from the model parameter to reduce configuration confusion.

  Improves cleanup observability by logging raw outputs and whether cleanup changes the transcription, and ensures the original transcription is preserved when cleanup returns empty.

  Adds safer typing/paste handling with explicit success/failure logging to make input simulation issues easier to diagnose.
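A sketch of the empty-cleanup guard and logging described above (the function and parameter names are placeholders, not the repository's actual code):

```python
def apply_cleanup(original_text: str, run_cleanup, console_print) -> str:
    """Run LLM cleanup but keep the original transcription if cleanup returns nothing."""
    cleaned = run_cleanup(original_text)
    console_print(f"Cleanup raw output: {cleaned!r}", verbose_only=True)
    if not cleaned or not cleaned.strip():
        console_print("Cleanup returned empty output; keeping original transcription.")
        return original_text
    if cleaned != original_text:
        console_print("Cleanup modified the transcription.")
    return cleaned
```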
This pull request introduces three key improvements:

1. Dynamic Transcription Model Support
2. Settings Fix
3. Enhanced Clipboard Handling