


@vaishcodescape
Contributor

@vaishcodescape vaishcodescape commented Oct 14, 2025

Implemented Automated Connector Scheduling System

Description

This PR implements a comprehensive automated scheduling system that allows users to configure periodic syncs for their connectors (Slack, Notion, GitHub, Linear, etc.) without manual intervention. The system addresses critical user pain points around data staleness, manual overhead, and peak load issues.

The implementation includes:

  • Background Scheduler Service: Continuously running service that monitors and executes scheduled connector syncs
  • Enhanced Schedule Configuration: Support for hourly, daily, weekly, and custom cron-based schedules with flexible time selection
  • Comprehensive API: Full CRUD operations for schedule management with monitoring endpoints
  • Modern Frontend UI: Intuitive interface for creating and managing connector schedules
  • Robust Error Handling: Automatic retry logic, comprehensive logging, and graceful failure handling
  • Real-time Monitoring: Live status tracking and execution history

Motivation and Context

Currently, users must manually trigger syncs for each connector to index new content into their search spaces. This creates several critical issues:

  • Data Staleness: Users often forget to sync, leading to outdated search results
  • Manual Overhead: Tedious to sync multiple connectors regularly
  • Peak Load Issues: Users tend to sync during business hours, causing resource spikes
  • Missed Updates: Important information may not be searchable until the user manually syncs

This automated scheduling system solves these problems by providing:

  • Set-and-forget automation for connector syncing
  • Off-peak hour execution to reduce server load
  • Consistent data freshness without user intervention
  • Scalable background processing with concurrency limits

Changes Overview

Backend Infrastructure

  • ConnectorSchedulerService: Core background service managing schedule execution

    • Continuous monitoring of due schedules (60-second intervals)
    • Concurrent job execution with configurable limits (max 5 concurrent jobs)
    • Automatic next-run calculation and schedule progression
    • Graceful error handling with retry logic
  • Enhanced Database Schema: Extended ConnectorSchedule model with:

    • Support for multiple schedule types (HOURLY, DAILY, WEEKLY, CUSTOM)
    • Flexible time configuration options (daily_time, weekly_day, weekly_time, hourly_minute)
    • Comprehensive tracking (last_run_at, next_run_at, is_active)
    • Proper foreign key relationships and constraints
  • Schedule Helpers: Enhanced time calculation utilities supporting:

    • Configurable daily times (default: 2 AM off-peak hours)
    • Weekly schedules with day and time selection
    • Hourly schedules with minute precision
    • Custom cron expression validation and parsing
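The next-run calculation for the fixed schedule types can be sketched roughly as follows. This is not the PR's schedule_helpers.py; it assumes all times are UTC, that weekly_day uses 0 = Monday through 6 = Sunday, that weekly runs default to Monday at 2 AM, and that enum values are the upper-case names. CUSTOM (cron) schedules are left out because the PR delegates those to croniter.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta, timezone
from enum import Enum


class ScheduleType(str, Enum):
    HOURLY = "HOURLY"
    DAILY = "DAILY"
    WEEKLY = "WEEKLY"
    CUSTOM = "CUSTOM"


# 2 AM UTC default for daily/weekly runs (the off-peak default described above).
DEFAULT_RUN_TIME = time(2, 0)


def calculate_next_run(
    schedule_type: ScheduleType,
    now: datetime,
    daily_time: time | None = None,
    weekly_day: int | None = None,  # assumption: 0 = Monday ... 6 = Sunday
    weekly_time: time | None = None,
    hourly_minute: int | None = None,
) -> datetime:
    """Return the first run time strictly after `now`, in UTC."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)

    if schedule_type is ScheduleType.HOURLY:
        minute = 0 if hourly_minute is None else hourly_minute
        if not 0 <= minute <= 59:
            raise ValueError("hourly_minute must be between 0 and 59")
        candidate = now.replace(minute=minute, second=0, microsecond=0)
        step = timedelta(hours=1)
    elif schedule_type is ScheduleType.DAILY:
        t = daily_time if daily_time is not None else DEFAULT_RUN_TIME
        candidate = now.replace(hour=t.hour, minute=t.minute, second=0, microsecond=0)
        step = timedelta(days=1)
    elif schedule_type is ScheduleType.WEEKLY:
        day = 0 if weekly_day is None else weekly_day
        if not 0 <= day <= 6:
            raise ValueError("weekly_day must be between 0 and 6")
        t = weekly_time if weekly_time is not None else DEFAULT_RUN_TIME
        days_ahead = (day - now.weekday()) % 7
        candidate = (now + timedelta(days=days_ahead)).replace(
            hour=t.hour, minute=t.minute, second=0, microsecond=0
        )
        step = timedelta(days=7)
    else:
        # CUSTOM schedules need a cron parser (croniter in the PR); not handled here.
        raise NotImplementedError("CUSTOM schedules are not handled in this sketch")

    # This period's slot may already have passed; roll forward one period.
    if candidate <= now:
        candidate += step
    return candidate
```

Passing now explicitly (instead of reading the clock inside the helper) keeps the function deterministic and easy to unit-test.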

API Endpoints

  • Schedule Management: Full CRUD operations for connector schedules

    • POST /api/v1/connector-schedules/ - Create new schedules
    • GET /api/v1/connector-schedules/ - List schedules with filtering
    • PUT /api/v1/connector-schedules/{id} - Update existing schedules
    • DELETE /api/v1/connector-schedules/{id} - Remove schedules
    • PATCH /api/v1/connector-schedules/{id}/toggle - Activate/deactivate
  • Scheduler Monitoring: Real-time status and control endpoints

    • GET /api/v1/scheduler/status - Current scheduler health and statistics
    • POST /api/v1/scheduler/schedules/{id}/force-execute - Manual execution
    • GET /api/v1/scheduler/schedules/upcoming - Next scheduled executions
    • GET /api/v1/scheduler/schedules/recent-executions - Execution history
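A client call against the schedule endpoints might look roughly like this, using only the standard library. The base URL, bearer-token auth, the upper-case schedule_type values and the exact request-body field names (taken from the schema fields listed above) are assumptions, not confirmed details of the PR's API.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical deployment URL; adjust to your setup.
BASE_URL = "http://localhost:8000"


def _headers(token: str) -> dict[str, str]:
    # Assumption: the API accepts a bearer token in the Authorization header.
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def create_schedule_request(token: str, connector_id: int, search_space_id: int,
                            schedule_type: str, **options) -> urllib.request.Request:
    """Build POST /api/v1/connector-schedules/ with a JSON body."""
    payload = {
        "connector_id": connector_id,
        "search_space_id": search_space_id,
        "schedule_type": schedule_type,
        **options,  # e.g. daily_time, weekly_day, weekly_time, hourly_minute
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/connector-schedules/",
        data=json.dumps(payload).encode("utf-8"),
        headers=_headers(token),
        method="POST",
    )


def list_schedules_request(token: str, search_space_id: int) -> urllib.request.Request:
    """Build GET /api/v1/connector-schedules/?search_space_id=..."""
    query = urlencode({"search_space_id": search_space_id})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/connector-schedules/?{query}", headers=_headers(token)
    )


def toggle_schedule_request(token: str, schedule_id: int) -> urllib.request.Request:
    """Build PATCH /api/v1/connector-schedules/{id}/toggle."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/connector-schedules/{schedule_id}/toggle",
        headers=_headers(token),
        method="PATCH",
    )
```

Sending a request is then a matter of "with urllib.request.urlopen(req) as resp: body = json.load(resp)".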

Frontend Implementation

  • Schedule Management UI: Comprehensive React-based interface
    • Intuitive schedule creation with type-specific configuration options
    • Real-time status dashboard showing scheduler health and active jobs
    • Schedule listing with execution history and next run times
    • One-click schedule activation/deactivation and manual execution
    • Responsive design with proper loading states and error handling

Integration & Lifecycle Management

  • Application Integration: Seamless integration with FastAPI lifespan events

    • Automatic scheduler startup on application launch
    • Graceful shutdown with active job cancellation
    • Proper resource cleanup and task management
  • Background Task Integration: Leverages existing FastAPI BackgroundTasks infrastructure

    • Reuses proven connector indexer functions
    • Maintains existing error handling and logging patterns
    • Preserves user isolation and security boundaries

Monitoring & Observability

  • Comprehensive Logging: Detailed execution tracking through existing TaskLoggingService

    • Task start/success/failure logging with metadata
    • Error message capture and debugging information
    • Execution duration and document processing statistics
  • Real-time Status: Live monitoring capabilities

    • Active job tracking and concurrency limits
    • Scheduler health status and configuration details
    • Upcoming schedule visibility for debugging

Technical Implementation Details

Architecture

  • Service Pattern: Singleton scheduler service with global lifecycle management
  • Async/Await: Full asynchronous implementation for optimal performance
  • Database Transactions: Proper transaction handling for schedule updates
  • Error Isolation: Individual schedule failures don't affect other schedules
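The polling loop, concurrency limit and error isolation described above can be sketched with asyncio alone. MiniScheduler, fetch_due and run_job are hypothetical names, not the PR's ConnectorSchedulerService API. In the real service, fetching due schedules should also advance next_run_at (as in the sequence diagram) so that a schedule is not picked up twice; the in-flight check below is an extra guard.

```python
from __future__ import annotations

import asyncio
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger("scheduler_sketch")


class MiniScheduler:
    """Poll for due schedules and run them with bounded, isolated concurrency."""

    def __init__(
        self,
        fetch_due: Callable[[], Awaitable[list[int]]],
        run_job: Callable[[int], Awaitable[None]],
        check_interval: float = 60.0,
        max_concurrent_jobs: int = 5,
    ) -> None:
        self._fetch_due = fetch_due
        self._run_job = run_job
        self._check_interval = check_interval
        self._semaphore = asyncio.Semaphore(max_concurrent_jobs)
        self._active: dict[int, asyncio.Task] = {}

    async def _guarded(self, schedule_id: int) -> None:
        async with self._semaphore:  # at most max_concurrent_jobs run at once
            try:
                await self._run_job(schedule_id)
            except Exception:  # error isolation: one failure never stops the others
                logger.exception("schedule %s failed", schedule_id)

    def _forget(self, schedule_id: int) -> None:
        self._active.pop(schedule_id, None)

    async def tick(self) -> None:
        for schedule_id in await self._fetch_due():
            if schedule_id in self._active:
                continue  # still running from an earlier tick; skip double dispatch
            task = asyncio.create_task(self._guarded(schedule_id))
            self._active[schedule_id] = task
            task.add_done_callback(lambda _t, sid=schedule_id: self._forget(sid))

    async def run_forever(self) -> None:
        while True:
            try:
                await self.tick()
            except Exception:
                logger.exception("scheduler tick failed")
            await asyncio.sleep(self._check_interval)

    async def stop(self) -> None:
        """Cancel in-flight jobs and wait for them to unwind."""
        tasks = list(self._active.values())
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Because _guarded catches Exception but not asyncio.CancelledError, shutdown cancellation still propagates while ordinary job failures are only logged.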

Security & Validation

  • User Isolation: Schedules are scoped to user-owned connectors and search spaces
  • Input Validation: Comprehensive Pydantic validation for schedule parameters
  • Cron Expression Validation: Safe cron parsing with error handling
  • Ownership Verification: Proper authorization checks for all operations
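The PR validates schedule payloads with Pydantic; in Pydantic v2 such cross-field checks usually live in a @model_validator(mode="after") method. The stdlib sketch below only illustrates the kind of rules involved. The specific rules, the 0 = Monday weekday convention and the five-field cron shape check are assumptions, and the shape check is a rough stand-in for real cron parsing.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time

SCHEDULE_TYPES = {"HOURLY", "DAILY", "WEEKLY", "CUSTOM"}


@dataclass(frozen=True)
class ScheduleSpec:
    schedule_type: str
    cron_expression: str | None = None
    daily_time: time | None = None
    weekly_day: int | None = None  # assumption: 0 = Monday ... 6 = Sunday
    weekly_time: time | None = None
    hourly_minute: int | None = None

    def __post_init__(self) -> None:
        if self.schedule_type not in SCHEDULE_TYPES:
            raise ValueError(f"unknown schedule_type {self.schedule_type!r}")
        if self.schedule_type == "CUSTOM":
            if not self.cron_expression:
                raise ValueError("CUSTOM schedules require cron_expression")
            # Rough shape check only; real validation should use a cron parser.
            if len(self.cron_expression.split()) != 5:
                raise ValueError("cron_expression must have 5 fields")
        if self.hourly_minute is not None and not 0 <= self.hourly_minute <= 59:
            raise ValueError("hourly_minute must be between 0 and 59")
        if self.weekly_day is not None and not 0 <= self.weekly_day <= 6:
            raise ValueError("weekly_day must be between 0 and 6")
```

Rejecting bad combinations at construction time means the scheduler never has to handle a half-valid schedule.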

Performance Considerations

  • Concurrent Execution: Configurable limits prevent resource exhaustion
  • Efficient Queries: Optimized database queries with proper indexing
  • Incremental Sync: Smart date range calculation for efficient data processing
  • Background Processing: Non-blocking execution doesn't impact API responsiveness
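The incremental date-range idea can be sketched as follows. The one-year first-run lookback, the one-day overlap and the YYYY-MM-DD string format passed to indexers are all hypothetical choices for illustration, not the PR's actual values.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical defaults: how far back a first sync reaches, and how much
# each incremental window overlaps the previous run.
FIRST_SYNC_LOOKBACK = timedelta(days=365)
OVERLAP = timedelta(days=1)


def compute_sync_window(last_run_at: datetime | None, now: datetime) -> tuple[str, str]:
    """Return (start_date, end_date) as YYYY-MM-DD strings for an indexer call."""
    if last_run_at is None:
        start = now - FIRST_SYNC_LOOKBACK  # first run: bounded backfill
    else:
        # Re-read a small overlap so items updated around the last run are not missed.
        start = last_run_at - OVERLAP
    if start > now:
        start = now  # guard against clock skew putting last_run_at in the future
    return start.strftime("%Y-%m-%d"), now.strftime("%Y-%m-%d")
```

The overlap trades a little duplicate work (which content deduplication can absorb) for not missing items updated near the previous run.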

Files Modified/Created

New Files

  • app/services/connector_scheduler_service.py - Core scheduler implementation
  • app/routes/connector_schedules_routes.py - Schedule CRUD and toggle APIs
  • app/routes/scheduler_routes.py - Scheduler monitoring and control APIs
  • app/dashboard/[search_space_id]/connectors/schedules/page.tsx - Frontend UI

Modified Files

  • app/app.py - Integrated scheduler lifecycle management
  • app/db.py - Added ScheduleType enum and ConnectorSchedule model
  • app/schemas/connector_schedule.py - Enhanced validation and time options
  • app/utils/schedule_helpers.py - Extended time calculation utilities

Database

  • alembic/versions/23_add_connector_schedules_table.py - Schema migration (already existed)

API Changes

  • This PR includes API changes

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance improvement (non-breaking change which enhances performance)
  • Documentation update
  • Breaking change (fix or feature that would cause existing functionality to change)

Testing

  • I have tested these changes locally
  • I have added/updated unit tests
  • I have added/updated integration tests

Checklist:

  • My code follows the code style of this project
  • My change requires documentation updates
  • I have updated the documentation accordingly
  • My change requires dependency updates
  • I have updated the dependencies accordingly
  • My code builds clean without any errors or warnings
  • All new and existing tests passed

High-level PR Summary

This PR implements a comprehensive automated connector scheduling system that enables users to configure periodic syncs (hourly, daily, weekly, or custom cron-based) for their connectors without manual intervention. The implementation includes a background scheduler service that continuously monitors and executes scheduled jobs, enhanced database schema with the ConnectorSchedule model supporting multiple schedule types, complete REST APIs for schedule CRUD operations and monitoring, a modern React frontend for schedule management, and robust error handling with automatic retry logic and comprehensive logging. The scheduler integrates seamlessly with FastAPI lifespan events and leverages existing connector indexer functions to maintain user isolation and security boundaries.

⏱️ Estimated Review Time: 1-3 hours

💡 Review Order Suggestion
Order File Path
1 surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
2 surfsense_backend/app/db.py
3 surfsense_backend/app/schemas/connector_schedule.py
4 surfsense_backend/app/schemas/__init__.py
5 surfsense_backend/app/utils/schedule_helpers.py
6 surfsense_backend/app/services/connector_scheduler_service.py
7 surfsense_backend/app/routes/connector_schedules_routes.py
8 surfsense_backend/app/routes/scheduler_routes.py
9 surfsense_backend/app/app.py
10 surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
11 surfsense_web/package-lock.json
12 .idea/.gitignore
13 .idea/SurfSense.iml
14 .idea/inspectionProfiles/profiles_settings.xml
15 .idea/modules.xml
16 .idea/vcs.xml
⚠️ Inconsistent Changes Detected
File Path Warning
.idea/.gitignore IDE configuration files should typically be in .gitignore rather than committed to the repository
.idea/SurfSense.iml IDE-specific IntelliJ/PyCharm configuration files are unrelated to the scheduling feature and should not be committed
.idea/inspectionProfiles/profiles_settings.xml IDE-specific inspection profile settings are unrelated to the scheduling feature
.idea/modules.xml IDE-specific project module configuration is unrelated to the scheduling feature
.idea/vcs.xml IDE-specific VCS configuration is unrelated to the scheduling feature



High-level PR Summary

This PR implements a comprehensive automated connector scheduling system that enables users to configure periodic syncs (hourly, daily, weekly, or custom cron-based) for their connectors without manual intervention. The implementation includes a background scheduler service (ConnectorSchedulerService) that continuously monitors and executes scheduled jobs with configurable concurrency limits, enhanced database schema with the ConnectorSchedule model supporting multiple schedule types and tracking fields, complete REST APIs for schedule CRUD operations and real-time monitoring, a modern React frontend for intuitive schedule management, and robust error handling with automatic retry logic and comprehensive logging. The scheduler integrates seamlessly with FastAPI lifespan events, leverages existing connector indexer functions to maintain user isolation and security boundaries, and provides real-time status tracking with execution history for monitoring and debugging.

⚠️ Inconsistent Changes Detected
File Path Warning
.idea/.gitignore IDE configuration files (.idea directory) should typically be in .gitignore rather than committed to the repository, as they are user/environment-specific and unrelated to the automated connector scheduling feature
.idea/SurfSense.iml IntelliJ/PyCharm IDE-specific project configuration file is unrelated to the scheduling feature and should not be committed to version control
.idea/inspectionProfiles/profiles_settings.xml IDE-specific inspection profile settings are environment-specific and unrelated to the automated connector scheduling feature
.idea/modules.xml IDE-specific project module configuration is unrelated to the scheduling feature and represents personal development environment settings
.idea/vcs.xml IDE-specific VCS configuration is unrelated to the automated connector scheduling feature and should be excluded via .gitignore


Summary by CodeRabbit

  • New Features

    • Connector scheduling (Hourly, Daily, Weekly, Custom cron): full CRUD, background execution, manual force-run, and web UI for managing schedules.
    • Local speech-to-text support (faster local transcription) with integrated transcription metadata.
  • Improvements

    • Scheduler lifecycle, next-run calculation and cron validation; status and recent-execution endpoints.
    • Improved connector indexing/date-range filtering (Jira/ClickUp) and increased documents page default page size.
  • Chores

    • Added IDE config entries and updated .gitignore.

vaishcodescape and others added 9 commits October 9, 2025 19:30
- Add STT service with CPU-optimized Faster-Whisper
- Add API endpoints for transcription and model management
- Add React audio recorder component
- Support multiple Whisper models (tiny to large-v3)
- Include error handling for corrupted/invalid files
- Tested with real speech audio (99% accuracy)
- No external API dependencies, fully offline
- Simplify STT_SERVICE config to local/MODEL_SIZE format
- Remove separate STT routes, integrate with document upload
- Add local STT support to audio file processing pipeline
- Remove React component, use existing upload interface
- Support both local Faster-Whisper and external STT services
- Tested with real speech: 99% accuracy, 2.87s processing
- Compute stt_service_type once and reuse
- Follow DRY principles
- Improve code maintainability
- Use .get() for safe dictionary access instead of direct key access
- Add explicit try-catch for local STT transcription failures
- Validate transcription result is not empty
- Provide clear error messages for corrupted audio files
- Match error handling pattern with external STT service
- Add header to local STT transcription for consistency
- Add empty text validation for external STT path
- Refactor external STT to eliminate duplication in atranscription calls
- Ensure both local and external paths have consistent error handling
@vercel

vercel bot commented Oct 14, 2025

@vaishcodescape is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai bot commented Oct 14, 2025

Walkthrough

Adds end-to-end connector scheduling: DB migration and ORM model, Pydantic schemas, schedule helpers, a background ConnectorSchedulerService with lifecycle and force-execute, new API routes and frontend UI, STT local support and related connector/date-range updates, plus IDE project files and app wiring.

Changes

Cohort / File(s) Summary
IDE configuration
\.idea/.gitignore, \.idea/SurfSense.iml, \.idea/inspectionProfiles/profiles_settings.xml, \.idea/modules.xml, \.idea/vcs.xml
Adds IntelliJ project files and ignore entries (shelf/, workspace.xml); defines module, inspection profile, module mapping, and Git VCS mapping.
DB migration
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
New Alembic migration: creates scheduletype enum and connector_schedules table with unique constraint on (connector_id, search_space_id), multiple columns, and conditional indexes; downgrade removes them.
ORM models
surfsense_backend/app/db.py
Adds ScheduleType enum and ConnectorSchedule model; updates SearchSpace and SearchSourceConnector relationships to include schedules and adjust cascades/ownership.
Schemas
surfsense_backend/app/schemas/__init__.py, surfsense_backend/app/schemas/connector_schedule.py
Adds Pydantic schemas: ConnectorScheduleBase, ConnectorScheduleCreate, ConnectorScheduleUpdate, ConnectorScheduleRead with cross-field validation and exports.
Scheduling utils
surfsense_backend/app/utils/schedule_helpers.py
Adds calculate_next_run and is_valid_cron_expression helpers for HOURLY/DAILY/WEEKLY/CUSTOM schedules (uses croniter) and validations.
Scheduler service
surfsense_backend/app/services/connector_scheduler_service.py
Implements ConnectorSchedulerService: periodic loop, due-schedule discovery, concurrency control, indexer mapping, task logging, start/stop lifecycle, status reporting, and force-execute; global getters.
API routes: connector schedules
surfsense_backend/app/routes/connector_schedules_routes.py
New CRUD and toggle endpoints for ConnectorSchedule with ownership and indexability checks, next_run calculation, error handling, and DB session usage.
API routes: scheduler control
surfsense_backend/app/routes/scheduler_routes.py
New scheduler endpoints: status, force-execute (background task), upcoming schedules, and recent executions; uses auth, background tasks, and DB queries.
App wiring
surfsense_backend/app/app.py
Integrates scheduler lifecycle into app lifespan (start/stop), registers new routers (connector-schedules, scheduler), enables CORS, and ensures DB tables created at startup.
Frontend: schedules UI
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
Adds page to list/create/toggle/force-execute schedules, show scheduler status, dynamic form fields per schedule type, and interactions with new API endpoints.
Connectors: ClickUp & Jira
surfsense_backend/app/connectors/clickup_connector.py, surfsense_backend/app/connectors/jira_connector.py
ClickUp: use date-range parsing and query params for task range. Jira: extend JQL to include updatedDate, ordering, and project-scoped grouping.
Local STT & docs
surfsense_backend/app/services/stt_service.py, surfsense_backend/app/routes/documents_routes.py, surfsense_backend/.env.example, docs files, pyproject.toml
Adds faster-whisper dependency and local STT service with file/bytes transcription; documents and env examples updated to reflect local vs remote STT/TTS options.
Misc frontend & formatting
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx, various web components and config files
Small UI default changes (page size), minor formatting/import reorders across several frontend files and drizzle config.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant FE as Frontend (Schedules Page)
  participant API as FastAPI (/api/v1)
  participant Svc as ConnectorSchedulerService
  participant DB as DB (AsyncSession)
  rect rgba(200,230,255,0.18)
    note over API,Svc: App startup wiring
    API->>Svc: start_scheduler()
    Svc->>Svc: start background loop (check_interval)
  end
  User->>FE: open schedules page
  FE->>API: GET /api/v1/scheduler/status
  API->>Svc: get_scheduler_status()
  Svc-->>API: status JSON
  API-->>FE: status JSON
  FE->>API: GET /api/v1/connector-schedules?search_space_id=…
  API->>DB: query schedules (+ connector, search_space)
  DB-->>API: schedules
  API-->>FE: schedules JSON
  rect rgba(220,255,220,0.18)
    note over Svc,DB: Periodic scheduler run
    Svc->>DB: find due schedules
    DB-->>Svc: due schedules
    loop per due schedule (bounded by concurrency)
      Svc->>DB: update last_run_at
      Svc->>Svc: dispatch indexer task (by connector type)
      Svc->>DB: update next_run_at
    end
  end
sequenceDiagram
  autonumber
  actor User
  participant FE as Frontend
  participant API as FastAPI (/api/v1)
  participant BG as BackgroundTasks
  participant Svc as ConnectorSchedulerService
  User->>FE: Click "Force execute"
  FE->>API: POST /api/v1/scheduler/schedules/{id}/force-execute
  API->>BG: enqueue _force_execute_schedule_task(schedule_id)
  API-->>FE: 202 Accepted
  BG->>Svc: force_execute_schedule(schedule_id)
  Svc->>DB: validate schedule active & fetch
  Svc->>Svc: _execute_schedule -> update last_run_at/next_run_at and run indexer
  Svc-->>BG: result / logs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • MODSetter

Poem

I thump my paws to mark the time—tick, hop, run!
Cron-flowers open, hourly chores begun.
I queue the jobs, I nudge the night and day,
A rabbit on a schedule, eager to play. 🥕
Click “force” — I dash, indexing all the way.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “feat:added automated connector scheduling system” directly and concisely reflects the primary feature introduced by this pull request, namely the implementation of an automated scheduling service for connector syncs, and aligns with the detailed objectives without including irrelevant details or generic phrasing.

@recurseml recurseml bot left a comment

Review by RecurseML

🔍 Review performed on d86aaea..b870ddb

  Severity     Location     Issue
High surfsense_backend/app/routes/connector_schedules_routes.py:206 Incomplete route implementation causing syntax error
High surfsense_backend/app/app.py:101 Missing router registration breaks API
High surfsense_backend/app/utils/schedule_helpers.py:35 Timezone-naive datetime causes comparison errors
High surfsense_backend/app/routes/connector_schedules_routes.py:84 Missing time parameters causes incorrect schedules
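The timezone finding is easy to reproduce: Python refuses to order a naive datetime against an aware one, so a naive value (for example one produced by datetime.utcnow()) compared with datetime.now(timezone.utc) raises TypeError. The durable fix is to produce aware datetimes everywhere and store them in DateTime(timezone=True) columns; the is_due helper below is a hypothetical defensive wrapper that assumes any naive value it receives was written as UTC.

```python
from __future__ import annotations

from datetime import datetime, timezone


def is_due(next_run_at: datetime, now: datetime | None = None) -> bool:
    """Compare safely even if a stored value came back timezone-naive."""
    now = now if now is not None else datetime.now(timezone.utc)
    if next_run_at.tzinfo is None:
        # Assumption: naive values in the DB were written as UTC.
        next_run_at = next_run_at.replace(tzinfo=timezone.utc)
    return next_run_at <= now


naive = datetime(2025, 10, 14, 2, 0)  # no tzinfo
aware_now = datetime(2025, 10, 14, 3, 0, tzinfo=timezone.utc)
try:
    naive <= aware_now  # mixing naive and aware datetimes raises TypeError
except TypeError as exc:
    print(f"TypeError: {exc}")
```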
✅ Files analyzed, no issues (12)

.idea/.gitignore
.idea/SurfSense.iml
.idea/inspectionProfiles/profiles_settings.xml
.idea/modules.xml
.idea/vcs.xml
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
surfsense_backend/app/db.py
surfsense_backend/app/routes/scheduler_routes.py
surfsense_backend/app/schemas/__init__.py
surfsense_backend/app/schemas/connector_schedule.py
surfsense_backend/app/services/connector_scheduler_service.py
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

⏭️ Files skipped (1)
  Locations  
surfsense_web/package-lock.json

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
surfsense_backend/app/app.py (1)

56-63: CORS misconfiguration: allow_credentials=True with "*" origins.

Browsers reject a wildcard Access-Control-Allow-Origin on credentialed requests, so such requests can fail, and accepting credentials from any origin is insecure. Use explicit origins or set allow_credentials=False.

Example:

-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=config.CORS_ALLOW_ORIGINS,  # e.g., ["https://app.example.com"]
+    allow_credentials=True,
+    allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
+    allow_headers=["*"],
+)

Ensure config.CORS_ALLOW_ORIGINS is defined appropriately per env.
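One way to define that setting per environment is to read a comma-separated list from an environment variable. The variable name CORS_ALLOW_ORIGINS and its comma-separated format are assumptions here; the helper refuses "*" so that it cannot be combined with allow_credentials=True.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def load_cors_origins(env: Mapping[str, str] | None = None) -> list[str]:
    """Parse a comma-separated CORS_ALLOW_ORIGINS value into explicit origins."""
    source = os.environ if env is None else env
    raw = source.get("CORS_ALLOW_ORIGINS", "")
    # Browsers send Origin without a trailing slash, so normalise entries to match.
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("use explicit origins when allow_credentials=True")
    return origins
```

The result can be passed straight to the middleware as allow_origins=load_cors_origins().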

surfsense_backend/app/db.py (1)

283-296: Define missing reverse relationships in SearchSpace
Add these three relationships to the SearchSpace class to match existing back_populates and avoid mapping errors:

search_source_connectors = relationship(
    "SearchSourceConnector",
    back_populates="search_space",
    order_by="SearchSourceConnector.id",
    cascade="all, delete-orphan",
)
llm_configs = relationship(
    "LLMConfig",
    back_populates="search_space",
    order_by="LLMConfig.id",
    cascade="all, delete-orphan",
)
user_preferences = relationship(
    "UserSearchSpacePreference",
    back_populates="search_space",
    order_by="UserSearchSpacePreference.id",
    cascade="all, delete-orphan",
)

Place these inside the SearchSpace class (e.g. after connector_schedules).

🧹 Nitpick comments (11)
.idea/SurfSense.iml (1)

1-8: Remove IDE artifacts from version control.

This IntelliJ module file is environment-specific, churns often, and doesn’t contribute to the backend/frontend feature set. Please drop it (and the other .idea XML files) from the repo and rely on .gitignore to keep the directory untracked.

surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1)

61-78: Indexes: add composite (is_active, next_run_at); consider dropping redundant id index.

  • Scheduler likely queries WHERE is_active = TRUE AND next_run_at <= now(); a composite index helps.
  • The explicit id index duplicates the PK index; remove if not needed.

Apply this diff to add indexes:

     if "ix_connector_schedules_id" not in existing_indexes:
-        op.create_index("ix_connector_schedules_id", "connector_schedules", ["id"])
+        # Optional: id already indexed by PK; consider skipping this extra index.
+        pass
@@
     if "ix_connector_schedules_next_run_at" not in existing_indexes:
         op.create_index(
             "ix_connector_schedules_next_run_at", "connector_schedules", ["next_run_at"]
         )
+    if "ix_connector_schedules_search_space_id" not in existing_indexes:
+        op.create_index(
+            "ix_connector_schedules_search_space_id",
+            "connector_schedules",
+            ["search_space_id"],
+        )
+    if "ix_connector_schedules_is_active_next_run_at" not in existing_indexes:
+        op.create_index(
+            "ix_connector_schedules_is_active_next_run_at",
+            "connector_schedules",
+            ["is_active", "next_run_at"],
+        )

And include drops in downgrade:

-    op.drop_index("ix_connector_schedules_next_run_at", table_name="connector_schedules")
+    op.drop_index("ix_connector_schedules_is_active_next_run_at", table_name="connector_schedules")
+    op.drop_index("ix_connector_schedules_search_space_id", table_name="connector_schedules")
+    op.drop_index("ix_connector_schedules_next_run_at", table_name="connector_schedules")
-    op.drop_index("ix_connector_schedules_id", table_name="connector_schedules")
+    # If created earlier; otherwise safe to skip or guard with IF EXISTS pattern
+    # op.drop_index("ix_connector_schedules_id", table_name="connector_schedules")
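Whether the composite index serves the scheduler's due-schedule lookup can be checked with the database's query planner. The sketch below uses SQLite only because it ships with Python; PostgreSQL's planner differs, so confirm with EXPLAIN against real data. The simplified table definition is an assumption that mirrors only the columns the query needs. On PostgreSQL, a partial index on next_run_at restricted to active rows is a comparable alternative.

```python
import sqlite3


def due_schedule_plan() -> list[str]:
    """Return SQLite's plan for the scheduler's 'due schedules' lookup."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            """
            CREATE TABLE connector_schedules (
                id INTEGER PRIMARY KEY,
                connector_id INTEGER NOT NULL,
                search_space_id INTEGER NOT NULL,
                is_active BOOLEAN NOT NULL DEFAULT 1,
                next_run_at TEXT
            )
            """
        )
        # Equality column first, range column second, mirroring the suggested index.
        conn.execute(
            "CREATE INDEX ix_connector_schedules_is_active_next_run_at "
            "ON connector_schedules (is_active, next_run_at)"
        )
        rows = conn.execute(
            "EXPLAIN QUERY PLAN SELECT id FROM connector_schedules "
            "WHERE is_active = 1 AND next_run_at <= ?",
            ("2025-10-14T02:00:00+00:00",),
        ).fetchall()
        return [row[-1] for row in rows]  # last column is the human-readable detail
    finally:
        conn.close()
```

Putting the equality column (is_active) before the range column (next_run_at) lets the planner seek directly to active rows and scan only the due range.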
surfsense_backend/app/app.py (2)

31-33: Async task is unnecessary; just await start_scheduler().

start_scheduler completes quickly; creating/canceling a task adds noise. Prefer awaiting directly and drop cancel block.

Apply this diff:

-    scheduler_task = asyncio.create_task(start_scheduler())
-    logger.info("Connector scheduler service started")
+    await start_scheduler()
+    logger.info("Connector scheduler service started")

Then remove the later cancel section (Lines 43-49).


26-29: Avoid create_all in production; prefer Alembic migrations.

Running create_db_and_tables() alongside Alembic can cause drift. Gate behind a dev flag or remove in prod environments.

surfsense_backend/app/utils/schedule_helpers.py (2)

37-41: Optional: clamp hourly_minute defensively.

Schemas validate range, but add a local guard to avoid surprises if called directly.

Apply this diff:

-        minute = hourly_minute if hourly_minute is not None else 0
+        minute = hourly_minute if hourly_minute is not None else 0
+        if minute < 0 or minute > 59:
+            raise ValueError("hourly_minute must be between 0 and 59")

51-64: Weekly scheduling logic: clarify default timezone semantics.

If daily/weekly times are meant as local wall-clock times, consider storing a timezone per search space or using a configured default timezone; the current UTC assumption may surprise users across DST changes.

surfsense_backend/app/schemas/connector_schedule.py (1)

88-91: Consider adding optional time fields to Update schema

Allow updating daily/weekly/hourly options if persisted in DB; otherwise clients can’t change them post‑create.

 class ConnectorScheduleUpdate(BaseModel):
@@
-    schedule_type: ScheduleType | None = None
-    cron_expression: str | None = None
-    is_active: bool | None = None
+    schedule_type: ScheduleType | None = None
+    cron_expression: str | None = None
+    is_active: bool | None = None
+    daily_time: time | None = None
+    weekly_day: int | None = None
+    weekly_time: time | None = None
+    hourly_minute: int | None = None

Also applies to: 94-100
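Once the Update schema carries the time fields, the route should apply only the fields the client actually sent. With Pydantic v2, update.model_dump(exclude_unset=True) yields exactly those fields, so omitted fields stay untouched while an explicit null still clears a value. The sketch below uses a hypothetical dataclass stand-in for the ORM row. After changing any time field, next_run_at should be recalculated, as noted for the scheduler service.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from datetime import time
from typing import Any


@dataclass
class ScheduleRow:
    """Hypothetical stand-in for the ConnectorSchedule ORM row."""
    schedule_type: str
    cron_expression: str | None = None
    is_active: bool = True
    daily_time: time | None = None
    weekly_day: int | None = None
    weekly_time: time | None = None
    hourly_minute: int | None = None


def apply_update(row: ScheduleRow, changes: dict[str, Any]) -> ScheduleRow:
    """Apply only the fields the client sent; an explicit None clears a field."""
    allowed = {f.name for f in fields(row)}
    unknown = set(changes) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    for name, value in changes.items():
        setattr(row, name, value)
    return row
```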

surfsense_backend/app/routes/scheduler_routes.py (1)

95-106: Bound the limit parameter to avoid heavy queries

Add simple clamping/validation (e.g., 1..100) to prevent abuse.

-async def get_upcoming_schedules(
-    limit: int = 10,
+async def get_upcoming_schedules(
+    limit: int = 10,
@@
-    try:
+    try:
+        limit = max(1, min(limit, 100))
@@
-async def get_recent_schedule_executions(
-    limit: int = 20,
+async def get_recent_schedule_executions(
+    limit: int = 20,
@@
-    try:
+    try:
+        limit = max(1, min(limit, 100))

Also applies to: 152-163

surfsense_backend/app/routes/connector_schedules_routes.py (1)

41-45: Frontend relies on filtering by search_space_id; ensure API supports it

The UI calls GET /connector-schedules/?search_space_id=... which your code supports. Consider also adding a similar filter to /search-source-connectors to simplify client-side filtering.

Also applies to: 118-121

surfsense_backend/app/services/connector_scheduler_service.py (2)

289-303: Recalculate next_run with full time options if persisted

Only schedule_type/cron are passed, ignoring daily/weekly/hourly options. If these fields exist in DB, include them.

-        next_run = calculate_next_run(
-            schedule.schedule_type, schedule.cron_expression
-        )
+        next_run = calculate_next_run(
+            schedule.schedule_type,
+            schedule.cron_expression,
+            getattr(schedule, "daily_time", None),
+            getattr(schedule, "weekly_day", None),
+            getattr(schedule, "weekly_time", None),
+            getattr(schedule, "hourly_minute", None),
+        )

366-370: start_scheduler awaits an infinite loop; ensure it’s spawned as a background task

If start_scheduler() is awaited directly in app startup, it will block the app. Spawn with asyncio.create_task(start_scheduler()) in lifespan/startup hooks.

Where is start_scheduler() invoked? If it’s awaited, switch to creating a task.
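If start_scheduler() does run an endless loop, the usual FastAPI pattern is to spawn it from a lifespan context manager and cancel it on shutdown. A minimal sketch follows; make_lifespan and the loop function are hypothetical names, and the result would be passed as FastAPI(lifespan=make_lifespan(...)).

```python
from __future__ import annotations

import asyncio
import contextlib
from collections.abc import AsyncIterator, Callable, Coroutine
from typing import Any


def make_lifespan(run_scheduler: Callable[[], Coroutine[Any, Any, None]]):
    """Build a lifespan that runs `run_scheduler` (an endless loop) in the background."""

    @contextlib.asynccontextmanager
    async def lifespan(app: object) -> AsyncIterator[None]:
        # Spawn, don't await: awaiting an endless loop here would block startup forever.
        task = asyncio.create_task(run_scheduler(), name="connector-scheduler")
        try:
            yield  # the application serves requests while the loop runs
        finally:
            task.cancel()
            # Wait for the loop to unwind; only the expected cancellation is swallowed.
            with contextlib.suppress(asyncio.CancelledError):
                await task

    return lifespan
```

If the loop has already crashed before shutdown, awaiting the task re-raises its exception, which surfaces the failure instead of hiding it.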

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d86aaea and 6fa1a00.

⛔ Files ignored due to path filters (1)
  • surfsense_web/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (15)
  • .idea/.gitignore (1 hunks)
  • .idea/SurfSense.iml (1 hunks)
  • .idea/inspectionProfiles/profiles_settings.xml (1 hunks)
  • .idea/modules.xml (1 hunks)
  • .idea/vcs.xml (1 hunks)
  • surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1 hunks)
  • surfsense_backend/app/app.py (3 hunks)
  • surfsense_backend/app/db.py (3 hunks)
  • surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
  • surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
  • surfsense_backend/app/schemas/__init__.py (2 hunks)
  • surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
  • surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
  • surfsense_backend/app/utils/schedule_helpers.py (1 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{jsx,tsx}

📄 CodeRabbit inference engine (.rules/require_unique_id_props.mdc)

**/*.{jsx,tsx}: When mapping arrays to React elements in JSX/TSX, each rendered element must include a unique key prop
Keys used for React list items should be stable, predictable, and unique among siblings

Files:

  • surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
🧬 Code graph analysis (8)
surfsense_backend/app/schemas/__init__.py (1)
surfsense_backend/app/schemas/connector_schedule.py (4)
  • ConnectorScheduleBase (11-85)
  • ConnectorScheduleCreate (88-91)
  • ConnectorScheduleRead (113-119)
  • ConnectorScheduleUpdate (94-110)
surfsense_backend/app/schemas/connector_schedule.py (2)
surfsense_backend/app/db.py (2)
  • BaseModel (154-158)
  • ScheduleType (132-136)
surfsense_backend/app/schemas/base.py (2)
  • IDModel (11-13)
  • TimestampModel (6-8)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (6)
  • ConnectorSchedule (298-324)
  • SearchSourceConnector (263-295)
  • SearchSpace (220-260)
  • User (418-427)
  • User (431-437)
  • get_async_session (478-480)
surfsense_backend/app/schemas/connector_schedule.py (3)
  • ConnectorScheduleCreate (88-91)
  • ConnectorScheduleRead (113-119)
  • ConnectorScheduleUpdate (94-110)
surfsense_backend/app/utils/check_ownership.py (1)
  • check_ownership (9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
  • calculate_next_run (10-76)
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1)
surfsense_backend/app/db.py (1)
  • ConnectorSchedule (298-324)
surfsense_backend/app/app.py (2)
surfsense_backend/app/services/connector_scheduler_service.py (2)
  • start_scheduler (366-369)
  • stop_scheduler (372-377)
surfsense_backend/app/db.py (1)
  • create_db_and_tables (471-475)
surfsense_backend/app/utils/schedule_helpers.py (1)
surfsense_backend/app/db.py (1)
  • ScheduleType (132-136)
surfsense_backend/app/services/connector_scheduler_service.py (5)
surfsense_backend/app/db.py (3)
  • ConnectorSchedule (298-324)
  • SearchSourceConnectorType (55-71)
  • get_async_session (478-480)
surfsense_backend/app/services/task_logging_service.py (2)
  • TaskLoggingService (13-243)
  • log_task_start (20-58)
surfsense_backend/app/tasks/connector_indexers/slack_indexer.py (1)
  • index_slack_messages (30-377)
surfsense_backend/app/utils/schedule_helpers.py (1)
  • calculate_next_run (10-76)
surfsense_backend/app/routes/scheduler_routes.py (2)
  • get_scheduler_status (24-45)
  • force_execute_schedule (49-80)
surfsense_backend/app/routes/scheduler_routes.py (3)
surfsense_backend/app/db.py (3)
  • get_async_session (478-480)
  • ConnectorSchedule (298-324)
  • SearchSpace (220-260)
surfsense_backend/app/schemas/connector_schedule.py (1)
  • ConnectorScheduleRead (113-119)
surfsense_backend/app/services/connector_scheduler_service.py (3)
  • get_scheduler (358-363)
  • get_scheduler_status (305-320)
  • force_execute_schedule (322-351)
🪛 GitHub Actions: Code Quality Checks
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

[error] 3-3: biome-check-web: Imports are not sorted. Safe fix available via Organize Imports (Biome).

surfsense_backend/app/app.py

[error] 46-49: SIM105 Use contextlib.suppress(asyncio.CancelledError) instead of try-except-pass

🪛 GitHub Actions: pre-commit
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

[error] 412-416: lint/correctness/useUniqueElementIds: id attribute should not be a static string literal. Generate unique IDs using useId().

surfsense_backend/app/app.py

[error] 46-49: SIM105 Use contextlib.suppress(asyncio.CancelledError) instead of try-except-pass

🔇 Additional comments (3)
surfsense_backend/app/schemas/__init__.py (1)

11-16: LGTM: ConnectorSchedule schemas exposed cleanly.

Re-exports look consistent with usage across routes/services. No issues.

Also applies to: 61-64

surfsense_backend/app/db.py (2)

132-137: LGTM: ScheduleType enum.

Enum values align with schemas and helpers.


255-260: LGTM: SearchSpace.connector_schedules relationship.

Naming and cascade settings look consistent.

@recurseml recurseml bot left a comment

Review by RecurseML

🔍 Review performed on b870ddb..6fa1a00

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (50)

README.md
docs/chinese-llm-setup.md
surfsense_backend/alembic/env.py
surfsense_backend/alembic/versions/23_associate_connectors_with_search_spaces.py
surfsense_backend/alembic/versions/24_fix_null_chat_types.py
surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py
surfsense_backend/alembic/versions/26_add_language_column_to_llm_configs.py
surfsense_backend/alembic/versions/27_add_searxng_connector_enum.py
surfsense_backend/alembic/versions/28_add_chinese_litellmprovider_enum.py
surfsense_backend/app/agents/podcaster/configuration.py
surfsense_backend/app/agents/podcaster/nodes.py
surfsense_backend/app/agents/researcher/configuration.py
surfsense_backend/app/agents/researcher/nodes.py
surfsense_backend/app/agents/researcher/prompts.py
surfsense_backend/app/agents/researcher/qna_agent/configuration.py
surfsense_backend/app/agents/researcher/qna_agent/nodes.py
surfsense_backend/app/agents/researcher/qna_agent/prompts.py
surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py
surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py
surfsense_backend/app/connectors/google_calendar_connector.py
surfsense_backend/app/connectors/google_gmail_connector.py
surfsense_backend/app/db.py
surfsense_backend/app/routes/airtable_add_connector_route.py
surfsense_backend/app/routes/chats_routes.py
surfsense_backend/app/routes/documents_routes.py
surfsense_backend/app/routes/google_calendar_add_connector_route.py
surfsense_backend/app/routes/google_gmail_add_connector_route.py
surfsense_backend/app/routes/llm_config_routes.py
surfsense_backend/app/routes/luma_add_connector_route.py
surfsense_backend/app/routes/search_source_connectors_routes.py
surfsense_backend/app/schemas/llm_config.py
surfsense_backend/app/schemas/search_source_connector.py
surfsense_backend/app/services/connector_service.py
surfsense_backend/app/services/llm_service.py
surfsense_backend/app/services/query_service.py
surfsense_backend/app/services/task_logging_service.py
surfsense_backend/app/tasks/connector_indexers/airtable_indexer.py
surfsense_backend/app/tasks/connector_indexers/clickup_indexer.py
surfsense_backend/app/tasks/connector_indexers/confluence_indexer.py
surfsense_backend/app/tasks/connector_indexers/discord_indexer.py
surfsense_backend/app/tasks/connector_indexers/github_indexer.py
surfsense_backend/app/tasks/connector_indexers/google_calendar_indexer.py
surfsense_backend/app/tasks/connector_indexers/google_gmail_indexer.py
surfsense_backend/app/tasks/connector_indexers/jira_indexer.py
surfsense_backend/app/tasks/connector_indexers/linear_indexer.py
surfsense_backend/app/tasks/connector_indexers/luma_indexer.py
surfsense_backend/app/tasks/connector_indexers/notion_indexer.py
surfsense_backend/app/tasks/document_processors/extension_processor.py
surfsense_backend/app/tasks/document_processors/file_processors.py
surfsense_backend/app/tasks/document_processors/markdown_processor.py

⏭️ Files skipped (56)
surfsense_backend/app/tasks/document_processors/url_crawler.py
surfsense_backend/app/tasks/document_processors/youtube_processor.py
surfsense_backend/app/tasks/podcast_tasks.py
surfsense_backend/app/tasks/stream_connector_search_results.py
surfsense_backend/app/utils/validators.py
surfsense_web/app/dashboard/[search_space_id]/client-layout.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/airtable-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/clickup-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/confluence-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/discord-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/github-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/google-calendar-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/google-gmail-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/jira-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/linear-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/linkup-api/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/luma-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/notion-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/searxng/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/serper-api/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/slack-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/tavily-api/page.tsx
surfsense_web/app/dashboard/[search_space_id]/layout.tsx
surfsense_web/app/dashboard/[search_space_id]/logs/(manage)/page.tsx
surfsense_web/app/dashboard/[search_space_id]/onboard/page.tsx
surfsense_web/app/dashboard/[search_space_id]/podcasts/podcasts-client.tsx
surfsense_web/app/dashboard/[search_space_id]/researcher/[[...chat_id]]/page.tsx
surfsense_web/app/dashboard/[search_space_id]/settings/page.tsx
surfsense_web/app/dashboard/layout.tsx
surfsense_web/components/UserDropdown.tsx
surfsense_web/components/chat/ChatCitation.tsx
surfsense_web/components/chat/ChatInputGroup.tsx
surfsense_web/components/chat/ChatSources.tsx
surfsense_web/components/chat/SourceDetailSheet.tsx
surfsense_web/components/editConnector/types.ts
surfsense_web/components/inference-params-editor.tsx
surfsense_web/components/onboard/add-provider-step.tsx
surfsense_web/components/onboard/assign-roles-step.tsx
surfsense_web/components/onboard/completion-step.tsx
surfsense_web/components/settings/llm-role-manager.tsx
surfsense_web/components/settings/model-config-manager.tsx
surfsense_web/contracts/enums/connector.ts
surfsense_web/contracts/enums/connectorIcons.tsx
surfsense_web/contracts/enums/languages.ts
surfsense_web/contracts/enums/llm-providers.ts
surfsense_web/hooks/index.ts
surfsense_web/hooks/use-chat.ts
surfsense_web/hooks/use-connector-edit-page.ts
surfsense_web/hooks/use-connectors.ts
surfsense_web/hooks/use-llm-configs.ts
surfsense_web/hooks/use-search-source-connectors.ts
surfsense_web/lib/connectors/utils.ts

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
surfsense_backend/app/services/connector_scheduler_service.py (1)

127-145: Fix invalid datetime timezone usage.

Line 129 uses datetime.now(datetime.utc), which is incorrect. The datetime class has no utc attribute (the UTC tzinfo is timezone.utc, also exposed as the module-level datetime.UTC on Python 3.11+), so this raises an AttributeError at runtime.

Apply this diff (timezone must also be imported from datetime):

     async def _get_due_schedules(self, session: AsyncSession) -> List[ConnectorSchedule]:
         """Get all schedules that are due for execution."""
-        now = datetime.now(datetime.utc)
+        now = datetime.now(timezone.utc)
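The following stdlib check reproduces both failure modes. The first is the attribute lookup itself. The second shows why a tz-aware value matters once it is compared against TIMESTAMP(timezone=True) data: ordering comparisons between naive and aware datetimes raise TypeError. The utc_now and demo functions are illustrative names, not part of the PR.

```python
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    """Tz-aware current time in UTC, suitable for TIMESTAMP(timezone=True) columns."""
    return datetime.now(timezone.utc)


def demo() -> list:
    notes = []
    try:
        datetime.now(datetime.utc)  # the flagged bug: the class has no `utc` attribute
    except AttributeError:
        notes.append("datetime.utc -> AttributeError")
    try:
        datetime(2024, 1, 1) < utc_now()  # naive vs aware ordering comparison
    except TypeError:
        notes.append("naive < aware -> TypeError")
    return notes
```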
🧹 Nitpick comments (2)
surfsense_backend/app/routes/connector_schedules_routes.py (2)

90-98: Consider skipping next_run calculation when creating inactive schedules.

Currently, next_run_at is calculated even when is_active is False. While the scheduler ignores inactive schedules, you could optimize by setting next_run_at = None for inactive schedules to avoid unnecessary computation.

Apply this diff:

-        # Calculate next run time
-        next_run_at = calculate_next_run(
+        # Calculate next run time (skip if inactive)
+        next_run_at = None if not schedule.is_active else calculate_next_run(
             schedule.schedule_type, 
             schedule.cron_expression,
             schedule.daily_time,
             schedule.weekly_day,
             schedule.weekly_time,
             schedule.hourly_minute
         )

327-363: Recalculate next_run_at when activating schedules via toggle.

When toggling a schedule from inactive to active, next_run_at may be stale (if it was set before deactivation). The scheduler will handle this by running it immediately, but it's cleaner to recalculate the next run time when activating.

Similarly, consider clearing next_run_at when deactivating for cleaner state.

Apply this diff:

         # Toggle the active status
         schedule.is_active = not schedule.is_active
+        
+        # Recalculate or clear next_run_at based on new state
+        if schedule.is_active:
+            # Recalculate next run time when activating
+            schedule.next_run_at = calculate_next_run(
+                schedule.schedule_type,
+                schedule.cron_expression,
+                schedule.daily_time,
+                schedule.weekly_day,
+                schedule.weekly_time,
+                schedule.hourly_minute,
+            )
+        else:
+            # Clear next run time when deactivating
+            schedule.next_run_at = None
+        
         await session.commit()
         await session.refresh(schedule)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6fa1a00 and fc4f677.

📒 Files selected for processing (7)
  • surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1 hunks)
  • surfsense_backend/app/app.py (3 hunks)
  • surfsense_backend/app/db.py (4 hunks)
  • surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
  • surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
  • surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
  • surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
surfsense_backend/app/app.py (2)
surfsense_backend/app/services/connector_scheduler_service.py (2)
  • start_scheduler (371-374)
  • stop_scheduler (377-382)
surfsense_backend/app/db.py (1)
  • create_db_and_tables (477-481)
surfsense_backend/app/schemas/connector_schedule.py (2)
surfsense_backend/app/db.py (2)
  • BaseModel (155-159)
  • ScheduleType (133-137)
surfsense_backend/app/schemas/base.py (2)
  • IDModel (11-13)
  • TimestampModel (6-8)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (7)
  • ConnectorSchedule (299-330)
  • ScheduleType (133-137)
  • SearchSourceConnector (264-296)
  • SearchSpace (221-261)
  • User (424-433)
  • User (437-443)
  • get_async_session (484-486)
surfsense_backend/app/schemas/connector_schedule.py (3)
  • ConnectorScheduleCreate (88-91)
  • ConnectorScheduleRead (113-119)
  • ConnectorScheduleUpdate (94-110)
surfsense_backend/app/utils/check_ownership.py (1)
  • check_ownership (9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
  • calculate_next_run (10-76)
surfsense_backend/app/routes/scheduler_routes.py (3)
surfsense_backend/app/db.py (3)
  • get_async_session (484-486)
  • ConnectorSchedule (299-330)
  • SearchSpace (221-261)
surfsense_backend/app/schemas/connector_schedule.py (1)
  • ConnectorScheduleRead (113-119)
surfsense_backend/app/services/connector_scheduler_service.py (3)
  • get_scheduler (363-368)
  • get_scheduler_status (310-325)
  • force_execute_schedule (327-356)
surfsense_backend/app/services/connector_scheduler_service.py (4)
surfsense_backend/app/db.py (4)
  • ConnectorSchedule (299-330)
  • SearchSourceConnector (264-296)
  • SearchSourceConnectorType (56-72)
  • get_async_session (484-486)
surfsense_backend/app/services/task_logging_service.py (4)
  • TaskLoggingService (13-243)
  • log_task_start (20-58)
  • log_task_failure (107-162)
  • log_task_success (60-105)
surfsense_backend/app/utils/schedule_helpers.py (1)
  • calculate_next_run (10-76)
surfsense_backend/app/routes/scheduler_routes.py (2)
  • get_scheduler_status (24-45)
  • force_execute_schedule (49-80)
🪛 GitHub Actions: Code Quality Checks
surfsense_backend/app/routes/scheduler_routes.py

[error] 167-173: Ruff lint: Undefined name 'Log' (F821) and related definitions; potential missing imports or model definitions. Also references to Log in multiple lines.


[error] 178-178: Ruff lint: Redefinition of unused 'logs' (F811) to be resolved after fixing previous undefined references.

🪛 GitHub Actions: pre-commit
surfsense_backend/app/routes/scheduler_routes.py

[error] 167-171: F821 Undefined name Log in query construction.


[error] 178-178: F811 Redefinition of unused logs.

🔇 Additional comments (9)
surfsense_backend/app/db.py (1)

133-330: LGTM! Past review concerns have been addressed.

The ScheduleType enum and ConnectorSchedule model are correctly implemented with all required fields, including the previously missing schedule options (daily_time, weekly_day, weekly_time, hourly_minute). Relationships are properly configured with appropriate cascade settings.

surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1)

20-99: LGTM! Migration correctly implements all schedule fields.

The migration properly creates the scheduletype enum and connector_schedules table with all required columns, including the schedule configuration options (daily_time, weekly_day, weekly_time, hourly_minute) that were flagged in past reviews. The use of inspector to check for existing indexes is a good practice.

surfsense_backend/app/app.py (1)

1-101: LGTM! Past review concerns have been addressed.

The application correctly integrates the scheduler lifecycle with proper startup/shutdown handling, uses contextlib.suppress for CancelledError as suggested, and registers both the connector_schedules_router and scheduler_router. The logging additions improve observability.

surfsense_backend/app/schemas/connector_schedule.py (1)

24-110: LGTM! Pydantic v2 validators correctly implemented.

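All field validators have been updated to the Pydantic v2 signature with info: FieldValidationInfo and access other fields via info.data. The validation logic properly enforces schedule type constraints. Note that recent Pydantic 2.x releases deprecate FieldValidationInfo in favor of ValidationInfo, which exposes the same info.data.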

surfsense_backend/app/routes/scheduler_routes.py (2)

23-45: LGTM!

The scheduler status endpoint correctly delegates to the scheduler service and includes appropriate error handling.


48-92: LGTM!

The force execute endpoint correctly queues the execution as a background task with proper error handling and logging.

surfsense_backend/app/services/connector_scheduler_service.py (2)

47-100: LGTM!

The scheduler service initialization and lifecycle management (start/stop) are correctly implemented with proper cleanup of active jobs.


277-356: LGTM! Remaining methods correctly implemented.

The schedule update methods and force execute functionality are correctly implemented with proper timezone usage at line 282. The TaskLoggingService integration concerns from past reviews have been resolved.

surfsense_backend/app/routes/connector_schedules_routes.py (1)

70-74: Previous issues resolved: search space validation and time parameters.

The connector/search space mismatch validation (lines 70-74) and full time parameter passing to calculate_next_run (lines 91-98) correctly address the critical issues flagged in previous reviews. The explicit field inclusion (lines 100-114) also prevents non-DB fields from being passed to the model constructor.

Also applies to: 91-98

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
surfsense_backend/app/routes/connector_schedules_routes.py (1)

353-359: Recalculate/clear next_run_at on toggle.

On deactivate: set next_run_at=None. On activate: compute next run and ensure tz-aware UTC.

Apply:

         # Toggle the active status
         schedule.is_active = not schedule.is_active
+        if schedule.is_active:
+            nr = calculate_next_run(
+                schedule.schedule_type,
+                schedule.cron_expression,
+                schedule.daily_time,
+                schedule.weekly_day,
+                schedule.weekly_time,
+                schedule.hourly_minute,
+            )
+            if nr.tzinfo is None:
+                nr = nr.replace(tzinfo=timezone.utc)
+            schedule.next_run_at = nr
+        else:
+            schedule.next_run_at = None
         await session.commit()

Also applies to: 355-357
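To unit-test this toggle logic without a database, it can be factored into a plain function. In this sketch, Schedule is a hypothetical stand-in for the ConnectorSchedule row, and compute_next stands in for a call to calculate_next_run. Treating a naive result as UTC is an assumption that only holds if the helper computes in UTC.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Schedule:
    """Hypothetical stand-in for the ConnectorSchedule ORM row."""
    is_active: bool
    next_run_at: datetime | None = None


def toggle(schedule: Schedule, compute_next: Callable[[], datetime]) -> Schedule:
    schedule.is_active = not schedule.is_active
    if schedule.is_active:
        nr = compute_next()
        # Treat a naive result as UTC so it compares cleanly with tz-aware columns.
        if nr.tzinfo is None:
            nr = nr.replace(tzinfo=timezone.utc)
        schedule.next_run_at = nr
    else:
        schedule.next_run_at = None  # a deactivated schedule keeps no stale next run
    return schedule
```

Injecting compute_next keeps the function free of session and ORM dependencies; the route handler would pass a lambda wrapping calculate_next_run(...) and then commit.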

surfsense_backend/app/routes/scheduler_routes.py (1)

63-68: Tighten log query filter.

  • Add Log.source == "connector_scheduler" to avoid capturing unrelated messages.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between fc4f677 and 7391a34.

📒 Files selected for processing (4)
  • surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
  • surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
  • surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
  • surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (7)
  • ConnectorSchedule (299-330)
  • ScheduleType (133-137)
  • SearchSourceConnector (264-296)
  • SearchSpace (221-261)
  • User (424-433)
  • User (437-443)
  • get_async_session (484-486)
surfsense_backend/app/schemas/connector_schedule.py (3)
  • ConnectorScheduleCreate (88-91)
  • ConnectorScheduleRead (167-173)
  • ConnectorScheduleUpdate (94-164)
surfsense_backend/app/utils/check_ownership.py (1)
  • check_ownership (9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
  • calculate_next_run (10-76)
surfsense_backend/app/services/connector_scheduler_service.py (4)
surfsense_backend/app/db.py (4)
  • ConnectorSchedule (299-330)
  • SearchSourceConnector (264-296)
  • SearchSourceConnectorType (56-72)
  • get_async_session (484-486)
surfsense_backend/app/services/task_logging_service.py (4)
  • TaskLoggingService (13-243)
  • log_task_start (20-58)
  • log_task_failure (107-162)
  • log_task_success (60-105)
surfsense_backend/app/utils/schedule_helpers.py (1)
  • calculate_next_run (10-76)
surfsense_backend/app/routes/scheduler_routes.py (2)
  • get_scheduler_status (24-45)
  • force_execute_schedule (49-80)
surfsense_backend/app/schemas/connector_schedule.py (2)
surfsense_backend/app/db.py (2)
  • BaseModel (155-159)
  • ScheduleType (133-137)
surfsense_backend/app/schemas/base.py (2)
  • IDModel (11-13)
  • TimestampModel (6-8)
surfsense_backend/app/routes/scheduler_routes.py (3)
surfsense_backend/app/db.py (3)
  • get_async_session (484-486)
  • ConnectorSchedule (299-330)
  • SearchSpace (221-261)
surfsense_backend/app/schemas/connector_schedule.py (1)
  • ConnectorScheduleRead (167-173)
surfsense_backend/app/services/connector_scheduler_service.py (3)
  • get_scheduler (363-368)
  • get_scheduler_status (310-325)
  • force_execute_schedule (327-356)
🔇 Additional comments (2)
surfsense_backend/app/schemas/connector_schedule.py (1)

24-37: Validators migrated to Pydantic v2 correctly.

Using FieldValidationInfo and info.data is correct. Good cross‑field checks.

Based on learnings

surfsense_backend/app/services/connector_scheduler_service.py (1)

127-141: Timezone-aware scheduling and updates look correct.

Using datetime.now(timezone.utc) and updating last_run_at/next_run_at with tz-aware values aligns with TIMESTAMP(timezone=True).

Also applies to: 277-305

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
surfsense_backend/app/routes/connector_schedules_routes.py (1)

164-179: Consider eager-loading relationships to avoid N+1 queries.

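If callers access the connector or search_space relationships on the returned schedules, each access triggers a separate lazy-load query (the N+1 pattern). Under an AsyncSession, such implicit lazy loads cannot be awaited and typically raise MissingGreenlet instead. Eager-loading avoids both problems: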

         query = (
             select(ConnectorSchedule)
             .join(SearchSourceConnector)
             .filter(SearchSourceConnector.user_id == user.id)
+            .options(
+                selectinload(ConnectorSchedule.connector),
+                selectinload(ConnectorSchedule.search_space)
+            )
         )

Don't forget to import selectinload:

from sqlalchemy.orm import selectinload
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7391a34 and 62ffec9.

📒 Files selected for processing (2)
  • surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
  • surfsense_backend/app/utils/schedule_helpers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • surfsense_backend/app/utils/schedule_helpers.py
🧰 Additional context used
🧬 Code graph analysis (1)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (7)
  • ConnectorSchedule (299-330)
  • ScheduleType (133-137)
  • SearchSourceConnector (264-296)
  • SearchSpace (221-261)
  • User (424-433)
  • User (437-443)
  • get_async_session (484-486)
surfsense_backend/app/schemas/connector_schedule.py (3)
  • ConnectorScheduleCreate (88-91)
  • ConnectorScheduleRead (167-173)
  • ConnectorScheduleUpdate (94-164)
surfsense_backend/app/utils/check_ownership.py (1)
  • check_ownership (9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
  • calculate_next_run (10-80)
🔇 Additional comments (2)
surfsense_backend/app/routes/connector_schedules_routes.py (2)

70-75: Excellent improvements addressing previous feedback!

The create endpoint now properly:

  1. Enforces connector-space ownership match (lines 70-75)
  2. Passes all time parameters to calculate_next_run (lines 94-100)
  3. Explicitly includes only DB columns when creating the model (lines 104-116)

These changes resolve the critical issues flagged in previous reviews.

Also applies to: 91-101, 103-116


245-319: Update logic is complex but handles all scenarios correctly.

The update endpoint properly handles:

  • Schedule type changes with next_run_at recalculation (lines 270-285)
  • Time field updates (lines 264-268, 270-285)
  • Cron expression updates for CUSTOM schedules (lines 286-301)
  • Activation/deactivation with appropriate next_run_at management (lines 303-319)
  • Validation that cron_expression is only used with CUSTOM type (lines 257-262)
  • Timezone-aware UTC enforcement (lines 282-283, 298-299, 317-318)

The conditional logic correctly avoids duplicate calculations and handles edge cases like simultaneous type+time changes or activation toggles combined with other updates.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 62ffec9 and c044d42.

📒 Files selected for processing (2)
  • surfsense_backend/app/connectors/clickup_connector.py (2 hunks)
  • surfsense_backend/app/connectors/jira_connector.py (2 hunks)
🔇 Additional comments (3)
surfsense_backend/app/connectors/clickup_connector.py (2)

8-8: LGTM!

The datetime import is correctly added to support the new date range parsing functionality.


188-193: Verify the date filter logic aligns with scheduling requirements.

The implementation uses both date_created and date_updated filters with the same timestamp range. ClickUp API typically applies OR logic between these filters, meaning tasks are returned if they were either created or updated in the range.

This will include tasks that were created before start_date but updated during the range, which may or may not be the desired behavior for the scheduling system:

  • Correct for incremental syncs: If the scheduler needs to catch task updates, this is appropriate
  • ⚠️ Potentially incorrect for initial syncs: If only newly created tasks are desired, remove the date_updated filters

Please confirm this matches the intended scheduling behavior.

If you only want newly created tasks (not updates), apply this diff:

                 "include_closed": str(include_closed).lower(),
-                # Date filtering - filter by both created and updated dates
+                # Date filtering - filter by created date only
                 "date_created_gt": start_timestamp,
                 "date_created_lt": end_timestamp,
-                "date_updated_gt": start_timestamp,
-                "date_updated_lt": end_timestamp,
surfsense_backend/app/connectors/jira_connector.py (1)

253-253: LGTM on passing constructed JQL into params.

Using params["jql"] = _jql is correct.

Comment on lines +172 to +181
# Convert date strings to Unix timestamps (milliseconds)
start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
end_datetime = datetime.strptime(end_date, "%Y-%m-%d")

# Set time to start and end of day for complete coverage
start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)

start_timestamp = int(start_datetime.timestamp() * 1000)
end_timestamp = int(end_datetime.timestamp() * 1000)
⚠️ Potential issue | 🔴 Critical

Add explicit timezone handling to prevent date boundary shifts.

The code uses naive datetime objects (no timezone), which will be interpreted in the server's local timezone when calling .timestamp(). If the server timezone differs from the user's expected timezone or ClickUp's timezone, this can cause the date range boundaries to shift unexpectedly, potentially fetching tasks from the wrong days.

Apply this diff to use UTC timezone explicitly:

-            # Convert date strings to Unix timestamps (milliseconds)
-            start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
-            end_datetime = datetime.strptime(end_date, "%Y-%m-%d")
-            
-            # Set time to start and end of day for complete coverage
-            start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
-            end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)
-            
-            start_timestamp = int(start_datetime.timestamp() * 1000)
-            end_timestamp = int(end_datetime.timestamp() * 1000)
+            # Convert date strings to Unix timestamps (milliseconds)
+            # Use UTC timezone explicitly to ensure consistent date boundaries
+            from datetime import timezone
+            
+            start_datetime = datetime.strptime(start_date, "%Y-%m-%d").replace(
+                hour=0, minute=0, second=0, microsecond=0, tzinfo=timezone.utc
+            )
+            end_datetime = datetime.strptime(end_date, "%Y-%m-%d").replace(
+                hour=23, minute=59, second=59, microsecond=999999, tzinfo=timezone.utc
+            )
+            
+            start_timestamp = int(start_datetime.timestamp() * 1000)
+            end_timestamp = int(end_datetime.timestamp() * 1000)
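For comparison, here is a stdlib-only sketch of the same UTC day-range conversion. It assumes the inputs are `YYYY-MM-DD` strings and that whole UTC days are wanted. `utc_day_range_ms` and `_utc_midnight_ms` are hypothetical helper names, not code from the ClickUp connector. Integer `timedelta` division replaces the float `timestamp() * 1000` step, so the millisecond bounds are exact:

```python
from datetime import date, datetime, timedelta, timezone
from typing import Tuple

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _utc_midnight_ms(day: date) -> int:
    """Milliseconds since the Unix epoch at 00:00:00 UTC on `day`."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    # timedelta // timedelta returns an exact int (no float rounding)
    return (midnight - _EPOCH) // timedelta(milliseconds=1)


def utc_day_range_ms(start_date: str, end_date: str) -> Tuple[int, int]:
    """Return (start_ms, end_ms) covering whole UTC days, both inclusive."""
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if end < start:
        raise ValueError("end_date must not be before start_date")
    start_ms = _utc_midnight_ms(start)
    # Last millisecond of end_date: next UTC midnight minus 1 ms
    end_ms = _utc_midnight_ms(end + timedelta(days=1)) - 1
    return start_ms, end_ms
```

The result no longer depends on the server's local timezone. For example, `utc_day_range_ms("2025-10-01", "2025-10-01")` gives `(1759276800000, 1759363199999)` on every host.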

Comment on lines +225 to 233
f"(createdDate >= '{start_date}' AND createdDate <= '{end_date}') "
f"OR (updatedDate >= '{start_date}' AND updatedDate <= '{end_date}')"
)
# TODO : This JQL needs some improvement to work as expected

_jql = f"{date_filter}"
_jql = f"{date_filter} ORDER BY created DESC"
if project_key:
_jql = (
f'project = "{project_key}" AND {date_filter} ORDER BY created DESC'
f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

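Prefer canonical JQL field names; sanitize project_key (prevents injection).

Use the canonical JQL fields created/updated. createdDate/updatedDate are accepted aliases, so this is a consistency change rather than a bug fix. The real issue is that project_key is interpolated without escaping.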

Apply this diff:

-            date_filter = (
-                f"(createdDate >= '{start_date}' AND createdDate <= '{end_date}') "
-                f"OR (updatedDate >= '{start_date}' AND updatedDate <= '{end_date}')"
-            )
+            date_filter = (
+                f"(created >= '{start_date}' AND created <= '{end_date}') "
+                f"OR (updated >= '{start_date}' AND updated <= '{end_date}')"
+            )

-            _jql = f"{date_filter} ORDER BY created DESC"
+            _jql = f"{date_filter} ORDER BY created DESC"
             if project_key:
-                _jql = (
-                    f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
-                )
+                # Escape backslashes first, then double quotes
+                safe_project = project_key.replace("\\", "\\\\").replace('"', '\\"')
+                _jql = (
+                    f'project = "{safe_project}" AND ({date_filter}) ORDER BY created DESC'
+                )

Optional refinements:

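  • Make the end bound inclusive: a bare JQL date such as "2025-10-15" means 00:00 on that day, so created <= end_date misses issues created later that day. Compare against the following day instead, e.g. created < "2025-10-16" (same for updated). JQL's startOfDay()/endOfDay() functions take relative offsets such as endOfDay("-1d"), not absolute dates, so they cannot express a fixed date range directly.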
  • Consider ORDER BY updated DESC to surface most recently changed issues.

🏁 Script executed:

#!/bin/bash
# Find any other occurrences of non-standard JQL fields
rg -nP -C2 '\b(createdDate|updatedDate)\b' --type=py

Length of output: 627


🌐 Web query:

Does Jira JQL use fields named 'createdDate' and 'updatedDate', or should it be 'created' and 'updated'?

💡 Result:

Use created and updated in JQL — createdDate and updatedDate are accepted aliases. [1][2]

Sources: Atlassian JQL fields reference (created → alias createdDate; updated → alias updatedDate). [1][2]


🏁 Script executed:

#!/bin/bash
sed -n '180,260p' surfsense_backend/app/connectors/jira_connector.py

Length of output: 2494


Escape project_key to prevent JQL injection
Sanitize or validate project_key (e.g. escape double quotes or restrict to alphanumeric) before embedding it in the JQL string.

🤖 Prompt for AI Agents
In surfsense_backend/app/connectors/jira_connector.py around lines 225 to 233,
the JQL string interpolates project_key directly which allows JQL injection;
validate or sanitize project_key before embedding it by either restricting it to
an allowed pattern (e.g. only letters, digits, hyphen/underscore) and raising an
error on invalid values, or by escaping any double quotes and backslashes in
project_key (replace " with \" and \ with \\) before inserting into the JQL;
prefer validation against a strict regex (e.g. alphanumeric with allowed
punctuation) to ensure only safe project keys are used.

MODSetter and others added 3 commits October 15, 2025 14:44
chore: updated docs & refactored sst_service.py
- Fixed Jira connector to use constructed JQL query for date filtering
- Fixed ClickUp connector to include date range parameters in API request
- Resolved merge conflicts with remote branch improvements
- Enhanced ClickUp date handling with complete day coverage (00:00:00 to 23:59:59)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
surfsense_backend/app/db.py (1)

285-297: Fix back_populates mismatches (will raise mapper configuration errors).

  • SearchSourceConnector.search_space uses back_populates="search_source_connectors", but SearchSpace defines no search_source_connectors.
  • SearchSourceConnector.user uses back_populates="search_source_connectors", but User defines no search_source_connectors.
  • LLMConfig.search_space uses back_populates="llm_configs" (Lines 353-357), but SearchSpace defines no llm_configs.
  • UserSearchSpacePreference.search_space uses back_populates="user_preferences" (Lines 391-393), but SearchSpace defines no user_preferences.

Add the missing counterparts (and avoid delete-orphan on both parents for the same child):

Apply this diff to restore/add the missing relationships:

 class SearchSpace(BaseModel, TimestampMixin):
@@
     logs = relationship(
         "Log",
         back_populates="search_space",
         order_by="Log.id",
         cascade="all, delete-orphan",
     )
+    # Parents of connectors/configs/preferences
+    search_source_connectors = relationship(
+        "SearchSourceConnector",
+        back_populates="search_space",
+        order_by="SearchSourceConnector.id",
+        cascade="all, delete-orphan",
+    )
+    llm_configs = relationship(
+        "LLMConfig",
+        back_populates="search_space",
+        order_by="LLMConfig.id",
+        cascade="all, delete-orphan",
+    )
+    user_preferences = relationship(
+        "UserSearchSpacePreference",
+        back_populates="search_space",
+        order_by="UserSearchSpacePreference.id",
+        cascade="all, delete-orphan",
+    )
     connector_schedules = relationship(
         "ConnectorSchedule",
         back_populates="search_space",
         order_by="ConnectorSchedule.id",
         cascade="all, delete-orphan",
     )

Add the missing User-side relationship in both auth variants:

@@
     class User(SQLAlchemyBaseUserTableUUID, Base):
         oauth_accounts: Mapped[list[OAuthAccount]] = relationship(
             "OAuthAccount", lazy="joined"
         )
         search_spaces = relationship("SearchSpace", back_populates="user")
         search_space_preferences = relationship(
             "UserSearchSpacePreference",
             back_populates="user",
             cascade="all, delete-orphan",
         )
+        search_source_connectors = relationship(
+            "SearchSourceConnector", back_populates="user"
+        )
@@
     class User(SQLAlchemyBaseUserTableUUID, Base):
         search_spaces = relationship("SearchSpace", back_populates="user")
         search_space_preferences = relationship(
             "UserSearchSpacePreference",
             back_populates="user",
             cascade="all, delete-orphan",
         )
+        search_source_connectors = relationship(
+            "SearchSourceConnector", back_populates="user"
+        )
🧹 Nitpick comments (3)
surfsense_backend/app/services/stt_service.py (2)

25-35: Consider error handling for model initialization failures.

The lazy loading pattern is appropriate, but model initialization could fail due to missing model files, insufficient memory, or other issues. Consider adding try-except around model creation to provide clearer error messages.

 def _get_model(self) -> WhisperModel:
     """Lazy load the Whisper model."""
     if self._model is None:
-        # Use CPU with optimizations for better performance
-        self._model = WhisperModel(
-            self.model_size,
-            device="cpu",
-            compute_type="int8",  # Quantization for faster CPU inference
-            num_workers=1,  # Single worker for stability
-        )
+        try:
+            # Use CPU with optimizations for better performance
+            self._model = WhisperModel(
+                self.model_size,
+                device="cpu",
+                compute_type="int8",  # Quantization for faster CPU inference
+                num_workers=1,  # Single worker for stability
+            )
+        except Exception as e:
+            raise RuntimeError(
+                f"Failed to initialize Whisper model '{self.model_size}': {e}"
+            ) from e
     return self._model
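The same lazy-load-and-wrap pattern, sketched without faster-whisper so it runs anywhere. `LazyResource` is a hypothetical name, and its factory stands in for the `WhisperModel(...)` call:

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Create a value on first use and wrap construction errors with context."""

    def __init__(self, name: str, factory: Callable[[], T]) -> None:
        self._name = name
        self._factory = factory
        self._value: Optional[T] = None

    def get(self) -> T:
        if self._value is None:
            try:
                self._value = self._factory()
            except Exception as e:
                # Name the resource so the failure is actionable in logs
                raise RuntimeError(f"Failed to initialize {self._name}: {e}") from e
        return self._value
```

Two design notes. A failed initialization leaves `_value` as `None`, so the next `get()` retries. There is also no lock, so concurrent first calls could each build the model; if the service is shared across threads, guarding the check with a `threading.Lock` would prevent that.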

37-68: Add error handling for transcription failures.

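The model.transcribe() call can fail for various reasons (corrupted audio, unsupported format, etc.). Consider adding try-except to provide more informative error messages rather than letting raw faster-whisper exceptions propagate. Note that faster-whisper returns segments as a lazy generator: transcription actually runs while the segments are consumed, so errors can also surface in the join over segments, outside a try that wraps only transcribe(). The try block should cover that join too.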

 def transcribe_file(self, audio_path: str, language: str | None = None) -> dict:
     """Transcribe audio file to text.
     
     Args:
         audio_path: Path to audio file
         language: Optional language code (e.g., "en", "es")
     
     Returns:
         Dict with transcription text and metadata
     """
     model = self._get_model()
 
-    # Transcribe with optimized settings
-    segments, info = model.transcribe(
-        audio_path,
-        language=language,
-        beam_size=1,  # Faster inference
-        best_of=1,  # Single pass
-        temperature=0,  # Deterministic output
-        vad_filter=True,  # Voice activity detection
-        vad_parameters={"min_silence_duration_ms": 500},
-    )
+    try:
+        # Transcribe with optimized settings
+        segments, info = model.transcribe(
+            audio_path,
+            language=language,
+            beam_size=1,  # Faster inference
+            best_of=1,  # Single pass
+            temperature=0,  # Deterministic output
+            vad_filter=True,  # Voice activity detection
+            vad_parameters={"min_silence_duration_ms": 500},
+        )
+        # segments is a lazy generator: decoding runs while it is consumed,
+        # so combine the segments inside the try block
+        text = " ".join(segment.text.strip() for segment in segments)
+    except Exception as e:
+        raise RuntimeError(
+            f"Failed to transcribe audio file '{audio_path}': {e}"
+        ) from e
-
-    # Combine all segments
-    text = " ".join(segment.text.strip() for segment in segments)
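A small stdlib demonstration of why the try block has to cover iteration when results come back as a generator. `fake_transcribe` is hypothetical and only mimics deferred work; it is not the faster-whisper API:

```python
from typing import Dict, Iterator, Tuple


def fake_transcribe(path: str) -> Tuple[Iterator[str], Dict[str, str]]:
    """Hypothetical stand-in that, like a lazy transcriber, defers work."""

    def segments() -> Iterator[str]:
        yield " hello "
        raise OSError(f"corrupt audio frame in {path}")  # fails mid-stream

    # Calling segments() only creates the generator; nothing runs yet
    return segments(), {"language": "en"}


def transcribe_text(path: str) -> str:
    try:
        segments, _info = fake_transcribe(path)
        # Consuming the generator is where the error actually surfaces
        return " ".join(s.strip() for s in segments)
    except Exception as e:
        raise RuntimeError(f"Failed to transcribe audio file '{path}': {e}") from e
```

If the `join` sat outside the `try`, the `OSError` would escape unwrapped, which is the gap a try around only the `transcribe()` call leaves.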
surfsense_backend/app/db.py (1)

300-331: Add DB-level validation for ranges (defensive integrity).

Add CHECK constraints to enforce valid ranges:

  • weekly_day: 0..6
  • hourly_minute: 0..59

Apply this diff:

-from sqlalchemy import (
+from sqlalchemy import (
     ARRAY,
     JSON,
     TIMESTAMP,
     Boolean,
     Column,
     Enum as SQLAlchemyEnum,
     ForeignKey,
     Integer,
     String,
     Text,
     Time,
     UniqueConstraint,
     text,
+    CheckConstraint,
 )
@@
 class ConnectorSchedule(BaseModel, TimestampMixin):
     __tablename__ = "connector_schedules"
-    __table_args__ = (
-        UniqueConstraint(
-            "connector_id", "search_space_id", name="uq_connector_search_space"
-        ),
-    )
+    __table_args__ = (
+        UniqueConstraint(
+            "connector_id", "search_space_id", name="uq_connector_search_space"
+        ),
+        CheckConstraint(
+            "weekly_day IS NULL OR (weekly_day BETWEEN 0 AND 6)",
+            name="ck_connector_schedules_weekly_day_range",
+        ),
+        CheckConstraint(
+            "hourly_minute IS NULL OR (hourly_minute BETWEEN 0 AND 59)",
+            name="ck_connector_schedules_hourly_minute_range",
+        ),
+    )

Optional: if you’ll filter frequently by schedule_type, consider index=True on schedule_type.
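The two CHECK expressions from the diff can be exercised with SQLite from the standard library. This only illustrates how the `IS NULL OR ... BETWEEN` form behaves; it is not the project's migration, and the table is trimmed to the two relevant columns:

```python
import sqlite3
from typing import Optional

DDL = """
CREATE TABLE connector_schedules (
    id INTEGER PRIMARY KEY,
    weekly_day SMALLINT,
    hourly_minute SMALLINT,
    CONSTRAINT ck_connector_schedules_weekly_day_range
        CHECK (weekly_day IS NULL OR (weekly_day BETWEEN 0 AND 6)),
    CONSTRAINT ck_connector_schedules_hourly_minute_range
        CHECK (hourly_minute IS NULL OR (hourly_minute BETWEEN 0 AND 59))
)
"""


def try_insert(
    conn: sqlite3.Connection, weekly_day: Optional[int], hourly_minute: Optional[int]
) -> bool:
    """Return True if the row satisfies both CHECK constraints."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO connector_schedules (weekly_day, hourly_minute) VALUES (?, ?)",
                (weekly_day, hourly_minute),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

The `IS NULL` branch keeps rows valid when a column does not apply to the schedule type, while out-of-range values such as `weekly_day = 7` are rejected by the database itself.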

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c044d42 and b07ce72.

⛔ Files ignored due to path filters (1)
  • surfsense_backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • surfsense_backend/.env.example (1 hunks)
  • surfsense_backend/app/config/__init__.py (1 hunks)
  • surfsense_backend/app/db.py (4 hunks)
  • surfsense_backend/app/routes/documents_routes.py (2 hunks)
  • surfsense_backend/app/services/stt_service.py (1 hunks)
  • surfsense_backend/pyproject.toml (1 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx (1 hunks)
  • surfsense_web/components/chat/ChatInputGroup.tsx (1 hunks)
  • surfsense_web/components/inference-params-editor.tsx (1 hunks)
  • surfsense_web/components/onboard/add-provider-step.tsx (1 hunks)
  • surfsense_web/content/docs/docker-installation.mdx (1 hunks)
  • surfsense_web/content/docs/manual-installation.mdx (1 hunks)
  • surfsense_web/drizzle.config.ts (1 hunks)
  • surfsense_web/hooks/use-connector-edit-page.ts (2 hunks)
✅ Files skipped from review due to trivial changes (6)
  • surfsense_web/content/docs/manual-installation.mdx
  • surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx
  • surfsense_web/components/chat/ChatInputGroup.tsx
  • surfsense_web/components/inference-params-editor.tsx
  • surfsense_web/drizzle.config.ts
  • surfsense_web/content/docs/docker-installation.mdx
🧰 Additional context used
📓 Path-based instructions (3)
**/.env.*

📄 CodeRabbit inference engine (.rules/no_env_files_in_repo.mdc)

Do not commit variant environment files like .env.* (e.g., .env.local, .env.production)

Files:

  • surfsense_backend/.env.example
**/.env.example

📄 CodeRabbit inference engine (.rules/no_env_files_in_repo.mdc)

Provide a .env.example file with placeholder values instead of real secrets

Files:

  • surfsense_backend/.env.example
**/*.{jsx,tsx}

📄 CodeRabbit inference engine (.rules/require_unique_id_props.mdc)

**/*.{jsx,tsx}: When mapping arrays to React elements in JSX/TSX, each rendered element must include a unique key prop
Keys used for React list items should be stable, predictable, and unique among siblings

Files:

  • surfsense_web/components/onboard/add-provider-step.tsx
🧬 Code graph analysis (1)
surfsense_backend/app/routes/documents_routes.py (2)
surfsense_backend/app/services/stt_service.py (1)
  • transcribe_file (37-68)
surfsense_backend/app/services/task_logging_service.py (1)
  • log_task_progress (164-208)
🔇 Additional comments (16)
surfsense_web/hooks/use-connector-edit-page.ts (2)

340-340: LGTM! Validation logic is correct and more concise.

The consolidated validation condition correctly checks that the parsed value is an integer between 0 and 2, matching the error message. This refactor improves readability without changing behavior.


519-519: LGTM! Form update is consistent with the codebase pattern.

The single-line form value update follows the same pattern used throughout this file for other form fields, using the || operator to provide a default empty string.

surfsense_web/components/onboard/add-provider-step.tsx (1)

21-21: LGTM: Alphabetical import ordering.

The reordering of LLM_PROVIDERS import to follow LANGUAGES improves consistency by maintaining alphabetical order within the enum imports group. This change has no functional impact.

surfsense_backend/app/config/__init__.py (1)

105-105: LGTM! Improved comment accuracy.

The updated comment correctly reflects that the STT service can now be either local (Faster-Whisper) or external (LiteLLM), rather than being limited to LiteLLM.

surfsense_backend/app/services/stt_service.py (2)

15-23: LGTM! Model size parsing with sensible fallback.

The initialization correctly extracts the model size from the config string (e.g., "local/base") with an appropriate fallback to "base" if the format is unexpected.
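A minimal sketch of that parsing rule, assuming the fallback applies whenever the value lacks a `local/` prefix or has an empty size part. `parse_model_size` is a hypothetical name, and the exact logic in stt_service.py may differ:

```python
def parse_model_size(stt_service: str, default: str = "base") -> str:
    """Extract <size> from a "local/<size>" setting (hypothetical helper)."""
    prefix, sep, size = stt_service.partition("/")
    # Only a well-formed "local/<size>" value yields a size; anything else falls back
    if sep and prefix == "local" and size:
        return size
    return default
```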


86-96: LGTM! Proper temporary file cleanup.

The temporary file is correctly cleaned up in the finally block, ensuring cleanup even if transcription fails.
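The cleanup pattern being approved can be sketched as follows. `process_upload` is a hypothetical name and `handler` stands in for the transcription call; this is not the stt_service.py code:

```python
import os
import tempfile
from typing import Callable


def process_upload(data: bytes, suffix: str, handler: Callable[[str], str]) -> str:
    """Write bytes to a temp file, run handler on its path, always delete it."""
    # delete=False so the file can be reopened by path (required on Windows)
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return handler(path)
    finally:
        # Runs on success and on error, so no temp files are leaked
        os.unlink(path)
```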

surfsense_backend/app/routes/documents_routes.py (4)

787-793: LGTM! Clean STT service type detection.

The logic correctly identifies local STT by checking if the config starts with "local/", with appropriate fallback to "external".


817-826: LGTM! Comprehensive progress logging for local STT.

The progress logging includes relevant metadata (language, confidence, duration) and clearly indicates the local transcription path.


876-893: LGTM! Proper success logging with STT service type.

The success logging correctly includes the stt_service_type metadata (line 885), enabling differentiation between local and external transcription in logs and analytics.


829-853: The file handle is safe and properly managed as-is.

The current code already keeps the file open during the async operation. The with statement context manager is the outer scope of the await atranscription() call, meaning the file remains open while the function reads it (as required by litellm), and only closes after the await completes. This is the correct and recommended pattern for handling file objects with litellm's atranscription.

The suggested refactor to manual open()/try/finally provides no benefit and is unnecessary—the context manager is superior as it guarantees proper cleanup even if an exception occurs.

Likely an incorrect or invalid review comment.
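The lifetime argument can be checked with a small asyncio sketch. `fake_atranscription` is a hypothetical stand-in for an async API call that reads from the file object, not litellm's function:

```python
import asyncio
from typing import BinaryIO, Tuple


async def fake_atranscription(file: BinaryIO) -> str:
    """Hypothetical async consumer standing in for an API call."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return file.read().decode()


async def transcribe_path(path: str) -> Tuple[str, bool]:
    with open(path, "rb") as f:
        # The await completes before the with-block exits, so f is still open
        text = await fake_atranscription(f)
    # Leaving the block closed the file; it would also close if the await raised
    return text, f.closed
```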

surfsense_backend/.env.example (1)

34-40: LGTM! Clear STT configuration with local default.

The updated configuration appropriately defaults to local STT (local/base) with clear comments explaining both local and external provider options. The commented-out external provider settings serve as useful documentation.

surfsense_backend/pyproject.toml (1)

46-46: Version constraint is appropriate.

The >=1.1.0 constraint is correct. Verification confirms that faster-whisper v1.1.0 includes all required features: WhisperModel with device, compute_type, and num_workers parameters, plus VAD filtering with customizable parameters. The constraint allows for security patches and compatible updates while maintaining the required functionality.

surfsense_backend/app/db.py (4)

19-19: Good: added Time import needed for new schedule fields.


133-138: ScheduleType enum addition looks correct and scoped.


257-262: New SearchSpace.connector_schedules relationship looks good.


300-331: Alembic migration verified: all four schedule columns are present.

The migration file surfsense_backend/alembic/versions/23_add_connector_schedules_table.py includes all four new columns in the table creation:

  • daily_time TIME
  • weekly_day SMALLINT
  • weekly_time TIME
  • hourly_minute SMALLINT

Model, schemas, and service code all properly reference these columns. No further action needed.

Comment on lines +795 to +815
# Check if using local STT service
if stt_service_type == "local":
# Use local Faster-Whisper for transcription
from app.services.stt_service import stt_service

try:
result = stt_service.transcribe_file(file_path)
transcribed_text = result.get("text", "")

if not transcribed_text:
raise ValueError("Transcription returned empty text")

# Add metadata about the transcription
transcribed_text = (
f"# Transcription of {filename}\n\n{transcribed_text}"
)
else:
except Exception as e:
raise HTTPException(
status_code=422,
detail=f"Failed to transcribe audio file {filename}: {e!s}",
) from e

⚠️ Potential issue | 🟠 Major

Replace HTTPException with regular exception in background task.

Raising HTTPException on line 812 is incorrect for background task execution. Since this code runs in process_file_in_background, HTTP exceptions cannot be properly propagated to the client. This will likely cause the exception to be logged but not handled gracefully.

Apply this diff to use appropriate exception handling:

             try:
                 result = stt_service.transcribe_file(file_path)
                 transcribed_text = result.get("text", "")
 
                 if not transcribed_text:
                     raise ValueError("Transcription returned empty text")
 
                 # Add metadata about the transcription
                 transcribed_text = (
                     f"# Transcription of {filename}\n\n{transcribed_text}"
                 )
             except Exception as e:
-                raise HTTPException(
-                    status_code=422,
-                    detail=f"Failed to transcribe audio file {filename}: {e!s}",
-                ) from e
+                # Let the exception propagate to be caught by the outer try-except
+                # which will log it via task_logger.log_task_failure
+                raise RuntimeError(
+                    f"Failed to transcribe audio file {filename}: {e!s}"
+                ) from e
🤖 Prompt for AI Agents
In surfsense_backend/app/routes/documents_routes.py around lines 795 to 815, the
background transcription block currently raises an HTTPException inside the
background task which cannot be propagated to the client; replace the
HTTPException with a regular Exception (or a custom non-HTTP exception) and
ensure you log the error before re-raising so the background worker can handle
it properly (e.g., raise Exception(f"Failed to transcribe audio file {filename}:
{e}") or log and return an error state/object instead of using HTTPException).
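A framework-free sketch of that split. It assumes a hypothetical `run_background_job` wrapper in place of `process_file_in_background`, and a plain status dict in place of `task_logger.log_task_failure`:

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("background_tasks")


def transcribe_or_raise(filename: str, transcribe: Callable[[str], dict]) -> str:
    try:
        text = transcribe(filename).get("text", "")
        if not text:
            raise ValueError("Transcription returned empty text")
        return f"# Transcription of {filename}\n\n{text}"
    except Exception as e:
        # Plain exception: there is no HTTP response to attach a status code to
        raise RuntimeError(f"Failed to transcribe audio file {filename}: {e!s}") from e


def run_background_job(filename: str, transcribe: Callable[[str], dict]) -> Dict[str, str]:
    """Outer handler: record the failure instead of raising into the void."""
    try:
        return {"status": "success", "content": transcribe_or_raise(filename, transcribe)}
    except Exception as e:
        logger.error("Background task failed: %s", e)
        return {"status": "failed", "error": str(e)}
```

The inner function only describes what went wrong; the outer handler decides how a failure is recorded, which is where a task logger belongs.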

@MODSetter
Owner

@vaishcodescape I think this should be done after we add Celery as the message queue. I believe this was already conveyed. Please DM me on Discord if you’d like to work on something that’s more likely to get merged. Closing this for now — sorry.

@MODSetter MODSetter closed this Oct 16, 2025
3 participants