feat:added automated connector scheduling system #400

vaishcodescape · 2025-10-14T17:20:23Z

Implemented Automated Connector Scheduling System

Description

This PR implements a comprehensive automated scheduling system that allows users to configure periodic syncs for their connectors (Slack, Notion, GitHub, Linear, etc.) without manual intervention. The system addresses critical user pain points around data staleness, manual overhead, and peak load issues.

The implementation includes:

Background Scheduler Service: Continuously running service that monitors and executes scheduled connector syncs
Enhanced Schedule Configuration: Support for hourly, daily, weekly, and custom cron-based schedules with flexible time selection
Comprehensive API: Full CRUD operations for schedule management with monitoring endpoints
Modern Frontend UI: Intuitive interface for creating and managing connector schedules
Robust Error Handling: Automatic retry logic, comprehensive logging, and graceful failure handling
Real-time Monitoring: Live status tracking and execution history

Motivation and Context

Currently, users must manually trigger syncs for each connector to index new content into their search spaces. This creates several critical issues:

Data Staleness: Users often forget to sync, leading to outdated search results
Manual Overhead: Tedious to sync multiple connectors regularly
Peak Load Issues: Users tend to sync during business hours, causing resource spikes
Missed Updates: Important information may not be searchable until the user manually syncs

This automated scheduling system solves these problems by providing:

Set-and-forget automation for connector syncing
Off-peak hour execution to reduce server load
Consistent data freshness without user intervention
Scalable background processing with concurrency limits

Changes Overview

Backend Infrastructure

ConnectorSchedulerService: Core background service managing schedule execution
- Continuous monitoring of due schedules (60-second intervals)
- Concurrent job execution with configurable limits (max 5 concurrent jobs)
- Automatic next-run calculation and schedule progression
- Graceful error handling with retry logic
Enhanced Database Schema: Extended ConnectorSchedule model with:
- Support for multiple schedule types (HOURLY, DAILY, WEEKLY, CUSTOM)
- Flexible time configuration options (daily_time, weekly_day, weekly_time, hourly_minute)
- Comprehensive tracking (last_run_at, next_run_at, is_active)
- Proper foreign key relationships and constraints
Schedule Helpers: Enhanced time calculation utilities supporting:
- Configurable daily times (default: 2 AM off-peak hours)
- Weekly schedules with day and time selection
- Hourly schedules with minute precision
- Custom cron expression validation and parsing
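
A minimal sketch of how next-run calculation for the hourly, daily, and weekly schedule types could work. It assumes timezone-aware UTC datetimes and Monday = 0 for the weekday; the function names are hypothetical and this is not the `calculate_next_run` helper from this PR. Custom cron schedules are left out.

```python
from datetime import datetime, time, timedelta, timezone


def next_hourly(now: datetime, minute: int = 0) -> datetime:
    """Next occurrence of the given minute past the hour, strictly after now."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate


def next_daily(now: datetime, at: time = time(2, 0)) -> datetime:
    """Next occurrence of the given wall time (default 02:00), strictly after now."""
    candidate = now.replace(hour=at.hour, minute=at.minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_weekly(now: datetime, weekday: int, at: time = time(2, 0)) -> datetime:
    """Next occurrence of weekday (assumed 0=Monday .. 6=Sunday) at the given time."""
    if not 0 <= weekday <= 6:
        raise ValueError("weekday must be between 0 and 6")
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=at.hour, minute=at.minute, second=0, microsecond=0
    )
    if candidate <= now:
        # Same weekday but the time has already passed: roll over a full week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    now = datetime(2025, 10, 14, 17, 20, 23, tzinfo=timezone.utc)
    print(next_hourly(now, 15), next_daily(now), next_weekly(now, 0))
```

Each helper returns a time strictly after `now`, so a schedule that has just run is not picked up again on the same tick.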

API Endpoints

Schedule Management: Full CRUD operations for connector schedules
- POST /api/v1/connector-schedules/ - Create new schedules
- GET /api/v1/connector-schedules/ - List schedules with filtering
- PUT /api/v1/connector-schedules/{id} - Update existing schedules
- DELETE /api/v1/connector-schedules/{id} - Remove schedules
- PATCH /api/v1/connector-schedules/{id}/toggle - Activate/deactivate
Scheduler Monitoring: Real-time status and control endpoints
- GET /api/v1/scheduler/status - Current scheduler health and statistics
- POST /api/v1/scheduler/schedules/{id}/force-execute - Manual execution
- GET /api/v1/scheduler/schedules/upcoming - Next scheduled executions
- GET /api/v1/scheduler/schedules/recent-executions - Execution history

Frontend Implementation

Schedule Management UI: Comprehensive React-based interface
- Intuitive schedule creation with type-specific configuration options
- Real-time status dashboard showing scheduler health and active jobs
- Schedule listing with execution history and next run times
- One-click schedule activation/deactivation and manual execution
- Responsive design with proper loading states and error handling

Integration & Lifecycle Management

Application Integration: Seamless integration with FastAPI lifespan events
- Automatic scheduler startup on application launch
- Graceful shutdown with active job cancellation
- Proper resource cleanup and task management
Background Task Integration: Leverages existing FastAPI BackgroundTasks infrastructure
- Reuses proven connector indexer functions
- Maintains existing error handling and logging patterns
- Preserves user isolation and security boundaries
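
For illustration, the startup and shutdown behaviour described above can be sketched with the standard library alone. The `lifespan(app)` function has the async context manager shape that FastAPI lifespan handlers use, but `run_scheduler` and the returned state dict are stand-ins, not the code in `app/app.py`.

```python
import asyncio
from contextlib import asynccontextmanager, suppress


async def run_scheduler(interval_seconds: float, ticks: list) -> None:
    """Stand-in scheduler loop: record a tick, then sleep until the next check."""
    while True:
        ticks.append("tick")
        await asyncio.sleep(interval_seconds)


@asynccontextmanager
async def lifespan(app, interval_seconds: float = 60.0):
    ticks: list = []
    # Spawn the loop as a background task so startup is not blocked by it.
    task = asyncio.create_task(run_scheduler(interval_seconds, ticks))
    try:
        yield {"task": task, "ticks": ticks}
    finally:
        # Shutdown: cancel the loop and wait for the cancellation to finish.
        task.cancel()
        with suppress(asyncio.CancelledError):
            await task


async def demo() -> int:
    async with lifespan(app=None, interval_seconds=0.01) as state:
        await asyncio.sleep(0.05)
    return len(state["ticks"])


if __name__ == "__main__":
    print("ticks recorded:", asyncio.run(demo()))
```

Awaiting the cancelled task in the `finally` block is what keeps shutdown from leaving a pending task behind.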

Monitoring & Observability

Comprehensive Logging: Detailed execution tracking through existing TaskLoggingService
- Task start/success/failure logging with metadata
- Error message capture and debugging information
- Execution duration and document processing statistics
Real-time Status: Live monitoring capabilities
- Active job tracking and concurrency limits
- Scheduler health status and configuration details
- Upcoming schedule visibility for debugging

Technical Implementation Details

Architecture

Service Pattern: Singleton scheduler service with global lifecycle management
Async/Await: Full asynchronous implementation for optimal performance
Database Transactions: Proper transaction handling for schedule updates
Error Isolation: Individual schedule failures don't affect other schedules
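
A rough sketch of how bounded concurrency and per-schedule error isolation can be combined in asyncio. `run_due_jobs` and its arguments are hypothetical; the actual `ConnectorSchedulerService` may be structured differently.

```python
import asyncio

MAX_CONCURRENT_JOBS = 5  # matches the limit stated in this PR description


async def run_due_jobs(jobs: dict, max_concurrent: int = MAX_CONCURRENT_JOBS) -> dict:
    """Run zero-argument coroutine functions keyed by schedule id.

    Returns {schedule_id: None} on success or {schedule_id: exception} on failure.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(schedule_id, job):
        async with semaphore:  # at most max_concurrent jobs run at once
            try:
                await job()
                return schedule_id, None
            except Exception as exc:  # one failure must not stop the others
                return schedule_id, exc

    pairs = await asyncio.gather(*(guarded(sid, job) for sid, job in jobs.items()))
    return dict(pairs)


async def demo() -> dict:
    async def ok():
        await asyncio.sleep(0.01)

    async def failing():
        raise RuntimeError("connector unavailable")

    return await run_due_jobs({1: ok, 2: failing, 3: ok})


if __name__ == "__main__":
    for schedule_id, error in asyncio.run(demo()).items():
        print(schedule_id, "failed" if error else "ok")
```

Catching the exception inside `guarded` means a failing schedule is reported in the result instead of propagating out of `gather`.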

Security & Validation

User Isolation: Schedules are scoped to user-owned connectors and search spaces
Input Validation: Comprehensive Pydantic validation for schedule parameters
Cron Expression Validation: Safe cron parsing with error handling
Ownership Verification: Proper authorization checks for all operations
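
To make the cron validation step concrete, here is a simplified stdlib-only syntax check for five-field cron expressions. It is a stand-in for illustration, not the croniter-based `is_valid_cron_expression` helper in this PR, and it assumes numeric fields only (no names such as `MON` or `JAN`).

```python
import re

# Allowed (min, max) per field: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
# One list item: "*", "N" or "N-M", optionally followed by "/step".
_PART = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def is_valid_cron(expression: str) -> bool:
    """Return True if the expression is a syntactically valid 5-field cron string."""
    fields = expression.split()
    if len(fields) != 5:
        return False
    for field, (low, high) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            match = _PART.match(part)
            if match is None:
                return False
            base, step = match.groups()
            if step is not None and int(step) == 0:
                return False
            if base != "*":
                bounds = [int(n) for n in base.split("-")]
                if any(not low <= n <= high for n in bounds):
                    return False
                if len(bounds) == 2 and bounds[0] > bounds[1]:
                    return False
    return True


if __name__ == "__main__":
    for expr in ["0 2 * * *", "*/15 * * * *", "60 * * * *", "0 2 * *"]:
        print(repr(expr), is_valid_cron(expr))
```

Returning a boolean instead of raising keeps the check easy to call from a schema validator, which can then produce its own error message.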

Performance Considerations

Concurrent Execution: Configurable limits prevent resource exhaustion
Efficient Queries: Optimized database queries with proper indexing
Incremental Sync: Smart date range calculation for efficient data processing
Background Processing: Non-blocking execution doesn't impact API responsiveness

Files Modified/Created

New Files

app/services/connector_scheduler_service.py - Core scheduler implementation
app/routes/scheduler_routes.py - Scheduler monitoring and control APIs
app/dashboard/[search_space_id]/connectors/schedules/page.tsx - Frontend UI

Modified Files

app/app.py - Integrated scheduler lifecycle management
app/schemas/connector_schedule.py - Enhanced validation and time options
app/utils/schedule_helpers.py - Extended time calculation utilities

Database

alembic/versions/23_add_connector_schedules_table.py - Schema migration (already existed)

API Changes

This PR includes API changes

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance improvement (non-breaking change which enhances performance)
Documentation update
Breaking change (fix or feature that would cause existing functionality to change)

Testing

I have tested these changes locally
I have added/updated unit tests
I have added/updated integration tests

Checklist:

My code follows the code style of this project
My change requires documentation updates
I have updated the documentation accordingly
My change requires dependency updates
I have updated the dependencies accordingly
My code builds clean without any errors or warnings
All new and existing tests passed

High-level PR Summary

This PR implements a comprehensive automated connector scheduling system that enables users to configure periodic syncs (hourly, daily, weekly, or custom cron-based) for their connectors without manual intervention. The implementation includes a background scheduler service that continuously monitors and executes scheduled jobs, enhanced database schema with the ConnectorSchedule model supporting multiple schedule types, complete REST APIs for schedule CRUD operations and monitoring, a modern React frontend for schedule management, and robust error handling with automatic retry logic and comprehensive logging. The scheduler integrates seamlessly with FastAPI lifespan events and leverages existing connector indexer functions to maintain user isolation and security boundaries.

⏱️ Estimated Review Time: 1-3 hours

💡 Review Order Suggestion

Order	File Path
1	`surfsense_backend/alembic/versions/23_add_connector_schedules_table.py`
2	`surfsense_backend/app/db.py`
3	`surfsense_backend/app/schemas/connector_schedule.py`
4	`surfsense_backend/app/schemas/__init__.py`
5	`surfsense_backend/app/utils/schedule_helpers.py`
6	`surfsense_backend/app/services/connector_scheduler_service.py`
7	`surfsense_backend/app/routes/connector_schedules_routes.py`
8	`surfsense_backend/app/routes/scheduler_routes.py`
9	`surfsense_backend/app/app.py`
10	`surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx`
11	`surfsense_web/package-lock.json`
12	`.idea/.gitignore`
13	`.idea/SurfSense.iml`
14	`.idea/inspectionProfiles/profiles_settings.xml`
15	`.idea/modules.xml`
16	`.idea/vcs.xml`

⚠️ Inconsistent Changes Detected

File Path	Warning
`.idea/.gitignore`	IDE configuration files should typically be in .gitignore rather than committed to the repository
`.idea/SurfSense.iml`	IDE-specific IntelliJ/PyCharm configuration files are unrelated to the scheduling feature and should not be committed
`.idea/inspectionProfiles/profiles_settings.xml`	IDE-specific inspection profile settings are unrelated to the scheduling feature
`.idea/modules.xml`	IDE-specific project module configuration is unrelated to the scheduling feature
`.idea/vcs.xml`	IDE-specific VCS configuration is unrelated to the scheduling feature

High-level PR Summary

This PR implements a comprehensive automated connector scheduling system that enables users to configure periodic syncs (hourly, daily, weekly, or custom cron-based) for their connectors without manual intervention. The implementation includes a background scheduler service (ConnectorSchedulerService) that continuously monitors and executes scheduled jobs with configurable concurrency limits, enhanced database schema with the ConnectorSchedule model supporting multiple schedule types and tracking fields, complete REST APIs for schedule CRUD operations and real-time monitoring, a modern React frontend for intuitive schedule management, and robust error handling with automatic retry logic and comprehensive logging. The scheduler integrates seamlessly with FastAPI lifespan events, leverages existing connector indexer functions to maintain user isolation and security boundaries, and provides real-time status tracking with execution history for monitoring and debugging.

⚠️ Inconsistent Changes Detected

File Path	Warning
`.idea/.gitignore`	IDE configuration files (.idea directory) should typically be in .gitignore rather than committed to the repository, as they are user/environment-specific and unrelated to the automated connector scheduling feature
`.idea/SurfSense.iml`	IntelliJ/PyCharm IDE-specific project configuration file is unrelated to the scheduling feature and should not be committed to version control
`.idea/inspectionProfiles/profiles_settings.xml`	IDE-specific inspection profile settings are environment-specific and unrelated to the automated connector scheduling feature
`.idea/modules.xml`	IDE-specific project module configuration is unrelated to the scheduling feature and represents personal development environment settings
`.idea/vcs.xml`	IDE-specific VCS configuration is unrelated to the automated connector scheduling feature and should be excluded via .gitignore

Summary by CodeRabbit

New Features
- Connector scheduling (Hourly, Daily, Weekly, Custom cron): full CRUD, background execution, manual force-run, and web UI for managing schedules.
- Local speech-to-text support (faster local transcription) with integrated transcription metadata.
Improvements
- Scheduler lifecycle, next-run calculation and cron validation; status and recent-execution endpoints.
- Improved connector indexing/date-range filtering (Jira/ClickUp) and increased documents page default page size.
Chores
- Added IDE config entries and updated .gitignore.

- Add STT service with CPU-optimized Faster-Whisper
- Add API endpoints for transcription and model management
- Add React audio recorder component
- Support multiple Whisper models (tiny to large-v3)
- Include error handling for corrupted/invalid files
- Tested with real speech audio (99% accuracy)
- No external API dependencies, fully offline

- Simplify STT_SERVICE config to local/MODEL_SIZE format
- Remove separate STT routes, integrate with document upload
- Add local STT support to audio file processing pipeline
- Remove React component, use existing upload interface
- Support both local Faster-Whisper and external STT services
- Tested with real speech: 99% accuracy, 2.87s processing

- Compute stt_service_type once and reuse
- Follow DRY principles
- Improve code maintainability

- Use .get() for safe dictionary access instead of direct key access
- Add explicit try-catch for local STT transcription failures
- Validate transcription result is not empty
- Provide clear error messages for corrupted audio files
- Match error handling pattern with external STT service

- Add header to local STT transcription for consistency
- Add empty text validation for external STT path
- Refactor external STT to eliminate duplication in atranscription calls
- Ensure both local and external paths have consistent error handling

vercel · 2025-10-14T17:20:29Z

@vaishcodescape is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2025-10-14T17:20:48Z

Walkthrough

Adds end-to-end connector scheduling: DB migration and ORM model, Pydantic schemas, schedule helpers, a background ConnectorSchedulerService with lifecycle and force-execute, new API routes and frontend UI, STT local support and related connector/date-range updates, plus IDE project files and app wiring.

Changes

Cohort / File(s)	Summary
IDE configuration `\.idea/.gitignore`, `\.idea/SurfSense.iml`, `\.idea/inspectionProfiles/profiles_settings.xml`, `\.idea/modules.xml`, `\.idea/vcs.xml`	Adds IntelliJ project files and ignore entries (`shelf/`, `workspace.xml`); defines module, inspection profile, module mapping, and Git VCS mapping.
DB migration `surfsense_backend/alembic/versions/23_add_connector_schedules_table.py`	New Alembic migration: creates `scheduletype` enum and `connector_schedules` table with unique constraint on (`connector_id`, `search_space_id`), multiple columns, and conditional indexes; downgrade removes them.
ORM models `surfsense_backend/app/db.py`	Adds `ScheduleType` enum and `ConnectorSchedule` model; updates `SearchSpace` and `SearchSourceConnector` relationships to include schedules and adjust cascades/ownership.
Schemas `surfsense_backend/app/schemas/__init__.py`, `surfsense_backend/app/schemas/connector_schedule.py`	Adds Pydantic schemas: `ConnectorScheduleBase`, `ConnectorScheduleCreate`, `ConnectorScheduleUpdate`, `ConnectorScheduleRead` with cross-field validation and exports.
Scheduling utils `surfsense_backend/app/utils/schedule_helpers.py`	Adds `calculate_next_run` and `is_valid_cron_expression` helpers for HOURLY/DAILY/WEEKLY/CUSTOM schedules (uses croniter) and validations.
Scheduler service `surfsense_backend/app/services/connector_scheduler_service.py`	Implements `ConnectorSchedulerService`: periodic loop, due-schedule discovery, concurrency control, indexer mapping, task logging, start/stop lifecycle, status reporting, and force-execute; global getters.
API routes: connector schedules `surfsense_backend/app/routes/connector_schedules_routes.py`	New CRUD and toggle endpoints for `ConnectorSchedule` with ownership and indexability checks, next_run calculation, error handling, and DB session usage.
API routes: scheduler control `surfsense_backend/app/routes/scheduler_routes.py`	New scheduler endpoints: status, force-execute (background task), upcoming schedules, and recent executions; uses auth, background tasks, and DB queries.
App wiring `surfsense_backend/app/app.py`	Integrates scheduler lifecycle into app lifespan (start/stop), registers new routers (`connector-schedules`, `scheduler`), enables CORS, and ensures DB tables created at startup.
Frontend: schedules UI `surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx`	Adds page to list/create/toggle/force-execute schedules, show scheduler status, dynamic form fields per schedule type, and interactions with new API endpoints.
Connectors: ClickUp & Jira `surfsense_backend/app/connectors/clickup_connector.py`, `surfsense_backend/app/connectors/jira_connector.py`	ClickUp: use date-range parsing and query params for task range. Jira: extend JQL to include updatedDate, ordering, and project-scoped grouping.
Local STT & docs `surfsense_backend/app/services/stt_service.py`, `surfsense_backend/app/routes/documents_routes.py`, `surfsense_backend/.env.example`, docs files, `pyproject.toml`	Adds faster-whisper dependency and local STT service with file/bytes transcription; documents and env examples updated to reflect local vs remote STT/TTS options.
Misc frontend & formatting `surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx`, various web components and config files	Small UI default changes (page size), minor formatting/import reorders across several frontend files and drizzle config.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant FE as Frontend (Schedules Page)
  participant API as FastAPI (/api/v1)
  participant Svc as ConnectorSchedulerService
  participant DB as DB (AsyncSession)
  rect rgba(200,230,255,0.18)
    note over API,Svc: App startup wiring
    API->>Svc: start_scheduler()
    Svc->>Svc: start background loop (check_interval)
  end
  User->>FE: open schedules page
  FE->>API: GET /api/v1/scheduler/status
  API->>Svc: get_scheduler_status()
  Svc-->>API: status JSON
  API-->>FE: status JSON
  FE->>API: GET /api/v1/connector-schedules?search_space_id=…
  API->>DB: query schedules (+ connector, search_space)
  DB-->>API: schedules
  API-->>FE: schedules JSON
  rect rgba(220,255,220,0.18)
    note over Svc,DB: Periodic scheduler run
    Svc->>DB: find due schedules
    DB-->>Svc: due schedules
    loop per due schedule (bounded by concurrency)
      Svc->>DB: update last_run_at
      Svc->>Svc: dispatch indexer task (by connector type)
      Svc->>DB: update next_run_at
    end
  end

sequenceDiagram
  autonumber
  actor User
  participant FE as Frontend
  participant API as FastAPI (/api/v1)
  participant BG as BackgroundTasks
  participant Svc as ConnectorSchedulerService
  User->>FE: Click "Force execute"
  FE->>API: POST /api/v1/scheduler/schedules/{id}/force-execute
  API->>BG: enqueue _force_execute_schedule_task(schedule_id)
  API-->>FE: 202 Accepted
  BG->>Svc: force_execute_schedule(schedule_id)
  Svc->>DB: validate schedule active & fetch
  Svc->>Svc: _execute_schedule -> update last_run_at/next_run_at and run indexer
  Svc-->>BG: result / logs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

local STT implementation with Faster-Whisper #387 — Related local STT work: adds faster-whisper STT, documents, and service integration overlapping this PR's STT changes.
[Feature] Add clickup connector #242 — Overlaps ClickUp connector additions and date-range handling updated here.
feat(BACKEND): Implement task logging for connector indexing tasks #193 — Related task-logging integration used by the scheduler and indexer execution flows.

Suggested reviewers

MODSetter

Poem

I thump my paws to mark the time—tick, hop, run!
Cron-flowers open, hourly chores begun.
I queue the jobs, I nudge the night and day,
A rabbit on a schedule, eager to play. 🥕
Click “force” — I dash, indexing all the way.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “feat:added automated connector scheduling system” directly and concisely reflects the primary feature introduced by this pull request, namely the implementation of an automated scheduling service for connector syncs, and aligns with the detailed objectives without including irrelevant details or generic phrasing.

recurseml

Review by RecurseML

🔍 Review performed on d86aaea..b870ddb

Location	Issue
surfsense_backend/app/routes/connector_schedules_routes.py:206	Incomplete route implementation causing syntax error
surfsense_backend/app/app.py:101	Missing router registration breaks API
surfsense_backend/app/utils/schedule_helpers.py:35	Timezone-naive datetime causes comparison errors
surfsense_backend/app/routes/connector_schedules_routes.py:84	Missing time parameters causes incorrect schedules
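
The timezone issue flagged for schedule_helpers.py typically appears when a naive datetime is compared with an aware one, which raises `TypeError` in Python. A small sketch of one way to normalize before comparing, assuming naive values are meant as UTC (an assumption, not confirmed by this PR); `ensure_utc` and `is_due` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Optional


def ensure_utc(value: datetime) -> datetime:
    """Treat naive datetimes as UTC (assumption); convert aware ones to UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def is_due(next_run_at: datetime, now: Optional[datetime] = None) -> bool:
    """True when the scheduled time is at or before the current time."""
    if now is None:
        now = datetime.now(timezone.utc)
    return ensure_utc(next_run_at) <= ensure_utc(now)


if __name__ == "__main__":
    naive = datetime(2025, 10, 14, 2, 0)  # e.g. loaded without tzinfo
    aware = datetime(2025, 10, 14, 17, 20, tzinfo=timezone.utc)
    try:
        naive <= aware
    except TypeError as exc:
        print("direct comparison fails:", exc)
    print("normalized comparison:", is_due(naive, now=aware))
```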

✅ Files analyzed, no issues (12)

• .idea/.gitignore
• .idea/SurfSense.iml
• .idea/inspectionProfiles/profiles_settings.xml
• .idea/modules.xml
• .idea/vcs.xml
• surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
• surfsense_backend/app/db.py
• surfsense_backend/app/routes/scheduler_routes.py
• surfsense_backend/app/schemas/__init__.py
• surfsense_backend/app/schemas/connector_schedule.py
• surfsense_backend/app/services/connector_scheduler_service.py
• surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

⏭️ Files skipped (1)

Locations
`surfsense_web/package-lock.json`

coderabbitai

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

surfsense_backend/app/app.py (1)
56-63: CORS misconfiguration: allow_credentials=True with "*" origins.

Browsers disallow wildcard origin when credentials are enabled; requests will fail and it’s insecure. Use explicit origins or set allow_credentials=False.

Example:
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=config.CORS_ALLOW_ORIGINS,  # e.g., ["https://app.example.com"]
+    allow_credentials=True,
+    allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
+    allow_headers=["*"],
+)
Ensure config.CORS_ALLOW_ORIGINS is defined appropriately per env.
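
One possible way to populate such a setting per environment is to parse a comma-separated environment variable and reject the wildcard up front. This is a sketch only: the `CORS_ALLOW_ORIGINS` variable name and the `parse_cors_origins` helper are assumptions, not part of the project's config module.

```python
import os


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and trailing slashes."""
    origins = [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]
    if not origins:
        raise ValueError("at least one explicit origin is required")
    if "*" in origins:
        # A wildcard origin cannot be combined with allow_credentials=True.
        raise ValueError("wildcard origin is not allowed when credentials are enabled")
    return origins


if __name__ == "__main__":
    # Hypothetical variable name; falls back to a local dev origin.
    raw = os.environ.get("CORS_ALLOW_ORIGINS", "http://localhost:3000")
    print(parse_cors_origins(raw))
```
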
surfsense_backend/app/db.py (1)
283-296: Define missing reverse relationships in SearchSpace
Add these three relationships to the SearchSpace class to match existing back_populates and avoid mapping errors:
search_source_connectors = relationship(
    "SearchSourceConnector",
    back_populates="search_space",
    order_by="SearchSourceConnector.id",
    cascade="all, delete-orphan",
)
llm_configs = relationship(
    "LLMConfig",
    back_populates="search_space",
    order_by="LLMConfig.id",
    cascade="all, delete-orphan",
)
user_preferences = relationship(
    "UserSearchSpacePreference",
    back_populates="search_space",
    order_by="UserSearchSpacePreference.id",
    cascade="all, delete-orphan",
)
Place these inside the SearchSpace class (e.g. after connector_schedules).

🧹 Nitpick comments (11)

.idea/SurfSense.iml (1)

1-8: Remove IDE artifacts from version control.

This IntelliJ module file is environment-specific, churns often, and doesn’t contribute to the backend/frontend feature set. Please drop it (and the other .idea XML files) from the repo and rely on .gitignore to keep the directory untracked.
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1)
61-78: Indexes: add composite (is_active, next_run_at); consider dropping redundant id index.

Scheduler likely queries WHERE is_active = TRUE AND next_run_at <= now(); a composite index helps.

The explicit id index duplicates the PK index; remove if not needed.

Apply this diff to add indexes:
     if "ix_connector_schedules_id" not in existing_indexes:
-        op.create_index("ix_connector_schedules_id", "connector_schedules", ["id"])
+        # Optional: id already indexed by PK; consider skipping this extra index.
+        pass
@@
     if "ix_connector_schedules_next_run_at" not in existing_indexes:
         op.create_index(
             "ix_connector_schedules_next_run_at", "connector_schedules", ["next_run_at"]
         )
+    if "ix_connector_schedules_search_space_id" not in existing_indexes:
+        op.create_index(
+            "ix_connector_schedules_search_space_id",
+            "connector_schedules",
+            ["search_space_id"],
+        )
+    if "ix_connector_schedules_is_active_next_run_at" not in existing_indexes:
+        op.create_index(
+            "ix_connector_schedules_is_active_next_run_at",
+            "connector_schedules",
+            ["is_active", "next_run_at"],
+        )
And include drops in downgrade:
-    op.drop_index("ix_connector_schedules_next_run_at", table_name="connector_schedules")
+    op.drop_index("ix_connector_schedules_is_active_next_run_at", table_name="connector_schedules")
+    op.drop_index("ix_connector_schedules_search_space_id", table_name="connector_schedules")
+    op.drop_index("ix_connector_schedules_next_run_at", table_name="connector_schedules")
-    op.drop_index("ix_connector_schedules_id", table_name="connector_schedules")
+    # If created earlier; otherwise safe to skip or guard with IF EXISTS pattern
+    # op.drop_index("ix_connector_schedules_id", table_name="connector_schedules")
surfsense_backend/app/app.py (2)
31-33: Async task is unnecessary; just await start_scheduler().

start_scheduler completes quickly; creating/canceling a task adds noise. Prefer awaiting directly and drop cancel block.

Apply this diff:
-    scheduler_task = asyncio.create_task(start_scheduler())
-    logger.info("Connector scheduler service started")
+    await start_scheduler()
+    logger.info("Connector scheduler service started")
Then remove the later cancel section (Lines 43-49).

26-29: Avoid create_all in production; prefer Alembic migrations.

Running create_db_and_tables() alongside Alembic can cause drift. Gate behind a dev flag or remove in prod environments.
surfsense_backend/app/utils/schedule_helpers.py (2)
37-41: Optional: clamp hourly_minute defensively.

Schemas validate range, but add a local guard to avoid surprises if called directly.

Apply this diff:
-        minute = hourly_minute if hourly_minute is not None else 0
+        minute = hourly_minute if hourly_minute is not None else 0
+        if minute < 0 or minute > 59:
+            raise ValueError("hourly_minute must be between 0 and 59")
51-64: Weekly scheduling logic: clarify default timezone semantics.

If daily/weekly times represent local-wall times, consider storing a timezone per search space or using config default TZ; current UTC assumption may surprise users across DST.
surfsense_backend/app/schemas/connector_schedule.py (1)
88-91: Consider adding optional time fields to Update schema

Allow updating daily/weekly/hourly options if persisted in DB; otherwise clients can’t change them post‑create.
 class ConnectorScheduleUpdate(BaseModel):
@@
-    schedule_type: ScheduleType | None = None
-    cron_expression: str | None = None
-    is_active: bool | None = None
+    schedule_type: ScheduleType | None = None
+    cron_expression: str | None = None
+    is_active: bool | None = None
+    daily_time: time | None = None
+    weekly_day: int | None = None
+    weekly_time: time | None = None
+    hourly_minute: int | None = None
Also applies to: 94-100
surfsense_backend/app/routes/scheduler_routes.py (1)
95-106: Bound the limit parameter to avoid heavy queries

Add simple clamping/validation (e.g., 1..100) to prevent abuse.
-async def get_upcoming_schedules(
-    limit: int = 10,
+async def get_upcoming_schedules(
+    limit: int = 10,
@@
-    try:
+    try:
+        limit = max(1, min(limit, 100))
@@
-async def get_recent_schedule_executions(
-    limit: int = 20,
+async def get_recent_schedule_executions(
+    limit: int = 20,
@@
-    try:
+    try:
+        limit = max(1, min(limit, 100))
Also applies to: 152-163
surfsense_backend/app/routes/connector_schedules_routes.py (1)

41-45: Frontend relies on filtering by search_space_id; ensure API supports it

The UI calls GET /connector-schedules/?search_space_id=... which your code supports. Consider also adding a similar filter to /search-source-connectors to simplify client-side filtering.

Also applies to: 118-121
surfsense_backend/app/services/connector_scheduler_service.py (2)
289-303: Recalculate next_run with full time options if persisted

Only schedule_type/cron are passed, ignoring daily/weekly/hourly options. If these fields exist in DB, include them.
-        next_run = calculate_next_run(
-            schedule.schedule_type, schedule.cron_expression
-        )
+        next_run = calculate_next_run(
+            schedule.schedule_type,
+            schedule.cron_expression,
+            getattr(schedule, "daily_time", None),
+            getattr(schedule, "weekly_day", None),
+            getattr(schedule, "weekly_time", None),
+            getattr(schedule, "hourly_minute", None),
+        )
366-370: start_scheduler awaits an infinite loop; ensure it’s spawned as a background task

If start_scheduler() is awaited directly in app startup, it will block the app. Spawn with asyncio.create_task(start_scheduler()) in lifespan/startup hooks.

Where is start_scheduler() invoked? If it’s awaited, switch to creating a task.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d86aaea and 6fa1a00.

⛔ Files ignored due to path filters (1)

surfsense_web/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (15)

.idea/.gitignore (1 hunks)
.idea/SurfSense.iml (1 hunks)
.idea/inspectionProfiles/profiles_settings.xml (1 hunks)
.idea/modules.xml (1 hunks)
.idea/vcs.xml (1 hunks)
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1 hunks)
surfsense_backend/app/app.py (3 hunks)
surfsense_backend/app/db.py (3 hunks)
surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
surfsense_backend/app/schemas/__init__.py (2 hunks)
surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
surfsense_backend/app/utils/schedule_helpers.py (1 hunks)
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{jsx,tsx}

📄 CodeRabbit inference engine (.rules/require_unique_id_props.mdc)

**/*.{jsx,tsx}: When mapping arrays to React elements in JSX/TSX, each rendered element must include a unique key prop
Keys used for React list items should be stable, predictable, and unique among siblings

Files:

surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

🧬 Code graph analysis (8)

surfsense_backend/app/schemas/__init__.py (1)

surfsense_backend/app/schemas/connector_schedule.py (4)

ConnectorScheduleBase (11-85)

ConnectorScheduleCreate (88-91)

ConnectorScheduleRead (113-119)

ConnectorScheduleUpdate (94-110)

surfsense_backend/app/schemas/connector_schedule.py (2)

surfsense_backend/app/db.py (2)

BaseModel (154-158)

ScheduleType (132-136)

surfsense_backend/app/schemas/base.py (2)

IDModel (11-13)

TimestampModel (6-8)

surfsense_backend/app/routes/connector_schedules_routes.py (4)

surfsense_backend/app/db.py (6)

ConnectorSchedule (298-324)

SearchSourceConnector (263-295)

SearchSpace (220-260)

User (418-427)

User (431-437)

get_async_session (478-480)

surfsense_backend/app/schemas/connector_schedule.py (3)

ConnectorScheduleCreate (88-91)

ConnectorScheduleRead (113-119)

ConnectorScheduleUpdate (94-110)

surfsense_backend/app/utils/check_ownership.py (1)

check_ownership (9-19)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-76)

surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1)

surfsense_backend/app/db.py (1)

ConnectorSchedule (298-324)

surfsense_backend/app/app.py (2)

surfsense_backend/app/services/connector_scheduler_service.py (2)

start_scheduler (366-369)

stop_scheduler (372-377)

surfsense_backend/app/db.py (1)

create_db_and_tables (471-475)

surfsense_backend/app/utils/schedule_helpers.py (1)

surfsense_backend/app/db.py (1)

ScheduleType (132-136)

surfsense_backend/app/services/connector_scheduler_service.py (5)

surfsense_backend/app/db.py (3)

ConnectorSchedule (298-324)

SearchSourceConnectorType (55-71)

get_async_session (478-480)

surfsense_backend/app/services/task_logging_service.py (2)

TaskLoggingService (13-243)

log_task_start (20-58)

surfsense_backend/app/tasks/connector_indexers/slack_indexer.py (1)

index_slack_messages (30-377)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-76)

surfsense_backend/app/routes/scheduler_routes.py (2)

get_scheduler_status (24-45)

force_execute_schedule (49-80)

surfsense_backend/app/routes/scheduler_routes.py (3)

surfsense_backend/app/db.py (3)

get_async_session (478-480)

ConnectorSchedule (298-324)

SearchSpace (220-260)

surfsense_backend/app/schemas/connector_schedule.py (1)

ConnectorScheduleRead (113-119)

surfsense_backend/app/services/connector_scheduler_service.py (3)

get_scheduler (358-363)

get_scheduler_status (305-320)

force_execute_schedule (322-351)

🪛 GitHub Actions: Code Quality Checks

surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

[error] 3-3: biome-check-web: Imports are not sorted. Safe fix available via Organize Imports (Biome).

surfsense_backend/app/app.py

[error] 46-49: SIM105 Use contextlib.suppress(asyncio.CancelledError) instead of try-except-pass

🪛 GitHub Actions: pre-commit

surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

[error] 412-416: lint/correctness/useUniqueElementIds: id attribute should not be a static string literal. Generate unique IDs using useId().

surfsense_backend/app/app.py

[error] 46-49: SIM105 Use contextlib.suppress(asyncio.CancelledError) instead of try-except-pass

🔇 Additional comments (3)

surfsense_backend/app/schemas/__init__.py (1)

11-16: LGTM: ConnectorSchedule schemas exposed cleanly.

Re-exports look consistent with usage across routes/services. No issues.

Also applies to: 61-64

surfsense_backend/app/db.py (2)

132-137: LGTM: ScheduleType enum.

Enum values align with schemas and helpers.

255-260: LGTM: SearchSpace.connector_schedules relationship.

Naming and cascade settings look consistent.

surfsense_backend/alembic/versions/23_add_connector_schedules_table.py

surfsense_backend/app/app.py

surfsense_backend/app/db.py

surfsense_backend/app/routes/connector_schedules_routes.py

surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx

recurseml

Review by RecurseML

🔍 Review performed on b870ddb..6fa1a00

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (50)

• README.md
• docs/chinese-llm-setup.md
• surfsense_backend/alembic/env.py
• surfsense_backend/alembic/versions/23_associate_connectors_with_search_spaces.py
• surfsense_backend/alembic/versions/24_fix_null_chat_types.py
• surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py
• surfsense_backend/alembic/versions/26_add_language_column_to_llm_configs.py
• surfsense_backend/alembic/versions/27_add_searxng_connector_enum.py
• surfsense_backend/alembic/versions/28_add_chinese_litellmprovider_enum.py
• surfsense_backend/app/agents/podcaster/configuration.py
• surfsense_backend/app/agents/podcaster/nodes.py
• surfsense_backend/app/agents/researcher/configuration.py
• surfsense_backend/app/agents/researcher/nodes.py
• surfsense_backend/app/agents/researcher/prompts.py
• surfsense_backend/app/agents/researcher/qna_agent/configuration.py
• surfsense_backend/app/agents/researcher/qna_agent/nodes.py
• surfsense_backend/app/agents/researcher/qna_agent/prompts.py
• surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py
• surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py
• surfsense_backend/app/connectors/google_calendar_connector.py
• surfsense_backend/app/connectors/google_gmail_connector.py
• surfsense_backend/app/db.py
• surfsense_backend/app/routes/airtable_add_connector_route.py
• surfsense_backend/app/routes/chats_routes.py
• surfsense_backend/app/routes/documents_routes.py
• surfsense_backend/app/routes/google_calendar_add_connector_route.py
• surfsense_backend/app/routes/google_gmail_add_connector_route.py
• surfsense_backend/app/routes/llm_config_routes.py
• surfsense_backend/app/routes/luma_add_connector_route.py
• surfsense_backend/app/routes/search_source_connectors_routes.py
• surfsense_backend/app/schemas/llm_config.py
• surfsense_backend/app/schemas/search_source_connector.py
• surfsense_backend/app/services/connector_service.py
• surfsense_backend/app/services/llm_service.py
• surfsense_backend/app/services/query_service.py
• surfsense_backend/app/services/task_logging_service.py
• surfsense_backend/app/tasks/connector_indexers/airtable_indexer.py
• surfsense_backend/app/tasks/connector_indexers/clickup_indexer.py
• surfsense_backend/app/tasks/connector_indexers/confluence_indexer.py
• surfsense_backend/app/tasks/connector_indexers/discord_indexer.py
• surfsense_backend/app/tasks/connector_indexers/github_indexer.py
• surfsense_backend/app/tasks/connector_indexers/google_calendar_indexer.py
• surfsense_backend/app/tasks/connector_indexers/google_gmail_indexer.py
• surfsense_backend/app/tasks/connector_indexers/jira_indexer.py
• surfsense_backend/app/tasks/connector_indexers/linear_indexer.py
• surfsense_backend/app/tasks/connector_indexers/luma_indexer.py
• surfsense_backend/app/tasks/connector_indexers/notion_indexer.py
• surfsense_backend/app/tasks/document_processors/extension_processor.py
• surfsense_backend/app/tasks/document_processors/file_processors.py
• surfsense_backend/app/tasks/document_processors/markdown_processor.py

⏭️ Files skipped (56)

Locations
`surfsense_backend/app/tasks/document_processors/url_crawler.py`
`surfsense_backend/app/tasks/document_processors/youtube_processor.py`
`surfsense_backend/app/tasks/podcast_tasks.py`
`surfsense_backend/app/tasks/stream_connector_search_results.py`
`surfsense_backend/app/utils/validators.py`
`surfsense_web/app/dashboard/[search_space_id]/client-layout.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/airtable-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/clickup-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/confluence-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/discord-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/github-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/google-calendar-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/google-gmail-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/jira-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/linear-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/linkup-api/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/luma-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/notion-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/searxng/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/serper-api/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/slack-connector/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/connectors/add/tavily-api/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/layout.tsx`
`surfsense_web/app/dashboard/[search_space_id]/logs/(manage)/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/onboard/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/podcasts/podcasts-client.tsx`
`surfsense_web/app/dashboard/[search_space_id]/researcher/[[...chat_id]]/page.tsx`
`surfsense_web/app/dashboard/[search_space_id]/settings/page.tsx`
`surfsense_web/app/dashboard/layout.tsx`
`surfsense_web/components/UserDropdown.tsx`
`surfsense_web/components/chat/ChatCitation.tsx`
`surfsense_web/components/chat/ChatInputGroup.tsx`
`surfsense_web/components/chat/ChatSources.tsx`
`surfsense_web/components/chat/SourceDetailSheet.tsx`
`surfsense_web/components/editConnector/types.ts`
`surfsense_web/components/inference-params-editor.tsx`
`surfsense_web/components/onboard/add-provider-step.tsx`
`surfsense_web/components/onboard/assign-roles-step.tsx`
`surfsense_web/components/onboard/completion-step.tsx`
`surfsense_web/components/settings/llm-role-manager.tsx`
`surfsense_web/components/settings/model-config-manager.tsx`
`surfsense_web/contracts/enums/connector.ts`
`surfsense_web/contracts/enums/connectorIcons.tsx`
`surfsense_web/contracts/enums/languages.ts`
`surfsense_web/contracts/enums/llm-providers.ts`
`surfsense_web/hooks/index.ts`
`surfsense_web/hooks/use-chat.ts`
`surfsense_web/hooks/use-connector-edit-page.ts`
`surfsense_web/hooks/use-connectors.ts`
`surfsense_web/hooks/use-llm-configs.ts`
`surfsense_web/hooks/use-search-source-connectors.ts`
`surfsense_web/lib/connectors/utils.ts`

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

surfsense_backend/app/services/connector_scheduler_service.py (1)
127-145: Fix invalid datetime timezone usage.

Line 129 uses datetime.now(datetime.utc), which is incorrect. The datetime class has no utc attribute; the UTC timezone object is timezone.utc (imported from the datetime module), so this call raises an AttributeError at runtime.

Apply this diff:
     async def _get_due_schedules(self, session: AsyncSession) -> List[ConnectorSchedule]:
         """Get all schedules that are due for execution."""
-        now = datetime.now(datetime.utc)
+        now = datetime.now(timezone.utc)
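
A small sketch of why the tz-aware form matters when checking due schedules. `is_due` is a hypothetical helper, not a function from this PR; it only assumes that next_run_at is stored as a timezone-aware value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due(next_run_at: datetime, now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: True when an aware next_run_at is at or before now (UTC)."""
    if next_run_at.tzinfo is None:
        # Ordering a naive datetime against an aware one raises TypeError, so fail early.
        raise ValueError("next_run_at must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    return next_run_at <= now


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    print(is_due(current - timedelta(minutes=1)))  # a schedule one minute overdue
    print(is_due(current + timedelta(hours=1)))    # a schedule one hour in the future
```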

🧹 Nitpick comments (2)

surfsense_backend/app/routes/connector_schedules_routes.py (2)

90-98: Consider skipping next_run calculation when creating inactive schedules.

Currently, next_run_at is calculated even when is_active is False. While the scheduler ignores inactive schedules, you could optimize by setting next_run_at = None for inactive schedules to avoid unnecessary computation.

Apply this diff:

-        # Calculate next run time
-        next_run_at = calculate_next_run(
+        # Calculate next run time (skip if inactive)
+        next_run_at = None if not schedule.is_active else calculate_next_run(
             schedule.schedule_type, 
             schedule.cron_expression,
             schedule.daily_time,
             schedule.weekly_day,
             schedule.weekly_time,
             schedule.hourly_minute
         )

327-363: Recalculate next_run_at when activating schedules via toggle.

When toggling a schedule from inactive to active, next_run_at may be stale (if it was set before deactivation). The scheduler will handle this by running it immediately, but it's cleaner to recalculate the next run time when activating.

Similarly, consider clearing next_run_at when deactivating for cleaner state.

Apply this diff:

         # Toggle the active status
         schedule.is_active = not schedule.is_active
+        
+        # Recalculate or clear next_run_at based on new state
+        if schedule.is_active:
+            # Recalculate next run time when activating
+            schedule.next_run_at = calculate_next_run(
+                schedule.schedule_type,
+                schedule.cron_expression,
+                schedule.daily_time,
+                schedule.weekly_day,
+                schedule.weekly_time,
+                schedule.hourly_minute,
+            )
+        else:
+            # Clear next run time when deactivating
+            schedule.next_run_at = None
+        
         await session.commit()
         await session.refresh(schedule)

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6fa1a00 and fc4f677.

📒 Files selected for processing (7)

surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1 hunks)
surfsense_backend/app/app.py (3 hunks)
surfsense_backend/app/db.py (4 hunks)
surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (5)

surfsense_backend/app/app.py (2)

surfsense_backend/app/services/connector_scheduler_service.py (2)

start_scheduler (371-374)

stop_scheduler (377-382)

surfsense_backend/app/db.py (1)

create_db_and_tables (477-481)

surfsense_backend/app/schemas/connector_schedule.py (2)

surfsense_backend/app/db.py (2)

BaseModel (155-159)

ScheduleType (133-137)

surfsense_backend/app/schemas/base.py (2)

IDModel (11-13)

TimestampModel (6-8)

surfsense_backend/app/routes/connector_schedules_routes.py (4)

surfsense_backend/app/db.py (7)

ConnectorSchedule (299-330)

ScheduleType (133-137)

SearchSourceConnector (264-296)

SearchSpace (221-261)

User (424-433)

User (437-443)

get_async_session (484-486)

surfsense_backend/app/schemas/connector_schedule.py (3)

ConnectorScheduleCreate (88-91)

ConnectorScheduleRead (113-119)

ConnectorScheduleUpdate (94-110)

surfsense_backend/app/utils/check_ownership.py (1)

check_ownership (9-19)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-76)

surfsense_backend/app/routes/scheduler_routes.py (3)

surfsense_backend/app/db.py (3)

get_async_session (484-486)

ConnectorSchedule (299-330)

SearchSpace (221-261)

surfsense_backend/app/schemas/connector_schedule.py (1)

ConnectorScheduleRead (113-119)

surfsense_backend/app/services/connector_scheduler_service.py (3)

get_scheduler (363-368)

get_scheduler_status (310-325)

force_execute_schedule (327-356)

surfsense_backend/app/services/connector_scheduler_service.py (4)

surfsense_backend/app/db.py (4)

ConnectorSchedule (299-330)

SearchSourceConnector (264-296)

SearchSourceConnectorType (56-72)

get_async_session (484-486)

surfsense_backend/app/services/task_logging_service.py (4)

TaskLoggingService (13-243)

log_task_start (20-58)

log_task_failure (107-162)

log_task_success (60-105)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-76)

surfsense_backend/app/routes/scheduler_routes.py (2)

get_scheduler_status (24-45)

force_execute_schedule (49-80)

🪛 GitHub Actions: Code Quality Checks

surfsense_backend/app/routes/scheduler_routes.py

[error] 167-173: Ruff lint: Undefined name 'Log' (F821) and related definitions; potential missing imports or model definitions. Also references to Log in multiple lines.

[error] 178-178: Ruff lint: Redefinition of unused 'logs' (F811) to be resolved after fixing previous undefined references.

🪛 GitHub Actions: pre-commit

surfsense_backend/app/routes/scheduler_routes.py

[error] 167-171: F821 Undefined name Log in query construction.

[error] 178-178: F811 Redefinition of unused logs.

🔇 Additional comments (9)

surfsense_backend/app/db.py (1)

133-330: LGTM! Past review concerns have been addressed.

The ScheduleType enum and ConnectorSchedule model are correctly implemented with all required fields, including the previously missing schedule options (daily_time, weekly_day, weekly_time, hourly_minute). Relationships are properly configured with appropriate cascade settings.

surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1)

20-99: LGTM! Migration correctly implements all schedule fields.

The migration properly creates the scheduletype enum and connector_schedules table with all required columns, including the schedule configuration options (daily_time, weekly_day, weekly_time, hourly_minute) that were flagged in past reviews. The use of inspector to check for existing indexes is a good practice.

surfsense_backend/app/app.py (1)

1-101: LGTM! Past review concerns have been addressed.

The application correctly integrates the scheduler lifecycle with proper startup/shutdown handling, uses contextlib.suppress for CancelledError as suggested, and registers both the connector_schedules_router and scheduler_router. The logging additions improve observability.

surfsense_backend/app/schemas/connector_schedule.py (1)

24-110: LGTM! Pydantic v2 validators correctly implemented.

All field validators have been updated to use the correct Pydantic v2 signature with info: FieldValidationInfo and access field data via info.data. The validation logic properly enforces schedule type constraints.

surfsense_backend/app/routes/scheduler_routes.py (2)

23-45: LGTM!

The scheduler status endpoint correctly delegates to the scheduler service and includes appropriate error handling.

48-92: LGTM!

The force execute endpoint correctly queues the execution as a background task with proper error handling and logging.

surfsense_backend/app/services/connector_scheduler_service.py (2)

47-100: LGTM!

The scheduler service initialization and lifecycle management (start/stop) are correctly implemented with proper cleanup of active jobs.

277-356: LGTM! Remaining methods correctly implemented.

The schedule update methods and force execute functionality are correctly implemented with proper timezone usage at line 282. The TaskLoggingService integration concerns from past reviews have been resolved.

surfsense_backend/app/routes/connector_schedules_routes.py (1)

70-74: Previous issues resolved: search space validation and time parameters.

The connector/search space mismatch validation (lines 70-74) and full time parameter passing to calculate_next_run (lines 91-98) correctly address the critical issues flagged in previous reviews. The explicit field inclusion (lines 100-114) also prevents non-DB fields from being passed to the model constructor.

Also applies to: 91-98

surfsense_backend/app/routes/connector_schedules_routes.py

surfsense_backend/app/routes/scheduler_routes.py

surfsense_backend/app/services/connector_scheduler_service.py

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

surfsense_backend/app/routes/connector_schedules_routes.py (1)

353-359: Recalculate/clear next_run_at on toggle.

On deactivate: set next_run_at=None. On activate: compute next run and ensure tz-aware UTC.

Apply:

         # Toggle the active status
         schedule.is_active = not schedule.is_active
+        if schedule.is_active:
+            nr = calculate_next_run(
+                schedule.schedule_type,
+                schedule.cron_expression,
+                schedule.daily_time,
+                schedule.weekly_day,
+                schedule.weekly_time,
+                schedule.hourly_minute,
+            )
+            if nr.tzinfo is None:
+                nr = nr.replace(tzinfo=timezone.utc)
+            schedule.next_run_at = nr
+        else:
+            schedule.next_run_at = None
         await session.commit()

Also applies to: 355-357
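
The toggle behavior suggested above can be sketched in isolation. `Schedule` is a hypothetical stand-in for the ConnectorSchedule row, and `compute_next_run` is a placeholder callable standing in for calculate_next_run with its arguments already bound.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Schedule:
    """Hypothetical stand-in for the ConnectorSchedule row."""
    is_active: bool
    next_run_at: Optional[datetime] = None


def toggle(schedule: Schedule, compute_next_run: Callable[[], datetime]) -> Schedule:
    schedule.is_active = not schedule.is_active
    if schedule.is_active:
        # Activating: recompute the next run and force it to be tz-aware UTC.
        next_run = compute_next_run()
        if next_run.tzinfo is None:
            next_run = next_run.replace(tzinfo=timezone.utc)
        schedule.next_run_at = next_run
    else:
        # Deactivating: clear the stale value so it cannot fire on reactivation.
        schedule.next_run_at = None
    return schedule


if __name__ == "__main__":
    s = Schedule(is_active=False)
    toggle(s, lambda: datetime(2025, 1, 1, 12, 0))
    print(s)
    toggle(s, lambda: datetime(2025, 1, 1, 12, 0))
    print(s)
```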

surfsense_backend/app/routes/scheduler_routes.py (1)

63-68: Tighten log query filter.

Add Log.source == "connector_scheduler" to avoid capturing unrelated messages.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between fc4f677 and 7391a34.

📒 Files selected for processing (4)

surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

surfsense_backend/app/routes/connector_schedules_routes.py (4)

surfsense_backend/app/db.py (7)

ConnectorSchedule (299-330)

ScheduleType (133-137)

SearchSourceConnector (264-296)

SearchSpace (221-261)

User (424-433)

User (437-443)

get_async_session (484-486)

surfsense_backend/app/schemas/connector_schedule.py (3)

ConnectorScheduleCreate (88-91)

ConnectorScheduleRead (167-173)

ConnectorScheduleUpdate (94-164)

surfsense_backend/app/utils/check_ownership.py (1)

check_ownership (9-19)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-76)

surfsense_backend/app/services/connector_scheduler_service.py (4)

surfsense_backend/app/db.py (4)

ConnectorSchedule (299-330)

SearchSourceConnector (264-296)

SearchSourceConnectorType (56-72)

get_async_session (484-486)

surfsense_backend/app/services/task_logging_service.py (4)

TaskLoggingService (13-243)

log_task_start (20-58)

log_task_failure (107-162)

log_task_success (60-105)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-76)

surfsense_backend/app/routes/scheduler_routes.py (2)

get_scheduler_status (24-45)

force_execute_schedule (49-80)

surfsense_backend/app/schemas/connector_schedule.py (2)

surfsense_backend/app/db.py (2)

BaseModel (155-159)

ScheduleType (133-137)

surfsense_backend/app/schemas/base.py (2)

IDModel (11-13)

TimestampModel (6-8)

surfsense_backend/app/routes/scheduler_routes.py (3)

surfsense_backend/app/db.py (3)

get_async_session (484-486)

ConnectorSchedule (299-330)

SearchSpace (221-261)

surfsense_backend/app/schemas/connector_schedule.py (1)

ConnectorScheduleRead (167-173)

surfsense_backend/app/services/connector_scheduler_service.py (3)

get_scheduler (363-368)

get_scheduler_status (310-325)

force_execute_schedule (327-356)

🔇 Additional comments (2)

surfsense_backend/app/schemas/connector_schedule.py (1)

24-37: Validators migrated to Pydantic v2 correctly.

Using FieldValidationInfo and info.data is correct. Good cross‑field checks.

Based on learnings

surfsense_backend/app/services/connector_scheduler_service.py (1)

127-141: Timezone-aware scheduling and updates look correct.

Using datetime.now(timezone.utc) and updating last_run_at/next_run_at with tz-aware values aligns with TIMESTAMP(timezone=True).

Also applies to: 277-305

surfsense_backend/app/routes/connector_schedules_routes.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

surfsense_backend/app/routes/connector_schedules_routes.py (1)
164-179: Consider eager-loading relationships to avoid N+1 queries.

If callers access the connector or search_space relationships from the returned schedules, each access would trigger a separate query. You can optimize this by eager-loading:
         query = (
             select(ConnectorSchedule)
             .join(SearchSourceConnector)
             .filter(SearchSourceConnector.user_id == user.id)
+            .options(
+                selectinload(ConnectorSchedule.connector),
+                selectinload(ConnectorSchedule.search_space)
+            )
         )
Don't forget to import selectinload:
from sqlalchemy.orm import selectinload

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7391a34 and 62ffec9.

📒 Files selected for processing (2)

surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/utils/schedule_helpers.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

surfsense_backend/app/utils/schedule_helpers.py

🧰 Additional context used

🧬 Code graph analysis (1)

surfsense_backend/app/routes/connector_schedules_routes.py (4)

surfsense_backend/app/db.py (7)

ConnectorSchedule (299-330)

ScheduleType (133-137)

SearchSourceConnector (264-296)

SearchSpace (221-261)

User (424-433)

User (437-443)

get_async_session (484-486)

surfsense_backend/app/schemas/connector_schedule.py (3)

ConnectorScheduleCreate (88-91)

ConnectorScheduleRead (167-173)

ConnectorScheduleUpdate (94-164)

surfsense_backend/app/utils/check_ownership.py (1)

check_ownership (9-19)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-80)

🔇 Additional comments (2)

surfsense_backend/app/routes/connector_schedules_routes.py (2)

70-75: Excellent improvements addressing previous feedback!

The create endpoint now properly:

Enforces connector-space ownership match (lines 70-75)

Passes all time parameters to calculate_next_run (lines 94-100)

Explicitly includes only DB columns when creating the model (lines 104-116)

These changes resolve the critical issues flagged in previous reviews.

Also applies to: 91-101, 103-116

245-319: Update logic is complex but handles all scenarios correctly.

The update endpoint properly handles:

Schedule type changes with next_run_at recalculation (lines 270-285)

Time field updates (lines 264-268, 270-285)

Cron expression updates for CUSTOM schedules (lines 286-301)

Activation/deactivation with appropriate next_run_at management (lines 303-319)

Validation that cron_expression is only used with CUSTOM type (lines 257-262)

Timezone-aware UTC enforcement (lines 282-283, 298-299, 317-318)

The conditional logic correctly avoids duplicate calculations and handles edge cases like simultaneous type+time changes or activation toggles combined with other updates.

coderabbitai

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 62ffec9 and c044d42.

📒 Files selected for processing (2)

surfsense_backend/app/connectors/clickup_connector.py (2 hunks)
surfsense_backend/app/connectors/jira_connector.py (2 hunks)

🔇 Additional comments (3)

surfsense_backend/app/connectors/clickup_connector.py (2)
8-8: LGTM!

The datetime import is correctly added to support the new date range parsing functionality.

188-193: Verify the date filter logic aligns with scheduling requirements.

The implementation uses both date_created and date_updated filters with the same timestamp range. ClickUp API typically applies OR logic between these filters, meaning tasks are returned if they were either created or updated in the range.

This will include tasks that were created before start_date but updated during the range, which may or may not be the desired behavior for the scheduling system:

✅ Correct for incremental syncs: If the scheduler needs to catch task updates, this is appropriate

⚠️ Potentially incorrect for initial syncs: If only newly created tasks are desired, remove the date_updated filters

Please confirm this matches the intended scheduling behavior.

If you only want newly created tasks (not updates), apply this diff:
                 "include_closed": str(include_closed).lower(),
-                # Date filtering - filter by both created and updated dates
+                # Date filtering - filter by created date only
                 "date_created_gt": start_timestamp,
                 "date_created_lt": end_timestamp,
-                "date_updated_gt": start_timestamp,
-                "date_updated_lt": end_timestamp,
surfsense_backend/app/connectors/jira_connector.py (1)

253-253: LGTM on passing constructed JQL into params.

Using params["jql"] = _jql is correct.

coderabbitai · 2025-10-15T19:37:25Z

surfsense_backend/app/connectors/clickup_connector.py

+            # Convert date strings to Unix timestamps (milliseconds)
+            start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
+            end_datetime = datetime.strptime(end_date, "%Y-%m-%d")
+
+            # Set time to start and end of day for complete coverage
+            start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
+            end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)
+
+            start_timestamp = int(start_datetime.timestamp() * 1000)
+            end_timestamp = int(end_datetime.timestamp() * 1000)


⚠️ Potential issue | 🔴 Critical

Add explicit timezone handling to prevent date boundary shifts.

The code uses naive datetime objects (no timezone), which will be interpreted in the server's local timezone when calling .timestamp(). If the server timezone differs from the user's expected timezone or ClickUp's timezone, this can cause the date range boundaries to shift unexpectedly, potentially fetching tasks from the wrong days.

Apply this diff to use UTC timezone explicitly:

-            # Convert date strings to Unix timestamps (milliseconds)
-            start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
-            end_datetime = datetime.strptime(end_date, "%Y-%m-%d")
-
-            # Set time to start and end of day for complete coverage
-            start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
-            end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)
-
-            start_timestamp = int(start_datetime.timestamp() * 1000)
-            end_timestamp = int(end_datetime.timestamp() * 1000)
+            # Convert date strings to Unix timestamps (milliseconds)
+            # Use UTC timezone explicitly to ensure consistent date boundaries
+            from datetime import timezone
+
+            start_datetime = datetime.strptime(start_date, "%Y-%m-%d").replace(
+                hour=0, minute=0, second=0, microsecond=0, tzinfo=timezone.utc
+            )
+            end_datetime = datetime.strptime(end_date, "%Y-%m-%d").replace(
+                hour=23, minute=59, second=59, microsecond=999999, tzinfo=timezone.utc
+            )
+
+            start_timestamp = int(start_datetime.timestamp() * 1000)
+            end_timestamp = int(end_datetime.timestamp() * 1000)
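
As a standalone check of the UTC conversion, here is a minimal sketch. The function name `utc_day_range_ms` is hypothetical and not part of the connector; it only assumes the same "%Y-%m-%d" input format used above.

```python
from datetime import datetime, timezone
from typing import Tuple


def utc_day_range_ms(start_date: str, end_date: str) -> Tuple[int, int]:
    """Return (start, end) Unix timestamps in milliseconds, pinned to UTC day boundaries."""
    # strptime yields midnight; attaching tzinfo makes .timestamp() ignore the server's local zone.
    start = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_date, "%Y-%m-%d").replace(
        hour=23, minute=59, second=59, microsecond=999999, tzinfo=timezone.utc
    )
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


if __name__ == "__main__":
    # The epoch day is an easy case to verify by hand: 0 ms to one millisecond before 24h.
    print(utc_day_range_ms("1970-01-01", "1970-01-01"))
```

With naive datetimes the same call would shift by the server's UTC offset, which is the boundary drift the comment describes.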


coderabbitai · 2025-10-15T19:37:25Z

surfsense_backend/app/connectors/jira_connector.py

+                f"(createdDate >= '{start_date}' AND createdDate <= '{end_date}') "
+                f"OR (updatedDate >= '{start_date}' AND updatedDate <= '{end_date}')"
            )
-            # TODO : This JQL needs some improvement to work as expected

-            _jql = f"{date_filter}"
+            _jql = f"{date_filter} ORDER BY created DESC"
            if project_key:
                _jql = (
-                    f'project = "{project_key}" AND {date_filter} ORDER BY created DESC'
+                    f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
                )


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix JQL field names; sanitize project_key (prevents 400s and injection).

Use JQL fields created/updated (not createdDate/updatedDate). Also escape quotes in project_key.

Apply this diff:

-            date_filter = (
-                f"(createdDate >= '{start_date}' AND createdDate <= '{end_date}') "
-                f"OR (updatedDate >= '{start_date}' AND updatedDate <= '{end_date}')"
-            )
+            date_filter = (
+                f"(created >= '{start_date}' AND created <= '{end_date}') "
+                f"OR (updated >= '{start_date}' AND updated <= '{end_date}')"
+            )
-            _jql = f"{date_filter} ORDER BY created DESC"
+            _jql = f"{date_filter} ORDER BY created DESC"
             if project_key:
-                _jql = (
-                    f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
-                )
+                safe_project = project_key.replace('"', '\\"')
+                _jql = (
+                    f'project = "{safe_project}" AND ({date_filter}) ORDER BY created DESC'
+                )

Optional refinements:

Use startOfDay/endOfDay for inclusivity and TZ safety:
created >= startOfDay("YYYY-MM-DD") AND created <= endOfDay("YYYY-MM-DD") (same for updated).

Consider ORDER BY updated DESC to surface most recently changed issues.

🏁 Script executed:

#!/bin/bash
# Find any other occurrences of non-standard JQL fields
rg -nP -C2 '\b(createdDate|updatedDate)\b' --type=py

Length of output: 627

🌐 Web query:

Does Jira JQL use fields named 'createdDate' and 'updatedDate', or should it be 'created' and 'updated'?

💡 Result:

Use created and updated in JQL — createdDate and updatedDate are accepted aliases. [1][2]

Sources: Atlassian JQL fields reference (created → alias createdDate; updated → alias updatedDate). [1][2]

🏁 Script executed:

#!/bin/bash
sed -n '180,260p' surfsense_backend/app/connectors/jira_connector.py

Length of output: 2494

Escape project_key to prevent JQL injection
Sanitize or validate project_key (e.g. escape double quotes or restrict to alphanumeric) before embedding it in the JQL string.

🤖 Prompt for AI Agents

In surfsense_backend/app/connectors/jira_connector.py around lines 225 to 233, the JQL string interpolates project_key directly which allows JQL injection; validate or sanitize project_key before embedding it by either restricting it to an allowed pattern (e.g. only letters, digits, hyphen/underscore) and raising an error on invalid values, or by escaping any double quotes and backslashes in project_key (replace " with \" and \ with \\) before inserting into the JQL; prefer validation against a strict regex (e.g. alphanumeric with allowed punctuation) to ensure only safe project keys are used.
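
The validation approach preferred in the prompt above could look roughly like the following sketch. The function name `build_jql` and the key pattern are assumptions made for illustration: the regex (an uppercase letter followed by uppercase letters, digits, or underscores) reflects a common Jira key format and may need adjusting for instances configured differently.

```python
import re
from typing import Optional

# Assumed key format: uppercase letter, then uppercase letters, digits, or underscores.
_PROJECT_KEY_RE = re.compile(r"[A-Z][A-Z0-9_]*")


def build_jql(start_date: str, end_date: str, project_key: Optional[str] = None) -> str:
    """Build a date-filtered JQL string, rejecting unsafe project keys.

    Hypothetical helper for illustration; not the connector's actual code.
    """
    date_filter = (
        f"(created >= '{start_date}' AND created <= '{end_date}') "
        f"OR (updated >= '{start_date}' AND updated <= '{end_date}')"
    )
    if not project_key:
        return f"{date_filter} ORDER BY created DESC"
    # Validate instead of escaping: anything outside the pattern is refused outright.
    if not _PROJECT_KEY_RE.fullmatch(project_key):
        raise ValueError(f"Invalid Jira project key: {project_key!r}")
    return f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'


print(build_jql("2025-10-01", "2025-10-15", "PROJ"))
```

Rejecting invalid keys keeps quotes and backslashes out of the query entirely, so no escaping rules have to be maintained alongside it.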

local STT implementation with Faster-Whisper

chore: updated docs & refactored sst_service.py

- Fixed Jira connector to use constructed JQL query for date filtering
- Fixed ClickUp connector to include date range parameters in API request
- Resolved merge conflicts with remote branch improvements
- Enhanced ClickUp date handling with complete day coverage (00:00:00 to 23:59:59)

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

surfsense_backend/app/db.py (1)

285-297: Fix back_populates mismatches (will raise mapper configuration errors).

SearchSourceConnector.search_space uses back_populates="search_source_connectors", but SearchSpace defines no search_source_connectors.
SearchSourceConnector.user uses back_populates="search_source_connectors", but User defines no search_source_connectors.
LLMConfig.search_space uses back_populates="llm_configs" (Lines 353-357), but SearchSpace defines no llm_configs.
UserSearchSpacePreference.search_space uses back_populates="user_preferences" (Lines 391-393), but SearchSpace defines no user_preferences.

Add the missing counterparts (and avoid delete-orphan on both parents for the same child):

Apply this diff to restore/add the missing relationships:

 class SearchSpace(BaseModel, TimestampMixin):
@@
     logs = relationship(
         "Log",
         back_populates="search_space",
         order_by="Log.id",
         cascade="all, delete-orphan",
     )
+    # Parents of connectors/configs/preferences
+    search_source_connectors = relationship(
+        "SearchSourceConnector",
+        back_populates="search_space",
+        order_by="SearchSourceConnector.id",
+        cascade="all, delete-orphan",
+    )
+    llm_configs = relationship(
+        "LLMConfig",
+        back_populates="search_space",
+        order_by="LLMConfig.id",
+        cascade="all, delete-orphan",
+    )
+    user_preferences = relationship(
+        "UserSearchSpacePreference",
+        back_populates="search_space",
+        order_by="UserSearchSpacePreference.id",
+        cascade="all, delete-orphan",
+    )
     connector_schedules = relationship(
         "ConnectorSchedule",
         back_populates="search_space",
         order_by="ConnectorSchedule.id",
         cascade="all, delete-orphan",
     )

Add the missing User-side relationship in both auth variants:

@@
     class User(SQLAlchemyBaseUserTableUUID, Base):
         oauth_accounts: Mapped[list[OAuthAccount]] = relationship(
             "OAuthAccount", lazy="joined"
         )
         search_spaces = relationship("SearchSpace", back_populates="user")
         search_space_preferences = relationship(
             "UserSearchSpacePreference",
             back_populates="user",
             cascade="all, delete-orphan",
         )
+        search_source_connectors = relationship(
+            "SearchSourceConnector", back_populates="user"
+        )
@@
     class User(SQLAlchemyBaseUserTableUUID, Base):
         search_spaces = relationship("SearchSpace", back_populates="user")
         search_space_preferences = relationship(
             "UserSearchSpacePreference",
             back_populates="user",
             cascade="all, delete-orphan",
         )
+        search_source_connectors = relationship(
+            "SearchSourceConnector", back_populates="user"
+        )

🧹 Nitpick comments (3)

surfsense_backend/app/services/stt_service.py (2)

25-35: Consider error handling for model initialization failures.

The lazy loading pattern is appropriate, but model initialization could fail due to missing model files, insufficient memory, or other issues. Consider adding try-except around model creation to provide clearer error messages.

 def _get_model(self) -> WhisperModel:
     """Lazy load the Whisper model."""
     if self._model is None:
-        # Use CPU with optimizations for better performance
-        self._model = WhisperModel(
-            self.model_size,
-            device="cpu",
-            compute_type="int8",  # Quantization for faster CPU inference
-            num_workers=1,  # Single worker for stability
-        )
+        try:
+            # Use CPU with optimizations for better performance
+            self._model = WhisperModel(
+                self.model_size,
+                device="cpu",
+                compute_type="int8",  # Quantization for faster CPU inference
+                num_workers=1,  # Single worker for stability
+            )
+        except Exception as e:
+            raise RuntimeError(
+                f"Failed to initialize Whisper model '{self.model_size}': {e}"
+            ) from e
     return self._model

37-68: Add error handling for transcription failures.

The model.transcribe() call can fail for various reasons (corrupted audio, unsupported format, etc.). Consider adding try-except to provide more informative error messages rather than letting raw faster-whisper exceptions propagate.

 def transcribe_file(self, audio_path: str, language: str | None = None) -> dict:
     """Transcribe audio file to text.
     
     Args:
         audio_path: Path to audio file
         language: Optional language code (e.g., "en", "es")
     
     Returns:
         Dict with transcription text and metadata
     """
     model = self._get_model()
 
-    # Transcribe with optimized settings
-    segments, info = model.transcribe(
-        audio_path,
-        language=language,
-        beam_size=1,  # Faster inference
-        best_of=1,  # Single pass
-        temperature=0,  # Deterministic output
-        vad_filter=True,  # Voice activity detection
-        vad_parameters={"min_silence_duration_ms": 500},
-    )
+    try:
+        # Transcribe with optimized settings
+        segments, info = model.transcribe(
+            audio_path,
+            language=language,
+            beam_size=1,  # Faster inference
+            best_of=1,  # Single pass
+            temperature=0,  # Deterministic output
+            vad_filter=True,  # Voice activity detection
+            vad_parameters={"min_silence_duration_ms": 500},
+        )
+    except Exception as e:
+        raise RuntimeError(
+            f"Failed to transcribe audio file '{audio_path}': {e}"
+        ) from e
 
     # Combine all segments
     text = " ".join(segment.text.strip() for segment in segments)

surfsense_backend/app/db.py (1)

300-331: Add DB-level validation for ranges (defensive integrity).

Add CHECK constraints to enforce valid ranges:

weekly_day: 0..6
hourly_minute: 0..59

Apply this diff:

-from sqlalchemy import (
+from sqlalchemy import (
     ARRAY,
     JSON,
     TIMESTAMP,
     Boolean,
     Column,
     Enum as SQLAlchemyEnum,
     ForeignKey,
     Integer,
     String,
     Text,
     Time,
     UniqueConstraint,
     text,
+    CheckConstraint,
 )
@@
 class ConnectorSchedule(BaseModel, TimestampMixin):
     __tablename__ = "connector_schedules"
-    __table_args__ = (
-        UniqueConstraint(
-            "connector_id", "search_space_id", name="uq_connector_search_space"
-        ),
-    )
+    __table_args__ = (
+        UniqueConstraint(
+            "connector_id", "search_space_id", name="uq_connector_search_space"
+        ),
+        CheckConstraint(
+            "weekly_day IS NULL OR (weekly_day BETWEEN 0 AND 6)",
+            name="ck_connector_schedules_weekly_day_range",
+        ),
+        CheckConstraint(
+            "hourly_minute IS NULL OR (hourly_minute BETWEEN 0 AND 59)",
+            name="ck_connector_schedules_hourly_minute_range",
+        ),
+    )

Optional: if you’ll filter frequently by schedule_type, consider index=True on schedule_type.
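
To illustrate what these CHECK constraints buy, here is a small sketch using an in-memory SQLite database from the Python standard library. SQLite is only a stand-in chosen so the example runs anywhere; the table is a reduced, hypothetical version of `connector_schedules`, and the helper `try_insert` is not part of the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE connector_schedules (
        id INTEGER PRIMARY KEY,
        weekly_day SMALLINT,
        hourly_minute SMALLINT,
        CONSTRAINT ck_connector_schedules_weekly_day_range
            CHECK (weekly_day IS NULL OR (weekly_day BETWEEN 0 AND 6)),
        CONSTRAINT ck_connector_schedules_hourly_minute_range
            CHECK (hourly_minute IS NULL OR (hourly_minute BETWEEN 0 AND 59))
    )
    """
)


def try_insert(weekly_day, hourly_minute) -> bool:
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO connector_schedules (weekly_day, hourly_minute) VALUES (?, ?)",
                (weekly_day, hourly_minute),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(6, 59), try_insert(7, 0), try_insert(0, 60))
```

The out-of-range rows are refused by the database itself, so the rule holds even for writes that bypass the application's schema validation.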

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c044d42 and b07ce72.

⛔ Files ignored due to path filters (1)

surfsense_backend/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (14)

surfsense_backend/.env.example (1 hunks)
surfsense_backend/app/config/__init__.py (1 hunks)
surfsense_backend/app/db.py (4 hunks)
surfsense_backend/app/routes/documents_routes.py (2 hunks)
surfsense_backend/app/services/stt_service.py (1 hunks)
surfsense_backend/pyproject.toml (1 hunks)
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx (1 hunks)
surfsense_web/components/chat/ChatInputGroup.tsx (1 hunks)
surfsense_web/components/inference-params-editor.tsx (1 hunks)
surfsense_web/components/onboard/add-provider-step.tsx (1 hunks)
surfsense_web/content/docs/docker-installation.mdx (1 hunks)
surfsense_web/content/docs/manual-installation.mdx (1 hunks)
surfsense_web/drizzle.config.ts (1 hunks)
surfsense_web/hooks/use-connector-edit-page.ts (2 hunks)

✅ Files skipped from review due to trivial changes (6)

surfsense_web/content/docs/manual-installation.mdx
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx
surfsense_web/components/chat/ChatInputGroup.tsx
surfsense_web/components/inference-params-editor.tsx
surfsense_web/drizzle.config.ts
surfsense_web/content/docs/docker-installation.mdx

🧰 Additional context used

📓 Path-based instructions (3)

**/.env.*

📄 CodeRabbit inference engine (.rules/no_env_files_in_repo.mdc)

Do not commit variant environment files like .env.* (e.g., .env.local, .env.production)

Files:

surfsense_backend/.env.example

**/.env.example

📄 CodeRabbit inference engine (.rules/no_env_files_in_repo.mdc)

Provide a .env.example file with placeholder values instead of real secrets

Files:

surfsense_backend/.env.example

**/*.{jsx,tsx}

📄 CodeRabbit inference engine (.rules/require_unique_id_props.mdc)

**/*.{jsx,tsx}: When mapping arrays to React elements in JSX/TSX, each rendered element must include a unique key prop
Keys used for React list items should be stable, predictable, and unique among siblings

Files:

surfsense_web/components/onboard/add-provider-step.tsx

🧬 Code graph analysis (1)

surfsense_backend/app/routes/documents_routes.py (2)

surfsense_backend/app/services/stt_service.py (1)

transcribe_file (37-68)

surfsense_backend/app/services/task_logging_service.py (1)

log_task_progress (164-208)

🔇 Additional comments (16)

surfsense_web/hooks/use-connector-edit-page.ts (2)

340-340: LGTM! Validation logic is correct and more concise.

The consolidated validation condition correctly checks that the parsed value is an integer between 0 and 2, matching the error message. This refactor improves readability without changing behavior.

519-519: LGTM! Form update is consistent with the codebase pattern.

The single-line form value update follows the same pattern used throughout this file for other form fields, using the || operator to provide a default empty string.

surfsense_web/components/onboard/add-provider-step.tsx (1)

21-21: LGTM: Alphabetical import ordering.

The reordering of LLM_PROVIDERS import to follow LANGUAGES improves consistency by maintaining alphabetical order within the enum imports group. This change has no functional impact.

surfsense_backend/app/config/__init__.py (1)

105-105: LGTM! Improved comment accuracy.

The updated comment correctly reflects that the STT service can now be either local (Faster-Whisper) or external (LiteLLM), rather than being limited to LiteLLM.

surfsense_backend/app/services/stt_service.py (2)

15-23: LGTM! Model size parsing with sensible fallback.

The initialization correctly extracts the model size from the config string (e.g., "local/base") with an appropriate fallback to "base" if the format is unexpected.

86-96: LGTM! Proper temporary file cleanup.

The temporary file is correctly cleaned up in the finally block, ensuring cleanup even if transcription fails.

surfsense_backend/app/routes/documents_routes.py (4)

787-793: LGTM! Clean STT service type detection.

The logic correctly identifies local STT by checking if the config starts with "local/", with appropriate fallback to "external".

817-826: LGTM! Comprehensive progress logging for local STT.

The progress logging includes relevant metadata (language, confidence, duration) and clearly indicates the local transcription path.

876-893: LGTM! Proper success logging with STT service type.

The success logging correctly includes the stt_service_type metadata (line 885), enabling differentiation between local and external transcription in logs and analytics.

829-853: The file handle is safe and properly managed as-is.

The current code already keeps the file open during the async operation. The with statement context manager is the outer scope of the await atranscription() call, meaning the file remains open while the function reads it (as required by litellm), and only closes after the await completes. This is the correct and recommended pattern for handling file objects with litellm's atranscription.

The suggested refactor to manual open()/try/finally provides no benefit and is unnecessary—the context manager is superior as it guarantees proper cleanup even if an exception occurs.

Likely an incorrect or invalid review comment.
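
The point about the context manager can be checked with a small standard-library sketch. Here `fake_transcribe` is a hypothetical stand-in for an async transcription call and does not use litellm; it only demonstrates that a handle opened with `with` stays open across an `await` placed inside the block.

```python
import asyncio
import os
import tempfile


async def fake_transcribe(file) -> int:
    """Stand-in for an async API call that reads the file after yielding control."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return len(file.read())


async def main() -> tuple[int, bool]:
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"\x00" * 16)
        with open(path, "rb") as audio_file:
            # The await sits inside the with block, so the handle stays open
            # until the coroutine has finished reading.
            size = await fake_transcribe(audio_file)
        # Only after the block exits is the file closed.
        return size, audio_file.closed
    finally:
        os.remove(path)


size, closed_after = asyncio.run(main())
print(size, closed_after)
```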

surfsense_backend/.env.example (1)

34-40: LGTM! Clear STT configuration with local default.

The updated configuration appropriately defaults to local STT (local/base) with clear comments explaining both local and external provider options. The commented-out external provider settings serve as useful documentation.

surfsense_backend/pyproject.toml (1)

46-46: Version constraint is appropriate.

The >=1.1.0 constraint is correct. Verification confirms that faster-whisper v1.1.0 includes all required features: WhisperModel with device, compute_type, and num_workers parameters, plus VAD filtering with customizable parameters. The constraint allows for security patches and compatible updates while maintaining the required functionality.

surfsense_backend/app/db.py (4)

19-19: Good: added Time import needed for new schedule fields.

133-138: ScheduleType enum addition looks correct and scoped.

257-262: New SearchSpace.connector_schedules relationship looks good.

300-331: Alembic migration verified: all four schedule columns are present.

The migration file surfsense_backend/alembic/versions/23_add_connector_schedules_table.py includes all four new columns in the table creation:

daily_time TIME

weekly_day SMALLINT

weekly_time TIME

hourly_minute SMALLINT

Model, schemas, and service code all properly reference these columns. No further action needed.

coderabbitai · 2025-10-16T04:49:05Z

surfsense_backend/app/routes/documents_routes.py

+            # Check if using local STT service
+            if stt_service_type == "local":
+                # Use local Faster-Whisper for transcription
+                from app.services.stt_service import stt_service
+
+                try:
+                    result = stt_service.transcribe_file(file_path)
+                    transcribed_text = result.get("text", "")
+
+                    if not transcribed_text:
+                        raise ValueError("Transcription returned empty text")
+
+                    # Add metadata about the transcription
+                    transcribed_text = (
+                        f"# Transcription of {filename}\n\n{transcribed_text}"
                    )
-                else:
+                except Exception as e:
+                    raise HTTPException(
+                        status_code=422,
+                        detail=f"Failed to transcribe audio file {filename}: {e!s}",
+                    ) from e


⚠️ Potential issue | 🟠 Major

Replace HTTPException with regular exception in background task.

Raising HTTPException on line 812 is incorrect for background task execution. Since this code runs in process_file_in_background, HTTP exceptions cannot be properly propagated to the client. This will likely cause the exception to be logged but not handled gracefully.

Apply this diff to use appropriate exception handling:

                 try:
                     result = stt_service.transcribe_file(file_path)
                     transcribed_text = result.get("text", "")

                     if not transcribed_text:
                         raise ValueError("Transcription returned empty text")

                     # Add metadata about the transcription
                     transcribed_text = (
                         f"# Transcription of {filename}\n\n{transcribed_text}"
                     )
                 except Exception as e:
-                    raise HTTPException(
-                        status_code=422,
-                        detail=f"Failed to transcribe audio file {filename}: {e!s}",
-                    ) from e
+                    # Let the exception propagate to be caught by the outer try-except
+                    # which will log it via task_logger.log_task_failure
+                    raise RuntimeError(
+                        f"Failed to transcribe audio file {filename}: {e!s}"
+                    ) from e

🤖 Prompt for AI Agents

In surfsense_backend/app/routes/documents_routes.py around lines 795 to 815, the background transcription block currently raises an HTTPException inside the background task which cannot be propagated to the client; replace the HTTPException with a regular Exception (or a custom non-HTTP exception) and ensure you log the error before re-raising so the background worker can handle it properly (e.g., raise Exception(f"Failed to transcribe audio file {filename}: {e}") or log and return an error state/object instead of using HTTPException).
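
A simplified sketch of the pattern being recommended, using only the standard library: the inner block raises a plain `RuntimeError`, and an outer handler records the failure. The `failures` list is a hypothetical stand-in for the task logger, `transcribe` is a dummy that always fails, and this `process_file_in_background` is a reduced illustration rather than the project's function.

```python
import asyncio
import logging

logger = logging.getLogger("background")
failures: list[str] = []  # stand-in for a persistent task log


def transcribe(filename: str) -> str:
    """Hypothetical transcription step that always fails, for demonstration."""
    raise ValueError("Transcription returned empty text")


async def process_file_in_background(filename: str) -> None:
    try:
        try:
            text = transcribe(filename)
        except Exception as e:
            # Plain exception: the HTTP response has already been sent,
            # so a status code would have nowhere to go.
            raise RuntimeError(
                f"Failed to transcribe audio file {filename}: {e!s}"
            ) from e
        logger.info("Transcribed %d characters", len(text))
    except Exception as e:
        # Outer handler records the failure instead of letting it vanish.
        failures.append(str(e))
        logger.error("Background task failed: %s", e)


asyncio.run(process_file_in_background("meeting.mp3"))
print(failures)
```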

MODSetter · 2025-10-16T05:15:58Z

@vaishcodescape I think this should be done after we add Celery as the message queue. I believe this was already conveyed. Please DM me on Discord if you’d like to work on something that’s more likely to get merged. Closing this for now — sorry.

vaishcodescape and others added 9 commits October 9, 2025 19:30

init commit syncing features

a51fcce

Added local Speech-to-Text (STT) support using Faster-Whisper

dad7967

refactor: eliminate duplicated STT service condition check

504399a

- Compute stt_service_type once and reuse
- Follow DRY principles
- Improve code maintainability

Remove package-lock.json as requested in PR review

39685b1

Scheduled/Automated Connector Syncing added

b870ddb

Merge branch 'main' into aditya/features

6fa1a00

recurseml bot reviewed Oct 14, 2025

coderabbitai bot reviewed Oct 14, 2025

chore: restore original package-lock.json

6f75ad1

vaishcodescape closed this Oct 15, 2025

vaishcodescape reopened this Oct 15, 2025

Update scheduler routes

97f73ae

recurseml bot reviewed Oct 15, 2025

bugs and issues fixed

fc4f677

coderabbitai bot reviewed Oct 15, 2025

issues resolved

7391a34

coderabbitai bot reviewed Oct 15, 2025

pending issues fixed

62ffec9

coderabbitai bot reviewed Oct 15, 2025

connector date range filtering fixed

c044d42

coderabbitai bot reviewed Oct 15, 2025

MODSetter and others added 3 commits October 15, 2025 14:08

Merge pull request MODSetter#387 from nabthebest135/local-stt

5ed9aa2

local STT implementation with Faster-Whisper

chore: updated docs & refactored sst_service.py

70b547c

chore: biome checks

3022b25

MODSetter and others added 3 commits October 15, 2025 14:44

Merge pull request MODSetter#408 from MODSetter/dev

fa39176

chore: updated docs & refactored sst_service.py

fixed data range filtering

41d3caa

coderabbitai bot reviewed Oct 16, 2025

MODSetter closed this Oct 16, 2025
