feat:added automated connector scheduling system #400
- Add STT service with CPU-optimized Faster-Whisper
- Add API endpoints for transcription and model management
- Add React audio recorder component
- Support multiple Whisper models (tiny to large-v3)
- Include error handling for corrupted/invalid files
- Tested with real speech audio (99% accuracy)
- No external API dependencies, fully offline

- Simplify STT_SERVICE config to local/MODEL_SIZE format
- Remove separate STT routes, integrate with document upload
- Add local STT support to audio file processing pipeline
- Remove React component, use existing upload interface
- Support both local Faster-Whisper and external STT services
- Tested with real speech: 99% accuracy, 2.87s processing

- Compute stt_service_type once and reuse
- Follow DRY principles
- Improve code maintainability

- Use .get() for safe dictionary access instead of direct key access
- Add explicit try-catch for local STT transcription failures
- Validate transcription result is not empty
- Provide clear error messages for corrupted audio files
- Match error handling pattern with external STT service

- Add header to local STT transcription for consistency
- Add empty text validation for external STT path
- Refactor external STT to eliminate duplication in atranscription calls
- Ensure both local and external paths have consistent error handling
@vaishcodescape is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough
Adds end-to-end connector scheduling: DB migration and ORM model, Pydantic schemas, schedule helpers, a background ConnectorSchedulerService with lifecycle and force-execute, new API routes and frontend UI, STT local support and related connector/date-range updates, plus IDE project files and app wiring.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant FE as Frontend (Schedules Page)
participant API as FastAPI (/api/v1)
participant Svc as ConnectorSchedulerService
participant DB as DB (AsyncSession)
rect rgba(200,230,255,0.18)
note over API,Svc: App startup wiring
API->>Svc: start_scheduler()
Svc->>Svc: start background loop (check_interval)
end
User->>FE: open schedules page
FE->>API: GET /api/v1/scheduler/status
API->>Svc: get_scheduler_status()
Svc-->>API: status JSON
API-->>FE: status JSON
FE->>API: GET /api/v1/connector-schedules?search_space_id=…
API->>DB: query schedules (+ connector, search_space)
DB-->>API: schedules
API-->>FE: schedules JSON
rect rgba(220,255,220,0.18)
note over Svc,DB: Periodic scheduler run
Svc->>DB: find due schedules
DB-->>Svc: due schedules
loop per due schedule (bounded by concurrency)
Svc->>DB: update last_run_at
Svc->>Svc: dispatch indexer task (by connector type)
Svc->>DB: update next_run_at
end
end
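The "bounded by concurrency" loop in the periodic run above could be sketched with an `asyncio.Semaphore`. This is only an illustration under assumptions, not the PR's code: `run_due_schedules`, `execute_one`, `max_concurrency`, and `ConcurrencyProbe` are hypothetical names introduced here.

```python
import asyncio


async def run_due_schedules(schedule_ids, execute_one, max_concurrency=3):
    """Run execute_one(schedule_id) for each id, at most max_concurrency at a time.

    All names are hypothetical; the real service may structure this differently.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(schedule_id):
        # Only max_concurrency coroutines can hold the semaphore at once.
        async with semaphore:
            return await execute_one(schedule_id)

    # return_exceptions=True keeps one failing schedule from cancelling the rest.
    return await asyncio.gather(
        *(guarded(sid) for sid in schedule_ids), return_exceptions=True
    )


class ConcurrencyProbe:
    """Fake indexer that records the peak number of simultaneous executions."""

    def __init__(self):
        self.running = 0
        self.peak = 0

    async def __call__(self, schedule_id):
        self.running += 1
        self.peak = max(self.peak, self.running)
        await asyncio.sleep(0.01)  # stand-in for real indexing work
        self.running -= 1
        return schedule_id
```

Collecting exceptions instead of raising them is a design choice for this sketch: one failing connector should probably not stop the other due schedules from running.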
sequenceDiagram
autonumber
actor User
participant FE as Frontend
participant API as FastAPI (/api/v1)
participant BG as BackgroundTasks
participant Svc as ConnectorSchedulerService
User->>FE: Click "Force execute"
FE->>API: POST /api/v1/scheduler/schedules/{id}/force-execute
API->>BG: enqueue _force_execute_schedule_task(schedule_id)
API-->>FE: 202 Accepted
BG->>Svc: force_execute_schedule(schedule_id)
Svc->>DB: validate schedule active & fetch
Svc->>Svc: _execute_schedule -> update last_run_at/next_run_at and run indexer
Svc-->>BG: result / logs
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Review by RecurseML
🔍 Review performed on d86aaea..b870ddb
| Severity | Location | Issue | Delete |
|---|---|---|---|
| | surfsense_backend/app/routes/connector_schedules_routes.py:206 | Incomplete route implementation causing syntax error | |
| | surfsense_backend/app/app.py:101 | Missing router registration breaks API | |
| | surfsense_backend/app/utils/schedule_helpers.py:35 | Timezone-naive datetime causes comparison errors | |
| | surfsense_backend/app/routes/connector_schedules_routes.py:84 | Missing time parameters causes incorrect schedules | |
✅ Files analyzed, no issues (12)
• .idea/.gitignore
• .idea/SurfSense.iml
• .idea/inspectionProfiles/profiles_settings.xml
• .idea/modules.xml
• .idea/vcs.xml
• surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
• surfsense_backend/app/db.py
• surfsense_backend/app/routes/scheduler_routes.py
• surfsense_backend/app/schemas/__init__.py
• surfsense_backend/app/schemas/connector_schedule.py
• surfsense_backend/app/services/connector_scheduler_service.py
• surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
⏭️ Files skipped (1)
| Locations |
|---|
surfsense_web/package-lock.json |
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
surfsense_backend/app/app.py (1)
56-63: CORS misconfiguration: allow_credentials=True with "*" origins.
Browsers disallow a wildcard origin when credentials are enabled; requests will fail and it’s insecure. Use explicit origins or set allow_credentials=False.
Example:

    -app.add_middleware(
    -    CORSMiddleware,
    -    allow_origins=["*"],
    -    allow_credentials=True,
    -    allow_methods=["*"],
    -    allow_headers=["*"],
    -)
    +app.add_middleware(
    +    CORSMiddleware,
    +    allow_origins=config.CORS_ALLOW_ORIGINS,  # e.g., ["https://app.example.com"]
    +    allow_credentials=True,
    +    allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
    +    allow_headers=["*"],
    +)

Ensure config.CORS_ALLOW_ORIGINS is defined appropriately per env.
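One way the per-environment origin list might be produced is sketched below. It is a minimal sketch under assumptions: the `CORS_ALLOW_ORIGINS` environment variable name, the `parse_cors_origins` helper, and the localhost default are hypothetical and not part of the PR.

```python
import os


def parse_cors_origins(raw, allow_credentials=True):
    """Turn a comma-separated string into a list of explicit origins.

    Raises ValueError if a wildcard is combined with credentials, since
    browsers reject that combination.
    """
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if allow_credentials and "*" in origins:
        raise ValueError("wildcard origin cannot be used with allow_credentials=True")
    return origins


# Hypothetical environment variable name; adjust to the project's config module.
CORS_ALLOW_ORIGINS = parse_cors_origins(
    os.environ.get("CORS_ALLOW_ORIGINS", "http://localhost:3000")
)
```

Failing at startup on a wildcard makes the misconfiguration visible immediately instead of surfacing later as rejected browser requests.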
surfsense_backend/app/db.py (1)
283-296: Define missing reverse relationships in SearchSpace
Add these three relationships to the SearchSpace class to match existing back_populates and avoid mapping errors:

    search_source_connectors = relationship(
        "SearchSourceConnector",
        back_populates="search_space",
        order_by="SearchSourceConnector.id",
        cascade="all, delete-orphan",
    )
    llm_configs = relationship(
        "LLMConfig",
        back_populates="search_space",
        order_by="LLMConfig.id",
        cascade="all, delete-orphan",
    )
    user_preferences = relationship(
        "UserSearchSpacePreference",
        back_populates="search_space",
        order_by="UserSearchSpacePreference.id",
        cascade="all, delete-orphan",
    )

Place these inside the SearchSpace class (e.g. after connector_schedules).
🧹 Nitpick comments (11)
.idea/SurfSense.iml (1)
1-8: Remove IDE artifacts from version control.
This IntelliJ module file is environment-specific, churns often, and doesn’t contribute to the backend/frontend feature set. Please drop it (and the other .idea XML files) from the repo and rely on .gitignore to keep the directory untracked.
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1)
61-78: Indexes: add composite (is_active, next_run_at); consider dropping redundant id index.
- Scheduler likely queries WHERE is_active = TRUE AND next_run_at <= now(); a composite index helps.
- The explicit id index duplicates the PK index; remove if not needed.
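To see the effect of a composite index on the "active and due" query, the planner can be asked directly. The sketch below uses an in-memory SQLite database purely so it is self-contained; that is an assumption for illustration, the table is a cut-down stand-in for connector_schedules, and the project's real database and planner may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE connector_schedules (
        id INTEGER PRIMARY KEY,
        is_active INTEGER NOT NULL,
        next_run_at TEXT
    )
    """
)
conn.execute(
    "CREATE INDEX ix_connector_schedules_is_active_next_run_at "
    "ON connector_schedules (is_active, next_run_at)"
)

# Ask the planner how it would run the scheduler's "due schedules" query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM connector_schedules "
    "WHERE is_active = 1 AND next_run_at <= ?",
    ("2025-01-01T00:00:00+00:00",),
).fetchall()

# The last column of each row holds the human-readable plan detail.
plan_text = " ".join(row[-1] for row in plan_rows)
print(plan_text)
```

The equality column (`is_active`) comes first and the range column (`next_run_at`) second, which is the usual ordering for a composite index serving this kind of filter.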
Apply this diff to add indexes:

     if "ix_connector_schedules_id" not in existing_indexes:
    -    op.create_index("ix_connector_schedules_id", "connector_schedules", ["id"])
    +    # Optional: id already indexed by PK; consider skipping this extra index.
    +    pass
    @@
     if "ix_connector_schedules_next_run_at" not in existing_indexes:
         op.create_index(
             "ix_connector_schedules_next_run_at", "connector_schedules", ["next_run_at"]
         )
    +if "ix_connector_schedules_search_space_id" not in existing_indexes:
    +    op.create_index(
    +        "ix_connector_schedules_search_space_id",
    +        "connector_schedules",
    +        ["search_space_id"],
    +    )
    +if "ix_connector_schedules_is_active_next_run_at" not in existing_indexes:
    +    op.create_index(
    +        "ix_connector_schedules_is_active_next_run_at",
    +        "connector_schedules",
    +        ["is_active", "next_run_at"],
    +    )

And include drops in downgrade:

    -op.drop_index("ix_connector_schedules_next_run_at", table_name="connector_schedules")
    +op.drop_index("ix_connector_schedules_is_active_next_run_at", table_name="connector_schedules")
    +op.drop_index("ix_connector_schedules_search_space_id", table_name="connector_schedules")
    +op.drop_index("ix_connector_schedules_next_run_at", table_name="connector_schedules")
    -op.drop_index("ix_connector_schedules_id", table_name="connector_schedules")
    +# If created earlier; otherwise safe to skip or guard with IF EXISTS pattern
    +# op.drop_index("ix_connector_schedules_id", table_name="connector_schedules")

surfsense_backend/app/app.py (2)
31-33: Async task is unnecessary; just await start_scheduler().
start_scheduler completes quickly; creating/canceling a task adds noise. Prefer awaiting directly and drop cancel block.
Apply this diff:

    -    scheduler_task = asyncio.create_task(start_scheduler())
    -    logger.info("Connector scheduler service started")
    +    await start_scheduler()
    +    logger.info("Connector scheduler service started")

Then remove the later cancel section (Lines 43-49).
26-29: Avoid create_all in production; prefer Alembic migrations.
Running create_db_and_tables() alongside Alembic can cause drift. Gate behind a dev flag or remove in prod environments.
surfsense_backend/app/utils/schedule_helpers.py (2)
37-41: Optional: clamp hourly_minute defensively.
Schemas validate range, but add a local guard to avoid surprises if called directly.
Apply this diff:

    -    minute = hourly_minute if hourly_minute is not None else 0
    +    minute = hourly_minute if hourly_minute is not None else 0
    +    if minute < 0 or minute > 59:
    +        raise ValueError("hourly_minute must be between 0 and 59")
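How such a guard might sit inside an hourly next-run calculation is sketched below. `next_hourly_run` is a hypothetical helper written for illustration; it is not the PR's `calculate_next_run`, whose body is not shown here.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_run(now, hourly_minute=None):
    """Return the next time the clock shows the given minute, strictly after now."""
    minute = hourly_minute if hourly_minute is not None else 0
    if not 0 <= minute <= 59:
        raise ValueError("hourly_minute must be between 0 and 59")
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # The minute has already passed (or is exactly now) in this hour.
        candidate += timedelta(hours=1)
    return candidate
```

Using "strictly after now" means a schedule that has just fired is not selected again within the same minute.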
51-64: Weekly scheduling logic: clarify default timezone semantics.
If daily/weekly times represent local-wall times, consider storing a timezone per search space or using config default TZ; current UTC assumption may surprise users across DST.
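For reference, a weekly calculation that makes the UTC assumption explicit could look like the sketch below. `next_weekly_run` is a hypothetical helper, and the 0=Monday convention for `weekly_day` is an assumption taken from Python's `datetime.weekday()`, not confirmed by the PR.

```python
from datetime import datetime, time, timedelta, timezone


def next_weekly_run(now, weekly_day, weekly_time):
    """Next occurrence of weekly_day (0=Monday .. 6=Sunday) at weekly_time, in UTC.

    Assumes weekly_time is a UTC wall-clock time; a per-space timezone would
    need a conversion step that this sketch leaves out.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    days_ahead = (weekly_day - now.weekday()) % 7
    candidate = datetime.combine(
        now.date() + timedelta(days=days_ahead), weekly_time, tzinfo=timezone.utc
    )
    if candidate <= now:
        # Same weekday, but the time has already passed: use next week.
        candidate += timedelta(days=7)
    return candidate
```

Rejecting naive datetimes up front keeps the UTC assumption visible at the call site rather than hidden in a later comparison.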
surfsense_backend/app/schemas/connector_schedule.py (1)
88-91: Consider adding optional time fields to Update schema
Allow updating daily/weekly/hourly options if persisted in DB; otherwise clients can’t change them post‑create.

     class ConnectorScheduleUpdate(BaseModel):
    @@
    -    schedule_type: ScheduleType | None = None
    -    cron_expression: str | None = None
    -    is_active: bool | None = None
    +    schedule_type: ScheduleType | None = None
    +    cron_expression: str | None = None
    +    is_active: bool | None = None
    +    daily_time: time | None = None
    +    weekly_day: int | None = None
    +    weekly_time: time | None = None
    +    hourly_minute: int | None = None

Also applies to: 94-100
surfsense_backend/app/routes/scheduler_routes.py (1)
95-106: Bound the limit parameter to avoid heavy queries
Add simple clamping/validation (e.g., 1..100) to prevent abuse.

    -async def get_upcoming_schedules(
    -    limit: int = 10,
    +async def get_upcoming_schedules(
    +    limit: int = 10,
    @@
    -    try:
    +    try:
    +        limit = max(1, min(limit, 100))
    @@
    -async def get_recent_schedule_executions(
    -    limit: int = 20,
    +async def get_recent_schedule_executions(
    +    limit: int = 20,
    @@
    -    try:
    +    try:
    +        limit = max(1, min(limit, 100))

Also applies to: 152-163
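The clamping expression can be factored into a small helper so both endpoints share it. `clamp_limit` and its default bounds are hypothetical names chosen for this sketch.

```python
def clamp_limit(limit, minimum=1, maximum=100):
    """Force limit into [minimum, maximum] so a caller cannot request unbounded rows."""
    return max(minimum, min(limit, maximum))
```

Clamping silently corrects out-of-range values. If rejecting them is preferred, FastAPI's `Query(ge=1, le=100)` validation is an alternative that returns a validation error to the client instead.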
surfsense_backend/app/routes/connector_schedules_routes.py (1)
41-45: Frontend relies on filtering by search_space_id; ensure API supports it
The UI calls GET /connector-schedules/?search_space_id=... which your code supports. Consider also adding a similar filter to /search-source-connectors to simplify client-side filtering.
Also applies to: 118-121
surfsense_backend/app/services/connector_scheduler_service.py (2)
289-303: Recalculate next_run with full time options if persisted
Only schedule_type/cron are passed, ignoring daily/weekly/hourly options. If these fields exist in DB, include them.

    -    next_run = calculate_next_run(
    -        schedule.schedule_type, schedule.cron_expression
    -    )
    +    next_run = calculate_next_run(
    +        schedule.schedule_type,
    +        schedule.cron_expression,
    +        getattr(schedule, "daily_time", None),
    +        getattr(schedule, "weekly_day", None),
    +        getattr(schedule, "weekly_time", None),
    +        getattr(schedule, "hourly_minute", None),
    +    )
366-370: start_scheduler awaits an infinite loop; ensure it’s spawned as a background task
If start_scheduler() is awaited directly in app startup, it will block the app. Spawn with asyncio.create_task(start_scheduler()) in lifespan/startup hooks.
Where is start_scheduler() invoked? If it’s awaited, switch to creating a task.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
surfsense_web/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (15)
.idea/.gitignore (1 hunks)
.idea/SurfSense.iml (1 hunks)
.idea/inspectionProfiles/profiles_settings.xml (1 hunks)
.idea/modules.xml (1 hunks)
.idea/vcs.xml (1 hunks)
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1 hunks)
surfsense_backend/app/app.py (3 hunks)
surfsense_backend/app/db.py (3 hunks)
surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
surfsense_backend/app/schemas/__init__.py (2 hunks)
surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
surfsense_backend/app/utils/schedule_helpers.py (1 hunks)
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{jsx,tsx}
📄 CodeRabbit inference engine (.rules/require_unique_id_props.mdc)
**/*.{jsx,tsx}: When mapping arrays to React elements in JSX/TSX, each rendered element must include a unique key prop
Keys used for React list items should be stable, predictable, and unique among siblings
Files:
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
🧬 Code graph analysis (8)
surfsense_backend/app/schemas/__init__.py (1)
surfsense_backend/app/schemas/connector_schedule.py (4)
ConnectorScheduleBase(11-85), ConnectorScheduleCreate(88-91), ConnectorScheduleRead(113-119), ConnectorScheduleUpdate(94-110)
surfsense_backend/app/schemas/connector_schedule.py (2)
surfsense_backend/app/db.py (2)
BaseModel(154-158), ScheduleType(132-136)
surfsense_backend/app/schemas/base.py (2)
IDModel(11-13), TimestampModel(6-8)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (6)
ConnectorSchedule(298-324), SearchSourceConnector(263-295), SearchSpace(220-260), User(418-427), User(431-437), get_async_session(478-480)
surfsense_backend/app/schemas/connector_schedule.py (3)
ConnectorScheduleCreate(88-91), ConnectorScheduleRead(113-119), ConnectorScheduleUpdate(94-110)
surfsense_backend/app/utils/check_ownership.py (1)
check_ownership(9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
calculate_next_run(10-76)
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1)
surfsense_backend/app/db.py (1)
ConnectorSchedule(298-324)
surfsense_backend/app/app.py (2)
surfsense_backend/app/services/connector_scheduler_service.py (2)
start_scheduler(366-369), stop_scheduler(372-377)
surfsense_backend/app/db.py (1)
create_db_and_tables(471-475)
surfsense_backend/app/utils/schedule_helpers.py (1)
surfsense_backend/app/db.py (1)
ScheduleType(132-136)
surfsense_backend/app/services/connector_scheduler_service.py (5)
surfsense_backend/app/db.py (3)
ConnectorSchedule(298-324), SearchSourceConnectorType(55-71), get_async_session(478-480)
surfsense_backend/app/services/task_logging_service.py (2)
TaskLoggingService(13-243), log_task_start(20-58)
surfsense_backend/app/tasks/connector_indexers/slack_indexer.py (1)
index_slack_messages(30-377)
surfsense_backend/app/utils/schedule_helpers.py (1)
calculate_next_run(10-76)
surfsense_backend/app/routes/scheduler_routes.py (2)
get_scheduler_status(24-45), force_execute_schedule(49-80)
surfsense_backend/app/routes/scheduler_routes.py (3)
surfsense_backend/app/db.py (3)
get_async_session(478-480), ConnectorSchedule(298-324), SearchSpace(220-260)
surfsense_backend/app/schemas/connector_schedule.py (1)
ConnectorScheduleRead(113-119)
surfsense_backend/app/services/connector_scheduler_service.py (3)
get_scheduler(358-363), get_scheduler_status(305-320), force_execute_schedule(322-351)
🪛 GitHub Actions: Code Quality Checks
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
[error] 3-3: biome-check-web: Imports are not sorted. Safe fix available via Organize Imports (Biome).
surfsense_backend/app/app.py
[error] 46-49: SIM105 Use contextlib.suppress(asyncio.CancelledError) instead of try-except-pass
🪛 GitHub Actions: pre-commit
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
[error] 412-416: lint/correctness/useUniqueElementIds: id attribute should not be a static string literal. Generate unique IDs using useId().
surfsense_backend/app/app.py
[error] 46-49: SIM105 Use contextlib.suppress(asyncio.CancelledError) instead of try-except-pass
🔇 Additional comments (3)
surfsense_backend/app/schemas/__init__.py (1)
11-16: LGTM: ConnectorSchedule schemas exposed cleanly.
Re-exports look consistent with usage across routes/services. No issues.
Also applies to: 61-64
surfsense_backend/app/db.py (2)
132-137: LGTM: ScheduleType enum.
Enum values align with schemas and helpers.
255-260: LGTM: SearchSpace.connector_schedules relationship.
Naming and cascade settings look consistent.
Review by RecurseML
🔍 Review performed on b870ddb..6fa1a00
✨ No bugs found, your code is sparkling clean
✅ Files analyzed, no issues (50)
• README.md
• docs/chinese-llm-setup.md
• surfsense_backend/alembic/env.py
• surfsense_backend/alembic/versions/23_associate_connectors_with_search_spaces.py
• surfsense_backend/alembic/versions/24_fix_null_chat_types.py
• surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py
• surfsense_backend/alembic/versions/26_add_language_column_to_llm_configs.py
• surfsense_backend/alembic/versions/27_add_searxng_connector_enum.py
• surfsense_backend/alembic/versions/28_add_chinese_litellmprovider_enum.py
• surfsense_backend/app/agents/podcaster/configuration.py
• surfsense_backend/app/agents/podcaster/nodes.py
• surfsense_backend/app/agents/researcher/configuration.py
• surfsense_backend/app/agents/researcher/nodes.py
• surfsense_backend/app/agents/researcher/prompts.py
• surfsense_backend/app/agents/researcher/qna_agent/configuration.py
• surfsense_backend/app/agents/researcher/qna_agent/nodes.py
• surfsense_backend/app/agents/researcher/qna_agent/prompts.py
• surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py
• surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py
• surfsense_backend/app/connectors/google_calendar_connector.py
• surfsense_backend/app/connectors/google_gmail_connector.py
• surfsense_backend/app/db.py
• surfsense_backend/app/routes/airtable_add_connector_route.py
• surfsense_backend/app/routes/chats_routes.py
• surfsense_backend/app/routes/documents_routes.py
• surfsense_backend/app/routes/google_calendar_add_connector_route.py
• surfsense_backend/app/routes/google_gmail_add_connector_route.py
• surfsense_backend/app/routes/llm_config_routes.py
• surfsense_backend/app/routes/luma_add_connector_route.py
• surfsense_backend/app/routes/search_source_connectors_routes.py
• surfsense_backend/app/schemas/llm_config.py
• surfsense_backend/app/schemas/search_source_connector.py
• surfsense_backend/app/services/connector_service.py
• surfsense_backend/app/services/llm_service.py
• surfsense_backend/app/services/query_service.py
• surfsense_backend/app/services/task_logging_service.py
• surfsense_backend/app/tasks/connector_indexers/airtable_indexer.py
• surfsense_backend/app/tasks/connector_indexers/clickup_indexer.py
• surfsense_backend/app/tasks/connector_indexers/confluence_indexer.py
• surfsense_backend/app/tasks/connector_indexers/discord_indexer.py
• surfsense_backend/app/tasks/connector_indexers/github_indexer.py
• surfsense_backend/app/tasks/connector_indexers/google_calendar_indexer.py
• surfsense_backend/app/tasks/connector_indexers/google_gmail_indexer.py
• surfsense_backend/app/tasks/connector_indexers/jira_indexer.py
• surfsense_backend/app/tasks/connector_indexers/linear_indexer.py
• surfsense_backend/app/tasks/connector_indexers/luma_indexer.py
• surfsense_backend/app/tasks/connector_indexers/notion_indexer.py
• surfsense_backend/app/tasks/document_processors/extension_processor.py
• surfsense_backend/app/tasks/document_processors/file_processors.py
• surfsense_backend/app/tasks/document_processors/markdown_processor.py
⏭️ Files skipped (56)
| Locations |
|---|
surfsense_backend/app/tasks/document_processors/url_crawler.py |
surfsense_backend/app/tasks/document_processors/youtube_processor.py |
surfsense_backend/app/tasks/podcast_tasks.py |
surfsense_backend/app/tasks/stream_connector_search_results.py |
surfsense_backend/app/utils/validators.py |
surfsense_web/app/dashboard/[search_space_id]/client-layout.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/airtable-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/clickup-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/confluence-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/discord-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/github-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/google-calendar-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/google-gmail-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/jira-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/linear-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/linkup-api/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/luma-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/notion-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/searxng/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/serper-api/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/slack-connector/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/connectors/add/tavily-api/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/layout.tsx |
surfsense_web/app/dashboard/[search_space_id]/logs/(manage)/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/onboard/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/podcasts/podcasts-client.tsx |
surfsense_web/app/dashboard/[search_space_id]/researcher/[[...chat_id]]/page.tsx |
surfsense_web/app/dashboard/[search_space_id]/settings/page.tsx |
surfsense_web/app/dashboard/layout.tsx |
surfsense_web/components/UserDropdown.tsx |
surfsense_web/components/chat/ChatCitation.tsx |
surfsense_web/components/chat/ChatInputGroup.tsx |
surfsense_web/components/chat/ChatSources.tsx |
surfsense_web/components/chat/SourceDetailSheet.tsx |
surfsense_web/components/editConnector/types.ts |
surfsense_web/components/inference-params-editor.tsx |
surfsense_web/components/onboard/add-provider-step.tsx |
surfsense_web/components/onboard/assign-roles-step.tsx |
surfsense_web/components/onboard/completion-step.tsx |
surfsense_web/components/settings/llm-role-manager.tsx |
surfsense_web/components/settings/model-config-manager.tsx |
surfsense_web/contracts/enums/connector.ts |
surfsense_web/contracts/enums/connectorIcons.tsx |
surfsense_web/contracts/enums/languages.ts |
surfsense_web/contracts/enums/llm-providers.ts |
surfsense_web/hooks/index.ts |
surfsense_web/hooks/use-chat.ts |
surfsense_web/hooks/use-connector-edit-page.ts |
surfsense_web/hooks/use-connectors.ts |
surfsense_web/hooks/use-llm-configs.ts |
surfsense_web/hooks/use-search-source-connectors.ts |
surfsense_web/lib/connectors/utils.ts |
Actionable comments posted: 4
♻️ Duplicate comments (1)
surfsense_backend/app/services/connector_scheduler_service.py (1)
127-145: Fix invalid datetime timezone usage.
Line 129 uses datetime.now(datetime.utc), which is incorrect. The datetime class has no utc attribute (the UTC timezone object is timezone.utc), so the call will raise an AttributeError at runtime.
Apply this diff:

     async def _get_due_schedules(self, session: AsyncSession) -> List[ConnectorSchedule]:
         """Get all schedules that are due for execution."""
    -    now = datetime.now(datetime.utc)
    +    now = datetime.now(timezone.utc)
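The failure and the fix can be checked in isolation with the standard library. The sketch below also shows the related pitfall of comparing naive and aware datetimes; the flag variable names are hypothetical and exist only to record what happened.

```python
from datetime import datetime, timezone

# The datetime class has no `utc` attribute, so the original call fails.
try:
    datetime.now(datetime.utc)
    attribute_error_raised = False
except AttributeError:
    attribute_error_raised = True

# The corrected call returns a timezone-aware datetime.
now = datetime.now(timezone.utc)

# Ordering an aware datetime against a naive one raises TypeError.
naive = datetime(2025, 1, 1)
try:
    _ = naive <= now
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True
```

Because of the second behavior, values such as next_run_at need to be stored and compared as timezone-aware datetimes consistently once `now` is aware.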
🧹 Nitpick comments (2)
surfsense_backend/app/routes/connector_schedules_routes.py (2)
90-98: Consider skipping next_run calculation when creating inactive schedules.
Currently, next_run_at is calculated even when is_active is False. While the scheduler ignores inactive schedules, you could optimize by setting next_run_at = None for inactive schedules to avoid unnecessary computation.
Apply this diff:

    -    # Calculate next run time
    -    next_run_at = calculate_next_run(
    +    # Calculate next run time (skip if inactive)
    +    next_run_at = None if not schedule.is_active else calculate_next_run(
             schedule.schedule_type,
             schedule.cron_expression,
             schedule.daily_time,
             schedule.weekly_day,
             schedule.weekly_time,
             schedule.hourly_minute
         )
327-363: Recalculate next_run_at when activating schedules via toggle.
When toggling a schedule from inactive to active, next_run_at may be stale (if it was set before deactivation). The scheduler will handle this by running it immediately, but it's cleaner to recalculate the next run time when activating.
Similarly, consider clearing next_run_at when deactivating for cleaner state.
Apply this diff:

         # Toggle the active status
         schedule.is_active = not schedule.is_active
    +
    +    # Recalculate or clear next_run_at based on new state
    +    if schedule.is_active:
    +        # Recalculate next run time when activating
    +        schedule.next_run_at = calculate_next_run(
    +            schedule.schedule_type,
    +            schedule.cron_expression,
    +            schedule.daily_time,
    +            schedule.weekly_day,
    +            schedule.weekly_time,
    +            schedule.hourly_minute,
    +        )
    +    else:
    +        # Clear next run time when deactivating
    +        schedule.next_run_at = None
    +
         await session.commit()
         await session.refresh(schedule)
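The toggle rule can be exercised without a database by using a small stand-in object. In this sketch `Schedule`, `toggle_schedule`, and the `compute_next_run` callback are hypothetical; the callback takes the place of the PR's `calculate_next_run(...)` call.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Schedule:
    """Minimal stand-in for the ConnectorSchedule row; only the fields used here."""

    is_active: bool
    next_run_at: Optional[datetime] = None


def toggle_schedule(schedule, compute_next_run):
    """Flip is_active and keep next_run_at consistent with the new state."""
    schedule.is_active = not schedule.is_active
    if schedule.is_active:
        # Recompute so a stale timestamp does not trigger an immediate run.
        schedule.next_run_at = compute_next_run()
    else:
        # Inactive schedules carry no pending run time.
        schedule.next_run_at = None
    return schedule
```

Passing the calculation in as a callback keeps the toggle logic testable with fixed timestamps.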
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1 hunks)
surfsense_backend/app/app.py (3 hunks)
surfsense_backend/app/db.py (4 hunks)
surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
surfsense_backend/app/app.py (2)
surfsense_backend/app/services/connector_scheduler_service.py (2)
start_scheduler(371-374), stop_scheduler(377-382)
surfsense_backend/app/db.py (1)
create_db_and_tables(477-481)
surfsense_backend/app/schemas/connector_schedule.py (2)
surfsense_backend/app/db.py (2)
BaseModel(155-159), ScheduleType(133-137)
surfsense_backend/app/schemas/base.py (2)
IDModel(11-13), TimestampModel(6-8)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (7)
ConnectorSchedule(299-330), ScheduleType(133-137), SearchSourceConnector(264-296), SearchSpace(221-261), User(424-433), User(437-443), get_async_session(484-486)
surfsense_backend/app/schemas/connector_schedule.py (3)
ConnectorScheduleCreate(88-91), ConnectorScheduleRead(113-119), ConnectorScheduleUpdate(94-110)
surfsense_backend/app/utils/check_ownership.py (1)
check_ownership(9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
calculate_next_run(10-76)
surfsense_backend/app/routes/scheduler_routes.py (3)
surfsense_backend/app/db.py (3)
get_async_session(484-486), ConnectorSchedule(299-330), SearchSpace(221-261)
surfsense_backend/app/schemas/connector_schedule.py (1)
ConnectorScheduleRead(113-119)
surfsense_backend/app/services/connector_scheduler_service.py (3)
get_scheduler(363-368), get_scheduler_status(310-325), force_execute_schedule(327-356)
surfsense_backend/app/services/connector_scheduler_service.py (4)
surfsense_backend/app/db.py (4)
ConnectorSchedule(299-330), SearchSourceConnector(264-296), SearchSourceConnectorType(56-72), get_async_session(484-486)
surfsense_backend/app/services/task_logging_service.py (4)
TaskLoggingService(13-243), log_task_start(20-58), log_task_failure(107-162), log_task_success(60-105)
surfsense_backend/app/utils/schedule_helpers.py (1)
calculate_next_run(10-76)
surfsense_backend/app/routes/scheduler_routes.py (2)
get_scheduler_status(24-45), force_execute_schedule(49-80)
🪛 GitHub Actions: Code Quality Checks
surfsense_backend/app/routes/scheduler_routes.py
[error] 167-173: Ruff lint: Undefined name 'Log' (F821) and related definitions; potential missing imports or model definitions. Also references to Log in multiple lines.
[error] 178-178: Ruff lint: Redefinition of unused 'logs' (F811) to be resolved after fixing previous undefined references.
🪛 GitHub Actions: pre-commit
surfsense_backend/app/routes/scheduler_routes.py
[error] 167-171: F821 Undefined name Log in query construction.
[error] 178-178: F811 Redefinition of unused logs.
🔇 Additional comments (9)
surfsense_backend/app/db.py (1)
133-330: LGTM! Past review concerns have been addressed.The ScheduleType enum and ConnectorSchedule model are correctly implemented with all required fields, including the previously missing schedule options (daily_time, weekly_day, weekly_time, hourly_minute). Relationships are properly configured with appropriate cascade settings.
surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1)
20-99: LGTM! Migration correctly implements all schedule fields.The migration properly creates the scheduletype enum and connector_schedules table with all required columns, including the schedule configuration options (daily_time, weekly_day, weekly_time, hourly_minute) that were flagged in past reviews. The use of inspector to check for existing indexes is a good practice.
surfsense_backend/app/app.py (1)
1-101: LGTM! Past review concerns have been addressed.
The application correctly integrates the scheduler lifecycle with proper startup/shutdown handling, uses contextlib.suppress for CancelledError as suggested, and registers both the connector_schedules_router and scheduler_router. The logging additions improve observability.
surfsense_backend/app/schemas/connector_schedule.py (1)
24-110: LGTM! Pydantic v2 validators correctly implemented.
All field validators have been updated to use the correct Pydantic v2 signature with info: FieldValidationInfo and access field data via info.data. The validation logic properly enforces schedule type constraints.
surfsense_backend/app/routes/scheduler_routes.py (2)
23-45: LGTM!
The scheduler status endpoint correctly delegates to the scheduler service and includes appropriate error handling.
48-92: LGTM!
The force execute endpoint correctly queues the execution as a background task with proper error handling and logging.
surfsense_backend/app/services/connector_scheduler_service.py (2)
47-100: LGTM!
The scheduler service initialization and lifecycle management (start/stop) are correctly implemented with proper cleanup of active jobs.
277-356: LGTM! Remaining methods correctly implemented.
The schedule update methods and force execute functionality are correctly implemented with proper timezone usage at line 282. The TaskLoggingService integration concerns from past reviews have been resolved.
surfsense_backend/app/routes/connector_schedules_routes.py (1)
70-74: Previous issues resolved: search space validation and time parameters.
The connector/search space mismatch validation (lines 70-74) and full time parameter passing to calculate_next_run (lines 91-98) correctly address the critical issues flagged in previous reviews. The explicit field inclusion (lines 100-114) also prevents non-DB fields from being passed to the model constructor.
Also applies to: 91-98
Actionable comments posted: 2
🧹 Nitpick comments (2)
surfsense_backend/app/routes/connector_schedules_routes.py (1)
353-359: Recalculate/clear next_run_at on toggle.
On deactivate: set next_run_at=None. On activate: compute the next run and ensure it is tz-aware UTC.
Apply:
     # Toggle the active status
     schedule.is_active = not schedule.is_active
+    if schedule.is_active:
+        nr = calculate_next_run(
+            schedule.schedule_type,
+            schedule.cron_expression,
+            schedule.daily_time,
+            schedule.weekly_day,
+            schedule.weekly_time,
+            schedule.hourly_minute,
+        )
+        if nr.tzinfo is None:
+            nr = nr.replace(tzinfo=timezone.utc)
+        schedule.next_run_at = nr
+    else:
+        schedule.next_run_at = None
     await session.commit()
Also applies to: 355-357
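The toggle rule described in this comment can be sketched on its own. This is only an illustration: ensure_utc and next_run_after_toggle are hypothetical helper names that do not appear in the PR, and naive datetimes are assumed to already be expressed in UTC.

```python
from datetime import datetime, timezone
from typing import Optional


def ensure_utc(value: datetime) -> datetime:
    """Return a tz-aware UTC datetime (hypothetical helper, not part of the PR).

    Naive values are assumed to already be expressed in UTC.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def next_run_after_toggle(
    is_active: bool, computed_next_run: Optional[datetime]
) -> Optional[datetime]:
    """Decide what to store in next_run_at once the active flag has been flipped."""
    if not is_active or computed_next_run is None:
        # A deactivated schedule should not keep a stale run time.
        return None
    return ensure_utc(computed_next_run)
```

In the route, computed_next_run would be whatever calculate_next_run returns for the schedule's current settings.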
surfsense_backend/app/routes/scheduler_routes.py (1)
63-68: Tighten log query filter.
- Add Log.source == "connector_scheduler" to avoid capturing unrelated messages.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (7)
ConnectorSchedule (299-330), ScheduleType (133-137), SearchSourceConnector (264-296), SearchSpace (221-261), User (424-433), User (437-443), get_async_session (484-486)
surfsense_backend/app/schemas/connector_schedule.py (3)
ConnectorScheduleCreate (88-91), ConnectorScheduleRead (167-173), ConnectorScheduleUpdate (94-164)
surfsense_backend/app/utils/check_ownership.py (1)
check_ownership (9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
calculate_next_run (10-76)
surfsense_backend/app/services/connector_scheduler_service.py (4)
surfsense_backend/app/db.py (4)
ConnectorSchedule (299-330), SearchSourceConnector (264-296), SearchSourceConnectorType (56-72), get_async_session (484-486)
surfsense_backend/app/services/task_logging_service.py (4)
TaskLoggingService (13-243), log_task_start (20-58), log_task_failure (107-162), log_task_success (60-105)
surfsense_backend/app/utils/schedule_helpers.py (1)
calculate_next_run (10-76)
surfsense_backend/app/routes/scheduler_routes.py (2)
get_scheduler_status (24-45), force_execute_schedule (49-80)
surfsense_backend/app/schemas/connector_schedule.py (2)
surfsense_backend/app/db.py (2)
BaseModel (155-159), ScheduleType (133-137)
surfsense_backend/app/schemas/base.py (2)
IDModel (11-13), TimestampModel (6-8)
surfsense_backend/app/routes/scheduler_routes.py (3)
surfsense_backend/app/db.py (3)
get_async_session (484-486), ConnectorSchedule (299-330), SearchSpace (221-261)
surfsense_backend/app/schemas/connector_schedule.py (1)
ConnectorScheduleRead (167-173)
surfsense_backend/app/services/connector_scheduler_service.py (3)
get_scheduler (363-368), get_scheduler_status (310-325), force_execute_schedule (327-356)
🔇 Additional comments (2)
surfsense_backend/app/schemas/connector_schedule.py (1)
24-37: Validators migrated to Pydantic v2 correctly.
Using FieldValidationInfo and info.data is correct. Good cross-field checks.
Based on learnings
surfsense_backend/app/services/connector_scheduler_service.py (1)
127-141: Timezone-aware scheduling and updates look correct.
Using datetime.now(timezone.utc) and updating last_run_at/next_run_at with tz-aware values aligns with TIMESTAMP(timezone=True).
Also applies to: 277-305
Actionable comments posted: 0
🧹 Nitpick comments (1)
surfsense_backend/app/routes/connector_schedules_routes.py (1)
164-179: Consider eager-loading relationships to avoid N+1 queries.
If callers access the connector or search_space relationships from the returned schedules, each access would trigger a separate query. You can optimize this by eager-loading:
     query = (
         select(ConnectorSchedule)
         .join(SearchSourceConnector)
         .filter(SearchSourceConnector.user_id == user.id)
+        .options(
+            selectinload(ConnectorSchedule.connector),
+            selectinload(ConnectorSchedule.search_space)
+        )
     )
Don't forget to import selectinload:
from sqlalchemy.orm import selectinload
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
surfsense_backend/app/utils/schedule_helpers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- surfsense_backend/app/utils/schedule_helpers.py
🧰 Additional context used
🧬 Code graph analysis (1)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (7)
ConnectorSchedule (299-330), ScheduleType (133-137), SearchSourceConnector (264-296), SearchSpace (221-261), User (424-433), User (437-443), get_async_session (484-486)
surfsense_backend/app/schemas/connector_schedule.py (3)
ConnectorScheduleCreate (88-91), ConnectorScheduleRead (167-173), ConnectorScheduleUpdate (94-164)
surfsense_backend/app/utils/check_ownership.py (1)
check_ownership (9-19)
surfsense_backend/app/utils/schedule_helpers.py (1)
calculate_next_run (10-80)
🔇 Additional comments (2)
surfsense_backend/app/routes/connector_schedules_routes.py (2)
70-75: Excellent improvements addressing previous feedback!
The create endpoint now properly:
- Enforces connector-space ownership match (lines 70-75)
- Passes all time parameters to calculate_next_run (lines 94-100)
- Explicitly includes only DB columns when creating the model (lines 104-116)
These changes resolve the critical issues flagged in previous reviews.
Also applies to: 91-101, 103-116
245-319: Update logic is complex but handles all scenarios correctly.
The update endpoint properly handles:
- Schedule type changes with next_run_at recalculation (lines 270-285)
- Time field updates (lines 264-268, 270-285)
- Cron expression updates for CUSTOM schedules (lines 286-301)
- Activation/deactivation with appropriate next_run_at management (lines 303-319)
- Validation that cron_expression is only used with CUSTOM type (lines 257-262)
- Timezone-aware UTC enforcement (lines 282-283, 298-299, 317-318)
The conditional logic correctly avoids duplicate calculations and handles edge cases like simultaneous type+time changes or activation toggles combined with other updates.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
surfsense_backend/app/connectors/clickup_connector.py (2 hunks)
surfsense_backend/app/connectors/jira_connector.py (2 hunks)
🔇 Additional comments (3)
surfsense_backend/app/connectors/clickup_connector.py (2)
8-8: LGTM!
The datetime import is correctly added to support the new date range parsing functionality.
188-193: Verify the date filter logic aligns with scheduling requirements.
The implementation uses both date_created and date_updated filters with the same timestamp range. ClickUp API typically applies OR logic between these filters, meaning tasks are returned if they were either created or updated in the range.
This will include tasks that were created before start_date but updated during the range, which may or may not be the desired behavior for the scheduling system:
- ✅ Correct for incremental syncs: If the scheduler needs to catch task updates, this is appropriate
- ⚠️ Potentially incorrect for initial syncs: If only newly created tasks are desired, remove the date_updated filters
Please confirm this matches the intended scheduling behavior.
If you only want newly created tasks (not updates), apply this diff:
     "include_closed": str(include_closed).lower(),
-    # Date filtering - filter by both created and updated dates
+    # Date filtering - filter by created date only
     "date_created_gt": start_timestamp,
     "date_created_lt": end_timestamp,
-    "date_updated_gt": start_timestamp,
-    "date_updated_lt": end_timestamp,
surfsense_backend/app/connectors/jira_connector.py (1)
253-253: LGTM on passing constructed JQL into params.
Using params["jql"] = _jql is correct.
# Convert date strings to Unix timestamps (milliseconds)
start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
end_datetime = datetime.strptime(end_date, "%Y-%m-%d")

# Set time to start and end of day for complete coverage
start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)

start_timestamp = int(start_datetime.timestamp() * 1000)
end_timestamp = int(end_datetime.timestamp() * 1000)
Add explicit timezone handling to prevent date boundary shifts.
The code uses naive datetime objects (no timezone), which will be interpreted in the server's local timezone when calling .timestamp(). If the server timezone differs from the user's expected timezone or ClickUp's timezone, this can cause the date range boundaries to shift unexpectedly, potentially fetching tasks from the wrong days.
Apply this diff to use UTC timezone explicitly:
- # Convert date strings to Unix timestamps (milliseconds)
- start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
- end_datetime = datetime.strptime(end_date, "%Y-%m-%d")
-
- # Set time to start and end of day for complete coverage
- start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
- end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)
-
- start_timestamp = int(start_datetime.timestamp() * 1000)
- end_timestamp = int(end_datetime.timestamp() * 1000)
+ # Convert date strings to Unix timestamps (milliseconds)
+ # Use UTC timezone explicitly to ensure consistent date boundaries
+ from datetime import timezone
+
+ start_datetime = datetime.strptime(start_date, "%Y-%m-%d").replace(
+ hour=0, minute=0, second=0, microsecond=0, tzinfo=timezone.utc
+ )
+ end_datetime = datetime.strptime(end_date, "%Y-%m-%d").replace(
+ hour=23, minute=59, second=59, microsecond=999999, tzinfo=timezone.utc
+ )
+
+ start_timestamp = int(start_datetime.timestamp() * 1000)
+ end_timestamp = int(end_datetime.timestamp() * 1000)
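The effect of pinning the boundaries to UTC can be checked in isolation. The sketch below is not the connector's code: day_range_to_utc_ms is a hypothetical helper name, and it assumes the caller wants inclusive whole-day boundaries in UTC, expressed as the millisecond timestamps the comment describes.

```python
from datetime import datetime, time, timezone
from typing import Tuple


def day_range_to_utc_ms(start_date: str, end_date: str) -> Tuple[int, int]:
    """Convert YYYY-MM-DD strings to inclusive UTC day boundaries in milliseconds.

    Hypothetical helper for illustration; the name is not taken from the PR.
    """
    start_day = datetime.strptime(start_date, "%Y-%m-%d").date()
    end_day = datetime.strptime(end_date, "%Y-%m-%d").date()

    # time.min is 00:00:00 and time.max is 23:59:59.999999; attaching
    # timezone.utc makes the result independent of the server's local zone.
    start_dt = datetime.combine(start_day, time.min, tzinfo=timezone.utc)
    end_dt = datetime.combine(end_day, time.max, tzinfo=timezone.utc)

    return int(start_dt.timestamp() * 1000), int(end_dt.timestamp() * 1000)
```

Because both datetimes are aware, timestamp() no longer consults the process's local timezone, so the same input yields the same boundaries on any server.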
f"(createdDate >= '{start_date}' AND createdDate <= '{end_date}') "
f"OR (updatedDate >= '{start_date}' AND updatedDate <= '{end_date}')"
)
# TODO : This JQL needs some improvement to work as expected

_jql = f"{date_filter}"
_jql = f"{date_filter} ORDER BY created DESC"
if project_key:
_jql = (
f'project = "{project_key}" AND {date_filter} ORDER BY created DESC'
f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
)
🧩 Analysis chain
Fix JQL field names; sanitize project_key (prevents 400s and injection).
Use JQL fields created/updated (not createdDate/updatedDate). Also escape quotes in project_key.
Apply this diff:
- date_filter = (
- f"(createdDate >= '{start_date}' AND createdDate <= '{end_date}') "
- f"OR (updatedDate >= '{start_date}' AND updatedDate <= '{end_date}')"
- )
+ date_filter = (
+ f"(created >= '{start_date}' AND created <= '{end_date}') "
+ f"OR (updated >= '{start_date}' AND updated <= '{end_date}')"
+ )
- _jql = f"{date_filter} ORDER BY created DESC"
+ _jql = f"{date_filter} ORDER BY created DESC"
if project_key:
- _jql = (
- f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
- )
+ safe_project = project_key.replace('"', '\\"')
+ _jql = (
+ f'project = "{safe_project}" AND ({date_filter}) ORDER BY created DESC'
+ )
Optional refinements:
- Use startOfDay/endOfDay for inclusivity and TZ safety: created >= startOfDay("YYYY-MM-DD") AND created <= endOfDay("YYYY-MM-DD") (same for updated).
- Consider ORDER BY updated DESC to surface most recently changed issues.
🏁 Script executed:
#!/bin/bash
# Find any other occurrences of non-standard JQL fields
rg -nP -C2 '\b(createdDate|updatedDate)\b' --type=py
Length of output: 627
🌐 Web query:
Does Jira JQL use fields named 'createdDate' and 'updatedDate', or should it be 'created' and 'updated'?
💡 Result:
Use created and updated in JQL — createdDate and updatedDate are accepted aliases. [1][2]
Sources: Atlassian JQL fields reference (created → alias createdDate; updated → alias updatedDate). [1][2]
🏁 Script executed:
#!/bin/bash
sed -n '180,260p' surfsense_backend/app/connectors/jira_connector.py
Length of output: 2494
Escape project_key to prevent JQL injection
Sanitize or validate project_key (e.g. escape double quotes or restrict to alphanumeric) before embedding it in the JQL string.
🤖 Prompt for AI Agents
In surfsense_backend/app/connectors/jira_connector.py around lines 225 to 233,
the JQL string interpolates project_key directly which allows JQL injection;
validate or sanitize project_key before embedding it by either restricting it to
an allowed pattern (e.g. only letters, digits, hyphen/underscore) and raising an
error on invalid values, or by escaping any double quotes and backslashes in
project_key (replace " with \" and \ with \\) before inserting into the JQL;
prefer validation against a strict regex (e.g. alphanumeric with allowed
punctuation) to ensure only safe project keys are used.
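A minimal sketch of the validation approach described in the prompt above. The function name build_issue_jql and both regular expressions are assumptions made for illustration; the pattern is deliberately stricter than whatever a given Jira instance might allow, and the query text mirrors the date filter quoted in this thread.

```python
import re
from typing import Optional

# Assumed shape of a safe project key: a letter followed by letters, digits or
# underscores. A real Jira instance may be configured differently.
_PROJECT_KEY_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_issue_jql(
    start_date: str, end_date: str, project_key: Optional[str] = None
) -> str:
    """Build the date-filtered JQL, rejecting values that could alter the query."""
    for value in (start_date, end_date):
        if not _DATE_RE.fullmatch(value):
            raise ValueError(f"Expected YYYY-MM-DD, got {value!r}")

    date_filter = (
        f"(created >= '{start_date}' AND created <= '{end_date}') "
        f"OR (updated >= '{start_date}' AND updated <= '{end_date}')"
    )
    if not project_key:
        return f"{date_filter} ORDER BY created DESC"

    # Validate instead of escaping: anything outside the pattern is refused.
    if not _PROJECT_KEY_RE.fullmatch(project_key):
        raise ValueError(f"Unsafe project key: {project_key!r}")
    return f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
```

Rejecting unexpected input keeps the query builder simple; escaping quotes and backslashes, as the review also suggests, is the alternative when keys with unusual characters must be supported.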
local STT implementation with Faster-Whisper
chore: updated docs & refactored sst_service.py
- Fixed Jira connector to use constructed JQL query for date filtering
- Fixed ClickUp connector to include date range parameters in API request
- Resolved merge conflicts with remote branch improvements
- Enhanced ClickUp date handling with complete day coverage (00:00:00 to 23:59:59)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
surfsense_backend/app/db.py (1)
285-297: Fix back_populates mismatches (will raise mapper configuration errors).
- SearchSourceConnector.search_space uses back_populates="search_source_connectors", but SearchSpace defines no search_source_connectors.
- SearchSourceConnector.user uses back_populates="search_source_connectors", but User defines no search_source_connectors.
- LLMConfig.search_space uses back_populates="llm_configs" (Lines 353-357), but SearchSpace defines no llm_configs.
- UserSearchSpacePreference.search_space uses back_populates="user_preferences" (Lines 391-393), but SearchSpace defines no user_preferences.
Add the missing counterparts (and avoid delete-orphan on both parents for the same child):
Apply this diff to restore/add the missing relationships:
 class SearchSpace(BaseModel, TimestampMixin):
@@
     logs = relationship(
         "Log",
         back_populates="search_space",
         order_by="Log.id",
         cascade="all, delete-orphan",
     )
+    # Parents of connectors/configs/preferences
+    search_source_connectors = relationship(
+        "SearchSourceConnector",
+        back_populates="search_space",
+        order_by="SearchSourceConnector.id",
+        cascade="all, delete-orphan",
+    )
+    llm_configs = relationship(
+        "LLMConfig",
+        back_populates="search_space",
+        order_by="LLMConfig.id",
+        cascade="all, delete-orphan",
+    )
+    user_preferences = relationship(
+        "UserSearchSpacePreference",
+        back_populates="search_space",
+        order_by="UserSearchSpacePreference.id",
+        cascade="all, delete-orphan",
+    )
     connector_schedules = relationship(
         "ConnectorSchedule",
         back_populates="search_space",
         order_by="ConnectorSchedule.id",
         cascade="all, delete-orphan",
     )
Add the missing User-side relationship in both auth variants:
@@ class User(SQLAlchemyBaseUserTableUUID, Base):
     oauth_accounts: Mapped[list[OAuthAccount]] = relationship(
         "OAuthAccount", lazy="joined"
     )
     search_spaces = relationship("SearchSpace", back_populates="user")
     search_space_preferences = relationship(
         "UserSearchSpacePreference",
         back_populates="user",
         cascade="all, delete-orphan",
     )
+    search_source_connectors = relationship(
+        "SearchSourceConnector", back_populates="user"
+    )
@@ class User(SQLAlchemyBaseUserTableUUID, Base):
     search_spaces = relationship("SearchSpace", back_populates="user")
     search_space_preferences = relationship(
         "UserSearchSpacePreference",
         back_populates="user",
         cascade="all, delete-orphan",
     )
+    search_source_connectors = relationship(
+        "SearchSourceConnector", back_populates="user"
+    )
🧹 Nitpick comments (3)
surfsense_backend/app/services/stt_service.py (2)
25-35: Consider error handling for model initialization failures.
The lazy loading pattern is appropriate, but model initialization could fail due to missing model files, insufficient memory, or other issues. Consider adding try-except around model creation to provide clearer error messages.
 def _get_model(self) -> WhisperModel:
     """Lazy load the Whisper model."""
     if self._model is None:
-        # Use CPU with optimizations for better performance
-        self._model = WhisperModel(
-            self.model_size,
-            device="cpu",
-            compute_type="int8",  # Quantization for faster CPU inference
-            num_workers=1,  # Single worker for stability
-        )
+        try:
+            # Use CPU with optimizations for better performance
+            self._model = WhisperModel(
+                self.model_size,
+                device="cpu",
+                compute_type="int8",  # Quantization for faster CPU inference
+                num_workers=1,  # Single worker for stability
+            )
+        except Exception as e:
+            raise RuntimeError(
+                f"Failed to initialize Whisper model '{self.model_size}': {e}"
+            ) from e
     return self._model
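The lazy-load-with-wrapped-error pattern from this suggestion does not depend on Whisper. A library-free sketch, where LazyResource and its factory argument are hypothetical names introduced only for illustration:

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Create an expensive object on first use and cache it afterwards.

    Hypothetical class for illustration; it is not part of the PR.
    """

    def __init__(self, name: str, factory: Callable[[], T]) -> None:
        self._name = name
        self._factory = factory
        self._instance: Optional[T] = None

    def get(self) -> T:
        if self._instance is None:
            try:
                self._instance = self._factory()
            except Exception as e:
                # Wrap the low-level error but keep it as __cause__ for debugging.
                raise RuntimeError(
                    f"Failed to initialize '{self._name}': {e}"
                ) from e
        return self._instance
```

Chaining with "from e" keeps the original traceback available while giving callers one predictable exception type to handle.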
37-68: Add error handling for transcription failures.
The model.transcribe() call can fail for various reasons (corrupted audio, unsupported format, etc.). Consider adding try-except to provide more informative error messages rather than letting raw faster-whisper exceptions propagate.
 def transcribe_file(self, audio_path: str, language: str | None = None) -> dict:
     """Transcribe audio file to text.

     Args:
         audio_path: Path to audio file
         language: Optional language code (e.g., "en", "es")

     Returns:
         Dict with transcription text and metadata
     """
     model = self._get_model()
-    # Transcribe with optimized settings
-    segments, info = model.transcribe(
-        audio_path,
-        language=language,
-        beam_size=1,  # Faster inference
-        best_of=1,  # Single pass
-        temperature=0,  # Deterministic output
-        vad_filter=True,  # Voice activity detection
-        vad_parameters={"min_silence_duration_ms": 500},
-    )
+    try:
+        # Transcribe with optimized settings
+        segments, info = model.transcribe(
+            audio_path,
+            language=language,
+            beam_size=1,  # Faster inference
+            best_of=1,  # Single pass
+            temperature=0,  # Deterministic output
+            vad_filter=True,  # Voice activity detection
+            vad_parameters={"min_silence_duration_ms": 500},
+        )
+    except Exception as e:
+        raise RuntimeError(
+            f"Failed to transcribe audio file '{audio_path}': {e}"
+        ) from e
     # Combine all segments
     text = " ".join(segment.text.strip() for segment in segments)
surfsense_backend/app/db.py (1)
300-331: Add DB-level validation for ranges (defensive integrity).
Add CHECK constraints to enforce valid ranges:
- weekly_day: 0..6
- hourly_minute: 0..59
Apply this diff:
-from sqlalchemy import (
+from sqlalchemy import (
     ARRAY,
     JSON,
     TIMESTAMP,
     Boolean,
     Column,
     Enum as SQLAlchemyEnum,
     ForeignKey,
     Integer,
     String,
     Text,
     Time,
     UniqueConstraint,
     text,
+    CheckConstraint,
 )
@@
 class ConnectorSchedule(BaseModel, TimestampMixin):
     __tablename__ = "connector_schedules"
-    __table_args__ = (
-        UniqueConstraint(
-            "connector_id", "search_space_id", name="uq_connector_search_space"
-        ),
-    )
+    __table_args__ = (
+        UniqueConstraint(
+            "connector_id", "search_space_id", name="uq_connector_search_space"
+        ),
+        CheckConstraint(
+            "weekly_day IS NULL OR (weekly_day BETWEEN 0 AND 6)",
+            name="ck_connector_schedules_weekly_day_range",
+        ),
+        CheckConstraint(
+            "hourly_minute IS NULL OR (hourly_minute BETWEEN 0 AND 59)",
+            name="ck_connector_schedules_hourly_minute_range",
+        ),
+    )
Optional: if you'll filter frequently by schedule_type, consider index=True on schedule_type.
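How the two CHECK expressions behave can be tried without the project's database. The sketch below uses the standard-library sqlite3 module purely as a stand-in (the PR's models target a different backend), and try_insert is a hypothetical helper; only the constraint expressions and names are copied from the suggestion.

```python
import sqlite3

# In-memory stand-in table carrying only the two range-checked columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE connector_schedules (
        id INTEGER PRIMARY KEY,
        weekly_day INTEGER,
        hourly_minute INTEGER,
        CONSTRAINT ck_connector_schedules_weekly_day_range
            CHECK (weekly_day IS NULL OR (weekly_day BETWEEN 0 AND 6)),
        CONSTRAINT ck_connector_schedules_hourly_minute_range
            CHECK (hourly_minute IS NULL OR (hourly_minute BETWEEN 0 AND 59))
    )
    """
)


def try_insert(weekly_day, hourly_minute) -> bool:
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO connector_schedules (weekly_day, hourly_minute) "
                "VALUES (?, ?)",
                (weekly_day, hourly_minute),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The "IS NULL OR" guard matters because the columns are optional: a schedule type that does not use weekly_day should still be insertable.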
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
surfsense_backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
surfsense_backend/.env.example (1 hunks)
surfsense_backend/app/config/__init__.py (1 hunks)
surfsense_backend/app/db.py (4 hunks)
surfsense_backend/app/routes/documents_routes.py (2 hunks)
surfsense_backend/app/services/stt_service.py (1 hunks)
surfsense_backend/pyproject.toml (1 hunks)
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx (1 hunks)
surfsense_web/components/chat/ChatInputGroup.tsx (1 hunks)
surfsense_web/components/inference-params-editor.tsx (1 hunks)
surfsense_web/components/onboard/add-provider-step.tsx (1 hunks)
surfsense_web/content/docs/docker-installation.mdx (1 hunks)
surfsense_web/content/docs/manual-installation.mdx (1 hunks)
surfsense_web/drizzle.config.ts (1 hunks)
surfsense_web/hooks/use-connector-edit-page.ts (2 hunks)
✅ Files skipped from review due to trivial changes (6)
- surfsense_web/content/docs/manual-installation.mdx
- surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx
- surfsense_web/components/chat/ChatInputGroup.tsx
- surfsense_web/components/inference-params-editor.tsx
- surfsense_web/drizzle.config.ts
- surfsense_web/content/docs/docker-installation.mdx
🧰 Additional context used
📓 Path-based instructions (3)
**/.env.*
📄 CodeRabbit inference engine (.rules/no_env_files_in_repo.mdc)
Do not commit variant environment files like .env.* (e.g., .env.local, .env.production)
Files:
surfsense_backend/.env.example
**/.env.example
📄 CodeRabbit inference engine (.rules/no_env_files_in_repo.mdc)
Provide a .env.example file with placeholder values instead of real secrets
Files:
surfsense_backend/.env.example
**/*.{jsx,tsx}
📄 CodeRabbit inference engine (.rules/require_unique_id_props.mdc)
**/*.{jsx,tsx}: When mapping arrays to React elements in JSX/TSX, each rendered element must include a unique key prop
Keys used for React list items should be stable, predictable, and unique among siblings
Files:
surfsense_web/components/onboard/add-provider-step.tsx
🧬 Code graph analysis (1)
surfsense_backend/app/routes/documents_routes.py (2)
surfsense_backend/app/services/stt_service.py (1)
transcribe_file (37-68)
surfsense_backend/app/services/task_logging_service.py (1)
log_task_progress (164-208)
🔇 Additional comments (16)
surfsense_web/hooks/use-connector-edit-page.ts (2)
340-340: LGTM! Validation logic is correct and more concise.
The consolidated validation condition correctly checks that the parsed value is an integer between 0 and 2, matching the error message. This refactor improves readability without changing behavior.
519-519: LGTM! Form update is consistent with the codebase pattern.
The single-line form value update follows the same pattern used throughout this file for other form fields, using the || operator to provide a default empty string.
surfsense_web/components/onboard/add-provider-step.tsx (1)
21-21: LGTM: Alphabetical import ordering.
The reordering of LLM_PROVIDERS import to follow LANGUAGES improves consistency by maintaining alphabetical order within the enum imports group. This change has no functional impact.
surfsense_backend/app/config/__init__.py (1)
105-105: LGTM! Improved comment accuracy.
The updated comment correctly reflects that the STT service can now be either local (Faster-Whisper) or external (LiteLLM), rather than being limited to LiteLLM.
surfsense_backend/app/services/stt_service.py (2)
15-23: LGTM! Model size parsing with sensible fallback.
The initialization correctly extracts the model size from the config string (e.g., "local/base") with an appropriate fallback to "base" if the format is unexpected.
86-96: LGTM! Proper temporary file cleanup.
The temporary file is correctly cleaned up in the finally block, ensuring cleanup even if transcription fails.
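Both behaviours praised here are easy to sketch outside the service. The two functions below are hypothetical (their names and signatures are not from stt_service.py); they assume the config string has the "local/MODEL_SIZE" shape mentioned in the comment and that cleanup must happen whether or not the worker raises.

```python
import os
import tempfile
from typing import Callable, Optional


def model_size_from_config(stt_service: Optional[str], default: str = "base") -> str:
    """Extract MODEL_SIZE from "local/MODEL_SIZE", falling back to a default."""
    prefix, sep, size = (stt_service or "").partition("/")
    return size if (prefix == "local" and sep and size) else default


def run_on_temp_file(data: bytes, suffix: str, worker: Callable[[str], str]) -> str:
    """Write data to a temporary file, run worker on its path, then always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return worker(path)
    finally:
        # Runs on success and on failure, so no temp file is left behind.
        if os.path.exists(path):
            os.unlink(path)
```

Here worker stands in for the transcription call; any callable that takes a file path fits.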
surfsense_backend/app/routes/documents_routes.py (4)
787-793: LGTM! Clean STT service type detection.
The logic correctly identifies local STT by checking if the config starts with "local/", with appropriate fallback to "external".
817-826: LGTM! Comprehensive progress logging for local STT.
The progress logging includes relevant metadata (language, confidence, duration) and clearly indicates the local transcription path.
876-893: LGTM! Proper success logging with STT service type.
The success logging correctly includes the stt_service_type metadata (line 885), enabling differentiation between local and external transcription in logs and analytics.
829-853: The file handle is safe and properly managed as-is.
The current code already keeps the file open during the async operation. The with statement context manager is the outer scope of the await atranscription() call, meaning the file remains open while the function reads it (as required by litellm), and only closes after the await completes. This is the correct and recommended pattern for handling file objects with litellm's atranscription.
The suggested refactor to manual open()/try/finally provides no benefit and is unnecessary; the context manager is superior as it guarantees proper cleanup even if an exception occurs.
Likely an incorrect or invalid review comment.
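The point about the with block outliving the await can be demonstrated with the standard library alone. In this sketch fake_atranscription is a stand-in coroutine, not litellm's API, and run_demo exists only to exercise it against a throwaway file.

```python
import asyncio
import os
import tempfile


async def fake_atranscription(file) -> str:
    """Stand-in for an async transcription call; it is NOT litellm's API."""
    await asyncio.sleep(0)  # yield to the event loop before touching the file
    return file.read().decode("utf-8")


async def transcribe(path: str) -> str:
    # The await sits inside the with block, so the handle stays open until
    # the coroutine has finished reading from it.
    with open(path, "rb") as audio_file:
        return await fake_atranscription(file=audio_file)


def run_demo() -> str:
    """Write a small file, read it back through the async path, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"hello")
        return asyncio.run(transcribe(path))
    finally:
        os.unlink(path)
```

If the await were moved outside the with block, the coroutine would try to read from an already closed file and raise ValueError.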
surfsense_backend/.env.example (1)
34-40: LGTM! Clear STT configuration with local default.
The updated configuration appropriately defaults to local STT (local/base) with clear comments explaining both local and external provider options. The commented-out external provider settings serve as useful documentation.
surfsense_backend/pyproject.toml (1)
46-46: Version constraint is appropriate.
The >=1.1.0 constraint is correct. Verification confirms that faster-whisper v1.1.0 includes all required features: WhisperModel with device, compute_type, and num_workers parameters, plus VAD filtering with customizable parameters. The constraint allows for security patches and compatible updates while maintaining the required functionality.
surfsense_backend/app/db.py (4)
19-19: Good: added Time import needed for new schedule fields.
133-138: ScheduleType enum addition looks correct and scoped.
257-262: New SearchSpace.connector_schedules relationship looks good.
300-331: Alembic migration verified: all four schedule columns are present.
The migration file surfsense_backend/alembic/versions/23_add_connector_schedules_table.py includes all four new columns in the table creation:
- daily_time TIME
- weekly_day SMALLINT
- weekly_time TIME
- hourly_minute SMALLINT
Model, schemas, and service code all properly reference these columns. No further action needed.
# Check if using local STT service
if stt_service_type == "local":
# Use local Faster-Whisper for transcription
from app.services.stt_service import stt_service

try:
result = stt_service.transcribe_file(file_path)
transcribed_text = result.get("text", "")

if not transcribed_text:
raise ValueError("Transcription returned empty text")

# Add metadata about the transcription
transcribed_text = (
f"# Transcription of {filename}\n\n{transcribed_text}"
)
else:
except Exception as e:
raise HTTPException(
status_code=422,
detail=f"Failed to transcribe audio file {filename}: {e!s}",
) from e
Replace HTTPException with regular exception in background task.
Raising HTTPException on line 812 is incorrect for background task execution. Since this code runs in process_file_in_background, HTTP exceptions cannot be properly propagated to the client. This will likely cause the exception to be logged but not handled gracefully.
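Why a plain exception fits a background job better can be sketched without any web framework. Both function names and the failure_log list below are hypothetical; failure_log stands in for whatever the outer handler does with the error, such as the task logger's failure call.

```python
from typing import Callable, List


def transcribe_step(filename: str, transcribe: Callable[[str], str]) -> str:
    """Run one transcription and wrap any failure in a plain RuntimeError."""
    try:
        text = transcribe(filename)
        if not text:
            raise ValueError("Transcription returned empty text")
        return f"# Transcription of {filename}\n\n{text}"
    except Exception as e:
        # No HTTP response exists here, so a status code would carry no meaning.
        raise RuntimeError(
            f"Failed to transcribe audio file {filename}: {e!s}"
        ) from e


def run_background_job(
    filename: str, transcribe: Callable[[str], str], failure_log: List[str]
) -> bool:
    """Outer handler: record the failure instead of sending it to a client."""
    try:
        transcribe_step(filename, transcribe)
        return True
    except Exception as e:
        failure_log.append(str(e))  # stand-in for a task-failure log call
        return False
```

The request that scheduled the job has already returned by the time this runs, so the only useful destination for the error is the job's own log.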
Apply this diff to use appropriate exception handling:
     try:
         result = stt_service.transcribe_file(file_path)
         transcribed_text = result.get("text", "")
         if not transcribed_text:
             raise ValueError("Transcription returned empty text")
         # Add metadata about the transcription
         transcribed_text = (
             f"# Transcription of {filename}\n\n{transcribed_text}"
         )
     except Exception as e:
-        raise HTTPException(
-            status_code=422,
-            detail=f"Failed to transcribe audio file {filename}: {e!s}",
-        ) from e
+        # Let the exception propagate to be caught by the outer try-except
+        # which will log it via task_logger.log_task_failure
+        raise RuntimeError(
+            f"Failed to transcribe audio file {filename}: {e!s}"
+        ) from e
🤖 Prompt for AI Agents
In surfsense_backend/app/routes/documents_routes.py around lines 795 to 815, the
background transcription block currently raises an HTTPException inside the
background task which cannot be propagated to the client; replace the
HTTPException with a regular Exception (or a custom non-HTTP exception) and
ensure you log the error before re-raising so the background worker can handle
it properly (e.g., raise Exception(f"Failed to transcribe audio file {filename}:
{e}") or log and return an error state/object instead of using HTTPException).
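The point of this review comment can be illustrated with a small, framework-free sketch: inside a background task no client is waiting for a response, so the failure is raised as an ordinary exception and recorded by an outer handler. The functions `transcribe` and `transcribe_step` are hypothetical stand-ins, and the standard `logging` call takes the place of the project's `task_logger.log_task_failure`.

```python
import logging

logger = logging.getLogger("background_tasks")


def transcribe(file_path: str) -> dict:
    """Stand-in for stt_service.transcribe_file; returns empty text to force the error path."""
    return {"text": ""}


def transcribe_step(file_path: str, filename: str) -> str:
    """Inner step: wrap any failure in a plain RuntimeError, not an HTTP error."""
    try:
        text = transcribe(file_path).get("text", "")
        if not text:
            raise ValueError("Transcription returned empty text")
        return f"# Transcription of {filename}\n\n{text}"
    except Exception as e:
        raise RuntimeError(
            f"Failed to transcribe audio file {filename}: {e!s}"
        ) from e


def process_file_in_background(file_path: str, filename: str) -> dict:
    """Outer handler: log the failure and return an error state for later inspection."""
    try:
        return {"status": "ok", "content": transcribe_step(file_path, filename)}
    except Exception as e:
        logger.error("Background task failed: %s", e)
        return {"status": "failed", "error": str(e)}
```

Because the original cause is chained with `from e`, the underlying error stays available in the traceback for debugging.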
@vaishcodescape I think this should be done after we add Celery as the message queue. I believe this was already conveyed. Please DM me on Discord if you’d like to work on something that’s more likely to get merged. Closing this for now, sorry.
Implemented Automated Connector Scheduling System
Description
This PR implements a comprehensive automated scheduling system that allows users to configure periodic syncs for their connectors (Slack, Notion, GitHub, Linear, etc.) without manual intervention. The system addresses critical user pain points around data staleness, manual overhead, and peak load issues.
The implementation includes:
Motivation and Context
Currently, users must manually trigger syncs for each connector to index new content into their search spaces. This creates several critical issues:
This automated scheduling system solves these problems by providing:
Changes Overview
Backend Infrastructure
ConnectorSchedulerService: Core background service managing schedule execution
Enhanced Database Schema: Extended ConnectorSchedule model with:
Schedule Helpers: Enhanced time calculation utilities supporting:
API Endpoints
Schedule Management: Full CRUD operations for connector schedules
- POST /api/v1/connector-schedules/ - Create new schedules
- GET /api/v1/connector-schedules/ - List schedules with filtering
- PUT /api/v1/connector-schedules/{id} - Update existing schedules
- DELETE /api/v1/connector-schedules/{id} - Remove schedules
- PATCH /api/v1/connector-schedules/{id}/toggle - Activate/deactivate
Scheduler Monitoring: Real-time status and control endpoints
- GET /api/v1/scheduler/status - Current scheduler health and statistics
- POST /api/v1/scheduler/schedules/{id}/force-execute - Manual execution
- GET /api/v1/scheduler/schedules/upcoming - Next scheduled executions
- GET /api/v1/scheduler/schedules/recent-executions - Execution history
Frontend Implementation
Integration & Lifecycle Management
Application Integration: Seamless integration with FastAPI lifespan events
Background Task Integration: Leverages existing FastAPI BackgroundTasks infrastructure
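A minimal sketch of what lifespan integration could look like, assuming a scheduler object with `start()` and `stop()` coroutines. `SchedulerSketch` is a hypothetical stand-in for `ConnectorSchedulerService`, and the polling loop only counts ticks instead of running real syncs.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Optional


class SchedulerSketch:
    """Hypothetical stand-in for ConnectorSchedulerService."""

    def __init__(self, poll_seconds: float = 60.0) -> None:
        self.poll_seconds = poll_seconds
        self.ticks = 0
        self._task: Optional[asyncio.Task] = None

    async def _loop(self) -> None:
        while True:
            self.ticks += 1  # a real service would look up and run due schedules here
            await asyncio.sleep(self.poll_seconds)

    async def start(self) -> None:
        self._task = asyncio.create_task(self._loop())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass  # expected when the polling loop is cancelled
            self._task = None

    @property
    def running(self) -> bool:
        return self._task is not None and not self._task.done()


scheduler = SchedulerSketch()


@asynccontextmanager
async def lifespan(app):
    """Start the scheduler on application startup and stop it on shutdown."""
    await scheduler.start()
    try:
        yield
    finally:
        await scheduler.stop()
```

With FastAPI, a function like this would be passed as `FastAPI(lifespan=lifespan)`, so the polling task lives exactly as long as the application.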
Monitoring & Observability
Comprehensive Logging: Detailed execution tracking through existing TaskLoggingService
Real-time Status: Live monitoring capabilities
Technical Implementation Details
Architecture
Security & Validation
Performance Considerations
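The scheduler is described in this PR as running jobs with configurable concurrency limits and automatic retry logic. One common way to get both is a semaphore around each job plus a bounded retry loop with exponential backoff. The sketch below shows that pattern under those assumptions; the function names and default values are illustrative, not taken from the PR's code.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Job = Callable[[], Awaitable[None]]


async def run_with_retry(job: Job, max_attempts: int = 3, base_delay: float = 1.0) -> bool:
    """Run one job, retrying with exponential backoff; return True on success."""
    for attempt in range(1, max_attempts + 1):
        try:
            await job()
            return True
        except Exception:
            if attempt == max_attempts:
                return False
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return False


async def run_due_jobs(
    jobs: Sequence[Job],
    max_concurrent: int = 2,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> List[bool]:
    """Run all due jobs, never more than max_concurrent at the same time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Job) -> bool:
        async with semaphore:
            return await run_with_retry(job, max_attempts, base_delay)

    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```

Results come back in the same order as the input jobs, which makes it straightforward to record a success or failure per schedule.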
Files Modified/Created
New Files
- app/services/connector_scheduler_service.py - Core scheduler implementation
- app/routes/scheduler_routes.py - Scheduler monitoring and control APIs
- app/dashboard/[search_space_id]/connectors/schedules/page.tsx - Frontend UI
Modified Files
- app/app.py - Integrated scheduler lifecycle management
- app/schemas/connector_schedule.py - Enhanced validation and time options
- app/utils/schedule_helpers.py - Extended time calculation utilities
Database
- alembic/versions/23_add_connector_schedules_table.py - Schema migration (already existed)
API Changes
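For orientation, the endpoints listed under API Endpoints could be exercised with requests built like the ones below. This is a sketch only: the base URL is an assumption, authentication headers are omitted, and no request bodies are shown because the schedule schema is not reproduced on this page. The requests are constructed but not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: local development server


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the backend."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def toggle_schedule(schedule_id: int) -> urllib.request.Request:
    """PATCH .../connector-schedules/{id}/toggle: activate or deactivate a schedule."""
    return build_request("PATCH", f"/api/v1/connector-schedules/{schedule_id}/toggle")


def force_execute(schedule_id: int) -> urllib.request.Request:
    """POST .../scheduler/schedules/{id}/force-execute: run a schedule immediately."""
    return build_request("POST", f"/api/v1/scheduler/schedules/{schedule_id}/force-execute")


def scheduler_status() -> urllib.request.Request:
    """GET .../scheduler/status: current scheduler health and statistics."""
    return build_request("GET", "/api/v1/scheduler/status")
```

Sending any of these would be a matter of passing the request to `urllib.request.urlopen`, with whatever authentication the deployment requires added to the headers.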
Types of changes
Testing
Checklist:
High-level PR Summary
This PR implements a comprehensive automated connector scheduling system that enables users to configure periodic syncs (hourly, daily, weekly, or custom cron-based) for their connectors without manual intervention. The implementation includes a background scheduler service that continuously monitors and executes scheduled jobs, enhanced database schema with the ConnectorSchedule model supporting multiple schedule types, complete REST APIs for schedule CRUD operations and monitoring, a modern React frontend for schedule management, and robust error handling with automatic retry logic and comprehensive logging. The scheduler integrates seamlessly with FastAPI lifespan events and leverages existing connector indexer functions to maintain user isolation and security boundaries.
⏱️ Estimated Review Time: 1-3 hours
💡 Review Order Suggestion
- surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
- surfsense_backend/app/db.py
- surfsense_backend/app/schemas/connector_schedule.py
- surfsense_backend/app/schemas/__init__.py
- surfsense_backend/app/utils/schedule_helpers.py
- surfsense_backend/app/services/connector_scheduler_service.py
- surfsense_backend/app/routes/connector_schedules_routes.py
- surfsense_backend/app/routes/scheduler_routes.py
- surfsense_backend/app/app.py
- surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
- surfsense_web/package-lock.json
- .idea/.gitignore
- .idea/SurfSense.iml
- .idea/inspectionProfiles/profiles_settings.xml
- .idea/modules.xml
- .idea/vcs.xml
High-level PR Summary
This PR implements a comprehensive automated connector scheduling system that enables users to configure periodic syncs (hourly, daily, weekly, or custom cron-based) for their connectors without manual intervention. The implementation includes a background scheduler service (ConnectorSchedulerService) that continuously monitors and executes scheduled jobs with configurable concurrency limits, enhanced database schema with the ConnectorSchedule model supporting multiple schedule types and tracking fields, complete REST APIs for schedule CRUD operations and real-time monitoring, a modern React frontend for intuitive schedule management, and robust error handling with automatic retry logic and comprehensive logging. The scheduler integrates seamlessly with FastAPI lifespan events, leverages existing connector indexer functions to maintain user isolation and security boundaries, and provides real-time status tracking with execution history for monitoring and debugging.
⏱️ Estimated Review Time: 1-3 hours
💡 Review Order Suggestion
- surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
- surfsense_backend/app/db.py
- surfsense_backend/app/schemas/connector_schedule.py
- surfsense_backend/app/schemas/__init__.py
- surfsense_backend/app/utils/schedule_helpers.py
- surfsense_backend/app/services/connector_scheduler_service.py
- surfsense_backend/app/routes/connector_schedules_routes.py
- surfsense_backend/app/routes/scheduler_routes.py
- surfsense_backend/app/app.py
- surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
- surfsense_web/package-lock.json
- .idea/.gitignore
- .idea/SurfSense.iml
- .idea/inspectionProfiles/profiles_settings.xml
- .idea/modules.xml
- .idea/vcs.xml
Summary by CodeRabbit
New Features
Improvements
Chores