
@MODSetter MODSetter commented Jun 7, 2025

  • This should stabilize manual syncing.

Description

feat: Added Calendar Based Indexing.

Motivation and Context

To stabilize manual syncing.

Screenshots

[Image: calender_based_indexing_trigger]

API Changes

  • This PR includes API changes

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance improvement (non-breaking change which enhances performance)
  • Documentation update
  • Breaking change (fix or feature that would cause existing functionality to change)

Testing

  • I have tested these changes locally
  • I have added/updated unit tests
  • I have added/updated integration tests

Checklist:

  • My code follows the code style of this project
  • My change requires documentation updates
  • I have updated the documentation accordingly
  • My change requires dependency updates
  • I have updated the dependencies accordingly
  • My code builds clean without any errors or warnings
  • All new and existing tests passed

Summary by CodeRabbit

  • New Features

    • Added the ability to select a custom date range when indexing connector content, allowing users to specify start and end dates for indexing.
    • Introduced a calendar date picker for selecting date ranges in the UI.
    • Added quick indexing and date range indexing options for connectors.
  • Improvements

    • Enhanced indexing to consistently support date range parameters across all connector types.
    • Updated notifications to reflect the outcome of indexing actions.
  • Dependencies

    • Added the react-day-picker library to support the new calendar component.

- This should stabilize manual syncing.


coderabbitai bot commented Jun 7, 2025

Walkthrough

The indexing functionality for connector content was enhanced to support optional date range selection. Backend endpoints, background tasks, and indexing functions now accept and propagate start_date and end_date parameters. The frontend UI introduces a date range picker dialog and quick index option, with supporting calendar components and updated API calls to handle the new parameters.

Changes

| File(s) | Change Summary |
|---|---|
| surfsense_backend/app/routes/search_source_connectors_routes.py | Enhanced indexing endpoint and background task triggers to accept and propagate start_date/end_date params. |
| surfsense_backend/app/tasks/connectors_indexing_tasks.py | Updated all connector indexing functions to accept and use optional date range parameters, with improved logging. |
| surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx | Added date range picker UI, split index action into quick and date-range options, updated handlers and state. |
| surfsense_web/components/ui/calendar.tsx | Introduced a reusable, styled calendar component for date picking, with custom day buttons and accessibility. |
| surfsense_web/hooks/useSearchSourceConnectors.ts | Modified indexConnector to accept and send optional date range parameters in API requests. |
| surfsense_web/package.json | Added react-day-picker dependency to support the new calendar component. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant IndexingTask

    User->>Frontend: Click "Index with Date Range"
    Frontend->>User: Show date picker dialog
    User->>Frontend: Select start and end dates, confirm
    Frontend->>Backend: POST /search-source-connectors/{id}/index?start_date&end_date
    Backend->>IndexingTask: Trigger indexing task with date range
    IndexingTask->>Backend: Index content within date range
    Backend->>Frontend: Respond with indexing status
    Frontend->>User: Show success/failure notification
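
For readers who want to exercise the request shown in the diagram outside the UI, here is a minimal Python sketch of how the query string might be assembled. The path and the search_space_id, start_date, and end_date parameter names come from this PR; `build_index_url` and `base_url` are hypothetical, and the real deployment may mount the route under a prefix not shown here.

```python
from typing import Optional
from urllib.parse import urlencode


def build_index_url(
    base_url: str,  # hypothetical: wherever the backend is served
    connector_id: int,
    search_space_id: int,
    start_date: Optional[str] = None,  # YYYY-MM-DD, omitted when None
    end_date: Optional[str] = None,    # YYYY-MM-DD, omitted when None
) -> str:
    """Build the indexing URL, leaving out date parameters that were not chosen."""
    params = {"search_space_id": search_space_id}
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    return f"{base_url}/search-source-connectors/{connector_id}/index?{urlencode(params)}"
```

Leaving both dates out corresponds to the quick index option, where the backend falls back to its own defaults.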

Poem

🐇
A calendar hops into view,
Now connectors know just what to do—
Pick your days, or index fast,
With date range logic built to last.
Backend and frontend, side by side,
In harmony, they now reside!
🗓️✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99fa03d and 37f5458.

📒 Files selected for processing (1)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx
✨ Finishing Touches
  • 📝 Generate Docstrings



recurseml bot commented Jun 7, 2025

⚠️ Only 5 files will be analyzed due to processing limits.


async def index_connector_content(
    connector_id: int,
    search_space_id: int = Query(..., description="ID of the search space to store indexed content"),
    start_date: str = Query(None, description="Start date for indexing (YYYY-MM-DD format). If not provided, uses last_indexed_at or defaults to 365 days ago"),

Missing date format validation for start_date parameter. While the description specifies YYYY-MM-DD format, there is no validation of the input string format. Invalid date strings could cause runtime errors in datetime operations later in the code.
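
One way the missing check could look, as a sketch only: `parse_ymd` and `validate_date_range` are hypothetical helpers, not code from this PR. They rely on `datetime.strptime`, which raises `ValueError` on input that does not match the pattern. In the route itself, the `ValueError` would presumably be translated into a 4xx response before any background task is scheduled.

```python
from datetime import date, datetime
from typing import Optional


def parse_ymd(value: Optional[str], field_name: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string; None passes through, bad input raises ValueError."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(
            f"{field_name} must be in YYYY-MM-DD format, got {value!r}"
        ) from exc


def validate_date_range(start_date: Optional[str], end_date: Optional[str]) -> None:
    """Reject malformed dates and ranges whose start falls after the end."""
    start = parse_ymd(start_date, "start_date")
    end = parse_ymd(end_date, "end_date")
    if start and end and start > end:
        raise ValueError("start_date must not be later than end_date")
```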



const startDateStr = startDate ? format(startDate, "yyyy-MM-dd") : undefined;
const endDateStr = endDate ? format(endDate, "yyyy-MM-dd") : undefined;

await indexConnector(selectedConnectorForIndexing, searchSpaceId, startDateStr, endDateStr);

The code attempts to pass startDateStr and endDateStr parameters to indexConnector, but the function call at line 172 reveals that the original implementation only accepts two parameters (connectorId and searchSpaceId). This mismatch in function parameters will likely cause the date range indexing to fail as the backend API may not be prepared to handle these additional parameters.



MODSetter (Owner, Author) replied:

it accepts it bro

const handleIndexConnector = async () => {
  if (selectedConnectorForIndexing === null) return;

  setIndexingConnectorId(selectedConnectorForIndexing);

The code sets indexingConnectorId before making the API call and managing error states. If the API call fails, the finally block in handleIndexConnector correctly resets it, but if there's a problem before the API call (e.g., date validation failure), the loading state won't be cleared since the finally block won't be reached. This could leave the UI in a perpetual loading state.




recurseml bot commented Jun 7, 2025

😱 Found 3 issues. Time to roll up your sleeves! 😱

coderabbitai bot left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (1)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)

556-567: 🛠️ Refactor suggestion

Update docstring to clarify that date parameters are not used for GitHub indexing.

The function accepts start_date and end_date parameters but doesn't use them for filtering. This should be documented clearly.

     """
     Index code and documentation files from accessible GitHub repositories.

     Args:
         session: Database session
         connector_id: ID of the GitHub connector
         search_space_id: ID of the search space to store documents in
+        start_date: Accepted for API consistency but not used - GitHub indexes all files
+        end_date: Accepted for API consistency but not used - GitHub indexes all files
         update_last_indexed: Whether to update the last_indexed_at timestamp (default: True)

     Returns:
         Tuple containing (number of documents indexed, error message or None)
     """
♻️ Duplicate comments (1)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)

759-786: Apply the same date handling simplification as suggested for Slack indexing.

This function has the same timezone comparison issues and redundant date calculation logic as index_slack_messages.

🧹 Nitpick comments (4)
surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx (1)

269-327: Consider extracting the date picker buttons to improve maintainability.

The dual-button implementation with tooltips works well functionally, but the inline button definitions make the component quite verbose.

Consider extracting this to a separate component:

const IndexingButtons = ({ 
  connector, 
  isIndexing, 
  onOpenDatePicker, 
  onQuickIndex 
}) => (
  <div className="flex gap-1">
    <TooltipProvider>
      <Tooltip>
        <TooltipTrigger asChild>
          <Button
            variant="outline"
            size="sm"
            onClick={() => onOpenDatePicker(connector.id)}
            disabled={isIndexing}
          >
            {isIndexing ? (
              <RefreshCw className="h-4 w-4 animate-spin" />
            ) : (
              <CalendarIcon className="h-4 w-4" />
            )}
            <span className="sr-only">Index with Date Range</span>
          </Button>
        </TooltipTrigger>
        <TooltipContent>
          <p>Index with Date Range</p>
        </TooltipContent>
      </Tooltip>
    </TooltipProvider>
    {/* Quick index button */}
  </div>
);
surfsense_web/components/ui/calendar.tsx (1)

43-123: Comprehensive styling implementation but consider refactoring for maintainability.

The className definitions provide extensive customization but are becoming difficult to maintain due to their length and complexity.

Consider extracting complex className definitions to constants:

const calendarClassNames = {
  root: cn("w-fit", defaultClassNames.root),
  months: cn("flex gap-4 flex-col md:flex-row relative", defaultClassNames.months),
  // ... other class definitions
  day: cn(
    "relative w-full h-full p-0 text-center",
    "[&:first-child[data-selected=true]_button]:rounded-l-md",
    "[&:last-child[data-selected=true]_button]:rounded-r-md",
    "group/day aspect-square select-none",
    defaultClassNames.day
  ),
};
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)

999-1028: Good timezone handling, but consider simplifying the date logic.

This function correctly uses timezone-aware datetime objects, which is better than the other indexing functions. However, the date calculation logic could still be simplified.

Consider extracting the date conversion logic into a helper function to reduce duplication:

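from datetime import datetime, timezone
from typing import Optional

def convert_to_iso_date(date_str: Optional[str], timezone_obj=timezone.utc) -> Optional[str]:
    """Convert a YYYY-MM-DD string to an ISO 8601 string with timezone; return None if no date is given."""
    if date_str:
        return datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone_obj).isoformat()
    return None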

This would make the code more maintainable and consistent across all indexing functions.

surfsense_backend/app/routes/search_source_connectors_routes.py (1)

323-327: Use ternary operator for conciseness.

As suggested by static analysis, simplify the conditional:

-        if end_date is None:
-            indexing_to = today_str
-        else:
-            indexing_to = end_date
+        indexing_to = today_str if end_date is None else end_date
🧰 Tools
🪛 Ruff (0.11.9)

323-326: Use ternary operator indexing_to = today_str if end_date is None else end_date instead of if-else-block

Replace if-else-block with indexing_to = today_str if end_date is None else end_date

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74b3017 and 99fa03d.

⛔ Files ignored due to path filters (1)
  • surfsense_web/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (6)
  • surfsense_backend/app/routes/search_source_connectors_routes.py (9 hunks)
  • surfsense_backend/app/tasks/connectors_indexing_tasks.py (12 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx (6 hunks)
  • surfsense_web/components/ui/calendar.tsx (1 hunks)
  • surfsense_web/hooks/useSearchSourceConnectors.ts (1 hunks)
  • surfsense_web/package.json (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
surfsense_backend/app/routes/search_source_connectors_routes.py (2)
surfsense_backend/app/db.py (1)
  • SearchSourceConnectorType (55-63)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (5)
  • index_slack_messages (23-286)
  • index_notion_pages (288-546)
  • index_github_repos (548-714)
  • index_linear_issues (716-952)
  • index_discord_messages (954-1225)

🔇 Additional comments (13)
surfsense_web/package.json (1)

53-53:

✅ Verification successful

Verify the latest version of react-day-picker.

The dependency addition looks good and aligns with the calendar-based indexing feature. However, please verify that version ^9.7.0 is the latest stable version available.


🌐 Web query:

What is the latest stable version of react-day-picker npm package?

💡 Result:

The latest stable version of react-day-picker npm package is 9.7.0, which was released on May 14, 2025[1][7][8]. This version includes improvements to accessibility, styling, internationalization, and performance compared to previous versions. The package is designed for creating date pickers, calendars, and date inputs for React web applications, and it's written in TypeScript with compilation to CommonJS and ESM formats[1][8].

React DayPicker offers numerous features including customizable calendars, support for various selection modes (single day, multiple days, or ranges), localization capabilities, and compatibility with different calendar systems including ISO 8601, Persian, and broadcast calendars[1]. It also complies with WCAG 2.1 AA requirements for accessibility[1][8].

It's worth noting that there have been some compatibility issues reported with React 19, so developers using the latest React version may encounter deployment challenges[9]. The package requires React 16.8 or later for proper functionality[1].


react-day-picker version confirmed as latest (9.7.0)

Confirmed that version 9.7.0 is the current stable release of react-day-picker (released May 14, 2025). No further changes are needed.

surfsense_web/hooks/useSearchSourceConnectors.ts (2)

256-261: Well-implemented function signature enhancement.

The addition of optional startDate and endDate parameters maintains backward compatibility while extending functionality for the calendar-based indexing feature.


269-279: Correct query parameter construction.

The URLSearchParams usage is appropriate and handles optional date parameters correctly. The implementation properly constructs the query string and includes it in the fetch URL.

surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx (5)

12-12: Good import organization for new calendar functionality.

The new imports properly support the calendar-based indexing feature. The addition of Calendar icon and date formatting utilities aligns well with the enhanced UI.

Also applies to: 49-63


107-111: Proper state management for date picker functionality.

The new state variables are well-named and appropriately typed for managing the date picker dialog and selected dates.


134-166: Well-structured date-based indexing handler.

The implementation correctly handles date formatting, state management, and error handling. The cleanup logic ensures proper state reset after operations.


168-184: Good separation of quick vs date-range indexing.

Providing separate handlers for quick indexing (without dates) and date-range indexing improves user experience and maintains the existing workflow.


390-516: Comprehensive date picker dialog implementation.

The dialog provides excellent user experience with calendar popovers, preset ranges, and proper validation. The date constraints prevent invalid selections and the preset buttons offer convenient options.

surfsense_web/components/ui/calendar.tsx (3)

14-26: Well-structured Calendar component interface.

The component properly extends DayPicker props and adds custom buttonVariant option. The TypeScript typing is accurate and the prop defaults are sensible.


28-42: Good usage of react-day-picker foundation.

The component properly leverages the DayPicker base with appropriate defaults and custom formatters. The month dropdown formatting enhancement is a nice touch.


172-208: Well-implemented custom day button with proper focus management.

The CalendarDayButton component correctly handles focus states, data attributes, and styling variants. The ref management and useEffect for focus is properly implemented.

surfsense_backend/app/routes/search_source_connectors_routes.py (2)

311-319: Clarify the logic for adjusting the start date when last indexed today.

The special case where the start date is set to yesterday when last_indexed_at is today could be confusing and may lead to re-indexing the same data.

Consider:

  1. Document why this adjustment is necessary
  2. Make it configurable or remove it if not essential
  3. Consider using timestamps instead of dates to avoid this issue
-                if connector.last_indexed_at.date() == today:
-                    # If last indexed today, go back 1 day to ensure we don't miss anything
-                    indexing_from = (today - timedelta(days=1)).strftime("%Y-%m-%d")
-                else:
-                    indexing_from = connector.last_indexed_at.strftime("%Y-%m-%d")
+                # Use the exact last indexed timestamp
+                indexing_from = connector.last_indexed_at.strftime("%Y-%m-%d")
+                logger.info(f"Using last_indexed_at date: {indexing_from}")
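
To illustrate option 3 above, here is a rough sketch of resuming from the stored timestamp with a small safety overlap instead of stepping back a whole day. `indexing_start` and the five-minute default are hypothetical, and the sketch assumes the downstream connector code could accept an ISO 8601 timestamp rather than a YYYY-MM-DD string, which this PR does not establish.

```python
from datetime import datetime, timedelta, timezone


def indexing_start(
    last_indexed_at: datetime,
    overlap: timedelta = timedelta(minutes=5),  # hypothetical safety margin
) -> str:
    """Resume from the last-indexed instant minus a small overlap."""
    if last_indexed_at.tzinfo is None:
        # Assumption: naive timestamps in the database are stored as UTC.
        last_indexed_at = last_indexed_at.replace(tzinfo=timezone.utc)
    return (last_indexed_at - overlap).isoformat()
```

With a timestamp, re-indexing after a same-day run only revisits the overlap window rather than the full previous day.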

410-634: Well-structured background task updates!

The consistent pattern of updating all background task functions to accept and propagate the date parameters is clean and maintainable. The separation between session wrappers and actual task runners is a good design choice.
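
The wrapper/runner split described here might look roughly like the following. Everything in this sketch is a hypothetical stand-in: `open_session` replaces the project's real session factory, and both function names are illustrative rather than taken from the routes file.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Optional


@asynccontextmanager
async def open_session():
    """Hypothetical stand-in for the project's async database session factory."""
    session = {"open": True}
    try:
        yield session
    finally:
        session["open"] = False


async def run_indexing(session, connector_id: int, search_space_id: int,
                       start_date: Optional[str], end_date: Optional[str]) -> str:
    """Task runner: does the work using a session it was handed."""
    return f"connector {connector_id}: {start_date or 'default'} to {end_date or 'today'}"


async def run_indexing_with_new_session(connector_id: int, search_space_id: int,
                                        start_date: Optional[str] = None,
                                        end_date: Optional[str] = None) -> str:
    """Session wrapper: owns the session lifetime and forwards the date range unchanged."""
    async with open_session() as session:
        return await run_indexing(session, connector_id, search_space_id,
                                  start_date, end_date)
```

Keeping the date parameters as plain pass-through arguments in the wrapper means only the runner needs to know how they are interpreted.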

Comment on lines +200 to +204
className={cn(
"data-[selected-single=true]:bg-primary data-[selected-single=true]:text-primary-foreground data-[range-middle=true]:bg-accent data-[range-middle=true]:text-accent-foreground data-[range-start=true]:bg-primary data-[range-start=true]:text-primary-foreground data-[range-end=true]:bg-primary data-[range-end=true]:text-primary-foreground group-data-[focused=true]/day:border-ring group-data-[focused=true]/day:ring-ring/50 dark:hover:text-accent-foreground flex aspect-square size-auto w-full min-w-(--cell-size) flex-col gap-1 leading-none font-normal group-data-[focused=true]/day:relative group-data-[focused=true]/day:z-10 group-data-[focused=true]/day:ring-[3px] data-[range-end=true]:rounded-md data-[range-end=true]:rounded-r-md data-[range-middle=true]:rounded-none data-[range-start=true]:rounded-md data-[range-start=true]:rounded-l-md [&>span]:text-xs [&>span]:opacity-70",
defaultClassNames.day,
className
)}

🛠️ Refactor suggestion

Refactor the extremely long className for better maintainability.

The className string is over 1000 characters long, making it very difficult to read and maintain.

Break down the className into logical groups:

const baseClasses = cn(
  "flex aspect-square size-auto w-full min-w-(--cell-size) flex-col gap-1 leading-none font-normal",
  defaultClassNames.day,
  className
);

const stateClasses = cn(
  "data-[selected-single=true]:bg-primary data-[selected-single=true]:text-primary-foreground",
  "data-[range-middle=true]:bg-accent data-[range-middle=true]:text-accent-foreground",
  "data-[range-start=true]:bg-primary data-[range-start=true]:text-primary-foreground",
  "data-[range-end=true]:bg-primary data-[range-end=true]:text-primary-foreground"
);

const focusClasses = cn(
  "group-data-[focused=true]/day:border-ring group-data-[focused=true]/day:ring-ring/50",
  "group-data-[focused=true]/day:relative group-data-[focused=true]/day:z-10 group-data-[focused=true]/day:ring-[3px]"
);

const roundingClasses = cn(
  "data-[range-end=true]:rounded-md data-[range-end=true]:rounded-r-md",
  "data-[range-middle=true]:rounded-none",
  "data-[range-start=true]:rounded-md data-[range-start=true]:rounded-l-md"
);

className={cn(baseClasses, stateClasses, focusClasses, roundingClasses, "[&>span]:text-xs [&>span]:opacity-70")}
🤖 Prompt for AI Agents
In surfsense_web/components/ui/calendar.tsx around lines 200 to 204, the
className string is excessively long and hard to maintain. Refactor by splitting
the className into multiple logical groups such as baseClasses, stateClasses,
focusClasses, and roundingClasses, each containing related class names. Then
combine these groups using the cn function to form the final className. This
improves readability and maintainability without changing functionality.

Comment on lines +332 to 353
        if start_date is None or end_date is None:
            # Fall back to calculating dates
            calculated_end_date = datetime.now()
            calculated_start_date = calculated_end_date - timedelta(days=365)  # Check for last 1 year of pages

            # Use calculated dates if not provided
            if start_date is None:
                start_date_iso = calculated_start_date.strftime("%Y-%m-%dT%H:%M:%SZ")
            else:
                # Convert YYYY-MM-DD to ISO format
                start_date_iso = datetime.strptime(start_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")

            if end_date is None:
                end_date_iso = calculated_end_date.strftime("%Y-%m-%dT%H:%M:%SZ")
            else:
                # Convert YYYY-MM-DD to ISO format
                end_date_iso = datetime.strptime(end_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
        else:
            # Convert provided dates to ISO format for Notion API
            start_date_iso = datetime.strptime(start_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
            end_date_iso = datetime.strptime(end_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")


🛠️ Refactor suggestion

Simplify date conversion logic for better readability.

The nested conditionals for date conversion are hard to follow and could be simplified.

-        # Calculate date range
-        if start_date is None or end_date is None:
-            # Fall back to calculating dates
-            calculated_end_date = datetime.now()
-            calculated_start_date = calculated_end_date - timedelta(days=365)  # Check for last 1 year of pages
-            
-            # Use calculated dates if not provided
-            if start_date is None:
-                start_date_iso = calculated_start_date.strftime("%Y-%m-%dT%H:%M:%SZ")
-            else:
-                # Convert YYYY-MM-DD to ISO format
-                start_date_iso = datetime.strptime(start_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
-                
-            if end_date is None:
-                end_date_iso = calculated_end_date.strftime("%Y-%m-%dT%H:%M:%SZ")
-            else:
-                # Convert YYYY-MM-DD to ISO format
-                end_date_iso = datetime.strptime(end_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
-        else:
-            # Convert provided dates to ISO format for Notion API
-            start_date_iso = datetime.strptime(start_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
-            end_date_iso = datetime.strptime(end_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
+        # Convert dates to ISO format for Notion API
+        if start_date:
+            start_date_iso = datetime.strptime(start_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
+        else:
+            start_date_iso = (datetime.now() - timedelta(days=365)).strftime("%Y-%m-%dT%H:%M:%SZ")
+            
+        if end_date:
+            end_date_iso = datetime.strptime(end_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
+        else:
+            end_date_iso = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")

Also, consider adding error handling for invalid date formats:

try:
    start_date_iso = datetime.strptime(start_date, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%SZ")
except ValueError:
    logger.error(f"Invalid start_date format: {start_date}")
    return 0, "Invalid start_date format. Expected YYYY-MM-DD"
🤖 Prompt for AI Agents
In surfsense_backend/app/tasks/connectors_indexing_tasks.py around lines 332 to
353, the date conversion logic is overly nested and hard to read. Refactor the
code to flatten the conditional structure by first setting default dates if
start_date or end_date is None, then convert both dates to ISO format in a
straightforward manner. Additionally, add try-except blocks around the
datetime.strptime calls to catch ValueError exceptions for invalid date formats,
log an error message with the invalid input, and return an appropriate error
response.
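
Put together, the flattened logic and the error handling described above could be sketched as one helper. `resolve_iso_range` is a hypothetical name, the `now` parameter is added only to make the function testable, and the format string is copied from the code under review. Since `datetime.now()` is naive local time, the trailing "Z" is only accurate when the server clock runs in UTC.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # same format string as the code under review


def resolve_iso_range(
    start_date: Optional[str],
    end_date: Optional[str],
    now: Optional[datetime] = None,  # injectable clock, for testing only
    lookback_days: int = 365,
) -> Tuple[Optional[Tuple[str, str]], Optional[str]]:
    """Return ((start_iso, end_iso), None) on success or (None, error message)."""
    now = now or datetime.now()
    try:
        start = (datetime.strptime(start_date, "%Y-%m-%d")
                 if start_date else now - timedelta(days=lookback_days))
        end = datetime.strptime(end_date, "%Y-%m-%d") if end_date else now
    except ValueError:
        return None, "Invalid date format. Expected YYYY-MM-DD"
    return (start.strftime(ISO_FORMAT), end.strftime(ISO_FORMAT)), None
```

A caller in the indexing task could then return `0, error` when the second element is set, matching the tuple convention the indexing functions already use.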

Comment on lines +66 to +94
+        if start_date is None or end_date is None:
+            # Fall back to calculating dates based on last_indexed_at
+            calculated_end_date = datetime.now()

-            # Check if last_indexed_at is in the future or after end_date
-            if last_indexed_naive > end_date:
-                logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 365 days ago instead.")
-                start_date = end_date - timedelta(days=365)
+            # Use last_indexed_at as start date if available, otherwise use 365 days ago
+            if connector.last_indexed_at:
+                # Convert dates to be comparable (both timezone-naive)
+                last_indexed_naive = connector.last_indexed_at.replace(tzinfo=None) if connector.last_indexed_at.tzinfo else connector.last_indexed_at
+
+                # Check if last_indexed_at is in the future or after end_date
+                if last_indexed_naive > calculated_end_date:
+                    logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 365 days ago instead.")
+                    calculated_start_date = calculated_end_date - timedelta(days=365)
+                else:
+                    calculated_start_date = last_indexed_naive
+                    logger.info(f"Using last_indexed_at ({calculated_start_date.strftime('%Y-%m-%d')}) as start date")
             else:
-                start_date = last_indexed_naive
-                logger.info(f"Using last_indexed_at ({start_date.strftime('%Y-%m-%d')}) as start date")
+                calculated_start_date = calculated_end_date - timedelta(days=365) # Use 365 days as default
+                logger.info(f"No last_indexed_at found, using {calculated_start_date.strftime('%Y-%m-%d')} (365 days ago) as start date")
+
+            # Use calculated dates if not provided
+            start_date_str = start_date if start_date else calculated_start_date.strftime("%Y-%m-%d")
+            end_date_str = end_date if end_date else calculated_end_date.strftime("%Y-%m-%d")
         else:
-            start_date = end_date - timedelta(days=365) # Use 365 days as default
-            logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (30 days ago) as start date")
-
-        # Format dates for Slack API
-        start_date_str = start_date.strftime("%Y-%m-%d")
-        end_date_str = end_date.strftime("%Y-%m-%d")
+            # Use provided dates
+            start_date_str = start_date
+            end_date_str = end_date

         logger.info(f"Indexing Slack messages from {start_date_str} to {end_date_str}")

🛠️ Refactor suggestion

⚠️ Potential issue

Simplify date calculation logic and fix timezone handling issues.

The date calculation logic has several issues:

  1. The route handler already calculates and passes the date range, making this fallback logic redundant when called from the API.
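  2. The comparison on line 76 is fragile: tzinfo is stripped from last_indexed_at without converting it, and the result is compared with a naive local-time datetime.now(), so the two values can be in different time zones.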
  3. The variable naming is confusing - start_date parameter vs start_date_str variable.

Consider simplifying this logic:

-        # Calculate date range
-        if start_date is None or end_date is None:
-            # Fall back to calculating dates based on last_indexed_at
-            calculated_end_date = datetime.now()
-            
-            # Use last_indexed_at as start date if available, otherwise use 365 days ago
-            if connector.last_indexed_at:
-                # Convert dates to be comparable (both timezone-naive)
-                last_indexed_naive = connector.last_indexed_at.replace(tzinfo=None) if connector.last_indexed_at.tzinfo else connector.last_indexed_at
-                
-                # Check if last_indexed_at is in the future or after end_date
-                if last_indexed_naive > calculated_end_date:
-                    logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 365 days ago instead.")
-                    calculated_start_date = calculated_end_date - timedelta(days=365)
-                else:
-                    calculated_start_date = last_indexed_naive
-                    logger.info(f"Using last_indexed_at ({calculated_start_date.strftime('%Y-%m-%d')}) as start date")
-            else:
-                calculated_start_date = calculated_end_date - timedelta(days=365)  # Use 365 days as default
-                logger.info(f"No last_indexed_at found, using {calculated_start_date.strftime('%Y-%m-%d')} (365 days ago) as start date")
-            
-            # Use calculated dates if not provided
-            start_date_str = start_date if start_date else calculated_start_date.strftime("%Y-%m-%d")
-            end_date_str = end_date if end_date else calculated_end_date.strftime("%Y-%m-%d")
-        else:
-            # Use provided dates
-            start_date_str = start_date
-            end_date_str = end_date
+        # Use provided dates or calculate defaults
+        if start_date and end_date:
+            start_date_str = start_date
+            end_date_str = end_date
+        else:
+            # This should only happen when called directly, not from the API
+            logger.warning("Date range not provided to index_slack_messages. Using defaults.")
+            end_date_str = datetime.now().strftime("%Y-%m-%d")
+            if connector.last_indexed_at:
+                start_date_str = connector.last_indexed_at.strftime("%Y-%m-%d")
+            else:
+                start_date_str = (datetime.now() - timedelta(days=365)).strftime("%Y-%m-%d")
🤖 Prompt for AI Agents
In surfsense_backend/app/tasks/connectors_indexing_tasks.py around lines 66 to
94, simplify the date calculation by removing redundant fallback logic since the
route handler already provides the date range. Fix timezone handling by ensuring
all datetime comparisons use timezone-aware objects or convert consistently to
naive datetimes before comparing. Rename variables to clearly distinguish
between input parameters (start_date, end_date) and formatted string variables
(e.g., start_date_str) to avoid confusion. Streamline the logic to directly use
provided dates when available and only calculate defaults if both are None,
handling timezone consistently.
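
The timezone-consistent fallback described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: `resolve_date_range` and `to_utc` are hypothetical helper names, and the sketch assumes that naive `last_indexed_at` values stored in the database are already in UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

DEFAULT_LOOKBACK_DAYS = 365  # same default window the PR uses


def to_utc(dt: datetime) -> datetime:
    """Return dt as a timezone-aware UTC datetime.

    Assumption: naive values are already expressed in UTC.
    """
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def resolve_date_range(
    start_date: Optional[str],
    end_date: Optional[str],
    last_indexed_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return (start_date_str, end_date_str) as YYYY-MM-DD."""
    now_utc = to_utc(now) if now is not None else datetime.now(timezone.utc)

    # A caller-provided value always wins over a calculated default.
    end_str = end_date or now_utc.strftime("%Y-%m-%d")
    if start_date:
        return start_date, end_str

    fallback = now_utc - timedelta(days=DEFAULT_LOOKBACK_DAYS)
    if last_indexed_at is None:
        start = fallback
    else:
        last_indexed_utc = to_utc(last_indexed_at)
        # A last_indexed_at in the future is treated as invalid.
        start = fallback if last_indexed_utc > now_utc else last_indexed_utc
    return start.strftime("%Y-%m-%d"), end_str
```

Because both sides of the comparison are converted to aware UTC datetimes first, the future-date check no longer depends on the server's local time zone. The `now` parameter exists only to make the helper easy to test.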

Comment on lines +273 to +274
start_date: str = Query(None, description="Start date for indexing (YYYY-MM-DD format). If not provided, uses last_indexed_at or defaults to 365 days ago"),
end_date: str = Query(None, description="End date for indexing (YYYY-MM-DD format). If not provided, uses today's date"),

⚠️ Potential issue

Add date format validation for query parameters.

The endpoint accepts date strings but doesn't validate their format. Invalid dates will cause errors in the indexing functions.

Add validation:

from datetime import datetime

from fastapi import HTTPException

# After getting the parameters, reject any value that does not parse as YYYY-MM-DD
if start_date:
    try:
        datetime.strptime(start_date, "%Y-%m-%d")
    except ValueError:
        raise HTTPException(
            status_code=400,
            detail="Invalid start_date format. Expected YYYY-MM-DD"
        )

if end_date:
    try:
        datetime.strptime(end_date, "%Y-%m-%d")
    except ValueError:
        raise HTTPException(
            status_code=400,
            detail="Invalid end_date format. Expected YYYY-MM-DD"
        )
🤖 Prompt for AI Agents
In surfsense_backend/app/routes/search_source_connectors_routes.py around lines
273 to 274, the start_date and end_date query parameters lack validation for the
expected YYYY-MM-DD format, which can cause errors later. Add validation after
receiving these parameters by attempting to parse them with datetime.strptime
using the "%Y-%m-%d" format inside try-except blocks. If parsing fails, raise an
HTTPException with status code 400 and a clear error message indicating the
invalid date format for each parameter.
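
The two near-identical checks could also be folded into one reusable helper. The sketch below is framework-independent and uses hypothetical names (`parse_date_param`, `validate_date_range`); it raises `ValueError`, which a route handler would presumably translate into a 400 response. The check that the start date does not come after the end date is an added assumption, not something the review asked for.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def parse_date_param(name: str, value: Optional[str]) -> Optional[date]:
    """Parse an optional YYYY-MM-DD string, naming the parameter on failure."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"Invalid {name} format. Expected YYYY-MM-DD") from None


def validate_date_range(
    start_date: Optional[str], end_date: Optional[str]
) -> Tuple[Optional[date], Optional[date]]:
    """Validate both parameters and, when both are present, their order."""
    start = parse_date_param("start_date", start_date)
    end = parse_date_param("end_date", end_date)
    # Assumption: a range whose start is after its end is a client error.
    if start is not None and end is not None and start > end:
        raise ValueError("start_date must not be after end_date")
    return start, end
```

In the endpoint, a call such as `validate_date_range(start_date, end_date)` could be wrapped in a `try`/`except ValueError` block that re-raises the message as an HTTP 400 error.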

@MODSetter MODSetter merged commit d0e9fdf into main Jun 7, 2025
3 checks passed
AbdullahAlMousawi pushed a commit to AbdullahAlMousawi/SurfSense that referenced this pull request Jul 14, 2025
feat: Added Calender Based Indexing.
CREDO23 pushed a commit to CREDO23/SurfSense that referenced this pull request Jul 25, 2025
feat: Added Calender Based Indexing.
@coderabbitai coderabbitai bot mentioned this pull request Sep 29, 2025