Fix/slack rate limiting & Github Repos ORG Filtering #117
The `get_all_channels` method in `slack_history.py` was making paginated requests to `conversations.list` without any delay, leading to HTTP 429 errors when fetching channels from large Slack workspaces.

This commit introduces the following changes:
- Adds a 3-second delay between paginated calls to `conversations.list` to comply with Slack's Tier 2 rate limits (approx. 20 requests/minute).
- Implements handling for the `Retry-After` header when a 429 error is received. The system will wait for the specified duration before retrying. If the header is missing or invalid, a default of 60 seconds is used.
- Adds comprehensive unit tests to verify the new delay and retry logic, covering scenarios with and without the `Retry-After` header, as well as other API errors.
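As a rough illustration of the behavior described in this commit message, the delay and `Retry-After` handling could look something like the sketch below. It is not the code from `slack_history.py`: `RateLimited`, `fetch_page`, `retry_after_seconds`, and `list_all_channels` are hypothetical names, and the sleep function is injectable only so the logic can be exercised without waiting.

```python
import logging
import time

logger = logging.getLogger(__name__)

PAGE_DELAY_SECONDS = 3       # delay between pages, as stated in the commit message
DEFAULT_RETRY_SECONDS = 60   # fallback when Retry-After is missing or invalid


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error that carries response headers."""

    def __init__(self, headers=None):
        super().__init__("rate limited")
        self.headers = headers or {}


def retry_after_seconds(headers):
    """Return the Retry-After value in seconds, or the default if it is unusable."""
    try:
        return max(0, int(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return DEFAULT_RETRY_SECONDS


def list_all_channels(fetch_page, sleep=time.sleep):
    """Collect every channel, pausing between pages and honoring Retry-After.

    fetch_page(cursor) is an assumed callable that returns
    (channels, next_cursor) or raises RateLimited.
    """
    channels, cursor, first_page = [], None, True
    while True:
        if not first_page:
            sleep(PAGE_DELAY_SECONDS)
        try:
            page, cursor = fetch_page(cursor)
        except RateLimited as exc:
            wait = retry_after_seconds(exc.headers)
            logger.warning("Rate limited; retrying in %s seconds", wait)
            sleep(wait)
            continue  # cursor is unchanged, so the same page is retried
        first_page = False
        channels.extend(page)
        if not cursor:
            return channels
```

Passing `sleep` as a parameter is a design choice for this sketch only; the PR's tests patch the sleep call instead.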
This commit includes two main improvements:

1. Slack Connector (`slack_history.py`):
   - Addresses API rate limiting for `conversations.list` by introducing a 3-second delay between paginated calls.
   - Implements handling for the `Retry-After` header when HTTP 429 errors occur.
   - Fixes a `SyntaxError` caused by a non-printable character accidentally introduced in a previous modification.
   - Adds comprehensive unit tests for the rate limiting and retry logic in `test_slack_history.py`.
2. GitHub Connector (`github_connector.py`):
   - Modifies `get_user_repositories` to fetch all repositories accessible by you (including organization repositories) by changing the API call parameter from `type='owner'` to `type='all'`.
   - Adds unit tests in `test_github_connector.py` to verify this change and other connector functionalities.
Here's a rundown of what I did:

Fix: Robust Slack rate limiting, error handling & GitHub org repos

This update delivers comprehensive improvements to Slack connector stability and enhances the GitHub connector.

**Slack Connector (`slack_history.py`, `connectors_indexing_tasks.py`):**
- I've implemented proactive delays (1.2s for `conversations.history`, 3s for `conversations.list` pagination) and `Retry-After` header handling for 429 rate limit errors across `conversations.list`, `conversations.history`, and `users.info` API calls.
- I'll now gracefully handle `not_in_channel` errors when fetching conversation history by logging a warning and skipping the channel.
- I've refactored channel info fetching: `get_all_channels` now returns richer channel data (including `is_member`, `is_private`).
- I've removed direct calls to `conversations.info` from `connectors_indexing_tasks.py`, using the richer data from `get_all_channels` instead, to prevent associated rate limits.
- I corrected a `SyntaxError` (non-printable character) in `slack_history.py`.
- I've enhanced logging for rate limit actions, delays, and errors.
- I've updated unit tests in `test_slack_history.py` to cover all new logic.

**GitHub Connector (`github_connector.py`):**
- I've modified `get_user_repositories` to fetch all repositories accessible by you (owned, collaborated, organization) by changing the API call parameter from `type='owner'` to `type='all'`.
- I've included unit tests in `test_github_connector.py` for this change.
This commit addresses recurring `SyntaxError: invalid non-printable character U+001B` errors in `surfsense_backend/app/connectors/slack_history.py`. The file was cleaned to remove all occurrences of the U+001B (ESCAPE) character. This ensures that previously introduced problematic control characters are fully removed, allowing the application to parse and load the module correctly.
@google-labs-jules[bot] is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough
The updates broaden GitHub repository fetching to include all accessible repositories, enhance Slack API integration with robust rate limit and error handling, and update Slack channel data structures. Associated test suites for both connectors are added, and Slack channel iteration is refactored to use richer channel objects and simplified membership checks.
Sequence Diagram(s)
sequenceDiagram
participant User
participant GitHubConnector
participant GitHubAPI
User->>GitHubConnector: get_user_repositories()
GitHubConnector->>GitHubAPI: List repositories (type='all', sort='updated')
GitHubAPI-->>GitHubConnector: Return all accessible repositories
GitHubConnector-->>User: Return list of repository dicts
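The GitHub flow in the diagram above could be sketched as follows. This is a minimal illustration under assumptions, not the connector's actual code: it assumes a github3.py-style client exposing `repositories(type=..., sort=...)`, and the dictionary keys, `list_accessible_repositories`, and `FakeClient` are hypothetical names chosen for the example.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def list_accessible_repositories(client):
    """Return dicts for all repositories visible to the authenticated user.

    `client` is assumed to expose a github3.py-style
    repositories(type=..., sort=...) iterator.
    """
    repos = []
    for repo in client.repositories(type="all", sort="updated"):
        repos.append({
            "id": repo.id,
            "name": repo.name,
            "full_name": repo.full_name,
            # A missing description becomes an empty string.
            "description": repo.description or "",
            # A missing timestamp is preserved as None.
            "updated_at": repo.updated_at.isoformat() if repo.updated_at else None,
        })
    return repos


class FakeClient:
    """Stand-in client that records the arguments it was called with."""

    def __init__(self, repos):
        self.repos = repos
        self.calls = []

    def repositories(self, **kwargs):
        self.calls.append(kwargs)
        return iter(self.repos)


fake = FakeClient([
    SimpleNamespace(id=1, name="app", full_name="me/app",
                    description=None, updated_at=None),
    SimpleNamespace(id=2, name="lib", full_name="org/lib",
                    description="Shared library",
                    updated_at=datetime(2025, 5, 1, tzinfo=timezone.utc)),
])
repos = list_accessible_repositories(fake)
```

With `type="all"`, an organization repository such as the hypothetical `org/lib` is returned alongside repositories the user owns.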
sequenceDiagram
participant SlackHistory
participant SlackAPI
participant Logger
SlackHistory->>SlackAPI: conversations.list (with cursor)
alt Rate limit (429)
SlackAPI-->>SlackHistory: 429 error with Retry-After
SlackHistory->>Logger: Log rate limit
SlackHistory->>SlackAPI: Retry after delay
else Success
SlackAPI-->>SlackHistory: Channel data page
SlackHistory->>SlackAPI: (repeat if next_cursor)
end
SlackHistory-->>Caller: List of channel dicts
sequenceDiagram
participant SlackHistory
participant SlackAPI
participant Logger
SlackHistory->>SlackAPI: conversations.history (with cursor)
alt Rate limit (429)
SlackAPI-->>SlackHistory: 429 error with Retry-After
SlackHistory->>Logger: Log rate limit
SlackHistory->>SlackAPI: Retry after delay
else Not in channel
SlackAPI-->>SlackHistory: not_in_channel error
SlackHistory->>Logger: Log warning
SlackHistory-->>Caller: []
else Success
SlackAPI-->>SlackHistory: Message data page
SlackHistory->>SlackAPI: (repeat if next_cursor)
end
SlackHistory-->>Caller: List of messages
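The `not_in_channel` branch of the diagram above might be handled roughly like this. The sketch leaves out the 429 retry path to stay short, and `SlackError`, `fetch_page`, and `fetch_history` are hypothetical stand-ins rather than names from the PR or the Slack SDK.

```python
import logging
import time

logger = logging.getLogger(__name__)


class SlackError(Exception):
    """Hypothetical stand-in for an API error that carries a Slack error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def fetch_history(fetch_page, channel_id, sleep=time.sleep, delay=1.2):
    """Return all messages for a channel, or [] if the bot is not a member.

    fetch_page(channel_id, cursor) is an assumed callable that returns
    (messages, next_cursor) or raises SlackError.
    """
    messages, cursor = [], None
    while True:
        sleep(delay)  # proactive delay before every call
        try:
            page, cursor = fetch_page(channel_id, cursor)
        except SlackError as exc:
            if exc.code == "not_in_channel":
                logger.warning("Not a member of %s; skipping", channel_id)
                return []
            raise  # other API errors still propagate to the caller
        messages.extend(page)
        if not cursor:
            return messages
```

Returning an empty list lets the caller move on to the next channel, while any other error code is still surfaced.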
kwargs["cursor"] = next_cursor
result = self.client.conversations_history(**kwargs)
current_api_call_successful = False
The result variable is explicitly set to None on line 169 after current_api_call_successful is set to False, but the code attempts to access result['messages'] on line 192 without checking if result is None, which could cause a NoneType error.
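One way to address this concern is to read `result` only after confirming that some attempt succeeded. The sketch below is illustrative only: `messages_from_attempts` and its `attempts` argument are hypothetical and do not mirror the structure of the code under review.

```python
def messages_from_attempts(attempts):
    """Return messages from the first successful attempt.

    `attempts` is an assumed iterable of zero-argument callables; each one
    either returns a dict that may hold a "messages" key or raises.
    """
    result = None
    last_error = None
    for attempt in attempts:
        try:
            result = attempt()
            break
        except Exception as exc:  # broad on purpose, for the sketch only
            last_error = exc
    if result is None:
        # Fail loudly instead of indexing into None further down.
        raise RuntimeError("no successful API call") from last_error
    return result.get("messages", [])
```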
# Process each channel
for channel_name, channel_id in channels.items():
for channel_obj in channels: # Modified loop to iterate over list of channel objects
The code lacks validation for the expected dictionary keys ('id', 'name', 'is_private', 'is_member') in `channel_obj`. If any of these keys is missing, the code will raise a `KeyError`. The previous version did not need these checks because it used a simpler key-value structure. Consider adding validation or using `.get()` with default values.
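A minimal sketch of the `.get()` approach suggested here, assuming the four keys named in the comment. The helper names and the default values ("unknown", `False`) are assumptions made for illustration, not taken from the PR.

```python
def normalize_channel(channel_obj):
    """Return a channel dict with safe defaults, or None if it has no id."""
    channel_id = channel_obj.get("id")
    if not channel_id:
        return None  # nothing can be fetched without an id
    return {
        "id": channel_id,
        "name": channel_obj.get("name", "unknown"),
        "is_private": channel_obj.get("is_private", False),
        "is_member": channel_obj.get("is_member", False),
    }


def usable_channels(channels):
    """Drop malformed entries and normalize the rest."""
    normalized = (normalize_channel(channel) for channel in channels)
    return [channel for channel in normalized if channel is not None]
```

Entries without an `id` are skipped rather than raising, so one malformed channel does not abort the whole indexing run.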
😱 Found 2 issues. Time to roll up your sleeves! 😱
@MODSetter will this fix be a good fit to the codebase?
Actionable comments posted: 0
🧹 Nitpick comments (6)
surfsense_backend/app/connectors/test_github_connector.py (2)
2-2: Remove unused import.
The `call` import from `unittest.mock` is not used anywhere in the test file.
Apply this diff to remove the unused import:
-from unittest.mock import patch, Mock, call
+from unittest.mock import patch, Mock
🧰 Tools
🪛 Ruff (0.11.9)
2-2: `unittest.mock.call` imported but unused. Remove unused import: `unittest.mock.call` (F401)
7-7: Fix typo in comment.
There's a typo in the comment: "surfsend_backend" should be "surfsense_backend".
Apply this diff to fix the typo:
-# Assuming surfsend_backend/app/connectors/test_github_connector.py
+# Assuming surfsense_backend/app/connectors/test_github_connector.py
surfsense_backend/app/connectors/slack_history.py (3)
13-14: Remove unused imports.
The imports `timedelta` and `Union` are not used in this file.
-from datetime import datetime, timedelta
+from datetime import datetime
-from typing import Dict, List, Optional, Tuple, Any, Union
+from typing import Dict, List, Optional, Tuple, Any
🧰 Tools
🪛 Ruff (0.11.9)
13-13: `datetime.timedelta` imported but unused. Remove unused import: `datetime.timedelta` (F401)
14-14: `typing.Union` imported but unused. Remove unused import: `typing.Union` (F401)
153-189: Good rate limit handling, but consider simplifying the nested try-except structure.
The proactive delay and rate limit handling are well-implemented. However, the nested try-except blocks make the code harder to follow.
Consider extracting the API call with retry logic into a separate helper method to improve readability:
def _call_conversations_history_with_retry(self, **kwargs):
    """Helper method to call conversations.history with rate limit retry."""
    while True:
        time.sleep(1.2)  # Proactive delay for Tier 3
        try:
            return self.client.conversations_history(**kwargs)
        except SlackApiError as e:
            if e.response and e.response.status_code == 429:
                retry_after = e.response.headers.get('Retry-After', '60')
                wait_time = int(retry_after) if retry_after.isdigit() else 60
                logger.warning(f"Rate limited on conversations.history. Retrying after {wait_time} seconds.")
                time.sleep(wait_time)
                continue
            raise
115-115: Use exception chaining for better error traceability.
When re-raising exceptions within except blocks, use `raise ... from` to maintain the exception chain.
For line 115:
-raise SlackApiError(f"Error retrieving channels: {e}", e.response)
+raise SlackApiError(f"Error retrieving channels: {e}", e.response) from e
For line 119:
-raise RuntimeError(f"An unexpected error occurred during channel fetching: {general_error}")
+raise RuntimeError(f"An unexpected error occurred during channel fetching: {general_error}") from general_error
For line 211:
-raise SlackApiError(f"Error retrieving history for channel {channel_id}: {e}", e.response)
+raise SlackApiError(f"Error retrieving history for channel {channel_id}: {e}", e.response) from e
For line 316:
-raise SlackApiError(f"Error retrieving user info for {user_id}: {e_user_info}", e_user_info.response)
+raise SlackApiError(f"Error retrieving user info for {user_id}: {e_user_info}", e_user_info.response) from e_user_info
Also applies to: 119-119, 211-211, 316-316
🧰 Tools
🪛 Ruff (0.11.9)
115-115: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
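To illustrate what the B904 suggestion buys, here is a small self-contained sketch of exception chaining. `ChannelFetchError`, `fetch_channels`, and `broken_call` are hypothetical names used only for this example; the PR itself re-raises `SlackApiError` and `RuntimeError`.

```python
class ChannelFetchError(Exception):
    """Hypothetical wrapper error used only for this illustration."""


def fetch_channels(api_call):
    """Run api_call() and wrap low-level failures, keeping the original cause."""
    try:
        return api_call()
    except ValueError as e:
        # "from e" records the original error as __cause__, so the traceback
        # reads "The above exception was the direct cause of ..." rather than
        # "During handling of the above exception, another exception occurred".
        raise ChannelFetchError(f"Error retrieving channels: {e}") from e


def broken_call():
    raise ValueError("bad cursor")


try:
    fetch_channels(broken_call)
except ChannelFetchError as err:
    cause = err.__cause__  # the original ValueError is still reachable
```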
surfsense_backend/app/connectors/test_slack_history.py (1)
2-2: Remove unused import.
The `time` module is imported but not used in this test file.
-import time # Imported to be available for patching target module
🧰 Tools
🪛 Ruff (0.11.9)
2-2: `time` imported but unused. Remove unused import: `time` (F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
surfsense_backend/app/connectors/github_connector.py (1 hunks)
surfsense_backend/app/connectors/slack_history.py (4 hunks)
surfsense_backend/app/connectors/test_github_connector.py (1 hunks)
surfsense_backend/app/connectors/test_slack_history.py (1 hunks)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
surfsense_backend/app/connectors/test_slack_history.py
2-2: `time` imported but unused. Remove unused import: `time` (F401)
surfsense_backend/app/connectors/slack_history.py
13-13: `datetime.timedelta` imported but unused. Remove unused import: `datetime.timedelta` (F401)
14-14: `typing.Union` imported but unused. Remove unused import: `typing.Union` (F401)
115-115: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
119-119: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
211-211: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
316-316: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
surfsense_backend/app/connectors/test_github_connector.py
2-2: `unittest.mock.call` imported but unused. Remove unused import: `unittest.mock.call` (F401)
🔇 Additional comments (9)
surfsense_backend/app/connectors/github_connector.py (1)
83-83: ✅ Verification successful
Verify the impact of expanding repository scope.
The change from `type='owner'` to `type='all'` is a significant functional modification that will now include collaborative repositories in addition to owned ones. This could potentially return many more repositories and impact performance or existing functionality.
Please run the following script to verify how this method is used across the codebase:
🏁 Script executed:
#!/bin/bash
# Description: Find all usages of get_user_repositories to assess impact of scope change
# Search for direct method calls
rg -A 5 -B 2 "get_user_repositories"
# Search for any code that might assume only owned repositories
rg -A 3 -B 3 "(owner.*repo|owned.*repo|user.*own)"
Length of output: 24541
Change to `type='all'` validated; no further action required.
- `get_user_repositories()` is only consumed in `surfsense_backend/app/routes/search_source_connectors_routes.py`, where the full list is returned without owner-specific assumptions.
- Tests in `surfsense_backend/app/connectors/test_github_connector.py` already assert `type='all'`, confirming the intended behavior.
surfsense_backend/app/connectors/test_github_connector.py (4)
14-65: Excellent test coverage for the main functionality.
This test method comprehensively verifies that the `get_user_repositories` method correctly calls the GitHub API with `type='all'` and `sort='updated'` parameters, and properly structures the returned data. The test properly mocks the GitHub client and validates both the API call parameters and the returned repository structure.
68-95: Good edge case handling verification.
This test properly verifies that the method handles edge cases where repository descriptions are `None` and `updated_at` is `None`, ensuring they are converted to empty strings and preserved as `None` respectively. This is important for robust error handling.
98-132: Comprehensive authentication failure testing.
These test methods thoroughly verify that the connector properly handles authentication failures with both `ForbiddenError` and `AuthenticationFailed` exceptions, ensuring that meaningful error messages are provided to users. The mock setup is appropriate and the assertions validate the expected behavior.
135-150: Good API exception handling verification.
This test ensures that when the GitHub API throws an exception during repository fetching, the method gracefully handles it by logging the error and returning an empty list instead of crashing. This is good defensive programming practice.
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)
117-131: LGTM! Simplified channel membership verification.
The refactoring to use channel objects with membership flags directly is a good improvement. It eliminates redundant API calls and simplifies the logic while maintaining the same functionality.
surfsense_backend/app/connectors/slack_history.py (2)
40-121: Excellent rate limit handling implementation!
The refactored `get_all_channels` method now provides:
- Proper pagination with delays between requests
- Robust rate limit handling with Retry-After header support
- Richer channel data structure with membership information
- Comprehensive error handling and logging
This implementation aligns well with Slack API best practices.
🧰 Tools
🪛 Ruff (0.11.9)
115-115: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
119-119: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
296-319: Well-implemented rate limit handling for user info retrieval.
The retry logic with Retry-After header support is consistent with the other methods and follows best practices.
🧰 Tools
🪛 Ruff (0.11.9)
316-316: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
surfsense_backend/app/connectors/test_slack_history.py (1)
9-421: Excellent test coverage for rate limit handling!
The test suite provides comprehensive coverage of:
- Pagination with delays
- Rate limit handling with various Retry-After scenarios
- Error propagation for different error types
- Logging behavior verification
- Edge cases like malformed channel data
This thorough testing ensures the robustness of the rate limiting implementation.
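For readers unfamiliar with how delay behavior can be tested without real waiting, the following sketch patches `time.sleep` with `unittest.mock`. The `fetch_pages` function is a toy paginator written for this example and is not the code under test in `test_slack_history.py`.

```python
import time
from unittest.mock import call, patch


def fetch_pages(pages, delay=3):
    """Toy paginator: sleeps between pages, but not before the first one."""
    collected = []
    for index, page in enumerate(pages):
        if index > 0:
            time.sleep(delay)
        collected.extend(page)
    return collected


def test_delay_between_pages():
    # Patching time.sleep keeps the test fast and records every requested delay.
    with patch("time.sleep") as mock_sleep:
        result = fetch_pages([["a"], ["b"], ["c"]])
    assert result == ["a", "b", "c"]
    assert mock_sleep.call_args_list == [call(3), call(3)]


test_delay_between_pages()
```

Three pages should produce exactly two sleep calls, which is the kind of assertion that "pagination with delays" coverage relies on.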
Hey @fblgit thanks for this. Will review & merge by EOD 👍
Looks good to me. Thanks @fblgit
Description
Motivation and Context
The project is nice, and this change allows large Slack workspaces to be indexed properly by taking retries and rate limits into consideration.
It also adds support for organization-based GitHub repositories.
Types of changes
Testing
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
Tests