fix(sheet-redirect): fixed sub-sheet redirect issue #1001
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Modifies Google Sheet URL generation logic to prioritize gid-based URLs when sheetId is available, and introduces a new batch processing script that systematically updates Sheet URLs across Service Account and OAuth-connected users with concurrency controls, progress tracking, and error handling.
Sequence Diagram(s)
sequenceDiagram
participant Main as Update Script
participant DB as Database
participant CM as SheetsClientManager
participant API as Google Sheets API
participant Vespa as Vespa Index
Main->>DB: Scan for Google Drive sync jobs
DB-->>Main: User list (Service Account + OAuth)
loop Per batch of users (3 concurrent)
Main->>CM: Initialize clients for user
CM->>API: Create/cache JWT or OAuth2 client
API-->>CM: Client ready
Main->>API: List user spreadsheets (filtered by MIME type)
API-->>Main: Spreadsheet list (per-user limit)
loop Per spreadsheet (5 concurrent)
Main->>API: Fetch sheet metadata
API-->>Main: Sheets with IDs
loop Per sheet in spreadsheet
Main->>Main: Compute docId (spreadsheetId_index)
Main->>Vespa: Fetch document
Vespa-->>Main: Current document data
alt sheetId available
Main->>Main: Build gid-based URL
else
Main->>Main: Use spreadsheet.webViewLink
end
alt URL changed or missing
Main->>Vespa: Update document with new URL
Vespa-->>Main: Update confirmed
Main->>Main: Mark as updated
else
Main->>Main: Mark as skipped
end
end
end
Main->>Main: Aggregate per-user results
Main->>Main: Log batch summary
end
Main->>CM: Cleanup all clients
Main->>DB: Close connections
Main->>Main: Exit with status code
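The batching shown in the diagram (users in groups of 3, spreadsheets in groups of 5) can be sketched roughly as follows. This is a minimal illustration, not the code from updateSheetUrls.ts: `chunk`, `Outcome`, and `processInBatches` are hypothetical names, and the real script may use a different concurrency mechanism.

```typescript
// Hypothetical helper (not from the PR): split a list into fixed-size batches.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer")
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Per-item result, so one failure does not abort the whole run.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown }

// Hypothetical runner: at most `size` workers are in flight at once,
// and the next batch starts only after the current one has settled.
async function processInBatches<T, R>(
  items: readonly T[],
  size: number,
  worker: (item: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  const outcomes: Outcome<R>[] = []
  for (const batch of chunk(items, size)) {
    const settled = await Promise.all(
      batch.map((item) =>
        Promise.resolve()
          .then(() => worker(item))
          .then(
            (value): Outcome<R> => ({ ok: true, value }),
            (error: unknown): Outcome<R> => ({ ok: false, error }),
          ),
      ),
    )
    outcomes.push(...settled)
  }
  return outcomes
}
```

Under these assumptions, the outer loop would call `processInBatches(users, 3, processUser)` and each `processUser` would call `processInBatches(spreadsheets, 5, processSpreadsheet)`, where both worker names are placeholders. Results come back in input order, which makes it straightforward to aggregate per-user counts of updated, skipped, and failed sheets.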
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
The change to
Pre-merge checks
✅ Passed checks (3 passed)
Summary of Changes
Hello @SahilKumar000, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where Google Sheet links were not consistently navigating to the correct sub-sheet. It implements a more robust URL generation strategy that incorporates individual sheet IDs, ensuring precise redirection. Additionally, it provides a migration script to update all previously ingested Google Sheet URLs to this improved format.
Code Review
This pull request fixes an issue with sub-sheet redirection in Google Sheets and introduces a migration script to update existing data. The changes are well-structured, particularly the migration script which includes good practices like concurrency control and batch processing. I've added a few comments to improve consistency, type safety, and remove redundancy in the new migration script.
Actionable comments posted: 0
🧹 Nitpick comments (5)
server/scripts/updateSheetUrls.ts (5)
19-19: Consider making the service account connector ID configurable
The serviceAccountConnectorId is imported from a hard-coded configuration file. This reduces flexibility and requires code changes for different environments.
Consider accepting the connector ID as an environment variable or command-line argument:
-import { serviceAccountConnectorId } from "./googleConfig"
+const serviceAccountConnectorId = process.env.SERVICE_ACCOUNT_CONNECTOR_ID
+  ? parseInt(process.env.SERVICE_ACCOUNT_CONNECTOR_ID)
+  : (() => { throw new Error("SERVICE_ACCOUNT_CONNECTOR_ID environment variable is required") })()
737-741: Consider making the script timeout configurable
The hard-coded 30-minute timeout might be insufficient for large organizations with many users and spreadsheets.
-const SCRIPT_TIMEOUT = 30 * 60 * 1000 // 30 minutes
+const SCRIPT_TIMEOUT = parseInt(process.env.SCRIPT_TIMEOUT_MINUTES || '30') * 60 * 1000
+Logger.info(`Script timeout set to ${SCRIPT_TIMEOUT / 60000} minutes`)
 const timeoutId = setTimeout(() => {
   Logger.error("Script timed out after 30 minutes, forcing exit...")
   process.exit(1)
 }, SCRIPT_TIMEOUT)
431-431: Make the spreadsheet processing limit configurable
The hard-coded limit of 100 spreadsheets per user might be too restrictive for users with extensive Google Drive usage.
-    const sheetPromises = googleSheets.slice(0, 100).map(spreadsheet => // Limit to 100 spreadsheets
+    const maxSpreadsheetsPerUser = parseInt(process.env.MAX_SPREADSHEETS_PER_USER || '100')
+    const sheetPromises = googleSheets.slice(0, maxSpreadsheetsPerUser).map(spreadsheet =>
718-721: Improve database connection cleanup
The current cleanup logic has defensive checks but could be more robust.
-    if (db && db.$client && typeof db.$client.end === 'function') {
-      await db.$client.end()
-      Logger.info("Database connection closed")
-    }
+    try {
+      // Drizzle ORM typically uses $client for the underlying connection
+      if (db?.$client) {
+        if (typeof db.$client.end === 'function') {
+          await db.$client.end()
+        } else if (typeof db.$client.close === 'function') {
+          await db.$client.close()
+        } else if (typeof db.$client.destroy === 'function') {
+          await db.$client.destroy()
+        }
+        Logger.info("Database connection closed")
+      }
+    } catch (dbError) {
+      Logger.warn({ error: dbError }, "Failed to close database connection gracefully")
+    }
494-494: Monitor memory usage for large-scale migrations
The processedDocIds Map stores all processed document IDs in memory. For organizations with hundreds of thousands of sheets, this could consume significant memory.
Consider implementing a size limit or using a more memory-efficient deduplication strategy:
// Add after line 494
const MAX_PROCESSED_DOCS = parseInt(process.env.MAX_PROCESSED_DOCS || '1000000')
if (processedDocIds.size > MAX_PROCESSED_DOCS) {
  Logger.warn(`Processed documents map size (${processedDocIds.size}) exceeds limit. Consider running in smaller batches.`)
}
Would you like me to implement a disk-based or database-backed deduplication strategy for handling very large datasets?
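Several of the nitpicks above read numeric settings from the environment with a bare parseInt, which silently yields NaN for malformed input. A minimal sketch of a stricter reader, assuming a hypothetical helper named `readPositiveInt` that is not part of the PR, might look like this:

```typescript
// Hypothetical helper (not from the PR): read a positive integer setting
// from an environment-like map, falling back to a default when unset.
function readPositiveInt(
  env: Record<string, string | undefined>,
  name: string,
  fallback?: number,
): number {
  const raw = env[name]
  if (raw === undefined || raw.trim() === "") {
    if (fallback === undefined) {
      throw new Error(`${name} environment variable is required`)
    }
    return fallback
  }
  // Number() rejects trailing garbage such as "30abc", unlike parseInt.
  const value = Number(raw)
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`)
  }
  return value
}
```

In a Node script this could be called as `readPositiveInt(process.env, "SCRIPT_TIMEOUT_MINUTES", 30)`, or without a fallback for a required value such as SERVICE_ACCOUNT_CONNECTOR_ID. Taking the map as a parameter keeps the helper easy to test without touching the real environment.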
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- server/integrations/google/index.ts (1 hunks)
- server/package.json (1 hunks)
- server/scripts/updateSheetUrls.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/scripts/updateSheetUrls.ts (10)
- server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
- server/types.ts (3): TxnOrClient (302-302), GoogleServiceAccount (442-445), OAuthCredentials (304-310)
- server/db/schema/syncJobs.ts (1): syncJobs (27-59)
- server/db/connector.ts (3): getConnector (137-158), getConnectorByExternalId (302-336), getOAuthConnectorWithCredentials (178-300)
- server/scripts/googleConfig.ts (1): serviceAccountConnectorId (1-1)
- server/db/schema/connectors.ts (1): SelectConnector (146-146)
- server/db/syncJob.ts (1): getAppSyncJobsByEmail (47-63)
- server/db/oauthProvider.ts (1): getOAuthProviderByConnectorId (53-67)
- server/integrations/google/index.ts (2): listFiles (2966-3016), getSpreadsheet (2150-2181)
- server/search/vespa.ts (2): getDocumentOrNull (65-65), UpdateDocument (66-66)
🔇 Additional comments (2)
server/package.json (1)
34-34: LGTM!
The new migration script follows the established naming convention and is correctly integrated into the package.json scripts section.
server/integrations/google/index.ts (1)
2297-2299: Good improvement for sheet-specific URLs!
The updated URL construction correctly uses the gid parameter to link directly to specific sheets within a spreadsheet when available, while maintaining backward compatibility with the fallback to webViewLink.
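The gid-first URL construction with a webViewLink fallback can be sketched as below. This is an illustration under stated assumptions rather than the code at lines 2297-2299: `SheetRef` and `buildSheetUrl` are hypothetical names, and the URL shape follows the commonly seen `/edit#gid=` form of Google Sheets links.

```typescript
// Hypothetical shape (not from the PR) for the fields the URL depends on.
interface SheetRef {
  spreadsheetId: string
  sheetId?: number | null
  webViewLink?: string | null
}

// Prefer a gid-based URL that opens the specific sub-sheet;
// otherwise fall back to the spreadsheet-level webViewLink.
function buildSheetUrl(ref: SheetRef): string | undefined {
  const { spreadsheetId, sheetId, webViewLink } = ref
  // Compare against null/undefined explicitly: a sheetId of 0 is valid
  // (it is typically the first sheet), so a truthiness check would skip it.
  if (sheetId !== undefined && sheetId !== null) {
    return `https://docs.google.com/spreadsheets/d/${spreadsheetId}/edit#gid=${sheetId}`
  }
  return webViewLink ?? undefined
}
```

For example, `buildSheetUrl({ spreadsheetId: "abc", sheetId: 0 })` yields a link ending in `#gid=0`, while a reference without a sheetId returns its webViewLink unchanged.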
Actionable comments posted: 1
📒 Files selected for processing (2)
- server/integrations/google/index.ts (1 hunks)
- server/scripts/updateSheetUrls.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/scripts/updateSheetUrls.ts
## [3.19.1](v3.19.0...v3.19.1) (2025-10-22)

### Bug Fixes

* **sheet-redirect:** fixed sub-sheet redirect issue ([#1001](#1001)) ([6747ea5](6747ea5))