

@SahilKumar000 SahilKumar000 commented Sep 29, 2025

Description

Testing

Additional Notes

Summary by CodeRabbit

  • Bug Fixes

    • Improved Google Sheets URL generation to display more accurate, sheet-specific links when available, ensuring users navigate directly to the correct sheet instead of the spreadsheet root.
  • Chores

    • Updated sheet URLs for both Service Account and OAuth-connected spreadsheets to reflect the corrected link format.

coderabbitai bot commented Sep 29, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Modifies Google Sheet URL generation logic to prioritize gid-based URLs when sheetId is available, and introduces a new comprehensive batch processing script that systematically updates Sheet URLs across Service Account and OAuth-connected users with concurrency controls, progress tracking, and error handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| URL Generation Logic Update: server/integrations/google/index.ts | Modified sheet URL construction to use gid-based Google Sheets URLs when sheetId is present; falls back to spreadsheet.webViewLink when unavailable. |
| Batch Sheet URL Update Script: server/scripts/updateSheetUrls.ts | New script orchestrating URL updates for multiple users across Service Account and OAuth credentials. Includes SheetsClientManager for client caching, database-driven user and job discovery, spreadsheet enumeration, per-sheet URL computation, Vespa document updates, global and per-batch concurrency controls (3 users, 5 spreadsheets per user), progress aggregation, deduplication, graceful shutdown, and 30-minute watchdog timeout. |
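
To make the URL rule in the first row concrete, here is a minimal sketch. It is not the code from server/integrations/google/index.ts: the `SheetRef` shape and the `buildSheetUrl` name are hypothetical, and the `/edit#gid=` form is the commonly used Google Sheets deep-link pattern rather than something quoted from the diff.

```typescript
// Hypothetical input shape; the real integration code may carry different fields.
interface SheetRef {
  spreadsheetId: string
  sheetId?: number | null
  webViewLink?: string | null
}

// Prefer a gid-based link to the individual sheet; otherwise fall back to
// the spreadsheet-level webViewLink (which may itself be absent).
function buildSheetUrl(ref: SheetRef): string | undefined {
  // A sheetId of 0 is a valid gid, so test for null/undefined rather than truthiness.
  if (ref.sheetId !== undefined && ref.sheetId !== null) {
    return `https://docs.google.com/spreadsheets/d/${ref.spreadsheetId}/edit#gid=${ref.sheetId}`
  }
  return ref.webViewLink ?? undefined
}
```

One design point worth checking in the real change: a plain truthiness test on sheetId would treat gid 0 as "missing" and send that sheet back to the spreadsheet root.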

Sequence Diagram(s)

sequenceDiagram
    participant Main as Update Script
    participant DB as Database
    participant CM as SheetsClientManager
    participant API as Google Sheets API
    participant Vespa as Vespa Index
    
    Main->>DB: Scan for Google Drive sync jobs
    DB-->>Main: User list (Service Account + OAuth)
    
    loop Per batch of users (3 concurrent)
        Main->>CM: Initialize clients for user
        CM->>API: Create/cache JWT or OAuth2 client
        API-->>CM: Client ready
        
        Main->>API: List user spreadsheets (filtered by MIME type)
        API-->>Main: Spreadsheet list (per-user limit)
        
        loop Per spreadsheet (5 concurrent)
            Main->>API: Fetch sheet metadata
            API-->>Main: Sheets with IDs
            
            loop Per sheet in spreadsheet
                Main->>Main: Compute docId (spreadsheetId_index)
                Main->>Vespa: Fetch document
                Vespa-->>Main: Current document data
                
                alt sheetId available
                    Main->>Main: Build gid-based URL
                else
                    Main->>Main: Use spreadsheet.webViewLink
                end
                
                alt URL changed or missing
                    Main->>Vespa: Update document with new URL
                    Vespa-->>Main: Update confirmed
                    Main->>Main: Mark as updated
                else
                    Main->>Main: Mark as skipped
                end
            end
        end
        
        Main->>Main: Aggregate per-user results
        Main->>Main: Log batch summary
    end
    
    Main->>CM: Cleanup all clients
    Main->>DB: Close connections
    Main->>Main: Exit with status code

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The change to index.ts is straightforward URL logic modification, but the new updateSheetUrls.ts script introduces substantial complexity: multi-user orchestration with concurrency controls, dual authentication paths (Service Account and OAuth), Vespa integration, comprehensive error handling, resource lifecycle management, deduplication logic, and batch-aggregated progress tracking across distributed operations.
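
The concurrency controls mentioned above (3 users at a time, 5 spreadsheets per user) could be approximated with a small batching helper. This is a sketch under assumptions, not the script's implementation: `processInBatches` and `Outcome` are hypothetical names, and the real script may use a different scheduling strategy.

```typescript
// Result wrapper so that one failing item does not abort its whole batch.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown }

// Runs `worker` over `items`, at most `batchSize` at a time.
// Each batch must finish completely before the next one starts.
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer")
  }
  const results: Outcome<R>[] = []
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize)
    const settled = await Promise.all(
      batch.map(async (item): Promise<Outcome<R>> => {
        try {
          return { ok: true, value: await worker(item) }
        } catch (error) {
          return { ok: false, error }
        }
      }),
    )
    results.push(...settled)
  }
  return results
}
```

Nesting two calls (an outer one over users with a batch size of 3, an inner one over each user's spreadsheets with a batch size of 5) would give the two levels of limiting described in the walkthrough. Results come back in input order, which keeps per-user aggregation simple.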

Poem

🐰 URLs now gleam with gid-based light,
A script hops through sheets with might,
OAuth and Service Accounts aligned,
Batches processed, no sheets left behind,
Vespa rejoices—URLs refined! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "fixed sub-sheet redirect issue" directly aligns with the core objective of the changeset. The primary modification in server/integrations/google/index.ts improves how sub-sheet URLs are constructed by using gid-based Google Sheets URLs when the sheetId is available, which directly addresses a redirect issue for individual sheets. The accompanying new script in server/scripts/updateSheetUrls.ts serves as infrastructure to apply this URL generation fix across the database. The title is concise, specific, and clearly conveys the main change without being vague or overly broad. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
@gemini-code-assist
Summary of Changes

Hello @SahilKumar000, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where Google Sheet links were not consistently navigating to the correct sub-sheet. It implements a more robust URL generation strategy that incorporates individual sheet IDs, ensuring precise redirection. Additionally, it provides a comprehensive migration script to update all previously ingested Google Sheet URLs to this improved format, enhancing user experience and data accuracy.

Highlights

  • Google Sheets URL Correction: The logic for generating Google Sheets URLs has been updated to include the specific sheetId when available. This ensures that links direct users to the exact sub-sheet within a spreadsheet, rather than just the main spreadsheet view.
  • Data Migration Script: A new TypeScript script (updateSheetUrls.ts) has been introduced to proactively update existing Google Sheet entries in the system (e.g., Vespa documents) with the newly formatted, sub-sheet specific URLs.
  • Migration Command: A new npm script, migrate:google-sheet, has been added to package.json to facilitate the execution of the URL update migration script.
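
The per-sheet step of the migration (compute a document ID, compare the stored URL, then update or skip) can be sketched as two pure functions. The `spreadsheetId_index` document ID format comes from the sequence diagram earlier in this thread; the function names are hypothetical and the real script's handling of missing Vespa documents is not shown here.

```typescript
// Document ID format taken from the walkthrough diagram: "<spreadsheetId>_<index>".
function sheetDocId(spreadsheetId: string, index: number): string {
  return `${spreadsheetId}_${index}`
}

// A document needs an update when its stored URL is missing or differs
// from the freshly computed one; otherwise it can be skipped.
function needsUrlUpdate(
  currentUrl: string | null | undefined,
  newUrl: string,
): boolean {
  return !currentUrl || currentUrl !== newUrl
}
```

Skipping unchanged documents is what makes it reasonable to re-run the migration: a second pass over already-migrated sheets would perform no writes.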

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes an issue with sub-sheet redirection in Google Sheets and introduces a migration script to update existing data. The changes are well-structured, particularly the migration script which includes good practices like concurrency control and batch processing. I've added a few comments to improve consistency, type safety, and remove redundancy in the new migration script.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
server/scripts/updateSheetUrls.ts (5)

19-19: Consider making the service account connector ID configurable

The serviceAccountConnectorId is imported from a hard-coded configuration file. This reduces flexibility and requires code changes for different environments.

Consider accepting the connector ID as an environment variable or command-line argument:

-import { serviceAccountConnectorId } from "./googleConfig"
+const serviceAccountConnectorId = process.env.SERVICE_ACCOUNT_CONNECTOR_ID 
+  ? parseInt(process.env.SERVICE_ACCOUNT_CONNECTOR_ID) 
+  : (() => { throw new Error("SERVICE_ACCOUNT_CONNECTOR_ID environment variable is required") })()

737-741: Consider making the script timeout configurable

The hard-coded 30-minute timeout might be insufficient for large organizations with many users and spreadsheets.

-const SCRIPT_TIMEOUT = 30 * 60 * 1000 // 30 minutes
+const SCRIPT_TIMEOUT = parseInt(process.env.SCRIPT_TIMEOUT_MINUTES || '30', 10) * 60 * 1000
+Logger.info(`Script timeout set to ${SCRIPT_TIMEOUT / 60000} minutes`)
 const timeoutId = setTimeout(() => {
-   Logger.error("Script timed out after 30 minutes, forcing exit...")
+   Logger.error(`Script timed out after ${SCRIPT_TIMEOUT / 60000} minutes, forcing exit...`)
   process.exit(1)
 }, SCRIPT_TIMEOUT)

431-431: Make the spreadsheet processing limit configurable

The hard-coded limit of 100 spreadsheets per user might be too restrictive for users with extensive Google Drive usage.

-    const sheetPromises = googleSheets.slice(0, 100).map(spreadsheet => // Limit to 100 spreadsheets
+    const maxSpreadsheetsPerUser = parseInt(process.env.MAX_SPREADSHEETS_PER_USER || '100')
+    const sheetPromises = googleSheets.slice(0, maxSpreadsheetsPerUser).map(spreadsheet =>
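
The three suggestions above each read an integer from an environment variable with parseInt, which yields NaN for non-numeric input and would then propagate silently into a timeout or slice limit. A shared helper that validates the value is one way to guard against that. This is only a sketch: `readPositiveInt` and the `Env` type are hypothetical, and the environment is passed in as a parameter so the function can be tested without touching process.env.

```typescript
// Minimal stand-in for the shape of process.env.
type Env = Record<string, string | undefined>

// Returns the fallback when the variable is unset or blank,
// and throws on anything that is not a positive integer.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name]
  if (raw === undefined || raw.trim() === "") {
    return fallback
  }
  const value = Number(raw)
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`)
  }
  return value
}
```

In the script this might be called as `readPositiveInt(process.env, "SCRIPT_TIMEOUT_MINUTES", 30)` or `readPositiveInt(process.env, "MAX_SPREADSHEETS_PER_USER", 100)`, failing fast at startup instead of partway through a migration.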

718-721: Improve database connection cleanup

The current cleanup logic has defensive checks but could be more robust.

-    if (db && db.$client && typeof db.$client.end === 'function') {
-      await db.$client.end()
-      Logger.info("Database connection closed")
-    }
+    try {
+      // Drizzle ORM typically uses $client for the underlying connection
+      if (db?.$client) {
+        if (typeof db.$client.end === 'function') {
+          await db.$client.end()
+        } else if (typeof db.$client.close === 'function') {
+          await db.$client.close()
+        } else if (typeof db.$client.destroy === 'function') {
+          await db.$client.destroy()
+        }
+        Logger.info("Database connection closed")
+      }
+    } catch (dbError) {
+      Logger.warn({ error: dbError }, "Failed to close database connection gracefully")
+    }

494-494: Monitor memory usage for large-scale migrations

The processedDocIds Map stores all processed document IDs in memory. For organizations with hundreds of thousands of sheets, this could consume significant memory.

Consider implementing a size limit or using a more memory-efficient deduplication strategy:

// Add after line 494
const MAX_PROCESSED_DOCS = parseInt(process.env.MAX_PROCESSED_DOCS || '1000000')
if (processedDocIds.size > MAX_PROCESSED_DOCS) {
  Logger.warn(`Processed documents map size (${processedDocIds.size}) exceeds limit. Consider running in smaller batches.`)
}

Would you like me to implement a disk-based or database-backed deduplication strategy for handling very large datasets?
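
For reference, a size check like the one suggested only helps if it runs as IDs are added, not once at declaration time. One possible in-memory shape is sketched below. It assumes deduplication only needs the document ID (a Set rather than the Map mentioned above), and `ProcessedDocTracker` is a hypothetical name, not something from the script.

```typescript
// Tracks which document IDs have been handled and warns once
// when the number of tracked IDs passes a soft limit.
class ProcessedDocTracker {
  private readonly seen = new Set<string>()
  private readonly softLimit: number
  private readonly warn: (message: string) => void
  private warned = false

  constructor(
    softLimit: number,
    warn: (message: string) => void = (message) => console.warn(message),
  ) {
    this.softLimit = softLimit
    this.warn = warn
  }

  // Returns true the first time a docId is seen, false for duplicates.
  markProcessed(docId: string): boolean {
    if (this.seen.has(docId)) {
      return false
    }
    this.seen.add(docId)
    if (!this.warned && this.seen.size > this.softLimit) {
      this.warned = true
      this.warn(
        `Processed document count (${this.seen.size}) exceeds soft limit ${this.softLimit}; consider running in smaller batches.`,
      )
    }
    return true
  }

  get size(): number {
    return this.seen.size
  }
}
```

The limit here is advisory: the tracker keeps accepting IDs after the warning, so correctness of deduplication is unaffected and only the operator is alerted.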

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe95f0b and 0f46dab.

📒 Files selected for processing (3)
  • server/integrations/google/index.ts (1 hunks)
  • server/package.json (1 hunks)
  • server/scripts/updateSheetUrls.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/scripts/updateSheetUrls.ts (10)
server/logger/index.ts (2)
  • getLogger (36-93)
  • Subsystem (15-15)
server/types.ts (3)
  • TxnOrClient (302-302)
  • GoogleServiceAccount (442-445)
  • OAuthCredentials (304-310)
server/db/schema/syncJobs.ts (1)
  • syncJobs (27-59)
server/db/connector.ts (3)
  • getConnector (137-158)
  • getConnectorByExternalId (302-336)
  • getOAuthConnectorWithCredentials (178-300)
server/scripts/googleConfig.ts (1)
  • serviceAccountConnectorId (1-1)
server/db/schema/connectors.ts (1)
  • SelectConnector (146-146)
server/db/syncJob.ts (1)
  • getAppSyncJobsByEmail (47-63)
server/db/oauthProvider.ts (1)
  • getOAuthProviderByConnectorId (53-67)
server/integrations/google/index.ts (2)
  • listFiles (2966-3016)
  • getSpreadsheet (2150-2181)
server/search/vespa.ts (2)
  • getDocumentOrNull (65-65)
  • UpdateDocument (66-66)
🔇 Additional comments (2)
server/package.json (1)

34-34: LGTM!

The new migration script follows the established naming convention and is correctly integrated into the package.json scripts section.

server/integrations/google/index.ts (1)

2297-2299: Good improvement for sheet-specific URLs!

The updated URL construction correctly uses the gid parameter to link directly to specific sheets within a spreadsheet when available, while maintaining backward compatibility with the fallback to webViewLink.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f46dab and fb3a7f5.

📒 Files selected for processing (2)
  • server/integrations/google/index.ts (1 hunks)
  • server/scripts/updateSheetUrls.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/scripts/updateSheetUrls.ts

@junaid-shirur junaid-shirur merged commit 6747ea5 into main Oct 22, 2025
4 checks passed
@junaid-shirur junaid-shirur deleted the fix/sheet-redirect branch October 22, 2025 19:14
MayankBansal2004 pushed a commit that referenced this pull request Oct 22, 2025
## [3.19.1](v3.19.0...v3.19.1) (2025-10-22)

### Bug Fixes

* **sheet-redirect:** fixed sub-sheet redirect issue ([#1001](#1001)) ([6747ea5](6747ea5))