fix(sheet-redirect): fixed sub-sheet redirect issue #1001

SahilKumar000 · 2025-09-29T11:42:21Z

Description

Testing

Additional Notes

Summary by CodeRabbit

Bug Fixes
- Improved Google Sheets URL generation to display more accurate, sheet-specific links when available, ensuring users navigate directly to the correct sheet instead of the spreadsheet root.
Chores
- Updated sheet URLs for both Service Account and OAuth-connected spreadsheets to reflect the corrected link format.

coderabbitai · 2025-09-29T11:42:29Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Modifies Google Sheet URL generation logic to prioritize gid-based URLs when sheetId is available, and introduces a new comprehensive batch processing script that systematically updates Sheet URLs across Service Account and OAuth-connected users with concurrency controls, progress tracking, and error handling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| URL Generation Logic Update `server/integrations/google/index.ts` | Modified sheet URL construction to use gid-based Google Sheets URLs when sheetId is present; falls back to spreadsheet.webViewLink when unavailable. |
| Batch Sheet URL Update Script `server/scripts/updateSheetUrls.ts` | New script orchestrating URL updates for multiple users across Service Account and OAuth credentials. Includes SheetsClientManager for client caching, database-driven user and job discovery, spreadsheet enumeration, per-sheet URL computation, Vespa document updates, global and per-batch concurrency controls (3 users, 5 spreadsheets per user), progress aggregation, deduplication, graceful shutdown, and 30-minute watchdog timeout. |
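
The URL rule in the first row can be sketched as follows. This is a minimal illustration, not the code from `server/integrations/google/index.ts`: the function name `buildSheetUrl` and the two interfaces are hypothetical, and the `https://docs.google.com/spreadsheets/d/<spreadsheetId>/edit#gid=<sheetId>` form is the usual Google Sheets deep-link format rather than something quoted from the diff. The first sheet of a spreadsheet often has `sheetId` 0, so the sketch compares against `null`/`undefined` instead of relying on truthiness.

```typescript
// Hypothetical shapes; the real types used in the PR may differ.
interface SpreadsheetInfo {
  spreadsheetId: string
  webViewLink?: string | null
}

interface SheetInfo {
  sheetId?: number | null
}

// Prefer a gid-based deep link; fall back to the spreadsheet-level link.
function buildSheetUrl(
  spreadsheet: SpreadsheetInfo,
  sheet: SheetInfo,
): string | undefined {
  // sheetId 0 is a valid id, so check for null/undefined explicitly.
  if (sheet.sheetId !== undefined && sheet.sheetId !== null) {
    return `https://docs.google.com/spreadsheets/d/${spreadsheet.spreadsheetId}/edit#gid=${sheet.sheetId}`
  }
  return spreadsheet.webViewLink ?? undefined
}
```

A truthiness check such as `sheet.sheetId ? ... : ...` would send the first sheet to the fallback link, which is the kind of mismatch this PR is meant to remove.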

Sequence Diagram(s)

sequenceDiagram
    participant Main as Update Script
    participant DB as Database
    participant CM as SheetsClientManager
    participant API as Google Sheets API
    participant Vespa as Vespa Index
    
    Main->>DB: Scan for Google Drive sync jobs
    DB-->>Main: User list (Service Account + OAuth)
    
    loop Per batch of users (3 concurrent)
        Main->>CM: Initialize clients for user
        CM->>API: Create/cache JWT or OAuth2 client
        API-->>CM: Client ready
        
        Main->>API: List user spreadsheets (filtered by MIME type)
        API-->>Main: Spreadsheet list (per-user limit)
        
        loop Per spreadsheet (5 concurrent)
            Main->>API: Fetch sheet metadata
            API-->>Main: Sheets with IDs
            
            loop Per sheet in spreadsheet
                Main->>Main: Compute docId (spreadsheetId_index)
                Main->>Vespa: Fetch document
                Vespa-->>Main: Current document data
                
                alt sheetId available
                    Main->>Main: Build gid-based URL
                else
                    Main->>Main: Use spreadsheet.webViewLink
                end
                
                alt URL changed or missing
                    Main->>Vespa: Update document with new URL
                    Vespa-->>Main: Update confirmed
                    Main->>Main: Mark as updated
                else
                    Main->>Main: Mark as skipped
                end
            end
        end
        
        Main->>Main: Aggregate per-user results
        Main->>Main: Log batch summary
    end
    
    Main->>CM: Cleanup all clients
    Main->>DB: Close connections
    Main->>Main: Exit with status code
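
The nested "3 concurrent users, 5 concurrent spreadsheets" loops in the diagram amount to bounded batch processing. One way to express that is sketched below. `mapInBatches` is a hypothetical helper, not a function taken from `updateSheetUrls.ts`, and it assumes simple fixed-size batches in which each batch finishes before the next one starts.

```typescript
// Run `worker` over `items` in fixed-size batches. Items within a batch run
// concurrently; the next batch starts only after the current one completes.
async function mapInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer")
  }
  const results: R[] = []
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize)
    // Promise.all keeps results in the same order as the batch items.
    const batchResults = await Promise.all(batch.map((item) => worker(item)))
    results.push(...batchResults)
  }
  return results
}
```

Nesting two calls, an outer one with batch size 3 over users and an inner one with batch size 5 over each user's spreadsheets, gives the shape shown in the diagram. `Promise.all` rejects on the first failure, so a real migration would likely catch errors inside the worker (or use `Promise.allSettled`) so that one bad spreadsheet does not abort its whole batch.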

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The change to index.ts is straightforward URL logic modification, but the new updateSheetUrls.ts script introduces substantial complexity: multi-user orchestration with concurrency controls, dual authentication paths (Service Account and OAuth), Vespa integration, comprehensive error handling, resource lifecycle management, deduplication logic, and batch-aggregated progress tracking across distributed operations.

Poem

🐰 URLs now gleam with gid-based light,
A script hops through sheets with might,
OAuth and Service Accounts aligned,
Batches processed, no sheets left behind,
Vespa rejoices—URLs refined! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "fixed sub-sheet redirect issue" directly aligns with the core objective of the changeset. The primary modification in `server/integrations/google/index.ts` improves how sub-sheet URLs are constructed by using gid-based Google Sheets URLs when the sheetId is available, which directly addresses a redirect issue for individual sheets. The accompanying new script in `server/scripts/updateSheetUrls.ts` serves as infrastructure to apply this URL generation fix across the database. The title is concise, specific, and clearly conveys the main change without being vague or overly broad. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |

gemini-code-assist · 2025-09-29T11:42:35Z

Summary of Changes

Hello @SahilKumar000, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where Google Sheet links were not consistently navigating to the correct sub-sheet. It implements a more robust URL generation strategy that incorporates individual sheet IDs, ensuring precise redirection. Additionally, it provides a comprehensive migration script to update all previously ingested Google Sheet URLs to this improved format, enhancing user experience and data accuracy.

Highlights

- Google Sheets URL Correction: The logic for generating Google Sheets URLs has been updated to include the specific sheetId when available. This ensures that links direct users to the exact sub-sheet within a spreadsheet, rather than just the main spreadsheet view.
- Data Migration Script: A new TypeScript script (`updateSheetUrls.ts`) has been introduced to proactively update existing Google Sheet entries in the system (e.g., Vespa documents) with the newly formatted, sub-sheet specific URLs.
- Migration Command: A new npm script, `migrate:google-sheet`, has been added to `package.json` to facilitate the execution of the URL update migration script.
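
To illustrate the migration described in the highlights, the sketch below isolates the per-sheet decision: derive the document ID, compare the stored URL with the freshly computed one, and update only when they differ. All names here (`sheetDocId`, `planUrlUpdate`, `StoredSheetDoc`, `UrlUpdatePlan`) are hypothetical. The `spreadsheetId_index` ID form comes from the CodeRabbit sequence diagram, the handling of documents that are not in the index is an assumption, and the actual Vespa read and write calls are left out.

```typescript
// Hypothetical view of the only field this decision needs.
interface StoredSheetDoc {
  url?: string | null
}

type UrlUpdatePlan =
  | { action: "update"; docId: string; url: string }
  | { action: "skip"; docId: string }

// Document ID form "<spreadsheetId>_<sheetIndex>", per the sequence diagram.
function sheetDocId(spreadsheetId: string, sheetIndex: number): string {
  return `${spreadsheetId}_${sheetIndex}`
}

// Update when the stored URL is missing or differs from the computed one.
function planUrlUpdate(
  docId: string,
  stored: StoredSheetDoc | null,
  computedUrl: string,
): UrlUpdatePlan {
  if (stored === null) {
    // Assumption: a sheet that was never indexed has nothing to update.
    return { action: "skip", docId }
  }
  if (!stored.url || stored.url !== computedUrl) {
    return { action: "update", docId, url: computedUrl }
  }
  return { action: "skip", docId }
}
```

Keeping the decision separate from the I/O makes the "updated" versus "skipped" counts easy to verify without a running index.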

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request fixes an issue with sub-sheet redirection in Google Sheets and introduces a migration script to update existing data. The changes are well-structured, particularly the migration script which includes good practices like concurrency control and batch processing. I've added a few comments to improve consistency, type safety, and remove redundancy in the new migration script.

server/scripts/updateSheetUrls.ts

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

server/scripts/updateSheetUrls.ts (5)
19-19: Consider making the service account connector ID configurable

The serviceAccountConnectorId is imported from a hard-coded configuration file. This reduces flexibility and requires code changes for different environments.

Consider accepting the connector ID as an environment variable or command-line argument:
-import { serviceAccountConnectorId } from "./googleConfig"
+const serviceAccountConnectorId = process.env.SERVICE_ACCOUNT_CONNECTOR_ID 
+  ? parseInt(process.env.SERVICE_ACCOUNT_CONNECTOR_ID) 
+  : (() => { throw new Error("SERVICE_ACCOUNT_CONNECTOR_ID environment variable is required") })()
737-741: Consider making the script timeout configurable

The hard-coded 30-minute timeout might be insufficient for large organizations with many users and spreadsheets.
-const SCRIPT_TIMEOUT = 30 * 60 * 1000 // 30 minutes
+const SCRIPT_TIMEOUT = parseInt(process.env.SCRIPT_TIMEOUT_MINUTES || '30') * 60 * 1000
+Logger.info(`Script timeout set to ${SCRIPT_TIMEOUT / 60000} minutes`)
 const timeoutId = setTimeout(() => {
-  Logger.error("Script timed out after 30 minutes, forcing exit...")
+  Logger.error(`Script timed out after ${SCRIPT_TIMEOUT / 60000} minutes, forcing exit...`)
   process.exit(1)
 }, SCRIPT_TIMEOUT)
431-431: Make the spreadsheet processing limit configurable

The hard-coded limit of 100 spreadsheets per user might be too restrictive for users with extensive Google Drive usage.
-    const sheetPromises = googleSheets.slice(0, 100).map(spreadsheet => // Limit to 100 spreadsheets
+    const maxSpreadsheetsPerUser = parseInt(process.env.MAX_SPREADSHEETS_PER_USER || '100')
+    const sheetPromises = googleSheets.slice(0, maxSpreadsheetsPerUser).map(spreadsheet =>
718-721: Improve database connection cleanup

The current cleanup logic has defensive checks but could be more robust.
-    if (db && db.$client && typeof db.$client.end === 'function') {
-      await db.$client.end()
-      Logger.info("Database connection closed")
-    }
+    try {
+      // Drizzle ORM typically uses $client for the underlying connection
+      if (db?.$client) {
+        if (typeof db.$client.end === 'function') {
+          await db.$client.end()
+        } else if (typeof db.$client.close === 'function') {
+          await db.$client.close()
+        } else if (typeof db.$client.destroy === 'function') {
+          await db.$client.destroy()
+        }
+        Logger.info("Database connection closed")
+      }
+    } catch (dbError) {
+      Logger.warn({ error: dbError }, "Failed to close database connection gracefully")
+    }
494-494: Monitor memory usage for large-scale migrations

The processedDocIds Map stores all processed document IDs in memory. For organizations with hundreds of thousands of sheets, this could consume significant memory.

Consider implementing a size limit or using a more memory-efficient deduplication strategy:
// Add after line 494
const MAX_PROCESSED_DOCS = parseInt(process.env.MAX_PROCESSED_DOCS || '1000000')
if (processedDocIds.size > MAX_PROCESSED_DOCS) {
  Logger.warn(`Processed documents map size (${processedDocIds.size}) exceeds limit. Consider running in smaller batches.`)
}
Would you like me to implement a disk-based or database-backed deduplication strategy for handling very large datasets?
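
Several of the nitpicks above suggest reading numeric settings from environment variables with `parseInt`. One caveat is that `parseInt` returns `NaN` for non-numeric input, which would silently turn a timeout or a limit into `NaN`. A small validating helper is one way to guard against that. The sketch below uses the hypothetical name `readIntSetting` and takes the environment as a plain object so it can be exercised without touching `process.env`; it also assumes all three settings should be positive integers.

```typescript
type Env = Record<string, string | undefined>

// Read a positive integer setting. Falls back to `defaultValue` when the
// variable is unset or blank; throws when it is unset with no default, or
// when the value is not a positive integer.
function readIntSetting(env: Env, name: string, defaultValue?: number): number {
  const raw = env[name]
  if (raw === undefined || raw.trim() === "") {
    if (defaultValue === undefined) {
      throw new Error(`${name} environment variable is required`)
    }
    return defaultValue
  }
  const trimmed = raw.trim()
  // Accept digits only, so inputs like "30min" or "1e3" are rejected.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`)
  }
  const value = Number.parseInt(trimmed, 10)
  if (value < 1) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`)
  }
  return value
}
```

With a helper like this, the three suggestions could read `readIntSetting(process.env, "SERVICE_ACCOUNT_CONNECTOR_ID")`, `readIntSetting(process.env, "SCRIPT_TIMEOUT_MINUTES", 30)`, and `readIntSetting(process.env, "MAX_SPREADSHEETS_PER_USER", 100)`.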

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe95f0b and 0f46dab.

📒 Files selected for processing (3)

server/integrations/google/index.ts (1 hunks)
server/package.json (1 hunks)
server/scripts/updateSheetUrls.ts (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

server/scripts/updateSheetUrls.ts (10)

- server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
- server/types.ts (3): TxnOrClient (302-302), GoogleServiceAccount (442-445), OAuthCredentials (304-310)
- server/db/schema/syncJobs.ts (1): syncJobs (27-59)
- server/db/connector.ts (3): getConnector (137-158), getConnectorByExternalId (302-336), getOAuthConnectorWithCredentials (178-300)
- server/scripts/googleConfig.ts (1): serviceAccountConnectorId (1-1)
- server/db/schema/connectors.ts (1): SelectConnector (146-146)
- server/db/syncJob.ts (1): getAppSyncJobsByEmail (47-63)
- server/db/oauthProvider.ts (1): getOAuthProviderByConnectorId (53-67)
- server/integrations/google/index.ts (2): listFiles (2966-3016), getSpreadsheet (2150-2181)
- server/search/vespa.ts (2): getDocumentOrNull (65-65), UpdateDocument (66-66)

🔇 Additional comments (2)

server/package.json (1)

34-34: LGTM!

The new migration script follows the established naming convention and is correctly integrated into the package.json scripts section.

server/integrations/google/index.ts (1)

2297-2299: Good improvement for sheet-specific URLs!

The updated URL construction correctly uses the gid parameter to link directly to specific sheets within a spreadsheet when available, while maintaining backward compatibility with the fallback to webViewLink.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f46dab and fb3a7f5.

📒 Files selected for processing (2)

server/integrations/google/index.ts (1 hunks)
server/scripts/updateSheetUrls.ts (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

server/scripts/updateSheetUrls.ts

server/integrations/google/index.ts

## [3.19.1](v3.19.0...v3.19.1) (2025-10-22)

### Bug Fixes

* **sheet-redirect:** fixed sub-sheet redirect issue ([#1001](#1001)) ([6747ea5](6747ea5))

fix(sheet-redirect): fixed sub-sheet redirect issue

0f46dab

SahilKumar000 requested review from devesh-juspay, junaid-shirur, kalpadhwaryu, shivamashtikar and zereraz as code owners September 29, 2025 11:42

gemini-code-assist bot reviewed Sep 29, 2025

coderabbitai bot reviewed Sep 29, 2025

SahilKumar000 added 2 commits September 29, 2025 17:57

fix(sheet-redirect): quick fixes

fcdaf1f

Merge branch 'main' of https://github.com/xynehq/xyne into fix/sheet-…

fb3a7f5

…redirect

coderabbitai bot reviewed Oct 22, 2025

junaid-shirur approved these changes Oct 22, 2025

junaid-shirur merged commit 6747ea5 into main Oct 22, 2025
4 checks passed

junaid-shirur deleted the fix/sheet-redirect branch October 22, 2025 19:14

MayankBansal2004 pushed a commit that referenced this pull request Oct 22, 2025

chore(release): 3.19.1 [skip ci]

5442a18
