Optimize aggregate/clicks cron with batching logic #3148
Conversation
Walkthrough

Consolidates GET/POST into a shared cron handler for aggregate-clicks, adds Qstash signature verification and chained batched processing, replaces reward-based joins with direct link queries and analytics-driven click counts, creates commissions in bulk, and adds a new script to export partners' country-change summaries.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant C as Client
    participant H as Cron Handler (GET/POST)
    participant V as Verifier (Vercel/Qstash)
    participant P as Prisma
    participant A as Analytics
    participant Q as Qstash
    rect rgb(240, 248, 255)
        C->>H: GET or POST request
        H->>V: Verify signature (Vercel for GET / Qstash for POST)
        V-->>H: Valid
    end
    rect rgb(255, 255, 255)
        H->>P: Fetch links with programEnrollment.clickReward (batch)
        P-->>H: linksWithClickRewards
        H->>A: getAnalytics for links (measure timing)
        A-->>H: linkClicksMap
        H->>P: Build commissions array -> createMany
        P-->>H: Created
        H->>P: syncTotalCommissions (per partner/program)
        P-->>H: Synced
    end
    alt more records (batch full)
        H->>Q: Publish next batch (startingAfter, batchNumber+1)
        Q-->>H: Accepted
        H-->>C: 202 Accepted (batch chained)
    else final batch or no results
        H-->>C: 200 Final endMessage
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (3)
apps/web/scripts/perplexity/partners-updated-countries.ts (2)
46-65: Potential undefined access on empty array.

Line 52 accesses `[0]` on the filtered/sorted array without checking if it's empty. While the code handles this gracefully with optional chaining on lines 60-63, the pattern could be clearer. Consider using a more explicit pattern:

```diff
- const finalCountryChange = changeHistoryLog
-   .filter((ch) => ch.field === "country") // filter by country field
-   .sort((a, b) => b.changedAt.getTime() - a.changedAt.getTime())[0]; // sort by changedAt descending
+ const countryChanges = changeHistoryLog
+   .filter((ch) => ch.field === "country")
+   .sort((a, b) => b.changedAt.getTime() - a.changedAt.getTime());
+ const finalCountryChange =
+   countryChanges.length > 0 ? countryChanges[0] : undefined;
```
69-69: Consider parameterizing the output filename.

The hardcoded `"output.csv"` filename could overwrite existing files. Consider making it configurable or adding a timestamp.

```diff
- fs.writeFileSync("output.csv", Papa.unparse(finalPartners));
+ const outputFile =
+   process.env.OUTPUT_FILE || `output-${programId}-${Date.now()}.csv`;
+ fs.writeFileSync(outputFile, Papa.unparse(finalPartners));
+ console.log(`Output written to ${outputFile}`);
```

apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (1)
31-34: Remove redundant default initialization.

The schema already defines default values, making this initialization redundant. The GET path will use schema defaults, and POST will override them from the body.

```diff
- let { startingAfter, batchNumber } = schema.parse({
-   startingAfter: undefined,
-   batchNumber: 1,
- });
+ let startingAfter: string | undefined;
+ let batchNumber = 1;
```

Then update line 45 to:

```diff
- ({ startingAfter, batchNumber } = schema.parse(JSON.parse(rawBody)));
+ const parsed = schema.parse(JSON.parse(rawBody));
+ startingAfter = parsed.startingAfter;
+ batchNumber = parsed.batchNumber;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4 hunks)
apps/web/scripts/perplexity/partners-updated-countries.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T17:44:03.965Z
Learnt from: TWilson023
Repo: dubinc/dub PR: 2857
File: apps/web/lib/actions/partners/update-program.ts:96-0
Timestamp: 2025-09-17T17:44:03.965Z
Learning: In apps/web/lib/actions/partners/update-program.ts, the team prefers to keep the messagingEnabledAt update logic simple by allowing client-provided timestamps rather than implementing server-controlled timestamp logic to avoid added complexity.
Applied to files:
apps/web/scripts/perplexity/partners-updated-countries.ts
📚 Learning: 2025-06-06T07:59:03.120Z
Learnt from: devkiran
Repo: dubinc/dub PR: 2177
File: apps/web/lib/api/links/bulk-create-links.ts:66-84
Timestamp: 2025-06-06T07:59:03.120Z
Learning: In apps/web/lib/api/links/bulk-create-links.ts, the team accepts the risk of potential undefined results from links.find() operations when building invalidLinks arrays, because existing links are fetched from the database based on the input links, so matches are expected to always exist.
Applied to files:
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts
🧬 Code graph analysis (2)
apps/web/scripts/perplexity/partners-updated-countries.ts (2)
packages/prisma/index.ts (1)
prisma (3-9)
apps/web/lib/zod/schemas/partner-profile.ts (1)
partnerProfileChangeHistoryLogSchema (181-196)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4)
apps/web/lib/cron/verify-vercel.ts (1)
verifyVercelSignature (3-20)
apps/web/lib/zod/schemas/analytics-response.ts (1)
analyticsResponse (30-578)
apps/web/lib/analytics/get-analytics.ts (1)
getAnalytics (20-336)
apps/web/lib/api/errors.ts (1)
handleAndReturnErrorResponse (162-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (8)
apps/web/scripts/perplexity/partners-updated-countries.ts (1)
10-44: LGTM!

The Prisma query is well-structured with appropriate filters for the use case. The JSON path operator usage for checking change history is correct.
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (7)
19-24: LGTM!

The batch size and schema definitions are well-structured for cursor-based pagination.
60-94: LGTM!

The cursor-based pagination implementation is correct. The `skip: 1` when a cursor is present properly excludes the cursor record, and the filtering logic appropriately targets links with click rewards and recent activity.
102-112: LGTM!

The analytics timing wrapper provides useful performance metrics, and the query parameters correctly target the filtered links for the previous day's date range.
125-150: LGTM!

The defensive null checks are appropriate for runtime safety, even though the query filters should prevent most null cases. The type assertion using `satisfies` correctly validates the filtered array type while preserving compile-time checking.
170-186: LGTM!

The batch chaining logic correctly enqueues the next batch only when a full batch is retrieved, and the array access is safe within the conditional check. Using `APP_DOMAIN_WITH_NGROK` ensures the cron job works in both production and development environments.
29-46: LGTM!

The dual-path handler design elegantly supports both Vercel cron (GET for the initial trigger) and Qstash (POST for batch chaining) with appropriate signature verification for each path.
194-194: LGTM!

The export pattern correctly exposes the handler for both GET and POST methods, which is the standard approach for Next.js App Router route handlers.
Actionable comments posted: 0
♻️ Duplicate comments (1)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (1)
96-101: Fix off-by-one in total link count in `endMessage`

The total links in `endMessage` currently use:

```ts
batchNumber * BATCH_SIZE + linksWithClickRewards.length
```

For a final partial batch, this double-counts a full batch. E.g., with `BATCH_SIZE = 200`, `batchNumber = 4`, and `linksWithClickRewards.length = 150`, it logs 950 instead of 750.

You want "completed full batches before this one" plus the current batch size, i.e.:

```diff
- const endMessage = `Finished aggregating clicks for ${batchNumber} batches (total ${nFormatter(batchNumber * BATCH_SIZE + linksWithClickRewards.length, { full: true })} links)`;
+ const endMessage = `Finished aggregating clicks for ${batchNumber} batches (total ${nFormatter((batchNumber - 1) * BATCH_SIZE + linksWithClickRewards.length, { full: true })} links)`;
```

This also keeps the zero-result follow-up batch case consistent (the final empty batch just reports the total from the previous full batches).

Also applies to: 171-189
🧹 Nitpick comments (4)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4)
102-113: Commission calculation pipeline is consistent; consider tightening the filter typingThe flow from analytics →
linkClicksMap→commissionsToCreateis coherent: zero/non‑paid links are dropped, and only links with completeprogramId/partnerId/clickRewarddata produce commissions, with earnings derived viagetRewardAmount(serializeReward(...)) * linkClicks. That all looks correct.If TypeScript ever complains about
nullincommissionsToCreate, you can make the filter a proper type guard instead of relying onsatisfiesfor narrowing:- .filter((c) => c !== null) satisfies Prisma.CommissionCreateManyInput[]; + .filter( + (c): c is Prisma.CommissionCreateManyInput => c !== null, + );This gives
commissionsToCreatethe precisePrisma.CommissionCreateManyInput[]type without needingsatisfies, which also makes the subsequentfor ... ofdestructuring safer in strict mode.Also applies to: 119-152
153-169: Avoid repeated `syncTotalCommissions` calls for the same partner/program pair

Right now, `syncTotalCommissions` is called once per commission in `commissionsToCreate`. If multiple links in the same partner/program pair generate commissions in the same batch, this will call the sync multiple times for the same pair, which is unnecessary extra work on a cron path.

You could deduplicate by partner/program before syncing, for example:

```diff
- for (const { partnerId, programId } of commissionsToCreate) {
-   // Sync total commissions for each partner that we created commissions for
-   await syncTotalCommissions({
-     partnerId,
-     programId,
-   });
- }
+ const uniquePairs = new Map<string, { partnerId: string; programId: string }>();
+
+ for (const { partnerId, programId } of commissionsToCreate) {
+   const key = `${partnerId}:${programId}`;
+   if (!uniquePairs.has(key)) {
+     uniquePairs.set(key, { partnerId, programId });
+   }
+ }
+
+ for (const { partnerId, programId } of uniquePairs.values()) {
+   await syncTotalCommissions({ partnerId, programId });
+ }
```

This keeps behavior the same while bounding the number of sync calls per batch by the number of distinct partner/program combinations instead of the number of links.
153-159: Consider dropping or gating `console.table` in cron

`console.table(commissionsToCreate);` on a cron that can process many links per day may produce noisy logs (and include partner/program/link identifiers). If this was for debugging, consider removing it or guarding it behind a debug flag or environment check so production logs stay leaner.
171-187: QStash chaining looks correct; minor log message polish

The logic to detect a full batch, enqueue the next job via QStash with `startingAfter` and `batchNumber + 1`, and immediately return a log/response is sound and should give you robust chaining across large datasets.

Tiny nit: the log string has slightly unbalanced parentheses and could be clearer:

```diff
- return logAndRespond(
-   `Enqueued next batch (batch #${batchNumber + 1} for aggregate clicks cron (startingAfter: ${nextStartingAfter}).`,
- );
+ return logAndRespond(
+   `Enqueued next batch (batch #${batchNumber + 1}) for aggregate clicks cron (startingAfter: ${nextStartingAfter}).`,
+ );
```

Purely cosmetic, feel free to ignore if you like the current wording.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-06T07:59:03.120Z
Learnt from: devkiran
Repo: dubinc/dub PR: 2177
File: apps/web/lib/api/links/bulk-create-links.ts:66-84
Timestamp: 2025-06-06T07:59:03.120Z
Learning: In apps/web/lib/api/links/bulk-create-links.ts, the team accepts the risk of potential undefined results from links.find() operations when building invalidLinks arrays, because existing links are fetched from the database based on the input links, so matches are expected to always exist.
Applied to files:
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (3)
6-24: Unified handler + verification wiring looks solid

Using a single `handler` for both GET (Vercel cron) and POST (QStash) with the appropriate signature verification and a small Zod schema for `startingAfter`/`batchNumber` keeps this route easy to reason about and type-safe. No issues here.

Also applies to: 29-47
60-94: Batching + cursor pagination over links looks correct

The `findMany` query correctly scopes to links with a `programEnrollment.clickRewardId`, positive `clicks`, and a recent `lastClicked`, and uses `take`, `cursor`, and `skip` on `id` ordered ascending, which gives stable, non-overlapping batches for chaining. This should behave well with the `startingAfter` semantics you use later.
195-195: Re-exporting `handler` as both GET and POST is a nice reuse pattern

Exporting the same `handler` for both `GET` and `POST` keeps the routing surface minimal and avoids divergence between code paths over time. No changes needed here.
Summary by CodeRabbit
Refactor
Chores