Optimize aggregate/clicks cron with batching logic #3148
Conversation
Walkthrough

Consolidates GET/POST into a shared cron handler for aggregate-clicks, adds Qstash signature verification and chained batched processing, replaces reward-based joins with direct link queries and analytics-driven click counts, creates commissions in bulk, and adds a new script to export partners' country-change summaries.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant C as Client
    participant H as Cron Handler (GET/POST)
    participant V as Verifier (Vercel/Qstash)
    participant P as Prisma
    participant A as Analytics
    participant Q as Qstash
    rect rgb(240, 248, 255)
        C->>H: GET or POST request
        H->>V: Verify signature (Vercel for GET / Qstash for POST)
        V-->>H: Valid
    end
    rect rgb(255, 255, 255)
        H->>P: Fetch links with programEnrollment.clickReward (batch)
        P-->>H: linksWithClickRewards
        H->>A: getAnalytics for links (measure timing)
        A-->>H: linkClicksMap
        H->>P: Build commissions array -> createMany
        P-->>H: Created
        H->>P: syncTotalCommissions (per partner/program)
        P-->>H: Synced
    end
    alt more records (batch full)
        H->>Q: Publish next batch (startingAfter, batchNumber+1)
        Q-->>H: Accepted
        H-->>C: 202 Accepted (batch chained)
    else final batch or no results
        H-->>C: 200 Final endMessage
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (3)
apps/web/scripts/perplexity/partners-updated-countries.ts (2)
46-65: Potential undefined access on empty array.

Line 52 accesses `[0]` on the filtered/sorted array without checking if it's empty. While the code handles this gracefully with optional chaining on lines 60-63, the pattern could be clearer. Consider using a more explicit pattern:

```diff
- const finalCountryChange = changeHistoryLog
-   .filter((ch) => ch.field === "country") // filter by country field
-   .sort((a, b) => b.changedAt.getTime() - a.changedAt.getTime())[0]; // sort by changedAt descending
+ const countryChanges = changeHistoryLog
+   .filter((ch) => ch.field === "country")
+   .sort((a, b) => b.changedAt.getTime() - a.changedAt.getTime());
+ const finalCountryChange =
+   countryChanges.length > 0 ? countryChanges[0] : undefined;
```
69-69: Consider parameterizing the output filename.

The hardcoded `"output.csv"` filename could overwrite existing files. Consider making it configurable or adding a timestamp.

```diff
- fs.writeFileSync("output.csv", Papa.unparse(finalPartners));
+ const outputFile =
+   process.env.OUTPUT_FILE || `output-${programId}-${Date.now()}.csv`;
+ fs.writeFileSync(outputFile, Papa.unparse(finalPartners));
+ console.log(`Output written to ${outputFile}`);
```

apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (1)
31-34: Remove redundant default initialization.

The schema already defines default values, making this initialization redundant. The GET path will use schema defaults, and POST will override them from the body.

```diff
- let { startingAfter, batchNumber } = schema.parse({
-   startingAfter: undefined,
-   batchNumber: 1,
- });
+ let startingAfter: string | undefined;
+ let batchNumber = 1;
```

Then update line 45 to:

```diff
- ({ startingAfter, batchNumber } = schema.parse(JSON.parse(rawBody)));
+ const parsed = schema.parse(JSON.parse(rawBody));
+ startingAfter = parsed.startingAfter;
+ batchNumber = parsed.batchNumber;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4 hunks)
apps/web/scripts/perplexity/partners-updated-countries.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T17:44:03.965Z
Learnt from: TWilson023
Repo: dubinc/dub PR: 2857
File: apps/web/lib/actions/partners/update-program.ts:96-0
Timestamp: 2025-09-17T17:44:03.965Z
Learning: In apps/web/lib/actions/partners/update-program.ts, the team prefers to keep the messagingEnabledAt update logic simple by allowing client-provided timestamps rather than implementing server-controlled timestamp logic to avoid added complexity.
Applied to files:
apps/web/scripts/perplexity/partners-updated-countries.ts
📚 Learning: 2025-06-06T07:59:03.120Z
Learnt from: devkiran
Repo: dubinc/dub PR: 2177
File: apps/web/lib/api/links/bulk-create-links.ts:66-84
Timestamp: 2025-06-06T07:59:03.120Z
Learning: In apps/web/lib/api/links/bulk-create-links.ts, the team accepts the risk of potential undefined results from links.find() operations when building invalidLinks arrays, because existing links are fetched from the database based on the input links, so matches are expected to always exist.
Applied to files:
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts
🧬 Code graph analysis (2)
apps/web/scripts/perplexity/partners-updated-countries.ts (2)
packages/prisma/index.ts (1)
prisma (3-9)
apps/web/lib/zod/schemas/partner-profile.ts (1)
partnerProfileChangeHistoryLogSchema (181-196)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4)
apps/web/lib/cron/verify-vercel.ts (1)
verifyVercelSignature (3-20)
apps/web/lib/zod/schemas/analytics-response.ts (1)
analyticsResponse (30-578)
apps/web/lib/analytics/get-analytics.ts (1)
getAnalytics (20-336)
apps/web/lib/api/errors.ts (1)
handleAndReturnErrorResponse (162-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (8)
apps/web/scripts/perplexity/partners-updated-countries.ts (1)
10-44: LGTM!

The Prisma query is well-structured with appropriate filters for the use case. The JSON path operator usage for checking change history is correct.
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (7)
19-24: LGTM!

The batch size and schema definitions are well-structured for cursor-based pagination.
60-94: LGTM!

The cursor-based pagination implementation is correct. The `skip: 1` when a cursor is present properly excludes the cursor record, and the filtering logic appropriately targets links with click rewards and recent activity.
102-112: LGTM!

The analytics timing wrapper provides useful performance metrics, and the query parameters correctly target the filtered links for the previous day's date range.
125-150: LGTM!

The defensive null checks are appropriate for runtime safety, even though the query filters should prevent most null cases. The type assertion using `satisfies` correctly validates the filtered array type while preserving compile-time checking.
170-186: LGTM!

The batch chaining logic correctly enqueues the next batch only when a full batch is retrieved, and the array access is safe within the conditional check. Using `APP_DOMAIN_WITH_NGROK` ensures the cron job works in both production and development environments.
29-46: LGTM!

The dual-path handler design elegantly supports both Vercel cron (GET for the initial trigger) and Qstash (POST for batch chaining) with appropriate signature verification for each path.
194-194: LGTM!

The export pattern correctly exposes the handler for both GET and POST methods, which is the standard approach for Next.js App Router route handlers.
Actionable comments posted: 0
♻️ Duplicate comments (1)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (1)
96-101: Fix off-by-one in total link count in `endMessage`

The total links in `endMessage` currently use:

```ts
batchNumber * BATCH_SIZE + linksWithClickRewards.length
```

For a final partial batch, this double-counts a full batch. E.g., with `BATCH_SIZE = 200`, `batchNumber = 4`, and `linksWithClickRewards.length = 150`, it logs 950 instead of 750.

You want "completed full batches before this one" plus the current batch size, i.e.:

```diff
- const endMessage = `Finished aggregating clicks for ${batchNumber} batches (total ${nFormatter(batchNumber * BATCH_SIZE + linksWithClickRewards.length, { full: true })} links)`;
+ const endMessage = `Finished aggregating clicks for ${batchNumber} batches (total ${nFormatter((batchNumber - 1) * BATCH_SIZE + linksWithClickRewards.length, { full: true })} links)`;
```

This also keeps the zero-result follow-up batch case consistent (the final empty batch just reports the total from the previous full batches).

Also applies to: 171-189
🧹 Nitpick comments (4)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4)
102-113: Commission calculation pipeline is consistent; consider tightening the filter typingThe flow from analytics →
linkClicksMap→commissionsToCreateis coherent: zero/non‑paid links are dropped, and only links with completeprogramId/partnerId/clickRewarddata produce commissions, with earnings derived viagetRewardAmount(serializeReward(...)) * linkClicks. That all looks correct.If TypeScript ever complains about
nullincommissionsToCreate, you can make the filter a proper type guard instead of relying onsatisfiesfor narrowing:- .filter((c) => c !== null) satisfies Prisma.CommissionCreateManyInput[]; + .filter( + (c): c is Prisma.CommissionCreateManyInput => c !== null, + );This gives
commissionsToCreatethe precisePrisma.CommissionCreateManyInput[]type without needingsatisfies, which also makes the subsequentfor ... ofdestructuring safer in strict mode.Also applies to: 119-152
153-169: Avoid repeated `syncTotalCommissions` calls for the same partner/program pair

Right now, `syncTotalCommissions` is called once per commission in `commissionsToCreate`. If multiple links in the same partner/program pair generate commissions in the same batch, this will call the sync multiple times for the same pair, which is unnecessary extra work on a cron path.

You could deduplicate by partner/program before syncing, for example:

```diff
- for (const { partnerId, programId } of commissionsToCreate) {
-   // Sync total commissions for each partner that we created commissions for
-   await syncTotalCommissions({
-     partnerId,
-     programId,
-   });
- }
+ const uniquePairs = new Map<string, { partnerId: string; programId: string }>();
+
+ for (const { partnerId, programId } of commissionsToCreate) {
+   const key = `${partnerId}:${programId}`;
+   if (!uniquePairs.has(key)) {
+     uniquePairs.set(key, { partnerId, programId });
+   }
+ }
+
+ for (const { partnerId, programId } of uniquePairs.values()) {
+   await syncTotalCommissions({ partnerId, programId });
+ }
```

This keeps behavior the same while bounding the number of sync calls per batch by the number of distinct partner/program combinations instead of the number of links.
153-159: Consider dropping or gating `console.table` in cron

`console.table(commissionsToCreate);` on a cron that can process many links per day may produce noisy logs (and include partner/program/link identifiers). If this was for debugging, consider removing it or guarding it behind a debug flag or environment check so production logs stay leaner.
171-187: QStash chaining looks correct; minor log message polish

The logic to detect a full batch, enqueue the next job via QStash with `startingAfter` and `batchNumber + 1`, and immediately return a log/response is sound and should give you robust chaining across large datasets.

Tiny nit: the log string has slightly unbalanced parentheses and could be clearer:

```diff
- return logAndRespond(
-   `Enqueued next batch (batch #${batchNumber + 1} for aggregate clicks cron (startingAfter: ${nextStartingAfter}).`,
- );
+ return logAndRespond(
+   `Enqueued next batch (batch #${batchNumber + 1}) for aggregate clicks cron (startingAfter: ${nextStartingAfter}).`,
+ );
```

Purely cosmetic, feel free to ignore if you like the current wording.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-06T07:59:03.120Z
Learnt from: devkiran
Repo: dubinc/dub PR: 2177
File: apps/web/lib/api/links/bulk-create-links.ts:66-84
Timestamp: 2025-06-06T07:59:03.120Z
Learning: In apps/web/lib/api/links/bulk-create-links.ts, the team accepts the risk of potential undefined results from links.find() operations when building invalidLinks arrays, because existing links are fetched from the database based on the input links, so matches are expected to always exist.
Applied to files:
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
apps/web/app/(ee)/api/cron/aggregate-clicks/route.ts (3)
6-24: Unified handler + verification wiring looks solid

Using a single `handler` for both GET (Vercel cron) and POST (QStash) with the appropriate signature verification and a small Zod schema for `startingAfter`/`batchNumber` keeps this route easy to reason about and type-safe. No issues here.

Also applies to: 29-47
60-94: Batching + cursor pagination over links looks correct

The `findMany` query correctly scopes to links with a `programEnrollment.clickRewardId`, positive `clicks`, and a recent `lastClicked`, and uses `take`, `cursor`, and `skip` on `id` ordered ascending, which gives stable, non-overlapping batches for chaining. This should behave well with the `startingAfter` semantics you use later.
195-195: Re-exporting `handler` as both GET and POST is a nice reuse pattern

Exporting the same `handler` for both `GET` and `POST` keeps the routing surface minimal and avoids divergence between code paths over time. No changes needed here.
Summary by CodeRabbit
Refactor
Chores