Conversation

@steven-tey (Collaborator) commented Oct 24, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added enhanced analytics data collection with new device, browser, and OS fields
    • Introduced v3 analytics pipelines for improved performance and flexibility
    • Added group-by aggregations and time-series reporting capabilities
  • Bug Fixes & Improvements

    • Optimized data storage for frequently queried fields
    • Streamlined analytics endpoints for faster query performance
    • Improved workspace and link filtering logic
  • Deprecated

    • Removed legacy v2 analytics endpoints in favor of v3 pipelines
    • Deprecated several specialized materialization views

@vercel bot (Contributor) commented Oct 24, 2025

The latest updates on your projects.

| Project | Status | Deployment | Updated (UTC) |
| --- | --- | --- | --- |
| dub | Ready | Preview | Oct 24, 2025 9:54pm |

@coderabbitai
Contributor

coderabbitai bot commented Oct 24, 2025

Walkthrough

This pull request comprehensively refactors the Tinybird analytics infrastructure: it quotes TOKEN values across datasources for consistency, enriches schemas with workspace-scoped fields (workspace_id, domain, key), removes the legacy v2 endpoint pipelines, and introduces six new v3 pipes (v3_count, v3_events, v3_group_by, v3_group_by_link_metadata, v3_timeseries, v3_usage) alongside extensive schema transformations and materialization updates.

Changes

Cohort / File(s) Summary
TOKEN quoting across datasources
packages/tinybird/datasources/dub_audit_logs.datasource, dub_click_events.datasource, dub_conversion_events_log.datasource, dub_import_error_logs.datasource, dub_sale_events.datasource, dub_webhook_events.datasource, dub_links_metadata.datasource
Changed TOKEN value from unquoted identifier to quoted string (e.g., TOKEN dub_tinybird_token APPEND → TOKEN "dub_tinybird_token" APPEND) for parsing consistency.
Datasource schema enrichment
dub_click_events.datasource, dub_lead_events.datasource, dub_sale_events.datasource
Added new schema fields: workspace_id, domain, key, and trigger with corresponding JSON path mappings. Also added ENGINE_SORTING_KEY and refined field ordering.
Datasource schema expansion with cardinality optimizations
dub_click_events_id.datasource, dub_click_events_mv.datasource, dub_lead_events_mv.datasource, dub_sale_events_mv.datasource
Added numerous fields (device_model, device_vendor, browser_version, os_version, cpu_architecture, bot, engine_version, qr) and converted city/region to LowCardinality(String). Updated ENGINE_SORTING_KEY to include workspace_id.
Metadata datasource modifications
dub_links_metadata_latest.datasource, dub_links_metadata.datasource
Changed workspace_id type from String to LowCardinality(String), added program_id/partner_id/folder_id fields with type changes, removed ENGINE_PARTITION_KEY, updated ENGINE_SORTING_KEY, and added ENGINE_IS_DELETED.
Removed datasource and regular links metadata
dub_regular_links_metadata_latest.datasource, dub_sale_events_id.datasource
Entirely deleted datasource definitions (both files removed).
Materialization pipe deletions
packages/tinybird/materializations/dub_click_events_id_pipe.pipe, dub_click_events_pipe.pipe, dub_lead_events_pipe.pipe, dub_links_metadata_pipe.pipe, dub_regular_links_metadata_pipe.pipe, dub_sale_events_id_pipe.pipe, dub_sale_events_pipe.pipe
Removed seven materialization pipe definitions entirely.
v2 endpoint pipe deletions
packages/tinybird/endpoints/all_stats.pipe, coordinates_all.pipe, coordinates_sales.pipe, get_audit_logs.pipe, get_click_event.pipe, get_framer_lead_events.pipe, get_import_error_logs.pipe, get_lead_event.pipe, get_lead_event_by_id.pipe, get_lead_events.pipe, get_sale_event.pipe, get_webhook_events.pipe, v2_browsers.pipe, v2_cities.pipe, v2_continents.pipe, v2_countries.pipe, v2_customer_events.pipe, v2_devices.pipe, v2_events.pipe, v2_os.pipe, v2_referer_urls.pipe, v2_referers.pipe, v2_regions.pipe, v2_timeseries.pipe, v2_top_links.pipe, v2_top_partners.pipe, v2_top_tags.pipe, v2_top_urls.pipe, v2_triggers.pipe, v2_usage.pipe, v2_utms.pipe
Deleted 31 v2 endpoint pipeline files, removing complex multi-node SQL pipelines with templated conditionals, PREWHERE optimizations, and dynamic filtering across clicks, leads, and sales metrics.
v2 pipe deletions
packages/tinybird/pipes/v2_browsers.pipe, v2_cities.pipe, v2_continents.pipe, v2_countries.pipe, v2_customer_events.pipe, v2_devices.pipe, v2_events.pipe, v2_os.pipe, v2_referer_urls.pipe, v2_referers.pipe, v2_regions.pipe, v2_timeseries.pipe, v2_top_links.pipe, v2_top_partners.pipe, v2_top_tags.pipe, v2_top_urls.pipe, v2_triggers.pipe, v2_usage.pipe, v2_utms.pipe
Deleted 19 v2 pipe definitions (note: some overlap with endpoint deletions above).
Core pipe migrations and updates
packages/tinybird/pipes/all_stats.pipe, coordinates_all.pipe, coordinates_sales.pipe, dub_click_events_id_pipe.pipe, dub_click_events_pipe.pipe, dub_lead_events_pipe.pipe, dub_links_metadata_pipe.pipe, dub_regular_links_metadata_pipe.pipe, dub_sale_events_id_pipe.pipe, dub_sale_events_pipe.pipe, get_audit_logs.pipe, get_click_event.pipe, get_framer_lead_events.pipe, get_import_error_logs.pipe, get_lead_event.pipe, get_lead_events.pipe, get_webhook_events.pipe
Refactored pipes: changed datasource tables (e.g., dub_click_events_mv instead of explicit column selection), added LEFT JOINs to enrich fields from link metadata, introduced low-cardinality field wrappers, updated trigger logic with CASE expressions, and reformatted SQL for consistency.
v3 pipe new implementations
packages/tinybird/pipes/v3_count.pipe, v3_events.pipe, v3_group_by.pipe, v3_group_by_link_metadata.pipe, v3_timeseries.pipe, v3_usage.pipe
Added six new v3 pipes with templated SQL nodes for counting, events retrieval, time-bucketed series, grouped aggregations by metadata, and usage tracking. Each includes workspace-scoped filtering, interval generation, and composite aggregation nodes with conditional event-type routing.
v2 top programs pipe migration
packages/tinybird/pipes/v2_top_programs.pipe
Changed datasource reference from dub_regular_links_metadata_latest to dub_links_metadata_latest, with minimal semantic changes to filtering and aggregation logic.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Analytics Client
    participant Router as Endpoint Router
    participant V3Pipe as V3 Pipe (Count/Events/etc.)
    participant WL as Workspace Links<br/>(Metadata Filter)
    participant Events as Event Tables<br/>(Click/Lead/Sale MV)
    
    Client->>Router: Query eventType=clicks, workspace_id=ws_123
    Router->>V3Pipe: Route to v3_count/v3_events
    V3Pipe->>WL: Filter workspace links
    WL-->>V3Pipe: Valid link_ids for workspace
    V3Pipe->>Events: Query with workspace context
    Events-->>V3Pipe: Filtered event data
    V3Pipe->>V3Pipe: Aggregate (GROUP BY, ORDER BY LIMIT)
    V3Pipe-->>Router: Result with groupByField
    Router-->>Client: Aggregated analytics

    Note over Client,Events: V2 pipes (deleted) no longer in path<br/>V3 introduces workspace-scoped filtering
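The workspace-scoped filtering pattern the diagram describes can be sketched as a pair of Tinybird nodes. This is an illustrative sketch, not the PR's actual v3 SQL; the node and parameter names (workspace_links, count_clicks, workspaceId) are assumptions, while the table names and the deleted column come from the changeset above.

```sql
NODE workspace_links
SQL >
    -- resolve the links that belong to the requesting workspace
    SELECT link_id
    FROM dub_links_metadata_latest FINAL
    WHERE workspace_id = {{ String(workspaceId, required=True) }}
      AND deleted == 0

NODE count_clicks
SQL >
    -- count events only for links inside the workspace scope
    SELECT count(*) AS clicks
    FROM dub_click_events_mv
    WHERE link_id IN (SELECT link_id FROM workspace_links)
        {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
        {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}
```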

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Scope & heterogeneity: 50+ files across datasources, endpoints, pipes, and materializations; mix of schema changes, deletions, and new logic.
  • Deletion impact: 31 v2 endpoint/pipe files deleted; requires validation that v3 replacements cover all functionality and no consumers break.
  • Schema complexity: Multiple datasource schema transformations with cardinality optimizations, field reordering, and new computed fields; potential data compatibility concerns.
  • SQL logic density: New v3 pipes contain templated SQL with extensive conditionals, joins, and aggregations; removing the v2 pipes requires tracing dependency chains.

Areas requiring extra attention:

  • V2 pipe deletions — verify no external dependencies or endpoints still reference removed files (e.g., v2_customer_events, v2_timeseries).
  • Schema field additions and datasource table migrations — ensure backward compatibility and confirm that queries using dub_links_metadata_latest (now required) handle field availability correctly.
  • V3 pipe implementations — validate SQL correctness for interval generation, workspace scoping, and composite joins; confirm groupByField routing logic covers all eventType branches.
  • dub_regular_links_metadata_latest removal — check for any remaining references in queries or endpoints that expected this filtered dataset.

Possibly related PRs

Suggested reviewers

  • devkiran

🐰 Hop along, refactor bold!
V2 pipes archived, v3 unfolds,
Workspace scopes shine with care,
Schemas enrich, metadata fair. 🌟

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Finalize Tinybird pipes/datasources" accurately captures the changeset, which consists entirely of Tinybird pipe and datasource modifications: schema and token updates, deletion of the v2 endpoint/pipe definitions, new v3 pipes, and materialization changes. Enumerating every specific change is not required. |


@coderabbitai bot left a comment

Actionable comments posted: 34

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
packages/tinybird/pipes/v3_events.pipe (1)

1-2: Update the description to match the pipe's functionality.

The description says "Top countries" but this pipe fetches individual events (clicks, leads, sales), not country aggregations. Update the description to accurately reflect the pipe's purpose.

 DESCRIPTION >
-	Top countries
+	Fetch individual events (clicks, leads, sales) with filtering
packages/tinybird/pipes/v3_count.pipe (2)

75-76: Fix referer URL filtering (1‑based index).

-        {% if defined(refererUrl) %} AND splitByString('?', referer_url)[1] = {{ refererUrl }} {% end %}
+        {% if defined(refererUrl) %} AND splitByString('?', referer_url)[2] = {{ refererUrl }} {% end %}

197-199: Fix referer URL filtering in sales (1‑based index).

-                {% if defined(refererUrl) %}
-                    AND splitByString('?', referer_url)[1] = {{ refererUrl }}
-                {% end %}
+                {% if defined(refererUrl) %}
+                    AND splitByString('?', referer_url)[2] = {{ refererUrl }}
+                {% end %}
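For reference, ClickHouse arrays are 1-based, so the two indices select different halves of the split. A standalone illustration (the URL is made up; the indexing behavior is standard ClickHouse):

```sql
SELECT
    splitByString('?', 'https://dub.co/stats?utm=x')[1] AS before_query,  -- 'https://dub.co/stats'
    splitByString('?', 'https://dub.co/stats?utm=x')[2] AS after_query    -- 'utm=x'
```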
packages/tinybird/pipes/get_lead_event.pipe (1)

13-20: Remove real-looking customerId default and resolve required/default mismatch.

Hardcoding "cus_JzMqCL…" risks leaking PII and conflicts with required=True. Use a neutral placeholder or drop default.

                 customerId,
-                "cus_JzMqCLdaiVM1o1grw0yk84uC",
+                "",
                 description="The unique ID for a given customer.",
                 required=True,
🧹 Nitpick comments (29)
packages/tinybird/pipes/get_audit_logs.pipe (2)

14-16: Confirm intent to expose PII (IP, UA, metadata) and access controls.

These fields can be personal data. Verify endpoint is auth‑protected and exposure is required. If not strictly needed, consider gating with a param (e.g., include_sensitive=false by default) or masking IP/UA. I can propose a templated SELECT variant if desired.
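A minimal sketch of the opt-in gating idea, assuming Tinybird's Boolean template parameter; the include_sensitive parameter and the exact column names (ip, user_agent, metadata) are hypothetical:

```sql
SQL >
    SELECT
        timestamp,
        workspace_id,
        action,
        -- expose PII columns only when explicitly requested
        {% if Boolean(include_sensitive, false) %} ip, user_agent, metadata
        {% else %} '' AS ip, '' AS user_agent, '' AS metadata
        {% end %}
    FROM dub_audit_logs
```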


24-25: Type check for equality filters.

Ensure workspace_id/program_id column types match String(). If columns are numeric/UUID, use the appropriate sanitizer/cast (e.g., Int64/UUID) to avoid type mismatch and suboptimal query plans.

packages/tinybird/pipes/v3_events.pipe (1)

63-69: Consider documenting the duplicated link filtering logic.

The link filtering logic is duplicated across click_events (lines 63-69), lead_events (lines 118-124), and sale_events (lines 196-202). While Tinybird's architecture may require this duplication, it increases maintenance burden—any change to the filtering logic must be replicated in three places.

Consider adding a comment documenting this duplication and the need to keep all three implementations synchronized.

+        {# Link filtering logic - keep synchronized across click_events, lead_events, and sale_events #}
         {% if defined(linkIds) %} AND link_id IN {{ Array(linkIds, 'String') }}

Also applies to: 118-124, 196-202

packages/tinybird/pipes/v3_usage.pipe (1)

1-2: Consider a more specific description.

The current description "Timeseries data" is generic. Consider updating it to reflect the specific purpose of this pipe, such as "Usage metrics for events and links by day" or "Daily usage timeseries for workspace events and link creation."

packages/tinybird/pipes/v3_group_by_link_metadata.pipe (2)

232-233: Add LIMIT for the composite node.

Upstream node limits help, but add an explicit LIMIT for defensive consistency.

-        ORDER BY saleAmount DESC
+        ORDER BY saleAmount DESC
+        LIMIT 5000

242-246: Tiny spacing nit in endpoint.

Add a space before group_by_link_metadata_sales for readability.

-        {% elif eventType == 'sales' %}group_by_link_metadata_sales 
+        {% elif eventType == 'sales' %} group_by_link_metadata_sales 
packages/tinybird/pipes/v3_group_by.pipe (4)

234-235: Standardize time filter types to DateTime64.

If timestamp is DateTime64 in MVs (as used in clicks), align leads.

-        {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
-        {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+        {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+        {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}

356-357: Standardize time filter types to DateTime64 (sales).

-        {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
-        {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+        {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+        {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}

426-427: Add explicit LIMIT to composite.

Keep result size bounded post-join.

-    ORDER BY clicks DESC
+    ORDER BY clicks DESC
+    LIMIT 5000

438-441: Tiny spacing nit in endpoint.

Add spaces for readability around group_by_sales / group_by_composite.

-        {% elif eventType == 'sales' %}group_by_sales 
-        {% elif eventType == 'composite' %}group_by_composite
+        {% elif eventType == 'sales' %} group_by_sales 
+        {% elif eventType == 'composite' %} group_by_composite
packages/tinybird/pipes/v3_count.pipe (4)

141-142: Standardize to DateTime64 in leads.

-        {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
-        {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+        {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+        {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}

222-223: Standardize to DateTime64 in sales.

-                {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
-                {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+                {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+                {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}

230-242: Metadata numeric comparisons currently lexicographic.

For >, <, >=, <= on numeric metadata, cast to numeric to avoid string ordering pitfalls.

Example for one branch:

-                            {% elif operator == 'greaterThan' %}
-                                AND JSONExtractString(metadata, {{ metadataKey }}) > {{ value }}
+                            {% elif operator == 'greaterThan' %}
+                                AND toFloat64OrNull(JSONExtractString(metadata, {{ metadataKey }})) > toFloat64OrNull({{ value }})

Apply similarly to <, <=, >=. Keep equals/notEquals as string.
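The pitfall is easy to reproduce: string comparison is character by character, so '9' sorts after '10'. A standalone illustration:

```sql
SELECT
    '9' > '10' AS lexicographic_result,                             -- 1 (true): '9' vs '1'
    toFloat64OrNull('9') > toFloat64OrNull('10') AS numeric_result  -- 0 (false): 9 vs 10
```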


265-270: Endpoint defaults look good. Minor readability.

Optional: add spaces before count_sales / count_composite for consistency.

-        {% elif eventType == 'sales' %} count_sales
-        {% elif eventType == 'composite' %} count_composite
+        {% elif eventType == 'sales' %} count_sales
+        {% elif eventType == 'composite' %} count_composite
packages/tinybird/datasources/dub_lead_events.datasource (1)

34-38: Optimize new text columns for cardinality.

trigger, and often domain/key/workspace_id, are low‑cardinality. Marking them accordingly reduces storage/memory.

Minimal change:

-    `trigger` String `json:$.trigger`,
+    `trigger` LowCardinality(String) `json:$.trigger`,

Optionally (verify CH version supports LC+Nullable):

-    `workspace_id` Nullable(String) `json:$.workspace_id`,
-    `domain` Nullable(String) `json:$.domain`,
-    `key` Nullable(String) `json:$.key`
+    `workspace_id` LowCardinality(Nullable(String)) `json:$.workspace_id`,
+    `domain` LowCardinality(Nullable(String)) `json:$.domain`,
+    `key` LowCardinality(Nullable(String)) `json:$.key`
packages/tinybird/datasources/dub_click_events.datasource (1)

35-38: Use LowCardinality for small-enum/text columns.

trigger, and commonly workspace_id/domain/key, benefit from LC encoding.

-    `trigger` String `json:$.trigger`,
+    `trigger` LowCardinality(String) `json:$.trigger`,

Optionally (verify CH version):

-    `workspace_id` Nullable(String) `json:$.workspace_id`,
-    `domain` Nullable(String) `json:$.domain`,
-    `key` Nullable(String) `json:$.key`
+    `workspace_id` LowCardinality(Nullable(String)) `json:$.workspace_id`,
+    `domain` LowCardinality(Nullable(String)) `json:$.domain`,
+    `key` LowCardinality(Nullable(String)) `json:$.key`
packages/tinybird/pipes/get_lead_event.pipe (2)

1-2: Description doesn’t match behavior (read vs update).

Rename to reflect read/lookup to avoid confusion in ops/docs.


9-22: Harden endpoint: avoid SELECT * and add caps/filters.

  • Select explicit columns.
  • Add LIMIT (parametrized) and optional time window (start/end) to cap scans.
  • Type parameters to prevent injection/implicit casts.
-    SELECT *
+    SELECT timestamp, event_id, event_name, link_id, customer_id, url, device, browser, os, referer, referer_url
     FROM dub_lead_events_mv
     WHERE
         customer_id
-        = {{
-            String(
-                customerId,
-                "",
-                description="The unique ID for a given customer.",
-                required=True,
-            )
-        }}
-        {% if defined(eventName) %} AND event_name = {{ eventName }} {% end %}
+        = {{ String(customerId, "", description="The unique ID for a given customer.", required=True) }}
+        {% if defined(eventName) %} AND event_name = {{ String(eventName) }} {% end %}
+        {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+        {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}
     ORDER BY timestamp DESC
+    LIMIT {{ Int32(limit, 1000, description="Max rows") }}
packages/tinybird/datasources/dub_links_metadata_latest.datasource (1)

18-21: Consider (re)adding monthly partitioning for scan pruning.

Without ENGINE_PARTITION_KEY on timestamp, large-range scans may degrade. Recommend toYYYYMM(timestamp) if retention/time filters are common.

 ENGINE "ReplacingMergeTree"
 ENGINE_SORTING_KEY "workspace_id, link_id"
 ENGINE_VER "timestamp"
 ENGINE_IS_DELETED "deleted"
+ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"

Please confirm if Tinybird auto-partitions this table elsewhere; if so, ignore.

packages/tinybird/pipes/v2_top_programs.pipe (3)

9-12: FINAL on dub_links_metadata_latest can be expensive.

If possible, avoid FINAL by materializing a deduped lookup (deleted==0, latest) or using argMaxState/argMaxFinal. Keep FINAL only if correctness requires it at read-time.

Can you share row volume and latency targets to decide whether to precompute this JOIN input?
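One way to avoid read-time FINAL is an argMax dedup, sketched here under the assumption that the timestamp and deleted columns match the schema declared in this PR:

```sql
-- latest non-deleted metadata row per link_id, no FINAL required
SELECT
    link_id,
    argMax(workspace_id, timestamp) AS workspace_id,
    argMax(domain, timestamp) AS domain,
    argMax(key, timestamp) AS key
FROM dub_links_metadata_latest
GROUP BY link_id
HAVING argMax(deleted, timestamp) = 0
```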


111-151: Sales query looks good; add optional trigger parity (if needed).

Clicks support trigger filtering; sales/leads don’t. If symmetry is desired, add se.trigger filter.

-                {% if defined(os) %} AND se.os = {{ os }} {% end %}
+                {% if defined(os) %} AND se.os = {{ os }} {% end %}
+                {% if defined(trigger) %} AND se.trigger = {{ String(trigger) }} {% end %}

170-176: Validate eventType and default safely.

Constrain to allowed values and set a default to avoid typos falling into sales branch.

-            {% if eventType == 'clicks' %} top_programs_clicks
+            {% if String(eventType, 'sales') == 'clicks' %} top_programs_clicks
             {% elif eventType == 'leads' %} top_programs_leads
             {% elif eventType == 'composite' %} top_programs_composite
             {% else %} top_programs_sales
             {% end %}

Also confirm composite’s intent: it LEFT JOINs from clicks; programs with sales/leads but zero clicks won’t appear.

packages/tinybird/pipes/dub_sale_events_pipe.pipe (2)

22-24: Make sale_type deterministic; remove extra scan.

Timestamp equality can label multiple rows as "new" on ties; also the first_sales join adds an avoidable full scan. Prefer windowed row_number and drop the join.

-        -- sale_type: "new" if this sale is the earliest one for (customer_id, link_id)
-        if(timestamp = first_sale_ts, 'new', 'recurring') AS sale_type,
+        -- sale_type: deterministic first sale per (customer_id, link_id)
+        if(
+          row_number() OVER (PARTITION BY customer_id, link_id ORDER BY timestamp ASC, event_id ASC) = 1,
+          'new',
+          'recurring'
+        ) AS sale_type,
@@
-    LEFT JOIN
-        (
-            -- find the first sale timestamp for each customer_id:link_id pair
-            SELECT customer_id, link_id, min(timestamp) AS first_sale_ts
-            FROM dub_sale_events
-            GROUP BY customer_id, link_id
-        ) AS first_sales USING (customer_id, link_id)
+    -- first_sale computed via window function; no need for a pre-aggregated join

Also applies to: 57-64


65-70: Avoid IN-subquery on the right side; join directly.

The WHERE link_id IN (SELECT link_id FROM dub_sale_events) likely triggers a full scan and duplicates the left read. Drop it and rely on the LEFT JOIN.

-    LEFT JOIN
-        (
-            SELECT link_id, workspace_id, domain, key
-            FROM dub_links_metadata_latest FINAL
-            WHERE link_id IN (SELECT link_id FROM dub_sale_events)
-        ) AS link_metadata USING (link_id)
+    LEFT JOIN
+        (
+            SELECT link_id, workspace_id, domain, key
+            FROM dub_links_metadata_latest FINAL
+        ) AS link_metadata USING (link_id)

If you expect multiple rows per link_id in dub_links_metadata_latest even after FINAL, consider ANY LEFT JOIN to cap matches at one.

packages/tinybird/datasources/dub_click_events_mv.datasource (1)

20-21: Consider LowCardinality for trigger; revisit latitude/longitude types.

trigger has tiny domain; LowCardinality(String) saves space. If possible, store latitude/longitude as Decimal(9,6) for numeric ops; keep String only if non-numeric values occur.

Validate query patterns; if trigger is frequently grouped/filtered, LC helps.

Also applies to: 33-34
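In datasource schema syntax, the suggested typings could look like this (a sketch; the JSON paths are assumed, and latitude/longitude should only move off String if incoming values are always numeric):

```sql
`trigger` LowCardinality(String) `json:$.trigger`,
`latitude` Nullable(Decimal(9, 6)) `json:$.latitude`,
`longitude` Nullable(Decimal(9, 6)) `json:$.longitude`
```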

packages/tinybird/datasources/dub_click_events_id.datasource (1)

36-39: Validate partitioning for scale.

tuple() disables partitioning; fine for small tables, risky at high volume. If rows are high and retained long, prefer toYYYYMM(timestamp) to ease merges/TTL without hurting click_id lookups.
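In datasource syntax the suggestion is a single setting, assuming the table keeps its current engine and sorting key:

```sql
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
```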

packages/tinybird/pipes/dub_click_events_pipe.pipe (1)

45-51: Drop IN-subquery; consider FINAL on metadata.

Remove WHERE link_id IN (...) to avoid scanning dub_click_events again. If link_metadata may contain multiple versions, add FINAL for determinism with ANY LEFT JOIN.

-    FROM dub_click_events AS click_event ANY
-    LEFT JOIN
-        (
-            SELECT link_id, workspace_id, domain, key
-            FROM dub_links_metadata_latest
-            WHERE link_id IN (SELECT link_id FROM dub_click_events)
-        ) AS link_metadata USING link_id
+    FROM dub_click_events AS click_event ANY
+    LEFT JOIN
+        (
+            SELECT link_id, workspace_id, domain, key
+            FROM dub_links_metadata_latest FINAL
+        ) AS link_metadata USING link_id

Confirm dub_links_metadata_latest is deduplicated with FINAL; otherwise retain ANY.

packages/tinybird/pipes/v3_timeseries.pipe (2)

273-276: Remove placeholder description.

The description "undefined" should be replaced with a meaningful description or removed entirely.

Apply this diff:

 NODE timeseries_sales_data
-DESCRIPTION >
-    undefined
-
 SQL >

342-342: Verify if duplicate saleAmount column is necessary.

Line 342 selects amount, amount as saleAmount, creating two columns with identical values. If this is for backward compatibility or API contract, consider adding a comment explaining why both are needed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78f6f42 and f4658bd.

📒 Files selected for processing (100)
  • packages/tinybird/datasources/dub_audit_logs.datasource (1 hunks)
  • packages/tinybird/datasources/dub_click_events.datasource (2 hunks)
  • packages/tinybird/datasources/dub_click_events_id.datasource (1 hunks)
  • packages/tinybird/datasources/dub_click_events_mv.datasource (1 hunks)
  • packages/tinybird/datasources/dub_conversion_events_log.datasource (1 hunks)
  • packages/tinybird/datasources/dub_import_error_logs.datasource (1 hunks)
  • packages/tinybird/datasources/dub_lead_events.datasource (2 hunks)
  • packages/tinybird/datasources/dub_lead_events_mv.datasource (1 hunks)
  • packages/tinybird/datasources/dub_links_metadata.datasource (1 hunks)
  • packages/tinybird/datasources/dub_links_metadata_latest.datasource (1 hunks)
  • packages/tinybird/datasources/dub_regular_links_metadata_latest.datasource (0 hunks)
  • packages/tinybird/datasources/dub_sale_events.datasource (2 hunks)
  • packages/tinybird/datasources/dub_sale_events_id.datasource (0 hunks)
  • packages/tinybird/datasources/dub_sale_events_mv.datasource (1 hunks)
  • packages/tinybird/datasources/dub_webhook_events.datasource (1 hunks)
  • packages/tinybird/endpoints/all_stats.pipe (0 hunks)
  • packages/tinybird/endpoints/coordinates_all.pipe (0 hunks)
  • packages/tinybird/endpoints/coordinates_sales.pipe (0 hunks)
  • packages/tinybird/endpoints/get_audit_logs.pipe (0 hunks)
  • packages/tinybird/endpoints/get_click_event.pipe (0 hunks)
  • packages/tinybird/endpoints/get_framer_lead_events.pipe (0 hunks)
  • packages/tinybird/endpoints/get_import_error_logs.pipe (0 hunks)
  • packages/tinybird/endpoints/get_lead_event.pipe (0 hunks)
  • packages/tinybird/endpoints/get_lead_event_by_id.pipe (0 hunks)
  • packages/tinybird/endpoints/get_lead_events.pipe (0 hunks)
  • packages/tinybird/endpoints/get_sale_event.pipe (0 hunks)
  • packages/tinybird/endpoints/get_webhook_events.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_browsers.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_cities.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_continents.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_count.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_countries.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_customer_events.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_devices.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_events.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_os.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_referer_urls.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_referers.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_regions.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_timeseries.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_top_links.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_top_partners.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_top_programs.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_top_tags.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_top_urls.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_triggers.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_usage.pipe (0 hunks)
  • packages/tinybird/endpoints/v2_utms.pipe (0 hunks)
  • packages/tinybird/materializations/dub_click_events_id_pipe.pipe (0 hunks)
  • packages/tinybird/materializations/dub_click_events_pipe.pipe (0 hunks)
  • packages/tinybird/materializations/dub_lead_events_pipe.pipe (0 hunks)
  • packages/tinybird/materializations/dub_links_metadata_pipe.pipe (0 hunks)
  • packages/tinybird/materializations/dub_regular_links_metadata_pipe.pipe (0 hunks)
  • packages/tinybird/materializations/dub_sale_events_id_pipe.pipe (0 hunks)
  • packages/tinybird/materializations/dub_sale_events_pipe.pipe (0 hunks)
  • packages/tinybird/pipes/all_stats.pipe (1 hunks)
  • packages/tinybird/pipes/coordinates_all.pipe (1 hunks)
  • packages/tinybird/pipes/coordinates_sales.pipe (1 hunks)
  • packages/tinybird/pipes/dub_click_events_id_pipe.pipe (1 hunks)
  • packages/tinybird/pipes/dub_click_events_pipe.pipe (1 hunks)
  • packages/tinybird/pipes/dub_click_events_pipe_with_domain_key.pipe (0 hunks)
  • packages/tinybird/pipes/dub_lead_events_pipe.pipe (1 hunks)
  • packages/tinybird/pipes/dub_links_metadata_pipe.pipe (1 hunks)
  • packages/tinybird/pipes/dub_regular_links_metadata_pipe.pipe (0 hunks)
  • packages/tinybird/pipes/dub_sale_events_id_pipe.pipe (0 hunks)
  • packages/tinybird/pipes/dub_sale_events_pipe.pipe (1 hunks)
  • packages/tinybird/pipes/get_audit_logs.pipe (1 hunks)
  • packages/tinybird/pipes/get_click_event.pipe (1 hunks)
  • packages/tinybird/pipes/get_framer_lead_events.pipe (1 hunks)
  • packages/tinybird/pipes/get_import_error_logs.pipe (1 hunks)
  • packages/tinybird/pipes/get_lead_event.pipe (1 hunks)
  • packages/tinybird/pipes/get_lead_event_by_id.pipe (0 hunks)
  • packages/tinybird/pipes/get_lead_events.pipe (1 hunks)
  • packages/tinybird/pipes/get_sale_event.pipe (0 hunks)
  • packages/tinybird/pipes/get_webhook_events.pipe (1 hunks)
  • packages/tinybird/pipes/v2_browsers.pipe (0 hunks)
  • packages/tinybird/pipes/v2_cities.pipe (0 hunks)
  • packages/tinybird/pipes/v2_continents.pipe (0 hunks)
  • packages/tinybird/pipes/v2_countries.pipe (0 hunks)
  • packages/tinybird/pipes/v2_customer_events.pipe (1 hunks)
  • packages/tinybird/pipes/v2_devices.pipe (0 hunks)
  • packages/tinybird/pipes/v2_os.pipe (0 hunks)
  • packages/tinybird/pipes/v2_referer_urls.pipe (0 hunks)
  • packages/tinybird/pipes/v2_referers.pipe (0 hunks)
  • packages/tinybird/pipes/v2_regions.pipe (0 hunks)
  • packages/tinybird/pipes/v2_timeseries.pipe (0 hunks)
  • packages/tinybird/pipes/v2_top_links.pipe (0 hunks)
  • packages/tinybird/pipes/v2_top_partners.pipe (0 hunks)
  • packages/tinybird/pipes/v2_top_programs.pipe (2 hunks)
  • packages/tinybird/pipes/v2_top_tags.pipe (0 hunks)
  • packages/tinybird/pipes/v2_top_urls.pipe (0 hunks)
  • packages/tinybird/pipes/v2_triggers.pipe (0 hunks)
  • packages/tinybird/pipes/v2_usage.pipe (0 hunks)
  • packages/tinybird/pipes/v2_utms.pipe (0 hunks)
  • packages/tinybird/pipes/v3_count.pipe (8 hunks)
  • packages/tinybird/pipes/v3_events.pipe (8 hunks)
  • packages/tinybird/pipes/v3_group_by.pipe (1 hunks)
  • packages/tinybird/pipes/v3_group_by_link_metadata.pipe (1 hunks)
  • packages/tinybird/pipes/v3_timeseries.pipe (1 hunks)
  • packages/tinybird/pipes/v3_usage.pipe (1 hunks)
💤 Files with no reviewable changes (64)
  • packages/tinybird/pipes/get_lead_event_by_id.pipe
  • packages/tinybird/pipes/dub_sale_events_id_pipe.pipe
  • packages/tinybird/datasources/dub_regular_links_metadata_latest.datasource
  • packages/tinybird/endpoints/all_stats.pipe
  • packages/tinybird/pipes/v2_browsers.pipe
  • packages/tinybird/endpoints/v2_devices.pipe
  • packages/tinybird/pipes/get_sale_event.pipe
  • packages/tinybird/pipes/v2_top_links.pipe
  • packages/tinybird/endpoints/get_lead_event.pipe
  • packages/tinybird/pipes/v2_cities.pipe
  • packages/tinybird/pipes/v2_top_tags.pipe
  • packages/tinybird/pipes/v2_countries.pipe
  • packages/tinybird/materializations/dub_click_events_pipe.pipe
  • packages/tinybird/materializations/dub_links_metadata_pipe.pipe
  • packages/tinybird/endpoints/coordinates_sales.pipe
  • packages/tinybird/pipes/v2_top_urls.pipe
  • packages/tinybird/endpoints/v2_referers.pipe
  • packages/tinybird/materializations/dub_sale_events_id_pipe.pipe
  • packages/tinybird/endpoints/get_click_event.pipe
  • packages/tinybird/pipes/v2_referer_urls.pipe
  • packages/tinybird/endpoints/v2_top_tags.pipe
  • packages/tinybird/pipes/v2_usage.pipe
  • packages/tinybird/pipes/v2_utms.pipe
  • packages/tinybird/pipes/v2_timeseries.pipe
  • packages/tinybird/materializations/dub_regular_links_metadata_pipe.pipe
  • packages/tinybird/endpoints/v2_continents.pipe
  • packages/tinybird/materializations/dub_lead_events_pipe.pipe
  • packages/tinybird/endpoints/v2_top_links.pipe
  • packages/tinybird/endpoints/get_framer_lead_events.pipe
  • packages/tinybird/datasources/dub_sale_events_id.datasource
  • packages/tinybird/endpoints/v2_customer_events.pipe
  • packages/tinybird/endpoints/get_audit_logs.pipe
  • packages/tinybird/endpoints/get_sale_event.pipe
  • packages/tinybird/endpoints/get_lead_events.pipe
  • packages/tinybird/endpoints/v2_triggers.pipe
  • packages/tinybird/endpoints/v2_events.pipe
  • packages/tinybird/endpoints/coordinates_all.pipe
  • packages/tinybird/endpoints/get_lead_event_by_id.pipe
  • packages/tinybird/endpoints/get_import_error_logs.pipe
  • packages/tinybird/endpoints/v2_referer_urls.pipe
  • packages/tinybird/endpoints/v2_os.pipe
  • packages/tinybird/pipes/v2_regions.pipe
  • packages/tinybird/pipes/dub_regular_links_metadata_pipe.pipe
  • packages/tinybird/pipes/v2_devices.pipe
  • packages/tinybird/endpoints/v2_cities.pipe
  • packages/tinybird/endpoints/v2_countries.pipe
  • packages/tinybird/pipes/dub_click_events_pipe_with_domain_key.pipe
  • packages/tinybird/endpoints/v2_top_urls.pipe
  • packages/tinybird/pipes/v2_referers.pipe
  • packages/tinybird/endpoints/v2_count.pipe
  • packages/tinybird/materializations/dub_sale_events_pipe.pipe
  • packages/tinybird/endpoints/v2_regions.pipe
  • packages/tinybird/endpoints/v2_top_programs.pipe
  • packages/tinybird/endpoints/get_webhook_events.pipe
  • packages/tinybird/pipes/v2_triggers.pipe
  • packages/tinybird/endpoints/v2_browsers.pipe
  • packages/tinybird/materializations/dub_click_events_id_pipe.pipe
  • packages/tinybird/endpoints/v2_usage.pipe
  • packages/tinybird/pipes/v2_continents.pipe
  • packages/tinybird/pipes/v2_os.pipe
  • packages/tinybird/endpoints/v2_timeseries.pipe
  • packages/tinybird/endpoints/v2_top_partners.pipe
  • packages/tinybird/pipes/v2_top_partners.pipe
  • packages/tinybird/endpoints/v2_utms.pipe
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (25)
packages/tinybird/pipes/get_click_event.pipe (1)

5-18: Good formatting improvement — SQL query structure is clean and readable.

The indentation enhancement makes the endpoint query more maintainable while preserving its logic: it correctly filters click events by ID, orders by timestamp descending, and limits to the most recent record. Parameter handling is well-defined with proper type, default value, and required flag.
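As a sketch, the node's retrieval pattern, filter by ID, order by timestamp descending, limit 1, amounts to a max-by-timestamp lookup (hypothetical data; field names mirror the pipe's parameters):

```python
from datetime import datetime

def get_click_event(events, click_id):
    """Return the most recent event matching click_id, or None.

    Mirrors: WHERE click_id = {{ clickId }} ORDER BY timestamp DESC LIMIT 1.
    """
    matching = [e for e in events if e["click_id"] == click_id]
    if not matching:
        return None
    return max(matching, key=lambda e: e["timestamp"])

# Hypothetical events; only the fields needed for the lookup are modeled.
events = [
    {"click_id": "a", "timestamp": datetime(2025, 9, 1)},
    {"click_id": "a", "timestamp": datetime(2025, 9, 3)},
    {"click_id": "b", "timestamp": datetime(2025, 9, 2)},
]
```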

packages/tinybird/pipes/v2_customer_events.pipe (1)

1-163: LGTM! Formatting improvements maintain query semantics.

This is a clean, non-functional refactor of the pipe's SQL formatting—indentation and whitespace are improved for readability across all four NODE blocks (lead_events, click_events, sale_events, endpoint). The underlying query logic, column selection, WHERE clauses, and UNION logic are unchanged. The reformatted code is consistent and well-aligned.

packages/tinybird/pipes/get_framer_lead_events.pipe (1)

5-9: Pipe is actively referenced; formatting-only changes.

The pipe get_framer_lead_events is actively referenced in the codebase across multiple systems:

  • apps/web/app/(ee)/api/cron/framer/backfill-leads-batch/route.ts (cron job)
  • apps/web/scripts/framer/process-lead-events.ts (processing script)
  • apps/web/scripts/framer/1-process-framer-combined.ts (another script)

The changes are formatting/indentation adjustments only, and the pipe is not a dangling reference. No action required.

packages/tinybird/pipes/coordinates_all.pipe (1)

7-94: Formatting-only changes - no semantic impact.

These indentation adjustments improve consistency across the SQL nodes without altering behavior.

packages/tinybird/pipes/v3_events.pipe (2)

49-69: Good use of PREWHERE optimization.

The conditional FROM clause and PREWHERE optimization for customerId filtering improves query performance by reducing data processed. The expanded link filtering logic provides good flexibility for workspace-scoped queries.


265-271: Routing logic looks good.

The endpoint routing correctly branches to the appropriate event node based on eventType, with proper handling of the saleType parameter for sales events.

packages/tinybird/pipes/v3_group_by_link_metadata.pipe (2)

196-197: Potential schema mismatch: trigger filter on sales.

Same concern as leads; verify trigger exists on dub_sale_events_mv before filtering.


139-141: The review comment is incorrect; trigger column exists in the schema.

The verification confirms that trigger is a String column in both dub_lead_events_mv and dub_sale_events_mv schema definitions (appearing at lines 24 and 28 respectively). The filter on trigger at line 139 will not fail due to a missing column.

Likely an incorrect or invalid review comment.

packages/tinybird/pipes/coordinates_sales.pipe (1)

7-24: LGTM! Formatting improvement.

The SQL query reformatting improves readability without any functional changes.

packages/tinybird/pipes/get_webhook_events.pipe (1)

9-22: LGTM! Formatting improvement.

The SQL query reformatting improves readability without any functional changes.

packages/tinybird/datasources/dub_import_error_logs.datasource (1)

1-2: LGTM! Consistency improvement.

Quoting the TOKEN value aligns with the broader PR effort to standardize token declarations across Tinybird datasources.

packages/tinybird/datasources/dub_audit_logs.datasource (1)

1-2: LGTM! Consistency improvement.

Quoting the TOKEN value aligns with the broader PR effort to standardize token declarations across Tinybird datasources.

packages/tinybird/pipes/get_import_error_logs.pipe (1)

9-29: LGTM! Formatting improvement.

The SQL query reformatting improves readability without any functional changes.

packages/tinybird/datasources/dub_conversion_events_log.datasource (1)

1-2: LGTM! Consistency improvement.

Quoting the TOKEN value aligns with the broader PR effort to standardize token declarations across Tinybird datasources.

packages/tinybird/datasources/dub_links_metadata.datasource (1)

1-2: Token standardization confirmed, but permissions verification needed.

The token name change from dub_links_metadata to dub_tinybird_token is intentional—all 8 datasources now uniformly use dub_tinybird_token as part of coordinated Tinybird infrastructure finalization. No references to dub_links_metadata as a token remain in the codebase.

However, verify manually that dub_tinybird_token has the necessary ingestion permissions configured in your Tinybird platform settings, as token capability scoping is managed outside the codebase.

packages/tinybird/datasources/dub_webhook_events.datasource (1)

1-1: TOKEN quoting LGTM.

Consistent with other DS files. Ensure "dub_tinybird_token" is defined in Tinybird deploy envs for all workspaces.

packages/tinybird/pipes/dub_links_metadata_pipe.pipe (1)

8-15: The review comment is incorrect. The startsWith('ws_c') logic is intentional.

Analysis reveals this is legacy workspace ID normalization, not a bug:

  1. The TypeScript code contains an identical normalizeWorkspaceId function with the same logic: startsWith("ws_c") then replace("ws_", "")

  2. In create-workspace-id.ts, the system explicitly guards against generating IDs that start with "ws_c" (rejecting them as collisions with the old workspace ID format)

  3. The pipe's logic handles legacy workspace IDs from an older format for backward compatibility. Applying the suggested diff would break this legacy data handling.

The code is working as designed—the 'ws_c' prefix specifically targets old IDs requiring normalization.

Likely an incorrect or invalid review comment.
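A Python stand-in for the legacy normalization described above (behavior inferred from the startsWith('ws_c') / strip 'ws_' description; names are illustrative):

```python
def normalize_workspace_id(workspace_id):
    # Legacy IDs begin with "ws_c" (the old cuid-style format); those get the
    # "ws_" prefix stripped. Newer IDs are guarded at creation time against
    # starting with "ws_c" and pass through unchanged.
    if workspace_id.startswith("ws_c"):
        return workspace_id[len("ws_"):]
    return workspace_id
```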

packages/tinybird/datasources/dub_sale_events.datasource (2)

1-1: Token quoting LGTM.

Consistent with other datasources.


38-43: Good enrichment; ensure downstream MV handles NULLs.

You’ve added Nullable(domain/key/workspace_id). Verify the MV schema/pipe coalesces or matches nullability (see related MV comment).

packages/tinybird/datasources/dub_click_events_mv.datasource (1)

38-38: Sort key choice LGTM.

workspace_id, link_id, timestamp aligns with common filters and timeseries scans.

packages/tinybird/pipes/dub_click_events_pipe.pipe (1)

27-30: Trigger normalization looks good.

An empty trigger defaults to 'qr' when qr = 1 and to 'link' otherwise; non-empty values are kept as-is. This matches the intended semantics.

Ensure qr is UInt8 (0/1); if Nullable, wrap with coalesce(qr, 0).
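A Python stand-in for the normalization plus the suggested coalesce(qr, 0) guard (qr modeled as an optional 0/1 flag):

```python
def normalize_trigger(trigger, qr):
    # coalesce(qr, 0): treat a NULL qr flag as 0 before branching.
    qr = qr if qr is not None else 0
    if trigger == "":
        return "qr" if qr == 1 else "link"
    return trigger  # non-empty triggers pass through unchanged
```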

packages/tinybird/pipes/v3_timeseries.pipe (4)

7-77: LGTM!

The interval generation logic for month, day, and hour granularities is correct. The use of toStartOf* functions with timezone handling and arrayJoin for generating sequences is appropriate for this use case.


80-110: LGTM!

The workspace_links node correctly uses the FINAL modifier to get the latest state and properly validates the required workspaceId. The conditional filter logic for optional parameters is well-structured.


356-363: LGTM!

The composite timeseries correctly uses left joins to combine all event types, starting from clicks as the base. The join logic on groupByField is consistent across all timeseries nodes.


366-378: LGTM!

The endpoint routing logic is clear and properly handles all event types with a sensible default fallback to clicks.

Comment on lines +6 to 10
`workspace_id` LowCardinality(String),
`link_id` String,
`customer_id` String,
`domain` String,
`key` String,
`url` String,

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Non‑nullable fields may reject NULLs from the pipe.

workspace_id, domain, key are non‑nullable. If upstream join/coalesce yields NULL, inserts will fail. Either (a) make these columns Nullable, or (b) guarantee non‑NULLs in the pipe (see suggested fallbacks in dub_lead_events_pipe.pipe).

Option A (make columns nullable):

-    `workspace_id` LowCardinality(String),
+    `workspace_id` LowCardinality(Nullable(String)),
-    `domain` String,
-    `key` String,
+    `domain` Nullable(String),
+    `key` Nullable(String),

Option B (keep non‑nullable): keep schema as‑is and adopt the pipe changes proposed earlier.

Also applies to: 11-15, 41-41


🏁 Script executed:

#!/bin/bash
# Locate and examine the datasource file
find . -name "dub_lead_events_mv.datasource" -type f 2>/dev/null | head -5

Length of output: 61


🏁 Script executed:

#!/bin/bash
# Locate and examine the pipe file
find . -name "dub_lead_events_pipe.pipe" -type f 2>/dev/null | head -5

Length of output: 51


🏁 Script executed:

#!/bin/bash
# Search for related files in tinybird datasources directory
fd -e datasource -e pipe . packages/tinybird/ 2>/dev/null | grep -E "(dub_lead|event)" | head -20

Length of output: 1066


🏁 Script executed:

#!/bin/bash
# Examine the datasource schema file
cat packages/tinybird/datasources/dub_lead_events_mv.datasource

Length of output: 1150


🏁 Script executed:

#!/bin/bash
# Examine the pipe file
cat packages/tinybird/pipes/dub_lead_events_pipe.pipe

Length of output: 1493


Address NULL handling for workspace_id, domain, key fields.

The datasource declares workspace_id, domain, and key as non-nullable, but the upstream pipe uses coalesce() without a final fallback. If both lead_event and link_metadata sources are NULL, coalesce() returns NULL, causing insert failures.

Choose one approach:

Option A (make columns nullable):

-    `workspace_id` LowCardinality(String),
+    `workspace_id` LowCardinality(Nullable(String)),
-    `domain` String,
-    `key` String,
+    `domain` Nullable(String),
+    `key` Nullable(String),

Option B (ensure non-NULL in pipe): Add fallback values (e.g., coalesce(..., '')) to guarantee non-NULL results before insert.
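The underlying coalesce semantics, first non-NULL argument or NULL if all are, explain both the failure and the fix; a minimal Python model with None standing in for SQL NULL:

```python
def coalesce(*args):
    # Return the first non-None argument, else None — matches SQL COALESCE.
    # With only nullable inputs the result can still be None; appending a
    # final '' fallback makes the result total (never None).
    for a in args:
        if a is not None:
            return a
    return None
```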

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
`workspace_id` LowCardinality(String),
`link_id` String,
`customer_id` String,
`domain` String,
`key` String,
`url` String,
`workspace_id` LowCardinality(Nullable(String)),
`link_id` String,
`customer_id` String,
`domain` Nullable(String),
`key` Nullable(String),
`url` String,
🤖 Prompt for AI Agents
In packages/tinybird/datasources/dub_lead_events_mv.datasource around lines 6 to
10, the fields `workspace_id`, `domain`, and `key` are declared as non-nullable
but upstream uses coalesce() without a final fallback so NULLs can be inserted;
either make those columns nullable (change types to Nullable(String) or
Nullable(LowCardinality(String)) as appropriate) or update the upstream pipe to
provide a non-NULL default (wrap the existing coalesce calls with a final
fallback like '', e.g., coalesce(..., '') ) so inserts never receive NULL for
these columns — pick one approach and apply consistently for all three fields.

Comment on lines +6 to +11
`workspace_id` LowCardinality(String),
`link_id` String,
`domain` String,
`key` String,
`url` String,
`event_id` String,

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Nullability mismatch between raw and MV can break ingestion.

Raw source defines workspace_id/domain/key as Nullable(String); MV has non-nullable String/LowCardinality(String). Inserts with NULLs will fail unless the pipe coerces to ''. Align schemas or coalesce in the pipe.

Option A — align MV schema:

-    `workspace_id` LowCardinality(String),
+    `workspace_id` LowCardinality(Nullable(String)),
-    `domain` String,
+    `domain` Nullable(String),
-    `key` String,
+    `key` Nullable(String),

Option B — keep MV non-nullable and coalesce upstream:

-- In the MV-building pipe
COALESCE(workspace_id, '') AS workspace_id,
COALESCE(domain, '') AS domain,
COALESCE(`key`, '') AS `key`,

Please confirm which approach you prefer; I can propose a full diff once chosen.

Also applies to: 17-19, 28-41


I'll help you verify this review comment by examining the datasource files and checking for the nullability mismatch. Let me start by exploring the codebase.


---

🏁 Script executed:

```shell
#!/bin/bash
# Find datasource files related to dub_sale_events
fd -e datasource | grep -i "dub_sale_events"
```

Length of output: 116


🏁 Script executed:

#!/bin/bash
# List all files in the tinybird datasources directory
fd . packages/tinybird/datasources --type f | head -20

Length of output: 786


🏁 Script executed:

#!/bin/bash
# Show the MV datasource file content
cat packages/tinybird/datasources/dub_sale_events_mv.datasource

Length of output: 1268


🏁 Script executed:

#!/bin/bash
# Look for the raw datasource file
fd "dub_sale_events" packages/tinybird/datasources --type f

Length of output: 116


🏁 Script executed:

#!/bin/bash
# Search for pipe files that might reference these datasources
fd -e pipe packages/tinybird/ | xargs grep -l "dub_sale_events" 2>/dev/null | head -5

Length of output: 377



---

🏁 Script executed:

```shell
#!/bin/bash
# Show the raw datasource file content
cat packages/tinybird/datasources/dub_sale_events.datasource
```

Length of output: 1929


🏁 Script executed:

#!/bin/bash
# Search for pipe files related to dub_sale_events with corrected syntax
fd . packages/tinybird --full-path '.*dub_sale_events.*\.pipe'

Length of output: 2007


🏁 Script executed:

#!/bin/bash
# List all pipe files in tinybird
fd -e pipe packages/tinybird/

Length of output: 377


🏁 Script executed:

#!/bin/bash
# Check the MV pipe definition
fd "dub_sale_events_pipe" packages/tinybird/ -e pipe

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Show the pipe file content to see NULL handling
cat packages/tinybird/pipes/dub_sale_events_pipe.pipe

Length of output: 2043


Nullability mismatch confirmed—pipe's COALESCE insufficient without default fallback.

The raw source defines workspace_id, domain, and key as Nullable(String), while the MV expects non-nullable variants. The pipe uses coalesce(sale_event.workspace_id, link_metadata.workspace_id), but this still returns NULL when both inputs are NULL, which will fail insertion into the non-nullable MV columns.

Choose and implement one approach:

  • Option A — update MV schema to accept NULLs (lines 7, 9, 10 in dub_sale_events_mv.datasource)
  • Option B — update pipe to coalesce with empty string default (lines 7–9 in dub_sale_events_pipe.pipe)

Apply the same fix to the other affected fields at lines 17–19 and 28–41 of the MV datasource if they have the same issue.

🤖 Prompt for AI Agents
In packages/tinybird/datasources/dub_sale_events_mv.datasource around lines 6–11
(and similarly check lines 17–19 and 28–41), the MV defines workspace_id,
domain, and key as non-nullable but the pipe's coalesce can still produce NULL;
fix this by updating the pipe
(packages/tinybird/pipes/dub_sale_events_pipe.pipe, around the referenced
pipe lines 7–9 and the other affected expressions) to use a safe default in
coalesce (e.g., coalesce(sale_event.workspace_id, link_metadata.workspace_id,
'') and coalesce(...) with '' for domain and key and any other affected fields)
so the MV inserts never receive NULL, or alternatively update the MV schema to
Nullable for the listed columns if you prefer Option A (apply the same chosen
approach to the other affected fields at MV lines 17–19 and 28–41).

Comment on lines 6 to 9
SELECT
(SELECT COUNT(timestamp) FROM dub_click_events_mv) AS clicks,
(SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata_latest FINAL) AS links,
(SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
(SELECT SUM(amount) FROM dub_sale_events_mv) AS sales

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Links count is incorrect source and uses a brittle magic number.

Counting rows on dub_links_metadata will include historical versions/soft-deletes; the + 42036155 offset is opaque and risks drift. Use the deduped latest MV and exclude deleted rows.
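A sketch of what the suggested countIf(deleted = 0) over the deduplicated latest view computes, versus a raw row count (hypothetical rows; FINAL is approximated by keeping the last version per link_id):

```python
def count_live_links(rows):
    # Approximate ReplacingMergeTree FINAL: keep the last-inserted version
    # per link_id, then count rows that are not soft-deleted.
    latest = {}
    for row in rows:  # rows assumed to be in insertion order
        latest[row["link_id"]] = row
    return sum(1 for row in latest.values() if row["deleted"] == 0)

# Hypothetical history: l1 updated once, l2 later soft-deleted, l3 live.
rows = [
    {"link_id": "l1", "deleted": 0},
    {"link_id": "l1", "deleted": 0},  # newer version of l1
    {"link_id": "l2", "deleted": 0},
    {"link_id": "l2", "deleted": 1},  # l2 soft-deleted
    {"link_id": "l3", "deleted": 0},
]
```

A raw count over all five historical rows would overstate the live total, which is why the magic-number offset drifts.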

Apply this diff:

-        (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
+        (SELECT countIf(deleted = 0) FROM dub_links_metadata_latest FINAL) AS links,

I'll verify the review comment by exploring the codebase structure and validating the concerns raised.


---

🏁 Script executed:

```shell
#!/bin/bash
# Check if the file exists and view its content
cat packages/tinybird/pipes/all_stats.pipe 2>/dev/null || echo "File not found at specified path"
```

Length of output: 272


🏁 Script executed:

#!/bin/bash
# Search for dub_links_metadata and related table definitions
rg -n "dub_links_metadata" --type=pipe --type=sql -C 3

Length of output: 28


🏁 Script executed:

#!/bin/bash
# Look for materialized view naming patterns and _latest variants
rg -n "_latest" --type=pipe --type=sql | head -20

Length of output: 28


🏁 Script executed:

#!/bin/bash
# Search for the magic number 42036155 in the codebase
rg -n "42036155" -C 2

Length of output: 429


🏁 Script executed:

#!/bin/bash
# Check for soft-delete or deleted field patterns
rg -n "deleted\s*=" --type=pipe --type=sql -C 2 | head -30

Length of output: 28


🏁 Script executed:

#!/bin/bash
# Look for other COUNT queries and FINAL patterns to understand conventions
rg -n "FINAL|countIf" --type=pipe --type=sql | head -20

Length of output: 28


I need to adjust my search strategy since .pipe is not a recognized file type. Let me explore the codebase structure more broadly.


---

🏁 Script executed:

```shell
#!/bin/bash
# Find all pipe files and schema/definition files
fd -e pipe -e sql -e md | grep -E "(schema|definition|dub_links|dub_click|dub_sale)" | head -20
```

Length of output: 207


🏁 Script executed:

#!/bin/bash
# Search for dub_links_metadata definition or documentation
rg -n "dub_links_metadata" -C 3

Length of output: 9716


🏁 Script executed:

#!/bin/bash
# Search for _latest variants
rg -n "_latest" -C 2 | head -30

Length of output: 2069


🏁 Script executed:

#!/bin/bash
# Look for deleted or soft-delete field usage in any file
rg -n "deleted\s*=|WHERE.*deleted" -C 2 | head -30

Length of output: 2065


🏁 Script executed:

#!/bin/bash
# Find other pipe files to understand query patterns and conventions
fd -e pipe | head -20

Length of output: 928


🏁 Script executed:

#!/bin/bash
# Check git history for the magic number (when/why it was added)
git log -p --all -S "42036155" -- packages/tinybird/pipes/all_stats.pipe 2>/dev/null | head -50

Length of output: 59


Let me check the datasource definition for dub_links_metadata_latest to confirm the structure:


---

🏁 Script executed:

```shell
#!/bin/bash
# View the datasource definition
cat packages/tinybird/datasources/dub_links_metadata_latest.datasource
```

Length of output: 533


🏁 Script executed:

#!/bin/bash
# View the pipe that generates the materialized view
cat packages/tinybird/pipes/dub_links_metadata_pipe.pipe

Length of output: 587


Use the materialized view and exclude soft-deleted rows to match codebase patterns.

The issue is confirmed. All related queries (v3_count, v3_timeseries, v3_events, v3_group_by, v2_top_programs, and event pipes) consistently use dub_links_metadata_latest FINAL with WHERE deleted == 0, but this query incorrectly references the raw dub_links_metadata table with an unexplained magic number offset. The datasource is properly configured as a ReplacingMergeTree with soft-delete support, making the raw table access both dangerous and inconsistent.

Adjust the suggested diff for ClickHouse syntax consistency:

-        (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
+        (SELECT countIf(deleted == 0) FROM dub_links_metadata_latest FINAL) AS links,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
(SELECT COUNT(timestamp) FROM dub_click_events_mv) AS clicks,
(SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata_latest FINAL) AS links,
(SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
(SELECT SUM(amount) FROM dub_sale_events_mv) AS sales
SELECT
(SELECT COUNT(timestamp) FROM dub_click_events_mv) AS clicks,
(SELECT countIf(deleted == 0) FROM dub_links_metadata_latest FINAL) AS links,
(SELECT SUM(amount) FROM dub_sale_events_mv) AS sales
🤖 Prompt for AI Agents
In packages/tinybird/pipes/all_stats.pipe around lines 6 to 9, the query
incorrectly reads from the raw dub_links_metadata table and adds a hard-coded
offset; replace that reference with the materialized ReplacingMergeTree view
dub_links_metadata_latest FINAL and exclude soft-deleted rows by adding WHERE
deleted == 0, and remove the unexplained magic number so the links count is
simply the COUNT(timestamp) from the view with the deleted filter.

WHEN trigger = '' THEN CASE WHEN qr = true THEN 'qr' ELSE 'link' END ELSE trigger
END as trigger
FROM dub_click_events
SELECT * FROM dub_click_events_mv

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Avoid SELECT * in pipes to a typed datasource.

This is brittle to upstream schema drift and column order. Explicitly select columns in the ID DS order.

-    SELECT * FROM dub_click_events_mv
+    SELECT
+      timestamp,
+      click_id,
+      workspace_id,
+      link_id,
+      domain,
+      key,
+      url,
+      continent,
+      country,
+      city,
+      region,
+      latitude,
+      longitude,
+      device,
+      browser,
+      os,
+      trigger,
+      ua,
+      referer,
+      referer_url,
+      ip,
+      identity_hash,
+      device_model,
+      device_vendor,
+      browser_version,
+      os_version,
+      engine,
+      engine_version,
+      cpu_architecture,
+      qr,
+      bot
+    FROM dub_click_events_mv

Confirm the exact column order in packages/tinybird/datasources/dub_click_events_id.datasource and mirror it here.


I'll verify the review comment by examining the datasource file to confirm the exact column order.


---

🏁 Script executed:

```shell
#!/bin/bash
# Find and read the datasource file to verify column order
find . -name "dub_click_events_id.datasource" -type f
```

Length of output: 62


🏁 Script executed:

#!/bin/bash
# Read the datasource file to extract column definitions
cat packages/tinybird/datasources/dub_click_events_id.datasource

Length of output: 1091


🏁 Script executed:

#!/bin/bash
# Read the pipe file to see current state
cat packages/tinybird/pipes/dub_click_events_id_pipe.pipe

Length of output: 125


🏁 Script executed:

#!/bin/bash
# Also search for the materialized view definition
find . -name "dub_click_events_mv*" -type f

Length of output: 62


🏁 Script executed:

#!/bin/bash
# Search for any file containing dub_click_events_mv definition
rg -l "dub_click_events_mv" --type-list | head -20
rg "CREATE MATERIALIZED VIEW dub_click_events_mv" -A 50

Length of output: 389


Refactor SELECT * to explicit columns matching datasource schema.

The review comment is correct and verified. The suggested column list in the diff exactly matches the column order from packages/tinybird/datasources/dub_click_events_id.datasource. Applying this change eliminates brittleness from upstream schema drift and ensures the pipe explicitly declares which columns it depends on.

🤖 Prompt for AI Agents
In packages/tinybird/pipes/dub_click_events_id_pipe.pipe around line 6, replace
the brittle "SELECT * FROM dub_click_events_mv" with an explicit SELECT listing
the columns in the exact order defined in
packages/tinybird/datasources/dub_click_events_id.datasource; update the query
to enumerate each column name from that datasource (preserving the same order
and names) instead of using *, so the pipe depends only on the declared schema
and is resilient to upstream schema drift.

Comment on lines +9 to 15
toLowCardinality(
coalesce(lead_event.workspace_id, link_metadata.workspace_id)
) AS workspace_id,
link_id,
customer_id,
coalesce(lead_event.domain, link_metadata.domain) AS domain,
coalesce(lead_event.key, link_metadata.key) AS key,
url,

⚠️ Potential issue | 🔴 Critical

Prevent NULL→non‑nullable inserts; drop redundant subquery; align cardinality.

  • Coalescing without a default can still return NULL; dub_lead_events_mv.datasource defines workspace_id/domain/key as non‑nullable. Add '' fallback.
  • The WHERE link_id IN (SELECT link_id FROM dub_lead_events) forces a full scan and is unnecessary for a LEFT JOIN on link_id.
  • Wrap device/browser/os with LC to match DS and reduce footprint.

Apply this diff:

-        toLowCardinality(
-            coalesce(lead_event.workspace_id, link_metadata.workspace_id)
-        ) AS workspace_id,
+        toLowCardinality(
+            coalesce(lead_event.workspace_id, link_metadata.workspace_id, '')
+        ) AS workspace_id,

-        coalesce(lead_event.domain, link_metadata.domain) AS domain,
-        coalesce(lead_event.key, link_metadata.key) AS key,
+        coalesce(lead_event.domain, link_metadata.domain, '') AS domain,
+        coalesce(lead_event.key, link_metadata.key, '') AS key,

-        device,
-        browser,
-        os,
+        toLowCardinality(device) device,
+        toLowCardinality(browser) browser,
+        toLowCardinality(os) os,
@@
-            SELECT link_id, workspace_id, domain, key
-            FROM dub_links_metadata_latest FINAL
-            WHERE link_id IN (SELECT link_id FROM dub_lead_events)
+            SELECT link_id, workspace_id, domain, key
+            FROM dub_links_metadata_latest FINAL

Also applies to: 13-14, 27-31, 39-47, 49-54

🤖 Prompt for AI Agents
In packages/tinybird/pipes/dub_lead_events_pipe.pipe around lines 9-15 (and
similarly fix ranges 13-14, 27-31, 39-47, 49-54): prevent NULL→non‑nullable
inserts by changing coalesce(...) to coalesce(..., '') for workspace_id, domain
and key; remove the redundant WHERE link_id IN (SELECT link_id FROM
dub_lead_events) subquery since the LEFT JOIN on link_id already covers matching
and the subquery forces a full scan; and wrap device, browser and os with
toLowCardinality(...) to align cardinality with the datasource and reduce
footprint. Ensure all adjusted coalesce calls include the empty-string fallback
and replace field references accordingly without adding extra filters.

Comment on lines +23 to +24
range(toUInt32(start + 86400), toUInt32(end + 86400),
86400
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Off-by-one error: first day is excluded from the interval range.

Adding 86400 seconds to the start parameter causes the range() function to begin generating intervals from the day after the requested start date. For example, if start is "2025-09-03 00:00:00" and end is "2025-09-05 00:00:00", the current implementation generates intervals for Sep 4 and Sep 5, but omits Sep 3.

Apply this diff to fix the off-by-one error:

-                    range(toUInt32(start + 86400), toUInt32(end + 86400),
+                    range(toUInt32(start), toUInt32(end + 86400),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-                    range(toUInt32(start + 86400), toUInt32(end + 86400),
+                    range(toUInt32(start), toUInt32(end + 86400),
                     86400
🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around lines 23-24, the range() call
adds 86400 to the start value which causes the first day to be omitted; change
the start argument so the range begins at the provided start (remove the +86400
on start) and keep the end adjusted as needed (e.g., end + 86400) so the full
inclusive interval from start through end is generated.
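The off-by-one is easy to reproduce with plain epoch arithmetic: ClickHouse's `range(start, end, step)` is half-open like Python's `range`, so shifting the start bound drops the first day while the `end + 86400` adjustment alone already makes the end date inclusive. A sketch using the review's example dates:

```python
from datetime import datetime, timezone

DAY = 86400

def day_intervals(start: datetime, end: datetime, shift_start: bool) -> list[str]:
    """List daily interval starts the way range(s, end + DAY, DAY) would."""
    s = int(start.timestamp()) + (DAY if shift_start else 0)
    e = int(end.timestamp()) + DAY
    return [
        datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m-%d")
        for t in range(s, e, DAY)
    ]

start = datetime(2025, 9, 3, tzinfo=timezone.utc)
end = datetime(2025, 9, 5, tzinfo=timezone.utc)

# Buggy version (start + 86400): Sep 3 is missing.
assert day_intervals(start, end, shift_start=True) == ["2025-09-04", "2025-09-05"]
# Fixed version: the full inclusive range is produced.
assert day_intervals(start, end, shift_start=False) == [
    "2025-09-03", "2025-09-04", "2025-09-05",
]
```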

%
SELECT
toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,
uniq(*) AS events

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify the aggregation logic for event counting.

Using uniq(*) counts unique timestamp values across all three event types, not the total number of events. If the intent is to count all events (including duplicate timestamps), use count() instead. If the intent is indeed to count unique timestamp values, consider renaming the column to reflect this (e.g., unique_timestamps).

Run the following script to check if there are examples of similar aggregation patterns elsewhere in the codebase:


🏁 Script executed:

```shell
#!/bin/bash
# Search for usage patterns of uniq(*) vs count() in other Tinybird pipes
rg -n -C2 'uniq\(\*\)' --type=sql -g '*.pipe'
rg -n -C2 'count\(\)' --type=sql -g '*.pipe'
```

Length of output: 1611



---

🏁 Script executed:

```shell
#!/bin/bash
# Get full context of the v3_usage.pipe file around line 37, specifically the complete query block
cat packages/tinybird/pipes/v3_usage.pipe | sed -n '31,77p'
```

Length of output: 1568



---

🌐 Web query:

```web
ClickHouse uniq(*) function syntax semantics
```

💡 Result:

Short answer

  • ClickHouse does not have a special uniq(*) syntax. The uniq aggregate is called as uniq(expr[, ...]) — i.e. pass the column(s) or tuple(s) you want to count distinct values for. [1]

What uniq does (semantics)

  • uniq(x[, ...]) returns an approximate number of distinct values (UInt64). It hashes the arguments and uses an adaptive sampling algorithm (keeps up to 65,536 hashes) to produce a deterministic approximate cardinality. Use it in most cases for memory/CPU efficiency. [1]

When to use exact / other variants

  • uniqExact(x[, ...]) — exact distinct count (unbounded state / more memory). [2]
  • uniqCombined / uniqCombined64 / uniqHLL12 / uniqTheta — alternative approximate implementations with different memory/accuracy trade‑offs (uniqCombined is a recommended lower-memory, higher-accuracy alternative). [3][4]

Notes on syntax and combinators

  • uniq accepts multiple arguments (they are combined into the hashed key) and supports aggregate combinators/suffixes like -If and -Array (e.g., uniqIf(...), uniqArray(...)). You can also store intermediate states via AggregateFunction(uniq, ...) and use -State/-Merge forms. [5][6]

References

  1. ClickHouse docs — uniq. [1]
  2. ClickHouse docs — uniqExact. [2]
  3. ClickHouse docs — uniqCombined. [3]
  4. ClickHouse docs — other uniq family / HLL notes. [11]
  5. ClickHouse docs — aggregate function combinators (-If, -Array). [8]
  6. ClickHouse docs — AggregateFunction type and -State/-Merge usage. [7]

If you want, tell me the exact expression you tried (e.g. uniq(*), uniq((col1,col2)), COUNT(DISTINCT col)) and I’ll show the correct query and explain differences.



---

🌐 Web query:

```web
Tinybird uniq(*) aggregate function syntax
```

💡 Result:

Use uniq(column) — i.e. uniq(x[, ...]). Tinybird follows ClickHouse-style syntax: pass one or more columns/expressions (or a Tuple/Array) to uniq; there is no special uniq(*) form. For aggregate-state storage use uniqState(...) and uniqMerge(...) / AggregateFunction(uniq, ...). [1][2]

Sources:
[1] Tinybird Aggregate functions (uniq, uniqState/uniqMerge, combinators).
[2] ClickHouse uniq reference (uniq(x[, ...]) and AggregateFunction usage).


Fix invalid uniq(*) syntax—must specify column name.

uniq(*) is not valid Tinybird/ClickHouse syntax. Use count() to count total events or uniq(timestamp) for distinct timestamps. This query also has the same issue in:

  • v3_usage.pipe line 100: uniq(*) as links
  • v3_timeseries.pipe line 212: uniq(*) as leads
  • v3_timeseries.pipe line 290: uniq(*) as sales

All must be corrected to either count() (for total count) or uniq(column_name) (for distinct values).

🤖 Prompt for AI Agents
packages/tinybird/pipes/v3_usage.pipe lines 37 and 100, and
packages/tinybird/pipes/v3_timeseries.pipe lines 212 and 290: the query uses
invalid Tinybird/ClickHouse syntax `uniq(*)`; replace each occurrence with the
appropriate aggregation—use `count()` if you intend a total event/row count, or
`uniq(<column_name>)` if you intend to count distinct values (e.g.,
`uniq(timestamp)` or `uniq(user_id)`); update the specific column name based on
the metric semantics for events, links, leads, and sales.
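The distinction the prompt draws, total rows versus distinct values, can be sketched with plain Python over a made-up sample of event timestamps (note that ClickHouse's `uniq()` is an approximate distinct count, so on large data it can also diverge slightly from `uniqExact`):

```python
# Hypothetical event timestamps; two events share the same second.
events = ["2025-09-03 10:00:00", "2025-09-03 10:00:00", "2025-09-03 10:05:00"]

total_events = len(events)       # what count() returns
distinct_ts = len(set(events))   # what uniq(timestamp) approximates

# Collapsing to distinct timestamps silently undercounts events.
assert total_events == 3
assert distinct_ts == 2
```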


NODE usage_links_data
DESCRIPTION >
undefined

⚠️ Potential issue | 🟡 Minor

Remove placeholder description.

The description "undefined" should be replaced with a meaningful description of what the usage_links_data node does, such as "Aggregates unique link creation counts per day within the specified workspace and date range."

🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around line 93, the node description
currently reads "undefined"; replace this placeholder with a meaningful
description such as "Aggregates unique link creation counts per day within the
specified workspace and date range." Update the description field for the
usage_links_data node to clearly state its purpose, inputs (workspace,
start_date, end_date) and output (daily unique link counts), keeping it concise
and aligned with existing node documentation format.


%
SELECT
toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,

⚠️ Potential issue | 🔴 Critical

Field name inconsistency: timestamp vs created_at.

Line 99 uses the timestamp field to calculate the interval, but lines 112-113 filter using the created_at field. This inconsistency will cause incorrect results or a runtime error if timestamp doesn't exist in dub_links_metadata_latest. Both should use the same field, likely created_at.

Apply this diff to fix the inconsistency:

-        toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,
+        toDateTime64(toStartOfDay(created_at, {{ String(timezone, 'UTC') }}), 3) AS interval,

Also applies to: 112-113

🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around lines 99 and 112-113, the code
uses timestamp to compute the interval but filters on created_at, causing a
field-name mismatch; update the interval expression to use created_at (i.e.,
replace timestamp with created_at in the toDateTime64/toStartOfDay call) so both
the grouping/interval and the WHERE filters reference the same created_at field
(keeping the existing timezone handling intact).
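Whichever field is chosen, the bucketing itself mirrors `toStartOfDay(ts, tz)`: convert the stored UTC timestamp to the target zone, then truncate to local midnight. A Python sketch using the standard zoneinfo module (field and zone names are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def start_of_day(ts: datetime, tz: str) -> datetime:
    """Truncate a timestamp to midnight in the given zone, like toStartOfDay(ts, tz)."""
    local = ts.astimezone(ZoneInfo(tz))
    return local.replace(hour=0, minute=0, second=0, microsecond=0)

created_at = datetime(2025, 9, 3, 2, 30, tzinfo=timezone.utc)

# 02:30 UTC is still the evening of Sep 2 in Los Angeles,
# so the same row lands in a different daily bucket per timezone.
assert start_of_day(created_at, "UTC").day == 3
assert start_of_day(created_at, "America/Los_Angeles").day == 2
```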

%
SELECT
toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,
uniq(*) as links

🛠️ Refactor suggestion | 🟠 Major

Ambiguous aggregation: specify the field for uniq() or use count().

Using uniq(*) without specifying which field(s) determine uniqueness is unclear and makes the query harder to understand. Consider using count() if you want to count all link records per interval, or uniq(link_id) if you want to count unique link IDs.

Example fix using count():

-        uniq(*) as links
+        count() as links

Or if you need unique link IDs:

-        uniq(*) as links
+        uniq(link_id) as links
📝 Committable suggestion

Suggested change
-        uniq(*) as links
+        count() as links
🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around line 100, the aggregation uses
uniq(*) which is ambiguous; replace uniq(*) with an explicit aggregation —
either use count(*) if you want the total number of link records per interval,
or use uniq(link_id) (or the appropriate unique field name, e.g., url or
link_id) if you want the number of distinct links; update the query to call the
chosen function and adjust any downstream code or aliases to match the new field
name (e.g., links_count or unique_links).

steven-tey merged commit e659ea6 into main
7 checks passed
steven-tey deleted the tb-pipes branch October 24, 2025 22:06
dbryson pushed a commit to beyond-the-checkout/btc that referenced this pull request Nov 21, 2025