Finalize Tinybird pipes/datasources #3001
Conversation
Walkthrough
This pull request comprehensively refactors the Tinybird analytics infrastructure by quoting TOKEN values across datasources for consistency, enriching schemas with workspace-scoped fields (workspace_id, domain, key), removing 23 v2 endpoint pipelines, and introducing new v3 pipes (v3_count, v3_group_by, v3_timeseries, v3_usage) alongside extensive schema transformations and materialization updates.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client as Analytics Client
participant Router as Endpoint Router
participant V3Pipe as V3 Pipe (Count/Events/etc.)
participant WL as Workspace Links<br/>(Metadata Filter)
participant Events as Event Tables<br/>(Click/Lead/Sale MV)
Client->>Router: Query eventType=clicks, workspace_id=ws_123
Router->>V3Pipe: Route to v3_count/v3_events
V3Pipe->>WL: Filter workspace links
WL-->>V3Pipe: Valid link_ids for workspace
V3Pipe->>Events: Query with workspace context
Events-->>V3Pipe: Filtered event data
V3Pipe->>V3Pipe: Aggregate (GROUP BY, ORDER BY LIMIT)
V3Pipe-->>Router: Result with groupByField
Router-->>Client: Aggregated analytics
Note over Client,Events: V2 pipes (deleted) no longer in path<br/>V3 introduces workspace-scoped filtering
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Areas requiring extra attention:
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 34
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
packages/tinybird/pipes/v3_events.pipe (1)
1-2: Update the description to match the pipe's functionality. The description says "Top countries" but this pipe fetches individual events (clicks, leads, sales), not country aggregations. Update the description to accurately reflect the pipe's purpose.
  DESCRIPTION >
-     Top countries
+     Fetch individual events (clicks, leads, sales) with filtering

packages/tinybird/pipes/v3_count.pipe (2)
75-76: Fix referer URL filtering (1‑based index).
- {% if defined(refererUrl) %} AND splitByString('?', referer_url)[1] = {{ refererUrl }} {% end %}
+ {% if defined(refererUrl) %} AND splitByString('?', referer_url)[2] = {{ refererUrl }} {% end %}
197-199: Fix referer URL filtering in sales (1‑based index).
- {% if defined(refererUrl) %}
-     AND splitByString('?', referer_url)[1] = {{ refererUrl }}
- {% end %}
+ {% if defined(refererUrl) %}
+     AND splitByString('?', referer_url)[2] = {{ refererUrl }}
+ {% end %}

packages/tinybird/pipes/get_lead_event.pipe (1)
13-20: Remove real-looking customerId default and resolve required/default mismatch. Hardcoding "cus_JzMqCL…" risks leaking PII and conflicts with required=True. Use a neutral placeholder or drop default.
-     customerId,
-     "cus_JzMqCLdaiVM1o1grw0yk84uC",
+     customerId,
+     "",
      description="The unique ID for a given customer.",
-     required=True,
+     required=True,
🧹 Nitpick comments (29)
packages/tinybird/pipes/get_audit_logs.pipe (2)
14-16: Confirm intent to expose PII (IP, UA, metadata) and access controls. These fields can be personal data. Verify endpoint is auth‑protected and exposure is required. If not strictly needed, consider gating with a param (e.g., include_sensitive=false by default) or masking IP/UA. I can propose a templated SELECT variant if desired.
24-25: Type check for equality filters. Ensure workspace_id/program_id column types match String(). If columns are numeric/UUID, use the appropriate sanitizer/cast (e.g., Int64/UUID) to avoid type mismatch and suboptimal query plans.
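For instance, a minimal sketch of typed parameter binding (illustrative only; program_id being numeric is an assumption, so check the actual column types in the datasource first):

```sql
-- Hypothetical sketch: match the Tinybird sanitizer to the column type.
-- String column -> String() parameter:
{% if defined(workspaceId) %} AND workspace_id = {{ String(workspaceId) }} {% end %}
-- If program_id were stored as an integer (assumption for illustration), cast the parameter instead:
{% if defined(programId) %} AND program_id = {{ Int64(programId) }} {% end %}
```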
packages/tinybird/pipes/v3_events.pipe (1)
63-69: Consider documenting the duplicated link filtering logic. The link filtering logic is duplicated across click_events (lines 63-69), lead_events (lines 118-124), and sale_events (lines 196-202). While Tinybird's architecture may require this duplication, it increases maintenance burden: any change to the filtering logic must be replicated in three places. Consider adding a comment documenting this duplication and the need to keep all three implementations synchronized.
+ {# Link filtering logic - keep synchronized across click_events, lead_events, and sale_events #}
  {% if defined(linkIds) %} AND link_id IN {{ Array(linkIds, 'String') }}
Also applies to: 118-124, 196-202
packages/tinybird/pipes/v3_usage.pipe (1)
1-2: Consider a more specific description. The current description "Timeseries data" is generic. Consider updating it to reflect the specific purpose of this pipe, such as "Usage metrics for events and links by day" or "Daily usage timeseries for workspace events and link creation."
packages/tinybird/pipes/v3_group_by_link_metadata.pipe (2)
232-233: Add LIMIT for the composite node. Upstream node limits help, but add an explicit LIMIT for defensive consistency.
- ORDER BY saleAmount DESC
+ ORDER BY saleAmount DESC
+ LIMIT 5000
242-246: Tiny spacing nit in endpoint. Add a space before group_by_link_metadata_sales for readability.
- {% elif eventType == 'sales' %}group_by_link_metadata_sales
+ {% elif eventType == 'sales' %} group_by_link_metadata_sales

packages/tinybird/pipes/v3_group_by.pipe (4)
234-235: Standardize time filter types to DateTime64. If timestamp is DateTime64 in MVs (as used in clicks), align leads.
- {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
- {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+ {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+ {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}
356-357: Standardize time filter types to DateTime64 (sales).
- {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
- {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+ {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+ {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}
426-427: Add explicit LIMIT to composite. Keep result size bounded post-join.
- ORDER BY clicks DESC
+ ORDER BY clicks DESC
+ LIMIT 5000
438-441: Tiny spacing nit in endpoint. Add spaces for readability around group_by_sales/group_by_composite.
- {% elif eventType == 'sales' %}group_by_sales
- {% elif eventType == 'composite' %}group_by_composite
+ {% elif eventType == 'sales' %} group_by_sales
+ {% elif eventType == 'composite' %} group_by_composite

packages/tinybird/pipes/v3_count.pipe (4)
141-142: Standardize to DateTime64 in leads.
- {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
- {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+ {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+ {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}
222-223: Standardize to DateTime64 in sales.
- {% if defined(start) %} AND timestamp >= {{ DateTime(start) }} {% end %}
- {% if defined(end) %} AND timestamp <= {{ DateTime(end) }} {% end %}
+ {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+ {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}
230-242: Metadata numeric comparisons currently lexicographic. For >, <, >=, <= on numeric metadata, cast to numeric to avoid string ordering pitfalls.
Example for one branch:
- {% elif operator == 'greaterThan' %}
-     AND JSONExtractString(metadata, {{ metadataKey }}) > {{ value }}
+ {% elif operator == 'greaterThan' %}
+     AND toFloat64OrNull(JSONExtractString(metadata, {{ metadataKey }})) > toFloat64OrNull({{ value }})
Apply similarly to <, <=, >=. Keep equals/notEquals as string.
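For completeness, a sketch of all four ordered comparisons under the same pattern (the operator identifiers other than greaterThan are assumptions; verify them against the pipe's actual branch names):

```sql
{% elif operator == 'greaterThan' %}
    AND toFloat64OrNull(JSONExtractString(metadata, {{ metadataKey }})) > toFloat64OrNull({{ value }})
{% elif operator == 'greaterThanOrEqual' %}
    AND toFloat64OrNull(JSONExtractString(metadata, {{ metadataKey }})) >= toFloat64OrNull({{ value }})
{% elif operator == 'lessThan' %}
    AND toFloat64OrNull(JSONExtractString(metadata, {{ metadataKey }})) < toFloat64OrNull({{ value }})
{% elif operator == 'lessThanOrEqual' %}
    AND toFloat64OrNull(JSONExtractString(metadata, {{ metadataKey }})) <= toFloat64OrNull({{ value }})
```

Note that toFloat64OrNull returns NULL for non-numeric input, and a comparison against NULL is not true, so rows with unparseable metadata values are simply filtered out by these branches.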
265-270: Endpoint defaults look good. Minor readability. Optional: add spaces before count_sales/count_composite for consistency.
- {% elif eventType == 'sales' %} count_sales
- {% elif eventType == 'composite' %} count_composite
+ {% elif eventType == 'sales' %} count_sales
+ {% elif eventType == 'composite' %} count_composite

packages/tinybird/datasources/dub_lead_events.datasource (1)
34-38: Optimize new text columns for cardinality. trigger, and often domain/key/workspace_id, are low‑cardinality. Marking them accordingly reduces storage/memory.
Minimal change:
- `trigger` String `json:$.trigger`,
+ `trigger` LowCardinality(String) `json:$.trigger`,
Optionally (verify CH version supports LC+Nullable):
- `workspace_id` Nullable(String) `json:$.workspace_id`,
- `domain` Nullable(String) `json:$.domain`,
- `key` Nullable(String) `json:$.key`
+ `workspace_id` LowCardinality(Nullable(String)) `json:$.workspace_id`,
+ `domain` LowCardinality(Nullable(String)) `json:$.domain`,
+ `key` LowCardinality(Nullable(String)) `json:$.key`

packages/tinybird/datasources/dub_click_events.datasource (1)
35-38: Use LowCardinality for small-enum/text columns. trigger, and commonly workspace_id/domain/key, benefit from LC encoding.
- `trigger` String `json:$.trigger`,
+ `trigger` LowCardinality(String) `json:$.trigger`,
Optionally (verify CH version):
- `workspace_id` Nullable(String) `json:$.workspace_id`,
- `domain` Nullable(String) `json:$.domain`,
- `key` Nullable(String) `json:$.key`
+ `workspace_id` LowCardinality(Nullable(String)) `json:$.workspace_id`,
+ `domain` LowCardinality(Nullable(String)) `json:$.domain`,
+ `key` LowCardinality(Nullable(String)) `json:$.key`

packages/tinybird/pipes/get_lead_event.pipe (2)
1-2: Description doesn’t match behavior (read vs update). Rename to reflect read/lookup to avoid confusion in ops/docs.
9-22: Harden endpoint: avoid SELECT * and add caps/filters.
- Select explicit columns.
- Add LIMIT (parametrized) and optional time window (start/end) to cap scans.
- Type parameters to prevent injection/implicit casts.
-     SELECT *
+     SELECT timestamp, event_id, event_name, link_id, customer_id, url, device, browser, os, referer, referer_url
      FROM dub_lead_events_mv
      WHERE customer_id
-         = {{
-             String(
-                 customerId,
-                 "",
-                 description="The unique ID for a given customer.",
-                 required=True,
-             )
-         }}
-     {% if defined(eventName) %} AND event_name = {{ eventName }} {% end %}
+         = {{ String(customerId, "", description="The unique ID for a given customer.", required=True) }}
+     {% if defined(eventName) %} AND event_name = {{ String(eventName) }} {% end %}
+     {% if defined(start) %} AND timestamp >= {{ DateTime64(start) }} {% end %}
+     {% if defined(end) %} AND timestamp <= {{ DateTime64(end) }} {% end %}
      ORDER BY timestamp DESC
+     LIMIT {{ Int32(limit, 1000, description="Max rows") }}

packages/tinybird/datasources/dub_links_metadata_latest.datasource (1)
18-21: Consider (re)adding monthly partitioning for scan pruning. Without ENGINE_PARTITION_KEY on timestamp, large-range scans may degrade. Recommend toYYYYMM(timestamp) if retention/time filters are common.
ENGINE "ReplacingMergeTree" ENGINE_SORTING_KEY "workspace_id, link_id" ENGINE_VER "timestamp" ENGINE_IS_DELETED "deleted" +ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"Please confirm if Tinybird auto-partitions this table elsewhere; if so, ignore.
packages/tinybird/pipes/v2_top_programs.pipe (3)
9-12: FINAL on dub_links_metadata_latest can be expensive. If possible, avoid FINAL by materializing a deduped lookup (deleted==0, latest) or using argMaxState/argMaxFinal. Keep FINAL only if correctness requires it at read-time.
Can you share row volume and latency targets to decide whether to precompute this JOIN input?
111-151: Sales query looks good; add optional trigger parity (if needed). Clicks support trigger filtering; sales/leads don’t. If symmetry is desired, add se.trigger filter.
- {% if defined(os) %} AND se.os = {{ os }} {% end %}
+ {% if defined(os) %} AND se.os = {{ os }} {% end %}
+ {% if defined(trigger) %} AND se.trigger = {{ String(trigger) }} {% end %}
170-176: Validate eventType and default safely. Constrain to allowed values and set a default to avoid typos falling into sales branch.
- {% if eventType == 'clicks' %} top_programs_clicks
+ {% if String(eventType, 'sales') == 'clicks' %} top_programs_clicks
  {% elif eventType == 'leads' %} top_programs_leads
  {% elif eventType == 'composite' %} top_programs_composite
  {% else %} top_programs_sales
  {% end %}
Also confirm composite’s intent: it LEFT JOINs from clicks; programs with sales/leads but zero clicks won’t appear.
packages/tinybird/pipes/dub_sale_events_pipe.pipe (2)
22-24: Make sale_type deterministic; remove extra scan. Timestamp equality can label multiple rows as "new" on ties; also the first_sales join adds an avoidable full scan. Prefer windowed row_number and drop the join.
- -- sale_type: "new" if this sale is the earliest one for (customer_id, link_id)
- if(timestamp = first_sale_ts, 'new', 'recurring') AS sale_type,
+ -- sale_type: deterministic first sale per (customer_id, link_id)
+ if(
+     row_number() OVER (PARTITION BY customer_id, link_id ORDER BY timestamp ASC, event_id ASC) = 1,
+     'new',
+     'recurring'
+ ) AS sale_type,
@@
- LEFT JOIN
- (
-     -- find the first sale timestamp for each customer_id:link_id pair
-     SELECT customer_id, link_id, min(timestamp) AS first_sale_ts
-     FROM dub_sale_events
-     GROUP BY customer_id, link_id
- ) AS first_sales USING (customer_id, link_id)
+ -- first_sale computed via window function; no need for a pre-aggregated join
Also applies to: 57-64
65-70: Avoid IN-subquery on the right side; join directly. The WHERE link_id IN (SELECT link_id FROM dub_sale_events) likely triggers a full scan and duplicates the left read. Drop it and rely on the LEFT JOIN.
- LEFT JOIN
- (
-     SELECT link_id, workspace_id, domain, key
-     FROM dub_links_metadata_latest FINAL
-     WHERE link_id IN (SELECT link_id FROM dub_sale_events)
- ) AS link_metadata USING (link_id)
+ LEFT JOIN
+ (
+     SELECT link_id, workspace_id, domain, key
+     FROM dub_links_metadata_latest FINAL
+ ) AS link_metadata USING (link_id)
If you expect multiple rows per link_id in dub_links_metadata_latest even after FINAL, consider ANY LEFT JOIN to cap matches at one.

packages/tinybird/datasources/dub_click_events_mv.datasource (1)
20-21: Consider LowCardinality for trigger; revisit latitude/longitude types. trigger has a tiny domain; LowCardinality(String) saves space. If possible, store latitude/longitude as Decimal(9,6) for numeric ops; keep String only if non-numeric values occur.
Validate query patterns; if trigger is frequently grouped/filtered, LC helps.
Also applies to: 33-34
packages/tinybird/datasources/dub_click_events_id.datasource (1)
36-39: Validate partitioning for scale. tuple() disables partitioning; fine for small tables, risky at high volume. If rows are high and retained long, prefer toYYYYMM(timestamp) to ease merges/TTL without hurting click_id lookups.
packages/tinybird/pipes/dub_click_events_pipe.pipe (1)
45-51: Drop IN-subquery; consider FINAL on metadata. Remove WHERE link_id IN (...) to avoid scanning dub_click_events again. If link_metadata may contain multiple versions, add FINAL for determinism with ANY LEFT JOIN.
- FROM dub_click_events AS click_event ANY
- LEFT JOIN
- (
-     SELECT link_id, workspace_id, domain, key
-     FROM dub_links_metadata_latest
-     WHERE link_id IN (SELECT link_id FROM dub_click_events)
- ) AS link_metadata USING link_id
+ FROM dub_click_events AS click_event ANY
+ LEFT JOIN
+ (
+     SELECT link_id, workspace_id, domain, key
+     FROM dub_links_metadata_latest FINAL
+ ) AS link_metadata USING link_id
Confirm dub_links_metadata_latest is deduplicated with FINAL; otherwise retain ANY.
packages/tinybird/pipes/v3_timeseries.pipe (2)
273-276: Remove placeholder description. The description "undefined" should be replaced with a meaningful description or removed entirely.
Apply this diff:
 NODE timeseries_sales_data
-DESCRIPTION >
-    undefined
-
 SQL >
342-342: Verify if duplicate saleAmount column is necessary. Line 342 selects amount, amount as saleAmount, creating two columns with identical values. If this is for backward compatibility or API contract, consider adding a comment explaining why both are needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (100)
- packages/tinybird/datasources/dub_audit_logs.datasource (1 hunks)
- packages/tinybird/datasources/dub_click_events.datasource (2 hunks)
- packages/tinybird/datasources/dub_click_events_id.datasource (1 hunks)
- packages/tinybird/datasources/dub_click_events_mv.datasource (1 hunks)
- packages/tinybird/datasources/dub_conversion_events_log.datasource (1 hunks)
- packages/tinybird/datasources/dub_import_error_logs.datasource (1 hunks)
- packages/tinybird/datasources/dub_lead_events.datasource (2 hunks)
- packages/tinybird/datasources/dub_lead_events_mv.datasource (1 hunks)
- packages/tinybird/datasources/dub_links_metadata.datasource (1 hunks)
- packages/tinybird/datasources/dub_links_metadata_latest.datasource (1 hunks)
- packages/tinybird/datasources/dub_regular_links_metadata_latest.datasource (0 hunks)
- packages/tinybird/datasources/dub_sale_events.datasource (2 hunks)
- packages/tinybird/datasources/dub_sale_events_id.datasource (0 hunks)
- packages/tinybird/datasources/dub_sale_events_mv.datasource (1 hunks)
- packages/tinybird/datasources/dub_webhook_events.datasource (1 hunks)
- packages/tinybird/endpoints/all_stats.pipe (0 hunks)
- packages/tinybird/endpoints/coordinates_all.pipe (0 hunks)
- packages/tinybird/endpoints/coordinates_sales.pipe (0 hunks)
- packages/tinybird/endpoints/get_audit_logs.pipe (0 hunks)
- packages/tinybird/endpoints/get_click_event.pipe (0 hunks)
- packages/tinybird/endpoints/get_framer_lead_events.pipe (0 hunks)
- packages/tinybird/endpoints/get_import_error_logs.pipe (0 hunks)
- packages/tinybird/endpoints/get_lead_event.pipe (0 hunks)
- packages/tinybird/endpoints/get_lead_event_by_id.pipe (0 hunks)
- packages/tinybird/endpoints/get_lead_events.pipe (0 hunks)
- packages/tinybird/endpoints/get_sale_event.pipe (0 hunks)
- packages/tinybird/endpoints/get_webhook_events.pipe (0 hunks)
- packages/tinybird/endpoints/v2_browsers.pipe (0 hunks)
- packages/tinybird/endpoints/v2_cities.pipe (0 hunks)
- packages/tinybird/endpoints/v2_continents.pipe (0 hunks)
- packages/tinybird/endpoints/v2_count.pipe (0 hunks)
- packages/tinybird/endpoints/v2_countries.pipe (0 hunks)
- packages/tinybird/endpoints/v2_customer_events.pipe (0 hunks)
- packages/tinybird/endpoints/v2_devices.pipe (0 hunks)
- packages/tinybird/endpoints/v2_events.pipe (0 hunks)
- packages/tinybird/endpoints/v2_os.pipe (0 hunks)
- packages/tinybird/endpoints/v2_referer_urls.pipe (0 hunks)
- packages/tinybird/endpoints/v2_referers.pipe (0 hunks)
- packages/tinybird/endpoints/v2_regions.pipe (0 hunks)
- packages/tinybird/endpoints/v2_timeseries.pipe (0 hunks)
- packages/tinybird/endpoints/v2_top_links.pipe (0 hunks)
- packages/tinybird/endpoints/v2_top_partners.pipe (0 hunks)
- packages/tinybird/endpoints/v2_top_programs.pipe (0 hunks)
- packages/tinybird/endpoints/v2_top_tags.pipe (0 hunks)
- packages/tinybird/endpoints/v2_top_urls.pipe (0 hunks)
- packages/tinybird/endpoints/v2_triggers.pipe (0 hunks)
- packages/tinybird/endpoints/v2_usage.pipe (0 hunks)
- packages/tinybird/endpoints/v2_utms.pipe (0 hunks)
- packages/tinybird/materializations/dub_click_events_id_pipe.pipe (0 hunks)
- packages/tinybird/materializations/dub_click_events_pipe.pipe (0 hunks)
- packages/tinybird/materializations/dub_lead_events_pipe.pipe (0 hunks)
- packages/tinybird/materializations/dub_links_metadata_pipe.pipe (0 hunks)
- packages/tinybird/materializations/dub_regular_links_metadata_pipe.pipe (0 hunks)
- packages/tinybird/materializations/dub_sale_events_id_pipe.pipe (0 hunks)
- packages/tinybird/materializations/dub_sale_events_pipe.pipe (0 hunks)
- packages/tinybird/pipes/all_stats.pipe (1 hunks)
- packages/tinybird/pipes/coordinates_all.pipe (1 hunks)
- packages/tinybird/pipes/coordinates_sales.pipe (1 hunks)
- packages/tinybird/pipes/dub_click_events_id_pipe.pipe (1 hunks)
- packages/tinybird/pipes/dub_click_events_pipe.pipe (1 hunks)
- packages/tinybird/pipes/dub_click_events_pipe_with_domain_key.pipe (0 hunks)
- packages/tinybird/pipes/dub_lead_events_pipe.pipe (1 hunks)
- packages/tinybird/pipes/dub_links_metadata_pipe.pipe (1 hunks)
- packages/tinybird/pipes/dub_regular_links_metadata_pipe.pipe (0 hunks)
- packages/tinybird/pipes/dub_sale_events_id_pipe.pipe (0 hunks)
- packages/tinybird/pipes/dub_sale_events_pipe.pipe (1 hunks)
- packages/tinybird/pipes/get_audit_logs.pipe (1 hunks)
- packages/tinybird/pipes/get_click_event.pipe (1 hunks)
- packages/tinybird/pipes/get_framer_lead_events.pipe (1 hunks)
- packages/tinybird/pipes/get_import_error_logs.pipe (1 hunks)
- packages/tinybird/pipes/get_lead_event.pipe (1 hunks)
- packages/tinybird/pipes/get_lead_event_by_id.pipe (0 hunks)
- packages/tinybird/pipes/get_lead_events.pipe (1 hunks)
- packages/tinybird/pipes/get_sale_event.pipe (0 hunks)
- packages/tinybird/pipes/get_webhook_events.pipe (1 hunks)
- packages/tinybird/pipes/v2_browsers.pipe (0 hunks)
- packages/tinybird/pipes/v2_cities.pipe (0 hunks)
- packages/tinybird/pipes/v2_continents.pipe (0 hunks)
- packages/tinybird/pipes/v2_countries.pipe (0 hunks)
- packages/tinybird/pipes/v2_customer_events.pipe (1 hunks)
- packages/tinybird/pipes/v2_devices.pipe (0 hunks)
- packages/tinybird/pipes/v2_os.pipe (0 hunks)
- packages/tinybird/pipes/v2_referer_urls.pipe (0 hunks)
- packages/tinybird/pipes/v2_referers.pipe (0 hunks)
- packages/tinybird/pipes/v2_regions.pipe (0 hunks)
- packages/tinybird/pipes/v2_timeseries.pipe (0 hunks)
- packages/tinybird/pipes/v2_top_links.pipe (0 hunks)
- packages/tinybird/pipes/v2_top_partners.pipe (0 hunks)
- packages/tinybird/pipes/v2_top_programs.pipe (2 hunks)
- packages/tinybird/pipes/v2_top_tags.pipe (0 hunks)
- packages/tinybird/pipes/v2_top_urls.pipe (0 hunks)
- packages/tinybird/pipes/v2_triggers.pipe (0 hunks)
- packages/tinybird/pipes/v2_usage.pipe (0 hunks)
- packages/tinybird/pipes/v2_utms.pipe (0 hunks)
- packages/tinybird/pipes/v3_count.pipe (8 hunks)
- packages/tinybird/pipes/v3_events.pipe (8 hunks)
- packages/tinybird/pipes/v3_group_by.pipe (1 hunks)
- packages/tinybird/pipes/v3_group_by_link_metadata.pipe (1 hunks)
- packages/tinybird/pipes/v3_timeseries.pipe (1 hunks)
- packages/tinybird/pipes/v3_usage.pipe (1 hunks)
💤 Files with no reviewable changes (64)
- packages/tinybird/pipes/get_lead_event_by_id.pipe
- packages/tinybird/pipes/dub_sale_events_id_pipe.pipe
- packages/tinybird/datasources/dub_regular_links_metadata_latest.datasource
- packages/tinybird/endpoints/all_stats.pipe
- packages/tinybird/pipes/v2_browsers.pipe
- packages/tinybird/endpoints/v2_devices.pipe
- packages/tinybird/pipes/get_sale_event.pipe
- packages/tinybird/pipes/v2_top_links.pipe
- packages/tinybird/endpoints/get_lead_event.pipe
- packages/tinybird/pipes/v2_cities.pipe
- packages/tinybird/pipes/v2_top_tags.pipe
- packages/tinybird/pipes/v2_countries.pipe
- packages/tinybird/materializations/dub_click_events_pipe.pipe
- packages/tinybird/materializations/dub_links_metadata_pipe.pipe
- packages/tinybird/endpoints/coordinates_sales.pipe
- packages/tinybird/pipes/v2_top_urls.pipe
- packages/tinybird/endpoints/v2_referers.pipe
- packages/tinybird/materializations/dub_sale_events_id_pipe.pipe
- packages/tinybird/endpoints/get_click_event.pipe
- packages/tinybird/pipes/v2_referer_urls.pipe
- packages/tinybird/endpoints/v2_top_tags.pipe
- packages/tinybird/pipes/v2_usage.pipe
- packages/tinybird/pipes/v2_utms.pipe
- packages/tinybird/pipes/v2_timeseries.pipe
- packages/tinybird/materializations/dub_regular_links_metadata_pipe.pipe
- packages/tinybird/endpoints/v2_continents.pipe
- packages/tinybird/materializations/dub_lead_events_pipe.pipe
- packages/tinybird/endpoints/v2_top_links.pipe
- packages/tinybird/endpoints/get_framer_lead_events.pipe
- packages/tinybird/datasources/dub_sale_events_id.datasource
- packages/tinybird/endpoints/v2_customer_events.pipe
- packages/tinybird/endpoints/get_audit_logs.pipe
- packages/tinybird/endpoints/get_sale_event.pipe
- packages/tinybird/endpoints/get_lead_events.pipe
- packages/tinybird/endpoints/v2_triggers.pipe
- packages/tinybird/endpoints/v2_events.pipe
- packages/tinybird/endpoints/coordinates_all.pipe
- packages/tinybird/endpoints/get_lead_event_by_id.pipe
- packages/tinybird/endpoints/get_import_error_logs.pipe
- packages/tinybird/endpoints/v2_referer_urls.pipe
- packages/tinybird/endpoints/v2_os.pipe
- packages/tinybird/pipes/v2_regions.pipe
- packages/tinybird/pipes/dub_regular_links_metadata_pipe.pipe
- packages/tinybird/pipes/v2_devices.pipe
- packages/tinybird/endpoints/v2_cities.pipe
- packages/tinybird/endpoints/v2_countries.pipe
- packages/tinybird/pipes/dub_click_events_pipe_with_domain_key.pipe
- packages/tinybird/endpoints/v2_top_urls.pipe
- packages/tinybird/pipes/v2_referers.pipe
- packages/tinybird/endpoints/v2_count.pipe
- packages/tinybird/materializations/dub_sale_events_pipe.pipe
- packages/tinybird/endpoints/v2_regions.pipe
- packages/tinybird/endpoints/v2_top_programs.pipe
- packages/tinybird/endpoints/get_webhook_events.pipe
- packages/tinybird/pipes/v2_triggers.pipe
- packages/tinybird/endpoints/v2_browsers.pipe
- packages/tinybird/materializations/dub_click_events_id_pipe.pipe
- packages/tinybird/endpoints/v2_usage.pipe
- packages/tinybird/pipes/v2_continents.pipe
- packages/tinybird/pipes/v2_os.pipe
- packages/tinybird/endpoints/v2_timeseries.pipe
- packages/tinybird/endpoints/v2_top_partners.pipe
- packages/tinybird/pipes/v2_top_partners.pipe
- packages/tinybird/endpoints/v2_utms.pipe
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (25)
packages/tinybird/pipes/get_click_event.pipe (1)
5-18: Good formatting improvement: SQL query structure is clean and readable. The indentation enhancement makes the endpoint query more maintainable while preserving its logic: it correctly filters click events by ID, orders by timestamp descending, and limits to the most recent record. Parameter handling is well-defined with proper type, default value, and required flag.
packages/tinybird/pipes/v2_customer_events.pipe (1)
1-163: LGTM! Formatting improvements maintain query semantics. This is a clean, non-functional refactor of the pipe's SQL formatting: indentation and whitespace are improved for readability across all four NODE blocks (lead_events, click_events, sale_events, endpoint). The underlying query logic, column selection, WHERE clauses, and UNION logic are unchanged. The reformatted code is consistent and well-aligned.
packages/tinybird/pipes/get_framer_lead_events.pipe (1)
5-9: The pipe get_framer_lead_events is actively referenced in the codebase across multiple systems:
- apps/web/app/(ee)/api/cron/framer/backfill-leads-batch/route.ts (cron job)
- apps/web/scripts/framer/process-lead-events.ts (processing script)
- apps/web/scripts/framer/1-process-framer-combined.ts (another script)
The changes are formatting/indentation adjustments only, and the pipe is not a dangling reference. No action required.
packages/tinybird/pipes/coordinates_all.pipe (1)
7-94: Formatting-only changes - no semantic impact. These indentation adjustments improve consistency across the SQL nodes without altering behavior.
packages/tinybird/pipes/v3_events.pipe (2)
49-69: Good use of PREWHERE optimization. The conditional FROM clause and PREWHERE optimization for customerId filtering improves query performance by reducing data processed. The expanded link filtering logic provides good flexibility for workspace-scoped queries.
265-271: Routing logic looks good. The endpoint routing correctly branches to the appropriate event node based on eventType, with proper handling of the saleType parameter for sales events.

packages/tinybird/pipes/v3_group_by_link_metadata.pipe (2)
196-197: Potential schema mismatch: trigger filter on sales. Same concern as leads; verify trigger exists on dub_sale_events_mv before filtering.
139-141: The review comment is incorrect; the trigger column exists in the schema. The verification confirms that trigger is a String column in both dub_lead_events_mv and dub_sale_events_mv schema definitions (appearing at lines 24 and 28 respectively). The filter on trigger at line 139 will not fail due to a missing column.
Likely an incorrect or invalid review comment.
packages/tinybird/pipes/coordinates_sales.pipe (1)
7-24: LGTM! Formatting improvement. The SQL query reformatting improves readability without any functional changes.
packages/tinybird/pipes/get_webhook_events.pipe (1)
9-22: LGTM! Formatting improvement. The SQL query reformatting improves readability without any functional changes.
packages/tinybird/datasources/dub_import_error_logs.datasource (1)
1-2: LGTM! Consistency improvement. Quoting the TOKEN value aligns with the broader PR effort to standardize token declarations across Tinybird datasources.
packages/tinybird/datasources/dub_audit_logs.datasource (1)
1-2: LGTM! Consistency improvement. Quoting the TOKEN value aligns with the broader PR effort to standardize token declarations across Tinybird datasources.
packages/tinybird/pipes/get_import_error_logs.pipe (1)
9-29: LGTM! Formatting improvement. The SQL query reformatting improves readability without any functional changes.
packages/tinybird/datasources/dub_conversion_events_log.datasource (1)
1-2: LGTM! Consistency improvement. Quoting the TOKEN value aligns with the broader PR effort to standardize token declarations across Tinybird datasources.
packages/tinybird/datasources/dub_links_metadata.datasource (1)
1-2: Token standardization confirmed, but permissions verification needed. The token name change from dub_links_metadata to dub_tinybird_token is intentional: all 8 datasources now uniformly use dub_tinybird_token as part of coordinated Tinybird infrastructure finalization. No references to dub_links_metadata as a token remain in the codebase.
However, verify manually that dub_tinybird_token has the necessary ingestion permissions configured in your Tinybird platform settings, as token capability scoping is managed outside the codebase.
1-1: TOKEN quoting LGTM. Consistent with other DS files. Ensure "dub_tinybird_token" is defined in Tinybird deploy envs for all workspaces.

packages/tinybird/pipes/dub_links_metadata_pipe.pipe (1)
8-15: The review comment is incorrect. The startsWith('ws_c') logic is intentional. Analysis reveals this is legacy workspace ID normalization, not a bug:
- The TypeScript code contains an identical normalizeWorkspaceId function with the same logic: startsWith("ws_c") then replace("ws_", "").
- In create-workspace-id.ts, the system explicitly guards against generating IDs that start with "ws_c" (rejecting them as collisions with the old workspace ID format).
The pipe's logic handles legacy workspace IDs from an older format for backward compatibility. Applying the suggested diff would break this legacy data handling.
The code is working as designed—the 'ws_c' prefix specifically targets old IDs requiring normalization.
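For reference, a minimal ClickHouse sketch of that legacy normalization as described (illustrative only; the exact expression in the pipe may differ):

```sql
-- Legacy workspace IDs that start with 'ws_c' have their 'ws_' prefix stripped;
-- all other IDs pass through unchanged.
SELECT
    if(
        startsWith(workspace_id, 'ws_c'),
        replaceOne(workspace_id, 'ws_', ''),
        workspace_id
    ) AS workspace_id
FROM dub_links_metadata
```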
Likely an incorrect or invalid review comment.
packages/tinybird/datasources/dub_sale_events.datasource (2)
1-1: Token quoting LGTM. Consistent with other datasources.
38-43: Good enrichment; ensure downstream MV handles NULLs. You’ve added Nullable(domain/key/workspace_id). Verify the MV schema/pipe coalesces or matches nullability (see related MV comment).
packages/tinybird/datasources/dub_click_events_mv.datasource (1)
38-38: Sort key choice LGTM. workspace_id, link_id, timestamp aligns with common filters and timeseries scans.
packages/tinybird/pipes/dub_click_events_pipe.pipe (1)
27-30: Trigger normalization looks good. Empty trigger defaults to 'qr' when qr=1 else 'link'; else keep original. Matches intended semantics.
Ensure qr is UInt8 (0/1); if Nullable, wrap with coalesce(qr, 0).
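If qr did turn out to be Nullable, a minimal sketch of the coalesce-wrapped variant could look like this (column names as used in the pipe; the exact CASE shape is an assumption based on the normalization described above):

```sql
-- Treat a missing qr flag as 0, i.e. not a QR scan.
CASE
    WHEN trigger = '' THEN if(coalesce(qr, 0) = 1, 'qr', 'link')
    ELSE trigger
END AS trigger
```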
packages/tinybird/pipes/v3_timeseries.pipe (4)
7-77: LGTM! The interval generation logic for month, day, and hour granularities is correct. The use of toStartOf* functions with timezone handling and arrayJoin for generating sequences is appropriate for this use case.
80-110: LGTM! The workspace_links node correctly uses the FINAL modifier to get the latest state and properly validates the required workspaceId. The conditional filter logic for optional parameters is well-structured.
356-363: LGTM! The composite timeseries correctly uses left joins to combine all event types, starting from clicks as the base. The join logic on groupByField is consistent across all timeseries nodes.
366-378: LGTM! The endpoint routing logic is clear and properly handles all event types with a sensible default fallback to clicks.
`workspace_id` LowCardinality(String),
`link_id` String,
`customer_id` String,
`domain` String,
`key` String,
`url` String,
🧩 Analysis chain
Non‑nullable fields may reject NULLs from the pipe.
workspace_id, domain, key are non‑nullable. If upstream join/coalesce yields NULL, inserts will fail. Either (a) make these columns Nullable, or (b) guarantee non‑NULLs in the pipe (see suggested fallbacks in dub_lead_events_pipe.pipe).
Option A (make columns nullable):
- `workspace_id` LowCardinality(String),
+ `workspace_id` LowCardinality(Nullable(String)),
- `domain` String,
- `key` String,
+ `domain` Nullable(String),
+ `key` Nullable(String),
Option B (keep non‑nullable): keep schema as‑is and adopt the pipe changes proposed earlier.
Also applies to: 11-15, 41-41
🏁 Script executed:
#!/bin/bash
# Locate and examine the datasource file
find . -name "dub_lead_events_mv.datasource" -type f 2>/dev/null | head -5Length of output: 61
🏁 Script executed:
#!/bin/bash
# Locate and examine the pipe file
find . -name "dub_lead_events_pipe.pipe" -type f 2>/dev/null | head -5Length of output: 51
🏁 Script executed:
#!/bin/bash
# Search for related files in tinybird datasources directory
fd -e datasource -e pipe . packages/tinybird/ 2>/dev/null | grep -E "(dub_lead|event)" | head -20
Length of output: 1066
🏁 Script executed:
#!/bin/bash
# Examine the datasource schema file
cat packages/tinybird/datasources/dub_lead_events_mv.datasource
Length of output: 1150
🏁 Script executed:
#!/bin/bash
# Examine the pipe file
cat packages/tinybird/pipes/dub_lead_events_pipe.pipe
Length of output: 1493
Address NULL handling for workspace_id, domain, key fields.
The datasource declares workspace_id, domain, and key as non-nullable, but the upstream pipe uses coalesce() without a final fallback. If both lead_event and link_metadata sources are NULL, coalesce() returns NULL, causing insert failures.
Choose one approach:
Option A (make columns nullable):
- `workspace_id` LowCardinality(String),
+ `workspace_id` LowCardinality(Nullable(String)),
- `domain` String,
- `key` String,
+ `domain` Nullable(String),
+ `key` Nullable(String),
Option B (ensure non-NULL in pipe): Add fallback values (e.g., coalesce(..., '')) to guarantee non-NULL results before insert.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
`workspace_id` LowCardinality(String),
`link_id` String,
`customer_id` String,
`domain` String,
`key` String,
`url` String,
`workspace_id` LowCardinality(Nullable(String)),
`link_id` String,
`domain` Nullable(String),
`key` Nullable(String),
`url` String,
🤖 Prompt for AI Agents
In packages/tinybird/datasources/dub_lead_events_mv.datasource around lines 6 to
10, the fields `workspace_id`, `domain`, and `key` are declared as non-nullable
but upstream uses coalesce() without a final fallback so NULLs can be inserted;
either make those columns nullable (change types to Nullable(String) or
Nullable(LowCardinality(String)) as appropriate) or update the upstream pipe to
provide a non-NULL default (wrap the existing coalesce calls with a final
fallback like '', e.g., coalesce(..., '') ) so inserts never receive NULL for
these columns — pick one approach and apply consistently for all three fields.
`workspace_id` LowCardinality(String),
`link_id` String,
`domain` String,
`key` String,
`url` String,
`event_id` String,
🧩 Analysis chain
Nullability mismatch between raw and MV can break ingestion.
Raw source defines workspace_id/domain/key as Nullable(String); MV has non-nullable String/LowCardinality(String). Inserts with NULLs will fail unless the pipe coerces to ''. Align schemas or coalesce in the pipe.
Option A — align MV schema:
- `workspace_id` LowCardinality(String),
+ `workspace_id` LowCardinality(Nullable(String)),
- `domain` String,
+ `domain` Nullable(String),
- `key` String,
+ `key` Nullable(String),
Option B — keep MV non-nullable and coalesce upstream:
-- In the MV-building pipe
COALESCE(workspace_id, '') AS workspace_id,
COALESCE(domain, '') AS domain,
COALESCE(`key`, '') AS `key`,
Please confirm which approach you prefer; I can propose a full diff once chosen.
Also applies to: 17-19, 28-41
I'll help you verify this review comment by examining the datasource files and checking for the nullability mismatch. Let me start by exploring the codebase.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find datasource files related to dub_sale_events
fd -e datasource | grep -i "dub_sale_events"
Length of output: 116
🏁 Script executed:
#!/bin/bash
# List all files in the tinybird datasources directory
fd . packages/tinybird/datasources --type f | head -20
Length of output: 786
🏁 Script executed:
#!/bin/bash
# Show the MV datasource file content
cat packages/tinybird/datasources/dub_sale_events_mv.datasource
Length of output: 1268
🏁 Script executed:
#!/bin/bash
# Look for the raw datasource file
fd "dub_sale_events" packages/tinybird/datasources --type fLength of output: 116
🏁 Script executed:
#!/bin/bash
# Search for pipe files that might reference these datasources
fd -e pipe packages/tinybird/ | xargs grep -l "dub_sale_events" 2>/dev/null | head -5
Length of output: 377
---
🏁 Script executed:
```shell
#!/bin/bash
# Show the raw datasource file content
cat packages/tinybird/datasources/dub_sale_events.datasource
Length of output: 1929
🏁 Script executed:
#!/bin/bash
# Search for pipe files related to dub_sale_events with corrected syntax
fd . packages/tinybird --full-path '.*dub_sale_events.*\.pipe'
Length of output: 2007
🏁 Script executed:
#!/bin/bash
# List all pipe files in tinybird
fd -e pipe packages/tinybird/
Length of output: 377
🏁 Script executed:
#!/bin/bash
# Check the MV pipe definition
fd "dub_sale_events_pipe" packages/tinybird/ -e pipeLength of output: 49
🏁 Script executed:
#!/bin/bash
# Show the pipe file content to see NULL handling
cat packages/tinybird/pipes/dub_sale_events_pipe.pipe
Length of output: 2043
Nullability mismatch confirmed—pipe's COALESCE insufficient without default fallback.
The raw source defines workspace_id, domain, and key as Nullable(String), while the MV expects non-nullable variants. The pipe uses coalesce(sale_event.workspace_id, link_metadata.workspace_id), but this still returns NULL when both inputs are NULL, which will fail insertion into the non-nullable MV columns.
Choose and implement one approach:
- Option A — update MV schema to accept NULLs (lines 7, 9, 10 in dub_sale_events_mv.datasource)
- Option B — update pipe to coalesce with empty string default (lines 7–9 in dub_sale_events_pipe.pipe)
Apply the same fix to the other affected fields at lines 17–19 and 28–41 of the MV datasource if they have the same issue.
🤖 Prompt for AI Agents
In packages/tinybird/datasources/dub_sale_events_mv.datasource around lines 6–11
(and similarly check lines 17–19 and 28–41), the MV defines workspace_id,
domain, and key as non-nullable but the pipe's coalesce can still produce NULL;
fix this by updating the pipe
(packages/tinybird/datasources/dub_sale_events_pipe.pipe, around the referenced
pipe lines 7–9 and the other affected expressions) to use a safe default in
coalesce (e.g., coalesce(sale_event.workspace_id, link_metadata.workspace_id,
'') and coalesce(...) with '' for domain and key and any other affected fields)
so the MV inserts never receive NULL, or alternatively update the MV schema to
Nullable for the listed columns if you prefer Option A (apply the same chosen
approach to the other affected fields at MV lines 17–19 and 28–41).
SELECT
    (SELECT COUNT(timestamp) FROM dub_click_events_mv) AS clicks,
    (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata_latest FINAL) AS links,
    (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
    (SELECT SUM(amount) FROM dub_sale_events_mv) AS sales
🧩 Analysis chain
Links count is incorrect source and uses a brittle magic number.
Counting rows on dub_links_metadata will include historical versions/soft-deletes; the + 42036155 offset is opaque and risks drift. Use the deduped latest MV and exclude deleted rows.
Apply this diff:
- (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
+ (SELECT countIf(deleted = 0) FROM dub_links_metadata_latest FINAL) AS links,
I'll verify the review comment by exploring the codebase structure and validating the concerns raised.
---
🏁 Script executed:
```shell
#!/bin/bash
# Check if the file exists and view its content
cat packages/tinybird/pipes/all_stats.pipe 2>/dev/null || echo "File not found at specified path"
Length of output: 272
🏁 Script executed:
#!/bin/bash
# Search for dub_links_metadata and related table definitions
rg -n "dub_links_metadata" --type=pipe --type=sql -C 3Length of output: 28
🏁 Script executed:
#!/bin/bash
# Look for materialized view naming patterns and _latest variants
rg -n "_latest" --type=pipe --type=sql | head -20Length of output: 28
🏁 Script executed:
#!/bin/bash
# Search for the magic number 42036155 in the codebase
rg -n "42036155" -C 2Length of output: 429
🏁 Script executed:
#!/bin/bash
# Check for soft-delete or deleted field patterns
rg -n "deleted\s*=" --type=pipe --type=sql -C 2 | head -30Length of output: 28
🏁 Script executed:
#!/bin/bash
# Look for other COUNT queries and FINAL patterns to understand conventions
rg -n "FINAL|countIf" --type=pipe --type=sql | head -20Length of output: 28
I need to adjust my search strategy since .pipe is not a recognized file type. Let me explore the codebase structure more broadly.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find all pipe files and schema/definition files
fd -e pipe -e sql -e md | grep -E "(schema|definition|dub_links|dub_click|dub_sale)" | head -20
Length of output: 207
🏁 Script executed:
#!/bin/bash
# Search for dub_links_metadata definition or documentation
rg -n "dub_links_metadata" -C 3Length of output: 9716
🏁 Script executed:
#!/bin/bash
# Search for _latest variants
rg -n "_latest" -C 2 | head -30Length of output: 2069
🏁 Script executed:
#!/bin/bash
# Look for deleted or soft-delete field usage in any file
rg -n "deleted\s*=|WHERE.*deleted" -C 2 | head -30Length of output: 2065
🏁 Script executed:
#!/bin/bash
# Find other pipe files to understand query patterns and conventions
fd -e pipe | head -20
Length of output: 928
🏁 Script executed:
#!/bin/bash
# Check git history for the magic number (when/why it was added)
git log -p --all -S "42036155" -- packages/tinybird/pipes/all_stats.pipe 2>/dev/null | head -50
Length of output: 59
Let me check the datasource definition for dub_links_metadata_latest to confirm the structure:
---
🏁 Script executed:
```shell
#!/bin/bash
# View the datasource definition
cat packages/tinybird/datasources/dub_links_metadata_latest.datasource
Length of output: 533
🏁 Script executed:
#!/bin/bash
# View the pipe that generates the materialized view
cat packages/tinybird/pipes/dub_links_metadata_pipe.pipe
Length of output: 587
Use the materialized view and exclude soft-deleted rows to match codebase patterns.
The issue is confirmed. All related queries (v3_count, v3_timeseries, v3_events, v3_group_by, v2_top_programs, and event pipes) consistently use dub_links_metadata_latest FINAL with WHERE deleted == 0, but this query incorrectly references the raw dub_links_metadata table with an unexplained magic number offset. The datasource is properly configured as a ReplacingMergeTree with soft-delete support, making the raw table access both dangerous and inconsistent.
Adjust the suggested diff for ClickHouse syntax consistency:
- (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
+ (SELECT countIf(deleted == 0) FROM dub_links_metadata_latest FINAL) AS links,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
SELECT
    (SELECT COUNT(timestamp) FROM dub_click_events_mv) AS clicks,
    (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata_latest FINAL) AS links,
    (SELECT COUNT(timestamp) + 42036155 FROM dub_links_metadata) AS links,
    (SELECT SUM(amount) FROM dub_sale_events_mv) AS sales
SELECT
    (SELECT COUNT(timestamp) FROM dub_click_events_mv) AS clicks,
    (SELECT countIf(deleted == 0) FROM dub_links_metadata_latest FINAL) AS links,
    (SELECT SUM(amount) FROM dub_sale_events_mv) AS sales
🤖 Prompt for AI Agents
In packages/tinybird/pipes/all_stats.pipe around lines 6 to 9, the query
incorrectly reads from the raw dub_links_metadata table and adds a hard-coded
offset; replace that reference with the materialized ReplacingMergeTree view
dub_links_metadata_latest FINAL and exclude soft-deleted rows by adding WHERE
deleted == 0, and remove the unexplained magic number so the links count is
simply the COUNT(timestamp) from the view with the deleted filter.
WHEN trigger = '' THEN CASE WHEN qr = true THEN 'qr' ELSE 'link' END ELSE trigger
END as trigger
FROM dub_click_events
SELECT * FROM dub_click_events_mv
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Avoid SELECT * in pipes to a typed datasource.
This is brittle to upstream schema drift and column order. Explicitly select columns in the ID DS order.
- SELECT * FROM dub_click_events_mv
+ SELECT
+ timestamp,
+ click_id,
+ workspace_id,
+ link_id,
+ domain,
+ key,
+ url,
+ continent,
+ country,
+ city,
+ region,
+ latitude,
+ longitude,
+ device,
+ browser,
+ os,
+ trigger,
+ ua,
+ referer,
+ referer_url,
+ ip,
+ identity_hash,
+ device_model,
+ device_vendor,
+ browser_version,
+ os_version,
+ engine,
+ engine_version,
+ cpu_architecture,
+ qr,
+ bot
+ FROM dub_click_events_mv
Confirm the exact column order in packages/tinybird/datasources/dub_click_events_id.datasource and mirror it here.
I'll verify the review comment by examining the datasource file to confirm the exact column order.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and read the datasource file to verify column order
find . -name "dub_click_events_id.datasource" -type f
Length of output: 62
🏁 Script executed:
#!/bin/bash
# Read the datasource file to extract column definitions
cat packages/tinybird/datasources/dub_click_events_id.datasource
Length of output: 1091
🏁 Script executed:
#!/bin/bash
# Read the pipe file to see current state
cat packages/tinybird/pipes/dub_click_events_id_pipe.pipe
Length of output: 125
🏁 Script executed:
#!/bin/bash
# Also search for the materialized view definition
find . -name "dub_click_events_mv*" -type fLength of output: 62
🏁 Script executed:
#!/bin/bash
# Search for any file containing dub_click_events_mv definition
rg -l "dub_click_events_mv" --type-list | head -20
rg "CREATE MATERIALIZED VIEW dub_click_events_mv" -A 50Length of output: 389
Refactor SELECT * to explicit columns matching datasource schema.
The review comment is correct and verified. The suggested column list in the diff exactly matches the column order from packages/tinybird/datasources/dub_click_events_id.datasource. Applying this change eliminates brittleness from upstream schema drift and ensures the pipe explicitly declares which columns it depends on.
🤖 Prompt for AI Agents
In packages/tinybird/pipes/dub_click_events_id_pipe.pipe around line 6, replace
the brittle "SELECT * FROM dub_click_events_mv" with an explicit SELECT listing
the columns in the exact order defined in
packages/tinybird/datasources/dub_click_events_id.datasource; update the query
to enumerate each column name from that datasource (preserving the same order
and names) instead of using *, so the pipe depends only on the declared schema
and is resilient to upstream schema drift.
toLowCardinality(
    coalesce(lead_event.workspace_id, link_metadata.workspace_id)
) AS workspace_id,
link_id,
customer_id,
coalesce(lead_event.domain, link_metadata.domain) AS domain,
coalesce(lead_event.key, link_metadata.key) AS key,
url,
Prevent NULL→non‑nullable inserts; drop redundant subquery; align cardinality.
- Coalescing without a default can still return NULL; dub_lead_events_mv.datasource defines workspace_id/domain/key as non‑nullable. Add '' fallback.
- The WHERE link_id IN (SELECT link_id FROM dub_lead_events) forces a full scan and is unnecessary for a LEFT JOIN on link_id.
- Wrap device/browser/os with LC to match DS and reduce footprint.
Apply this diff:
- toLowCardinality(
- coalesce(lead_event.workspace_id, link_metadata.workspace_id)
- ) AS workspace_id,
+ toLowCardinality(
+ coalesce(lead_event.workspace_id, link_metadata.workspace_id, '')
+ ) AS workspace_id,
- coalesce(lead_event.domain, link_metadata.domain) AS domain,
- coalesce(lead_event.key, link_metadata.key) AS key,
+ coalesce(lead_event.domain, link_metadata.domain, '') AS domain,
+ coalesce(lead_event.key, link_metadata.key, '') AS key,
- device,
- browser,
- os,
+ toLowCardinality(device) device,
+ toLowCardinality(browser) browser,
+ toLowCardinality(os) os,
@@
- SELECT link_id, workspace_id, domain, key
- FROM dub_links_metadata_latest FINAL
- WHERE link_id IN (SELECT link_id FROM dub_lead_events)
+ SELECT link_id, workspace_id, domain, key
+ FROM dub_links_metadata_latest FINALAlso applies to: 13-14, 27-31, 39-47, 49-54
🤖 Prompt for AI Agents
In packages/tinybird/pipes/dub_lead_events_pipe.pipe around lines 9-15 (and
similarly fix ranges 13-14, 27-31, 39-47, 49-54): prevent NULL→non‑nullable
inserts by changing coalesce(...) to coalesce(..., '') for workspace_id, domain
and key; remove the redundant WHERE link_id IN (SELECT link_id FROM
dub_lead_events) subquery since the LEFT JOIN on link_id already covers matching
and the subquery forces a full scan; and wrap device, browser and os with
toLowCardinality(...) to align cardinality with the datasource and reduce
footprint. Ensure all adjusted coalesce calls include the empty-string fallback
and replace field references accordingly without adding extra filters.
range(toUInt32(start + 86400), toUInt32(end + 86400),
86400
Off-by-one error: first day is excluded from the interval range.
Adding 86400 seconds to the start parameter causes the range() function to begin generating intervals from the day after the requested start date. For example, if start is "2025-09-03 00:00:00" and end is "2025-09-05 00:00:00", the current implementation generates intervals for Sep 4 and Sep 5, but omits Sep 3.
Apply this diff to fix the off-by-one error:
- range(toUInt32(start + 86400), toUInt32(end + 86400),
+ range(toUInt32(start), toUInt32(end + 86400),📝 Committable suggestion
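As a quick sanity check, a standalone sketch of the corrected interval generation with the example dates from above (hard-coded here purely for illustration):

```sql
-- Yields 2025-09-03, 2025-09-04 and 2025-09-05: range() excludes its upper
-- bound, so end + 86400 keeps the final day while start stays untouched.
SELECT arrayJoin(
    arrayMap(
        t -> toDateTime(t),
        range(
            toUInt32(toDateTime('2025-09-03 00:00:00')),
            toUInt32(toDateTime('2025-09-05 00:00:00') + 86400),
            86400
        )
    )
) AS interval
```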
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
range(toUInt32(start + 86400), toUInt32(end + 86400),
86400
range(toUInt32(start), toUInt32(end + 86400),
86400
🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around lines 23-24, the range() call
adds 86400 to the start value which causes the first day to be omitted; change
the start argument so the range begins at the provided start (remove the +86400
on start) and keep the end adjusted as needed (e.g., end + 86400) so the full
inclusive interval from start through end is generated.
%
SELECT
    toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,
    uniq(*) AS events
🧩 Analysis chain
Clarify the aggregation logic for event counting.
Using uniq(*) counts unique timestamp values across all three event types, not the total number of events. If the intent is to count all events (including duplicate timestamps), use count() instead. If the intent is indeed to count unique timestamp values, consider renaming the column to reflect this (e.g., unique_timestamps).
Run the following script to check if there are examples of similar aggregation patterns elsewhere in the codebase:
🏁 Script executed:
#!/bin/bash
# Search for usage patterns of uniq(*) vs count() in other Tinybird pipes
rg -n -C2 'uniq\(\*\)' --type=sql -g '*.pipe'
rg -n -C2 'count\(\)' --type=sql -g '*.pipe'
Length of output: 1611
---
🏁 Script executed:
```shell
#!/bin/bash
# Get full context of the v3_usage.pipe file around line 37, specifically the complete query block
cat packages/tinybird/pipes/v3_usage.pipe | sed -n '31,77p'
Length of output: 1568
---
🌐 Web query:
```web
ClickHouse uniq(*) function syntax semantics
💡 Result:
Short answer
- ClickHouse does not have a special uniq(*) syntax. The uniq aggregate is called as uniq(expr[, ...]) — i.e. pass the column(s) or tuple(s) you want to count distinct values for. [1]
What uniq does (semantics)
- uniq(x[, ...]) returns an approximate number of distinct values (UInt64). It hashes the arguments and uses an adaptive sampling algorithm (keeps up to 65,536 hashes) to produce a deterministic approximate cardinality. Use it in most cases for memory/CPU efficiency. [1]
When to use exact / other variants
- uniqExact(x[, ...]) — exact distinct count (unbounded state / more memory). [2]
- uniqCombined / uniqCombined64 / uniqHLL12 / uniqTheta — alternative approximate implementations with different memory/accuracy trade‑offs (uniqCombined is a recommended lower-memory, higher-accuracy alternative). [3][4]
Notes on syntax and combinators
- uniq accepts multiple arguments (they are combined into the hashed key) and supports aggregate combinators/suffixes like -If and -Array (e.g., uniqIf(...), uniqArray(...)). You can also store intermediate states via AggregateFunction(uniq, ...) and use -State/-Merge forms. [5][6]
References
- ClickHouse docs — uniq. [1]
- ClickHouse docs — uniqExact. [2]
- ClickHouse docs — uniqCombined. [3]
- ClickHouse docs — other uniq family / HLL notes. [11]
- ClickHouse docs — aggregate function combinators (-If, -Array). [8]
- ClickHouse docs — AggregateFunction type and -State/-Merge usage. [7]
If you want, tell me the exact expression you tried (e.g. uniq(*), uniq((col1,col2)), COUNT(DISTINCT col)) and I’ll show the correct query and explain differences.
---
🌐 Web query:
```web
Tinybird uniq(*) aggregate function syntax
💡 Result:
Use uniq(column) — i.e. uniq(x[, ...]). Tinybird follows ClickHouse-style syntax: pass one or more columns/expressions (or a Tuple/Array) to uniq; there is no special uniq(*) form. For aggregate-state storage use uniqState(...) and uniqMerge(...) / AggregateFunction(uniq, ...). [1][2]
Sources:
[1] Tinybird Aggregate functions (uniq, uniqState/uniqMerge, combinators).
[2] ClickHouse uniq reference (uniq(x[, ...]) and AggregateFunction usage).
Fix invalid uniq(*) syntax—must specify column name.
uniq(*) is not valid Tinybird/ClickHouse syntax. Use count() to count total events or uniq(timestamp) for distinct timestamps. This query also has the same issue in:
- v3_usage.pipe line 100: uniq(*) as links
- v3_timeseries.pipe line 212: uniq(*) as leads
- v3_timeseries.pipe line 290: uniq(*) as sales
All must be corrected to either count() (for total count) or uniq(column_name) (for distinct values).
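For illustration, a minimal sketch of both variants against the clicks MV (the real nodes union clicks, leads and sales and use the templated timezone parameter, so treat this as a shape example only):

```sql
-- Total event rows per day:
SELECT
    toDateTime64(toStartOfDay(timestamp, 'UTC'), 3) AS interval,
    count() AS events
FROM dub_click_events_mv
GROUP BY interval;

-- Distinct identifiers per day (assumes a click_id / event_id column exists):
SELECT
    toDateTime64(toStartOfDay(timestamp, 'UTC'), 3) AS interval,
    uniq(click_id) AS events
FROM dub_click_events_mv
GROUP BY interval;
```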
🤖 Prompt for AI Agents
packages/tinybird/pipes/v3_usage.pipe lines 37 and 100, and
packages/tinybird/pipes/v3_timeseries.pipe lines 212 and 290: the query uses
invalid Tinybird/ClickHouse syntax `uniq(*)`; replace each occurrence with the
appropriate aggregation—use `count()` if you intend a total event/row count, or
`uniq(<column_name>)` if you intend to count distinct values (e.g.,
`uniq(timestamp)` or `uniq(user_id)`); update the specific column name based on
the metric semantics for events, links, leads, and sales.
NODE usage_links_data
DESCRIPTION >
    undefined
Remove placeholder description.
The description "undefined" should be replaced with a meaningful description of what the usage_links_data node does, such as "Aggregates unique link creation counts per day within the specified workspace and date range."
🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around line 93, the node description
currently reads "undefined"; replace this placeholder with a meaningful
description such as "Aggregates unique link creation counts per day within the
specified workspace and date range." Update the description field for the
usage_links_data node to clearly state its purpose, inputs (workspace,
start_date, end_date) and output (daily unique link counts), keeping it concise
and aligned with existing node documentation format.
%
SELECT
    toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,
Field name inconsistency: timestamp vs created_at.
Line 99 uses the timestamp field to calculate the interval, but lines 112-113 filter using the created_at field. This inconsistency will cause incorrect results or a runtime error if timestamp doesn't exist in dub_links_metadata_latest. Both should use the same field, likely created_at.
Apply this diff to fix the inconsistency:
- toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,
+ toDateTime64(toStartOfDay(created_at, {{ String(timezone, 'UTC') }}), 3) AS interval,Also applies to: 112-113
🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around lines 99 and 112-113, the code
uses timestamp to compute the interval but filters on created_at, causing a
field-name mismatch; update the interval expression to use created_at (i.e.,
replace timestamp with created_at in the toDateTime64/toStartOfDay call) so both
the grouping/interval and the WHERE filters reference the same created_at field
(keeping the existing timezone handling intact).
%
SELECT
    toDateTime64(toStartOfDay(timestamp, {{ String(timezone, 'UTC') }}), 3) AS interval,
    uniq(*) as links
🛠️ Refactor suggestion | 🟠 Major
Ambiguous aggregation: specify the field for uniq() or use count().
Using uniq(*) without specifying which field(s) determine uniqueness is unclear and makes the query harder to understand. Consider using count() if you want to count all link records per interval, or uniq(link_id) if you want to count unique link IDs.
Example fix using count():
- uniq(*) as links
+ count() as links
Or if you need unique link IDs:
- uniq(*) as links
+ uniq(link_id) as links
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
uniq(*) as links
count() as links
🤖 Prompt for AI Agents
In packages/tinybird/pipes/v3_usage.pipe around line 100, the aggregation uses
uniq(*) which is ambiguous; replace uniq(*) with an explicit aggregation —
either use count(*) if you want the total number of link records per interval,
or use uniq(link_id) (or the appropriate unique field name, e.g., url or
link_id) if you want the number of distinct links; update the query to call the
chosen function and adjust any downstream code or aliases to match the new field
name (e.g., links_count or unique_links).
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes & Improvements
Deprecated