|
| 1 | +--- |
| 2 | +title: Profiles Sync Sample Queries |
| 3 | +beta: true |
| 4 | +--- |
| 5 | + |
| 6 | +Through Profiles Sync, Segment provides data sets and models that can help you enrich customer profiles using any warehouse data available to you. |
| 7 | + |
| 8 | +On this page, you’ll find queries that you can run with Profiles Sync to address common use cases. |
| 9 | + |
| 10 | +> info "" |
| 11 | +> The examples in this guide are based on a Snowflake installation. If you’re using another warehouse, you may need to adjust the syntax. |
| 12 | + |
| 13 | +## About example schemas |
| 14 | + |
| 15 | +The queries on this page use two example schemas: |
| 16 | + |
| 17 | +- `ps_segment`, a schema where Segment lands data |
| 18 | +- `ps_materialize`, a schema with your produced materializations |
| 19 | + |
| 20 | +These schema names may not match your own. |
| 21 | + |
| 22 | +## Monitor and diagnose identity graphs |
| 23 | + |
| 24 | +These queries let you view and manage identity graphs, which give you insight into unified customer profiles generated by [identity resolution](/docs/profiles/identity-resolution/). |
| 25 | + |
| 26 | +### Show how many profiles Segment creates and merges per hour |
| 27 | + |
| 28 | +This example queries the `id_graph_updates` table to measure the rate at which Segment creates and merges profiles, as well as the type of event that triggered each profile change: |
| 29 | + |
| 30 | +```sql |
| 31 | +SELECT |
| 32 | + DATE_TRUNC('hour',timestamp) as hr, |
| 33 | + CASE |
| 34 | + WHEN canonical_segment_id=segment_id |
| 35 | + THEN 'profile creation' ELSE 'profile merge' |
| 36 | + END as profile_event, |
| 37 | + triggering_event_type, |
| 38 | + COUNT(DISTINCT triggering_event_id) as event_count |
| 39 | +FROM ps_segment.id_graph_updates |
| 40 | +GROUP BY 1,2,3 |
| 41 | +``` |
| 42 | + |
| 43 | +### Isolate profiles that have reached an identifier's maximum configured value |
| 44 | + |
| 45 | +Segment’s [configurable identifier limits](/docs/profiles/identity-resolution/identity-resolution-settings/) let you cap the number of values a profile can hold for identifiers like email. These limits help prevent two separate users from being merged into a single Profile. |
| 46 | + |
| 47 | +The following query lets you view Profiles that have reached a configured limit for the email identifier: |
| 48 | + |
| 49 | +```sql |
| 50 | +WITH agg AS ( |
| 51 | + SELECT |
| 52 | + canonical_segment_id, |
| 53 | + COUNT(LOWER(TRIM(external_id_value))) as value_count, |
| 54 | + LISTAGG(external_id_value,', ') as external_id_values |
| 55 | + FROM ps_materialize.external_id_mapping |
| 56 | + WHERE external_id_type='email' |
| 57 | + GROUP BY 1 |
| 58 | +) |
| 59 | +SELECT |
| 60 | + canonical_segment_id, |
| 61 | + external_id_values, |
| 62 | + value_count |
| 63 | +FROM agg |
| 64 | +WHERE value_count >= 5 -- replace 5 with your configured limit |
| 65 | +``` |
| 66 | +## Reconstruct a profile's traits |
| 67 | + |
| 68 | +Use the following query to trace a trait value back to the profile and source that produced it. |
| 69 | + |
| 70 | +### Identify the source of a trait value for a canonical profile and its child profiles |
| 71 | + |
| 72 | +When a merge occurs, Segment selects and associates a single trait value with a profile. This logic depends on how you materialize the `profile_traits` table. |
| 73 | + |
| 74 | +You can break out a profile, though, to see the trait versions that existed before the merge. As a result, you can identify a particular trait’s origin. |
| 75 | + |
| 76 | +The following example inspects a particular profile, `use_XX`, and trait, `trait_1`. The query reports the profile’s last observed trait, its source ID, and any profiles Segment has since merged into the profile: |
| 77 | + |
| 78 | +```sql |
| 79 | +SELECT * FROM ( |
| 80 | + SELECT |
| 81 | + ids.canonical_segment_id, |
| 82 | + ident.segment_id, |
| 83 | + ident.event_source_id, |
| 84 | + ident.trait_1, |
| 85 | + row_number() OVER(PARTITION BY ident.segment_id ORDER BY ident.timestamp DESC) as rn |
| 86 | + FROM ps_segment.identifies as ident |
| 87 | + INNER JOIN ps_materialize.id_graph as ids |
| 88 | +    ON ids.segment_id = ident.segment_id |
| 89 | +    AND ids.canonical_segment_id = 'use_XX' |
| 90 | +    AND ident.trait_1 IS NOT NULL |
| 91 | +) WHERE rn=1 |
| 92 | +``` |
| 93 | + |
| 94 | +## Measure and model your customer base |
| 95 | + |
| 96 | +The queries in this section list your customers and reconstruct their page, trait, and audience histories. |
| 97 | + |
| 98 | +### List all customers with their merged profiles, external identifiers, or traits |
| 99 | + |
| 100 | +The following three snippets will provide a full list of your customers, along with: |
| 101 | + |
| 102 | +- The profile IDs merged into that customer: |
| 103 | + |
| 104 | +```sql |
| 105 | +SELECT |
| 106 | + canonical_segment_id, |
| 107 | + LISTAGG(segment_id, ', ') as associated_segment_ids |
| 108 | +FROM ps_materialize.id_graph |
| 109 | +GROUP BY 1 |
| 110 | +``` |
| 111 | + |
| 112 | +- The external IDs associated with that customer: |
| 113 | + |
| 114 | +```sql |
| 115 | +SELECT |
| 116 | + canonical_segment_id, |
| 117 | + LISTAGG(external_id_value || '(' || external_id_type || ')', ', ') as associated_external_ids |
| 118 | +FROM ps_materialize.external_id_mapping |
| 119 | +GROUP BY 1 |
| 120 | +``` |
| 121 | + |
| 122 | +- The customer’s traits: |
| 123 | + |
| 124 | +```sql |
| 125 | +SELECT * FROM ps_materialize.profile_traits WHERE merged_to IS NULL |
| 126 | +``` |
| 127 | + |
| 128 | +### Show all pages visited by a user |
| 129 | + |
| 130 | +To get complete user histories, join event tables to the identity graph and aggregate or filter with `id_graph.canonical_segment_id`: |
| 131 | + |
| 132 | +```sql |
| 133 | +SELECT |
| 134 | + id_graph.canonical_segment_id, |
| 135 | + pages.* |
| 136 | +FROM ps_segment.pages |
| 137 | +LEFT JOIN ps_materialize.id_graph |
| 138 | + ON id_graph.segment_id = pages.segment_id |
| 139 | +WHERE canonical_segment_id = 'use_XX..' |
| 140 | +``` |
| 141 | + |
| 142 | +### Show the complete history of a trait or audience membership associated with a customer |
| 143 | + |
| 144 | +Suppose you want to track users’ entrances into and exits from the audience `aud_1`. The following query returns every Identify call that recorded an entrance or exit: |
| 145 | + |
| 146 | +```sql |
| 147 | +SELECT |
| 148 | + id_graph.canonical_segment_id, |
| 149 | + identifies.aud_1, |
| 150 | + identifies.timestamp |
| 151 | +FROM ps_segment.identifies |
| 152 | +INNER JOIN ps_materialize.id_graph |
| 153 | + ON id_graph.segment_id = identifies.segment_id |
| 154 | + AND identifies.aud_1 IS NOT NULL |
| 155 | +``` |
| 156 | + |
| 157 | +This query works with any Trait or Audience membership, whether computed in Engage or instrumented upstream. |
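| | + |
| | +If consecutive Identify calls can repeat the same value, a window function can reduce the history to the rows where membership actually changed. The following is a minimal sketch, assuming `aud_1` is a boolean column; the `previous_aud_1` alias is illustrative and not part of the Profiles Sync schema: |
| | + |
| | +```sql |
| | +SELECT * FROM ( |
| | + SELECT |
| | + id_graph.canonical_segment_id, |
| | + identifies.aud_1, |
| | + identifies.timestamp, |
| | + -- previous membership value observed for the same canonical profile |
| | + LAG(identifies.aud_1) OVER ( |
| | + PARTITION BY id_graph.canonical_segment_id |
| | + ORDER BY identifies.timestamp |
| | + ) as previous_aud_1 |
| | + FROM ps_segment.identifies |
| | + INNER JOIN ps_materialize.id_graph |
| | + ON id_graph.segment_id = identifies.segment_id |
| | + AND identifies.aud_1 IS NOT NULL |
| | +) |
| | +-- keep the first observation and every row where the value flipped |
| | +WHERE previous_aud_1 IS NULL OR previous_aud_1 <> aud_1 |
| | +``` |
| | + |
| | +In the result, a row where `aud_1` is true marks an entrance and a row where it is false marks an exit. |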
| 158 | + |
| 159 | +## Frequently asked questions |
| 160 | + |
| 161 | +#### Can I view Engage Audience membership and Computed Trait values in my Warehouse? |
| 162 | + |
| 163 | +Yes. Engage sends Audience membership updates (as booleans) and Computed Trait value updates as traits on an Identify call, which Segment forwards to your data warehouse. |
| 164 | + |
| 165 | +The column name corresponds to the Audience or Trait key shown on the settings page. |
| 166 | + |
| 167 | +Surface these values the same way as any other trait value: |
| 168 | + |
| 169 | +- The Trait’s complete history will be in `identifies` |
| 170 | +- The Trait’s current state for each customer will be in `profile_traits` |
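| | + |
| | +As a rough sketch of the second case, the following query lists the customers who are currently members of an audience. It assumes the audience key is `aud_1`, that it lands as a boolean column in `profile_traits`, and that the table exposes `canonical_segment_id`; adjust the names to match your own materialization: |
| | + |
| | +```sql |
| | +SELECT |
| | + canonical_segment_id, |
| | + aud_1 |
| | +FROM ps_materialize.profile_traits |
| | +WHERE merged_to IS NULL -- skip profiles that have been merged away |
| | + AND aud_1 = TRUE -- current members only |
| | +``` |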
| 171 | + |
| 172 | +#### What is the relationship between `segment_id` and `canonical_segment_id`? Are they unique? |
| 173 | + |
| 174 | +Identity merges change Segment’s understanding of who performed historical events. |
| 175 | + |
| 176 | +For example, if `profile_b` completed a “Product Purchased” event but Segment understands that `profile_b` should be merged into `profile_a`, Segment deduces that `profile_a` performed that initial “Product Purchased” event. |
| 177 | + |
| 178 | +With that in mind, here's how to differentiate between `segment_id` and `canonical_segment_id`: |
| 179 | + |
| 180 | +- `segment_id` is a unique identifier representing Segment’s understanding of who performed an action at the time the action happened. |
| 181 | +- `canonical_segment_id` is a unique identifier representing Segment’s current understanding of who performed that action. |
| 182 | + |
| 183 | +The mapping between these two identifiers materializes in your `id_graph` table. If a profile has not been merged away, then `segment_id` is equivalent to `canonical_segment_id`. If a profile has been merged away, `id_graph` reflects that state. |
| 184 | + |
| 185 | +As a result, you can retrieve a customer’s complete event history by joining an event table, like `product_purchased`, to `id_graph`. |
| 186 | + |
| 187 | +For more information, view the [Profiles Sync tables guide](/docs/profiles/profiles-sync/tables/). |
| 188 | + |
| 189 | +#### Should I expect discrepancies between Profile data in Segment and Profile data in my warehouse? |
| 192 | + |
| 193 | +Profiles Sync mimics the materialization performed by Segment Profiles. A user’s merges, external IDs, and traits should match whether you query them in the warehouse, retrieve them through the Profile API, or view them in the UI. |
| 194 | + |
| 195 | +The following edge cases might drive slight (<0.01%) variation: |
| 196 | + |
| 197 | +- Data processed by Profiles hasn’t yet landed in Profiles Sync. |
| 198 | +- If you rebuild or use non-incremental materialization for `profile_traits`, Profiles Sync fully recalculates traits for each user. As a result, all traits reflect the most recently observed value for fully merged users. |
| 199 | + |
| 200 | +By contrast, Segment Profiles and incrementally-built Profiles Sync materializations won’t combine already-computed traits across two merged profiles at the moment of merge. Instead, one profile’s traits will be chosen across the board. |