Automatic Schema Management

Jitsu automatically managers a schema for data warehouse destinations. Based in incoming event JSON, it creates a table with columns that correspond to JSON properties.

It's done in a few steps:

Flatenning

{
  "userProperties": {
    "id": 1,
    "name": "John"
  }
}

becomes

{
    "user_properties_id": 1,
    "user_properties_name": "John"
}

In addition to flatenning, jitsu converts camelCase name into camel_case. Then Jitsu make sure that colums user_id and user_name are created in the table.

ALTER TABLE user_id ADD COLUMN user_properties_id INT;
ALTER TABLE events ADD COLUMN user_properties_name TEXT;

Jitsu selects the most appropriate column type for specific Data Warehouse based on JSON data type. User can override that selection using SQL column type override function, see details below.

Customization

Certain aspects of schema management can be customized in Jitsu UI.

Primary Key selection

When Jitsu creates a table in Data Warehouse, it must know a primary key column that identifies each row. By default, Jitsu uses message_id column as primary key. messageId (transformed to message_id) is root field of every event payload and is guaranteed to be unique.

Timeseries Database Optimization

Jitsu can create destination tables optimized for time series queries. Concrete implementation will depend on Data Warehouse you use. Some datawarehouses, like ClickHouse or BigQuery, have special column types for time series data.

By default, Jitsu uses timestamp column for that purpose. This column is a root field of every event payload and is guaranteed to be present. It's generated by Jitsu SDK and equals to the time when event was sent to Jitsu. Alternatively, users can set the value explicitly

Schema Freeze

Sometimes, it's not desirable to let Jitsu add additional columns to the table. For example, you might not want to store all properties of incoming JSON in the table. Inf this case you can create all necessary columns in the destination table ahead of time and enable Schema Freeze in Jitsu UI.

Incoming data for any properties that don't have corresponding columns will be stored in _unmapped_data column in JSON format.

SQL column type override

Sometimes, Jitsu might select a column type that is not optimal for your use case. You can override Jitsu default type selection for specific column using Functions.

When creating a new column, Jitsu tries its best to pick an appropriate column type based on the data type of the incoming JSON. However, sometimes it's not going to work. For example, if the fist value of the column is an Integer, Jitsu will create an Integer column. But if the second value is a Double, Jitsu will fail to insert it into the Integer column.

You can override Jitsu default type selection for specific column using Functions. Please refer to SQL column type override for detailed explanations Functions documentation:

Example:

export default async function(event, ctx) {
  return {
    ...event,
    properties: {
      ...event.properties,
      __sql_type_number: "DOUBLE PRECISION",
    },
  }
}

Column Number Limitations

note

This feature is under development and not available yet in the latest release

Jitsu will inflate table column up to the point when it reaches 1500 columns. After that all properties that exceed that limit will be stored in _unmapped_data column in JSON format.

Unmapped Data

If Jitsu won't able to store data in the table by any reason, it will store it in _unmapped_data column in JSON format.

Flatenning​

Customization​

Primary Key selection​

Timeseries Database Optimization​

Schema Freeze​

SQL column type override​

Column Number Limitations​

Unmapped Data​