ClickHouse cluster with 1 shard and 3 replicas built with docker-compose. Includes the custom dbt-clickhouse adapter. Its aim is to easily set up a sandbox for developing a DBT project.
> [!warn]
> Not for production use.
- Add credentials to `.env` for the appropriate ENV VARS (see the example `.env` after this list):
  - `CLICKHOUSE_SCR_HOST` - host of the source ClickHouse cluster that will be used as a template for the sandbox.
  - `CLICKHOUSE_SCR_PORT` - port of the source ClickHouse cluster.
  - `CLICKHOUSE_SCR_USER` - user for the source ClickHouse cluster.
  - `CLICKHOUSE_SCR_PASSWORD` - password for the source ClickHouse cluster.
  - `DBT_USER` - user that will be used in `profiles.yaml`.
  - `DBT_PASSWORD` - password that will be used in `profiles.yaml`.
  - `MOUNT_DBT_PROJECT_DIR` - an absolute path to the local DBT project.
- Switch on the tech VPN.
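A minimal example `.env` (all values are placeholders):

```sh
CLICKHOUSE_SCR_HOST=source-clickhouse.internal
CLICKHOUSE_SCR_PORT=9000
CLICKHOUSE_SCR_USER=reader
CLICKHOUSE_SCR_PASSWORD=secret
DBT_USER=dbt
DBT_PASSWORD=dbt_password
MOUNT_DBT_PROJECT_DIR=/home/user/projects/my_dbt_project
```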
Run a single command; it will copy configs for each node and start an empty ClickHouse cluster `cluster_name` with docker-compose:

```sh
make start
```

Containers will be available in the docker network 172.23.0.0/24:
| Container | Address |
|---|---|
| zookeeper | 172.23.0.10 |
| clickhouse01 | 172.23.0.11 |
| clickhouse02 | 172.23.0.12 |
| clickhouse03 | 172.23.0.13 |
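Once the cluster is up, you can sanity-check its topology from any node; `system.clusters` is a standard ClickHouse system table:

```sql
-- One row per (shard, replica) pair; expect 1 shard x 3 replicas here
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'cluster_name';
```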
Run a single command; it will copy the `migration.sh` script into the containers and run it. The script re-creates the database structure as it is in the ClickHouse source system, with sample data (~100 rows) for each source table in the DBT project:

```sh
make migrate
```

> [!info]
> You can override the `DBT_SOURCES` var in the `migration.sh` script by adding a `--select <selection>` attribute; this reduces the number of source tables used for re-creation.
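For example (a hypothetical sketch; the exact shape of the `DBT_SOURCES` assignment inside `migration.sh` may differ):

```sh
# Hypothetical: limit migration to one source's tables via dbt's node selection
DBT_SOURCES="dbt ls --resource-type source --select source:my_source"
```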
Now you are ready to use the sandbox. Return to your local DBT project and run:

```sh
dbt run-operation create_udfs -t dev
```

It will create the necessary functions. Done. Enjoy your dev process.
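For the `-t dev` target to resolve, your `profiles.yml` needs an output pointing at the sandbox. A minimal sketch, assuming the dbt-clickhouse adapter's standard profile fields; the profile name `my_project` and the HTTP port are assumptions:

```yaml
# Hypothetical dev target for the sandbox
my_project:
  target: dev
  outputs:
    dev:
      type: clickhouse
      host: localhost
      port: 8123            # HTTP port; the native protocol uses 9000
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      schema: default
```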
Default users:

- `default` - no password
- `admin` - password `123`
- `airflow` - password
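For example, to connect as the `admin` user via the port mapped to localhost:

```sh
clickhouse-client -h localhost --user admin --password 123
```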
Log in to the clickhouse01 console (the first node's ports are mapped to localhost):

```sh
clickhouse-client -h localhost
```

Or open clickhouse-client inside any container:

```sh
docker exec -it clickhouse01 clickhouse-client -h localhost
```

Create a test database and table (sharded and replicated):

```sql
CREATE DATABASE company_db ON CLUSTER 'cluster_name';
CREATE TABLE company_db.events ON CLUSTER 'cluster_name' (
time DateTime,
uid Int64,
type LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/events', '{replica}')
PARTITION BY toDate(time)
ORDER BY (uid);
CREATE TABLE company_db.events_distr ON CLUSTER 'cluster_name' AS company_db.events
ENGINE = Distributed('cluster_name', company_db, events, uid);
```

Load some data:

```sql
INSERT INTO company_db.events_distr VALUES
('2020-01-01 10:00:00', 100, 'view'),
('2020-01-01 10:05:00', 101, 'view'),
('2020-01-01 11:00:00', 100, 'contact'),
('2020-01-01 12:10:00', 101, 'view'),
('2020-01-02 08:10:00', 100, 'view'),
('2020-01-03 13:00:00', 103, 'view');
```

Check data from the current shard:

```sql
SELECT * FROM company_db.events;
```

Check data from the whole cluster:

```sql
SELECT * FROM company_db.events_distr;
```
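To confirm the data reached every node, a sketch using the `clusterAllReplicas` table function (available in reasonably recent ClickHouse versions) and the built-in `hostName()`:

```sql
-- Count rows on each replica; all three nodes should report the same number
SELECT hostName() AS node, count() AS rows
FROM clusterAllReplicas('cluster_name', company_db.events)
GROUP BY node
ORDER BY node;
```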
If you need more ClickHouse nodes, add them like this (a config sketch follows below):

- Add replicas/shards to `config.xml` in the block `company/remote_servers/cluster_name`.
- Add nodes to `docker-compose.yml`.
- Add nodes in the `Makefile` in the `config` target.
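For reference, a sketch of what an extra replica entry might look like in `config.xml` (the host name `clickhouse04` is an assumption):

```xml
<company>
    <remote_servers>
        <cluster_name>
            <shard>
                <!-- existing replicas clickhouse01..clickhouse03 ... -->
                <replica>
                    <host>clickhouse04</host> <!-- hypothetical new node -->
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_name>
    </remote_servers>
</company>
```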
Start/stop the cluster without removing containers:

```sh
make start
make stop
```

Stop and remove containers:

```sh
make down
```