README.md (+61 -8)
@@ -1,18 +1,17 @@
-# Glint: SQL Query Compiler for Java
+# Glint: Vectorized and Code Generation Driven Query Engine in Java

 > Briefly flashing the powers of query compilation without the machinery of a spark.

 ## Description

-Glint is a SQL query engine with query compilation support in Java.
+Glint is a minimal SQL query engine with vectorized and query compilation support in Java.

 Following in the tradition of the new movement of modular database architectures
 Glint has no catalog or data management; its only capability is turning SQL queries
 into Java code that is then compiled and executed; think Calcite not Spark.

 In order to make it fun, at least for tests and benchmark purposes, we did plug
-an Arrow compatible API with support for Memory, CSV and Partquet data sources
-allowing us to run against most benchmark datasets out there.
+an Arrow compatible API with support for Memory, CSV and Parquet data sources.

 ## Architecture

@@ -22,7 +21,7 @@ aspect of a query compiler is studied or demonstrated.

 But before all of this, let's start with a brief tour of query engines in general this
 will allow us to frame the architecture discussion in a concrete context by understanding
-the fundamental components and patterns that shape modern query processing systems.
+the fundamental components and patterns that shape modern query processing systems.

 ### Query Engine Architecture and Paradigms

@@ -42,7 +41,7 @@ SELECT col1 FROM table WHERE col2 > 10
 ```

 Driving the execution of the above model are two execution paradigms: vectorized and compiled.
-Vectorized execution processes data in batches (vectors) to better utilize CPU caches and
+Vectorized execution processes data in batches (vectors) to better utilize CPU caches and
 enable SIMD operations.
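
As a rough illustration of the batch-at-a-time idea described here, the sketch below runs the example query `SELECT col1 FROM table WHERE col2 > 10` over plain `int[]` columns. Everything in it is an assumption made for illustration: the class name `VectorizedFilterSketch`, the `BATCH_SIZE` of 1024, the selection-vector layout, and the use of primitive arrays instead of Arrow vectors. It is not taken from Glint's code base.

```java
import java.util.Arrays;

// Hypothetical sketch of vectorized execution; not Glint's actual operators.
final class VectorizedFilterSketch {
    // Assumed batch size; real engines tune this to fit CPU caches.
    static final int BATCH_SIZE = 1024;

    // Filter primitive: records the indexes of rows in [offset, offset + length)
    // whose col2 value exceeds the threshold. Returns how many rows qualified.
    static int selectGreaterThan(int[] col2, int offset, int length, int threshold, int[] sel) {
        int n = 0;
        for (int i = offset; i < offset + length; i++) {
            if (col2[i] > threshold) {
                sel[n++] = i;
            }
        }
        return n;
    }

    // Projection primitive: copies the selected col1 values into the output.
    // Returns the new write position in the output array.
    static int gather(int[] col1, int[] sel, int selCount, int[] out, int outPos) {
        for (int i = 0; i < selCount; i++) {
            out[outPos + i] = col1[sel[i]];
        }
        return outPos + selCount;
    }

    // Drives both primitives one batch at a time instead of one row at a time.
    public static int[] run(int[] col1, int[] col2, int threshold) {
        int[] out = new int[col1.length];
        int[] sel = new int[BATCH_SIZE];
        int outPos = 0;
        for (int offset = 0; offset < col2.length; offset += BATCH_SIZE) {
            int length = Math.min(BATCH_SIZE, col2.length - offset);
            int selCount = selectGreaterThan(col2, offset, length, threshold, sel);
            outPos = gather(col1, sel, selCount, out, outPos);
        }
        return Arrays.copyOf(out, outPos);
    }

    public static void main(String[] args) {
        int[] col1 = {1, 2, 3, 4};
        int[] col2 = {5, 20, 10, 11};
        System.out.println(Arrays.toString(run(col1, col2, 10))); // prints [2, 4]
    }
}
```

Each primitive is a tight loop over one column, which is the shape that tends to stay cache-resident and that a JIT compiler can sometimes auto-vectorize; whether SIMD instructions are actually emitted depends on the JVM and hardware.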

 Instead of processing one row at a time like the Volcano model, it handles chunks of data
@@ -75,9 +74,63 @@ complexity.

 Each approach has its trade-offs: Vectorized engines have lower compilation overhead and are
 more flexible for dynamic workloads, while compiled engines can achieve better absolute performance
-for stable queries by generating specialized code paths.
+for stable queries by generating specialized code paths.
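
To make "specialized code paths" concrete, here is a minimal sketch of what a code generator might emit for `SELECT col1 FROM table WHERE col2 > 10`. The class `CompiledQuerySketch`, the `generate` helper and the shape of the emitted method are all hypothetical and are not taken from Glint. The runtime compilation step is left out: the `query` method is written by hand to mirror the generated source so the example runs without a compiler on the classpath.

```java
import java.util.Arrays;

// Hypothetical sketch of data-centric code generation; not Glint's generator.
final class CompiledQuerySketch {
    // Emits Java source for "SELECT <projCol> FROM t WHERE <filterCol> > <constant>".
    // Assumes projCol and filterCol are distinct, valid Java identifiers.
    public static String generate(String projCol, String filterCol, int constant) {
        return String.join("\n",
            "static int[] query(int[] " + projCol + ", int[] " + filterCol + ") {",
            "    int[] out = new int[" + projCol + ".length];",
            "    int n = 0;",
            "    for (int i = 0; i < " + filterCol + ".length; i++) {",
            "        if (" + filterCol + "[i] > " + constant + ") out[n++] = " + projCol + "[i];",
            "    }",
            "    return java.util.Arrays.copyOf(out, n);",
            "}");
    }

    // Hand-written equivalent of generate("col1", "col2", 10): the filter and
    // the projection are fused into one loop, with the constant 10 baked in.
    public static int[] query(int[] col1, int[] col2) {
        int[] out = new int[col1.length];
        int n = 0;
        for (int i = 0; i < col2.length; i++) {
            if (col2[i] > 10) out[n++] = col1[i];
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        System.out.println(generate("col1", "col2", 10));
        int[] result = query(new int[]{1, 2, 3, 4}, new int[]{5, 20, 10, 11});
        System.out.println(Arrays.toString(result)); // prints [2, 4]
    }
}
```

The fused loop has no per-operator dispatch and no intermediate selection vector, which is where the compiled approach can win on compute-heavy queries. The cost is that every new query shape needs its own generation and compilation pass before it can run.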

 In a paper by Timo Kersten and others - [Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask](https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf) they showed that the performance of
 both approaches was pretty much on-par, with the results showing that data-centric code generation
 being slightly better at compute intensive queries and vectorized being better at memory-bound