
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://duckdb.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://duckdb.org/" rel="alternate" type="text/html" /><updated>2026-05-28T14:10:49+00:00</updated><id>https://duckdb.org/feed.xml</id><title type="html">DuckDB</title><subtitle>DuckDB is an in-process SQL database management system focused on analytical query processing. It is designed to be easy to install and easy to use. DuckDB has no external dependencies. DuckDB has bindings for C/C++, Python, R, Java, Node.js, Go and other languages.</subtitle><author><name>GitHub User</name><email>your-email@domain.com</email></author><entry><title type="html">Test-Driving the Lance Lakehouse Format in DuckDB</title><link href="https://duckdb.org/2026/05/21/test-driving-lance.html" rel="alternate" type="text/html" title="Test-Driving the Lance Lakehouse Format in DuckDB" /><published>2026-05-21T00:00:00+00:00</published><updated>2026-05-21T00:00:00+00:00</updated><id>https://duckdb.org/2026/05/21/test-driving-lance</id><content type="html" xml:base="https://duckdb.org/2026/05/21/test-driving-lance.html"><![CDATA[<p>With the <a href="/docs/current/core_extensions/lance.html"><code class="language-plaintext highlighter-rouge">lance</code> extension</a>, DuckDB users can query Lance datasets with the same familiar SQL interface (via the CLI or SDKs), while adding capabilities for AI and retrieval workloads. This blog post highlights how Lance is a good option for workloads that need to support storage and querying of vectors, rich table operations, and AI-oriented access patterns, while also supporting scan-friendly analytical workloads at scale. And with DuckDB, it becomes trivial to query those kinds of datasets in SQL.</p>

<blockquote>
  <p>This post refers to “retrieval workloads” and to “AI data patterns” or “AI datasets”. By “retrieval workloads,” we mean queries that find rows by similarity or keyword relevance, such as vector search and full-text search, rather than by exact filters or aggregations. By “AI data patterns,” we mean datasets that mix embeddings, images, or audio alongside scalar metadata.</p>
</blockquote>

<h2 id="what-is-lance">What is Lance?</h2>

<p><a href="https://lance.org/">Lance</a> is an open lakehouse format designed for modern ML and AI workloads. Unlike Parquet, Lance is a file format, a table format, and a lightweight catalog spec all at once. At the table format level, Lance supports versioning, schema evolution, indexes, and transactional updates through MVCC and ACID-style semantics. In practice, this means Lance is built for datasets that change over time and need more than read-only scans.</p>

<p>This matters because many AI datasets are no longer just rows of scalar values. They often contain embeddings, long-form text, images, audio, metadata for filtering, and indexes used for retrieval. A format that works well for these workloads needs to do more than store and scan columns efficiently. It also needs to support search, updates, and lifecycle operations without forcing users into managing multiple different systems.</p>

<p>The mental model is still familiar to users coming from Parquet: columnar data in an open format, queried with standard analytical tools. Lance's fragment-based layout stores data in small columnar chunks. This design enables efficient random access without trade-offs in scan performance or memory utilization, something that has historically been difficult to achieve in columnar formats, and that the Lance team examines in <a href="https://arxiv.org/abs/2504.15247">this 2025 paper on adaptive structural encodings</a>.</p>

<p>On the data evolution side, adding columns or backfilling existing rows with new data only writes new files without touching existing ones. That makes schema changes lightweight in practice, which is useful for workflows where columns are added incrementally, such as appending derived features or embeddings to an existing dataset.</p>

<h2 id="the-lance-duckdb-extension">The Lance DuckDB Extension</h2>

<p>The <a href="/docs/current/core_extensions/lance.html"><code class="language-plaintext highlighter-rouge">lance</code> extension</a> brings Lance into DuckDB as part of the SQL-based workflow. You can read Lance datasets directly, write to them via <code class="language-plaintext highlighter-rouge">COPY</code>, attach them as table namespaces, build indexes, and query them with regular DuckDB SQL. On top of that, the extension exposes Lance-native search functionality through SQL table functions.</p>

<p>This fits naturally with how DuckDB is already used: as a single, embedded SQL query engine that operates on many different data sources and file formats. With the Lance extension, DuckDB remains the familiar query engine, while Lance provides the storage, indexing, and search capabilities underneath, which is especially beneficial when your data is multimodal and includes embeddings.</p>

<h3 id="example-usage">Example Usage</h3>

<p>Installing and using the extension is straightforward:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> lance</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> lance</span><span class="p">;</span>

<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="s1">'path/to/dataset.lance'</span>
<span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</code></pre></div></div>

<p>DuckDB can also write Lance datasets directly:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">COPY</span> <span class="p">(</span>
    <span class="k">SELECT</span> <span class="o">*</span>
    <span class="k">FROM</span> <span class="p">(</span>
        <span class="k">VALUES</span>
            <span class="p">(</span><span class="mi">1</span><span class="p">::</span><span class="nb">BIGINT</span><span class="p">,</span> <span class="s1">'duck'</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">3</span><span class="p">]),</span>
            <span class="p">(</span><span class="mi">2</span><span class="p">::</span><span class="nb">BIGINT</span><span class="p">,</span> <span class="s1">'horse'</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">3</span><span class="p">]),</span>
            <span class="p">(</span><span class="mi">3</span><span class="p">::</span><span class="nb">BIGINT</span><span class="p">,</span> <span class="s1">'dragon'</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span>
        <span class="p">)</span> <span class="k">AS</span> <span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">animal</span><span class="p">,</span> <span class="n">vec</span><span class="p">)</span>
<span class="p">)</span> <span class="k">TO</span> <span class="s1">'path/to/out.lance'</span> <span class="p">(</span><span class="k">FORMAT</span> <span class="k">lance</span><span class="p">,</span> <span class="n">mode</span> <span class="s1">'overwrite'</span><span class="p">);</span>
</code></pre></div></div>

<p>Once the data is in Lance, DuckDB can query it with Lance-native search operators. For example, hybrid search combines vector similarity and keyword relevance in one SQL query:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">_hybrid_score</span><span class="p">,</span> <span class="n">_distance</span><span class="p">,</span> <span class="n">_score</span>
<span class="k">FROM</span> <span class="nf">lance_hybrid_search</span><span class="p">(</span>
    <span class="s1">'path/to/dataset.lance'</span><span class="p">,</span>
    <span class="s1">'vec'</span><span class="p">,</span>
    <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span>
    <span class="s1">'text'</span><span class="p">,</span>
    <span class="s1">'puppy'</span><span class="p">,</span>
    <span class="n">k</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span>
    <span class="n">prefilter</span> <span class="o">=</span> <span class="k">false</span><span class="p">,</span>
    <span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">,</span>
    <span class="n">oversample_factor</span> <span class="o">=</span> <span class="mi">4</span>
<span class="p">)</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">_hybrid_score</span> <span class="k">DESC</span><span class="p">;</span>
</code></pre></div></div>

<p>The extension also exposes <code class="language-plaintext highlighter-rouge">lance_vector_search(...)</code> for vector similarity search and <code class="language-plaintext highlighter-rouge">lance_fts(...)</code> for full-text search, so users can choose the retrieval mode that fits their workload.</p>
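
<p>As a minimal sketch, and assuming these two functions take the same leading arguments and the same <code class="language-plaintext highlighter-rouge">k</code> named parameter as <code class="language-plaintext highlighter-rouge">lance_hybrid_search</code> above, the single-mode queries might look as follows. The dataset path, column names, and query values are placeholders, and the exact signatures should be checked against the extension documentation.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Vector-only search (assumed signature, mirroring lance_hybrid_search):</span>
<span class="c1">-- rank rows by distance to the query vector, closest first</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">_distance</span>
<span class="k">FROM</span> <span class="nf">lance_vector_search</span><span class="p">(</span>
    <span class="s1">'path/to/dataset.lance'</span><span class="p">,</span>
    <span class="s1">'vec'</span><span class="p">,</span>
    <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span>
    <span class="n">k</span> <span class="o">=</span> <span class="mi">10</span>
<span class="p">)</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">_distance</span> <span class="k">ASC</span><span class="p">;</span>

<span class="c1">-- Keyword-only search (assumed signature):</span>
<span class="c1">-- rank rows by full-text relevance, most relevant first</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">_score</span>
<span class="k">FROM</span> <span class="nf">lance_fts</span><span class="p">(</span>
    <span class="s1">'path/to/dataset.lance'</span><span class="p">,</span>
    <span class="s1">'text'</span><span class="p">,</span>
    <span class="s1">'puppy'</span><span class="p">,</span>
    <span class="n">k</span> <span class="o">=</span> <span class="mi">10</span>
<span class="p">)</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">_score</span> <span class="k">DESC</span><span class="p">;</span>
</code></pre></div></div>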

<p>If you want table-style access instead of path-based access, you can attach a directory as a Lance namespace:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'path/to/dir'</span> <span class="k">AS</span> <span class="n">ns</span> <span class="p">(</span><span class="k">TYPE</span> <span class="k">lance</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">ns.main.my_table</span><span class="p">;</span>
</code></pre></div></div>

<p>Index creation also happens through SQL. For example, a vector index can be created directly on a Lance dataset:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">vec_idx</span> <span class="k">ON</span> <span class="s1">'path/to/dataset.lance'</span> <span class="p">(</span><span class="n">vec</span><span class="p">)</span>
<span class="k">USING</span> <span class="k">IVF_FLAT</span> <span class="k">WITH</span> <span class="p">(</span><span class="n">num_partitions</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">metric_type</span> <span class="o">=</span> <span class="s1">'l2'</span><span class="p">);</span>
</code></pre></div></div>

<p>The extension surface goes well beyond read-only scans. In the current implementation, DuckDB can:</p>

<ul>
  <li>Read Lance datasets with direct path scans</li>
  <li>Write and append Lance datasets with <code class="language-plaintext highlighter-rouge">COPY ... TO ... (FORMAT lance)</code></li>
  <li>Run vector, full-text, and hybrid search with SQL functions</li>
  <li>Attach local directories or custom catalogs via REST namespaces</li>
  <li>Create, update, delete, merge, and alter tables in attached namespaces</li>
  <li>Create and manage vector, scalar, and full-text indexes</li>
  <li>Run maintenance operations such as compaction, cleanup, and index optimization</li>
</ul>

<p>In other words, the extension is more than a file reader: it gives DuckDB users a way to work with Lance as an operational table format from inside SQL.</p>
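
<p>To illustrate the table operations in the list above, here is a hedged sketch of what they might look like against an attached namespace. The <code class="language-plaintext highlighter-rouge">animals</code> table and its columns are hypothetical, and we assume that the standard DuckDB statements work unchanged on attached Lance tables, so check the extension documentation for the exact supported subset.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'path/to/dir'</span> <span class="k">AS</span> <span class="n">ns</span> <span class="p">(</span><span class="k">TYPE</span> <span class="k">lance</span><span class="p">);</span>

<span class="c1">-- Create a hypothetical table from a query</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">ns.main.animals</span> <span class="k">AS</span>
    <span class="k">SELECT</span> <span class="o">*</span>
    <span class="k">FROM</span> <span class="p">(</span><span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'duck'</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'horse'</span><span class="p">))</span> <span class="k">AS</span> <span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">animal</span><span class="p">);</span>

<span class="c1">-- Change and remove rows in place</span>
<span class="k">UPDATE</span> <span class="n">ns.main.animals</span> <span class="k">SET</span> <span class="n">animal</span> <span class="o">=</span> <span class="s1">'mallard'</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">DELETE</span> <span class="k">FROM</span> <span class="n">ns.main.animals</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>

<span class="c1">-- Evolve the schema by adding a column</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">ns.main.animals</span> <span class="k">ADD</span> <span class="k">COLUMN</span> <span class="n">legs</span> <span class="nb">INTEGER</span><span class="p">;</span>
</code></pre></div></div>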

<h2 id="why-lance-and-duckdb">Why Lance and DuckDB?</h2>

<p>The combination of Lance and DuckDB is compelling for three reasons.</p>

<p>First, it gives users one SQL surface for analytics plus retrieval. The same DuckDB workflow can scan a dataset, filter it, join it with other tables, compute aggregates, and then run vector search or hybrid search over the result set. That is a good fit for AI applications where retrieval is only one step in a larger analytical pipeline.</p>
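
<p>A rough sketch of such a pipeline is shown below. It reuses the <code class="language-plaintext highlighter-rouge">lance_hybrid_search</code> call from the example earlier in this post as the retrieval step, then joins and aggregates the matches in regular DuckDB SQL. The Parquet file and its <code class="language-plaintext highlighter-rouge">license</code> column are hypothetical and only serve as an illustration.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Retrieval step: top-10 hybrid matches from the Lance dataset</span>
<span class="c1">-- Analytical step: join with a hypothetical metadata file and aggregate</span>
<span class="k">SELECT</span> <span class="n">m.license</span><span class="p">,</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="n">matches</span><span class="p">,</span> <span class="nf">max</span><span class="p">(</span><span class="n">s._hybrid_score</span><span class="p">)</span> <span class="k">AS</span> <span class="n">best_score</span>
<span class="k">FROM</span> <span class="nf">lance_hybrid_search</span><span class="p">(</span>
    <span class="s1">'path/to/dataset.lance'</span><span class="p">,</span>
    <span class="s1">'vec'</span><span class="p">,</span>
    <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span>
    <span class="s1">'text'</span><span class="p">,</span>
    <span class="s1">'puppy'</span><span class="p">,</span>
    <span class="n">k</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span>
    <span class="n">prefilter</span> <span class="o">=</span> <span class="k">false</span><span class="p">,</span>
    <span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">,</span>
    <span class="n">oversample_factor</span> <span class="o">=</span> <span class="mi">4</span>
<span class="p">)</span> <span class="k">AS</span> <span class="n">s</span>
<span class="k">JOIN</span> <span class="nf">read_parquet</span><span class="p">(</span><span class="s1">'path/to/metadata.parquet'</span><span class="p">)</span> <span class="k">AS</span> <span class="n">m</span> <span class="k">USING</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">m.license</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">best_score</span> <span class="k">DESC</span><span class="p">;</span>
</code></pre></div></div>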

<p>Second, Lance is a table format for more than traditional analytics. Many AI pipelines need versioned datasets, updates, deletes, <code class="language-plaintext highlighter-rouge">MERGE</code>-style changes, index management, and schema evolution. The DuckDB extension exposes these capabilities through SQL, which means users do not need to leave the DuckDB environment just because their dataset is doing more than serving analytical reads.</p>

<p>Third, the workflow scales from local files to remote storage without changing the mental model. You can start with a local <code class="language-plaintext highlighter-rouge">lance</code> dataset, then <a href="/docs/current/core_extensions/lance.html#query-a-lance-dataset">move to object storage</a>.</p>
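
<p>As a hedged sketch of that progression, the query below has the same shape as the local example and only swaps the path. The bucket name is hypothetical, and we assume here that the extension picks up S3 credentials from DuckDB's secrets manager; the linked documentation is the authoritative reference for the supported storage backends and authentication options.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Assumption: credentials come from the standard AWS credential chain</span>
<span class="k">CREATE</span> <span class="k">SECRET</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="n">s3</span><span class="p">,</span>
    <span class="k">PROVIDER</span> <span class="n">credential_chain</span>
<span class="p">);</span>

<span class="c1">-- Same query as for a local dataset, with a hypothetical bucket path</span>
<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="s1">'s3://my-bucket/path/to/dataset.lance'</span>
<span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</code></pre></div></div>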

<p>The extension also supports REST namespaces, so DuckDB can connect to a remote Lance catalog (including <a href="https://docs.lancedb.com/enterprise/index">LanceDB Enterprise</a>) and treat it like an attached database. That makes the local-to-remote storage progression feel incremental rather than disruptive.</p>

<p>To sum up, DuckDB remains the familiar SQL engine, while Lance adds storage and indexing features that are especially useful when the same dataset powers both analytics and retrieval.</p>

<h2 id="performance-experiment">Performance Experiment</h2>

<p><a href="https://laion.ai/">LAION</a> is an open dataset of image/caption pairs scraped from the public web, originally released to support research on models like <a href="https://openai.com/index/clip/">CLIP</a>, which learn a shared embedding space for images and text. The full release spans billions of pairs. For this experiment, we used the <code class="language-plaintext highlighter-rouge">lance-format/laion-1m</code> subset on Hugging Face Hub, which is easy to reproduce locally.</p>

<p>Each row carries a caption, a 768-dimensional CLIP image embedding, the raw image bytes, and scalar metadata like width, height, and NSFW flags. This mix of scalar, text, vector, and blob data in a single table makes it a useful workload for comparing formats, and it is structurally different from the wide-but-flat schemas like TPC-H or ClickBench that are traditionally used for analytical benchmarks.</p>

<p>The public Hugging Face export used by the benchmark currently materializes 69,632 rows locally, not the full million-row source dataset. The runner first downloads the public Parquet shards, then builds all local artifacts from that same baseline: an LZ4-compressed Parquet file, an indexed DuckDB database, and a Lance dataset. Generated files are reused across runs, so the initial download is the only networked step.</p>

<blockquote>
  <p>The experiments were run on an Apple MacBook Pro with a 10-core M1 Max CPU and 32 GB of RAM, running DuckDB 1.5.2.</p>
</blockquote>

<p>The benchmark was run using DuckDB as the query engine for the following three storage formats:</p>

<ul>
  <li><strong>Parquet</strong>: DuckDB scanning the LZ4-compressed Parquet baseline directly, with no auxiliary indexes.</li>
  <li><strong>DuckDB indexed</strong>: the same baseline loaded into a DuckDB table, with DuckDB's <code class="language-plaintext highlighter-rouge">vss</code> (HNSW) and <code class="language-plaintext highlighter-rouge">fts</code> extensions layered on top, plus scalar indexes on filter columns. This is the typical “build it yourself in DuckDB” stack.</li>
  <li><strong>Lance native</strong>: the same baseline written to a Lance dataset with a vector index, a full-text index, and native blob storage, queried through the DuckDB lance extension.</li>
</ul>

<p>The workloads are aligned by task across the three paths, even though the exact SQL differs by storage/indexing backend:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">fts</code>: find rows by keyword search over the caption text.</li>
  <li><code class="language-plaintext highlighter-rouge">vector_exact</code>: run nearest-neighbor search over the CLIP embedding column without using an approximate vector index.</li>
  <li><code class="language-plaintext highlighter-rouge">vector_indexed</code>: run nearest-neighbor search over the same embedding column using the available vector index.</li>
  <li><code class="language-plaintext highlighter-rouge">hybrid</code>: combine text search and vector search into one retrieval query, returning the best-ranked matches from both signals.</li>
  <li><code class="language-plaintext highlighter-rouge">blob_read</code>: fetch image bytes for selected rows, which exercises random access to large binary values rather than just scalar or vector columns.</li>
</ul>
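
<p>To make the <code class="language-plaintext highlighter-rouge">vector_exact</code> task concrete, here is a minimal sketch of brute-force nearest-neighbor search in plain DuckDB SQL. It uses the three-dimensional toy vectors from the <code class="language-plaintext highlighter-rouge">COPY</code> example above instead of the 768-dimensional LAION embeddings, and an arbitrary query vector. It illustrates the technique and is not the benchmark's actual query.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Exact k-nearest-neighbor search without an index:</span>
<span class="c1">-- compute the Euclidean distance to every row, then keep the k closest</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">animal</span><span class="p">,</span> <span class="nf">array_distance</span><span class="p">(</span><span class="n">vec</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.8</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span> <span class="k">AS</span> <span class="n">dist</span>
<span class="k">FROM</span> <span class="p">(</span>
    <span class="k">VALUES</span>
        <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'duck'</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">3</span><span class="p">]),</span>
        <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'horse'</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">3</span><span class="p">]),</span>
        <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'dragon'</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">]::</span><span class="nb">FLOAT</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span>
<span class="p">)</span> <span class="k">AS</span> <span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">animal</span><span class="p">,</span> <span class="n">vec</span><span class="p">)</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">dist</span>
<span class="k">LIMIT</span> <span class="mi">2</span><span class="p">;</span>
</code></pre></div></div>

<p>With this query vector, <code class="language-plaintext highlighter-rouge">duck</code> ranks first and <code class="language-plaintext highlighter-rouge">dragon</code> second. Because every row is scanned, the cost grows linearly with the number of rows, which is the gap that the <code class="language-plaintext highlighter-rouge">vector_indexed</code> workload is meant to close.</p>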

<p>Each workload is run five times by default, and the tables below report the average. The full scripts and SQL queries are in the <code class="language-plaintext highlighter-rouge">laion_1m</code> benchmark directory.</p>

<h3 id="cold-results">Cold Results</h3>

<p>The table below runs each workload cold, in a fresh DuckDB process, so it captures process startup, file open, and first-query cost. It's closest to what a one-off script or a cron job would see.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Workload</th>
      <th style="text-align: right">Parquet</th>
      <th style="text-align: right">DuckDB indexed</th>
      <th style="text-align: right">Lance native</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">fts</code></td>
      <td style="text-align: right">12 ms</td>
      <td style="text-align: right">11 ms</td>
      <td style="text-align: right">21 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">vector_exact</code></td>
      <td style="text-align: right">695 ms</td>
      <td style="text-align: right">61 ms</td>
      <td style="text-align: right">89 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">vector_indexed</code></td>
      <td style="text-align: right">761 ms</td>
      <td style="text-align: right">104 ms</td>
      <td style="text-align: right">12 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">hybrid</code></td>
      <td style="text-align: right">465 ms</td>
      <td style="text-align: right">80 ms</td>
      <td style="text-align: right">17 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">blob_read</code></td>
      <td style="text-align: right">1559 ms</td>
      <td style="text-align: right">271 ms</td>
      <td style="text-align: right">278 ms</td>
    </tr>
  </tbody>
</table>

<p>In the cold run, Lance stands out in the <code class="language-plaintext highlighter-rouge">vector_indexed</code> and <code class="language-plaintext highlighter-rouge">hybrid</code> workloads. DuckDB’s own format does well in <code class="language-plaintext highlighter-rouge">vector_exact</code> and <code class="language-plaintext highlighter-rouge">fts</code>, while the <code class="language-plaintext highlighter-rouge">blob_read</code> workload is pretty much on par. Parquet is not well optimized for vector searches or blob reads, but does well on a simple text search powered by regex.</p>

<h3 id="warm-results">Warm Results</h3>

<p>The warm results are from running all workloads in a single DuckDB session after a silent warmup pass, so caches, memory-mapped pages, and loaded indexes are already primed.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Workload</th>
      <th style="text-align: right">Parquet</th>
      <th style="text-align: right">DuckDB indexed</th>
      <th style="text-align: right">Lance native</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">fts</code></td>
      <td style="text-align: right">12 ms</td>
      <td style="text-align: right">10 ms</td>
      <td style="text-align: right">7 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">vector_exact</code></td>
      <td style="text-align: right">703 ms</td>
      <td style="text-align: right">30 ms</td>
      <td style="text-align: right">50 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">vector_indexed</code></td>
      <td style="text-align: right">755 ms</td>
      <td style="text-align: right">2 ms</td>
      <td style="text-align: right">5 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">hybrid</code></td>
      <td style="text-align: right">471 ms</td>
      <td style="text-align: right">11 ms</td>
      <td style="text-align: right">8 ms</td>
    </tr>
    <tr>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">blob_read</code></td>
      <td style="text-align: right">1484 ms</td>
      <td style="text-align: right">266 ms</td>
      <td style="text-align: right">276 ms</td>
    </tr>
  </tbody>
</table>

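<p>When caches and indexes are already warm, both DuckDB and Lance are significantly faster than Parquet on the vector, hybrid, and blob workloads: for <code class="language-plaintext highlighter-rouge">vector_indexed</code>, they take 2 ms and 5 ms, compared to 755 ms for Parquet. Keyword search is the exception, where all three paths finish within 7 to 12 ms.</p>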

<h2 id="conclusion">Conclusion</h2>

<p>Lance is a relatively new addition to the world of open lakehouse formats. It is designed for datasets that change over time, contain more than scalar values, and need to support both search and retrieval alongside traditional scan workloads. From DuckDB, the extension makes these capabilities available through SQL, while preserving the familiar embedded workflow. The benchmark results reflect, particularly in cold runs, how Lance is a good alternative to DuckDB’s own format for vector and hybrid search.</p>

<blockquote>
  <p>The Lance support in DuckDB was made possible through a collaboration between <a href="https://ducklabs.com/">DuckLabs</a> and <a href="https://www.lancedb.com/">LanceDB</a>.</p>
</blockquote>]]></content><author><name>LanceDB team and Guillermo Sanchez</name></author><category term="extensions" /><summary type="html"><![CDATA[Lance is an open lakehouse format with a design geared toward AI workloads. LanceDB and DuckLabs have partnered to bring you fast vector and hybrid search directly from DuckDB SQL, without leaving your analytical workflow. In this post, we explain what Lance is, how to use it in DuckDB, and, of course, show some benchmark results.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/testing-lance.jpg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/testing-lance.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DuckDB 1.5.3: Not an Ordinary Patch Release</title><link href="https://duckdb.org/2026/05/20/announcing-duckdb-153.html" rel="alternate" type="text/html" title="DuckDB 1.5.3: Not an Ordinary Patch Release" /><published>2026-05-20T00:00:00+00:00</published><updated>2026-05-20T00:00:00+00:00</updated><id>https://duckdb.org/2026/05/20/announcing-duckdb-153</id><content type="html" xml:base="https://duckdb.org/2026/05/20/announcing-duckdb-153.html"><![CDATA[<p>In this blog post, we highlight a few important features shipped in DuckDB v1.5.3, the third patch release in <a href="/2026/03/09/announcing-duckdb-150.html">DuckDB's v1.5 line</a>.
You can find the complete <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.3">release notes on GitHub</a>.</p>

<p>To install the new version, please visit the <a href="/install/">installation page</a>.</p>

<h2 id="whats-new">What's New</h2>

<p>While DuckDB v1.5.3 is a patch release, its extensions bring various new features.
We list these below.</p>

<h3 id="quack-as-a-core-extension">Quack as a Core Extension</h3>

<p>On May 12, we introduced Quack, our new remote protocol that turns DuckDB into a client-server database.
If you are new to Quack and don't know where to start, check out the following resources:</p>

<ul>
  <li>For a high-level overview, see the <a href="/quack/">Quack explainer page</a>.</li>
  <li>For the rationale and history behind Quack, along with an introduction to the protocol and its features, see the <a href="/2026/05/12/quack-remote-protocol.html">announcement blog post</a>.</li>
  <li>For the reference manual and setup guide, check out the <a href="/docs/current/quack/overview.html">Quack documentation</a>.</li>
</ul>

<p>Starting from DuckDB v1.5.3, we ship Quack as a <a href="/docs/current/core_extensions/quack.html">core extension</a>. This means that you can now start using Quack right away from any client running DuckDB:
it will be transparently autoinstalled and <a href="/docs/current/extensions/overview.html#autoloading-extension">autoloaded</a> on first use.</p>

<!-- markdownlint-disable MD001 -->

<div class="duck-diagram">

  <div class="duck-diagram-box">

    <h4 id="duckdb-server"><svg class="icon"><use href="#database-01"></use></svg> DuckDB Server</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CALL</span> <span class="nf">quack_serve</span><span class="p">(</span>
    <span class="s1">'quack:localhost'</span><span class="p">,</span>
    <span class="k">token</span> <span class="o">=</span> <span class="s1">'super_secret'</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">hello</span> <span class="k">AS</span>
    <span class="k">FROM</span> <span class="k">VALUES</span> <span class="p">(</span><span class="s1">'world'</span><span class="p">)</span> <span class="n">v</span><span class="p">(</span><span class="n">s</span><span class="p">);</span>
</code></pre></div>    </div>

  </div>

  <div class="duck-diagram-arrow">quack:</div>

  <div class="duck-diagram-box">

    <h4 id="duckdb-client"><svg class="icon"><use href="#database-01"></use></svg> DuckDB Client</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">SECRET</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="n">quack</span><span class="p">,</span>
    <span class="k">TOKEN</span> <span class="s1">'super_secret'</span>
<span class="p">);</span>

<span class="k">ATTACH</span> <span class="s1">'quack:localhost'</span> <span class="k">AS</span> <span class="n">remote</span><span class="p">;</span>
<span class="k">FROM</span> <span class="n">remote.hello</span><span class="p">;</span>
</code></pre></div>    </div>

  </div>

</div>

<!-- markdownlint-enable MD001 -->

<p>Please note that Quack is still in beta, and breaking changes may happen in the protocol, in function names, etc.
We plan to release the production-ready version of Quack together with <a href="/release_calendar.html">DuckDB v2.0</a> in fall 2026.</p>

<h3 id="ducklake-with-quack">DuckLake with Quack</h3>

<p>DuckLake now supports DuckDB with Quack as its catalog database (<a href="https://github.com/duckdb/ducklake/pull/1151">ducklake#1151</a>).
Let the example speak for itself!</p>

<!-- markdownlint-disable MD001 -->

<div class="duck-diagram">

  <div class="duck-diagram-box">

    <h4 id="duckdb-server-1"><svg class="icon"><use href="#database-01"></use></svg> DuckDB Server</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CALL</span> <span class="nf">quack_serve</span><span class="p">(</span>
    <span class="s1">'quack:localhost'</span><span class="p">,</span>
    <span class="k">token</span> <span class="o">=&gt;</span> <span class="s1">'oogieboogie'</span>
<span class="p">);</span>
</code></pre></div>    </div>

  </div>

  <div class="duck-diagram-arrow">quack:</div>

  <div class="duck-diagram-box">

    <h4 id="duckdb-client-1"><svg class="icon"><use href="#database-01"></use></svg> DuckDB Client</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> ducklake</span><span class="p">;</span>

<span class="k">CREATE</span> <span class="k">SECRET</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="n">quack</span><span class="p">,</span> <span class="k">TOKEN</span> <span class="s1">'oogieboogie'</span>
<span class="p">);</span>
<span class="k">ATTACH</span> <span class="s1">'ducklake:quack:localhost'</span>
    <span class="k">AS</span> <span class="n">lake</span> <span class="p">(</span><span class="k">DATA_PATH</span> <span class="s1">'data'</span><span class="p">);</span>
<span class="k">USE</span> <span class="n">lake</span><span class="p">;</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">pond</span> <span class="p">(</span>
    <span class="n">id</span> <span class="nb">INT</span><span class="p">,</span>
    <span class="n">species</span> <span class="nb">VARCHAR</span><span class="p">,</span>
    <span class="n">weight</span> <span class="nb">DOUBLE</span>
<span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">pond</span> <span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'mallard'</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'pintail'</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">pond</span> <span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'wood duck'</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">pond</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">id</span><span class="p">;</span>
</code></pre></div>    </div>

  </div>

</div>

<!-- markdownlint-enable MD001 -->

<h3 id="aws-extension-features">AWS Extension Features</h3>

<p>The <a href="/docs/current/core_extensions/aws.html">AWS extension</a> now supports the <a href="https://github.com/duckdb/duckdb-aws/pull/136"><code class="language-plaintext highlighter-rouge">web_identity</code> chain type for IAM Roles for Service Accounts (IRSA)</a>.
This was made possible through a contribution by community member <a href="https://github.com/mst">Marcel Steinbach (@mst)</a>.</p>

<p>The AWS extension now also supports <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.html">IAM authentication</a> for managed PostgreSQL databases running on RDS/Aurora. For more details, see the <a href="/docs/current/core_extensions/postgres/secrets.html#aws-rds-iam-authentication">AWS RDS IAM Authentication section</a> in the documentation.</p>

<h3 id="http_proxy-variable-for-the-https-extension"><code class="language-plaintext highlighter-rouge">HTTP_PROXY</code> Variable for the HTTPS Extension</h3>

<p>Setting the <code class="language-plaintext highlighter-rouge">HTTP_PROXY</code> environment variable now sets the <code class="language-plaintext highlighter-rouge">http_proxy</code> DuckDB configuration option (<a href="https://github.com/duckdb/duckdb/pull/22541">duckdb#22541</a>).
This option makes sure that extension installs also pass through the proxy, which may come in handy, e.g., in environments that use firewalls.</p>

<p>Note that since the <a href="/2026/03/09/announcing-duckdb-150.html#network-stack">introduction of <code class="language-plaintext highlighter-rouge">curl</code> into DuckDB's network stack</a>, <code class="language-plaintext highlighter-rouge">curl</code> automatically uses <code class="language-plaintext highlighter-rouge">HTTP_PROXY</code> and <code class="language-plaintext highlighter-rouge">HTTPS_PROXY</code>, so DuckDB now implicitly honors those variables as well when the <code class="language-plaintext highlighter-rouge">httpfs</code> extension is loaded with the default <code class="language-plaintext highlighter-rouge">curl</code> backend.</p>
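
<p>To check which proxy is in effect, the <code class="language-plaintext highlighter-rouge">http_proxy</code> option can be inspected from SQL, and it can still be set explicitly instead of relying on the environment variable. The snippet below is a small sketch; the proxy host and port are placeholders, not real endpoints.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Show the value currently in effect for the http_proxy option
SELECT current_setting('http_proxy') AS http_proxy;

-- Explicit alternative to the environment variable
-- (proxy.example.com:3128 is a placeholder)
SET http_proxy = 'proxy.example.com:3128';
</code></pre></div></div>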

<h3 id="iceberg">Iceberg</h3>

<p>The DuckDB-Iceberg extension has shipped a number of features between DuckDB v1.5.2 and v1.5.3. Most notably:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">MERGE INTO</code> is now supported for Iceberg tables (<a href="https://github.com/duckdb/duckdb-iceberg/pull/788">iceberg#788</a>)</li>
  <li>The <code class="language-plaintext highlighter-rouge">INSERT</code> and <code class="language-plaintext highlighter-rouge">UPDATE</code> statements are now supported on partitioned Iceberg tables with a <code class="language-plaintext highlighter-rouge">truncate</code> or <code class="language-plaintext highlighter-rouge">bucket</code> transform (<a href="https://github.com/duckdb/duckdb-iceberg/pull/879">iceberg#879</a>)</li>
  <li><a href="/docs/current/sql/statements/create_table.html#create-table--as-select-ctas">CTAS</a> statements in DuckDB-Iceberg using <a href="/docs/current/clients/adbc.html">ADBC</a> are now possible (<a href="https://github.com/duckdb/duckdb-iceberg/pull/974">iceberg#974</a>)</li>
  <li>We added the <code class="language-plaintext highlighter-rouge">iceberg_schema_properties</code>, <code class="language-plaintext highlighter-rouge">set_iceberg_schema_properties</code>, and <code class="language-plaintext highlighter-rouge">remove_iceberg_schema_properties</code> functions to allow getting, setting, and removing Iceberg schema properties (<a href="https://github.com/duckdb/duckdb-iceberg/pull/960">iceberg#960</a>)</li>
  <li><code class="language-plaintext highlighter-rouge">ALTER TABLE</code> support has been added for Iceberg tables (<a href="https://github.com/duckdb/duckdb-iceberg/pull/932">iceberg#932</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/928">iceberg#928</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/924">iceberg#924</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/912">iceberg#912</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/904">iceberg#904</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/853">iceberg#853</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/985">iceberg#985</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/981">iceberg#981</a>)</li>
  <li>Support for the <code class="language-plaintext highlighter-rouge">GEOMETRY</code> type has been added for Iceberg tables (<a href="https://github.com/duckdb/duckdb-iceberg/pull/968">iceberg#968</a>, <a href="https://github.com/duckdb/duckdb-iceberg/pull/902">iceberg#902</a>)</li>
</ul>
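
<p>For readers who have not used <code class="language-plaintext highlighter-rouge">MERGE INTO</code> before, here is a minimal sketch of the statement's shape. To keep it runnable without a catalog, it uses ordinary local tables with made-up names. With the Iceberg extension, the target would instead be a table in an attached Iceberg catalog, whose setup is not shown here.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical target table and a staging table with incoming changes
CREATE TABLE ducks (id INTEGER, species VARCHAR, weight DOUBLE);
INSERT INTO ducks VALUES (1, 'mallard', 1.2), (2, 'pintail', 0.9);

CREATE TABLE duck_updates (id INTEGER, species VARCHAR, weight DOUBLE);
INSERT INTO duck_updates VALUES (2, 'pintail', 1.0), (3, 'wood duck', 0.7);

-- Update rows that already exist, insert the ones that do not
MERGE INTO ducks
    USING duck_updates AS upd
    ON ducks.id = upd.id
    WHEN MATCHED THEN UPDATE SET weight = upd.weight
    WHEN NOT MATCHED THEN INSERT;

SELECT * FROM ducks ORDER BY id;
</code></pre></div></div>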

<h2 id="development-and-internals">Development and Internals</h2>

<h3 id="shipping-jemalloc-as-a-statically-linked-library">Shipping jemalloc as a Statically Linked Library</h3>

<p>The <a href="/docs/current/internals/jemalloc.html">jemalloc allocator</a> is now part of core DuckDB (<a href="https://github.com/duckdb/duckdb/pull/22603">duckdb#22603</a>) as a static third-party library which is included and linked by default on Linux.
Previously, jemalloc was shipped as a statically linked extension. The new packaging is cleaner, since other DuckDB extensions can be loaded dynamically.</p>

<h3 id="disable_extension_load-flag"><code class="language-plaintext highlighter-rouge">DISABLE_EXTENSION_LOAD</code> Flag</h3>

<p>The <code class="language-plaintext highlighter-rouge">DISABLE_EXTENSION_LOAD</code> compile-time flag was fixed in <a href="https://github.com/duckdb/duckdb/pull/22019">duckdb#22019</a>.
When compiling DuckDB with this flag, loading extensions is disabled.</p>

<h2 id="coming-up">Coming Up</h2>

<p>We have two events coming up in the next few weeks:</p>

<p><strong>DuckCon #7.</strong> On June 24, we'll host our next user conference, <a href="/events/2026/06/24/duckcon7/">DuckCon #7</a>, in Amsterdam's beautiful <a href="https://www.kit.nl/about-us/">Royal Tropical Institute</a>.</p>

<p><strong>Ubuntu Summit Talk.</strong> Next week, Gábor Szárnyas of DuckDB Labs will give a talk titled <a href="/library/duckdb-not-quack-science/">“DuckDB: Not Quack Science”</a> at the <a href="https://ubuntu.com/summit">Ubuntu Summit</a>. Yes, his talk will include the new <a href="#quack-as-a-core-extension">Quack</a> protocol.</p>

<h2 id="conclusion">Conclusion</h2>

<p>This post is a short summary of the changes in v1.5.3. As usual, you can find the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.3">full release notes on GitHub</a>.</p>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[We are releasing DuckDB version v1.5.3. While updates in DuckDB itself are limited bugfixes, the upgraded extensions shipped with v1.5.3 bring a ton of new features. These include the Quack client-server protocol, which is now available as a core extension, support for Quack in DuckLake, and several new features for Iceberg, AWS and HTTPS.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-3.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-3.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Quack: The DuckDB Client-Server Protocol</title><link href="https://duckdb.org/2026/05/12/quack-remote-protocol.html" rel="alternate" type="text/html" title="Quack: The DuckDB Client-Server Protocol" /><published>2026-05-12T00:00:00+00:00</published><updated>2026-05-12T00:00:00+00:00</updated><id>https://duckdb.org/2026/05/12/quack-remote-protocol</id><content type="html" xml:base="https://duckdb.org/2026/05/12/quack-remote-protocol.html"><![CDATA[<div class="video-container">
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/L_lttD-d1wc?si=Gd8WfFnRfXEV-M1o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>
</div>

<h2 id="background-database-architectures">Background: Database Architectures</h2>

<p>When databases first emerged, there was no distinction between a ‘client’ and a ‘server’: the whole database just ran on a single computer. Somewhere in the 80s, <a href="https://en.wikipedia.org/wiki/Sybase">Sybase</a> was the first to introduce the concept of a database ‘server’ and a ‘client’ running on different computers. Ever since, it has been assumed that every database system uses a client-server architecture along with a communication protocol to talk between the two. This was convenient, because the single mutable state stays in a single place under the control of a server, and many clients can read and write data at the same time. There are of course drawbacks to this method: most notably, those protocols can add a significant amount of overhead. If you’re curious to read more, we <a href="/library/dont-hold-my-data-hostage/">wrote a research paper</a> on database protocols a while back.</p>

<p>Of course, there were always dissenters to the client-server architecture, most notably the ubiquitous <a href="https://sqlite.org">SQLite</a> in 2000, and of course DuckDB, first released in 2019. We made <a href="https://www.youtube.com/watch?v=9OFzOvV-to4">quite</a> <a href="https://www.youtube.com/watch?v=5ddoZR6PYNU">a lot</a> <a href="https://www.youtube.com/watch?v=Z-6SnP6yzgo">of</a> noise about implementing an in-process architecture, where there is no client-server, no protocol, just low-level API calls. This worked really well for interactive use cases in e.g., data science, where analysts would interact with their data for example in a Python notebook and their data was managed in a DuckDB instance running in the very same process. It also worked really well for the many use cases where DuckDB was just “glued” to an existing application to provide SQL functionality on data living in that application.</p>

<p>Being an in-process system works “less well” when multiple processes try to modify the same database file at the same time. There are a lot of use cases where this is relevant, for example, when a bunch of processes collecting telemetry insert into the same database while the same tables are queried to drive a dashboard. There are very good technical reasons why we could not make this work, most notably, the fact that DuckDB keeps a bunch of state in main memory and would have to synchronize that state if multiple processes start making changes simultaneously.</p>

<p>And yes, there were workarounds. Of course you can whip up a custom <a href="https://en.wikipedia.org/wiki/Remote_procedure_call">Remote Procedure Call</a> (RPC) solution where there is a process that holds the DuckDB database instance and offers a service to other processes to query and insert data. There are also multiple projects out there that retrofit client-server abilities to DuckDB, for example using the <a href="https://arrow.apache.org/docs/format/FlightSql.html">Arrow Flight SQL protocol</a>. <a href="https://motherduck.com">MotherDuck</a> has its own custom client-server protocol. And of course, you can always (gasp) switch to a more traditional database system that has client-server support, for example the also-ubiquitous PostgreSQL. You can then even proceed to run a so-called “<a href="https://en.wikipedia.org/wiki/Turducken">EleDucken</a>”, i.e., DuckDB inside said PostgreSQL, using one of the various extensions out there that enable this, for example <a href="https://github.com/duckdb/pg_duckdb">pg_duckdb</a>.</p>

<p>The vast number of workarounds people built to bolt a client-server solution onto DuckDB has at the very least convinced us that this is something people care about. We see DuckDB as a universal data wrangling tool. If this means having a client-server protocol in addition to the in-process capabilities – fine. If this ends up unlocking a vast new set of cases in which DuckDB can be useful – excellent! In the end we care deeply about user experience and perhaps less about having the last word on architecture. So we bit the bullet, eventually, finally, and today we are very happy to announce the result:</p>

<h2 id="introducing-the-quack-protocol-for-duckdb">Introducing the Quack Protocol for DuckDB</h2>

<p>What do two (or more) ducks do if they want to talk to each other? They <a href="https://en.wikipedia.org/wiki/Duck#Communication">quack</a>! So it is quite natural that we call the protocol that two DuckDB instances can use to talk to each other “Quack”, too! We had the opportunity to design a database protocol from scratch in 2026 without having to consider any legacy, which is quite a luxury. We were able to learn from the existing protocols, including the more recent Arrow Flight SQL and others. Before we dive into how Quack works internally, let's see how it works from a user perspective. First, you need two DuckDB instances. That’s right, DuckDB will act both as a client and as a server! The two instances can be on different computers a world apart (or in space) or just two different terminal windows on your laptop. Next, we need to install the Quack extension in both DuckDB instances. For now, Quack lives in the <code class="language-plaintext highlighter-rouge">core_nightly</code> repository and is available in <a href="/install/">DuckDB v1.5.2</a>, the current release version:</p>
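
<p>A minimal sketch of that installation step is shown here. It assumes the extension is registered under the name <code class="language-plaintext highlighter-rouge">quack</code>, which is inferred from the <code class="language-plaintext highlighter-rouge">quack_serve</code> function and the <code class="language-plaintext highlighter-rouge">quack</code> secret type rather than confirmed in this post.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Run in both DuckDB instances.
-- Assumption: the extension name is 'quack'.
INSTALL quack FROM core_nightly;
LOAD quack;
</code></pre></div></div>

<p>With the extension loaded on both sides, the two instances can be wired up as follows:</p>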

<!-- markdownlint-disable MD001 -->

<div class="duck-diagram">

  <div class="duck-diagram-box">

    <h4 id="duckdb-1"><svg class="icon"><use href="#database-01"></use></svg> DuckDB #1</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CALL</span> <span class="nf">quack_serve</span><span class="p">(</span>
    <span class="s1">'quack:localhost'</span><span class="p">,</span>
    <span class="k">token</span> <span class="o">=</span> <span class="s1">'super_secret'</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">hello</span> <span class="k">AS</span>
    <span class="k">FROM</span> <span class="k">VALUES</span> <span class="p">(</span><span class="s1">'world'</span><span class="p">)</span> <span class="n">v</span><span class="p">(</span><span class="n">s</span><span class="p">);</span>
</code></pre></div>    </div>

  </div>

  <div class="duck-diagram-arrow">quack:</div>

  <div class="duck-diagram-box">

    <h4 id="duckdb-2"><svg class="icon"><use href="#database-01"></use></svg> DuckDB #2</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">SECRET</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="n">quack</span><span class="p">,</span>
    <span class="k">TOKEN</span> <span class="s1">'super_secret'</span>
<span class="p">);</span>

<span class="k">ATTACH</span> <span class="s1">'quack:localhost'</span> <span class="k">AS</span> <span class="n">remote</span><span class="p">;</span>
<span class="k">FROM</span> <span class="n">remote.hello</span><span class="p">;</span>
</code></pre></div>    </div>

  </div>

</div>

<!-- markdownlint-enable MD001 -->

<p>This should show the content of the remote table <code class="language-plaintext highlighter-rouge">hello</code>, namely <code class="language-plaintext highlighter-rouge">world</code>, in DuckDB #2. Witchcraft! We can also copy data from the local instance to the remote one:</p>

<!-- markdownlint-disable MD001 -->

<div class="duck-diagram">

  <div class="duck-diagram-box">

    <h4 id="duckdb-1-1"><svg class="icon"><use href="#database-01"></use></svg> DuckDB #1</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>


<span class="c1">-- Step two</span>
<span class="k">FROM</span> <span class="n">hello2</span><span class="p">;</span>
</code></pre></div>    </div>

  </div>

  <div class="duck-diagram-arrow">quack:</div>

  <div class="duck-diagram-box">

    <h4 id="duckdb-2-1"><svg class="icon"><use href="#database-01"></use></svg> DuckDB #2</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Step one</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">remote.hello2</span> <span class="k">AS</span>
    <span class="k">FROM</span> <span class="k">VALUES</span> <span class="p">(</span><span class="s1">'world2'</span><span class="p">)</span> <span class="n">v</span><span class="p">(</span><span class="n">s</span><span class="p">);</span>
</code></pre></div>    </div>

  </div>

</div>

<!-- markdownlint-enable MD001 -->

<p>Similarly, you should see <code class="language-plaintext highlighter-rouge">world2</code> in the output on DuckDB #1. Obviously those are the most basic examples we can think of. Tables can be much more complex, queries can be much more complex, data volumes can be quite vast (see below). There is also a way to just ship an entire verbatim query to the remote side using the <code class="language-plaintext highlighter-rouge">query</code> function, which is better for very complex queries on large datasets and offers more control over what exactly is executed remotely:</p>

<!-- markdownlint-disable MD001 -->

<div class="duck-diagram">

  <div class="duck-diagram-box">

    <h4 id="duckdb-1-2"><svg class="icon"><use href="#database-01"></use></svg> DuckDB #1</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Waiting to serve data</span>
</code></pre></div>    </div>

  </div>

  <div class="duck-diagram-arrow">quack:</div>

  <div class="duck-diagram-box">

    <h4 id="duckdb-2-2"><svg class="icon"><use href="#database-01"></use></svg> DuckDB #2</h4>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span> <span class="n">remote.</span><span class="nf">query</span><span class="p">(</span>
    <span class="s1">'SELECT s FROM hello'</span>
<span class="p">);</span>
</code></pre></div>    </div>

  </div>

</div>

<!-- markdownlint-enable MD001 -->

<p>Of course there is much more to see here. Please <a href="/docs/current/quack/overview.html">consult our documentation</a> for more details.</p>

<h2 id="protocol-design">Protocol Design</h2>

<h3 id="http-based">HTTP-Based</h3>

<p>Quack is built straight on the venerable HTTP, the Hypertext Transfer Protocol. From its humble beginnings at CERN, HTTP has become a de facto protocol layer on top of TCP and all the stuff below. The entire stack is optimized to transmit HTTP message streams efficiently. The protocol has surprisingly low overhead if implemented properly. Everyone and their little brother knows how to deal with HTTP in load balancing, authentication, firewalls, intrusion detection, etc. It would be rather misguided not to build a database protocol on top of HTTP in 2026. HTTP also allows the <a href="/docs/current/quack/setup/quack_wasm.html">DuckDB-Wasm distribution to speak Quack natively</a>! So DuckDB running in a browser can, e.g., directly connect to a DuckDB instance running on an EC2 server using Quack.</p>

<h3 id="request-response-pattern">Request-Response Pattern</h3>

<p>Interactions on Quack are always driven by the client in a request-response pattern. One kind of Quack message is the connection request, which authenticates the client with a token as seen above. See below for how authentication and authorization are handled in detail. Subsequent messages are requests to execute a query and return the first part of the response, followed by fetch messages to retrieve large results, possibly from multiple threads in parallel.</p>

<h3 id="serialization">Serialization</h3>

<p>Requests and responses are encoded using the new MIME type <code class="language-plaintext highlighter-rouge">application/duckdb</code>. This encoding leverages DuckDB’s internal efficient serialization primitives for complex structures like data types and result sets. We have been using the same primitives for example in our Write-Ahead Log (WAL) files for years, meaning they are fairly well-optimized and battle-tested.</p>

<h3 id="encryption">Encryption</h3>

<p>While we want Quack to “just work”, we are also wary of the security nightmares of attaching a database directly to the evil internet, as has happened before. This is why Quack will by default generate a random authentication token at server start-up, which then has to be given to the client. In addition, the Quack server will by default only bind to localhost (which can of course be overridden). Quack does not use SSL by default, because it is a bit silly to bring all that infrastructure and add dependencies just for localhost communication. We do not recommend opening up a DuckDB Quack endpoint directly to the Internet. Instead, if you choose to expose Quack to the World Wide Web, we strongly recommend that you put a common HTTP reverse proxy like <a href="https://nginx.org/">nginx</a> in front of it and have that proxy terminate SSL (e.g., with Let's Encrypt). The Quack client will assume SSL is enabled for non-local connections, although this can be overridden. We provide a <a href="/docs/current/quack/setup/reverse_proxy.html">guide for this in our documentation</a>.</p>

<h3 id="round-trips">Round-Trips</h3>

<p>We have been careful to optimize the number of protocol round trips or request/response pairs for queries. Once connected, a query can be completely handled with a single round trip. This is a critical optimization for latency-sensitive environments. At the same time, we have seriously optimized Quack for efficient bulk response transfer. As far as we know, Quack is currently the fastest way to shove tables through a socket, and millions of rows can be transferred in a few seconds. Below are a few benchmark results.</p>

<h3 id="authentication-and-authorization">Authentication and Authorization</h3>

<p>Authentication and authorization of database queries are an endless source of joy and complexity. We are likely unable to capture everyone’s use case, certainly not in a first release. The smart thing is therefore not to try. For Quack, we have chosen an auth model that ties into DuckDB’s philosophy of extensibility. There are hundreds of DuckDB extensions out there already. Quack ships with a default authentication method and no authorization restrictions, but both can be overridden by user-supplied code. As you have seen above, the Quack server generates a default random authentication token on startup. When a client connects, it provides an authentication string. The server side will call an authentication callback. By default, it will compare the client-supplied token with the one that was randomly generated before. But this callback can be changed through configuration! You can bring your own authentication function that for example queries an LDAP directory, reads a text file, or just rolls the dice. Up to you. Similarly, the authorization function can be changed. The default authorization function just says “yes” to everything, but you can inspect each query a client attempts to execute, correlate the query to the previously used authentication string, etc. Those callbacks can even be plain SQL macros! Please see our documentation for more details.</p>

<h3 id="default-port">Default Port</h3>

<p>By default, a Quack server listens on port <code class="language-plaintext highlighter-rouge">9494</code>, the number <code class="language-plaintext highlighter-rouge">94</code> being easy to remember as the year <a href="https://en.wikipedia.org/wiki/Netscape_Navigator">Netscape Navigator</a> was released.</p>

<h2 id="benchmarks">Benchmarks</h2>

<p>We have set up two benchmarks to showcase the Quack protocol. Those benchmarks were run on AWS virtual machines running Ubuntu on Arm. We picked the <a href="https://instances.vantage.sh/aws/ec2/m8g.2xlarge">m8g.2xlarge</a> instance type, which has 8 vCPUs and 32 GB of RAM and, importantly, “up to 15 Gbps” network bandwidth. We recreated a real-world scenario where client and server are in the same data center, but on different machines. We made sure both instances were in the same “availability zone”. Ping time between the instances averaged around 0.280 ms.</p>

<h3 id="bulk-transfer">Bulk Transfer</h3>

<p>The first benchmark tests bulk transfer, the case where a fairly large number of rows should be transferred over the database protocol. If you’ve read the paper we linked above, you know that this is a case where traditional database protocols were struggling. We compare Quack with two systems: the widespread PostgreSQL protocol and the newer Arrow Flight SQL protocol. Arrow Flight is provided by the <a href="https://docs.gizmosql.com/#/">GizmoSQL</a> server that also uses DuckDB internally. We transfer an increasing number of rows of the TPC-H lineitem table, all the way up to a whopping 60 million rows (7.6 GB in CSV format!) and report the median wall clock time over 5 runs. We expect the modern bulk-oriented protocols to far outclass the PostgreSQL protocol. Here are the results:</p>

<div class="figure-title">Runtimes of bulk transfer operations (lower is better)</div>
<p><img src="/images/blog/quack/quack-bulk-light.svg" alt="Bulk transfer performance" width="809" height="514" class="lightmode-img" />
<img src="/images/blog/quack/quack-bulk-dark.svg" alt="Bulk transfer performance" width="809" height="514" class="darkmode-img" /></p>

<details>
  <summary>
Would you like to see the results as a table? Click here.
</summary>
  <div>
<table>
<thead>
<tr>
<th style="text-align: right;">Rows</th>
<th style="text-align: right;">DuckDB Quack</th>
<th style="text-align: right;">Arrow Flight</th>
<th style="text-align: right;">PostgreSQL</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">100k</td>
<td style="text-align: right;"><strong>0.07 s</strong></td>
<td style="text-align: right;"><strong>0.07 s</strong></td>
<td style="text-align: right;">0.20 s</td>
</tr>
<tr>
<td style="text-align: right;">1M</td>
<td style="text-align: right;"><strong>0.24 s</strong></td>
<td style="text-align: right;">0.38 s</td>
<td style="text-align: right;">2.20 s</td>
</tr>
<tr>
<td style="text-align: right;">10M</td>
<td style="text-align: right;"><strong>0.89 s</strong></td>
<td style="text-align: right;">2.90 s</td>
<td style="text-align: right;">25.64 s</td>
</tr>
<tr>
<td style="text-align: right;">60M</td>
<td style="text-align: right;"><strong>4.94 s</strong></td>
<td style="text-align: right;">17.40 s</td>
<td style="text-align: right;">158.37 s</td>
</tr>
</tbody>
</table>
</div>
</details>

<p>We can see how Quack is doing great for bulk result set transfer, transferring the 60 million rows in under 5 seconds! Even the purpose-built Arrow Flight SQL protocol can’t compete here, and Postgres’ row-based protocol is rather hopeless in general.</p>
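
<p>To put these numbers into perspective, the reported medians can be turned into throughput and speedup figures with a short query. The values below are copied from the results table above, and the column names are our own. For the 60 million row case this works out to roughly 12 million rows per second, about 3.5 times faster than Arrow Flight and about 32 times faster than PostgreSQL.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Median runtimes in seconds, copied from the results table above
WITH results(n_rows, quack_s, flight_s, postgres_s) AS (
    VALUES
        (100000,   0.07,  0.07,   0.20),
        (1000000,  0.24,  0.38,   2.20),
        (10000000, 0.89,  2.90,  25.64),
        (60000000, 4.94, 17.40, 158.37)
)
SELECT
    n_rows,
    -- Quack throughput in millions of rows per second
    round(n_rows / quack_s / 1e6, 1) AS quack_million_rows_per_s,
    -- How many times faster Quack is than the other two protocols
    round(flight_s / quack_s, 1)     AS speedup_vs_arrow_flight,
    round(postgres_s / quack_s, 1)   AS speedup_vs_postgresql
FROM results
ORDER BY n_rows;
</code></pre></div></div>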

<p>In fairness we have to mention that the standard PostgreSQL clients do not parallelize reads over multiple threads, but Quack and Arrow can. Shameless plug: DuckDB’s <a href="/docs/current/core_extensions/postgres/overview.html">PostgreSQL client</a> can also do that in some cases!</p>

<h3 id="small-writes">Small Writes</h3>

<p>The second benchmark tests small appends. This is a common use case, for example, when centralizing observability data in a single DuckDB instance. This stresses the database protocol in a different way: for example, needing multiple round trips between client and server to complete a single transaction will be a disadvantage. We test this by creating an empty table with the same structure as the TPC-H lineitem table, and then inserting randomized values into it, each row in its own <code class="language-plaintext highlighter-rouge">INSERT</code> transaction. The inserted values somewhat follow the distribution of the usual benchmark generator. We ran an increasing number of parallel threads for five seconds. We repeated this experiment five times and reported the median transactions per second.</p>
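
<p>As an illustration of what a single such transaction looks like, here is a simplified sketch. It is not the benchmark script itself: the table is a made-up, trimmed-down stand-in for lineitem, the value ranges are arbitrary, and it runs locally, whereas in the benchmark the target table lives in the remote database.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical, trimmed-down stand-in for the lineitem schema
CREATE TABLE lineitem_sketch (
    l_orderkey BIGINT,
    l_quantity INTEGER,
    l_shipdate DATE,
    l_comment  VARCHAR
);

-- In autocommit mode, each statement is its own transaction,
-- so this single-row INSERT counts as one transaction.
INSERT INTO lineitem_sketch VALUES (
    CAST(floor(random() * 6000000) AS BIGINT) + 1,
    CAST(floor(random() * 50) AS INTEGER) + 1,
    DATE '1992-01-01' + CAST(floor(random() * 2500) AS INTEGER),
    'randomized comment'
);
</code></pre></div></div>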

<p>We expect a highly transaction-optimized system like PostgreSQL to dominate this benchmark. We also expect the bulk-optimized Arrow Flight to not do particularly well.</p>

<div class="figure-title">Throughput of small writes (higher is better)</div>
<p><img src="/images/blog/quack/quack-transactional-light.svg" alt="Small writes performance" width="800" height="628" class="lightmode-img" />
<img src="/images/blog/quack/quack-transactional-dark.svg" alt="Small writes performance" width="800" height="628" class="darkmode-img" /></p>

<details>
  <summary>
Would you like to see the results as a table? Click here.
</summary>
  <div>
<table>
<thead>
<tr>
<th style="text-align: right;">Threads</th>
<th style="text-align: right;">DuckDB Quack</th>
<th style="text-align: right;">Arrow Flight</th>
<th style="text-align: right;">PostgreSQL</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">1</td>
<td style="text-align: right;"><strong>1,038 tx/s</strong></td>
<td style="text-align: right;">469 tx/s</td>
<td style="text-align: right;">839 tx/s</td>
</tr>
<tr>
<td style="text-align: right;">2</td>
<td style="text-align: right;"><strong>1,956 tx/s</strong></td>
<td style="text-align: right;">799 tx/s</td>
<td style="text-align: right;">1,094 tx/s</td>
</tr>
<tr>
<td style="text-align: right;">4</td>
<td style="text-align: right;"><strong>3,504 tx/s</strong></td>
<td style="text-align: right;">1,224 tx/s</td>
<td style="text-align: right;">2,180 tx/s</td>
</tr>
<tr>
<td style="text-align: right;">8</td>
<td style="text-align: right;"><strong>5,434 tx/s</strong></td>
<td style="text-align: right;">1,358 tx/s</td>
<td style="text-align: right;">4,320 tx/s</td>
</tr>
</tbody>
</table>
</div>
</details>

<p>Quite surprisingly, we see Quack outperforming PostgreSQL at up to 8 parallel threads, reaching a maximum transaction rate of around 5,400 transactions per second. Beyond that, we hit a current limitation of DuckDB itself in concurrent insertions per second into the same table. PostgreSQL scales better here, which is something for us to look into in the near future. Arrow Flight is not doing too well, being roughly half as fast as Postgres, as expected.</p>

<p><a href="https://github.com/duckdb/duckdb-quack/tree/v1.5-variegata/benchmarks">Benchmark scripts are available online.</a></p>

<h2 id="conclusion">Conclusion</h2>

<p>Today we released Quack, a client-server protocol for DuckDB, along with an initial implementation as a DuckDB extension. Quack unlocks a full multiplayer experience with DuckDB, where multiple separate processes – local or remote – can now modify contents of tables in parallel without locking each other out. And while part of this could also already be achieved with <a href="https://ducklake.select/">DuckLake</a>, Quack makes this far simpler and provides far higher performance.</p>

<h3 id="use-cases">Use Cases</h3>

<p>With Quack, DuckDB can now be useful in a wide range of new use cases, where centralizing state is more important than hyper-local querying. We have already had to learn that data is not always local with the rise of data lakes. Speaking of lakes, Quack is also going to be integrated into DuckLake so that DuckDB itself can be a remotely-accessible Catalog server. This will unlock new capabilities, e.g., for data inlining. If you have more questions on this, please consult the <a href="/quack/faq.html">Quack FAQ</a>.</p>

<p>Overall, DuckDB is moving further out of its initial niche of an in-process database for interactive analytics into a core building block of modern data architecture. We have been playing with Quack for a while now, and are quite excited to hear what you are going to build with it. If you have any suggestions on how Quack could be improved, let us know! And hey, the MythBusters have already <a href="https://www.youtube.com/watch?v=WevspopGGeY">proven that a duck’s quack echoes</a>, so let's see what kind of noise this leads to.</p>

<h3 id="next-steps">Next Steps</h3>

<p>There are of course a lot of things still to do. First off, we are going to integrate Quack into DuckLake, so that it becomes possible to use a remote DuckDB server as a DuckLake catalog! We expect this to greatly improve performance, especially with inlining. Next, we are going to polish Quack over the coming months and ship a first production release together with <a href="/release_calendar.html">DuckDB v2.0</a>, which is planned for fall this year. We plan for example to enable auto-installation and auto-loading of the Quack extension whenever it is needed. Using our <a href="/docs/current/sql/peg_parser.html">new parser</a>, we are also planning to improve on the syntax for talking to remote SQL databases from DuckDB. On the core DuckDB side, we plan to work on greatly increasing the transactions per second achievable, so we can scale transactions far beyond eight parallel threads.</p>

<p>Further on, we are thinking about allowing extensions to the Quack protocol beyond authentication and authorization, for example, by allowing DuckDB extensions to add new protocol messages and the code to handle them. And we are also thinking about adding a replication protocol on top of Quack so that changes to a DuckDB instance can be replicated to other servers, for example to set up a cluster of read replicas.</p>

<p>If you want to learn more about Quack – and hear about its initial adoption – join our community conference, <a href="/events/2026/06/24/duckcon7/">DuckCon #7</a>, on June 24. DuckCon will start with the <a href="/library/duckcon-opening/">“State of the Duck”</a> talk presented by the co-creators of DuckDB. You can either join in-person or watch the online stream on YouTube.</p>

<p>PS: We have a separate page for the <a href="/quack/">Quack project</a>. Make sure you give it a visit.</p>

<h2 id="acknowledgements">Acknowledgements</h2>

<p>We would like to thank Boaz Leskes from <a href="https://motherduck.com/">MotherDuck</a> for sharing their lessons learned from building the MotherDuck protocol with us. We also want to thank Philip Moore from <a href="https://gizmodata.com/gizmosql">GizmoSQL / GizmoData</a>, who has blazed this trail for us already and shown that client-server DuckDB is a very worthwhile thing.</p>

<h2 id="appendix-why-not-arrow-flight-sql">Appendix: Why Not Arrow Flight SQL?</h2>

<p>We also have to address one of the few elephants in the room: why on earth did we not use the existing Arrow Flight SQL protocol? It’s there. It’s available. There are existing implementations. We see the value in Arrow and related projects like ADBC: they are interchange APIs, like ODBC and JDBC before them, aimed at reducing friction in exchanging data between systems. And that works pretty well.</p>

<p>However, we are also wary of using interchange formats like Arrow inside DuckDB. And while DuckDB’s internal structures for query intermediates are in some ways close to Arrow, in other ways they are quite different. We feel that in order to be able to keep innovating in data systems, we cannot allow ourselves to be restricted by formats that are controlled externally. This is why we use our own serialization in Quack. If we want to add a new data type or protocol message, we can ship tomorrow.</p>

<p>Deep down, there is also one fateful design decision in Arrow Flight SQL: every single query requires at least two protocol round trips, <code class="language-plaintext highlighter-rouge">CommandStatementQuery</code> and <code class="language-plaintext highlighter-rouge">DoGet</code>. This is not ideal for small updates like in our second experiment above, especially in higher-latency environments. As mentioned, we designed Quack to be able to do single-round trip query execution and result fetching for small queries.</p>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[DuckDB instances can now talk to each other using the Quack remote protocol. This lets you run DuckDB in a client-server setup with multiple concurrent writers. In DuckDB's spirit, Quack is simple to set up and builds on proven technologies such as HTTP. It's also fast, which allows it to support workloads ranging from bulk operations to small transactions.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/quack-release.jpg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/quack-release.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing the Program of DuckCon #7 Amsterdam</title><link href="https://duckdb.org/2026/05/08/announcing-duckcon7.html" rel="alternate" type="text/html" title="Announcing the Program of DuckCon #7 Amsterdam" /><published>2026-05-08T00:00:00+00:00</published><updated>2026-05-08T00:00:00+00:00</updated><id>https://duckdb.org/2026/05/08/announcing-duckcon7</id><content type="html" xml:base="https://duckdb.org/2026/05/08/announcing-duckcon7.html"><![CDATA[<p><img src="/images/events/thumbs/duckcon-7-amsterdam.svg" alt="DuckCon #7 Splashscreen" width="680" /></p>

<p>We are excited to announce the program of <strong>DuckCon #7 Amsterdam</strong>, DuckDB's user conference.
The event will be held on <strong>Wednesday, June 24, 2026</strong>, at the <a href="https://www.kit.nl/about-us/">Royal Tropical Institute</a>.
The program runs from <strong>15:00 to 20:00 CEST</strong>.</p>

<p>See the registration link and the full program on the <a href="/events/2026/06/24/duckcon7/">DuckCon #7 event page</a>.</p>]]></content><author><name>{&quot;twitter&quot; =&gt; &quot;none&quot;, &quot;picture&quot; =&gt; &quot;/images/blog/authors/gabor_szarnyas.png&quot;}</name></author><category term="DuckCon" /><summary type="html"><![CDATA[We are hosting DuckCon #7 in Amsterdam on June 24, 2026. Join us at the Royal Tropical Institute for talks, lightning sessions, and a borrel.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/events/thumbs/duckcon-7-amsterdam.png" /><media:content medium="image" url="https://duckdb.org/images/events/thumbs/duckcon-7-amsterdam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Delta Grows Up: Writes, Unity Catalog and Time Travel</title><link href="https://duckdb.org/2026/05/07/delta-uc-updates.html" rel="alternate" type="text/html" title="Delta Grows Up: Writes, Unity Catalog and Time Travel" /><published>2026-05-07T00:00:00+00:00</published><updated>2026-05-07T00:00:00+00:00</updated><id>https://duckdb.org/2026/05/07/delta-uc-updates</id><content type="html" xml:base="https://duckdb.org/2026/05/07/delta-uc-updates.html"><![CDATA[<p>Welcome back! While we here at DuckLabs are typically of the quacking
persuasion, we’ve been busy as beavers, shoring up our Delta to prepare for
what’s next… Unity Catalog! Let’s look at how DuckDB’s
<a href="/docs/current/core_extensions/delta.html">Delta</a> and
<a href="/docs/current/core_extensions/unity_catalog.html">Unity Catalog</a>
extensions have grown up enough to shed the experimental tag, and see what
has changed since our <a href="/2025/03/21/maximizing-your-delta-scan-performance.html">last
update</a>.</p>

<h2 id="time-to-open-the-delta">Time to Open the Delta</h2>

<p>Before we jump in, let's review briefly. Delta is a foundational <a href="https://docs.delta.io/">open
table format and toolset</a> for building and managing
data lakes, related to Iceberg and other lakehouse formats. DuckDB supports
Delta tables via its <a href="/docs/current/core_extensions/delta.html">Delta
Extension</a>.</p>

<p>In that last update we highlighted performance wins, particularly file skipping
via filter pushdowns, and metadata caching with snapshot pinning. Now we build
on these, and add writes, time travel and Unity Catalog support, plus
more performance gains!</p>

<h3 id="building-up-the-delta-lake-writes">Building Up the Delta (Lake): Writes</h3>

<p>What fun are reads without writes? The big addition since we last chatted is
<code class="language-plaintext highlighter-rouge">INSERT</code> support, and it works just as you would expect. Assuming you have a Delta
table ready to go, you can <code class="language-plaintext highlighter-rouge">INSERT</code> away:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Schema: (text VARCHAR, code BIGINT)</span>
<span class="k">ATTACH</span> <span class="s1">'./path/to/my_table'</span> <span class="k">AS</span> <span class="n">my_table</span> <span class="p">(</span><span class="k">TYPE</span> <span class="k">delta</span><span class="p">);</span>

<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">my_table</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="s1">'Question 2'</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="p">(</span><span class="s1">'The Answer'</span><span class="p">,</span> <span class="mi">42</span><span class="p">);</span>

<span class="c1">-- Bulk insert from a query</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">my_table</span>
<span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="n">text</span> <span class="o">||</span> <span class="s1">' (copy)'</span><span class="p">,</span> <span class="n">code</span> <span class="o">+</span> <span class="mi">100</span> <span class="k">FROM</span> <span class="n">my_table</span><span class="p">);</span>
</code></pre></div></div>

<p>Also worth calling out – multiple <code class="language-plaintext highlighter-rouge">INSERT</code>s within a <code class="language-plaintext highlighter-rouge">BEGIN</code> / <code class="language-plaintext highlighter-rouge">COMMIT</code> block are
stored as a single Delta version: one atomic commit, one new log entry. And,
as you'll see later, this works with catalogs too! <code class="language-plaintext highlighter-rouge">UPDATE</code>, <code class="language-plaintext highlighter-rouge">MERGE</code>, and <code class="language-plaintext highlighter-rouge">DELETE</code>
are not yet supported, but they are on our future work list.</p>

<h3 id="time-travel">Time Travel</h3>

<p>DuckDB's Delta extension now supports <a href="https://delta.io/blog/2023-02-01-delta-lake-time-travel/">time
travel</a>. Any Delta
table can be queried as of a particular version. DuckDB supports binding to a
specific <code class="language-plaintext highlighter-rouge">VERSION</code> either at <code class="language-plaintext highlighter-rouge">ATTACH</code> time, or as part of an individual query.</p>

<p>Let's assume that we have built up the above <code class="language-plaintext highlighter-rouge">my_table</code> incrementally, with
versions 0, 1, and 2 containing:</p>

<table>
  <thead>
    <tr>
      <th>Version</th>
      <th>Contents</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td><code class="language-plaintext highlighter-rouge">('Question 1', 1)</code></td>
    </tr>
    <tr>
      <td>1</td>
      <td>+ <code class="language-plaintext highlighter-rouge">('Question 2', 2)</code>, <code class="language-plaintext highlighter-rouge">('The Answer', 42)</code></td>
    </tr>
    <tr>
      <td>2</td>
      <td>+ <code class="language-plaintext highlighter-rouge">('Question 1 (copy)', 101)</code>, <code class="language-plaintext highlighter-rouge">('Question 2 (copy)', 102)</code>, <code class="language-plaintext highlighter-rouge">('The Answer (copy)', 142)</code></td>
    </tr>
  </tbody>
</table>

<p>The most flexible approach is to attach normally and query arbitrary versions
inline as needed:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'./path/to/my_table'</span> <span class="k">AS</span> <span class="n">my_table</span> <span class="p">(</span><span class="k">TYPE</span> <span class="k">delta</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">my_table</span> <span class="k">AT</span> <span class="p">(</span><span class="k">VERSION</span> <span class="o">=&gt;</span> <span class="mi">0</span><span class="p">);</span> <span class="c1">-- 1  (Question 1 only)</span>
<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">my_table</span> <span class="k">AT</span> <span class="p">(</span><span class="k">VERSION</span> <span class="o">=&gt;</span> <span class="mi">1</span><span class="p">);</span> <span class="c1">-- 3  (after 1st insert)</span>
<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">my_table</span><span class="p">;</span>                   <span class="c1">-- 6  (latest)</span>
</code></pre></div></div>

<p>Or attach, pinned to a specific version, which is useful when you want a stable
reference that never changes, regardless of future writes:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Always v1, no matter what gets written later</span>
<span class="k">ATTACH</span> <span class="s1">'./path/to/my_table'</span> <span class="k">AS</span> <span class="n">my_table_v1</span>
    <span class="p">(</span><span class="k">TYPE</span> <span class="k">delta</span><span class="p">,</span> <span class="k">VERSION</span> <span class="mi">1</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">my_table_v1</span><span class="p">;</span>      <span class="c1">-- → 3</span>

<span class="c1">-- Locked to whatever was latest at attach time</span>
<span class="k">ATTACH</span> <span class="s1">'./path/to/my_table'</span> <span class="k">AS</span> <span class="n">my_table_pinned</span>
    <span class="p">(</span><span class="k">TYPE</span> <span class="k">delta</span><span class="p">,</span> <span class="k">PIN_SNAPSHOT</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">my_table_pinned</span><span class="p">;</span>  <span class="c1">-- → 6</span>
</code></pre></div></div>
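
<p>As a mental model for the counts above, here is a minimal Python sketch that treats the Delta log as an append-only list of commits and rebuilds a snapshot by replaying them up to a chosen version. The names <code class="language-plaintext highlighter-rouge">COMMITS</code>, <code class="language-plaintext highlighter-rouge">snapshot_at</code> and <code class="language-plaintext highlighter-rouge">row_counts</code> are hypothetical and exist only for illustration; this is not how the extension is implemented, and real Delta logs also carry metadata and file-removal actions that this toy ignores.</p>

```python
# Toy model of the example table: each commit (version) only appends rows.
# Hypothetical illustration, not the Delta extension's implementation.
COMMITS = [
    [("Question 1", 1)],                         # version 0
    [("Question 2", 2), ("The Answer", 42)],     # version 1
    [                                            # version 2
        ("Question 1 (copy)", 101),
        ("Question 2 (copy)", 102),
        ("The Answer (copy)", 142),
    ],
]


def snapshot_at(commits, version=None):
    """Replay commits 0..version (inclusive); None means the latest version."""
    if version is None:
        version = len(commits) - 1
    if version not in range(len(commits)):
        raise ValueError(f"unknown version: {version}")
    rows = []
    for commit in commits[: version + 1]:
        rows.extend(commit)
    return rows


def row_counts(commits):
    """Row count visible at each version, in version order."""
    return [len(snapshot_at(commits, v)) for v in range(len(commits))]


print(row_counts(COMMITS))        # [1, 3, 6]
print(len(snapshot_at(COMMITS)))  # 6 (latest)
```

<p>In this model, pinning to a version simply means always passing the same <code class="language-plaintext highlighter-rouge">version</code> argument, so later commits never change what you see.</p>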

<h3 id="growing-up-no-longer-a-kit-">Growing Up: No Longer a Kit 🦫</h3>

<p>The DuckDB Delta extension is no longer a
<a href="https://duckduckgo.com/?q=what+is+a+baby+beaver+called">kit</a> and has grown
up quite a bit since a year ago.
As you just saw, we added writes and time travel. These features open the
door to something bigger: Unity Catalog coordination.</p>

<h2 id="unity-catalog-support-atop-the-delta">Unity Catalog Support atop the Delta</h2>

<p>Data lake systems excel at scale. As your data assets multiply,
you need a way to discover what exists, control who can access it, audit how
it's being used, and coordinate writes across multiple engines. Data catalogs
have evolved to address exactly these needs, sitting above the storage layer
to manage the metadata, governance, and transactional bookkeeping that make
large-scale data lakes effective. The OSS Unity Catalog team has a <a href="https://unitycatalog.io/blogs/what-is-a-data-catalog-and-why-do-i-need-one/">good
overview</a>
if you'd like to go deeper; the concepts apply broadly regardless of which
catalog you use.</p>

<h3 id="what-is-unity-catalog">What is Unity Catalog?</h3>

<p>Unity Catalog (UC for short) is an open standard for governing data and AI
assets, including tables, volumes, models, and functions, across engines and
clouds. It turns your data lake into a lakehouse, and gives you a single place
to discover, audit, and control access to your data, regardless of what's
reading or writing it. DuckDB's Unity Catalog extension is built upon the
<a href="https://go.unitycatalog.io/apidocs">Unity Catalog Open API</a>. There are two main
implementations: <a href="https://unitycatalog.io/">OSS Unity Catalog</a>, which you can
self-host (and Docker-ify in minutes), and <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/">Databricks Unity
Catalog</a>,
the managed version. Like Delta, the DuckDB Unity Catalog extension has shed
its experimental tag. Let's put both to work.</p>

<h3 id="getting-started-oss-unity-catalog">Getting Started: OSS Unity Catalog</h3>

<p>We've set up a <a href="https://github.com/benfleis/duckdb-unitycatalog-playground/">Docker image playground bundling OSS Unity Catalog and DuckDB
together</a>,
so you can follow along with an easy Docker build-and-run setup. Grab it
if you would like to walk through the samples or experiment on your own. (If
you'd prefer to run OSS UC directly, the official image is the upstream of our
playground.)</p>

<p>Let's start with Docker. Assuming you now have the image running, its build
phase has already executed (roughly) the following steps to prepare
our playground:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a schema</span>
/home/unitycatalog/bin/uc schema create <span class="nt">--catalog</span> unity <span class="nt">--name</span> my_schema

<span class="c"># Create the "pets" table</span>
/home/unitycatalog/bin/uc table create <span class="se">\</span>
    <span class="nt">--full_name</span>        unity.my_schema.pets <span class="se">\</span>
    <span class="nt">--columns</span>          <span class="s2">"uuid STRING, name STRING, age INT, adopted BOOLEAN"</span> <span class="se">\</span>
    <span class="nt">--format</span>           DELTA <span class="se">\</span>
    <span class="nt">--storage_location</span> file:///home/unitycatalog/etc/data/external/unity/my_schema/tables/pets
</code></pre></div></div>

<p>After that, we can test things out from DuckDB. To see for
yourself, <code class="language-plaintext highlighter-rouge">docker exec -it duckdb-playground duckdb</code> will give you a DuckDB shell
inside the container.</p>

<p>Before doing anything meaningful we'll need to set up a DuckDB secret. In this
example the <code class="language-plaintext highlighter-rouge">TOKEN</code> value is ignored by the local OSS UC server, but the field is
required. Create the secret, then you can immediately attach and read:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">LOAD</span><span class="n"> unity_catalog</span><span class="p">;</span>

<span class="k">CREATE</span> <span class="k">SECRET</span> <span class="p">(</span>
    <span class="k">TYPE</span>     <span class="k">unity_catalog</span><span class="p">,</span>
    <span class="k">TOKEN</span>    <span class="s1">'demo-ignored-token'</span><span class="p">,</span>
    <span class="k">ENDPOINT</span> <span class="s1">'http://unitycatalog:8080'</span>
<span class="p">);</span>

<span class="k">ATTACH</span> <span class="s1">'unity'</span> <span class="k">AS</span> <span class="n">my_catalog</span>
    <span class="p">(</span><span class="k">TYPE</span> <span class="k">unity_catalog</span><span class="p">,</span> <span class="k">DEFAULT_SCHEMA</span> <span class="s1">'my_schema'</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="k">name</span><span class="p">,</span> <span class="n">age</span><span class="p">,</span> <span class="n">adopted</span> <span class="k">FROM</span> <span class="n">my_catalog.pets</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="k">name</span><span class="p">;</span>
<span class="c1">-- returns a single 'Seed' row</span>
</code></pre></div></div>

<p>That's it! You just queried Unity-Catalog-managed, Delta-stored pets data.</p>

<blockquote>
  <p>Tip Want to experiment with this on Databricks Unity Catalog? Setting up a
Databricks Unity Catalog is out of scope for this blog, but if you have one
ready to go, you will need these to get bootstrapped with DuckDB:</p>

  <ul>
    <li>set <code class="language-plaintext highlighter-rouge">ENDPOINT</code> to <a href="https://docs.databricks.com/aws/en/workspace/workspace-details#workspace-instance-names-urls-and-ids">your Workspace
URL</a>
(typically: https://{instance}.cloud.databricks.com/)</li>
    <li>set <code class="language-plaintext highlighter-rouge">TOKEN</code> appropriately (e.g. <a href="https://docs.databricks.com/aws/en/dev-tools/auth/pat">create a
PAT</a> with
<code class="language-plaintext highlighter-rouge">unity-catalog</code> scope); getting the correct token depends
entirely on your setup. To dive in, see <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/access-control/">Access Control in Unity
Catalog</a>.</li>
  </ul>

  <p>With these in hand, you can use DuckDB directly or call
the extensive <a href="https://docs.databricks.com/api/workspace/introduction">UC Open
API</a> yourself.</p>
</blockquote>

<p>Next, let's complete the circle and write some data into our pets table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">my_catalog.pets</span>
    <span class="p">(</span><span class="n">uuid</span><span class="p">,</span> <span class="k">name</span><span class="p">,</span> <span class="n">age</span><span class="p">,</span> <span class="n">adopted</span><span class="p">)</span>
<span class="k">SELECT</span>
    <span class="nf">gen_random_uuid</span><span class="p">()::</span><span class="nb">VARCHAR</span><span class="p">,</span>
    <span class="p">[</span><span class="s1">'Luna'</span><span class="p">,</span> <span class="s1">'Milo'</span><span class="p">,</span> <span class="s1">'Bella'</span><span class="p">,</span> <span class="s1">'Charlie'</span><span class="p">,</span> <span class="s1">'Max'</span><span class="p">,</span> <span class="s1">'Lucy'</span><span class="p">,</span> <span class="s1">'Cooper'</span><span class="p">,</span>
     <span class="s1">'Daisy'</span><span class="p">,</span> <span class="s1">'Buddy'</span><span class="p">,</span> <span class="s1">'Lily'</span><span class="p">,</span> <span class="s1">'Rocky'</span><span class="p">,</span> <span class="s1">'Molly'</span><span class="p">,</span> <span class="s1">'Bear'</span><span class="p">,</span> <span class="s1">'Lola'</span><span class="p">,</span>
     <span class="s1">'Duke'</span><span class="p">,</span> <span class="s1">'Sadie'</span><span class="p">,</span> <span class="s1">'Tucker'</span><span class="p">,</span> <span class="s1">'Zoe'</span><span class="p">,</span> <span class="s1">'Oliver'</span><span class="p">,</span> <span class="s1">'Stella'</span>
    <span class="p">][</span><span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="nf">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">19</span><span class="p">)::</span><span class="nb">INT</span><span class="p">],</span>
    <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="nf">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">14</span><span class="p">)::</span><span class="nb">INT</span><span class="p">)::</span><span class="nb">INT</span><span class="p">,</span>
    <span class="nf">random</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">0.5</span>
<span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">my_catalog.pets</span><span class="p">;</span>
</code></pre></div></div>

<p>You can also easily find and see the created files; check the local <code class="language-plaintext highlighter-rouge">data</code>
directory (also bind-mounted in Docker), and you should find both pre-existing
files, and a new Parquet file containing the inserted rows. In my case it looks
like this:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tree </span>data
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data
└── external
    └── unity
        └── my_schema
            └── tables
                └── pets
                    ├── _delta_log
                    │   ├── 00000000000000000000.json
                    │   ├── 00000000000000000001.json
                    │   └── 00000000000000000002.json
                    ├── duckdb-19cb47ae-9f35-4126-b67d-c94fcade68cc.parquet
                    └── duckdb-e3bb0336-f16a-4d21-9495-0fbf55c6cba8.parquet

7 directories, 5 files
</code></pre></div></div>

<h3 id="catalog-managed-tables">Catalog Managed Tables</h3>

<p>With the basics out of the way, we can talk about <a href="https://docs.databricks.com/aws/en/tables/managed">Catalog Managed Tables
(CMT)</a>. This is available
today in both <a href="https://www.unitycatalog.io/">OSS</a> and
<a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/">Databricks</a>
Unity Catalog.</p>

<p>The big feature in CMT is Catalog Commits, which enables coordinated concurrent writes. Without Catalog Commits,
DuckDB writes go directly to the Delta log. While modern storage backends
prevent outright lost writes, UC is left out of the loop entirely. Its
metadata, audit trail, and statistics fall out of sync with the actual table
state, and other engines querying through UC may see a stale view.</p>

<p>Catalog Commits (CC) fixes this: every write is staged and registered through UC before it
becomes visible. UC acts as the commit arbiter: the first writer's commit
is accepted, and later writers receive a conflict error. This matters
wherever multiple writers are appending simultaneously, e.g., parallel ETL
pipelines, partitioned bulk loads, and concurrent analytical inserts. Each
writer works independently; UC ensures exactly one commit lands per version and
keeps its own catalog in sync with every one of them.</p>

<p>Consistent reads and audit history are already inherent to Delta and UC
respectively. CC doesn't add functionality; it just ensures UC stays in sync with
every commit. And Catalog Commits coordinate per table; there is no cross-table
atomicity. If you write to two tables in the same <code class="language-plaintext highlighter-rouge">BEGIN</code> / <code class="language-plaintext highlighter-rouge">COMMIT</code> block,
each table commits independently.</p>

<p>To opt a table into CMT (and therefore CC), set the <code class="language-plaintext highlighter-rouge">delta.feature.catalogManaged</code> table property
at creation time. This is done via Spark or the UC CLI, as DuckDB's Unity Catalog
extension does not yet support <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> DDL:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Via Spark</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">my_catalog.my_schema.concurrent_tbl</span> <span class="p">(</span>
    <span class="n">uuid</span>    <span class="nb">STRING</span>  <span class="k">NOT</span> <span class="nb">NULL</span><span class="p">,</span>
    <span class="k">name</span>    <span class="nb">STRING</span>  <span class="k">NOT</span> <span class="nb">NULL</span><span class="p">,</span>
    <span class="n">age</span>     <span class="nb">INT</span>     <span class="k">NOT</span> <span class="nb">NULL</span><span class="p">,</span>
    <span class="n">adopted</span> <span class="nb">BOOLEAN</span> <span class="k">NOT</span> <span class="nb">NULL</span>
<span class="p">)</span>
<span class="k">TBLPROPERTIES</span> <span class="p">(</span><span class="s1">'delta.feature.catalogManaged'</span> <span class="o">=</span> <span class="s1">'supported'</span><span class="p">);</span>
</code></pre></div></div>

<p>Once enabled, DuckDB writes go through UC's commit staging automatically —
the <code class="language-plaintext highlighter-rouge">INSERT</code> syntax is unchanged:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">my_catalog.my_schema.concurrent_tbl</span>
    <span class="p">(</span><span class="n">uuid</span><span class="p">,</span> <span class="k">name</span><span class="p">,</span> <span class="n">age</span><span class="p">,</span> <span class="n">adopted</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="nf">gen_random_uuid</span><span class="p">()::</span><span class="nb">VARCHAR</span><span class="p">,</span> <span class="s1">'Luna'</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="k">true</span><span class="p">);</span>
</code></pre></div></div>

<p>Now each DuckDB writer stages its commit to a <code class="language-plaintext highlighter-rouge">_staged_commits/</code> directory and
registers it with UC before that data becomes visible. UC arbitrates: exactly
one writer wins each version in a race; the others get a conflict error and can
retry. Next, let's look at how UC handles the race.</p>

<h2 id="deeper-dive">Deeper Dive</h2>

<h3 id="racing-commits">Racing Commits</h3>

<p>To see how Catalog Commits arbitrates, we launched 20 DuckDB
writers, at most 8 running concurrently, all inserting into the same managed table:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">seq </span>1 20 | xargs <span class="nt">-P</span> 8 <span class="nt">-I</span><span class="o">{}</span> scripts/unity/05-cmc/write-single <span class="o">{}</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[worker 6] OK - inserted 5 rows
[worker 5] CONFLICT - another writer won this version, retry needed
[worker 2] CONFLICT - another writer won this version, retry needed
[worker 8] CONFLICT - another writer won this version, retry needed
[worker 7] CONFLICT - another writer won this version, retry needed
[worker 3] CONFLICT - another writer won this version, retry needed
[worker 1] OK - inserted 5 rows
[worker 4] CONFLICT - another writer won this version, retry needed
[worker 16] OK - inserted 5 rows
[worker 13] CONFLICT - another writer won this version, retry needed
[worker 15] CONFLICT - another writer won this version, retry needed
[worker 11] CONFLICT - another writer won this version, retry needed
[worker 14] CONFLICT - another writer won this version, retry needed
[worker 12] OK - inserted 5 rows
[worker 9] CONFLICT - another writer won this version, retry needed
[worker 10] CONFLICT - another writer won this version, retry needed
[worker 17] CONFLICT - another writer won this version, retry needed
[worker 20] CONFLICT - another writer won this version, retry needed
[worker 18] OK - inserted 5 rows
[worker 19] CONFLICT - another writer won this version, retry needed
</code></pre></div></div>

<p>Here we see 5 successful writes, and 15 signaled conflicts. Let's confirm in
the data:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">AS</span> <span class="n">total_rows</span> <span class="k">FROM</span> <span class="n">my_catalog.my_schema.concurrent_tbl</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────────────┐
│ total_rows │
│   int64    │
├────────────┤
│         35 │
└────────────┘
</code></pre></div></div>

<p>10 seeded rows + (5 writes × 5 rows each) = 35 total rows. (In a real workload,
you would retry the conflicted writes and land all 20 inserts.) Catalog Managed
Table commits gave us a clear signal and clear semantics during highly concurrent
writes, as promised.</p>
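
<p>The retry mentioned above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not the Unity Catalog API or the DuckDB extension: <code class="language-plaintext highlighter-rouge">ToyArbiter</code>, <code class="language-plaintext highlighter-rouge">CommitConflict</code>, <code class="language-plaintext highlighter-rouge">insert_with_retry</code> and <code class="language-plaintext highlighter-rouge">run_writers</code> are hypothetical names. The arbiter accepts exactly one commit per version, and each writer re-reads the latest version and tries again whenever it loses a race.</p>

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer already claimed the target version."""


class ToyArbiter:
    """Hypothetical stand-in for the catalog: accepts one commit per version."""

    def __init__(self, seeded_rows):
        self._lock = threading.Lock()
        self.version = 0
        self.total_rows = seeded_rows

    def read_version(self):
        with self._lock:
            return self.version

    def commit(self, based_on_version, row_count):
        # Compare-and-set: only a commit based on the latest version may land.
        with self._lock:
            if based_on_version != self.version:
                raise CommitConflict(f"version {based_on_version + 1} already taken")
            self.version += 1
            self.total_rows += row_count
            return self.version


def insert_with_retry(arbiter, row_count, max_attempts=100):
    """Retry a conflicted insert against the newest version until it lands."""
    for attempt in range(1, max_attempts + 1):
        base = arbiter.read_version()
        try:
            arbiter.commit(base, row_count)
            return attempt
        except CommitConflict:
            continue  # a real writer would likely back off before retrying
    raise RuntimeError("gave up after repeated conflicts")


def run_writers(writer_count=20, rows_per_write=5, seeded_rows=10):
    """Start all writers at once; return (final version, total rows)."""
    arbiter = ToyArbiter(seeded_rows)
    threads = [
        threading.Thread(target=insert_with_retry, args=(arbiter, rows_per_write))
        for _ in range(writer_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arbiter.version, arbiter.total_rows


print(run_writers())  # (20, 110): all 20 inserts land, 10 + 20 * 5 rows
```

<p>Each conflict implies that some other writer committed, so with 20 writers nobody can lose more than 19 times and every insert eventually lands. Without the retry, only the race winners would be counted, which is what the 35-row result above shows.</p>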

<h3 id="travel-in-time-faster">Travel in Time, Faster</h3>

<p>DuckDB's Delta snapshot loading is getting a speed boost: snapshots
will load incrementally when possible, making time travel across nearby
versions significantly faster. Consider a table where some initial queries are
made against version 16:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'./path/to/table'</span> <span class="k">AS</span> <span class="n">t</span> <span class="p">(</span><span class="k">TYPE</span> <span class="k">delta</span><span class="p">,</span> <span class="k">VERSION</span> <span class="mi">16</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">t</span><span class="p">;</span>  <span class="c1">-- → 17</span>
</code></pre></div></div>

<p>And now some work needs to be done against version 20. If we peek under the
hood (warning: sneaky code follows), we'll see that none of the previously
loaded Delta log metadata files were re-loaded:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span> <span class="n">enable_logging</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">delta_kernel_logging</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
<span class="k">CALL</span> <span class="nf">enable_logging</span><span class="p">(</span><span class="s1">'DeltaKernel'</span><span class="p">,</span> <span class="n">level</span> <span class="o">=</span> <span class="s1">'trace'</span><span class="p">);</span>

<span class="k">ATTACH</span> <span class="s1">'./path/to/table'</span> <span class="k">AS</span> <span class="n">t</span> <span class="p">(</span><span class="k">TYPE</span> <span class="k">delta</span><span class="p">,</span> <span class="k">VERSION</span> <span class="mi">20</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">t</span><span class="p">;</span>  <span class="c1">-- → 21</span>

<span class="c1">-- Delta kernel logs 'Provisionally selecting ... &lt;version&gt;.json'</span>
<span class="c1">-- whenever it reads a log file from scratch. We search for any such</span>
<span class="c1">-- message referencing a zero-padded log filename; zero matches</span>
<span class="c1">-- means the cached v16 snapshot was reused rather than rebuilt.</span>
<span class="k">SELECT</span> <span class="nf">count</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">duckdb_logs</span>
<span class="k">WHERE</span> <span class="n">type</span> <span class="o">=</span> <span class="s1">'DeltaKernel'</span>
  <span class="k">AND</span> <span class="n">message</span> <span class="k">LIKE</span> <span class="s1">'%00000000000000000%.json%'</span><span class="p">;</span>
<span class="c1">-- → 0</span>
</code></pre></div></div>

<p>In Delta lakes with thousands or millions of snapshots, incremental loading
provides a big win when working across multiple versions.</p>
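
<p>If the count in the check above comes back non-zero, it may help to look at the messages themselves. The sketch below reuses only the <code class="language-plaintext highlighter-rouge">duckdb_logs</code> columns from that query; the exact message text depends on the Delta kernel version, so treat the filter string as an assumption:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- List the log file reads instead of counting them.
SELECT message
FROM duckdb_logs
WHERE type = 'DeltaKernel'
  AND message LIKE '%Provisionally selecting%'
LIMIT 20;
</code></pre></div></div>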

<blockquote>
  <p>At the time of writing, incremental snapshot loading is supported in nightly builds.
You can install it using:</p>

  <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FORCE INSTALL</span><span class="n"> delta</span> <span class="k">FROM</span> <span class="n">core_nightly</span><span class="p">;</span>
</code></pre></div>  </div>

  <p>Please be aware that nightly builds are not intended for production use.
The implementation will be included in the next stable release,
<a href="/release_calendar.html">v1.5.3</a>.</p>
</blockquote>

<h2 id="conclusions">Conclusions</h2>

<p>A year ago, DuckDB could read Delta tables. Today it can insert data into them,
travel through their history, and query and write through a governed catalog —
without the experimental caveat on any of it. The combination of Delta for open
storage, Unity Catalog for governance and coordination, and DuckDB for fast
analytical queries is a stack you can build on.</p>

<p>There's more to come: DDL support to create and manage tables directly,
delete/update/merge support, and multi-table atomicity for writes that span
more than one table. In the meantime, the playground image linked above has
everything you need to kick the tires. As always, feedback and bug reports
are welcome on <a href="https://github.com/duckdb/duckdb-delta">GitHub</a>.</p>]]></content><author><name>Ben Fleis</name></author><category term="extensions" /><summary type="html"><![CDATA[DuckDB's Delta and Unity Catalog extensions shed their experimental tags — now with writes, Unity Catalog and time travel support.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/delta-uc-updates.jpg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/delta-uc-updates.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The DuckLake Spec Is so Simple, Even a Clanker Can Build One for Dataframes</title><link href="https://duckdb.org/2026/05/04/ducklake-dataframe.html" rel="alternate" type="text/html" title="The DuckLake Spec Is so Simple, Even a Clanker Can Build One for Dataframes" /><published>2026-05-04T00:00:00+00:00</published><updated>2026-05-04T00:00:00+00:00</updated><id>https://duckdb.org/2026/05/04/ducklake-dataframe</id><content type="html" xml:base="https://duckdb.org/2026/05/04/ducklake-dataframe.html"><![CDATA[]]></content><author><name>Pedro Holanda, Dr. 
Peter van Holland</name></author><category term="extensions" /><summary type="html"><![CDATA[We are showcasing the simplicity of DuckLake's v1.0 specification by developing a dataframe reader/writer with AI.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/ducklake-dataframe.jpg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/ducklake-dataframe.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing DuckDB 1.5.2</title><link href="https://duckdb.org/2026/04/13/announcing-duckdb-152.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.5.2" /><published>2026-04-13T00:00:00+00:00</published><updated>2026-04-13T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/13/announcing-duckdb-152</id><content type="html" xml:base="https://duckdb.org/2026/04/13/announcing-duckdb-152.html"><![CDATA[<p>In this blog post, we highlight a few important fixes in DuckDB v1.5.2, the second patch release in <a href="/2026/03/09/announcing-duckdb-150.html">DuckDB's v1.5 line</a>.
You can find the complete <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.2">release notes on GitHub</a>.</p>

<p>To install the new version, please visit the <a href="/install/">installation page</a>.</p>

<h2 id="data-lake-and-lakehouse-formats">Data Lake and Lakehouse Formats</h2>

<h3 id="ducklake">DuckLake</h3>

<p>We are proud to release a stable, production-ready lakehouse specification and its reference implementation in DuckDB.</p>

<p>We published a <a href="https://ducklake.select/2026/04/13/ducklake-10/">detailed blog post on the DuckLake site</a>, but here's a quick summary: DuckLake v1.0 ships dozens of bugfixes and guarantees backward-compatibility. Additionally, it has a number of cool features: <a href="https://ducklake.select/2026/04/02/data-inlining-in-ducklake/">data inlining</a>, sorted tables, bucket partitioning, and deletion vectors as Iceberg-compatible Puffin files. More on this in the <a href="https://ducklake.select/2026/04/13/ducklake-10/">announcement blog post</a>.</p>

<h3 id="iceberg">Iceberg</h3>

<p>The <a href="/docs/current/core_extensions/iceberg/overview.html">Iceberg extension</a> ships a number of new features. It now supports the following:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">GEOMETRY</code> type</li>
  <li><code class="language-plaintext highlighter-rouge">ALTER TABLE</code> statement</li>
  <li>Updates and deletes from <a href="https://iceberg.apache.org/docs/latest/partitioning/">partitioned tables</a></li>
  <li>Truncate and bucket partitions</li>
</ul>

<p>Last week, DuckLabs engineer Tom Ebergen gave a talk at the <a href="https://www.icebergsummit.org/">Iceberg Summit</a> titled <a href="/library/building-duckdb-iceberg-exploring-the-iceberg-ecosystem/">“Building DuckDB-Iceberg: Exploring the Iceberg Ecosystem”</a>, where he shared his experiences with Iceberg.</p>

<h2 id="preliminary-jepsen-test-results">Preliminary Jepsen Test Results</h2>

<p>To make DuckDB as robust as possible, we started a collaboration with <a href="https://jepsen.io/">Jepsen</a>. The preliminary test suite is available at <a href="https://github.com/duckdb/duckdb-jepsen">https://github.com/duckdb/duckdb-jepsen</a>.</p>

<p>The test suite has uncovered a bug triggered by <code class="language-plaintext highlighter-rouge">INSERT INTO</code> statements that perform conflict resolution on a primary key, and we have already <a href="https://github.com/duckdb/duckdb/pull/21489">shipped a fix</a> in this release.</p>
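
<p>For readers unfamiliar with this class of statement, here is a minimal sketch of an insert that resolves a conflict on a primary key. The table and values are hypothetical, and this is not the reproduction case from the Jepsen suite:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
INSERT INTO accounts VALUES (1, 100);

-- The second insert hits the existing key and updates the row instead of failing.
INSERT INTO accounts VALUES (1, 250)
ON CONFLICT (id) DO UPDATE SET balance = EXCLUDED.balance;

SELECT * FROM accounts;  -- → one row: (1, 250)
</code></pre></div></div>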

<h2 id="new-online-shell">New Online Shell</h2>

<p>The online <a href="/docs/current/clients/wasm/overview.html">WebAssembly</a> shell at <a href="https://shell.duckdb.org/"><code class="language-plaintext highlighter-rouge">shell.duckdb.org</code></a> received a complete overhaul.
A highlight of the new shell is the ability to store and list files using the <code class="language-plaintext highlighter-rouge">.files</code> dot command and its variants.</p>

<p>Using the file storage feature, you can turn your browser session into a workbench: you can drag-and-drop files from your local file system to upload them, create new ones using DuckDB's <a href="/docs/current/sql/statements/copy.html#copy--to"><code class="language-plaintext highlighter-rouge">COPY ... TO</code> statement</a> and download the results. For more information on this feature, use the <code class="language-plaintext highlighter-rouge">.help</code> command.</p>
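
<p>As a rough sketch of that workflow, the statements below write a small query result to a file and read it back. The file name <code class="language-plaintext highlighter-rouge">answer.csv</code> is hypothetical, and we assume the shell resolves relative paths against its session file storage:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Create a new file from a query result.
COPY (SELECT 42 AS answer) TO 'answer.csv' (HEADER);

-- Read the file back to confirm it was stored.
SELECT * FROM 'answer.csv';
</code></pre></div></div>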

<!--
<img src="/images/blog/online-shell-example.png" alt="Example use of the new online shell at shell.duckdb.org" width="800" />
-->

<p>The new shell comes with a few built-in datasets: you're welcome to try them out and experiment.
Your old links to <code class="language-plaintext highlighter-rouge">shell.duckdb.org</code> should still work, but if you experience any problems, please submit an issue in the <a href="https://github.com/duckdb/duckdb-wasm"><code class="language-plaintext highlighter-rouge">duckdb-wasm</code> repository</a>.</p>

<h2 id="benchmarks">Benchmarks</h2>

<p>We benchmarked DuckDB using the Linux v7 kernel on an <a href="https://instances.vantage.sh/aws/ec2/r8gd.8xlarge?currency=USD">r8gd.8xlarge</a> instance with 32 vCPUs, 256 GiB RAM, and an NVMe SSD.
We first ran the scale factor 300 test on Ubuntu 24.04 LTS, then upgraded to Ubuntu 26.04 beta.
We noticed that the composite TPC-H score shows a <strong>~10% improvement</strong>, jumping from 778,041 to 854,676 when measured with TPC-H's QphH@Size metric.</p>
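
<p>The rounded figure follows from the two scores quoted above. A quick check in DuckDB, where the column name is only illustrative:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Relative improvement of the composite score, in percent.
SELECT round(100.0 * (854676 - 778041) / 778041, 2) AS improvement_pct;  -- → 9.85
</code></pre></div></div>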

<h2 id="coming-up">Coming Up</h2>

<p>This quarter, we have quite a few exciting events lined up.</p>

<p><strong>DuckCon #7.</strong> On June 24, we'll host our next user conference, <a href="/events/2026/06/24/duckcon7/">DuckCon #7</a>, in Amsterdam's beautiful <a href="https://www.kit.nl/about-us/">Royal Tropical Institute</a>.</p>

<p><strong>AI Council Talk.</strong> On May 12, DuckDB co-creator Hannes Mühleisen will give a talk at AI Council 2026 titled <a href="/library/super-secret-next-big-thing-for-duckdb/">“Super-Secret Next Big Thing for DuckDB”</a>. Well, at this point, we cannot tell you more than that he will present the super-secret next big thing for DuckDB. But, if you cannot make it, don't worry: we'll publish the presentation afterwards.</p>

<p><strong>Ubuntu Summit Talk.</strong> We already talked about performance on Ubuntu. In late May, Gábor Szárnyas of DuckLabs will give a talk titled <a href="/library/duckdb-not-quack-science/">“DuckDB: Not Quack Science”</a> at the <a href="https://ubuntu.com/summit">Ubuntu Summit</a>.</p>

<h2 id="conclusion">Conclusion</h2>

<p>This post is a short summary of the changes in v1.5.2. As usual, you can find the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.2">full release notes on GitHub</a>.</p>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[We are releasing DuckDB version v1.5.2, a patch release with bugfixes and performance improvements, and support for the DuckLake v1.0 lakehouse format.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-2.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DuckLake v1.0: The Lakehouse Format Built on SQL Reaches Production-Readiness</title><link href="https://duckdb.org/2026/04/13/ducklake-10.html" rel="alternate" type="text/html" title="DuckLake v1.0: The Lakehouse Format Built on SQL Reaches Production-Readiness" /><published>2026-04-13T00:00:00+00:00</published><updated>2026-04-13T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/13/ducklake-10</id><content type="html" xml:base="https://duckdb.org/2026/04/13/ducklake-10.html"><![CDATA[]]></content><author><name>The DuckDB team</name></author><category term="extensions" /><summary type="html"><![CDATA[We released the DuckLake v1.0 standard!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/ducklake-1-0.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/ducklake-1-0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Data Inlining in DuckLake: Unlocking Streaming for Data Lakes</title><link href="https://duckdb.org/2026/04/02/data-inlining-in-ducklake.html" rel="alternate" type="text/html" title="Data Inlining in DuckLake: Unlocking Streaming for Data Lakes" 
/><published>2026-04-02T00:00:00+00:00</published><updated>2026-04-02T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/02/data-inlining-in-ducklake</id><content type="html" xml:base="https://duckdb.org/2026/04/02/data-inlining-in-ducklake.html"><![CDATA[]]></content><author><name>Pedro Holanda</name></author><category term="deep dive" /><summary type="html"><![CDATA[DuckLake’s data inlining stores small updates directly in the catalog, eliminating the “small files problem” and making continuous streaming into data lakes practical. Our benchmark shows 926× faster queries and 105× faster ingestion when compared to Iceberg.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/ducklake-inlining.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/ducklake-inlining.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DuckDB Now Speaks Dutch!</title><link href="https://duckdb.org/2026/04/01/duckdb-now-speaks-dutch.html" rel="alternate" type="text/html" title="DuckDB Now Speaks Dutch!" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/01/duckdb-now-speaks-dutch</id><content type="html" xml:base="https://duckdb.org/2026/04/01/duckdb-now-speaks-dutch.html"><![CDATA[<p>Historically speaking, SQL queries have always been formulated in English. The initial name of the language was even Structured <strong>English</strong> Query Language (SEQUEL), before it became SQL. Now, what if the Dutch hadn't traded away New Amsterdam (present-day New York)? Would we all have been writing SQL in Dutch instead?</p>

<p>Well, wonder no longer. Today we're releasing <a href="/community_extensions/extensions/eenddb.html"><strong>EendDB</strong></a>: a DuckDB extension that brings you the <strong>Gestructureerde Zoektaal,</strong> or GZT for short.</p>

<p>Want joins? We've got <code class="language-plaintext highlighter-rouge">SAMENVOEGEN</code>. Aggregates? <code class="language-plaintext highlighter-rouge">GROEP PER</code>. Window functions? Those work too — though you'll have to look up the Dutch keywords in the repository yourself.</p>

<p>You can try it out right now in <a href="/2026/03/23/announcing-duckdb-151.html">DuckDB v1.5.1</a>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> eenddb</span> <span class="k">FROM</span> <span class="n">community</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> eenddb</span><span class="p">;</span>
<span class="k">CALL</span> <span class="nf">enable_dutch_parser</span><span class="p">();</span>

<span class="k">MAAK</span> <span class="k">TABEL</span> <span class="n">eend</span> <span class="p">(</span>
    <span class="n">id</span>        <span class="nb">GEHEEL_GETAL</span><span class="p">,</span>
    <span class="n">naam</span>      <span class="nb">TEKST</span><span class="p">,</span>
    <span class="n">leeftijd</span>  <span class="nb">GEHEEL_GETAL</span><span class="p">,</span>
    <span class="n">gewicht</span>   <span class="nb">KOMMAGETAL</span><span class="p">,</span>
    <span class="n">soort</span>     <span class="nb">TEKST</span>
<span class="p">);</span>

<span class="k">TOEVOEGEN</span> <span class="k">AAN</span> <span class="n">eend</span> <span class="k">WAARDEN</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Donald'</span><span class="p">,</span>  <span class="mi">29</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">,</span> <span class="s1">'Wilde eend'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Daffy'</span><span class="p">,</span>   <span class="mi">35</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="s1">'Zwarte eend'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Daisy'</span><span class="p">,</span>   <span class="mi">27</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">,</span> <span class="s1">'Wilde eend'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Scrooge'</span><span class="p">,</span> <span class="mi">75</span><span class="p">,</span> <span class="mf">1.8</span><span class="p">,</span> <span class="s1">'Wilde eend'</span><span class="p">);</span>

<span class="k">SELECTEER</span> <span class="o">*</span>
<span class="k">VAN</span> <span class="n">eend</span>
<span class="k">WAARBIJ</span> <span class="n">gewicht</span> <span class="o">&gt;</span> <span class="mf">1.2</span> <span class="k">EN</span> <span class="n">naam</span> <span class="k">ZOALS</span> <span class="s1">'%D%'</span>
<span class="k">VOLGORDE</span> <span class="nb">PER</span> <span class="n">leeftijd</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬─────────┬──────────┬─────────┬─────────────┐
│  id   │  naam   │ leeftijd │ gewicht │    soort    │
│ int32 │ varchar │  int32   │  float  │   varchar   │
├───────┼─────────┼──────────┼─────────┼─────────────┤
│     2 │ Daffy   │       35 │     1.5 │ Zwarte eend │
└───────┴─────────┴──────────┴─────────┴─────────────┘
</code></pre></div></div>

<p>Of course, no query language is complete without joins and aggregates. Let's create a second table and count the ducks per <em>soort:</em></p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">MAAK</span> <span class="k">TABEL</span> <span class="n">soorten</span> <span class="p">(</span><span class="n">soort</span> <span class="nb">TEKST</span><span class="p">,</span> <span class="n">leefgebied</span> <span class="nb">TEKST</span><span class="p">);</span>

<span class="k">TOEVOEGEN</span> <span class="k">AAN</span> <span class="n">soorten</span> <span class="k">WAARDEN</span>
    <span class="p">(</span><span class="s1">'Wilde eend'</span><span class="p">,</span>  <span class="s1">'Meren en rivieren'</span><span class="p">),</span>
    <span class="p">(</span><span class="s1">'Zwarte eend'</span><span class="p">,</span> <span class="s1">'Kustgebieden'</span><span class="p">);</span>

<span class="k">SELECTEER</span> <span class="n">s.leefgebied</span><span class="p">,</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">ALS</span> <span class="n">aantal_eenden</span>
<span class="k">VAN</span> <span class="n">eend</span> <span class="k">ALS</span> <span class="n">e</span>
<span class="k">LINKS</span> <span class="k">SAMENVOEGEN</span> <span class="n">soorten</span> <span class="k">ALS</span> <span class="n">s</span> <span class="k">OP</span> <span class="n">e.soort</span> <span class="o">=</span> <span class="n">s.soort</span>
<span class="k">GROEP</span> <span class="nb">PER</span> <span class="n">s.leefgebied</span>
<span class="k">VOLGORDE</span> <span class="nb">PER</span> <span class="n">aantal_eenden</span> <span class="k">AFLOPEND</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────────────────┬───────────────┐
│    leefgebied     │ aantal_eenden │
│      varchar      │     int64     │
├───────────────────┼───────────────┤
│ Meren en rivieren │             3 │
│ Kustgebieden      │             1 │
└───────────────────┴───────────────┘
</code></pre></div></div>
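
<p>The same keywords also work without a join. As a small sketch that uses only keywords already shown in this post, the query below counts the ducks per <em>soort</em> straight from the <code class="language-plaintext highlighter-rouge">eend</code> table. Assuming the extension behaves as in the examples above, it should report 3 for “Wilde eend” and 1 for “Zwarte eend”:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECTEER soort, count(*) ALS aantal_eenden
VAN eend
GROEP PER soort
VOLGORDE PER aantal_eenden AFLOPEND;
</code></pre></div></div>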

<p>After we are done playing around, we obviously have to clean up after ourselves. Rather than <code class="language-plaintext highlighter-rouge">DROP</code> a table, in Dutch we like to throw it away (“weggooien”):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">GOOI_WEG</span> <span class="k">TABEL</span> <span class="n">eend</span><span class="p">;</span>
<span class="k">GOOI_WEG</span> <span class="k">TABEL</span> <span class="n">soorten</span><span class="p">;</span>
</code></pre></div></div>

<p>Under the hood, the parser is using DuckDB's <a href="/2026/03/09/announcing-duckdb-150.html#peg-parser">new experimental parser</a>, based on <a href="/2024/11/22/runtime-extensible-parsers.html">Parsing Expression Grammar</a>.</p>

<p>For more examples, check out the <a href="https://github.com/Dtenwolde/eenddb/">repository on GitHub</a>.</p>]]></content><author><name>Daniël ten Wolde</name></author><category term="extensions" /><summary type="html"><![CDATA[DuckDB now speaks Dutch! Load the EendDB community extension and start writing your queries in het Nederlands.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-now-speaks-dutch.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-now-speaks-dutch.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>