junhyeong9812/python-elastic-query-builder

elastic-query-builder

Build Elasticsearch queries you can actually read.

A lightweight, zero-dependency Python library that brings the QueryBuilders pattern from Elasticsearch's Java client to Python: method chaining, explicit bool clause control, and plain dict output, with no magic and no hidden abstractions.


Why This Exists

The raw dict problem

Building ES queries in Python usually starts like this:

query = {"query": {"bool": {"must": [], "filter": []}}}

if keyword:
    query["query"]["bool"]["must"].append(
        {"match": {"title": {"query": keyword, "operator": "and"}}}
    )

if date_from or date_to:
    range_q = {"range": {"date": {}}}
    if date_from:
        range_q["range"]["date"]["gte"] = date_from
    if date_to:
        range_q["range"]["date"]["lte"] = date_to
    query["query"]["bool"]["filter"].append(range_q)

if status:
    query["query"]["bool"]["filter"].append(
        {"term": {"status": status}}
    )

if exclude_test:
    query["query"]["bool"].setdefault("must_not", []).append(
        {"term": {"applicant": "test"}}
    )

query["size"] = 20
query["sort"] = [{"date": {"order": "desc"}}]

When you have 10–20 fields with dynamic conditions, this becomes unreadable fast. Bracket nesting, setdefault calls, manual list appends — hard to review, easy to break.

The elasticsearch-dsl approach

The official elasticsearch-dsl wraps this in its own abstraction. Simple cases look clean:

s = Search().query("match", title="python").filter("term", status="active")

But when you need nested bool queries with mixed clauses, it forces deeply nested constructor calls:

s = Search()
s = s.query(
    Q('bool',
        must=[
            Q('match', title={'query': 'semiconductor', 'operator': 'and'}),
            Q('bool',
                should=[
                    Q('match', content='LED'),
                    Q('match_phrase', abstract='light emitting diode'),
                ],
                minimum_should_match=1
            ),
        ],
        filter=[
            Q('range', application_date={'gte': '20200101', 'lte': '20241231'}),
            Q('term', status_code='registered'),
        ],
        must_not=[Q('term', applicant_name='test')]
    )
)

The Q() wrappers become noise. You still have to mentally parse the nesting to figure out what's in must vs filter vs should. The abstraction hides the actual ES structure instead of making it clearer.

What this library does instead

Think of it like Elasticsearch's own Query DSL — but expressed as Python method chains instead of JSON nesting:

qb = QueryBuilder()

inner = qb.nested_bool()
inner.add_should(QueryBuilder.Match.build("content", "LED"))
inner.add_should(QueryBuilder.MatchPhrase.build("abstract", "light emitting diode"))
inner.add_minimum_should_match(1)

query = (
    qb.add_must(QueryBuilder.Match.build("title", "semiconductor", operator="and"))
      .add_must(inner.build())
      .add_filter(QueryBuilder.Range.build("application_date", gte="20200101", lte="20241231"))
      .add_filter(QueryBuilder.Term.build("status_code", "registered"))
      .add_must_not(QueryBuilder.Term.build("applicant_name", "test"))
      .build()
)

Every line tells you exactly what it does. add_must goes to must. add_filter goes to filter. The structure maps directly to the ES Query DSL you already know.
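
For comparison, here is a hand-written sketch of the Query DSL body that the chain above is meant to express. It is written from the standard Elasticsearch bool query shape, not captured from the library, so the exact output of build() (key order, optional fields) is an assumption.

```python
import json

# Hand-written Elasticsearch request body for the same search.
# Assumption: standard Query DSL shape; the library's exact output
# is not confirmed here.
expected_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": {"query": "semiconductor", "operator": "and"}}},
                {
                    # The nested bool built separately as `inner` above.
                    "bool": {
                        "should": [
                            {"match": {"content": "LED"}},
                            {"match_phrase": {"abstract": "light emitting diode"}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
            ],
            "filter": [
                {"range": {"application_date": {"gte": "20200101", "lte": "20241231"}}},
                {"term": {"status_code": "registered"}},
            ],
            "must_not": [{"term": {"applicant_name": "test"}}],
        }
    }
}

print(json.dumps(expected_body, indent=2))
```

Each add_* call lines up with one entry in the matching clause list, which is what makes the chain easy to check against the JSON.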

And dynamic conditions stay clean:

qb = QueryBuilder()

if keyword:
    qb.add_must(QueryBuilder.Match.build("title", keyword, operator="and"))

if date_from or date_to:
    qb.add_filter(QueryBuilder.Range.build("date", gte=date_from, lte=date_to))

if status:
    qb.add_filter(QueryBuilder.Term.build("status", status))

if exclude_test:
    qb.add_must_not(QueryBuilder.Term.build("applicant", "test"))

query = qb.build()

No bracket hell. No setdefault. No manual list management. Just add conditions and build.
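
One detail the snippet leaves open is what happens when only one of date_from and date_to is set. Whether Range.build drops None bounds is not stated here, so a defensive option is to pass only the bounds that exist. The range_bounds helper below is hypothetical and not part of the library.

```python
from typing import Any, Dict, Optional


def range_bounds(gte: Optional[str] = None, lte: Optional[str] = None) -> Dict[str, Any]:
    """Return only the bounds that were supplied (hypothetical helper)."""
    candidates = {"gte": gte, "lte": lte}
    return {name: value for name, value in candidates.items() if value is not None}


date_from, date_to = "2024-01-01", None
bounds = range_bounds(gte=date_from, lte=date_to)

# With the library, the bounds would be passed as keyword arguments, e.g.
# qb.add_filter(QueryBuilder.Range.build("date", **bounds))
# The plain-dict equivalent of that range clause:
range_clause = {"range": {"date": bounds}}
print(range_clause)
```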

This matters for code review. Whether you wrote the query, a teammate wrote it, or an LLM generated it — someone has to verify it does what it's supposed to. Readable structure makes that verification fast.


Installation

pip install elastic-query-builder

Zero dependencies. Works with elasticsearch-py, opensearch-py, or any HTTP client — build() returns a plain dict.
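
Because the result is a plain dict, it can be serialized with the standard library and posted to the _search endpoint without any client package. The sketch below uses a hand-written dict as a stand-in for qb.build(); the host http://localhost:9200 and the index name my-index are placeholders.

```python
import json
import urllib.request

# Stand-in for the dict returned by qb.build(); hand-written so the
# example runs without the library installed.
query = {"query": {"match": {"title": "elasticsearch"}}, "size": 10}

# Assumptions: a local cluster and an index called "my-index".
request = urllib.request.Request(
    url="http://localhost:9200/my-index/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the sketch does not need a running cluster.
# with urllib.request.urlopen(request) as response:
#     hits = json.loads(response.read())["hits"]["hits"]
```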


Quick Start

from elastic_query_builder import QueryBuilder

qb = QueryBuilder()
query = (
    qb.add_must(QueryBuilder.Match.build("title", "elasticsearch"))
      .add_filter(QueryBuilder.Range.build("date", gte="2024-01-01"))
      .set_size(10)
      .build()
)

# Use with any ES client
# es.search(index="my-index", body=query)

Real-World Example: Patent Search

from elastic_query_builder import QueryBuilder
from elastic_query_builder.core.enums import SortOrder

qb = QueryBuilder()

# Search conditions
qb.add_must(QueryBuilder.Match.build("productKor", "반도체", operator="and"))
qb.add_must(QueryBuilder.Match.build("abstract", "발광 다이오드", boost=2.0))

# Filters
qb.add_filter(QueryBuilder.Range.build("applicationDate", gte="20200101", lte="20241231"))
qb.add_filter(QueryBuilder.Term.build("statusCode", "registered"))

# Exclusions
qb.add_must_not(QueryBuilder.Term.build("applicantName", "테스트"))

# Sort + Pagination
qb.add_sort("applicationDate", SortOrder.DESC)
qb.set_size(20)
qb.set_from(0)
qb.set_track_total_hits(True)

# Source filtering
qb.set_source_includes(["applicationNumber", "productKor", "applicantName", "applicationDate"])

query = qb.build()
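
As a rough guide to what the sort, pagination, and source-filtering calls correspond to, here is a hand-written sketch of the matching top-level search body keys. These are the standard Elasticsearch field names; the library's exact output shape is an assumption, and page_offset is a hypothetical helper for turning a 1-based page number into a from value.

```python
from typing import Any, Dict


def page_offset(page: int, size: int) -> int:
    """Convert a 1-based page number into an Elasticsearch `from` offset."""
    if page < 1 or size < 0:
        raise ValueError("page must be >= 1 and size must be >= 0")
    return (page - 1) * size


# Hand-written top-level fields; not captured from the library.
search_options: Dict[str, Any] = {
    "sort": [{"applicationDate": {"order": "desc"}}],
    "size": 20,
    "from": page_offset(page=1, size=20),
    "track_total_hits": True,
    "_source": {
        "includes": ["applicationNumber", "productKor", "applicantName", "applicationDate"]
    },
}
```

Note that by default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so deep pagination needs a different mechanism such as search_after.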

What It Supports

Queries — Bool, Term, Terms, Match, MatchPhrase, Range, Wildcard, Exists, IDs, MatchAll, MatchNone, DisMax, Nested, SpanTerm, SpanNear

Aggregations — Terms, DateHistogram, Histogram, Range, Filter, Filters, Nested (bucket) / Sum, Avg, Min, Max, Stats, Cardinality, TopHits (metric)

Sort — Field sorting, score sorting, script sorting, missing value handling

Source filtering — Includes/excludes control, full disable

Output — Plain dict. No vendor lock-in.

For full API documentation with examples, see the docs/ folder.


Design Principles

  1. Explicit over implicit — method names map directly to ES bool clauses (add_must, add_filter, add_should, add_must_not)
  2. build() returns Dict[str, Any] — predictable, inspectable, serializable
  3. Method chaining via return self — fluent API without sacrificing readability
  4. Zero dependencies — pure Python, no transitive dependency risks
  5. Each query class owns its own output — single responsibility, easy to extend
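
Principles 1 to 3 can be illustrated with a toy builder. This is a minimal sketch of the pattern, not the library's source code: MiniBoolBuilder and term are hypothetical names used only for this example.

```python
from typing import Any, Dict, List


def term(field: str, value: Any) -> Dict[str, Any]:
    """A query helper that owns its own output shape."""
    return {"term": {field: value}}


class MiniBoolBuilder:
    """Toy illustration of the fluent-builder pattern (hypothetical)."""

    def __init__(self) -> None:
        self._clauses: Dict[str, List[Dict[str, Any]]] = {}

    def _add(self, clause: str, query: Dict[str, Any]) -> "MiniBoolBuilder":
        self._clauses.setdefault(clause, []).append(query)
        return self  # returning self is what enables chaining

    def add_must(self, query: Dict[str, Any]) -> "MiniBoolBuilder":
        return self._add("must", query)

    def add_filter(self, query: Dict[str, Any]) -> "MiniBoolBuilder":
        return self._add("filter", query)

    def add_must_not(self, query: Dict[str, Any]) -> "MiniBoolBuilder":
        return self._add("must_not", query)

    def build(self) -> Dict[str, Any]:
        # Copy the lists so callers cannot mutate builder state via the result.
        return {"query": {"bool": {k: list(v) for k, v in self._clauses.items()}}}


result = (
    MiniBoolBuilder()
    .add_filter(term("status", "active"))
    .add_must_not(term("applicant", "test"))
    .build()
)
print(result)
```

The method name decides the clause, so a reviewer never has to trace nesting to see where a condition lands.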

Contributing

Contributions are welcome. Fork, branch, write tests, open a PR. See docs/ for architecture details.

License

MIT — see LICENSE.

About

A project implementing a builder-style Elasticsearch query builder in Python.
