FILTER issue while using index #21944

@fqrious

Description

Environment

{
  "server": "arango",
  "license": "community",
  "version": "3.12.4"
}

Issue Summary

Two nearly identical AQL queries produce different results when a simple filter is added.

  • Query 1 (without doc._is_latest == TRUE) returns expected results.
  • Query 2 (with the additional filter doc._is_latest == TRUE) returns no results.
  • This seems to happen only when I filter on fields indexed with [*].
  • Filtering on other indexed fields works correctly.
  • It appears that filtering on any [*] field breaks as soon as an additional filter is added on a field that is not part of the index.

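Both queries use the same inverted index (kev_epss_inv) with forceIndexHint: true. In the execution plan for Query 1, the filter is covered by the index (rule remove-filter-covered-by-index). In the plan for Query 2, the index range is still (doc.`labels`[*] == "epss"), but the combined filter is moved into the IndexNode (rule move-filters-into-enumerate) and evaluated there with early pruning.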


Expected Behavior

Query 2 should return the subset of Query 1's documents that also satisfy doc._is_latest == TRUE. Since every document returned by Query 1 already satisfies that condition, both queries should return the same results.
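
To make the expectation concrete, here is a minimal Python model of the two queries. It is only a sketch: it assumes that doc.labels[*] == 'epss' on the inverted index means "at least one label equals 'epss'" (which is how Result 1 behaves), and the function names query1 and query2 are made up for illustration. It runs against three of the sample documents from the reproduction steps.

```python
# Three of the sample documents from "Steps to Reproduce".
DOCS = [
    {"_is_latest": True, "id": "report--8a22a246-f149-538c-8d1d-611cb74de030",
     "labels": ["epss"], "name": "EPSS Scores: CVE-2025-31772"},
    {"_is_latest": True, "id": "report--cb0f7af1-cc05-5103-b112-3620f50295cd",
     "labels": ["epss"], "name": "EPSS Scores: CVE-2019-8950"},
    {"_is_latest": True, "id": "report--cb16f552-f80e-5ef5-bf8a-9b6d1e4f05ba",
     "labels": ["epss"], "name": "EPSS Scores: CVE-2018-11289"},
]


def keep(doc, *keys):
    """Rough stand-in for AQL KEEP(): copy only the listed attributes that exist."""
    return {k: doc[k] for k in keys if k in doc}


def query1(docs, limit=10):
    """Model of Query 1. Assumption: a document matches if any label is 'epss'."""
    hits = [d for d in docs if "epss" in d.get("labels", [])]
    return [keep(d, "id", "_is_latest") for d in hits[:limit]]


def query2(docs, limit=10):
    """Model of Query 2: same label condition plus _is_latest == true."""
    hits = [
        d for d in docs
        if "epss" in d.get("labels", []) and d.get("_is_latest") is True
    ]
    return [keep(d, "id", "_is_latest") for d in hits[:limit]]


if __name__ == "__main__":
    # Expected behavior: identical results, because every match is latest.
    print(query1(DOCS) == query2(DOCS))
```

Under this model the two result lists are identical, which is what I expect from the server as well.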


Actual Behavior

Query 2 returns an empty result set, even though all documents from Query 1 have _is_latest == true.


Query 1

Query String (200 chars, results cachable: true):
 FOR doc IN nvd_cve_vertex_collection OPTIONS {indexHint: "kev_epss_inv", forceIndexHint: true}
 FILTER doc.labels[*] == 'epss' //AND doc._is_latest == TRUE
 LIMIT 10
 RETURN KEEP(doc, 'id', '_is_latest')

Execution plan:
 Id   NodeType          Par      Est.   Comment
  1   SingletonNode                 1   * ROOT 
  8   IndexNode               2602005     - FOR doc IN nvd_cve_vertex_collection   /* inverted index scan, index scan + document lookup */    
  5   LimitNode                    10       - LIMIT 0, 10
  6   CalculationNode     ✓        10       - LET #4 = KEEP(doc, "id", "_is_latest")   /* simple expression */   /* collections used: doc : nvd_cve_vertex_collection */
  7   ReturnNode                   10       - RETURN #4

Indexes used:
 By   Name           Type       Collection                  Unique   Sparse   Cache   Selectivity   Fields                          Stored values   Ranges
  8   kev_epss_inv   inverted   nvd_cve_vertex_collection   false    true     false           n/a   [ `id`, `labels[*]`, `name` ]   [  ]            (doc.`labels`[*] == "epss")

Functions used:
 Name   Deterministic   Cacheable   Uses V8
 KEEP   true            true        false  

Optimization rules applied:
 Id   Rule Name                                 Id   Rule Name                        
  1   use-indexes                                3   remove-unnecessary-calculations-2
  2   remove-filter-covered-by-index             4   async-prefetch                   

58 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00034


Result 1

[
  {
    "_is_latest": true,
    "id": "report--8a22a246-f149-538c-8d1d-611cb74de030"
  },
  {
    "_is_latest": true,
    "id": "report--cb0f7af1-cc05-5103-b112-3620f50295cd"
  },
  {
    "_is_latest": true,
    "id": "report--cb16f552-f80e-5ef5-bf8a-9b6d1e4f05ba"
  },
  {
    "_is_latest": true,
    "id": "report--cb26eb1b-7cb7-5c70-9422-e8ab47c9ea46"
  },
  {
    "_is_latest": true,
    "id": "report--cb29d083-25bc-5abd-8755-749083cfefc1"
  },
  {
    "_is_latest": true,
    "id": "report--cb2e23bb-e26d-5af2-a018-81cb25a9822c"
  },
  {
    "_is_latest": true,
    "id": "report--cb3934e4-b433-5fea-8500-010195624c80"
  },
  {
    "_is_latest": true,
    "id": "report--cb46ea40-0943-5611-b15b-ded5b3b58fe9"
  },
  {
    "_is_latest": true,
    "id": "report--cb49b842-85b5-567e-be43-d307455af4bb"
  },
  {
    "_is_latest": true,
    "id": "report--cb5e8543-e2e9-5667-b065-3d5017d68eca"
  }
]

Query 2

Query String (198 chars, results cachable: true):
 FOR doc IN nvd_cve_vertex_collection OPTIONS {indexHint: "kev_epss_inv", forceIndexHint: true}
 FILTER doc.labels[*] == 'epss' AND doc._is_latest == TRUE
 LIMIT 10
 RETURN KEEP(doc, 'id', '_is_latest')

Execution plan:
 Id   NodeType          Par      Est.   Comment
  1   SingletonNode                 1   * ROOT 
  8   IndexNode               2602005     - FOR doc IN nvd_cve_vertex_collection   /* inverted index scan, index scan + document lookup (filter projections: `_is_latest`, `labels`) */    FILTER ((doc.`labels`[*] == "epss") && (doc.`_is_latest` == true))   /* early pruning */   
  5   LimitNode                    10       - LIMIT 0, 10
  6   CalculationNode     ✓        10       - LET #4 = KEEP(doc, "id", "_is_latest")   /* simple expression */   /* collections used: doc : nvd_cve_vertex_collection */
  7   ReturnNode                   10       - RETURN #4

Indexes used:
 By   Name           Type       Collection                  Unique   Sparse   Cache   Selectivity   Fields                          Stored values   Ranges
  8   kev_epss_inv   inverted   nvd_cve_vertex_collection   false    true     false           n/a   [ `id`, `labels[*]`, `name` ]   [  ]            (doc.`labels`[*] == "epss")

Functions used:
 Name   Deterministic   Cacheable   Uses V8
 KEEP   true            true        false  

Optimization rules applied:
 Id   Rule Name                           Id   Rule Name                           Id   Rule Name                  
  1   use-indexes                          2   move-filters-into-enumerate          3   async-prefetch             

57 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00066


Result 2

[]


Steps to Reproduce

  1. Create a collection and import the following documents:

     [
         {
             "_is_latest": true,
             "id": "report--8a22a246-f149-538c-8d1d-611cb74de030",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2025-31772"
         },
         {
             "_is_latest": true,
             "id": "report--cb0f7af1-cc05-5103-b112-3620f50295cd",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2019-8950"
         },
         {
             "_is_latest": true,
             "id": "report--cb16f552-f80e-5ef5-bf8a-9b6d1e4f05ba",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2018-11289"
         },
         {
             "_is_latest": true,
             "id": "report--cb26eb1b-7cb7-5c70-9422-e8ab47c9ea46",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2019-6596"
         },
         {
             "_is_latest": true,
             "id": "report--cb29d083-25bc-5abd-8755-749083cfefc1",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2018-16071"
         },
         {
             "_is_latest": true,
             "id": "report--cb2e23bb-e26d-5af2-a018-81cb25a9822c",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2018-18810"
         },
         {
             "_is_latest": true,
             "id": "report--cb3934e4-b433-5fea-8500-010195624c80",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2018-15518"
         },
         {
             "_is_latest": true,
             "id": "report--cb46ea40-0943-5611-b15b-ded5b3b58fe9",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2018-17584"
         },
         {
             "_is_latest": true,
             "id": "report--cb49b842-85b5-567e-be43-d307455af4bb",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2019-9563"
         },
         {
             "_is_latest": true,
             "id": "report--cb5e8543-e2e9-5667-b065-3d5017d68eca",
             "labels": [
             "epss"
             ],
             "name": "EPSS Scores: CVE-2018-5800"
         }
     ]
    
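  2. Create the following inverted index on the collection. The queries refer to it as kev_epss_inv, so set "name" accordingly (it is empty in the definition below):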

     {
         "type": "inverted",
         "name": "",
         "inBackground": true,
         "analyzer": "",
         "features": [],
         "cache": false,
         "includeAllFields": false,
         "trackListPositions": false,
         "searchField": false,
         "fields": [
             {
             "name": "id"
             },
             {
             "name": "labels[*]"
             },
             {
             "name": "name"
             }
         ],
         "cleanupIntervalStep": 2,
         "commitIntervalMsec": 1000,
         "consolidationIntervalMsec": 1000,
         "writebufferIdle": 64,
         "writebufferActive": 0,
         "writebufferSizeMax": 33554432,
         "primarySort": {
             "fields": [
             {
                 "field": "",
                 "direction": "asc"
             }
             ],
             "compression": "lz4"
         },
         "storedValues": [
             {
             "fields": [],
             "compression": "lz4"
             }
         ],
         "consolidationPolicy": {
             "type": "tier",
             "segmentsMin": 1,
             "segmentsMax": 10,
             "segmentsBytesMax": 5368709120,
             "segmentsBytesFloor": 2097152,
             "minScore": 0
         }
     }
  3. Run the two queries above, ensuring they both use the kev_epss_inv index with forceIndexHint: true.
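
The steps above can also be scripted. The following is a rough sketch using only the Python standard library and the ArangoDB HTTP API (POST /_api/index and POST /_api/cursor). Several things in it are assumptions rather than part of the original report: the trimmed index definition (only type, name and fields, with the name taken from the execution plan), HTTP basic auth, and the helper names index_definition, build_query, post and reproduce. It expects the collection to exist already and to contain the documents from step 1.

```python
import base64
import json
import urllib.request

COLLECTION = "nvd_cve_vertex_collection"
INDEX_NAME = "kev_epss_inv"  # name shown in the execution plans


def index_definition():
    """Trimmed inverted index definition (assumption: defaults suffice otherwise)."""
    return {
        "type": "inverted",
        "name": INDEX_NAME,
        "fields": [{"name": "id"}, {"name": "labels[*]"}, {"name": "name"}],
    }


def build_query(with_latest_filter):
    """Return Query 1 (False) or Query 2 (True) as an AQL string."""
    extra = " AND doc._is_latest == TRUE" if with_latest_filter else ""
    return (
        f'FOR doc IN {COLLECTION} '
        f'OPTIONS {{indexHint: "{INDEX_NAME}", forceIndexHint: true}}\n'
        f" FILTER doc.labels[*] == 'epss'{extra}\n"
        " LIMIT 10\n"
        " RETURN KEEP(doc, 'id', '_is_latest')"
    )


def post(base_url, path, body, user, password):
    """POST a JSON body with basic auth and return the decoded JSON response."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def reproduce(base_url, user, password):
    """Create the index, run both queries, and return (result1, result2)."""
    post(base_url, f"/_api/index?collection={COLLECTION}",
         index_definition(), user, password)
    result1 = post(base_url, "/_api/cursor",
                   {"query": build_query(False)}, user, password)["result"]
    result2 = post(base_url, "/_api/cursor",
                   {"query": build_query(True)}, user, password)["result"]
    return result1, result2
```

Calling reproduce("http://localhost:8529", "root", "") against a local server (URL and credentials are placeholders) should return two lists. On 3.12.4 I would expect the second list to come back empty if the problem reproduces with the trimmed definition.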

