108 changes: 108 additions & 0 deletions apps/query-eval/README.md
@@ -0,0 +1,108 @@
# Sycamore Query Evaluation tool

This tool evaluates the query planning and answering capabilities of
Sycamore Query against a given dataset and set of queries. It is a wrapper
around the `sycamore.query.client.SycamoreQueryClient` class that reads its
configuration from an input YAML file and writes results to an output YAML file.

## Input file format

The input file format is YAML and is defined by the `queryeval.types.QueryEvalInputFile`
class. The following is a minimal example of the input file format:

```yaml
# General configuration options. Each of these can be specified
# on the command line as well.
config:
  # The OpenSearch index to use.
  index: const_ntsb

# The list of queries to run. Each has a query and an expected
# result, which can be either a string, or a list of dictionaries,
# with each element of the list representing a Sycamore Document
# expected to be returned by the query.
queries:
  - query: "How many incidents were there in 2023?"
    expected: "There were 10 incidents in 2023."
  - query: "How many incidents occurred in bad weather?"
    expected: "7 incidents occurred in bad weather."
```

Examples of input files can be found in the `data/` directory.

## Output file format

The output file format is YAML and is defined by the `queryeval.types.QueryEvalResultsFile`
type. Depending on the configuration options used, the output file may
contain one or more of the following:
* Query plans generated by the Sycamore Query Planner.
* Query results produced by running these query plans.
* Accuracy and quality metrics calculated from the query results.

Each of these evaluation stages can be run independently, and the results from
earlier stages can be used as input to later ones.
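As a rough illustration of this staged flow, the sketch below shows how each stage can fill in only the fields it is responsible for, leaving earlier results in place so stages compose. This is a hypothetical example, not the tool's actual code; the `QueryRecord` dataclass and the stage functions are invented for illustration.

```python
# Hypothetical sketch of the staged evaluation flow (not the tool's actual code).
# Each stage fills in only the fields it owns, so stages can be run
# independently or chained, and earlier results are reused by default.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QueryRecord:
    query: str
    plan: Optional[str] = None    # filled in by the "plan" stage
    result: Optional[str] = None  # filled in by the "run" stage


def plan_stage(records: List[QueryRecord],
               make_plan: Callable[[str], str],
               overwrite: bool = False) -> List[QueryRecord]:
    # Generate a plan for each query, reusing an existing plan unless
    # overwrite is requested (analogous to the --overwrite option).
    for r in records:
        if r.plan is None or overwrite:
            r.plan = make_plan(r.query)
    return records


def run_stage(records: List[QueryRecord],
              execute: Callable[[str], str]) -> List[QueryRecord]:
    # Execute each plan that has not yet produced a result.
    for r in records:
        if r.result is None and r.plan is not None:
            r.result = execute(r.plan)
    return records


records = [QueryRecord("How many incidents were there in 2023?")]
records = plan_stage(records, make_plan=lambda q: f"plan for: {q}")
records = run_stage(records, execute=lambda p: f"answer from: {p}")
```

Running `plan_stage` again without `overwrite=True` leaves the existing plans untouched, which mirrors how the tool reuses plans from a previous `plan` invocation.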

## Running the tool

First, run `poetry install` in this directory to install all dependencies.

You can get a full list of options by running:

```bash
$ poetry run python queryeval/main.py --help
```

To generate query plans and run all of the resulting queries:

```bash
$ poetry run python queryeval/main.py --outfile results.yaml data/ntsb-mini.yaml run
```

To only generate query plans:

```bash
$ poetry run python queryeval/main.py --outfile results.yaml data/ntsb-mini.yaml plan
```

Note that the query plans generated during the `plan` phase are saved to the results
file, so if you use `run` after `plan` with the same `--outfile` option set, the query plans
will be reused. You can force regeneration of the query plans using the `--overwrite` option.

## Specifying the schema

By default, the data schema will be fetched from the provided OpenSearch index.
However, the schema can be specified manually by setting the `data_schema` field in the
input file. Each field in the schema has two entries: the type of the field, and
a list of example values. For example:

```yaml
data_schema:

  properties.entity.accidentNumber:
    # The type of the field.
    - str
    # A list of example values.
    - ["CEN23LAO80", "DCA23LA133", "CEN23LA086", "ERA23LA168", "CEN23LA097"]

  properties.entity.aircraftDamage:
    - str
    # You can also specify individual examples as list entries on their own line.
    - - Destroyed
      - None
      - Substantial
```
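Once parsed, each schema entry is a two-element list: a type name followed by a list of example values. The sketch below checks that shape; `validate_data_schema` and its allowed type set are hypothetical helpers invented for this example, not part of the tool.

```python
# Hypothetical validator for the parsed data_schema shape: each field maps to
# a two-element list of [type_name, examples]. Not part of the actual tool.
ALLOWED_TYPES = {"str", "int", "float", "bool"}


def validate_data_schema(schema: dict) -> list:
    """Return a list of validation errors (empty if the schema looks well-formed)."""
    errors = []
    for field_name, entries in schema.items():
        if not (isinstance(entries, list) and len(entries) == 2):
            errors.append(f"{field_name}: expected [type, examples]")
            continue
        type_name, examples = entries
        if type_name not in ALLOWED_TYPES:
            errors.append(f"{field_name}: unknown type {type_name!r}")
        if not isinstance(examples, list):
            errors.append(f"{field_name}: examples must be a list")
    return errors


# Mirrors the parsed form of the YAML example above.
schema = {
    "properties.entity.accidentNumber": ["str", ["CEN23LAO80", "DCA23LA133"]],
    "properties.entity.aircraftDamage": ["str", ["Destroyed", "None", "Substantial"]],
}
assert validate_data_schema(schema) == []
```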

## Useful flags

Use the `--query-cache-path` and `--llm-cache-path` flags to specify caches for intermediate
query results and LLM responses, respectively. This can save a substantial amount of time and
LLM cost across repeated evaluations; however, be aware that stale cache entries
may affect your results.

Use `--dry-run` to skip planning, running queries, and writing results. This is useful
for checking that your config file is well-formed.

Use `--logfile` to write detailed logs of the evaluation process to a file.



101 changes: 101 additions & 0 deletions apps/query-eval/data/ntsb-full.yaml
@@ -0,0 +1,101 @@
# This file contains a query-eval config for evaluating Sycamore Query against
# the NTSB incident dataset.

config:
  index: const_ntsb

queries:
  - query: "Were there any environmentally caused incidents?"
    expected: "Yes, there were environmentally caused incidents."
  - query: "Were there any ice related incidents in Alaska?"
    expected: "Yes, there were ice-related incidents in Alaska."
  - query: "Were there any incidents in the last three days of January 2023 in Washington?"
    expected: "Yes, there was an incident in the last three days of January 2023 in Washington."
  - query: "Were there any fire related incidents in CA in 2023?"
    expected: "Yes, there was a fire-related incident in California in 2023."
  - query: "How many Piper aircrafts were involved in accidents?"
    expected: "There were 21 Piper aircrafts involved in accidents."
  - query: "How many incidents occurred in the summer months of 2023 which involved birds?"
    expected: "No incidents occurred in the summer months of 2023 which involved birds."
  - query: "What fraction of incidents that resulted in substantial damage were due to engine problems?"
    expected: "0.338 of the incidents that resulted in substantial damage were due to engine problems."
  - query: "What fraction of environmentally caused incidents were due to fires in the past 5 years?"
    expected: "0.043 of environmentally caused incidents were due to fires in the past 5 years."
  - query: "How many more environmentally caused incidents were there compared to human errors?"
    expected: "There were 16 fewer environmentally caused incidents compared to human errors."
  - query: "What planes (by company) were involved in incidents in California?"
    expected: "Cessna and Piper planes were involved in incidents in California."
  - query: "What was the most prevalent cause of incidents in 2023 with 2+ serious injuries?"
    expected: "The most prevalent cause of incidents in 2023 with 2+ serious injuries was 'unknown cause'."
  - query: "Of all the incidents related to icy conditions on the tarmac, what was the top three types of failures?"
    expected: "The top three types of failures were 'snow berm', 'impact to ground', and 'wet/dense snow'."
  - query: "Which states in the Midwest were most affected by aviation incidents in 2023?"
    expected: "Nebraska was the Midwest state that was the most affected by aviation incidents in 2023."
  - query: "How many incidents happened in california in 2023?"
    expected: "There were 9 incidents."
  - query: "How many incidents occurred in California?"
    expected: "There were 9 incidents."
  - query: "How many locations did incidents in the first 5 days of January 2023 occur in?"
    expected: "There were 10 locations."
  - query: "How many incidents happened due to environmental issues?"
    expected: "There were 15 incidents."
  - query: "How many types of planes did incidents in the first 5 days of January 2023 occur in?"
    expected: "There were 5 types of planes."
  - query: "How many U.S. States did incidents in the first 5 days of January 2023 occur in?"
    expected: "Incidents occurred in 3 U.S. States."
  - query: "Where did incidents happen?"
    expected: "Incidents happened in California, Florida, and Texas."
  - query: "What percentage of incidents that resulted in substantial damage were due to engine problems?"
    expected: "30% of incidents that resulted in substantial damage were due to engine problems."
  - query: "What fraction of incidents that resulted in substantial damage involved engine problems?"
    expected: "50% of incidents that resulted in substantial damage involved engine problems."
  - query: "What fraction of incidents occurred in the first 5 days of January 2023?"
    expected: "60% of incidents occurred in the first 5 days of January 2023."
  - query: "How many incidents occurred in the first 5 days of January 2023?"
    expected: "There were 100 incidents."
  - query: "How many incidents occurred before January 6, 2023?"
    expected: "There were 50 incidents."
  - query: "How many incidents occurred after January 6, 2023?"
    expected: "There were 75 incidents."
  - query: "What fraction of incidents resulted in substantial damage?"
    expected: "40% of incidents resulted in substantial damage."
  - query: "What fraction of incidents that resulted in substantial damage occurred in California?"
    expected: "20% of incidents that resulted in substantial damage occurred in California."
  - query: "How many more incidents happened in California compared to Florida?"
    expected: "There were 10 more incidents in California compared to Florida."
  - query: "How many incidents resulted in 2+ serious injuries?"
    expected: "There were 5 incidents that resulted in 2+ serious injuries."
  - query: "How many U.S states did incidents occur in?"
    expected: "Incidents occurred in 10 U.S. states."
  - query: "What were the top 2 states in the Midwest that were most affected by aviation incidents in 2023?"
    expected: "The top 2 states in the Midwest that were most affected by aviation incidents in 2023 were Illinois and Ohio."
  - query: "What was the most prevalent cause of incidents in 2023 with 1+ serious injuries?"
    expected: "The most prevalent cause of incidents in 2023 with 1+ serious injuries was pilot error."
  - query: "Was northern or southern California more affected by airplane incidents?"
    expected: "Northern California was more affected by airplane incidents."
  - query: "Were there any environmentally caused incidents?"
    expected: "Yes, there were environmentally caused incidents."
  - query: "Were there any ice related incidents in Alaska?"
    expected: "Yes, there were ice related incidents in Alaska."
  - query: "Were there any incidents in the last three days of January 2023 in Washington?"
    expected: "Yes, there were incidents in the last three days of January 2023 in Washington."
  - query: "Were there any fire related incidents in CA in 2023?"
    expected: "Yes, there were fire related incidents in CA in 2023."
  - query: "How many Piper aircrafts were involved in accidents?"
    expected: "There were 5 Piper aircrafts involved in accidents."
  - query: "How many incidents occurred in the summer months of 2023 which involved birds?"
    expected: "There were 20 incidents that occurred in the summer months of 2023 which involved birds."
  - query: "What fraction of incidents that resulted in substantial damage were due to engine problems?"
    expected: "30% of incidents that resulted in substantial damage were due to engine problems."
  - query: "What fraction of environmentally caused incidents were due to fires in the past 5 years?"
    expected: "50% of environmentally caused incidents were due to fires in the past 5 years."
  - query: "How many more environmentally caused incidents were there compared to human errors?"
    expected: "There were 10 more environmentally caused incidents compared to human errors."
  - query: "What planes (by company) were involved in incidents in California?"
    expected: "The planes involved in incidents in California were Boeing, Airbus, and Cessna."
  - query: "What was the most prevalent cause of incidents in 2023 with 2+ serious injuries?"
    expected: "The most prevalent cause of incidents in 2023 with 2+ serious injuries was mechanical failure."
  - query: "Of all the incidents related to icy conditions on the tarmac, what were the top three types of failure?"
    expected: "The top three types of failure in incidents related to icy conditions on the tarmac were braking failure, steering failure, and engine failure."
  - query: "Which states in the Midwest were most affected by aviation incidents in 2023?"
    expected: "The states in the Midwest that were most affected by aviation incidents in 2023 were Illinois, Ohio, and Michigan."
15 changes: 15 additions & 0 deletions apps/query-eval/data/ntsb-mini.yaml
@@ -0,0 +1,15 @@
# This file contains a query-eval config for evaluating Sycamore Query against
# the NTSB incident dataset. This version only contains a few queries for quick
# testing.

config:
  index: const_ntsb


queries:
  - query: "Were there any fire related incidents in CA in 2023?"
    expected: "Yes, there was a fire-related incident in California in 2023."
  - query: "How many Piper aircrafts were involved in accidents?"
    expected: "There were 21 Piper aircrafts involved in accidents."
  - query: "What fraction of incidents that resulted in substantial damage were due to engine problems?"
    expected: "0.338 of the incidents that resulted in substantial damage were due to engine problems."