@phil-opp phil-opp commented Oct 15, 2025

How it works

  • Set the DORA_TEST_WITH_INPUTS env variable with the path to your input JSON file
  • (Optional) Set the DORA_TEST_WRITE_OUTPUTS_TO env variable with the path where the outputs should be written. If not set, dora will write an outputs.jsonl file next to the given inputs file
  • Start the node executable/script
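The steps above can be sketched as follows. This is only an illustration of how one might launch a node under the test harness from a parent process; the file names and node command are placeholder examples, not part of dora's API.

```python
import os
import subprocess

# Build the environment for the node process.
# The paths here are examples; point them at your own files.
env = dict(
    os.environ,
    DORA_TEST_WITH_INPUTS="inputs.json",        # required: path to the input JSON file
    DORA_TEST_WRITE_OUTPUTS_TO="outputs.jsonl", # optional: where outputs are written
)

# Start the node executable/script with that environment, e.g.:
# subprocess.run(["python", "my_node.py"], env=env, check=True)

print(env["DORA_TEST_WITH_INPUTS"], env["DORA_TEST_WRITE_OUTPUTS_TO"])
```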

The node will be run as usual, but its event channel will be filled from the given inputs JSON file. No connection to a dora daemon will be made.
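The replay behavior can be sketched in a few lines of Python. This is a simplified stand-in for what dora does internally (the real logic lives inside the node API, not in user code): parse the spec, then deliver each event at its `time_offset_secs` relative to node start.

```python
import json
import time

def replay_events(spec_json, sleep=time.sleep):
    """Yield the events from an input spec in timestamp order,
    sleeping until each event's time offset is reached.

    `sleep` is injectable so tests can run without real delays.
    """
    spec = json.loads(spec_json)
    elapsed = 0.0
    for event in sorted(spec["events"], key=lambda e: e["time_offset_secs"]):
        sleep(event["time_offset_secs"] - elapsed)
        elapsed = event["time_offset_secs"]
        yield event

spec = """{
    "id": "foo",
    "events": [
        {"time_offset_secs": 0.7, "type": "Input", "id": "tick"},
        {"time_offset_secs": 1.2, "type": "Stop"}
    ]
}"""

# Replay instantly by injecting a no-op sleep.
types = [event["type"] for event in replay_events(spec, sleep=lambda _: None)]
print(types)  # ['Input', 'Stop']
```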

Input JSON file format example:

{
    // ID of the node
    "id": "foo",
    // defines the events that the node should receive
    "events": [
        {
            // specifies when the event arrives (seconds since node start)
            "time_offset_secs": 0.7,
            // type of the event (supported types are `Input`, `Stop`, `InputClosed`, `AllInputsClosed`)
            "type": "Input",
            // input ID
            "id": "tick"
            // optional: `data` field with input data
        },
        {
            "time_offset_secs": 0.9,
            "type": "Input",
            "id": "tick"
        },
        {
            "time_offset_secs": 1.2,
            "type": "Stop"
        }
    ]
    // other supported fields: name, description, args, env, outputs, inputs, send_stdout_as (they all behave as in dataflow.yaml)
}

Output JSON file format example:

{"id":"random","data":9267023440904143729,"time_offset_secs":0.700793541}
{"id":"random","data":5753749540645363621,"time_offset_secs":0.900897584}

TODO:

  • Documentation
    • API docs
    • dora-rs.ai docs
  • add some tests for our examples and use them on our CI
  • take a look at the arrow_integration_test JSON format -> it might be better suited than our custom input JSON format
  • add option (via env variable) to write out received inputs as inputs.json files during normal dataflow operation -> to make creating files with complex input data easier
  • add option (via env variable) to omit time offsets in output formats -> to make them diff-able with expected outputs (the time offsets are a bit different on each run)
  • use the ArrowTestUnwrap format for outputs.jsonl (not possible because `ArrowJsonBatch::from_batch` is incomplete, see apache/arrow-rs#8684)
    • instead: Include data type in output JSON file

Add an optional `data_format` field that specifies the format of the `data` field. It defaults to deriving the schema from the given JSON object and converting it to the closest Arrow representation. The `ArrowTest` and `ArrowTestUnwrap` formats expect the `data` field to follow the Arrow integration test data format. The `ArrowTestUnwrap` format unwraps the first column of the deserialized RecordBatch to make other Arrow types representable (i.e. not just StructArrays).
The arrow integration test format crate panics in certain situations, which leads to a closed integration test channel. In that case we want to panic on the sending side too, to avoid endless loops.
Useful for diffing the file against an expected file (as time offsets are not deterministic).
@phil-opp

I opened apache/arrow-rs#8737 to add support for binary decoding to arrow-json.

The `arrow_integration_test` crate is incomplete and apparently only for internal use.
- wrap values if necessary
- avoid double-wrapping array values