What needs to happen?
This issue tracks and refers to other issues/PRs for various Prism features. It generally shouldn't be commented on. Instead, this top entry is edited as needed, referring to granular issues for individual features and support.
Ultimately, this will track support in the Beam Compatibility Matrix and help keep the Prism README up to date.
Complete items should be checked, and have links to their completing PR or closed primary tracking issue.
Items marked should only have an issue filed once work has started. Typically that means there is a meaningful design proposal and an understanding of the closing criteria. The criteria can be "X set of existing SDK tests now pass", or that a given capability is possible (e.g. UI related features).
Prism Areas for Contribution
Beam Core Priorities
These are features that prevent Prism use and adoption.
In progress by @lostluck
- State Handling ([Task][prism]: Support State API #28543)
- [prism] Timer Handling (Event Time) #29772
- [prism] TestStream support (Elements + Watermarks) #29917
- Watermark and Element Event handling
- [prism] Processing Time handling (TestStream, Timers, ProcessContinuations) #30083
- [prism] Triggers and Panes and Windowing Strategies #31438
- [Bug][Prism]: unsupported feature "WindowStrategy.AllowedLateness" set with MaxTimestamp. #31463
- [prism]: unsupported feature "WindowStrategy.OutputTime" set with value (LATEST|EARLIEST)_IN_PANE #31462
- [Bug][Prism]: unsupported feature "WindowingStrategy.Trigger" set with value (never|always) #31461
Beam Feature Burn Down (from Java and Python Validates Runner Tests)
The goal in this section is to correctly implement Beam features in Prism such that the validates runner suites of each SDK pass. The following issues were produced by examining the output of the failing tests. This list will be refined as new failures are discovered and initial blocking features are implemented.
Features
- [Prism] Support Bundle Finalization #31912
- [prism] Triggers and Panes and Windowing Strategies #31438
- [prism] Support WindowingStrategy.AccumulationMode Accumulating/Discarding
- [prism] Support WindowingStrategy.AllowedLateness
- [prism]: unsupported feature "WindowStrategy.OutputTime" set with value (LATEST|EARLIEST)_IN_PANE #31462
- [prism] Support Custom WindowFns #31921
- [prism] Timer Clearing #32115
Metrics
- [prism] Support Attempted Metrics #31926
- MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
- MetricsTest$AttemptedMetricTests.testAttemptedCounterMetrics
- [prism] Support StringSet Metrics #31927
- MetricsTest.testCommittedStringSetMetrics
- MetricsTest.testAttemptedStringSetMetrics
- [prism] Support Gauge Metrics #31928
- MetricsTest.testCommittedGaugeMetrics
- MetricsTest.testAttemptedGaugeMetrics
- [prism] Support Bounded Trie metrics in Prism.
- MetricsTest.testCommittedBoundedTrieMetrics
- MetricsTest.testAttemptedBoundedTrieMetrics
Windowing
Various Coders or PreProcessing issues.
- [prism] Java PipelineTest: Unknown Urn "ProjectTag", "IdentityTransform" - Support "Empty" Composites without subtransforms. #31991
- [prism] Support Empty Flattens as Side Input - no way to make progress with pending elements #32003
- [prism] Preprocess failure - Expected Runner Flatten Node - but wasn't #31992
- [prism] Java and Python SplittableDoFnTests - invalid stream header (likely coders) #32004
- [prism] Java CoGroupByKeyTest - test failures (likely coders)
- [prism] Java FlattenTest.testFlattenMultipleCoders - worker crash #32930
- [prism] Java org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests - testLargeKeys100MB
- Now a mystery panic? error? Message is getting lost. Need to find what's getting logged.
- Mysteriously now passing.
- [prism] ViewTest.testSideInputWithNestedIterables - user coder errors #32932
- [prism] org.apache.beam.sdk.transforms.ParDoSchemaTest - UserCoder not a row coder exceptions. #32931
- Likely started failing due to other fixes. Might need to change Row Coder handling in some instances, instead of simply wrapping them.
- [prism] Python Validates Runner (test_pack_combiners) - Unknown Coder not being processed (tuple) #32636
State and Timer issues.
- [prism] Java PerKeyOrderingTest - test failures (per key order not maintained) #32064
- [prism] Python Assert failures - Missing Data (likely state & timers)
- [prism] Support OnWindowExpiry #32211
- [prism] Java timer families cannot be looked up SDK side (prism re-write issue) #32221
- [prism] Support SortedTime input (java annotation & requirement) #33513
- [prism] Support OrderedListState #32929
- [prism] Resolve endless timer test org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testEventTimeTimerOrderingWithCreate #32222
Next Steps - only once the filtered suite fully passes.
- [prism] Next Step - Java Validation Tests - Unfilter tests from Gradle file.
- beam/runners/portability/java/build.gradle
- Did so locally and filed broad strokes failures above.
- [prism] Next Step - Python Validation Tests - Unfilter skipped portable tests (super class)
Non-Go Blockers
Notable issues found when trying to run the non-Go SDKs (Java, Python, or others). Tracked in #28187; more granular issues should be referred to here.
- [Task][prism]: Be able to execute non-Go SDKs on Prism. #28187
- Go SDK Cross Language PreCommit Suite ([#28187][prism] Basic cross language support. #28545)
- No Basic Xlang Tests filtered. ([Bug][prism]: Support receiving Java's CoGBKList Result #28544)
- Prism Java Validates Runner Suite
- [Task][Prism]: Create a PrismRunner for Java. #31793
- Gradle Target exists
- Github Action exists
- No tests filtered.
- Python Validates Runner Suite
- Properly respect and handle SDK & Runner Capabilities.
- [prism] Good Non-Go Beam developer experience #32564
Other Beam Core
This is an incomplete list of Beam features that would be nice to have.
- Multi-Chunk Iterable protocol ([Feature Request][prism]: Add support for the multi-chunk iterable protocol #27762)
- State Backed Iterables (for GBKs & Side Inputs)
- State based Continuation Tokens
- [prism] Support OrderedListState #32929
- Blocked on inclusion in the FnAPI.
- [prism] Document contradiction between Stateful and Splittable DoFns #32139
Persistence & Reliability Features
Prism currently stores everything in memory. This includes all element data, in-progress bundle data, pipeline info, artifacts, etc. This is fast, but not the best use of memory when using Prism long term as a stand-alone runner.
- Per Pipeline data should be moved to a local file cache.
- That way they aren't held in memory when not needed. E.g. artifacts shouldn't live in memory once the necessary environments are spun up.
- Garbage collect artifacts after pipeline termination.
- Garbage collect older pipelines after some threshold.
- Separate Prism management logs and pipeline logs, with rolling log files
- https://github.com/natefinch/lumberjack is a dependency free impl for this
- Pipeline Restarts
- Optimized stages need to be stored, so no complex mapping needs to occur for any persisted state.
- Per stage pending elements and state needs to be stored so bundles can be re-computed on restarts.
- It should be possible for a pipeline to be aborted, and prism torn down, and for a pipeline to be restarted from where it left off, with new worker processes.
- FrostDB is an embeddable-in-Go, write optimized, in-memory + persistence, columnar database that might be a good thing to look at to enable these features.
- Bundle Retries
- Prism currently doesn't retry failed bundles. A bundle failure fails the pipeline.
- Adding a sensible retry policy would improve bundle reliability.
- Affects how elements are divided into bundles, and scheduled.
- Eg. A failed bundle could be split into smaller and smaller bundles, until the failing elements are isolated. Such a strategy would also enable implementation of error tolerance policies for example.
- Improve (static) Bundle Splitting
- Prism currently schedules all available pending elements into a single bundle.
- Instead, it could use a heuristic to split pending elements into new bundles, improving worker-level parallelism before Channel or Sub Element Splitting occurs.
- [prism] Smarter "globally" aware dynamic splits. #32538
- [prism] Programmatic Cancel, and Drain #29669
- Pipeline Update
- Similar to Cancel + Drain in combination with Pipeline restarts. Allow a pipeline to be updated mid execution.
- [Prism] Use the worker-id gRPC metadata #32167
- Distinguishes between pipeline workers to avoid needing a new port for each instance.
- Need a single "multiplexer" layer to route between the handlers for given jobs and workers.
- [Bug]: Worker logs on Prism Stand-Alone Binary Not Appearing #30703
- SDK side logs via the Beam logging service to the runner should be available via the API. This may just require an SDK side change, or at least be a toggle on the prism stand alone binary.
- [prism] Metrics from runner expanded composites aren't mapped back to user terms. #31971
Performance features
These are non-user-facing Beam features that Dataflow implements. For Prism to serve the purpose of validating pipelines locally before execution on a production runner, these are required to reduce worker-side execution differences.
- Side Input + State Cache
- Elements on ProcessBundleRequest
- Elements on ProcessBundleResponse
- Autosharded keys
- Map Side Input Keys #31628
- Eagerly Bucket elements by Key + Window for GBKs
- Intern user keys, tag strings, and byte arrays to reduce memory bloat during stateful pipeline execution. Go is garbage collected, not magic.
Stand Alone UI Based Features
These are features that are best tied to the ability to understand a job in the UI.
- Data Sampling + plumbing to UI
- Worker Status support + plumbing to UI
- Runner side PubSub Transform (due to being a Beam built in)
- Display of Optimized stages in UI
- Display of Graph structure in UI
- Interactivity with same.
- Display of Job Logs in UI
- ...and storage thereof in local cache.
Other features
The following are known issues/desires without a specific categorization at present.
- Prism Per Job Configurability
- Being able to toggle or set specific configurations using PipelineOptions or similar.
- AKA the described Variants approach.
- Additional runner side execution metrics
- Count Splits per Transform
- Count of executed "Bundles".
- "stage" execution time
- Num elements per stage
- Num Keys per stage (if stateful)
- Histograms or Timeseries of the above?
- Public Documentation of Prism
- Include Prism on the Beam Website #32562
- Add prism to the Runner Capability Matrix
- Deep Dive Documentation on how Prism Works #32601
- Add Prism to Quickstarts and Examples
- Blog Post about Prism
Completed Work
This section should be structured similarly to the Beam Compatibility Matrix for ease of transition to populating it there.
- Environment Execution
- LOOPBACK/External
- Docker
- Process
- AnyOf
- Basics
- DoFns
- GBKS
- Windows
- Side Inputs
- Scaling
- Splittable DoFn support
- ProcessContinuation support
- Performance
- Fusion