Tags: pytorch/executorch
Merge branch 'main' into dont_build_ref_model_tests
Arm backend: Add FP16 tests of models (mv3, ic3)

Add testing of the following models executed in FP16:
- MobileNetV3
- InceptionV3

This patch verifies that the Arm backend is able to lower full models in FP16 to valid TOSA and execute them with acceptable numerical accuracy.

Signed-off-by: Martin Lindström <[email protected]>
Change-Id: Ice3c6913598d540f7c7a52e403260943a7c8c597
Add MaxPool1D decomposition pass support (#17022)

Summary:
Pull Request resolved: #17022

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on the Arm backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)

The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add a height dimension
2. max_pool2d: with adapted params [k] → [1, k], [s] → [1, s], [p] → [0, p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove the height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE**, which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions are met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)
2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both automatically get a proper SharedQuantizationSpec from the annotator
3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let the annotator fill them in)
   - Preserve the original meta on max_pool2d for proper tracing
   - MAX_POOL2D does not need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated

- U55: Stride ≤ 3 ✓, Kernel ≤ 256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by the separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1
Differential Revision: D91760459
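The three-step decomposition above can be checked against `max_pool1d` directly. The following is a minimal sketch in plain PyTorch (the function name `max_pool1d_via_2d` is illustrative, not the actual pass implementation):

```python
import torch
import torch.nn.functional as F

def max_pool1d_via_2d(x, kernel, stride, padding):
    # Step 1 (view_copy): (N, C, L) -> (N, C, 1, L), add a height dimension.
    n, c, l = x.shape
    x4d = x.view(n, c, 1, l)
    # Step 2 (max_pool2d): adapt params [k]->[1,k], [s]->[1,s], [p]->[0,p].
    y4d = F.max_pool2d(x4d, kernel_size=(1, kernel),
                       stride=(1, stride), padding=(0, padding))
    # Step 3 (view_copy): (N, C, 1, L_out) -> (N, C, L_out), drop the height dimension.
    return y4d.view(n, c, -1)

x = torch.randn(2, 3, 16)
ref = F.max_pool1d(x, kernel_size=4, stride=2, padding=1)
out = max_pool1d_via_2d(x, kernel=4, stride=2, padding=1)
assert torch.equal(ref, out)
```

Because the two reshapes only add and remove a unit height dimension, the pooling windows of the 2D op cover exactly the same elements as the 1D op, so the results match bit-for-bit.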
3/x: Wire LoadBackendOptionsMap through Program and Method (#17531)

Summary:
This diff wires the `LoadBackendOptionsMap` through the executor layer, connecting `Program::load_method()` and `Method` so that they accept and route backend options to delegates.

Key changes:
- `Program::load_method()` now accepts an optional `LoadBackendOptionsMap*` parameter
- `Method` stores a reference to the options map and looks up options by backend ID
- When initializing delegates, the runtime queries the map for backend-specific options and passes them to `BackendInitContext`

This enables the end-to-end flow:
```
Module::load(method_name, options_map)
  → Program::load_method(..., options_map)
  → Method initialization
  → Backend delegate init with runtime_specs from options_map
```

Reviewed By: larryliu0820
Differential Revision: D92461088
Add STABLE softmax decomposition config for Ethos-U (#17109)

Summary:
The current behavior for U55 defaults to UNSTABLE. This appears to be because masked fill is not supported on U55, but there is no inherent need to couple the two, and defaulting to UNSTABLE negatively impacts quantization performance.

This PR adds a new `STABLE` option to `SoftmaxDecompositionConfig` that provides numerically stable softmax decomposition without masked fill decomposition. The three softmax configs now behave as follows:
- `MASKED`: Stable softmax (with amax subtraction) + masked fill decomposition
- `UNSTABLE`: Unstable softmax (no amax subtraction), no masked fill decomposition
- `STABLE`: Stable softmax (with amax subtraction), no masked fill decomposition

For Ethos-U55 targets, `disable_masked_softmax()` now sets the config to `STABLE` instead of `UNSTABLE`, providing numerically stable softmax while avoiding the masked fill decomposition, which is not needed for these targets.

Reviewed By: Ninja91
Differential Revision: D92058235
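The numerical difference between the stable (amax-subtracted) and unstable decompositions can be illustrated in plain Python. This is a sketch of the underlying numerics only, not the backend's actual decomposition:

```python
import math

def softmax_unstable(xs):
    # UNSTABLE: exponentiate directly; exp overflows for large inputs.
    es = [math.exp(x) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def softmax_stable(xs):
    # STABLE: subtract the max (amax) first, so every exponent is <= 0.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

# For moderate inputs the two forms agree...
a = softmax_unstable([1.0, 2.0, 3.0])
b = softmax_stable([1.0, 2.0, 3.0])
assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))

# ...but large inputs overflow the unstable form, while the stable
# form still produces a valid probability distribution.
try:
    softmax_unstable([1000.0, 1001.0])
    overflowed = False
except OverflowError:
    overflowed = True
assert overflowed
out = softmax_stable([1000.0, 1001.0])
assert abs(sum(out) - 1.0) < 1e-12
```

Subtracting the maximum shifts every input so the largest exponent is exactly 0, which bounds the intermediate values without changing the result, since the shift cancels in the numerator and denominator.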
Arm backend: Consolidate simple operator visitors

Signed-off-by: Sebastian Larsson <[email protected]>
Change-Id: I23339f808f1074adea1fafddf90110c04fc5695f