# ADR-016: RuVector Integration for Training Pipeline

## Status

Implementing

## Context

The `wifi-densepose-train` crate (ADR-015) was initially implemented using
standard crates (`petgraph`, `ndarray`, custom signal processing). The ruvector
ecosystem provides published Rust crates with subpolynomial algorithms that
directly replace several components with superior implementations.

The ruvector crates are published on crates.io at v2.0.4 (ruvector-core at
v2.0.5), and their source is available at https://github.com/ruvnet/ruvector.

### Available ruvector crates (published on crates.io)

| Crate | Version | Description | Default Features |
|-------|---------|-------------|------------------|
| `ruvector-mincut` | 2.0.4 | World's first subpolynomial dynamic min-cut | `exact`, `approximate` |
| `ruvector-attn-mincut` | 2.0.4 | Min-cut gating attention (graph-based alternative to softmax) | all modules |
| `ruvector-attention` | 2.0.4 | Geometric, graph, and sparse attention mechanisms | all modules |
| `ruvector-temporal-tensor` | 2.0.4 | Temporal tensor compression with tiered quantization | all modules |
| `ruvector-solver` | 2.0.4 | Sublinear-time sparse linear solvers, O(log n) to O(√n) | `neumann`, `cg`, `forward-push` |
| `ruvector-core` | 2.0.5 | HNSW-indexed vector database core | — |
| `ruvector-math` | 2.0.4 | Optimal transport, information geometry | — |
| 28 | + |
| 29 | +### Verified API Details (from source inspection of github.com/ruvnet/ruvector) |
| 30 | + |
| 31 | +#### ruvector-mincut |
| 32 | + |
| 33 | +```rust |
| 34 | +use ruvector_mincut::{MinCutBuilder, DynamicMinCut, MinCutResult, VertexId, Weight}; |
| 35 | + |
| 36 | +// Build a dynamic min-cut structure |
| 37 | +let mut mincut = MinCutBuilder::new() |
| 38 | + .exact() // or .approximate(0.1) |
| 39 | + .with_edges(vec![(u: VertexId, v: VertexId, w: Weight)]) // (u32, u32, f64) tuples |
| 40 | + .build() |
| 41 | + .expect("Failed to build"); |
| 42 | + |
| 43 | +// Subpolynomial O(n^{o(1)}) amortized dynamic updates |
| 44 | +mincut.insert_edge(u, v, weight) -> Result<f64> // new cut value |
| 45 | +mincut.delete_edge(u, v) -> Result<f64> // new cut value |
| 46 | + |
| 47 | +// Queries |
| 48 | +mincut.min_cut_value() -> f64 |
| 49 | +mincut.min_cut() -> MinCutResult // includes partition |
| 50 | +mincut.partition() -> (Vec<VertexId>, Vec<VertexId>) // S and T sets |
| 51 | +mincut.cut_edges() -> Vec<Edge> // edges crossing the cut |
| 52 | +// Note: VertexId = u64 (not u32); Edge has fields { source: u64, target: u64, weight: f64 } |
| 53 | +``` |
| 54 | + |
| 55 | +`MinCutResult` contains: |
| 56 | +- `value: f64` — minimum cut weight |
| 57 | +- `is_exact: bool` |
| 58 | +- `approximation_ratio: f64` |
| 59 | +- `partition: Option<(Vec<VertexId>, Vec<VertexId>)>` — S and T node sets |
| 60 | + |
| 61 | +#### ruvector-attn-mincut |
| 62 | + |
| 63 | +```rust |
| 64 | +use ruvector_attn_mincut::{attn_mincut, attn_softmax, AttentionOutput, MinCutConfig}; |
| 65 | + |
| 66 | +// Min-cut gated attention (drop-in for softmax attention) |
| 67 | +// Q, K, V are all flat &[f32] with shape [seq_len, d] |
| 68 | +let output: AttentionOutput = attn_mincut( |
| 69 | + q: &[f32], // queries: flat [seq_len * d] |
| 70 | + k: &[f32], // keys: flat [seq_len * d] |
| 71 | + v: &[f32], // values: flat [seq_len * d] |
| 72 | + d: usize, // feature dimension |
| 73 | + seq_len: usize, // number of tokens / antenna paths |
| 74 | + lambda: f32, // min-cut threshold (larger = more pruning) |
| 75 | + tau: usize, // temporal hysteresis window |
| 76 | + eps: f32, // numerical epsilon |
| 77 | +) -> AttentionOutput; |
| 78 | + |
| 79 | +// AttentionOutput |
| 80 | +pub struct AttentionOutput { |
| 81 | + pub output: Vec<f32>, // attended values [seq_len * d] |
| 82 | + pub gating: GatingResult, // which edges were kept/pruned |
| 83 | +} |
| 84 | + |
| 85 | +// Baseline softmax attention for comparison |
| 86 | +let output: Vec<f32> = attn_softmax(q, k, v, d, seq_len); |
| 87 | +``` |
| 88 | + |
| 89 | +**Use case in wifi-densepose-train**: In `ModalityTranslator`, treat the |
| 90 | +`T * n_tx * n_rx` antenna×time paths as `seq_len` tokens and the `n_sc` |
| 91 | +subcarriers as feature dimension `d`. Apply `attn_mincut` to gate irrelevant |
| 92 | +antenna-pair correlations before passing to FC layers. |
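The token layout this use case assumes can be sketched in a few lines of stdlib Rust: a row-major CSI window `[T, n_tx, n_rx, n_sc]` already stores each antenna×time path's subcarriers contiguously, so it doubles as the flat `[seq_len * d]` buffer that `attn_mincut` expects. The helper names here are illustrative, not part of any crate.

```rust
// View a row-major CSI window [T, n_tx, n_rx, n_sc] as seq_len = T*n_tx*n_rx
// tokens of dimension d = n_sc, as one flat Vec<f32>.
fn csi_to_tokens(
    csi: &[f32],
    t_len: usize,
    n_tx: usize,
    n_rx: usize,
    n_sc: usize,
) -> (Vec<f32>, usize, usize) {
    assert_eq!(csi.len(), t_len * n_tx * n_rx * n_sc);
    let seq_len = t_len * n_tx * n_rx;
    // Row-major layout already places each (t, tx, rx) path's n_sc
    // subcarriers contiguously, so the [seq_len, d] token matrix is a copy.
    (csi.to_vec(), seq_len, n_sc)
}

// Token index for a given (t, tx, rx) antenna×time path.
fn token_index(t: usize, tx: usize, rx: usize, n_tx: usize, n_rx: usize) -> usize {
    (t * n_tx + tx) * n_rx + rx
}

fn main() {
    // 2 time steps × 3 tx × 3 rx × 56 subcarriers → 18 tokens of dimension 56.
    let csi = vec![0.0f32; 2 * 3 * 3 * 56];
    let (_tokens, seq_len, d) = csi_to_tokens(&csi, 2, 3, 3, 56);
    assert_eq!((seq_len, d), (18, 56));
    assert_eq!(token_index(1, 2, 0, 3, 3), 15); // path (t=1, tx=2, rx=0)
}
```

The returned buffer would then be passed as `q`, `k`, and `v` (self-attention) with `seq_len` and `d` per the signature above.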
| 93 | + |
| 94 | +#### ruvector-solver (NeumannSolver) |
| 95 | + |
| 96 | +```rust |
| 97 | +use ruvector_solver::neumann::NeumannSolver; |
| 98 | +use ruvector_solver::types::CsrMatrix; |
| 99 | +use ruvector_solver::traits::SolverEngine; |
| 100 | + |
| 101 | +// Build sparse matrix from COO entries |
| 102 | +let matrix = CsrMatrix::<f32>::from_coo(rows, cols, vec![ |
| 103 | + (row: usize, col: usize, val: f32), ... |
| 104 | +]); |
| 105 | + |
| 106 | +// Solve Ax = b in O(√n) for sparse systems |
| 107 | +let solver = NeumannSolver::new(tolerance: f64, max_iterations: usize); |
| 108 | +let result = solver.solve(&matrix, rhs: &[f32]) -> Result<SolverResult, SolverError>; |
| 109 | + |
| 110 | +// SolverResult |
| 111 | +result.solution: Vec<f32> // solution vector x |
| 112 | +result.residual_norm: f64 // ||b - Ax|| |
| 113 | +result.iterations: usize // number of iterations used |
| 114 | +``` |
| 115 | + |
| 116 | +**Use case in wifi-densepose-train**: In `subcarrier.rs`, model the 114→56 |
| 117 | +subcarrier resampling as a sparse regularized least-squares problem `A·x ≈ b` |
| 118 | +where `A` is a sparse basis-function matrix (physically motivated by multipath |
| 119 | +propagation model: each target subcarrier is a sparse combination of adjacent |
| 120 | +source subcarriers). Gives O(√n) vs O(n) for n=114 subcarriers. |
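One plausible construction of the sparse basis matrix `A`, sketched in stdlib Rust as COO entries of the `(row, col, val)` form that `CsrMatrix::from_coo` consumes: each target subcarrier draws Gaussian weights from a few adjacent source subcarriers. The function name, bandwidth, and sigma are illustrative assumptions, not the crate's or project's actual code.

```rust
// Build COO entries for a sparse Gaussian interpolation basis: each of the
// n_tgt target subcarriers is a normalized Gaussian-weighted combination of
// at most 2*bandwidth + 1 adjacent source subcarriers (of n_src).
fn gaussian_basis_coo(
    n_src: usize,     // e.g. 114 source subcarriers
    n_tgt: usize,     // e.g. 56 target subcarriers
    bandwidth: usize, // neighbors on each side of the center
    sigma: f32,
) -> Vec<(usize, usize, f32)> {
    let mut coo = Vec::new();
    for row in 0..n_tgt {
        // Map the target index onto the source subcarrier grid.
        let center = row as f32 * (n_src - 1) as f32 / (n_tgt - 1) as f32;
        let lo = (center as isize - bandwidth as isize).max(0) as usize;
        let hi = ((center as usize) + bandwidth + 1).min(n_src);
        // Gaussian weights, normalized so each row sums to 1.
        let mut weights: Vec<(usize, f32)> = (lo..hi)
            .map(|col| {
                let d = col as f32 - center;
                (col, (-d * d / (2.0 * sigma * sigma)).exp())
            })
            .collect();
        let sum: f32 = weights.iter().map(|&(_, w)| w).sum();
        for (_, w) in weights.iter_mut() {
            *w /= sum;
        }
        coo.extend(weights.into_iter().map(|(col, w)| (row, col, w)));
    }
    coo
}

fn main() {
    let coo = gaussian_basis_coo(114, 56, 2, 1.0);
    // At most 5 non-zeros per target row, so A stays very sparse.
    println!("nnz = {} of {} entries", coo.len(), 56 * 114);
}
```

These entries would feed `CsrMatrix::<f32>::from_coo(56, 114, coo)` before handing the system to `NeumannSolver`.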
| 121 | + |
| 122 | +#### ruvector-temporal-tensor |
| 123 | + |
| 124 | +```rust |
| 125 | +use ruvector_temporal_tensor::{TemporalTensorCompressor, TierPolicy}; |
| 126 | +use ruvector_temporal_tensor::segment; |
| 127 | + |
| 128 | +// Create compressor for `element_count` f32 elements per frame |
| 129 | +let mut comp = TemporalTensorCompressor::new( |
| 130 | + TierPolicy::default(), // configures hot/warm/cold thresholds |
| 131 | + element_count: usize, // n_tx * n_rx * n_sc (elements per CSI frame) |
| 132 | + id: u64, // tensor identity (0 for amplitude, 1 for phase) |
| 133 | +); |
| 134 | + |
| 135 | +// Mark access recency (drives tier selection): |
| 136 | +// hot = accessed within last few timestamps → 8-bit (~4x compression) |
| 137 | +// warm = moderately recent → 5 or 7-bit (~4.6–6.4x) |
| 138 | +// cold = rarely accessed → 3-bit (~10.67x) |
| 139 | +comp.set_access(timestamp: u64, tensor_id: u64); |
| 140 | + |
| 141 | +// Compress frames into a byte segment |
| 142 | +let mut segment_buf: Vec<u8> = Vec::new(); |
| 143 | +comp.push_frame(frame: &[f32], timestamp: u64, &mut segment_buf); |
| 144 | +comp.flush(&mut segment_buf); // flush current partial segment |
| 145 | + |
| 146 | +// Decompress |
| 147 | +let mut decoded: Vec<f32> = Vec::new(); |
| 148 | +segment::decode(&segment_buf, &mut decoded); // all frames |
| 149 | +segment::decode_single_frame(&segment_buf, frame_index: usize) -> Option<Vec<f32>>; |
| 150 | +segment::compression_ratio(&segment_buf) -> f64; |
| 151 | +``` |
| 152 | + |
| 153 | +**Use case in wifi-densepose-train**: In `dataset.rs`, buffer CSI frames in |
| 154 | +`TemporalTensorCompressor` to reduce memory footprint by 50–75%. The CSI window |
| 155 | +contains `window_frames` (default 100) frames per sample; hot frames (recent) |
| 156 | +stay at f32 fidelity, cold frames (older) are aggressively quantized. |
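A back-of-the-envelope check of the memory claim, using the per-tier ratios quoted above and the 10/40/50 hot/warm/cold split from the Decision section. This is an idealized ceiling that ignores segment metadata and decode staging, which is presumably why the ADR quotes a more conservative 50–75%; the tier split and the choice of the 4.6× end of the warm range are assumptions.

```rust
// Estimate the whole-window compression ratio from per-tier frame counts,
// treating each frame's compressed size as 1/ratio of its raw f32 size.
fn window_compression_ratio(hot: usize, warm: usize, cold: usize) -> f64 {
    let compressed = hot as f64 / 4.0    // hot:  8-bit ≈ 4.0x
        + warm as f64 / 4.6              // warm: 7-bit ≈ 4.6x (conservative end)
        + cold as f64 / 10.67;           // cold: 3-bit ≈ 10.67x
    (hot + warm + cold) as f64 / compressed
}

fn main() {
    // Assumed tier split for the default 100-frame window: 10 hot, 40 warm, 50 cold.
    let ratio = window_compression_ratio(10, 40, 50);
    let saved_pct = (1.0 - 1.0 / ratio) * 100.0;
    println!("idealized ratio ≈ {ratio:.1}x, memory saved ≈ {saved_pct:.0}%");
}
```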
| 157 | + |
| 158 | +#### ruvector-attention |
| 159 | + |
| 160 | +```rust |
| 161 | +use ruvector_attention::{ |
| 162 | + attention::ScaledDotProductAttention, |
| 163 | + traits::Attention, |
| 164 | +}; |
| 165 | + |
| 166 | +let attention = ScaledDotProductAttention::new(d: usize); // feature dim |
| 167 | + |
| 168 | +// Compute attention: q is [d], keys and values are Vec<&[f32]> |
| 169 | +let output: Vec<f32> = attention.compute( |
| 170 | + query: &[f32], // [d] |
| 171 | + keys: &[&[f32]], // n_nodes × [d] |
| 172 | + values: &[&[f32]], // n_nodes × [d] |
| 173 | +) -> Result<Vec<f32>>; |
| 174 | +``` |
| 175 | + |
| 176 | +**Use case in wifi-densepose-train**: In `model.rs` spatial decoder, replace the |
| 177 | +standard Conv2D upsampling pass with graph-based spatial attention among spatial |
| 178 | +locations, where nodes represent spatial grid points and edges connect neighboring |
| 179 | +antenna footprints. |
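For reference, a minimal stdlib sketch of scaled dot-product attention with the same `(query, keys, values)` shape as `compute` above. This is a textbook baseline to pin down the expected semantics, not the ruvector-attention implementation itself.

```rust
// softmax(q·K^T / sqrt(d)) · V for a single query over n_nodes keys/values.
fn scaled_dot_product_attention(query: &[f32], keys: &[&[f32]], values: &[&[f32]]) -> Vec<f32> {
    let d = query.len();
    let scale = 1.0 / (d as f32).sqrt();
    // Scaled similarity scores q·k_i / sqrt(d).
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| k.iter().zip(query).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();
    // Numerically stable softmax.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let z: f32 = exps.iter().sum();
    // Weighted sum of values.
    let mut out = vec![0.0f32; values[0].len()];
    for (w, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v.iter()) {
            *o += (w / z) * x;
        }
    }
    out
}

fn main() {
    // Identical keys → uniform weights → the output is the mean of the values.
    let q = [1.0f32, 0.0];
    let keys: [&[f32]; 2] = [&[1.0, 0.0], &[1.0, 0.0]];
    let values: [&[f32]; 2] = [&[2.0, 0.0], &[4.0, 0.0]];
    let out = scaled_dot_product_attention(&q, &keys, &values);
    assert!((out[0] - 3.0).abs() < 1e-5);
}
```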
| 180 | + |
| 181 | +--- |
| 182 | + |
| 183 | +## Decision |
| 184 | + |
| 185 | +Integrate ruvector crates into `wifi-densepose-train` at five integration points: |
| 186 | + |
| 187 | +### 1. `ruvector-mincut` → `metrics.rs` (replaces petgraph Hungarian for multi-frame) |
| 188 | + |
| 189 | +**Before:** O(n³) Kuhn-Munkres via DFS augmenting paths using `petgraph::DiGraph`, |
| 190 | +single-frame only (no state across frames). |
| 191 | + |
| 192 | +**After:** `DynamicPersonMatcher` struct wrapping `ruvector_mincut::DynamicMinCut`. |
| 193 | +Maintains the bipartite assignment graph across frames using subpolynomial updates: |
| 194 | +- `insert_edge(pred_id, gt_id, oks_cost)` when new person detected |
| 195 | +- `delete_edge(pred_id, gt_id)` when person leaves scene |
| 196 | +- `partition()` returns S/T split → `cut_edges()` returns the matched pred→gt pairs |
| 197 | + |
| 198 | +**Performance:** O(n^{1.5} log n) amortized update vs O(n³) rebuild per frame. |
| 199 | +Critical for >3 person scenarios and video tracking (frame-to-frame updates). |
| 200 | + |
| 201 | +The original `hungarian_assignment` function is **kept** for single-frame static |
| 202 | +matching (used in proof verification for determinism). |
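A hypothetical sketch of the edge construction step: mapping a prediction×ground-truth OKS matrix onto the `(u64, u64, f64)` edge tuples that ruvector-mincut consumes, with predictions at vertex ids `0..n_pred` and ground truths offset by `n_pred`. The function name and id scheme are illustrative, not the actual `DynamicPersonMatcher` code.

```rust
// Convert an OKS similarity matrix into weighted bipartite edges.
// oks[p][g] is the OKS similarity between prediction p and ground truth g.
fn oks_to_bipartite_edges(oks: &[Vec<f64>]) -> Vec<(u64, u64, f64)> {
    let n_pred = oks.len() as u64;
    let mut edges = Vec::new();
    for (p, row) in oks.iter().enumerate() {
        for (g, &similarity) in row.iter().enumerate() {
            // Min-cut prefers severing light edges, so weight by OKS cost
            // (1 - similarity): good matches become expensive to cut.
            let cost = 1.0 - similarity;
            edges.push((p as u64, n_pred + g as u64, cost));
        }
    }
    edges
}

fn main() {
    // 2 predictions × 2 ground truths; high OKS ⇒ low cut cost.
    let oks = vec![vec![0.9, 0.1], vec![0.2, 0.8]];
    let edges = oks_to_bipartite_edges(&oks);
    assert_eq!(edges.len(), 4);
    assert_eq!((edges[0].0, edges[0].1), (0, 2)); // pred 0 → gt 0 (offset by n_pred)
}
```

Per-frame maintenance then reduces to `insert_edge`/`delete_edge` calls on these tuples rather than rebuilding the whole cost matrix.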
| 203 | + |
| 204 | +### 2. `ruvector-attn-mincut` → `model.rs` (replaces flat MLP fusion in ModalityTranslator) |
| 205 | + |
| 206 | +**Before:** Amplitude/phase FC encoders → concatenate [B, 512] → fuse Linear → ReLU. |
| 207 | + |
| 208 | +**After:** Treat the `n_ant = T * n_tx * n_rx` antenna×time paths as `seq_len` |
| 209 | +tokens and `n_sc` subcarriers as feature dimension `d`. Apply `attn_mincut` to |
| 210 | +gate irrelevant antenna-pair correlations: |
| 211 | + |
| 212 | +```rust |
| 213 | +// In ModalityTranslator::forward_t: |
| 214 | +// amp/ph tensors: [B, n_ant, n_sc] → convert to Vec<f32> |
| 215 | +// Apply attn_mincut with seq_len=n_ant, d=n_sc, lambda=0.3 |
| 216 | +// → attended output [B, n_ant, n_sc] → flatten → FC layers |
| 217 | +``` |
| 218 | + |
| 219 | +**Benefit:** Automatic antenna-path selection without explicit learned masks; |
| 220 | +min-cut gating is more computationally principled than learned gates. |
| 221 | + |
| 222 | +### 3. `ruvector-temporal-tensor` → `dataset.rs` (CSI temporal compression) |
| 223 | + |
| 224 | +**Before:** Raw CSI windows stored as full f32 `Array4<f32>` in memory. |
| 225 | + |
| 226 | +**After:** `CompressedCsiBuffer` struct backed by `TemporalTensorCompressor`. |
| 227 | +Tiered quantization based on frame access recency: |
| 228 | +- Hot frames (last 10): f32 equivalent (8-bit quant ≈ 4× smaller than f32) |
| 229 | +- Warm frames (11–50): 5/7-bit quantization |
| 230 | +- Cold frames (>50): 3-bit (10.67× smaller) |
| 231 | + |
| 232 | +Encode on `push_frame`, decode on `get(idx)` for transparent access. |
| 233 | + |
| 234 | +**Benefit:** 50–75% memory reduction for the default 100-frame temporal window; |
| 235 | +allows 2–4× larger batch sizes on constrained hardware. |
| 236 | + |
| 237 | +### 4. `ruvector-solver` → `subcarrier.rs` (phase sanitization) |
| 238 | + |
| 239 | +**Before:** Linear interpolation across subcarriers using precomputed (i0, i1, frac) tuples. |
| 240 | + |
| 241 | +**After:** `NeumannSolver` for sparse regularized least-squares subcarrier |
| 242 | +interpolation. The CSI spectrum is modeled as a sparse combination of Fourier |
| 243 | +basis functions (physically motivated by multipath propagation): |
| 244 | + |
| 245 | +```rust |
| 246 | +// A = sparse basis matrix [target_sc, src_sc] (Gaussian or sinc basis) |
| 247 | +// b = source CSI values [src_sc] |
| 248 | +// Solve: A·x ≈ b via NeumannSolver(tolerance=1e-5, max_iter=500) |
| 249 | +// x = interpolated values at target subcarrier positions |
| 250 | +``` |
| 251 | + |
| 252 | +**Benefit:** O(√n) vs O(n) for n=114 source subcarriers; more accurate at |
| 253 | +subcarrier boundaries than linear interpolation. |
| 254 | + |
| 255 | +### 5. `ruvector-attention` → `model.rs` (spatial decoder) |
| 256 | + |
| 257 | +**Before:** Standard ConvTranspose2D upsampling in `KeypointHead` and `DensePoseHead`. |
| 258 | + |
| 259 | +**After:** `ScaledDotProductAttention` applied to spatial feature nodes. |
| 260 | +Each spatial location [H×W] becomes a token; attention captures long-range |
| 261 | +spatial dependencies between antenna footprint regions: |
| 262 | + |
| 263 | +```rust |
| 264 | +// feature map: [B, C, H, W] → flatten to [B, H*W, C] |
| 265 | +// For each batch: compute attention among H*W spatial nodes |
| 266 | +// → reshape back to [B, C, H, W] |
| 267 | +``` |
| 268 | + |
| 269 | +**Benefit:** Captures long-range spatial dependencies missed by local convolutions; |
| 270 | +important for multi-person scenarios. |
| 271 | + |
| 272 | +--- |
| 273 | + |
| 274 | +## Implementation Plan |
| 275 | + |
| 276 | +### Files modified |
| 277 | + |
| 278 | +| File | Change | |
| 279 | +|------|--------| |
| 280 | +| `Cargo.toml` (workspace + crate) | Add ruvector-mincut, ruvector-attn-mincut, ruvector-temporal-tensor, ruvector-solver, ruvector-attention = "2.0.4" | |
| 281 | +| `metrics.rs` | Add `DynamicPersonMatcher` wrapping `ruvector_mincut::DynamicMinCut`; keep `hungarian_assignment` for deterministic proof | |
| 282 | +| `model.rs` | Add `attn_mincut` bridge in `ModalityTranslator::forward_t`; add `ScaledDotProductAttention` in spatial heads | |
| 283 | +| `dataset.rs` | Add `CompressedCsiBuffer` backed by `TemporalTensorCompressor`; `MmFiDataset` uses it | |
| 284 | +| `subcarrier.rs` | Add `interpolate_subcarriers_sparse` using `NeumannSolver`; keep `interpolate_subcarriers` as fallback | |
| 285 | + |
| 286 | +### Files unchanged |
| 287 | + |
| 288 | +`config.rs`, `losses.rs`, `trainer.rs`, `proof.rs`, `error.rs` — no change needed. |
| 289 | + |
| 290 | +### Feature gating |
| 291 | + |
| 292 | +All ruvector integrations are **always-on** (not feature-gated). The ruvector |
| 293 | +crates are pure Rust with no C FFI, so they add no platform constraints. |
| 294 | + |
| 295 | +--- |
| 296 | + |
| 297 | +## Implementation Status |
| 298 | + |
| 299 | +| Phase | Status | |
| 300 | +|-------|--------| |
| 301 | +| Cargo.toml (workspace + crate) | **Complete** | |
| 302 | +| ADR-016 documentation | **Complete** | |
| 303 | +| ruvector-mincut in metrics.rs | Implementing | |
| 304 | +| ruvector-attn-mincut in model.rs | Implementing | |
| 305 | +| ruvector-temporal-tensor in dataset.rs | Implementing | |
| 306 | +| ruvector-solver in subcarrier.rs | Implementing | |
| 307 | +| ruvector-attention in model.rs spatial decoder | Implementing | |
| 308 | + |
| 309 | +--- |
| 310 | + |
| 311 | +## Consequences |
| 312 | + |
| 313 | +**Positive:** |
| 314 | +- Subpolynomial O(n^{1.5} log n) dynamic min-cut for multi-person tracking |
| 315 | +- Min-cut gated attention is physically motivated for CSI antenna arrays |
| 316 | +- 50–75% memory reduction from temporal quantization |
| 317 | +- Sparse least-squares interpolation is physically principled vs linear |
| 318 | +- All ruvector crates are pure Rust (no C FFI, no platform restrictions) |
| 319 | + |
| 320 | +**Negative:** |
| 321 | +- Additional compile-time dependencies (ruvector crates) |
| 322 | +- `attn_mincut` requires tensor↔Vec<f32> conversion overhead per batch element |
| 323 | +- `TemporalTensorCompressor` adds compression/decompression latency on dataset load |
| 324 | +- `NeumannSolver` requires diagonally dominant matrices; a sparse Tikhonov |
| 325 | + regularization term (λI) is added to ensure convergence |
| 326 | + |
| 327 | +## References |
| 328 | + |
| 329 | +- ADR-015: Public Dataset Training Strategy |
| 330 | +- ADR-014: SOTA Signal Processing Algorithms |
| 331 | +- github.com/ruvnet/ruvector (source: crates at v2.0.4) |
| 332 | +- ruvector-mincut: https://crates.io/crates/ruvector-mincut |
| 333 | +- ruvector-attn-mincut: https://crates.io/crates/ruvector-attn-mincut |
| 334 | +- ruvector-temporal-tensor: https://crates.io/crates/ruvector-temporal-tensor |
| 335 | +- ruvector-solver: https://crates.io/crates/ruvector-solver |
| 336 | +- ruvector-attention: https://crates.io/crates/ruvector-attention |