# ADR-039: ESP32-S3 Edge Intelligence — On-Device Signal Processing and RuVector Integration

| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-03-03 |
| **Depends on** | ADR-018 (binary frame format), ADR-014 (SOTA signal processing), ADR-021 (vital sign extraction), ADR-029 (multistatic sensing), ADR-030 (persistent field model), ADR-031 (RuView sensing-first RF) |
| **Supersedes** | None |
## Context

The current ESP32-S3 firmware (1,018 lines, 7 files) is a "dumb sensor": it captures raw CSI frames and streams them unprocessed over UDP at ~20 Hz. All signal processing, feature extraction, presence detection, vital sign estimation, and pose inference happen server-side in the Rust crates.

This creates several limitations:

1. **Bandwidth waste**: each raw CSI frame carries 128-384 bytes of I/Q data plus the 20-byte header, so at 20 Hz a node streams roughly 3-8 KB/s. Most of this is noise.
2. **Latency**: the round trip to the server adds 5-50 ms, depending on the network.
3. **Server dependency**: nodes are useless without an active aggregator.
4. **Scalability ceiling**: a 6-node mesh at 20 Hz delivers 120 frames/s, all of which must be processed centrally, so the server becomes the bottleneck.
5. **No local alerting**: fall detection, breathing anomalies, and intrusion alerts must wait for the server round trip.

The ESP32-S3 has significant untapped compute:

- **Dual-core Xtensa LX7** at 240 MHz
- **512 KB SRAM** + optional 8 MB PSRAM (our board has 8 MB of flash; flash is not PSRAM)
- **Vector/DSP instructions**: PIE (Processor Instruction Extensions)
- **FPU**: hardware single-precision floating point
- **~80% idle CPU**: the current firmware uses <20% (WiFi + CSI callback + UDP send)
## Decision

Implement a **3-tier edge intelligence pipeline** in the ESP32-S3 firmware, progressively offloading signal processing from the server to the device. Tiers are selected at runtime through NVS configuration (the `edge_tier` key), and each tier builds on the one below it.

### Tier 1: Smart Filtering & Compression (Firmware C)

Lightweight processing in the CSI callback path, adding negligible latency.

| Feature | Source ADR | Algorithm | Memory | CPU |
|---------|-----------|-----------|--------|-----|
| **Phase sanitization** | ADR-014 | Linear phase unwrap + conjugate multiply | 256 B | <1% |
| **Amplitude normalization** | ADR-014 | Per-subcarrier running mean/std (Welford) | 512 B | <1% |
| **Subcarrier selection** | ADR-016 (ruvector-mincut) | Top-K variance subcarriers | 128 B | <1% |
| **Static environment suppression** | ADR-030 | Exponential moving average subtraction | 512 B | <1% |
| **Adaptive frame decimation** | New | Skip frames when CSI variance < threshold | 8 B | <1% |
| **Delta compression** | New | XOR + RLE vs. previous frame | 512 B | <2% |

**Bandwidth reduction**: an estimated 60-80%, by sending only changed, high-variance subcarriers.
**ADR-018 v2 frame extension** (backward-compatible):

```
Existing 20-byte header unchanged.
Optional trailer, present only when the extension bit in the header's flags field is set:
  [var]  Compressed I/Q (delta-coded, selected subcarriers only; N*2 bytes before compression, N = selected subcarriers)
  [8]    Subcarrier bitmap (1 bit per subcarrier; 64 bits cover all 64 subcarriers)
  [1]    Frame flags: bit0=compressed, bit1=phase-sanitized, bit2=amplitude-normed
  [1]    Motion score (0-255)
  [1]    Presence confidence (0-255)
  [1]    Reserved
```
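The delta-coded I/Q field could be produced along these lines. This is a sketch under assumptions, not the ADR-018 v2 wire format. It XORs each byte against the previous frame, so unchanged bytes become zero, and run-length encodes the result as `(count, value)` pairs with counts of 1-255. The function names are hypothetical, and the decoder is included only to show that the scheme is lossless.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR against the previous frame, then RLE as (count, value) pairs.
 * Returns bytes written, or 0 if the output would exceed out_cap
 * (worst case is 2*n when no bytes repeat). */
size_t delta_rle_encode(const int8_t *cur, const int8_t *prev, size_t n,
                        uint8_t *out, size_t out_cap)
{
    size_t o = 0, i = 0;
    while (i < n) {
        uint8_t v = (uint8_t)(cur[i] ^ prev[i]);
        size_t run = 1;
        while (i + run < n && run < 255 &&
               (uint8_t)(cur[i + run] ^ prev[i + run]) == v) {
            run++;
        }
        if (o + 2 > out_cap) return 0;
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return o;
}

/* Inverse: rebuild the current frame from the encoded stream and prev.
 * Returns the number of bytes reconstructed, or 0 on malformed input. */
size_t delta_rle_decode(const uint8_t *in, size_t in_len, const int8_t *prev,
                        int8_t *cur, size_t n)
{
    size_t i = 0, k = 0;
    while (i + 1 < in_len) {
        uint8_t run = in[i], v = in[i + 1];
        i += 2;
        for (uint8_t r = 0; r < run; r++, k++) {
            if (k >= n) return 0;
            cur[k] = (int8_t)((uint8_t)prev[k] ^ v);
        }
    }
    return k;
}
```

If `out_cap` equals the uncompressed size, the encoder returns 0 whenever compression would not help. That gives one simple rule for when to clear flag bit0 and send raw I/Q instead.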
### Tier 2: On-Device Vital Signs & Presence (Firmware C + fixed-point DSP)

Runs as a FreeRTOS task on Core 1 (CSI collection stays on Core 0), processing a sliding window of CSI frames.

| Feature | Source ADR | Algorithm | Memory | CPU (Core 1) |
|---------|-----------|-----------|--------|--------------|
| **Presence detection** | ADR-029 | Variance threshold on amplitude envelope | 2 KB | 5% |
| **Motion scoring** | ADR-014 | Subcarrier correlation coefficient | 1 KB | 3% |
| **Breathing rate** | ADR-021 | Bandpass 0.1-0.5 Hz + peak detection on CSI phase | 8 KB | 10% |
| **Heart rate** | ADR-021 | Bandpass 0.8-2.0 Hz + autocorrelation on CSI phase | 8 KB | 15% |
| **Fall detection** | ADR-029 | Sudden variance spike + sustained stillness | 1 KB | 2% |
| **Room occupancy count** | ADR-037 | CSI rank estimation (eigenvalue spread) | 4 KB | 8% |
| **Coherence gate** | ADR-029 (ruvsense) | Z-score coherence, accept/reject/recalibrate | 1 KB | 2% |

**Total memory**: ~25 KB (fits in SRAM; no PSRAM needed).

**Total CPU**: ~45% of Core 1.
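The fall-detection row, specified in Phase 3 as a >5σ variance spike followed by >3 s of stillness, reduces to a small state machine. This sketch assumes the caller supplies a baseline mean and standard deviation of the per-frame amplitude variance, for example from the Tier 1 Welford statistics. `still_thresh` and `timeout_frames` are hypothetical tuning parameters, and the struct and function names are illustrative. How the NVS `fall_thresh` key maps onto these parameters is not specified here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    float    baseline_mean;   /* typical per-frame amplitude variance */
    float    baseline_std;    /* its standard deviation */
    float    still_thresh;    /* variance below this counts as "still" (hypothetical) */
    uint16_t still_frames;    /* required stillness: 3 s at 20 Hz = 60 frames */
    uint16_t timeout_frames;  /* disarm if stillness never arrives (hypothetical) */
    bool     armed;           /* a spike was seen; waiting for stillness */
    uint16_t since_spike;
    uint16_t still_run;       /* consecutive still frames since the spike */
} fall_detector_t;

/* Feed one per-frame variance value. Returns true exactly once per confirmed
 * fall: a >5-sigma spike followed by still_frames consecutive still frames. */
bool fall_detector_update(fall_detector_t *fd, float frame_var)
{
    if (!fd->armed) {
        if (frame_var > fd->baseline_mean + 5.0f * fd->baseline_std) {
            fd->armed = true;
            fd->since_spike = 0;
            fd->still_run = 0;
        }
        return false;
    }
    fd->since_spike++;
    fd->still_run = (frame_var < fd->still_thresh) ? (uint16_t)(fd->still_run + 1) : 0;
    if (fd->still_run >= fd->still_frames) {
        fd->armed = false;            /* confirmed: report and reset */
        return true;
    }
    if (fd->since_spike > fd->timeout_frames) {
        fd->armed = false;            /* spike without stillness: not a fall */
    }
    return false;
}
```

The stillness requirement means an alert cannot fire earlier than `still_frames` after the spike. Any on-device latency advantage therefore comes from removing the network and server path, not from shortening the detection window.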
**Output**: a compact 32-byte vital-signs UDP packet at 1 Hz:

```
Offset  Size  Field
0       4     Magic: 0xC5110002 (vitals packet)
4       1     Node ID
5       1     Packet type (0x02 = vitals)
6       2     Sequence (LE u16)
8       1     Presence (0=empty, 1=present, 2=moving)
9       1     Motion score (0-255)
10      1     Occupancy estimate (0-8 persons)
11      1     Coherence gate (0=reject, 1=predict, 2=accept, 3=recalibrate)
12      2     Breathing rate (BPM * 100, LE u16); 0 if not detected
14      2     Heart rate (BPM * 100, LE u16); 0 if not detected
16      2     Breathing confidence (0-10000, LE u16)
18      2     Heart rate confidence (0-10000, LE u16)
20      1     Fall detected (0/1)
21      1     Anomaly flags (bitfield)
22      2     Ambient RSSI mean (LE i16)
24      4     CSI frame count since last report (LE u32)
28      4     Uptime seconds (LE u32)
```
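The packet mixes 1-, 2- and 4-byte fields. Writing it byte by byte avoids depending on compiler struct packing or host endianness. The sketch below follows the offset table above. It assumes the magic is little-endian like the other fields, which the table does not state. The `vitals_t` struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define VITALS_MAGIC   0xC5110002u
#define VITALS_PKT_LEN 32

typedef struct {
    uint8_t  node_id;
    uint16_t seq;
    uint8_t  presence, motion, occupancy, gate;
    float    breathing_bpm;   /* 0 if not detected */
    float    heart_bpm;       /* 0 if not detected */
    uint16_t breathing_conf;  /* 0-10000 */
    uint16_t heart_conf;      /* 0-10000 */
    uint8_t  fall, anomaly_flags;
    int16_t  rssi_mean;
    uint32_t frame_count, uptime_s;
} vitals_t;

static void put_le16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put_le32(uint8_t *p, uint32_t v) { put_le16(p, (uint16_t)v); put_le16(p + 2, (uint16_t)(v >> 16)); }

/* BPM travels as BPM*100 in a u16: round, and clamp to 0..65535 (NaN -> 0). */
static uint16_t bpm_x100(float bpm)
{
    float x = bpm * 100.0f + 0.5f;
    if (!(x > 0.0f)) return 0;
    if (x > 65535.0f) return 65535;
    return (uint16_t)x;
}

void vitals_pack(const vitals_t *v, uint8_t out[VITALS_PKT_LEN])
{
    memset(out, 0, VITALS_PKT_LEN);
    put_le32(out + 0, VITALS_MAGIC);   /* byte order of the magic is assumed LE */
    out[4] = v->node_id;
    out[5] = 0x02;                     /* packet type: vitals */
    put_le16(out + 6, v->seq);
    out[8] = v->presence;
    out[9] = v->motion;
    out[10] = v->occupancy;
    out[11] = v->gate;
    put_le16(out + 12, bpm_x100(v->breathing_bpm));
    put_le16(out + 14, bpm_x100(v->heart_bpm));
    put_le16(out + 16, v->breathing_conf);
    put_le16(out + 18, v->heart_conf);
    out[20] = v->fall;
    out[21] = v->anomaly_flags;
    put_le16(out + 22, (uint16_t)v->rssi_mean);   /* two's-complement bits */
    put_le32(out + 24, v->frame_count);
    put_le32(out + 28, v->uptime_s);
}
```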
### Tier 3: Lightweight Feature Extraction (Firmware C + optional PSRAM)

Pre-compute the features that the server-side neural network needs, reducing server CPU load by an estimated 60-80%.

| Feature | Source ADR | Algorithm | Memory | CPU |
|---------|-----------|-----------|--------|-----|
| **Phase difference matrix** | ADR-014 | Adjacent subcarrier phase diff | 4 KB | 5% |
| **Amplitude spectrogram** | ADR-014 | 64-bin FFT on 1s window per subcarrier | 32 KB | 15% |
| **Doppler-time map** | ADR-029 | 2D FFT across subcarriers × time | 16 KB | 10% |
| **Fresnel zone crossing** | ADR-014 | First Fresnel radius + fade count | 1 KB | 2% |
| **Cross-link correlation** | ADR-029 | Pearson correlation between TX-RX pairs | 2 KB | 5% |
| **Environment fingerprint** | ADR-027 (MERIDIAN) | PCA-compressed 16-dim CSI signature | 4 KB | 5% |
| **Gesture template match** | ADR-029 (ruvsense) | DTW on 8-dim feature vector | 8 KB | 10% |

**Total memory**: ~67 KB (SRAM), or up to 256 KB with PSRAM.

**Total CPU**: ~52% of Core 1.
**Output**: a variable-size feature-vector UDP packet (~200-500 bytes) at 4 Hz:

```
Offset  Size  Field
0       4     Magic: 0xC5110003 (feature packet)
4       1     Node ID
5       1     Packet type (0x03 = features)
6       2     Feature bitmap (which features are included)
8       4     Timestamp ms (LE u32)
12      N     Feature payloads (concatenated; lengths determined by the bitmap)
```
## NVS Configuration

All tiers are controllable via NVS without reflashing:

| NVS Key | Type | Default | Description |
|---------|------|---------|-------------|
| `edge_tier` | u8 | 0 | 0=raw only, 1=smart filter, 2=+vitals, 3=+features |
| `decim_thresh` | u16 | 100 | Adaptive decimation variance threshold |
| `subk_count` | u8 | 32 | Top-K subcarriers to keep (Tier 1) |
| `vital_window` | u16 | 300 | Vital sign window in frames (15 s at 20 Hz) |
| `vital_interval` | u16 | 1000 | Vital report interval (ms) |
| `feature_hz` | u8 | 4 | Feature extraction rate (Hz) |
| `fall_thresh` | u16 | 500 | Fall detection variance spike threshold |
| `presence_thresh` | u16 | 50 | Presence detection threshold |

Provisioning:

```bash
python firmware/esp32-csi-node/provision.py --port COM7 \
  --edge-tier 2 --vital-window 300 --presence-thresh 50
```
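On the firmware side, this table could be mirrored as a plain struct holding the defaults above, plus a sanity pass over the values read from NVS. The struct, field names, and clamping bounds are hypothetical. The bounds assume 64 subcarriers and a feature rate no higher than the ~20 Hz CSI rate, and an out-of-range `edge_tier` falls back to raw-only mode. On the device, each field would be read with ESP-IDF's `nvs_get_u8()`/`nvs_get_u16()`, keeping the default when the call returns `ESP_ERR_NVS_NOT_FOUND`. That part is omitted so the sketch compiles on a host.

```c
#include <stdint.h>

typedef struct {
    uint8_t  edge_tier;        /* 0=raw, 1=smart filter, 2=+vitals, 3=+features */
    uint16_t decim_thresh;
    uint8_t  subk_count;
    uint16_t vital_window;     /* frames */
    uint16_t vital_interval;   /* ms */
    uint8_t  feature_hz;
    uint16_t fall_thresh;
    uint16_t presence_thresh;
} edge_config_t;

static const edge_config_t EDGE_CONFIG_DEFAULTS = {
    .edge_tier = 0, .decim_thresh = 100, .subk_count = 32, .vital_window = 300,
    .vital_interval = 1000, .feature_hz = 4, .fall_thresh = 500, .presence_thresh = 50,
};

/* Repair out-of-range values loaded from NVS; returns how many were fixed. */
int edge_config_sanitize(edge_config_t *c)
{
    int fixed = 0;
    if (c->edge_tier > 3) { c->edge_tier = EDGE_CONFIG_DEFAULTS.edge_tier; fixed++; }
    if (c->subk_count == 0 || c->subk_count > 64) {
        c->subk_count = EDGE_CONFIG_DEFAULTS.subk_count; fixed++;
    }
    if (c->feature_hz == 0) { c->feature_hz = EDGE_CONFIG_DEFAULTS.feature_hz; fixed++; }
    else if (c->feature_hz > 20) { c->feature_hz = 20; fixed++; }
    if (c->vital_window == 0) { c->vital_window = EDGE_CONFIG_DEFAULTS.vital_window; fixed++; }
    return fixed;
}
```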
## Implementation Plan

### Phase 1: Infrastructure (1 week)

1. **Dual-core task architecture**
   - Core 0: WiFi + CSI callback (existing)
   - Core 1: Edge processing task (new FreeRTOS task)
   - Lock-free ring buffer between cores (single producer, single consumer)

2. **Ring buffer design**
   ```c
   #include <stdint.h>
   #include "esp_wifi.h"                      // wifi_csi_info_t (ESP-IDF)

   #define RING_BUF_FRAMES 64                 // ~3.2 s at 20 Hz
   #define CSI_MAX_IQ_LEN  384                // max I/Q payload in bytes
   typedef struct {
       wifi_csi_info_t info;                  // metadata copied in the CSI callback; info.buf points
                                              // into a driver buffer valid only during the callback
       int8_t   iq_data[CSI_MAX_IQ_LEN];      // copy of the I/Q payload (info.len valid bytes)
       uint32_t timestamp_ms;
       uint8_t  tx_mac[6];
   } csi_ring_entry_t;
   static csi_ring_entry_t s_ring[RING_BUF_FRAMES];  // static allocation, no malloc
   ```

3. **NVS config extension**: add `edge_tier` and tier-specific params
4. **ADR-018 v2 header**: backward-compatible extension bit
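The lock-free ring buffer from step 1 can be a single-producer/single-consumer queue. The CSI callback on Core 0 only advances `head` and the edge task on Core 1 only advances `tail`, so no lock is needed. The host-compilable sketch below uses C11 `<stdatomic.h>` and assumes a power-of-two capacity and a drop-newest policy when the buffer is full. `frame_t` is a hypothetical stand-in for `csi_ring_entry_t` so the sketch builds without ESP-IDF headers. The edge task could sleep between pops, for example on a FreeRTOS task notification (`xTaskNotifyGive`/`ulTaskNotifyTake`). A FreeRTOS queue is the simpler, locking alternative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_BUF_FRAMES 64   /* must be a power of two for the index mask */

/* Hypothetical stand-in for csi_ring_entry_t. */
typedef struct {
    uint32_t timestamp_ms;
    uint16_t len;
    int8_t   iq_data[384];
} frame_t;

typedef struct {
    frame_t slots[RING_BUF_FRAMES];
    _Atomic uint32_t head;     /* written only by the producer (Core 0) */
    _Atomic uint32_t tail;     /* written only by the consumer (Core 1) */
    _Atomic uint32_t dropped;  /* frames rejected because the ring was full */
} spsc_ring_t;

/* Producer (CSI callback): never blocks; drops the new frame when full. */
bool ring_push(spsc_ring_t *r, const frame_t *f)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail >= RING_BUF_FRAMES) {
        atomic_fetch_add_explicit(&r->dropped, 1, memory_order_relaxed);
        return false;
    }
    r->slots[head & (RING_BUF_FRAMES - 1)] = *f;
    /* release: the slot contents become visible before the new head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer (edge task): returns false when the ring is empty. */
bool ring_pop(spsc_ring_t *r, frame_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) return false;
    *out = r->slots[tail & (RING_BUF_FRAMES - 1)];
    /* release: the slot is fully read before the producer may reuse it */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The free-running 32-bit indices wrap harmlessly, because `head - tail` is computed modulo 2^32 and the capacity divides 2^32. The `dropped` counter could feed the "zero frame drops" success criterion directly.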
### Phase 2: Tier 1 Smart Filtering (1 week)

1. **Phase unwrap**: O(N) linear scan, in-place
2. **Welford running stats**: per-subcarrier mean/variance, O(1) update per sample
3. **Top-K subcarrier selection**: partial sort, O(N) on average with a selection algorithm such as quickselect
4. **Delta compression**: XOR against the previous frame, then RLE encode
5. **Adaptive decimation**: skip the frame if total variance < threshold
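Steps 2 and 3 combine naturally. Welford's update keeps per-subcarrier mean and variance with O(1) work per sample, and a quickselect over those variances picks the Top-K subcarriers in O(N) on average. The sketch below uses hypothetical names and assumes 64 subcarriers whose amplitudes are already computed from I/Q. Plain Welford weights all history equally, so a deployment tracking a changing room might prefer an exponentially weighted variant or periodic resets.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

#define N_SUBCARRIERS 64

typedef struct {
    uint32_t count;
    float mean[N_SUBCARRIERS];
    float m2[N_SUBCARRIERS];   /* running sum of squared deviations */
} welford_t;

/* Welford's online update: O(1) per subcarrier per frame. */
void welford_update(welford_t *w, const float amp[N_SUBCARRIERS])
{
    w->count++;
    for (int k = 0; k < N_SUBCARRIERS; k++) {
        float delta = amp[k] - w->mean[k];
        w->mean[k] += delta / (float)w->count;
        w->m2[k] += delta * (amp[k] - w->mean[k]);
    }
}

/* Sample variance of subcarrier k (0 until two frames have been seen). */
float welford_variance(const welford_t *w, int k)
{
    return (w->count > 1) ? w->m2[k] / (float)(w->count - 1) : 0.0f;
}

/* Quickselect on an index array so that idx[0..k-1] hold the k
 * highest-variance subcarriers (unordered). O(N) on average. */
void topk_subcarriers(const welford_t *w, int k, uint8_t idx[N_SUBCARRIERS])
{
    float var[N_SUBCARRIERS];
    for (int i = 0; i < N_SUBCARRIERS; i++) {
        idx[i] = (uint8_t)i;
        var[i] = welford_variance(w, i);
    }
    if (k <= 0 || k >= N_SUBCARRIERS) return;
    int lo = 0, hi = N_SUBCARRIERS - 1, target = k - 1;
    while (lo < hi) {
        float pivot = var[idx[(lo + hi) / 2]];
        int i = lo, j = hi;
        while (i <= j) {                      /* Hoare-style partition, descending */
            while (var[idx[i]] > pivot) i++;
            while (var[idx[j]] < pivot) j--;
            if (i <= j) {
                uint8_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
                i++; j--;
            }
        }
        if (target <= j) hi = j;              /* target lies in the left part */
        else if (target >= i) lo = i;         /* ... or in the right part */
        else break;                           /* target sits among pivot-equal values */
    }
}
```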
### Phase 3: Tier 2 Vital Signs (2 weeks)

1. **Presence detector**: amplitude variance over a 1 s window
2. **Motion scorer**: correlation coefficient between consecutive frames
3. **Breathing extractor**: port from `wifi-densepose-vitals::BreathingExtractor::esp32_default()`
   - Bandpass via biquad IIR filter (0.1-0.5 Hz)
   - Peak detection with parabolic interpolation
   - Fixed-point arithmetic (Q15.16) for efficiency
4. **Heart rate extractor**: port from `wifi-densepose-vitals::HeartRateExtractor::esp32_default()`
   - Bandpass via biquad IIR (0.8-2.0 Hz)
   - Autocorrelation peak search
5. **Fall detection**: variance spike (>5σ) followed by sustained stillness (>3 s)
6. **Coherence gate**: port from `ruvsense::coherence_gate` (Z-score threshold)
### Phase 4: Tier 3 Feature Extraction (2 weeks)

1. **FFT engine**: fixed-point 64-point FFT (radix-2 DIT, no library needed)
2. **Amplitude spectrogram**: 1 s sliding-window FFT per subcarrier
3. **Doppler-time map**: 2D FFT across subcarrier × time dimensions
4. **Phase difference matrix**: adjacent subcarrier Δφ
5. **Environment fingerprint**: online PCA (incremental SVD, 16 components)
6. **Gesture DTW**: 8 stored templates, dynamic time warping on the 8-dim feature vector
### Phase 5: CI/CD + Testing (1 week)

1. **GitHub Actions firmware build**: Docker `espressif/idf:v5.2` on every PR
2. **Host-side unit tests**: compile edge processing functions on x86 with mock CSI data
3. **Credential leak check**: binary string scan in CI
4. **Binary size tracking**: fail CI if the firmware exceeds 90% of the partition
5. **QEMU smoke test**: boot verification, NVS load, task creation

## ESP32-S3 Resource Budget

| Resource | Available | Tier 1 | Tier 2 | Tier 3 | Remaining |
|----------|-----------|--------|--------|--------|-----------|
| **SRAM** | 512 KB | 2 KB | 25 KB | 67 KB | 418 KB |
| **Core 0 CPU** | 100% | 5% | 0% | 0% | 75% (WiFi uses ~20%) |
| **Core 1 CPU** | 100% | 0% | 45% | 52% | 3% (worst-case sum of Tiers 2 and 3) |
| **Flash** | 1 MB partition | 4 KB code | 12 KB code | 20 KB code | 988 KB (before the existing firmware image) |

Note: Tier 2 and Tier 3 share Core 1 but are time-multiplexed (vitals at 1 Hz, features at 4 Hz). The Core 1 row sums the worst-case per-feature estimates (45% + 52% = 97%); the expected combined peak load is ~60% of Core 1.
## Mapping to Existing ADRs

| Existing ADR | Capability | Edge Tier | Implementation |
|-------------|------------|-----------|----------------|
| **ADR-014** (SOTA signal) | Phase sanitization | 1 | Linear unwrap in CSI callback |
| **ADR-014** | Amplitude normalization | 1 | Welford running stats |
| **ADR-014** | Feature extraction | 3 | FFT spectrogram + phase diff matrix |
| **ADR-014** | Fresnel zone detection | 3 | Fade counting + first Fresnel radius |
| **ADR-016** (RuVector) | Subcarrier selection | 1 | Top-K variance (simplified mincut) |
| **ADR-021** (Vitals) | Breathing rate | 2 | Biquad IIR + peak detect |
| **ADR-021** | Heart rate | 2 | Biquad IIR + autocorrelation |
| **ADR-021** | Anomaly detection | 2 | Z-score on vital readings |
| **ADR-027** (MERIDIAN) | Environment fingerprint | 3 | Online PCA, 16-dim signature |
| **ADR-029** (RuvSense) | Coherence gate | 2 | Z-score coherence scoring |
| **ADR-029** | Multistatic correlation | 3 | Pearson cross-link correlation |
| **ADR-029** | Gesture recognition | 3 | DTW template matching |
| **ADR-030** (Field model) | Static suppression | 1 | EMA background subtraction |
| **ADR-031** (RuView) | Sensing-first NDP | Existing | Already in firmware (stub) |
| **ADR-037** (Multi-person) | Occupancy counting | 2 | CSI rank estimation |
## Server-Side Changes

The Rust aggregator (`wifi-densepose-hardware`) needs to handle the new packet types:

```rust
match magic {
    0xC5110001 => parse_raw_csi_frame(buf),   // Existing
    0xC5110002 => parse_vitals_packet(buf),   // New: Tier 2
    0xC5110003 => parse_feature_packet(buf),  // New: Tier 3
    _ => Err(ParseError::UnknownMagic(magic)),
}
```

When edge tier ≥ 1, the server can skip its own phase sanitization and amplitude normalization for frames whose v2 trailer flags mark them as phase-sanitized and amplitude-normed. When edge tier = 3, the server skips feature extraction entirely and feeds the pre-computed features directly to the neural network.
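For the host-side unit tests, a C decoder that mirrors the Rust `match` would let firmware output be checked without building the Rust crate. This is a sketch with hypothetical names. It assumes the magic is little-endian, which the packet tables do not state explicitly, and it decodes only the two rate fields of the vitals packet.

```c
#include <stddef.h>
#include <stdint.h>

#define MAGIC_RAW_CSI  0xC5110001u
#define MAGIC_VITALS   0xC5110002u
#define MAGIC_FEATURES 0xC5110003u

typedef enum { PKT_UNKNOWN = 0, PKT_RAW_CSI, PKT_VITALS, PKT_FEATURES } pkt_kind_t;

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Classify a datagram by its leading 4-byte magic. */
pkt_kind_t classify_packet(const uint8_t *buf, size_t len)
{
    if (len < 4) return PKT_UNKNOWN;
    switch (get_le32(buf)) {
    case MAGIC_RAW_CSI:  return PKT_RAW_CSI;
    case MAGIC_VITALS:   return PKT_VITALS;
    case MAGIC_FEATURES: return PKT_FEATURES;
    default:             return PKT_UNKNOWN;
    }
}

/* Decode breathing and heart rate (BPM) from a 32-byte vitals packet.
 * Returns 0 on success, -1 if buf is not a complete vitals packet. */
int vitals_decode_rates(const uint8_t *buf, size_t len, float *breath_bpm, float *heart_bpm)
{
    if (len < 32 || classify_packet(buf, len) != PKT_VITALS) return -1;
    *breath_bpm = get_le16(buf + 12) / 100.0f;   /* stored as BPM * 100 */
    *heart_bpm  = get_le16(buf + 14) / 100.0f;
    return 0;
}
```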
## Testing Strategy

| Test Type | Tool | What |
|-----------|------|------|
| **Host unit tests** | gcc + Unity + mock CSI data | Phase unwrap, Welford stats, IIR filter, peak detect, DTW |
| **QEMU smoke test** | Docker QEMU | Boot, NVS load, task creation, ring buffer |
| **Hardware regression** | ESP32-S3 + serial log | Full pipeline: CSI → edge processing → UDP → server |
| **Accuracy validation** | Python reference impl | Compare edge vitals vs. server vitals on the same CSI data |
| **Stress test** | 6-node mesh | Tier 3 at 20 Hz sustained, no frame drops |
## Alternatives Considered

1. **Rust on ESP32 (esp-rs)**: more type-safe, and could share code with the server crates. Rejected: larger binary, longer compile times, and less mature ESP-IDF support for CSI APIs.

2. **MicroPython on ESP32**: easier prototyping. Rejected: too slow for 20 Hz real-time processing, and no fixed-point DSP.

3. **External co-processor (FPGA/DSP)**: maximum throughput. Rejected: the cost ($50+ per node) defeats the $8 ESP32 value proposition.

4. **Server-only processing**: keep the firmware dumb. Rejected: doesn't solve the bandwidth, latency, or standalone-operation requirements.
## Risks

| Risk | Mitigation |
|------|------------|
| Core 1 processing exceeds real-time budget | Adaptive quality: reduce `feature_hz` or fall back to a lower tier |
| Fixed-point arithmetic introduces accuracy drift | Validate against the Rust f64 reference on the same CSI data; track error bounds |
| NVS config complexity overwhelms users | Sensible defaults; provision.py presets: `--preset home`, `--preset medical`, `--preset security` |
| ADR-018 v2 header breaks old aggregators | Backward-compatible: old magic = old format. A new bit in the flags field signals the extension |
| Memory fragmentation from ring buffer | Static allocation only; no malloc in the edge processing path |
## Success Criteria

- [ ] Tier 1 reduces bandwidth by ≥60% with <1 dB SNR loss
- [ ] Tier 2 breathing rate within ±1 BPM of server-side estimate
- [ ] Tier 2 heart rate within ±3 BPM of server-side estimate
- [ ] Tier 2 fall detection alert latency <500 ms once the spike-plus-stillness criterion is met (vs. ~2 s server round trip)
- [ ] Tier 2 presence detection accuracy ≥95%
- [ ] Tier 3 feature extraction matches server output within 5% RMSE
- [ ] All tiers: zero frame drops at 20 Hz sustained on a single node
- [ ] Firmware binary stays under 90% of the 1 MB app partition
- [ ] SRAM usage stays under 400 KB (leaving headroom for the WiFi stack)
- [ ] CI pipeline: build + host unit tests + binary size check on every PR