Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Commit 14902e6

Browse files
committed
docs: ADR-039 ESP32-S3 edge intelligence — on-device signal processing
3-tier edge pipeline: smart filtering (60-80% bandwidth reduction), on-device vital signs (breathing, heart rate, fall detection), and lightweight feature extraction. Maps capabilities from ADR-014, 016, 021, 027, 029, 030, 031, 037 to ESP32-S3 hardware budget. Co-Authored-By: claude-flow <[email protected]>
1 parent 086b0e6 commit 14902e6

1 file changed

Lines changed: 299 additions & 0 deletions

File tree

Lines changed: 299 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,299 @@
1+
# ADR-039: ESP32-S3 Edge Intelligence — On-Device Signal Processing and RuVector Integration
2+
3+
| Field | Value |
4+
|-------|-------|
5+
| **Status** | Proposed |
6+
| **Date** | 2026-03-03 |
7+
| **Depends on** | ADR-018 (binary frame format), ADR-014 (SOTA signal processing), ADR-021 (vital sign extraction), ADR-029 (multistatic sensing), ADR-030 (persistent field model), ADR-031 (RuView sensing-first RF) |
8+
| **Supersedes** | None |
9+
10+
## Context
11+
12+
The current ESP32-S3 firmware (1,018 lines, 7 files) is a "dumb sensor" — it captures raw CSI frames and streams them unprocessed over UDP at ~20 Hz. All signal processing, feature extraction, presence detection, vital sign estimation, and pose inference happen server-side in the Rust crates.
13+
14+
This creates several limitations:
15+
1. **Bandwidth waste** — raw CSI frames are 128-384 bytes each at 20 Hz = ~60 KB/s per node. Most of this is noise.
16+
2. **Latency** — round-trip to server adds 5-50ms depending on network.
17+
3. **Server dependency** — nodes are useless without an active aggregator.
18+
4. **Scalability ceiling** — 6-node mesh at 20 Hz = 120 frames/s = server bottleneck.
19+
5. **No local alerting** — fall detection, breathing anomaly, or intrusion must wait for server roundtrip.
20+
21+
The ESP32-S3 has significant untapped compute:
22+
- **Dual-core Xtensa LX7** at 240 MHz
23+
- **512 KB SRAM** + optional 8 MB PSRAM (our board has 8 MB flash)
24+
- **Vector/DSP instructions** (PIE — Processor Instruction Extensions)
25+
- **FPU** — hardware single-precision floating point
26+
- **~80% idle CPU** — current firmware uses <20% (WiFi + CSI callback + UDP send)
27+
28+
## Decision
29+
30+
Implement a **3-tier edge intelligence pipeline** on the ESP32-S3 firmware, progressively offloading signal processing from the server to the device. Each tier is independently toggleable via NVS configuration.
31+
32+
### Tier 1: Smart Filtering & Compression (Firmware C)
33+
34+
Lightweight processing in the CSI callback path. Zero additional latency.
35+
36+
| Feature | Source ADR | Algorithm | Memory | CPU |
37+
|---------|-----------|-----------|--------|-----|
38+
| **Phase sanitization** | ADR-014 | Linear phase unwrap + conjugate multiply | 256 B | <1% |
39+
| **Amplitude normalization** | ADR-014 | Per-subcarrier running mean/std (Welford) | 512 B | <1% |
40+
| **Subcarrier selection** | ADR-016 (ruvector-mincut) | Top-K variance subcarriers | 128 B | <1% |
41+
| **Static environment suppression** | ADR-030 | Exponential moving average subtraction | 512 B | <1% |
42+
| **Adaptive frame decimation** | New | Skip frames when CSI variance < threshold | 8 B | <1% |
43+
| **Delta compression** | New | XOR + RLE vs. previous frame | 512 B | <2% |
44+
45+
**Bandwidth reduction**: 60-80% (send only changed, high-variance subcarriers).
46+
47+
**ADR-018 v2 frame extension** (backward-compatible):
48+
49+
```
50+
Existing 20-byte header unchanged.
51+
New optional trailer (if magic bit set):
52+
[N*2] Compressed I/Q (delta-coded, only selected subcarriers)
53+
[2] Subcarrier bitmap (which of 64 subcarriers included)
54+
[1] Frame flags: bit0=compressed, bit1=phase-sanitized, bit2=amplitude-normed
55+
[1] Motion score (0-255)
56+
[1] Presence confidence (0-255)
57+
[1] Reserved
58+
```
59+
60+
### Tier 2: On-Device Vital Signs & Presence (Firmware C + fixed-point DSP)
61+
62+
Runs as a FreeRTOS task on Core 1 (CSI collection on Core 0), processing a sliding window of CSI frames.
63+
64+
| Feature | Source ADR | Algorithm | Memory | CPU (Core 1) |
65+
|---------|-----------|-----------|--------|--------------|
66+
| **Presence detection** | ADR-029 | Variance threshold on amplitude envelope | 2 KB | 5% |
67+
| **Motion scoring** | ADR-014 | Subcarrier correlation coefficient | 1 KB | 3% |
68+
| **Breathing rate** | ADR-021 | Bandpass 0.1-0.5 Hz + peak detection on CSI phase | 8 KB | 10% |
69+
| **Heart rate** | ADR-021 | Bandpass 0.8-2.0 Hz + autocorrelation on CSI phase | 8 KB | 15% |
70+
| **Fall detection** | ADR-029 | Sudden variance spike + sustained stillness | 1 KB | 2% |
71+
| **Room occupancy count** | ADR-037 | CSI rank estimation (eigenvalue spread) | 4 KB | 8% |
72+
| **Coherence gate** | ADR-029 (ruvsense) | Z-score coherence, accept/reject/recalibrate | 1 KB | 2% |
73+
74+
**Total memory**: ~25 KB (fits in SRAM, no PSRAM needed).
75+
**Total CPU**: ~45% of Core 1.
76+
77+
**Output**: Compact vital-signs UDP packet (32 bytes) at 1 Hz:
78+
79+
```
80+
Offset Size Field
81+
0 4 Magic: 0xC5110002 (vitals packet)
82+
4 1 Node ID
83+
5 1 Packet type (0x02 = vitals)
84+
6 2 Sequence (LE u16)
85+
8 1 Presence (0=empty, 1=present, 2=moving)
86+
9 1 Motion score (0-255)
87+
10 1 Occupancy estimate (0-8 persons)
88+
11 1 Coherence gate (0=reject, 1=predict, 2=accept, 3=recalibrate)
89+
12 2 Breathing rate (BPM * 100, LE u16) — 0 if not detected
90+
14 2 Heart rate (BPM * 100, LE u16) — 0 if not detected
91+
16 2 Breathing confidence (0-10000, LE u16)
92+
18 2 Heart rate confidence (0-10000, LE u16)
93+
20 1 Fall detected (0/1)
94+
21 1 Anomaly flags (bitfield)
95+
22 2 Ambient RSSI mean (LE i16)
96+
24 4 CSI frame count since last report (LE u32)
97+
28 4 Uptime seconds (LE u32)
98+
```
99+
100+
### Tier 3: Lightweight Feature Extraction (Firmware C + optional PSRAM)
101+
102+
Pre-compute features that the server-side neural network needs, reducing server CPU by 60-80%.
103+
104+
| Feature | Source ADR | Algorithm | Memory | CPU |
105+
|---------|-----------|-----------|--------|-----|
106+
| **Phase difference matrix** | ADR-014 | Adjacent subcarrier phase diff | 4 KB | 5% |
107+
| **Amplitude spectrogram** | ADR-014 | 64-bin FFT on 1s window per subcarrier | 32 KB | 15% |
108+
| **Doppler-time map** | ADR-029 | 2D FFT across subcarriers × time | 16 KB | 10% |
109+
| **Fresnel zone crossing** | ADR-014 | First Fresnel radius + fade count | 1 KB | 2% |
110+
| **Cross-link correlation** | ADR-029 | Pearson correlation between TX-RX pairs | 2 KB | 5% |
111+
| **Environment fingerprint** | ADR-027 (MERIDIAN) | PCA-compressed 16-dim CSI signature | 4 KB | 5% |
112+
| **Gesture template match** | ADR-029 (ruvsense) | DTW on 8-dim feature vector | 8 KB | 10% |
113+
114+
**Total memory**: ~67 KB (SRAM) or up to 256 KB with PSRAM.
115+
**Total CPU**: ~52% of Core 1.
116+
117+
**Output**: Feature vector UDP packet (variable size, ~200-500 bytes) at 4 Hz:
118+
119+
```
120+
Offset Size Field
121+
0 4 Magic: 0xC5110003 (feature packet)
122+
4 1 Node ID
123+
5 1 Packet type (0x03 = features)
124+
6 2 Feature bitmap (which features included)
125+
8 4 Timestamp ms (LE u32)
126+
12 N Feature payloads (concatenated, lengths determined by bitmap)
127+
```
128+
129+
## NVS Configuration
130+
131+
All tiers controllable via NVS without reflashing:
132+
133+
| NVS Key | Type | Default | Description |
134+
|---------|------|---------|-------------|
135+
| `edge_tier` | u8 | 0 | 0=raw only, 1=smart filter, 2=+vitals, 3=+features |
136+
| `decim_thresh` | u16 | 100 | Adaptive decimation variance threshold |
137+
| `subk_count` | u8 | 32 | Top-K subcarriers to keep (Tier 1) |
138+
| `vital_window` | u16 | 300 | Vital sign window frames (15s at 20 Hz) |
139+
| `vital_interval` | u16 | 1000 | Vital report interval ms |
140+
| `feature_hz` | u8 | 4 | Feature extraction rate |
141+
| `fall_thresh` | u16 | 500 | Fall detection variance spike threshold |
142+
| `presence_thresh` | u16 | 50 | Presence detection threshold |
143+
144+
Provisioning:
145+
```bash
146+
python firmware/esp32-csi-node/provision.py --port COM7 \
147+
--edge-tier 2 --vital-window 300 --presence-thresh 50
148+
```
149+
150+
## Implementation Plan
151+
152+
### Phase 1: Infrastructure (1 week)
153+
154+
1. **Dual-core task architecture**
155+
- Core 0: WiFi + CSI callback (existing)
156+
- Core 1: Edge processing task (new FreeRTOS task)
157+
- Lock-free ring buffer between cores (producer-consumer)
158+
159+
2. **Ring buffer design**
160+
```c
161+
#define RING_BUF_FRAMES 64 // ~3.2s at 20 Hz
162+
typedef struct {
163+
wifi_csi_info_t info;
164+
int8_t iq_data[384]; // Max I/Q payload
165+
uint32_t timestamp_ms;
166+
uint8_t tx_mac[6];
167+
} csi_ring_entry_t;
168+
```
169+
170+
3. **NVS config extension** — add `edge_tier` and tier-specific params
171+
4. **ADR-018 v2 header** — backward-compatible extension bit
172+
173+
### Phase 2: Tier 1 — Smart Filtering (1 week)
174+
175+
1. **Phase unwrap** — O(N) linear scan, in-place
176+
2. **Welford running stats** — per-subcarrier mean/variance, O(1) update
177+
3. **Top-K subcarrier selection** — partial sort, O(N) with selection algorithm
178+
4. **Delta compression** — XOR vs previous frame, RLE encode
179+
5. **Adaptive decimation** — skip frame if total variance < threshold
180+
181+
### Phase 3: Tier 2 — Vital Signs (2 weeks)
182+
183+
1. **Presence detector** — amplitude variance over 1s window
184+
2. **Motion scorer** — correlation coefficient between consecutive frames
185+
3. **Breathing extractor** — port from `wifi-densepose-vitals::BreathingExtractor::esp32_default()`
186+
- Bandpass via biquad IIR filter (0.1-0.5 Hz)
187+
- Peak detection with parabolic interpolation
188+
- Fixed-point arithmetic (Q15.16) for efficiency
189+
4. **Heart rate extractor** — port from `wifi-densepose-vitals::HeartRateExtractor::esp32_default()`
190+
- Bandpass via biquad IIR (0.8-2.0 Hz)
191+
- Autocorrelation peak search
192+
5. **Fall detection** — variance spike (>5σ) followed by sustained stillness (>3s)
193+
6. **Coherence gate** — port from `ruvsense::coherence_gate` (Z-score threshold)
194+
195+
### Phase 4: Tier 3 — Feature Extraction (2 weeks)
196+
197+
1. **FFT engine** — fixed-point 64-point FFT (radix-2 DIT, no library needed)
198+
2. **Amplitude spectrogram** — 1s sliding window FFT per subcarrier
199+
3. **Doppler-time map** — 2D FFT across subcarrier × time dimensions
200+
4. **Phase difference matrix** — adjacent subcarrier Δφ
201+
5. **Environment fingerprint** — online PCA (incremental SVD, 16 components)
202+
6. **Gesture DTW** — 8 stored templates, dynamic time warping on 8-dim feature
203+
204+
### Phase 5: CI/CD + Testing (1 week)
205+
206+
1. **GitHub Actions firmware build** — Docker `espressif/idf:v5.2` on every PR
207+
2. **Host-side unit tests** — compile edge processing functions on x86 with mock CSI data
208+
3. **Credential leak check** — binary string scan in CI
209+
4. **Binary size tracking** — fail CI if firmware exceeds 90% of partition
210+
5. **QEMU smoke test** — boot verification, NVS load, task creation
211+
212+
## ESP32-S3 Resource Budget
213+
214+
| Resource | Available | Tier 1 | Tier 2 | Tier 3 | Remaining |
215+
|----------|-----------|--------|--------|--------|-----------|
216+
| **SRAM** | 512 KB | 2 KB | 25 KB | 67 KB | 418 KB |
217+
| **Core 0 CPU** | 100% | 5% | 0% | 0% | 75% (WiFi uses ~20%) |
218+
| **Core 1 CPU** | 100% | 0% | 45% | 52% | 3% (Tier 2+3 exclusive) |
219+
| **Flash** | 1 MB partition | 4 KB code | 12 KB code | 20 KB code | 964 KB |
220+
221+
Note: Tier 2 and Tier 3 run on Core 1 but are time-multiplexed — vitals at 1 Hz, features at 4 Hz. Combined peak load is ~60% of Core 1.
222+
223+
## Mapping to Existing ADRs
224+
225+
| Existing ADR | Capability | Edge Tier | Implementation |
226+
|-------------|------------|-----------|----------------|
227+
| **ADR-014** (SOTA signal) | Phase sanitization | 1 | Linear unwrap in CSI callback |
228+
| **ADR-014** | Amplitude normalization | 1 | Welford running stats |
229+
| **ADR-014** | Feature extraction | 3 | FFT spectrogram + phase diff matrix |
230+
| **ADR-014** | Fresnel zone detection | 3 | Fade counting + first Fresnel radius |
231+
| **ADR-016** (RuVector) | Subcarrier selection | 1 | Top-K variance (simplified mincut) |
232+
| **ADR-021** (Vitals) | Breathing rate | 2 | Biquad IIR + peak detect |
233+
| **ADR-021** | Heart rate | 2 | Biquad IIR + autocorrelation |
234+
| **ADR-021** | Anomaly detection | 2 | Z-score on vital readings |
235+
| **ADR-027** (MERIDIAN) | Environment fingerprint | 3 | Online PCA, 16-dim signature |
236+
| **ADR-029** (RuvSense) | Coherence gate | 2 | Z-score coherence scoring |
237+
| **ADR-029** | Multistatic correlation | 3 | Pearson cross-link correlation |
238+
| **ADR-029** | Gesture recognition | 3 | DTW template matching |
239+
| **ADR-030** (Field model) | Static suppression | 1 | EMA background subtraction |
240+
| **ADR-031** (RuView) | Sensing-first NDP | Existing | Already in firmware (stub) |
241+
| **ADR-037** (Multi-person) | Occupancy counting | 2 | CSI rank estimation |
242+
243+
## Server-Side Changes
244+
245+
The Rust aggregator (`wifi-densepose-hardware`) needs to handle the new packet types:
246+
247+
```rust
248+
match magic {
249+
0xC5110001 => parse_raw_csi_frame(buf), // Existing
250+
0xC5110002 => parse_vitals_packet(buf), // New: Tier 2
251+
0xC5110003 => parse_feature_packet(buf), // New: Tier 3
252+
_ => Err(ParseError::UnknownMagic(magic)),
253+
}
254+
```
255+
256+
When edge tier ≥ 1, the server can skip its own phase sanitization and amplitude normalization. When edge tier = 3, the server skips feature extraction entirely and feeds pre-computed features directly to the neural network.
257+
258+
## Testing Strategy
259+
260+
| Test Type | Tool | What |
261+
|-----------|------|------|
262+
| **Host unit tests** | gcc + Unity + mock CSI data | Phase unwrap, Welford stats, IIR filter, peak detect, DTW |
263+
| **QEMU smoke test** | Docker QEMU | Boot, NVS load, task creation, ring buffer |
264+
| **Hardware regression** | ESP32-S3 + serial log | Full pipeline: CSI → edge processing → UDP → server |
265+
| **Accuracy validation** | Python reference impl | Compare edge vitals vs. server vitals on same CSI data |
266+
| **Stress test** | 6-node mesh | Tier 3 at 20 Hz sustained, no frame drops |
267+
268+
## Alternatives Considered
269+
270+
1. **Rust on ESP32 (esp-rs)** — More type-safe, could share code with server crates. Rejected: larger binary, longer compile times, less mature ESP-IDF support for CSI APIs.
271+
272+
2. **MicroPython on ESP32** — Easier prototyping. Rejected: too slow for 20 Hz real-time processing, no fixed-point DSP.
273+
274+
3. **External co-processor (FPGA/DSP)** — Maximum throughput. Rejected: cost ($50+ per node), defeats the $8 ESP32 value proposition.
275+
276+
4. **Server-only processing** — Keep firmware dumb. Rejected: doesn't solve bandwidth, latency, or standalone operation requirements.
277+
278+
## Risks
279+
280+
| Risk | Mitigation |
281+
|------|------------|
282+
| Core 1 processing exceeds real-time budget | Adaptive quality: reduce feature_hz or fall back to lower tier |
283+
| Fixed-point arithmetic introduces accuracy drift | Validate against Rust f64 reference on same CSI data; track error bounds |
284+
| NVS config complexity overwhelms users | Sensible defaults; provision.py presets: `--preset home`, `--preset medical`, `--preset security` |
285+
| ADR-018 v2 header breaks old aggregators | Backward-compatible: old magic = old format. New bit in flags field signals extension |
286+
| Memory fragmentation from ring buffer | Static allocation only; no malloc in edge processing path |
287+
288+
## Success Criteria
289+
290+
- [ ] Tier 1 reduces bandwidth by ≥60% with <1 dB SNR loss
291+
- [ ] Tier 2 breathing rate within ±1 BPM of server-side estimate
292+
- [ ] Tier 2 heart rate within ±3 BPM of server-side estimate
293+
- [ ] Tier 2 fall detection latency <500ms (vs. ~2s server roundtrip)
294+
- [ ] Tier 2 presence detection accuracy ≥95%
295+
- [ ] Tier 3 feature extraction matches server output within 5% RMSE
296+
- [ ] All tiers: zero frame drops at 20 Hz sustained on single node
297+
- [ ] Firmware binary stays under 90% of 1 MB app partition
298+
- [ ] SRAM usage stays under 400 KB (leave headroom for WiFi stack)
299+
- [ ] CI pipeline: build + host unit tests + binary size check on every PR

0 commit comments

Comments
 (0)