
ADR-072: WiFlow Pose Estimation Architecture

  • Status: Proposed
  • Date: 2026-04-02
  • Deciders: ruv
  • Relates to: ADR-071 (ruvllm Training Pipeline), ADR-070 (Self-Supervised Pretraining), ADR-024 (Contrastive CSI Embedding / AETHER), ADR-069 (Cognitum Seed CSI Pipeline)

Context

The WiFi-DensePose project needs a neural architecture that can convert raw CSI amplitude data into 17-keypoint COCO pose estimates. The existing train-ruvllm.js pipeline uses a simple 2-layer FC encoder (8 -> 64 -> 128) that produces contrastive embeddings for presence detection but cannot output spatial keypoint coordinates.

We evaluated published WiFi-based pose estimation architectures:

| Architecture | Params | Input | Key Innovation | Publication |
|---|---|---|---|---|
| WiFlow | 4.82M | 540x20 | TCN + AsymConv + Axial Attention | arXiv:2602.08661 |
| WiPose | 11.2M | 3x3x30x20 | 3D CNN + heatmap regression | CVPR 2021 |
| MetaFi++ | 8.6M | 114x30x20 | Transformer + meta-learning | NeurIPS 2023 |
| Person-in-WiFi 3D | 15.3M | Multi-antenna | Deformable attention + 3D | CVPR 2024 |

WiFlow is the lightest published SOTA architecture, designed specifically for commercial WiFi hardware. Its key advantage is operating on CSI amplitude only (no phase), which is critical for ESP32-S3 where phase calibration is unreliable.

Why WiFlow

  1. Lightest SOTA: 4.82M parameters at original scale; our adaptation targets ~1.8M
  2. Amplitude-only: Discards phase, which is noisy on consumer hardware
  3. Published architecture: Fully specified in arXiv:2602.08661, reproducible
  4. Temporal modeling: TCN with dilated causal convolutions captures motion dynamics
  5. Efficient attention: Axial attention reduces O(H^2W^2) to O(H^2W + HW^2)
  6. Proven on commercial WiFi: Validated on commodity Intel 5300 and Atheros hardware
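The efficiency claim in point 5 is easy to verify with back-of-envelope arithmetic. The sketch below compares attention costs on the [8, 20] feature map produced by the encoder (illustrative counts only; constant factors and head dimensions omitted):

```javascript
// Rough cost comparison: full 2D self-attention vs axial attention on an
// H x W feature map. Proportional counts only, not exact FLOPs.
function fullAttentionCost(H, W) {
  const N = H * W;   // every position attends to every other position
  return N * N;      // O(H^2 W^2)
}

function axialAttentionCost(H, W) {
  // height pass: W columns, each H x H; width pass: H rows, each W x W
  return W * H * H + H * W * W;  // O(H^2 W + H W^2)
}

console.log(fullAttentionCost(8, 20));   // 25600
console.log(axialAttentionCost(8, 20)); // 4480
```

At our [8, 20] resolution the factorized form is roughly 5.7x cheaper, and the gap widens with larger feature maps.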

Decision

Implement the WiFlow architecture in pure JavaScript (ruvllm native) with the following adaptations for our ESP32 single TX/RX deployment.

Architecture Overview

```
CSI Amplitude [128, 20]
        |
   Stage 1: TCN (Dilated Causal Conv)
   dilation = (1, 2, 4, 8), kernel = 7
   128 -> 256 -> 192 -> 128 channels
        |
   Stage 2: Asymmetric Conv Encoder
   1xk conv (k=3), stride (1,2)
   [1, 128, 20] -> [256, 8, 20]
        |
   Stage 3: Axial Self-Attention
   Width (temporal): 8 heads
   Height (feature): 8 heads
        |
   Decoder: Adaptive Avg Pool + Linear
   [256, 8, 20] -> pool -> [2048] -> [17, 2]
        |
   17 COCO Keypoints [x, y] in [0, 1]
```
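Stage 1's building block is the dilated causal convolution. A minimal standalone sketch (not the ruvllm implementation) shows the mechanics: each output sample depends only on current and past inputs, with filter taps spaced `dilation` steps apart:

```javascript
// One dilated causal 1D convolution over a single channel.
// input: array of T samples; weights: k filter taps; output[t] sees only
// input[<= t] (causal), with taps spaced `dilation` steps apart.
function dilatedCausalConv1d(input, weights, dilation) {
  const out = new Array(input.length).fill(0);
  for (let t = 0; t < input.length; t++) {
    for (let i = 0; i < weights.length; i++) {
      const src = t - i * dilation;  // look back i * dilation steps
      if (src >= 0) out[t] += weights[i] * input[src];
    }
  }
  return out;
}
```

Stacking blocks with kernel 7 and dilations 1, 2, 4, 8 yields a receptive field of 1 + (7 - 1) * (1 + 2 + 4 + 8) = 91 time steps, comfortably wider than the 20-step input window.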

Our Adaptation vs Original WiFlow

| Aspect | WiFlow Original | Our Adaptation | Reason |
|---|---|---|---|
| Input channels | 540 (18 links x 30 SC) | 128 (1 TX x 1 RX x 128 SC) | Single ESP32 link |
| Time steps | 20 | 20 | Same |
| TCN channels | 540 -> 256 -> 128 -> 64 | 128 -> 256 -> 192 -> 128 | Proportional reduction |
| Spatial blocks | 4 (stride 2) | 4 (stride 2) | Same |
| Attention heads | 8 | 8 | Same |
| Parameters | 4.82M | ~1.8M | Fewer input channels |
| Input type | Amplitude only | Amplitude only | Same |
| Output | 17 x 2 | 17 x 2 | Same |

Parameter Budget Breakdown

| Stage | Parameters | % of Total |
|---|---|---|
| TCN (4 blocks, k=7, d=1,2,4,8) | ~969K | 54% |
| Asymmetric Conv (4 blocks, 1x3, stride 2) | ~174K | 10% |
| Axial Attention (width + height, 8 heads) | ~592K | 33% |
| Pose Decoder (pool + linear -> 17x2) | ~70K | 4% |
| Total | ~1.8M | 100% |

Loss Function

L = L_H + 0.2 * L_B

L_H = SmoothL1(predicted, target, beta=0.1)
L_B = (1/14) * sum_b (bone_length_b - prior_b)^2

14 bone connections enforce anatomical constraints:

  • Nose-eye (x2): 0.06
  • Eye-ear (x2): 0.06
  • Shoulder-elbow (x2): 0.15
  • Elbow-wrist (x2): 0.13
  • Shoulder-hip (x2): 0.26
  • Hip-knee (x2): 0.25
  • Knee-ankle (x2): 0.25
  • Shoulder width: 0.20

All lengths normalized to person height.
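The combined loss can be sketched directly in JavaScript. This is an illustrative standalone version, not the implementation in scripts/wiflow-model.js; only four of the 14 bones are shown for brevity, with indices following COCO keypoint order (5 = left shoulder, 7 = left elbow, 9 = left wrist, 11 = left hip, 13 = left knee, 15 = left ankle):

```javascript
// Keypoint regression term: Smooth L1 averaged over all 17 x 2 coordinates.
function smoothL1(pred, target, beta = 0.1) {
  let sum = 0, n = 0;
  for (let k = 0; k < pred.length; k++) {
    for (let d = 0; d < 2; d++) {
      const diff = Math.abs(pred[k][d] - target[k][d]);
      sum += diff < beta ? 0.5 * diff * diff / beta : diff - 0.5 * beta;
      n++;
    }
  }
  return sum / n;
}

// Subset of the bone priors from the list above: [fromIdx, toIdx, prior].
const BONES = [[5, 7, 0.15], [7, 9, 0.13], [11, 13, 0.25], [13, 15, 0.25]];

// Anatomical term: mean squared deviation of bone lengths from their priors.
function boneLoss(pred, bones = BONES) {
  let sum = 0;
  for (const [a, b, prior] of bones) {
    const len = Math.hypot(pred[a][0] - pred[b][0], pred[a][1] - pred[b][1]);
    sum += (len - prior) ** 2;
  }
  return sum / bones.length;
}

// L = L_H + 0.2 * L_B
function poseLoss(pred, target) {
  return smoothL1(pred, target) + 0.2 * boneLoss(pred);
}
```

Note that the bone term depends only on the prediction, so it regularizes anatomy even when the regression target is a coarse proxy.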

Training Strategy (Camera-Free Pipeline)

Since we have no ground-truth pose labels from cameras, training proceeds in three phases:

Phase 1: Contrastive Pretraining

  • Temporal triplets: adjacent windows are positive pairs, distant windows are negative
  • Cross-node triplets: same-time windows from different ESP32 nodes are positive
  • Uses ruvllm ContrastiveTrainer with triplet + InfoNCE loss
  • Learns a representation where similar CSI states cluster together
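The temporal-triplet sampling above can be sketched as follows (a hypothetical helper; actual batching is handled when feeding ruvllm's ContrastiveTrainer):

```javascript
// Sample one temporal triplet over an ordered sequence of CSI windows.
// Adjacent windows form the positive pair; any window at least `minGap`
// steps away is a valid negative. Requires numWindows > minGap.
function sampleTemporalTriplet(numWindows, anchorIdx, minGap = 10) {
  // positive: the adjacent window (fall back to the previous one at the end)
  const posIdx = anchorIdx + 1 < numWindows ? anchorIdx + 1 : anchorIdx - 1;
  // negative: rejection-sample a temporally distant window
  let negIdx;
  do {
    negIdx = Math.floor(Math.random() * numWindows);
  } while (Math.abs(negIdx - anchorIdx) < minGap);
  return { anchorIdx, posIdx, negIdx };
}
```

The `minGap` of 10 windows is an assumed value; cross-node triplets follow the same shape with the positive drawn from a different node at the same timestamp.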

Phase 2: Pose Proxy Training

  • Generate coarse pose proxies from vitals data:
    • Person detected (presence > 0.3): place standing skeleton at center
    • High motion: perturb limb positions proportional to motion energy
    • Breathing: add micro-oscillation to torso keypoints
  • Train with SmoothL1 + bone constraint loss
  • Confidence-weighted updates (higher presence = stronger gradient)
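The proxy-generation steps above can be sketched as one function. The helper name, the `breathPhase` field, and the perturbation magnitudes are assumptions for illustration; only the presence threshold (0.3) and the keypoint groups come from this ADR:

```javascript
// Generate a coarse pose proxy from vitals. Keypoints are normalized [x, y]
// in [0, 1], COCO order (0 = nose ... 16 = right ankle).
const TORSO = [5, 6, 11, 12]; // shoulders and hips

function poseProxy(vitals, standing) {
  if (vitals.presence <= 0.3) return null; // no person detected
  return standing.map(([x, y], k) => {
    let dx = 0, dy = 0;
    if (k >= 5) {
      // limbs: random perturbation proportional to motion energy
      dx += (Math.random() - 0.5) * vitals.motion * 0.1;
      dy += (Math.random() - 0.5) * vitals.motion * 0.1;
    }
    if (TORSO.includes(k)) {
      // breathing: micro-oscillation of torso keypoints
      dy += 0.005 * Math.sin(2 * Math.PI * vitals.breathPhase);
    }
    return [x + dx, y + dy];
  });
}
```

The `standing` argument is a canonical 17-keypoint standing skeleton placed at the center of the normalized frame; confidence weighting then scales the gradient by `vitals.presence` during training.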

Phase 3: Self-Refinement (Future)

  • Multi-node consistency: same person seen from different nodes should produce consistent pose after geometric transform
  • Temporal smoothness: adjacent frames should produce similar poses
  • Bone constraint tightening: gradually reduce tolerance

Integration with Existing Pipeline

```
train-ruvllm.js (ADR-071)        train-wiflow.js (ADR-072)
  |                                  |
  | 8-dim features                   | 128-dim raw CSI amplitude
  | -> 128-dim embedding             | -> 17x2 keypoint coordinates
  | -> presence/activity/vitals      | -> bone-constrained pose
  |                                  |
  +-- ContrastiveTrainer -----+------+
  +-- TrainingPipeline -------+------+
  +-- LoRA per-node ----------+------+
  +-- TurboQuant quantize ----+------+
  +-- SafeTensors export -----+------+
```

Both pipelines share the ruvllm infrastructure; WiFlow adds the deeper architecture for direct pose regression while the simple encoder handles embedding tasks.

Performance Targets

| Metric | Target | Notes |
|---|---|---|
| PCK@20 | > 80% | On lab data with 2+ nodes |
| Forward latency | < 50ms | Pi Zero 2W at INT8 |
| Model size (INT8) | < 2 MB | TurboQuant |
| Bone violation rate | < 10% | 50% tolerance |
| Temporal jitter | < 3cm | Exponential smoothing |
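The exponential smoothing behind the temporal-jitter target can be sketched as a per-keypoint EMA (the blend factor is an assumed tuning value, not specified in this ADR):

```javascript
// Exponentially smooth consecutive pose estimates to suppress frame-to-frame
// jitter. prev/current: [17][2] keypoint arrays; alpha near 1 favors the
// current frame, lower alpha smooths harder but adds latency.
function smoothPose(prev, current, alpha = 0.6) {
  if (!prev) return current; // first frame: nothing to blend
  return current.map(([x, y], k) => [
    alpha * x + (1 - alpha) * prev[k][0],
    alpha * y + (1 - alpha) * prev[k][1],
  ]);
}
```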

Risk Assessment

| Risk | Severity | Mitigation |
|---|---|---|
| Single TX/RX has less spatial info than 18 links | High | 2-node multi-static compensates; cross-node fusion from ADR-029 |
| Camera-free labels are coarse | Medium | Bone constraints enforce anatomy; contrastive pretrain provides structure |
| Pure JS too slow for real-time | Medium | INT8 quantization; axial attention is O(H^2W + HW^2), not O(H^2W^2) |
| Overfitting with ~5K frames | Medium | Temporal augmentation + noise + cross-node interpolation |
| Phase not available (amplitude-only) | Low | WiFlow was designed amplitude-only; not a limitation |

Consequences

Positive

  • Proven SOTA architecture adapted to our hardware constraints
  • Pure JavaScript implementation runs everywhere ruvllm runs (Node.js, browser WASM)
  • Bone constraints enforce physically plausible outputs even with noisy inputs
  • Shares training infrastructure with existing ruvllm pipeline
  • Modular: each stage (TCN, AsymConv, Axial, Decoder) is independently testable

Negative

  • ~1.8M parameters is 193x larger than simple CsiEncoder (9,344 params)
  • Forward pass is slower (~50ms vs <1ms for simple encoder)
  • Camera-free training will produce lower accuracy than supervised WiFlow
  • No ground-truth PCK evaluation possible without camera labels
  • Axial attention is O(N^2) within each axis, limiting scalability

Neutral

  • FLOPs dominated by TCN (~48%) due to dilated convolutions
  • INT8 quantization brings model to ~1.7MB, viable for edge deployment
  • Architecture is fixed (no NAS); future work could explore lighter variants

Implementation

Files Created

| File | Purpose |
|---|---|
| scripts/wiflow-model.js | WiFlow architecture (all stages, loss, metrics) |
| scripts/train-wiflow.js | Training pipeline (contrastive + pose proxy + LoRA + quant) |
| scripts/benchmark-wiflow.js | Benchmarking (latency, params, FLOPs, memory, quality) |
| docs/adr/ADR-072-wiflow-architecture.md | This document |

Usage

```bash
# Train on collected data
node scripts/train-wiflow.js --data data/recordings/pretrain-*.csi.jsonl

# Train with more epochs and custom output
node scripts/train-wiflow.js --data data/recordings/*.csi.jsonl --epochs 50 --output models/wiflow-v2

# Contrastive pretraining only (no labels needed)
node scripts/train-wiflow.js --data data/recordings/*.csi.jsonl --contrastive-only

# Benchmark
node scripts/benchmark-wiflow.js

# Benchmark with trained model
node scripts/benchmark-wiflow.js --model models/wiflow-v1
```

Dependencies

  • ruvllm (vendored at vendor/ruvector/npm/packages/ruvllm/src/)
    • ContrastiveTrainer, tripletLoss, infoNCELoss, computeGradient
    • TrainingPipeline
    • LoraAdapter, LoraManager
    • EwcManager
    • ModelExporter, SafeTensorsWriter
  • No external ML frameworks (no PyTorch, no TensorFlow, no ONNX Runtime)

References

  • WiFlow: arXiv:2602.08661
  • COCO Keypoints: https://cocodataset.org/#keypoints-2020
  • Axial Attention: Wang et al., "Axial-DeepLab", ECCV 2020
  • TCN: Bai et al., "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling", 2018