|
| 1 | +# ADR-015: Public Dataset Strategy for Trained Pose Estimation Model |
| 2 | + |
| 3 | +## Status |
| 4 | + |
| 5 | +Proposed |
| 6 | + |
| 7 | +## Context |
| 8 | + |
| 9 | +The WiFi-DensePose system has a complete model architecture (`DensePoseHead`, |
| 10 | +`ModalityTranslationNetwork`, `WiFiDensePoseRCNN`) and signal processing pipeline, |
| 11 | +but no trained weights. Without a trained model, pose estimation produces random |
| 12 | +outputs regardless of input quality. |
| 13 | + |
| 14 | +Training requires paired data: simultaneous WiFi CSI captures alongside ground-truth |
| 15 | +human pose annotations. Collecting this data from scratch requires months of effort |
| 16 | +and specialized hardware (multiple WiFi nodes + camera + motion capture rig). Several |
| 17 | +public datasets exist that can bootstrap training without custom collection. |
| 18 | + |
| 19 | +### The Teacher-Student Constraint |
| 20 | + |
| 21 | +The CMU "DensePose From WiFi" paper (2023) trains using a teacher-student approach: |
| 22 | +a camera-based RGB pose model (e.g. Detectron2 DensePose) generates pseudo-labels |
| 23 | +during training, so the WiFi model learns to replicate those outputs. At inference, |
| 24 | +the camera is removed. This means any dataset that provides *either* ground-truth |
| 25 | +pose annotations *or* synchronized RGB frames (from which a teacher can generate |
| 26 | +labels) is sufficient for training. |
| 27 | + |
| 28 | +## Decision |
| 29 | + |
| 30 | +Use MM-Fi as the primary training dataset, supplemented by XRF55 for additional |
| 31 | +diversity, with a teacher-student pipeline for any dataset that lacks dense pose |
| 32 | +annotations but provides RGB video. |
| 33 | + |
| 34 | +### Primary Dataset: MM-Fi |
| 35 | + |
| 36 | +**Paper:** "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless |
| 37 | +Sensing" (NeurIPS 2023 Datasets Track) |
| 38 | +**Repository:** https://github.com/ybCliff/MM-Fi |
| 39 | +**Size:** 40 volunteers, 27 action classes, ~320,000 frames in total |
| 40 | +**Modalities:** WiFi CSI, mmWave radar, LiDAR, RGB-D, IMU |
| 41 | +**CSI format:** 3 Tx × 3 Rx antennas, 114 subcarriers, 100 Hz sampling rate, |
| 42 | +IEEE 802.11n 5 GHz, raw amplitude + phase |
| 43 | +**Pose annotations:** 17-keypoint COCO skeleton (from RGB-D ground truth) |
| 44 | +**License:** CC BY-NC 4.0 |
| 45 | +**Why primary:** Largest public WiFi CSI + pose dataset; raw amplitude and phase |
| 46 | +available (not just processed features); antenna count (3×3) is compatible with the |
| 47 | +existing `CSIProcessor` configuration; COCO keypoints map directly to the |
| 48 | +`KeypointHead` output format. |
| 49 | + |
| 50 | +### Secondary Dataset: XRF55 |
| 51 | + |
| 52 | +**Paper:** "XRF55: A Radio-Frequency Dataset for Human Indoor Action Recognition" |
| 53 | +(ACM MM 2023) |
| 54 | +**Repository:** https://github.com/aiotgroup/XRF55 |
| 55 | +**Size:** 55 action classes, multiple subjects and environments |
| 56 | +**CSI format:** WiFi CSI + UWB radar, 3 Tx × 3 Rx, 30 subcarriers |
| 57 | +**Pose annotations:** Skeleton keypoints from Kinect |
| 58 | +**License:** Research use |
| 59 | +**Why secondary:** Different environments and action vocabulary increase |
| 60 | +generalization; 30 subcarriers requires subcarrier interpolation to match the |
| 61 | +existing 56-subcarrier config. |
| 62 | + |
| 63 | +### Excluded Datasets and Reasons |
| 64 | + |
| 65 | +| Dataset | Reason for exclusion | |
| 66 | +|---------|---------------------| |
| 67 | +| RF-Pose / RF-Pose3D (MIT) | Uses a custom FMCW radio, not commodity 2.4/5 GHz WiFi CSI; incompatible signal format | |
| 68 | +| Person-in-WiFi (CMU 2019) | Amplitude only, no phase; not publicly released | |
| 69 | +| Widar 3.0 | Gesture recognition only, no full-body pose | |
| 70 | +| NTU-Fi | Activity labels only, no pose keypoints | |
| 71 | +| WiPose | Limited release; superseded by MM-Fi | |
| 72 | + |
| 73 | +## Implementation Plan |
| 74 | + |
| 75 | +### Phase 1: MM-Fi Loader |
| 76 | + |
| 77 | +Implement a PyTorch `Dataset` class (`torch.utils.data.Dataset`) that: |
| 78 | +- Reads MM-Fi's HDF5/numpy CSI files |
| 79 | +- Resamples from 114 subcarriers → 56 subcarriers (linear interpolation along |
| 80 | + frequency axis) to match the existing `CSIProcessor` config |
| 81 | +- Normalizes amplitude and unwraps phase using the existing `PhaseSanitizer` |
| 82 | +- Returns `(amplitude, phase, keypoints_17)` tuples |
| 83 | + |
| 84 | +### Phase 2: Teacher-Student Labels |
| 85 | + |
| 86 | +For samples where only skeleton keypoints are available (not full DensePose UV maps): |
| 87 | +- Run Detectron2 DensePose on the paired RGB frames to generate `(part_labels, |
| 88 | + u_coords, v_coords)` pseudo-labels |
| 89 | +- Cache generated labels to avoid recomputation during training epochs |
| 90 | +- This matches the training procedure in the original CMU paper |
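
The caching step above can be sketched as a small disk-backed lookup. This is a minimal illustration, not the project's implementation: `cached_pseudo_labels`, `teacher_fn`, and the pickle-per-frame layout are all hypothetical, and `teacher_fn` stands in for whatever wrapper ends up calling the Detectron2 DensePose teacher.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


def cached_pseudo_labels(
    frame_path: str,
    cache_dir: str,
    teacher_fn: Callable[[str], Any],  # hypothetical wrapper around the teacher model
) -> Any:
    """Return teacher labels for one RGB frame, computing them at most once."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)

    # Hash the frame path so nested dataset paths map to flat file names.
    key = hashlib.sha256(frame_path.encode("utf-8")).hexdigest()
    cache_file = cache / f"{key}.pkl"

    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)

    labels = teacher_fn(frame_path)  # expensive: one teacher forward pass

    # Write to a temporary file first so an interrupted run never leaves
    # a truncated cache entry behind.
    tmp_file = cache_file.with_suffix(".tmp")
    with tmp_file.open("wb") as fh:
        pickle.dump(labels, fh)
    tmp_file.replace(cache_file)
    return labels
```

With this shape, the first epoch pays the teacher cost once per frame and later epochs only read from disk. Keying on the frame path assumes paths are stable across runs; a content hash would be safer if frames are ever re-exported.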
| 91 | + |
| 92 | +### Phase 3: Training Pipeline |
| 93 | + |
| 94 | +- **Loss:** Combined keypoint heatmap loss (MSE) + DensePose part classification |
| 95 | + (cross-entropy) + UV regression (Smooth L1) + transfer loss against teacher |
| 96 | + RGB backbone features |
| 97 | +- **Optimizer:** Adam, lr=1e-3, with learning-rate decay milestones at 48k and 96k steps (paper schedule) |
| 98 | +- **Hardware:** Single GPU (RTX 3090 or A100); MM-Fi requires ~50 GB of disk space |
| 99 | +- **Checkpointing:** Save every epoch; keep best-by-validation-PCK |
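
To make the optimizer schedule concrete, here is a minimal sketch of a step-decay learning rate with the two milestones listed above. The decay factor `gamma=0.1` is an assumption (this ADR does not state it), and `lr_at_step` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import Sequence


def lr_at_step(
    step: int,
    base_lr: float = 1e-3,
    milestones: Sequence[int] = (48_000, 96_000),
    gamma: float = 0.1,  # assumed decay factor; not stated in this ADR
) -> float:
    """Step-decay schedule: multiply the rate by gamma at each milestone."""
    # Count how many milestones have been reached at this step.
    passed = bisect_right(sorted(milestones), step)
    return base_lr * gamma ** passed
```

In a real training loop the same behavior is available from `torch.optim.lr_scheduler.MultiStepLR`, which takes `milestones` and `gamma` arguments; the pure-Python version is only meant to make the schedule easy to inspect.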
| 100 | + |
| 101 | +### Phase 4: Evaluation |
| 102 | + |
| 103 | +- **Keypoints:** PCK@0.2 (Percentage of Correct Keypoints within 20% of torso size) |
| 104 | +- **DensePose:** GPS (Geodesic Point Similarity) and GPSM with segmentation mask |
| 105 | +- **Held-out split:** MM-Fi subjects 33-40 (8 of 40, 20%) held out for validation; splitting by subject keeps every person out of the training set entirely, so no leakage |
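
The keypoint metric and the subject-level split can be sketched as follows. This is illustrative only: `pck` and `split_by_subject` are hypothetical names, keypoints are assumed to be 2D `(x, y)` pairs, subject IDs are assumed to be integers 1 to 40, and how torso size is measured (for example, shoulder-to-hip distance) is left to the caller.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def pck(
    pred: Sequence[Point],
    gt: Sequence[Point],
    torso_size: float,
    alpha: float = 0.2,
) -> float:
    """Fraction of keypoints whose error is within alpha * torso_size."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    threshold = alpha * torso_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


def split_by_subject(
    subject_ids: Sequence[int],
    val_subjects: Iterable[int] = range(33, 41),  # subjects 33-40 inclusive
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) with no subject in both sets."""
    val = set(val_subjects)
    train_idx = [i for i, s in enumerate(subject_ids) if s not in val]
    val_idx = [i for i, s in enumerate(subject_ids) if s in val]
    return train_idx, val_idx
```

Splitting on subject ID rather than on frames matters because consecutive frames of the same person are highly correlated; a random frame-level split would let near-duplicates of validation frames into training.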
| 106 | + |
| 107 | +## Subcarrier Mismatch: MM-Fi (114) vs System (56) |
| 108 | + |
| 109 | +MM-Fi captures 114 subcarriers at 5 GHz with 40 MHz bandwidth. The existing system |
| 110 | +is configured for 56 subcarriers. Resolution options in order of preference: |
| 111 | + |
| 112 | +1. **Interpolate MM-Fi → 56** (recommended for initial training): linear interpolation |
| 113 | + preserves spectral envelope, fast, no architecture change needed |
| 114 | +2. **Reconfigure system → 114**: change `CSIProcessor` config; requires re-running |
| 115 | + `verify.py --generate-hash` to update proof hash |
| 116 | +3. **Train at native 114, serve at 56**: separate train/inference configs; adds |
| 117 | + complexity |
| 118 | + |
| 119 | +Option 1 is chosen for Phase 1 to unblock training immediately. |
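
A minimal sketch of Option 1, assuming the 114 input subcarriers and the 56 output bins are each evenly spaced over the same band with aligned endpoints. `resample_subcarriers` is a hypothetical helper operating on one real-valued vector (one antenna pair, one time step); it is not part of the existing `CSIProcessor`.

```python
from typing import List, Sequence


def resample_subcarriers(values: Sequence[float], n_out: int = 56) -> List[float]:
    """Linearly interpolate one CSI vector onto n_out evenly spaced bins."""
    n_in = len(values)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output subcarriers")

    out = []
    for j in range(n_out):
        # Map output index j onto the input index axis, endpoints aligned.
        pos = j * (n_in - 1) / (n_out - 1)
        lo = min(int(pos), n_in - 2)  # clamp so lo + 1 stays in range
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out
```

The function should be applied to amplitude and to unwrapped phase separately; interpolating wrapped phase would average across the ±π jump and produce spurious values. For whole arrays, `numpy.interp` performs the same per-vector interpolation.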
| 120 | + |
| 121 | +## Consequences |
| 122 | + |
| 123 | +**Positive:** |
| 124 | +- Unblocks end-to-end training without hardware collection |
| 125 | +- MM-Fi's 3×3 antenna setup matches this system's target hardware (ESP32 mesh, ADR-012) |
| 126 | +- 40 subjects with 27 action classes provides reasonable diversity for a first model |
| 127 | +- CC BY-NC license is compatible with research and internal use |
| 128 | + |
| 129 | +**Negative:** |
| 130 | +- CC BY-NC prohibits commercial deployment of weights trained solely on MM-Fi; |
| 131 | + custom data collection required before commercial release |
| 132 | +- 114→56 subcarrier interpolation loses some frequency resolution; acceptable for |
| 133 | + initial training, revisit in Phase 2 |
| 134 | +- MM-Fi was captured in controlled lab environments; expect accuracy drop in |
| 135 | + complex real-world deployments until fine-tuned on domain-specific data |
| 136 | + |
| 137 | +## References |
| 138 | + |
| 139 | +- Yang et al., "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing" (NeurIPS 2023) |
| 140 | +- Geng et al., "DensePose From WiFi" (arXiv 2301.00250, CMU 2023) |
| 141 | +- ADR-012: ESP32 CSI Sensor Mesh (hardware target) |
| 142 | +- ADR-013: Feature-Level Sensing on Commodity Gear |
| 143 | +- ADR-014: SOTA Signal Processing Algorithms |