ASR performance measurement #39

drowe67 · 2024-12-13T02:59:43Z

Experimental way to evaluate performance of SSB versus RADE using automatic speech recognition.

SSB test with AWGN:
```
./asr_test.sh ssb --No -30 -n 100
```

RADE test with MPP channel:

./asr_test.sh rade --EbNodB 3 -n 10 --g_file g_mpp.f32

Listen to some of the generated files:

cd ~/.cache/LibriSpeech
find test-other -name '*.flac'
<play one of them>

Generate 500 seconds of multipath samples:

Fs=8000; Rs=50; Nc=20; multipath_samples('mpp', Fs, Rs, Nc, 5000, '','g_mpp.f32');

Plotting some results:

octave:3> radae_plots; plot_wer("241221","","wer_snr")


drowe67 · 2025-05-02T01:52:16Z

Training versus run time symbol allocation over OFDM modem frame

Added a --print_frame option that dumps the OFDM frame format and exits. Each element of the z vector is replaced with its index, e.g. for d=80 the indices are 0..79. Every frame contains 3 z vectors (3 x 80 elements, paired into 120 QAM symbols) and lasts 120 ms.

Training, Nc=20: each row is a frequency (a separate carrier) and each column is a step in time at the Rs~50 Hz symbol rate.

python3 train.py --cuda-visible-devices 0 --sequence-length 400 --batch-size 512 --epochs 100 --lr 0.003 --lr-decay-factor 0.0001 ~/Downloads/tts_speech_16k_speexdsp.f32 tmp --bottleneck 3 --h_file h_nc20_train_mpp.f32 --range_EbNo --plot_loss --auxdata --print_frame
Rs: 50.00 Rs': 50.00 Ts': 0.020 Nsmf: 120 Ns:   6 Nc:  20 M: 160 Ncp: 0
<snip>
 0+1j	40+41j	 0+1j	40+41j	 0+1j	40+41j	
 2+3j	42+43j	 2+3j	42+43j	 2+3j	42+43j	
 4+5j	44+45j	 4+5j	44+45j	 4+5j	44+45j	
 6+7j	46+47j	 6+7j	46+47j	 6+7j	46+47j	
 8+9j	48+49j	 8+9j	48+49j	 8+9j	48+49j	
10+11j	50+51j	10+11j	50+51j	10+11j	50+51j	
12+13j	52+53j	12+13j	52+53j	12+13j	52+53j	
14+15j	54+55j	14+15j	54+55j	14+15j	54+55j	
16+17j	56+57j	16+17j	56+57j	16+17j	56+57j	
18+19j	58+59j	18+19j	58+59j	18+19j	58+59j	
20+21j	60+61j	20+21j	60+61j	20+21j	60+61j	
22+23j	62+63j	22+23j	62+63j	22+23j	62+63j	
24+25j	64+65j	24+25j	64+65j	24+25j	64+65j	
26+27j	66+67j	26+27j	66+67j	26+27j	66+67j	
28+29j	68+69j	28+29j	68+69j	28+29j	68+69j	
30+31j	70+71j	30+31j	70+71j	30+31j	70+71j	
32+33j	72+73j	32+33j	72+73j	32+33j	72+73j	
34+35j	74+75j	34+35j	74+75j	34+35j	74+75j	
36+37j	76+77j	36+37j	76+77j	36+37j	76+77j	
38+39j	78+79j	38+39j	78+79j	38+39j	78+79j

Elements of z are mapped in pairs to complex symbols, and the mapping of symbols to carriers is consistent between vectors, e.g. z_{n} and z_{n+1} both have elements 38,39 mapped to the same carrier. There are correlations in the channel simulation dataset along the carrier axis that might be useful to learn.

Now consider run time. When pilots are inserted we re-arrange the modem frame (for protocol efficiency reasons) to have Nc=30 carriers. The frame also lasts 120 ms, but a column of pilot symbols has been inserted at the start of the frame, across every carrier:
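The training layout printed above can be reproduced with a few lines of Python. This is a sketch that assumes the symbols fill the frame column by column (carrier index varying fastest), which matches the printout; the function names are hypothetical and are not those used in `train.py`.

```python
def z_to_symbols(d):
    """Pair consecutive z indices into complex symbols: (0,1) -> 0+1j, ..."""
    return [complex(i, i + 1) for i in range(0, d, 2)]


def training_frame(d=80, Nc=20, n_vectors=3):
    """Return frame[carrier][symbol_period] for n_vectors z vectors."""
    stream = []
    for _ in range(n_vectors):
        stream.extend(z_to_symbols(d))
    Ns = len(stream) // Nc  # symbol periods per frame (6 for the defaults)
    # Column-by-column fill: stream position s*Nc + c lands on carrier c.
    return [[stream[s * Nc + c] for s in range(Ns)] for c in range(Nc)]


frame = training_frame()
for row in frame:
    print("\t".join(f"{int(x.real):2d}+{int(x.imag)}j" for x in row))
```

With d=80 and Nc=20 each z vector occupies exactly two columns, so every z vector places a given pair of elements on the same row.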

python3 ./inference.py model19_check3/checkpoints/checkpoint_epoch_100.pth features_in.f32 features_out.f32 --rate_Fs --pilots --pilot_eq --eq_ls --cp 0.004 --bottleneck 3 --auxdata --time_offset -16 --print_frame
Rs: 33.33 Rs': 50.00 Ts': 0.020 Nsmf: 120 Ns:   4 Nc:  30 M: 160 Ncp: 32

33+0j	 0+1j	60+61j	40+41j	20+21j	
33+0j	 2+3j	62+63j	42+43j	22+23j	
33+0j	 4+5j	64+65j	44+45j	24+25j	
33+0j	 6+7j	66+67j	46+47j	26+27j	
33+0j	 8+9j	68+69j	48+49j	28+29j	
-33+0j	10+11j	70+71j	50+51j	30+31j	
-33+0j	12+13j	72+73j	52+53j	32+33j	
33+0j	14+15j	74+75j	54+55j	34+35j	
33+0j	16+17j	76+77j	56+57j	36+37j	
-33+0j	18+19j	78+79j	58+59j	38+39j	
33+0j	20+21j	 0+1j	60+61j	40+41j	
-33+0j	22+23j	 2+3j	62+63j	42+43j	
33+0j	24+25j	 4+5j	64+65j	44+45j	
33+0j	26+27j	 6+7j	66+67j	46+47j	
33+0j	28+29j	 8+9j	68+69j	48+49j	
33+0j	30+31j	10+11j	70+71j	50+51j	
33+0j	32+33j	12+13j	72+73j	52+53j	
33+0j	34+35j	14+15j	74+75j	54+55j	
-33+0j	36+37j	16+17j	76+77j	56+57j	
-33+0j	38+39j	18+19j	78+79j	58+59j	
33+0j	40+41j	20+21j	 0+1j	60+61j	
33+0j	42+43j	22+23j	 2+3j	62+63j	
-33+0j	44+45j	24+25j	 4+5j	64+65j	
33+0j	46+47j	26+27j	 6+7j	66+67j	
-33+0j	48+49j	28+29j	 8+9j	68+69j	
33+0j	50+51j	30+31j	10+11j	70+71j	
33+0j	52+53j	32+33j	12+13j	72+73j	
33+0j	54+55j	34+35j	14+15j	74+75j	
33+0j	56+57j	36+37j	16+17j	76+77j	
33+0j	58+59j	38+39j	18+19j	78+79j

The first column holds the pilot symbols (not enumerated); the remaining 4 columns hold the payload data symbols, made up of 3 z vectors. In this case the mapping of z elements to carriers changes between adjacent z vectors, and is different from training. This would imply poorer performance; however, at run time it works quite well on multipath channels (in simulation and over the air).
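To make the shifting mapping concrete, the sketch below rebuilds only the 4 data columns of the run time frame and reports which carrier (zero-based row) carries the pair (38, 39) for each of the 3 z vectors. It assumes the same column-by-column fill seen in the printout and leaves the pilot column out, since the pilot sequence is not described here. The function names are hypothetical. The last lines cross-check the 120 ms frame length from the printed M and Ncp values, assuming Fs = 8000 Hz (M=160 samples per Ts'=0.020 s) and that the pilot column occupies one symbol period with its own cyclic prefix.

```python
def runtime_data_columns(d=80, Nc=30, n_vectors=3):
    """Lay n_vectors z vectors into an Nc-row grid, column by column.

    Each cell is (z vector number, complex symbol) so that symbols from
    different z vectors can be told apart.
    """
    stream = [(v, complex(i, i + 1))
              for v in range(n_vectors)
              for i in range(0, d, 2)]
    Ns = len(stream) // Nc  # data symbol periods per frame (4 here)
    return [[stream[s * Nc + c] for s in range(Ns)] for c in range(Nc)]


def carriers_of(grid, first_index):
    """Map z vector number -> carrier holding the symbol whose real part
    is first_index, e.g. 38 for the element pair (38, 39)."""
    found = {}
    for carrier, row in enumerate(grid):
        for vector, symbol in row:
            if int(symbol.real) == first_index:
                found[vector] = carrier
    return dict(sorted(found.items()))


grid = runtime_data_columns()
pair_38_39 = carriers_of(grid, 38)
print(pair_38_39)  # {0: 19, 1: 29, 2: 9}

# Frame duration: 1 pilot column + Ns data columns, each M + Ncp samples.
Fs, M, Ncp = 8000, 160, 32  # Fs is an assumption, see lead-in
frame_seconds = (1 + len(grid[0])) * (M + Ncp) / Fs
print(frame_seconds)  # 0.12
```

The same pair lands on carriers 19, 29 and 9 for the three vectors, which agrees with rows 20, 30 and 10 of the printout.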

drowe67 · 2025-05-02T07:49:27Z

A new waveform design, denoted candidate3 (ref FreeDV-032 spreadsheet), with a consistent z->symbol mapping (run time version, with pilots as the first column):

43+0j	 0+1j	 0+1j	 0+1j	
43+0j	 2+3j	 2+3j	 2+3j	
43+0j	 4+5j	 4+5j	 4+5j	
43+0j	 6+7j	 6+7j	 6+7j	
43+0j	 8+9j	 8+9j	 8+9j	
-43+0j	10+11j	10+11j	10+11j	
-43+0j	12+13j	12+13j	12+13j	
43+0j	14+15j	14+15j	14+15j	
43+0j	16+17j	16+17j	16+17j	
-43+0j	18+19j	18+19j	18+19j	
43+0j	20+21j	20+21j	20+21j	
-43+0j	22+23j	22+23j	22+23j	
43+0j	24+25j	24+25j	24+25j	
43+0j	26+27j	26+27j	26+27j	
43+0j	28+29j	28+29j	28+29j	
43+0j	30+31j	30+31j	30+31j	
43+0j	32+33j	32+33j	32+33j	
43+0j	34+35j	34+35j	34+35j	
-43+0j	36+37j	36+37j	36+37j	
-43+0j	38+39j	38+39j	38+39j	
43+0j	40+41j	40+41j	40+41j	
43+0j	42+43j	42+43j	42+43j	
-43+0j	44+45j	44+45j	44+45j	
43+0j	46+47j	46+47j	46+47j	
-43+0j	48+49j	48+49j	48+49j	
43+0j	50+51j	50+51j	50+51j	
43+0j	52+53j	52+53j	52+53j	
43+0j	54+55j	54+55j	54+55j	
43+0j	56+57j	56+57j	56+57j	
43+0j	58+59j	58+59j	58+59j

Note this is not a 1:1 match to candidate2 waveforms above, as several conflicting (hyper) parameters needed to be tweaked to get an implementable waveform.
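One way to see why d and Nc had to change together: if symbols fill the frame column by column (an assumption that matches the printed frames), element pair k of z vector v sits at stream position v*(d/2)+k and therefore on carrier (v*(d/2)+k) mod Nc. That carrier is independent of v only when d/2 is a multiple of Nc. The helper below is a hypothetical illustration of that check, not code from the repo.

```python
def mapping_is_consistent(d, Nc):
    """True when every z vector (d/2 complex symbols) fills a whole number
    of Nc-carrier columns, so each element pair always hits the same carrier."""
    symbols_per_z = d // 2
    return symbols_per_z % Nc == 0


cases = {
    "training, d=80, Nc=20": (80, 20),
    "candidate2 run time, d=80, Nc=30": (80, 30),
    "candidate3 run time, d=60, Nc=30": (60, 30),
}
results = {name: mapping_is_consistent(d, Nc) for name, (d, Nc) in cases.items()}
for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'shifts between z vectors'}")
```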

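| Candidate | d | Nc | Tf | Rs'' | RF BW | Lcp+Lp | Epochs | --range_start |
|---|---|---|---|---|---|---|---|---|
| 2 | 80 | 30 | 120ms | 50 Hz | 1500 Hz | ~2 dB | 100 | -6 dB |
| 3 | 60 | 30 | 120ms | 38.46 Hz | 1200 Hz | ~2 dB | 200 | -3 dB |

| Candidate | model name | Comment |
|---|---|---|
| 2 | model19_check3 | As used in RADE V1 |
| 3 | 250502 | |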

Training:

octave:18> Rs=40; Nc=30; multipath_samples('mpp', Rs, Rs, Nc, 250*60*60, 'h_nc30_mpp_train.f32',"",0);
python3 train.py --latent-dim 60 --cuda-visible-devices 0 --sequence-length 400 --batch-size 512 --epochs 200 --lr 0.003 --lr-decay-factor 0.0001 ~/Downloads/tts_speech_16k_speexdsp.f32 250502 --bottleneck 3 --h_file h_nc30_mpp_train.f32 --range_EbNo --range_EbNo_start -3 --plot_loss --auxdata

Initial run-time test (time domain with pilots):

./inference.sh 250502/checkpoints/checkpoint_epoch_200.pth wav/brian_g8sez.wav - --rate_Fs --pilots --pilot_eq --eq_ls --cp 0.004 --bottleneck 3 --auxdata --time_offset -16 --latent-dim 60

drowe67 · 2025-05-02T22:49:08Z

Initial comparison of candidate2 (current RADE V1) and candidate3 (consistent mapping of z elements to symbols):

Each point runs 10 minutes of LibriSpeech samples through the run time model (rate Fs, pilots inserted, EQ, time domain multipath channel simulation) and measures the mean loss.
PSNR=SNR+PAPR takes into account PAPR differences as well as Rx SNR (candidate3 has a slightly different PAPR).
Sanity check: we would expect the AWGN results (blue and yellow) to be similar, as the mapping issue doesn't affect AWGN performance. At low SNRs they are similar, and at high SNR candidate3 is slightly improved (probably due to the hyperparameter changes). ✔️
If the mapping makes a significant difference we would expect the candidate3 MPP curve (purple) to be significantly better (e.g. several dB) than candidate2 (red). There appears to be a slight improvement, but this could also be explained by the other (hyper) parameter changes required to perform the test. In the region beneath 5 dB the difference is ~0.5 dB.
At high SNR, the improvement in loss on the candidate3 MPP curve is consistent with the improvement on the AWGN curve, so it cannot be attributed to the symbol mapping.
Therefore, with this test, the theory that consistent symbol->carrier mapping between consecutive z vectors matters is not proven.
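The PSNR=SNR+PAPR relation used for the x axis can be sketched as below, with all quantities in dB. This assumes the usual definition of PAPR as peak sample power over mean sample power; how the repo's scripts actually measure PAPR is not shown in this thread, and the function names and sample values are illustrative only.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of a block of (complex) samples, in dB."""
    powers = [abs(x) ** 2 for x in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def psnr_db(snr_db, samples):
    """Peak SNR in dB: average SNR plus the waveform's PAPR."""
    return snr_db + papr_db(samples)


# Hypothetical samples: one peak 3x larger than the rest gives a
# peak/mean power ratio of 3, i.e. about 4.77 dB.
tx = [1, 1j, -1, 3]
print(f"PAPR = {papr_db(tx):.2f} dB, PSNR at 0 dB SNR = {psnr_db(0.0, tx):.2f} dB")
```

Comparing waveforms on PSNR rather than SNR puts two candidates with different PAPR on an equal peak-power footing.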

drowe67 added 17 commits December 12, 2024 13:15

wip ASR - building up test framework

bd8afde

processing a subset of dataset

67a669f

agc and hilbert compression

eb887d1

first pass at RADE processing, a bit slow

b61bbff

WIP single file rade processing, basic framework OK

040d6c6

plotting AWGN and MPP curves

465b347

support for large/turbo whisper models, clean up asr_test top level script

265bdcd

added some test modes to asr_test.sh, adding 500ms silence after samples to allow for any processing delay

225bba7

automated gen & plotting of controls

cf8d632

refining Latex plot for paper

22af4f0

typos

7096797

increased font size for Latex plot

a17dc51

lower vertical size of ASR plot

8279aaa

another 10% lower vertical, () around %

7ed0028

700D added to WER/ASR curves for 2024 HF RADE paper

cd430fe

trying solid line on FARGAN and clean control points

c680318

--print_frame option to dump OFDM modem frame

c168ca4

first pass at candidate2 waveform design where symbol mapping is same for train and run time

70bd4f4

drowe67 added 2 commits May 5, 2025 06:34

symbol-carrier mapping test plots - no improvement found

5516122

restore hard coded Ts changes back to RADE V1 (candidate2)

9389497
