ESPnet Recipe for ASR on the Makerere Radio Speech Corpus#5730
sw005320 merged 16 commits into espnet:master
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #5730 +/- ##
===========================================
+ Coverage 0 16.28% +16.28%
===========================================
Files 0 767 +767
Lines 0 70337 +70337
===========================================
+ Hits 0 11453 +11453
- Misses 0 58884 +58884
The error

@jctian98, can you review this PR?

@satvik-dixit Thanks for making this PR. I've left some comments above.

@satvik-dixit, please respond to them.
This is fixed.

egs2/makerere/asr1/cmd.sh (Outdated)
# Select the backend used by run.sh from "local", "stdout", "sge", "slurm", or "ssh"
cmd_backend='slurm'
Please do not change environment-related configs.
Please submit them as they are (e.g., cmd.sh and slurm.conf).
I have made the suggested changes in cmd.sh and slurm.conf
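
For context on the review comment above, here is a minimal sketch of how a `cmd.sh`-style file can turn the backend name into a launcher script, and how a cluster-specific choice can be kept out of the committed file. It is an illustration, not the recipe's actual `cmd.sh`: the function `select_launcher` and the environment variable `CMD_BACKEND` are hypothetical names, and the default of `local` is an assumption about the upstream file.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESPnet): map a backend name to the
# Kaldi-style launcher script usually associated with it.
select_launcher() {
    case "$1" in
        local)  echo "run.pl" ;;
        stdout) echo "stdout.pl" ;;
        sge)    echo "queue.pl" ;;
        slurm)  echo "slurm.pl" ;;
        ssh)    echo "ssh.pl" ;;
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Leave the committed default untouched (assumed here to be "local") and
# override it per machine through an environment variable instead.
# CMD_BACKEND is an assumed name used only for this sketch.
cmd_backend="${CMD_BACKEND:-local}"
train_cmd="$(select_launcher "${cmd_backend}")" || exit 1
export train_cmd
echo "backend=${cmd_backend} train_cmd=${train_cmd}"
```

With this pattern, a Slurm user could run `CMD_BACKEND=slurm ./run.sh` locally while the file in the repository stays as submitted upstream.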
- espnet version: `espnet 202402`
- pytorch version: `pytorch 2.0.1`
- Git hash: `eed7751c910977290ef9a177ea0942a0e3c2fd35`
- Commit date: `Mon Mar 25 18:26:50 2024 +0000`
Please upload the model to the HF hub and add a link here
I have uploaded the model to the HF hub and added a link to this file. Here's the link: https://huggingface.co/satvik-dixit/asr_makerere.

Hi @satvik-dixit, is there anything I can help with?
@satvik-dixit Thanks for the progress so far. Shinji and I left some comments earlier. Please also address those comments :)

@sw005320 It seems some CI tests fail at the ESPnet/Python installation stage, which should not be @satvik-dixit's fault. Is there a good way to deal with it?

LGTM.

Please add a corpus description to https://github.com/espnet/espnet/blob/master/egs2/README.md

Added the corpus description to the README (https://github.com/espnet/espnet/blob/master/egs2/README.md).

Thanks a lot!
What?
This is a new recipe for preparing the Makerere Radio Speech Corpus dataset and training automatic speech recognition models on it.
Details about the dataset:
The Makerere Radio Speech Corpus is a Luganda-language dataset. It contains 20 hours of human-transcribed radio speech.
Related Links
Details about the dataset: https://zenodo.org/records/5855017
Paper on the dataset: https://arxiv.org/abs/2206.09790