Update the chunk iterator for the TSE task #4929
Codecov Report
@@            Coverage Diff             @@
##           master    #4929      +/-   ##
==========================================
+ Coverage   74.79%   77.00%   +2.20%
==========================================
  Files         606      606
  Lines       53721    53761      +40
==========================================
+ Hits        40183    41396    +1213
+ Misses      13538    12365    -1173
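The figures in the report are internally consistent: coverage is hits divided by lines, and misses are lines minus hits. The short Python check below reproduces them, assuming Codecov's default of two decimal places with `round: down`; `round_down` is a hypothetical helper, not part of any Codecov tooling. Subtracting the unrounded ratios and then rounding down gives the reported +2.20%, whereas subtracting the displayed values (77.00 - 74.79) would give 2.21.

```python
import math


def round_down(percent: float, precision: int = 2) -> float:
    # Round down (floor) at `precision` decimal places; assumed to mirror
    # Codecov's default `round: down` setting.
    scale = 10 ** precision
    return math.floor(percent * scale) / scale


# (hits, lines) taken from the report above
master = (40183, 53721)
pr = (41396, 53761)

base_pct = 100 * master[0] / master[1]  # about 74.7994
head_pct = 100 * pr[0] / pr[1]          # about 77.0001

print(round_down(base_pct))             # 74.79
print(round_down(head_pct))             # 77.0
print(round_down(head_pct - base_pct))  # 2.2
print(master[1] - master[0], pr[1] - pr[0])  # 13538 12365 (misses)
```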
self.seed = seed
self.shuffle = shuffle
self.excluded_key_pattern = (
    "(" + "[0-9]*)|(".join(excluded_key_prefixes) + "[0-9]*)"
)
Are you only assuming the case where numbers are appended to the prefix?
If so, this should be documented in add_argument or elsewhere.
Adding a comment to the config would also be informative.
Yes, only exact matches and keys consisting of a prefix followed by trailing digits are considered here. I will update the description in the argument definition and in the configs.
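To make the matching rule discussed above concrete, here is a small Python sketch that reuses the pattern construction from the diff. The prefixes "enroll_ref" and "text" are illustrative, and the helper is_excluded is hypothetical: it assumes each key is matched against the whole pattern with re.fullmatch, which the excerpt does not show.

```python
import re


def build_excluded_key_pattern(excluded_key_prefixes):
    # Same string construction as in the reviewed diff:
    # one "(prefix[0-9]*)" group per prefix, joined with "|".
    return "(" + "[0-9]*)|(".join(excluded_key_prefixes) + "[0-9]*)"


def is_excluded(key, pattern):
    # Hypothetical helper: the whole key must be a prefix, optionally
    # followed by digits (re.fullmatch anchors both ends of the key).
    return re.fullmatch(pattern, key) is not None


pattern = build_excluded_key_pattern(["enroll_ref", "text"])
print(pattern)  # (enroll_ref[0-9]*)|(text[0-9]*)

for key in ["enroll_ref", "enroll_ref12", "enroll_ref_1", "speech_ref1", "text"]:
    print(key, is_excluded(key, pattern))
# enroll_ref True
# enroll_ref12 True
# enroll_ref_1 False
# speech_ref1 False
# text True

# re.match only anchors at the start, so it would also accept "enroll_ref_1".
print(re.match(pattern, "enroll_ref_1") is not None)  # True
```

The choice of matching function therefore matters for the "exact match or trailing digits" rule. Note also that the prefixes are not passed through re.escape, and that an empty prefix list yields the pattern "([0-9]*)", which matches keys made only of digits.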
This PR updates the chunk iterator to make it more general for different tasks.
I added a new argument, excluded_key_prefixes, to ChunkIterFactory that allows certain keys to be ignored when checking the length consistency of each sample. This can be useful for audio-to-audio tasks such as target speaker extraction (TSE), where additional features are required as input that do not necessarily have the same length as the input/target signals.
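The length check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ESPnet's ChunkIterFactory code: the function names build_pattern and common_length, the key names, and the use of re.fullmatch for whole-key matching are all hypothetical, and lengths are taken with len(), i.e. along the first axis.

```python
import re
from typing import Mapping, Optional, Sequence, Sized


def build_pattern(prefixes: Sequence[str]) -> Optional[str]:
    # Hypothetical helper: same "(prefix[0-9]*)|(...)" construction as the PR,
    # but returns None when no prefixes are given.
    if not prefixes:
        return None
    return "(" + "[0-9]*)|(".join(prefixes) + "[0-9]*)"


def common_length(
    sample: Mapping[str, Sized], excluded_key_prefixes: Sequence[str] = ()
) -> int:
    """Return the shared length of all non-excluded entries in one sample.

    Keys equal to a prefix, or a prefix followed by digits, are skipped;
    every other entry must have the same length, otherwise RuntimeError.
    """
    pattern = build_pattern(excluded_key_prefixes)
    lengths = {
        key: len(value)
        for key, value in sample.items()
        if pattern is None or re.fullmatch(pattern, key) is None
    }
    if not lengths:
        raise RuntimeError("all keys are excluded; nothing to chunk")
    if len(set(lengths.values())) != 1:
        raise RuntimeError(f"length mismatch among non-excluded keys: {lengths}")
    return next(iter(lengths.values()))


# Illustrative TSE-style sample: the enrollment audio has a different length.
sample = {
    "speech_mix": [0.0] * 16000,
    "speech_ref1": [0.0] * 16000,
    "enroll_ref1": [0.0] * 8000,
}
print(common_length(sample, excluded_key_prefixes=["enroll_ref"]))  # 16000
```

Only the enrollment entry is skipped, so the mixture and reference signals still have to agree in length; calling common_length(sample) without the exclusion raises a RuntimeError for the same sample.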