transcribe: add model path to vosk Model #12479
LocalStack Community integration with Pro 2 files ± 0 2 suites ±0 1m 20s ⏱️ - 1h 51m 30s Results for commit 77f25ee. ± Comparison against base commit 073eab9. This pull request removes 4325 tests.
♻️ This comment has been updated with latest results.
68ad955
to
827124f
Compare
These tests have been a sticking point; hopefully this resolves it once and for all 🤞
from vosk import KaldiRecognizer, Model  # noqa

- model = Model(model_name=model_name)
+ model = Model(model_path=str(model_path), model_name=model_name)
I understand it's implemented in C, but did you look into why the cache detection didn't work?
I am not entirely sure at the moment, but the changes in this PR are based on two key observations:
- The vosk Model may attempt to download a model when the model directory is empty due to an incorrect model_path being used. We want to make sure that we use VOSK_MODEL_PATH as the path.
- When checking the LocalStack cache, we only verify whether the model path exists. It's possible that the directory exists but doesn't actually contain any model files.
I have added a log statement that records whether we use a cached model, along with its path.
As a next step, we could verify the checksum if the tests remain flaky. Let me know your thoughts on that.
> The vosk Model may attempt to download a model when the model directory is empty due to an incorrect model_path being used. We want to make sure that we use VOSK_MODEL_PATH as the path.

If the model path is now passed as part of Model instantiation, is it still necessary to set VOSK_MODEL_PATH?
> When checking the LocalStack cache, we only verify whether the model path exists. It's possible that the directory exists but doesn't actually contain any model files.

Any idea what would lead to this situation, where empty directories are created?
I think our CI networks are pretty resilient and shouldn't cause data streams to be corrupted; otherwise this would pop up in several of our other tests. Adding checksum verification is probably unnecessary. I suspect the root cause is something else.
Yes absolutely, thanks for pointing that out 🙌 I have removed VOSK_MODEL_PATH in b6c7b46.
One possible reason for the failures is that we set up the cache directory from an environment variable, which needs to be set before the vosk module is imported. We currently import vosk in several different locations, which could explain why the cache path is sometimes not set.
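To illustrate the import-order concern, here is a minimal, self-contained sketch. It does not use vosk itself: the "module" is a hypothetical stand-in that reads an environment variable once, at import time, and DEMO_MODEL_PATH and the fallback path are made-up names. Whether vosk behaves exactly like this is an assumption of the sketch.

```python
import os

ENV_VAR = "DEMO_MODEL_PATH"  # hypothetical stand-in for VOSK_MODEL_PATH

# Source of a fake module that snapshots the environment when it is imported.
MODULE_SOURCE = '''
import os
MODEL_DIRS = [os.getenv("DEMO_MODEL_PATH"), "/fallback/cache"]
'''


def import_fake_module():
    """Execute the fake module source, mimicking a fresh import."""
    namespace = {}
    exec(MODULE_SOURCE, namespace)
    return namespace


os.environ.pop(ENV_VAR, None)
early = import_fake_module()  # "imported" before the variable is set

os.environ[ENV_VAR] = "/var/lib/models"
late = import_fake_module()  # "imported" after the variable is set

early_dirs = early["MODEL_DIRS"]  # first entry is None: the variable was missed
late_dirs = late["MODEL_DIRS"]    # first entry is the configured path
print(early_dirs, late_dirs)
```

If the real module is imported from several places, whichever import runs first decides the value, which is why passing the path explicitly at instantiation is less fragile than relying on the environment.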
I am currently working on investigating
f34abe5
to
280df94
Compare
For the latest runs of the PR, the pipeline is green. Current findings for
As discussed, @alexrashed, do you think there is something more I can verify? Please let me know your thoughts here, @alexrashed @viren-nadkarni.
Thanks for the deliberate investigation of the issues with the transcribe tests! 💯
The implemented fix for the model caching will hopefully resolve some of the instabilities in the transcribe tests. As for the issues with the Python package download, I can only imagine that they are caused by a problem on the side of PyPI or its CDN, or by some kind of rate limiting of the CI/CD runners. In my opinion we can merge this PR and unskip the tests to see if they stay stable on master.
Successful GitHub run on
Motivation
CI flaky test runs for transcribe:
Changes
- Fix the vosk Model class downloading issue by providing model_path, to prevent re-downloading of the model and ensure that an accurate and consistent model path is used.
- Added a check to see if the model path is non-empty, along with a log statement for the path.
- Enabled the skipped flaky tests: #12473.
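The non-empty check described above could look roughly like the following sketch. The helper name is_model_cached, the logger setup, and the demo directory name are hypothetical and are not taken from the LocalStack code base; the point is only that an existing but empty directory is not treated as a cache hit.

```python
import logging
import tempfile
from pathlib import Path

LOG = logging.getLogger(__name__)


def is_model_cached(model_path: Path) -> bool:
    """Return True only if the directory exists and has at least one entry."""
    if not model_path.is_dir():
        return False
    has_files = any(model_path.iterdir())
    if has_files:
        LOG.debug("Using cached model at %s", model_path)
    return has_files


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "demo-model"  # hypothetical model directory

    missing = is_model_cached(model_dir)    # directory does not exist yet

    model_dir.mkdir()
    empty = is_model_cached(model_dir)      # exists, but contains nothing

    (model_dir / "README").write_text("placeholder")
    populated = is_model_cached(model_dir)  # exists and has content

print(missing, empty, populated)
```

Only when such a check passes would the model be loaded from the cache with Model(model_path=str(model_path), model_name=model_name), as in the diff; otherwise the model would be downloaded first.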
Testing
Running the pipeline a few times to ensure stability of the pipelines:
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/a3f797d5-461a-4a7b-ad47-e977efb253ee
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/5b74f6a1-242b-4549-bce6-610541b95388
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/23981c99-96e2-4eb7-be83-a800d0fb7b9b
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/c98da336-23f4-4696-8c8a-e223a8323d40
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/0162b75f-3be3-4ba1-9dcb-222472b0b4f3
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/f6c63c2f-747c-4dbc-8e6d-49c673eeb6fc
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32093/workflows/db5d09e3-792d-4d91-b8d0-b7822cd09c7a
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32102/workflows/48508602-6c4c-44ca-b5fe-1a68efb97293
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32164/workflows/42109e13-4c81-41e6-b744-a32d1e81bc0a
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32167/workflows/c483268e-802b-4a41-86ff-8775df25cec0
TODO
- "localstack.packages.api.PackageException: Installation of vosk 0.3.43 failed." error.
- vosk installation failures without model path changes: Test PyPi vosk installation failures #12510
- ✅ Successful Runs: transcribe: add model path to vosk Model #12479 (comment)