transcribe: add model path to vosk Model #12479

Merged
merged 14 commits into from
Apr 11, 2025

Conversation

sannya-singal
Contributor

sannya-singal commented Apr 4, 2025

Motivation

CI flaky test runs for transcribe:

Changes

Fix the vosk Model download issue by passing model_path explicitly. This prevents the model from being re-downloaded and ensures that a consistent model path is used.

Added a check to see if the model path is non-empty along with a log statement for the path.

Enabled the skipped flaky tests: #12473.
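
The non-empty path check described in the changes above could look roughly like the sketch below. It is not the code from this PR: the helper name `is_model_cached`, the log message, and the placeholder file name are assumptions made for illustration, and vosk itself is not imported so the snippet runs on its own.

```python
import logging
import tempfile
from pathlib import Path

LOG = logging.getLogger(__name__)


def is_model_cached(model_path: Path) -> bool:
    """Return True only if model_path is a directory with at least one entry.

    Hypothetical helper: an existing but empty directory is treated as
    "not cached", so the caller knows the model still has to be fetched.
    """
    if not model_path.is_dir():
        return False
    has_entries = any(model_path.iterdir())
    if has_entries:
        LOG.debug("Using cached vosk model at %s", model_path)
    return has_entries


# Small demonstration with a temporary directory standing in for the cache.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "vosk-model"
    model_dir.mkdir()
    empty_result = is_model_cached(model_dir)  # directory exists but is empty
    (model_dir / "placeholder.bin").write_bytes(b"")  # placeholder, not a real model file
    filled_result = is_model_cached(model_dir)
```

Once such a check passes, the model would be created with the explicit path, as in the diff discussed in the review: `Model(model_path=str(model_path), model_name=model_name)`.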

Testing

Ran the pipeline several times to confirm that it is stable:
https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/a3f797d5-461a-4a7b-ad47-e977efb253ee
https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/5b74f6a1-242b-4549-bce6-610541b95388
https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/23981c99-96e2-4eb7-be83-a800d0fb7b9b
https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/c98da336-23f4-4696-8c8a-e223a8323d40
https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/0162b75f-3be3-4ba1-9dcb-222472b0b4f3
https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/f6c63c2f-747c-4dbc-8e6d-49c673eeb6fc
https://app.circleci.com/pipelines/github/localstack/localstack/32093/workflows/db5d09e3-792d-4d91-b8d0-b7822cd09c7a
https://app.circleci.com/pipelines/github/localstack/localstack/32102/workflows/48508602-6c4c-44ca-b5fe-1a68efb97293
https://app.circleci.com/pipelines/github/localstack/localstack/32164/workflows/42109e13-4c81-41e6-b744-a32d1e81bc0a
https://app.circleci.com/pipelines/github/localstack/localstack/32167/workflows/c483268e-802b-4a41-86ff-8775df25cec0

TODO

sannya-singal self-assigned this Apr 4, 2025
sannya-singal added the "semver: minor" label (Non-breaking changes which can be included in minor releases, but not in patch releases) Apr 4, 2025

github-actions bot commented Apr 4, 2025

LocalStack Community integration with Pro

 2 files  ±    0   2 suites  ±0   1m 20s ⏱️ - 1h 51m 30s
25 tests  - 4 325  21 ✅  - 3 962  4 💤  - 363  0 ❌ ±0 
27 runs   - 4 325  21 ✅  - 3 962  6 💤  - 363  0 ❌ ±0 

Results for commit 77f25ee. ± Comparison against base commit 073eab9.

This pull request removes 4325 tests.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…

♻️ This comment has been updated with latest results.

sannya-singal added this to the 4.4 milestone Apr 4, 2025
sannya-singal marked this pull request as ready for review April 4, 2025 09:04
sannya-singal requested a review from bentsku April 4, 2025 09:04
Member

viren-nadkarni left a comment

These tests have been a sticking point; hopefully this resolves it once and for all 🤞

  from vosk import KaldiRecognizer, Model  # noqa

- model = Model(model_name=model_name)
+ model = Model(model_path=str(model_path), model_name=model_name)

Member

I understand it's implemented in C, but did you look into why the cache detection didn't work?

Contributor Author

I am not entirely sure at the moment, but the changes in this PR are based on two key observations:

  • The vosk Model may attempt to download a model when the model directory is empty due to an incorrect model_path being used. We want to make sure that we use VOSK_MODEL_PATH as the path.
  • When checking the LocalStack cache, we only verify whether the model path exists. It's possible that the directory exists but doesn't actually contain any model files.

I have added a log to check if we use a cached model and log its path.

As a next step, we can verify the checksum if the tests remain flaky. Let me know your thoughts on this.
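
For reference, the checksum verification mentioned as a possible next step could be sketched as follows. This is an illustration only: the function names and the idea of comparing against a known SHA-256 digest are assumptions, not something this PR implements.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path: Path, expected_sha256: str) -> bool:
    """Hypothetical check: True if the file's digest equals the expected one."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A caller would run `archive_matches` on the downloaded model archive before unpacking it, and discard and re-download the file on a mismatch.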

Member

> The vosk Model may attempt to download a model when the model directory is empty due to an incorrect model_path being used. We want to make sure that we use VOSK_MODEL_PATH as the path.

If the model path is now passed as part of Model instantiation, is it still necessary to set VOSK_MODEL_PATH?

> When checking the LocalStack cache, we only verify whether the model path exists. It's possible that the directory exists but doesn't actually contain any model files.

Any idea what would lead to this situation, where empty directories are created?

I think our CI networks are pretty resilient and shouldn't corrupt the data stream; otherwise this would show up in several of our other tests. Adding checksum verification is probably unnecessary. I suspect the root cause is something else.

Contributor Author

Yes absolutely, thanks for pointing that out 🙌 I have removed VOSK_MODEL_PATH in b6c7b46.

One possible cause of the failures is that the cache directory is configured through an environment variable, which needs to be set before the vosk module is imported. We currently import vosk in several places, so the cache path may not yet be set when the first import happens.
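
The import-order pitfall described above can be reproduced with a small stand-in module. The sketch below does not use vosk: `demo_module`, `DEMO_MODEL_PATH`, and `CACHE_DIR` are made-up names, and the only assumption is that the library reads the environment variable once, at import time.

```python
import os
import types

# Source of a stand-in module that, like the behaviour described above,
# reads its cache location from the environment when it is first executed.
MODULE_SOURCE = """
import os
CACHE_DIR = os.environ.get("DEMO_MODEL_PATH", "<default>")
"""


def import_demo_module() -> types.ModuleType:
    """Execute the stand-in module source, simulating a fresh import."""
    module = types.ModuleType("demo_module")
    exec(MODULE_SOURCE, module.__dict__)
    return module


os.environ.pop("DEMO_MODEL_PATH", None)
early = import_demo_module()  # "imported" before the variable is set

os.environ["DEMO_MODEL_PATH"] = "/tmp/models"
late = import_demo_module()  # "imported" after the variable is set

print(early.CACHE_DIR)  # value captured at the early import
print(late.CACHE_DIR)   # value captured at the late import
```

The early import keeps the fallback value even after the variable is set later, which is why passing the path explicitly at instantiation avoids depending on import order.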

sannya-singal removed the request for review from alexrashed April 9, 2025 06:43
@sannya-singal
Contributor Author

I am currently investigating the error "localstack.packages.api.PackageException: Installation of vosk 0.3.43 failed." seen in the latest runs, and have updated the PR description.

viren-nadkarni self-requested a review April 9, 2025 09:53
sannya-singal marked this pull request as draft April 9, 2025 11:30
@sannya-singal
Contributor Author

sannya-singal commented Apr 10, 2025

sannya-singal marked this pull request as ready for review April 10, 2025 11:36
@sannya-singal
Contributor Author

For the latest runs of the PR, the pipeline is green. Current findings for localstack.packages.api.PackageException: Installation of vosk 0.3.43 failed. error:

As discussed, @alexrashed, do you think there is anything more I can verify? Please let me know your thoughts here, @alexrashed @viren-nadkarni.

Member

alexrashed left a comment

Thanks for the deliberate investigation of the issues with the transcribe tests! 💯
The implemented fix for the model caching will hopefully resolve some of the instabilities in the transcribe tests. As for the issues with the Python package download, I can only imagine that they are caused by problems on the side of PyPI, their CDN, or some kind of rate limiting of the CI/CD runners. In my opinion we can merge this PR and unskip the tests to see if they stay stable on master.

sannya-singal merged commit b619ded into master Apr 11, 2025
37 checks passed
sannya-singal deleted the transcribe-flaky branch April 11, 2025 04:35
@sannya-singal
Contributor Author

Successful GitHub run on master: https://github.com/localstack/localstack/actions/runs/14395665806
