flake: TestProvision/returns-modules #322

Closed
spikecurtis opened this issue Jan 27, 2025 · 7 comments

@spikecurtis

    provision_test.go:803: log: [DEBUG] Initializing the backend...
    provision_test.go:803: log: [DEBUG] Initializing modules...
    provision_test.go:803: log: [DEBUG] - hello in module
    provision_test.go:803: log: [DEBUG] - hello.there in module/inner_module
    provision_test.go:803: log: [DEBUG] Initializing provider plugins...
    provision_test.go:803: log: [DEBUG] - Finding latest version of hashicorp/null...
    provision_test.go:803: log: [ERROR] Error: Failed to query available provider packages
    provision_test.go:803: log: [ERROR] Could not retrieve the list of available versions for provider
    provision_test.go:803: log: [ERROR] hashicorp/null: could not connect to registry.terraform.io: failed to request
    provision_test.go:803: log: [ERROR] discovery document: Get
    provision_test.go:803: log: [ERROR] "https://registry.terraform.io/.well-known/terraform.json": context deadline
    provision_test.go:803: log: [ERROR] exceeded
    t.go:106: 2025-01-27 07:07:03.674 [debu]  executor: command done  args="[init -no-color -input=false]"  error="exit status 1"
    t.go:106: 2025-01-27 07:07:03.674 [debu]  executor: closing writers  error="exit status 1"
    t.go:106: 2025-01-27 07:07:03.674 [debu]  init failed  error="exit status 1"
    t.go:106: 2025-01-27 07:07:03.674 [debu]  canceledOrComplete closed
    t.go:106: 2025-01-27 07:07:03.674 [debu]  executor: kill context ended  args="[/Users/runner/work/_temp/04f2bf78-299d-4436-8e4e-d964e2f76538/terraform init -no-color -input=false]"
    provision_test.go:819: 
        	Error Trace:	/Users/runner/work/coder/coder/provisioner/terraform/provision_test.go:819
        	Error:      	Not equal: 
        	            	expected: ""
        	            	actual  : "initialize terraform: exit status 1"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-
        	            	+initialize terraform: exit status 1
        	Test:       	TestProvision/returns-modules

https://github.com/coder/coder/actions/runs/12983674870/job/36205319350?pr=16265

Looks like this test requires reaching out to registry.terraform.io, which is generally a bad idea in unit tests, since it subjects us to network issues, throttling/limits, etc.

@spikecurtis
Author

@hugodutka

Sorry for the delay, I completely forgot about this one.

The root cause of the issue seems to be terraform downloading providers over and over again. In the first error, a connection to registry.terraform.io failed; in the second, a connection to github.com failed.

I think we could instead download the providers once and cache them between CI runs. I'd enforce that terraform init only uses a local registry mirror by using a provider_installation block in the Terraform CLI config for the tests:

provider_installation {
  filesystem_mirror {
    path    = "/path/to/local/providers"
    include = ["*/*"]
  }
  direct {
    exclude = ["*/*"]
  }
}
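As a rough illustration of how a test could apply this config, here is a minimal Go sketch. It assumes the test writes the block above to a temporary file and points Terraform at it through the `TF_CLI_CONFIG_FILE` environment variable. The helper names `renderCLIConfig` and `writeCLIConfig` and the `terraform.rc` file name are hypothetical, not taken from the coder codebase.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// renderCLIConfig returns a Terraform CLI config that only allows provider
// installation from the filesystem mirror at mirrorDir. (Hypothetical helper.)
func renderCLIConfig(mirrorDir string) string {
	// %q produces a double-quoted string with backslash escapes, which is
	// adequate for ordinary filesystem paths in this sketch.
	return fmt.Sprintf(`provider_installation {
  filesystem_mirror {
    path    = %q
    include = ["*/*"]
  }
  direct {
    exclude = ["*/*"]
  }
}
`, mirrorDir)
}

// writeCLIConfig writes the rendered config into dir and returns its path.
func writeCLIConfig(dir, mirrorDir string) (string, error) {
	path := filepath.Join(dir, "terraform.rc")
	if err := os.WriteFile(path, []byte(renderCLIConfig(mirrorDir)), 0o600); err != nil {
		return "", fmt.Errorf("write cli config: %w", err)
	}
	return path, nil
}

func main() {
	dir, err := os.MkdirTemp("", "tf-cli-config-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	cfgPath, err := writeCLIConfig(dir, filepath.Join(dir, "providers"))
	if err != nil {
		panic(err)
	}

	// A test would pass this environment to the terraform subprocess so that
	// `terraform init` reads the restricted config instead of the user's own.
	env := append(os.Environ(), "TF_CLI_CONFIG_FILE="+cfgPath)
	fmt.Println(env[len(env)-1])
}
```

With `direct` excluding every provider, `terraform init` should fail quickly when a provider is missing from the mirror rather than fall back to the network.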

While holding a system-wide lock, each test would add its providers to the local mirror, if it hasn't already, with this command:

terraform providers mirror /path/to/local/providers
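A minimal Go sketch of that locking step, under stated assumptions: the "system-wide lock" is modeled as a lock file created with `O_EXCL`, and the helper names `withFileLock` and `mirrorArgs` and the lock file path are hypothetical. An `O_EXCL` lock file is left behind if the holder crashes, so a real implementation might prefer an OS-level advisory lock.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

// withFileLock runs fn while holding an exclusive lock file at lockPath.
// Creation with O_EXCL fails if the file already exists, so only one process
// on the machine can hold the lock at a time. (Hypothetical helper.)
func withFileLock(ctx context.Context, lockPath string, fn func() error) error {
	for {
		f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
		if err == nil {
			_ = f.Close()
			break
		}
		if !errors.Is(err, os.ErrExist) {
			return fmt.Errorf("create lock file: %w", err)
		}
		// Another process holds the lock: wait briefly, then retry.
		select {
		case <-ctx.Done():
			return fmt.Errorf("wait for lock: %w", ctx.Err())
		case <-time.After(50 * time.Millisecond):
		}
	}
	defer os.Remove(lockPath)
	return fn()
}

// mirrorArgs returns the arguments for `terraform providers mirror <dir>`.
func mirrorArgs(mirrorDir string) []string {
	return []string{"providers", "mirror", mirrorDir}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	lockPath := filepath.Join(os.TempDir(), "tf-mirror-sketch.lock")
	err := withFileLock(ctx, lockPath, func() error {
		cmd := exec.CommandContext(ctx, "terraform", mirrorArgs("/path/to/local/providers")...)
		cmd.Dir = "." // directory containing the test's Terraform files
		// The sketch only prints the command; a real test would call cmd.Run().
		fmt.Println(cmd.Args)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The context timeout bounds how long a test waits for the lock, so a stuck holder produces a clear error instead of a hung test.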

This approach doesn't completely remove the dependency on the external registry, but it dramatically reduces the number of network requests required. In fact, if terraform tests don't change between runs, no new network requests should be made at all. Do you see any issues with this approach, @spikecurtis? If not, I'll implement it this week.

@spikecurtis
Author

Is there any way we can completely remove the dependency on the external registry? Unit tests like this should be fast, self-contained, and bulletproof.

@hugodutka

hugodutka commented Apr 8, 2025

@spikecurtis From what I can see, many of the TestProvider tests depend on the coder provider and check whether it behaves correctly. I don't think there's a way to remove the dependency without deleting the tests completely.

@spikecurtis
Author

Let's start with caching between runs in CI, and see if the flakes go away.

@hugodutka

I've implemented the coder-side caching logic in coder/coder#17373. I'm still working out the caching strategy in GitHub Actions. I'll have an update soon, and then the PR will be ready for review.

hugodutka added a commit to coder/coder that referenced this issue Apr 28, 2025
Addresses coder/internal#322.

This PR starts caching Terraform providers used by `TestProvision` in
`provisioner/terraform/provision_test.go`. The goal is to improve the
reliability of this test by cutting down on the number of network calls
to external services. It leverages GitHub Actions cache, which [on depot
runners is persisted for 14 days by
default](https://depot.dev/docs/github-actions/overview#cache-retention-policy).

Other than the aforementioned `TestProvision`, I couldn't find any other
tests which depend on external terraform providers.
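
For readers wondering how a cache like this can stay valid between CI runs, one common technique is to key it on a hash of the Terraform files the tests use, so the cache is reused until those files change. The Go sketch below illustrates that idea only. This thread does not show how coder/coder#17373 derives its key, and the `cacheKey` helper is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// cacheKey derives a deterministic key from file names and contents.
// (Hypothetical helper; the map stands in for the tests' Terraform files.)
func cacheKey(files map[string]string) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	// Map iteration order is random, so sort for a stable hash.
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		// Length-prefix each field so that different (name, content) splits
		// of the same bytes cannot collide.
		fmt.Fprintf(h, "%d:%s%d:%s", len(name), name, len(files[name]), files[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := cacheKey(map[string]string{
		"main.tf": `terraform { required_providers { null = { source = "hashicorp/null" } } }`,
	})
	after := cacheKey(map[string]string{
		"main.tf": `terraform { required_providers { null = { source = "hashicorp/null", version = "3.2.2" } } }`,
	})
	fmt.Println("key changed after edit:", before != after)
}
```

A key like this could be used as the name of the provider mirror directory or as part of a CI cache key, so unchanged tests hit the cache and edited tests repopulate it.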
@hugodutka

Closing the issue since coder/coder#17373 should address it. If the problem resurfaces, we can reopen it.
