Here we also introduce a `test` provider meant as an aid to exposing
via automated tests issues involving interactions between
`helper/schema` and Terraform core.
This has been helpful so far in diagnosing `ignore_changes` problems,
and I imagine it will be helpful in other contexts as well.
We'll have to be careful to prevent the `test` provider from becoming a
dumping ground for poorly specified tests that have a clear home
elsewhere. But for bug exposure I think it's useful to have.
This should be quite helpful in debugging aws-sdk-go operations.
Required some tweaking around the `helper/logging` functions to expose an
`IsDebugOrHigher()` helper for us to use.
Change the `RetryFunc` from a plain `error` return type to a
specialized `RetryError` which must decide whether it is
retryable or not.
Add `RetryableError` / `NonRetryableError` factory functions that
callers are meant to use to build up these errors.
This makes it eminently clear whether or not a given error is
retryable from inside the client code.
Goal here is to _not_ change any behavior, simply reflect the
existing behavior with the new, clearer, API.
In #4700 while fixing a data race I made an incorrect assumption about
the return value of StateChangeConf, and ended up changing the behavior
in the timeout case to always return:
> timeout while waiting for state to become '[success]'
When it used to capture the "most recent error" from the function
itself.
It's much more useful to see that error bubbling up, so here we revert
to pulling it out of the function as we did before, and we protect
against the data race with a good old fashioned mutex.
Helpful when iterating on a drift test.
Eventually I think this assertion could be fanned out to something much
more targeted like:
ExpectAttributeDiff(resource, attr, oldval, newval)
But this is a step in the right direction.
* MaxItems defines a maximum amount of items that can exist within a
TypeSet or TypeList. Specific use cases would be if a TypeSet is being
used to wrap a complex structure, however more than one instance would
cause instability.
It was a mistake to switched fully to `==` when activating waiting for
capacity on updates in #3947. Users that didn't set `min_elb_capacity ==
desired_capacity` and instead treated it as an actual "minimum" would
see timeouts for every create, since their target numbers would never be
reached exactly.
Here, we fix that regression by restoring the minimum waiting behavior
during creates.
In order to preserve all the stated behavior, I had to split out
different criteria for create and update, criteria which are now
exhaustively unit tested.
The set of fields that affect capacity waiting behavior has become a bit
of a mess. Next major release I'd like to rework all of these into a
more consistently named block of config. For now, just getting the
behavior correct and documented.
(Also removes all the fixed names from the ASG tests as I was hitting
collision issues running them over here.)
Fixes#4792
Implementation notes:
* The hash implementation was not considering key value, causing "diffs
did not match" errors when a value was updated. Switching to default
HashResource implementation fixes this
* Using HashResource as a default exposed a bug in helper/schema that
needed to be fixed so the Set function is picked up properly during
Read
* Stop writing back values into the `key` attribute; it triggers extra
diffs when `default` is used. Computed values all just go into `var`.
* Includes a state migration to prevent unnecessary diffs based on
"key" attribute hashcodes changing.
In the tests:
* Switch from leaning on the public demo Consul instance to requiring a
CONSUL_HTTP_ADDR variable be set pointing to a `consul agent -dev`
instance to be used only for testing.
* Add a test that exposes the updating issues and covers the fixes
Fixes#774Fixes#1866Fixes#3023
Adds the `TF_SKIP_REMOTE_TESTS` env var to be used in cases where the
`http.Get()` smoke test passes but the network is not able to service
the needs of the tests.
Fixes#4421
The implementation was attempting to capture the error using a scoped
variable reference, but the error value is already exposed via the
return value of `WaitForState()`. Using that instead fixes the data
race.
Fixes the following data race:
```
==================
WARNING: DATA RACE
Read by goroutine 74:
github.com/hashicorp/terraform/helper/resource.Retry()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait.go:35 +0x284
github.com/hashicorp/terraform/helper/resource.TestRetry_timeout()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait_test.go:35 +0x60
testing.tRunner()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:456 +0xdc
Previous write by goroutine 90:
github.com/hashicorp/terraform/helper/resource.Retry.func1()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait.go:20 +0x87
github.com/hashicorp/terraform/helper/resource.(*StateChangeConf).WaitForState.func1()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/state.go:83 +0x284
Goroutine 74 (running) created at:
testing.RunTests()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:561 +0xaa3
testing.(*M).Run()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:494 +0xe4
main.main()
github.com/hashicorp/terraform/helper/resource/_test/_testmain.go:84 +0x20f
Goroutine 90 (running) created at:
github.com/hashicorp/terraform/helper/resource.(*StateChangeConf).WaitForState()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/state.go:127 +0x283
github.com/hashicorp/terraform/helper/resource.Retry()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait.go:34 +0x276
github.com/hashicorp/terraform/helper/resource.TestRetry_timeout()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait_test.go:35 +0x60
testing.tRunner()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:456 +0xdc
==================
```
Generate bucket names and object names per test instead of once at the
top level. Should help avoid failures like this one:
https://travis-ci.org/hashicorp/terraform/jobs/100254008
All storage tests checked on this commit:
```
TF_ACC=1 go test -v ./builtin/providers/google -run TestAccGoogleStorage
=== RUN TestAccGoogleStorageBucketAcl_basic
--- PASS: TestAccGoogleStorageBucketAcl_basic (8.90s)
=== RUN TestAccGoogleStorageBucketAcl_upgrade
--- PASS: TestAccGoogleStorageBucketAcl_upgrade (14.18s)
=== RUN TestAccGoogleStorageBucketAcl_downgrade
--- PASS: TestAccGoogleStorageBucketAcl_downgrade (12.83s)
=== RUN TestAccGoogleStorageBucketAcl_predefined
--- PASS: TestAccGoogleStorageBucketAcl_predefined (4.51s)
=== RUN TestAccGoogleStorageObject_basic
--- PASS: TestAccGoogleStorageObject_basic (3.77s)
=== RUN TestAccGoogleStorageObjectAcl_basic
--- PASS: TestAccGoogleStorageObjectAcl_basic (4.85s)
=== RUN TestAccGoogleStorageObjectAcl_upgrade
--- PASS: TestAccGoogleStorageObjectAcl_upgrade (7.68s)
=== RUN TestAccGoogleStorageObjectAcl_downgrade
--- PASS: TestAccGoogleStorageObjectAcl_downgrade (7.37s)
=== RUN TestAccGoogleStorageObjectAcl_predefined
--- PASS: TestAccGoogleStorageObjectAcl_predefined (4.16s)
PASS
ok github.com/hashicorp/terraform/builtin/providers/google 68.275s
```
* Add SSH Keys to all droplets in tests, this prevents acctests from
spamming account owner email with root password details
* Add a new helper/acctest package to be a home for random string / int
implementations used in tests.
* Insert some random details into record tests to prevent collisions
* Normalize config style in tests to hclfmt conventions
In the acceptance testing framework, it is neccessary to provide a copy
of the state _before_ the destroy is applied to the check in order that
it can loop over resources to verify their destruction. This patch makes
a deep copy of the state prior to applying test steps which have the
Destroy option set and then passes that to the destroy check.
We weren't doing any log setup for acceptance tests, which made it
difficult to wrangle log output in CI.
This moves the log setup functions we use in `main` over into a helper
package so we can use them for acceptance tests as well.
This means that acceptance tests will by default be a _lot_ quieter,
only printing out actual test output. Setting `TF_LOG=trace` will
restore the full prior noise level.
Only minor behavior change is to make `ioutil.Discard` the default
return value rather than a `nil` that needs to be checked for.
Changing the Set internals makes a lot of sense as it saves doing
conversions in multiple places and gives a central place to alter
the key when a item is computed.
This will have no side effects other then that the ordering is now
based on strings instead on integers, so the order will be different.
This will however have no effect on existing configs as these will
use the individual codes/keys and not the ordering to determine if
there is a diff or not.
Lastly (but I think also most importantly) there is a fix in this PR
that makes diffing sets extremely more performand. Before a full diff
required reading the complete Set for every single parameter/attribute
you wanted to diff, while now it only gets that specific parameter.
We have a use case where we have a Set that has 18 parameters and the
set consist of about 600 items (don't ask 😉). So when doing a diff
it would take 100% CPU of all cores and stay that way for almost an
hour before being able to complete the diff.
Debugging this we learned that for retrieving every single parameter
it made over 52.000 calls to `func (c *ResourceConfig) get(..)`. In
this function a slice is created and used only for the duration of the
call, so the time needed to create all needed slices and on the other
hand the time the garbage collector needed to clean them up again caused
the system to cripple itself. Next to that there are also some expensive
reflect calls in this function which also claimed a fair amount of CPU
time.
After this fix the number of calls needed to get a single parameter
dropped from 52.000+ to only 2! 😃
I promised myself that next time I jumped in this file I'd fix this up.
Now we don't have to manually index the file with comments, we can just
add descriptive names to the test cases!
Because `aws_security_group_rule` resources are an abstraction on top of
Security Groups, they must interact with the AWS Security Group APIs in
a pattern that often results in lots of parallel requests interacting
with the same security group.
We've found that this pattern can trigger race conditions resulting in
inconsistent behavior, including:
* Rules that report as created but don't actually exist on AWS's side
* Rules that show up in AWS but don't register as being created
locally, resulting in follow up attempts to authorize the rule
failing w/ Duplicate errors
Here, we introduce a per-SG mutex that must be held by any security
group before it is allowed to interact with AWS APIs. This protects the
space between `DescribeSecurityGroup` and `Authorize*` / `Revoke*`
calls, ensuring that no other rules interact with the SG during that
span.
The included test exposes the race by applying a security group with
lots of rules, which based on the dependency graph can all be handled in
parallel. This fails most of the time without the new locking behavior.
I've omitted the mutex from `Read`, since it is only called during the
Refresh walk when no changes are being made, meaning a bunch of parallel
`DescribeSecurityGroup` API calls should be consistent in that case.