* Update error message for apply validation
Add a hint that the validation failure has occurred at the root of the resource
schema to the error message. This is because the root resource has an empty
path when being validated and the path is being relied upon to provide context
into the error message.
The resource apply nodes need to be GraphNodeDestroyerCBD in order to
correctly inherit create_before_destroy. While the plan will have
recorded this to create the correct deposed nodes, the edges still need
to be transformed correctly.
We also need create_before_destroy to be saved to state for nodes that
inherited it, so that if they are removed from state the destroy will
happen in the correct order.
Stop evaluating count and for each if they aren't set in the config.
Remove "Resource" from the function names, as they are also now used
with modules.
Implement a new provider_meta block in the terraform block of modules, allowing provider-keyed metadata to be communicated from HCL to provider binaries.
Bundled in this change for minimal protocol version bumping is the addition of markdown support for attribute descriptions and the ability to indicate when an attribute is deprecated, so this information can be shown in the schema dump.
Co-authored-by: Paul Tyng <paul@paultyng.net>
During destroy, the for expression may be unknown and evaluation will
fail. Destroy provisioners however can only reference the key value,
which is known in the address.
a large refactor to addrs.AbsProviderConfig, embedding the addrs.Provider instead of a Type string. I've added and updated tests, added some Legacy functions to support older state formats and shims, and added a normalization step when reading v4 (current) state files (not the added tests under states/statefile/roundtrip which work with both current and legacy-style AbsProviderConfig strings).
The remaining 'fixme' and 'todo' comments are mostly going to be addressed in a subsequent PR and involve looking up a given local provider config's FQN. This is fine for now as we are only working with default assumption.
* Introduce "Local" terminology for non-absolute provider config addresses
In a future change AbsProviderConfig and LocalProviderConfig are going to
become two entirely distinct types, rather than Abs embedding Local as
written here. This naming change is in preparation for that subsequent
work, which will also include introducing a new "ProviderConfig" type
that is an interface that AbsProviderConfig and LocalProviderConfig both
implement.
This is intended to be largely just a naming change to get started, so
we can deal with all of the messy renaming. However, this did also require
a slight change in modeling where the Resource.DefaultProviderConfig
method has become Resource.DefaultProvider returning a Provider address
directly, because this method doesn't have enough information to construct
a true and accurate LocalProviderConfig -- it would need to refer to the
configuration to know what this module is calling the provider it has
selected.
In order to leave a trail to follow for subsequent work, all of the
changes here are intended to ensure that remaining work will become
obvious via compile-time errors when all of the following changes happen:
- The concept of "legacy" provider addresses is removed from the addrs
package, including removing addrs.NewLegacyProvider and
addrs.Provider.LegacyString.
- addrs.AbsProviderConfig stops having addrs.LocalProviderConfig embedded
in it and has an addrs.Provider and a string alias directly instead.
- The provider-schema-handling parts of Terraform core are updated to
work with addrs.Provider to identify providers, rather than legacy
strings.
In particular, there are still several codepaths here making legacy
provider address assumptions (in order to limit the scope of this change)
but I've made sure each one is doing something that relies on at least
one of the above changes not having been made yet.
* addrs: ProviderConfig interface
In a (very) few special situations in the main "terraform" package we need
to make runtime decisions about whether a provider config is absolute
or local.
We currently do that by exploiting the fact that AbsProviderConfig has
LocalProviderConfig nested inside of it and so in the local case we can
just ignore the wrapping AbsProviderConfig and use the embedded value.
In a future change we'll be moving away from that embedding and making
these two types distinct in order to represent that mapping between them
requires consulting a lookup table in the configuration, and so here we
introduce a new interface type ProviderConfig that can represent either
AbsProviderConfig or LocalProviderConfig decided dynamically at runtime.
This also includes the Config.ResolveAbsProviderAddr method that will
eventually be responsible for that local-to-absolute translation, so
that callers with access to the configuration can normalize to an
addrs.AbsProviderConfig given a non-nil addrs.ProviderConfig. That's
currently unused because existing callers are still relying on the
simplistic structural transform, but we'll switch them over in a later
commit.
* rename LocalType to LocalName
Co-authored-by: Kristin Laemmert <mildwonkey@users.noreply.github.com>
Make use of the new Dependencies field in the instance state.
The inter-instance dependencies will be determined from the complete
reference graph, so that absolute addresses can be stored, rather than
just references within a module. The Dependencies are added to the node
in the same manner as state, i.e. via an "attacher" interface and
transformer. This is because dependencies are calculated from the graph
itself, and not from the config.
Previously we were using the experimental HCL 2 repository, but now we'll
shift over to the v2 import path within the main HCL repository as part of
actually releasing HCL 2.0 as stable.
This is a mechanical search/replace to the new import paths. It also
switches to the v2.0.0 release of HCL, which includes some new code that
Terraform didn't previously have but should not change any behavior that
matters for Terraform's purposes.
For the moment the experimental HCL2 repository is still an indirect
dependency via terraform-config-inspect, so it remains in our go.sum and
vendor directories for the moment. Because terraform-config-inspect uses
a much smaller subset of the HCL2 functionality, this does still manage
to prune the vendor directory a little. A subsequent release of
terraform-config-inspect should allow us to completely remove that old
repository in a future commit.
Now that the most common cause of unknowns (invalid resource indexes) is
caught earlier, we can validate that the final apply config is wholly
known before attempting to apply it. This ensures that we're applying
the configuration we intend, and not silently dropping values.
When an operation fails, providers may return a null new value rather than
returning a partial state. In that case, we'd prefer to keep the old value
so that we stand the best chance of being able to retry on a subsequent
run.
Previously we were making an exception for the delete action, allowing
the result of that to be null even when an error is returned. In practice
that was a bad idea because it would cause Terraform to lose track of the
object even though it might not actually have been deleted.
Now we'll retain the old object even in the delete case. Providers can
still return partial new objects if they were able to partially complete
a delete operation, in which case we'll discard what we had before, but
if the result is null with errors then we'll assume the delete failed
entirely and so just keep the old state as-is, giving us the opportunity
to refresh it on the next run to see if anything actually happened after
all.
(This also includes a new resource in the test provider which isn't used
by the patch but was useful for some manual UX testing here, so I thought
I'd include it in case it's similarly useful in future, given how simple
its implementation is.)
We are now allowing the legacy SDK to opt out of the safety checks we try
to do after plan and apply, and so in such cases the before/after values
in planned changes may be inconsistent with our usual rules.
To avoid adding lots of extra complexity to the diff renderer to deal with
these situations, instead we'll normalize the handling of nested blocks
prior to using these values.
In the long run it'd be better to do this normalization at the source,
immediately after we receive an object from a provider using the opt-out,
but we're doing this at the outermost layer for now to avoid risking
unintended impacts on other Terraform Core components when we're just
about to enter the beta phase of the v0.12.0 release cycle.
A provider may react to a create or update failing by returning error
diagnostics and a partially-updated or nil new value, in which case we
do not expect our AssertObjectCompatible consistency check to succeed: the
provider is just assumed to be doing the best it can to preserve whatever
partial outcome it was able to achieve.
However, if errors are accompanied with a nil new value after an update,
we'll assume that the provider is telling us it wasn't able to get far
enough to make any change at all, and so we'll retain the prior value in
state. This ensures that a provider can't cause an object to be forgotten
from the state just because an update failed.
We've allowed the legacy SDK an opt-out from the post-apply safety checks,
but previously we produced only a generic warning message in that case.
Now instead we'll still run the safety checks, but report the results in
the logs instead of as error diagnostics.
This should allow developers who are debugging strange interactions
between buggy legacy providers to get better insight into what's going
on upstream in order to help explain what's going on when these problems
inevitably get caught by other downstream safety checks when trying to
make use of these invalid results.
The shim layer for the legacy SDK type system is not precise enough to
guarantee it will produce identical results between plan and apply. In
particular, values that are null during plan will often become zero-valued
during apply.
To avoid breaking those existing providers while still allowing us to
introduce this check in the future, we'll introduce a rather-hacky new
flag that allows the legacy SDK to signal that it is the legacy SDK and
thus disable the check.
Once we start phasing out the legacy SDK in favor of one that natively
understands our new type system, we can stop setting this flag and thus
get the additional safety of this check without breaking any
previously-released providers.
No other SDK is permitted to set this flag, and we will remove it if we
ever introduce protocol version 6 in future, assuming that any provider
supporting that protocol will always produce consistent results.
Previously we would allow providers to change anything about the planned
object value during apply, possibly returning an entirely-unrelated object
of the same type. In practice this led to some subtle bugs where a single
planned attribute value would change during apply and cause a downstream
failure due to a dependent resource now seeing input other than what
_it_ expected during plan.
Now we'll produce an explicit error message for this case which places the
blame with the correct party: the upstream resource that changed. Without
this, unexpected changes would often lead to the downstream resource
implementation being blamed in error message even though it was just
reacting to the change from upstream.
As with most errors during apply, we'll still save the updated value in
the state but we'll halt the walk to ensure that the unexpected value
cannot propagate further and cause the result to potentially diverge
greatly from the changeset shown in the plan.
Compared to Terraform 0.11, we expect to see this error in many of the
same cases we saw the "diffs didn't match during apply" error in earlier
versions, since it is likely that many errors of that sort were the result
of unexpected upstream changes being incorrectly blamed on the downstream
resource that then used the result.
In an ideal world, providers are supposed to respond to errors during
apply by returning a partial new state alongside the error diagnostics.
In practice though, our SDK leaves the new value set to nil for certain
errors, which was causing Terraform to "forget" the object altogether by
assuming that the provider intended to say "null".
We now adjust that assumption to apply only in the delete case. In all
other cases (including updates) we retain the prior state if the new
state is given as nil. Although we could potentially fix this in the SDK
itself, I expect this is a likely bug in other future SDKs for other
languages too, so this new assumption is a safer one to make to be
resilient to data loss when providers don't behave perfectly.
Providers that return both nil new value and no errors are considered
buggy, but unfortunately that applies to the mocks in many of our tests,
so for pragmatic reasons we can't generate an error for that case as we do
for other "should never happen" situations. Instead, we'll just retain the
prior value in the state so the user can retry.
Previously we were fetching these from the provider but then immediately
discarding the version numbers because the schema API had nowhere to put
them.
To avoid a late-breaking change to the internal structure of
terraform.ProviderSchema (which is constructed directly all over the
tests) we're retaining the resource type schemas in a new map alongside
the existing one with the same keys, rather than just switching to
using the providers.Schema struct directly there.
The methods that return resource type schemas now return two arguments,
intentionally creating a little API friction here so each new caller can
be reminded to think about whether they need to do something with the
schema version, though it can be ignored by many callers.
Since this was a breaking change to the Schemas API anyway, this also
fixes another API wart where there was a separate method for fetching
managed vs. data resource types and thus every caller ended up having a
switch statement on "mode". Now we just accept mode as an argument and
do the switch statement within the single SchemaForResourceType method.
Previously we used a single plan action "Replace" to represent both the
destroy-before-create and the create-before-destroy variants of replacing.
However, this forces the apply graph builder to jump through a lot of
hoops to figure out which nodes need it forced on and rebuild parts of
the graph to represent that.
If we instead decide between these two cases at plan time, the actual
determination of it is more straightforward because each resource is
represented by only one node in the plan graph, and then we can ensure
we put the right nodes in the graph during DiffTransformer and thus avoid
the logic for dealing with deposed instances being spread across various
different transformers and node types.
As a nice side-effect, this also allows us to show the difference between
destroy-then-create and create-then-destroy in the rendered diff in the
CLI, although this change doesn't fully implement that yet.
Previously our handling of create_before_destroy -- and of deposed objects
in particular -- was rather "implicit" and spread over various different
subsystems. We'd quietly just destroy every deposed object during a
destroy operation, without any user-visible plan to do so.
Here we make things more explicit by tracking each deposed object
individually by its pseudorandomly-allocated key. There are two different
mechanisms at play here, building on the same concepts:
- During a replace operation with create_before_destroy, we *pre-allocate*
a DeposedKey to use for the prior object in the "apply" node and then
pass that exact id to the destroy node, ensuring that we only destroy
the single object we planned to destroy. In the happy path here the
user never actually sees the allocated deposed key because we use it and
then immediately destroy it within the same operation. However, that
destroy may fail, which brings us to the second mechanism:
- If any deposed objects are already present in state during _plan_, we
insert a destroy change for them into the plan so that it's explicit to
the user that we are going to destroy these additional objects, and then
create an individual graph node for each one in DiffTransformer.
The main motivation here is to be more careful in how we handle these
destroys so that from a user's standpoint we never destroy something
without the user knowing about it ahead of time.
However, this new organization also hopefully makes the code itself a
little easier to follow because the connection between the create and
destroy steps of a Replace is reprseented in a single place (in
DiffTransformer) and deposed instances each have their own explicit graph
node rather than being secretly handled as part of the main instance-level
graph node.
We can be more relaxed about our rules that a create musn't return null
or a destroy must return null if the provider also itself indicated an
error. In that case, it's expected that the return value is describing a
partial result, and so we'll just store it and move on.
Although we have a special case where a result of the wrong type will bail
early, we must keep that set of diagnostics separate so that we can still
run to completion when there are _already_ diagnostics present (from the
provider's response) but the return value _is_ type-conforming.
This fix is verified by TestContext2Apply_provisionerCreateFail.
Our previous mechanism for dealing with tainting relied on directly
mutating the InstanceState object to mark it as such. In our new state
models we consider the instance objects to be immutable by convention, and
so we frequently copy them. As a result, the taint flagging was no longer
making it all the way through the apply evaluation process.
Here we now implement tainting as a separate step in the evaluation
process, creating a copy of the object with a tainted status if there were
any errors during creation.
This introduces a new behavior where any provider-level errors during
creation will also cause an instance to be marked as tainted if any object
is returned at all. Create-time errors _normally_ result in no object at
all, but the provider might return an object if the failure occurred at
a subsequent step of a multi-step creation process and so left behind a
remote object that needs to be cleaned up on a future run.
The error handling here is a bit tricky due to the ability for users to
opt out of aborting on error. It's important that we keep straight the
distinction between applyDiags and diags so we can tell the difference
between the errors from _this_ provisioner and the errors for the entire
run so far.
Previously we kept the dependencies one level higher on the resource
instance itself, which meant that updating it was handled in a different
EvalNode, but now we consider these to be dependencies of the object
itself (derived from the configuration that was current at the time it
was created), so we must handle this during EvalApply.
The subtle difference here is that if an object is moved to "deposed"
during a create_before_destroy replace then it will retain the
dependencies it had on its last apply, rather than them being replaced
by the dependencies of the newly-created object.
If the plan called for us to delete but the result isn't null then that's
suspect, because it suggests the object wasn't deleted after all.
Likewise, no other apply action should cause the the result to be missing.
In order to avoid the confusing user experience that results in this case
(since it often looks like Terraform did nothing at all) we'll produce
some errors about it, but still update the state to reflect what the
provider returned anyway to allow for debugging and recovery.
Incorrect pointer discipline here was causing the error to be lost rather
than returned as expected.
Additionally we'll include a log line in this case because otherwise an
apply error is reported so far from the actual apply operation that it
can be difficult to understand what happened.
This allows the provider to distinguish whether a particular value is set
in configuration or whether it's coming from prior state. It has no
particular purpose other than that.
The provider is allowed to return a partial result if it also includes
error diagnostics. Real providers still return at least a null value in
that case due to the RPC format, but test mocks are often more sloppy.
Chaange ResourceProvider to providers.Interface starting from the
context, and fix all type errors.
This only replaced some of method calls directly applicable to the
providers themselves. The resource methods will follow.
Due to how often the state and plan types are referenced throughout
Terraform, there isn't a great way to switch them out gradually. As a
consequence, this huge commit gets us from the old world to a _compilable_
new world, but still has a large number of known test failures due to
key functionality being stubbed out.
The stubs here are for anything that interacts with providers, since we
now need to do the follow-up work to similarly replace the old
terraform.ResourceProvider interface with its replacement in the new
"providers" package. That work, along with work to fix the remaining
failing tests, will follow in subsequent commits.
The aim here was to replace all references to terraform.State and its
downstream types with states.State, terraform.Plan with plans.Plan,
state.State with statemgr.State, and switch to the new implementations of
the state and plan file formats. However, due to the number of times those
types are used, this also ended up affecting numerous other parts of core
such as terraform.Hook, the backend.Backend interface, and most of the CLI
commands.
Just as with 5861dbf3fc49b19587a31816eb06f511ab861bb4 before, I apologize
in advance to the person who inevitably just found this huge commit while
spelunking through the commit history.
Previously we had the evaluate methods accept directly an
addrs.InstanceKey and had our evaluator infer a suitable value for
count.index for it, but that prevents us from setting the index to be
unknown in the validation scenario where we may not be able to predict
the number of instances yet but we still want to be able to check that
the configuration block is type-safe for all possible count values.
To achieve this, we separate the concern of deciding on a value for
count.index from the concern of evaluating it, which then allows for
other implementations of this in future. For the purpose of this commit
there is no change in behavior, with the count.index value being populated
whenever the instance key is a number.
This commit does a little more groundwork for the future implementation
of the for_each feature (which'll support each.key and each.value) but
still doesn't yet implement it, leaving it just stubbed out for the
moment.
Previously we would just retain an empty InstanceState in this case, but
now that we must enumerate all of the available instances during
expression evaluation it's important that we be able to recognize
instances that have been deleted.