There are several steps here and a number of them can include reaching out
to remote servers or executing local processes, so it's helpful to have
some trace logs to better narrow down causes of errors and hangs during
this step.
When we're being asked to destroy everything, we ideally want to end up
with a totally empty state. Normally we will conservatively keep around
the "husks" of resources (what's left after all of the instances have been
destroyed) unless they are configured without count or for_each, but in
this special case we'll prune those out.
The implication of this is that in "weird" expression contexts that happen
before the next "terraform plan", such as evaluation in
"terraform console" or expressions in data resources and provider blocks
that get evaluated during the refresh walk, we will see these results
as unknown rather than as empty lists of objects. We accept that weirdness
for now because in a future release we are likely to remove "refresh" as
a separate walk anyway, doing all of that work during the plan walk where
we can ensure that these values are properly re-populated before trying
to use them.
Since stopping is a rather complex mechanism that relies on correct
handling of concurrency, it's handy to have these logs here to debug when
things don't happen in quite the right order.
Prior to our refactoring here, we were relying on a lucky coincidence for
correct behavior of the plan walk following a refresh in the same run:
- The refresh phase created placeholder objects in the state to represent
any resource instance pending creation, to allow the interpolator to
read attributes from them when evaluating "provider" and "data" blocks.
In effect, the refresh walk is creating a partial plan that only covers
creation actions, but was immediately discarding the actual diff entries
and storing only the planned new state.
- It happened that objects pending creation showed up in state with an
empty ID value, since that only gets assigned by the provider during
apply.
- The Refresh function concluded by calling terraform.State.Prune, which
deletes from the state any objects that have an empty ID value, which
therefore prevented these temporary objects from surviving into the
plan phase.
After refactoring, we no longer have this special ID field on instance
object state, and we instead rely on the Status field for tracking such
things. We also no longer have an explicit "prune" step on state, since
the state mutation methods themselves keep the structure pruned.
To address this, here we introduce a new instance object status "planned",
which is equivalent to having an empty ID value in the old world. We also
introduce a new method on states.SyncState that deletes from the state
any planned objects, which therefore replaces that portion of the old
State.prune operation just for this refresh use-case.
Finally, we are now expecting the expression evaluator to pull pending
objects from the planned changeset rather than from the state directly,
and so for correct results these placeholder resource creation changes
must also be reported in a throwaway changeset during the refresh walk.
The addition of states.ObjectPlanned also permits a previously-missing
safety check in the expression evaluator to prevent us from relying on the
incomplete value stored in state for a pending object, in the event that
some bug prevents the real pending object from being written into the
planned changeset.
This is a light adaptation of our earlier prototype of structural diff
rendering, as a starting point for what we'll actually ship. This is not
consistent with the latest mocks, so will need some additional work before
it is ready, but integrating this allows us to at least see the plan
contents while fixing up remaining issues elsewhere.
Chaange ResourceProvider to providers.Interface starting from the
context, and fix all type errors.
This only replaced some of method calls directly applicable to the
providers themselves. The resource methods will follow.
Due to how often the state and plan types are referenced throughout
Terraform, there isn't a great way to switch them out gradually. As a
consequence, this huge commit gets us from the old world to a _compilable_
new world, but still has a large number of known test failures due to
key functionality being stubbed out.
The stubs here are for anything that interacts with providers, since we
now need to do the follow-up work to similarly replace the old
terraform.ResourceProvider interface with its replacement in the new
"providers" package. That work, along with work to fix the remaining
failing tests, will follow in subsequent commits.
The aim here was to replace all references to terraform.State and its
downstream types with states.State, terraform.Plan with plans.Plan,
state.State with statemgr.State, and switch to the new implementations of
the state and plan file formats. However, due to the number of times those
types are used, this also ended up affecting numerous other parts of core
such as terraform.Hook, the backend.Backend interface, and most of the CLI
commands.
Just as with 5861dbf3fc49b19587a31816eb06f511ab861bb4 before, I apologize
in advance to the person who inevitably just found this huge commit while
spelunking through the commit history.
Previously we had the evaluate methods accept directly an
addrs.InstanceKey and had our evaluator infer a suitable value for
count.index for it, but that prevents us from setting the index to be
unknown in the validation scenario where we may not be able to predict
the number of instances yet but we still want to be able to check that
the configuration block is type-safe for all possible count values.
To achieve this, we separate the concern of deciding on a value for
count.index from the concern of evaluating it, which then allows for
other implementations of this in future. For the purpose of this commit
there is no change in behavior, with the count.index value being populated
whenever the instance key is a number.
This commit does a little more groundwork for the future implementation
of the for_each feature (which'll support each.key and each.value) but
still doesn't yet implement it, leaving it just stubbed out for the
moment.
Provider input is now longer handled with a graph walk, so the code
related to the input graph and walk are no longer needed.
For now the Input method is retained on the ResourceProvider interface,
but it will never be called. Subsequent work to revamp the provider API
will remove this method.
Now that core has access to the provider configuration schema, our input
logic can be implemented entirely within Context.Input, removing the need
to execute a full graph walk to gather input.
This commit replaces the graph walk call with instead just visiting the
provider configurations (explicit and implied) in the root module, using
the schema to prompt.
The code to manage the input graph walk is not yet removed by this commit,
and will be cleaned up in a subsequent commit once we've made sure there
aren't any other callers/tests depending on parts of it.
Previously we fetched schemas during the AttachSchemaTransformer,
potentially multiple times as that was re-run for each graph built. Now
we fetch the schemas just once during context construction, passing that
result into each of the graph builders.
This only addresses the schema accesses during graph construction. We're
still separately loading schemas during the main walk for evaluation
purposes. This will be addressed in a later commit.
These transformers both construct temporary graphs using many of the same
transformers used in the apply graph, and properly doing this now requires
access to the providers and provisioners in order to obtain their schemas.
Along with this, we also update the tests here to use the
simpleMockComponentFactory helper to get a mock provider with a schema
already configured, which means we also need to update the test fixtures
and assertions to use the resource type and attributes defined in that
mock factory.
Some of the objects that are referencable from expressions are transient
values computed only during a graph walk, and not persisted in state. In
order to support arbitrary evaluation of expressions, such as in the
"terraform console" CLI command, it's necessary to be able to evaluate
these values before we start evaluating.
This new Eval method achieves this by performing a special graph walk that
ignores resources (except for dependency resolution) and just focuses on
evaluating all of these transient values, before returning an evaluation
scope that can then resolve expressions in terms of that result.
This replaces the Context.Interpolator method, which was fraught with
various issues due to it not properly priming the state before evaluating.
I took some missteps here while doing the initial refactor for HCL2 types.
This restores the map of maps that retains all of the variable values, and
then makes it available to the evaluator.
This is a little awkward since we need to instantiate the providers much
earlier than before. To avoid a lot of reshuffling here we just spin each
one up and then immediately shut it down again, letting our existing init
functionality during the graph walk still do the main initialization.
Due to how deeply the configuration types go into Terraform Core, there
isn't a great way to switch out to HCL2 gradually. As a consequence, this
huge commit gets us from the old state to a _compilable_ new state, but
does not yet attempt to fix any tests and has a number of known missing
parts and bugs. We will continue to iterate on this in forthcoming
commits, heading back towards passing tests and making Terraform
fully-functional again.
The three main goals here are:
- Use the configuration models from the "configs" package instead of the
older models in the "config" package, which is now deprecated and
preserved only to help us write our migration tool.
- Do expression inspection and evaluation using the functionality of the
new "lang" package, instead of the Interpolator type and related
functionality in the main "terraform" package.
- Represent addresses of various objects using types in the addrs package,
rather than hand-constructed strings. This is not critical to support
the above, but was a big help during the implementation of these other
points since it made it much more explicit what kind of address is
expected in each context.
Since our new packages are built to accommodate some future planned
features that are not yet implemented (e.g. the "for_each" argument on
resources, "count"/"for_each" on modules), and since there's still a fair
amount of functionality still using old-style APIs, there is a moderate
amount of shimming here to connect new assumptions with old, hopefully in
a way that makes it easier to find and eliminate these shims later.
I apologize in advance to the person who inevitably just found this huge
commit while spelunking through the commit history.
The init command needs to parse the state to resolve providers, but
changes to the state format can cause that to fail with difficult to
understand errors. Check the terraform version during init and provide
the same error that would be returned by plan or apply.
Validation is the best time to return detailed diagnostics
to the user since we're much more likely to have source
location information, etc than we are in later operations.
This change doesn't actually add any detail to the messages
yet, but it changes the interface so that we can gradually
introduce more detailed diagnostics over time.
While here there are some minor adjustments to some of the
messages to improve their consistency with terminology we
use elsewhere.
Update all references to the version values to use the new package.
The VersionString function was left in the terraform package
specifically for the aws provider, which is vendored. We can remove that
last call once the provider is updated.
When working on an existing plan, the context always used walkApply,
even if the plan was for a full destroy. Mark in the plan if it was
icreated for a destroy, and transfer that to the context when reading
the plan.
Previously we were checking required_version only during "real" operations, and not during initialization. Catching it during init is better because that's the first command users run on a new working directory.
The shadow graph was incredibly useful during the 0.7 cycle but these days
it is idle, since we're not planning any significant graph-related changes
for the forseeable future.
The shadow graph infrastructure is somewhat burdensome since any change
to the ResourceProvider interface must have shims written. Since we _are_
expecting changes to the ResourceProvider interface in the next few
releases, I'm calling "YAGNI" on the shadow graph support to reduce our
maintenence burden.
If we do end up wanting to use shadow graph again in future, we'll always
be able to pull it out of version control and then make whatever changes
we skipped making in the mean time, but we can avoid that cost in the
mean time while we don't have any evidence that we'll need to pay it.
Skips checksum validation if the `TF_SKIP_PROVIDER_VERIFY` environment variable is set. Undocumented variable, as the primary goal is to significantly improve the local provider development workflow.
The information stored in a plan is tightly coupled to the Terraform core
and provider plugins that were used to create it, since we have no
mechanism to "upgrade" a plan to reflect schema changes and so mismatching
versions are likely to lead to the "diffs didn't match during apply"
error.
To allow us to catch this early and return an error message that _doesn't_
say it's a bug in Terraform, we'll remember the Terraform version and
plugin binaries that created a particular plan and then require that
those match when loading the plan in order to apply it.
The planFormatVersion is increased here so that plan files produced by
earlier Terraform versions _without_ this information won't be accepted
by this new version, and also that older versions won't try to process
plans created by newer versions.
When set, this information gets passed on to the provider resolver as
part of the requirements information, causing us to reject any plugins
that do not match during initialization.
Previously the set of providers was fixed early on in the command package
processing. In order to be version-aware we need to defer this work until
later, so this interface exists so we can hold on to the possibly-many
versions of plugins we have available and then later, once we've finished
determining the provider dependencies, select the appropriate version of
each provider to produce the final set of providers to use.
This commit establishes the use of this new mechanism, and thus populates
the provider factory map with only the providers that result from the
dependency resolution process.
This disables support for internal provider plugins, though the
mechanisms for building and launching these are still here vestigially,
to be cleaned up in a subsequent commit.
This also adds a new awkward quirk to the "terraform import" workflow
where one can't import a resource from a provider that isn't already
mentioned (implicitly or explicitly) in config. We will do some UX work
in subsequent commits to make this behavior better.
This breaks many tests due to the change in interface, but to keep this
particular diff reasonably easy to read the test fixes are split into
a separate commit.
The documentation for Refresh indicates that it will always return a
valid state, but that wasn't true in the case of a graph builder error.
While this same concept wasn't documented for Apply, it was still
assumed in the terraform apply code.
Since the helper testing framework relies on the absence of a state to
determine if it can call Destroy, the Context can't can't start
returning a state in all cases. Document this, and use the State method
to fetch the correct state value after Apply.
Add a nil check to the WriteState function, so that writing a nil state
is a noop.
Make sure to init before sorting the state, to make sure we're not
attempting to sort nil values. This isn't technically needed with the
current code, but it's just safer in general.
Always wait for watchStop to return during context.walk.
Context.walk would often complete immediately after sending the close
signal to watchStop, which would in turn call the deferred releaseRun
cancelling the runContext.
Without any synchronization points after the select statement in
watchStop, that goroutine was not guaranteed to be scheduled
immediately, and in fact it often didn't continue until after the
runContext was canceled. This in turn left the select statement with
multiple successful cases, and half the time it would chose to Stop the
providers.
Stopping the providers after the walk of course didn't cause any
immediate failures, but if there was another walk performed, the
provider StopContext would no longer be valid and could cause
cancellation errors in the provider.
Fixes#10911
Outputs that aren't targeted shouldn't be included in the graph.
This requires passing targets to the apply graph. This is unfortunate
but long term should be removable since I'd like to move output changes
to the diff as well.
To avoid chasing down issues like #11635 I'm proposing we disable the
shadow graph for end users now that we have merged in all the new
graphs. I've kept it around and default-on for tests so that we can use
it to test new features as we build them. I think it'll still have value
going forward but I don't want to hold us for making it work 100% with
all of Terraform at all times.
I propose backporting this to 0-8-stable, too.
This switches to the Go "context" package for cancellation and threads
the context through all the way to evaluation to allow behavior based on
stopping deep within graph execution.
This also adds the Stop API to provisioners so they can quickly exit
when stop is called.