For a while now we've had information equivalent to this in various internal documents that we've referred to when designing features such as config-driven refactoring, the "replace" planning option, and so forth. However, so far we've not put that information in any sort of durable public place that we can easily find and refer to when having design discussions on GitHub and similar. This is therefore an attempt to capture a summary of the three main design patterns we've identified for planning-related behaviors, with a few motivating examples of each one, in the hope that this will be a good reference and some helpful inspiration for future design work. It's intentionally not totally comprehensive of all planning behaviors both because that would duplicate the end-user-oriented documentation and because it would be burdensome to keep updating this document each time we add anything new which might fit into these categories. However, we might add a later feature to this document if it illustrates a new take or different perspective on one of these patterns.
Planning Behaviors
A key design tenet for Terraform is that any actions with externally-visible side-effects should be carried out via the standard process of creating a plan and then applying it. Any new features should typically fit within this model.
There are also some historical exceptions to this rule, which we hope to supplement with plan-and-apply-based equivalents over time.
This document describes the default planning behavior of Terraform in the absence of any special instructions, and also describes the three main design approaches we can choose from when modelling non-default behaviors that require additional information from outside of Terraform Core.
This document focuses primarily on actions relating to resource instances, because that is Terraform's main concern. However, these design principles can potentially generalize to other externally-visible objects, if we can describe their behaviors in a way comparable to the resource instance behaviors.
This is developer-oriented documentation rather than user-oriented documentation. See the main Terraform documentation for information on existing planning behaviors and other behaviors as viewed from an end-user perspective.
Default Planning Behavior
When given no explicit information to the contrary, Terraform Core will automatically propose taking the following actions in the appropriate situations:
- Create, if either of the following is true:
  - There is a resource block in the configuration that has no corresponding managed resource in the prior state.
  - There is a resource block in the configuration that is recorded in the prior state but whose count or for_each argument (or lack thereof) describes an instance key that is not tracked in the prior state.
- Delete, if either of the following is true:
  - There is a managed resource tracked in the prior state which has no corresponding resource block in the configuration.
  - There is a managed resource tracked in the prior state which has a corresponding resource block in the configuration, but whose count or for_each argument (or lack thereof) does not describe an instance key that is tracked in the prior state.
- Update, if there is a corresponding resource instance both declared in the configuration (in a resource block) and recorded in the prior state (unless it's marked as "tainted"), but there are differences between the prior state and the configuration which the corresponding provider doesn't explicitly classify as just being normalization.
- Replace, if there is a corresponding resource instance both declared in the configuration (in a resource block) and recorded in the prior state marked as "tainted". The special "tainted" status means that the process of creating the object failed partway through and so the existing object does not necessarily match the configuration, so Terraform plans to replace it in order to ensure that the resulting object is complete.
- Read, if there is a data block in the configuration.
  - If possible, Terraform will eagerly perform this action during the planning phase, rather than waiting until the apply phase.
  - If the configuration contains at least one unknown value, or if the data resource directly depends on a managed resource that has any change proposed elsewhere in the plan, Terraform will instead delay this action to the apply phase so that it can react to the completion of modification actions on other objects.
- No-op, to explicitly represent that Terraform considered a particular resource instance but concluded that no action was required.
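The rules above can be condensed into a small decision function. This is a hypothetical Python sketch for illustration only, not Terraform Core's actual implementation (which is written in Go): the names Action, default_managed_action, and data_read_phase are invented here, and the boolean inputs stand in for the comparisons Terraform Core really performs against the configuration, the prior state, and the provider's planning response.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical stand-ins for the action names used in this document."""
    CREATE = "Create"
    DELETE = "Delete"
    UPDATE = "Update"
    REPLACE = "Replace"
    NO_OP = "No-op"


def default_managed_action(in_config: bool, in_state: bool,
                           tainted: bool = False,
                           has_real_diff: bool = False) -> Action:
    """Pick the default action for one managed resource instance key.

    in_config:     the resource block's count/for_each (or lack thereof)
                   declares this instance key.
    in_state:      the prior state tracks an object at this instance key.
    tainted:       that tracked object is marked as "tainted".
    has_real_diff: the provider reported differences that are more than
                   just normalization.
    """
    if in_config and not in_state:
        return Action.CREATE
    if in_state and not in_config:
        return Action.DELETE
    if not in_config and not in_state:
        raise ValueError("instance is neither declared nor tracked")
    # From here on, the instance is both declared and tracked.
    if tainted:
        return Action.REPLACE
    if has_real_diff:
        return Action.UPDATE
    return Action.NO_OP


def data_read_phase(config_has_unknowns: bool,
                    depends_on_pending_change: bool) -> str:
    """Return the phase in which a data resource's Read action happens."""
    if config_has_unknowns or depends_on_pending_change:
        return "apply"  # deferred so it can observe other objects' changes
    return "plan"  # eager read during planning
```

Note that the tainted check only runs once the instance is known to be declared in the configuration; a tainted object whose instance key is no longer declared falls into the Delete case, as in the list above.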
The Replace action described above is really a sort of "meta-action", which Terraform expands into separate Create and Delete operations. There are two possible orderings, and the first one is the default planning behavior unless overridden by a special planning behavior as described later. The two possible lowerings of Replace are:
- Delete then Create: first delete the existing object bound to an instance, and then create a new object at the same address based on the current configuration.
- Create then Delete: mark the existing object bound to an instance as "deposed" (still exists but not current), create a new current object at the same address based on the current configuration, and then delete the deposed object.
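As a rough illustration, the two lowerings of Replace can be written as ordered step lists. This is a hypothetical Python sketch; the function name lower_replace and the step strings are invented for clarity, and the create_before_destroy parameter simply selects the non-default ordering (it mirrors the lifecycle argument of the same name discussed later in this document).

```python
from typing import List


def lower_replace(create_before_destroy: bool = False) -> List[str]:
    """Expand the Replace meta-action for one instance into ordered steps."""
    if create_before_destroy:
        # Create then Delete: keep the old object around as "deposed"
        # until its replacement exists.
        return [
            "mark current object as deposed",
            "create new current object",
            "delete deposed object",
        ]
    # Delete then Create: the default ordering.
    return [
        "delete current object",
        "create new current object",
    ]
```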
Special Planning Behaviors
For the sake of this document, a "special" planning behavior is one where Terraform Core will select a different action than the defaults above, based on explicit instructions given either by a module author, an operator, or a provider.
There are broadly three different design patterns for special planning behaviors, and so each "special" use-case will typically be met by one or more of the following depending on which stakeholder is activating the behavior:
- Configuration-driven Behaviors are activated by additional annotations given in the source code of a module.

  This design pattern is good for situations where the behavior relates to a particular module and so should be activated for anyone using that module. These behaviors are therefore specified by the module author, such that any caller of the module will automatically benefit with no additional work.

- Provider-driven Behaviors are activated by optional fields in a provider's response when asked to help plan one of the default actions given above.

  This design pattern is good for situations where the behavior relates to the remote system that a provider is wrapping, and so from the perspective of a user of the provider the behavior should appear "automatic".

  Because these special behaviors are activated by values in the provider's response to the planning request from Terraform Core, behaviors of this sort will typically represent "tweaks" to or variants of the default planning behaviors, rather than entirely different behaviors.

- Single-run Behaviors are activated by explicitly setting additional "plan options" when calling Terraform Core's plan operation.

  This design pattern is good for situations where the direct operator of Terraform needs to do something exceptional or one-off, such as when the configuration is correct but the real system has become degraded or damaged in a way that Terraform cannot automatically understand.

  However, this design pattern has the disadvantage that each new single-run behavior type requires custom work in every wrapping UI or automaton around Terraform Core, in order to provide the user of that wrapper some way to directly activate the special option, or to offer an "escape hatch" to use Terraform CLI directly and bypass the wrapping automation for a particular change.
We've also encountered use-cases that seem to call for a hybrid between these different patterns. For example, a configuration construct might cause Terraform Core to invite a provider to activate a special behavior, but let the provider make the final call about whether to do it. Or conversely, a provider might advertise the possibility of a special behavior but require the user to specify something in the configuration to activate it. The above are just broad categories to help us think through potential designs; some problems will require more creative combinations of these patterns than others.
Configuration-driven Behaviors
Within the space of configuration-driven behaviors, we've encountered two main sub-categories:
- Resource-specific behaviors, whose effect is scoped to a particular resource. The configuration for these often lives inside the resource or data block that declares the resource.
- Global behaviors, whose effect can span across more than one resource and sometimes between resources in different modules. The configuration for these often lives in a separate location in a module, such as a separate top-level block which refers to other resources using the typical address syntax.
The following is a non-exhaustive list of existing examples of configuration-driven behaviors, selected to illustrate some different variations that might be useful inspiration for new designs:
- The ignore_changes argument inside the lifecycle block of a resource block tells Terraform that if there is an existing object bound to a particular resource instance address then Terraform should ignore the configured value for a particular argument and use the corresponding value from the prior state instead.

  This can therefore potentially cause what would've been an Update to be a No-op instead.

- The replace_triggered_by argument inside the lifecycle block of a resource block can use a proposed change elsewhere in a module to force Terraform to propose one of the two Replace variants for a particular resource.

- The create_before_destroy argument inside the lifecycle block of a resource block only takes effect if a particular resource instance has a proposed Replace action. If not set or set to false, Terraform will decompose it to Delete then Create, but if set to true Terraform will use the inverted ordering.

  Because Terraform Core will never select a Replace action automatically by itself, this is an example of a hybrid design where the config-driven create_before_destroy combines with any other behavior (config-driven or otherwise) that might cause Replace to customize exactly what that Replace will mean.

- Top-level moved blocks in a module activate a special behavior during the planning phase, where Terraform will first try to change the bindings of existing objects in the prior state to attach to new addresses before running the normal planning process. This therefore allows a module author to document certain kinds of refactoring so that Terraform can update the state automatically once users upgrade to a new version of the module.

  This special behavior is interesting because it doesn't directly change what actions Terraform will propose, but instead it adds an extra preparation step before the typical planning process which changes the addresses that the planning process will consider. It can therefore indirectly cause different proposed actions for affected resource instances, such as transforming what by default might've been a Delete of one instance and a Create of another into just a No-op or Update of the second instance.

  This one is an example of a "global behavior", because at minimum it affects two resource instance addresses and, if working with whole resource or whole module addresses, can potentially affect a large number of resource instances all at once.
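Two of these behaviors, one resource-specific and one global, can be sketched as small transformations applied before the default decision rules run. This is a hypothetical Python sketch: apply_ignore_changes, apply_moved, the (from_address, to_address) pair representation, and the example_thing addresses are all invented for illustration, and the collision handling in apply_moved is an assumption of this sketch rather than a description of Terraform's real refactoring logic.

```python
from typing import Any, Dict, List, Optional, Set, Tuple


def apply_ignore_changes(config: Dict[str, Any],
                         prior: Optional[Dict[str, Any]],
                         ignore_changes: Set[str]) -> Dict[str, Any]:
    """Return the configuration values to plan with for one instance.

    ignore_changes only matters when an object already exists, so with no
    prior object the configuration is used unchanged.
    """
    if prior is None:
        return dict(config)
    effective = dict(config)
    for name in ignore_changes:
        if name in prior:
            effective[name] = prior[name]  # keep the prior-state value
    return effective


def apply_moved(state: Dict[str, Any],
                moves: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Rebind prior-state objects to new addresses before normal planning.

    Each move is a hypothetical (from_address, to_address) pair. This sketch
    assumes a move is skipped when there is nothing at the old address or
    the new address is already occupied.
    """
    new_state = dict(state)
    for old, new in moves:
        if old in new_state and new not in new_state:
            new_state[new] = new_state.pop(old)
    return new_state
```

For example, after apply_moved rebinds the object at example_thing.a to example_thing.b, a configuration that declares only example_thing.b yields No-op or Update for that instance under the default rules, rather than Delete of example_thing.a plus Create of example_thing.b.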
Provider-driven Behaviors
Providers get an opportunity to activate some special behaviors for a particular resource instance when they respond to the PlanResourceChange function of the provider plugin protocol.
When Terraform Core executes this RPC, it has already selected between Create, Delete, or Update actions for the particular resource instance, and so the special behaviors a provider may activate will typically serve as modifiers or tweaks to that base action, and will not allow the provider to select another base action altogether. The provider wire protocol does not talk about the action types explicitly, and instead only implies them via other content of the request and response, with Terraform Core making the final decision about how to react to that information.
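To make the "implied" part concrete, here is a hypothetical Python sketch of how a base action could be read off the shape of a planning request. It assumes, for illustration, that a null prior state signals a Create and a null proposed new state signals a Delete; the function name implied_base_action is invented and does not correspond to real Terraform Core or SDK code, and None stands in for a null object value on the wire.

```python
from typing import Any, Optional


def implied_base_action(prior_state: Optional[Any],
                        proposed_new_state: Optional[Any]) -> str:
    """Infer the base action implied by which objects in a request are null."""
    if prior_state is None and proposed_new_state is None:
        raise ValueError("no object before or after; nothing to plan")
    if prior_state is None:
        return "Create"  # nothing exists yet
    if proposed_new_state is None:
        return "Delete"  # nothing should exist afterwards
    return "Update"  # both exist; the provider's response may refine this
```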
The following is a non-exhaustive list of existing examples of provider-driven behaviors, selected to illustrate some different variations that might be useful inspiration for new designs:
- When the base action is Update, a provider may optionally return one or more paths to attributes which have changes that the provider cannot implement as an in-place update due to limitations of the remote system.

  In that case, Terraform Core will replace the Update action with one of the two Replace variants, which means that from the provider's perspective the apply phase will really be two separate calls for the decomposed Create and Delete actions (in either order), rather than Update directly.

- When the base action is Update, a provider may optionally return a proposed new object where one or more of the arguments has its value set to what was in the prior state rather than what was set in the configuration. This represents any situation where a remote system supports multiple different serializations of the same value that are all equivalent, and so changing from one to another doesn't represent a real change in the remote system.

  If, taken together, those substitutions cause the new object to match the prior state, Terraform Core will treat the update as a No-op instead.
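Both examples can be illustrated with a toy provider and a matching reaction on the Terraform Core side. This is a hypothetical Python sketch: the resource type, its "policy" and "region" arguments, and the functions toy_plan_update and final_action are invented; the returned requires_replace list loosely mirrors the list of attribute paths a provider can return in its PlanResourceChange response, simplified here to plain attribute names.

```python
import json
from typing import Any, Dict, List, Tuple


def toy_plan_update(prior: Dict[str, Any],
                    proposed: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """A hypothetical provider's planning response for a base Update.

    Returns (planned_state, requires_replace) for an imaginary resource type
    with a JSON "policy" argument and a "region" that cannot change in place.
    """
    planned = dict(proposed)
    requires_replace: List[str] = []
    # Normalization: an equivalent JSON serialization keeps the prior value.
    if json.loads(prior["policy"]) == json.loads(proposed["policy"]):
        planned["policy"] = prior["policy"]
    # The imaginary remote system can't move an object between regions.
    if prior["region"] != proposed["region"]:
        requires_replace.append("region")
    return planned, requires_replace


def final_action(prior: Dict[str, Any], planned: Dict[str, Any],
                 requires_replace: List[str]) -> str:
    """How Terraform Core might react to the provider's response."""
    if requires_replace:
        return "Replace"
    if planned == prior:
        return "No-op"
    return "Update"
```

With a prior policy of '{"a": 1, "b": 2}', a configured '{"b":2,"a":1}' is planned back to the prior serialization, so the planned object equals the prior state and the result is a No-op; changing region instead produces a Replace.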
Of the three genres of special behaviors, provider-driven behaviors are the one we've made the least use of historically, but they seem to offer a lot of opportunities for future exploration. Provider-driven behaviors can often be ideal because their effects appear as if they are built into Terraform, so that "it just works": Terraform automatically decides and explains what needs to happen and why, without any special effort on the user's part.
Single-run Behaviors
Terraform Core's "plan" operation takes a set of arguments that we collectively call "plan options", that can modify Terraform's planning behavior on a per-run basis without any configuration changes or special provider behaviors.
As noted above, this particular genre of designs is the most burdensome to implement because any wrapping software that can ask Terraform Core to create a plan must ideally offer some way to set all of the available planning options, or else some part of Terraform's functionality won't be available to anyone using that wrapper.
However, we've seen various situations where single-run behaviors really are the most appropriate way to handle a particular use-case, because the need for the behavior originates in some process happening outside of the scope of any particular Terraform module or provider.
The following is a non-exhaustive list of existing examples of single-run behaviors, selected to illustrate some different variations that might be useful inspiration for new designs:
- The "replace" planning option specifies zero or more resource instance addresses.

  For any resource instance specified, Terraform Core will transform any Update or No-op action for that instance into one of the Replace actions. This allows an operator who has noticed that an object has become degraded, in a way that Terraform and providers cannot automatically detect, to force Terraform to replace that object with a new one that will hopefully function correctly.

- The "refresh only" planning mode ("planning mode" is a single planning option that selects between a few mutually-exclusive behaviors) forces Terraform to treat every resource instance as No-op, regardless of what is bound to that address in state or present in the configuration.
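These two plan options can be sketched as a final adjustment over the default actions. This is a hypothetical Python sketch: PlanMode, apply_plan_options, and the force_replace parameter are invented names, the sketch models only two of the mutually-exclusive planning modes, and it operates on a simple address-to-action mapping rather than on Terraform Core's real plan structures.

```python
from enum import Enum
from typing import Dict, Iterable


class PlanMode(Enum):
    """Two of the mutually-exclusive planning modes (others omitted)."""
    NORMAL = "normal"
    REFRESH_ONLY = "refresh-only"


def apply_plan_options(default_actions: Dict[str, str],
                       mode: PlanMode = PlanMode.NORMAL,
                       force_replace: Iterable[str] = ()) -> Dict[str, str]:
    """Adjust per-address default actions using single-run plan options."""
    if mode is PlanMode.REFRESH_ONLY:
        # Every resource instance is treated as No-op in this mode.
        return {addr: "No-op" for addr in default_actions}
    targets = set(force_replace)
    adjusted = {}
    for addr, action in default_actions.items():
        if addr in targets and action in ("Update", "No-op"):
            adjusted[addr] = "Replace"  # operator-requested replacement
        else:
            adjusted[addr] = action  # e.g. a Create stays a Create
    return adjusted
```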
Legacy Operations
Some of the legacy operations Terraform CLI offers that aren't integrated with the plan and apply flow could be thought of as various degenerate kinds of single-run behaviors. Most don't offer any opportunity to preview an effect before applying it, but do meet a similar set of use-cases where an operator needs to take some action to respond to changes to the context Terraform is in rather than to the Terraform configuration itself.
Most of these legacy operations could therefore most readily be translated to single-run behaviors, but before doing so it's worth researching whether people are using them as a workaround for missing configuration-driven and/or provider-driven behaviors. A particular legacy operation might be better replaced with a different sort of special behavior, or potentially by multiple different special behaviors of different genres if it's currently serving as a workaround for many different unmet needs.