mirror of https://github.com/opentofu/opentofu.git synced 2025-02-25 18:45:20 -06:00

Martin Atkins 954e3aed01 rfc: Static Evaluation of Provider Iteration state tracking revision

The original proposal called for the state snapshot loader to accept a
resource instance with both an instance-level provider instance address
and a resource-level provider instance address.

The final implementation does follow that specification, but it also emits
a warning in that case to draw attention to the inconsistency.

Signed-off-by: Martin Atkins <mart@degeneration.co.uk>

2024-12-11 12:52:50 -08:00

56 KiB

Raw Permalink Blame History

Static Evaluation of Provider Iteration

Issue: https://github.com/opentofu/opentofu/issues/300

Since the introduction of for_each/count, users have been trying to use each/count in provider configurations and resource/module mappings. Providers are a special case throughout OpenTofu and interacting with them either as a user or developer requires significant care.

Note

Please read Provider References before diving into this section! This document uses the same terminology and formatting.

Proposed Solution

The approach proposed in the Static Evaluation RFC can be extended to support provider for_each/count with some clever code and lots of testing. It is assumed that the reader has gone through the Static Evaluation RFC thoroughly before continuing here.

User Documentation

Provider Configuration Expansion

How are multiple configured providers of the same type used today?

provider "aws" {
  alias = "us"
  region = "us-east-1"
}
// Copy pasted from above provider, with minor changes
provider "aws" {
  alias = "eu"
  region = "eu-west-1"
}

resource "aws_s3_bucket" "primary_us" {
  provider = aws.us
}
// Copy pasted from above resource, with minor changes
resource "aws_s3_bucket" "primary_eu" {
  provider = aws.eu
}

module "mod_us" {
  source = "./mod"
  providers {
    aws = aws.us
  }
}
// Copy pasted from above module, with minor changes
module "mod_eu" {
  source = "./mod"
  providers {
    aws = aws.eu
  }
}

For scenarios where you wish to use the same pattern with multiple providers, copy-paste or multiple workspaces is the only path available. Any copy-paste like this can easily introduce errors and bugs and should be avoided. For this reason, users have been asking to use for_each and count in providers for a long time.

Let's run through what it would look like to enable this workflow.

What is expected when a user adds for_each/count to a provider configuration:

locals {
  regions = {"us": "us-east-1", "eu": "eu-west-1"}
}

provider "aws" {
  alias    = "by_region"
  for_each = local.regions

  region = each.value
}

At first glance, this looks fairly straightforward. Following the rules in place with resources, we would expect aws.by_region["us"] and aws.by_region["eu"] to be valid.

What happens if you use for_each without alias? That would presumably cause reference addresses like aws["us"] and aws["eu"] which, based on rules elsewhere in the language, end-users would likely assume means the same thing as aws.us or aws.eu, but that would conflict with a provider configuration that explicitly sets alias = "us".

Therefore we retain the current assumption that a default provider configuration (one without "alias") is always a singleton, and so for_each can only be used when alias is also set and the instance keys then appear as an extra dynamic index segment at the end of the provider reference syntax. The concept of "default provider configuration" exists in OpenTofu to allow for automatic selection of the provider for a resource in simple cases, and those automatic behaviors rely on the default provider configuration being a singleton.

In the longer term we might allow fully-dynamic expansion and references similar to the existing for_each support for resources and module calls, but in prototyping so far we've learned that requires quite significant and risky changes to the core language runtime. This RFC therefore proposes an intermediate step which relies on the ideas set forth in the Static Evaluation RFC, which means that the for_each argument in a provider block will initially allow only values known during early evaluation: input variables and local values that are derived from them. Anything dynamic, like a reference to a data resource, remains forbidden for now.

With the provider references clarified, we can now use the providers defined above in resources:

resource "aws_s3_bucket" "primary" {
  for_each = local.regions
  provider = aws.by_region["us"] # Extends the existing reference format with an instance key
}

locals {
  region = "eu"
}

module "mod" {
  source = "./mod"
  providers = {
    # The new instance key segment can use arbitrary expressions in the index brackets.
    aws = aws.by_region[local.region]
  }
}

The provider argument for resource/data blocks and the providers argument for module blocks retains much of the rigid static structure previously required, to ensure that it remains possible to statically recognize the relationships between resource configuration blocks and provider configuration blocks since that would be required for a later fully-dynamic version of this feature implemented in the main language runtime: it must be able to construct the dependency graph before evaluating any expressions.

However, the new "index" portion delimited by square brackets is, from a syntax perspective, allowed to contain any valid HCL expression. The only constraints on that expression are on what it is derived from and on its result: it must be derived only from values known during planning, its result must be a value that can convert to string, and the string after conversion must match one of the instance keys of the indicated provider configuration.

Note that in particular this design forces all of the instances of a resource to refer to instances of the same provider configuration, although they are each allowed to refer to a different instance. Again, this is a forward-looking constraint to ensure that it remains possible to build a dynamic evaluation dependency graph where each resource/data block depends on the appropriate provider block before dynamic expression evaluation is possible, even though that isn't strictly required for the static-eval-only implementation.

Provider Alias Mappings

Now that we can reference providers via variables, how should this interact with for_each / count in resources and modules?

locals {
  regions = {"us": "us-east-1", "eu": "eu-west-1"}
}

provider "aws" {
  alias    = "by_region"
  for_each = local.regions

  region = each.value
}

resource "aws_s3_bucket" "primary" {
  for_each = local.regions
  provider = aws.by_region[each.key]

  # ...
}

module "mod" {
  source   = "./mod"
  for_each = local.regions
  providers = {
    aws = aws.by_region[each.key]
  }

  # ...
}

This use of each.key is familiar from its existing use in resource, data, and module blocks that use for_each. The bracketed portion of the address is evaluated dynamically by the main language runtime, and so (unlike the for_each argument in provider blocks) this particular expression is not constrained to only static-eval-compatible symbols.

Multi-instance provider configurations in tests

The test scenario language (.tftest.hcl/etc files) includes three features that affect the treatment of provider configurations:

test-scenario-level provider blocks allow a test scenario to use a different configuration for a provider than would be used normally, while still using the "real" provider plugin.
mock_provider blocks allow replacing a "real" provider plugin with an inert mock implemented inside OpenTofu itself, so that the provider configurations are effectively unused.
The providers argument in a run block changes which provider configurations are bound to which module-local provider configuration addresses in the root module during that test run.

All of these features currently assume that each provider configuration has exactly one instance. That constraint does not hold when testing a module that uses for_each in a provider block, and so these features will also change as follows:

Test-scenario-level provider blocks and mock_provider blocks are effectively substitutes for provider configurations in the module under test.

These must both therefore also support for_each with the same meaning as in provider blocks in the main OpenTofu language.

Additionally, the presence of for_each must be consistent with the corresponding configuration block in the module under test: if the module itself declares a provider block with for_each set then any provider or mock_provider block in a test scenario that uses the same alias must include a for_each argument. Conversely, if the main module's configuration block does not include for_each then the corresponding test scenario configuration must not have one.

A test-scenario-level provider or mock_provider block whose address matches a configuration_aliases entry in the required_providers block of a module under test -- that is, the situation where the module expects its caller to pass in a provider configuration and so the test-scenerio-level block is acting as a substitute for that passed-in provider configuaration -- also must not include a for_each argument, because in the first iteration of these features described by this proposal it is not yet allowed to pass collections of provider instances between modules.

A test-scenario-level block that doesn't correspond to one of an actual provider block in the module under test, an implied empty provider block in the module under test, or a configuration_aliases entry is a test-scenario-only provider configuration used in conjunction with the providers argument in a run block. Those are currently always forbidden from using for_each.
The providers argument in a run block should ideally support the same kinds of provider mappings that the argument of the same name in a module block supports, since this argument is intended to behave as if the run block is a module block calling the module under test.

However, to reduce scope for the initial iteration of this feature we will compromise:
- As with providers in a module call, it's invalid to refer to a multi-instance provider configuration without including an instance key, BUT...
- It's also initially forbidden to include an instance key too, because it's not yet clear how to implement that without significant changes to the design of the test harness. This corresponds to the rule that a test-scenario-only provider configuration may not use for_each, since the providers argument exists primarily to force substituting a test-scenario-only provider configuration for a particular test run and our rule against using for_each on those means that there's never any need to specify an instance key.

For example, consider a module under test written as follows:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

variable "aws_regions" {
  type = map(object({
    enabled = optional(bool, true)
  }))
}

provider "aws" {
  alias    = "by_region"
  for_each = var.aws_regions

  region = each.key
}

resource "aws_instance" "example" {
  for_each = {
    for k, v in var.aws_regions : k => v
    if v.enabled
  }

  # ...
}

If a test scenario for this module needed to use a mock to replace all uses of the "real" hashicorp/aws provider, the mock_provider blocks must use the same alias and must use for_each to match the provider block in the main module:

# This test scenario verifies that it's possible to perform a two-step
# removal of a region by first disabling it (causing all of the resource
# instances to be destroyed) and then finally removing it.

mock_provider "aws" {
  alias    = "by_region"
  for_each = var.aws_regions
  # NOTE: In this example for_each is set to exactly the same expression as
  # the corresponding block in the main module, but that's not actually
  # required: it would be valid to specify only a subset of regions or
  # even a _fake_ set of regions since the mock provider doesn't actually
  # know what an "AWS region" even is.
  # However, this particular module _does_ expect to have one instance
  # of the provider for each entry in var.aws_regions, so we must match
  # that here to ensure that we're testing something realistic.

  # ...all the existing mock provider settings, unchanged...
}

run "initial_create" {
  variables {
    aws_regions = {
      faked-region-a = {}
      faked-region-b = {}
    }
  }
}

run "disable_b" {
  variables {
    aws_regions = {
      faked-region-a = {}
      faked-region-b = {
        # Must first disable this region, causing all of the
        # resource instances in it to be destroyed before we
        # remove the provider configuration that is responsible
        # for destroying them.
        #
        # Failure at this step suggests that there's at least one
        # resource directly using for_each = var.aws_regions,
        # rather than a filtered version that only includes
        # the enabled regions.
        enabled = false
      }
    }
  }
}

run "remove_b" {
  variables {
    aws_regions = {
      faked-region-a = {}
      # faked-region-b is now completely removed, so it's
      # (now-inert) provider configuration is not declared at all.
    }
  }
}

What's not currently allowed

There are scenarios that module authors might think could work, but that we don't intend to support initially so that we can release an initial increment and then react to feedback.

The same expression for `for_each` in both a provider configuration and its associated resources

Someone encountering this feature for the first time is highly likely to try to write something like the following:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

variable "aws_regions" {
  type = map(object({
    vpc_cidr_block = string
  }))
}

provider "aws" {
  alias    = "by_region"
  for_each = var.aws_regions

  region = each.key
}

resource "aws_vpc" "example" {
  for_each = var.aws_regions
  provider = aws.by_region[each.key]

  cidr_block = each.value.vpc_cidr_block
}

That example declares that each region represented by an element of var.aws_regions should have both an instance of the hashicorp/aws provider and an aws_vpc.example resource instance belonging to that provider.

This example would work during initial creation, and would support adding new elements to var.aws_regions later. However, this example is problematic if the operator would ever want to remove an element from var.aws_regions, because that would effectively remove both the resource instance and the provider instance that manages it at the same time. OpenTofu needs to use a provider instance to plan and apply the destruction of a resource instance, so the provider instance must always live for at least one more plan/apply round than the resource instances it is managing.

To draw attention to this trap, OpenTofu will detect when the for_each expression in the provider block seems too similar to the for_each expression in one of the module or resource blocks that refers to it and will produce a warning explaining this risk.

"Too similar" will initially be defined as follows:

An expression that contains no references can never be "too similar" to any other expression, because the problem we're drawing attention to arises only when the two for_each arguments are based on the same source of data.
The rest of the comparison rules depend on specific HCL expression nodes, evaluated recursively:
- An expression in parentheses is "too similar" to another expression without parentheses if the two expressions match another one of these rules.
- Two reference expressions (e.g. var.foo["bar"]), attribute accesses from another value, or indexes of another value with a constant expression, (all of which are "traversals" in HCL's model) are "too similar" if all of the traversal steps are equivalent.
  
  "Equivalent" means that, for all steps of each traversal:
  - The steps at a particular position in each traversal are of the same type.
  - "Root" names (i.e. var in the example above) are character-for-character equal.
  - Attribute steps (i.e. .foo in the example above) have names that are character-for-character equal.
  - Index steps (i.e. ["bar"] in the example above) have an index value that is equal by the same rules as for the == operator.
- Two literal value expressions are "too similar" if their results are equal by the same rules as for the == operator. (But note that this applies only when a literal value appears as part of a larger expression that also involves a reference.)
- Two function call expressions are "too similar" if the called function names are identical, if both calls have the same number of arguments, and the expressions given for the arguments are each also "too similar" after recursively applying these rules.
- Two conditional expressions are "too similar" if the predicate expression and the two result expressions are each "too similar" after recursively applying these rules.
- Two index expressions with dynamic key expressions are "too similar" if the expressions they are applied to are "too similar" and their key expressions are "too similar" after recursively applying these rules to both.
- Two tuple constructor or object constructor expressions are "too similar" if they are both of the same type, they have the same number of constituent element expressions, and each of those constituent expressions are "too similar" after recursively applying these rules.
- Two for expressions are "too similar" if their temporary key/value symbol names are the same, and the source collection expressions, result key expressions, result value expressions, and filter predicate expressions are each "too similar" after recursively applying these rules.
- Two binary operation expressions (e.g. 1 + 1) are "too similar" if they both have the same operation and the left and right operand expressions of each operation are each "too similar" after recursively applying these rules.
- Two unary operation expressions (e.g. -var.foo) are "too similar" if they both have the same operation and the operand expressions of each operation are "too similar" after recursively applying these rules.
- Two template expressions are "too similar" if they have the same number of template parts and each of the template parts are "too similar" after recursively applying these rules.
No other combinations are considered to be "too similar". This is a best-effort heuristic that does not intend to achieve full coverage of all possible expression types.

Although the problem only really affects managed resources, since they are the only object that needs provider envolvement to destroy, the warning would appear for a call to a module block that does not contain any resource blocks because the warning is generated only based on syntax in the configuration layer, rather than at runtime, and so it cannot "see into" the child module.

Authors are expected to respond to this warning by somehow changing the expressions on the module and resource blocks to filter out a subset of the elements used with the provider block's for_each based on something in the source value. For example, an author might choose to use a null map element to represent that the provider instance should be declared but the resource instances must not, or might add an enabled attribute to the element object type which defaults to true and disables the resource instances if overridden to false. For example:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

variable "aws_regions" {
  type = map(object({
    vpc_cidr_block = string
    enabled        = optional(bool, true)
  }))
}

locals {
  # enabled_regions includes only the elements of var.aws_regions
  # where enabled = true, and thus resource instances should be
  # declared.
  enabled_regions = tomap({
    for region, config in var.aws_regions : region => config
    if config.enabled
  })
}

provider "aws" {
  alias    = "by_region"
  for_each = var.aws_regions

  region = each.key
}

resource "aws_vpc" "example" {
  for_each = local.enabled_regions
  provider = aws.by_region[each.key]

  cidr_block = each.value.vpc_cidr_block
}

In practice any change to either of the expressions that causes them to no longer be considered "too similar" is sufficient to quiet the warning, regardless of whether the difference actually solves the problem the warning is describing. It's the module author's responsibility to ensure that their solution actually solves the problem.

(For some alternatives we considered and why we ultimately chose this path, refer to Alternatives to the warning about for_each expressions being "too similar".)

Provider references as normal values

locals {
  my_provider = aws.by_region["us"]
}

resource "aws_s3_bucket" "primary" {
  provider = local.my_provider
}

It's not currenly well-defined what a "provider reference" is outside of a provider or providers argument. All of the provider references are built as special cases that are handled using HCL static analysis.

Additionally, the OpenTofu language would currently recognize aws.by_region as a reference to a resource "aws" "by_region" block in any normal expression context, so we cannot easily redefine that existing meaning while retaining backward compatibility.

Syntax compatibility concerns aside, it would be technically possible to treat a provider configuration reference as a value using a concept in the upstream library that provides the low-level building blocks of OpenTofu's type system: cty Capsule Types. Assuming we also defined a suitable syntax for declaring a new kind of type constraint and referring to a provider in a normal expression, we could technically allow provider instance references to be passed around in a similar way as other values are passed around.

One resource with multiple provider configurations

OpenTofu's existing graph construction logic fundamentally assumes that each resource block is associated with exactly one provider configuration. This proposal introduces the possibility of a single provider configuration generating multiple provider instances, but since this initial proposal aims to avoid making significant changes to the graph transformation logic we will initially require that all instances of a particular resource refer to (potentially different) instances of the same provider configuration.

That restriction is most obvious in the simplest case where a resource block has provider directly referring to a provider instance in the same module:

provider "example" {
  alias    = "foo"
  for_each = toset(["bar", "baz"])

  # ...
}

resource "example_thing" "example" {
  for_each = toset(["bar", "baz"])

  # The part in brackets is dynamic, but the initial reference to the
  # provider configuration block remains static, so effectively all
  # of the instances of this resource _must_ belong to the one
  # single provider configuration block above.

  # The following would be valid because "example.foo" is statically known...
  provider = example.foo[each.key]

  # ...but something like the following is not supported as it could refer
  # to many potential provider configurations depending on the result of
  # local.alias:
  # provider = example[local.alias][each.key]
}

This restriction also applies when passing provider configurations indirectly through module calls. Each instance of a module can have its own provider configuration namespace populated with different provider instances defined in the parent, but they must still all refer to the same configuration block:

provider "example" {
  alias    = "foo"
  for_each = toset(["bar", "baz"])

  # ...
}

module "example" {
  source   = "./example"
  for_each = toset(["bar", "baz"])

  providers = {
    # All of the resource blocks in this module can just assume there's
    # a default (unaliased) "example" provider instance available to
    # use. Each one is bound to a different instance from the provider
    # block above, but they must still nonetheless all be bound to
    # the same block.
    example = example.foo[each.key] # Supported

    # However, if we add uncertainty to which provider configuration is
    # being referenced, the above assertions do not hold and become much
    # more complex to reason about:
    # example = example[local.alias][each.key] # Unsupported
  }
}

We should be able to loosen this restriction in a future RFC, either as part of a fully-dynamic design where provider references are normal expressions as described in the previous section or via some specialized syntax that's only used for provider configurations. The main requirement, regardless of how the surface syntax is designed, is that the graph builder change to support creating multiple dependency edges from a single resource node to multiple provider configuration nodes, so that evaluation of the resource node is delayed until after all of the possibly-referred-to provider configurations have been visited and thus their provider instances are running.

Passing collections of provider instances between modules

There was previously no syntax for causing a single provider configuration to produce multiple provider instances, and so we also have no existing syntax to use to differentiate between a reference to one provider instance vs. a reference to a collection of provider instances.

In this first implementation it's forbidden to refer to a multi-instance provider configuration without specifying an instance key.

For example, consider the following configuration:

provider "example" {
  alias    = "foo"
  for_each = toset(["bar", "baz"])

  # ...
}

resource "example_thing" "a" {
  # The following is invalid: must include the instance key part to
  # tell OpenTofu how to select exactly one provider instance.
  provider = example.foo
}

module "child" {
  source = "./child"
  providers = {
    # The right-hand side of this item is invalid: must include the
    # instance key part to tell OpenTofu how to select exactly one
    # provider instance.
    example.foo = example.foo
  }
}

The consequences of example.foo being invalid don't really matter for a resource block because it doesn't make sense to specify multiple provider instances for a single resource anyway.

It would be useful to be able to pass a collection of provider instances to a child module as shown in the second example, but that raises various new design questions about how the child module should declare that it's expecting to be passed a collection of provider instances, since today provider instance references are not normal values and so it isn't really meaningful to describe "a map of example provider instances".

Leaving this unsupported for the first release again gives us room to consider various different design options in later RFCs as our understanding of the use-cases grows based on feedback. The following are some early potential designs we've considered so far, but it isn't clear yet which (if any) is the best fit:

Borrow the "splat operator" syntax to talk about recieving a collection of provider instances in the configuration_aliases argument of a module's required_providers block:

terraform {
  required_providers {
    example = {
      source = "tf.example.com/example/example"

      configuration_aliases = [
        # The following could potentially mean something similar to
        # a provider "example" block with alias = "foo" and a
        # for_each argument, specifying that this module expects its
        # caller to provide a collection of provider instances.
        example.foo[*],

        # ...but this syntax does not allow to differentiate a
        # collection keyed by strings vs. a collection keyed by
        # integers, and so adopting this syntax might block the
        # later addition of "count" argument support for provider
        # configurations.
      ]
    }
  }
}

Borrow the "splat operator" syntax instead for use at the call site, so the calling module can declare that it's intending to pass a collection of instances:
```
module "child" {
  # ...
  providers = {
    aws.each_region[*] = aws.by_region[*]
  }
}
```
This is similar to the previous idea except that it makes the assertion of intent to pass multiple instances appear on the caller side rather than on the callee side. It's technically redundant to use extra syntax here since the presence or absense of for_each in the provider "aws" block that has alias = "by_region" is already sufficient to represent whether aws.by_region is a single instance or a collection of instances,

(This and the previous option could potentially be combined together to allow both caller and callee to explicitly declare what they are intending to do.)

Wait until we're ready to support provider instance references being treated as normal values of a new type, and then encourage module authors to pass collections of provider configurations via input variables that use a new type constraint syntax, effectively deprecating the current provider-specific sidechannel.

This generalized approach would allow for various more interesting combinations, such as sending provider instances along with other data as part of a single object per region:

variable "aws_regions" {
  type = map(object({
    cidr_block        = string
    provider_instance = providerinstance(aws)
    # (the above is hypothetical new type constraint syntax for
    # "a reference to an instance of whichever provider has the
    # local name "aws" in this module.)
  }))
}

resource "aws_vpc" "example" {
  for_each = var.aws_regions
  provider = each.value.provider_instance

  cidr_block = each.value.cidr_block
}

module "example" {
  # (this is a call to the module in the previous code block)
  # ...

  aws_regions = {
    us-west-2 = {
      cidr_block        = "10.1.0.0/16"
      provider_instance = provider.aws.usw2
      # (the above is hypothetical new syntax for referring to a
      # provider instance in normal expression context, where
      # the "provider." prefix differentiates it from being
      # a reference to a managed resource.)
    }
    eu-west-1 = {
      cidr_block        = "10.2.0.0/16"
      provider_instance = provider.aws.euw2
    }
  }
}

Using the `count` meta-argument in `provider` blocks

The existing resource, data, and module blocks support either count or for_each as two different strategies for causing a single configuration block to dynamically declare multiple instances.

count is most appropriate for situations where the multiple instances are "fungible". For example, the instances could be considered "fungible" if the process of reducing the number of instances doesn't need to consider which of the instances will be destroyed. A collection of virtual machines all booted from the same disk image, with the same configuration, and running the same software could be considered fungible.

for_each is the better option when each instance has at least one unique characteristic that makes it functionally or meaningfully distinct from the others. A collection of virtual network objects where each is used to host different services are not fungible because each network has its own separate role to play in the overall system.

The technical design we've adopted here could potentially allow supporting count in provider blocks, generating instances with integer numbers as keys just as with count in other blocks. However, we don't yet know of any clear use-case for a collection of "fungible" provider configurations: the only reasons we know of to have multiple instances of a provider involve each one being configured differently, such as using a different AWS region. It's important for correctness for the resource instances associated with these provider instances to retain their individual bindings as the set of provider instances changes, because an object created using one provider instance will often appear to be non-existent if requested with another provider instance.

Because of how crucial it is to preserve the relationships between resource instances and provider instances between runs, we've chosen to intentionally exclude count support from the initial design. However, the count argument remains reserved in provider blocks so that we can potentially implement it in future if feedback suggests a significant use-case for it. We hope that the future discovered use-case(s) will also give us some ideas on how we could help protect against the inevitable misbehavior caused by a resource instance getting accidentally reassociated with the wrong provider instance.

Technical Approach

The following describes the high level changes. A potential implementation has been proposed here: https://github.com/opentofu/opentofu/pull/2105

Our goal is to implement this feature with as little risk as possible. We prioritize minimizing the features and the changeset to minimize the risk. Additional refactoring will likely be performed as this feature grows over time and we choose to implement the most clearly defined and useful parts first.

The core of the changes needed to implement what is described above:

Allow for_each in provider configuration blocks and evaluate them in the static context
Initialize and configure a provider instance for each element of the for_each collection (exec binary + pass configuration)
Allow provider key expressions in resource > provider and module > providers configuration blocks
Evaluate the provider key expressions when needed during the evaluation graph to determine which provider instance each resource instance should use
Update state storage to understand per-resource-instance provider keys

Provider Configuration Blocks

A "provider configuration" is the provider "type" {} block in the OpenTofu Language. This is located within either the root module or a child module. Provider configurations cannot be declared in child modules which have been called with for_each or count.

Each provider block in the configuration is decoded into an instance of the configs.Provider struct type and added to it's respective configs.Module at the correct location within the module tree. The existence of this provider block is heavily used when validating the providers within the module tree, see configs/provider_validation.go.

Each provider block will need to contain it's static expansion information. "Expansion" is the process of deciding the set of zero or more instance keys that are declared for an object using either count or for_each. For this initial iteration of the feature, the for_each expression will be evaluated as part of the main config loader using the static evaluation context as defined in Init-time static evaluation of constant variables and locals.

The configs.Provider struct representing each configured provider block will now contain new field that contains the static mapping from "provider instance key" to "provider repetition data". In practice, this is a map[addrs.InstanceKey]instances.RepetitionData value. This map can be built using the StaticContext available in configs.NewModule() as defined in the Static Evaluation RFC.

At the end of the configs.NewModule constructor, all provider configurations that contain a for_each argument will have their new "instance mapping" field set (or an error message produced). This should not change the majority of provider validation logic in the configuration package as the name/type/alias information has not changed.

Provider Node Execution

Each "provider configuration block" is turned into a "provider node" (NodeApplyableProvider) within the tofu graph. When a provider node is executed, it starts up and configures a providers.Interface of the corresponding provider. This "interface" is stored in the tofu.EvalContext and is available for use by other nodes via the context (referenced below).

NodeApplyableProvider now effectively represents all provider.Interfaces of a particular provider configuration block at once, and its "execute" step now involves a loop performing for each instance the behavior that was previously always singleton:

Evaluate the configuration using the appropriate instances.RepetitionData, thereby causing each.key and each.value to be available with instance-specific values. The result is a cty.Value representing an object of the type implied by the provider's configuration schema.
Create a child process running the appropriate provider plugin executable, creating a running provider interface.
Call the ConfigureProvider function from the provider protocol on the new provider interface, passing it the cty.Value representing its instance-specific configuration.
Save the provider interface in the shared tofu.EvalContext for use by downstream resource nodes.

A special case exists here for tofu validate. The validate codepath does not configure any providers or deal with module/resource "expansion" (for_each/count). A validate graph walk should only initialize and register one providers.Interface. It can still however validate provider configurations using the provider's schema and providers.Interface.ValidateProviderConfiguration call for each set of provider instance data on the single instance.

Selecting a provider instance for each resource instance

In the initial dependency graph constructed during the planning phase, there are graph nodes representing whole provider and resource/data blocks, but not yet their individual instances. This is because the existing expansion behaviors all use dynamic expression evaluation and so for_each or count has not yet been evaluated at the time of graph construction, and the expressions in those arguments can potentially imply additional dependencies that need to be included in the graph for the expression to ultimately evaluate successfully.

The initial graph node type for a resource or data block is nodeExpandPlannableResource, which ultimately represents the entire process of deciding which dynamic instances are declared for the block and performing the per-instance configuration evaluation for each of the instances.

The per-instance evaluation process now grows to include the task of evaluating the dynamic instance key portion of the provider argument. If an author wrote provider = aws.by_region[each.key] then each instance selects a different instance of aws.by_region based on its own value of each.key.

OpenTofu does not need to know the final provider instance for a resource instance until just before it begins making requests to the provider, so we can safely wait until visiting/executing an individual resource instance before deciding its provider configuration but when doing so we must deal with the possibility that the target provider instance might be accessed only indirectly through the provider instances passed by the parent module, and therefore we must now track for each resource the following new information:

The hcl.Expression representing the expression in the brackets at the end of the reference expression.
The addrs.Module of which module in the chain contains the dynamic instance key expression, and therefore which scope the instance key needs to be evaluated in.

The initial rule against passing entire collections of providers between modules means that in practice there will always be exactly one reference per resource that carries a dynamic instance key expression but that reference might actually be in a calling module block rather than in the resource block itself. (Future generalizations to support passing collections of providers will require tracking some additional detail here.)

Each resource instance must also now track the final addrs.InstanceKey that was the result of evaluating the resource's hcl.Expression instance key expression using that instance's containing scope and repetition data. That combines with the already-tracked provider configuration address to produce a fully-qualified provider instance address.

Dynamic provider instance keys in `module` blocks

When the dynamic provider instance selection occurs in a module block, rather than directly in a resource/data block, it participates in the existing capability of projecting some or all of the provider configurations in the calling module to also appear (potentially with different aliases) in the callee. For example:

provider "aws" {
  alias    = "by_region"
  for_each = var.aws_regions

  # ...
}

module "example" {
  source   = "./example"
  for_each = var.aws_regions
  providers = {
    aws = aws.by_region[each.key]
  }
}

This configuration means that each instance of module "example" has its default (unaliased) configuration for hashicorp/aws acting as an alias for one of the dynamic instances of the calling module's aws.by_region.

Therefore when resolving the dynamically-chosen provider instance for each resource in the child module, the resource instance node for each resource instance in the child module needs to be able to effectively walk up the module tree to find out that any reference to aws in the child module (whether explicit or implicit) must be resolved by evaluating aws.by_region[each.key] in the calling module, using the calling module's scope and the appropriate module call instance's each.key value.

Fotunately for us, this process is already encapsulated well within the ProviderConfigTransformer and can be extended for this purpose. The ProviderConfigTransformer adds nodes to the graph that either represent "provider configurations" directly (NodeApplyableProvider) or proxies to those providers. These proxies know how to walk up the tree to locate the actual provider configuration.

The proxy structure can easily be extended to also include the "provider key expression" required to perform the mapping. The ProviderTransformer is in charge of determining what "provider configuration node" is used by all of the resource nodes in the graph. Additional information from the proxy traversal can be added to each resource node's SetProvider() function call to help it determine which instance it should be using.

At the end of this process, each resource node should know both what "provider configuration" it should be trying to use, as well as if there is a provider key expression which needs to be evaluated within a given module path.

Resource Instance Visiting/Execution

When a resource instance is visited (Execute method called), it's first task is to determine which provider it should be using.

With all of the previously described machinery in place, it has access to:

The required provider configuration block address (ProviderTransformer)
The module level provider key expression + module path (ProviderTransformer/ProviderConfigTransformer)
The resource provider key expression (configs.Resource)

The resource instance then determines if it should be using the module or resource provider key expression (only one is allowed). It evaluates that expression within the specified context and produces a "provider instance key".

It then asks the EvalContext for the providers.Interface using the provider configuration block address and the provider instance key, which together represent a fully-qualified provider instance address.

Once the provider instance has been selected, it can continue it's usual unmodified execution path.

Providers Instances in the State

Currently OpenTofu assumes that all instances of a resource must be bound to the same provider instance, because each provider configuration can have only one instance and each resource can be bound to only one provider configuration. This assumption is unfortunately exposed in the state snapshot format, which tracks the provider configuration address as a property of the overall resource rather than of each instance separately.

To allow for future generalization without the need to make further breaking changes to the state snapshot format, and to avoid making rollback to an older OpenTofu version difficult for anyone who has never used these new features, we will extend the state snapshot format to also track full provider instance information at the resource instance level only for resources whose provider instance selection uses a dynamic instance key.

Today's state snapshot format includes the following information for each resource (irrelevant properties excluded):

{
  "resources": [{
    "module": "module.example",
    "type": "aws_instance",
    "name": "example",
    "provider": "module.example.provider[\"registry.opentofu.org/hashicorp/aws\"].by_region",
    "instances": [
      {
        "index_key": "first"
      },
      {
        "index_key": "second"
      }
    ]
  }]
}

We will mode the "provider" property to each of the objects under "instances" which tracks the same information, with the addition of a trailing instance key for the specific selected provider instance:

{
  "resources": [{
    "module": "module.example",
    "type": "aws_instance",
    "name": "example",
    "instances": [
      {
        "index_key": "first",
        "provider": "module.example.provider[\"registry.opentofu.org/hashicorp/aws\"].by_region[\"us-west-2\"]"
      },
      {
        "index_key": "second",
        "provider": "module.example.provider[\"registry.opentofu.org/hashicorp/aws\"].by_region[\"eu-east-1\"]"
      }
    ]
  }]
}

The state snapshot loader will support both the old and new forms. If the resource-level "provider" property is present then any instance that does not have a "provider" property will have the value from the resource-level property copied into it, causing all of those instances to therefore refer to the same singleton provider configuration for the same meaning as before. Any resource instance that has both an instance-level provider address and a resource-level provider address will emit a warning and the instance-level address will take priority.

After potentially propagating the resource-level addresses into instances that didn't have such a property, the state snapshot loader will verify that all of the instances have provider instance addresses that differ only in the trailing instance key, and will fail with an error if not. That constraint preserves for now the fundamental assumption in the language runtime that each resource depends on exactly one provider configuration, while retaining the freedom to loosen that constraint in later versions without further changes to the state snapshot syntax and thus without the need for additional format migration code.

The current version of the OpenTofu v1.x Compatibility Promises at the time of writing contains the following statements:

You should be able to upgrade from any v1.x release to any later v1.x release. You might also be able to downgrade to an earlier v1.x release, but that isn't guaranteed: later releases may introduce new features that earlier versions cannot understand, including new storage formats for OpenTofu state snapshots.

An older version of OpenTofu would not be able to successfully load a state snapshot without a resource-level "provider" property, which would therefore be an example of "new features that earlier versions cannot understand". However, we try to avoid blocking downgrades whenever possible notwithstanding the above statements, because downgrading back to an earlier version shortly after upgrading can be a helpful workaround for newly-introduced bugs.

As a compromise then, the state snapshot writer will generate instance-level "provider" properties only if there's at least one instance that has an instance key. If all instances of a particular resource select the same no-key provider instance address then the writer will instead generate a resource-level "provider" property containing that single provider instance address. This means that any configuration that is not using the new features proposed in this RFC will continue to generate state snapshots that are backward-compatible with previous versions of OpenTofu (notwithstanding any other unrelated changes to the state snapshot format outside the scope of this RFC).

The state snapshot loader will parse both the resource-level address and the instance-level addresses and determine which to use for each resource instance. It will, for now, return an error if any of the instance addresses are not identical aside from the optional trailing instance key. This preserves our current simplification of each resource still being bound to exactly one provider configuration while offering a more general syntax that we can retain in future if we weaken that constraint. We don't yet know exactly what flexibility we will need for future features, and so capturing the entire provider instance address (and then verifying their consistency) gives us the freedom to refer to literally any valid provider instance address in future versions, without changing the syntax and thus without the need for new state snapshot format upgrade logic.

This section described changes only to the internal state snapshot format that is not documented for external use. We do not intend to extend the documented tofu show -json output formats yet as part of this change, because we want to retain design flexibility for later work and to learn more about what real-world use-cases exist for machine-consumption of the provider instance relationships before we design a final format that will then be subject to compatibilty constraints.

Note

An earlier version of this document instead proposed that the state snapshot writer would always populate both the resource-level and the instance-level "provider" properties.

The main implication of this decision is whether it is likely to be beneficial or harmful for older versions of OpenTofu (and potentially other software parsing the state snapshot format despite it not being subject to compatibility constraints) to be able to still find the resource-level provider configuration address for a resource whose instances each dynamically select an instance of that configuration.

The current proposal text aims for the compromise of intentionally making older versions of OpenTofu fail if there is any resource making use of the new features in this proposal, but to succeed if this feature has not been used.

In particular this means that anyone who wishes to downgrade to an older version of OpenTofu after using these features will need to first rewrite their configuration to remove uses of these features and run tofu apply to return the state into a backward-compatible form. If we were to permit older versions of OpenTofu to read a state snapshot generated from use of these features then the older versions would misunderstand the references in the state and fail in a more confusing way, whereas the versions of OpenTofu that support these new features will have explicit support for generating old-style state snapshots whenever that's possible.

Open Questions

Dynamic or early-evaluated provider dependencies

Should variables be allowed in required_providers now or in the future? Could help with versioning / swapping out for testing?

variable "version" {
  type = string
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = var.aws_version
    }
  }
}

Source address and version selection is primarily an installation-time concern rather than a runtime concern, as long as the source argument is always populated consistently. Therefore there is no real interaction between supporting multiple provider instances and supporting programmatically-generated source information, and so the decision about whether and how to support the latter can be left for a later discussion.

Dynamic or early-evaluated provider aliases

Early discussion for this feature considered possibly allowing more than just constant strings in the alias argument. If we supported this it would likely be limited only to early-evaluation to preserve the requirement that each provider block have a known alias when we build the main language runtime's dependency graph. For example:

provider "aws" {
  alias = var.foo
}

We have intentionally omitted this for now because we don't know of any real use-cases for it, and it's conceptually simpler to explain alias as being effectively equivalent to the second label in the header of a resource block -- a statically-chosen unique identifier for use in references elsewhere -- than as another possibly-dynamic element in addition to the dynamic instance keys.

We can safely add early-eval-based aliases in a later release without any breaking changes, so we will await specific use-cases for this before we decide whether and how to support it.

Future Considerations

A potential fully-dynamic version of this feature is discussed in another RFC: Dynamic Provider Instances and Instance Assignment. We don't intend to support that immediately but have "designed ahead" to reduce the risk that backward-compatibility with the static eval implementation would block a later dynamic implementation.

If we ever decide to implement Static Module Expansion, how will that interact with the work proposed in this RFC?

Potential Alternatives

Go the route of expanded modules/resources as detailed in Static Module Expansion

Concept has been explored for modules
Not yet explored for resources
Massive development and testing effort!

Alternatives to the warning about `for_each` expressions being "too similar"

The behavior described in The same expression for for_each in both a provider configuration and its associated resources is a compromise intended to allow module authors maximum flexibility, which comes at the expense of us therefore being unable to give strong guidance as to exactly how an author might solve the problem the warning describes.

We also considered various options that would involve being more opinionated about exactly how a module author should filter their input collection in resource/module block for_each, including but not limited to:

Forcing the use of null element values to represent "disabled", providing a built-in function that automatically removes such elements from a given map, and then requiring that function to be used in the for_each of any resource/module block associated with a multi-instance provider configuration.
Introducing a function that takes a map and a set whose elements are a subset of the keys from the map, which returns a new map with those keys removed, and then requiring that function to be used in the for_each of any resource/module block associated with a multi-instance provider configuration.

Both of these would both have a higher likelihood of a configuration meeting the requirements actually being correct, and would allow our error messages to be considerably more specific about what is required to solve the problem, but they also both force a particular module design strategy that is unlikely to match all module authors' tastes.

Ultimately we prefer to let module authors freely decide how to solve this problem, even at the risk of them accidentally writing something that is sufficient to quiet the warning but not actually sufficient to solve the problem the warning describes.

Anyone who addresses this problem incorrectly and thus ends up in the "trap" despite their efforts will be able to move forward by using tofu destroy with the -target=... option to force destroying the managed resource instances before removing the provider instance that manages them.

56 KiB Raw Permalink Blame History