Martin Atkins 2025-02-25 13:44:37 -05:00 committed by GitHub
commit 1e90d379f4


@ -0,0 +1,447 @@
# Explicit mechanism for working with Temporary Files
Issues: This is indirectly motivated by both [#1086](https://github.com/opentofu/opentofu/issues/1086) and [#1942](https://github.com/opentofu/opentofu/issues/1942), but also represents a distinct feature in its own right.
Although ideally an OpenTofu configuration would do all of its work in local RAM and remote APIs, there are some situations where the most appropriate technique is to write a temporary file to disk and then read that file back into memory in some other location.
For example, when preparing large files to be uploaded into blob storage systems like Amazon S3 it's common to want to prepare the file using some other provider and then ask the `hashicorp/aws` provider to read that file and upload it into an Amazon S3 bucket (or equivalent for other blob storage services).
OpenTofu has no built-in features for managing temporary files, and so informal patterns have emerged. The two most prominent approaches are:
- Use `path.cwd` to generate a path referring to a file in the current working directory, and then pass that absolute path to all of the providers that need to write or read the file in question.
This keeps all of the generated files together in one place, but requires vigilance in managing what is effectively a shared namespace of filenames. The most robust form of this pattern involves the root module deciding all of the paths and then passing them as strings to other modules, so that the root module can act as the central coordinator of its own namespace.
- Use `path.module` to generate a path relative to a specific module, and then have that module return its generated path as an output value so that other parts of the configuration can read the generated file.
This is largely the same as the first point but has the minor benefit of allowing each module to manage its own filesystem namespace. However, it's particularly problematic when a module is called using `for_each` or `count` because then multiple instances of the module all share a single directory, and will clobber each other's written files unless special care is taken.
This technique also has two major externalities:
- `tofu init` currently makes a separate copy of a module package for each `module` block that refers to it, because that then at least gives each `module` block a separate directory to read/write notwithstanding the fact that `for_each` and `count` don't get that benefit. [Issue #1086](https://github.com/opentofu/opentofu/issues/1086) asks for this to be changed, but changing it unilaterally would break existing configurations that include modules which self-modify their own directories.
- OpenTofu does not include module package selections in the dependency lock file. While this is not the _sole_ reason, a big part of it is that verifying package checksums relies on packages being immutable. This pattern of self-modifying module packages using `path.module` was already well-established before the introduction of the dependency lock file and so the initial dependency lock file focused only on providers, but [issue #586](https://github.com/opentofu/opentofu/issues/586) and [issue #1942](https://github.com/opentofu/opentofu/issues/1942) both make the reasonable request to include modules in the dependency lock file too.
OpenTofu is long overdue to offer a built-in, robust, and standardized method for dealing with temporary files. A solution to this need should meet each of the following criteria to at least some extent:
1. Temporary files are written into a location separate from any module source packages or any other packages for which it's useful to calculate checksums.
OpenTofu supports installing module packages from a variety of locations with different levels of trust, and different users want different levels of guarantee against source code modification at different phases. For example, any organization relying on policy checks between plan and apply is effectively relying on all dependencies, including module packages, remaining unchanged between plan and apply. Therefore as with provider packages it should be possible to re-verify that dependencies are unchanged even after packages have been installed to the local cache directory.
2. Individual module instances each have their own independent namespace of temporary files, so that modules written by different authors can collaborate together successfully without conflict and without excessive out-of-band coordination.
3. Files generated by provider actions during the planning phase should, as far as is practical, be made available automatically to the apply phase with minimal external orchestration support.
However, given that these are framed as _temporary_ files, the files generated in one plan/apply round should _not_ automatically propagate to a second plan/apply round that happens to occur in the same working directory on the same computer. A subsequent plan/apply round is supposed to consider only information expressly provided or requested in the configuration, and not contextual cruft created by earlier work on the same computer.
4. Generated path names should not be _absolute_ filesystem paths, and should be normalized as much as possible against operating system differences, to avoid specific absolute filesystem layout details from any particular computer being captured into saved plan files and state snapshots.
(This is _mainly_ a cosmetic concern: it'd be confusing to review an attribute diff from `C:\temp\whatever.txt` to `/tmp/whatever.txt` just because you're using Linux but the most recent apply was run on a Windows machine. But absolute paths can also often include potentially-sensitive information such as the name of the user running OpenTofu, or the name of some internal project that isn't yet widely-announced, so this is also a security concern at least indirectly.)
5. Generated temporary files must be placed into a location that is writable/readable by existing, unmodified OpenTofu provider releases, to avoid splitting the ecosystem into temporary-file-friendly vs. legacy provider plugins.
In practice this means that the files must be generated in the real OS filesystem as observable to OpenTofu Core and the plugins; we cannot use any special higher-level filesystem abstraction because existing providers would not immediately understand how to access it.
Ideally this approach would also work for situations where the provider is running on a different computer than OpenTofu Core, to enable big ideas like [running providers on remote systems over SSH](https://github.com/opentofu/opentofu/issues/1138), but unfortunately the assumption of a shared OS filesystem between Core and providers currently runs very deep, and so is highly unlikely to be weakened in the foreseeable future. Therefore supporting different providers having different views of the filesystem from one another, or providers having different views than OpenTofu Core has, is explicitly a non-requirement under this proposal.
6. Subjectively, the new pattern should be easy to use correctly and hard to use incorrectly.
However, notably it's _not_ a goal to prevent a module author from actively subverting the convention. Intentionally generating a path that traverses out of the designated temporary directory and into other unrelated directories is incorrect usage, but modules already have considerable power to defy convention and it's not a goal of this proposal to begin imposing hard constraints.
Several of these requirements are in tension with one another. The following proposal therefore includes some compromise while still aiming to remain as true as possible to the requirements as stated.
## Proposed Solution
In the OpenTofu language, a new referenceable symbol `path.temp` evaluates to a local filesystem path that is unique to the module instance where the symbol was written.
As with most of the other referenceable symbols, `path.temp` produces a different value depending on which module instance it is being evaluated in, and so should be understood as "the temporary directory for this module instance" in the same way that `path.module` currently means "the directory containing the source code this module instance is using".
A module author may use `path.temp` when specifying the path where a temporary file is to be created, and assume that the OpenTofu language runtime will provide a path that is sufficiently unique to avoid conflicts with any other module or module instance. In particular, if a `module` block uses `count` or `for_each` then each instance of the child module observes its own unique `path.temp` result.
For example,
```hcl
terraform {
  required_providers {
    archive = {
      source = "hashicorp/archive"
    }
  }
}

data "archive_file" "example" {
  output_path = "${path.temp}/source-package.zip"
  type        = "zip"
  # ...
}

output "source_package_path" {
  # The module can selectively export references to temporary
  # files it created, and thus allow other parts of the
  # configuration to refer to this file. The module has the
  # right both to decide the meaning of path names under its
  # own path.temp prefix _and_ to explicitly offer some or
  # all of those path names to be read by resources in
  # other modules.
  value = data.archive_file.example.output_path
}
```
The path returned by `path.temp` will point to a subdirectory of the `.terraform` directory that holds transient working directory state, but that's an implementation detail rather than a guarantee and so is discussed further under Technical Approach below.
Any files placed into a directory accessed through a reference to `path.temp` are included as part of a saved plan file, if requested. A subsequent `tofu apply` of that saved plan file would begin by extracting the files into an equivalent location under `.terraform`. Therefore any existing automation that already carries saved plan files between plan and apply phases running in different contexts will find temporary files automatically preserved between the plan and apply phases.
(For some fuller use-case examples, refer to the appendix [Use-case Examples](#appendix-use-case-examples).)
### User Documentation
The main reference documentation for this feature aimed at module authors belongs in [References to Named Values](https://opentofu.org/docs/language/expressions/references/) in the existing section [Filesystem and Workspace Info](https://opentofu.org/docs/language/expressions/references/). The following is a copy of the content there at the time of writing, modified to include the new feature:
> The following values are available:
>
> - `path.module` is the filesystem path of the module where the expression is placed. We do not recommend using `path.module` in write operations; use `path.temp` as a location for temporary files.
> - `path.temp` is the path to an automatically-generated temporary directory where a module can arrange to write temporary files. Each module instance receives a unique `path.temp` value, so module authors can assume exclusive control of file and directory name choices under the returned directory. Files created under this prefix are automatically captured into a saved plan file and then extracted for use during the apply phase.
> - `path.root` is the filesystem path of the root module of the configuration. This is included for completeness but general-purpose shared modules should not typically make assumptions about which modules are calling them.
> - `path.cwd` is the filesystem path of the original working directory from where you ran OpenTofu before applying any `-chdir` argument. This path is an absolute path that includes details about the filesystem structure. It may be useful in some unusual cases where OpenTofu is run from a directory other than the root module directory, but we recommend preferring `path.module` or `path.temp` where possible.
> - `terraform.workspace` is the name of the currently selected workspace.
>
> _[...some unmodified content skipped...]_
>
> Aside from `path.module` and `path.temp`, we recommend using the values in this section only in the root module of your configuration.
>
> If you are writing a shared module which needs a prefix to help create unique resource names, define an input variable for your module and allow the calling module to define the prefix. The calling module can then use `terraform.workspace` to define it if appropriate, or some other value if not:
>
> _[...existing example unchanged, since it's not relevant to filesystem paths...]_
The CLI documentation page [Command: plan](https://opentofu.org/docs/cli/commands/plan/) provides the operator-oriented description of the "saved plan file" functionality. The writing there does not currently make any commitments about what is actually covered by a saved plan file, and so this seems like a good opportunity to elaborate on that a little:
> You can use the optional `-out=FILE` option to save the generated plan to a file on disk, which you can later execute by passing the file to `tofu apply` as an extra argument. The saved plan file includes a record of the actions that OpenTofu proposed to take, a snapshot of the prior state the plan was based on, a copy of the module source code used to generate the plan, and any temporary files generated under `path.temp` directories during the planning phase. This two-step workflow is primarily intended for when running OpenTofu in automation.
(In a finalized form of this document it would be helpful for the `path.temp` text in this paragraph to link to the "Filesystem and Workspace Info" section discussed earlier, to better explain what that is referring to.)
[Issue #1842](https://github.com/opentofu/opentofu/issues/1842) is considering the addition of one or more guides on automating OpenTofu in CI-like systems, and such documentation is also often useful to integrators building OpenTofu-specific automation/collaboration products (so-called "TACOS"). Such guides would likely be a good place to further elaborate on how this feature behaves operationally, and also include the context that there will undoubtedly still be some older modules that continue with their self-modifying behaviors, and so those building automation around OpenTofu will need to decide whether or not to take the extra steps required to support such modules.
### Technical Approach
#### Temporary Path Layout
The most significant constraints on the technical approach are that the temporary files need to be created in the real OS filesystem and that the generated path strings must remain character-for-character equal between the plan and apply phases.
The filesystem paths generated using `path.temp` references will inevitably appear in attribute values in OpenTofu plan output and occasionally in error messages. Neither of those contexts is very amenable to presenting very long string values, and so ideally the generated directory prefixes should be as short as possible and of relatively uniform length. This motivates many of the specific decisions that follow.
In the initial implementation temporary directories will be placed under the `.terraform/tmp` prefix, extending the existing `.terraform` directory used for transient working directory state. `path.temp` will always return a string that begins with `.terraform/tmp`, although that's an implementation detail rather than a compatibility guarantee.
Beneath that prefix there would be two mandatory levels of further hierarchy:
1. A pseudorandomly-generated "plan ID" string. This ensures that two consecutive `tofu plan` commands in the same working directory will each have their own "private" directory.
This should either be a standard-compliant UUID or something with similar uniqueness properties. It should be highly unlikely that two plans created by the same team will be issued the same plan ID.
The generated plan ID is saved as part of the metadata in a saved plan file so that it can persist to an apply phase potentially running in a different working directory on a different computer.
2. A suitably-strong hash of the full address of the module instance where the `path.temp` expression was evaluated. This ensures that `path.temp` references under different module paths will each have their own private directory.
This should use a hashing algorithm where collisions between two different module addresses are highly unlikely but where the resulting string serialization is not excessively long, because these generated paths will inevitably appear in plan output and error messages and neither of those contexts is well-suited to presenting excessively long unbroken strings.
Using a hash rather than the literal address has a few related benefits. Firstly, typical hash algorithms generate output of a uniform length regardless of the length of the input, and so we can tailor our choice of hash algorithm and its string presentation to produce a predictable string length with a predictable degree of uniqueness. Secondly, a module instance address string potentially contains instance key strings which are allowed to contain any character in the Unicode repertoire, and attempting to map that character repertoire to directory names portably across operating systems and specific filesystems is a complex problem, whereas with a hash we can choose a more constrained character set that includes only characters that are likely to be treated compatibly across systems.
(This RFC intentionally doesn't nail down specific choices of unique ID format and hash format yet. If this idea seems promising then we'll experiment further with those details as part of building a proof-of-concept. The key tradeoff is uniqueness vs. length, to avoid presenting excessively-long path strings in the UI, and so generation techniques and string-encoding techniques are our main tools in making that tradeoff.)
With all of that taken together, `path.temp` in a particular module would return a string shaped like `.terraform/tmp/<PLAN-ID>/<MODULE-ADDR-HASH>`. As with the other `path.` symbols, module authors would be encouraged to concatenate further elements separated by forward slashes, since that syntax is the most portable across the operating systems that OpenTofu supports.
A "plan ID" technically needs to be generated only when there's at least one `path.temp` reference somewhere in the configuration, but the cost of generating and storing it is relatively low and so it's overall simpler to just always generate one for any newly-constructed [`plans.Changes`](https://pkg.go.dev/github.com/opentofu/opentofu@v1.8.3/internal/plans#Changes) object. The ID would be recorded initially as a field of that type, but also persisted in the saved plan file format by extending [`planfile.Create`](https://pkg.go.dev/github.com/opentofu/opentofu@v1.8.3/internal/plans/planfile#Create) and its opposite [`planfile.Reader.ReadPlan`](https://pkg.go.dev/github.com/opentofu/opentofu@v1.8.3/internal/plans/planfile#Reader.ReadPlan).
The per-module-instance hash portion can be generated deterministically from the [`addrs.ModuleInstance`](https://pkg.go.dev/github.com/opentofu/opentofu@v1.8.3/internal/addrs#ModuleInstance) representing the module where the expression is being evaluated. The language runtime's main implementation of [`lang.Data`](https://pkg.go.dev/github.com/opentofu/opentofu@v1.8.3/internal/lang#Data), used as the data source for expression evaluation, [already "knows" the address of the module instance it's evaluating within](https://github.com/opentofu/opentofu/blob/0d1e6cd5f0a23e9abdff8a583dce25c54c3701b3/internal/tofu/evaluate.go#L97-L99), and so [the `GetPathAttr` method](https://github.com/opentofu/opentofu/blob/0d1e6cd5f0a23e9abdff8a583dce25c54c3701b3/internal/tofu/evaluate.go#L576) has sufficient information to compute that portion of the path on request.
To ensure that the returned path is immediately ready to use, the `GetPathAttr` method must also attempt to create the directory it is intending to return if it doesn't already exist. If creation fails with any error other than `EEXIST` (or equivalent on non-POSIX platforms) then the evaluation of the `path.temp` reference expression fails with an error diagnostic, thereby halting further evaluation.
#### Adding files to the saved plan file format
OpenTofu's saved plan format is currently implemented as a ZIP archive with a specific convention for how the file paths inside are used. At the time of writing (circa OpenTofu 1.8) the current convention is to include:
* `tfplan`: A protocol-buffers-serialized representation of most of the `plans.Plan` data structure.
* `tfstate`: A state snapshot of the state as it was after all of the refresh steps were performed during planning, in the same JSON-based state snapshot format used for storage in state backends. The content of this represents what OpenTofu internals call the "prior state".
* `tfstate-prev`: A state snapshot of the state as it was before any refresh steps, in the same JSON-based state snapshot format. This includes the effect of OpenTofu's own state format upgrade rules and providers' own upgrade rules, but is not updated to capture any changes made in remote systems. The content of this represents what OpenTofu internals call the "previous run state".
* `.terraform.lock.hcl`: A direct copy of the dependency lock file that was present in the working directory when the plan was created. The contents of this must match the dependency lock file present when the plan is applied, to ensure that the plan is being applied with the same providers that created it.
* `tfconfig/`: A directory prefix containing copies of all of the `.tf`, `.tofu`, `.tf.json`, and `.tofu.json` source files that were used to compute the "desired state" for the plan. When applying a saved plan, OpenTofu uses this configuration snapshot instead of the loose files on disk.
This format was specified as a ZIP archive with the intention of it eventually growing to be used as a virtual filesystem so that it would be possible to apply a saved plan with nothing other than the saved plan file itself. However, that lofty goal turned out to be impractical because of the constraint that existing providers expect to be able to resolve paths directly from the real OS file system using direct system calls. Retaining only the OpenTofu source files was accepted as a compromise, since those files are read by OpenTofu CLI itself and thus OpenTofu _can_ use a virtual filesystem abstraction to read them directly from the `.zip` archive without extracting them first.
As a pragmatic compromise, this proposal does not propose to extend the virtual filesystem abstraction and does not try to solve for preserving _all_ possible files; instead we focus only on preserving files placed under the `path.temp` prefixes in particular. For modern modules written to exploit the features described in this document, `path.temp` should be the only place where the modules write files at plan time. Legacy modules following the current de-facto patterns will continue to require special handling of any generated files, but automation implementers can set their own policy for whether they intend to continue supporting such modules over time, depending on how well-adopted these features are in practical modules.
This proposal would introduce a new directory prefix `tmp/` in the saved plan zip files. Under that prefix is one directory for each distinct module instance address hash as described in the previous section. Within each such directory is a copy of every file that was present in the corresponding directory under `.terraform/tmp` after the plan phase completed.
OpenTofu will also preserve file metadata at a similar level of detail to what a Git tree can capture, which includes distinguishing between executable and non-executable files but _not_ preserving other metadata such as file ownership, read/write permissions, and extended attributes. Symlinks are allowed only if the target is specified as a relative path and the relative path does not traverse out of the bounds of a particular module instance prefix. Empty directories are not preserved.
As implied earlier, the existing `tfplan` file would also grow to include a new field for the pseudorandomly-generated "plan ID", which we can assume to be common across all `path.temp` expansions generated in a particular plan.
#### Recovering the temporary files during a separate apply phase
When using `tofu apply` with a saved plan file, Tofu CLI would check the plan file for any entries with the `tmp/` prefix. If any are present, it would retrieve the plan ID from the `tfplan` file and append it to `.terraform/tmp` to reconstruct the base directory path into which all of the files under the `tmp/` prefix need to be extracted.
Before beginning the true apply phase in the language runtime, Tofu CLI must ensure that the directory structure under the generated prefix matches the files present under `tmp/` in the plan file, which includes creating any files that aren't already present, updating any pre-existing files to match the plan file, and deleting any files that are not present in the plan file. The ultimate goal is to create the illusion that the plan is being applied in the same filesystem where it was created, so that file paths that were captured as strings in the OpenTofu execution plan can all resolve to equivalent files in the apply execution environment.
The `.zip` archive format can preserve metadata at a greater level of detail than we committed to support in the previous section. Although OpenTofu itself will never construct a plan file using other metadata, the extraction code must be robust against maliciously-tampered plan files and will fail with an error if it encounters metadata outside the limits of what the plan file writer would encode.
OpenTofu does not offer any compatibility in plan file format between versions of OpenTofu: it's assumed that a plan will always be applied with exactly the same version of OpenTofu and exactly the same provider versions that created it. It's also assumed that nothing other than OpenTofu directly reads or writes the saved plan file format. Therefore any details described in this and the previous section can change in future versions of OpenTofu if we learn that it would be beneficial to capture files at a different level of detail or to represent the same information in a different way.
#### For Combined Plan/Apply
The previous two sections were focused on the automation-oriented workflow of saving a plan to a file and then applying it at some later time, possibly in a different directory on a different computer.
When _not_ using automation, most users just run `tofu apply` without a saved plan file and thus they effectively run both the plan phase and the apply phase together as a single command. In this case we can assume that plan and apply are both running in the same working directory on the same computer, and thus the content of `.terraform/tmp` will be faithfully preserved between the two phases without any special effort.
Therefore the CLI layer's handling of combined plan/apply does not need to change at all: the language runtime will produce the same results for `path.temp` references in the plan phase as in the apply phase, and so both phases will refer to the same files.
### Open Questions
- `TF_DATA_DIR` handling: By far the most common case is to let OpenTofu select the default subdirectory `.terraform` as the place to keep transient working directory state. However, OpenTofu already allows setting the environment variable `TF_DATA_DIR` to select a different place to keep all of that information.
This mechanism already causes some minor problems in separated plan/apply because the `TF_DATA_DIR` value becomes encoded as part of any `path.module` results for remote modules that have been cached under whatever prefix is acting as the `.terraform/modules` directory. Unless `TF_DATA_DIR` is set identically both during plan and apply, the apply phase can potentially fail trying to read modules from the wrong location.
I have assumed in this RFC that we are willing to accept a similar level of potential-brokenness when what would normally be the `.terraform/tmp` directory is redirected elsewhere using this environment variable. As with the situation with module package directories, it would be the user's responsibility to ensure they are using `TF_DATA_DIR` in a reasonable way. Any reasonable technique that already works correctly with `.terraform/modules` should also work for `.terraform/tmp`.
Is that an acceptable tradeoff for this situation? Are there additional constraints unique to `path.temp` that would not have already been encountered by those using `path.module` in conjunction with `TF_DATA_DIR`?
- Including all of the files from under `.terraform/tmp` in saved plan files means that a typical saved plan file could be materially larger than we're accustomed to today.
There are already numerous ways that a troublemaking OpenTofu operator could cause a saved plan file to be excessively large, such as including a `.tf` file containing a 5GiB comment or loading a huge file into a resource instance argument using the `file` function.
However, I could accept the argument that this new feature actively _encourages_ including large files as part of saved plan files, whereas the existing situations where that occurs are likely to be accidental or exceptional.
The cost of transferring and storing saved plan files is felt most acutely by providers of OpenTofu collaboration services, so I would welcome their feedback on whether they feel concerned about a growth in the size of saved plan files and in particular whether they expect this to be more burdensome than whatever existing strategies they are using to allow files to be generated during planning and then used during the apply phase.
- I've framed this proposal as eventually making it easier for us to implement other features that would rely on module source packages being immutable at runtime (discussed further under "Future Considerations" below), but there is a notable gap: as long as there might be modules out there that modify themselves -- which I think we can assume is _forever_ -- anything that would rely on immutable module packages would need some way to distinguish between a package that is safe to treat as immutable and one that isn't.
An overall problem with any answer to this question is a problem of granularity: the _module installer_'s atomic unit is the "module package" -- a single artifact that is either downloaded in its entirety or not downloaded at all, such as a Git repository or a `.zip` file fetched over HTTP. A module package is allowed to contain any number of modules, and so if we want to classify a particular module package as safe-to-checksum or safe-to-share-source-directory that suggests a need to analyze both the initial requested module and any other modules in the same package that are reached by relative source addresses. Therefore a broad question to answer, regardless of details, is to decide the appropriate level of granularity for the decision: individual modules, or entire module packages?
On the individual-modules end of the scale, one partial answer is to assert that any module using `path.temp` _must_ be written for at least the version of OpenTofu that introduced this feature, and assert that such modules are required to be immutable-friendly. However, that is both a rather tenuous connection (for example, a single module package could contain a mixture of immutable-friendly and -unfriendly modules) and also provides an incomplete signal: it can't tell us whether a module that _doesn't_ use `path.temp` is intended to be immutable-compatible.
With a whole-module-package lens instead, another potential answer is to have immutable-friendly module _packages_ explicitly mark themselves as such in some way, which would then opt in to the new treatments. However, we have no existing precedent for module-package-level metadata. We could potentially establish a metadata file such as `.tofu-pkg` that can be optionally placed at the root of a module package and provide package-level metadata. Other problems that such a file could help solve have arisen intermittently over the years, but no single one has been strong enough to carry the weight of introducing it so far.
The `path.temp` feature itself does not actually depend on there being a way to determine whether a particular module package can be treated immutably, but I motivated this proposal by being an increment towards two other proposals that _do_ need that, and so the incomplete answer here feels unsatisfying.
- Do we need some explicit command for cleaning up the `.terraform/tmp` directory as a whole, or cleaning up a single plan-specific directory under it?
It has always been a design gap in OpenTofu that anything left behind on a user's system after `tofu apply` completes is the user's problem to clean up. For example, any files written into `path.module` or `path.cwd` using today's patterns just get left behind in the working directory after the apply is complete, with no explicit way to clean them up and return to the initial filesystem state.
It's tempting to argue that `.terraform/tmp` should just be treated in the same way: your configuration will write things in there and then they'll just stay there until you clean them up manually at some point. However, there is one notable difference: with the `path.module` pattern it's typical for subsequent plans to simply clobber over the files left behind by previous rounds -- assuming the module doesn't react poorly to the presence of the new files -- but `.terraform/tmp` is intentionally designed so that each new plan/apply round generates a new directory, and so if running many consecutive rounds in the same working directory those directories will "pile up" and could eventually consume a lot of disk space.
Given that this is explicitly presented as a _temporary_ storage location -- where "temporary" in this case means only for the duration of a single plan/apply round -- we could potentially decide to have OpenTofu delete the directories automatically in the common case. The following is a hypothetical set of rules for how that could work:
- In the one-shot `tofu apply` case where the plan and apply phases run consecutively in a single execution, OpenTofu CLI would automatically delete the directory corresponding to the current plan once the apply phase is complete, unless the apply phase returns an error. If the apply phase is unsuccessful then the directory would be left in place to aid in debugging, so the user would need to manually delete it once they are satisfied.
- In the separated plan/apply case using saved plan files, CLI would delete the directory immediately after capturing its content into the saved plan file, on the assumption that it'll be recreated again if the operator chooses to apply the plan. An unsuccessful plan phase does not generate a saved plan file, and so in that case the loose files would be left on disk for debugging purposes. As with the previous point, a successful apply phase would also cause the files to be deleted, but an error during apply would leave the files behind for debugging.
- In any case where OpenTofu CLI exits with loose files left on disk due to an error, it would add a warning diagnostic that includes the path to the plan-specific temporary directory both so that the operator can be aware that it's been left there and to make sure they definitely have at least one readout of the randomly-selected plan ID to be able to differentiate the relevant directory from any others that might still be left there from previous failed plan/apply rounds.
I don't yet have a good signal for whether this extra behavior is actually needed. If we made an initial release that did _not_ automatically delete the directories and then retrofitted it later then that might be considered a breaking change for some esoteric use-cases where e.g. someone returns a `.terraform/tmp` path in a root module output value with the expectation of using it in a subsequent step outside of OpenTofu. Therefore I suspect we'll need to decide whether or not to include this behavior before the first release that includes this feature, so that the initial users will then design their uses of it around the constraints of the behavior we've selected.
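To make that esoteric use-case concrete, here is a minimal sketch of a root module whose output value exposes a path under the temporary directory. It assumes the proposed `path.temp` symbol and the `hashicorp/local` provider; the resource name, output name, and `report.json` filename are hypothetical and chosen only for illustration.

```hcl
# Hypothetical root module. `path.temp` is the symbol proposed in this RFC;
# "report.json" and the names below are illustrative assumptions.
resource "local_file" "report" {
  filename = "${path.temp}/report.json"
  content  = jsonencode({ status = "ok" })
}

output "report_path" {
  # A later pipeline step outside of OpenTofu might try to read this path.
  value = local_file.report.filename
}
```

If OpenTofu deleted the plan-specific directory after a successful apply, a later step reading `report_path` would find the file missing, which is the kind of expectation that would be hard to change after the first release.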
### Future Considerations
As noted back at the top, although this feature seems useful in its own right the motivation for proposing it _now_ was in helping to address one of the root problems that has made it harder to implement some other popularly-requested features. The reliance on a shared read/write filesystem is also a significant constraint for potential sandboxing of provider plugins. The following sections summarize those related concerns and how `path.temp` could help with those in the long run.
### Module packages in the dependency lock file
OpenTofu's `init` subcommand generates [a dependency lock file](https://opentofu.org/docs/language/files/dependency-lock/) to record the specific versions of external dependencies that were selected during the initialization process, with three main goals in mind:
- Running `tofu init` again in the same directory, or running it on another computer after cloning out the same source code, is guaranteed to select the same versions of each provider. This removed the need to abuse the version constraints mechanism (originally intended for describing compatibility with dependencies, not specific dependency _selections_) to avoid inadvertently adopting new versions of a dependency.
- If reinstalling a dependency that was previously seen and recorded, OpenTofu additionally guarantees that the packages that get installed remain identical to those that the origin registry had reported as being the "official" packages, using the developer's cryptographic signature.
If an attacker manages to modify the upstream copy of a package and also publishes a new signature covering their modified copy using a different private key, the checksums recorded in the dependency lock file allow OpenTofu to detect and report the inconsistency, so the operator can investigate further and decide how to proceed.
- If running plan and apply separately using a saved plan file, OpenTofu verifies that the plan is being applied with the same providers that it was created with. This is important for correctness, because the `ApplyResourceChange` RPC for a particular provider plugin is only required to accept as input a verbatim copy of something previously returned by `PlanResourceChange`, and providers can potentially misbehave in unpredictable ways if that guarantee isn't maintained.
At the time of originally designing the dependency lock file, I had intended it to cover both provider _and_ module packages. However, while external provider packages had been designed from the outset to be treated as immutable (due to them being cryptographically signed), the module installation mechanism has been around for considerably longer and was not designed with such ideas in mind.
By the time I came to designing and implementing the dependency lock file, the prevailing patterns for existing modules already included these workarounds of using `path.module` or `path.cwd` to directly modify a module's own directory, and so attempts to capture and record checksums for module packages were very troublesome and often generated spurious inconsistency errors in practice due to the general assumption that modules were free to modify themselves arbitrarily at runtime (along with other reasons that I'll touch on below for context).
Although the introduction of `path.temp` cannot immediately compel all existing modules to stop self-modifying, it does at least establish an alternative pattern that addresses what is by far the most common reason for a module to modify itself today, hopefully setting us on a path where in future a greater percentage of the most commonly-used modules are distributed in packages that can be assumed to be immutable, and thus be reliably covered by checksums in the dependency lock file.
However, I do want to be clear that this is far from the _only_ change that would be required for OpenTofu to provide the same full set of guarantees for module packages as it does for provider packages. The remaining obstacles are listed here for context, although none of them are related to `path.temp`:
- The module registry protocol does not have any mechanism for authors to sign their packages, so although we can generate checksums to determine whether a module package is still the same as the last time we saw it, we would not have any comparable idea of what is the "official" content of a module package.
(It is admittedly debatable whether such an idea is even needed for module packages, since they tend to be used internally within an organization far more than shared between organizations and it's easier to directly inspect source code than a binary package, but I mention it for completeness.)
- Whereas OpenTofu has only one distribution format for provider packages -- a `.zip` archive containing the plugin executable and any associated supporting files -- [OpenTofu supports a wide variety of different installation methods for modules](https://opentofu.org/docs/language/modules/sources/).
Each of those installation methods adds some additional variation into proceedings. For example, installing directly from a Git repository causes the resulting package directory to contain a `.git` directory containing repository objects and metadata, and so a naive checksum calculation would produce a different result than if identical module package content had been installed from a `.zip` archive in an S3 bucket. Normalizing those differences away is possible with some care, but there are far more different cases to cover than for providers.
A design goal of the dependency lock file was that it should lock the _content_ of the dependencies separate from their current physical source location, because it's inevitable that from time to time teams will need to make cross-cutting changes to their package distribution strategy -- for example, switching to a different vendor for the underlying service -- and having to bulk-update all locked dependencies across all configurations would both be highly annoying and would also undermine the "trust on first use" guarantees that the lock file aims to provide because a team making a large number of updates all at once is far less likely to inspect them all carefully.
- Whereas OpenTofu requires an entire configuration to agree on a single version to use for each provider, currently each `module` block is able to make its own independent selection about which version of a module to use.
This implies that a dependency locking model for module packages would likely require an additional level of indirection, where the first level specifies which version number (or equivalent) each `module` block has selected, and then the second level captures the expected checksums for each distinct selected version.
This is not a technical blocker but is still some extra complexity that needs to be considered when finally designing the feature.
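As a rough sketch of that two-level structure, lock file entries for modules might look something like the following. This is purely hypothetical: the `module_call` and `module_package` block types, the addresses, and the version numbers are invented for illustration and do not exist in today's lock file format. Checksum values are elided.

```hcl
# HYPOTHETICAL: none of these block types exist in the current lock file.

# First level: the version selected by each `module` block.
module_call "module.network" {
  package = "example/network/aws"
  version = "2.1.0"
}

module_call "module.network_legacy" {
  package = "example/network/aws"
  version = "1.4.2"
}

# Second level: expected checksums for each distinct selected version.
module_package "example/network/aws" "2.1.0" {
  hashes = ["h1:..."]
}

module_package "example/network/aws" "1.4.2" {
  hashes = ["h1:..."]
}
```

Two calls that select the same version would share a single second-level entry, so the checksums are recorded once per distinct package version rather than once per call.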
### Sharing module package directories in the local cache
Ever since the earliest versions of OpenTofu's module installer, the system has created a separate directory under `.terraform/modules` for each `module` block. The exact details of how they are named and how they get placed there have evolved over the years, but this overall approach is what enabled the de-facto pattern of writing files into `path.module`.
The introduction of `count` and `for_each` for modules raised the question of whether each instance of a module also needed its own directory, for the same reason. Ultimately I proposed the compromise that exists today: no existing valid module could possibly be using `count` or `for_each` in a `module` block and so it's acceptable to make a breaking change, and at that time I was already anticipating making a proposal just like this one as the more general solution that would work for multi-instance module calls.
However, it's now been quite some years since that change and we find ourselves in the awkward situation where any module that is designed to write into `path.module` must be very carefully designed if it also wants to be compatible with `count` and `for_each`. At the time I had hoped that this limitation would both discourage further writing into `path.module` _and_ motivate prioritizing `path.temp`, but alas module authors are resourceful and worked hard to implement workarounds that therefore implicitly de-prioritized the more complete solution.
However, more recently operators have noticed that the structure of the `.terraform/modules` directory is wasteful of space and expensive to build in situations where a configuration depends on many different parts of the same source package. That situation is particularly troublesome in environments that place all of their modules together in the same Git repository, because the whole repository is OpenTofu's unit of "module package" and so `.terraform/modules` ends up containing an entire copy of the Git repository for every single `module` block that refers to it.
Implementing this proposal would immediately provide a better alternative for writing modules that both generate files on disk _and_ need to be used with `count` or `for_each`. This proposal does not _directly_ enable reusing the same cache directory for multiple calls to the same module, but it could potentially allow offering that approach as an opt-in behavior at first -- for users who know that all of their modules exclusively generate files in `path.temp` -- and hopefully eventually that becomes the common case rather than the exception and we would find it appropriate to pivot the default so that it's an opt-out for the few people who need to continue using legacy modules.
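As an illustration of the difference for multi-instance modules, the sketch below contrasts one plausible form of today's workaround with the equivalent under this proposal. The `instance_key` variable and the `.generated` subdirectory are hypothetical names chosen for this sketch, not an established convention, and `path.temp` is the symbol proposed here.

```hcl
# Today's workaround (one possible form): all instances of the module share
# a single directory, so the caller must pass in something unique to keep
# instances from clobbering each other's files.
variable "instance_key" {
  type = string
}

data "archive_file" "legacy" {
  type        = "zip"
  source_dir  = "${path.module}/lambda-sources"
  output_path = "${path.module}/.generated/${var.instance_key}.zip"
}

# Under this proposal: each module instance would get its own temporary
# directory, so a fixed filename suffices and the module's source directory
# is left unmodified.
data "archive_file" "proposed" {
  type        = "zip"
  source_dir  = "${path.module}/lambda-sources"
  output_path = "${path.temp}/lambda-sources.zip"
}
```

Only the second form leaves the module package directory untouched, which is the property that sharing a cache directory between calls would depend on.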
### Provider Sandboxing
Since the introduction of automatic installation of third-party providers there has been interest in renegotiating the level of access that a provider plugin has to the system where it is running. Provider plugins are entirely separate processes managed by the operating system and so the only constraints they operate under are those provided by the surrounding operating system.
This proposal builds on one particular implication of the current design: that OpenTofu Core and all of the provider plugins all have a common view of the OS filesystem, and so these components are able to collaborate by reading and writing files and assuming that filesystem paths generated in one component will resolve to the same file in another.
This proposal therefore does not materially change the current security model, for better or worse: there would now be an additional directory prefix in the filesystem that module authors are encouraged to use as _the place_ for filesystem-based collaboration, but the ability to use that directory is a logical consequence of the current design rather than something new this proposal is adding.
If a future RFC defines some mechanism for "sandboxing" plugins and/or OpenTofu Core in a stricter way than occurs today, then one or both of the following would need to be true whether or not we implement what I described in this RFC:
- When running existing providers as currently published, if the plugins no longer have a common view of the _entire_ OS filesystem then they must at least have _some_ shared directory prefix in common or else we'd break currently-working configurations.
If `path.temp` were implemented and were in wide use by the time we are considering sandboxing, we can presumably arrange for `path.temp` to be within the portion of the filesystem that is shared and we _might_ be able to segregate all other directories. If `path.temp` were not implemented and we did not implement anything similar to it, then keeping existing configurations working would certainly require the full set of directories containing _configuration_ to be shared, so that `path.module` and `path.cwd` would both refer to directories that all of the components have access to.
It's worth noting that the requirement here is only that OpenTofu Core and the plugins have at least one common directory prefix they can all access. There is not necessarily any requirement that the directory in question be accessible to any other processes on the system, or that changes to it be visible outside of the OpenTofu Core and plugin processes. For example, a hypothetical sandbox implementation could layer a shared copy-on-write filesystem over all or part of the real filesystem and arrange for all of the processes to share it, but with nothing written to that filesystem visible to other processes on the system.
- Another possibility is that "sandbox-capable provider" is an entirely new kind of provider, with OpenTofu supporting both the current style and the new style of provider and letting users choose whether they are willing to risk running unsandboxed providers or if they'd prefer to restrict themselves to the new style of provider.
For example, we could decide that the new kind of provider is distributed as a WebAssembly module using the WASI API, which uses a capability-based design that is far more amenable to fine-grained control over exactly what is shared between components by minimizing or eliminating [ambient authority](https://en.wikipedia.org/wiki/Ambient_authority).
If that, or something similar to it, were the chosen approach then we have considerably more options, because existing providers can go on doing what they are currently doing while providers written in the new style would be created under the new set of constraints and so would not have the legacy assumption that they can just take a filesystem path as an argument and read from it.
Under this sandboxing model `path.temp` isn't necessarily applicable, but it's also not _harmful_ because providers written without the assumption of a shared filesystem would not be designed with the expectation to be passed paths to files in temporary directories anyway. `path.temp` would remain useful for the long-standing existing providers that are built to work with the filesystem and where ceasing that usage would be a breaking change, but new-style providers would presumably choose not to accept arbitrary filesystem paths as arguments at all.
With all of that said, it seems that although the assumption of a shared filesystem is certainly inconvenient for any effort to introduce further isolation between provider plugins, that assumption is long-standing and a hypothetical sandboxing proposal needs to deal with it regardless of whether this proposal is implemented. It is possible, though certainly not guaranteed, that this proposal would successfully promote using only a single specific directory prefix as the location for shared files, in which case the subsequent sandboxing proposal may find that it's acceptable only to share _that_ directory rather than somehow sharing the entire virtual filesystem tree.
## Potential Alternatives
### Do nothing at all
Nobody has directly asked for this feature yet in OpenTofu, and so it would be reasonable to argue that this proposal is a solution in search of a problem.
If this proposal were declined, it's quite possible that we would find some other way to reliably or heuristically distinguish immutable vs. mutable module packages, or to reduce the requirements for package-dir-sharing and dependency locking so that immutable package directories are unimportant.
My position is that this feature is useful enough in its own right that it'd be worth doing even if it _didn't_ potentially remove some roadblocks on other features, but of course others may disagree. Community feedback will presumably ultimately decide this.
### Skip including temporary files in saved plan files
It is technically not necessary to extend the saved plan format to include generated temporary files.
Any existing robust and general-purpose automation around OpenTofu must presumably already have _some_ mechanism for preserving a full snapshot of the working directory from the plan phase to the apply phase, because otherwise various real-world modules would not work properly in that automation.
Therefore arguably placing the temporary files into the saved plan files creates _more_ work for those existing systems: unless they modify their automation to explicitly carve `.terraform/tmp` out of their filesystem snapshots, they'll be redundantly storing the temporary files both in the saved plan file _and_ in the full working directory snapshot.
The proposal to include temporary files in saved plan files is primarily intended to help green-field systems that haven't been built yet. An entirely new OpenTofu user can build their modules to use `path.temp` as the only means of generating files (no legacy modules to support) and in return they have one fewer annoying detail to handle in their automation around OpenTofu.
I would personally assert that asking existing systems to endure a one-time burden of changing their systems to exclude the `.terraform/tmp` directory from filesystem snapshots is worth it to produce a more robust design for future implementers, but of course I do understand that many of these participants _indirectly benefit_ from OpenTofu being tricky to robustly automate and so might not wish to invest in even small development effort that could ultimately slightly drain their figurative moats.
### Add a value presentation heuristic to the plan renderer
This particular section is not really an "alternative" so much as a potential addition that I am, for now, proposing to postpone for later work if we choose to do it at all.
OpenTofu CLI's built-in plan renderer includes a set of rules for presenting values in the OpenTofu type system in a way that is hopefully relatively easy for a human reader to understand and review. Most of those rules are essentially just quoting and escaping, such as presenting strings either as quoted template literals or heredoc template literals.
But the renderer also includes some more advanced heuristics, such as noticing that a string can be parsed as a valid JSON object or array and presenting it in a form that is similar to the syntax for calling the built-in `jsonencode` function. This therefore allows the human reader to hopefully focus on the meaning of the data structure in terms of JSON concepts, rather than on the character-by-character serialization as JSON _syntax_.
This proposal might benefit from another such heuristic: the plan renderer could notice when a string has the prefix `.terraform/tmp` followed by two path segments that match the patterns `path.temp` uses for plan ID and module address hash. If it finds such a prefix, and the specific plan ID and module address hash match those that would be valid for the current plan, then present the string as including a `${path.temp}` interpolation instead of the literal path.
As with all heuristic presentations of this type, this is a tricky tradeoff: it bakes in an opinion that certain details about the plan are not actually important to be reviewed by a human reviewer. The existing JSON heuristic is founded in the assumption that it should not matter whether a JSON string is indented or not, for example, and experience has shown that to be broadly true. Likewise, it might be arguable that the exact sequence of characters used to represent the plan ID and the module source address is irrelevant, and really all that matters is that the file is being written into the designated temporary location.
However, given the uncertainty about whether this transformation would help or hinder, along with some quirks in the details such as what should happen if the string contains a path to _some other module instance's_ temporary directory, I recommend leaving the plan renderer rules unchanged for the first iteration and then learning from feedback whether it would be helpful to refine the plan renderer in a later release.
## Appendix: Use-case Examples
The following sections include some worked examples of how this proposed feature might be used in situations that would typically be handled using `path.module` or `path.cwd` in today's OpenTofu.
### Preparing a source package for AWS Lambda
(This is a fuller version of the example in the "Proposed Solution" section above.)
```hcl
terraform {
  required_providers {
    archive = {
      source = "hashicorp/archive"
    }
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Name of the S3 bucket that receives the built archive.
variable "lambda_artifact_bucket" {
  type = string
}

# Build the zip archive into this module instance's temporary directory.
data "archive_file" "sources" {
  source_dir  = "${path.module}/lambda-sources"
  output_path = "${path.temp}/lambda-sources.zip"
  type        = "zip"
}

# Upload the archive, keyed by its content hash.
resource "aws_s3_object" "sources" {
  bucket = var.lambda_artifact_bucket
  key    = "${data.archive_file.sources.output_sha256}/sources.zip"
  source = data.archive_file.sources.output_path
  etag   = filemd5(data.archive_file.sources.output_path)
}

resource "aws_lambda_function" "example" {
  function_name     = "example"
  s3_bucket         = aws_s3_object.sources.bucket
  s3_key            = aws_s3_object.sources.key
  s3_object_version = aws_s3_object.sources.version_id
  handler           = "index.example"
  runtime           = "nodejs18.x"
  # ... (remaining arguments, including the required "role", omitted)
}
```
"Scale-to-zero" compute products like AWS Lambda are sometimes used as a form of software glue between different products in cloud platforms like Amazon Web Services. AWS Lambda, as with several other services of this type, expects the user program to be submitted as a `.zip` archive that can be extracted into a directory in the execution environment.
For more "application-deployment-like" situations I would typically recommend a design where the zip archive is built in a separate artifact-building pipeline, and then passed to the OpenTofu configuration only as the location in S3 where the archive was already placed. However, when Lambda is used for smaller-scale needs that are tightly coupled to a specific infrastructure configuration it's excessive complexity to maintain a separate build process alongside any OpenTofu plan/apply automation, and so it's pragmatic and convenient to ask OpenTofu to build the source package "just in time" during the planning phase, using the `archive_file` data source from the `hashicorp/archive` provider in this case.
Today's versions of this pattern will typically use either `path.module` or `path.cwd`, or in some cases no prefix whatsoever in which case it's _more-or-less_ the same as using `path.cwd`. Using `path.temp` instead means that each instance of this module has its own separate directory in which to create a `lambda-sources.zip` file, and so multiple instances in the same configuration will not collide with one another and the module doesn't need to modify its own source directory or the root module's source directory to perform this task.
### Reading a file created by a local-exec provisioner
This particular use-case is more a workaround than a true pattern, but nonetheless it is a _popular_ workaround across many modules in use today.
Despite the advice that [provisioners are a last resort](https://opentofu.org/docs/language/resources/provisioners/syntax/#provisioners-are-a-last-resort), `local-exec` in particular is undeniably an attractive way to hack in a little "glue code" to compensate for a missing feature in a provider, or even to compensate for there being no provider at all for a particular target system.
```hcl
terraform {
  required_providers {
    null = {
      source = "hashicorp/null"
    }
    local = {
      source = "hashicorp/local"
    }
  }
}

resource "null_resource" "workaround" {
  triggers = {
    meta_file = "${path.temp}/infrastructure-meta.json"
  }

  provisioner "local-exec" {
    command = "our-special-in-house-provisioning-tool --out=${self.triggers.meta_file}"
  }
}

data "local_file" "meta" {
  filename = null_resource.workaround.triggers.meta_file
}

locals {
  meta = jsondecode(data.local_file.meta.content)
}
```
I don't think anyone _enjoys_ writing OpenTofu configuration like this, but sometimes you've gotta do what you've gotta do. Today this pattern would again typically be implemented using `path.module` or `path.cwd`, and as with the previous example switching to `path.temp` helps ensure that multiple instances of this module can always execute independently of one another and that the module source directories are not modified.
I think that "paving the cowpath" here is helpful, even though ideally this problem would be solved by writing an OpenTofu provider for "our special in-house provisioning tool". Perhaps one day we'll have another feature that helps with this sort of glue _without_ writing temporary files to disk, but `path.temp` makes a relatively straightforward drop-in replacement for this commonly-used pattern that will hopefully make it relatively easy and low-risk to adopt for existing modules.