opentofu

mirror of https://github.com/opentofu/opentofu.git synced 2024-12-30 10:47:14 -06:00

History

Paul Hinze 6b6b5a43c3 provider/aws: serialize SG rule access to fix race condition	2015-11-18 12:39:59 -06:00
mutexkv_test.go	provider/aws: serialize SG rule access to fix race condition	2015-11-18 12:39:59 -06:00
mutexkv.go	provider/aws: serialize SG rule access to fix race condition	2015-11-18 12:39:59 -06:00

Paul Hinze 6b6b5a43c3 provider/aws: serialize SG rule access to fix race condition

Because `aws_security_group_rule` resources are an abstraction on top of
Security Groups, they must interact with the AWS Security Group APIs in
a pattern that often results in lots of parallel requests interacting
with the same security group.

We've found that this pattern can trigger race conditions resulting in
inconsistent behavior, including:

 * Rules that report as created but don't actually exist on AWS's side
 * Rules that show up in AWS but don't register as being created
   locally, resulting in follow-up attempts to authorize the rule
   failing with Duplicate errors

Here, we introduce a per-SG mutex that must be held by any security
group rule before it is allowed to interact with AWS APIs. This protects
the space between `DescribeSecurityGroups` and `Authorize*` / `Revoke*`
calls, ensuring that no other rules interact with the SG during that
span.

The included test exposes the race by applying a security group with
lots of rules, which based on the dependency graph can all be handled in
parallel. This fails most of the time without the new locking behavior.

I've omitted the mutex from `Read`, since it is only called during the
Refresh walk when no changes are being made, meaning a bunch of parallel
`DescribeSecurityGroups` API calls should be consistent in that case.

2015-11-18 12:39:59 -06:00
