make op templates map based internally

2025-02-25 18:55:28 -06:00 · 2021-06-17 09:51:45 -05:00
parent 070086e655
commit beea2fdbf6
30 changed files with 726 additions and 277 deletions
--- a/devdocs/sketches/linearized/linearized.md
+++ b/devdocs/sketches/linearized/linearized.md
@@ -1,114 +1,97 @@
 # Linearized Operations

-NOTE: This is a sketch/work in progress and will not be suitable for
-earnest review until this notice is removed.
+NOTE: This is a sketch/work in progress and will not be suitable for earnest review until this
+notice is removed.

-Thanks to Seb and Wei for helping this design along with their discussions
-along the way.
+Thanks to Seb and Wei for helping this design along with their discussions along the way.

 See https://github.com/nosqlbench/nosqlbench/issues/136

-Presently, it is possible to stitch together rudimentary chained
-operations, as long as you already know how statement sequences, bindings
-functions, and thread-local state work. This is a significant amount of
-knowledge to expect from a user who simply wants to configure chained
-operations with internal dependencies.
+Presently, it is possible to stitch together rudimentary chained operations, as long as you already
+know how statement sequences, bindings functions, and thread-local state work. This is a significant
+amount of knowledge to expect from a user who simply wants to configure chained operations with
+internal dependencies.

-The design changes needed to make this easy to express are non-trivial and
-cut across a few of the extant runtime systems within nosqlbench. This
-design sketch will try to capture each of the requirements and approached
-sufficiently for discussion and feedback.
+The design changes needed to make this easier are non-trivial and cut across the runtime systems
+within nosqlbench. This design sketch will try to capture each of the requirements and approaches
+for discussion and feedback.

 # Sync and Async

 ## As it is: Sync vs Async

-The current default mode (without `async=`) emulates a request-per-thread
-model, with operations being planned in a deterministic sequence. In this
-mode, each thread dispatches operations from the sequence only after the
-previous one is fully completed, even if there is no dependence between
+The current default mode (without `async=`) emulates a request-per-thread model, with operations
+being planned in a deterministic sequence. In this mode, each thread dispatches operations from the
+sequence only after the previous one is fully completed, even if there is no dependence between
 them. This is typical of many applications, even today, but not all.

-On the other end of the spectrum is the fully asynchronous dispatch mode
-enabled with the `async=` option. This uses a completely different
-internal API to allow threads to juggle a number of operations. In
-contrast to the default mode, the async mode dispatches operations eagerly
-as long as the user's selected concurrency level is not yet met. This
-means that operations may overlap and also occur out of order with respect
-to the sequence.
+On the other end of the spectrum is the fully asynchronous dispatch mode enabled with the `async=`
+option. This uses a completely different internal API to allow threads to juggle a number of
+operations. In contrast to the default mode, the async mode dispatches operations eagerly as long as
+the user's selected concurrency level is not yet met. This means that operations may overlap and
+also occur out of order with respect to the sequence.

-Choosing between these modes is a hard choice that does not offer a
-uniform way of looking at operations. As well, it also forces users to
-pick between two extremes of all request-per-thread or all asynchronous,
-which is becoming less common in application designs, and at the very
-least does not rise to the level of expressivity of the toolchains that
-most users have access to.
+Choosing between these modes is a hard choice that does not offer a uniform way of looking at
+operations. It also forces users to pick between two extremes, all request-per-thread or all
+asynchronous, which is becoming less common in application designs, and at the very least does
+not rise to the level of expressivity of the toolchains that most users have access to.

 ## As it should be: Async with Explicit Dependencies

-* The user should be able to create explicit dependencies from one
-  operation to another.
-* Operations which are not dependent on other operations should be
-  dispatched as soon as possible within the concurrency limits of the
-  workload.
-* Operations with dependencies on other operations should only be
-  dispatched if the upstream operations completed successfully.
-* Users should have clear expectations of how error handling will occur
-  for individual operations as well as chains of operations.
+* The user should be able to create explicit dependencies from one operation to another.
+* Operations which are not dependent on other operations should be dispatched as soon as possible
+  within the concurrency limits of the workload.
+* Operations with dependencies on other operations should only be dispatched if the upstream
+  operations completed successfully.
+* Users should have clear expectations of how error handling will occur for individual operations as
+  well as chains of operations.
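
The dispatch rules above can be sketched with `java.util.concurrent.CompletableFuture`. This is only an illustration under stated assumptions: `DependentOpsSketch`, `dispatch`, and `dispatchAfter` are hypothetical names, not nosqlbench APIs, and the workload's concurrency limit is left out for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are nosqlbench APIs.
final class DependentOpsSketch {

    // An op with no upstream dependency is dispatched as soon as it is submitted.
    static <T> CompletableFuture<T> dispatch(Supplier<T> op) {
        return CompletableFuture.supplyAsync(op);
    }

    // A dependent op runs only if the upstream op completed successfully.
    // If the upstream failed, the dependent function is never invoked and
    // the returned future completes exceptionally instead.
    static <T, R> CompletableFuture<R> dispatchAfter(CompletableFuture<T> upstream,
                                                     Function<T, R> op) {
        return upstream.thenApplyAsync(op);
    }
}
```

Here a chain such as `dispatchAfter(dispatch(a), b)` only runs `b` once `a` has produced a value, while unrelated calls to `dispatch` remain free to overlap.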

 # Dependent Ops

-We are using the phrase _dependent ops_ to capture the notions of
-data-flow dependency between ops (implying linearization in ordering and
-isolation of input and output boundaries), successful execution, and data
-sharing within an appropriate scope.
+We are using the phrase _dependent ops_ to capture the notions of data-flow dependency between ops
+(implying linearization in ordering and isolation of input and output boundaries), successful
+execution, and data sharing within an appropriate scope.

 ## As it is: Data Flow

-Presently, you can store state within a thread local object map in order
-to share data between operations. This is using the implied scope of "
-thread local" which works well with the "sequence per thread, request per
-thread" model. This works because both the op sequence as well as the
-variable state used in binding functions are thread local.
+Presently, you can store state within a thread local object map in order to share data between
+operations. This is using the implied scope of "thread local" which works well with the
+"sequence per thread, request per thread" model. This works
+because both the op sequence as well as the variable state used in binding functions are thread
+local.

-However, it does not work well with the async mode, since there is no
-implied scope to tie the variable state to the op sequence. There can be
-many operations within a thread operating on the same state even
-concurrently. This may appear to function, but will create problems for
-users who are not aware of the limitation.
+However, it does not work well with the async mode, since there is no implied scope to tie the
+variable state to the op sequence. There can be many operations within a thread operating on the
+same state even concurrently. This may appear to function, but will create problems for users who
+are not aware of the limitation.

 ## As it should be: Data Flow

-* Data flow between operations should be easily expressed with a standard
-  configuration primitive which can work across all driver types.
-* The scope of data shared should be
-
-The scope of a captured value should be clear to users
+* Data flow between operations should be easily expressed with a standard configuration primitive
+  which can work across all driver types.
+* The scope of data shared should be clear to users when configuring op templates, and in any
+  diagnostic outputs from failed operations.

 ## As it is: Data Capture

-Presently, the CQL driver has additional internal operators which allow
-for the capture of values. These decorator behaviors allow for configured
-statements to do more than just dispatch an operation. However, they are
-not built upon standard data capture and sharing operations which are
-implemented uniformly across driver types. This makes scope management
-largely a matter of convention, which is ok for the first implementation (
+Presently, the CQL driver has additional internal operators which allow for the capture of values.
+These decorator behaviors allow for configured statements to do more than just dispatch an
+operation. However, they are not built upon standard data capture and sharing operations which are
+implemented uniformly across driver types. This makes scope management largely a matter of
+convention, which is ok for the first implementation (
 in the CQL driver) but not as a building block for cross-driver behaviors.

 # Injecting Operations

 ## As it is: Injecting Operations

-Presently operations are derived from statement templates on a
-deterministic op sequence which is of a fixed length known as the stride.
-This follows closely the pattern of assuming each operation comes from one
-distinct cycle and that there is always a one-to-one relationship with
-cycles. This has carried some weight internally in how metrics for cycles
-are derived, etc. There is presently no separate operational queue for
-statements except by modifying statements in the existing sequence with
-side-effect binding assignment. It is difficult to reason about additional
-operations as independent without decoupling these two into separate
-mechanisms.
+Presently operations are derived from statement templates on a deterministic op sequence which is of
+a fixed length known as the stride. This follows closely the pattern of assuming each operation
+comes from one distinct cycle and that there is always a one-to-one relationship with cycles. This
+has carried some weight internally in how metrics for cycles are derived, etc. There is presently no
+separate operational queue for statements except by modifying statements in the existing sequence
+with side-effect binding assignment. It is difficult to reason about additional operations as
+independent without decoupling these two into separate mechanisms.

 ## As it should be: Injecting Operations

@@ -126,8 +109,7 @@ Open concerns

 - before: variable state was per-thread
 - now: variable state is per opflow
- (opflow state is back-filled into thread local as the default
-  implementation)
+- (opflow state is back-filled into thread local as the default implementation)

 * gives scope for enumerating op flows, meaning you opflow 0... opflow (
  cycles/stride)
@@ -157,52 +139,46 @@ statements:

 ## Capture Syntax

-Capturing of variables in statement templates will be signified
-with `[varname]`. This examples represents the simplest case where the
-user just wants to capture a varaible. Thus the above is taken to mean:
+Capturing of variables in statement templates will be signified with `[varname]`. This example
+represents the simplest case where the user just wants to capture a variable. Thus the above is
+taken to mean:

 - The scope of the captured variable is the OpFlow.
- The operation is required to succeed. Any other operation which depends
-  on a `varname` value will be skipped and counted as such.
- The captured type of `varname` is a single object, to be determined
-  dynamically, with no type checking required.
- A field named `varname` is required to be present in the result set for
-  the statement that included it.
+- The operation is required to succeed. If it fails, any other operation which depends on a
+  `varname` value will be skipped and counted as such.
+- The captured type of `varname` is a single object, to be determined dynamically, with no type
+  checking required.
+- A field named `varname` is required to be present in the result set for the statement that
+  included it.
 - Exactly one value for `varname` is required to be present.
- Without other settings to relax sanity constraints, any other appearance
-  of `[varname]` in another active statement should yield a warning to the
-  user.
+- Without other settings to relax sanity constraints, any other appearance of `[varname]` in another
+  active statement should yield a warning to the user.

-All behavioral variations that diverge from the above will be signified
-within the capture syntax as a variation on the above example.
+All behavioral variations that diverge from the above will be signified within the capture syntax as
+a variation on the above example.
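
The default capture rules listed above (field present, exactly one value, no type checking) might be enforced along these lines. This is a minimal sketch, assuming a result set can be modeled as a list of row maps; `CaptureRuleSketch` and `captureOne` are hypothetical names, not part of any driver.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a result set is modeled here as a list of row maps.
final class CaptureRuleSketch {

    // Returns the single captured value for varname, or throws if the default
    // rules are not met: exactly one row, and the field present in that row.
    static Object captureOne(String varname, List<Map<String, Object>> rows) {
        if (rows.size() != 1) {
            throw new IllegalStateException(
                "expected exactly one value for '" + varname + "', found " + rows.size() + " rows");
        }
        Map<String, Object> row = rows.get(0);
        if (!row.containsKey(varname)) {
            throw new IllegalStateException(
                "field '" + varname + "' is not present in the result");
        }
        // The captured type stays dynamic; no type checking is done here.
        return row.get(varname);
    }
}
```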

 ## Inject Syntax

-Similar to binding tokens used in statement templates like '{varname}', it
-is possible to inject captured variables into statement templates with
-the `{[varname]}` syntax. This indicates that the user explicitly wants to
-pull a value directly from the captured variable. It is necessary to
-indicate variable capture and variable injection distinctly from each
-other, and this syntax supports that while remaining familiar to the
-bindings formats already supported.
+Similar to binding tokens used in statement templates like `{varname}`, it is possible to inject
+captured variables into statement templates with the `{[varname]}` syntax. This indicates that the
+user explicitly wants to pull a value directly from the captured variable. It is necessary to
+indicate variable capture and variable injection distinctly from each other, and this syntax
+supports that while remaining familiar to the bindings formats already supported.
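
To show that the two forms can be told apart mechanically, here is a small sketch that scans a statement template for capture points (`[varname]`) and injection points (`{[varname]}`). The class name, the method names, and the restriction of variable names to word characters are all assumptions made for illustration, not the parser used by nosqlbench.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; variable names are assumed to be word characters only.
final class CaptureSyntaxSketch {

    // `[varname]` that is not wrapped in braces marks a capture point.
    private static final Pattern CAPTURE = Pattern.compile("(?<!\\{)\\[(\\w+)\\](?!\\})");

    // `{[varname]}` marks an injection point.
    private static final Pattern INJECT = Pattern.compile("\\{\\[(\\w+)\\]\\}");

    static List<String> captureNames(String template) {
        return names(CAPTURE, template);
    }

    static List<String> injectNames(String template) {
        return names(INJECT, template);
    }

    // Collects the first group of every match, in order of appearance.
    private static List<String> names(Pattern pattern, String template) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(template);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

Ordinary binding tokens such as `{name}` match neither pattern, so they are left to the existing bindings handling.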

-The above syntax example represents the case where the user simply wants
-to refer to a variable of a given name. This is the simplest case, and is
-taken to mean:
+The above syntax example represents the case where the user simply wants to refer to a variable of a
+given name. This is the simplest case, and is taken to mean:

- The scope of the variable is not specified. The value may come from
-  OpFlow, thread, global or any scope that is available. By default,
-  scopes should be consulted with the shortest-lived inner scopes first
-  and widened only if needed to find the variable.
- The variable must be defined in some available scope. By default, It is
-  an error to refer to a variable for injection that is not defined.
- The type of the variable is not checked on access. The type is presumed
-  to be compatible with any assignments which are made within whatever
-  driver type is in use.
+- The scope of the variable is not specified. The value may come from OpFlow, thread, global or any
+  scope that is available. By default, scopes should be consulted with the shortest-lived inner
+  scopes first and widened only if needed to find the variable.
+- The variable must be defined in some available scope. By default, it is an error to refer to a
+  variable for injection that is not defined.
+- The type of the variable is not checked on access. The type is presumed to be compatible with any
+  assignments which are made within whatever driver type is in use.
 - The variable is assumed to be a single-valued type.

-All behavioral variations that diverge from the above will be signified
-within the variable injection syntax as a variation on the above syntax.
+All behavioral variations that diverge from the above will be signified within the variable
+injection syntax as a variation on the above syntax.
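
The default lookup order described above (shortest-lived scope first, widening only if needed, error when undefined) could look roughly like this. It is a sketch under assumptions: scopes are modeled as plain maps ordered from innermost to outermost, and `ScopeLookupSketch` and `resolve` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; each scope is modeled as a plain map of variables.
final class ScopeLookupSketch {

    // Scopes are ordered from shortest-lived (for example OpFlow) to
    // longest-lived (for example global). The first scope that defines
    // the variable wins; wider scopes are consulted only if needed.
    static Object resolve(String varname, List<Map<String, Object>> scopesInnerFirst) {
        for (Map<String, Object> scope : scopesInnerFirst) {
            if (scope.containsKey(varname)) {
                return scope.get(varname);
            }
        }
        // By default, injecting an undefined variable is an error.
        throw new IllegalArgumentException(
            "variable '" + varname + "' is not defined in any available scope");
    }
}
```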

 ## Scenarios to Consider

@@ -215,22 +191,18 @@ advanced scenarios:
 - some ops may be required to produce a value
 - some ops may be required to produce multiple values

-* The carrier of op state should enable the following programmatic
-  constructions:
+* The carrier of op state should enable the following programmatic constructions:
    * Metric measuring the service time of the op on failure
    * Metric measuring the service time of the op on success
    * Metric measuring the size of the op on success
-    * Hooks for transforming or acting upon the op or cycle before the op
-      executes
-    * Hooks for transforming or acting upon the op or cycle after the op
-      executes, regardless of result
+    * Hooks for transforming or acting upon the op or cycle before the op executes
+    * Hooks for transforming or acting upon the op or cycle after the op executes, regardless of
+      result
    * Additional modifiers on the op, as in transformers.

-* All op contextual actions should be presented as a function on the op
-  type
+* All op contextual actions should be presented as a function on the op type

-* Completion Stages that support the op API should come from built-in
-  template implementations that already include metrics options, logging
-  support, etc.
+* Completion Stages that support the op API should come from built-in template implementations that
+  already include metrics options, logging support, etc.