make op templates map based internally

Jonathan Shook
2021-06-17 09:51:45 -05:00
parent 070086e655
commit beea2fdbf6
30 changed files with 726 additions and 277 deletions


@@ -1,114 +1,97 @@
# Linearized Operations
NOTE: This is a sketch/work in progress and will not be suitable for earnest review until this
notice is removed.
Thanks to Seb and Wei for helping this design along with their discussions along the way.
See https://github.com/nosqlbench/nosqlbench/issues/136
Presently, it is possible to stitch together rudimentary chained operations, as long as you already
know how statement sequences, binding functions, and thread-local state work. This is a significant
amount of knowledge to expect from a user who simply wants to configure chained operations with
internal dependencies.
The design changes needed to make this easier are non-trivial and cut across the runtime systems
within nosqlbench. This design sketch will try to capture each of the requirements and approaches
for discussion and feedback.
# Sync and Async
## As it is: Sync vs Async
The current default mode (without `async=`) emulates a request-per-thread model, with operations
being planned in a deterministic sequence. In this mode, each thread dispatches operations from the
sequence only after the previous one is fully completed, even if there is no dependence between
them. This is typical of many applications, even today, but not all.
On the other end of the spectrum is the fully asynchronous dispatch mode enabled with the `async=`
option. This uses a completely different internal API to allow threads to juggle a number of
operations. In contrast to the default mode, the async mode dispatches operations eagerly as long as
the user's selected concurrency level is not yet met. This means that operations may overlap and
also occur out of order with respect to the sequence.
Choosing between these modes is a hard choice, and neither mode offers a uniform way of looking at
operations. It also forces users to pick between two extremes, all request-per-thread or all
asynchronous. That all-or-nothing split is becoming less common in application designs, and at the
very least it does not rise to the level of expressivity of the toolchains that most users have
access to.
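For contrast with the default mode, here is a minimal Java sketch of async-style dispatch. Ops are submitted eagerly until a concurrency limit is reached, so they may overlap and complete out of sequence order. The `BoundedAsyncDispatcher` name and the `Semaphore`-based limit are illustrative assumptions, not nosqlbench's internal async API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: eager async dispatch bounded by a user-selected concurrency level.
final class BoundedAsyncDispatcher {
    private final Semaphore permits;
    private final ExecutorService executor;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    BoundedAsyncDispatcher(int concurrency, ExecutorService executor) {
        this.permits = new Semaphore(concurrency);
        this.executor = executor;
    }

    /** Dispatches immediately unless the concurrency limit is reached, in which case it waits. */
    CompletableFuture<Void> dispatch(Runnable op) {
        permits.acquireUninterruptibly();
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        return CompletableFuture.runAsync(op, executor)
            .whenComplete((ignored, error) -> {
                // decrement before releasing, so the in-flight count never exceeds the limit
                inFlight.decrementAndGet();
                permits.release();
            });
    }

    /** Highest number of ops observed in flight at once. */
    int maxObservedInFlight() {
        return maxInFlight.get();
    }
}
```

In the default (sync) mode, by comparison, each thread calls each op's `run()` in sequence, so ops never overlap.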
## As it should be: Async with Explicit Dependencies
* The user should be able to create explicit dependencies from one operation to another.
* Operations which are not dependent on other operations should be dispatched as soon as possible
within the concurrency limits of the workload.
* Operations with dependencies on other operations should only be dispatched if the upstream
operations completed successfully.
* Users should have clear expectations of how error handling will occur for individual operations as
well as chains of operations.
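The dispatch rules above can be sketched with `CompletableFuture`. Independent ops start immediately. A dependent op runs only if its upstream completed normally; otherwise it is counted as skipped. `OpChain`, `independent`, and `dependsOn` are hypothetical names chosen for this sketch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of explicit op dependencies with skip accounting.
final class OpChain {
    private final AtomicInteger skipped = new AtomicInteger();

    /** An op with no dependencies: dispatched as soon as possible. */
    <T> CompletableFuture<T> independent(Supplier<T> op) {
        return CompletableFuture.supplyAsync(op);
    }

    /** An op that runs only if its upstream op completed successfully. */
    <T, R> CompletableFuture<R> dependsOn(CompletableFuture<T> upstream, Function<T, R> op) {
        return upstream.handle((value, error) -> {
            if (error != null) {
                // upstream failed (or was itself skipped): skip this op and count it
                skipped.incrementAndGet();
                throw new CompletionException("skipped: upstream op failed", error);
            }
            return op.apply(value);
        });
    }

    int skippedCount() {
        return skipped.get();
    }
}
```

A skipped op also completes exceptionally, so skips propagate down a chain and each op in a failed chain is counted once.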
# Dependent Ops
We are using the phrase _dependent ops_ to capture the notions of data-flow dependency between ops
(implying linearization in ordering and isolation of input and output boundaries), successful
execution, and data sharing within an appropriate scope.
## As it is: Data Flow
Presently, you can store state within a thread-local object map in order to share data between
operations. This uses the implied scope of "thread local", which works well with the "sequence per
thread, request per thread" model. This works because both the op sequence and the variable state
used in binding functions are thread local.
However, it does not work well with the async mode, since there is no implied scope to tie the
variable state to the op sequence. Many operations within a thread may operate on the same state,
even concurrently. This may appear to work, but it will create problems for users who are not aware
of the limitation.
## As it should be: Data Flow
* Data flow between operations should be easily expressed with a standard configuration primitive
which can work across all driver types.
* The scope of data shared should be clear to users when configuring op templates, and in any
diagnostic outputs from failed operations.
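One way to make the scope of shared data explicit is to give each op flow its own variable container, instead of a thread-local map, and to name that scope in diagnostics. The following is a minimal sketch. `OpFlowState` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: variable state owned by one op flow rather than by a thread.
final class OpFlowState {
    private final long flowId;
    // concurrent map because ops of the same flow may complete on different threads
    private final Map<String, Object> vars = new ConcurrentHashMap<>();

    OpFlowState(long flowId) {
        this.flowId = flowId;
    }

    /** Stores a captured value; ConcurrentHashMap rejects null values with NullPointerException. */
    void capture(String name, Object value) {
        vars.put(name, value);
    }

    /** Reads a value, naming the scope in the error so failed ops have useful diagnostics. */
    Object get(String name) {
        Object value = vars.get(name);
        if (value == null) {
            throw new NoSuchElementException(
                "variable '" + name + "' not found in scope opflow " + flowId);
        }
        return value;
    }
}
```

Two flows that happen to run on the same thread no longer see each other's values, which is the failure mode of the thread-local approach in async mode.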
## As it is: Data Capture
Presently, the CQL driver has additional internal operators which allow for the capture of values.
These decorator behaviors allow configured statements to do more than just dispatch an operation.
However, they are not built upon standard data capture and sharing operations that are implemented
uniformly across driver types. This makes scope management largely a matter of convention, which is
ok for the first implementation (in the CQL driver) but not as a building block for cross-driver
behaviors.
# Injecting Operations
## As it is: Injecting Operations
Presently, operations are derived from statement templates on a deterministic op sequence of a fixed
length known as the stride. This closely follows the pattern of assuming that each operation comes
from one distinct cycle, and that there is always a one-to-one relationship between operations and
cycles. This has carried some weight internally in how metrics for cycles are derived, among other
things. There is presently no separate operational queue for statements, except by modifying
statements in the existing sequence with side-effect binding assignment. It is difficult to reason
about additional operations as independent without decoupling these two into separate mechanisms.
## As it should be: Injecting Operations
@@ -126,8 +109,7 @@ Open concerns
- before: variable state was per-thread
- now: variable state is per opflow
- (opflow state is back-filled into thread local as the default implementation)
* gives scope for enumerating op flows, meaning you have opflow 0 ... opflow (cycles/stride)
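The opflow enumeration above can be sketched as simple arithmetic. This assumes that each opflow corresponds to exactly one pass through the stride-length op sequence. The class and method names are hypothetical.

```java
// Hypothetical sketch: cycles grouped into opflows of one stride each.
final class OpFlowIndex {
    private OpFlowIndex() {}

    /** Index of the opflow that a cycle belongs to (0, 1, 2, ...). */
    static long opflowOf(long cycle, int stride) {
        check(cycle, stride);
        return cycle / stride;
    }

    /** Position of the cycle's op within its opflow, i.e. its index in the op sequence. */
    static int positionInFlow(long cycle, int stride) {
        check(cycle, stride);
        return (int) (cycle % stride);
    }

    /** Number of complete opflows in a cycle range of the given size. */
    static long opflowCount(long cycles, int stride) {
        check(cycles, stride);
        return cycles / stride;
    }

    private static void check(long value, int stride) {
        if (value < 0 || stride <= 0) {
            throw new IllegalArgumentException("cycle must be >= 0 and stride must be > 0");
        }
    }
}
```

With 9 cycles and a stride of 3, cycles 6, 7, and 8 form opflow 2, the last of three opflows.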
@@ -157,52 +139,46 @@ statements:
## Capture Syntax
Capturing of variables in statement templates will be signified with `[varname]`. This example
represents the simplest case, where the user just wants to capture a variable. Thus the above is
taken to mean:
- The scope of the captured variable is the OpFlow.
- The operation is required to succeed. Any other operation which depends on a `varname` value will
be skipped and counted as such.
- The captured type of `varname` is a single object, to be determined dynamically, with no type
checking required.
- A field named `varname` is required to be present in the result set for the statement that
included it.
- Exactly one value for `varname` is required to be present.
- Without other settings to relax sanity constraints, any other appearance of `[varname]` in another
active statement should yield a warning to the user.
All behavioral variations that diverge from the above will be signified within the capture syntax as
a variation on the above example.
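The capture rules above imply a parsing step that finds `[varname]` markers while leaving `{[varname]}` injection tokens alone. This sketch rests on two assumptions: capture names are plain word characters, and the dispatched statement uses the bare field name in place of the marker. `CaptureParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of capture-marker parsing for statement templates.
final class CaptureParser {
    // [name] not immediately preceded by '{' nor followed by '}', so {[name]} injections are skipped
    private static final Pattern CAPTURE = Pattern.compile("(?<!\\{)\\[(\\w+)\\](?!\\})");

    private CaptureParser() {}

    /** Names of all variables the template asks to capture, in order of appearance. */
    static List<String> captureNames(String template) {
        List<String> names = new ArrayList<>();
        Matcher m = CAPTURE.matcher(template);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** The statement as dispatched: each [name] marker replaced by the bare field name. */
    static String dispatchForm(String template) {
        return CAPTURE.matcher(template).replaceAll("$1");
    }
}
```

A validator built on `captureNames` could also implement the duplicate-capture warning described above by checking names across all active statements.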
## Inject Syntax
Similar to binding tokens used in statement templates like `{varname}`, it is possible to inject
captured variables into statement templates with the `{[varname]}` syntax. This indicates that the
user explicitly wants to pull a value directly from the captured variable. It is necessary to
indicate variable capture and variable injection distinctly from each other, and this syntax
supports that while remaining familiar to the bindings formats already supported.
The above syntax example represents the case where the user simply wants to refer to a variable of a
given name. This is the simplest case, and is taken to mean:
- The scope of the variable is not specified. The value may come from the OpFlow, thread, global, or
  any other available scope. By default, scopes should be consulted with the shortest-lived inner
  scopes first and widened only if needed to find the variable.
- The variable must be defined in some available scope. By default, it is an error to refer to a
  variable for injection that is not defined.
- The type of the variable is not checked on access. The type is presumed to be compatible with any
  assignments which are made within whatever driver type is in use.
- The variable is assumed to be a single-valued type.
All behavioral variations that diverge from the above will be signified within the variable
injection syntax as a variation on the above syntax.
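The default injection semantics above can be sketched as a lookup over an ordered list of scopes. The innermost scope is consulted first, the search widens only as needed, and an undefined variable is an error. For simplicity this sketch renders values into the statement text, whereas a real driver would more likely bind them as typed parameters. `InjectResolver` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of {[varname]} injection with innermost-first scope lookup.
final class InjectResolver {
    private static final Pattern INJECT = Pattern.compile("\\{\\[(\\w+)\\]\\}");

    private InjectResolver() {}

    /** Scopes are ordered from innermost (e.g. opflow) to outermost (e.g. global). */
    static String render(String template, List<? extends Map<String, ?>> scopes) {
        Matcher m = INJECT.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = lookup(m.group(1), scopes);
            // quoteReplacement keeps '$' or '\' in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns the value from the first scope that defines the name; undefined is an error. */
    static Object lookup(String name, List<? extends Map<String, ?>> scopes) {
        for (Map<String, ?> scope : scopes) {
            if (scope.containsKey(name)) {
                return scope.get(name);
            }
        }
        throw new IllegalArgumentException(
            "variable '" + name + "' is not defined in any available scope");
    }
}
```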
## Scenarios to Consider
@@ -215,22 +191,18 @@ advanced scenarios:
- some ops may be required to produce a value
- some ops may be required to produce multiple values
* The carrier of op state should enable the following programmatic constructions:
* Metric measuring the service time of the op on failure
* Metric measuring the service time of the op on success
* Metric measuring the size of the op on success
* Hooks for transforming or acting upon the op or cycle before the op executes
* Hooks for transforming or acting upon the op or cycle after the op executes, regardless of
result
* Additional modifiers on the op, as in transformers.
* All op contextual actions should be presented as a function on the op type
* Completion Stages that support the op API should come from built-in template implementations that
already include metrics options, logging support, etc.
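As a sketch of such a built-in template, the wrapper below adds before/after hooks and success/failure timing around any op exposed as a `CompletionStage`. The `AtomicLong` totals stand in for real timers and histograms. `InstrumentedOp` and its constructor parameters are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.function.ToLongFunction;

// Hypothetical op wrapper: metrics and hooks built in, so every op type gets them uniformly.
final class InstrumentedOp<R> {
    private final Supplier<CompletionStage<R>> op;  // dispatches the underlying operation
    private final ToLongFunction<R> sizer;          // measures result size on success
    private final Runnable beforeHook;              // runs before the op executes
    private final Consumer<Throwable> afterHook;    // runs after, regardless of result (null = success)

    // stand-ins for real metrics: total service time and count per outcome, plus result size
    final AtomicLong successNanos = new AtomicLong();
    final AtomicLong successCount = new AtomicLong();
    final AtomicLong failureNanos = new AtomicLong();
    final AtomicLong failureCount = new AtomicLong();
    final AtomicLong resultSize = new AtomicLong();

    InstrumentedOp(Supplier<CompletionStage<R>> op, ToLongFunction<R> sizer,
                   Runnable beforeHook, Consumer<Throwable> afterHook) {
        this.op = op;
        this.sizer = sizer;
        this.beforeHook = beforeHook;
        this.afterHook = afterHook;
    }

    CompletionStage<R> execute() {
        beforeHook.run();
        long start = System.nanoTime();
        CompletionStage<R> stage;
        try {
            stage = op.get();
        } catch (RuntimeException e) {
            // a synchronous throw is treated the same as an asynchronous failure
            stage = CompletableFuture.failedFuture(e);
        }
        return stage.whenComplete((result, error) -> {
            long elapsed = System.nanoTime() - start;
            if (error == null) {
                successNanos.addAndGet(elapsed);
                successCount.incrementAndGet();
                resultSize.addAndGet(sizer.applyAsLong(result));
            } else {
                failureNanos.addAndGet(elapsed);
                failureCount.incrementAndGet();
            }
            afterHook.accept(error);
        });
    }
}
```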