2021-08-10 10:34:26 -05:00
# NoSQLBench Driver Adapter Standards

This document is intended to replace the earlier driver standard guide, as the APIs in the upcoming
release are both more streamlined and more prescriptive. If you have built a driver in NoSQLBench
before, you will find that there is not much to it with the latest API updates.

---

This is the document to read if you want to know whether your NoSQLBench Driver Adapter is complete.
Within this document, the term `conformant` means that a DriverAdapter is implemented according to
the design intent and standards of this NoSQLBench guide and the API that it describes.

While it may be possible to partially implement a Driver Adapter for basic use, following the
guidelines in this document will ensure that contributed drivers for NoSQLBench work in a familiar
and reliable way for users from one driver to another.

Over time, the standards in this guide will be programmatically enforced by the NoSQLBench Driver
Adapter API.

## Terms

- Driver Adapter - The NoSQLBench-level driver adapter, the code that this document refers to.
- Native driver - An underlying driver which is provided by a vendor or project.

## Overview

The NoSQLBench runtime supports the use of multiple native drivers via a runtime layer called the
Driver Adapter. Each DriverAdapter implementation serves as a bridge between users and a native
driver API. For each operation specified by a user (as an op template), a Driver Adapter does the
following:

1) Examine the op fields and values to determine what type of operation a user intends. This process
   is called *op mapping*. (TODO: link to OpMapper JavaDoc)
2) Construct a dispenser object which can create instances of the specific type of operation
   determined above. Creating those instances is called *op synthesis*. (TODO: Link to OpDispenser
   JavaDoc)

The first (op mapping) is done at initialization time, and is responsible for all of the
pre-vetting of what users put into their op templates (via yaml, json, or whatever). As a mechanism
that determines user intent, clarity is paramount.

The second (op synthesis) is done for each and every cycle of an activity that is run, and must be
done efficiently to allow NoSQLBench to operate as an effective testing instrument. If the op
mapping is done correctly, there is no need for an op dispenser object to try to sort out what kind
of operation the user intended. An op dispenser simply synthesizes the fields together.

These two steps are closely related, although they have very distinct responsibilities. They are
connected directly in the API: the OpMapper dispenses OpDispensers (which dispense Ops). Although
they are connected in this way, you have *exact* control of when each happens in the lifecycle of an
activity. Everything in an op mapper and its _apply_ method *and* in the constructors of the op
dispensers that it creates is done before an activity starts (before the first cycle of that
activity). Everything after that (in the body of the OpDispenser apply method, for example) occurs
within a cycle of an activity.

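The lifecycle split described above can be sketched in plain Java. This is a minimal illustration
under stated assumptions, not the NoSQLBench API: the op template is reduced to a
`Map<String, String>`, the op is reduced to a `String`, and `OpLifecycleSketch`, `Mapper`,
`PutDispenser`, and the `key_prefix` field are all hypothetical names. Only the shape is taken from
the text: a mapper whose apply method returns a dispenser, and a dispenser whose apply method
returns an op for a cycle.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongFunction;

/** Hypothetical stand-ins for illustration; these are not the NoSQLBench OpMapper/OpDispenser types. */
final class OpLifecycleSketch {

    /** Created once per op template, before the first cycle; dispenses one op per cycle. */
    static final class PutDispenser implements LongFunction<String> {
        private final String keyPrefix; // resolved once, at construction time

        PutDispenser(Map<String, String> opTemplate) {
            this.keyPrefix = opTemplate.get("key_prefix");
        }

        /** Runs within a cycle: it only assembles the op and never re-examines user intent. */
        @Override
        public String apply(long cycle) {
            return "PUT " + keyPrefix + cycle;
        }
    }

    /** Runs at initialization: vets the op template and chooses a dispenser for it. */
    static final class Mapper implements Function<Map<String, String>, LongFunction<String>> {
        @Override
        public LongFunction<String> apply(Map<String, String> opTemplate) {
            if (!opTemplate.containsKey("key_prefix")) {
                throw new IllegalArgumentException("op template needs a 'key_prefix' field");
            }
            return new PutDispenser(opTemplate);
        }
    }

    public static void main(String[] args) {
        // Initialization phase: one mapping call per op template.
        LongFunction<String> dispenser = new Mapper().apply(Map.of("key_prefix", "user-"));
        // Cycle phase: one synthesis call per cycle.
        for (long cycle = 0; cycle < 3; cycle++) {
            System.out.println(dispenser.apply(cycle));
        }
    }
}
```

In this sketch, any mistake in the template surfaces in `Mapper.apply`, before the first cycle runs,
which is the point of keeping all vetting out of the dispenser's apply method.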
## Op Templates

Users create op templates by writing YAML, JSON, or data structures via the scripting API. In their
normative form, these are simply nested collections with string-based keys and values, as in any
data structure. NoSQLBench recognizes a standard format (TODO: LINK THIS)
which is the same across all driver types that can be used with NoSQLBench.

The op templates which are provided by users are normalized by NoSQLBench into a standard
representation that is used within the op mapping and synthesis steps. This representation is
provided by the ParsedOp and ParsedTemplate APIs. The user-facing construct is the _Op Template_,
while the developer building driver adapters only sees _Parsed Ops_ and a fully-normalized API.

## Effective Op Mapping

### Setting the Stage

Op mapping has specific inputs and specific outputs. On the input side, an op mapper can see as much
as the user specifies in the op template, and possibly more as provided to the op mapper's
constructor. Op mappers may need access to the activity's parameters or the activity's space cache.
These can be provided from the DriverAdapter base type when needed.

Assuming you provide the activity params and the space cache to an OpMapper implementation, when
its `apply(ParsedOp cmd)` method is called, you have access to a few levels of information:

1. The ParsedOp, representing the specific details of an operation to be performed:
    * op field names
    * static field values - literal values or any non-string collection type (map, set, list)
    * dynamic field values - any type which contains a string template or a single binding
2. Op params, specified outside the layer of the op payload above. These are static fields which
   users can specify outside of the op payload.
3. The activity params. Sometimes you want to provide an activity-wide default for how a type of
   operation works. When this applies, be sure to favor the op-specific parameters over any
   activity params.
4. The state cache for the DriverAdapter instance, AKA the space cache. (TODO: Link to javadocs
   for this)

Since op mapping logic is responsible for creating the op dispenser, an op mapper must pass along
any state, config, or other runtime details needed to create a native operation. You have control of
this since you design and instantiate op mapper types directly from your DriverAdapter
implementation.

### Distinguishing Op Types

There are multiple methods a driver adapter may use to determine what kind of operation a user
intends. Of those mentioned here, the best guidance is to choose the one that most closely mimics
the semantics and extant APIs of the specific native driver in question:

* Use a type designator field like `type: put` or something similar.
* Infer the op type from which field names are present in the template. (Be careful with this one!)
* Model the op exactly like the payload of the native driver, and hand the op data directly to the
  native driver as such (only one "type" of op here at the NB level).

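As a rough sketch of the first option, a type designator field can be resolved against a closed set
of known op types, so that an unknown or missing value fails at mapping time with a clear message.
The `OpTypeSketch` class, the `OpType` enum, and its members are hypothetical, and the op fields are
simplified to a plain `Map`; none of this is the NoSQLBench API.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

final class OpTypeSketch {

    /** Hypothetical op types for an imaginary key-value driver. */
    enum OpType { GET, PUT, DELETE }

    /** Resolves the op type from an explicit 'type' field and fails loudly on anything unclear. */
    static OpType resolve(Map<String, ?> opFields) {
        Object designator = opFields.get("type");
        if (designator == null) {
            throw new IllegalArgumentException(
                "op template has no 'type' field; expected one of " + Arrays.toString(OpType.values()));
        }
        try {
            return OpType.valueOf(designator.toString().trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "unknown op type '" + designator + "'; expected one of "
                    + Arrays.toString(OpType.values()), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("type", "put", "key", "k1", "value", "v1")));
    }
}
```

Because the designator is explicit, adding a new member to the enum later does not change how
existing templates are interpreted, which is harder to guarantee when types are inferred from field
names.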
### Show Users How

In any case, the method that you choose needs to be clearly documented, unambiguous, and
unaffected by the addition of new op types to your driver in the future. You should
provide examples of each op type in your (driver adapter) documentation. Ideally, your
documentation is based on testable examples that are kept in the source tree and used for both
unit testing *and* user examples.

## Effective Op Synthesis

1. Pre-compute as much as you can in the constructor of the OpDispenser. These objects are retained
   for the life of an activity.
2. Store re-usable elements of an operation in thread-safe form and re-use them wherever possible.

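Both points can be illustrated with a small dispenser that does its parsing once, in the
constructor, and keeps only immutable state for the per-cycle path. This is a sketch under
assumptions: `PrecomputedDispenserSketch` and the `{cycle}` marker are hypothetical and are not
NoSQLBench binding syntax, and the op is reduced to a `String`.

```java
import java.util.function.LongFunction;

final class PrecomputedDispenserSketch implements LongFunction<String> {

    private static final String PLACEHOLDER = "{cycle}"; // hypothetical marker, not NB binding syntax

    // Immutable strings computed once; safe to share across all threads of an activity.
    private final String prefix;
    private final String suffix;

    /** All parsing and validation happens here, once, before any cycle runs. */
    PrecomputedDispenserSketch(String template) {
        int at = template.indexOf(PLACEHOLDER);
        if (at < 0) {
            throw new IllegalArgumentException("template must contain " + PLACEHOLDER);
        }
        this.prefix = template.substring(0, at);
        this.suffix = template.substring(at + PLACEHOLDER.length());
    }

    /** The per-cycle path does no parsing; it only joins the pre-computed parts. */
    @Override
    public String apply(long cycle) {
        return prefix + cycle + suffix;
    }

    public static void main(String[] args) {
        PrecomputedDispenserSketch dispenser =
            new PrecomputedDispenserSketch("select * from t where id={cycle} limit 1");
        System.out.println(dispenser.apply(42L));
    }
}
```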
# Congruent Behavior

In order to ensure fairness and equity in how drivers work across systems and vendors, it is
necessary to standardize what each driver does with its operations. A conformant driver
adapter will do the following:

1. Provide an Op implementation which can be retried without resynthesis.
2. Fully read all the data in every result by default. Deviations from this default are only
   allowed when users explicitly specify something else, and should be accompanied by a
   warning, in the documentation or in the logs, that this is not normal behavior for a client.
3. Provide metrics about the quantity of elements read in a result.

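One way to picture these three requirements together is an op object that holds everything it needs
to execute, drains its result completely, and records how many elements it read. The sketch below
assumes a result that can be viewed as an `Iterator<String>` and uses a `LongAdder` as a stand-in
for a metric; `CongruentOpSketch` and `ReadOp` are hypothetical names, not NoSQLBench types.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class CongruentOpSketch {

    /** Hypothetical op: fully built up front, so a retry is just another call to run(). */
    static final class ReadOp {
        private final Supplier<Iterator<String>> nativeCall; // stand-in for a native driver call
        private final LongAdder elementsRead;                // stand-in for a metric

        ReadOp(Supplier<Iterator<String>> nativeCall, LongAdder elementsRead) {
            this.nativeCall = nativeCall;
            this.elementsRead = elementsRead;
        }

        /** Executes the op, reads every element of the result, and records the count. */
        long run() {
            Iterator<String> result = nativeCall.get();
            long count = 0;
            while (result.hasNext()) {
                result.next();
                count++;
            }
            elementsRead.add(count);
            return count;
        }
    }

    public static void main(String[] args) {
        LongAdder metric = new LongAdder();
        ReadOp op = new ReadOp(() -> List.of("a", "b", "c").iterator(), metric);
        op.run(); // first attempt
        op.run(); // a retry re-uses the same op object; nothing is resynthesized
        System.out.println("elements read: " + metric.sum());
    }
}
```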
# Config Sources

Activities have configuration at various levels:

1. Activity-wide parameters, called _activity params_.
2. (within the workload template, like a YAML doc) doc level params
3. (within a workload template, such as a YAML doc) block level params
4. op level params
5. op template fields

Op template fields (seen by the NB driver developer through the ParsedOp API) are properly meant to
specify a distinct type of operation by its defined properties, no less or more. However, users will
sometimes put op params into the op template alongside the op fields. This is *OK*.

*The rule of thumb is to ensure that a named field can only be used as an
op field or an op param but not both.* Each ParsedOp has access to
all of the layers above, and should be used to extract the fields
which are properly configuration-level data before the fields are used
for op mapping. By using this technique, op fields can be configured from any convenient
level.

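A layered lookup of this kind might look like the following sketch. It assumes that the most
specific level wins (op, then block, then doc, then activity), which generalizes the guidance to
favor op-specific parameters over activity params; the ordering of the middle layers is an
assumption, and `LayeredParamsSketch` and the parameter names are hypothetical. The real ParsedOp
API is not shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class LayeredParamsSketch {

    /** Layers ordered from most specific to least specific (an assumed precedence). */
    private final List<Map<String, String>> layers;

    LayeredParamsSketch(Map<String, String> opParams,
                        Map<String, String> blockParams,
                        Map<String, String> docParams,
                        Map<String, String> activityParams) {
        this.layers = List.of(opParams, blockParams, docParams, activityParams);
    }

    /** Returns the value from the most specific layer which defines the name, if any. */
    Optional<String> lookup(String name) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(name);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        LayeredParamsSketch params = new LayeredParamsSketch(
            Map.of("retries", "5"),                          // op level
            Map.of(),                                        // block level
            Map.of("timeout_ms", "200"),                     // doc level
            Map.of("retries", "1", "timeout_ms", "1000"));   // activity level
        System.out.println("retries=" + params.lookup("retries").orElse("unset"));
        System.out.println("timeout_ms=" + params.lookup("timeout_ms").orElse("unset"));
    }
}
```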
# Enhancements

* Configuration params that govern op behavior can be specified at any level, including within the op template itself.
* Binding functions can be expressed as named anchors or inline as direct definitions.
* Variable capture syntax in parsed op formats is standardized.

...

# Revamp Below!

## Result Validation

TBD

## Diagnostic Mode

TBD

## Naming Conventions

TBD

### Parameter naming

Parameters should be formatted as snake_case by default. Hyphens or camel case often cause issues
when using mixed media such as command lines and yaml formats. Snake case is a simple common
denominator which works across all these forms with little risk of ambiguity when parsing or
documenting how parameters are set apart from other syntax.

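A driver adapter could check its own parameter names against this convention with a small helper
like the one below. The exact pattern is an assumption, since this guide does not define snake_case
formally: here it means lowercase letters and digits in groups separated by single underscores,
starting with a letter. `SnakeCaseSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class SnakeCaseSketch {

    // Assumed definition of snake_case: a lowercase word, then optional "_word" groups.
    private static final Pattern SNAKE_CASE = Pattern.compile("[a-z][a-z0-9]*(_[a-z0-9]+)*");

    /** Returns true when the whole name matches the assumed snake_case pattern. */
    static boolean isSnakeCase(String name) {
        return SNAKE_CASE.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"max_tries", "maxTries", "max-tries"}) {
            System.out.println(name + " -> " + isSnakeCase(name));
        }
    }
}
```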
## Documentation

Each activity is required to have a set of markdown documentation in its resource directory. The
name of the driver should also be used as the name of the main documentation file for that driver.

Additional documentation can be added beyond this file. However, the name of every additional
documentation file for a given driver must start with the driver's name and a hyphen.

If a driver wants to include additional topics, the convention is to mention these other topics
within the driver's main help. Any markdown file which is included in the resources of a driver
module will be viewable by users with the help command `nb help <name>`. For example, if a driver
module contains `../src/main/resources/mydriver-specials.md`, then a user would be able to find this
help by running `nb help mydriver-specials`.

These sources of documentation can be wired into the main NoSQLBench documentation system with a set
of content descriptors.

## Named Scenarios

Conformant driver implementations should come with one or more examples of a workload under the
activities directory path. These examples should employ the "named scenarios"
format as described in the main docs. By including named scenarios in the yaml format, these named
scenarios then become available to users when they look for scenarios to call with the
`--list-scenarios` command.

To include such a scenario, simply add a working yaml with a scenarios section to the root of your
module under the `src/main/resources/activities` directory.

## Included Examples

Useful driver implementations should come with a set of examples under the examples directory path
which demonstrate useful patterns, bindings, or statement forms.

Users can find these examples in the same way as they can find the named scenarios above, with the
only difference being their location. By convention, the `src/main/resources/examples`
directory is where these are located.

The format is the same as for named scenarios, because the examples *are*
named scenarios. Users can find these by using the `--include=examples`
option in addition to the `--list-scenarios` command.

## Testing and Docs

Complete driver implementations should also come with a set of examples under the examples directory
path.

Unit testing within the NB code base is necessary in many places, but not in others. Use your
judgement about when to *not* add unit testing, but default to adding it when the call seems
subjective. A treatise on when and how to choose appropriate unit testing won't fit here, but
suffice it to say that you can always ask the project maintainers for help on this if you need it.

Non-trivial code in pull requests without any form of quality checks or testing will not be merged
until or unless the project maintainers are satisfied that there is little risk of user impact.
Experimental features clearly labeled as such will be given more wiggle room here, but the label
will not be removable unless/until a degree of robustness is proven in some testing layer.

### Testing Futures

In the future, the integration testing and the docs system are intended to become part of one whole.
In particular, docs should provide executable examples which can also be used to explain how NB or
drivers work. Until this is done, use the guidelines above.

## Handling secrets

Reading passwords ...

## Parameter Use

Activity parameters *and* statement parameters must combine in intuitive ways.

### ActivityType Parameters

The documentation for an activity type should have an explanation of all the activity parameters
that are unique to it. Examples of each of these should be given. The default values for these
parameters should be given. Further, if there are some common settings that may be useful to users,
these should be included in the examples.

### Statement Parameters

The documentation for an activity type should have an explanation of all the statement parameters
that are unique to it. Examples of each of these should be given. The default values for these
parameters should be given.

### Additive Configuration

If there is a configuration element in the activity type which can be modified in multiple ways that
are not mutually exclusive, each modification of that configuration element should be applied
additively. Users who set multiple parameters that modify the same configuration element should not
be surprised to find that only the last one was applied. An example of this would be
adding a load-balancing policy to a cql driver and then separately adding another. The second one
should wrap the first, as this is expected to be additive by nature of the native driver's API.

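The wrapping behavior in the load-balancing example can be sketched as a configuration holder that
applies each modification on top of the current value instead of overwriting it. `Policy`,
`AdditivePolicySketch`, and the policy names in the strings are hypothetical stand-ins; they do not
correspond to any specific native driver's API.

```java
import java.util.function.UnaryOperator;

final class AdditivePolicySketch {

    /** Hypothetical stand-in for a native driver's policy type. */
    interface Policy {
        String describe();
    }

    private Policy current;

    AdditivePolicySketch(Policy base) {
        this.current = base;
    }

    /** Each call wraps the policy configured so far instead of replacing it. */
    void add(UnaryOperator<Policy> wrapper) {
        current = wrapper.apply(current);
    }

    Policy build() {
        return current;
    }

    public static void main(String[] args) {
        AdditivePolicySketch config = new AdditivePolicySketch(() -> "round_robin");
        // Two separate parameters each contribute a layer; neither discards the other.
        config.add(inner -> () -> "token_aware(" + inner.describe() + ")");
        config.add(inner -> () -> "latency_aware(" + inner.describe() + ")");
        System.out.println(config.build().describe());
    }
}
```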
### Parameter Conflicts

If it is possible for parameters to conflict with each other in a way that would provide an invalid
configuration when both are applied, or in a way that the underlying API would not strictly allow,
then these conditions must be detected by the activity type, with an error thrown to the user
explaining the conflict.

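A minimal sketch of such a check follows, assuming that conflicts can be expressed as pairs of
parameter names which must not be set together. The class name `ParamConflictSketch` and the
parameter names are invented for illustration; a real adapter would list its own conflicting
parameters and may need richer rules than pairs.

```java
import java.util.List;
import java.util.Map;

final class ParamConflictSketch {

    /** Hypothetical pairs of parameters which cannot be combined. */
    private static final List<List<String>> EXCLUSIVE_PAIRS = List.of(
        List.of("password", "password_file"),
        List.of("hosts", "connection_bundle"));

    /** Throws an error naming both parameters when a conflicting pair is present. */
    static void validate(Map<String, String> params) {
        for (List<String> pair : EXCLUSIVE_PAIRS) {
            String first = pair.get(0);
            String second = pair.get(1);
            if (params.containsKey(first) && params.containsKey(second)) {
                throw new IllegalArgumentException(
                    "parameters '" + first + "' and '" + second + "' conflict: set only one of them");
            }
        }
    }

    public static void main(String[] args) {
        validate(Map.of("hosts", "localhost", "password", "secret")); // no conflict, returns quietly
        try {
            validate(Map.of("password", "secret", "password_file", "/tmp/pw"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```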
### Parameter Diagnostics

Each and every activity parameter that is set on an activity *must* be logged at DEBUG level with
the pattern `ACTIVITY PARAMETER: <activity alias>` included in the log line, so that the user may
verify applied parameter settings. Further, an explanation for what this parameter does to the
specific activity *should* be included in a following log line.

Each and every statement parameter that is set on a statement *must* be logged at DEBUG level with
the pattern `STATEMENT PARAMETER: <statement name>: ` included in the log line, so that the user may
verify applied statement settings. Further, an explanation for what this parameter does to the
specific statement *should* be included in a following log line.

### Environment Variables

Environment variables may be hoisted into a driver's configuration, but only using explicit
mechanisms. By default, environment variables are not injected into any NoSQLBench usage context
where this is not explicitly enabled by the user. The mechanism for enabling environment variables
is simple indirection: the user provides a symbolic variable reference where they would normally
provide a value.

Further, the variable must be explicitly enabled for env interpolation by the developer, and
documented as such. Interpolating environment variables into values which often use
`$...` formats for other purposes is a nuisance. Conversely, not
supporting env vars in `$...` values which have historically supported them is also a nuisance.

#### format

A reference takes a form such as `myparam=$ENV_VAR_FOO`, where the env var reference must follow
this pattern:

1. A `$` literal dollar sign.
2. Any alphabetic or underscore character (`[a-zA-Z_]`)
3. Zero or more trailing characters to include optional dots and digits. (`[a-zA-Z0-9_]*`)

Alternately, the `${...}` form is less strict, and allows any characters which are not `}`.

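The two reference forms can be expressed as one regular expression. The sketch below follows the
character classes exactly as written in the list above, so the plain `$NAME` form does not accept
dots here, and it assumes that the `${...}` form requires at least one character between the braces.
`EnvInterpolationSketch` is a hypothetical helper, not the NoSQLBench implementation, and it takes
the environment as a `Map` so that a caller decides which values are enabled for interpolation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EnvInterpolationSketch {

    // $NAME with the strict character classes, or ${...} with anything except '}'.
    private static final Pattern ENV_REF = Pattern.compile(
        "\\$(?<plain>[a-zA-Z_][a-zA-Z0-9_]*)|\\$\\{(?<braced>[^}]+)\\}");

    /** Replaces every reference in the value; an unset variable is reported as an error. */
    static String interpolate(String value, Map<String, String> env) {
        Matcher m = ENV_REF.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group("plain") != null ? m.group("plain") : m.group("braced");
            String resolved = env.get(name);
            if (resolved == null) {
                throw new IllegalArgumentException("environment variable '" + name + "' is not set");
            }
            // quoteReplacement keeps '$' and '\' in the resolved value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of("ENV_VAR_FOO", "bar");
        System.out.println(interpolate("$ENV_VAR_FOO", env));
        System.out.println(interpolate("prefix-${ENV_VAR_FOO}-suffix", env));
    }
}
```

Passing `System.getenv()` as the `env` argument would resolve against the real process environment;
a fixed map is used in `main` so that the example does not depend on the machine it runs on.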