reorg stale docs

Author: Jonathan Shook
Date: 2021-02-04 17:46:58 -06:00
Parent: 04c64861cd
Commit: 4c726b0333
25 changed files with 529 additions and 714 deletions

View File

@@ -1,5 +1,8 @@
# NoSQLBench Module Dependencies
This is viewable in IntelliJ markdown preview with the optional plugins
enabled.
```plantuml
digraph Test {

View File

Binary image changed (54 KiB → 54 KiB)

View File

Binary image changed (29 KiB → 29 KiB)

View File

@@ -5,7 +5,7 @@ all of NoSQLBench and supported drivers.
## Scopes of Execution
![Scopes](scopes.png)
![Scopes](../_tosort/scopes.png)
### Process

View File

@@ -7,9 +7,9 @@
id="svg8"
inkscape:version="1.1-dev (1:1.0+devel+202008182239+0d2e79aadc)"
sodipodi:docname="hybrid_ratelimiter.svg"
inkscape:export-filename="/home/jshook/IdeaProjects/nosqlbench/devdocs/hybrid_ratelimiter.png"
inkscape:export-xdpi="96"
inkscape:export-ydpi="96"
inkscape:export-filename="/home/jshook/IdeaProjects/nosqlbench/devdocs/devguide/drivers/optemplate.png"
inkscape:export-xdpi="95.040001"
inkscape:export-ydpi="95.040001"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
@@ -259,16 +259,16 @@
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="1.6411848"
inkscape:cx="269.62229"
inkscape:cy="433.22361"
inkscape:cx="-1048.9373"
inkscape:cy="-189.8019"
inkscape:document-units="mm"
inkscape:current-layer="g1199"
showgrid="true"
inkscape:window-width="3554"
inkscape:window-height="2007"
inkscape:window-x="286"
inkscape:window-y="2160"
inkscape:window-maximized="1"
inkscape:window-width="2115"
inkscape:window-height="1969"
inkscape:window-x="1061"
inkscape:window-y="2452"
inkscape:window-maximized="0"
inkscape:snap-intersection-paths="true"
inkscape:snap-bbox="true"
inkscape:bbox-paths="true"
@@ -2068,7 +2068,9 @@
id="text1978"><tspan
sodipodi:role="line"
id="tspan1976"
style="stroke-width:0.264583" /></text>
style="stroke-width:0.264583"
x="30.427082"
y="34.395832" /></text>
<g
id="g2074"
transform="matrix(0.86478602,0,0,0.86478602,22.001849,2.146522)">
@@ -2119,7 +2121,7 @@
d="M 95.625248,10.129208 89.958333,21.62495 h 11.906257 l -5.668972,-11.4999 z m -0.11067,1.573867 4.762502,9.525 h -9.525002 z"
sodipodi:nodetypes="ccccccccc"/>
<path
style="display:inline;fill:#ffffff;stroke:#000000;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-variation-settings:normal;opacity:1;vector-effect:none;fill-opacity:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stop-color:#000000;stop-opacity:1"
style="font-variation-settings:normal;display:inline;opacity:1;vector-effect:none;fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;stop-color:#000000;stop-opacity:1"
d="m 72.495837,22.621875 h 8.46666 l -4.23333,8.46666 z"
id="path1104-7-2-6-4"
inkscape:connector-curvature="0"
@@ -2134,7 +2136,7 @@
inkscape:export-ydpi="174.39999"
sodipodi:nodetypes="cc"/>
<path
style="display:inline;fill:#ffffff;stroke:#000000;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;fill-opacity:1"
style="display:inline;fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 72.495834,33.205208 H 80.9625 l -4.233333,8.466666 z"
id="path1104"
inkscape:connector-curvature="0"

Binary image changed (143 KiB → 141 KiB)

View File

Binary image changed (114 KiB → 114 KiB)

View File

@@ -1,196 +0,0 @@
# Linearized Operations
NOTE: This is a sketch/work in progress and will not be suitable for earnest review until this notice is removed.
Thanks to Seb and Wei for helping this design along with their discussions along the way.
See https://github.com/nosqlbench/nosqlbench/issues/136
Presently, it is possible to stitch together rudimentary chained operations, as long as you already know how statement
sequences, bindings functions, and thread-local state work. This is a significant amount of knowledge to expect from a
user who simply wants to configure chained operations with internal dependencies.
The design changes needed to make this easy to express are non-trivial and cut across a few of the extant runtime
systems within nosqlbench. This design sketch will try to capture each of the requirements and approaches sufficiently
for discussion and feedback.
# Sync and Async
## As it is: Sync vs Async
The current default mode (without `async=`) emulates a request-per-thread model, with operations being planned in a
deterministic sequence. In this mode, each thread dispatches operations from the sequence only after the previous one is
fully completed, even if there is no dependence between them. This is typical of many applications, even today, but not
all.
On the other end of the spectrum is the fully asynchronous dispatch mode enabled with the `async=` option. This uses a
completely different internal API to allow threads to juggle a number of operations. In contrast to the default mode,
the async mode dispatches operations eagerly as long as the user's selected concurrency level is not yet met. This means
that operations may overlap and also occur out of order with respect to the sequence.
Choosing between these modes is a hard choice that does not offer a uniform way of looking at operations. As well, it
also forces users to pick between two extremes of all request-per-thread or all asynchronous, which is becoming less
common in application designs, and at the very least does not rise to the level of expressivity of the toolchains that
most users have access to.
## As it should be: Async with Explicit Dependencies
* The user should be able to create explicit dependencies from one operation to another.
* Operations which are not dependent on other operations should be dispatched as soon as possible within the concurrency
limits of the workload.
* Operations with dependencies on other operations should only be dispatched if the upstream operations completed
successfully.
* Users should have clear expectations of how error handling will occur for individual operations as well
as chains of operations.
# Dependent Ops
We are using the phrase _dependent ops_ to capture the notions of data-flow dependency between ops (implying
linearization in ordering and isolation of input and output boundaries), successful execution, and data sharing within
an appropriate scope.
## As it is: Data Flow
Presently, you can store state within a thread local object map in order to share data between operations. This is using
the implied scope of "thread local" which works well with the "sequence per thread, request per thread" model. This
works because both the op sequence as well as the variable state used in binding functions are thread local.
However, it does not work well with the async mode, since there is no implied scope to tie the variable state to the op
sequence. There can be many operations within a thread operating on the same state even concurrently. This may appear to
function, but will create problems for users who are not aware of the limitation.
## As it should be: Data Flow
* Data flow between operations should be easily expressed with a standard configuration primitive which can work across
all driver types.
* The scope of shared data and captured values should be clear to users.
## As it is: Data Capture
Presently, the CQL driver has additional internal operators which allow for the capture of values. These decorator
behaviors allow for configured statements to do more than just dispatch an operation. However, they are not built upon
standard data capture and sharing operations which are implemented uniformly across driver types. This makes scope
management largely a matter of convention, which is ok for the first implementation (in the CQL driver) but not as a
building block for cross-driver behaviors.
# Injecting Operations
## As it is: Injecting Operations
Presently, operations are derived from statement templates on a deterministic op sequence which is of a fixed length
known as the stride. This follows closely the pattern of assuming each operation comes from one distinct cycle and that
there is always a one-to-one relationship with cycles. This has carried some weight internally in how metrics for cycles
are derived, etc. There is presently no separate operational queue for statements except by modifying statements in the
existing sequence with side-effect binding assignment. It is difficult to reason about additional operations as
independent without decoupling these two into separate mechanisms.
## As it should be: Injecting Operations
## Seeding Context
# Diagrams
![idealized](idealized.svg)
## Op Flow
To track
Open concerns
- before: variable state was per-thread
- now: variable state is per opflow
- (opflow state is back-filled into thread local as the default implementation)
* gives scope for enumerating op flows, meaning op flow 0 through op flow (cycles/stride)
* 5 statements in sequence, stride=5,
- scoping for state
- implied data flow dependence vs explicit data flow dependence
- opflow retries vs op retries
discussion
```yaml
bindings:
yesterday: HashRange(0L,1234234L);
statements:
- s1-with-binding: select [userid*] from foobar.baz where day=23
- s2-with-binding: select [userid],[yesterday] from accounts where id={id} and timestamp>{yesterday}
- s3-with-dependency: select login_history from sessions where userid={[userid]}
- rogue-statement: select [yesterday] from ... <--- WARN USER because of explicit dependency below
- s4: select login_history from sessions where userid={[userid]} and timestamp>{yesterday}
- s5: select login_history from sessions where userid={[userid]} and timestamp>{[s2-with-binding/yesterday]}
```
## Dependency Indirection
## Error Handling and DataFlow Semantics
## Capture Syntax
Capturing of variables in statement templates will be signified with `[varname]`. This example represents the simplest
case where the user just wants to capture a variable. Thus the above is taken to mean:
- The scope of the captured variable is the OpFlow.
- The operation is required to succeed. Any other operation which depends on a `varname` value will be skipped and
counted as such.
- The captured type of `varname` is a single object, to be determined dynamically, with no type checking required.
- A field named `varname` is required to be present in the result set for the statement that included it.
- Exactly one value for `varname` is required to be present.
- Without other settings to relax sanity constraints, any other appearance of `[varname]` in another active statement
should yield a warning to the user.
All behavioral variations that diverge from the above will be signified within the capture syntax as a variation on the
above example.
## Inject Syntax
Similar to binding tokens used in statement templates like `{varname}`, it is possible to inject captured variables into
statement templates with the `{[varname]}` syntax. This indicates that the user explicitly wants to pull a value
directly from the captured variable. It is necessary to indicate variable capture and variable injection distinctly from
each other, and this syntax supports that while remaining familiar to the bindings formats already supported.
The above syntax example represents the case where the user simply wants to refer to a variable of a given name. This is
the simplest case, and is taken to mean:
- The scope of the variable is not specified. The value may come from OpFlow, thread, global or any scope that is
available. By default, scopes should be consulted with the shortest-lived inner scopes first and widened only if
needed to find the variable.
- The variable must be defined in some available scope. By default, it is an error to refer to a variable for injection
that is not defined.
- The type of the variable is not checked on access. The type is presumed to be compatible with any assignments which
are made within whatever driver type is in use.
- The variable is assumed to be a single-valued type.
All behavioral variations that diverge from the above will be signified within the variable injection syntax as a
variation on the above syntax.
## Scenarios to Consider
basic scenario: user wants to capture each variable from one place
advanced scenarios:
- user wants to capture a named var from one or more places
- some ops may be required to complete successfully, others may not
- some ops may be required to produce a value
- some ops may be required to produce multiple values
* The carrier of op state should enable the following programmatic constructions:
* Metric measuring the service time of the op on failure
* Metric measuring the service time of the op on success
* Metric measuring the size of the op on success
* Hooks for transforming or acting upon the op or cycle before the op executes
* Hooks for transforming or acting upon the op or cycle after the op executes, regardless of result
* Additional modifiers on the op, as in transformers.
* All op contextual actions should be presented as a function on the op type
* Completion Stages that support the op API should come from built-in template implementations that already include
metrics options, logging support, etc.

View File

Binary image changed (5.9 KiB → 5.9 KiB)

View File

@@ -0,0 +1,236 @@
# Linearized Operations
NOTE: This is a sketch/work in progress and will not be suitable for
earnest review until this notice is removed.
Thanks to Seb and Wei for helping this design along with their discussions
along the way.
See https://github.com/nosqlbench/nosqlbench/issues/136
Presently, it is possible to stitch together rudimentary chained
operations, as long as you already know how statement sequences, bindings
functions, and thread-local state work. This is a significant amount of
knowledge to expect from a user who simply wants to configure chained
operations with internal dependencies.
The design changes needed to make this easy to express are non-trivial and
cut across a few of the extant runtime systems within nosqlbench. This
design sketch will try to capture each of the requirements and approaches
sufficiently for discussion and feedback.
# Sync and Async
## As it is: Sync vs Async
The current default mode (without `async=`) emulates a request-per-thread
model, with operations being planned in a deterministic sequence. In this
mode, each thread dispatches operations from the sequence only after the
previous one is fully completed, even if there is no dependence between
them. This is typical of many applications, even today, but not all.
On the other end of the spectrum is the fully asynchronous dispatch mode
enabled with the `async=` option. This uses a completely different
internal API to allow threads to juggle a number of operations. In
contrast to the default mode, the async mode dispatches operations eagerly
as long as the user's selected concurrency level is not yet met. This
means that operations may overlap and also occur out of order with respect
to the sequence.
Choosing between these modes is a hard choice that does not offer a
uniform way of looking at operations. As well, it also forces users to
pick between two extremes of all request-per-thread or all asynchronous,
which is becoming less common in application designs, and at the very
least does not rise to the level of expressivity of the toolchains that
most users have access to.
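As an illustration of the eager-dispatch model described above, here is a minimal Java sketch of concurrency-limited asynchronous dispatch. This is not nosqlbench's internal API; the class and method names below are assumptions for illustration only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Minimal sketch of eager async dispatch under a concurrency limit.
// Not the nosqlbench internal API; names here are illustrative only.
class AsyncDispatchSketch {
    private final Semaphore concurrency;

    AsyncDispatchSketch(int limit) {
        this.concurrency = new Semaphore(limit);
    }

    // Dispatch eagerly; block only when the selected concurrency level is met.
    CompletableFuture<Void> dispatch(Runnable op) throws InterruptedException {
        concurrency.acquire();
        return CompletableFuture.runAsync(op)
            .whenComplete((result, error) -> concurrency.release());
    }
}
```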
## As it should be: Async with Explicit Dependencies
* The user should be able to create explicit dependencies from one
operation to another.
* Operations which are not dependent on other operations should be
dispatched as soon as possible within the concurrency limits of the
workload.
* Operations with dependencies on other operations should only be
dispatched if the upstream operations completed successfully.
* Users should have clear expectations of how error handling will occur
for individual operations as well as chains of operations.
# Dependent Ops
We are using the phrase _dependent ops_ to capture the notions of
data-flow dependency between ops (implying linearization in ordering and
isolation of input and output boundaries), successful execution, and data
sharing within an appropriate scope.
## As it is: Data Flow
Presently, you can store state within a thread local object map in order
to share data between operations. This is using the implied scope of
"thread local" which works well with the "sequence per thread, request per
thread" model. This works because both the op sequence as well as the
variable state used in binding functions are thread local.
However, it does not work well with the async mode, since there is no
implied scope to tie the variable state to the op sequence. There can be
many operations within a thread operating on the same state even
concurrently. This may appear to function, but will create problems for
users who are not aware of the limitation.
## As it should be: Data Flow
* Data flow between operations should be easily expressed with a standard
configuration primitive which can work across all driver types.
* The scope of shared data and captured values should be clear to users.
## As it is: Data Capture
Presently, the CQL driver has additional internal operators which allow
for the capture of values. These decorator behaviors allow for configured
statements to do more than just dispatch an operation. However, they are
not built upon standard data capture and sharing operations which are
implemented uniformly across driver types. This makes scope management
largely a matter of convention, which is ok for the first implementation
(in the CQL driver) but not as a building block for cross-driver behaviors.
# Injecting Operations
## As it is: Injecting Operations
Presently, operations are derived from statement templates on a
deterministic op sequence which is of a fixed length known as the stride.
This follows closely the pattern of assuming each operation comes from one
distinct cycle and that there is always a one-to-one relationship with
cycles. This has carried some weight internally in how metrics for cycles
are derived, etc. There is presently no separate operational queue for
statements except by modifying statements in the existing sequence with
side-effect binding assignment. It is difficult to reason about additional
operations as independent without decoupling these two into separate
mechanisms.
## As it should be: Injecting Operations
## Seeding Context
# Diagrams
![idealized](idealized.svg)
## Op Flow
To track
Open concerns
- before: variable state was per-thread
- now: variable state is per opflow
- (opflow state is back-filled into thread local as the default
  implementation; see the sketch after the example below)
* gives scope for enumerating op flows, meaning op flow 0 through op flow
  (cycles/stride)
* 5 statements in sequence, stride=5,
- scoping for state
- implied data flow dependence vs explicit data flow dependence
- opflow retries vs op retries
discussion
```yaml
bindings:
yesterday: HashRange(0L,1234234L);
statements:
- s1-with-binding: select [userid*] from foobar.baz where day=23
- s2-with-binding: select [userid],[yesterday] from accounts where id={id} and timestamp>{yesterday}
- s3-with-dependency: select login_history from sessions where userid={[userid]}
- rogue-statement: select [yesterday] from ... <--- WARN USER because of explicit dependency below
- s4: select login_history from sessions where userid={[userid]} and timestamp>{yesterday}
- s5: select login_history from sessions where userid={[userid]} and timestamp>{[s2-with-binding/yesterday]}
```
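Regarding the note above that opflow state is back-filled into thread local storage as the default implementation, here is a minimal Java sketch of that idea. The class shape is an assumption for discussion, not existing nosqlbench code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: opflow-scoped variable state, back-filled into a
// thread-local map as the default implementation (per the note above).
class OpFlowStateSketch {
    private static final ThreadLocal<Map<String, Object>> THREAD_STATE =
        ThreadLocal.withInitial(HashMap::new);

    private final Map<String, Object> state;

    OpFlowStateSketch() {
        // Default implementation: share the per-thread map, which matches
        // the old per-thread behavior for request-per-thread workloads.
        this.state = THREAD_STATE.get();
    }

    void put(String varname, Object value) { state.put(varname, value); }
    Object get(String varname) { return state.get(varname); }
}
```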
## Dependency Indirection
## Error Handling and DataFlow Semantics
## Capture Syntax
Capturing of variables in statement templates will be signified
with `[varname]`. This example represents the simplest case where the
user just wants to capture a variable. Thus the above is taken to mean:
- The scope of the captured variable is the OpFlow.
- The operation is required to succeed. Any other operation which depends
on a `varname` value will be skipped and counted as such.
- The captured type of `varname` is a single object, to be determined
dynamically, with no type checking required.
- A field named `varname` is required to be present in the result set for
the statement that included it.
- Exactly one value for `varname` is required to be present.
- Without other settings to relax sanity constraints, any other appearance
of `[varname]` in another active statement should yield a warning to the
user.
All behavioral variations that diverge from the above will be signified
within the capture syntax as a variation on the above example.
## Inject Syntax
Similar to binding tokens used in statement templates like `{varname}`, it
is possible to inject captured variables into statement templates with
the `{[varname]}` syntax. This indicates that the user explicitly wants to
pull a value directly from the captured variable. It is necessary to
indicate variable capture and variable injection distinctly from each
other, and this syntax supports that while remaining familiar to the
bindings formats already supported.
The above syntax example represents the case where the user simply wants
to refer to a variable of a given name. This is the simplest case, and is
taken to mean:
- The scope of the variable is not specified. The value may come from
OpFlow, thread, global or any scope that is available. By default,
scopes should be consulted with the shortest-lived inner scopes first
and widened only if needed to find the variable.
- The variable must be defined in some available scope. By default, it is
an error to refer to a variable for injection that is not defined.
- The type of the variable is not checked on access. The type is presumed
to be compatible with any assignments which are made within whatever
driver type is in use.
- The variable is assumed to be a single-valued type.
All behavioral variations that diverge from the above will be signified
within the variable injection syntax as a variation on the above syntax.
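To make the default scope-widening rule concrete, here is a hedged Java sketch of the lookup order described above: the shortest-lived inner scope is consulted first, widening only when the name is not found. The scope names and class shape are assumptions for illustration, not real nosqlbench APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of scope-widening lookup for injected variables (see rules above).
// OpFlow, thread, and global scopes are assumed shapes, not real APIs.
class ScopeLookupSketch {
    private final Map<String, Object> opflowScope = new HashMap<>();
    private final Map<String, Object> threadScope = new HashMap<>();
    private final Map<String, Object> globalScope = new HashMap<>();

    Optional<Object> resolve(String varname) {
        // Innermost (shortest-lived) scope first, widening only as needed.
        for (Map<String, Object> scope : List.of(opflowScope, threadScope, globalScope)) {
            Object value = scope.get(varname);
            if (value != null) {
                return Optional.of(value);
            }
        }
        // By default it is an error to inject a variable found in no scope;
        // callers should treat an empty result accordingly.
        return Optional.empty();
    }
}
```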
## Scenarios to Consider
basic scenario: user wants to capture each variable from one place
advanced scenarios:
- user wants to capture a named var from one or more places
- some ops may be required to complete successfully, others may not
- some ops may be required to produce a value
- some ops may be required to produce multiple values
* The carrier of op state should enable the following programmatic
constructions:
* Metric measuring the service time of the op on failure
* Metric measuring the service time of the op on success
* Metric measuring the size of the op on success
* Hooks for transforming or acting upon the op or cycle before the op
executes
* Hooks for transforming or acting upon the op or cycle after the op
executes, regardless of result
* Additional modifiers on the op, as in transformers.
* All op contextual actions should be presented as a function on the op
type
* Completion Stages that support the op API should come from built-in
template implementations that already include metrics options, logging
support, etc.

View File

@@ -1,68 +0,0 @@
---
title: Design Guidelines
weight: 34
menu:
  main:
    parent: Dev Guide
    identifier: Design Guidelines
    weight: 12
---
These guidelines are partially aspirational. As the project evolves, attempts will be made to
codify these guidelines and measure them on a per-release basis.
## ActivityType Naming
Each activity type should be named with a single lowercase name that is accurate and stable. Any activity type
implementations submitted to the nosqlbench project may be changed by the project maintainers to ensure this.
## ActivityType Documentation
Each activity type should have a file which provides markdown-formatted documentation for the user. This documentation
should be in a markdown format that is clean for terminal rendering for when users have *only* a terminal to read
with.
The single file should be hosted in the classpath under the name of the activity type with a `.md` extension. For example,
the `tcpclient` activity type has documentation in `tcpclient.md` at the root of the classpath.
This allows for users to run `help tcpclient` to get that documentation.
### ActivityType Parameters
The documentation for an activity type should have an explanation of all the activity parameters that are unique to it.
Examples of each of these should be given. The default values for these parameters should be given. Further, if
there are some common settings that may be useful to users, these should be included in the examples.
### Statement Parameters
The documentation for an activity type should have an explanation of all the statement parameters that are unique to it.
Examples of each of these should be given. The default values for these parameters should be given.
## Parameter Use
Activity parameters *and* statement parameters must combine in intuitive ways.
### Additive Configuration
If there is a configuration element in the activity type which can be modified in multiple ways that are not mutually
exclusive, each time that configuration element is modified, it should be done additively. This means that users should
not be surprised by only the last setting taking effect when they use multiple parameters that modify the same
configuration element.
### Parameter Conflicts
If it is possible for parameters to conflict with each other in a way that would provide an invalid configuration when both are applied,
or in a way that the underlying API would not strictly allow, then these conditions must be detected by the activity type, with
an error thrown to the user explaining the conflict.
### Parameter Diagnostics
Each and every activity parameter that is set on an activity *must* be logged at DEBUG level with the
pattern `ACTIVITY PARAMETER: <activity alias>` included in the log line, so that the user may verify applied parameter settings.
Further, an explanation for what this parameter does to the specific activity *should* be included in a following log line.
Each and every statement parameter that is set on a statement *must* be logged at DEBUG level with the
pattern `STATEMENT PARAMETER: <statement name>: ` included in the log line, so that the user may verify applied statement settings.
Further, an explanation for what this parameter does to the specific statement *should* be included in a following log line.
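As a hedged sketch of the diagnostic pattern above: the `ACTIVITY PARAMETER:` and `STATEMENT PARAMETER:` log-line prefixes come from this guideline, while the logging calls and class below are illustrative assumptions.

```java
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch of the DEBUG-level diagnostics described above.
// Only the "ACTIVITY PARAMETER:" / "STATEMENT PARAMETER:" prefixes are
// from the guideline; everything else is an assumption.
class ParamDiagnosticsSketch {
    private static final Logger logger =
        LoggerFactory.getLogger(ParamDiagnosticsSketch.class);

    static void logActivityParams(String activityAlias, Map<String, String> params) {
        params.forEach((name, value) ->
            logger.debug("ACTIVITY PARAMETER: {} {}={}", activityAlias, name, value));
    }

    static void logStatementParams(String stmtName, Map<String, String> params) {
        params.forEach((name, value) ->
            logger.debug("STATEMENT PARAMETER: {}: {}={}", stmtName, name, value));
    }
}
```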

View File

@@ -1,20 +0,0 @@
---
title: Error Mapping
weight: 36
menu:
  main:
    parent: Dev Guide
    identifier: Error Mapping
    weight: 13
---
Each activity type should provide its own mapping between thrown errors and the error codes assigned to them.
This is facilitated by the `ErrorMapper` interface. It simply provides a way to initialize a cache-friendly view
of classes which are known exception types to a stable numbering of error codes.
By providing an error mapper for your activity type, you are enabling advanced testing scenarios that deal with
error routing and advanced error handling.
If no error mapper is installed in the ActivityType implementation, then a default one is provided which simply
maps all errors to _unknown_.
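A minimal sketch of the shape such a mapper could take, assuming a simple class-to-code map. The `ErrorMapper` name comes from this document; the method signature, codes, and exception choices below are assumptions.

```java
import java.net.SocketTimeoutException;
import java.util.Map;

// Sketch only: the ErrorMapper name is from the doc above, but this
// method shape and these codes are illustrative assumptions.
interface ErrorMapper {
    int UNKNOWN = 0;

    int mapError(Throwable error);
}

class ExampleErrorMapper implements ErrorMapper {
    // A cache-friendly view: known exception classes mapped to stable codes.
    private static final Map<Class<? extends Throwable>, Integer> CODES = Map.of(
        SocketTimeoutException.class, 1,
        IllegalArgumentException.class, 2
    );

    @Override
    public int mapError(Throwable error) {
        // Default behavior per the doc: unrecognized errors map to UNKNOWN.
        return CODES.getOrDefault(error.getClass(), UNKNOWN);
    }
}
```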

View File

@@ -1,83 +0,0 @@
---
title: Project Structure
weight: 32
menu:
  main:
    parent: Dev Guide
    identifier: Project Structure
    weight: 12
---
nosqlbench is packaged as a
[Maven Reactor](https://maven.apache.org/guides/mini/guide-multiple-modules.html) project.
## Defaults and Dependencies
Maven reactor projects often confuse developers. In this document, we'll explain
the basic structure of the nosqlbench project and the reasons for it.
Firstly, there is a parent for each of the modules. In Maven parlance, you can
think of a parent project as a template for projects that reference it. One of
the reasons you would do this is to share common build or dependency settings
across many maven projects or modules. That is exactly why we do that here. The
'parent' project for all nosqlbench modules is aptly named 'project-defaults',
as that is exactly what we use it for.
As well, there is a "root" project, which is simply the project at the project's
base directory. It pulls in the modules of the project explicitly as in:
~~~
<modules>
<module>project-defaults</module> <!-- Holds project level defaults -->
<module>engine-api</module> <!-- APIs -->
...
</modules>
~~~
This means that when you build the root project, it will build all the modules
included, but only after linearizing the build order around the inter-module
dependencies. This is an important detail, as it is often overlooked that this
is the purpose of a reactor-style project.
The dependencies between the modules are not implicit. Each module listed in the
root pom.xml has its own explicit dependencies to other modules in the project.
We could cause them to have a common set of dependencies by adding those
dependencies to the 'project-defaults' module, but this would mostly prevent us
from making the dependencies for each as lean and specific as we like. That is
why the dependencies in the project-default **parent** module are empty.
The project-defaults module does, however, have some build, locale, and project
identity settings. You can consider these cross-cutting aspects of the modules
in the project. If you want to put something in the project-default module, and
it is not strictly cross-cutting across the other modules, then don't. That's
how you keep things sane.
To be clear, cross-cutting build behavior and per-module dependencies are two
separate axes of build management. Try to keep this in mind when thinking about
modular projects and it will help you stay sane. Violating this basic rule is
one of the most common mistakes that newer Maven users make when trying to
enable modularity.
## Intermodule Dependencies
<!--![Project Structure](../../static/diagrams/project_structure.png)-->
Modularity at runtime is enabled via the
[ServiceLoader](https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html) API.
The engine-core module uses the engine-api module to know the loadable activity types.
ActivityType implementations use the engine-api module to implement the loadable
activity types. In this way, they both depend on the engine-api module to provide
the common types needed for this to work.
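To illustrate the ServiceLoader mechanism described above, here is a hedged sketch. The `ActivityType` interface below is a minimal stand-in for the real one in engine-api; only the ServiceLoader usage itself is the standard JDK API.

```java
import java.util.ServiceLoader;

// Minimal stand-in for the real interface in engine-api (assumed shape).
interface ActivityType {
    String getName();
}

// Sketch of how engine-core can discover implementations at runtime.
// ServiceLoader reads META-INF/services/<interface-name> entries from
// each jar on the classpath, which is what lets separately-built
// driver modules coexist in the bundled runtime.
class ActivityTypeDiscovery {
    public static void main(String[] args) {
        for (ActivityType type : ServiceLoader.load(ActivityType.class)) {
            System.out.println("discovered activity type: " + type.getName());
        }
    }
}
```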
The nb-runtime module allows the separate implementations of the core and the
activity type implementations to exist together in the same classpath. This goes
hand-in-hand with how the runtime jar is bundled. Said differently, the artifact
produced by nb-runtime is a bundling of the things it depends on as a single
application. nb-runtime consolidates dependencies and provides a proper place to
do integration testing.
Taking the API at the bottom, and the components that can be composed together
at the middle, and the bundling project at the top, you'll see a not-uncommon
project structure that looks like a diamond. Going from bottom to top, you can
think of it as API, implementation, and packaging.

View File

@@ -1,58 +0,0 @@
---
title: YAML Config API
weight: 12
menu:
  main:
    parent: Dev Guide
    identifier: configfiles-api
    weight: 22
---
In nosqlbench 2.* and newer versions, a standard YAML configuration format
is provided for any activity that requires statements, tags, parameters,
and data bindings. This section describes how to use it as a developer.
Developers should already be familiar with the user guide for the YAML
config format.
## Simple Usage
    StrInterpolater interp = new StrInterpolater(activityDef);
    String yaml_loc = activityDef.getParams().getOptionalString("yaml").orElse("default");
    StmtsDocList sdl = StatementsLoader.load(logger, yaml_loc, interp, "activities");
This loads the yaml at file path *yaml_loc*, while transforming template variables
with the interpolator, searching in the current directory and in the "activities"
subdirectory, and logging all diagnostics.
What you do next depends on the activity type. Typically, an activity will instantiate
a SequencePlanner to establish an operation ordering. See the *stdout* activity type
for an example of this.
## Implementation Notes
The getter methods on this API are intended to provide statements. Thus, all
access to bindings, params, or tags is provided via the StmtDef type.
It is possible to get these as aggregations at the block or doc level for activity
types that can make meaningful use of these as aggregations points. However,
it is usually sufficient to simply access the StmtDef iterator methods, as all
binding, tag, and param values are templated and overridden automatically for
you within the API.
## On Bindings Usage
It is important to not instantiate or call bindings that are not expected to be
used by the user. This means that your statement form should use named anchors
for each and every binding that will be activated, *or* a clear contract with
the user should be expressed in the documentation for how bindings will be
resolved to statements.
## Named Anchors
The format of named anchors varies by activity type. There are some conventions
that can be used in order to maintain a more uniform user experience:
- String interpolation should use single curly braces when there are no local
conventions.
- Named anchors in prepared statements or other DB activity types should simply
add a name to the existing place holder, to be filtered out by the activity type
before being passed to the lower level driver.

View File

@@ -16,19 +16,19 @@ can be obtained at the releases section of the main NoSQLBench project:
- [NoSQLBench Releases](https://github.com/nosqlbench/nosqlbench/releases)
:::info
**NOTE:**
Once you download the binary, you may need to `chmod +x nb` to make it
executable. In order to run AppImage binaries, like nb, you need to have
fuse support on your system. This is already provided on most
distributions. If after downloading and executing nb, you get an error,
please consult the
[AppImage troubleshooting page](https://docs.appimage.org/user-guide/run-appimages.html#troubleshooting).
Once you download the binary, you may need to `chmod +x nb` to make it executable. In order to run AppImage binaries,
like nb, you need to have fuse support on your system. This is already provided on most distributions. If after
downloading and executing nb, you get an error, please consult the
[AppImage troubleshooting page](https://docs.appimage.org/user-guide/run-appimages.html#troubleshooting).
:::
This documentation assumes you are using the Linux binary, initiating NoSqlBench commands with `./nb`. If you are using
the jar, just replace `./nb` with `java -jar nb.jar` when running
commands. If you are using the jar version, Java 15 is
recommended, and will be required soon.
This documentation assumes you are using the Linux binary, initiating
NoSqlBench commands with `./nb`. If you are using the jar, just
replace `./nb` with `java -jar nb.jar` when running commands. If you are
using the jar version, Java 15 is recommended, and will be required soon.
## Run a cluster
@@ -56,15 +56,14 @@ If you want a simple list of yamls which contain named scenarios, run:
    # Get a simple list of yamls containing named scenarios
    ./nb --list-workloads
:::info
Note: These commands will include workloads that were shipped with nb and workloads in your local directory. To learn
more about how to design custom workloads see
**NOTE:**
These commands will include workloads that were shipped with nb and
workloads in your local directory. To learn more about how to design
custom workloads see
[designing workloads](/index.html#/docs/designing_workloads.html)
:::
To provide your own contact points (comma separated), add the `hosts=` parameter
To provide your own contact points (comma separated), add the `hosts=`
parameter
    ./nb cql-iot hosts=host1,host2
@@ -76,14 +75,14 @@ Additionally, if you have docker installed on your local system, and your user h
This example doesn't go into much detail about what it is doing. It is here to show you how quickly you can start
running real workloads without having to learn much about the machinery that makes it happen.
The rest of this section has a more elaborate example that exposes some of the basic options you may want to adjust for
your first serious test.
The rest of this section has a more elaborate example that exposes some of
the basic options you may want to adjust for your first serious test.
:::info
If you want to see system-level metrics from your cluster, it is possible to get these as well as Apache Cassandra level
metrics by using the DSE Metrics Collector (if using DSE), or by setting up a metrics feed to the Prometheus instance in
your local docker stack. You can find the DSE Metrics Collector docs
[here](https://docs.datastax.com/en/monitoring/doc/monitoring/metricsCollector/mcExportMetricsDocker.html).
:::
**NOTE:**
If you want to see system-level metrics from your cluster, it is possible
to get these as well as Apache Cassandra level metrics by using the DSE
Metrics Collector (if using DSE), or by setting up a metrics feed to the
Prometheus instance in your local docker stack. You can find the DSE
Metrics Collector docs
[here](https://docs.datastax.com/en/monitoring/doc/monitoring/metricsCollector/mcExportMetricsDocker.html).