fix markdown highlights

This commit is contained in:
Jonathan Shook
2021-02-04 17:46:16 -06:00
parent 0a74a20891
commit ff05868fde
12 changed files with 593 additions and 464 deletions


@@ -5,61 +5,75 @@ weight: 00
# YAML Organization
It is best to keep every workload self-contained within a single YAML file, including schema, data rampup, and the main phase of testing. The phases of testing are controlled by tags as described below.
**NOTE:**
The phase names described below have been adopted as a convention within the built-in workloads. It is strongly advised that new workload YAMLs use the same tagging scheme so that workloads are more pluggable across YAMLs.
## Schema phase
The schema phase is simply a phase of your test which creates the necessary schema on your target system. For CQL, this generally consists of a keyspace and one or more table statements. There is no special schema layer in nosqlbench. All statements executed are simply statements. This provides the greatest flexibility in testing since every activity type is allowed to control its DDL and DML using the same machinery.
The schema phase is normally executed with defaults for most parameters. This means that statements will execute in the order specified in the YAML, in serialized form, exactly once. This is a welcome side-effect of how the initial parameters like _cycles_ are set from the statements which are activated by tagging.
You can mark statements as schema phase statements by adding this set of tags to the statements, either directly, or by block:
tags:
phase: schema
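
Putting this together, a schema block using this tagging convention might look like the following sketch. The block name and the statement body are hypothetical placeholders, not taken from the built-in workloads:

```yaml
blocks:
  - name: schema
    tags:
      phase: schema
    statements:
      - create-keyspace: |
          CREATE KEYSPACE IF NOT EXISTS baselines
          WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
```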
## Rampup phase
When you run a performance test, it is very important to be aware of how much data is present. Higher density tests are more realistic for systems which accumulate data over time, or which have a large working set of data. Ideally, the amount of data on the system you are testing should recreate a realistic production data volume. In general, there is a triangular trade-off between service time, op rate, and data density.
It is the purpose of the _rampup_ phase to create the backdrop data on a target system that makes a test meaningful for some level of data density. Data density is normally discussed as average per node, but it is also important to consider distribution of data as it varies from the least dense to the most dense nodes.
Because it is useful to be able to add data to a target cluster in an incremental way, the bindings which are used with a _rampup_ phase may actually be different from the ones used for a _main_ phase. In most cases, you want the rampup phase to create data in a way that incrementally adds to the population of data in the cluster. This allows you to add some data to a cluster with `cycles=0..1M` and then decide whether to continue adding data using the next contiguous range of cycles, with `cycles=1M..2M` and so on.
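
For example, incremental loading can be done as two back-to-back invocations of the same rampup phase. The workload and host shown here are illustrative, using the `cql-keyvalue` built-in:

```text
# first increment of backdrop data
./nb run driver=cql workload=cql-keyvalue tags=phase:rampup host=<host-or-ip> cycles=0..1M

# later, continue with the next contiguous range of cycles
./nb run driver=cql workload=cql-keyvalue tags=phase:rampup host=<host-or-ip> cycles=1M..2M
```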
You can mark statements as rampup phase statements by adding this set of tags to the statements, either directly, or by block:
tags:
phase: rampup
## Main phase
The main phase of a nosqlbench scenario is the one during which you really care about the metrics. This is the actual test that everything else has prepared your system for.
You can mark statements as main phase statements by adding this set of tags to the statements, either directly, or by block:
tags:
phase: main


@@ -5,15 +5,18 @@ weight: 07
# Multi-Docs
The YAML spec allows for multiple yaml documents to be concatenated in the same file with a separator:
```yaml
---
```
This offers an additional convenience when configuring activities. If you want to parameterize or tag a set of statements with their own bindings, params, or tags, but alongside another set of uniquely configured statements, you need only put them in separate logical documents, separated by a triple-dash.
For example:
@@ -43,11 +46,13 @@ doc2.number eight
doc1.form1 doc1.1
```
This shows that you can use the power of blocks and tags together at one level and also allow statements to be broken apart into a whole other level of partitioning if desired.
**WARNING:**
The multi-doc support is there as a ripcord when you need it. However, it is strongly advised that you keep your YAML workloads simple to start and only use features like the multi-doc when you absolutely need it. For this, blocks are generally a better choice. See examples in the standard workloads.


@@ -10,26 +10,30 @@ Docs, Blocks, and Statements can all have names:
```yaml
name: doc1
blocks:
  - name: block1
    statements:
      - stmt1: statement1
      - name: st2
        stmt: statement2
---
name: doc2
...
```
This provides a layered naming scheme for the statements themselves. It is not usually important to name things except for documentation or metric naming purposes.
If no names are provided, then names are automatically created for blocks and statements. Statements assigned at the document level are assigned to "block0". All other statements are named with the format `doc#--block#--stmt#`.
For example, the full name of statement1 above would be `doc1--block1--stmt1`.
**NOTE:**
If you anticipate wanting to get metrics for a specific statement in addition to the other metrics, then you will want to adopt the habit of naming all your statements something basic and descriptive.


@@ -5,44 +5,50 @@ weight: 10
# Named Scenarios
There is one final element of a yaml that you need to know about: _named scenarios_.
**Named Scenarios allow anybody to run your testing workflows with a single command.**
You can provide named scenarios for a workload like this:
```yaml
# contents of myworkloads.yaml
scenarios:
  default:
    - run driver=diag cycles=10 alias=first-ten
    - run driver=diag cycles=10..20 alias=second-ten
  longrun:
    - run driver=diag cycles=10M
```
This provides a way to specify more detailed workflows that users may want to run without them having to build up a command line for themselves.
A couple of other forms are supported in the YAML, for terseness:
```yaml
scenarios:
  oneliner: run driver=diag cycles=10
  mapform:
    part1: run driver=diag cycles=10 alias=part1
    part2: run driver=diag cycles=20 alias=part2
```
These forms simply provide finesse for common editing habits, but they are automatically read internally as a list. In the map form, the names are discarded, but they may be descriptive enough for use as inline docs for some users. The order is retained as listed, since the names have no bearing on the order.
## Scenario selection
When a named scenario is run, it is *always* named, so that it can be looked up in the list of named scenarios under your `scenarios:` property. The only exception to this is when an explicit scenario name is not found on the command line, in which case it is automatically assumed to be _default_.
Some examples may be more illustrative:
@@ -67,60 +73,75 @@ nb scenario myworkloads longrun default longrun scenario another.yaml name1 name
## Workload selection
The examples above contain no reference to a workload (formerly called _yaml_). They don't need to, as they refer to themselves implicitly. You may add a `workload=` parameter to the command templates if you like, but this is never needed for basic use, and it is error-prone to keep the filename matched to the command template. Just leave it out by default.
_However_, if you are doing advanced scripting across multiple systems, you can actually provide a `workload=` parameter particularly to use another workload description in your test.
**NOTE:**
This is a powerful feature for workload automation and organization. However, it can get unwieldy quickly. Caution is advised for deep-linking too many scenarios in a workspace, as there is no mechanism for keeping them in sync when small changes are made.
## Named Scenario Discovery
For named scenarios, there is a way for users to find all the named scenarios that are currently bundled or in view of their current directory. A couple of simple rules must be followed by scenario publishers in order to keep things simple:
1. Workload files in the current directory `*.yaml` are considered.
2. Workload files in the relative path `activities/` with name `*.yaml` are considered.
3. The same rules are used when looking in the bundled nosqlbench, so built-ins come along for the ride.
4. Any workload file that contains a `scenarios:` tag is included, but all others are ignored.
This doesn't mean that you can't use named scenarios for workloads in other locations. It simply means that when users use the `--list-scenarios` option, these are the only ones they will see listed.
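
For example, from a directory containing workload files, users can discover the published scenarios with:

```text
./nb --list-scenarios
```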
## Parameter Overrides
You can override parameters that are provided by named scenarios. Any parameter that you specify on the command line after your workload and optional scenario name will be used to override or augment the commands that are provided for the named scenario.
This is powerful, but it also means that you can sometimes munge user-provided activity parameters on the command line with the named scenario commands in ways that may not make sense. To solve this, the parameters in the named scenario commands may be locked. You can lock them silently, or you can provide a verbose locking that will cause an error if the user even tries to adjust them.
Silent locking is provided with a form like `param==value`. Any silently locked parameters will reject overrides from the command line, but will not interrupt the user.
Verbose locking is provided with a form like `param===value`. Any time a user provides a parameter on the command line for the named parameter, an error is thrown and they are informed that this is not possible. This level is provided for cases in which you would not want the user to be unaware of an unset parameter which is germane and specific to the named scenario.
All other parameters provided by the user will take the place of the same-named parameters provided in *each* command template, in the order they appear in the template. Any other parameters provided by the user will be added to *each* of the command templates in the order they appear on the command line.
This is a little counter-intuitive at first, but once you see some examples it should make sense.
## Parameter Override Examples
@@ -129,18 +150,19 @@ Consider a simple workload with three named scenarios:
```yaml
# basics.yaml
scenarios:
  s1: run driver=stdout cycles=10
  s2: run driver=stdout cycles==10
  s3: run driver=stdout cycles===10
bindings:
  c: Identity()
statements:
  - A: "cycle={c}\n"
```
Running this with no options prompts the user to select one of the named scenarios:
```text
$ nb basics
@@ -150,8 +172,8 @@ $
### Basic Override example
If you run the first scenario `s1` with your own value for `cycles=7`, it does as you ask:
```text
$ nb basics s1 cycles=7
@@ -168,8 +190,10 @@ $
### Silent Locking example
If you run the second scenario `s2` with your own value for `cycles=7`, then it does what the locked parameter `cycles==10` requires, without telling you that it is ignoring the specified value on your command line.
```text
$ nb basics s2 cycles=7
@@ -187,13 +211,16 @@ cycle=9
$
```
Sometimes, this is appropriate, such as when specifying settings like `threads==` for schema phases.
### Verbose Locking example
If you run the third scenario `s3` with your own value for `cycles=7`, then you will get an error telling you that this is not possible. Sometimes you want to make sure that the user knows a parameter should not be changed, and that if they want to change it, they'll have to make their own custom version of the scenario in question.
```text
$ nb basics s3 cycles=7
@@ -201,52 +228,65 @@ ERROR: Unable to reassign value for locked param 'cycles===7'
$
```
Ultimately, it is up to the scenario designer when to lock parameters for users. The built-in workloads offer some examples on how to set these parameters so that the right values are locked in place without bothering the user, but some values are made very clear in how they should be set. Please look at these examples for inspiration when you need them.
## Forcing Undefined (default) Parameters
If you want to ensure that any parameter in a named scenario template remains unset in the generated scenario script, you can assign it a value of `UNDEF`. The locking behaviors described above apply to this one as well. Thus, for schema commands which rely on the default sequence length (which is based on the number of active statements), you can set `cycles==UNDEF` to ensure that when a user passes a cycles parameter the schema phase doesn't break with too many cycles.
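
As a sketch, a named scenario that locks the schema step's cycles back to the default while leaving the main step overridable might look like this. The driver, tags, and cycle count shown are illustrative:

```yaml
scenarios:
  default:
    - run driver=cql tags=phase:schema threads==1 cycles==UNDEF
    - run driver=cql tags=phase:main cycles=10M
```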
## Automatic Parameters
Some parameters are already known due to the fact that you are using named scenarios.
### workload
The `workload` parameter is, by default, set to the logical path (fully qualified workload name) of the yaml file containing the named scenario. However, if the command template contains this parameter, it may be overridden by users as any other parameter depending on the assignment operators as explained above.
### alias
The `alias` parameter is, by default, set to the expanded name of WORKLOAD_SCENARIO_STEP, which means that each activity within the scenario has a distinct and symbolic name. This is important for distinguishing metrics from one another across workloads, named scenarios, and steps within a named scenario. The above words are interpolated into the alias as follows:
- WORKLOAD - The simple name part of the fully qualified workload name. For example, with a workload (yaml path) of foo/bar/baz.yaml, the WORKLOAD name used here would be `baz`.
- SCENARIO - The name of the scenario as provided on the command line.
- STEP - The name of the step in the named scenario. If you used the list or string forms to provide a command template, then the steps are automatically named as a zero-padded number representing the step in the named scenario, starting from `000`, per named scenario. (The numbers are not globally assigned)
Because it is important to have uniquely named activities for the sake of sane metrics and logging, any alias provided when using named scenarios which does not include the three tokens above will cause a warning to be issued to the user explaining why this is a bad idea.
**NOTE:**
UNDEF is handled before alias expansion above, so it is possible to force the default activity naming behavior above with `alias===UNDEF`. This is generally recommended, and will inform users if they try to set the alias in an unsafe way.


@@ -5,12 +5,14 @@ weight: 2
# Example Commands
Let's run a simple test against a cluster to establish some basic familiarity with NoSQLBench.
## Create a Schema
We will start by creating a simple schema in the database. From your command line, go ahead and execute the following command, replacing the `host=<host-or-ip>` with that of one of your database nodes.
```text
./nb run driver=cql workload=cql-keyvalue tags=phase:schema host=<host-or-ip>
@@ -33,28 +35,36 @@ Let's break down each of those command line options.
`run` tells nosqlbench to run an activity.
`driver=...` is used to specify the activity type (driver). In this case we are using `cql`, which tells nosqlbench to use the DataStax Java Driver and execute CQL statements against a database.
`workload=...` is used to specify the workload definition file that defines the activity.
In this example, we use `cql-keyvalue` which is a pre-built workload that is packaged with nosqlbench.
`tags=phase:schema` tells nosqlbench to run the yaml block that has the `phase:schema` defined as one of its tags.
In this example, that is the DDL portion of the `cql-keyvalue` workload. `host=...` tells nosqlbench how to connect to your database; only one host is necessary.
If you like, you can verify the result of this command by describing your keyspace in cqlsh or DataStax Studio with
`DESCRIBE KEYSPACE baselines`.
## Load Some Data
Before running a test of typical access patterns where you want to capture the results, you need to make the test more interesting than loading an empty table. For this, we use the rampup phase.
Before sending our test writes to the database, we will use the `stdout` activity type so we can see what nosqlbench is generating for CQL statements.
Go ahead and execute the following command:
@@ -75,29 +85,36 @@ insert into baselines.keyvalue (key, value) values (8,296173906);
insert into baselines.keyvalue (key, value) values (9,97405552);
```
NoSQLBench deterministically generates data, so the generated values will be the same from run to run.
Now we are ready to write some data to our database. Go ahead and execute the following from your command line:
./nb run driver=cql workload=cql-keyvalue tags=phase:rampup host=<host-or-ip> cycles=100k --progress console:1s
Note the differences between this and the command that we used to generate the schema.
`tags=phase:rampup` is running the yaml block in `cql-keyvalue` that has only INSERT statements.
`cycles=100k` will run a total of 100,000 operations, in this case, 100,000 writes. You will want to pick an appropriately large number of cycles in actual testing to make your main test meaningful.
**NOTE:**
The cycles parameter is not just a quantity. It is a range of values. The `cycles=n` format is short for `cycles=0..n`, which makes cycles a zero-based range. For example, `cycles=5` means that the activity will use cycles 0,1,2,3,4, but not 5. The reason for this is explained in detail in the Activity Parameters section.
These parameters are explained in detail in the section on _Activity
Parameters_.
`--progress console:1s` will print the progression of the run to the
console every 1 second.
You should see output that looks like this:
```text
cql-keyvalue: 100.00%/Finished (details: min=0 cycle=100000 max=100000)
```
## Run the main test phase
Now that we have a base dataset of 100k rows in the database, we will now
run a mixed read / write workload, by default this runs a 50% read / 50%
write workload.
```
./nb run driver=cql workload=cql-keyvalue tags=phase:main host=<host-or-ip> cycles=100k cyclerate=5000 threads=50 --progress console:1s
```
```text
cql-keyvalue: 100.00%/Finished (details: min=0 cycle=100000 max=100000)
```
We have a few new command line options here:
`tags=phase:main` is using a new block in our activity's yaml that
contains both read and write queries.
`threads=50` is an important one. The default for nosqlbench is to run
with a single thread. This is not adequate for workloads that will be
running many operations, so threads is used as a way to increase
concurrency on the client side.
`cyclerate=5000` is used to control the operations per second that are
initiated by nosqlbench. This command line option is the primary means to
rate limit the workload and here we are running at 5000 ops/sec.
## Now What?
Note in the above output, we
see `Logging to logs/scenario_20190812_154431_028.log`.
By default nosqlbench records the metrics from the run in this file, we
will go into detail about these metrics in the next section Viewing
Results.
# Example Results
We just ran a very simple workload against our database. In that example,
we saw that nosqlbench writes to a log file and it is in that log file
where the most basic form of metrics are displayed.
## Log File Metrics
For our previous run, we saw that nosqlbench was writing
to `logs/scenario_20190812_154431_028.log`
Even when you don't configure nosqlbench to write its metrics to another
location, it will periodically report all the metrics to the log file. At
the end of a scenario, before nosqlbench shuts down, it will flush the
partial reporting interval again to the logs. This means you can always
look in the logs for metrics information.
**WARNING:**
If you look in the logs for metrics, be aware that the last report will
only contain a partial interval of results. When looking at the last
partial window, only metrics which average over time or which compute the
mean for the whole test will be meaningful.
Below is a sample of the log that gives us our basic metrics. There is a
lot to digest here; for now we will only focus on a subset of the most
important metrics.
```text
2019-08-12 15:46:00,274 INFO [main] i.e.c.ScenarioResult [ScenarioResult.java:48] -- BEGIN METRICS DETAIL --
2019-08-12 15:46:01,703 INFO [main] i.e.c.ScenarioResult [ScenarioResult.java:56] -- END METRICS DETAIL --
```
The log contains lots of information on metrics, but this is obviously
_not_ the most desirable way to consume metrics from nosqlbench.
We recommend that you use one of these methods, according to your
environment or tooling available:
1. `--docker-metrics` with a local docker-based grafana dashboard (See the
section on Docker Based Metrics)
2. Send your metrics to a dedicated graphite server
with `--report-graphite-to graphitehost`
3. Record your metrics to local CSV files
with `--report-csv-to my_metrics_dir`
4. Record your metrics to HDR logs
with `--log-histograms my_hdr_metrics.log`
See the command line reference for details on how to route your metrics to
a metrics collector or format of your preference.
# Activity Parameters
Activity parameters are passed as named arguments for an activity, either
on the command line or via a scenario script. On the command line, these
take the form of
```
<paramname>=<paramvalue>
```
Some activity parameters are universal in that they can be used with any
driver type. These parameters are recognized by nosqlbench whether or not
they are recognized by a particular driver implementation. These are
called _core parameters_. Only core activity parameters are documented
here.
**NOTE:**
To see what activity parameters are valid for a given activity type, see
the documentation for that activity type with
`nb help <activity type>`.
When starting out, you want to familiarize yourself with these parameters.
The most important ones to learn about first are driver, cycles and
threads.
## driver
For historic reasons, you can also use `type`. They both mean the same
thing for now, but `driver` is more descriptive. The `type` parameter will
continue to be supported in this major version (3.x), but it will be an
error to use it in 4.x and newer.
- `driver=<activity type>`
- _default_: inferred from `alias` or `yaml` parameters, or unset
Every activity is powered by a named ActivityType. Thus, you must set
the `type` parameter. If you do not specify this parameter, it will be
inferred from a substring match against the alias and/or yaml parameters.
If there is more than one valid match for a valid type value, then you
must set the type parameter directly.
Telling nosqlbench what type of activity will be run also determines
what other parameters are considered valid and how they will be used. So
in this way, the type parameter is actually the base parameter for any
activity. When used with scenario commands like `run` or `start`, an
activity of the named type will be initialized, and then further activity
parameters on the command line will be used to configure it before it is
started.
## alias
You *should* set the _alias_ parameter when you have multiple activities,
when you want to name metrics per-activity, or when you want to control
activities via scripting.
Each activity can be given a symbolic name known as an _alias_. It is good
practice to give all your activities an alias, since this determines the
name used in logging, metrics, and even scripting control.
_default value_ : The name of any provided YAML filename is used as the
basis for the default alias. Otherwise, the activity type name is used.
This is a convenience for simple test scenarios only.

## threads
You *should* set the _threads_ parameter when you need to ramp up a
workload.
Each activity can be created with a number of threads. It is important to
adjust this setting to the system types used by nosqlbench.
_default value_ : For now, the default is simply *1*. Users must be aware
of this setting and adjust it to a reasonable value for their workloads.
`threads=auto` : When you set `threads=auto`, it will set the number of
threads to 10x the number of cores in your system. There is no distinction
here between full cores and hardware threads. This is generally a
reasonable number of threads to tap into the processing power of a client
system.
`threads=__x` : When you set `threads=5x` or `threads=10x`, you will set
the number of threads to some multiplier of the logical CPUs in the local
system.
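As an illustration of the multipliers described above, the thread count
resolves roughly like this (a sketch of the assumed behavior, not the
actual nosqlbench implementation):

```python
import os

# Sketch of how `threads=auto` and `threads=<n>x` resolve (assumed
# behavior): both forms scale by the logical CPU count of the client.
def resolve_threads(spec, cpus=None):
    cpus = cpus if cpus is not None else os.cpu_count()
    if spec == "auto":
        return 10 * cpus                # auto => 10x logical CPUs
    if spec.endswith("x"):
        return int(spec[:-1]) * cpus    # e.g. "5x" => 5 * logical CPUs
    return int(spec)                    # a plain number is used as-is

# On an 8-vCPU client: "auto" -> 80 threads, "5x" -> 40 threads
```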
**NOTE:**
The threads parameter will work slightly differently for activities using
the async parameter. For example, when `async=500` is provided, then the
number of async operations is split between all configured threads, and
each thread will juggle a number of in-flight operations asynchronously.
Without the async parameter, threads determines the logical concurrency
level of nosqlbench in the classic 'request-per-thread' mode. Neither mode
is strictly correct, and both modes can be used for more accurate testing
depending on the constraints of your environment.
A good rule of thumb for setting threads for maximum effect is to set it
relatively high, such as 10XvCPU when running synchronous workloads
(when not providing the async parameter), and to 5XvCPU for all async
workloads. Variation in system dynamics makes it difficult to peg an ideal
number, so experimentation is encouraged while you dial in your settings
initially.
## cycles
- _dynamic_: no
The cycles parameter determines the starting and ending point for an
activity. It determines the range of values which will act as seed values
for each operation. For each cycle of the test, a statement is built from
a statement template and executed as an operation.
If you do not set the cycles parameter, then it will automatically be set
to the size of the sequence. The sequence is simply the length of the op
sequence that is constructed from the active statements and ratios in your
activity YAML.
You *should* set the cycles for every activity except for schema-like
activities, or activities which you run just as a sanity check of active
number of cycles, and is equivalent to `cycles=0..<cycle max>`. In both
cases, the max value is not the actual number of the last cycle. This is
because all cycle parameters define a closed-open interval. In other
words, the minimum value is either zero by default or the specified
minimum value, but the maximum value is the first value *not* included in
the interval. This means that you can easily stack intervals over
subsequent runs while knowing that you will cover all logical cycles
without gaps or duplicates. For example, given `cycles=1000` and then
`cycles=1000..2000`, and then `cycles=2000..5K`, you know that all cycles
between 0 (inclusive) and 5000 (exclusive) have been specified.
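Because every cycle range is a closed-open interval, stacked ranges tile
perfectly. A quick sketch of the example above:

```python
# cycles=1000, then cycles=1000..2000, then cycles=2000..5K: three runs
# that together cover cycles 0 (inclusive) through 5000 (exclusive).
runs = [(0, 1000), (1000, 2000), (2000, 5000)]
covered = [cycle for lo, hi in runs for cycle in range(lo, hi)]

assert covered == list(range(5000))  # no gaps, no duplicates
```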
## stride
- `stride=<stride>`
- _default_: same as op sequence length
- _required_: no
Usually, you don't want to provide a setting for stride, but it is still
important to understand what it does. Within nosqlbench, each time a
thread needs to allocate a set of cycles to operate on, it takes a
contiguous range of values from a shared atomic value. Thus, the stride is
the unit of micro-batching within nosqlbench. It also means that you can
use stride to optimize a workload by setting the value higher than the
default. For example if you are running a single-statement workload at a
very high rate, it doesn't make sense for threads to allocate one op at a
time from a shared atomic value. You can simply set
`stride=1000` to cause (ballpark estimation) about 1000X less internal
contention.
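The allocation scheme can be sketched as follows (an illustration of the
idea, not nosqlbench's actual Java internals): each thread claims a
contiguous block of cycles from a shared counter, so a larger stride means
far fewer claims against that counter.

```python
import itertools

# Each call claims the next contiguous block of `stride` cycles from a
# shared counter; a thread then works through its block privately.
def take_stride(counter, stride):
    start = next(counter)
    return range(start, start + stride)

counter = itertools.count(0, 1000)   # counter stepping by stride=1000
first = take_stride(counter, 1000)   # cycles 0..999
second = take_stride(counter, 1000)  # cycles 1000..1999
```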
The stride is initialized to the calculated sequence length. The sequence
length is simply the number of operations in the op sequence that is
planned from your active statements and their ratios.
You usually do not want to set the stride directly. If you do, make sure
it is a multiple of what it would normally be set to if you need to ensure
that sequences are not divided up differently. This can be important when
simulating the access patterns of applications.
**NOTE:**
When simulating multi-op access patterns in non-async mode, the stride
metric can tell you how long it took for a whole group of operations to
complete.
## async
- _required_: no
- _dynamic_: no
The `async=<ops>` parameter puts an activity into an asynchronous dispatch
mode and configures each thread to juggle a proportion of the operations
specified. If you specify `async=500 threads=10`, then each of 10 threads
will manage execution of 50 operations at a time. With async mode, a
thread will always prepare and send operations if there are fewer in
flight than it is allotted before servicing any pending responses.
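The division described above can be sketched in one line (assumed
semantics from the example in the text):

```python
# async=500 threads=10 => each thread juggles 50 in-flight operations
def ops_per_thread(async_ops, threads):
    return async_ops // threads

assert ops_per_thread(500, 10) == 50
```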
Async mode also puts threads into a different sequencing behavior. When in
async mode, responses from an operation may arrive in a different order
than they are sent, and thus linearized operations can't be guaranteed as
with the non-async mode. This means that sometimes you want to avoid
async mode when you are intentionally simulating access patterns with
multiple linearized operations per user as you may see in your
application.
The absence of the async parameter leaves the activity in the default
non-async mode, where each thread works through a sequence of ops one
at a time.

## cyclerate

The cyclerate parameter sets a maximum op rate for individual cycles
within the activity, across the whole activity, irrespective of how many
threads are active.
**NOTE:**
The cyclerate is a rate limiter, and can thus only throttle an activity to
be slower than it would otherwise run. Rate limiting is also an invasive
element in a workload, and will always come at a cost. For extremely high
throughput testing, consider carefully whether your testing would benefit
more from concurrency-based throttling as with async or the striderate
described below.
When the cyclerate parameter is provided, two additional metrics are
tracked: the wait time and the response time. See the 'Reference|Timing
Terms' section for more details on these metrics.
_default_: None. When the cyclerate parameter is not provided, an activity
runs as fast as it can given how fast operations can complete.
Examples:
- `cyclerate=1000` - set the cycle rate limiter to 1000 ops/s and a
default burst ratio of 1.1.
- `cyclerate=1000,1.0` - same as above, but with burstrate set to 1.0
- `cyclerate=1000,1.5` - same as above, but with burstrate set to 1.5
  (50% burst allowed)
Synonyms:
- `rate`
- `targetrate`
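The `rate[,burstratio]` form shown in the examples can be sketched as a
tiny parser (the comma-separated format is from the examples above; the
parser itself is purely illustrative):

```python
# Parse specs like "1000" or "1000,1.0" into (ops_per_sec, burst_ratio),
# defaulting the burst ratio to 1.1 when it is omitted.
def parse_cyclerate(spec, default_burst=1.1):
    parts = spec.split(",")
    rate = float(parts[0])
    burst = float(parts[1]) if len(parts) > 1 else default_burst
    return rate, burst

assert parse_cyclerate("1000") == (1000.0, 1.1)
assert parse_cyclerate("1000,1.0") == (1000.0, 1.0)
```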
### burst ratio
This is only an optional part of the cyclerate as shown in examples above.
If you do not specify it when you initialize a cyclerate, then it defaults
to 1.1. The burst ratio is only valid as part of a rate limit and cannot
be specified by itself.
* _default_: `1.1`
* _dynamic_: yes
The nosqlbench rate limiter provides a sliding scale between strict rate
limiting and average rate limiting. The difference between them is
controlled by a _burst ratio_ parameter. When the burst ratio is 1.0
(burst up to 100% relative rate), the rate limiter acts as a strict rate
limiter, disallowing faster operations from using time that was previously
forfeited by prior slower operations. This is a "use it or lose it" mode
that means things like GC events can steal throughput from a running
client as a necessary effect of losing time in a strict timing sense.
When the burst ratio is set to higher than 1.0, faster operations may
recover lost time from previously slower operations. For example, a burst
ratio of 1.3 means that the rate limiter will allow bursting up to 130% of
the base rate, but only until the average rate is back to 100% relative
speed. This means that any valleys created in the actual op rate of the
client can be converted into plateaus of throughput above the strict rate,
but only at a speed that fits within (op rate * burst ratio). This allows
for workloads to approximate the average target rate over time, with
controllable bursting rates. This ability allows for near-strict behavior
while allowing clients to still track truer to rate limit expectations, so
long as the overall workload is not saturating resources.
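The recovery behavior can be sketched numerically (assumed semantics based
on the description above, not the actual rate limiter code):

```python
# With burst ratio b, a client that has fallen behind the target rate may
# run at up to base_rate * b until its cumulative average catches back up.
def allowed_rate(base_rate, burst_ratio, ops_done, elapsed_s):
    expected = base_rate * elapsed_s    # ops that "should" be done by now
    if ops_done < expected:             # behind schedule: bursting allowed
        return base_rate * burst_ratio
    return base_rate                    # caught up: hold the strict rate

assert allowed_rate(1000, 1.3, 500, 1.0) == 1300   # behind => 130% burst
assert allowed_rate(1000, 1.3, 1200, 1.0) == 1000  # caught up => base rate
```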
**NOTE:**
The default burst ratio of 1.1 makes testing results slightly more stable
on average, but can also hide some short-term slow-downs in system
throughput. It is set at the default to fit most testers' expectations for
averaging results, but it may not be strict enough for your testing
purposes. However, a strict setting of 1.0 nearly always adds cold/startup
time to the result, so if you are testing for steady state, be sure to
account for this across test runs.
## striderate
The `striderate` parameter allows you to limit the start of a stride
according to some rate. This works almost exactly like the cyclerate
parameter, except that it blocks a whole group of operations from
parameter, except that it blocks a whole group of operations from starting
instead of a single operation. The striderate can use a burst ratio just
as the cyclerate.
This sets the target rate for strides. In nosqlbench, a stride is a group
of operations that are dispatched and executed together within the same
thread. This is useful, for example, to emulate application behaviors in
which some outside request translates to multiple internal requests. It is
also a way to optimize a client runtime for more efficiency and
throughput. The stride rate limiter applies to the whole activity
irrespective of how many threads it has.
**WARNING:**
When using the cyclerate and striderate options together, operations are
delayed based on both rate limiters. If the relative rates are not
synchronised with the size of a stride, then one rate limiter will
artificially throttle the other. Thus, it usually doesn't make sense to
use both of these settings in the same activity.
## seq
- _dynamic_: no
The `seq=<bucket|concat|interval>` parameter determines the type of
sequencing that will be used to plan the op sequence. The op sequence is a
look-up-table that is used for each stride to pick statement forms
according to the cycle offset. It is simply the sequence of statements
from your YAML that will be executed, but in a pre-planned, and highly
efficient form.
might expect will happen: those statements will occur multiple times to
meet their ratio in the op mix. You can customize the op mix further by
changing the seq parameter to concat or interval.
**NOTE:**
The op sequence is a look up table of statement templates, *not*
individual statements or operations. Thus, the cycle still determines the
uniqueness of an operation as you would expect. For example, if statement
form ABC occurs 3x per sequence because you set its ratio to 3, then each
of these would manifest as a distinct operation with fields determined by
distinct cycle values.
There are three schemes to pick from:
frequency over a unit interval of time, and apportions the associated
operation to occur evenly over that time. When two operations would be
assigned the same time, then the order of appearance establishes
precedence. In other words, statements appearing first win ties for the
same time slot. The ratios A:4 B:2 C:1 would yield the sequence A B C A A
B A. This occurs because, over the unit interval (0.0,1.0), A is assigned
the positions `A: 0.0, 0.25, 0.5, 0.75`, B is assigned the
positions `B: 0.0, 0.5`, and C is assigned position `C: 0.0`. These
offsets are all sorted with a position-stable sort, and then the
associated ops are taken as the order.
In detail, the rendering appears
as `0.0(A), 0.0(B), 0.0(C), 0.25(A), 0.5(A), 0.5(B), 0.75(A)`, which
yields `A B C A A B A` as the op sequence.
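The interval scheme described above can be sketched directly (an
illustrative re-implementation, not the nosqlbench source); it reproduces
the `A B C A A B A` sequence:

```python
# Assign each op `ratio` evenly spaced positions on the unit interval,
# then position-stable sort: ties are won by earlier appearance.
def interval_sequence(ratios):
    events = []
    for order, (name, ratio) in enumerate(ratios):
        for k in range(ratio):
            events.append((k / ratio, order, name))
    events.sort(key=lambda e: (e[0], e[1]))
    return [name for _, _, name in events]

assert interval_sequence([("A", 4), ("B", 2), ("C", 1)]) == list("ABCAABA")
```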
This sequencer is most useful when you want a stable ordering of
operations from a rich mix of statement types, where each operation is
spaced as evenly as possible over time, and where it is not important to
control the cycle-by-cycle sequencing of statements.
## hdr_digits
- _required_: no
- _dynamic_: no
This parameter determines the number of significant digits used in all HDR
histograms for metrics collected from this activity. The default of 4
allows 4 significant digits, which means *up to* 10000 distinct histogram
buckets per named metric, per histogram interval. This does not mean that
there _will be_ 10000 distinct buckets, but it means there could be if
there is significant volume and variety in the measurements.
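As a rough illustration of that bucket ceiling (this is just the
arithmetic implied by the text above, not nosqlbench's internal code),
the bucket count grows by a factor of ten per significant digit:

```python
def max_hdr_buckets(sig_digits: int) -> int:
    # HDR histograms resolve values to one part in 10**sig_digits,
    # so each named metric may need up to that many distinct buckets
    # per histogram interval.
    return 10 ** sig_digits

print(max_hdr_buckets(4))  # 10000, the default described above
print(max_hdr_buckets(1))  # 10, far cheaper per histogram
```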
If you are running a scenario that creates many activities, then you can
set `hdr_digits=1` on some of them to save client resources.
# Core Statement Parameters
Some statement parameters are recognized by the nosqlbench runtime and can
be used on any statement in a YAML file.
## *ratio*
A statement parameter called _ratio_ is supported by every workload. It
can be attached to a statement, a block, or a document-level parameter
block. It sets the relative ratio of a statement in the op sequence
before an activity is started.
When an activity is initialized, all of the active statements are
combined into a sequence based on their relative ratios. By default, all
statement templates are initialized with a ratio of 1 if none is
specified by the user.
For example, consider the statements below:
```yaml
statements:
- s1: "select foo,bar from baz where ..."
ratio: 1
- s2: "select bar,baz from foo where ..."
ratio: 2
- s3: "select baz,foo from bar where ..."
ratio: 3
```
If all statements are activated (there is no tag filtering), then the
activity will be initialized with a sequence length of 6. In this case,
the relative ratio of statement "s3" will be 50% overall. If you filtered
out the first statement, then the sequence would be 5 operations long. In
this case, the relative ratio of statement "s3" would be 60% overall. It
is important to remember that statement ratios are always relative to the
total sum of the active statements' ratios.
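The arithmetic above can be double-checked with a short sketch, reusing
the statement names from the YAML example:

```python
ratios = {"s1": 1, "s2": 2, "s3": 3}

seq_len = sum(ratios.values())
print(seq_len)                 # 6: the sequence length with all active
print(ratios["s3"] / seq_len)  # 0.5: s3 is 50% of the op mix

# Filtering out s1 shortens the sequence and raises s3's share:
active = {k: v for k, v in ratios.items() if k != "s1"}
print(ratios["s3"] / sum(active.values()))  # 0.6: now 60%
```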
**NOTE:**
Because the ratio works so closely with the activity parameter `seq`,
the description for that parameter is included below.
### *seq* (activity level - do not use on statements)
- _required_: no
- _dynamic_: no
The `seq=<bucket|concat|interval>` parameter determines the type of
sequencing that will be used to plan the op sequence. The op sequence is a
look-up-table that is used for each stride to pick statement forms
according to the cycle offset. It is simply the sequence of statements
from your YAML that will be executed, but in a pre-planned, and highly
efficient form.
An op sequence is planned for every activity. With the default ratio on
every statement as 1, and the default bucket scheme, the basic result is
that each active statement will occur once in the order specified. Once
you start adding ratios to statements, the most obvious thing that you
might expect will happen: those statements will occur multiple times to
meet their ratio in the op mix. You can customize the op mix further by
changing the seq parameter to concat or interval.
**NOTE:**
The op sequence is a look-up table of statement templates, *not*
individual statements or operations. Thus, the cycle still determines
the uniqueness of an operation as you would expect. For example, if
statement form ABC occurs 3x per sequence because you set its ratio to
3, then each of these would manifest as a distinct operation with fields
determined by distinct cycle values.
There are three schemes to pick from:
### bucket
This is a round-robin planner which draws operations from buckets in
circular fashion, removing each bucket as it is exhausted. For example,
the ratios A:4, B:2, C:1 would yield the sequence A B C A B A A. The
ratios A:1, B:5 would yield the sequence A B B B B B.
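A minimal sketch of this round-robin draw (illustrative only, not the
actual planner implementation):

```python
def bucket_seq(ratios):
    # One bucket of repeated names per statement, drawn in circular
    # fashion; an exhausted bucket drops out of the rotation.
    buckets = [[name] * ratio for name, ratio in ratios]
    seq = []
    while buckets:
        for bucket in list(buckets):
            seq.append(bucket.pop())
            if not bucket:
                buckets.remove(bucket)
    return seq

print("".join(bucket_seq([("A", 4), ("B", 2), ("C", 1)])))  # ABCABAA
print("".join(bucket_seq([("A", 1), ("B", 5)])))            # ABBBBB
```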
### concat
This simply takes each statement template as it occurs in order and
duplicates it in place to achieve the ratio. The ratios above (A:4, B:2,
C:1) would yield the sequence A A A A B B C for the concat sequencer.
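In sketch form (again illustrative, not the planner's own code), concat
is a simple in-place repetition in document order:

```python
def concat_seq(ratios):
    # Duplicate each statement in place to reach its ratio.
    return [name for name, ratio in ratios for _ in range(ratio)]

print("".join(concat_seq([("A", 4), ("B", 2), ("C", 1)])))  # AAAABBC
```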
### interval
This is arguably the most complex sequencer. It takes each ratio as a
frequency over a unit interval of time, and apportions the associated
operation to occur evenly over that time. When two operations would be
assigned the same time, then the order of appearance establishes
precedence. In other words, statements appearing first win ties for the
same time slot. The ratios A:4 B:2 C:1 would yield the sequence A B C A A
B A. This occurs because, over the unit interval
(0.0,1.0), A is assigned the positions `A: 0.0, 0.25, 0.5, 0.75`, B is
assigned the positions `B: 0.0, 0.5`, and C is assigned position `C: 0.0`.
These offsets are all sorted with a position-stable sort, and then the
associated ops are taken as the order.
In detail, the rendering appears
as `0.0(A), 0.0(B), 0.0(C), 0.25(A), 0.5(A), 0.5(B), 0.75(A)`, which
yields `A B C A A B A` as the op sequence.
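The position assignment and position-stable sort described above can be
sketched as follows (illustrative, not the actual implementation):

```python
def interval_seq(ratios):
    # Each ratio r yields positions k/r on the unit interval; a stable
    # sort then breaks ties in favor of statements appearing first.
    slots = [(k / ratio, name)
             for name, ratio in ratios
             for k in range(ratio)]
    slots.sort(key=lambda slot: slot[0])
    return [name for _, name in slots]

print("".join(interval_seq([("A", 4), ("B", 2), ("C", 1)])))  # ABCAABA
```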
This sequencer is most useful when you want a stable ordering of
operations from a rich mix of statement types, where each operation is
spaced as evenly as possible over time, and where it is not important to
control the cycle-by-cycle sequencing of statements.
# Grafana Metrics
NoSQLBench comes with a built-in helper to get you up and running quickly
with client-side testing metrics. This functionality is based on docker,
and a built-in method for bringing up a docker stack, automated by
NoSQLBench.
**WARNING:**
This feature requires that you have docker running on the local system and
that your user is in a group that is allowed to manage docker. Using
the `--docker-metrics` command *will* attempt to manage docker on your
local system.
To ask nosqlbench to stand up your metrics infrastructure using a local
docker runtime, use this command line option with any other nosqlbench
commands:
--docker-metrics
Each activity metric for a given activity alias is available at this
name. This object can be used directly. Some metrics objects have also
been enhanced with wrapper logic to provide simple getters and setters,
like `.p99ms` or `.p99ns`, for example.
Interaction with the nosqlbench runtime and the activities therein is
made easy by the above variables and objects. When an assignment is made
to any of these variables, the changes are propagated to internal
listeners. For changes to _threads_, the thread pool responsible for the
affected activity adjusts the number of active threads (AKA slots).
Other changes are further propagated directly to the thread harnesses
and components which implement the ActivityType.
**WARNING:**
Assignment to the _workload_ and _alias_ activity parameters has no
special effect, as you can't change an activity to a different driver once
it has been created.
You can make use of more extensive Java or Javascript libraries as
needed, mixing them with the runtime controls provided above.
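The propagation described above follows a listener pattern, which can be
pictured with a small sketch (hypothetical names; the real runtime wires
such listeners up internally):

```python
class ActivityParams:
    """Sketch: assignments to parameters notify registered listeners."""

    def __init__(self):
        self._values = {}
        self._listeners = []

    def on_change(self, listener):
        self._listeners.append(listener)

    def __setitem__(self, key, value):
        self._values[key] = value
        for listener in self._listeners:
            # e.g. a thread-pool listener resizes when 'threads' changes
            listener(key, value)

params = ActivityParams()
params.on_change(lambda k, v: print(f"adjusting {k} -> {v}"))
params["threads"] = 16  # prints: adjusting threads -> 16
```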
## Enhanced Metrics for Scripting
# Advanced Testing
**NOTE:**
Some of the features discussed here are only for advanced testing
scenarios. First-time users should become familiar with the basic options
first.
## Hybrid Rate Limiting