docs naming and formatting

This commit is contained in:
Jonathan Shook 2020-03-26 12:33:46 -05:00
parent 3fcd0f6159
commit cf81331a41
52 changed files with 1083 additions and 1004 deletions


@ -1,16 +1,17 @@
# cql driver - advanced features
This is an addendum to the standard CQL Activity Type docs. For that,
see "cql". Use the features in this guide carefully. Because they are
less commonly used than the main CQL features, they do not come with as
much documentation.
### ResultSet and Row operators
Within the CQL Activity type, in synchronous mode (activities without
the async= parameter), you have the ability to attach operators to a
given statement such that it will get per-statement handling. These
operators are ways of interrogating the result of an operation, saving
values, or managing other side-effects for specific types of testing.
When enabled for a statement, operators are applied in this order:
@ -35,7 +36,7 @@ row data, you must apply a row operator as explained below.
- **rowoperators** - If provided as a CQL statement param, then the
list of operator names that follow, separated by a comma, will
be used to attach Row operators to the given statement.
## Available ResultSet Operators
- pushvars - Push a copy of the current thread local variables onto
@ -44,11 +45,11 @@ row data, you must apply a row operator as explained below.
conjunction with the row operators below.
- popvars - Pop the last thread local variable set from the thread-local
stack into vars, replacing the previous content. This does nothing
with the ResultSet data.
- clearvars - Clears the contents of the thread local variables. This
does nothing with the ResultSet data.
- trace - Flags a statement to be traced on the server-side and then
logs the details of the trace to the trace log file.
- log - Logs basic data to the main log. This is useful to verify that
operators are loading and triggering as expected.
- assert_singlerow - Throws an exception (ResultSetVerificationException)
@ -61,22 +62,22 @@ Examples:
- s1: |
a statement
rsoperators: pushvars, clearvars
```
## Available Row Operators:
- savevars - Copies the values of the row into the thread-local variables.
- saverows - Copies the rows into a special CQL-only thread local row state.
Examples:
```
statements:
- s2: |
a statement
rowoperators: saverows
```
## Injecting additional Queries (Future)
It is possible to inject new operations into an activity. However, such
operations are _indirect_ to cycles, since they must be based on the results


@ -1,4 +1,4 @@
# cqlverify
This activity type allows you to read values from a database and compare them to
the generated values that were expected to be written, row-by-row, producing a


@ -1,5 +1,5 @@
---
title: Diag ActivityType
weight: 32
menu:
main:
@ -8,10 +8,9 @@ menu:
weight: 12
---
{{< warning >}}
This section is out of date, and will be updated after the next major
release with details on building async drivers.
{{< /warning >}}
If you take all the code chunks from this document and concatenate them
together, you'll have 'diag', one of the built-in activity types for
@ -241,4 +240,3 @@ report. If it is time to report, we mark the time in lastUpdate.
This is all there is to making an activity react to real-time changes in the activity definition.


@ -8,10 +8,9 @@ menu:
weight: 12
---
{{< warning >}}
This section is out of date, and will be updated after the next major
release with details on building async drivers.
{{< /warning >}}
## Introduction
@ -27,7 +26,7 @@ In an async activity, you still have multiple threads, but in this case, each th
more asynchronous operations. The `async=100` parameter, for example, informs an activity that it needs to allocate
100 total operations over the allocated threads. In the case of `async=100 threads=10`, it is the responsibility
of the ActivityType's action dispenser to configure their actions to know that each of them can juggle 10 operations
each.
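The allocation arithmetic described above can be sketched in a few lines. This is a hypothetical helper for illustration, not part of the NoSQLBench API:

```python
def per_thread_async_ops(async_total: int, threads: int) -> int:
    """Illustrative sketch: how an action dispenser might divide the
    total async budget (async=100) across its threads (threads=10)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    # Each thread's action juggles its share of the in-flight budget.
    return async_total // threads

# async=100 threads=10 -> each action can juggle 10 operations
print(per_thread_async_ops(100, 10))  # -> 10
```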
{{< note >}}The *async* parameter has a standard meaning in nosqlbench. If it is defined, async is enabled. Its
parameter value is the number of total async operations that can be in flight at any one instant, with the number
@ -42,7 +41,7 @@ behavior but getting something else.
The contract between a motor and an action is very basic.
- Each motor submits as many async operations as is allowed to its action, as long as there are
cycles remaining, until the action signals that it is at its limit.
- As long as an action is able to retire an operation by giving a result back to its motor,
the motor keeps providing one more and retiring one more, as long as there are cycles remaining.
@ -74,8 +73,8 @@ as a developer.
but it can return a simple op context if no specialization is needed.
4. op contexts are recycled to avoid heap pressure for high data rates. This makes it relatively
low-cost to use the specialized op context to hold contextual data that may otherwise be
expensive to _malloc_ and _free_.
### Examples
Developers can refer to the Diag activity type implementation for further examples.


@ -1,5 +1,5 @@
---
title: Building ActivityTypes
weight: 32
menu:
main:
@ -15,7 +15,7 @@ menu:
- Maven
## Building new Driver Types
1. Add the nosqlbench API to your project via Maven:


@ -1,22 +0,0 @@
## Help Topics
### Built-in Component Docs
Generally, all named activity types, input types, output types, etc
have their own documentation. You can access those with a command like:
PROG help diag
### Advanced Topics
For any of the topics listed here, you can get detailed help by
running PROG help <topic>.
- topics
- commandline
- cli_scripting
- activity_inputs
- activity_outputs
- cycle_log


@ -5,31 +5,34 @@ weight: 10
# Getting Support
In general, our goals with NoSQLBench are to make the help systems and examples wrap around the users like a suit of
armor, so that they feel capable of doing most things without having to ask for help. Please keep this in mind when
looking for personal support from our community, and help us find those places where the docs are lacking. Maybe you can
help us by adding some missing docs!
## NoSQLBench Slack
There is a new
[slack channel](https://join.slack.com/t/nosqlbench/shared_invite/zt-cu9f2jpe-XiHN3SsUDcjkVgxaURFuaw) for NoSQLBench.
Please join it if you are a new or existing NoSQLBench user and help us get it going!
## General Feedback
These guidelines are mirrored at the
[Submitting Feedback](https://github.com/nosqlbench/nosqlbench/wiki/Submitting-Feedback) wiki page at the nosqlbench
project site, which is also where any `[Submit Feedback]` links will take you.
## Bug Fixes
If you think you have found a bug, please
[file a bug report](https://github.com/nosqlbench/nosqlbench/issues/new?labels=bug). nosqlbench is actively used within
DataStax, and verified bugs will get attention as resources permit. Bug reports which are more detailed, or which
include steps to reproduce, will get attention first.
## Feature Requests
If you would like to see something in nosqlbench that is not there yet, please
[submit a feature request](https://github.com/nosqlbench/nosqlbench/issues/new?labels=feature).
## Documentation Requests


@ -5,12 +5,14 @@ weight: 0
## Welcome to NoSQLBench
Welcome to the documentation for NoSQLBench. This is a power tool that emulates real application workloads. This means
that you can fast-track performance, sizing and data model testing without writing your own testing harness.
To get started right away, jump to the
[Quick Start Example](/index.html#/docs/02_getting_started.html) from the menu on the left.
To see the ways you can get NoSQLBench, check out the project site
[DOWNLOADS.md](https://github.com/nosqlbench/nosqlbench/blob/master/DOWNLOADS.md).
## What is NoSQLBench?
@ -18,54 +20,44 @@ NoSQLBench is a serious performance testing tool for the NoSQL ecosystem.
**NoSQLBench brings advanced testing capabilities into one tool that are not found in other testing tools.**
- You can run common testing workloads directly from the command line. You can start doing this within 5 minutes of
reading this.
- You can generate virtual data sets of arbitrary size, with deterministic data and statistically shaped values.
- You can design custom workloads that emulate your application, contained in a single file, based on statement
templates - no IDE or coding required.
- You can immediately plot your results in a Docker and Grafana stack on Linux with a single command line option.
- When needed, you can open the access panels and rewire the runtime behavior of NoSQLBench to do advanced testing,
including a full scripting environment with JavaScript.
The core machinery of NoSQLBench has been built with attention to detail. It has been battle tested within DataStax as a
way to help users validate their data models, baseline system performance, and qualify system designs for scale.
In short, NoSQLBench wishes to be a programmable power tool for performance testing. However, it is somewhat generic. It
doesn't know directly about a particular type of system, or protocol. It simply provides a suitable machine harness in
which to put your drivers and testing logic. If you know how to build a client for a particular kind of system,
NoSQLBench will let you load it like a plugin and control it dynamically.
Initially, NoSQLBench comes with support for CQL, but we would like to see this expanded with contributions from others.
## Origins
The code in this project comes from multiple sources. The procedural data generation capability was known before as
'Virtual Data Set'. The core runtime and scripting harness was from the 'EngineBlock' project. The CQL support was
previously used within DataStax. In March of 2020, DataStax and the project maintainers for these projects decided to
put everything into one OSS project in order to make contributions and sharing easier for everyone. Thus, the new
project name and structure was launched as nosqlbench.io. NoSQLBench is an independent project that is primarily
sponsored by DataStax.
We offer NoSQLBench as a new way of thinking about testing systems. It is not limited to testing only one type of
system. It is our wish to build a community of users and practice around this project so that everyone in the NoSQL
ecosystem can benefit from common concepts and understanding and reliable patterns of use.
## Scalable User Experience
NoSQLBench endeavors to be valuable to all users. We do this by making it easy for you, our user, to do just what you
need without worrying about the rest. If you need to do something simple, it should be simple to find the right settings
and just do it. If you need something more sophisticated, then you should be able to find what you need with a
reasonable amount of effort and no surprises.
That is the core design principle behind NoSQLBench. We hope you like it.


@ -12,21 +12,17 @@ Some of the features discussed here are only for advanced testing scenarios.
## Hybrid Rate Limiting
Rate limiting is a complicated endeavor, if you want to do it well. The basic rub is that going fast means you have to
be less accurate, and vice-versa. As such, rate limiting is a parasitic drain on any system. The act of rate limiting in
and of itself poses a limit to the maximum rate, regardless of the settings you pick, because it forces your system to
interact with some hardware notion of time passing, and this takes CPU cycles that could be going to the thing you are
limiting.
This means that in practice, rate limiters are often very featureless. It's daunting enough to need rate limiting, and
asking for anything more than that is often wishful thinking. Not so in NoSQLBench.
The rate limiter in NoSQLBench provides a comparable degree of performance and accuracy to others found in the Java
ecosystem, but it *also* has advanced features:
- Allows a sliding scale between average rate limiting and strict rate limiting.
- Internally accumulates delay time, for C.O. friendly metrics
@ -35,60 +31,48 @@ features:
## Flexible Error Handling
An emergent facility in NoSQLBench is the way that errors are handled within an activity. For example, with the CQL
activity type, you are able to route error handling for any of the known exception types. You can count errors, you can
log them. You can cause errored operations to auto-retry if possible, up to a configurable number of tries.
This means that, as a user, you get to decide what your test is about. Is it about measuring some nominal but
anticipated level of errors due to intentional over-saturation? If so, then count the errors, and look at their
histogram data for timing details within the available timeout.
Are you doing a basic stability test, where you want the test to error out for even the slightest error? You can
configure for that if you need.
## Cycle Logging
It is possible to record the result status of each and every cycle in a NoSQLBench test run. If the results are mostly
homogeneous, the RLE encoding of the results will reduce the output file down to a small fraction of the number of
cycles. The errors are mapped to ordinals, and these ordinals are stored into a direct RLE-encoded log file. For most
testing, where most of the results are simply success, this file will be tiny. You can also convert the cycle log into
textual form for other testing and post-processing and vice-versa.
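The compression effect of run-length encoding on homogeneous results can be sketched as follows. This is an illustrative stand-in, not the actual NoSQLBench cycle log format:

```python
from itertools import groupby

def rle_encode(ordinals):
    """Run-length encode a sequence of per-cycle result ordinals,
    sketching why mostly-successful runs compress to almost nothing."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(ordinals)]

# A mostly-successful run: a million cycles of status 0 with one error.
results = [0] * 500_000 + [1] + [0] * 499_999
encoded = rle_encode(results)
print(encoded)  # three runs instead of a million entries
```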
## Op Sequencing
The way that operations are planned for execution in NoSQLBench is based on a stable ordering that is configurable. The
statement forms are mixed together based on their relative ratios. The three schemes currently supported are round-robin
with exhaustion (bucket), duplicate in order (concat), and a way to spread each statement out over the unit interval
(interval). These account for most configuration scenarios without users having to micro-manage their statement
templates.
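The bucket and concat schemes can be sketched from their descriptions above. These are hypothetical helper functions illustrating the idea, not the NoSQLBench implementation, and the interval scheme is omitted here:

```python
def concat(ratios):
    """Duplicate in order: each statement repeated by its ratio."""
    seq = []
    for name, ratio in ratios:
        seq.extend([name] * ratio)
    return seq

def bucket(ratios):
    """Round-robin with exhaustion: keep cycling over the statements,
    taking one from each non-empty bucket, until all are exhausted."""
    remaining = dict(ratios)
    order = [name for name, _ in ratios]
    seq = []
    while any(remaining.values()):
        for name in order:
            if remaining[name] > 0:
                seq.append(name)
                remaining[name] -= 1
    return seq

ratios = [("A", 3), ("B", 1), ("C", 2)]
print(concat(ratios))  # ['A', 'A', 'A', 'B', 'C', 'C']
print(bucket(ratios))  # ['A', 'B', 'C', 'A', 'C', 'A']
```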
## Sync and Async
There are two distinct usage modes in NoSQLBench when it comes to operation dispatch and thread management:
### Sync
Sync is the default form. In this mode, each thread reads its sequence and dispatches one statement at a time, holding
only one operation in flight per thread. This is the mode you often use when you want to emulate an application's
request-per-thread model, as it implicitly linearizes the order of operations within the computed sequence of
statements.
### Async
In Async mode, each thread in an activity is responsible for juggling a number of operations in-flight. This allows a
NoSQLBench client to juggle an arbitrarily high number of connections, limited primarily by how much memory you have.
Internally, the Sync and Async modes have different code paths. It is possible for an activity type to support one or
both of these.


@ -5,61 +5,46 @@ weight: 2
# Refined Core Concepts
The core concepts that NoSQLBench is built on have been scrutinized, replaced, refined, and hardened through several
years of use by users of various needs and backgrounds.
This is important when trying to find a way to express common patterns in what is often a highly fragmented practice.
Testing is hard. Scale testing is hard. Distributed testing is hard. We need a set of conceptual building blocks that
can span across workloads and system types, and machinery to put these concepts to use. Some concepts used in NoSQLBench
are shared below for illustration, but this is by no means an exhaustive list.
### The Cycle
Cycles in NoSQLBench are whole numbers on a number line. All operations in a NoSQLBench session are derived from a
single cycle. It's a long value, and a seed. The cycle determines not only which statements (of those available) will
get executed, but it also determines what the values bound to that statement will be.
Cycles are specified as a closed-open `[min,max)` interval, just as slices in some languages. That is, the min value is
included in the range, but the max value is not. This means that you can stack slices using common numeric reference
points without overlaps or gaps. It means you can have exact awareness of what data is in your dataset, even
incrementally.
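The stacking property of closed-open intervals is the same one Python's `range()` has, which makes it easy to demonstrate:

```python
# Closed-open cycle intervals [min, max), like Python's range().
first = range(0, 1000)      # cycles 0..999
second = range(1000, 2000)  # cycles 1000..1999

# Stacked on a common reference point (1000): no overlap, no gap.
assert 999 in first and 999 not in second
assert 1000 in second and 1000 not in first
assert len(first) + len(second) == len(range(0, 2000))
```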
You can think of a cycle as a single-valued coordinate system for data that lives adjacent to that number on the number
line.
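The idea of a cycle acting as a seed for deterministic data can be sketched with a hash function. This is a hypothetical stand-in for NoSQLBench's binding functions, purely for illustration:

```python
import hashlib

def value_for_cycle(cycle: int) -> str:
    # Hypothetical sketch: derive a deterministic value from a cycle
    # number, standing in for a real binding function.
    return hashlib.sha256(cycle.to_bytes(8, "big")).hexdigest()[:12]

# The same cycle always yields the same data, on any client, at any time.
assert value_for_cycle(42) == value_for_cycle(42)
assert value_for_cycle(1) != value_for_cycle(2)
```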
### The Activity
An activity is a multi-threaded flywheel of statements in some sequence and ratio. Activities run over the numbers in a
cycle range. Each activity has a driver type which determines the native protocol that it speaks.
### The Activity Type
An activity type is a high level driver for a protocol. It is like a statement-aware cartridge that knows how to take a
basic statement template and turn it into an operation for the scenario to execute.
### The Scenario
The scenario is a runtime session that holds the activities while they run. A NoSQLBench scenario is responsible for
aggregating global runtime settings, metrics reporting channels, logfiles, and so on.
### The Scenario Script
Each scenario is governed by a script that runs single-threaded, asynchronously from activities, but in control of
activities. If needed, the scenario script is automatically created for the user, and the user never knows it is there.
If the user has advanced testing requirements, then they may take advantage of the scripting capability at such time.
When the script exits, *AND* all activities are complete, then the scenario is complete.


@ -5,48 +5,43 @@ weight: 12
# High Fidelity Metrics
Since NoSQLBench has been built as a serious testing tool for all users, some attention was necessary on the way metrics
are used.
## Discrete Reservoirs
In NoSQLBench, we avoid the use of time-decaying metrics reservoirs. Internally, we use HDR reservoirs with discrete
time boundaries. This is so that you can look at the min and max values and know that they apply accurately to the whole
sampling window.
## Metric Naming
All running activities have a symbolic alias that identifies them for the purposes of automation and metrics. If you
have multiple activities running concurrently, they will have different names and will be represented distinctly in the
metrics flow.
## Precision and Units
By default, the internal HDR histogram reservoirs are kept at 4 digits of precision. All timers are kept at nanosecond
resolution.
## Metrics Reporting
Metrics can be reported via graphite as well as CSV, logs, HDR logs, and HDR stats summary CSV files.
## Coordinated Omission
The metrics naming and semantics in NoSQLBench are set up so that you can have coordinated omission metrics when they
are appropriate, but nothing else changes when they are not. This means that the metric names and meanings remain stable
in any case.
Particularly, NoSQLBench avoids the term "latency" altogether as it is often overused and thus prone to confusing
people.
Instead, the terms `service time`, `wait time`, and `response time` are used. These are abbreviated in metrics as
`servicetime`, `waittime`, and `responsetime`.
The `servicetime` metric is the only one which is always present. When a rate limiter is used, then additionally
`waittime` and `responsetime` are reported.
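These terms compose in a simple way. As a hedged sketch of the intended semantics (not a formal definition from this guide), an operation's response time is the time it waited to start plus the time it took once started:

```latex
\text{responsetime} = \text{waittime} + \text{servicetime}
```

This is also why `waittime` and `responsetime` only appear when a rate limiter is active: without a schedule, there is no defined start time to measure waiting against.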

# NoSQLBench Showcase
Since NoSQLBench is new on the scene in its current form, you may be wondering why you would want to use it over any
other tool. That is what this section is all about.
If you want to look under the hood of this toolkit before giving it a spin, this section is for you. You don't have to
read all of this! It is here for those who want to know the answer to the question "So, what's the big deal??" Just
remember it is here for later if you want to skip to the next section and get started testing.
NoSQLBench can do nearly everything that other testing tools can do, and more. It achieves this by focusing on a
scalable user experience in combination with a modular internal architecture.
NoSQLBench is a workload construction and simulation tool for scalable systems testing. That is an entirely different
scope of endeavor than most other tools.
The pages in this section all speak to advanced capabilities that are unique to NoSQLBench. In time, we want to show
these with basic scenario examples, right in the docs.

# Modular Architecture
The internal architecture of NoSQLBench is modular throughout. Everything from the scripting extensions to the data
generation functions is enumerated at compile time into a service descriptor, and then discovered at runtime by the SPI
mechanism in Java.
This means that extending and customizing bundles and features is quite manageable.
It also means that it is relatively easy to provide a suitable API for multi-protocol support. In fact, there are
several drivers available in the current NoSQLBench distribution. You can list them out with `./nb --list-drivers`, and
you can get help on how to use each of them with `./nb help <name>`.
This also is a way for us to encourage and empower other contributors to help develop the capabilities and reach of
NoSQLBench as a bridge-building tool in our community. This level of modularity is somewhat unusual, but it serves the
purpose of helping users with new features.

# Portable Workloads
All of the workloads that you can build with NoSQLBench are self-contained in a workload file. This is a
statement-oriented configuration file that contains templates for the operations you want to run in a workload.
This defines part of an activity - the iterative flywheel part that is run directly within an activity type. This file
contains everything needed to run a basic activity -- a set of statements in some ratio. It can be used to start an
activity, or as part of several activities within a scenario.
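As a rough sketch, such a file couples statement templates with data bindings and ratios. The statement text, binding names, and functions below are hypothetical illustrations, not taken verbatim from a shipped workload:

```yaml
# Hypothetical workload sketch: two statement templates in a 9:1 ratio.
# Fields in curly braces refer to the named data bindings below.
statements:
  - name: write-kv
    stmt: insert into baselines.keyvalue (key, value) values ({seq_key},{seq_value});
    ratio: 9
  - name: read-kv
    stmt: select * from baselines.keyvalue where key={seq_key};
    ratio: 1
bindings:
  seq_key: Mod(1000000); ToString()
  seq_value: Hash(); Mod(1000000000); ToString()
```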
## Standard YAML Format
The format for describing statements in NoSQLBench is generic across drivers, yet specialized around describing
statements for a workload.
That means that you can use the same YAML format to describe a workload for Kafka as you can for Apache Cassandra or
DSE.
The YAML structure has been tailored to describing statements, their data generation bindings, how they are grouped and
selected, and the parameters needed by drivers, like whether they should be prepared statements or not.
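For example, a block can carry shared tags and driver parameters for the statements it contains. This is a hedged sketch; the exact parameter names (such as `prepared`) depend on the driver in use:

```yaml
blocks:
  - tags:
      phase: main
    params:
      prepared: true   # driver parameter applied to statements in this block
    statements:
      - name: read-one
        stmt: select * from baselines.keyvalue where key={seq_key};
        bindings:
          seq_key: Mod(1000000); ToString()
```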
Further, the YAML format allows for defaults and overrides with a very simple mechanism that reduces editing fatigue for
frequent users.
You can also template document-wide macro parameters which are taken from the command line just like any other
parameter. This is a way of templating a workload and making it multi-purpose or adjustable on the fly.
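Assuming the `<<name:default>>` macro form, a sketch of such templating looks like this. A command line parameter such as `keyspace=test1` replaces the macro; otherwise the default applies:

```yaml
statements:
  - name: create-keyspace
    stmt: |
      create keyspace if not exists <<keyspace:baselines>>
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '<<rf:1>>'};
```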
## Experimentation Friendly
Because the workload YAML format is generic across activity types, it is possible to ask one activity type to interpret
the statements that are meant for another. This isn't generally a good idea, but it becomes extremely handy when you
want to have a very high-level activity type like `stdout` use a lower-level syntax like that of the `cql` activity
type. When you do this, the stdout activity type _plays_ the statements to your console as they would be executed in
CQL, data bindings and all.
This means you can empirically and substantively demonstrate and verify access patterns, data skew, and other dataset
details before you change back to cql mode and turn up the settings for a higher scale test.

# Scripting Environment
The ability to write open-ended testing simulations is provided in EngineBlock by means of a scripted runtime, where
each scenario is driven from a control script that can do anything the user wants.
## Dynamic Parameters
Some configuration parameters of activities are designed to be assignable while a workload is running. This makes things
like threads, rates, and other workload dynamics pseudo real-time. The internal APIs work with the scripting environment
to expose these parameters directly to scenario scripts.
## Scripting Automatons
When a NoSQLBench scenario is running, it is under the control of a single-threaded script. Each activity that is
started by this script is run within its own threadpool, asynchronously.
The control script has executive control of the activities, as well as full visibility into the metrics that are
provided by each activity. The way these two parts of the runtime meet is through the service objects which are
installed into the scripting runtime. These service objects provide a named access point for each running activity and
its metrics.
This means that the scenario script can do something simple, like start activities and wait for them to complete, OR, it
can do something more sophisticated, like dynamically and iteratively scrutinizing the metrics and making realtime
adjustments to the workload while it runs.
## Analysis Methods
Scripting automatons that do feedback-oriented analysis of a target system are called analysis methods in NoSQLBench. We
have prototyped a couple of these already, but there is nothing keeping the adventurous from coming up with their own.
## Command Line Scripting
The command line has the form of basic test commands and parameters. These commands get converted directly into scenario
control script in the order they appear. The user can choose whether to stay in high-level executive mode, with simple
commands like `run workload=...`, or to drop down directly into script design. They can look at the equivalent script
for any command line by running with `--show-script`. If you take the script that is dumped to console and run it, it
should do exactly the same thing as if you had simply run the standard commands.
There are even ways to combine script fragments, full commands, and calls to scripts on the command line. Since each
variant is merely a way of constructing a scenario script, they all get composited together before the scenario script
is run.
New introductions to NoSQLBench should focus on the command line. Once a user is familiar with this, it is up to them
whether to tap into the deeper functionality. If they don't need to know about scenario scripting, then they shouldn't
have to learn about it to be effective.
## Compared to DSLs
Other tools may claim that their DSL makes scenario "simulation" easier. In practice, any DSL is generally dependent on
a development tool to lay the language out in front of a user in a fluent way. This means that DSLs are almost always
developer-targeted tools, and mostly useless for casual users who don't want to break out an IDE.
One of the things a DSL proponent may tell you is that it tells you "all the things you can do!" This is de-facto the
same as telling you "all the things you can't do," because anything else is not part of the DSL. This is not a win for
the user. For DSL-based systems, the user has to use the DSL whether or not it enhances their creative control, while in
fact, most DSLs aren't rich enough to do much that is interesting from a simulation perspective.
In NoSQLBench, we don't force the user to use the programming abstractions except at a very surface level -- the CLI. It
is up to the user whether or not to open the secret access panel for the more advanced functionality. If they decide to
do this, we give them a commodity language (ECMAScript), and we wire it into all the things they were already using. We
don't take away their expressivity by telling them what they can't do. This way, users can pick their level of
investment and reward as best fits their individual needs, as it should be.
## Scripting Extensions
Also mentioned under the section on modularity, it is relatively easy for a developer to add their own scripting
extensions into NoSQLBench.

# Virtual Datasets
The _Virtual Dataset_ capabilities within NoSQLBench allow you to generate data on the fly. There are many reasons for
using this technique in testing, but it is often a topic that is overlooked or taken for granted.
## Industrial Strength
The algorithms used to generate data are based on advanced techniques in the realm of variate sampling. The authors have
gone to great lengths to ensure that data generation is efficient and as much O(1) in processing time as possible.
For example...
One technique that is used to achieve this is to initialize and cache data in high resolution look-up tables for
distributions which may perform differently depending on their density functions. The existing Apache Commons Math
libraries have been adapted into a set of interpolated Inverse Cumulative Distribution sampling functions. This means
that you can use a Zipfian distribution in the same place as you would a Uniform distribution, and once initialized,
they sample with identical overhead. This means that by changing your test definition, you don't accidentally change the
behavior of your test client.
## The Right Tool
Many other testing systems avoid building a dataset generation component. It's a tough problem to solve, so it's often
just avoided. Instead, they use libraries like "faker" and variations on that. However, faker is well named, no pun
intended. It was meant as a vignette library, not a source of test data for realistic results. If you are using a
testing tool for scale testing and relying on a faker variant, then you will almost certainly get invalid results for
any serious test.
The virtual dataset component of NoSQLBench is a library that was designed for high scale and realistic data streams.
## Deterministic
The data that is generated by the virtual dataset libraries is deterministic. This means that for a given cycle in a
test, the operation that is synthesized for that cycle will be the same from one session to the next. This is
intentional. If you want to perturb the test data from one session to the next, then you can most easily do it by simply
selecting a different set of cycles as your basis.
This means that if you find something interesting in a test run, you can go back to it just by specifying the cycles in
question. It also means that you aren't losing comparative value between tests with additional randomness thrown in. The
data you generate will still look random to the human eye, but that doesn't mean that it can't be reproducible.
## Statistically Shaped
All this means is that the values you use to tie your dataset together can follow any distribution that is appropriate.
You can ask for a stream of floating point values 1 trillion values long, in any order. You can use discrete or
continuous distributions, with whatever parameters you need.
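In binding terms, swapping one distribution for another is a one-line change. The function names and signatures below are illustrative assumptions rather than a catalog reference:

```yaml
bindings:
  # continuous: delays drawn from a normal curve (mean 50.0, stddev 10.0)
  delay_ms: Normal(50.0,10.0) -> double
  # discrete: identifiers skewed by a Zipfian law, sampled in O(1) once initialized
  popular_id: Zipf(10000,1.2) -> long
```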
## Best of Both Worlds
Some might worry that fully synthetic testing data is not realistic enough. The devil is in the details on these
arguments, but suffice it to say that you can pick the level of real data you use as seed data with NoSQLBench.
For example, using the alias sampling method and a published US census (public domain) list of names and surnames that
occurred more than 100 times, we can provide extremely accurate samples of names according to the discrete distribution
we know of. The alias method allows us to sample accurately in O(1) time from the entire dataset by turning a large
number of weights into two uniform samples. You will simply not find a better way to sample US names than this. (But if
you do, please file an issue!)
## Java Idiomatic Extension
The way that the virtual dataset component works allows Java developers to write any extension to the data generation
functions simply in the form of Java 8 (or newer) functional interfaces. As long as they include the annotation
processor and annotate their classes, they will show up in the runtime and be available to any workload by their class
name.
## Binding Recipes
It is possible to stitch data generation functions together directly in a workload YAML. These are data-flow sketches of
functions that can be copied and pasted between workload descriptions to share or remix data streams. This allows for
the adventurous to build sophisticated virtual datasets that emulate nuances of real datasets, but in a form that takes
up less space on the screen than this paragraph!
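A sketch of such a recipe follows; the binding function names are illustrative, so consult the binding function reference for the real catalog:

```yaml
bindings:
  # hash each cycle into a space of one million stable user ids
  user_id:   Mod(1000000); ToHashedUUID() -> java.util.UUID
  # reusing the same modulus keeps the name consistent with the id for a cycle
  user_name: Mod(1000000); FirstNames() -> String
```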

Let's run a simple test against a cluster to establish some basic familiarity with the tool.
## Create a Schema
We will start by creating a simple schema in the database. From your command line, go ahead and execute the following
command, replacing the `host=<dse-host-or-ip>` with that of one of your database nodes.
```
./nb run driver=cql workload=cql-keyvalue tags=phase:schema host=<dse-host-or-ip>
```
This command is creating the following schema in your database:
```cql
CREATE KEYSPACE baselines
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
AND durable_writes = true;
CREATE TABLE baselines.keyvalue (
key text PRIMARY KEY,
    value text
);
```

Let's break down each of those command line options.
`start` tells nosqlbench to start an activity.
`driver=...` is used to specify the activity type (driver). In this case we are using `cql`, which tells nosqlbench to
use the DataStax Java Driver and execute CQL statements against a database.
`workload=...` is used to specify the workload definition file that defines the activity.
In this example, we use `cql-keyvalue`, which is a pre-built workload that is packaged with nosqlbench.
`tags=phase:schema` tells nosqlbench to run the yaml block that has the `phase:schema` defined as one of its tags.
In this example, that is the DDL portion of the `cql-keyvalue` workload. `host=...` tells nosqlbench how to connect to
your database; only one host is necessary.
If you like, you can verify the result of this command by describing your keyspace in cqlsh or DataStax Studio with
`DESCRIBE KEYSPACE baselines`.
## Load Some Data
Before running a test of typical access patterns where you want to capture the results, you need to make the test more
interesting than loading an empty table. For this, we use the rampup phase.
Before sending our test writes to the database, we will use the `stdout` activity type so we can see what nosqlbench is
generating for CQL statements.
When you run the rampup phase with the `stdout` driver, the statements that nosqlbench generates are printed to your
console, looking something like this:

```
insert into baselines.keyvalue (key, value) values (8,296173906);
insert into baselines.keyvalue (key, value) values (9,97405552);
```
NoSQLBench deterministically generates data, so the generated values will be the same from run to run.
Now we are ready to write some data to our database with the `cql` driver. Note the differences between this and the
command that we used to generate the schema.
`tags=phase:rampup` is running the yaml block in `cql-keyvalue` that has only INSERT statements.
`cycles=100k` will run a total of 100,000 operations, in this case, 100,000 writes. You will want to pick an
appropriately large number of cycles in actual testing to make your main test meaningful.
:::info
The cycles parameter is not just a quantity. It is a range of values. The `cycles=n` format is short for `cycles=0..n`,
which makes cycles a zero-based quantity by default. For example, cycles=5 means that the activity will use cycles
0,1,2,3,4, but not 5. The reason for this is explained in detail in the Activity Parameters section.
:::
These parameters are explained in detail in the section on _Activity Parameters_.
`--progress console:1s` will print the progression of the run to the console every 1 second.
You should see output that looks like this:
```
cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
cql-keyvalue: 100.00%/Finished (details: min=0 cycle=100000 max=100000)
```
## Run the main test phase
Now that we have a base dataset of 100k rows in the database, we will run a mixed read / write workload; by default this
runs a 50% read / 50% write workload.
```
./nb start driver=cql workload=cql-keyvalue tags=phase:main host=<dse-host-or-ip> cycles=100k cyclerate=5000 threads=50 --progress console:1s
```
You should see output that looks like this:
```
Logging to logs/scenario_20190812_154431_028.log
cql-keyvalue: 0.50%/Running (details: min=0 cycle=500 max=100000)
```

We have a few new command line options here:
`tags=phase:main` is using a new block in our activity's yaml that contains both read and write queries.
`threads=50` is an important one. The default for nosqlbench is to run with a single thread. This is not adequate for
workloads that will be running many operations, so threads is used as a way to increase concurrency on the client side.
`cyclerate=5000` is used to control the operations per second that are initiated by nosqlbench. This command line option
is the primary means to rate limit the workload, and here we are running at 5000 ops/sec.
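A fixed cyclerate also gives the run a predictable lower bound on duration: cycles divided by cyclerate. A quick back-of-the-envelope check (illustrative Python, not part of nosqlbench):

```python
def min_duration_seconds(cycles: int, cyclerate: float) -> float:
    """Lower bound on wall-clock time for a rate-limited activity."""
    return cycles / cyclerate

# 100k cycles at 5000 ops/sec cannot finish in under 20 seconds
assert min_duration_seconds(100_000, 5000) == 20.0
```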
## Now What?
Note in the above output, we see `Logging to logs/scenario_20190812_154431_028.log`.
By default nosqlbench records the metrics from the run in this file. We will go into detail about these metrics in the
next section, Viewing Results.

# Example Results
We just ran a very simple workload against our database. In that example, we saw that nosqlbench writes to a log file
and it is in that log file where the most basic form of metrics are displayed.
## Log File Metrics
For our previous run, we saw that nosqlbench was writing to `logs/scenario_20190812_154431_028.log`.
Even when you don't configure nosqlbench to write its metrics to another location, it will periodically report all the
metrics to the log file. At the end of a scenario, before nosqlbench shuts down, it will flush the partial reporting
interval again to the logs. This means you can always look in the logs for metrics information.
:::warning
If you look in the logs for metrics, be aware that the last report will only contain a partial interval of results. When
looking at the last partial window, only metrics which average over time or which compute the mean for the whole test
will be meaningful.
:::
Below is a sample of the log that gives us our basic metrics. There is a lot to digest here; for now we will only focus
on a subset of the most important metrics.
```
2019-08-12 15:46:00,274 INFO [main] i.e.c.ScenarioResult [ScenarioResult.java:48] -- BEGIN METRICS DETAIL --
```
The log contains lots of information on metrics, but this is obviously _not_ the most desirable way to consume metrics
from nosqlbench.
We recommend that you use one of these methods, according to your environment or tooling available:
3. Record your metrics to local CSV files with `--report-csv-to my_metrics_dir`
4. Record your metrics to HDR logs with `--log-histograms my_hdr_metrics.log`
See the command line reference for details on how to route your metrics to a metrics collector or format of your
preference.

# Example Metrics
A set of core metrics are provided for every workload that runs with nosqlbench, regardless of the activity type and
protocol used. This section explains each of these metrics and shows an example of them from the log file.
## metric: result
This is the primary metric that should be used to get a quick idea of the throughput and latency for a given run. It
encapsulates the entire operation life cycle (i.e. bind, execute, get result back).
For this example we see that we averaged 3732 operations / second with 3.6ms 75th percentile latency and 23.9ms 99th
percentile latency. Note the raw metrics are in microseconds. This duration_unit may change depending on how a user
configures nosqlbench, so always double-check it.
```
2019-08-12 15:46:01,310 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER, name=cql-keyvalue.result, count=100000, min=233.48, max=358596.607, mean=3732.00338612, stddev=10254.850416061185, median=1874.815, p75=3648.767, p95=10115.071, p98=15855.615, p99=23916.543, p999=111292.415, mean_rate=4024.0234405430424, m1=3514.053841156124, m5=3307.431472596865, m15=3268.6786509004132, rate_unit=events/second, duration_unit=microseconds
```
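These reporter lines are just comma-separated `name=value` fields, so they are easy to post-process. A minimal sketch, assuming the field format shown above (the helper name is illustrative, not part of nosqlbench):

```python
def parse_metric_line(line: str) -> dict:
    """Split a metrics reporter line into a field dict.

    Assumes the 'name=value, name=value, ...' tail format shown above;
    this is a convenience sketch, not nosqlbench code.
    """
    fields = {}
    for part in line.split(", "):
        if "=" in part:
            key, _, value = part.partition("=")
            # key.split()[-1] drops any log prefix fused onto the first field,
            # e.g. '2019-08-12 ... type=TIMER' -> 'type'
            fields[key.split()[-1]] = value
    return fields

line = ("type=TIMER, name=cql-keyvalue.result, count=100000, "
        "p75=3648.767, p99=23916.543, duration_unit=microseconds")
m = parse_metric_line(line)
# p99 is reported in microseconds; convert to milliseconds
assert round(float(m["p99"]) / 1000, 1) == 23.9
```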
## metric: result-success
This metric shows whether there were any errors during the run. You can confirm that the count is equal to the number of
cycles for the run if you are expecting or requiring zero failed operations.
Here we see that all 100k of our cycles succeeded. Note that the metrics for throughput and latency here are slightly
different than the `result` metric simply because this is a separate timer that only includes operations which
completed with no exceptions.
```
2019-08-12 15:46:01,452 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER, name=cql-keyvalue.result-success, count=100000, min=435.168, max=358645.759, mean=3752.40990808, stddev=10251.524945886964, median=1889.791, p75=3668.479, p95=10154.495, p98=15884.287, p99=24280.063, p999=111443.967, mean_rate=4003.3090048756894, m1=3523.40328629036, m5=3318.8463896065778, m15=3280.480326762243, rate_unit=events/second, duration_unit=microseconds
```
## metric: resultset-size
For read workloads, this metric shows the size of the result set sent back to nosqlbench from the server. This is useful
to confirm that you are reading rows that already exist in the database.
```
2019-08-12 15:46:00,298 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM, name=cql-keyvalue.resultset-size, count=100000, min=0, max=1, mean=8.0E-5, stddev=0.008943914131967056, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
```
## metric: tries
NoSQLBench will retry failures 10 times by default; this is configurable via the `maxtries` command line option for the
cql activity type. This metric shows a histogram of the number of tries that each operation required. In this example,
there were no retries, as the `count` is 100k.
```
2019-08-12 15:46:00,341 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM, name=cql-keyvalue.tries, count=100000, min=1, max=1, mean=1.0, stddev=0.0, median=1.0, p75=1.0, p95=1.0, p98=1.0, p99=1.0, p999=1.0
```
### More Metrics
nosqlbench provides many ways to report the metrics from a run, including:
- Reporting to Graphite
- Reporting to HDR
To get more information on these options, see the output of
./nb --help
You have completed your first run with nosqlbench!
In the 'Next Steps' section, you'll find options for how to continue, whether you are looking for basic testing or
something more advanced.

# Next Steps
Now that you've run nosqlbench for the first time and seen what it does, you can choose what level of customization you
want for further testing.
The sections below describe key areas that users typically customize when working with nosqlbench.
Everyone who uses nosqlbench will want to get familiar with the 'NoSQLBench Basics' section below. This is essential
reading for new and experienced testers alike.
## High-Level Users
Several canonical workloads are already baked-in to nosqlbench for immediate use. If you simply want to drive workloads
from nosqlbench without building a custom workload, then you'll want to learn about the available workloads and their
options.
Recommended reading for high-level testing workflow:
1. 'Built-In Workloads'
## Workload Builders
If you want to use nosqlbench to build a tailored workload that closely emulates what a specific application would do,
then you can build a YAML file that specifies all of the details of an iterative workload. You can specify the access
patterns, data distributions, and more.
The recommended reading for this is:
## Scenario Developers
The underlying runtime for a scenario in nosqlbench is based on EngineBlock, which means it has all the scripting power
that comes with that. For advanced scenario designs, iterative testing models, or analysis methods, you can use
ECMAScript to control the scenario from start to finish. This is an advanced feature that is not recommended for
first-time users. A guide for scenario developers will be released in increments.

## Downloading
NoSQLBench is packaged directly as a Linux binary named `nb` and as an executable Java jar named `nb.jar`.
The Linux binary is recommended, since it comes with its own JVM and eliminates the need to manage Java downloads. Both
can be obtained at the releases section of the main NoSQLBench project:
- [NoSQLBench Releases](https://github.com/nosqlbench/nosqlbench/releases)
:::info
Once you download the binary, you may need to `chmod +x nb` to make it
executable.
If you choose to use the nb.jar instead of the binary, it is recommended
to run it with at least Java 12.
:::
This documentation assumes you are using the Linux binary, initiating NoSQLBench commands with `./nb`. If you are using
the jar, just replace `./nb` with `java -jar nb.jar` when running commands.
## Running
To provide your own contact points (comma separated), add the `hosts=` parameter:
./nb cql-iot hosts=host1,host2
Additionally, if you have docker installed on your local system, and your user has permissions to use it, you can use
`--docker-metrics` to stand up a live metrics dashboard at port 3000.
./nb cql-iot --docker-metrics
This example doesn't go into much detail about what it is doing. It is here to show you how quickly you can start
running real workloads without having to learn much about the machinery that makes it happen.
The rest of this section has a more elaborate example that exposes some of the basic options you may want to adjust for
your first serious test.

This is the same documentation you get in markdown format with the
---------------------------------------
Help ( You're looking at it. )
--help
Short options, like '-v' represent simple options, like verbosity. Using multiples increases the level of the option,
like '-vvv'.
Long options, like '--help' are top-level options that may only be used once. These modify general behavior, or allow
you to get more details on how to use nosqlbench.
All other options are either commands, or named arguments to commands. Any single word without dashes is a command that
will be converted into script form. Any option that includes an equals sign is a named argument to the previous command.
The following example is a commandline with a command *start*, and two named arguments to that command.
./nb start driver=diag alias=example
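The grammar just described can be sketched in a few lines of Python. This is illustrative only (nosqlbench's actual argument parser is more involved): bare words start a new command, and each `name=value` binds to the preceding command.

```python
def group_commands(args):
    """Group CLI words into (command, params) pairs per the rule above:
    a bare word starts a command; name=value binds to the previous command.

    Illustrative sketch only; assumes the first arg is a command.
    """
    commands = []
    for arg in args:
        if "=" in arg:
            name, _, value = arg.partition("=")
            commands[-1][1][name] = value   # named arg for preceding command
        else:
            commands.append((arg, {}))      # a new command
    return commands

assert group_commands(["start", "driver=diag", "alias=example"]) == [
    ("start", {"driver": "diag", "alias": "example"})
]
```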
### Discovery options ###
These options help you learn more about running nosqlbench, and about the plugins that are present in your particular
version.
Get a list of additional help topics that have more detailed documentation:
./nb help topics
Provide the metrics that are available for scripting
### Execution Options ###
This is how you actually tell nosqlbench what scenario to run. Each of these commands appends script logic to the
scenario that will be executed. These are considered as commands, and can occur in any order and quantity. The only rule
is that arguments in the arg=value form will apply to the preceding script or activity.
Add the named script file to the scenario, interpolating named parameters:
or
--progress logonly:5m
If you want to add in classic time decaying histogram metrics for your histograms and timers, you may do so with this
option:
--classic-histograms prefix
--classic-histograms 'prefix:.*' # same as above
--classic-histograms 'prefix:.*specialmetrics' # subset of names
Name the current session, for logfile naming, etc. By default, this will be "scenario-TIMESTAMP", and a logfile will be
created for this name.
--session-name <name>
Enlist engineblock to stand up your metrics infrastructure using a local docker runtime:
--docker-metrics
When this option is set, engineblock will start graphite, prometheus, and grafana automatically on your local docker,
configure them to work together, and point engineblock to send metrics to the system automatically. It also imports a
base dashboard for engineblock and configures grafana snapshot export to share with a central DataStax grafana instance
(grafana can be found on localhost:3000 with the default credentials admin/admin).
### Console Options ###
Increase console logging levels: (Default console logging level is *warning*)
-v (info)
--progress console:1m (disables itself if -v options are used)
These levels affect *only* the console output level. Other logging level parameters affect logging to the scenario log,
stored by default in logs/...
Show version, long form, with artifact coordinates.

# Grafana Metrics
NoSQLBench comes with a built-in helper to get you up and running quickly with client-side testing metrics. This
functionality is based on docker, and a built-in method for bringing up a docker stack, automated by NoSQLBench.
:::warning
This feature requires that you have docker running on the local system and that your user is in a group that
is allowed to manage docker. Using the `--docker-metrics` command *will* attempt to manage docker on your local system.
:::
To ask nosqlbench to stand up your metrics infrastructure using a local docker runtime, use this command line option
with any other nosqlbench commands:
--docker-metrics
When this option is set, nosqlbench will start graphite, prometheus, and grafana automatically on your local docker,
configure them to work together, and send metrics to the system automatically. It also imports a base dashboard for
nosqlbench and configures grafana snapshot export to share with a central DataStax grafana instance (grafana can be
found on localhost:3000 with the default credentials admin/admin).

# Parameter Types
To configure a nosqlbench activity to do something meaningful, you have to provide parameters to it. This can occur in
one of several ways. This section is a guide on nosqlbench parameters, how they layer together, and when to use one form
over another.
The command line is used to configure both the overall nosqlbench runtime (logging, etc.) as well as the individual
activities and scripts. Global nosqlbench options can be distinguished from scenario commands and their parameters
because global options always start with a single or double hyphen.
## Activity Parameters
Parameters for an activity always have the form of `<name>=<value>` on the command line. Activity parameters *must*
follow a command, such as `run` or `start`, for example. Scenario commands are always single words without any leading
hyphens. Every command-line argument that follows a scenario command in the form of `<name>=<value>` is a parameter to
that command.
Activity parameters can be provided by the nosqlbench core runtime or they can be provided by the activity type. All of
the params are usable to configure an activity together. It's not important where they are provided from so long as you
know what they do for your workloads, how to configure them, and where to find the docs.
*Core* Activity Parameters are those provided by the core runtime. They are part of the core API and used by every
activity type. Core activity params include *type*, *alias*, and *threads*, for example. These parameters are explained
individually under the next section.
*Custom* Activity Parameters are those provided by an activity type. These parameters are documented for each activity
type. You can see them by running `nosqlbench help <activity type>`.
Activity type parameters may be dynamic. *Dynamic* Activity Parameters are parameters which may be changed while an
activity is running. This means that scenario scripting logic may change some variables while an activity is running,
and that the runtime should dynamically adjust to match. Dynamic parameters are mainly used in more advanced scripting
scenarios.
Parameters that are dynamic should be documented as such in the respective activity type's help page.
### Template Parameters
If you need to provide general-purpose overrides to a named section of the standard YAML, then you may use a mechanism
called _template parameters_. These are just like activity parameters, but they are set via macro and can have defaults.
This is a YAML format feature that allows you to easily template workload properties in a way that is easy to override
on the command line or via scripting. More details on template parameters are shared under 'Designing Workloads|Template
Params'.
### Parameter Loading
Now that we've described all the parameter types, let's tie them together. When an activity is loaded from the command
line or script, the parameters are resolved in the following order:
1. The `type` parameter tells nosqlbench which activity type implementation to load.
2. The activity type implementation creates an activity.
## Statement Parameters
Some activities make use of parameters for statements. These are called _statement parameters_ and are completely
different than _activity parameters_. Statement parameters in a YAML allow you to affect *how* a statement is used in a
workload. Just as with activity level parameters, statement parameters may be supported by the core runtime or by an
activity type. These are also documented in the respective activity type's documentation included in the 'Activity
Types' section.
The core statement parameters are explained just below the core activity parameters in this section.

---
title: Activity Parameters
weight: 05
---

# Activity Parameters
Activity parameters are passed as named arguments for an activity,
either on the command line or via a scenario script. On the command
line, these take the form of
<paramname>=<paramvalue>
Some activity parameters are universal in that they can be used with any
driver type. These parameters are recognized by nosqlbench whether or
not they are recognized by a particular driver implementation. These are
called _core parameters_. Only core activity parameters are documented
here.
:::info
To see what activity parameters are valid for a given activity type, see
the documentation for that activity type with `nosqlbench help <activity
type>`.
:::
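On the command line, activity parameters are simply appended in `<paramname>=<paramvalue>` form (the specific parameter values shown here are illustrative):

```text
nb run driver=cql workload=cql-iot cycles=1M threads=50
```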
## driver

View File

@ -9,9 +9,12 @@ Some statement parameters are recognized by the nosqlbench runtime and can be us
## *ratio*
A statement parameter called _ratio_ is supported by every workload. It can be attached to a statement, a block, or a
document level parameter block. It sets the relative ratio of a statement in the op sequence before an activity is
started.
When an activity is initialized, all of the active statements are combined into a sequence based on their relative
ratios. By default, all statement templates are initialized with a ratio of 1 if none is specified by the user.
For example, consider the statements below:
@ -25,10 +28,15 @@ statements:
ratio: 3
```
If all statements are activated (there is no tag filtering), then the activity will be initialized with a sequence
length of 6. In this case, the relative ratio of statement "s3" will be 50% overall. If you filtered out the first
statement, then the sequence would be 5 operations long. In this case, the relative ratio of statement "s3" would be 60%
overall. It is important to remember that statement ratios are always relative to the total sum of the active
statements' ratios.
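The relative-ratio arithmetic from this example can be checked directly (a quick illustrative calculation, assuming the example statements carry ratios 1, 2, and 3):

```python
def relative_ratio(name, ratios):
    # A statement's overall share is its ratio over the sum of active ratios.
    return ratios[name] / sum(ratios.values())

all_active = {"s1": 1, "s2": 2, "s3": 3}  # sequence length 6
filtered = {"s2": 2, "s3": 3}             # s1 filtered out: length 5

# relative_ratio("s3", all_active) -> 0.5 (50%)
# relative_ratio("s3", filtered)   -> 0.6 (60%)
```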
:::info
Because the ratio works so closely with the activity parameter `seq`, the description for that parameter is included
below.
:::
### *seq* (activity level - do not use on statements)
@ -38,30 +46,52 @@ Because the ratio works so closely with the activity parameter `seq`, the descri
- _required_: no
- _dynamic_: no
The `seq=<bucket|concat|interval>` parameter determines the type of sequencing that will be used to plan the op
sequence. The op sequence is a look-up-table that is used for each stride to pick statement forms according to the cycle
offset. It is simply the sequence of statements from your YAML that will be executed, but in a pre-planned, and highly
efficient form.
An op sequence is planned for every activity. With the default ratio on every statement as 1, and the default bucket
scheme, the basic result is that each active statement will occur once in the order specified. Once you start adding
ratios to statements, the most obvious thing that you might expect will happen: those statements will occur multiple
times to meet their ratio in the op mix. You can customize the op mix further by changing the seq parameter to concat or
interval.
:::info
The op sequence is a look up table of statement templates, *not* individual statements or operations. Thus, the cycle
still determines the uniqueness of an operation as you would expect. For example, if statement form ABC occurs 3x per
sequence because you set its ratio to 3, then each of these would manifest as a distinct operation with fields
determined by distinct cycle values.
:::
There are three schemes to pick from:
### bucket
This is a round robin planner which draws operations from buckets in circular fashion, removing each bucket as it is
exhausted. For example, the ratios A:4, B:2, C:1 would yield the sequence A B C A B A A. The ratios A:1, B:5 would yield
the sequence A B B B B B.
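The bucket planner can be sketched as a short Python model (an illustrative sketch of the described behavior, not the actual nosqlbench implementation):

```python
def bucket_sequence(ratios):
    """Round-robin over per-statement buckets, dropping each bucket as it empties.

    ratios: list of (name, ratio) pairs in statement order.
    """
    buckets = [[name, count] for name, count in ratios]
    seq = []
    while buckets:
        # Draw one op from each remaining bucket per pass.
        for bucket in list(buckets):
            seq.append(bucket[0])
            bucket[1] -= 1
            if bucket[1] == 0:
                buckets.remove(bucket)
    return seq

# A:4, B:2, C:1 -> A B C A B A A
# A:1, B:5     -> A B B B B B
```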
### concat
This simply takes each statement template as it occurs in order and duplicates it in place to achieve the ratio. The
ratios above (A:4, B:2, C:1) would yield the sequence A A A A B B C for the concat sequencer.
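The concat behavior amounts to a one-liner (again an illustrative model, not the real implementation):

```python
def concat_sequence(ratios):
    # Repeat each statement in place, in order, to meet its ratio.
    return [name for name, count in ratios for _ in range(count)]

# A:4, B:2, C:1 -> A A A A B B C
```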
### interval
This is arguably the most complex sequencer. It takes each ratio as a frequency over a unit interval of time, and
apportions the associated operation to occur evenly over that time. When two operations would be assigned the same time,
then the order of appearance establishes precedence. In other words, statements appearing first win ties for the same
time slot. The ratios A:4 B:2 C:1 would yield the sequence A B C A A B A. This occurs because, over the unit interval
(0.0,1.0), A is assigned the positions `A: 0.0, 0.25, 0.5, 0.75`, B is assigned the positions `B: 0.0, 0.5`, and C is
assigned position `C: 0.0`. These offsets are all sorted with a position-stable sort, and then the associated ops are
taken as the order.
In detail, the rendering appears as `0.0(A), 0.0(B), 0.0(C), 0.25(A), 0.5(A), 0.5(B), 0.75(A)`, which yields `A B C A A
B A` as the op sequence.
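That rendering can be reproduced with a small Python model (an illustrative sketch of the described behavior, not the real implementation):

```python
def interval_sequence(ratios):
    # Assign each statement evenly spaced positions over the unit interval,
    # then stable-sort by position so that earlier statements win ties.
    slots = []
    for name, count in ratios:
        for i in range(count):
            slots.append((i / count, name))
    slots.sort(key=lambda slot: slot[0])  # Python's sort is stable
    return [name for _, name in slots]

# A:4, B:2, C:1 -> A B C A A B A
```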
This sequencer is most useful when you want a stable ordering of operations from a rich mix of statement types, where
each operation is spaced as evenly as possible over time, and where it is not important to control the cycle-by-cycle
sequencing of statements.

View File

@ -5,6 +5,5 @@ weight: 30
# NoSQLBench Basics
This section covers the essential details that you'll need to run nosqlbench in different ways.

View File

@ -5,17 +5,15 @@ weight: 2
## Description
The CQL IoT workload demonstrates a time-series telemetry system as typically found in IoT applications. The bulk of the
traffic is telemetry ingest. This is useful for establishing steady-state capacity with an actively managed data
lifecycle. This is a steady-state workload, where inserts are 90% of the operations and queries are the remaining 10%.
## Schema
CREATE KEYSPACE baselines WITH replication =
{ 'class': 'NetworkTopologyStrategy', 'dc1': 3 };
CREATE TABLE baselines.iot (
station_id UUID,
machine_id UUID,
@ -33,9 +31,8 @@ operations and queries are the remaining 10%.
2. rampup - Ramp-Up to steady state for normative density, writes only 100M rows
3. main - Run at steady state with 10% reads and 90% writes, 100M rows
For in-depth testing, this workload will take some time to build up data density where TTLs begin purging expired data.
At this point, the test should be considered steady-state.
## Data Set
@ -60,7 +57,7 @@ considered steady-state.
select * from baselines.iot
where machine_id=? and sensor_name=?
limit 10
## Workload Parameters
This workload has no adjustable parameters when used in the baseline tests.
@ -74,17 +71,14 @@ When used for additional testing, the following parameters should be supported:
- compression - enabled or disabled, to disable, set compression=''
- write_cl - the consistency level for writes (default: LOCAL_QUORUM)
- read_cl - the consistency level for reads (default: LOCAL_QUORUM)
## Key Performance Metrics
Client side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and
baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be
kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency
for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form,
with discrete histogram values per interval.
### Client-Side

View File

@ -5,22 +5,19 @@ weight: 1
## Description
The CQL Key-Value workload demonstrates the simplest possible schema with payload data. This is useful for measuring
system capacity most directly in terms of raw operations. As a reference point, it provides some insight into the types
of workloads that are constrained by messaging, threading, and tasking, rather than bulk throughput.
During preload, all keys are set with a value. During the main phase of the workload, random keys from the known
population are replaced with new values which never repeat. During the main phase, random partitions are selected for
upsert, with row values never repeating.
## Schema
    CREATE KEYSPACE IF NOT EXISTS baselines WITH replication =
{ 'class': 'NetworkTopologyStrategy', 'dc1': 3 };
CREATE TABLE baselines.keyvalue (
user_id UUID,
user_code text
@ -31,7 +28,7 @@ upsert, with row values never repeating.
1. schema - Initialize the schema.
2. rampup - Load data according to the data set size.
3. main - Run the workload
## Operations
@ -41,19 +38,19 @@ upsert, with row values never repeating.
### read (main)
select * from baselines.keyvalue where key=?key;
## Data Set
### baselines.keyvalue insert (rampup)
- key - text, number as string, selected sequentially up to keycount
- value - text, number as string, selected sequentially up to valuecount
### baselines.keyvalue insert (main)
- key - text, number as string, selected uniformly within keycount
- value - text, number as string, selected uniformly within valuecount
### baselines.keyvalue read (main)
@ -70,13 +67,11 @@ When used for additional testing, the following parameters should be supported:
## Key Performance Metrics
Client side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and
baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be
kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency
for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form,
with discrete histogram values per interval.
### Client-Side
@ -95,6 +90,5 @@ form, with discrete histogram values per interval.
# Notes on Interpretation
Once the average ratio of overwrites starts to balance with the rate of compaction, a steady state should be achieved.
At this point, pending compactions and bytes compacted should be mostly flat over time.

View File

@ -5,14 +5,15 @@ weight: 3
## Description
The CQL Wide Rows workload provides a way to tax a system with wide rows of a given size. This is useful to help
understand underlying performance differences between versions and configuration options when using data models that
have wide rows.
## Schema
CREATE KEYSPACE if not exists baselines WITH replication =
{ 'class': 'NetworkTopologyStrategy', 'dc1': 3 };
CREATE TABLE if not exists baselines.widerows (
part text,
clust text,
@ -26,17 +27,16 @@ when using data models that have wide rows.
2. rampup - Fully populate the widerows with data, 100000 elements per row
3. main - Run at steady state with 50% reads and 50% writes, 100M rows
For in-depth testing, this workload needs significant density of partitions in combination with fully populated wide
rows. For exploratory or parameter contrasting tests, ensure that the rampup phase is configured correctly to establish
this initial state.
## Data Set
### baselines.widerows dataset (rampup)
- part - text, number in string form, sequentially from 1..1E9
- clust - text, number in string form, sequentially from 1..1E9
- data - text, extract from lorem ipsum between 50 and 150 characters
### baselines.widerows dataset (main)
@ -64,7 +64,7 @@ establish this initial state.
select * from baselines.iot
where machine_id=? and sensor_name=?
limit 10
## Workload Parameters
This workload has no adjustable parameters when used in the baseline tests.
@ -73,16 +73,14 @@ When used for additional testing, the following parameters should be supported:
- partcount - the number of unique partitions
- partsize - the number of logical rows within a CQL partition
## Key Performance Metrics
Client side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and
baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be
kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency
for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form,
with discrete histogram values per interval.
### Client-Side

View File

@ -5,24 +5,20 @@ weight: 40
# Built-In Workloads
There are a few built-in workloads which you may want to run. These workloads can be run from a command without having
to configure anything, or they can be tailored with their built-in parameters.
There is now a way to list the built-in workloads:
`nb --list-workloads` will give you a list of all the pre-defined workloads which have named scenarios built in.
## Common Built-Ins
This section of the guidebook will explain a couple of the common scenarios in detail.
## Built-In Workload Conventions
The built-in workloads follow a set of conventions so that they can be used interchangeably:
### Phases
@ -34,7 +30,7 @@ Each built-in contains the following tags that can be used to break the workload
### Parameters
Each built-in has a set of adjustable parameters which is documented below per workload. For example, the cql-iot
workload has a `sources` parameter which determines the number of unique devices in the dataset.
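Such parameters can be assigned directly on the command line when invoking the workload (the values shown are illustrative):

```text
nb cql-iot sources=10000 cycles=10M
```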

View File

@ -3,68 +3,61 @@ title: 00 YAML Organization
weight: 00
---
It is best to keep every workload self-contained within a single YAML file, including schema, data rampup, and the main
phase of testing. The phases of testing are controlled by tags as described in the Standard YAML section.
:::info
The phase names described below have been adopted as a convention within the built-in workloads. It is strongly advised
that new workload YAMLs use the same tagging scheme so that workloads are more pluggable across YAMLs.
:::
### Schema phase
The schema phase is simply a phase of your test which creates the necessary schema on your target system. For CQL, this
generally consists of a keyspace and one or more table statements. There is no special schema layer in nosqlbench. All
statements executed are simply statements. This provides the greatest flexibility in testing since every activity type
is allowed to control its DDL and DML using the same machinery.
The schema phase is normally executed with defaults for most parameters. This means that statements will execute in the
order specified in the YAML, in serialized form, exactly once. This is a welcome side-effect of how initial parameters
like _cycles_ are set from the statements which are activated by tagging.
You can mark statements as schema phase statements by adding this set of tags to the statements, either directly, or by
block:
tags:
phase: schema
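Applied at the block level, this looks something like the following sketch (the statement name and body are illustrative):

```yaml
blocks:
  - tags:
      phase: schema
    statements:
      - create-keyspace: |
          CREATE KEYSPACE if not exists baselines WITH replication =
            { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };
```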
### Rampup phase
When you run a performance test, it is very important to be aware of how much data is present. Higher density tests are
more realistic for systems which accumulate data over time, or which have a large working set of data. The amount of
data on the system you are testing should recreate a realistic amount of data that you would run in production, ideally.
In general, there is a triangular trade-off between service time, op rate, and data density.
It is the purpose of the _rampup_ phase to create the backdrop data on a target system that makes a test meaningful for
some level of data density. Data density is normally discussed as average per node, but it is also important to consider
distribution of data as it varies from the least dense to the most dense nodes.
Because it is useful to be able to add data to a target cluster in an incremental way, the bindings which are used with
a _rampup_ phase may actually be different from the ones used for a _main_ phase. In most cases, you want the rampup
phase to create data in a way that incrementally adds to the population of data in the cluster. This allows you to add
some data to a cluster with `cycles=0..1M` and then decide whether to continue adding data using the next contiguous
range of cycles, with `cycles=1M..2M` and so on.
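That incremental pattern looks like this on the command line (the workload name and values are illustrative):

```text
nb run driver=cql workload=my-workload tags=phase:rampup cycles=0..1M
nb run driver=cql workload=my-workload tags=phase:rampup cycles=1M..2M
```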
You can mark statements as rampup phase statements by adding this set of tags to the statements, either directly, or by
block:
tags:
phase: rampup
### Main phase
The main phase of a nosqlbench scenario is the one during which you really care about the metric. This is the actual
test that everything else has prepared your system for.
You can mark statements as main phase statements by adding this set of tags to the statements, either directly, or by
block:
tags:
phase: main

View File

@ -5,11 +5,11 @@ weight: 01
## Statement Templates
A valid config file for an activity consists of statement templates, parameters for them, bindings to generate the data
to use with them, and tags for organizing them.
In essence, the config format is *all about configuring statements*. Every other element in the config format is in some
way modifying or otherwise helping create statements to be used in an activity.
Statement templates are the single most important part of a YAML config.
@ -19,12 +19,16 @@ statements:
- a single statement body
```
This is a valid activity YAML file in and of itself. It has a single statement template.
It is up to the individual activity types like _cql_, or _stdout_ to interpret the statement template in some way. The
example above is valid as a statement in the stdout activity, but it does not produce a valid CQL statement with the CQL
activity type. The contents of the statement template are free form text. If the statement template is valid CQL, then
the CQL activity type can use it without throwing an error. Each activity type determines what a statement means, and
how it will be used.
You can provide multiple statements, and you can use the YAML pipe to put them on multiple lines, indented a little
further in:
```yaml
statements:
@ -46,5 +50,6 @@ statements:
submit job {alpha} on queue {beta} with options {gamma};
```
Actually, every statement in a YAML has a name. If you don't provide one, then a name is auto-generated for the
statement based on its position in the YAML file.
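If you want stable names instead of auto-generated ones, you can provide them in map form (a sketch; the names and bodies are illustrative):

```yaml
statements:
  - myinsert: insert into mytable (a, b) values ({alpha}, {beta});
  - myread: select * from mytable where a={alpha};
```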

View File

@ -5,7 +5,12 @@ weight: 02
## Data Bindings
Procedural data generation is built-in to the nosqlbench runtime by way of the
[Virtual DataSet](http://virtdata.io/) library. This allows us to create named data generation recipes. These named
recipes for generated data are called bindings. Procedural generation for test data has
[many benefits](http://docs.virtdata.io/why_virtdata/why_virtdata/) over shipping bulk test data around, including speed
and deterministic behavior. With the VirtData approach, most of the hard work is already done for us. We just have to
pull in the recipes we want.
You can add a bindings section like this:
```yaml
bindings:
  alpha: Identity()
  delta: WeightedStrings('one:1;six:6;three:3;')
```
This is a YAML map which provides names and function specifiers. The specifier named _alpha_ provides a function that
takes an input value and returns the same value. Together, the name and value constitute a binding named alpha. All of
the four bindings together are called a bindings set.
The above bindings block is also a valid activity YAML, at least for the _stdout_ activity type. The _stdout_ activity
can construct a statement template from the provided bindings if needed, so this is valid:
```text
[test]$ cat > stdout-test.yaml
9,nine,00J_pro,six
```
Above, you can see that the stdout activity type is ideal for experimenting with data generation recipes. It uses the
default `format=csv` parameter above, but it also supports formats like json, inlinejson, readout, and assignments.
This is all you need to provide a formulaic recipe for converting an ordinal value to a set of field values. Each time
nosqlbench needs to create a set of values as parameters to a statement, the functions are called with an input, known
as the cycle. The functions produce a set of named values that, when combined with a statement template, can yield an
individual statement for a database operation. In this way, each cycle represents a specific operation. Since the
functions above are pure functions, the cycle number of an operation will always produce the same operation, thus making
all nosqlbench workloads deterministic.
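As a rough sketch of this idea in plain Python (not nosqlbench internals; the functions and template here are invented for illustration):

```python
# Pure functions of the cycle number stand in for binding recipes.
# The same cycle always produces the same named values, and thus the
# same operation -- this is what makes workloads deterministic.

def alpha(cycle):
    return cycle  # identity, like the alpha binding above

def beta(cycle):
    return ["one", "six", "three"][cycle % 3]  # invented picker

bindings = {"alpha": alpha, "beta": beta}

def render(template, cycle):
    # Evaluate each binding at this cycle, then fill the template.
    values = {name: fn(cycle) for name, fn in bindings.items()}
    return template.format(**values)

print(render("submit job {alpha} on queue {beta};", 7))
# -> submit job 7 on queue six;
```

Because the binding functions are pure, re-rendering cycle 7 yields the identical statement no matter when or where it runs.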
In the example above, you can see the cycle numbers down the left.
If you combine the statement section and the bindings sections above into one activity yaml, you get a slightly
different result, as the bindings apply to the statements that are provided, rather than creating a default statement
for the bindings. See the example below:
```text
[test]$ cat > stdout-test.yaml
submit job 9 on queue nine with options 00J_pro;
```
There are a few things to notice here. First, the statements that are executed are automatically alternated between. If
you had 10 different statements listed, they would all get their turn with 10 cycles. Since there were two, each was run
5 times.
Also, the statement that had named anchors acted as a template, whereas the other one was evaluated just as it was. In
fact, they were both treated as templates, but one of them had no anchors.
One more minor but important detail is that the fourth binding *delta* was not referenced directly in the statements.
Since the statements did not pair up an anchor with this binding name, it was not used. No values were generated for it.
This is how activities are expected to work when they are implemented correctly. This means that the bindings themselves
are templates for data generation, only to be used when necessary. This means that the bindings that are defined around
a statement are more like a menu for the statement. If the statement uses those bindings with `{named}` anchors, then
the recipes will be used to construct data when that statement is selected for a specific cycle. The cycle number both
selects the statement (via the op sequence) and also provides the input value at the left side of the binding functions.
## Statement Parameters
Statements within a YAML can be accessorized with parameters. These are known as _statement params_ and are different
than the parameters that you use at the activity level. They apply specifically to a statement template, and are
interpreted by an activity type when the statement template is used to construct a native statement form.
For example, the statement parameter `ratio` is used when an activity is initialized to construct the op sequence. In
the _cql_ activity type, the statement parameter `prepared` is a boolean that can be used to designate whether a CQL
statement should be prepared or not.
As with the bindings, a params section can be added at the same level, setting additional parameters to be used with
statements. Again, this is an example of modifying or otherwise creating a specific type of statement, but always in a
way specific to the activity type. Params can be thought of as statement properties. As such, params don't really do
much on their own, although they have the same basic map syntax as bindings:
```yaml
params:
ratio: 1
```
As with statements, it is up to each activity type to interpret params in a useful way.
## Statement Tags
Tags are used to mark and filter groups of statements for controlling which ones get used in a given scenario. Tags are
generally free-form, but there is a set of conventions that can make your testing easier.
An example:
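A minimal sketch of a tags block (the tag names and values here are illustrative, not from the original example):

```yaml
tags:
  name: fox
  unit: bravo
```

Filters such as `tags=name` or `tags='name=fox.*'` would then match statements carrying these tags.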
### Tag Filtering
The tag filters provide a flexible set of conventions for filtering tagged statements. Tag filters are usually provided
as an activity parameter when an activity is launched. The rules for tag filtering are:
1. If no tag filter is specified, then the statement matches.
2. A tag name predicate like `tags=name` asserts the presence of a specific
```text
# compound tag predicate does not fully match
[test]$ ./nb run driver=stdout workload=stdout-test tags='name=fox.*',unit=delta
11:02:53.490 [scenarios:001] ERROR i.e.activities.stdout.StdoutActivity - Unable to create a stdout statement if you have no active statements or bindings configured.
```
## Statement Blocks
All the basic primitives described above (names, statements, bindings, params, tags) can be used to describe and
parameterize a set of statements in a yaml document. In some scenarios, however, you may need to structure your
statements in a more sophisticated way. You might want to do this if you have a set of common statement forms or
parameters that need to apply to many statements, or perhaps if you have several *different* groups of statements that
need to be configured independently.
This is where blocks become useful:
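A sketch of what a blocks layout might look like (block names and statement bodies are assumptions, not the original example; `Identity()` follows the earlier bindings example):

```yaml
bindings:
  alpha: Identity()

blocks:
  - name: block1
    statements:
      - "block1-{alpha}"
  - name: block2
    statements:
      - "block2-{alpha}"
```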
This shows a couple of important features of blocks. All blocks inherit defaults for bindings, params, and tags from the
root document level. Any of these values that are defined at the base document level apply to all blocks contained in
that document, unless specifically overridden within a given block.
## Statement Delimiting
Sometimes, you want to specify the text of a statement in different ways. Since statements are strings, the simplest way
for small statements is in double quotes. If you need to express a much longer statement with special characters and
newlines, then you can use YAML's literal block notation (signaled by the '|' character) to do so:
```yaml
statements:
  - |
    submit job {alpha} on queue {beta} with options {gamma};
```
Notice that the block starts on the following line after the pipe symbol. This is a very popular form in practice
because it treats the whole block exactly as it is shown, except for the initial indentations, which are removed.
Statements in this format can be raw statements, statement templates, or anything that is appropriate for the specific
activity type they are being used with. Generally, the statements should be thought of as a statement form that you want
to use in your activity -- something that has place holders for data bindings. These place holders are called *named
anchors*. The second line above is an example of a statement template, with anchors that can be replaced by data for
each cycle of an activity.
There is a variety of ways to represent block statements, with folding, without, with the newline removed, with it
retained, with trailing newlines trimmed or not, and so forth. For a more comprehensive guide on the YAML conventions
regarding multi-line blocks, see
[YAML Spec 1.2, Chapter 8, Block Styles](http://www.yaml.org/spec/1.2/spec.html#Block)
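For instance (statement text invented for illustration), the literal `|` style keeps newlines while the folded `>` style joins lines with spaces:

```yaml
statements:
  - |
    this literal block keeps
    its line breaks
  - >
    this folded block becomes
    a single line
```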
## Statement Sequences
To provide a degree of flexibility to the user for statement definitions, multiple statements may be provided together
as a sequence.
```yaml
# a list of statements
statements:
  - "statement one"
  - "statement two"

# a map of statements
statements:
  name1: "statement one"
  name2: "statement two"
```
In the first form, the names are provided automatically by the YAML loader. In the second form, they are specified as
ordered map keys.
## Statement Properties
```yaml
statements:
  - stmt: statement one
  - stmt: statement two
```
This is the most flexible configuration format at the statement level. It is also the most verbose. Because this format
names each property of the statement, it allows for other properties to be defined at this level as well. This includes
all of the previously described configuration elements: `name`, `bindings`, `params`, `tags`, and additionally `stmt`. A
detailed example follows:
```yaml
statements:
  - name: name1
    stmt: statement one
    freeparam3: a value, as if it were assigned under the params block.
```
In this case, the values for `bindings`, `params`, and `tags` take precedence, overriding those set by the enclosing
block or document or activity when the names match. Parameters called **free parameters** are allowed here, such as
`freeparam3`. These are simply values that get assigned to the params map once all other processing has completed.
It is possible to mix the **`<name>: <statement>`** form as above in the example for mapping statement by name, so long
as some specific rules are followed. An example, which is equivalent to the above:
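A sketch of such a mixed form (names and statement text are illustrative):

```yaml
statements:
  - name1: statement one
    params:
      parm1: pvalue1
  - name: name2
    stmt: statement two
```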
The rules:
2. Do not use the **`<name>: <statement>`** form in combination with a
**`stmt: <statement>`** property. It is not possible to detect if this occurs. Use caution if you choose to mix these forms.
As explained above, `parm1: pvalue1` is a *free parameter*, and is simply short-hand for setting values in the params
map for the statement.
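In other words (statement name and text invented for illustration), these two spellings would set the same params map:

```yaml
# free-parameter short-hand
statements:
  - name: s1
    stmt: example statement
    parm1: pvalue1

# explicit params form
statements:
  - name: s1
    stmt: example statement
    params:
      parm1: pvalue1
```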
### Per-Statement Format
Specifically, the first statement is a simple statement body, the second is a named statement (via free param `<name>:
statement` form), the third is a statement config map, and the fourth is a combination of the previous two.
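A sketch of those four forms in order (statement text and names invented for illustration):

```yaml
statements:
  - a simple statement body
  - named-stmt: a statement assigned a name
  - stmt: a statement as a config map
    tags:
      type: preload
  - name: name4
    stmt: a named statement as a config map
    tags:
      type: preload
```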
The above is valid nosqlbench YAML, although a reader would need to know about the rules explained above in order to
really make sense of it. For most cases, it is best to follow one format convention, but there is flexibility for
overrides and naming when you need it.
# Multi-Docs
The YAML spec allows for multiple yaml documents to be concatenated in the same file with a separator:
```yaml
---
```
This offers an additional convenience when configuring activities. If you want to parameterize or tag a set of
statements with their own bindings, params, or tags, but alongside another set of uniquely configured statements, you
need only put them in separate logical documents, separated by a triple-dash.
For example:
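A sketch of the multi-doc layout (document names and statements invented for illustration):

```yaml
name: doc1
statements:
  - "doc1 statement"
---
name: doc2
statements:
  - "doc2 statement"
```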
```text
doc2.number eight
doc1.form1 doc1.1
```
This shows that you can use the power of blocks and tags together at one level and also allow statements to be broken
apart into a whole other level of partitioning if desired.
:::warning
The multi-doc support is there as a ripcord when you need it. However, it is strongly advised that you keep your YAML
workloads simple to start and only use features like the multi-doc when you absolutely need it. For this, blocks are
generally a better choice. See examples in the standard workloads.
:::
# Template Params
All nosqlbench YAML formats support a parameter macro format that applies before YAML processing starts. It is a basic
macro facility that allows named anchors to be placed in the document as a whole:
```text
<<varname:defaultval>>
TEMPLATE(varname,defaultval)
```
In this example, the name of the parameter is `varname`. It is given a default value of `defaultval`. If an activity
parameter named *varname* is provided, as in `varname=barbaz`, then this whole expression will be replaced with
`barbaz`. If none is provided then the default value will be used instead. For example:
```text
[test]$ cat > stdout-test.yaml
MISSING
THIS IS IT
```
If an empty value is desired by default, then simply use an empty string in your template, like `<<varname:>>` or
`TEMPLATE(varname,)`.
This provides a layered naming scheme for the statements themselves. It is not usually important to name things except
for documentation or metric naming purposes.
If no names are provided, then names are automatically created for blocks and statements. Statements assigned at the
document level are assigned to "block0". All other statements are named with the format `doc#--block#--stmt#`.
For example, the full name of statement1 above would be `doc1--block1--stmt1`.
:::info
If you anticipate wanting to get metrics for a specific statement in addition to the other metrics, then you will want
to adopt the habit of naming all your statements something basic and descriptive.
:::
```yaml
scenarios:
  default:
    - run driver=diag cycles=10M
```
This provides a way to specify more detailed workflows that users may want to run without them having to build up a
command line for themselves.
A couple of other forms are supported in the YAML, for terseness:
```yaml
scenarios:
  oneliner: run driver=diag cycles=10
  part1: run driver=diag cycles=10 alias=part2
  part2: run driver=diag cycles=20 alias=part2
```
These forms simply provide finesse for common editing habits, but they are automatically read internally as a list. In
the map form, the names are discarded, but they may be descriptive enough for use as inline docs for some users. The
order is retained as listed, since the names have no bearing on the order.
## Scenario selection
When a named scenario is run, it is *always* named, so that it can be looked up in the list of named scenarios under
your `scenarios:` property. The only exception to this is when an explicit scenario name is not found on the command
line, in which case it is automatically assumed to be _default_.
Some examples may be more illustrative:
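For instance (the workload file name and second scenario name here are invented):

```text
# runs the scenario named 'default' from myworkload.yaml
[test]$ nb myworkload

# runs the scenario named 'longrun' from myworkload.yaml
[test]$ nb myworkload longrun
```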
## Workload selection
The examples above contain no reference to a workload (formerly called _yaml_). They don't need to, as they refer to
themselves implicitly. You may add a `workload=` parameter to the command templates if you like, but this is never
needed for basic use, and it is error prone to keep the filename matched to the command template. Just leave it out by
default.
_However_, if you are doing advanced scripting across multiple systems, you can actually provide a `workload=` parameter
specifically to use another workload description in your test.
:::info
This is a powerful feature for workload automation and organization. However, it can get unwieldy quickly. Caution is
advised for deep-linking too many scenarios in a workspace, as there is no mechanism for keeping them in sync when small
changes are made.
:::
## Named Scenario Discovery
For named scenarios, there is a way for users to find all the named scenarios that are currently bundled or in view of
their current directory. A couple of simple rules must be followed by scenario publishers in order to keep things simple:
1. Workload files in the current directory `*.yaml` are considered.
2. Workload files in the relative path `activities/` with name `*.yaml` are
4. Any workload file that contains a `scenarios:` tag is included, but all others
are ignored.
This doesn't mean that you can't use named scenarios for workloads in other locations. It simply means that when users
use the `--list-scenarios` option, these are the only ones they will see listed.
## Parameter Overrides
You can override parameters that are provided by named scenarios. Any parameter that you specify on the command line
after your workload and optional scenario name will be used to override or augment the commands that are provided for
the named scenario.
This is powerful, but it also means that you can sometimes munge user-provided activity parameters on the command line
with the named scenario commands in ways that may not make sense. To solve this, the parameters in the named scenario
commands may be locked. You can lock them silently, or you can provide a verbose locking that will cause an error if the
user even tries to adjust them.
Silent locking is provided with a form like `param==value`. Any silent locked parameters will reject overrides from the
command line, but will not interrupt the user.
Verbose locking is provided with a form like `param===value`. Any time a user provides a parameter on the command line
for the named parameter, an error is thrown and they are informed that this is not possible. This level is provided for
cases in which you would not want the user to be unaware of an unset parameter which is germane and specific to the
named scenario.
All other parameters provided by the user will take the place of the same-named parameters provided in *each* command
template, in the order they appear in the template. Any other parameters provided by the user will be added to *each*
of the command templates in the order they appear on the command line.
This is a little counter-intuitive at first, but once you see some examples it should make sense.
## Parameter Override Examples
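As a general shape (assuming a scenario `s1` whose command template is `run driver=stdout cycles=10` -- an invented example), a user-provided parameter replaces the same-named one in the command template:

```text
# the template's cycles=10 is replaced by the user's cycles=20
[test]$ nb basics s1 cycles=20
```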
### Silent Locking example
If you run the second scenario `s2` with your own value for `cycles=7`, then it does what the locked parameter
`cycles==10` requires, without telling you that it is ignoring the specified value on your command line.
```
$ nb basics s2 cycles=7
```
### Verbose Locking example
If you run the third scenario `s3` with your own value for `cycles=7`, then you will get an error telling you that this
is not possible. Sometimes you want to make sure that the user knows a parameter should not be changed, and that if they
want to change it, they'll have to make their own custom version of the scenario in question.
```
$ nb basics s3 cycles=7
ERROR: Unable to reassign value for locked param 'cycles===7'
$
```
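The three locking levels can be sketched together in a hypothetical named-scenarios block (the scenario commands,
driver, and values here are illustrative, not taken from the built-in `basics` workload, and the exact layout may
differ):

```yaml
# Hypothetical sketch only: scenario names and commands are illustrative.
scenarios:
  s1: run driver=stdout cycles=10     # plain default: freely overridable
  s2: run driver=stdout cycles==10    # silent lock: user overrides are ignored
  s3: run driver=stdout cycles===10   # verbose lock: user overrides raise an error
```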
Ultimately, it is up to the scenario designer when to lock parameters for users. The built-in workloads offer some
examples of how to set these parameters so that the right values are locked in place without bothering the user, while
some values are made very clear in how they should be set. Please look at these examples for inspiration when you need
them.
## Diagnostics
This section describes errors that you might see if you have a YAML loading issue, and what you can do to fix them.
### Undefined Name-Statement Tuple
This exception is thrown when the statement body is not found in a statement definition in any of the supported formats.
For example, the following block will cause an error:
```yaml
statements:
- name: statement-foo
params:
aparam: avalue
```
This is because `name` and `params` are reserved property names -- removed from the list of name-value pairs before free
parameters are read. If the statement is not defined before free parameters are read, then the first free parameter is
taken as the name and statement in `name: statement` form.
To correct this error, supply a statement property in the map, or simply replace the `name: statement-foo` entry with a
`statement-foo: statement body` at the top of the map:
Either of these will work:
```yaml
statements:
- name: statement-foo
stmt: statement body
params:
aparam: avalue
---
statements:
- statement-foo: statement body
params:
aparam: avalue
```
In both cases, it is clear to the loader where the statement body should come from, and what (if any) explicit naming
should occur.
### Redefined Name-Statement Tuple
This exception is thrown when the statement name is defined in multiple ways. This is an explicit exception to avoid
possible ambiguity about which value the user intended. For example, the following statements definition will cause an
error:
```yaml
statements:
- name: name1
name2: statement body
```
This is an error because the statement is not defined before free parameters are read, and the `name: statement` form
includes a second definition for the statement name. In order to correct this, simply remove the separate `name` entry,
or use the `stmt` property to explicitly set the statement body. Either of these will work:
```yaml
statements:
- name2: statement body
---
statements:
- name: name1
stmt: statement body
```
In both cases, there is only one name defined for the statement according to the supported formats.
### YAML Parsing Error
This exception is thrown when the YAML format is not recognizable by the YAML parser. If you are not working from
examples that are known to load cleanly, then please review your document for correctness according to the
[YAML Specification]().
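For example, a fragment with an unterminated quoted scalar cannot be parsed and would trigger this error (a
hypothetical snippet):

```yaml
# Invalid YAML: the double-quoted scalar on the second line is never closed.
statements:
  - "this quote is never closed
```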
If you are sure that the YAML should load, then please
[submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details on the type of YAML
file you are trying to load.
### YAML Construction Error
This exception is thrown when the YAML was loaded, but the configuration object was not able to be constructed from the
in-memory YAML document. If this error occurs, it may be a bug in the YAML loader implementation. Please
[submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details on the type of YAML
file you are trying to load.
# Designing Workloads
Workloads in nosqlbench are always controlled by a workload definition.
Even the built-in workloads are simply pre-configured and controlled
from a single YAML file which is bundled internally.
With nosqlbench a standard YAML configuration format is provided that is
used across all activity types. This makes it easy to specify
statements, statement parameters, data bindings, and tags. This section
describes the standard YAML format and how to use it.
It is recommended that you read through the examples in each of the
design sections in order. This guide was designed to give you a detailed
understanding of workload construction with nosqlbench. The examples
will also give you better insight into how nosqlbench works at a
fundamental level.
## Multi-Protocol Support
You will notice that this guide is not overly CQL-specific. That is
because nosqlbench is a multi-protocol tool. All that is needed for you
to use this guide with other protocols is the release of more activity
types. Try to keep that in mind as you think about designing workloads.
## Advice for new builders
### Review existing examples
The built-in workloads that are included with nosqlbench are also shared
on the github site where we manage the nosqlbench project:
- [baselines](https://github.com/datastax/nosqlbench-labs/tree/master/sample-activities/baselines)
- [bindings](https://github.com/datastax/nosqlbench-labs/tree/master/sample-activities/bindings)
### Follow the conventions
The tagging conventions described under the YAML Conventions section
will make your testing go smoother. All of the baselines that we publish
for nosqlbench will use this form.
---
title: driver - CQL
weight: 06
---
To select this activity type, pass `driver=cql` to a run or start command.
# cql activity type
This is an activity type which allows for the execution of CQL statements. This particular activity type is wired
synchronously within each client thread; however, the async API is used in order to expose fine-grained metrics about op
binding, op submission, and waiting for a result.
### Example activity definitions
Run a cql activity named 'cql1', with definitions from activities/cqldefs.yaml
... driver=cql alias=cql1 workload=cqldefs
Run a cql activity defined by cqldefs.yaml, but with shortcut naming
... driver=cql workload=cqldefs
Only run statement groups which match a tag regex
... driver=cql workload=cqldefs tags=group:'ddl.*'
Run the matching 'dml' statements, with 100 cycles, from [1000..1100)
... driver=cql workload=cqldefs tags=group:'dml.*' cycles=1000..1100
This last example shows that the cycle range is [inclusive..exclusive), to allow for stacking test intervals. This is
standard across all activity types.
### CQL ActivityType Parameters
## Example activity definitions
Run a stdout activity named 'stdout-test', with definitions from activities/stdout-test.yaml
... driver=stdout workload=stdout-test
Only run statement groups which match a tag regex
... driver=stdout workload=stdout-test tags=group:'ddl.*'
Run the matching 'dml' statements, with 100 cycles, from [1000..1100)
... driver=stdout workload=stdout-test tags=group:'dml.*' cycles=1000..11000 filename=test.csv
This last example shows that the cycle range is [inclusive..exclusive), to allow for stacking test intervals. This is
standard across all activity types.
## Configuration
This activity type uses the uniform yaml configuration format. For more details on this format, please refer to the
[Standard YAML Format](http://docs.engineblock.io/user-guide/standard_yaml/)
## Configuration Parameters
- **newline** - If a statement has this param defined, then it determines whether or not to automatically add a missing
newline for that statement only. If this is not defined for a statement, then the activity-level parameter takes
precedence.
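A per-statement override might look like the following sketch (the statement text, binding names, and exact param
layout are illustrative):

```yaml
# Hypothetical sketch: newline=false applies only to the second statement;
# the first statement falls back to the activity-level newline setting.
statements:
  - stmt: "first,{one}"
  - stmt: "second,{two}"
    newline: false
```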
## Statement Format
The statement format for this activity type is a simple string. Tokens between curly braces are used to refer to binding
names, as in the following example:
```yaml
statements:
- "It is {minutes} past {hour}."
```
If you want to suppress the trailing newline that is automatically added, then
you must either pass `newline=false` as an activity param, or specify it
in the statement params in your config as in:
```yaml
params:
newline: false
```
### Auto-generated statements
If no statement is provided, then the defined binding names are used as-is to create a CSV-style line format. The values
are concatenated with comma delimiters, so a set of bindings like this:
```yaml
bindings:
one: Identity()
two: NumberNameToString()
```
would create an automatic string template like this:
```yaml
statements:
- "{one},{two}\n"
```
The auto-generation behavior is forced when the format parameter is supplied.
---
title: Driver Types
weight: 50
---
Each nosqlbench scenario is comprised of one or more activities of a
specific type. The types of activities available are provided by the
version of nosqlbench.
Additional drivers will be added in future releases. There are command
line help topics for each activity type (driver).
To get a list of topics run:
---
title: CLI Scripting
---
# CLI Scripting
Sometimes you want to run a set of workloads in a particular order, or call other specific test setup logic in
between phases or workloads. While the full scripting environment allows you to do this and more, it is not necessary to
write javascript for every scenario.
For more basic setup and sequencing needs, you can achieve a fair degree of flexibility on the command line. A few key
API calls are supported directly on the command line. This guide explains each of them, what they do, and how to use them
together.
## Script Construction
As the command line is parsed, from left to right, the scenario script is built in an internal scripting buffer. Once
the command line is fully parsed, this script is executed. Each of the commands below is effectively a macro for a
snippet of script. It is important to remember that order is important.
## Command line format
Newlines are not allowed when building scripts from the command line. As long as you follow the allowed forms below, you
can simply string multiple commands together with spaces between. As usual, single word options without double dashes
are commands, key=value style parameters apply to the previous command, and all other commands with
--this-style
are non-scripting options.
## Concurrency & Control
All activities that run during a scenario run under the control of, but independently from the scenario script. This
means that you can have a number of activities running while the scenario script is doing its own thing. The scenario
only completes when both the scenario script and the activities are finished.
### `start driver=<activity type> alias=<alias> ...`
You can start an activity with this command. At the time this command is evaluated, the activity is started, and the
script continues without blocking. This is an asynchronous start of an activity. If you start multiple activities in
this way, they will run concurrently.
The type argument is required to identify the activity type to run. The alias parameter is not strictly required, unless
you want to be able to interact with the started activity later. In any case, it is a good idea to name all your
activities with a meaningful alias.
### `stop <alias>`
Stop an activity with the given alias. This is synchronous, and causes the scenario to pause until the activity is
stopped. This means that all threads for the activity have completed and signalled that they're in a stopped state.
### `await <alias>`
Await the normal completion of an activity with the given alias. This causes the scenario script to pause while it waits
for the named activity to finish. This does not tell the activity to stop. It simply puts the scenario script into a
paused state until the named activity is complete.
### `run driver=<activity type> alias=<alias> ...`
Run an activity to completion, waiting until it is complete before continuing with the scenario script. It is
effectively the same as
start driver=<activity type> ... alias=<alias>
await <alias>
~~~
await one \
stop two
~~~
In this CLI script, the backslashes are necessary in order to keep everything on the same command line. Here is a
narrative of what happens when it is run.
1. An activity named 'a' is started, with 100K cycles of work.
2. An activity named 'b' is started, with 200K cycles of work.
---
title: Scenario Scripting
---
## Motive
The EngineBlock runtime is a combination of a scripting sandbox and a workload execution machine. This is not
accidental. With this particular arrangement, it should be possible to build sophisticated tests across a variety of
scenarios. In particular, logic which can observe and react to the system under test can be powerful. With this
approach, it becomes possible to break away from the conventional run-interpret-adjust cycle which is all too often done
by human hands.
## Machinery, Controls & Instruments
All of the heavy lifting is left to Java and the core nosqlbench runtime. This includes the iterative workloads that are
meant to test the target system. This is combined with a control layer which is provided by Nashorn and eventually
GraalVM. This division of responsibility allows the high-level test logic to be "script" and the low-level activity
logic to be "machinery". While the scenario script has the most control, it also is the least busy relative to activity
workloads. The net effect is that you have the efficiency of the iterative test loads in conjunction with the open
design palette of a first-class scripting language.
Essentially, the ActivityType drivers are meant to handle the workload-specific machinery. They also provide dynamic
control points and parameters which are special to that activity type (driver). This exposes a full feedback loop between a
running scenario script and the activities that it runs. The scenario is free to read the performance metrics from a
running activity and make changes to it on the fly.
## Scripting Environment
The nosqlbench scripting environment provides a few modifications meant to streamline understanding and usage of
nosqlbench dynamic parameters and metrics.
### Active Bindings
Active bindings are control variables which, when assigned to, cause an immediate change in the behavior of the runtime.
Each of the variables below is pre-wired into each script environment.
#### scenario
This is the __Scenario Controller__ object which manages the activity executors in the runtime. All the methods on this
Java type are provided to the scripting environment directly.
#### activities.&lt;alias&gt;.&lt;paramname&gt;
Each activity parameter for a given activity alias is available at this name within the scripting environment. Thus, you
can change the number of threads on an activity named foo (alias=foo) in the scripting environment by assigning a value
to it as in `activities.foo.threads=3`. Any assignments take effect synchronously before the next line of the script
continues executing.
#### __metrics__.&lt;alias&gt;.&lt;metric name&gt;
Each activity metric for a given activity alias is available at this name. This gives you access to the metrics objects
directly. Some metrics objects have also been enhanced with wrapper logic to provide simple getters and setters, like
`.p99ms` or `.p99ns`, for example.
Interaction with the nosqlbench runtime and the activities therein is made easy by the above variables and objects. When
an assignment is made to any of these variables, the changes are propagated to internal listeners. For changes to
_threads_, the thread pool responsible for the affected activity adjusts the number of active threads (AKA slots). Other
changes are further propagated directly to the thread harnesses and components which implement the ActivityType.
:::warning
Assignment to the _workload_ and _alias_ activity parameters has no special effect, as you can't change an activity to a
different driver once it has been created.
:::
You can make use of more extensive Java or Javascript libraries as needed, mixing them with the runtime controls
provided above.
## Enhanced Metrics for Scripting
The metrics available in nosqlbench are slightly different than the standard kit with dropwizard metrics. The key
differences are:
### HDR Histograms
All histograms use HDR histograms with *four* significant digits.
All histograms reset on snapshot, automatically keeping all data until you report the snapshot or access the snapshot
via scripting (see below).
The metric types that use histograms have been replaced with nicer versions for scripting. You don't have to do anything
differently in your reporter config to use them. However, if you need to use the enhanced versions in your local
scripting, you can. This means that Timer and Histogram types are enhanced. If you do not use the scripting extensions,
then you will automatically get the standard behavior that you are used to, only with higher-resolution HDR and full
snapshots for each report to your downstream metrics systems.
### Scripting with Delta Snapshots
For both the timer and the histogram types, you can call getDeltaReader(), or access it simply as
`<metric>.deltaReader`. When you do this, the delta snapshotting behavior is maintained until you use the deltaReader to
access it. You can get a snapshot from the deltaReader by calling getDeltaSnapshot(10000), which causes the snapshot to
be reset for collection, but retains a cache of the snapshot for any other consumer of getSnapshot() for that duration
in milliseconds. If, for example, metrics reporters access the snapshot in the next 10 seconds, the reported snapshot
will be exactly what was used in the script.
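The caching behavior described above can be modeled with a small self-contained sketch. Note that this is an
illustrative model only, not the real nosqlbench API; the class name and fields here are invented to show the
reset-then-cache semantics:

```javascript
// Illustrative model of delta-snapshot caching (hypothetical, not the real API).
// getDeltaSnapshot(ms) resets collection but caches the snapshot so that any
// getSnapshot() call within the next `ms` milliseconds sees the same data.
class DeltaReaderModel {
  constructor() {
    this.values = [];          // values recorded since the last delta snapshot
    this.cached = null;        // last delta snapshot, if still fresh
    this.cacheExpiresAt = 0;   // wall-clock expiry of the cached snapshot
  }
  update(v) { this.values.push(v); }
  getDeltaSnapshot(cacheMillis) {
    this.cached = this.values;          // capture the current delta
    this.values = [];                   // reset collection for the next delta
    this.cacheExpiresAt = Date.now() + cacheMillis;
    return this.cached;
  }
  getSnapshot() {
    // reporters see the cached snapshot while it is fresh
    if (this.cached !== null && Date.now() < this.cacheExpiresAt) {
      return this.cached;
    }
    return this.values;
  }
}

const reader = new DeltaReaderModel();
reader.update(10); reader.update(20);
const snap = reader.getDeltaSnapshot(10000);
console.log(snap.length);                   // 2: the delta that was captured
console.log(reader.getSnapshot() === snap); // true while the cache is fresh
```

This is why a metrics reporter firing within the cache window sees exactly the same frame of data as the script did.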
This is important for using local scripting methods and calculations with aggregate views downstream. It means that the
histograms will match up between your local script output and your downstream dashboards, as they will both be using the
same frame of data, when done properly.
### Histogram Convenience Methods
All histogram snapshots have additional convenience methods for accessing every percentile in (P50, P75, P90, P95, P98,
P99, P999, P9999) and every time unit in (s, ms, us, ns). For example, getP99ms() is supported, as is getP50ns(), and
every other possible combination. This means that you can access the 99th percentile metric value in your scripts for
activity _foo_ as _metrics.foo.cycles.snapshot.p99ms_.
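The full set of accessor names implied above can be enumerated mechanically. This sketch only builds the method names;
the actual snapshot objects carrying these methods are provided by nosqlbench at runtime:

```javascript
// Enumerate the convenience accessor names described above (names only).
const percentiles = ["P50", "P75", "P90", "P95", "P98", "P99", "P999", "P9999"];
const units = ["s", "ms", "us", "ns"];
const accessors = [];
for (const p of percentiles) {
  for (const u of units) {
    accessors.push(`get${p}${u}()`);
  }
}
console.log(accessors.length);                 // 32 combinations
console.log(accessors.includes("getP99ms()")); // true
```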
## Control Flow
When a script is run, it has absolute control over the scenario runtime while it is active. Once the script reaches its
end, however, it will only exit if all activities have completed. If you want to explicitly stop a script, you must stop
all activities.
## Strategies
You can use nosqlbench in the classic form with `run driver=<activity_type> param=value ...` command line syntax. There
are reasons, however, that you will sometimes want to customize and modify your scripts directly, such as:
- Permute test variables to cover many sub-conditions in a test.
- Automatically adjust load factors to identify the nominal capacity of a system.
## Script Input & Output
Internal buffers are kept for _stdin_, _stdout_, and _stderr_ for the scenario script execution. These are logged to the
logfile upon script completion, with markers showing the timestamp and file descriptor (stdin, stdout, or stderr) that
each line was recorded from.
## External Docs
# Standard Metrics
nosqlbench comes with a set of standard metrics that will be part of every activity type (driver). Each activity type
(driver) enhances the metrics available by adding their own metrics with the nosqlbench APIs. This section explains what
the standard metrics are, and how to interpret them.
## read-input
Within nosqlbench, a data stream provider called an _Input_ is responsible for providing the actual cycle number that
will be used by consumer threads. Because different _Input_ implementations may perform differently, a separate metric
is provided to track the performance in terms of client-side overhead. The **read-input** metric is a timer that only
measures the time it takes for a given activity thread to read the input value, nothing more.
## strides
A stride represents the work-unit for a thread within nosqlbench. It allows a set of cycles to be logically grouped
together for purposes of optimization -- or in some cases -- to simulate realistic client-side behavior over multiple
operations. The stride is the number of cycles that will be allocated to each thread before it starts iterating on them.
The **strides** timer measures the time each stride takes, including all cycles within the stride. It starts measuring
time before the cycle starts, and stops measuring after the last cycle in the stride has run.
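The way cycles are handed out stride-by-stride can be sketched as follows. This is a simplified single-threaded model;
real nosqlbench inputs are concurrent and more involved:

```javascript
// Sketch of stride allocation: each stride is the block of cycles a thread
// takes before it starts iterating on them (simplified model).
function* strideInput(totalCycles, stride) {
  for (let start = 0; start < totalCycles; start += stride) {
    yield Array.from({ length: Math.min(stride, totalCycles - start) },
                     (_, i) => start + i);
  }
}

const strides = [...strideInput(10, 4)];
console.log(strides.length); // 3 strides: [0..3], [4..7], [8..9]
console.log(strides[2]);     // [ 8, 9 ]
```

The **strides** timer would wrap each yielded block in this model, while the **cycles** timer wraps each element.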
## cycles
Within nosqlbench, each logical iteration of a statement is handled within a distinct cycle. A cycle represents an
iteration of a workload. This corresponds to a single operation executed according to some statement definition.
The **cycles** metric is a timer that starts counting at the start of a cycle, before any specific activity behavior has
control. It stops timing once the logical cycle is complete. This includes any additional phases that are executed by
multi-phase actions.
# Timing Terms
Often, terms used to describe latency can create confusion. In fact, the term _latency_ is so overloaded in practice
that it is not useful by itself. Because of this, nosqlbench will avoid using the term latency _except in a specific
way_. Instead, the terms described in this section will be used.
nosqlbench is a client-centric testing tool. The measurement of operations occurs on the client, without visibility to
what happens in transport or on the server. This means that the client *can* see how long an operation takes, but it
*cannot see* how much of the operational time is spent in transport and otherwise. This has a bearing on the terms that
are adopted with nosqlbench.
Some terms are anchored by the context in which they are used. For latency terms, *service time* can be subjective. When
using this term to describe other effects in your system, what is included depends on the perspective of the requester.
The concept of service is universal, and every layer in a system can be seen as a service. Thus, the service time is
defined by the vantage point of the requester. This is the perspective taken by the nosqlbench approach for naming and
semantics below.
## responsetime
**The duration of time a user has to wait for a response from the time they submitted the request.** Response time is
the duration of time from when a request was expected to start, to the time at which the response is finally seen by the
user. A request is generally expected to start immediately when users make a request. For example, when a user enters a
URL into a browser, they expect the request to start immediately when they hit enter.
In nosqlbench, the response time for any operation can be calculated by adding its wait time and its service time
together.
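A worked example of this relationship, using invented numbers:

```javascript
// responsetime = waittime + servicetime (hypothetical values, in nanoseconds)
const waitTimeNs = 2_000_000;    // 2 ms of scheduling delay before the op starts
const serviceTimeNs = 5_000_000; // 5 ms for the system to process and respond
const responseTimeNs = waitTimeNs + serviceTimeNs;
console.log(responseTimeNs / 1_000_000); // 7 (ms seen by the user)
```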
## waittime
**The duration of time between when an operation is intended to start and when it actually starts on a client.** This is
also called *scheduling delay* in some places. Wait time occurs because clients are not able to make all requests
instantaneously when expected. There is an ideal time at which the request would be made according to user demand. This
ideal time is always earlier than the actual time in practice. When there is a shortage of resources *of any kind* that
delays a client request, it must wait.
Wait time can accumulate when you are running something according to a dispatch rate, as with a rate limiter.
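The accumulation effect can be seen in a simplified model of a single-threaded client running behind a fixed dispatch
rate. The rate, service times, and helper below are all invented for illustration:

```javascript
// Model of wait time accumulating under a fixed dispatch rate: each op has an
// ideal start time, but if the previous op finishes late, the next op starts
// late too, and the scheduling delay grows.
const ratePerSec = 100;               // target dispatch rate
const idealGapMs = 1000 / ratePerSec; // 10 ms between intended starts
const serviceTimesMs = [12, 12, 12];  // each op takes longer than the gap

let clockMs = 0;
let totalWaitMs = 0;
serviceTimesMs.forEach((svc, i) => {
  const idealStart = i * idealGapMs;        // when the op *should* start
  const actualStart = Math.max(clockMs, idealStart);
  totalWaitMs += actualStart - idealStart;  // scheduling delay for this op
  clockMs = actualStart + svc;              // single-threaded client model
});
console.log(totalWaitMs); // 6: waits of 0, 2, and 4 ms accumulate
```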
## servicetime
**The duration of time it takes a server or other system to fully process a request and send a response.** From the
perspective of a testing client, the _system_ includes the infrastructure as well as remote servers. As such, the
service time metrics in nosqlbench include any operational time that is external to the client, including transport
latency.
# Advanced Metrics
## Unit of Measure
All metrics collected from activities are recorded in nanoseconds and ops per second. All histograms are recorded with 4
digits of precision using HDR histograms.
## Metric Outputs
Metrics from a scenario run can be gathered in multiple ways:
- To a monitoring system via graphite
- Via the `--docker-metrics` option
With the exception of the `--docker-metrics` approach, these forms may be combined and used in combination. The command
line options for enabling these are documented in the built-in help, although some examples of these may be found below.
## Metrics via Graphite
If you would like to have all of your testing data in one place, then you may be interested in reporting your
measurements to a monitoring system. For this, nosqlbench includes a
[Metrics Library](https://github.com/dropwizard/metrics). Graphite reporting is baked in as the default reporter.
In order to enable graphite reporting, use one of these option formats:
Core metrics use the prefix _engineblock_ by default. You can override this with a command-line option.
## Identifiers
Metrics associated with a specific activity will have the activity alias in their name. There is a set of core metrics
which are always present regardless of the activity type. The names and types of additional metrics provided for each
activity type vary.
Sometimes, an activity type will expose metrics on a per statement basis, measuring over all invocations of a given
statement as defined in the YAML. In these cases, you will see `--` separating the name components of the metric. At the
most verbose, a metric name could take a form like
`<activity>.<docname>--<blockname>--<statementname>--<metricname>`, although this is rare when you name your statements,
which is recommended. Just keep in mind that the double dash connects an activity's alias with named statements *within*
that activity.
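The naming scheme above can be sketched as a small composition helper. The helper and the sample names are invented for
illustration; nosqlbench composes these names internally:

```javascript
// Compose a fully-qualified per-statement metric name (hypothetical helper).
// The "." joins the activity alias to the rest; "--" joins the named parts
// within that activity.
function metricName(activity, docname, blockname, statementname, metric) {
  return `${activity}.${docname}--${blockname}--${statementname}--${metric}`;
}

console.log(metricName("myactivity", "mydoc", "main", "insert-user", "cycles"));
// myactivity.mydoc--main--insert-user--cycles
```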
## HDR Histograms
If you want to record only certain metrics in this way, then use this form:
--log-histograms 'hdrdata.log:.*suffix'
Notice that the option is enclosed in single quotes. This is because the second part of the option value is a regex. The
'.*suffix' pattern matches any metric name that ends with "suffix". Effectively, leaving out the pattern is the same as
using '.\*', which matches all metrics. Any valid regex is allowed here.
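The pattern behaves as a standard regex against metric names. The metric names below are invented examples:

```javascript
// How a '.*suffix' pattern selects metric names (standard regex semantics).
const pattern = new RegExp(".*suffix");
console.log(pattern.test("myactivity.cycles-suffix")); // true: ends with "suffix"
console.log(pattern.test("myactivity.cycles"));        // false: no match
```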
Metrics may be included in multiple logs, but care should be taken not to overdo this. Keeping higher fidelity histogram
reservoirs does come with a cost, so be sure to be specific in what you record as much as possible.
If you want to specify the recording interval, use this form:
--log-histograms 'hdrdata.log:.*suffix:5s'
If you want to specify the interval, you must use the third form above, although it is valid to leave the pattern empty,
such as 'hdrdata.log::5s'.
Each interval specified will be tracked in a discrete reservoir in memory, so they will not interfere with each other in
terms of accuracy.
### Recording HDR Histogram Stats
You can also record basic snapshots of histogram data on a periodic interval just like above with HDR histogram logs.
The option to do this is:
--log-histostats 'hdrstats.log:.*suffix:10s'
Everything works the same as for hdr histogram logging, except that the format is in CSV as shown in the example below:
~~~
Tag=diag1.cycles,0.501,0.499,498,1024,2047,2047,4095,4095,4095,4095,4095,4095,40
...
~~~
This includes the metric name (Tag), the interval start time and length (from the beginning of collection time), number
of metrics recorded (count), minimum magnitude, a number of percentile measurements, and the maximum value. Notice that
the format used is similar to that of the HDR logging, although instead of including the raw histogram data, common
percentiles are recorded directly.
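A sketch of pulling fields out of one such stats line. The field layout follows the description above (Tag, interval
start and length, count, then the magnitude values); the sample values here are invented and shortened:

```javascript
// Parse one stats line from the CSV format described above (invented sample).
const line = "Tag=diag1.cycles,0.501,0.499,498,1024,2047,4095";
const [tagField, ...numbers] = line.split(",");
console.log(tagField.split("=")[1]); // diag1.cycles  (the metric name)
console.log(Number(numbers[2]));     // 498           (count of metrics recorded)
```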