docs naming and formatting

Jonathan Shook 2020-03-26 12:33:46 -05:00
parent 3fcd0f6159
commit cf81331a41
52 changed files with 1083 additions and 1004 deletions

View File

@@ -1,16 +1,17 @@
# cql driver - advanced features

This is an addendum to the standard CQL Activity Type docs. For that, see "cql". Use the features in this guide carefully. They do not come with as much documentation, as they are less used than the main CQL features.

### ResultSet and Row operators

Within the CQL Activity type, in synchronous mode (activities without the async= parameter), you have the ability to attach operators to a given statement so that it gets per-statement handling. These operators are ways of interrogating the result of an operation, saving values, or managing other side-effects for specific types of testing.

When enabled for a statement, operators are applied in this order:
@@ -35,7 +36,7 @@ row data, you must apply a row operator as explained below.

- **rowoperators** - If provided as a CQL statement param, then the list of operator names that follow, separated by a comma, will be used to attach Row operators to the given statement.

## Available ResultSet Operators

- pushvars - Push a copy of the current thread local variables onto
@@ -44,11 +45,11 @@ row data, you must apply a row operator as explained below.
  conjunction with the row operators below.
- popvars - Pop the last thread local variable set from the thread-local stack into vars, replacing the previous content. This does nothing with the ResultSet data.
- clearvars - Clears the contents of the thread local variables. This does nothing with the ResultSet data.
- trace - Flags a statement to be traced on the server side and then logs the details of the trace to the trace log file.
- log - Logs basic data to the main log. This is useful to verify that operators are loading and triggering as expected.
- assert_singlerow - Throws an exception (ResultSetVerificationException)
@@ -61,22 +62,22 @@ Examples:
  - s1: |
      a statement
    rsoperators: pushvars, clearvars
```
## Available Row Operators:

- savevars - Copies the values of the row into the thread-local variables.
- saverows - Copies the rows into a special CQL-only thread local row state.

Examples:

```
statements:
  - s2: |
      a statement
    rowoperators: saverows
```
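
Both operator families can be attached to the same statement. As a sketch, using the same placeholder statement as the examples above: here pushvars preserves the current variable state on the stack before savevars overwrites the variables with the values of the returned row.

```
statements:
  - s3: |
      a statement
    rsoperators: pushvars
    rowoperators: savevars
```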
## Injecting additional Queries (Future)

It is possible to inject new operations into an activity. However, such operations are _indirect_ to cycles, since they must be based on the results

View File

@@ -1,4 +1,4 @@
# cqlverify

This activity type allows you to read values from a database and compare them to the generated values that were expected to be written, row-by-row, producing a

View File

@@ -1,5 +1,5 @@
---
title: Diag ActivityType
weight: 32
menu:
  main:
@@ -8,10 +8,9 @@ menu:
    weight: 12
---
{{< warning >}} This section is out of date, and will be updated after the next major release with details on building async drivers. {{< /warning >}}

If you take all the code chunks from this document and concatenate them together, you'll have 'diag', one of the in-built activity types for
@@ -241,4 +240,3 @@ report. If it is time to report, we mark the time in lastUpdate.

This is all there is to making an activity react to real-time changes in the activity definition.

View File

@@ -8,10 +8,9 @@ menu:
    weight: 12
---

{{< warning >}} This section is out of date, and will be updated after the next major release with details on building async drivers. {{< /warning >}}

## Introduction
@@ -27,7 +26,7 @@ In an async activity, you still have multiple threads, but in this case, each th
more asynchronous operations. The `async=100` parameter, for example, informs an activity that it needs to allocate 100 total operations over the allocated threads. In the case of `async=100 threads=10`, it is the responsibility of the ActivityType's action dispenser to configure its actions so that each of them knows it can juggle 10 operations.
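
For example, a command line like the following (the workload name is a placeholder) asks for 100 in-flight operations spread across 10 threads, i.e. 10 per thread:

```
./nb run driver=cql workload=myworkload async=100 threads=10
```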
{{< note >}}The *async* parameter has a standard meaning in nosqlbench. If it is defined, async is enabled. Its parameter value is the number of total async operations that can be in flight at any one instant, with the number
@@ -42,7 +41,7 @@ behavior but getting something else.

The contract between a motor and an action is very basic.

- Each motor submits as many async operations as is allowed to its action, as long as there are cycles remaining, until the action signals that it is at its limit.
- As long as an action is able to retire an operation by giving a result back to its motor, the motor keeps providing one more and retiring one more, as long as there are cycles remaining.
@@ -74,8 +73,8 @@ as a developer.
   but it can return a simple op context if no specialization is needed.
4. op contexts are recycled to avoid heap pressure for high data rates. This makes it relatively low-cost to use the specialized op context to hold contextual data that may otherwise be expensive to _malloc_ and _free_.

### Examples

Developers can refer to the Diag activity type implementation for further examples.

View File

@@ -1,5 +1,5 @@
---
title: Building ActivityTypes
weight: 32
menu:
  main:
@@ -15,7 +15,7 @@ menu:
- Maven

## Building new Driver Types

1. Add the nosqlbench API to your project via Maven:

View File

@@ -1,22 +0,0 @@
## Help Topics

### Built-in Component Docs

Generally, all named activity types, input types, output types, etc have their own documentation. You can access those with a command like:

    PROG help diag

### Advanced Topics

For any of the topics listed here, you can get detailed help by running PROG help <topic>.

- topics
- commandline
- cli_scripting
- activity_inputs
- activity_outputs
- cycle_log

View File

@@ -5,31 +5,34 @@ weight: 10
# Getting Support

In general, our goals with NoSQLBench are to make the help systems and examples wrap around the users like a suit of armor, so that they feel capable of doing most things without having to ask for help. Please keep this in mind when looking for personal support from our community, and help us find those places where the docs are lacking. Maybe you can help us by adding some missing docs!
## NoSQLBench Slack

There is a new [slack channel](https://join.slack.com/t/nosqlbench/shared_invite/zt-cu9f2jpe-XiHN3SsUDcjkVgxaURFuaw) for NoSQLBench. Please join it if you are a new or existing NoSQLBench user and help us get it going!
## General Feedback

These guidelines are mirrored at the [Submitting Feedback](https://github.com/nosqlbench/nosqlbench/wiki/Submitting-Feedback) wiki page at the nosqlbench project site, which is also where any `[Submit Feedback]` links will take you.
## Bug Fixes

If you think you have found a bug, please [file a bug report](https://github.com/nosqlbench/nosqlbench/issues/new?labels=bug). nosqlbench is actively used within DataStax, and verified bugs will get attention as resources permit. Bug reports which are more detailed, or which include steps to reproduce, will get attention first.
## Feature Requests

If you would like to see something in nosqlbench that is not there yet, please [submit a feature request](https://github.com/nosqlbench/nosqlbench/issues/new?labels=feature).

## Documentation Requests

View File

@@ -5,12 +5,14 @@ weight: 0
## Welcome to NoSQLBench

Welcome to the documentation for NoSQLBench. This is a power tool that emulates real application workloads. This means that you can fast-track performance, sizing, and data model testing without writing your own testing harness.

To get started right away, jump to the [Quick Start Example](/index.html#/docs/02_getting_started.html) from the menu on the left.

To see the ways you can get NoSQLBench, check out the project site [DOWNLOADS.md](https://github.com/nosqlbench/nosqlbench/blob/master/DOWNLOADS.md).
## What is NoSQLBench?

@@ -18,54 +20,44 @@ NoSQLBench is a serious performance testing tool for the NoSQL ecosystem.
**NoSQLBench brings advanced testing capabilities into one tool that are not found in other testing tools.**

- You can run common testing workloads directly from the command line. You can start doing this within 5 minutes of reading this.
- You can generate virtual data sets of arbitrary size, with deterministic data and statistically shaped values.
- You can design custom workloads that emulate your application, contained in a single file, based on statement templates - no IDE or coding required.
- You can immediately plot your results in a Docker and Grafana stack on Linux with a single command line option.
- When needed, you can open the access panels and rewire the runtime behavior of NoSQLBench to do advanced testing, including a full scripting environment with Javascript.
The core machinery of NoSQLBench has been built with attention to detail. It has been battle tested within DataStax as a way to help users validate their data models, baseline system performance, and qualify system designs for scale.

In short, NoSQLBench wishes to be a programmable power tool for performance testing. However, it is somewhat generic. It doesn't know directly about a particular type of system, or protocol. It simply provides a suitable machine harness in which to put your drivers and testing logic. If you know how to build a client for a particular kind of system, EB will let you load it like a plugin and control it dynamically.

Initially, NoSQLBench comes with support for CQL, but we would like to see this expanded with contributions from others.
## Origins

The code in this project comes from multiple sources. The procedural data generation capability was known before as 'Virtual Data Set'. The core runtime and scripting harness was from the 'EngineBlock' project. The CQL support was previously used within DataStax. In March of 2020, DataStax and the project maintainers for these projects decided to put everything into one OSS project in order to make contributions and sharing easier for everyone. Thus, the new project name and structure was launched as nosqlbench.io. NoSQLBench is an independent project that is primarily sponsored by DataStax.

We offer NoSQLBench as a new way of thinking about testing systems. It is not limited to testing only one type of system. It is our wish to build a community of users and practice around this project so that everyone in the NoSQL ecosystem can benefit from common concepts and understanding and reliable patterns of use.
## Scalable User Experience

NoSQLBench endeavors to be valuable to all users. We do this by making it easy for you, our user, to do just what you need without worrying about the rest. If you need to do something simple, it should be simple to find the right settings and just do it. If you need something more sophisticated, then you should be able to find what you need with a reasonable amount of effort and no surprises.

That is the core design principle behind NoSQLBench. We hope you like it.

View File

@@ -12,21 +12,17 @@ Some of the features discussed here are only for advanced testing scenarios.
## Hybrid Rate Limiting

Rate limiting is a complicated endeavor, if you want to do it well. The basic rub is that going fast means you have to be less accurate, and vice-versa. As such, rate limiting is a parasitic drain on any system. The act of rate limiting in and of itself poses a limit to the maximum rate, regardless of the settings you pick, because it forces your system to interact with some hardware notion of time passing, and this takes CPU cycles that could be going to the thing you are limiting.

This means that in practice, rate limiters are often very featureless. It's daunting enough to need rate limiting, and asking for anything more than that is often wishful thinking. Not so in NoSQLBench.
The rate limiter in NoSQLBench provides a comparable degree of performance and accuracy to others found in the Java ecosystem, but it *also* has advanced features:

- Allows a sliding scale between average rate limiting and strict rate limiting.
- Internally accumulates delay time, for C.O. friendly metrics
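
As a sketch of how this surfaces to the user, assuming the `cyclerate` activity parameter and a placeholder workload name, an activity can be held to a target op rate from the command line:

```
# hold the activity to roughly 5000 cycles per second
./nb run driver=cql workload=myworkload cycles=1000000 cyclerate=5000
```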
@@ -35,60 +31,48 @@ features:
## Flexible Error Handling

An emergent facility in NoSQLBench is the way that errors are handled within an activity. For example, with the CQL activity type, you are able to route error handling for any of the known exception types. You can count errors, you can log them. You can cause errored operations to auto-retry if possible, up to a configurable number of tries.
This means that, as a user, you get to decide what your test is about. Is it about measuring some nominal but anticipated level of errors due to intentional over-saturation? If so, then count the errors, and look at their histogram data for timing details within the available timeout.

Are you doing a basic stability test, where you want the test to error out for even the slightest error? You can configure for that if you need.
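
For illustration, error routing in the cql driver is configured with an `errors` parameter; the spec below is a hypothetical sketch of pattern-to-handler routing, not the verbatim syntax, so check the driver help for the exact form:

```
# hypothetical spec: count overloaded errors, retry retryable ones, stop on anything else
./nb run driver=cql workload=myworkload errors=Overloaded.*:count,retryable:retry,.*:stop
```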
## Cycle Logging

It is possible to record the result status of each and every cycle in a NoSQLBench test run. If the results are mostly homogeneous, the RLE encoding of the results will reduce the output file down to a small fraction of the number of cycles. The errors are mapped to ordinals, and these ordinals are stored into a direct RLE-encoded log file. For most testing, where most of the results are simply success, this file will be tiny. You can also convert the cycle log into textual form for other testing and post-processing, and vice-versa.
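
A sketch of that round trip; the option names here are assumptions, so check `./nb --help` for the exact spelling:

```
# convert a binary cycle log to text for post-processing, then back again
./nb --export-cycle-log myactivity.cyclelog myactivity.txt
./nb --import-cycle-log myactivity.txt myactivity.cyclelog
```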
## Op Sequencing

The way that operations are planned for execution in NoSQLBench is based on a stable ordering that is configurable. The statement forms are mixed together based on their relative ratios. The three schemes currently supported are round-robin with exhaustion (bucket), duplicate in order (concat), and a way to spread each statement out over the unit interval (interval). These account for most configuration scenarios without users having to micro-manage their statement templates.
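
As a sketch, assuming per-statement `ratio` params and a `seq` activity parameter named after the schemes above: with ratios of 3 and 1, `seq=concat` (duplicate in order) would plan the sequence read, read, read, write.

```
statements:
  - read: |
      a read statement
    ratio: 3
  - write: |
      a write statement
    ratio: 1
```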
## Sync and Async

There are two distinct usage modes in NoSQLBench when it comes to operation dispatch and thread management:

### Sync

Sync is the default form. In this mode, each thread reads its sequence and dispatches one statement at a time, holding only one operation in flight per thread. This is the mode you often use when you want to emulate an application's request-per-thread model, as it implicitly linearizes the order of operations within the computed sequence of statements.
### Async

In Async mode, each thread in an activity is responsible for juggling a number of operations in-flight. This allows a NoSQLBench client to juggle an arbitrarily high number of connections, limited primarily by how much memory you have.

Internally, the Sync and Async modes have different code paths. It is possible for an activity type to support one or both of these.

View File

@@ -5,61 +5,46 @@ weight: 2
# Refined Core Concepts

The core concepts that NoSQLBench is built on have been scrutinized, replaced, refined, and hardened through several years of use by users of various needs and backgrounds.

This is important when trying to find a way to express common patterns in what is often a highly fragmented practice. Testing is hard. Scale testing is hard. Distributed testing is hard. We need a set of conceptual building blocks that can span across workloads and system types, and machinery to put these concepts to use. Some concepts used in NoSQLBench are shared below for illustration, but this is by no means an exhaustive list.
### The Cycle

Cycles in NoSQLBench are whole numbers on a number line. All operations in a NoSQLBench session are derived from a single cycle. It's a long value, and a seed. The cycle determines not only which statements (of those available) will get executed, but it also determines what the values bound to that statement will be.
Cycles are specified as a closed-open `[min,max)` interval, just as slices are in some languages. That is, the min value is included in the range, but the max value is not. This means that you can stack slices using common numeric reference points without overlaps or gaps. It means you can have exact awareness of what data is in your dataset, even incrementally.
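
For example (the workload name is a placeholder, and the `cycles=min..max` form is assumed), two sessions can partition one dataset exactly:

```
# the first session covers cycles [0,1000000)...
./nb run driver=cql workload=myworkload cycles=0..1000000
# ...and a later session continues with [1000000,2000000), no overlap and no gap
./nb run driver=cql workload=myworkload cycles=1000000..2000000
```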
You can think of a cycle as a single-valued coordinate system for data that lives adjacent to that number on the number line.

### The Activity

An activity is a multi-threaded flywheel of statements in some sequence and ratio. Activities run over the numbers in a cycle range. Each activity has a driver type which determines the native protocol that it speaks.
### The Activity Type

An activity type is a high level driver for a protocol. It is like a statement-aware cartridge that knows how to take a basic statement template and turn it into an operation for the scenario to execute.
### The Scenario

The scenario is a runtime session that holds the activities while they run. A NoSQLBench scenario is responsible for aggregating global runtime settings, metrics reporting channels, logfiles, and so on.
### The Scenario Script

Each scenario is governed by a script that runs single-threaded, asynchronously from activities, but in control of activities. If needed, the scenario script is automatically created for the user, and the user never knows it is there. If the user has advanced testing requirements, then they may take advantage of the scripting capability at such time. When the script exits, *AND* all activities are complete, then the scenario is complete.

View File

@@ -5,48 +5,43 @@ weight: 12
# High Fidelity Metrics

Since NoSQLBench has been built as a serious testing tool for all users, some attention was necessary on the way metrics are used.
## Discrete Reservoirs

In NoSQLBench, we avoid the use of time-decaying metrics reservoirs. Internally, we use HDR reservoirs with discrete time boundaries. This is so that you can look at the min and max values and know that they apply accurately to the whole sampling window.
## Metric Naming

All running activities have a symbolic alias that identifies them for the purposes of automation and metrics. If you have multiple activities running concurrently, they will have different names and will be represented distinctly in the metrics flow.
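
As a sketch (the workload names are placeholders, and the two-command form assumes the `start` command), concurrent activities can be given distinct aliases, and their metrics are then reported under those names:

```
./nb start driver=cql workload=writes alias=write_phase \
     start driver=cql workload=reads alias=read_phase
```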
## Precision and Units

By default, the internal HDR histogram reservoirs are kept at 4 digits of precision. All timers are kept at nanosecond resolution.
## Metrics Reporting

Metrics can be reported via graphite as well as CSV, logs, HDR logs, and HDR stats summary CSV files.
## Coordinated Omission

The metrics naming and semantics in NoSQLBench are set up so that you can have coordinated omission metrics when they are appropriate, but no other changes when they are not. This means that the metric names and meanings remain stable in any case.
Particularly, NoSQLBench avoids the term "latency" altogether, as it is often overused and thus prone to confusing people.

Instead, the terms `service time`, `wait time`, and `response time` are used. These are abbreviated in metrics as `servicetime`, `waittime`, and `responsetime`.

The `servicetime` metric is the only one which is always present. When a rate limiter is used, then additionally `waittime` and `responsetime` are reported.

View File

@@ -5,23 +5,18 @@ weight: 10
# NoSQLBench Showcase

Since NoSQLBench is new on the scene in its current form, you may be wondering why you would want to use it over any other tool. That is what this section is all about.

If you want to look under the hood of this toolkit before giving it a spin, this section is for you. You don't have to read all of this! It is here for those who want to know the answer to the question "So, what's the big deal??" Just remember it is here for later if you want to skip to the next section and get started testing.

NoSQLBench can do nearly everything that other testing tools can do, and more. It achieves this by focusing on a scalable user experience in combination with a modular internal architecture.

NoSQLBench is a workload construction and simulation tool for scalable systems testing. That is an entirely different scope of endeavor than most other tools.

The pages in this section all speak to advanced capabilities that are unique to NoSQLBench. In time, we want to show these with basic scenario examples, right in the docs.

View File

@@ -5,23 +5,18 @@ weight: 11
# Modular Architecture

The internal architecture of NoSQLBench is modular throughout. Everything from the scripting extensions to the data generation functions is enumerated at compile time into a service descriptor, and then discovered at runtime by the SPI mechanism in Java.

This means that extending and customizing bundles and features is quite manageable.

It also means that it is relatively easy to provide a suitable API for multi-protocol support. In fact, there are several drivers available in the current NoSQLBench distribution. You can list them out with `./nb --list-drivers`, and you can get help on how to use each of them with `./nb help <name>`.

This is also a way for us to encourage and empower other contributors to help develop the capabilities and reach of NoSQLBench as a bridge building tool in our community. This level of modularity is somewhat unusual, but it serves the purpose of helping users with new features.

View File

@@ -5,47 +5,38 @@ weight: 2
# Portable Workloads

All of the workloads that you can build with NoSQLBench are self-contained in a workload file. This is a statement-oriented configuration file that contains templates for the operations you want to run in a workload.

This defines part of an activity - the iterative flywheel part that is run directly within an activity type. This file contains everything needed to run a basic activity -- a set of statements in some ratio. It can be used to start an activity, or as part of several activities within a scenario.
## Standard YAML Format

The format for describing statements in NoSQLBench is generic, but in a particular way that is specialized around describing statements for a workload.

That means that you can use the same YAML format to describe a workload for Kafka as you can for Apache Cassandra or DSE.

The YAML structure has been tailored to describing statements, their data generation bindings, how they are grouped and selected, and the parameters needed by drivers, like whether they should be prepared statements or not.

Further, the YAML format allows for defaults and overrides with a very simple mechanism that reduces editing fatigue for frequent users.
You can also template document-wide macro parameters which are taken from the command line parameters just like any other parameter. This is a way of templating a workload and making it multi-purpose or adjustable on the fly.
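
A sketch of this, assuming the `<<name:default>>` macro form and a placeholder table and binding:

```
# keyspace defaults to "baselines" unless overridden, e.g. ./nb run ... keyspace=test
statements:
  - insert into <<keyspace:baselines>>.mytable (id) values ({id});
```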
## Experimentation Friendly

Because the workload YAML format is generic across activity types, it is possible to ask one activity type to interpret the statements that are meant for another. This isn't generally a good idea, but it becomes extremely handy when you want to have a very high level activity type like `stdout` use a lower-level syntax like that of the `cql` activity type. When you do this, the stdout activity type _plays_ the statements to your console as they would be executed in CQL, data bindings and all.

This means you can empirically and substantively demonstrate and verify access patterns, data skew, and other dataset details before you change back to cql mode and turn up the settings for a higher scale test.
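
For example (the workload name is a placeholder), rendering a few cycles of a CQL workload to the console instead of executing them:

```
./nb run driver=stdout workload=myworkload cycles=10
```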

View File

@@ -5,91 +5,68 @@ weight: 3
# Scripting Environment

The ability to write open-ended testing simulations is provided in EngineBlock by means of a scripted runtime, where each scenario is driven from a control script that can do anything the user wants.
## Dynamic Parameters

Some configuration parameters of activities are designed to be assignable while a workload is running. This makes things like threads, rates, and other workload dynamics pseudo real-time. The internal APIs work with the scripting environment to expose these parameters directly to scenario scripts.
## Scripting Automatons

When a NoSQLBench scenario is running, it is under the control of a single-threaded script. Each activity that is started by this script is run within its own threadpool, asynchronously.

The control script has executive control of the activities, as well as full visibility into the metrics that are provided by each activity. The way these two parts of the runtime meet is through the service objects which are installed into the scripting runtime. These service objects provide a named access point for each running activity and its metrics.

This means that the scenario script can do something simple, like start activities and wait for them to complete, OR, it can do something more sophisticated, like dynamically and iteratively scrutinizing the metrics and making realtime adjustments to the workload while it runs.
## Analysis Methods

Scripting automatons that do feedback-oriented analysis of a target system are called analysis methods in NoSQLBench. We have prototyped a couple of these already, but there is nothing keeping the adventurous from coming up with their own.
## Command Line Scripting

The command line has the form of basic test commands and parameters. These commands get converted directly into scenario control script in the order they appear. The user can choose whether to stay in high-level executive mode, with simple commands like "run workload=...", or to drop down directly into script design. They can look at the equivalent script for any command line by running --show-script. If you take the script that is dumped to console and run it, it should do exactly the same thing as if you had just run the standard commands.
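
For example, to see the scenario script that a simple command line compiles down to:

```
# prints the generated scenario script instead of running it
./nb run driver=diag cycles=100 --show-script
```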
There are even ways to combine script fragments, full commands, and calls to scripts on the command line. Since each variant is merely a way of constructing scenario script, they all get composited together before the scenario script is run.
New introductions to NoSQLBench should focus on the command line. Once a user is familiar with this, it is up to them whether to tap into the deeper functionality. If they don't need to know about scenario scripting, then they shouldn't have to learn about it to be effective.
## Compared to DSLs

Other tools may claim that their DSL makes scenario "simulation" easier. In practice, any DSL is generally dependent on a development tool to lay the language out in front of a user in a fluent way. This means that DSLs are almost always developer-targeted tools, and mostly useless for casual users who don't want to break out an IDE.
One of the things a DSL proponent may tell you is that it tells you "all the things you can do!". This is de-facto the same thing as telling you "all the things you can't do", because anything not in the DSL is off the table. This is not a win for the user. For DSL-based systems, the user has to use the DSL whether or not it enhances their creative control, while in fact most DSLs aren't rich enough to do much that is interesting from a simulation perspective.
In NoSQLBench, we don't force the user to use the programming abstractions except at a very surface level -- the CLI. It is up to the user whether or not to open the secret access panel for the more advanced functionality. If they decide to do this, we give them a commodity language (ECMAScript), and we wire it into all the things they were already using. We don't take away their expressivity by telling them what they can't do. This way, users can pick their level of investment and reward as best fits their individual needs, as it should be.
## Scripting Extensions

Also mentioned under the section on modularity, it is relatively easy for a developer to add their own scripting extensions into NoSQLBench.

View File

@@ -5,92 +5,71 @@ weight: 1
# Virtual Datasets

The _Virtual Dataset_ capabilities within NoSQLBench allow you to generate data on the fly. There are many reasons for using this technique in testing, but it is often a topic that is overlooked or taken for granted.
## Industrial Strength

The algorithms used to generate data are based on advanced techniques in the realm of variate sampling. The authors have gone to great lengths to ensure that data generation is efficient and as close to O(1) in processing time as possible.

For example...
One technique that is used to achieve this is to initialize and cache data in high resolution look-up tables for distributions which may perform differently depending on their density functions. The existing Apache Commons Math libraries have been adapted into a set of interpolated Inverse Cumulative Distribution sampling functions. This means that you can use a Zipfian distribution in the same place as you would a Uniform distribution, and once initialized, they sample with identical overhead. This means that by changing your test definition, you don't accidentally change the behavior of your test client.
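
A sketch of what this looks like in a workload's bindings; the distribution function names here are assumptions following the convention of wrapping commons-math distributions, so check the binding function docs for the exact spellings:

```
bindings:
  # swapping Uniform(...) for Zipf(...) changes the data shape, not the client cost
  user_id_uniform: Uniform(0,1000000); ToString()
  user_id_zipfian: Zipf(1000000,2); ToString()
```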
## The Right Tool ## The Right Tool
Many other testing systems avoid building a dataset generation component. It's a tough problem to solve, so it's often
just avoided. Instead, they use libraries like "faker" and variations on that. However, faker is well named, no pun
intended. It was meant as a vignette library, not a source of test data for realistic results. If you are using a
testing tool for scale testing and relying on a faker variant, then you will almost certainly get invalid results for
any serious test.
The virtual dataset component of NoSQLBench is a library that was designed for high scale and realistic data streams.
## Deterministic
The data that is generated by the virtual dataset libraries is deterministic. This means that for a given cycle in a
test, the operation that is synthesized for that cycle will be the same from one session to the next. This is
intentional. If you want to perturb the test data from one session to the next, then you can most easily do it by simply
selecting a different set of cycles as your basis.
This means that if you find something interesting in a test run, you can go back to it just by specifying the cycles in
question. It also means that you aren't losing comparative value between tests with additional randomness thrown in. The
data you generate will still look random to the human eye, but that doesn't mean that it can't be reproducible.
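
The property can be sketched in a few lines of Java, assuming nothing about NoSQLBench's actual hash functions: treat
the cycle number as the seed for a pure function, and determinism falls out.

```java
import java.util.SplittableRandom;

/** Sketch of cycle-determinism: each value is a pure function of the
 *  cycle number, so the same cycle always yields the same data. */
public class CycleValues {
    static long valueFor(long cycle) {
        // Seeded per cycle; illustrative, not the actual NoSQLBench hash.
        return new SplittableRandom(cycle).nextLong();
    }

    public static void main(String[] args) {
        for (long cycle = 0; cycle < 5; cycle++) {
            // Prints the same five values in every session.
            System.out.println(cycle + " -> " + valueFor(cycle));
        }
    }
}
```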
## Statistically Shaped
All this means is that the values you use to tie your dataset together can be specific to any distribution that is
appropriate. You can ask for a stream of floating point values 1 trillion values long, in any order. You can use
discrete or continuous distributions, with whatever parameters you need.
## Best of Both Worlds
Some might worry that fully synthetic testing data is not realistic enough. The devil is in the details on these
arguments, but suffice it to say that you can pick the level of real data you use as seed data with NoSQLBench.
For example, using the alias sampling method and a published US census (public domain) list of names and surnames that
occurred more than 100x, we can provide extremely accurate samples of names according to the discrete distribution we
know of. The alias method allows us to sample accurately in O(1) time from the entire dataset by turning a large number
of weights into two uniform samples. You will simply not find a better way to sample US names than this. (But if you do,
please file an issue!)
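
For the curious, the alias method itself is compact enough to sketch. This is a generic Walker/Vose implementation
(illustrative, with a fixed seed; it is not the NoSQLBench source), where the sampled index would map into the weighted
name list. Setup is O(n) in the number of weights; each sample afterwards is two uniform draws and one comparison.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

/** Sketch of Walker/Vose alias sampling: O(n) setup, O(1) per sample. */
public class AliasSampler {
    private final double[] prob;
    private final int[] alias;
    private final Random rng = new Random(42); // fixed seed for determinism

    public AliasSampler(double[] weights) {
        int n = weights.length;
        prob = new double[n];
        alias = new int[n];
        double total = 0;
        for (double w : weights) total += w;

        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>(), large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push(i);
        }
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop(), l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] = (scaled[l] + scaled[s]) - 1.0;
            (scaled[l] < 1.0 ? small : large).push(l);
        }
        while (!large.isEmpty()) prob[large.pop()] = 1.0;
        while (!small.isEmpty()) prob[small.pop()] = 1.0;
    }

    /** Two uniform samples: one picks a column, one picks value vs alias. */
    public int sample() {
        int col = rng.nextInt(prob.length);
        return rng.nextDouble() < prob[col] ? col : alias[col];
    }
}
```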
## Java Idiomatic Extension
The way that the virtual dataset component works allows Java developers to write any extension to the data generation
functions simply in the form of Java 8 or newer functional interfaces. As long as they include the annotation processor
and annotate their classes, they will show up in the runtime and be available to any workload by their class name.
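
A hypothetical extension has roughly this shape: implement a `java.util.function` interface over the cycle value. The
class name and mapping below are invented for illustration, and the required annotation is shown only as a comment,
since its exact name and package belong in the NoSQLBench developer docs rather than this sketch.

```java
import java.util.function.LongFunction;

// Hypothetical user-provided binding function. The annotation processor
// and annotation names come from the NoSQLBench developer docs.
// @<annotation-from-developer-docs>
public class HexToken implements LongFunction<String> {
    @Override
    public String apply(long cycle) {
        // Derive a stable token from the cycle value.
        return Long.toHexString(cycle * 0x9E3779B97F4A7C15L);
    }
}
```

Once annotated and on the classpath, such a function would be referenced from a workload binding by its class name.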
## Binding Recipes
It is possible to stitch data generation functions together directly in a workload YAML. These are data-flow sketches of
functions that can be copied and pasted between workload descriptions to share or remix data streams. This allows for
the adventurous to build sophisticated virtual datasets that emulate nuances of real datasets, but in a form that takes
up less space on the screen than this paragraph!
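
Conceptually, a recipe is just function composition over the cycle value. A Java analogue (with illustrative stand-ins,
not actual NoSQLBench binding functions) shows the shape of that data flow:

```java
import java.util.function.LongFunction;
import java.util.function.LongUnaryOperator;

/** Conceptual analogue of a binding recipe: a cycle flows through a
 *  left-to-right chain of functions. Names here are illustrative. */
public class BindingChain {
    public static void main(String[] args) {
        LongUnaryOperator mod = cycle -> cycle % 1000;    // like a modulo step
        LongFunction<String> format = n -> "user_" + n;   // like a formatting step
        for (long cycle = 998; cycle < 1002; cycle++) {
            System.out.println(cycle + " -> " + format.apply(mod.applyAsLong(cycle)));
        }
    }
}
```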
View File
@ -9,9 +9,8 @@ Let's run a simple test against a cluster to establish some basic familiarity wi
## Create a Schema
We will start by creating a simple schema in the database. From your command line, go ahead and execute the following
command, replacing the `host=<dse-host-or-ip>` with that of one of your database nodes.
```
./nb run driver=cql workload=cql-keyvalue tags=phase:schema host=<dse-host-or-ip>
@ -20,7 +19,9 @@ replacing the `host=<dse-host-or-ip>` with that of one of your database nodes.
This command is creating the following schema in your database:
```cql
CREATE KEYSPACE baselines
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
  AND durable_writes = true;
CREATE TABLE baselines.keyvalue (
    key text PRIMARY KEY,
@ -32,7 +33,8 @@ Let's break down each of those command line options.
`start` tells nosqlbench to start an activity.
`driver=...` is used to specify the activity type (driver). In this case we are using `cql`, which tells nosqlbench to
use the DataStax Java Driver and execute CQL statements against a database.
`workload=...` is used to specify the workload definition file that defines the activity.
@ -40,17 +42,19 @@ In this example, we use `cql-keyvalue` which is a pre-built workload that is pac
`tags=phase:schema` tells nosqlbench to run the yaml block that has the `phase:schema` defined as one of its tags.
In this example, that is the DDL portion of the `cql-keyvalue` workload. `host=...` tells nosqlbench how to connect to
your database; only one host is necessary.

If you like, you can verify the result of this command by describing your keyspace in cqlsh or DataStax Studio with
`DESCRIBE KEYSPACE baselines`.
## Load Some Data
Before running a test of typical access patterns where you want to capture the results, you need to make the test more
interesting than loading an empty table. For this, we use the rampup phase.
Before sending our test writes to the database, we will use the `stdout` activity type so we can see what nosqlbench is
generating for CQL statements.
Go ahead and execute the following command:
@ -71,7 +75,7 @@ insert into baselines.keyvalue (key, value) values (8,296173906);
insert into baselines.keyvalue (key, value) values (9,97405552);
```
NoSQLBench deterministically generates data, so the generated values will be the same from run to run.
Now we are ready to write some data to our database. Go ahead and execute the following from your command line:
@ -81,11 +85,21 @@ Note the differences between this and the command that we used to generate the s
`tags=phase:rampup` is running the yaml block in `cql-keyvalue` that has only INSERT statements.
`cycles=100k` will run a total of 100,000 operations, in this case, 100,000 writes. You will want to pick an
appropriately large number of cycles in actual testing to make your main test meaningful.
:::info
The cycles parameter is not just a quantity. It is a range of values. The `cycles=n` format is short for `cycles=0..n`,
which makes cycles a zero-based quantity by default. For example, cycles=5 means that the activity will use cycles
0,1,2,3,4, but not 5. The reason for this is explained in detail in the Activity Parameters section.
:::
These parameters are explained in detail in the section on _Activity Parameters_.
`--progress console:1s` will print the progression of the run to the console every 1 second.
You should see output that looks like this:
```
cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
@ -103,11 +117,13 @@ cql-keyvalue: 100.00%/Finished (details: min=0 cycle=100000 max=100000)
## Run the main test phase
Now that we have a base dataset of 100k rows in the database, we will run a mixed read / write workload. By default,
this runs a 50% read / 50% write workload.
    ./nb start driver=cql workload=cql-keyvalue tags=phase:main host=<dse-host-or-ip> cycles=100k cyclerate=5000 threads=50 --progress console:1s
You should see output that looks like this:
```
Logging to logs/scenario_20190812_154431_028.log
cql-keyvalue: 0.50%/Running (details: min=0 cycle=500 max=100000)
@ -141,12 +157,15 @@ We have a few new command line options here:
`tags=phase:main` is using a new block in our activity's yaml that contains both read and write queries.
`threads=50` is an important one. The default for nosqlbench is to run with a single thread. This is not adequate for
workloads that will be running many operations, so threads is used as a way to increase concurrency on the client side.
`cyclerate=5000` is used to control the operations per second that are initiated by nosqlbench. This command line option
is the primary means to rate limit the workload, and here we are running at 5000 ops/sec.
## Now What?
Note in the above output, we see `Logging to logs/scenario_20190812_154431_028.log`.
By default, nosqlbench records the metrics from the run in this file. We will go into detail about these metrics in the
next section, Viewing Results.
View File
@ -5,26 +5,26 @@ weight: 3
# Example Results
We just ran a very simple workload against our database. In that example, we saw that nosqlbench writes to a log file
and it is in that log file where the most basic form of metrics are displayed.
## Log File Metrics
For our previous run, we saw that nosqlbench was writing to `logs/scenario_20190812_154431_028.log`.
Even when you don't configure nosqlbench to write its metrics to another location, it will periodically report all the
metrics to the log file. At the end of a scenario, before nosqlbench shuts down, it will flush the partial reporting
interval again to the logs. This means you can always look in the logs for metrics information.
:::warning
If you look in the logs for metrics, be aware that the last report will only contain a partial interval of results. When
looking at the last partial window, only metrics which average over time or which compute the mean for the whole test
will be meaningful.
:::
Below is a sample of the log that gives us our basic metrics. There is a lot to digest here; for now we will only focus
on a subset of the most important metrics.
```
2019-08-12 15:46:00,274 INFO [main] i.e.c.ScenarioResult [ScenarioResult.java:48] -- BEGIN METRICS DETAIL --
@ -36,7 +36,8 @@ Below is a sample of the log that gives us our basic metrics. There is a lot to
```
The log contains lots of information on metrics, but this is obviously _not_ the most desirable way to consume metrics
from nosqlbench.
We recommend that you use one of these methods, according to your environment or tooling available:
@ -45,4 +46,5 @@ We recommend that you use one of these methods, according to your environment or
3. Record your metrics to local CSV files with `--report-csv-to my_metrics_dir`
4. Record your metrics to HDR logs with `--log-histograms my_hdr_metrics.log`
See the command line reference for details on how to route your metrics to a metrics collector or format of your
preference.
View File
@ -5,56 +5,64 @@ weight: 4
# Example Metrics
A set of core metrics are provided for every workload that runs with nosqlbench, regardless of the activity type and
protocol used. This section explains each of these metrics and shows an example of them from the log file.
## metric: result
This is the primary metric that should be used to get a quick idea of the throughput and latency for a given run. It
encapsulates the entire operation life cycle (i.e. bind, execute, get result back).
For this example we see that we averaged 3732 operations / second with 3.6ms 75th percentile latency and 23.9ms 99th
percentile latency. Note the raw metrics are in microseconds. This duration_unit may change depending on how a user
configures nosqlbench, so always double-check it.
```
2019-08-12 15:46:01,310 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER,
name=cql-keyvalue.result, count=100000, min=233.48, max=358596.607, mean=3732.00338612, stddev=10254.850416061185,
median=1874.815, p75=3648.767, p95=10115.071, p98=15855.615, p99=23916.543, p999=111292.415,
mean_rate=4024.0234405430424, m1=3514.053841156124, m5=3307.431472596865, m15=3268.6786509004132,
rate_unit=events/second, duration_unit=microseconds
```
## metric: result-success
This metric shows whether there were any errors during the run. You can confirm that the count is equal to the number of
cycles for the run if you are expecting or requiring zero failed operations.
Here we see that all 100k of our cycles succeeded. Note that the metrics for throughput and latency here are slightly
different than the `result` metric simply because this is a separate timer that only includes operations which
completed with no exceptions.
```
2019-08-12 15:46:01,452 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER,
name=cql-keyvalue.result-success, count=100000, min=435.168, max=358645.759, mean=3752.40990808,
stddev=10251.524945886964, median=1889.791, p75=3668.479, p95=10154.495, p98=15884.287, p99=24280.063,
p999=111443.967, mean_rate=4003.3090048756894, m1=3523.40328629036, m5=3318.8463896065778, m15=3280.480326762243,
rate_unit=events/second, duration_unit=microseconds
```
## metric: resultset-size
For read workloads, this metric shows the size of the result sent back to nosqlbench from the server. This is useful to
confirm that you are reading rows that already exist in the database.
```
2019-08-12 15:46:00,298 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM,
name=cql-keyvalue.resultset-size, count=100000, min=0, max=1, mean=8.0E-5, stddev=0.008943914131967056,
median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
```
## metric: tries
NoSQLBench will retry failures 10 times by default; this is configurable via the `maxtries` command line option for the
cql activity type. This metric shows a histogram of the number of tries that each operation required. In this example,
there were no retries, as the `count` is 100k.
```
2019-08-12 15:46:00,341 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM,
name=cql-keyvalue.tries, count=100000, min=1, max=1, mean=1.0, stddev=0.0, median=1.0,
p75=1.0, p95=1.0, p98=1.0, p99=1.0, p999=1.0
```
### More Metrics
@ -66,7 +74,6 @@ nosqlbench extends many ways to report the metrics from a run, including:
- Reporting to Graphite
- Reporting to HDR
To get more information on these options, see the output of
    ./nb --help
@ -75,6 +82,6 @@ To get more information on these options, see the output of
You have completed your first run with nosqlbench!
In the 'Next Steps' section, you'll find options for how to continue, whether you are looking for basic testing or
something more advanced.
View File
@ -5,20 +5,19 @@ weight: 5
# Next Steps
Now that you've run nosqlbench for the first time and seen what it does, you can choose what level of customization you
want for further testing.
The sections below describe key areas that users typically customize when working with nosqlbench.
Everyone who uses nosqlbench will want to get familiar with the 'NoSQLBench Basics' section below. This is essential
reading for new and experienced testers alike.
## High-Level Users
Several canonical workloads are already baked-in to nosqlbench for immediate use. If you simply want to drive workloads
from nosqlbench without building a custom workload, then you'll want to learn about the available workloads and their
options.
Recommended reading for high-level testing workflow:
1. 'Built-In Workloads'
@ -26,10 +25,9 @@ Recommended reading for high-level testing workflow:
## Workload Builders
If you want to use nosqlbench to build a tailored workload that closely emulates what a specific application would do,
then you can build a YAML file that specifies all of the details of an iterative workload. You can specify the access
patterns, data distributions, and more.
The recommended reading for this is:
@ -39,9 +37,7 @@ The recommended reading for this is:
## Scenario Developers
The underlying runtime for a scenario in nosqlbench is based on EngineBlock, which means it has all the scripting power
that comes with that. For advanced scenario designs, iterative testing models, or analysis methods, you can use
ECMAScript to control the scenario from start to finish. This is an advanced feature that is not recommended for
first-time users. A guide for scenario developers will be released in increments.
View File
@ -7,16 +7,13 @@ weight: 20
## Downloading
NoSQLBench is packaged directly as a Linux binary named `nb` and as an executable Java jar named `nb.jar`.
The Linux binary is recommended, since it comes with its own JVM and eliminates the need to manage Java downloads. Both
can be obtained at the releases section of the main NoSQLBench project:
- [NoSQLBench Releases](https://github.com/nosqlbench/nosqlbench/releases)
:::info
Once you download the binary, you may need to `chmod +x nb` to make it executable.
@ -27,8 +24,8 @@ If you choose to use the nb.jar instead of the binary, it is recommended
to run it with at least Java 12.
:::
This documentation assumes you are using the Linux binary, initiating NoSQLBench commands with `./nb`. If you are using
the jar, just replace `./nb` with `java -jar nb.jar` when running commands.
## Running
@ -51,15 +48,13 @@ To provide your own contact points (comma separated), add the `hosts=` parameter
    ./nb cql-iot hosts=host1,host2
Additionally, if you have docker installed on your local system, and your user has permissions to use it, you can use
`--docker-metrics` to stand up a live metrics dashboard at port 3000.
    ./nb cql-iot --docker-metrics
This example doesn't go into much detail about what it is doing. It is here to show you how quickly you can start
running real workloads without having to learn much about the machinery that makes it happen.

The rest of this section has a more elaborate example that exposes some of the basic options you may want to adjust for
your first serious test.
View File
@ -10,34 +10,28 @@ This is the same documentation you get in markdown format with the
---------------------------------------
Help ( You're looking at it. )
    --help
Short options, like '-v' represent simple options, like verbosity. Using multiples increases the level of the option,
like '-vvv'.
Long options, like '--help' are top-level options that may only be used once. These modify general behavior, or allow
you to get more details on how to use nosqlbench.
All other options are either commands, or named arguments to commands. Any single word without dashes is a command that
will be converted into script form. Any option that includes an equals sign is a named argument to the previous command.
The following example is a commandline with a command *start*, and two named arguments to that command.
    ./nb start driver=diag alias=example
### Discovery options ###
These options help you learn more about running nosqlbench, and about the plugins that are present in your particular
version.
Get a list of additional help topics that have more detailed documentation:
    ./nb help topics
@ -55,11 +49,9 @@ Provide the metrics that are available for scripting
### Execution Options ###
This is how you actually tell nosqlbench what scenario to run. Each of these commands appends script logic to the
scenario that will be executed. These are considered as commands and can occur in any order and quantity. The only rule
is that arguments in the arg=value form will apply to the preceding script or activity.
Add the named script file to the scenario, interpolating named parameters:
@ -136,17 +128,16 @@ or
    --progress logonly:5m
If you want to add in classic time decaying histogram metrics for your histograms and timers, you may do so with this
option:
    --classic-histograms prefix
    --classic-histograms 'prefix:.*'                # same as above
    --classic-histograms 'prefix:.*specialmetrics'  # subset of names
Name the current session, for logfile naming, etc. By default, this will be "scenario-TIMESTAMP", and a logfile will be
created for this name.
    --session-name <name>
@ -154,10 +145,13 @@ Enlist engineblock to stand up your metrics infrastructure using a local docker
    --docker-metrics
When this option is set, engineblock will start graphite, prometheus, and grafana automatically on your local docker,
configure them to work together, and point engineblock to send metrics to the system automatically. It also imports a
base dashboard for engineblock and configures grafana snapshot export to share with a central DataStax grafana instance
(grafana can be found on localhost:3000 with the default credentials admin/admin).
### Console Options ###
Increase console logging levels: (Default console logging level is *warning*)

    -v (info)
@ -166,8 +160,8 @@ Increase console logging levels: (Default console logging level is *warning*)
    --progress console:1m (disables itself if -v options are used)
These levels affect *only* the console output level. Other logging level parameters affect logging to the scenario log,
stored by default in logs/...
Show version, long form, with artifact coordinates.
View File
@ -5,26 +5,20 @@ weight: 2
# Grafana Metrics
NoSQLBench comes with a built-in helper to get you up and running quickly with client-side testing metrics. This
functionality is based on docker, and a built-in method for bringing up a docker stack, automated by NoSQLBench.
:::warning
This feature requires that you have docker running on the local system and that your user is in a group that is allowed
to manage docker. Using the `--docker-metrics` command *will* attempt to manage docker on your local system.
:::
To ask nosqlbench to stand up your metrics infrastructure using a local docker runtime, use this command line option
with any other nosqlbench commands:
    --docker-metrics
When this option is set, nosqlbench will start graphite, prometheus, and grafana automatically on your local docker,
configure them to work together, and send metrics to the system automatically. It also imports a base dashboard for
nosqlbench and configures grafana snapshot export to share with a central DataStax grafana instance (grafana can be
found on localhost:3000 with the default credentials admin/admin).
View File
@ -5,36 +5,51 @@ weight: 03
# Parameter Types
To configure a nosqlbench activity to do something meaningful, you have to provide parameters to it. This can occur in
one of several ways. This section is a guide on nosqlbench parameters, how they layer together, and when to use one form
over another.
The command line is used to configure both the overall nosqlbench runtime (logging, etc) as well as the individual
activities and scripts. Global nosqlbench options can be distinguished from scenario commands and their parameters
because global options always start with a single or double hyphen.
## Activity Parameters
Parameters for an activity always have the form of `<name>=<value>` on the command line. Activity parameters *must*
follow a command, such as `run` or `start`, for example. Scenario commands are always single words without any leading
hyphens. Every command-line argument that follows a scenario command in the form of `<name>=<value>` is a parameter to
that command.
Activity parameters can be provided by the nosqlbench core runtime or they can be provided by the activity type. All of
the params are usable to configure an activity together. It's not important where they are provided from so long as you
know what they do for your workloads, how to configure them, and where to find the docs.
*Core* Activity Parameters are those provided by the core runtime. They are part of the core API and used by every
activity type. Core activity params include *type*, *alias*, and *threads*, for example. These parameters are explained
individually under the next section.
*Custom* Activity Parameters are those provided by an activity type. These parameters are documented for each activity
type. You can see them by running `nosqlbench help <activity type>`.
Activity type parameters may be dynamic. *Dynamic* Activity Parameters are parameters which may be changed while an
activity is running. This means that scenario scripting logic may change some variables while an activity is running,
and that the runtime should dynamically adjust to match. Dynamic parameters are mainly used in more advanced scripting
scenarios.
Parameters that are dynamic should be documented as such in the respective activity type's help page. Parameters that are dynamic should be documented as such in the respective activity type's help page.
### Template Parameters
If you need to provide general-purpose overrides to a named section of the standard YAML, then you may use a mechanism
called _template parameters_. These are just like activity parameters, but they are set via macro and can have defaults.
This is a YAML format feature that allows you to easily template workload properties in a way that is easy to override
on the command line or via scripting. More details on template parameters are shared under 'Designing Workloads|Template
Params'.
### Parameter Loading
Now that we've described all the parameter types, let's tie them together. When an activity is loaded from the command
line or script, the parameters are resolved in the following order:
1. The `type` parameter tells nosqlbench which activity type implementation to load.
2. The activity type implementation creates an activity.
@ -46,9 +61,13 @@ Now that we've described all the parameter types, let's tie them together. When
## Statement Parameters
Some activities make use of parameters for statements. These are called _statement parameters_ and are completely
different than _activity parameters_. Statement parameters in a YAML allow you to affect *how* a statement is used in a
workload. Just as with activity level parameters, statement parameters may be supported by the core runtime or by an
activity type. These are also documented in the respective activity type's documentation included in the 'Activity
Types' section.
The core statement parameters are explained just below the core activity parameters in this section.
View File
@ -1,9 +1,9 @@
--- ---
title: Activity Parameters
weight: 05
--- ---
# Activity Parameters
Activity parameters are passed as named arguments for an activity, Activity parameters are passed as named arguments for an activity,
either on the command line or via a scenario script. On the command either on the command line or via a scenario script. On the command
@ -12,14 +12,16 @@ line, these take the form of
    <paramname>=<paramvalue>
Some activity parameters are universal in that they can be used with any driver type. These parameters are recognized by
nosqlbench whether or not they are recognized by a particular driver implementation. These are called _core parameters_.
Only core activity parameters are documented here.
:::info
To see what activity parameters are valid for a given activity type, see the documentation for that activity type with
`nosqlbench help <activity type>`.
:::
## driver
View File
@ -9,9 +9,12 @@ Some statement parameters are recognized by the nosqlbench runtime and can be us
## *ratio*
A statement parameter called _ratio_ is supported by every workload. It can be attached to a statement, a block, or a
document-level parameter block. It sets the relative ratio of a statement in the op sequence before an activity is
started.
When an activity is initialized, all of the active statements are combined into a sequence based on their relative
ratios. By default, all statement templates are initialized with a ratio of 1 if none is specified by the user.
For example, consider the statements below:
@ -25,10 +28,15 @@ statements:
    ratio: 3
```
If all statements are activated (there is no tag filtering), then the activity will be initialized with a sequence
length of 6. In this case, the relative ratio of statement "s3" will be 50% overall. If you filtered out the first
statement, then the sequence would be 5 operations long. In this case, the relative ratio of statement "s3" would be 60%
overall. It is important to remember that statement ratios are always relative to the total sum of the active
statements' ratios.
:::info
Because the ratio works so closely with the activity parameter `seq`, the description for that parameter is included
below.
:::
### *seq* (activity level - do not use on statements)
@ -38,30 +46,52 @@ Because the ratio works so closely with the activity parameter `seq`, the descri
- _required_: no
- _dynamic_: no
The `seq=<bucket|concat|interval>` parameter determines the type of sequencing that will be used to plan the op
sequence. The op sequence is a look-up table that is used for each stride to pick statement forms according to the cycle
offset. It is simply the sequence of statements from your YAML that will be executed, but in a pre-planned, and highly
efficient form.
An op sequence is planned for every activity. With the default ratio on every statement as 1, and the default bucket
scheme, the basic result is that each active statement will occur once in the order specified. Once you start adding
ratios to statements, the most obvious thing that you might expect will happen: those statements will occur multiple
times to meet their ratio in the op mix. You can customize the op mix further by changing the seq parameter to concat or
interval.
:::info
The op sequence is a look-up table of statement templates, *not* individual statements or operations. Thus, the cycle
still determines the uniqueness of an operation as you would expect. For example, if statement form ABC occurs 3x per
sequence because you set its ratio to 3, then each of these would manifest as a distinct operation with fields
determined by distinct cycle values.
:::
There are three schemes to pick from:
### bucket
This is a round robin planner which draws operations from buckets in circular fashion, removing each bucket as it is
exhausted. For example, the ratios A:4, B:2, C:1 would yield the sequence A B C A B A A. The ratios A:1, B:5 would yield
the sequence A B B B B B.
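
The bucket rule is simple enough to sketch directly. This illustrative Java snippet (not the actual NoSQLBench planner)
reproduces the A:4, B:2, C:1 example above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the bucket sequencer: draw one op from each non-empty
 *  ratio bucket in round-robin order until all buckets are exhausted. */
public class BucketSequencer {
    public static <T> List<T> plan(Map<T, Integer> ratios) {
        Map<T, Integer> remaining = new LinkedHashMap<>(ratios);
        List<T> sequence = new ArrayList<>();
        while (remaining.values().stream().anyMatch(r -> r > 0)) {
            for (Map.Entry<T, Integer> e : remaining.entrySet()) {
                if (e.getValue() > 0) {
                    sequence.add(e.getKey());
                    e.setValue(e.getValue() - 1);
                }
            }
        }
        return sequence;
    }

    public static void main(String[] args) {
        Map<String, Integer> ratios = new LinkedHashMap<>();
        ratios.put("A", 4);
        ratios.put("B", 2);
        ratios.put("C", 1);
        System.out.println(plan(ratios)); // [A, B, C, A, B, A, A]
    }
}
```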
### concat
This simply takes each statement template as it occurs in order and duplicates it in place to achieve the ratio. The
ratios above (A:4, B:2, C:1) would yield the sequence A A A A B B C for the concat sequencer.
### interval
This is arguably the most complex sequencer. It takes each ratio as a frequency over a unit interval of time, and
apportions the associated operation to occur evenly over that time. When two operations would be assigned the same time,
then the order of appearance establishes precedence. In other words, statements appearing first win ties for the same
time slot. The ratios A:4 B:2 C:1 would yield the sequence A B C A A B A. This occurs because, over the unit interval
(0.0,1.0), A is assigned the positions `A: 0.0, 0.25, 0.5, 0.75`, B is assigned the positions `B: 0.0, 0.5`, and C is
assigned position `C: 0.0`. These offsets are all sorted with a position-stable sort, and then the associated ops are
taken as the order.
In detail, the rendering appears as `0.0(A), 0.0(B), 0.0(C), 0.25(A), 0.5(A), 0.5(B), 0.75(A)`, which yields
`A B C A A B A` as the op sequence.
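
The interval rule can be sketched the same way (again illustrative, not the actual planner): generate the evenly spaced
unit-interval positions per op, then stable-sort by position so that earlier statements win ties.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the interval sequencer: evenly spaced positions on the
 *  unit interval per op; a stable sort breaks ties by appearance. */
public class IntervalSequencer {
    public static <T> List<T> plan(Map<T, Integer> ratios) {
        List<SimpleEntry<Double, T>> slots = new ArrayList<>();
        for (Map.Entry<T, Integer> e : ratios.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                slots.add(new SimpleEntry<>((double) i / e.getValue(), e.getKey()));
            }
        }
        slots.sort(Comparator.comparingDouble(s -> s.getKey())); // List.sort is stable
        List<T> seq = new ArrayList<>();
        for (SimpleEntry<Double, T> s : slots) seq.add(s.getValue());
        return seq;
    }

    public static void main(String[] args) {
        Map<String, Integer> ratios = new LinkedHashMap<>();
        ratios.put("A", 4);
        ratios.put("B", 2);
        ratios.put("C", 1);
        System.out.println(plan(ratios)); // [A, B, C, A, A, B, A]
    }
}
```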
This sequencer is most useful when you want a stable ordering of operations from a rich mix of statement types, where
each operation is spaced as evenly as possible over time, and where it is not important to control the cycle-by-cycle
sequencing of statements.
View File
@@ -5,6 +5,5 @@ weight: 30

# NoSQLBench Basics

This section covers the essential details that you'll need to run nosqlbench in different ways.
View File
@@ -5,17 +5,15 @@ weight: 2

## Description

The CQL IoT workload demonstrates a time-series telemetry system as typically found in IoT applications. The bulk of the traffic is telemetry ingest. This is useful for establishing steady-state capacity with an actively managed data lifecycle. This is a steady-state workload, where inserts are 90% of the operations and queries are the remaining 10%.
## Schema

    CREATE KEYSPACE baselines WITH replication =
        { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };

    CREATE TABLE baselines.iot (
        station_id UUID,
        machine_id UUID,
@@ -33,9 +31,8 @@ operations and queries are the remaining 10%.

2. rampup - Ramp-Up to steady state for normative density, writes only 100M rows
3. main - Run at steady state with 10% reads and 90% writes, 100M rows

For in-depth testing, this workload will take some time to build up data density where TTLs begin purging expired data. At this point, the test should be considered steady-state.
## Data Set

@@ -60,7 +57,7 @@ considered steady-state.

    select * from baselines.iot
    where machine_id=? and sensor_name=?
    limit 10
## Workload Parameters

This workload has no adjustable parameters when used in the baseline tests.

@@ -74,17 +71,14 @@ When used for additional testing, the following parameters should be supported:
- compression - enabled or disabled, to disable, set compression=''
- write_cl - the consistency level for writes (default: LOCAL_QUORUM)
- read_cl - the consistency level for reads (default: LOCAL_QUORUM)
## Key Performance Metrics

Client-side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form, with discrete histogram values per interval.
### Client-Side
View File
@@ -5,22 +5,19 @@ weight: 1

## Description

The CQL Key-Value workload demonstrates the simplest possible schema with payload data. This is useful for measuring system capacity most directly in terms of raw operations. As a reference point, it provides some insight around types of workloads that are constrained around messaging, threading, and tasking, rather than bulk throughput.
During preload, all keys are set with a value. During the main phase of the workload, random keys from the known population are selected for upsert, with new row values that never repeat.
## Schema

    CREATE KEYSPACE IF NOT EXISTS baselines WITH replication =
        { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };

    CREATE TABLE baselines.keyvalue (
        user_id UUID,
        user_code text
@@ -31,7 +28,7 @@ upsert, with row values never repeating.

1. schema - Initialize the schema.
2. rampup - Load data according to the data set size.
3. main - Run the workload.

## Operations
@@ -41,19 +38,19 @@ upsert, with row values never repeating.

### read (main)

    select * from baselines.keyvalue where key=?key;
## Data Set

### baselines.keyvalue insert (rampup)

- key - text, number as string, selected sequentially up to keycount
- value - text, number as string, selected sequentially up to valuecount

### baselines.keyvalue insert (main)

- key - text, number as string, selected uniformly within keycount
- value - text, number as string, selected uniformly within valuecount

### baselines.keyvalue read (main)
@@ -70,13 +67,11 @@ When used for additional testing, the following parameters should be supported:

## Key Performance Metrics

Client-side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form, with discrete histogram values per interval.
### Client-Side

@@ -95,6 +90,5 @@ form, with discrete histogram values per interval.

# Notes on Interpretation

Once the average ratio of overwrites starts to balance with the rate of compaction, a steady state should be achieved. At this point, pending compactions and bytes compacted should be mostly flat over time.
View File
@@ -5,14 +5,15 @@ weight: 3

## Description

The CQL Wide Rows workload provides a way to tax a system with wide rows of a given size. This is useful to help understand underlying performance differences between versions and configuration options when using data models that have wide rows.
## Schema

    CREATE KEYSPACE if not exists baselines WITH replication =
        { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };

    CREATE TABLE if not exists baselines.widerows (
        part text,
        clust text,
@@ -26,17 +27,16 @@ when using data models that have wide rows.

2. rampup - Fully populate the widerows with data, 100000 elements per row
3. main - Run at steady state with 50% reads and 50% writes, 100M rows

For in-depth testing, this workload needs significant density of partitions in combination with fully populated wide rows. For exploratory or parameter contrasting tests, ensure that the rampup phase is configured correctly to establish this initial state.
## Data Set

### baselines.widerows dataset (rampup)

- part - text, number in string form, sequentially from 1..1E9
- clust - text, number in string form, sequentially from 1..1E9
- data - text, extract from lorem ipsum between 50 and 150 characters

### baselines.widerows dataset (main)
@@ -64,7 +64,7 @@ establish this initial state.

    select * from baselines.iot
    where machine_id=? and sensor_name=?
    limit 10
## Workload Parameters

This workload has no adjustable parameters when used in the baseline tests.

@@ -73,16 +73,14 @@ When used for additional testing, the following parameters should be supported:

- partcount - the number of unique partitions
- partsize - the number of logical rows within a CQL partition
## Key Performance Metrics

Client-side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form, with discrete histogram values per interval.
### Client-Side
View File
@@ -5,24 +5,20 @@ weight: 40

# Built-In Workloads

There are a few built-in workloads which you may want to run. These workloads can be run from a command without having to configure anything, or they can be tailored with their built-in parameters.

There is now a way to list the built-in workloads:

`nb --list-workloads` will give you a list of all the pre-defined workloads which have named scenarios built in.
## Common Built-Ins

This section of the guidebook will explain a couple of the common scenarios in detail.

## Built-In Workload Conventions

The built-in workloads follow a set of conventions so that they can be used interchangeably:

### Phases

@@ -34,7 +30,7 @@ Each built-in contains the following tags that can be used to break the workload

### Parameters

Each built-in has a set of adjustable parameters which is documented below per workload. For example, the cql-iot workload has a `sources` parameter which determines the number of unique devices in the dataset.
View File
@@ -3,68 +3,61 @@ title: 00 YAML Organization

weight: 00
---

It is best to keep every workload self-contained within a single YAML file, including schema, data rampup, and the main phase of testing. The phases of testing are controlled by tags as described in the Standard YAML section.
:::info
The phase names described below have been adopted as a convention within the built-in workloads. It is strongly advised that new workload YAMLs use the same tagging scheme so that workloads are more pluggable across YAMLs.
:::
### Schema phase

The schema phase is simply a phase of your test which creates the necessary schema on your target system. For CQL, this generally consists of a keyspace and one or more table statements. There is no special schema layer in nosqlbench. All statements executed are simply statements. This provides the greatest flexibility in testing since every activity type is allowed to control its DDL and DML using the same machinery.

The schema phase is normally executed with defaults for most parameters. This means that statements will execute in the order specified in the YAML, in serialized form, exactly once. This is a welcome side-effect of how the initial parameters like _cycles_ are set from the statements which are activated by tagging.
You can mark statements as schema phase statements by adding this set of tags to the statements, either directly, or by block:

    tags:
      phase: schema
### Rampup phase

When you run a performance test, it is very important to be aware of how much data is present. Higher density tests are more realistic for systems which accumulate data over time, or which have a large working set of data. The amount of data on the system you are testing should recreate a realistic amount of data that you would run in production, ideally. In general, there is a triangular trade-off between service time, op rate, and data density.

It is the purpose of the _rampup_ phase to create the backdrop data on a target system that makes a test meaningful for some level of data density. Data density is normally discussed as average per node, but it is also important to consider distribution of data as it varies from the least dense to the most dense nodes.
Because it is useful to be able to add data to a target cluster in an incremental way, the bindings which are used with a _rampup_ phase may actually be different from the ones used for a _main_ phase. In most cases, you want the rampup phase to create data in a way that incrementally adds to the population of data in the cluster. This allows you to add some data to a cluster with `cycles=0..1M` and then decide whether to continue adding data using the next contiguous range of cycles, with `cycles=1M..2M` and so on.
You can mark statements as rampup phase statements by adding this set of tags to the statements, either directly, or by block:

    tags:
      phase: rampup
### Main phase

The main phase of a nosqlbench scenario is the one during which you really care about the metric. This is the actual test that everything else has prepared your system for.
You can mark statements as main phase statements by adding this set of tags to the statements, either directly, or by block:

    tags:
      phase: main
View File
@@ -5,11 +5,11 @@ weight: 01

## Statement Templates

A valid config file for an activity consists of statement templates, parameters for them, bindings to generate the data to use with them, and tags for organizing them.

In essence, the config format is *all about configuring statements*. Every other element in the config format is in some way modifying or otherwise helping create statements to be used in an activity.

Statement templates are the single most important part of a YAML config.
@@ -19,12 +19,16 @@ statements:

- a single statement body
```

This is a valid activity YAML file in and of itself. It has a single statement template.
It is up to the individual activity types like _cql_, or _stdout_ to interpret the statement template in some way. The example above is valid as a statement in the stdout activity, but it does not produce a valid CQL statement with the CQL activity type. The contents of the statement template are free form text. If the statement template is valid CQL, then the CQL activity type can use it without throwing an error. Each activity type determines what a statement means, and how it will be used.

You can provide multiple statements, and you can use the YAML pipe to put them on multiple lines, indented a little further in:

```yaml
statements:
@@ -46,5 +50,6 @@ statements:

submit job {alpha} on queue {beta} with options {gamma};
```

Actually, every statement in a YAML has a name. If you don't provide one, then a name is auto-generated for the statement based on its position in the YAML file.
View File
@@ -5,7 +5,12 @@ weight: 02

## Data Bindings

Procedural data generation is built into the nosqlbench runtime by way of the [Virtual DataSet](http://virtdata.io/) library. This allows us to create named data generation recipes. These named recipes for generated data are called bindings. Procedural generation for test data has [many benefits](http://docs.virtdata.io/why_virtdata/why_virtdata/) over shipping bulk test data around, including speed and deterministic behavior. With the VirtData approach, most of the hard work is already done for us. We just have to pull in the recipes we want.

You can add a bindings section like this:
@@ -17,9 +22,12 @@ bindings:

delta: WeightedStrings('one:1;six:6;three:3;')
```

This is a YAML map which provides names and function specifiers. The specifier named _alpha_ provides a function that takes an input value and returns the same value. Together, the name and value constitute a binding named alpha. All of the four bindings together are called a bindings set.

The above bindings block is also a valid activity YAML, at least for the _stdout_ activity type. The _stdout_ activity can construct a statement template from the provided bindings if needed, so this is valid:
```text
[test]$ cat > stdout-test.yaml

@@ -43,13 +51,21 @@ The above bindings block is also a valid activity YAML, at least for the _stdout

9,nine,00J_pro,six
```
Above, you can see that the stdout activity type is ideal for experimenting with data generation recipes. It uses the default `format=csv` parameter above, but it also supports formats like json, inlinejson, readout, and assignments.
This is all you need to provide a formulaic recipe for converting an ordinal value to a set of field values. Each time nosqlbench needs to create a set of values as parameters to a statement, the functions are called with an input, known as the cycle. The functions produce a set of named values that, when combined with a statement template, can yield an individual statement for a database operation. In this way, each cycle represents a specific operation. Since the functions above are pure functions, the cycle number of an operation will always produce the same operation, thus making all nosqlbench workloads deterministic.
In the example above, you can see the cycle numbers down the left.
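To illustrate that determinism, here is a stand-in sketch; real bindings are VirtData recipe functions, not Python, and `alpha` and `beta` below are simplified imitations:

```python
# Stand-in binding functions: pure functions of the cycle number, so the
# same cycle always produces the same set of field values.
def alpha(cycle: int) -> int:
    return cycle  # identity-style binding

def beta(cycle: int) -> str:
    names = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]
    return names[cycle % 10]  # loosely imitates a number-to-name recipe

template = "submit job {alpha} on queue {beta};"
for cycle in range(3):
    print(template.format(alpha=alpha(cycle), beta=beta(cycle)))
```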
If you combine the statement section and the bindings sections above into one activity yaml, you get a slightly different result, as the bindings apply to the statements that are provided, rather than creating a default statement for the bindings. See the example below:

```text
[test]$ cat > stdout-test.yaml
@@ -84,11 +100,19 @@ know how statements will be used!

submit job 9 on queue nine with options 00J_pro;
```

There are a few things to notice here. First, the statements that are executed are automatically alternated between. If you had 10 different statements listed, they would all get their turn with 10 cycles. Since there were two, each was run 5 times.
Also, the statement that had named anchors acted as a template, whereas the other one was evaluated just as it was. In fact, they were both treated as templates, but one of them had no anchors.
One more minor but important detail is that the fourth binding *delta* was not referenced directly in the statements. Since the statements did not pair up an anchor with this binding name, it was not used. No values were generated for it.
This is how activities are expected to work when they are implemented correctly. The bindings themselves are templates for data generation, only to be used when necessary. The bindings that are defined around a statement are more like a menu for the statement: if the statement uses those bindings with `{named}` anchors, then the recipes will be used to construct data when that statement is selected for a specific cycle. The cycle number both selects the statement (via the op sequence) and also provides the input value at the left side of the binding functions.
View File
@ -6,17 +6,23 @@ weight: 03
## Statement Parameters ## Statement Parameters
Statements within a YAML can be accessorized with parameters. These are known as _statement params_ and are different than the parameters that you use at the activity level. They apply specifically to a statement template, and are interpreted by an activity type when the statement template is used to construct a native statement form. Statements within a YAML can be accessorized with parameters. These are known as _statement params_ and are different
than the parameters that you use at the activity level. They apply specifically to a statement template, and are
interpreted by an activity type when the statement template is used to construct a native statement form.
For example, the statement parameter `ratio` is used when an activity is initialized to construct the op sequence. In the _cql_ activity type, the statement parameter `prepared` is a boolean that can be used to designate whether a CQL statement should be prepared or not.
As with the bindings, a params section can be added at the same level, setting additional parameters to be used with statements. Again, this is an example of modifying or otherwise creating a specific type of statement, but always in a way specific to the activity type. Params can be thought of as statement properties. As such, params don't really do much on their own, although they have the same basic map syntax as bindings:

```yaml
params:
  ratio: 1
```
As with statements, it is up to each activity type to interpret params in a useful way.
View File
@@ -5,7 +5,8 @@ weight: 04

## Statement Tags

Tags are used to mark and filter groups of statements for controlling which ones get used in a given scenario. Tags are generally free-form, but there is a set of conventions that can make your testing easier.

An example:
@@ -17,7 +18,8 @@ tags:

### Tag Filtering

The tag filters provide a flexible set of conventions for filtering tagged statements. Tag filters are usually provided as an activity parameter when an activity is launched. The rules for tag filtering are:

1. If no tag filter is specified, then the statement matches.
2. A tag name predicate like `tags=name` asserts the presence of a specific
@@ -74,7 +76,5 @@ I'm alive!

# compound tag predicate does not fully match
[test]$ ./nb run driver=stdout workload=stdout-test tags='name=fox.*',unit=delta
11:02:53.490 [scenarios:001] ERROR i.e.activities.stdout.StdoutActivity - Unable to create a stdout statement if you have no active statements or bindings configured.
```
View File
@@ -5,7 +5,11 @@ weight: 05

## Statement Blocks

All the basic primitives described above (names, statements, bindings, params, tags) can be used to describe and parameterize a set of statements in a yaml document. In some scenarios, however, you may need to structure your statements in a more sophisticated way. You might want to do this if you have a set of common statement forms or parameters that need to apply to many statements, or perhaps if you have several *different* groups of statements that need to be configured independently.

This is where blocks become useful:
@@ -38,5 +42,7 @@ blocks:

9,block2-O
```
This shows a couple of important features of blocks. All blocks inherit defaults for bindings, params, and tags from the root document level. Any of these values that are defined at the base document level apply to all blocks contained in that document, unless specifically overridden within a given block.
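A sketch of that layering, assuming each section merges by simple map override (illustrative Python, not the loader's actual logic):

```python
# Document-level sections act as defaults; a block overrides only the
# keys it redefines.
def effective_config(doc: dict, block: dict) -> dict:
    merged = {}
    for section in ("bindings", "params", "tags"):
        merged[section] = {**doc.get(section, {}), **block.get(section, {})}
    return merged

doc_level = {"bindings": {"alpha": "Identity()"}, "params": {"ratio": 1}}
block1 = {"params": {"ratio": 3}}
print(effective_config(doc_level, block1))
# {'bindings': {'alpha': 'Identity()'}, 'params': {'ratio': 3}, 'tags': {}}
```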
View File
@@ -7,7 +7,9 @@ weight: 06

## Statement Delimiting

Sometimes, you want to specify the text of a statement in different ways. Since statements are strings, the simplest way for small statements is in double quotes. If you need to express a much longer statement with special characters and newlines, then you can use YAML's literal block notation (signaled by the '|' character) to do so:
```yaml
statements:

@@ -18,16 +20,24 @@ statements:

submit job {alpha} on queue {beta} with options {gamma};
```
Notice that the block starts on the following line after the pipe symbol. This is a very popular form in practice because it treats the whole block exactly as it is shown, except for the initial indentations, which are removed.
Statements in this format can be raw statements, statement templates, or anything that is appropriate for the specific activity type they are being used with. Generally, the statements should be thought of as a statement form that you want to use in your activity -- something that has placeholders for data bindings. These placeholders are called *named anchors*. The second line above is an example of a statement template, with anchors that can be replaced by data for each cycle of an activity.
There is a variety of ways to represent block statements, with folding, without, with the newline removed, with it retained, with trailing newlines trimmed or not, and so forth. For a more comprehensive guide on the YAML conventions regarding multi-line blocks, see [YAML Spec 1.2, Chapter 8, Block Styles](http://www.yaml.org/spec/1.2/spec.html#Block)
## Statement Sequences

To provide a degree of flexibility to the user for statement definitions, multiple statements may be provided together as a sequence.

```yaml
# a list of statements
@@ -42,7 +52,8 @@ statements:

name2: "statement two"
```

In the first form, the names are provided automatically by the YAML loader. In the second form, they are specified as ordered map keys.
## Statement Properties

@@ -57,7 +68,10 @@ statements:

stmt: statement two
```
This is the most flexible configuration format at the statement level. It is also the most verbose. Because this format names each property of the statement, it allows for other properties to be defined at this level as well. This includes all of the previously described configuration elements: `name`, `bindings`, `params`, `tags`, and additionally `stmt`. A detailed example follows:

```yaml
statements:
@@ -72,9 +86,12 @@ statements:

freeparam3: a value, as if it were assigned under the params block.
```
In this case, the values for `bindings`, `params`, and `tags` take precedence, overriding those set by the enclosing block or document or activity when the names match. Parameters called **free parameters** are allowed here, such as `freeparam3`. These are simply values that get assigned to the params map once all other processing has completed.

It is possible to mix the **`<name>: <statement>`** form as above in the example for mapping statement by name, so long as some specific rules are followed. An example, which is equivalent to the above:

```yaml
statements:
@@ -93,7 +110,8 @@ The rules:

2. Do not use the **`<name>: <statement>`** form in combination with a **`stmt: <statement>`** property. It is not possible to detect if this occurs. Use caution if you choose to mix these forms.

As explained above, `parm1: pvalue1` is a *free parameter*, and is simply short-hand for setting values in the params map for the statement.
### Per-Statement Format

@@ -111,7 +129,9 @@ statements:

type: preload
```
Specifically, the first statement is a simple statement body, the second is a named statement (via free param `<name>: statement` form), the third is a statement config map, and the fourth is a combination of the previous two.

The above is valid nosqlbench YAML, although a reader would need to know about the rules explained above in order to really make sense of it. For most cases, it is best to follow one format convention, but there is flexibility for overrides and naming when you need it.
View File
@@ -5,14 +5,15 @@ weight: 07

# Multi-Docs

The YAML spec allows for multiple yaml documents to be concatenated in the same file with a separator:

```yaml
---
```
This offers an additional convenience when configuring activities. If you want to parameterize or tag a set of statements with their own bindings, params, or tags, but alongside another set of uniquely configured statements, you need only put them in separate logical documents, separated by a triple-dash.

For example:
@@ -42,8 +43,11 @@ doc2.number eight

doc1.form1 doc1.1
```

This shows that you can use the power of blocks and tags together at one level and also allow statements to be broken apart into a whole other level of partitioning if desired.
:::warning
The multi-doc support is there as a ripcord when you need it. However, it is strongly advised that you keep your YAML workloads simple to start and only use features like the multi-doc when you absolutely need it. For this, blocks are generally a better choice. See examples in the standard workloads.
:::
View File
@@ -5,7 +5,8 @@ weight: 08

# Template Params

All nosqlbench YAML formats support a parameter macro format that applies before YAML processing starts. It is a basic macro facility that allows named anchors to be placed in the document as a whole:

```text
<<varname:defaultval>>
@@ -13,7 +14,9 @@ All nosqlbench YAML formats support a parameter macro format that applies before

TEMPLATE(varname,defaultval)
```
In this example, the name of the parameter is `varname`. It is given a default value of `defaultval`. If an activity parameter named *varname* is provided, as in `varname=barbaz`, then this whole expression will be replaced with `barbaz`. If none is provided then the default value will be used instead. For example:

```text
[test]$ cat > stdout-test.yaml
@@ -28,6 +31,7 @@ MISSING

THIS IS IT
```

If an empty value is desired by default, then simply use an empty string in your template, like `<<varname:>>` or `TEMPLATE(varname,)`.
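A sketch of how such a pre-YAML macro pass could behave, covering only the two documented anchor forms (illustrative Python, not nosqlbench's actual parser):

```python
import re

# Replace <<name:default>> and TEMPLATE(name,default) anchors with either
# a user-supplied activity parameter or the inline default.
def expand(text: str, params: dict) -> str:
    pattern = r"<<(\w+):([^>]*)>>|TEMPLATE\((\w+),([^)]*)\)"
    def sub(m):
        name = m.group(1) or m.group(3)
        default = m.group(2) if m.group(1) else m.group(4)
        return params.get(name, default)
    return re.sub(pattern, sub, text)

print(expand("value: <<varname:defaultval>>", {}))                           # value: defaultval
print(expand("value: TEMPLATE(varname,defaultval)", {"varname": "barbaz"}))  # value: barbaz
print(expand("value: <<varname:>>", {}))                                     # value:
```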
View File
@@ -20,13 +20,16 @@ name: doc2

...
```

This provides a layered naming scheme for the statements themselves. It is not usually important to name things except for documentation or metric naming purposes.

If no names are provided, then names are automatically created for blocks and statements. Statements assigned at the document level are assigned to "block0". All other statements are named with the format `doc#--block#--stmt#`.

For example, the full name of statement1 above would be `doc1--block1--stmt1`.
:::info
If you anticipate wanting to get metrics for a specific statement in addition to the other metrics, then you will want to adopt the habit of naming all your statements something basic and descriptive.
:::
View File
@@ -21,10 +21,11 @@ scenarios:

- run driver=diag cycles=10M
```

This provides a way to specify more detailed workflows that users may want to run without them having to build up a command line for themselves.
A couple of other forms are supported in the YAML, for terseness:

```yaml
scenarios:
  oneliner: run driver=diag cycles=10
@@ -32,16 +33,15 @@ scenarios:
  part1: run driver=diag cycles=10 alias=part1
  part2: run driver=diag cycles=20 alias=part2
```
These forms simply provide finesse for common editing habits, but they are automatically read internally as a list. In the map form, the names are discarded, but they may be descriptive enough for use as inline docs for some users. The order is retained as listed, since the names have no bearing on the order.
## Scenario selection

When a named scenario is run, it is *always* named, so that it can be looked up in the list of named scenarios under your `scenarios:` property. The only exception to this is when an explicit scenario name is not found on the command line, in which case it is automatically assumed to be _default_.

Some examples may be more illustrative:
@@ -69,27 +69,24 @@ You can run multiple named scenarios in the same command if

## Workload selection

The examples above contain no reference to a workload (formerly called _yaml_). They don't need to, as they refer to themselves implicitly. You may add a `workload=` parameter to the command templates if you like, but this is never needed for basic use, and it is error prone to keep the filename matched to the command template. Just leave it out by default.
_However_, if you are doing advanced scripting across multiple systems, you can actually provide a `workload=` parameter particularly to use another workload description in your test.
:::info
This is a powerful feature for workload automation and organization. However, it can get unwieldy quickly. Caution is advised for deep-linking too many scenarios in a workspace, as there is no mechanism for keeping them in sync when small changes are made.
:::
## Named Scenario Discovery

For named scenarios, there is a way for users to find all the named scenarios that are currently bundled or in view of their current directory. A couple of simple rules must be followed by scenario publishers in order to keep things simple:

1. Workload files in the current directory `*.yaml` are considered.
2. Workload files under the relative path `activities/` with name `*.yaml` are
@@ -99,38 +96,33 @@ be followed by scenario publishers in order to keep things simple:

4. Any workload file that contains a `scenarios:` tag is included, but all others are ignored.
This doesn't mean that you can't use named scenarios for workloads in other locations. This doesn't mean that you can't use named scenarios for workloads in other locations. It simply means that when users
It simply means that when users use the `--list-scenarios` option, these are the only use the `--list-scenarios` option, these are the only ones they will see listed.
ones they will see listed.
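For example, from a directory containing workload files, the discoverable scenarios can be listed with:

```
$ nb --list-scenarios
```
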
## Parameter Overrides

You can override parameters that are provided by named scenarios. Any parameter that you specify on the command line
after your workload and optional scenario name will be used to override or augment the commands that are provided for
the named scenario.

This is powerful, but it also means that you can sometimes munge user-provided activity parameters on the command line
with the named scenario commands in ways that may not make sense. To solve this, the parameters in the named scenario
commands may be locked. You can lock them silently, or you can provide a verbose locking that will cause an error if the
user even tries to adjust them.

Silent locking is provided with a form like `param==value`. Any silently locked parameters will reject overrides from
the command line, but will not interrupt the user.

Verbose locking is provided with a form like `param===value`. Any time a user provides a parameter on the command line
for the named parameter, an error is thrown and they are informed that this is not possible. This level is provided for
cases in which you would not want the user to be unaware of an unset parameter which is germane and specific to the
named scenario.

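As a sketch, a workload could define one silently locked and one verbosely locked scenario like this (the names and
values here are illustrative, in the spirit of the `basics` examples below):

```yaml
scenarios:
  s2:
    - run driver=stdout workload=basics cycles==10
  s3:
    - run driver=stdout workload=basics cycles===10
```
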
All other parameters provided by the user will take the place of the same-named parameters provided in *each* command
template, in the order they appear in the template. Any other parameters provided by the user will be added to *each*
of the command templates in the order they appear on the command line.

This is a little counter-intuitive at first, but once you see some examples it should make sense.

## Parameter Override Examples

@@ -176,9 +168,8 @@ $

### Silent Locking example

If you run the second scenario `s2` with your own value for `cycles=7`, then it does what the locked parameter
`cycles==10` requires, without telling you that it is ignoring the specified value on your command line.

```
$ nb basics s2 cycles=7
```

@@ -200,19 +191,15 @@ Sometimes, this is appropriate, such as when specifying settings like `threads==

### Verbose Locking example

If you run the third scenario `s3` with your own value for `cycles=7`, then you will get an error telling you that this
is not possible. Sometimes you want to make sure that the user knows a parameter should not be changed, and that if they
want to change it, they'll have to make their own custom version of the scenario in question.

```
$ nb basics s3 cycles=7
ERROR: Unable to reassign value for locked param 'cycles===7'
$
```

Ultimately, it is up to the scenario designer when to lock parameters for users. The built-in workloads offer some
examples of how to set these parameters so that the right values are locked in place without bothering the user, while
other values are made very clear in how they should be set. Please look at these examples for inspiration when you need
them.

@@ -5,78 +5,86 @@ weight: 99

## Diagnostics

This section describes errors that you might see if you have a YAML loading issue, and what you can do to fix them.

### Undefined Name-Statement Tuple

This exception is thrown when the statement body is not found in a statement definition in any of the supported formats.
For example, the following block will cause an error:

```yaml
statements:
 - name: statement-foo
   params:
    aparam: avalue
```

This is because `name` and `params` are reserved property names -- removed from the list of name-value pairs before free
parameters are read. If the statement is not defined before free parameters are read, then the first free parameter is
taken as the name and statement in `name: statement` form.

To correct this error, supply a statement property in the map, or simply replace the `name: statement-foo` entry with a
`statement-foo: statement body` at the top of the map:

Either of these will work:

```yaml
statements:
 - name: statement-foo
   stmt: statement body
   params:
    aparam: avalue
---
statements:
 - statement-foo: statement body
   params:
    aparam: avalue
```

In both cases, it is clear to the loader where the statement body should come from, and what (if any) explicit naming
should occur.

### Redefined Name-Statement Tuple

This exception is thrown when the statement name is defined in multiple ways. This is an explicit exception to avoid
possible ambiguity about which value the user intended. For example, the following statements definition will cause an
error:

```yaml
statements:
 - name: name1
   name2: statement body
```

This is an error because the statement is not defined before free parameters are read, and the `name: statement` form
includes a second definition for the statement name. In order to correct this, simply remove the separate `name` entry,
or use the `stmt` property to explicitly set the statement body. Either of these will work:

```yaml
statements:
 - name2: statement body
---
statements:
 - name: name1
   stmt: statement body
```

In both cases, there is only one name defined for the statement according to the supported formats.

### YAML Parsing Error

This exception is thrown when the YAML format is not recognizable by the YAML parser. If you are not working from
examples that are known to load cleanly, then please review your document for correctness according to the
[YAML Specification]().

If you are sure that the YAML should load, then please
[submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details on the type of YAML
file you are trying to load.

### YAML Construction Error

This exception is thrown when the YAML was loaded, but the configuration object could not be constructed from the
in-memory YAML document. If this error occurs, it may be a bug in the YAML loader implementation. Please
[submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details on the type of YAML
file you are trying to load.

@@ -5,27 +5,42 @@ weight: 40

# Designing Workloads

Workloads in nosqlbench are always controlled by a workload definition.
Even the built-in workloads are simply pre-configured and controlled
from a single YAML file which is bundled internally.

With nosqlbench a standard YAML configuration format is provided that is
used across all activity types. This makes it easy to specify
statements, statement parameters, data bindings, and tags. This section
describes the standard YAML format and how to use it.

It is recommended that you read through the examples in each of the
design sections in order. This guide was designed to give you a detailed
understanding of workload construction with nosqlbench. The examples
will also give you better insight into how nosqlbench works at a
fundamental level.

## Multi-Protocol Support

You will notice that this guide is not overly CQL-specific. That is
because nosqlbench is a multi-protocol tool. All that is needed for you
to use this guide with other protocols is the release of more activity
types. Try to keep that in mind as you think about designing workloads.

## Advice for new builders

### Review existing examples

The built-in workloads that are included with nosqlbench are also shared
on the github site where we manage the nosqlbench project:

- [baselines](https://github.com/datastax/nosqlbench-labs/tree/master/sample-activities/baselines)
- [bindings](https://github.com/datastax/nosqlbench-labs/tree/master/sample-activities/bindings)

### Follow the conventions

The tagging conventions described under the YAML Conventions section
will make your testing go smoother. All of the baselines that we publish
for nosqlbench will use this form.

@@ -1,5 +1,5 @@
---
title: driver - CQL
weight: 06
---

@@ -16,35 +16,31 @@ To select this activity type, pass `driver=cql` to a run or start command.

# cql activity type

This is an activity type which allows for the execution of CQL statements. This particular activity type is wired
synchronously within each client thread; however, the async API is used in order to expose fine-grained metrics about op
binding, op submission, and waiting for a result.

### Example activity definitions

Run a cql activity named 'cql1', with definitions from activities/cqldefs.yaml

    ... driver=cql alias=cql1 workload=cqldefs

Run a cql activity defined by cqldefs.yaml, but with shortcut naming

    ... driver=cql workload=cqldefs

Only run statement groups which match a tag regex

    ... driver=cql workload=cqldefs tags=group:'ddl.*'

Run the matching 'dml' statements, with 100 cycles, from [1000..1100)

    ... driver=cql workload=cqldefs tags=group:'dml.*' cycles=1000..1100

This last example shows that the cycle range is [inclusive..exclusive), to allow for stacking test intervals. This is
standard across all activity types.

### CQL ActivityType Parameters

@@ -23,19 +23,16 @@ that uses the curly brace token form in statements.

## Example activity definitions

Run a stdout activity named 'stdout-test', with definitions from activities/stdout-test.yaml

    ... driver=stdout workload=stdout-test

Only run statement groups which match a tag regex

    ... driver=stdout workload=stdout-test tags=group:'ddl.*'

Run the matching 'dml' statements, with 100 cycles, from [1000..1100)

    ... driver=stdout workload=stdout-test tags=group:'dml.*' cycles=1000..1100 filename=test.csv

This last example shows that the cycle range is [inclusive..exclusive),
to allow for stacking test intervals. This is standard across all

@@ -54,45 +51,50 @@ activity types.

## Configuration

This activity type uses the uniform yaml configuration format. For more details on this format, please refer to the
[Standard YAML Format](http://docs.engineblock.io/user-guide/standard_yaml/)

## Configuration Parameters

- **newline** - If a statement has this param defined, then it determines whether or not to automatically add a missing
  newline for that statement only. If this is not defined for a statement, then the activity-level parameter takes
  precedence.

## Statement Format

The statement format for this activity type is a simple string. Tokens between curly braces are used to refer to binding
names, as in the following example:

```yaml
statements:
 - "It is {minutes} past {hour}."
```

If you want to suppress the trailing newline that is automatically added, then
you must either pass `newline=false` as an activity param, or specify it
in the statement params in your config as in:

```yaml
params:
  newline: false
```

### Auto-generated statements

If no statement is provided, then the defined binding names are used as-is to create a CSV-style line format. The values
are concatenated with comma delimiters, so a set of bindings like this:

```yaml
bindings:
 one: Identity()
 two: NumberNameToString()
```

would create an automatic string template like this:

```yaml
statements:
 - "{one},{two}\n"
```

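With these bindings, each cycle would then emit one comma-delimited line. Assuming that `Identity()` yields the cycle
number and `NumberNameToString()` renders it as a number name, the first few lines of output would look something like
this (illustrative values):

```
0,zero
1,one
2,two
```
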
The auto-generation behavior is forced when the format parameter is supplied.

@@ -3,11 +3,12 @@ title: Driver Types
weight: 50
---

Each nosqlbench scenario is comprised of one or more activities of a
specific type. The types of activities available are provided by the
version of nosqlbench.

Additional drivers will be added in future releases. There are command
line help topics for each activity type (driver).

To get a list of topics run:

@@ -4,17 +4,25 @@ title: CLI Scripting

# CLI Scripting

Sometimes you want to run a set of workloads in a particular order, or call other specific test setup logic in between
phases or workloads. While the full scripting environment allows you to do this and more, it is not necessary to write
javascript for every scenario.

For more basic setup and sequencing needs, you can achieve a fair degree of flexibility on the command line. A few key
API calls are supported directly on the command line. This guide explains each of them, what they do, and how to use
them together.

## Script Construction

As the command line is parsed, from left to right, the scenario script is built in an internal scripting buffer. Once
the command line is fully parsed, this script is executed. Each of the commands below is effectively a macro for a
snippet of script. It is important to remember that order matters.

## Command line format

Newlines are not allowed when building scripts from the command line. As long as you follow the allowed forms below, you
can simply string multiple commands together with spaces between. As usual, single word options without double dashes
are commands, key=value style parameters apply to the previous command, and all other commands with

    --this-style

@@ -22,28 +30,35 @@ are non-scripting options.

## Concurrency & Control

All activities that run during a scenario run under the control of, but independently from, the scenario script. This
means that you can have a number of activities running while the scenario script is doing its own thing. The scenario
only completes when both the scenario script and the activities are finished.

### `start driver=<activity type> alias=<alias> ...`

You can start an activity with this command. At the time this command is evaluated, the activity is started, and the
script continues without blocking. This is an asynchronous start of an activity. If you start multiple activities in
this way, they will run concurrently.

The type argument is required to identify the activity type to run. The alias parameter is not strictly required, unless
you want to be able to interact with the started activity later. In any case, it is a good idea to name all your
activities with a meaningful alias.

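For example, this would start an activity asynchronously and name it for later interaction (the parameter values here
are illustrative, borrowed from the stdout examples above):

    start driver=stdout alias=stdout-test workload=stdout-test cycles=1000
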
### `stop <alias>`

Stop an activity with the given alias. This is synchronous, and causes the scenario to pause until the activity is
stopped. This means that all threads for the activity have completed and signalled that they're in a stopped state.

### `await <alias>`

Await the normal completion of an activity with the given alias. This causes the scenario script to pause while it waits
for the named activity to finish. This does not tell the activity to stop. It simply puts the scenario script into a
paused state until the named activity is complete.

### `run driver=<activity type> alias=<alias> ...`

Run an activity to completion, waiting until it is complete before continuing with the scenario script. It is
effectively the same as

    start driver=<activity type> ... alias=<alias>
    await <alias>

@@ -71,7 +86,8 @@ await one \
stop two
~~~

In this CLI script, the backslashes are necessary in order to keep everything on the same command line. Here is a
narrative of what happens when it is run.

1. An activity named 'a' is started, with 100K cycles of work.
2. An activity named 'b' is started, with 200K cycles of work.

@@ -6,81 +6,115 @@ title: Scenario Scripting

## Motive

The EngineBlock runtime is a combination of a scripting sandbox and a workload execution machine. This is not
accidental. With this particular arrangement, it should be possible to build sophisticated tests across a variety of
scenarios. In particular, logic which can observe and react to the system under test can be powerful. With this
approach, it becomes possible to break away from the conventional run-interpret-adjust cycle which is all too often done
by human hands.

## Machinery, Controls & Instruments

All of the heavy lifting is left to Java and the core nosqlbench runtime. This includes the iterative workloads that are
meant to test the target system. This is combined with a control layer which is provided by Nashorn and eventually
GraalVM. This division of responsibility allows the high-level test logic to be "script" and the low-level activity
logic to be "machinery". While the scenario script has the most control, it also is the least busy relative to activity
workloads. The net effect is that you have the efficiency of the iterative test loads in conjunction with the open
design palette of a first-class scripting language.

Essentially, the ActivityType drivers are meant to handle the workload-specific machinery. They also provide dynamic
control points and parameters which are special to that activity type (driver). This exposes a full feedback loop
between a running scenario script and the activities that it runs. The scenario is free to read the performance metrics
from a running activity and make changes to it on the fly.

## Scripting Environment

The nosqlbench scripting environment provided has a few modifications meant to streamline understanding and usage of
nosqlbench dynamic parameters and metrics.

### Active Bindings

Active bindings are control variables which, when assigned to, cause an immediate change in the behavior of the runtime.
Each of the variables below is pre-wired into each script environment.

#### scenario

This is the __Scenario Controller__ object which manages the activity executors in the runtime. All the methods on this
Java type are provided to the scripting environment directly.

#### activities.&lt;alias&gt;.&lt;paramname&gt;

Each activity parameter for a given activity alias is available at this name within the scripting environment. Thus, you
can change the number of threads on an activity named foo (alias=foo) in the scripting environment by assigning a value
to it as in `activities.foo.threads=3`. Any assignments take effect synchronously before the next line of the script
continues executing.

#### __metrics__.&lt;alias&gt;.&lt;metric name&gt;

Each activity metric for a given activity alias is available at this name. This gives you access to the metrics objects
directly. Some metrics objects have also been enhanced with wrapper logic to provide simple getters and setters, like
`.p99ms` or `.p99ns`, for example.

Interaction with the nosqlbench runtime and the activities therein is made easy by the above variables and objects. When
an assignment is made to any of these variables, the changes are propagated to internal listeners. For changes to
_threads_, the thread pool responsible for the affected activity adjusts the number of active threads (AKA slots). Other
changes are further propagated directly to the thread harnesses and components which implement the ActivityType.

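As a brief sketch of how these variables can be combined in a scenario script, assuming an activity with alias `foo` is
already running:

```
// raise the thread count; the assignment takes effect before the next line runs
activities.foo.threads = 5;

// read an enhanced histogram value for the 'cycles' metric of 'foo'
var p99 = metrics.foo.cycles.snapshot.p99ms;
print("cycles p99 (ms): " + p99);
```
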
:::warning
Assignment to the _workload_ and _alias_ activity parameters has no special effect, as you can't change an activity to a
different driver once it has been created.
:::

You can make use of more extensive Java or Javascript libraries as needed, mixing them with the runtime controls
provided above.

## Enhanced Metrics for Scripting

The metrics available in nosqlbench are slightly different than the standard kit with dropwizard metrics. The key
differences are:

### HDR Histograms

All histograms use HDR histograms with *four* significant digits.

All histograms reset on snapshot, automatically keeping all data until you report the snapshot or access the snapshot
via scripting (see below).

The metric types that use histograms have been replaced with nicer versions for scripting. You don't have to do anything
differently in your reporter config to use them. However, if you need to use the enhanced versions in your local
scripting, you can. This means that Timer and Histogram types are enhanced. If you do not use the scripting extensions,
then you will automatically get the standard behavior that you are used to, only with higher-resolution HDR and full
snapshots for each report to your downstream metrics systems.

### Scripting with Delta Snapshots

For both the timer and the histogram types, you can call getDeltaReader(), or access it simply as
&lt;metric&gt;.deltaReader. When you do this, the delta snapshotting behavior is maintained until you use the
deltaReader to access it. You can get a snapshot from the deltaReader by calling getDeltaSnapshot(10000), which causes
the snapshot to be reset for collection, but retains a cache of the snapshot for any other consumer of getSnapshot() for
that duration in milliseconds. If, for example, metrics reporters access the snapshot in the next 10 seconds, the
reported snapshot will be exactly what was used in the script.

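For instance, a script could read a delta snapshot like this (a sketch using only the calls named above, again assuming
an activity aliased `foo`):

```
// take a delta snapshot, cached for other getSnapshot() consumers for 10s
var snapshot = metrics.foo.cycles.deltaReader.getDeltaSnapshot(10000);
print("cycles p99 (ms) for this interval: " + snapshot.getP99ms());
```
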
This is important for using local scripting methods and calculations with aggregate views downstream. It means that the
histograms will match up between your local script output and your downstream dashboards, as they will both be using the
same frame of data, when done properly.

### Histogram Convenience Methods

All histogram snapshots have additional convenience methods for accessing every percentile in (P50, P75, P90, P95, P98,
P99, P999, P9999) and every time unit in (s, ms, us, ns). For example, getP99ms() is supported, as is getP50ns(), and
every other possible combination. This means that you can access the 99th percentile metric value in your scripts for
activity _foo_ as _metrics.foo.cycles.snapshot.p99ms_.

## Control Flow

When a script is run, it has absolute control over the scenario runtime while it is active. Once the script reaches its
end, however, it will only exit if all activities have completed. If you want to explicitly stop a script, you must stop
all activities.

## Strategies

You can use nosqlbench in the classic form with `run driver=<activity_type> param=value ...` command line syntax. There
are reasons, however, that you will sometimes want to customize and modify your scripts directly, such as:

- Permute test variables to cover many sub-conditions in a test.
- Automatically adjust load factors to identify the nominal capacity of a system.

@@ -89,7 +123,9 @@ You can use nosqlbench in the classic form with `run driver=<activity_type> para

## Script Input & Output

Internal buffers are kept for _stdin_, _stdout_, and _stderr_ for the scenario script execution. These are logged to the
logfile upon script completion, with markers showing the timestamp and file descriptor (stdin, stdout, or stderr) that
each line was recorded from.

## External Docs

@@ -4,23 +4,34 @@ title: Standard Metrics

# Standard Metrics

nosqlbench comes with a set of standard metrics that will be part of every activity type (driver). Each activity type
(driver) enhances the metrics available by adding their own metrics with the nosqlbench APIs. This section explains what
the standard metrics are, and how to interpret them.

## read-input

Within nosqlbench, a data stream provider called an _Input_ is responsible for providing the actual cycle number that
will be used by consumer threads. Because different _Input_ implementations may perform differently, a separate metric
is provided to track the performance in terms of client-side overhead. The **read-input** metric is a timer that only
measures the time it takes for a given activity thread to read the input value, nothing more.

## strides

A stride represents the work-unit for a thread within nosqlbench. It allows a set of cycles to be logically grouped
together for purposes of optimization -- or in some cases -- to simulate realistic client-side behavior over multiple
operations. The stride is the number of cycles that will be allocated to each thread before it starts iterating on them.

The **strides** timer measures the time each stride takes, including all cycles within the stride. It starts measuring
time before the cycle starts, and stops measuring after the last cycle in the stride has run.

## cycles

Within nosqlbench, each logical iteration of a statement is handled within a distinct cycle. A cycle represents an
iteration of a workload. This corresponds to a single operation executed according to some statement definition.

The **cycles** metric is a timer that starts counting at the start of a cycle, before any specific activity behavior has
control. It stops timing once the logical cycle is complete. This includes any additional phases that are executed by
multi-phase actions.

@@ -4,26 +4,45 @@ title: Timing Terms

# Timing Terms

Often, terms used to describe latency can create confusion. In fact, the term _latency_ is so overloaded in practice
that it is not useful by itself. Because of this, nosqlbench will avoid using the term latency _except in a specific
way_. Instead, the terms described in this section will be used.

nosqlbench is a client-centric testing tool. The measurement of operations occurs on the client, without visibility to
what happens in transport or on the server. This means that the client *can* see how long an operation takes, but it
*cannot see* how much of the operational time is spent in transport and otherwise. This has a bearing on the terms that
are adopted with nosqlbench.

Some terms are anchored by the context in which they are used. For latency terms, *service time* can be subjective. When
using this term to describe other effects in your system, what is included depends on the perspective of the requester.
The concept of service is universal, and every layer in a system can be seen as a service. Thus, the service time is
defined by the vantage point of the requester. This is the perspective taken by the nosqlbench approach for naming and
semantics below.

## responsetime

**The duration of time a user has to wait for a response from the time they submitted the request.** Response time is
the duration of time from when a request was expected to start, to the time at which the response is finally seen by the
user. A request is generally expected to start immediately when users make a request. For example, when a user enters a
URL into a browser, they expect the request to start immediately when they hit enter.

In nosqlbench, the response time for any operation can be calculated by adding its wait time and its service time
together.

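For example, if an operation was intended to start at t=0 but only started at t=5ms (5ms of wait time), and the response
arrived 20ms after it started (20ms of service time), the response time would be 25ms.
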
## waittime

**The duration of time between when an operation is intended to start and when it actually starts on a client.** This is
also called *scheduling delay* in some places. Wait time occurs because clients are not able to make all requests
instantaneously when expected. There is an ideal time at which the request would be made according to user demand. This
ideal time is always earlier than the actual time in practice. When there is a shortage of resources *of any kind* that
delays a client request, it must wait.

Wait time can accumulate when you are running something according to a dispatch rate, as with a rate limiter.

## servicetime

**The duration of time it takes a server or other system to fully process a request and send a response.** From the
perspective of a testing client, the _system_ includes the infrastructure as well as remote servers. As such, the
service time metrics in nosqlbench include any operational time that is external to the client, including transport
latency.

@@ -6,7 +6,8 @@ title: Advanced Metrics

## Unit of Measure

All metrics collected from activities are recorded in nanoseconds and ops per second. All histograms are recorded with 4
digits of precision using HDR histograms.

## Metric Outputs

@@ -19,14 +20,14 @@ Metrics from a scenario run can be gathered in multiple ways:

- To a monitoring system via graphite
- via the --docker-metrics option

With the exception of the `--docker-metrics` approach, these forms may be used in combination. The command line options
for enabling these are documented in the built-in help, although some examples of these may be found below.

## Metrics via Graphite

If you would like to have all of your testing data in one place, then you may be interested in reporting your
measurements to a monitoring system. For this, nosqlbench includes a
[Metrics Library](https://github.com/dropwizard/metrics). Graphite reporting is baked in as the default reporter.

In order to enable graphite reporting, use one of these option formats:

@@ -43,12 +44,16 @@ Core metrics use the prefix _engineblock_ by default. You can override this with

## Identifiers

Metrics associated with a specific activity will have the activity alias in their name. There is a set of core metrics
which are always present regardless of the activity type. The names and types of additional metrics provided for each
activity type vary.

Sometimes, an activity type will expose metrics on a per-statement basis, measuring over all invocations of a given
statement as defined in the YAML. In these cases, you will see `--` separating the name components of the metric. At the
most verbose, a metric name could take on a form like
`<activity>.<docname>--<blockname>--<statementname>--<metricname>`, although this is rare when you name your statements,
which is recommended. Just keep in mind that the double dash connects an activity's alias with named statements *within*
that activity.

## HDR Histograms

@@ -63,26 +68,30 @@ If you want to record only certain metrics in this way, then use this form:

    --log-histograms 'hdrdata.log:.*suffix'

Notice that the option is enclosed in single quotes. This is because the second part of the option value is a regex. The
'.*suffix' pattern matches any metric name that ends with "suffix". Effectively, leaving out the pattern is the same as
using '.\*', which matches all metrics. Any valid regex is allowed here.

Metrics may be included in multiple logs, but care should be taken not to overdo this. Keeping higher fidelity histogram
reservoirs does come with a cost, so be sure to be specific in what you record as much as possible.

If you want to specify the recording interval, use this form:

    --log-histograms 'hdrdata.log:.*suffix:5s'

If you want to specify the interval, you must use the third form above, although it is valid to leave the pattern empty,
such as 'hdrdata.log::5s'.

Each interval specified will be tracked in a discrete reservoir in memory, so they will not interfere with each other in
terms of accuracy.

### Recording HDR Histogram Stats

You can also record basic snapshots of histogram data on a periodic interval just like above with HDR histogram logs.
The option to do this is:

    --log-histostats 'hdrstats.log:.*suffix:10s'

Everything works the same as for hdr histogram logging, except that the format is in CSV as shown in the example below:

~~~
@@ -97,5 +106,8 @@ Tag=diag1.cycles,0.501,0.499,498,1024,2047,2047,4095,4095,4095,4095,4095,4095,40
...
~~~

This includes the metric name (Tag), the interval start time and length (from the beginning of collection time), number
of metrics recorded (count), minimum magnitude, a number of percentile measurements, and the maximum value. Notice that
the format used is similar to that of the HDR logging, although instead of including the raw histogram data, common
percentiles are recorded directly.