docs naming and formatting
This commit is contained in:
parent
3fcd0f6159
commit
cf81331a41
@@ -1,16 +1,17 @@

# cql driver - advanced features

This is an addendum to the standard CQL Activity Type docs. For that,
see "cql". Use the features in this guide carefully. They do not come
with as much documentation, as they are less used than the main CQL
features.
### ResultSet and Row operators
### ResultSet and Row operators

Within the CQL Activity type, in synchronous mode (activities without the
async= parameter), you have the ability to attach operators to a given
statement such that it will get per-statement handling. These operators
are ways of interrogating the result of an operation, saving values, or
managing other side-effects for specific types of testing.

When enabled for a statement, operators are applied in this order:
@@ -35,7 +36,7 @@ row data, you must apply a row operator as explained below.

- **rowoperators** - If provided as a CQL statement param, then the
  list of operator names that follow, separated by a comma, will
  be used to attach Row operators to the given statement.
## Available ResultSet Operators
- pushvars - Push a copy of the current thread local variables onto
@@ -44,11 +45,11 @@ row data, you must apply a row operator as explained below.
  conjunction with the row operators below.
- popvars - Pop the last thread local variable set from the thread-local
  stack into vars, replacing the previous content. This does nothing
  with the ResultSet data.
- clearvars - Clears the contents of the thread local variables. This
  does nothing with the ResultSet data.
- trace - Flags a statement to be traced on the server-side and then
  logs the details of the trace to the trace log file.
- log - Logs basic data to the main log. This is useful to verify that
  operators are loading and triggering as expected.
- assert_singlerow - Throws an exception (ResultSetVerificationException)
@@ -61,22 +62,22 @@ Examples:

```
statements:
 - s1: |
    a statement
   rsoperators: pushvars, clearvars
```

## Available Row Operators:

- savevars - Copies the values of the row into the thread-local variables.
- saverows - Copies the rows into a special CQL-only thread local row state.

Examples:

```
statements:
 - s2: |
    a statement
   rowoperators: saverows
```

## Injecting additional Queries (Future)

It is possible to inject new operations into an activity. However, such
operations are _indirect_ to cycles, since they must be based on the results
@@ -1,4 +1,4 @@

# cqlverify

This activity type allows you to read values from a database and compare them to
the generated values that were expected to be written, row-by-row, producing a
@@ -1,5 +1,5 @@

---
title: Diag ActivityType
weight: 32
menu:
  main:
@@ -8,10 +8,9 @@ menu:
    weight: 12
---

{{< warning >}}
This section is out of date, and will be updated after the next major
release with details on building async drivers.
{{< /warning >}}

If you take all the code chunks from this document and concatenate them
together, you'll have 'diag', one of the built-in activity types for
@@ -241,4 +240,3 @@ report. If it is time to report, we mark the time in lastUpdate.

This is all there is to making an activity react to real-time changes in the activity definition.
@@ -8,10 +8,9 @@ menu:
    weight: 12
---

{{< warning >}}
This section is out of date, and will be updated after the next major
release with details on building async drivers.
{{< /warning >}}

## Introduction
@@ -27,7 +26,7 @@ In an async activity, you still have multiple threads, but in this case, each th

more asynchronous operations. The `async=100` parameter, for example, informs an activity that it needs to allocate
100 total operations over the allocated threads. In the case of `async=100 threads=10`, it is the responsibility
of the ActivityType's action dispenser to configure its actions to know that each of them can juggle 10 operations.
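
For example, a minimal command-line sketch of that configuration; the `diag` driver is used here purely as a stand-in, and the exact values are illustrative:

```
# 100 total async operations across 10 threads => 10 in flight per thread
./nb run driver=diag threads=10 async=100 cycles=1000000
```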

{{< note >}}The *async* parameter has a standard meaning in nosqlbench. If it is defined, async is enabled. Its
parameter value is the number of total async operations that can be in flight at any one instant, with the number
@@ -42,7 +41,7 @@ behavior but getting something else.

The contract between a motor and an action is very basic.

- Each motor submits as many async operations as is allowed to its action, as long as there are
  cycles remaining, until the action signals that it is at its limit.
- As long as an action is able to retire an operation by giving a result back to its motor,
  the motor keeps providing one more and retiring one more, as long as there are cycles remaining.
@@ -74,8 +73,8 @@ as a developer.

   but it can return a simple op context if no specialization is needed.
4. op contexts are recycled to avoid heap pressure for high data rates. This makes it relatively
   low-cost to use the specialized op context to hold contextual data that may otherwise be
   expensive to _malloc_ and _free_.
### Examples

Developers can refer to the Diag activity type implementation for further examples.
@@ -1,5 +1,5 @@

---
title: Building ActivityTypes
weight: 32
menu:
  main:
@@ -15,7 +15,7 @@ menu:

- Maven

## Building new Driver Types

1. Add the nosqlbench API to your project via Maven:
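
   A minimal sketch of the dependency block that would follow; the artifact coordinates shown here are assumptions (check Maven Central under the `io.nosqlbench` group for the published module names and current version):

   ```
   <dependency>
     <groupId>io.nosqlbench</groupId>
     <artifactId>engine-api</artifactId>
     <!-- assumed property; substitute the current release version -->
     <version>${nosqlbench.version}</version>
   </dependency>
   ```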
@@ -1,22 +0,0 @@

## Help Topics

### Built-in Component Docs

Generally, all named activity types, input types, output types, etc.
have their own documentation. You can access those with a command like:

    PROG help diag

### Advanced Topics

For any of the topics listed here, you can get detailed help by
running PROG help <topic>.

- topics
- commandline
- cli_scripting
- activity_inputs
- activity_outputs
- cycle_log

@@ -5,31 +5,34 @@ weight: 10

# Getting Support

In general, our goals with NoSQLBench are to make the help systems and examples wrap around the users like a suit of
armor, so that they feel capable of doing most things without having to ask for help. Please keep this in mind when
looking for personal support from our community, and help us find those places where the docs are lacking. Maybe you can
help us by adding some missing docs!

## NoSQLBench Slack

There is a new
[slack channel](https://join.slack.com/t/nosqlbench/shared_invite/zt-cu9f2jpe-XiHN3SsUDcjkVgxaURFuaw) for NoSQLBench.
Please join it if you are a new or existing NoSQLBench user and help us get it going!

## General Feedback

These guidelines are mirrored at the
[Submitting Feedback](https://github.com/nosqlbench/nosqlbench/wiki/Submitting-Feedback) wiki page at the nosqlbench
project site, which is also where any `[Submit Feedback]` links will take you.

## Bug Fixes

If you think you have found a bug, please
[file a bug report](https://github.com/nosqlbench/nosqlbench/issues/new?labels=bug). nosqlbench is actively used within
DataStax, and verified bugs will get attention as resources permit. Bug reports which are more detailed, or which
include steps to reproduce, will get attention first.

## Feature Requests

If you would like to see something in nosqlbench that is not there yet, please
[submit a feature request](https://github.com/nosqlbench/nosqlbench/issues/new?labels=feature).

## Documentation Requests

@@ -5,12 +5,14 @@ weight: 0

## Welcome to NoSQLBench

Welcome to the documentation for NoSQLBench. This is a power tool that emulates real application workloads. This means
that you can fast-track performance, sizing, and data model testing without writing your own testing harness.

To get started right away, jump to the
[Quick Start Example](/index.html#/docs/02_getting_started.html) from the menu on the left.

To see the ways you can get NoSQLBench, check out the project site
[DOWNLOADS.md](https://github.com/nosqlbench/nosqlbench/blob/master/DOWNLOADS.md).

## What is NoSQLBench?

@@ -18,54 +20,44 @@ NoSQLBench is a serious performance testing tool for the NoSQL ecosystem.

**NoSQLBench brings advanced testing capabilities into one tool that are not found in other testing tools.**

- You can run common testing workloads directly from the command line. You can start doing this within 5 minutes of
  reading this.
- You can generate virtual data sets of arbitrary size, with deterministic data and statistically shaped values.
- You can design custom workloads that emulate your application, contained in a single file, based on statement
  templates - no IDE or coding required.
- You can immediately plot your results in a docker and grafana stack on Linux with a single command line option.
- When needed, you can open the access panels and rewire the runtime behavior of NoSQLBench to do advanced testing,
  including a full scripting environment with Javascript.

The core machinery of NoSQLBench has been built with attention to detail. It has been battle tested within DataStax as a
way to help users validate their data models, baseline system performance, and qualify system designs for scale.

In short, NoSQLBench wishes to be a programmable power tool for performance testing. However, it is somewhat generic. It
doesn't know directly about a particular type of system, or protocol. It simply provides a suitable machine harness in
which to put your drivers and testing logic. If you know how to build a client for a particular kind of system, EB will
let you load it like a plugin and control it dynamically.

Initially, NoSQLBench comes with support for CQL, but we would like to see this expanded with contributions from others.

## Origins

The code in this project comes from multiple sources. The procedural data generation capability was known before as
'Virtual Data Set'. The core runtime and scripting harness was from the 'EngineBlock' project. The CQL support was
previously used within DataStax. In March of 2020, DataStax and the project maintainers for these projects decided to
put everything into one OSS project in order to make contributions and sharing easier for everyone. Thus, the new
project name and structure was launched as nosqlbench.io. NoSQLBench is an independent project that is primarily
sponsored by DataStax.

We offer NoSQLBench as a new way of thinking about testing systems. It is not limited to testing only one type of
system. It is our wish to build a community of users and practice around this project so that everyone in the NoSQL
ecosystem can benefit from common concepts and understanding and reliable patterns of use.

## Scalable User Experience

NoSQLBench endeavors to be valuable to all users. We do this by making it easy for you, our user, to do just what you
need without worrying about the rest. If you need to do something simple, it should be simple to find the right settings
and just do it. If you need something more sophisticated, then you should be able to find what you need with a
reasonable amount of effort and no surprises.

That is the core design principle behind NoSQLBench. We hope you like it.

@@ -12,21 +12,17 @@ Some of the features discussed here are only for advanced testing scenarios.

## Hybrid Rate Limiting

Rate limiting is a complicated endeavor, if you want to do it well. The basic rub is that going fast means you have to
be less accurate, and vice-versa. As such, rate limiting is a parasitic drain on any system. The act of rate limiting
in and of itself poses a limit on the maximum rate, regardless of the settings you pick, because it forces your system
to interact with some hardware notion of time passing, and this takes CPU cycles that could be going to the thing you
are limiting.

This means that in practice, rate limiters are often very featureless. It's daunting enough to need rate limiting, and
asking for anything more than that is often wishful thinking. Not so in NoSQLBench.

The rate limiter in NoSQLBench provides a comparable degree of performance and accuracy to others found in the Java
ecosystem, but it *also* has advanced features:

- Allows a sliding scale between average rate limiting and strict rate limiting.
- Internally accumulates delay time, for C.O. friendly metrics
@@ -35,60 +31,48 @@ features:

## Flexible Error Handling

An emergent facility in NoSQLBench is the way that errors are handled within an activity. For example, with the CQL
activity type, you are able to route error handling for any of the known exception types. You can count errors, you can
log them. You can cause errored operations to auto-retry if possible, up to a configurable number of tries.

This means that, as a user, you get to decide what your test is about. Is it about measuring some nominal but
anticipated level of errors due to intentional over-saturation? If so, then count the errors, and look at their
histogram data for timing details within the available timeout.

Are you doing a basic stability test, where you want the test to error out for even the slightest error? You can
configure for that if you need.
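
A hedged sketch of what this configuration can look like on the command line; the `errors` and `maxtries` parameter names are assumptions to verify against `./nb help cql`, and the workload name is hypothetical:

```
# count known exceptions and retry errored operations up to 3 times
./nb run driver=cql workload=my-workload.yaml maxtries=3 errors=count
```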

## Cycle Logging

It is possible to record the result status of each and every cycle in a NoSQLBench test run. If the results are mostly
homogeneous, the RLE encoding of the results will reduce the output file down to a small fraction of the number of
cycles. The errors are mapped to ordinals, and these ordinals are stored into a direct RLE-encoded log file. For most
testing where most of the results are simply success, this file will be tiny. You can also convert the cycle log into
textual form for other testing and post-processing and vice-versa.

## Op Sequencing

The way that operations are planned for execution in NoSQLBench is based on a stable ordering that is configurable. The
statement forms are mixed together based on their relative ratios. The three schemes currently supported are round-robin
with exhaustion (bucket), duplicate in order (concat), and a way to spread each statement out over the unit interval
(interval). These account for most configuration scenarios without users having to micro-manage their statement
templates.
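
A minimal sketch of how ratios drive the planned sequence, assuming the `ratio` statement parameter and a `seq` activity parameter that selects among the schemes above; the statement names are hypothetical:

```
statements:
 - write-op: |
    an insert statement template
   ratio: 3
 - read-op: |
    a select statement template
   ratio: 1
```

Under the concat scheme, this would plan three write-ops followed by one read-op per pass; the interval scheme would instead spread each statement over the unit interval.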

## Sync and Async

There are two distinct usage modes in NoSQLBench when it comes to operation dispatch and thread management:

### Sync

Sync is the default form. In this mode, each thread reads its sequence and dispatches one statement at a time, holding
only one operation in flight per thread. This is the mode you often use when you want to emulate an application's
request-per-thread model, as it implicitly linearizes the order of operations within the computed sequence of
statements.

### Async

In Async mode, each thread in an activity is responsible for juggling a number of operations in-flight. This allows a
NoSQLBench client to juggle an arbitrarily high number of connections, limited primarily by how much memory you have.

Internally, the Sync and Async modes have different code paths. It is possible for an activity type to support one or
both of these.

@@ -5,61 +5,46 @@ weight: 2

# Refined Core Concepts

The core concepts that NoSQLBench is built on have been scrutinized, replaced, refined, and hardened through several
years of use by users of various needs and backgrounds.

This is important when trying to find a way to express common patterns in what is often a highly fragmented practice.
Testing is hard. Scale testing is hard. Distributed testing is hard. We need a set of conceptual building blocks that
can span across workloads and system types, and machinery to put these concepts to use. Some concepts used in NoSQLBench
are shared below for illustration, but this is by no means an exhaustive list.

### The Cycle

Cycles in NoSQLBench are whole numbers on a number line. All operations in a NoSQLBench session are derived from a
single cycle. It's a long value, and a seed. The cycle determines not only which statements (of those available) will
get executed, but it also determines what the values bound to that statement will be.

Cycles are specified as a closed-open `[min,max)` interval, just as slices in some languages. That is, the min value is
included in the range, but the max value is not. This means that you can stack slices using common numeric reference
points without overlaps or gaps. It means you can have exact awareness of what data is in your dataset, even
incrementally.
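
For instance, a minimal sketch (the workload name is hypothetical) of two runs which together cover exactly the cycles `[0,1000000)`, with no overlap or gap at the shared boundary:

```
./nb run driver=cql workload=my-workload.yaml cycles=0..500000
./nb run driver=cql workload=my-workload.yaml cycles=500000..1000000
```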

You can think of a cycle as a single-valued coordinate system for data that lives adjacent to that number on the number
line.

### The Activity

An activity is a multi-threaded flywheel of statements in some sequence and ratio. Activities run over the numbers in a
cycle range. Each activity has a driver type which determines the native protocol that it speaks.

### The Activity Type

An activity type is a high level driver for a protocol. It is like a statement-aware cartridge that knows how to take a
basic statement template and turn it into an operation for the scenario to execute.

### The Scenario

The scenario is a runtime session that holds the activities while they run. A NoSQLBench scenario is responsible for
aggregating global runtime settings, metrics reporting channels, logfiles, and so on.

### The Scenario Script

Each scenario is governed by a script that runs single-threaded, asynchronously from activities, but in control of
activities. If needed, the scenario script is automatically created for the user, and the user never knows it is there.
If the user has advanced testing requirements, then they may take advantage of the scripting capability at such time.
When the script exits, *AND* all activities are complete, then the scenario is complete.

@@ -5,48 +5,43 @@ weight: 12

# High Fidelity Metrics

Since NoSQLBench has been built as a serious testing tool for all users, some attention was necessary on the way metrics
are used.

## Discrete Reservoirs

In NoSQLBench, we avoid the use of time-decaying metrics reservoirs. Internally, we use HDR reservoirs with discrete
time boundaries. This is so that you can look at the min and max values and know that they apply accurately to the whole
sampling window.

## Metric Naming

All running activities have a symbolic alias that identifies them for the purposes of automation and metrics. If you
have multiple activities running concurrently, they will have different names and will be represented distinctly in the
metrics flow.

## Precision and Units

By default, the internal HDR histogram reservoirs are kept at 4 digits of precision. All timers are kept at nanosecond
resolution.

## Metrics Reporting

Metrics can be reported via graphite as well as CSV, logs, HDR logs, and HDR stats summary CSV files.

## Coordinated Omission

The metrics naming and semantics in NoSQLBench are set up so that you can have coordinated omission metrics when they
are appropriate, but no other changes when they are not. This means that the metric names and meanings remain stable in
any case.

Particularly, NoSQLBench avoids the term "latency" altogether, as it is often overused and thus prone to confusing
people.

Instead, the terms `service time`, `wait time`, and `response time` are used. These are abbreviated in metrics as
`servicetime`, `waittime`, and `responsetime`.

The `servicetime` metric is the only one which is always present. When a rate limiter is used, then additionally
`waittime` and `responsetime` are reported.

@@ -5,23 +5,18 @@ weight: 10

# NoSQLBench Showcase

Since NoSQLBench is new on the scene in its current form, you may be wondering why you would want to use it over any
other tool. That is what this section is all about.

If you want to look under the hood of this toolkit before giving it a spin, this section is for you. You don't have to
read all of this! It is here for those who want to know the answer to the question "So, what's the big deal??" Just
remember it is here for later if you want to skip to the next section and get started testing.

NoSQLBench can do nearly everything that other testing tools can do, and more. It achieves this by focusing on a
scalable user experience in combination with a modular internal architecture.

NoSQLBench is a workload construction and simulation tool for scalable systems testing. That is an entirely different
scope of endeavor than most other tools.

The pages in this section all speak to advanced capabilities that are unique to NoSQLBench. In time, we want to show
these with basic scenario examples, right in the docs.

@@ -5,23 +5,18 @@ weight: 11

# Modular Architecture

The internal architecture of NoSQLBench is modular throughout. Everything from the scripting extensions to the data
generation functions is enumerated at compile time into a service descriptor, and then discovered at runtime by the SPI
mechanism in Java.

This means that extending and customizing bundles and features is quite manageable.

It also means that it is relatively easy to provide a suitable API for multi-protocol support. In fact, there are
several drivers available in the current NoSQLBench distribution. You can list them out with `./nb --list-drivers`, and
you can get help on how to use each of them with `./nb help <name>`.

This also is a way for us to encourage and empower other contributors to help develop the capabilities and reach of
NoSQLBench as a bridge building tool in our community. This level of modularity is somewhat unusual, but it serves the
purpose of helping users with new features.

@@ -5,47 +5,38 @@ weight: 2

# Portable Workloads

All of the workloads that you can build with NoSQLBench are self-contained in a workload file. This is a
statement-oriented configuration file that contains templates for the operations you want to run in a workload.

This defines part of an activity - the iterative flywheel part that is run directly within an activity type. This file
contains everything needed to run a basic activity -- a set of statements in some ratio. It can be used to start an
activity, or as part of several activities within a scenario.

## Standard YAML Format

The format for describing statements in NoSQLBench is generic, but in a particular way that is specialized around
describing statements for a workload.

That means that you can use the same YAML format to describe a workload for kafka as you can for Apache Cassandra or
DSE.

The YAML structure has been tailored to describing statements, their data generation bindings, how they are grouped and
selected, and the parameters needed by drivers, like whether they should be prepared statements or not.

Further, the YAML format allows for defaults and overrides with a very simple mechanism that reduces editing fatigue for
frequent users.

You can also template document-wide macro parameters which are taken from the command line parameters just like any
other parameter. This is a way of templating a workload and making it multi-purpose or adjustable on the fly.
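
A minimal sketch of this, assuming the `<<name:default>>` macro form; the keyspace, table, and binding names here are hypothetical:

```
statements:
 - write: |
    insert into <<keyspace:baselines>>.mytable (key, value) values ({key},{value})
```

Running with `keyspace=test1` on the command line substitutes `test1` wherever the macro appears; omitting it falls back to the default `baselines`.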

## Experimentation Friendly

Because the workload YAML format is generic across activity types, it is possible to ask one activity type to interpret
the statements that are meant for another. This isn't generally a good idea, but it becomes extremely handy when you
want to have a very high level activity type like `stdout` use a lower-level syntax like that of the `cql` activity
type. When you do this, the stdout activity type _plays_ the statements to your console as they would be executed in
CQL, data bindings and all.
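
For example, a short sketch of this technique; the workload file name is hypothetical:

```
# render ten cycles of the statements, fully bound, to the console
./nb run driver=stdout workload=my-cql-workload.yaml cycles=10
```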

This means you can empirically and substantively demonstrate and verify access patterns, data skew, and other dataset
details before you change back to cql mode and turn up the settings for a higher scale test.

@@ -5,91 +5,68 @@ weight: 3

# Scripting Environment

The ability to write open-ended testing simulations is provided in EngineBlock by means of a scripted runtime, where
each scenario is driven from a control script that can do anything the user wants.

## Dynamic Parameters

Some configuration parameters of activities are designed to be assignable while a workload is running. This makes things
like threads, rates, and other workload dynamics pseudo real-time. The internal APIs work with the scripting environment
to expose these parameters directly to scenario scripts.

## Scripting Automatons

When a NoSQLBench scenario is running, it is under the control of a single-threaded script. Each activity that is
started by this script is run within its own threadpool, asynchronously.

The control script has executive control of the activities, as well as full visibility into the metrics that are
provided by each activity. The way these two parts of the runtime meet is through the service objects which are
installed into the scripting runtime. These service objects provide a named access point for each running activity and
its metrics.
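
A minimal ECMAScript sketch of a control script using these service objects; the alias and the exact method names here are illustrative assumptions rather than a fixed API reference:

```
// start an activity, let it run briefly, then adjust it in flight
scenario.start("driver=stdout alias=main cycles=1000000");
scenario.waitMillis(2000);
activities.main.threads = "8"; // dynamic parameter assignment
scenario.awaitActivity("main");
```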

This means that the scenario script can do something simple, like start activities and wait for them to complete, OR, it
can do something more sophisticated like dynamically and iteratively scrutinize the metrics and make realtime adjustments
to the workload while it runs.

## Analysis Methods

Scripting automatons that do feedback-oriented analysis of a target system are called analysis methods in NoSQLBench. We
have prototyped a couple of these already, but there is nothing keeping the adventurous from coming up with their own.

## Command Line Scripting

The command line has the form of basic test commands and parameters. These commands get converted directly into scenario
control script in the order they appear. The user can choose whether to stay in high level executive mode, with simple
commands like "run workload=...", or to drop down directly into script design. They can look at the equivalent script
for any command line by running with --show-script. If you take the script that is dumped to the console and run it, it
should do exactly the same thing as if you had just run the standard commands.
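
For instance (the workload name is hypothetical):

```
# print the equivalent scenario script for this command line
./nb run driver=stdout workload=my-workload.yaml cycles=10 --show-script
```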

There are even ways to combine script fragments, full commands, and calls to scripts on the command line. Since each
variant is merely a way of constructing scenario script, they all get composited together before the scenario script is
run.

New introductions to NoSQLBench should focus on the command line. Once a user is familiar with this, it is up to them
whether to tap into the deeper functionality. If they don't need to know about scenario scripting, then they shouldn't
have to learn about it to be effective.

## Compared to DSLs

Other tools may claim that their DSL makes scenario "simulation" easier. In practice, any DSL is generally dependent on
a development tool to lay the language out in front of a user in a fluent way. This means that DSLs are almost always
developer-targeted tools, and mostly useless for casual users who don't want to break out an IDE.

One of the things a DSL proponent may tell you is that it tells you "all the things you can do!". This is de-facto the
same thing as telling you "all the things you can't do", because anything that is not part of the DSL is out of reach.
This is not a win for the user. For DSL-based systems, the user has to use the DSL whether or not it enhances their
creative control, while in fact, most DSLs aren't rich enough to do much that is interesting from a simulation
perspective.

In NoSQLBench, we don't force the user to use the programming abstractions except at a very surface level -- the CLI. It
is up to the user whether or not to open the secret access panel for the more advanced functionality. If they decide to
do this, we give them a commodity language (ECMAScript), and we wire it into all the things they were already using. We
don't take away their expressivity by telling them what they can't do. This way, users can pick their level of
investment and reward as best fits their individual needs, as it should be.

## Scripting Extensions

Also mentioned under the section on modularity, it is relatively easy for a developer to add their own scripting
extensions into NoSQLBench.

@@ -5,92 +5,71 @@ weight: 1

# Virtual Datasets

The _Virtual Dataset_ capabilities within NoSQLBench allow you to generate data on the fly. There are many reasons for
using this technique in testing, but it is often a topic that is overlooked or taken for granted.

## Industrial Strength

The algorithms used to generate data are based on advanced techniques in the realm of variate sampling. The authors have
gone to great lengths to ensure that data generation is efficient and as close to O(1) in processing time as possible.

For example...
One technique that is used to achieve this is to initialize and cache
|
||||
data in high resolution look-up tables for distributions which may perform
|
||||
differently depending on their density functions. The existing Apache
|
||||
Commons Math libraries have been adapted into a set of interpolated
|
||||
Inverse Cumulative Distribution sampling functions. This means that
|
||||
you can use a Zipfian distribution in the same place as you would a
|
||||
Uniform distribution, and once initialized, they sample with identical
|
||||
overhead. This means that by changing your test definition, you don't
|
||||
accidentally change the behavior of your test client.
|
||||
One technique that is used to achieve this is to initialize and cache data in high resolution look-up tables for
|
||||
distributions which may perform differently depending on their density functions. The existing Apache Commons Math
|
||||
libraries have been adapted into a set of interpolated Inverse Cumulative Distribution sampling functions. This means
|
||||
that you can use a Zipfian distribution in the same place as you would a Uniform distribution, and once initialized,
|
||||
they sample with identical overhead. This means that by changing your test definition, you don't accidentally change the
|
||||
behavior of your test client.
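As a sketch of what this interchangeability looks like, the binding below could swap a uniform sampler for a Zipfian one with a one-line change. The function names `Uniform` and `Zipf` and their signatures are assumptions for illustration; check the virtual dataset function reference for the exact forms in your version.

```
bindings:
 # assumed: uniform sampling over [0,1000000)
 rank: Uniform(0,1000000) -> long
 # assumed Zipfian alternative; once its interpolated lookup table is
 # initialized, per-op sampling overhead matches the uniform case
 # rank: Zipf(1000000,1.2) -> long
```
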
## The Right Tool

Many other testing systems avoid building a dataset generation component. It's a tough problem to solve, so it's often just avoided. Instead, they use libraries like "faker" and variations on that. However, faker is well named, no pun intended. It was meant as a vignette library, not a source of test data for realistic results. If you are using a testing tool for scale testing and relying on a faker variant, then you will almost certainly get invalid results for any serious test.

The virtual dataset component of NoSQLBench is a library that was designed for high scale and realistic data streams.

## Deterministic

The data that is generated by the virtual dataset libraries is deterministic. This means that for a given cycle in a test, the operation that is synthesized for that cycle will be the same from one session to the next. This is intentional. If you want to perturb the test data from one session to the next, then you can most easily do it by simply selecting a different set of cycles as your basis.

This means that if you find something interesting in a test run, you can go back to it just by specifying the cycles in question. It also means that you aren't losing comparative value between tests with additional randomness thrown in. The data you generate will still look random to the human eye, but that doesn't mean that it can't be reproducible.

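For example, if something interesting happened somewhere in cycles 1000 through 2000 of a run, re-running exactly those cycles synthesizes exactly the same operations. This sketch reuses the cycle-range form described in the getting started guide:

```
# same cycles => same generated operations, session after session
./nb run driver=cql workload=cql-keyvalue tags=phase:main host=<dse-host-or-ip> cycles=1000..2000
```
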
## Statistically Shaped

All this means is that the values you use to tie your dataset together can be specific to any distribution that is appropriate. You can ask for a stream of floating point values 1 trillion values long, in any order. You can use discrete or continuous distributions, with whatever parameters you need.

## Best of Both Worlds

Some might worry that fully synthetic testing data is not realistic enough. The devil is in the details on these arguments, but suffice it to say that you can pick the level of real data you use as seed data with NoSQLBench.

For example, using the alias sampling method and a published US census (public domain) list of names and surnames that occurred more than 100x, we can provide extremely accurate samples of names according to the discrete distribution we know of. The alias method allows us to sample accurately in O(1) time from the entire dataset by turning a large number of weights into two uniform samples. You will simply not find a better way to sample US names than this. (But if you do, please file an issue!)

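To make the weighted-sampling idea concrete, here is a sketch of a binding that draws values in proportion to fixed weights. The `WeightedStrings` function name and its argument format are assumptions for illustration, not the exact census-backed functions:

```
bindings:
 # assumed: samples each value in proportion to its weight, using an
 # alias-style O(1) lookup under the hood
 first_name: WeightedStrings('James:5;Mary:4;Robert:3;Linda:2')
```
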
## Java Idiomatic Extension

The way that the virtual dataset component works allows Java developers to write any extension to the data generation functions simply in the form of Java 8 (or newer) functional interfaces. As long as they include the annotation processor and annotate their classes, they will show up in the runtime and be available to any workload by their class name.

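Once such a function is on the classpath, referencing it from a binding by its class name is all that remains. The package and class below are hypothetical:

```
bindings:
 # hypothetical user-written function, discovered via the annotation
 # processor and referenced here by its class name
 tag: com.example.virtdata.MyTagFunction()
```
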
## Binding Recipes

It is possible to stitch data generation functions together directly in a workload YAML. These are data-flow sketches of functions that can be copied and pasted between workload descriptions to share or remix data streams. This allows the adventurous to build sophisticated virtual datasets that emulate nuances of real datasets, but in a form that takes up less space on the screen than this paragraph! A small example follows.

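The recipe below chains functions together left to right with `;`. These particular bindings are borrowed from the `cql-keyvalue` workload family, and the exact functions may differ slightly between versions:

```
bindings:
 seq_key: Mod(1000000); ToString() -> String
 seq_value: Hash(); Mod(1000000000); ToString() -> String
```
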
## Create a Schema

We will start by creating a simple schema in the database. From your command line, go ahead and execute the following command, replacing the `host=<dse-host-or-ip>` with that of one of your database nodes.

```
./nb run driver=cql workload=cql-keyvalue tags=phase:schema host=<dse-host-or-ip>
```

This command is creating the following schema in your database:

```cql
CREATE KEYSPACE baselines
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
    AND durable_writes = true;

CREATE TABLE baselines.keyvalue (
    key text PRIMARY KEY,
    value text
);
```

Let's break down each of those command line options.

`start` tells nosqlbench to start an activity.

`driver=...` is used to specify the activity type (driver). In this case we are using `cql`, which tells nosqlbench to use the DataStax Java Driver and execute CQL statements against a database.

`workload=...` is used to specify the workload definition file that defines the activity. In this example, we use `cql-keyvalue`, which is a pre-built workload that is packaged with nosqlbench.

`tags=phase:schema` tells nosqlbench to run the yaml block that has the `phase:schema` defined as one of its tags. In this example, that is the DDL portion of the `cql-keyvalue` workload.

`host=...` tells nosqlbench how to connect to your database; only one host is necessary.

If you like, you can verify the result of this command by describing your keyspace in cqlsh or DataStax Studio with `DESCRIBE KEYSPACE baselines`.

## Load Some Data

Before running a test of typical access patterns where you want to capture the results, you need to make the test more interesting than loading an empty table. For this, we use the rampup phase.

Before sending our test writes to the database, we will use the `stdout` activity type so we can see what nosqlbench is generating for CQL statements.

Go ahead and execute the following command:

    ./nb run driver=stdout workload=cql-keyvalue cycles=10

You should see output like this:

```
...
insert into baselines.keyvalue (key, value) values (8,296173906);
insert into baselines.keyvalue (key, value) values (9,97405552);
```

NoSQLBench deterministically generates data, so the generated values will be the same from run to run.

Now we are ready to write some data to our database. Go ahead and execute the following from your command line:

    ./nb run driver=cql workload=cql-keyvalue tags=phase:rampup host=<dse-host-or-ip> cycles=100k --progress console:1s

Note the differences between this and the command that we used to generate the schema.

`tags=phase:rampup` is running the yaml block in `cql-keyvalue` that has only INSERT statements.

`cycles=100k` will run a total of 100,000 operations, in this case, 100,000 writes. You will want to pick an appropriately large number of cycles in actual testing to make your main test meaningful.

:::info
The cycles parameter is not just a quantity. It is a range of values. The `cycles=n` format is short for `cycles=0..n`, which makes cycles a zero-based quantity by default. For example, `cycles=5` means that the activity will use cycles 0, 1, 2, 3, 4, but not 5. The reason for this is explained in detail in the Activity Parameters section.
:::

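For illustration, these two forms select exactly the same cycles:

```
cycles=100k       # shorthand for cycles=0..100000
cycles=0..100000  # the same range, written out explicitly
```
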
These parameters are explained in detail in the section on _Activity Parameters_.

`--progress console:1s` will print the progression of the run to the console every 1 second.

You should see output that looks like this:

```
cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
...
cql-keyvalue: 100.00%/Finished (details: min=0 cycle=100000 max=100000)
```

## Run the main test phase

Now that we have a base dataset of 100k rows in the database, we will run a mixed read / write workload. By default, this runs a 50% read / 50% write workload.

    ./nb start driver=cql workload=cql-keyvalue tags=phase:main host=<dse-host-or-ip> cycles=100k cyclerate=5000 threads=50 --progress console:1s

You should see output that looks like this:

```
Logging to logs/scenario_20190812_154431_028.log
cql-keyvalue: 0.50%/Running (details: min=0 cycle=500 max=100000)
...
```

We have a few new command line options here:

`tags=phase:main` is using a new block in our activity's yaml that contains both read and write queries.

`threads=50` is an important one. The default for nosqlbench is to run with a single thread. This is not adequate for workloads that will be running many operations, so threads is used as a way to increase concurrency on the client side.

`cyclerate=5000` is used to control the operations per second that are initiated by nosqlbench. This command line option is the primary means to rate limit the workload, and here we are running at 5000 ops/sec.

## Now What?

Note in the above output, we see `Logging to logs/scenario_20190812_154431_028.log`.

By default nosqlbench records the metrics from the run in this file. We will go into detail about these metrics in the next section, Viewing Results.

# Example Results

We just ran a very simple workload against our database. In that example, we saw that nosqlbench writes to a log file, and it is in that log file where the most basic form of metrics are displayed.

## Log File Metrics

For our previous run, we saw that nosqlbench was writing to `logs/scenario_20190812_154431_028.log`.

Even when you don't configure nosqlbench to write its metrics to another location, it will periodically report all the metrics to the log file. At the end of a scenario, before nosqlbench shuts down, it will flush the partial reporting interval again to the logs. This means you can always look in the logs for metrics information.

:::warning
If you look in the logs for metrics, be aware that the last report will only contain a partial interval of results. When looking at the last partial window, only metrics which average over time or which compute the mean for the whole test will be meaningful.
:::

Below is a sample of the log that gives us our basic metrics. There is a lot to digest here; for now we will only focus on a subset of the most important metrics.

```
2019-08-12 15:46:00,274 INFO [main] i.e.c.ScenarioResult [ScenarioResult.java:48] -- BEGIN METRICS DETAIL --
...
```

The log contains lots of information on metrics, but this is obviously _not_ the most desirable way to consume metrics from nosqlbench.

We recommend that you use one of these methods, according to your environment or tooling available:

3. Record your metrics to local CSV files with `--report-csv-to my_metrics_dir`
4. Record your metrics to HDR logs with `--log-histograms my_hdr_metrics.log`

See the command line reference for details on how to route your metrics to a metrics collector or format of your preference.

# Example Metrics

A set of core metrics are provided for every workload that runs with nosqlbench, regardless of the activity type and protocol used. This section explains each of these metrics and shows an example of them from the log file.

## metric: result

This is the primary metric that should be used to get a quick idea of the throughput and latency for a given run. It encapsulates the entire operation life cycle (i.e. bind, execute, get result back).

For this example we see that we averaged 3732 operations / second with 3.6ms 75th percentile latency and 23.9ms 99th percentile latency. Note the raw metrics are in microseconds. This duration_unit may change depending on how a user configures nosqlbench, so always double-check it.

```
2019-08-12 15:46:01,310 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER,
name=cql-keyvalue.result, count=100000, min=233.48, max=358596.607, mean=3732.00338612, stddev=10254.850416061185,
median=1874.815, p75=3648.767, p95=10115.071, p98=15855.615, p99=23916.543, p999=111292.415,
mean_rate=4024.0234405430424, m1=3514.053841156124, m5=3307.431472596865, m15=3268.6786509004132,
rate_unit=events/second, duration_unit=microseconds
```

## metric: result-success

This metric shows whether there were any errors during the run. You can confirm that the count is equal to the number of cycles for the run if you are expecting or requiring zero failed operations.

Here we see that all 100k of our cycles succeeded. Note that the metrics for throughput and latency here are slightly different than the `results` metric simply because this is a separate timer that only includes operations which completed with no exceptions.

```
2019-08-12 15:46:01,452 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER,
name=cql-keyvalue.result-success, count=100000, min=435.168, max=358645.759, mean=3752.40990808,
stddev=10251.524945886964, median=1889.791, p75=3668.479, p95=10154.495, p98=15884.287, p99=24280.063,
p999=111443.967, mean_rate=4003.3090048756894, m1=3523.40328629036, m5=3318.8463896065778, m15=3280.480326762243,
rate_unit=events/second, duration_unit=microseconds
```

## metric: resultset-size

For read workloads, this metric shows the size of result sent back to nosqlbench from the server. This is useful to confirm that you are reading rows that already exist in the database.

```
2019-08-12 15:46:00,298 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM,
name=cql-keyvalue.resultset-size, count=100000, min=0, max=1, mean=8.0E-5, stddev=0.008943914131967056,
median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
```

## metric: tries

NoSQLBench will retry failures 10 times by default; this is configurable via the `maxtries` command line option for the cql activity type. This metric shows a histogram of the number of tries that each operation required. In this example, there were no retries, as the `count` is 100k.

```
2019-08-12 15:46:00,341 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM,
name=cql-keyvalue.tries, count=100000, min=1, max=1, mean=1.0, stddev=0.0, median=1.0,
p75=1.0, p95=1.0, p98=1.0, p99=1.0, p999=1.0
```

### More Metrics

nosqlbench provides many ways to report the metrics from a run, including:

- Reporting to Graphite
- Reporting to HDR

To get more information on these options, see the output of

    ./nb --help

You have completed your first run with nosqlbench!

In the 'Next Steps' section, you'll find options for how to continue, whether you are looking for basic testing or something more advanced.

# Next Steps

Now that you've run nosqlbench for the first time and seen what it does, you can choose what level of customization you want for further testing.

The sections below describe key areas that users typically customize when working with nosqlbench.

Everyone who uses nosqlbench will want to get familiar with the 'NoSQLBench Basics' section below. This is essential reading for new and experienced testers alike.

## High-Level Users

Several canonical workloads are already baked-in to nosqlbench for immediate use. If you are simply wanting to drive workloads from nosqlbench without building a custom workload, then you'll want to learn about the available workloads and their options.

Recommended reading for high-level testing workflow:
1. 'Built-In Workloads'

## Workload Builders

If you want to use nosqlbench to build a tailored workload that closely emulates what a specific application would do, then you can build a YAML file that specifies all of the details of an iterative workload. You can specify the access patterns, data distributions, and more.

The recommended reading for this is:

## Scenario Developers

The underlying runtime for a scenario in nosqlbench is based on EngineBlock, which means it has all the scripting power that comes with that. For advanced scenario designs, iterative testing models, or analysis methods, you can use ECMAScript to control the scenario from start to finish. This is an advanced feature that is not recommended for first-time users. A guide for scenario developers will be released in increments.

## Downloading

NoSQLBench is packaged directly as a Linux binary named `nb` and as an executable Java jar named `nb.jar`.

The Linux binary is recommended, since it comes with its own JVM and eliminates the need to manage Java downloads. Both can be obtained at the releases section of the main NoSQLBench project:

- [NoSQLBench Releases](https://github.com/nosqlbench/nosqlbench/releases)

:::info
Once you download the binary, you may need to `chmod +x nb` to make it executable.

If you choose to use the nb.jar instead of the binary, it is recommended to run it with at least Java 12.
:::

This documentation assumes you are using the Linux binary, initiating NoSQLBench commands with `./nb`. If you are using the jar, just replace `./nb` with `java -jar nb.jar` when running commands.

## Running

To provide your own contact points (comma separated), add the `hosts=` parameter:

    ./nb cql-iot hosts=host1,host2

Additionally, if you have docker installed on your local system, and your user has permissions to use it, you can use `--docker-metrics` to stand up a live metrics dashboard at port 3000.

    ./nb cql-iot --docker-metrics

This example doesn't go into much detail about what it is doing. It is here to show you how quickly you can start running real workloads without having to learn much about the machinery that makes it happen.

The rest of this section has a more elaborate example that exposes some of the basic options you may want to adjust for your first serious test.

---------------------------------------

Help ( You're looking at it. )

    --help

Short options, like '-v' represent simple options, like verbosity. Using multiples increases the level of the option, like '-vvv'.

Long options, like '--help' are top-level options that may only be used once. These modify general behavior, or allow you to get more details on how to use nosqlbench.

All other options are either commands, or named arguments to commands. Any single word without dashes is a command that will be converted into script form. Any option that includes an equals sign is a named argument to the previous command. The following example is a commandline with a command *start*, and two named arguments to that command.

    ./nb start driver=diag alias=example

### Discovery options ###

These options help you learn more about running nosqlbench, and about the plugins that are present in your particular version.

Get a list of additional help topics that have more detailed documentation:

    ./nb help topics

### Execution Options ###

This is how you actually tell nosqlbench what scenario to run. Each of these commands appends script logic to the scenario that will be executed. These are all commands; they can occur in any order and quantity. The only rule is that arguments in the arg=value form will apply to the preceding script or activity.

Add the named script file to the scenario, interpolating named parameters:

    --progress logonly:5m

If you want to add in classic time decaying histogram metrics for your histograms and timers, you may do so with this option:

    --classic-histograms prefix
    --classic-histograms 'prefix:.*'                 # same as above
    --classic-histograms 'prefix:.*specialmetrics'   # subset of names

Name the current session, for logfile naming, etc. By default, this will be "scenario-TIMESTAMP", and a logfile will be created for this name.

    --session-name <name>

Enlist engineblock to stand up your metrics infrastructure using a local docker:

    --docker-metrics

When this option is set, engineblock will start graphite, prometheus, and grafana automatically on your local docker, configure them to work together, and point engineblock to send metrics to the system automatically. It also imports a base dashboard for engineblock and configures grafana snapshot export to share with a central DataStax grafana instance (grafana can be found on localhost:3000 with the default credentials admin/admin).

### Console Options ###

Increase console logging levels: (Default console logging level is *warning*)

    -v (info)

    --progress console:1m (disables itself if -v options are used)

These levels affect *only* the console output level. Other logging level parameters affect logging to the scenario log, stored by default in logs/...

Show version, long form, with artifact coordinates.

# Grafana Metrics

NoSQLBench comes with a built-in helper to get you up and running quickly with client-side testing metrics. This functionality is based on docker, and a built-in method for bringing up a docker stack, automated by NoSQLBench.

:::warning
This feature requires that you have docker running on the local system and that your user is in a group that is allowed to manage docker. Using the `--docker-metrics` command *will* attempt to manage docker on your local system.
:::

To ask nosqlbench to stand up your metrics infrastructure using a local docker runtime, use this command line option with any other nosqlbench commands:

    --docker-metrics

When this option is set, nosqlbench will start graphite, prometheus, and grafana automatically on your local docker, configure them to work together, and send metrics to the system automatically. It also imports a base dashboard for nosqlbench and configures grafana snapshot export to share with a central DataStax grafana instance (grafana can be found on localhost:3000 with the default credentials admin/admin).

# Parameter Types

To configure a nosqlbench activity to do something meaningful, you have to provide parameters to it. This can occur in one of several ways. This section is a guide on nosqlbench parameters, how they layer together, and when to use one form over another.

The command line is used to configure both the overall nosqlbench runtime (logging, etc.) as well as the individual activities and scripts. Global nosqlbench options can be distinguished from scenario commands and their parameters because global options always start with a single or double hyphen.

## Activity Parameters

Parameters for an activity always have the form of `<name>=<value>` on the command line. Activity parameters *must* follow a command, such as `run` or `start`, for example. Scenario commands are always single words without any leading hyphens. Every command-line argument that follows a scenario command in the form of `<name>=<value>` is a parameter to that command.

Activity parameters can be provided by the nosqlbench core runtime or they can be provided by the activity type. All of the params are usable to configure an activity together. It's not important where they are provided from, so long as you know what they do for your workloads, how to configure them, and where to find the docs.

*Core* Activity Parameters are those provided by the core runtime. They are part of the core API and used by every activity type. Core activity params include *type*, *alias*, and *threads*, for example. These parameters are explained individually under the next section.

*Custom* Activity Parameters are those provided by an activity type. These parameters are documented for each activity type. You can see them by running `nosqlbench help <activity type>`.

Activity type parameters may be dynamic. *Dynamic* Activity Parameters are parameters which may be changed while an activity is running. This means that scenario scripting logic may change some variables while an activity is running, and that the runtime should dynamically adjust to match. Dynamic parameters are mainly used in more advanced scripting scenarios.

Parameters that are dynamic should be documented as such in the respective activity type's help page.

### Template Parameters

If you need to provide general-purpose overrides to a named section of the standard YAML, then you may use a mechanism called _template parameters_. These are just like activity parameters, but they are set via macro and can have defaults. This is a YAML format feature that allows you to easily template workload properties in a way that is easy to override on the command line or via scripting. More details on template parameters are shared under 'Designing Workloads|Template Params'. A sketch follows.

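The example below assumes the `<<name:default>>` macro form for template parameters; the authoritative syntax is in the 'Designing Workloads|Template Params' section:

```
statements:
 - create-keyspace: |
    CREATE KEYSPACE <<keyspace:baselines>> WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': '<<rf:1>>'};
```

With this in place, `keyspace=test rf=3` on the command line would override both defaults.
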
### Parameter Loading

Now that we've described all the parameter types, let's tie them together. When an activity is loaded from the command line or script, the parameters are resolved in the following order:

1. The `type` parameter tells nosqlbench which activity type implementation to load.
2. The activity type implementation creates an activity.

## Statement Parameters

Some activities make use of parameters for statements. These are called _statement parameters_ and are completely different than _activity parameters_. Statement parameters in a YAML allow you to affect *how* a statement is used in a workload. Just as with activity level parameters, statement parameters may be supported by the core runtime or by an activity type. These are also documented in the respective activity type's documentation included in the 'Activity Types' section.

The core statement parameters are explained just below the core activity parameters in this section.

---
title: Activity Parameters
weight: 05
---

# Activity Parameters

Activity parameters are passed as named arguments for an activity, either on the command line or via a scenario script. On the command line, these take the form of

    <paramname>=<paramvalue>

Some activity parameters are universal in that they can be used with any driver type. These parameters are recognized by nosqlbench whether or not they are recognized by a particular driver implementation. These are called _core parameters_. Only core activity parameters are documented here.

:::info
To see what activity parameters are valid for a given activity type, see the documentation for that activity type with `nosqlbench help <activity type>`.
:::

## driver

Some statement parameters are recognized by the nosqlbench runtime and can be used on any statement.

## *ratio*

A statement parameter called _ratio_ is supported by every workload. It can be attached to a statement, or to a block or document-level parameter block. It sets the relative ratio of a statement in the op sequence before an activity is started.

When an activity is initialized, all of the active statements are combined into a sequence based on their relative ratios. By default, all statement templates are initialized with a ratio of 1 if none is specified by the user.

For example, consider the statements below:

```
statements:
 - s1: statement one
   ratio: 1
 - s2: statement two
   ratio: 2
 - s3: statement three
   ratio: 3
```

If all statements are activated (there is no tag filtering), then the activity will be initialized with a sequence length of 6. In this case, the relative ratio of statement "s3" will be 50% overall. If you filtered out the first statement, then the sequence would be 5 operations long. In this case, the relative ratio of statement "s3" would be 60% overall. It is important to remember that statement ratios are always relative to the total sum of the active statements' ratios.

:::info
Because the ratio works so closely with the activity parameter `seq`, the description for that parameter is included below.
:::

### *seq* (activity level - do not use on statements)

- _required_: no
- _dynamic_: no

The `seq=<bucket|concat|interval>` parameter determines the type of sequencing that will be used to plan the op sequence. The op sequence is a look-up table that is used for each stride to pick statement forms according to the cycle offset. It is simply the sequence of statements from your YAML that will be executed, but in a pre-planned, and highly efficient form.

An op sequence is planned for every activity. With the default ratio on every statement as 1, and the default bucket scheme, the basic result is that each active statement will occur once in the order specified. Once you start adding ratios to statements, the most obvious thing that you might expect will happen: those statements will occur multiple times to meet their ratio in the op mix. You can customize the op mix further by changing the seq parameter to concat or interval.

:::info
The op sequence is a look-up table of statement templates, *not* individual statements or operations. Thus, the cycle still determines the uniqueness of an operation as you would expect. For example, if statement form ABC occurs 3x per sequence because you set its ratio to 3, then each of these would manifest as a distinct operation with fields determined by distinct cycle values.
:::

There are three schemes to pick from:

### bucket

This is a round-robin planner which draws operations from buckets in circular fashion, removing each bucket as it is exhausted. For example, the ratios A:4, B:2, C:1 would yield the sequence A B C A B A A. The ratios A:1, B:5 would yield the sequence A B B B B B.

### concat

This simply takes each statement template as it occurs in order and duplicates it in place to achieve the ratio. The ratios above (A:4, B:2, C:1) would yield the sequence A A A A B B C for the concat sequencer.

### interval

This is arguably the most complex sequencer. It takes each ratio as a frequency over a unit interval of time, and apportions the associated operation to occur evenly over that time. When two operations would be assigned the same time, the order of appearance establishes precedence. In other words, statements appearing first win ties for the same time slot. The ratios A:4 B:2 C:1 would yield the sequence A B C A A B A. This occurs because, over the unit interval (0.0,1.0), A is assigned the positions `A: 0.0, 0.25, 0.5, 0.75`, B is assigned the positions `B: 0.0, 0.5`, and C is assigned position `C: 0.0`. These offsets are all sorted with a position-stable sort, and then the associated ops are taken in that order.

In detail, the rendering appears as `0.0(A), 0.0(B), 0.0(C), 0.25(A), 0.5(A), 0.5(B), 0.75(A)`, which yields `A B C A A B A` as the op sequence.

This sequencer is most useful when you want a stable ordering of operations from a rich mix of statement types, where each operation is spaced as evenly as possible over time, and where it is not important to control the cycle-by-cycle sequencing of statements.
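
To recap the three schemes, here is a hedged sketch (statement bodies are placeholders) using the ratios from the examples above, with the op sequence each scheme would plan:

```yaml
statements:
  - A: statement body A
    ratio: 4
  - B: statement body B
    ratio: 2
  - C: statement body C
    ratio: 1
# seq=bucket   -> A B C A B A A
# seq=concat   -> A A A A B B C
# seq=interval -> A B C A A B A
```
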
# NoSQLBench Basics

This section covers the essential details that you'll need to run nosqlbench in different ways.

## Description

The CQL IoT workload demonstrates a time-series telemetry system as typically found in IoT applications. The bulk of the traffic is telemetry ingest. This is useful for establishing steady-state capacity with an actively managed data lifecycle. This is a steady-state workload, where inserts are 90% of the operations and queries are the remaining 10%.

## Schema

    CREATE KEYSPACE baselines WITH replication =
        { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };

    CREATE TABLE baselines.iot (
        station_id UUID,
        machine_id UUID,

2. rampup - Ramp-Up to steady state for normative density, writes only 100M rows
3. main - Run at steady state with 10% reads and 90% writes, 100M rows

For in-depth testing, this workload will take some time to build up data density where TTLs begin purging expired data. At this point, the test should be considered steady-state.

## Data Set

    select * from baselines.iot
        where machine_id=? and sensor_name=?
        limit 10

## Workload Parameters

This workload has no adjustable parameters when used in the baseline tests.

When used for additional testing, the following parameters should be supported:

- compression - enabled or disabled, to disable, set compression=''
- write_cl - the consistency level for writes (default: LOCAL_QUORUM)
- read_cl - the consistency level for reads (default: LOCAL_QUORUM)
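
Parameters like these are typically wired into the workload YAML with the template-param macros described later in this guide. A hedged sketch (the statement param name `cl` is an assumption here, standing in for whatever consistency-level param the driver accepts):

```yaml
params:
  cl: <<write_cl:LOCAL_QUORUM>>  # overridden by e.g. write_cl=LOCAL_ONE on the command line
```
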
## Key Performance Metrics

Client-side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form, with discrete histogram values per interval.

### Client-Side

## Description

The CQL Key-Value workload demonstrates the simplest possible schema with payload data. This is useful for measuring system capacity most directly in terms of raw operations. As a reference point, it provides some insight around types of workloads that are constrained around messaging, threading, and tasking, rather than bulk throughput.

During preload, all keys are set with a value. During the main phase of the workload, random keys from the known population are replaced with new values which never repeat, and random partitions are selected for upsert, with row values never repeating.

## Schema

    CREATE KEYSPACE IF NOT EXISTS baselines WITH replication =
        { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };

    CREATE TABLE baselines.keyvalue (
        user_id UUID,
        user_code text

1. schema - Initialize the schema.
2. rampup - Load data according to the data set size.
3. main - Run the workload

## Operations

### read (main)

    select * from baselines.keyvalue where key=?key;

## Data Set

### baselines.keyvalue insert (rampup)

- key - text, number as string, selected sequentially up to keycount
- value - text, number as string, selected sequentially up to valuecount

### baselines.keyvalue insert (main)

- key - text, number as string, selected uniformly within keycount
- value - text, number as string, selected uniformly within valuecount

### baselines.keyvalue read (main)

## Key Performance Metrics

Client-side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form, with discrete histogram values per interval.

### Client-Side

# Notes on Interpretation

Once the average ratio of overwrites starts to balance with the rate of compaction, a steady state should be achieved. At this point, pending compactions and bytes compacted should be mostly flat over time.

## Description

The CQL Wide Rows workload provides a way to tax a system with wide rows of a given size. This is useful to help understand underlying performance differences between versions and configuration options when using data models that have wide rows.

## Schema

    CREATE KEYSPACE IF NOT EXISTS baselines WITH replication =
        { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };

    CREATE TABLE IF NOT EXISTS baselines.widerows (
        part text,
        clust text,

2. rampup - Fully populate the widerows with data, 100000 elements per row
3. main - Run at steady state with 50% reads and 50% writes, 100M rows

For in-depth testing, this workload needs significant density of partitions in combination with fully populated wide rows. For exploratory or parameter contrasting tests, ensure that the rampup phase is configured correctly to establish this initial state.

## Data Set

### baselines.widerows dataset (rampup)

- part - text, number in string form, sequentially from 1..1E9
- clust - text, number in string form, sequentially from 1..1E9
- data - text, extract from lorem ipsum between 50 and 150 characters

### baselines.widerows dataset (main)

    select * from baselines.iot
        where machine_id=? and sensor_name=?
        limit 10

## Workload Parameters

This workload has no adjustable parameters when used in the baseline tests.

When used for additional testing, the following parameters should be supported:

- partcount - the number of unique partitions
- partsize - the number of logical rows within a CQL partition

## Key Performance Metrics

Client-side metrics are a more accurate measure of the system behavior from a user's perspective. For microbench and baseline tests, these are the only required metrics. When gathering metrics from multiple server nodes, they should be kept in aggregate form, for min, max, and average for each time interval in monitoring. For example, the avg p99 latency for reads should be kept, as well as the min p99 latency for reads. If possible, metrics should be kept in plot form, with discrete histogram values per interval.

### Client-Side

# Built-In Workloads

There are a few built-in workloads which you may want to run. These workloads can be run from a command without having to configure anything, or they can be tailored with their built-in parameters.

There is now a way to list the built-in workloads:

`nb --list-workloads` will give you a list of all the pre-defined workloads which have named scenarios built-in.

## Common Built-Ins

This section of the guidebook will explain a couple of the common scenarios in detail.

## Built-In Workload Conventions

The built-in workloads follow a set of conventions so that they can be used interchangeably:

### Phases

### Parameters

Each built-in has a set of adjustable parameters which is documented below per workload. For example, the cql-iot workload has a `sources` parameter which determines the number of unique devices in the dataset.

# YAML Organization

It is best to keep every workload self-contained within a single YAML file, including schema, data rampup, and the main phase of testing. The phases of testing are controlled by tags as described in the Standard YAML section.

:::info
The phase names described below have been adopted as a convention within the built-in workloads. It is strongly advised that new workload YAMLs use the same tagging scheme so that workloads are more pluggable across YAMLs.
:::

### Schema phase

The schema phase is simply a phase of your test which creates the necessary schema on your target system. For CQL, this generally consists of a keyspace and one or more table statements. There is no special schema layer in nosqlbench. All statements executed are simply statements. This provides the greatest flexibility in testing since every activity type is allowed to control its DDL and DML using the same machinery.

The schema phase is normally executed with defaults for most parameters. This means that statements will execute in the order specified in the YAML, in serialized form, exactly once. This is a welcome side-effect of how the initial parameters like _cycles_ are set from the statements which are activated by tagging.

You can mark statements as schema phase statements by adding this set of tags to the statements, either directly, or by block:

    tags:
      phase: schema
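
For example, a hedged block-level sketch (the table DDL is a placeholder; any statement valid for your driver would go here):

```yaml
blocks:
  - tags:
      phase: schema
    statements:
      - create-table: |
          CREATE TABLE baselines.keyvalue (key text PRIMARY KEY, value text);
```
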
### Rampup phase

When you run a performance test, it is very important to be aware of how much data is present. Higher density tests are more realistic for systems which accumulate data over time, or which have a large working set of data. Ideally, the amount of data on the system you are testing should recreate a realistic amount of data that you would run in production. In general, there is a triangular trade-off between service time, op rate, and data density.

It is the purpose of the _rampup_ phase to create the backdrop data on a target system that makes a test meaningful for some level of data density. Data density is normally discussed as average per node, but it is also important to consider distribution of data as it varies from the least dense to the most dense nodes.

Because it is useful to be able to add data to a target cluster in an incremental way, the bindings which are used with a _rampup_ phase may actually be different from the ones used for a _main_ phase. In most cases, you want the rampup phase to create data in a way that incrementally adds to the population of data in the cluster. This allows you to add some data to a cluster with `cycles=0..1M` and then decide whether to continue adding data using the next contiguous range of cycles, with `cycles=1M..2M` and so on.
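
A hedged sketch of this incremental pattern (the driver name and tag filter are illustrative):

```yaml
scenarios:
  default:
    - run driver=cql tags=phase:rampup cycles=0..1M
    # a later invocation could continue with cycles=1M..2M
```
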
You can mark statements as rampup phase statements by adding this set of tags to the statements, either directly, or by block:

    tags:
      phase: rampup

### Main phase

The main phase of a nosqlbench scenario is the one during which you really care about the metrics. This is the actual test that everything else has prepared your system for.

You can mark statements as main phase statements by adding this set of tags to the statements, either directly, or by block:

    tags:
      phase: main

## Statement Templates

A valid config file for an activity consists of statement templates, parameters for them, bindings to generate the data to use with them, and tags for organizing them.

In essence, the config format is *all about configuring statements*. Every other element in the config format is in some way modifying or otherwise helping create statements to be used in an activity.

Statement templates are the single most important part of a YAML config.

```yaml
statements:
 - a single statement body
```

This is a valid activity YAML file in and of itself. It has a single statement template.

It is up to the individual activity types like _cql_ or _stdout_ to interpret the statement template in some way. The example above is valid as a statement in the stdout activity, but it does not produce a valid CQL statement with the CQL activity type. The contents of the statement template are free-form text. If the statement template is valid CQL, then the CQL activity type can use it without throwing an error. Each activity type determines what a statement means, and how it will be used.

You can provide multiple statements, and you can use the YAML pipe to put them on multiple lines, indented a little further in:

```yaml
statements:
 - |
  This is a statement, and the file format doesn't
  know how statements will be used!
 - |
  submit job {alpha} on queue {beta} with options {gamma};
```

Actually, every statement in a YAML has a name. If you don't provide one, then a name is auto-generated for the statement based on its position in the YAML file.

## Data Bindings

Procedural data generation is built-in to the nosqlbench runtime by way of the [Virtual DataSet](http://virtdata.io/) library. This allows us to create named data generation recipes. These named recipes for generated data are called bindings. Procedural generation for test data has [many benefits](http://docs.virtdata.io/why_virtdata/why_virtdata/) over shipping bulk test data around, including speed and deterministic behavior. With the VirtData approach, most of the hard work is already done for us. We just have to pull in the recipes we want.

You can add a bindings section like this:

```yaml
bindings:
 alpha: Identity()
 beta: NumberNameToString()
 gamma: Combinations('0-9A-F;0-9;A-Z;_;p;r;o;')
 delta: WeightedStrings('one:1;six:6;three:3;')
```

This is a YAML map which provides names and function specifiers. The specifier named _alpha_ provides a function that takes an input value and returns the same value. Together, the name and value constitute a binding named alpha. All of the four bindings together are called a bindings set.

The above bindings block is also a valid activity YAML, at least for the _stdout_ activity type. The _stdout_ activity can construct a statement template from the provided bindings if needed, so this is valid:

```text
[test]$ cat > stdout-test.yaml
9,nine,00J_pro,six
```

Above, you can see that the stdout activity type is ideal for experimenting with data generation recipes. It uses the default `format=csv` parameter above, but it also supports formats like json, inlinejson, readout, and assignments.

This is all you need to provide a formulaic recipe for converting an ordinal value to a set of field values. Each time nosqlbench needs to create a set of values as parameters to a statement, the functions are called with an input, known as the cycle. The functions produce a set of named values that, when combined with a statement template, can yield an individual statement for a database operation. In this way, each cycle represents a specific operation. Since the functions above are pure functions, the cycle number of an operation will always produce the same operation, thus making all nosqlbench workloads deterministic.

In the example above, you can see the cycle numbers down the left.
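
To make the determinism concrete, here is a small sketch using the binding functions from the example above (the per-cycle values follow from the sample output):

```yaml
bindings:
  alpha: Identity()           # cycle 9 always yields 9
  beta: NumberNameToString()  # cycle 9 always yields "nine"
```
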
If you combine the statement section and the bindings sections above into one activity yaml, you get a slightly different result, as the bindings apply to the statements that are provided, rather than creating a default statement for the bindings. See the example below:

```text
[test]$ cat > stdout-test.yaml
submit job 9 on queue nine with options 00J_pro;
```

There are a few things to notice here. First, the statements that are executed are automatically alternated between. If you had 10 different statements listed, they would all get their turn with 10 cycles. Since there were two, each was run 5 times.

Also, the statement that had named anchors acted as a template, whereas the other one was evaluated just as it was. In fact, they were both treated as templates, but one of them had no anchors.

One more minor but important detail is that the fourth binding *delta* was not referenced directly in the statements. Since the statements did not pair up an anchor with this binding name, it was not used. No values were generated for it.

This is how activities are expected to work when they are implemented correctly. This means that the bindings themselves are templates for data generation, only to be used when necessary. This means that the bindings that are defined around a statement are more like a menu for the statement. If the statement uses those bindings with `{named}` anchors, then the recipes will be used to construct data when that statement is selected for a specific cycle. The cycle number both selects the statement (via the op sequence) and also provides the input value at the left side of the binding functions.

|
@ -6,17 +6,23 @@ weight: 03
|
||||
|
||||
## Statement Parameters

Statements within a YAML can be accessorized with parameters. These are known as _statement params_ and are different from the parameters that you use at the activity level. They apply specifically to a statement template, and are interpreted by an activity type when the statement template is used to construct a native statement form.

For example, the statement parameter `ratio` is used when an activity is initialized to construct the op sequence. In the _cql_ activity type, the statement parameter `prepared` is a boolean that can be used to designate whether a CQL statement should be prepared or not.
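
For instance, here is a hedged sketch attaching both of these statement params to a single statement (the statement body is borrowed from the key-value examples earlier in this document):

```yaml
statements:
  - read-kv: |
      select * from baselines.keyvalue where key=?key;
    prepared: false
    ratio: 2
```
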
As with the bindings, a params section can be added at the same level, setting additional parameters to be used with statements. Again, this is an example of modifying or otherwise creating a specific type of statement, but always in a way specific to the activity type. Params can be thought of as statement properties. As such, params don't really do much on their own, although they have the same basic map syntax as bindings:

```yaml
params:
 ratio: 1
```

As with statements, it is up to each activity type to interpret params in a useful way.

## Statement Tags

Tags are used to mark and filter groups of statements for controlling which ones get used in a given scenario. Tags are generally free-form, but there is a set of conventions that can make your testing easier.

An example:

    tags:
      name: foxtrot
      unit: bravo

### Tag Filtering

The tag filters provide a flexible set of conventions for filtering tagged statements. Tag filters are usually provided as an activity parameter when an activity is launched. The rules for tag filtering are:

1. If no tag filter is specified, then the statement matches.
2. A tag name predicate like `tags=name` asserts the presence of a specific tag name, regardless of its value.

```text
# compound tag predicate does not fully match
[test]$ ./nb run driver=stdout workload=stdout-test tags='name=fox.*',unit=delta
11:02:53.490 [scenarios:001] ERROR i.e.activities.stdout.StdoutActivity - Unable to create a stdout statement if you have no active statements or bindings configured.
```

## Statement Blocks

All the basic primitives described above (names, statements, bindings, params, tags) can be used to describe and parameterize a set of statements in a yaml document. In some scenarios, however, you may need to structure your statements in a more sophisticated way. You might want to do this if you have a set of common statement forms or parameters that need to apply to many statements, or perhaps if you have several *different* groups of statements that need to be configured independently.

This is where blocks become useful:

```text
9,block2-O
```

This shows a couple of important features of blocks. All blocks inherit defaults for bindings, params, and tags from the root document level. Any of these values that are defined at the base document level apply to all blocks contained in that document, unless specifically overridden within a given block.
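
Here is a hedged sketch of that inheritance (the names, bindings, and statement bodies are illustrative only):

```yaml
bindings:
  alpha: Identity()                 # document-level default, inherited by block1
blocks:
  - name: block1
    statements:
      - log value {alpha}
  - name: block2
    bindings:
      alpha: NumberNameToString()   # overrides the inherited binding for this block only
    statements:
      - log value {alpha}
```
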
## Statement Delimiting

Sometimes, you want to specify the text of a statement in different ways. Since statements are strings, the simplest way for small statements is in double quotes. If you need to express a much longer statement with special characters and newlines, then you can use YAML's literal block notation (signaled by the '|' character) to do so:

```yaml
statements:
 - |
  This is a statement, and the file format doesn't
  know how statements will be used!
 - |
  submit job {alpha} on queue {beta} with options {gamma};
```

Notice that the block starts on the following line after the pipe symbol. This is a very popular form in practice because it treats the whole block exactly as it is shown, except for the initial indentations, which are removed.

Statements in this format can be raw statements, statement templates, or anything that is appropriate for the specific activity type they are being used with. Generally, the statements should be thought of as a statement form that you want to use in your activity -- something that has place holders for data bindings. These place holders are called *named anchors*. The second line above is an example of a statement template, with anchors that can be replaced by data for each cycle of an activity.

There is a variety of ways to represent block statements: with folding, without, with the newline removed, with it retained, with trailing newlines trimmed or not, and so forth. For a more comprehensive guide on the YAML conventions regarding multi-line blocks, see [YAML Spec 1.2, Chapter 8, Block Styles](http://www.yaml.org/spec/1.2/spec.html#Block).

## Statement Sequences

To provide a degree of flexibility to the user for statement definitions, multiple statements may be provided together as a sequence.

```yaml
# a list of statements
statements:
 - "statement one"
 - "statement two"

# a list of named statements
statements:
 - name1: "statement one"
 - name2: "statement two"
```

In the first form, the names are provided automatically by the YAML loader. In the second form, they are specified as ordered map keys.

## Statement Properties

```yaml
statements:
 - name: name1
   stmt: statement one
 - name: name2
   stmt: statement two
```

This is the most flexible configuration format at the statement level. It is also the most verbose. Because this format names each property of the statement, it allows for other properties to be defined at this level as well. This includes all of the previously described configuration elements: `name`, `bindings`, `params`, `tags`, and additionally `stmt`. A detailed example follows:

```yaml
statements:
 - name: foostmt
   stmt: "{alpha},{beta}\n"
   bindings:
    beta: Combinations('COMBINATIONS;')
   params:
    parm1: pvalue1
   tags:
    tag1: tvalue1
   freeparam3: a value, as if it were assigned under the params block.
```

In this case, the values for `bindings`, `params`, and `tags` take precedence, overriding those set by the enclosing block or document or activity when the names match. Parameters called **free parameters** are allowed here, such as `freeparam3`. These are simply values that get assigned to the params map once all other processing has completed.

It is possible to mix the **`<name>: <statement>`** form as above in the example for mapping statement by name, so long as some specific rules are followed. An example, which is equivalent to the above:

```yaml
statements:
 - foostmt: "{alpha},{beta}\n"
   parm1: pvalue1
   bindings:
    beta: Combinations('COMBINATIONS;')
   tags:
    tag1: tvalue1
```

The rules:

2. Do not use the **`<name>: <statement>`** form in combination with a **`stmt: <statement>`** property. It is not possible to detect if this occurs. Use caution if you choose to mix these forms.

As explained above, `parm1: pvalue1` is a *free parameter*, and is simply short-hand for setting values in the params map for the statement.

### Per-Statement Format

```yaml
statements:
 - first statement body
 - second: second statement body
 - name: statement3
   stmt: third statement body
 - fourth: fourth statement body
   freeparam1: freevalue1
   tags:
    type: preload
```

Specifically, the first statement is a simple statement body, the second is a named statement (via free param `<name>: statement` form), the third is a statement config map, and the fourth is a combination of the previous two.

The above is valid nosqlbench YAML, although a reader would need to know about the rules explained above in order to really make sense of it. For most cases, it is best to follow one format convention, but there is flexibility for overrides and naming when you need it.

# Multi-Docs

The YAML spec allows for multiple yaml documents to be concatenated in the same file with a separator:

```yaml
---
```

This offers an additional convenience when configuring activities. If you want to parameterize or tag a set of statements with their own bindings, params, or tags, but alongside another set of uniquely configured statements, you need only put them in separate logical documents, separated by a triple-dash.

For example:

```text
doc2.number eight
doc1.form1 doc1.1
```

This shows that you can use the power of blocks and tags together at one level and also allow statements to be broken apart into a whole other level of partitioning if desired.

:::warning
The multi-doc support is there as a ripcord when you need it. However, it is strongly advised that you keep your YAML workloads simple to start and only use features like the multi-doc when you absolutely need it. For this, blocks are generally a better choice. See examples in the standard workloads.
:::

# Template Params

All nosqlbench YAML formats support a parameter macro format that applies before YAML processing starts. It is a basic macro facility that allows named anchors to be placed in the document as a whole:

```text
<<varname:defaultval>>
TEMPLATE(varname,defaultval)
```

In this example, the name of the parameter is `varname`. It is given a default value of `defaultval`. If an activity parameter named *varname* is provided, as in `varname=barbaz`, then this whole expression will be replaced with `barbaz`. If none is provided then the default value will be used instead. For example:

```text
[test]$ cat > stdout-test.yaml
MISSING
THIS IS IT
```

If an empty value is desired by default, then simply use an empty string in your template, like `<<varname:>>` or `TEMPLATE(varname,)`.
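
As a hedged sketch of how template params are commonly used inside a workload (the parameter name `keycount` and the binding functions are illustrative):

```yaml
bindings:
  key: Mod(<<keycount:1000000>>); ToString()
```
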
|
@ -20,13 +20,16 @@ name: doc2
|
||||
...
|
||||
```
|
||||
|
||||
This provides a layered naming scheme for the statements themselves. It is not usually important to name things except for documentation or metric naming purposes.

If no names are provided, then names are automatically created for blocks and statements. Statements assigned at the document level are assigned to "block0". All other statements are named with the format `doc#--block#--stmt#`.

For example, the full name of statement1 above would be `doc1--block1--stmt1`.

:::info
If you anticipate wanting to get metrics for a specific statement in addition to the other metrics, then you will want to adopt the habit of naming all your statements something basic and descriptive.
:::

|
@ -21,10 +21,11 @@ scenarios:
|
||||
- run driver=diag cycles=10M
|
||||
```
|
||||
|
||||
This provides a way to specify more detailed workflows that users may want to run without them having to build up a command line for themselves.

A couple of other forms are supported in the YAML, for terseness:

```yaml
scenarios:
  oneliner: run driver=diag cycles=10
  mapform:
    part1: run driver=diag cycles=10 alias=part1
    part2: run driver=diag cycles=20 alias=part2
```

These forms simply provide finesse for common editing habits, but they are automatically read internally as a list. In the map form, the names are discarded, but they may be descriptive enough for use as inline docs for some users. The order is retained as listed, since the names have no bearing on the order.

## Scenario selection

When a named scenario is run, it is *always* named, so that it can be looked up in the list of named scenarios under your `scenarios:` property. The only exception to this is when an explicit scenario name is not found on the command line, in which case it is automatically assumed to be _default_.

Some examples may be more illustrative:

@ -69,27 +69,24 @@ You can run multiple named scenarios in the same command if

## Workload selection

The examples above contain no reference to a workload (formerly called _yaml_).
They don't need to, as they refer to themselves implicitly. You may add a `workload=`
parameter to the command templates if you like, but this is never needed for basic
use, and it is error prone to keep the filename matched to the command template. Just
leave it out by default.
The examples above contain no reference to a workload (formerly called _yaml_). They don't need to, as they refer to
themselves implicitly. You may add a `workload=` parameter to the command templates if you like, but this is never
needed for basic use, and it is error prone to keep the filename matched to the command template. Just leave it out by
default.

_However_, if you are doing advanced scripting across multiple systems, you can
actually provide a `workload=` parameter particularly to use another workload
description in your test.
_However_, if you are doing advanced scripting across multiple systems, you can actually provide a `workload=` parameter
particularly to use another workload description in your test.

:::info
This is a powerful feature for workload automation and organization. However, it can
get unwieldy quickly. Caution is advised for deep-linking too many scenarios in a workspace,
as there is no mechanism for keeping them in sync when small changes are made.
This is a powerful feature for workload automation and organization. However, it can get unwieldy quickly. Caution is
advised for deep-linking too many scenarios in a workspace, as there is no mechanism for keeping them in sync when small
changes are made.
:::

## Named Scenario Discovery

For named scenarios, there is a way for users to find all the named scenarios that are
currently bundled or in view of their current directory. A couple of simple rules must
be followed by scenario publishers in order to keep things simple:
For named scenarios, there is a way for users to find all the named scenarios that are currently bundled or in view of
their current directory. A couple of simple rules must be followed by scenario publishers in order to keep things simple:

1. Workload files in the current directory `*.yaml` are considered.
2. Workload files under the relative path `activities/` with name `*.yaml` are
@ -99,38 +96,33 @@ be followed by scenario publishers in order to keep things simple:
4. Any workload file that contains a `scenarios:` tag is included, but all others
   are ignored.

This doesn't mean that you can't use named scenarios for workloads in other locations.
It simply means that when users use the `--list-scenarios` option, these are the only
ones they will see listed.
This doesn't mean that you can't use named scenarios for workloads in other locations. It simply means that when users
use the `--list-scenarios` option, these are the only ones they will see listed.
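
For example, a user could discover the published scenarios like this (a sketch; the output will vary with the directory contents):

```
# sketch: list the named scenarios that are discoverable from here
$ nb --list-scenarios
```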

## Parameter Overrides

You can override parameters that are provided by named scenarios. Any parameter
that you specify on the command line after your workload and optional scenario name
will be used to override or augment the commands that are provided for the named scenario.
You can override parameters that are provided by named scenarios. Any parameter that you specify on the command line
after your workload and optional scenario name will be used to override or augment the commands that are provided for
the named scenario.

This is powerful, but it also means that you can sometimes munge user-provided
activity parameters on the command line with the named scenario commands in ways
that may not make sense. To solve this, the parameters in the named scenario commands
may be locked. You can lock them silently, or you can provide a verbose locking that will
cause an error if the user even tries to adjust them.
This is powerful, but it also means that you can sometimes munge user-provided activity parameters on the command line
with the named scenario commands in ways that may not make sense. To solve this, the parameters in the named scenario
commands may be locked. You can lock them silently, or you can provide a verbose locking that will cause an error if the
user even tries to adjust them.

Silent locking is provided with a form like `param==value`. Any silently locked parameters
will reject overrides from the command line, but will not interrupt the user.
Silent locking is provided with a form like `param==value`. Any silently locked parameters will reject overrides from the
command line, but will not interrupt the user.

Verbose locking is provided with a form like `param===value`. Any time a user provides
a parameter on the command line for the named parameter, an error is thrown and they
are informed that this is not possible. This level is provided for cases in which you
would not want the user to be unaware of an unset parameter which is germane and specific
to the named scenario.
Verbose locking is provided with a form like `param===value`. Any time a user provides a parameter on the command line
for the named parameter, an error is thrown and they are informed that this is not possible. This level is provided for
cases in which you would not want the user to be unaware of an unset parameter which is germane and specific to the
named scenario.
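
A sketch of both locking forms in a scenario definition (the `diag` driver and cycle counts are illustrative, chosen only to mirror the `s2` and `s3` examples below):

```yaml
scenarios:
  s2: run driver=diag cycles==10   # silent lock: user overrides are quietly ignored
  s3: run driver=diag cycles===10  # verbose lock: user overrides raise an error
```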

All other parameters provided by the user will take the place of the same-named parameters
provided in *each* command template, in the order they appear in the template.
Any other parameters provided by the user will be added to *each* of the command templates
in the order they appear on the command line.
All other parameters provided by the user will take the place of the same-named parameters provided in *each* command
template, in the order they appear in the template. Any other parameters provided by the user will be added to *each*
of the command templates in the order they appear on the command line.

This is a little counter-intuitive at first, but once you see some examples it should
make sense.
This is a little counter-intuitive at first, but once you see some examples it should make sense.

## Parameter Override Examples

@ -176,9 +168,8 @@ $

### Silent Locking example

If you run the second scenario `s2` with your own value for `cycles=7`, then it does
what the locked parameter `cycles==10` requires, without telling you that it is
ignoring the specified value on your command line.
If you run the second scenario `s2` with your own value for `cycles=7`, then it does what the locked parameter
`cycles==10` requires, without telling you that it is ignoring the specified value on your command line.

```
$ nb basics s2 cycles=7
@ -200,19 +191,15 @@ Sometimes, this is appropriate, such as when specifying settings like `threads==

### Verbose Locking example

If you run the third scenario `s3` with your own value for `cycles=7`, then you
will get an error telling you that this is not possible. Sometimes you want to
make sure that the user knows a parameter should not be changed, and that if they
want to change it, they'll have to make their own custom version of the scenario
in question.
If you run the third scenario `s3` with your own value for `cycles=7`, then you will get an error telling you that this
is not possible. Sometimes you want to make sure that the user knows a parameter should not be changed, and that if they
want to change it, they'll have to make their own custom version of the scenario in question.
```
$ nb basics s3 cycles=7
ERROR: Unable to reassign value for locked param 'cycles===7'
$
```

Ultimately, it is up to the scenario designer when to lock parameters for users.
The built-in workloads offer some examples on how to set these parameters so that
the right values are locked in place without bothering the user, but some values
are made very clear in how they should be set. Please look at these examples
for inspiration when you need them.
Ultimately, it is up to the scenario designer when to lock parameters for users. The built-in workloads offer some
examples on how to set these parameters so that the right values are locked in place without bothering the user, but some
values are made very clear in how they should be set. Please look at these examples for inspiration when you need them.

@ -5,78 +5,86 @@ weight: 99

## Diagnostics

This section describes errors that you might see if you have a YAML loading issue, and what
you can do to fix them.
This section describes errors that you might see if you have a YAML loading issue, and what you can do to fix them.

### Undefined Name-Statement Tuple

This exception is thrown when the statement body is not found in a statement definition
in any of the supported formats. For example, the following block will cause an error:
This exception is thrown when the statement body is not found in a statement definition in any of the supported formats.
For example, the following block will cause an error:

statements:
  - name: statement-foo
    params:
      aparam: avalue
```yaml
statements:
  - name: statement-foo
    params:
      aparam: avalue
```

This is because `name` and `params` are reserved property names -- removed from the list of name-value
pairs before free parameters are read. If the statement is not defined before free parameters
are read, then the first free parameter is taken as the name and statement in `name: statement` form.
This is because `name` and `params` are reserved property names -- removed from the list of name-value pairs before free
parameters are read. If the statement is not defined before free parameters are read, then the first free parameter is
taken as the name and statement in `name: statement` form.

To correct this error, supply a statement property in the map, or simply replace the `name: statement-foo` entry
with a `statement-foo: statement body` at the top of the map:
To correct this error, supply a statement property in the map, or simply replace the `name: statement-foo` entry with a
`statement-foo: statement body` at the top of the map:

Either of these will work:

statements:
  - name: statement-foo
    stmt: statement body
    params:
      aparam: avalue
```yaml
statements:
  - name: statement-foo
    stmt: statement body
    params:
      aparam: avalue
---
statements:
  - statement-foo: statement body
    params:
      aparam: avalue
```

statements:
  - statement-foo: statement body
    params:
      aparam: avalue

In both cases, it is clear to the loader where the statement body should come from, and what (if any) explicit
naming should occur.
In both cases, it is clear to the loader where the statement body should come from, and what (if any) explicit naming
should occur.

### Redefined Name-Statement Tuple

This exception is thrown when the statement name is defined in multiple ways. This is an explicit exception
to avoid possible ambiguity about which value the user intended. For example, the following statements
definition will cause an error:
This exception is thrown when the statement name is defined in multiple ways. This is an explicit exception to avoid
possible ambiguity about which value the user intended. For example, the following statements definition will cause an
error:

statements:
  - name: name1
    name2: statement body
```yaml
statements:
  - name: name1
    name2: statement body
```

This is an error because the statement is not defined before free parameters are read, and the `name: statement`
form includes a second definition for the statement name. In order to correct this, simply remove the separate
`name` entry, or use the `stmt` property to explicitly set the statement body. Either of these will work:
This is an error because the statement is not defined before free parameters are read, and the `name: statement` form
includes a second definition for the statement name. In order to correct this, simply remove the separate `name` entry,
or use the `stmt` property to explicitly set the statement body. Either of these will work:

statements:
  - name2: statement body

statements:
  - name: name1
    stmt: statement body
```yaml
statements:
  - name2: statement body
---
statements:
  - name: name1
    stmt: statement body
```

In both cases, there is only one name defined for the statement according to the supported formats.

### YAML Parsing Error

This exception is thrown when the YAML format is not recognizable by the YAML parser. If you are not
working from examples that are known to load cleanly, then please review your document for correctness
according to the [YAML Specification]().
This exception is thrown when the YAML format is not recognizable by the YAML parser. If you are not working from
examples that are known to load cleanly, then please review your document for correctness according to the
[YAML Specification]().

If you are sure that the YAML should load, then please [submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug)
with details on the type of YAML file you are trying to load.
If you are sure that the YAML should load, then please
[submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details on the type of YAML
file you are trying to load.

### YAML Construction Error

This exception is thrown when the YAML was loaded, but the configuration object was not able to be constructed
from the in-memory YAML document. If this error occurs, it may be a bug in the YAML loader implementation.
Please [submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details
on the type of YAML file you are trying to load.
This exception is thrown when the YAML was loaded, but the configuration object was not able to be constructed from the
in-memory YAML document. If this error occurs, it may be a bug in the YAML loader implementation. Please
[submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details on the type of YAML
file you are trying to load.

@ -5,27 +5,42 @@ weight: 40

# Designing Workloads

Workloads in nosqlbench are always controlled by a workload definition. Even the built-in workloads are simply pre-configured and controlled from a single YAML file which is bundled internally.
Workloads in nosqlbench are always controlled by a workload definition.
Even the built-in workloads are simply pre-configured and controlled
from a single YAML file which is bundled internally.

With nosqlbench a standard YAML configuration format is provided that is used across all activity types. This makes it easy to specify statements, statement parameters, data bindings, and tags. This section describes the standard YAML format and how to use it.
With nosqlbench a standard YAML configuration format is provided that is
used across all activity types. This makes it easy to specify
statements, statement parameters, data bindings, and tags. This section
describes the standard YAML format and how to use it.

It is recommended that you read through the examples in each of the design sections in order. This guide was designed to give you a detailed understanding of workload construction with nosqlbench. The examples will also give you better insight into how nosqlbench works at a fundamental level.
It is recommended that you read through the examples in each of the
design sections in order. This guide was designed to give you a detailed
understanding of workload construction with nosqlbench. The examples
will also give you better insight into how nosqlbench works at a
fundamental level.

## Multi-Protocol Support

You will notice that this guide is not overly CQL-specific. That is because nosqlbench is a multi-protocol tool. All that is needed for you to use this guide with other protocols is the release of more activity types. Try to keep that in mind as you think about designing workloads.
You will notice that this guide is not overly CQL-specific. That is
because nosqlbench is a multi-protocol tool. All that is needed for you
to use this guide with other protocols is the release of more activity
types. Try to keep that in mind as you think about designing workloads.

## Advice for new builders

### Review existing examples

The built-in workloads that are included with nosqlbench are also shared on the github site where we manage the nosqlbench project:
The built-in workloads that are included with nosqlbench are also shared
on the github site where we manage the nosqlbench project:

- [baselines](https://github.com/datastax/nosqlbench-labs/tree/master/sample-activities/baselines)
- [bindings](https://github.com/datastax/nosqlbench-labs/tree/master/sample-activities/bindings)

### Follow the conventions

The tagging conventions described under the YAML Conventions section will make your testing go smoother. All of the baselines that we publish for nosqlbench will use this form.
The tagging conventions described under the YAML Conventions section
will make your testing go smoother. All of the baselines that we publish
for nosqlbench will use this form.

@ -1,5 +1,5 @@
---
title: activity type - CQL
title: driver - CQL
weight: 06
---

@ -16,35 +16,31 @@ To select this activity type, pass `driver=cql` to a run or start command.

# cql activity type

This is an activity type which allows for the execution of CQL statements.
This particular activity type is wired synchronously within each client
thread, however the async API is used in order to expose fine-grain
metrics about op binding, op submission, and waiting for a result.
This is an activity type which allows for the execution of CQL statements. This particular activity type is wired
synchronously within each client thread, however the async API is used in order to expose fine-grain metrics about op
binding, op submission, and waiting for a result.

### Example activity definitions

Run a cql activity named 'cql1', with definitions from activities/cqldefs.yaml
~~~
... driver=cql alias=cql1 workload=cqldefs
~~~

    ... driver=cql alias=cql1 workload=cqldefs

Run a cql activity defined by cqldefs.yaml, but with shortcut naming
~~~
... driver=cql workload=cqldefs
~~~

    ... driver=cql workload=cqldefs

Only run statement groups which match a tag regex
~~~
... driver=cql workload=cqldefs tags=group:'ddl.*'
~~~

    ... driver=cql workload=cqldefs tags=group:'ddl.*'

Run the matching 'dml' statements, with 100 cycles, from [1000..1100)
~~~
... driver=cql workload=cqldefs tags=group:'dml.*' cycles=1000..1100
~~~
This last example shows that the cycle range is [inclusive..exclusive),
to allow for stacking test intervals. This is standard across all
activity types.

    ... driver=cql workload=cqldefs tags=group:'dml.*' cycles=1000..1100

This last example shows that the cycle range is [inclusive..exclusive), to allow for stacking test intervals. This is
standard across all activity types.

### CQL ActivityType Parameters

@ -23,19 +23,16 @@ that uses the curly brace token form in statements.
## Example activity definitions

Run a stdout activity named 'stdout-test', with definitions from activities/stdout-test.yaml
~~~
... driver=stdout workload=stdout-test
~~~

    ... driver=stdout workload=stdout-test

Only run statement groups which match a tag regex
~~~
... driver=stdout workload=stdout-test tags=group:'ddl.*'
~~~

    ... driver=stdout workload=stdout-test tags=group:'ddl.*'

Run the matching 'dml' statements, with 100 cycles, from [1000..1100)
~~~
... driver=stdout workload=stdout-test tags=group:'dml.*' cycles=1000..1100 filename=test.csv
~~~

    ... driver=stdout workload=stdout-test tags=group:'dml.*' cycles=1000..1100 filename=test.csv

This last example shows that the cycle range is [inclusive..exclusive),
to allow for stacking test intervals. This is standard across all
@ -54,45 +51,50 @@ activity types.

## Configuration

This activity type uses the uniform yaml configuration format.
For more details on this format, please refer to the
This activity type uses the uniform yaml configuration format. For more details on this format, please refer to the
[Standard YAML Format](http://docs.engineblock.io/user-guide/standard_yaml/)

## Configuration Parameters

- **newline** - If a statement has this param defined, then it determines
whether or not to automatically add a missing newline for that statement
only. If this is not defined for a statement, then the activity-level
parameter takes precedence.
- **newline** - If a statement has this param defined, then it determines whether or not to automatically add a missing
  newline for that statement only. If this is not defined for a statement, then the activity-level parameter takes
  precedence.

## Statement Format

The statement format for this activity type is a simple string. Tokens between
curly braces are used to refer to binding names, as in the following example:
The statement format for this activity type is a simple string. Tokens between curly braces are used to refer to binding
names, as in the following example:

statements:
  - "It is {minutes} past {hour}."
```yaml
statements:
  - "It is {minutes} past {hour}."
```

If you want to suppress the trailing newline that is automatically added, then
you must either pass `newline=false` as an activity param, or specify it
in the statement params in your config as in:

```yaml
params:
  newline: false
```

### Auto-generated statements

If no statement is provided, then the defined binding names are used as-is
to create a CSV-style line format. The values are concatenated with
comma delimiters, so a set of bindings like this:
If no statement is provided, then the defined binding names are used as-is to create a CSV-style line format. The values
are concatenated with comma delimiters, so a set of bindings like this:

bindings:
  one: Identity()
  two: NumberNameToString()
```yaml
bindings:
  one: Identity()
  two: NumberNameToString()
```

would create an automatic string template like this:

statements:
  - "{one},{two}\n"
```yaml
statements:
  - "{one},{two}\n"
```

The auto-generation behavior is forced when the format parameter is supplied.

@ -3,11 +3,12 @@ title: Driver Types
weight: 50
---

Each nosqlbench scenario is comprised of one or more activities of a specific type.
The types of activities available are provided by the version of nosqlbench.
Each nosqlbench scenario is comprised of one or more activities of a
specific type. The types of activities available are provided by the
version of nosqlbench.

Additional activity types will be added in future releases.
There are command line help topics for each activity type (driver).
Additional drivers will be added in future releases. There are command
line help topics for each activity type (driver).

To get a list of topics run:

@ -4,17 +4,25 @@ title: CLI Scripting

# CLI Scripting

Sometimes you want to run a set of workloads in a particular order, or call other specific test setup logic in between phases or workloads. While the full scripting environment allows you to do this and more, it is not necessary to write javascript for every scenario.
Sometimes you want to run a set of workloads in a particular order, or call other specific test setup logic in
between phases or workloads. While the full scripting environment allows you to do this and more, it is not necessary to
write javascript for every scenario.

For more basic setup and sequencing needs, you can achieve a fair degree of flexibility on the command line. A few key API calls are supported directly on the command line. This guide explains each of them, what they do, and how to use them together.
For more basic setup and sequencing needs, you can achieve a fair degree of flexibility on the command line. A few key
API calls are supported directly on the command line. This guide explains each of them, what they do, and how to use them
together.

## Script Construction

As the command line is parsed, from left to right, the scenario script is built in an internal scripting buffer. Once the command line is fully parsed, this script is executed. Each of the commands below is effectively a macro for a snippet of script. It is important to remember that order matters.
As the command line is parsed, from left to right, the scenario script is built in an internal scripting buffer. Once
the command line is fully parsed, this script is executed. Each of the commands below is effectively a macro for a
snippet of script. It is important to remember that order matters.

## Command line format

Newlines are not allowed when building scripts from the command line. As long as you follow the allowed forms below, you can simply string multiple commands together with spaces between. As usual, single word options without double dashes are commands, key=value style parameters apply to the previous command, and all other commands with
Newlines are not allowed when building scripts from the command line. As long as you follow the allowed forms below, you
can simply string multiple commands together with spaces between. As usual, single word options without double dashes
are commands, key=value style parameters apply to the previous command, and all other commands with

    --this-style

@ -22,28 +30,35 @@ are non-scripting options.

## Concurrency & Control

All activities that run during a scenario run under the control of, but
independently from the scenario script. This means that you can have a number of activities running while the scenario script is doing its own thing. The scenario only completes when both the scenario script and the activities are finished.
All activities that run during a scenario run under the control of, but independently from the scenario script. This
means that you can have a number of activities running while the scenario script is doing its own thing. The scenario
only completes when both the scenario script and the activities are finished.

### `start driver=<activity type> alias=<alias> ...`

You can start an activity with this command. At the time this command is
evaluated, the activity is started, and the script continues without blocking. This is an asynchronous start of an activity. If you start multiple activities in this way, they will run concurrently.
You can start an activity with this command. At the time this command is evaluated, the activity is started, and the
script continues without blocking. This is an asynchronous start of an activity. If you start multiple activities in
this way, they will run concurrently.

The type argument is required to identify the activity type to run. The alias parameter is not strictly required, unless you want to be able to interact with the started activity later. In any case, it is a good idea to name all your activities with a meaningful alias.
The type argument is required to identify the activity type to run. The alias parameter is not strictly required, unless
you want to be able to interact with the started activity later. In any case, it is a good idea to name all your
activities with a meaningful alias.

### `stop <alias>`

Stop an activity with the given alias. This is synchronous, and causes the
scenario to pause until the activity is stopped. This means that all threads for the activity have completed and signalled that they're in a stopped state.
Stop an activity with the given alias. This is synchronous, and causes the scenario to pause until the activity is
stopped. This means that all threads for the activity have completed and signalled that they're in a stopped state.

### `await <alias>`

Await the normal completion of an activity with the given alias. This causes the scenario script to pause while it waits for the named activity to finish. This does not tell the activity to stop. It simply puts the scenario script into a paused state until the named activity is complete.
Await the normal completion of an activity with the given alias. This causes the scenario script to pause while it waits
for the named activity to finish. This does not tell the activity to stop. It simply puts the scenario script into a
paused state until the named activity is complete.

### `run driver=<activity type> alias=<alias> ...`

Run an activity to completion, waiting until it is complete before continuing with the scenario script. It is effectively the same as
Run an activity to completion, waiting until it is complete before continuing with the scenario script. It is
effectively the same as

    start driver=<activity type> ... alias=<alias>
    await <alias>
@ -71,7 +86,8 @@ await one \
stop two
~~~

In this CLI script, the backslashes are necessary in order to keep everything on the same command line. Here is a narrative of what happens when it is run.
In this CLI script, the backslashes are necessary in order to keep everything on the same command line. Here is a narrative
of what happens when it is run.

1. An activity named 'a' is started, with 100K cycles of work.
2. An activity named 'b' is started, with 200K cycles of work.

@ -6,81 +6,115 @@ title: Scenario Scripting

## Motive

The EngineBlock runtime is a combination of a scripting sandbox and a workload execution machine. This is not accidental. With this particular arrangement, it should be possible to build sophisticated tests across a variety of scenarios. In particular, logic which can observe and react to the system under test can be powerful. With this approach, it becomes possible to break away from the conventional run-interpret-adjust cycle which is all too often done by human hands.
The EngineBlock runtime is a combination of a scripting sandbox and a workload execution machine. This is not
accidental. With this particular arrangement, it should be possible to build sophisticated tests across a variety of
scenarios. In particular, logic which can observe and react to the system under test can be powerful. With this
approach, it becomes possible to break away from the conventional run-interpret-adjust cycle which is all too often done
by human hands.

## Machinery, Controls & Instruments

All of the heavy lifting is left to Java and the core nosqlbench runtime. This includes the iterative workloads that are meant to test the target system. This is combined with a control layer which is provided by Nashorn and eventually GraalVM. This division of responsibility allows the high-level test logic to be "script" and the low-level activity logic to be "machinery". While the scenario script has the most control, it also is the least busy relative to activity workloads. The net effect is that you have the efficiency of the iterative test loads in conjunction with the open design palette of a first-class scripting language.
All of the heavy lifting is left to Java and the core nosqlbench runtime. This includes the iterative workloads that are
meant to test the target system. This is combined with a control layer which is provided by Nashorn and eventually
GraalVM. This division of responsibility allows the high-level test logic to be "script" and the low-level activity
logic to be "machinery". While the scenario script has the most control, it also is the least busy relative to activity
workloads. The net effect is that you have the efficiency of the iterative test loads in conjunction with the open
design palette of a first-class scripting language.

Essentially, the ActivityType drivers are meant to handle the workload-specific machinery. They also provide dynamic control points and parameters which are special to that activity type (driver). This exposes a full feedback loop between a running scenario script and the activities that it runs. The scenario is free to read the performance metrics from a running activity and make changes to it on the fly.
Essentially, the ActivityType drivers are meant to handle the workload-specific machinery. They also provide dynamic
control points and parameters which are special to that activity type (driver). This exposes a full feedback loop between
a running scenario script and the activities that it runs. The scenario is free to read the performance metrics from a
running activity and make changes to it on the fly.

## Scripting Environment

The nosqlbench scripting environment provided has a few
modifications meant to streamline understanding and usage of nosqlbench dynamic parameters and metrics.
The nosqlbench scripting environment provided has a few modifications meant to streamline understanding and usage of
nosqlbench dynamic parameters and metrics.

### Active Bindings

Active bindings are control variables which, when assigned to, cause an immediate change in the behavior of the runtime. Each of the variables
below is pre-wired into each script environment.
Active bindings are control variables which, when assigned to, cause an immediate change in the behavior of the runtime.
Each of the variables below is pre-wired into each script environment.

#### scenario

This is the __Scenario Controller__ object which manages the activity executors in the runtime. All the methods on this Java type are provided
to the scripting environment directly.
This is the __Scenario Controller__ object which manages the activity executors in the runtime. All the methods on this
Java type are provided to the scripting environment directly.
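
For instance, a hedged sketch (assuming a `stop` method on the controller, mirroring the `stop <alias>` command from the CLI scripting guide):

```
// sketch: stop a running activity by alias from the scenario script
scenario.stop("foo");
```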

#### activities.<alias>.<paramname>

Each activity parameter for a given activity alias is available at this name within the scripting environment. Thus, you can change the number of threads on an activity named foo (alias=foo) in the scripting environment by assigning a value to it as in `activities.foo.threads=3`.
Any assignments take effect synchronously before the next line of the script continues executing.
Each activity parameter for a given activity alias is available at this name within the scripting environment. Thus, you
can change the number of threads on an activity named foo (alias=foo) in the scripting environment by assigning a value
to it as in `activities.foo.threads=3`. Any assignments take effect synchronously before the next line of the script
continues executing.
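
In script form, using the example from the text:

```
// sketch: takes effect synchronously before the next script line runs
activities.foo.threads = 3;
```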

#### __metrics__.<alias>.<metric name>

Each activity metric for a given activity alias is available at this name.
This gives you access to the metrics objects directly. Some metrics objects
have also been enhanced with wrapper logic to provide simple getters and setters, like `.p99ms` or `.p99ns`, for example.
Each activity metric for a given activity alias is available at this name. This gives you access to the metrics objects
directly. Some metrics objects have also been enhanced with wrapper logic to provide simple getters and setters, like
`.p99ms` or `.p99ns`, for example.
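
A small sketch, using the `foo` alias and `cycles` metric referenced later in this guide:

```
// sketch: read an enhanced snapshot getter directly
var p99 = metrics.foo.cycles.snapshot.p99ms;
```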

Interaction with the nosqlbench runtime and the activities therein is made easy
by the above variables and objects. When an assignment is made to any of these variables, the changes are propagated to internal listeners. For changes to _threads_, the thread pool responsible for the affected activity adjusts the number of active threads (AKA slots). Other changes are further propagated directly to the thread harnesses and components which implement the ActivityType.
Interaction with the nosqlbench runtime and the activities therein is made easy by the above variables and objects. When
an assignment is made to any of these variables, the changes are propagated to internal listeners. For changes to
_threads_, the thread pool responsible for the affected activity adjusts the number of active threads (AKA slots). Other
changes are further propagated directly to the thread harnesses and components which implement the ActivityType.

:::warning
Assignment to the _workload_ and _alias_ activity parameters has no special effect, as you can't change an activity to a different driver once it has been created.
Assignment to the _workload_ and _alias_ activity parameters has no special effect, as you can't change an activity to a
different driver once it has been created.
:::

You can make use of more extensive Java or Javascript libraries as needed,
mixing them with the runtime controls provided above.
You can make use of more extensive Java or Javascript libraries as needed, mixing them with the runtime controls
provided above.

## Enhanced Metrics for Scripting

The metrics available in nosqlbench are slightly different than the standard
kit with dropwizard metrics. The key differences are:
The metrics available in nosqlbench are slightly different than the standard kit with dropwizard metrics. The key
differences are:

### HDR Histograms

All histograms use HDR histograms with *four* significant digits.

All histograms reset on snapshot, automatically keeping all data until you
report the snapshot or access the snapshot via scripting. (see below).
All histograms reset on snapshot, automatically keeping all data until you report the snapshot or access the snapshot
via scripting. (see below).

The metric types that use histograms have been replaced with nicer versions for scripting. You don't have to do anything differently in your reporter config to use them. However, if you need to use the enhanced versions in your local scripting, you can. This means that Timer and Histogram types are enhanced. If you do not use the scripting extensions, then you will automatically get the standard behavior that you are used to, only with higher-resolution HDR and full snapshots for each report to your downstream metrics systems.
The metric types that use histograms have been replaced with nicer versions for scripting. You don't have to do anything
differently in your reporter config to use them. However, if you need to use the enhanced versions in your local
scripting, you can. This means that Timer and Histogram types are enhanced. If you do not use the scripting extensions,
then you will automatically get the standard behavior that you are used to, only with higher-resolution HDR and full
snapshots for each report to your downstream metrics systems.

### Scripting with Delta Snapshots

For both the timer and the histogram types, you can call getDeltaReader(), or access it simply as <metric>.deltaReader. When you do this, the delta snapshotting behavior is maintained until you use the deltaReader to access it. You can get a snapshot from the deltaReader by calling getDeltaSnapshot(10000), which causes the snapshot to be reset for collection, but retains a cache of the snapshot for any other consumer of getSnapshot() for that duration in milliseconds. If, for example, metrics reporters access the snapshot in the next 10 seconds, the reported snapshot will be exactly what was used in the script.
For both the timer and the histogram types, you can call getDeltaReader(), or access it simply as
<metric>.deltaReader. When you do this, the delta snapshotting behavior is maintained until you use the
deltaReader to access it. You can get a snapshot from the deltaReader by calling getDeltaSnapshot(10000), which causes
the snapshot to be reset for collection, but retains a cache of the snapshot for any other consumer of getSnapshot() for
that duration in milliseconds. If, for example, metrics reporters access the snapshot in the next 10 seconds, the
reported snapshot will be exactly what was used in the script.
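
Putting those calls together (alias and metric names are illustrative):

```
// sketch: take a delta snapshot, cached for other consumers for 10 seconds
var reader = metrics.foo.cycles.deltaReader;
var snapshot = reader.getDeltaSnapshot(10000);
```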

This is important for using local scripting methods and calculations with aggregate views downstream. It means that the histograms will match up between your local script output and your downstream dashboards, as they will both be using the same frame of data, when done properly.
This is important for using local scripting methods and calculations with aggregate views downstream. It means that the
histograms will match up between your local script output and your downstream dashboards, as they will both be using the
same frame of data, when done properly.

### Histogram Convenience Methods

All histogram snapshots have additional convenience methods for accessing every percentile in (P50, P75, P90, P95, P98, P99, P999, P9999) and every time unit in (s, ms, us, ns). For example, getP99ms() is supported, as is getP50ns(), and every other possible combination. This means that you can access the 99th percentile metric value in your scripts for activity _foo_ as _metrics.foo.cycles.snapshot.p99ms_.
All histogram snapshots have additional convenience methods for accessing every percentile in (P50, P75, P90, P95, P98,
P99, P999, P9999) and every time unit in (s, ms, us, ns). For example, getP99ms() is supported, as is getP50ns(), and
every other possible combination. This means that you can access the 99th percentile metric value in your scripts for
activity _foo_ as _metrics.foo.cycles.snapshot.p99ms_.

## Control Flow

When a script is run, it has absolute control over the scenario runtime while it is active. Once the script reaches its end, however, it will only exit if all activities have completed. If you want to explicitly stop a script, you must stop all activities.
When a script is run, it has absolute control over the scenario runtime while it is active. Once the script reaches its
end, however, it will only exit if all activities have completed. If you want to explicitly stop a script, you must stop
all activities.

## Strategies

You can use nosqlbench in the classic form with `run driver=<activity_type> param=value ...` command line syntax. There are reasons, however, that you will sometimes want to customize and modify your scripts directly, such as:
You can use nosqlbench in the classic form with `run driver=<activity_type> param=value ...` command line syntax. There
are reasons, however, that you will sometimes want to customize and modify your scripts directly, such as:

- Permute test variables to cover many sub-conditions in a test.
- Automatically adjust load factors to identify the nominal capacity of a system.
@ -89,7 +123,9 @@ You can use nosqlbench in the classic form with `run driver=<activity_type> para

## Script Input & Output

Internal buffers are kept for _stdin_, _stdout_, and _stderr_ for the scenario script execution. These are logged to the logfile upon script completion, with markers showing the timestamp and file descriptor (stdin, stdout, or stderr) that each line was recorded from.
Internal buffers are kept for _stdin_, _stdout_, and _stderr_ for the scenario script execution. These are logged to the
logfile upon script completion, with markers showing the timestamp and file descriptor (stdin, stdout, or stderr) that
each line was recorded from.

## External Docs

@ -4,23 +4,34 @@ title: Standard Metrics

# Standard Metrics

nosqlbench comes with a set of standard metrics that will be part of every activity type (driver). Each activity type (driver) enhances the metrics available by adding their own metrics with the nosqlbench APIs. This section explains what the standard metrics are, and how to interpret them.
nosqlbench comes with a set of standard metrics that will be part of every activity type (driver). Each activity type
(driver) enhances the metrics available by adding their own metrics with the nosqlbench APIs. This section explains what
the standard metrics are, and how to interpret them.

## read-input

Within nosqlbench, a data stream provider called an _Input_ is responsible for providing the actual cycle number that will be used by consumer threads. Because different _Input_ implementations may perform differently, a separate metric is provided to track the performance in terms of client-side overhead. The **read-input** metric is a timer that only measures the time it takes for a given activity thread to read the input value, nothing more.
Within nosqlbench, a data stream provider called an _Input_ is responsible for providing the actual cycle number that
will be used by consumer threads. Because different _Input_ implementations may perform differently, a separate metric
is provided to track the performance in terms of client-side overhead. The **read-input** metric is a timer that only
measures the time it takes for a given activity thread to read the input value, nothing more.

## strides

A stride represents the work-unit for a thread within nosqlbench. It allows a set of cycles to be logically grouped together for purposes of optimization -- or in some cases -- to simulate realistic client-side behavior over multiple operations. The stride is the number of cycles that will be allocated to each thread before it starts iterating on them.
A stride represents the work-unit for a thread within nosqlbench. It allows a set of cycles to be logically grouped
together for purposes of optimization -- or in some cases -- to simulate realistic client-side behavior over multiple
operations. The stride is the number of cycles that will be allocated to each thread before it starts iterating on them.

The **strides** timer measures the time each stride takes, including all cycles within the stride. It starts measuring time before the cycle starts, and stops measuring after the last cycle in the stride has run.
The **strides** timer measures the time each stride takes, including all cycles within the stride. It starts measuring
time before the cycle starts, and stops measuring after the last cycle in the stride has run.

## cycles

Within nosqlbench, each logical iteration of a statement is handled within a distinct cycle. A cycle represents an iteration of a workload. This corresponds to a single operation executed according to some statement definition.
Within nosqlbench, each logical iteration of a statement is handled within a distinct cycle. A cycle represents an
iteration of a workload. This corresponds to a single operation executed according to some statement definition.

The **cycles** metric is a timer that starts counting at the start of a cycle, before any specific activity behavior has control. It stops timing once the logical cycle is complete. This includes any additional phases that are executed by multi-phase actions.
The **cycles** metric is a timer that starts counting at the start of a cycle, before any specific activity behavior has
control. It stops timing once the logical cycle is complete. This includes any additional phases that are executed by
multi-phase actions.

@ -4,26 +4,45 @@ title: Timing Terms

# Timing Terms

Often, terms used to describe latency can create confusion.
In fact, the term _latency_ is so overloaded in practice that it is not useful by itself. Because of this, nosqlbench will avoid using the term latency _except in a specific way_. Instead, the terms described in this section will be used.
Often, terms used to describe latency can create confusion. In fact, the term _latency_ is so overloaded in practice
that it is not useful by itself. Because of this, nosqlbench will avoid using the term latency _except in a specific
way_. Instead, the terms described in this section will be used.

nosqlbench is a client-centric testing tool. The measurement of operations occurs on the client, without visibility to what happens in transport or on the server. This means that the client *can* see how long an operation takes, but it *cannot see* how much of the operational time is spent in transport and otherwise. This has a bearing on the terms that are adopted with nosqlbench.
nosqlbench is a client-centric testing tool. The measurement of operations occurs on the client, without visibility to
what happens in transport or on the server. This means that the client *can* see how long an operation takes, but it
*cannot see* how much of the operational time is spent in transport and otherwise. This has a bearing on the terms that
are adopted with nosqlbench.

Some terms are anchored by the context in which they are used. For latency terms, *service time* can be subjective. When using this term to describe other effects in your system, what is included depends on the perspective of the requester. The concept of service is universal, and every layer in a system can be seen as a service. Thus, the service time is defined by the vantage point of the requester. This is the perspective taken by the nosqlbench approach for naming and semantics below.
Some terms are anchored by the context in which they are used. For latency terms, *service time* can be subjective. When
using this term to describe other effects in your system, what is included depends on the perspective of the requester.
The concept of service is universal, and every layer in a system can be seen as a service. Thus, the service time is
defined by the vantage point of the requester. This is the perspective taken by the nosqlbench approach for naming and
semantics below.

## responsetime

**The duration of time a user has to wait for a response from the time they submitted the request.** Response time is the duration of time from when a request was expected to start, to the time at which the response is finally seen by the user. A request is generally expected to start immediately when users make a request. For example, when a user enters a URL into a browser, they expect the request to start immediately when they hit enter.
**The duration of time a user has to wait for a response from the time they submitted the request.** Response time is
the duration of time from when a request was expected to start, to the time at which the response is finally seen by the
user. A request is generally expected to start immediately when users make a request. For example, when a user enters a
URL into a browser, they expect the request to start immediately when they hit enter.

In nosqlbench, the response time for any operation can be calculated by adding its wait time and its service time together.
In nosqlbench, the response time for any operation can be calculated by adding its wait time and its service time
together.
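
As a hedged illustration with made-up numbers:

```
responsetime = waittime + servicetime
e.g. 5 ms (waiting to be submitted) + 20 ms (being serviced) = 25 ms response time
```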
|
||||
|
||||
## waittime
|
||||
|
||||
**The duration of time between when an operation is intended to start and when it actually starts on a client.** This is also called *scheduling delay* in some places. Wait time occurs because clients are not able to make all requests instantaneously when expected. There is an ideal time at which the request would be made according to user demand. This ideal time is always earlier than the actual time in practice. When there is a shortage of resources *of any kind* that delays a client request, it must wait.
|
||||
|
||||
**The duration of time between when an operation is intended to start and when it actually starts on a client.** This is
|
||||
also called *scheduling delay* in some places. Wait time occurs because clients are not able to make all requests
|
||||
instantaneously when expected. There is an ideal time at which the request would be made according to user demand. This
|
||||
ideal time is always earlier than the actual time in practice. When there is a shortage of resources *of any kind* that
|
||||
delays a client request, it must wait.
|
||||
|
||||
Wait time can accumulate when you are running something according to a dispatch rate, as with a rate limiter.
## servicetime

**The duration of time it takes a server or other system to fully process a request and send a response.** From the
perspective of a testing client, the _system_ includes the infrastructure as well as remote servers. As such, the
service time metrics in nosqlbench include any operational time that is external to the client, including transport
latency.

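For example, the service time of a CQL operation as measured by nosqlbench includes the network round trip and the
server-side processing, but not any client-side scheduling delay, which is accounted for separately as wait time.
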
# Advanced Metrics

## Unit of Measure

All metrics collected from activities are recorded in nanoseconds and ops per second. All histograms are recorded with
4 digits of precision using HDR histograms.

## Metric Outputs

Metrics from a scenario run can be gathered in multiple ways:

- To a monitoring system via graphite
- Via the `--docker-metrics` option

With the exception of the `--docker-metrics` approach, these forms may be used in combination. The command line options
for enabling them are documented in the built-in help, although some examples may be found below.

## Metrics via Graphite

If you like to have all of your testing data in one place, then you may be interested in reporting your measurements to
a monitoring system. For this, nosqlbench includes a [Metrics Library](https://github.com/dropwizard/metrics). Graphite
reporting is baked in as the default reporter.

In order to enable graphite reporting, use one of these option formats:
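
For example, assuming the `--report-graphite-to` option and Graphite's default plaintext port of 2003 (host name
invented for illustration):

    --report-graphite-to metricshost
    --report-graphite-to metricshost:2003
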
Core metrics use the prefix _engineblock_ by default. You can override this prefix with a command-line option (see the
built-in help).

## Identifiers

Metrics associated with a specific activity will have the activity alias in their name. There is a set of core metrics
which are always present regardless of the activity type. The names and types of additional metrics provided for each
activity type vary.

Sometimes, an activity type will expose metrics on a per-statement basis, measuring over all invocations of a given
statement as defined in the YAML. In these cases, you will see `--` separating the name components of the metric. At
its most verbose, a metric name could take a form like
`<activity>.<docname>--<blockname>--<statementname>--<metricname>`, although this is rare when you name your statements,
which is recommended. Just keep in mind that the double dash connects an activity's alias with named statements
*within* that activity.

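As a hypothetical example (all names here are invented for illustration), an activity aliased `write1`, running a
statement named `insert-user` from a block named `main` in a workload doc named `myworkload`, could expose a
per-statement metric named `write1.myworkload--main--insert-user--success`. The actual names come from your activity
alias and your YAML structure.
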
## HDR Histograms

If you want to record only certain metrics in this way, then use this form:

--log-histograms 'hdrdata.log:.*suffix'

Notice that the option is enclosed in single quotes. This is because the second part of the option value is a regex.
The '.*suffix' pattern matches any metric name that ends with "suffix". Effectively, leaving out the pattern is the
same as using '.\*', which matches all metrics. Any valid regex is allowed here.

Metrics may be included in multiple logs, but care should be taken not to overdo this. Keeping higher-fidelity
histogram reservoirs does come with a cost, so be as specific as possible about what you record.

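As a hypothetical illustration (the file names and patterns here are invented), two histogram logs could capture
disjoint sets of metrics:

    --log-histograms 'timers.log:.*servicetime' --log-histograms 'results.log:.*result'
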
If you want to specify the recording interval, use this form:

--log-histograms 'hdrdata.log:.*suffix:5s'

If you want to specify the interval, you must use the third form above, although it is valid to leave the pattern
empty, such as 'hdrdata.log::5s'.

Each interval specified will be tracked in a discrete reservoir in memory, so they will not interfere with each other
in terms of accuracy.

### Recording HDR Histogram Stats

You can also record basic snapshots of histogram data on a periodic interval, just like above with HDR histogram logs.
The option to do this is:

--log-histostats 'hdrstats.log:.*suffix:10s'

Everything works the same as for HDR histogram logging, except that the format is CSV, as shown in the example below:

~~~
Tag=diag1.cycles,0.501,0.499,498,1024,2047,2047,4095,4095,4095,4095,4095,4095,40
...
~~~

This includes the metric name (Tag), the interval start time and length (from the beginning of collection time), the
number of metrics recorded (count), the minimum magnitude, a number of percentile measurements, and the maximum value.
Notice that the format used is similar to that of the HDR logging, although instead of including the raw histogram
data, common percentiles are recorded directly.