import dsbench docs

This commit is contained in:
Jonathan Shook 2020-03-04 13:33:58 -06:00
parent a39742b10c
commit 7d8a0ee403
62 changed files with 2880 additions and 2 deletions

View File

@ -28,7 +28,7 @@
<dependency>
<groupId>io.nosqlbench</groupId>
<artifactId>virtdata-docsys</artifactId>
<artifactId>docsys</artifactId>
<version>3.12.3-SNAPSHOT</version>
<scope>compile</scope>
</dependency>

View File

@ -0,0 +1,25 @@
---
title: Compatibility
weight: 3
---
# Binary Format
DSBench is distributed primarily as a binary for Linux distributions. The DSBench binary includes its own OpenJDK runtime. It should work for most modern Linux distributions. The only requirement is that fuse be installed (it is usually already present) on the client system.
# Supported Systems
DSBench runs on Linux as a binary distribution. Any modern Linux which can run AppImage binaries should work.
# Activity Types
Activity types are how DSBench gets its support for different protocols or client drivers. The initial release of DSBench includes support for these activity types:
- The CQL activity type
- The initial release of the CQL activity type uses the DataStax driver version 1.9.0
- The stdout activity type.

View File

@ -0,0 +1,18 @@
---
title: Release Notes
weight: 5
---
# Release Notes
The enhancements, fixes, and usage notes that go with each release
will be summarized here.
## Initial Release
- Release Date: 01/29/2020
- System Requirements:
- Any recent Linux system with FUSE support
- Supported Activity Types:
- cql
- stdout

View File

@ -0,0 +1,26 @@
---
title: Support Options
weight: 10
---
# Support Options
These guidelines are mirrored at the [Submitting Feedback](https://github.com/datastax/dsbench-labs/wiki/Submitting-Feedback) wiki page at the dsbench project site, which is also where the `[Submit Feedback]` link will take you.
## Community Support
DSBench is supported by a community of active users at [DataStax DSBench Community](https://community.datastax.com/spaces/51/index.html).
## Bug Fixes
If you think you have found a bug, please [file a bug report](https://github.com/datastax/dsbench-labs/issues/new?labels=bug). DSBench is actively used within DataStax, and verified bugs will get attention as resources permit. Bug reports which are more detailed, or which include steps to reproduce, will get attention first.
## Feature Requests
If you would like to see something in DSBench that is not there yet,
please [submit a feature request](https://github.com/datastax/dsbench-labs/issues/new?labels=feature).
## Documentation Requests
If you would like to see a specific dsbench or testing topic added to the guidebook, please [request docs content](https://github.com/datastax/dsbench-labs/issues/new?labels=docrequest).

View File

@ -0,0 +1,24 @@
---
title: Troubleshooting
weight: 05
---
# Troubleshooting
This section will contain some troubleshooting guidance for
common issues as we uncover them.
## Errors while starting dsbench binary
If you get an error while trying to run the Linux DSBench binary, ensure that you have the system module installed for fuse. This module is used by the AppImage runtime that allows for a bundled binary.
## Errors when running java -jar
### Verify java binary path
You will need to make sure that the java binary is the correct one that is being run. Either call it with the full path `/usr/local/...` or use `which java` to see which java executable is used when you just run `java ...`.
### Verify java version
Each version of dsbench requires a particular major version of Java. For example, dsbench version 2.12.26 requires at least Java 12.
You can quickly check which version of java you have on your path with `java -version`.
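For example, a quick check might look like this (exact paths will vary by system):

```
# show which java executable is first on the PATH
which java
# print the version of that executable
java -version
```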

View File

@ -0,0 +1,16 @@
---
title: Introducing DSBench
weight: 10
---
# DataStax Bench Documentation
Welcome to the documentation for DataStax Bench (DSBench). DSBench is a power tool that emulates real application workloads. This means that you can fast-track performance, sizing and data model testing without writing your own testing harness.
DSBench endeavors to be valuable to all users. We do this by making it easy for you, our user, to do just what you need without worrying about the rest. If you need to do something simple, it should be simple to find the right settings and just do it. If you need something more sophisticated, then you should be able to find what you need with a reasonable amount of effort and no surprises.
Doing this well requires a coordinated effort in how the tools are documented and layered. We're just getting started with the bundled
docs that you are reading now. Look for new and expanded content in this guidebook with each release. We will be adding docs for more advanced users, organized in a how-to format.
We take requests! If you have specific dsbench topics you'd like to
have added to this guidebook, please make a request as described under the Support Options section.

View File

@ -0,0 +1,29 @@
---
title: 01 Installing
weight: 1
---
# 1. Installing DSBench
If you are viewing this via the guidebook, you've already completed this step and you can move on to the next section.
If you are viewing this documentation as exported from the guidebook, then you need to get the binary or jar for your system.
The binary is recommended, since it contains its own built-in JVM. If you are running Linux, get the dsbench binary for Linux.
If you are running another system with a supported JVM, then you can do the following:
1. Download dsbench.jar
2. Download and install the JVM corresponding to the dsbench version. (The second number of the dsbench version indicates the JVM version). For example, dsbench version 2.13.4 would require JVM 13.
3. Execute dsbench as `java -jar dsbench.jar ...`. You can replace the ellipsis `...` with any valid dsbench command line.
If you have any trouble, check the troubleshooting section.
## Sanity Check
To ensure that dsbench runs on your system, simply run it as
dsbench --version

View File

@ -0,0 +1,151 @@
---
title: 02 Running
weight: 2
---
# 2. Running DSBench
Now that we have DSBench installed, we will run a simple test against a DSE cluster to establish some basic familiarity with the tool.
## Create Schema
We will start by creating a simple schema in the database.
From your command line, go ahead and execute the following command,
replacing the `host=<dse-host-or-ip>` with that of one of your database nodes.
./dsbench start type=cql yaml=baselines/cql-keyvalue tags=phase:schema host=<dse-host-or-ip>
This command is creating the following schema in your database:
```cql
CREATE KEYSPACE baselines WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE baselines.keyvalue (
key text PRIMARY KEY,
value text
)
```
Let's break down each of those command line options.
`start` tells DSBench to start an activity.
`type=...` is used to specify the activity type. In this case we are using `cql`, which tells DSBench to use the DataStax Java Driver and execute CQL statements against a database.
`yaml=...` is used to specify the yaml file that defines the activity.
All activities require a yaml in which you configure things such as data bindings and CQL statements, but don't worry about those details right now.
In this example, we use `baselines/cql-keyvalue` which is a pre-built workload that is packaged with DSBench.
`tags=phase:schema` tells DSBench to run the yaml block that has the `phase:schema` defined as one of its tags.
In this example, that is the DDL portion of the `baselines/cql-keyvalue` workload.
`host=...` tells DSBench how to connect to your database; only one host is necessary.
If you like, you can verify the result of this command by describing your keyspace in cqlsh or DataStax Studio with `DESCRIBE KEYSPACE baselines`.
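For example, if you have `cqlsh` available, this prints the keyspace and table definitions (using the same host placeholder as above):

```
cqlsh <dse-host-or-ip> -e 'DESCRIBE KEYSPACE baselines'
```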
## Load Some Data
Before running a test of typical access patterns where you want to capture the results, you need to make the test more interesting than loading an empty table. For this, we use the rampup phase.
Before sending our test writes to the database, we will use the `stdout` activity type so we can see what DSBench is generating for CQL statements.
Go ahead and execute the following command:
./dsbench start type=stdout yaml=baselines/cql-keyvalue tags=phase:rampup cycles=10
You should see 10 of the following statements in your console
```
insert into baselines.keyvalue (key, value) values (0,382062539);
insert into baselines.keyvalue (key, value) values (1,774912474);
insert into baselines.keyvalue (key, value) values (2,949364593);
insert into baselines.keyvalue (key, value) values (3,352527683);
insert into baselines.keyvalue (key, value) values (4,351686621);
insert into baselines.keyvalue (key, value) values (5,114304900);
insert into baselines.keyvalue (key, value) values (6,439790106);
insert into baselines.keyvalue (key, value) values (7,564330072);
insert into baselines.keyvalue (key, value) values (8,296173906);
insert into baselines.keyvalue (key, value) values (9,97405552);
```
One thing to know is that DSBench deterministically generates data, so the generated values will be the same from run to run.
Now we are ready to write some data to our database. Go ahead and execute the following from your command line:
./dsbench start type=cql yaml=baselines/cql-keyvalue tags=phase:rampup host=<dse-host-or-ip> cycles=100k --progress console:1s
Note the differences between this and the command that we used to generate the schema.
`tags=phase:rampup` is running the yaml block in `baselines/cql-keyvalue` that has only INSERT statements.
`cycles=100k` will run a total of 100,000 operations, in this case, 100,000 writes. You will want to pick an appropriately large number of cycles in actual testing to make your main test meaningful.
`--progress console:1s` will print the progression of the run to the console every 1 second.
You should see output that looks like this
```
baselines/cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
baselines/cql-keyvalue: 0.00%/Running (details: min=0 cycle=1 max=100000)
baselines/cql-keyvalue: 0.32%/Running (details: min=0 cycle=325 max=100000)
baselines/cql-keyvalue: 1.17%/Running (details: min=0 cycle=1171 max=100000)
baselines/cql-keyvalue: 2.36%/Running (details: min=0 cycle=2360 max=100000)
baselines/cql-keyvalue: 3.65%/Running (details: min=0 cycle=3648 max=100000)
baselines/cql-keyvalue: 4.61%/Running (details: min=0 cycle=4613 max=100000)
baselines/cql-keyvalue: 5.59%/Running (details: min=0 cycle=5593 max=100000)
baselines/cql-keyvalue: 7.14%/Running (details: min=0 cycle=7138 max=100000)
baselines/cql-keyvalue: 8.87%/Running (details: min=0 cycle=8868 max=100000)
...
baselines/cql-keyvalue: 100.00%/Finished (details: min=0 cycle=100000 max=100000)
```
## Run main workload
Now that we have a base dataset of 100k rows in the database, we will run a mixed read/write workload. By default, this runs a 50% read / 50% write mix.
./dsbench start type=cql yaml=baselines/cql-keyvalue tags=phase:main host=<dse-host-or-ip> cycles=100k cyclerate=5000 threads=50 --progress console:1s
You should see output that looks like this:
```
Logging to logs/scenario_20190812_154431_028.log
baselines/cql-keyvalue: 0.50%/Running (details: min=0 cycle=500 max=100000)
baselines/cql-keyvalue: 2.50%/Running (details: min=0 cycle=2500 max=100000)
baselines/cql-keyvalue: 6.70%/Running (details: min=0 cycle=6700 max=100000)
baselines/cql-keyvalue: 11.16%/Running (details: min=0 cycle=11160 max=100000)
baselines/cql-keyvalue: 14.25%/Running (details: min=0 cycle=14250 max=100000)
baselines/cql-keyvalue: 18.41%/Running (details: min=0 cycle=18440 max=100000)
baselines/cql-keyvalue: 22.76%/Running (details: min=0 cycle=22760 max=100000)
baselines/cql-keyvalue: 27.27%/Running (details: min=0 cycle=27300 max=100000)
baselines/cql-keyvalue: 31.81%/Running (details: min=0 cycle=31810 max=100000)
baselines/cql-keyvalue: 36.34%/Running (details: min=0 cycle=36340 max=100000)
baselines/cql-keyvalue: 40.90%/Running (details: min=0 cycle=40900 max=100000)
baselines/cql-keyvalue: 45.48%/Running (details: min=0 cycle=45480 max=100000)
baselines/cql-keyvalue: 50.05%/Running (details: min=0 cycle=50050 max=100000)
baselines/cql-keyvalue: 54.36%/Running (details: min=0 cycle=54360 max=100000)
baselines/cql-keyvalue: 58.91%/Running (details: min=0 cycle=58920 max=100000)
baselines/cql-keyvalue: 63.40%/Running (details: min=0 cycle=63400 max=100000)
baselines/cql-keyvalue: 66.96%/Running (details: min=0 cycle=66970 max=100000)
baselines/cql-keyvalue: 71.61%/Running (details: min=0 cycle=71610 max=100000)
baselines/cql-keyvalue: 76.11%/Running (details: min=0 cycle=76130 max=100000)
baselines/cql-keyvalue: 80.66%/Running (details: min=0 cycle=80660 max=100000)
baselines/cql-keyvalue: 85.22%/Running (details: min=0 cycle=85220 max=100000)
baselines/cql-keyvalue: 89.80%/Running (details: min=0 cycle=89800 max=100000)
baselines/cql-keyvalue: 94.46%/Running (details: min=0 cycle=94460 max=100000)
baselines/cql-keyvalue: 98.93%/Running (details: min=0 cycle=98930 max=100000)
baselines/cql-keyvalue: 100.00%/Finished (details: min=0 cycle=100000 max=100000)
```
We have a few new command line options here:
`tags=phase:main` is using a new block in our activity's yaml that contains both read and write queries.
`threads=50` is an important one. The default for DSBench is to run with a single thread. This is not adequate for workloads that will be running many operations, so threads is used as a way to increase concurrency on the client side.
`cyclerate=5000` is used to control the operations per second that are initiated by DSBench. This command line option is the primary means to rate limit the workload and here we are running at 5000 ops/sec.
## Now What?
Note in the above output, we see `Logging to logs/scenario_20190812_154431_028.log`.
By default, DSBench records the metrics from the run in this file. We will go into detail about these metrics in the next section, Getting Results.

View File

@ -0,0 +1,44 @@
---
title: 03 Getting Results
weight: 3
---
# 3. Getting Results
Coming off of our first run with DSBench, we ran a very simple workload against our database. In that example, we saw that DSBench writes to a log file, and it is in that log file where the most basic form of metrics is recorded.
## Log File Metrics
For our previous run, we saw that DSBench was writing to `logs/scenario_20190812_154431_028.log`
Even when you don't configure DSBench to write its metrics to another location, it will periodically report all the metrics to the log file.
At the end of a scenario, before DSBench shuts down, it will flush the partial reporting interval again to the logs. This means you can always
look in the logs for metrics information.
:::warning
If you look in the logs for metrics, be aware that the last report will only contain a partial interval of results. When looking at the last partial window, only metrics which average over time or which compute the mean for the whole test will be meaningful.
:::
Below is a sample of the log that gives us our basic metrics. There is a lot to digest here; for now we will focus on only a subset of the most important metrics.
```
2019-08-12 15:46:00,274 INFO [main] i.e.c.ScenarioResult [ScenarioResult.java:48] -- BEGIN METRICS DETAIL --
2019-08-12 15:46:00,294 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=GAUGE, name=baselines/cql-keyvalue.cycles.config.burstrate, value=5500.0
2019-08-12 15:46:00,295 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=GAUGE, name=baselines/cql-keyvalue.cycles.config.cyclerate, value=5000.0
2019-08-12 15:46:00,295 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=GAUGE, name=baselines/cql-keyvalue.cycles.waittime, value=3898782735
2019-08-12 15:46:00,298 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM, name=baselines/cql-keyvalue.resultset-size, count=100000, min=0, max=1, mean=8.0E-5, stddev=0.008943914131967056, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
2019-08-12 15:46:01,703 INFO [main] i.e.c.ScenarioResult [ScenarioResult.java:56] -- END METRICS DETAIL --
```
The log contains lots of information on metrics, but this is obviously _not_ the most desirable way to consume metrics from DSBench.
We recommend that you use one of these methods, according to your environment or tooling available:
1. `--docker-metrics` with a local docker-based grafana dashboard (See the section on Docker Based Metrics)
2. Send your metrics to a dedicated graphite server with `--report-graphite-to graphitehost`
3. Record your metrics to local CSV files with `--report-csv-to my_metrics_dir`
4. Record your metrics to HDR logs with `--log-histograms my_hdr_metrics.log`
See the command line reference for details on how to route your metrics to a metrics collector or format of your preference.
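As an illustrative combination (the directory name is an assumption), the main-workload command from the previous section could record its metrics to local CSV files like this:

```
./dsbench start type=cql yaml=baselines/cql-keyvalue tags=phase:main host=<dse-host-or-ip> \
  cycles=100k threads=50 --report-csv-to my_metrics_dir --report-interval 10
```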

View File

@ -0,0 +1,63 @@
---
title: 04 Reading Metrics
weight: 4
---
# 4. Reading Metrics
A set of core metrics are provided for every workload that runs with DSBench, regardless of the activity type and protocol used. This section explains each of these metrics and shows an example of them from the log file.
## metric: result
This is the primary metric that should be used to get a quick idea of the throughput and latency for a given run. It encapsulates the entire operation life cycle (i.e. bind, execute, and get the result back).
For this example we see that we averaged 3732 operations / second with 3.6ms 75th percentile latency and 23.9ms 99th percentile latency. Note the raw metrics are in microseconds. This duration_unit may change depending on how a user configures dsbench, so always double-check it.
```
2019-08-12 15:46:01,310 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER, name=baselines/cql-keyvalue.result, count=100000, min=233.48, max=358596.607, mean=3732.00338612, stddev=10254.850416061185, median=1874.815, p75=3648.767, p95=10115.071, p98=15855.615, p99=23916.543, p999=111292.415, mean_rate=4024.0234405430424, m1=3514.053841156124, m5=3307.431472596865, m15=3268.6786509004132, rate_unit=events/second, duration_unit=microseconds
```
## metric: result-success
This metric shows whether there were any errors during the run. You can confirm that the count is equal to the number of cycles for the run if you are expecting or requiring zero failed operations.
Here we see that all 100k of our cycles succeeded. Note that the metrics for throughput and latency here are slightly different than for the `result` metric, simply because this is a separate timer that only includes operations which completed with no exceptions.
```
2019-08-12 15:46:01,452 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=TIMER, name=baselines/cql-keyvalue.result-success, count=100000, min=435.168, max=358645.759, mean=3752.40990808, stddev=10251.524945886964, median=1889.791, p75=3668.479, p95=10154.495, p98=15884.287, p99=24280.063, p999=111443.967, mean_rate=4003.3090048756894, m1=3523.40328629036, m5=3318.8463896065778, m15=3280.480326762243, rate_unit=events/second, duration_unit=microseconds
```
## metric: resultset-size
For read workloads, this metric shows the size of the result set sent back to DSBench from the server. This is useful to confirm that you are reading rows that already exist in the database.
TODO: talk about mix of read / writes and how that affects this metric
```
2019-08-12 15:46:00,298 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM, name=baselines/cql-keyvalue.resultset-size, count=100000, min=0, max=1, mean=8.0E-5, stddev=0.008943914131967056, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
```
## metric: tries
DSBench will retry failures 10 times by default; this is configurable via the `maxtries` command line option < link >. This metric shows a histogram of the number of tries that each operation required. In this example, there were no retries, since the `min` and `max` values are both 1.
```
2019-08-12 15:46:00,341 INFO [main] i.e.c.ScenarioResult [Slf4jReporter.java:373] type=HISTOGRAM, name=baselines/cql-keyvalue.tries, count=100000, min=1, max=1, mean=1.0, stddev=0.0, median=1.0, p75=1.0, p95=1.0, p98=1.0, p99=1.0, p999=1.0
```
### More Metrics
DSBench provides many ways to report the metrics from a run, including:
- Built-in Docker Dashboard
- Reporting to CSV
- Reporting to Graphite
- Reporting to HDR
To get more information on these options, see the output of
dsbench --help
### Congratulations
You have completed your first run with DSBench! Let's head over to the Next Steps section < link > to talk about the possibilities that are now at our fingertips.

View File

@ -0,0 +1,35 @@
---
title: 05 Next Steps
weight: 5
---
# 5. Next Steps
Now that you've run dsbench for the first time and seen what it does, you can choose what level of customization you want for further testing.
The sections below describe key areas that users typically customize when working with dsbench.
Everyone who uses DSBench will want to get familiar with the basics section below. This is essential reading for new and experienced testers alike.
## High-Level Users
Several canonical workloads are already baked into dsbench for immediate use. If you simply want to drive workloads from dsbench without building a custom workload, then you'll want to learn about the available workloads and their options.
Recommended reading for this is:
1. 'Built-In Workloads'
2. 'DSBench Basics'
## Workload Builders
If you want to use dsbench to build a tailored workload that closely emulates what a specific application would do, then you can build a YAML file that specifies all of the details of an iterative workload. You can specify the access patterns, data distributions, and more.
The recommended reading for this is:
1. 'DSBench Basics'
2. All of the 'Designing Workloads' section.
3. The online examples (find the links in the Designing Workloads section.)
## Scenario Developers
The underlying runtime for a scenario in dsbench is based on EngineBlock,
which means it has all the scripting power that comes with that. For advanced scenario designs, iterative testing models, or analysis methods, you can use ECMAScript to control the scenario from start to finish. This is an advanced feature that is not recommended for first-time users. A guide for scenario developers will be released in increments.

View File

@ -0,0 +1,9 @@
---
title: Getting Started
weight: 20
---
# Getting Started
In this Getting Started track, we will walk you through your first test run with DSBench and explain the minimal set of information that you will need to get off the ground. We recommend that you go through the steps in this section in order, as each step builds on the last.

View File

@ -0,0 +1,174 @@
---
title: DSBench CLI Options
weight: 01
---
# DSBench CLI Options
This is the same documentation you get in markdown format with the
`dsbench --help` command.
---------------------------------------
Help ( You're looking at it. )
--help
Short options, like '-v' represent simple options, like verbosity.
Using multiples increases the level of the option, like '-vvv'.
Long options, like '--help' are top-level options that may only be
used once. These modify general behavior, or allow you to get more
details on how to use dsbench.
All other options are either commands, or named arguments to commands.
Any single word without dashes is a command that will be converted
into script form. Any option that includes an equals sign is a
named argument to the previous command. The following example
is a commandline with a command *start*, and two named arguments
to that command.
dsbench start type=diag alias=example
### Discovery options ###
These options help you learn more about running dsbench, and
about the plugins that are present in your particular version.
Get a list of additional help topics that have more detailed
documentation:
dsbench help topics
Provide specific help for the named activity type:
dsbench help <activity type>
List the available activity types
--list-activity-types
Provide the metrics that are available for scripting
--list-metrics <activity type> [ <activity name> ]
### Execution Options ###
This is how you actually tell dsbench what scenario to run. Each of these
commands appends script logic to the scenario that will be executed.
These are considered commands, and can occur in any order and quantity.
The only rule is that arguments in the arg=value form will apply to
the preceding script or activity.
Add the named script file to the scenario, interpolating named parameters:
script <script file> [arg=value]...
Add the named activity to the scenario, interpolating named parameters
run [arg=value]...
### General options ###
These options modify how the scenario is run.
Specify a directory for scenario log files:
--logs-dir <dirname>
Specify a limit on logfiles (old files will be purged):
--logs-max <count>
Specify the priority level of file logs:
--logs-level <level>
where `<level>` can be one of OFF, ERROR, WARN, INFO, DEBUG, TRACE, or ALL
Specify an override for one or more classes:
--log-level-override com.foobarbaz:DEBUG,com.barfoobaz:TRACE
Specify the logging pattern:
--with-logging-pattern '%date %level [%thread] %logger{10} [%file:%line] %msg%n'
( default: '%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n' )
( See https://logback.qos.ch/manual/layouts.html#ClassicPatternLayout for format options )
Specify a directory and enable CSV reporting of metrics:
--report-csv-to <dirname>
Specify the graphite destination and enable reporting
--report-graphite-to <addr>[:<port>]
Specify the interval for graphite or CSV reporting in seconds (default: 10)
--report-interval <interval-seconds>
Specify the metrics name prefix for graphite reporting
--metrics-prefix <metrics-prefix>
Log all HDR histogram data to a file
--log-histograms histodata.log
--log-histograms 'histodata.log:.*'
--log-histograms 'histodata.log:.*:1m'
--log-histograms 'histodata.log:.*specialmetrics:10s'
Log HDR histogram stats to a CSV file
--log-histostats stats.csv
--log-histostats 'stats.csv:.*' # same as above
--log-histostats 'stats.csv:.*:1m' # with 1-minute interval
--log-histostats 'stats.csv:.*specialmetrics:10s'
Adjust the progress reporting interval
--progress console:10s
or
--progress logonly:5m
If you want to add in classic time decaying histogram metrics
for your histograms and timers, you may do so with this option:
--classic-histograms prefix
--classic-histograms 'prefix:.*' # same as above
--classic-histograms 'prefix:.*specialmetrics' # subset of names
Name the current session, for logfile naming, etc
By default, this will be "scenario-TIMESTAMP", and a logfile will be created
for this name.
--session-name <name>
Enlist engineblock to stand up your metrics infrastructure using a local docker runtime:
--docker-metrics
When this option is set, engineblock will start graphite, prometheus, and grafana automatically on your local docker, configure them to work together, and point engineblock to send metrics to the system automatically. It also imports a base dashboard for engineblock and configures grafana snapshot export to share with a central DataStax grafana instance (grafana can be found on localhost:3000 with the default credentials admin/admin).
### Console Options ###
Increase console logging levels: (Default console logging level is *warning*)
-v (info)
-vv (debug)
-vvv (trace)
--progress console:1m (disables itself if -v options are used)
These levels affect *only* the console output level. Other logging level
parameters affect logging to the scenario log, stored by default in logs/...
Show version, long form, with artifact coordinates.
--version

View File

@ -0,0 +1,18 @@
---
title: Grafana Metrics
weight: 2
---
# (docker-based) Grafana Metrics
DSBench comes with a built-in helper to get you up and running quickly
with client-side testing metrics.
:::warning
This feature requires that you have docker running on the local system and that your user is in a group that is allowed to manage docker. Using the `--docker-metrics` command *will* attempt to manage docker on your local system.
:::
To ask DSBench to stand up your metrics infrastructure using a local docker runtime, use this command line option with any other DSBench commands:
--docker-metrics
When this option is set, dsbench will start graphite, prometheus, and grafana automatically on your local docker, configure them to work together, and send metrics to the system automatically. It also imports a base dashboard for dsbench and configures grafana snapshot export to share with a central DataStax grafana instance (grafana can be found on localhost:3000 with the default credentials admin/admin).
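As a sketch, it can be combined with any other invocation, for example the main workload from the Getting Started section:

```
./dsbench --docker-metrics start type=cql yaml=baselines/cql-keyvalue tags=phase:main host=<dse-host-or-ip> cycles=100k threads=50
```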

View File

@ -0,0 +1,54 @@
---
title: Parameter Types
weight: 03
---
# Parameter Types
To configure a DSBench activity to do something meaningful, you have to
provide parameters to it. This can occur in one of several ways. This section is a guide on DSBench parameters, how they layer together, and when to use one form over another.
The command line is used to configure both the overall DSBench runtime (logging, etc.) as well as the individual activities and scripts. Global DSBench options can be distinguished from scenario commands and their parameters because global options always start with a single or double hyphen (`-` or `--`).
## Activity Parameters
Parameters for an activity always have the form of `<name>=<value>` on the command line. Activity parameters *must* follow a command, such as `run` or `start`, for example. Scenario commands are always single words without any leading hyphens. Every command-line argument that follows a scenario command in the form of `<name>=<value>` is a parameter to that command.
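For example (the values are illustrative), the following command line has one global option, one scenario command, and three named arguments to that command:

```
./dsbench --progress console:10s start type=cql yaml=baselines/cql-keyvalue threads=50
```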
Activity parameters can be provided by the DSBench core runtime or they can be provided by the activity type. Together, they are all used to configure an activity. It's not important where they are provided from so long as you know what they do for your workloads, how to configure them, and where to find the docs.
*Core* Activity Parameters are those provided by the core runtime.
They are part of the core API and used by every activity type. Core activity params include *type*, *alias*, and *threads*, for example.
These parameters are explained individually under the next section.
*Custom* Activity Parameters are those provided by an activity type.
These parameters are documented for each activity type. You can see them by running `dsbench help <activity type>`.
Activity type parameters may be dynamic. *Dynamic* Activity Parameters are parameters which may be changed while an activity is running. This means that scenario scripting logic may change some variables while an activity is running, and that the runtime should dynamically adjust to match. Dynamic parameters are mainly used in more advanced scripting scenarios.
Parameters that are dynamic should be documented as such in the respective activity type's help page.
### Template Parameters
If you need to provide general-purpose overrides to a named section of the
standard YAML, then you may use a mechanism called _template parameters_. These are just like activity parameters, but they are set via a macro and can have defaults. This is a YAML format feature that allows you to easily template workload properties in a way that is easy to override on the command line or via scripting. More details on template parameters are shared under 'Designing Workloads|Template Params'.
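As a minimal sketch (the statement and names shown are illustrative, not from a shipped workload), a template parameter with a default might appear in a YAML like this:

```yaml
statements:
 - read-kv: select * from <<keyspace:baselines>>.keyvalue where key=?key;
```

Here `keyspace` would resolve to `baselines` unless a `keyspace=...` parameter is supplied on the command line or via scripting.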
### Parameter Loading
Now that we've described all the parameter types, let's tie them together. When an activity is loaded from the command line or script, the parameters are resolved in the following order:
1. The `type` parameter tells DSBench which activity type implementation to load.
2. The activity type implementation creates an activity.
3. The activity is initialized with the parameters provided.
4. The yaml parameter is used to load the workload definition into
a buffer without parsing the YAML.
5. Any template parameters in the file in `<<varname:default value>>` or `TEMPLATE(varname,default value)` form are resolved, taking override values from the provided params.
6. Finally, the activity is started.
## Statement Parameters
Some activities make use of a parameters for statements. These are called _statement parameters_ and are completely different than _activity parameters_. Statement parameters in a YAML allow you to affect *how* a statement is used in a workload. Just as with activity level parameters, statement parameters may be supported by the core runtime or by an activity type. These are also documented in the respective activity type's documentation included in the 'Activity Types' section.
The core statement parameters are explained just below the core activity parameters in this section.

View File

@ -0,0 +1,202 @@
---
title: Core Activity Params
weight: 05
---
# Core Activity Parameters
Activity parameters are passed as named arguments for an activity,
either on the command line or via a scenario script. On the command line, these take the form of
<paramname>=<paramvalue>
Some activity parameters are universal in that they can be used with any activity type. These parameters are recognized by DSBench whether or not they are recognized by a particular activity type implementation. These are called _core parameters_. Only core activity parameters are documented here.
:::info
To see what activity parameters are valid for a given activity type, see the documentation for that activity type with `dsbench help <activity type>`.
:::
## type
- `type=<activity type>`
- _default_: inferred from `alias` or `yaml` parameters, or unset
- _required_: yes, unless inferred
- _dynamic_: no
Every activity is powered by a named ActivityType. Thus, you must set the `type` parameter. If you do not specify this parameter, it will be inferred from a substring match against the alias and/or yaml parameters. If there is more than one valid match for a valid type value, then you must set the type parameter directly.
Telling DSBench what type of an activity will be run also determines what other parameters are considered valid and how they will be used. So in this way, the type parameter is actually the base parameter for any activity. When used with scenario commands like `run` or `start`, an activity of the named type will be initialized, and then further activity parameters on the command line will be used to configure it before it is started.
## alias
- `alias=<alias>`
- _default_: inferred from yaml, or 'UNSET'
- _required_: no
- _dynamic_: no
You *should* set the _alias_ parameter when you have multiple activities,
when you want to name metrics per-activity, or when you want to control
activities via scripting.
Each activity can be given a symbolic name known as an _alias_. It is good
practice to give all your activities an alias, since this determines the name used in logging, metrics, and even scripting control.
_default value_ : The name of any provided YAML filename is used as the basis for the default alias. Otherwise, the activity type name is used. This is a convenience for simple test scenarios only.
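For example (the alias value is illustrative), this names the activity so that its metrics and log entries are easy to identify:

```
./dsbench start type=cql yaml=baselines/cql-keyvalue alias=keyvalue_main threads=50
```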
## threads
- `threads=<threads>`
- _default_: 1
- _required_: no
- _dynamic_: yes
You *should* set the _threads_ parameter when you need to ramp up a workload.
Each activity can be created with a number of threads. It is important to adjust this setting to the system types used by DSBench.
_default value_ : For now, the default is simply *1*. Users must be aware of
this setting and adjust it to a reasonable value for their workloads.
:::info
The threads parameter will work slightly differently for activities using the async parameter. For example, when `async=500` is provided, then the number of async operations is split between all configured threads, and each thread will juggle a number of in-flight operations asynchronously. Without the async parameter, threads determines the logical concurrency level of DSBench in the classic 'request-per-thread' mode. Neither mode is strictly correct, and both modes can be used for more accurate testing depending on the constraints of your environment.
:::
A good rule of thumb for setting threads for maximum effect is to set it relatively high, such as 10XvCPU when running synchronous workloads (when not providing the async parameter), and to 5XvCPU for all async workloads. Variation in system dynamics make it difficult to peg an ideal number, so experimentation is encouraged while you dial in your settings initially.
## cycles
- `cycles=<cycle count>`
- `cycles=<cycle min>..<cycle max>`
- _default_: same as `stride`
- _required_: no
- _dynamic_: no
The cycles parameter determines the starting and ending point for an activity. It determines the range of values which will act as seed values for each operation. For each cycle of the test, a statement is built from a statement template and executed as an operation.
If you do not set the cycles parameter, then it will automatically be set to the size of the sequence. The sequence is simply the length of the op sequence that is constructed from the active statements and ratios in your activity YAML.
You *should* set the cycles for every activity except for schema-like activities, or activities which you run just as a sanity check of active statements.
In the `cycles=<cycle count>` version, the count indicates the total number of cycles, and is equivalent to `cycles=0..<cycle max>`. In both cases, the max value is not the actual number of the last cycle. This is because all cycle parameters define a closed-open interval. In other words, the minimum value is either zero by default or the specified minimum value, but the maximum value is the first value *not* included in the interval. This means that you can easily stack intervals over subsequent runs while knowing that you will cover all logical cycles without gaps or duplicates. For example, given `cycles=1000` and then `cycles=1000..2000`, and then `cycles=2000..5K`, you know that all cycles between 0 (inclusive) and 5000 (exclusive) have been specified.
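For example, the runs above could be issued as three consecutive invocations, which together cover cycles 0 (inclusive) through 5000 (exclusive) exactly once (host and other parameters omitted for brevity):

```
./dsbench start type=cql yaml=baselines/cql-keyvalue cycles=1000
./dsbench start type=cql yaml=baselines/cql-keyvalue cycles=1000..2000
./dsbench start type=cql yaml=baselines/cql-keyvalue cycles=2000..5K
```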
## stride
- `stride=<stride>`
- _default_: same as op sequence length
- _required_: no
- _dynamic_: no
Usually, you don't want to provide a setting for stride, but it is still important to understand what it does. Within DSBench, each time a thread needs to allocate a set of cycles to operate on, it takes a contiguous range of values from a shared atomic value. Thus, the stride is the unit of micro-batching within DSBench. It also means that you can use stride to optimize a workload by setting the value higher than the default. For example if you are running a single-statement workload at a very high rate, it doesn't make sense for threads to allocate one op at a time from a shared atomic value. You can simply set `stride=1000` to cause (ballpark estimation) about 1000X less internal contention.
The stride is initialized to the calculated sequence length. The sequence length is simply the number of operations in the op sequence that is planned from your active statements and their ratios.
:::info
When simulating multi-op access patterns in non-async mode, the stride metric can tell you how long it took for a whole group of operations to complete.
:::
## async
- `async=<ops>`
- _default_: unset
- _required_: no
- _dynamic_: no
The `async=<ops>` parameter puts an activity into an asynchronous dispatch mode and configures each thread to juggle a proportion of the operations specified. If you specify `async=500 threads=10`, then each of 10 threads will manage execution of 50 operations at a time. With async mode, a thread will always prepare and send operations if there are fewer in flight than it is allotted before servicing any pending responses.
Async mode also puts threads into a different sequencing behavior. When in async mode, responses from an operation may arrive in a different order than they are sent, and thus linearized operations can't be guaranteed as with the non-async mode. This means that you may sometimes want to avoid async mode when you are intentionally simulating access patterns with multiple linearized operations per user, as you may see in your application.
The absence of the async parameter leaves the activity in the default non-async mode, where each thread works through a sequence of ops one operation at a time.
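As a sketch, the `async=500 threads=10` example above would look like this as a full command (other values as in the Getting Started section):

```
./dsbench start type=cql yaml=baselines/cql-keyvalue tags=phase:main host=<dse-host-or-ip> async=500 threads=10
```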
## cyclerate
- `cyclerate=<cycle_per_second>`
- `cyclerate=<cycles_per_second>,<burst_ratio>`
- _default_: unset
- _required_: no
- _dynamic_: yes
The cyclerate parameter sets a maximum op rate for individual cycles within the activity, across the whole activity, irrespective of how many threads are active.
:::info
The cyclerate is a rate limiter, and can thus only throttle an activity to be slower than it would otherwise run. Rate limiting is also an invasive element in a workload, and will always come at a cost. For extremely high throughput testing, consider carefully whether your testing would benefit more from concurrency-based throttling as with async or the striderate described below.
:::
When the cyclerate parameter is provided, two additional metrics are tracked: the wait time and the response time. See the 'Reference|Timing Terms' section for more details on these metrics.
_default_: None. When the cyclerate parameter is not provided, an activity runs as fast as it can given how fast operations can complete.
Examples:
- `cyclerate=1000` - set the cycle rate limiter to 1000 ops/s and a default burst ratio of 1.1.
- `cyclerate=1000,1.0` - same as above, but with burstrate set to 1.0 (use it or lose it, not usually desired)
- `cyclerate=1000,1.5` - same as above, with burst rate set to 1.5 (aka 50% burst allowed)
### burst ratio
This is only an optional part of the cyclerate as shown in examples above. If you do not specify it when you initialize a cyclerate, then it defaults to 1.1. The burst ratio is only valid as part of a rate limit and can not be specified by itself.
_default_: `1.1`
_dynamic_: yes
The DSBench rate limiter provides a sliding scale between strict rate limiting and average rate limiting. The difference between them is controlled by a _burst ratio_ parameter. When the burst ratio is 1.0 (burst up to 100% relative rate), the rate limiter acts as a strict rate limiter, disallowing faster operations from using time that was previously forfeited by prior slower operations. This is a "use it or lose it" mode that means things like GC events can steal throughput from a running client as a necessary effect of losing time in a strict timing sense.
When the burst ratio is set to higher than 1.0, faster operations may recover lost time from previously slower operations. For example, a burst ratio of 1.3 means that the rate limiter will allow bursting up to 130% of the base rate, but only until the average rate is back to 100% relative speed. This means that any valleys created in the actual op rate of the client can be converted into plateaus of throughput above the strict rate, but only at a speed that fits within (op rate * burst ratio). This allows for workloads to approximate the average target rate over time, with controllable bursting rates. This ability allows for near-strict behavior while allowing clients to still track truer to rate limit expectations, so long as the overall workload is not saturating resources.
:::info
The default burst ratio of 1.1 makes testing results slightly more stable on average, but can also hide some short-term slow-downs in system throughput. It is set at the default to fit most tester's expectations for averaging results, but it may not be strict enough for your testing purposes. However, a strict setting of 1.0 nearly always adds cold/startup time to the result, so if you are testing for steady state, be sure to account for this across test runs.
:::
## striderate
- `striderate=<strides per second>`
- `striderate=<strides per second>,<burst_ratio>`
- _default_: unset
- _required_: no
- _dynamic_: yes
The `striderate` parameter allows you to limit the start of a stride according to some rate. This works almost exactly like the cyclerate parameter, except that it blocks a whole group of operations from starting instead of a single operation. The striderate can use a burst ratio just as the cyclerate.
This sets the target rate for strides. In DSBench, a stride is a group of
operations that are dispatched and executed together within the same thread.
This is useful, for example, to emulate application behaviors in which some
outside request translates to multiple internal requests. It is also a way
to optimize a client runtime for more efficiency and throughput. The stride
rate limiter applies to the whole activity irrespective of how many threads
it has.
:::warning
When using the cyclerate and striderate options together, operations are delayed based on both rate limiters. If the relative rates are not synchronized with the size of a stride, then one rate limiter will artificially throttle the other. Thus, it usually doesn't make sense to use both of these settings in the same activity.
:::
## seq
- `seq=<bucket|concat|interval>`
- _default_: `seq=bucket`
- _required_: no
- _dynamic_: no
The `seq=<bucket|concat|interval>` parameter determines the type of sequencing that will be used to plan the op sequence. The op sequence is a look-up-table that is used for each stride to pick statement forms according to the cycle offset. It is simply the sequence of statements from your YAML that will be executed, but in a pre-planned, and highly efficient form.
An op sequence is planned for every activity. With the default ratio on every statement as 1, and the default bucket scheme, the basic result is that each active statement will occur once in the order specified. Once you start adding ratios to statements, the most obvious thing that you might expect will happen: those statements will occur multiple times to meet their ratio in the op mix. You can customize the op mix further by changing the seq parameter to concat or interval.
:::info
The op sequence is a look up table of statement templates, *not* individual statements or operations. Thus, the cycle still determines the uniqueness of an operation as you would expect. For example, if statement form ABC occurs 3x per sequence because you set its ratio to 3, then each of these would manifest as a distinct operation with fields determined by distinct cycle values.
:::
There are three schemes to pick from:
### bucket
This is a round robin planner which draws operations from buckets in circular fashion, removing each bucket as it is exhausted. For example, the ratios A:4, B:2, C:1 would yield the sequence A B C A B A A. The ratios A:1, B:5 would yield the sequence A B B B B B.
### concat
This simply takes each statement template as it occurs in order and duplicates it in place to achieve the ratio. The ratios above (A:4, B:2, C:1) would yield the sequence A A A A B B C for the concat sequencer.
### interval
This is arguably the most complex sequencer. It takes each ratio as a frequency over a unit interval of time, and apportions the associated operation to occur evenly over that time. When two operations would be assigned the same time, then the order of appearance establishes precedence. In other words, statements appearing first win ties for the same time slot. The ratios A:4 B:2 C:1 would yield the sequence A B C A A B A. This occurs because, over the unit interval (0.0,1.0), A is assigned the positions `A: 0.0, 0.25, 0.5, 0.75`, B is assigned the positions `B: 0.0, 0.5`, and C is assigned position `C: 0.0`. These offsets are all sorted with a position-stable sort, and then the associated ops are taken as the order.
In detail, the rendering appears as `0.0(A), 0.0(B), 0.0(C), 0.25(A), 0.5(A), 0.5(B), 0.75(A)`, which yields `A B C A A B A` as the op sequence.
This sequencer is most useful when you want a stable ordering of operations from a rich mix of statement types, where each operation is spaced as evenly as possible over time, and where it is not important to control the cycle-by-cycle sequencing of statements.

View File

@ -0,0 +1,67 @@
---
title: Core Statement Params
weight: 06
---
# Core Statement Parameters
Some statement parameters are recognized by the DSBench runtime and can be used on any statement in a YAML file.
## *ratio*
A statement parameter called _ratio_ is supported by every workload. It can be attached to a statement, a block, or a document-level parameter block. It sets the relative ratio of a statement in the op sequence before an activity is started.
When an activity is initialized, all of the active statements are combined into a sequence based on their relative ratios. By default, all statement templates are initialized with a ratio of 1 if none is specified by the user.
For example, consider the statements below:
```yaml
statements:
- s1: "select foo,bar from baz where ..."
ratio: 1
- s2: "select bar,baz from foo where ..."
ratio: 2
- s3: "select baz,foo from bar where ..."
ratio: 3
```
If all statements are activated (there is no tag filtering), then the activity will be initialized with a sequence length of 6. In this case, the relative ratio of statement "s3" will be 50% overall. If you filtered out the first statement, then the sequence would be 5 operations long. In this case, the relative ratio of statement "s3" would be 60% overall. It is important to remember that statement ratios are always relative to the total sum of the active statements' ratios.
:::info
Because the ratio works so closely with the activity parameter `seq`, the description for that parameter is included below.
:::
### *seq* (activity level - do not use on statements)
- `seq=<bucket|concat|interval>`
- _default_: `seq=bucket`
- _required_: no
- _dynamic_: no
The `seq=<bucket|concat|interval>` parameter determines the type of sequencing that will be used to plan the op sequence. The op sequence is a look-up-table that is used for each stride to pick statement forms according to the cycle offset. It is simply the sequence of statements from your YAML that will be executed, but in a pre-planned, and highly efficient form.
An op sequence is planned for every activity. With the default ratio on every statement as 1, and the default bucket scheme, the basic result is that each active statement will occur once in the order specified. Once you start adding ratios to statements, the most obvious thing that you might expect will happen: those statements will occur multiple times to meet their ratio in the op mix. You can customize the op mix further by changing the seq parameter to concat or interval.
:::info
The op sequence is a look up table of statement templates, *not* individual statements or operations. Thus, the cycle still determines the uniqueness of an operation as you would expect. For example, if statement form ABC occurs 3x per sequence because you set its ratio to 3, then each of these would manifest as a distinct operation with fields determined by distinct cycle values.
:::
There are three schemes to pick from:
### bucket
This is a round robin planner which draws operations from buckets in circular fashion, removing each bucket as it is exhausted. For example, the ratios A:4, B:2, C:1 would yield the sequence A B C A B A A. The ratios A:1, B:5 would yield the sequence A B B B B B.
### concat
This simply takes each statement template as it occurs in order and duplicates it in place to achieve the ratio. The ratios above (A:4, B:2, C:1) would yield the sequence A A A A B B C for the concat sequencer.
### interval
This is arguably the most complex sequencer. It takes each ratio as a frequency over a unit interval of time, and apportions the associated operation to occur evenly over that time. When two operations would be assigned the same time, then the order of appearance establishes precedence. In other words, statements appearing first win ties for the same time slot. The ratios A:4 B:2 C:1 would yield the sequence A B C A A B A. This occurs because, over the unit interval (0.0,1.0), A is assigned the positions `A: 0.0, 0.25, 0.5, 0.75`, B is assigned the positions `B: 0.0, 0.5`, and C is assigned position `C: 0.0`. These offsets are all sorted with a position-stable sort, and then the associated ops are taken as the order.
In detail, the rendering appears as `0.0(A), 0.0(B), 0.0(C), 0.25(A), 0.5(A), 0.5(B), 0.75(A)`, which yields `A B C A A B A` as the op sequence.
This sequencer is most useful when you want a stable ordering of operations from a rich mix of statement types, where each operation is spaced as evenly as possible over time, and where it is not important to control the cycle-by-cycle sequencing of statements.

View File

@ -0,0 +1,8 @@
---
title: DSBench Basics
weight: 30
---
This section covers the essential details that you'll need to
run DSBench in different ways.

View File

@ -0,0 +1,111 @@
---
title: CQL IoT
weight: 2
---
## Description
The CQL IoT workload demonstrates a time-series telemetry system as typically
found in IoT applications. The bulk of the traffic is telemetry ingest. This is
useful for establishing steady-state capacity with an actively managed data
lifecycle. This is a steady-state workload, where inserts are 90% of the
operations and queries are the remaining 10%.
## Schema
```cql
CREATE KEYSPACE baselines WITH replication =
  { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };

CREATE TABLE baselines.iot (
  station_id UUID,
  machine_id UUID,
  machine_type text,
  sensor_value double,
  time timestamp,
  PRIMARY KEY (machine_id, time)
) WITH CLUSTERING ORDER BY (time DESC)
  AND compaction = { 'class': 'TimeWindowCompactionStrategy' }
  AND default_time_to_live = 3600;
```
## Workload Sequence
1. schema - Install the schema
2. rampup - Ramp-Up to steady state for normative density, writes only 100M rows
3. main - Run at steady state with 10% reads and 90% writes, 100M rows
For in-depth testing, this workload will take some time to build up data density
where TTLs begin purging expired data. At this point, the test should be
considered steady-state.
## Data Set
### baselines.iot dataset (rampup,main)
- machine_id - 1000 unique values
- sensor_name - 100 symbolic names, from a seed file
- time - monotonically increasing timestamp
- station_id - 100 unique values
- sensor_value - normal distribution, median 100, stddev 5.0
## Operations
### insert (rampup, main)
```cql
insert into baselines.iot
  (machine_id, sensor_name, time, sensor_value, station_id)
  values (?,?,?,?,?)
```
### query (main)
```cql
select * from baselines.iot
  where machine_id=? and sensor_name=?
  limit 10
```
## Workload Parameters
This workload has no adjustable parameters when used in the baseline tests.
When used for additional testing, the following parameters should be supported:
- machines - the number of unique sources (default: 1000)
- stations - the number of unique stations (default: 100)
- limit - the limit for rows in reads (default: 10)
- expiry_minutes - the TTL for data in minutes.
- compression - enabled or disabled, to disable, set compression=''
- write_cl - the consistency level for writes (default: LOCAL_QUORUM)
- read_cl - the consistency level for reads (default: LOCAL_QUORUM)
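As an illustrative sketch only (the workload path `baselines/cql-iot` is an assumption, not a confirmed name), such parameters would be passed in the usual `name=value` form:

```
# hypothetical invocation; parameter names are taken from the list above
./dsbench start type=cql yaml=baselines/cql-iot tags=phase:main host=<dse-host-or-ip> \
  machines=1000 stations=100 limit=10 write_cl=LOCAL_QUORUM read_cl=LOCAL_QUORUM
```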
## Key Performance Metrics
Client side metrics are a more accurate measure of the system behavior from a
user's perspective. For microbench and baseline tests, these are the only
required metrics. When gathering metrics from multiple server nodes, they should
be kept in aggregate form, for min, max, and average for each time interval in
monitoring. For example, the avg p99 latency for reads should be kept, as well
as the min p99 latency for reads. If possible, metrics should be kept in plot form, with discrete histogram values per interval.
### Client-Side
- read ops/s
- write ops/s
- read latency histograms
- write latency histograms
- exception counts
### Server-Side
- bytes compacted over time
- pending compactions
- active data on disk
- total data on disk
## Notes on Interpretation
- In order for this test to show useful performance contrasts, it has to be ramped to steady-state.
- Ingest of 1G rows yields an on-disk data density of 20.8 GB using default compression settings.

View File

@ -0,0 +1,100 @@
---
title: CQL Key-Value
weight: 1
---
## Description
The CQL Key-Value workload demonstrates the simplest possible schema with
payload data. This is useful for measuring system capacity most directly in
terms of raw operations. As a reference point, it provides some insight into the types of workloads that are constrained by messaging, threading, and tasking, rather than bulk throughput.
During preload, all keys are set with a value. During the main phase of the workload, random keys from the known population are selected for upsert, with new values that never repeat.
## Schema
CREATE KEYSPACE IF NOT EXISTS baselines WITH replication =
  { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };
CREATE TABLE baselines.keyvalue (
    key text,
    value text,
    PRIMARY KEY (key)
);
## Workload Sequence
1. schema - Initialize the schema.
2. rampup - Load data according to the data set size.
3. main - Run the workload
## Operations
### insert (rampup, main)
insert into baselines.keyvalue (key, value) values (?,?);
### read (main)
select * from baselines.keyvalue where key=?key;
## Data Set
### baselines.keyvalue insert (rampup)
- key - text, number as string, selected sequentially up to keycount
- value - text, number as string, selected sequentially up to valuecount
### baselines.keyvalue insert (main)
- key - text, number as string, selected uniformly within keycount
- value - text, number as string, selected uniformly within valuecount
### baselines.keyvalue read (main)
- key - text, number as string, selected uniformly within keycount
## Workload Parameters
This workload has no adjustable parameters when used in the baseline tests.
When used for additional testing, the following parameters should be supported:
- keycount - the number of unique keys
- valuecount - the number of unique values
## Key Performance Metrics
Client side metrics are a more accurate measure of the system behavior from a
user's perspective. For microbench and baseline tests, these are the only
required metrics. When gathering metrics from multiple server nodes, they should
be kept in aggregate form, for min, max, and average for each time interval in
monitoring. For example, the avg p99 latency for reads should be kept, as well
as the min p99 latency for reads. If possible, metrics should be kept in plot
form, with discrete histogram values per interval.
### Client-Side
- read ops/s
- write ops/s
- read latency histograms
- write latency histograms
- exception counts
### Server-Side
- pending compactions
- bytes compacted
- active data on disk
- total data on disk
## Notes on Interpretation
Once the average ratio of overwrites starts to balance with the rate of
compaction, a steady state should be achieved. At this point, pending
compactions and bytes compacted should be mostly flat over time.

View File

@ -0,0 +1,106 @@
---
title: CQL Wide Rows
weight: 3
---
## Description
The CQL Wide Rows workload provides a way to tax a system with wide rows of a given size. This is useful for understanding the underlying performance differences between versions and configuration options
when using data models that have wide rows.
## Schema
CREATE KEYSPACE if not exists baselines WITH replication =
{ 'class': 'NetworkTopologyStrategy', 'dc1': 3 };
CREATE TABLE if not exists baselines.widerows (
part text,
clust text,
data text,
PRIMARY KEY (part,clust)
);
## Workload Sequence
1. schema - Install the schema
2. rampup - Fully populate the widerows with data, 100000 elements per row
3. main - Run at steady state with 50% reads and 50% writes, 100M rows
For in-depth testing, this workload needs significant density of partitions in
combination with fully populated wide rows. For exploratory or parameter
contrasting tests, ensure that the rampup phase is configured correctly to
establish this initial state.
## Data Set
### baselines.widerows dataset (rampup)
- part - text, number in string form, sequentially from 1..1E9
- clust - text, number in string form, sequentially from 1..1E9
- data - text, extract from lorem ipsum between 50 and 150 characters
### baselines.widerows dataset (main)
- part - text, number in string form, sequentially from 1..1E9
- clust - text, number in string form, sequentially from 1..<partsize>
- data - text, extract from lorem ipsum between 50 and 150 characters
## Operations
### insert (rampup, main)
insert into baselines.widerows
(part, clust, data)
values (?,?,?)
### query (main)
select * from baselines.widerows
where part=?
## Workload Parameters
This workload has no adjustable parameters when used in the baseline tests.
When used for additional testing, the following parameters should be supported:
- partcount - the number of unique partitions
- partsize - the number of logical rows within a CQL partition
## Key Performance Metrics
Client side metrics are a more accurate measure of the system behavior from a
user's perspective. For microbench and baseline tests, these are the only
required metrics. When gathering metrics from multiple server nodes, they should
be kept in aggregate form, for min, max, and average for each time interval in
monitoring. For example, the avg p99 latency for reads should be kept, as well
as the min p99 latency for reads. If possible, metrics should be kept in plot
form, with discrete histogram values per interval.
### Client-Side
- read ops/s
- write ops/s
- read latency histograms
- write latency histograms
- exception counts
### Server-Side
- bytes compacted over time
- pending compactions
- active data on disk
- total data on disk
## Notes on Interpretation

View File

@ -0,0 +1,27 @@
---
title: Built-In Workloads
weight: 40
---
There are a few built-in workloads which you may want to run. These workloads can be run from a command without having to configure anything, or they can be tailored with their built-in parameters.
This section of the guidebook will explain each of them in detail.
## Built-In Workload Conventions
The built-in workloads follow a set of conventions so that they can
be used interchangeably:
### Phases
Each built-in contains the following tags that can be used to break the workload up into uniform phases:
- schema - selected with `tags=phase:schema`
- rampup - selected with `tags=phase:rampup`
- main - selected with `tags=phase:main`
### Parameters
Each built-in has a set of adjustable parameters which is documented below per workload. For example, the cql-iot workload has a `sources` parameter which determines the number of unique devices in the dataset.

View File

@ -0,0 +1,45 @@
---
title: 00 YAML Organization
weight: 00
---
It is best to keep every workload self-contained within a single YAML file, including schema, data rampup, and the main phase of testing.
The phases of testing are controlled by tags as described in the Standard YAML section.
:::info
The phase names described below have been adopted as a convention within the
built-in workloads. It is strongly advised that new workload YAMLs use the same tagging scheme so that workloads are more pluggable across YAMLs.
:::
### Schema phase
The schema phase is simply a phase of your test which creates the necessary schema on your target system. For CQL, this generally consists of a keyspace and one or more table statements. There is no special schema layer in DSBench. All statements executed are simply statements. This provides the greatest flexibility in testing since every activity type is allowed to control its DDL and DML using the same machinery.
The schema phase is normally executed with defaults for most parameters. This means that statements will execute in the order specified in the YAML, in serialized form, exactly once. This is a welcome side-effect of how initial parameters like _cycles_ are set from the statements which are activated by tagging.
You can mark statements as schema phase statements by adding this set of tags to the statements, either directly, or by block:
tags:
phase: schema
### Rampup phase
When you run a performance test, it is very important to be aware of how much data is present. Higher density tests are more realistic for systems which accumulate data over time, or which have a large working set of data. Ideally, the amount of data on the system you are testing should approximate the amount of data that you would run in production. In general, there is a triangular trade-off between service time, op rate, and data density.
It is the purpose of the _rampup_ phase to create the backdrop data on a target system that makes a test meaningful for some level of data density. Data density is normally discussed as average per node, but it is also important to consider distribution of data as it varies from the least dense to the most dense nodes.
Because it is useful to be able to add data to a target cluster in an incremental way, the bindings which are used with a _rampup_ phase may actually be different from the ones used for a _main_ phase. In most cases, you want the rampup phase to create data in a way that incrementally adds to the population of data in the cluster. This allows you to add some data to a cluster with `cycles=0..1M` and then decide whether to continue adding data using the next contiguous range of cycles, with `cycles=1M..2M` and so on.
You can mark statements as rampup phase statements by adding this set of tags to the statements, either directly, or by block:
tags:
phase: rampup
### Main phase
The main phase of a DSBench scenario is the one during which you really care about the metric. This is the actual test that everything else has prepared your system for.
You can mark statements as main phase statements by adding this set of tags to the statements, either directly, or by block:
tags:
phase: main
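As a sketch of how these conventions fit together (the statement bodies here are illustrative placeholders, not a real workload, and this uses blocks, which are covered later in this section), a single YAML might carry all three phases as tagged blocks:

```yaml
# illustrative sketch: one yaml, three phases selected by tags
blocks:
  - tags:
      phase: schema
    statements:
      - "create the schema here\n"
  - tags:
      phase: rampup
    statements:
      - "write the background data set here\n"
  - tags:
      phase: main
    statements:
      - "run the measured workload here\n"
```

Each phase could then be selected on its own with `tags=phase:schema`, `tags=phase:rampup`, or `tags=phase:main`.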

View File

@ -0,0 +1,50 @@
---
title: 01 Statement Templates
weight: 01
---
## Statement Templates
A valid config file for an activity consists of statement templates, parameters for them, bindings to generate the data to use with them, and tags for organizing them.
In essence, the config format is *all about configuring statements*.
Every other element in the config format is in some way modifying
or otherwise helping create statements to be used in an activity.
Statement templates are the single most important part of a YAML config.
```yaml
# a single statement
statements:
- a single statement body
```
This is a valid activity YAML file in and of itself. It has a single
statement template.
It is up to the individual activity types like _cql_ or _stdout_ to interpret the statement template in some way. The example above is valid as a statement in the stdout activity, but it does not produce a valid CQL statement with the CQL activity type. The contents of the statement template are free-form text. If the statement template is valid CQL, then the CQL activity type can use it without throwing an error. Each activity type determines what a statement means, and how it will be used.
You can provide multiple statements, and you can use the YAML pipe to put them on multiple lines, indented a little further in:
```yaml
statements:
- |
This is a statement, and the file format doesn't
know how statements will be used!
- |
submit job {alpha} on queue {beta} with options {gamma};
```
Statements can be named:
```yaml
statements:
- s1: |
This is a statement, and the file format doesn't
know how statements will be used!
- s2: |
submit job {alpha} on queue {beta} with options {gamma};
```
Actually, every statement in a YAML has a name. If you don't provide one, then a name is auto-generated for the statement based on its position in the YAML file.

View File

@ -0,0 +1,94 @@
---
title: 02 Data Bindings
weight: 02
---
## Data Bindings
Procedural data generation is built-in to the DSBench runtime by way of the [Virtual DataSet](http://virtdata.io/) library. This allows us to create named data generation recipes. These named recipes for generated data are called bindings. Procedural generation for test data has [many benefits](http://docs.virtdata.io/why_virtdata/why_virtdata/) over shipping bulk test data around, including speed and deterministic behavior. With the VirtData approach, most of the hard work is already done for us. We just have to pull in the recipes we want.
You can add a bindings section like this:
```yaml
bindings:
alpha: Identity()
beta: NumberNameToString()
gamma: Combinations('0-9A-F;0-9;A-Z;_;p;r;o;')
delta: WeightedStrings('one:1;six:6;three:3;')
```
This is a YAML map which provides names and function specifiers. The specifier named _alpha_ provides a function that takes an input value and returns the same value. Together, the name and value constitute a binding named alpha. All of the four bindings together are called a bindings set.
The above bindings block is also a valid activity YAML, at least for the _stdout_ activity type. The _stdout_ activity can construct a statement template from the provided bindings if needed, so this is valid:
```text
[test]$ cat > stdout-test.yaml
bindings:
alpha: Identity()
beta: NumberNameToString()
gamma: Combinations('0-9A-F;0-9;A-Z;_;p;r;o;')
delta: WeightedStrings('one:1;six:6;three:3;')
# EOF (control-D in your terminal)
[test]$ dsbench run type=stdout yaml=stdout-test cycles=10
0,zero,00A_pro,six
1,one,00B_pro,six
2,two,00C_pro,three
3,three,00D_pro,three
4,four,00E_pro,six
5,five,00F_pro,six
6,six,00G_pro,six
7,seven,00H_pro,six
8,eight,00I_pro,six
9,nine,00J_pro,six
```
Above, you can see that the stdout activity type is ideal for experimenting with data generation recipes. It uses the default `format=csv` parameter above, but it also supports formats like json, inlinejson, readout, and assignments.
This is all you need to provide a formulaic recipe for converting an ordinal value to a set of field values. Each time DSBench needs to create a set of values as parameters to a statement, the functions are called with an input, known as the cycle. The functions produce a set of named values that, when combined with a statement template, can yield an individual statement for a database operation. In this way, each cycle represents a specific operation. Since the functions above are pure functions, the cycle number of an operation will always produce the same operation, thus making all DSBench workloads deterministic.
In the example above, you can see the cycle numbers down the left.
If you combine the statement section and the bindings sections above into one activity yaml, you get a slightly different result, as the bindings apply to the statements that are provided, rather than creating a default statement for the bindings. See the example below:
```text
[test]$ cat > stdout-test.yaml
statements:
- |
This is a statement, and the file format doesn't
know how statements will be used!
- |
submit job {alpha} on queue {beta} with options {gamma};
bindings:
alpha: Identity()
beta: NumberNameToString()
gamma: Combinations('0-9A-F;0-9;A-Z;_;p;r;o;')
delta: WeightedStrings('one:1;six:6;three:3;')
# EOF (control-D in your terminal)
[test]$ dsbench run type=stdout yaml=stdout-test cycles=10
This is a statement, and the file format doesn't
know how statements will be used!
submit job 1 on queue one with options 00B_pro;
This is a statement, and the file format doesn't
know how statements will be used!
submit job 3 on queue three with options 00D_pro;
This is a statement, and the file format doesn't
know how statements will be used!
submit job 5 on queue five with options 00F_pro;
This is a statement, and the file format doesn't
know how statements will be used!
submit job 7 on queue seven with options 00H_pro;
This is a statement, and the file format doesn't
know how statements will be used!
submit job 9 on queue nine with options 00J_pro;
```
There are a few things to notice here. First, the statements that are executed are automatically alternated between. If you had 10 different statements listed, they would all get their turn with 10 cycles. Since there were two, each was run 5 times.
Also, the statement that had named anchors acted as a template, whereas the other one was evaluated just as it was. In fact, they were both treated as templates, but one of them had no anchors.
One more minor but important detail is that the fourth binding *delta* was not referenced directly in the statements. Since the statements did not pair up an anchor with this binding name, it was not used. No values were generated for it.
This is how activities are expected to work when they are implemented correctly. The bindings themselves are templates for data generation, used only when necessary. The bindings that are defined around a statement are more like a menu for that statement. If the statement uses those bindings with `{named}` anchors, then the recipes will be used to construct data when that statement is selected for a specific cycle. The cycle number both selects the statement (via the op sequence) and provides the input value at the left side of the binding functions.

View File

@ -0,0 +1,22 @@
---
title: 03 Statement Params
weight: 03
---
## Statement Parameters
Statements within a YAML can be accessorized with parameters. These are known as _statement params_ and are different than the parameters that you use at the activity level. They apply specifically to a statement template, and are interpreted by an activity type when the statement template is used to construct a native statement form.
For example, the statement parameter `ratio` is used when an activity is initialized to construct the op sequence. In the _cql_ activity type, the statement parameter `prepared` is a boolean that can be used to designate whether a CQL statement should be prepared or not.
As with the bindings, a params section can be added at the same level, setting additional parameters to be used with statements. Again, this is an example of modifying or otherwise creating a specific type of statement, but always in a way specific to the activity type. Params can be thought of as statement properties. As such, params don't really do much on their own, although they have the same basic map syntax as bindings:
```yaml
params:
ratio: 1
```
As with statements, it is up to each activity type to interpret params in a
useful way.
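As an illustrative sketch (assuming the _cql_ activity type; the table name and binding recipe are made up), params can be set once for a document and then overridden per statement:

```yaml
# illustrative sketch: a document-level param with a per-statement override
bindings:
  key: NumberNameToString()
params:
  prepared: true
statements:
  - read-row: select * from baselines.keyvalue where key={key}
    params:
      ratio: 5
  - create-table: |
      create table if not exists baselines.keyvalue (
        key text PRIMARY KEY,
        value text
      )
    params:
      prepared: false
```

Here, `prepared: true` applies to both statements by default, and the schema statement overrides it; the activity type decides what each param actually means.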

View File

@ -0,0 +1,80 @@
---
title: 04 Statement Tags
weight: 04
---
## Statement Tags
Tags are used to mark and filter groups of statements for controlling which ones get used in a given scenario. Tags are generally free-form, but there is a set of conventions that can make your testing easier.
An example:
```yaml
tags:
name: foxtrot
unit: bravo
```
### Tag Filtering
The tag filters provide a flexible set of conventions for filtering tagged statements. Tag filters are usually provided as an activity parameter when an activity is launched. The rules for tag filtering are:
1. If no tag filter is specified, then the statement matches.
2. A tag name predicate like `tags=name` asserts the presence of a specific
tag name, regardless of its value.
3. A tag value predicate like `tags=name:foxtrot` asserts the presence of
a specific tag name and a specific value for it.
4. A tag pattern predicate like `tags=name:'fox.*'` asserts the presence of a specific tag name and a value that matches the provided regular expression.
5. Multiple tag predicates may be specified as in `tags=name:'fox.*',unit:bravo`
6. Tag predicates are joined by *and* when more than one is provided -- If any predicate fails to match a tagged element, then the whole tag filtering expression fails to match.
A demonstration:
```text
[test]$ cat > stdout-test.yaml
tags:
name: foxtrot
unit: bravo
statements:
- "I'm alive!\n"
# EOF (control-D in your terminal)
# no tag filter matches any
[test]$ dsbench run type=stdout yaml=stdout-test
I'm alive!
# tag name assertion matches
[test]$ dsbench run type=stdout yaml=stdout-test tags=name
I'm alive!
# tag name assertion does not match
[test]$ dsbench run type=stdout yaml=stdout-test tags=name2
02:25:28.158 [scenarios:001] ERROR i.e.activities.stdout.StdoutActivity - Unable to create a stdout statement if you have no active statements or bindings configured.
# tag value assertion does not match
[test]$ dsbench run type=stdout yaml=stdout-test tags=name:bravo
02:25:42.584 [scenarios:001] ERROR i.e.activities.stdout.StdoutActivity - Unable to create a stdout statement if you have no active statements or bindings configured.
# tag value assertion matches
[test]$ dsbench run type=stdout yaml=stdout-test tags=name:foxtrot
I'm alive!
# tag pattern assertion matches
[test]$ dsbench run type=stdout yaml=stdout-test tags=name:'fox.*'
I'm alive!
# tag pattern assertion does not match
[test]$ dsbench run type=stdout yaml=stdout-test tags=name:'tango.*'
02:26:05.149 [scenarios:001] ERROR i.e.activities.stdout.StdoutActivity - Unable to create a stdout statement if you have no active statements or bindings configured.
# compound tag predicate matches every assertion
[test]$ dsbench run type=stdout yaml=stdout-test tags='name=fox.*',unit=bravo
I'm alive!
# compound tag predicate does not fully match
[test]$ dsbench run type=stdout yaml=stdout-test tags='name=fox.*',unit=delta
11:02:53.490 [scenarios:001] ERROR i.e.activities.stdout.StdoutActivity - Unable to create a stdout statement if you have no active statements or bindings configured.
```

View File

@ -0,0 +1,42 @@
---
title: 05 Statement Blocks
weight: 05
---
## Statement Blocks
All the basic primitives described above (names, statements, bindings, params, tags) can be used to describe and parameterize a set of statements in a yaml document. In some scenarios, however, you may need to structure your statements in a more sophisticated way. You might want to do this if you have a set of common statement forms or parameters that need to apply to many statements, or perhaps if you have several *different* groups of statements that need to be configured independently.
This is where blocks become useful:
```text
[test]$ cat > stdout-test.yaml
bindings:
alpha: Identity()
beta: Combinations('u;n;u;s;e;d;')
blocks:
- statements:
- "{alpha},{beta}\n"
bindings:
beta: Combinations('b;l;o;c;k;1;-;COMBINATIONS;')
- statements:
- "{alpha},{beta}\n"
bindings:
beta: Combinations('b;l;o;c;k;2;-;COMBINATIONS;')
# EOF (control-D in your terminal)
[test]$ dsbench run type=stdout yaml=stdout-test cycles=10
0,block1-C
1,block2-O
2,block1-M
3,block2-B
4,block1-I
5,block2-N
6,block1-A
7,block2-T
8,block1-I
9,block2-O
```
This shows a couple of important features of blocks. All blocks inherit defaults for bindings, params, and tags from the root document level. Any of these values that are defined at the base document level apply to all blocks contained in that document, unless specifically overridden within a given block.
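As a further sketch (the values are illustrative), document-level tags and params are inherited by every block unless a block overrides them:

```yaml
# illustrative sketch: document-level defaults with a block-level override
tags:
  group: writes
params:
  ratio: 1
blocks:
  - statements:
      - "first block statement\n"
  - params:
      ratio: 3
    statements:
      - "second block statement\n"
```

The first block keeps the document-level `ratio: 1`, while the second block overrides it; both carry the `group: writes` tag.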

View File

@ -0,0 +1,117 @@
---
title: 06 More on Statements
weight: 06
---
# More on Statements
## Statement Delimiting
Sometimes, you want to specify the text of a statement in different ways. Since statements are strings, the simplest way for small statements is in double quotes. If you need to express a much longer statement with special characters and newlines, then you can use YAML's literal block notation (signaled by the '|' character) to do so:
```yaml
statements:
- |
This is a statement, and the file format doesn't
know how statements will be used!
- |
submit job {alpha} on queue {beta} with options {gamma};
```
Notice that the block starts on the following line after the pipe symbol. This is a very popular form in practice because it treats the whole block exactly as it is shown, except for the initial indentations, which are removed.
Statements in this format can be raw statements, statement templates, or anything that is appropriate for the specific activity type they are being used with. Generally, the statements should be thought of as a statement form that you want to use in your activity -- something that has placeholders for data bindings. These placeholders are called *named anchors*. The second line above is an example of a statement template, with anchors that can be replaced by data for each cycle of an activity.
There is a variety of ways to represent block statements, with folding, without, with the newline removed, with it retained, with trailing newlines trimmed or not, and so forth. For a more comprehensive guide on the YAML conventions regarding multi-line blocks, see [YAML Spec 1.2, Chapter 8, Block Styles](http://www.yaml.org/spec/1.2/spec.html#Block)
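For instance, here is a sketch contrasting three common block styles (the statement text is just a placeholder):

```yaml
statements:
  - | # literal: line breaks are kept, trailing newline kept
    select * from ks.tbl
    where key={key};
  - > # folded: single line breaks become spaces
    select * from ks.tbl
    where key={key};
  - |- # literal, but with the trailing newline stripped
    select * from ks.tbl
    where key={key};
```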
## Statement Sequences
To provide a degree of flexibility to the user for statement definitions,
multiple statements may be provided together as a sequence.
```yaml
# a list of statements
statements:
- "This a statement."
- "The file format doesn't know how statements will be used."
- "submit job {job} on queue {queue} with options {options};"
# an ordered map of statements by name
statements:
name1: statement one
name2: "statement two"
```
In the first form, the names are provided automatically by the YAML loader. In the second form, they are specified as ordered map keys.
## Statement Properties
You can also configure individual statements with named properties, using the **statement properties** form:
```yaml
# a list of statements with properties
statements:
- name: name1
stmt: statement one
- name: name2
stmt: statement two
```
This is the most flexible configuration format at the statement level. It is also the most verbose. Because this format names each property of the statement, it allows for other properties to be defined at this level as well. This includes all of the previously described configuration elements: `name`, `bindings`, `params`, `tags`, and additionally `stmt`. A detailed example follows:
```yaml
statements:
- name: foostmt
stmt: "{alpha},{beta}\n"
bindings:
beta: Combinations('COMBINATIONS;')
params:
parm1: pvalue1
tags:
tag1: tvalue1
freeparam3: a value, as if it were assigned under the params block.
```
In this case, the values for `bindings`, `params`, and `tags` take precedence, overriding those set by the enclosing block or document or activity when the names match. Parameters called **free parameters** are allowed here, such as `freeparam3`. These are simply values that get assigned to the params map once all other processing has completed.
It is possible to mix the **`<name>: <statement>`** form as above in the example for mapping statements by name, so long as some specific rules are followed. An example, which is equivalent to the above:
```yaml
statements:
- foostmt: "{alpha},{beta}\n"
parm1: pvalue1
bindings:
beta: Combinations('COMBINATIONS;')
tags:
tag1: tvalue1
```
The rules:
1. You must avoid using both the name property and the initial
**`<name>: <statement>`** together. Doing so will cause an error to be thrown.
2. Do not use the **`<name>: <statement>`** form in combination with a
**`stmt: <statement>`** property. It is not possible to detect if this occurs. Use caution if you choose to mix these forms.
As explained above, `parm1: pvalue1` is a *free parameter*, and is simply short-hand for setting values in the params map for the statement.
### Per-Statement Format
It is indeed possible to use any of the three statement formats within each entry of a statement sequence:
```yaml
statements:
- first statement body
- second: second statement body
- name: statement3
stmt: third statement body
- forth: fourth statement body
freeparam1: freeparamvalue1
tags:
type: preload
```
Specifically, the first statement is a simple statement body, the second is a named statement (via free param `<name>: statement` form), the third is a statement config map, and the fourth is a combination of the previous two.
The above is valid DSBench YAML, although a reader would need
to know about the rules explained above in order to really make sense of it. For most cases, it is best to follow one format convention, but there is flexibility for overrides and naming when you need it.

View File

@ -0,0 +1,49 @@
---
title: 07 Multi-Docs
weight: 07
---
# Multi-Docs
The YAML spec allows for multiple yaml documents to be concatenated in the
same file with a separator:
```yaml
---
```
This offers an additional convenience when configuring activities. If you want to parameterize or tag a set of statements with their own bindings, params, or tags, but alongside another set of uniquely configured statements, you need only put them in separate logical documents, separated by a triple-dash.
For example:
```text
[test]$ cat > stdout-test.yaml
bindings:
docval: WeightedStrings('doc1.1:1;doc1.2:2;')
statements:
- "doc1.form1 {docval}\n"
- "doc1.form2 {docval}\n"
---
bindings:
numname: NumberNameToString()
statements:
- "doc2.number {numname}\n"
# EOF (control-D in your terminal)
[test]$ dsbench run type=stdout yaml=stdout-test cycles=10
doc1.form1 doc1.1
doc1.form2 doc1.2
doc2.number two
doc1.form1 doc1.2
doc1.form2 doc1.1
doc2.number five
doc1.form1 doc1.2
doc1.form2 doc1.2
doc2.number eight
doc1.form1 doc1.1
```
This shows that you can use the power of blocks and tags together at one level and also allow statements to be broken apart into a whole other level of partitioning if desired.
:::warning
The multi-doc support is there as a ripcord when you need it. However, it is strongly advised that you keep your YAML workloads simple to start and only use features like the multi-doc when you absolutely need it. For this, blocks are generally a better choice. See examples in the standard workloads.
:::

View File

@ -0,0 +1,33 @@
---
title: 08 Template Params
weight: 08
---
# Template Params
All DSBench YAML formats support a parameter macro format that applies before YAML processing starts. It is a basic macro facility that allows named anchors to be placed in the document as a whole:
```text
<<varname:defaultval>>
# or
TEMPLATE(varname,defaultval)
```
In this example, the name of the parameter is `varname`. It is given a default value of `defaultval`. If an activity parameter named *varname* is provided, as in `varname=barbaz`, then this whole expression will be replaced with `barbaz`. If none is provided then the default value will be used instead. For example:
```text
[test]$ cat > stdout-test.yaml
statements:
- "<<linetoprint:MISSING>>\n"
# EOF (control-D in your terminal)
[test]$ dsbench run type=stdout yaml=stdout-test cycles=1
MISSING
[test]$ dsbench run type=stdout yaml=stdout-test cycles=1 linetoprint="THIS IS IT"
THIS IS IT
```
If an empty value is desired by default, then simply use an empty string in your template, like `<<varname:>>` or `TEMPLATE(varname,)`.
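As an illustrative sketch (the parameter and table names here are made up), template params are often used to make names overridable without editing the YAML:

```yaml
# illustrative sketch: template params are substituted before YAML parsing
statements:
  - |
    create table if not exists <<keyspace:baselines>>.<<table:keyvalue>> (
      key text PRIMARY KEY,
      value text
    );
```

Running with `keyspace=test1` would replace the first anchor with `test1`, while the second anchor would keep its default of `keyvalue`.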

View File

@ -0,0 +1,32 @@
---
title: 09 Statement Naming
weight: 09
---
# Statement Naming
Docs, Blocks, and Statements can all have names:
```yaml
name: doc1
blocks:
- name: block1
statements:
- stmt1: statement1
- name: st2
stmt: statement2
---
name: doc2
...
```
This provides a layered naming scheme for the statements themselves. It is not usually important to name things except for documentation or metric naming purposes.
If no names are provided, then names are automatically created for blocks and statements. Statements assigned at the document level are assigned to "block0". All other statements are named with the format `doc#--block#--stmt#`.
For example, the full name of statement1 above would be `doc1--block1--stmt1`.
:::info
If you anticipate wanting to get metrics for a specific statement in addition to the other metrics, then you will want to adopt the habit of naming all your statements something basic and descriptive.
:::

View File

@ -0,0 +1,82 @@
---
title: 10 YAML Diagnostics
weight: 10
---
## Diagnostics
This section describes errors that you might see if you have a YAML loading issue, and what
you can do to fix them.
### Undefined Name-Statement Tuple
This exception is thrown when the statement body is not found in a statement definition
in any of the supported formats. For example, the following block will cause an error:
statements:
- name: statement-foo
params:
aparam: avalue
This is because `name` and `params` are reserved property names -- removed from the list of name-value
pairs before free parameters are read. If the statement is not defined before free parameters
are read, then the first free parameter is taken as the name and statement in `name: statement` form.
To correct this error, supply a statement property in the map, or simply replace the `name: statement-foo` entry
with a `statement-foo: statement body` at the top of the map:
Either of these will work:
statements:
- name: statement-foo
stmt: statement body
params:
aparam: avalue
statements:
- statement-foo: statement body
params:
aparam: avalue
In both cases, it is clear to the loader where the statement body should come from, and what (if any) explicit
naming should occur.
### Redefined Name-Statement Tuple
This exception is thrown when the statement name is defined in multiple ways. This is an explicit exception
to avoid possible ambiguity about which value the user intended. For example, the following statements
definition will cause an error:
statements:
- name: name1
name2: statement body
This is an error because the statement is not defined before free parameters are read, and the `name: statement`
form includes a second definition for the statement name. In order to correct this, simply remove the separate
`name` entry, or use the `stmt` property to explicitly set the statement body. Either of these will work:
statements:
- name2: statement body
statements:
- name: name1
stmt: statement body
In both cases, there is only one name defined for the statement according to the supported formats.
### YAML Parsing Error
This exception is thrown when the YAML format is not recognizable by the YAML parser. If you are not
working from examples that are known to load cleanly, then please review your document for correctness
according to the [YAML Specification](http://www.yaml.org/spec/1.2/spec.html).
If you are sure that the YAML should load, then please [submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug)
with details on the type of YAML file you are trying to load.
### YAML Construction Error
This exception is thrown when the YAML was loaded, but the configuration object was not able to be constructed
from the in-memory YAML document. If this error occurs, it may be a bug in the YAML loader implementation.
Please [submit a bug report](https://github.com/engineblock/engineblock/issues/new?labels=bug) with details
on the type of YAML file you are trying to load.

View File

@ -0,0 +1,31 @@
---
title: Designing Workloads
weight: 40
---
# Designing Workloads
Workloads in DSBench are always controlled by a workload definition. Even the built-in workloads are simply pre-configured and controlled from a single YAML file which is bundled internally.
With DSBench a standard YAML configuration format is provided that is used across all activity types. This makes it easy to specify statements, statement parameters, data bindings, and tags. This section describes the standard YAML format and how to use it.
It is recommended that you read through the examples in each of the design sections in order. This guide was designed to give you a detailed understanding of workload construction with DSBench. The examples will also give you better insight into how DSBench works at a fundamental level.
## Multi-Protocol Support
You will notice that this guide is not overly CQL-specific. That is because DSBench is a multi-protocol tool. All that is needed for you to use this guide with other protocols is the release of more activity types. Try to keep that in mind as you think about designing workloads.
## Advice for new builders
### Review existing examples
The built-in workloads that are included with DSBench are also shared on the github site where we manage the DSBench project:
- [baselines](https://github.com/datastax/dsbench-labs/tree/master/sample-activities/baselines)
- [bindings](https://github.com/datastax/dsbench-labs/tree/master/sample-activities/bindings)
### Follow the conventions
The tagging conventions described under the YAML Conventions section will make your testing go smoother. All of the baselines that we publish for DSBench will use this form.

View File

@ -0,0 +1,28 @@
---
title: Statement Params
weight: 15
---
Statement parameters apply to the defined operations for an activity. Statement
parameters are always configurable as part of a `params` block in YAML, for
activities that use the [Standard YAML](/user-guide/standard_yaml) format.
In some cases, an [Activity Parameter](/parameters/activity_params) of the same
name can be used to establish a default value. In that case, it will be
documented here with the parameter description.
### ratio
`ratio: <ratio>`
Determines the frequency of the affected statements in the operational sequence.
This means, in effect, the number of times a given statement will be executed
within the planned sequence before it starts over at the beginning. When using ratio,
it is important to be aware of *how* these statements are sequenced according
to the ratio. That is controlled by [seq](/parameters/activity_params#seq).
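A sketch of how ratios are typically applied (the statement names, table, and bindings are illustrative):

```yaml
statements:
  - write-row: "insert into ks.tbl (key,value) values ({key},{value});"
    params:
      ratio: 9
  - read-row: "select * from ks.tbl where key={key};"
    params:
      ratio: 1
```

With these ratios, the planned sequence contains nine of the write operation for every one read operation; how they are interleaved is determined by the `seq` activity parameter.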

View File

@ -0,0 +1,413 @@
---
title: activity type - CQL
weight: 06
---
# Activity Type: CQL
This is the same documentation that you get when you run
dsbench help cql
To select this activity type, pass `type=cql` to a run or start command.
---------
# cql activity type
This is an activity type which allows for the execution of CQL statements.
This particular activity type is wired synchronously within each client
thread, however the async API is used in order to expose fine-grain
metrics about op binding, op submission, and waiting for a result.
### Example activity definitions
Run a cql activity named 'cql1', with definitions from activities/cqldefs.yaml
~~~
... type=cql alias=cql1 yaml=cqldefs
~~~
Run a cql activity defined by cqldefs.yaml, but with shortcut naming
~~~
... type=cql yaml=cqldefs
~~~
Only run statement groups which match a tag regex
~~~
... type=cql yaml=cqldefs tags=group:'ddl.*'
~~~
Run the matching 'dml' statements, with 100 cycles, from [1000..1100)
~~~
... type=cql yaml=cqldefs tags=group:'dml.*' cycles=1000..1100
~~~
This last example shows that the cycle range is [inclusive..exclusive),
to allow for stacking test intervals. This is standard across all
activity types.
### CQL ActivityType Parameters
- **driver** - default: dse - The type of driver to use, either dse, or
oss. If you need DSE-specific features, use the dse driver. If you are
connecting to an OSS Apache Cassandra cluster, you must use the oss
driver. The oss driver option is only available in ebdse.
- **host** - The host or hosts to use for connection points to
the cluster. If you specify multiple values here, use commas
with no spaces.
Examples:
- `host=192.168.1.25`
- `host=192.168.1.25,testhost42`
- **yaml** - The file which holds the schema and statement defs.
(no default, required)
- **port** - The port to connect with
- **cl** - An override to consistency levels for the activity. If
this option is used, then all consistency levels will be replaced
by this one for the current activity, and a log line explaining
the difference with respect to the yaml will be emitted.
This is not a dynamic parameter. It will only be applied at
activity start.
- **cbopts** - default: none - this is how you customize the cluster
settings for the client, including policies, compression, etc. This
is a string of *Java*-like method calls just as you would use them
in the Cluster.Builder fluent API. They are evaluated inline
with the default Cluster.Builder options not covered below.
Example: cbopts=".withCompression(ProtocolOptions.Compression.NONE)"
- **whitelist** default: none - Applies a whitelist policy to the load balancing
policy in the driver. If used, a WhitelistPolicy(RoundRobinPolicy())
will be created and added to the cluster builder on startup.
Examples:
- whitelist=127.0.0.1
- whitelist=127.0.0.1:9042,127.0.0.2:1234
- **retrypolicy** default: none - Applies a retry policy in the driver
The only option supported for this version is `retrypolicy=logging`,
which uses the default retry policy, but with logging added.
- **pooling** default: none - Applies the connection pooling options
to the policy.
Examples:
- `pooling=4:10`
keep between 4 and 10 connections to LOCAL hosts
- `pooling=4:10,2:5`
keep 4-10 connections to LOCAL hosts and 2-5 to REMOTE
- `pooling=4:10:2000`
keep between 4-10 connections to LOCAL hosts with
up to 2000 requests per connection
- `pooling=5:10:2000,2:4:1000` keep between 5-10 connections to
LOCAL hosts with up to 2000 requests per connection, and 2-4
connection to REMOTE hosts with up to 1000 requests per connection
Additionally, you may provide the following options on pooling. Any
of these that are provided must appear in this order:
`,heartbeat_interval_s:n,idle_timeout_s:n,pool_timeout_ms:n`, so a
full example with all options set would appear as:
`pooling=5:10:2000,2:4:1000,heartbeat_interval_s:30,idle_timeout_s:120,pool_timeout_ms:5`
- **socketoptions** default: none - Applies any of the valid socket
options to the client when the session is built. Each of the options
uses the long form of the name, with either a numeric or boolean
value. Individual sub-parameters should be separated by a comma, and
the parameter names and values can be separated by either equals or a
colon. All of these values may be changed:
- read_timeout_ms
- connect_timeout_ms
- keep_alive
- reuse_address
- so_linger
- tcp_no_delay
- receive_buffer_size
- send_buffer_size
Examples:
- `socketoptions=read_timeout_ms=23423,connect_timeout_ms=4444`
- `socketoptions=tcp_no_delay=true`
- **tokens** default: unset - Only executes statements that fall within
any of the specified token ranges. Others are counted in metrics
as skipped-tokens, with a histogram value of the cycle number.
Examples:
- tokens=1:10000,100000:1000000
- tokens=1:123456
- **maxtries** - default: 10 - how many times an operation may be
attempted before it is disregarded
- **maxpages** - default: 1 - how many pages can be read from a query which
is larger than the fetchsize. If more than this number of pages
is required for such a query, then an UnexpectedPaging exception
is passed to the error handler as explained below.
- **fetchsize** - controls the driver parameter of the same name.
Suffixed units can be used here, such as "50K". If this parameter
is not present, then the driver option is not set.
- **cycles** - standard, however the cql activity type will default
this to however many statements are included in the current
activity, after tag filtering, etc.
- **username** - the user to authenticate as. This option requires
that one of **password** or **passfile** also be defined.
- **password** - the password to authenticate with. This will be
ignored if passfile is also present.
- **passfile** - the file to read the password from. The first
line of this file is used as the password.
- **ssl** - enable ssl if you want transport level encryption.
Examples:
- `ssl=true`
enable ssl
- `ssl=false`
disable ssl (the default)
- **keystore** - specify the keystore location for SSL.
Examples:
- `keystore=JKS` (the default)
- **kspass** - specify the password to the keystore for SSL.
Examples:
- `kspass=mypass`
- **tlsversion** - specify the TLS version to use for SSL.
Examples:
- `tlsversion=TLSv1.2` (the default)
- **jmxreporting** - enable JMX reporting if needed.
Examples:
- `jmxreporting=true`
- `jmxreporting=false` (the default)
- **alias** - this is a standard engineblock parameter, however
the cql type will use the yaml value also as the alias value
when not specified.
- **errors** - error handler configuration.
(default errors=stop,retryable->retry,unverified->stop)
Examples:
- errors=stop,WriteTimeoutException=histogram
- errors=count
- errors=warn,retryable=count
See the separate help on 'cqlerrors' for detailed
configuration options.
- **defaultidempotence** - sets default idempotence on the
driver options, but only if it has a value.
(default unset, valid values: true or false)
- **speculative** - sets the speculative retry policy on the cluster.
(default unset)
This can be in one of the following forms:
- pT:E:L - where :L is optional and
T is a floating point threshold between 0.0 and 100.0 and
E is an allowed number of concurrent speculative executions and
L is the maximum latency tracked in the tracker instance
(L defaults to 15000 when left out)
Examples:
- p99.8:5:15000ms - 99.8 percentile, 5 executions, 15000ms max tracked
- p98:2:10000ms - 98.0 percentile, 2 executions allowed, 10s max tracked
- Tms:E - where :E is optional and
T is a constant threshold latency and
E is the allowed number of concurrent speculative retries
(E defaults to 5 when left out)
Examples:
- 100ms:5 - constant threshold of 100ms and 5 allowed executions
- **seq** - selects the statement sequencer used with statement ratios.
(default: bucket)
(options: concat | bucket | interval)
The concat sequencer repeats each statement in order until the ratio
is achieved.
The bucket sequencer uses simple round-robin distribution to plan
statement ratios, a simple but unbalanced form of interleaving.
The interval sequencer apportions statements over time and then by
order of appearance for ties. This has the effect of interleaving
statements from an activity more evenly, but is less obvious in how
it works.
All of the sequencers create deterministic schedules which use an internal
lookup table for indexing into a list of possible statements.
- **trace** - enables a trace on a subset of operations. This is disabled
by default.
Examples:
`trace=modulo:100,filename:trace.log`
The above traces every 100th cycle to a file named trace.log.
`trace=modulo:1000,filename:stdout`
The above traces every 1000th cycle to stdout.
If the trace log is not specified, then 'tracelog' is assumed.
If the filename is specified as stdout, then traces are dumped to stdout.
- **clusterid** - names the configuration to be used for this activity. Within
a given scenario, any activities that use the same name for clusterid will
share a session and cluster.
default: 'default'
- **drivermetrics** - enable reporting of driver metrics.
default: false
- **driverprefix** - set the metrics name that will prefix all CQL driver metrics.
default: 'driver.*clusterid*.'
The clusterid specified is included so that separate cluster and session
contexts can be reported independently for advanced tests.
- **usercodecs** - enable the loading of user codec libraries
for more details see: com.datastax.codecs.framework.UDTCodecInjector in the ebdse
code base. This is for dynamic codec loading with user-provided codecs mapped
via the internal UDT APIs.
default: false
- **secureconnectbundle** - used to connect to CaaS, accepts a path to the secure connect bundle
that is downloaded from the CaaS UI.
Examples:
- `secureconnectbundle=/tmp/secure-connect-my_db.zip`
- `secureconnectbundle="/home/automaton/secure-connect-my_db.zip"`
- **insights** - Set to false to disable the driver from sending insights monitoring information
- `insights=false`
- **tickduration** - sets the tickDuration (milliseconds) of HashedWheelTimer of the
java driver. This timer is used to schedule speculative requests.
Examples:
- `tickduration=10`
- `tickduration=100` (driver default value)
- **compression** - sets the transport compression to use for this
activity. Valid values are 'LZ4' and 'SNAPPY'. Both types are bundled
with EBDSE.
### CQL YAML Parameters
A uniform YAML configuration format was introduced with engineblock 2.0.
As part of this format, statement parameters were added for the CQL Activity Type.
These parameters will be consolidated with the above parameters in time, but for
now **they are limited to a YAML params block**:
params:
ratio: 1
# Sets the statement ratio within the operation sequencer
# scheme. Integers only.
# When preparing the operation order (AKA sequencing), this
# determines the frequency of the associated statements.
cl: ONE
# Sets the consistency level, using any of the standard
# identifiers from com.datastax.driver.core.ConsistencyLevel,
# any one of:
# LOCAL_QUORUM, ANY, ONE, TWO, THREE, QUORUM, ALL,
# EACH_QUORUM, SERIAL, LOCAL_SERIAL, LOCAL_ONE
prepared: true
# By default, all statements are prepared. If you are
# creating schema, set this to false.
idempotent: false
# For statements that are known to be idempotent, set this
# to true
instrument: false
# If a statement has instrument set to true, then
# individual Timer metrics will be tracked for
# that statement for both successes and errors,
# using the given statement name.
logresultcsv: true
OR
logresultcsv: myfilename.csv
# If a statement has logresultcsv set to true,
# then individual operations will be logged to a CSV file.
# In this case the CSV file will be named as
# <statement-name>--results.csv.
# If the value is present and not "true", then the value will
# be used as the name of the file.
#
# The format of the file is:
# <cycle>,(SUCCESS|FAILURE),<nanos>,<rows-fetched>,(<error-class,NONE)
# NOTES:
# 1) BE CAREFUL with this setting. A single logged line per
# result is not useful for high-speed testing as it will
# impose IO loads on the client to slow it down.
# 2) BE CAREFUL with the name. It is best to just pick good
# names for your statement defs so that everything remains
# coherent and nothing gets accidentally overwritten.
# 3) If logresultcsv is provided at the activity level, it
# applies to all statements, and the only valid values
# there are true and false.
### Generic Parameters
*provided by the runtime*
- **targetrate** - The target rate in ops/s
- **linkinput** - if the name of another activity is specified, this activity
will only go as fast as that one.
- **tags** - optional filter for matching tags in yaml sections (detailed help
link needed)
- **threads** - the number of client threads driving this activity
### Metrics
- alias.cycles - (provided by engineblock) A timer around the whole cycle
- alias.phase - (provided by engineblock) A timer around additional phases
within a cycle. For this driver, it captures all the work in the client
around fetching additional pages for paged reads.
- alias.bind - A timer which tracks the performance of the statement
binding logic, including the generation of data immediately prior
- alias.execute - A timer which tracks the performance of op submission
only. This is the async execution call, broken out as a separate step.
- alias.result - A timer which tracks the performance of an op result only.
This is the async get on the future, broken out as a separate step.
- alias.tries - A histogram of how many tries were required to get a
completed operation
- alias.pages - A timer which tracks the performance of paging, specific
to more than 1-page query results. i.e., if all reads return within 1
page, this metric will not have any data.
- alias.strides - A timer around each stride of operations within a thread
- alias.skipped-tokens - A histogram that records the count and cycle values
of skipped tokens.
- alias.result-success - A timer that records rate and histograms of the time
it takes from submitting a query to completely reading the result
set that it returns, across all pages. This metric is only counted
for non-exceptional results, while the result metric above includes
all operations.
##### Metrics Details
The cycles metric captures data on the outside of each operation, but it also
includes any internal processing time needed by the client. Within the
cycles metric, bind, execute, and result all occur in sequence. There may
be multiple values recorded for execute and result for a single bind event.
This is because a bind exception is final, but an execute and result may
both be retried. The tries metric captures how many tries were required. It
is a histogram only. If the metric for tries is 1 across the board, then
no operation had to be retried.
As for a normal single page read result, both the execute and result timers
are included within the code block wrapped by the pages metric.
### YAML Format
The YAML file for a CQL activity has the following structure:
1. One or more document sections, separated with '---' and a newline.
1. An optional tag map
2. One or more statements
1. a descriptive name
2. prepared: false, if you want to modify the default (prepared:true)
3. statement CQL
4. statement data bindings
Each section is a separate yaml document internally to the yaml file. The
tags that are provided allow for subgroups of statements to be activated.
All statements in a matching document (when filtered by tags) are included
in the statement rotation.
If no tags are provided in a document section, then it will be matched by
all possible tag filters. Conversely, if no tag filter is applied in
the activity definition, all tagged documents will match.
Data bindings specify how values are generated to plug into each operation. More
details on data bindings are available in the activity usage guide.
### Parameter Templating
Double angle brackets may be used to drop parameters into the YAML
arbitrarily. When the YAML file is loaded, and only then, these parameters
are interpolated from activity parameters like those above. This allows you
to create activity templates that can be customized simply by providing
additional parameters to the activity. There are two forms,
\<\<some_var_name:default_value\>\> and \<\<some_var_name\>\>. The first
form contains a default value. In any case, if one of these parameters is
encountered and a qualifying value is not found, an error will be thrown.
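For example, a sketch (the keyspace, table, and binding names are illustrative) showing both forms in one statement:

```yaml
statements:
  - |
    select * from <<keyspace>>.<<table:iot>> where machine_id={machine_id}
```

Here `<<table:iot>>` falls back to `iot` if no `table=` parameter is given, while `<<keyspace>>` has no default, so the activity must be started with a `keyspace=` parameter or an error will be thrown.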
### YAML Location
The YAML file referenced in the yaml= parameter will be searched for in the following places, in this order:
1. A URL, if it starts with 'http:' or 'https:'
2. The local filesystem, if it exists there
3. The internal classpath and assets in the jar.
The '.yaml' suffix is not required in the yaml= parameter, however it is
required on the actual file. As well, the logical search path "activities/"
will be used if necessary to locate the file, both on the filesystem and in
the classpath.
There is a basic example below that can be copied as a starting template.
## YAML Examples
Please see the bundled activities with ebdse for examples.
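As a sketch only (the keyspace, table, replication settings, and binding recipes here are illustrative, not one of the bundled workloads), a minimal CQL activity YAML following the structure above might look like this:

```yaml
---
tags:
  phase: schema
params:
  prepared: false
statements:
  - create-keyspace: |
      create keyspace if not exists baselines
      WITH replication = { 'class': 'NetworkTopologyStrategy', 'dc1': 3 };
  - create-table: |
      create table if not exists baselines.keyvalue (
        key text PRIMARY KEY,
        value text
      );
---
tags:
  phase: main
bindings:
  key: NumberNameToString()
  value: NumberNameToString()
statements:
  - write-kv: |
      insert into baselines.keyvalue (key, value) values ({key},{value});
    params:
      ratio: 9
  - read-kv: |
      select * from baselines.keyvalue where key={key};
    params:
      ratio: 1
```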

View File

@ -0,0 +1,98 @@
---
title: activity type - stdout
weight: 06
---
# Activity Type: stdout
This is the same documentation that you get when you run
dsbench help stdout
To select this activity type, pass `type=stdout` to a run or start command.
---------
# stdout activity type
This is an activity type which allows for the generation of data
into stdout or a file. It reads the standard engineblock YAML
format. It can read YAML activity files for any activity type
that uses the curly brace token form in statements.
## Example activity definitions
Run a stdout activity named 'stdout-test', with definitions from activities/stdout-test.yaml
~~~
... type=stdout yaml=stdout-test
~~~
Only run statement groups which match a tag regex
~~~
... type=stdout yaml=stdout-test tags=group:'ddl.*'
~~~
Run the matching 'dml' statements, with 100 cycles, from [1000..1100)
~~~
... type=stdout yaml=stdout-test tags=group:'dml.*' cycles=1000..1100 filename=test.csv
~~~
This last example shows that the cycle range is [inclusive..exclusive),
to allow for stacking test intervals. This is standard across all
activity types.
## stdout ActivityType Parameters
- **filename** - this is the name of the output file
(defaults to "stdout", which actually writes to stdout, not the filesystem)
- **newline** - whether to automatically add a missing newline to the end
of any statements.
default: true
- **format** - which format to use. If provided, the format will override
any statement formats provided by the YAML.
valid values are (csv, readout, json, inlinejson, and assignments)
## Configuration
This activity type uses the uniform yaml configuration format.
For more details on this format, please refer to the
[Standard YAML Format](http://docs.engineblock.io/user-guide/standard_yaml/)
## Configuration Parameters
- **newline** - If a statement has this param defined, then it determines
whether or not to automatically add a missing newline for that statement
only. If this is not defined for a statement, then the activity-level
parameter takes precedence.
## Statement Format
The statement format for this activity type is a simple string. Tokens between
curly braces are used to refer to binding names, as in the following example:
statements:
- "It is {minutes} past {hour}."
If you want to suppress the trailing newline that is automatically added, then
you must either pass `newline=false` as an activity param, or specify it
in the statement params in your config as in:
params:
newline: false
### Auto-generated statements
If no statement is provided, then the defined binding names are used as-is
to create a CSV-style line format. The values are concatenated with
comma delimiters, so a set of bindings like this:
bindings:
one: Identity()
two: NumberNameToString()
would create an automatic string template like this:
~~~
statements:
 - "{one},{two}\n"
~~~
The auto-generation behavior is forced when the format parameter is supplied.
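For example, assuming a workload file named stdout-test.yaml, an invocation like the following would force CSV-style auto-generation even if statements are defined:
~~~
... type=stdout yaml=stdout-test format=csv
~~~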
View File
@ -0,0 +1,10 @@
---
title: Activity Types
weight: 50
---
Each DSBench scenario is composed of one or more activities of a specific type. The activity types that are available depend on the version of DSBench you are running.
Additional activity types will be added in future releases. This reference section shows the help you would get with a command like:
dsbench help <activity type>
View File
@ -0,0 +1,84 @@
---
title: CLI Scripting
---
# CLI Scripting
Sometimes you want to run a set of workloads in a particular order, or call other specific test setup logic in between phases or workloads. While the full scripting environment allows you to do this and more, it is not necessary to write JavaScript for every scenario.
For more basic setup and sequencing needs, you can achieve a fair degree of flexibility on the command line. A few key API calls are supported directly on the command line. This guide explains each of them, what they do, and how to use them together.
## Script Construction
As the command line is parsed, from left to right, the scenario script is built in an internal scripting buffer. Once the command line is fully parsed, this script is executed. Each of the commands below is effectively a macro for a snippet of script, so the order in which they appear matters.
## Command line format
Newlines are not allowed when building scripts from the command line. As long as you follow the allowed forms below, you can simply string multiple commands together with spaces between them. As usual, single-word options without double dashes are commands, key=value style parameters apply to the previous command, and all other options of the form
--this-style
are non-scripting options.
## Concurrency & Control
All activities that run during a scenario run under the control of, but
independently from the scenario script. This means that you can have a number of activities running while the scenario script is doing its own thing. The scenario only completes when both the scenario script and the activities are finished.
### `start type=<activity type> alias=<alias> ...`
You can start an activity with this command. At the time this command is
evaluated, the activity is started, and the script continues without blocking. This is an asynchronous start of an activity. If you start multiple activities in this way, they will run concurrently.
The type argument is required to identify the activity type to run. The alias parameter is not strictly required, unless you want to be able to interact with the started activity later. In any case, it is a good idea to name all your activities with a meaningful alias.
### `stop <alias>`
Stop an activity with the given alias. This is synchronous, and causes the
scenario to pause until the activity is stopped. This means that all threads for the activity have completed and signalled that they're in a stopped state.
### `await <alias>`
Await the normal completion of an activity with the given alias. This causes the scenario script to pause while it waits for the named activity to finish. This does not tell the activity to stop. It simply puts the scenario script into a paused state until the named activity is complete.
### `run type=<activity type> alias=<alias> ...`
Run an activity to completion, waiting until it is complete before continuing with the scenario script. It is effectively the same as
start type=<activity type> ... alias=<alias>
await <alias>
### `waitmillis <milliseconds>`
Pause the scenario script for this many milliseconds. This is useful for controlling workload run duration, etc.
### `script <script file>`
Add the contents of the named file to the scenario script buffer.
### `fragment <script text>`
Add the contents of the next argument to the scenario script buffer.
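As a small sketch of how this might be used (the workload name and the fragment body are illustrative only), a fragment can be spliced between commands to emit a marker into the script output:

    ... run type=stdout yaml=stdout-test fragment 'print("stdout workload complete");'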
# An example CLI script
~~~
dsbench \
start type=stdout alias=a cycles=100K yaml=baselines/cql-iot tags=phase:main\
start type=stdout alias=b cycles=200K yaml=baselines/cql-iot tags=phase:main\
waitmillis 10000 \
  await a \
  stop b
~~~
In this CLI script, the backslashes are necessary in order to keep everything on the same command line. Here is a narrative of what happens when it is run.
1. An activity named 'a' is started, with 100K cycles of work.
2. An activity named 'b' is started, with 200K cycles of work.
3. While these activities run, the scenario script waits for ten seconds.
4. If a is complete, the await returns immediately. If not, the
script waits for a to complete its 100K cycles.
5. b is immediately stopped.
6. Because all activities are stopped or complete, and the script is complete, the scenario exits.
View File
@ -0,0 +1,98 @@
---
title: Scenario Scripting
---
# Scenario Scripting
## Motive
The DSBench runtime is a combination of a scripting sandbox and a workload execution machine. This is not accidental. With this particular arrangement, it should be possible to build sophisticated tests across a variety of scenarios. In particular, logic which can observe and react to the system under test can be powerful. With this approach, it becomes possible to break away from the conventional run-interpret-adjust cycle which is all too often done by human hands.
## Machinery, Controls & Instruments
All of the heavy lifting is left to Java and the core DSBench runtime. This includes the iterative workloads that are meant to test the target system. This is combined with a control layer which is provided by Nashorn and eventually GraalVM. This division of responsibility allows the high-level test logic to be "script" and the low-level activity logic to be "machinery". While the scenario script has the most control, it also is the least busy relative to activity workloads. The net effect is that you have the efficiency of the iterative test loads in conjunction with the open design palette of a first-class scripting language.
Essentially, the ActivityType drivers are meant to handle the workload-specific machinery. They also provide dynamic control points and parameters which are special to that activity type. This exposes a full feedback loop between a running scenario script and the activities that it runs. The scenario is free to read the performance metrics from a running activity and make changes to it on the fly.
## Scripting Environment
The scripting environment provided by DSBench has a few
modifications meant to streamline understanding and usage of DSBench dynamic parameters and metrics.
### Active Bindings
Active bindings are control variables which, when assigned to, cause an immediate change in the behavior of the runtime. Each of the variables
below is pre-wired into each script environment.
#### scenario
This is the __Scenario Controller__ object which manages the activity executors in the runtime. All the methods on this Java type are provided
to the scripting environment directly.
#### activities.&lt;alias&gt;.&lt;paramname&gt;
Each activity parameter for a given activity alias is available at this name within the scripting environment. Thus, you can change the number of threads on an activity named foo (alias=foo) in the scripting environment by assigning a value to it as in `activities.foo.threads=3`.
Any assignments take effect synchronously before the next line of the script continues executing.
#### __metrics__.&lt;alias&gt;.&lt;metric name&gt;
Each activity metric for a given activity alias is available at this name.
This gives you access to the metrics objects directly. Some metrics objects
have also been enhanced with wrapper logic to provide simple getters and setters, like `.p99ms` or `.p99ns`, for example.
Interaction with the DSBench runtime and the activities therein is made easy
by the above variables and objects. When an assignment is made to any of these variables, the changes are propagated to internal listeners. For changes to _threads_, the thread pool responsible for the affected activity adjusts the number of active threads (AKA slots). Other changes are further propagated directly to the thread harnesses and components which implement the ActivityType.
:::warning
Assignment to the _type_ and _alias_ activity parameters has no special effect, as you can't change an activity to a different type once it has been created.
:::
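As a small illustrative sketch (assuming an activity was started with alias=foo, and using a threshold that is purely illustrative), the scripting environment lets you read a metric and adjust a parameter in the same breath:
~~~
// read the 99th percentile cycle time (in ms) for activity 'foo',
// using the snapshot accessor form documented below
var p99 = metrics.foo.cycles.snapshot.p99ms;

// if the target looks saturated, reduce concurrency;
// the assignment takes effect before the next script line runs
if (p99 > 50) {
    activities.foo.threads = 5;
}
~~~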
You can make use of more extensive Java or Javascript libraries as needed,
mixing them with the runtime controls provided above.
## Enhanced Metrics for Scripting
The metrics available in DSBench are slightly different than the standard
kit with dropwizard metrics. The key differences are:
### HDR Histograms
All histograms use HDR histograms with *four* significant digits.
All histograms reset on snapshot, automatically keeping all data until you
report the snapshot or access the snapshot via scripting (see below).
The metric types that use histograms have been replaced with nicer versions for scripting. You don't have to do anything differently in your reporter config to use them. However, if you need to use the enhanced versions in your local scripting, you can. This means that Timer and Histogram types are enhanced. If you do not use the scripting extensions, then you will automatically get the standard behavior that you are used to, only with higher-resolution HDR and full snapshots for each report to your downstream metrics systems.
### Scripting with Delta Snapshots
For both the timer and the histogram types, you can call getDeltaReader(), or access it simply as &lt;metric&gt;.deltaReader. When you do this, the delta snapshotting behavior is maintained until you use the deltaReader to access it. You can get a snapshot from the deltaReader by calling getDeltaSnapshot(10000), which causes the snapshot to be reset for collection, but retains a cache of the snapshot for any other consumer of getSnapshot() for that duration in milliseconds. If, for example, metrics reporters access the snapshot in the next 10 seconds, the reported snapshot will be exactly what was used in the script.
This is important for using local scripting methods and calculations with aggregate views downstream. It means that the histograms will match up between your local script output and your downstream dashboards, as they will both be using the same frame of data, when done properly.
### Histogram Convenience Methods
All histogram snapshots have additional convenience methods for accessing every percentile in (P50, P75, P90, P95, P98, P99, P999, P9999) and every time unit in (s, ms, us, ns). For example, getP99ms() is supported, as is getP50ns(), and every other possible combination. This means that you can access the 99th percentile metric value in your scripts for activity _foo_ as _metrics.foo.cycles.snapshot.p99ms_.
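As a sketch of how delta snapshots and the convenience accessors might be used together in a scenario script (again assuming an activity alias of foo):
~~~
// get the delta reader for the cycles timer of activity 'foo'
var reader = metrics.foo.cycles.deltaReader;

// take a delta snapshot; collection is reset, but the snapshot is cached
// for 10 seconds so that downstream reporters see the same frame of data
var snapshot = reader.getDeltaSnapshot(10000);

print("cycles p99: " + snapshot.getP99ms() + " ms");
~~~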
## Control Flow
When a script is run, it has absolute control over the scenario runtime while it is active. Once the script reaches its end, however, it will only exit if all activities have completed. If you want to explicitly stop a script, you must stop all activities.
## Strategies
You can use DSBench in the classic form with `run type=<type> param=value ...` command line syntax. There are reasons, however, that you will sometimes want to customize and modify your scripts directly, such as:
- Permute test variables to cover many sub-conditions in a test.
- Automatically adjust load factors to identify the nominal capacity of a system.
- Adjust the rate of a workload in order to get a specific measurement of system behavior.
- React to changes in test or target system state in order to properly sequence a test.
## Script Input & Output
Internal buffers are kept for _stdin_, _stdout_, and _stderr_ for the scenario script execution. These are logged to the logfile upon script completion, with markers showing the timestamp and file descriptor (stdin, stdout, or stderr) that each line was recorded from.
## External Docs
- [Java Platform, Standard Edition Nashorn User's Guide (Java 8)](https://docs.oracle.com/javase/8/docs/technotes/guides/scripting/nashorn/api.html)
- [Nashorn extensions on OpenJDK Wiki](https://wiki.openjdk.java.net/display/Nashorn/Nashorn+extensions)
- [Scripting for the Java (8) Platform](http://docs.oracle.com/javase/8/docs/technotes/guides/scripting/)
View File
@ -0,0 +1,27 @@
---
title: Standard Metrics
---
# Standard Metrics
DSBench comes with a set of standard metrics that will be part of every activity type. Each activity type enhances the metrics available by adding their own metrics with the DSBench APIs. This section explains what the standard metrics are, and how to interpret them.
## read-input
Within DSBench, a data stream provider called an _Input_ is responsible for providing the actual cycle number that will be used by consumer threads. Because different _Input_ implementations may perform differently, a separate metric is provided to track the performance in terms of client-side overhead. The **read-input** metric is a timer that only measures the time it takes for a given activity thread to read the input value, nothing more.
## strides
A stride represents the work-unit for a thread within DSBench. It allows a set of cycles to be logically grouped together for purposes of optimization, or in some cases, to simulate realistic client-side behavior over multiple operations. The stride is the number of cycles that will be allocated to each thread before it starts iterating on them.
The **strides** timer measures the time each stride takes, including all cycles within the stride. It starts measuring time before the cycle starts, and stops measuring after the last cycle in the stride has run.
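For example (with purely illustrative numbers), a thread working through a range of 1000 cycles with a stride of 100 would claim 100 consecutive cycles at a time, and the **strides** timer would record ten samples in total, one per stride.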
## cycles
Within DSBench, each logical iteration of a statement is handled within a distinct cycle. A cycle represents an iteration of a workload. This corresponds to a single operation executed according to some statement definition.
The **cycles** metric is a timer that starts counting at the start of a cycle, before any specific activity behavior has control. It stops timing once the logical cycle is complete. This includes any additional phases that are executed by multi-phase actions.
View File
@ -0,0 +1,29 @@
---
title: Timing Terms
---
# Timing Terms
Often, terms used to describe latency can create confusion.
In fact, the term _latency_ is so overloaded in practice that it is not useful by itself. Because of this, DSBench will avoid using the term latency _except in a specific way_. Instead, the terms described in this section will be used.
DSBench is a client-centric testing tool. The measurement of operations occurs on the client, without visibility to what happens in transport or on the server. This means that the client *can* see how long an operation takes, but it *cannot see* how much of the operational time is spent in transport and otherwise. This has a bearing on the terms that are adopted with DSBench.
Some terms are anchored by the context in which they are used. For latency terms, *service time* can be subjective. When using this term to describe other effects in your system, what is included depends on the perspective of the requester. The concept of service is universal, and every layer in a system can be seen as a service. Thus, the service time is defined by the vantage point of the requester. This is the perspective taken by the DSBench approach for naming and semantics below.
## responsetime
**The duration of time a user has to wait for a response from the time they submitted the request.** Response time is the duration of time from when a request was expected to start, to the time at which the response is finally seen by the user. A request is generally expected to start immediately when users make a request. For example, when a user enters a URL into a browser, they expect the request to start immediately when they hit enter.
In DSBench, the response time for any operation can be calculated by adding its wait time and its service time together.
## waittime
**The duration of time between when an operation is intended to start and when it actually starts on a client.** This is also called *scheduling delay* in some places. Wait time occurs because clients are not able to make all requests instantaneously when expected. There is an ideal time at which the request would be made according to user demand. This ideal time is always earlier than the actual time in practice. When there is a shortage of resources *of any kind* that delays a client request, it must wait.
Wait time can accumulate when you are running something according to a dispatch rate, as with a rate limiter.
## servicetime
**The duration of time it takes a server or other system to fully process a request and send a response.** From the perspective of a testing client, the _system_ includes the infrastructure as well as remote servers. As such, the service time metrics in DSBench include any operational time that is external to the client, including transport latency.
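As a purely hypothetical illustration of how these three terms relate for a single operation:
~~~
intended start:  t =  0 ms
actual start:    t =  5 ms   ->  waittime     =  5 ms
response seen:   t = 20 ms   ->  servicetime  = 15 ms
                                 responsetime = waittime + servicetime = 20 ms
~~~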
View File
@ -0,0 +1,101 @@
---
title: Advanced Metrics
---
# Advanced Metrics
## Unit of Measure
All metrics collected from activities are recorded in nanoseconds and ops per second. All histograms are recorded with 4 digits of precision using HDR histograms.
## Metric Outputs
Metrics from a scenario run can be gathered in multiple ways:
- In the log output
- In CSV files
- In HDR histogram logs
- In Histogram Stats logs (CSV)
- To a monitoring system via graphite
- via the --docker-metrics option
With the exception of the `--docker-metrics` approach, these forms may be used in combination. The command line options for enabling them are documented in the built-in help, and some examples may be found below.
## Metrics via Graphite
If you would like to have all of your testing data in one place, then you may be
interested in reporting your measurements to a monitoring system. For this,
DSBench includes a [Metrics Library](https://github.com/dropwizard/metrics).
Graphite reporting is baked in as the default reporter.
In order to enable graphite reporting, use one of these option formats:
--report-graphite-to <host>
--report-graphite-to <host>:<port>
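For example, to report to a (hypothetical) graphite host on the standard plaintext port:

    --report-graphite-to graphite.example.com:2003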
## Metric Naming
### Prefix
Core metrics use the prefix _engineblock_ by default. You can override this with the `--metrics-prefix` option:
--metrics-prefix myclient.group5
### Identifiers
Metrics associated with a specific activity will have the activity alias in
their name. There is a set of core metrics which are always present regardless of the activity type. The names and types of additional metrics provided for each activity type vary.
Sometimes, an activity type will expose metrics on a per statement basis, measuring over all invocations of a given statement as defined in the YAML. In these cases, you will see `--` separating the name components of the metric. At the most verbose, a metric name could take a form like
`<activity>.<docname>--<blockname>--<statementname>--<metricname>`, although this is rare when you name your statements, which is recommended.
Just keep in mind that the double dash connects an activity's alias with named statements *within* that activity.
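As a purely hypothetical illustration, an activity started with alias=foo might expose metric names such as:
~~~
foo.cycles                                   # a core metric, present for every activity
foo.myworkload--main--insert-user--errors    # a per-statement metric (doc--block--statement--metric)
~~~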
## HDR Histograms
### Recording HDR Histogram Logs
You can record details of histograms from any compatible metric (histograms and timers) with an option like this:
--log-histograms hdrdata.log
If you want to record only certain metrics in this way, then use this form:
--log-histograms 'hdrdata.log:.*suffix'
Notice that the option is enclosed in single quotes. This is because the second part of the option value is a regex. The '.*suffix' pattern matches any metric name that ends with "suffix". Effectively, leaving out the pattern is the same as using '.\*', which matches all metrics. Any valid regex is allowed here.
Metrics may be included in multiple logs, but care should be taken not to overdo this. Keeping higher fidelity histogram reservoirs does come with a cost, so be sure to be specific in what you record as much as possible.
If you want to specify the recording interval, use this form:
--log-histograms 'hdrdata.log:.*suffix:5s'
If you want to specify the interval, you must use the third form above, although it is valid to leave the pattern empty, such as 'hdrdata.log::5s'.
Each interval specified will be tracked in a discrete reservoir in memory, so they will not interfere with each other in terms of accuracy.
### Recording HDR Histogram Stats
You can also record basic snapshots of histogram data on a periodic interval
just like above with HDR histogram logs. The option to do this is:
--log-histostats 'hdrstats.log:.*suffix:10s'
Everything works the same as for HDR histogram logging, except that the format is CSV, as shown in the example below:
~~~
#logging stats for session scenario-1479089852022
#[Histogram log format version 1.0]
#[StartTime: 1479089852.046 (seconds since epoch), Sun Nov 13 20:17:32 CST 2016]
#Tag,Interval_Start,Interval_Length,count,min,p25,p50,p75,p90,p95,p98,p99,p999,p9999,max
Tag=diag1.delay,0.457,0.044,1,16,31,31,31,31,31,31,31,31,31,31
Tag=diag1.cycles,0.48,0.021,31,4096,8191,8191,8191,8191,8191,8191,8191,8191,8191,2097151
Tag=diag1.delay,0.501,0.499,1,1,1,1,1,1,1,1,1,1,1,1
Tag=diag1.cycles,0.501,0.499,498,1024,2047,2047,4095,4095,4095,4095,4095,4095,4095,4194303
...
~~~
This includes the metric name (Tag), the interval start time and length (from the beginning of collection time), number of metrics recorded (count), minimum magnitude, a number of percentile measurements, and the maximum value. Notice that the format used is similar to that of the HDR logging, although instead of including the raw histogram data, common percentiles are recorded directly.
View File
@ -0,0 +1,6 @@
---
title: Reference
weight: 90
---
This section contains additional reference details across a range of DSBench topics.
View File
@ -47,7 +47,7 @@
<dependency>
<groupId>io.nosqlbench</groupId>
<artifactId>virtdata-docsys</artifactId>
<artifactId>docsys</artifactId>
<version>3.12.3-SNAPSHOT</version>
</dependency>