Mirror of https://github.com/nosqlbench/nosqlbench.git, synced 2025-02-10 14:45:42 -06:00

Commit 9de8ef722c (parent 6156b15a21): fixed silly typos in showcase section

Some of the features discussed here are only for advanced testing scenarios.

## Hybrid Rate Limiting

Rate limiting is a complicated endeavor, if you want to do it well. The basic rub is that going fast means you have to be less accurate, and vice-versa. As such, rate limiting is a parasitic drain on any system. The act of rate limiting itself poses a limit to the maximum rate, regardless of the settings you pick. This occurs as a side-effect of forcing your system to interact with some hardware notion of time passing, which takes CPU cycles that could be going to the thing you are limiting.

This means that in practice, rate limiters are often very featureless. It's daunting enough to need rate limiting, and asking for anything more than that is often wishful thinking. Not so in NoSQLBench.

The rate limiter in NoSQLBench provides a comparable degree of performance and accuracy to others found in the Java ecosystem, but it *also* has advanced features, illustrated in the sketch after this list:

- It allows a sliding scale between average rate limiting and strict rate limiting, called _bursting_.
- It internally accumulates delay time, for C.O. friendly metrics which are separately tracked for each and every operation.
- It is resettable and reconfigurable on the fly, including the bursting rate.
- It provides its configured values in addition to performance data in metrics, capturing your rate limiter settings as a simple matter of metrics collection.
- It comes with advanced scripting helpers which allow you to read data directly from histogram reservoirs, or control the reservoir window programmatically.
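
To make the bursting behavior concrete, here is a minimal sketch of a hybrid rate limiter, assuming a nanosecond schedule and a burst ratio of at least 1.0. This is illustrative only, not the NoSQLBench implementation, and all names are invented for the example:

```java
import java.util.concurrent.locks.LockSupport;

/**
 * Minimal sketch of hybrid rate limiting. Early callers are parked until
 * their ideal slot; late callers may catch up, but no faster than
 * rate * burstRatio. With burstRatio == 1.0 this behaves strictly, and
 * larger ratios slide it toward average rate limiting.
 */
public final class HybridRateLimiterSketch {
    private final long opTicksNs;    // ideal gap between ops: 1e9 / rate
    private final long burstTicksNs; // smaller gap allowed while catching up
    private long scheduledNs;        // ideal start time of the next op
    private long lastAdmitNs;        // when the previous op was admitted

    public HybridRateLimiterSketch(double opsPerSec, double burstRatio) {
        opTicksNs = (long) (1_000_000_000d / opsPerSec);
        burstTicksNs = (long) (opTicksNs / burstRatio); // burstRatio >= 1.0
        scheduledNs = lastAdmitNs = System.nanoTime();
    }

    /** Blocks until this op may start; returns its schedule delay in ns. */
    public synchronized long acquire() {
        long now = System.nanoTime();
        long admitAt = (now > scheduledNs)
                ? Math.max(now, lastAdmitNs + burstTicksNs) // behind: burst catch-up
                : scheduledNs;                              // ahead: wait for the slot
        if (admitAt > now) {
            LockSupport.parkNanos(admitAt - now);
        }
        long waitNs = Math.max(0L, admitAt - scheduledNs);  // delay vs ideal schedule
        lastAdmitNs = admitAt;
        scheduledNs += opTicksNs;
        return waitNs;
    }
}
```

The returned schedule delay is the accumulated wait time that makes coordinated-omission-aware metrics possible, as described later under High Fidelity Metrics.
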
## Flexible Error Handling

An emergent facility in NoSQLBench is the way that errors are handled within an activity. For example, with the CQL activity type, you are able to route error handling for any of the known exception types. You can count errors, you can log them. You can cause errored operations to auto-retry if possible, up to a configurable number of tries.
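
As a sketch of what such routing can look like, consider a table mapping exception types to handler policies with a bounded retry count. The types and names here are invented for illustration and are not the NoSQLBench configuration API:

```java
import java.util.Map;
import java.util.concurrent.TimeoutException;

/** Illustrative only: route errors by exception type to log/retry/stop. */
final class ErrorRoutingSketch {
    enum Policy { LOG, RETRY, STOP }

    // Hypothetical routing table; the real routing is configured on the activity.
    static final Map<Class<? extends Exception>, Policy> ROUTES = Map.of(
            TimeoutException.class, Policy.RETRY,
            IllegalArgumentException.class, Policy.STOP);

    static final int MAX_TRIES = 3;

    static void execute(Runnable op) {
        for (int tries = 1; ; tries++) {
            try {
                op.run();
                return;
            } catch (Exception e) {
                Policy policy = ROUTES.getOrDefault(e.getClass(), Policy.LOG);
                if (policy == Policy.RETRY && tries < MAX_TRIES) continue; // auto-retry
                if (policy == Policy.STOP) throw new RuntimeException(e);  // fail fast
                System.err.println("op error (counted, logged): " + e);    // keep going
                return;
            }
        }
    }
}
```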

This means that, as a user, you get to decide what your test is about. Is it about measuring some nominal but anticipated level of errors due to intentional over-saturation? If so, then count the errors, and look at their histogram data for timing details within the available timeout.

Are you doing a basic stability test, where you want the test to error out for even the slightest error? You can configure for that if you need to.

## Cycle Logging

It is possible to record the result status of each and every cycle in a NoSQLBench test run. If the results are mostly homogeneous, the RLE encoding of the results will reduce the output file down to a small fraction of the number of cycles. The errors are mapped to ordinals by error type, and these ordinals are stored into a direct RLE-encoded log file. For most testing where most of the results are simply success, this file will be tiny. You can also convert the cycle log into textual form for other testing and post-processing and vice-versa.
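
As a minimal sketch of the idea (illustrative, not the actual cycle log format), results reduce to per-cycle ordinals, and adjacent equal ordinals collapse into closed-open runs:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative RLE over per-cycle result ordinals. */
final class CycleRleSketch {
    record Run(long startCycle, long endCycle, int resultOrdinal) {} // [start,end)

    static List<Run> encode(long firstCycle, int[] ordinals) {
        List<Run> runs = new ArrayList<>();
        long runStart = firstCycle;
        for (int i = 1; i <= ordinals.length; i++) {
            // close the run at end of input or when the ordinal changes
            if (i == ordinals.length || ordinals[i] != ordinals[i - 1]) {
                runs.add(new Run(runStart, firstCycle + i, ordinals[i - 1]));
                runStart = firstCycle + i;
            }
        }
        return runs;
    }
}
```

A million consecutive successes collapse into a single run such as `[0,1000000) -> 0`, which is why mostly-homogeneous results produce tiny files.
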
## Op Sequencing

The way that operations are planned for execution in NoSQLBench is based on a stable ordering that is configurable. The statement forms are mixed together based on their relative ratios. The three schemes currently supported are round-robin with exhaustion (bucket), duplicate in order (concat), and a way to spread each statement out over the unit interval (interval). These account for most configuration scenarios without users having to micro-manage their statement templates.
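
Here is a sketch of how two of these schemes unroll a ratio map; this is illustrative only, not the internal planner:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sequence planning for op ratios such as A:3, B:1. */
final class SequencePlanSketch {
    // concat: duplicate each op in order by its ratio
    static List<String> concat(Map<String, Integer> ratios) {
        List<String> seq = new ArrayList<>();
        ratios.forEach((name, ratio) -> {
            for (int i = 0; i < ratio; i++) seq.add(name);
        });
        return seq;
    }

    // bucket: round-robin over the ops, exhausting each ratio
    static List<String> bucket(Map<String, Integer> ratios) {
        Map<String, Integer> remaining = new LinkedHashMap<>(ratios);
        List<String> seq = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Iterator<Map.Entry<String, Integer>> it = remaining.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> e = it.next();
                seq.add(e.getKey());
                if (e.getValue() == 1) it.remove();
                else e.setValue(e.getValue() - 1);
            }
        }
        return seq;
    }

    public static void main(String[] args) {
        Map<String, Integer> ratios = new LinkedHashMap<>();
        ratios.put("A", 3);
        ratios.put("B", 1);
        System.out.println(concat(ratios)); // [A, A, A, B]
        System.out.println(bucket(ratios)); // [A, B, A, A]
        // interval (not shown) spreads each op's occurrences evenly
        // over the unit interval instead.
    }
}
```
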
## Sync and Async

There are two distinct usage modes in NoSQLBench when it comes to operation dispatch and thread management:

### Sync

Sync is the default form. In this mode, each thread reads its sequence and dispatches one statement at a time, holding only one operation in flight per thread. This is the mode you often use when you want to emulate an application's request-per-thread model, as it implicitly linearizes the order of operations within the computed sequence of statements.

### Async

In Async mode, each thread in an activity is responsible for juggling a number of operations in-flight. This allows a NoSQLBench client to juggle an arbitrarily high number of connections, limited primarily by how much memory you have.

Internally, the Sync and Async modes have different code paths. It is possible for an activity type to support one or both of these.
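
A compressed sketch of the contrast within one worker thread follows; `executeOp` and `executeOpAsync` are stand-ins, not NoSQLBench APIs:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

/** Illustrative contrast of sync and async dispatch in a single thread. */
final class DispatchModesSketch {
    static void syncLoop(long min, long max) {
        for (long cycle = min; cycle < max; cycle++) {
            executeOp(cycle); // one op in flight: blocks until completion
        }
    }

    static void asyncLoop(long min, long max, int maxPending) throws InterruptedException {
        Semaphore pending = new Semaphore(maxPending); // cap on in-flight ops
        for (long cycle = min; cycle < max; cycle++) {
            pending.acquire(); // block only when maxPending ops are in flight
            executeOpAsync(cycle).whenComplete((r, t) -> pending.release());
        }
        pending.acquire(maxPending); // drain the stragglers
    }

    static void executeOp(long cycle) { /* stand-in for a blocking native op */ }

    static CompletableFuture<Void> executeOpAsync(long cycle) {
        return CompletableFuture.runAsync(() -> executeOp(cycle)); // stand-in
    }
}
```
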
# Refined Core Concepts

The core concepts that NoSQLBench is built on have been scrutinized, replaced, refined, and hardened through several years of use by users of various needs and backgrounds.

This level of refinement is important when trying to find a way to express common patterns in what is often a highly fragmented practice. Testing is hard. Scale testing is hard. Distributed testing is hard. Combined, the challenge of executing realistic tests is often quite daunting to all but seasoned test engineers. To make this worse, existing tools have only skirmished with this problem enough to make dents, but none has tackled the lack of conceptual building blocks head-on.

This has to change. We need a set of conceptual building blocks that can span across workloads and system types, and machinery to put these concepts to use. This is why it is important to focus on finding a useful and robust set of concepts to use as the foundation for the rest of the toolkit to be built on. Finding these building blocks is often one of the most difficult tasks in systems design. Once you find and validate a useful set of concepts, everything else gets easier.

We feel that the success that we've already had using NoSQLBench has been strongly tied to the core concepts. Some concepts used in NoSQLBench are shared below for illustration, but this is by no means an exhaustive list.

### The Cycle

Cycles in NoSQLBench are whole numbers on a number line. Each operation in a NoSQLBench scenario is derived from a single cycle. It's a long value, and a seed. The cycle determines not only which statement is selected for execution, but also what synthetic payload data will be attached to it.

Cycles are specified as a closed-open `[min,max)` interval, just as slices in some languages. That is, the min value is included in the range, but the max value is not. This means that you can stack slices using common numeric reference points without overlaps or gaps. It means you can have exact awareness of what data is in your dataset, even incrementally.

You can think of a cycle as a single-valued coordinate system for data that lives adjacent to that number on the number line. In this way, virtual dataset functions are ways of converting coordinates into data.
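
A tiny sketch of the coordinate idea follows; the function names are invented for illustration, but the principle is exactly as described: every value is a pure function of the cycle.

```java
import java.util.function.LongFunction;

/** Illustrative only: pure functions turn a cycle coordinate into data. */
final class CycleCoordinateSketch {
    static final LongFunction<Long> HASH =
            c -> Long.rotateLeft(c * 0x9E3779B97F4A7C15L, 17); // stateless mix
    static final LongFunction<String> USERNAME =
            c -> "user_" + Math.floorMod(HASH.apply(c), 1_000_000L);

    public static void main(String[] args) {
        // The same coordinate always yields the same value, in any session.
        System.out.println(USERNAME.apply(42L));
        // Closed-open ranges stack cleanly: [0,1000) then [1000,2000)
        // cover exactly [0,2000) with no overlap and no gap.
    }
}
```
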
### The Activity

An activity is a multi-threaded flywheel of statements in some sequence and ratio. Activities run over the numbers in a cycle range. Each activity has a driver type which determines the native protocol that it speaks.

### The Driver Type

A driver type is a high level driver for a protocol. It is like a statement-aware cartridge that knows how to take a basic statement template and turn it into an operation for an activity to execute within the scenario.

### The Scenario

The scenario is a runtime session that holds the activities while they run. A NoSQLBench scenario is responsible for aggregating global runtime settings, metrics reporting channels, log files, and so on. All activities run within a scenario, under the control of the scenario script.

### The Scenario Script

Each scenario is governed by a script which runs single-threaded, asynchronously from activities, but in control of activities. If needed, the scenario script is automatically created for the user, and the user never knows it is there. If the user has advanced testing requirements, then they may take advantage of the scripting capability at such time. When the script exits, *AND* all activities are complete, then the scenario is complete.

# High Fidelity Metrics

Since NoSQLBench has been built as a serious testing tool for all users, some attention was necessary on the way metrics are used.

## Discrete Reservoirs

In NoSQLBench, we avoid the use of time-decaying metrics reservoirs. Internally, we use HDR reservoirs with discrete time boundaries. This is so that you can look at the min and max values and know that they apply accurately to the whole sampling window.

## Metric Naming

All running activities have a symbolic alias that identifies them for the purposes of automation and metrics. If you have multiple activities running concurrently, they will have different names and will be represented distinctly in the metrics flow.

## Precision and Units

By default, the internal HDR histogram reservoirs are kept at 4 digits of precision. All timers are kept at nanosecond resolution.

## Metrics Reporting

Metrics can be reported via graphite as well as CSV, logs, HDR logs, and HDR stats summary CSV files.

## Coordinated Omission

The metrics naming and semantics in NoSQLBench are set up so that you can have coordinated omission metrics when they are appropriate, but there are no other changes when they are not. This means that the metric names and meanings remain stable in any case.

Particularly, NoSQLBench avoids the term "latency" altogether as it is often overused and thus prone to confusing people.

Instead, the terms `service time`, `wait time`, and `response time` are used. These are abbreviated in metrics as `servicetime`, `waittime`, and `responsetime`.

The `servicetime` metric is the only one which is always present. When a rate limiter is used, then additionally `waittime` and `responsetime` are reported.
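
The relationship between the three is simple: wait time is delay against the intended schedule, service time is the operation itself, and response time is their sum, which is what a requester on that schedule would experience. A small worked sketch:

```java
/** Illustrative arithmetic only, following the metric semantics above. */
final class TimingTermsSketch {
    public static void main(String[] args) {
        long intendedStartNs = 0L;     // when the rate limiter scheduled the op
        long actualStartNs = 50_000L;  // the op started 50µs late (backlog)
        long completedNs = 70_000L;    // and finished 20µs after it started

        long serviceTimeNs = completedNs - actualStartNs;  // 20_000: the op itself
        long waitTimeNs = actualStartNs - intendedStartNs; // 50_000: schedule delay
        long responseTimeNs = waitTimeNs + serviceTimeNs;  // 70_000: user-perceived
        System.out.printf("service=%d wait=%d response=%d%n",
                serviceTimeNs, waitTimeNs, responseTimeNs);
    }
}
```
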
# NoSQLBench Showcase

Since NoSQLBench is new on the scene in its current form, you may be wondering why you would want to use it over any other tool. That is what this section is all about.

You don't have to read all of this! It is here for those who want to know the answer to the question "So, what's the big deal??" Just remember it is here for later if you want to skip to the next section and get started testing.

NoSQLBench can do nearly everything that other testing tools can do, and more. It achieves this by focusing on a scalable user experience in combination with a modular internal architecture.

NoSQLBench is a workload construction and simulation tool for scalable systems testing. That is an entirely different scope of endeavor than most other tools.

The pages in this section all speak to a selection of advanced capabilities that are unique to NoSQLBench.

# Modular Architecture

The internal architecture of NoSQLBench is modular throughout. Everything from the scripting extensions to data generation is enumerated at compile time into a service descriptor, and then discovered at runtime by the SPI mechanism in Java.
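
Java's `ServiceLoader` is the standard entry point for this kind of discovery. A minimal sketch follows; the `DriverAdapter` service interface name is hypothetical:

```java
import java.util.ServiceLoader;

/** Hypothetical service interface, for the sake of illustration. */
interface DriverAdapter {
    String name();
}

final class DriverDiscoverySketch {
    public static void main(String[] args) {
        // Implementations are listed in META-INF/services (or module-info),
        // typically written at compile time by an annotation processor.
        for (DriverAdapter adapter : ServiceLoader.load(DriverAdapter.class)) {
            System.out.println("discovered driver: " + adapter.name());
        }
    }
}
```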

This means that extending and customizing bundles and features is quite manageable.

It also means that it is relatively easy to provide a suitable API for multi-protocol support. In fact, there are several drivers available in the current NoSQLBench distribution. You can list them out with `nb --list-drivers`, and you can get help on how to use each of them with `nb help <driver name>`.

This also is a way for us to encourage and empower other contributors to help develop the capabilities and reach of NoSQLBench. By encouraging others to help us build NoSQLBench modules and extensions, we can help more users in the NoSQL community at large.

# Portable Workloads

All of the workloads that you can build with NoSQLBench are self-contained in a workload file. This is a statement-oriented configuration file that contains templates for the operations you want to run in a workload.

This defines part of an activity - the iterative flywheel part that is run directly within an activity type. This file contains everything needed to run a basic activity -- a set of statements in some ratio. It can be used to start an activity, or as part of several activities within a scenario.

## Standard YAML Format

The format for describing statements in NoSQLBench is generic, but in a particular way that is specialized around describing statements for a workload. That means that you can use the same YAML format to describe a workload for Kafka as you can for Apache Cassandra or DSE.

The YAML structure has been tailored to describing statements, their data generation bindings, how they are grouped and selected, and the parameters needed by drivers, like whether they should be prepared statements or not.

Further, the YAML format allows for defaults and overrides with a very simple mechanism that reduces editing fatigue for frequent users.

You can also template document-wide macro parameters which are taken from the command line just like any other parameter. This is a way of templating a workload and making it multi-purpose or adjustable on the fly.

## Experimentation Friendly

Because the workload YAML format is generic across driver types, it is possible to ask one driver type to interpret the statements that are meant for another. This isn't generally a good idea, but it becomes extremely handy when you want to have a high level driver type like `stdout` interpret the syntax of another driver like `cql`. When you do this, the stdout activity type _plays_ the statements to your console as they would be executed in CQL, data bindings and all.
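
For example, an invocation along the lines of `nb run driver=stdout workload=cql-iot cycles=10` (names shown for illustration; check your distribution's workload list) would print ten fully-bound statements to the console rather than executing them against a cluster.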

This means you can empirically and directly demonstrate and verify access patterns, data skew, and other dataset details before you change back to cql mode and turn up the settings for a higher scale test. It takes away the guess work about what your test is actually doing, and it works for all drivers.

# Scripting Environment

The ability to write open-ended testing simulations is provided in NoSQLBench by means of a scripted runtime, where each scenario is driven from a control script that can do anything the user wants.

## Dynamic Parameters

Some configuration parameters of activities are designed to be assignable while a workload is running. This makes things like threads, rates, and other workload dynamics adjustable in real-time. The internal APIs work with the scripting environment to expose these parameters directly to scenario scripts. Drivers that are provided to NoSQLBench can also expose dynamic parameters in the same way so that anything can be scripted dynamically when needed.

## Scripting Automatons

When a NoSQLBench scenario is running, it is under the control of a single-threaded script. Each activity that is started by this script is run within its own thread pool, simultaneously and asynchronously.

The control script has executive control of the activities, as well as full visibility into the metrics that are provided by each activity. The way these two parts of the runtime meet is through the service objects which are installed into the scripting runtime. These service objects provide a named access point for each running activity and its metrics.

This means that the scenario script can do something simple, like start activities and wait for them to complete, OR, it can do something more sophisticated like dynamically and iteratively scrutinize the metrics and make real-time adjustments to the workload while it runs.

## Analysis Methods

Scripting automatons that do feedback-oriented analysis of a target system are called analysis methods in NoSQLBench. We have prototyped a couple of these already, but there is nothing keeping the adventurous from coming up with their own.

## Command Line Scripting

The command line has the form of basic test commands and parameters. These commands get converted directly into scenario control script in the order they appear. The user can choose whether to stay in high level executive mode, with simple commands like `nb test-scenario ...`, or to drop down directly into script design. They can look at the equivalent script for any command line by running `--show-script`. If you take the script that is dumped to console and run it, it will do exactly the same thing as if you hadn't even looked at it and just ran basic commands on the command line.

There are even ways to combine script fragments, full commands, and calls to scripts on the command line. Since each variant is merely a way of constructing scenario script, they all get composited together before the scenario script is run.

New introductions to NoSQLBench should focus on the command line. Once a user is familiar with this, it is up to them whether to tap into the deeper functionality. If they don't need to know about scenario scripting, then they shouldn't have to learn about it to be effective. This is what we are calling a _scalable user experience_.

## Compared to DSLs

Other tools may claim that their DSL makes scenario "simulation" easier. In practice, any DSL is generally dependent on a development tool to lay the language out in front of a user in a fluent way. This means that DSLs are almost always developer-targeted tools, and mostly useless for casual users who don't want to break out an IDE.

One of the things a DSL proponent may tell you is that it tells you "all the things you can do!". This is de-facto the same thing as it telling you "all the things you can't do" because it's not part of the DSL. This is not a win for the user. For DSL-based systems, the user has to use the DSL whether or not it enhances their creative control, while in fact, most DSLs aren't rich enough to do much that is interesting from a simulation perspective.

In NoSQLBench, we don't force the user to use the programming abstractions except at a very surface level -- the CLI. It is up to the user whether or not to open the secret access panel for the more advanced functionality. If they decide to do this, we give them a commodity language (ECMAScript), and we wire it into all the things they were already using. We don't take away their creative freedom by telling them what they can't do. This way, users can pick their level of investment and reward as best fits their individual needs, as it should be.

## Scripting Extensions

Also mentioned under the section on modularity, it is relatively easy for a developer to add their own scripting extensions into NoSQLBench as named service objects.

# Virtual Datasets

The _Virtual Dataset_ capabilities within NoSQLBench allow you to generate data on the fly. There are many reasons for using this technique in testing, but it is often a topic that is overlooked or taken for granted.

## Industrial Strength

The algorithms used to generate data are based on advanced techniques in the realm of variate sampling. The authors have gone to great lengths to ensure that data generation is efficient and as much O(1) in processing time as possible.

For example...

One technique that is used to achieve this is to initialize and cache data in high resolution look-up tables for distributions which may otherwise perform differently depending on their respective density functions. The existing Apache Commons Math libraries have been adapted into a set of interpolated Inverse Cumulative Distribution sampling functions. This means that you can use them all in the same place as you would a Uniform distribution, and once initialized, they sample with identical overhead. This means that by changing your test definition, you don't accidentally change the behavior of your test client, only the data as intended.
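
A minimal sketch of the interpolation idea, assuming you already have an inverse CDF for the distribution in question (illustrative; the real adaptation lives in the NoSQLBench libraries):

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Illustrative interpolated inverse-CDF sampling: pay for the expensive
 * quantile function once at init time, then every sample is an O(1)
 * table lookup plus a linear interpolation, whatever the distribution.
 */
final class InterpolatedSamplerSketch {
    private final double[] lut;

    InterpolatedSamplerSketch(DoubleUnaryOperator inverseCdf, int resolution) {
        lut = new double[resolution + 1];
        for (int i = 0; i <= resolution; i++) {
            lut[i] = inverseCdf.applyAsDouble((double) i / resolution); // once
        }
    }

    /** u must be in [0,1), e.g. a unit-interval hash of the cycle. */
    double sample(double u) {
        double pos = u * (lut.length - 1);
        int idx = (int) pos;
        double frac = pos - idx;
        return lut[idx] * (1.0 - frac) + lut[idx + 1] * frac; // O(1) per sample
    }
}
```

Because every distribution samples through the same table-lookup path once initialized, swapping a Uniform for a Zipfian changes the data, not the client's overhead.
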
## A Purpose-Built Tool

Many other testing systems avoid building a dataset generation component. It's a tough problem to solve, so it's often just avoided. Instead, they use libraries like "faker" or other sources of data which weren't designed for testing at scale. Faker is well named, no pun intended. It was meant as a vignette and wire-framing library, not a source of test data for realistic results. If you are using a testing tool for scale testing and relying on a faker variant, then you will almost certainly get invalid results that do not represent how a system would perform in production.

The virtual dataset component of NoSQLBench is a library that was designed for high scale and realistic data streams. It uses the limits of the data types in the JVM to simulate high cardinality datasets which approximate production data distributions for realistic and reproducible results.

## Deterministic

The data that is generated by the virtual dataset libraries is deterministic. This means that for a given cycle in a test, the operation that is synthesized for that cycle will be the same from one session to the next. This is intentional. If you want to perturb the test data from one session to the next, then you can most easily do it by simply selecting a different set of cycles as your basis.

This means that if you find something interesting in a test run, you can go back to it just by specifying the cycles in question. It also means that you aren't losing comparative value between tests with additional randomness thrown in. The data you generate will still look random to the human eye, but that doesn't mean that it can't be reproducible.

## Statistically Shaped

All this means is that the values you use to tie your dataset together can be specific to any distribution that is appropriate. You can ask for a stream of floating point values 1 trillion values long, in any order. You can use discrete or continuous distributions, with whatever distribution parameters you need.

## Best of Both Worlds

Some might worry that fully synthetic testing data is not realistic enough. The devil is in the details on these arguments, but suffice it to say that you can pick the level of real data you use as seed data with NoSQLBench. You don't have to choose between realism and agility. The procedural data generation approach allows you to have all the benefits of testing agility of low-entropy testing tools while retaining nearly all of the benefits of real testing data.

For example, using the alias sampling method and a published US census (public domain) list of names and surnames that occurred more than 100x, we can provide extremely accurate samples of names according to the published labels and weights. The alias method allows us to sample accurately in O(1) time from the entire dataset by turning a large number of weights into two uniform samples. You will simply not find a better way to sample realistic (US) names than this. (If you do, please file an issue!)

Actually, any data set that you have in CSV form with a weight column can also be used this way, so you're not strictly limited to US census data.
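
For the curious, here is a compact sketch of the alias method itself (Vose's variant); it is illustrative rather than the NoSQLBench code:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Random;

/** Illustrative Vose alias sampler: O(n) setup, O(1) weighted samples. */
final class AliasSamplerSketch {
    private final double[] prob; // chance of keeping column i
    private final int[] alias;   // fallback column when we don't keep i
    private final Random rng = new Random(42); // fixed seed: deterministic

    AliasSamplerSketch(double[] weights) {
        int n = weights.length;
        prob = new double[n];
        alias = new int[n];
        double total = Arrays.stream(weights).sum();
        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>(), large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push(i);
        }
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop(), l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] += scaled[s] - 1.0; // l donates s's deficit
            (scaled[l] < 1.0 ? small : large).push(l);
        }
        while (!large.isEmpty()) prob[large.pop()] = 1.0;
        while (!small.isEmpty()) prob[small.pop()] = 1.0; // fp leftovers
    }

    /** Two uniform samples in, one weighted sample out, in O(1). */
    int sample() {
        int column = rng.nextInt(prob.length);
        return rng.nextDouble() < prob[column] ? column : alias[column];
    }

    public static void main(String[] args) {
        AliasSamplerSketch names = new AliasSamplerSketch(new double[]{80, 15, 5});
        int[] counts = new int[3];
        for (int i = 0; i < 100_000; i++) counts[names.sample()]++;
        System.out.println(Arrays.toString(counts)); // roughly an 80/15/5 split
    }
}
```
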
## Java Idiomatic Extension

The way that the virtual dataset component works allows Java developers to write any extension to the data generation functions simply in the form of Java 8 or newer Functional interfaces. As long as they include the annotation processor and annotate their classes, they will show up in the runtime and be available to any workload by their class name.
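
The shape of such a function is just a standard `LongFunction`; a sketch follows, with the registering annotation elided since its exact name is a developer-docs detail:

```java
import java.util.function.LongFunction;

// A custom binding function is a pure function of the cycle.
// (The NoSQLBench annotation that registers it with the annotation
// processor is omitted here; see the developer docs for specifics.)
public class HexCode implements LongFunction<String> {
    @Override
    public String apply(long cycle) {
        // Deterministic: the same cycle always produces the same value.
        return Long.toHexString(cycle * 0x9E3779B97F4A7C15L);
    }
}
```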

Additionally, annotation-based examples and annotation processing are used to hoist function docs directly into the published docs that go along with any version of NoSQLBench.

## Binding Recipes

It is possible to stitch data generation functions together directly in a workload YAML. These are data-flow sketches of functions that can be copied and pasted between workload descriptions to share or remix data streams. This allows for the adventurous to build sophisticated virtual datasets that emulate nuances of real datasets, but in a form that takes up less space on the screen than this paragraph!