import nosqlbench

This commit is contained in:
Jonathan Shook
2020-02-20 15:37:57 -06:00
parent 62d53ecec6
commit fdc3d79856
464 changed files with 42623 additions and 0 deletions

86
nb-docs/pom.xml Normal file

@@ -0,0 +1,86 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>nb-defaults</artifactId>
<groupId>io.nosqlbench</groupId>
<version>2.12.66-SNAPSHOT</version>
<relativePath>../nb-defaults</relativePath>
</parent>
<artifactId>nb-docs</artifactId>
<packaging>jar</packaging>
<name>${project.artifactId}</name>
<description>CLI for nosqlbench.</description>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<javadoc.name>nosqlbench Docs</javadoc.name>
</properties>
<dependencies>
<!-- <dependency>-->
<!-- <groupId>io.nosqlbench</groupId>-->
<!-- <artifactId>nb-vis</artifactId>-->
<!-- <version>2.11.31-SNAPSHOT</version>-->
<!-- </dependency>-->
<dependency>
<groupId>io.nosqlbench</groupId>
<artifactId>virtdata-docsys</artifactId>
<version>2.12.16-SNAPSHOT</version>
</dependency>
</dependencies>
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
</resources>
</build>
<profiles>
<profile>
<id>shade</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<build>
<plugins>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.0</version>
<configuration>
<transformers combine.children="append">
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>io.nosqlbench.cli.EBCLI</mainClass>
</transformer>
</transformers>
<finalName>${project.artifactId}</finalName>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>


@@ -0,0 +1,19 @@
package io.nosqlbench.docs;
import io.virtdata.annotations.Service;
import io.virtdata.docsys.api.Docs;
import io.virtdata.docsys.api.DocsBinder;
import io.virtdata.docsys.api.DocsysDynamicManifest;
@Service(DocsysDynamicManifest.class)
public class NBMarkdownManifest implements DocsysDynamicManifest {
@Override
public DocsBinder getDocs() {
return new Docs().namespace("docs-for-eb")
.addFirstFoundPath("nb-cli/src/main/resources/docs-for" +
"-nb/",
"docs-for-nb/")
.setEnabledByDefault(false)
.asDocsBinder();
}
}


@@ -0,0 +1,69 @@
---
title: NoSQLBench Dev Guide
layout: single
weight: 0
---
## Conduct
It's simple really. Everything in the
[Contributor Covenant](https://www.contributor-covenant.org/version/1/4/code-of-conduct)
applies here. If, after reading that, you are unclear, then please pick another project to work on.
The maintainers will not hesitate to enforce a code of conduct.
## License
nosqlbench is licensed under the [Apache License, version
2.0](https://www.apache.org/licenses/LICENSE-2.0). If you wish to contribute
your code to this project, you must be willing to use this license. All code
contributed here is presumed to be licensed as such, and the maintainers
may add licenses to contributed files or add commit-level requirements for
clear licensing headers.
## How to Contribute
### Issue Tracker
There are multiple ways to contribute. The most direct and engaging way is to
file an issue when you have requests for enhancements or bug fixes.
Tickets which may be suitable for newer contributors will be marked as **easy
pick** in the spirit of encouragement.
### Project Site
The project site at nosqlbench.io could use some help as well. This is in a
separate repository adjacent to the main project as 'nosqlbench-docs'.
Consider this as part of the codebase in general. You can file issues against
it, or submit pull requests.
### Pull Request
This project is eager to have contributors. To that end, pull requests which are
in the spirit of the project will be welcome. When pull requests are not
directly accepted, kind and specific explanation of why will be provided. If you
want to contribute, and are not sure about whether your improvements would be
accepted, simply file an issue and describe what you are interested in doing
before coding too much.
### Change Scoping
Like with **easy pick** issues, those which are likely to be more effort will be
marked as **needs design**. The goals of any **needs design** will be to propose
in more detail the moving parts and user-facing ideas which might be too complex
or opaque for a single coding and testing effort. Think of these as *epic*
ideas, which will, by their nature, be required to have some design and usage
documentation submitted before they are merged.
### Project Maturity
As nosqlbench matures, a more stringent set of code management practices will
be adopted. The maintainers are leaning towards the
[Git Flow](https://nvie.com/posts/a-successful-git-branching-model/)
model. A stricter release and branching model *will* be imposed as part of the next
major release.


@@ -0,0 +1,198 @@
---
title: Activity Internals
weight: 32
menu:
main:
parent: Dev Guide
identifier: Activity Internals
weight: 12
---
Activities are a generalization of some type of client work that needs to occur
to generate work against a test target. However, a purely abstract interface for
activities would be so open-ended that it would provide no common scaffolding.
On the contrary, we do want some sense of isomorphism between activity types in
terms of how they are implemented and reasoned about. After reading this
document, you should know what it means to implement an activity properly--
building on the core machinery while adding in activity-type behavior
appropriately. That is what an Activity Type is for -- filling in the difference
between what the core machinery provides and what is needed to simulate a
particular kind of application workload.
## ActivityTypes
Each activity that runs in nosqlbench is provided by an instance of an
ActivityType. The first activity type that you will become familiar with is
called ``diag``. An ActivityType is responsible for providing the
application-like functionality that can be used in template form by activity
instances. When you are ready, there is a section all about the basics of
actually [implementing an activity
type](/dev-guide/building_activities/).
## Activity Parameters
All activities are controlled at runtime with a _ParameterMap_. This is simply
an observable thread-safe map of configuration values in string-string form,
with type-specific getters. It also provides basic parsing and type checking for
common parameters.
On the command line, you can specify parameters for an activity in the form:
~~~
type=cql alias=activity1 yaml=inserts_cql.yaml cycles=0..1000 threads=10
~~~
Other convenient forms are available when needed -- a JSON map for example.
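As a rough illustration of the `key=value` form above, here is a minimal sketch of parsing such a line into a string-string map with a typed getter. The class `ParamSketch` is hypothetical and is not the nosqlbench ParameterMap; it is neither observable nor thread-safe. Only the getter name `getLongOrDefault` is borrowed from the diag code in this guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the nosqlbench ParameterMap class.
class ParamSketch {
    private final Map<String, String> values = new LinkedHashMap<>();

    // Parse whitespace-separated key=value pairs, e.g. "type=cql threads=10".
    static ParamSketch parse(String line) {
        ParamSketch params = new ParamSketch();
        for (String pair : line.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            params.values.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    // Raw string form of a parameter, or null if it was not set.
    String getString(String name) {
        return values.get(name);
    }

    // Type-specific getter: parses the value as a long, or falls back to the default.
    long getLongOrDefault(String name, long defaultValue) {
        String value = values.get(name);
        return (value == null) ? defaultValue : Long.parseLong(value);
    }
}
```

Values such as `cycles=0..1000` stay as strings in this sketch; interpreting ranges is left to whatever consumes the parameter.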
## Threading
At runtime, an activity is driven by a dedicated thread pool harness -- the
ActivityExecutor. This harness knows how to adjust the running threads down or
up, as needed by changes to the related _threads_ parameter. This is meaningful
for a couple of reasons:
1. The client behavior can emulate typical threading dynamics of real
applications more accurately than a task-and-queue-only abstraction.
2. The synthetic thread ID can be borrowed and used to directly
map some measure of concurrency of data flow.
3. It is a familiar concurrency primitive that is used in many other testing tools.
There are a few lifetime scopes to keep in mind when a scenario is running. They
are:
~~~
scenario (control script)
activity
motor thread
motor thread
...
activity
motor thread
...
...
~~~
These scopes nest strictly from outside to inside. Activity-specific threads,
labeled `motor threads` above, run within the activity. Their executors run in
their own thread per-activity, and so forth. The term `motor thread` is used
here, but when working with nosqlbench you can generally think of motors and
threads interchangeably, as all __Runnable__ threads within a running activity
are implemented via the Motor API. It is the Motor and other interfaces which
allow the nosqlbench runtime to easily drive the workloads for an activity in a
modular way.
The ActivityType interface, part of the core nosqlbench API, allows you to
control how threads are created for activity instances, and how activity
instances are created for an activity. Although this gives the API two levels of
instantiation and initialization, care has been taken to keep it as
simple as possible. Here are the scoping layers above with some
additional detail:
- A Scenario has ActivityType instances.
- An ActivityType can create:
- Activity instances
- MotorDispenser instances
- InputDispenser instances
- ActionDispenser instances
When an activity is initialized, it is created from the ActivityType. As well, a
dispenser for the three other types above is created from the ActivityType and
these are installed into the activity.
From this point forward, when a new thread needs to be created for an activity,
the __Runnable__ is dispensed by the MotorDispenser on that activity. The Input
and Action instances for that thread are also dispensed from the InputDispenser
and ActionDispenser on that activity, respectively.
In practice, you don't have to think about the API at this level of detail. Most
new ActivityType implementations will simply implement the API just enough to
provide an Action implementation and nothing more.
The [annotated Diag](/dev-guide/annotated_diag/) section shows the diag activity
type, built one piece at a time.
### Why Motors?
Each ActivityExecutor uses the _Motor_ API to manage activity threads. A Motor
is nothing new. The reason the Motor abstraction exists is to provide a
more definite boundary between the machinery and the pluggable workloads. It
provides a control boundary that is tangible to both the scripting runtime and
the new concurrent programmer. For this reason, seasoned Java programmers will
find nothing new or novel in the Motor abstraction. It's simply there to do the
obvious things:
1. Enable (desired and actual) state signaling between executor and thread.
2. Represent the per-thread flow and execution of inputs and actions.
3. Instrument said inputs and actions for metrics.
4. Control the per-thread unit of work around longer-running, tighter iterations.
Motor lifetimes are not per-cycle. Motors can hang around in an activity
executor, be stopped, started, etc. They keep the same input and action
assignments that they were assembled with initially. You can think of motors as
event pumps which are meant to keep running while there is data available. They
aren't meant to cycle once for a lightweight task.
While it is possible to implement your own Motors, this will almost never be necessary.
### Slots, AKA Threads
To support multiple signal routing topologies within an activity, the concept of
a slot is used. A slot is nothing more than an indexed position for a thread in
a thread pool.
When a thread is being started for an activity, a motor instance is created for
the slot, as well as an input and action instance. However, the ActivityType
implementation has control of how these are created. If the ActivityType
implementation chooses, it may return a unique input for each slot, or a single
cached instance for all slots. This is controlled simply by the slot index,
which is passed into the factory methods for motors, inputs and actions.
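The choice between a unique instance per slot and a single cached instance can be sketched as follows. The interfaces `SketchInput` and `SketchInputDispenser` are hypothetical stand-ins, assumed here only for illustration; the real nosqlbench dispenser interfaces may differ in names and signatures.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for the real nosqlbench interfaces.
interface SketchInput {
    long next();
}

interface SketchInputDispenser {
    SketchInput getInput(int slot);
}

// Returns the same cached input for every slot, so all threads share one sequence.
class SharedInputDispenser implements SketchInputDispenser {
    private final AtomicLong counter = new AtomicLong();
    private final SketchInput shared = counter::getAndIncrement;

    @Override
    public SketchInput getInput(int slot) {
        return shared;
    }
}

// Returns a new input on every call, so each slot counts independently from zero.
class PerSlotInputDispenser implements SketchInputDispenser {
    @Override
    public SketchInput getInput(int slot) {
        AtomicLong counter = new AtomicLong();
        return counter::getAndIncrement;
    }
}
```

In both cases the slot index is available to the factory method; the shared variant simply ignores it.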
## Activity Alias
The only way to address a running activity for dynamic control is through its
_alias_. An alias is simply the name that the ScenarioController knows as the
activity's name at runtime. If an alias is not provided, the runtime may accept
a new activity, but it will be forced to generate an internal name for it.
## ActivityType Name
ActivityTypes are discovered by the runtime via the Java ServiceLoader API. In
addition to the basic Java type, an ActivityType instance has a name. For the
built-in diagnostic activity type, it is 'diag'. Each activity type name must be
unique at runtime, or an error is thrown.
With an activity alias and the activity type name, you have enough information
to tell nosqlbench how to start an activity. The variable names for these are
**alias** and **type**.
## Iterating a Cycle
While an activity is running, each of its slots has a running motor which does
the following continuously.
1. Verify Motor control state, stop if signalled (a stop was requested)
2. Read the next input value (a long) from the Input, stop if exhausted
3. Apply the value to the Action.
The motor acts as a data pump, pulling in new test values to the application
workload and turning the crank on the workload machinery. The consumer interface
for an Action is very basic. This is intentional, and allows the maximum amount
of flexibility in workload (AKA ActivityType) design. The motor control state is
simply an atomically-visible breaker that is controlled by the ActivityExecutor.
The default implementation of an activity input is a sequence generator. This is
what most activities will need. However, rate controls and other decorators may
be desired, so the API makes it easy to wrap the default input.
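The three steps of the cycle loop can be sketched with plain JDK types. This is a simplified illustration, not the nosqlbench Motor implementation: the class name `MotorLoopSketch`, the `EXHAUSTED` sentinel, and the use of `LongSupplier` and `LongConsumer` in place of the Input and Action interfaces are all assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch of a motor's loop; not the nosqlbench Motor implementation.
class MotorLoopSketch implements Runnable {
    // Assumed sentinel meaning "the input has no more values".
    static final long EXHAUSTED = -1L;

    // Atomically-visible breaker, set from outside by the executor.
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final LongSupplier input;
    private final LongConsumer action;

    MotorLoopSketch(LongSupplier input, LongConsumer action) {
        this.input = input;
        this.action = action;
    }

    void requestStop() {
        stopRequested.set(true);
    }

    @Override
    public void run() {
        while (!stopRequested.get()) {       // 1. verify control state
            long value = input.getAsLong();  // 2. read the next input value
            if (value == EXHAUSTED) {
                break;                       //    stop if the input is exhausted
            }
            action.accept(value);            // 3. apply the value to the action
        }
    }
}
```

Wrapping the default input, for rate control for example, would amount to passing a decorated `LongSupplier` in this sketch.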
## ActivityType Discovery
_ActivityType_ implementations are discovered by the runtime using the
[ServiceLoader API](https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html),
with the service name __io.nosqlbench.activityapi.ActivityType__. That means
simply that you must add the fully-qualified class name of your ActivityType
implementations to the META-INF/services/io.nosqlbench.activityapi.ActivityType
file of your built jar. A maven plugin automates this during build, and is
explained in further detail in the dev guides.
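For readers unfamiliar with ServiceLoader, here is a minimal sketch of discovery combined with the unique-name rule described earlier. `NamedTypeSketch` is a hypothetical stand-in for the ActivityType interface, and `TypeRegistrySketch` is not part of nosqlbench; only `java.util.ServiceLoader` is the real JDK API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical stand-in for the ActivityType interface; only the name matters here.
interface NamedTypeSketch {
    String getName();
}

class TypeRegistrySketch {
    // Index implementations by name, rejecting duplicate names.
    static Map<String, NamedTypeSketch> indexByName(Iterable<? extends NamedTypeSketch> found) {
        Map<String, NamedTypeSketch> byName = new LinkedHashMap<>();
        for (NamedTypeSketch type : found) {
            if (byName.putIfAbsent(type.getName(), type) != null) {
                throw new IllegalStateException("duplicate type name: " + type.getName());
            }
        }
        return byName;
    }

    // Finds implementations listed in META-INF/services/<fully-qualified interface name>.
    static Map<String, NamedTypeSketch> discover() {
        return indexByName(ServiceLoader.load(NamedTypeSketch.class));
    }
}
```

With no provider file on the classpath, `discover()` simply returns an empty map.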


@@ -0,0 +1,244 @@
---
title: Diag ActivityType
weight: 32
menu:
main:
parent: Dev Guide
identifier: Diag ActivityType
weight: 12
---
{{< warning >}}
This section is out of date, and will be updated after the next major release
with details on building async activity types.
{{< /warning >}}
If you take all the code chunks from this document and concatenate them
together, you'll have 'diag', one of the built-in activity types for
nosqlbench.
All activity types are annotated for inclusion in META-INF/services/ActivityType
to be found at runtime by the ServiceLoader API. This is done by an
upstream annotation _io.virtdata.annotations.Service_, since this avoids hoisting
in the popular but overly heavy AutoService (it has a dependency on Guava).
~~~
@Service(ActivityType.class)
~~~
### DiagActivityType is an ActivityType
Let's implement an ActivityType. Actually, let's make it useful for something
besides default behavior. Let's also implement ActionDispenserProvider.
~~~
public class DiagActivityType implements ActivityType {
~~~
The ActivityType interface uses default methods for the *...DispenserProvider*
methods. Most ActivityType implementations will only need ActionDispenser(...).
### DiagActivityType has a name
Each ActivityType implementation must provide a simple name by which it is known
at runtime. When the available activity types are discovered at runtime, it is
an error for more than one to have the same name.
~~~
@Override
public String getName() {
return "diag";
}
~~~
### ActionDispenser method
We need to provide our own type of action for _diag_ in order to make it useful.
getActionDispenser(...) will be called exactly once per activity instance. The
ActionDispenser that we provide here (per activity instance) will be used to
create the per-thread actions for each per-thread Runnable (aka Motor). The
ActivityDef is also our first chance to specialize the behavior of the
ActivityType. This means that your primary input into the control of an
activity's behavior is this activity definition. If you want your ActivityType
to do something configurable, this is how you do it.
~~~
@Override
public ActionDispenser getActionDispenser(ActivityDef activity) {
return new DiagActionDispenser(activity);
}
}
~~~
Now, on to the implementation of the DiagActionDispenser we just saw above.
This implementation does little, but what it does is important. First, it
remembers the ActivityDef that goes with the activity instance. This is intended
to be the initializer data for the activity instance. Second, it simply provides
an Action when asked. Now we see the second opportunity to specialize the
behavior of the Action, around the slot number.
Whether we want it or not, the slot number is available to us. Notice that the
DiagAction itself is taking the slot number and the activity. We'll explain why
further down.
~~~
public class DiagActionDispenser implements ActionDispenser {
private ActivityDef activity;
public DiagActionDispenser(ActivityDef activity) {
this.activity = activity;
}
@Override
public Action getAction(int slot) {
return new DiagAction(slot, activity);
}
}
~~~
#### A note on instances and scoping
It may be the case that your Action implementation is thread-safe, and that you
want to just share the same instance across all Runnables in your running
activity. In that case, you'd keep a local instance of a DiagAction and simply
initialize and return it as needed. However, most often you'll want each thread to
have some thread-local state, and you'll simply use the slot number for
diagnostic and logging purposes. This implementation does the latter.
The picture is different when you are talking about Inputs. It is often useful
to have a common stream of input for many threads, such as when you want to
meter the rate of processing over some number of inputs. In this case, a simple
atomically accessed and incremented long does the job well. But, in order to
meter or rate-limit the set of threads, you need them to use the same input. The
input is your control and measurement point. In this case, you would simply
re-use the same input for all slots.
Motors work exactly the same way. The naming of the interfaces for Motors,
Inputs, and Actions is consistent throughout, so hopefully that makes the API
easier to understand and use.
This allows for a degree of flexibility in mapping motors, inputs, and actions to
slot numbers. You can create an ActivityType that shares none of these components
between threads for an activity, or one that shares all of them across threads
for an activity, or anything in between. The most common recipe is: one input
per activity and one motor and action per thread reading from this input.
### DiagAction
Now, on to the substance of this activity type, the Action implementation.
~~~
public class DiagAction implements Action, ActivityDefObserver {
private final static Logger logger = LoggerFactory.getLogger(DiagAction.class);
~~~
DiagAction is also an ActivityDefObserver. This is how an activity is able to be
informed when any of its parameters are modified while it is running. We also
have the usual Logger in play.
Now, the local state:
~~~
private ActivityDef activity;
private int slot;
private long lastUpdate;
private long quantizedInterval;
~~~
ActivityDef and slot number are remembered. lastUpdate and quantizedInterval are
used by DiagAction in order to know when to report.
The basic purpose of diag is to provide a simple ActivityType that can be used
for testing and diagnostics. To do that, its action simply logs the input value
at some configured interval. It also demonstrates a couple of basic nosqlbench
patterns:
1. Sharing work across threads
2. Dynamically adjusting when the activity definition is modified.
A more detailed explanation of diag's behavior goes like this:
A logline for the input value is reported at every configured 'interval', in
milliseconds. The time between the scheduled reporting time and the actual
reporting time is also reported as 'delay'. All threads take a turn reporting
the interval.
In order to support this behavior, when the activity (and its actions) are
initialized, each Action computes a time interval which would put it in the
right place on the reporting schedule. This method is updateReportTime(), as
seen in the constructor:
~~~
public DiagAction(int slot, ActivityDef activity) {
this.activity = activity;
this.slot = slot;
updateReportTime();
}
~~~
Some helper methods make updateReportTime and the math around time offsets easier to read.
~~~
private void updateReportTime() {
lastUpdate = System.currentTimeMillis() - calculateOffset(slot, activity);
quantizedInterval = calculateInterval(activity);
logger.debug("updating report time for slot:" + slot + ", def:" + activity + " to " + quantizedInterval);
}
private long calculateOffset(long timeslot, ActivityDef activity) {
long updateInterval = activity.getParams().getLongOrDefault("interval", 100L);
long offset = calculateInterval(activity) - (updateInterval * timeslot);
return offset;
}
private long calculateInterval(ActivityDef activity) {
long updateInterval = activity.getParams().getLongOrDefault("interval", 100L);
int threads = activity.getThreads();
return updateInterval * threads;
}
~~~
This is where we read the activity def values. For diag, *interval* is a useful
parameter in the activity definition. The default is 100, if unset.
It is true that the code could be optimized more around performance or
terseness, but clarity and correctness are more important here.
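To see how the offsets stagger the threads, the arithmetic from the helper methods can be worked through in isolation. The following standalone sketch is not one of the diag code chunks; the class and method names are hypothetical, and it only restates the math of `calculateOffset` and `calculateInterval`.

```java
// Hypothetical worked example of diag's reporting schedule math.
class DiagScheduleSketch {
    // Milliseconds after construction at which the given slot's first report comes due.
    static long firstReportDelay(int slot, int threads, long interval) {
        long quantizedInterval = interval * threads;          // as in calculateInterval
        long offset = quantizedInterval - (interval * slot);  // as in calculateOffset
        // lastUpdate starts at (now - offset), and a report is due once
        // (time - lastUpdate) exceeds quantizedInterval.
        return quantizedInterval - offset;                    // simplifies to interval * slot
    }
}
```

With `interval=100` and 4 threads, each action reports every 400 ms, and slots 0 through 3 first come due at roughly 0, 100, 200 and 300 ms, so the activity as a whole logs about once per interval.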
### DiagAction implements Action
As an Action, we must accept input:
~~~
@Override
public void accept(long value) {
long now = System.currentTimeMillis();
if ((now - lastUpdate) > quantizedInterval) {
logger.info("diag action, input=" + value + ", report delay=" + ((now - lastUpdate) - quantizedInterval));
lastUpdate += quantizedInterval;
}
}
~~~
This method is called for each input value and throws the value away unless it is
time to report. If it is time to report, we advance lastUpdate by one quantizedInterval.
### DiagAction implements ActivityDefObserver
~~~
@Override
public void onActivityDefUpdate(ActivityDef activity) {
updateReportTime();
}
}
~~~
This is all there is to making an activity react to real-time changes in the activity definition.


@@ -0,0 +1,81 @@
---
title: Async Operations
weight: 35
menu:
main:
parent: Dev Guide
identifier: Async Operations
weight: 12
---
{{< warning >}}
This section is out of date, and will be updated after the next major release
with details on building async activity types.
{{< /warning >}}
## Introduction
In nosqlbench, two types of activities are supported: sync or async. Going forward, the async interface will be
refined and hardened, and then the sync interface will be deprecated. This is simply a matter of simplifying the
API over time, and the async interface is the essential one. If you want synchronous behavior with the async
interface, you can easily achieve that, but not the other way around.
### Configuring Async
In an async activity, you still have multiple threads, but in this case, each thread is allowed to juggle one or
more asynchronous operations. The `async=100` parameter, for example, informs an activity that it needs to allocate
100 total operations over the allocated threads. In the case of `async=100 threads=10`, it is the responsibility
of the ActivityType's action dispenser to configure its actions so that each of them can juggle 10 operations.
{{< note >}}The *async* parameter has a standard meaning in nosqlbench. If it is defined, async is enabled. Its
parameter value is the number of total async operations that can be in flight at any one instant, with the number
per-thread divided as evenly as possible over all the threads.
If the async parameter is defined, but the action implementation does *not* implement the async logic,
then an error is thrown to the user. This is to ensure that users are aware of when they are expecting async
behavior but getting something else.
{{</ note >}}
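One way to divide the total as evenly as possible is sketched below. The class `AsyncShareSketch` is hypothetical, and giving the remainder to the lowest-numbered slots is an assumption made for illustration; the text above does not say which threads receive the extra operations.

```java
// Hypothetical sketch: split a total async limit across thread slots.
class AsyncShareSketch {
    // Number of in-flight operations the given slot may juggle.
    static int opsForSlot(int async, int threads, int slot) {
        int base = async / threads;
        int remainder = async % threads;
        // Assumption: the first `remainder` slots each take one extra operation.
        return base + (slot < remainder ? 1 : 0);
    }
}
```

For `async=100 threads=10` every slot gets 10. For `async=100 threads=8`, slots 0 to 3 get 13 and slots 4 to 7 get 12, which still sums to 100.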
### Async Messaging Flow
The contract between a motor and an action is very basic.
- Each motor submits as many async operations as is allowed to its action, as long as there are
cycles remaining, until the action signals that it is at its limit.
- As long as an action is able to retire an operation by giving a result back to its motor,
the motor keeps providing one more and retiring one more, as long as there are cycles remaining.
- Once there are no more cycles remaining, the motor retires operations from the action until
the action signals that no more are pending.
The basic result of this is that each thread ramps up to its async juggling rate, hovers at that
rate, with a completion rate dependent on the target system, and then ramps down as pending ops
are retired back down to zero.
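The ramp-up, hover, and ramp-down phases can be simulated in a few lines. This is a single-threaded toy model under stated assumptions, not the nosqlbench async API: the class `AsyncFlowSketch` is hypothetical, and operations are retired in submission order here purely for simplicity, whereas real operations complete in whatever order the target system finishes them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of the motor/action flow for one thread.
class AsyncFlowSketch {
    // Runs `cycles` operations with at most `limit` in flight; returns the peak in-flight count.
    static int run(long cycles, int limit) {
        Deque<Long> pending = new ArrayDeque<>();
        int peak = 0;
        long next = 0;
        while (next < cycles) {
            if (pending.size() >= limit) {
                pending.removeFirst();   // at the limit: retire one before submitting another
            }
            pending.addLast(next++);     // submit the next cycle
            peak = Math.max(peak, pending.size());
        }
        while (!pending.isEmpty()) {
            pending.removeFirst();       // no cycles left: retire until nothing is pending
        }
        return peak;
    }
}
```

The in-flight count climbs to the limit, stays there while cycles remain, and then drains to zero.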
### Advanced Signaling
Because of differences in client-side APIs and behavior, and the need to do simple and reliable
flow management in nosqlbench, there are a few details about the API that are worth understanding
as a developer.
- There are multiple return or signaling points in the lifetime of an op context:
1. When an action becomes aware that an operation has completed, it is up to the action to
mark the op context with a call to `opcontext.stop(result)` at that time. This is important,
because operations do not complete in the same order that they are submitted, especially
when other async logic is present. A common way to do this is to register a callback on
a listener, for example.
2. The action must still return this op context back to the motor when it is asked. Thus, it
is a common pattern to keep a linked list of operations that are ready to retire and thus
allow the action to control orderly shutdown of the motor without any guesswork about the
completion state of pending operations.
3. The op context can be a sub-type, if you need to specialize the details that you keep for
an in-flight operation. In fact, async actions *must* implement a basic factory method,
but it can return a simple op context if no specialization is needed.
4. op contexts are recycled to avoid heap pressure for high data rates. This makes it relatively
low-cost to use the specialized op context to hold contextual data that may otherwise be
expensive to _malloc_ and _free_.
### Examples
Developers can refer to the Diag activity type implementation for further examples.


@@ -0,0 +1,61 @@
---
title: Building ActivityTypes
weight: 32
menu:
main:
parent: Dev Guide
identifier: Building ActivityTypes
weight: 12
---
## Requirements
- Java 8
- Maven
## Building new Activity Types
1. Add the nosqlbench API to your project via Maven:
~~~
<dependency>
<groupId>io.nosqlbench</groupId>
<artifactId>nb-api</artifactId>
<version>1.0.17</version>
<type>pom</type>
</dependency>
~~~
2. Implement the ActivityType interface. Use the [Annotated Diag ActivityType] as a reference point as needed.
3. Add your new ActivityType implementation to the nosqlbench classpath.
4. File Issues against the [nosqlbench Project](http://github.com/nosqlbench/nosqlbench/issues) for any doc or API enhancements that you need.
## Working directly on nosqlbench
You can download and locally build nosqlbench. Do this if you want to contribute
or otherwise experiment with the nosqlbench code base.
1. Get the source:
~~~
git clone http://github.com/nosqlbench/nosqlbench.git
~~~
2. Build and install locally:
~~~
pushd nosqlbench # assumes bash
mvn clean install
~~~
This will install the nosqlbench artifacts to your local _~/.m2/repository_.
## Using ActivityTypes
There are a couple of ways you can use your new ActivityTypes with the nosqlbench
runtime. You can mix and match these as needed. The most common way to integrate
your ActivityTypes with the nosqlbench core is with Maven, but the details on
this will vary by environment.


@@ -0,0 +1,69 @@
---
title: Contributing
weight: 32
menu:
main:
parent: Dev Guide
identifier: Contributing
weight: 12
---
## Conduct
It's simple really. Everything in the
[Contributor Covenant](https://www.contributor-covenant.org/version/1/4/code-of-conduct)
applies here. If, after
reading that, you are unclear, then please pick another project to work on. The
maintainers will not hesitate to enforce a code of conduct.
## License
NoSQLBench is licensed under the [Apache License, version
2.0](https://www.apache.org/licenses/LICENSE-2.0). If you wish to contribute
your code to this project, you must be willing to use this license. All code
contributed here is presumed to be licensed as such, and the maintainers
may add licenses to contributed files or add commit-level requirements for
clear licensing headers.
## How to Contribute
### Issue Tracker
There are multiple ways to contribute. The most direct and engaging way is to
file an issue when you have requests for enhancements or bug fixes.
Tickets which may be suitable for newer contributors will be marked as **easy
pick** in the spirit of encouragement.
### Project Site
The project site at nosqlbench.io could use some help as well. This is in a
separate repository adjacent to the main project as 'nosqlbench-docs'.
Consider this as part of the codebase in general. You can file issues against
it, or submit pull requests.
### Pull Request
This project is eager to have contributors. To that end, pull requests which are
in the spirit of the project will be welcome. When pull requests are not
directly accepted, kind and specific explanation of why will be provided. If you
want to contribute, and are not sure about whether your improvements would be
accepted, simply file an issue and describe what you are interested in doing
before coding too much.
### Change Scoping
Like with **easy pick** issues, those which are likely to be more effort will be
marked as **needs design**. The goals of any **needs design** will be to propose
in more detail the moving parts and user-facing ideas which might be too complex
or opaque for a single coding and testing effort. Think of these as *epic*
ideas, which will, by their nature, be required to have some design and usage
documentation submitted before they are merged.
### Project Maturity
As nosqlbench matures, a more stringent set of code management practices will
be adopted. The maintainers are leaning towards the
[Git Flow](https://nvie.com/posts/a-successful-git-branching-model/)
model. A stricter release and branching model *will* be imposed as part of the next
major release.


@@ -0,0 +1,68 @@
---
title: Design Guidelines
weight: 34
menu:
main:
parent: Dev Guide
identifier: Design Guidelines
weight: 12
---
These guidelines are partially aspirational. As the project evolves, attempts will be made to
codify these guidelines and measure them on a per-release basis.
## ActivityType Naming
Each activity type should be named with a single lowercase name that is accurate and stable. Any activity type
implementations submitted to the nosqlbench project may be changed by the project maintainers to ensure this.
## ActivityType Documentation
Each activity type should have a file which provides markdown-formatted documentation for the user. This documentation
should be in a markdown format that is clean for terminal rendering for when users have *only* a terminal to read
with.
The single file should be hosted in the classpath under the name of the activity type with a `.md` extension. For example,
the `tcpclient` activity type has documentation in `tcpclient.md` at the root of the classpath.
This allows for users to run `help tcpclient` to get that documentation.
### ActivityType Parameters
The documentation for an activity type should have an explanation of all the activity parameters that are unique to it.
Examples of each of these should be given. The default values for these parameters should be given. Further, if
there are some common settings that may be useful to users, these should be included in the examples.
### Statement Parameters
The documentation for an activity type should have an explanation of all the statement parameters that are unique to it.
Examples of each of these should be given. The default values for these parameters should be given.
## Parameter Use
Activity parameters *and* statement parameters must combine in intuitive ways.
### Additive Configuration
If there is a configuration element in the activity type which can be modified in multiple ways that are not mutually exclusive, each
modification should be applied additively. Users should never find that, after using multiple parameters that modify the same
configuration element, only the last one was applied.
### Parameter Conflicts
If it is possible for parameters to conflict with each other in a way that would provide an invalid configuration when both are applied,
or in a way that the underlying API would not strictly allow, then these conditions must be detected by the activity type, with
an error thrown to the user explaining the conflict.
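A minimal sketch of such a check is shown here. The class name, the helper method, and the `host`/`hosts` parameter pair are hypothetical and only illustrate the idea; they are not part of the nosqlbench API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the names used here are illustrative only.
class ParamConflictCheck {

    /** Throws if both parameters of a mutually exclusive pair are set. */
    static void requireNotBoth(Map<String, String> params, String a, String b) {
        if (params.containsKey(a) && params.containsKey(b)) {
            throw new IllegalArgumentException(
                "Parameters '" + a + "' and '" + b + "' conflict: set only one of them.");
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("host", "localhost");
        params.put("hosts", "node1,node2");
        try {
            requireNotBoth(params, "host", "hosts");
        } catch (IllegalArgumentException e) {
            // The message names both parameters so the user can fix the conflict.
            System.out.println(e.getMessage());
        }
    }
}
```

Running the check before any configuration is applied keeps the failure early and the message specific.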
### Parameter Diagnostics
Each and every activity parameter that is set on an activity *must* be logged at DEBUG level with the
pattern `ACTIVITY PARAMETER: <activity alias>` included in the log line, so that the user may verify applied parameter settings.
Further, an explanation for what this parameter does to the specific activity *should* be included in a following log line.
Each and every statement parameter that is set on a statement *must* be logged at DEBUG level with the
pattern `STATEMENT PARAMETER: <statement name>: ` included in the log line, so that the user may verify applied statement settings.
Further, an explanation for what this parameter does to the specific statement *should* be included in a following log line.
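The log line patterns above could be produced along these lines. This is only a sketch: the class and method names are hypothetical, the `name=value` suffix is an assumed layout, and `java.util.logging` is used purely to keep the example self-contained, with `FINE` standing in for DEBUG.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: only the two prefixes come from the guideline above.
class ParamDiagnostics {
    private static final Logger logger = Logger.getLogger("ParamDiagnostics");

    /** Builds a line starting with "ACTIVITY PARAMETER: <activity alias>". */
    static String activityParamLine(String alias, String name, String value) {
        return "ACTIVITY PARAMETER: " + alias + ": " + name + "=" + value;
    }

    /** Builds a line starting with "STATEMENT PARAMETER: <statement name>: ". */
    static String statementParamLine(String statementName, String name, String value) {
        return "STATEMENT PARAMETER: " + statementName + ": " + name + "=" + value;
    }

    public static void main(String[] args) {
        // java.util.logging has no DEBUG level; FINE is the closest counterpart.
        // With the default configuration, FINE messages are not printed.
        logger.log(Level.FINE, activityParamLine("myactivity", "threads", "10"));
        logger.log(Level.FINE, statementParamLine("insert-row", "prepared", "true"));
    }
}
```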

@@ -0,0 +1,20 @@
---
title: Error Mapping
weight: 36
menu:
main:
parent: Dev Guide
identifier: Error Mapping
weight: 13
---
Each activity type should provide its own mapping between thrown errors and the error codes assigned to them.
This is facilitated by the `ErrorMapper` interface. It simply provides a way to initialize a cache-friendly view
of classes which are known exception types to a stable numbering of error codes.
By providing an error mapper for your activity type, you are enabling advanced testing scenarios that deal with
error routing and advanced error handling.
If no error mapper is installed in the ActivityType implementation, then a default one is provided which simply
maps all errors to _unknown_.
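The idea of a stable mapping from exception classes to error codes can be sketched as follows. This is not the `ErrorMapper` interface itself, whose signature is not shown in this guide; the class name, the code values, and the use of `0` for _unknown_ are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; this is not the nosqlbench ErrorMapper interface.
class ErrorCodeTable {
    static final int UNKNOWN = 0; // assumed code for unmapped errors

    private final Map<Class<? extends Throwable>, Integer> codes = new LinkedHashMap<>();

    /** Registers a stable code for a known exception type. */
    ErrorCodeTable map(Class<? extends Throwable> type, int code) {
        codes.put(type, code);
        return this;
    }

    /** Looks up the exact class first, then walks up the superclass chain. */
    int codeFor(Throwable t) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            Integer code = codes.get(c);
            if (code != null) {
                return code;
            }
        }
        return UNKNOWN;
    }

    public static void main(String[] args) {
        ErrorCodeTable table = new ErrorCodeTable()
            .map(IllegalArgumentException.class, 1)
            .map(RuntimeException.class, 2);
        System.out.println(table.codeFor(new IllegalStateException("demo")));
    }
}
```

Walking the superclass chain means a subclass of a mapped exception inherits its parent's code unless it has its own entry.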

@@ -0,0 +1,83 @@
---
title: Project Structure
weight: 32
menu:
main:
parent: Dev Guide
identifier: Project Structure
weight: 12
---
nosqlbench is packaged as a
[Maven Reactor](https://maven.apache.org/guides/mini/guide-multiple-modules.html) project.
## Defaults and Dependencies
Maven reactor projects often confuse developers. In this document, we'll explain
the basic structure of the nosqlbench project and the reasons for it.
Firstly, there is a parent for each of the modules. In Maven parlance, you can
think of a parent project as a template for projects that reference it. One of
the reasons you would do this is to simplify common build or dependency settings
across many maven projects or modules. That is exactly why we do that here. The
'parent' project for all nosqlbench modules is aptly named 'project-defaults',
as that is exactly what we use it for.
As well, there is a "root" project, which is simply the project at the project's
base directory. It pulls in the modules of the project explicitly as in:
~~~
<modules>
<module>project-defaults</module> <!-- Holds project level defaults -->
<module>nb-api</module> <!-- APIs -->
...
</modules>
~~~
This means that when you build the root project, it will build all the modules
included, but only after linearizing the build order around the inter-module
dependencies. This is an important detail, as it is often overlooked that this
is the purpose of a reactor-style project.
The dependencies between the modules are not implicit. Each module listed in the
root pom.xml has its own explicit dependencies to other modules in the project.
We could cause them to have a common set of dependencies by adding those
dependencies to the 'project-defaults' module, but this would mostly prevent us
from making the dependencies for each as lean and specific as we like. That is
why the dependencies in the project-defaults **parent** module are empty.
The project-defaults module does, however, have some build, locale, and project
identity settings. You can consider these cross-cutting aspects of the modules
in the project. If you want to put something in the project-defaults module, and
it is not strictly cross-cutting across the other modules, then don't. That's
how you keep things sane.
To be clear, cross-cutting build behavior and per-module dependencies are two
separate axes of build management. Try to keep this in mind when thinking about
modular projects and it will help you stay sane. Violating this basic rule is
one of the most common mistakes that newer Maven users make when trying to
enable modularity.
## Intermodule Dependencies
![Project Structure](../../static/diagrams/project_structure.png)
Modularity at runtime is enabled via the
[ServiceLoader](https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html) API.
The nb-core module uses the nb-api module to know the loadable activity types.
ActivityType implementations use the nb-api module to implement the loadable
activity types. In this way, they both depend on the nb-api module to provide
the common types needed for this to work.
The nb-runtime module allows the separate implementations of the core and the
activity type implementations to exist together in the same classpath. This goes
hand-in-hand with how the runtime jar is bundled. Said differently, the artifact
produced by nb-runtime is a bundling of the things it depends on as a single
application. nb-runtime consolidates dependencies and provides a proper place to
do integration testing.
Taking the API at the bottom, and the components that can be composed together
at the middle, and the bundling project at the top, you'll see a not-uncommon
project structure that looks like a diamond. Going from bottom to top, you can
think of it as API, implementation, and packaging.

@@ -0,0 +1,33 @@
---
title: Scripting Extensions
weight: 32
menu:
main:
parent: Dev Guide
identifier: Scripting Extensions
weight: 12
---
## Requirements
- Java 8
- Maven dependency:
## Scripting Extensions
When a new scripting environment is initialized in nosqlbench, a new instance of each scripting extension is published into it as a variable. This variable acts as a named service endpoint within the scripting environment. For example, an extension for saving a JSON map to disk could be published into the scripting environment as "savejson", and you might invoke it as "savejson.save('somefile.json',myjson);".
## Loading Scripting Extensions
In order to share these with the nosqlbench runtime, the ServiceLoader API is used. The scripting environment will load every plugin implementing the SandboxPluginData interface, as long as it has the proper data in META-INF/services/ in the classpath. There are examples of how to do this via Maven in the source repo under the nb-extensions module.
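As a rough illustration of this discovery mechanism, the sketch below uses `java.util.ServiceLoader` with a hypothetical `ScriptExtension` interface. It stands in for the real plugin interface, whose methods are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch of ServiceLoader-based discovery.
class ExtensionDiscovery {

    /** Hypothetical plugin contract; not the real nosqlbench interface. */
    interface ScriptExtension {
        String name();
    }

    /** Returns the names of all providers registered under META-INF/services/. */
    static List<String> discoverNames() {
        List<String> names = new ArrayList<>();
        for (ScriptExtension ext : ServiceLoader.load(ScriptExtension.class)) {
            names.add(ext.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // With no provider file on the classpath, the list is empty.
        System.out.println(discoverNames());
    }
}
```

A provider becomes visible once a file in META-INF/services/, named after the binary name of the service interface, lists the implementing class.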
## Maven Dependencies
~~~
<dependency>
<groupId>io.nosqlbench</groupId>
<artifactId>nb-api</artifactId>
<version>1.0.17</version>
<type>pom</type>
</dependency>
~~~

@@ -0,0 +1,58 @@
---
title: YAML Config API
weight: 12
menu:
main:
parent: Dev Guide
identifier: configfiles-api
weight: 22
---
In the nosqlbench 2.* and newer versions, a standard YAML configuration format
is provided that makes it easy to use for any activity that requires statements,
tags, parameters and data bindings. This section describes how to use it as a
developer. Developers should already be familiar with the user guide for the
YAML config format.
## Simple Usage
StrInterpolater interpolator = new StrInterpolater(activityDef);
String yaml_loc = activityDef.getParams().getOptionalString("yaml").orElse("default");
StmtsDocList sdl = StatementsLoader.load(logger, yaml_loc, interpolator, "activities");
This loads the yaml at file path *yaml_loc*, while transforming template variables
with the interpolator, searching in the current directory and in the "activities"
subdirectory, and logging all diagnostics.
What you do next depends on the activity type. Typically, an activity will instantiate
a SequencePlanner to establish an operation ordering. See the *stdout* activity type
for an example of this.
## Implementation Notes
The getter methods on this API are intended to provide statements. Thus, all
access to bindings, params, or tags is provided via the StmtDef type.
It is possible to get these as aggregations at the block or doc level for activity
types that can make meaningful use of these as aggregation points. However,
it is usually sufficient to simply access the StmtDef iterator methods, as all
binding, tag, and param values are templated and overridden automatically for you
within the API.
## On Bindings Usage
It is important to not instantiate or call bindings that are not expected to be
used by the user. This means that your statement form should use named anchors
for each and every binding that will be activated, *or* a clear contract with
the user should be expressed in the documentation for how bindings will be
resolved to statements.
## Named Anchors
The format of named anchors varies by activity type. There are some conventions
that can be used in order to maintain a more uniform user experience:
- String interpolation should use single curly braces when there are no local
conventions.
- Named anchors in prepared statements or other DB activity types should simply
add a name to the existing place holder, to be filtered out by the activity type
before being passed to the lower level driver.

@@ -0,0 +1,24 @@
### Activity Inputs
Each activity has an input that controls which cycles it will run.
By default, the input is a simple interval that dispatches
cycle ranges to activity threads, starting at some cycle number
and ending at another. Examples of this type of input include:
# cycle 0 through cycle 4, 5 cycles total
cycles=0..5
# cycle 0 through cycle 9999999
# When the interval start is left off, "0.." is assumed.
cycles=10M
# cycle 1000000000 through cycle 4999999999
cycles=1000M..5000M
However, there are other ways to feed an activity. All inputs are
modular within the nosqlbench runtime. To see what inputs are
available, you can simply run:
PROG --list-input-types
Any input listed this way should have its own documentation.
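To make the interval semantics of the `cycles` examples concrete, here is a small sketch of how such a value could be read as a closed-open range. The class is hypothetical and is not the nosqlbench parser; only plain numbers and the `M` suffix shown in the examples are handled.

```java
// Hypothetical sketch; not the nosqlbench implementation.
class CycleInterval {
    final long start; // first cycle, inclusive
    final long end;   // end of the interval, exclusive

    CycleInterval(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** Parses a count such as "5" or "10M" (M is read as one million). */
    static long parseCount(String s) {
        s = s.trim();
        if (s.endsWith("M")) {
            return Long.parseLong(s.substring(0, s.length() - 1)) * 1_000_000L;
        }
        return Long.parseLong(s);
    }

    /** Parses "start..end", or "end" alone, in which case the start is 0. */
    static CycleInterval parse(String spec) {
        int sep = spec.indexOf("..");
        if (sep < 0) {
            return new CycleInterval(0, parseCount(spec));
        }
        return new CycleInterval(
            parseCount(spec.substring(0, sep)),
            parseCount(spec.substring(sep + 2)));
    }

    public static void main(String[] args) {
        CycleInterval i = parse("0..5");
        // The last cycle actually run is end - 1.
        System.out.println(i.start + " through " + (i.end - 1));
    }
}
```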

@@ -0,0 +1,64 @@
## cycle logging and re-use
### cycle_log format
This file is a binary format that encodes ranges of cycles
in RLE interval form. This means that it will be relatively compact
for scenarios that have many repeats of the same result, as well
as low-overhead when running scenarios.
All cycle logfiles have the *.cyclelog* suffix.
### export cycle_log to text format
You can dump a cycle log to the screen to see the content in text form
by running a command like this:
PROG --export-cycle-log <filename> [spans|cycles]
You do not need to specify the extension. If you do not specify either
optional format at the end, then *spans* is assumed. The *cycles* format prints output like this:
0->3
1->3
...
Alternately, you can see the individual RLE spans with the *spans* format, which looks
like this:
[0,100)->3
...
This format uses the '[x,y)' notation to remind you that the spans are all closed-open
intervals, including the starting cycle number but not the ending one.
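The relationship between the two text formats can be sketched as follows: a single span line expands into one cycle line per cycle in its closed-open interval. The class below is a hypothetical illustration and is not part of the nosqlbench tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the nosqlbench importer or exporter.
class SpanExpander {
    // Matches lines such as "[0,100)->3".
    private static final Pattern SPAN = Pattern.compile("\\[(\\d+),(\\d+)\\)->(\\d+)");

    /** Expands one span line into per-cycle lines of the form "cycle->result". */
    static List<String> expand(String spanLine) {
        Matcher m = SPAN.matcher(spanLine.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a span line: " + spanLine);
        }
        long start = Long.parseLong(m.group(1));
        long end = Long.parseLong(m.group(2));
        String result = m.group(3);
        List<String> lines = new ArrayList<>();
        // The start is included and the end is excluded.
        for (long cycle = start; cycle < end; cycle++) {
            lines.add(cycle + "->" + result);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : expand("[0,3)->3")) {
            System.out.println(line);
        }
    }
}
```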
### importing text format to a cycle_log
If you need to modify and then re-use a cycle log, you can do this with simple text tools.
Once you have modified the file, you can import it back to the native format with:
PROG --import-cycle-log <infile.txt> <outfile.cyclelog>
The importer recognizes both formats listed above.
### Using cycle logs as outputs
When you want an activity to record its per-cycle result for
later use, you can specify a cycle log as its output. This is configured as:
... output=type:cyclelog,file:somefile ...
If you do not specify the file parameter, then the alias of the activity is used.
### Using cycle logs as inputs
You can have all the cycles in a cycle log as the input cycles of an activity like this:
... input=type:cyclelog,file:somefile ...
Note that when you use cycle logs as inputs, not all cycles are guaranteed to be
in order. In most cases, they will be, due to reordering support on RLE encoding. However,
that uses a sliding-window buffer, and in some cases RLE spans can occur out of
order in a cycle log.
If you do not specify the file parameter, then the alias of the activity is used.

@@ -0,0 +1,60 @@
## docker metrics
### summary
Enlist nosqlbench to stand up your metrics infrastructure using a local docker runtime:
--docker-metrics
When this option is set, nosqlbench will start graphite, prometheus, and grafana automatically
on your local docker, configure them to work together, and point nosqlbench to send metrics
to the system automatically. It also imports a base dashboard for nosqlbench and configures grafana
snapshot export to share with a central DataStax grafana instance (grafana can be found on localhost:3000
with the default credentials admin/admin).
### details
If you want to know exactly what nosqlbench is doing, it's the equivalent of running the following by hand:
#### pull and run the graphite-exporter container
docker run -d -p 9108:9108 -p 9109:9109 -p 9109:9109/udp prom/graphite-exporter
#### prometheus config
place prometheus config in .prometheus:
prometheus.yml (found in resources/docker/prometheus/prometheus.yml)
#### pull and run the prometheus container
docker run -d -p 9090:9090 -v '<USER HOME>/.prometheus:/etc/prometheus' prom/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus --storage.tsdb.retention=183d --web.enable-lifecycle
#### pull and run the grafana container
docker run -d -p 3000:3000 -v grafana/grafana
with the following environment variables:
GF_SECURITY_ADMIN_PASSWORD=admin
GF_AUTH_ANONYMOUS_ENABLED="true"
GF_SNAPSHOTS_EXTERNAL_SNAPSHOT_URL=http://54.165.144.56:3001
GF_SNAPSHOTS_EXTERNAL_SNAPSHOT_NAME="Send to Wei"
#### configure grafana
use the grafana api to set up the datasource and dashboard
POST
http://localhost:3000/api/dashboards/db
Payload:
analysis.json (found in resources/docker/dashboards/analysis.json)
POST
http://localhost:3000/api/datasources
Payload:
prometheus-datasource.yaml (found in resources/docker/datasources/prometheus-datasource.yaml)

@@ -0,0 +1,83 @@
# Optimo Extension (BOBYQA)
Optimo is the name of the scripting extension that allows nosqlbench
scenarios to take advantage of the BOBYQA optimization algorithm.
## Usage
To instantiate a new instance of optimo, call the extension object
`optimos` in this way:
```
var optimo1 = optimos.init();
```
You work with the instance directly. The extension object `optimos` is
there only as a dispenser of new optimo instances, nothing more.
To add a parameter to the optimo instance, use the `param` method like
this:
```
optimo1.param('pressure', 1, 500);
optimo1.param('temperature', 275, 307);
```
This adds a `pressure` parameter to the algorithm with a range between
1 and 500, inclusive, and likewise a `temperature` parameter with a range
between 275 and 307. This means that optimo will provide parameters of
these names with values in those ranges for each evaluation.
You should also set some initial parameters, which are key settings for
the BOBYQA optimizer. Read more about BOBYQA to understand what these
settings mean.
```
optimo1
.setInitialRadius(10000.0) // The initial trust radius
.setStoppingRadius(0.001) // The stopping condition (trust radius)
.setMaxEval(100); // Maximum number of iterations
```
Finally, you need to give optimo an objective function. The construction
of the objective function is the most important detail for using optimo
effectively. The function signature is simply `function(params)`, where
params is a map containing the named parameters provided by BOBYQA. This
makes it easy to integrate and extend the algorithm with varying
parameters as you like.
For example, you can add a function with this syntax:
```
optimo1.setObjectiveFunction(function(params) {
return params.temperature * params.pressure
});
```
In practice, this function will look much more like this:
```
optimo1.setObjectiveFunction(function(params) {
provide_user_feedback_about_params(params);
var result=run_test_with_params(params);
var value = calculate_objective_value_of_result(result);
provide_user_feedback_about_result_and_value(result,value);
return value;
});
```
This schematic shows the two-phase aspect of combining the parameters
for testing with an actual test, and then taking the results of that
test and scoring it so that the algorithm knows which way to steer its
search pattern.
Once you have configured an optimo instance, you can put it in control
of the scenario with a call like this:
```
var result = optimo1.optimize();
```
The result that it provides is an instance of
io.nosqlbench.extensions.optimizers.MVResult, which provides
`.getVarArray()`, `.getVarMap()`, and a useful `.toString()` method.

@@ -0,0 +1,21 @@
# Setting threads
Threads may be set in a few different ways depending on the type of
testing you are doing.
Sometimes, you need the client runtime to emulate a threading model of
an application. Other times you may want the client to go as fast as it
can regardless of the threading model. The difference between these
varies significantly depending on whether you are using asynchronous
messaging or not.
Some valid forms for setting threads include:
- threads=auto
- Sets the thread count to 10x the number of CPUs
- This does not consider hyper-threading
- threads=2x
- Sets the thread count to 2x the number of CPUs
- This does not consider hyper-threading
- threads=10
- Simply sets the thread count to 10
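The three forms listed above can be resolved to a concrete thread count along these lines. This is a sketch under stated assumptions: the class and method are hypothetical, and only the `auto`, `<n>x`, and plain-number forms from the list are handled.

```java
// Hypothetical sketch; not the nosqlbench implementation.
class ThreadSpec {

    /** Resolves a threads setting against a given CPU count. */
    static int resolve(String spec, int cpus) {
        String s = spec.trim().toLowerCase();
        if (s.equals("auto")) {
            return 10 * cpus; // auto: 10x the number of CPUs
        }
        if (s.endsWith("x")) {
            // "<n>x": a multiple of the number of CPUs
            return Integer.parseInt(s.substring(0, s.length() - 1)) * cpus;
        }
        return Integer.parseInt(s); // a plain number is used as-is
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("threads=auto resolves to " + resolve("auto", cpus));
    }
}
```

Passing the CPU count as an argument, rather than reading it inside the method, keeps the resolution easy to check on any machine.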

@@ -0,0 +1,22 @@
## Help Topics
### Built-in Component Docs
Generally, all named activity types, input types, output types, etc.
have their own documentation. You can access those with a command like:
PROG help diag
### Advanced Topics
For any of the topics listed here, you can get detailed help by
running PROG help <topic>.
- topics
- commandline
- cli_scripting
- activity_inputs
- activity_outputs
- cycle_log