mirror of
https://github.com/nosqlbench/nosqlbench.git
synced 2025-02-25 18:55:28 -06:00
update uniform workload specification to current terms and structure
@@ -1,133 +1,95 @@
|
||||
# Workload Definition
|
||||
# Workload Specification
|
||||
|
||||
This directory contains the design for a standard and extensible way of loading workload definitions
|
||||
into a NoSQLBench activity.
|
||||
This directory contains the testable specification for workload definitions used by NoSQLBench.
|
||||
|
||||
It is highly recommended that you familiarize yourself with the details below the specifications if
|
||||
you have not already. For the purposes of simplicity, all the user-facing language is called *
|
||||
templating*, and all developer-facing functionality is called *API*.
|
||||
## Op Templates vs Developer API
|
||||
There are two primary views of workload definitions that we care about:
|
||||
|
||||
1. The User View of **op templates**
|
||||
1. Op templates are simply the schematic recipes for building an operation.
|
||||
2. Op templates are provided by users in YAML or JSON or even directly via runtime API.
|
||||
3. Op templates can be provided with optional metadata which serves to label, group or
|
||||
otherwise make the individual op templates more manageable.
|
||||
4. A variety of forms are supported which are self-evident, but which allow users to have
|
||||
some flexibility in how they structure their YAML, JSON, or runtime collections.
|
||||
2. The Developer View of the ParsedOp API -- All op templates, regardless of the form they are
|
||||
provided in, are processed into a normalized internal data structure.
|
||||
1. The detailed documentation for the ParsedOp API is in javadoc.
|
||||
|
||||
The documentation in this directory serves as a testable specification for all the above. It
|
||||
shows specific examples of all the valid op template forms in both YAML and JSON, as well as how
|
||||
the data is normalized to feed the developer's view of the ParsedOp API.
|
||||
|
||||
If you are a new user, it is recommended that you read the basic docs first before delving into
|
||||
these specification-level docs too much. The intro docs show normative and simple ways to
|
||||
specify workloads without worrying too much about all the possible forms.
|
||||
|
||||
## Templating Language
|
||||
|
||||
When users want to specify a set of operations to perform, they do so with the workload templating
|
||||
format.
|
||||
format, which includes document level details, block level details, and op level details.
|
||||
Specific reserved words like `block` or `ops` are used in tandem with nesting structure to
|
||||
define all valid workload constructions. Because of this, workload definitions are
|
||||
essentially data structures comprised of basic collection types and primitive values. Any on-disk
|
||||
format which can be loaded as such can be a valid source of workload definitions.
|
||||
|
||||
- [Templated Workloads](templated_workloads.md) - Basics of workloads templating
|
||||
- [Templated Operations](templated_operations.md) - Details of op templating
|
||||
- [SpecTest Formatting](spectest_formatting.md) - A primer on the example formats used here
|
||||
- [Workload Structure](workload_structure.md) - Overall workload structure, keywords, nesting
|
||||
features
|
||||
- [Op Template Basics](op_template_basics.md) - Basic Details of op templating
|
||||
- [Op Template Variations](op_template_variations.md) - Additional op template variants
|
||||
and corner cases
|
||||
- [Template Variables](template_variables.md) - Textual macros and default values
|
||||
|
||||
## Workload API
|
||||
## ParsedOp API
|
||||
|
||||
After a workload template is loaded into an activity, it is presented to the driver in an API which
|
||||
is suitable for building executable ops in the native driver.
|
||||
|
||||
- [Command API](command_api.md) - Defines the API which developers see after a workload is fully
|
||||
- [ParsedOp API](parsed_op_api.md) - Defines the API which developers see after a workload is fully
|
||||
loaded.
|
||||
|
||||
## Workload Templating Format
|
||||
|
||||
The first half of this specification is focused solely on the schematic form that users provide to
|
||||
NoSQLBench in order to describe a set of operations to perform.
|
||||
|
||||
It covers both the user-facing schematic of what can be specified and the driver-facing API which allows
|
||||
for a variety of driver implementations to be supported.
|
||||
|
||||
## Background Work
|
||||
## Related Reading
|
||||
|
||||
If you want to understand the rest of this document, it is crucial that you have a working knowledge
|
||||
of the standard YAML format and several examples from the current drivers. You can learn this from
|
||||
the main documentation which demonstrates step-by-step how to build a workload. Reading further in
|
||||
this document will be most useful for core NB developers, but we are happy to take input from
|
||||
anybody.
|
||||
|
||||
## Overview of Workload Mapping
|
||||
|
||||
The purpose of this effort is to thoroughly standardize the concepts and machinery of how operations
|
||||
are mapped from user configuration to operations in flight. In the past, much of the glue logic
|
||||
between YAML and an operation has been left to the NoSQLBench ActivityType -- the high-level driver
|
||||
which acts as a binding layer between vendor APIs and the NoSQLBench core machinery.
|
||||
|
||||
Now that there are several drivers implemented, each with their own minor variations in how YAML
|
||||
could be interpreted, it's time to take stock of the common path and codify it. The expected outcomes
|
||||
of this effort are several:
|
||||
|
||||
- NoSQLBench drivers (AKA ActivityTypes) get much easier to implement.
|
||||
- Standard NB driver features are either supported uniformly, or not at all.
|
||||
- The semantics of op templates and workload configuration are much more clearly specified and
|
||||
demonstrated for users.
|
||||
- All API surface area for this facet of NB can be tested in a very tangible way, with docs and
|
||||
testing sharing one and the same examples.
|
||||
|
||||
While these desires are enough alone to warrant an improvement, they are also key to simplifying the
|
||||
design of two new drivers which are in the works: gRPC and the Apache Cassandra Java driver version 4.
|
||||
The gRPC driver will need to have a very clearly specified logical boundary on the NB side to
|
||||
keep the combined system simple enough to explain and maintain.
|
||||
this document will be most useful for core NB developers, or advanced users who want to know all
|
||||
the possible ways of building workloads.
|
||||
|
||||
## Op Mapping Stages
|
||||
|
||||
As a workload definition is read and mapped into the form of an executable activity in the NB
|
||||
process, it takes on different forms. Each stage can be thought of as a more refined view or API
|
||||
through which the workload can be seen. At each stage, specific processing is required to promote
|
||||
the more generic form into a more specialized and consumable form by the next layer.
|
||||
The process of loading a workload definition occurs in several discrete steps during a NoSQLBench
|
||||
session:
|
||||
|
||||
It should be noted that mapping workload definitions to operations is not something that needs to be
|
||||
done quickly. Instead, it is more important to focus on user experience factors, such as
|
||||
1. The workload file is loaded.
|
||||
2. Template variables are interpolated.
|
||||
3. The file is deserialized from its native form into a raw data structure.
|
||||
4. The raw data structure is transformed into a normalized data structure according to the Op
|
||||
Template normalization rules.
|
||||
5. The data is provided to the ParsedOp API for use by the developer.
|
||||
6. The DriverAdapter is loaded which understands the op fields provided in the op template.
|
||||
7. The DriverAdapter uses its documented rules to determine which types of native driver operations
|
||||
each op template is intended to represent. This is called **Op Mapping**.
|
||||
8. The DriverAdapter uses the identified types to create dispensers of native driver operations.
|
||||
This is called **Op Dispensing**.
|
||||
9. The op dispensers are arranged into an indexed bank of op sources according to the specified
|
||||
ratios and/or sequencing strategy. From this point on, NoSQLBench has the ability to
|
||||
construct an operation for any given cycle at high speed.
|
||||
|
||||
These specifications are focused on steps 2-5. The DriverAdapter focuses on the developer's use of
|
||||
the ParsedOp API, and as such is documented in javadoc primarily. Some details on the ParsedOp
|
||||
API are shared here for basic awareness, but developers should look to the javadoc for the full
|
||||
story.
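Step 9 in the list above can be pictured with a small sketch. The Python below is not NoSQLBench code: `build_op_bank`, `op_for_cycle`, and the string-returning dispensers are hypothetical names used only for illustration, and the simple concatenating order is an assumed stand-in for whatever sequencing strategy is configured. It shows how a ratio-expanded, indexed bank lets an op be selected for any cycle with a single modulo lookup.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an op dispenser: maps a cycle number to an op.
Dispenser = Callable[[int], str]


def build_op_bank(dispensers: Dict[str, Dispenser], ratios: Dict[str, int]) -> List[Dispenser]:
    """Expand each dispenser by its ratio into a flat, indexed bank.

    The concatenating order used here is an assumption of this sketch,
    not a statement about the sequencing strategies NoSQLBench provides.
    """
    bank: List[Dispenser] = []
    for name, dispenser in dispensers.items():
        bank.extend([dispenser] * ratios.get(name, 1))
    return bank


def op_for_cycle(bank: List[Dispenser], cycle: int) -> str:
    """Pick the dispenser by cycle modulo bank size, then build the op."""
    return bank[cycle % len(bank)](cycle)


dispensers = {
    "read": lambda cycle: f"read(key={cycle})",
    "write": lambda cycle: f"write(key={cycle})",
}
bank = build_op_bank(dispensers, {"read": 2, "write": 1})
ops = [op_for_cycle(bank, cycle) for cycle in range(4)]
```

All of the expensive decisions are made once, while the bank is built; the per-cycle path is only an index and a call.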
|
||||
|
||||
## Mapping vs Running
|
||||
|
||||
It should be noted that the Op Mapping stage, where user intentions are mapped from op templates to
|
||||
native operations, is not something that needs to be done quickly. This occurs at
|
||||
_initialization_ time. Instead, it is more important to focus on user experience factors, such as
|
||||
flexibility, obviousness, robustness, correctness, and so on. Thus, priority of design factors in
|
||||
this part of NB is placed more on clear and purposeful abstractions and less on optimizing for
|
||||
speed. The clarity and detail which is conveyed by this layer to the driver developer will then
|
||||
enable them to focus on building fast and correct op dispensers, which are built before the main
|
||||
part of running a workload, but which are used at high speed while the workload is running.
|
||||
|
||||
## Stored Form
|
||||
|
||||
Presently this is YAML, but it could be any format.
|
||||
|
||||
Each stored form requires a loader which can map its supported formats into a raw data structure
|
||||
explained below.
|
||||
|
||||
A Workload Loader is nothing more than a reader which can read a specific format into a data
|
||||
structure.
|
||||
|
||||
## Workload Template
|
||||
|
||||
**Workload templates are presented to NoSQLBench as standardized data structures.**
|
||||
|
||||
This is a data structure in basic object form. It is merely the most obvious and direct in-memory
|
||||
representation of the contents of the stored form. In Java, this looks like basic collections and
|
||||
primitive types, such as Lists, Maps, Strings, and so on. The raw data structure form should always
|
||||
be the most commodity type of representation for the target language.
|
||||
|
||||
The workload template is meant to be a runtime model which can be specified and presented to the
|
||||
scenario in multiple ways. As such, scripting layers and similar integrations can build such data
|
||||
structures programmatically, and provide them to the runtime directly. So long as the programmer is
|
||||
aware of what is valid, providing a workload template as a data structure should have the same
|
||||
effect as providing one from a YAML or JSON file.
|
||||
|
||||
In this way, the NB workload data structure acts as a de-facto API of sorts, although it has no
|
||||
methods or functions. It is simply a commodity representation of a workload template. As such, the
|
||||
NoSQLBench runtime must provide clear feedback to the user when invalid constructions are given.
|
||||
|
||||
What is valid, what is not, and what each possible construction means must be codified in a clear and
|
||||
complete standard. The valid elements of a workload template are documented in
|
||||
[workload_templates.md](workload_templates.md), which serves as both an explainer and a living
|
||||
specification. The contents of this file are tested directly by NoSQLBench builds.
|
||||
|
||||
## Workload API
|
||||
|
||||
The workload template provides some layering possibilities which are applied automatically for the
|
||||
user by the workload API. Specifically, any bindings, params, or tags which are defined by name in
|
||||
an outer scope of the structure are automatically used by operations which do not define their own
|
||||
element of the same type and name. This happens at three levels:
|
||||
document scope, block scope, and op scope. More details on this are in the
|
||||
*designing workload* guide.
|
||||
|
||||
Since the workload template is meant to enable layered defaults, there is a logical difference
|
||||
between the minimally-specified version of a workload and that seen by a driver. Drivers access the
|
||||
workload through the lens of the workload API, which is responsible for layering in the settings
|
||||
applied to each op template.
|
||||
|
||||
This form of the workload is called the **rendered workload**, and is presented to the driver as an
|
||||
accessible object model.
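The layering described above can be sketched in a few lines. This is an illustrative Python model, not the workload API itself: `render_workload` is a hypothetical name, ops are assumed to be given in map form inside named blocks, and the binding values are placeholder strings rather than real binding functions. For each op, the document, block, and op scopes are merged in that order, so the innermost definition of a given name wins.

```python
from typing import Any, Dict, List

# Element types which the text says are layered by name.
LAYERED_KEYS = ("bindings", "params", "tags")


def render_workload(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one rendered view per op, with outer-scope defaults applied."""
    rendered = []
    for block_name, block in doc.get("blocks", {}).items():
        for op_name, op in block.get("ops", {}).items():
            view: Dict[str, Any] = {"name": op_name, "block": block_name}
            for key in LAYERED_KEYS:
                merged: Dict[str, Any] = {}
                # Later scopes override earlier ones: document, then block, then op.
                for scope in (doc, block, op):
                    merged.update(scope.get(key, {}))
                view[key] = merged
            rendered.append(view)
    return rendered


workload = {
    "bindings": {"user_id": "from-document", "ts": "from-document"},
    "blocks": {
        "main": {
            "bindings": {"ts": "from-block"},
            "ops": {
                "insert": {"params": {"cl": "ONE"}},
                "select": {"bindings": {"user_id": "from-op"}},
            },
        }
    },
}
rendered = render_workload(workload)
```

Here `insert` inherits `user_id` from the document and `ts` from its block, while `select` overrides `user_id` at op scope.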
|
||||
|
||||
|
||||
enable them to focus on building fast and correct op dispensers. These dispensers are also
|
||||
constructed before the workload starts running, but are used at high speed while the workload
|
||||
is running.
|
||||
|
||||
@@ -1,30 +1,15 @@
|
||||
# Op Templates
|
||||
|
||||
The rules around op templates deserve a separate section, given that there are many techniques that
|
||||
a user can choose from.
|
||||
Op templates are the recipes provided by users for an operation. These hold examples of payload
|
||||
data, metadata that configures the driver, timeout settings and so on.
|
||||
|
||||
The valid elements of the raw workload form are explained below, using YAML and JSON5 as a schematic
|
||||
language. This guide is not meant to be very explanatory for new users, but it can serve as a handy
|
||||
reference about how workloads can be structured.
|
||||
The field name used in workload templates to represent operations can often be symbolic to users.
|
||||
For this reason, several names are allowed: ops, op, operations, statements, statement. It doesn't
|
||||
matter whether the value is provided as a map, list, or scalar. These all allow for the same
|
||||
level of templating. Map forms are preferred, since they include naming in a more streamlined
|
||||
structure. When you use list form, you have to provide the name as a separate field.
|
||||
|
||||
Any bundled workload loader should test all of these fenced code blocks and confirm that the data
|
||||
structures are logically equivalent, using any json5 blocks as a trigger to compare against the
|
||||
prior block.
|
||||
**This document is a testable specification.**
|
||||
|
||||
While some of the examples below appear to be demonstrating basic cross-format encoding, there is
|
||||
more going on. This document captures a set of basic sanity rules for all raw workload data, as well
|
||||
as visual examples and narratives to assist users and maintainers. In short, if you don't see what
|
||||
you want to do here, it is probably not valid in the format, and if you know that to be false, then
|
||||
this document needs to be updated with more details!
|
||||
|
||||
The field used in workload templates to represent an operation can often be symbolic to users. For
|
||||
this reason, several names are allowed: ops, op, operations, statements, statement. It doesn't
|
||||
matter whether the value is provided as a map, list, or scalar. These all do the same thing,
|
||||
although an error is thrown if you specify more than one. The interpretation is always the same: An
|
||||
ordered collection of op templates. In map forms, the key is the op name. In forms which contain no
|
||||
provided name (as a key or as a property of an element map), a name is automatically provided by the
|
||||
API.
|
||||
A name is automatically provided by the API when there is one missing.
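The rules above can be summarized as a small normalization routine. The following Python is a sketch under stated assumptions, not the actual loader: `normalize_ops` is a hypothetical name, the generated names (`op1`, `op2`, ...) are invented for illustration and may differ from what the API really assigns, and scalar op templates are stored under a `stmt` field. Block-qualified names and automatic tags are deliberately left out.

```python
from typing import Any, Dict, List

# The field names which the text lists as equivalent.
OP_FIELD_NAMES = ("ops", "op", "operations", "statements", "statement")


def normalize_ops(template: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn map, list, or scalar op fields into an ordered list of named ops."""
    present = [name for name in OP_FIELD_NAMES if name in template]
    if len(present) > 1:
        raise ValueError(f"only one op field is allowed, found: {present}")
    if not present:
        return []
    value = template[present[0]]

    if isinstance(value, dict):      # map form: the key is the op name
        items = list(value.items())
    elif isinstance(value, list):    # list form: the name may be a field of each element
        items = [(None, element) for element in value]
    else:                            # scalar form: a single op template
        items = [(None, value)]

    normalized = []
    for index, (key, body) in enumerate(items, start=1):
        fields = dict(body) if isinstance(body, dict) else {"stmt": body}
        # Hypothetical fallback name; the real generated names may differ.
        name = key or fields.pop("name", None) or f"op{index}"
        normalized.append({"name": name, "op": fields})
    return normalized


from_scalar = normalize_ops({"ops": "select 1"})
from_list = normalize_ops({"statements": [{"name": "a", "stmt": "x"}, "y"]})
from_map = normalize_ops({"op": {"op1": {"field1": "v"}}})
```

Whatever the input form, the result is the same shape: an ordered collection of op templates, each with a name.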
|
||||
|
||||
### a single un-named op template
|
||||
|
||||
@@ -4,7 +4,7 @@ title: Op Template Variations
|
||||
|
||||
# Op Template Variations
|
||||
|
||||
These examples are here to illustrate and test specific variations of op templates.
|
||||
These examples illustrate a variety of valid op template structures.
|
||||
|
||||
## Op Naming
|
||||
|
||||
@@ -301,47 +301,3 @@ ops:
|
||||
]
|
||||
```
|
||||
|
||||
## keyed name statement-map form WITHOUT name field WITHOUT op key
|
||||
|
||||
When statements are named by key, and you need to specify a query string of some type, then it must
|
||||
be explicitly part of the naming structure, as with a field name like `stmt` or `op`.
|
||||
|
||||
*yaml:*
|
||||
|
||||
```yaml
|
||||
ops:
|
||||
op1:
|
||||
field1: select * from ks1.tb1;
|
||||
field2: field 2 value
|
||||
```
|
||||
|
||||
*json:*
|
||||
|
||||
```json5
|
||||
{
|
||||
"ops": {
|
||||
"op1": {
|
||||
"field1": "select * from ks1.tb1;",
|
||||
"field2": "field 2 value"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
*ops:*
|
||||
|
||||
```json5
|
||||
[
|
||||
{
|
||||
"name": "block0--op1",
|
||||
"op": {
|
||||
"field1": "select * from ks1.tb1;",
|
||||
"field2": "field 2 value"
|
||||
},
|
||||
"tags": {
|
||||
"block": "block0",
|
||||
"name": "block0--op1"
|
||||
}
|
||||
}
|
||||
]
|
||||
```
|
||||
@@ -1,9 +1,9 @@
|
||||
# Command API
|
||||
# ParsedOp API
|
||||
|
||||
In the workload template examples, we show statements as being formed from a string value. This is a
|
||||
specific type of statement form, although it is possible to provide structured op templates as well.
|
||||
|
||||
**The Command API is responsible for converting all valid op template forms into a consistent and
|
||||
**The ParsedOp API is responsible for converting all valid op template forms into a consistent and
|
||||
unambiguous model.** Thus, the rules for mapping the various forms to the command model must be
|
||||
precise. Those rules are the substance of this specification.
|
||||
|
||||
@@ -0,0 +1,46 @@
|
||||
# SpecTest Formatting
|
||||
|
||||
The specifications and examples follow a pattern:
|
||||
|
||||
1. All or part of a templated workload in yaml format.
|
||||
2. The JSON equivalent as it would be loaded. This is cross-checked against the result of parsing
|
||||
the yaml into data.
|
||||
3. The Workload API view of the same data rendered as a JSON data structure. This is cross-checked
|
||||
against the workload API's rendering of the loaded data.
|
||||
|
||||
To be matched by the testing layer, you must prefix each section with a format marker with emphasis,
|
||||
like this:
|
||||
|
||||
*format:*
|
||||
|
||||
```text
|
||||
body of example
|
||||
```
|
||||
|
||||
Further, to match the pattern above, these must occur in sequences like the following, with no other
|
||||
intervening content:
|
||||
|
||||
*yaml:*
|
||||
|
||||
```yaml
|
||||
# some yaml here
|
||||
```
|
||||
|
||||
*json:*
|
||||
|
||||
```
|
||||
[]
|
||||
```
|
||||
|
||||
*ops:*
|
||||
|
||||
```
|
||||
[]
|
||||
```
|
||||
|
||||
The above sequence of 6 contiguous markdown elements follows a recognizable pattern to the
|
||||
specification testing harness. The names above the sections are required to match and fenced
|
||||
code sections are required to follow each.
|
||||
|
||||
All the markdown files in this directory are loaded and scanned for this pattern, and all
|
||||
such sequences are verified each time NoSQLBench is built.
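To make the pattern concrete, here is a rough Python sketch of how such sequences might be located in a markdown file. It is not the actual test harness: `find_spec_sequences` and the regular expression are assumptions made for illustration, and a real harness would go on to parse and compare the three bodies rather than only extract them.

```python
import re
from typing import List, Tuple

# Built indirectly so that this example can itself sit inside a fenced block.
FENCE = "`" * 3


def _section(label: str) -> str:
    """Regex for an emphasized marker followed by one fenced block."""
    return (
        rf"\*{label}:\*\s*{FENCE}[^\n]*\n"
        rf"(?P<{label}>(?:(?!{FENCE}).)*){FENCE}"
    )


# The three labeled sections must be contiguous, separated only by whitespace.
SPEC_SEQUENCE = re.compile(
    r"\s*".join(_section(label) for label in ("yaml", "json", "ops")),
    re.DOTALL,
)


def find_spec_sequences(markdown: str) -> List[Tuple[str, str, str]]:
    """Return the (yaml, json, ops) bodies of every matching sequence."""
    return [
        (m.group("yaml").strip(), m.group("json").strip(), m.group("ops").strip())
        for m in SPEC_SEQUENCE.finditer(markdown)
    ]


sample = "\n".join([
    "intro text", "",
    "*yaml:*", "",
    FENCE + "yaml", "ops: select 1", FENCE, "",
    "*json:*", "",
    FENCE + "json5", '{"ops": "select 1"}', FENCE, "",
    "*ops:*", "",
    FENCE + "json5", "[]", FENCE, "",
])
found = find_spec_sequences(sample)
```

Any prose placed between two of the sections breaks the match, which mirrors the requirement that no other content intervene.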
|
||||
@@ -1,21 +1,6 @@
|
||||
# Workload Templates
|
||||
# Workload Structure
|
||||
|
||||
The valid elements of the raw workload form are explained below, using YAML and JSON5 as a schematic
|
||||
language. This guide is not meant to be very explanatory for new users, but it can serve as a handy
|
||||
reference about how workloads can be structured.
|
||||
|
||||
Any bundled workload loader should test all of these fenced code blocks and confirm that the data
|
||||
structures are logically equivalent, using any json5 blocks as a trigger to compare against the
|
||||
prior block.
|
||||
**This document is a testable specification.**
|
||||
|
||||
While some examples below appear to be demonstrating basic cross-format encoding, there is more
|
||||
going on. This document captures a set of basic sanity rules for all raw workload data, as well as
|
||||
visual examples and narratives to assist users and maintainers. In short, if you don't see what you
|
||||
want to do here, it is probably not valid in the format, and if you know that to be false, then this
|
||||
document needs to be updated with more details!
|
||||
|
||||
# Keywords
|
||||
## Keywords
|
||||
|
||||
The following words have special meaning in templated workloads:
|
||||
|
||||
@@ -28,46 +13,6 @@ The following words have special meaning in templated workloads:
|
||||
- op, ops, operations, statement, statements - defines op templates
|
||||
- blocks - groups any or all elements
|
||||
|
||||
# Layout of Examples
|
||||
|
||||
The specifications and examples below follow this pattern:
|
||||
|
||||
1. All or part of a templated workload in yaml format.
|
||||
2. The JSON equivalent as it would be loaded. This is cross-checked against the result of parsing
|
||||
the yaml into data.
|
||||
3. The Workload API view of the same data rendered as a JSON data structure. This is cross-checked
|
||||
against the workload API's rendering of the loaded data.
|
||||
|
||||
To be matched by the testing layer, you must prefix each section with a format marker with emphasis,
|
||||
like this:
|
||||
|
||||
*format:*
|
||||
|
||||
```text
|
||||
body of example
|
||||
```
|
||||
|
||||
Further, to match the pattern above, these must occur in sequences like the following, with no other
|
||||
intervening content:
|
||||
|
||||
*yaml:*
|
||||
|
||||
```yaml
|
||||
# some yaml here
|
||||
```
|
||||
|
||||
*json:*
|
||||
|
||||
```
|
||||
[]
|
||||
```
|
||||
|
||||
*ops:*
|
||||
|
||||
```
|
||||
[]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Description
|
||||
@@ -364,15 +309,14 @@ tags:
|
||||
|
||||
## Blocks
|
||||
|
||||
Blocks are used to group operations which should be configured or run together such as during a
|
||||
specific part of a test sequence. Blocks can contain any of the defined elements above. Blocks are
|
||||
most useful for organizing a set of operations together, particularly when you put a tag on a block
|
||||
to filter by.
|
||||
Blocks are used to logically partition a workload for the purposes of grouping, configuring or
|
||||
executing subsets and op sequences. Blocks can contain any of the defined elements above.
|
||||
Every op template within a block automatically gets a tag with the name 'block' and the value of
|
||||
the block name. This makes it easy to select a whole block at a time with a tag filter like
|
||||
`tags=block:schema`.
|
||||
|
||||
Blocks are not recursive. You may not put a block inside another block.
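As a rough illustration of the automatic block tag, the Python below assigns a `block` tag to every op template and then selects ops with a filter such as `block:schema`. The function names are hypothetical, and the filter is simplified to a single exact `key:value` match; the real tag filter syntax may support more than this sketch assumes.

```python
from typing import Any, Dict, List


def tag_ops_with_block(blocks: Dict[str, Dict[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Give every op template a 'block' tag holding the name of its block."""
    tagged = []
    for block_name, ops in blocks.items():
        for op_name, op in ops.items():
            tags = dict(op.get("tags", {}))  # copy so the input is not mutated
            tags["block"] = block_name
            tagged.append({"name": op_name, "tags": tags})
    return tagged


def select_by_tag(ops: List[Dict[str, Any]], tag_filter: str) -> List[Dict[str, Any]]:
    """Keep ops whose tag equals the value given as 'key:value' (exact match only)."""
    key, _, value = tag_filter.partition(":")
    return [op for op in ops if op["tags"].get(key) == value]


blocks = {
    "schema": {"create_table": {}},
    "main": {"insert": {"tags": {"phase": "write"}}, "select": {}},
}
tagged = tag_ops_with_block(blocks)
schema_ops = select_by_tag(tagged, "block:schema")
```

Because the tag is added to every op template, selecting a whole block never requires tagging its ops by hand.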
|
||||
|
||||
You can think of a block as a subgroup of a document, where all the fields that are described above
|
||||
can be specified.
|
||||
|
||||
### named blocks as a map of property maps
|
||||
|
||||
@@ -571,4 +515,4 @@ blocks:
|
||||
# Putting things together
|
||||
|
||||
This document is focused on the basic properties that can be added to a templated workload. To see
|
||||
how they are combined together, see [templated_operations.md](templated_operations.md).
|
||||
how they are combined together, see [Op Template Basics](op_template_basics.md).
|
||||