update uniform workload specification to current terms and structure

Jonathan Shook
2022-06-23 18:45:35 -05:00
parent f67c722824
commit 8155ae7481
6 changed files with 134 additions and 241 deletions


@@ -1,133 +1,95 @@
# Workload Definition
# Workload Specification
This directory contains the design for a standard and extensible way of loading workload definitions
into a NoSQLBench activity.
This directory contains the testable specification for workload definitions used by NoSQLBench.
It is highly recommended that you familiarize yourself with the details below the specifications if
you have not already. For the purposes of simplicity, all the user-facing language is called
*templating*, and all developer-facing functionality is called *API*.
## Op Templates vs Developer API
There are two primary views of workload definitions that we care about:
1. The User View of **op templates**
   1. Op templates are simply the schematic recipes for building an operation.
   2. Op templates are provided by users in YAML or JSON or even directly via runtime API.
   3. Op templates can be provided with optional metadata which serves to label, group or
      otherwise make the individual op templates more manageable.
   4. A variety of forms are supported which are self-evident, but which allow users to have
      some flexibility in how they structure their YAML, JSON, or runtime collections.
2. The Developer View of the ParsedOp API -- All op templates, regardless of the form they are
   provided in, are processed into a normalized internal data structure.
   1. The detailed documentation for the ParsedOp API is in javadoc.
The documentation in this directory serves as a testable specification for all the above. It
shows specific examples of all the valid op template forms in both YAML and JSON, as well as how
the data is normalized to feed the developer's view of the ParsedOp API.
If you are a new user, it is recommended that you read the basic docs first before delving too
deeply into these specification-level docs. The intro docs show normative and simple ways to
specify workloads without worrying too much about all the possible forms.
## Templating Language
When users want to specify a set of operations to perform, they do so with the workload templating
format.
format, which includes document level details, block level details, and op level details.
Specific reserved words like `block` or `ops` are used in tandem with nesting structure to
define all valid workload constructions. Because of this, workload definitions are
essentially data structures composed of basic collection types and primitive values. Any on-disk
format which can be loaded as such can be a valid source of workload definitions.
- [Templated Workloads](templated_workloads.md) - Basics of workloads templating
- [Templated Operations](templated_operations.md) - Details of op templating
- [SpecTest Formatting](spectest_formatting.md) - A primer on the example formats used here
- [Workload Structure](workload_structure.md) - Overall workload structure, keywords, nesting
features
- [Op Template Basics](op_template_basics.md) - Basic details of op templating
- [Op Template Variations](op_template_variations.md) - Additional op template variants
and corner cases
- [Template Variables](template_variables.md) - Textual macros and default values
## Workload API
## ParsedOp API
After a workload template is loaded into an activity, it is presented to the driver in an API which
is suitable for building executable ops in the native driver.
- [Command API](command_api.md) - Defines the API which developers see after a workload is fully
- [ParsedOp API](parsed_op_api.md) - Defines the API which developers see after a workload is fully
loaded.
## Workload Templating Format
The first half of this specification is focused solely on the schematic form that users provide to
NoSQLBench in order to describe a set of operations to perform.
It covers both the user-facing schematic of what can be specified and the driver-facing API, which
allows for a variety of driver implementations to be supported.
## Background Work
## Related Reading
If you want to understand the rest of this document, it is crucial that you have a working knowledge
of the standard YAML format and several examples from the current drivers. You can learn this from
the main documentation which demonstrates step-by-step how to build a workload. Reading further in
this document will be most useful for core NB developers, but we are happy to take input from
anybody.
## Overview of Workload Mapping
The purpose of this effort is to thoroughly standardize the concepts and machinery of how operations
are mapped from user configuration to operations in flight. In the past, much of the glue logic
between YAML and an operation has been left to the NoSQLBench ActivityType -- the high-level driver
which acts as a binding layer between vendor APIs and the NoSQLBench core machinery.
Now that there are several drivers implemented, each with their own minor variations in how YAML
could be interpreted, it's time to take stock of the common path and codify it. The expected outcomes
of this effort are several:
- NoSQLBench drivers (AKA ActivityTypes) get much easier to implement.
- Standard NB driver features are either supported uniformly, or not at all.
- The semantics of op templates and workload configuration are much more clearly specified and
demonstrated for users.
- All API surface area for this facet of NB can be tested in a very tangible way, with docs and
testing sharing one and the same examples.
While these goals alone are enough to warrant an improvement, they are also key to simplifying the
design of two new drivers which are in the works: gRPC and the Apache Cassandra Java driver version
4. The gRPC driver will need to have a very clearly specified logical boundary on the NB side to
keep the combined system simple enough to explain and maintain.
this document will be most useful for core NB developers, or advanced users who want to know all
the possible ways of building workloads.
## Op Mapping Stages
As a workload definition is read and mapped into the form of an executable activity in the NB
process, it takes on different forms. Each stage can be thought of as a more refined view or API
through which the workload can be seen. At each stage, specific processing is required to promote
the more generic form into a more specialized form that the next layer can consume.
The process of loading a workload definition occurs in several discrete steps during a NoSQLBench
session:
It should be noted that mapping workload definitions to operations is not something that needs to be
done quickly. Instead, it is more important to focus on user experience factors, such as
1. The workload file is loaded.
2. Template variables are interposed.
3. The file is deserialized from its native form into a raw data structure.
4. The raw data structure is transformed into a normalized data structure according to the Op
Template normalization rules.
5. The data is provided to the ParsedOp API for use by the developer.
6. The DriverAdapter is loaded which understands the op fields provided in the op template.
7. The DriverAdapter uses its documented rules to determine which types of native driver operations
each op template is intended to represent. This is called **Op Mapping**.
8. The DriverAdapter uses the identified types to create dispensers of native driver operations.
This is called **Op Dispensing**.
9. The op dispensers are arranged into an indexed bank of op sources according to the specified
ratios and/or sequencing strategy. From this point on, NoSQLBench has the ability to
construct an operation for any given cycle at high speed.
These specifications are focused on steps 2-5. The DriverAdapter focuses on the developer's use of
the ParsedOp API, and as such is documented in javadoc primarily. Some details on the ParsedOp
API are shared here for basic awareness, but developers should look to the javadoc for the full
story.
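To make steps 2 through 4 more concrete, here is a minimal Python sketch of the load path. It is not the NoSQLBench implementation: it assumes a `<<name:default>>` template-variable syntax purely for illustration, uses JSON as the stored form so that only the standard library is needed, and reduces normalization to turning a map of named ops into an ordered list. The functions `interpose`, `load_raw`, and `normalize` are hypothetical names.

```python
import json
import re

# Assumed template-variable syntax for this sketch only: <<name:default>>
TEMPLATE_VAR = re.compile(r"<<(\w+):([^>]*)>>")


def interpose(text, params):
    """Step 2: substitute template variables in the raw text, falling back to defaults."""
    return TEMPLATE_VAR.sub(lambda m: params.get(m.group(1), m.group(2)), text)


def load_raw(text, params):
    """Steps 2-3: interpose, then deserialize into plain maps, lists, and scalars."""
    return json.loads(interpose(text, params))


def normalize(raw):
    """Step 4, greatly simplified: a map of named ops becomes an ordered list of op templates."""
    return [{"name": name, "op": op} for name, op in raw.get("ops", {}).items()]


source = '{"ops": {"op1": {"stmt": "select * from <<keyspace:ks1>>.tb1;"}}}'
with_defaults = normalize(load_raw(source, {}))
with_override = normalize(load_raw(source, {"keyspace": "ks2"}))
```

Because interposition works on the raw text before deserialization, a template variable can appear anywhere in the file, not only in values.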
## Mapping vs Running
It should be noted that the Op Mapping stage, where user intentions are mapped from op templates to
native operations, occurs at _initialization_ time and is not something that needs to be done
quickly. Instead, it is more important to focus on user experience factors, such as
flexibility, obviousness, robustness, correctness, and so on. Thus, priority of design factors in
this part of NB is placed more on clear and purposeful abstractions and less on optimizing for
speed. The clarity and detail which is conveyed by this layer to the driver developer will then
enable them to focus on building fast and correct op dispensers, which are built before the main
part of running a workload, but which are used at high speed while the workload is running.
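The split between mapping and running, together with step 9 above, can be pictured with the following Python sketch. Everything in it is hypothetical: `make_dispenser` and `OpSequence` are illustrative names, the dispenser does trivial work where a real DriverAdapter would build native driver operations, and ratios are expanded by simple concatenation, which is only one possible sequencing strategy.

```python
from itertools import chain


def make_dispenser(template):
    """Initialization time: interpret the op template once, up front."""
    fields = dict(template)  # stands in for parsing, validation, binding lookup, etc.

    def dispense(cycle):
        # Run time: only cheap per-cycle assembly happens here.
        return {**fields, "cycle": cycle}

    return dispense


class OpSequence:
    """An indexed bank of op dispensers, selected by cycle number."""

    def __init__(self, templates, ratios):
        self.dispensers = {name: make_dispenser(t) for name, t in templates.items()}
        # Concatenate each op name `ratio` times, e.g. {"read": 1, "write": 3}
        # becomes ["read", "write", "write", "write"].
        self.sequence = list(
            chain.from_iterable([name] * ratio for name, ratio in ratios.items())
        )

    def op_for(self, cycle):
        name = self.sequence[cycle % len(self.sequence)]
        return self.dispensers[name](cycle)
```

All the work of understanding a template happens in the constructor; `op_for` is only an index lookup and a call, which is the property the paragraph above asks of op dispensers.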
## Stored Form
Presently this is YAML, but it could be any format.
Each stored form requires a loader which can map its supported formats into a raw data structure
explained below.
A Workload Loader is nothing more than a reader which can read a specific format into a data
structure.
## Workload Template
**Workload templates are presented to NoSQLBench as standardized data structures.**
This is a data structure in basic object form. It is merely the most obvious and direct in-memory
representation of the contents of the stored form. In Java, this looks like basic collections and
primitive types, such as Lists, Maps, Strings, and so on. The raw data structure form should always
be the most commodity type of representation for the target language.
The workload template is meant to be a runtime model which can be specified and presented to the
scenario in multiple ways. As such, scripting layers and similar integrations can build such data
structures programmatically, and provide them to the runtime directly. So long as the programmer is
aware of what is valid, providing a workload template as a data structure should have the same
effect as providing one from a YAML or JSON file.
In this way, the NB workload data structure acts as a de facto API of sorts, although it has no
methods or functions. It is simply a commodity representation of a workload template. As such, the
NoSQLBench runtime must provide clear feedback to the user when invalid constructions are given.
What is valid, what is not, and what each possible construction means must be codified in a clear and
complete standard. The valid elements of a workload template are documented in
[workload_templates.md](workload_templates.md), which serves as both an explainer and a living
specification. The contents of this file are tested directly by NoSQLBench builds.
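A minimal Python sketch of the idea that a workload template is only data: the same template loaded from a stored form and built directly in code compare equal, and a simple structural check rejects anything outside the commodity types with a message that points at the offending element. The `check_commodity` helper is hypothetical, and it only checks types; deciding which keys and nestings are valid is the job of the specification itself.

```python
import json

# Commodity scalar types a workload template may contain in this sketch.
SCALARS = (str, int, float, bool, type(None))


def check_commodity(value, path="$"):
    """Raise ValueError, naming the offending path, for anything but maps, lists, and scalars."""
    if isinstance(value, dict):
        for key, child in value.items():
            if not isinstance(key, str):
                raise ValueError(f"{path}: map keys must be strings, got {key!r}")
            check_commodity(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            check_commodity(child, f"{path}[{index}]")
    elif not isinstance(value, SCALARS):
        raise ValueError(f"{path}: unsupported type {type(value).__name__}")


# The same workload template, once loaded from a stored form and once built in code.
from_file = json.loads(
    '{"description": "an example workload", "ops": {"op1": "select * from ks1.tb1;"}}'
)
from_code = {"description": "an example workload", "ops": {"op1": "select * from ks1.tb1;"}}

check_commodity(from_code)
```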
## Workload API
The workload template provides some layering possibilities which are applied automatically for the
user by the workload API. Specifically, any bindings, params, or tags which are defined by name in
an outer scope of the structure are automatically used by operations which do not define their own
element of the same type and name. This happens at three levels:
document scope, block scope, and op scope. More details on this are in the
*designing workload* guide.
Since the workload template is meant to enable layered defaults, there is a logical difference
between the minimally-specified version of a workload and that seen by a driver. Drivers access the
workload through the lens of the workload API, which is responsible for layering in the settings
applied to each op template.
This form of the workload is called the **rendered workload**, and is presented to the driver as an
accessible object model.
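The layering rule can be sketched in Python as three successive map merges per element type, with inner scopes overriding outer ones by name. This is an illustration, not the workload API itself: `render_ops` is a hypothetical name, and the input is assumed to use map forms for blocks and ops, with op-level `bindings`, `params`, and `tags` stored alongside the op fields.

```python
LAYERED = ("bindings", "params", "tags")


def render_ops(doc):
    """Flatten doc -> block -> op scopes into one rendered entry per op template."""
    rendered = []
    for block in doc.get("blocks", {}).values():
        for op_name, op in block.get("ops", {}).items():
            entry = {"name": op_name}
            for kind in LAYERED:
                merged = {}
                merged.update(doc.get(kind, {}))    # document scope
                merged.update(block.get(kind, {}))  # block scope overrides document
                merged.update(op.get(kind, {}))     # op scope overrides both
                entry[kind] = merged
            # Whatever is not a layered element is the op payload itself.
            entry["op"] = {k: v for k, v in op.items() if k not in LAYERED}
            rendered.append(entry)
    return rendered
```

An outer value survives only when no inner scope defines an element of the same type and name, which matches the "only if not defined closer in" rule described above.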
enable them to focus on building fast and correct op dispensers. These dispensers are also
constructed before the workload starts running, but are used at high speed while the workload
is running.


@@ -1,30 +1,15 @@
# Op Templates
The rules around op templates deserve a separate section, given that there are many techniques that
a user can choose from.
Op templates are the recipes provided by users for an operation. These hold examples of payload
data, metadata that configures the driver, timeout settings and so on.
The valid elements of the raw workload form are explained below, using YAML and JSON5 as a schematic
language. This guide is not meant to be very explanatory for new users, but it can serve as a handy
reference about how workloads can be structured.
The field name used in workload templates to represent operations can often be symbolic to users.
For this reason, several names are allowed: ops, op, operations, statements, statement. It doesn't
matter whether the value is provided as a map, list, or scalar. These all allow for the same
level of templating. Map forms are preferred, since they include naming in a more streamlined
structure. When you use list form, you have to provide the name as a separate field.
Any bundled workload loader should test all of these fenced code blocks and confirm that the data
structures are logically equivalent, using any json5 blocks as a trigger to compare against the
prior block.
**This document is a testable specification.**
While some of the examples below appear to be demonstrating basic cross-format encoding, there is
more going on. This document captures a set of basic sanity rules for all raw workload data, as well
as visual examples and narratives to assist users and maintainers. In short, if you don't see what
you want to do here, it is probably not valid in the format, and if you know that to be false, then
this document needs to be updated with more details!
The field used in workload templates to represent an operation can often be symbolic to users. For
this reason, several names are allowed: ops, op, operations, statements, statement. It doesn't
matter whether the value is provided as a map, list, or scalar. These all do the same thing,
although an error is thrown if you specify more than one. The interpretation is always the same: An
ordered collection of op templates. In map forms, the key is the op name. In forms which contain no
provided name (as a key or as a property of an element map), a name is automatically provided by the
API.
A name is automatically provided by the API when there is one missing.
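A Python sketch of how the alias and form rules above might be applied to a single container, such as a document or a block. The `op_templates` function is hypothetical, and so is its auto-naming scheme `op<position>`; the names actually generated by the API may differ.

```python
OP_KEYWORDS = ("ops", "op", "operations", "statements", "statement")


def op_templates(container):
    """Return an ordered list of {"name", "op"} entries from a document or block map."""
    present = [key for key in OP_KEYWORDS if key in container]
    if len(present) > 1:
        raise ValueError(f"use only one of {', '.join(OP_KEYWORDS)}; found {present}")
    if not present:
        return []
    value = container[present[0]]
    if isinstance(value, dict):
        # Map form: the key is the op name.
        items = list(value.items())
    elif isinstance(value, list):
        # List form: a name, if present, is a separate field of each element.
        items = []
        for element in value:
            if isinstance(element, dict) and "name" in element:
                body = {k: v for k, v in element.items() if k != "name"}
                items.append((element["name"], body))
            else:
                items.append((None, element))
    else:
        # Scalar form: a single op template.
        items = [(None, value)]
    # Hypothetical auto-naming: fall back to "op<position>" when no name was given.
    return [
        {"name": name if name is not None else f"op{position}", "op": template}
        for position, (name, template) in enumerate(items, start=1)
    ]
```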
### a single un-named op template


@@ -4,7 +4,7 @@ title: Op Template Variations
# Op Template Variations
These examples are here to illustrate and test specific variations of op templates.
These examples illustrate a variety of valid op template structures.
## Op Naming
@@ -301,47 +301,3 @@ ops:
]
```
## keyed name statement-map form WITHOUT name field WITHOUT op key
When statements are named by key, and you need to specify a query string of some type, then it must
be explicitly part of the naming structure, as with a field name like `stmt` or `op`.
*yaml:*
```yaml
ops:
  op1:
    field1: select * from ks1.tb1;
    field2: field 2 value
```
*json:*
```json5
{
  "ops": {
    "op1": {
      "field1": "select * from ks1.tb1;",
      "field2": "field 2 value"
    }
  }
}
```
*ops:*
```json5
[
  {
    "name": "block0--op1",
    "op": {
      "field1": "select * from ks1.tb1;",
      "field2": "field 2 value"
    },
    "tags": {
      "block": "block0",
      "name": "block0--op1"
    }
  }
]
```
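The transformation in this example can be reproduced with a few lines of Python. This sketch covers only this one form and is not the actual normalizer: it assumes, as the expected output shows, that ops outside any explicit block land in a default block named `block0`, that the rendered name is `<block>--<key>`, and that without an `op` key the whole field map becomes the op. `render_keyed_ops` is a hypothetical name.

```python
def render_keyed_ops(ops, block="block0"):
    """Render a keyed map of op field maps into the normalized list shown above."""
    rendered = []
    for key, fields in ops.items():
        name = f"{block}--{key}"
        rendered.append({
            "name": name,
            "op": dict(fields),  # no `op` key given, so every field belongs to the op
            "tags": {"block": block, "name": name},
        })
    return rendered
```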


@@ -1,9 +1,9 @@
# Command API
# ParsedOp API
In the workload template examples, we show statements as being formed from a string value. This is a
specific type of statement form, although it is possible to provide structured op templates as well.
**The Command API is responsible for converting all valid op template forms into a consistent and
**The ParsedOp API is responsible for converting all valid op template forms into a consistent and
unambiguous model.** Thus, the rules for mapping the various forms to the command model must be
precise. Those rules are the substance of this specification.


@@ -0,0 +1,46 @@
# SpecTest Formatting
The specifications and examples follow a pattern:
1. All or part of a templated workload in yaml format.
2. The JSON equivalent as it would be loaded. This is cross-checked against the result of parsing
the yaml into data.
3. The Workload API view of the same data rendered as a JSON data structure. This is cross-checked
against the workload API's rendering of the loaded data.
To be matched by the testing layer, you must prefix each section with a format marker with emphasis,
like this:
*format:*
```text
body of example
```
Further, to match the pattern above, these must occur in sequences like the following, with no other
intervening content:
*yaml:*
```yaml
# some yaml here
```
*json:*
```
[]
```
*ops:*
```
[]
```
The above sequence of 6 contiguous markdown elements follows a recognizable pattern to the
specification testing harness. The names above the sections are required to match and fenced
code sections are required to follow each.
All the markdown files in this directory are loaded and scanned for this pattern, and all
such sequences are verified each time NoSQLBench is built.
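A simplified Python sketch of how a harness could find these sequences in a markdown file: it locates each emphasized marker followed by a fenced block, then accepts three consecutive sections named yaml, json, and ops when only whitespace separates them. The function name and its return shape are hypothetical, and the sketch stops short of the cross-checks themselves, which would need a YAML parser and the workload API.

```python
import re

# An emphasized marker line followed immediately by a fenced code block.
SECTION = re.compile(r"\*(yaml|json|ops):\*\n```[^\n]*\n(.*?)```", re.DOTALL)
EXPECTED = ["yaml", "json", "ops"]


def find_spec_tests(markdown):
    """Return one {"yaml": ..., "json": ..., "ops": ...} dict per valid sequence."""
    sections = list(SECTION.finditer(markdown))
    tests = []
    i = 0
    while i + len(EXPECTED) <= len(sections):
        window = sections[i:i + len(EXPECTED)]
        names = [m.group(1) for m in window]
        # No intervening content: only whitespace may separate adjacent sections.
        gaps = [markdown[a.end():b.start()] for a, b in zip(window, window[1:])]
        if names == EXPECTED and all(not gap.strip() for gap in gaps):
            tests.append({m.group(1): m.group(2) for m in window})
            i += len(EXPECTED)
        else:
            i += 1
    return tests
```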


@@ -1,21 +1,6 @@
# Workload Templates
# Workload Structure
The valid elements of the raw workload form are explained below, using YAML and JSON5 as a schematic
language. This guide is not meant to be very explanatory for new users, but it can serve as a handy
reference about how workloads can be structured.
Any bundled workload loader should test all of these fenced code blocks and confirm that the data
structures are logically equivalent, using any json5 blocks as a trigger to compare against the
prior block.
**This document is a testable specification.**
While some examples below appear to be demonstrating basic cross-format encoding, there is more
going on. This document captures a set of basic sanity rules for all raw workload data, as well as
visual examples and narratives to assist users and maintainers. In short, if you don't see what you
want to do here, it is probably not valid in the format, and if you know that to be false, then this
document needs to be updated with more details!
# Keywords
## Keywords
The following words have special meaning in templated workloads:
@@ -28,46 +13,6 @@ The following words have special meaning in templated workloads:
- op, ops, operations, statement, statements - define op templates
- blocks - groups any or all elements
# Layout of Examples
The specifications and examples below follow this pattern:
1. All or part of a templated workload in yaml format.
2. The JSON equivalent as it would be loaded. This is cross-checked against the result of parsing
the yaml into data.
3. The Workload API view of the same data rendered as a JSON data structure. This is cross-checked
against the workload API's rendering of the loaded data.
To be matched by the testing layer, you must prefix each section with a format marker with emphasis,
like this:
*format:*
```text
body of example
```
Further, to match the pattern above, these must occur in sequences like the following, with no other
intervening content:
*yaml:*
```yaml
# some yaml here
```
*json:*
```
[]
```
*ops:*
```
[]
```
---
## Description
@@ -364,15 +309,14 @@ tags:
## Blocks
Blocks are used to group operations which should be configured or run together such as during a
specific part of a test sequence. Blocks can contain any of the defined elements above. Blocks are
most useful for organizing a set of operations together, particularly when you put a tag on a block
to filter by.
Blocks are used to logically partition a workload for the purposes of grouping, configuring or
executing subsets and op sequences. Blocks can contain any of the defined elements above.
Every op template within a block automatically gets a tag with the name 'block' and the value of
the block name. This makes it easy to select a whole block at a time with a tag filter like
`tags=block:schema`.
Blocks are not recursive. You may not put a block inside another block.
You can think of a block as a subgroup of a document, where all the fields that are described above
can be specified.
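A small Python sketch of the automatic `block` tag and of selecting ops with a filter like `tags=block:schema`. It is a simplification under stated assumptions: `tag_ops` and `select` are hypothetical helpers, ops are plain statement strings, and the filter is matched as a single exact `key:value` pair, whereas real tag filters may support more than this.

```python
def tag_ops(blocks):
    """Give every op template a 'block' tag whose value is the enclosing block's name."""
    tagged = []
    for block_name, block in blocks.items():
        for op_name, stmt in block.get("ops", {}).items():
            tags = {**block.get("tags", {}), "block": block_name}
            tagged.append({"name": op_name, "stmt": stmt, "tags": tags})
    return tagged


def select(ops, tag_filter):
    """Keep ops whose tags match a single 'tags=key:value' filter exactly."""
    _, _, condition = tag_filter.partition("=")
    key, _, value = condition.partition(":")
    return [op for op in ops if op["tags"].get(key) == value]
```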
### named blocks as a map of property maps
@@ -571,4 +515,4 @@ blocks:
# Putting things together
This document is focused on the basic properties that can be added to a templated workload. To see
how they are combined together, see [templated_operations.md](templated_operations.md).
how they are combined together, see [Op Templates Basics](op_template_basics.md).