more specification details

Jonathan Shook 2021-05-27 01:26:52 -05:00
parent 32a42f19a0
commit c6d8e4cf4c
7 changed files with 186 additions and 495 deletions

View File

@ -35,7 +35,8 @@ public class CommandTemplate<T> {
    private final String name;
    private final Map<String, String> statics = new HashMap<>();
    private final Map<String, StringBindings> dynamics = new HashMap<>();
    private final transient int mapsize;

    /**
     * Create a CommandTemplate directly from an OpTemplate.
@ -149,7 +150,7 @@ public class CommandTemplate<T> {
    /**
     * Apply the provided binding functions to the command template, yielding a map with concrete values
     * to be used by a native command.
     *
     * @param cycle The cycle value which will be used by the binding functions

View File

@ -1,137 +1,133 @@
# Workload Definition
This directory contains the design for a standard and extensible way of loading workload definitions
into a NoSQLBench activity.
It is highly recommended that you familiarize yourself with the details below the specifications if
you have not already. For simplicity, all the user-facing language is called *templating*, and all
developer-facing functionality is called the *API*.
## Templating Language
When users want to specify a set of operations to perform, they do so with the workload templating
format.
- [Templated Workloads](templated_workloads.md) - Basics of workload templating
- [Templated Operations](templated_operations.md) - Details of op templating
- [Template Variables](template_variables.md) - Textual macros and default values
## Workload API
After a workload template is loaded into an activity, it is presented to the driver in an API which
is suitable for building executable ops in the native driver.
- [Command API](command_api.md) - Defines the API which developers see after a workload is fully
loaded.
## Workload Templating Format
The first half of this specification is focused solely on the schematic form that users provide to
NoSQLBench in order to describe a set of operations to perform.
It covers both the user-facing schematic of what can be specified and the driver-facing API which
allows for a variety of driver implementations to be supported.
## Background Work
If you want to understand the rest of this document, it is crucial that you have a working knowledge
of the standard YAML format and several examples from the current drivers. You can learn this from
the main documentation which demonstrates step-by-step how to build a workload. Reading further in
this document will be most useful for core NB developers, but we are happy to take input from
anybody.
## Overview of Workload Mapping
The purpose of this effort is to thoroughly standardize the concepts and machinery of how operations
are mapped from user configuration to operations in flight. In the past, much of the glue logic
between YAML and an operation has been left to the NoSQLBench ActivityType -- the high-level driver
which acts as a binding layer between vendor APIs and the NoSQLBench core machinery.
Now that there are several drivers implemented, each with their own minor variations in how YAML
could be interpreted, it's time to take stock of the common path and codify it. The expected
outcomes of this effort are several:
- NoSQLBench drivers (AKA ActivityTypes) get much easier to implement.
- Standard NB driver features are either supported uniformly, or not at all.
- The semantics of op templates and workload configuration are much more clearly specified and
demonstrated for users.
- All API surface area for this facet of NB can be tested in a very tangible way, with docs and
testing sharing one and the same examples.
While these desires are enough alone to warrant an improvement, they are also key to simplifying the
design of two new drivers which are in the works: gRPC and the Apache Cassandra Java driver version
4. The gRPC driver will need to have a very clearly specified logical boundary on the NB side to
keep the combined system simple enough to explain and maintain.
## Op Mapping Stages
As a workload definition is read and mapped into the form of an executable activity in the NB
process, it takes on different forms. Each stage can be thought of as a more refined view or API
through which the workload can be seen. At each stage, specific processing is required to promote
the more generic form into a more specialized and consumable form by the next layer.
It should be noted that mapping workload definitions to operations is not something that needs to be
done quickly. Instead, it is more important to focus on user experience factors, such as
flexibility, obviousness, robustness, correctness, and so on. Thus, priority of design factors in
this part of NB is placed more on clear and purposeful abstractions and less on optimizing for
speed. The clarity and detail which is conveyed by this layer to the driver developer will then
enable them to focus on building fast and correct op dispensers, which are built before the main
part of running a workload, but which are used at high speed while the workload is running.
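To make the prepare-once, dispense-per-cycle idea concrete, here is a minimal sketch; the interface
name and shape are illustrative only, not the NB API:

```java
// Illustrative sketch only: an op dispenser is prepared once, before the run starts,
// and is then asked for a ready-to-execute operation at high rate during the run.
// The name OpDispenserSketch and its exact shape are assumptions for illustration.
@FunctionalInterface
public interface OpDispenserSketch<T> {
    T dispense(long cycle); // called once per cycle while the workload is running
}
```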
## Stored Form
Presently this is YAML, but it could be any format.
Each stored form requires a loader which can map its supported formats into a raw data structure
explained below.
A Workload Loader is nothing more than a reader which can read a specific format into a data
structure.
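As a sketch of how small that responsibility is, a hypothetical YAML loader could be little more
than the following. The class name and the use of SnakeYAML here are illustrative assumptions, not
a description of the actual NB loader:

```java
import org.yaml.snakeyaml.Yaml;

import java.util.Map;

// Hypothetical sketch of a workload loader: it only converts a stored form (YAML here)
// into a plain data structure, leaving all interpretation to later mapping stages.
public class YamlWorkloadLoaderSketch {
    public Map<String, Object> load(String storedForm) {
        // SnakeYAML maps YAML documents onto basic Java collections and scalars.
        return new Yaml().load(storedForm);
    }
}
```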
## Workload Template
**Workload templates are presented to NoSQLBench as standardized data structures.**
This is a data structure in basic object form. It is merely the most obvious and direct in-memory
representation of the contents of the stored form. In Java, this looks like basic collections and
primitive types, such as Lists, Maps, Strings, and so on. The raw data structure form should always
be the most commodity type of representation for the target language.
The workload template is meant to be a runtime model which can be specified and presented to the
scenario in multiple ways. As such, scripting layers and similar integrations can build such data
structures programmatically, and provide them to the runtime directly. So long as the programmer is
aware of what is valid, providing a workload template as a data structure should have the same
effect as providing one from a yaml or json file.
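For example, a small workload template could be handed to the runtime as plain collections rather
than as a YAML file. The sketch below only illustrates the idea; the class name and the op content
are assumptions made for illustration, not part of the documented API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: building a workload template directly as basic Java collections.
// Provided that the structure is valid, it should have the same effect as loading the
// equivalent YAML or JSON from a file.
public class ProgrammaticTemplateSketch {
    public static Map<String, Object> buildTemplate() {
        return Map.of(
            "bindings", Map.of("cycle", "Identity();"),
            "statements", List.of(
                Map.of("op1", "select * from bar.table where id={cycle};")
            )
        );
    }
}
```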
In this way, the NB workload data structure acts as a de-facto API of sorts, although it has no
methods or functions. It is simply a commodity representation of a workload template. As such, the
NoSQLBench runtime must provide clear feedback to the user when invalid constructions are given.
What is valid, what is not, and what each possible construction means must be codified in a clear
and complete standard. The valid elements of a workload template are documented in
[workload_templates.md](workload_templates.md), which serves as both an explainer and a living
specification. The contents of this file are tested directly by NoSQLBench builds.
## Workload API
The workload template provides some layering possibilities which are applied automatically for the
user by the workload API. Specifically, any bindings, params, or tags which are defined by name in
an outer scope of the structure are automatically used by operations which do not define their own
element of the same type and name. This happens at three levels:
document scope, block scope, and op scope. More details on this are in the
*designing workload* guide.
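As a rough sketch of the layering rule, assuming each scope is represented as a simple map of named
elements (this is not the actual NB implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the layering rule: names defined at an outer scope apply to an
// op unless the block or the op defines an element of the same type and name itself.
public class LayeringSketch {
    public static Map<String, String> effectiveElements(Map<String, String> docScope,
                                                        Map<String, String> blockScope,
                                                        Map<String, String> opScope) {
        Map<String, String> effective = new HashMap<>(docScope);
        effective.putAll(blockScope); // block-level entries override document-level ones
        effective.putAll(opScope);    // op-level entries override both
        return effective;
    }
}
```

The same precedence applies independently to bindings, params, and tags.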
Since the workload template is meant to enable layered defaults, there is a logical difference
between the minimally-specified version of a workload and that seen by a driver. Drivers access the
workload through the lens of the workload API, which is responsible for layering in the settings
applied to each op template.
This form of the workload is called the **rendered workload**, and is presented to the driver as an
accessible object model.

View File

@ -1,3 +1,26 @@
# Command API
Command templates are the third layer of workload templating. As described in other spec documents,
the other layers are:
1. [Workload level templates](templated_workloads.md) - This specification covers the basics of a
workload template, including the valid properties and structure.
2. [Operation level templates](templated_operations.md) - This specification covers how operations
can be specified, including semantics and structure.
3. Command level templates, explained below. These are the detailed views of what goes into an op
template, parsed and structured in a way that allows for efficient use at runtime.
Users do not create command templates directly. Instead, these are the *parsed* form of op templates
as seen by the NB driver. The whole point of a command template is to provide crisp semantics and
structure about what a user is asking a driver to do.
Command templates are essentially schematics for an operation. They are a structural interpretation
of the content provided by users in op templates. Each op template provided can be converted into a
command template. In short, the op template is the form that users tend to edit in yaml or provide
as a data structure via scripting. **Command templates are the view of an op template as seen by an
NB driver.**
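To make the distinction concrete, the following sketch splits the fields of a single op template
into static and dynamic parts, roughly mirroring the `statics` and `dynamics` maps of the
`CommandTemplate` class; the detection by `{}` markers and the class name used here are
simplifications for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: separating op template fields into values which are fixed
// for every cycle (statics) and values which reference bindings such as {bar} and must
// be rendered per cycle (dynamics).
public class CommandTemplateSketch {
    public static void main(String[] args) {
        Map<String, String> opFields = Map.of("test1", "foo", "test2", "{bar}");

        Map<String, String> statics = new HashMap<>();
        Map<String, String> dynamics = new HashMap<>();
        opFields.forEach((name, value) -> {
            if (value.contains("{")) {
                dynamics.put(name, value); // must be realized per cycle via a binding
            } else {
                statics.put(name, value);  // known up front, same for every cycle
            }
        });

        System.out.println("statics:  " + statics);   // {test1=foo}
        System.out.println("dynamics: " + dynamics);  // {test2={bar}}
    }
}
```

For an op template field set such as `test1=foo test2={bar}`, this split yields one static field and
one dynamic field bound to `bar`.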
```
### Command Templates
Command templates are part of the workload API.
@ -48,3 +71,6 @@ in contrast, may need to be realized dynamically for each cycle, given
that you don't know the value of the fields in the command until you know
the cycle value.
```

View File

@ -0,0 +1,2 @@
# Template Variables

View File

@ -110,10 +110,11 @@ NB. These are basically command line templates which can be invoked automaticall
name out on your command line. More details on their usage are in the workload construction guide.
We're focused merely on the structural rules here.
### single un-named step
*yaml:*
```yaml
# As named scenarios with a single un-named step
scenarios:
  default: run driver=diag cycles=10
```
@ -134,12 +135,11 @@ scenarios:
[]
```
### multiple named steps
*yaml:*
```yaml
# As named scenarios with named steps
scenarios:
  default:
    step1: run alias=first driver=diag cycles=10
@ -165,12 +165,11 @@ scenarios:
[]
```
### list of un-named steps
*yaml:*
```yaml
# As named scenarios with a list of un-named steps
scenarios:
  default:
    - run alias=first driver=diag cycles=10
@ -196,6 +195,46 @@ scenarios:
[]
```
### silent locked step parameters
For scenario steps which should not be overridable by user parameters on the command line, a double
equals is used to lock the values for a given step without informing the user that their provided
value was ignored. This can be useful in cases where there are multiple steps and some parameters
should only be changeable for some steps.
*yaml:*
```yaml
# The user may not change the value of the alias parameter. Any value they provide for it on the
# command line is silently ignored, and the locked value below is used instead.
scenarios:
  default: run alias==first driver=diag cycles=10
```
*json:*
```json5
{
"scenarios": {
"default": "run alias===first driver=diag cycles=10"
}
}
```
*ops:*
```json5
[]
```
### verbose locked step parameters
For scenario steps which should not be overridable by user parameters on the command line, a triple
equals (as in `alias===first`) is used to indicate that changing these parameters is not allowed. If
a user tries to override a verbose locked parameter, an error is thrown and the scenario is not
allowed to run. This can be useful when you want to clearly indicate that a parameter must remain as
it is.
---
## Bindings

View File

@ -1,390 +0,0 @@
# Workload API
The valid elements of the raw workload form are explained below, using
YAML and JSON5 as a schematic language. This guide is not meant to be very
explanatory for new users, but it can serve as a handy reference about how
workloads can be structured.
Any bundled workload loader should test all of these fenced code blocks
and confirm that the data structures are logically equivalent, using any
json5 blocks as a trigger to compare against the prior block.
**This document is a testable specification.**
While some of the examples below appear to be demonstrating basic
cross-format encoding, there is more going on. This document captures a
set of basic sanity rules for all raw workload data, as well as visual
examples and narratives to assist users and maintainers. In short, if you
don't see what you want to do here, it is probably not valid in the
format, and if you know that to be false, then this document needs to be
updated with more details!
---
## Description
**zero or one `description` fields:**
The first line of the description represents the summary of the
description in summary views. Otherwise, the whole value is used.
*yaml format:*
```yaml
description: |
  summary of this workload
  and more details
```
*json format:*
```json5
{
"description": "summary of this workload\nand more details\n"
}
```
*ops data format:*
```json5
[]
```
<sup>* </sup>This is empty since there are no statements.
---
## Scenarios
**zero or one `scenarios` fields, containing one of the following forms**
The way that you create macro-level workloads from individual stages is
called *named scenarios* in NB. These are basically command line templates
which can be invoked automatically by calling their name out on your
command line. More details on their usage are in the workload construction
guide. We're focused merely on the structural rules here.
```yaml
# As named scenarios with a single un-named step
scenarios:
  default: run driver=diag cycles=10
```
```json5
{
"scenarios": {
"default": "run driver=diag cycles=10"
}
}
```
OR
```yaml
# As named scenarios with named steps
scenarios:
  default:
    step1: run alias=first driver=diag cycles=10
    step2: run alias=second driver=diag cycles=10
```
```json5
{
"scenarios": {
"default": {
"step1": "run alias=first driver=diag cycles=10",
"step2": "run alias=second driver=diag cycles=10"
}
}
}
```
OR
```yaml
# As named scenarios with a list of un-named steps
scenarios:
  default:
    - run alias=first driver=diag cycles=10
    - run alias=second driver=diag cycles=10
```
```json5
{
"scenarios": {
"default": [
"run alias=first driver=diag cycles=10",
"run alias=second driver=diag cycles=10"
]
}
}
```
---
## Bindings
**zero or one `bindings` fields, containing a map of named bindings
recipes**
Bindings are the functions which synthesize data for your operations. They
are specified in recipes which are just function chains from the provided
libraries.
```yaml
bindings:
  cycle: Identity();
  name: NumberNameToString();
```
```json5
{
"bindings": {
"cycle": "Identity();",
"name": "NumberNameToString();"
}
}
```
---
## Params
**zero or one `params` fields, containing a map of parameter names to
values**
Params are modifiers to your operations. They specify important details
which are not part of the operation's command or payload, like consistency
level, or timeout settings.
```yaml
params:
  param1: pvalue1
  param2: pvalue2
```
```json5
{
"params": {
"param1": "pvalue1",
"param2": "pvalue2"
}
}
```
---
## Tags
**zero or one `tags` fields, containing a map of tag names and values**
Tags are how you mark your operations for special inclusion into tests.
They are basically naming metadata that lets you filter what type of
operations you actually use. Further details on tags are in the workload
construction guide.
```yaml
tags:
  phase: main
```
```json5
{
"tags": {
"phase": "main"
}
}
```
---
## Op Templates
The representation of an operation in the workload definition is the most
flexible as well as the most potentially confusing. The reasons for this
are explained in the README for this module. Thus, it is useful to be
detail oriented in these examples.
An op template, as expressed by the user, is just a recipe for how to
construct an operation at runtime. They are not operations. They are
merely blueprints that the driver uses to create real operations that can
be executed.
This applies at two levels:
1) When the user specifies their op template as part of a workload
definition.
2) When the loaded workload definition is promoted to a convenient
OpTemplate type for use by the driver developer.
Just be aware that this term can be used in both ways.
For historic reasons, the field name used for op templates in yaml files
is *statements*, although it will be valid to use any of `statement`,
`statements`, `op`, `ops`, `operation`, or `operations`. This is because
these names are all symbolic and familiar to certain protocols. The
recommended name is `ops` for most cases. Internally, pre-processing will
likely be used to convert them all to simply `ops`.
### a single un-named op template
```yaml
op: select * from bar.table;
```
```json5
{
"op": "select * from bar.table;"
}
```
### un-named op templates as a list of strings
```yaml
ops:
  - select * from bar.table;
```
```json5
{
"ops": [
"select * from bar.table;"
]
}
```
### named op templates as a list of maps
```yaml
ops:
  - op1: select * from bar.table;
```
```json5
{
"ops": [
{
"op1": "select * from bar.table;"
}
]
}
```
### named op templates as a map of strings
```yaml
ops:
  op1: select * from bar.table;
```
```json5
{
"ops": {
"op1": "select * from bar.table;"
}
}
```
### named op templates as a map of maps
```yaml
ops:
  op1:
    stmt: select * from bar.table;
```
```json5
{
"ops": {
"op1": {
"stmt": "select * from bar.table;"
}
}
}
```
---
## Blocks
Blocks are used to group operations which should be configured or run
together such as during a specific part of a test sequence. Blocks can
contain any of the defined elements above.
### named blocks as a map of property maps
```yaml
blocks:
  block1:
    ops:
      op1: select * from bar.table;
      op2:
        type: batch
        stmt: insert into bar.table (a,b,c) values (1,2,3);
```
```json5
{
"blocks": {
"block1": {
"ops": {
"op1": "select * from bar.table;",
"op2": {
"type": "batch",
"stmt": "insert into bar.table (a,b,c) values (1,2,3);"
}
}
}
}
}
```
### un-named blocks as a list of property maps
```yaml
blocks:
  - ops:
      op1: select * from bar.table;
      op2:
        type: batch
        stmt: insert into bar.table (a,b,c) values (1,2,3);
```
```json5
{
"blocks": [
{
"ops": {
"op1": "select * from bar.table;",
"op2": {
"type": "batch",
"stmt": "insert into bar.table (a,b,c) values (1,2,3);"
}
}
}
]
}
```
---
## Names
All documents, blocks, and ops within a workload can have an assigned
name. When map and list forms are both supported for entries, the map
form provides the name. When list forms are used, an additional field
named `name` can be used.
```yaml
blocks:
  - name: myblock
    op: "test op"
```
```json5
{
"blocks" : [
{
"name": "myblock",
"op": "test op"
}
]
}
```
# Normalization

View File

@ -1,5 +1,7 @@
package io.nosqlbench.engine.api.templating;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import io.nosqlbench.engine.api.activityconfig.StatementsLoader;
import io.nosqlbench.engine.api.activityconfig.yaml.OpTemplate;
import io.nosqlbench.engine.api.activityconfig.yaml.StmtsDocList;
@ -19,4 +21,19 @@ public class CommandTemplateTest {
        assertThat(ct.isStatic()).isTrue();
    }

    @Test
    public void testCommandTemplateFormat() {
        // Serialize the parsed command template to JSON so that its structure can be inspected.
        Gson gson = new GsonBuilder().setPrettyPrinting().create();
        StmtsDocList stmtsDocs = StatementsLoader.loadString("" +
            "statements:\n" +
            " - s1: test1=foo test2={bar}\n" +
            "   bindings:\n" +
            "    bar: NumberNameToString();\n");
        OpTemplate stmtDef = stmtsDocs.getStmts().get(0);
        CommandTemplate ct = new CommandTemplate(stmtDef);
        String format = gson.toJson(ct);
        System.out.println(format);
    }
}