update uniform workload specification to current terms and structure

2025-02-25 18:55:28 -06:00 · 2022-06-23 18:45:35 -05:00
parent f67c722824
commit 8155ae7481
6 changed files with 134 additions and 241 deletions
--- a/adapters-api/src/main/resources/workload_definition/README.md
+++ b/adapters-api/src/main/resources/workload_definition/README.md
@@ -1,133 +1,95 @@
-# Workload Definition
+# Workload Specification

-This directory contains the design for a standard and extensible way of loading workload definitions
-into a NoSQLBench activity.
+This directory contains the testable specification for workload definitions used by NoSQLBench.

-It is highly recommended that you familiarize yourself with the details below the specifications if
-you have not already. For the purposes of simplicity, all the user-facing language is called *
-templating*, and all developer-facing functionality is called *API*.
+## Op Templates vs Developer API
+There are two primary views of workload definitions that we care about:
+
+1. The User View of **op templates**
+   1. Op templates are simply the schematic recipes for building an operation.
+   2. Op templates are provided by users in YAML or JSON, or directly via a runtime API.
+   3. Op templates can be provided with optional metadata which serves to label, group or
+      otherwise make the individual op templates more manageable.
+   4. A variety of self-evident forms are supported, which give users some flexibility in how
+      they structure their YAML, JSON, or runtime collections.
+2. The Developer View of the ParsedOp API -- All op templates, regardless of the form they are
+   provided in, are processed into a normalized internal data structure.
+   1. The detailed documentation for the ParsedOp API is in javadoc.
+
+The documentation in this directory serves as a testable specification for all the above. It
+shows specific examples of all the valid op template forms in both YAML and JSON, as well as how
+the data is normalized to feed the developer's view of the ParsedOp API.
+
+If you are a new user, it is recommended that you read the basic docs before delving too far into
+these specification-level docs. The intro docs show simple, normative ways to specify workloads
+without worrying about all the possible forms.

 ## Templating Language

 When users want to specify a set of operations to perform, they do so with the workload templating
-format.
+format, which includes document-level details, block-level details, and op-level details.
+Specific reserved words like `blocks` or `ops` are used in tandem with nesting structure to
+define all valid workload constructions. Because of this, workload definitions are
+essentially data structures composed of basic collection types and primitive values. Any on-disk
+format which can be loaded as such can be a valid source of workload definitions.

-- [Templated Workloads](templated_workloads.md) - Basics of workloads templating
-- [Templated Operations](templated_operations.md) - Details of op templating
+- [SpecTest Formatting](spectest_formatting.md) - A primer on the example formats used here
+- [Workload Structure](workload_structure.md) - Overall workload structure, keywords, nesting
+  features
+- [Op Template Basics](op_template_basics.md) - Basic details of op templating
+- [Op Template Variations](op_template_variations.md) - Additional op template variants
+  and corner cases
 - [Template Variables](template_variables.md) - Textual macros and default values

-## Workload API
+## ParsedOp API

 After a workload template is loaded into an activity, it is presented to the driver in an API which
 is suitable for building executable ops in the native driver.

-- [Command API](command_api.md) - Defines the API which developers see after a workload is fully
+- [ParsedOp API](parsed_op_api.md) - Defines the API which developers see after a workload is fully
  loaded.

-## Workload Templating Format
-
-The first half of this specification is focused solely on the schematic form that users provide to
-NoSQLBench in order to describe a set of operations to perform.
-
-It covers both the user-facing schematic of what can be specified the driver-facing API which allows
-for a variety of driver implementations to be supported.
-
-## Background Work
+## Related Reading

 If you want to understand the rest of this document, it is crucial that you have a working knowledge
 of the standard YAML format and several examples from the current drivers. You can learn this from
 the main documentation which demonstrates step-by-step how to build a workload. Reading further in
-this document will be most useful for core NB developers, but we are happy to take input from
-anybody.
-
-## Overview of Workload Mapping
-
-The purpose of this effort is to thoroughly standardize the concepts and machinery of how operations
-are mapped from user configuration to operations in flight. In the past, much of the glue logic
-between YAML and an operation has been left to the NoSQLBench ActivityType -- the high-level driver
-which acts as a binding layer between vendor APIs and the NoSQLBench core machinery.
-
-Now that there are several drivers implemented, each with their own minor variations in how YAML
-could be interpreted, it's time to take stock of the common path and codify it. The expected outcome
-of this effort are several:
-
-- NoSQLBench drivers (AKA ActivityTypes) get much easier to implement.
-- Standard NB driver features are either supported uniformly, or not at all.
-- The semantics of op templates and workload configuration are much more clearly specified and
-  demonstrated for users.
-- All API surface area for this facet of NB can be tested in a very tangible way, with docs and
-  testing sharing one and the same examples.
-
-While these desires are enough alone to warrant an improvement, they are also key to simplifying the
-design of two new drivers which are in the works: gRPC and the Apache Cassandra Java driver version
-
-4. The gRPC driver will need to have a very clearly specified logical boundary on the NB side to
-   keep the combined system simple enough to explain and maintain.
+this document will be most useful for core NB developers, or advanced users who want to know all
+the possible ways of building workloads.

 ## Op Mapping Stages

-As a workload definition is read and mapped into the form of an executable activity in the NB
-process, it takes on different forms. Each stage can be thought of as a more refined view or API
-through which the workload can be seen. At each stage, specific processing is required to promote
-the more generic form into a more specialized and consumable form by the next layer.
+The process of loading a workload definition occurs in several discrete steps during a NoSQLBench
+session:

-It should be noted that mapping workload definitions to operations is not something that needs to be
-done quickly. Instead, it is more important to focus on user experience factors, such as
+1. The workload file is loaded.
+2. Template variables are interpolated.
+3. The file is deserialized from its native form into a raw data structure.
+4. The raw data structure is transformed into a normalized data structure according to the Op
+   Template normalization rules.
+5. The data is provided to the ParsedOp API for use by the developer.
+6. The DriverAdapter, which understands the op fields provided in the op template, is loaded.
+7. The DriverAdapter uses its documented rules to determine which types of native driver operations
+   each op template is intended to represent. This is called **Op Mapping**.
+8. The DriverAdapter uses the identified types to create dispensers of native driver operations.
+   This is called **Op Dispensing**.
+9. The op dispensers are arranged into an indexed bank of op sources according to the specified
+   ratios and/or sequencing strategy. From this point on, NoSQLBench has the ability to
+   construct an operation for any given cycle at high speed.
+
+These specifications are focused on steps 2-5. Steps 6-9 concern the developer's use of the
+ParsedOp API within a DriverAdapter, and as such are documented primarily in javadoc. Some
+details on the ParsedOp API are shared here for basic awareness, but developers should look to
+the javadoc for the full story.
+
+## Mapping vs Running
+
+The Op Mapping stage, where user intentions are mapped from op templates to native operations,
+does not need to be fast, since it occurs at _initialization_ time. It is more important here
+to focus on user experience factors, such as
 flexibility, obviousness, robustness, correctness, and so on. Thus, priority of design factors in
 this part of NB is placed more on clear and purposeful abstractions and less on optimizing for
 speed. The clarity and detail which is conveyed by this layer to the driver developer will then
-enable them to focus on building fast and correct op dispensers, which are built before the main
-part of running a workload, but which are used at high speed while the workload is running.
-
-## Stored Form
-
-Presently this is YAML, but it could be any format.
-
-Each stored form requires a loader which can map its supported formats into a raw data structure
-explained below.
-
-A Workload Loader is nothing more than a reader which can read a specific format into a data
-structure.
-
-## Workload Template
-
-**Workload templates are presented to NoSQLBench standardized data structures.**
-
-This is a data structure in basic object form. It is merely the most obvious and direct in-memory
-representation of the contents of the stored form. In Java, this looks like basic collections and
-primitive types, such as Lists, Maps, Strings, and so on. The raw data structure form should always
-be the most commodity type of representation for the target language.
-
-The workload template is meant to be a runtime model which can be specified and presented to the
-scenario in multiple ways. As such, scripting layers and similar integrations can build such data
-structures programmatically, and provide them to the runtime directly. So long as the programmer is
-aware of what is valid, providing a workload template as a data structure should have the same
-effect as providing one from a yaml or json file.
-
-In this way, the NB workload data structure acts as a de-facto API of sorts, although it has no
-methods or functions. It is simply a commodity representation of a workload template. As such, the
-NoSQLBench runtime must provide clear feedback to the user when invalid constructions are given.
-
-What is valid, what is not, and what each possible construction must be codified in a clear and
-complete standard. The valid elements of a workload template are documented in
-[workload_templates.md](workload_templates.md), which serves as both an explainer and a living
-specification. The contents of this file are tested directly by NoSQLBench builds.
-
-## Workload API
-
-The workload template provides some layering possibilities which are applied automatically for the
-user by the workload API. Specifically, any bindings, params, or tags which are defined by name in
-an outer scope of the structure are automatically used by operations which do not define their own
-element of the same type and name. This happens at three levels:
-document scope, block scope, and op scope. More details on this are in the
-*designing workload* guide.
-
-Since the workload template is meant to enable layered defaults, there is a logical difference
-between the minimally-specified version of a workload and that seen by a driver. Drivers access the
-workload through the lens of the workload API, which is responsible for layering in the settings
-applied to each op template.
-
-This form of the workload is called the **rendered workload**, and is presented to the driver as an
-accessible object model.
-
-
+enable them to focus on building fast and correct op dispensers. These dispensers are also
+constructed before the workload starts running, but are used at high speed while the workload
+is running.
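The Mapping vs Running split can be illustrated with a small Java sketch. All names here (`OpSequenceSketch`, `opFor`) are hypothetical and not part of the NoSQLBench API, and the simple concatenating ratio expansion is only one possible sequencing strategy. The point is the shape: dispensers are arranged once at initialization time, and each cycle then resolves to an op with a constant-time lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical sketch: op dispensers are arranged once (initialization time), then each
 * cycle is mapped to an op with a constant-time array lookup (run time).
 */
final class OpSequenceSketch<T> {
    private final List<LongFunction<T>> sequence;

    // Initialization time: expand each dispenser by its ratio into a flat, indexed sequence
    OpSequenceSketch(Map<String, Integer> ratios, Map<String, LongFunction<T>> dispensers) {
        List<LongFunction<T>> seq = new ArrayList<>();
        ratios.forEach((name, ratio) -> {
            LongFunction<T> dispenser = dispensers.get(name);
            if (dispenser == null) {
                throw new IllegalArgumentException("no dispenser named " + name);
            }
            for (int i = 0; i < ratio; i++) {
                seq.add(dispenser);
            }
        });
        if (seq.isEmpty()) {
            throw new IllegalArgumentException("at least one op with a positive ratio is required");
        }
        this.sequence = List.copyOf(seq);
    }

    // Run time: pick the dispenser for this cycle and let it build the op for that cycle
    T opFor(long cycle) {
        int slot = (int) Math.floorMod(cycle, (long) sequence.size());
        return sequence.get(slot).apply(cycle);
    }
}
```

With ratios `read:1, write:3` (inserted in that order), cycles 0, 4, 8, and so on use the read dispenser, and every other cycle uses the write dispenser.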
--- a/adapters-api/src/main/resources/workload_definition/templated_operations.md
+++ b/adapters-api/src/main/resources/workload_definition/templated_operations.md
@@ -1,30 +1,15 @@
 # Op Templates

-The rules around op templates deserve a separate section, given that there are many techniques that
-a user can choose from.
+Op templates are the recipes that users provide for operations. They hold examples of payload
+data, metadata that configures the driver, timeout settings, and so on.

-The valid elements of the raw workload form are explained below, using YAML and JSON5 as a schematic
-language. This guide is not meant to be very explanatory for new users, but it can serve as a handy
-reference about how workloads can be structured.
+Users often have their own preferred name for the field that holds operations in a workload
+template, so several names are allowed: ops, op, operations, statements, statement. It doesn't
+matter whether the value is provided as a map, list, or scalar; these all allow for the same
+level of templating. Map forms are preferred, since the map key doubles as the op name. When you
+use list form, you have to provide the name as a separate field.

-Any bundled workload loader should test all of these fenced code blocks and confirm that the data
-structures are logically equivalent, using any json5 blocks as a trigger to compare against the
-prior block.
-**This document is a testable specification.**
-
-While some of the examples below appear to be demonstrating basic cross-format encoding, there is
-more going on. This document captures a set of basic sanity rules for all raw workload data, as well
-as visual examples and narratives to assist users and maintainers. In short, if you don't see what
-you want to do here, it is probably not valid in the format, and if you know that to be false, then
-this document needs to be updated with more details!
-
-The field used in workload templates to represent an operation can often be symbolic to users. For
-this reason, several names are allowed: ops, op, operations, statements, statement. It doesn't
-matter whether the value is provided as a map, list, or scalar. These all do the same thing,
-although an error is thrown if you specify more than one. The interpretation is always the same: An
-ordered collection of op templates. In map forms, the key is the op name. In forms which contain no
-provided name (as a key or as a property of an element map), a name is automatically provided by the
-API.
+If an op template has no name, the API provides one automatically.

 ### a single un-named op template

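The alias and naming rules above can be sketched in Java as a normalization over plain collections. This is a minimal illustration, not the loader NoSQLBench uses: the class name, the `opN` naming scheme for un-named templates, and the choice to reject more than one alias (as the previous wording of this page stated) are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: find the op field under any accepted alias and normalize the
 * map, list, or scalar form into an ordered map of op name -> op template.
 */
final class OpFieldSketch {
    // Accepted names for the op field
    static final List<String> ALIASES = List.of("ops", "op", "operations", "statements", "statement");

    static Map<String, Object> normalize(Map<String, ?> scope) {
        List<String> present = ALIASES.stream().filter(scope::containsKey).toList();
        if (present.size() > 1) {
            throw new IllegalArgumentException("only one op field may be given, found " + present);
        }
        Map<String, Object> named = new LinkedHashMap<>();
        if (present.isEmpty()) {
            return named;
        }
        Object value = scope.get(present.get(0));
        if (value instanceof Map<?, ?> map) {
            // map form: each key is the op name, each value is the op template
            map.forEach((key, template) -> named.put(String.valueOf(key), template));
        } else if (value instanceof List<?> list) {
            // list form: use a "name" field when present, otherwise generate one (assumed scheme)
            int index = 1;
            for (Object template : list) {
                String name = "op" + index++;
                if (template instanceof Map<?, ?> m && m.get("name") != null) {
                    name = String.valueOf(m.get("name"));
                }
                named.put(name, template);
            }
        } else {
            // scalar form: a single un-named op template
            named.put("op1", value);
        }
        return named;
    }
}
```

A `LinkedHashMap` is used so that the normalized templates keep the order in which they were written.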
--- a/adapters-api/src/main/resources/workload_definition/templated_operation_variations.md
+++ b/adapters-api/src/main/resources/workload_definition/templated_operation_variations.md
@@ -4,7 +4,7 @@ title: Op Template Variations

 # Op Templates Variations

-These examples are here to illustrate and test specific variations of op templates.
+These examples illustrate a variety of valid op template structures.

 ## Op Naming

@@ -301,47 +301,3 @@ ops:
 ]
 ```

-## keyed name statement-map form WITHOUT name field WITHOUT op key
-
-When statements are named by key, and you need to specify a query string of some type, then it must
-be explicitly part of the naming structure, as with a field name like `stmt` or `op`.
-
-*yaml:*
-
-```yaml
-ops:
-  op1:
-    field1: select * from ks1.tb1;
-    field2: field 2 value
-```
-
-*json:*
-
-```json5
-{
-  "ops": {
-    "op1": {
-      "field1": "select * from ks1.tb1;",
-      "field2": "field 2 value"
-    }
-  }
-}
-```
-
-*ops:*
-
-```json5
-[
-  {
-    "name": "block0--op1",
-    "op": {
-      "field1": "select * from ks1.tb1;",
-      "field2": "field 2 value"
-    },
-    "tags": {
-      "block": "block0",
-      "name": "block0--op1"
-    }
-  }
-]
-```
--- a/adapters-api/src/main/resources/workload_definition/parsed_op_api.md
+++ b/adapters-api/src/main/resources/workload_definition/parsed_op_api.md
@@ -1,9 +1,9 @@
-# Command API
+# ParsedOp API

 In the workload template examples, we show statements as being formed from a string value. This is a
 specific type of statement form, although it is possible to provide structured op templates as well.

-**The Command API is responsible for converting all valid op template forms into a consistent and
+**The ParsedOp API is responsible for converting all valid op template forms into a consistent and
 unambiguous model.** Thus, the rules for mapping the various forms to the command model must be
 precise. Those rules are the substance of this specification.

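One rule a consistent op model has to apply is layered defaults. The Workload API text removed from the README above describes bindings, params, and tags defined at document or block scope being used by any op that does not define its own element of the same type and name. A minimal sketch of that precedence, assuming a simple name-by-name merge applied separately to each element type; the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of layered defaults: document -> block -> op, innermost wins per name. */
final class LayeredDefaultsSketch {
    @SafeVarargs
    static Map<String, Object> layer(Map<String, ?>... scopesOuterToInner) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (Map<String, ?> scope : scopesOuterToInner) {
            // an inner scope replaces any outer entry with the same name
            resolved.putAll(scope);
        }
        return resolved;
    }
}
```

Called as `layer(docParams, blockParams, opParams)`, an op-level value wins over a block-level one, which in turn wins over a document-level one; bindings and tags would each be layered the same way in their own call.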
--- a/adapters-api/src/main/resources/workload_definition/spectest_formatting.md
+++ b/adapters-api/src/main/resources/workload_definition/spectest_formatting.md
@@ -0,0 +1,46 @@
+# SpecTest Formatting
+
+The specifications and examples follow a pattern:
+
+1. All or part of a templated workload in yaml format.
+2. The JSON equivalent as it would be loaded. This is cross-checked against the result of parsing
+   the yaml into data.
+3. The Workload API view of the same data rendered as a JSON data structure. This is cross-checked
+   against the workload API's rendering of the loaded data.
+
+To be matched by the testing layer, each section must be prefixed with an emphasized format
+marker, like this:
+
+*format:*
+
+```text
+body of example
+```
+
+Further, to match the pattern above, these must occur in sequences like the following, with no other
+intervening content:
+
+*yaml:*
+
+```yaml
+# some yaml here
+```
+
+*json:*
+
+```
+[]
+```
+
+*ops:*
+
+```
+[]
+```
+
+The above sequence of 6 contiguous markdown elements forms a pattern that the specification
+testing harness recognizes. The format markers must use exactly these names, and each must be
+followed by a fenced code section.
+
+All the markdown files in this directory are loaded and scanned for this pattern, and all
+such sequences are verified each time NoSQLBench is built.
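A harness that recognizes this pattern can be fairly small. The Java sketch below is hypothetical (it is not the NoSQLBench test harness, and its names are illustrative): it reduces a markdown file to a list of elements, ignoring blank lines, then looks for the six-element `*yaml:*` / fence / `*json:*` / fence / `*ops:*` / fence sequence, so that any other intervening content breaks a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of finding yaml/json/ops example triples in a markdown file. */
final class SpecTestScannerSketch {
    // Exactly one of marker or fenceBody is non-null; marker "" stands for any other content
    record Element(String marker, String fenceBody) {}
    record Example(String yaml, String json, String ops) {}

    private static final Pattern MARKER = Pattern.compile("^\\*(\\w+):\\*$");

    static List<Element> elements(String markdown) {
        List<Element> out = new ArrayList<>();
        String[] lines = markdown.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].strip();
            if (line.isEmpty()) {
                continue; // blank lines only separate elements
            }
            Matcher m = MARKER.matcher(line);
            if (m.matches()) {
                out.add(new Element(m.group(1), null));
            } else if (line.startsWith("```")) {
                StringBuilder body = new StringBuilder();
                // collect lines until the closing fence (or end of input)
                while (++i < lines.length && !lines[i].strip().startsWith("```")) {
                    body.append(lines[i]).append('\n');
                }
                out.add(new Element(null, body.toString()));
            } else {
                out.add(new Element("", null)); // any other content breaks a sequence
            }
        }
        return out;
    }

    static List<Example> examples(String markdown) {
        List<Element> els = elements(markdown);
        List<Example> found = new ArrayList<>();
        String[] expected = {"yaml", "json", "ops"};
        for (int i = 0; i + 6 <= els.size(); i++) {
            boolean match = true;
            for (int k = 0; k < 3 && match; k++) {
                Element marker = els.get(i + 2 * k);
                Element fence = els.get(i + 2 * k + 1);
                match = expected[k].equals(marker.marker()) && fence.fenceBody() != null;
            }
            if (match) {
                found.add(new Example(els.get(i + 1).fenceBody(), els.get(i + 3).fenceBody(),
                        els.get(i + 5).fenceBody()));
                i += 5; // continue after the matched sequence
            }
        }
        return found;
    }
}
```

A real harness would then parse the yaml and json bodies and compare the resulting data structures, which needs a YAML/JSON library and is left out of this sketch.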
--- a/adapters-api/src/main/resources/workload_definition/templated_workloads.md
+++ b/adapters-api/src/main/resources/workload_definition/templated_workloads.md
@@ -1,21 +1,6 @@
-# Workload Templates
+# Workload Structure

-The valid elements of the raw workload form are explained below, using YAML and JSON5 as a schematic
-language. This guide is not meant to be very explanatory for new users, but it can serve as a handy
-reference about how workloads can be structured.
-
-Any bundled workload loader should test all of these fenced code blocks and confirm that the data
-structures are logically equivalent, using any json5 blocks as a trigger to compare against the
-prior block.
-**This document is a testable specification.**
-
-While some examples below appear to be demonstrating basic cross-format encoding, there is more
-going on. This document captures a set of basic sanity rules for all raw workload data, as well as
-visual examples and narratives to assist users and maintainers. In short, if you don't see what you
-want to do here, it is probably not valid in the format, and if you know that to be false, then this
-document needs to be updated with more details!
-
-# Keywords
+## Keywords

 The following words have special meaning in templated workloads:

@@ -28,46 +13,6 @@ The following words have special meaning in templated workloads:
 - op, ops, operations statement, statements - defines op templates
 - blocks - groups any or all elements

-# Layout of Examples
-
-The specifications and examples below follow this pattern:
-
-1. Some or part of a templated workload in yaml format.
-2. The JSON equivalent as it would be loaded. This is cross-checked against the result of parsing
-   the yaml into data.
-3. The Workload API view of the same data rendered as a JSON data structure. This is cross-checked
-   against the workload API's rendering of the loaded data.
-
-To be matched by the testing layer, you must prefix each section with a format marker with emphasis,
-like this:
-
-*format:*
-
-```text
-body of example
-```
-
-Further, to match the pattern above, these must occur in sequences like the following, with no other
-intervening content:
-
-*yaml:*
-
-```yaml
-# some yaml here
-```
-
-*json:*
-
-```
-[]
-```
-
-*ops:*
-
-```
-[]
-```
-
 ---

 ## Description
@@ -364,15 +309,14 @@ tags:

 ## Blocks

-Blocks are used to group operations which should be configured or run together such as during a
-specific part of a test sequence. Blocks can contain any of the defined elements above. Blocks are
-most useful for organizing a set of operations together, particularly when you put a tag on a block
-to filter by.
+Blocks are used to logically partition a workload for the purposes of grouping, configuring or
+executing subsets and op sequences. Blocks can contain any of the defined elements above.
+Every op template within a block automatically gets a tag named `block` whose value is the
+block name. This makes it easy to select a whole block at a time with a tag filter like
+`tags=block:schema`.

 Blocks are not recursive. You may not put a block inside another block.

-You can think of a block as a subgroup of a document, where all the fields that are described above
-can be specified.

 ### named blocks as a map of property maps

@@ -571,4 +515,4 @@ blocks:
 # Putting things together

 This document is focused on the basic properties that can be added to a templated workload. To see
-how they are combined together, see [templated_operations.md](templated_operations.md).
+how they are combined together, see [Op Template Basics](op_template_basics.md).
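The automatic `block` tag and the `tags=block:schema` selection described under Blocks can be sketched as follows. The record and method names are hypothetical, and the filter semantics are assumptions made for illustration: comma-separated `name:value` conditions that must all hold, with each value treated as a regular expression that must match the whole tag value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of automatic block tagging and tag-filter selection. */
final class BlockTagSketch {
    record OpTemplate(String name, Map<String, String> tags) {}

    // Every op template in a block receives the tag block=<block name>
    static List<OpTemplate> tagBlock(String blockName, Map<String, Map<String, String>> opsWithTags) {
        List<OpTemplate> out = new ArrayList<>();
        opsWithTags.forEach((opName, userTags) -> {
            Map<String, String> tags = new LinkedHashMap<>(userTags);
            tags.put("block", blockName);
            out.add(new OpTemplate(opName, tags));
        });
        return out;
    }

    // Assumed filter form: "name1:regex1,name2:regex2"; every condition must match
    static List<OpTemplate> select(List<OpTemplate> ops, String filter) {
        List<OpTemplate> selected = new ArrayList<>();
        for (OpTemplate op : ops) {
            if (matchesAll(op.tags(), filter)) {
                selected.add(op);
            }
        }
        return selected;
    }

    private static boolean matchesAll(Map<String, String> tags, String filter) {
        for (String condition : filter.split(",")) {
            String[] nameAndPattern = condition.split(":", 2);
            String actual = tags.get(nameAndPattern[0].strip());
            // a bare tag name only requires that the tag exists
            String pattern = nameAndPattern.length > 1 ? nameAndPattern[1].strip() : ".*";
            if (actual == null || !actual.matches(pattern)) {
                return false;
            }
        }
        return true;
    }
}
```

With ops from a `schema` block and a `main` block, `select(ops, "block:schema")` returns only the templates from the `schema` block.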