more specification details

Jonathan Shook 2021-05-27 01:26:52 -05:00
parent 32a42f19a0
commit c6d8e4cf4c
7 changed files with 186 additions and 495 deletions

View File

@ -35,7 +35,8 @@ public class CommandTemplate<T> {
    private final String name;
    private final Map<String, String> statics = new HashMap<>();
    private final Map<String, StringBindings> dynamics = new HashMap<>();
    private final transient int mapsize;

    /**
     * Create a CommandTemplate directly from an OpTemplate.
@ -149,7 +150,7 @@ public class CommandTemplate<T> {
    /**
     * Apply the provided binding functions to the command template, yielding a map with concrete values
     * to be used by a native command.
     *
     * @param cycle The cycle value which will be used by the binding functions

View File

@ -1,137 +1,133 @@
# Workload Definition
This directory contains the design for a standard and extensible way of loading workload definitions
into a NoSQLBench activity.
It is highly recommended that you familiarize yourself with the details below the specifications if
you have not already. For simplicity, all the user-facing language is called *templating*, and all
developer-facing functionality is called the *API*.
## Templating Language
When users want to specify a set of operations to perform, they do so with the workload templating
format.
- [Templated Workloads](templated_workloads.md) - Basics of workload templating
- [Templated Operations](templated_operations.md) - Details of op templating
- [Template Variables](template_variables.md) - Textual macros and default values
## Workload API
After a workload template is loaded into an activity, it is presented to the driver in an API which
is suitable for building executable ops in the native driver.
- [Command API](command_api.md) - Defines the API which developers see after a workload is fully
loaded.
## Workload Templating Format
The first half of this specification is focused solely on the schematic form that users provide to
NoSQLBench in order to describe a set of operations to perform.
It covers both the user-facing schematic of what can be specified and the driver-facing API which
allows for a variety of driver implementations to be supported.
## Background Work
If you want to understand the rest of this document, it is crucial that you have a working knowledge
of the standard YAML format and several examples from the current drivers. You can learn this from
the main documentation which demonstrates step-by-step how to build a workload. Reading further in
this document will be most useful for core NB developers, but we are happy to take input from
anybody.
## Overview of Workload Mapping
The purpose of this effort is to thoroughly standardize the concepts and machinery of how operations
are mapped from user configuration to operations in flight. In the past, much of the glue logic
between YAML and an operation has been left to the NoSQLBench ActivityType -- the high-level driver
which acts as a binding layer between vendor APIs and the NoSQLBench core machinery.
Now that there are several drivers implemented, each with their own minor variations in how YAML
could be interpreted, it's time to take stock of the common path and codify it. The expected
outcomes of this effort are several:
- NoSQLBench drivers (AKA ActivityTypes) get much easier to implement.
- Standard NB driver features are either supported uniformly, or not at all.
- The semantics of op templates and workload configuration are much more clearly specified and
demonstrated for users.
- All API surface area for this facet of NB can be tested in a very tangible way, with docs and
testing sharing one and the same examples.
While these desires are enough alone to warrant an improvement, they are also key to simplifying the
design of two new drivers which are in the works: gRPC and the Apache Cassandra Java driver version
4. The gRPC driver will need to have a very clearly specified logical boundary on the NB side to
keep the combined system simple enough to explain and maintain.
## Op Mapping Stages
As a workload definition is read and mapped into the form of an executable activity in the NB
process, it takes on different forms. Each stage can be thought of as a more refined view or API
through which the workload can be seen. At each stage, specific processing is required to promote
the more generic form into a more specialized and consumable form by the next layer.
It should be noted that mapping workload definitions to operations is not something that needs to be
done quickly. Instead, it is more important to focus on user experience factors, such as
flexibility, obviousness, robustness, correctness, and so on. Thus, priority of design factors in
this part of NB is placed more on clear and purposeful abstractions and less on optimizing for
speed. The clarity and detail which is conveyed by this layer to the driver developer will then
enable them to focus on building fast and correct op dispensers, which are built before the main
part of running a workload, but which are used at high speed while the workload is running.
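To make the prepare-once, dispense-per-cycle idea concrete, here is a minimal sketch; the interface
name and shape are illustrative only, not the NB API:

```java
// Illustrative sketch only: an op dispenser is prepared once, before the run starts,
// and is then asked for a ready-to-execute operation at high rate during the run.
// The name OpDispenserSketch and its exact shape are assumptions for illustration.
@FunctionalInterface
public interface OpDispenserSketch<T> {
    T dispense(long cycle); // called once per cycle while the workload is running
}
```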
## Stored Form
Presently this is YAML, but it could be any format.
Each stored form requires a loader which can map its supported formats into a raw data structure
explained below.
A Workload Loader is nothing more than a reader which can read a specific format into a data
structure.
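As a sketch of how small that responsibility is, a hypothetical YAML loader could be little more
than the following. The class name and the use of SnakeYAML here are illustrative assumptions, not
a description of the actual NB loader:

```java
import org.yaml.snakeyaml.Yaml;

import java.util.Map;

// Hypothetical sketch of a workload loader: it only converts a stored form (YAML here)
// into a plain data structure, leaving all interpretation to later mapping stages.
public class YamlWorkloadLoaderSketch {
    public Map<String, Object> load(String storedForm) {
        // SnakeYAML maps YAML documents onto basic Java collections and scalars.
        return new Yaml().load(storedForm);
    }
}
```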
## Workload Template
**Workload templates are presented to NoSQLBench as standardized data structures.**
This is a data structure in basic object form. It is merely the most obvious and direct in-memory
representation of the contents of the stored form. In Java, this looks like basic collections and
primitive types, such as Lists, Maps, Strings, and so on. The raw data structure form should always
be the most commodity type of representation for the target language.
The workload template is meant to be a runtime model which can be specified and presented to the
scenario in multiple ways. As such, scripting layers and similar integrations can build such data
structures programmatically, and provide them to the runtime directly. So long as the programmer is
aware of what is valid, providing a workload template as a data structure should have the same
effect as providing one from a yaml or json file.
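For example, a small workload template could be handed to the runtime as plain collections rather
than as a YAML file. The sketch below only illustrates the idea; the class name and the op content
are assumptions made for illustration, not part of the documented API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: building a workload template directly as basic Java collections.
// Provided that the structure is valid, it should have the same effect as loading the
// equivalent YAML or JSON from a file.
public class ProgrammaticTemplateSketch {
    public static Map<String, Object> buildTemplate() {
        return Map.of(
            "bindings", Map.of("cycle", "Identity();"),
            "statements", List.of(
                Map.of("op1", "select * from bar.table where id={cycle};")
            )
        );
    }
}
```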
In this way, the NB workload data structure acts as a de-facto API of sorts, although it has no
methods or functions. It is simply a commodity representation of a workload template. As such, the
NoSQLBench runtime must provide clear feedback to the user when invalid constructions are given.
What is valid, what is not, and what each possible construction means must be codified in a clear
and complete standard. The valid elements of a workload template are documented in
[workload_templates.md](workload_templates.md), which serves as both an explainer and a living
specification. The contents of this file are tested directly by NoSQLBench builds.
## Workload API
The workload template provides some layering possibilities which are applied automatically for the
user by the workload API. Specifically, any bindings, params, or tags which are defined by name in
an outer scope of the structure are automatically used by operations which do not define their own
element of the same type and name. This happens at three levels:
document scope, block scope, and op scope. More details on this are in the
*designing workload* guide.
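As a rough sketch of the layering rule, assuming each scope is represented as a simple map of named
elements (this is not the actual NB implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the layering rule: names defined at an outer scope apply to an
// op unless the block or the op defines an element of the same type and name itself.
public class LayeringSketch {
    public static Map<String, String> effectiveElements(Map<String, String> docScope,
                                                        Map<String, String> blockScope,
                                                        Map<String, String> opScope) {
        Map<String, String> effective = new HashMap<>(docScope);
        effective.putAll(blockScope); // block-level entries override document-level ones
        effective.putAll(opScope);    // op-level entries override both
        return effective;
    }
}
```

The same precedence applies independently to bindings, params, and tags.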
Since the workload template is meant to enable layered defaults, there is a logical difference
between the minimally-specified version of a workload and that seen by a driver. Drivers access the
workload through the lens of the workload API, which is responsible for layering in the settings
applied to each op template.
This form of the workload is called the **rendered workload**, and is presented to the driver as an
accessible object model.

View File

@ -1,3 +1,26 @@
# Command API
Command templates are the third layer of workload templating. As described in other spec documents,
the other layers are:
1. [Workload level templates](templated_workloads.md) - This specification covers the basics of a
workload template, including the valid properties and structure.
2. [Operation level templates](templated_operations.md) - This specification covers how operations
can be specified, including semantics and structure.
3. Command level templates, explained below. These are the detailed views of what goes into an op
template, parsed and structured in a way that allows for efficient use at runtime.
Users do not create command templates directly. Instead, these are the *parsed* form of op templates
as seen by the NB driver. The whole point of a command template is to provide crisp semantics and
structure about what a user is asking a driver to do.
Command templates are essentially schematics for an operation. They are a structural interpretation
of the content provided by users in op templates. Each op template provided can be converted into a
command template. In short, the op template is the form that users tend to edit in yaml or provide
as a data structure via scripting. **Command templates are the view of an op template as seen by an
NB driver.**
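To make the distinction concrete, the following sketch splits the fields of a single op template
into static and dynamic parts, roughly mirroring the `statics` and `dynamics` maps of the
`CommandTemplate` class; the detection by `{}` markers and the class name used here are
simplifications for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: separating op template fields into values which are fixed
// for every cycle (statics) and values which reference bindings such as {bar} and must
// be rendered per cycle (dynamics).
public class CommandTemplateSketch {
    public static void main(String[] args) {
        Map<String, String> opFields = Map.of("test1", "foo", "test2", "{bar}");

        Map<String, String> statics = new HashMap<>();
        Map<String, String> dynamics = new HashMap<>();
        opFields.forEach((name, value) -> {
            if (value.contains("{")) {
                dynamics.put(name, value); // must be realized per cycle via a binding
            } else {
                statics.put(name, value);  // known up front, same for every cycle
            }
        });

        System.out.println("statics:  " + statics);   // {test1=foo}
        System.out.println("dynamics: " + dynamics);  // {test2={bar}}
    }
}
```

For an op template field set such as `test1=foo test2={bar}`, this split yields one static field and
one dynamic field bound to `bar`.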
```
### Command Templates
Command templates are part of the workload API.
@ -48,3 +71,6 @@ in contrast, may need to be realized dynamically for each cycle, given
that you don't know the value of the fields in the command until you know
the cycle value.
```

View File

@ -0,0 +1,2 @@
# Template Variables

View File

@ -110,10 +110,11 @@ NB. These are basically command line templates which can be invoked automaticall
name out on your command line. More details on their usage are in the workload construction guide.
We're focused merely on the structural rules here.
### single un-named step
*yaml:*
```yaml
# As named scenarios with a single un-named step
scenarios:
  default: run driver=diag cycles=10
```
@ -134,12 +135,11 @@ scenarios:
[]
```
### multiple named steps
*yaml:*
```yaml
# As named scenarios with named steps
scenarios:
  default:
    step1: run alias=first driver=diag cycles=10
@ -165,12 +165,11 @@ scenarios:
[]
```
### list of un-named steps
*yaml:*
```yaml
# As named scenarios with a list of un-named steps
scenarios:
  default:
    - run alias=first driver=diag cycles=10
@ -196,6 +195,46 @@ scenarios:
[]
```
### silent locked step parameters
For scenario steps which should not be overridable by user parameters on the command line, a double
equals is used to lock the values for a given step without informing the user that their provided
value was ignored. This can be useful in cases where there are multiple steps and some parameters
should only be changeable for some steps.
*yaml:*
```yaml
# The user may not change the value of the alias parameter. Any value they provide for it on the
# command line is silently ignored, and the locked value below is used instead.
scenarios:
  default: run alias==first driver=diag cycles=10
```
*json:*
```json5
{
"scenarios": {
"default": "run alias===first driver=diag cycles=10"
}
}
```
*ops:*
```json5
[]
```
### verbose locked step parameters
For scenario steps which should not be overridable by user parameters on the command line, a triple
equals (as in `alias===first`) is used to indicate that changing these parameters is not allowed. If
a user tries to override a verbose locked parameter, an error is thrown and the scenario is not
allowed to run. This can be useful when you want to clearly indicate that a parameter must remain as
it is.
---
## Bindings

View File

@ -1,390 +0,0 @@
# Workload API
The valid elements of the raw workload form are explained below, using
YAML and JSON5 as a schematic language. This guide is not meant to be very
explanatory for new users, but it can serve as a handy reference about how
workloads can be structured.
Any bundled workload loader should test all of these fenced code blocks
and confirm that the data structures are logically equivalent, using any
json5 blocks as a trigger to compare against the prior block.
**This document is a testable specification.**
While some of the examples below appear to be demonstrating basic
cross-format encoding, there is more going on. This document captures a
set of basic sanity rules for all raw workload data, as well as visual
examples and narratives to assist users and maintainers. In short, if you
don't see what you want to do here, it is probably not valid in the
format, and if you know that to be false, then this document needs to be
updated with more details!
---
## Description
**zero or one `description` fields:**
The first line of the description represents the summary of the
description in summary views. Otherwise, the whole value is used.
*yaml format:*
```yaml
description: |
  summary of this workload
  and more details
```
*json format:*
```json5
{
"description": "summary of this workload\nand more details\n"
}
```
*ops data format:*
```json5
[]
```
<sup>* </sup>This is empty since there are no statements.
---
## Scenarios
**zero or one `scenarios` fields, containing one of the following forms**
The way that you create macro-level workloads from individual stages is
called *named scenarios* in NB. These are basically command line templates
which can be invoked automatically by calling their name out on your
command line. More details on their usage are in the workload construction
guide. We're focused merely on the structural rules here.
```yaml
# As named scenarios with a single un-named step
scenarios:
  default: run driver=diag cycles=10
```
```json5
{
"scenarios": {
"default": "run driver=diag cycles=10"
}
}
```
OR
```yaml
# As named scenarios with named steps
scenarios:
  default:
    step1: run alias=first driver=diag cycles=10
    step2: run alias=second driver=diag cycles=10
```
```json5
{
"scenarios": {
"default": {
"step1": "run alias=first driver=diag cycles=10",
"step2": "run alias=second driver=diag cycles=10"
}
}
}
```
OR
```yaml
# As named scenarios with a list of un-named steps
scenarios:
  default:
    - run alias=first driver=diag cycles=10
    - run alias=second driver=diag cycles=10
```
```json5
{
"scenarios": {
"default": [
"run alias=first driver=diag cycles=10",
"run alias=second driver=diag cycles=10"
]
}
}
```
---
## Bindings
**zero or one `bindings` fields, containing a map of named bindings
recipes**
Bindings are the functions which synthesize data for your operations. They
are specified in recipes which are just function chains from the provided
libraries.
```yaml
bindings:
  cycle: Identity();
  name: NumberNameToString();
```
```json5
{
"bindings": {
"cycle": "Identity();",
"name": "NumberNameToString();"
}
}
```
---
## Params
**zero or one `params` fields, containing a map of parameter names to
values**
Params are modifiers to your operations. They specify important details
which are not part of the operation's command or payload, like consistency
level, or timeout settings.
```yaml
params:
  param1: pvalue1
  param2: pvalue2
```
```json5
{
"params": {
"param1": "pvalue1",
"param2": "pvalue2"
}
}
```
---
## Tags
**zero or one `tags` fields, containing a map of tag names and values**
Tags are how you mark your operations for special inclusion into tests.
They are basically naming metadata that lets you filter what type of
operations you actually use. Further details on tags are in the workload
construction guide.
```yaml
tags:
  phase: main
```
```json5
{
"tags": {
"phase": "main"
}
}
```
---
## Op Templates
The representation of an operation in the workload definition is the most
flexible as well as the most potentially confusing. The reasons for this
are explained in the README for this module. Thus, it is useful to be
detail oriented in these examples.
An op template, as expressed by the user, is just a recipe for how to
construct an operation at runtime. They are not operations. They are
merely blueprints that the driver uses to create real operations that can
be executed.
This applies at two levels:
1) When the user specifies their op template as part of a workload
definition.
2) When the loaded workload definition is promoted to a convenient
OpTemplate type for use by the driver developer.
Just be aware that this term can be used in both ways.
For historic reasons, the field name used for op templates in yaml files
is *statements*, although it will be valid to use any of `statement`,
`statements`, `op`, `ops`, `operation`, or `operations`. This is because
these names are all symbolic and familiar to certain protocols. The
recommended name is `ops` for most cases. Internally, pre-processing will
likely be used to convert them all to simply `ops`.
### a single un-named op template
```yaml
op: select * from bar.table;
```
```json5
{
"op": "select * from bar.table;"
}
```
### un-named op templates as a list of strings
```yaml
ops:
  - select * from bar.table;
```
```json5
{
"ops": [
"select * from bar.table;"
]
}
```
### named op templates as a list of maps
```yaml
ops:
  - op1: select * from bar.table;
```
```json5
{
"ops": [
{
"op1": "select * from bar.table;"
}
]
}
```
### named op templates as a map of strings
```yaml
ops:
  op1: select * from bar.table;
```
```json5
{
"ops": {
"op1": "select * from bar.table;"
}
}
```
### named op templates as a map of maps
```yaml
ops:
  op1:
    stmt: select * from bar.table;
```
```json5
{
"ops": {
"op1": {
"stmt": "select * from bar.table;"
}
}
}
```
---
## Blocks
Blocks are used to group operations which should be configured or run
together such as during a specific part of a test sequence. Blocks can
contain any of the defined elements above.
### named blocks as a map of property maps
```yaml
blocks:
  block1:
    ops:
      op1: select * from bar.table;
      op2:
        type: batch
        stmt: insert into bar.table (a,b,c) values (1,2,3);
```
```json5
{
"blocks": {
"block1": {
"ops": {
"op1": "select * from bar.table;",
"op2": {
"type": "batch",
"stmt": "insert into bar.table (a,b,c) values (1,2,3);"
}
}
}
}
}
```
### un-named blocks as a list of property maps
```yaml
blocks:
  - ops:
      op1: select * from bar.table;
      op2:
        type: batch
        stmt: insert into bar.table (a,b,c) values (1,2,3);
```
```json5
{
"blocks": [
{
"ops": {
"op1": "select * from bar.table;",
"op2": {
"type": "batch",
"stmt": "insert into bar.table (a,b,c) values (1,2,3);"
}
}
}
]
}
```
---
## Names
All documents, blocks, and ops within a workload can have an assigned
name. When map and list forms are both supported for entries, the map
form provides the name. When list forms are used, an additional field
named `name` can be used.
```yaml
blocks:
  - name: myblock
    op: "test op"
```
```json5
{
"blocks" : [
{
"name": "myblock",
"op": "test op"
}
]
}
```
# Normalization

View File

@ -1,5 +1,7 @@
package io.nosqlbench.engine.api.templating;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import io.nosqlbench.engine.api.activityconfig.StatementsLoader;
import io.nosqlbench.engine.api.activityconfig.yaml.OpTemplate;
import io.nosqlbench.engine.api.activityconfig.yaml.StmtsDocList;
@ -19,4 +21,19 @@ public class CommandTemplateTest {
        assertThat(ct.isStatic()).isTrue();
    }

    @Test
    public void testCommandTemplateFormat() {
        // Serialize the parsed command template to JSON so that its structure can be inspected.
        Gson gson = new GsonBuilder().setPrettyPrinting().create();
        StmtsDocList stmtsDocs = StatementsLoader.loadString("" +
            "statements:\n" +
            " - s1: test1=foo test2={bar}\n" +
            "   bindings:\n" +
            "    bar: NumberNameToString();\n");
        OpTemplate stmtDef = stmtsDocs.getStmts().get(0);
        CommandTemplate ct = new CommandTemplate(stmtDef);
        String format = gson.toJson(ct);
        System.out.println(format);
    }
}