documentation for named scenarios

2025-02-25 18:55:28 -06:00 · 2020-03-24 20:58:34 -05:00
parent c19ce84693
commit 5506858edf
2 changed files with 206 additions and 24 deletions
--- a/engine-docs/src/main/resources/docs-for-nb/04_designing_workloads/00_yaml_org.md
+++ b/engine-docs/src/main/resources/docs-for-nb/04_designing_workloads/00_yaml_org.md
@@ -3,43 +3,68 @@ title: 00 YAML Organization
 weight: 00
 ---

-It is best to keep every workload self-contained within a single YAML file, including schema, data rampup, and the main phase of testing.
+It is best to keep every workload self-contained within a single YAML file,
+including schema, data rampup, and the main phase of testing.
 The phases of testing are controlled by tags as described in the Standard YAML section.

 :::info
 The phase names described below have been adopted as a convention within the
-built-in workloads. It is strongly advised that new workload YAMLs use the same tagging scheme so that workload are more plugable across YAMLs.
+built-in workloads. It is strongly advised that new workload YAMLs use the same
+tagging scheme so that workloads are more pluggable across YAMLs.
 :::

 ### Schema phase

-The schema phase is simply a phase of your test which creates the necessary schema on your target system. For CQL, this generally consists of a keyspace and one ore more table statements. There is no special schema layer in nosqlbench. All statements executed are simply statements. This provides the greatest flexibility in testing since every activity type is allowed to control its DDL and DML using the same machinery.
+The schema phase is simply a phase of your test which creates the necessary schema
+on your target system. For CQL, this generally consists of a keyspace and one or
+more table statements. There is no special schema layer in nosqlbench. All statements
+executed are simply statements. This provides the greatest flexibility in testing since
+every activity type is allowed to control its DDL and DML using the same machinery.

-The schema phase is normally executed with defaults for most parameters. This means that statements will execute in the order specified in the YAML, in serialized form, exactly once. This is a welcome side-effect of how the initial parameters like _cycles_ is set from the statements which are activated by tagging.
+The schema phase is normally executed with defaults for most parameters. This means
+that statements will execute in the order specified in the YAML, in serialized form,
+exactly once. This is a welcome side-effect of how initial parameters like _cycles_
+are set from the statements which are activated by tagging.

-You can mark statements as schema phase statements by adding this set of tags to the statements, either directly, or by block:
+You can mark statements as schema phase statements by adding this set of tags to the
+statements, either directly, or by block:

    tags:
      phase: schema

 ### Rampup phase

-When you run a performance test, it is very important to be aware of how much data is present. Higher density tests are more realistic for systems which accumulate data over time, or which have a large working set of data. The amount of data on the system you are testing should recreate a realistic amount of data that you would run in production, ideally. In general, there is a triangular trade-off between service time, op rate, and data density.
+When you run a performance test, it is very important to be aware of how much data is
+present. Higher density tests are more realistic for systems which accumulate data over
+time, or which have a large working set of data. Ideally, the amount of data on the
+system you are testing should match the amount you would expect to hold in production.
+In general, there is a triangular trade-off between service time, op rate, and data density.

-It is the purpose of the _rampup_ phase to create the backdrop data on a target system that makes a test meaningful for some level of data density. Data density is normally discussed as average per node, but it is also important to consider distribution of data as it varies from the least dense to the most dense nodes.
+It is the purpose of the _rampup_ phase to create the backdrop data on a target system
+that makes a test meaningful for some level of data density. Data density is normally
+discussed as average per node, but it is also important to consider distribution of data
+as it varies from the least dense to the most dense nodes.

-Because it is useful to be able to add data to a target cluster in an incremental way, the bindings which are used with a _rampup_ phase may actually be different from the ones used for a _main_ phase. In most cases, you want the rampup phase to create data in a way that incrementally adds to the population of data in the cluster. This allows you to add some data to a cluster with `cycles=0..1M` and then decide whether to continue adding data using the next contiguous range of cycles, with `cycles=1M..2M` and so on.
+Because it is useful to be able to add data to a target cluster in an incremental way,
+the bindings which are used with a _rampup_ phase may actually be different from the
+ones used for a _main_ phase. In most cases, you want the rampup phase to create data
+in a way that incrementally adds to the population of data in the cluster. This allows
+you to add some data to a cluster with `cycles=0..1M` and then decide whether to
+continue adding data using the next contiguous range of cycles, with `cycles=1M..2M` and so on.

-You can mark statements as rampup phase statements by adding this set of tags to the statements, either directly, or by block:
+You can mark statements as rampup phase statements by adding this set of tags to the
+statements, either directly, or by block:

    tags:
      phase: rampup

 ### Main phase

-The main phase of a nosqlbench scenario is the one during which you really care about the metric. This is the actual test that everything else has prepared your system for.
+The main phase of a nosqlbench scenario is the one during which you really care about
+the metric. This is the actual test that everything else has prepared your system for.

-You can mark statement as schema phase statements by adding this set of tags to the statements, either directly, or by block:
+You can mark statements as main phase statements by adding this set of tags to the
+statements, either directly, or by block:

    tags:
      phase: main
--- a/engine-docs/src/main/resources/docs-for-nb/04_designing_workloads/10_named_scenarios.md
+++ b/engine-docs/src/main/resources/docs-for-nb/04_designing_workloads/10_named_scenarios.md
@@ -7,6 +7,8 @@ weight: 10

 There is one final element of a yaml that you need to know about: _named scenarios_.

+**Named Scenarios allow anybody to run your testing workflows with a single command.**
+
 You can provide named scenarios for a workload like this:

 ```yaml
@@ -22,7 +24,27 @@ scenarios:
 This provides a way to specify more detailed workflows that users may want
 to run without them having to build up a command line for themselves.

-There are two ways to invoke a named scenario.
+A couple of other forms are supported in the YAML, for terseness:
+```yaml
+scenarios:
+ oneliner: run driver=diag cycles=10
+ mapform:
+  part1: run driver=diag cycles=10 alias=part1
+  part2: run driver=diag cycles=20 alias=part2
+```
+These forms are conveniences for common editing habits; internally, each one is
+read as a list of commands. In the map form, the names are discarded,
+but they may be descriptive enough for use as inline docs for some users. The
+commands run in the order listed, since the names have no bearing on the order.
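
As a rough illustration of how the three forms collapse to one shape, the following Python sketch normalizes each form into a list of command strings. It is not nosqlbench's implementation: the function name `normalize_scenario` is hypothetical, and the input is assumed to be the already-parsed `scenarios:` mapping.

```python
def normalize_scenario(value):
    """Return the commands of one named scenario as a list of strings.

    Hypothetical helper for illustration only; nosqlbench's own parsing
    code is not shown in this document.
    """
    if isinstance(value, str):
        return [value]                  # one-liner form: a single command
    if isinstance(value, dict):
        return list(value.values())     # map form: names dropped, order kept
    if isinstance(value, list):
        return list(value)              # list form: already in the target shape
    raise TypeError(f"unsupported scenario form: {type(value).__name__}")


# The parsed equivalent of a scenarios: section using two of the forms.
scenarios = {
    "oneliner": "run driver=diag cycles=10",
    "mapform": {
        "part1": "run driver=diag cycles=10",
        "part2": "run driver=diag cycles=20",
    },
}

normalized = {name: normalize_scenario(v) for name, v in scenarios.items()}
```

The map form relies on the mapping preserving the order in which its entries were written, which Python dictionaries do.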
+
+## Scenario selection
+
+When a named scenario is run, it is *always* named, so that it can be looked up
+in the list of named scenarios under your `scenarios:` property. The only
+exception to this is when an explicit scenario name is not found on the command
+line, in which case it is automatically assumed to be _default_.
+
+Some examples may be more illustrative:

 ```
 # runs the scenario named 'default' if it exists, or throws an error if it does not.
@@ -32,30 +54,165 @@ nb myworkloads default

 # runs the named scenario 'longrun' if it exists, or throws an error if it does not.
 nb myworkloads longrun
+
+# runs the named scenario 'longrun' if it exists, or throws an error if it does not.
+# this is simply the canonical form which is more verbose, but more explicit.
+nb scenario myworkloads longrun
+
+# run multiple named scenarios from one workload, and then some from another
+nb scenario myworkloads longrun default longrun scenario another.yaml name1 name2
+# In this form ^ you may have to add the explicit form to avoid conflicts between
+# workload names and scenario names. That's why the explicit form is provided, after all.
 ```

+You can run multiple named scenarios in the same command, as the last `nb scenario` example shows.
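
The selection rules can be modeled with a small Python sketch. It makes several assumptions that are not confirmed by this document: the function name `parse_scenario_args` is hypothetical, activity parameters such as `cycles=7` are ignored, and the implicit form is treated as the explicit form with the leading `scenario` word omitted.

```python
def parse_scenario_args(args):
    """Group command-line words into (workload, [scenario names]) pairs.

    Illustrative model only, not nosqlbench's argument parser.
    """
    if not args:
        raise ValueError("a workload name is required")
    if args[0] != "scenario":
        args = ["scenario"] + list(args)    # implicit form -> explicit form

    groups = []
    for word in args:
        if word == "scenario":
            groups.append([])               # each 'scenario' starts a new group
        else:
            groups[-1].append(word)

    result = []
    for group in groups:
        if not group:
            raise ValueError("'scenario' must be followed by a workload name")
        workload, names = group[0], group[1:]
        # No explicit scenario name means the scenario named 'default'.
        result.append((workload, names or ["default"]))
    return result
```

In this model a scenario literally named `scenario` could not be selected, which is one example of the kind of naming conflict the explicit form is meant to make visible.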
+
+## Workload selection
+
+The examples above contain no reference to a workload (formerly called _yaml_).
+They don't need to, as they refer to themselves implicitly. You may add a `workload=`
+parameter to the command templates if you like, but this is never needed for basic
+use, and it is error prone to keep the filename matched to the command template. Just
+leave it out by default.
+
+_However_, if you are doing advanced scripting across multiple systems, you can
+provide a `workload=` parameter in a command template to use another workload
+description in your test.
+
+:::info
+This is a powerful feature for workload automation and organization. However, it can
+get unwieldy quickly. Caution is advised for deep-linking too many scenarios in a workspace,
+as there is no mechanism for keeping them in sync when small changes are made.
+:::
+
 ## Named Scenario Discovery

-Only workloads which include named scenarios will be easily discoverable by users
-who look for pre-baked scenarios.
+For named scenarios, there is a way for users to find all the named scenarios that are
+currently bundled or in view of their current directory. A couple of simple rules must
+be followed by scenario publishers in order to keep things simple:
+
+1. Workload files matching `*.yaml` in the current directory are considered.
+2. Workload files matching `*.yaml` under the relative path `activities/` are
+   considered.
+3. The same rules are used when looking in the bundled nosqlbench, so built-ins
+   come along for the ride.
+4. Any workload file that contains a `scenarios:` property is included, but all others
+   are ignored.
+
+This doesn't mean that you can't use named scenarios for workloads in other locations.
+It simply means that when users use the `--list-scenarios` option, these are the only
+ones they will see listed.
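
A minimal Python sketch of rules 1, 2, and 4 follows, assuming a plain filesystem search. The function name `find_scenario_workloads` is hypothetical, the check for a `scenarios:` property is a simple text match rather than a real YAML parse, and the bundled built-ins from rule 3 are not modeled.

```python
from pathlib import Path


def find_scenario_workloads(base):
    """List workload files under `base` that declare named scenarios.

    Illustrative only: this is not how nosqlbench implements discovery.
    """
    base = Path(base)
    candidates = sorted(base.glob("*.yaml"))                 # rule 1
    activities = base / "activities"
    if activities.is_dir():
        candidates += sorted(activities.glob("*.yaml"))      # rule 2

    found = []
    for path in candidates:
        lines = path.read_text(encoding="utf-8").splitlines()
        # Rule 4: keep only files with a top-level 'scenarios:' key.
        if any(line.startswith("scenarios:") for line in lines):
            found.append(path)
    return found
```

Calling `find_scenario_workloads(".")` would list the candidates visible from the current directory under these assumptions.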

 ## Parameter Overrides

 You can override parameters that are provided by named scenarios. Any parameter
-that you specify for the name scenario will override parameters of the same name
-in the named scenario's script.
+that you specify on the command line after your workload and optional scenario name
+will be used to override or augment the commands that are provided for the named scenario.

-## Examples
+This is powerful, but it also means that you can sometimes munge user-provided
+activity parameters on the command line with the named scenario commands in ways
+that may not make sense. To solve this, the parameters in the named scenario commands
+may be locked. You can lock them silently, or you can provide a verbose locking that will
+cause an error if the user even tries to adjust them.

+Silent locking is provided with a form like `param==value`. Any silently locked parameters
+will reject overrides from the command line, but will not interrupt the user.
+
+Verbose locking is provided with a form like `param===value`. Any time a user provides
+a value on the command line for a verbosely locked parameter, an error is thrown and they
+are informed that this is not possible. This level is provided for cases in which you
+want the user to know that their value was not applied to a parameter which is germane
+and specific to the named scenario.
+
+All other parameters provided by the user will take the place of the same-named parameters
+in *each* command template, in the order they appear in the template.
+Any remaining parameters provided by the user will be added to *each* of the command templates
+in the order they appear on the command line.
+
+This is a little counter-intuitive at first, but once you see some examples it should
+make sense.
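
The override rules can also be read as a small merge procedure. The Python sketch below models them for a single command template. It is an approximation under stated assumptions, not nosqlbench's code: the name `apply_overrides` is hypothetical, user parameters arrive as a dictionary in command-line order, and locked values are rendered back as plain `name=value`.

```python
def apply_overrides(template, user_params):
    """Merge user parameters into one named-scenario command template.

    Illustrative model of the rules described above.
    """
    words = template.split()
    command, template_params = words[0], words[1:]
    remaining = dict(user_params)       # user params not yet consumed
    merged = []

    for word in template_params:
        name, sep, value = word.partition("=")
        if not sep:                     # bare word with no '=': keep as is
            merged.append(word)
        elif value.startswith("=="):    # name===value: verbose lock
            if name in remaining:
                raise ValueError(f"parameter '{name}' is locked by the scenario")
            merged.append(f"{name}={value[2:]}")
        elif value.startswith("="):     # name==value: silent lock
            remaining.pop(name, None)   # drop the user's value without comment
            merged.append(f"{name}={value[1:]}")
        else:                           # name=value: user value wins, in place
            merged.append(f"{name}={remaining.pop(name, value)}")

    # Anything the template did not mention is appended in command-line order.
    merged.extend(f"{k}={v}" for k, v in remaining.items())
    return " ".join([command] + merged)
```

A scenario with several command templates would apply the same merge to each template in turn.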
+
+## Parameter Override Examples
+
+Consider a simple workload with three named scenarios:
 ```yaml
-# example-scenarios.yaml
+# basics.yaml
 scenarios:
- default:
-  - run cycles=3 alias=A driver=stdout
-  - run cycles=5 alias=B driver=stdout
+ s1: run driver=stdout cycles=10
+ s2: run driver=stdout cycles==10
+ s3: run driver=stdout cycles===10
+
 bindings:
- cycle: Identity()
- name: NumberNameToCycle()
+ c: Identity()
+
 statements:
- - cycle: "cycle {cycle}\n"
+ - A: "cycle={c}\n"
 ```
+
+Running this with no options prompts the user to select one of the named scenarios:
+```
+$ nb basics
+ERROR: Unable to find named scenario 'default' in workload 'basics', but you can pick from s1,s2,s3
+$
+```
+
+### Basic Override example
+
+If you run the first scenario `s1` with your own value for `cycles=7`, it does as you
+ask:
+
+```
+$ nb basics s1 cycles=7
+Logging to logs/scenario_20200324_205121_554.log
+cycle=0
+cycle=1
+cycle=2
+cycle=3
+cycle=4
+cycle=5
+cycle=6
+$
+```
+
+### Silent Locking example
+
+If you run the second scenario `s2` with your own value for `cycles=7`, then it does
+what the locked parameter `cycles==10` requires, without telling you that it is
+ignoring the specified value on your command line.
+
+```
+$ nb basics s2 cycles=7
+Logging to logs/scenario_20200324_205339_486.log
+cycle=0
+cycle=1
+cycle=2
+cycle=3
+cycle=4
+cycle=5
+cycle=6
+cycle=7
+cycle=8
+cycle=9
+$
+```
+
+Sometimes, this is appropriate, such as when specifying settings like `threads==` for schema phases.
+
+### Verbose Locking example
+
+If you run the third scenario `s3` with your own value for `cycles=7`, then you
+will get an error telling you that this is not possible. Sometimes you want to
+make sure that the user knows a parameter should not be changed, and that if they
+want to change it, they'll have to make their own custom version of the scenario
+in question.
+```
+$ nb basics s3 cycles=7
+ERROR: Unable to reassign value for locked param 'cycles===7'
+$
+```
+
+Ultimately, it is up to the scenario designer when to lock parameters for users.
+The built-in workloads offer some examples on how to set these parameters so that
+the right values are locked in place without bothering the user, but some values
+are made very clear in how they should be set. Please look at these examples
+for inspiration when you need it.