fixed silly typos in showcase section

Jonathan Shook 2020-04-06 22:06:00 -07:00
parent 6156b15a21
commit 9de8ef722c
8 changed files with 344 additions and 212 deletions


@@ -12,67 +12,91 @@ Some of the features discussed here are only for advanced testing scenarios.
## Hybrid Rate Limiting
Rate limiting is a complicated endeavor, if you want to do it well. The
basic rub is that going fast means you have to be less accurate, and
vice-versa. As such, rate limiting is a parasitic drain on any system. The
act of rate limiting itself poses a limit to the maximum rate, regardless
of the settings you pick. This occurs as a side-effect of forcing your
system to interact with some hardware notion of time passing, which takes
CPU cycles that could be going to the thing you are limiting.
This means that in practice, rate limiters are often very featureless.
It's daunting enough to need rate limiting, and asking for anything more
than that is often wishful thinking. Not so in NoSQLBench.
The rate limiter in NoSQLBench provides a comparable degree of performance
and accuracy to others found in the Java ecosystem, but it *also* has
advanced features:
- It allows a sliding scale between average rate limiting and strict rate
limiting, called _bursting_.
- It internally accumulates delay time, for C.O. friendly metrics which
are separately tracked for each and every operation.
- It is resettable and reconfigurable on the fly, including the bursting
rate.
- It provides its configured values in addition to performance data in
metrics, capturing your rate limiter settings as a simple matter of
metrics collection.
- It comes with advanced scripting helpers which allow you to read data
directly from histogram reservoirs, or control the reservoir window
programmatically.
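
For example, a rate limit with a burst ratio can be set as a single
activity parameter. The workload name and values below are illustrative
only:

```shell
# Target 5000 ops/s on average, with a 1.1x burst ratio that lets the
# workload catch up after transient slowdowns (illustrative values)
./nb run driver=cql workload=myworkload.yaml cyclerate=5000,1.1
```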
## Flexible Error Handling
An emergent facility in NoSQLBench is the way that errors are handled
within an activity. For example, with the CQL activity type, you are able
to route error handling for any of the known exception types. You can
count errors, you can log them. You can cause errored operations to
auto-retry if possible, up to a configurable number of tries.
This means that, as a user, you get to decide what your test is about. Is
it about measuring some nominal but anticipated level of errors due to
intentional over-saturation? If so, then count the errors, and look at
their histogram data for timing details within the available timeout.
Are you doing a basic stability test, where you want the test to error out
for even the slightest error? You can configure for that if you need to.
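
As a sketch of what this can look like for the cql driver, the following
routes different exception types to different handlers. The exact
parameter spelling and handler verbs vary by driver and version, so treat
these names as assumptions rather than a reference:

```shell
# Hypothetical error-routing spec: count write timeouts, retry
# overloaded errors up to the configured number of tries, and stop
# the activity on anything else
./nb run driver=cql workload=myworkload.yaml \
  errors=WriteTimeoutException:count,OverloadedException:retry,.*:stop
```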
## Cycle Logging
It is possible to record the result status of each and every cycle in a
NoSQLBench test run. If the results are mostly homogeneous, the RLE
encoding of the results will reduce the output file down to a small
fraction of the number of cycles. The errors are mapped to ordinals by
error type, and these ordinals are stored into a direct RLE-encoded log
file. For most testing where most of the results are simply success, this
file will be tiny. You can also convert the cycle log into textual form
for other testing and post-processing and vice-versa.
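
A sketch of this workflow, with option names given as best-effort
recollections of the built-in help rather than a verified reference:

```shell
# Record per-cycle result codes to an RLE-encoded binary log
./nb run driver=cql workload=myworkload.yaml cyclelog=myrun

# Convert the binary cycle log into a textual form for post-processing
./nb --export-cycle-log myrun
```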
## Op Sequencing
The way that operations are planned for execution in NoSQLBench is based
on a stable ordering that is configurable. The statement forms are mixed
together based on their relative ratios. The three schemes currently
supported are round-robin with exhaustion (bucket), duplicate in order
(concat), and a way to spread each statement out over the unit interval
(interval). These account for most configuration scenarios without users
having to micro-manage their statement templates.
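
For illustration, here is a minimal workload fragment with two statement
templates in a 3:1 ratio. The `seq` activity parameter (one of `bucket`,
`concat`, or `interval`) then selects the planning scheme. Statement
texts and names are placeholders:

```yaml
# Two templates mixed 3:1; run with seq=bucket, seq=concat, or seq=interval
statements:
  - read: select * from ks.tbl where key={key};
    ratio: 3
  - write: insert into ks.tbl (key,value) values ({key},{value});
    ratio: 1
```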
## Sync and Async
There are two distinct usage modes in NoSQLBench when it comes to
operation dispatch and thread management:
### Sync
Sync is the default form. In this mode, each thread reads its sequence and
dispatches one statement at a time, holding only one operation in flight
per thread. This is the mode you often use when you want to emulate an
application's request-per-thread model, as it implicitly linearizes the
order of operations within the computed sequence of statements.
### Async
In Async mode, each thread in an activity is responsible for juggling a
number of operations in-flight. This allows a NoSQLBench client to juggle
an arbitrarily high number of connections, limited primarily by how much
memory you have.
Internally, the Sync and Async modes have different code paths. It is
possible for an activity type to support one or both of these.
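
As a rough sketch of the difference from the command line, where the
`async` parameter name is driver-dependent and shown here as an
assumption:

```shell
# Sync: 50 threads, each with exactly one operation in flight
./nb run driver=cql workload=myworkload.yaml threads=50

# Async: 50 threads juggling up to 500 operations in flight overall
./nb run driver=cql workload=myworkload.yaml threads=50 async=500
```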


@@ -5,46 +5,72 @@ weight: 2
# Refined Core Concepts
The core concepts that NoSQLBench is built on have been scrutinized,
replaced, refined, and hardened through several years of use by users of
various needs and backgrounds.
This level of refinement is important when trying to find a way to express
common patterns in what is often a highly fragmented practice. Testing is
hard. Scale testing is hard. Distributed testing is hard. Combined, the
challenge of executing realistic tests is often quite daunting to all but
seasoned test engineers. To make this worse, existing tools have only
skirmished with this problem enough to make dents, but none has tackled
the lack of conceptual building blocks head-on.
This has to change. We need a set of conceptual building blocks that can
span across workloads and system types, and machinery to put these
concepts to use. This is why it is important to focus on finding a useful
and robust set of concepts to use as the foundation for the rest of the
toolkit to be built on. Finding these building blocks is often one of the
most difficult tasks in systems design. Once you find and validate a
useful set of concepts, everything else gets easier.
We feel that the success that we've already had using NoSQLBench has been
strongly tied to the core concepts. Some concepts used in NoSQLBench are
shared below for illustration, but this is by no means an exhaustive list.
### The Cycle
Cycles in NoSQLBench are whole numbers on a number line. Each operation in
a NoSQLBench scenario is derived from a single cycle. It's a long value,
and a seed. The cycle determines not only which statement is selected for
execution, but also what synthetic payload data will be attached to it.
Cycles are specified as a closed-open `[min,max)` interval, just like slices
in some languages. That is, the min value is included in the range, but
the max value is not. This means that you can stack slices using common
numeric reference points without overlaps or gaps. It means you can have
exact awareness of what data is in your dataset, even incrementally.
You can think of a cycle as a single-valued coordinate system for data
that lives adjacent to that number on the number line. In this way,
virtual dataset functions are ways of converting coordinates into data.
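
For example, two runs can tile the number line exactly, with no overlap
and no gap, because of the closed-open interval convention. The workload
name is a placeholder:

```shell
# The first run covers cycles 0..999999, the second 1000000..1999999
./nb run driver=cql workload=myworkload.yaml cycles=0..1000000
./nb run driver=cql workload=myworkload.yaml cycles=1000000..2000000
```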
### The Activity
An activity is a multi-threaded flywheel of statements in some sequence
and ratio. Activities run over the numbers in a cycle range. Each activity
has a driver type which determines the native protocol that it speaks.
### The Driver Type
A driver type is a high level driver for a protocol. It is like a
statement-aware cartridge that knows how to take a basic statement
template and turn it into an operation for an activity to execute within
the scenario.
### The Scenario
The scenario is a runtime session that holds the activities while they
run. A NoSQLBench scenario is responsible for aggregating global runtime
settings, metrics reporting channels, log files, and so on. All activities
run within a scenario, under the control of the scenario script.
### The Scenario Script
Each scenario is governed by a script that runs single-threaded,
asynchronously from activities, but in control of activities. If needed,
the scenario script is automatically created for the user, and the user
never knows it is there. If the user has advanced testing requirements,
then they may take advantage of the scripting capability when needed. When
the script exits, *AND* all activities are complete, then the scenario is
complete.


@@ -5,43 +5,49 @@ weight: 12
# High Fidelity Metrics
Since NoSQLBench has been built as a serious testing tool for all users,
some attention was necessary to the way metrics are used.
## Discrete Reservoirs
In NoSQLBench, we avoid the use of time-decaying metrics reservoirs.
Internally, we use HDR reservoirs with discrete time boundaries. This is
so that you can look at the min and max values and know that they apply
accurately to the whole sampling window.
## Metric Naming
All running activities have a symbolic alias that identifies them for the
purposes of automation and metrics. If you have multiple activities
running concurrently, they will have different names and will be
represented distinctly in the metrics flow.
## Precision and Units
By default, the internal HDR histogram reservoirs are kept at 4 digits of
precision. All timers are kept at nanosecond resolution.
## Metrics Reporting
Metrics can be reported via graphite as well as CSV, logs, HDR logs, and
HDR stats summary CSV files.
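
For example, a run might report to graphite and keep local HDR logs at
the same time. Flag spellings below follow the built-in help as best
remembered, so verify them with `nb --help`:

```shell
# Send metrics to a graphite endpoint and also write HDR histogram logs
./nb run driver=cql workload=myworkload.yaml \
  --report-graphite-to graphite.example.com:2003 \
  --log-histograms histodata.log
```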
## Coordinated Omission
The metrics naming and semantics in NoSQLBench are set up so that you can
have coordinated omission metrics when they are appropriate, but there are
no other changes when they are not. This means that the metric names and
meanings remain stable in any case.
Particularly, NoSQLBench avoids the term "latency" altogether as it is
often overused and thus prone to confusing people.
Instead, the terms `service time`, `wait time`, and `response time` are
used. These are abbreviated in metrics as `servicetime`, `waittime`, and
`responsetime`.
The `servicetime` metric is the only one which is always present. When a
rate limiter is used, then additionally `waittime` and `responsetime` are
reported.


@@ -5,18 +5,22 @@ weight: 10
# NoSQLBench Showcase
Since NoSQLBench is new on the scene in its current form, you may be
wondering why you would want to use it over any other tool. That is what
this section is all about.
You don't have to read all of this! It is here for those who want to know
the answer to the question "So, what's the big deal??" Just remember it is
here for later if you want to skip to the next section and get started
testing.
NoSQLBench can do nearly everything that other testing tools can do, and
more. It achieves this by focusing on a scalable user experience in
combination with a modular internal architecture.
NoSQLBench is a workload construction and simulation tool for scalable
systems testing. That is an entirely different scope of endeavor than most
other tools.
The pages in this section all speak to a selection of advanced
capabilities that are unique to NoSQLBench.

View File

@@ -5,18 +5,23 @@ weight: 11
# Modular Architecture
The internal architecture of NoSQLBench is modular throughout. Everything
from the scripting extensions to data generation is enumerated at compile
time into a service descriptor, and then discovered at runtime by the SPI
mechanism in Java.
This means that extending and customizing bundles and features is quite
manageable.
It also means that it is relatively easy to provide a suitable API for
multi-protocol support. In fact, there are several drivers available in
the current NoSQLBench distribution. You can list them out with
`nb --list-drivers`, and you can get help on how to use each of them with
`nb help <driver name>`.
This also is a way for us to encourage and empower other contributors to
help develop the capabilities and reach of NoSQLBench. By encouraging
others to help us build NoSQLBench modules and extensions, we can help
more users in the NoSQL community at large.


@@ -5,38 +5,46 @@ weight: 2
# Portable Workloads
All of the workloads that you can build with NoSQLBench are self-contained
in a workload file. This is a statement-oriented configuration file that
contains templates for the operations you want to run in a workload.
This defines part of an activity - the iterative flywheel part that is run
directly within an activity type. This file contains everything needed to
run a basic activity -- a set of statements in some ratio. It can be used
to start an activity, or as part of several activities within a scenario.
## Standard YAML Format
The format for describing statements in NoSQLBench is generic, but in a
particular way that is specialized around describing statements for a
workload. That means that you can use the same YAML format to describe a
workload for Kafka as you can for Apache Cassandra or DSE.
The YAML structure has been tailored to describing statements, their data
generation bindings, how they are grouped and selected, and the parameters
needed by drivers, like whether they should be prepared statements or not.
Further, the YAML format allows for defaults and overrides with a very
simple mechanism that reduces editing fatigue for frequent users.
You can also template document-wide macro parameters which are taken from
the command line just like any other parameter. This is a way of
templating a workload and making it multi-purpose or adjustable on the fly.
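
A compact sketch of such a workload, using the `TEMPLATE(name,default)`
macro form. All names here are placeholders:

```yaml
bindings:
  key: Mod(TEMPLATE(keycount,1000000)); ToString()
  value: Hash(); ToString()
statements:
  - insert into ks.tbl (key,value) values ({key},{value});
```

Running with `keycount=500000` on the command line would then override
the default without editing the file.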
## Experimentation Friendly
Because the workload YAML format is generic across driver types, it is
possible to ask one driver type to interpret the statements that are meant
for another. This isn't generally a good idea, but it becomes extremely
handy when you want to have a high level driver type like `stdout`
interpret the syntax of another driver like `cql`. When you do this, the
stdout activity type _plays_ the statements to your console as they would
be executed in CQL, data bindings and all.
This means you can empirically and directly demonstrate and verify access
patterns, data skew, and other dataset details before you change back to
cql mode and turn up the settings for a higher scale test. It takes away
the guesswork about what your test is actually doing, and it works for
all drivers.
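
A typical dry-run of this kind looks like the following, with the
workload name as a placeholder:

```shell
# Render ten cycles of a CQL workload to the console, bindings and all,
# without touching a database
./nb run driver=stdout workload=myworkload.yaml cycles=10
```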


@@ -5,68 +5,93 @@ weight: 3
# Scripting Environment
The ability to write open-ended testing simulations is provided in
NoSQLBench by means of a scripted runtime, where each scenario is driven
from a control script that can do anything the user wants.
## Dynamic Parameters
Some configuration parameters of activities are designed to be assignable
while a workload is running. This makes things like threads, rates, and
other workload dynamics adjustable in real time. The internal APIs work with the
scripting environment to expose these parameters directly to scenario
scripts. Drivers that are provided to NoSQLBench can also expose dynamic
parameters in the same way so that anything can be scripted dynamically
when needed.
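
A small scenario-script sketch of this, with the alias and values chosen
for illustration; the `activities.<alias>.<param>` access pattern is shown
as the general shape of the API rather than a verified reference:

```javascript
// Start an activity, let it warm up, then retune it while it runs
scenario.start("driver=cql workload=myworkload.yaml alias=main threads=10");
scenario.waitMillis(60000);
activities.main.threads = 20;           // scale threads up on the fly
activities.main.cyclerate = "8000,1.1"; // retune the rate limiter
```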
## Scripting Automatons
When a NoSQLBench scenario is running, it is under the control of a
single-threaded script. Each activity that is started by this script is
run within its own thread pool, simultaneously and asynchronously.
The control script has executive control of the activities, as well as
full visibility into the metrics that are provided by each activity. The
way these two parts of the runtime meet is through the service objects
which are installed into the scripting runtime. These service objects
provide a named access point for each running activity and its metrics.
This means that the scenario script can do something simple, like start
activities and wait for them to complete, OR, it can do something more
sophisticated like dynamically and iteratively scrutinize the metrics and
make real-time adjustments to the workload while it runs.
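
For instance, a control script might watch a service-time percentile and
back off the rate when it degrades. The metric path and method names below
are assumptions about the shape of the service objects, not a verified
API:

```javascript
// Sketch: sample p99 service time every 5s and throttle if it exceeds 50ms
scenario.start("driver=cql workload=myworkload.yaml alias=main cyclerate=10000");
for (var i = 0; i < 60; i++) {
  scenario.waitMillis(5000);
  var p99 = metrics.main.cycles_servicetime.getSnapshot().get99thPercentile();
  if (p99 > 50000000) { // 50ms, in nanoseconds
    activities.main.cyclerate = "5000,1.1";
  }
}
scenario.stop("main");
```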
## Analysis Methods
Scripting automatons that do feedback-oriented analysis of a target system
are called analysis methods in NoSQLBench. We have prototyped a couple of
these already, but there is nothing keeping the adventurous from coming up
with their own.
## Command Line Scripting
The command line has the form of basic test commands and parameters. These
commands get converted directly into scenario control script in the order
they appear. The user can choose whether to stay in high level executive
mode, with simple commands like `nb test-scenario ...`, or to drop down
directly into script design. They can look at the equivalent script for
any command line by running it with `--show-script`. If you take the
script that is dumped to console and run it, it will do exactly the same
thing as if you hadn't even looked at it and just ran basic commands on
the command line.
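
For example:

```shell
# Print the scenario script that this command line compiles to
./nb run driver=cql workload=myworkload.yaml cycles=1000 --show-script
```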
There are even ways to combine script fragments, full commands, and calls
to scripts on the command line. Since each variant is merely a way of
constructing scenario script, they all get composited together before the
scenario script is run.
New introductions to NoSQLBench should focus on the command line. Once a
user is familiar with this, it is up to them whether to tap into the
deeper functionality. If they don't need to know about scenario scripting,
then they shouldn't have to learn about it to be effective. This is what
we are calling a _scalable user experience_.
## Compared to DSLs
Other tools may claim that their DSL makes scenario "simulation" easier.
In practice, any DSL is generally dependent on a development tool to lay
the language out in front of a user in a fluent way. This means that DSLs
are almost always developer-targeted tools, and mostly useless for casual
users who don't want to break out an IDE.
One of the things a DSL proponent may tell you is that it tells you "all
the things you can do!". This is de facto the same as telling you "all
the things you can't do", because anything outside of that is not part of
the DSL. This is not a win for the user. For DSL-based systems, the user
has to use the DSL whether or not it enhances their creative control,
while in fact, most DSLs aren't rich enough to do much that is interesting
from a simulation perspective.
In NoSQLBench, we don't force the user to use the programming abstractions
except at a very surface level -- the CLI. It is up to the user whether or
not to open the secret access panel for the more advanced functionality. If
they decide to do this, we give them a commodity language (ECMAScript),
and we wire it into all the things they were already using. We don't take
away their creative freedom by telling them what they can't do. This way,
users can pick their level of investment and reward as best fits their
individual needs, as it should be.
## Scripting Extensions
Also mentioned under the section on modularity, it is relatively easy for
a developer to add their own scripting extensions into NoSQLBench as named
service objects.


@@ -5,71 +5,105 @@ weight: 1
# Virtual Datasets
The _Virtual Dataset_ capabilities within NoSQLBench allow you to generate
data on the fly. There are many reasons for using this technique in
testing, but it is often a topic that is overlooked or taken for granted.
## Industrial Strength
The algorithms used to generate data are based on advanced techniques in
the realm of variate sampling. The authors have gone to great lengths to
ensure that data generation is efficient and as close to O(1) in processing
time as possible.
For example...
One technique that is used to achieve this is to initialize and cache data
in high resolution look-up tables for distributions which may otherwise
perform differently depending on their respective density functions. The
existing Apache Commons Math libraries have been adapted into a set of
interpolated Inverse Cumulative Distribution sampling functions. This
means that you can use them all in the same place as you would a Uniform
distribution, and once initialized, they sample with identical overhead.
This means that by changing your test definition, you don't accidentally
change the behavior of your test client, only the data as intended.
## A Purpose-Built Tool
Many other testing systems avoid building a dataset generation component.
It's a tough problem to solve, so it's often just avoided. Instead, they
use libraries like "faker" or other sources of data which weren't designed
for testing at scale. Faker is well named, no pun intended. It was meant
as a vignette and wire-framing library, not a source of test data for
realistic results. If you are using a testing tool for scale testing and
relying on a faker variant, then you will almost certainly get invalid
results that do not represent how a system would perform in production.
The virtual dataset component of NoSQLBench is a library that was designed
for high scale and realistic data streams. It uses the limits of the data
types in the JVM to simulate high cardinality datasets which approximate
production data distributions for realistic and reproducible results.
## Deterministic
The data that is generated by the virtual dataset libraries is
deterministic. This means that for a given cycle in a test, the operation
that is synthesized for that cycle will be the same from one session to
the next. This is intentional. If you want to perturb the test data from
one session to the next, then you can most easily do it by simply
selecting a different set of cycles as your basis.
This means that if you find something interesting in a test run, you can
go back to it just by specifying the cycles in question. It also means
that you aren't losing comparative value between tests with additional
randomness thrown in. The data you generate will still look random to the
human eye, but that doesn't mean that it can't be reproducible.
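
For example, if something interesting happened around cycles
353000..354000 of a long run, re-running exactly those cycles regenerates
exactly the same operations and data. The workload name is a placeholder:

```shell
./nb run driver=cql workload=myworkload.yaml cycles=353000..354000
```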
## Statistically Shaped
All this means is that the values you use to tie your dataset together can
be specific to any distribution that is appropriate. You can ask for a
stream of floating point values 1 trillion values long, in any order. You
can use discrete or continuous distributions, with whatever distribution
parameters you need.
## Best of Both Worlds
Some might worry that fully synthetic testing data is not realistic
enough. The devil is in the details on these arguments, but suffice it to
say that you can pick the level of real data you use as seed data with
NoSQLBench. You don't have to choose between realism and agility. The
procedural data generation approach allows you to have all the benefits of
testing agility of low-entropy testing tools while retaining nearly all of
the benefits of real testing data.
For example, using the alias sampling method and a published US census
(public domain) list of names and surnames that occurred more than 100x, we
can provide extremely accurate samples of names according to the published
labels and weights. The alias method allows us to sample accurately in
O(1) time from the entire dataset by turning a large number of weights
into two uniform samples. You will simply not find a better way to sample
realistic (US) names than this. (If you do, please file an issue!)
Actually, any data set that you have in CSV form with a weight column can
also be used this way, so you're not strictly limited to US census data.
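
As a sketch, a binding can sample from such a weighted CSV directly. The
function name below is given as commonly found in the bundled function
library; treat it as an assumption and check the built-in help for the
exact signature:

```yaml
bindings:
  # Sample names proportionally to a weight column, alias-method style
  first_name: CSVFrequencySampler('data/firstnames.csv','Name')
```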
## Java Idiomatic Extension
The way that the virtual dataset component works allows Java developers to
write any extension to the data generation functions simply in the form of
Java 8 or newer Functional interfaces. As long as they include the
annotation processor and annotate their classes, they will show up in the
runtime and be available to any workload by their class name.
Additionally, annotation-based examples and annotation processing are
used to hoist function docs directly into the published docs that go along
with any version of NoSQLBench.
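
A minimal sketch of such an extension, assuming the virtdata annotation
processor is on the build path; the annotation's package varies by
version, so the import below is an assumption:

```java
import java.util.function.LongFunction;
import io.nosqlbench.virtdata.annotations.ThreadSafeMapper; // assumed package

// A custom binding function which deterministically derives a value
// from the cycle number
@ThreadSafeMapper
public class HexUser implements LongFunction<String> {
    @Override
    public String apply(long cycle) {
        return "user_" + Long.toHexString(cycle);
    }
}
```

Once compiled onto the classpath, it would be usable in bindings by class
name, like any bundled function.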
## Binding Recipes
It is possible to stitch data generation functions together directly in a
workload YAML. These are data-flow sketches of functions that can be
copied and pasted between workload descriptions to share or remix data
streams. This allows for the adventurous to build sophisticated virtual
datasets that emulate nuances of real datasets, but in a form that takes
up less space on the screen than this paragraph!
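
For example, a couple of binding recipes which chain functions left to
right, once per cycle; the function names are drawn from the bundled
function library as best remembered:

```yaml
bindings:
  # A sequential key space folded onto one million distinct values
  seq_key: Mod(1000000); ToString()
  # A pseudo-random but deterministic value derived from the cycle
  rand_value: Hash(); Mod(1000000000); ToString()
```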