Merge branch 'master' into releases

Jonathan Shook 2020-03-17 11:36:51 -05:00
commit 335c88bc1d
13 changed files with 509 additions and 46 deletions

AUTOMATION.md

@@ -0,0 +1,19 @@
## CI and GitHub Actions
This project uses GitHub Actions to manage continuous integration.
Below is a sketch of how the automation currently works.
### releases
The releases workflow is responsible for the following:
1. Build nb and nb.jar binaries and application content.
2. Publish new nb.jar releases to Maven Central via Sonatype OSSRH.
3. Upload release assets to the newly published release.
4. Upload updated docs to the GitHub Pages site for docs.nosqlbench.io.
5. (future) Upload updated javadocs to the GitHub Pages site for javadoc.nosqlbench.io/...
### build
The build workflow simply builds and then verifies the project, in that order,
using the standard Maven mojo.

@@ -1,6 +1,6 @@
---
-title: Introducing NoSQLBench
-weight: 10
+title: NoSQLBench Intro
+weight: 0
---
## Welcome to NoSQLBench

@@ -0,0 +1,92 @@
---
title: Advanced Testing
weight: 13
---
:::info
Some of the features discussed here are only for advanced testing scenarios.
:::
## Hybrid Rate Limiting
Rate limiting is a complicated endeavor, if you want to do it well. The basic
rub is that going fast means you have to be less accurate, and vice-versa.
As such, rate limiting is a parasitic drain on any system. The act of rate
limiting in and of itself imposes a limit on the maximum rate, regardless
of the settings you pick, because it forces your system to interact with
some hardware notion of time passing, and this takes CPU cycles that could
be going to the thing you are limiting.
This means that in practice, rate limiters are often very featureless. It's
daunting enough to need rate limiting, and asking for anything more than
that is often wishful thinking. Not so in NoSQLBench.
The rate limiter in NoSQLBench provides a comparable degree of performance
and accuracy to others found in the Java ecosystem, but it *also* has advanced
features:
- It allows a sliding scale between average rate limiting and strict rate limiting.
- It internally accumulates delay time, for C.O.-friendly metrics.
- It is resettable and reconfigurable on the fly.
- It provides its configured values in addition to performance data in metrics.
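In practice, a rate limit is supplied as an activity parameter. As a sketch,
the `cyclerate` parameter below takes a `rate,burstRatio` form; treat the exact
spelling as an assumption and consult the built-in help for the authoritative spec:

```
# Target 10000 ops/s on average, allowing catch-up bursts of up to 1.1x
# (parameter form assumed for illustration)
./nb run type=cql yaml=myworkload cyclerate=10000,1.1
```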
## Flexible Error Handling
An emergent facility in NoSQLBench is the way that errors are handled within
an activity. For example, with the CQL activity type, you are able to route
error handling for any of the known exception types. You can count errors,
you can log them. You can cause errored operations to auto-retry if possible,
up to a configurable number of tries.
This means that, as a user, you get to decide what your test is about. Is it
about measuring some nominal but anticipated level of errors due to intentional
over-saturation? If so, then count the errors, and look at their histogram data
for timing details within the available timeout.
Are you doing a basic stability test, where you want the test to error out
for even the slightest error? You can configure for that, too.
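For illustration, error routing is configured with activity parameters. The
parameter names below are assumptions based on the description above; the CQL
activity type's help documents the real grammar:

```
# Count errors rather than stopping, and allow up to 3 tries per operation
# (parameter names assumed)
./nb run type=cql yaml=myworkload errors=count maxtries=3
```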
## Cycle Logging
It is possible to record the result status of each and every cycle in
a NoSQLBench test run. If the results are mostly homogeneous, the RLE
encoding of the results will reduce the output file down to a small
fraction of the number of cycles. The errors are mapped to ordinals, and
these ordinals are stored into a direct RLE-encoded log file. For most
testing where most of the results are simply success, this file will be tiny.
You can also convert the cycle log into textual form for inspection
and post-processing, and back again.
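As a sketch, the conversion might look like this. The option names are recalled
from this era of the CLI but should be treated as assumptions; `./nb --help`
lists the authoritative options:

```
# Convert a binary cycle log to text for inspection (option name assumed)
./nb --export-cycle-log myactivity.cyclelog myactivity.txt

# Convert the textual form back into the binary format (option name assumed)
./nb --import-cycle-log myactivity.txt myactivity.cyclelog
```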
## Op Sequencing
The way that operations are planned for execution in NoSQLBench is based on
a stable ordering that is configurable. The statement forms are mixed
together based on their relative ratios. The three schemes currently supported
are round-robin with exhaustion (bucket), duplicate in order (concat), and
a way to spread each statement out over the unit interval (interval). These
account for most configuration scenarios without users having to micro-manage
their statement templates.
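To illustrate, suppose a workload has op templates A and B with ratios 3 and 1.
The `seq` parameter and the rendered sequences below are a sketch based on the
scheme names above, not authoritative output:

```
# seq=concat:   A A A B   (each template duplicated in order, by ratio)
# seq=bucket:   A B A A   (round-robin across templates until ratios are exhausted)
# seq=interval: each template spread over the unit interval by its ratio
./nb run type=stdout yaml=myworkload seq=bucket
```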
## Sync and Async
There are two distinct usage modes in NoSQLBench when it comes to operation
dispatch and thread management:
### Sync
Sync is the default form. In this mode, each thread reads its sequence
and dispatches one statement at a time, holding only one operation in flight
per thread. This is the mode you often use when you want to emulate an
application's request-per-thread model, as it implicitly linearizes the
order of operations within the computed sequence of statements.
### Async
In Async mode, each thread in an activity is responsible for juggling a number
of operations in-flight. This allows a NoSQLBench client to sustain an
arbitrarily high number of connections, limited primarily by how much memory
you have.
Internally, the Sync and Async modes have different code paths. It is possible
for an activity type to support one or both of these.
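As a sketch, the two modes might be selected like this. The `threads` parameter
is standard; the `async` parameter name is an assumption for illustration:

```
# Sync (default): one operation in flight per thread
./nb run type=cql yaml=myworkload threads=50

# Async: fewer threads, many operations in flight (parameter name assumed)
./nb run type=cql yaml=myworkload threads=10 async=1000
```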

@@ -0,0 +1,63 @@
---
title: Core Concepts
weight: 2
---
The core concepts that NoSQLBench is built on have been scrutinized,
replaced, refined, and hardened through several years of use
by users with various needs and backgrounds.
This is important when trying to find a way to express common patterns
in what is often a highly fragmented practice. Testing is hard. Scale
testing is hard. Distributed testing is hard. We need a set of conceptual
building blocks that can span across workloads and system types, and
machinery to put these concepts to use. Some concepts used in NoSQLBench
are shared below for illustration, but this is by no means an exhaustive
list.
### The Cycle
Cycles in NoSQLBench are whole numbers on a number line. Every operation
in a NoSQLBench session is derived from a single cycle: a long value that
acts as a seed. The cycle determines not only which statement (of those available)
will get executed, but also what the values bound to that
statement will be.
Cycles are specified as a closed-open `[min,max)` interval, just as slices
in some languages. That is, the min value is included in the range, but the
max value is not. This means that you can stack slices using common numeric
reference points without overlaps or gaps. It means you can have exact awareness
of what data is in your dataset, even incrementally.
You can think of a cycle as a single-valued coordinate system for data that
lives adjacent to that number on the number line.
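For example, two sessions can cover adjacent slices of the same dataset with no
gap or overlap, using the closed-open `cycles=min..max` form:

```
# First session covers cycles [0,1000000)
./nb run type=cql yaml=myworkload cycles=0..1000000

# A later session picks up at exactly the next cycle
./nb run type=cql yaml=myworkload cycles=1000000..2000000
```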
### The Activity
An activity is a multi-threaded flywheel of statements in some sequence
and ratio. Activities run over the numbers in a cycle range. Each activity
has a driver type which determines the native protocol that it speaks.
An activity continuously synthesizes and dispatches operations from its
cycle range until the range is exhausted or the activity is stopped.
### The Activity Type
An activity type is a high level driver for a protocol. It is like a
statement-aware cartridge that knows how to take a basic statement template
and turn it into an operation for the scenario to execute.
### The Scenario
The scenario is a runtime session that holds the activities while they run.
A NoSQLBench scenario is responsible for aggregating global runtime settings,
metrics reporting channels, logfiles, and so on.
### The Scenario Script
Each scenario is governed by a script that runs single-threaded, asynchronously
from activities, but in control of activities. If needed, the scenario script
is automatically created for the user, and the user never knows it is there.
If the user has advanced testing requirements, then they may take advantage
of the scripting capability at that point.
When the script exits, *AND* all activities are complete, then the scenario
is complete.
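As a minimal sketch, the auto-generated script for a single run command could
look like the following; the `scenario.run(...)` call shape is an assumption
here, and `--show-script` prints the real script for any command line:

```javascript
// Run one activity to completion; when the script exits and the
// activity is done, the scenario is complete (call shape assumed)
scenario.run("type=cql yaml=myworkload cycles=0..1000000 alias=main");
```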

@@ -0,0 +1,47 @@
---
title: High Fidelity Metrics
weight: 12
---
## Discrete Reservoirs
In NoSQLBench, we avoid the use of time-decaying metrics reservoirs.
Internally, we use HDR reservoirs with discrete time boundaries. This
is so that you can look at the min and max values and know that they
apply accurately to the whole sampling window.
## Metric Naming
Every activity that runs has a symbolic alias that identifies
it for the purposes of automation and metrics. If you have multiple
activities running concurrently, they will have different names and will
be represented distinctly in the metrics flow.
## Precision and Units
By default, the internal HDR histogram reservoirs are kept at 4 digits
of precision. All timers are kept at nanosecond resolution.
## Metrics Reporting
Metrics can be reported via Graphite, as well as to CSV files, logs, HDR logs,
and HDR stats summary CSV files.
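For example, reporters are selected with global options. The flag names below
are recalled from this era of the CLI, but verify them with `./nb --help`:

```
# Alias the activity for stable metric names, then report to Graphite and CSV
# (flag names assumed)
./nb run type=cql yaml=myworkload alias=main \
  --report-graphite-to graphite.example.com:2003 \
  --report-csv-to ./metrics
```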
## Coordinated Omission
The metrics naming and semantics in NoSQLBench are set up so that you
can have coordinated omission metrics when they are appropriate, but
nothing else changes when they are not. This means that the metric
names and meanings remain stable in any case.
Particularly, NoSQLBench avoids the term "latency" altogether as it is often overused
and thus prone to confusing people.
Instead, the terms `service time`, `wait time`, and `response time` are used.
These are abbreviated in metrics as `servicetime`, `waittime`, and `responsetime`.
The `servicetime` metric is the only one which is always present. When a
rate limiter is used, then additionally `waittime` and `responsetime` are
reported.

@@ -0,0 +1,25 @@
---
title: NoSQLBench Showcase
weight: 10
---
Since NoSQLBench is new on the scene in its current form, you may be wondering
why you would want to use it over any other tool. That is what this section is all
about.
If you want to look under the hood of this toolkit before giving it a spin,
this section is for you. You don't have to read all of this! It is here for those
who want to know the answer to the question "So, what's the big deal??"
Just remember it is here for later if you want to skip to the next section and get
started testing.
NoSQLBench can do nearly everything that other testing tools can do, and more. It
achieves this by focusing on a scalable user experience in combination with a
modular internal architecture.
NoSQLBench is a workload construction and simulation tool for scalable systems
testing. That is an entirely different scope of endeavor than most other tools.
The pages in this section all speak to advanced capabilities that are unique
to NoSQLBench. In time, we want to show these with basic scenario examples, right
in the docs.

@@ -0,0 +1,25 @@
---
title: Modular Architecture
weight: 11
---
The internal architecture of NoSQLBench is modular throughout.
Everything from the scripting extensions to the data generation functions
is enumerated at compile time into a service descriptor, and then discovered
at runtime by the SPI mechanism in Java.
This means that extending and customizing bundles and features is quite
manageable.
It also means that it is relatively easy to provide a suitable
API for multi-protocol support. In fact, there are several drivers
available in the current NoSQLBench distribution. You can list them
out with `./nb --list-activity-types`, and you can get help on
how to use each of them with `./nb help <name>`.
This also is a way for us to encourage and empower other contributors
to help develop the capabilities and reach of NoSQLBench as a
bridge-building tool in our community. This level of modularity is somewhat
unusual, but it serves the purpose of bringing new features to users faster.

@@ -0,0 +1,49 @@
---
title: Portable Workloads
weight: 2
---
All of the workloads that you can build with NoSQLBench are self-contained
in a workload file. This is a statement-oriented configuration file that
contains templates for the operations you want to run in a workload.
This defines part of an activity - the iterative flywheel part that is
run directly within an activity type. This file contains everything needed
to run a basic activity -- a set of statements in some ratio. It can be
used to start an activity, or as part of several activities within a scenario.
## Standard YAML Format
The format for describing statements in NoSQLBench is generic, but
specialized in a particular way around describing the statements of a workload.
That means that you can use the same YAML format to describe a workload
for Kafka as you can for Apache Cassandra or DSE.
The YAML structure has been tailored to describing statements, their
data generation bindings, how they are grouped and selected, and the
parameters needed by drivers, like whether they should be prepared
statements or not.
Further, the YAML format allows for defaults and overrides with a
very simple mechanism that reduces editing fatigue for frequent users.
You can also template document-wide macro parameters, which are taken
from the command line just like any other parameter. This is
a way of templating a workload, making it multi-purpose or adjustable
on the fly.
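A minimal sketch of the format is shown below. The op template, binding recipe,
and `<<name:default>>` macro parameter follow the conventions described here,
but the details are illustrative rather than authoritative:

```yaml
# Hypothetical workload sketch: one statement, one binding, one macro parameter
statements:
  - write-user: |
      insert into <<keyspace:examples>>.users (userid) values ({userid})
    ratio: 5
bindings:
  userid: HashRange(0,999999); ToString()
```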
## Experimentation Friendly
Because the workload YAML format is generic across activity types,
it is possible to ask one activity type to interpret the statements that are
meant for another. This isn't generally a good idea, but it becomes
extremely handy when you want to have a very high level activity type like
`stdout` use a lower-level syntax like that of the `cql` activity type.
When you do this, the stdout activity type _plays_ the statements to your
console as they would be executed in CQL, data bindings and all.
This means you can empirically and substantively demonstrate and verify
access patterns, data skew, and other dataset details before you
change back to cql mode and turn up the settings for a higher scale test.
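For example, a CQL workload can be dry-run through `stdout` with a handful of
cycles before scaling up:

```
# Render ten fully-bound statements to the console instead of executing them
./nb run type=stdout yaml=my-cql-workload cycles=10
```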

@@ -0,0 +1,93 @@
---
title: Scripting Environment
weight: 3
---
The ability to write open-ended testing simulations is provided in
NoSQLBench by means of a scripted runtime, where each scenario is
driven from a control script that can do anything the user wants.
## Dynamic Parameters
Some configuration parameters of activities are designed to be
assignable while a workload is running. This makes things like
threads, rates, and other workload dynamics pseudo real-time.
The internal APIs work with the scripting environment to expose
these parameters directly to scenario scripts.
## Scripting Automatons
When a NoSQLBench scenario is running, it is under the control of a
single-threaded script. Each activity that is started by this script
is run within its own threadpool, asynchronously.
The control script has executive control of the activities, as well
as full visibility into the metrics that are provided by each activity.
The way these two parts of the runtime meet is through the service
objects which are installed into the scripting runtime. These service
objects provide a named access point for each running activity and its
metrics.
This means that the scenario script can do something simple, like start
activities and wait for them to complete, OR, it can do something
more sophisticated, like iteratively scrutinizing the metrics
and making realtime adjustments to the workload while it runs.
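A minimal sketch of such a loop is shown below. The `scenario` and `activities`
service objects are those described above; the specific method and property
names are assumptions for illustration:

```javascript
// Start an activity, let it warm up, then adjust it on the fly
// (method and property names assumed)
scenario.start("type=cql yaml=myworkload alias=main threads=10");
scenario.waitMillis(60000);        // observe for one minute
activities.main.threads = "20";    // dynamic parameter adjustment
scenario.waitMillis(60000);
scenario.stop("main");
```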
## Analysis Methods
Scripting automatons that do feedback-oriented analysis of a target system
are called analysis methods in NoSQLBench. We have prototyped a couple of
these already, but there is nothing keeping the adventurous from coming up
with their own.
## Command Line Scripting
The command line has the form of basic test commands and parameters.
These commands get converted directly into scenario control script
in the order they appear. The user can choose whether to stay in
high-level executive mode, with simple commands like `run yaml=...`,
or to drop down directly into script design. They can look at the
equivalent script for any command line by running with `--show-script`.
If you take the script that is dumped to console and run it, it should
do exactly the same thing as if you had simply run the standard commands.
There are even ways to combine script fragments, full commands, and calls
to scripts on the command line. Since each variant is merely a way of
constructing scenario script, they all get composited together before
the scenario script is run.
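For example, to inspect the composited script for a given command line:

```
# Print the equivalent scenario script instead of running it
./nb run type=stdout yaml=myworkload cycles=10 --show-script
```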
New introductions to NoSQLBench should focus on the command line. Once
a user is familiar with this, it is up to them whether to tap into the
deeper functionality. If they don't need to know about scenario scripting,
then they shouldn't have to learn about it to be effective.
## Compared to DSLs
Other tools may claim that their DSL makes scenario "simulation" easier.
In practice, any DSL is generally dependent on a development tool to
lay the language out in front of a user in a fluent way. This means that
DSLs are almost always developer-targeted tools, and mostly useless for
casual users who don't want to break out an IDE.
One of the things a DSL proponent may tell you is that it shows you
"all the things you can do!". This is de-facto the same thing as
telling you "all the things you can't do", because anything that is not
part of the DSL is out of reach. This is not a win for the user. For DSL-based
systems, the user has to use the DSL whether or not it enhances their creative
control, while in fact, most DSLs aren't rich enough to do much that is
interesting from a simulation perspective.
In NoSQLBench, we don't force the user to use the programming abstractions
except at a very surface level -- the CLI. It is up to the user whether
or not to open the secret access panel for the more advanced functionality.
If they decide to do this, we give them a commodity language (ECMAScript),
and we wire it into all the things they were already using. We don't take
away their expressivity by telling them what they can't do. This way,
users can pick their level of investment and reward as best fits their
individual needs, as it should be.
## Scripting Extensions
Also mentioned under the section on modularity, it is relatively easy
for a developer to add their own scripting extensions into NoSQLBench.

@@ -0,0 +1,94 @@
---
title: Virtual DataSets
weight: 1
---
The _Virtual Dataset_ capabilities within NoSQLBench allow you to
generate data on the fly. There are many reasons for using this technique
in testing, but it is often a topic that is overlooked or taken for granted.
## Industrial Strength
The algorithms used to generate data are based on
advanced techniques in the realm of variate sampling. The authors have
gone to great lengths to ensure that data generation is efficient and
as close to O(1) in processing time as possible.
For example...
One technique that is used to achieve this is to initialize and cache
data in high resolution look-up tables for distributions which may perform
differently depending on their density functions. The existing Apache
Commons Math libraries have been adapted into a set of interpolated
Inverse Cumulative Distribution sampling functions. This means that
you can use a Zipfian distribution in the same place as you would a
Uniform distribution, and once initialized, they sample with identical
overhead. This means that by changing your test definition, you don't
accidentally change the behavior of your test client.
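As a sketch, swapping distributions is a one-token change in a binding recipe.
The function names below follow the virtdata naming style, but check them
against the built-in function listing:

```yaml
# Two interchangeable bindings with identical per-sample overhead once initialized
bindings:
  userid_uniform: Uniform(0,1000000)
  userid_zipf: Zipf(1000000,1.2)    # (elements, exponent) -- signature assumed
```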
## The Right Tool
Many other testing systems avoid building a dataset generation component.
It's a tough problem to solve, so it's often just avoided. Instead, they use
libraries like "faker" and variations on that. However, faker is well named,
no pun intended. It was meant as a vignette library, not a source of test
data for realistic results. If you are using a testing tool for scale testing
and relying on a faker variant, then you will almost certainly get invalid
results for any serious test.
The virtual dataset component of NoSQLBench is a library that was designed
for high scale and realistic data streams.
## Deterministic
The data that is generated by the virtual dataset libraries is deterministic.
This means that for a given cycle in a test, the operation that is synthesized
for that cycle will be the same from one session to the next. This is intentional.
If you want to perturb the test data from one session to the next, then you can
most easily do it by simply selecting a different set of cycles as your basis.
This means that if you find something interesting in a test run, you can go
back to it just by specifying the cycles in question. It also means that you
aren't losing comparative value between tests with additional randomness thrown
in. The data you generate will still look random to the human eye, but that doesn't
mean it isn't reproducible.
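For example:

```
# Re-running the same cycles reproduces exactly the same operations and data
./nb run type=stdout yaml=myworkload cycles=0..1000

# To perturb the data instead, select a different cycle range as the basis
./nb run type=stdout yaml=myworkload cycles=1000..2000
```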
## Statistically Shaped
This means that the values you use to tie your dataset together
can follow any distribution that is appropriate. You can ask for
a stream of floating-point values, 1 trillion values long, in any order.
You can use discrete or continuous distributions, with whatever parameters
you need.
## Best of Both Worlds
Some might worry that fully synthetic testing data is not realistic enough.
The devil is in the details on these arguments, but suffice it to say that
you can pick the level of real data you use as seed data with NoSQLBench.
For example, using the alias sampling method and a published US census
(public domain) list of names and surnames that occurred more than 100 times,
we can provide extremely accurate samples of names according to the
discrete distribution we know of. The alias method allows us to sample
accurately in O(1) time from the entire dataset by turning a large number
of weights into two uniform samples. You will simply not find a better way
to sample US names than this. (But if you do, please file an issue!)
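As a sketch, a binding over such seed data could look like the following.
`FirstNames()`, `LastNames()`, and `Template(...)` follow the virtdata function
naming style; verify them against the built-in function listing:

```yaml
bindings:
  # Census-weighted name sampling, composed into a full name (function names assumed)
  fullname: Template('{} {}', FirstNames(), LastNames())
```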
## Java Idiomatic Extension
The way that the virtual dataset component works allows Java developers to
write any extension to the data generation functions simply in the form
of Java 8 (or newer) functional interfaces. As long as they include the
annotation processor and annotate their classes, they will show up in the
runtime and be available to any workload by their class name.
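A minimal sketch of such an extension is shown below. The annotation name and
package are recalled from the virtdata API of this era and should be treated as
assumptions; the functional-interface shape is the essential part:

```java
package com.example.bindings;

// Annotation name and package assumed for illustration
import io.nosqlbench.virtdata.api.annotations.ThreadSafeMapper;

import java.util.function.LongFunction;

/**
 * A custom binding function which maps a cycle to a hex token.
 * Compiled with the annotation processor on the classpath, it becomes
 * available in workload bindings by its class name, e.g. HexToken().
 */
@ThreadSafeMapper
public class HexToken implements LongFunction<String> {
    @Override
    public String apply(long cycle) {
        return Long.toHexString(cycle);
    }
}
```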
## Binding Recipes
It is possible to stitch data generation functions together directly in
a workload YAML. These are data-flow sketches of functions that can
be copied and pasted between workload descriptions to share or remix
data streams. This allows the adventurous to build sophisticated
virtual datasets that emulate nuances of real datasets, but in a form
that takes up less space on the screen than this paragraph!
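For example, a recipe chains functions left to right with semicolons; the
specific functions here are illustrative:

```yaml
bindings:
  # Hash the cycle into a bounded range, then render it as a readable id
  userid: HashRange(0,999999); NumberNameToString()
```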

@ -1,44 +0,0 @@
---
title: NoSQLBench Showcase
weight: 80
---
Since NoSQLBench is new on the scene in its current form, you may be wondering
why you would want to use it over any other tool. That is what this section is all
about.
First, a brief overview, vis-a-vis some tools you may have already used.
NoSQLBench can do everything that other testing tools can do, and more. It
achieves this by focusing on a scalable user experience in combination with a
modular internal architecture.
NoSQLBench is a workload construction and simulation tool for scalable systems
testing. That is an entirely different scope of endeavor than most other tools.
Here are some of the serious capabilities that are unique to NoSQLBench:
- Metrics are built-in
- Scenario Scripting
- Command Line Scripting
- Scripting Extensions
- Multi-Protocol Support
- Analysis Methods
- Deterministic Workloads
- Advanced Variate Sampling
- Attention to Performance
- Coordinated Omission is not special
- Synchronous and Asynchronous
- Flexible Sequencing
- Advanced Rate Limiting
- Blazing fast Procedural Data Generation
- Virtual Data Set Recipes with Lambdas
- Experimentation Friendly
- Industrial Strength
- Built-In Seed Data
- Not limited by a "DSL"
1. You can control your test data, down to the operation.
2. All operations are deterministic according to the cycle.
3. The core concepts are battle tested and refined
4. We serve quick workflows as well as advanced testing scenarios.