Merge branch 'master' into releases

2025-02-25 18:55:28 -06:00 · 2020-03-13 02:51:18 -05:00 · 2020-03-13 02:51:18 -05:00 · fae0ea49e9
commit fae0ea49e9
parent e2a7090cde 91d38091f4
18 changed files with 205 additions and 100 deletions
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -1,4 +1,4 @@
-name: CI
+name: build

 on:
  push:
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -1,4 +1,4 @@
-name: CI
+name: release

 on:
  push:
--- a/README.md
+++ b/README.md
@@ -1,3 +1,5 @@
+![maven build](https://github.com/nosqlbench/nosqlbench/workflows/build/badge.svg)
+
 This project combines upstream projects of engineblock and virtualdataset into one main project. More details on release practices and contributor guidelines are on the way.

 # Status
--- a/engine-docs/src/main/resources/docs-for-nb/09_reference/02_cli_scripting.md
+++ b/engine-docs/src/main/resources/docs-for-nb/09_reference/02_cli_scripting.md
@@ -1,6 +1,6 @@
 ---
 title: CLI Scripting
--------------------
+---

 # CLI Scripting

--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/concepts.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/concepts.md
@@ -0,0 +1,100 @@
+---
+title: Binding Concepts
+weight: 10
+---
+
+NoSQLBench has a built-in library for the flexible management and expressive use of
+procedural generation libraries. This section explains the core concepts
+of this library, known as _Virtual Data Set_.
+
+## Variates (Samples)
+
+A numeric sample that is drawn from a distribution for the purpose
+of simulation or analysis is called a *Variate*.
+
+## Procedural Generation
+
+Procedural generation is a category of algorithms and techniques which take
+a set or stream of inputs and produce an output in a different form or structure.
+While it may appear that procedural generation actually _generates_ data, no output
+can come from a void. These techniques simply perturb a value in some stateful way,
+or map a coordinate system to another representation. Sometimes both techniques are
+combined.
+
+## Uniform Variate
+
+A variate (sample) drawn from a uniform (flat) distribution is what we are used
+to seeing when we ask a system for a "random" value. These are usually produced in
+one of two common forms: either a register full of bits, as with most hashing
+functions, or a floating point value between 0.0 and 1.0 (the _unit
+interval_).
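+
+As a minimal sketch of the two forms above, the Python below takes 64 bits from a
+hash of the input and then scales the top 53 bits into the unit interval. The helper
+names `hash64` and `to_unit_interval` are hypothetical, and BLAKE2b is an arbitrary
+choice of hash for illustration, not necessarily what NoSQLBench uses.
+
+```python
+import hashlib
+
+def hash64(n: int) -> int:
+    """Form 1: a register full of bits, taken from a 64-bit hash of the input."""
+    data = n.to_bytes(8, "little")  # assumes 0 <= n < 2**64
+    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")
+
+def to_unit_interval(bits: int) -> float:
+    """Form 2: keep the top 53 bits (the precision of a double) and scale into [0.0, 1.0)."""
+    return (bits >> 11) / float(1 << 53)
+
+# The same input always yields the same bits, and therefore the same unit-interval value.
+sample = to_unit_interval(hash64(42))
+```
+
+Keeping only 53 bits means every result is exactly representable as a double and can
+never round up to 1.0.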
+
+Uniform variates are not really random. Without careful attention to API usage,
+the same "random" samples may even repeat from session to session. In many systems,
+the programmer has to seed the random generator carefully, or the program will
+produce the same sequence of numbers every time it runs. This turns out
+to be a useful property. Random number generators that behave this way are
+called Pseudo-Random Number Generators, or PRNGs.
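+
+The replay property is easy to see with Python's standard `random.Random`, used here
+only as a familiar PRNG (a sketch, not how NoSQLBench draws values; `draw` is a
+hypothetical helper): two generators given the same seed produce the same sequence.
+
+```python
+import random
+
+def draw(seed: int, count: int) -> list[float]:
+    """Draw `count` uniform variates from a PRNG initialized with `seed`."""
+    rng = random.Random(seed)  # explicit seed: the whole sequence is determined by it
+    return [rng.random() for _ in range(count)]
+
+first_run = draw(seed=1234, count=5)
+second_run = draw(seed=1234, count=5)
+other_seed = draw(seed=999, count=5)
+```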
+
+## Apparently Random Variates
+
+Uniform variates produced by PRNGs are not actually random, even though they may
+pass certain tests for randomness. The stream of values from a good PRNG is
+measurably random by some meaningful standard, yet the same stream can be
+reproduced exactly from the same initial seed.
+
+## Deterministic Variates
+
+If you intentionally avoid randomizing the initial seed of a PRNG (for example,
+with the current timestamp), you gain a way to replay a sequence.
+You can think of each initial seed as a _bank_ of values which you can go back
+to at any time. However, because a PRNG carries internal state, each value depends
+on how many values were drawn before it, so your results will be order dependent.
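+
+A sketch of that order dependence, again with `random.Random` standing in for a
+stateful PRNG and a hypothetical `nth_variate` helper: to reach position `n` you
+have to step through every earlier value, and one extra draw from the same generator
+shifts which value lands at each later position.
+
+```python
+import random
+
+def nth_variate(seed: int, n: int) -> float:
+    """Return the n-th (0-based) value for `seed` by stepping through all earlier values."""
+    rng = random.Random(seed)
+    for _ in range(n):
+        rng.random()  # discard values 0 .. n-1 only to advance the internal state
+    return rng.random()
+
+# Drawing in sequence and asking for position 3 directly agree.
+rng = random.Random(7)
+sequence = [rng.random() for _ in range(5)]
+direct = nth_variate(7, 3)
+
+# An extra draw in between changes which value lands at each later position.
+rng2 = random.Random(7)
+shifted = []
+for i in range(5):
+    if i == 1:
+        rng2.random()  # some other consumer takes one value from the same generator
+    shifted.append(rng2.random())
+```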
+
+## Randomly Accessible Determinism
+
+Instead of using a PRNG, you can use a hash function. With a 64-bit
+register, you have 2^64 (2^63 in practice due to available implementations) possible
+values. If your hash function has high dispersion, you get both apparent
+randomness and deterministic sequences, even when you feed simple sequential
+inputs such as 0, 1, 2 and so on to your _random()_ function. This allows
+you to access the random value in bucket 57, for example, and go back to it at any
+time and in any order to get the same value again.
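+
+The sketch below uses the SplitMix64 finalizer as the mixing function. That choice is
+an assumption for illustration (it is a well-known 64-bit mixer, not necessarily the
+hash NoSQLBench uses). Because each value depends only on its bucket number, bucket 57
+can be read at any time, in any order.
+
+```python
+MASK64 = (1 << 64) - 1
+
+def mix64(x: int) -> int:
+    """SplitMix64-style mixer: a bijective, high-dispersion map on 64-bit values."""
+    z = (x + 0x9E3779B97F4A7C15) & MASK64
+    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
+    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
+    return z ^ (z >> 31)
+
+# Sequential inputs, apparently random outputs.
+forward = [mix64(i) for i in range(100)]
+# Visiting the same buckets in reverse order yields the same value for each bucket.
+backward = {i: mix64(i) for i in reversed(range(100))}
+bucket_57 = mix64(57)
+```
+
+Every step (adding a constant, an xor-shift, multiplying by an odd constant) is
+invertible modulo 2^64, so distinct inputs never collide.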
+
+## Data Mapping Functions
+
+The data mapping functions are the core building block of Virtual Data Set.
+Data mapping functions are generally pure functions. This simply means that
+a mapping function will always provide the same result given the same input.
+The parameters that you see on some binding recipes do not represent
+volatile state. These parameters are initializer values which are part of a
+function's definition. For example, a `Mod(5)` will always behave like a `Mod(5)`,
+as a pure function. A `Mod(7)` behaves differently from a `Mod(5)`, although
+each function will always produce its own stable result for a given input.
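+
+To illustrate the difference between initializer parameters and state, here is a
+hypothetical Python stand-in for `Mod`, assuming it returns the input modulo its
+parameter. The parameter is fixed when the function is built, and nothing changes
+between calls.
+
+```python
+from typing import Callable
+
+def Mod(modulo: int) -> Callable[[int], int]:
+    """Build a pure mapping function; `modulo` is part of its definition, not mutable state."""
+    if modulo <= 0:
+        raise ValueError("modulo must be positive")
+
+    def apply(value: int) -> int:
+        return value % modulo
+
+    return apply
+
+mod5 = Mod(5)
+mod7 = Mod(7)
+# Repeating an input repeats the output; different parameters define different functions.
+results = [(n, mod5(n), mod7(n)) for n in (12, 12, 33)]
+```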
+
+## Combining RNGs and Data Mapping Functions
+
+Because pure functions play such a key part in procedural generation techniques,
+the terms "data mapping function", "data mapper" and "data mapping library" will
+be more common in the library than "generator". Conceptually, mapping functions
+do not generate anything. It makes more sense to think of mapping data from one
+domain to another. Even so, the data that is yielded by mapping functions can
+appear quite realistic.
+
+Because good RNGs generally contain internal state, they aren't purely
+functional. This means that in cases where you need random access to a
+virtual data set, hash functions make more sense. This toolkit lets you
+choose between the two in some cases. However, it generally favors hashing
+and pure-function approaches where possible. Even the statistical curve
+simulations do this.
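+
+As a sketch of how a statistical curve can still be a pure function, the example
+below hashes the input into the unit interval and then applies the inverse CDF of an
+exponential distribution (inverse transform sampling). The helper names, the BLAKE2b
+hash, and the choice of distribution are assumptions for illustration, not a
+description of NoSQLBench's implementation.
+
+```python
+import hashlib
+import math
+from typing import Callable
+
+def unit_interval(n: int) -> float:
+    """Hash a non-negative input to 64 bits, then scale the top 53 bits into [0.0, 1.0)."""
+    digest = hashlib.blake2b(n.to_bytes(8, "little"), digest_size=8).digest()
+    return (int.from_bytes(digest, "little") >> 11) / float(1 << 53)
+
+def Exponential(rate: float) -> Callable[[int], float]:
+    """Pure mapping function: input -> exponentially distributed value via the inverse CDF."""
+    def sample(n: int) -> float:
+        u = unit_interval(n)
+        return -math.log1p(-u) / rate  # inverse CDF: -ln(1 - u) / rate
+    return sample
+
+exp2 = Exponential(2.0)
+values = [exp2(i) for i in range(10000)]
+mean = sum(values) / len(values)  # expected to land near 1 / rate = 0.5
+```
+
+No generator state is involved: `exp2(123)` can be computed first, last, or twice,
+and it always returns the same value.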
+
+## Bindings Template
+
+It is often useful to have a template that describes a set of generator
+functions that can be reused across many threads or other application scopes. A
+bindings template captures the requested generator functions for reuse, while
+the point of use controls when, and in which scope, the functions are actually
+instantiated. For example, in a JEE app, you may have a bindings template in
+the application scope, and a set of actual bindings within each request (thread
+scope).
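+
+A minimal sketch of that separation, with hypothetical names throughout: the
+template stores each binding as a factory, and each scope (here, each thread, via
+`threading.local`) instantiates its own set of functions the first time it needs them.
+
+```python
+import threading
+from typing import Callable, Dict
+
+Mapper = Callable[[int], object]
+
+class BindingsTemplate:
+    """Hypothetical sketch: captures named factories; instantiation is left to the point of use."""
+
+    def __init__(self, factories: Dict[str, Callable[[], Mapper]]):
+        self._factories = dict(factories)
+        self._local = threading.local()
+
+    def instantiate(self) -> Dict[str, Mapper]:
+        """Create a fresh, independent set of bindings."""
+        return {name: make() for name, make in self._factories.items()}
+
+    def for_current_thread(self) -> Dict[str, Mapper]:
+        """Thread scope: instantiate once per thread, then reuse that set."""
+        if not hasattr(self._local, "bindings"):
+            self._local.bindings = self.instantiate()
+        return self._local.bindings
+
+# Application scope: one shared template; each thread gets its own bindings from it.
+template = BindingsTemplate({"mod5": lambda: (lambda n: n % 5)})
+```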
+
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_collections.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_collections.md
@@ -1,4 +1,8 @@
-# CATEGORY collections
+---
+title: collections functions
+weight: 40
+---
+
 ## HashedLineToStringList

 Creates a List\<String\> from a list of words in a file.
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_conversion.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_conversion.md
@@ -1,4 +1,8 @@
-# CATEGORY conversion
+---
+title: conversion functions
+weight: 30
+---
+
 ## DigestToByteBuffer

 Computes the digest of the ByteBuffer on input and stores it in the output ByteBuffer. The digestTypes available are:
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_datetime.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_datetime.md
@@ -1,4 +1,8 @@
-# CATEGORY datetime
+---
+title: datetime functions
+weight: 20
+---
+
 ## DateTimeParser

 This function will parse a String containing a formatted date time, yielding a DateTime object. If no arguments are provided, then the format is set to
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_diagnostics.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_diagnostics.md
@@ -1,4 +1,8 @@
-# CATEGORY diagnostics
+---
+title: diagnostic functions
+weight: 40
+---
+
 ## Show

 Show diagnostic values for the thread-local variable map.
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_distributions.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_distributions.md
@@ -1,4 +1,8 @@
-# CATEGORY distributions
+---
+title: distribution functions
+weight: 30
+---
+
 ## Beta

 @see [Wikipedia: Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) @see [Commons JavaDoc: BetaDistribution](https://commons.apache.org/proper/commons-statistics/commons-statistics-distribution/apidocs/org/apache/commons/statistics/distribution/BetaDistribution.html)
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_functional.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_functional.md
@@ -1,4 +1,8 @@
-# CATEGORY functional
+---
+title: utility functions
+weight: 40
+---
+
 ## IntFlow

 Combine multiple IntUnaryOperators into a single function.
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_general.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_general.md
@@ -1,4 +1,8 @@
-# CATEGORY general
+---
+title: general functions
+weight: 20
+---
+
 ## Add

 Adds a value to the input.
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_nulls.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_nulls.md
@@ -1,4 +1,8 @@
-# CATEGORY nulls
+---
+title: null functions
+weight: 40
+---
+
 ## NullIfCloseTo

 Returns null if the input value is within range of the specified value.
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_premade.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_premade.md
@@ -1,4 +1,8 @@
-# CATEGORY premade
+---
+title: pre-made functions
+weight: 20
+---
+
 ## FirstNames

 Return a pseudo-randomly sampled first name from the last US census data on first names occurring more than 100 times. Both male and female names are combined in this function.
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_state.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/funcref_state.md
@@ -1,4 +1,8 @@
-# CATEGORY state
+---
+title: state functions
+weight: 30
+---
+
 ## Clear

 Clears the per-thread map which is used by the Expr function.
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/index.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/index.md
@@ -0,0 +1,25 @@
+---
+title: Binding Functions
+weight: 100
+---
+
+The functions which you can use to generate data in your workloads are
+called *binding functions*. They are injected into your statement templates by
+name, just as you might do with named parameters in CQL statements.
+
+These functions can be stitched together in small recipes. When you give
+these mapping functions useful names in your workloads, they are called
+bindings.
+
+Here is an example:
+
+```yaml
+bindings:
+ numbers: NumberNameToString()
+ names: FirstNames()
+```
+
+These are two bindings that you can use in your workloads. The names on the left
+are the _binding names_ and the functions on the right are the _binding recipes_.
+Altogether, we just call them _bindings_.
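+
+To make "injected by name" concrete, here is a rough Python sketch. The `{name}`
+placeholder syntax, the `render` helper, and the stand-in functions are assumptions
+for illustration; NoSQLBench's own template handling is more involved. Each binding
+name maps to a function of the cycle number, and every placeholder is replaced with
+that function's output for the current cycle.
+
+```python
+import re
+from typing import Callable, Dict
+
+PLACEHOLDER = re.compile(r"\{(\w+)\}")
+
+def render(template: str, bindings: Dict[str, Callable[[int], object]], cycle: int) -> str:
+    """Replace each {name} in the template with bindings[name](cycle)."""
+    def substitute(match: re.Match) -> str:
+        return str(bindings[match.group(1)](cycle))  # KeyError if a name has no binding
+    return PLACEHOLDER.sub(substitute, template)
+
+# Hypothetical stand-ins; the real NumberNameToString() and FirstNames() behave differently.
+bindings = {
+    "numbers": lambda cycle: f"number-{cycle}",
+    "names": lambda cycle: ["Alice", "Bob", "Carol"][cycle % 3],
+}
+statement = "insert into people (id, name) values ('{numbers}', '{names}');"
+```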
+
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/using_bindings.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/01_binding_functions/using_bindings.md
@@ -0,0 +1,25 @@
+---
+title: Using Bindings
+weight: 15
+---
+
+The functions which you can use to generate data in your workloads are
+mapped into your operations by name, just like you would do with a
+prepared statement, for example.
+
+These functions can be stitched together in small recipes. When you give
+these mapping functions useful names in your workloads, they are called
+bindings.
+
+Here is an example:
+
+```yaml
+bindings:
+ numbers: NumberNameToString()
+ names: FirstNames()
+```
+
+These are two bindings that you can use in your workloads. The names on the left
+are the _binding names_ and the functions on the right are the _binding recipes_.
+Altogether, we just call them _bindings_.
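+
+The idea of stitching functions together into small recipes can be sketched as plain
+function composition, where each step's output feeds the next. The `recipe` helper and
+the step functions are hypothetical; this only illustrates chaining, not the binding
+recipe syntax or its type handling.
+
+```python
+from functools import reduce
+from typing import Callable
+
+Step = Callable[[object], object]
+
+def recipe(*steps: Step) -> Step:
+    """Compose steps left to right: recipe(f, g)(x) == g(f(x))."""
+    return lambda value: reduce(lambda acc, step: step(acc), steps, value)
+
+# A hypothetical three-step recipe: scale, bound, then render as a string.
+scaled_id = recipe(
+    lambda n: n * 31,
+    lambda n: n % 1000,
+    lambda n: f"id-{n:03d}",
+)
+```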
+
--- a/virtdata-userlibs/src/main/resources/docs-for-virtdata/virtdata-dev/concepts.md
+++ b/virtdata-userlibs/src/main/resources/docs-for-virtdata/virtdata-dev/concepts.md
@@ -1,87 +0,0 @@
-# Virtual Dataset Concepts
-
-VirtData is a library for the flexible management and expressive use of
-procedural generation libraries. It is a reincarnation of a previous project.
-This version of the idea starts by focusing directly on usage aspects and
-extension points rather than the big idea.
-
-### Procedural Generation
-
-Procedural generation is a general class of methods for taking a set of inputs
-and modifying them in a predictable way to generate content which appears random
-but is actually deterministic. For example, some games use procedural generation
-to take a single value known as the "seed" to generate an apparently rich and
-interesting world.
-
-### Apparently Random RNGs
-
-Sequences of values produced by RNGs (more properly called PRNGs) are not
-actually random, even though they may pass certain tests for randomness. In
-practice, the combination of these two properties is quite valuable for testing
-and data synthesis. Having a stream of data that is measurably random by some
-meaningful standard, but which is configurable and reusable allows for test to
-be replayed, for example.
-
-### Apparently Random Samples
-
-Just as RNGs can appear random when the are not truly, statistical distributions
-which rely on them can also appear random. Uniform random number generators over
-the unit interval [0,1.0) are a common input to virtual sampling methods. This
-means that if you can configure the RNG stream that you feed into your virtual
-sampling methods, you can simulate a repeatable sequence from a known
-distribution.
-
-### Data Mapping Functions
-
-The data mapping functions are the core building block of virtdata. They are the
-functional logic that powers all procedural generation. Data mapping functions
-are generally pure functions. This simply means that a generator function will
-always provide the same result given the same input. All top-level mapping
-functions all take a long value as their input, and produce a result based on
-their parameterized type.
-
-##### Combining RNGs and Data Mapping Functions
-
-Because pure functions play such a key part in procedural generation techniques,
-the terms "data mapping function", "data mapper" and "data mapping library" will
-be more common in the library than "generator". Conceptually, mapping functions
-to not generate anything. It makes more sense to think of mapping data from one
-domain to another. Even so, the data that is yielded by mapping functions can
-appear quite realistic.
-
-Because good RNGs do generally contain internal state, they aren't purely
-functional. This means that in some cases -- those in which you need to have
-random access to a virtual data set, hash functions make more sense. This
-toolkit allows you to choose between the two in some cases. However, it
-generally favors using hashing and pure-function approaches where possible. Even
-the statistical curve simulations do this.
-
-### Data Mapper Library
-
-Data Mapping functions are packaged into libraries which can be loaded by the
-virtdata-user component of the project. Each library has a name, a function
-resolver, and a set of functions that can be instantiated via the function
-resolver.
-
-### Function Resolver
-
-Each library must implement its own function resolver. This is because each
-library may have a different way of naming, finding, creating or managing
-function generator instances. For the user, the description of a generator is
-simply a string. What the generator library does with it is
-implementation-specific. This means that some generator libraries may simply
-have constructor signatures as function specifiers, and others may go as far as
-implementing their own DSL. The basic contract for a function resolver is that
-you pass it a string describing what you want, and it provides a generator
-function in return.
-
-#### Bindings Template
-
-It is often useful to have a template that describes a set of generator
-functions that can be reused across many threads or other application scopes. A
-bindings template is a way to capture the requested generator functions for
-re-use, with actual scope instantiation of the generator functions controlled by
-the usage point. For example, in a JEE app, you may have a bindings template in
-the application scope, and a set of actual bindings within each request (thread
-scope).
-