docs structure updates

This commit is contained in:
Jonathan Shook
2020-03-28 01:52:17 -05:00
parent 7a75408423
commit 0908f53494
29 changed files with 196 additions and 39 deletions

View File

@@ -89,9 +89,10 @@ Note the differences between this and the command that we used to generate the s
appropriately large number of cycles in actual testing to make your main test meaningful.
:::info
The cycles parameter is not just a quantity. It is a range of values. The `cycles=n` format is short for
`cycles=0..n`, which makes cycles a zero-based range. For example, cycles=5 means that the activity will use cycles
0,1,2,3,4, but not 5. The reason for this is explained in detail in the Activity Parameters section.
:::
These parameters are explained in detail in the section on _Activity Parameters_.

View File

@@ -31,12 +31,12 @@ parameter. This is a way of templating a workload and make it multi-purpose or a
## Experimentation Friendly
Because the workload YAML format is generic across driver types, it is possible to ask one driver type to interpret the
statements that are meant for another. This isn't generally a good idea, but it becomes extremely handy when you want to
have a high-level driver type like `stdout` interpret the syntax of another driver like `cql`. When you do this, the
stdout driver _plays_ the statements to your console as they would be executed in CQL, data bindings and all. This means
you can empirically and substantively demonstrate and verify access patterns, data skew, and other dataset details
before you change back to cql mode and turn up the settings for a higher-scale test. It takes away the guesswork about
what your test is actually doing, and it works for all drivers.
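As a rough sketch of what this looks like in practice, consider a workload shaped like the one below. The keyspace,
table, statement, and binding function names are illustrative assumptions; the point is that the same YAML can be
pointed at the `stdout` driver to print rendered statements, or at the `cql` driver to execute them.

```yaml
# Minimal workload sketch (names are illustrative). With the stdout driver,
# each cycle prints the statement with its bindings filled in instead of
# executing it against a cluster.
bindings:
  userid: HashRange(0,1000000)
  username: NumberNameToString()
statements:
  - |
    insert into examplekeyspace.users (userid, username)
    values ({userid},'{username}');
```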

View File

@@ -106,6 +106,7 @@
        <include>META-INF/functions</include>
        <include>data/**</include>
        <include>docs-for-virtdata/**</include>
        <include>docs/**</include>
      </includes>
    </resource>
  </resources>

View File

@@ -1,30 +0,0 @@
package io.nosqlbench.virtdata.userlibs.apps.docsapp;

import io.nosqlbench.virtdata.annotations.Category;
import io.nosqlbench.virtdata.processors.DocCtorData;

import java.util.*;

public class FunctionDoc {

    private String funcName;
    private String classDocs;
    private Set<Category> categories = new HashSet<>();
    private List<DocCtorData> ctors = new ArrayList<>();

    public FunctionDoc(String funcName) {
        this.funcName = funcName;
    }

    public void setClassDocs(String distinctClassDocs) {
        this.classDocs = distinctClassDocs;
    }

    public void addCategories(Category[] categories) {
        this.categories.addAll(Arrays.asList(categories));
    }

    public void addCtor(DocCtorData ctor) {
        this.ctors.add(ctor);
    }
}

View File

@@ -0,0 +1,10 @@
---
title: collection functions
weight: 40
---
Collection functions allow you to construct Java Lists, Maps or Sets.
These functions often take the form of a higher-order function, where
the inner function definitions are called to determine the size of
the collection, the individual values to be added, etc.
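For instance, a binding for a collection-typed field might look like the sketch below. The specific function names and
signatures here are assumptions for illustration; check the generated function docs in this category for the exact
collection functions and their constructor forms.

```yaml
bindings:
  # Hypothetical higher-order collection binding: the first inner function
  # chooses the list size for each cycle, the second produces each element.
  tags: ListSized(HashRange(1,5),NumberNameToString())
```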

View File

@@ -0,0 +1,8 @@
---
title: conversion functions
weight: 30
---
Conversion functions simply allow values of one type
to be converted to another type in an obvious way.
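A typical use is to adapt the output of one function to the type a field actually needs, as in this sketch. The
function names are illustrative, and the trailing `-> String` type hint is optional.

```yaml
bindings:
  # Generate a number, then convert it to a String for a text-typed field.
  user_tag: HashRange(0,999999); ToString() -> String
```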

View File

@@ -0,0 +1,12 @@
---
title: datetime functions
weight: 20
---
Functions in this category know about times and dates, datetimes, epoch times in seconds or milliseconds, and so forth.
Some of the functions in this category allow testing of UUID types, which are usually designed to avoid determinism.
This makes it possible to test systems which depend on UUIDs but which require determinism in test data.
This is strictly for testing use. Breaking the universally-unique properties of UUIDs in production systems is a bad
idea. Yet, in testing, this determinism is quite useful.
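A sketch of deterministic time-oriented bindings is shown below. The function names are illustrative assumptions; the
generated docs for this category list the exact datetime and time-UUID functions available.

```yaml
bindings:
  # Both values are derived purely from the cycle number, so re-running the
  # same cycles reproduces the same dates and time-UUIDs.
  event_date: ToDate()
  event_uuid: ToEpochTimeUUID()
```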

View File

@@ -0,0 +1,7 @@
---
title: diagnostic functions
weight: 40
---
Diagnostic functions can be used to help you construct the right VirtData recipe.

View File

@@ -0,0 +1,92 @@
---
title: distribution functions
weight: 30
---
All of the distributions that are provided in the Apache Commons Math
project are supported here, in multiple forms.
## Continuous or Discrete
These distributions break down into two main categories:
### Continuous Distributions
These are distributions over real numbers like 23.4323, with
continuity across the values. Each of the continuous distributions can
provide samples that fall on an interval of the real number line.
Continuous probability distributions include the *Normal* distribution,
and the *Exponential* distribution, among many others.
### Discrete Distributions
Discrete distributions, also known as *integer distributions* have only
whole-number valued samples. These distributions include the *Binomial*
distribution, the *Zipf* distribution, and the *Poisson* distribution,
among others.
## Hashed or Mapped
### hashed samples
Generally, you will want to "randomly sample" from a probability distribution.
This is handled automatically by the functions below if you do not override the
defaults. **The `hash` mode is the default sampling mode for probability
distributions.** This is accomplished by computing an internal hash of the input and then using the resulting value,
as a unit interval variate, to map into the sampling curve. This is called the `hash` sampling mode by VirtData. You
can put `hash`
into the modifiers as explained below if you want to document it explicitly.
### mapped samples
The method used to sample from these distributions depends on a mathematical
function called the cumulative probability function, or more specifically
the inverse of it. Having this function computed over some interval allows
one to sample the shape of a distribution progressively if desired. In
other words, it allows for some *percentile-like* view of values within
a given probability distribution. This mode of using the inverse cumulative distribution function is known as the
`map` mode in VirtData, as it allows one
to map a unit interval variate in a deterministic way to a density
sampling curve. To enable this mode, simply pass `map` as one of the
function modifiers for any function in this category.
## Interpolated or Computed Samples
When sampling from mathematical models of probability densities, performance
between different densities can vary drastically. This means that you may
end up perturbing the results of your test in an unexpected way simply
by changing parameters of your testing distributions. Even worse, some
densities have painful corner cases in performance, like 'Zipf', which
can make tests unbearably slow and flawed as they chew up CPU resources.
### Interpolated Samples
For this reason, interpolation is built into these sampling functions.
**The default mode is `interpolate`.** This means that the sampling
function is pre-computed over 1000 equidistant points in the unit interval,
and the result is shared among all threads as a look-up-table for
interpolation. This makes all statistical sampling functions perform nearly
identically at runtime (after initialization, a one-time cost).
This does have the minor side effect of a little loss in accuracy, but
the difference is generally negligible for nearly all performance testing
cases.
### Computed Samples
Conversely, `compute` mode sampling calls the sampling function every
time a sample is needed. This affords a little more accuracy, but is generally
not preferable to the default interpolated mode. You'll know if you need
computed samples. Otherwise, it's best to stick with interpolation so that
you spend more time testing your target system and less time testing
your data generation functions.
## Input Range
All of these functions take a long as the input value for sampling. This
is similar to how the unit interval (0.0,1.0) is used in mathematics
and statistics, but more tailored to modern system capabilities. Instead
of using the unit interval, we simply use the interval of all positive
longs. This provides more compatibility with other functions in VirtData,
including hashing functions.
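Putting the above together, distribution bindings might look like the sketch below. The distribution names follow the
Apache Commons Math forms, and the `'map'` and `'compute'` modifiers are shown as extra string arguments, which is an
assumption about the modifier syntax; the defaults (`hash` and `interpolate`) apply when no modifiers are given.

```yaml
bindings:
  # Continuous distribution with default hash + interpolate sampling.
  latency_ms: Normal(100.0,15.0)
  # Same distribution, but mapped (percentile-like) and computed per sample.
  latency_sweep: Normal(100.0,15.0,'map','compute')
  # Discrete (integer) distribution.
  popularity_rank: Zipf(10000,1.2)
```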

View File

@@ -0,0 +1,6 @@
---
title: flow functions
weight: 40
---
These functions help combine other functions into higher-order functions when needed.

View File

@@ -0,0 +1,7 @@
---
title: general functions
weight: 20
---
These functions have no particular category, so they ended up here by default.

View File

@@ -0,0 +1,13 @@
---
title: null functions
weight: 40
---
These functions can generate null values. When using nulls in your binding recipes, ensure that you don't generate them
in-line as inputs to other functions. This will lead to errors which interrupt your test. If you must use functions that
generate null values, ensure that they are the only or last function in a chain.
If you need to mark a field to be undefined, but _not set to null_, then use the functions which know how to yield a
VALUE.UNSET, which is a sigil constant within the VirtData runtime. These functions are correctly interpreted by
conformant drivers like the SQL driver so that they will avoid injecting the named field into an operation if it has
this special value.
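The rule above can be sketched as follows. The null- and unset-yielding function names here are hypothetical
placeholders; substitute whichever functions this category actually provides.

```yaml
bindings:
  # OK: the (hypothetical) null-yielding function is the last link in the chain.
  middle_name: NumberNameToString(); NullOrPass()
  # Not OK: a null produced mid-chain becomes the input to ToString() and will
  # interrupt the test with errors.
  # broken_name: NullOrPass(); ToString()
```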

View File

@@ -0,0 +1,11 @@
---
title: pre-made functions
weight: 20
---
Functions in this category are meant to provide easy grab-and-go functions that are tailored for real-world simulation.
This library will grow over time. These functions are often built directly on top of other functions in the core
libraries. However, they are provided here for simplicity in workload construction. They perform exactly the same as
their longer-form equivalents.

View File

@@ -0,0 +1,19 @@
---
title: state functions
weight: 30
---
Functions in the state category allow you to do things with side-effects in the function flow. Specifically, they allow
you to save or load values of named variables to thread-local registers. These work best when used with non-async
activities, since the normal statement grouping allows you to share data between statements in the sequence. It is not
advised to use these with async activities.
When using these functions, be aware that a binding's side effects only occur when that binding is actually used by a
statement. For example, if you have account records and transaction records, and you want to save the account
identifier for use within the transaction inserts, you must ensure that each account binding is used within the thread
first.
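A sketch of that account/transaction case is shown below. The `Save` and `Load` function names, the statement text, and
the register name are illustrative assumptions; the key point is that the saving binding (`account_id`) must be used by
a statement in the thread before the loading binding (`acct_ref`) is used.

```yaml
bindings:
  # Save() stores the generated value in a thread-local register named 'acct'
  # and passes it through; Load() reads it back for a later statement.
  account_id: HashRange(0,1000000); Save('acct')
  acct_ref: Load('acct')
  amount: Uniform(1,10000)
statements:
  - insert into app.accounts (id) values ({account_id});
  - insert into app.transactions (account, amount) values ({acct_ref},{amount});
```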