mirror of
https://github.com/nosqlbench/nosqlbench.git
synced 2025-02-25 18:55:28 -06:00
docs structure updates
@@ -89,9 +89,10 @@ Note the differences between this and the command that we used to generate the s
appropriately large number of cycles in actual testing to make your main test meaningful.

:::info
The cycles parameter is not just a quantity. It is a range of values. The `cycles=n` format is short for `cycles=0..n`,
which makes cycles a zero-based quantity by default. For example, cycles=5 means that the activity will use cycles
0,1,2,3,4, but not 5. The reason for this is explained in detail in the Activity Parameters section.
The cycles parameter is not just a quantity. It is a range of values. The `cycles=n` format is short for
`cycles=0..n`, which makes cycles a zero-based range. For example, cycles=5 means that the activity will use cycles
0,1,2,3,4, but
not 5. The reason for this is explained in detail in the Activity Parameters section.
:::

These parameters are explained in detail in the section on _Activity Parameters_.

@@ -31,12 +31,12 @@ parameter. This is a way of templating a workload and make it multi-purpose or a

## Experimentation Friendly

Because the workload YAML format is generic across activity types, it is possible to ask one activity type to interpret
the statements that are meant for another. This isn't generally a good idea, but it becomes extremely handy when you
want to have a very high level activity type like `stdout` use a lower-level syntax like that of the `cql` activity
type. When you do this, the stdout activity type _plays_ the statements to your console as they would be executed in
CQL, data bindings and all.
Because the workload YAML format is generic across driver types, it is possible to ask one driver type to interpret the
statements that are meant for another. This isn't generally a good idea, but it becomes extremely handy when you want to
have a high level driver type like `stdout` interpret the syntax of another driver like `cql`. When you do this, the
stdout activity type _plays_ the statements to your console as they would be executed in CQL, data bindings and all.

This means you can empirically and substantively demonstrate and verify access patterns, data skew, and other dataset
details before you change back to cql mode and turn up the settings for a higher scale test.
details before you change back to cql mode and turn up the settings for a higher scale test. It takes away the
guesswork about what your test is actually doing, and it works for all drivers.


@@ -106,6 +106,7 @@
        <include>META-INF/functions</include>
        <include>data/**</include>
        <include>docs-for-virtdata/**</include>
        <include>docs/**</include>
      </includes>
    </resource>
  </resources>

@@ -1,30 +0,0 @@
package io.nosqlbench.virtdata.userlibs.apps.docsapp;

import io.nosqlbench.virtdata.annotations.Category;
import io.nosqlbench.virtdata.processors.DocCtorData;

import java.util.*;

public class FunctionDoc {

    private String funcName;
    private String classDocs;
    private Set<Category> categories = new HashSet<>();
    private List<DocCtorData> ctors = new ArrayList<>();

    public FunctionDoc(String funcName) {
        this.funcName = funcName;
    }

    public void setClassDocs(String distinctClassDocs) {
        this.classDocs = distinctClassDocs;
    }

    public void addCategories(Category[] categories) {
        this.categories.addAll(Arrays.asList(categories));
    }

    public void addCtor(DocCtorData ctor) {
        this.ctors.add(ctor);
    }
}
@@ -0,0 +1,10 @@
---
title: collection functions
weight: 40
---

Collection functions allow you to construct Java Lists, Maps or Sets.
These functions often take the form of a higher-order function, where
the inner function definitions are called to determine the size of
the collection, the individual values to be added, etc.

@@ -0,0 +1,8 @@
---
title: conversion functions
weight: 30
---

Conversion functions simply allow values of one type
to be converted to another type in an obvious way.

@@ -0,0 +1,12 @@
---
title: datetime functions
weight: 20
---

Functions in this category know about times and dates, datetimes, seconds or millisecond epoch times, and so forth.

Some of the functions in this category are designed to allow testing of UUID types which are usually designed to avoid
determinism. This makes it possible to test systems which depend on UUIDs but which require determinism in test data.
This is strictly for testing use. Breaking the universally-unique properties of UUIDs in production systems is a bad
idea. Yet, in testing, this determinism is quite useful.

@@ -0,0 +1,7 @@
---
title: diagnostic functions
weight: 40
---

Diagnostic functions can be used to help you construct the right VirtData recipe.

@@ -0,0 +1,92 @@
---
title: distribution functions
weight: 30
---

All of the distributions that are provided in the Apache Commons Math
project are supported here, in multiple forms.

## Continuous or Discrete

These distributions break down into two main categories:

### Continuous Distributions

These are distributions over real numbers like 23.4323, with
continuity across the values. Each of the continuous distributions can
provide samples that fall on an interval of the real number line.
Continuous probability distributions include the *Normal* distribution,
and the *Exponential* distribution, among many others.

### Discrete Distributions

Discrete distributions, also known as *integer distributions*, have only
whole-number valued samples. These distributions include the *Binomial*
distribution, the *Zipf* distribution, and the *Poisson* distribution,
among others.

## Hashed or Mapped

### hashed samples

Generally, you will want to "randomly sample" from a probability distribution.
This is handled automatically by the functions below if you do not override the
defaults. **The `hash` mode is the default sampling mode for probability
distributions.** This is accomplished by computing an internal hash on the unit
interval variate input before using the resulting value to map into the sampling
curve. This is called the `hash` sampling mode by VirtData. You can put `hash`
into the modifiers as explained below if you want to document it explicitly.

### mapped samples

The method used to sample from these distributions depends on a mathematical
function called the cumulative probability function, or more specifically
the inverse of it. Having this function computed over some interval allows
one to sample the shape of a distribution progressively if desired. In
other words, it allows for some *percentile-like* view of values within
a given probability distribution. This mode of using the inverse cumulative
distribution function is known as the `map` mode in VirtData, as it allows one
to map a unit interval variate in a deterministic way to a density
sampling curve. To enable this mode, simply pass `map` as one of the
function modifiers for any function in this category.
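
The difference between the two modes can be illustrated with a minimal Java sketch. It is not VirtData's implementation: the class and method names are hypothetical, the exponential distribution is picked only because its inverse cumulative function has a simple closed form, and the SplitMix64 finalizer stands in for whatever hash VirtData actually uses.

```java
class SamplingModeSketch {

    // Inverse cumulative function of Exponential(rate): x = -ln(1 - u) / rate, for u in [0, 1).
    static double inverseCdf(double u, double rate) {
        return -Math.log(1.0 - u) / rate;
    }

    // Stand-in mixing function (the SplitMix64 finalizer). An assumption, not VirtData's hash.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Converts 64 bits into a double in [0, 1) using the top 53 bits.
    static double toUnit(long bits) {
        return (bits >>> 11) * 0x1.0p-53;
    }

    // map mode: ordered unit-interval inputs give ordered, percentile-like samples.
    static double mapSample(double u, double rate) {
        return inverseCdf(u, rate);
    }

    // hash mode: the input is scrambled first, so consecutive inputs give scattered samples.
    static double hashSample(long input, double rate) {
        return inverseCdf(toUnit(mix(input)), rate);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("map(" + (i / 5.0) + ") = " + mapSample(i / 5.0, 1.0)
                + "   hash(" + i + ") = " + hashSample(i, 1.0));
        }
    }
}
```

In both modes the same input always produces the same sample. Only `mapSample` preserves the ordering of its inputs.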

## Interpolated or Computed Samples

When sampling from mathematical models of probability densities, performance
between different densities can vary drastically. This means that you may
end up perturbing the results of your test in an unexpected way simply
by changing parameters of your testing distributions. Even worse, some
densities have painful corner cases in performance, like 'Zipf', which
can make tests unbearably slow and flawed as they chew up CPU resources.

### Interpolated Samples

For this reason, interpolation is built-in to these sampling functions.
**The default mode is `interpolate`.** This means that the sampling
function is pre-computed over 1000 equidistant points in the unit interval,
and the result is shared among all threads as a look-up-table for
interpolation. This makes all statistical sampling functions perform nearly
identically at runtime (after initialization, a one time cost).
This does have the minor side effect of a little loss in accuracy, but
the difference is generally negligible for nearly all performance testing
cases.
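
As a rough illustration of the look-up-table idea, the sketch below pre-computes a curve at equidistant points and interpolates linearly between them. The class name, the choice of linear interpolation, and the example curve `u -> u * u` are assumptions made for illustration; the actual table layout in VirtData may differ.

```java
import java.util.function.DoubleUnaryOperator;

class InterpolatedSamplerSketch {

    private final double[] table;

    // Pre-compute the sampling curve once, at equidistant points covering [0, 1].
    InterpolatedSamplerSketch(DoubleUnaryOperator curve, int intervals) {
        table = new double[intervals + 1];
        for (int i = 0; i <= intervals; i++) {
            table[i] = curve.applyAsDouble((double) i / intervals);
        }
    }

    // Find the two nearest pre-computed points and interpolate linearly between them.
    double sample(double u) {
        double pos = u * (table.length - 1);
        int lo = (int) Math.min(Math.floor(pos), table.length - 2);
        double frac = pos - lo;
        return table[lo] + frac * (table[lo + 1] - table[lo]);
    }

    public static void main(String[] args) {
        // Hypothetical curve, finite on the whole unit interval.
        DoubleUnaryOperator curve = u -> u * u;
        InterpolatedSamplerSketch sampler = new InterpolatedSamplerSketch(curve, 1000);
        double u = 0.12345;
        System.out.println("interpolated: " + sampler.sample(u));
        System.out.println("computed:     " + curve.applyAsDouble(u));
    }
}
```

After construction, every call to `sample` costs one array lookup and a few arithmetic operations, whatever the cost of the underlying curve.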

### Computed Samples

Conversely, `compute` mode sampling calls the sampling function every
time a sample is needed. This affords a little more accuracy, but is generally
not preferable to the default interpolated mode. You'll know if you need
computed samples. Otherwise, it's best to stick with interpolation so that
you spend more time testing your target system and less time testing
your data generation functions.

## Input Range

All of these functions take a long as the input value for sampling. This
is similar to how the unit interval (0.0,1.0) is used in mathematics
and statistics, but more tailored to modern system capabilities. Instead
of using the unit interval, we simply use the interval of all positive
longs. This provides more compatibility with other functions in VirtData,
including hashing functions.
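
One way to picture the relationship between positive longs and the unit interval is the sketch below. The `toUnitInterval` helper is hypothetical, and both the sign-bit masking and the division by `Long.MAX_VALUE` are assumptions for illustration. The text above does not specify how VirtData performs this scaling.

```java
class UnitIntervalSketch {

    // Scales a long onto [0.0, 1.0]. Negative inputs are folded into the
    // positive range by clearing the sign bit (an assumption of this sketch).
    static double toUnitInterval(long input) {
        long positive = input & Long.MAX_VALUE;
        return (double) positive / (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(toUnitInterval(0L));
        System.out.println(toUnitInterval(1L << 62));
        System.out.println(toUnitInterval(Long.MAX_VALUE));
    }
}
```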


@@ -0,0 +1,6 @@
---
title: flow functions
weight: 40
---

These functions help combine other functions into higher-order functions when needed.
@@ -0,0 +1,7 @@
---
title: general functions
weight: 20
---

These functions have no particular category, so they ended up here by default.

@@ -0,0 +1,13 @@
---
title: null functions
weight: 40
---

These functions can generate null values. When using nulls in your binding recipes, ensure that you don't generate them
in-line as inputs to other functions. This will lead to errors which interrupt your test. If you must use functions that
generate null values, ensure that they are the only or last function in a chain.

If you need to mark a field to be undefined, but _not set to null_, then use the functions which know how to yield a
VALUE.UNSET, which is a sigil constant within the VirtData runtime. These functions are correctly interpreted by
conformant drivers like the SQL driver so that they will avoid injecting the named field into an operation if it has this
special value.
@@ -0,0 +1,11 @@
---
title: pre-made functions
weight: 20
---

Functions in this category are meant to provide easy grab-and-go functions that are tailored for real-world simulation.
This library will grow over time. These functions are often built directly on top of other functions in the core
libraries. However, they are provided here for simplicity in workload construction. They perform exactly the same as
their longer-form equivalents.


@@ -0,0 +1,19 @@
---
title: state functions
weight: 30
---

Functions in the state category allow you to do things with side-effects in the function flow. Specifically, they allow
you to save or load values of named variables to thread-local registers. These work best when used with non-async
activities, since the normal statement grouping allows you to share data between statements in the sequence. It is not
advised to use these with async activities.

When using these functions, be careful that you call them when needed. For example, if you have a named binding which
will save a value, that action only occurs if some statement with this named binding is used.

For example, if you have account records and transaction records, where you want to save the account identifier to
use within the transaction inserts, you must ensure that each account binding is used within the thread first.
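
The thread-local behavior described above can be sketched in plain Java. The `save` and `load` helpers and the `account_id` name are hypothetical and are not the VirtData state functions themselves. The sketch only shows why a value must be saved on a thread before it can be loaded there.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadLocalRegisterSketch {

    // Each thread gets its own private map of named values.
    private static final ThreadLocal<Map<String, Object>> REGISTERS =
        ThreadLocal.withInitial(HashMap::new);

    // Saves a value under a name for the current thread and passes it through.
    static long save(String name, long value) {
        REGISTERS.get().put(name, value);
        return value;
    }

    // Loads a value saved earlier on the current thread, or null if none was saved.
    static Object load(String name) {
        return REGISTERS.get().get(name);
    }

    public static void main(String[] args) throws InterruptedException {
        save("account_id", 42L);
        System.out.println("same thread:  " + load("account_id"));

        // A different thread never ran the save, so it sees nothing.
        Thread other = new Thread(() -> System.out.println("other thread: " + load("account_id")));
        other.start();
        other.join();
    }
}
```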