findmax enhancements and docs

This commit is contained in:
Jonathan Shook 2020-12-22 23:33:36 -06:00
parent 41adf58840
commit f0223501f8
2 changed files with 197 additions and 24 deletions

View File

@ -0,0 +1,152 @@
Analysis Method: FindMax
========================
The findmax.js script can be used with any normal activity definition
which uses the standard phase tagging scheme. It searches for the maximum
throughput available which satisfy a basic SLA requirement. It does this
by dynamically adjusting the workload while it runs, iteratively adjusting
the performance goals against known results. In short, it does what we
often do when we're searching for a local maxima in performance
parameters.
This particular analysis method is not open-ended in terms of its
parameters. That is, it is designed (for now) to focus only on a set of
known parameters: Namely the target throughput as the independent variable
and the resulting response time as the dependent variable.
Here is how the algorithm works.
1. The baseline is set to zero, and the stepping size is set to a small
value.
2. The stepping size is added to the baseline to determine the current
target rate.
3. The workload is run for a period of time (10 seconds by default) and
the achieved throughput and service times are measured.
4. Boundary conditions are checked to determine whether the settings for
this sampling window are acceptable: These conditions are:
1. The achieved throughput is high enough with respect to the target
rate (min 80% by default).
2. The achieved throughput is not lower than the best known result.
3. The achieved latency is low enough (p99<50ms).
5. If any of the boundary conditions are not met, then target throughput
for the current iteration is deemed unusable and the search parameters
are adjusted. In this case, the baseline is check-pointed to that of
the highest valid result, and the step increases start again at the
base step size.
6. If the conditions are met, then the incremental target rate is scaled
up by a factor. The incremental target rate is simply the base rate
plus the step size multiplied by some factor.
7. If increasing the target by the minimum step size would set a target
rate that is known to be too high from a previous run, then the test
terminates.
The findmax script actually runs multiple times, and takes the average
result.
### Example Output (start of run)
# This shows findmax just starting up, where the target rates are
# still pretty low.
>-- iteration 1 ---> targeting 100 ops_s (0+100) for 10s
LATENCY PASS(OK) [ 3.70ms p99 < max 50ms ]
OPRATE/TARGET PASS(OK) [ 102% of target > min 80% ] ( 102/100 )
OPRATE/BEST PASS(OK) [ 100% of best known > min 90% ] ( 102/102 )
---> accepting iteration 1
>-- iteration 2 ---> targeting 200 ops_s (0+200) for 10s
LATENCY PASS(OK) [ 3.62ms p99 < max 50ms ]
OPRATE/TARGET PASS(OK) [ 100% of target > min 80% ] ( 199/200 )
OPRATE/BEST PASS(OK) [ 195% of best known > min 90% ] ( 199/102 )
---> accepting iteration 2
>-- iteration 3 ---> targeting 400 ops_s (0+400) for 10s
LATENCY PASS(OK) [ 3.57ms p99 < max 50ms ]
OPRATE/TARGET PASS(OK) [ 100% of target > min 80% ] ( 399/400 )
OPRATE/BEST PASS(OK) [ 201% of best known > min 90% ] ( 399/199 )
---> accepting iteration 3
>-- iteration 4 ---> targeting 800 ops_s (0+800) for 10s
LATENCY PASS(OK) [ 3.70ms p99 < max 50ms ]
OPRATE/TARGET PASS(OK) [ 99% of target > min 80% ] ( 796/800 )
OPRATE/BEST PASS(OK) [ 199% of best known > min 90% ] ( 796/399 )
---> accepting iteration 4
>-- iteration 5 ---> targeting 1600 ops_s (0+1600) for 10s
LATENCY PASS(OK) [ 3.53ms p99 < max 50ms ]
OPRATE/TARGET PASS(OK) [ 99% of target > min 80% ] ( 1591/1600 )
OPRATE/BEST PASS(OK) [ 200% of best known > min 90% ] ( 1591/796 )
---> accepting iteration 5
...
### Example Result (end of run)
### Parameters
#### sampling controls ####
- `sample_time=10` - How long each sampling window lasts in seconds,
initially.
- `sample_incr=1.33` - How to scale up sample_time each time the search
range is adjusted.
- `sample_max=300` - The maximum sampling window size in seconds.
The sample time determines how long each iteration lasts. As findmax runs,
this value is increased according to the sample_incr value. This improves
accuracy as test results start to converge, but it also makes the test
take longer. Lower values of sample_incr or sample_max will limit the
runtime of tests, but this will also limit accuracy.
#### rate controls ####
- `rate_base=0` - The initial base rate.
- `rate_step=100` - The minimum step size to be added to the base rate.
- `rate_incr=2` - The increase factor for the step size.
By default, these parameters approximate the speed of binary search for
each time the base parameter is check-pointed. The target rate is
calculated as `rate_base + (rate_step * rate_incr ^ step)` where steps
start at 0. For example, the initial iteration will thus use `0 +
(100 * 2^0)` which is simply 100.) Based on this, the initial target rates
will follow a pattern of 100,200,400,800, and so on. Once the highest
successful target rate is found in this series, the base rate is
check-pointed to that, and the search starts again with that new base rate
with the same progression above stacking on top.
- `min_stride=100` - A lock-avoiding optimization for rate limiting. This
uses a stride rate limiter with micro-batching internally to avoid
higher contention on many-cored client systems.
- `averageof=2` - How many runs to average together for the final result.
Averaging multiple runs together ensures that some noise of local
effects (like background processes or compactions) are smoothed out for
a more realistic result.
#### pass/fail conditions ####
- `latency_cutoff=50.0` - The numeric value which is considered the
highest acceptable latency at the selected percentile.
- `latency_pctile=0.99` - The selected percentile. The values shown here
for these two mean "50ms@p99".
- `testrate_cutoff=0.8` - The minimum achieved throughput for an iteration
with respect to the target rate to be considered successful.
- `bestrate_cutoff=0.9` - The minimum achieved throughput for an iteration
with respect to the known best result to be considered successful.
## Workload Selection
- `activity_type=cql` - The type of internal NB driver to use.
- `yaml_file=cql-iot` - The name of the workload yaml.
You can invoke findmax with any workload yaml which uses the standard
naming scheme for phase control. This means that you have tagged each of
your statements or statement blocks with the appropriate phase tags from
schema, rampup, main, for example.
- `schematags=phase:schema` - The tag filter for schema statements.
Findmax will run a schema phase with 1 thread by default.
- `maintas=phase:main` - The tag filter for the main workload. This is the
workload that is started and run in the background for all of the
sampling windows.

View File

@ -21,7 +21,7 @@ if (params.cycles) {
var sample_time = 10; // start sampling at 10 second windows
var sample_incr = 1.6181; // sampling window is fibonacci
var sample_incr = 1.333; // + 1/3 time
var sample_max = 300; // 5 minutes is the max sampling window
var rate_base = 0;
@ -36,6 +36,7 @@ var testrate_cutoff = 0.8;
var bestrate_cutoff = 0.90;
var latency_pctile = 0.99;
var profile = "TEMPLATE(profile,default)";
printf(" profile=%s\n",profile);
@ -44,12 +45,12 @@ if (profile == "default") {
printf(" :NOTE: Consider using profile=fast or profile=accurate.\n");
} else if (profile == "fast") {
sample_time = 5;
sample_incr = 1.6181;
sample_incr = 1.2;
sample_max = 60;
averageof = 1;
} else if (profile == "accurate") {
sample_time = 10;
sample_incr = 2;
sample_incr = 1.6181; // fibonacci
sample_max = 300;
averageof = 3;
}
@ -97,8 +98,8 @@ printf(" Schedule %s operations at a time per thread.\n", min_stride);
printf(" Report the average result of running the findmax search algorithm %d times.\n",averageof);
printf(" Reject iterations which fail to achieve %2.0f%% of the target rate.\n", testrate_cutoff*100);
printf(" Reject iterations which fail to achieve %2.0f%% of the best rate.\n",bestrate_cutoff*100);
printf(" Reject iterations which fail to achieve better than %dms response\n",latency_cutoff);
printf(" at percentile p%f\n",latency_pctile*100);
printf(" Reject iterations which fail to achieve better than %dms response\n", latency_cutoff);
printf(" at percentile p%f\n", latency_pctile * 100);
printf("```\n");
if (latency_pctile > 1.0) {
@ -107,7 +108,8 @@ if (latency_pctile > 1.0) {
var reporter_rate_base = scriptingmetrics.newGauge("findmax.params.rate_base", rate_base + 0.0);
var reporter_achieved_rate = scriptingmetrics.newGauge("findmax.sampling.achieved_rate", 0.0);
var reporter_target_rate = scriptingmetrics.newGauge("findmax.target.rate_base", 0.0);
var reporter_base_rate = scriptingmetrics.newGauge("findmax.target.rate_base", 0.0);
var reporter_target_rate = scriptingmetrics.newGauge("findmax.target.rate", 0.0);
var activity_type = "TEMPLATE(activity_type,cql)";
@ -130,7 +132,7 @@ scenario.run(schema_activitydef);
activitydef = params.withDefaults({
type: activity_type,
yaml: yaml_file,
threads: "50"
threads: "auto"
});
activitydef.alias = "findmax";
@ -159,8 +161,9 @@ function lowest_unacceptable_of(result1, result2) {
}
function as_pct(val) {
return (val * 100.0).toFixed(0)+"%";
return (val * 100.0).toFixed(0) + "%";
}
function as_pctile(val) {
return "p" + (val * 100.0).toFixed(0);
}
@ -168,16 +171,16 @@ function as_pctile(val) {
scenario.start(activitydef);
raise_to_min_stride(activities.findmax, min_stride);
printf("\nwarming up client JIT for 10 seconds... \n");
activities.findmax.striderate = "" + (1000.0 / activities.findmax.stride) + ":1.1:restart";
scenario.waitMillis(10000);
activities.findmax.striderate = "" + (100.0 / activities.findmax.stride) + ":1.1:restart";
printf("commencing test\n");
// printf("\nwarming up client JIT for 10 seconds... \n");
// activities.findmax.striderate = "" + (1000.0 / activities.findmax.stride) + ":1.1:restart";
// scenario.waitMillis(10000);
// activities.findmax.striderate = "" + (100.0 / activities.findmax.stride) + ":1.1:restart";
// printf("commencing test\n");
function raise_to_min_stride(params, min_stride) {
var multipler = (min_stride / params.stride);
var newstride = (multipler * params.stride);
printf("increasing stride to %d ( %d x initial)\n",newstride, multipler);
printf("increasing stride to %d ( %d x initial)\n", newstride, multipler);
params.stride = newstride.toFixed(0);
}
@ -192,9 +195,16 @@ function testCycleFun(params) {
var result = {};
reporter_target_rate.update(params.target_rate)
// printf(" target rate is " + params.target_rate + " ops_s");
activities.findmax.striderate = "" + (params.target_rate / activities.findmax.stride) + ":1.1:restart";
if (params.iteration == 1) {
printf("\n settling load at base for %ds before active sampling.\n", sample_time);
scenario.waitMillis(sample_time * 1000);
}
precount = metrics.findmax.cycles.servicetime.count;
var snapshot_reader = metrics.findmax.result.deltaReader;
@ -225,7 +235,15 @@ function highest_acceptable_of(results, iteration1, iteration2) {
if (iteration1 == 0) {
return iteration2;
}
return results[iteration1].ops_per_second > results[iteration2].ops_per_second ? iteration1 : iteration2;
// printf("iteration1.target_rate=%s iteration2.target_rate=%s\n",""+results[iteration1].target_rate,""+results[iteration2].target_rate);
// printf("comparing iterations left: %d right: %d\n", iteration1, iteration2);
let selected = results[iteration1].target_rate > results[iteration2].target_rate ? iteration1 : iteration2;
// printf("selected %d\n", selected)
// printf("iteration1.ops_per_second=%f iteration2.ops_per_second=%f\n",results[iteration1].ops_per_second,results[iteration2].ops_per_second);
// printf("comparing iterations left: %d right: %d\n", iteration1, iteration2);
// let selected = results[iteration1].ops_per_second > results[iteration2].ops_per_second ? iteration1 : iteration2;
// printf("selected %d\n", selected)
return selected;
}
function find_max() {
@ -239,7 +257,7 @@ function find_max() {
var rejected_count = 0;
var base = rate_base;
reporter_target_rate.update(base);
reporter_base_rate.update(base);
var step = rate_step;
var accepted_count = 0;
@ -315,26 +333,29 @@ function find_max() {
if (failed_checks == 0) {
accepted_count++;
highest_acceptable_iteration = highest_acceptable_of(results, highest_acceptable_iteration, iteration);
printf(" ---> accepting iteration %d\n",iteration);
printf(" ---> accepting iteration %d (highest is now %d)\n", iteration, highest_acceptable_iteration);
} else {
rejected_count++;
printf(" !!! rejecting iteration %d\n",iteration);
printf(" !!! rejecting iteration %d\n", iteration);
// If the lowest unacceptable iteration was not already defined ...
if (lowest_unacceptable_iteration == 0) {
lowest_unacceptable_iteration = iteration;
} else {
// keep the more conservative target rate, if both this iteration and another one was not acceptable
lowest_unacceptable_iteration = result.target_rate < results[lowest_unacceptable_iteration].target_rate ?
iteration : lowest_unacceptable_iteration;
}
// keep the lower maximum target rate, if both this iteration and another one failed
if (result.target_rate < results[lowest_unacceptable_iteration].target_rate) {
lowest_unacceptable_iteration = iteration;
printf("setting lowest_unacceptable_iteration to %d\n", iteration)
}
highest_acceptable_iteration = highest_acceptable_iteration == 0 ? iteration : highest_acceptable_iteration;
var too_fast = results[lowest_unacceptable_iteration].target_rate;
if (base + step >= too_fast && results[highest_acceptable_iteration].target_rate + rate_step < too_fast) {
base = results[highest_acceptable_iteration].target_rate;
reporter_target_rate.update(base);
reporter_base_rate.update(base);
step = rate_step;
var search_summary = "[[ PASS " +