diff --git a/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java b/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java index 388bdc721..3ce27f6a0 100644 --- a/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java +++ b/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java @@ -16,12 +16,25 @@ /** *

This package contains experimental support for new methods for testing vector stores. - * projective simulation ... TBD - * of vector spaces - * within which provably correct KNN relationships can be derived from affine ordinal relationships. - * In other words, vectors in some projective space which are addressable by some ordinal identity - * can be constructed with procedural generation methods, and provably correct KNN neighborhoods of - * some size can be derived on the fly in a closed form calculation.

+ * The primary method employed is functional mapping of ordinal spaces to vector spaces. + * In this way, closed-form functions can be used to synthesize vectors and provably correct neighborhoods + * as if they were defined in a static dataset. This allows for arbitrary testing scenarios to be + * created and used immediately and with no need to regenerate or compute any data beforehand.

+ * + *

The original concept for this was derived by Shaunak Das, in the form of (Das) Direct Nearest Neighbor. + * Additional methods built on this technique provide space mappings + * for other vector distance functions.

+ * + *

The testing methods enabled by this approach include: + *

    + *
  1. Generation of a population of vectors which are enumerable and stable with respect to their + * ordinal addresses.
  2. Generation of ordered subsets of this population which maintain a unique local ordering in + * terms of the selected distance function, otherwise known as rank for KNN queries.
  3. Validation of results for nearest neighbor queries, using synthetic results computed on the fly as the + * basis for correctness.
+ *

* *

The vector spaces constructed in this way are not intended nor guaranteed to be dimensionally disperse. * They are meant to provide an algebraic basis for exercising vector storage systems with increasing @@ -30,13 +43,22 @@ * *

Each vector scheme in this method has the following properties: *

- *

* - *

This work is largely inspired by the DNN or "Das/Direct Nearest Neighbor" method, pioneered by - * Shaunak Das at DataStax. Additional implementations and ideas are contributed by the vector performance - * team and our testing community.

+ *

TBD: Explain the above in terms of specific implementations and parameters.

+ *

*/ package io.nosqlbench.virtdata.lib.vectors.dnn;
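
The Javadoc above describes the approach only in prose, so a minimal sketch may help show what "functional mapping of ordinal spaces to vector spaces" can look like. The class below is a hypothetical illustration and is not the scheme implemented in this package: `DiagonalOrdinalSpace`, `vectorOf`, `queryOf`, `neighborsOf`, `bruteForceNeighbors`, and `recall` are all assumed names. It places ordinal `i` on the diagonal at `(i, i, ..., i)` and offsets each query by a quarter step, which is one simple way to get a unique Euclidean ordering that can be computed in closed form.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration only; not the DNN scheme implemented in this package. */
final class DiagonalOrdinalSpace {
    private final int dimensions;
    private final int population;

    DiagonalOrdinalSpace(int dimensions, int population) {
        if (dimensions < 1 || population < 1) {
            throw new IllegalArgumentException("dimensions and population must be positive");
        }
        this.dimensions = dimensions;
        this.population = population;
    }

    /** Maps an ordinal to its vector: every component equals the ordinal. */
    float[] vectorOf(long ordinal) {
        float[] v = new float[dimensions];
        Arrays.fill(v, (float) ordinal);
        return v;
    }

    /** The query sits a quarter step above the ordinal, so no two neighbors are equidistant. */
    float[] queryOf(long ordinal) {
        float[] v = new float[dimensions];
        Arrays.fill(v, ordinal + 0.25f);
        return v;
    }

    /** Closed-form k nearest neighbors (Euclidean), nearest first: q, q+1, q-1, q+2, q-2, ... */
    long[] neighborsOf(long ordinal, int k) {
        if (ordinal < 0 || ordinal >= population) {
            throw new IllegalArgumentException("ordinal out of range");
        }
        int n = Math.max(0, Math.min(k, population));
        long[] result = new long[n];
        int filled = 0;
        for (long step = 0; filled < n; step++) {
            long above = ordinal + step;
            if (above < population) {
                result[filled++] = above;
            }
            long below = ordinal - step;
            if (step > 0 && below >= 0 && filled < n) {
                result[filled++] = below;
            }
        }
        return result;
    }

    /** Reference answer computed by sorting the whole population by distance to the query. */
    long[] bruteForceNeighbors(long ordinal, int k) {
        float[] query = queryOf(ordinal);
        Long[] all = new Long[population];
        for (int i = 0; i < population; i++) {
            all[i] = (long) i;
        }
        Arrays.sort(all, Comparator.comparingDouble((Long o) -> distance(query, vectorOf(o))));
        int n = Math.max(0, Math.min(k, population));
        long[] result = new long[n];
        for (int i = 0; i < n; i++) {
            result[i] = all[i];
        }
        return result;
    }

    static double distance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Fraction of the expected neighbors that appear anywhere in the actual result. */
    static double recall(long[] expected, long[] actual) {
        if (expected.length == 0) {
            return 1.0;
        }
        int hits = 0;
        for (long e : expected) {
            for (long a : actual) {
                if (e == a) {
                    hits++;
                    break;
                }
            }
        }
        return (double) hits / expected.length;
    }
}
```

In this sketch, `neighborsOf` plays the role of the synthetic ground truth: a test could write `vectorOf(i)` for each ordinal into the system under test, issue `queryOf(q)` as a KNN query, and score the returned ids against `neighborsOf(q, k)` with `recall`. `bruteForceNeighbors` is included only to check the closed form against an exhaustive sort on small populations.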