From 13b81cd955d87f334e0825355b73b023bb26c78e Mon Sep 17 00:00:00 2001
From: Jonathan Shook
Date: Tue, 5 Mar 2024 13:52:06 -0600
Subject: [PATCH] v1 of intro docs
---
.../lib/vectors/dnn/package-info.java | 46 ++++++++++++++-----
1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java b/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java
index 388bdc721..3ce27f6a0 100644
--- a/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java
+++ b/virtdata-lib-vectors/src/main/java/io/nosqlbench/virtdata/lib/vectors/dnn/package-info.java
@@ -16,12 +16,25 @@
/**
* This package contains experimental support for new methods for testing vector stores.
- * projective simulation ... TBD
- * of vector spaces
- * within which provably correct KNN relationships can be derived from affine ordinal relationships.
- * In other words, vectors in some projective space which are addressable by some ordinal identity
- * can be constructed with procedural generation methods, and provably correct KNN neighborhoods of
- * some size can be derived on the fly in a closed form calculation.
+ * The primary technique is a functional mapping from ordinal spaces to vector spaces.
+ * Closed-form functions can thus synthesize both vectors and provably correct neighborhoods
+ * as if they were defined in a static dataset. Arbitrary testing scenarios can be created
+ * and used immediately, with no need to generate or precompute any data beforehand.
+ *
+ * The original concept was devised by Shaunak Das as the (Das) Direct Nearest Neighbor (DNN) method.
+ * Further implementations apply the same technique to provide space mappings for other
+ * vector distance functions.
+ *
+ * The testing methods enabled by this approach include:
+ *
+ * - Generation of a population of vectors that are enumerable and stable with respect to their
+ * ordinal addresses.
+ * - Generation of ordered subsets of this population that maintain a well-defined local ordering
+ * under the selected distance function, i.e. the neighbor rank used by KNN queries.
+ * - Validation of nearest-neighbor query results, using expected results computed on the fly
+ * as the ground truth for correctness.
+ *
+ *
*
* The vector spaces constructed in this way are not intended nor guaranteed to be dimensionally disperse.
* They are meant to provide an algebraic basis for exercising vector storage systems with increasing
@@ -30,13 +43,22 @@
*
*
Each vector scheme in this method has the following properties:
*
- * - All vectors within the space are enumerable. Each increasing ordinal value describes a new and distinct
- * vector. The value of this vector is deterministic within the parameters of the space.
+ * - Each virtual vector space is defined by a set of parameters that serve as inputs to the
+ * mapping functions. The space and the set of valid vectors in any neighborhood depend on these
+ * parameters for stability and correctness, so each space is defined by, and inseparable from, its parameters.
+ * - All vectors within the space are enumerable. Each increasing ordinal value describes a new and distinct
+ * vector. The value of this vector is deterministic within the parameters of the space.
+ * - Every vector within a space is a valid query vector, and it implies a correct set of distance-ranked
+ * neighbors, up to some neighborhood size, for the associated distance function.
+ * - Some nearest neighbors may be equidistant from the query vector; testing assertions accommodate
+ * such ties. For example, if the distance from v10 to v5 equals the distance from
+ * v10 to v15, then v5 and v15 are interchangeable as
+ * correct elements of any KNN result for query vector v10, provided that this shared distance is within
+ * the top K results as otherwise expected.
+ *
*
- *
*
- * This work is largely inspired by the DNN or "Das/Direct Nearest Neighbor" method, pioneered by
- * Shaunak Das at DataStax. Additional implementations and ideas are contributed by the vector performance
- * team and our testing community.
+ * TBD: Explain the above in terms of specific implementations and parameters.
+ *
*/
package io.nosqlbench.virtdata.lib.vectors.dnn;
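
The package docs above describe closed-form neighborhoods over an ordinal-addressed vector space and tie-tolerant KNN validation. The following is a minimal Java sketch of those ideas under a stated assumption: it uses a hypothetical one-dimensional embedding in which every component of vector i equals i times a scale factor, so that v5 and v15 are equidistant from v10, as in the tie example. The class `LineSpaceSketch` and its methods are illustrative names only. They are not part of the virtdata or nosqlbench API, and this is not presented as how the DNN functions are actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not the DNN implementation): vector i has every component
 * equal to i * scale, so Euclidean distance is |i - j| * scale * sqrt(dimensions).
 */
public class LineSpaceSketch {
    private final int dimensions;  // space parameter: vector length
    private final long population; // space parameter: valid ordinals are [0, population)
    private final float scale;     // space parameter: spacing between adjacent vectors

    public LineSpaceSketch(int dimensions, long population, float scale) {
        if (dimensions < 1 || population < 1 || scale <= 0f) {
            throw new IllegalArgumentException("invalid space parameters");
        }
        this.dimensions = dimensions;
        this.population = population;
        this.scale = scale;
    }

    /** Deterministic vector for an ordinal; the same ordinal always yields the same vector. */
    public float[] vector(long ordinal) {
        checkOrdinal(ordinal);
        float[] v = new float[dimensions];
        Arrays.fill(v, ordinal * scale);
        return v;
    }

    /** Closed-form Euclidean distance between the vectors at two ordinals. */
    public double distance(long a, long b) {
        return Math.abs(a - b) * (double) scale * Math.sqrt(dimensions);
    }

    /**
     * Closed-form top-k neighbor ordinals for a query ordinal, nearest first.
     * The query itself ranks first at distance 0; equidistant pairs are listed lower ordinal first.
     */
    public List<Long> neighbors(long query, int k) {
        checkOrdinal(query);
        List<Long> out = new ArrayList<>();
        if (k <= 0) {
            return out;
        }
        out.add(query);
        for (long d = 1; out.size() < k && (query - d >= 0 || query + d < population); d++) {
            if (query - d >= 0 && out.size() < k) out.add(query - d);
            if (query + d < population && out.size() < k) out.add(query + d);
        }
        return out;
    }

    /**
     * Tie-tolerant check of a KNN result: it must hold the expected number of distinct, in-range
     * ordinals, none farther than the k-th expected neighbor, and every strictly closer neighbor.
     * Integer ordinal offsets are compared, since distance here is monotonic in |query - r|.
     */
    public boolean isValidKnn(long query, int k, List<Long> result) {
        List<Long> expected = neighbors(query, k);
        if (result.size() != expected.size()) return false;
        if (expected.isEmpty()) return true;
        long cutoff = Math.abs(expected.get(expected.size() - 1) - query);
        Set<Long> seen = new HashSet<>();
        for (long r : result) {
            if (r < 0 || r >= population || !seen.add(r)) return false;
            if (Math.abs(r - query) > cutoff) return false;
        }
        for (long e : expected) {
            if (Math.abs(e - query) < cutoff && !seen.contains(e)) return false;
        }
        return true;
    }

    private void checkOrdinal(long ordinal) {
        if (ordinal < 0 || ordinal >= population) {
            throw new IllegalArgumentException("ordinal out of range: " + ordinal);
        }
    }

    public static void main(String[] args) {
        LineSpaceSketch space = new LineSpaceSketch(4, 100, 0.5f);
        List<Long> top10 = space.neighbors(10, 10);
        System.out.println(top10); // prints [10, 9, 11, 8, 12, 7, 13, 6, 14, 5]
        // v15 ties with v5 at the cutoff distance, so substituting it is still a correct result.
        List<Long> withTie = List.of(10L, 9L, 11L, 8L, 12L, 7L, 13L, 6L, 14L, 15L);
        System.out.println(space.isValidKnn(10, 10, withTie)); // prints true
    }
}
```

Comparing integer ordinal offsets rather than floating-point distances keeps the tie check exact in this assumed embedding. A space with a different mapping or distance function would need its own closed-form ranking.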