v1 of intro docs

This commit is contained in:
Jonathan Shook 2024-03-05 13:52:06 -06:00
parent 2c94e81f6b
commit 13b81cd955

View File

@ -16,12 +16,25 @@
/**
* <P>This package contains experimental support for new methods for testing vector stores.
* projective simulation ... TBD
* of vector spaces
* within which provably correct KNN relationships can be derived from affine ordinal relationships.
* In other words, vectors in some projective space which are addressable by some ordinal identity
* can be constructed with procedural generation methods, and provably correct KNN neighborhoods of
* some size can be derived on the fly in a closed form calculation.</P>
* The primary method employed is functional mapping of ordinal spaces to vector spaces.
* In this way, closed-form functions can be used to synthesize vectors and provably correct neighborhoods
* as if they were defined in a static dataset. This allows for arbitrary testing scenarios to be
* created and used immediately and with no need to regenerate or compute any data beforehand.</P>
*
* <P>The original concept for this was derived by Shaunak Das, in the form of (Das) Direct Nearest Neighbor.
* Additional methods have been implemented using this technique to include additional space mappings
* for other vector distance functions.</P>
*
* <P>The testing methods enabled by this approach include:
* <OL>
* <LI>Generation of a population of vectors which are enumerable and stable with respect to their
* ordinal addresses.</LI>
* <LI>Generation of ordered subsets of this population which maintain a unique local ordering in
* terms of the selected distance function, otherwise known as rank for KNN queries.</LI>
* <LI>Validation of results for nearest neighborhood queries, using synthetic results computed on the fly as the
* basis for correctness.</LI>
* </OL>
* </P>
*
* <P>The vector spaces constructed in this way are not intended nor guaranteed to be dimensionally disperse.
* They are meant to provide an algebraic basis for exercising vector storage systems with increasing
@ -30,13 +43,22 @@
*
* <P>Each vector scheme in this method has the following properties:
* <UL>
* <LI>All vectors within the space are enumerable. Each increasing ordinal value describes a new and distinct
* vector. The value of this vector is deterministic within the parameters of the space.</LI>
* <LI>Each virtual vector space is defined by a set of parameters which are used as inputs to the
* mapping functions. The space, and the definition of valid vectors in a neighborhood depend on these
* for stability and correctness. Thus each space is explicitly defined by and inseparable from its parameters.</LI>
* <LI>All vectors within the space are enumerable. Each increasing ordinal value describes a new and distinct
* vector. The value of this vector is deterministic within the parameters of the space.</LI>
* <LI>Each vector within a space is a valid query vector which implies a correct set of distance-ranked
* neighbors up to some neighborhood size for the related distance function.</LI>
* <LI>Nearest neighbors may have equal distance in some cases, for which ties are accommodated in testing
* assertions. Suppose the distance from v<sub>10</sub> to v<sub>5</sub> is the same as the distance from
* v<sub>10</sub> to v<sub>15</sub>, then both v<sub>5</sub> and v<sub>15</sub> should be interchangeable as
* correct elements in any KNN results for query vector v<sub>10</sub>, provided that their distances are within
* the top K results as otherwise expected.
* </LI>
* </UL>
* </P>
*
* <P>This work is largely inspired by the DNN or "Das/Direct Nearest Neighbor" method, pioneered by
* Shaunak Das at DataStax. Additional implementations and ideas are contributed by the vector performance
* team and our testing community.</P>
* <P>TBD: Explain the above in terms of specific implementations and parameters.</P>
* </P>
*/
package io.nosqlbench.virtdata.lib.vectors.dnn;