format fixes (#20030)

Tatiana Savina 2023-09-26 10:49:03 +02:00 committed by GitHub
parent 69c237f340
commit b6427ad0fb

@ -15,31 +15,35 @@
**Algorithm formulation**: **Algorithm formulation**:
Note: The following notation .. note::
The following notation denotes a range of real numbers between a and b.
.. math:: .. math::
[a, b] => { x \in \mathbb{R}, a <= x <= b } [a, b] => { x \in \mathbb{R}, a <= x <= b }
denotes a range of real numbers between a and b.
Given a list of probabilities x1, x2, ..., xn: Given a list of probabilities x1, x2, ..., xn:
* If *log_probs* is true: * If *log_probs* is true:
* For each probability x, replace it with a value e^x
* Create an array - discrete CDF ([Cumulative Distribution Function](https://en.wikipedia.org/wiki/Cumulative_distribution_function)) - the cumulative sum of those probabilities, ie. create an array of values where the ith value is the sum of the probabilities x1, ..., xi * For each probability x, replace it with a value :math:`e^{x}`.
* Create an array - discrete CDF (`Cumulative Distribution Function <https://en.wikipedia.org/wiki/Cumulative_distribution_function>`__) - the cumulative sum of those probabilities, ie. create an array of values where the ith value is the sum of the probabilities x1, ..., xi.
* Divide the created array by its maximum value to normalize the cumulative probabilities between the real values in the range [0, 1]. This array is, by definition of CDF, sorted in ascending order, hence the maximum value is the last value of the array. * Divide the created array by its maximum value to normalize the cumulative probabilities between the real values in the range [0, 1]. This array is, by definition of CDF, sorted in ascending order, hence the maximum value is the last value of the array.
* Randomly generate a sequence of double-precision floating point numbers in the range [0, 1]. * Randomly generate a sequence of double-precision floating point numbers in the range [0, 1].
* For each generated number, assign the class with the lowest index for which the cumulative probability is less or equal to the generated value. * For each generated number, assign the class with the lowest index for which the cumulative probability is less or equal to the generated value.
* If *with_replacement* is False (sampling without replacement): * If *with_replacement* is False (sampling without replacement):
* Assume a class with index i has been selected - then every CDF value starting at i-th index should be lowered by the original probability of the selected class. This effectively sets the probability of sampling the given class to 0. * Assume a class with index i has been selected - then every CDF value starting at i-th index should be lowered by the original probability of the selected class. This effectively sets the probability of sampling the given class to 0.
* Afterwards, divide the CDF by its last (maximum) value to normalize the cumulative probabilities between the real values in the range [0, 1] * Afterwards, divide the CDF by its last (maximum) value to normalize the cumulative probabilities between the real values in the range [0, 1].
* Convert the output indices to *convert_type* * Convert the output indices to *convert_type*.
* Return output indices * Return output indices.
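The steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the operator's actual implementation: the function name ``multinomial_sample`` and its ``rng`` parameter are hypothetical, the lookup uses the ``sample value <= CDF probability`` convention from Example 3, random values are drawn from (0, 1] so that classes with zero cumulative probability are never picked, and the CDF is rebuilt from zeroed weights instead of subtracting in place (equivalent in exact arithmetic, but it keeps the array sorted under floating-point rounding). Conversion to *convert_type* is omitted; plain Python integers are returned.

```python
import bisect
import itertools
import math
import random


def multinomial_sample(probs, num_samples, log_probs=False,
                       with_replacement=True, rng=None):
    """Hypothetical sketch of CDF-based sampling; returns class indices."""
    rng = rng or random.Random()
    # If log_probs is true, replace each value x with e^x.
    weights = [math.exp(x) for x in probs] if log_probs else list(probs)
    if not with_replacement and num_samples > len(weights):
        raise ValueError("num_samples exceeds the number of classes")

    samples = []
    for _ in range(num_samples):
        # Discrete CDF: cumulative sum, normalized by its last (maximum) value.
        cdf = list(itertools.accumulate(weights))
        if cdf[-1] <= 0:
            raise ValueError("no class with non-zero probability is left")
        cdf = [c / cdf[-1] for c in cdf]
        # Assumption: draw from (0, 1] instead of [0, 1].
        u = 1.0 - rng.random()
        # Lowest index whose cumulative probability is >= u.
        idx = bisect.bisect_left(cdf, u)
        samples.append(idx)
        if not with_replacement:
            # Same effect as lowering every CDF value from idx onward.
            weights[idx] = 0.0
    return samples
```

For example, ``multinomial_sample([0.1, 0.5, 0.4], 3, with_replacement=False)`` returns some permutation of ``[0, 1, 2]``, because each selected class is removed before the next draw.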
**Example computations**: **Example computations**:
Example 1 - 1D tensor Example 1 - 1D tensor
* Let ``probs`` = ``[0.1, 0.5, 0.4]``, ``num_samples`` = 5, ``log_probs`` = false, ``with_replacement`` = true * Let ``probs`` = ``[0.1, 0.5, 0.4]``, ``num_samples`` = 5, ``log_probs`` = false, ``with_replacement`` = true
@ -67,7 +71,7 @@ Example 3 - 1D tensor, without replacement
* Therefore, in CDF, for every class starting with idx ``1`` subtract the probability of class at idx ``1`` = ``probs[1]`` = 0.5 * Therefore, in CDF, for every class starting with idx ``1`` subtract the probability of class at idx ``1`` = ``probs[1]`` = 0.5
* CDF = ``[0.1, 0.6 - 0.5, 1.0 - 0.5]`` = ``[0.1, 0.1, 0.5]`` * CDF = ``[0.1, 0.6 - 0.5, 1.0 - 0.5]`` = ``[0.1, 0.1, 0.5]``
* Normalize CDF by dividing by last value: CDF = ``[0.2, 0.2, 1.0]`` * Normalize CDF by dividing by last value: CDF = ``[0.2, 0.2, 1.0]``
* Take the next randomly generated float, here 0.2, and repeat until all random samples have assigned classes. Notice that for ``sampled values`` <= 0.2, only the class wih idx ``0`` can be selected, since the search stops at the index with the first value satisfying ``sample value`` <= ``CDF probability`` * Take the next randomly generated float, here 0.2, and repeat until all random samples have assigned classes. Notice that for ``sampled values`` <= 0.2, only the class with idx ``0`` can be selected, since the search stops at the index with the first value satisfying ``sample value`` <= ``CDF probability``
* Assigned classes = ``[1, 2]`` * Assigned classes = ``[1, 2]``
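The CDF update in this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: ``probs`` is inferred here from the CDF values shown above, and the variable names are illustrative. The results are rounded because ``0.6 - 0.5`` is not exactly ``0.1`` in floating-point arithmetic.

```python
probs = [0.1, 0.5, 0.4]   # inferred from the CDF [0.1, 0.6, 1.0]
cdf = [0.1, 0.6, 1.0]
selected = 1              # class index chosen by the first sample

# Lower every CDF value from the selected index onward by probs[selected].
updated = [c - probs[selected] if i >= selected else c
           for i, c in enumerate(cdf)]

# Normalize by the last (maximum) value.
normalized = [c / updated[-1] for c in updated]

updated_rounded = [round(c, 6) for c in updated]
normalized_rounded = [round(c, 6) for c in normalized]
print(updated_rounded)     # [0.1, 0.1, 0.5]
print(normalized_rounded)  # [0.2, 0.2, 1.0]
```

After the update, indices ``0`` and ``1`` share the same cumulative value, so a search for the first index satisfying ``sample value <= CDF probability`` can no longer stop at index ``1``.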
@ -86,8 +90,8 @@ Example 3 - 1D tensor, without replacement
* **Description**: controls whether to sample with replacement (classes can be sampled multiple times). * **Description**: controls whether to sample with replacement (classes can be sampled multiple times).
* **Range of values**: `true`, `false` * **Range of values**: `true`, `false`
* ``true`` - class indices can be sampled multiple times. * ``true`` - class indices can be sampled multiple times.
* ``false`` - class indices will not repeat in the output and the size of ``probs``' ``class_size`` dimension is required to be larger or equal to *num_samples* value. Might affect performance. * ``false`` - class indices will not repeat in the output and the size of ``probs``' ``class_size`` dimension is required to be larger or equal to *num_samples* value. Might affect performance.
* **Type**: `bool` * **Type**: `bool`
* **Required**: *Yes* * **Required**: *Yes*
@ -97,8 +101,8 @@ Example 3 - 1D tensor, without replacement
* **Description**: allows to control whether *inputs* should be treated as probabilities or unnormalized log probabilities. * **Description**: allows to control whether *inputs* should be treated as probabilities or unnormalized log probabilities.
* **Range of values**: `true`, `false` * **Range of values**: `true`, `false`
* ``true`` - set values in *inputs* are unnormalized log probabilities that can be any real number. * ``true`` - set values in *inputs* are unnormalized log probabilities that can be any real number.
* ``false`` - probabilities in *inputs* are expected to be non-negative, finite and have a non-zero-sum. * ``false`` - probabilities in *inputs* are expected to be non-negative, finite and have a non-zero-sum.
* **Type**: `bool` * **Type**: `bool`
* **Required**: *Yes* * **Required**: *Yes*