Fix documentation (md and inline) for C++ and Python speech samples (#16185)

* Fix documentation (md and inline) for C++ and Python speech samples
* Fix clang-format
* Minor fix
* Fix clang-format
* Fix a typo
* Fix according to Mike's review
* Fix clang-format
@@ -39,10 +38,9 @@ each sample step at [Integration Steps](../../../docs/OV_Runtime_UG/integrate_wi
 If the GNA device is selected (for example, using the `-d` GNA flag), the GNA OpenVINO™ Runtime plugin quantizes the model and input feature vector sequence to integer representation before performing inference.
 
 Several parameters control neural network quantization. The `-q` flag determines the quantization mode.
-Three modes are supported:
+Two modes are supported:
 
 - *static* - The first utterance in the input file is scanned for dynamic range. The scale factor (floating point scalar multiplier) required to scale the maximum input value of the first utterance to 16384 (15 bits) is used for all subsequent inputs. The neural network is quantized to accommodate the scaled input dynamic range.
-- *dynamic* - The scale factor for each input batch is computed just before inference on that batch. The input and network are (re)quantized on the fly using an efficient procedure.
 - *user-defined* - The user may specify a scale factor via the `-sf` flag that will be used for static quantization.
 
 The `-qb` flag provides a hint to the GNA plugin regarding the preferred target weight resolution for all layers. For example, when `-qb 8` is specified, the plugin will use 8-bit weights wherever possible in the
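The *static* mode above derives one scale factor from the first utterance. A minimal sketch of that computation, assuming a NumPy feature matrix (the function name and data are illustrative, not the sample's actual code):

```python
import numpy as np

def static_scale_factor(first_utterance: np.ndarray) -> float:
    """Scale factor mapping the utterance's max absolute value to 16384 (15 bits)."""
    max_abs = float(np.max(np.abs(first_utterance)))
    if max_abs == 0.0:
        return 1.0  # nothing to scale; avoid division by zero
    return 16384.0 / max_abs

# A feature sequence whose peak absolute value is 0.5 yields a factor of 32768.
utterance = np.array([[0.1, -0.5, 0.25]], dtype=np.float32)
print(static_scale_factor(utterance))  # 32768.0
```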
@@ -56,9 +55,9 @@ network.
 
 Several execution modes are supported via the `-d` flag:
 
-- `CPU` - All calculation are performed on CPU device using CPU Plugin.
-- `GPU` - All calculation are performed on GPU device using GPU Plugin.
-- `VPUX` - All calculation are performed on VPUX device using VPUX Plugin.
+- `CPU` - All calculations are performed on CPU device using CPU Plugin.
+- `GPU` - All calculations are performed on GPU device using GPU Plugin.
+- `VPUX` - All calculations are performed on VPUX device using VPUX Plugin.
 - `GNA_AUTO` - GNA hardware is used if available and the driver is installed. Otherwise, the GNA device is emulated in fast-but-not-bit-exact mode.
 - `GNA_HW` - GNA hardware is used if available and the driver is installed. Otherwise, an error will occur.
 - `GNA_SW` - Deprecated. The GNA device is emulated in fast-but-not-bit-exact mode.
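As background for the mode list above: when driving OpenVINO Runtime directly, the sample's `-d GNA_AUTO` choice corresponds roughly to compiling the model for the GNA plugin with a device-mode config. A minimal sketch, assuming the legacy `GNA_DEVICE_MODE` config key and a placeholder model path:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder path to the model IR

# Rough equivalent of `-d GNA_AUTO`: use GNA hardware if present, otherwise
# fall back to fast-but-not-bit-exact software emulation.
compiled = core.compile_model(model,
                              device_name="GNA",
                              config={"GNA_DEVICE_MODE": "GNA_AUTO"})
```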
@@ -99,16 +98,16 @@ speech_sample [OPTION]
 Options:
 
     -h                         Print a usage message.
-    -i "<path>"                Required. Paths to input file or Layers names with corresponding paths to the input files. Example of usage for single file: <file.ark> or <file.npz>. Example of usage for named layers: <layer1>=<file1.ark>,<layer2>=<file2.ark>.
+    -i "<path>"                Required. Path(s) to input file(s). Usage for a single file/layer: <input_file.ark> or <input_file.npz>. Example of usage for several files/layers: <layer1>:<port_num1>=<input_file1.ark>,<layer2>:<port_num2>=<input_file2.ark>.
     -m "<path>"                Required. Path to an .xml file with a trained model (required if -rg is missing).
-    -o "<path>"                Optional. Output file name to save scores or Layer names with corresponding files names to save scores. Example of usage for single file: <output.ark> or <output.npz>. Example of usage for named layers: Example of usage for named layers: <layer1:port_num>=<output_file1.ark>,<layer2:port_num>=<output_file2.ark>.
+    -o "<path>"                Optional. Output file name(s) to save scores (inference results). Example of usage for a single file/layer: <output_file.ark> or <output_file.npz>. Example of usage for several files/layers: <layer1>:<port_num1>=<output_file1.ark>,<layer2>:<port_num2>=<output_file2.ark>.
     -d "<device>"              Optional. Specify a target device to infer on. CPU, GPU, VPUX, GNA_AUTO, GNA_HW, GNA_HW_WITH_SW_FBACK, GNA_SW_FP32, GNA_SW_EXACT and HETERO with combination of GNA as the primary device and CPU as a secondary (e.g. HETERO:GNA,CPU) are supported. The sample will look for a suitable plugin for device specified.
     -pc                        Optional. Enables per-layer performance report.
-    -q "<mode>"                Optional. Input quantization mode: static (default), dynamic, or user (use with -sf).
-    -qb "<integer>"            Optional. Weight bits for quantization: 8 or 16 (default)
-    -sf "<double>"             Optional. User-specified input scale factor for quantization (use with -q user). If the network contains multiple inputs, provide scale factors by separating them with commas. For example: <input_name1>:<sf1>,<input_name2>:<sf2> or just <sf> to be applied to all inputs
+    -q "<mode>"                Optional. Input quantization mode for GNA: static (default) or user defined (use with -sf).
+    -qb "<integer>"            Optional. Weight resolution in bits for GNA quantization: 8 or 16 (default)
+    -sf "<double>"             Optional. User-specified input scale factor for GNA quantization (use with -q user). If the model contains multiple inputs, provide scale factors by separating them with commas. For example: <layer1>:<sf1>,<layer2>:<sf2> or just <sf> to be applied to all inputs.
     -bs "<integer>"            Optional. Batch size 1-8 (default 1)
-    -r "<path>"                Optional. Read reference score file or named layers with corresponding score files and compare scores. Example of usage for single file: <reference.ark> or <reference.npz>. Example of usage for named layers: Example of usage for named layers: <layer1:port_num>=<reference_file2.ark>,<layer2:port_num>=<reference_file2.ark>.
+    -r "<path>"                Optional. Read reference score file(s) and compare inference results with reference scores. Usage for a single file/layer: <reference.ark> or <reference.npz>. Example of usage for several files/layers: <layer1>:<port_num1>=<reference_file1.ark>,<layer2>:<port_num2>=<reference_file2.ark>.
     -rg "<path>"               Read GNA model from file using path/filename provided (required if -m is missing).
     -wg "<path>"               Optional. Write GNA model to file using path/filename provided.
     -we "<path>"               Optional. Write GNA embedded model to file using path/filename provided.
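The `<layer>:<port_num>=<file>` lists accepted by `-i`, `-o`, and `-r` above are plain comma-separated mappings; a short parsing sketch (a hypothetical helper, not the sample's actual code):

```python
def parse_file_spec(spec: str) -> dict:
    """Map 'layer:port=file,layer:port=file' (or a bare 'file') to {target: file}."""
    if "=" not in spec:
        return {"": spec}  # single file applied to the only input/output
    result = {}
    for item in spec.split(","):
        target, _, path = item.partition("=")
        result[target] = path
    return result

print(parse_file_spec("dev.ark"))
# {'': 'dev.ark'}
print(parse_file_spec("Input1:0=dev1.ark,Input2:0=dev2.ark"))
# {'Input1:0': 'dev1.ark', 'Input2:0': 'dev2.ark'}
```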
@@ -120,7 +119,7 @@ Options:
     -compile_target "<string>" Optional. Specify GNA compile target generation. May be one of GNA_TARGET_2_0, GNA_TARGET_3_0. By default, generation corresponds to the GNA HW available in the system or the latest fully supported generation by the software. See the GNA Plugin's GNA_COMPILE_TARGET config option description.
     -memory_reuse_off          Optional. Disables memory optimizations for compiled model.
 
-Available target devices: CPU GNA GPU VPUX
+Available target devices: CPU GNA GPU VPUX
 ```
 
 ### <a name="model-preparation-speech"></a> Model Preparation
@@ -14,9 +14,10 @@
 static const char help_message[] = "Print a usage message.";
 
 /// @brief message for input data argument
-static const char input_message[] = "Required. Paths to input file or Layers names with corresponding paths to the "
-                                    "input files. Example of usage for single file: <file.ark> or <file.npz>. Example "
-                                    "of usage for named layers: <layer1>=<file1.ark>,<layer2>=<file2.ark>.";
+static const char input_message[] = "Required. Path(s) to input file(s). "
+                                    "Usage for a single file/layer: <input_file.ark> or <input_file.npz>. "
+                                    "Example of usage for several files/layers: "
+                                    "<layer1>:<port_num1>=<input_file1.ark>,<layer2>:<port_num2>=<input_file2.ark>.";
 
 /// @brief message for model argument
 static const char model_message[] = "Required. Path to an .xml file with a trained model (required if -rg is missing).";
@@ -60,16 +61,17 @@ static const char custom_cpu_library_message[] = "Required for CPU plugin custom
                                                  "Absolute path to a shared library with the kernels implementations.";
 
 /// @brief message for score output argument
-static const char output_message[] =
-    "Optional. Output file name to save scores or Layer names with corresponding files names to save scores. Example "
-    "of usage for single file: <output.ark> or <output.npz>. Example of usage for named layers: "
-    "<layer1:port_num>=<output_file1.ark>,<layer2:port_num>=<output_file2.ark>.";
+static const char output_message[] = "Optional. Output file name(s) to save scores (inference results). "
+                                     "Usage for a single file/layer: <output_file.ark> or <output_file.npz>. "
+                                     "Example of usage for several files/layers: "
+                                     "<layer1>:<port_num1>=<output_file1.ark>,<layer2>:<port_num2>=<output_file2.ark>.";
 
 /// @brief message for reference score file argument
 static const char reference_score_message[] =
-    "Optional. Read reference score file or named layers with corresponding score files and compare scores. Example of "
-    "usage for single file: <reference.ark> or <reference.npz>. Example of usage for named layers: "
-    "<layer1:port_num>=<reference_file2.ark>,<layer2:port_num>=<reference_file2.ark>.";
+    "Optional. Read reference score file(s) and compare inference results with reference scores. "
+    "Usage for a single file/layer: <reference_file.ark> or <reference_file.npz>. "
+    "Example of usage for several files/layers: "
+    "<layer1>:<port_num1>=<reference_file1.ark>,<layer2>:<port_num2>=<reference_file2.ark>.";
 
 /// @brief message for read GNA model argument
 static const char read_gna_model_message[] =
@@ -89,17 +91,17 @@ static const char write_embedded_model_generation_message[] =
 
 /// @brief message for quantization argument
 static const char quantization_message[] =
-    "Optional. Input quantization mode: static (default), dynamic, or user (use with -sf).";
+    "Optional. Input quantization mode for GNA: static (default) or user defined (use with -sf).";
 
 /// @brief message for quantization bits argument
-static const char quantization_bits_message[] = "Optional. Weight bits for quantization: 8 or 16 (default)";
+static const char quantization_bits_message[] =
+    "Optional. Weight resolution in bits for GNA quantization: 8 or 16 (default)";
 
 /// @brief message for scale factor argument
 static const char scale_factor_message[] =
-    "Optional. User-specified input scale factor for quantization (use with -q user). "
-    "If the network contains multiple inputs, provide scale factors by separating them with "
-    "commas. "
-    "For example: <input_name1>:<sf1>,<input_name2>:<sf2> or just <sf> to be applied to all inputs";
+    "Optional. User-specified input scale factor for GNA quantization (use with -q user). "
+    "If the model contains multiple inputs, provide scale factors by separating them with commas. "
+    "For example: <layer1>:<sf1>,<layer2>:<sf2> or just <sf> to be applied to all inputs.";
 
 /// @brief message for batch size argument
 static const char batch_size_message[] = "Optional. Batch size 1-8 (default 1)";
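Here "weight resolution" is the bit width of the integer grid the weights are quantized to. An illustrative sketch of symmetric linear quantization (not the GNA plugin's actual algorithm) shows why 8-bit weights are coarser than 16-bit:

```python
import numpy as np

def quantize_weights(weights: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric linear quantization to `bits`-bit signed integers."""
    levels = 2 ** (bits - 1) - 1          # 127 for 8-bit, 32767 for 16-bit
    scale = levels / np.max(np.abs(weights))
    return np.round(weights * scale).astype(np.int32)

w = np.array([0.02, -0.5, 0.13])
print(quantize_weights(w, 8))    # coarse grid:  values 5, -127, 33
print(quantize_weights(w, 16))   # finer grid:   values 1311, -32767, 8519
```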
@@ -173,7 +175,7 @@ DEFINE_string(we, "", write_embedded_model_message);
 /// @brief Input quantization mode (default static)
 DEFINE_string(q, "static", quantization_message);
 
-/// @brief Input quantization bits (default 16)
+/// @brief Weight resolution in bits (default 16)
 DEFINE_int32(qb, 16, quantization_bits_message);
 
 /// @brief Scale factor for quantization
@@ -284,12 +286,8 @@ bool parse_and_check_command_line(int argc, char* argv[]) {
     }
 
     /** default is a static quantization **/
-    if ((FLAGS_q.compare("static") != 0) && (FLAGS_q.compare("dynamic") != 0) && (FLAGS_q.compare("user") != 0)) {
-        throw std::logic_error("Quantization mode not supported (static, dynamic, user).");
-    }
-
-    if (FLAGS_q.compare("dynamic") == 0) {
-        throw std::logic_error("Dynamic quantization not yet supported.");
+    if ((FLAGS_q.compare("static") != 0) && (FLAGS_q.compare("user") != 0)) {
+        throw std::logic_error("Quantization mode not supported (static, user).");
     }
 
     if (FLAGS_qb != 16 && FLAGS_qb != 8) {
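For comparison, the same two-mode restriction can be enforced declaratively with argparse's `choices`; a hypothetical Python equivalent of the check above (the Python sample itself exposes no `-q` flag):

```python
import argparse

parser = argparse.ArgumentParser()
# Rejecting anything but the two supported modes mirrors the
# std::logic_error thrown by the C++ sample.
parser.add_argument('-q', '--quantization_mode', default='static',
                    choices=('static', 'user'))

print(parser.parse_args(['-q', 'user']).quantization_mode)  # user
# parser.parse_args(['-q', 'dynamic']) would exit with "invalid choice"
```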
@@ -55,9 +55,9 @@ model.
 
 Several execution modes are supported via the `-d` flag:
 
-- `CPU` - All calculation are performed on CPU device using CPU Plugin.
-- `GPU` - All calculation are performed on GPU device using GPU Plugin.
-- `VPUX` - All calculation are performed on VPUX device using VPUX Plugin.
+- `CPU` - All calculations are performed on CPU device using CPU Plugin.
+- `GPU` - All calculations are performed on GPU device using GPU Plugin.
+- `VPUX` - All calculations are performed on VPUX device using VPUX Plugin.
 - `GNA_AUTO` - GNA hardware is used if available and the driver is installed. Otherwise, the GNA device is emulated in fast-but-not-bit-exact mode.
 - `GNA_HW` - GNA hardware is used if available and the driver is installed. Otherwise, an error will occur.
 - `GNA_SW` - Deprecated. The GNA device is emulated in fast-but-not-bit-exact mode.
@@ -102,11 +102,17 @@ optional arguments:
 Options:
   -h, --help            Show this help message and exit.
   -i INPUT, --input INPUT
-                        Required. Path to an input file (.ark or .npz).
+                        Required. Path(s) to input file(s).
+                        Usage for a single file/layer: <input_file.ark> or <input_file.npz>.
+                        Example of usage for several files/layers: <layer1>:<port_num1>=<input_file1.ark>,<layer2>:<port_num2>=<input_file2.ark>.
   -o OUTPUT, --output OUTPUT
-                        Optional. Output file name to save inference results (.ark or .npz).
+                        Optional. Output file name(s) to save scores (inference results).
+                        Usage for a single file/layer: <output_file.ark> or <output_file.npz>.
+                        Example of usage for several files/layers: <layer1>:<port_num1>=<output_file1.ark>,<layer2>:<port_num2>=<output_file2.ark>.
   -r REFERENCE, --reference REFERENCE
-                        Optional. Read reference score file and compare scores.
+                        Read reference score file(s) and compare inference results with reference scores.
+                        Usage for a single file/layer: <reference_file.ark> or <reference_file.npz>.
+                        Example of usage for several files/layers: <layer1>:<port_num1>=<reference_file1.ark>,<layer2>:<port_num2>=<reference_file2.ark>.
   -d DEVICE, --device DEVICE
                         Optional. Specify a target device to infer on. CPU, GPU, VPUX, GNA_AUTO, GNA_HW, GNA_SW_FP32,
                         GNA_SW_EXACT and HETERO with combination of GNA as the primary device and CPU as a secondary (e.g.
@@ -117,10 +123,11 @@ Options:
   -layout LAYOUT        Optional. Custom layout in format: "input0[value0],input1[value1]" or "[value]" (applied to all
                         inputs)
   -qb [8, 16], --quantization_bits [8, 16]
-                        Optional. Weight bits for quantization: 8 or 16 (default 16).
+                        Optional. Weight resolution in bits for GNA quantization: 8 or 16 (default 16).
   -sf SCALE_FACTOR, --scale_factor SCALE_FACTOR
-                        Optional. The user-specified input scale factor for quantization. If the model contains multiple
-                        inputs, provide scale factors by separating them with commas.
+                        Optional. User-specified input scale factor for GNA quantization.
+                        If the model contains multiple inputs, provide scale factors by separating them with commas.
+                        For example: <layer1>:<sf1>,<layer2>:<sf2> or just <sf> to be applied to all inputs.
   -wg EXPORT_GNA_MODEL, --export_gna_model EXPORT_GNA_MODEL
                         Optional. Write GNA model to file using path/filename provided.
   -we EXPORT_EMBEDDED_GNA_MODEL, --export_embedded_gna_model EXPORT_EMBEDDED_GNA_MODEL
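The `<layer1>:<sf1>,<layer2>:<sf2>` string accepted by `-sf` splits the same way as the file lists; a small sketch (a hypothetical helper, assuming layer names contain no colon):

```python
def parse_scale_factors(spec: str) -> dict:
    """Map '<layer1>:<sf1>,<layer2>:<sf2>' (or a bare '<sf>') to {layer: float}."""
    if ':' not in spec:
        return {'': float(spec)}  # one factor applied to all inputs
    return {layer: float(sf)
            for layer, _, sf in (item.partition(':') for item in spec.split(','))}

print(parse_scale_factors('2048'))                    # {'': 2048.0}
print(parse_scale_factors('Input1:1024,Input2:512'))  # {'Input1': 1024.0, 'Input2': 512.0}
```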
@@ -135,12 +142,6 @@ Options:
                         Optional. Enables performance report (specify -a to ensure arch accurate results).
   -a [CORE, ATOM], --arch [CORE, ATOM]
                         Optional. Specify architecture. CORE, ATOM with the combination of -pc.
-  -iname INPUT_LAYERS, --input_layers INPUT_LAYERS
-                        Optional. Layer names for input blobs. The names are separated with ",". Allows to change the order
-                        of input layers for -i flag. Example: Input1,Input2
-  -oname OUTPUT_LAYERS, --output_layers OUTPUT_LAYERS
-                        Optional. Layer names for output blobs. The names are separated with ",". Allows to change the
-                        order of output layers for -o flag. Example: Output1:port,Output2:port.
   -cw_l CONTEXT_WINDOW_LEFT, --context_window_left CONTEXT_WINDOW_LEFT
                         Optional. Number of frames for left context windows (default is 0). Works only with context window
                         models. If you use the cw_l or cw_r flag, then batch size argument is ignored.
@@ -18,11 +18,18 @@ def build_arg_parser() -> argparse.ArgumentParser:
                       help='Read GNA model from file using path/filename provided (required if -m is missing).')
 
     args.add_argument('-h', '--help', action='help', help='Show this help message and exit.')
-    args.add_argument('-i', '--input', required=True, type=str, help='Required. Path to an input file (.ark or .npz).')
+    args.add_argument('-i', '--input', required=True, type=str,
+                      help='Required. Path(s) to input file(s). '
+                           'Usage for a single file/layer: <input_file.ark> or <input_file.npz>. '
+                           'Example of usage for several files/layers: <layer1>:<port_num1>=<input_file1.ark>,<layer2>:<port_num2>=<input_file2.ark>.')
     args.add_argument('-o', '--output', type=str,
-                      help='Optional. Output file name to save inference results (.ark or .npz).')
+                      help='Optional. Output file name(s) to save scores (inference results). '
+                           'Usage for a single file/layer: <output_file.ark> or <output_file.npz>. '
+                           'Example of usage for several files/layers: <layer1>:<port_num1>=<output_file1.ark>,<layer2>:<port_num2>=<output_file2.ark>.')
     args.add_argument('-r', '--reference', type=str,
-                      help='Optional. Read reference score file and compare scores.')
+                      help='Optional. Read reference score file(s) and compare inference results with reference scores. '
+                           'Usage for a single file/layer: <reference_file.ark> or <reference_file.npz>. '
+                           'Example of usage for several files/layers: <layer1>:<port_num1>=<reference_file1.ark>,<layer2>:<port_num2>=<reference_file2.ark>.')
     args.add_argument('-d', '--device', default='CPU', type=str,
                       help='Optional. Specify a target device to infer on. '
                            'CPU, GPU, VPUX, GNA_AUTO, GNA_HW, GNA_SW_FP32, GNA_SW_EXACT and HETERO with combination of GNA'
@@ -33,9 +40,11 @@ def build_arg_parser() -> argparse.ArgumentParser:
     args.add_argument('-layout', type=str,
                       help='Optional. Custom layout in format: "input0[value0],input1[value1]" or "[value]" (applied to all inputs)')
     args.add_argument('-qb', '--quantization_bits', default=16, type=int, choices=(8, 16), metavar='[8, 16]',
-                      help='Optional. Weight bits for quantization: 8 or 16 (default 16).')
+                      help='Optional. Weight resolution in bits for GNA quantization: 8 or 16 (default 16).')
     args.add_argument('-sf', '--scale_factor', type=str,
-                      help='Optional. The user-specified input scale factor for quantization.')
+                      help='Optional. User-specified input scale factor for GNA quantization. '
+                           'If the model contains multiple inputs, provide scale factors by separating them with commas. '
+                           'For example: <layer1>:<sf1>,<layer2>:<sf2> or just <sf> to be applied to all inputs.')
     args.add_argument('-wg', '--export_gna_model', type=str,
                       help='Optional. Write GNA model to file using path/filename provided.')
     args.add_argument('-we', '--export_embedded_gna_model', type=str,