Ilya Churaev f639e4e902 Moved inference_engine samples to cpp folder (#8615 )

* Moved inference_engine samples to cpp folder

* Fixed documentations links

* Fixed installation

* Fixed scripts

* Fixed cmake script

* Try to fix install

* Fixed samples

* Some fix

2021-11-18 10:08:20 +03:00

12 KiB

Automatic Speech Recognition Python* Sample

This sample demonstrates how to perform synchronous inference of an acoustic model based on Kaldi* neural networks and speech feature vectors.

The sample works with Kaldi ARK or uncompressed NumPy* NPZ files, so it does not cover an end-to-end speech recognition scenario (speech to text). Such a scenario requires additional preprocessing (feature extraction) to get a feature vector from a speech signal, as well as postprocessing (decoding) to produce text from scores.

The Automatic Speech Recognition Python sample application demonstrates how to use the following Inference Engine Python API features in applications:

Feature	API	Description
Import/Export Model	IECore.import_network, ExecutableNetwork.export	The GNA plugin supports loading and saving of the GNA-optimized model
Network Operations	IENetwork.batch_size, CDataPtr.shape, ExecutableNetwork.input_info, ExecutableNetwork.outputs	Managing the network: configure input and output blobs
Network Operations	IENetwork.add_outputs	Managing the network: change the names of output layers in the network
InferRequest Operations	InferRequest.query_state, VariableState.reset	Get and reset the state control interface for the given executable network

The basic Inference Engine API is covered by the Hello Classification Python* Sample.

Options	Values
Validated Models	Acoustic model based on Kaldi* neural networks (see Model Preparation section)
Model Format	Inference Engine Intermediate Representation (.xml + .bin)
Supported devices	See the Execution Modes section below and List Supported Devices
Other language implementations	C++

How It Works

At startup, the sample application reads command-line parameters, loads the specified model and input data to the Inference Engine plugin, and performs synchronous inference on all speech utterances stored in the input file. It logs each step to the standard output stream.

You can find a detailed description of each sample step in the Integration Steps section of the "Integrate the Inference Engine with Your Application" guide.

GNA-specific details

Quantization

If the GNA device is selected (for example, using the -d GNA flag), the GNA Inference Engine plugin quantizes the model and input feature vector sequence to integer representation before performing inference.

The -qb flag provides a hint to the GNA plugin regarding the preferred target weight resolution for all layers.
For example, when -qb 8 is specified, the plugin will use 8-bit weights wherever possible in the network.

Note: It is not always possible to use 8-bit weights due to GNA hardware limitations. For example, convolutional layers always use 16-bit weights (GNA hardware version 1 and 2). This limitation will be removed in GNA hardware version 3 and higher.
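To make the idea of integer quantization with a scale factor more concrete, here is a minimal sketch in plain Python. It is not the GNA plugin's quantizer, which is internal to the plugin. The function names and the `target_max` value of 16384 are assumptions chosen for illustration only.

```python
def scale_factor_for(samples, target_max=16384.0):
    """Return a factor that maps the largest magnitude in `samples` to `target_max`.

    `target_max` is an assumed value for illustration, not one stated in this document.
    """
    peak = max(abs(x) for x in samples)
    return 1.0 if peak == 0 else target_max / peak


def quantize_int16(samples, scale):
    """Scale, round, and saturate each value to the signed 16-bit range."""
    lo, hi = -32768, 32767
    return [max(lo, min(hi, round(x * scale))) for x in samples]


# Hypothetical feature values standing in for one frame of an utterance.
features = [0.5, -2.0, 1.25, 0.0]
scale = scale_factor_for(features)
quantized = quantize_int16(features, scale)

# Dividing by the same scale factor recovers an approximation of the input.
restored = [q / scale for q in quantized]
print(scale, quantized, restored)
```

A larger scale factor preserves more precision but increases the risk of saturation, which is one reason a user-specified value can be passed with the -sf flag.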

Execution Modes

Several execution modes are supported via the -d flag:

CPU - All calculations are performed on the CPU device using the CPU Plugin.
GPU - All calculations are performed on the GPU device using the GPU Plugin.
MYRIAD - All calculations are performed on the Intel® Neural Compute Stick 2 device using the VPU MYRIAD Plugin.
GNA_AUTO - GNA hardware is used if available and the driver is installed. Otherwise, the GNA device is emulated in fast-but-not-bit-exact mode.
GNA_HW - GNA hardware is used if available and the driver is installed. Otherwise, an error will occur.
GNA_SW - Deprecated. The GNA device is emulated in fast-but-not-bit-exact mode.
GNA_SW_FP32 - Substitutes parameters and calculations from low precision to floating point (FP32).
GNA_SW_EXACT - GNA device is emulated in bit-exact mode.
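The modes listed above could be mapped to a plugin device name plus a configuration entry roughly as follows. This is a sketch, not the sample's actual code: the helper name `resolve_device` is hypothetical, and the use of a `GNA_DEVICE_MODE` configuration key and the fallback of plain `GNA` to `GNA_AUTO` are assumptions.

```python
# GNA execution modes named in the list above.
GNA_MODES = {"GNA_AUTO", "GNA_HW", "GNA_SW", "GNA_SW_FP32", "GNA_SW_EXACT"}


def resolve_device(value):
    """Split a -d value into (device name, plugin configuration dict)."""
    if value in GNA_MODES:
        # Assumption: the mode is passed to the GNA plugin as a config entry.
        return "GNA", {"GNA_DEVICE_MODE": value}
    if value == "GNA":
        # Assumption: plain "GNA" falls back to the automatic mode.
        return "GNA", {"GNA_DEVICE_MODE": "GNA_AUTO"}
    # CPU, GPU, MYRIAD, and other devices need no extra configuration here.
    return value, {}


for flag in ("CPU", "GNA_HW", "GNA_SW_EXACT"):
    print(flag, "->", resolve_device(flag))
```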

Loading and Saving Models

The GNA plugin supports loading and saving a GNA-optimized (non-IR) model via the -rg and -wg flags.
This makes it possible to avoid the cost of full model quantization at run time.

In addition to performing inference directly from a GNA model file, this option makes it possible to:

Convert from IR format to GNA format model file (-m, -wg)
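The two-step workflow (export once with -wg, then reuse with -rg) can be sketched as a pair of command lines assembled in Python. The file names, including `wsj_dnn5b.gna`, and the relative script path are hypothetical placeholders. Only the flags themselves come from the usage message of the sample.

```python
import shlex

SAMPLE = "speech_sample.py"  # hypothetical path to the sample script


def export_command(ir_xml, input_file, gna_file):
    """Build a command that reads an IR model (-m) and writes a GNA model file (-wg)."""
    return ["python", SAMPLE, "-m", ir_xml, "-i", input_file,
            "-d", "GNA_AUTO", "-wg", gna_file]


def import_command(gna_file, input_file, output_file):
    """Build a command that runs inference directly from a saved GNA model (-rg)."""
    return ["python", SAMPLE, "-rg", gna_file, "-i", input_file,
            "-d", "GNA_AUTO", "-o", output_file]


first = export_command("wsj_dnn5b.xml", "dev93_10.ark", "wsj_dnn5b.gna")
second = import_command("wsj_dnn5b.gna", "dev93_10.ark", "result.npz")

# shlex.join quotes arguments safely for display or copy-paste into a shell.
print(shlex.join(first))
print(shlex.join(second))
```

Note that -m and -rg are mutually exclusive, so the second command does not pass the IR model at all.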

Running

Run the application with the -h option to see the usage message:

python <path_to_sample>/speech_sample.py -h

Usage message:

usage: speech_sample.py [-h] (-m MODEL | -rg IMPORT_GNA_MODEL) -i INPUT       
                        [-o OUTPUT] [-r REFERENCE] [-d DEVICE]
                        [-bs BATCH_SIZE] [-qb QUANTIZATION_BITS]
                        [-sf SCALE_FACTOR] [-wg EXPORT_GNA_MODEL] [-pc]       
                        [-a {CORE,ATOM}] [-iname INPUT_LAYERS]
                        [-oname OUTPUT_LAYERS]

optional arguments:
  -m MODEL, --model MODEL
                        Path to an .xml file with a trained model (required if
                        -rg is missing).
  -rg IMPORT_GNA_MODEL, --import_gna_model IMPORT_GNA_MODEL
                        Read GNA model from file using path/filename provided
                        (required if -m is missing).

Options:
  -h, --help            Show this help message and exit.
  -i INPUT, --input INPUT
                        Required. Path to an input file (.ark or .npz).
  -o OUTPUT, --output OUTPUT
                        Optional. Output file name to save inference results
                        (.ark or .npz).
  -r REFERENCE, --reference REFERENCE
                        Optional. Read reference score file and compare
                        scores.
  -d DEVICE, --device DEVICE
                        Optional. Specify a target device to infer on. CPU,
                        GPU, MYRIAD, GNA_AUTO, GNA_HW, GNA_SW_FP32,
                        GNA_SW_EXACT and HETERO with combination of GNA as the
                        primary device and CPU as a secondary (e.g.
                        HETERO:GNA,CPU) are supported. The sample will look
                        for a suitable plugin for device specified. Default
                        value is CPU.
  -bs BATCH_SIZE, --batch_size BATCH_SIZE
                        Optional. Batch size 1-8 (default 1).
  -qb QUANTIZATION_BITS, --quantization_bits QUANTIZATION_BITS
                        Optional. Weight bits for quantization: 8 or 16
                        (default 16).
  -sf SCALE_FACTOR, --scale_factor SCALE_FACTOR
                        Optional. The user-specified input scale factor for
                        quantization. If the network contains multiple inputs,
                        provide scale factors by separating them with commas.
  -wg EXPORT_GNA_MODEL, --export_gna_model EXPORT_GNA_MODEL
                        Optional. Write GNA model to file using path/filename
                        provided.
  -pc, --performance_counter
                        Optional. Enables performance report (specify -a to
                        ensure arch accurate results).
  -a {CORE,ATOM}, --arch {CORE,ATOM}
                        Optional. Specify architecture. CORE, ATOM with the
                        combination of -pc.
  -iname INPUT_LAYERS, --input_layers INPUT_LAYERS
                        Optional. Layer names for input blobs. The names are
                        separated with ",". Allows to change the order of
                        input layers for -i flag. Example: Input1,Input2
  -oname OUTPUT_LAYERS, --output_layers OUTPUT_LAYERS
                        Optional. Layer names for output blobs. The names are
                        separated with ",". Allows to change the order of
                        output layers for -o flag. Example:
                        Output1:port,Output2:port.
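The usage message shows that exactly one of -m and -rg must be given. A parser with that behavior could be declared with `argparse` along these lines. This is a reduced sketch covering only a few of the options, not the sample's actual parser, and the comma-splitting of -sf values is an assumption based on the option description.

```python
import argparse


def build_parser():
    """Build a reduced parser that mirrors part of the usage message above."""
    parser = argparse.ArgumentParser(prog="speech_sample.py")

    # Exactly one model source is required: an IR model or a GNA model file.
    model = parser.add_mutually_exclusive_group(required=True)
    model.add_argument("-m", "--model")
    model.add_argument("-rg", "--import_gna_model")

    parser.add_argument("-i", "--input", required=True)
    parser.add_argument("-d", "--device", default="CPU")
    parser.add_argument("-bs", "--batch_size", type=int, default=1,
                        choices=range(1, 9), metavar="BATCH_SIZE")
    parser.add_argument("-qb", "--quantization_bits", type=int, default=16,
                        choices=(8, 16))
    # Assumption: several scale factors arrive as one comma-separated string.
    parser.add_argument("-sf", "--scale_factor",
                        type=lambda s: [float(x) for x in s.split(",")])
    return parser


args = build_parser().parse_args(
    ["-m", "model.xml", "-i", "in.ark", "-sf", "2175.43,1.0"]
)
print(args)
```

Passing both -m and -rg, or neither, makes `argparse` exit with a usage error.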

Model Preparation

You can use the following Model Optimizer command to convert a Kaldi nnet1 or nnet2 neural network to the Inference Engine Intermediate Representation format:

python <path_to_mo>/mo.py --framework kaldi --input_model wsj_dnn5b.nnet --counts wsj_dnn5b.counts --remove_output_softmax --output_dir <path_to_dir>

The following pre-trained models are available:

wsj_dnn5b_smbr
rm_lstm4f
rm_cnn4a_smbr

All of them can be downloaded from https://storage.openvinotoolkit.org/models_contrib/speech/2021.2.

Speech Inference

You can run inference on Intel® Processors with the GNA co-processor (or the emulation library):

python <path_to_sample>/speech_sample.py -m <path_to_model>/wsj_dnn5b.xml -i <path_to_ark>/dev93_10.ark -r <path_to_ark>/dev93_scores_10.ark -d GNA_AUTO -o result.npz

NOTES:

Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

The sample supports input and output in the NumPy file format (.npz).
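As an illustration of the .npz format, the snippet below writes and reads back an uncompressed archive with NumPy. The layout is an assumption: one 2-D array per utterance, keyed by utterance name, with one row per frame. The utterance names and array sizes are made up for this example.

```python
import os
import tempfile

import numpy as np

# Hypothetical utterance names and feature sizes, chosen only for illustration.
utterances = {
    "utt_0001": np.random.rand(120, 440).astype(np.float32),
    "utt_0002": np.random.rand(95, 440).astype(np.float32),
}

path = os.path.join(tempfile.mkdtemp(), "features.npz")
np.savez(path, **utterances)  # np.savez writes an uncompressed archive

# Read the arrays back, one entry per stored name.
with np.load(path) as archive:
    loaded = {name: archive[name] for name in archive.files}

for name, matrix in loaded.items():
    print(name, matrix.shape, matrix.dtype)
```

`np.savez_compressed` would produce a compressed archive instead, which is not the uncompressed format this sample is described as working with.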

Sample Output

The sample application logs each step to the standard output stream.

[ INFO ] Creating Inference Engine
[ INFO ] Reading the network: wsj_dnn5b.xml
[ INFO ] Configuring input and output blobs
[ INFO ] Using scale factor(s) calculated from first utterance
[ INFO ] For input 0 using scale factor of 2175.4322418
[ INFO ] Loading the model to the plugin
[ INFO ] Starting inference in synchronous mode
[ INFO ] Utterance 0 (4k0c0301)
[ INFO ] Output blob name: affinetransform14/Fused_Add_
[ INFO ] Frames in utterance: 1294
[ INFO ] Total time in Infer (HW and SW): 6211.45ms
[ INFO ] max error: 0.7051840
[ INFO ] avg error: 0.0448388
[ INFO ] avg rms error: 0.0582387
[ INFO ] stdev error: 0.0371650
[ INFO ]
[ INFO ] Utterance 1 (4k0c0302)
[ INFO ] Output blob name: affinetransform14/Fused_Add_
[ INFO ] Frames in utterance: 1005
[ INFO ] Total time in Infer (HW and SW): 4742.27ms
[ INFO ] max error: 0.7575974
[ INFO ] avg error: 0.0452166
[ INFO ] avg rms error: 0.0586013
[ INFO ] stdev error: 0.0372769
...
[ INFO ] Total sample time: 40219.99ms
[ INFO ] File result.npz was created!
[ INFO ] This sample is an API example, for any performance measurements please use the dedicated benchmark_app tool
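The error lines in the log compare inference scores with the reference scores passed via -r. One plausible way to compute such statistics is sketched below in plain Python. The formulas are an assumption about what the labels mean, not a copy of the sample's code, and flat lists of floats stand in for the score matrices.

```python
import math


def compare_scores(result, reference):
    """Return statistics of the absolute differences between two score lists."""
    errors = [abs(r - ref) for r, ref in zip(result, reference)]
    count = len(errors)
    avg_error = sum(errors) / count
    mean_square = sum(e * e for e in errors) / count
    return {
        "max error": max(errors),
        "avg error": avg_error,
        "avg rms error": math.sqrt(mean_square),
        # Clamp at zero to guard against tiny negative values from rounding.
        "stdev error": math.sqrt(max(mean_square - avg_error * avg_error, 0.0)),
    }


# Hypothetical scores; real ones come from the output blob and the reference file.
metrics = compare_scores([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 2.0, 4.0])
for label, value in metrics.items():
    print(f"{label}: {value:.7f}")
```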
