[GNA] Update documentation (cherry-pick from release) (#10974)
* [GNA] Update documentation (release) (#10873)
* parent 5f755d5e4a
author Nadezhda Ageeva <nadezhda.ageeva@intel.com> 1646919359 +0300
committer Nadezhda Ageeva <nadezhda.ageeva@intel.com> 1647270928 +0300
[GNA] Update documentation (release)
Update docs/OV_Runtime_UG/supported_plugins/GNA.md
Co-authored-by: Denis Orlov <denis.orlov@intel.com>
Update docs/OV_Runtime_UG/supported_plugins/GNA.md
Co-authored-by: Denis Orlov <denis.orlov@intel.com>
Update docs/OV_Runtime_UG/supported_plugins/GNA.md
Co-authored-by: Denis Orlov <denis.orlov@intel.com>
Update docs/OV_Runtime_UG/supported_plugins/GNA.md
Co-authored-by: Denis Orlov <denis.orlov@intel.com>
Apply comments
Move snippets to separate file
Add notes about POT and 2d convolutions
* Add links to GNA setup
* cleanup after rebase
* [GNA] small docs fixes (#10959)
* [GNA] small docs fixes
* Update docs/OV_Runtime_UG/supported_plugins/GNA.md
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/GNA.md
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/GNA.md
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
parent 848a824260
commit 097006d97a
@ -16,16 +16,16 @@

The OpenVINO Runtime provides capabilities to infer deep learning models on the following device types with corresponding plugins:

| Plugin | Device types |
|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|[CPU plugin](CPU.md) |Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE) |
|[GPU plugin](GPU.md) |Intel® Graphics, including Intel® HD Graphics, Intel® UHD Graphics, Intel® Iris® Graphics, Intel® Xe Graphics, Intel® Xe MAX Graphics |
|[VPU plugins](VPU.md) |Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X, Intel® Vision Accelerator Design with Intel® Movidius™ VPUs |
|[GNA plugin](GNA.md) |Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor|

| Plugin | Device types |
|--------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|[CPU](CPU.md) |Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE) |
|[GPU](GPU.md) |Intel® Graphics, including Intel® HD Graphics, Intel® UHD Graphics, Intel® Iris® Graphics, Intel® Xe Graphics, Intel® Xe MAX Graphics |
|[VPUs](VPU.md) |Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X, Intel® Vision Accelerator Design with Intel® Movidius™ VPUs |
|[GNA](GNA.md) |[Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html); [Amazon Alexa\* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice); [Intel® Pentium® Silver Processors N5xxx, J5xxx and Intel® Celeron® Processors N4xxx, J4xxx (formerly codenamed Gemini Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html): [Intel® Pentium® Silver J5005 Processor](https://ark.intel.com/content/www/us/en/ark/products/128984/intel-pentium-silver-j5005-processor-4m-cache-up-to-2-80-ghz.html), [Intel® Pentium® Silver N5000 Processor](https://ark.intel.com/content/www/us/en/ark/products/128990/intel-pentium-silver-n5000-processor-4m-cache-up-to-2-70-ghz.html), [Intel® Celeron® J4005 Processor](https://ark.intel.com/content/www/us/en/ark/products/128992/intel-celeron-j4005-processor-4m-cache-up-to-2-70-ghz.html), [Intel® Celeron® J4105 Processor](https://ark.intel.com/content/www/us/en/ark/products/128989/intel-celeron-j4105-processor-4m-cache-up-to-2-50-ghz.html), [Intel® Celeron® J4125 Processor](https://ark.intel.com/content/www/us/en/ark/products/197305/intel-celeron-processor-j4125-4m-cache-up-to-2-70-ghz.html), [Intel® Celeron® Processor N4100](https://ark.intel.com/content/www/us/en/ark/products/128983/intel-celeron-processor-n4100-4m-cache-up-to-2-40-ghz.html), [Intel® Celeron® Processor N4000](https://ark.intel.com/content/www/us/en/ark/products/128988/intel-celeron-processor-n4000-4m-cache-up-to-2-60-ghz.html); [Intel® Pentium® Processors N6xxx, J6xxx, Intel® Celeron® Processors N6xxx, J6xxx and Intel Atom® x6xxxxx (formerly codenamed Elkhart Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/128825/products-formerly-elkhart-lake.html); [Intel® Core™ Processors (formerly codenamed Cannon Lake)](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html); [10th Generation Intel® Core™ Processors (formerly codenamed Ice Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html): [Intel® Core™ i7-1065G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/196597/intel-core-i71065g7-processor-8m-cache-up-to-3-90-ghz.html), [Intel® Core™ i7-1060G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/197120/intel-core-i71060g7-processor-8m-cache-up-to-3-80-ghz.html), [Intel® Core™ i5-1035G4 Processor](https://ark.intel.com/content/www/us/en/ark/products/196591/intel-core-i51035g4-processor-6m-cache-up-to-3-70-ghz.html), [Intel® Core™ i5-1035G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/196592/intel-core-i51035g7-processor-6m-cache-up-to-3-70-ghz.html), [Intel® Core™ i5-1035G1 Processor](https://ark.intel.com/content/www/us/en/ark/products/196603/intel-core-i51035g1-processor-6m-cache-up-to-3-60-ghz.html), [Intel® Core™ i5-1030G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/197119/intel-core-i51030g7-processor-6m-cache-up-to-3-50-ghz.html), [Intel® Core™ i5-1030G4 Processor](https://ark.intel.com/content/www/us/en/ark/products/197121/intel-core-i51030g4-processor-6m-cache-up-to-3-50-ghz.html), [Intel® Core™ i3-1005G1 Processor](https://ark.intel.com/content/www/us/en/ark/products/196588/intel-core-i31005g1-processor-4m-cache-up-to-3-40-ghz.html), [Intel® Core™ i3-1000G1 
Processor](https://ark.intel.com/content/www/us/en/ark/products/197122/intel-core-i31000g1-processor-4m-cache-up-to-3-20-ghz.html), [Intel® Core™ i3-1000G4 Processor](https://ark.intel.com/content/www/us/en/ark/products/197123/intel-core-i31000g4-processor-4m-cache-up-to-3-20-ghz.html); [11th Generation Intel® Core™ Processors (formerly codenamed Tiger Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/88759/tiger-lake.html); [12th Generation Intel® Core™ Processors (formerly codenamed Alder Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/147470/products-formerly-alder-lake.html)|

OpenVINO Runtime also provides several execution capabilities that work on top of other devices:

| Capability | Description |
|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|[Multi-Device execution](../multi_device.md) |Multi-Device enables simultaneous inference of the same model on several devices in parallel |
|[Auto-Device selection](../auto_device_selection.md) |Auto-Device selection enables selecting Intel® device for inference automatically |

@ -38,17 +38,17 @@ Devices similar to the ones we have used for benchmarking can be accessed using

## Features support matrix

The table below demonstrates support of key features by OpenVINO device plugins.

| Capability | CPU | [GPU](./GPU.md) | GNA | VPU |
| Capability | [CPU](CPU.md) | [GPU](GPU.md) | [GNA](GNA.md) | [VPU](VPU.md) |
| ---------- | --- | --- | --- | --- |
| [Heterogeneous execution](../hetero_execution.md)| Yes | Yes | ? | ? |
| [Multi-device execution](../multi_device.md) | Yes | Yes | ? | ? |
| [Automatic batching](../automatic_batching.md) | No | Yes | ? | ? |
| [Multi-stream execution](@ref openvino_docs_optimization_guide_dldt_optimization_guide) | Yes | Yes | ? | ? |
| [Models caching](../Model_caching_overview.md) | Yes | Partial | ? | ? |
| [Dynamic shapes](../ov_dynamic_shapes.md) | Yes | Partial | ? | ? |
| Import/Export | Yes | No | ? | ? |
| [Preprocessing acceleration](../preprocessing_overview.md) | Yes | Yes | ? | ? |
| [Stateful models](../network_state_intro.md) | Yes | No | ? | ? |
| [Extensibility](@ref openvino_docs_Extensibility_UG_Intro) | Yes | Yes | ? | ? |
| [Heterogeneous execution](../hetero_execution.md)| Yes | Yes | No | ? |
| [Multi-device execution](../multi_device.md) | Yes | Yes | Partial | ? |
| [Automatic batching](../automatic_batching.md) | No | Yes | No | ? |
| [Multi-stream execution](@ref openvino_docs_optimization_guide_dldt_optimization_guide) | Yes | Yes | No | ? |
| [Models caching](../Model_caching_overview.md) | Yes | Partial | Yes | ? |
| [Dynamic shapes](../ov_dynamic_shapes.md) | Yes | Partial | No | ? |
| Import/Export | Yes | No | Yes | ? |
| [Preprocessing acceleration](../preprocessing_overview.md) | Yes | Yes | No | ? |
| [Stateful models](../network_state_intro.md) | Yes | No | Yes | ? |
| [Extensibility](@ref openvino_docs_Extensibility_UG_Intro) | Yes | Yes | No | ? |

For more details on plugin specific feature limitation see corresponding plugin pages.
For more details on plugin-specific feature limitations, see the corresponding plugin pages.

@ -1,7 +1,6 @@

# GNA device {#openvino_docs_OV_UG_supported_plugins_GNA}

## Introducing the GNA Plugin

The Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge.
The Intel® Gaussian & Neural Accelerator (GNA) is a low-power neural coprocessor for continuous inference at the edge.

Intel® GNA is not intended to replace typical inference devices such as the
CPU, graphics processing unit (GPU), or vision processing unit (VPU). It is designed for offloading
@ -10,371 +9,175 @@ to save power and free CPU resources.

The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU.

## Devices with Intel® GNA

Devices with Intel® GNA support:

* [Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html)
* [Amazon Alexa\* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice)
* [Intel® Pentium® Silver Processors N5xxx, J5xxx and Intel® Celeron® Processors N4xxx, J4xxx (formerly codenamed Gemini Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html):
  - Intel® Pentium® Silver J5005 Processor
  - Intel® Pentium® Silver N5000 Processor
  - Intel® Celeron® J4005 Processor
  - Intel® Celeron® J4105 Processor
  - Intel® Celeron® J4125 Processor
  - Intel® Celeron® Processor N4100
  - Intel® Celeron® Processor N4000

* [Intel® Pentium® Processors N6xxx, J6xxx, Intel® Celeron® Processors N6xxx, J6xxx and Intel Atom® x6xxxxx (formerly codenamed Elkhart Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/128825/products-formerly-elkhart-lake.html)
* [Intel® Core™ Processors (formerly codenamed Cannon Lake)](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html)
* [10th Generation Intel® Core™ Processors (formerly codenamed Ice Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html):
* [11th Generation Intel® Core™ Processors (formerly codenamed Tiger Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/88759/tiger-lake.html).
* [12th Generation Intel® Core™ Processors (formerly codenamed Alder Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/147470/products-formerly-alder-lake.html).
> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only.
For more details on how to configure a machine to use the GNA plugin, see the [GNA configuration page](@ref openvino_docs_install_guides_configurations_for_intel_gna).
## Intel® GNA Generational Differences

The first and second versions of Intel® GNA found in 10th and 11th generation Intel® Core™ Processors may be considered to be functionally equivalent. Intel® GNA 2.0 provided performance improvement with respect to Intel® GNA 1.0. Starting with 12th Generation Intel® Core™ Processors (formerly codenamed Alder Lake), support for Intel® GNA 3.0 features is being added.
The first (1.0) and second (2.0) versions of Intel® GNA found in 10th and 11th generation Intel® Core™ Processors may be considered functionally equivalent. Intel® GNA 2.0 provided a performance improvement over Intel® GNA 1.0. Starting with 12th Generation Intel® Core™ Processors (formerly codenamed Alder Lake), support for Intel® GNA 3.0 features is being added.

In the rest of this documentation, "GNA 2.0" refers to Intel® GNA hardware delivered on 10th and 11th generation Intel® Core™ processors, and the term "GNA 3.0" will be used to refer to GNA hardware delivered on 12th generation Intel® Core™ processors.
In the rest of this documentation, "GNA 2.0" refers to Intel® GNA hardware delivered on 10th and 11th generation Intel® Core™ processors, and "GNA 3.0" refers to GNA hardware delivered on 12th generation Intel® Core™ processors.

Initially, a limited subset of Intel® GNA 3.0 features is added to the previous feature set, including the following:
### Intel® GNA Forward and Backward Compatibility

* **2D VALID Convolution With Small 2D Kernels:** Two-dimensional convolutions with the following kernel dimensions [H,W] are supported: [1,1], [2,2], [3,3], [2,1], [3,1], [4,1], [5,1], [6,1], [7,1], [1,2], or [1,3]. Input tensor dimensions are limited to [1,8,16,16] <= [N,C,H,W] <= [1,120,384,240]. Up to 384 channels C may be used with a subset of kernel sizes (see the table below). Up to 256 kernels (output channels) are supported. Pooling is limited to pool shapes of [1,1], [2,2], or [3,3]. Not all combinations of kernel shape and input tensor shape are supported (see the tables below for exact limitations).

The tables below show that the exact limitation on the input tensor width W depends on the number of input channels C (indicated as Ci below) and the kernel shape. There is much more freedom to choose the input tensor height and the number of output channels.

## Initially Supported Subset of Intel® GNA 2D Convolutions

The following tables provide a more explicit representation of the Intel® GNA 3.0 2D convolution operations initially supported. The limits depend strongly on the number of input tensor channels (Ci) and the input tensor width (W). Other factors are kernel height (KH), kernel width (KW), pool height (PH), pool width (PW), horizontal pool step (SH), and vertical pool step (SW). For example, the first table shows that for a 3x3 kernel with max pooling, only square pools are supported, and W is limited to 87 when there are 64 input channels.

**Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters** (Input and Kernel Precision: 2 bytes)
|KH|KW|PH|PW|SH|SW|H|W<br>Ci=8<br>Co=256|W<br>Ci=16<br>Co=256|W<br>Ci=32<br>Co=256|W<br>Ci=64<br>Co=256|W<br>Ci=128<br>Co=256|W<br>Ci=256<br>Co=256|W<br>Ci=384<br>Co=256|
|
||||
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
|
||||
|1|1|1|1|1|1|128|240|240|240|240|240|240|170|
|
||||
|1|1|1|1|1|1|256|240|240|240|240|240|128|85|
|
||||
|1|1|1|1|1|1|384|240|240|240|240|170|85|56|
|
||||
|1|2|1|1|1|1|128|240|240|240|240| | | |
|
||||
|1|2|1|1|1|1|256|240|240|240|240| | | |
|
||||
|1|2|1|1|1|1|384|240|240|240|240| | | |
|
||||
|1|3|1|1|1|1|128|240|240|240|240| | | |
|
||||
|1|3|1|1|1|1|256|240|240|240|240| | | |
|
||||
|1|3|1|1|1|1|384|240|240|240|240| | | |
|
||||
|2|1|1|1|1|1|128|192|192|192|192|192|192|128|
|
||||
|2|1|1|1|1|1|256|192|192|192|192|192|128|85|
|
||||
|2|1|1|1|1|1|384|192|192|192|192|170|85|56|
|
||||
|2|2|1|1|1|1|128|193|193|193|193| | | |
|
||||
|2|2|1|1|1|1|256|193|193|193|193| | | |
|
||||
|2|2|1|1|1|1|384|193|193|193|193| | | |
|
||||
|2|2|2|2|1|1|128|193|193|192|179| | | |
|
||||
|2|2|2|2|1|1|256|193|193|192|179| | | |
|
||||
|2|2|2|2|1|1|384|193|193|192|179| | | |
|
||||
|2|2|2|2|1|2|128|193|193|192|179| | | |
|
||||
|2|2|2|2|1|2|256|193|193|192|179| | | |
|
||||
|2|2|2|2|1|2|384|193|193|192|179| | | |
|
||||
|2|2|2|2|2|1|128|193|193|192|179| | | |
|
||||
|2|2|2|2|2|1|256|193|193|192|179| | | |
|
||||
|2|2|2|2|2|1|384|193|193|192|179| | | |
|
||||
|2|2|2|2|2|2|128|193|193|192|179| | | |
|
||||
|2|2|2|2|2|2|256|193|193|192|179| | | |
|
||||
|2|2|2|2|2|2|384|193|193|192|179| | | |
|
||||
|3|1|1|1|1|1|128|128|128|128|128|128|85|42|
|
||||
|3|1|1|1|1|1|256|128|128|128|128|128|85|42|
|
||||
|3|1|1|1|1|1|384|128|128|128|128|128|85|42|
|
||||
|3|3|1|1|1|1|128|130|130|130|87| | | |
|
||||
|3|3|1|1|1|1|256|130|130|130|87| | | |
|
||||
|3|3|1|1|1|1|384|130|130|130|87| | | |
|
||||
|3|3|2|2|1|1|128|130|130|126|87| | | |
|
||||
|3|3|2|2|1|1|256|130|130|126|87| | | |
|
||||
|3|3|2|2|1|1|384|130|130|126|87| | | |
|
||||
|3|3|2|2|1|2|128|130|130|126|87| | | |
|
||||
|3|3|2|2|1|2|256|130|130|126|87| | | |
|
||||
|3|3|2|2|1|2|384|130|130|126|87| | | |
|
||||
|3|3|2|2|2|1|128|130|130|126|87| | | |
|
||||
|3|3|2|2|2|1|256|130|130|126|87| | | |
|
||||
|3|3|2|2|2|1|384|130|130|126|87| | | |
|
||||
|3|3|2|2|2|2|128|130|130|126|87| | | |
|
||||
|3|3|2|2|2|2|256|130|130|126|87| | | |
|
||||
|3|3|2|2|2|2|384|130|130|126|87| | | |
|
||||
|3|3|3|3|1|1|128|130|128|118|87| | | |
|
||||
|3|3|3|3|1|1|256|130|128|118|87| | | |
|
||||
|3|3|3|3|1|1|384|130|128|118|87| | | |
|
||||
|3|3|3|3|1|2|128|130|128|118|87| | | |
|
||||
|3|3|3|3|1|2|256|130|128|118|87| | | |
|
||||
|3|3|3|3|1|2|384|130|128|118|87| | | |
|
||||
|3|3|3|3|1|3|128|130|128|118|87| | | |
|
||||
|3|3|3|3|1|3|256|130|128|118|87| | | |
|
||||
|3|3|3|3|1|3|384|130|128|118|87| | | |
|
||||
|3|3|3|3|2|1|128|130|128|118|87| | | |
|
||||
|3|3|3|3|2|1|256|130|128|118|87| | | |
|
||||
|3|3|3|3|2|1|384|130|128|118|87| | | |
|
||||
|3|3|3|3|2|2|128|130|128|118|87| | | |
|
||||
|3|3|3|3|2|2|256|130|128|118|87| | | |
|
||||
|3|3|3|3|2|2|384|130|128|118|87| | | |
|
||||
|3|3|3|3|2|3|128|130|128|118|87| | | |
|
||||
|3|3|3|3|2|3|256|130|128|118|87| | | |
|
||||
|3|3|3|3|2|3|384|130|128|118|87| | | |
|
||||
|3|3|3|3|3|1|128|130|128|118|87| | | |
|
||||
|3|3|3|3|3|1|256|130|128|118|87| | | |
|
||||
|3|3|3|3|3|1|384|130|128|118|87| | | |
|
||||
|3|3|3|3|3|2|128|130|128|118|87| | | |
|
||||
|3|3|3|3|3|2|256|130|128|118|87| | | |
|
||||
|3|3|3|3|3|2|384|130|128|118|87| | | |
|
||||
|3|3|3|3|3|3|128|130|128|118|87| | | |
|
||||
|3|3|3|3|3|3|256|130|128|118|87| | | |
|
||||
|3|3|3|3|3|3|384|130|128|118|87| | | |
|
||||
|4|1|1|1|1|1|128|96|96|96|96|96|64|32|
|
||||
|4|1|1|1|1|1|256|96|96|96|96|96|64|32|
|
||||
|4|1|1|1|1|1|384|96|96|96|96|96|64|32|
|
||||
|5|1|1|1|1|1|128|76|76|76|76|51|25| |
|
||||
|5|1|1|1|1|1|256|76|76|76|76|51|25| |
|
||||
|5|1|1|1|1|1|384|76|76|76|76|51|25| |
|
||||
|6|1|1|1|1|1|128|64|64|64|64|42|21| |
|
||||
|6|1|1|1|1|1|256|64|64|64|64|42|21| |
|
||||
|6|1|1|1|1|1|384|64|64|64|64|42|21| |
|
||||
|7|1|1|1|1|1|128|54|54|54|54|36| | |
|
||||
|7|1|1|1|1|1|256|54|54|54|54|36| | |
|
||||
|7|1|1|1|1|1|384|54|54|54|54|36| | |
**Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters** (Input and Kernel Precision: 1 bytes)
|KH|KW|PH|PW|SH|SW|H|W<br>Ci=8<br>Co=256|W<br>Ci=16<br>Co=256|W<br>Ci=32<br>Co=256|W<br>Ci=64<br>Co=256|W<br>Ci=128<br>Co=256|W<br>Ci=256<br>Co=256|W<br>Ci=384<br>Co=256|
|
||||
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
|
||||
|1|1|1|1|1|1|128|240|240|240|240|240|240|240|
|
||||
|1|1|1|1|1|1|256|240|240|240|240|240|240|170|
|
||||
|1|1|1|1|1|1|384|240|240|240|240|240|170|113|
|
||||
|1|2|1|1|1|1|128|240|240|240|240|240|240|240|
|
||||
|1|2|1|1|1|1|256|240|240|240|240|240|240|170|
|
||||
|1|2|1|1|1|1|384|240|240|240|240|240|170|113|
|
||||
|1|3|1|1|1|1|128|240|240|240|240|240| | |
|
||||
|1|3|1|1|1|1|256|240|240|240|240|240| | |
|
||||
|1|3|1|1|1|1|384|240|240|240|240|240| | |
|
||||
|2|1|1|1|1|1|128|192|192|192|192|192|192|192|
|
||||
|2|1|1|1|1|1|256|192|192|192|192|192|192|170|
|
||||
|2|1|1|1|1|1|384|192|192|192|192|192|170|113|
|
||||
|2|2|1|1|1|1|128|193|193|193|193|193|193|129|
|
||||
|2|2|1|1|1|1|256|193|193|193|193|193|193|129|
|
||||
|2|2|1|1|1|1|384|193|193|193|193|193|170|113|
|
||||
|3|1|1|1|1|1|128|128|128|128|128|128|128|85|
|
||||
|3|1|1|1|1|1|256|128|128|128|128|128|128|85|
|
||||
|3|1|1|1|1|1|384|128|128|128|128|128|128|85|
|
||||
|3|3|1|1|1|1|128|130|130|130|130|87 | | |
|
||||
|3|3|1|1|1|1|256|130|130|130|130|87 | | |
|
||||
|3|3|1|1|1|1|384|130|130|130|130|87 | | |
|
||||
|4|1|1|1|1|1|128|96|96|96|96|96|96|64|
|
||||
|4|1|1|1|1|1|256|96|96|96|96|96|96|64|
|
||||
|4|1|1|1|1|1|384|96|96|96|96|96|96|64|
|
||||
|5|1|1|1|1|1|128|76|76|76|76|76|51|51|
|
||||
|5|1|1|1|1|1|256|76|76|76|76|76|51|51|
|
||||
|5|1|1|1|1|1|384|76|76|76|76|76|51|51|
|
||||
|6|1|1|1|1|1|128|64|64|64|64|64|42|21|
|
||||
|6|1|1|1|1|1|256|64|64|64|64|64|42|21|
|
||||
|6|1|1|1|1|1|384|64|64|64|64|64|42|21|
|
||||
|7|1|1|1|1|1|128|54|54|54|54|54|36|18|
|
||||
|7|1|1|1|1|1|256|54|54|54|54|54|36|18|
|
||||
|7|1|1|1|1|1|384|54|54|54|54|54|36|18|
> **NOTE**: The above limitations only apply to the new hardware 2D convolution operation. When possible, the Intel® GNA plugin graph compiler flattens 2D convolutions so that the second generation Intel® GNA 1D convolution operations (without these limitations) may be used. The plugin will also flatten 2D convolutions regardless of the sizes if GNA 2.0 compilation target is selected (see below).

## Intel® GNA Forward and Backward Compatibility

In the general case, there is no guarantee that a model compiled for GNA 2.0 will run on GNA 3.0, or vice versa.

However, in most cases, networks compiled for GNA 2.0 will run as expected on GNA 3.0, although the performance may be worse compared to the case when a network is compiled specifically for the latter. The exception is networks with convolutions with the number of filters greater than 8192 (see the <a href="#models-and-layers-limitations">Models and Layers Limitations</a> section).

Networks compiled for GNA 3.0 should run on GNA 2.0 with incompatible layers emulated on CPU.

You can use the `KEY_GNA_EXEC_TARGET` and `KEY_GNA_COMPILE_TARGET` options to check interoperability (see the <a href="#supported-configuration-parameters">Supported Configuration Parameters</a> section below):
When you run a model using the GNA plugin, it is compiled internally for the specific hardware target. It is possible to export a compiled model using the <a href="#import-export">Import/Export</a> functionality to use it later, but in the general case, there is no guarantee that a model compiled and exported for GNA 2.0 runs on GNA 3.0, or vice versa.

@sphinxdirective

.. tab:: C++

   ``KEY_GNA_EXEC_TARGET``, ``KEY_GNA_COMPILE_TARGET``

.. csv-table:: Interoperability of compile target and hardware target
   :header: "Hardware", "Compile target 2.0", "Compile target 3.0"

.. tab:: Python

   ``GNA_EXEC_TARGET``, ``GNA_COMPILE_TARGET``

   "GNA 2.0", "Supported", "Not supported (incompatible layers emulated on CPU)"
   "GNA 3.0", "Partially supported", "Supported"

@endsphinxdirective

## Drivers and Dependencies

> **NOTE**: In most cases, networks compiled for GNA 2.0 run as expected on GNA 3.0, although the performance may be worse compared to the case when a network is compiled specifically for the latter. The exception is networks with convolutions with the number of filters greater than 8192 (see the <a href="#models-and-operations-limitations">Models and Operations Limitations</a> section).

Intel® GNA hardware requires a driver to be installed on the system.
For optimal work with POT quantized models which include 2D convolutions on GNA 3.0 hardware, the <a href="#support-for-2d-convolutions-using-pot">following requirements</a> should be satisfied.

* Linux\* OS:
[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.4+)](https://storage.openvinotoolkit.org/drivers/gna/)
Choose a compile target depending on the priority: cross-platform execution, performance, memory, or power optimization.

* Windows\* OS:
Intel® GNA driver for Windows is available through Windows Update\*
Use the following properties to check interoperability in your application: `ov::intel_gna::execution_target` and `ov::intel_gna::compile_target`

## <a name="models-and-layers-limitations">Models and Layers Limitations</a>

[Speech C++ Sample](@ref openvino_inference_engine_samples_speech_sample_README) can be used for experiments (see the `-exec_target` and `-compile_target` command line options).

Because of specifics of hardware architecture, Intel® GNA supports a limited set of layers, their kinds and combinations.
For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted for the GNA Plugin, because the plugin does not fully support 2D convolutions.

## Software emulation mode

For the list of supported layers, see the **GNA** column of the **Supported Layers** section in [Supported Devices](Supported_Devices.md).

On platforms without GNA hardware support, the plugin chooses the software emulation mode by default. This means that a model runs even if there is no GNA HW on your platform.
The GNA plugin enables you to switch the execution between software emulation mode and hardware execution mode after the model is loaded.
For details, see the description of the `ov::intel_gna::execution_mode` property.

Limitations include:

## Recovery from Interruption by High-Priority Windows Audio Processes\*

- Only 1D convolutions are natively supported on the HW prior to GNA 3.0; 2D convolutions have specific limitations (see the table above).
- The number of output channels for convolutions must be a multiple of 4.
- The maximum number of filters is 65532 for GNA 2.0 and 8192 for GNA 3.0.
- Transpose layer support is limited to the cases where no data reordering is needed or when reordering is happening for two dimensions, at least one of which is not greater than 8.
- Splits and concatenations are supported for continuous portions of memory (e.g., split of 1,2,3,4 to 1,1,3,4 and 1,1,3,4 or concats of 1,2,3,4 and 1,2,3,5 to 2,2,3,4).
- For Multiply, Add and Subtract layers, auto broadcasting is only supported for constant inputs.

GNA is designed for real-time workloads such as noise reduction.
For such workloads, processing should be time constrained, otherwise extra delays may cause undesired effects such as
*audio glitches*. To make sure that processing can satisfy real-time requirements, the GNA driver provides a Quality of Service
(QoS) mechanism, which interrupts requests that might cause high-priority Windows audio processes to miss
the schedule, thereby causing long running GNA tasks to terminate early.

### Support for 2D Convolutions in Previous Generations of GNA Hardware

To prepare the applications correctly, use the Automatic QoS Feature described below.

The Intel® GNA 1.0 and 2.0 hardware natively supports only 1D convolutions.

### Automatic QoS Feature on Windows*

However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction. GNA Plugin performs such a transformation for Kaldi `nnet1` convolution. From this perspective, the Intel® GNA hardware convolution operation accepts an `NHWC` input and produces an `NHWC` output. Because OpenVINO™ only supports the `NCHW` layout, you may need to insert `Transpose` layers before or after convolutions.

For example, the Kaldi model optimizer inserts such a transpose after convolution for the [rm_cnn4a network](https://storage.openvinotoolkit.org/models_contrib/speech/2021.2/rm_cnn4a_smbr/). This `Transpose` layer is automatically removed by the GNA Plugin, because the Intel® GNA hardware convolution layer already produces the required `NHWC` result.

## Operation Precision

Intel® GNA essentially operates in the low-precision mode, which represents a mix of 8-bit (`I8`), 16-bit (`I16`), and 32-bit (`I32`) integer computations. Outputs calculated using a reduced integer precision are different from the scores calculated using the floating point format, for example, `FP32` outputs calculated on CPU using the OpenVINO [CPU device](CPU.md).

Unlike other plugins supporting low-precision execution, the GNA plugin can calculate quantization factors at the model loading time, so you can run a model without calibration using the [Post-Training Optimization Tool](@ref pot_README).
However, this mode may not provide satisfactory accuracy because the internal quantization algorithm is based on heuristics which may or may not be efficient, depending on the model and dynamic range of input data.

Starting with the 2021.4 release of OpenVINO, GNA plugin users are encouraged to use the [POT API Usage sample for GNA](@ref pot_sample_speech_README) to get a model with quantization hints based on statistics for the provided dataset.

## <a name="execution-modes">Execution Modes</a>

Starting with the 2021.4.1 release of OpenVINO and the 03.00.00.1363 version of the Windows* GNA driver, a new execution mode, `ov::intel_gna::ExecutionMode::HW_WITH_SW_FBACK`, is introduced
to assure that workloads satisfy real-time execution. In this mode, the GNA driver automatically falls back on CPU for a particular infer request
if the HW queue is not empty, so there is no need for explicitly switching between GNA and CPU.

@sphinxdirective

.. tab:: C++

   .. csv-table:: Execution modes
      :header: "Mode", "Description"

      ``KEY_GNA_AUTO``, "Uses Intel® GNA if available, otherwise uses software execution mode on CPU."
      ``KEY_GNA_HW``, "Uses Intel® GNA if available, otherwise raises an error."
      ``KEY_GNA_SW``, "*Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode."
      ``KEY_GNA_SW_EXACT``, "Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode."
      ``KEY_GNA_HW_WITH_SW_FBACK``, "Uses Intel® GNA if available, otherwise raises an error. If the hardware queue is not empty, automatically falls back to CPU in the bit-exact mode."
      ``KEY_GNA_SW_FP32``, "Executes the GNA-compiled graph on CPU but substitutes parameters and calculations from low precision to floating point (``FP32``)."

   .. doxygensnippet:: docs/snippets/gna/configure.cpp
      :language: cpp
      :fragment: [include]

   .. doxygensnippet:: docs/snippets/gna/configure.cpp
      :language: cpp
      :fragment: [ov_gna_exec_mode_hw_with_sw_fback]

.. tab:: Python

   .. csv-table:: Execution modes
      :header: "Mode", "Description"

      ``GNA_AUTO``, "Uses Intel® GNA if available, otherwise uses software execution mode on CPU."
      ``GNA_HW``, "Uses Intel® GNA if available, otherwise raises an error."
      ``GNA_SW``, "*Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode."
      ``GNA_SW_EXACT``, "Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode."
      ``GNA_HW_WITH_SW_FBACK``, "Uses Intel® GNA if available, otherwise raises an error. If the hardware queue is not empty, automatically falls back to CPU in the bit-exact mode."
      ``GNA_SW_FP32``, "Executes the GNA-compiled graph on CPU but substitutes parameters and calculations from low precision to floating point (``FP32``)."

   .. doxygensnippet:: docs/snippets/gna/configure.py
      :language: python
      :fragment: [import]

   .. doxygensnippet:: docs/snippets/gna/configure.py
      :language: python
      :fragment: [ov_gna_exec_mode_hw_with_sw_fback]

@endsphinxdirective
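The referenced snippet files are not reproduced here; as a rough, hedged illustration of the same idea, the following standalone C++ sketch (assuming the OpenVINO 2.0 C++ API, the `openvino/runtime/intel_gna/properties.hpp` header, and an illustrative `model.xml`) compiles a model for GNA with the CPU fallback mode enabled:

```cpp
// Hedged sketch (not from the original doc): compile a model for GNA with the
// HW_WITH_SW_FBACK execution mode. The model file name is illustrative.
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gna/properties.hpp>

int main() {
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");
    // Use GNA hardware when available; fall back to bit-exact CPU emulation
    // for a request if the hardware queue is busy.
    ov::CompiledModel compiled = core.compile_model(
        model, "GNA",
        ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::HW_WITH_SW_FBACK));
    ov::InferRequest request = compiled.create_infer_request();
    return 0;
}
```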
## <a name="supported-configuration-parameters">Supported Configuration Parameters</a>

> **NOTE**: Due to the "first come - first served" nature of the GNA driver and the QoS feature, this mode may lead to increased CPU consumption
if there are several clients using GNA simultaneously.
Even a lightweight competing infer request which has not been cleared at the time when the user's GNA client process makes its request,
can cause the user's request to be executed on CPU, thereby unnecessarily increasing CPU utilization and power.

The plugin supports the configuration parameters listed below. The parameter names correspond to their usage through API keys, such as ``GNAConfigParams::KEY_GNA_DEVICE_MODE`` or ``PluginConfigParams::KEY_PERF_COUNT`` in C++ and ``GNA_DEVICE_MODE`` or ``PERF_COUNT`` in Python.
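As a hedged illustration of how these string keys are passed through the legacy Inference Engine API (the model file name and the chosen mode are illustrative; see the tables further below for the full list of keys):

```cpp
// Hedged sketch of the legacy Inference Engine C++ API usage described above.
#include <map>
#include <string>
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core ie;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork("model.xml");
    // Configuration keys and values are passed as plain strings.
    std::map<std::string, std::string> config = {{"GNA_DEVICE_MODE", "GNA_SW_EXACT"}};
    InferenceEngine::ExecutableNetwork executable = ie.LoadNetwork(network, "GNA", config);
    return 0;
}
```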

## Supported inference data types

Intel® GNA essentially operates in the low-precision mode which represents a mix of 8-bit (`i8`), 16-bit (`i16`), and 32-bit (`i32`) integer computations.

GNA plugin users are encouraged to use the [Post-Training Optimization Tool](@ref pot_README) to get a model with quantization hints based on statistics for the provided dataset.

Unlike other plugins supporting low-precision execution, the GNA plugin can calculate quantization factors at the model loading time, so you can run a model without calibration. However, this mode may not provide satisfactory accuracy because the internal quantization algorithm is based on heuristics whose efficiency depends on the model and dynamic range of input data. This mode is going to be deprecated soon.

GNA plugin supports the following data types as inference precision of internal primitives:
* Quantized data types:
  - i16
  - i8

[Hello Query Device C++ Sample](@ref openvino_inference_engine_samples_hello_query_device_README) can be used to print out supported data types for all detected devices.

[POT API Usage sample for GNA](@ref pot_sample_speech_README) demonstrates how a model can be quantized for GNA using POT API in two modes:
* Accuracy (i16 weights)
* Performance (i8 weights)

For a POT quantized model, the `ov::hint::inference_precision` property has no effect except in the cases described in <a href="#support-for-2d-convolutions-using-pot">Support for 2D Convolutions using POT</a>.

## Supported features

### Models caching

Cache for the GNA plugin may be enabled via the common OpenVINO `ov::cache_dir` property, thanks to the import/export functionality support (see below).

See the [Model caching overview page](@ref openvino_docs_IE_DG_Model_caching_overview) for more details.

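A minimal hedged sketch of enabling the cache (the directory name is illustrative):

```cpp
// Hedged sketch: enable model caching for GNA via ov::cache_dir.
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    core.set_property(ov::cache_dir("gna_cache"));
    auto model = core.read_model("model.xml");
    // The first call compiles and caches the blob; later calls import it from the cache.
    auto compiled = core.compile_model(model, "GNA");
    return 0;
}
```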
### Import/Export

The GNA plugin supports the import/export capability, which helps to significantly decrease first inference time. The model compile target is the same as the execution target by default. The default value for the execution target corresponds to the available hardware, or to the latest hardware version supported by the plugin (i.e., GNA 3.0) if there is no GNA HW in the system.

If you are willing to export a model for a specific version of GNA HW, use the `ov::intel_gna::compile_target` property and then export the model:

@sphinxdirective
.. tab:: C++

   .. csv-table:: Supported configuration parameters
      :header: "Parameter Name", "Values", "Default Value", "Description"

      ``KEY_GNA_EXEC_TARGET``, "``TARGET_2_0``, ``TARGET_3_0``", *see below*, Defines the execution target.
      ``KEY_GNA_COMPILE_TARGET``, "``TARGET_2_0``, ``TARGET_3_0``", *see below*, Defines the compilation target.
      ``KEY_GNA_COMPACT_MODE``, "``YES``, ``NO``", ``NO``, Enables I/O buffers reuse to save space. Makes debugging harder.
      ``KEY_GNA_SCALE_FACTOR``, FP32 number, 1.0, Sets the scale factor to use for input quantization.
      ``KEY_GNA_DEVICE_MODE``, "``GNA_AUTO``, ``GNA_HW``, ``GNA_HW_WITH_SW_FBACK``, ``GNA_SW_EXACT``, ``GNA_SW_FP32``", ``GNA_AUTO``, One of the modes described in `Execution Modes <#execution-modes>`_.
      ``KEY_GNA_FIRMWARE_MODEL_IMAGE``, ``std::string``, ``""``, Sets the name for the embedded model binary dump file.
      ``KEY_GNA_PRECISION``, "``I16``, ``I8``", ``I16``, Sets the preferred integer weight resolution for quantization (ignored for models produced using POT).
      ``KEY_PERF_COUNT``, "``YES``, ``NO``", ``NO``, Turns on performance counters reporting.

   The parameters are passed as ``std::map<std::string, std::string>`` on ``InferenceEngine::Core::LoadNetwork`` or ``InferenceEngine::SetConfig``.

   Normally, you do not need to select the execution target (``KEY_GNA_EXEC_TARGET``) and compilation target (``KEY_GNA_COMPILE_TARGET``). The default value for the execution target corresponds to available hardware, or latest hardware version supported by the plugin (i.e., GNA 3.0) if there is no GNA HW in the system. The compilation target is the same as the execution target by default. However, you may want to change the targets, for example, if you want to check how a model compiled for one generation would behave on the other generation (using the software emulation mode), or if you are willing to export a model for a specific version of GNA HW.

   You can change the ``KEY_GNA_DEVICE_MODE`` parameter at run time using ``InferenceEngine::ExecutableNetwork::SetConfig``, which works for any value excluding ``GNA_SW_FP32``. This enables you to switch the execution between software emulation mode and hardware execution mode after the model is loaded.

   .. doxygensnippet:: docs/snippets/gna/import_export.cpp
      :language: cpp
      :fragment: [ov_gna_export]

.. tab:: Python

   .. csv-table:: Supported configuration parameters
      :header: "Parameter Name", "Values", "Default Value", "Description"

      ``GNA_EXEC_TARGET``, "``TARGET_2_0``, ``TARGET_3_0``", *see below*, Defines the execution target.
      ``GNA_COMPILE_TARGET``, "``TARGET_2_0``, ``TARGET_3_0``", *see below*, Defines the compilation target.
      ``GNA_COMPACT_MODE``, "``YES``, ``NO``", ``NO``, Enables I/O buffers reuse to save space. Makes debugging harder.
      ``GNA_SCALE_FACTOR``, FP32 number, 1.0, Sets the scale factor to use for input quantization.
      ``GNA_DEVICE_MODE``, "``GNA_AUTO``, ``GNA_HW``, ``GNA_HW_WITH_SW_FBACK``, ``GNA_SW_EXACT``, ``GNA_SW_FP32``", ``GNA_AUTO``, One of the modes described in `Execution Modes <#execution-modes>`_.
      ``GNA_FIRMWARE_MODEL_IMAGE``, ``string``, ``""``, Sets the name for the embedded model binary dump file.
      ``GNA_PRECISION``, "``I16``, ``I8``", ``I16``, Sets the preferred integer weight resolution for quantization (ignored for models produced using POT).
      ``PERF_COUNT``, "``YES``, ``NO``", ``NO``, Turns on performance counters reporting.

   The parameters are passed as strings to `IECore.load_network <api/ie_python_api/_autosummary/openvino.inference_engine.IECore.html#openvino.inference_engine.IECore.load_network>`_.

   Normally, you do not need to select the execution target (``GNA_EXEC_TARGET``) and compilation target (``GNA_COMPILE_TARGET``). The default value for the execution target corresponds to available hardware, or latest hardware version supported by the plugin (i.e., GNA 3.0) if there is no GNA HW in the system. The compilation target is the same as the execution target by default. However, you may want to change the targets, for example, if you want to check how a model compiled for one generation would behave on the other generation (using the SW emulation mode), or if you are willing to export a model for a specific version of GNA HW.

   You can change the ``GNA_DEVICE_MODE`` parameter at run time by sending a configuration dict to the `IECore.load_network <api/ie_python_api/_autosummary/openvino.inference_engine.IECore.html#openvino.inference_engine.IECore.load_network>`_ call, which works for any value excluding ``GNA_SW_FP32``. This enables you to switch the execution between software emulation mode and hardware execution mode after the model is loaded.

   .. doxygensnippet:: docs/snippets/gna/import_export.py
      :language: python
      :fragment: [ov_gna_export]

@endsphinxdirective
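In addition to the referenced snippets, a hedged, self-contained C++ sketch of the export/import flow might look as follows (the blob and model file names are illustrative, and the `HWGeneration::GNA_3_0` value is an assumption about the type accepted by `ov::intel_gna::compile_target`):

```cpp
// Hedged sketch: compile for a chosen GNA generation, export the blob, import it later.
#include <fstream>
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gna/properties.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");

    // Compile for GNA 3.0 regardless of the hardware present on this machine.
    auto compiled = core.compile_model(model, "GNA",
        ov::intel_gna::compile_target(ov::intel_gna::HWGeneration::GNA_3_0));

    // Export the compiled blob so the expensive compilation is done only once.
    std::ofstream out_stream("model_gna_3_0.blob", std::ios::binary);
    compiled.export_model(out_stream);
    out_stream.close();

    // Later, or on the target device: import the pre-compiled blob.
    std::ifstream in_stream("model_gna_3_0.blob", std::ios::binary);
    auto imported = core.import_model(in_stream, "GNA");
    return 0;
}
```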
## How to Interpret Performance Counters

With the following methods, you can collect performance counters that provide various performance data about execution on GNA:

Import model:

@sphinxdirective

.. tab:: C++

   ``InferenceEngine::InferRequest::GetPerformanceCounts``

   The returned map stores a counter description as a key, and a counter value in the ``realTime_uSec`` field of the ``InferenceEngineProfileInfo`` structure.

   .. doxygensnippet:: docs/snippets/gna/import_export.cpp
      :language: cpp
      :fragment: [ov_gna_import]

.. tab:: Python

   ``openvino.inference_engine.InferRequest.get_perf_counts``

   .. doxygensnippet:: docs/snippets/gna/import_export.py
      :language: python
      :fragment: [ov_gna_import]

   The returned map stores a counter description as a key, and a counter value in the ``real_time`` field.

@endsphinxdirective

[Compile Tool](@ref openvino_inference_engine_tools_compile_tool_README) or [Speech C++ Sample](@ref openvino_inference_engine_samples_speech_sample_README) can be used to compile a model.

### Stateful models

The GNA plugin natively supports stateful models.

Please refer to [Stateful models](@ref openvino_docs_IE_DG_network_state_intro) for more details about such models.

> **NOTE**: Typically, GNA is used in streaming scenarios, when minimizing the latency is important. Taking into account that POT does not support the `TensorIterator` operation, the recommendation is to use the `--transform` option of the Model Optimizer to apply the `LowLatency2` transformation when converting an original model.

### Profiling

The GNA plugin allows you to turn on profiling using the `ov::enable_profiling` property.
With the following methods, you can collect profiling information that provides various performance data about execution on GNA:

@sphinxdirective

.. tab:: C++

   ``ov::InferRequest::get_profiling_info``

.. tab:: Python

   ``openvino.runtime.InferRequest.get_profiling_info``

@endsphinxdirective
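A hedged C++ sketch of the profiling flow (assuming the OpenVINO 2.0 API; input data handling is omitted and the printed fields follow the `ov::ProfilingInfo` structure):

```cpp
// Hedged sketch: enable profiling for GNA and print per-node execution times.
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    auto compiled = core.compile_model(model, "GNA", ov::enable_profiling(true));
    auto request = compiled.create_infer_request();
    request.infer();  // input tensors are left at their defaults for brevity
    for (const ov::ProfilingInfo& info : request.get_profiling_info()) {
        std::cout << info.node_name << ": " << info.real_time.count() << " us\n";
    }
    return 0;
}
```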
@ -385,109 +188,154 @@ seconds = cycles / frequency
```

Refer to the table below to learn about the frequency of Intel® GNA inside a particular processor:

Processor | Frequency of Intel® GNA
---|---
Intel® Core™ processors| 400MHz
Intel® processors formerly codenamed Elkhart Lake | 200MHz
Intel® processors formerly codenamed Gemini Lake | 200MHz

@sphinxdirective

.. csv-table:: Frequency of Intel® GNA inside a particular processor
   :header: "Processor", "Frequency of Intel® GNA, MHz"

   "Intel® Core™ processors", 400
   "Intel® processors formerly codenamed Elkhart Lake", 200
   "Intel® processors formerly codenamed Gemini Lake", 200

@endsphinxdirective
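For example, applying the `seconds = cycles / frequency` formula above with the 400 MHz figure for Intel® Core™ processors, a counter value of 2,000,000 cycles corresponds to 5 ms. A minimal hedged sketch of the conversion (the cycle count is illustrative):

```cpp
// Hedged sketch of the cycles-to-seconds conversion described above.
#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t cycles = 2000000;  // illustrative value read from the counters
    const double frequency_hz = 400e6;     // 400 MHz GNA clock on Intel® Core™ processors
    std::cout << "GNA time: " << static_cast<double>(cycles) / frequency_hz << " s\n";  // prints 0.005 s
    return 0;
}
```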
Performance counters provided for the time being:

* Scoring request performance results
* Inference request performance results
* Number of total cycles spent on scoring in hardware including compute and memory stall cycles
* Number of stall cycles spent in hardware

## Network Batch Size

## Supported properties

The plugin supports the properties listed below.

Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames in one
input blob using the following methods:

### Read-write properties

The following parameters must be set before model compilation in order to take effect, or be passed as an additional argument to `ov::Core::compile_model()`:

- ov::cache_dir
- ov::enable_profiling
- ov::hint::inference_precision
- ov::hint::num_requests
- ov::intel_gna::compile_target
- ov::intel_gna::firmware_model_image_path
- ov::intel_gna::execution_target
- ov::intel_gna::pwl_design_algorithm
- ov::intel_gna::pwl_max_error_percent
- ov::intel_gna::scale_factors_per_input

These parameters can be changed after model compilation using `ov::CompiledModel::set_property`:
- ov::hint::performance_mode
- ov::intel_gna::execution_mode
- ov::log::level

### Read-only properties
- ov::available_devices
- ov::device::capabilities
- ov::device::full_name
- ov::intel_gna::library_full_version
- ov::optimal_number_of_infer_requests
- ov::range_for_async_infer_requests
- ov::supported_properties

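A hedged C++ sketch of working with properties from these lists (the property values shown are illustrative):

```cpp
// Hedged sketch: set read-write properties at compile time and query read-only ones.
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // Read-write properties take effect when passed to compile_model().
    auto compiled = core.compile_model(model, "GNA",
        ov::enable_profiling(true),
        ov::hint::inference_precision(ov::element::i16));
    // Read-only properties.
    std::cout << core.get_property("GNA", ov::device::full_name) << "\n";
    std::cout << compiled.get_property(ov::optimal_number_of_infer_requests) << "\n";
    return 0;
}
```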
## Limitations

### Models and Operations Limitations

Because of specifics of hardware architecture, Intel® GNA supports a limited set of operations, their kinds and combinations.
For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted for the GNA Plugin, because the plugin does not fully support 2D convolutions.

Limitations include:

- Only 1D convolutions are natively supported on the HW prior to GNA 3.0; 2D convolutions have specific limitations (see the table below).
- The number of output channels for convolutions must be a multiple of 4.
- The maximum number of filters is 65532 for GNA 2.0 and 8192 for GNA 3.0.
- Transpose layer support is limited to the cases where no data reordering is needed or when reordering is happening for two dimensions, at least one of which is not greater than 8.
- Splits and concatenations are supported for continuous portions of memory (e.g., split of 1,2,3,4 to 1,1,3,4 and 1,1,3,4 or concats of 1,2,3,4 and 1,2,3,5 to 2,2,3,4).
- For Multiply, Add and Subtract layers, auto broadcasting is only supported for constant inputs.

#### Support for 2D Convolutions

The Intel® GNA 1.0 and 2.0 hardware natively supports only 1D convolutions. However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction.

Initially, a limited subset of Intel® GNA 3.0 features is added to the previous feature set, including the following:

* **2D VALID Convolution With Small 2D Kernels:** Two-dimensional convolutions with the following kernel dimensions [H,W] are supported: [1,1], [2,2], [3,3], [2,1], [3,1], [4,1], [5,1], [6,1], [7,1], [1,2], or [1,3]. Input tensor dimensions are limited to [1,8,16,16] <= [N,C,H,W] <= [1,120,384,240]. Up to 384 channels C may be used with a subset of kernel sizes (see the table below). Up to 256 kernels (output channels) are supported. Pooling is limited to pool shapes of [1,1], [2,2], or [3,3]. Not all combinations of kernel shape and input tensor shape are supported (see the tables below for exact limitations).

The tables below show that the exact limitation on the input tensor width W depends on the number of input channels C (indicated as Ci below) and the kernel shape. There is much more freedom to choose the input tensor height and the number of output channels.

The following tables provide a more explicit representation of the Intel® GNA 3.0 2D convolution operations initially supported. The limits depend strongly on the number of input tensor channels (Ci) and the input tensor width (W). Other factors are kernel height (KH), kernel width (KW), pool height (PH), pool width (PW), horizontal pool step (SH), and vertical pool step (SW). For example, the first table shows that for a 3x3 kernel with max pooling, only square pools are supported, and W is limited to 87 when there are 64 input channels.

@sphinxdirective

:download:`Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters (Input and Kernel Precision: i16) <../../../docs/OV_Runtime_UG/supported_plugins/files/GNA_Maximum_Input_Tensor_Widths_i16.csv>`

:download:`Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters (Input and Kernel Precision: i8) <../../../docs/OV_Runtime_UG/supported_plugins/files/GNA_Maximum_Input_Tensor_Widths_i8.csv>`

@endsphinxdirective

> **NOTE**: The above limitations only apply to the new hardware 2D convolution operation. When possible, the Intel® GNA plugin graph compiler flattens 2D convolutions so that the second generation Intel® GNA 1D convolution operations (without these limitations) may be used. The plugin will also flatten 2D convolutions regardless of the sizes if the GNA 2.0 compilation target is selected (see below).

#### Support for 2D Convolutions using POT

For POT to successfully work with the models including GNA 3.0 2D convolutions, the following requirements must be met:
* All convolution parameters are natively supported by HW (see tables above)
* The runtime precision is explicitly set by the `ov::hint::inference_precision` property as `i8` for the models produced by the `performance mode` of POT, and as `i16` for the models produced by the `accuracy mode` of POT.

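As a hedged illustration of the second requirement (the model name is illustrative; `i8` corresponds to a model produced by the POT performance mode):

```cpp
// Hedged sketch: explicitly set the runtime precision expected by a POT-quantized
// model that contains GNA 3.0 2D convolutions.
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model_pot_performance.xml");
    auto compiled = core.compile_model(model, "GNA",
        ov::hint::inference_precision(ov::element::i8));
    return 0;
}
```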
### Batch Size Limitation
|
||||
|
||||
Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames.
|
||||
|
||||
Refer to the [Layout API overview](@ref openvino_docs_OV_Runtime_UG_Layout_Overview) to determine the batch dimension.
|
||||
|
||||
To set the layout of model inputs at runtime, use the [Preprocessing API](@ref openvino_docs_OV_Runtime_UG_Preprocessing_Overview):
|
||||
|
||||
@sphinxdirective
|
||||
.. tab:: C++
|
||||
|
||||
|
||||
.. doxygensnippet:: docs/snippets/gna/set_batch.cpp
|
||||
:language: cpp
|
||||
:fragment: [include]
|
||||
|
||||
.. doxygensnippet:: docs/snippets/gna/set_batch.cpp
|
||||
:language: cpp
|
||||
:fragment: [ov_gna_set_nc_layout]
|
||||
|
||||
.. tab:: Python
|
||||
|
||||
|
||||
.. doxygensnippet:: docs/snippets/gna/set_batch.py
|
||||
:language: python
|
||||
:fragment: [import]
|
||||
|
||||
.. doxygensnippet:: docs/snippets/gna/set_batch.py
|
||||
:language: python
|
||||
:fragment: [ov_gna_set_nc_layout]
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
then set batch size:

@sphinxdirective

.. tab:: C++

.. doxygensnippet:: docs/snippets/gna/set_batch.cpp

:language: cpp

:fragment: [ov_gna_set_batch_size]

.. tab:: Python

.. doxygensnippet:: docs/snippets/gna/set_batch.py

:language: python

:fragment: [ov_gna_set_batch_size]

@endsphinxdirective

Increasing batch size only improves the efficiency of `MatMul` layers.

> **NOTE**: For models with `Convolution`, `LSTMCell`, or `ReadValue`/`Assign` operations, the only supported batch size is 1.

### Compatibility with Heterogeneous mode

[Heterogeneous execution](@ref openvino_docs_OV_UG_Hetero_execution) is currently not supported by the GNA plugin.
|
||||
|
||||
## Recovery from Interruption by High-Priority Windows Audio Processes\*
|
||||
|
||||
GNA is designed for real-time workloads such as noise reduction. For such workloads, processing should be time-constrained; otherwise, extra delays may cause undesired effects such as *audio glitches*. To make sure that processing can satisfy real-time requirements, the GNA driver provides a Quality of Service (QoS) mechanism, which interrupts requests that might cause high-priority Windows audio processes to miss their schedule, thereby causing long-running GNA tasks to terminate early.
|
||||
|
||||
Applications should be prepared for this situation.
|
||||
|
||||
If an inference in the `GNA_HW` mode cannot be executed because of such an interruption, then the `wait` method returns the following status code:

@sphinxdirective

.. tab:: C++

``InferRequest::Wait()`` returns status code ``StatusCode::INFER_NOT_STARTED``.

.. tab:: Python

`InferRequest.wait <api/ie_python_api/_autosummary/openvino.inference_engine.InferRequest.html#openvino.inference_engine.InferRequest.wait>`_ returns status code `INFER_NOT_STARTED`.

@endsphinxdirective

In future releases, it will be changed to a more meaningful status code.

Any application working with GNA must properly react to this code. One of the strategies to adapt an application:

1. Immediately switch to the GNA_SW_EXACT emulation mode (a rough OpenVINO™ Runtime 2.0 API equivalent is sketched after this list):
|
||||
@sphinxdirective
|
||||
.. tab:: C++
|
||||
|
||||
|
||||
.. code-block:: cpp
|
||||
|
||||
std::map<std::string, Parameter> newConfig;
|
||||
newConfig[GNAConfigParams::KEY_GNA_DEVICE_MODE] = Parameter("GNA_SW_EXACT");
|
||||
executableNet.SetConfig(newConfig);
|
||||
|
||||
.. tab:: Python
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from openvino.inference_engine import IECore
|
||||
|
||||
ie = IECore()
|
||||
new_cfg = {'GNA_DEVICE_MODE' : 'GNA_SW_EXACT'}
|
||||
net = ie.read_network(model=path_to_model)
|
||||
exec_net = ie.load_network(network=net, device_name="GNA", config=new_cfg)
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
2. Resubmit and switch back to GNA_HW expecting that the competing application has finished.
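For reference, step 1 could look roughly as follows with the OpenVINO™ Runtime 2.0 API. This is only a sketch: it assumes the `ov::intel_gna::execution_mode` property and the `SW_EXACT` value from `openvino/runtime/intel_gna/properties.hpp`, and it applies the mode at compile time rather than to an already loaded network:

```cpp
// Sketch: compiling the model in the SW_EXACT emulation mode with the 2.0 API
// (property and enum value assumed from openvino/runtime/intel_gna/properties.hpp).
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gna/properties.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    auto compiled_model = core.compile_model(model, "GNA",
        ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT));
}
```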
|
||||
|
||||
> **NOTE**: This method is deprecated since a new automatic QoS mode was introduced in the 2021.4.1 release of OpenVINO™ (see below).
|
||||
|
||||
## GNA3 Automatic QoS Feature on Windows*
|
||||
|
||||
Starting with the 2021.4.1 release of OpenVINO and the 03.00.00.1363 version of the Windows* GNA driver, a new execution mode, `GNA_HW_WITH_SW_FBACK`, is introduced to ensure that workloads satisfy real-time execution requirements. In this mode, the GNA driver automatically falls back on CPU for a particular infer request if the hardware queue is not empty, so there is no need to explicitly switch between GNA and CPU.
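A minimal sketch of enabling this mode with the OpenVINO™ Runtime 2.0 API, mirroring `docs/snippets/gna/configure.cpp` (the model path is a placeholder):

```cpp
// Sketch: requesting the HW_WITH_SW_FBACK execution mode so the driver can
// fall back on CPU when the GNA hardware queue is busy.
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gna/properties.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    auto compiled_model = core.compile_model(model, "GNA",
        ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::HW_WITH_SW_FBACK));
}
```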
|
||||
|
||||
> **NOTE**: Due to the "first come - first served" nature of GNA driver and the QoS feature, this mode may lead to increased CPU consumption
|
||||
if there are several clients using GNA simultaneously.
|
||||
Even a lightweight competing infer request which has not been cleared at the time when the user's GNA client process makes its request,
|
||||
can cause the user's request to be executed on CPU, thereby unnecessarily increasing CPU utilization and power.
|
||||
|
||||
|
||||
## See Also
|
||||
|
||||
|
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@ -6,7 +6,7 @@
|
||||
:maxdepth: 1
|
||||
:hidden:
|
||||
:caption: Install OpenVINO
|
||||
|
||||
|
||||
Overview <openvino_docs_install_guides_overview>
|
||||
Install OpenVINO Runtime <openvino_docs_install_guides_install_runtime>
|
||||
Install OpenVINO Development Tools <openvino_docs_install_guides_install_dev_tools>
|
||||
@ -18,23 +18,24 @@
|
||||
:maxdepth: 1
|
||||
:hidden:
|
||||
:caption: Additional Configurations
|
||||
|
||||
|
||||
Configurations for GPU <openvino_docs_install_guides_configurations_for_intel_gpu>
|
||||
Configurations for NCS2 <openvino_docs_install_guides_configurations_for_ncs2>
|
||||
Configurations for VPU <openvino_docs_install_guides_installing_openvino_ivad_vpu>
|
||||
|
||||
Configurations for GNA <openvino_docs_install_guides_configurations_for_intel_gna>
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:hidden:
|
||||
:caption: Troubleshooting
|
||||
|
||||
|
||||
Troubleshooting Guide <openvino_docs_get_started_guide_troubleshooting>
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:hidden:
|
||||
:caption: Get Started Guides
|
||||
|
||||
|
||||
Get Started with Step-by-step Demo <openvino_docs_get_started_get_started_demos>
|
||||
Get Started with Tutorials <tutorials>
|
||||
|
||||
@ -47,33 +48,33 @@
|
||||
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
|
||||
@sphinxdirective
|
||||
.. raw:: html
|
||||
|
||||
|
||||
<link rel="stylesheet" type="text/css" href="_static/css/getstarted_style.css">
|
||||
|
||||
|
||||
<p>To get started with OpenVINO, the first thing to do is to actually install it. You can get an <a href="openvino_docs_install_guides_overview.html" >overview</a> of what installation options we provide and start from there. </p>
|
||||
|
||||
|
||||
<p id="GSG_introtext">If you already have enough information, you can also choose the installation type that best suits your needs from one of the options below:<br />
|
||||
<a href="openvino_docs_install_guides_install_runtime.html" >Install <br />OpenVINO Runtime </a>
|
||||
<a href="openvino_docs_install_guides_install_dev_tools.html" >Install OpenVINO <br />Development Tools</a>
|
||||
<a href="https://github.com/openvinotoolkit/openvino/wiki/BuildingCode" >Build <br /> from source</a>
|
||||
</p>
|
||||
<div style="clear:both;"> </div>
|
||||
|
||||
<div style="clear:both;"> </div>
|
||||
|
||||
<p>If you are using Intel® Processor Graphics, Intel® Vision Accelerator Design with Intel® Movidius™ VPUs or Intel® Neural Compute Stick 2, please check the additional configurations for them accordingly: <a href="openvino_docs_install_guides_configurations_for_intel_gpu.html" >Configurations for GPU</a>, <a href="openvino_docs_install_guides_installing_openvino_ivad_vpu.html" >Configurations for VPU</a> or <a href="openvino_docs_install_guides_configurations_for_ncs2.html" >Configurations for NCS2</a>.
|
||||
</p>
|
||||
|
||||
|
||||
<p>With OpenVINO installed, you are ready to run your first inference and learn the workflow. <br /> Here is a set of hands-on demonstrations of various complexity levels to guide you through the process: from performing sample inference with just one command, to running code samples, demo application or Jupyter notebooks. If you prefer working with GUI, you can also get started with the DL Workbench application. This way you can choose the right level for you.<br /></p>
|
||||
|
||||
|
||||
<h3>Choose how you want to progress:</h3>
|
||||
|
||||
|
||||
<div id="GSG_nextstepchoice">
|
||||
<a href="openvino_docs_get_started_get_started_scripts.html" >
|
||||
<h4>One-command demo </h4>
|
||||
<p>Execute just one command and watch all the steps happening before your eyes. </p>
|
||||
</a>
|
||||
</a>
|
||||
<a href="openvino_docs_get_started_get_started_demos.html" >
|
||||
<h4>Step-by-step demo </h4>
|
||||
<p>Follow the step-by-step instructions to execute simple tasks with OpenVINO. </p>
|
||||
@ -81,15 +82,15 @@
|
||||
<a href="tutorials.html" >
|
||||
<h4>Python Tutorials </h4>
|
||||
<p>Learn from a choice of interactive Python tutorials targeting typical OpenVINO use cases. </p>
|
||||
</a>
|
||||
</a>
|
||||
<a href="workbench_docs_Workbench_DG_Introduction.html" >
|
||||
<h4>DL Workbench </h4>
|
||||
<p>Use a web-based version of OpenVINO with a Graphical User Interface. Installing a DL Workbench container is required. </p>
|
||||
</a>
|
||||
</a>
|
||||
<a href="openvino_docs_IE_DG_Samples_Overview.html" >
|
||||
<h4>OpenVINO samples </h4>
|
||||
<p>See ready-made applications explaining OpenVINO features and various use-cases. </p>
|
||||
</a>
|
||||
</a>
|
||||
<a href="openvino_docs_IE_DG_Samples_Overview.html" >
|
||||
<h4>Reference Implementation For Speech Recognition Apps</h4>
|
||||
<p>Use a speech recognition demo and Kaldi* model conversion tool as reference. </p>
|
||||
@ -97,7 +98,7 @@
|
||||
<a href="http://devcloud.intel.com/edge/" >
|
||||
<h4>Intel® DevCloud </h4>
|
||||
<p>Develop, test, and run your OpenVINO solution for free on a cluster of the latest Intel® hardware. </p>
|
||||
</a>
|
||||
</a>
|
||||
</div>
|
||||
<div style="clear:both;"> </div>
|
||||
|
||||
|
30
docs/install_guides/configurations-for-intel-gna.md
Normal file
30
docs/install_guides/configurations-for-intel-gna.md
Normal file
@ -0,0 +1,30 @@
|
||||
# Configurations for Intel® Gaussian & Neural Accelerator (GNA) with Intel® Distribution of OpenVINO™ toolkit {#openvino_docs_install_guides_configurations_for_intel_gna}
|
||||
|
||||
This page introduces additional configurations for Intel® Gaussian & Neural Accelerator (GNA) with Intel® Distribution of OpenVINO™ toolkit on Linux and Windows.
|
||||
|
||||
> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only.
|
||||
|
||||
### Drivers and Dependencies
|
||||
|
||||
Intel® GNA hardware requires a driver to be installed on the system.
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
.. _gna guide:
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
## Linux
|
||||
|
||||
[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.4+)](https://storage.openvinotoolkit.org/drivers/gna/)
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
.. _gna guide windows:
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
## Windows
|
||||
|
||||
Intel® GNA driver for Windows is available through Windows Update\*
|
||||
|
@ -17,7 +17,7 @@
|
||||
|
||||
Optimized for these processors:
|
||||
|
||||
* 6th to 12th generation Intel® Core™ processors and Intel® Xeon® processors
|
||||
* 6th to 12th generation Intel® Core™ processors and Intel® Xeon® processors
|
||||
* 3rd generation Intel® Xeon® Scalable processor (formerly code named Cooper Lake)
|
||||
* Intel® Xeon® Scalable processor (formerly Skylake and Cascade Lake)
|
||||
* Intel Atom® processor with support for Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1)
|
||||
@ -28,9 +28,9 @@
|
||||
|
||||
.. tab:: Processor Notes
|
||||
|
||||
Processor graphics are not included in all processors.
|
||||
Processor graphics are not included in all processors.
|
||||
See `Product Specifications`_ for information about your processor.
|
||||
|
||||
|
||||
.. _Product Specifications: https://ark.intel.com/
|
||||
|
||||
@endsphinxdirective
|
||||
@ -74,11 +74,11 @@ This guide provides step-by-step instructions on how to install the Intel® Dist
|
||||
<br>You should see the following dialog box open up:
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
|
||||
.. image:: _static/images/openvino-install.png
|
||||
:width: 400px
|
||||
:align: center
|
||||
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
Otherwise, you can add parameters `-a` for additional arguments and `--cli` to run installation in command line (CLI):
|
||||
@ -86,7 +86,7 @@ This guide provides step-by-step instructions on how to install the Intel® Dist
|
||||
./l_openvino_toolkit_p_<version>.sh -a --cli
|
||||
```
|
||||
> **NOTE**: To get additional information on all parameters that can be used, use the help option: `--help`. Among others, you will find the `-s` option, which offers silent mode; together with `--eula approve`, it allows you to run the whole installation with default values without any user interaction.
|
||||
|
||||
|
||||
6. Follow the instructions on your screen. During the installation you will be asked to accept the license agreement. Your acceptance is required to continue. Check the installation process on the image below:<br>
|
||||
|
||||

|
||||
@ -114,7 +114,7 @@ This script enables you to install Linux platform development tools and componen
|
||||
```sh
|
||||
sudo -E ./install_openvino_dependencies.sh
|
||||
```
|
||||
|
||||
|
||||
Once the dependencies are installed, continue to the next section to set your environment variables.
|
||||
|
||||
## <a name="set-the-environment-variables"></a>Step 3: Configure the Environment
|
||||
@ -123,7 +123,7 @@ You must update several environment variables before you can compile and run Ope
|
||||
|
||||
```sh
|
||||
source <INSTALL_DIR>/setupvars.sh
|
||||
```
|
||||
```
|
||||
|
||||
If you have more than one OpenVINO™ version on your machine, you can easily switch its version by sourcing `setupvars.sh` of your choice.
|
||||
|
||||
@ -151,6 +151,10 @@ The environment variables are set. Next, you can download some additional tools.
|
||||
## <a name="optional-steps"></a>Step 5 (Optional): Configure Inference on Non-CPU Devices
|
||||
|
||||
@sphinxdirective
|
||||
.. tab:: GNA
|
||||
|
||||
Only if you want to enable the toolkit components to use Intel® Gaussian & Neural Accelerator (GNA) on your system, follow the steps in :ref:`GNA Setup Guide <gna guide>`.
|
||||
|
||||
.. tab:: GPU
|
||||
|
||||
To enable the toolkit components to use processor graphics (GPU) on your system, follow the steps in :ref:`GPU Setup Guide <gpu guide>`.
|
||||
@ -163,7 +167,7 @@ The environment variables are set. Next, you can download some additional tools.
|
||||
.. tab:: VPU
|
||||
|
||||
To install and configure your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, see the :ref:`VPU Configuration Guide <vpu guide>`.
|
||||
After configuration is done, you are ready to run the verification scripts with the HDDL Plugin for your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
|
||||
After configuration is done, you are ready to run the verification scripts with the HDDL Plugin for your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
|
||||
|
||||
.. warning::
|
||||
While working with either HDDL or NCS, choose one of them as they cannot run simultaneously on the same machine.
|
||||
@ -193,15 +197,15 @@ To uninstall the toolkit, follow the steps on the [Uninstalling page](uninstalli
|
||||
.. dropdown:: Troubleshooting
|
||||
|
||||
PRC developers might encounter pip errors during Intel® Distribution of OpenVINO™ installation. To resolve the issues, try one of the following options:
|
||||
|
||||
* Add the download source using the ``-i`` parameter with the Python ``pip`` command. For example:
|
||||
|
||||
* Add the download source using the ``-i`` parameter with the Python ``pip`` command. For example:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
pip install openvino-dev -i https://mirrors.aliyun.com/pypi/simple/
|
||||
|
||||
Use the ``--trusted-host`` parameter if the URL above is ``http`` instead of ``https``.
|
||||
|
||||
|
||||
* If you run into incompatibility issues between components after installing new Intel® Distribution of OpenVINO™ version, try running ``requirements.txt`` with the following command:
|
||||
|
||||
.. code-block:: sh
|
||||
@ -213,21 +217,21 @@ To uninstall the toolkit, follow the steps on the [Uninstalling page](uninstalli
|
||||
@sphinxdirective
|
||||
|
||||
.. dropdown:: Additional Resources
|
||||
|
||||
|
||||
* Convert models for use with OpenVINO™: :ref:`Model Optimizer Developer Guide <deep learning model optimizer>`
|
||||
* Write your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide <deep learning inference engine>`
|
||||
* Information on sample applications: :ref:`OpenVINO™ Toolkit Samples Overview <code samples>`
|
||||
* Information on a supplied set of models: :ref:`Overview of OpenVINO™ Toolkit Pre-Trained Models <model zoo>`
|
||||
* IoT libraries and code samples in the GitHUB repository: `Intel® IoT Developer Kit`_
|
||||
|
||||
* IoT libraries and code samples in the GitHUB repository: `Intel® IoT Developer Kit`_
|
||||
|
||||
To learn more about converting models from specific frameworks, go to:
|
||||
|
||||
|
||||
* :ref:`Convert Your Caffe Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe>`
|
||||
* :ref:`Convert Your TensorFlow Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow>`
|
||||
* :ref:`Convert Your MXNet Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_MxNet>`
|
||||
* :ref:`Convert Your Kaldi Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi>`
|
||||
* :ref:`Convert Your ONNX Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX>`
|
||||
|
||||
|
||||
.. _Intel® IoT Developer Kit: https://github.com/intel-iot-devkit
|
||||
|
||||
@endsphinxdirective
|
||||
|
@ -13,7 +13,7 @@
|
||||
|
||||
Optimized for these processors:
|
||||
|
||||
* 6th to 12th generation Intel® Core™ processors and Intel® Xeon® processors
|
||||
* 6th to 12th generation Intel® Core™ processors and Intel® Xeon® processors
|
||||
* 3rd generation Intel® Xeon® Scalable processor (formerly code named Cooper Lake)
|
||||
* Intel® Xeon® Scalable processor (formerly Skylake and Cascade Lake)
|
||||
* Intel Atom® processor with support for Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1)
|
||||
@ -21,12 +21,12 @@
|
||||
* Intel® Iris® Xe MAX Graphics
|
||||
* Intel® Neural Compute Stick 2
|
||||
* Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
|
||||
|
||||
|
||||
.. tab:: Processor Notes
|
||||
|
||||
Processor graphics are not included in all processors.
|
||||
Processor graphics are not included in all processors.
|
||||
See `Product Specifications`_ for information about your processor.
|
||||
|
||||
|
||||
.. _Product Specifications: https://ark.intel.com/
|
||||
|
||||
.. tab:: Software
|
||||
@ -34,13 +34,13 @@
|
||||
* `Microsoft Visual Studio 2019 with MSBuild <http://visualstudio.microsoft.com/downloads/>`_
|
||||
* `CMake 3.14 or higher, 64-bit <https://cmake.org/download/>`_
|
||||
* `Python 3.6 - 3.9, 64-bit <https://www.python.org/downloads/windows/>`_
|
||||
|
||||
|
||||
.. note::
|
||||
You can choose to download the Community version. Use the `Microsoft Visual Studio installation guide <https://docs.microsoft.com/en-us/visualstudio/install/install-visual-studio?view=vs-2019>`_ to walk you through the installation. During installation, in the **Workloads** tab, choose **Desktop development with C++**.
|
||||
|
||||
.. note::
|
||||
You can either use `cmake<version>.msi`, which is the installation wizard, or `cmake<version>.zip`, where you have to go into the `bin` folder and then manually add the path to the environment variables.
|
||||
|
||||
|
||||
.. important::
|
||||
As part of this installation, make sure you click the option **Add Python 3.x to PATH** to `add Python <https://docs.python.org/3/using/windows.html#installation-steps>`_ to your `PATH` environment variable.
|
||||
|
||||
@ -53,24 +53,24 @@ This guide provides step-by-step instructions on how to install the Intel® Dist
|
||||
1. <a href="#install-openvino">Install the Intel® Distribution of OpenVINO™ Toolkit</a>
|
||||
2. <a href="#set-the-environment-variables">Configure the Environment</a>
|
||||
3. <a href="#model-optimizer">Download additional components (Optional)</a>
|
||||
4. <a href="#optional-steps">Configure Inference on non-CPU Devices (Optional)</a>
|
||||
4. <a href="#optional-steps">Configure Inference on non-CPU Devices (Optional)</a>
|
||||
5. <a href="#get-started">What's next?</a>
|
||||
|
||||
## <a name="install-openvino"></a>Step 1: Install the Intel® Distribution of OpenVINO™ toolkit Core Components
|
||||
|
||||
1. Download the Intel® Distribution of OpenVINO™ toolkit package file from [Intel® Distribution of OpenVINO™ toolkit for Windows](https://software.intel.com/en-us/openvino-toolkit/choose-download).
|
||||
Select the Intel® Distribution of OpenVINO™ toolkit for Windows package from the dropdown menu.
|
||||
|
||||
|
||||
2. Go to the `Downloads` folder and double-click `w_openvino_toolkit_p_<version>.exe`. In the opened window, you can select the folder where installer files will be placed. The directory will be referred to as <INSTALL_DIR> elsewhere in the documentation. Once the files are extracted, you should see the following dialog box open up:
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
|
||||
.. image:: _static/images/openvino-install.png
|
||||
:width: 400px
|
||||
:align: center
|
||||
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
|
||||
3. Follow the instructions on your screen. During the installation you will be asked to accept the license agreement. Your acceptance is required to continue. Check out the installation process in the image below:<br>
|
||||

|
||||
Click on the image to see the details.
|
||||
@ -111,26 +111,26 @@ The environment variables are set. Next, you can download some additional tools.
|
||||
|
||||
.. note::
|
||||
No prerequisites are needed.
|
||||
|
||||
There are three ways to run the script:
|
||||
|
||||
* GUI: right-click the script and select ``Run with PowerShell``.
|
||||
|
||||
* Command prompt (CMD) console:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
powershell <INSTALL_DIR>\extras\scripts\download_opencv.ps1
|
||||
|
||||
|
||||
* PowerShell console:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
.\<INSTALL_DIR>\scripts\download_opencv.ps1
|
||||
|
||||
|
||||
If the Intel® Distribution of OpenVINO™ is installed to the system location (e.g. ``Program Files (x86)``) then privilege elevation dialog will be shown. The script can be run from CMD/PowerShell Administrator console to avoid this dialog in case of system-wide installation.
|
||||
There are three ways to run the script:
|
||||
|
||||
* GUI: right-click the script and select ``Run with PowerShell``.
|
||||
|
||||
* Command prompt (CMD) console:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
powershell <INSTALL_DIR>\extras\scripts\download_opencv.ps1
|
||||
|
||||
|
||||
* PowerShell console:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
.\<INSTALL_DIR>\scripts\download_opencv.ps1
|
||||
|
||||
|
||||
If the Intel® Distribution of OpenVINO™ is installed to the system location (e.g. ``Program Files (x86)``) then privilege elevation dialog will be shown. The script can be run from CMD/PowerShell Administrator console to avoid this dialog in case of system-wide installation.
|
||||
The script is interactive by default, so during the execution it will wait for the user to press ``Enter``. If you want to avoid this, use the ``-batch`` option, e.g. ``powershell <openvino>\extras\scripts\download_opencv.ps1 -batch``. After the execution of the script, you will find OpenCV extracted to ``<INSTALL_DIR>/extras/opencv``.
|
||||
|
||||
@endsphinxdirective
|
||||
@ -138,6 +138,10 @@ The environment variables are set. Next, you can download some additional tools.
|
||||
## <a name="optional-steps"></a>Step 4 (Optional): Configure Inference on non-CPU Devices
|
||||
|
||||
@sphinxdirective
|
||||
.. tab:: GNA
|
||||
|
||||
Only if you want to enable the toolkit components to use Intel® Gaussian & Neural Accelerator (GNA) on your system, follow the steps in :ref:`GNA Setup Guide <gna guide windows>`.
|
||||
|
||||
.. tab:: GPU
|
||||
|
||||
To enable the toolkit components to use processor graphics (GPU) on your system, follow the steps in :ref:`GPU Setup Guide <gpu guide windows>`.
|
||||
@ -161,7 +165,7 @@ Developing in C++:
|
||||
* [Image Classification Async C++ Sample](@ref openvino_inference_engine_samples_classification_sample_async_README)
|
||||
* [Hello Classification C++ Sample](@ref openvino_inference_engine_samples_hello_classification_README)
|
||||
* [Hello Reshape SSD C++ Sample](@ref openvino_inference_engine_samples_hello_reshape_ssd_README)
|
||||
|
||||
|
||||
## <a name="uninstall"></a>Uninstall the Intel® Distribution of OpenVINO™ Toolkit
|
||||
|
||||
To uninstall the toolkit, follow the steps on the [Uninstalling page](uninstalling-openvino.md).
|
||||
@ -169,21 +173,21 @@ To uninstall the toolkit, follow the steps on the [Uninstalling page](uninstalli
|
||||
@sphinxdirective
|
||||
|
||||
.. dropdown:: Additional Resources
|
||||
|
||||
|
||||
* Convert models for use with OpenVINO™: :ref:`Model Optimizer Developer Guide <deep learning model optimizer>`
|
||||
* Write your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide <deep learning inference engine>`
|
||||
* Information on sample applications: :ref:`OpenVINO™ Toolkit Samples Overview <code samples>`
|
||||
* Information on a supplied set of models: :ref:`Overview of OpenVINO™ Toolkit Pre-Trained Models <model zoo>`
|
||||
* IoT libraries and code samples in the GitHUB repository: `Intel® IoT Developer Kit`_
|
||||
|
||||
* IoT libraries and code samples in the GitHUB repository: `Intel® IoT Developer Kit`_
|
||||
|
||||
To learn more about converting models from specific frameworks, go to:
|
||||
|
||||
|
||||
* :ref:`Convert Your Caffe Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe>`
|
||||
* :ref:`Convert Your TensorFlow Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow>`
|
||||
* :ref:`Convert Your MXNet Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_MxNet>`
|
||||
* :ref:`Convert Your Kaldi Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi>`
|
||||
* :ref:`Convert Your ONNX Model <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX>`
|
||||
|
||||
|
||||
.. _Intel® IoT Developer Kit: https://github.com/intel-iot-devkit
|
||||
|
||||
@endsphinxdirective
|
||||
|
17
docs/snippets/gna/configure.cpp
Normal file
17
docs/snippets/gna/configure.cpp
Normal file
@ -0,0 +1,17 @@
|
||||
// Copyright (C) 2022 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
//! [include]
|
||||
#include <openvino/openvino.hpp>
|
||||
#include <openvino/runtime/intel_gna/properties.hpp>
|
||||
//! [include]
|
||||
|
||||
int main() {
|
||||
const std::string model_path = "model.xml";
|
||||
//! [ov_gna_exec_mode_hw_with_sw_fback]
|
||||
ov::Core core;
|
||||
auto model = core.read_model(model_path);
|
||||
auto compiled_model = core.compile_model(model, "GNA",
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::HW_WITH_SW_FBACK));
|
||||
//! [ov_gna_exec_mode_hw_with_sw_fback]
|
||||
}
|
15
docs/snippets/gna/configure.py
Normal file
15
docs/snippets/gna/configure.py
Normal file
@ -0,0 +1,15 @@
|
||||
# Copyright (C) 2022 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
#! [import]
|
||||
from openvino.runtime import Core
|
||||
#! [import]
|
||||
|
||||
model_path = "model.xml"
|
||||
|
||||
#! [ov_gna_exec_mode_hw_with_sw_fback]
|
||||
core = Core()
|
||||
model = core.read_model(model=model_path)
|
||||
compiled_model = core.compile_model(model, device_name="GNA",
|
||||
config={ 'GNA_DEVICE_MODE' : 'GNA_HW_WITH_SW_FBACK'})
|
||||
#! [ov_gna_exec_mode_hw_with_sw_fback]
|
29
docs/snippets/gna/import_export.cpp
Normal file
29
docs/snippets/gna/import_export.cpp
Normal file
@ -0,0 +1,29 @@
|
||||
// Copyright (C) 2022 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
//! [include]
|
||||
#include <fstream>
|
||||
#include <openvino/openvino.hpp>
|
||||
//! [include]
|
||||
|
||||
int main() {
|
||||
const std::string model_path = "model.xml";
|
||||
const std::string blob_path = "compiled_model.blob";
|
||||
|
||||
ov::Core core;
|
||||
auto model = core.read_model(model_path);
|
||||
auto compiled_model = core.compile_model(model, "GNA");
|
||||
|
||||
{
|
||||
//! [ov_gna_export]
|
||||
std::ofstream ofs(blob_path, std::ios_base::binary | std::ios::out);
|
||||
compiled_model.export_model(ofs);
|
||||
//! [ov_gna_export]
|
||||
}
|
||||
{
|
||||
//! [ov_gna_import]
|
||||
std::ifstream ifs(blob_path, std::ios_base::binary | std::ios_base::in);
|
||||
auto compiled_model = core.import_model(ifs, "GNA");
|
||||
//! [ov_gna_import]
|
||||
}
|
||||
}
|
26
docs/snippets/gna/import_export.py
Normal file
26
docs/snippets/gna/import_export.py
Normal file
@ -0,0 +1,26 @@
|
||||
# Copyright (C) 2022 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
#! [import]
|
||||
from openvino.runtime import Core
|
||||
from io import BytesIO
|
||||
#! [import]
|
||||
|
||||
model_path = "model.xml"
|
||||
blob_path = "compiled_model.blob"
|
||||
|
||||
core = Core()
|
||||
model = core.read_model(model=model_path)
|
||||
compiled_model = core.compile_model(model, device_name="GNA")
|
||||
|
||||
#! [ov_gna_export]
|
||||
user_stream = compiled_model.export_model()
|
||||
with open(blob_path, 'wb') as f:
|
||||
f.write(user_stream)
|
||||
#! [ov_gna_export]
|
||||
|
||||
#! [ov_gna_import]
|
||||
with open(blob_path, 'rb') as f:
|
||||
buf = BytesIO(f.read())
|
||||
compiled_model = core.import_model(buf, device_name="GNA")
|
||||
#! [ov_gna_import]
|
29
docs/snippets/gna/set_batch.cpp
Normal file
29
docs/snippets/gna/set_batch.cpp
Normal file
@ -0,0 +1,29 @@
|
||||
// Copyright (C) 2022 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
//! [include]
|
||||
#include <openvino/openvino.hpp>
|
||||
//! [include]
|
||||
|
||||
int main() {
|
||||
const std::string model_path = "model.xml";
|
||||
size_t batch_size = 8;
|
||||
|
||||
//! [ov_gna_read_model]
|
||||
ov::Core core;
|
||||
auto model = core.read_model(model_path);
|
||||
//! [ov_gna_read_model]
|
||||
|
||||
//! [ov_gna_set_nc_layout]
|
||||
ov::preprocess::PrePostProcessor ppp(model);
|
||||
for (const auto& input : model->inputs()) {
|
||||
auto& in = ppp.input(input.get_any_name());
|
||||
in.model().set_layout(ov::Layout("N?"));
|
||||
}
|
||||
model = ppp.build();
|
||||
//! [ov_gna_set_nc_layout]
|
||||
|
||||
//! [ov_gna_set_batch_size]
|
||||
ov::set_batch(model, batch_size);
|
||||
//! [ov_gna_set_batch_size]
|
||||
}
|
27
docs/snippets/gna/set_batch.py
Normal file
27
docs/snippets/gna/set_batch.py
Normal file
@ -0,0 +1,27 @@
|
||||
# Copyright (C) 2022 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
#! [import]
|
||||
from openvino.runtime import Core, set_batch, Layout
|
||||
from openvino.preprocess import PrePostProcessor
|
||||
#! [import]
|
||||
|
||||
model_path = "model.xml"
|
||||
batch_size = 8
|
||||
|
||||
#! [ov_gna_read_model]
|
||||
core = Core()
|
||||
model = core.read_model(model=model_path)
|
||||
#! [ov_gna_read_model]
|
||||
|
||||
#! [ov_gna_set_nc_layout]
|
||||
ppp = PrePostProcessor(model)
|
||||
for i in range(len(model.inputs)):
    input_name = model.input(i).get_any_name()
    ppp.input(input_name).model().set_layout(Layout("N?"))
|
||||
model = ppp.build()
|
||||
#! [ov_gna_set_nc_layout]
|
||||
|
||||
#! [ov_gna_set_batch_size]
|
||||
set_batch(model, batch_size)
|
||||
#! [ov_gna_set_batch_size]
|
Loading…
Reference in New Issue
Block a user