* [LPT] INT16, INT32 Quantization support
* [LPT] Support build on platforms with size_t == unsigned int
* [LPT] Test and fix wrong constant
* Fix build for size_t = unsigned int
* [LPT] getDequantization: added precision limitation for dequantization constants
* [TESTS] GetDequantizationTransformation: added test case with unsupported precision
* Allocate internal buffer to usm_device when one of the input tensors is from usm_device.
Allocate output tensors the same way when there is no user that is a CPU impl.
* Move intermediate buffer allocation to primitive_inst
* Allocate to usm_host when the internal buffer allocation gets close to the device memory limit
* Remove internal_buffer_info and replace it with a vector of layouts.
Updated conditions to choose alloc_type w.r.t. its availability.
* Allocate internal buffer within primitive_inst construction
* Fixed the device_mem allocation condition as aligned with the driver team (see the sketch after this list):
- Single allocation should be less than CL_DEVICE_MAX_MEM_ALLOC_SIZE
- Total allocation for a kernel should be less than CL_DEVICE_GLOBAL_MEM_SIZE
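A minimal sketch of the resulting decision (the struct and helper names below are hypothetical, not the actual clDNN/plugin code):
```
#include <cstdint>

// Limits queried from the OpenCL device (struct name is illustrative).
struct device_limits {
    uint64_t max_alloc_size;   // CL_DEVICE_MAX_MEM_ALLOC_SIZE
    uint64_t global_mem_size;  // CL_DEVICE_GLOBAL_MEM_SIZE
};

enum class alloc_type { usm_device, usm_host };

// Pick usm_device only while both limits agreed with the driver team hold;
// otherwise fall back to usm_host (hypothetical helper for illustration).
inline alloc_type choose_alloc_type(const device_limits& limits,
                                    uint64_t buffer_size,
                                    uint64_t total_allocated_for_kernel) {
    const bool fits_single = buffer_size < limits.max_alloc_size;
    const bool fits_total  = total_allocated_for_kernel + buffer_size < limits.global_mem_size;
    return (fits_single && fits_total) ? alloc_type::usm_device : alloc_type::usm_host;
}
```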
* Apply review comment
* [GNA] Add support for DWSC, other fixes and code refactoring.
* [GNA] Change supported layout to NHWC
* [GNA] Detect the bias constant only in the second position, move DWSC verification to the matcher
* Added RandomUniformFusion transformation.
* Extended the transformations to the case with Convert and to the general min/max value case.
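The fusion rewrites the min/max inputs of RandomUniform instead of keeping the arithmetic node behind it; a minimal sketch of the bound rewrite, assuming a positive multiplier (helper names are hypothetical):
```
#include <utility>

// RandomUniform(min, max) * c  ->  RandomUniform(min * c, max * c)   (assuming c > 0)
// RandomUniform(min, max) + c  ->  RandomUniform(min + c, max + c)
inline std::pair<double, double> fused_mul_bounds(double min, double max, double c) {
    return {min * c, max * c};
}
inline std::pair<double, double> fused_add_bounds(double min, double max, double c) {
    return {min + c, max + c};
}
```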
* Made unchanged variables const.
* Apply suggestions from code review
Co-authored-by: Gleb Kazantaev <gleb.nnstu@gmail.com>
* Reformat code, small corrections.
* Added const shape checks.
* Fixed transformation for case of different const ranks.
* Added type checks.
* Apply suggestions from code review
Co-authored-by: Gleb Kazantaev <gleb.nnstu@gmail.com>
* Merged RandomUniformMulFusion and RandomUniformAddFusion into a single transformation.
* Added negative tests.
* Used get_constant_from_source().
* Moved transformation to common fusions.
* Added const refs.
* Update inference-engine/src/transformations/src/transformations/common_optimizations/random_uniform_fusion.cpp
Co-authored-by: Gleb Kazantaev <gleb.nnstu@gmail.com>
* Changed to single class.
* Corrected IRs checks in layer tests.
* Small corrections.
Co-authored-by: Gleb Kazantaev <gleb.nnstu@gmail.com>
* Add MulConvFusion transformation
This transformation is applied to the following graph:
```
+-------+ +----------+
| Input | | Constant |
+-------+ +----------+
| |
------ ------
| |
v v
+----------+ +---------+
| Multiply | | Weights |
+----------+ +---------+
| |
----------- ----------
| |
v v
+----------------+
| Convolution Op |
+----------------+
```
and converts it to:
```
+---------+ +----------+
| Weights | | Constant |
+---------+ +----------+
| |
------ ------
| |
v v
+-------+ +----------+
| Input | | Multiply |
+-------+ +----------+
| |
----------- ----------
| |
v v
+----------------+
| Convolution Op |
+----------------+
```
Since 'Weights' are constants in most cases, the right-hand side gets constant folded,
and the Multiply node is eliminated.
Ticket: 52283
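As a usage sketch, the pass can be run through the ngraph pass manager followed by constant folding so the scaled weights collapse into a single constant (the class name and header path below are assumptions based on this description):
```
#include <memory>

#include <ngraph/function.hpp>
#include <ngraph/pass/constant_folding.hpp>
#include <ngraph/pass/manager.hpp>
#include <transformations/common_optimizations/mul_conv_fusion.hpp>  // assumed header location

void run_mul_conv_fusion(const std::shared_ptr<ngraph::Function>& f) {
    ngraph::pass::Manager manager;
    // Class name assumed from the transformation described above.
    manager.register_pass<ngraph::pass::MultiplyConvolutionFusion>();
    // Fold Weights * Constant into a single constant so the Multiply disappears.
    manager.register_pass<ngraph::pass::ConstantFolding>();
    manager.run_passes(f);
}
```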
* Handle GroupConvolution, ConvolutionBackpropData, GroupConvolutionBackpropData in separate transformations
* Handle dequantization subgraph
* add namespace
* add more ngraph namespace
* address review comments
* fix build issue due to implicit-const-int-float-conversion and remove unused lambda function
* just remove it instead of commenting it out
Co-authored-by: FuhengWu@Oracle <fuheng.wu@oracle.com>
* rebasing the perf-modes-2021.3 to the 2021.4
Caveats:
the (explicit) setting of #streams is not disabled (as it was before for experiments with DLBenchmark), and the logic slightly differs (streamsSet)
(cherry picked from commit 1ae1edc0ed)
* overriding streams (to force the TPUT mode for the DLBenchmark)
(cherry picked from commit 7f506cda31)
* disabling the reduction of #streams to fully mimic the 2021.3 baseline c4df94d42d (before experiments)
(cherry picked from commit 85073dd1dd)
* clang/indentation
(cherry picked from commit 050a4155a9)
* splitting the Transformation into general and CPU-specific parts.
Now, hopefully, this fully mimics the 2021.3 baseline c4df94d42d (before experiments), as the streams-number reduction (as well as the early exit on GRU/LSTM/TensorIterator) is disabled
(cherry picked from commit e98b2c1a67)
* disabling GRU/LSTM/TI + reducing of streams + 5D considered compute-limited only for int8
(cherry picked from commit 32b8d80dee)
* refactored to avoid compute_limited_ratio, reverted the reducing of #streams, removed LSTM from the limitations
(cherry picked from commit f2b972171b)
* isa-based threshold logic
(cherry picked from commit b218457e1a)
* mode->hint
(cherry picked from commit ec20aa8eca)
* optional PERFORMANCE_HINT_NUM_REQUESTS
(cherry picked from commit 5a3883e3f3)
* moving the perfHints to the common OV config class + initial tests (CPU only, as the actual AUTO/MULTI should be accommodated on the master)
(cherry picked from commit 45bafe7d527f466507dea0693aeed51be4ebf776, then fixed)
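A hedged usage sketch with the Inference Engine 1.0 API, assuming the keys are exposed under the string names used in these commits (PERFORMANCE_HINT, PERFORMANCE_HINT_NUM_REQUESTS); the final public constants may differ:
```
#include <map>
#include <string>

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // illustrative model path

    // Let the plugin derive #streams/#requests for a throughput-oriented run;
    // the second key is the optional cap on parallel requests.
    std::map<std::string, std::string> config = {
        {"PERFORMANCE_HINT", "THROUGHPUT"},
        {"PERFORMANCE_HINT_NUM_REQUESTS", "4"},
    };
    auto exec_network = core.LoadNetwork(network, "CPU", config);
    (void)exec_network;
    return 0;
}
```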
* AUTO support for PerfHints
* MULTI support for PerfHints
* Enabling Perf hints for the GPU plugin
* brushing settings output a bit
* disabling "throughput" perf hint being default (until OV 2.0)
* uncommenting the logic which was disabled to force the DLBenchmark to use the throughput mode by default
* removing dead and experimental code, and debug printfs
* clang/code-style
* code-review remarks
* Moved the output of the actual params that the hint produced to the right place
* aligning MULTI's GetConfig behavior to HETERO's, as captured in the presentation (CVS-59960) ratified with the ArchForum
* clang
* benchmark_app brushing
* Update inference-engine/samples/benchmark_app/README.md
* propagating the perf hints through one more scenario in the merged AUTO-MULTI
* fixed misprint
* Python benchmark_app update for perf hints
* addressing reviewers' comments on the Python benchmark_app
* simplifying/brushing logic a bit
* refactor the heuristic into a separate file (to be shared with iGPU soon)
* refactor conversion of modes to the specific GPU config per feedback from Vladimir