
419 Commits

Author SHA1 Message Date
Zhang Yi
c1206ef447 [CPU] SoftMax cache (#9480)
* [CPUCache]SoftMax cache

* [CpuCache]fix bf16 tests

* [CPUCache]apply review comments

* [CPUCache]fix compilation
2022-01-10 18:46:57 +03:00
Alexandra Sidorova
af105b86f8 [CPU] Fixed Replicate via ov::Model (#9252) 2022-01-10 17:51:33 +03:00
Vladislav Volkov
e52c96389d [CPU] Bug in jit_convert fixed (#9485) 2021-12-30 18:22:16 +03:00
Edward Shogulin
ec5198094a [CPU] PriorBox & PriorBoxClustered dynamism enabling (#8597) 2021-12-30 17:43:16 +03:00
Maxim Andronov
8ba94cfb8f [CPU] Fix memory allocation for non default shape infer path (#9475) 2021-12-30 17:04:33 +03:00
Chen Xu
bea10d6e3c [CPU] Optimize Broadcast node for case with scalar input (#9358) 2021-12-30 16:47:48 +03:00
Vladislav Volkov
1ee8007764 [CPU] NV12toRGB and NV12toBGR operations for CPU plugin (#8628) 2021-12-29 13:46:02 +03:00
Maxim Andronov
2e433620b7 [CPU] General fixes for dynamic shapes. Part 3 (#9338) 2021-12-29 13:43:35 +03:00
Maksim Kutakov
2870dc7d3f [CPU] Cache for runtime data (#9192)
Caching added for Eltwise and MatMul nodes
2021-12-29 09:19:45 +03:00
Vladislav Volkov
cb9fe0910d [CPU] Broken support for Layout::ANY in CPU plugin (#9434) 2021-12-29 09:09:56 +03:00
Luwei Zhou
ce753f41dc [shape_infer] Shape inference implementation of Select, DetectionOutput and ShuffleChannels OPs (#8348)
* Implement detection_output shape infer

* revise and update the code flow

* update based on review.

* Update based on review

* Implement the shuffle_channels Op shape inference.

* Fix CI coding style issue.

* Implement the select OP shape inference.

* Update based on the review comments

* Update based on the review comments.

* Add pragma once for the shape inference header.

* Add new shape_infer test file for detection_output OP.

* Ensure the header would only be included once.

* Add shuffle_channels OP shape infer test.

* Add shape_infer() invocations into shape_inference() API

shape_inference() API supports Select, ShuffleChannels, DetectionOutput OPs
Fix extra pragma, unnecessary friend function declaration.

* Update based on the review comments.

* Move the shape infer API helpers into new folder.

* Applied review comments.

* Applied 2nd review comments

* Applied review comments

* Fix coding style.

* Update

* Applied review comments.

* Fix compiling issue of unused variable.

* Fix the CI issue.

* Update the coding style

* Move test cases into new folder

* Applied review comments.
2021-12-29 05:39:50 +03:00
Chenhu Wang
a83bcee4bd [CPU] NMS optimization (#8312) 2021-12-27 15:51:50 +03:00
Egor Shulman
b454076a56 [CPU] Fixed leftovers for ExperimentalDetectronTopKROIs and klocwork issue (#7885) 2021-12-26 21:25:33 +03:00
Alexandra Sidorova
fa2647f965 [CPU] Added dynamism support for If (#8967) 2021-12-24 19:43:30 +03:00
Alexandra Sidorova
91945ba122 [CPU] Added dynamism support for TensorIterator (#8879) 2021-12-24 15:08:42 +03:00
Maxim Vafin
3f35e2a321 Enable new FP16 and support mixed precision by MO (#8514)
* Enable new FP16 format and support mixed precision

* Apply review comments

* Fix issue with fp64 in FakeQuantWithMinMaxVars.py

* Enable decompression converts fusing for CPU plugin

* Apply review feedback

* Fix code style

* Fix issue with np.full and apply review feedback

* Apply review feedback

* Fix HardSigmoid onnx extractor

* Replace np.arrays that were skipped with mo_array

* Fix compress_quantized_weights_test.py

* Fix import issues

* Apply review feedback and fix type of fusing linops in MO

* Apply review feedback

* Fix types for Mean/Scales and MXNET zeros

* Add RandomUniform_8 to ConvertPrecision

* Fix merge issue

* Fix consts names collision in GPU plugin
2021-12-24 14:00:37 +03:00
Maxim Shevtsov
49b5e5728b Auto Batching impl (#7883)
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)

(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)

* applying/accommodating the API changes after rebase to the master

* replaying modified version of actual batch selection

* early experiments with model mem footprint

* changes from rebasing to the latest master

* experimenting with DG1 on the batch size selection, also collecting the mem footprint

* WIP: moving the auto-batching to the icore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key. still fails hot device swap

* quick-n-dirty batch footprint vs device total mem

* code style

* testing which models perform badly due to kernels and NOT (batched) footprint

* stub pipeline task to communicate the readiness rather than promise/future

* quick-n-dirty timeout impl

* explicit _completionTasks, reverting BA to use the timeout

* inputs outputs copies, works with AUTO and demo now

* accommodate the config per device-id, after rebase to the latest master

* allowing the auto-batching only with tput hint to let more conventional tests pass

* fix the premature timeout restarting via waiting for batch1 requests completion

* moved the batched request starting (along with input copies) to the dedicated thread

* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution

* code style

* increasing the timeout to test the ssd_* models perf (timeout?) issues

* reducing number of output stuff in BA to avoid bloating the logs in experiments

* more aggressive batching for experiments, not limited to 32 and also 4 as a min

* more accurate timeout debugging info

* getting the reqs limitation from the plugin SetConfig as well

* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context

* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4)

* auto-batching functional test (with results check vs ref) and GPU instance for that

* fixed arithmetic on blobs ptrs

* clang

* handling possible batched network failure

* BATCH as the constant device name in test

* ENABLE_BATCH

* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)

* DetectionOutput hetero test for the CPU

* reenabling the Auto-Batching in the AUTO

* auto-batching device enabled in the test

* fixed the DO test

* improve the loading loop logic

* brushed the config keys

* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests

* fix the test after refactoring

* clang

* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now

* auto-batching hetero test (subgraph with DetectionOutput)

* fixed minor changes that were result of experiments with impl

* code-style

* brushing, disabling CPU's HETERO tests until planned activity for 22.2

* removing home-baked MAX_BATCH_SIZE and switching to the official impl by GPU team

* remote blobs tests for the auto-batching (old API)

* brushed names a bit

* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests

* fixed the ieUnitTests with adding CreateContext stub to the MockICore

* clang

* improved remote-blobs tests

* revert the BA back from experiments with AB + device_use_mem

* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE

* remote blobs 2.0 tests, issue with context having the orig device name

* debugging DG1 perf drop (presumably due to non-fitting the device-mem)

* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2

* remote blobs 2.0 tests for different tensor sharing types

* converting assert to throw to accommodate legacy API where the lock() was possible to be called

* revert the timeout back to avoid mixing the studies, fixed the footprint calc

* reverting to estimating the max batch by extrapolating from batch1 size

* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication

* even graceful batch 1 handling without duplication

* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching

* AutoBatchPlugin -> ov_auto_batch_plugin

* WA for gcc 4.8

* clang

* fix misprint

* fixed errors resulted from recent OV's Variant to Any transition

* skip auto-batching for already-batched networks

* AUTO_BATCH_TIMEOUT and tests

* GPU-specific L3

* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic

* debugging device info

* enabling the config tests for the GPU and fixing the Auto-batching tests to pass

* making the default (when the driver is not recognized) cache size more aggressive, to accommodate recent HW with old drivers

* skip auto-batching for RNNs and alikes (e.g. single CHW input)

* fixed fallback to the batch1 and moved HETERO path under condition to avoid bloating

* brushing

* Auto plugin GetMetric support gpu auto-batch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add test case

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add comments on test

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the vars names, also adding the exception handling

* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and alikes (CVS-74085) to minimize the number of failures

* add try catch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the code changed in the GPU plugin

* Auto-Batch requests tests

* brushed variables a bit (ref)

* cleaned debug output from the ie_core

* cleaned cmake for the Auto-Batch

* removed batchN estimation from batch1

* cleaned from debug printf

* comments, cleanup

* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13

* Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.

* brushing ie_core.cpp

* fix 32bit compilation

* Code review: ENABLE_AUTO_BATCH

* consolidate the auto-batching logic in ie_core.cpp into single ApplyAutoBatching

* renamed/brushed the OPTIMAL_BATCH (now with _SIZE) and mimics the MAX_BATCH_SIZE wrt MODEL_PTR

* default value for the OPTIMAL_BATCH_SIZE

* clang

* accommodate new func tests location

* fix shuffle of headers after clang + copyrights

* fixed misprint made during code refactoring

* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header

* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching

* switching from the unsafe size() and minimizing time under lock

* code style

* brushed the ApplyAutoBatching

* brushed the metric/config names and descriptions

* completed the core integration tests for the auto-batching

* ExecGraphInfo and check for incorrect cfg

* removed explicit dependencies from cmake file of the plugin

* disabling Auto-Batching through the tput hint (to preserve current product default), only explicit like BATCH:GPU used in the tests

Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
2021-12-24 12:55:22 +03:00
Alexey Varyzgin
a40b5bf15e [CPU][INT8][Intel OMZ / Public] Third dimension issue in FuseConvolutionAndZeroPoints (#9385) 2021-12-24 10:13:30 +03:00
Vladislav Volkov
60a11a6348 [CPU] Renamed CPU plugin to ov_intel_cpu_plugin (#9342) 2021-12-23 11:49:25 +03:00