openvino

Author	SHA1	Message	Date
Zhang Yi	c1206ef447	[CPU] SoftMax cache (#9480) * [CPUCache]SoftMax cache * [CpuCache]fix bf16 tests * [CPUCache]apply review comments * [CPUCache]fix compilation	2022-01-10 18:46:57 +03:00
Alexandra Sidorova	af105b86f8	[CPU] Fixed Replicate via ov::Model (#9252)	2022-01-10 17:51:33 +03:00
Vladislav Volkov	e52c96389d	[CPU] Bug in jit_convert fixed (#9485)	2021-12-30 18:22:16 +03:00
Edward Shogulin	ec5198094a	[CPU] PriorBox & PriorBoxClustered dynamism enabling (#8597)	2021-12-30 17:43:16 +03:00
Maxim Andronov	8ba94cfb8f	[CPU] Fix memory allocation for non default shape infer path (#9475)	2021-12-30 17:04:33 +03:00
Chen Xu	bea10d6e3c	[CPU] Optimize Broadcast node for case with scalar input (#9358)	2021-12-30 16:47:48 +03:00
Vladislav Volkov	1ee8007764	[CPU] NV12toRGB and NV12toBGR operations for CPU plugin (#8628)	2021-12-29 13:46:02 +03:00
Maxim Andronov	2e433620b7	[CPU] General fixes for dynamic shapes. Part 3 (#9338)	2021-12-29 13:43:35 +03:00
Maksim Kutakov	2870dc7d3f	[CPU] Cache for runtime data (#9192) Caching added for Eltwise and MatMul nodes	2021-12-29 09:19:45 +03:00
Vladislav Volkov	cb9fe0910d	[CPU] Broken support for Layout::ANY in CPU plugin (#9434)	2021-12-29 09:09:56 +03:00
Luwei Zhou	ce753f41dc	[shape_infer]shape inference implement of Select Detectionoutput and Shufflechannels OPs (#8348) * Implement detection_output shape infer * revise and update the code flow * update based on review. * Update based on review * Implement the shuffle_channels Op shape inference. * Fix CI coding style issue. * Implement the select OP shape inference. * Update based on the review comments * Update based on the review comments. * Add pragma once for the shape inference head. * Add new shape_infer test file for detection_output OP. * Ensure the header would only be included once. * Add shuffle_channels OP shape infer test. * Add shape_infer() invocations into shape_inference() API shape_inference() API support Select, ShuffleChannels, DetectionOutput OPs Fix extra pragma, unnecessary friend function declaration. * Update based on the review comments. * Move the shape infer API helpers into new folder. * Applied review comments. * Applied 2nd review comments * Applied review comments * Fix coding style. * Update * Applied review comments. * Fix compiling issue of unused variable. * Fix the CI issue. * Update the coding style * Move test cases into new folder * Applied review comments.	2021-12-29 05:39:50 +03:00
Chenhu Wang	a83bcee4bd	[CPU] NMS optimization (#8312)	2021-12-27 15:51:50 +03:00
Egor Shulman	b454076a56	[CPU] Fixed leftovers for ExperimentalDetectronTopKROIs and klocwork issue (#7885)	2021-12-26 21:25:33 +03:00
Alexandra Sidorova	fa2647f965	[CPU] Added dynamism support for If (#8967)	2021-12-24 19:43:30 +03:00
Alexandra Sidorova	91945ba122	[CPU] Added dynamism support for TensorIterator (#8879)	2021-12-24 15:08:42 +03:00
Maxim Vafin	3f35e2a321	Enable new FP16 and support mixed precision by MO (#8514) * Enable new FP16 format and support mixed precision * Apply review comments * Fix issue with fp64 in FakeQuantWithMinMaxVars.py * Enable decompression converts fusing for CPU plugin * Apply review feedback * Fix code style * Fix issue with np.full and apply review feedback * Apply review feedback * Fix HardSigmoid onnx extractor * Replace np.arrays that were skipped with mo_array * Fix compress_quantized_weights_test.py * Fix import issues * Apply review feedback and fix type of fusing linops in MO * Apply review feedback * Fix types for Mean/Scales and MXNET zeros * Add RandomUniform_8 to ConvertPrecision * Fix merge issue * Fix consts names collision in GPU plugin	2021-12-24 14:00:37 +03:00
Maxim Shevtsov	49b5e5728b	Auto Batching impl (#7883 ) * auto-batching POC squashed (all commits from auto-batch-2021.3 branch) (cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7) * applying/accomodating the API changes after rebase to the master * replaying modified version of actual batch selection * eearly experiments with model mem footprint * changes from rebasing to the latest master * experimenting with DG1 on the batch size selection, also collecting the mem footprint * WIP:moving the auto-batching to the icore to let the MULT/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key. still fials hot device swap * quick-n-dirty batch footpint vs device total mem * code style * testing which models perform badly due to kernels and NOT (batched) footprint * stub pipeline task to comunicate the readiness rather than promise/future * quick-n-dirty timeout impl * explicit _completionTasks,reverting BA to use the timeout * inputs outputs copies, works with AUTO and demo now * accomodate the config per device-id, after rebase to the latest master * allowing the auto-batching only with tput hint to let more conventional tests pass * fix the pre-mature timeout restaring via waiting for batch1 requests completion * moved the bacthed request statring ( along with input copies) to the dedicated thread * [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution * code style * increasing the timeout to test the ssd_* models perf (timeout?) 
issues * reducing number of output stuff in BA to avoid bloating the logs in experiments * more aggressive batching for experiments, not limited to 32 and also 4 as a min * more accurate timeout debugging info * getting the reqs limitation from the plugin SetConfig as well * refactor the reshape logic a bit to accomodate CPU for bathcing, also added remeote context * let the benchamrk_app to consume specific batch values for the auto-batching such as BATCH:GPU(4) * auto-batching functional test (with results check vs ref) and GPU instance for that * fixed arithemtic on blobs ptrs * clang * handling possible batched network failure * BATCH as the constants device name in test * ENABLE_BATCH * func tests for CPU, also DetectionOutput hetero tests (CPU and GPU) * DetectionOutput hetero test for the CPU * reenabling the Auto-Batching in the AUTO * auto-batching device enabled in the test * fixed the DO test * improve the loading loop logic * brushed the config keys * allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests * fix the test after refactoring * clang * moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now * auto-batching hetero test (subgraph with DetectionOutput) * fixed minor changes that were result of experiments with impl * code-style * brushing, disabling CPU's HETERO tests until planned activity for 22.2 * removing home-baked MAX_BATCH_SZIE and swicthing to the official impl by GPU team * remote blobs tests for the auto-batching (old API) * brushed names a bit * CreateContext and LoadNEtwork with context for the Auto-Batching plus remote-blobs tests * fixed the ieUnitTests with adding CreateContext stub to the MockICore * clang * improved remote-blobs tests * revert the back BA from exeprimenents with AB + device_use_mem * conformance tests for BATCH, alos batch size 1 is default for BATCH:DEVICE * remote blobs 2.0 tests, issue with context having the orig device name 
* debugging DG1 perf drop (presumably due to non-fitting the device-mem) * disbaling WA with batch/=2 for excesive mem footptint, leaving only streams 2 * remote blobs 2.0 tests for different tensor sharing types * converting assert to throw to accomodate legacy API where the lock() was possible to be called * revert the timeout back to avoid mixing the studies, fixed the footprint calc * reverting to estimating the max batch by extrapolating from bacth1 size * more conservative footptint etimation (with bacth1), graceful bacth 1 handling without duplication * even graceful batch 1 handling without duplication * WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching * AutoBatchPlugin -> ov_auto_batch_plugin * WA for gcc 4.8 * clang * fix misprint * fixed errors resulted from recent OV's Variant to Any transition * skip auto-batching for already-batched networks * AUTO_BATCH_TIMEOUT and tests * GPU-specific L3 * switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic * debugging device info * enabling the config tests for the GPU and fixing the Auto-batching tests to pass * making the default (when not recognized the driver) cache size more aggressive, to accomodate recent HW with old drivers * skip auto-batching for RNNs and alikes (e.g. 
single CHW input) * fixed fallback to the bacth1 and moved HETERO path under condition to avoid bloating * brushing * Auto plugin GetMetric support gpu auto-batch Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com> * add test case Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com> * add comments on test Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com> * brushing the vars names, alos adding the excpetion handling * disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and alikes (CVS-74085) to minimize the of #failures * add try catch Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com> * brushing the code changed in the GPU plugin * Auto-Batch requests tests * brushed varibles a bit (ref) * cleaned debug output from the ie_core * cleaned cmake for the Auto-Batch * removed batchN estimation from batch1 * cleaned from debug printf * comments, cleanup * WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13 * Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit `e8f1738ac1`. 
* brushing ie_core.cpp * fix 32bit compilation * Code review: ENABLE_AUTO_BATCH * consolidate the auot-batching logic in ie_core.cpp into single ApplyAutoBAtching * renamed brushed the OPTIMAL_BATCH (now with_SIZE) and mimicks the MAX_BATCH_SZIE wrt MODEL_PTR * default value for the OPTIMAL_BATCH_SIZE * clang * accomodate new func tests location * fix shuffle of headers after clang + copyrights * fixed misprint made during code refactoring * moving the common therad-safe containers (like ThreadSafeQueue) to the dedicated dev_api header * switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a conditin to consider Auto-Batching * switching from the unsafe size() and minimizing time under lock * code style * brushed the ApplyAutoBatching * brushed the netric/config names and descriptions * completed the core intergration tests for the auto-batching * ExecGraphInfo and check for incorrect cfg * removed explicit dependencies from cmake file of the plugin * disabling Auto-Batching thru the tput hint (to preserve current product default), only excplicit like BATCH:GPU used in the tests Co-authored-by: Roman Lyamin <roman.lyamin@intel.com> Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>	2021-12-24 12:55:22 +03:00
Alexey Varyzgin	a40b5bf15e	[CPU][INT8][Intel OMZ / Public] Third dimension issue in FuseConvolutionAndZeroPoints (#9385)	2021-12-24 10:13:30 +03:00
Vladislav Volkov	60a11a6348	[CPU] Renamed CPU plugin to ov_intel_cpu_plugin (#9342)	2021-12-23 11:49:25 +03:00

419 Commits