OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Maxim Shevtsov 49b5e5728b
Auto Batching impl (#7883)
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)

(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)

* applying/accommodating the API changes after rebase to the master

* replaying modified version of actual batch selection

* early experiments with model mem footprint

* changes from rebasing to the latest master

* experimenting with DG1 on the batch size selection, also collecting the mem footprint

* WIP: moving the auto-batching to the icore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key. still fails hot device swap

* quick-n-dirty batch footprint vs device total mem

* code style

* testing which models perform badly due to kernels and NOT (batched) footprint

* stub pipeline task to communicate the readiness rather than promise/future

* quick-n-dirty timeout impl

* explicit _completionTasks, reverting BA to use the timeout

* inputs outputs copies, works with AUTO and demo now

* accommodate the config per device-id, after rebase to the latest master

* allowing the auto-batching only with tput hint to let more conventional tests pass

* fix the premature timeout restarting via waiting for batch1 requests completion

* moved the batched request starting (along with input copies) to the dedicated thread

* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution

* code style

* increasing the timeout to test the ssd_* models perf (timeout?) issues

* reducing the amount of output in BA to avoid bloating the logs in experiments

* more aggressive batching for experiments, not limited to 32 and also 4 as a min

* more accurate timeout debugging info

* getting the reqs limitation from the plugin SetConfig as well

* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context

* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4) (see the usage sketch after this log)

* auto-batching functional test (with results check vs ref) and GPU instance for that

* fixed arithmetic on blob ptrs

* clang

* handling possible batched network failure

* BATCH as the constants device name in test

* ENABLE_BATCH

* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)

* DetectionOutput hetero test for the CPU

* reenabling the Auto-Batching in the AUTO

* auto-batching device enabled in the test

* fixed the DO test

* improve the loading loop logic

* brushed the config keys

* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests

* fix the test after refactoring

* clang

* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now

* auto-batching hetero test (subgraph with DetectionOutput)

* fixed minor changes that resulted from experiments with the impl

* code-style

* brushing, disabling CPU's HETERO tests until planned activity for 22.2

* removing home-baked MAX_BATCH_SIZE and switching to the official impl by the GPU team

* remote blobs tests for the auto-batching (old API)

* brushed names a bit

* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests

* fixed the ieUnitTests with adding CreateContext stub to the MockICore

* clang

* improved remote-blobs tests

* reverting the BA back from experiments with AB + device_use_mem

* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE

* remote blobs 2.0 tests, issue with context having the orig device name

* debugging DG1 perf drop (presumably due to non-fitting the device-mem)

* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2

* remote blobs 2.0 tests for different tensor sharing types

* converting assert to throw to accommodate the legacy API where lock() could be called

* revert the timeout back to avoid mixing the studies, fixed the footprint calc

* reverting to estimating the max batch by extrapolating from batch1 size

* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication

* even more graceful batch 1 handling without duplication

* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching

* AutoBatchPlugin -> ov_auto_batch_plugin

* WA for gcc 4.8

* clang

* fix misprint

* fixed errors resulted from recent OV's Variant to Any transition

* skip auto-batching for already-batched networks

* AUTO_BATCH_TIMEOUT and tests

* GPU-specific L3

* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic

* debugging device info

* enabling the config tests for the GPU and fixing the Auto-batching tests to pass

* making the default cache size (when the driver is not recognized) more aggressive, to accommodate recent HW with old drivers

* skip auto-batching for RNNs and the like (e.g. single CHW input)

* fixed the fallback to batch1 and moved the HETERO path under a condition to avoid bloating

* brushing

* Auto plugin GetMetric supports gpu auto-batch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add test case

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add comments on test

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the var names, also adding the exception handling

* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and the like (CVS-74085) to minimize the # of failures

* add try catch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the code changed in the GPU plugin

* Auto-Batch requests tests

* brushed variables a bit (ref)

* cleaned debug output from the ie_core

* cleaned cmake for the Auto-Batch

* removed batchN estimation from batch1

* cleaned out debug printfs

* comments, cleanup

* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13

* Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulting from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.

* brushing ie_core.cpp

* fix 32bit compilation

* Code review: ENABLE_AUTO_BATCH

* consolidate the auto-batching logic in ie_core.cpp into a single ApplyAutoBatching

* renamed/brushed the OPTIMAL_BATCH (now with _SIZE) and mimics the MAX_BATCH_SIZE wrt MODEL_PTR

* default value for the OPTIMAL_BATCH_SIZE

* clang

* accommodate new func tests location

* fix shuffle of headers after clang + copyrights

* fixed misprint made during code refactoring

* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header

* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching

* switching from the unsafe size() and minimizing time under lock

* code style

* brushed the ApplyAutoBatching

* brushed the metric/config names and descriptions

* completed the core integration tests for the auto-batching

* ExecGraphInfo and check for incorrect cfg

* removed explicit dependencies from cmake file of the plugin

* disabling Auto-Batching thru the tput hint (to preserve the current product default), only explicit forms like BATCH:GPU are used in the tests

Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
2021-12-24 12:55:22 +03:00
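
For reference, here is a minimal sketch (not part of the PR itself) of how the Auto-Batching device described in the log above is driven through the 2021-era Inference Engine C++ API. The model path is a placeholder, and the AUTO_BATCH_TIMEOUT value is an assumed milliseconds string; the device and key names follow the commit messages above.

```cpp
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;
    // "model.xml" is a placeholder for an IR model path.
    auto network = core.ReadNetwork("model.xml");

    // Explicit form from the commits: wrap the GPU and request batch size 4.
    auto execNet = core.LoadNetwork(network, "BATCH:GPU(4)");

    // Alternatively, let the plugin pick the batch size and tune how long
    // (in ms) it may wait to collect requests before running a partial batch.
    auto execNetTimed = core.LoadNetwork(
        network, "BATCH:GPU", {{"AUTO_BATCH_TIMEOUT", "100"}});

    auto request = execNet.CreateInferRequest();
    request.Infer();
    return 0;
}
```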

OpenVINO™ Toolkit


This toolkit allows developers to deploy pre-trained deep learning models through a high-level C++ Inference Engine API integrated with application logic.
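
As an illustration of that flow, here is a minimal sketch against the classic Inference Engine API (the model path and device choice are placeholders):

```cpp
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;

    // Read a pre-trained model in IR form (produced by the Model Optimizer).
    auto network = core.ReadNetwork("model.xml");

    // Compile the network for a device and run one synchronous inference.
    auto execNetwork = core.LoadNetwork(network, "CPU");
    auto request = execNetwork.CreateInferRequest();
    request.Infer();

    // Fetch the first output blob by name for further processing.
    auto outputName = network.getOutputsInfo().begin()->first;
    auto output = request.GetBlob(outputName);
    return 0;
}
```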

This open source version includes several components, namely the Model Optimizer, nGraph, and the Inference Engine, as well as CPU, GPU, MYRIAD, multi-device and heterogeneous plugins to accelerate deep learning inference on Intel® CPUs and Intel® Processor Graphics. It supports pre-trained models from the Open Model Zoo, along with 100+ open source and public models in popular formats such as Caffe*, TensorFlow*, MXNet* and ONNX*.

Repository components:

* Inference Engine
* nGraph
* Model Optimizer

License

Deep Learning Deployment Toolkit is licensed under Apache License Version 2.0. By contributing to the project, you agree to the license and copyright terms therein and release your contribution under these terms.

Resources:

Support

Please report questions, issues and suggestions using:

* GitHub* Issues


* Other names and brands may be claimed as the property of others.