* cli_parser.py fix to accept scalar value for freezing
* update cli help
* fixed unit-tests, clarified help for specifying data type
* typos correction
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)
(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)
* applying/accommodating the API changes after rebase to the master
* replaying modified version of actual batch selection
* early experiments with model mem footprint
* changes from rebasing to the latest master
* experimenting with DG1 on the batch size selection, also collecting the mem footprint
* WIP: moving the auto-batching to the icore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key. still fails hot device swap
* quick-n-dirty batch footprint vs device total mem
* code style
* testing which models perform badly due to kernels and NOT (batched) footprint
* stub pipeline task to communicate the readiness rather than promise/future
* quick-n-dirty timeout impl
* explicit _completionTasks,reverting BA to use the timeout
* inputs outputs copies, works with AUTO and demo now
* accommodate the config per device-id, after rebase to the latest master
* allowing the auto-batching only with tput hint to let more conventional tests pass
* fix the premature timeout restarting via waiting for batch1 requests completion
* moved the batched request starting (along with input copies) to the dedicated thread
* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution
* code style
* increasing the timeout to test the ssd_* models perf (timeout?) issues
* reducing number of output stuff in BA to avoid bloating the logs in experiments
* more aggressive batching for experiments, not limited to 32 and also 4 as a min
* more accurate timeout debugging info
* getting the reqs limitation from the plugin SetConfig as well
* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context
* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4)
* auto-batching functional test (with results check vs ref) and GPU instance for that
* fixed arithmetic on blobs ptrs
* clang
* handling possible batched network failure
* BATCH as the constants device name in test
* ENABLE_BATCH
* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)
* DetectionOutput hetero test for the CPU
* reenabling the Auto-Batching in the AUTO
* auto-batching device enabled in the test
* fixed the DO test
* improve the loading loop logic
* brushed the config keys
* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests
* fix the test after refactoring
* clang
* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now
* auto-batching hetero test (subgraph with DetectionOutput)
* fixed minor changes that were result of experiments with impl
* code-style
* brushing, disabling CPU's HETERO tests until planned activity for 22.2
* removing home-baked MAX_BATCH_SIZE and switching to the official impl by GPU team
* remote blobs tests for the auto-batching (old API)
* brushed names a bit
* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests
* fixed the ieUnitTests with adding CreateContext stub to the MockICore
* clang
* improved remote-blobs tests
* revert the BA back from experiments with AB + device_use_mem
* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE
* remote blobs 2.0 tests, issue with context having the orig device name
* debugging DG1 perf drop (presumably due to non-fitting the device-mem)
* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2
* remote blobs 2.0 tests for different tensor sharing types
* converting assert to throw to accommodate legacy API where the lock() was possible to be called
* revert the timeout back to avoid mixing the studies, fixed the footprint calc
* reverting to estimating the max batch by extrapolating from batch1 size
* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication
* even more graceful batch 1 handling without duplication
* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching
* AutoBatchPlugin -> ov_auto_batch_plugin
* WA for gcc 4.8
* clang
* fix misprint
* fixed errors resulted from recent OV's Variant to Any transition
* skip auto-batching for already-batched networks
* AUTO_BATCH_TIMEOUT and tests
* GPU-specific L3
* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic
* debugging device info
* enabling the config tests for the GPU and fixing the Auto-batching tests to pass
* making the default cache size (when the driver is not recognized) more aggressive, to accommodate recent HW with old drivers
* skip auto-batching for RNNs and alikes (e.g. single CHW input)
* fixed fallback to the batch1 and moved HETERO path under condition to avoid bloating
* brushing
* Auto plugin GetMetric support gpu auto-batch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add test case
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add comments on test
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the vars names, also adding the exception handling
* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and alikes (CVS-74085) to minimize the # of failures
* add try catch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the code changed in the GPU plugin
* Auto-Batch requests tests
* brushed variables a bit (ref)
* cleaned debug output from the ie_core
* cleaned cmake for the Auto-Batch
* removed batchN estimation from batch1
* cleaned from debug printf
* comments, cleanup
* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13
* Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.
* brushing ie_core.cpp
* fix 32bit compilation
* Code review: ENABLE_AUTO_BATCH
* consolidate the auto-batching logic in ie_core.cpp into single ApplyAutoBatching
* renamed/brushed the OPTIMAL_BATCH (now with _SIZE), which now mimics the MAX_BATCH_SIZE wrt MODEL_PTR
* default value for the OPTIMAL_BATCH_SIZE
* clang
* accommodate new func tests location
* fix shuffle of headers after clang + copyrights
* fixed misprint made during code refactoring
* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header
* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching
* switching from the unsafe size() and minimizing time under lock
* code style
* brushed the ApplyAutoBatching
* brushed the metric/config names and descriptions
* completed the core integration tests for the auto-batching
* ExecGraphInfo and check for incorrect cfg
* removed explicit dependencies from cmake file of the plugin
* disabling Auto-Batching thru the tput hint (to preserve current product default), only explicit like BATCH:GPU used in the tests
Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
* Remove some legacy targets
* Replace some targets
* Removed inference_engine_plugin_api dependency
* Minor comment for developer config
* Fixed include paths
* Small fixes for static build
* Try to fix build pyopenvino
* Fixed comments
* Try to fix build
* Include OpenVINODeveloperPackage inside InferenceEngineDeveloperPackageConfig
* Try to fix GAPI tests
* Fix incomprehensible error message during layout conversion when layout rank doesn't match with shape rank
* Stash
* stash
* Memcpy implementation
Added tests
* Revert "Fix incomprehensible error message during layout conversion when layout rank doesn't match with shape rank"
This reverts commit 37064741b2.
* Fix clang-format and remove redundant headers
* Covered "cached" case (+ tested on Myriad)
* Apply review comments
Introduced 'applyBatchedBlob' function which allows overriding 'memcpy' at inference time
* clang-format fix
* Added dynamic shape case
* - Review comments
- Deep copy of parameters/results for caching from cnnNetwork. Deep copy logic is moved to Utils
- Caching Tests: return correct inputs/outputs map after ImportNetwork mock call
* Reworked according to discussion
Also introduced 'SetBlobsImpl' which throws 'Not implemented' exception by default.
Template plugin updates internal '_batched_inputs' map
* Updated according to moved tests
* don't support 'memcpy' for ROI tensors
* Fix caching tests
* Just to retrigger CI
* Correct offset padding (however there is no test update as current implementation will not hit here due to other checks)
* Fix clang-format
* Applied review comments
* Added check that 'get_tensor' throws if set_tensors/set_input_tensors is used
* Fix review comments - part 1
* Fix caching tests - mock implementation becomes more complicated
Cached mock model shall identify its inputs/outputs, otherwise core will assert on SetExeNetworkInfo stage
* More comments fix
* More comments fixes
* More cleanup
* And more style comment
* typo fix
* Try fix caching windows tests
* Blind attempt to fix Ubuntu20 CI
* Renamed ov::Function to ov::Model
* Fixed all for macos
* Fixed build
* Fixed build
* Revert changes in GPU plugin
* Fixed ngraphFunctions
* Fixed all for mac
* Fixed new test
* Fixed if for Windows
* Fixed unit tests and renamed Function in python API
* Fixed code style
* Fixed import
* Fixed conflict
* Fixed merge issues
* Remove fp16 of Convert layer test from skip_tests.config.cpp as it works now
* update repo
* add op reference test of ExperimentalDetectronPriorGridGenerator
* implement actual_comparision_size for compare
* update slt for actual comparison size and add visitor api test
* fixed clang error
* Moved and merged mo/ and extensions/ into openvino/tools/mo
* edited imports
* edited docs to use mo script from entry_point
* edited MO transformations list loading and setup.py
* changed full path -> 'mo' entry point in docs (leftovers)
* corrected package_BOM
* updated resolving --transformations_config in cli_parser.py
* pkgutil-style __init__.py, added summarize_graph into entry points
* updated DOCs for the new --transformations_config
* fix select
* updated install instructions, fixed setup.py for windows and python_version < 3.8
* fixed typo in requirements.txt
* resolved conflicts
* removed creating custom __init__.py from setup.py
* corrected folder with caffe proto
* corrected loading user defined extensions
* fix openvino.tools.mo import in serialize.py
* corrected layer tests for new namespace
* fix in get_testdata.py
* moved model-optimizer into tools/
* renamed import in POT
* corrected mo.yml
* correct CMakeLists.txt for the newest tools/mo
* corrected find_ie_version.py
* docs and openvino-dev setup.py update for the newest tools/mo
* miscellaneous leftovers and fixes
* corrected CI files, pybind11_add_module in CMakeLists.txt and use of tools/mo path instead of tools/model_optimizer
* add_subdirectory pybind11 for tools/mo
* POT path fix
* updated setupvars.sh setupvars.bat
* Revert "updated setupvars.sh setupvars.bat"
This reverts commit c011142340.
* removed model-optimizer env variables from setupvars
* updated CMakeLists.txt to pack MO properly with tests component
* corrected leftover imports, corrected loading requirements for layer tests
* mo doc typo correction
* minor corrections in docs; removed summarize_graph from entry_points
* get_started_windows.md, MonoDepth_how_to.md corrections, mo path corrections
In case of a partially dynamic shape, e.g. {?,3,?,?}, shape inference
for the gather-channels and reverse operations can't infer the final shape as {?,3,?,?}, so it becomes {?,?,?,?}
Added a 'static' version of reverse-channels to preserve the output shape in such cases
This can be changed in the future if the operations become able to calculate the shape in the 'validate' phase