* Build with system TBB
* Fixes
* Check whether system TBB is available
* Try to fix ONNX Runtime build with system TBB
* Test
* Fixed compilation of threading.cpp
* Fixed unset of cache dirs
* Limit search paths of TBB
* Try to enable pip packages with custom TBB
* Fix for TBB 2021.2
* Install only needed TBB libraries
* Install TBB from system to pip package
* Reverted usage of TBBROOT
* Fixed oneTBB case
* Try to fix Android
* Escape some paths
* Added samples path
* Fixed TBBBind usage for case of system TBB
* Performance improvement for constant creation
The issue is that 'are_all_data_elements_bitwise_identical()' is called on every Constant construction and potentially scans the whole buffer, which is O(N).
The result is needed only if the client calls 'get_all_data_elements_bitwise_identical'.
Solution:
- Defer the calculation until the first call of 'get_all_data_elements_bitwise_identical'
- Store the calculated value in a mutable class member to reuse it on subsequent calls of 'get_all_data_elements_bitwise_identical'
The test verifies both cases:
a) that constant creation with shared memory data (now O(1)) is significantly faster than creation + bitwise check (O(N))
b) that, once calculated, the value is taken from the cache, which is significantly faster than re-calculation
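A minimal sketch of the deferred check described above. Only the two quoted function names come from the commit message; the class name `LazyConstant`, its members, and the `scan_count()` helper are hypothetical and exist only to make the caching observable. The sketch is not thread-safe and assumes `elem_size > 0`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the Constant class (illustration only).
class LazyConstant {
public:
    // Construction is O(1): the buffer is referenced, never scanned.
    LazyConstant(const char* data, std::size_t size, std::size_t elem_size)
        : m_data(data), m_size(size), m_elem_size(elem_size) {
        assert(elem_size > 0);
    }

    // Runs the O(N) scan on the first call only, then returns the cached value.
    bool get_all_data_elements_bitwise_identical() const {
        if (!m_computed) {
            m_identical = are_all_data_elements_bitwise_identical();
            m_computed = true;
        }
        return m_identical;
    }

    // Hypothetical helper: how many full scans were performed so far.
    std::size_t scan_count() const { return m_scans; }

private:
    // Compares every element with the first one, byte by byte.
    bool are_all_data_elements_bitwise_identical() const {
        ++m_scans;
        for (std::size_t off = m_elem_size; off + m_elem_size <= m_size; off += m_elem_size) {
            if (std::memcmp(m_data, m_data + off, m_elem_size) != 0)
                return false;
        }
        return true;
    }

    const char* m_data;
    std::size_t m_size;
    std::size_t m_elem_size;
    // Mutable so the const getter can fill the cache.
    mutable bool m_computed = false;
    mutable bool m_identical = false;
    mutable std::size_t m_scans = 0;
};
```

A real implementation shared between threads would need synchronization around the cache fill; that is left out here.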
* fix clang-format
* Stash - Linux implementation
* Windows mmap implementation + unicode
* Clang for windows
* removed debug print
* Add handling of empty bin file
* fix windows includes
* Fix python test
* Unit tests
Fix for Constant with size > 4GB
* Fix review comments
* fix .ncc_style target names
it was breaking configure on systems with libclang-12-dev, clang-12,
ninja and cmake 3.17+ (ninja complains about a duplicate
target). with a lower cmake version configure succeeds, but the build exits
immediately with an error. when ninja is replaced with make, the error becomes
a warning (it's still significant: make just skips duplicate rules, i.e.
doesn't check the style of some source files; the rule duplication is a genuine
bug). without libclang-12-dev and clang-12, ENABLE_NCC_STYLE is OFF and
the bug is not triggered
* silence uninitialized warning in core_integration
probably it was always initialized before use, but compiler wasn't made
aware of it
* fix function spelling to unbreak code style checks in benchmark_app
* include <thread> for std::this_thread
existing code was relying on namespace pollution by old libstdc++
* replace is_pod with is_standard_layout && is_trivial
is_pod is deprecated, it breaks build on current gcc
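A minimal sketch of the replacement named above, assuming a C++11-or-later compiler. The trait name `is_pod_like` and the sample structs are hypothetical; the actual call sites in the code base are not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: true exactly when T is both standard-layout and trivial,
// which is the combination std::is_pod used to report.
template <typename T>
struct is_pod_like
    : std::integral_constant<bool,
                             std::is_standard_layout<T>::value && std::is_trivial<T>::value> {};

// Illustrative types only.
struct Plain {
    int a;
    float b;
};

struct WithCtor {
    WithCtor() : a(0) {}  // user-provided constructor: standard-layout, but not trivial
    int a;
};

static_assert(is_pod_like<Plain>::value, "plain aggregate qualifies");
static_assert(!is_pod_like<WithCtor>::value, "user-provided constructor disqualifies");
static_assert(!is_pod_like<std::string>::value, "std::string is not trivial");
```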
Co-authored-by: Serhii Pavlovskyi <spavlovskyi@lohika.com>
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
* Fixed Apple install
* Update path to libs in setupvars.sh
* Fix IE_CPACK_RUNTIME_PATH for Apple
* Fix wheels packaging
Co-authored-by: Alexey Suhov <alexey.suhov@intel.com>
* Upgrade protobuf to 3.19.4
* Update precompiled protoc version
* Update protobuf to v3.18.2
Further updates are pending the release of this fix:
https://github.com/protocolbuffers/protobuf/pull/9437
* Disable warnings for protobuf
* Deprecated Any implicit cast
* Fixed test
* fixed gna build
* Fixed warnings in benchmark_app
* Fixed test build
* ncc exception for PrintTo
* Error message in test
* Error message in test
* fixed build
* Install libGNAconfig.cmake
* Refactor gnaConfig to correctly find from OV package
* remove ENABLE_INTEL_GNA option from CI
* Apply comments and fix CI
* re-trigger CI (demos issue)
* Enable GNA/samples smoke tests
* rename GNA to GNA_EXT_DIR
* re-trigger CI (mxnet cpu test issue)
* Pick azhogov changes to check CI
* try win wa
* fix win build
* re-trigger onnx
* tests
* disable win samples tests
Co-authored-by: Alexander Zhogov <alexander.zhogov@intel.com>
* Further fixes of plugins.xml generation
1) Unregistration is done by name (e.g. CPU), not by file name (ov_cpu_plugin)
2) Unregistered line is searched by name="MULTI" instead of just 'MULTI' to not conflict with MULTI_WORK_MODE_AS_AUTO entry
3) Removed list of all possible plugins from ov_runtime as logic shall not rely on this (not possible to add 3rd party plugins)
* Revert ov_runtime - some CI jobs require plugins.xml even though plugins are not built
Registration - if some entry already exists in XML - don't copy it.
E.g.
- Registration of 'TEMPLATE' is performed
- Registration loops through existing plugins.xml
- If name="TEMPLATE" is found - don't take it to newContent
- If name like "myCustomPlugin" is found - take it
- As result - "myCustomPlugin" will exist after update, but old "TEMPLATE" will be removed
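The registration rule above can be sketched as a line filter. This is an illustration in C++ only, not the actual plugins.xml generation script: the function names, the XML line layout, and the library file names in the usage note are all hypothetical. It matches on the quoted `name="..."` attribute, so an entry such as MULTI_WORK_MODE_AS_AUTO is not mistaken for the MULTI plugin.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: keeps every existing line except the entry whose
// attribute is exactly name="<name>".
std::vector<std::string> drop_entry(const std::vector<std::string>& lines,
                                    const std::string& name) {
    const std::string needle = "name=\"" + name + "\"";
    std::vector<std::string> kept;
    for (const auto& line : lines) {
        if (line.find(needle) == std::string::npos)
            kept.push_back(line);
    }
    return kept;
}

// Hypothetical helper: registration drops the stale entry for `name`,
// keeps everything else (including third-party plugins), then appends the new entry.
std::vector<std::string> register_plugin(const std::vector<std::string>& lines,
                                         const std::string& name,
                                         const std::string& new_entry) {
    std::vector<std::string> result = drop_entry(lines, name);
    result.push_back(new_entry);
    return result;
}
```

For example, registering TEMPLATE over a file that already holds an old TEMPLATE line and a "myCustomPlugin" line yields the "myCustomPlugin" line followed by the new TEMPLATE line.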
* Add missing change
- Added registration (and unregistration) of ov_auto_batch_plugin. Otherwise 'BATCH' plugin will always produce new XML line without removing old one
- Added unregistration of legacy plugin names (<= 2021.4 release). Otherwise old lines like "libHeteroPlugin.so" will not be removed from plugins.xml file
* Call ShutdownProtobufLibrary when unloading the paddle frontend dynamic library to fix protobuf memory leak
* ShutdownProtobufLibrary if the frontend libraries use protobuf
* make shutdown_protobuf a library
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)
(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)
* applying/accommodating the API changes after rebase to the master
* replaying modified version of actual batch selection
* early experiments with model mem footprint
* changes from rebasing to the latest master
* experimenting with DG1 on the batch size selection, also collecting the mem footprint
* WIP: moving the auto-batching to the icore to let MULTI/AUTO support it, ALLOW_AUTO_BATCHING as a conventional config key. still fails hot device swap
* quick-n-dirty batch footprint vs device total mem
* code style
* testing which models perform badly due to kernels and NOT (batched) footprint
* stub pipeline task to communicate the readiness rather than promise/future
* quick-n-dirty timeout impl
* explicit _completionTasks,reverting BA to use the timeout
* inputs outputs copies, works with AUTO and demo now
* accommodate the config per device-id, after rebase to the latest master
* allowing the auto-batching only with tput hint to let more conventional tests pass
* fix the premature timeout restarting via waiting for batch1 requests completion
* moved the batched request starting (along with input copies) to the dedicated thread
* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution
* code style
* increasing the timeout to test the ssd_* models perf (timeout?) issues
* reducing number of output stuff in BA to avoid bloating the logs in experiments
* more aggressive batching for experiments, not limited to 32 and also 4 as a min
* more accurate timeout debugging info
* getting the reqs limitation from the plugin SetConfig as well
* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context
* let benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4)
* auto-batching functional test (with results check vs ref) and GPU instance for that
* fixed arithmetic on blob ptrs
* clang
* handling possible batched network failure
* BATCH as the constants device name in test
* ENABLE_BATCH
* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)
* DetectionOutput hetero test for the CPU
* reenabling the Auto-Batching in the AUTO
* auto-batching device enabled in the test
* fixed the DO test
* improve the loading loop logic
* brushed the config keys
* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests
* fix the test after refactoring
* clang
* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now
* auto-batching hetero test (subgraph with DetectionOutput)
* fixed minor changes that were result of experiments with impl
* code-style
* brushing, disabling CPU's HETERO tests until planned activity for 22.2
* removing home-baked MAX_BATCH_SIZE and switching to the official impl by GPU team
* remote blobs tests for the auto-batching (old API)
* brushed names a bit
* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests
* fixed the ieUnitTests with adding CreateContext stub to the MockICore
* clang
* improved remote-blobs tests
* revert BA back from experiments with AB + device_use_mem
* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE
* remote blobs 2.0 tests, issue with context having the orig device name
* debugging DG1 perf drop (presumably due to non-fitting the device-mem)
* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2
* remote blobs 2.0 tests for different tensor sharing types
* converting assert to throw to accommodate legacy API where the lock() was possible to be called
* revert the timeout back to avoid mixing the studies, fixed the footprint calc
* reverting to estimating the max batch by extrapolating from batch1 size
* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication
* even more graceful batch 1 handling without duplication
* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching
* AutoBatchPlugin -> ov_auto_batch_plugin
* WA for gcc 4.8
* clang
* fix misprint
* fixed errors resulted from recent OV's Variant to Any transition
* skip auto-batching for already-batched networks
* AUTO_BATCH_TIMEOUT and tests
* GPU-specific L3
* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic
* debugging device info
* enabling the config tests for the GPU and fixing the Auto-batching tests to pass
* making the default (when the driver is not recognized) cache size more aggressive, to accommodate recent HW with old drivers
* skip auto-batching for RNNs and the like (e.g. single CHW input)
* fixed fallback to the batch1 and moved HETERO path under condition to avoid bloating
* brushing
* Auto plugin GetMetric support gpu auto-batch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add test case
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add comments on test
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the vars names, also adding the exception handling
* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and the like (CVS-74085) to minimize the # of failures
* add try catch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the code changed in the GPU plugin
* Auto-Batch requests tests
* brushed variables a bit (ref)
* cleaned debug output from the ie_core
* cleaned cmake for the Auto-Batch
* removed batchN estimation from batch1
* cleaned from debug printf
* comments, cleanup
* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13
* Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.
* brushing ie_core.cpp
* fix 32bit compilation
* Code review: ENABLE_AUTO_BATCH
* consolidate the auto-batching logic in ie_core.cpp into single ApplyAutoBatching
* renamed and brushed OPTIMAL_BATCH (now with _SIZE), which mimics MAX_BATCH_SIZE wrt MODEL_PTR
* default value for the OPTIMAL_BATCH_SIZE
* clang
* accomodate new func tests location
* fix shuffle of headers after clang + copyrights
* fixed misprint made during code refactoring
* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header
* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching
* switching from the unsafe size() and minimizing time under lock
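A minimal sketch of why a separate size() check is unsafe and how a combined check-and-pop avoids it. Only the name ThreadSafeQueue appears in this log; the `push`/`try_pop` interface below is an assumption for illustration and may differ from the real dev_api header.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <utility>

// Illustrative queue; the interface is assumed, not taken from the code base.
template <typename T>
class ThreadSafeQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(std::move(value));
    }

    // Emptiness check and pop happen under one lock, so no other thread can
    // drain the queue between the two steps (the race a size()-then-pop
    // pattern is exposed to). The lock covers only this short critical section.
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty())
            return false;
        value = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::queue<T> m_queue;
};
```

Callers then process the popped element outside the lock, which keeps the time spent holding the mutex minimal.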
* code style
* brushed the ApplyAutoBatching
* brushed the metric/config names and descriptions
* completed the core integration tests for the auto-batching
* ExecGraphInfo and check for incorrect cfg
* removed explicit dependencies from cmake file of the plugin
* disabling Auto-Batching thru the tput hint (to preserve current product default), only explicit like BATCH:GPU used in the tests
Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
* Remove some legacy targets
* Replace some targets
* Removed inference_engine_plugin_api dependency
* Minor comment for developer config
* Fixed include paths
* Small fixes for static build
* Try to fix build pyopenvino
* Fixed comments
* Try to fix build
* Include OpenVINODeveloperPackage inside InferenceEngineDeveloperPackageConfig
* Try to fix GAPI tests