* Update doc for AUTO and AUTO_BATCH
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Update per the comments
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Move default hint to THROUGHPUT section
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Fixed newAPI for case if core was removed
* Fixed code style
* Fixed typo
* Use new API by default
* Create core with template plugin
* Added doxygen comment
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* fix references
* update links
* update the wording to be more clear
* add the error message about Visual studio back
* update links to static html links of 2022.2
* change memory access pattern of fsv layout for permute
* Fix permute_ref to process F first only when (bf...) => (b...f)
* Refactor
Co-authored-by: si-eun-kim <sieun.kim@intel.com>
* add auto_batch_timeout for MULTI and AUTO
* fix clang-format for ie_core.cpp
* fix coredump
* simplify insert key to deviceConfig logic and parseDeviceNameIntoConfig() check "AUTO" and "AUTO:" only
* check config auto_batch_timeout
* add CleanUpInIECore()
* fix clang-format for ie_core.cpp
* Fix the deconv fused issue on AVX2 and AVX512 and enable deconv test
* Keep GroupDeconv BF16 test cases still disabled.
* Update to also exclude nightly
* Update onednn submodule.
* Update onednn submodule
* Update onednn submodule.
* Update the ONEDNN submodule
* Update the ONEDNN commit.
* Update with merged onednn commit.
* Define new ppp API for nv12
* Add new ppp API function
* Add new ppp API unit test
* Add hello nv12 input classification ov
* Fix the clang-format issue
* Modify the function called is_supported_image_size
* Update code as suggested
* Add hello_nv12_input_classification e2e test
* clang-format openvinotoolkit
* Fix the doc error in CI
Co-authored-by: River Li <river.li@intel.com>
Some compiler flags restrict the compiler from making arbitrary decisions while handling undefined C/C++ behaviors.
Therefore they can be used to fix some issues caused by undefined behavior.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
* FTZ_and_DAZ_set_for_cpu
* remove DAZ
* fix
* extract to utils
* ie core part changes to add do as property and benchmark_app enable do
* enable brgconv from Luocheng patch
* add debug info
* enable_brgemm_on_avx512
* add python binding
* dlb test
* revert test code
* Handle in-place failure cases in reshape node
* Disable inplace when non-const reshape connected to constant
* Add comment to reshape_inplace test
* move copy WA into execute() to cover more general in-place failure cases
* enable brgconv f32
* use config to enable brgconv f32
* when brg disabled not init bin-postops
* change prop name for extensive
* use more general field
* fix review comments.
* Add FORCE_TBB_TERMINATE to legacy API
* Put this config into proper place
* fix issue in property test
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
* [CPU] Optimize NonZero operation
# Conflicts:
# src/plugins/intel_cpu/src/nodes/non_zero.cpp
* [CPU] Rewrite NonZero implementation, so it will use generic ie_parallel API
* [CPU] NonZero operation: apply an additional optimization
* NonZero operation: add fallback code for inRank >= 6
* NonZero operation: apply review modifications
# Conflicts:
# src/plugins/intel_cpu/src/nodes/non_zero.cpp
* NonZero operation: inShape.getDims().size() -> inRank
* NonZero operation: eliminate input array index calculation by slight modification of ie_parallel API
* Adjust ie_parallel.hpp style for clang-format
* Try to unbreak the build
* Move to parallel_nt and add a cache for nd loops to optimize more
* Add minimal size threshold for threading and reduce warning count
* Try to workaround linter errors
* One more try to unbreak cpplint build
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* Remove vmaxps in store_vector.
This instruction is not needed for dst_prc int8.
And it may lead to wrong results when denormals optimization is on.
* Add vpmaxsd if dst_prc is u8 or u16.
* Enable hint to tput if no property is specified for both AUTO device and target device.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* 1. Update logic.
2. Add test cases.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update. Set hints to default for target device if no hints setting for AUTO plugin and no specific properties setting for target device.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
This extra colon produces output like the example below. The resulting
'::' is equivalent to adding '.' to LD_LIBRARY_PATH. This
breaks the glibc build and very often causes odd issues when commands
are launched from different paths.
...inference_engine/external/tbb/lib::/opt/intel/openvino_2021/...
We also noticed that :${parameter:+:$parameter} is widely used in
this file. Please review the code and fix as needed.
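The empty-entry problem described above can be illustrated with a small sketch. In shell, `${parameter:+:$parameter}` already expands to `:$parameter` when the variable is set and non-empty, so a literal `:` placed in front of it yields the doubled colon. The sketch is in Python for illustration only; `append_path` and `has_empty_entry` are hypothetical helpers, not part of the script this commit refers to.

```python
def append_path(existing: str, new_entry: str, sep: str = ":") -> str:
    """Join two path lists without leaving an empty component.

    An empty component (as in "a::b", or a leading/trailing ":") is treated
    by the dynamic loader as the current directory, which is the problem
    the commit message describes.
    """
    parts = [p for p in existing.split(sep) + new_entry.split(sep) if p]
    return sep.join(parts)


def has_empty_entry(path_list: str, sep: str = ":") -> bool:
    """Return True if a non-empty path list contains an empty component."""
    return path_list != "" and "" in path_list.split(sep)
```

A check such as `has_empty_entry(os.environ.get("LD_LIBRARY_PATH", ""))` could be used to detect the condition after sourcing an environment script.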
* ExperimentalDetectronDetectionOutput: refine sorting criteria for NMS stage
This is to ensure the operation produces stable predictable results across
the possible sorting algorithm implementations.
This property is useful for the operation testing.
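A minimal sketch of why refined sorting criteria give predictable results: when every tie on score is broken by further keys, the ordering is total and no longer depends on whether the sort implementation is stable. The tie-breakers used here (class id, then box index) and the `Candidate` type are assumptions for illustration; the commit does not state which criteria were chosen.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    score: float
    class_id: int
    box_index: int


def sort_candidates(candidates: List[Candidate]) -> List[Candidate]:
    # Highest score first; ties are broken by class id, then box index
    # (assumed tie-breakers), so equal-score candidates always come out
    # in the same order regardless of their input order.
    return sorted(candidates, key=lambda c: (-c.score, c.class_id, c.box_index))
```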
* [GPU] Implement ExperimentalDetectronDetectionOutput operation
* [GPU] ExperimentalDetectronDetectionOutput: use vector types and operations in kernel
* Reformat changed files to make clang format checker happy
* [GPU] ExperimentalDetectronDetectionOutput: add another test case to the unit test
* [GPU] ExperimentalDetectronDetectionOutput: Add f16 test
* ExperimentalDetectronDetectionOutput: single-layer test: use all three outputs
* [GPU] ExperimentalDetectronDetectionOutput: increase single layer test coverage
More attribute permutations were added.
* add testcase for plugin properties should not be revised by compile_model
* rename smoke_cpuCompileModelBehaviorTests to smoke_gpuCompileModelBehaviorTests
* remove property EXCLUSIVE_ASYNC_REQUESTS in ov2.0 test
* add testcase for plugin properties should not be revised by loadNetwork
* 1. Enable IE Core filter to promote the secondary properties to first level for hardware device.
2. Enable IE Core filter to pass the secondary properties to AUTO plugin.
3. Enable AUTO Plugin to parse secondary properties to first level and pass them to corresponding target hardware device.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* 1. Enable MULTI Plugin to support secondary properties.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* 1. Enable HETERO Plugin to support secondary properties.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Catch the EXPECT_CALL with AVAILABLE_DEVICES argument inputting to GetMetric.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Revert the logic of handling secondary properties for MULTI and HETERO device.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Remove the secondary property flattening logic because this logic has been implemented within AUTO plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* 1. update flatten logic when secondary properties is specified.
2. add the test case with secondary properties for CPU.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* add the test case with secondary properties for GPU plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add debug message to fix the test case failure issue.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add more debug info.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
1. For IE Core, 1st level property overrides the 2nd level property.
2. For AUTO plugin, add available device list to check if the secondary properties are valid.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
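The two rules in this commit message can be sketched as follows. This is an illustrative Python model only, assuming that secondary properties are represented as a nested mapping keyed by device name; `flatten_for_device` and the property names in the usage note are hypothetical and do not reflect the actual IE Core or AUTO plugin code.

```python
from typing import Any, Dict, Iterable


def flatten_for_device(config: Dict[str, Any], device: str,
                       available_devices: Iterable[str]) -> Dict[str, Any]:
    """Flatten first- and second-level properties for one target device."""
    devices = set(available_devices)
    # Secondary properties are modelled as nested dicts keyed by device name;
    # reject any that refer to a device that is not available.
    unknown = [k for k, v in config.items()
               if isinstance(v, dict) and k not in devices]
    if unknown:
        raise ValueError(f"secondary properties for unavailable devices: {unknown}")
    # Start from the second-level values for the requested device.
    flat = dict(config.get(device, {}))
    # First-level values are applied last, so they override second-level ones.
    flat.update({k: v for k, v in config.items() if not isinstance(v, dict)})
    return flat
```

For example, with `{"PERFORMANCE_HINT": "LATENCY", "CPU": {"PERFORMANCE_HINT": "THROUGHPUT", "NUM_STREAMS": 4}}`, the flattened CPU configuration keeps `NUM_STREAMS` from the second level and takes `PERFORMANCE_HINT` from the first level.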
* Add CUDA and ARM.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update device name for ARM Plugin and add device name for HPU plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
* 1. Enable OPTIMIZATION_CAPABILITIES for AUTO plugin.
2. Add corresponding test case.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Remove EXPORT_IMPORT as Export is not implemented in the AUTO/MULTI.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* DOCS-structure_workflow
workflow diagram files and formatting
added overview articles on models and deployment
added the ecosystem page and changed the header from addons
* DOCS-structure_dlworkbench
* DOCS-structure_ovtf
* fixed FakeOutputResolver to avoid renaming correctly named nodes
* fixed failed mo_args test: process reverse_input_channels through eltwise with constant with shape=[]
* changed fix to be more accurate to avoid possible issues
* Remove unnecessary iterating over producer outputs
Co-authored-by: sadolini <svetlana.a.dolinina@intel.com>
* Property to force terminate tbb threads
After inference is done, TBB threads cannot close by themselves, which causes memory leaks and lingering threads at unload.
Sometimes the TBB threads need to be terminated to release resources (memory, threads).
This PR contains:
1. Add a new property to control whether to force termination of TBB threads.
2. The property key is "FORCE_TBB_TERMINATE"; the default value is false.
3. Explicitly terminate the TBB task scheduler while unloading the OpenVINO DLL if this property is set to true.
e.g.: core.set_property(device, ov::force_tbb_terminate(true));
4. If FORCE_TBB_TERMINATE is not set, no additional TBB operations are performed.
Change-Id: I32dc0ba122bb19a9dbf3ba12fdd596aad9ac54b4
* Fix executorManager test case
Change executorManager from static to be dynamic, the test case should fit this change.
* Change frontendManager to be non-static instance
Make frontendManager a non-static instance.
We must guarantee it is not released before Model, because Model uses the memory allocated by frontendManager.
So keep a frontendManager reference in ov::Model to make it work.
* Fix race condition between executor and executorManager
* Add test case for tbb property
1. Add basic test case for ov::force_tbb_terminate property
2. set ov::force_tbb_terminate to be false
* Avoid terminate tbb in case of no tbb thread created
* Fix Constant ops segmentation fault issue
There is a segmentation fault during Constant destruction, caused by some shared memory being freed twice.
Test case is:
ie = IECore()
net = ie.read_network(model=test_net_xml, weights=test_net_bin)
query_res = ie.query_network(net, device)
func_net = ng.function_from_cnn(net)
ops_net = func_net.get_ordered_ops()
ie and net will be released before ops_net is destroyed, so Constant will free shared memory that has already been freed
* Make sure constant::m_data is released before frontendManager
* tiny format change
* change tbb blocking_terminate to terminate
Calling TBB blocking_terminate causes segmentation faults when running some particular models.
The likely reason is that blocking_terminate blocks the current thread while waiting for TBB to exit,
but cannot handle some resource dependencies.
After adopting terminate(), the dependencies can be resolved and the segmentation faults no longer occur.
Change-Id: I0b920630a25cd3fd2747c57ec71ca749ba35573b
* Remove unnecessary dependencies
* Disable dynamic lib test case in static library compilation version
As described in CVS-68982, we should disable the test cases that load
dynamic libraries in the OpenVINO static library build.
* Fix nested-namespace-definition issue
* Address reviewer's comments
* Refine ov_partial_shape for OV 2.0 C interface
To avoid potential string security problem, remove string pointer from ov_partial_shape structure.
* Remove redundant code
* fix typo issue
* fix shape test issue
* fix some minor issues
* Address reviewing comments
Use Dimension to represent rank of partial shape.
* Apply safer method to parse partialShape string
1. adopt ov::Dimension::value_type to construct ov::Dimension
2. safer method to convert string to dimension value
3. apply std::vector<std::string> to replace std::vector<char *> during parsing partialShape string
Change-Id: I0e0b70a915fc5c5fefad51de51f167798854f55e
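As an illustration of what validating each token before converting it can look like, here is a minimal Python sketch. The string format (comma-separated dimensions, `?` for a dynamic dimension, `min..max` for a range) and the use of `-1` for an unbounded maximum are assumptions made for this example; the commit does not specify the format accepted by the C API, and `parse_partial_shape` is a hypothetical name.

```python
import re
from typing import List, Tuple

# One dimension: "?" (dynamic), a non-negative integer, or "min..max"
# where either bound may be omitted.
_DIM_RE = re.compile(r"(?:\?|(\d+)|(\d*)\.\.(\d*))")


def parse_partial_shape(text: str) -> List[Tuple[int, int]]:
    """Parse e.g. "1,3,?,224..448" into (min, max) pairs; -1 means unbounded."""
    dims: List[Tuple[int, int]] = []
    for token in (t.strip() for t in text.split(",")):
        m = _DIM_RE.fullmatch(token)
        if m is None:
            # Reject anything that is not a well-formed dimension instead of
            # converting it blindly.
            raise ValueError(f"invalid dimension: {token!r}")
        if token == "?":
            dims.append((0, -1))
        elif m.group(1) is not None:
            value = int(m.group(1))
            dims.append((value, value))
        else:
            lo = int(m.group(2)) if m.group(2) else 0
            hi = int(m.group(3)) if m.group(3) else -1
            if hi != -1 and lo > hi:
                raise ValueError(f"empty range: {token!r}")
            dims.append((lo, hi))
    return dims
```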
* Convolution concat sum inplace conflict fix
* Minor refactoring.
* Rebase to OV2.0, build pass.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Remove old file.
The rebase introduced this file by mistake.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Move functional test for subgraph.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Disable some crash test for continue to test others.
* Rename ConcatConvSumInPlaceTest to ReLuConcatConvSumInPlaceTest
fix ci crash issue.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Revert "Disable some crash test for continue to test others."
This reverts commit f7a8677c002747b45e84f74672f76e2fdfc7ab22.
* Add const for inPlace.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* fix build issue, missing braces;
Co-authored-by: Maksim Kutakov <maksim.kutakov@intel.com>
* Add signal stack management for AMX in linux python API
* fix wording
* fix empty line
* add AT_MINSIGSTKSZ definition
* Fix misspelling and conditional compiling on __linux__
* Change read_image() into generate_image()
* Move test utils from testdata repo to local files
* Minor changes
* Remove unnecessary code
* Minor changes
* Fix compatibility tests
* Fix imports for Azure pipeline
* Move model generation into test_utils
* Minor changes
* Minor changes
* Update linux.yml CI
* Remove testdata repo from .ci/linux.yml
* Remove testdata repo from pipelines
* Fix Azure compatibility tests
* Reset linux.yml
* Remove testdata repo from linux CI
* Try eliminating one of configs
* Attempt at fixing Azure tests
* Add separate utils for compatibility
* xfail comp if op tests
* Minor changes
* Revert changes to .ci files
* minor changes
* Remove xfails
* Remove unnecessary import
* Skip if op tests
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* add paddle op top_k_v2
* rebase
* fix variable support issue for paddle top_k_v2
* Update src/frontends/paddle/src/op/top_k_v2.cpp
Co-authored-by: Bo Liu <bo4.liu@intel.com>
* Update src/frontends/paddle/src/op/top_k_v2.cpp
Co-authored-by: Bo Liu <bo4.liu@intel.com>
* Update src/frontends/paddle/src/op/top_k_v2.cpp
Co-authored-by: Bo Liu <bo4.liu@intel.com>
* format the top_k_v2.cpp
Co-authored-by: meiyang-intel <yang.mei@intel.com>
Co-authored-by: Bo Liu <bo4.liu@intel.com>
They sporadically impact CI. A possible reason is that the order of paddle and openvino results is not guaranteed when more than
one bbox has an equal score.
There is no need for these random tests, as the remaining cases already cover them.
* draft pr for planar and fsv16
* draft pr for general test
* update fusion test (failing)
* update fusing test (pass)
* update fusing test (include exception)
* clean gpu unit test
* review comment applied
* unit test cases added & cpplint applied
* cpplint error fixed
* change gpu test cases for fp16
* fusing test fix generate_unique_indices
* fix typo
* revise cl kernel for occasions when updates shape is altered
* Initial files & cmakefiles for ov 2.0 c api development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add all ov 2.0 C APIs define
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix review comments
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Disable test of OV 2.0 C APIs test for tmp
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add related property key for ov 2.0 C-API
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add description for ov_property_key_e
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add EXCEPTION handling
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* compiledModel add interface
* add inferrequest interface
* solve cpplint problem
* Finished OV 2.0 C-APIs PPP related development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix code review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add ov::tensor API
* add compiled model func
* Finished C-API funs about core, model, node development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C-API] add const to ov_output_node
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C-API] Using define GET_OV_ELEMENT_TYPE & GET_CAPI_ELEMENT_TYPE in tensor APIs
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C-API] add string initialize
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* add inferrequest func
* add move construction to runtime_model
* supplement two infer request interface functions
* [OV 2.0 C-API] Add the common framework of unit test
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* modify ov_infer_request_get_profiling_info
* add tests dir
* restore CMakeLists.txt
* Fix the bug of COPY in Tensor
* [OV 2.0 C API] Finished core related function unit test
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add ov::Tensor API test
* [OV 2.0 C API] fix some review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* add some infer request test
* add compiled model test
* [OV 2.0 C API] Finished preprocess related function unit test
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] Fix review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] Modify to use default model
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* transfer device_name from fix value to parameter
* add some infer request test
* remove compiled model get_property test
* add infer request tests
* Add ov::model Test and modify Tensor Test name
* Determine whether partial shape meets the standard
* Add get tensor name function and Modify reshape test case
* modify fixed tensor name,remove unnecessary comparison
* add ov_model_get_nodes_info, modify according to comments
* Update reshape test
* extract common function, modify interface about get tensor name,shape and type
* modify according comments
* [OV 2.0 C API] Finished hello classification with ov 2.0 c-api development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] Fixed hello classification with ov 2.0 c-api review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] delete inactive code hello classification with ov 2.0 c-api
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix clang format issue
* [OV 2.0 C API] rename
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix windows build error
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Apply qsort for sorting data
Fix issues of "potentially uninitialized local pointer variable"
* Not use deprecated INSTANTIATE_TEST_CASE_P for c api gtest
INSTANTIATE_TEST_CASE_P is deprecated, should use INSTANTIATE_TEST_SUITE_P.
* Fix some review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [Ov 2.0 C API] Add error info
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix some review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* polish error message for ov c api
* Redefined ov_shape_t, ov_partial_shape_t and ov_layout_t. Modified functions and test cases involving these variables
* Added the conversion between char* and partial_shape
* Add partial_shape_to_shape
* prune code
* modify split
* Use regex to split and search pattern
* Modify str_to_char_array delete
* Add the judgment of rank
* Fix compiling error
Fix issue: address of array 'shape.dims' will always evaluate to 'true' if -Wpointer-bool-conversion
Co-authored-by: xuejun <Xuejun.Zhai@intel.com>
Co-authored-by: sunxiaoxia2022 <xiaoxia.sun@intel.com>
Co-authored-by: ruiqi <ruiqi.yang@intel.com>
* gather blocked format
* enable double blocked
* 5d test
* support cross dimension
* Add some disabled test for later use
* Support non-default planar formats
* It has better performance by using reduction kernel instead of pooling kernel in oneDNN for reduction layer.
* Stop using global pooling instead of reduce primitive
* Use oneDNN reduction if its mode is supported by optimized oneDNN kernel
* activation pow is supported
* Use clDNN reduce if 3d or redundant reduce, tensor size mismatch
* Updated thirdparty onednn_gpu
Signed-off-by: Min, Byungil <byungil.min@intel.com>
Co-authored-by: Wei Tang <wei1.tang@intel.com>
Co-authored-by: Chen Kurt <kurt.chen@intel.com>
* [GPU] Implement Roll kernel
* [GPU] Add Roll kernel selector
* [GPU] Add Roll primitive
* [GPU] Add Roll helpers
* [GPU] Implement unit tests for the Roll operation
* [GPU] Add Roll operation to GPU plugin
* [GPU] Add single layer tests for the Roll operation
* [GPU] Add changes after review
* [GPU] Improve cldnn unit test
* Dynamic shape memory reuse solution
* Fix Split node to properly work with dyn mem
* Fix race condition for Memory mgrHandle
* Avoid Memory race condition between GetData and SetDataHandle
Adding a lock for the race condition between ov::intel_cpu::Memory::GetData() and ov::intel_cpu::Memory::SetDataHandle() is not a good solution,
because it would impact inference performance. We found that it is unnecessary to get the edge DataPtr in inferRequest::SetBlob or GetBlob, which
only need the tensorDesc, so we get only the tensorDesc instead of the dataPtr to avoid this race condition.
* Resolve reviewer's comments
* Avoid performance impact due to frequent reset of MemMngrHandle
If MemMngrHandle has already been assigned an external buffer, it can be reused.
Otherwise a new one needs to be created.
* multiclass_nms opset9 spec, api, reference, paddle fe mapper, paddle fe unittest.
* multiclass_nms opset9 cpu node impl.
* multiclass_nms opset9 shape infer fix.
* multiclass_nms opset9: add transform ConvertMulticlassNms8ToMulticlassNms9.
* ConvertMulticlassNmsToMulticlassNmsIE: to MulticlassNmsIEInternal
* add test dependency package paddledet==2.1.0
* 1. fix for roisnum overflow. 2. common shape_infer private function.
Signed-off-by: jialipen <cecilia.peng@intel.com>
* 1. use common infer_shape helper. 2. fix roisnum overflow issue. 3. fix for nmsWithEta.
* test suite for opset9 multiclass_nms smoke tests pass, with both static and dynamic shapes.
code clean for unit test.
* decouple specification from this PR.
* op fuzzy: dynamic input/output
* reference impl refactor
* multiclass_nms_base no need clone_inputs.
* code clean
* restrict ppdet import
* fix clang format error
* change ppdet import to resolve CI fail issue related to its dependency.
* fix CI
* refactor: multiclass_nms_shape_inference for opset9 and reference impl.
TODO: could be applied to opset8 and even matrix_nms.
* fix CI build failure.
* CI fix for ambiguous namespace reference issue when
building static libs.
* update nms save_model python scripts.
* dynamic inputs for NMS with CPU plugin.
* copyright header for test scripts.
* op conformance test for multiclass_nms_9.
* minor update: is_type
* python opset9 and multiclass_nms
* flake8 CI fix
* remove NmsBase. stage1.
flake8 CI fix
remove NmsBase. stage 1 fix.
* rm NmsBase. stage2.
* more multiclass_nms prop tests and fix.
* remove unchanged ops from binding opset9.
* dependency of paddle_tests.
* fix: add MulticlassNms to op mapper.
* clang format fix
* fix merge error.
* add formats for 3d conv
data formats
-bs_fs_zyx_bsv32_fsv32
-bs_fs_zyx_bsv32_fsv16
-bs_fs_zyx_bsv8_fsv4
-bs_fs_zyx_bsv8_fsv2
-bs_fs_zyx_bsv16_fsv32
-b_fs_zyx_fsv2, b_fs_zyx_fsv4
weight formats
-os_is_zyx_osa2_isa8_osv8_isv2
-os_is_zyx_osv8_isv4
-os_is_zyx_osv8_isv2
-gs_oizyx_gsv32
* add supported formats for primitives
* choose onednn convolution impl for 3d conv
* optimize layout of shallow depth convolution
* remove reorder for conv
* Don't remove reorder between bs_fs_zyx_b32_f16/f32 and bfyx.
* add formats to SetDefault() to optimize gws/lws for quantize/eltwise
* fallback cldnn if onednn pooling's layout is b_fs_zyx_fsv32 and i8.
* fixed wrong position for new weight formats
* restore imad_case()
* This func is used to choose format for fallbacked cldnn
* [GPU] add debug flag: OV_GPU_SerialCompile
0(default): parallel compile
1: serial compile
* add is_mixed_layout
* remove format::bs_fs_zyx_bsv8_fsv4 in needs_onednn_small_ic_to_blocked
* prevent to fuse the reorder which is between quantize and conv
* shallow feature first conv
* Revert "[MO args][ONNX FE]fix cutting graph with input, output or both (#9698)"
This reverts commit 2b03d5fe66.
* Fix cutting the graph when inputs/outputs are passed to the MO
* Check that port exists
* Simplification of getting node port
* Reducing amount of nesting inside searching of node by operation name
* Refactoring
- remove mutable default arg
- changes in code style
- change variables name
* Check that user input data type is dictionary
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* [GPU] Modify Softmax single layer tests to check Softmax-8 is supported with axes in [-rank, rank) interval
* [GPU] Fix cldnn::softmax::dimension_t documentation
* [GPU] Fix ParamsKey::EnableSoftmaxDim
Support Z dimension.
* [GPU] Add Softmax single layer test that checks 5D case
Since some Softmax kernel code contains ifdef on 5-dimensional case,
a test case is needed that covers this functionality.
* [GPU] Support axis 0 in Softmax
* [GPU] Modify Softmax single layer tests to check axis 0
* [GPU] Modify Softmax items class optimized kernel to handle axis 0 correctly
Modify single layer test accordingly.
* [GPU] Modify Softmax unit-test to check softmax::normalize_b
* Split SoftMaxLayerTest into opset1 and opset8 versions
Use SoftMax8LayerTest in the tests throughout repository.
SoftMaxLayerTest now defaults to SoftMax1LayerTest for compatibility.
* [GPU] Add f16 test-case for Softmax single-layer test
Co-authored-by: tgubanova-lohika <tgubanova@lohika.com>
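The Softmax-8 axis range mentioned in this group of commits, `[-rank, rank)`, means that negative axes count from the last dimension. A minimal sketch of that normalization follows; `normalize_axis` is a hypothetical helper written for illustration and is not taken from the GPU plugin code.

```python
def normalize_axis(axis: int, rank: int) -> int:
    """Map an axis in [-rank, rank) to its non-negative equivalent."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    # Negative axes count from the end: -1 is the last dimension.
    return axis + rank if axis < 0 else axis
```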
* dft with single layer test
* idft with single layer test
* fix output param usage in dft
* update dft according to the clang-format
* move output layout setup to calc_output_layout
* add support for other dimensions
* add clDNN unit test for DFT/IDFT
* remove unnecessary original rank
* use defined formats in kernel
* fix dft docs
* changes after review
* Revert "fix dft docs"
This reverts commit 45b05172dfd161d92dae6d26e0f1b74748e56fd5.
Co-authored-by: Serhii Pavlovskyi <spavlovskyi@lohika.com>
Co-authored-by: Mykhailo Hnap <mhnap@lohika.com>
With new networkx release (2.8.1) some of MO tests started to fail
with following error:
```
def __setstate__(self, state):
    self._graph = G = state["_graph"]
    self._adjdict = G._pred if hasattr(G, "pred") else G._adj
AttributeError: 'Graph' object has no attribute '_adj'
```
This appears to be a regression that was introduced in
f50fc70b8c
convolution_gpu_yxfb_yxio_b16 for fp16 has hardcoded reqd_work_group_size
to (16, 1, 1). On devices where CL_DEVICE_MAX_WORK_GROUP_SIZE is 512
GetOptimalLocalWorkGroupSizes picks (16, 2, 1) for LWS.
That causes issues during clEnqueueNDRangeKernel since LWS doesn't match
with reqd_work_group_size in the kernel.
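The constraint behind this failure can be sketched as follows: when a kernel declares `reqd_work_group_size`, the local work size passed at enqueue time has to match it exactly, whatever a generic heuristic would otherwise pick. The Python below is an illustrative model only; `choose_lws` and `check_enqueue` are hypothetical names, and this is not necessarily how the commit resolves the issue.

```python
from math import prod
from typing import Optional, Tuple

Dims = Tuple[int, int, int]


def choose_lws(optimal: Dims, reqd: Optional[Dims]) -> Dims:
    """Prefer the kernel's required work-group size over a heuristic pick."""
    return reqd if reqd is not None else optimal


def check_enqueue(lws: Dims, reqd: Optional[Dims], max_work_group_size: int) -> None:
    """Model the two validity checks relevant to the failure described above."""
    if reqd is not None and lws != reqd:
        raise ValueError(f"LWS {lws} does not match reqd_work_group_size {reqd}")
    if prod(lws) > max_work_group_size:
        raise ValueError(f"LWS {lws} exceeds the device limit {max_work_group_size}")
```

With the values from the message, `check_enqueue((16, 2, 1), (16, 1, 1), 512)` raises, while `choose_lws((16, 2, 1), (16, 1, 1))` returns a size that passes the check.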
* Add single layer tests for GPU
* Add GPU primitive for ExperimentalDetectronGenerateProposalsSingleImage
* Add kernel for ExperimentalDetectronGenerateProposalsSingleImage
* Add unit test
* rename abbreviation edgpsi to the full name experimental_detectron_generate_proposal_single_image
* Add f16 support to operation
* Add f16 support to the unit test
* Add notification about the second output in primitive
Co-authored-by: Oleksii Khovan <okhovan@lohika.com>
* Added shell for Eye-9
* Updated spec for Eye-9
* Added reference for Eye-9
* eye cpu
* Added op impl check for Eye-9
* Fix unallowed dynamic to static dim conversion in eye shape_infer
* Add template plugin tests for dynamic shapes
* Add template plugin tests for dynamic shapes batch input
* Enable batch shape input dynamic rank
* Uncomment 3D batch cpu Eye tests
* Update assertions and messages
* use ov::element type
* Remove redundant evaluate from eval map
* Style fix
* Add static_cast<T>(1) to cpu eye
* Add defaults to eye cpu class members
* Reuse out_ptr and checks
* Return if onesPerBatchNum == 0
* Add Eye CPU Dynamic shape tests with 2D batch
* Additional test cases for CPU and reference
* Disable 3D batch eye cpu tests
* Fix CPU implementation for matrix with not equal cols and rows
* Update CPU test name
* Disable CPU Eye 3D batch static shapes tests
Co-authored-by: Alexandra Sidorova <alexandra.sidorova@intel.com>
Co-authored-by: Yury Gaydaychuk <yury.gaydaychuk@intel.com>
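For reference, the behaviour the Eye-related fixes above deal with (non-square matrices and a shifted diagonal) can be modelled in a few lines. This is a plain-Python sketch of a single 2D matrix, assuming semantics analogous to `numpy.eye(N, M, k)`; it is not the CPU plugin implementation and ignores the batch shape input.

```python
from typing import List


def eye(num_rows: int, num_columns: int, diagonal_index: int = 0) -> List[List[int]]:
    """Build a num_rows x num_columns matrix with ones on one diagonal.

    diagonal_index 0 is the main diagonal, positive values shift it
    towards the upper right and negative values towards the lower left.
    """
    return [[1 if j - i == diagonal_index else 0 for j in range(num_columns)]
            for i in range(num_rows)]
```

When the shifted diagonal falls entirely outside the matrix, the result contains no ones at all, which corresponds to the early-return case listed above.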
* Update oneDNN rls-v2.6
* Support weight tag for oneDNN v2.6
* Fix first conv selection issue in oneDNN
* oneDNN v2.6 required specific tags to run jit:ir primitives.
* any_tag can find optimized primitives in oneDNN.
* Enable aBcd2b src tag for oneDNN v2.6
* Add create_memory_desc from format string.
* Apply group depthwise separable conv uses jit:ir in oneDNN v2.6
* Use byxf format.
* Update only use acdb format in shallow group conv
* Fix refconv selection in shallow conv with post operations.
* Enable reshape int8
* Fixed quantize fusing through reorder+reshape : Fixed the condition to check per_tensor_input_shift only when need_input_shift is true
* minor change
* Allow FP quant to be fused to FC/gemm
* Disable reshape transform for onednn until onednn FC is optimized
* [GPU] Support implicit crop in input transposition.
+ Make the crop in front of quantize implicit by changing output format to bfyx.
+ Use implicit concat after quantize nodes.
* Add unit test for implicit crop and concat.
+ remove unnecessary code.
+ Modified jitter Load for planar input of fused eltwise
+ Bugfix in jitter if planar input has LT_ALIGNED_READ
Signed-off-by: Min, Byungil <byungil.min@intel.com>
Update the branch to be used for 2022.1 and remove reference to
-staticdev package which isn't generated anymore.
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* roi_align_9: ov_core, transformations, template_plugin
* roi_align_9: CPU Plugin
* keep only constructor with enums which is aligned with spec
* remove evaluate function for ROIAlign_9
* Add op check test for operation ROIAlign-9
* Apply suggestions from code review
* fix version name from 'v0' to 'v3' in transform part
* use common shape_infer function for v3 and v9
* remove 'tf_' prefix for ROIAlign::AlignedMode to avoid misleading for models from different platforms
* Update Convert_Model_From_TensorFlow.md (#11425)
* Apply suggestions by Yuan
The changes are made in the port PR, so will be published with the 22.2 version.
Co-authored-by: Evan <evan.juras@gmail.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Docs: Add links to specific examples (#11618)
* Update docs/OV_Runtime_UG/integrate_with_your_application.md
* Add links to specific examples
This edit adds links to more example applications, making it easier for users to discover how to build an OpenVINO application around their specific model.
* Add links to MO installation and ONNX examples (#11617)
These edits help make it easier for a new user to find more information on how to convert ONNX models.
* Apply suggestions by Yuan
The changes are made in the port PR, so will be published with the 22.2 version.
Co-authored-by: Evan <evan.juras@gmail.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* selectdevice returns MULTI:device in cumulative_throughput
* load multi with throughput and disable cpu helper in cumulative
* disable cpu helper in cumulative_throughput
* add cumulative to benchmark_app help message
* modify benchmark_app.hpp clang-format
- Add TC for decrease_label_id=true to cover MXNet-style NMS models
- Fix segfault issue that occurs when data precision is fp16
Signed-off-by: Andrew Kwangwoong Park <andrew.kwangwoong.park@intel.com>
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Einsum test helper
* Einsum single layer tests
* Add Einsum decomposition with repeated labels and ellipsis support
to GPU transformations pipeline
Co-authored-by: Oleksii Khovan <okhovan@lohika.com>
Check first whether the path specified by --input_dirs is a directory.
Otherwise the argument is always treated as a .lst file,
and in case it is a directory it silently fails,
which causes the test runner to not execute any tests intended.
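
The directory check described above can be sketched as follows. This is only an illustration under assumptions: the helper name `collect_inputs` is hypothetical, and the real test runner may filter or parse files differently.

```python
from pathlib import Path


def collect_inputs(input_arg: str) -> list[str]:
    """Hypothetical helper: expand an --input_dirs value into a list of paths."""
    path = Path(input_arg)
    if path.is_dir():
        # Directory: take every regular file found directly inside it.
        return sorted(str(p) for p in path.iterdir() if p.is_file())
    # Anything else is treated as a .lst file with one path per line.
    with path.open() as lst:
        return [line.strip() for line in lst if line.strip()]
```

Checking for a directory first means a directory argument is no longer opened as a list file, so it cannot silently yield zero tests.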
porting from 22.1 as per Andrey's request from 04.08
* sphinx google search
* fixes
* fixes
* fix version tabs
Co-authored-by: Nikolay Tyukaev <nikolay.tyukaev@intel.com>
* DOCS-benchmarktool_python_correction
add info on tool installation
* Update docs/OV_Runtime_UG/Samples_Overview.md
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
* Try to improve gflags
* Try to improve gflags: part 2
* Tried to use dependencies on system
* Use nlohmann_jsonConfig from system
* Enabled nlohmann_json from system
* Improvements
* handle system gflags in developer package
* Simplifications
* Simplify dependency management
* Corrected package names
* Fixed subgraphsDumper configure stage
* Try to fix rhel8
* Try to fix macosx
* Fixed VPUX build
* Fixed aliasing issues
* Suppress some warnings
* export gflags when build it
* Fixed some LTO
* Try to fix Mac
* revert
* use gflags as private dependency
* Aligned targets in developer package
* Fixed frontends tests build on U20 with LTO
* PAssed
* Don't use pkg_search_module(zlib ..) during cross-compilation
* Removed unused variables
* Fixed finding of zlib during cross-compilation
* CVS-83529
* Use nothreads_static
* Fixed python
* Moving PWL to ngraph
* improving the running time of php_search; refactoring the pwl operation
* fixed errors & refactored code
* moved PWL op to GNA
* Update src/plugins/intel_gna/ops/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/reference/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* fixed compilation error
* Update inference-engine/tests/unit/gna/ngraph/transformations/gna_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* added some tests; changed algorithm of checking accuracy of pwl; refactoring
* added first and last segments; added fq and fixed errors
* fixed after review & rewrote some tests on ngraph
* removed debug logs & fixed code style check error
* s/ngraph_helper/ngraph_util
* removed TRANSFORMATIONS_API in PWLApproximation class declaration
* removed OPENVINO_API in Pwl class declaration
* replaced the deprecated version of evaluate() with a new one
* fixed some problems after reviewing
* fixed a problem when a value of function of left point of segment is less than minimum of function
* corrected a value of the right point of last segments
* [GNA] Moved pwl func tests
* Deleted deprecated test
* s/OPENVINO_RTTI/OPENVINO_OP
* Deleted conflicted test file
* fixed after review
Co-authored-by: Dmitrii Khurtin <dmitrii.khurtin@intel.com>
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* [IE Samples] Activating new parameter is compact mode(memory_reuse) in speech sample
* changed format
* renamed the option to memory_reuse
* renamed the option
* DynamicShapeResolver is able to save information about dynamic output in order to pass it in INFER_DYNAMIC_SHAPE mode. Previously, it propagated fully dynamic output shape (however ranks were equal) and dynamic Convolutions and Poolings were performed incorrectly. Now in the case of dynamic batch, DSR propagates only dynamic batch and Convolutions and Poolings are performed properly as a Loop of single-batch operations.
* Fixed dynamicToStaticShapeTranspose transformation. There was a bug: the transposition indices cannot be applied with Scatter, because Scatter computes the wrong formula. Replaced with Gather.
i.e. the output shape of Transpose with transposition indices [0,3,1,2] (NHWC [1, 224, 224, 3] -> NCHW [1, 3, 224, 224]) was calculated by ScatterElementsUpdate. That gives output_shape[transposition[i]] = input_shape[i], so the result was output_shape=[1, 224, 3, 224], which is wrong. Gather instead computes output_shape[i] = input_shape[transposition[i]], and the result is [1, 3, 224, 224], which is right.
* MaxPool and AvgPool can be sliced for loop in case of dynamic batch
* Convert stage for inputs is not inserted in the VPU model in the case of OV API 2.0. It did not cause a problem with non-dynamic functions because Graph Transformer has a pass to eliminate redundant converts (u8->f16, ~f16->f16~). In the case of dynamic inputs, yet another inserted Convert breaks data<->shape relations.
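
The Transpose shape fix above (Gather instead of Scatter) can be checked with a small sketch. The function names are hypothetical and plain Python lists stand in for the shape tensors; only the two index formulas come from the commit message.

```python
def shape_by_scatter(input_shape, order):
    # Old behaviour: output_shape[order[i]] = input_shape[i]
    out = [0] * len(input_shape)
    for i, axis in enumerate(order):
        out[axis] = input_shape[i]
    return out


def shape_by_gather(input_shape, order):
    # Fixed behaviour: output_shape[i] = input_shape[order[i]]
    return [input_shape[axis] for axis in order]


nhwc = [1, 224, 224, 3]
order = [0, 3, 1, 2]
print(shape_by_scatter(nhwc, order))  # [1, 224, 3, 224]
print(shape_by_gather(nhwc, order))   # [1, 3, 224, 224]
```

Scatter applies the inverse permutation, which only coincides with the intended one when the permutation is its own inverse.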
* added recursive run for transformation to fix fp16 IR with Interpolate inside If
* added test for interpolate inside If
* remove useless variable
* fixed transformation for divide
* fix code style
* commit auto change
* review fix
* add test for recursive call of divide marks
* removed empty line
* [MO] Support TensorFlow Grouped Conv2DBackpropInput
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Correct computation of group number for ConvBackpropInput operation
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix get_conv_backprop_groups function
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Add unit-tests for Deconvolution shape inference
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
Compilation with ENABLE_CPU_DEBUG_CAPS was fixed.
Previous to this change it failed due to undefined dnnl::impl::md2dim_str
(since DNNL_VERBOSE was disabled in the scope of PR #11244).
* Removed a redundant image
* Fixed ops specifications and other issues
* converted html links to anchor links
* converted html links to anchor links
* Fixed a link
* Fixed a link
* Changed anchor links according to dev review
# Conflicts:
# docs/OV_Runtime_UG/Operations_specifications.md
* Right fill in the values of the inputs
* Using create_and_fill_tensor_unique_sequence() instead of create_and_fill_tensor()
* Fixing a problem with a missing parameter when calling the create_and_fill_tensor method
* Fix Bucketize Conformance tests inputs generation for Template plugin
* Correct filling of the first port (data)
* Correct the order of passing arguments to the InputGenerateData constructor
* Full range correction for random numbers
* Refactoring the argument sequence of the InputGenerateData class constructor
* A small imperfection
* Rollback changes that are related to range
PR for 22.1 made, now porting to release...
some discrepancy between this version and the 22.1 branch seems to exist, so I adjusted the conflicting link to avoid build check errors...
the overview has been merged, the remaining articles are reviewed here
* Paddle FasterRCNN Ops Conversion: roi_align, strided_slice, where
* add check for 'aligned' feature of 'roi_align' op; use common function for idx_node in 'strided_slice' op
* Apply suggestions from code review
* use common function for strided_slice and slice, OP_CHECK for 'where' op conversion
* Apply suggestions from code review
* Fix batchability check of MAX_BATCH_SIZE
* Applied review comment
* clonenetwork in auto
Signed-off-by: fishbell <bell.song@intel.com>
* clone in correct way
Signed-off-by: fishbell <bell.song@intel.com>
Co-authored-by: Taylor Yeonbok Lee <taylor.lee@intel.com>
* Frontend exception safety
Every call to the frontend's API (except Places) can throw an exception. If FrontEndManager is destroyed during exception handling and calls 'dlclose' for the plugin, the call stack will be corrupted and a crash will occur.
The solution is to wrap 'plugins' calls with try/catch and throw a new exception in the 'openvino' context.
TODO: currently "Place" objects don't have 'actual' wrappers, so an exception in 'place' objects can still cause such a crash (if the exception handler destroys FrontEndManager). The workaround for users is to try/catch any calls of the Place API on their side.
We're not expecting users to use the Place API directly, so this workaround looks acceptable.
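
The wrapping pattern described above can be illustrated loosely in Python. This is an analogy only: the actual change is in C++ and concerns exception objects owned by a shared library that may be unloaded. The names `FrontEndError`, `guarded` and `load_model` are hypothetical.

```python
import functools


class FrontEndError(Exception):
    """Hypothetical exception type owned by the core library, not by a plugin."""


def guarded(plugin_call):
    """Wrap a plugin entry point so callers only ever see core-owned exceptions."""
    @functools.wraps(plugin_call)
    def wrapper(*args, **kwargs):
        try:
            return plugin_call(*args, **kwargs)
        except Exception as exc:
            # Re-raise as a core-owned type that carries only the message text.
            raise FrontEndError(str(exc))
    return wrapper


@guarded
def load_model(path):
    # Stand-in for a plugin call that fails.
    raise ValueError(f"cannot read {path}")
```

The caller then catches `FrontEndError` and never needs a type that is defined inside the plugin.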
* Add check for exception message
* Keep type of frontend exception during rethrow
* IR FE tests: don't expect InferenceEngine::exception as it is not propagated as is by FrontEndManager
* [Python API] Remove old api class from the new api
* start working on refactoring of OVAny
* fix tests
* fix code-style
* remove tuple test
* fix test
* fix omz hash
* one more overload
* fix pyfloat
* move from_ov_any to utils
* code-style
* move function from common to utils
* Build with system TBB
* Fixes
* Check whether system TBB is available
* Try to fix ONNX Runtime build with system TBB
* Test
* Fixed compilation of threading.cpp
* Fixed unset of cache dirs
* Limit search paths of TBB
* Try to enable pip packages with custom TBB
* Fix for TBB 2021.2
* Install only needed TBB libraries
* Install TBB from system to pip package
* Reverted usage of TBBROOT
* Fixed oneTBB case
* Try to fix Android
* Escape some paths
* Added samples path
* Fixed TBBBind usage for case of system TBB
* Added specification for EyeLike-9
* Update docs/ops/generation/EyeLike_9.md
* removed batch from TF
* minor fix
* Applied comment by Anton
* Added new example with dynamic output, added corner case
* Fixed corner case description
* Rename matrix
* applied comments by Yuan
* Added diag_idx as input, minor fixes, renaming
* added support of batch_shape from TF
Co-authored-by: Andrei Kochin <andrei.kochin@intel.com>
* [GNA] Fuse all FakeQuantize layers with their previous layers
* [GNA] Fuse FQ with previous layer if it's not required for precision change
* [GNA] Fixed MatMulOverloadCorrectionTest
* New command line parameters format for speech sample
* fixed notes
* changed format for scale factor
* changed format for scale factor in tests
* added more variants, when name is directly specified for i/o/r like it is done for sf
* removed nthreads flag
* fixed notes
* changed output params
* updated tests with new format
Co-authored-by: Alexander Zhogov <alexander.zhogov@intel.com>
* Fix for str_to_container if string value has whitespaces
* Add test
* Add trim for leading and trailing whitespaces
* Apply comments
* Apply comments 2
* Apply comments 3
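
A minimal sketch of the trimming behaviour this fix adds, written in Python for brevity. The real `str_to_container` is a C++ helper; the comma delimiter and the dropping of empty items are assumptions made for this illustration.

```python
def str_to_container(value: str, delimiter: str = ",") -> list[str]:
    """Split a delimited string and trim whitespace around each item (illustrative)."""
    # Strip leading and trailing whitespace from every item.
    items = [item.strip() for item in value.split(delimiter)]
    # Assumption: items that are empty after trimming are dropped.
    return [item for item in items if item]


print(str_to_container(" CPU , GPU "))  # ['CPU', 'GPU']
```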
* Enable explicit TBlob declaration in all compilers
This fixes problems when linking gcc compiled IE with clang compiled
applications.
Previous to this change, only clang compilers would consider TBlob<T>
templated types as declared externally. When *declared* explicitly (with
the `extern template` syntax), the C++ spec says
that any inline methods of the templated class (such as TBlob<T>
constructors) should be ignored in favor of the externally instantiated
version of that templated type:
"An explicit instantiation declaration (an extern template) skips
implicit instantiation step: the code that would otherwise cause an
implicit instantiation instead uses the explicit instantiation
definition provided elsewhere (resulting in link errors if no such
instantiation exists)."
However, when IE is compiled with gcc, it does not see the explicit
`extern template` declarations of TBlob<T> (due to the `#ifdef
__clang__` guards in `ie_blob.h`). As an end result, presumably due to
link-time-optimizations during IE library compilation(?), none of the
TBlob<T> implementations are actually included in the IE dynamic
libraries.
* Fix warnings for windows
* Fix typo
* revert previous version of convert_seq_to_ti transformation
* try to check that outputs of TI are connected to Result nodes
* add unit tests
* fix codestyle
* fix Memory tests
* revert local change
* revert local change
* replace duplicated code with lambda
* Written nGraph reference for the operation RDFT.
* Used std::reverse() algorithm to simplify the function reverse_shape() from fft_common.cpp.
* Added assert into the function offset_from_coords_and_strides().
* Deleted redundant variable.
* Deleted redundant functions from the reference implementation of (I)DFT.
* Renamed the method reverse_shape() in fft_common.hpp.
* Code style fix.
* Paddle FasterRCNN Ops Conversion: greater_than, less_than, gather, floor
* Apply suggestions from code review
* fix 'gather' testcase failure issue on CI
* implement 'axis' input for 'Gather' Op conversion with testcase comment;use common function for all elementwise Ops
* Fix setupvars.bat patching
setupvars.bat should not be patched for regular Debug and Release
configurations.
* Use STREQUAL for cmake string comparison
* Improve performance for 'ov::Model::add_output'
On first call of `add_output(tensor_name)` all available tensor names are cached.
Next calls take nodes from cache which significantly reduces complexity.
Cache is invalidated if topological cache is not valid or cache points to incorrect output (no tensor name of this node anymore)
The same caching is done for 'add_output(op_name, output_index)'
Tests:
- Verifies that adding outputs to all nodes has linear complexity O(N), not O(N^2)
- Verifies cache invalidation scenarios
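
The caching idea can be sketched as follows. The `Node` and `Model` classes here are simplified stand-ins, not the real `ov::Model`; the `scans` counter exists only to make the number of full graph traversals visible.

```python
class Node:
    def __init__(self, name, tensor_names):
        self.name = name
        self.tensor_names = set(tensor_names)


class Model:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.outputs = []
        self._cache = None  # tensor name -> node, built on first use
        self.scans = 0      # number of full graph traversals

    def _build_cache(self):
        self.scans += 1
        self._cache = {t: n for n in self.nodes for t in n.tensor_names}

    def add_output(self, tensor_name):
        node = self._cache.get(tensor_name) if self._cache is not None else None
        # Rebuild if there is no cache yet or the cached node lost this name.
        if node is None or tensor_name not in node.tensor_names:
            self._build_cache()
            node = self._cache[tensor_name]  # KeyError if the name does not exist
        self.outputs.append(node)
        return node
```

With the cache, adding an output for each of N tensors costs one traversal plus N dictionary lookups instead of N traversals.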
* Fix python tests
* Update topological cache after add_output(Output<Node>) by adding result to the end of cached ops
* Add 'm_shared_rt_info' to the 'result' node just for consistency (there is actually no scenario which may fail due to absence of this info for Result)
* Added test cases to verify that names cache should be cleared on refresh of 'get_ordered_ops'
* wip remote tests2, fixed smoke_canInferOnUserContext
* completed the OV 1.0 tests for remote blobs
* updated OV 2.0 tests for remote blobs with auto-batching (using the ngraph func that is reshape-able by the batch)
* re-using the DetectionOutput-based ngraph func that is 100% batch-reshapable
* Add test case for the loadNetwork with Auto Batching.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Enable logic test case for GPU.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Enable property for config key 'AUTO_BATCH_DEVICE_CONFIG'.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Omit {}.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add common test for the property ALLOW_AUTO_BATCHING.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add common test for AUTO Batching plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Moving PWL to ngraph
* improving the running time of php_search; refactoring the pwl operation
* fixed errors & refactored code
* moved PWL op to GNA
* Update src/plugins/intel_gna/ops/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/reference/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* fixed compilation error
* Update inference-engine/tests/unit/gna/ngraph/transformations/gna_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* added some tests; changed algorithm of checking accuracy of pwl; refactoring
* added first and last segments; added fq and fixed errors
* fixed after review & rewrote some tests on ngraph
* removed debug logs & fixed code style check error
* s/ngraph_helper/ngraph_util
* removed TRANSFORMATIONS_API in PWLApproximation class declaration
* removed OPENVINO_API in Pwl class declaration
* replaced the deprecated version of evaluate() with a new one
* fixed some problems after reviewing
* fixed a problem when a value of function of left point of segment is less than minimum of function
* corrected a value of the right point of last segments
* s/OPENVINO_RTTI/OPENVINO_OP
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
+ Fix colorization-sig accuracy issue using oneDNN
Memory crash in case reuse_eltwise_sum_post in oneDNN and memory_pool
And print node in/out gpu_usm_mem addr at OV_GPU_Verbose >= 1
+ Check the size of z spatial axis for checking fulltensor.
+ Remove program_helpers's functions.
Co-authored-by: hyunback <hyunback.kim@intel.com>
Scenario:
- Node "Split" with multiple outputs (e.g. 3). All outputs are connected to "Result"s
- Add post-processing step (e.g. convert element type, can be also implicit)
Issue: after post-processing, 3 new results will be created, each will have "Split" friendly name - inconsistency with IRv10 rules
Fix:
- For nodes with multiple outputs, add '.<idx>' suffix to new output's friendly name
- If no post-processing is applied, return immediately, keeping original results as is
Tests:
- Split with 3 outputs where 2 outputs have post-processing.
- Split with 3 outputs, post-processing doesn't create any nodes
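
The naming rule from the fix can be sketched like this. The function and its arguments are hypothetical; the real logic operates on graph nodes and Result operations rather than on plain indices.

```python
def new_result_names(node_name, num_outputs, post_processed):
    """Return friendly names for results created by post-processing (illustrative).

    post_processed: indices of the node outputs that received a post-processing step.
    Returns None when nothing was post-processed, i.e. original results are kept.
    """
    if not post_processed:
        return None  # early return: keep the original results as is
    names = {}
    for idx in sorted(post_processed):
        # Only multi-output nodes get the '.<idx>' suffix.
        names[idx] = f"{node_name}.{idx}" if num_outputs > 1 else node_name
    return names


print(new_result_names("Split", 3, {0, 2}))  # {0: 'Split.0', 2: 'Split.2'}
```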
* [XLink] - tests to smoke scope
* [XLink] - small change in XLink related file to trigger ie-tests-windows-myriadx
* [XLink] - azure windows and linux
* [XLink] - azure windows and linux
* [XLink] - azure windows and linux - change dir?
* [XLink] - azure windows and linux - change dir?
* [XLink] - azure windows and linux - install?
* [XLink] - azure windows and linux - xlink cmake
* [XLink] - azure windows and linux - XLinkTests because another target with the same name already exists
* [XLink] - azure windows and linux - XLinkTests because another target with the same name already exists
* [XLink] - azure windows and linux - install TARGETS given target XLinkTests which does not exist
* [XLink] - azure windows and linux - remove smoke
Inserting padding into oneDNN primitive has issue with implicit concat behavior.
Deconv oneDNN initialized output buffer to 0 including padding area. Padding area should be reserved.
Use oneDNN offset from program_node in/out lower_padding instead of oneDNN memory desc.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* add 3D shape to test and rename crop4d to strided_slice
* remove ConvertStridedSliceToCropNegative2 since 3D is now supported
* add myriad functional tests to skip-list
* update Auto docs
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update python snippets
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* remove vpu, fix a mistake in python code
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update MYRIAD device full name
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update API name
the old API uses the name Inference Engine API
the new API uses the name OpenVINO Runtime API 2.0
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update tab name, and code format
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix AUTO4 format issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update set_property code
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* auto draft
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* mv code into .cpp and .py
modify the devicelist part according to the review
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* remove priority list in code and document
modify the beginning of the document
remove performance data
remove old API
use compile_model instead of set_property
add a image about cpu accelerate
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix misprint and code that does not match the document
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* try to fix doc build issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix snippets code compile issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* Added migration for deployment (#10800)
* Added migration for deployment
* Addressed comments
* more info after the What's new Sessions' questions (#10803)
* more info after the What's new Sessions' questions
* generalizing the optimal_batch_size vs explicit value message
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Perf Hints docs and General Opt Guide refactoring (#10815)
* Brushed the general optimization page
* Opt GUIDE, WIP
* perf hints doc placeholder
* WIP
* WIP2
* WIP 3
* added streams and few other details
* fixed titles, misprints etc
* Perf hints
* moving the runtime optimizations intro
* fixed link
* Apply suggestions from code review
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* some details on the FIL and other means when pure inference time is not the only factor
* shuffled according to general->use-case->device-specifics flow, minor brushing
* next iter
* section on optimizing for tput and latency
* couple of links to the features support matrix
* Links, brushing, dedicated subsections for Latency/FIL/Tput
* had to make the link less specific (otherwise docs compilations fails)
* removing the Temp/Should be moved to the Opt Guide
* shuffled the tput/latency/etc info into separated documents. also the following docs moved from the temp into specific feature, general product desc or corresponding plugins
- openvino_docs_IE_DG_Model_caching_overview
- openvino_docs_IE_DG_Int8Inference
- openvino_docs_IE_DG_Bfloat16Inference
- openvino_docs_OV_UG_NoDynamicShapes
* fixed toc for ov_dynamic_shapes.md
* referring the openvino_docs_IE_DG_Bfloat16Inference to avoid docs compilation errors
* fixed main product TOC, removed ref from the second-level items
* reviewers remarks
* reverted the openvino_docs_OV_UG_NoDynamicShapes
* reverting openvino_docs_IE_DG_Bfloat16Inference and openvino_docs_IE_DG_Int8Inference
* "No dynamic shapes" to the "Dynamic shapes" as TOC
* removed duplication
* minor brushing
* Caching to the next level in TOC
* brushing
* more on the perf counters ( for latency and dynamic cases)
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Updated common IE pipeline infer-request section (#10844)
* Updated common IE pipeline infer-request section
* Update ov_infer_request.md
* Apply suggestions from code review
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: Maxim Shevtsov <maxim.y.shevtsov@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* DOCS: Removed useless 4 spaces in snippets (#10870)
* Updated snippets
* Added link to encryption
* [DOCS] ARM CPU plugin docs (#10885)
* initial commit
ARM_CPU.md added
ARM CPU is added to the list of supported devices
* Update the list of supported properties
* Update Device_Plugins.md
* Update CODEOWNERS
* Removed quotes in limitations section
* NVIDIA and Android are added to the list of supported devices
* Added See Also section and reg sign to arm
* Added Preprocessing acceleration section
* Update the list of supported layers
* updated list of supported layers
* fix typos
* Added support disclaimer
* update trade and reg symbols
* fixed typos
* fix typos
* reg fix
* add reg symbol back
Co-authored-by: Vitaly Tuzov <vitaly.tuzov@intel.com>
* Try to fix visualization (#10896)
* Try to fix visualization
* New try
* Update Install&Deployment for migration guide to 22/1 (#10933)
* updates
* update
* Getting started improvements (#10948)
* Onnx updates (#10962)
* onnx changes
* onnx updates
* onnx updates
* fix broken anchors api reference (#10976)
* add ote repo (#10979)
* DOCS: Increase content width (#10995)
* fixes
* fix
* Fixed compilation
Co-authored-by: Maxim Shevtsov <maxim.y.shevtsov@intel.com>
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: Aleksandr Voron <aleksandr.voron@intel.com>
Co-authored-by: Vitaly Tuzov <vitaly.tuzov@intel.com>
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
Co-authored-by: Nikolay Tyukaev <nikolay.tyukaev@intel.com>
* CPU device documentation refresh
* Bfloat16 inference page aligned with the new API
* Bfloat16 inference section moved to CPU main
* First review comments applied
* Second review step comments applied
* OneDNN reference changed to the GitHub page
* AvgPool added to the oneDNN ops list
* Add readvalue, assign to template plugin test
* Fix clang error
* Fix clang error
* Remove unnecessary comment
* Fix type-casting error
* Fix ci issue regarding const value
* Change Function to Model
* Fix op scope
* Change way to get variable
* Fix type-casting error
* Set variable id to const
* Fix side-effect in ieFuncTests
* Implement Assign-3, ReadValue-3 in evaluates_map
* Correct setting attribute
* Correct setting attribute
* Remove unnecessarily added method
* Roll back v6
* Use member variable for variable_id in assign-3, read_value-3
* Get data pointer from host tensor
* Remove visitor API test for ReadValue-6, Assign-6
* Implement visitor api test for read_value-6, assign-6
* Fix clang error
* Split read_value and assign into each file for visitor test
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
This behavior is already used by default because ONNX is enabled by default and thirdparty/onnx/onnx/CMakeLists.txt forcing CMAKE_BUILD_TYPE to Release if it is not set
It fixes the following issues:
- When ONNX frontend is disabled - source is built for Debug, which is very unexpected comparing to Release with ONNX frontend enabled
- When ONNX frontend is disabled, even libopenvino.so could not be built due to some generated makefiles issues
It is set to 'Release' (not to 'Debug') to comply with default behavior when ONNX is enabled (it is default option working for most users)
* Performance improvement for constant creation
The issue is that 'are_all_data_elements_bitwise_identical()' is called every time in Constant constructor, and it potentially checks all buffer which is O(N) complexity.
While it is needed only if client uses 'get_all_data_elements_bitwise_identical'
Solution:
- Defer calculation until first call of 'get_all_data_elements_bitwise_identical'
- Store calculated value in mutable class member to reuse it on next calls of 'get_all_data_elements_bitwise_identical'
Test verifies both cases:
a) that constant creation with shared memory data (now O(1)) is significantly faster than creation+bitwiseCheck O(N)
b) That once calculated, value is taken from cache, which is significantly faster than re-calculation
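
The deferred check can be sketched as below. This is a simplified Python stand-in for the C++ `Constant` class: a `bytes` buffer is treated as one element per byte, and the `checks` counter is added only to show when the O(N) scan actually runs.

```python
class Constant:
    def __init__(self, data: bytes):
        self._data = data            # buffer is referenced, not scanned, on creation
        self._all_identical = None   # cached result, filled in on first query
        self.checks = 0              # number of O(N) scans performed

    def get_all_data_elements_bitwise_identical(self) -> bool:
        if self._all_identical is None:
            self.checks += 1
            # O(N) scan, run at most once per object.
            self._all_identical = len(set(self._data)) <= 1
        return self._all_identical
```

Construction stays O(1), and repeated queries reuse the stored value, which mirrors the two cases the test verifies.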
* fix clang-format
* Stash - Linux implementation
* Windows mmap implementation + unicode
* Clang for windows
* removed debug print
* Add handling of empty bin file
* fix windows includes
* Fix python test
* Unit tests
Fix for Constant with size > 4GB
* Fix review comments
* refactoring: get bias shape in bc and fbc algorithms
* use scipy to take most frequent shape
* pylint
* update reference
* pylint
* Update test_sanity.py
* update test_sanity.py
* Update test_sanity.py
* [GNA] Added SW_FP32 mode w/o SF for BasicLSTM
* deleted additional test
added sw_fp32 mode for existing test
changed reference output for new mode
* [GNA] Fixed according to review
* [GNA] Parametrized weights range
* fixed after review
Co-authored-by: Mikhail Ryzhov <mikhail.ryzhov@intel.com>
* Written header files for the nGraph operations RDFT and IRDFT.
* Written nGraph shell for the operation RDFT.
* Added missed include.
* Added RDFT to opset9 table.
* Code style fixes.
* Written the nGraph shell of the operation IRDFT.
* Added IRDFT to opset9 table.
* Started to write shape infer tests for RDFT.
* Refactoring: shape infer functions of RDFT and IRDFT moved into separate files.
* Written shape infer tests for RDFT.
* Written shape infer tests for IRDFT operation.
* Fixed code style.
* Fixes in the shape infer function of RDFT.
* Fixes in the shape infer function of RDFT.
* Fixes in the shape infer function of IRDFT.
* Deleted redundant includes in include/ngraph/op/irdft.hpp and include/ngraph/op/rdft.hpp
* Deleted redundant includes in include/openvino/op/rdft.hpp and include/openvino/op/irdft.hpp.
* Deleted redundant includes in cpp-files of nGraph shells of operations IRDFT and RDFT.
* Code style fixes.
* Shape inference functions of operations RDFT and IRDFT moved to the namespace ov::op::util.
* Deleted RDFT and IRDFT from docs/template_plugin/backend/opset_int_tbl.hpp.
* Deleted 'using namespace ngraph' from cpp-files of nGraph shells of operations RDFT and IRDFT.
* Fixed typos.
* Merged some loops in shape inference functions of RDFT and IRDFT.
* Written visitor tests for RDFT and IRDFT.
* Small change.
* Common part of RDFT and IRDFT shape validation moved into the separate file.
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
* don't check dynamic shape when there is only one device
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* remove redundant if
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* mod docs/_static/images/dataset.png and docs/_static/images/inputs.png
* add new hint cumulative_throughput
* clang format properties.hpp
* add set properties and get properties test case for CUMULATIVE_THROUGHPUT
* reset docs/_static/images/dataset.png and docs/_static/images/inputs.png
* reset docs/_static/images/dataset.png and docs/_static/images/inputs.png
* reset dataset.png and inputs.png
* reset dataset.png and inputs.png
* remove test value cumulative_throughput from gpuplugin and cpuplugin testcase
* rollback dataset.png and inputs.png to 41818a377
* add fps log
add format '%lf' for log
add INFO_RUN and DEBUG_RUN, code only run when greater than special log level
add fps log for device
print device config info with DEBUG_RUN
add mock test for DEBUG_RUN and INFO_RUN
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* use n / (end - start) instead of (n-1) / ((nth start) - (1st start))
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
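
The two FPS formulas from the commit above can be compared with a small sketch. Function names and the sample timings are made up for illustration; only the formulas come from the commit message.

```python
def fps_from_starts(start_times):
    # Old formula: (n - 1) / (nth start - 1st start); ignores the last request's duration.
    n = len(start_times)
    return (n - 1) / (start_times[-1] - start_times[0])


def fps_from_span(n, start, end):
    # New formula: n / (end - start); covers all n requests over the whole interval.
    return n / (end - start)


starts = [0.0, 0.5, 1.0, 1.5]  # four requests, timestamps in seconds
end = 2.5                      # the last request finishes one second after it starts
print(fps_from_starts(starts))                 # 2.0
print(fps_from_span(len(starts), starts[0], end))  # 1.6
```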
* Mark `get_type_info_static()` as hidden
Each plugin linked with openvino library contains `type_info_static` symbols. In case when one of the libraries is unloaded and app tries to get opset, it leads to segfault. So mark `get_type_info_static()` as hidden to use only one implementation exactly from openvino lib
* Fix "'visibility' attribute ignored" issue by moving `TestPass` out of test scope
* Fix clang format
* Small update of `If` op
* Revert "fix 79520 (#10449)" to correctly compare DiscreteTypeInfo via `==`
This reverts commit 29883a152a.
The change fixes FQ fusions for subgraphs like 'Const weights'->FQ->Transpose->Multiply.
After the PullTransposeThroughFQUp transformation, we end up with the following:
'Const weights'->Transpose->FQ->Multiply. Because of the Transpose on first
FakeQuantize inputs, Multiply could not be fused since FakeQuantizeMulFusion
expected that weights is a Constant node.
Ticket: 77785
* Performance improvement for constant creation
The issue is that 'are_all_data_elements_bitwise_identical()' is called every time in the Constant constructor, and it potentially scans the whole buffer, which is O(N) complexity,
while it is needed only if the client uses 'get_all_data_elements_bitwise_identical'.
Solution:
- Defer calculation until first call of 'get_all_data_elements_bitwise_identical'
- Store calculated value in mutable class member to reuse it on next calls of 'get_all_data_elements_bitwise_identical'
Test verifies both cases:
a) that constant creation with shared memory data (now O(1)) is significantly faster than creation+bitwiseCheck O(N)
b) that, once calculated, the value is taken from the cache, which is significantly faster than re-calculation
* fix clang-format
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
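The deferred-calculation idea from the constant-creation commit above can be sketched in a few lines. This is an illustration only, not the C++ implementation: the class `LazyConstant`, its method names, and the `scans` counter are hypothetical, and the bitwise comparison is simplified to element equality.

```python
class LazyConstant:
    """Constant-like wrapper that defers the O(N) 'all elements identical' scan."""

    def __init__(self, data):
        self._data = data            # shared buffer, not copied: construction is O(1)
        self._all_identical = None   # cached result; None means "not computed yet"
        self.scans = 0               # counts full scans, for demonstration only

    def _scan(self):
        # O(N) pass over the buffer; an empty buffer counts as identical here.
        self.scans += 1
        it = iter(self._data)
        first = next(it, None)
        return all(x == first for x in it)

    def get_all_data_elements_identical(self):
        # Compute on first call, then reuse the cached value.
        if self._all_identical is None:
            self._all_identical = self._scan()
        return self._all_identical


c = LazyConstant([7, 7, 7, 7])
print(c.scans)                               # no scan at construction time
print(c.get_all_data_elements_identical())   # first call triggers the scan
print(c.get_all_data_elements_identical())   # second call uses the cache
print(c.scans)
```

Caching is only safe if the underlying data does not change after the first query. This sketch assumes an immutable buffer.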
* InputTensorInfo::from implementation
If user's application already has `ov::runtime::Tensor` object created,
it will be possible to reuse basic characteristics for input (shape, precision) from tensor using InputTensorInfo::from method
* Rename 'from' to 'set_from' because 'from' is a reserved keyword in Python (used for importing modules)
Python bindings: from ov.Tensor and from numpy array
* Style fix (quotes)
* Apply suggestions from code review
Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
* Fix code style
* Use set_from in hello_classification CPP sample
Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
* add placeholder for python version of first snippet
* fix problem with placeholder
* fix wrong file name
* fix fragment name
* update python snippets
* move imports to the top of the code fragments
* [GNA] Single lstm-cell test added
* Added additional config for test
* one more input and hidden shape
* Added cell with ReLU
Deleted deprecated test
* test added as lstm_cell_basic
* Enabled gna_compact_mode
Co-authored-by: Mikhail Ryzhov <mikhail.ryzhov@intel.com>
* enabled compact_mode in all tests
Co-authored-by: Mikhail Ryzhov <mikhail.ryzhov@intel.com>
* fix .ncc_style target names
it was breaking configure on system with libclang-12-dev, clang-12,
ninja and cmake 3.17+(ninja complains about duplicate
target). with lower cmake version configure succeeds, but build exits
immediately with error. by replacing ninja with make error becomes
warning(it's still significant, make just skips duplicate rules, i.e.
doesn't check style of some source files, rule duplication is genuine
bug). without libclang-12-dev and clang-12 ENABLE_NCC_STYLE is OFF and
bug is not triggered
* silence uninitialized warning in core_integration
probably it was always initialized before use, but compiler wasn't made
aware of it
* fix function spelling to unbreak code style checks in benchmark_app
* include <thread> for std::this_thread
existing code was relying on namespace pollution by old libstdc++
* replace is_pod with is_standard_layout && is_trivial
is_pod is deprecated, it breaks build on current gcc
Co-authored-by: Serhii Pavlovskyi <spavlovskyi@lohika.com>
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
When partial build is called for dryrun, do constant propagation too.
In the normal case, partial build skips constant propagation to save build time of the internal program.
However, if partial build is called with dryrun, it fails at transfer_constants because of generic nodes that do not have an impl.
* Update IEEngine with the Dynamic models support
* Update with the batch
* Method naming fix
* Update image_loader & tests with dynamic models
* Update test_sanity.py
* Replace custom_mo_config from the model
* Modified the workflow diagram
* Moved supported topology lists to separate topics
* Additional changes
* Removed Supported Topologies list and Deprecated pages
* Created the Model Conversion Tutorials section for instructions for specific models
* Topic names alignment, removed Default_Model_Optimizer_Optimizations.md
* Additional structural changes
* Fixed links
* heading fixes
* [MO] Remove IR frontend from available frontend list in MO
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix issue - forget to pass FEM
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix issue for TF with new FE and default legacy
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix result saving when batch size is not 1
* Remove useless if statement
* improved processing scores for model with more than one outputs
* added checking on count of model outputs
* improve if statements
* divide fix for model with several outputs to other PR
Co-authored-by: Maxim Gordeev <maxim.gordeev@intel.com>
* [GPU] update the condition for minimize_local_reorders
* Update to check needs reorder condition in quantize.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Add common test of the key PERFORMANCE_HINT for AUTO plugin API 2.0.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add common test case for config check.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Use the implemented property test case.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Written the draft of the specification of the operation RFFT.
* Started to write the specification of the operation IRFFT.
* Small fix.
* Renamed RFFT operation as RDFT.
* Fix in Operations_specifications.md.
* Written the specification of the operation IRDFT.
* Fixes in examples.
* Fixes in opset9.md and Operations_specifications.md.
* Small fix.
* Replaced opset8 by opset9 in opset9.md.
* Deleted redundant sentences.
* Small fix.
* Replaced input_shape by data_shape.
* Fixed mistypes.
* Fixes of mistypes.
* Fixed typo.
* Fixed RDFT specification, in order to perform signal_size input as in TF and PyTorch.
* Fixes in examples for RDFT.
* Fixes in the output shape calculation of IRDFT. Now this calculation is as in TF and PyTorch.
We assume that you are an enthusiastic coder who wants to contribute some code. For that purpose, the OpenVINO project has a repository on GitHub, to simplify everybody's life! All bug fixes, new functionality, new tutorials, etc. should be submitted via GitHub's pull request mechanism.
We welcome community contributions to OpenVINO™. Please read the following guide to learn how to find ideas for contribution, practices for good pull requests, checking your changes with our tests and more.
If you are not familiar with the mechanism - do not worry, it's very simple. Keep reading.
## Before you start contributing you should
- If you are submitting a new module, you should go into the [openvino_contrib](https://github.com/openvinotoolkit/openvino_contrib) repository by default.
- If you are going to fix a bug, check that it still exists. This can be done by building the latest [releases/2020/3](https://github.com/openvinotoolkit/openvino/tree/releases/2020/3) branch (LTS release) or the latest master branch, and making sure that the error is still reproducible there. We do not fix bugs that only affect older non-LTS releases, such as 2020.2 (more details about [branching strategy](https://github.com/openvinotoolkit/openvino/wiki/Branches)).
- Make sure that nobody beat you to fixing or reporting the issue by doing a search on the [GitHub OpenVINO issues](https://github.com/openvinotoolkit/openvino/issues) page, and make sure that nobody is already working on it. If someone is, you might provide support or suggestions in the issue or in the linked pull request.
- If you have a question about the software, then this is **NOT** the right place. You should open up a question at the [OpenVINO forum](https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/bd-p/distribution-openvino-toolkit). In order to post a decent question from the start, feel free to read the official forum guidelines.
- Make sure you agree to contribute your code under [OpenVINO™ (Apache 2.0)](https://github.com/openvinotoolkit/openvino/blob/master/LICENSE) license.
- Figure out what you’re going to contribute. If you don’t know what you are going to work on, navigate to the [GitHub "Issues" tab](https://github.com/openvinotoolkit/openvino/issues). Make sure that nobody is already working on the issue you pick. If someone is, you might provide support or suggestions in the issue or in the linked pull request.
- If you are going to fix a bug, check that it still exists in the latest release. This can be done by building the latest master branch and making sure that the error is still reproducible there. We do not fix bugs that only affect older non-LTS releases, such as 2020.2 (more details about [branching strategy](https://github.com/openvinotoolkit/openvino/wiki/Branches)).
Before you open up anything on the OpenVINO GitHub page, be sure that you are at the right place with your problem.
## "Fork & Pull Request model" for code contribution
### [](https://github.com/openvinotoolkit/openvino/blob/master/CONTRIBUTING.md#the-instruction-in-brief)The instruction in brief
- Register at GitHub. Create your fork of OpenVINO™ repository [https://github.com/openvinotoolkit/openvino](https://github.com/openvinotoolkit/openvino) (see [https://help.github.com/articles/fork-a-repo](https://help.github.com/articles/fork-a-repo) for details).
- Install Git.
- Set your user name and email address in a Git configuration according to GitHub account (see [https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup) for details).
- Choose a task for yourself. It could be a bugfix or some new code.
- Choose a base branch for your work. More details about branches and policies are here: [Branches](https://github.com/openvinotoolkit/openvino/wiki/Branches)
- Clone your fork to your computer.
- Create a new branch (with a meaningful name) from the base branch you chose.
- Modify / add the code following our [Coding Style Guide](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLines) and [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation).
- Modify / add the code following our [Coding Style Guide](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLines).
- If you want to add a new sample, please look at this [Guide for contributing to C++/C/Python IE samples](https://github.com/openvinotoolkit/openvino/wiki/SampleContribute)
- If you want to contribute to the documentation and want to add a new guide, follow the [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation).
- Run testsuite locally:
- execute each test binary from the artifacts directory, e.g. `<source dir>/bin/intel64/Release/ieFuncTests`
- If you contribute to the documentation and want to add a new guide:
- Create a new markdown file in an appropriate folder.
- **REQUIRED:** The document title must contain a document label in the form `{#openvino_docs_<name>}`. For example: `Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™ {#openvino_docs_MO_DG_IR_and_opsets}`.
- Add your file to the documentation structure. Open the documentation structure file [`docs/doxygen/ie_docs.xml`](https://github.com/openvinotoolkit/openvino/blob/master/docs/doxygen/ie_docs.xml) and add your file path to the appropriate section.
- When you are done, make sure that your branch is up to date with the latest state of the branch you want to contribute to (e.g. `git fetch upstream && git merge upstream/master`), push your branch to your GitHub fork, then create a pull request from your branch to the base branch (see [https://help.github.com/articles/using-pull-requests](https://help.github.com/articles/using-pull-requests) for details).
## Making a good pull request
Following these guidelines will increase the likelihood of your pull request being accepted:
- Before pushing your PR to the repository, make sure that it builds perfectly fine on your local system.
- Add enough information, like a meaningful title, the reason why you made the commit and a link to the issue page if you opened one for this PR.
- Scope your PR to one issue. Before submitting, make sure the diff contains no unrelated changes. If you want to cover more than one issue, submit your changes for each as separate pull requests.
- If you have added new functionality, you should update/create the relevant documentation, as well as add tests for it to the testsuite.
- Try not to include "oops" commits - ones that just fix an error in the previous commit. If you have those, then before submitting [squash](https://github.com/openvinotoolkit/openvino/wiki/Contribute#https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#Squashing-Commits) those fixes directly into the commits where they belong.
- Make sure to choose the right base branch and to follow the [Coding Style Guide](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLines) for your code or the [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation) if you are changing documentation files.
- Make sure to add a test for new functionality, or a test that reproduces the fixed bug, with related test data. Please do not add extra images or videos if existing media files are suitable.
- One PR – one issue.
- Build perfectly on your local system.
- Choose the right base branch (see [Branches](https://github.com/openvinotoolkit/openvino/wiki/Branches)).
- Follow the [Coding Style Guide](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLines) for your code.
- Update documentation using [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation) if needed.
- Cover your changes with tests.
- Add a license header at the top of new files ([C++ example](https://github.com/openvinotoolkit/openvino/blob/master/samples/cpp/classification_sample_async/main.cpp#L1-L2), [Python example](https://github.com/openvinotoolkit/openvino/blob/master/samples/python/hello_classification/hello_classification.py#L3-L4)).
- Add enough information: a meaningful title, the reason why you made the commit and a link to the issue page if one exists.
- Remove changes unrelated to the PR.
- If it is still WIP and you want to check CI test results early then use _Draft_ PR.
- Submit your PR and become an OpenVINO™ contributor!
## Testing and merging pull requests
- Your pull request will be automatically tested by OpenVINO's precommit (testing statuses are automatically reported as "green" or "red" circles in precommit steps on the PR's page). If any builders have failed, you should fix the issue. To rerun the automatic builds, just push changes to your branch on GitHub. No need to close the pull request and open a new one!
- Once all the builders are "green", one of the OpenVINO developers will review your code. The reviewer may ask you to modify your pull request. Please respond to reviewers in a timely manner (within weeks, not months); otherwise your submission could be postponed or even rejected.
Your pull request will be automatically tested by OpenVINO™'s precommit (testing statuses are automatically reported as "green" or "red" circles in precommit steps on the PR's page). If any builders have failed, you need to fix the issue. To rerun the automatic builds, just push changes to your branch on GitHub. No need to close the pull request and open a new one!
## PR review good practices
- Originator is responsible for driving the review of changes and should ping reviewers periodically.
- Originator should close comments from the Reviewer when they are resolved. The Reviewer may re-open a comment if they do not agree with the resolution.
- Originator should request re-review from the Reviewer when all comments are resolved by pushing the button in the “Reviewers” section.
- If it is still WIP and you want to check CI test results early then use _Draft_ PR.
- Do **NOT** rewrite history (push -f) once you have converted a draft PR into a regular one; add new commits instead. Looking at diffs makes review easier.
- Write meaningful descriptions of commits resulting from review. _"Addressing review comments"_ is **NOT** a good description! A quick look at good descriptions can tell you a lot about what is going on in a PR without the need to go through all of the resolved comments.
## Merging PR
As soon as the reviewer is fine with the pull request and Precommit likes your code and shows "green" status, the "Approved" review status is put, which signals OpenVINO maintainers that they can merge your pull request.
As soon as the reviewer is fine with the pull request and precommit shows "green" status, the "Approved" review status is set, which signals to OpenVINO™ maintainers that they can merge your pull request.
This toolkit allows developers to deploy pre-trained deep learning models
through a high-level OpenVINO™ Runtime C++ and Python APIs integrated with application logic.
## Contents:
- [Products which use OpenVINO](#products-which-use-openvino)
- [System requirements](#system-requirements)
- [How to build](#how-to-build)
- [How to contribute](#how-to-contribute)
- [Get a support](#get-a-support)
- [See also](#see-also)
## What is OpenVINO toolkit?
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
- Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks
- Use models trained with popular frameworks like TensorFlow, PyTorch and more
- Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud
This open-source version includes several components: namely [Model Optimizer], [OpenVINO™ Runtime], [Post-Training Optimization Tool], as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics.
It supports pre-trained models from the [Open Model Zoo], along with 100+ open
source and public models in popular formats such as TensorFlow, ONNX, PaddlePaddle, MXNet, Caffe, Kaldi.
## Repository components
* [OpenVINO™ Runtime]
* [Model Optimizer]
* [Post-Training Optimization Tool]
* [Samples]
### Components
* [OpenVINO™ Runtime] - is a set of C++ libraries with C and Python bindings providing a common API to deliver inference solutions on the platform of your choice.
* [core](https://github.com/openvinotoolkit/openvino/tree/master/src/core) - provides the base API for model representation and modification.
* [inference](https://github.com/openvinotoolkit/openvino/tree/master/src/inference) - provides an API to infer models on device.
* [transformations](https://github.com/openvinotoolkit/openvino/tree/master/src/common/transformations) - contains the set of common transformations which are used in OpenVINO plugins.
* [low precision transformations](https://github.com/openvinotoolkit/openvino/tree/master/src/common/low_precision_transformations) - contains the set of transformations which are used in low precision models
* [bindings](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings) - contains all available OpenVINO bindings which are maintained by the OpenVINO team.
* [c](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings/c) - provides C API for OpenVINO™ Runtime
* [python](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings/python) - Python API for OpenVINO™ Runtime
* [Plugins](https://github.com/openvinotoolkit/openvino/tree/master/src/plugins) - contains OpenVINO plugins which are maintained in open-source by the OpenVINO team. For more information, please take a look at the [list of supported devices](#supported-hardware-matrix).
* [Frontends](https://github.com/openvinotoolkit/openvino/tree/master/src/frontends) - contains available OpenVINO frontends which allow reading models in native framework formats.
* [Model Optimizer] - is a cross-platform command-line tool that facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.
* [Post-Training Optimization Tool] - is designed to accelerate the inference of deep learning models by applying special methods without model retraining or fine-tuning, for example, post-training 8-bit quantization.
* [Samples] - applications in C, C++ and Python that show basic OpenVINO use cases.
## Supported Hardware matrix
The OpenVINO™ Runtime can infer models on different hardware devices. This section provides the list of supported devices.
<td>Auto batch plugin performs on-the-fly automatic batching (i.e. grouping inference requests together) to improve device utilization, with no programming effort from the user</td>
* [Intel® Distribution of OpenVINO™ toolkit Product Page](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html)
* [Intel® Distribution of OpenVINO™ toolkit Release Notes](https://software.intel.com/en-us/articles/OpenVINO-RelNotes)
## Documentation
### User documentation
The latest documentation for OpenVINO™ Toolkit is available [here](https://docs.openvino.ai/). This documentation contains detailed information about all OpenVINO components and provides everything you may need to create an application based on a binary OpenVINO distribution or on your own OpenVINO build without source code modification.
### Developer documentation
[Developer documentation](#todo-add) contains information about architectural decisions which are applied inside the OpenVINO components. This documentation has all necessary information which could be needed in order to contribute to OpenVINO.
Please take a look at the [OpenVINO Wiki](https://github.com/openvinotoolkit/openvino/wiki#how-to-build) to get more information about the OpenVINO build process.
## How to contribute
See [CONTRIBUTING](./CONTRIBUTING.md) for details. Thank you!
## Get a support
## Support
Please report questions, issues and suggestions using:
* The [`openvino`](https://stackoverflow.com/questions/tagged/openvino) tag on StackOverflow\*
* [Intel® Distribution of OpenVINO™ toolkit Product Page](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html)
* [Intel® Distribution of OpenVINO™ toolkit Release Notes](https://software.intel.com/en-us/articles/OpenVINO-RelNotes)
* [Neural Network Compression Framework (NNCF)](https://github.com/openvinotoolkit/nncf) - a suite of advanced algorithms for model inference optimization including quantization, filter pruning, binarization and sparsity
* [OpenVINO™ Training Extensions (OTE)](https://github.com/openvinotoolkit/training_extensions) - convenient environment to train Deep Learning models and convert them using OpenVINO for optimized inference.
* [OpenVINO™ Model Server (OVMS)](https://github.com/openvinotoolkit/model_server) - a scalable, high-performance solution for serving deep learning models optimized for Intel architectures
* [DL Workbench](https://docs.openvino.ai/nightly/workbench_docs_Workbench_DG_Introduction.html) - An alternative, web-based version of OpenVINO designed to make production of pretrained deep learning models significantly easier.
* [Computer Vision Annotation Tool (CVAT)](https://github.com/openvinotoolkit/cvat) - an online, interactive video and image annotation tool for computer vision purposes.
* [Dataset Management Framework (Datumaro)](https://github.com/openvinotoolkit/datumaro) - a framework and CLI tool to build, transform, and analyze datasets.
---
\* Other names and brands may be claimed as the property of others.
[Open Model Zoo]:https://github.com/openvinotoolkit/open_model_zoo
User Guide <workbench_docs_Workbench_DG_User_Guide>
workbench_docs_Workbench_DG_Troubleshooting
@endsphinxdirective
Deep Learning Workbench (DL Workbench) is an official OpenVINO™ graphical interface designed to make the production of pretrained deep learning Computer Vision and Natural Language Processing models significantly easier.
Minimize the inference-to-deployment workflow timing for neural models right in your browser: import a model, analyze its performance and accuracy, visualize the outputs, optimize and make the final model deployment-ready in a matter of minutes. DL Workbench takes you through the full OpenVINO™ workflow, providing the opportunity to learn about various toolkit components.
DL Workbench enables you to get a detailed performance assessment, explore inference configurations, and obtain an optimized model ready to be deployed on various Intel® configurations, such as client and server CPU, Intel® Processor Graphics (GPU), Intel® Movidius™ Neural Compute Stick 2 (NCS 2), and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
DL Workbench also provides the [JupyterLab environment](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Jupyter_Notebooks.html#doxid-workbench-docs-workbench-d-g-jupyter-notebooks) that helps you quickly get started with the OpenVINO™ API and command-line interface (CLI). Follow the full OpenVINO workflow created for your model and learn about different toolkit components.
DL Workbench helps achieve your goals depending on the stage of your deep learning journey.
If you are a beginner in the deep learning field, the DL Workbench provides you with
learning opportunities:
* Learn what neural networks are, how they work, and how to examine their architectures.
* Learn the basics of neural network analysis and optimization before production.
* Get familiar with the OpenVINO™ ecosystem and its main components without installing it on your system.
If you have enough experience with neural networks, DL Workbench provides you with a
convenient web interface to optimize your model and prepare it for production:
* Measure and interpret model performance.
* Tune the model for enhanced performance.
* Analyze the quality of your model and visualize output.
## General Workflow
The diagram below illustrates the typical DL Workbench workflow. Click to see the full-size image:

Get a quick overview of the workflow in the DL Workbench User Interface:

## OpenVINO™ Toolkit Components
The intuitive web-based interface of the DL Workbench enables you to easily use various
OpenVINO™ toolkit components:
| Component | Description |
|------------------|------------------|
| [Open Model Zoo](https://docs.openvinotoolkit.org/latest/omz_tools_downloader.html) | Get access to the collection of high-quality pre-trained deep learning [public](https://docs.openvinotoolkit.org/latest/omz_models_group_public.html) and [Intel-trained](https://docs.openvinotoolkit.org/latest/omz_models_group_intel.html) models trained to solve a variety of tasks. |
| [Model Optimizer](https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) | Optimize and transform models trained in supported frameworks to the IR format. <br>Supported frameworks include TensorFlow\*, Caffe\*, Kaldi\*, MXNet\*, and ONNX\* format. |
| [Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html) | Estimate deep learning model inference performance on supported devices. |
| [Accuracy Checker](https://docs.openvinotoolkit.org/latest/omz_tools_accuracy_checker.html) | Evaluate the accuracy of a model by collecting one or several metric values. |
| [Post-Training Optimization Tool](https://docs.openvinotoolkit.org/latest/pot_README.html) | Optimize pretrained models by lowering the precision of a model from floating-point precision (FP32 or FP16) to integer precision (INT8), without the need to retrain or fine-tune models. |
# Introduction to Model Processing {#openvino_docs_model_processing_introduction}
Every deep learning workflow begins with obtaining a model. You can choose to prepare a custom one, use a ready-made solution and adjust it to your needs, or even download and run a pre-trained network from an online database, such as OpenVINO's [Open Model Zoo](../model_zoo.md).
This section describes how to obtain and prepare your model for work with OpenVINO to get the best inference results:
* [Browse a database of models for use in your projects](../model_zoo.md).
* [Convert different model formats to the OpenVINO IR format](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
* [Automate model-related tasks with Model Downloader and additional OMZ Tools](https://docs.openvino.ai/latest/omz_tools_downloader.html).
OpenVINO™ is not just one tool. It is an expansive ecosystem of utilities, providing a comprehensive workflow for deep learning solution development. Learn more about each of them to reach the full potential of OpenVINO™ Toolkit.
### OpenVINO™ Model Server (OVMS)
OpenVINO Model Server is a scalable, high-performance solution for serving deep learning models optimized for Intel® architectures. The server uses Inference Engine libraries as a backend and exposes gRPC and HTTP/REST interfaces for inference that are fully compatible with TensorFlow Serving.
* [Red Hat Ecosystem Catalog](https://catalog.redhat.com/software/container-stacks/detail/60649e41ccfb383fe395a167)
### Neural Network Compression Framework (NNCF)
A suite of advanced algorithms for Neural Network inference optimization with minimal accuracy drop. NNCF applies quantization, filter pruning, binarization and sparsity algorithms to PyTorch and TensorFlow models during training.
A solution empowering TensorFlow developers with OpenVINO's optimization capabilities. With just two lines of code in your application, you can offload inference to OpenVINO, while keeping the TensorFlow API.
A streaming media analytics framework, based on the GStreamer multimedia framework, for creating complex media analytics pipelines.
More resources:
* [documentation on GitHub](https://openvinotoolkit.github.io/dlstreamer_gst/)
* [installation Guide on GitHub](https://github.com/openvinotoolkit/dlstreamer_gst/wiki/Install-Guide)
### DL Workbench
A web-based tool for deploying deep learning models. Built on the core of OpenVINO and equipped with a graphics user interface, DL Workbench is a great way to explore the possibilities of the OpenVINO workflow, import, analyze, optimize, and build your pre-trained models. You can do all that by visiting [Intel® DevCloud for the Edge](https://software.intel.com/content/www/us/en/develop/tools/devcloud.html) and launching DL Workbench on-line.
# OpenVINO™ integration with TensorFlow {#ovtf_integration}
**OpenVINO™ integration with TensorFlow** is a solution for TensorFlow developers who want to get started with OpenVINO™ in their inferencing applications. By adding just two lines of code you can now take advantage of OpenVINO™ toolkit optimizations with TensorFlow inference applications across a range of Intel® computation devices.
This is all you need:
```python
import openvino_tensorflow
openvino_tensorflow.set_backend('<backend_name>')
```
**OpenVINO™ integration with TensorFlow** accelerates inference across many AI models on a variety of Intel® technologies, such as:
- Intel® CPUs
- Intel® integrated GPUs
- Intel® Movidius™ Vision Processing Units - referred to as VPU
- Intel® Vision Accelerator Design with 8 Intel Movidius™ MyriadX VPUs - referred to as VAD-M or HDDL
> **NOTE**: For maximum performance, efficiency, tooling customization, and hardware control, we recommend that developers adopt native OpenVINO™ solutions.
To find out more about the product itself, as well as learn how to use it in your project, check its dedicated [GitHub repository](https://github.com/openvinotoolkit/openvino_tensorflow/tree/master/docs).
To see what you can do with **OpenVINO™ integration with TensorFlow**, explore the demos located in the [examples folder](https://github.com/openvinotoolkit/openvino_tensorflow/tree/master/examples) in our GitHub repository.
Sample tutorials are also hosted on [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/edge/build/ovtfoverview.html). The demo applications are implemented using Jupyter Notebooks. You can interactively execute them on Intel® DevCloud nodes, compare the results of **OpenVINO™ integration with TensorFlow**, native TensorFlow, and OpenVINO™.
## License
**OpenVINO™ integration with TensorFlow** is licensed under [Apache License Version 2.0](https://github.com/openvinotoolkit/openvino_tensorflow/blob/master/LICENSE).
By contributing to the project, you agree to the license and copyright terms therein
and release your contribution under these terms.
## Support
Submit your questions, feature requests and bug reports via [GitHub issues](https://github.com/openvinotoolkit/openvino_tensorflow/issues).
## How to Contribute
We welcome community contributions to **OpenVINO™ integration with TensorFlow**. If you have an idea for improvement:
* Share your proposal via [GitHub issues](https://github.com/openvinotoolkit/openvino_tensorflow/issues).
* Submit a [pull request](https://github.com/openvinotoolkit/openvino_tensorflow/pulls).
We will review your contribution as soon as possible. If any additional fixes or modifications are necessary, we will guide you and provide feedback. Before you make your contribution, make sure you can build **OpenVINO™ integration with TensorFlow** and run all the examples with your fix/patch. If you want to introduce a large feature, create test cases for your feature. Once we have verified your pull request, we will merge it to the repository, provided that it meets the above-mentioned requirements and proves acceptable.
---
\* Other names and brands may be claimed as the property of others.
# How to Implement Custom GPU Operations {#openvino_docs_Extensibility_UG_GPU}
To enable operations not supported by OpenVINO out of the box, you may need an extension for an OpenVINO operation set, and a custom kernel for the device you will target. This page describes custom kernel support for the GPU device.
The GPU codepath abstracts many details about OpenCL. You need to provide the kernel code in OpenCL C and an XML configuration file that connects the kernel and its parameters to the parameters of the operation.
There are two options for using the custom operation configuration file:
* Include a section with your kernels into the automatically-loaded `<lib_path>/cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file.
* Call the `ov::Core::set_property()` method from your application with the `"CONFIG_FILE"` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
All OpenVINO samples, except the trivial `hello_classification`, and most Open Model Zoo demos
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:
`Kernel` node contains all kernel source code configuration.
**Sub-nodes**: `Source` (1+), `Define` (0+)
### Source Node and Sub-Node Structure
`Source` node points to a single OpenCL source file.
| Attribute Name | \# |Description|
|-----|-----|-----|
| `filename` | (1) | Name of the file containing OpenCL source code. Note that the path is relative to your executable. Multiple source nodes will have their sources concatenated in order. |
**Sub-nodes**: None
### Define Node and Sub-Node Structure
`Define` node configures a single `#‍define` instruction to be added to
the sources during compilation (JIT).
| Attribute Name | \# | Description |
|------|-------|------|
| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well, which is taken as a string. |
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. |
| `default` | (0/1) | The default value to be used if the specified parameters are missing from the operation in the IR. |
**Sub-nodes:** None
The resulting JIT has the following form:
`#‍define [name] [type] [value/default]`.
### Buffers Node and Sub-Node Structure
`Buffers` node configures all input/output buffers for the OpenCL entry
function. No buffers node structure exists.
**Sub-nodes:** `Data` (0+), `Tensor` (1+)
### Data Node and Sub-Node Structure
`Data` node configures a single input with static data, for example,
weights or biases.
| Attribute Name | \# | Description |
|----|-----|------|
| `name` | (1) | Name of a blob attached to an operation in the IR |
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to |
**Sub-nodes**: None
### Tensor Node and Sub-Node Structure
`Tensor` node configures a single input or output tensor.
| Attribute Name | \# | Description |
|------|-------|-------|
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. |
| `type` | (1) | `input` or `output` |
| `port-index` | (1) | 0-based index in the operation input/output ports in the IR |
| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB`, and same values in all lowercase. Default value: `BFYX` |
### CompilerOptions Node and Sub-Node Structure
`CompilerOptions` node configures the compilation flags for the OpenCL
sources.
| Attribute Name | \# | Description |
|--------|-----|------|
| `options` | (1) | Options string to be passed to the OpenCL compiler |
**Sub-nodes**: None
### WorkSizes Node and Sub-Node Structure
`WorkSizes` node configures the global/local work sizes to be used when
queuing an OpenCL program for execution.
| Attribute Name | \# | Description |
|-----|------|-----|
| `global`<br>`local` | (0/1)<br>(0/1) | An array of up to three integers or formulas for defining OpenCL work-sizes to be used during execution.<br> The formulas can use the values of the B,F,Y,X dimensions and contain the operators: +,-,/,\*,%. All operators are evaluated in integer arithmetic. <br>Default value: `global="B*F*Y*X" local=""` |
| `dim` | (0/1) | A tensor to take the work-size from. Accepted values: `input N`, `output`, where `N` is an index of input tensor starting with 0. Default value: `output` |
**Sub-nodes**: None
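As a rough illustration of how the `global` and `local` formulas described above resolve to numbers, the sketch below evaluates them in Python using integer arithmetic only. It is not the plugin's parser: the helper names `eval_formula` and `work_sizes`, the comma-separated attribute syntax, and the sample dimensions are assumptions made for this example.

```python
import ast
import operator

# The operators listed for WorkSizes formulas: + - / * %
# "/" is mapped to floor division, which matches C integer division
# for the non-negative values that tensor dimensions take.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,
    ast.Mod: operator.mod,
}


def eval_formula(formula, dims):
    """Evaluate one work-size formula over the B, F, Y, X dimensions."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id in dims:
            return dims[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {formula!r}")

    return walk(ast.parse(formula, mode="eval"))


def work_sizes(spec, dims):
    """Turn an attribute value such as "X,Y,B*F" into a list of integers.

    The comma-separated form is an assumption of this sketch.
    """
    return [eval_formula(part.strip(), dims)
            for part in spec.split(",") if part.strip()]


# Hypothetical tensor shape, B=1, F=96, Y=55, X=55
dims = {"B": 1, "F": 96, "Y": 55, "X": 55}
print(work_sizes("X,Y,B*F", dims))   # three-dimensional work size
print(work_sizes("B*F*Y*X", dims))   # the documented default for `global`
print(work_sizes("", dims))          # the documented default for `local`
```

With these sample dimensions, `"X,Y,B*F"` yields `[55, 55, 96]`, the default `"B*F*Y*X"` yields `[290400]`, and an empty `local` string yields an empty list.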
## Example Configuration File
The following code sample provides an example configuration file in XML
format. For information on the configuration file structure, see
The following table includes definitions that are attached before
user sources.
For an example, see [Example Kernel](#example-kernel).
| Name | Value |
|---|---|
| `NUM_INPUTS` | Number of the input tensors bound to this kernel |
| `GLOBAL_WORKSIZE` | An array of global work sizes used to execute this kernel |
| `GLOBAL_WORKSIZE_SIZE` | The size of the `GLOBAL_WORKSIZE` array |
| `LOCAL_WORKSIZE` | An array of local work sizes used to execute this kernel |
| `LOCAL_WORKSIZE_SIZE` | The size of the `LOCAL_WORKSIZE` array |
| `<TENSOR>_DIMS`| An array of the tensor dimension sizes. Always ordered as `BFYX` |
| `<TENSOR>_DIMS_SIZE`| The size of the `<TENSOR>_DIMS` array.|
| `<TENSOR>_TYPE`| The datatype of the tensor: `float`, `half`, or `char`|
| `<TENSOR>_FORMAT_<TENSOR_FORMAT>` | The format of the tensor, BFYX, BYXF, YXFB , FYXB, or ANY. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with `#‍ifdef/#‍endif`. |
| `<TENSOR>_LOWER_PADDING` | An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.|
| `<TENSOR>_LOWER_PADDING_SIZE` | The size of the `<TENSOR>_LOWER_PADDING` array |
| `<TENSOR>_UPPER_PADDING` | An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX. |
| `<TENSOR>_UPPER_PADDING_SIZE` | The size of the `<TENSOR>_UPPER_PADDING` array |
| `<TENSOR>_PITCHES` | The offset (in elements) between adjacent elements in each dimension. Always ordered as BFYX.|
| `<TENSOR>_PITCHES_SIZE`| The size of the `<TENSOR>_PITCHES` array |
| `<TENSOR>_OFFSET`| The number of elements from the start of the tensor to the first valid element, bypassing the lower padding. |
All `<TENSOR>` values are automatically defined for every tensor
bound to this operation, such as `INPUT0`, `INPUT1`, and `OUTPUT0`, as shown
in the following example:
```c
#define INPUT0_DIMS_SIZE 4
#define INPUT0_DIMS (int []){ 1,96,55,55, }
```
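To make the `<TENSOR>_PITCHES` and `<TENSOR>_OFFSET` definitions from the table above more concrete, here is a minimal Python sketch that derives them for a densely stored `BFYX` tensor. The dense layout, the helper names, and the padding values are assumptions made for illustration; the actual values are generated by the plugin.

```python
def pitches_and_offset(dims, lower_pad, upper_pad):
    """Return (pitches, offset) in elements for a dense BFYX tensor.

    dims, lower_pad and upper_pad are 4-element sequences ordered as BFYX.
    """
    # Size of every dimension once padding on both sides is included.
    padded = [lo + d + up for lo, d, up in zip(lower_pad, dims, upper_pad)]

    # The innermost dimension (X) is contiguous, so its pitch is 1 element.
    pitches = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        pitches[i] = pitches[i + 1] * padded[i + 1]

    # Elements to skip to reach the first valid element past the lower padding.
    offset = sum(lo * p for lo, p in zip(lower_pad, pitches))
    return pitches, offset


def element_index(coords, pitches, offset):
    """Linear index (in elements, not bytes) of the element at (b, f, y, x)."""
    return offset + sum(c * p for c, p in zip(coords, pitches))


# Shape from the INPUT0_DIMS example above, without padding.
pitches, offset = pitches_and_offset([1, 96, 55, 55], [0, 0, 0, 0], [0, 0, 0, 0])
print(pitches, offset)

# The same shape with a hypothetical one-element border in Y and X.
print(pitches_and_offset([1, 96, 55, 55], [0, 0, 1, 1], [0, 0, 1, 1]))
```

Without padding, the pitches are `[290400, 3025, 55, 1]` and the offset is `0`. With the one-element border, the pitches grow to `[311904, 3249, 57, 1]` and the offset becomes `58`, that is, one padded row plus one padded column.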
## Example Kernel<a name="example-kernel"></a>
```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void example_relu_kernel(
    const __global INPUT0_TYPE* input0,
          __global OUTPUT0_TYPE* output)
{
    const uint idx  = get_global_id(0);
    const uint idy  = get_global_id(1);
    const uint idbf = get_global_id(2); // batches*features, as OpenCL supports 3D nd-ranges only
    const uint feature = idbf % OUTPUT0_DIMS[1];
    const uint batch   = idbf / OUTPUT0_DIMS[1];
    //notice that pitches are in elements, not in bytes!
```
Custom operations, that is those not included in the list, are not recognized by OpenVINO™ out-of-the-box. Therefore, creating Intermediate Representation (IR) for a model using them requires additional steps. This guide illustrates the workflow for running inference on topologies featuring custom operations, allowing you to plug in your own implementation for existing or completely new operations.
Custom operations, that is those not included in the list, are not recognized by OpenVINO™ out-of-the-box. The need for a custom operation may appear in two main cases:
If your model contains operations not normally supported by OpenVINO™, the OpenVINO™ Extensibility API lets you add support for those custom operations and use one implementation for Model Optimizer and OpenVINO™ Runtime.
1. A regular framework operation that is new or rarely used, which is why it hasn’t been implemented in OpenVINO yet.
There are two steps to support inference of a model with custom operation(s):
1. Add support for a [custom operation in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) so
the Model Optimizer can generate the IR with the operation.
2. Create a custom operation in it as described in the [Custom Operation](add_openvino_ops.md).
2. A new user operation that was created for some specific model topology by a model author using framework extension capabilities.
## OpenVINO™ Extensions
Importing models with such operations requires additional steps. This guide illustrates the workflow for running inference on models featuring custom operations, allowing you to plug in your own implementation for them. OpenVINO™ Extensibility API lets you add support for those custom operations and use one implementation for Model Optimizer and OpenVINO™ Runtime.
OpenVINO™ provides extensions for:
Defining a new custom operation basically consists of two parts:
- Enables the use of `ov::Core::read_model` to read models with unsupported operations
- Provides a shape inference mechanism for custom operations
- Provides an evaluate method, which allows the operation to be supported on CPU or used for constant folding
1. Definition of operation semantics in OpenVINO, the code that describes how this operation should be inferred consuming input tensor(s) and producing output tensor(s). How to implement execution kernels for [GPU](./GPU_Extensibility.md) and [VPU](./VPU_Extensibility.md) is described in separate guides.
> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details. You can review the complete code, which is fully compilable and up-to-date, to see how it works.
2. Mapping rule that facilitates conversion of framework operation representation to OpenVINO defined operation semantics.
## Load extensions to OpenVINO™ Runtime
The first part is required for inference, the second part is required for successful import of a model containing such operations from the original framework model format. There are several options to implement each part, the next sections will describe them in detail.
## Definition of Operation Semantics
If the custom operation can be mathematically represented as a combination of existing OpenVINO operations and such a decomposition gives the desired performance, then a low-level operation implementation is not required. When deciding whether such a decomposition is feasible, refer to the latest OpenVINO operation set. You can use any valid combination of existing operations. How to map a custom operation is described in the next section of this document.
If such a decomposition is not possible, or appears too bulky with many constituent operations that do not perform well, then a new class for the custom operation should be implemented as described in the [Custom Operation Guide](add_openvino_ops.md).
Prefer implementing a custom operation class if you already have a generic C++ implementation of the operation kernel. Otherwise, try to decompose the operation first as described above. After verifying the correctness of inference and the resulting performance, you can optionally invest in a bare-metal C++ implementation.
## Mapping from Framework Operation
Depending on the model format used for import, the mapping of a custom operation is implemented differently. Choose one of the following:
1. If the model is represented in ONNX (including models exported from PyTorch to ONNX) or PaddlePaddle format, use one of the classes from the [Frontend Extension API](frontend_extensions.md). It consists of several classes available in C++, which can be used with the Model Optimizer `--extensions` option or when a model is imported directly to OpenVINO Runtime using the `read_model` method. A Python API is also available for run-time model importing.
2. If the model is represented in TensorFlow, Caffe, Kaldi or MXNet format, use [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md). This approach is available for model conversion in Model Optimizer only.
The two approaches coexist because OpenVINO uses two types of frontends for model conversion: new frontends (ONNX, PaddlePaddle) and legacy frontends (TensorFlow, Caffe, Kaldi and Apache MXNet). Model Optimizer can use both types of frontends, whereas direct import of a model with the `read_model` method can use new frontends only. Follow the appropriate guide referenced above to implement mappings for your framework frontend.
If you are implementing extensions for the new ONNX or PaddlePaddle frontends and plan to use the Model Optimizer `--extensions` option for model conversion, then the extensions should be:
1. Implemented in C++ only
2. Compiled as a separate shared library (see details on how to do that later in this guide).
You cannot write new frontend extensions using the Python API if you plan to use them with Model Optimizer.
The remainder of this guide uses the Frontend Extension API, which applies to new frontends.
## Registering Extensions
A custom operation class and a new mapping frontend extension class object should be registered to be usable in OpenVINO Runtime.
> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details based on a minimalistic `Identity` operation that is a placeholder for your real custom operation. You can review the complete code, which is fully compilable, to see how it works.
To load the extensions to the `ov::Core` object, use the `ov::Core::add_extension` method. It lets you load a library with extensions or register extensions from code.
Extensions can be loaded from code with the `ov::Core::add_extension` method:
`Identity` is a custom operation class defined in the [Custom Operation Guide](add_openvino_ops.md). This is enough to enable reading an IR that uses the `Identity` extension operation emitted by Model Optimizer. To load the original model directly to the runtime, you also need to add a mapping extension:
When the Python API is used, there is no way to implement a custom OpenVINO operation. Also, even if a custom OpenVINO operation is implemented in C++ and loaded to the runtime through a shared library, there is still no way to add a frontend mapping extension that refers to this custom operation. Use the C++ shared library approach to implement both operation semantics and framework mapping in this case.
You can still use Python for operation mapping and decomposition if only operations from the standard OpenVINO operation set are used.
### Create library with extensions
You need to create an extension library in the following cases:
- Load extensions to Model Optimizer
- Load extensions to Python application
You need to create an extension library in the following cases:
- Convert a model with custom operations in Model Optimizer.
- Load a model with custom operations in a Python application. This applies to both framework models and IR.
- Load models with custom operations in tools that support loading extensions from a library, for example `benchmark_app`.
If you want to create an extension library, for example to load these extensions to the Model Optimizer, perform the following steps:
Create an entry point for the extension library. OpenVINO™ provides the `OPENVINO_CREATE_EXTENSIONS()` macro, which lets you define an entry point to a library with OpenVINO™ Extensions.
After the build, you can use the path to your extension library to load your extensions to OpenVINO™ Runtime:
# How to Implement Custom Layers for VPU (Intel® Neural Compute Stick 2) {#openvino_docs_Extensibility_UG_VPU_Kernel}
To enable operations not supported by OpenVINO™ out of the box, you need a custom extension for Model Optimizer, a custom nGraph operation set, and a custom kernel for the device you will target. This page describes custom kernel support for one VPU device, the Intel® Neural Compute Stick 2, which uses the MYRIAD device plugin.
> **NOTES:**
> * OpenCL\* custom layer support is available in the preview mode.
> * This section assumes you are familiar with developing kernels using OpenCL.
To customize your topology with an OpenCL layer, carry out the tasks described on this page:
1. Write and compile your OpenCL code with the standalone offline OpenCL compiler (`clc`).
2. Write a configuration file to bind the OpenCL kernel to the topology file (`.xml`) of the model IR.
3. Pass the configuration file to the OpenVINO™ Runtime with the model IR.
> **NOTE**: The OpenCL compiler, which targets only the SHAVE* processor of the Intel® Neural Compute Stick 2, is redistributed with OpenVINO.
OpenCL support is provided by ComputeAorta* and is distributed under a license agreement between Intel® and Codeplay* Software Ltd.
The OpenCL toolchain for the Intel® Neural Compute Stick 2 supports offline compilation only, so first compile OpenCL C code using the standalone `clc` compiler. You can find the compiler binary at `<INSTALL_DIR>/tools/cl_compiler`.
> **NOTE**: By design, custom OpenCL layers support any OpenCL kernels written assuming OpenCL version 1.2. It also supports half float extension and is optimized for this type, because it is a native type for Intel® Movidius™ VPUs.
1. Prior to running a compilation, make sure that the following variables are set:
2. Run the compilation with the command below. You should use `--strip-binary-header` to make an OpenCL runtime-agnostic binary runnable with the OpenVINO™ Runtime.
To tie the topology IR to the layer you customize, prepare a configuration file that describes the execution work grid and lets the OpenVINO™ Runtime find the parameters for your kernel.
For example, consider the following OpenCL kernel signature:
```cpp
__kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride);
```
A configuration file for this kernel might be the following:
Each custom layer is described with the `CustomLayer` node. It has the following nodes and attributes:
- Root node `CustomLayer` contains the following attributes:
- `name` – (Required) The name of the OpenVINO™ Runtime layer to bind the kernel with.
- `type` and `version` – (Required) Reserved for future use. Set them to `MVCL` and `1` respectively.
- `max-shaves` – (Optional) The maximum number of SHAVE cores that should be dedicated for the layer. It is useful for debugging concurrency issues or for saving resources: a memory-bound kernel does not scale well with the number of cores, so more resources can be left for the rest of the topology.
- Sub-node `Kernel` must contain the following attributes:
- `entry` – The name of your kernel function as you defined it in a source file. In the example above, it is `reorg_nhwc`.
- Node `Source` must contain the following attributes:
- `filename` – The path to a compiled binary relative to the XML configuration file.
- Sub-node `Parameters` – Describes parameters bindings. For more information, see the description below.
- Sub-node `WorkSizes` – Describes local and global work group sizes and the source for dimension deduction as a pair `direction,port`. In the example above, the work group is described relative to the dimension of the input tensor that comes through port 0 in the IR. `global` and `local` work group configurations support any simple math expressions with +,-,\*,/, and () from `B`(batch), `Y`(height), `X`(width) and `F`(channels).
- Sub-node `Where` – Allows you to customize bindings with the `key="value"` attribute. For example, to substitute only 3x3 convolutions, write `<Where kernel="3,3"/>` in the binding XML.
A parameter description supports `Tensor` nodes (of type `input`, `output`, `input_buffer`, `output_buffer`, or `data`), `Scalar` nodes, and `Data` nodes, and has the following format:
- Each `Tensor` node of `input` or `output` type must contain the following attributes:
- `arg-name` – The name of a kernel parameter in the kernel signature.
- `type` – Node type: `input` or `output` as specified in the IR.
- `port-index` – The number of the input/output port as specified in the IR.
- `format` – The channel order in the tensor. Optional conversion layers are generated if the custom layer format is not compatible with formats of neighboring layers. `BFYX`, `BYXF`, and `ANY` formats are supported currently.
- Each `Tensor` node of `input_buffer` or `output_buffer` type must contain the following attributes:
- `arg-name` – The name of a kernel parameter in the kernel signature.
- `type` – Node type: `input_buffer` or `output_buffer`. Use the appropriate type to bind multiple kernels that correspond to different stages of the same layer.
- `port-index` – The unique identifier to bind by.
- `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings.
- `size` – The number of bytes needed. The current expression syntax supports only expressions over the dimensions of the selected input/output tensor or constants and might be extended in the future.
Here is an example of multi-stage MVN layer binding:
- Each `Scalar` node must contain the following attributes:
- `arg-name` – The name of a kernel parameter in the kernel signature.
- `type` – `int` or `float` value. It is used for correct argument extraction from IR parameters.
- `source` – Contains the name of the parameter in the IR file or input/output (`I`/`O`, `In`/`On`, where `n` is a port number)
followed by dimension `B`(batch), `Y`(height), `X`(width), or `F`(channels).
- Each `Data` node must contain the following attributes:
- `arg-name` – The name of a kernel parameter in the kernel signature.
- `type` – Node type. Currently, `local_data` is the only supported value, which defines buffer allocated in fast local on-chip memory. It is limited to 100KB for all `__local` and
`__private` arrays defined inside the kernel as well as all `__local` parameters passed to the kernel. Note that a manual-DMA extension requires double buffering.
If the custom layer is detected to run out of local memory, the inference fails.
- `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings.
- `size` – The number of bytes needed. The current expression syntax supports only expressions over the dimensions of the selected input/output tensor or constants and may be extended in the future.
The example binding below illustrates a kernel with two local buffers passed to the kernel.
> **NOTE**: If both native and custom layer implementations are present, the custom kernel has a priority over the native one.
Before loading the network that features the custom layers, provide a separate configuration file and load it using the `ov::Core::set_property()` method with the "CONFIG_KEY" key and the configuration file name as a value:
@snippet docs/snippets/vpu/custom_op.cpp part0
## Optimizing Kernels with OpenCL for VPU (Intel® Neural Compute Stick 2)
This section provides optimization guidelines on writing custom layers with OpenCL for VPU devices. Knowledge about general OpenCL
programming model and OpenCL kernel language is assumed and not a subject of this section. The OpenCL model mapping to VPU is described in the table below.
| OpenCL Model | VPU Mapping|
|-----|----|
| Device code | Executed on SHAVE cores |
| Private memory | Mapped to CMX internal memory, limited to 100KB per work group, valid only while the work group is executed |
| Local memory | Mapped to CMX internal memory, limited to 100KB per work group, valid only while the work group is executed |
| Global memory | Mapped to DDR, used to pass execution preserved parameters for inputs, outputs, and blobs |
| Work group | Executed on a single SHAVE core iterating over multiple work items |
Note that the OpenCL specification does not define the work group execution order. This means that it is your
responsibility to ensure that race conditions among work groups are not introduced. The custom layer runtime splits the work grid evenly
among available compute resources and executes the work groups in an arbitrary order. This static scheduling approach works best if the load is evenly spread out across work groups, which is a typical case for Deep Learning kernels. The following guidelines are recommended for work group partitioning:
1. Split work evenly across work groups.
2. Adjust work group granularity to maintain equal workload for all compute cores.
3. Set the maximum number of cores using the `max-shaves` attribute for the `CustomLayer` node. This keeps more resources for the rest of topology. It is also useful if the kernel scalability reached its limits, which may happen while optimizing memory bound kernels or kernels with poor parallelization.
4. Try an alternate data layout (`BFYX`/`BYXF`) for the kernel if it improves work group partitioning or data access patterns.
Consider not just the speedup of a specific layer, but full topology performance, because data conversion layers are automatically inserted
as appropriate.
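The effect of work group granularity on static scheduling can be estimated with a small model. The Python sketch below assumes that every work group costs the same and that the runtime hands out work groups to cores as evenly as possible. The function names and the core count of 8 are hypothetical and are not taken from the runtime.

```python
def split_work_groups(num_groups, num_cores):
    """Number of work groups each core receives under an even static split."""
    base, extra = divmod(num_groups, num_cores)
    # The first `extra` cores take one additional work group.
    return [base + (1 if core < extra else 0) for core in range(num_cores)]


def utilization(num_groups, num_cores):
    """Fraction of total core time spent busy, assuming equal-cost work groups.

    The layer finishes when the most loaded core finishes, so cores with
    fewer work groups sit idle for the remaining time.
    """
    per_core = split_work_groups(num_groups, num_cores)
    return num_groups / (max(per_core) * num_cores)


# 26 work groups on 8 hypothetical cores leave some cores idle at the end.
print(split_work_groups(26, 8), utilization(26, 8))

# 32 work groups divide evenly, so no core waits for another.
print(split_work_groups(32, 8), utilization(32, 8))
```

In this model, 26 work groups on 8 cores give a split of `[4, 4, 3, 3, 3, 3, 3, 3]` and a utilization of `0.8125`, while 32 work groups give full utilization. This is the reasoning behind choosing a granularity that divides evenly among the cores.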
Offline OpenCL compiler (`clc`) features automatic vectorization over `get_global_id(0)` usage, if uniform access is detected.
For example, the kernel below could be automatically vectorized:
However, this work-group based vectorizer (WGV) conflicts with the default LLVM vectorizer based on superword level parallelism
(SLP) for the current compiler version. Manual vectorization is recommended to provide the best performance for non-uniform code
patterns. WGV works if and only if vector types are not used in the code.
Here is a short list of optimization tips:
1. Help the auto-vectorizer by ensuring non-aliasing pointers for kernel parameters: put `restrict` where possible.
- This can give a performance boost, especially for kernels with unrolling, like `ocl_grn` from the example below.
- Place `restrict` markers for kernels with manually vectorized code. In the `ocl_grn` kernel below, the unrolled version without `restrict` is up to 20% slower than the optimal one, which combines unrolling and `restrict`.
2. Put `#‍pragma unroll N` to your loop header. The compiler does not trigger unrolling by default, so it is your responsibility to
annotate the code with pragmas as appropriate. The `ocl_grn` version with `#‍pragma unroll 4` is up to 50% faster, most of which comes from unrolling the first loop, because LLVM, in general, is better at scheduling 3-stage loops (load-compute-store), while the first loop
`variance += (float)(src_data[c*H*W + y*W + x] * src_data[c*H*W + y*W + x]);` is only 2-stage (load-compute). Pay
attention to unrolling such cases first. Unrolling factor is loop-dependent. Choose the smallest number that
still improves performance as an optimum between the kernel size and execution speed. For this specific kernel, changing the unroll factor from `4` to `6` results in the same performance, so an unrolling factor of 4 is the optimum. For Intel® Neural Compute Stick 2, unrolling is combined with the automatic software pipelining for load, store, and compute stages:
Both versions perform the same, but the second one has more complex code.
3. If it is easy to predict the work group size, you can also use the `reqd_work_group_size` kernel attribute to ask the compiler
to unroll the code up to the local size of the work group. Note that if the kernel is actually executed with the
different work group configuration, the result is undefined.
4. Prefer `half` compute if it keeps reasonable accuracy. 16-bit float is a native type for Intel® Neural Compute Stick 2, and most of the `half_*` functions are mapped to a single hardware instruction.
Use the standard `native_*` functions for the other types.
5. Prefer to use the `convert_half` function over `vstore_half` if conversion to 32-bit float is required. `convert_half` is mapped to a single hardware instruction. For the `cvtf32f16` kernel above, the line `outImage[idx] = convert_half(inImage[idx]*scale+bais);` is eight times slower than the code with `vstore_half`.
6. Mind early exits. An early exit can be extremely costly for the current version of the `clc` compiler due to conflicts with the
auto-vectorizer. The generic advice is to set the local size along the `x` dimension equal to the input and/or output width.
If it is impossible to define a work grid that exactly matches the inputs and/or outputs and thereby eliminates checks such as
`if (get_global_id(0) >= width) return`, use a line-wise kernel variant with manual vectorization.
The kernel example below demonstrates the impact of early exits on kernel performance.
This `reorg` kernel is auto-vectorizable, but the input for the YOLO v2 topology is `NCHW=<1,64,26,26>`, which is not a multiple of the vector width (`8` for the `half` data type). As a result, the Inference Engine does not select the auto-vectorized kernel.
To compare the performance of the auto-vectorized and scalar versions of the kernel, change the input size to `NCHW=<1,64,26,32>`. This enables the auto-vectorized version to be selected by the Inference Engine and can give you about 30% uplift.
Since the auto-vectorized version is faster, it makes sense to enable it for the YOLO v2 topology input size by setting the local size to a multiple of the vector width, for example, 32, and adjusting global sizes accordingly. As a result, the execution work grid exceeds the actual input dimension, so out-of-bound checks should be inserted. See the updated kernel version below:
```cpp
// Version with out-of-bound checks added
__kernel void reorg(const __global half* restrict src, __global half* restrict out, int W, int stride)
```
This code performs the same as the initial (scalar) kernel above due to branching overhead. If you replace the min/max expression `w = min(w, W-1);` with `if (w >= W) return;`, the runtime increases up to 2x compared to the code without branching (initial version).<br>
If branching is inevitable for your element-based kernel, it is recommended to change the scheme to line-based. See the kernel variant below:
```cpp
// Line-wise version
__kernel void reorg(const __global half* restrict src, __global half* restrict out, int H, int W, int stride)
```
This decreases the execution time by up to 40% compared with the best performing vectorized kernel without early exits (initial version).
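The host-side arithmetic behind the padded work grid in item 6 can be sketched in plain C++. This is a minimal illustration, assuming a local size of 32 as in the text; `round_up_to_multiple` and `clamp_index` are hypothetical helper names and are not part of any OpenVINO or OpenCL API.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: round `size` up to the nearest multiple of `multiple`
// (multiple > 0). Used to pick a global size that the local size divides evenly.
std::size_t round_up_to_multiple(std::size_t size, std::size_t multiple) {
    return (size + multiple - 1) / multiple * multiple;
}

// Hypothetical helper: clamp a work-item index into [0, width - 1] (width > 0),
// mirroring the `w = min(w, W-1);` expression used instead of an early exit.
int clamp_index(int w, int width) {
    return std::min(w, width - 1);
}
```

For an input width of 26 and a local size of 32, the padded global size is 32, so work items 26 to 31 fall outside the input and are clamped to index 25 rather than exiting early.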
7. Reuse computations among work items by using line-based kernels or sharing values through `__local` memory.
8. Improve data access locality. Most custom kernels are memory bound, while convolution and fully connected layers are hardware-implemented. The code below demonstrates a further optimized version of the `reorg` kernel unrolled by `stride`:
The `src` data in this case is loaded only once. As a result, the cycle count drops by up to 45% compared with the line-wise version.
9. Copy data from `__global` to `__local` or `__private` memory if the data is accessed more than once. Access to
`__global` memory is orders of magnitude slower than access to `__local`/`__private` due to the statically scheduled pipeline, which
stalls completely on memory access without any prefetch. The same recommendation is applicable for scalar load/store
from/to a `__global` pointer, since work-group copying can be done in a vector fashion.
10. Use a manual DMA extension. Local (on-chip) memory throughput is up to 24x higher than DDR throughput. Starting from OpenVINO™ 2020.1, VPU OpenCL features a manual-DMA kernel extension to copy the sub-tensor used by a work group into local memory and perform the compute without DDR involved. Here is a simple GRN kernel implementation that runs over DDR. The local size is in the form (width of the input tensor, 1, 1) to define a large enough work group to get the code automatically vectorized and unrolled, while the global size is (width of the input tensor, height of the input tensor, 1):
```cpp
__kernel void grn_NCHW(
__global const half* restrict src_data,
__global half* restrict dst_data,
int C,
float bias)
{
float variance = bias + 1e-9f;
#pragma unroll 4
for (int c = 0; c < C; c++)
{
float val = (float) src_data[c*get_global_size(1)*get_global_size(0) + get_global_id(1)*get_global_size(0) + get_global_id(0)];
```
This kernel can be rewritten to introduce the special data binding `__dma_preload` and `__dma_postwrite` intrinsics. This means that instead of one kernel, a group of three kernels should be implemented: `kernelName`, `__dma_preload_kernelName`, and `__dma_postwrite_kernelName`. `__dma_preload_kernelName` for a particular work group `n` is guaranteed to be executed before the `n`-th work group itself, while `__dma_postwrite_kernelName` is guaranteed to be executed after the corresponding work group. You can define either of these functions; they are intended to copy data between `__global` and `__local` memory. The syntax requires an exact function signature match. The example below illustrates how to prepare your kernel for manual-DMA.
```cpp
__kernel void __dma_preload_grn_NCHW(
__global const half* restrict src,
__global half* restrict dst,
__local half* restrict local_src,
__local half* restrict local_dst,
int C,
float bias)
{
// ToDO: copy required piece of src tensor into local_src
}
__kernel void __dma_postwrite_grn_NCHW(
__global const half* restrict src,
__global half* restrict dst,
__local const half* restrict local_src,
__local half* restrict local_dst,
int C,
float bias)
{
// ToDO: copy back computed piece of local_dst into dst
}
__kernel void grn_NCHW(
__global const half* restrict src_data,
__global half* restrict dst_data,
__local half* restrict src,
__local half* restrict dst,
int C,
float bias)
{
// same as the example above
}
```
The GRN kernel operates on channel-major tensors to compute average over full channel range and then normalizes input elements to produce the output.
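As a point of comparison, the same computation can be sketched on the host in plain C++. This is a minimal sketch, assuming the common GRN formulation in which each element is scaled by the inverse square root of `bias` plus the sum of squares across channels; the function name `grn_reference` and the flattened `std::vector<float>` layout (single image, channel-major) are illustrative choices, not part of the VPU kernel interface.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for GRN over one channel-major (CHW) image.
// Assumption: dst[c][h][w] = src[c][h][w] / sqrt(bias + 1e-9 + sum over c of src[c][h][w]^2).
std::vector<float> grn_reference(const std::vector<float>& src,
                                 std::size_t C, std::size_t H, std::size_t W,
                                 float bias) {
    std::vector<float> dst(src.size());
    const std::size_t plane = H * W;  // number of elements in one channel
    for (std::size_t hw = 0; hw < plane; ++hw) {
        // Accumulate the sum of squares over the full channel range.
        float variance = bias + 1e-9f;
        for (std::size_t c = 0; c < C; ++c) {
            const float val = src[c * plane + hw];
            variance += val * val;
        }
        // Normalize every channel value at this spatial position.
        const float inv_norm = 1.0f / std::sqrt(variance);
        for (std::size_t c = 0; c < C; ++c) {
            dst[c * plane + hw] = src[c * plane + hw] * inv_norm;
        }
    }
    return dst;
}
```

A reference like this can be useful for checking the output of an optimized kernel, within the tolerance expected from `half` arithmetic.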
As a part of the manual DMA extension, a group of work group copy functions is introduced in addition to `async_work_group_copy`, which is also mapped to a DMA call.
Here is the list of supported functions:
```cpp
// 2D sub-tensor copy
event_t WorkGroupDmaCreateStrideTransaction(
const local T *src,
global T *dst,
size_t src_width, // width of the line of source in bytes
size_t dst_width, // width of the line of destination in bytes
size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes
size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes
size_t size, // total number of bytes loaded for all lines from source to destination
event_t event) __OVERLOAD;
event_t WorkGroupDmaCreateStrideTransaction(
const global T *src,
local T *dst,
size_t src_width, // width of the line of source in bytes
size_t dst_width, // width of the line of destination in bytes
size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes
size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes
size_t size, // total number of bytes loaded for all lines from source to destination
event_t event) __OVERLOAD;
// 3D sub-tensor copy
event_t WorkGroupDmaCreate3DTransaction(
const local T *src,
global T *dst,
size_t src_width, // width of the line of source in bytes
size_t dst_width, // width of the line of destination in bytes
size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes
size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes
size_t num_planes, // number of planes to be copied
size_t src_plane_stride, // stride between corresponding 2 consecutive planes of source in bytes
size_t dst_plane_stride, // stride between corresponding 2 consecutive planes of destination in bytes
size_t size, // size of the loaded plane in bytes, analogous to the size in 2D case
event_t event) __OVERLOAD;
event_t WorkGroupDmaCreate3DTransaction(
const global T *src,
local T *dst,
size_t src_width, // width of the line of source in bytes
size_t dst_width, // width of the line of destination in bytes
size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes
size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes
size_t num_planes, // number of planes to be copied
size_t src_plane_stride, // stride between corresponding 2 consecutive planes of source in bytes
size_t dst_plane_stride, // stride between corresponding 2 consecutive planes of destination in bytes
size_t size, // size of the loaded plane in bytes, analogous to the size in 2D case
event_t event) __OVERLOAD;
```
where `T` can be `uchar`, `char`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `half` or `float`.
Modified version of the GRN kernel could be the following:
Note the `get_local_size` and `get_local_id` usage inside the kernel. A 21x speedup is expected for a kernel on enet-curbs setup because it was completely limited by memory usage.
An alternative to using DMA is to use the work item copy extension. These functions are executed inside a kernel and require work groups equal to a single work item.
Here is the list of supported work item functions:
OpenVINO™ Extension API allows you to register custom operations to support models with operations which OpenVINO™ does not support out-of-the-box.
@@ -20,14 +20,10 @@ Follow the steps below to add a custom operation:
5. Override the `visit_attributes` method, which enables serialization and deserialization of operation attributes. An `AttributeVisitor` is passed to the method, and the implementation is expected to walk over all the attributes in the op using the type-aware `on_attribute` helper. Helpers are already implemented for standard C++ types like `int64_t`, `float`, `bool`, `vector`, and for existing OpenVINO defined types.
6. Override `evaluate`, which is an optional method that enables fallback of some devices to this implementation and the application of constant folding if there is a custom operation on the constant branch. If your operation contains `evaluate` method you also need to override the `has_evaluate` method, this method allow to get information about availability of `evaluate` method for the operation.
7. Add the `OPENVINO_FRAMEWORK_MAP` macro if you want to map custom operation to framework operation with the same name. It is an optional macro which can be used for one to one mapping. In order to use this macro please include frontend specific headers:
6. Override `evaluate`, which is an optional method that enables fallback of some devices to this implementation and the application of constant folding if there is a custom operation on the constant branch. If your operation contains `evaluate` method you also need to override the `has_evaluate` method, this method allows to get information about availability of `evaluate` method for the operation.
Based on that, declaration of an operation class can look as follows:
The goal of this chapter is to explain how to use Frontend extension classes to facilitate mapping of custom operations from framework model representation to OpenVINO representation. Refer to [Introduction to OpenVINO Extension](Intro.md) to understand entire flow.
This API is applicable for new frontends only, which exist for ONNX and PaddlePaddle. If a different model format is used, follow legacy [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) guide.
> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details based on a minimalistic `Identity` operation that is a placeholder for your real custom operation. You can review the complete code, which is fully compilable, to see how it works.
## Single Operation Mapping with OpExtension
This section covers the case when a single operation in framework representation is mapped to a single operation in OpenVINO representation. This is called *one-to-one mapping*. There is `OpExtension` class that works well if all the following conditions are satisfied:
1. Number of inputs to operation in the Framework representation is the same as in the OpenVINO representation.
2. Number of outputs is also the same in both representations.
3. Inputs can be indexed and are mapped in order correspondingly, e.g. input with index 0 in framework representation maps to input with index 0 in OpenVINO representation and so on.
4. The same for outputs.
5. Each attribute in OpenVINO operation can be initialized from one of the attributes of original operation or by some predefined constant value. Value of copied attributes cannot contain expressions, value is accepted as-is, so type of a value should be compatible.
> **NOTE**: `OpExtension` class is currently available for ONNX frontend only. PaddlePaddle frontend has named inputs and outputs for operation (not indexed) therefore OpExtension mapping is not applicable for this case.
The next example maps the ONNX operation with type [“Identity”](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Identity) to the OpenVINO template extension `Identity` class.
The mapping doesn’t involve any attributes, as operation Identity doesn’t have them.
Extension objects, like the just-constructed `extension`, can be added to the OpenVINO runtime just before loading a model that contains custom operations:
Or extensions can be constructed in a separately compiled shared library. Separately compiled library can be used in Model Optimizer or `benchmark_app`. Read about how to build and load such library in chapter “Create library with extensions” in [Introduction to OpenVINO Extension](Intro.md).
If an operation has multiple inputs and/or outputs, they are mapped in order. The type of elements in input/output tensors should match the expected types in the surrounding operations. For example, if a custom operation produces the `f32` data type, then the operation that consumes this output should also support `f32`. Otherwise, model conversion fails with an error; no automatic type conversion is performed.
### Converting to Standard OpenVINO Operation
`OpExtension` class can be used when mapping to one of the operations from standard OpenVINO operation set is what you need and there is no class like `TemplateExtension::Identity` implemented.
Here is an example for a custom framework operation “MyRelu”. Suppose it is mathematically equivalent to standard `Relu` that exists in OpenVINO operation set, but for some reason has type name “MyRelu”. In this case you can directly say that “MyRelu” -> `Relu` mapping should be used:
In the resulting converted OpenVINO model, “MyRelu” operation will be replaced by the standard operation `Relu` from the latest available OpenVINO operation set. Notice that when standard operation is used, it can be specified using just a type string (“Relu”) instead of using a `ov::opset8::Relu` class name as a template parameter for `OpExtension`. This method is available for operations from the standard operation set only. For a user custom OpenVINO operation the corresponding class should be always specified as a template parameter as it was demonstrated with `TemplateExtension::Identity`.
### Attributes Mapping
As described above, `OpExtension` is useful when attributes can be mapped one by one or initialized by a constant. If the set of attributes in framework representation and OpenVINO representation completely match by their names and types, nothing should be specified in OpExtension constructor parameters. The attributes are discovered and mapped automatically based on `visit_attributes` method that should be defined for any OpenVINO operation.
Imagine you have CustomOperation class implementation that has two attributes with names `attr1` and `attr2`:
And the original model in framework representation also has an operation with the name “CustomOperation” with the same `attr1` and `attr2` attributes. Then with the following code:
both `attr1` and `attr2` are copied from framework representation to OpenVINO representation automatically. If for some reason names of attributes are different but values still can be copied “as-is” you can pass attribute names mapping in `OpExtension` constructor:
Where `fw_attr1` and `fw_attr2` are names for corresponding attributes in framework operation representation.
If copying of an attribute is not what you need, `OpExtension` can also set the attribute to a predefined constant value. For the same `CustomOperation`, imagine you want to set `attr2` to value 5 instead of copying it from `fw_attr2`. To achieve that, do the following:
So the conclusion is that each attribute of the target OpenVINO operation should be initialized in one of three ways:
1. Set automatically through name matching
2. Mapped by attribute name
3. Set to a constant value
This is achieved by specifying maps as arguments for `OpExtension` constructor.
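The three initialization rules can be modeled with a small standalone C++ sketch. It does not use the OpenVINO API: `resolve_attributes`, the `Attrs` alias, and the integer-only attribute values are hypothetical simplifications, and the precedence shown (constant first, then name mapping, then automatic name matching) is an assumption made for illustration.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of attribute values: name -> integer value.
using Attrs = std::map<std::string, int>;

// Resolve every attribute of the target operation from framework attributes.
// name_map: target attribute name -> framework attribute name.
// constants: target attribute name -> predefined constant value.
Attrs resolve_attributes(const std::vector<std::string>& target_names,
                         const Attrs& fw_attrs,
                         const std::map<std::string, std::string>& name_map,
                         const Attrs& constants) {
    Attrs result;
    for (const auto& name : target_names) {
        // Rule 3: the attribute is set to a constant value.
        auto c = constants.find(name);
        if (c != constants.end()) {
            result[name] = c->second;
            continue;
        }
        // Rule 2: the attribute is mapped by name; otherwise rule 1: same name.
        auto m = name_map.find(name);
        const std::string& fw_name = (m != name_map.end()) ? m->second : name;
        auto f = fw_attrs.find(fw_name);
        if (f == fw_attrs.end()) {
            throw std::runtime_error("unresolved attribute: " + name);
        }
        result[name] = f->second;  // value is copied as-is
    }
    return result;
}
```

With framework attributes `fw_attr1 = 7` and `fw_attr2 = 9`, a mapping of `attr1` to `fw_attr1`, and a constant `attr2 = 5`, the resolved target attributes are `attr1 = 7` and `attr2 = 5`.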
## Mapping to Multiple Operations with ConversionExtension
Previous sections cover the case when a single operation is mapped to a single operation with optional adjustment in names and attribute values. That is likely enough for your own custom operation with an existing C++ kernel implementation. In this case your framework representation and OpenVINO representation for the operation are under your control, and inputs/outputs/attributes can be aligned to make `OpExtension` usable.
If one-to-one mapping is not possible, *decomposition to multiple operations* should be considered. It is achieved by using the more verbose and less automated `ConversionExtension` class. It enables writing arbitrary code to replace a single framework operation with multiple connected OpenVINO operations, constructing a dependency graph of any complexity.
`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow chapter [Build a Model in OpenVINO Runtime](@ref ov_ug_build_model) to learn how to use OpenVINO operation classes to build a fragment of model for replacement.
The next example illustrates using `ConversionExtension` for conversion of “ThresholdedRelu” from ONNX according to the formula: `ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), type=float))`.
> **NOTE**: `ThresholdedRelu` is one of the standard ONNX operators which is supported by ONNX frontend natively out-of-the-box. Here we are re-implementing it to illustrate how you can add a similar support for your custom operation instead of `ThresholdedRelu`.
To access original framework operation attribute value and connect to inputs, `node` object of type `NodeContext` is used. It has two main methods:
* `NodeContext::get_input` to get input with a given index,
* `NodeContext::get_attribute` to get attribute value with a given name.
The conversion function should return a vector of node outputs that are mapped to corresponding outputs of the original framework operation in the same order.
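The decomposition formula `Multiply(x, Convert(Greater(x, alpha), type=float))` can be checked numerically with a standalone C++ sketch. This is not OpenVINO code: the functions `greater`, `convert_to_float`, `multiply`, and `thresholded_relu` are hypothetical element-wise stand-ins for the corresponding graph operations, written only to show that the composition yields `x` where `x > alpha` and `0` elsewhere.

```cpp
#include <cstddef>
#include <vector>

// Element-wise stand-in for Greater(x, alpha): produces a boolean mask.
std::vector<bool> greater(const std::vector<float>& x, float alpha) {
    std::vector<bool> mask(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) mask[i] = x[i] > alpha;
    return mask;
}

// Element-wise stand-in for Convert(mask, type=float): true -> 1.0f, false -> 0.0f.
std::vector<float> convert_to_float(const std::vector<bool>& mask) {
    std::vector<float> out(mask.size());
    for (std::size_t i = 0; i < mask.size(); ++i) out[i] = mask[i] ? 1.0f : 0.0f;
    return out;
}

// Element-wise stand-in for Multiply(a, b); both inputs must have equal size.
std::vector<float> multiply(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
    return out;
}

// The decomposition: ThresholdedRelu(x, alpha) = x * float(x > alpha).
std::vector<float> thresholded_relu(const std::vector<float>& x, float alpha) {
    return multiply(x, convert_to_float(greater(x, alpha)));
}
```

Note that the comparison is strict, so an input exactly equal to `alpha` maps to zero.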
The OpenVINO Transformation mechanism allows you to develop transformation passes to modify `ov::Model`. You can use this mechanism to apply additional optimizations to the original Model or transform unsupported subgraphs and operations to new operations which are supported by the plugin.
This guide contains all necessary information that you need to start implementing OpenVINO™ transformations.
@@ -37,8 +37,8 @@ The implementation `CompileNetwork` is fully device-specific.
The function accepts a const shared pointer to `ngraph::Function` object and performs the following steps:
1. Applies ngraph passes using `TransformNetwork` function, which defines plugin-specific conversion pipeline. To support low precision inference, the pipeline can include Low Precision Transformations. These transformations are usually hardware specific. You can find how to use and configure Low Precisions Transformations in [Low Precision Transformations](@ref openvino_docs_IE_DG_lpt) guide.
2. Maps the transformed graph to a backend specific graph representation (for example, to MKLDNN graph for Intel CPU).
1. Applies nGraph passes using `TransformNetwork` function, which defines plugin-specific conversion pipeline. To support low precision inference, the pipeline can include Low Precision Transformations. These transformations are usually hardware specific. You can find how to use and configure Low Precisions Transformations in [Low Precision Transformations](@ref openvino_docs_OV_UG_lpt) guide.
2. Maps the transformed graph to a backend specific graph representation (for example, to CPU plugin internal graph representation).
3. Allocates and fills memory for graph weights, backend specific memory handles and so on.
@@ -9,7 +9,7 @@ For more details about low-precision model representation please refer to this [
During the model load each plugin can interpret quantization rules expressed in *FakeQuantize* operations:
- Independently based on the definition of *FakeQuantize* operation.
- Using a special library of low-precision transformations (LPT) which applies common rules for generic operations,
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](@ref openvino_docs_IE_DG_Int8Inference).
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into models with low-precision operations.
Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
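The split can be illustrated with a scalar C++ sketch. It assumes the usual FakeQuantize definition with input range `[input_low, input_high]`, output range `[output_low, output_high]`, and a `levels` count, and it uses `std::round` as the rounding mode, which is an assumption. The struct and function names are hypothetical and do not come from the OpenVINO API.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical scalar parameters of one FakeQuantize operation.
struct FakeQuantizeParams {
    float input_low;
    float input_high;
    float output_low;
    float output_high;
    int levels;  // number of quantization levels, for example 256
};

// Quantize: clamp to the input range and map to an integer level in [0, levels - 1].
int quantize(float x, const FakeQuantizeParams& p) {
    const float clamped = std::min(std::max(x, p.input_low), p.input_high);
    const float scaled =
        (clamped - p.input_low) / (p.input_high - p.input_low) * (p.levels - 1);
    return static_cast<int>(std::round(scaled));
}

// Dequantize: map an integer level back to the output range.
float dequantize(int q, const FakeQuantizeParams& p) {
    return static_cast<float>(q) / (p.levels - 1) * (p.output_high - p.output_low) +
           p.output_low;
}

// FakeQuantize is the composition of the two steps.
float fake_quantize(float x, const FakeQuantizeParams& p) {
    return dequantize(quantize(x, p), p);
}
```

Separating the two steps is what allows the operations in between to work on the integer levels instead of floating-point values.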
@@ -72,11 +72,7 @@ For example, if you would like to infer a model with `Convolution` operation in
> There are several supported quantization approaches on activations and on weights. All supported approaches are described in [Quantization approaches](#quantization-approaches) section below. In demonstrated model [FakeQuantize operation quantization](#fakequantize-operation) approach is used.
Additionally, low precision transformations can handle ONNX quantized models.
For more details on how to get a quantized model, refer to [Model Optimization](@ref openvino_docs_model_optimization_guide) document.
## Quantization approaches
LPT transformations support two quantization approaches:
@@ -115,63 +111,63 @@ Inside each step LPT transformations handle input model operation by operation,
As a result, usually all operations are inferred by the plugin in low precision. If the plugin doesn't support an operation inference in low precision, then the corresponding LPT transformation can be disabled, and input tensor precisions for the operation will not be changed. In this case the operation is inferred in the original precision.
Low precision transformations pipeline includes four steps:
The model on this step is changed. There are more details in developer guide [Prerequisites transformations](@ref openvino_docs_IE_DG_lpt_step1_prerequisites).
The model on this step is changed. There are more details in developer guide [Prerequisites transformations](@ref openvino_docs_OV_UG_lpt_step1_prerequisites).
### Step 2. Markup
This step creates runtime attributes for operations. These attributes will be used in next step. Transformations:
The model on this step is changed: only new attributes are added to some operations. There are more details in developer guide [Markup transformations](@ref openvino_docs_IE_DG_lpt_step2_markup).
The model on this step is changed: only new attributes are added to some operations. There are more details in developer guide [Markup transformations](@ref openvino_docs_OV_UG_lpt_step2_markup).
### Step 3. Main transformations, FakeQuantize decomposition and dequantization operations handling
This step has the most transformations. These transformations can be separated in two groups: decomposition transformation and dequantization operations handling. There are more details in developer guide [Main transformations](@ref openvino_docs_IE_DG_lpt_step3_main). Transformations:
This step has the most transformations. These transformations can be separated in two groups: decomposition transformation and dequantization operations handling. There are more details in developer guide [Main transformations](@ref openvino_docs_OV_UG_lpt_step3_main). Transformations:
Decomposition transformations decompose the `FakeQuantize` operation to: quantize (`FakeQuantize` with low precision output) and dequantization operations (opposite to quantize, with low precision input and the original precision output). For dequantization operations LPT uses three operations: `Convert`, `Subtract` and `Multiply`. Element-wise operations `Subtract` and `Multiply` have constants on the second branches. If dequantization operations are not handled at the end of LPT pipeline, then they will be fused back to the `FakeQuantize`.
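The `Convert`, `Subtract`, `Multiply` chain can be shown on a single value in plain C++. This is a minimal sketch, assuming an unsigned 8-bit quantized input and scalar constants; `dequantize_u8`, `zero_point`, and `scale` are illustrative names rather than OpenVINO identifiers.

```cpp
#include <cstdint>

// Hypothetical scalar version of the dequantization chain.
// zero_point is the constant on the second branch of Subtract,
// scale is the constant on the second branch of Multiply.
float dequantize_u8(std::uint8_t q, float zero_point, float scale) {
    const float converted = static_cast<float>(q);  // Convert: u8 -> f32
    const float shifted = converted - zero_point;   // Subtract
    return shifted * scale;                         // Multiply
}
```

For example, a quantized value of 130 with a zero point of 128 and a scale of 0.5 dequantizes to 1.0.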
@@ -197,14 +193,14 @@ Original `Convolution` operation in FP32 with dequantization operations before:
### Step 4: Cleanup of the result model
LPT cleanup transformations are the final stage in the LPT pipeline. In this step, LPT transformations clean up the resulting model to avoid unhandled dequantization operations: they fuse dequantization operations into other model operations if possible (or fuse at least the `Convert` operations if not). Transformations:
There are more details in developer guide [Cleanup transformations](@ref openvino_docs_IE_DG_lpt_step4_cleanup).
There are more details in developer guide [Cleanup transformations](@ref openvino_docs_OV_UG_lpt_step4_cleanup).
`FakeQuantize` operation with not handled dequantization operations:

@@ -236,11 +232,11 @@ This step is optional. It modifies the nGraph function to a device-specific oper
Let's explore quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model. Use [Model Downloader](@ref omz_tools_downloader) tool to download the `fp16` model from [OpenVINO™ Toolkit - Open Model Zoo repository](https://github.com/openvinotoolkit/open_model_zoo):
@@ -259,7 +255,7 @@ Result model depends on different factors:
Information about layer precision is stored in the performance counters that are
available from the Inference Engine API. For example, the part of performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on CPU Plugin looks as follows:
available from the OpenVINO Runtime API. For example, the part of performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on CPU Plugin looks as follows:
Prerequisites transformations are optional. The transformations prepare a model before running other low precision transformations. The transformations do not operate with dequantization operations or update precisions. Prerequisites transformations include:
This step defines the optimal `FakeQuantize` decomposition precisions for the best inference performance via operations markup with runtime attribute instances. Attributes are created for input and output ports and operations. Transformations do not change the operation output port precisions. A model markup low precision logic is decomposed and implemented into the following common markup transformations. The order of transformations is important:
@@ -25,11 +25,11 @@ The table of transformations and used attributes:
> **Note:** the same type of attribute instances can be created in different transformations. This approach is the result of the transformation single-responsibility principle. For example, `Precision` attribute instances are created in `MarkupCanBeQuantized` and `MarkupPrecisions` transformations, but the reasons for their creation are different
Common markup transformations can be decomposed into simpler utility markup transformations. The order of Markup utility transformations is not important:
# Step 3. Main Transformations {#openvino_docs_IE_DG_lpt_step3_main}
# Step 3. Main Transformations {#openvino_docs_OV_UG_lpt_step3_main}
Main transformations are the majority of low precision transformations. Transformations operate with dequantization operations. Main transformations include:
ngraph::pass::low_precision::ClampTransformation class represents the `Clamp` operation transformation.
Some files were not shown because too many files have changed in this diff