Files
openvino/cmake/features.cmake
Maxim Shevtsov 49b5e5728b Auto Batching impl (#7883)
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)

(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)

* applying/accomodating the API changes after rebase to the master

* replaying modified version of actual batch selection

* eearly experiments with model mem footprint

* changes from rebasing to the latest master

* experimenting with DG1 on the batch size selection, also collecting the mem footprint

* WIP:moving the auto-batching to the icore to let the MULT/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key. still fials hot device swap

* quick-n-dirty batch footpint vs device total mem

* code style

* testing which models perform badly due to kernels and NOT (batched) footprint

* stub  pipeline task to comunicate the readiness rather than promise/future

* quick-n-dirty timeout impl

* explicit _completionTasks,reverting BA to use the timeout

* inputs outputs copies, works with AUTO and demo now

* accomodate the config per device-id, after rebase to the latest master

* allowing the auto-batching only with tput hint to let more conventional tests pass

* fix the pre-mature timeout restaring via waiting for batch1 requests completion

* moved the bacthed request statring ( along with input copies) to the dedicated thread

* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution

* code style

* increasing the timeout to test the ssd_* models perf (timeout?) issues

* reducing number of output stuff in BA to avoid bloating the logs in experiments

* more aggressive batching for experiments, not limited to 32 and also 4 as a min

* more accurate timeout debugging info

* getting the reqs limitation from the plugin SetConfig as well

* refactor the reshape logic a bit to accomodate CPU for bathcing, also added remeote context

* let the benchamrk_app to consume specific batch values for the auto-batching such as BATCH:GPU(4)

* auto-batching functional test (with results check vs ref) and GPU instance for that

* fixed arithemtic on blobs ptrs

* clang

* handling possible batched network failure

* BATCH as the constants device name in test

* ENABLE_BATCH

* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)

* DetectionOutput hetero test for the CPU

* reenabling the Auto-Batching in the AUTO

* auto-batching device enabled in the test

* fixed the DO test

* improve the loading loop logic

* brushed the config keys

* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests

* fix the test after refactoring

* clang

* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now

* auto-batching hetero test (subgraph with DetectionOutput)

* fixed minor changes that were result of experiments with impl

* code-style

* brushing, disabling CPU's HETERO tests until planned activity for 22.2

* removing home-baked MAX_BATCH_SZIE and swicthing to the official impl by GPU team

* remote blobs tests for the auto-batching (old API)

* brushed names a bit

* CreateContext and LoadNEtwork with context for the Auto-Batching plus remote-blobs tests

* fixed the ieUnitTests with adding CreateContext stub to the MockICore

* clang

* improved remote-blobs tests

* revert the back BA from exeprimenents with AB + device_use_mem

* conformance tests for BATCH, alos batch size 1 is default for BATCH:DEVICE

* remote blobs 2.0 tests, issue with context having the orig device name

* debugging DG1 perf drop (presumably due to non-fitting the device-mem)

* disbaling WA with batch/=2 for excesive mem footptint, leaving only streams 2

* remote blobs 2.0 tests for different tensor sharing types

* converting assert to throw to accomodate legacy API where the lock() was possible to be called

* revert the timeout back to avoid mixing the studies, fixed the footprint calc

* reverting to estimating the max batch by extrapolating from bacth1 size

* more conservative footptint etimation (with bacth1), graceful bacth 1 handling without duplication

* even graceful batch 1 handling without duplication

* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching

* AutoBatchPlugin -> ov_auto_batch_plugin

* WA for gcc 4.8

* clang

* fix misprint

* fixed errors resulted from recent OV's Variant to Any transition

* skip auto-batching for already-batched networks

* AUTO_BATCH_TIMEOUT and tests

* GPU-specific L3

* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic

* debugging device info

* enabling the config tests for the GPU and fixing the Auto-batching tests to pass

* making the default (when not recognized the driver) cache size more aggressive, to accomodate recent HW with old drivers

* skip auto-batching for RNNs and alikes (e.g. single CHW input)

* fixed fallback to the bacth1 and moved HETERO path under condition to avoid bloating

* brushing

* Auto plugin GetMetric support gpu auto-batch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add test case

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add comments on test

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the vars names, alos adding the excpetion handling

* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and alikes (CVS-74085) to minimize the of #failures

* add try catch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the code changed in the GPU plugin

* Auto-Batch requests tests

* brushed varibles a bit (ref)

* cleaned debug output from the ie_core

* cleaned cmake for the Auto-Batch

* removed batchN estimation from batch1

* cleaned from debug printf

* comments, cleanup

* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13

* Adding back  removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.

* brushing ie_core.cpp

* fix 32bit compilation

* Code review: ENABLE_AUTO_BATCH

* consolidate the auot-batching logic in ie_core.cpp into single ApplyAutoBAtching

* renamed brushed the OPTIMAL_BATCH (now with_SIZE) and mimicks the MAX_BATCH_SZIE  wrt MODEL_PTR

* default value for the OPTIMAL_BATCH_SIZE

* clang

* accomodate new func tests location

* fix shuffle of headers after clang + copyrights

* fixed misprint made during code refactoring

* moving the common therad-safe containers (like ThreadSafeQueue) to the dedicated dev_api header

* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a conditin to consider Auto-Batching

* switching from the unsafe size() and minimizing time under lock

* code style

* brushed the ApplyAutoBatching

* brushed the netric/config names and descriptions

* completed the core intergration tests for the auto-batching

* ExecGraphInfo and check for incorrect cfg

* removed explicit dependencies from cmake file of the plugin

* disabling Auto-Batching thru the tput hint (to preserve current product default), only excplicit like BATCH:GPU used in the tests

Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
2021-12-24 12:55:22 +03:00

205 lines
8.1 KiB
CMake

# Copyright (C) 2018-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
#
# Common cmake options
#
ie_dependent_option (ENABLE_INTEL_CPU "CPU plugin for inference engine" ON "X86_64" OFF)
ie_option (ENABLE_TESTS "unit, behavior and functional tests" OFF)
ie_option (ENABLE_STRICT_DEPENDENCIES "Skip configuring \"convinient\" dependencies for efficient parallel builds" ON)
ie_dependent_option (ENABLE_CLDNN "clDnn based plugin for inference engine" ON "X86_64;NOT APPLE;NOT MINGW;NOT WINDOWS_STORE;NOT WINDOWS_PHONE" OFF)
ie_dependent_option (ENABLE_INTEL_GPU "GPU plugin for inference engine on Intel GPU" ON "ENABLE_CLDNN" OFF)
if (NOT ENABLE_CLDNN OR ANDROID OR
(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.0))
# oneDNN doesn't support old compilers and android builds for now, so we'll
# build GPU plugin without oneDNN
set(ENABLE_ONEDNN_FOR_GPU_DEFAULT OFF)
else()
set(ENABLE_ONEDNN_FOR_GPU_DEFAULT ON)
endif()
ie_dependent_option (ENABLE_ONEDNN_FOR_GPU "Enable oneDNN with GPU support" ON "ENABLE_ONEDNN_FOR_GPU_DEFAULT" OFF)
ie_option (ENABLE_PROFILING_ITT "Build with ITT tracing. Optionally configure pre-built ittnotify library though INTEL_VTUNE_DIR variable." OFF)
ie_option_enum(ENABLE_PROFILING_FILTER "Enable or disable ITT counter groups.\
Supported values:\
ALL - enable all ITT counters (default value)\
FIRST_INFERENCE - enable only first inference time counters" ALL
ALLOWED_VALUES ALL FIRST_INFERENCE)
ie_option (ENABLE_PROFILING_FIRST_INFERENCE "Build with ITT tracing of first inference time." ON)
ie_option_enum(SELECTIVE_BUILD "Enable OpenVINO conditional compilation or statistics collection. \
In case SELECTIVE_BUILD is enabled, the SELECTIVE_BUILD_STAT variable should contain the path to the collected InelSEAPI statistics. \
Usage: -DSELECTIVE_BUILD=ON -DSELECTIVE_BUILD_STAT=/path/*.csv" OFF
ALLOWED_VALUES ON OFF COLLECT)
ie_option(ENABLE_ERROR_HIGHLIGHT "Highlight errors and warnings during compile time" OFF)
# Try to find python3
find_package(PythonLibs 3 QUIET)
ie_dependent_option (ENABLE_PYTHON "enables ie python bridge build" OFF "PYTHONLIBS_FOUND" OFF)
find_package(PythonInterp 3 QUIET)
ie_dependent_option (ENABLE_DOCS "Build docs using Doxygen" OFF "PYTHONINTERP_FOUND" OFF)
# this option should not be a part of InferenceEngineDeveloperPackage
# since wheels can be built only together with main OV build
cmake_dependent_option (ENABLE_WHEEL "Build wheel packages for PyPi" OFF
"PYTHONINTERP_FOUND;CMAKE_SOURCE_DIR STREQUAL OpenVINO_SOURCE_DIR" OFF)
#
# Inference Engine specific options
#
# "MKL-DNN library based on OMP or TBB or Sequential implementation: TBB|OMP|SEQ"
if(X86 OR ARM OR (MSVC AND (ARM OR AARCH64)) )
set(THREADING_DEFAULT "SEQ")
else()
set(THREADING_DEFAULT "TBB")
endif()
set(THREADING "${THREADING_DEFAULT}" CACHE STRING "Threading")
set_property(CACHE THREADING PROPERTY STRINGS "TBB" "TBB_AUTO" "OMP" "SEQ")
list (APPEND IE_OPTIONS THREADING)
if (NOT THREADING STREQUAL "TBB" AND
NOT THREADING STREQUAL "TBB_AUTO" AND
NOT THREADING STREQUAL "OMP" AND
NOT THREADING STREQUAL "SEQ")
message(FATAL_ERROR "THREADING should be set to TBB, TBB_AUTO, OMP or SEQ. Default option is ${THREADING_DEFAULT}")
endif()
if((THREADING STREQUAL "TBB" OR THREADING STREQUAL "TBB_AUTO") AND
(BUILD_SHARED_LIBS OR (LINUX AND X86_64)))
set(ENABLE_TBBBIND_2_5_DEFAULT ON)
else()
set(ENABLE_TBBBIND_2_5_DEFAULT OFF)
endif()
ie_dependent_option (ENABLE_TBBBIND_2_5 "Enable TBBBind_2_5 static usage in OpenVINO runtime" ON "ENABLE_TBBBIND_2_5_DEFAULT" OFF)
ie_dependent_option (ENABLE_INTEL_GNA "GNA support for inference engine" ON
"NOT APPLE;NOT ANDROID;X86_64;CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 5.4" OFF)
if(ENABLE_TESTS OR BUILD_SHARED_LIBS)
set(ENABLE_IR_V7_READER_DEFAULT ON)
else()
set(ENABLE_IR_V7_READER_DEFAULT OFF)
endif()
ie_option (ENABLE_IR_V7_READER "Enables IR v7 reader" ${ENABLE_IR_V7_READER_DEFAULT})
ie_option (ENABLE_GAPI_PREPROCESSING "Enables G-API preprocessing" ON)
ie_option (ENABLE_MULTI "Enables MULTI Device Plugin" ON)
ie_option (ENABLE_AUTO "Enables AUTO Device Plugin" ON)
ie_option (ENABLE_AUTO_BATCH "Enables Auto-Batching Plugin" ON)
ie_option (ENABLE_HETERO "Enables Hetero Device Plugin" ON)
ie_option (ENABLE_TEMPLATE "Enable template plugin" ON)
ie_dependent_option (ENABLE_VPU "vpu targeted plugins for inference engine" ON "NOT WINDOWS_PHONE;NOT WINDOWS_STORE" OFF)
ie_dependent_option (ENABLE_MYRIAD "myriad targeted plugin for inference engine" ON "ENABLE_VPU" OFF)
ie_dependent_option (ENABLE_MYRIAD_NO_BOOT "myriad plugin will skip device boot" OFF "ENABLE_MYRIAD" OFF)
ie_dependent_option (ENABLE_GAPI_TESTS "tests for GAPI kernels" ON "ENABLE_GAPI_PREPROCESSING;ENABLE_TESTS" OFF)
ie_dependent_option (GAPI_TEST_PERF "if GAPI unit tests should examine performance" OFF "ENABLE_GAPI_TESTS" OFF)
ie_dependent_option (ENABLE_MYRIAD_MVNC_TESTS "functional and behavior tests for mvnc api" OFF "ENABLE_TESTS;ENABLE_MYRIAD" OFF)
ie_dependent_option (ENABLE_DATA "fetch models from testdata repo" ON "ENABLE_FUNCTIONAL_TESTS;NOT ANDROID" OFF)
ie_dependent_option (ENABLE_BEH_TESTS "tests oriented to check inference engine API corecteness" ON "ENABLE_TESTS" OFF)
ie_dependent_option (ENABLE_FUNCTIONAL_TESTS "functional tests" ON "ENABLE_TESTS" OFF)
ie_dependent_option (ENABLE_SAMPLES "console samples are part of inference engine package" ON "NOT MINGW" OFF)
ie_option (ENABLE_OPENCV "enables OpenCV" ON)
ie_option (ENABLE_V7_SERIALIZE "enables serialization to IR v7" OFF)
set(IE_EXTRA_MODULES "" CACHE STRING "Extra paths for extra modules to include into OpenVINO build")
ie_dependent_option(ENABLE_TBB_RELEASE_ONLY "Only Release TBB libraries are linked to the Inference Engine binaries" ON "THREADING MATCHES TBB;LINUX" OFF)
ie_dependent_option (ENABLE_SYSTEM_PUGIXML "use the system copy of pugixml" OFF "BUILD_SHARED_LIBS" OFF)
ie_option (ENABLE_DEBUG_CAPS "enable OpenVINO debug capabilities at runtime" OFF)
ie_dependent_option (ENABLE_GPU_DEBUG_CAPS "enable GPU debug capabilities at runtime" ON "ENABLE_DEBUG_CAPS" OFF)
ie_dependent_option (ENABLE_CPU_DEBUG_CAPS "enable CPU debug capabilities at runtime" ON "ENABLE_DEBUG_CAPS" OFF)
if(ANDROID OR WINDOWS_STORE OR (MSVC AND (ARM OR AARCH64)))
set(protoc_available OFF)
else()
set(protoc_available ON)
endif()
ie_dependent_option(ENABLE_OV_ONNX_FRONTEND "Enable ONNX FrontEnd" ON "protoc_available" OFF)
ie_dependent_option(ENABLE_OV_PADDLE_FRONTEND "Enable PaddlePaddle FrontEnd" ON "protoc_available" OFF)
ie_option(ENABLE_OV_IR_FRONTEND "Enable IR FrontEnd" ON)
ie_dependent_option(ENABLE_OV_TF_FRONTEND "Enable TensorFlow FrontEnd" ON "protoc_available" OFF)
ie_dependent_option(ENABLE_SYSTEM_PROTOBUF "Use system protobuf" OFF
"ENABLE_OV_ONNX_FRONTEND OR ENABLE_OV_PADDLE_FRONTEND OR ENABLE_OV_TF_FRONTEND;BUILD_SHARED_LIBS" OFF)
ie_dependent_option(ENABLE_OV_CORE_UNIT_TESTS "Enables OpenVINO core unit tests" ON "ENABLE_TESTS;NOT ANDROID" OFF)
ie_dependent_option(ENABLE_OV_CORE_BACKEND_UNIT_TESTS "Control the building of unit tests using backends" ON
"ENABLE_OV_CORE_UNIT_TESTS" OFF)
ie_option(ENABLE_OPENVINO_DEBUG "Enable output for OPENVINO_DEBUG statements" OFF)
ie_option(ENABLE_REQUIREMENTS_INSTALL "Dynamic dependencies install" ON)
if(NOT BUILD_SHARED_LIBS AND ENABLE_OV_TF_FRONTEND)
set(FORCE_FRONTENDS_USE_PROTOBUF ON)
else()
set(FORCE_FRONTENDS_USE_PROTOBUF OFF)
endif()
# WA for ngraph python build on Windows debug
list(REMOVE_ITEM IE_OPTIONS ENABLE_OV_CORE_UNIT_TESTS ENABLE_OV_CORE_BACKEND_UNIT_TESTS)
#
# Process featues
#
if(ENABLE_OPENVINO_DEBUG)
add_definitions(-DENABLE_OPENVINO_DEBUG)
endif()
if (ENABLE_PROFILING_RAW)
add_definitions(-DENABLE_PROFILING_RAW=1)
endif()
if (ENABLE_MYRIAD)
add_definitions(-DENABLE_MYRIAD=1)
endif()
if (ENABLE_MYRIAD_NO_BOOT AND ENABLE_MYRIAD )
add_definitions(-DENABLE_MYRIAD_NO_BOOT=1)
endif()
if (ENABLE_INTEL_GPU)
add_definitions(-DENABLE_INTEL_GPU=1)
endif()
if (ENABLE_INTEL_CPU)
add_definitions(-DENABLE_INTEL_CPU=1)
endif()
if (ENABLE_INTEL_GNA)
add_definitions(-DENABLE_INTEL_GNA)
endif()
print_enabled_features()