From 49b5e5728ba62a3e93ce63557accc85664abfc04 Mon Sep 17 00:00:00 2001
From: Maxim Shevtsov
Date: Fri, 24 Dec 2021 12:55:22 +0300
Subject: [PATCH] Auto Batching impl (#7883)

* auto-batching POC squashed (all commits from the auto-batch-2021.3 branch)
(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)
* applying/accommodating the API changes after rebase to the master
* replaying modified version of actual batch selection
* early experiments with model mem footprint
* changes from rebasing to the latest master
* experimenting with DG1 on the batch size selection, also collecting the mem footprint
* WIP: moving the auto-batching to the icore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key. still fails hot device swap
* quick-n-dirty batch footprint vs device total mem
* code style
* testing which models perform badly due to kernels and NOT (batched) footprint
* stub pipeline task to communicate the readiness rather than promise/future
* quick-n-dirty timeout impl
* explicit _completionTasks, reverting BA to use the timeout
* inputs/outputs copies, works with AUTO and demo now
* accommodate the config per device-id, after rebase to the latest master
* allowing the auto-batching only with the tput hint to let more conventional tests pass
* fix the premature timeout restarting via waiting for batch1 requests completion
* moved the batched request starting (along with input copies) to the dedicated thread
* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution
* code style
* increasing the timeout to test the ssd_* models perf (timeout?) issues
* reducing number of output stuff in BA to avoid bloating the logs in experiments
* more aggressive batching for experiments, not limited to 32 and also 4 as a min
* more accurate timeout debugging info
* getting the reqs limitation from the plugin SetConfig as well
* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context
* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4)
* auto-batching functional test (with results check vs ref) and GPU instance for that
* fixed arithmetic on blobs ptrs
* clang
* handling possible batched network failure
* BATCH as the constant device name in test
* ENABLE_BATCH
* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)
* DetectionOutput hetero test for the CPU
* re-enabling the Auto-Batching in the AUTO
* auto-batching device enabled in the test
* fixed the DO test
* improve the loading loop logic
* brushed the config keys
* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests
* fix the test after refactoring
* clang
* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now
* auto-batching hetero test (subgraph with DetectionOutput)
* fixed minor changes that were a result of experiments with the impl
* code-style
* brushing, disabling CPU's HETERO tests until planned activity for 22.2
* removing home-baked MAX_BATCH_SIZE and switching to the official impl by the GPU team
* remote blobs tests for the auto-batching (old API)
* brushed names a bit
* CreateContext and LoadNetwork with context for the Auto-Batching, plus remote-blobs tests
* fixed the ieUnitTests by adding a CreateContext stub to the MockICore
* clang
* improved remote-blobs tests
* revert the BA back from experiments with AB + device_use_mem
* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE
* remote blobs 2.0 tests, issue with context having the orig device name
* debugging DG1 perf drop (presumably due to non-fitting the device-mem)
* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2
* remote blobs 2.0 tests for different tensor sharing types
* converting assert to throw to accommodate legacy API where the lock() was possible to be called
* revert the timeout back to avoid mixing the studies, fixed the footprint calc
* reverting to estimating the max batch by extrapolating from batch1 size
* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication
* even more graceful batch 1 handling without duplication
* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching
* AutoBatchPlugin -> ov_auto_batch_plugin
* WA for gcc 4.8
* clang
* fix misprint
* fixed errors resulting from recent OV's Variant to Any transition
* skip auto-batching for already-batched networks
* AUTO_BATCH_TIMEOUT and tests
* GPU-specific L3
* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic
* debugging device info
* enabling the config tests for the GPU and fixing the Auto-batching tests to pass
* making the default cache size (when the driver is not recognized) more aggressive, to accommodate recent HW with old drivers
* skip auto-batching for RNNs and the like (e.g. single CHW input)
* fixed fallback to the batch1 and moved HETERO path under condition to avoid bloating
* brushing
* Auto plugin GetMetric support gpu auto-batch
Signed-off-by: Hu, Yuan2
* add test case
Signed-off-by: Hu, Yuan2
* add comments on test
Signed-off-by: Hu, Yuan2
* brushing the vars names, also adding the exception handling
* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and the like (CVS-74085) to minimize the # of failures
* add try catch
Signed-off-by: Hu, Yuan2
* brushing the code changed in the GPU plugin
* Auto-Batch requests tests
* brushed variables a bit (ref)
* cleaned debug output from the ie_core
* cleaned cmake for the Auto-Batch
* removed batchN estimation from batch1
* cleaned from debug printf
* comments, cleanup
* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13
* Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac19d20dd56f36d4e824bf273fd6ea917.
* brushing ie_core.cpp
* fix 32bit compilation
* Code review: ENABLE_AUTO_BATCH
* consolidate the auto-batching logic in ie_core.cpp into single ApplyAutoBatching
* renamed/brushed the OPTIMAL_BATCH (now with _SIZE) and mimics the MAX_BATCH_SIZE wrt MODEL_PTR
* default value for the OPTIMAL_BATCH_SIZE
* clang
* accommodate new func tests location
* fix shuffle of headers after clang + copyrights
* fixed misprint made during code refactoring
* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header
* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching
* switching from the unsafe size() and minimizing time under lock
* code style
* brushed the ApplyAutoBatching
* brushed the metric/config names and descriptions
* completed the core integration tests for the auto-batching
* ExecGraphInfo and check for incorrect cfg
* removed explicit dependencies from cmake file of the plugin
* disabling Auto-Batching through the tput hint (to preserve the current product default); only explicit usage like BATCH:GPU is used in the tests

Co-authored-by: Roman Lyamin
Co-authored-by: Hu, Yuan2
---
 cmake/features.cmake | 2 +
 docs/IE_DG/supported_plugins/GPU.md | 3 +
 docs/snippets/GPU_Metric1.cpp | 8 +
 .../api/intel_gpu/runtime/device_info.hpp | 5 +
 .../benchmark_app/remote_blobs_filling.cpp | 1 +
 samples/cpp/benchmark_app/utils.cpp | 6 +-
 src/bindings/c/tests/CMakeLists.txt | 4 +
 src/inference/dev_api/ie_icore.hpp | 28 +
 .../dev_api/performance_heuristics.hpp | 8 +-
 .../threading/ie_thread_safe_containers.hpp | 86 +++
 src/inference/include/ie/ie_plugin_config.hpp | 21 +
 src/inference/src/ie_core.cpp | 137 +++-
 src/plugins/CMakeLists.txt | 4 +
 src/plugins/auto/executable_network.cpp | 25 +-
 src/plugins/auto/executable_network.hpp | 80 +-
 src/plugins/auto_batch/CMakeLists.txt | 20 +
 src/plugins/auto_batch/auto_batch.cpp | 731 ++++++++++++++++++
 src/plugins/auto_batch/auto_batch.hpp | 159 ++++
 src/plugins/intel_cpu/src/mkldnn_plugin.cpp | 4 +-
 src/plugins/intel_gpu/src/plugin/plugin.cpp | 74 +-
 .../inference_engine/CMakeLists.txt | 4 +
 .../include/api_conformance_helpers.hpp | 11 +-
 .../src/behavior/infer_request/callback.cpp | 6 +
 .../src/behavior/infer_request/io_blob.cpp | 6 +
 .../behavior/infer_request/multitheading.cpp | 6 +
 .../infer_request/set_blob_by_type.cpp | 6 +
 .../src/behavior/infer_request/wait.cpp | 5 +
 .../auto_batching/auto_batching_tests.cpp | 31 +
 .../cldnn_remote_blob_tests.cpp | 44 +-
 .../gpu_remote_tensor_tests.cpp | 78 +-
 .../auto_batching/auto_batching_tests.cpp | 31 +
 .../executable_network/exec_net_base.cpp | 11 +
 .../executable_network/get_metric.cpp | 10 +-
 .../behavior/infer_request/callback.cpp | 10 +
 .../behavior/infer_request/multithreading.cpp | 10 +
 .../behavior/infer_request/wait.cpp | 19 +-
 .../behavior/ov_plugin/core_integration.cpp | 10 +-
 .../behavior/plugin/configuration_tests.cpp | 57 +-
 .../behavior/plugin/core_integration.cpp | 31 +-
 .../functional/plugin/shared/CMakeLists.txt | 5 +
 .../auto_batching/auto_batching_tests.hpp | 161 ++++
 .../common_test_utils/test_constants.hpp | 1 +
 .../cpp_interfaces/interface/mock_icore.hpp | 3 +
 .../ngraph_functions/subgraph_builders.hpp | 38 +
 .../unit/auto/exec_network_get_metrics.cpp | 61 +-
 .../behavior/shared_tests/CMakeLists.txt | 5 +
 .../functional/shared_tests/CMakeLists.txt | 4 +
 47 files changed, 1882 insertions(+), 188 deletions(-)
 create mode 100644 src/inference/dev_api/threading/ie_thread_safe_containers.hpp
 create mode 100644 src/plugins/auto_batch/CMakeLists.txt
 create mode 100644 src/plugins/auto_batch/auto_batch.cpp
 create mode 100644 src/plugins/auto_batch/auto_batch.hpp
 create mode 100644 src/tests/functional/plugin/cpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp
 create mode 100644 src/tests/functional/plugin/gpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp
 create mode 100644 src/tests/functional/plugin/shared/include/auto_batching/auto_batching_tests.hpp
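For orientation before the diff itself, here is a minimal usage sketch. It is illustrative only, not part of the patch: it assumes the 2021.x InferenceEngine C++ API, the raw string form of the AUTO_BATCH_TIMEOUT key (normally spelled via the CONFIG_KEY macro introduced below), and a placeholder model path; the BATCH:GPU(4) device syntax and timeout semantics come from the patch.

// Illustrative only: how an application would drive the new BATCH device
// (assumes the 2021.x InferenceEngine C++ API; "model.xml" is a placeholder).
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");
    // "BATCH:GPU(4)": run on the GPU, transparently fusing 4 user requests into one batched request;
    // "AUTO_BATCH_TIMEOUT" caps the time spent collecting a full batch, in ms.
    auto execNet = core.LoadNetwork(network, "BATCH:GPU(4)", {{"AUTO_BATCH_TIMEOUT", "100"}});
    auto request = execNet.CreateInferRequest();
    request.StartAsync();
    request.Wait(InferenceEngine::InferRequest::WaitMode::RESULT_READY);
    return 0;
}

If fewer than four requests arrive within the timeout, the collected ones are executed via the batch-1 fallback path implemented in auto_batch.cpp below.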
diff --git a/cmake/features.cmake b/cmake/features.cmake
index 1a238a20e2a..e1655250327 100644
--- a/cmake/features.cmake
+++ b/cmake/features.cmake
@@ -100,6 +100,8 @@ ie_option (ENABLE_GAPI_PREPROCESSING "Enables G-API preprocessing" ON)
 ie_option (ENABLE_MULTI "Enables MULTI Device Plugin" ON)
 ie_option (ENABLE_AUTO "Enables AUTO Device Plugin" ON)
+ie_option (ENABLE_AUTO_BATCH "Enables Auto-Batching Plugin" ON)
+
 ie_option (ENABLE_HETERO "Enables Hetero Device Plugin" ON)
 ie_option (ENABLE_TEMPLATE "Enable template plugin" ON)
diff --git a/docs/IE_DG/supported_plugins/GPU.md b/docs/IE_DG/supported_plugins/GPU.md
index ae294da6770..7dc8f7b0023 100644
--- a/docs/IE_DG/supported_plugins/GPU.md
+++ b/docs/IE_DG/supported_plugins/GPU.md
@@ -141,6 +141,9 @@ When specifying key values as raw strings (that is, when using Python API), omit
 @snippet snippets/GPU_Metric1.cpp part1
+* OPTIMAL_BATCH_SIZE : Returns the _optimal_ batch size for a given network on the given GPU device. The returned value is aligned to a power of 2. MODEL_PTR is a required option for this metric, since the optimal batch size highly depends on the model; if MODEL_PTR is not given, the value of 1 is returned. Example code that sets the required and optional options for this metric is available in the following snippet:
+
+@snippet snippets/GPU_Metric1.cpp part2

 ## GPU Context and Video Memory Sharing RemoteBlob API

 See [RemoteBlob API of GPU Plugin](GPU_RemoteBlob_API.md)
diff --git a/docs/snippets/GPU_Metric1.cpp b/docs/snippets/GPU_Metric1.cpp
index 50ccb61a4cd..3d5ee4129db 100644
--- a/docs/snippets/GPU_Metric1.cpp
+++ b/docs/snippets/GPU_Metric1.cpp
@@ -14,4 +14,12 @@ options.insert(std::make_pair("AVAILABLE_DEVICE_MEM_SIZE", available_device_mem_
 auto max_batch_size = core.GetMetric("GPU", GPU_METRIC_KEY(MAX_BATCH_SIZE), options).as<uint32_t>();
 //! [part1]
+//! [part2]
+std::map<std::string, InferenceEngine::Parameter> opt = {{"MODEL_PTR", cnnNetwork.getFunction()}};  // Required. Same usage as for MAX_BATCH_SIZE above. If not set, OPTIMAL_BATCH_SIZE returns 1.
+// This is not an entirely GPU-specific metric (so METRIC_KEY is used rather than GPU_METRIC_KEY below),
+// but the GPU is the only device that supports it at the moment.
+// For the GPU, the metric already accommodates the on-device memory limitation that the MAX_BATCH_SIZE reflects,
+// so OPTIMAL_BATCH_SIZE is always less than MAX_BATCH_SIZE. Unlike the latter, it is also aligned to a power of 2.
+auto optimal_batch_size = core.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), opt).as<unsigned int>();
+//! [part2]
}
diff --git a/inference-engine/thirdparty/clDNN/api/intel_gpu/runtime/device_info.hpp b/inference-engine/thirdparty/clDNN/api/intel_gpu/runtime/device_info.hpp
index f1398341304..4350046bcd7 100644
--- a/inference-engine/thirdparty/clDNN/api/intel_gpu/runtime/device_info.hpp
+++ b/inference-engine/thirdparty/clDNN/api/intel_gpu/runtime/device_info.hpp
@@ -6,6 +6,7 @@
 #include
 #include
+#include <tuple>

 namespace cldnn {
 /// @addtogroup cpp_api C++ API
@@ -25,6 +26,10 @@ struct gfx_version {
     uint16_t major;
     uint8_t minor;
     uint8_t revision;
+    friend bool operator < (const gfx_version& l, const gfx_version& r) {
+        return std::tie(l.major, l.minor, l.revision)
+             < std::tie(r.major, r.minor, r.revision);  // same order
+    }
 };

 /// @brief Information about the device properties and capabilities.
diff --git a/samples/cpp/benchmark_app/remote_blobs_filling.cpp b/samples/cpp/benchmark_app/remote_blobs_filling.cpp
index cdb30ceb7e4..6a98825f87c 100644
--- a/samples/cpp/benchmark_app/remote_blobs_filling.cpp
+++ b/samples/cpp/benchmark_app/remote_blobs_filling.cpp
@@ -124,6 +124,7 @@ std::map> getRemoteInputBlo
         }

         auto blob = InferenceEngine::gpu::make_shared_blob(desc, context, clBuffer.back());
+        blob->allocate();
         remoteBlobs[name].push_back(blob);
     };
diff --git a/samples/cpp/benchmark_app/utils.cpp b/samples/cpp/benchmark_app/utils.cpp
index 752539b5873..734c096abde 100644
--- a/samples/cpp/benchmark_app/utils.cpp
+++ b/samples/cpp/benchmark_app/utils.cpp
@@ -109,8 +109,10 @@ std::vector<float> splitFloat(const std::string& s, char delim) {

 std::vector<std::string> parseDevices(const std::string& device_string) {
     std::string comma_separated_devices = device_string;
-    if (comma_separated_devices.find(":") != std::string::npos) {
-        comma_separated_devices = comma_separated_devices.substr(comma_separated_devices.find(":") + 1);
+    auto colon = comma_separated_devices.find(":");
+    if (colon != std::string::npos) {
+        auto bracket = comma_separated_devices.find("(");  // e.g. in BATCH:GPU(4)
+        comma_separated_devices = comma_separated_devices.substr(colon + 1, bracket - colon - 1);
     }
     if ((comma_separated_devices == "MULTI") || (comma_separated_devices == "HETERO"))
         return std::vector<std::string>();
diff --git a/src/bindings/c/tests/CMakeLists.txt b/src/bindings/c/tests/CMakeLists.txt
index 9135c944b7a..8b0a128212b 100644
--- a/src/bindings/c/tests/CMakeLists.txt
+++ b/src/bindings/c/tests/CMakeLists.txt
@@ -26,6 +26,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 if(ENABLE_INTEL_CPU)
     add_dependencies(${TARGET_NAME} ov_intel_cpu_plugin)
 endif()
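The parseDevices change above is subtle: the new substr arithmetic must both strip the "BATCH:" prefix and drop the "(4)" suffix. A standalone sketch of the same logic (a hypothetical helper, not from the patch):

// Hypothetical standalone mirror of the parseDevices() logic above (not from the patch):
// "BATCH:GPU(4)" -> "GPU"; "HETERO:GPU,CPU" -> "GPU,CPU".
#include <iostream>
#include <string>

static std::string stripPrefixAndBatch(const std::string& device_string) {
    std::string s = device_string;
    auto colon = s.find(':');
    if (colon != std::string::npos) {
        auto bracket = s.find('(');  // npos when there is no "(4)"-style suffix
        s = s.substr(colon + 1, bracket - colon - 1);
    }
    return s;
}

int main() {
    std::cout << stripPrefixAndBatch("BATCH:GPU(4)") << "\n";    // GPU
    std::cout << stripPrefixAndBatch("HETERO:GPU,CPU") << "\n";  // GPU,CPU
    return 0;
}

When no bracket is present, bracket is npos, the count argument becomes a huge value, and substr simply takes the rest of the string, which preserves the old behavior for MULTI/HETERO lists.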
diff --git a/src/inference/dev_api/ie_icore.hpp b/src/inference/dev_api/ie_icore.hpp
index 4949ab51b68..0bc0bb528e1 100644
--- a/src/inference/dev_api/ie_icore.hpp
+++ b/src/inference/dev_api/ie_icore.hpp
@@ -16,6 +16,7 @@
 #include "cpp/ie_cnn_network.h"
 #include "cpp_interfaces/interface/ie_iexecutable_network_internal.hpp"
 #include "ie_parameter.hpp"
+#include "ie_remote_context.hpp"
 #include "threading/ie_itask_executor.hpp"

 namespace InferenceEngine {
@@ -60,6 +61,22 @@ public:
                                                     const std::string& deviceName,
                                                     const std::map<std::string, std::string>& config = {}) = 0;

+    /**
+     * @brief Creates an executable network from a network object.
+     *
+     * Users can create as many networks as they need and use
+     * them simultaneously (up to the limitation of the hardware resources)
+     *
+     * @param network CNNNetwork object acquired from Core::ReadNetwork
+     * @param remoteCtx "Remote" (non-CPU) accelerator device-specific execution context to use
+     * @param config Optional map of pairs: (config parameter name, config parameter value) relevant only for this load
+     * operation
+     * @return An executable network reference
+     */
+    virtual SoExecutableNetworkInternal LoadNetwork(const CNNNetwork& network,
+                                                    const RemoteContext::Ptr& remoteCtx,
+                                                    const std::map<std::string, std::string>& config = {}) = 0;
+
     /**
      * @brief Creates an executable network from a model file.
      *
@@ -142,6 +159,16 @@ public:
      */
     virtual bool DeviceSupportsImportExport(const std::string& deviceName) const = 0;

+    /**
+     * @brief Create a new shared context object on the specified accelerator device
+     * using specified plugin-specific low-level device API parameters (device handle, pointer, etc.)
+     * @param deviceName Name of a device to create the new shared context on.
+     * @param params Map of device-specific shared context parameters.
+     * @return A shared pointer to the created remote context.
+     */
+    virtual InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
+                                                              const InferenceEngine::ParamMap&) = 0;
+
     virtual bool isNewAPI() const = 0;

     /**
@@ -165,6 +192,7 @@ public:
     static std::vector<std::string> getHeteroDevices(std::string fallbackDevice);
     static std::vector<std::string> getMultiDevices(std::string devicesList);
+    static std::string getBatchDevice(std::string devicesList);
 };

 }  // namespace InferenceEngine
diff --git a/src/inference/dev_api/performance_heuristics.hpp b/src/inference/dev_api/performance_heuristics.hpp
index aeb4ebcfaf0..0f68401c11b 100644
--- a/src/inference/dev_api/performance_heuristics.hpp
+++ b/src/inference/dev_api/performance_heuristics.hpp
@@ -23,14 +23,12 @@ struct MemBandwidthPressure {

 static MemBandwidthPressure MemBandwidthPressureTolerance(
     const std::shared_ptr<ngraph::Function> nGraphFunc,
-    const float L2_cache_size,
-    const float L3_cache_size,
+    const float cache_size,
     const float memThresholdAssumeLimited = MemBandwidthPressure::LIMITED) {
     int total_convs = 0, mem_limited_convs = 0, compute_convs = 0, total_gemms = 0, mem_limited_gemms = 0,
         total_deconvs = 0, compute_deconvs = 0, mem_limited_deconvs = 0;
-    auto memLimitedFactor = [&](int size_data_moved, int datatype_size) -> float {
-        return (L2_cache_size * 1.0f /*util factor, tbd */
-                / (size_data_moved * datatype_size));
+    auto memLimitedFactor = [&](int size_data_moved, int datatype_size = 4) -> float {
+        return (cache_size / (size_data_moved * datatype_size));
     };
     auto isLowPrecision = [&](ngraph::element::Type type) -> bool {
         return (type == ngraph::element::i8) || (type == ngraph::element::u8);
diff --git a/src/inference/dev_api/threading/ie_thread_safe_containers.hpp b/src/inference/dev_api/threading/ie_thread_safe_containers.hpp
new file mode 100644
index 00000000000..3849339d2a2
--- /dev/null
+++ b/src/inference/dev_api/threading/ie_thread_safe_containers.hpp
@@ -0,0 +1,86 @@
+// Copyright (C) 2018-2021 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+///////////////////////////////////////////////////////////////////////////////////////////////////
+#pragma once
+
+#include <memory>
+#include <mutex>
+#include <queue>
+#include <utility>
+
+#include "ie_parallel.hpp"
+#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
+#    include <tbb/concurrent_queue.h>
+#endif
+
+namespace InferenceEngine {
+
+template <typename T>
+class ThreadSafeQueueWithSize {
+public:
+    void push(T value) {
+        std::lock_guard<std::mutex> lock(_mutex);
+        _queue.push(std::move(value));
+    }
+    bool try_pop(T& value) {
+        std::lock_guard<std::mutex> lock(_mutex);
+        if (!_queue.empty()) {
+            value = std::move(_queue.front());
+            _queue.pop();
+            return true;
+        } else {
+            return false;
+        }
+    }
+    size_t size() {
+        std::lock_guard<std::mutex> lock(_mutex);
+        return _queue.size();
+    }
+
+protected:
+    std::queue<T> _queue;
+    std::mutex _mutex;
+};
+#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
+template <typename T>
+using ThreadSafeQueue = tbb::concurrent_queue<T>;
+template <typename T>
+using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
+#else
+template <typename T>
+using ThreadSafeQueue = ThreadSafeQueueWithSize<T>;
+template <typename T>
+class ThreadSafeBoundedQueue {
+public:
+    ThreadSafeBoundedQueue() = default;
+    bool try_push(T value) {
+        std::lock_guard<std::mutex> lock(_mutex);
+        if (_capacity) {
+            _queue.push(std::move(value));
+        }
+        return _capacity;
+    }
+    bool try_pop(T& value) {
+        std::lock_guard<std::mutex> lock(_mutex);
+        if (_capacity && !_queue.empty()) {
+            value = std::move(_queue.front());
+            _queue.pop();
+            return true;
+        } else {
+            return false;
+        }
+    }
+    void set_capacity(std::size_t newCapacity) {
+        std::lock_guard<std::mutex> lock(_mutex);
+        _capacity = newCapacity;
+    }
+
+protected:
+    std::queue<T> _queue;
+    std::mutex _mutex;
+    bool _capacity = false;
+};
+#endif
+}  // namespace InferenceEngine
diff --git a/src/inference/include/ie/ie_plugin_config.hpp b/src/inference/include/ie/ie_plugin_config.hpp
index 09f62301f7e..b30c403f588 100644
--- a/src/inference/include/ie/ie_plugin_config.hpp
+++ b/src/inference/include/ie/ie_plugin_config.hpp
@@ -118,6 +118,18 @@ DECLARE_METRIC_VALUE(BATCHED_BLOB);
  * String value for metric name is "RANGE_FOR_STREAMS".
  */
 DECLARE_METRIC_KEY(RANGE_FOR_STREAMS, std::tuple<unsigned int, unsigned int>);
+/**
+ * @brief Metric to query the optimal batch size for the given device and network
+ *
+ * The metric returns a value of the unsigned int type:
+ * the optimal batch size for the given network on the given device. The returned value is aligned to a power of 2.
+ * MODEL_PTR is a required option for this metric, since the optimal batch size depends on the model,
+ * so if MODEL_PTR is not given, the result of the metric is always 1.
+ * For the GPU the metric is queried automatically whenever the OpenVINO performance hint for the throughput is used,
+ * so that the result (>1) governs the automatic batching (transparently to the application).
+ * The automatic batching can be disabled with ALLOW_AUTO_BATCHING set to NO
+ */
+DECLARE_METRIC_KEY(OPTIMAL_BATCH_SIZE, unsigned int);

 /**
  * @brief Metric to provide a hint for a range for number of async infer requests. If device supports streams,
@@ -250,6 +262,15 @@ DECLARE_CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS);
 DECLARE_CONFIG_VALUE(YES);
 DECLARE_CONFIG_VALUE(NO);

+/**
+ * @brief Auto-batching configuration, string for the device + batch size, e.g. "GPU(4)"
+ */
+DECLARE_CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG);
+/**
+ * @brief Auto-batching configuration: string with the timeout (in ms), e.g. "100"
+ */
+DECLARE_CONFIG_KEY(AUTO_BATCH_TIMEOUT);
+
 /**
  * @brief Limit `#threads` that are used by Inference Engine for inference on the CPU.
 */
diff --git a/src/inference/src/ie_core.cpp b/src/inference/src/ie_core.cpp
index 13987458312..29c543f97c7 100644
--- a/src/inference/src/ie_core.cpp
+++ b/src/inference/src/ie_core.cpp
@@ -46,6 +46,7 @@
 #endif

 using namespace InferenceEngine::PluginConfigParams;
+using namespace InferenceEngine;
 using namespace std::placeholders;

 namespace ov {
@@ -94,6 +95,9 @@ Parsed parseDeviceNameIntoConfig(const std::string& deviceName, const std::ma
             config_[ie::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES] =
                 deviceName.substr(std::string("AUTO:").size());
         }
+    } else if (deviceName_.find("BATCH:") == 0) {
+        deviceName_ = "BATCH";
+        config_[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceName.substr(6);
     } else {
         ie::DeviceIDParser parser(deviceName_);
         deviceName_ = parser.getDeviceName();
@@ -480,14 +484,22 @@ public:
         return newAPI;
     }

-    ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(const ie::CNNNetwork& network,
-                                                                   const std::shared_ptr<ie::RemoteContext>& context,
-                                                                   const std::map<std::string, std::string>& config) {
+    ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(
+        const ie::CNNNetwork& network,
+        const std::shared_ptr<ie::RemoteContext>& context,
+        const std::map<std::string, std::string>& config) override {
         OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::RemoteContext");
         if (context == nullptr) {
             IE_THROW() << "Remote context is null";
         }
+        // have to deduce the device name/config from the context first
         auto parsed = parseDeviceNameIntoConfig(context->getDeviceName(), config);
+        std::string& deviceName = parsed._deviceName;
+        std::map<std::string, std::string>& config_with_batch = parsed._config;
+        // if auto-batching is applicable, the below function will patch the device name and config accordingly:
+        ApplyAutoBatching(network, deviceName, config_with_batch);
+        parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
+
         auto plugin = GetCPPPluginByName(parsed._deviceName);
         ov::runtime::SoPtr<ie::IExecutableNetworkInternal> res;
         auto cacheManager = coreConfig.getCacheConfig()._cacheManager;
@@ -508,12 +520,59 @@ public:
         return res;
     }

+    void ApplyAutoBatching(const ie::CNNNetwork& network,
+                           std::string& deviceName,
+                           std::map<std::string, std::string>& config_with_batch) {
+        if (deviceName.find("BATCH") != std::string::npos) {
+            // explicitly enabled Auto-Batching e.g.
in the tests + auto pos = deviceName.find_first_of(":"); + if (pos != std::string::npos) { + auto deviceNameWithBatchSize = deviceName.substr(pos + 1); + auto deviceNameWithoutBatch = DeviceIDParser::getBatchDevice(deviceNameWithBatchSize); + auto function = network.getFunction(); + // have to execute the DetectionOutput separately (without batching) + // as this layer mix-in the values from the different inputs (batch id) + bool bDetectionOutput = false; + const std::string detectionOutputOpName = ngraph::op::DetectionOutput::get_type_info_static().name; + const std::string resultOpName = ngraph::op::Result::get_type_info_static().name; + for (auto&& node : function->get_ops()) { + auto isDetectionOutputParent = [&detectionOutputOpName](decltype(node)& nd) { + for (size_t n = 0; n < nd->get_input_size(); n++) { + if (detectionOutputOpName == nd->get_input_node_ptr(n)->get_type_info().name) + return true; + } + return false; + }; + + if ((detectionOutputOpName == node->get_type_info().name) || + ((resultOpName == node->get_type_info().name) && isDetectionOutputParent(node))) { + node->get_rt_info()["affinity"] = deviceNameWithoutBatch; + bDetectionOutput = true; + } else { + node->get_rt_info()["affinity"] = "BATCH"; + } + } + if (bDetectionOutput) { + deviceName = "HETERO:BATCH," + deviceNameWithoutBatch; + config_with_batch[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceNameWithBatchSize; + } else { + deviceName = "BATCH:" + deviceNameWithBatchSize; + } + } + } + } + ie::SoExecutableNetworkInternal LoadNetwork(const ie::CNNNetwork& network, - const std::string& deviceName, + const std::string& deviceNameOrig, const std::map& config) override { OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::CNN"); - bool forceDisableCache = config.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0; - auto parsed = parseDeviceNameIntoConfig(deviceName, config); + std::string deviceName = deviceNameOrig; + std::map config_with_batch = config; + // if auto-batching is applicable, the below function will patch the device name and config accordingly: + ApplyAutoBatching(network, deviceName, config_with_batch); + + bool forceDisableCache = config_with_batch.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0; + auto parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch); if (forceDisableCache) { // remove this config key from parsed as plugins can throw unsupported exception parsed._config.erase(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)); @@ -732,6 +791,19 @@ public: return devices; } + /** + * @brief Create a new shared context object on specified accelerator device + * using specified plugin-specific low level device API parameters (device handle, pointer, etc.) + * @param deviceName Name of a device to create new shared context on. + * @param params Map of device-specific shared context parameters. + * @return A shared pointer to a created remote context. 
+ */ + InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName, + const InferenceEngine::ParamMap& params) override { + auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params); + return GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr; + } + /** * @brief Returns reference to CPP plugin wrapper by a device name * @param deviceName A name of device @@ -1030,6 +1102,12 @@ public: deviceNames = ie::DeviceIDParser::getMultiDevices(deviceName.substr(pos + 1)); } deviceNames.emplace_back("AUTO"); + } else if (deviceName.find("BATCH") == 0) { + auto pos = deviceName.find_first_of(":"); + if (pos != std::string::npos) { + deviceNames = {ie::DeviceIDParser::getBatchDevice(deviceName.substr(pos + 1))}; + } + deviceNames.push_back("BATCH"); } else { deviceNames.push_back(deviceName); } @@ -1120,8 +1198,8 @@ std::vector DeviceIDParser::getHeteroDevices(std::string fallbackDe } std::vector DeviceIDParser::getMultiDevices(std::string devicesList) { - std::vector deviceNames; - auto trim_request_info = [](std::string device_with_requests) { + std::set deviceNames; + auto trim_request_info = [](const std::string& device_with_requests) { auto opening_bracket = device_with_requests.find_first_of('('); return device_with_requests.substr(0, opening_bracket); }; @@ -1132,14 +1210,36 @@ std::vector DeviceIDParser::getMultiDevices(std::string devicesList // we skip the #requests info here while ((pos = devicesList.find(delimiter)) != std::string::npos) { auto d = devicesList.substr(0, pos); - deviceNames.push_back(trim_request_info(d)); + if (d.find("BATCH") == 0) { + deviceNames.insert("BATCH"); + auto p = d.find_first_of(":"); + if (p != std::string::npos) + deviceNames.insert(DeviceIDParser::getBatchDevice(d.substr(p + 1))); + } else { + deviceNames.insert(trim_request_info(d)); + } devicesList.erase(0, pos + 1); } - if (!devicesList.empty()) - deviceNames.push_back(trim_request_info(devicesList)); + if (!devicesList.empty()) { + if (devicesList.find("BATCH") == 0) { + deviceNames.insert("BATCH"); + auto p = devicesList.find_first_of(":"); + if (p != std::string::npos) + deviceNames.insert(DeviceIDParser::getBatchDevice(devicesList.substr(p + 1))); + } else { + deviceNames.insert(trim_request_info(devicesList)); + } + } + return std::vector(deviceNames.begin(), deviceNames.end()); +} - return deviceNames; +std::string DeviceIDParser::getBatchDevice(std::string device) { + auto trim_request_info = [](const std::string& device_with_requests) { + auto opening_bracket = device_with_requests.find_first_of('('); + return device_with_requests.substr(0, opening_bracket); + }; + return trim_request_info(device); } class Core::Impl : public ov::runtime::CoreImpl { @@ -1207,18 +1307,7 @@ ExecutableNetwork Core::LoadNetwork(const std::string& modelPath, const std::map } RemoteContext::Ptr Core::CreateContext(const std::string& deviceName, const ParamMap& params) { - if (deviceName.find("HETERO") == 0) { - IE_THROW() << "HETERO device does not support remote context"; - } - if (deviceName.find("MULTI") == 0) { - IE_THROW() << "MULTI device does not support remote context"; - } - if (deviceName.find("AUTO") == 0) { - IE_THROW() << "AUTO device does not support remote context"; - } - - auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params); - return _impl->GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr; + return _impl->CreateContext(deviceName, params); } RemoteContext::Ptr Core::GetDefaultContext(const 
std::string& deviceName) {
diff --git a/src/plugins/CMakeLists.txt b/src/plugins/CMakeLists.txt
index 54f90dca336..1c041774a2f 100644
--- a/src/plugins/CMakeLists.txt
+++ b/src/plugins/CMakeLists.txt
@@ -21,3 +21,7 @@ endif()
 if(ENABLE_AUTO OR ENABLE_MULTI)
     add_subdirectory(auto)
 endif()
+
+if(ENABLE_AUTO_BATCH)
+    add_subdirectory(auto_batch)
+endif()
diff --git a/src/plugins/auto/executable_network.cpp b/src/plugins/auto/executable_network.cpp
index 0f63b63f114..e5814fe891b 100644
--- a/src/plugins/auto/executable_network.cpp
+++ b/src/plugins/auto/executable_network.cpp
@@ -156,7 +156,8 @@ MultiDeviceExecutableNetwork::MultiDeviceExecutableNetwork(const std::string&
     , _needPerfCounters(needPerfCounters)
     , _multiPlugin(plugin)
     , _context(context)
-    , _workModeIsAUTO(true) {
+    , _workModeIsAUTO(true)
+    , _network(network) {
     if (_multiPlugin->GetCore() == nullptr) {
         IE_THROW() << "Please, work with " << _multiPlugin->GetName() << " device via InferenceEngine::Core object";
     }
@@ -667,10 +668,30 @@ InferenceEngine::Parameter MultiDeviceExecutableNetwork::GetMetric(const std::st
             real = _loadContext[ACTUALDEVICE].
                 executableNetwork->GetMetric(name).as<unsigned int>();
         } else {
+            IE_ASSERT(_loadContext[CPU].isAlready == true);
             real = _loadContext[CPU].
                 executableNetwork->GetMetric(name).as<unsigned int>();
+            std::unique_lock<std::mutex> lock(_confMutex);
+            auto deviceInfo = _loadContext[ACTUALDEVICE].deviceInfo;
+            lock.unlock();
+            if (deviceInfo.deviceName.find("GPU") != std::string::npos) {
+                const auto& mode = deviceInfo.config.find(CONFIG_KEY(PERFORMANCE_HINT));
+                if (mode != deviceInfo.config.end() && mode->second == CONFIG_VALUE(THROUGHPUT)) {
+                    std::map<std::string, InferenceEngine::Parameter> options;
+                    options["MODEL_PTR"] = _network.getFunction();  // CNNNetwork
+                    try {
+                        auto optimalBatchSize = _core->GetMetric(deviceInfo.deviceName,
+                                                                 METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
+                        auto rangeOfStreams = _core->GetMetric(deviceInfo.deviceName,
+                                                               METRIC_KEY(RANGE_FOR_STREAMS), options).as<std::tuple<unsigned int, unsigned int>>();
+                        real = (std::max)(real, std::get<1>(rangeOfStreams) * optimalBatchSize);
+                    } catch (const InferenceEngine::Exception &iie) {
+                        LOG_WARNING("[AUTOPLUGIN]get optimal infer request num for GPU auto-batch failed :%s", iie.what());
+                    }
+                }
+            }
         }
-        unsigned int res = std::max(8u, real);
+        unsigned int res = (std::max)(8u, real);
         IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, res);
     }
diff --git a/src/plugins/auto/executable_network.hpp b/src/plugins/auto/executable_network.hpp
index 45efc0450c1..2c963d912d8 100644
--- a/src/plugins/auto/executable_network.hpp
+++ b/src/plugins/auto/executable_network.hpp
@@ -7,22 +7,17 @@
 #include
 #include
-#include
 #include
 #include
 #include
 #include
-#include
-#include
-#include
-#include
+#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
+#include "threading/ie_thread_safe_containers.hpp"
+#include "threading/ie_itask_executor.hpp"
+#include "threading/ie_executor_manager.hpp"
 #include "ie_icore.hpp"

-#if (IE_THREAD == IE_THREAD_TBB || IE_THREAD == IE_THREAD_TBB_AUTO)
-#    include
-#endif
-
 #ifdef MULTIUNITTEST
 #define MOCKTESTMACRO virtual
 #define MultiDevicePlugin MockMultiDevicePlugin
@@ -79,66 +74,6 @@ enum AutoLoadContextIndex {
 template <typename T>
 using DeviceMap = std::unordered_map<DeviceName, T>;

-#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
-template <typename T>
-using ThreadSafeQueue = tbb::concurrent_queue<T>;
-template <typename T>
-using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
-#else
-template <typename T>
-class ThreadSafeQueue {
-public:
-    void push(T value) {
-        std::lock_guard lock(_mutex);
_queue.push(std::move(value)); - } - bool try_pop(T& value) { - std::lock_guard lock(_mutex); - if (!_queue.empty()) { - value = std::move(_queue.front()); - _queue.pop(); - return true; - } else { - return false; - } - } -protected: - std::queue _queue; - std::mutex _mutex; -}; -template -class ThreadSafeBoundedQueue { -public: - ThreadSafeBoundedQueue() = default; - bool try_push(T value) { - std::lock_guard lock(_mutex); - if (_capacity) { - _queue.push(std::move(value)); - } - return _capacity; - } - bool try_pop(T& value) { - std::lock_guard lock(_mutex); - if (_capacity && !_queue.empty()) { - value = std::move(_queue.front()); - _queue.pop(); - return true; - } else { - return false; - } - } - void set_capacity(std::size_t newCapacity) { - std::lock_guard lock(_mutex); - _capacity = newCapacity; - } - -protected: - std::queue _queue; - std::mutex _mutex; - bool _capacity = false; -}; -#endif - class MultiDeviceExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault, public InferenceEngine::ITaskExecutor { public: @@ -148,7 +83,7 @@ public: InferenceEngine::Task _task; std::exception_ptr _exceptionPtr = nullptr; }; - using NotBusyWorkerRequests = ThreadSafeBoundedQueue; + using NotBusyWorkerRequests = InferenceEngine::ThreadSafeBoundedQueue; explicit MultiDeviceExecutableNetwork(const DeviceMap& networksPerDevice, const std::vector& networkDevices, @@ -186,8 +121,8 @@ public: std::vector _devicePriorities; const std::vector _devicePrioritiesInitial; DeviceMap _networksPerDevice; - ThreadSafeQueue _inferPipelineTasks; - DeviceMap>> _inferPipelineTasksDeviceSpecific; + InferenceEngine::ThreadSafeQueue _inferPipelineTasks; + DeviceMap>> _inferPipelineTasksDeviceSpecific; DeviceMap _idleWorkerRequests; DeviceMap> _workerRequests; std::unordered_map _config; @@ -217,6 +152,7 @@ private: std::promise _firstLoadPromise; mutable AutoLoadContext _loadContext[CONTEXTNUM]; mutable std::mutex _confMutex; + const InferenceEngine::CNNNetwork _network; }; } // namespace MultiDevicePlugin diff --git a/src/plugins/auto_batch/CMakeLists.txt b/src/plugins/auto_batch/CMakeLists.txt new file mode 100644 index 00000000000..0eb9dd31f06 --- /dev/null +++ b/src/plugins/auto_batch/CMakeLists.txt @@ -0,0 +1,20 @@ +# Copyright (C) 2018-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +# + +set(TARGET_NAME "ov_auto_batch_plugin") + +file(GLOB SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp) + +file(GLOB HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp) + +ie_add_plugin(NAME ${TARGET_NAME} + DEVICE_NAME "BATCH" + SOURCES ${SOURCES} ${HEADERS} + VERSION_DEFINES_FOR auto_batch.cpp ADD_CLANG_FORMAT) + +target_link_libraries(${TARGET_NAME} PRIVATE Threads::Threads) + +ie_add_api_validator_post_build_step(TARGET ${TARGET_NAME}) + +set_target_properties(${TARGET_NAME} PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE ${ENABLE_LTO}) diff --git a/src/plugins/auto_batch/auto_batch.cpp b/src/plugins/auto_batch/auto_batch.cpp new file mode 100644 index 00000000000..104e856201f --- /dev/null +++ b/src/plugins/auto_batch/auto_batch.cpp @@ -0,0 +1,731 @@ +// Copyright (C) 2018-2021 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +/////////////////////////////////////////////////////////////////////////////////////////////////// +#include "auto_batch.hpp" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace AutoBatchPlugin { +using namespace InferenceEngine; + +std::vector supported_configKeys = 
{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CONFIG_KEY(AUTO_BATCH_TIMEOUT)}; + +template +Blob::Ptr create_shared_blob_on_top_of_batched_blob(Blob::Ptr batched_blob, size_t batch_id, size_t batch_num) { + typedef typename PrecisionTrait::value_type TYPE; + typedef typename std::add_pointer::type TYPEPTR; + auto ptr = batched_blob->buffer().as(); + auto sizePerBatch = batched_blob->size() / batch_num; + auto layout = batched_blob->getTensorDesc().getLayout(); + SizeVector dims = batched_blob->getTensorDesc().getDims(); + // the below code is a placeholder for the WIP (22.1) functionality + // that will check the reshaping by the batch is robust (CVS-51744) + if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW || + layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC || + layout == InferenceEngine::Layout::NDHWC) { + dims[0] = 1; + assert(batched_blob->getTensorDesc().getPrecision() == precision); + return make_shared_blob({precision, dims, batched_blob->getTensorDesc().getLayout()}, + ptr + sizePerBatch * batch_id, + sizePerBatch); + } else { + // same blob for all requests (e.g. constants) + return make_shared_blob({precision, dims, batched_blob->getTensorDesc().getLayout()}, ptr); + } +} + +// ------------------------------AutoBatchInferRequest---------------------------- +AutoBatchInferRequest::AutoBatchInferRequest(const InputsDataMap& networkInputs, + const OutputsDataMap& networkOutputs, + AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr, + int batch_id, + int num_batch, + bool needPerfCounters) + : IInferRequestInternal(networkInputs, networkOutputs), + _myBatchedRequestWrapper(workerRequestPtr), + _needPerfCounters(needPerfCounters), + _batchId(batch_id), + _batchSize(num_batch) { + // Allocate all input blobs + for (const auto& it : networkInputs) { + auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first); + Blob::Ptr res; + switch (it.second->getTensorDesc().getPrecision()) { + case InferenceEngine::Precision::FP32: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + case InferenceEngine::Precision::I32: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + case InferenceEngine::Precision::I8: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + case InferenceEngine::Precision::U16: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + + case InferenceEngine::Precision::I16: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + + break; + case InferenceEngine::Precision::U8: + case InferenceEngine::Precision::BOOL: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + default: + IE_THROW() << "Unsupported input precision " << it.second->getTensorDesc().getPrecision(); + } + _inputs[it.first] = res; + } + // Allocate all output blobs + for (const auto& it : networkOutputs) { + auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first); + Blob::Ptr res; + switch 
(it.second->getTensorDesc().getPrecision()) { + case InferenceEngine::Precision::FP32: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + case InferenceEngine::Precision::I32: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + case InferenceEngine::Precision::I8: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + case InferenceEngine::Precision::U16: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + + case InferenceEngine::Precision::I16: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + + break; + case InferenceEngine::Precision::U8: + case InferenceEngine::Precision::BOOL: + res = create_shared_blob_on_top_of_batched_blob( + _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first), + batch_id, + num_batch); + break; + default: + IE_THROW(NotImplemented) << "Unsupported input precision " << it.second->getTensorDesc().getPrecision(); + } + _outputs[it.first] = res; + } +} + +void AutoBatchInferRequest::SetBlobsToAnotherRequest(SoIInferRequestInternal& req) { + for (const auto& it : _networkInputs) { + auto& name = it.first; + // this request is already in BUSY state, so using the internal functions safely + auto blob = GetBlob(name); + if (req->GetBlob(name) != blob) + req->SetBlob(name, blob); + } + for (const auto& it : _networkOutputs) { + auto& name = it.first; + // this request is already in BUSY state, so using the internal functions safely + auto blob = GetBlob(name); + if (req->GetBlob(name) != blob) + req->SetBlob(name, blob); + } +} + +void AutoBatchInferRequest::CopyInputsIfNeeded() { + for (const auto& it : _networkInputs) { + auto& name = it.first; + // this request is already in BUSY state, so using the internal functions safely + CopyBlobIfNeeded(GetBlob(name), _myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), true); + } +} + +void AutoBatchInferRequest::CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src, + InferenceEngine::Blob::Ptr dst, + bool bInput) { + auto bufferDst = dst->buffer(); + auto ptrDst = bufferDst.as(); + auto bufferSrc = src->cbuffer(); + auto ptrSrc = bufferSrc.as(); + ptrdiff_t szDst = dst->byteSize(); + ptrdiff_t szSrc = src->byteSize(); + if (bInput) { + ptrdiff_t offset = szSrc != szDst ? _batchId * szDst / _batchSize : 0; + if ((ptrDst + offset) == ptrSrc) + return; + else + memcpy(ptrDst + offset, ptrSrc, szSrc); + } else { + ptrdiff_t offset = szSrc != szDst ? 
_batchId * szSrc / _batchSize : 0; + if ((ptrSrc + offset) == ptrDst) + return; + else + memcpy(ptrDst, ptrSrc + offset, szDst); + } +} + +void AutoBatchInferRequest::CopyOutputsIfNeeded() { + for (const auto& it : _networkOutputs) { + auto& name = it.first; + // this request is already in BUSY state, so using the internal functions safely + CopyBlobIfNeeded(_myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), GetBlob(name), false); + } +} + +std::map AutoBatchInferRequest::GetPerformanceCounts() const { + return _perfMap; +} + +AutoBatchAsyncInferRequest::AutoBatchAsyncInferRequest( + const AutoBatchInferRequest::Ptr& inferRequest, + const bool needPerfCounters, + InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch, + const ITaskExecutor::Ptr& callbackExecutor) + : AsyncInferRequestThreadSafeDefault(inferRequest, nullptr, callbackExecutor), + _inferRequestWithoutBatch(inferRequestWithoutBatch), + _inferRequest{inferRequest} { + // this executor starts the inference while the task (checking the result) is passed to the next stage + struct ThisRequestExecutor : public ITaskExecutor { + explicit ThisRequestExecutor(AutoBatchAsyncInferRequest* _this_) : _this{_this_} {} + void run(Task task) override { + auto& workerInferRequest = _this->_inferRequest->_myBatchedRequestWrapper; + std::pair t; + t.first = _this; + t.second = std::move(task); + workerInferRequest._tasks.push(t); + // it is ok to call size() here as the queue only grows (and the bulk removal happens under the mutex) + const int sz = workerInferRequest._tasks.size(); + if (sz == workerInferRequest._batchSize) { + workerInferRequest._cond.notify_one(); + } + }; + AutoBatchAsyncInferRequest* _this = nullptr; + }; + _pipeline = { + {/*TaskExecutor*/ std::make_shared(this), /*task*/ [this, needPerfCounters] { + if (this->_inferRequest->_exceptionPtr) // if the exception happened in the batch1 fallback + std::rethrow_exception(this->_inferRequest->_exceptionPtr); + if (this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr) // when the batchN execution failed + std::rethrow_exception(this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr); + this->_inferRequest->CopyOutputsIfNeeded(); + }}}; +} + +void AutoBatchAsyncInferRequest::Infer_ThreadUnsafe() { + InferUsingAsync(); +} + +AutoBatchAsyncInferRequest::~AutoBatchAsyncInferRequest() { + StopAndWait(); +} + +// ------------------------------AutoBatchExecutableNetwork---------------------------- +AutoBatchExecutableNetwork::AutoBatchExecutableNetwork( + const InferenceEngine::SoExecutableNetworkInternal& networkWithBatch, + const InferenceEngine::SoExecutableNetworkInternal& networkWithoutBatch, + const DeviceInformation& networkDevice, + const std::unordered_map& config, + const bool needPerfCounters) + : InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr, + std::make_shared()), + _network{networkWithBatch}, + _networkWithoutBatch{networkWithoutBatch}, + _config{config}, + _needPerfCounters{needPerfCounters} { + // WA for gcc 4.8 ( fails compilation with member init-list) + _device = networkDevice; + auto time_out = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT)); + if (time_out != config.end()) + _timeOut = ParseTimeoutValue(time_out->second.as()); +} + +AutoBatchExecutableNetwork::~AutoBatchExecutableNetwork() { + _terminate = true; + for (auto w : _workerRequests) { + w->_thread.join(); + } + _workerRequests.clear(); +} + +unsigned int AutoBatchExecutableNetwork::ParseTimeoutValue(const std::string& s) { + auto val = std::stoi(s); + if (val 
< 0) + IE_THROW(ParameterMismatch) << "Value for the " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " should be unsigned int"; + return val; +} + +std::shared_ptr AutoBatchExecutableNetwork::GetContext() const { + return _network->GetContext(); +} + +InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequestImpl( + InferenceEngine::InputsDataMap networkInputs, + InferenceEngine::OutputsDataMap networkOutputs) { + // todo : guard request creation from another thread/on-the-fly + auto num = _numRequestsCreated++; + auto batch_id = num % _device.batchForDevice; + if (!batch_id) { // need new request + _workerRequests.push_back(std::make_shared()); + auto workerRequestPtr = _workerRequests.back(); + workerRequestPtr->_inferRequestBatched = {_network->CreateInferRequest(), _network._so}; + workerRequestPtr->_batchSize = _device.batchForDevice; + workerRequestPtr->_completionTasks.resize(workerRequestPtr->_batchSize); + workerRequestPtr->_inferRequestBatched->SetCallback( + [workerRequestPtr, this](std::exception_ptr exceptionPtr) mutable { + if (exceptionPtr) + workerRequestPtr->_exceptionPtr = exceptionPtr; + IE_ASSERT(workerRequestPtr->_completionTasks.size() == (size_t)workerRequestPtr->_batchSize); + // notify the individual requests on the completion + for (int c = 0; c < workerRequestPtr->_batchSize; c++) { + workerRequestPtr->_completionTasks[c](); + } + // reset the timeout + workerRequestPtr->_cond.notify_one(); + }); + + workerRequestPtr->_thread = std::thread([workerRequestPtr, this] { + while (1) { + std::cv_status status; + { + std::unique_lock lock(workerRequestPtr->_mutex); + status = workerRequestPtr->_cond.wait_for(lock, std::chrono::milliseconds(_timeOut)); + } + if (_terminate) { + break; + } else { + // as we pop the tasks from the queue only here + // it is ok to call size() (as the _tasks can only grow in parallel) + const int sz = workerRequestPtr->_tasks.size(); + if (sz == workerRequestPtr->_batchSize) { + std::pair t; + for (int n = 0; n < sz; n++) { + IE_ASSERT(workerRequestPtr->_tasks.try_pop(t)); + workerRequestPtr->_completionTasks[n] = std::move(t.second); + t.first->_inferRequest->CopyInputsIfNeeded(); + } + workerRequestPtr->_inferRequestBatched->StartAsync(); + } else if ((status == std::cv_status::timeout) && sz) { + // timeout to collect the batch is over, have to execute the requests in the batch1 mode + std::pair t; + // popping all tasks collected by the moment of the time-out and execute each with batch1 + std::atomic arrived = {0}; + std::promise all_completed; + auto all_completed_future = all_completed.get_future(); + for (int n = 0; n < sz; n++) { + IE_ASSERT(workerRequestPtr->_tasks.try_pop(t)); + t.first->_inferRequestWithoutBatch->SetCallback( + [t, sz, &arrived, &all_completed](std::exception_ptr p) { + if (p) + t.first->_inferRequest->_exceptionPtr = p; + t.second(); + if (sz == ++arrived) + all_completed.set_value(); + }); + t.first->_inferRequest->SetBlobsToAnotherRequest(t.first->_inferRequestWithoutBatch); + t.first->_inferRequestWithoutBatch->StartAsync(); + } + all_completed_future.get(); + // now when all the tasks for this batch are completed, start waiting for the timeout again + } + } + } + }); + } + return std::make_shared(networkInputs, + networkOutputs, + *_workerRequests.back(), + batch_id, + _device.batchForDevice, + _needPerfCounters); +} + +InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequest() { + auto syncRequestImpl = CreateInferRequestImpl(_networkInputs, 
_networkOutputs);
+    syncRequestImpl->setPointerToExecutableNetworkInternal(shared_from_this());
+    InferenceEngine::SoIInferRequestInternal inferRequestWithoutBatch = {_networkWithoutBatch->CreateInferRequest(),
+                                                                         _networkWithoutBatch._so};
+    return std::make_shared<AutoBatchAsyncInferRequest>(
+        std::static_pointer_cast<AutoBatchInferRequest>(syncRequestImpl),
+        _needPerfCounters,
+        inferRequestWithoutBatch,
+        _callbackExecutor);
+}
+
+std::shared_ptr<ngraph::Function> AutoBatchExecutableNetwork::GetExecGraphInfo() {
+    return _network->GetExecGraphInfo() ? _network->GetExecGraphInfo() : _networkWithoutBatch->GetExecGraphInfo();
+}
+
+void AutoBatchExecutableNetwork::SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) {
+    auto timeout = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
+    if (timeout == config.end() || config.size() > 1) {
+        IE_THROW() << "The only config that can be changed on the fly for the AutoBatching is the "
+                   << CONFIG_KEY(AUTO_BATCH_TIMEOUT);
+    } else {
+        _timeOut = ParseTimeoutValue(timeout->second.as<std::string>());
+    }
+}
+
+InferenceEngine::Parameter AutoBatchExecutableNetwork::GetConfig(const std::string& name) const {
+    auto it = _config.find(name);
+    if (it != _config.end()) {
+        return it->second;
+    } else {
+        // find the config key among the network's config keys
+        auto param = _network->GetMetric(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
+        for (auto&& configKey : param.as<std::vector<std::string>>()) {
+            if (configKey == name) {
+                return _network->GetConfig(configKey);
+            }
+        }
+        IE_THROW(NotFound) << name << " not found in the ExecutableNetwork config";
+    }
+}
+
+InferenceEngine::Parameter AutoBatchExecutableNetwork::GetMetric(const std::string& name) const {
+    if (name == METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)) {
+        auto reqs = 0;
+        try {
+            auto hint = _network->GetConfig(CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS)).as<std::string>();
+            reqs = InferenceEngine::PerfHintsConfig::CheckPerformanceHintRequestValue(hint);
+            if (!reqs)  // no limitations from the user, let's deduce the full-blown #requests
+                // (multiplied by the device's capability to run multiple requests for further perf)
+                reqs = _device.batchForDevice *
+                       _network->GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
+        } catch (const InferenceEngine::Exception& iie) {
+        }
+        reqs = std::max(reqs, _device.batchForDevice);  // round up to the possible user's value
+        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, reqs);
+    } else if (name == METRIC_KEY(NETWORK_NAME)) {
+        IE_SET_METRIC_RETURN(NETWORK_NAME, _network->GetMetric(METRIC_KEY(NETWORK_NAME)).as<std::string>());
+    } else if (name == METRIC_KEY(SUPPORTED_METRICS)) {
+        IE_SET_METRIC_RETURN(SUPPORTED_METRICS,
+                             {METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS),
+                              METRIC_KEY(SUPPORTED_METRICS),
+                              METRIC_KEY(NETWORK_NAME),
+                              METRIC_KEY(SUPPORTED_CONFIG_KEYS)});
+    } else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
+        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS,
+                             {CONFIG_KEY(AUTO_BATCH_TIMEOUT)});  // only the timeout can be changed on the fly
+    } else {
+        IE_THROW() << "Unsupported Network metric: " << name;
+    }
+}
+
+// ------------------------------AutoBatchInferencePlugin----------------------------
+
+namespace {
+
+std::map<std::string, std::string> mergeConfigs(std::map<std::string, std::string> config,
+                                                const std::map<std::string, std::string>& local) {
+    for (auto&& kvp : local) {
+        config[kvp.first] = kvp.second;
+    }
+    return config;
+}
+
+}  // namespace
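The OPTIMAL_NUMBER_OF_INFER_REQUESTS deduction in GetMetric above is easier to see with numbers. A hypothetical trace (standalone C++, values invented for illustration):

// Hypothetical trace of the deduction above (not from the patch):
#include <algorithm>
#include <iostream>

int main() {
    const int batchForDevice = 4;  // e.g. the network was loaded as BATCH:GPU(4)
    int reqs = 0;                  // no PERFORMANCE_HINT_NUM_REQUESTS limit from the user
    if (!reqs)
        reqs = batchForDevice * 4;  // suppose the underlying network reports 4 optimal requests -> 16
    reqs = std::max(reqs, batchForDevice);  // never below one full batch
    std::cout << reqs << "\n";  // 16
    return 0;
}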
+std::map<std::string, std::string> AutoBatchInferencePlugin::GetSupportedConfig(
+    const std::map<std::string, std::string>& config,
+    const std::string& deviceName) const {
+    std::vector<std::string> supportedConfigKeys = GetCore()->GetMetric(deviceName, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
+    std::map<std::string, std::string> supportedConfig;
+    for (auto&& key : supportedConfigKeys) {
+        auto itKey = config.find(key);
+        if (config.end() != itKey) {
+            supportedConfig[key] = itKey->second;
+        }
+    }
+    return supportedConfig;
+}
+
+DeviceInformation AutoBatchInferencePlugin::ParseBatchDevice(const std::string& deviceWithBatch) {
+    auto&& d = deviceWithBatch;
+    auto openingBracket = d.find_first_of('(');
+    auto closingBracket = d.find_first_of(')', openingBracket);
+    auto deviceName = d.substr(0, openingBracket);
+
+    int batch = 1;
+    if (closingBracket != std::string::npos && openingBracket < closingBracket) {
+        batch = std::stol(d.substr(openingBracket + 1, closingBracket - 1));
+
+        if (batch <= 0) {
+            IE_THROW() << "Batch value for '" << deviceName << "' must be > 0, while " << batch << " is passed";
+        }
+    }
+    return {deviceName, {{}}, batch};
+}
+
+DeviceInformation AutoBatchInferencePlugin::ParseMetaDevice(const std::string& devicesBatchCfg,
+                                                            const std::map<std::string, std::string>& config) const {
+    auto getDeviceConfig = [&](const DeviceName& deviceWithID) {
+        DeviceIDParser deviceParser(deviceWithID);
+        std::string deviceName = deviceParser.getDeviceName();
+        std::map<std::string, std::string> tconfig = mergeConfigs(_config, config);
+
+        // set the device ID if any
+        std::string deviceIDLocal = deviceParser.getDeviceID();
+        if (!deviceIDLocal.empty()) {
+            tconfig[PluginConfigParams::KEY_DEVICE_ID] = deviceIDLocal;
+        }
+
+        return GetSupportedConfig(tconfig, deviceName);
+    };
+
+    auto metaDevice = ParseBatchDevice(devicesBatchCfg);
+    metaDevice.config = getDeviceConfig(metaDevice.deviceName);
+
+    auto cfg = config;
+    // check that no irrelevant config keys are left
+    for (auto k : config) {
+        const auto& name = k.first;
+        auto found_in_supported_cfg = std::find(supported_configKeys.begin(), supported_configKeys.end(), k.first);
+        auto found_in_device_cfg = metaDevice.config.find(k.first);
+        if (found_in_device_cfg == metaDevice.config.end() && found_in_supported_cfg == supported_configKeys.end()) {
+            IE_THROW() << "Unsupported config key: " << name;
+        }
+    }
+    return metaDevice;
+}
+
+RemoteContext::Ptr AutoBatchInferencePlugin::CreateContext(const InferenceEngine::ParamMap& config) {
+    auto cfg = config;
+    auto it = cfg.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
+    if (it == cfg.end())
+        IE_THROW() << "Value for KEY_AUTO_BATCH is not set";
+
+    auto val = it->second;
+    auto metaDevice = ParseMetaDevice(val, std::map<std::string, std::string>());
+    cfg.erase(it);
+    return GetCore()->CreateContext(metaDevice.deviceName, cfg);
+}
+
+Parameter AutoBatchInferencePlugin::GetConfig(const std::string& name,
+                                              const std::map<std::string, Parameter>& options) const {
+    if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), name)) {
+        auto it = _config.find(name);
+        if (it == _config.end()) {
+            IE_THROW() << "Value for " << name << " is not set";
+        } else {
+            return {it->second};
+        }
+    } else {
+        IE_THROW() << "Unsupported config key: " << name;
+    }
+}
+
+void AutoBatchInferencePlugin::CheckConfig(const std::map<std::string, std::string>& config) {
+    for (auto&& kvp : config) {
+        const auto name = kvp.first;
+        const auto val = kvp.second;
+        if (supported_configKeys.end() == std::find(supported_configKeys.begin(), supported_configKeys.end(), name))
+            IE_THROW() << "Unsupported config key: " << name;
+        if (name == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
+            ParseBatchDevice(val);
+        } else if (name == CONFIG_KEY(AUTO_BATCH_TIMEOUT)) {
+            try {
+                auto t = std::stoi(val);
+                if (t < 0)
+                    IE_THROW(ParameterMismatch);
+            } catch (const std::exception& e) {
+                IE_THROW(ParameterMismatch)
+                    << " Expecting unsigned int value for " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " got " << val;
+            }
+        }
+    }
+}
+
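CheckConfig above accepts exactly two keys. A hypothetical driver snippet (not from the patch; it assumes the raw string forms of the keys match the CONFIG_KEY macros and that the BATCH plugin is registered in the Core):

// Hypothetical driver for the CheckConfig/SetConfig logic above.
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;
    core.SetConfig({{"AUTO_BATCH_DEVICE_CONFIG", "GPU(8)"}}, "BATCH");  // accepted: ParseBatchDevice parses GPU(8)
    core.SetConfig({{"AUTO_BATCH_TIMEOUT", "50"}}, "BATCH");            // accepted: non-negative integer, in ms
    // core.SetConfig({{"AUTO_BATCH_TIMEOUT", "-1"}}, "BATCH");         // would throw ParameterMismatch
    return 0;
}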
+void AutoBatchInferencePlugin::SetConfig(const std::map<std::string, std::string>& config) { + CheckConfig(config); + for (auto&& kvp : config) { + _config[kvp.first] = kvp.second; + } +} + +static const Version version = {{2, 1}, CI_BUILD_NUMBER, "AutoBatchPlugin"}; +IE_DEFINE_PLUGIN_CREATE_FUNCTION(AutoBatchInferencePlugin, version) + +AutoBatchInferencePlugin::AutoBatchInferencePlugin() { + _pluginName = "BATCH"; +} + +InferenceEngine::Parameter AutoBatchInferencePlugin::GetMetric( + const std::string& name, + const std::map<std::string, InferenceEngine::Parameter>& options) const { + if (name == METRIC_KEY(SUPPORTED_METRICS)) { + std::vector<std::string> metrics; + metrics.push_back(METRIC_KEY(SUPPORTED_METRICS)); + metrics.push_back(METRIC_KEY(FULL_DEVICE_NAME)); + metrics.push_back(METRIC_KEY(SUPPORTED_CONFIG_KEYS)); + IE_SET_METRIC_RETURN(SUPPORTED_METRICS, metrics); + } else if (name == METRIC_KEY(FULL_DEVICE_NAME)) { + IE_SET_METRIC_RETURN(FULL_DEVICE_NAME, _pluginName); + } else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) { + IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, supported_configKeys); + } else { + IE_THROW(NotFound) << "Unsupported metric key: " << name; + } +} + +IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl( + const InferenceEngine::CNNNetwork& network, + const std::map<std::string, std::string>& config) { + return LoadNetworkImpl(network, nullptr, config); +} + +InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadNetworkImpl( + const InferenceEngine::CNNNetwork& network, + const std::shared_ptr<InferenceEngine::RemoteContext> ctx, + const std::map<std::string, std::string>& config) { + if (GetCore() == nullptr) { + IE_THROW() << "Please, work with the BATCH device via the InferenceEngine::Core object"; + } + + auto fullConfig = mergeConfigs(_config, config); + auto device_batch = fullConfig.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)); + if (device_batch == fullConfig.end()) { + IE_THROW() << "KEY_AUTO_BATCH_DEVICE_CONFIG key is not set for the BATCH device"; + } + + auto metaDevice = ParseMetaDevice(device_batch->second, fullConfig); + const auto& deviceName = metaDevice.deviceName; + const auto& deviceConfig = metaDevice.config; + const auto perfConfig = fullConfig.find(PluginConfigParams::KEY_PERF_COUNT); + const bool enablePerfCounters = (fullConfig.end() != perfConfig) && (perfConfig->second == PluginConfigParams::YES); + + auto report_footprint = [](std::shared_ptr<ICore> pCore, std::string device) -> size_t { + size_t footprint = 0; + // TODO: use the per-network metric (22.2) rather than the plugin-level one + auto stats = pCore->GetMetric(device, GPU_METRIC_KEY(MEMORY_STATISTICS)).as<std::map<std::string, uint64_t>>(); + for (const auto& s : stats) + if (s.first.find("_current") != std::string::npos) + footprint += s.second; + return footprint; + }; + + size_t batch1_footprint = 0; + if (deviceName.find("GPU") != std::string::npos) + batch1_footprint = report_footprint(GetCore(), deviceName); + auto executableNetworkWithoutBatch = ctx ? 
GetCore()->LoadNetwork(network, ctx, deviceConfig) + : GetCore()->LoadNetwork(network, deviceName, deviceConfig); + if (deviceName.find("GPU") != std::string::npos) { + batch1_footprint = report_footprint(GetCore(), deviceName) - batch1_footprint; + if (batch1_footprint) { + const uint64_t total_mem = GetCore()->GetMetric(deviceName, GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE)); + const int estimated_batch = (total_mem - batch1_footprint) / batch1_footprint; + // round the estimate down to the closest power of two, e.g. 8 GB of device memory with + // a 0.5 GB batch-1 footprint gives (8 - 0.5) / 0.5 = 15, which rounds down to a batch of 8 + int closest = static_cast<int>(pow(2, floor(log(estimated_batch) / log(2)))); + closest = std::max(1, closest); + metaDevice.batchForDevice = std::min(metaDevice.batchForDevice, closest); + } + } + // auto-batch settings + std::unordered_map<std::string, InferenceEngine::Parameter> networkConfig; + for (const auto& c : fullConfig) { + if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), c.first)) + networkConfig.insert(c); + } + + InferenceEngine::SoExecutableNetworkInternal executableNetworkWithBatch; + if (metaDevice.batchForDevice > 1) { + try { + CNNNetwork clonedNetwork(InferenceEngine::details::cloneNetwork(network)); + const InputsDataMap inputInfo = clonedNetwork.getInputsInfo(); + ICNNNetwork::InputShapes shapes = clonedNetwork.getInputShapes(); + for (const InputsDataMap::value_type& item : inputInfo) { + auto layout = item.second->getTensorDesc().getLayout(); + // the code below is a placeholder for the WIP (22.1) functionality + // that will check that reshaping by the batch is robust (CVS-51744) + if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW || + layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC || + layout == InferenceEngine::Layout::NDHWC) { + assert(1 == shapes[item.first][0]); // do not reshape/re-batch originally batched networks + shapes[item.first][0] = metaDevice.batchForDevice; + } + } + clonedNetwork.reshape(shapes); + executableNetworkWithBatch = + ctx ? GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, ctx, deviceConfig) + : GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, deviceName, deviceConfig); + } catch (...) 
{ + executableNetworkWithBatch = {nullptr, nullptr}; + } + } + + if (!executableNetworkWithBatch) { + executableNetworkWithBatch = executableNetworkWithoutBatch; + metaDevice.batchForDevice = 1; + } + + return std::make_shared<AutoBatchExecutableNetwork>(executableNetworkWithBatch, + executableNetworkWithoutBatch, + metaDevice, + networkConfig, + enablePerfCounters); +} + +InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl( + const InferenceEngine::CNNNetwork& network, + const std::shared_ptr<InferenceEngine::RemoteContext>& context, + const std::map<std::string, std::string>& config) { + return LoadNetworkImpl(network, context, config); +} + +InferenceEngine::QueryNetworkResult AutoBatchInferencePlugin::QueryNetwork( + const InferenceEngine::CNNNetwork& network, + const std::map<std::string, std::string>& config) const { + auto cfg = config; + for (auto c : cfg) { + if (c.first == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) { + auto val = c.second; + cfg.erase(c.first); + auto metaDevice = ParseMetaDevice(val, cfg); + return GetCore()->QueryNetwork(network, metaDevice.deviceName, cfg); + } + } + IE_THROW() << "Value for KEY_AUTO_BATCH_DEVICE_CONFIG is not set"; +} +} // namespace AutoBatchPlugin diff --git a/src/plugins/auto_batch/auto_batch.hpp b/src/plugins/auto_batch/auto_batch.hpp new file mode 100644 index 00000000000..95660798417 --- /dev/null +++ b/src/plugins/auto_batch/auto_batch.hpp @@ -0,0 +1,159 @@ +// Copyright (C) 2018-2021 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +/////////////////////////////////////////////////////////////////////////////////////////////////// +#pragma once + +#include <atomic> +#include <condition_variable> +#include <map> +#include <memory> +#include <mutex> +#include <string> +#include <thread> +#include <unordered_map> +#include <vector> + +#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp" +#include "cpp_interfaces/impl/ie_infer_async_request_thread_safe_default.hpp" +#include "cpp_interfaces/interface/ie_iplugin_internal.hpp" +#include "ie_metric_helpers.hpp" +#include "threading/ie_thread_safe_containers.hpp" + +namespace AutoBatchPlugin { + +using DeviceName = std::string; + +struct DeviceInformation { + DeviceName deviceName; + std::map<std::string, std::string> config; + int batchForDevice; +}; + +class AutoBatchAsyncInferRequest; +class AutoBatchExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault { +public: + using Ptr = std::shared_ptr<AutoBatchExecutableNetwork>; + struct WorkerInferRequest { + using Ptr = std::shared_ptr<WorkerInferRequest>; + InferenceEngine::SoIInferRequestInternal _inferRequestBatched; + int _batchSize; + InferenceEngine::ThreadSafeQueueWithSize<std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task>> _tasks; + std::vector<InferenceEngine::Task> _completionTasks; + std::thread _thread; + std::condition_variable _cond; + std::mutex _mutex; + std::exception_ptr _exceptionPtr; + }; + + explicit AutoBatchExecutableNetwork( + const InferenceEngine::SoExecutableNetworkInternal& networkForDevice, + const InferenceEngine::SoExecutableNetworkInternal& networkForDeviceWithoutBatch, + const DeviceInformation& networkDevices, + const std::unordered_map<std::string, InferenceEngine::Parameter>& config, + const bool needPerfCounters = false); + + void SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) override; + InferenceEngine::Parameter GetConfig(const std::string& name) const override; + InferenceEngine::Parameter GetMetric(const std::string& name) const override; + InferenceEngine::IInferRequestInternal::Ptr CreateInferRequest() override; + InferenceEngine::IInferRequestInternal::Ptr CreateInferRequestImpl( + InferenceEngine::InputsDataMap networkInputs, + InferenceEngine::OutputsDataMap networkOutputs) override; + std::shared_ptr<InferenceEngine::RemoteContext> GetContext() const override; + std::shared_ptr<ngraph::Function> GetExecGraphInfo() override; + virtual ~AutoBatchExecutableNetwork(); + +protected: + 
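+ // Illustrative summary (comments only, not functional changes from the patch): together with
+ // the WorkerInferRequest structure above, the members below implement the batching flow — each
+ // worker buffers incoming tasks in _tasks up to _batchSize and fires _inferRequestBatched once
+ // a full batch is collected; if the AUTO_BATCH_TIMEOUT kept in _timeOut (in ms) expires first,
+ // the requests gathered so far fall back to the non-batched _networkWithoutBatch path, and
+ // _completionTasks signal per-slot readiness back to the individual requests.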
static unsigned int ParseTimeoutValue(const std::string&); + std::atomic_bool _terminate = {false}; + DeviceInformation _device; + InferenceEngine::SoExecutableNetworkInternal _network; + InferenceEngine::SoExecutableNetworkInternal _networkWithoutBatch; + std::vector<WorkerInferRequest::Ptr> _workerRequests; + std::unordered_map<std::string, InferenceEngine::Parameter> _config; + bool _needPerfCounters = false; + std::atomic_size_t _numRequestsCreated = {0}; + std::atomic_int _timeOut = {1000}; // in ms +}; + +class AutoBatchInferRequest : public InferenceEngine::IInferRequestInternal { +public: + using Ptr = std::shared_ptr<AutoBatchInferRequest>; + explicit AutoBatchInferRequest(const InferenceEngine::InputsDataMap& networkInputs, + const InferenceEngine::OutputsDataMap& networkOutputs, + AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr, + int batch_id, + int num_batch, + bool _needPerfCounters = false); + std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> GetPerformanceCounts() const override; + + // Batch-Device impl specific: sets the blobs of this request to another (batched) device request + void SetBlobsToAnotherRequest(InferenceEngine::SoIInferRequestInternal& req); + void CopyInputsIfNeeded(); + void CopyOutputsIfNeeded(); + AutoBatchExecutableNetwork::WorkerInferRequest& _myBatchedRequestWrapper; + std::exception_ptr _exceptionPtr; + +protected: + std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> _perfMap; + bool _needPerfCounters = false; + void CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src, InferenceEngine::Blob::Ptr dst, bool bInput); + size_t _batchId; + size_t _batchSize; +}; + +class AutoBatchAsyncInferRequest : public InferenceEngine::AsyncInferRequestThreadSafeDefault { +public: + using Ptr = std::shared_ptr<AutoBatchAsyncInferRequest>; + + explicit AutoBatchAsyncInferRequest(const AutoBatchInferRequest::Ptr& inferRequest, + const bool needPerfCounters, + InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch, + const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor); + void Infer_ThreadUnsafe() override; + virtual ~AutoBatchAsyncInferRequest(); + + InferenceEngine::SoIInferRequestInternal _inferRequestWithoutBatch; + AutoBatchInferRequest::Ptr _inferRequest; +}; + +class AutoBatchInferencePlugin : public InferenceEngine::IInferencePlugin { +public: + AutoBatchInferencePlugin(); + virtual ~AutoBatchInferencePlugin() = default; + InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl( + const InferenceEngine::CNNNetwork& network, + const std::map<std::string, std::string>& config) override; + InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl( + const InferenceEngine::CNNNetwork& network, + const std::shared_ptr<InferenceEngine::RemoteContext>& context, + const std::map<std::string, std::string>& config) override; + + void SetConfig(const std::map<std::string, std::string>& config) override; + void CheckConfig(const std::map<std::string, std::string>& config); + + InferenceEngine::Parameter GetConfig( + const std::string& name, + const std::map<std::string, InferenceEngine::Parameter>& options) const override; + InferenceEngine::QueryNetworkResult QueryNetwork(const InferenceEngine::CNNNetwork& network, + const std::map<std::string, std::string>& config) const override; + InferenceEngine::Parameter GetMetric( + const std::string& name, + const std::map<std::string, InferenceEngine::Parameter>& options) const override; + InferenceEngine::RemoteContext::Ptr CreateContext(const InferenceEngine::ParamMap&) override; + +protected: + DeviceInformation ParseMetaDevice(const std::string& devicesBatchCfg, + const std::map<std::string, std::string>& config) const; + + std::map<std::string, std::string> GetSupportedConfig(const std::map<std::string, std::string>& config, + const DeviceName& deviceName) const; + static DeviceInformation ParseBatchDevice(const std::string& deviceWithBatch); + + InferenceEngine::IExecutableNetworkInternal::Ptr LoadNetworkImpl( + const InferenceEngine::CNNNetwork& network, + const 
std::shared_ptr<InferenceEngine::RemoteContext> context, + const std::map<std::string, std::string>& config); +}; + +} // namespace AutoBatchPlugin diff --git a/src/plugins/intel_cpu/src/mkldnn_plugin.cpp b/src/plugins/intel_cpu/src/mkldnn_plugin.cpp index 8a4cc5c6d89..70aa7e97cc0 100644 --- a/src/plugins/intel_cpu/src/mkldnn_plugin.cpp +++ b/src/plugins/intel_cpu/src/mkldnn_plugin.cpp @@ -609,11 +609,9 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network, const std // the more "capable" the CPU in general, the more streams we may want to keep to keep it utilized const float memThresholdAssumeLimitedForISA = ov::MemBandwidthPressure::LIMITED/isaSpecificThreshold; const float L2_cache_size = mkldnn::utils::get_cache_size(2 /*level*/, true /*per core */); - const float L3_cache_size = mkldnn::utils::get_cache_size(3, false); ov::MemBandwidthPressure networkToleranceForLowCache = ov::MemBandwidthPressureTolerance( clonedNetwork.getFunction(), - L2_cache_size, L3_cache_size, - memThresholdAssumeLimitedForISA); + L2_cache_size, memThresholdAssumeLimitedForISA); // num of phys CPU cores (most aggressive value for #streams) const auto num_cores = getNumberOfCPUCores(); // less aggressive diff --git a/src/plugins/intel_gpu/src/plugin/plugin.cpp b/src/plugins/intel_gpu/src/plugin/plugin.cpp index 879f15976fa..7450b67be0e 100644 --- a/src/plugins/intel_gpu/src/plugin/plugin.cpp +++ b/src/plugins/intel_gpu/src/plugin/plugin.cpp @@ -28,6 +28,7 @@ #include "intel_gpu/runtime/device_query.hpp" #include "intel_gpu/runtime/debug_configuration.hpp" +#include #ifdef __linux__ # include #endif @@ -681,6 +682,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map(device_info.gfx_ver.revision); } IE_SET_METRIC_RETURN(GPU_UARCH_VERSION, s.str()); + } else if (name == METRIC_KEY(OPTIMAL_BATCH_SIZE)) { + auto next_pow_of_2 = [] (float x) { + return pow(2, ceil(log(x)/log(2))); + }; + auto closest_pow_of_2 = [] (float x) { + return pow(2, floor(log(x)/log(2))); + }; + auto model_param = options.find("MODEL_PTR"); + if (model_param == options.end()) { + GPU_DEBUG_IF(debug_config->verbose >= 1) { + GPU_DEBUG_COUT << "[GPU_OPTIMAL_BATCH_SIZE] MODEL_PTR is not set: return 1" << std::endl; + } + IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, static_cast<unsigned int>(1)); + } + std::shared_ptr<ngraph::Function> model; + try { + model = model_param->second.as<std::shared_ptr<ngraph::Function>>(); + } catch (...) { + IE_THROW() << "[GPU_OPTIMAL_BATCH_SIZE] MODEL_PTR should be of std::shared_ptr<ngraph::Function> type"; + } + GPU_DEBUG_IF(debug_config->verbose >= 1) { + GPU_DEBUG_COUT << "DEVICE_INFO:" + << "gfx_version.major " << device_info.gfx_ver.major + << ", gfx_version.minor " << std::to_string(device_info.gfx_ver.minor) << std::endl; + } + static std::map<cldnn::gfx_version, size_t> gen_kbytes_per_bank = { + {{12, 0, 0}, 480}, // TGL + {{12, 1, 0}, 2048}, // DG1 + {{12, 5, 0}, 320}, + {{12, 7, 0}, 512}, + }; + size_t L3_cache_size = device_info.gfx_ver.major && (device_info.gfx_ver.major <= 9) + ? 768 * 1024 // Gen9 + : 2 * 768 * 1024; // reasonable default when no arch has been detected (e.g., due to an old driver version) + cldnn::gfx_version gen = {device_info.gfx_ver.major, device_info.gfx_ver.minor, 0 /*ignore the revision*/}; + auto val = gen_kbytes_per_bank.find(gen); + if (gen_kbytes_per_bank.end() != val) { + auto kbytes_per_bank = val->second; + auto num_banks_per_slice = device_info.num_sub_slices_per_slice > 4 + ? 
next_pow_of_2(device_info.num_sub_slices_per_slice) + : 2 * device_info.num_sub_slices_per_slice; + // e.g., for the DG1 row above: 2048 KB per bank * 8 banks per slice * 1 slice = 16 MB + L3_cache_size = kbytes_per_bank * 1024 * num_banks_per_slice * device_info.num_slices; + GPU_DEBUG_IF(debug_config->verbose >= 1) { + GPU_DEBUG_COUT << "DEVICE_INFO:" + << "num_slices " << device_info.num_slices + << ", num_sub_slices_per_slice " << device_info.num_sub_slices_per_slice + << ", num_banks_per_slice " << num_banks_per_slice + << ", gen_kbytes_per_bank: " << kbytes_per_bank + << ", L3_cache_size (MB): " << float(L3_cache_size) / 1024 / 1024 << std::endl; + } + } + Config config = _impl->m_configs.GetConfig(device_id); + auto networkCloned = CloneAndTransformNetwork(CNNNetwork(model), config); + ov::MemBandwidthPressure memPressure = ov::MemBandwidthPressureTolerance(networkCloned.getFunction(), L3_cache_size); + unsigned int batch = 1; + if (memPressure.max_mem_tolerance != ov::MemBandwidthPressure::UNKNOWN) + batch = static_cast<unsigned int>(std::max(1.0, 16 * closest_pow_of_2(memPressure.max_mem_tolerance))); + std::map<std::string, InferenceEngine::Parameter> options_for_max_batch; + options_for_max_batch["MODEL_PTR"] = model; + options_for_max_batch["GPU_THROUGHPUT_STREAMS"] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO); + auto max_batch_size = GetMetric(GPU_METRIC_KEY(MAX_BATCH_SIZE), options_for_max_batch).as<uint32_t>(); + unsigned int closest = static_cast<unsigned int>(closest_pow_of_2(max_batch_size)); + batch = std::min(closest, batch); + batch = std::min(256u, batch); // batch 256 is the max + GPU_DEBUG_IF(debug_config->verbose >= 1) { + GPU_DEBUG_COUT << memPressure.max_mem_tolerance << std::endl; + GPU_DEBUG_COUT << "MAX_BATCH: " << max_batch_size << std::endl; + GPU_DEBUG_COUT << "ACTUAL OPTIMAL BATCH: " << batch << std::endl; + } + IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, batch); } else if (name == METRIC_KEY(FULL_DEVICE_NAME)) { auto deviceName = StringRightTrim(device_info.dev_name, "NEO", false); deviceName += std::string(" (") + (device_info.dev_type == cldnn::device_type::discrete_gpu ? 
"dGPU" : "iGPU") + ")"; @@ -885,7 +957,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map(cloned_network, engine, config, false, true); - std::pair device_memory_usage = program->GetCompiledProgram(0)->get_estimated_device_mem_usage(); + std::pair device_memory_usage = program->GetCompiledProgram(0)->get_estimated_device_mem_usage(); int64_t mem_for_general = std::max(static_cast(1L), static_cast(static_cast(available_device_mem) - device_memory_usage.first)); int64_t mem_per_batch = std::max(static_cast(1L), (device_memory_usage.second / static_cast(base_batch_size))); diff --git a/src/tests/functional/inference_engine/CMakeLists.txt b/src/tests/functional/inference_engine/CMakeLists.txt index e108134920b..922faa37c82 100644 --- a/src/tests/functional/inference_engine/CMakeLists.txt +++ b/src/tests/functional/inference_engine/CMakeLists.txt @@ -48,6 +48,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI) list(APPEND DEPENDENCIES ov_auto_plugin) endif() +if(ENABLE_AUTO_BATCH) + list(APPEND DEPENDENCIES ov_auto_batch_plugin) +endif() + if (NOT ENABLE_OV_ONNX_FRONTEND) list(APPEND EXCLUDED_SOURCE_PATHS "${CMAKE_CURRENT_SOURCE_DIR}/onnx_reader") endif() diff --git a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/include/api_conformance_helpers.hpp b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/include/api_conformance_helpers.hpp index a39c1b451de..0a120f0157b 100644 --- a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/include/api_conformance_helpers.hpp +++ b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/include/api_conformance_helpers.hpp @@ -24,6 +24,7 @@ inline const std::string getPluginLibNameByDevice(const std::string& deviceName) { "GNA", "ov_intel_gna_plugin" }, { "GPU", "ov_intel_gpu_plugin" }, { "HETERO", "ov_hetero_plugin" }, + { "BATCH", "ov_auto_batch_plugin" }, { "MULTI", "ov_multi_plugin" }, { "MYRIAD", "myriadPlugin" }, { "TEMPLATE", "ov_template_plugin" }, @@ -42,6 +43,11 @@ inline const std::pair generateDefaultHeteroConfig() { return { "TARGET_FALLBACK" , ConformanceTests::targetDevice }; } +inline const std::pair generateDefaultBatchConfig() { + // auto-batching with batch 1 (no real batching in fact, but full machinery is in action) + return { CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , std::string(ConformanceTests::targetDevice)}; +} + inline const std::vector> generateConfigs(const std::string& targetDevice, const std::vector>& config = {}) { std::pair defaultConfig; @@ -49,6 +55,8 @@ inline const std::vector> generateConfigs(con defaultConfig = generateDefaultMultiConfig(); } else if (targetDevice == std::string(CommonTestUtils::DEVICE_HETERO)) { defaultConfig = generateDefaultHeteroConfig(); + } else if (targetDevice == std::string(CommonTestUtils::DEVICE_BATCH)) { + defaultConfig = generateDefaultBatchConfig(); } else { throw std::runtime_error("Incorrect target device: " + targetDevice); } @@ -70,7 +78,8 @@ inline const std::string generateComplexDeviceName(const std::string& deviceName inline const std::vector returnAllPossibleDeviceCombination() { std::vector res{ConformanceTests::targetDevice}; - std::vector devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO, CommonTestUtils::DEVICE_MULTI}; + std::vector devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO, + CommonTestUtils::DEVICE_BATCH, CommonTestUtils::DEVICE_MULTI}; for (const auto& device : devices) { res.emplace_back(generateComplexDeviceName(device)); } 
diff --git a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/callback.cpp b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/callback.cpp index 7137300df77..b089f77889e 100644 --- a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/callback.cpp +++ b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/callback.cpp @@ -33,4 +33,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestCallbackTests, ::testing::Values(CommonTestUtils::DEVICE_HETERO), ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))), InferRequestCallbackTests::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestCallbackTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))), + InferRequestCallbackTests::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/io_blob.cpp b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/io_blob.cpp index 21d7dcc0c86..3aacb8e80b1 100644 --- a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/io_blob.cpp +++ b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/io_blob.cpp @@ -36,4 +36,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestIOBBlobTest, ::testing::Values(CommonTestUtils::DEVICE_HETERO), ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))), InferRequestIOBBlobTest::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestIOBBlobTest, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))), + InferRequestIOBBlobTest::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/multitheading.cpp b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/multitheading.cpp index ea24706a8df..26c38ada713 100644 --- a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/multitheading.cpp +++ b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/multitheading.cpp @@ -38,4 +38,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestMultithreadingT ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))), InferRequestMultithreadingTests::getTestCaseName); +INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestMultithreadingTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))), + InferRequestMultithreadingTests::getTestCaseName); + } // namespace diff --git a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/set_blob_by_type.cpp b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/set_blob_by_type.cpp index 49a2b23bd73..af064831d76 100644 --- 
a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/set_blob_by_type.cpp +++ b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/set_blob_by_type.cpp @@ -46,4 +46,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Hetero, InferRequestSetBlobByType, ::testing::Values(CommonTestUtils::DEVICE_HETERO), ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))), InferRequestSetBlobByType::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Batch, InferRequestSetBlobByType, + ::testing::Combine(::testing::ValuesIn(setBlobTypes), + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))), + InferRequestSetBlobByType::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/wait.cpp b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/wait.cpp index e70458bdf4a..a600c31e746 100644 --- a/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/wait.cpp +++ b/src/tests/functional/plugin/conformance/test_runner/api_conformance_runner/src/behavior/infer_request/wait.cpp @@ -37,4 +37,9 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestWaitTests, ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))), InferRequestWaitTests::getTestCaseName); +INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestWaitTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))), + InferRequestWaitTests::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/cpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp b/src/tests/functional/plugin/cpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp new file mode 100644 index 00000000000..05b0da43c67 --- /dev/null +++ b/src/tests/functional/plugin/cpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp @@ -0,0 +1,31 @@ +// Copyright (C) 2018-2021 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// +#include + +const std::vector get_vs_set{ true, false }; +const std::vector num_streams{ 1, 2 }; +const std::vector num_requests{ 1, 3, 8, 9, 16, 64 }; +const std::vector num_batch{ 1, 4, 8, 16, 32, 64, 128, 256 }; +using namespace AutoBatchingTests; + +namespace { +INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_CPU), + ::testing::ValuesIn(get_vs_set), + ::testing::ValuesIn(num_streams), + ::testing::ValuesIn(num_requests), + ::testing::ValuesIn(num_batch)), + AutoBatching_Test::getTestCaseName); +// TODO: for 22.2 (CVS-68949) +//INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test_DetectionOutput, +// ::testing::Combine( +// ::testing::Values(CommonTestUtils::DEVICE_CPU), +// ::testing::ValuesIn(get_vs_set), +// ::testing::ValuesIn(num_streams), +// ::testing::ValuesIn(num_requests), +// ::testing::ValuesIn(num_batch)), +// AutoBatching_Test_DetectionOutput::getTestCaseName); + +} // namespace \ No newline at end of file diff --git a/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp b/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp index 95aecd6b357..986f3f1a809 100644 --- 
a/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp +++ b/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp @@ -21,16 +21,27 @@ using namespace ::testing; using namespace InferenceEngine; using namespace InferenceEngine::gpu; -class RemoteBlob_Test : public CommonTestUtils::TestsCommon { +class RemoteBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface { protected: std::shared_ptr fn_ptr; + std::string deviceName; +public: void SetUp() override { fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat(); + deviceName = CommonTestUtils::DEVICE_GPU; + auto with_auto_batching = this->GetParam(); + if (with_auto_batching) { // BATCH:GPU + deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName; + } + } + static std::string getTestCaseName(const testing::TestParamInfo& obj) { + auto with_auto_batch = obj.param; + return std::string("RemoteBlob_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": ""); } }; -TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) { +TEST_P(RemoteBlob_Test, smoke_canInputUserBlob) { #if defined(ANDROID) GTEST_SKIP(); #endif @@ -41,7 +52,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) { // TODO: Issue: investigate issue with IECore auto ie = InferenceEngine::Core(); - auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU); + auto exec_net = ie.LoadNetwork(net, deviceName); // regular inference auto inf_req_regular = exec_net.CreateInferRequest(); @@ -70,6 +81,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) { Blob::Ptr shared_blob = make_shared_blob(net.getInputsInfo().begin()->second->getTensorDesc(), cldnn_context, shared_buffer); + shared_blob->allocate(); inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, shared_blob); inf_req_shared.Infer(); @@ -85,7 +97,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) { } -TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) { +TEST_P(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) { #if defined(ANDROID) GTEST_SKIP(); #endif @@ -96,7 +108,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) { // TODO: Issue: investigate issue with IECore auto ie = InferenceEngine::Core(); - auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU); + auto exec_net = ie.LoadNetwork(net, deviceName); // regular inference auto inf_req_regular = exec_net.CreateInferRequest(); @@ -139,7 +151,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) { } -TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) { +TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) { auto fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat(); CNNNetwork net(fn_ptr); @@ -149,7 +161,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) { auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc()); auto ie = PluginCache::get().ie(); - auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU); + auto exec_net_regular = ie->LoadNetwork(net, deviceName); // regular inference auto inf_req_regular = exec_net_regular.CreateInferRequest(); @@ -161,7 +173,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) { // inference using remote blob auto ocl_instance = std::make_shared(); - auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_context.get()); + auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_context.get()); auto exec_net_shared = ie->LoadNetwork(net, remote_context); auto inf_req_shared = 
exec_net_shared.CreateInferRequest(); inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, fakeImageData); @@ -178,7 +190,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) { } } -TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) { +TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) { #if defined _WIN32 GTEST_SKIP(); #endif @@ -191,7 +203,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) { auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc()); auto ie = PluginCache::get().ie(); - auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU); + auto exec_net_regular = ie->LoadNetwork(net, deviceName); // regular inference auto inf_req_regular = exec_net_regular.CreateInferRequest(); @@ -214,7 +226,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) { // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases) // without calling thread blocks - auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get()); + auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get()); auto exec_net_shared = ie->LoadNetwork(net, remote_context); auto inf_req_shared = exec_net_shared.CreateInferRequest(); @@ -270,7 +282,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) { } } -TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) { +TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) { #if defined _WIN32 GTEST_SKIP(); #endif @@ -283,7 +295,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) { auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc()); auto ie = PluginCache::get().ie(); - auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU); + auto exec_net_regular = ie->LoadNetwork(net, deviceName); // regular inference auto inf_req_regular = exec_net_regular.CreateInferRequest(); @@ -307,7 +319,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) { // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases) // without calling thread blocks - auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get()); + auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get()); auto exec_net_shared = ie->LoadNetwork(net, remote_context); auto inf_req_shared = exec_net_shared.CreateInferRequest(); @@ -358,6 +370,10 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) { } } +std::vector with_auto_batching {true, false}; +INSTANTIATE_TEST_SUITE_P(smoke_RemoteBlob, RemoteBlob_Test, ::testing::ValuesIn(with_auto_batching), + RemoteBlob_Test::getTestCaseName); + class BatchedBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface { void SetUp() override { num_batch = this->GetParam(); diff --git a/src/tests/functional/plugin/gpu/remote_blob_tests/gpu_remote_tensor_tests.cpp b/src/tests/functional/plugin/gpu/remote_blob_tests/gpu_remote_tensor_tests.cpp index b4ae7c0ea9d..8c4e43984ef 100644 --- a/src/tests/functional/plugin/gpu/remote_blob_tests/gpu_remote_tensor_tests.cpp +++ b/src/tests/functional/plugin/gpu/remote_blob_tests/gpu_remote_tensor_tests.cpp @@ -30,6 +30,7 @@ protected: } }; +std::vector ov_with_auto_batching {true, false}; enum class 
RemoteTensorSharingType { USER_CL_TENSOR = 0, PLUGIN_CL_TENSOR = 1, @@ -54,17 +55,34 @@ std::ostream& operator<<(std::ostream& stream, RemoteTensorSharingType sharing_t return stream; } -class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test, public testing::WithParamInterface { +using RemoteTensorSharingTestOptionsParams = std::tuple; + +class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test, + public testing::WithParamInterface { +protected: + std::shared_ptr fn_ptr; + std::string deviceName; + public: void SetUp() override { fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat(); + deviceName = CommonTestUtils::DEVICE_GPU; + RemoteTensorSharingType sharing_type; + bool with_auto_batching; + std::tie(sharing_type, with_auto_batching) = this->GetParam(); + if (with_auto_batching) // BATCH:GPU + deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName; } - - static std::string getTestCaseName(testing::TestParamInfo obj) { - RemoteTensorSharingType sharing_type = obj.param; + static std::string getTestCaseName(const testing::TestParamInfo& obj) { + RemoteTensorSharingType sharing_type; + bool with_auto_batching; + std::tie(sharing_type, with_auto_batching) = obj.param; std::ostringstream result; + result << "OVRemoteTensorInputBlob_Test_"; result << sharing_type; + if (with_auto_batching) + result << "_WITH_AUTO_BATCHING"; return result.str(); } }; @@ -81,9 +99,17 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) { p.input().preprocess().convert_element_type(ov::element::f32); auto function = p.build(); - auto exec_net = ie.compile_model(function, CommonTestUtils::DEVICE_GPU); + RemoteTensorSharingType sharing_type; + bool with_auto_batching; + std::tie(sharing_type, with_auto_batching) = GetParam(); - RemoteTensorSharingType sharing_type = GetParam(); + // auto-batching relies on availability of the lock() for the tensor (and the *USM_DEVICE is not lockable) + if (with_auto_batching + && (RemoteTensorSharingType::USER_USM_DEVICE_TENSOR == sharing_type + || RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR == sharing_type)) + GTEST_SKIP(); + + auto exec_net = ie.compile_model(function, deviceName); // regular inference auto inf_req_regular = exec_net.create_infer_request(); @@ -244,6 +270,7 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) { INSTANTIATE_TEST_SUITE_P( smoke_GPU, OVRemoteTensorInputBlob_Test, + ::testing::Combine( ::testing::ValuesIn(std::vector{RemoteTensorSharingType::USER_CL_TENSOR, RemoteTensorSharingType::PLUGIN_CL_TENSOR, RemoteTensorSharingType::USER_USM_HOST_TENSOR, @@ -251,9 +278,29 @@ INSTANTIATE_TEST_SUITE_P( RemoteTensorSharingType::PLUGIN_USM_HOST_TENSOR, RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR, RemoteTensorSharingType::PLUGIN_HOST_TENSOR}), + ::testing::ValuesIn(ov_with_auto_batching)), OVRemoteTensorInputBlob_Test::getTestCaseName); -TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) { +class OVRemoteTensor_TestsWithContext : public OVRemoteTensor_Test, public testing::WithParamInterface { +protected: + std::shared_ptr fn_ptr; + std::string deviceName; +public: + void SetUp() override { + fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat(); + deviceName = CommonTestUtils::DEVICE_GPU; + auto with_auto_batching = this->GetParam(); + if (with_auto_batching) { // BATCH:GPU + deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName; + } + } + static std::string getTestCaseName(const testing::TestParamInfo& obj) { + auto with_auto_batch = 
obj.param; + return std::string("RemoteTensor_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": ""); + } +}; + +TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContext) { auto ie = ov::runtime::Core(); using namespace ov::preprocess; @@ -262,7 +309,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) { p.input().preprocess().convert_element_type(ov::element::f32); auto function = p.build(); - auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU); + auto exec_net_regular = ie.compile_model(function, deviceName); auto input = function->get_parameters().at(0); auto output = function->get_results().at(0); @@ -296,7 +343,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) { } } -TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) { +TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContextWithMultipleDevices) { auto ie = ov::runtime::Core(); using namespace ov::preprocess; @@ -305,7 +352,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) { p.input().preprocess().convert_element_type(ov::element::f32); auto function = p.build(); - auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU); + auto exec_net_regular = ie.compile_model(function, deviceName); auto input = function->get_parameters().at(0); auto output = function->get_results().at(0); @@ -344,7 +391,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) { } } -TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) { +TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_out_of_order) { auto ie = ov::runtime::Core(); using namespace ov::preprocess; @@ -353,7 +400,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) { p.input().preprocess().convert_element_type(ov::element::f32); auto function = p.build(); - auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU); + auto exec_net_regular = ie.compile_model(function, deviceName); auto input = function->get_parameters().at(0); auto output = function->get_results().at(0); @@ -423,7 +470,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) { } } -TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) { +TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_in_order) { auto ie = ov::runtime::Core(); using namespace ov::preprocess; @@ -432,7 +479,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) { p.input().preprocess().convert_element_type(ov::element::f32); auto function = p.build(); - auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU); + auto exec_net_regular = ie.compile_model(function, deviceName); auto input = function->get_parameters().at(0); auto output = function->get_results().at(0); @@ -498,6 +545,9 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) { } } +INSTANTIATE_TEST_SUITE_P(smoke_RemoteTensor, OVRemoteTensor_TestsWithContext, ::testing::ValuesIn(ov_with_auto_batching), + OVRemoteTensor_TestsWithContext::getTestCaseName); + TEST_F(OVRemoteTensor_Test, NV12toBGR_image) { #if defined(ANDROID) GTEST_SKIP(); diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp new file mode 100644 index 00000000000..e8128730f42 --- /dev/null +++ 
b/src/tests/functional/plugin/gpu/shared_tests_instances/auto_batching/auto_batching_tests.cpp @@ -0,0 +1,31 @@ +// Copyright (C) 2018-2021 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// +#include + +const std::vector num_streams{ 2 }; +const std::vector get_vs_set{ true, false }; +const std::vector num_requests{ 1, 8, 16, 64 }; +const std::vector num_batch{ 1, 8, 32, 256 }; +using namespace AutoBatchingTests; + +namespace AutoBatchingTests { + +INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_GPU), + ::testing::ValuesIn(get_vs_set), + ::testing::ValuesIn(num_streams), + ::testing::ValuesIn(num_requests), + ::testing::ValuesIn(num_batch)), + AutoBatching_Test::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test_DetectionOutput, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_GPU), + ::testing::ValuesIn(get_vs_set), + ::testing::ValuesIn(num_streams), + ::testing::ValuesIn(num_requests), + ::testing::ValuesIn(num_batch)), + AutoBatching_Test_DetectionOutput::getTestCaseName); +} // namespace AutoBatchingTests \ No newline at end of file diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/exec_net_base.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/exec_net_base.cpp index 1082853b862..ec7181ec6a0 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/exec_net_base.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/exec_net_base.cpp @@ -52,6 +52,10 @@ const std::vector> autoConfig = { {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}}, }; +const std::vector> autoBatchConfig = { + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}}, +}; + INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, ExecNetSetPrecision, ::testing::Combine( ::testing::ValuesIn(netPrecisions), @@ -72,4 +76,11 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetSetPrecision, ::testing::Values(CommonTestUtils::DEVICE_AUTO), ::testing::ValuesIn(autoConfig)), ExecNetSetPrecision::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, ExecNetSetPrecision, + ::testing::Combine( + ::testing::ValuesIn(netPrecisions), + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(autoBatchConfig)), + ExecNetSetPrecision::getTestCaseName); } // namespace \ No newline at end of file diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/get_metric.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/get_metric.cpp index 3f8f7bd30df..3c0bfd785f4 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/get_metric.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/executable_network/get_metric.cpp @@ -22,27 +22,27 @@ namespace { INSTANTIATE_TEST_SUITE_P( nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_OPTIMAL_NUMBER_OF_INFER_REQUESTS, - ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU") + ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU") ); INSTANTIATE_TEST_SUITE_P( nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_CONFIG_KEYS, - 
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU") + ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU") ); INSTANTIATE_TEST_SUITE_P( nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_METRICS, - ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU") + ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU") ); INSTANTIATE_TEST_SUITE_P( nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_NETWORK_NAME, - ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU") + ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU") ); INSTANTIATE_TEST_SUITE_P( nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_ThrowsUnsupported, - ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU") + ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU") ); // diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/callback.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/callback.cpp index dfaa591dd96..68025694559 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/callback.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/callback.cpp @@ -19,6 +19,10 @@ const std::vector> autoConfigs = { {InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}} }; +const std::vector> autoBatchConfigs = { + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}}, +}; + INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestCallbackTests, ::testing::Combine( ::testing::Values(CommonTestUtils::DEVICE_GPU), @@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestCallbackTests, ::testing::Values(CommonTestUtils::DEVICE_AUTO), ::testing::ValuesIn(autoConfigs)), InferRequestCallbackTests::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestCallbackTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(autoBatchConfigs)), + InferRequestCallbackTests::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/multithreading.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/multithreading.cpp index a23ea031001..429e1695ebe 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/multithreading.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/multithreading.cpp @@ -18,6 +18,10 @@ const std::vector> autoconfigs = { {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES, std::string(CommonTestUtils::DEVICE_CPU) + "," + CommonTestUtils::DEVICE_GPU}} }; +const std::vector> auto_batch_configs = { + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}}, +}; + INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestMultithreadingTests, ::testing::Combine( ::testing::Values(CommonTestUtils::DEVICE_GPU), @@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestMultithreadingTes ::testing::ValuesIn(autoconfigs)), InferRequestMultithreadingTests::getTestCaseName); + 
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestMultithreadingTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(auto_batch_configs)), + InferRequestMultithreadingTests::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/wait.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/wait.cpp index 41da3069a87..77b717b6605 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/wait.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/infer_request/wait.cpp @@ -19,6 +19,11 @@ namespace { CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}} }; + + const std::vector> autoBatchConfigs = { + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}}, + }; + INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestWaitTests, ::testing::Combine( ::testing::Values(CommonTestUtils::DEVICE_GPU), @@ -32,9 +37,15 @@ namespace { InferRequestWaitTests::getTestCaseName); INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestWaitTests, - ::testing::Combine( - ::testing::Values(CommonTestUtils::DEVICE_AUTO), - ::testing::ValuesIn(autoConfigs)), - InferRequestWaitTests::getTestCaseName); + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_AUTO), + ::testing::ValuesIn(autoConfigs)), + InferRequestWaitTests::getTestCaseName); + + INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestWaitTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(autoBatchConfigs)), + InferRequestWaitTests::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp index 55fe2c973a7..f03794f7ac3 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp @@ -30,11 +30,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassNetworkTestP, OVClassNetworkTestP, ::tes INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, OVClassGetMetricTest_SUPPORTED_CONFIG_KEYS, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")); + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")); INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, OVClassGetMetricTest_SUPPORTED_METRICS, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")); + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")); INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, OVClassGetMetricTest_AVAILABLE_DEVICES, @@ -42,7 +42,7 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, OVClassGetMetricTest_FULL_DEVICE_NAME, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")); + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")); INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, OVClassGetMetricTest_OPTIMIZATION_CAPABILITIES, @@ -62,11 +62,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest, OVClassGetMetricTest_ThrowUnsupported, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")); + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")); 
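For orientation, what the added "BATCH" entries in these metric suites boil down to (a sketch, not test code from the patch):

    InferenceEngine::Core ie;
    // per AutoBatchInferencePlugin::GetMetric above, BATCH reports exactly three metrics
    auto metrics = ie.GetMetric("BATCH", METRIC_KEY(SUPPORTED_METRICS)).as<std::vector<std::string>>();
    auto fullName = ie.GetMetric("BATCH", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();  // "BATCH"
    auto keys = ie.GetMetric("BATCH", METRIC_KEY(SUPPORTED_CONFIG_KEYS)).as<std::vector<std::string>>();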
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetConfigTest, OVClassGetConfigTest_ThrowUnsupported, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")); + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")); INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetAvailableDevices, OVClassGetAvailableDevices, ::testing::Values("GPU")); diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/configuration_tests.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/configuration_tests.cpp index 826d3f1fc47..9e3a44d6fad 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/configuration_tests.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/configuration_tests.cpp @@ -104,6 +104,29 @@ namespace { CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}, {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "should be int"}} }; + + + const std::vector> auto_batch_inconfigs = { + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU}, + {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "-1"}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU}, + {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, "DOESN'T EXIST"}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}, + {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY}, + {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "-1"}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}, + {InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, "ON"}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}, + {InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "unknown_file"}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}, + {InferenceEngine::PluginConfigParams::KEY_DUMP_KERNELS, "ON"}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}, + {InferenceEngine::PluginConfigParams::KEY_TUNING_MODE, "TUNING_UNKNOWN_MODE"}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}, + {InferenceEngine::PluginConfigParams::KEY_DEVICE_ID, "DEVICE_UNKNOWN"}}, + }; + + IE_SUPPRESS_DEPRECATED_END INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, IncorrectConfigTests, @@ -125,6 +148,12 @@ namespace { IncorrectConfigTests::getTestCaseName); + INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(auto_batch_inconfigs)), + IncorrectConfigTests::getTestCaseName); + const std::vector> conf = { {} }; @@ -167,17 +196,6 @@ namespace { }; IE_SUPPRESS_DEPRECATED_END - const std::vector> multiconf = { - {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}}, - {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}, - {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::THROUGHPUT}}, - {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}, - {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY}}, - {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}, - {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, 
InferenceEngine::PluginConfigParams::LATENCY}, - {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "1"}} - }; - const std::vector> autoConfigs = { {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}}, {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}, @@ -232,6 +250,12 @@ namespace { {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "2"}} }; + const std::vector> auto_batch_configs = { + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}}, + {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}, + {CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "1"}}, + }; + INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, DefaultValuesConfigTests, ::testing::Combine( ::testing::Values(CommonTestUtils::DEVICE_GPU), @@ -255,4 +279,15 @@ namespace { ::testing::Values(CommonTestUtils::DEVICE_AUTO), ::testing::ValuesIn(autoinconfigs)), IncorrectConfigAPITests::getTestCaseName); + INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigAPITests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(auto_batch_inconfigs)), + IncorrectConfigAPITests::getTestCaseName); + + INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, CorrectConfigTests, + ::testing::Combine( + ::testing::Values(CommonTestUtils::DEVICE_BATCH), + ::testing::ValuesIn(auto_batch_configs)), + CorrectConfigTests::getTestCaseName); } // namespace diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp index b758bba7496..703532fdab6 100644 --- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp +++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp @@ -35,12 +35,12 @@ INSTANTIATE_TEST_SUITE_P( INSTANTIATE_TEST_SUITE_P( nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_CONFIG_KEYS, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO") + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH") ); INSTANTIATE_TEST_SUITE_P( nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_METRICS, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO") + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH") ); INSTANTIATE_TEST_SUITE_P( @@ -50,7 +50,7 @@ INSTANTIATE_TEST_SUITE_P( INSTANTIATE_TEST_SUITE_P( nightly_IEClassGetMetricTest, IEClassGetMetricTest_FULL_DEVICE_NAME, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO") + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH") ); INSTANTIATE_TEST_SUITE_P( @@ -80,12 +80,12 @@ INSTANTIATE_TEST_SUITE_P( INSTANTIATE_TEST_SUITE_P( nightly_IEClassGetMetricTest, IEClassGetMetricTest_ThrowUnsupported, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO") + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH") ); INSTANTIATE_TEST_SUITE_P( nightly_IEClassGetConfigTest, IEClassGetConfigTest_ThrowUnsupported, - ::testing::Values("GPU", "MULTI", "HETERO", "AUTO") + ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH") ); INSTANTIATE_TEST_SUITE_P( @@ -115,6 +115,26 @@ INSTANTIATE_TEST_SUITE_P( ::testing::Values("GPU") ); +using IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE = BehaviorTestsUtils::IEClassBaseTestP; +TEST_P(IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE, GetMetricAndPrintNoThrow) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() + 
diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp
index b758bba7496..703532fdab6 100644
--- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp
+++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/plugin/core_integration.cpp
@@ -35,12 +35,12 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_METRICS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -50,7 +50,7 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_FULL_DEVICE_NAME,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -80,12 +80,12 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetConfigTest, IEClassGetConfigTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -115,6 +115,26 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );
 
+using IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE = BehaviorTestsUtils::IEClassBaseTestP;
+TEST_P(IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE, GetMetricAndPrintNoThrow) {
+    SKIP_IF_CURRENT_TEST_IS_DISABLED()
+    InferenceEngine::Core ie;
+    InferenceEngine::Parameter p;
+
+    std::map<std::string, InferenceEngine::Parameter> _options = {{"MODEL_PTR", simpleCnnNetwork.getFunction()}};
+    ASSERT_NO_THROW(p = ie.GetMetric(deviceName, METRIC_KEY(OPTIMAL_BATCH_SIZE), _options).as<unsigned int>());
+    unsigned int t = p;
+
+    std::cout << "GPU device optimal batch size: " << t << std::endl;
+
+    ASSERT_METRIC_SUPPORTED_IE(METRIC_KEY(OPTIMAL_BATCH_SIZE));
+}
+
+INSTANTIATE_TEST_SUITE_P(
+        nightly_IEClassExecutableNetworkGetMetricTest, IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE,
+        ::testing::Values("GPU")
+);
+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
@@ -135,6 +155,7 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );
 
+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
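The OPTIMAL_BATCH_SIZE metric that this test introduces coverage for is queried per device, with the model handed in through the options map; a minimal sketch of a standalone query, assuming the same IE 1.0 API as the test (the model path is a placeholder):

    InferenceEngine::Core ie;
    auto net = ie.ReadNetwork("model.xml");  // placeholder model
    std::map<std::string, InferenceEngine::Parameter> options = {
            {"MODEL_PTR", net.getFunction()}};  // the metric is model-dependent
    // Only the GPU plugin is exercised by the test above; other devices may not implement it.
    unsigned int optimalBatch =
            ie.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();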
"_get_blob" : "_set_blob") + "_batch_size_" + + std::to_string(batch) + + "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests); + } + +protected: + std::string device_name; + bool use_get_blob; + size_t num_streams; + size_t num_requests; + size_t num_batch; + std::vector> fn_ptrs; + + void TestAutoBatch() { + std::vector nets; + for (auto &fn_ptr : fn_ptrs) { + nets.push_back(CNNNetwork(fn_ptr)); + } + + auto ie = InferenceEngine::Core(); + std::vector outputs; + std::vector irs; + std::vector> ref; + std::vector outElementsCount; + + for (size_t i = 0; i < nets.size(); ++i) { + auto net = nets[i]; + auto inputs = net.getInputsInfo(); + for (auto n : inputs) { + n.second->setPrecision(Precision::FP32); + } + std::map config; + if (device_name.find("GPU") != std::string::npos) + config[CONFIG_KEY(GPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams); + if (device_name.find("CPU") != std::string::npos) + config[CONFIG_KEY(CPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams); + // minimize timeout to reduce test time + config[CONFIG_KEY(AUTO_BATCH_TIMEOUT)] = std::to_string(1); + auto exec_net_ref = ie.LoadNetwork(net, std::string(CommonTestUtils::DEVICE_BATCH) + ":" + + device_name + "(" + std::to_string(num_batch) + ")", + config); + + for (size_t j = 0; j < num_requests; j++) { + outputs.push_back(net.getOutputsInfo().begin()->first); //single output + outElementsCount.push_back( + std::accumulate(begin(fn_ptrs[i]->get_output_shape(0)), end(fn_ptrs[i]->get_output_shape(0)), 1, + std::multiplies())); + + auto inf_req = exec_net_ref.CreateInferRequest(); + irs.push_back(inf_req); + + std::vector> inData; + for (auto n : inputs) { + auto blob = FuncTestUtils::createAndFillBlob(n.second->getTensorDesc()); + if (use_get_blob) + memcpy(reinterpret_cast(inf_req.GetBlob(n.first)->buffer().as()), + reinterpret_cast(blob->cbuffer().as()), blob->byteSize()); + else + inf_req.SetBlob(n.first, blob); + + const auto inBlob = inf_req.GetBlob(n.first); + const auto blobSize = inBlob->byteSize(); + const auto inBlobBuf = inBlob->cbuffer().as(); + inData.push_back(std::vector(inBlobBuf, inBlobBuf + blobSize)); + } + auto refOutData = ngraph::helpers::interpreterFunction(fn_ptrs[i], {inData}).front().second; + ref.push_back(refOutData); + } + } + + const int niter = 1; + for (int i = 0; i < niter; i++) { + for (auto ir : irs) { + ir.StartAsync(); + } + + for (auto ir : irs) { + ir.Wait(InferRequest::RESULT_READY); + } + } + + auto thr = FuncTestUtils::GetComparisonThreshold(InferenceEngine::Precision::FP32); + for (size_t i = 0; i < irs.size(); ++i) { + const auto &refBuffer = ref[i].data(); + ASSERT_EQ(outElementsCount[i], irs[i].GetBlob(outputs[i])->size()); + FuncTestUtils::compareRawBuffers(irs[i].GetBlob(outputs[i])->buffer().as(), + reinterpret_cast(refBuffer), outElementsCount[i], + outElementsCount[i], + thr); + } + } +}; + +class AutoBatching_Test_DetectionOutput : public AutoBatching_Test { +public: + void SetUp() override { + std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam(); + fn_ptrs = {ngraph::builder::subgraph::makeEltwisePlusDetectionOutput(), + ngraph::builder::subgraph::makeEltwisePlusDetectionOutput()}; + }; + + static std::string getTestCaseName(const testing::TestParamInfo &obj) { + size_t streams, requests, batch; + bool use_get_blob; + std::string device_name; + std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param; + return "DetectionOutput_HETERO_" + device_name + std::string(use_get_blob 
? "_get_blob" : "_set_blob") + + "_batch_size_" + std::to_string(batch) + + "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests); + } +}; + +TEST_P(AutoBatching_Test, compareAutoBatchingToSingleBatch) { + TestAutoBatch(); +} + +TEST_P(AutoBatching_Test_DetectionOutput, compareAutoBatchingToSingleBatch) { + TestAutoBatch(); +} + +} // namespace AutoBatchingTests \ No newline at end of file diff --git a/src/tests/ie_test_utils/common_test_utils/test_constants.hpp b/src/tests/ie_test_utils/common_test_utils/test_constants.hpp index 7d8087fb925..158352a72b4 100644 --- a/src/tests/ie_test_utils/common_test_utils/test_constants.hpp +++ b/src/tests/ie_test_utils/common_test_utils/test_constants.hpp @@ -10,6 +10,7 @@ const char DEVICE_AUTO[] = "AUTO"; const char DEVICE_CPU[] = "CPU"; const char DEVICE_GNA[] = "GNA"; const char DEVICE_GPU[] = "GPU"; +const char DEVICE_BATCH[] = "BATCH"; const char DEVICE_HDDL[] = "HDDL"; const char DEVICE_MYRIAD[] = "MYRIAD"; const char DEVICE_KEEMBAY[] = "VPUX"; diff --git a/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp b/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp index 2f7fb1730d1..dab3bdd16c7 100644 --- a/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp +++ b/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp @@ -26,6 +26,9 @@ public: MOCK_METHOD3(ImportNetwork, InferenceEngine::SoExecutableNetworkInternal( std::istream&, const std::shared_ptr&, const std::map&)); + MOCK_METHOD2(CreateContext, InferenceEngine::RemoteContext::Ptr(const std::string& deviceName, + const InferenceEngine::ParamMap& params)); + MOCK_CONST_METHOD3(QueryNetwork, InferenceEngine::QueryNetworkResult( const InferenceEngine::CNNNetwork&, const std::string&, const std::map&)); diff --git a/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp b/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp index a518b080af3..3609d54bab4 100644 --- a/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp +++ b/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp @@ -242,6 +242,44 @@ inline std::shared_ptr makeSingleConv(std::vector inpu return fn_ptr; } +inline std::shared_ptr makeEltwisePlusDetectionOutput(std::vector> inShapes = + {{1, 60}, {1, 165}, {1, 1, 75}}, + ngraph::element::Type_t type = ngraph::element::Type_t::f32) { + // adding Eltwise so that we can tests Auto-Batching's HETERO code-path that splits the DetectionOutput and the rest of the network + auto params = ngraph::builder::makeParams(ngraph::element::f32, inShapes); + auto paramOuts = ngraph::helpers::convert2OutputVector( + ngraph::helpers::castOps2Nodes(params)); + ngraph::OutputVector outs; + for (size_t i = 0; i < inShapes.size(); i++) { + auto shape = inShapes[i]; + auto p = std::make_shared(ngraph::element::f32, ngraph::Shape{shape}); + auto add = ngraph::builder::makeEltwise(paramOuts[i], p, ngraph::helpers::EltwiseTypes::ADD); + params.push_back(p); + outs.push_back(add->output(0)); + } + ngraph::op::DetectionOutput::Attributes attr; + attr.num_classes = 11; + attr.background_label_id = 0; + attr.top_k = 75; + attr.variance_encoded_in_target = true; + attr.keep_top_k = {50}; + attr.code_type = std::string{"caffe.PriorBoxParameter.CORNER"}; + attr.share_location = true; + attr.nms_threshold = 0.5f; + 
diff --git a/src/tests/ie_test_utils/common_test_utils/test_constants.hpp b/src/tests/ie_test_utils/common_test_utils/test_constants.hpp
index 7d8087fb925..158352a72b4 100644
--- a/src/tests/ie_test_utils/common_test_utils/test_constants.hpp
+++ b/src/tests/ie_test_utils/common_test_utils/test_constants.hpp
@@ -10,6 +10,7 @@ const char DEVICE_AUTO[] = "AUTO";
 const char DEVICE_CPU[] = "CPU";
 const char DEVICE_GNA[] = "GNA";
 const char DEVICE_GPU[] = "GPU";
+const char DEVICE_BATCH[] = "BATCH";
 const char DEVICE_HDDL[] = "HDDL";
 const char DEVICE_MYRIAD[] = "MYRIAD";
 const char DEVICE_KEEMBAY[] = "VPUX";
diff --git a/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp b/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp
index 2f7fb1730d1..dab3bdd16c7 100644
--- a/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp
+++ b/src/tests/ie_test_utils/unit_test_utils/mocks/cpp_interfaces/interface/mock_icore.hpp
@@ -26,6 +26,9 @@ public:
     MOCK_METHOD3(ImportNetwork, InferenceEngine::SoExecutableNetworkInternal(
             std::istream&, const std::shared_ptr<InferenceEngine::RemoteContext>&, const std::map<std::string, std::string>&));
 
+    MOCK_METHOD2(CreateContext, InferenceEngine::RemoteContext::Ptr(const std::string& deviceName,
+            const InferenceEngine::ParamMap& params));
+
     MOCK_CONST_METHOD3(QueryNetwork, InferenceEngine::QueryNetworkResult(
             const InferenceEngine::CNNNetwork&, const std::string&, const std::map<std::string, std::string>&));
diff --git a/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp b/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp
index a518b080af3..3609d54bab4 100644
--- a/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp
+++ b/src/tests/ngraph_helpers/ngraph_functions/include/ngraph_functions/subgraph_builders.hpp
@@ -242,6 +242,44 @@ inline std::shared_ptr<ngraph::Function> makeSingleConv(std::vector<size_t> inpu
     return fn_ptr;
 }
 
+inline std::shared_ptr<ngraph::Function> makeEltwisePlusDetectionOutput(std::vector<std::vector<size_t>> inShapes =
+                                                                                {{1, 60}, {1, 165}, {1, 1, 75}},
+                                                                        ngraph::element::Type_t type = ngraph::element::Type_t::f32) {
+    // adding Eltwise so that we can test the Auto-Batching's HETERO code-path
+    // that splits the DetectionOutput from the rest of the network
+    auto params = ngraph::builder::makeParams(ngraph::element::f32, inShapes);
+    auto paramOuts = ngraph::helpers::convert2OutputVector(
+            ngraph::helpers::castOps2Nodes<ngraph::op::Parameter>(params));
+    ngraph::OutputVector outs;
+    for (size_t i = 0; i < inShapes.size(); i++) {
+        auto shape = inShapes[i];
+        auto p = std::make_shared<ngraph::op::Parameter>(ngraph::element::f32, ngraph::Shape{shape});
+        auto add = ngraph::builder::makeEltwise(paramOuts[i], p, ngraph::helpers::EltwiseTypes::ADD);
+        params.push_back(p);
+        outs.push_back(add->output(0));
+    }
+    ngraph::op::DetectionOutput::Attributes attr;
+    attr.num_classes = 11;
+    attr.background_label_id = 0;
+    attr.top_k = 75;
+    attr.variance_encoded_in_target = true;
+    attr.keep_top_k = {50};
+    attr.code_type = std::string{"caffe.PriorBoxParameter.CORNER"};
+    attr.share_location = true;
+    attr.nms_threshold = 0.5f;
+    attr.confidence_threshold = 0.5f;
+    attr.clip_after_nms = false;
+    attr.clip_before_nms = false;
+    attr.decrease_label_id = false;
+    attr.normalized = false;
+    attr.input_height = 1;
+    attr.input_width = 1;
+    attr.objectness_score = 0.4f;
+
+    auto detOut = ngraph::builder::makeDetectionOutput(outs, attr);
+    ngraph::ResultVector results{std::make_shared<ngraph::op::Result>(detOut)};
+    return std::make_shared<ngraph::Function>(results, params, "EltWiseWithDetectionOutput");
+}
+
 inline std::shared_ptr<ngraph::Function> makeMultiSingleConv(std::vector<size_t> inputShape = {1, 3, 24, 24},
                                                              ngraph::element::Type type = ngraph::element::Type_t::f32) {
     auto param0 = std::make_shared<ngraph::op::Parameter>(type, ngraph::Shape(inputShape));
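makeEltwisePlusDetectionOutput exists to feed the tests something that cannot be batched wholesale: the expectation is that Auto-Batching splits off the DetectionOutput HETERO-style and batches only the Eltwise part. A minimal sketch of driving that path with the explicit BATCH:<device>(<n>) syntax that TestAutoBatch above constructs (device and batch size chosen here purely for illustration):

    #include <ie_core.hpp>
    #include "ngraph_functions/subgraph_builders.hpp"

    void exerciseHeteroPath() {
        InferenceEngine::Core ie;
        InferenceEngine::CNNNetwork net(ngraph::builder::subgraph::makeEltwisePlusDetectionOutput());
        // Explicit device with batch size 4; the Eltwise subgraph is batched while
        // the DetectionOutput subgraph is expected to run as-is.
        auto exec = ie.LoadNetwork(net, "BATCH:GPU(4)");
    }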
diff --git a/src/tests/unit/auto/exec_network_get_metrics.cpp b/src/tests/unit/auto/exec_network_get_metrics.cpp
index 9c01e0fe7ab..b1d8ead498d 100644
--- a/src/tests/unit/auto/exec_network_get_metrics.cpp
+++ b/src/tests/unit/auto/exec_network_get_metrics.cpp
@@ -38,6 +38,7 @@ using Config = std::map<std::string, std::string>;
 using namespace MockMultiDevice;
 
 using ConfigParams = std::tuple<
+        bool,                   // if THROUGHPUT
         unsigned int,           // cpu OPTIMAL_NUMBER_OF_INFER_REQUESTS
         int,                    // cpu infer request num of customer want
         bool,                   // if cpu sleep, cpu device will load slow
@@ -77,12 +78,18 @@ public:
         unsigned int expectOptimalNum;
         bool cpuSleep;
         bool gpuSleep;
-        std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
+        bool isThroughput;
+        std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
                  gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = obj.param;
         std::ostringstream result;
         result << "cpuOptimalNum_" << cpuOptimalNum << "cpuCustomerNum_" << cpuCustomerNum;
         result << "gpuOptimalNum_" << gpuOptimalNum << "gpuCustomerNum_" << gpuCustomerNum;
         result << "expectOptimalNum_" << expectOptimalNum;
+        if (isThroughput) {
+            result << "_isThroughput" << "true";
+        } else {
+            result << "_isThroughput" << "false";
+        }
         if (cpuSleep) {
             result << "_cpuSleep_" << "true";
         } else {
@@ -147,7 +154,7 @@ public:
         IE_SET_METRIC(SUPPORTED_CONFIG_KEYS, supportConfigs, {});
         ON_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _))
             .WillByDefault(RETURN_MOCK_VALUE(supportConfigs));
-        EXPECT_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _)).Times(AnyNumber());
+        EXPECT_CALL(*core, GetMetric(_, _, _)).Times(AnyNumber());
 
         // test auto plugin
         config.insert({CONFIG_KEY_INTERNAL(MULTI_WORK_MODE_AS_AUTO), InferenceEngine::PluginConfigParams::YES});
@@ -168,11 +175,24 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
     unsigned int expectOptimalNum;
     bool cpuSleep;
     bool gpuSleep;
-    std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
+    bool isThroughput;
+    std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
              gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = this->GetParam();
-
-    metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
-    metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
+    if (isThroughput) {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, gpuCustomerNum, ""});
+        IE_SET_METRIC(OPTIMAL_BATCH_SIZE, optimalBatchNum, 256);
+        IE_SET_METRIC(RANGE_FOR_STREAMS, rangeOfStreams, std::make_tuple(1, 2));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(OPTIMAL_BATCH_SIZE)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(optimalBatchNum));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(RANGE_FOR_STREAMS)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(rangeOfStreams));
+    } else {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
+    }
     ON_CALL(*plugin, SelectDevice(_, _, _)).WillByDefault(Return(metaDevices[1]));
     ON_CALL(*plugin, ParseMetaDevices(_, _)).WillByDefault(Return(metaDevices));
     EXPECT_CALL(*plugin, ParseMetaDevices(_, _)).Times(1);
@@ -241,27 +261,28 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
 }
 
-// ConfigParams {unsigned int, int, bool,
+// ConfigParams {bool, unsigned int, int, bool,
 //               unsigned int, int, bool, unsigned int}
 //
 // every element for ConfigParams
-// {cpuOptimalNum, customer hope for cpu infer request num, if cpu sleep when load,
+// {is throughput mode, cpuOptimalNum, customer hope for cpu infer request num, if cpu sleep when load,
 //  gpuOptimalNum, customer hope for gpu infer request num, if gpu sleep when load,
 //  expectOptimalNum of Auto ExecNetwork}
 //
 const std::vector<ConfigParams> testConfigs = {
-    ConfigParams {1, -1, false, 2, -1, true, 8},
-    ConfigParams {1, -1, false, 10, -1, true, 8},
-    ConfigParams {12, -1, false, 2, -1, true, 12},
-    ConfigParams {12, -1, false, 10, -1, true, 12},
-    ConfigParams {1, -1, true, 2, -1, false, 8},
-    ConfigParams {1, -1, true, 10, -1, false, 10},
-    ConfigParams {6, -1, true, 2, -1, false, 8},
-    ConfigParams {6, -1, true, 10, -1, false, 10},
-    ConfigParams {6, 4, false, 2, 3, true, 8},
-    ConfigParams {6, 4, false, 10, 3, true, 8},
-    ConfigParams {1, 4, true, 2, 3, false, 8},
-    ConfigParams {1, 4, true, 10, 3, false, 10}
+    ConfigParams {false, 1, -1, false, 2, -1, true, 8},
+    ConfigParams {false, 1, -1, false, 10, -1, true, 8},
+    ConfigParams {false, 12, -1, false, 2, -1, true, 12},
+    ConfigParams {false, 12, -1, false, 10, -1, true, 12},
+    ConfigParams {false, 1, -1, true, 2, -1, false, 8},
+    ConfigParams {false, 1, -1, true, 10, -1, false, 10},
+    ConfigParams {false, 6, -1, true, 2, -1, false, 8},
+    ConfigParams {false, 6, -1, true, 10, -1, false, 10},
+    ConfigParams {false, 6, 4, false, 2, 3, true, 8},
+    ConfigParams {false, 6, 4, false, 10, 3, true, 8},
+    ConfigParams {false, 1, 4, true, 2, 3, false, 8},
+    ConfigParams {false, 1, 4, true, 10, 3, false, 10},
+    ConfigParams {true, 1, 4, false, 10, 3, true, 512}
 };
 
 INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
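One non-obvious expectation above: the single THROUGHPUT entry expects 512 optimal requests, which presumably derives from the mocked GPU metrics rather than from gpuOptimalNum; a sketch of that arithmetic (an inference from the test data, not the plugin's verbatim formula):

    // OPTIMAL_BATCH_SIZE is mocked as 256 and RANGE_FOR_STREAMS as (1, 2) above;
    // batch size times the upper stream bound yields the expected request count.
    constexpr unsigned int mockedOptimalBatch = 256;
    constexpr unsigned int mockedMaxStreams = 2;
    static_assert(mockedOptimalBatch * mockedMaxStreams == 512, "matches expectOptimalNum");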
diff --git a/src/tests_deprecated/behavior/shared_tests/CMakeLists.txt b/src/tests_deprecated/behavior/shared_tests/CMakeLists.txt
index 82fcfd2183b..761ee14f6fd 100644
--- a/src/tests_deprecated/behavior/shared_tests/CMakeLists.txt
+++ b/src/tests_deprecated/behavior/shared_tests/CMakeLists.txt
@@ -14,6 +14,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()
 
+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
+
 target_include_directories(${TARGET_NAME} PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/plugin_tests")
 
 target_link_libraries(${TARGET_NAME} PUBLIC
diff --git a/src/tests_deprecated/functional/shared_tests/CMakeLists.txt b/src/tests_deprecated/functional/shared_tests/CMakeLists.txt
index 6bb4ba313a3..fcbfbcdcc59 100644
--- a/src/tests_deprecated/functional/shared_tests/CMakeLists.txt
+++ b/src/tests_deprecated/functional/shared_tests/CMakeLists.txt
@@ -25,6 +25,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()
 
+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 set_ie_threading_interface_for(${TARGET_NAME})
 
 ie_faster_build(${TARGET_NAME}