Auto Batching impl (#7883)

* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)

(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)

* applying/accommodating the API changes after rebase to the master

* replaying modified version of actual batch selection

* early experiments with model mem footprint

* changes from rebasing to the latest master

* experimenting with DG1 on the batch size selection, also collecting the mem footprint

* WIP: moving the auto-batching to the icore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key. still fails hot device swap

* quick-n-dirty batch footprint vs device total mem

* code style

* testing which models perform badly due to kernels and NOT (batched) footprint

* stub pipeline task to communicate the readiness rather than promise/future

* quick-n-dirty timeout impl

* explicit _completionTasks,reverting BA to use the timeout

* inputs outputs copies, works with AUTO and demo now

* accommodate the config per device-id, after rebase to the latest master

* allowing the auto-batching only with tput hint to let more conventional tests pass

* fix the premature timeout restarting via waiting for batch1 requests completion

* moved the batched request starting (along with input copies) to the dedicated thread

* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution

* code style

* increasing the timeout to test the ssd_* models perf (timeout?) issues

* reducing number of output stuff in BA to avoid bloating the logs in experiments

* more aggressive batching for experiments, not limited to 32 and also 4 as a min

* more accurate timeout debugging info

* getting the reqs limitation from the plugin SetConfig as well

* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context

* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4)

* auto-batching functional test (with results check vs ref) and GPU instance for that

* fixed arithmetic on blobs ptrs

* clang

* handling possible batched network failure

* BATCH as the constants device name in test

* ENABLE_BATCH

* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)

* DetectionOutput hetero test for the CPU

* reenabling the Auto-Batching in the AUTO

* auto-batching device enabled in the test

* fixed the DO test

* improve the loading loop logic

* brushed the config keys

* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests

* fix the test after refactoring

* clang

* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now

* auto-batching hetero test (subgraph with DetectionOutput)

* fixed minor changes that were result of experiments with impl

* code-style

* brushing, disabling CPU's HETERO tests until planned activity for 22.2

* removing home-baked MAX_BATCH_SIZE and switching to the official impl by GPU team

* remote blobs tests for the auto-batching (old API)

* brushed names a bit

* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests

* fixed the ieUnitTests by adding a CreateContext stub to the MockICore

* clang

* improved remote-blobs tests

* revert the BA back from experiments with AB + device_use_mem

* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE

* remote blobs 2.0 tests, issue with context having the orig device name

* debugging DG1 perf drop (presumably due to non-fitting the device-mem)

* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2

* remote blobs 2.0 tests for different tensor sharing types

* converting assert to throw to accommodate legacy API where the lock() was possible to be called

* revert the timeout back to avoid mixing the studies, fixed the footprint calc

* reverting to estimating the max batch by extrapolating from batch1 size

* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication

* even more graceful batch 1 handling without duplication

* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching

* AutoBatchPlugin -> ov_auto_batch_plugin

* WA for gcc 4.8

* clang

* fix misprint

* fixed errors resulting from the recent OV Variant-to-Any transition

* skip auto-batching for already-batched networks

* AUTO_BATCH_TIMEOUT and tests

* GPU-specific L3

* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic

* debugging device info

* enabling the config tests for the GPU and fixing the Auto-batching tests to pass

* making the default cache size (when the driver is not recognized) more aggressive, to accommodate recent HW with old drivers

* skip auto-batching for RNNs and the like (e.g. single CHW input)

* fixed fallback to the batch1 and moved HETERO path under condition to avoid bloating

* brushing

* Auto plugin GetMetric support gpu auto-batch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add test case

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* add comments on test

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the vars names, also adding the exception handling

* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and the like (CVS-74085) to minimize the # of failures

* add try catch

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* brushing the code changed in the GPU plugin

* Auto-Batch requests tests

* brushed variables a bit (ref)

* cleaned debug output from the ie_core

* cleaned cmake for the Auto-Batch

* removed batchN estimation from batch1

* cleaned from debug printf

* comments, cleanup

* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13

* Adding back the removed batchN estimation from batch1 to debug degradations on DG1 (resulting from a too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.

* brushing ie_core.cpp

* fix 32bit compilation

* Code review: ENABLE_AUTO_BATCH

* consolidate the auto-batching logic in ie_core.cpp into a single ApplyAutoBatching

* renamed/brushed the OPTIMAL_BATCH (now with _SIZE) and mimics the MAX_BATCH_SIZE wrt MODEL_PTR

* default value for the OPTIMAL_BATCH_SIZE

* clang

* accommodate new func tests location

* fix shuffle of headers after clang + copyrights

* fixed misprint made during code refactoring

* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header

* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching

* switching from the unsafe size() and minimizing time under lock

* code style

* brushed the ApplyAutoBatching

* brushed the metric/config names and descriptions

* completed the core integration tests for the auto-batching

* ExecGraphInfo and check for incorrect cfg

* removed explicit dependencies from cmake file of the plugin

* disabling Auto-Batching thru the tput hint (to preserve the current product default), only explicit usage like BATCH:GPU is exercised in the tests

Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
Maxim Shevtsov 2021-12-24 12:55:22 +03:00 committed by GitHub
parent bc5da8d522
commit 49b5e5728b
47 changed files with 1882 additions and 188 deletions

View File

@ -100,6 +100,8 @@ ie_option (ENABLE_GAPI_PREPROCESSING "Enables G-API preprocessing" ON)
ie_option (ENABLE_MULTI "Enables MULTI Device Plugin" ON)
ie_option (ENABLE_AUTO "Enables AUTO Device Plugin" ON)
ie_option (ENABLE_AUTO_BATCH "Enables Auto-Batching Plugin" ON)
ie_option (ENABLE_HETERO "Enables Hetero Device Plugin" ON)
ie_option (ENABLE_TEMPLATE "Enable template plugin" ON)

View File

@ -141,6 +141,9 @@ When specifying key values as raw strings (that is, when using Python API), omit
@snippet snippets/GPU_Metric1.cpp part1
* OPTIMAL_BATCH_SIZE : Returns the _optimal_ batch size for a given network on the given GPU device. The returned value is aligned to a power of 2. Also, MODEL_PTR is a required option for this metric, since the optimal batch size highly depends on the model. If MODEL_PTR is not given, the value of 1 is returned. The example code that sets the required and optional options for this metric is available in the following snippet:
@snippet snippets/GPU_Metric1.cpp part2
## GPU Context and Video Memory Sharing RemoteBlob API
See [RemoteBlob API of GPU Plugin](GPU_RemoteBlob_API.md)

View File

@ -14,4 +14,12 @@ options.insert(std::make_pair("AVAILABLE_DEVICE_MEM_SIZE", available_device_mem_
auto max_batch_size = core.GetMetric("GPU", GPU_METRIC_KEY(MAX_BATCH_SIZE), options).as<uint32_t>();
//! [part1]
//! [part2]
std::map<std::string, Parameter> opt = {{"MODEL_PTR", cnnNetwork.getFunction()}}; // Required. Same usage as for the MAX_BATCH_SIZE above. If not set, the OPTIMAL_BATCH_SIZE returns 1.
// This is not an entirely GPU-specific metric (so METRIC_KEY is used rather than GPU_METRIC_KEY below),
// but the GPU is the only device that supports it at the moment.
// For the GPU, the metric already accommodates the on-device memory limitation that the MAX_BATCH_SIZE poses,
// so OPTIMAL_BATCH_SIZE is always less than MAX_BATCH_SIZE. Unlike the latter, it is also aligned to a power of 2.
auto optimal_batch_size = core.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
//! [part2]
}

View File

@ -6,6 +6,7 @@
#include <string>
#include <vector>
#include <tuple>
namespace cldnn {
/// @addtogroup cpp_api C++ API
@ -25,6 +26,10 @@ struct gfx_version {
uint16_t major;
uint8_t minor;
uint8_t revision;
friend bool operator < (const gfx_version& l, const gfx_version& r) {
return std::tie(l.major, l.minor, l.revision)
< std::tie(r.major, r.minor, r.revision); // same order
}
};
/// @brief Information about the device properties and capabilities.

View File

@ -124,6 +124,7 @@ std::map<std::string, std::vector<InferenceEngine::Blob::Ptr>> getRemoteInputBlo
}
auto blob = InferenceEngine::gpu::make_shared_blob(desc, context, clBuffer.back());
blob->allocate();
remoteBlobs[name].push_back(blob);
};

View File

@ -109,8 +109,10 @@ std::vector<float> splitFloat(const std::string& s, char delim) {
std::vector<std::string> parseDevices(const std::string& device_string) {
std::string comma_separated_devices = device_string;
if (comma_separated_devices.find(":") != std::string::npos) {
comma_separated_devices = comma_separated_devices.substr(comma_separated_devices.find(":") + 1);
auto colon = comma_separated_devices.find(":");
if (colon != std::string::npos) {
auto bracket = comma_separated_devices.find("("); // e.g. in BATCH:GPU(4)
comma_separated_devices = comma_separated_devices.substr(colon + 1, bracket - colon - 1);
}
if ((comma_separated_devices == "MULTI") || (comma_separated_devices == "HETERO"))
return std::vector<std::string>();

View File

@ -26,6 +26,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
add_dependencies(${TARGET_NAME} ov_auto_plugin)
endif()
if(ENABLE_AUTO_BATCH)
add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
endif()
if(ENABLE_INTEL_CPU)
add_dependencies(${TARGET_NAME} ov_intel_cpu_plugin)
endif()

View File

@ -16,6 +16,7 @@
#include "cpp/ie_cnn_network.h"
#include "cpp_interfaces/interface/ie_iexecutable_network_internal.hpp"
#include "ie_parameter.hpp"
#include "ie_remote_context.hpp"
#include "threading/ie_itask_executor.hpp"
namespace InferenceEngine {
@ -60,6 +61,22 @@ public:
const std::string& deviceName,
const std::map<std::string, std::string>& config = {}) = 0;
/**
* @brief Creates an executable network from a network object.
*
* Users can create as many networks as they need and use
* them simultaneously (up to the limitation of the hardware resources)
*
* @param network CNNNetwork object acquired from Core::ReadNetwork
* @param remoteCtx "Remote" (non-CPU) accelerator device-specific execution context to use
* @param config Optional map of pairs: (config parameter name, config parameter value) relevant only for this load
* operation
* @return An executable network reference
*/
virtual SoExecutableNetworkInternal LoadNetwork(const CNNNetwork& network,
const RemoteContext::Ptr& remoteCtx,
const std::map<std::string, std::string>& config = {}) = 0;
/**
* @brief Creates an executable network from a model file.
*
@ -142,6 +159,16 @@ public:
*/
virtual bool DeviceSupportsImportExport(const std::string& deviceName) const = 0;
/**
* @brief Create a new shared context object on specified accelerator device
* using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
* @param deviceName Name of a device to create new shared context on.
* @param params Map of device-specific shared context parameters.
* @return A shared pointer to a created remote context.
*/
virtual InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
const InferenceEngine::ParamMap&) = 0;
virtual bool isNewAPI() const = 0;
/**
@ -165,6 +192,7 @@ public:
static std::vector<std::string> getHeteroDevices(std::string fallbackDevice);
static std::vector<std::string> getMultiDevices(std::string devicesList);
static std::string getBatchDevice(std::string devicesList);
};
} // namespace InferenceEngine
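
The two ICore additions above mirror the public InferenceEngine::Core API (Core::CreateContext and the LoadNetwork overload taking a RemoteContext). A minimal sketch of that public-side usage follows; the model path is a placeholder, and the empty parameter map stands in for the plugin-specific context parameters (e.g. an existing OpenCL context handle) a real application would pass:

#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder model path

    // Create a device-specific shared context; real GPU usage would pass plugin-specific params here
    InferenceEngine::RemoteContext::Ptr context = core.CreateContext("GPU", {});

    // Load the network on that context (the optional config map is omitted)
    auto execNet = core.LoadNetwork(network, context);
    auto request = execNet.CreateInferRequest();
    return 0;
}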

View File

@ -23,14 +23,12 @@ struct MemBandwidthPressure {
static MemBandwidthPressure MemBandwidthPressureTolerance(
const std::shared_ptr<ngraph::Function> nGraphFunc,
const float L2_cache_size,
const float L3_cache_size,
const float cache_size,
const float memThresholdAssumeLimited = MemBandwidthPressure::LIMITED) {
int total_convs = 0, mem_limited_convs = 0, compute_convs = 0, total_gemms = 0, mem_limited_gemms = 0,
total_deconvs = 0, compute_deconvs = 0, mem_limited_deconvs = 0;
auto memLimitedFactor = [&](int size_data_moved, int datatype_size) -> float {
return (L2_cache_size * 1.0f /*util factor, tbd */
/ (size_data_moved * datatype_size));
auto memLimitedFactor = [&](int size_data_moved, int datatype_size = 4) -> float {
return (cache_size / (size_data_moved * datatype_size));
};
auto isLowPrecision = [&](ngraph::element::Type type) -> bool {
return (type == ngraph::element::i8) || (type == ngraph::element::u8);

View File

@ -0,0 +1,86 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
///////////////////////////////////////////////////////////////////////////////////////////////////
#pragma once
#include <cstddef>
#include <mutex>
#include <queue>
#include <type_traits>
#include "ie_parallel.hpp"
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
# include <tbb/concurrent_queue.h>
#endif
namespace InferenceEngine {
template <typename T>
class ThreadSafeQueueWithSize {
public:
void push(T value) {
std::lock_guard<std::mutex> lock(_mutex);
_queue.push(std::move(value));
}
bool try_pop(T& value) {
std::lock_guard<std::mutex> lock(_mutex);
if (!_queue.empty()) {
value = std::move(_queue.front());
_queue.pop();
return true;
} else {
return false;
}
}
size_t size() {
std::lock_guard<std::mutex> lock(_mutex);
return _queue.size();
}
protected:
std::queue<T> _queue;
std::mutex _mutex;
};
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
template <typename T>
using ThreadSafeQueue = tbb::concurrent_queue<T>;
template <typename T>
using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
#else
template <typename T>
using ThreadSafeQueue = ThreadSafeQueueWithSize<T>;
template <typename T>
class ThreadSafeBoundedQueue {
public:
ThreadSafeBoundedQueue() = default;
bool try_push(T value) {
std::lock_guard<std::mutex> lock(_mutex);
if (_capacity) {
_queue.push(std::move(value));
}
return _capacity;
}
bool try_pop(T& value) {
std::lock_guard<std::mutex> lock(_mutex);
if (_capacity && !_queue.empty()) {
value = std::move(_queue.front());
_queue.pop();
return true;
} else {
return false;
}
}
void set_capacity(std::size_t newCapacity) {
std::lock_guard<std::mutex> lock(_mutex);
_capacity = newCapacity;
}
protected:
std::queue<T> _queue;
std::mutex _mutex;
bool _capacity = false;
};
#endif
} // namespace InferenceEngine
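
A brief usage sketch of the fallback containers above (illustrative only; on TBB builds the same aliases resolve to tbb::concurrent_queue / tbb::concurrent_bounded_queue):

#include "threading/ie_thread_safe_containers.hpp"

void thread_safe_queue_example() {
    InferenceEngine::ThreadSafeBoundedQueue<int> idleWorkers;
    idleWorkers.set_capacity(4);      // in the fallback shown above, pushing is rejected until a capacity is set

    for (int i = 0; i < 4; i++)
        idleWorkers.try_push(i);      // producer side

    int workerId = 0;
    while (idleWorkers.try_pop(workerId)) {
        // consumer side: non-blocking pop, e.g. schedule work on workerId
    }
}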

View File

@ -118,6 +118,18 @@ DECLARE_METRIC_VALUE(BATCHED_BLOB);
* String value for metric name is "RANGE_FOR_STREAMS".
*/
DECLARE_METRIC_KEY(RANGE_FOR_STREAMS, std::tuple<unsigned int, unsigned int>);
/**
* @brief Metric to query the optimal batch size for the given device and network
*
* The metric returns a value of unsigned int type:
* the optimal batch size for a given network on the given device. The returned value is aligned to a power of 2.
* Also, MODEL_PTR is a required option for this metric since the optimal batch size depends on the model,
* so if MODEL_PTR is not given, the result of the metric is always 1.
* For the GPU, the metric is queried automatically whenever the OpenVINO throughput performance hint is used,
* so that the result (>1) governs the automatic batching (transparently to the application).
* The automatic batching can be disabled with ALLOW_AUTO_BATCHING set to NO.
*/
DECLARE_METRIC_KEY(OPTIMAL_BATCH_SIZE, unsigned int);
/**
* @brief Metric to provide a hint for a range for number of async infer requests. If device supports streams,
@ -250,6 +262,15 @@ DECLARE_CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS);
DECLARE_CONFIG_VALUE(YES);
DECLARE_CONFIG_VALUE(NO);
/**
* @brief Auto-batching configuration, string for the device + batch size, e.g. "GPU(4)"
*/
DECLARE_CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG);
/**
* @brief Auto-batching configuration: string with timeout (in ms), e.g. "100"
*/
DECLARE_CONFIG_KEY(AUTO_BATCH_TIMEOUT);
/**
* @brief Limit `#threads` that are used by Inference Engine for inference on the CPU.
*/
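
A sketch of how the new metric and config keys above could be used together from application code (the batch size 4 and the 100 ms timeout are arbitrary example values):

#include <ie_core.hpp>
#include <ie_plugin_config.hpp>

void auto_batch_example(const InferenceEngine::CNNNetwork& network) {
    InferenceEngine::Core core;

    // Query the optimal batch size for this model on the GPU (without MODEL_PTR the metric returns 1)
    std::map<std::string, InferenceEngine::Parameter> options = {{"MODEL_PTR", network.getFunction()}};
    auto optimalBatch = core.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();

    // Explicitly load through the BATCH device with batch size 4 and a 100 ms collection timeout
    auto execNet = core.LoadNetwork(network, "BATCH:GPU(4)",
                                    {{CONFIG_KEY(AUTO_BATCH_TIMEOUT), "100"}});
    (void)optimalBatch;
}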

View File

@ -46,6 +46,7 @@
#endif
using namespace InferenceEngine::PluginConfigParams;
using namespace InferenceEngine;
using namespace std::placeholders;
namespace ov {
@ -94,6 +95,9 @@ Parsed<T> parseDeviceNameIntoConfig(const std::string& deviceName, const std::ma
config_[ie::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES] =
deviceName.substr(std::string("AUTO:").size());
}
} else if (deviceName_.find("BATCH:") == 0) {
deviceName_ = "BATCH";
config_[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceName.substr(6);
} else {
ie::DeviceIDParser parser(deviceName_);
deviceName_ = parser.getDeviceName();
@ -480,14 +484,22 @@ public:
return newAPI;
}
ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(const ie::CNNNetwork& network,
ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(
const ie::CNNNetwork& network,
const std::shared_ptr<ie::RemoteContext>& context,
const std::map<std::string, std::string>& config) {
const std::map<std::string, std::string>& config) override {
OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::RemoteContext");
if (context == nullptr) {
IE_THROW() << "Remote context is null";
}
// have to deduce the device name/config from the context first
auto parsed = parseDeviceNameIntoConfig(context->getDeviceName(), config);
std::string& deviceName = parsed._deviceName;
std::map<std::string, std::string>& config_with_batch = parsed._config;
// if auto-batching is applicable, the below function will patch the device name and config accordingly:
ApplyAutoBatching(network, deviceName, config_with_batch);
parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
auto plugin = GetCPPPluginByName(parsed._deviceName);
ov::runtime::SoPtr<ie::IExecutableNetworkInternal> res;
auto cacheManager = coreConfig.getCacheConfig()._cacheManager;
@ -508,12 +520,59 @@ public:
return res;
}
void ApplyAutoBatching(const ie::CNNNetwork& network,
std::string& deviceName,
std::map<std::string, std::string>& config_with_batch) {
if (deviceName.find("BATCH") != std::string::npos) {
// explicitly enabled Auto-Batching e.g. in the tests
auto pos = deviceName.find_first_of(":");
if (pos != std::string::npos) {
auto deviceNameWithBatchSize = deviceName.substr(pos + 1);
auto deviceNameWithoutBatch = DeviceIDParser::getBatchDevice(deviceNameWithBatchSize);
auto function = network.getFunction();
// have to execute the DetectionOutput separately (without batching)
// as this layer mix-in the values from the different inputs (batch id)
bool bDetectionOutput = false;
const std::string detectionOutputOpName = ngraph::op::DetectionOutput::get_type_info_static().name;
const std::string resultOpName = ngraph::op::Result::get_type_info_static().name;
for (auto&& node : function->get_ops()) {
auto isDetectionOutputParent = [&detectionOutputOpName](decltype(node)& nd) {
for (size_t n = 0; n < nd->get_input_size(); n++) {
if (detectionOutputOpName == nd->get_input_node_ptr(n)->get_type_info().name)
return true;
}
return false;
};
if ((detectionOutputOpName == node->get_type_info().name) ||
((resultOpName == node->get_type_info().name) && isDetectionOutputParent(node))) {
node->get_rt_info()["affinity"] = deviceNameWithoutBatch;
bDetectionOutput = true;
} else {
node->get_rt_info()["affinity"] = "BATCH";
}
}
if (bDetectionOutput) {
deviceName = "HETERO:BATCH," + deviceNameWithoutBatch;
config_with_batch[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceNameWithBatchSize;
} else {
deviceName = "BATCH:" + deviceNameWithBatchSize;
}
}
}
}
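// Example of the rewriting above: loading with "BATCH:GPU(4)" keeps the name as-is for a fully
// batchable graph; if the graph contains a DetectionOutput (or a Result fed by one), the affinities
// split it out, the device name becomes "HETERO:BATCH,GPU" and the original "GPU(4)" is carried via
// AUTO_BATCH_DEVICE_CONFIG, so only the batchable subgraph is executed with batching.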
ie::SoExecutableNetworkInternal LoadNetwork(const ie::CNNNetwork& network,
const std::string& deviceName,
const std::string& deviceNameOrig,
const std::map<std::string, std::string>& config) override {
OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::CNN");
bool forceDisableCache = config.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
auto parsed = parseDeviceNameIntoConfig(deviceName, config);
std::string deviceName = deviceNameOrig;
std::map<std::string, std::string> config_with_batch = config;
// if auto-batching is applicable, the below function will patch the device name and config accordingly:
ApplyAutoBatching(network, deviceName, config_with_batch);
bool forceDisableCache = config_with_batch.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
auto parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
if (forceDisableCache) {
// remove this config key from parsed as plugins can throw unsupported exception
parsed._config.erase(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE));
@ -732,6 +791,19 @@ public:
return devices;
}
/**
* @brief Create a new shared context object on specified accelerator device
* using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
* @param deviceName Name of a device to create new shared context on.
* @param params Map of device-specific shared context parameters.
* @return A shared pointer to a created remote context.
*/
InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
const InferenceEngine::ParamMap& params) override {
auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
return GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
}
/**
* @brief Returns reference to CPP plugin wrapper by a device name
* @param deviceName A name of device
@ -1030,6 +1102,12 @@ public:
deviceNames = ie::DeviceIDParser::getMultiDevices(deviceName.substr(pos + 1));
}
deviceNames.emplace_back("AUTO");
} else if (deviceName.find("BATCH") == 0) {
auto pos = deviceName.find_first_of(":");
if (pos != std::string::npos) {
deviceNames = {ie::DeviceIDParser::getBatchDevice(deviceName.substr(pos + 1))};
}
deviceNames.push_back("BATCH");
} else {
deviceNames.push_back(deviceName);
}
@ -1120,8 +1198,8 @@ std::vector<std::string> DeviceIDParser::getHeteroDevices(std::string fallbackDe
}
std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList) {
std::vector<std::string> deviceNames;
auto trim_request_info = [](std::string device_with_requests) {
std::set<std::string> deviceNames;
auto trim_request_info = [](const std::string& device_with_requests) {
auto opening_bracket = device_with_requests.find_first_of('(');
return device_with_requests.substr(0, opening_bracket);
};
@ -1132,14 +1210,36 @@ std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList
// we skip the #requests info here
while ((pos = devicesList.find(delimiter)) != std::string::npos) {
auto d = devicesList.substr(0, pos);
deviceNames.push_back(trim_request_info(d));
if (d.find("BATCH") == 0) {
deviceNames.insert("BATCH");
auto p = d.find_first_of(":");
if (p != std::string::npos)
deviceNames.insert(DeviceIDParser::getBatchDevice(d.substr(p + 1)));
} else {
deviceNames.insert(trim_request_info(d));
}
devicesList.erase(0, pos + 1);
}
if (!devicesList.empty())
deviceNames.push_back(trim_request_info(devicesList));
if (!devicesList.empty()) {
if (devicesList.find("BATCH") == 0) {
deviceNames.insert("BATCH");
auto p = devicesList.find_first_of(":");
if (p != std::string::npos)
deviceNames.insert(DeviceIDParser::getBatchDevice(devicesList.substr(p + 1)));
} else {
deviceNames.insert(trim_request_info(devicesList));
}
}
return std::vector<std::string>(deviceNames.begin(), deviceNames.end());
}
return deviceNames;
std::string DeviceIDParser::getBatchDevice(std::string device) {
auto trim_request_info = [](const std::string& device_with_requests) {
auto opening_bracket = device_with_requests.find_first_of('(');
return device_with_requests.substr(0, opening_bracket);
};
return trim_request_info(device);
}
class Core::Impl : public ov::runtime::CoreImpl {
@ -1207,18 +1307,7 @@ ExecutableNetwork Core::LoadNetwork(const std::string& modelPath, const std::map
}
RemoteContext::Ptr Core::CreateContext(const std::string& deviceName, const ParamMap& params) {
if (deviceName.find("HETERO") == 0) {
IE_THROW() << "HETERO device does not support remote context";
}
if (deviceName.find("MULTI") == 0) {
IE_THROW() << "MULTI device does not support remote context";
}
if (deviceName.find("AUTO") == 0) {
IE_THROW() << "AUTO device does not support remote context";
}
auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
return _impl->GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
return _impl->CreateContext(deviceName, params);
}
RemoteContext::Ptr Core::GetDefaultContext(const std::string& deviceName) {

View File

@ -21,3 +21,7 @@ endif()
if(ENABLE_AUTO OR ENABLE_MULTI)
add_subdirectory(auto)
endif()
if(ENABLE_AUTO_BATCH)
add_subdirectory(auto_batch)
endif()

View File

@ -156,7 +156,8 @@ MultiDeviceExecutableNetwork::MultiDeviceExecutableNetwork(const std::string&
, _needPerfCounters(needPerfCounters)
, _multiPlugin(plugin)
, _context(context)
, _workModeIsAUTO(true) {
, _workModeIsAUTO(true)
, _network(network) {
if (_multiPlugin->GetCore() == nullptr) {
IE_THROW() << "Please, work with " << _multiPlugin->GetName() << " device via InferencEngine::Core object";
}
@ -667,10 +668,30 @@ InferenceEngine::Parameter MultiDeviceExecutableNetwork::GetMetric(const std::st
real = _loadContext[ACTUALDEVICE].
executableNetwork->GetMetric(name).as<unsigned int>();
} else {
IE_ASSERT(_loadContext[CPU].isAlready == true);
real = _loadContext[CPU].
executableNetwork->GetMetric(name).as<unsigned int>();
std::unique_lock<std::mutex> lock(_confMutex);
auto deviceInfo = _loadContext[ACTUALDEVICE].deviceInfo;
lock.unlock();
if (deviceInfo.deviceName.find("GPU") != std::string::npos) {
const auto& mode = deviceInfo.config.find(CONFIG_KEY(PERFORMANCE_HINT));
if (mode != deviceInfo.config.end() && mode->second == CONFIG_VALUE(THROUGHPUT)) {
std::map<std::string, InferenceEngine::Parameter> options;
options["MODEL_PTR"] = _network.getFunction(); // CNNntework
try {
auto optimalBatchSize = _core->GetMetric(deviceInfo.deviceName,
METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
auto rangeOfStreams = _core->GetMetric(deviceInfo.deviceName,
METRIC_KEY(RANGE_FOR_STREAMS), options).as<std::tuple<unsigned int, unsigned int>>();
real = (std::max)(real, std::get<1>(rangeOfStreams) * optimalBatchSize);
} catch (const InferenceEngine::Exception &iie) {
LOG_WARNING("[AUTOPLUGIN]get optimal infer requset num for GPU auto-batch failed :%s", iie.what());
}
unsigned int res = std::max(8u, real);
}
}
}
unsigned int res = (std::max)(8u, real);
IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, res);
}

View File

@ -7,22 +7,17 @@
#include <atomic>
#include <mutex>
#include <queue>
#include <unordered_map>
#include <map>
#include <vector>
#include <string>
#include <cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp>
#include <ie_parallel.hpp>
#include <threading/ie_itask_executor.hpp>
#include <threading/ie_executor_manager.hpp>
#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
#include "threading/ie_thread_safe_containers.hpp"
#include "threading/ie_itask_executor.hpp"
#include "threading/ie_executor_manager.hpp"
#include "ie_icore.hpp"
#if (IE_THREAD == IE_THREAD_TBB || IE_THREAD == IE_THREAD_TBB_AUTO)
# include <tbb/concurrent_queue.h>
#endif
#ifdef MULTIUNITTEST
#define MOCKTESTMACRO virtual
#define MultiDevicePlugin MockMultiDevicePlugin
@ -79,66 +74,6 @@ enum AutoLoadContextIndex {
template<typename T>
using DeviceMap = std::unordered_map<DeviceName, T>;
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
template <typename T>
using ThreadSafeQueue = tbb::concurrent_queue<T>;
template <typename T>
using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
#else
template <typename T>
class ThreadSafeQueue {
public:
void push(T value) {
std::lock_guard<std::mutex> lock(_mutex);
_queue.push(std::move(value));
}
bool try_pop(T& value) {
std::lock_guard<std::mutex> lock(_mutex);
if (!_queue.empty()) {
value = std::move(_queue.front());
_queue.pop();
return true;
} else {
return false;
}
}
protected:
std::queue<T> _queue;
std::mutex _mutex;
};
template <typename T>
class ThreadSafeBoundedQueue {
public:
ThreadSafeBoundedQueue() = default;
bool try_push(T value) {
std::lock_guard<std::mutex> lock(_mutex);
if (_capacity) {
_queue.push(std::move(value));
}
return _capacity;
}
bool try_pop(T& value) {
std::lock_guard<std::mutex> lock(_mutex);
if (_capacity && !_queue.empty()) {
value = std::move(_queue.front());
_queue.pop();
return true;
} else {
return false;
}
}
void set_capacity(std::size_t newCapacity) {
std::lock_guard<std::mutex> lock(_mutex);
_capacity = newCapacity;
}
protected:
std::queue<T> _queue;
std::mutex _mutex;
bool _capacity = false;
};
#endif
class MultiDeviceExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault,
public InferenceEngine::ITaskExecutor {
public:
@ -148,7 +83,7 @@ public:
InferenceEngine::Task _task;
std::exception_ptr _exceptionPtr = nullptr;
};
using NotBusyWorkerRequests = ThreadSafeBoundedQueue<WorkerInferRequest*>;
using NotBusyWorkerRequests = InferenceEngine::ThreadSafeBoundedQueue<WorkerInferRequest*>;
explicit MultiDeviceExecutableNetwork(const DeviceMap<InferenceEngine::SoExecutableNetworkInternal>& networksPerDevice,
const std::vector<DeviceInformation>& networkDevices,
@ -186,8 +121,8 @@ public:
std::vector<DeviceInformation> _devicePriorities;
const std::vector<DeviceInformation> _devicePrioritiesInitial;
DeviceMap<InferenceEngine::SoExecutableNetworkInternal> _networksPerDevice;
ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
DeviceMap<std::unique_ptr<ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
InferenceEngine::ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
DeviceMap<std::unique_ptr<InferenceEngine::ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
DeviceMap<NotBusyWorkerRequests> _idleWorkerRequests;
DeviceMap<std::vector<WorkerInferRequest>> _workerRequests;
std::unordered_map<std::string, InferenceEngine::Parameter> _config;
@ -217,6 +152,7 @@ private:
std::promise<void> _firstLoadPromise;
mutable AutoLoadContext _loadContext[CONTEXTNUM];
mutable std::mutex _confMutex;
const InferenceEngine::CNNNetwork _network;
};
} // namespace MultiDevicePlugin

View File

@ -0,0 +1,20 @@
# Copyright (C) 2018-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
set(TARGET_NAME "ov_auto_batch_plugin")
file(GLOB SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
file(GLOB HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
ie_add_plugin(NAME ${TARGET_NAME}
DEVICE_NAME "BATCH"
SOURCES ${SOURCES} ${HEADERS}
VERSION_DEFINES_FOR auto_batch.cpp ADD_CLANG_FORMAT)
target_link_libraries(${TARGET_NAME} PRIVATE Threads::Threads)
ie_add_api_validator_post_build_step(TARGET ${TARGET_NAME})
set_target_properties(${TARGET_NAME} PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE ${ENABLE_LTO})

View File

@ -0,0 +1,731 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
///////////////////////////////////////////////////////////////////////////////////////////////////
#include "auto_batch.hpp"
#include <cpp_interfaces/interface/ie_internal_plugin_config.hpp>
#include <ie_icore.hpp>
#include <ie_ngraph_utils.hpp>
#include <ie_performance_hints.hpp>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
namespace AutoBatchPlugin {
using namespace InferenceEngine;
std::vector<std::string> supported_configKeys = {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CONFIG_KEY(AUTO_BATCH_TIMEOUT)};
template <Precision::ePrecision precision>
Blob::Ptr create_shared_blob_on_top_of_batched_blob(Blob::Ptr batched_blob, size_t batch_id, size_t batch_num) {
typedef typename PrecisionTrait<precision>::value_type TYPE;
typedef typename std::add_pointer<TYPE>::type TYPEPTR;
auto ptr = batched_blob->buffer().as<TYPEPTR>();
auto sizePerBatch = batched_blob->size() / batch_num;
auto layout = batched_blob->getTensorDesc().getLayout();
SizeVector dims = batched_blob->getTensorDesc().getDims();
// the below code is a placeholder for the WIP (22.1) functionality
// that will check the reshaping by the batch is robust (CVS-51744)
if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
layout == InferenceEngine::Layout::NDHWC) {
dims[0] = 1;
assert(batched_blob->getTensorDesc().getPrecision() == precision);
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()},
ptr + sizePerBatch * batch_id,
sizePerBatch);
} else {
// same blob for all requests (e.g. constants)
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()}, ptr);
}
}
// ------------------------------AutoBatchInferRequest----------------------------
AutoBatchInferRequest::AutoBatchInferRequest(const InputsDataMap& networkInputs,
const OutputsDataMap& networkOutputs,
AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
int batch_id,
int num_batch,
bool needPerfCounters)
: IInferRequestInternal(networkInputs, networkOutputs),
_myBatchedRequestWrapper(workerRequestPtr),
_needPerfCounters(needPerfCounters),
_batchId(batch_id),
_batchSize(num_batch) {
// Allocate all input blobs
for (const auto& it : networkInputs) {
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
Blob::Ptr res;
switch (it.second->getTensorDesc().getPrecision()) {
case InferenceEngine::Precision::FP32:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::I32:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::I8:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::U16:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::I16:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::U8:
case InferenceEngine::Precision::BOOL:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
default:
IE_THROW() << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
}
_inputs[it.first] = res;
}
// Allocate all output blobs
for (const auto& it : networkOutputs) {
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
Blob::Ptr res;
switch (it.second->getTensorDesc().getPrecision()) {
case InferenceEngine::Precision::FP32:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::I32:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::I8:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::U16:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::I16:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
case InferenceEngine::Precision::U8:
case InferenceEngine::Precision::BOOL:
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
batch_id,
num_batch);
break;
default:
IE_THROW(NotImplemented) << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
}
_outputs[it.first] = res;
}
}
void AutoBatchInferRequest::SetBlobsToAnotherRequest(SoIInferRequestInternal& req) {
for (const auto& it : _networkInputs) {
auto& name = it.first;
// this request is already in BUSY state, so using the internal functions safely
auto blob = GetBlob(name);
if (req->GetBlob(name) != blob)
req->SetBlob(name, blob);
}
for (const auto& it : _networkOutputs) {
auto& name = it.first;
// this request is already in BUSY state, so using the internal functions safely
auto blob = GetBlob(name);
if (req->GetBlob(name) != blob)
req->SetBlob(name, blob);
}
}
void AutoBatchInferRequest::CopyInputsIfNeeded() {
for (const auto& it : _networkInputs) {
auto& name = it.first;
// this request is already in BUSY state, so using the internal functions safely
CopyBlobIfNeeded(GetBlob(name), _myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), true);
}
}
void AutoBatchInferRequest::CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src,
InferenceEngine::Blob::Ptr dst,
bool bInput) {
auto bufferDst = dst->buffer();
auto ptrDst = bufferDst.as<char*>();
auto bufferSrc = src->cbuffer();
auto ptrSrc = bufferSrc.as<const char*>();
ptrdiff_t szDst = dst->byteSize();
ptrdiff_t szSrc = src->byteSize();
if (bInput) {
ptrdiff_t offset = szSrc != szDst ? _batchId * szDst / _batchSize : 0;
if ((ptrDst + offset) == ptrSrc)
return;
else
memcpy(ptrDst + offset, ptrSrc, szSrc);
} else {
ptrdiff_t offset = szSrc != szDst ? _batchId * szSrc / _batchSize : 0;
if ((ptrSrc + offset) == ptrDst)
return;
else
memcpy(ptrDst, ptrSrc + offset, szDst);
}
}
void AutoBatchInferRequest::CopyOutputsIfNeeded() {
for (const auto& it : _networkOutputs) {
auto& name = it.first;
// this request is already in BUSY state, so using the internal functions safely
CopyBlobIfNeeded(_myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), GetBlob(name), false);
}
}
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> AutoBatchInferRequest::GetPerformanceCounts() const {
return _perfMap;
}
AutoBatchAsyncInferRequest::AutoBatchAsyncInferRequest(
const AutoBatchInferRequest::Ptr& inferRequest,
const bool needPerfCounters,
InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
const ITaskExecutor::Ptr& callbackExecutor)
: AsyncInferRequestThreadSafeDefault(inferRequest, nullptr, callbackExecutor),
_inferRequestWithoutBatch(inferRequestWithoutBatch),
_inferRequest{inferRequest} {
// this executor starts the inference while the task (checking the result) is passed to the next stage
struct ThisRequestExecutor : public ITaskExecutor {
explicit ThisRequestExecutor(AutoBatchAsyncInferRequest* _this_) : _this{_this_} {}
void run(Task task) override {
auto& workerInferRequest = _this->_inferRequest->_myBatchedRequestWrapper;
std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
t.first = _this;
t.second = std::move(task);
workerInferRequest._tasks.push(t);
// it is ok to call size() here as the queue only grows (and the bulk removal happens under the mutex)
const int sz = workerInferRequest._tasks.size();
if (sz == workerInferRequest._batchSize) {
workerInferRequest._cond.notify_one();
}
};
AutoBatchAsyncInferRequest* _this = nullptr;
};
_pipeline = {
{/*TaskExecutor*/ std::make_shared<ThisRequestExecutor>(this), /*task*/ [this, needPerfCounters] {
if (this->_inferRequest->_exceptionPtr) // if the exception happened in the batch1 fallback
std::rethrow_exception(this->_inferRequest->_exceptionPtr);
if (this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr) // when the batchN execution failed
std::rethrow_exception(this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr);
this->_inferRequest->CopyOutputsIfNeeded();
}}};
}
void AutoBatchAsyncInferRequest::Infer_ThreadUnsafe() {
InferUsingAsync();
}
AutoBatchAsyncInferRequest::~AutoBatchAsyncInferRequest() {
StopAndWait();
}
// ------------------------------AutoBatchExecutableNetwork----------------------------
AutoBatchExecutableNetwork::AutoBatchExecutableNetwork(
const InferenceEngine::SoExecutableNetworkInternal& networkWithBatch,
const InferenceEngine::SoExecutableNetworkInternal& networkWithoutBatch,
const DeviceInformation& networkDevice,
const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
const bool needPerfCounters)
: InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr,
std::make_shared<InferenceEngine::ImmediateExecutor>()),
_network{networkWithBatch},
_networkWithoutBatch{networkWithoutBatch},
_config{config},
_needPerfCounters{needPerfCounters} {
// WA for gcc 4.8 ( fails compilation with member init-list)
_device = networkDevice;
auto time_out = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
if (time_out != config.end())
_timeOut = ParseTimeoutValue(time_out->second.as<std::string>());
}
AutoBatchExecutableNetwork::~AutoBatchExecutableNetwork() {
_terminate = true;
for (auto w : _workerRequests) {
w->_thread.join();
}
_workerRequests.clear();
}
unsigned int AutoBatchExecutableNetwork::ParseTimeoutValue(const std::string& s) {
auto val = std::stoi(s);
if (val < 0)
IE_THROW(ParameterMismatch) << "Value for the " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " should be unsigned int";
return val;
}
std::shared_ptr<InferenceEngine::RemoteContext> AutoBatchExecutableNetwork::GetContext() const {
return _network->GetContext();
}
InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequestImpl(
InferenceEngine::InputsDataMap networkInputs,
InferenceEngine::OutputsDataMap networkOutputs) {
// todo : guard request creation from another thread/on-the-fly
auto num = _numRequestsCreated++;
auto batch_id = num % _device.batchForDevice;
if (!batch_id) { // need new request
_workerRequests.push_back(std::make_shared<WorkerInferRequest>());
auto workerRequestPtr = _workerRequests.back();
workerRequestPtr->_inferRequestBatched = {_network->CreateInferRequest(), _network._so};
workerRequestPtr->_batchSize = _device.batchForDevice;
workerRequestPtr->_completionTasks.resize(workerRequestPtr->_batchSize);
workerRequestPtr->_inferRequestBatched->SetCallback(
[workerRequestPtr, this](std::exception_ptr exceptionPtr) mutable {
if (exceptionPtr)
workerRequestPtr->_exceptionPtr = exceptionPtr;
IE_ASSERT(workerRequestPtr->_completionTasks.size() == (size_t)workerRequestPtr->_batchSize);
// notify the individual requests on the completion
for (int c = 0; c < workerRequestPtr->_batchSize; c++) {
workerRequestPtr->_completionTasks[c]();
}
// reset the timeout
workerRequestPtr->_cond.notify_one();
});
workerRequestPtr->_thread = std::thread([workerRequestPtr, this] {
while (1) {
std::cv_status status;
{
std::unique_lock<std::mutex> lock(workerRequestPtr->_mutex);
status = workerRequestPtr->_cond.wait_for(lock, std::chrono::milliseconds(_timeOut));
}
if (_terminate) {
break;
} else {
// as we pop the tasks from the queue only here
// it is ok to call size() (as the _tasks can only grow in parallel)
const int sz = workerRequestPtr->_tasks.size();
if (sz == workerRequestPtr->_batchSize) {
std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
for (int n = 0; n < sz; n++) {
IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
workerRequestPtr->_completionTasks[n] = std::move(t.second);
t.first->_inferRequest->CopyInputsIfNeeded();
}
workerRequestPtr->_inferRequestBatched->StartAsync();
} else if ((status == std::cv_status::timeout) && sz) {
// timeout to collect the batch is over, have to execute the requests in the batch1 mode
std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
// popping all tasks collected by the moment of the time-out and execute each with batch1
std::atomic<int> arrived = {0};
std::promise<void> all_completed;
auto all_completed_future = all_completed.get_future();
for (int n = 0; n < sz; n++) {
IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
t.first->_inferRequestWithoutBatch->SetCallback(
[t, sz, &arrived, &all_completed](std::exception_ptr p) {
if (p)
t.first->_inferRequest->_exceptionPtr = p;
t.second();
if (sz == ++arrived)
all_completed.set_value();
});
t.first->_inferRequest->SetBlobsToAnotherRequest(t.first->_inferRequestWithoutBatch);
t.first->_inferRequestWithoutBatch->StartAsync();
}
all_completed_future.get();
// now when all the tasks for this batch are completed, start waiting for the timeout again
}
}
}
});
}
return std::make_shared<AutoBatchInferRequest>(networkInputs,
networkOutputs,
*_workerRequests.back(),
batch_id,
_device.batchForDevice,
_needPerfCounters);
}
InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequest() {
auto syncRequestImpl = CreateInferRequestImpl(_networkInputs, _networkOutputs);
syncRequestImpl->setPointerToExecutableNetworkInternal(shared_from_this());
InferenceEngine::SoIInferRequestInternal inferRequestWithoutBatch = {_networkWithoutBatch->CreateInferRequest(),
_networkWithoutBatch._so};
return std::make_shared<AutoBatchAsyncInferRequest>(
std::static_pointer_cast<AutoBatchInferRequest>(syncRequestImpl),
_needPerfCounters,
inferRequestWithoutBatch,
_callbackExecutor);
}
std::shared_ptr<ngraph::Function> AutoBatchExecutableNetwork::GetExecGraphInfo() {
return _network->GetExecGraphInfo() ? _network->GetExecGraphInfo() : _networkWithoutBatch->GetExecGraphInfo();
}
void AutoBatchExecutableNetwork::SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) {
auto timeout = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
if (timeout == config.end() || config.size() > 1) {
IE_THROW() << "The only config that can be changed on the fly for the AutoBatching the is the "
<< CONFIG_KEY(AUTO_BATCH_TIMEOUT);
} else {
_timeOut = ParseTimeoutValue(timeout->second.as<std::string>());
}
}
InferenceEngine::Parameter AutoBatchExecutableNetwork::GetConfig(const std::string& name) const {
auto it = _config.find(name);
if (it != _config.end()) {
return it->second;
} else {
// find config key among networks config keys
auto param = _network->GetMetric(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
for (auto&& configKey : param.as<std::vector<std::string>>()) {
if (configKey == name) {
return _network->GetConfig(configKey);
}
}
IE_THROW(NotFound) << name << " not found in the ExecutableNetwork config";
}
}
InferenceEngine::Parameter AutoBatchExecutableNetwork::GetMetric(const std::string& name) const {
if (name == METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)) {
auto reqs = 0;
try {
auto hint = _network->GetConfig(CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS)).as<std::string>();
reqs = InferenceEngine::PerfHintsConfig::CheckPerformanceHintRequestValue(hint);
if (!reqs) // no limitations from user, let's deduce the full blown #requests
// (multiplied by the devices capabilities to run multiple <batched> requests for further perf)
reqs = _device.batchForDevice *
_network->GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
} catch (const InferenceEngine::Exception& iie) {
}
reqs = std::max(reqs, _device.batchForDevice); // round up to the possible user's value
IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, reqs);
} else if (name == METRIC_KEY(NETWORK_NAME)) {
IE_SET_METRIC_RETURN(NETWORK_NAME, _network->GetMetric(METRIC_KEY(NETWORK_NAME)).as<std::string>());
} else if (name == METRIC_KEY(SUPPORTED_METRICS)) {
IE_SET_METRIC_RETURN(SUPPORTED_METRICS,
{METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS),
METRIC_KEY(SUPPORTED_METRICS),
METRIC_KEY(NETWORK_NAME),
METRIC_KEY(SUPPORTED_CONFIG_KEYS)});
} else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS,
{CONFIG_KEY(AUTO_BATCH_TIMEOUT)}); // only timeout can be changed on the fly
} else {
IE_THROW() << "Unsupported Network metric: " << name;
}
}
// ------------------------------AutoBatchInferencePlugin----------------------------
namespace {
std::map<std::string, std::string> mergeConfigs(std::map<std::string, std::string> config,
const std::map<std::string, std::string>& local) {
for (auto&& kvp : local) {
config[kvp.first] = kvp.second;
}
return config;
}
} // namespace
std::map<std::string, std::string> AutoBatchInferencePlugin::GetSupportedConfig(
const std::map<std::string, std::string>& config,
const std::string& deviceName) const {
std::vector<std::string> supportedConfigKeys = GetCore()->GetMetric(deviceName, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
std::map<std::string, std::string> supportedConfig;
for (auto&& key : supportedConfigKeys) {
auto itKey = config.find(key);
if (config.end() != itKey) {
supportedConfig[key] = itKey->second;
}
}
return supportedConfig;
}
DeviceInformation AutoBatchInferencePlugin::ParseBatchDevice(const std::string& deviceWithBatch) {
auto&& d = deviceWithBatch;
auto openingBracket = d.find_first_of('(');
auto closingBracket = d.find_first_of(')', openingBracket);
auto deviceName = d.substr(0, openingBracket);
int batch = 1;
if (closingBracket != std::string::npos && openingBracket < closingBracket) {
batch = std::stol(d.substr(openingBracket + 1, closingBracket - 1));
if (batch <= 0) {
IE_THROW() << "Batch value for '" << deviceName << "' must be > 0, while " << batch << "is passed";
}
}
return {deviceName, {{}}, batch};
}
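// Illustration: ParseBatchDevice("GPU(4)") yields {deviceName = "GPU", batch = 4};
// ParseBatchDevice("CPU") yields {deviceName = "CPU", batch = 1} (the default when no batch is given).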
DeviceInformation AutoBatchInferencePlugin::ParseMetaDevice(const std::string& devicesBatchCfg,
const std::map<std::string, std::string>& config) const {
auto getDeviceConfig = [&](const DeviceName& deviceWithID) {
DeviceIDParser deviceParser(deviceWithID);
std::string deviceName = deviceParser.getDeviceName();
std::map<std::string, std::string> tconfig = mergeConfigs(_config, config);
// set device ID if any
std::string deviceIDLocal = deviceParser.getDeviceID();
if (!deviceIDLocal.empty()) {
tconfig[PluginConfigParams::KEY_DEVICE_ID] = deviceIDLocal;
}
return GetSupportedConfig(tconfig, deviceName);
};
auto metaDevice = ParseBatchDevice(devicesBatchCfg);
metaDevice.config = getDeviceConfig(metaDevice.deviceName);
auto cfg = config;
// check that no irrelevant config-keys left
for (auto k : config) {
const auto& name = k.first;
auto found_in_supported_cfg = std::find(supported_configKeys.begin(), supported_configKeys.end(), k.first);
auto found_in_device_cfg = metaDevice.config.find(k.first);
if (found_in_device_cfg == metaDevice.config.end() && found_in_supported_cfg == supported_configKeys.end()) {
IE_THROW() << "Unsupported config key: " << name;
}
}
return metaDevice;
}
RemoteContext::Ptr AutoBatchInferencePlugin::CreateContext(const InferenceEngine::ParamMap& config) {
auto cfg = config;
auto it = cfg.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
if (it == cfg.end())
IE_THROW() << "Value for KEY_AUTO_BATCH is not set";
auto val = it->second;
auto metaDevice = ParseMetaDevice(val, std::map<std::string, std::string>());
cfg.erase(it);
return GetCore()->CreateContext(metaDevice.deviceName, cfg);
}
Parameter AutoBatchInferencePlugin::GetConfig(const std::string& name,
const std::map<std::string, Parameter>& options) const {
if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), name)) {
auto it = _config.find(name);
if (it == _config.end()) {
IE_THROW() << "Value for " << name << " is not set";
} else {
return {it->second};
}
} else {
IE_THROW() << "Unsupported config key: " << name;
}
}
void AutoBatchInferencePlugin::CheckConfig(const std::map<std::string, std::string>& config) {
for (auto&& kvp : config) {
const auto name = kvp.first;
const auto val = kvp.second;
if (supported_configKeys.end() == std::find(supported_configKeys.begin(), supported_configKeys.end(), name))
IE_THROW() << "Unsupported config key: " << name;
if (name == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
ParseBatchDevice(val);
} else if (name == CONFIG_KEY(AUTO_BATCH_TIMEOUT)) {
try {
auto t = std::stoi(val);
if (t < 0)
IE_THROW(ParameterMismatch);
} catch (const std::exception& e) {
IE_THROW(ParameterMismatch)
<< " Expecting unsigned int value for " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " got " << val;
}
}
}
}
void AutoBatchInferencePlugin::SetConfig(const std::map<std::string, std::string>& config) {
CheckConfig(config);
for (auto&& kvp : config) {
_config[kvp.first] = kvp.second;
}
}
static const Version version = {{2, 1}, CI_BUILD_NUMBER, "AutoBatchPlugin"};
IE_DEFINE_PLUGIN_CREATE_FUNCTION(AutoBatchInferencePlugin, version)
AutoBatchInferencePlugin::AutoBatchInferencePlugin() {
_pluginName = "BATCH";
}
InferenceEngine::Parameter AutoBatchInferencePlugin::GetMetric(
const std::string& name,
const std::map<std::string, InferenceEngine::Parameter>& options) const {
if (name == METRIC_KEY(SUPPORTED_METRICS)) {
std::vector<std::string> metrics;
metrics.push_back(METRIC_KEY(SUPPORTED_METRICS));
metrics.push_back(METRIC_KEY(FULL_DEVICE_NAME));
metrics.push_back(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
IE_SET_METRIC_RETURN(SUPPORTED_METRICS, metrics);
} else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
IE_SET_METRIC_RETURN(FULL_DEVICE_NAME, _pluginName);
} else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, supported_configKeys);
} else {
IE_THROW(NotFound) << "Unsupported metric key " << name;
}
}
IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
const InferenceEngine::CNNNetwork& network,
const std::map<std::string, std::string>& config) {
return LoadNetworkImpl(network, nullptr, config);
}
InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadNetworkImpl(
const InferenceEngine::CNNNetwork& network,
const std::shared_ptr<InferenceEngine::RemoteContext> ctx,
const std::map<std::string, std::string>& config) {
if (GetCore() == nullptr) {
IE_THROW() << "Please, work with MULTI device via InferencEngine::Core object";
}
auto fullConfig = mergeConfigs(_config, config);
auto device_batch = fullConfig.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
if (device_batch == fullConfig.end()) {
IE_THROW() << "KEY_AUTO_BATCH key is not set for BATCH device";
}
auto metaDevice = ParseMetaDevice(device_batch->second, fullConfig);
const auto& deviceName = metaDevice.deviceName;
const auto& deviceConfig = metaDevice.config;
const auto perfConfig = fullConfig.find(PluginConfigParams::KEY_PERF_COUNT);
const bool enablePerfCounters = (fullConfig.end() != perfConfig) && (perfConfig->second == PluginConfigParams::YES);
auto report_footprint = [](std::shared_ptr<ICore> pCore, std::string device) -> size_t {
size_t footprint = 0;
// TODO: use the per-network metric (22.2) rather than plugin-level
auto stats = pCore->GetMetric(device, GPU_METRIC_KEY(MEMORY_STATISTICS)).as<std::map<std::string, uint64_t>>();
for (auto s : stats)
if (s.first.find("_current") != std::string::npos)
footprint += s.second;
return footprint;
};
size_t batch1_footprint = 0;
if (deviceName.find("GPU") != std::string::npos)
batch1_footprint = report_footprint(GetCore(), deviceName);
auto executableNetworkWithoutBatch = ctx ? GetCore()->LoadNetwork(network, ctx, deviceConfig)
: GetCore()->LoadNetwork(network, deviceName, deviceConfig);
if (deviceName.find("GPU") != std::string::npos) {
batch1_footprint = report_footprint(GetCore(), deviceName) - batch1_footprint;
if (batch1_footprint) {
const uint64_t total_mem = GetCore()->GetMetric(deviceName, GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
const int estimated_batch = (total_mem - batch1_footprint) / batch1_footprint;
int closest = pow(2, floor(log(estimated_batch) / log(2)));
closest = std::max(1, closest);
metaDevice.batchForDevice = std::min(metaDevice.batchForDevice, closest);
}
}
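// Illustrative arithmetic for the clamp above (numbers are made up): with total_mem of 8192 MB and a
// measured batch1_footprint of 600 MB, estimated_batch = (8192 - 600) / 600 = 12; the closest power
// of two not above it is 8, so a requested batch of 32 would be reduced to 8 for this device.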
// auto-batch settings
std::unordered_map<std::string, InferenceEngine::Parameter> networkConfig;
for (auto c : fullConfig) {
if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), c.first))
networkConfig.insert(c);
}
InferenceEngine::SoExecutableNetworkInternal executableNetworkWithBatch;
if (metaDevice.batchForDevice > 1) {
try {
CNNNetwork clonedNetwork(InferenceEngine::details::cloneNetwork(network));
const InputsDataMap inputInfo = clonedNetwork.getInputsInfo();
ICNNNetwork::InputShapes shapes = clonedNetwork.getInputShapes();
for (const InputsDataMap::value_type& item : inputInfo) {
auto layout = item.second->getTensorDesc().getLayout();
// the code below is a placeholder for the WIP (22.1) functionality
// that will check that reshaping by the batch dimension is robust (CVS-51744)
if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
layout == InferenceEngine::Layout::NDHWC) {
assert(1 == shapes[item.first][0]); // do not reshape/re-batch originally batched networks
shapes[item.first][0] = metaDevice.batchForDevice;
}
}
clonedNetwork.reshape(shapes);
executableNetworkWithBatch =
ctx ? GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, ctx, deviceConfig)
: GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, deviceName, deviceConfig);
} catch (...) {
executableNetworkWithBatch = {nullptr, nullptr};
}
}
if (!executableNetworkWithBatch) {
executableNetworkWithBatch = executableNetworkWithoutBatch;
metaDevice.batchForDevice = 1;
}
return std::make_shared<AutoBatchExecutableNetwork>(executableNetworkWithBatch,
executableNetworkWithoutBatch,
metaDevice,
networkConfig,
enablePerfCounters);
}
InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
const InferenceEngine::CNNNetwork& network,
const std::shared_ptr<InferenceEngine::RemoteContext>& context,
const std::map<std::string, std::string>& config) {
return LoadNetworkImpl(network, context, config);
}
InferenceEngine::QueryNetworkResult AutoBatchInferencePlugin::QueryNetwork(
const InferenceEngine::CNNNetwork& network,
const std::map<std::string, std::string>& config) const {
auto cfg = config;
for (auto c : cfg) {
if (c.first == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
auto val = c.second;
cfg.erase(c.first);
auto metaDevice = ParseMetaDevice(val, cfg);
return GetCore()->QueryNetwork(network, metaDevice.deviceName, cfg);
}
}
IE_THROW() << "Value for KEY_AUTO_BATCH is not set";
}
} // namespace AutoBatchPlugin
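Below is a minimal, hypothetical usage sketch of the plugin defined above (the model path, batch value and timeout are placeholders, and the sketch assumes CONFIG_KEY(AUTO_BATCH_TIMEOUT) expands to the literal "AUTO_BATCH_TIMEOUT"): the "BATCH:GPU(4)" device string requests auto-batching over GPU with an explicit batch of 4, while the timeout bounds how long a worker waits to collect requests into a batch.
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core ie;
    // hypothetical model path
    auto network = ie.ReadNetwork("model.xml");
    // BATCH:GPU(4): auto-batching over GPU with an explicit batch of 4
    auto execNetwork = ie.LoadNetwork(network, "BATCH:GPU(4)", {{"AUTO_BATCH_TIMEOUT", "100"}});
    auto request = execNetwork.CreateInferRequest();
    request.StartAsync();
    request.Wait(InferenceEngine::InferRequest::RESULT_READY);
    return 0;
}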

View File

@ -0,0 +1,159 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
///////////////////////////////////////////////////////////////////////////////////////////////////
#pragma once
#include <atomic>
#include <map>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
#include "cpp_interfaces/impl/ie_infer_async_request_thread_safe_default.hpp"
#include "cpp_interfaces/interface/ie_iplugin_internal.hpp"
#include "ie_metric_helpers.hpp"
#include "threading/ie_thread_safe_containers.hpp"
namespace AutoBatchPlugin {
using DeviceName = std::string;
struct DeviceInformation {
DeviceName deviceName;
std::map<std::string, std::string> config;
int batchForDevice;
};
class AutoBatchAsyncInferRequest;
class AutoBatchExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
using Ptr = std::shared_ptr<AutoBatchExecutableNetwork>;
struct WorkerInferRequest {
using Ptr = std::shared_ptr<WorkerInferRequest>;
InferenceEngine::SoIInferRequestInternal _inferRequestBatched;
int _batchSize;
InferenceEngine::ThreadSafeQueueWithSize<std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task>> _tasks;
std::vector<InferenceEngine::Task> _completionTasks;
std::thread _thread;
std::condition_variable _cond;
std::mutex _mutex;
std::exception_ptr _exceptionPtr;
};
explicit AutoBatchExecutableNetwork(
const InferenceEngine::SoExecutableNetworkInternal& networkForDevice,
const InferenceEngine::SoExecutableNetworkInternal& networkForDeviceWithoutBatch,
const DeviceInformation& networkDevices,
const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
const bool needPerfCounters = false);
void SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) override;
InferenceEngine::Parameter GetConfig(const std::string& name) const override;
InferenceEngine::Parameter GetMetric(const std::string& name) const override;
InferenceEngine::IInferRequestInternal::Ptr CreateInferRequest() override;
InferenceEngine::IInferRequestInternal::Ptr CreateInferRequestImpl(
InferenceEngine::InputsDataMap networkInputs,
InferenceEngine::OutputsDataMap networkOutputs) override;
std::shared_ptr<InferenceEngine::RemoteContext> GetContext() const override;
std::shared_ptr<ngraph::Function> GetExecGraphInfo() override;
virtual ~AutoBatchExecutableNetwork();
protected:
static unsigned int ParseTimeoutValue(const std::string&);
std::atomic_bool _terminate = {false};
DeviceInformation _device;
InferenceEngine::SoExecutableNetworkInternal _network;
InferenceEngine::SoExecutableNetworkInternal _networkWithoutBatch;
std::vector<WorkerInferRequest::Ptr> _workerRequests;
std::unordered_map<std::string, InferenceEngine::Parameter> _config;
bool _needPerfCounters = false;
std::atomic_size_t _numRequestsCreated = {0};
std::atomic_int _timeOut = {1000}; // in ms
};
class AutoBatchInferRequest : public InferenceEngine::IInferRequestInternal {
public:
using Ptr = std::shared_ptr<AutoBatchInferRequest>;
explicit AutoBatchInferRequest(const InferenceEngine::InputsDataMap& networkInputs,
const InferenceEngine::OutputsDataMap& networkOutputs,
AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
int batch_id,
int num_batch,
bool _needPerfCounters = false);
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> GetPerformanceCounts() const override;
// Batch-Device impl specific: sets this request's blobs (data) to another device request (e.g. the batched one)
void SetBlobsToAnotherRequest(InferenceEngine::SoIInferRequestInternal& req);
void CopyInputsIfNeeded();
void CopyOutputsIfNeeded();
AutoBatchExecutableNetwork::WorkerInferRequest& _myBatchedRequestWrapper;
std::exception_ptr _exceptionPtr;
protected:
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> _perfMap;
bool _needPerfCounters = false;
void CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src, InferenceEngine::Blob::Ptr dst, bool bInput);
size_t _batchId;
size_t _batchSize;
};
class AutoBatchAsyncInferRequest : public InferenceEngine::AsyncInferRequestThreadSafeDefault {
public:
using Ptr = std::shared_ptr<AutoBatchAsyncInferRequest>;
explicit AutoBatchAsyncInferRequest(const AutoBatchInferRequest::Ptr& inferRequest,
const bool needPerfCounters,
InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor);
void Infer_ThreadUnsafe() override;
virtual ~AutoBatchAsyncInferRequest();
InferenceEngine::SoIInferRequestInternal _inferRequestWithoutBatch;
AutoBatchInferRequest::Ptr _inferRequest;
};
class AutoBatchInferencePlugin : public InferenceEngine::IInferencePlugin {
public:
AutoBatchInferencePlugin();
virtual ~AutoBatchInferencePlugin() = default;
InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
const InferenceEngine::CNNNetwork& network,
const std::map<std::string, std::string>& config) override;
InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
const InferenceEngine::CNNNetwork& network,
const std::shared_ptr<InferenceEngine::RemoteContext>& context,
const std::map<std::string, std::string>& config) override;
void SetConfig(const std::map<std::string, std::string>& config) override;
void CheckConfig(const std::map<std::string, std::string>& config);
InferenceEngine::Parameter GetConfig(
const std::string& name,
const std::map<std::string, InferenceEngine::Parameter>& options) const override;
InferenceEngine::QueryNetworkResult QueryNetwork(const InferenceEngine::CNNNetwork& network,
const std::map<std::string, std::string>& config) const override;
InferenceEngine::Parameter GetMetric(
const std::string& name,
const std::map<std::string, InferenceEngine::Parameter>& options) const override;
InferenceEngine::RemoteContext::Ptr CreateContext(const InferenceEngine::ParamMap&) override;
protected:
DeviceInformation ParseMetaDevice(const std::string& devicesBatchCfg,
const std::map<std::string, std::string>& config) const;
std::map<std::string, std::string> GetSupportedConfig(const std::map<std::string, std::string>& config,
const DeviceName& deviceName) const;
static DeviceInformation ParseBatchDevice(const std::string& deviceWithBatch);
InferenceEngine::IExecutableNetworkInternal::Ptr LoadNetworkImpl(
const InferenceEngine::CNNNetwork& network,
const std::shared_ptr<InferenceEngine::RemoteContext> context,
const std::map<std::string, std::string>& config);
};
} // namespace AutoBatchPlugin
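A much-simplified, self-contained sketch of the queue/timeout pattern that the WorkerInferRequest above declares (illustrative only; the real implementation uses ThreadSafeQueueWithSize, SoIInferRequestInternal and a fallback to the batch-1 network): requests enqueue their input-copy tasks, and the worker fires one batched inference either when enough requests have arrived or when the timeout expires.
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// Toy worker: not the plugin's actual code, just the batching idea.
struct ToyBatchWorker {
    size_t batch_size = 4;
    std::chrono::milliseconds timeout{100};
    std::queue<std::function<void()>> tasks;  // per-request "copy my inputs into slot i" jobs
    std::mutex mutex;
    std::condition_variable cond;

    void RunOnce(const std::function<void(size_t)>& infer_batched) {
        std::unique_lock<std::mutex> lock(mutex);
        // wake up when the batch is full or the timeout expires
        cond.wait_for(lock, timeout, [&] { return tasks.size() >= batch_size; });
        const size_t n = std::min(tasks.size(), batch_size);
        for (size_t i = 0; i < n; ++i) {
            tasks.front()();  // copy this request's inputs into the batched request
            tasks.pop();
        }
        lock.unlock();
        if (n)
            infer_batched(n);  // one batched inference covering n user requests
    }
};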

View File

@ -609,11 +609,9 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network, const std
// the more "capable" the CPU in general, the more streams we may want to keep to keep it utilized
const float memThresholdAssumeLimitedForISA = ov::MemBandwidthPressure::LIMITED/isaSpecificThreshold;
const float L2_cache_size = mkldnn::utils::get_cache_size(2 /*level*/, true /*per core */);
const float L3_cache_size = mkldnn::utils::get_cache_size(3, false);
ov::MemBandwidthPressure networkToleranceForLowCache = ov::MemBandwidthPressureTolerance(
clonedNetwork.getFunction(),
L2_cache_size, L3_cache_size,
memThresholdAssumeLimitedForISA);
L2_cache_size, memThresholdAssumeLimitedForISA);
// num of phys CPU cores (most aggressive value for #streams)
const auto num_cores = getNumberOfCPUCores();
// less aggressive

View File

@ -28,6 +28,7 @@
#include "intel_gpu/runtime/device_query.hpp"
#include "intel_gpu/runtime/debug_configuration.hpp"
#include <performance_heuristics.hpp>
#ifdef __linux__
# include <dlfcn.h>
#endif
@ -681,6 +682,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
metrics.push_back(METRIC_KEY(RANGE_FOR_STREAMS));
metrics.push_back(METRIC_KEY(DEVICE_TYPE));
metrics.push_back(METRIC_KEY(DEVICE_GOPS));
metrics.push_back(METRIC_KEY(OPTIMAL_BATCH_SIZE));
metrics.push_back(GPU_METRIC_KEY(MAX_BATCH_SIZE));
metrics.push_back(GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
metrics.push_back(GPU_METRIC_KEY(UARCH_VERSION));
@ -716,6 +718,76 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
<< static_cast<int>(device_info.gfx_ver.revision);
}
IE_SET_METRIC_RETURN(GPU_UARCH_VERSION, s.str());
} else if (name == METRIC_KEY(OPTIMAL_BATCH_SIZE)) {
auto next_pow_of_2 = [] (float x) {
return pow(2, ceil(log(x)/log(2)));
};
auto closest_pow_of_2 = [] (float x) {
return pow(2, floor(log(x)/log(2)));
};
auto model_param = options.find("MODEL_PTR");
if (model_param == options.end()) {
GPU_DEBUG_IF(debug_config->verbose >= 1) {
GPU_DEBUG_COUT << "[GPU_OPTIMAL_BATCH_SIZE] MODELS_PTR is not set: return 1" << std::endl;
}
IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, static_cast<unsigned int>(1));
}
std::shared_ptr<ngraph::Function> model;
try {
model = model_param->second.as<std::shared_ptr<ngraph::Function>>();
} catch (...) {
IE_THROW() << "[GPU_OPTIMAL_BATCH_SIZE] MODEL_PTR should be std::shared_ptr<ngraph::Function> type";
}
GPU_DEBUG_IF(debug_config->verbose >= 1) {
GPU_DEBUG_COUT << "DEVICE_INFO:"
<< "gfx_version.major, " << device_info.gfx_ver.major
<< "gfx_version.minor " << std::to_string(device_info.gfx_ver.minor) << std::endl;
}
static std::map<cldnn::gfx_version, size_t> gen_kbytes_per_bank = {
{{12, 0, 0}, 480}, // TGL
{{12, 1, 0}, 2048}, // DG1
{{12, 5, 0}, 320},
{{12, 7, 0}, 512},
};
size_t L3_cache_size = device_info.gfx_ver.major && (device_info.gfx_ver.major <= 9)
? 768 * 1024 // Gen9
: 2 * 768 * 1024; // reasonable default when no architecture has been detected (e.g. due to an old driver version)
cldnn::gfx_version gen = {device_info.gfx_ver.major, device_info.gfx_ver.minor, 0 /*ignore the revision*/};
auto val = gen_kbytes_per_bank.find(gen);
if (gen_kbytes_per_bank.end() != val) {
auto kbytes_per_bank = val->second;
auto num_banks_per_slice = device_info.num_sub_slices_per_slice > 4
? next_pow_of_2(device_info.num_sub_slices_per_slice)
: 2 * device_info.num_sub_slices_per_slice;
L3_cache_size = kbytes_per_bank * 1024 * num_banks_per_slice * device_info.num_slices;
GPU_DEBUG_IF(debug_config->verbose >= 1) {
GPU_DEBUG_COUT << "DEVICE_INFO:"
<< "num_slices " << device_info.num_slices
<< ", num_sub_slices_per_slice " << device_info.num_sub_slices_per_slice
<< ", num_banks_per_slice " << num_banks_per_slice
<< ", gen_kbytes_per_bank : " << kbytes_per_bank
<< ", L3_cache_size is (MB): " << float(L3_cache_size) / 1024 / 1024 << std::endl;
}
}
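// Illustrative walk-through of the table above (device numbers are hypothetical): for gen {12, 1, 0}
// the table gives 2048 KB per bank; a device reporting 6 sub-slices per slice and 1 slice gets
// num_banks_per_slice = next_pow_of_2(6) = 8, i.e. L3_cache_size = 2048 * 1024 * 8 * 1 = 16 MB.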
Config config = _impl->m_configs.GetConfig(device_id);
auto networkCloned = CloneAndTransformNetwork(CNNNetwork(model), config);
ov::MemBandwidthPressure memPressure = ov::MemBandwidthPressureTolerance(networkCloned.getFunction(), L3_cache_size);
unsigned int batch = 1;
if (memPressure.max_mem_tolerance != ov::MemBandwidthPressure::UNKNOWN)
batch = std::max(1.0, 16 * closest_pow_of_2(memPressure.max_mem_tolerance));
std::map<std::string, InferenceEngine::Parameter> options_for_max_batch;
options_for_max_batch["MODEL_PTR"] = model;
options_for_max_batch["GPU_THROUGHPUT_STREAMS"] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO);
auto max_batch_size = GetMetric(GPU_METRIC_KEY(MAX_BATCH_SIZE), options_for_max_batch).as<unsigned int>();
unsigned int closest = closest_pow_of_2(max_batch_size);
batch = std::min(closest, batch);
batch = std::min(256u, batch); // batch 256 is the max
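// Illustrative numbers (not measured): with max_mem_tolerance = 0.27, 16 * closest_pow_of_2(0.27) = 16 * 0.25 = 4;
// if MAX_BATCH_SIZE then reports 103, closest = closest_pow_of_2(103) = 64, so the returned
// OPTIMAL_BATCH_SIZE is min(4, 64, 256) = 4.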
GPU_DEBUG_IF(debug_config->verbose >= 1) {
GPU_DEBUG_COUT << memPressure.max_mem_tolerance << std::endl;
GPU_DEBUG_COUT << "MAX_BATCH: " << max_batch_size << std::endl;
GPU_DEBUG_COUT << "ACTUAL OPTIMAL BATCH: " << batch << std::endl;
}
IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, batch);
} else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
auto deviceName = StringRightTrim(device_info.dev_name, "NEO", false);
deviceName += std::string(" (") + (device_info.dev_type == cldnn::device_type::discrete_gpu ? "dGPU" : "iGPU") + ")";

View File

@ -48,6 +48,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
list(APPEND DEPENDENCIES ov_auto_plugin)
endif()
if(ENABLE_AUTO_BATCH)
list(APPEND DEPENDENCIES ov_auto_batch_plugin)
endif()
if (NOT ENABLE_OV_ONNX_FRONTEND)
list(APPEND EXCLUDED_SOURCE_PATHS "${CMAKE_CURRENT_SOURCE_DIR}/onnx_reader")
endif()

View File

@ -24,6 +24,7 @@ inline const std::string getPluginLibNameByDevice(const std::string& deviceName)
{ "GNA", "ov_intel_gna_plugin" },
{ "GPU", "ov_intel_gpu_plugin" },
{ "HETERO", "ov_hetero_plugin" },
{ "BATCH", "ov_auto_batch_plugin" },
{ "MULTI", "ov_multi_plugin" },
{ "MYRIAD", "myriadPlugin" },
{ "TEMPLATE", "ov_template_plugin" },
@ -42,6 +43,11 @@ inline const std::pair<std::string, std::string> generateDefaultHeteroConfig() {
return { "TARGET_FALLBACK" , ConformanceTests::targetDevice };
}
inline const std::pair<std::string, std::string> generateDefaultBatchConfig() {
// auto-batching with batch 1 (no real batching, but the full machinery is exercised)
return { CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , std::string(ConformanceTests::targetDevice)};
}
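// e.g. for targetDevice == "GPU" this yields { CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU" }, i.e. BATCH over GPU
// with the default batch of 1, so the conformance suites exercise the auto-batching code path without
// changing the effective batch.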
inline const std::vector<std::map<std::string, std::string>> generateConfigs(const std::string& targetDevice,
const std::vector<std::map<std::string, std::string>>& config = {}) {
std::pair<std::string, std::string> defaultConfig;
@ -49,6 +55,8 @@ inline const std::vector<std::map<std::string, std::string>> generateConfigs(con
defaultConfig = generateDefaultMultiConfig();
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_HETERO)) {
defaultConfig = generateDefaultHeteroConfig();
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_BATCH)) {
defaultConfig = generateDefaultBatchConfig();
} else {
throw std::runtime_error("Incorrect target device: " + targetDevice);
}
@ -70,7 +78,8 @@ inline const std::string generateComplexDeviceName(const std::string& deviceName
inline const std::vector<std::string> returnAllPossibleDeviceCombination() {
std::vector<std::string> res{ConformanceTests::targetDevice};
std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO, CommonTestUtils::DEVICE_MULTI};
std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO,
CommonTestUtils::DEVICE_BATCH, CommonTestUtils::DEVICE_MULTI};
for (const auto& device : devices) {
res.emplace_back(generateComplexDeviceName(device));
}

View File

@ -33,4 +33,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestCallbackTests,
::testing::Values(CommonTestUtils::DEVICE_HETERO),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
InferRequestCallbackTests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestCallbackTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
InferRequestCallbackTests::getTestCaseName);
} // namespace

View File

@ -36,4 +36,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestIOBBlobTest,
::testing::Values(CommonTestUtils::DEVICE_HETERO),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
InferRequestIOBBlobTest::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestIOBBlobTest,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
InferRequestIOBBlobTest::getTestCaseName);
} // namespace

View File

@ -38,4 +38,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestMultithreadingT
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
InferRequestMultithreadingTests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestMultithreadingTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
InferRequestMultithreadingTests::getTestCaseName);
} // namespace

View File

@ -46,4 +46,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Hetero, InferRequestSetBlobByType,
::testing::Values(CommonTestUtils::DEVICE_HETERO),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
InferRequestSetBlobByType::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Batch, InferRequestSetBlobByType,
::testing::Combine(::testing::ValuesIn(setBlobTypes),
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
InferRequestSetBlobByType::getTestCaseName);
} // namespace

View File

@ -37,4 +37,9 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestWaitTests,
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
InferRequestWaitTests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestWaitTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
InferRequestWaitTests::getTestCaseName);
} // namespace

View File

@ -0,0 +1,31 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <auto_batching/auto_batching_tests.hpp>
const std::vector<bool> get_vs_set{ true, false };
const std::vector<size_t> num_streams{ 1, 2 };
const std::vector<size_t> num_requests{ 1, 3, 8, 9, 16, 64 };
const std::vector<size_t> num_batch{ 1, 4, 8, 16, 32, 64, 128, 256 };
using namespace AutoBatchingTests;
namespace {
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_CPU),
::testing::ValuesIn(get_vs_set),
::testing::ValuesIn(num_streams),
::testing::ValuesIn(num_requests),
::testing::ValuesIn(num_batch)),
AutoBatching_Test::getTestCaseName);
// TODO: for 22.2 (CVS-68949)
//INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test_DetectionOutput,
// ::testing::Combine(
// ::testing::Values(CommonTestUtils::DEVICE_CPU),
// ::testing::ValuesIn(get_vs_set),
// ::testing::ValuesIn(num_streams),
// ::testing::ValuesIn(num_requests),
// ::testing::ValuesIn(num_batch)),
// AutoBatching_Test_DetectionOutput::getTestCaseName);
} // namespace

View File

@ -21,16 +21,27 @@ using namespace ::testing;
using namespace InferenceEngine;
using namespace InferenceEngine::gpu;
class RemoteBlob_Test : public CommonTestUtils::TestsCommon {
class RemoteBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<bool> {
protected:
std::shared_ptr<ngraph::Function> fn_ptr;
std::string deviceName;
public:
void SetUp() override {
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
deviceName = CommonTestUtils::DEVICE_GPU;
auto with_auto_batching = this->GetParam();
if (with_auto_batching) { // BATCH:GPU
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
}
}
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
auto with_auto_batch = obj.param;
return std::string("RemoteBlob_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
}
};
TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
TEST_P(RemoteBlob_Test, smoke_canInputUserBlob) {
#if defined(ANDROID)
GTEST_SKIP();
#endif
@ -41,7 +52,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
// TODO: Issue: investigate issue with IECore
auto ie = InferenceEngine::Core();
auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
auto exec_net = ie.LoadNetwork(net, deviceName);
// regular inference
auto inf_req_regular = exec_net.CreateInferRequest();
@ -70,6 +81,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
Blob::Ptr shared_blob = make_shared_blob(net.getInputsInfo().begin()->second->getTensorDesc(), cldnn_context,
shared_buffer);
shared_blob->allocate();
inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, shared_blob);
inf_req_shared.Infer();
@ -85,7 +97,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
}
TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
TEST_P(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
#if defined(ANDROID)
GTEST_SKIP();
#endif
@ -96,7 +108,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
// TODO: Issue: investigate issue with IECore
auto ie = InferenceEngine::Core();
auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
auto exec_net = ie.LoadNetwork(net, deviceName);
// regular inference
auto inf_req_regular = exec_net.CreateInferRequest();
@ -139,7 +151,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
}
TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
auto fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
CNNNetwork net(fn_ptr);
@ -149,7 +161,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
auto ie = PluginCache::get().ie();
auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
auto exec_net_regular = ie->LoadNetwork(net, deviceName);
// regular inference
auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -161,7 +173,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
// inference using remote blob
auto ocl_instance = std::make_shared<OpenCL>();
auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_context.get());
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_context.get());
auto exec_net_shared = ie->LoadNetwork(net, remote_context);
auto inf_req_shared = exec_net_shared.CreateInferRequest();
inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, fakeImageData);
@ -178,7 +190,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
}
}
TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
#if defined _WIN32
GTEST_SKIP();
#endif
@ -191,7 +203,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
auto ie = PluginCache::get().ie();
auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
auto exec_net_regular = ie->LoadNetwork(net, deviceName);
// regular inference
auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -214,7 +226,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
// without calling thread blocks
auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
auto exec_net_shared = ie->LoadNetwork(net, remote_context);
auto inf_req_shared = exec_net_shared.CreateInferRequest();
@ -270,7 +282,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
}
}
TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
#if defined _WIN32
GTEST_SKIP();
#endif
@ -283,7 +295,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
auto ie = PluginCache::get().ie();
auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
auto exec_net_regular = ie->LoadNetwork(net, deviceName);
// regular inference
auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -307,7 +319,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
// without calling thread blocks
auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
auto exec_net_shared = ie->LoadNetwork(net, remote_context);
auto inf_req_shared = exec_net_shared.CreateInferRequest();
@ -358,6 +370,10 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
}
}
std::vector<bool> with_auto_batching {true, false};
INSTANTIATE_TEST_SUITE_P(smoke_RemoteBlob, RemoteBlob_Test, ::testing::ValuesIn(with_auto_batching),
RemoteBlob_Test::getTestCaseName);
class BatchedBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<size_t> {
void SetUp() override {
num_batch = this->GetParam();

View File

@ -30,6 +30,7 @@ protected:
}
};
std::vector<bool> ov_with_auto_batching {true, false};
enum class RemoteTensorSharingType {
USER_CL_TENSOR = 0,
PLUGIN_CL_TENSOR = 1,
@ -54,17 +55,34 @@ std::ostream& operator<<(std::ostream& stream, RemoteTensorSharingType sharing_t
return stream;
}
class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test, public testing::WithParamInterface<RemoteTensorSharingType> {
using RemoteTensorSharingTestOptionsParams = std::tuple<RemoteTensorSharingType, bool /*auto-batching*/>;
class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test,
public testing::WithParamInterface<RemoteTensorSharingTestOptionsParams> {
protected:
std::shared_ptr<ngraph::Function> fn_ptr;
std::string deviceName;
public:
void SetUp() override {
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
deviceName = CommonTestUtils::DEVICE_GPU;
RemoteTensorSharingType sharing_type;
bool with_auto_batching;
std::tie(sharing_type, with_auto_batching) = this->GetParam();
if (with_auto_batching) // BATCH:GPU
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
}
static std::string getTestCaseName(testing::TestParamInfo<RemoteTensorSharingType> obj) {
RemoteTensorSharingType sharing_type = obj.param;
static std::string getTestCaseName(const testing::TestParamInfo<RemoteTensorSharingTestOptionsParams>& obj) {
RemoteTensorSharingType sharing_type;
bool with_auto_batching;
std::tie(sharing_type, with_auto_batching) = obj.param;
std::ostringstream result;
result << "OVRemoteTensorInputBlob_Test_";
result << sharing_type;
if (with_auto_batching)
result << "_WITH_AUTO_BATCHING";
return result.str();
}
};
@ -81,9 +99,17 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
p.input().preprocess().convert_element_type(ov::element::f32);
auto function = p.build();
auto exec_net = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
RemoteTensorSharingType sharing_type;
bool with_auto_batching;
std::tie(sharing_type, with_auto_batching) = GetParam();
RemoteTensorSharingType sharing_type = GetParam();
// auto-batching relies on availability of the lock() for the tensor (and the *USM_DEVICE is not lockable)
if (with_auto_batching
&& (RemoteTensorSharingType::USER_USM_DEVICE_TENSOR == sharing_type
|| RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR == sharing_type))
GTEST_SKIP();
auto exec_net = ie.compile_model(function, deviceName);
// regular inference
auto inf_req_regular = exec_net.create_infer_request();
@ -244,6 +270,7 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
INSTANTIATE_TEST_SUITE_P(
smoke_GPU,
OVRemoteTensorInputBlob_Test,
::testing::Combine(
::testing::ValuesIn(std::vector<RemoteTensorSharingType>{RemoteTensorSharingType::USER_CL_TENSOR,
RemoteTensorSharingType::PLUGIN_CL_TENSOR,
RemoteTensorSharingType::USER_USM_HOST_TENSOR,
@ -251,9 +278,29 @@ INSTANTIATE_TEST_SUITE_P(
RemoteTensorSharingType::PLUGIN_USM_HOST_TENSOR,
RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR,
RemoteTensorSharingType::PLUGIN_HOST_TENSOR}),
::testing::ValuesIn(ov_with_auto_batching)),
OVRemoteTensorInputBlob_Test::getTestCaseName);
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
class OVRemoteTensor_TestsWithContext : public OVRemoteTensor_Test, public testing::WithParamInterface<bool> {
protected:
std::shared_ptr<ngraph::Function> fn_ptr;
std::string deviceName;
public:
void SetUp() override {
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
deviceName = CommonTestUtils::DEVICE_GPU;
auto with_auto_batching = this->GetParam();
if (with_auto_batching) { // BATCH:GPU
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
}
}
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
auto with_auto_batch = obj.param;
return std::string("RemoteTensor_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
}
};
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContext) {
auto ie = ov::runtime::Core();
using namespace ov::preprocess;
@ -262,7 +309,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
p.input().preprocess().convert_element_type(ov::element::f32);
auto function = p.build();
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
auto exec_net_regular = ie.compile_model(function, deviceName);
auto input = function->get_parameters().at(0);
auto output = function->get_results().at(0);
@ -296,7 +343,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
}
}
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContextWithMultipleDevices) {
auto ie = ov::runtime::Core();
using namespace ov::preprocess;
@ -305,7 +352,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
p.input().preprocess().convert_element_type(ov::element::f32);
auto function = p.build();
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
auto exec_net_regular = ie.compile_model(function, deviceName);
auto input = function->get_parameters().at(0);
auto output = function->get_results().at(0);
@ -344,7 +391,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
}
}
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_out_of_order) {
auto ie = ov::runtime::Core();
using namespace ov::preprocess;
@ -353,7 +400,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
p.input().preprocess().convert_element_type(ov::element::f32);
auto function = p.build();
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
auto exec_net_regular = ie.compile_model(function, deviceName);
auto input = function->get_parameters().at(0);
auto output = function->get_results().at(0);
@ -423,7 +470,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
}
}
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_in_order) {
auto ie = ov::runtime::Core();
using namespace ov::preprocess;
@ -432,7 +479,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
p.input().preprocess().convert_element_type(ov::element::f32);
auto function = p.build();
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
auto exec_net_regular = ie.compile_model(function, deviceName);
auto input = function->get_parameters().at(0);
auto output = function->get_results().at(0);
@ -498,6 +545,9 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
}
}
INSTANTIATE_TEST_SUITE_P(smoke_RemoteTensor, OVRemoteTensor_TestsWithContext, ::testing::ValuesIn(ov_with_auto_batching),
OVRemoteTensor_TestsWithContext::getTestCaseName);
TEST_F(OVRemoteTensor_Test, NV12toBGR_image) {
#if defined(ANDROID)
GTEST_SKIP();

View File

@ -0,0 +1,31 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <auto_batching/auto_batching_tests.hpp>
const std::vector<size_t> num_streams{ 2 };
const std::vector<bool> get_vs_set{ true, false };
const std::vector<size_t> num_requests{ 1, 8, 16, 64 };
const std::vector<size_t> num_batch{ 1, 8, 32, 256 };
using namespace AutoBatchingTests;
namespace AutoBatchingTests {
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_GPU),
::testing::ValuesIn(get_vs_set),
::testing::ValuesIn(num_streams),
::testing::ValuesIn(num_requests),
::testing::ValuesIn(num_batch)),
AutoBatching_Test::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test_DetectionOutput,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_GPU),
::testing::ValuesIn(get_vs_set),
::testing::ValuesIn(num_streams),
::testing::ValuesIn(num_requests),
::testing::ValuesIn(num_batch)),
AutoBatching_Test_DetectionOutput::getTestCaseName);
} // namespace AutoBatchingTests

View File

@ -52,6 +52,10 @@ const std::vector<std::map<std::string, std::string>> autoConfig = {
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
};
const std::vector<std::map<std::string, std::string>> autoBatchConfig = {
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
};
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, ExecNetSetPrecision,
::testing::Combine(
::testing::ValuesIn(netPrecisions),
@ -72,4 +76,11 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetSetPrecision,
::testing::Values(CommonTestUtils::DEVICE_AUTO),
::testing::ValuesIn(autoConfig)),
ExecNetSetPrecision::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, ExecNetSetPrecision,
::testing::Combine(
::testing::ValuesIn(netPrecisions),
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(autoBatchConfig)),
ExecNetSetPrecision::getTestCaseName);
} // namespace

View File

@ -22,27 +22,27 @@ namespace {
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_OPTIMAL_NUMBER_OF_INFER_REQUESTS,
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
);
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_CONFIG_KEYS,
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
);
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_METRICS,
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
);
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_NETWORK_NAME,
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
);
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_ThrowsUnsupported,
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
);
//

View File

@ -19,6 +19,10 @@ const std::vector<std::map<std::string, std::string>> autoConfigs = {
{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
};
const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
};
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestCallbackTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_GPU),
@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestCallbackTests,
::testing::Values(CommonTestUtils::DEVICE_AUTO),
::testing::ValuesIn(autoConfigs)),
InferRequestCallbackTests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestCallbackTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(autoBatchConfigs)),
InferRequestCallbackTests::getTestCaseName);
} // namespace

View File

@ -18,6 +18,10 @@ const std::vector<std::map<std::string, std::string>> autoconfigs = {
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES, std::string(CommonTestUtils::DEVICE_CPU) + "," + CommonTestUtils::DEVICE_GPU}}
};
const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
};
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestMultithreadingTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_GPU),
@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestMultithreadingTes
::testing::ValuesIn(autoconfigs)),
InferRequestMultithreadingTests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestMultithreadingTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(auto_batch_configs)),
InferRequestMultithreadingTests::getTestCaseName);
} // namespace

View File

@ -19,6 +19,11 @@ namespace {
CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
};
const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
};
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestWaitTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_GPU),
@ -37,4 +42,10 @@ namespace {
::testing::ValuesIn(autoConfigs)),
InferRequestWaitTests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestWaitTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(autoBatchConfigs)),
InferRequestWaitTests::getTestCaseName);
} // namespace

View File

@ -30,11 +30,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassNetworkTestP, OVClassNetworkTestP, ::tes
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
OVClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
OVClassGetMetricTest_SUPPORTED_METRICS,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
OVClassGetMetricTest_AVAILABLE_DEVICES,
@ -42,7 +42,7 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
OVClassGetMetricTest_FULL_DEVICE_NAME,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
OVClassGetMetricTest_OPTIMIZATION_CAPABILITIES,
@ -62,11 +62,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
OVClassGetMetricTest_ThrowUnsupported,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetConfigTest,
OVClassGetConfigTest_ThrowUnsupported,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetAvailableDevices, OVClassGetAvailableDevices, ::testing::Values("GPU"));

View File

@ -104,6 +104,29 @@ namespace {
CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU},
{InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "should be int"}}
};
const std::vector<std::map<std::string, std::string>> auto_batch_inconfigs = {
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
{CONFIG_KEY(AUTO_BATCH_TIMEOUT), "-1"}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, "DOESN'T EXIST"}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
{InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "-1"}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, "ON"}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "unknown_file"}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_DUMP_KERNELS, "ON"}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_TUNING_MODE, "TUNING_UNKNOWN_MODE"}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_DEVICE_ID, "DEVICE_UNKNOWN"}},
};
IE_SUPPRESS_DEPRECATED_END
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, IncorrectConfigTests,
@ -125,6 +148,12 @@ namespace {
IncorrectConfigTests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(auto_batch_inconfigs)),
IncorrectConfigTests::getTestCaseName);
const std::vector<std::map<std::string, std::string>> conf = {
{}
};
@ -167,17 +196,6 @@ namespace {
};
IE_SUPPRESS_DEPRECATED_END
const std::vector<std::map<std::string, std::string>> multiconf = {
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::THROUGHPUT}},
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY}},
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
{InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
{InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "1"}}
};
const std::vector<std::map<std::string, std::string>> autoConfigs = {
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
@ -232,6 +250,12 @@ namespace {
{InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "2"}}
};
const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
{CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "1"}},
};
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, DefaultValuesConfigTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_GPU),
@ -255,4 +279,15 @@ namespace {
::testing::Values(CommonTestUtils::DEVICE_AUTO),
::testing::ValuesIn(autoinconfigs)),
IncorrectConfigAPITests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigAPITests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(auto_batch_inconfigs)),
IncorrectConfigAPITests::getTestCaseName);
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, CorrectConfigTests,
::testing::Combine(
::testing::Values(CommonTestUtils::DEVICE_BATCH),
::testing::ValuesIn(auto_batch_configs)),
CorrectConfigTests::getTestCaseName);
} // namespace

View File

@ -35,12 +35,12 @@ INSTANTIATE_TEST_SUITE_P(
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
);
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_METRICS,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
);
INSTANTIATE_TEST_SUITE_P(
@ -50,7 +50,7 @@ INSTANTIATE_TEST_SUITE_P(
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassGetMetricTest, IEClassGetMetricTest_FULL_DEVICE_NAME,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
);
INSTANTIATE_TEST_SUITE_P(
@ -80,12 +80,12 @@ INSTANTIATE_TEST_SUITE_P(
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassGetMetricTest, IEClassGetMetricTest_ThrowUnsupported,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
);
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassGetConfigTest, IEClassGetConfigTest_ThrowUnsupported,
::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
);
INSTANTIATE_TEST_SUITE_P(
@ -115,6 +115,26 @@ INSTANTIATE_TEST_SUITE_P(
::testing::Values("GPU")
);
using IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE = BehaviorTestsUtils::IEClassBaseTestP;
TEST_P(IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE, GetMetricAndPrintNoThrow) {
SKIP_IF_CURRENT_TEST_IS_DISABLED()
InferenceEngine::Core ie;
InferenceEngine::Parameter p;
std::map<std::string, InferenceEngine::Parameter> _options = {{"MODEL_PTR", simpleCnnNetwork.getFunction()}};
ASSERT_NO_THROW(p = ie.GetMetric(deviceName, METRIC_KEY(OPTIMAL_BATCH_SIZE), _options).as<unsigned int>());
unsigned int t = p;
std::cout << "GPU device optimal batch size: " << t << std::endl;
ASSERT_METRIC_SUPPORTED_IE(METRIC_KEY(OPTIMAL_BATCH_SIZE));
}
INSTANTIATE_TEST_SUITE_P(
nightly_IEClassExecutableNetworkGetMetricTest, IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE,
::testing::Values("GPU")
);
using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT = BehaviorTestsUtils::IEClassBaseTestP;
TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT, GetMetricAndPrintNoThrow) {
SKIP_IF_CURRENT_TEST_IS_DISABLED()
@ -135,6 +155,7 @@ INSTANTIATE_TEST_SUITE_P(
::testing::Values("GPU")
);
using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM = BehaviorTestsUtils::IEClassBaseTestP;
TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM, GetMetricAndPrintNoThrow) {
SKIP_IF_CURRENT_TEST_IS_DISABLED()

View File

@ -16,6 +16,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
list(APPEND DEPENDENCIES ov_auto_plugin)
endif()
if(ENABLE_AUTO_BATCH)
list(APPEND DEPENDENCIES ov_auto_batch_plugin)
endif()
# remove once CVS-69781 is fixed
if(ENABLE_OV_IR_FRONTEND)
list(APPEND DEPENDENCIES ov_ir_frontend)

View File

@ -0,0 +1,161 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <string>
#include <utility>
#include <vector>
#include <memory>
#include <gpu/gpu_config.hpp>
#include <common_test_utils/test_common.hpp>
#include <functional_test_utils/plugin_cache.hpp>
#include "ngraph_functions/subgraph_builders.hpp"
#include "functional_test_utils/blob_utils.hpp"
using namespace ::testing;
using namespace InferenceEngine;
namespace AutoBatchingTests {
using AutoBatchTwoNetsParams = std::tuple<
std::string, // device name
bool, // get or set blob
size_t, // number of streams
size_t, // number of requests
size_t>; // batch size
class AutoBatching_Test : public CommonTestUtils::TestsCommon,
public testing::WithParamInterface<AutoBatchTwoNetsParams> {
void SetUp() override {
std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
fn_ptrs = {ngraph::builder::subgraph::makeSingleConv(),
ngraph::builder::subgraph::makeMultiSingleConv()};
};
public:
static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
size_t streams, requests, batch;
bool use_get_blob;
std::string device_name;
std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
return device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") + "_batch_size_" +
std::to_string(batch) +
"_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
}
protected:
std::string device_name;
bool use_get_blob;
size_t num_streams;
size_t num_requests;
size_t num_batch;
std::vector<std::shared_ptr<ngraph::Function>> fn_ptrs;
void TestAutoBatch() {
std::vector<InferenceEngine::CNNNetwork> nets;
for (auto &fn_ptr : fn_ptrs) {
nets.push_back(CNNNetwork(fn_ptr));
}
auto ie = InferenceEngine::Core();
std::vector<std::string> outputs;
std::vector<InferRequest> irs;
std::vector<std::vector<uint8_t>> ref;
std::vector<int> outElementsCount;
for (size_t i = 0; i < nets.size(); ++i) {
auto net = nets[i];
auto inputs = net.getInputsInfo();
for (auto n : inputs) {
n.second->setPrecision(Precision::FP32);
}
std::map<std::string, std::string> config;
if (device_name.find("GPU") != std::string::npos)
config[CONFIG_KEY(GPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
if (device_name.find("CPU") != std::string::npos)
config[CONFIG_KEY(CPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
// minimize timeout to reduce test time
config[CONFIG_KEY(AUTO_BATCH_TIMEOUT)] = std::to_string(1);
auto exec_net_ref = ie.LoadNetwork(net, std::string(CommonTestUtils::DEVICE_BATCH) + ":" +
device_name + "(" + std::to_string(num_batch) + ")",
config);
for (size_t j = 0; j < num_requests; j++) {
outputs.push_back(net.getOutputsInfo().begin()->first); //single output
outElementsCount.push_back(
std::accumulate(begin(fn_ptrs[i]->get_output_shape(0)), end(fn_ptrs[i]->get_output_shape(0)), 1,
std::multiplies<size_t>()));
auto inf_req = exec_net_ref.CreateInferRequest();
irs.push_back(inf_req);
std::vector<std::vector<uint8_t>> inData;
for (auto n : inputs) {
auto blob = FuncTestUtils::createAndFillBlob(n.second->getTensorDesc());
if (use_get_blob)
memcpy(reinterpret_cast<void *>(inf_req.GetBlob(n.first)->buffer().as<uint8_t*>()),
reinterpret_cast<const void *>(blob->cbuffer().as<uint8_t*>()), blob->byteSize());
else
inf_req.SetBlob(n.first, blob);
const auto inBlob = inf_req.GetBlob(n.first);
const auto blobSize = inBlob->byteSize();
const auto inBlobBuf = inBlob->cbuffer().as<uint8_t *>();
inData.push_back(std::vector<uint8_t>(inBlobBuf, inBlobBuf + blobSize));
}
auto refOutData = ngraph::helpers::interpreterFunction(fn_ptrs[i], {inData}).front().second;
ref.push_back(refOutData);
}
}
const int niter = 1;
for (int i = 0; i < niter; i++) {
for (auto ir : irs) {
ir.StartAsync();
}
for (auto ir : irs) {
ir.Wait(InferRequest::RESULT_READY);
}
}
auto thr = FuncTestUtils::GetComparisonThreshold(InferenceEngine::Precision::FP32);
for (size_t i = 0; i < irs.size(); ++i) {
const auto &refBuffer = ref[i].data();
ASSERT_EQ(outElementsCount[i], irs[i].GetBlob(outputs[i])->size());
FuncTestUtils::compareRawBuffers(irs[i].GetBlob(outputs[i])->buffer().as<float *>(),
reinterpret_cast<const float *>(refBuffer), outElementsCount[i],
outElementsCount[i],
thr);
}
}
};
class AutoBatching_Test_DetectionOutput : public AutoBatching_Test {
public:
void SetUp() override {
std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
fn_ptrs = {ngraph::builder::subgraph::makeEltwisePlusDetectionOutput(),
ngraph::builder::subgraph::makeEltwisePlusDetectionOutput()};
};
static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
size_t streams, requests, batch;
bool use_get_blob;
std::string device_name;
std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
return "DetectionOutput_HETERO_" + device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") +
"_batch_size_" + std::to_string(batch) +
"_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
}
};
TEST_P(AutoBatching_Test, compareAutoBatchingToSingleBatch) {
TestAutoBatch();
}
TEST_P(AutoBatching_Test_DetectionOutput, compareAutoBatchingToSingleBatch) {
TestAutoBatch();
}
} // namespace AutoBatchingTests

View File

@ -10,6 +10,7 @@ const char DEVICE_AUTO[] = "AUTO";
const char DEVICE_CPU[] = "CPU";
const char DEVICE_GNA[] = "GNA";
const char DEVICE_GPU[] = "GPU";
const char DEVICE_BATCH[] = "BATCH";
const char DEVICE_HDDL[] = "HDDL";
const char DEVICE_MYRIAD[] = "MYRIAD";
const char DEVICE_KEEMBAY[] = "VPUX";

View File

@ -26,6 +26,9 @@ public:
MOCK_METHOD3(ImportNetwork, InferenceEngine::SoExecutableNetworkInternal(
std::istream&, const std::shared_ptr<InferenceEngine::RemoteContext>&, const std::map<std::string, std::string>&));
MOCK_METHOD2(CreateContext, InferenceEngine::RemoteContext::Ptr(const std::string& deviceName,
const InferenceEngine::ParamMap& params));
MOCK_CONST_METHOD3(QueryNetwork, InferenceEngine::QueryNetworkResult(
const InferenceEngine::CNNNetwork&, const std::string&, const std::map<std::string, std::string>&));

View File

@ -242,6 +242,44 @@ inline std::shared_ptr<ngraph::Function> makeSingleConv(std::vector<size_t> inpu
return fn_ptr;
}
inline std::shared_ptr<ngraph::Function> makeEltwisePlusDetectionOutput(std::vector<std::vector<size_t>> inShapes =
{{1, 60}, {1, 165}, {1, 1, 75}},
ngraph::element::Type_t type = ngraph::element::Type_t::f32) {
// adding Eltwise so that we can test Auto-Batching's HETERO code-path that splits the DetectionOutput from the rest of the network
auto params = ngraph::builder::makeParams(ngraph::element::f32, inShapes);
auto paramOuts = ngraph::helpers::convert2OutputVector(
ngraph::helpers::castOps2Nodes<ngraph::opset3::Parameter>(params));
ngraph::OutputVector outs;
for (size_t i = 0; i < inShapes.size(); i++) {
auto shape = inShapes[i];
auto p = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::f32, ngraph::Shape{shape});
auto add = ngraph::builder::makeEltwise(paramOuts[i], p, ngraph::helpers::EltwiseTypes::ADD);
params.push_back(p);
outs.push_back(add->output(0));
}
ngraph::op::DetectionOutput::Attributes attr;
attr.num_classes = 11;
attr.background_label_id = 0;
attr.top_k = 75;
attr.variance_encoded_in_target = true;
attr.keep_top_k = {50};
attr.code_type = std::string{"caffe.PriorBoxParameter.CORNER"};
attr.share_location = true;
attr.nms_threshold = 0.5f;
attr.confidence_threshold = 0.5f;
attr.clip_after_nms = false;
attr.clip_before_nms = false;
attr.decrease_label_id = false;
attr.normalized = false;
attr.input_height = 1;
attr.input_width = 1;
attr.objectness_score = 0.4f;
auto detOut = ngraph::builder::makeDetectionOutput(outs, attr);
ngraph::ResultVector results{std::make_shared<ngraph::opset3::Result>(detOut)};
return std::make_shared<ngraph::Function>(results, params, "EltWiseWithDetectionOutput");
}
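The builder above is consumed by the AutoBatching_Test_DetectionOutput fixture; used standalone, it can be wrapped into a CNNNetwork in the usual way (a sketch; the device string in the comment is only an example):

auto fn = ngraph::builder::subgraph::makeEltwisePlusDetectionOutput();
InferenceEngine::CNNNetwork net(fn);
// Loading on an auto-batching device exercises the HETERO split around DetectionOutput, e.g.:
// core.LoadNetwork(net, "BATCH:GPU");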
inline std::shared_ptr<ngraph::Function> makeMultiSingleConv(std::vector<size_t> inputShape = {1, 3, 24, 24},
ngraph::element::Type type = ngraph::element::Type_t::f32) {
auto param0 = std::make_shared<ngraph::opset1::Parameter>(type, ngraph::Shape(inputShape));

View File

@@ -38,6 +38,7 @@ using Config = std::map<std::string, std::string>;
using namespace MockMultiDevice;
using ConfigParams = std::tuple<
bool, // if THROUGHPUT
unsigned int, // cpu OPTIMAL_NUMBER_OF_INFER_REQUESTS
int, // cpu infer request num the customer wants
bool, // if cpu sleep is set, the cpu device loads slowly
@@ -77,12 +78,18 @@ public:
unsigned int expectOptimalNum;
bool cpuSleep;
bool gpuSleep;
std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
bool isThroughput;
std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = obj.param;
std::ostringstream result;
result << "cpuOptimalNum_" << cpuOptimalNum << "cpuCustomerNum_" << cpuCustomerNum;
result << "gpuOptimalNum_" << gpuOptimalNum << "gpuCustomerNum_" << gpuCustomerNum;
result << "expectOptimalNum_" << expectOptimalNum;
if (isThroughput) {
result << "_isThroughput" << "true";
} else {
result << "__isThroughput" << "false";
}
if (cpuSleep) {
result << "_cpuSleep_" << "true";
} else {
@@ -147,7 +154,7 @@ public:
IE_SET_METRIC(SUPPORTED_CONFIG_KEYS, supportConfigs, {});
ON_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _))
.WillByDefault(RETURN_MOCK_VALUE(supportConfigs));
EXPECT_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _)).Times(AnyNumber());
EXPECT_CALL(*core, GetMetric(_, _, _)).Times(AnyNumber());
// test auto plugin
config.insert({CONFIG_KEY_INTERNAL(MULTI_WORK_MODE_AS_AUTO), InferenceEngine::PluginConfigParams::YES});
@@ -168,11 +175,24 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
unsigned int expectOptimalNum;
bool cpuSleep;
bool gpuSleep;
std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
bool isThroughput;
std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = this->GetParam();
if (isThroughput) {
metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {{CONFIG_KEY(PERFORMANCE_HINT),
InferenceEngine::PluginConfigParams::THROUGHPUT}}, cpuCustomerNum, ""});
metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {{CONFIG_KEY(PERFORMANCE_HINT),
InferenceEngine::PluginConfigParams::THROUGHPUT}}, gpuCustomerNum, ""});
IE_SET_METRIC(OPTIMAL_BATCH_SIZE, optimalBatchNum, 256);
IE_SET_METRIC(RANGE_FOR_STREAMS, rangeOfStreams, std::make_tuple<unsigned int, unsigned int>(1, 2));
ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(OPTIMAL_BATCH_SIZE)), _))
.WillByDefault(RETURN_MOCK_VALUE(optimalBatchNum));
ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(RANGE_FOR_STREAMS)), _))
.WillByDefault(RETURN_MOCK_VALUE(rangeOfStreams));
} else {
metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
}
ON_CALL(*plugin, SelectDevice(_, _, _)).WillByDefault(Return(metaDevices[1]));
ON_CALL(*plugin, ParseMetaDevices(_, _)).WillByDefault(Return(metaDevices));
EXPECT_CALL(*plugin, ParseMetaDevices(_, _)).Times(1);
@@ -241,27 +261,28 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
}
// ConfigParams {unsigned int, int, bool,
// ConfigParams {bool, unsigned int, int, bool,
// unsigned int, int, bool, unsigned int}
//
// every element for ConfigParams
// {cpuOptimalNum, cpu infer request num the customer wants, if cpu sleeps during load,
// {is throughput mode, cpuOptimalNum, cpu infer request num the customer wants, if cpu sleeps during load,
// gpuOptimalNum, gpu infer request num the customer wants, if gpu sleeps during load,
// expectOptimalNum of Auto ExecNetwork}
//
const std::vector<ConfigParams> testConfigs = {
ConfigParams {1, -1, false, 2, -1, true, 8},
ConfigParams {1, -1, false, 10, -1, true, 8},
ConfigParams {12, -1, false, 2, -1, true, 12},
ConfigParams {12, -1, false, 10, -1, true, 12},
ConfigParams {1, -1, true, 2, -1, false, 8},
ConfigParams {1, -1, true, 10, -1, false, 10},
ConfigParams {6, -1, true, 2, -1, false, 8},
ConfigParams {6, -1, true, 10, -1, false, 10},
ConfigParams {6, 4, false, 2, 3, true, 8},
ConfigParams {6, 4, false, 10, 3, true, 8},
ConfigParams {1, 4, true, 2, 3, false, 8},
ConfigParams {1, 4, true, 10, 3, false, 10}
ConfigParams {false, 1, -1, false, 2, -1, true, 8},
ConfigParams {false, 1, -1, false, 10, -1, true, 8},
ConfigParams {false, 12, -1, false, 2, -1, true, 12},
ConfigParams {false, 12, -1, false, 10, -1, true, 12},
ConfigParams {false, 1, -1, true, 2, -1, false, 8},
ConfigParams {false, 1, -1, true, 10, -1, false, 10},
ConfigParams {false, 6, -1, true, 2, -1, false, 8},
ConfigParams {false, 6, -1, true, 10, -1, false, 10},
ConfigParams {false, 6, 4, false, 2, 3, true, 8},
ConfigParams {false, 6, 4, false, 10, 3, true, 8},
ConfigParams {false, 1, 4, true, 2, 3, false, 8},
ConfigParams {false, 1, 4, true, 10, 3, false, 10},
ConfigParams {true, 1, 4, false, 10, 3, true, 512}
};
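The single throughput entry above expects 512 requests, which lines up with the GPU metrics mocked earlier in this test; a quick sanity check of that relation (an informal reading of the mocked values, not necessarily the plugin's exact formula):

// OPTIMAL_BATCH_SIZE (mocked)               = 256
// RANGE_FOR_STREAMS upper bound (mocked)    = 2
// expected OPTIMAL_NUMBER_OF_INFER_REQUESTS = 256 * 2 = 512
static_assert(256 * 2 == 512, "throughput case: optimal batch size x streams");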
INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,

View File

@@ -14,6 +14,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
add_dependencies(${TARGET_NAME} ov_auto_plugin)
endif()
if(ENABLE_AUTO_BATCH)
add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
endif()
target_include_directories(${TARGET_NAME} PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/plugin_tests")
target_link_libraries(${TARGET_NAME} PUBLIC

View File

@@ -25,6 +25,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
add_dependencies(${TARGET_NAME} ov_auto_plugin)
endif()
if(ENABLE_AUTO_BATCH)
add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
endif()
set_ie_threading_interface_for(${TARGET_NAME})
ie_faster_build(${TARGET_NAME}