Auto Batching impl (#7883)
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)
(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)
* applying/accommodating the API changes after rebase to the master
* replaying modified version of actual batch selection
* early experiments with model mem footprint
* changes from rebasing to the latest master
* experimenting with DG1 on the batch size selection, also collecting the mem footprint
* WIP: moving the auto-batching to the ICore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key; still fails hot device swap
* quick-n-dirty batch footprint vs device total mem
* code style
* testing which models perform badly due to kernels and NOT (batched) footprint
* stub pipeline task to communicate the readiness rather than promise/future
* quick-n-dirty timeout impl
* explicit _completionTasks, reverting BA to use the timeout
* inputs/outputs copies, works with AUTO and demo now
* accommodate the config per device-id, after rebase to the latest master
* allowing the auto-batching only with tput hint to let more conventional tests pass
* fix the premature timeout restarting via waiting for batch1 requests completion
* moved the batched request starting (along with input copies) to the dedicated thread
* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution
* code style
* increasing the timeout to test the ssd_* models perf (timeout?) issues
* reducing number of output stuff in BA to avoid bloating the logs in experiments
* more aggressive batching for experiments, not limited to 32 and also 4 as a min
* more accurate timeout debugging info
* getting the reqs limitation from the plugin SetConfig as well
* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context
* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4)
* auto-batching functional test (with results check vs ref) and GPU instance for that
* fixed arithmetic on blob ptrs
* clang
* handling possible batched network failure
* BATCH as the constants device name in test
* ENABLE_BATCH
* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)
* DetectionOutput hetero test for the CPU
* reenabling the Auto-Batching in the AUTO
* auto-batching device enabled in the test
* fixed the DO test
* improve the loading loop logic
* brushed the config keys
* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests
* fix the test after refactoring
* clang
* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now
* auto-batching hetero test (subgraph with DetectionOutput)
* fixed minor changes that were the result of experiments with the impl
* code-style
* brushing, disabling CPU's HETERO tests until planned activity for 22.2
* removing home-baked MAX_BATCH_SIZE and switching to the official impl by the GPU team
* remote blobs tests for the auto-batching (old API)
* brushed names a bit
* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests
* fixed the ieUnitTests with adding CreateContext stub to the MockICore
* clang
* improved remote-blobs tests
* revert back the BA from experiments with AB + device_use_mem
* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE
* remote blobs 2.0 tests, issue with context having the orig device name
* debugging DG1 perf drop (presumably due to not fitting into the device mem)
* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2
* remote blobs 2.0 tests for different tensor sharing types
* converting assert to throw to accommodate legacy API where lock() was possible to be called
* revert the timeout back to avoid mixing the studies, fixed the footprint calc
* reverting to estimating the max batch by extrapolating from batch1 size
* more conservative footprint estimation (with batch1), graceful batch1 handling without duplication
* even more graceful batch1 handling without duplication
* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching
* AutoBatchPlugin -> ov_auto_batch_plugin
* WA for gcc 4.8
* clang
* fix misprint
* fixed errors resulting from the recent OV Variant-to-Any transition
* skip auto-batching for already-batched networks
* AUTO_BATCH_TIMEOUT and tests
* GPU-specific L3
* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic
* debugging device info
* enabling the config tests for the GPU and fixing the Auto-batching tests to pass
* making the default cache size (when the driver is not recognized) more aggressive, to accommodate recent HW with old drivers
* skip auto-batching for RNNs and the like (e.g. single CHW input)
* fixed fallback to the batch1 and moved the HETERO path under condition to avoid bloating
* brushing
* Auto plugin GetMetric supports GPU auto-batch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add test case
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add comments on test
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the var names, also adding the exception handling
* disabling the auto-batching for the networks with non-batched outputs, faster-rcnn and the like (CVS-74085) to minimize the number of failures
* add try catch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the code changed in the GPU plugin
* Auto-Batch requests tests
* brushed variables a bit (ref)
* cleaned debug output from the ie_core
* cleaned cmake for the Auto-Batch
* removed batchN estimation from batch1
* cleaned from debug printf
* comments, cleanup
* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13
* Adding back the removed batchN estimation from batch1 to debug degradations on DG1 (resulted from a too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.
* brushing ie_core.cpp
* fix 32bit compilation
* Code review: ENABLE_AUTO_BATCH
* consolidate the auto-batching logic in ie_core.cpp into a single ApplyAutoBatching
* renamed/brushed the OPTIMAL_BATCH (now with _SIZE) so it mimics the MAX_BATCH_SIZE wrt MODEL_PTR
* default value for the OPTIMAL_BATCH_SIZE
* clang
* accommodate new func tests location
* fix shuffle of headers after clang + copyrights
* fixed misprint made during code refactoring
* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header
* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching
* switching from the unsafe size() and minimizing time under lock
* code style
* brushed the ApplyAutoBatching
* brushed the metric/config names and descriptions
* completed the core integration tests for the auto-batching
* ExecGraphInfo and check for incorrect cfg
* removed explicit dependencies from cmake file of the plugin
* disabling Auto-Batching through the tput hint (to preserve the current product default), only explicit usage like BATCH:GPU is used in the tests
Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
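For reference, the explicit Auto-Batching mode mentioned in the bullets above (the one benchmark_app exercises with -d BATCH:GPU(4)) boils down to roughly the following minimal sketch; the model path, batch value and timeout below are illustrative, not product defaults:

    #include <ie_core.hpp>

    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");
    // "BATCH:GPU(4)" asks the auto-batch plugin to collect batches of 4 for the GPU;
    // "AUTO_BATCH_TIMEOUT" (in ms) bounds how long an incomplete batch is waited for.
    auto execNet = core.LoadNetwork(network, "BATCH:GPU(4)", {{"AUTO_BATCH_TIMEOUT", "100"}});
    auto request = execNet.CreateInferRequest();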
This commit is contained in: parent bc5da8d522, commit 49b5e5728b
@ -100,6 +100,8 @@ ie_option (ENABLE_GAPI_PREPROCESSING "Enables G-API preprocessing" ON)

ie_option (ENABLE_MULTI "Enables MULTI Device Plugin" ON)
ie_option (ENABLE_AUTO "Enables AUTO Device Plugin" ON)

ie_option (ENABLE_AUTO_BATCH "Enables Auto-Batching Plugin" ON)

ie_option (ENABLE_HETERO "Enables Hetero Device Plugin" ON)

ie_option (ENABLE_TEMPLATE "Enable template plugin" ON)
@ -141,6 +141,9 @@ When specifying key values as raw strings (that is, when using Python API), omit

@snippet snippets/GPU_Metric1.cpp part1

* OPTIMAL_BATCH_SIZE : Returns the _optimal_ batch size for a given network on the given GPU device. The returned value is aligned to a power of 2. MODEL_PTR is a required option for this metric, since the optimal batch size highly depends on the model; if MODEL_PTR is not given, the value of 1 is returned. The example code that sets the required and optional configs for this metric is available in the following snippet:

@snippet snippets/GPU_Metric1.cpp part2

## GPU Context and Video Memory Sharing RemoteBlob API

See [RemoteBlob API of GPU Plugin](GPU_RemoteBlob_API.md)
@ -14,4 +14,12 @@ options.insert(std::make_pair("AVAILABLE_DEVICE_MEM_SIZE", available_device_mem_

auto max_batch_size = core.GetMetric("GPU", GPU_METRIC_KEY(MAX_BATCH_SIZE), options).as<uint32_t>();
//! [part1]
//! [part2]
std::map<std::string, Parameter> opt = {{"MODEL_PTR", cnnNetwork.getFunction()}};  // Required. Same usage as for the MAX_BATCH_SIZE above. If not set, OPTIMAL_BATCH_SIZE returns 1.
// This is not an entirely GPU-specific metric (so METRIC_KEY is used below rather than GPU_METRIC_KEY),
// but the GPU is the only device that supports it at the moment.
// For the GPU, the metric already accommodates the on-device memory limitation that the MAX_BATCH_SIZE reflects,
// so OPTIMAL_BATCH_SIZE is always less than MAX_BATCH_SIZE. Unlike the latter, it is also aligned to a power of 2.
auto optimal_batch_size = core.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), opt).as<unsigned int>();
//! [part2]
}
@ -6,6 +6,7 @@
|
||||
|
||||
#include <string>
|
||||
#include <vector>
|
||||
#include <tuple>
|
||||
|
||||
namespace cldnn {
|
||||
/// @addtogroup cpp_api C++ API
|
||||
@ -25,6 +26,10 @@ struct gfx_version {
|
||||
uint16_t major;
|
||||
uint8_t minor;
|
||||
uint8_t revision;
|
||||
friend bool operator < (const gfx_version& l, const gfx_version& r) {
|
||||
return std::tie(l.major, l.minor, l.revision)
|
||||
< std::tie(r.major, r.minor, r.revision); // same order
|
||||
}
|
||||
};
|
||||
|
||||
/// @brief Information about the device properties and capabilities.
|
||||
|
@ -124,6 +124,7 @@ std::map<std::string, std::vector<InferenceEngine::Blob::Ptr>> getRemoteInputBlo
|
||||
}
|
||||
|
||||
auto blob = InferenceEngine::gpu::make_shared_blob(desc, context, clBuffer.back());
|
||||
blob->allocate();
|
||||
remoteBlobs[name].push_back(blob);
|
||||
};
|
||||
|
||||
|
@ -109,8 +109,10 @@ std::vector<float> splitFloat(const std::string& s, char delim) {
|
||||
|
||||
std::vector<std::string> parseDevices(const std::string& device_string) {
|
||||
std::string comma_separated_devices = device_string;
|
||||
if (comma_separated_devices.find(":") != std::string::npos) {
|
||||
comma_separated_devices = comma_separated_devices.substr(comma_separated_devices.find(":") + 1);
|
||||
auto colon = comma_separated_devices.find(":");
|
||||
if (colon != std::string::npos) {
|
||||
auto bracket = comma_separated_devices.find("("); // e.g. in BATCH:GPU(4)
|
||||
comma_separated_devices = comma_separated_devices.substr(colon + 1, bracket - colon - 1);
|
||||
}
|
||||
if ((comma_separated_devices == "MULTI") || (comma_separated_devices == "HETERO"))
|
||||
return std::vector<std::string>();
|
||||
|
@ -26,6 +26,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
|
||||
add_dependencies(${TARGET_NAME} ov_auto_plugin)
|
||||
endif()
|
||||
|
||||
if(ENABLE_AUTO_BATCH)
|
||||
add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
|
||||
endif()
|
||||
|
||||
if(ENABLE_INTEL_CPU)
|
||||
add_dependencies(${TARGET_NAME} ov_intel_cpu_plugin)
|
||||
endif()
|
||||
|
@ -16,6 +16,7 @@
|
||||
#include "cpp/ie_cnn_network.h"
|
||||
#include "cpp_interfaces/interface/ie_iexecutable_network_internal.hpp"
|
||||
#include "ie_parameter.hpp"
|
||||
#include "ie_remote_context.hpp"
|
||||
#include "threading/ie_itask_executor.hpp"
|
||||
|
||||
namespace InferenceEngine {
|
||||
@ -60,6 +61,22 @@ public:
|
||||
const std::string& deviceName,
|
||||
const std::map<std::string, std::string>& config = {}) = 0;
|
||||
|
||||
/**
|
||||
* @brief Creates an executable network from a network object.
|
||||
*
|
||||
* Users can create as many networks as they need and use
|
||||
* them simultaneously (up to the limitation of the hardware resources)
|
||||
*
|
||||
* @param network CNNNetwork object acquired from Core::ReadNetwork
|
||||
* @param remoteCtx "Remote" (non-CPU) accelerator device-specific execution context to use
|
||||
* @param config Optional map of pairs: (config parameter name, config parameter value) relevant only for this load
|
||||
* operation
|
||||
* @return An executable network reference
|
||||
*/
|
||||
virtual SoExecutableNetworkInternal LoadNetwork(const CNNNetwork& network,
|
||||
const RemoteContext::Ptr& remoteCtx,
|
||||
const std::map<std::string, std::string>& config = {}) = 0;
|
||||
|
||||
/**
|
||||
* @brief Creates an executable network from a model file.
|
||||
*
|
||||
@ -142,6 +159,16 @@ public:
|
||||
*/
|
||||
virtual bool DeviceSupportsImportExport(const std::string& deviceName) const = 0;
|
||||
|
||||
/**
|
||||
* @brief Create a new shared context object on specified accelerator device
|
||||
* using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
|
||||
* @param deviceName Name of a device to create new shared context on.
|
||||
* @param params Map of device-specific shared context parameters.
|
||||
* @return A shared pointer to a created remote context.
|
||||
*/
|
||||
virtual InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
|
||||
const InferenceEngine::ParamMap&) = 0;
|
||||
|
||||
virtual bool isNewAPI() const = 0;
|
||||
|
||||
/**
|
||||
@ -165,6 +192,7 @@ public:
|
||||
|
||||
static std::vector<std::string> getHeteroDevices(std::string fallbackDevice);
|
||||
static std::vector<std::string> getMultiDevices(std::string devicesList);
|
||||
static std::string getBatchDevice(std::string devicesList);
|
||||
};
|
||||
|
||||
} // namespace InferenceEngine
|
||||
|
@ -23,14 +23,12 @@ struct MemBandwidthPressure {
|
||||
|
||||
static MemBandwidthPressure MemBandwidthPressureTolerance(
|
||||
const std::shared_ptr<ngraph::Function> nGraphFunc,
|
||||
const float L2_cache_size,
|
||||
const float L3_cache_size,
|
||||
const float cache_size,
|
||||
const float memThresholdAssumeLimited = MemBandwidthPressure::LIMITED) {
|
||||
int total_convs = 0, mem_limited_convs = 0, compute_convs = 0, total_gemms = 0, mem_limited_gemms = 0,
|
||||
total_deconvs = 0, compute_deconvs = 0, mem_limited_deconvs = 0;
|
||||
auto memLimitedFactor = [&](int size_data_moved, int datatype_size) -> float {
|
||||
return (L2_cache_size * 1.0f /*util factor, tbd */
|
||||
/ (size_data_moved * datatype_size));
|
||||
auto memLimitedFactor = [&](int size_data_moved, int datatype_size = 4) -> float {
|
||||
return (cache_size / (size_data_moved * datatype_size));
|
||||
};
|
||||
auto isLowPrecision = [&](ngraph::element::Type type) -> bool {
|
||||
return (type == ngraph::element::i8) || (type == ngraph::element::u8);
|
||||
|
@ -0,0 +1,86 @@
|
||||
// Copyright (C) 2018-2021 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
///////////////////////////////////////////////////////////////////////////////////////////////////
|
||||
#pragma once
|
||||
|
||||
#include <cstddef>
|
||||
#include <mutex>
|
||||
#include <queue>
|
||||
#include <type_traits>
|
||||
|
||||
#include "ie_parallel.hpp"
|
||||
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
|
||||
# include <tbb/concurrent_queue.h>
|
||||
#endif
|
||||
|
||||
namespace InferenceEngine {
|
||||
|
||||
template <typename T>
|
||||
class ThreadSafeQueueWithSize {
|
||||
public:
|
||||
void push(T value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
_queue.push(std::move(value));
|
||||
}
|
||||
bool try_pop(T& value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
if (!_queue.empty()) {
|
||||
value = std::move(_queue.front());
|
||||
_queue.pop();
|
||||
return true;
|
||||
} else {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
size_t size() {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
return _queue.size();
|
||||
}
|
||||
|
||||
protected:
|
||||
std::queue<T> _queue;
|
||||
std::mutex _mutex;
|
||||
};
|
||||
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
|
||||
template <typename T>
|
||||
using ThreadSafeQueue = tbb::concurrent_queue<T>;
|
||||
template <typename T>
|
||||
using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
|
||||
#else
|
||||
template <typename T>
|
||||
using ThreadSafeQueue = ThreadSafeQueueWithSize<T>;
|
||||
template <typename T>
|
||||
class ThreadSafeBoundedQueue {
|
||||
public:
|
||||
ThreadSafeBoundedQueue() = default;
|
||||
bool try_push(T value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
if (_capacity) {
|
||||
_queue.push(std::move(value));
|
||||
}
|
||||
return _capacity;
|
||||
}
|
||||
bool try_pop(T& value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
if (_capacity && !_queue.empty()) {
|
||||
value = std::move(_queue.front());
|
||||
_queue.pop();
|
||||
return true;
|
||||
} else {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
void set_capacity(std::size_t newCapacity) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
_capacity = newCapacity;
|
||||
}
|
||||
|
||||
protected:
|
||||
std::queue<T> _queue;
|
||||
std::mutex _mutex;
|
||||
bool _capacity = false;
|
||||
};
|
||||
#endif
|
||||
} // namespace InferenceEngine
|
@ -118,6 +118,18 @@ DECLARE_METRIC_VALUE(BATCHED_BLOB);
|
||||
* String value for metric name is "RANGE_FOR_STREAMS".
|
||||
*/
|
||||
DECLARE_METRIC_KEY(RANGE_FOR_STREAMS, std::tuple<unsigned int, unsigned int>);
|
||||
/**
|
||||
* @brief Metric to query the optimal batch size for the given device and the network
*
* The metric returns a value of unsigned int type:
* the optimal batch size for the given network on the given device. The returned value is aligned to a power of 2.
* Also, MODEL_PTR is a required option for this metric since the optimal batch size depends on the model,
* so if MODEL_PTR is not given, the result of the metric is always 1.
* For the GPU the metric is queried automatically whenever the OpenVINO performance hint for the throughput is used,
* so that the result (>1) governs the automatic batching (transparently to the application).
* The automatic batching can be disabled with ALLOW_AUTO_BATCHING set to NO.
|
||||
*/
|
||||
DECLARE_METRIC_KEY(OPTIMAL_BATCH_SIZE, unsigned int);
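// Usage sketch (illustrative, not part of this header): query the metric with the MODEL_PTR option,
// mirroring the GPU documentation snippet; the `core` and `cnnNetwork` objects are assumed to exist already.
//   std::map<std::string, InferenceEngine::Parameter> options = {{"MODEL_PTR", cnnNetwork.getFunction()}};
//   auto optimal_batch = core.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();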
|
||||
|
||||
/**
|
||||
* @brief Metric to provide a hint for a range for number of async infer requests. If device supports streams,
|
||||
@ -250,6 +262,15 @@ DECLARE_CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS);
|
||||
DECLARE_CONFIG_VALUE(YES);
|
||||
DECLARE_CONFIG_VALUE(NO);
|
||||
|
||||
/**
|
||||
* @brief Auto-batching configuration, string for the device + batch size, e.g. "GPU(4)"
|
||||
*/
|
||||
DECLARE_CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG);
|
||||
/**
|
||||
* @brief Auto-batching configuration: string with timeout (in ms), e.g. "100"
|
||||
*/
|
||||
DECLARE_CONFIG_KEY(AUTO_BATCH_TIMEOUT);
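// Usage sketch (illustrative; the device string and the timeout value are examples, not defaults):
//   std::map<std::string, std::string> cfg = {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU(4)"},
//                                             {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "100"}};  // ms
//   auto execNet = core.LoadNetwork(cnnNetwork, "BATCH", cfg);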
|
||||
|
||||
/**
|
||||
* @brief Limit `#threads` that are used by Inference Engine for inference on the CPU.
|
||||
*/
|
||||
|
@ -46,6 +46,7 @@
|
||||
#endif
|
||||
|
||||
using namespace InferenceEngine::PluginConfigParams;
|
||||
using namespace InferenceEngine;
|
||||
using namespace std::placeholders;
|
||||
|
||||
namespace ov {
|
||||
@ -94,6 +95,9 @@ Parsed<T> parseDeviceNameIntoConfig(const std::string& deviceName, const std::ma
|
||||
config_[ie::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES] =
|
||||
deviceName.substr(std::string("AUTO:").size());
|
||||
}
|
||||
} else if (deviceName_.find("BATCH:") == 0) {
|
||||
deviceName_ = "BATCH";
|
||||
config_[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceName.substr(6);
|
||||
} else {
|
||||
ie::DeviceIDParser parser(deviceName_);
|
||||
deviceName_ = parser.getDeviceName();
|
||||
@ -480,14 +484,22 @@ public:
|
||||
return newAPI;
|
||||
}
|
||||
|
||||
ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(const ie::CNNNetwork& network,
|
||||
ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(
|
||||
const ie::CNNNetwork& network,
|
||||
const std::shared_ptr<ie::RemoteContext>& context,
|
||||
const std::map<std::string, std::string>& config) {
|
||||
const std::map<std::string, std::string>& config) override {
|
||||
OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::RemoteContext");
|
||||
if (context == nullptr) {
|
||||
IE_THROW() << "Remote context is null";
|
||||
}
|
||||
// have to deduce the device name/config from the context first
|
||||
auto parsed = parseDeviceNameIntoConfig(context->getDeviceName(), config);
|
||||
std::string& deviceName = parsed._deviceName;
|
||||
std::map<std::string, std::string>& config_with_batch = parsed._config;
|
||||
// if auto-batching is applicable, the below function will patch the device name and config accordingly:
|
||||
ApplyAutoBatching(network, deviceName, config_with_batch);
|
||||
parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
|
||||
|
||||
auto plugin = GetCPPPluginByName(parsed._deviceName);
|
||||
ov::runtime::SoPtr<ie::IExecutableNetworkInternal> res;
|
||||
auto cacheManager = coreConfig.getCacheConfig()._cacheManager;
|
||||
@ -508,12 +520,59 @@ public:
|
||||
return res;
|
||||
}
|
||||
|
||||
void ApplyAutoBatching(const ie::CNNNetwork& network,
|
||||
std::string& deviceName,
|
||||
std::map<std::string, std::string>& config_with_batch) {
|
||||
if (deviceName.find("BATCH") != std::string::npos) {
|
||||
// explicitly enabled Auto-Batching e.g. in the tests
|
||||
auto pos = deviceName.find_first_of(":");
|
||||
if (pos != std::string::npos) {
|
||||
auto deviceNameWithBatchSize = deviceName.substr(pos + 1);
|
||||
auto deviceNameWithoutBatch = DeviceIDParser::getBatchDevice(deviceNameWithBatchSize);
|
||||
auto function = network.getFunction();
|
||||
// have to execute the DetectionOutput separately (without batching)
|
||||
// as this layer mix-in the values from the different inputs (batch id)
|
||||
bool bDetectionOutput = false;
|
||||
const std::string detectionOutputOpName = ngraph::op::DetectionOutput::get_type_info_static().name;
|
||||
const std::string resultOpName = ngraph::op::Result::get_type_info_static().name;
|
||||
for (auto&& node : function->get_ops()) {
|
||||
auto isDetectionOutputParent = [&detectionOutputOpName](decltype(node)& nd) {
|
||||
for (size_t n = 0; n < nd->get_input_size(); n++) {
|
||||
if (detectionOutputOpName == nd->get_input_node_ptr(n)->get_type_info().name)
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
};
|
||||
|
||||
if ((detectionOutputOpName == node->get_type_info().name) ||
|
||||
((resultOpName == node->get_type_info().name) && isDetectionOutputParent(node))) {
|
||||
node->get_rt_info()["affinity"] = deviceNameWithoutBatch;
|
||||
bDetectionOutput = true;
|
||||
} else {
|
||||
node->get_rt_info()["affinity"] = "BATCH";
|
||||
}
|
||||
}
|
||||
if (bDetectionOutput) {
|
||||
deviceName = "HETERO:BATCH," + deviceNameWithoutBatch;
|
||||
config_with_batch[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceNameWithBatchSize;
|
||||
} else {
|
||||
deviceName = "BATCH:" + deviceNameWithBatchSize;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
ie::SoExecutableNetworkInternal LoadNetwork(const ie::CNNNetwork& network,
|
||||
const std::string& deviceName,
|
||||
const std::string& deviceNameOrig,
|
||||
const std::map<std::string, std::string>& config) override {
|
||||
OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::CNN");
|
||||
bool forceDisableCache = config.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
|
||||
auto parsed = parseDeviceNameIntoConfig(deviceName, config);
|
||||
std::string deviceName = deviceNameOrig;
|
||||
std::map<std::string, std::string> config_with_batch = config;
|
||||
// if auto-batching is applicable, the below function will patch the device name and config accordingly:
|
||||
ApplyAutoBatching(network, deviceName, config_with_batch);
|
||||
|
||||
bool forceDisableCache = config_with_batch.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
|
||||
auto parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
|
||||
if (forceDisableCache) {
|
||||
// remove this config key from parsed as plugins can throw unsupported exception
|
||||
parsed._config.erase(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE));
|
||||
@ -732,6 +791,19 @@ public:
|
||||
return devices;
|
||||
}
|
||||
|
||||
/**
|
||||
* @brief Create a new shared context object on specified accelerator device
|
||||
* using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
|
||||
* @param deviceName Name of a device to create new shared context on.
|
||||
* @param params Map of device-specific shared context parameters.
|
||||
* @return A shared pointer to a created remote context.
|
||||
*/
|
||||
InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
|
||||
const InferenceEngine::ParamMap& params) override {
|
||||
auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
|
||||
return GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
|
||||
}
|
||||
|
||||
/**
|
||||
* @brief Returns reference to CPP plugin wrapper by a device name
|
||||
* @param deviceName A name of device
|
||||
@ -1030,6 +1102,12 @@ public:
|
||||
deviceNames = ie::DeviceIDParser::getMultiDevices(deviceName.substr(pos + 1));
|
||||
}
|
||||
deviceNames.emplace_back("AUTO");
|
||||
} else if (deviceName.find("BATCH") == 0) {
|
||||
auto pos = deviceName.find_first_of(":");
|
||||
if (pos != std::string::npos) {
|
||||
deviceNames = {ie::DeviceIDParser::getBatchDevice(deviceName.substr(pos + 1))};
|
||||
}
|
||||
deviceNames.push_back("BATCH");
|
||||
} else {
|
||||
deviceNames.push_back(deviceName);
|
||||
}
|
||||
@ -1120,8 +1198,8 @@ std::vector<std::string> DeviceIDParser::getHeteroDevices(std::string fallbackDe
|
||||
}
|
||||
|
||||
std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList) {
|
||||
std::vector<std::string> deviceNames;
|
||||
auto trim_request_info = [](std::string device_with_requests) {
|
||||
std::set<std::string> deviceNames;
|
||||
auto trim_request_info = [](const std::string& device_with_requests) {
|
||||
auto opening_bracket = device_with_requests.find_first_of('(');
|
||||
return device_with_requests.substr(0, opening_bracket);
|
||||
};
|
||||
@ -1132,14 +1210,36 @@ std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList
|
||||
// we skip the #requests info here
|
||||
while ((pos = devicesList.find(delimiter)) != std::string::npos) {
|
||||
auto d = devicesList.substr(0, pos);
|
||||
deviceNames.push_back(trim_request_info(d));
|
||||
if (d.find("BATCH") == 0) {
|
||||
deviceNames.insert("BATCH");
|
||||
auto p = d.find_first_of(":");
|
||||
if (p != std::string::npos)
|
||||
deviceNames.insert(DeviceIDParser::getBatchDevice(d.substr(p + 1)));
|
||||
} else {
|
||||
deviceNames.insert(trim_request_info(d));
|
||||
}
|
||||
devicesList.erase(0, pos + 1);
|
||||
}
|
||||
|
||||
if (!devicesList.empty())
|
||||
deviceNames.push_back(trim_request_info(devicesList));
|
||||
if (!devicesList.empty()) {
|
||||
if (devicesList.find("BATCH") == 0) {
|
||||
deviceNames.insert("BATCH");
|
||||
auto p = devicesList.find_first_of(":");
|
||||
if (p != std::string::npos)
|
||||
deviceNames.insert(DeviceIDParser::getBatchDevice(devicesList.substr(p + 1)));
|
||||
} else {
|
||||
deviceNames.insert(trim_request_info(devicesList));
|
||||
}
|
||||
}
|
||||
return std::vector<std::string>(deviceNames.begin(), deviceNames.end());
|
||||
}
|
||||
|
||||
return deviceNames;
|
||||
std::string DeviceIDParser::getBatchDevice(std::string device) {
|
||||
auto trim_request_info = [](const std::string& device_with_requests) {
|
||||
auto opening_bracket = device_with_requests.find_first_of('(');
|
||||
return device_with_requests.substr(0, opening_bracket);
|
||||
};
|
||||
return trim_request_info(device);
|
||||
}
|
||||
|
||||
class Core::Impl : public ov::runtime::CoreImpl {
|
||||
@ -1207,18 +1307,7 @@ ExecutableNetwork Core::LoadNetwork(const std::string& modelPath, const std::map
|
||||
}
|
||||
|
||||
RemoteContext::Ptr Core::CreateContext(const std::string& deviceName, const ParamMap& params) {
|
||||
if (deviceName.find("HETERO") == 0) {
|
||||
IE_THROW() << "HETERO device does not support remote context";
|
||||
}
|
||||
if (deviceName.find("MULTI") == 0) {
|
||||
IE_THROW() << "MULTI device does not support remote context";
|
||||
}
|
||||
if (deviceName.find("AUTO") == 0) {
|
||||
IE_THROW() << "AUTO device does not support remote context";
|
||||
}
|
||||
|
||||
auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
|
||||
return _impl->GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
|
||||
return _impl->CreateContext(deviceName, params);
|
||||
}
|
||||
|
||||
RemoteContext::Ptr Core::GetDefaultContext(const std::string& deviceName) {
|
||||
|
@ -21,3 +21,7 @@ endif()
|
||||
if(ENABLE_AUTO OR ENABLE_MULTI)
|
||||
add_subdirectory(auto)
|
||||
endif()
|
||||
|
||||
if(ENABLE_AUTO_BATCH)
|
||||
add_subdirectory(auto_batch)
|
||||
endif()
|
||||
|
@ -156,7 +156,8 @@ MultiDeviceExecutableNetwork::MultiDeviceExecutableNetwork(const std::string&
|
||||
, _needPerfCounters(needPerfCounters)
|
||||
, _multiPlugin(plugin)
|
||||
, _context(context)
|
||||
, _workModeIsAUTO(true) {
|
||||
, _workModeIsAUTO(true)
|
||||
, _network(network) {
|
||||
if (_multiPlugin->GetCore() == nullptr) {
|
||||
IE_THROW() << "Please, work with " << _multiPlugin->GetName() << " device via InferencEngine::Core object";
|
||||
}
|
||||
@ -667,10 +668,30 @@ InferenceEngine::Parameter MultiDeviceExecutableNetwork::GetMetric(const std::st
|
||||
real = _loadContext[ACTUALDEVICE].
|
||||
executableNetwork->GetMetric(name).as<unsigned int>();
|
||||
} else {
|
||||
IE_ASSERT(_loadContext[CPU].isAlready == true);
|
||||
real = _loadContext[CPU].
|
||||
executableNetwork->GetMetric(name).as<unsigned int>();
|
||||
std::unique_lock<std::mutex> lock(_confMutex);
|
||||
auto deviceInfo = _loadContext[ACTUALDEVICE].deviceInfo;
|
||||
lock.unlock();
|
||||
if (deviceInfo.deviceName.find("GPU") != std::string::npos) {
|
||||
const auto& mode = deviceInfo.config.find(CONFIG_KEY(PERFORMANCE_HINT));
|
||||
if (mode != deviceInfo.config.end() && mode->second == CONFIG_VALUE(THROUGHPUT)) {
|
||||
std::map<std::string, InferenceEngine::Parameter> options;
|
||||
options["MODEL_PTR"] = _network.getFunction(); // CNNntework
|
||||
try {
|
||||
auto optimalBatchSize = _core->GetMetric(deviceInfo.deviceName,
|
||||
METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
|
||||
auto rangeOfStreams = _core->GetMetric(deviceInfo.deviceName,
|
||||
METRIC_KEY(RANGE_FOR_STREAMS), options).as<std::tuple<unsigned int, unsigned int>>();
|
||||
real = (std::max)(real, std::get<1>(rangeOfStreams) * optimalBatchSize);
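// For illustration (hypothetical numbers): with OPTIMAL_BATCH_SIZE = 4 and an upper RANGE_FOR_STREAMS
// bound of 2, real becomes max(real, 2 * 4) = 8, so the metric returns max(8u, 8) = 8 below;
// an optimal batch of 16 with the same 2 streams would yield 32.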
|
||||
} catch (const InferenceEngine::Exception &iie) {
|
||||
LOG_WARNING("[AUTOPLUGIN]get optimal infer requset num for GPU auto-batch failed :%s", iie.what());
|
||||
}
|
||||
unsigned int res = std::max(8u, real);
|
||||
}
|
||||
}
|
||||
}
|
||||
unsigned int res = (std::max)(8u, real);
|
||||
IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, res);
|
||||
}
|
||||
|
||||
|
@ -7,22 +7,17 @@
|
||||
|
||||
#include <atomic>
|
||||
#include <mutex>
|
||||
#include <queue>
|
||||
#include <unordered_map>
|
||||
#include <map>
|
||||
#include <vector>
|
||||
#include <string>
|
||||
|
||||
#include <cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp>
|
||||
#include <ie_parallel.hpp>
|
||||
#include <threading/ie_itask_executor.hpp>
|
||||
#include <threading/ie_executor_manager.hpp>
|
||||
#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
|
||||
#include "threading/ie_thread_safe_containers.hpp"
|
||||
#include "threading/ie_itask_executor.hpp"
|
||||
#include "threading/ie_executor_manager.hpp"
|
||||
#include "ie_icore.hpp"
|
||||
|
||||
#if (IE_THREAD == IE_THREAD_TBB || IE_THREAD == IE_THREAD_TBB_AUTO)
|
||||
# include <tbb/concurrent_queue.h>
|
||||
#endif
|
||||
|
||||
#ifdef MULTIUNITTEST
|
||||
#define MOCKTESTMACRO virtual
|
||||
#define MultiDevicePlugin MockMultiDevicePlugin
|
||||
@ -79,66 +74,6 @@ enum AutoLoadContextIndex {
|
||||
template<typename T>
|
||||
using DeviceMap = std::unordered_map<DeviceName, T>;
|
||||
|
||||
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
|
||||
template <typename T>
|
||||
using ThreadSafeQueue = tbb::concurrent_queue<T>;
|
||||
template <typename T>
|
||||
using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
|
||||
#else
|
||||
template <typename T>
|
||||
class ThreadSafeQueue {
|
||||
public:
|
||||
void push(T value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
_queue.push(std::move(value));
|
||||
}
|
||||
bool try_pop(T& value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
if (!_queue.empty()) {
|
||||
value = std::move(_queue.front());
|
||||
_queue.pop();
|
||||
return true;
|
||||
} else {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
protected:
|
||||
std::queue<T> _queue;
|
||||
std::mutex _mutex;
|
||||
};
|
||||
template <typename T>
|
||||
class ThreadSafeBoundedQueue {
|
||||
public:
|
||||
ThreadSafeBoundedQueue() = default;
|
||||
bool try_push(T value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
if (_capacity) {
|
||||
_queue.push(std::move(value));
|
||||
}
|
||||
return _capacity;
|
||||
}
|
||||
bool try_pop(T& value) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
if (_capacity && !_queue.empty()) {
|
||||
value = std::move(_queue.front());
|
||||
_queue.pop();
|
||||
return true;
|
||||
} else {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
void set_capacity(std::size_t newCapacity) {
|
||||
std::lock_guard<std::mutex> lock(_mutex);
|
||||
_capacity = newCapacity;
|
||||
}
|
||||
|
||||
protected:
|
||||
std::queue<T> _queue;
|
||||
std::mutex _mutex;
|
||||
bool _capacity = false;
|
||||
};
|
||||
#endif
|
||||
|
||||
class MultiDeviceExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault,
|
||||
public InferenceEngine::ITaskExecutor {
|
||||
public:
|
||||
@ -148,7 +83,7 @@ public:
|
||||
InferenceEngine::Task _task;
|
||||
std::exception_ptr _exceptionPtr = nullptr;
|
||||
};
|
||||
using NotBusyWorkerRequests = ThreadSafeBoundedQueue<WorkerInferRequest*>;
|
||||
using NotBusyWorkerRequests = InferenceEngine::ThreadSafeBoundedQueue<WorkerInferRequest*>;
|
||||
|
||||
explicit MultiDeviceExecutableNetwork(const DeviceMap<InferenceEngine::SoExecutableNetworkInternal>& networksPerDevice,
|
||||
const std::vector<DeviceInformation>& networkDevices,
|
||||
@ -186,8 +121,8 @@ public:
|
||||
std::vector<DeviceInformation> _devicePriorities;
|
||||
const std::vector<DeviceInformation> _devicePrioritiesInitial;
|
||||
DeviceMap<InferenceEngine::SoExecutableNetworkInternal> _networksPerDevice;
|
||||
ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
|
||||
DeviceMap<std::unique_ptr<ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
|
||||
InferenceEngine::ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
|
||||
DeviceMap<std::unique_ptr<InferenceEngine::ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
|
||||
DeviceMap<NotBusyWorkerRequests> _idleWorkerRequests;
|
||||
DeviceMap<std::vector<WorkerInferRequest>> _workerRequests;
|
||||
std::unordered_map<std::string, InferenceEngine::Parameter> _config;
|
||||
@ -217,6 +152,7 @@ private:
|
||||
std::promise<void> _firstLoadPromise;
|
||||
mutable AutoLoadContext _loadContext[CONTEXTNUM];
|
||||
mutable std::mutex _confMutex;
|
||||
const InferenceEngine::CNNNetwork _network;
|
||||
};
|
||||
|
||||
} // namespace MultiDevicePlugin
|
||||
|
src/plugins/auto_batch/CMakeLists.txt (new file, 20 lines)
@ -0,0 +1,20 @@
|
||||
# Copyright (C) 2018-2021 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
#
|
||||
|
||||
set(TARGET_NAME "ov_auto_batch_plugin")
|
||||
|
||||
file(GLOB SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
|
||||
|
||||
file(GLOB HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
|
||||
|
||||
ie_add_plugin(NAME ${TARGET_NAME}
|
||||
DEVICE_NAME "BATCH"
|
||||
SOURCES ${SOURCES} ${HEADERS}
|
||||
VERSION_DEFINES_FOR auto_batch.cpp ADD_CLANG_FORMAT)
|
||||
|
||||
target_link_libraries(${TARGET_NAME} PRIVATE Threads::Threads)
|
||||
|
||||
ie_add_api_validator_post_build_step(TARGET ${TARGET_NAME})
|
||||
|
||||
set_target_properties(${TARGET_NAME} PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE ${ENABLE_LTO})
|
src/plugins/auto_batch/auto_batch.cpp (new file, 731 lines)
@ -0,0 +1,731 @@
|
||||
// Copyright (C) 2018-2021 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
///////////////////////////////////////////////////////////////////////////////////////////////////
|
||||
#include "auto_batch.hpp"
|
||||
|
||||
#include <cpp_interfaces/interface/ie_internal_plugin_config.hpp>
|
||||
#include <ie_icore.hpp>
|
||||
#include <ie_ngraph_utils.hpp>
|
||||
#include <ie_performance_hints.hpp>
|
||||
#include <iostream>
|
||||
#include <map>
|
||||
#include <memory>
|
||||
#include <string>
|
||||
#include <unordered_map>
|
||||
#include <unordered_set>
|
||||
#include <utility>
|
||||
#include <vector>
|
||||
|
||||
namespace AutoBatchPlugin {
|
||||
using namespace InferenceEngine;
|
||||
|
||||
std::vector<std::string> supported_configKeys = {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CONFIG_KEY(AUTO_BATCH_TIMEOUT)};
|
||||
|
||||
template <Precision::ePrecision precision>
|
||||
Blob::Ptr create_shared_blob_on_top_of_batched_blob(Blob::Ptr batched_blob, size_t batch_id, size_t batch_num) {
|
||||
typedef typename PrecisionTrait<precision>::value_type TYPE;
|
||||
typedef typename std::add_pointer<TYPE>::type TYPEPTR;
|
||||
auto ptr = batched_blob->buffer().as<TYPEPTR>();
|
||||
auto sizePerBatch = batched_blob->size() / batch_num;
|
||||
auto layout = batched_blob->getTensorDesc().getLayout();
|
||||
SizeVector dims = batched_blob->getTensorDesc().getDims();
|
||||
// the below code is a placeholder for the WIP (22.1) functionality
|
||||
// that will check the reshaping by the batch is robust (CVS-51744)
|
||||
if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
|
||||
layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
|
||||
layout == InferenceEngine::Layout::NDHWC) {
|
||||
dims[0] = 1;
|
||||
assert(batched_blob->getTensorDesc().getPrecision() == precision);
|
||||
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()},
|
||||
ptr + sizePerBatch * batch_id,
|
||||
sizePerBatch);
|
||||
} else {
|
||||
// same blob for all requests (e.g. constants)
|
||||
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()}, ptr);
|
||||
}
|
||||
}
|
||||
|
||||
// ------------------------------AutoBatchInferRequest----------------------------
|
||||
AutoBatchInferRequest::AutoBatchInferRequest(const InputsDataMap& networkInputs,
|
||||
const OutputsDataMap& networkOutputs,
|
||||
AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
|
||||
int batch_id,
|
||||
int num_batch,
|
||||
bool needPerfCounters)
|
||||
: IInferRequestInternal(networkInputs, networkOutputs),
|
||||
_myBatchedRequestWrapper(workerRequestPtr),
|
||||
_needPerfCounters(needPerfCounters),
|
||||
_batchId(batch_id),
|
||||
_batchSize(num_batch) {
|
||||
// Allocate all input blobs
|
||||
for (const auto& it : networkInputs) {
|
||||
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
|
||||
Blob::Ptr res;
|
||||
switch (it.second->getTensorDesc().getPrecision()) {
|
||||
case InferenceEngine::Precision::FP32:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
case InferenceEngine::Precision::I32:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
case InferenceEngine::Precision::I8:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
case InferenceEngine::Precision::U16:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
|
||||
case InferenceEngine::Precision::I16:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
|
||||
break;
|
||||
case InferenceEngine::Precision::U8:
|
||||
case InferenceEngine::Precision::BOOL:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
default:
|
||||
IE_THROW() << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
|
||||
}
|
||||
_inputs[it.first] = res;
|
||||
}
|
||||
// Allocate all output blobs
|
||||
for (const auto& it : networkOutputs) {
|
||||
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
|
||||
Blob::Ptr res;
|
||||
switch (it.second->getTensorDesc().getPrecision()) {
|
||||
case InferenceEngine::Precision::FP32:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
case InferenceEngine::Precision::I32:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
case InferenceEngine::Precision::I8:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
case InferenceEngine::Precision::U16:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
|
||||
case InferenceEngine::Precision::I16:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
|
||||
break;
|
||||
case InferenceEngine::Precision::U8:
|
||||
case InferenceEngine::Precision::BOOL:
|
||||
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
|
||||
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||
batch_id,
|
||||
num_batch);
|
||||
break;
|
||||
default:
|
||||
IE_THROW(NotImplemented) << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
|
||||
}
|
||||
_outputs[it.first] = res;
|
||||
}
|
||||
}
|
||||
|
||||
void AutoBatchInferRequest::SetBlobsToAnotherRequest(SoIInferRequestInternal& req) {
|
||||
for (const auto& it : _networkInputs) {
|
||||
auto& name = it.first;
|
||||
// this request is already in BUSY state, so using the internal functions safely
|
||||
auto blob = GetBlob(name);
|
||||
if (req->GetBlob(name) != blob)
|
||||
req->SetBlob(name, blob);
|
||||
}
|
||||
for (const auto& it : _networkOutputs) {
|
||||
auto& name = it.first;
|
||||
// this request is already in BUSY state, so using the internal functions safely
|
||||
auto blob = GetBlob(name);
|
||||
if (req->GetBlob(name) != blob)
|
||||
req->SetBlob(name, blob);
|
||||
}
|
||||
}
|
||||
|
||||
void AutoBatchInferRequest::CopyInputsIfNeeded() {
|
||||
for (const auto& it : _networkInputs) {
|
||||
auto& name = it.first;
|
||||
// this request is already in BUSY state, so using the internal functions safely
|
||||
CopyBlobIfNeeded(GetBlob(name), _myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), true);
|
||||
}
|
||||
}
|
||||
|
||||
void AutoBatchInferRequest::CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src,
|
||||
InferenceEngine::Blob::Ptr dst,
|
||||
bool bInput) {
|
||||
auto bufferDst = dst->buffer();
|
||||
auto ptrDst = bufferDst.as<char*>();
|
||||
auto bufferSrc = src->cbuffer();
|
||||
auto ptrSrc = bufferSrc.as<const char*>();
|
||||
ptrdiff_t szDst = dst->byteSize();
|
||||
ptrdiff_t szSrc = src->byteSize();
|
||||
if (bInput) {
|
||||
ptrdiff_t offset = szSrc != szDst ? _batchId * szDst / _batchSize : 0;
|
||||
if ((ptrDst + offset) == ptrSrc)
|
||||
return;
|
||||
else
|
||||
memcpy(ptrDst + offset, ptrSrc, szSrc);
|
||||
} else {
|
||||
ptrdiff_t offset = szSrc != szDst ? _batchId * szSrc / _batchSize : 0;
|
||||
if ((ptrSrc + offset) == ptrDst)
|
||||
return;
|
||||
else
|
||||
memcpy(ptrDst, ptrSrc + offset, szDst);
|
||||
}
|
||||
}
|
||||
|
||||
void AutoBatchInferRequest::CopyOutputsIfNeeded() {
|
||||
for (const auto& it : _networkOutputs) {
|
||||
auto& name = it.first;
|
||||
// this request is already in BUSY state, so using the internal functions safely
|
||||
CopyBlobIfNeeded(_myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), GetBlob(name), false);
|
||||
}
|
||||
}
|
||||
|
||||
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> AutoBatchInferRequest::GetPerformanceCounts() const {
|
||||
return _perfMap;
|
||||
}
|
||||
|
||||
AutoBatchAsyncInferRequest::AutoBatchAsyncInferRequest(
|
||||
const AutoBatchInferRequest::Ptr& inferRequest,
|
||||
const bool needPerfCounters,
|
||||
InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
|
||||
const ITaskExecutor::Ptr& callbackExecutor)
|
||||
: AsyncInferRequestThreadSafeDefault(inferRequest, nullptr, callbackExecutor),
|
||||
_inferRequestWithoutBatch(inferRequestWithoutBatch),
|
||||
_inferRequest{inferRequest} {
|
||||
// this executor starts the inference while the task (checking the result) is passed to the next stage
|
||||
struct ThisRequestExecutor : public ITaskExecutor {
|
||||
explicit ThisRequestExecutor(AutoBatchAsyncInferRequest* _this_) : _this{_this_} {}
|
||||
void run(Task task) override {
|
||||
auto& workerInferRequest = _this->_inferRequest->_myBatchedRequestWrapper;
|
||||
std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
|
||||
t.first = _this;
|
||||
t.second = std::move(task);
|
||||
workerInferRequest._tasks.push(t);
|
||||
// it is ok to call size() here as the queue only grows (and the bulk removal happens under the mutex)
|
||||
const int sz = workerInferRequest._tasks.size();
|
||||
if (sz == workerInferRequest._batchSize) {
|
||||
workerInferRequest._cond.notify_one();
|
||||
}
|
||||
};
|
||||
AutoBatchAsyncInferRequest* _this = nullptr;
|
||||
};
|
||||
_pipeline = {
|
||||
{/*TaskExecutor*/ std::make_shared<ThisRequestExecutor>(this), /*task*/ [this, needPerfCounters] {
|
||||
if (this->_inferRequest->_exceptionPtr) // if the exception happened in the batch1 fallback
|
||||
std::rethrow_exception(this->_inferRequest->_exceptionPtr);
|
||||
if (this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr) // when the batchN execution failed
|
||||
std::rethrow_exception(this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr);
|
||||
this->_inferRequest->CopyOutputsIfNeeded();
|
||||
}}};
|
||||
}
|
||||
|
||||
void AutoBatchAsyncInferRequest::Infer_ThreadUnsafe() {
|
||||
InferUsingAsync();
|
||||
}
|
||||
|
||||
AutoBatchAsyncInferRequest::~AutoBatchAsyncInferRequest() {
|
||||
StopAndWait();
|
||||
}
|
||||
|
||||
// ------------------------------AutoBatchExecutableNetwork----------------------------
|
||||
AutoBatchExecutableNetwork::AutoBatchExecutableNetwork(
|
||||
const InferenceEngine::SoExecutableNetworkInternal& networkWithBatch,
|
||||
const InferenceEngine::SoExecutableNetworkInternal& networkWithoutBatch,
|
||||
const DeviceInformation& networkDevice,
|
||||
const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
|
||||
const bool needPerfCounters)
|
||||
: InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr,
|
||||
std::make_shared<InferenceEngine::ImmediateExecutor>()),
|
||||
_network{networkWithBatch},
|
||||
_networkWithoutBatch{networkWithoutBatch},
|
||||
_config{config},
|
||||
_needPerfCounters{needPerfCounters} {
|
||||
// WA for gcc 4.8 ( fails compilation with member init-list)
|
||||
_device = networkDevice;
|
||||
auto time_out = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
|
||||
if (time_out != config.end())
|
||||
_timeOut = ParseTimeoutValue(time_out->second.as<std::string>());
|
||||
}
|
||||
|
||||
AutoBatchExecutableNetwork::~AutoBatchExecutableNetwork() {
|
||||
_terminate = true;
|
||||
for (auto w : _workerRequests) {
|
||||
w->_thread.join();
|
||||
}
|
||||
_workerRequests.clear();
|
||||
}
|
||||
|
||||
unsigned int AutoBatchExecutableNetwork::ParseTimeoutValue(const std::string& s) {
|
||||
auto val = std::stoi(s);
|
||||
if (val < 0)
|
||||
IE_THROW(ParameterMismatch) << "Value for the " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " should be unsigned int";
|
||||
return val;
|
||||
}
|
||||
|
||||
std::shared_ptr<InferenceEngine::RemoteContext> AutoBatchExecutableNetwork::GetContext() const {
|
||||
return _network->GetContext();
|
||||
}
|
||||
|
||||
InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequestImpl(
|
||||
InferenceEngine::InputsDataMap networkInputs,
|
||||
InferenceEngine::OutputsDataMap networkOutputs) {
|
||||
// todo : guard request creation from another thread/on-the-fly
|
||||
auto num = _numRequestsCreated++;
|
||||
auto batch_id = num % _device.batchForDevice;
|
||||
if (!batch_id) { // need new request
|
||||
_workerRequests.push_back(std::make_shared<WorkerInferRequest>());
|
||||
auto workerRequestPtr = _workerRequests.back();
|
||||
workerRequestPtr->_inferRequestBatched = {_network->CreateInferRequest(), _network._so};
|
||||
workerRequestPtr->_batchSize = _device.batchForDevice;
|
||||
workerRequestPtr->_completionTasks.resize(workerRequestPtr->_batchSize);
|
||||
workerRequestPtr->_inferRequestBatched->SetCallback(
|
||||
[workerRequestPtr, this](std::exception_ptr exceptionPtr) mutable {
|
||||
if (exceptionPtr)
|
||||
workerRequestPtr->_exceptionPtr = exceptionPtr;
|
||||
IE_ASSERT(workerRequestPtr->_completionTasks.size() == (size_t)workerRequestPtr->_batchSize);
|
||||
// notify the individual requests on the completion
|
||||
for (int c = 0; c < workerRequestPtr->_batchSize; c++) {
|
||||
workerRequestPtr->_completionTasks[c]();
|
||||
}
|
||||
// reset the timeout
|
||||
workerRequestPtr->_cond.notify_one();
|
||||
});
|
||||
|
||||
workerRequestPtr->_thread = std::thread([workerRequestPtr, this] {
|
||||
while (1) {
|
||||
std::cv_status status;
|
||||
{
|
||||
std::unique_lock<std::mutex> lock(workerRequestPtr->_mutex);
|
||||
status = workerRequestPtr->_cond.wait_for(lock, std::chrono::milliseconds(_timeOut));
|
||||
}
|
||||
if (_terminate) {
|
||||
break;
|
||||
} else {
|
||||
// as we pop the tasks from the queue only here
|
||||
// it is ok to call size() (as the _tasks can only grow in parallel)
|
||||
const int sz = workerRequestPtr->_tasks.size();
|
||||
if (sz == workerRequestPtr->_batchSize) {
|
||||
std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
|
||||
for (int n = 0; n < sz; n++) {
|
||||
IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
|
||||
workerRequestPtr->_completionTasks[n] = std::move(t.second);
|
||||
t.first->_inferRequest->CopyInputsIfNeeded();
|
||||
}
|
||||
workerRequestPtr->_inferRequestBatched->StartAsync();
|
||||
} else if ((status == std::cv_status::timeout) && sz) {
|
||||
// timeout to collect the batch is over, have to execute the requests in the batch1 mode
|
||||
std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
|
||||
// popping all tasks collected by the moment of the time-out and execute each with batch1
|
||||
std::atomic<int> arrived = {0};
|
||||
std::promise<void> all_completed;
|
||||
auto all_completed_future = all_completed.get_future();
|
||||
for (int n = 0; n < sz; n++) {
|
||||
IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
|
||||
t.first->_inferRequestWithoutBatch->SetCallback(
|
||||
[t, sz, &arrived, &all_completed](std::exception_ptr p) {
|
||||
if (p)
|
||||
t.first->_inferRequest->_exceptionPtr = p;
|
||||
t.second();
|
||||
if (sz == ++arrived)
|
||||
all_completed.set_value();
|
||||
});
|
||||
t.first->_inferRequest->SetBlobsToAnotherRequest(t.first->_inferRequestWithoutBatch);
|
||||
t.first->_inferRequestWithoutBatch->StartAsync();
|
||||
}
|
||||
all_completed_future.get();
|
||||
// now when all the tasks for this batch are completed, start waiting for the timeout again
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
}
|
||||
return std::make_shared<AutoBatchInferRequest>(networkInputs,
|
||||
networkOutputs,
|
||||
*_workerRequests.back(),
|
||||
batch_id,
|
||||
_device.batchForDevice,
|
||||
_needPerfCounters);
|
||||
}
|
||||
|
||||
InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequest() {
|
||||
auto syncRequestImpl = CreateInferRequestImpl(_networkInputs, _networkOutputs);
|
||||
syncRequestImpl->setPointerToExecutableNetworkInternal(shared_from_this());
|
||||
InferenceEngine::SoIInferRequestInternal inferRequestWithoutBatch = {_networkWithoutBatch->CreateInferRequest(),
|
||||
_networkWithoutBatch._so};
|
||||
return std::make_shared<AutoBatchAsyncInferRequest>(
|
||||
std::static_pointer_cast<AutoBatchInferRequest>(syncRequestImpl),
|
||||
_needPerfCounters,
|
||||
inferRequestWithoutBatch,
|
||||
_callbackExecutor);
|
||||
}
|
||||
|
||||
std::shared_ptr<ngraph::Function> AutoBatchExecutableNetwork::GetExecGraphInfo() {
|
||||
return _network->GetExecGraphInfo() ? _network->GetExecGraphInfo() : _networkWithoutBatch->GetExecGraphInfo();
|
||||
}
|
||||
|
||||
void AutoBatchExecutableNetwork::SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) {
|
||||
auto timeout = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
|
||||
if (timeout == config.end() || config.size() > 1) {
|
||||
IE_THROW() << "The only config that can be changed on the fly for the AutoBatching the is the "
|
||||
<< CONFIG_KEY(AUTO_BATCH_TIMEOUT);
|
||||
} else {
|
||||
_timeOut = ParseTimeoutValue(timeout->second.as<std::string>());
|
||||
}
|
||||
}
|
||||
|
||||
InferenceEngine::Parameter AutoBatchExecutableNetwork::GetConfig(const std::string& name) const {
|
||||
auto it = _config.find(name);
|
||||
if (it != _config.end()) {
|
||||
return it->second;
|
||||
} else {
|
||||
// find config key among networks config keys
|
||||
auto param = _network->GetMetric(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
|
||||
for (auto&& configKey : param.as<std::vector<std::string>>()) {
|
||||
if (configKey == name) {
|
||||
return _network->GetConfig(configKey);
|
||||
}
|
||||
}
|
||||
IE_THROW(NotFound) << name << " not found in the ExecutableNetwork config";
|
||||
}
|
||||
}
|
||||
|
||||
InferenceEngine::Parameter AutoBatchExecutableNetwork::GetMetric(const std::string& name) const {
|
||||
if (name == METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)) {
|
||||
auto reqs = 0;
|
||||
try {
|
||||
auto hint = _network->GetConfig(CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS)).as<std::string>();
|
||||
reqs = InferenceEngine::PerfHintsConfig::CheckPerformanceHintRequestValue(hint);
|
||||
if (!reqs) // no limitations from user, let's deduce the full blown #requests
|
||||
// (multiplied by the devices capabilities to run multiple <batched> requests for further perf)
|
||||
reqs = _device.batchForDevice *
|
||||
_network->GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
|
||||
} catch (const InferenceEngine::Exception& iie) {
|
||||
}
|
||||
reqs = std::max(reqs, _device.batchForDevice); // round up to the possible user's value
|
||||
IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, reqs);
|
||||
} else if (name == METRIC_KEY(NETWORK_NAME)) {
|
||||
IE_SET_METRIC_RETURN(NETWORK_NAME, _network->GetMetric(METRIC_KEY(NETWORK_NAME)).as<std::string>());
|
||||
} else if (name == METRIC_KEY(SUPPORTED_METRICS)) {
|
||||
IE_SET_METRIC_RETURN(SUPPORTED_METRICS,
|
||||
{METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS),
|
||||
METRIC_KEY(SUPPORTED_METRICS),
|
||||
METRIC_KEY(NETWORK_NAME),
|
||||
METRIC_KEY(SUPPORTED_CONFIG_KEYS)});
|
||||
} else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
|
||||
IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS,
|
||||
{CONFIG_KEY(AUTO_BATCH_TIMEOUT)}); // only timeout can be changed on the fly
|
||||
} else {
|
||||
IE_THROW() << "Unsupported Network metric: " << name;
|
||||
}
|
||||
}
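As a worked example of the OPTIMAL_NUMBER_OF_INFER_REQUESTS logic above, with hypothetical numbers (not part of the patch):

// device OPTIMAL_NUMBER_OF_INFER_REQUESTS = 4, batchForDevice = 8, no user hint  ->  reqs = 8 * 4 = 32
// user hint PERFORMANCE_HINT_NUM_REQUESTS = 2 with batchForDevice = 8            ->  reqs = max(2, 8) = 8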
|
||||
|
||||
// ------------------------------AutoBatchInferencePlugin----------------------------
|
||||
|
||||
namespace {
|
||||
|
||||
std::map<std::string, std::string> mergeConfigs(std::map<std::string, std::string> config,
|
||||
const std::map<std::string, std::string>& local) {
|
||||
for (auto&& kvp : local) {
|
||||
config[kvp.first] = kvp.second;
|
||||
}
|
||||
return config;
|
||||
}
|
||||
|
||||
} // namespace
|
||||
|
||||
std::map<std::string, std::string> AutoBatchInferencePlugin::GetSupportedConfig(
|
||||
const std::map<std::string, std::string>& config,
|
||||
const std::string& deviceName) const {
|
||||
std::vector<std::string> supportedConfigKeys = GetCore()->GetMetric(deviceName, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
|
||||
std::map<std::string, std::string> supportedConfig;
|
||||
for (auto&& key : supportedConfigKeys) {
|
||||
auto itKey = config.find(key);
|
||||
if (config.end() != itKey) {
|
||||
supportedConfig[key] = itKey->second;
|
||||
}
|
||||
}
|
||||
return supportedConfig;
|
||||
}
|
||||
|
||||
DeviceInformation AutoBatchInferencePlugin::ParseBatchDevice(const std::string& deviceWithBatch) {
|
||||
auto&& d = deviceWithBatch;
|
||||
auto openingBracket = d.find_first_of('(');
|
||||
auto closingBracket = d.find_first_of(')', openingBracket);
|
||||
auto deviceName = d.substr(0, openingBracket);
|
||||
|
||||
int batch = 1;
|
||||
if (closingBracket != std::string::npos && openingBracket < closingBracket) {
|
||||
batch = std::stol(d.substr(openingBracket + 1, closingBracket - 1));
|
||||
|
||||
if (batch <= 0) {
|
||||
IE_THROW() << "Batch value for '" << deviceName << "' must be > 0, while " << batch << "is passed";
|
||||
}
|
||||
}
|
||||
return {deviceName, {{}}, batch};
|
||||
}
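A standalone sketch of the same "DEVICE(N)" parsing, with illustrative names, so the expected results are easy to check (this mirrors the member function above but is not part of the patch):

#include <cassert>
#include <string>

struct ParsedBatchDevice { std::string deviceName; int batch; };

ParsedBatchDevice parse_batch_device(const std::string& d) {
    auto open = d.find_first_of('(');
    auto close = d.find_first_of(')', open);
    int batch = 1;  // "BATCH:GPU" without brackets defaults to batch 1
    if (close != std::string::npos && open < close)
        batch = std::stoi(d.substr(open + 1, close - open - 1));
    return {d.substr(0, open), batch};
}

int main() {
    assert(parse_batch_device("GPU(4)").batch == 4);        // explicit batch, e.g. BATCH:GPU(4)
    assert(parse_batch_device("CPU").deviceName == "CPU");   // no brackets -> batch defaults to 1
    assert(parse_batch_device("CPU").batch == 1);
}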
|
||||
|
||||
DeviceInformation AutoBatchInferencePlugin::ParseMetaDevice(const std::string& devicesBatchCfg,
|
||||
const std::map<std::string, std::string>& config) const {
|
||||
auto getDeviceConfig = [&](const DeviceName& deviceWithID) {
|
||||
DeviceIDParser deviceParser(deviceWithID);
|
||||
std::string deviceName = deviceParser.getDeviceName();
|
||||
std::map<std::string, std::string> tconfig = mergeConfigs(_config, config);
|
||||
|
||||
// set device ID if any
|
||||
std::string deviceIDLocal = deviceParser.getDeviceID();
|
||||
if (!deviceIDLocal.empty()) {
|
||||
tconfig[PluginConfigParams::KEY_DEVICE_ID] = deviceIDLocal;
|
||||
}
|
||||
|
||||
return GetSupportedConfig(tconfig, deviceName);
|
||||
};
|
||||
|
||||
auto metaDevice = ParseBatchDevice(devicesBatchCfg);
|
||||
metaDevice.config = getDeviceConfig(metaDevice.deviceName);
|
||||
|
||||
auto cfg = config;
|
||||
// check that no irrelevant config-keys left
|
||||
for (auto k : config) {
|
||||
const auto& name = k.first;
|
||||
auto found_in_supported_cfg = std::find(supported_configKeys.begin(), supported_configKeys.end(), k.first);
|
||||
auto found_in_device_cfg = metaDevice.config.find(k.first);
|
||||
if (found_in_device_cfg == metaDevice.config.end() && found_in_supported_cfg == supported_configKeys.end()) {
|
||||
IE_THROW() << "Unsupported config key: " << name;
|
||||
}
|
||||
}
|
||||
return metaDevice;
|
||||
}
|
||||
|
||||
RemoteContext::Ptr AutoBatchInferencePlugin::CreateContext(const InferenceEngine::ParamMap& config) {
|
||||
auto cfg = config;
|
||||
auto it = cfg.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
|
||||
if (it == cfg.end())
|
||||
IE_THROW() << "Value for KEY_AUTO_BATCH is not set";
|
||||
|
||||
auto val = it->second;
|
||||
auto metaDevice = ParseMetaDevice(val, std::map<std::string, std::string>());
|
||||
cfg.erase(it);
|
||||
return GetCore()->CreateContext(metaDevice.deviceName, cfg);
|
||||
}
|
||||
|
||||
Parameter AutoBatchInferencePlugin::GetConfig(const std::string& name,
|
||||
const std::map<std::string, Parameter>& options) const {
|
||||
if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), name)) {
|
||||
auto it = _config.find(name);
|
||||
if (it == _config.end()) {
|
||||
IE_THROW() << "Value for " << name << " is not set";
|
||||
} else {
|
||||
return {it->second};
|
||||
}
|
||||
} else {
|
||||
IE_THROW() << "Unsupported config key: " << name;
|
||||
}
|
||||
}
|
||||
|
||||
void AutoBatchInferencePlugin::CheckConfig(const std::map<std::string, std::string>& config) {
|
||||
for (auto&& kvp : config) {
|
||||
const auto name = kvp.first;
|
||||
const auto val = kvp.second;
|
||||
if (supported_configKeys.end() == std::find(supported_configKeys.begin(), supported_configKeys.end(), name))
|
||||
IE_THROW() << "Unsupported config key: " << name;
|
||||
if (name == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
|
||||
ParseBatchDevice(val);
|
||||
} else if (name == CONFIG_KEY(AUTO_BATCH_TIMEOUT)) {
|
||||
try {
|
||||
auto t = std::stoi(val);
|
||||
if (t < 0)
|
||||
IE_THROW(ParameterMismatch);
|
||||
} catch (const std::exception& e) {
|
||||
IE_THROW(ParameterMismatch)
|
||||
<< " Expecting unsigned int value for " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " got " << val;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void AutoBatchInferencePlugin::SetConfig(const std::map<std::string, std::string>& config) {
|
||||
CheckConfig(config);
|
||||
for (auto&& kvp : config) {
|
||||
_config[kvp.first] = kvp.second;
|
||||
}
|
||||
}
|
||||
|
||||
static const Version version = {{2, 1}, CI_BUILD_NUMBER, "AutoBatchPlugin"};
|
||||
IE_DEFINE_PLUGIN_CREATE_FUNCTION(AutoBatchInferencePlugin, version)
|
||||
|
||||
AutoBatchInferencePlugin::AutoBatchInferencePlugin() {
|
||||
_pluginName = "BATCH";
|
||||
}
|
||||
|
||||
InferenceEngine::Parameter AutoBatchInferencePlugin::GetMetric(
|
||||
const std::string& name,
|
||||
const std::map<std::string, InferenceEngine::Parameter>& options) const {
|
||||
if (name == METRIC_KEY(SUPPORTED_METRICS)) {
|
||||
std::vector<std::string> metrics;
|
||||
metrics.push_back(METRIC_KEY(SUPPORTED_METRICS));
|
||||
metrics.push_back(METRIC_KEY(FULL_DEVICE_NAME));
|
||||
metrics.push_back(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
|
||||
IE_SET_METRIC_RETURN(SUPPORTED_METRICS, metrics);
|
||||
} else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
|
||||
IE_SET_METRIC_RETURN(FULL_DEVICE_NAME, _pluginName);
|
||||
} else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
|
||||
IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, supported_configKeys);
|
||||
} else {
|
||||
IE_THROW(NotFound) << "Unsupported metric key " << name;
|
||||
}
|
||||
}
|
||||
|
||||
IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
|
||||
const InferenceEngine::CNNNetwork& network,
|
||||
const std::map<std::string, std::string>& config) {
|
||||
return LoadNetworkImpl(network, nullptr, config);
|
||||
}
|
||||
|
||||
InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadNetworkImpl(
|
||||
const InferenceEngine::CNNNetwork& network,
|
||||
const std::shared_ptr<InferenceEngine::RemoteContext> ctx,
|
||||
const std::map<std::string, std::string>& config) {
|
||||
if (GetCore() == nullptr) {
|
||||
IE_THROW() << "Please, work with MULTI device via InferencEngine::Core object";
|
||||
}
|
||||
|
||||
auto fullConfig = mergeConfigs(_config, config);
|
||||
auto device_batch = fullConfig.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
|
||||
if (device_batch == fullConfig.end()) {
|
||||
IE_THROW() << "KEY_AUTO_BATCH key is not set for BATCH device";
|
||||
}
|
||||
|
||||
auto metaDevice = ParseMetaDevice(device_batch->second, fullConfig);
|
||||
const auto& deviceName = metaDevice.deviceName;
|
||||
const auto& deviceConfig = metaDevice.config;
|
||||
const auto perfConfig = fullConfig.find(PluginConfigParams::KEY_PERF_COUNT);
|
||||
const bool enablePerfCounters = (fullConfig.end() != perfConfig) && (perfConfig->second == PluginConfigParams::YES);
|
||||
|
||||
auto report_footprint = [](std::shared_ptr<ICore> pCore, std::string device) -> size_t {
|
||||
size_t footprint = 0;
|
||||
// TODO: use the per-network metric (22.2) rather than plugin-level
|
||||
auto stats = pCore->GetMetric(device, GPU_METRIC_KEY(MEMORY_STATISTICS)).as<std::map<std::string, uint64_t>>();
|
||||
for (auto s : stats)
|
||||
if (s.first.find("_current") != std::string::npos)
|
||||
footprint += s.second;
|
||||
return footprint;
|
||||
};
|
||||
|
||||
size_t batch1_footprint = 0;
|
||||
if (deviceName.find("GPU") != std::string::npos)
|
||||
batch1_footprint = report_footprint(GetCore(), deviceName);
|
||||
auto executableNetworkWithoutBatch = ctx ? GetCore()->LoadNetwork(network, ctx, deviceConfig)
|
||||
: GetCore()->LoadNetwork(network, deviceName, deviceConfig);
|
||||
if (deviceName.find("GPU") != std::string::npos) {
|
||||
batch1_footprint = report_footprint(GetCore(), deviceName) - batch1_footprint;
|
||||
if (batch1_footprint) {
|
||||
const uint64_t total_mem = GetCore()->GetMetric(deviceName, GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
|
||||
const int estimated_batch = (total_mem - batch1_footprint) / batch1_footprint;
|
||||
int closest = pow(2, floor(log(estimated_batch) / log(2)));
|
||||
closest = std::max(1, closest);
|
||||
metaDevice.batchForDevice = std::min(metaDevice.batchForDevice, closest);
|
||||
}
|
||||
}
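// Illustrative arithmetic with hypothetical numbers (not part of the patch): if DEVICE_TOTAL_MEM_SIZE is
// 8192 MiB and loading the batch-1 network grew the GPU memory statistics by ~600 MiB, then
// estimated_batch = (8192 - 600) / 600 = 12, the closest power of two below it is 8, and the requested
// batch is therefore capped at min(batchForDevice, 8).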
|
||||
// auto-batch settings
|
||||
std::unordered_map<std::string, InferenceEngine::Parameter> networkConfig;
|
||||
for (auto c : fullConfig) {
|
||||
if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), c.first))
|
||||
networkConfig.insert(c);
|
||||
}
|
||||
|
||||
InferenceEngine::SoExecutableNetworkInternal executableNetworkWithBatch;
|
||||
if (metaDevice.batchForDevice > 1) {
|
||||
try {
|
||||
CNNNetwork clonedNetwork(InferenceEngine::details::cloneNetwork(network));
|
||||
const InputsDataMap inputInfo = clonedNetwork.getInputsInfo();
|
||||
ICNNNetwork::InputShapes shapes = clonedNetwork.getInputShapes();
|
||||
for (const InputsDataMap::value_type& item : inputInfo) {
|
||||
auto layout = item.second->getTensorDesc().getLayout();
|
||||
// the below code is a placeholder for the WIP (22.1) functionality
|
||||
// that will check the reshaping by the batch is robust (CVS-51744)
|
||||
if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
|
||||
layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
|
||||
layout == InferenceEngine::Layout::NDHWC) {
|
||||
assert(1 == shapes[item.first][0]); // do not reshape/re-batch originally batched networks
|
||||
shapes[item.first][0] = metaDevice.batchForDevice;
|
||||
}
|
||||
}
|
||||
clonedNetwork.reshape(shapes);
|
||||
executableNetworkWithBatch =
|
||||
ctx ? GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, ctx, deviceConfig)
|
||||
: GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, deviceName, deviceConfig);
|
||||
} catch (...) {
|
||||
executableNetworkWithBatch = {nullptr, nullptr};
|
||||
}
|
||||
}
|
||||
|
||||
if (!executableNetworkWithBatch) {
|
||||
executableNetworkWithBatch = executableNetworkWithoutBatch;
|
||||
metaDevice.batchForDevice = 1;
|
||||
}
|
||||
|
||||
return std::make_shared<AutoBatchExecutableNetwork>(executableNetworkWithBatch,
|
||||
executableNetworkWithoutBatch,
|
||||
metaDevice,
|
||||
networkConfig,
|
||||
enablePerfCounters);
|
||||
}
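For context, a minimal usage sketch through the conventional API; the model path is hypothetical and the snippet is illustrative rather than part of the patch:

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // hypothetical model path
    // explicit batch of 4 on GPU via the BATCH meta device (same syntax as BATCH:GPU(4) above);
    // plain "BATCH:GPU" lets the plugin deduce the batch (falling back to 1 when it cannot)
    auto executable = core.LoadNetwork(network, "BATCH:GPU(4)");
    auto request = executable.CreateInferRequest();
    request.Infer();
    return 0;
}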
|
||||
|
||||
InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
|
||||
const InferenceEngine::CNNNetwork& network,
|
||||
const std::shared_ptr<InferenceEngine::RemoteContext>& context,
|
||||
const std::map<std::string, std::string>& config) {
|
||||
return LoadNetworkImpl(network, context, config);
|
||||
}
|
||||
|
||||
InferenceEngine::QueryNetworkResult AutoBatchInferencePlugin::QueryNetwork(
|
||||
const InferenceEngine::CNNNetwork& network,
|
||||
const std::map<std::string, std::string>& config) const {
|
||||
auto cfg = config;
|
||||
for (auto c : cfg) {
|
||||
if (c.first == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
|
||||
auto val = c.second;
|
||||
cfg.erase(c.first);
|
||||
auto metaDevice = ParseMetaDevice(val, cfg);
|
||||
return GetCore()->QueryNetwork(network, metaDevice.deviceName, cfg);
|
||||
}
|
||||
}
|
||||
IE_THROW() << "Value for KEY_AUTO_BATCH is not set";
|
||||
}
|
||||
} // namespace AutoBatchPlugin
|
src/plugins/auto_batch/auto_batch.hpp (new file, 159 lines)
@ -0,0 +1,159 @@
|
||||
// Copyright (C) 2018-2021 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
///////////////////////////////////////////////////////////////////////////////////////////////////
|
||||
#pragma once
|
||||
|
||||
#include <atomic>
|
||||
#include <map>
|
||||
#include <mutex>
|
||||
#include <string>
|
||||
#include <unordered_map>
|
||||
#include <utility>
|
||||
#include <vector>
|
||||
|
||||
#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
|
||||
#include "cpp_interfaces/impl/ie_infer_async_request_thread_safe_default.hpp"
|
||||
#include "cpp_interfaces/interface/ie_iplugin_internal.hpp"
|
||||
#include "ie_metric_helpers.hpp"
|
||||
#include "threading/ie_thread_safe_containers.hpp"
|
||||
|
||||
namespace AutoBatchPlugin {
|
||||
|
||||
using DeviceName = std::string;
|
||||
|
||||
struct DeviceInformation {
|
||||
DeviceName deviceName;
|
||||
std::map<std::string, std::string> config;
|
||||
int batchForDevice;
|
||||
};
|
||||
|
||||
class AutoBatchAsyncInferRequest;
|
||||
class AutoBatchExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
|
||||
public:
|
||||
using Ptr = std::shared_ptr<AutoBatchExecutableNetwork>;
|
||||
struct WorkerInferRequest {
|
||||
using Ptr = std::shared_ptr<WorkerInferRequest>;
|
||||
InferenceEngine::SoIInferRequestInternal _inferRequestBatched;
|
||||
int _batchSize;
|
||||
InferenceEngine::ThreadSafeQueueWithSize<std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task>> _tasks;
|
||||
std::vector<InferenceEngine::Task> _completionTasks;
|
||||
std::thread _thread;
|
||||
std::condition_variable _cond;
|
||||
std::mutex _mutex;
|
||||
std::exception_ptr _exceptionPtr;
|
||||
};
|
||||
|
||||
explicit AutoBatchExecutableNetwork(
|
||||
const InferenceEngine::SoExecutableNetworkInternal& networkForDevice,
|
||||
const InferenceEngine::SoExecutableNetworkInternal& networkForDeviceWithoutBatch,
|
||||
const DeviceInformation& networkDevices,
|
||||
const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
|
||||
const bool needPerfCounters = false);
|
||||
|
||||
void SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) override;
|
||||
InferenceEngine::Parameter GetConfig(const std::string& name) const override;
|
||||
InferenceEngine::Parameter GetMetric(const std::string& name) const override;
|
||||
InferenceEngine::IInferRequestInternal::Ptr CreateInferRequest() override;
|
||||
InferenceEngine::IInferRequestInternal::Ptr CreateInferRequestImpl(
|
||||
InferenceEngine::InputsDataMap networkInputs,
|
||||
InferenceEngine::OutputsDataMap networkOutputs) override;
|
||||
std::shared_ptr<InferenceEngine::RemoteContext> GetContext() const override;
|
||||
std::shared_ptr<ngraph::Function> GetExecGraphInfo() override;
|
||||
virtual ~AutoBatchExecutableNetwork();
|
||||
|
||||
protected:
|
||||
static unsigned int ParseTimeoutValue(const std::string&);
|
||||
std::atomic_bool _terminate = {false};
|
||||
DeviceInformation _device;
|
||||
InferenceEngine::SoExecutableNetworkInternal _network;
|
||||
InferenceEngine::SoExecutableNetworkInternal _networkWithoutBatch;
|
||||
std::vector<WorkerInferRequest::Ptr> _workerRequests;
|
||||
std::unordered_map<std::string, InferenceEngine::Parameter> _config;
|
||||
bool _needPerfCounters = false;
|
||||
std::atomic_size_t _numRequestsCreated = {0};
|
||||
std::atomic_int _timeOut = {1000}; // in ms
|
||||
};
|
||||
|
||||
class AutoBatchInferRequest : public InferenceEngine::IInferRequestInternal {
|
||||
public:
|
||||
using Ptr = std::shared_ptr<AutoBatchInferRequest>;
|
||||
explicit AutoBatchInferRequest(const InferenceEngine::InputsDataMap& networkInputs,
|
||||
const InferenceEngine::OutputsDataMap& networkOutputs,
|
||||
AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
|
||||
int batch_id,
|
||||
int num_batch,
|
||||
bool _needPerfCounters = false);
|
||||
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> GetPerformanceCounts() const override;
|
||||
|
||||
// Batch-Device impl specific: sets the data (blobs from the device request to the batched device request)
|
||||
void SetBlobsToAnotherRequest(InferenceEngine::SoIInferRequestInternal& req);
|
||||
void CopyInputsIfNeeded();
|
||||
void CopyOutputsIfNeeded();
|
||||
AutoBatchExecutableNetwork::WorkerInferRequest& _myBatchedRequestWrapper;
|
||||
std::exception_ptr _exceptionPtr;
|
||||
|
||||
protected:
|
||||
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> _perfMap;
|
||||
bool _needPerfCounters = false;
|
||||
void CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src, InferenceEngine::Blob::Ptr dst, bool bInput);
|
||||
size_t _batchId;
|
||||
size_t _batchSize;
|
||||
};
|
||||
|
||||
class AutoBatchAsyncInferRequest : public InferenceEngine::AsyncInferRequestThreadSafeDefault {
|
||||
public:
|
||||
using Ptr = std::shared_ptr<AutoBatchAsyncInferRequest>;
|
||||
|
||||
explicit AutoBatchAsyncInferRequest(const AutoBatchInferRequest::Ptr& inferRequest,
|
||||
const bool needPerfCounters,
|
||||
InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
|
||||
const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor);
|
||||
void Infer_ThreadUnsafe() override;
|
||||
virtual ~AutoBatchAsyncInferRequest();
|
||||
|
||||
InferenceEngine::SoIInferRequestInternal _inferRequestWithoutBatch;
|
||||
AutoBatchInferRequest::Ptr _inferRequest;
|
||||
};
|
||||
|
||||
class AutoBatchInferencePlugin : public InferenceEngine::IInferencePlugin {
|
||||
public:
|
||||
AutoBatchInferencePlugin();
|
||||
virtual ~AutoBatchInferencePlugin() = default;
|
||||
InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
|
||||
const InferenceEngine::CNNNetwork& network,
|
||||
const std::map<std::string, std::string>& config) override;
|
||||
InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
|
||||
const InferenceEngine::CNNNetwork& network,
|
||||
const std::shared_ptr<InferenceEngine::RemoteContext>& context,
|
||||
const std::map<std::string, std::string>& config) override;
|
||||
|
||||
void SetConfig(const std::map<std::string, std::string>& config) override;
|
||||
void CheckConfig(const std::map<std::string, std::string>& config);
|
||||
|
||||
InferenceEngine::Parameter GetConfig(
|
||||
const std::string& name,
|
||||
const std::map<std::string, InferenceEngine::Parameter>& options) const override;
|
||||
InferenceEngine::QueryNetworkResult QueryNetwork(const InferenceEngine::CNNNetwork& network,
|
||||
const std::map<std::string, std::string>& config) const override;
|
||||
InferenceEngine::Parameter GetMetric(
|
||||
const std::string& name,
|
||||
const std::map<std::string, InferenceEngine::Parameter>& options) const override;
|
||||
InferenceEngine::RemoteContext::Ptr CreateContext(const InferenceEngine::ParamMap&) override;
|
||||
|
||||
protected:
|
||||
DeviceInformation ParseMetaDevice(const std::string& devicesBatchCfg,
|
||||
const std::map<std::string, std::string>& config) const;
|
||||
|
||||
std::map<std::string, std::string> GetSupportedConfig(const std::map<std::string, std::string>& config,
|
||||
const DeviceName& deviceName) const;
|
||||
static DeviceInformation ParseBatchDevice(const std::string& deviceWithBatch);
|
||||
|
||||
InferenceEngine::IExecutableNetworkInternal::Ptr LoadNetworkImpl(
|
||||
const InferenceEngine::CNNNetwork& network,
|
||||
const std::shared_ptr<InferenceEngine::RemoteContext> context,
|
||||
const std::map<std::string, std::string>& config);
|
||||
};
|
||||
|
||||
} // namespace AutoBatchPlugin
|
@ -609,11 +609,9 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network, const std
|
||||
// the more "capable" the CPU in general, the more streams we may want to keep to keep it utilized
|
||||
const float memThresholdAssumeLimitedForISA = ov::MemBandwidthPressure::LIMITED/isaSpecificThreshold;
|
||||
const float L2_cache_size = mkldnn::utils::get_cache_size(2 /*level*/, true /*per core */);
|
||||
const float L3_cache_size = mkldnn::utils::get_cache_size(3, false);
|
||||
ov::MemBandwidthPressure networkToleranceForLowCache = ov::MemBandwidthPressureTolerance(
|
||||
clonedNetwork.getFunction(),
|
||||
L2_cache_size, L3_cache_size,
|
||||
memThresholdAssumeLimitedForISA);
|
||||
L2_cache_size, memThresholdAssumeLimitedForISA);
|
||||
// num of phys CPU cores (most aggressive value for #streams)
|
||||
const auto num_cores = getNumberOfCPUCores();
|
||||
// less aggressive
|
||||
|
@ -28,6 +28,7 @@
|
||||
|
||||
#include "intel_gpu/runtime/device_query.hpp"
|
||||
#include "intel_gpu/runtime/debug_configuration.hpp"
|
||||
#include <performance_heuristics.hpp>
|
||||
#ifdef __linux__
|
||||
# include <dlfcn.h>
|
||||
#endif
|
||||
@ -681,6 +682,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
|
||||
metrics.push_back(METRIC_KEY(RANGE_FOR_STREAMS));
|
||||
metrics.push_back(METRIC_KEY(DEVICE_TYPE));
|
||||
metrics.push_back(METRIC_KEY(DEVICE_GOPS));
|
||||
metrics.push_back(METRIC_KEY(OPTIMAL_BATCH_SIZE));
|
||||
metrics.push_back(GPU_METRIC_KEY(MAX_BATCH_SIZE));
|
||||
metrics.push_back(GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
|
||||
metrics.push_back(GPU_METRIC_KEY(UARCH_VERSION));
|
||||
@ -716,6 +718,76 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
|
||||
<< static_cast<int>(device_info.gfx_ver.revision);
|
||||
}
|
||||
IE_SET_METRIC_RETURN(GPU_UARCH_VERSION, s.str());
|
||||
} else if (name == METRIC_KEY(OPTIMAL_BATCH_SIZE)) {
|
||||
auto next_pow_of_2 = [] (float x) {
|
||||
return pow(2, ceil(log(x)/log(2)));
|
||||
};
|
||||
auto closest_pow_of_2 = [] (float x) {
|
||||
return pow(2, floor(log(x)/log(2)));
|
||||
};
|
||||
auto model_param = options.find("MODEL_PTR");
|
||||
if (model_param == options.end()) {
|
||||
GPU_DEBUG_IF(debug_config->verbose >= 1) {
|
||||
GPU_DEBUG_COUT << "[GPU_OPTIMAL_BATCH_SIZE] MODELS_PTR is not set: return 1" << std::endl;
|
||||
}
|
||||
IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, static_cast<unsigned int>(1));
|
||||
}
|
||||
std::shared_ptr<ngraph::Function> model;
|
||||
try {
|
||||
model = model_param->second.as<std::shared_ptr<ngraph::Function>>();
|
||||
} catch (...) {
|
||||
IE_THROW() << "[GPU_OPTIMAL_BATCH_SIZE] MODEL_PTR should be std::shared_ptr<ngraph::Function> type";
|
||||
}
|
||||
GPU_DEBUG_IF(debug_config->verbose >= 1) {
|
||||
GPU_DEBUG_COUT << "DEVICE_INFO:"
|
||||
<< "gfx_version.major, " << device_info.gfx_ver.major
|
||||
<< "gfx_version.minor " << std::to_string(device_info.gfx_ver.minor) << std::endl;
|
||||
}
|
||||
static std::map<cldnn::gfx_version, size_t> gen_kbytes_per_bank = {
|
||||
{{12, 0, 0}, 480}, // TGL
|
||||
{{12, 1, 0}, 2048}, // DG1
|
||||
{{12, 5, 0}, 320},
|
||||
{{12, 7, 0}, 512},
|
||||
};
|
||||
size_t L3_cache_size = device_info.gfx_ver.major && (device_info.gfx_ver.major <= 9)
|
||||
? 768 * 1024 // Gen9
|
||||
: 2 * 768 * 1024; // reasonable default when no arch has been detected (e.g. due to an old driver version)
|
||||
cldnn::gfx_version gen = {device_info.gfx_ver.major, device_info.gfx_ver.minor, 0 /*ignore the revision*/};
|
||||
auto val = gen_kbytes_per_bank.find(gen);
|
||||
if (gen_kbytes_per_bank.end() != val) {
|
||||
auto kbytes_per_bank = val->second;
|
||||
auto num_banks_per_slice = device_info.num_sub_slices_per_slice > 4
|
||||
? next_pow_of_2(device_info.num_sub_slices_per_slice)
|
||||
: 2 * device_info.num_sub_slices_per_slice;
|
||||
L3_cache_size = kbytes_per_bank * 1024 * num_banks_per_slice * device_info.num_slices;
|
||||
GPU_DEBUG_IF(debug_config->verbose >= 1) {
|
||||
GPU_DEBUG_COUT << "DEVICE_INFO:"
|
||||
<< "num_slices " << device_info.num_slices
|
||||
<< ", num_sub_slices_per_slice " << device_info.num_sub_slices_per_slice
|
||||
<< ", num_banks_per_slice " << num_banks_per_slice
|
||||
<< ", gen_kbytes_per_bank : " << kbytes_per_bank
|
||||
<< ", L3_cache_size is (MB): " << float(L3_cache_size) / 1024 / 1024 << std::endl;
|
||||
}
|
||||
}
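// Worked example with hypothetical device properties (not part of the patch): taking the {12, 1, 0} entry
// (2048 KiB per bank) on a single-slice part with 6 sub-slices per slice, num_banks_per_slice =
// next_pow_of_2(6) = 8, so L3_cache_size = 2048 KiB * 8 * 1 = 16 MiB.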
|
||||
Config config = _impl->m_configs.GetConfig(device_id);
|
||||
auto networkCloned = CloneAndTransformNetwork(CNNNetwork(model), config);
|
||||
ov::MemBandwidthPressure memPressure = ov::MemBandwidthPressureTolerance(networkCloned.getFunction(), L3_cache_size);
|
||||
unsigned int batch = 1;
|
||||
if (memPressure.max_mem_tolerance != ov::MemBandwidthPressure::UNKNOWN)
|
||||
batch = std::max(1.0, 16 * closest_pow_of_2(memPressure.max_mem_tolerance));
|
||||
std::map<std::string, InferenceEngine::Parameter> options_for_max_batch;
|
||||
options_for_max_batch["MODEL_PTR"] = model;
|
||||
options_for_max_batch["GPU_THROUGHPUT_STREAMS"] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO);
|
||||
auto max_batch_size = GetMetric(GPU_METRIC_KEY(MAX_BATCH_SIZE), options_for_max_batch).as<unsigned int>();
|
||||
unsigned int closest = closest_pow_of_2(max_batch_size);
|
||||
batch = std::min(closest, batch);
|
||||
batch = std::min(256u, batch); // batch 256 is the max
|
||||
GPU_DEBUG_IF(debug_config->verbose >= 1) {
|
||||
GPU_DEBUG_COUT << memPressure.max_mem_tolerance << std::endl;
|
||||
GPU_DEBUG_COUT << "MAX_BATCH: " << max_batch_size << std::endl;
|
||||
GPU_DEBUG_COUT << "ACTUAL OPTIMAL BATCH: " << batch << std::endl;
|
||||
}
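// Worked example with hypothetical numbers (not part of the patch): max_mem_tolerance = 0.19 gives
// closest_pow_of_2(0.19) = 0.125, i.e. a memory-pressure batch of 16 * 0.125 = 2; with MAX_BATCH_SIZE = 37
// the closest power of two is 32, so the returned OPTIMAL_BATCH_SIZE is min(2, 32, 256) = 2.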
|
||||
IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, batch);
|
||||
} else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
|
||||
auto deviceName = StringRightTrim(device_info.dev_name, "NEO", false);
|
||||
deviceName += std::string(" (") + (device_info.dev_type == cldnn::device_type::discrete_gpu ? "dGPU" : "iGPU") + ")";
|
||||
|
@ -48,6 +48,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
|
||||
list(APPEND DEPENDENCIES ov_auto_plugin)
|
||||
endif()
|
||||
|
||||
if(ENABLE_AUTO_BATCH)
|
||||
list(APPEND DEPENDENCIES ov_auto_batch_plugin)
|
||||
endif()
|
||||
|
||||
if (NOT ENABLE_OV_ONNX_FRONTEND)
|
||||
list(APPEND EXCLUDED_SOURCE_PATHS "${CMAKE_CURRENT_SOURCE_DIR}/onnx_reader")
|
||||
endif()
|
||||
|
@ -24,6 +24,7 @@ inline const std::string getPluginLibNameByDevice(const std::string& deviceName)
|
||||
{ "GNA", "ov_intel_gna_plugin" },
|
||||
{ "GPU", "ov_intel_gpu_plugin" },
|
||||
{ "HETERO", "ov_hetero_plugin" },
|
||||
{ "BATCH", "ov_auto_batch_plugin" },
|
||||
{ "MULTI", "ov_multi_plugin" },
|
||||
{ "MYRIAD", "myriadPlugin" },
|
||||
{ "TEMPLATE", "ov_template_plugin" },
|
||||
@ -42,6 +43,11 @@ inline const std::pair<std::string, std::string> generateDefaultHeteroConfig() {
|
||||
return { "TARGET_FALLBACK" , ConformanceTests::targetDevice };
|
||||
}
|
||||
|
||||
inline const std::pair<std::string, std::string> generateDefaultBatchConfig() {
|
||||
// auto-batching with batch 1 (no real batching in fact, but full machinery is in action)
|
||||
return { CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , std::string(ConformanceTests::targetDevice)};
|
||||
}
|
||||
|
||||
inline const std::vector<std::map<std::string, std::string>> generateConfigs(const std::string& targetDevice,
|
||||
const std::vector<std::map<std::string, std::string>>& config = {}) {
|
||||
std::pair<std::string, std::string> defaultConfig;
|
||||
@ -49,6 +55,8 @@ inline const std::vector<std::map<std::string, std::string>> generateConfigs(con
|
||||
defaultConfig = generateDefaultMultiConfig();
|
||||
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_HETERO)) {
|
||||
defaultConfig = generateDefaultHeteroConfig();
|
||||
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_BATCH)) {
|
||||
defaultConfig = generateDefaultBatchConfig();
|
||||
} else {
|
||||
throw std::runtime_error("Incorrect target device: " + targetDevice);
|
||||
}
|
||||
@ -70,7 +78,8 @@ inline const std::string generateComplexDeviceName(const std::string& deviceName
|
||||
|
||||
inline const std::vector<std::string> returnAllPossibleDeviceCombination() {
|
||||
std::vector<std::string> res{ConformanceTests::targetDevice};
|
||||
std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO, CommonTestUtils::DEVICE_MULTI};
|
||||
std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO,
|
||||
CommonTestUtils::DEVICE_BATCH, CommonTestUtils::DEVICE_MULTI};
|
||||
for (const auto& device : devices) {
|
||||
res.emplace_back(generateComplexDeviceName(device));
|
||||
}
|
||||
|
@ -33,4 +33,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestCallbackTests,
|
||||
::testing::Values(CommonTestUtils::DEVICE_HETERO),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
|
||||
InferRequestCallbackTests::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestCallbackTests,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
|
||||
InferRequestCallbackTests::getTestCaseName);
|
||||
} // namespace
|
||||
|
@ -36,4 +36,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestIOBBlobTest,
|
||||
::testing::Values(CommonTestUtils::DEVICE_HETERO),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
|
||||
InferRequestIOBBlobTest::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestIOBBlobTest,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
|
||||
InferRequestIOBBlobTest::getTestCaseName);
|
||||
} // namespace
|
||||
|
@ -38,4 +38,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestMultithreadingT
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
|
||||
InferRequestMultithreadingTests::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestMultithreadingTests,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
|
||||
InferRequestMultithreadingTests::getTestCaseName);
|
||||
|
||||
} // namespace
|
||||
|
@ -46,4 +46,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Hetero, InferRequestSetBlobByType,
|
||||
::testing::Values(CommonTestUtils::DEVICE_HETERO),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
|
||||
InferRequestSetBlobByType::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Batch, InferRequestSetBlobByType,
|
||||
::testing::Combine(::testing::ValuesIn(setBlobTypes),
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
|
||||
InferRequestSetBlobByType::getTestCaseName);
|
||||
} // namespace
|
||||
|
@ -37,4 +37,9 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestWaitTests,
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
|
||||
InferRequestWaitTests::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestWaitTests,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
|
||||
InferRequestWaitTests::getTestCaseName);
|
||||
} // namespace
|
||||
|
@ -0,0 +1,31 @@
|
||||
// Copyright (C) 2018-2021 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
#include <auto_batching/auto_batching_tests.hpp>
|
||||
|
||||
const std::vector<bool> get_vs_set{ true, false };
|
||||
const std::vector<size_t> num_streams{ 1, 2 };
|
||||
const std::vector<size_t> num_requests{ 1, 3, 8, 9, 16, 64 };
|
||||
const std::vector<size_t> num_batch{ 1, 4, 8, 16, 32, 64, 128, 256 };
|
||||
using namespace AutoBatchingTests;
|
||||
|
||||
namespace {
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_CPU),
|
||||
::testing::ValuesIn(get_vs_set),
|
||||
::testing::ValuesIn(num_streams),
|
||||
::testing::ValuesIn(num_requests),
|
||||
::testing::ValuesIn(num_batch)),
|
||||
AutoBatching_Test::getTestCaseName);
|
||||
// TODO: for 22.2 (CVS-68949)
|
||||
//INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test_DetectionOutput,
|
||||
// ::testing::Combine(
|
||||
// ::testing::Values(CommonTestUtils::DEVICE_CPU),
|
||||
// ::testing::ValuesIn(get_vs_set),
|
||||
// ::testing::ValuesIn(num_streams),
|
||||
// ::testing::ValuesIn(num_requests),
|
||||
// ::testing::ValuesIn(num_batch)),
|
||||
// AutoBatching_Test_DetectionOutput::getTestCaseName);
|
||||
|
||||
} // namespace
|
@ -21,16 +21,27 @@ using namespace ::testing;
|
||||
using namespace InferenceEngine;
|
||||
using namespace InferenceEngine::gpu;
|
||||
|
||||
class RemoteBlob_Test : public CommonTestUtils::TestsCommon {
|
||||
class RemoteBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<bool> {
|
||||
protected:
|
||||
std::shared_ptr<ngraph::Function> fn_ptr;
|
||||
std::string deviceName;
|
||||
|
||||
public:
|
||||
void SetUp() override {
|
||||
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||
deviceName = CommonTestUtils::DEVICE_GPU;
|
||||
auto with_auto_batching = this->GetParam();
|
||||
if (with_auto_batching) { // BATCH:GPU
|
||||
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
|
||||
}
|
||||
}
|
||||
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
|
||||
auto with_auto_batch = obj.param;
|
||||
return std::string("RemoteBlob_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
|
||||
}
|
||||
};
|
||||
|
||||
TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
|
||||
TEST_P(RemoteBlob_Test, smoke_canInputUserBlob) {
|
||||
#if defined(ANDROID)
|
||||
GTEST_SKIP();
|
||||
#endif
|
||||
@ -41,7 +52,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
|
||||
|
||||
// TODO: Issue: investigate issue with IECore
|
||||
auto ie = InferenceEngine::Core();
|
||||
auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net = ie.LoadNetwork(net, deviceName);
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net.CreateInferRequest();
|
||||
@ -70,6 +81,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
|
||||
|
||||
Blob::Ptr shared_blob = make_shared_blob(net.getInputsInfo().begin()->second->getTensorDesc(), cldnn_context,
|
||||
shared_buffer);
|
||||
shared_blob->allocate();
|
||||
inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, shared_blob);
|
||||
|
||||
inf_req_shared.Infer();
|
||||
@ -85,7 +97,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
|
||||
}
|
||||
|
||||
|
||||
TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
|
||||
TEST_P(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
|
||||
#if defined(ANDROID)
|
||||
GTEST_SKIP();
|
||||
#endif
|
||||
@ -96,7 +108,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
|
||||
|
||||
// TODO: Issue: investigate issue with IECore
|
||||
auto ie = InferenceEngine::Core();
|
||||
auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net = ie.LoadNetwork(net, deviceName);
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net.CreateInferRequest();
|
||||
@ -139,7 +151,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
|
||||
}
|
||||
|
||||
|
||||
TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
|
||||
TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
|
||||
auto fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||
CNNNetwork net(fn_ptr);
|
||||
|
||||
@ -149,7 +161,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
|
||||
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
|
||||
|
||||
auto ie = PluginCache::get().ie();
|
||||
auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName);
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net_regular.CreateInferRequest();
|
||||
@ -161,7 +173,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
|
||||
|
||||
// inference using remote blob
|
||||
auto ocl_instance = std::make_shared<OpenCL>();
|
||||
auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_context.get());
|
||||
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_context.get());
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context);
|
||||
auto inf_req_shared = exec_net_shared.CreateInferRequest();
|
||||
inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, fakeImageData);
|
||||
@ -178,7 +190,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
|
||||
}
|
||||
}
|
||||
|
||||
TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
#if defined _WIN32
|
||||
GTEST_SKIP();
|
||||
#endif
|
||||
@ -191,7 +203,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
|
||||
|
||||
auto ie = PluginCache::get().ie();
|
||||
auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName);
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net_regular.CreateInferRequest();
|
||||
@ -214,7 +226,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
|
||||
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
|
||||
// without calling thread blocks
|
||||
auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
|
||||
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context);
|
||||
auto inf_req_shared = exec_net_shared.CreateInferRequest();
|
||||
|
||||
@ -270,7 +282,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
}
|
||||
}
|
||||
|
||||
TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
#if defined _WIN32
|
||||
GTEST_SKIP();
|
||||
#endif
|
||||
@ -283,7 +295,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
|
||||
|
||||
auto ie = PluginCache::get().ie();
|
||||
auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName);
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net_regular.CreateInferRequest();
|
||||
@ -307,7 +319,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
|
||||
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
|
||||
// without calling thread blocks
|
||||
auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
|
||||
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context);
|
||||
auto inf_req_shared = exec_net_shared.CreateInferRequest();
|
||||
|
||||
@ -358,6 +370,10 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
}
|
||||
}
|
||||
|
||||
std::vector<bool> with_auto_batching {true, false};
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_RemoteBlob, RemoteBlob_Test, ::testing::ValuesIn(with_auto_batching),
|
||||
RemoteBlob_Test::getTestCaseName);
|
||||
|
||||
class BatchedBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<size_t> {
|
||||
void SetUp() override {
|
||||
num_batch = this->GetParam();
|
||||
|
@ -30,6 +30,7 @@ protected:
|
||||
}
|
||||
};
|
||||
|
||||
std::vector<bool> ov_with_auto_batching {true, false};
|
||||
enum class RemoteTensorSharingType {
|
||||
USER_CL_TENSOR = 0,
|
||||
PLUGIN_CL_TENSOR = 1,
|
||||
@ -54,17 +55,34 @@ std::ostream& operator<<(std::ostream& stream, RemoteTensorSharingType sharing_t
|
||||
return stream;
|
||||
}
|
||||
|
||||
class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test, public testing::WithParamInterface<RemoteTensorSharingType> {
|
||||
using RemoteTensorSharingTestOptionsParams = std::tuple<RemoteTensorSharingType, bool /*auto-batching*/>;
|
||||
|
||||
class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test,
|
||||
public testing::WithParamInterface<RemoteTensorSharingTestOptionsParams> {
|
||||
protected:
|
||||
std::shared_ptr<ngraph::Function> fn_ptr;
|
||||
std::string deviceName;
|
||||
|
||||
public:
|
||||
void SetUp() override {
|
||||
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||
deviceName = CommonTestUtils::DEVICE_GPU;
|
||||
RemoteTensorSharingType sharing_type;
|
||||
bool with_auto_batching;
|
||||
std::tie(sharing_type, with_auto_batching) = this->GetParam();
|
||||
if (with_auto_batching) // BATCH:GPU
|
||||
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
|
||||
}
|
||||
|
||||
static std::string getTestCaseName(testing::TestParamInfo<RemoteTensorSharingType> obj) {
|
||||
RemoteTensorSharingType sharing_type = obj.param;
|
||||
static std::string getTestCaseName(const testing::TestParamInfo<RemoteTensorSharingTestOptionsParams>& obj) {
|
||||
RemoteTensorSharingType sharing_type;
|
||||
bool with_auto_batching;
|
||||
std::tie(sharing_type, with_auto_batching) = obj.param;
|
||||
|
||||
std::ostringstream result;
|
||||
result << "OVRemoteTensorInputBlob_Test_";
|
||||
result << sharing_type;
|
||||
if (with_auto_batching)
|
||||
result << "_WITH_AUTO_BATCHING";
|
||||
return result.str();
|
||||
}
|
||||
};
|
||||
@ -81,9 +99,17 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
|
||||
p.input().preprocess().convert_element_type(ov::element::f32);
|
||||
|
||||
auto function = p.build();
|
||||
auto exec_net = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
|
||||
RemoteTensorSharingType sharing_type;
|
||||
bool with_auto_batching;
|
||||
std::tie(sharing_type, with_auto_batching) = GetParam();
|
||||
|
||||
RemoteTensorSharingType sharing_type = GetParam();
|
||||
// auto-batching relies on availability of the lock() for the tensor (and the *USM_DEVICE is not lockable)
|
||||
if (with_auto_batching
|
||||
&& (RemoteTensorSharingType::USER_USM_DEVICE_TENSOR == sharing_type
|
||||
|| RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR == sharing_type))
|
||||
GTEST_SKIP();
|
||||
|
||||
auto exec_net = ie.compile_model(function, deviceName);
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net.create_infer_request();
|
||||
@ -244,6 +270,7 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
smoke_GPU,
|
||||
OVRemoteTensorInputBlob_Test,
|
||||
::testing::Combine(
|
||||
::testing::ValuesIn(std::vector<RemoteTensorSharingType>{RemoteTensorSharingType::USER_CL_TENSOR,
|
||||
RemoteTensorSharingType::PLUGIN_CL_TENSOR,
|
||||
RemoteTensorSharingType::USER_USM_HOST_TENSOR,
|
||||
@ -251,9 +278,29 @@ INSTANTIATE_TEST_SUITE_P(
|
||||
RemoteTensorSharingType::PLUGIN_USM_HOST_TENSOR,
|
||||
RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR,
|
||||
RemoteTensorSharingType::PLUGIN_HOST_TENSOR}),
|
||||
::testing::ValuesIn(ov_with_auto_batching)),
|
||||
OVRemoteTensorInputBlob_Test::getTestCaseName);
|
||||
|
||||
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
|
||||
class OVRemoteTensor_TestsWithContext : public OVRemoteTensor_Test, public testing::WithParamInterface<bool> {
|
||||
protected:
|
||||
std::shared_ptr<ngraph::Function> fn_ptr;
|
||||
std::string deviceName;
|
||||
public:
|
||||
void SetUp() override {
|
||||
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||
deviceName = CommonTestUtils::DEVICE_GPU;
|
||||
auto with_auto_batching = this->GetParam();
|
||||
if (with_auto_batching) { // BATCH:GPU
|
||||
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
|
||||
}
|
||||
}
|
||||
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
|
||||
auto with_auto_batch = obj.param;
|
||||
return std::string("RemoteTensor_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
|
||||
}
|
||||
};
|
||||
|
||||
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContext) {
|
||||
auto ie = ov::runtime::Core();
|
||||
|
||||
using namespace ov::preprocess;
|
||||
@ -262,7 +309,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
|
||||
p.input().preprocess().convert_element_type(ov::element::f32);
|
||||
auto function = p.build();
|
||||
|
||||
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net_regular = ie.compile_model(function, deviceName);
|
||||
auto input = function->get_parameters().at(0);
|
||||
auto output = function->get_results().at(0);
|
||||
|
||||
@ -296,7 +343,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
|
||||
}
|
||||
}
|
||||
|
||||
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
|
||||
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContextWithMultipleDevices) {
|
||||
auto ie = ov::runtime::Core();
|
||||
|
||||
using namespace ov::preprocess;
|
||||
@ -305,7 +352,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
|
||||
p.input().preprocess().convert_element_type(ov::element::f32);
|
||||
auto function = p.build();
|
||||
|
||||
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net_regular = ie.compile_model(function, deviceName);
|
||||
auto input = function->get_parameters().at(0);
|
||||
auto output = function->get_results().at(0);
|
||||
|
||||
@ -344,7 +391,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
|
||||
}
|
||||
}
|
||||
|
||||
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_out_of_order) {
|
||||
auto ie = ov::runtime::Core();
|
||||
|
||||
using namespace ov::preprocess;
|
||||
@ -353,7 +400,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
p.input().preprocess().convert_element_type(ov::element::f32);
|
||||
auto function = p.build();
|
||||
|
||||
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net_regular = ie.compile_model(function, deviceName);
|
||||
auto input = function->get_parameters().at(0);
|
||||
auto output = function->get_results().at(0);
|
||||
|
||||
@ -423,7 +470,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
}
|
||||
}
|
||||
|
||||
TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_in_order) {
|
||||
auto ie = ov::runtime::Core();
|
||||
|
||||
using namespace ov::preprocess;
|
||||
@ -432,7 +479,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
p.input().preprocess().convert_element_type(ov::element::f32);
|
||||
auto function = p.build();
|
||||
|
||||
auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
|
||||
auto exec_net_regular = ie.compile_model(function, deviceName);
|
||||
auto input = function->get_parameters().at(0);
|
||||
auto output = function->get_results().at(0);
|
||||
|
||||
@ -498,6 +545,9 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
}
|
||||
}
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_RemoteTensor, OVRemoteTensor_TestsWithContext, ::testing::ValuesIn(ov_with_auto_batching),
|
||||
OVRemoteTensor_TestsWithContext::getTestCaseName);
|
||||
|
||||
TEST_F(OVRemoteTensor_Test, NV12toBGR_image) {
|
||||
#if defined(ANDROID)
|
||||
GTEST_SKIP();
|
||||
|
@ -0,0 +1,31 @@
|
||||
// Copyright (C) 2018-2021 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
#include <auto_batching/auto_batching_tests.hpp>
|
||||
|
||||
const std::vector<size_t> num_streams{ 2 };
|
||||
const std::vector<bool> get_vs_set{ true, false };
|
||||
const std::vector<size_t> num_requests{ 1, 8, 16, 64 };
|
||||
const std::vector<size_t> num_batch{ 1, 8, 32, 256 };
|
||||
using namespace AutoBatchingTests;
|
||||
|
||||
namespace AutoBatchingTests {
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_GPU),
|
||||
::testing::ValuesIn(get_vs_set),
|
||||
::testing::ValuesIn(num_streams),
|
||||
::testing::ValuesIn(num_requests),
|
||||
::testing::ValuesIn(num_batch)),
|
||||
AutoBatching_Test::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test_DetectionOutput,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_GPU),
|
||||
::testing::ValuesIn(get_vs_set),
|
||||
::testing::ValuesIn(num_streams),
|
||||
::testing::ValuesIn(num_requests),
|
||||
::testing::ValuesIn(num_batch)),
|
||||
AutoBatching_Test_DetectionOutput::getTestCaseName);
|
||||
} // namespace AutoBatchingTests
|
@ -52,6 +52,10 @@ const std::vector<std::map<std::string, std::string>> autoConfig = {
|
||||
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
|
||||
};
|
||||
|
||||
const std::vector<std::map<std::string, std::string>> autoBatchConfig = {
|
||||
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
|
||||
};
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, ExecNetSetPrecision,
|
||||
::testing::Combine(
|
||||
::testing::ValuesIn(netPrecisions),
|
||||
@ -72,4 +76,11 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetSetPrecision,
|
||||
::testing::Values(CommonTestUtils::DEVICE_AUTO),
|
||||
::testing::ValuesIn(autoConfig)),
|
||||
ExecNetSetPrecision::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, ExecNetSetPrecision,
|
||||
::testing::Combine(
|
||||
::testing::ValuesIn(netPrecisions),
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(autoBatchConfig)),
|
||||
ExecNetSetPrecision::getTestCaseName);
|
||||
} // namespace
|
@ -22,27 +22,27 @@ namespace {
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_OPTIMAL_NUMBER_OF_INFER_REQUESTS,
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
|
||||
);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_CONFIG_KEYS,
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
|
||||
);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_METRICS,
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
|
||||
);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_NETWORK_NAME,
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
|
||||
);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_ThrowsUnsupported,
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
|
||||
::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
|
||||
);
|
||||
|
||||
//
|
||||
|
@ -19,6 +19,10 @@ const std::vector<std::map<std::string, std::string>> autoConfigs = {
|
||||
{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
|
||||
};
|
||||
|
||||
const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
|
||||
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
|
||||
};
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestCallbackTests,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_GPU),
|
||||
@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestCallbackTests,
|
||||
::testing::Values(CommonTestUtils::DEVICE_AUTO),
|
||||
::testing::ValuesIn(autoConfigs)),
|
||||
InferRequestCallbackTests::getTestCaseName);
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestCallbackTests,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(autoBatchConfigs)),
|
||||
InferRequestCallbackTests::getTestCaseName);
|
||||
} // namespace
|
||||
|
@ -18,6 +18,10 @@ const std::vector<std::map<std::string, std::string>> autoconfigs = {
|
||||
{{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES, std::string(CommonTestUtils::DEVICE_CPU) + "," + CommonTestUtils::DEVICE_GPU}}
|
||||
};
|
||||
|
||||
const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
|
||||
{{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
|
||||
};
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestMultithreadingTests,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_GPU),
|
||||
@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestMultithreadingTes
|
||||
::testing::ValuesIn(autoconfigs)),
|
||||
InferRequestMultithreadingTests::getTestCaseName);
|
||||
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestMultithreadingTests,
|
||||
::testing::Combine(
|
||||
::testing::Values(CommonTestUtils::DEVICE_BATCH),
|
||||
::testing::ValuesIn(auto_batch_configs)),
|
||||
InferRequestMultithreadingTests::getTestCaseName);
|
||||
} // namespace
|
||||
|
@@ -19,6 +19,11 @@ namespace {
         CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
 };

+const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestWaitTests,
         ::testing::Combine(
                 ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -37,4 +42,10 @@ namespace {
                 ::testing::ValuesIn(autoConfigs)),
         InferRequestWaitTests::getTestCaseName);

+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestWaitTests,
+        ::testing::Combine(
+                ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                ::testing::ValuesIn(autoBatchConfigs)),
+        InferRequestWaitTests::getTestCaseName);
+
 } // namespace
@@ -30,11 +30,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassNetworkTestP, OVClassNetworkTestP, ::tes

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
         OVClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
         OVClassGetMetricTest_SUPPORTED_METRICS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
         OVClassGetMetricTest_AVAILABLE_DEVICES,
@@ -42,7 +42,7 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
         OVClassGetMetricTest_FULL_DEVICE_NAME,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
         OVClassGetMetricTest_OPTIMIZATION_CAPABILITIES,
@@ -62,11 +62,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
         OVClassGetMetricTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetConfigTest,
         OVClassGetConfigTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));

 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetAvailableDevices, OVClassGetAvailableDevices, ::testing::Values("GPU"));

@@ -104,6 +104,29 @@ namespace {
          CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU},
          {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "should be int"}}
 };

+const std::vector<std::map<std::string, std::string>> auto_batch_inconfigs = {
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
+         {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "-1"}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
+         {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, "DOESN'T EXIST"}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+         {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
+         {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "-1"}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+         {InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, "ON"}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+         {InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "unknown_file"}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+         {InferenceEngine::PluginConfigParams::KEY_DUMP_KERNELS, "ON"}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+         {InferenceEngine::PluginConfigParams::KEY_TUNING_MODE, "TUNING_UNKNOWN_MODE"}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+         {InferenceEngine::PluginConfigParams::KEY_DEVICE_ID, "DEVICE_UNKNOWN"}},
+};
+
 IE_SUPPRESS_DEPRECATED_END

 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, IncorrectConfigTests,
@@ -125,6 +148,12 @@ namespace {
         IncorrectConfigTests::getTestCaseName);

+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigTests,
+        ::testing::Combine(
+                ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                ::testing::ValuesIn(auto_batch_inconfigs)),
+        IncorrectConfigTests::getTestCaseName);
+
 const std::vector<std::map<std::string, std::string>> conf = {
         {}
 };
@@ -167,17 +196,6 @@ namespace {
 };
 IE_SUPPRESS_DEPRECATED_END

-const std::vector<std::map<std::string, std::string>> multiconf = {
-        {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
-        {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-         {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::THROUGHPUT}},
-        {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-         {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY}},
-        {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-         {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
-         {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "1"}}
-};
-
 const std::vector<std::map<std::string, std::string>> autoConfigs = {
         {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
         {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
@@ -232,6 +250,12 @@ namespace {
          {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "2"}}
 };

+const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+        {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+         {CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "1"}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, DefaultValuesConfigTests,
         ::testing::Combine(
                 ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -255,4 +279,15 @@ namespace {
                 ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                 ::testing::ValuesIn(autoinconfigs)),
         IncorrectConfigAPITests::getTestCaseName);
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigAPITests,
+        ::testing::Combine(
+                ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                ::testing::ValuesIn(auto_batch_inconfigs)),
+        IncorrectConfigAPITests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, CorrectConfigTests,
+        ::testing::Combine(
+                ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                ::testing::ValuesIn(auto_batch_configs)),
+        CorrectConfigTests::getTestCaseName);
 } // namespace
@@ -35,12 +35,12 @@ INSTANTIATE_TEST_SUITE_P(

 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );

 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_METRICS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );

 INSTANTIATE_TEST_SUITE_P(
@@ -50,7 +50,7 @@ INSTANTIATE_TEST_SUITE_P(

 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_FULL_DEVICE_NAME,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );

 INSTANTIATE_TEST_SUITE_P(
@@ -80,12 +80,12 @@ INSTANTIATE_TEST_SUITE_P(

 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );

 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetConfigTest, IEClassGetConfigTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );

 INSTANTIATE_TEST_SUITE_P(
@@ -115,6 +115,26 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );

+using IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE = BehaviorTestsUtils::IEClassBaseTestP;
+TEST_P(IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE, GetMetricAndPrintNoThrow) {
+    SKIP_IF_CURRENT_TEST_IS_DISABLED()
+    InferenceEngine::Core ie;
+    InferenceEngine::Parameter p;
+
+    std::map<std::string, InferenceEngine::Parameter> _options = {{"MODEL_PTR", simpleCnnNetwork.getFunction()}};
+    ASSERT_NO_THROW(p = ie.GetMetric(deviceName, METRIC_KEY(OPTIMAL_BATCH_SIZE), _options).as<unsigned int>());
+    unsigned int t = p;
+
+    std::cout << "GPU device optimal batch size: " << t << std::endl;
+
+    ASSERT_METRIC_SUPPORTED_IE(METRIC_KEY(OPTIMAL_BATCH_SIZE));
+}
+
+INSTANTIATE_TEST_SUITE_P(
+        nightly_IEClassExecutableNetworkGetMetricTest, IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE,
+        ::testing::Values("GPU")
+);
+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
@@ -135,6 +155,7 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );

+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
@@ -16,6 +16,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     list(APPEND DEPENDENCIES ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    list(APPEND DEPENDENCIES ov_auto_batch_plugin)
+endif()
+
 # remove once CVS-69781 is fixed
 if(ENABLE_OV_IR_FRONTEND)
     list(APPEND DEPENDENCIES ov_ir_frontend)
@@ -0,0 +1,161 @@
+// Copyright (C) 2018-2021 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include <string>
+#include <utility>
+#include <vector>
+#include <memory>
+
+#include <gpu/gpu_config.hpp>
+#include <common_test_utils/test_common.hpp>
+#include <functional_test_utils/plugin_cache.hpp>
+
+#include "ngraph_functions/subgraph_builders.hpp"
+#include "functional_test_utils/blob_utils.hpp"
+
+using namespace ::testing;
+using namespace InferenceEngine;
+
+namespace AutoBatchingTests {
+using AutoBatchTwoNetsParams = std::tuple<
+        std::string,  // device name
+        bool,         // get or set blob
+        size_t,       // number of streams
+        size_t,       // number of requests
+        size_t>;      // batch size
+
+class AutoBatching_Test : public CommonTestUtils::TestsCommon,
+                          public testing::WithParamInterface<AutoBatchTwoNetsParams> {
+    void SetUp() override {
+        std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
+        fn_ptrs = {ngraph::builder::subgraph::makeSingleConv(),
+                   ngraph::builder::subgraph::makeMultiSingleConv()};
+    };
+public:
+    static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
+        size_t streams, requests, batch;
+        bool use_get_blob;
+        std::string device_name;
+        std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
+        return device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") + "_batch_size_" +
+               std::to_string(batch) +
+               "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
+    }
+
+protected:
+    std::string device_name;
+    bool use_get_blob;
+    size_t num_streams;
+    size_t num_requests;
+    size_t num_batch;
+    std::vector<std::shared_ptr<ngraph::Function>> fn_ptrs;
+
+    void TestAutoBatch() {
+        std::vector<InferenceEngine::CNNNetwork> nets;
+        for (auto &fn_ptr : fn_ptrs) {
+            nets.push_back(CNNNetwork(fn_ptr));
+        }
+
+        auto ie = InferenceEngine::Core();
+        std::vector<std::string> outputs;
+        std::vector<InferRequest> irs;
+        std::vector<std::vector<uint8_t>> ref;
+        std::vector<int> outElementsCount;
+
+        for (size_t i = 0; i < nets.size(); ++i) {
+            auto net = nets[i];
+            auto inputs = net.getInputsInfo();
+            for (auto n : inputs) {
+                n.second->setPrecision(Precision::FP32);
+            }
+            std::map<std::string, std::string> config;
+            if (device_name.find("GPU") != std::string::npos)
+                config[CONFIG_KEY(GPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
+            if (device_name.find("CPU") != std::string::npos)
+                config[CONFIG_KEY(CPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
+            // minimize timeout to reduce test time
+            config[CONFIG_KEY(AUTO_BATCH_TIMEOUT)] = std::to_string(1);
+            auto exec_net_ref = ie.LoadNetwork(net, std::string(CommonTestUtils::DEVICE_BATCH) + ":" +
+                                                    device_name + "(" + std::to_string(num_batch) + ")",
+                                               config);
+
+            for (size_t j = 0; j < num_requests; j++) {
+                outputs.push_back(net.getOutputsInfo().begin()->first);  // single output
+                outElementsCount.push_back(
+                        std::accumulate(begin(fn_ptrs[i]->get_output_shape(0)), end(fn_ptrs[i]->get_output_shape(0)), 1,
+                                        std::multiplies<size_t>()));
+
+                auto inf_req = exec_net_ref.CreateInferRequest();
+                irs.push_back(inf_req);
+
+                std::vector<std::vector<uint8_t>> inData;
+                for (auto n : inputs) {
+                    auto blob = FuncTestUtils::createAndFillBlob(n.second->getTensorDesc());
+                    if (use_get_blob)
+                        memcpy(reinterpret_cast<void *>(inf_req.GetBlob(n.first)->buffer().as<uint8_t*>()),
+                               reinterpret_cast<const void *>(blob->cbuffer().as<uint8_t*>()), blob->byteSize());
+                    else
+                        inf_req.SetBlob(n.first, blob);
+
+                    const auto inBlob = inf_req.GetBlob(n.first);
+                    const auto blobSize = inBlob->byteSize();
+                    const auto inBlobBuf = inBlob->cbuffer().as<uint8_t *>();
+                    inData.push_back(std::vector<uint8_t>(inBlobBuf, inBlobBuf + blobSize));
+                }
+                auto refOutData = ngraph::helpers::interpreterFunction(fn_ptrs[i], {inData}).front().second;
+                ref.push_back(refOutData);
+            }
+        }
+
+        const int niter = 1;
+        for (int i = 0; i < niter; i++) {
+            for (auto ir : irs) {
+                ir.StartAsync();
+            }
+
+            for (auto ir : irs) {
+                ir.Wait(InferRequest::RESULT_READY);
+            }
+        }
+
+        auto thr = FuncTestUtils::GetComparisonThreshold(InferenceEngine::Precision::FP32);
+        for (size_t i = 0; i < irs.size(); ++i) {
+            const auto &refBuffer = ref[i].data();
+            ASSERT_EQ(outElementsCount[i], irs[i].GetBlob(outputs[i])->size());
+            FuncTestUtils::compareRawBuffers(irs[i].GetBlob(outputs[i])->buffer().as<float *>(),
+                                             reinterpret_cast<const float *>(refBuffer), outElementsCount[i],
+                                             outElementsCount[i],
+                                             thr);
+        }
+    }
+};
+
+class AutoBatching_Test_DetectionOutput : public AutoBatching_Test {
+public:
+    void SetUp() override {
+        std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
+        fn_ptrs = {ngraph::builder::subgraph::makeEltwisePlusDetectionOutput(),
+                   ngraph::builder::subgraph::makeEltwisePlusDetectionOutput()};
+    };
+
+    static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
+        size_t streams, requests, batch;
+        bool use_get_blob;
+        std::string device_name;
+        std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
+        return "DetectionOutput_HETERO_" + device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") +
+               "_batch_size_" + std::to_string(batch) +
+               "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
+    }
+};
+
+TEST_P(AutoBatching_Test, compareAutoBatchingToSingleBatch) {
+    TestAutoBatch();
+}
+
+TEST_P(AutoBatching_Test_DetectionOutput, compareAutoBatchingToSingleBatch) {
+    TestAutoBatch();
+}
+
+} // namespace AutoBatchingTests
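The header above only defines the fixtures; the per-device instantiations live in the device-specific test folders. A hypothetical GPU instantiation could look like the following sketch, where the parameter value sets are illustrative only and not taken from this change:

namespace {
const std::vector<size_t> num_streams{1, 2};
const std::vector<size_t> num_requests{1, 8};
const std::vector<size_t> num_batches{1, 4, 8};

INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU_Illustrative, AutoBatchingTests::AutoBatching_Test,
        ::testing::Combine(::testing::Values(std::string(CommonTestUtils::DEVICE_GPU)),
                           ::testing::Bool(),                 // GetBlob() vs SetBlob() input path
                           ::testing::ValuesIn(num_streams),
                           ::testing::ValuesIn(num_requests),
                           ::testing::ValuesIn(num_batches)),
        AutoBatchingTests::AutoBatching_Test::getTestCaseName);
} // namespace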
@@ -10,6 +10,7 @@ const char DEVICE_AUTO[] = "AUTO";
 const char DEVICE_CPU[] = "CPU";
 const char DEVICE_GNA[] = "GNA";
 const char DEVICE_GPU[] = "GPU";
+const char DEVICE_BATCH[] = "BATCH";
 const char DEVICE_HDDL[] = "HDDL";
 const char DEVICE_MYRIAD[] = "MYRIAD";
 const char DEVICE_KEEMBAY[] = "VPUX";
@@ -26,6 +26,9 @@ public:
     MOCK_METHOD3(ImportNetwork, InferenceEngine::SoExecutableNetworkInternal(
             std::istream&, const std::shared_ptr<InferenceEngine::RemoteContext>&, const std::map<std::string, std::string>&));

+    MOCK_METHOD2(CreateContext, InferenceEngine::RemoteContext::Ptr(const std::string& deviceName,
+            const InferenceEngine::ParamMap& params));
+
     MOCK_CONST_METHOD3(QueryNetwork, InferenceEngine::QueryNetworkResult(
             const InferenceEngine::CNNNetwork&, const std::string&, const std::map<std::string, std::string>&));

@@ -242,6 +242,44 @@ inline std::shared_ptr<ngraph::Function> makeSingleConv(std::vector<size_t> inputShape
     return fn_ptr;
 }

+inline std::shared_ptr<ngraph::Function> makeEltwisePlusDetectionOutput(std::vector<std::vector<size_t>> inShapes =
+                                                                                {{1, 60}, {1, 165}, {1, 1, 75}},
+                                                                        ngraph::element::Type_t type = ngraph::element::Type_t::f32) {
+    // adding Eltwise so that we can test Auto-Batching's HETERO code-path that splits the DetectionOutput and the rest of the network
+    auto params = ngraph::builder::makeParams(ngraph::element::f32, inShapes);
+    auto paramOuts = ngraph::helpers::convert2OutputVector(
+            ngraph::helpers::castOps2Nodes<ngraph::opset3::Parameter>(params));
+    ngraph::OutputVector outs;
+    for (size_t i = 0; i < inShapes.size(); i++) {
+        auto shape = inShapes[i];
+        auto p = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::f32, ngraph::Shape{shape});
+        auto add = ngraph::builder::makeEltwise(paramOuts[i], p, ngraph::helpers::EltwiseTypes::ADD);
+        params.push_back(p);
+        outs.push_back(add->output(0));
+    }
+    ngraph::op::DetectionOutput::Attributes attr;
+    attr.num_classes = 11;
+    attr.background_label_id = 0;
+    attr.top_k = 75;
+    attr.variance_encoded_in_target = true;
+    attr.keep_top_k = {50};
+    attr.code_type = std::string{"caffe.PriorBoxParameter.CORNER"};
+    attr.share_location = true;
+    attr.nms_threshold = 0.5f;
+    attr.confidence_threshold = 0.5f;
+    attr.clip_after_nms = false;
+    attr.clip_before_nms = false;
+    attr.decrease_label_id = false;
+    attr.normalized = false;
+    attr.input_height = 1;
+    attr.input_width = 1;
+    attr.objectness_score = 0.4f;
+
+    auto detOut = ngraph::builder::makeDetectionOutput(outs, attr);
+    ngraph::ResultVector results{std::make_shared<ngraph::opset3::Result>(detOut)};
+    return std::make_shared<ngraph::Function>(results, params, "EltWiseWithDetectionOutput");
+}
+
 inline std::shared_ptr<ngraph::Function> makeMultiSingleConv(std::vector<size_t> inputShape = {1, 3, 24, 24},
                                                              ngraph::element::Type type = ngraph::element::Type_t::f32) {
     auto param0 = std::make_shared<ngraph::opset1::Parameter>(type, ngraph::Shape(inputShape));
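As a rough usage sketch, the builder above can be combined with the explicit BATCH device string to exercise the DetectionOutput split path outside the test harness. The main() scaffolding, header name, and batch value below are assumptions for illustration, not part of this change:

#include <ie_core.hpp>
#include "ngraph_functions/subgraph_builders.hpp"

int main() {
    InferenceEngine::Core ie;
    // Build the Eltwise + DetectionOutput subgraph defined above.
    auto fn = ngraph::builder::subgraph::makeEltwisePlusDetectionOutput();
    InferenceEngine::CNNNetwork net(fn);
    // Batch value 4 is illustrative; per the comment in the builder, the DetectionOutput part
    // is expected to stay un-batched on the HETERO split while the eltwise part gets batched.
    auto exec = ie.LoadNetwork(net, "BATCH:GPU(4)");
    auto req = exec.CreateInferRequest();
    req.Infer();
    return 0;
}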
@@ -38,6 +38,7 @@ using Config = std::map<std::string, std::string>;
 using namespace MockMultiDevice;

 using ConfigParams = std::tuple<
+        bool,          // if THROUGHPUT
         unsigned int,  // cpu OPTIMAL_NUMBER_OF_INFER_REQUESTS
         int,           // cpu infer request num of customer want
         bool,          // if cpu sleep, cpu device will load slow
@@ -77,12 +78,18 @@ public:
         unsigned int expectOptimalNum;
         bool cpuSleep;
         bool gpuSleep;
-        std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
+        bool isThroughput;
+        std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
                  gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = obj.param;
         std::ostringstream result;
         result << "cpuOptimalNum_" << cpuOptimalNum << "cpuCustomerNum_" << cpuCustomerNum;
         result << "gpuOptimalNum_" << gpuOptimalNum << "gpuCustomerNum_" << gpuCustomerNum;
         result << "expectOptimalNum_" << expectOptimalNum;
+        if (isThroughput) {
+            result << "_isThroughput" << "true";
+        } else {
+            result << "__isThroughput" << "false";
+        }
         if (cpuSleep) {
             result << "_cpuSleep_" << "true";
         } else {
@@ -147,7 +154,7 @@ public:
         IE_SET_METRIC(SUPPORTED_CONFIG_KEYS, supportConfigs, {});
         ON_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _))
             .WillByDefault(RETURN_MOCK_VALUE(supportConfigs));
-        EXPECT_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _)).Times(AnyNumber());
+        EXPECT_CALL(*core, GetMetric(_, _, _)).Times(AnyNumber());

         // test auto plugin
         config.insert({CONFIG_KEY_INTERNAL(MULTI_WORK_MODE_AS_AUTO), InferenceEngine::PluginConfigParams::YES});
@@ -168,11 +175,24 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
     unsigned int expectOptimalNum;
     bool cpuSleep;
     bool gpuSleep;
-    std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
+    bool isThroughput;
+    std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
              gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = this->GetParam();

+    if (isThroughput) {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, gpuCustomerNum, ""});
+        IE_SET_METRIC(OPTIMAL_BATCH_SIZE, optimalBatchNum, 256);
+        IE_SET_METRIC(RANGE_FOR_STREAMS, rangeOfStreams, std::make_tuple<unsigned int, unsigned int>(1, 2));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(OPTIMAL_BATCH_SIZE)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(optimalBatchNum));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(RANGE_FOR_STREAMS)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(rangeOfStreams));
+    } else {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
+    }
     ON_CALL(*plugin, SelectDevice(_, _, _)).WillByDefault(Return(metaDevices[1]));
     ON_CALL(*plugin, ParseMetaDevices(_, _)).WillByDefault(Return(metaDevices));
     EXPECT_CALL(*plugin, ParseMetaDevices(_, _)).Times(1);
@@ -241,27 +261,28 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
 }

-// ConfigParams {unsigned int, int, bool,
+// ConfigParams {bool, unsigned int, int, bool,
 //               unsigned int, int, bool, unsigned int}
 //
 // every element for ConfigParams
-// {cpuOptimalNum, customer hope for cpu infer request num, if cpu sleep when load,
+// {is throughput mode, cpuOptimalNum, customer hope for cpu infer request num, if cpu sleep when load,
 //  gpuOptimalNum, customer hope for gpu infer request num, if gpu sleep when load,
 //  expectOptimalNum of Auto ExecNetwork}
 //
 const std::vector<ConfigParams> testConfigs = {
-        ConfigParams {1, -1, false, 2, -1, true, 8},
-        ConfigParams {1, -1, false, 10, -1, true, 8},
-        ConfigParams {12, -1, false, 2, -1, true, 12},
-        ConfigParams {12, -1, false, 10, -1, true, 12},
-        ConfigParams {1, -1, true, 2, -1, false, 8},
-        ConfigParams {1, -1, true, 10, -1, false, 10},
-        ConfigParams {6, -1, true, 2, -1, false, 8},
-        ConfigParams {6, -1, true, 10, -1, false, 10},
-        ConfigParams {6, 4, false, 2, 3, true, 8},
-        ConfigParams {6, 4, false, 10, 3, true, 8},
-        ConfigParams {1, 4, true, 2, 3, false, 8},
-        ConfigParams {1, 4, true, 10, 3, false, 10}
+        ConfigParams {false, 1, -1, false, 2, -1, true, 8},
+        ConfigParams {false, 1, -1, false, 10, -1, true, 8},
+        ConfigParams {false, 12, -1, false, 2, -1, true, 12},
+        ConfigParams {false, 12, -1, false, 10, -1, true, 12},
+        ConfigParams {false, 1, -1, true, 2, -1, false, 8},
+        ConfigParams {false, 1, -1, true, 10, -1, false, 10},
+        ConfigParams {false, 6, -1, true, 2, -1, false, 8},
+        ConfigParams {false, 6, -1, true, 10, -1, false, 10},
+        ConfigParams {false, 6, 4, false, 2, 3, true, 8},
+        ConfigParams {false, 6, 4, false, 10, 3, true, 8},
+        ConfigParams {false, 1, 4, true, 2, 3, false, 8},
+        ConfigParams {false, 1, 4, true, 10, 3, false, 10},
+        ConfigParams {true, 1, 4, false, 10, 3, true, 512}
 };

 INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
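One note on the new throughput-mode case at the end of testConfigs: the expected optimal request number of 512 appears to follow from the GPU metrics mocked in the throughput branch above, although that derivation is my reading of the test data rather than anything stated in this diff:

// Illustrative arithmetic only, based on the mocked values in the throughput branch:
// OPTIMAL_BATCH_SIZE = 256 and RANGE_FOR_STREAMS = (1, 2), so 256 requests per batch
// times the upper stream bound of 2 gives the expected 512.
constexpr unsigned int optimal_batch_size = 256;
constexpr unsigned int max_streams = 2;
static_assert(optimal_batch_size * max_streams == 512, "matches expectOptimalNum in the new throughput case");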
@@ -14,6 +14,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 target_include_directories(${TARGET_NAME} PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/plugin_tests")

 target_link_libraries(${TARGET_NAME} PUBLIC
@@ -25,6 +25,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 set_ie_threading_interface_for(${TARGET_NAME})

 ie_faster_build(${TARGET_NAME}