Auto Batching impl (#7883)
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)
(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)
* applying/accommodating the API changes after rebase to the master
* replaying modified version of actual batch selection
* early experiments with model mem footprint
* changes from rebasing to the latest master
* experimenting with DG1 on the batch size selection, also collecting the mem footprint
* WIP: moving the auto-batching to the icore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key; still fails hot device swap
* quick-n-dirty batch footprint vs device total mem
* code style
* testing which models perform badly due to kernels and NOT (batched) footprint
* stub pipeline task to communicate the readiness rather than promise/future
* quick-n-dirty timeout impl
* explicit _completionTasks,reverting BA to use the timeout
* inputs outputs copies, works with AUTO and demo now
* accommodate the config per device-id, after rebase to the latest master
* allowing the auto-batching only with tput hint to let more conventional tests pass
* fix the premature timeout restarting via waiting for batch1 requests completion
* moved the batched request starting (along with input copies) to the dedicated thread
* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution
* code style
* increasing the timeout to test the ssd_* models perf (timeout?) issues
* reducing number of output stuff in BA to avoid bloating the logs in experiments
* more aggressive batching for experiments, not limited to 32 and also 4 as a min
* more accurate timeout debugging info
* getting the reqs limitation from the plugin SetConfig as well
* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context
* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4) (see the usage sketch after this commit list)
* auto-batching functional test (with results check vs ref) and GPU instance for that
* fixed arithmetic on blob ptrs
* clang
* handling possible batched network failure
* BATCH as the constants device name in test
* ENABLE_BATCH
* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)
* DetectionOutput hetero test for the CPU
* reenabling the Auto-Batching in the AUTO
* auto-batching device enabled in the test
* fixed the DO test
* improve the loading loop logic
* brushed the config keys
* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests
* fix the test after refactoring
* clang
* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now
* auto-batching hetero test (subgraph with DetectionOutput)
* fixed minor changes that were result of experiments with impl
* code-style
* brushing, disabling CPU's HETERO tests until planned activity for 22.2
* removing home-baked MAX_BATCH_SIZE and switching to the official impl by GPU team
* remote blobs tests for the auto-batching (old API)
* brushed names a bit
* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests
* fixed the ieUnitTests with adding CreateContext stub to the MockICore
* clang
* improved remote-blobs tests
* revert the BA back from experiments with AB + device_use_mem
* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE
* remote blobs 2.0 tests, issue with context having the orig device name
* debugging DG1 perf drop (presumably due to non-fitting the device-mem)
* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2
* remote blobs 2.0 tests for different tensor sharing types
* converting assert to throw to accommodate legacy API where the lock() could be called
* revert the timeout back to avoid mixing the studies, fixed the footprint calc
* reverting to estimating the max batch by extrapolating from batch1 size
* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication
* even more graceful batch 1 handling without duplication
* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching
* AutoBatchPlugin -> ov_auto_batch_plugin
* WA for gcc 4.8
* clang
* fix misprint
* fixed errors resulted from recent OV's Variant to Any transition
* skip auto-batching for already-batched networks
* AUTO_BATCH_TIMEOUT and tests
* GPU-specific L3
* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic
* debugging device info
* enabling the config tests for the GPU and fixing the Auto-batching tests to pass
* making the default cache size (when the driver is not recognized) more aggressive, to accommodate recent HW with old drivers
* skip auto-batching for RNNs and the like (e.g. single CHW input)
* fixed fallback to the batch1 and moved HETERO path under condition to avoid bloating
* brushing
* Auto plugin GetMetric support gpu auto-batch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add test case
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add comments on test
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the var names, also adding the exception handling
* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and the like (CVS-74085) to minimize the number of failures
* add try catch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the code changed in the GPU plugin
* Auto-Batch requests tests
* brushed variables a bit (ref)
* cleaned debug output from the ie_core
* cleaned cmake for the Auto-Batch
* removed batchN estimation from batch1
* cleaned from debug printf
* comments, cleanup
* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13
* Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.
* brushing ie_core.cpp
* fix 32bit compilation
* Code review: ENABLE_AUTO_BATCH
* consolidate the auto-batching logic in ie_core.cpp into a single ApplyAutoBatching
* renamed/brushed the OPTIMAL_BATCH (now with _SIZE) and mimics the MAX_BATCH_SIZE wrt MODEL_PTR
* default value for the OPTIMAL_BATCH_SIZE
* clang
* accommodate new func tests location
* fix shuffle of headers after clang + copyrights
* fixed misprint made during code refactoring
* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header
* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching
* switching from the unsafe size() and minimizing time under lock
* code style
* brushed the ApplyAutoBatching
* brushed the metric/config names and descriptions
* completed the core integration tests for the auto-batching
* ExecGraphInfo and check for incorrect cfg
* removed explicit dependencies from cmake file of the plugin
* disabling Auto-Batching thru the tput hint (to preserve current product default); only explicit usage like BATCH:GPU is exercised in the tests
Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
commit 49b5e5728b (parent bc5da8d522)
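For orientation before the diff, here is a minimal usage sketch of the explicit auto-batching path added by this PR. The "BATCH:GPU(4)" device string and the batch-collection behavior come from the changes below; the model path and everything else in the snippet are illustrative assumptions, not part of the commit.

#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder path, any batch-1 model

    // Explicit auto-batching: the virtual "BATCH" device wraps the GPU with batch size 4,
    // matching the BATCH:GPU(4) notation used by the tests and benchmark_app in this PR.
    auto execNet = core.LoadNetwork(network, "BATCH:GPU(4)");

    // Application-side requests are transparently collected into batches of 4
    // (with a timeout-driven batch-1 fallback, per the AUTO_BATCH_TIMEOUT logic below).
    auto request = execNet.CreateInferRequest();
    request.StartAsync();
    request.Wait();
    return 0;
}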
@@ -100,6 +100,8 @@ ie_option (ENABLE_GAPI_PREPROCESSING "Enables G-API preprocessing" ON)
 ie_option (ENABLE_MULTI "Enables MULTI Device Plugin" ON)
 ie_option (ENABLE_AUTO "Enables AUTO Device Plugin" ON)

+ie_option (ENABLE_AUTO_BATCH "Enables Auto-Batching Plugin" ON)
+
 ie_option (ENABLE_HETERO "Enables Hetero Device Plugin" ON)

 ie_option (ENABLE_TEMPLATE "Enable template plugin" ON)
@@ -141,6 +141,9 @@ When specifying key values as raw strings (that is, when using Python API), omit

 @snippet snippets/GPU_Metric1.cpp part1

+* OPTIMAL_BATCH_SIZE : Returns _optimal_ batch size for a given network on the given GPU device. The returned value is aligned to power of 2. Also, MODEL_PTR is the required option for this metric since the optimal batch size highly depends on the model. If the MODEL_PTR is not given, the value of 1 is returned. The example code to set the required and optional configs for this metric is available in the following snippet:
+
+@snippet snippets/GPU_Metric1.cpp part2

 ## GPU Context and Video Memory Sharing RemoteBlob API

 See [RemoteBlob API of GPU Plugin](GPU_RemoteBlob_API.md)
@@ -14,4 +14,12 @@ options.insert(std::make_pair("AVAILABLE_DEVICE_MEM_SIZE", available_device_mem_

 auto max_batch_size = core.GetMetric("GPU", GPU_METRIC_KEY(MAX_BATCH_SIZE), options).as<uint32_t>();
 //! [part1]
+//! [part2]
+std::map<std::string, Parameter> opt = {{"MODEL_PTR", cnnNetwork.getFunction()}};  // Required. Same usage as for the MAX_BATCH_SIZE above. If not set, the OPTIMAL_BATCH_SIZE returns 1.
+// This is not entirely GPU-specific metric (so METRIC_KEY is used rather than GPU_METRIC_KEY below),
+// but the GPU is the only device that supports that at the moment.
+// For the GPU, the metric already accommodates limitation for the on-device memory that the MAX_BATCH_SIZE poses.
+// so OPTIMAL_BATCH_SIZE is always less than MAX_BATCH_SIZE. Unlike the latter it is also aligned to the power of 2.
+auto optimal_batch_size = core.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
+//! [part2]
 }
@@ -6,6 +6,7 @@

 #include <string>
 #include <vector>
+#include <tuple>

 namespace cldnn {
 /// @addtogroup cpp_api C++ API
@@ -25,6 +26,10 @@ struct gfx_version {
     uint16_t major;
     uint8_t minor;
     uint8_t revision;
+    friend bool operator < (const gfx_version& l, const gfx_version& r) {
+        return std::tie(l.major, l.minor, l.revision)
+               < std::tie(r.major, r.minor, r.revision);  // same order
+    }
 };

 /// @brief Information about the device properties and capabilities.
@@ -124,6 +124,7 @@ std::map<std::string, std::vector<InferenceEngine::Blob::Ptr>> getRemoteInputBlo
         }

         auto blob = InferenceEngine::gpu::make_shared_blob(desc, context, clBuffer.back());
+        blob->allocate();
         remoteBlobs[name].push_back(blob);
     };

@@ -109,8 +109,10 @@ std::vector<float> splitFloat(const std::string& s, char delim) {

 std::vector<std::string> parseDevices(const std::string& device_string) {
     std::string comma_separated_devices = device_string;
-    if (comma_separated_devices.find(":") != std::string::npos) {
-        comma_separated_devices = comma_separated_devices.substr(comma_separated_devices.find(":") + 1);
+    auto colon = comma_separated_devices.find(":");
+    if (colon != std::string::npos) {
+        auto bracket = comma_separated_devices.find("(");  // e.g. in BATCH:GPU(4)
+        comma_separated_devices = comma_separated_devices.substr(colon + 1, bracket - colon - 1);
     }
     if ((comma_separated_devices == "MULTI") || (comma_separated_devices == "HETERO"))
         return std::vector<std::string>();
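A quick sanity sketch of what the updated trimming is intended to do; this is a standalone re-implementation for illustration only, not the benchmark_app code itself.

#include <cassert>
#include <string>

// Mirrors the trimming above: "BATCH:GPU(4)" should yield the device list "GPU".
static std::string trimDeviceString(const std::string& device_string) {
    std::string devices = device_string;
    auto colon = devices.find(":");
    if (colon != std::string::npos) {
        auto bracket = devices.find("(");  // e.g. in BATCH:GPU(4)
        devices = devices.substr(colon + 1, bracket - colon - 1);
    }
    return devices;
}

int main() {
    assert(trimDeviceString("BATCH:GPU(4)") == "GPU");
    assert(trimDeviceString("GPU") == "GPU");  // no prefix -> unchanged
    return 0;
}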
@@ -26,6 +26,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 if(ENABLE_INTEL_CPU)
     add_dependencies(${TARGET_NAME} ov_intel_cpu_plugin)
 endif()
@@ -16,6 +16,7 @@
 #include "cpp/ie_cnn_network.h"
 #include "cpp_interfaces/interface/ie_iexecutable_network_internal.hpp"
 #include "ie_parameter.hpp"
+#include "ie_remote_context.hpp"
 #include "threading/ie_itask_executor.hpp"

 namespace InferenceEngine {
@@ -60,6 +61,22 @@ public:
                                                     const std::string& deviceName,
                                                     const std::map<std::string, std::string>& config = {}) = 0;

+    /**
+     * @brief Creates an executable network from a network object.
+     *
+     * Users can create as many networks as they need and use
+     * them simultaneously (up to the limitation of the hardware resources)
+     *
+     * @param network CNNNetwork object acquired from Core::ReadNetwork
+     * @param remoteCtx "Remote" (non-CPU) accelerator device-specific execution context to use
+     * @param config Optional map of pairs: (config parameter name, config parameter value) relevant only for this load
+     * operation
+     * @return An executable network reference
+     */
+    virtual SoExecutableNetworkInternal LoadNetwork(const CNNNetwork& network,
+                                                    const RemoteContext::Ptr& remoteCtx,
+                                                    const std::map<std::string, std::string>& config = {}) = 0;
+
     /**
      * @brief Creates an executable network from a model file.
      *
@@ -142,6 +159,16 @@ public:
      */
     virtual bool DeviceSupportsImportExport(const std::string& deviceName) const = 0;

+    /**
+     * @brief Create a new shared context object on specified accelerator device
+     * using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
+     * @param deviceName Name of a device to create new shared context on.
+     * @param params Map of device-specific shared context parameters.
+     * @return A shared pointer to a created remote context.
+     */
+    virtual InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
+                                                              const InferenceEngine::ParamMap&) = 0;
+
     virtual bool isNewAPI() const = 0;

     /**
@@ -165,6 +192,7 @@ public:

     static std::vector<std::string> getHeteroDevices(std::string fallbackDevice);
     static std::vector<std::string> getMultiDevices(std::string devicesList);
+    static std::string getBatchDevice(std::string devicesList);
 };

 } // namespace InferenceEngine
@@ -23,14 +23,12 @@ struct MemBandwidthPressure {

 static MemBandwidthPressure MemBandwidthPressureTolerance(
     const std::shared_ptr<ngraph::Function> nGraphFunc,
-    const float L2_cache_size,
-    const float L3_cache_size,
+    const float cache_size,
     const float memThresholdAssumeLimited = MemBandwidthPressure::LIMITED) {
     int total_convs = 0, mem_limited_convs = 0, compute_convs = 0, total_gemms = 0, mem_limited_gemms = 0,
         total_deconvs = 0, compute_deconvs = 0, mem_limited_deconvs = 0;
-    auto memLimitedFactor = [&](int size_data_moved, int datatype_size) -> float {
-        return (L2_cache_size * 1.0f /*util factor, tbd */
-                / (size_data_moved * datatype_size));
+    auto memLimitedFactor = [&](int size_data_moved, int datatype_size = 4) -> float {
+        return (cache_size / (size_data_moved * datatype_size));
     };
     auto isLowPrecision = [&](ngraph::element::Type type) -> bool {
         return (type == ngraph::element::i8) || (type == ngraph::element::u8);
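To make the heuristic above concrete: memLimitedFactor divides the cache size by the bytes a layer moves, so small values mark the layer as memory-bound. A tiny worked sketch; the cache size and threshold here are purely illustrative, the real values come from the plugin.

#include <cstdio>

int main() {
    const float cache_size = 2.0f * 1024 * 1024;  // assume a 2 MB cache budget (illustrative)
    auto memLimitedFactor = [&](int size_data_moved, int datatype_size = 4) -> float {
        return cache_size / (size_data_moved * datatype_size);
    };
    // A layer moving 1M fp32 elements needs 4 MB, twice the assumed cache:
    // factor = 2 MB / 4 MB = 0.5, i.e. below an assumed threshold of 1.0 -> memory-bound.
    std::printf("factor = %f\n", memLimitedFactor(1024 * 1024));
    return 0;
}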
@@ -0,0 +1,86 @@ (new file; all lines added)
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

///////////////////////////////////////////////////////////////////////////////////////////////////
#pragma once

#include <cstddef>
#include <mutex>
#include <queue>
#include <type_traits>

#include "ie_parallel.hpp"
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
#    include <tbb/concurrent_queue.h>
#endif

namespace InferenceEngine {

template <typename T>
class ThreadSafeQueueWithSize {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.push(std::move(value));
    }
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (!_queue.empty()) {
            value = std::move(_queue.front());
            _queue.pop();
            return true;
        } else {
            return false;
        }
    }
    size_t size() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size();
    }

protected:
    std::queue<T> _queue;
    std::mutex _mutex;
};
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
template <typename T>
using ThreadSafeQueue = tbb::concurrent_queue<T>;
template <typename T>
using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
#else
template <typename T>
using ThreadSafeQueue = ThreadSafeQueueWithSize<T>;
template <typename T>
class ThreadSafeBoundedQueue {
public:
    ThreadSafeBoundedQueue() = default;
    bool try_push(T value) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_capacity) {
            _queue.push(std::move(value));
        }
        return _capacity;
    }
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_capacity && !_queue.empty()) {
            value = std::move(_queue.front());
            _queue.pop();
            return true;
        } else {
            return false;
        }
    }
    void set_capacity(std::size_t newCapacity) {
        std::lock_guard<std::mutex> lock(_mutex);
        _capacity = newCapacity;
    }

protected:
    std::queue<T> _queue;
    std::mutex _mutex;
    bool _capacity = false;
};
#endif
}  // namespace InferenceEngine
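A minimal usage sketch for the non-TBB ThreadSafeBoundedQueue fallback declared in the new header above (the include path is the one used later by the MULTI plugin in this PR). Note that in this fallback the capacity acts as an on/off switch rather than a real bound; under TBB the tbb::concurrent_bounded_queue alias behaves differently.

#include <cassert>

#include "threading/ie_thread_safe_containers.hpp"

int main() {
    InferenceEngine::ThreadSafeBoundedQueue<int> queue;

    int value = 0;
    assert(!queue.try_push(42));    // capacity not set yet -> push is rejected (non-TBB fallback)
    queue.set_capacity(1);          // any non-zero capacity enables the queue
    assert(queue.try_push(42));
    assert(queue.try_pop(value) && value == 42);
    assert(!queue.try_pop(value));  // queue is empty again
    return 0;
}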
@@ -118,6 +118,18 @@ DECLARE_METRIC_VALUE(BATCHED_BLOB);
 * String value for metric name is "RANGE_FOR_STREAMS".
 */
 DECLARE_METRIC_KEY(RANGE_FOR_STREAMS, std::tuple<unsigned int, unsigned int>);
+/**
+ * @brief Metric to query information optimal batch size for the given device and the network
+ *
+ * Metric returns a value of unsigned int type,
+ * Returns optimal batch size for a given network on the given device. The returned value is aligned to power of 2.
+ * Also, MODEL_PTR is the required option for this metric since the optimal batch size depends on the model,
+ * so if the MODEL_PTR is not given, the result of the metric is always 1.
+ * For the GPU the metric is queried automatically whenever the OpenVINO performance hint for the throughput is used,
+ * so that the result (>1) governs the automatic batching (transparently to the application).
+ * The automatic batching can be disabled with ALLOW_AUTO_BATCHING set to NO
+ */
+DECLARE_METRIC_KEY(OPTIMAL_BATCH_SIZE, unsigned int);

 /**
 * @brief Metric to provide a hint for a range for number of async infer requests. If device supports streams,
@@ -250,6 +262,15 @@ DECLARE_CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS);
 DECLARE_CONFIG_VALUE(YES);
 DECLARE_CONFIG_VALUE(NO);

+/**
+ * @brief Auto-batching configuration, string for the device + batch size, e.g. "GPU(4)"
+ */
+DECLARE_CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG);
+/**
+ * @brief Auto-batching configuration: string with timeout (in ms), e.g. "100"
+ */
+DECLARE_CONFIG_KEY(AUTO_BATCH_TIMEOUT);
+
 /**
 * @brief Limit `#threads` that are used by Inference Engine for inference on the CPU.
 */
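A hedged sketch of how the two new keys are meant to be consumed from application code; the key names are the ones declared above, while the model path and the concrete values ("GPU(4)", "100") are illustrative.

#include <ie_core.hpp>
#include <ie_plugin_config.hpp>
#include <map>
#include <string>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder model path

    // The BATCH virtual device reads the target device plus batch size from
    // AUTO_BATCH_DEVICE_CONFIG and the request-collection timeout (ms) from AUTO_BATCH_TIMEOUT.
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU(4)"},
        {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "100"},
    };
    auto execNet = core.LoadNetwork(network, "BATCH", config);
    (void)execNet;
    return 0;
}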
@@ -46,6 +46,7 @@
 #endif

 using namespace InferenceEngine::PluginConfigParams;
+using namespace InferenceEngine;
 using namespace std::placeholders;

 namespace ov {
@@ -94,6 +95,9 @@ Parsed<T> parseDeviceNameIntoConfig(const std::string& deviceName, const std::ma
             config_[ie::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES] =
                 deviceName.substr(std::string("AUTO:").size());
         }
+    } else if (deviceName_.find("BATCH:") == 0) {
+        deviceName_ = "BATCH";
+        config_[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceName.substr(6);
     } else {
         ie::DeviceIDParser parser(deviceName_);
         deviceName_ = parser.getDeviceName();
@@ -480,14 +484,22 @@ public:
         return newAPI;
     }

-    ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(const ie::CNNNetwork& network,
-                                                                    const std::shared_ptr<ie::RemoteContext>& context,
-                                                                    const std::map<std::string, std::string>& config) {
+    ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(
+        const ie::CNNNetwork& network,
+        const std::shared_ptr<ie::RemoteContext>& context,
+        const std::map<std::string, std::string>& config) override {
         OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::RemoteContext");
         if (context == nullptr) {
             IE_THROW() << "Remote context is null";
         }
+        // have to deduce the device name/config from the context first
         auto parsed = parseDeviceNameIntoConfig(context->getDeviceName(), config);
+        std::string& deviceName = parsed._deviceName;
+        std::map<std::string, std::string>& config_with_batch = parsed._config;
+        // if auto-batching is applicable, the below function will patch the device name and config accordingly:
+        ApplyAutoBatching(network, deviceName, config_with_batch);
+        parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
+
         auto plugin = GetCPPPluginByName(parsed._deviceName);
         ov::runtime::SoPtr<ie::IExecutableNetworkInternal> res;
         auto cacheManager = coreConfig.getCacheConfig()._cacheManager;
@@ -508,12 +520,59 @@ public:
         return res;
     }

+    void ApplyAutoBatching(const ie::CNNNetwork& network,
+                           std::string& deviceName,
+                           std::map<std::string, std::string>& config_with_batch) {
+        if (deviceName.find("BATCH") != std::string::npos) {
+            // explicitly enabled Auto-Batching e.g. in the tests
+            auto pos = deviceName.find_first_of(":");
+            if (pos != std::string::npos) {
+                auto deviceNameWithBatchSize = deviceName.substr(pos + 1);
+                auto deviceNameWithoutBatch = DeviceIDParser::getBatchDevice(deviceNameWithBatchSize);
+                auto function = network.getFunction();
+                // have to execute the DetectionOutput separately (without batching)
+                // as this layer mix-in the values from the different inputs (batch id)
+                bool bDetectionOutput = false;
+                const std::string detectionOutputOpName = ngraph::op::DetectionOutput::get_type_info_static().name;
+                const std::string resultOpName = ngraph::op::Result::get_type_info_static().name;
+                for (auto&& node : function->get_ops()) {
+                    auto isDetectionOutputParent = [&detectionOutputOpName](decltype(node)& nd) {
+                        for (size_t n = 0; n < nd->get_input_size(); n++) {
+                            if (detectionOutputOpName == nd->get_input_node_ptr(n)->get_type_info().name)
+                                return true;
+                        }
+                        return false;
+                    };
+
+                    if ((detectionOutputOpName == node->get_type_info().name) ||
+                        ((resultOpName == node->get_type_info().name) && isDetectionOutputParent(node))) {
+                        node->get_rt_info()["affinity"] = deviceNameWithoutBatch;
+                        bDetectionOutput = true;
+                    } else {
+                        node->get_rt_info()["affinity"] = "BATCH";
+                    }
+                }
+                if (bDetectionOutput) {
+                    deviceName = "HETERO:BATCH," + deviceNameWithoutBatch;
+                    config_with_batch[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceNameWithBatchSize;
+                } else {
+                    deviceName = "BATCH:" + deviceNameWithBatchSize;
+                }
+            }
+        }
+    }
+
     ie::SoExecutableNetworkInternal LoadNetwork(const ie::CNNNetwork& network,
-                                                const std::string& deviceName,
+                                                const std::string& deviceNameOrig,
                                                 const std::map<std::string, std::string>& config) override {
         OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::CNN");
-        bool forceDisableCache = config.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
-        auto parsed = parseDeviceNameIntoConfig(deviceName, config);
+        std::string deviceName = deviceNameOrig;
+        std::map<std::string, std::string> config_with_batch = config;
+        // if auto-batching is applicable, the below function will patch the device name and config accordingly:
+        ApplyAutoBatching(network, deviceName, config_with_batch);
+
+        bool forceDisableCache = config_with_batch.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
+        auto parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
         if (forceDisableCache) {
             // remove this config key from parsed as plugins can throw unsupported exception
             parsed._config.erase(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE));
@@ -732,6 +791,19 @@ public:
         return devices;
     }

+    /**
+     * @brief Create a new shared context object on specified accelerator device
+     * using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
+     * @param deviceName Name of a device to create new shared context on.
+     * @param params Map of device-specific shared context parameters.
+     * @return A shared pointer to a created remote context.
+     */
+    InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
+                                                      const InferenceEngine::ParamMap& params) override {
+        auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
+        return GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
+    }
+
     /**
      * @brief Returns reference to CPP plugin wrapper by a device name
      * @param deviceName A name of device
@@ -1030,6 +1102,12 @@ public:
             deviceNames = ie::DeviceIDParser::getMultiDevices(deviceName.substr(pos + 1));
         }
         deviceNames.emplace_back("AUTO");
+    } else if (deviceName.find("BATCH") == 0) {
+        auto pos = deviceName.find_first_of(":");
+        if (pos != std::string::npos) {
+            deviceNames = {ie::DeviceIDParser::getBatchDevice(deviceName.substr(pos + 1))};
+        }
+        deviceNames.push_back("BATCH");
     } else {
         deviceNames.push_back(deviceName);
     }
@@ -1120,8 +1198,8 @@ std::vector<std::string> DeviceIDParser::getHeteroDevices(std::string fallbackDe
 }

 std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList) {
-    std::vector<std::string> deviceNames;
-    auto trim_request_info = [](std::string device_with_requests) {
+    std::set<std::string> deviceNames;
+    auto trim_request_info = [](const std::string& device_with_requests) {
         auto opening_bracket = device_with_requests.find_first_of('(');
         return device_with_requests.substr(0, opening_bracket);
     };
@@ -1132,14 +1210,36 @@ std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList
     // we skip the #requests info here
     while ((pos = devicesList.find(delimiter)) != std::string::npos) {
         auto d = devicesList.substr(0, pos);
-        deviceNames.push_back(trim_request_info(d));
+        if (d.find("BATCH") == 0) {
+            deviceNames.insert("BATCH");
+            auto p = d.find_first_of(":");
+            if (p != std::string::npos)
+                deviceNames.insert(DeviceIDParser::getBatchDevice(d.substr(p + 1)));
+        } else {
+            deviceNames.insert(trim_request_info(d));
+        }
         devicesList.erase(0, pos + 1);
     }

-    if (!devicesList.empty())
-        deviceNames.push_back(trim_request_info(devicesList));
-    return deviceNames;
+    if (!devicesList.empty()) {
+        if (devicesList.find("BATCH") == 0) {
+            deviceNames.insert("BATCH");
+            auto p = devicesList.find_first_of(":");
+            if (p != std::string::npos)
+                deviceNames.insert(DeviceIDParser::getBatchDevice(devicesList.substr(p + 1)));
+        } else {
+            deviceNames.insert(trim_request_info(devicesList));
+        }
+    }
+    return std::vector<std::string>(deviceNames.begin(), deviceNames.end());
+}
+
+std::string DeviceIDParser::getBatchDevice(std::string device) {
+    auto trim_request_info = [](const std::string& device_with_requests) {
+        auto opening_bracket = device_with_requests.find_first_of('(');
+        return device_with_requests.substr(0, opening_bracket);
+    };
+    return trim_request_info(device);
 }

 class Core::Impl : public ov::runtime::CoreImpl {
@@ -1207,18 +1307,7 @@ ExecutableNetwork Core::LoadNetwork(const std::string& modelPath, const std::map
 }

 RemoteContext::Ptr Core::CreateContext(const std::string& deviceName, const ParamMap& params) {
-    if (deviceName.find("HETERO") == 0) {
-        IE_THROW() << "HETERO device does not support remote context";
-    }
-    if (deviceName.find("MULTI") == 0) {
-        IE_THROW() << "MULTI device does not support remote context";
-    }
-    if (deviceName.find("AUTO") == 0) {
-        IE_THROW() << "AUTO device does not support remote context";
-    }
-
-    auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
-    return _impl->GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
+    return _impl->CreateContext(deviceName, params);
 }

 RemoteContext::Ptr Core::GetDefaultContext(const std::string& deviceName) {
@@ -21,3 +21,7 @@ endif()
 if(ENABLE_AUTO OR ENABLE_MULTI)
     add_subdirectory(auto)
 endif()
+
+if(ENABLE_AUTO_BATCH)
+    add_subdirectory(auto_batch)
+endif()
@@ -156,7 +156,8 @@ MultiDeviceExecutableNetwork::MultiDeviceExecutableNetwork(const std::string&
     , _needPerfCounters(needPerfCounters)
     , _multiPlugin(plugin)
     , _context(context)
-    , _workModeIsAUTO(true) {
+    , _workModeIsAUTO(true)
+    , _network(network) {
     if (_multiPlugin->GetCore() == nullptr) {
         IE_THROW() << "Please, work with " << _multiPlugin->GetName() << " device via InferencEngine::Core object";
     }
@@ -667,10 +668,30 @@ InferenceEngine::Parameter MultiDeviceExecutableNetwork::GetMetric(const std::st
             real = _loadContext[ACTUALDEVICE].
                 executableNetwork->GetMetric(name).as<unsigned int>();
         } else {
+            IE_ASSERT(_loadContext[CPU].isAlready == true);
             real = _loadContext[CPU].
                 executableNetwork->GetMetric(name).as<unsigned int>();
+            std::unique_lock<std::mutex> lock(_confMutex);
+            auto deviceInfo = _loadContext[ACTUALDEVICE].deviceInfo;
+            lock.unlock();
+            if (deviceInfo.deviceName.find("GPU") != std::string::npos) {
+                const auto& mode = deviceInfo.config.find(CONFIG_KEY(PERFORMANCE_HINT));
+                if (mode != deviceInfo.config.end() && mode->second == CONFIG_VALUE(THROUGHPUT)) {
+                    std::map<std::string, InferenceEngine::Parameter> options;
+                    options["MODEL_PTR"] = _network.getFunction();  // CNNntework
+                    try {
+                        auto optimalBatchSize = _core->GetMetric(deviceInfo.deviceName,
+                            METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
+                        auto rangeOfStreams = _core->GetMetric(deviceInfo.deviceName,
+                            METRIC_KEY(RANGE_FOR_STREAMS), options).as<std::tuple<unsigned int, unsigned int>>();
+                        real = (std::max)(real, std::get<1>(rangeOfStreams) * optimalBatchSize);
+                    } catch (const InferenceEngine::Exception &iie) {
+                        LOG_WARNING("[AUTOPLUGIN]get optimal infer requset num for GPU auto-batch failed :%s", iie.what());
+                    }
+                }
+            }
         }
-        unsigned int res = std::max(8u, real);
+        unsigned int res = (std::max)(8u, real);
         IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, res);
     }

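From the application side, the effect of this change is only observable through the existing metric; a minimal query sketch follows (the device choice and model are illustrative, the metric and hint names are the ones used above).

#include <ie_core.hpp>
#include <ie_plugin_config.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder

    // With the THROUGHPUT hint, AUTO may pick the GPU and scale the suggested request
    // count by OPTIMAL_BATCH_SIZE, as implemented in the hunk above.
    auto execNet = core.LoadNetwork(network, "AUTO",
        {{CONFIG_KEY(PERFORMANCE_HINT), CONFIG_VALUE(THROUGHPUT)}});

    auto nireq = execNet.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
    (void)nireq;
    return 0;
}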
@@ -7,22 +7,17 @@

 #include <atomic>
 #include <mutex>
-#include <queue>
 #include <unordered_map>
 #include <map>
 #include <vector>
 #include <string>

-#include <cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp>
-#include <ie_parallel.hpp>
-#include <threading/ie_itask_executor.hpp>
-#include <threading/ie_executor_manager.hpp>
+#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
+#include "threading/ie_thread_safe_containers.hpp"
+#include "threading/ie_itask_executor.hpp"
+#include "threading/ie_executor_manager.hpp"
 #include "ie_icore.hpp"

-#if (IE_THREAD == IE_THREAD_TBB || IE_THREAD == IE_THREAD_TBB_AUTO)
-#    include <tbb/concurrent_queue.h>
-#endif
-
 #ifdef MULTIUNITTEST
 #define MOCKTESTMACRO virtual
 #define MultiDevicePlugin MockMultiDevicePlugin
@@ -79,66 +74,6 @@ enum AutoLoadContextIndex {
 template<typename T>
 using DeviceMap = std::unordered_map<DeviceName, T>;

-#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
-template <typename T>
-using ThreadSafeQueue = tbb::concurrent_queue<T>;
-template <typename T>
-using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
-#else
-template <typename T>
-class ThreadSafeQueue {
-public:
-    void push(T value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        _queue.push(std::move(value));
-    }
-    bool try_pop(T& value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        if (!_queue.empty()) {
-            value = std::move(_queue.front());
-            _queue.pop();
-            return true;
-        } else {
-            return false;
-        }
-    }
-protected:
-    std::queue<T> _queue;
-    std::mutex _mutex;
-};
-template <typename T>
-class ThreadSafeBoundedQueue {
-public:
-    ThreadSafeBoundedQueue() = default;
-    bool try_push(T value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        if (_capacity) {
-            _queue.push(std::move(value));
-        }
-        return _capacity;
-    }
-    bool try_pop(T& value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        if (_capacity && !_queue.empty()) {
-            value = std::move(_queue.front());
-            _queue.pop();
-            return true;
-        } else {
-            return false;
-        }
-    }
-    void set_capacity(std::size_t newCapacity) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        _capacity = newCapacity;
-    }
-
-protected:
-    std::queue<T> _queue;
-    std::mutex _mutex;
-    bool _capacity = false;
-};
-#endif
-
 class MultiDeviceExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault,
                                      public InferenceEngine::ITaskExecutor {
 public:
@@ -148,7 +83,7 @@ public:
         InferenceEngine::Task _task;
         std::exception_ptr _exceptionPtr = nullptr;
     };
-    using NotBusyWorkerRequests = ThreadSafeBoundedQueue<WorkerInferRequest*>;
+    using NotBusyWorkerRequests = InferenceEngine::ThreadSafeBoundedQueue<WorkerInferRequest*>;

     explicit MultiDeviceExecutableNetwork(const DeviceMap<InferenceEngine::SoExecutableNetworkInternal>& networksPerDevice,
                                           const std::vector<DeviceInformation>& networkDevices,
@@ -186,8 +121,8 @@ public:
     std::vector<DeviceInformation> _devicePriorities;
     const std::vector<DeviceInformation> _devicePrioritiesInitial;
     DeviceMap<InferenceEngine::SoExecutableNetworkInternal> _networksPerDevice;
-    ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
-    DeviceMap<std::unique_ptr<ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
+    InferenceEngine::ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
+    DeviceMap<std::unique_ptr<InferenceEngine::ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
     DeviceMap<NotBusyWorkerRequests> _idleWorkerRequests;
     DeviceMap<std::vector<WorkerInferRequest>> _workerRequests;
     std::unordered_map<std::string, InferenceEngine::Parameter> _config;
@@ -217,6 +152,7 @@ private:
     std::promise<void> _firstLoadPromise;
     mutable AutoLoadContext _loadContext[CONTEXTNUM];
     mutable std::mutex _confMutex;
+    const InferenceEngine::CNNNetwork _network;
 };

 } // namespace MultiDevicePlugin
src/plugins/auto_batch/CMakeLists.txt (new file, 20 lines)
@@ -0,0 +1,20 @@
# Copyright (C) 2018-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

set(TARGET_NAME "ov_auto_batch_plugin")

file(GLOB SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)

file(GLOB HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)

ie_add_plugin(NAME ${TARGET_NAME}
              DEVICE_NAME "BATCH"
              SOURCES ${SOURCES} ${HEADERS}
              VERSION_DEFINES_FOR auto_batch.cpp ADD_CLANG_FORMAT)

target_link_libraries(${TARGET_NAME} PRIVATE Threads::Threads)

ie_add_api_validator_post_build_step(TARGET ${TARGET_NAME})

set_target_properties(${TARGET_NAME} PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE ${ENABLE_LTO})
731
src/plugins/auto_batch/auto_batch.cpp
Normal file
731
src/plugins/auto_batch/auto_batch.cpp
Normal file
@ -0,0 +1,731 @@
|
|||||||
|
// Copyright (C) 2018-2021 Intel Corporation
|
||||||
|
// SPDX-License-Identifier: Apache-2.0
|
||||||
|
//
|
||||||
|
|
||||||
|
///////////////////////////////////////////////////////////////////////////////////////////////////
|
||||||
|
#include "auto_batch.hpp"
|
||||||
|
|
||||||
|
#include <cpp_interfaces/interface/ie_internal_plugin_config.hpp>
|
||||||
|
#include <ie_icore.hpp>
|
||||||
|
#include <ie_ngraph_utils.hpp>
|
||||||
|
#include <ie_performance_hints.hpp>
|
||||||
|
#include <iostream>
|
||||||
|
#include <map>
|
||||||
|
#include <memory>
|
||||||
|
#include <string>
|
||||||
|
#include <unordered_map>
|
||||||
|
#include <unordered_set>
|
||||||
|
#include <utility>
|
||||||
|
#include <vector>
|
||||||
|
|
||||||
|
namespace AutoBatchPlugin {
|
||||||
|
using namespace InferenceEngine;
|
||||||
|
|
||||||
|
std::vector<std::string> supported_configKeys = {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CONFIG_KEY(AUTO_BATCH_TIMEOUT)};
|
||||||
|
|
||||||
|
template <Precision::ePrecision precision>
|
||||||
|
Blob::Ptr create_shared_blob_on_top_of_batched_blob(Blob::Ptr batched_blob, size_t batch_id, size_t batch_num) {
|
||||||
|
typedef typename PrecisionTrait<precision>::value_type TYPE;
|
||||||
|
typedef typename std::add_pointer<TYPE>::type TYPEPTR;
|
||||||
|
auto ptr = batched_blob->buffer().as<TYPEPTR>();
|
||||||
|
auto sizePerBatch = batched_blob->size() / batch_num;
|
||||||
|
auto layout = batched_blob->getTensorDesc().getLayout();
|
||||||
|
SizeVector dims = batched_blob->getTensorDesc().getDims();
|
||||||
|
// the below code is a placeholder for the WIP (22.1) functionality
|
||||||
|
// that will check the reshaping by the batch is robust (CVS-51744)
|
||||||
|
if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
|
||||||
|
layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
|
||||||
|
layout == InferenceEngine::Layout::NDHWC) {
|
||||||
|
dims[0] = 1;
|
||||||
|
assert(batched_blob->getTensorDesc().getPrecision() == precision);
|
||||||
|
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()},
|
||||||
|
ptr + sizePerBatch * batch_id,
|
||||||
|
sizePerBatch);
|
||||||
|
} else {
|
||||||
|
// same blob for all requests (e.g. constants)
|
||||||
|
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()}, ptr);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ------------------------------AutoBatchInferRequest----------------------------
|
||||||
|
AutoBatchInferRequest::AutoBatchInferRequest(const InputsDataMap& networkInputs,
|
||||||
|
const OutputsDataMap& networkOutputs,
|
||||||
|
AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
|
||||||
|
int batch_id,
|
||||||
|
int num_batch,
|
||||||
|
bool needPerfCounters)
|
||||||
|
: IInferRequestInternal(networkInputs, networkOutputs),
|
||||||
|
_myBatchedRequestWrapper(workerRequestPtr),
|
||||||
|
_needPerfCounters(needPerfCounters),
|
||||||
|
_batchId(batch_id),
|
||||||
|
_batchSize(num_batch) {
|
||||||
|
// Allocate all input blobs
|
||||||
|
for (const auto& it : networkInputs) {
|
||||||
|
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
|
||||||
|
Blob::Ptr res;
|
||||||
|
switch (it.second->getTensorDesc().getPrecision()) {
|
||||||
|
case InferenceEngine::Precision::FP32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I8:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
|
||||||
|
case InferenceEngine::Precision::I16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U8:
|
||||||
|
case InferenceEngine::Precision::BOOL:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
IE_THROW() << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
|
||||||
|
}
|
||||||
|
_inputs[it.first] = res;
|
||||||
|
}
|
||||||
|
// Allocate all output blobs
|
||||||
|
for (const auto& it : networkOutputs) {
|
||||||
|
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
|
||||||
|
Blob::Ptr res;
|
||||||
|
switch (it.second->getTensorDesc().getPrecision()) {
|
||||||
|
case InferenceEngine::Precision::FP32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I8:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
|
||||||
|
case InferenceEngine::Precision::I16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U8:
|
||||||
|
case InferenceEngine::Precision::BOOL:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
IE_THROW(NotImplemented) << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
|
||||||
|
}
|
||||||
|
_outputs[it.first] = res;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::SetBlobsToAnotherRequest(SoIInferRequestInternal& req) {
|
||||||
|
for (const auto& it : _networkInputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
auto blob = GetBlob(name);
|
||||||
|
if (req->GetBlob(name) != blob)
|
||||||
|
req->SetBlob(name, blob);
|
||||||
|
}
|
||||||
|
for (const auto& it : _networkOutputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
auto blob = GetBlob(name);
|
||||||
|
if (req->GetBlob(name) != blob)
|
||||||
|
req->SetBlob(name, blob);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::CopyInputsIfNeeded() {
|
||||||
|
for (const auto& it : _networkInputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
CopyBlobIfNeeded(GetBlob(name), _myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), true);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src,
|
||||||
|
InferenceEngine::Blob::Ptr dst,
|
||||||
|
bool bInput) {
|
||||||
|
auto bufferDst = dst->buffer();
|
||||||
|
auto ptrDst = bufferDst.as<char*>();
|
||||||
|
auto bufferSrc = src->cbuffer();
|
||||||
|
auto ptrSrc = bufferSrc.as<const char*>();
|
||||||
|
ptrdiff_t szDst = dst->byteSize();
|
||||||
|
ptrdiff_t szSrc = src->byteSize();
|
||||||
|
if (bInput) {
|
||||||
|
ptrdiff_t offset = szSrc != szDst ? _batchId * szDst / _batchSize : 0;
|
||||||
|
if ((ptrDst + offset) == ptrSrc)
|
||||||
|
return;
|
||||||
|
else
|
||||||
|
memcpy(ptrDst + offset, ptrSrc, szSrc);
|
||||||
|
} else {
|
||||||
|
ptrdiff_t offset = szSrc != szDst ? _batchId * szSrc / _batchSize : 0;
|
||||||
|
if ((ptrSrc + offset) == ptrDst)
|
||||||
|
return;
|
||||||
|
else
|
||||||
|
memcpy(ptrDst, ptrSrc + offset, szDst);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::CopyOutputsIfNeeded() {
|
||||||
|
for (const auto& it : _networkOutputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
CopyBlobIfNeeded(_myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), GetBlob(name), false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> AutoBatchInferRequest::GetPerformanceCounts() const {
|
||||||
|
return _perfMap;
|
||||||
|
}
|
||||||
|
|
||||||
|
AutoBatchAsyncInferRequest::AutoBatchAsyncInferRequest(
    const AutoBatchInferRequest::Ptr& inferRequest,
    const bool needPerfCounters,
    InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
    const ITaskExecutor::Ptr& callbackExecutor)
    : AsyncInferRequestThreadSafeDefault(inferRequest, nullptr, callbackExecutor),
      _inferRequestWithoutBatch(inferRequestWithoutBatch),
      _inferRequest{inferRequest} {
    // this executor starts the inference while the task (checking the result) is passed to the next stage
    struct ThisRequestExecutor : public ITaskExecutor {
        explicit ThisRequestExecutor(AutoBatchAsyncInferRequest* _this_) : _this{_this_} {}
        void run(Task task) override {
            auto& workerInferRequest = _this->_inferRequest->_myBatchedRequestWrapper;
            std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
            t.first = _this;
            t.second = std::move(task);
            workerInferRequest._tasks.push(t);
            // it is ok to call size() here as the queue only grows (and the bulk removal happens under the mutex)
            const int sz = workerInferRequest._tasks.size();
            if (sz == workerInferRequest._batchSize) {
                workerInferRequest._cond.notify_one();
            }
        };
        AutoBatchAsyncInferRequest* _this = nullptr;
    };
    _pipeline = {
        {/*TaskExecutor*/ std::make_shared<ThisRequestExecutor>(this), /*task*/ [this, needPerfCounters] {
             if (this->_inferRequest->_exceptionPtr)  // if the exception happened in the batch1 fallback
                 std::rethrow_exception(this->_inferRequest->_exceptionPtr);
             if (this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr)  // when the batchN execution failed
                 std::rethrow_exception(this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr);
             this->_inferRequest->CopyOutputsIfNeeded();
         }}};
}

void AutoBatchAsyncInferRequest::Infer_ThreadUnsafe() {
    InferUsingAsync();
}

AutoBatchAsyncInferRequest::~AutoBatchAsyncInferRequest() {
    StopAndWait();
}

// ------------------------------AutoBatchExecutableNetwork----------------------------
AutoBatchExecutableNetwork::AutoBatchExecutableNetwork(
    const InferenceEngine::SoExecutableNetworkInternal& networkWithBatch,
    const InferenceEngine::SoExecutableNetworkInternal& networkWithoutBatch,
    const DeviceInformation& networkDevice,
    const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
    const bool needPerfCounters)
    : InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr,
                                                          std::make_shared<InferenceEngine::ImmediateExecutor>()),
      _network{networkWithBatch},
      _networkWithoutBatch{networkWithoutBatch},
      _config{config},
      _needPerfCounters{needPerfCounters} {
    // WA for gcc 4.8 ( fails compilation with member init-list)
    _device = networkDevice;
    auto time_out = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
    if (time_out != config.end())
        _timeOut = ParseTimeoutValue(time_out->second.as<std::string>());
}

AutoBatchExecutableNetwork::~AutoBatchExecutableNetwork() {
    _terminate = true;
    for (auto w : _workerRequests) {
        w->_thread.join();
    }
    _workerRequests.clear();
}

unsigned int AutoBatchExecutableNetwork::ParseTimeoutValue(const std::string& s) {
    auto val = std::stoi(s);
    if (val < 0)
        IE_THROW(ParameterMismatch) << "Value for the " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " should be unsigned int";
    return val;
}

std::shared_ptr<InferenceEngine::RemoteContext> AutoBatchExecutableNetwork::GetContext() const {
    return _network->GetContext();
}

InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequestImpl(
    InferenceEngine::InputsDataMap networkInputs,
    InferenceEngine::OutputsDataMap networkOutputs) {
    // todo : guard request creation from another thread/on-the-fly
    auto num = _numRequestsCreated++;
    auto batch_id = num % _device.batchForDevice;
    if (!batch_id) {  // need new request
        _workerRequests.push_back(std::make_shared<WorkerInferRequest>());
        auto workerRequestPtr = _workerRequests.back();
        workerRequestPtr->_inferRequestBatched = {_network->CreateInferRequest(), _network._so};
        workerRequestPtr->_batchSize = _device.batchForDevice;
        workerRequestPtr->_completionTasks.resize(workerRequestPtr->_batchSize);
        workerRequestPtr->_inferRequestBatched->SetCallback(
            [workerRequestPtr, this](std::exception_ptr exceptionPtr) mutable {
                if (exceptionPtr)
                    workerRequestPtr->_exceptionPtr = exceptionPtr;
                IE_ASSERT(workerRequestPtr->_completionTasks.size() == (size_t)workerRequestPtr->_batchSize);
                // notify the individual requests on the completion
                for (int c = 0; c < workerRequestPtr->_batchSize; c++) {
                    workerRequestPtr->_completionTasks[c]();
                }
                // reset the timeout
                workerRequestPtr->_cond.notify_one();
            });

        workerRequestPtr->_thread = std::thread([workerRequestPtr, this] {
            while (1) {
                std::cv_status status;
                {
                    std::unique_lock<std::mutex> lock(workerRequestPtr->_mutex);
                    status = workerRequestPtr->_cond.wait_for(lock, std::chrono::milliseconds(_timeOut));
                }
                if (_terminate) {
                    break;
                } else {
                    // as we pop the tasks from the queue only here
                    // it is ok to call size() (as the _tasks can only grow in parallel)
                    const int sz = workerRequestPtr->_tasks.size();
                    if (sz == workerRequestPtr->_batchSize) {
                        std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
                        for (int n = 0; n < sz; n++) {
                            IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
                            workerRequestPtr->_completionTasks[n] = std::move(t.second);
                            t.first->_inferRequest->CopyInputsIfNeeded();
                        }
                        workerRequestPtr->_inferRequestBatched->StartAsync();
                    } else if ((status == std::cv_status::timeout) && sz) {
                        // timeout to collect the batch is over, have to execute the requests in the batch1 mode
                        std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
                        // popping all tasks collected by the moment of the time-out and execute each with batch1
                        std::atomic<int> arrived = {0};
                        std::promise<void> all_completed;
                        auto all_completed_future = all_completed.get_future();
                        for (int n = 0; n < sz; n++) {
                            IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
                            t.first->_inferRequestWithoutBatch->SetCallback(
                                [t, sz, &arrived, &all_completed](std::exception_ptr p) {
                                    if (p)
                                        t.first->_inferRequest->_exceptionPtr = p;
                                    t.second();
                                    if (sz == ++arrived)
                                        all_completed.set_value();
                                });
                            t.first->_inferRequest->SetBlobsToAnotherRequest(t.first->_inferRequestWithoutBatch);
                            t.first->_inferRequestWithoutBatch->StartAsync();
                        }
                        all_completed_future.get();
                        // now when all the tasks for this batch are completed, start waiting for the timeout again
                    }
                }
            }
        });
    }
    return std::make_shared<AutoBatchInferRequest>(networkInputs,
                                                   networkOutputs,
                                                   *_workerRequests.back(),
                                                   batch_id,
                                                   _device.batchForDevice,
                                                   _needPerfCounters);
}

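// [Editor's note - illustrative summary, not part of the original change]
// Sketch of how the worker above batches the traffic (assuming the default AUTO_BATCH_TIMEOUT of 1000 ms):
//  - each AutoBatchAsyncInferRequest pushes its task into the worker's _tasks queue (see ThisRequestExecutor);
//  - once the queue holds _batchSize tasks, the worker thread copies the per-request inputs into the
//    batched request and issues a single StartAsync() for the whole batch;
//  - if the timeout expires first with only a partial batch, every collected task is re-run individually on the
//    spare batch-1 request (_inferRequestWithoutBatch), so a single request's latency is bounded by the timeout.
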
InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequest() {
    auto syncRequestImpl = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    syncRequestImpl->setPointerToExecutableNetworkInternal(shared_from_this());
    InferenceEngine::SoIInferRequestInternal inferRequestWithoutBatch = {_networkWithoutBatch->CreateInferRequest(),
                                                                         _networkWithoutBatch._so};
    return std::make_shared<AutoBatchAsyncInferRequest>(
        std::static_pointer_cast<AutoBatchInferRequest>(syncRequestImpl),
        _needPerfCounters,
        inferRequestWithoutBatch,
        _callbackExecutor);
}

std::shared_ptr<ngraph::Function> AutoBatchExecutableNetwork::GetExecGraphInfo() {
    return _network->GetExecGraphInfo() ? _network->GetExecGraphInfo() : _networkWithoutBatch->GetExecGraphInfo();
}

void AutoBatchExecutableNetwork::SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) {
    auto timeout = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
    if (timeout == config.end() || config.size() > 1) {
        IE_THROW() << "The only config that can be changed on the fly for the AutoBatching is the "
                   << CONFIG_KEY(AUTO_BATCH_TIMEOUT);
    } else {
        _timeOut = ParseTimeoutValue(timeout->second.as<std::string>());
    }
}

InferenceEngine::Parameter AutoBatchExecutableNetwork::GetConfig(const std::string& name) const {
    auto it = _config.find(name);
    if (it != _config.end()) {
        return it->second;
    } else {
        // find config key among networks config keys
        auto param = _network->GetMetric(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
        for (auto&& configKey : param.as<std::vector<std::string>>()) {
            if (configKey == name) {
                return _network->GetConfig(configKey);
            }
        }
        IE_THROW(NotFound) << name << " not found in the ExecutableNetwork config";
    }
}

InferenceEngine::Parameter AutoBatchExecutableNetwork::GetMetric(const std::string& name) const {
    if (name == METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)) {
        auto reqs = 0;
        try {
            auto hint = _network->GetConfig(CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS)).as<std::string>();
            reqs = InferenceEngine::PerfHintsConfig::CheckPerformanceHintRequestValue(hint);
            if (!reqs)  // no limitations from user, let's deduce the full blown #requests
                // (multiplied by the devices capabilities to run multiple <batched> requests for further perf)
                reqs = _device.batchForDevice *
                       _network->GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
        } catch (const InferenceEngine::Exception& iie) {
        }
        reqs = std::max(reqs, _device.batchForDevice);  // round up to the possible user's value
        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, reqs);
    } else if (name == METRIC_KEY(NETWORK_NAME)) {
        IE_SET_METRIC_RETURN(NETWORK_NAME, _network->GetMetric(METRIC_KEY(NETWORK_NAME)).as<std::string>());
    } else if (name == METRIC_KEY(SUPPORTED_METRICS)) {
        IE_SET_METRIC_RETURN(SUPPORTED_METRICS,
                             {METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS),
                              METRIC_KEY(SUPPORTED_METRICS),
                              METRIC_KEY(NETWORK_NAME),
                              METRIC_KEY(SUPPORTED_CONFIG_KEYS)});
    } else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS,
                             {CONFIG_KEY(AUTO_BATCH_TIMEOUT)});  // only timeout can be changed on the fly
    } else {
        IE_THROW() << "Unsupported Network metric: " << name;
    }
}

// ------------------------------AutoBatchInferencePlugin----------------------------

namespace {

std::map<std::string, std::string> mergeConfigs(std::map<std::string, std::string> config,
                                                const std::map<std::string, std::string>& local) {
    for (auto&& kvp : local) {
        config[kvp.first] = kvp.second;
    }
    return config;
}

}  // namespace

std::map<std::string, std::string> AutoBatchInferencePlugin::GetSupportedConfig(
    const std::map<std::string, std::string>& config,
    const std::string& deviceName) const {
    std::vector<std::string> supportedConfigKeys = GetCore()->GetMetric(deviceName, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
    std::map<std::string, std::string> supportedConfig;
    for (auto&& key : supportedConfigKeys) {
        auto itKey = config.find(key);
        if (config.end() != itKey) {
            supportedConfig[key] = itKey->second;
        }
    }
    return supportedConfig;
}

DeviceInformation AutoBatchInferencePlugin::ParseBatchDevice(const std::string& deviceWithBatch) {
    auto&& d = deviceWithBatch;
    auto openingBracket = d.find_first_of('(');
    auto closingBracket = d.find_first_of(')', openingBracket);
    auto deviceName = d.substr(0, openingBracket);

    int batch = 1;
    if (closingBracket != std::string::npos && openingBracket < closingBracket) {
        batch = std::stol(d.substr(openingBracket + 1, closingBracket - 1));

        if (batch <= 0) {
            IE_THROW() << "Batch value for '" << deviceName << "' must be > 0, while " << batch << " is passed";
        }
    }
    return {deviceName, {{}}, batch};
}

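// [Editor's note - illustrative examples, not part of the original change]
// Assuming the parsing above, the device-with-batch string maps to DeviceInformation as follows:
//   ParseBatchDevice("GPU(4)")  ->  {deviceName: "GPU", config: {}, batchForDevice: 4}
//   ParseBatchDevice("CPU")     ->  {deviceName: "CPU", config: {}, batchForDevice: 1}  // no brackets: default batch of 1
//   ParseBatchDevice("GPU(0)")  ->  throws, since the batch value must be > 0
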
DeviceInformation AutoBatchInferencePlugin::ParseMetaDevice(const std::string& devicesBatchCfg,
                                                            const std::map<std::string, std::string>& config) const {
    auto getDeviceConfig = [&](const DeviceName& deviceWithID) {
        DeviceIDParser deviceParser(deviceWithID);
        std::string deviceName = deviceParser.getDeviceName();
        std::map<std::string, std::string> tconfig = mergeConfigs(_config, config);

        // set device ID if any
        std::string deviceIDLocal = deviceParser.getDeviceID();
        if (!deviceIDLocal.empty()) {
            tconfig[PluginConfigParams::KEY_DEVICE_ID] = deviceIDLocal;
        }

        return GetSupportedConfig(tconfig, deviceName);
    };

    auto metaDevice = ParseBatchDevice(devicesBatchCfg);
    metaDevice.config = getDeviceConfig(metaDevice.deviceName);

    auto cfg = config;
    // check that no irrelevant config-keys left
    for (auto k : config) {
        const auto& name = k.first;
        auto found_in_supported_cfg = std::find(supported_configKeys.begin(), supported_configKeys.end(), k.first);
        auto found_in_device_cfg = metaDevice.config.find(k.first);
        if (found_in_device_cfg == metaDevice.config.end() && found_in_supported_cfg == supported_configKeys.end()) {
            IE_THROW() << "Unsupported config key: " << name;
        }
    }
    return metaDevice;
}

RemoteContext::Ptr AutoBatchInferencePlugin::CreateContext(const InferenceEngine::ParamMap& config) {
    auto cfg = config;
    auto it = cfg.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
    if (it == cfg.end())
        IE_THROW() << "Value for KEY_AUTO_BATCH is not set";

    auto val = it->second;
    auto metaDevice = ParseMetaDevice(val, std::map<std::string, std::string>());
    cfg.erase(it);
    return GetCore()->CreateContext(metaDevice.deviceName, cfg);
}

Parameter AutoBatchInferencePlugin::GetConfig(const std::string& name,
                                              const std::map<std::string, Parameter>& options) const {
    if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), name)) {
        auto it = _config.find(name);
        if (it == _config.end()) {
            IE_THROW() << "Value for " << name << " is not set";
        } else {
            return {it->second};
        }
    } else {
        IE_THROW() << "Unsupported config key: " << name;
    }
}

void AutoBatchInferencePlugin::CheckConfig(const std::map<std::string, std::string>& config) {
    for (auto&& kvp : config) {
        const auto name = kvp.first;
        const auto val = kvp.second;
        if (supported_configKeys.end() == std::find(supported_configKeys.begin(), supported_configKeys.end(), name))
            IE_THROW() << "Unsupported config key: " << name;
        if (name == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
            ParseBatchDevice(val);
        } else if (name == CONFIG_KEY(AUTO_BATCH_TIMEOUT)) {
            try {
                auto t = std::stoi(val);
                if (t < 0)
                    IE_THROW(ParameterMismatch);
            } catch (const std::exception& e) {
                IE_THROW(ParameterMismatch)
                    << " Expecting unsigned int value for " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " got " << val;
            }
        }
    }
}

void AutoBatchInferencePlugin::SetConfig(const std::map<std::string, std::string>& config) {
    CheckConfig(config);
    for (auto&& kvp : config) {
        _config[kvp.first] = kvp.second;
    }
}

static const Version version = {{2, 1}, CI_BUILD_NUMBER, "AutoBatchPlugin"};
IE_DEFINE_PLUGIN_CREATE_FUNCTION(AutoBatchInferencePlugin, version)

AutoBatchInferencePlugin::AutoBatchInferencePlugin() {
    _pluginName = "BATCH";
}

InferenceEngine::Parameter AutoBatchInferencePlugin::GetMetric(
    const std::string& name,
    const std::map<std::string, InferenceEngine::Parameter>& options) const {
    if (name == METRIC_KEY(SUPPORTED_METRICS)) {
        std::vector<std::string> metrics;
        metrics.push_back(METRIC_KEY(SUPPORTED_METRICS));
        metrics.push_back(METRIC_KEY(FULL_DEVICE_NAME));
        metrics.push_back(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
        IE_SET_METRIC_RETURN(SUPPORTED_METRICS, metrics);
    } else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
        IE_SET_METRIC_RETURN(FULL_DEVICE_NAME, _pluginName);
    } else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, supported_configKeys);
    } else {
        IE_THROW(NotFound) << "Unsupported metric key " << name;
    }
}

IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
    const InferenceEngine::CNNNetwork& network,
    const std::map<std::string, std::string>& config) {
    return LoadNetworkImpl(network, nullptr, config);
}

InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadNetworkImpl(
    const InferenceEngine::CNNNetwork& network,
    const std::shared_ptr<InferenceEngine::RemoteContext> ctx,
    const std::map<std::string, std::string>& config) {
    if (GetCore() == nullptr) {
        IE_THROW() << "Please, work with the BATCH device via the InferenceEngine::Core object";
    }

    auto fullConfig = mergeConfigs(_config, config);
    auto device_batch = fullConfig.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
    if (device_batch == fullConfig.end()) {
        IE_THROW() << "KEY_AUTO_BATCH key is not set for BATCH device";
    }

    auto metaDevice = ParseMetaDevice(device_batch->second, fullConfig);
    const auto& deviceName = metaDevice.deviceName;
    const auto& deviceConfig = metaDevice.config;
    const auto perfConfig = fullConfig.find(PluginConfigParams::KEY_PERF_COUNT);
    const bool enablePerfCounters = (fullConfig.end() != perfConfig) && (perfConfig->second == PluginConfigParams::YES);

    auto report_footprint = [](std::shared_ptr<ICore> pCore, std::string device) -> size_t {
        size_t footprint = 0;
        // TODO: use the per-network metric (22.2) rather than plugin-level
        auto stats = pCore->GetMetric(device, GPU_METRIC_KEY(MEMORY_STATISTICS)).as<std::map<std::string, uint64_t>>();
        for (auto s : stats)
            if (s.first.find("_current") != std::string::npos)
                footprint += s.second;
        return footprint;
    };

    size_t batch1_footprint = 0;
    if (deviceName.find("GPU") != std::string::npos)
        batch1_footprint = report_footprint(GetCore(), deviceName);
    auto executableNetworkWithoutBatch = ctx ? GetCore()->LoadNetwork(network, ctx, deviceConfig)
                                             : GetCore()->LoadNetwork(network, deviceName, deviceConfig);
    if (deviceName.find("GPU") != std::string::npos) {
        batch1_footprint = report_footprint(GetCore(), deviceName) - batch1_footprint;
        if (batch1_footprint) {
            const uint64_t total_mem = GetCore()->GetMetric(deviceName, GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
            const int estimated_batch = (total_mem - batch1_footprint) / batch1_footprint;
            int closest = pow(2, floor(log(estimated_batch) / log(2)));
            closest = std::max(1, closest);
            metaDevice.batchForDevice = std::min(metaDevice.batchForDevice, closest);
        }
    }
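    // [Editor's note - illustrative arithmetic, not part of the original change]
    // Example of the estimation above with assumed numbers: if DEVICE_TOTAL_MEM_SIZE reports 4096 MB and
    // loading the batch-1 network grew the GPU memory statistics by 300 MB, then
    // estimated_batch = (4096 - 300) / 300 = 12 and the closest lower power of two is 8,
    // so an explicit BATCH:GPU(16) would be capped to 8, while BATCH:GPU(4) stays at 4.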
    // auto-batch settings
    std::unordered_map<std::string, InferenceEngine::Parameter> networkConfig;
    for (auto c : fullConfig) {
        if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), c.first))
            networkConfig.insert(c);
    }

    InferenceEngine::SoExecutableNetworkInternal executableNetworkWithBatch;
    if (metaDevice.batchForDevice > 1) {
        try {
            CNNNetwork clonedNetwork(InferenceEngine::details::cloneNetwork(network));
            const InputsDataMap inputInfo = clonedNetwork.getInputsInfo();
            ICNNNetwork::InputShapes shapes = clonedNetwork.getInputShapes();
            for (const InputsDataMap::value_type& item : inputInfo) {
                auto layout = item.second->getTensorDesc().getLayout();
                // the below code is a placeholder for the WIP (22.1) functionality
                // that will check the reshaping by the batch is robust (CVS-51744)
                if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
                    layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
                    layout == InferenceEngine::Layout::NDHWC) {
                    assert(1 == shapes[item.first][0]);  // do not reshape/re-batch originally batched networks
                    shapes[item.first][0] = metaDevice.batchForDevice;
                }
            }
            clonedNetwork.reshape(shapes);
            executableNetworkWithBatch =
                ctx ? GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, ctx, deviceConfig)
                    : GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, deviceName, deviceConfig);
        } catch (...) {
            executableNetworkWithBatch = {nullptr, nullptr};
        }
    }

    if (!executableNetworkWithBatch) {
        executableNetworkWithBatch = executableNetworkWithoutBatch;
        metaDevice.batchForDevice = 1;
    }

    return std::make_shared<AutoBatchExecutableNetwork>(executableNetworkWithBatch,
                                                        executableNetworkWithoutBatch,
                                                        metaDevice,
                                                        networkConfig,
                                                        enablePerfCounters);
}

InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
    const InferenceEngine::CNNNetwork& network,
    const std::shared_ptr<InferenceEngine::RemoteContext>& context,
    const std::map<std::string, std::string>& config) {
    return LoadNetworkImpl(network, context, config);
}

InferenceEngine::QueryNetworkResult AutoBatchInferencePlugin::QueryNetwork(
    const InferenceEngine::CNNNetwork& network,
    const std::map<std::string, std::string>& config) const {
    auto cfg = config;
    for (auto c : cfg) {
        if (c.first == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
            auto val = c.second;
            cfg.erase(c.first);
            auto metaDevice = ParseMetaDevice(val, cfg);
            return GetCore()->QueryNetwork(network, metaDevice.deviceName, cfg);
        }
    }
    IE_THROW() << "Value for KEY_AUTO_BATCH is not set";
}
}  // namespace AutoBatchPlugin

src/plugins/auto_batch/auto_batch.hpp (new file, 159 lines)
@@ -0,0 +1,159 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

///////////////////////////////////////////////////////////////////////////////////////////////////
#pragma once

#include <atomic>
#include <map>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
#include "cpp_interfaces/impl/ie_infer_async_request_thread_safe_default.hpp"
#include "cpp_interfaces/interface/ie_iplugin_internal.hpp"
#include "ie_metric_helpers.hpp"
#include "threading/ie_thread_safe_containers.hpp"

namespace AutoBatchPlugin {

using DeviceName = std::string;

struct DeviceInformation {
    DeviceName deviceName;
    std::map<std::string, std::string> config;
    int batchForDevice;
};

class AutoBatchAsyncInferRequest;
class AutoBatchExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
    using Ptr = std::shared_ptr<AutoBatchExecutableNetwork>;
    struct WorkerInferRequest {
        using Ptr = std::shared_ptr<WorkerInferRequest>;
        InferenceEngine::SoIInferRequestInternal _inferRequestBatched;
        int _batchSize;
        InferenceEngine::ThreadSafeQueueWithSize<std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task>> _tasks;
        std::vector<InferenceEngine::Task> _completionTasks;
        std::thread _thread;
        std::condition_variable _cond;
        std::mutex _mutex;
        std::exception_ptr _exceptionPtr;
    };

    explicit AutoBatchExecutableNetwork(
        const InferenceEngine::SoExecutableNetworkInternal& networkForDevice,
        const InferenceEngine::SoExecutableNetworkInternal& networkForDeviceWithoutBatch,
        const DeviceInformation& networkDevices,
        const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
        const bool needPerfCounters = false);

    void SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) override;
    InferenceEngine::Parameter GetConfig(const std::string& name) const override;
    InferenceEngine::Parameter GetMetric(const std::string& name) const override;
    InferenceEngine::IInferRequestInternal::Ptr CreateInferRequest() override;
    InferenceEngine::IInferRequestInternal::Ptr CreateInferRequestImpl(
        InferenceEngine::InputsDataMap networkInputs,
        InferenceEngine::OutputsDataMap networkOutputs) override;
    std::shared_ptr<InferenceEngine::RemoteContext> GetContext() const override;
    std::shared_ptr<ngraph::Function> GetExecGraphInfo() override;
    virtual ~AutoBatchExecutableNetwork();

protected:
    static unsigned int ParseTimeoutValue(const std::string&);
    std::atomic_bool _terminate = {false};
    DeviceInformation _device;
    InferenceEngine::SoExecutableNetworkInternal _network;
    InferenceEngine::SoExecutableNetworkInternal _networkWithoutBatch;
    std::vector<WorkerInferRequest::Ptr> _workerRequests;
    std::unordered_map<std::string, InferenceEngine::Parameter> _config;
    bool _needPerfCounters = false;
    std::atomic_size_t _numRequestsCreated = {0};
    std::atomic_int _timeOut = {1000};  // in ms
};

class AutoBatchInferRequest : public InferenceEngine::IInferRequestInternal {
public:
    using Ptr = std::shared_ptr<AutoBatchInferRequest>;
    explicit AutoBatchInferRequest(const InferenceEngine::InputsDataMap& networkInputs,
                                   const InferenceEngine::OutputsDataMap& networkOutputs,
                                   AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
                                   int batch_id,
                                   int num_batch,
                                   bool _needPerfCounters = false);
    std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> GetPerformanceCounts() const override;

    // Batch-Device impl specific: sets the data (blobs from the device request to the batched device request)
    void SetBlobsToAnotherRequest(InferenceEngine::SoIInferRequestInternal& req);
    void CopyInputsIfNeeded();
    void CopyOutputsIfNeeded();
    AutoBatchExecutableNetwork::WorkerInferRequest& _myBatchedRequestWrapper;
    std::exception_ptr _exceptionPtr;

protected:
    std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> _perfMap;
    bool _needPerfCounters = false;
    void CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src, InferenceEngine::Blob::Ptr dst, bool bInput);
    size_t _batchId;
    size_t _batchSize;
};

class AutoBatchAsyncInferRequest : public InferenceEngine::AsyncInferRequestThreadSafeDefault {
public:
    using Ptr = std::shared_ptr<AutoBatchAsyncInferRequest>;

    explicit AutoBatchAsyncInferRequest(const AutoBatchInferRequest::Ptr& inferRequest,
                                        const bool needPerfCounters,
                                        InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
                                        const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor);
    void Infer_ThreadUnsafe() override;
    virtual ~AutoBatchAsyncInferRequest();

    InferenceEngine::SoIInferRequestInternal _inferRequestWithoutBatch;
    AutoBatchInferRequest::Ptr _inferRequest;
};

class AutoBatchInferencePlugin : public InferenceEngine::IInferencePlugin {
public:
    AutoBatchInferencePlugin();
    virtual ~AutoBatchInferencePlugin() = default;
    InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
        const InferenceEngine::CNNNetwork& network,
        const std::map<std::string, std::string>& config) override;
    InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
        const InferenceEngine::CNNNetwork& network,
        const std::shared_ptr<InferenceEngine::RemoteContext>& context,
        const std::map<std::string, std::string>& config) override;

    void SetConfig(const std::map<std::string, std::string>& config) override;
    void CheckConfig(const std::map<std::string, std::string>& config);

    InferenceEngine::Parameter GetConfig(
        const std::string& name,
        const std::map<std::string, InferenceEngine::Parameter>& options) const override;
    InferenceEngine::QueryNetworkResult QueryNetwork(const InferenceEngine::CNNNetwork& network,
                                                     const std::map<std::string, std::string>& config) const override;
    InferenceEngine::Parameter GetMetric(
        const std::string& name,
        const std::map<std::string, InferenceEngine::Parameter>& options) const override;
    InferenceEngine::RemoteContext::Ptr CreateContext(const InferenceEngine::ParamMap&) override;

protected:
    DeviceInformation ParseMetaDevice(const std::string& devicesBatchCfg,
                                      const std::map<std::string, std::string>& config) const;

    std::map<std::string, std::string> GetSupportedConfig(const std::map<std::string, std::string>& config,
                                                          const DeviceName& deviceName) const;
    static DeviceInformation ParseBatchDevice(const std::string& deviceWithBatch);

    InferenceEngine::IExecutableNetworkInternal::Ptr LoadNetworkImpl(
        const InferenceEngine::CNNNetwork& network,
        const std::shared_ptr<InferenceEngine::RemoteContext> context,
        const std::map<std::string, std::string>& config);
};

}  // namespace AutoBatchPlugin

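Editor's note (illustrative usage sketch, not part of the change): with the plugin above registered under the
"BATCH" device name, an application drives it through the regular Core API, either with an explicit batch such
as "BATCH:GPU(4)" or with a bare "BATCH:GPU":

    InferenceEngine::Core core;
    // explicit batch of 4 on the underlying GPU device (capped by the device-memory estimate above)
    auto execNetBatched = core.LoadNetwork(network, "BATCH:GPU(4)");
    // bare "BATCH:GPU" keeps batch 1 - the full machinery is exercised but no real batching happens
    auto execNetBatch1 = core.LoadNetwork(network, "BATCH:GPU");
    auto request = execNetBatched.CreateInferRequest();
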
@@ -609,11 +609,9 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network, const std
     // the more "capable" the CPU in general, the more streams we may want to keep to keep it utilized
     const float memThresholdAssumeLimitedForISA = ov::MemBandwidthPressure::LIMITED/isaSpecificThreshold;
     const float L2_cache_size = mkldnn::utils::get_cache_size(2 /*level*/, true /*per core */);
-    const float L3_cache_size = mkldnn::utils::get_cache_size(3, false);
     ov::MemBandwidthPressure networkToleranceForLowCache = ov::MemBandwidthPressureTolerance(
         clonedNetwork.getFunction(),
-        L2_cache_size, L3_cache_size,
-        memThresholdAssumeLimitedForISA);
+        L2_cache_size, memThresholdAssumeLimitedForISA);
     // num of phys CPU cores (most aggressive value for #streams)
     const auto num_cores = getNumberOfCPUCores();
     // less aggressive
@@ -28,6 +28,7 @@
 
 #include "intel_gpu/runtime/device_query.hpp"
 #include "intel_gpu/runtime/debug_configuration.hpp"
+#include <performance_heuristics.hpp>
 #ifdef __linux__
 # include <dlfcn.h>
 #endif
@@ -681,6 +682,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
     metrics.push_back(METRIC_KEY(RANGE_FOR_STREAMS));
     metrics.push_back(METRIC_KEY(DEVICE_TYPE));
     metrics.push_back(METRIC_KEY(DEVICE_GOPS));
+    metrics.push_back(METRIC_KEY(OPTIMAL_BATCH_SIZE));
     metrics.push_back(GPU_METRIC_KEY(MAX_BATCH_SIZE));
     metrics.push_back(GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
     metrics.push_back(GPU_METRIC_KEY(UARCH_VERSION));
@@ -716,6 +718,76 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
            << static_cast<int>(device_info.gfx_ver.revision);
     }
     IE_SET_METRIC_RETURN(GPU_UARCH_VERSION, s.str());
+    } else if (name == METRIC_KEY(OPTIMAL_BATCH_SIZE)) {
+        auto next_pow_of_2 = [] (float x) {
+            return pow(2, ceil(log(x)/log(2)));
+        };
+        auto closest_pow_of_2 = [] (float x) {
+            return pow(2, floor(log(x)/log(2)));
+        };
+        auto model_param = options.find("MODEL_PTR");
+        if (model_param == options.end()) {
+            GPU_DEBUG_IF(debug_config->verbose >= 1) {
+                GPU_DEBUG_COUT << "[GPU_OPTIMAL_BATCH_SIZE] MODELS_PTR is not set: return 1" << std::endl;
+            }
+            IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, static_cast<unsigned int>(1));
+        }
+        std::shared_ptr<ngraph::Function> model;
+        try {
+            model = model_param->second.as<std::shared_ptr<ngraph::Function>>();
+        } catch (...) {
+            IE_THROW() << "[GPU_OPTIMAL_BATCH_SIZE] MODEL_PTR should be std::shared_ptr<ngraph::Function> type";
+        }
+        GPU_DEBUG_IF(debug_config->verbose >= 1) {
+            GPU_DEBUG_COUT << "DEVICE_INFO:"
+                           << "gfx_version.major, " << device_info.gfx_ver.major
+                           << "gfx_version.minor " << std::to_string(device_info.gfx_ver.minor) << std::endl;
+        }
+        static std::map<cldnn::gfx_version, size_t> gen_kbytes_per_bank = {
+            {{12, 0, 0}, 480},  // TGL
+            {{12, 1, 0}, 2048}, // DG1
+            {{12, 5, 0}, 320},
+            {{12, 7, 0}, 512},
+        };
+        size_t L3_cache_size = device_info.gfx_ver.major && (device_info.gfx_ver.major <= 9)
+                ? 768 * 1024      // Gen9
+                : 2 * 768 * 1024; // reasonable default when no arch has been detected (e.g. due to old driver ver)
+        cldnn::gfx_version gen = {device_info.gfx_ver.major, device_info.gfx_ver.minor, 0 /*ignore the revision*/};
+        auto val = gen_kbytes_per_bank.find(gen);
+        if (gen_kbytes_per_bank.end() != val) {
+            auto kbytes_per_bank = val->second;
+            auto num_banks_per_slice = device_info.num_sub_slices_per_slice > 4
+                    ? next_pow_of_2(device_info.num_sub_slices_per_slice)
+                    : 2 * device_info.num_sub_slices_per_slice;
+            L3_cache_size = kbytes_per_bank * 1024 * num_banks_per_slice * device_info.num_slices;
+            GPU_DEBUG_IF(debug_config->verbose >= 1) {
+                GPU_DEBUG_COUT << "DEVICE_INFO:"
+                               << "num_slices " << device_info.num_slices
+                               << ", num_sub_slices_per_slice " << device_info.num_sub_slices_per_slice
+                               << ", num_banks_per_slice " << num_banks_per_slice
+                               << ", gen_kbytes_per_bank : " << kbytes_per_bank
+                               << ", L3_cache_size is (MB): " << float(L3_cache_size) / 1024 / 1024 << std::endl;
+            }
+        }
+        Config config = _impl->m_configs.GetConfig(device_id);
+        auto networkCloned = CloneAndTransformNetwork(CNNNetwork(model), config);
+        ov::MemBandwidthPressure memPressure = ov::MemBandwidthPressureTolerance(networkCloned.getFunction(), L3_cache_size);
+        unsigned int batch = 1;
+        if (memPressure.max_mem_tolerance != ov::MemBandwidthPressure::UNKNOWN)
+            batch = std::max(1.0, 16 * closest_pow_of_2(memPressure.max_mem_tolerance));
+        std::map<std::string, InferenceEngine::Parameter> options_for_max_batch;
+        options_for_max_batch["MODEL_PTR"] = model;
+        options_for_max_batch["GPU_THROUGHPUT_STREAMS"] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO);
+        auto max_batch_size = GetMetric(GPU_METRIC_KEY(MAX_BATCH_SIZE), options_for_max_batch).as<unsigned int>();
+        unsigned int closest = closest_pow_of_2(max_batch_size);
+        batch = std::min(closest, batch);
+        batch = std::min(256u, batch);  // batch 256 is a max
+        GPU_DEBUG_IF(debug_config->verbose >= 1) {
+            GPU_DEBUG_COUT << memPressure.max_mem_tolerance << std::endl;
+            GPU_DEBUG_COUT << "MAX_BATCH: " << max_batch_size << std::endl;
+            GPU_DEBUG_COUT << "ACTUAL OPTIMAL BATCH: " << batch << std::endl;
+        }
+        IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, batch);
     } else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
         auto deviceName = StringRightTrim(device_info.dev_name, "NEO", false);
         deviceName += std::string(" (") + (device_info.dev_type == cldnn::device_type::discrete_gpu ? "dGPU" : "iGPU") + ")";
@@ -885,7 +957,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
     TransformationsPipeline transformations(config, device_info);
     transformations.apply(nGraphFunc);
     program = std::make_shared<Program>(cloned_network, engine, config, false, true);
     std::pair<int64_t, int64_t> device_memory_usage = program->GetCompiledProgram(0)->get_estimated_device_mem_usage();
     int64_t mem_for_general = std::max(static_cast<int64_t>(1L),
                                        static_cast<int64_t>(static_cast<int64_t>(available_device_mem) - device_memory_usage.first));
     int64_t mem_per_batch = std::max(static_cast<int64_t>(1L), (device_memory_usage.second / static_cast<int64_t>(base_batch_size)));
@@ -48,6 +48,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     list(APPEND DEPENDENCIES ov_auto_plugin)
 endif()
+
+if(ENABLE_AUTO_BATCH)
+    list(APPEND DEPENDENCIES ov_auto_batch_plugin)
+endif()
 
 if (NOT ENABLE_OV_ONNX_FRONTEND)
     list(APPEND EXCLUDED_SOURCE_PATHS "${CMAKE_CURRENT_SOURCE_DIR}/onnx_reader")
 endif()
@@ -24,6 +24,7 @@ inline const std::string getPluginLibNameByDevice(const std::string& deviceName)
         { "GNA", "ov_intel_gna_plugin" },
         { "GPU", "ov_intel_gpu_plugin" },
         { "HETERO", "ov_hetero_plugin" },
+        { "BATCH", "ov_auto_batch_plugin" },
         { "MULTI", "ov_multi_plugin" },
         { "MYRIAD", "myriadPlugin" },
         { "TEMPLATE", "ov_template_plugin" },
|
|||||||
return { "TARGET_FALLBACK" , ConformanceTests::targetDevice };
|
return { "TARGET_FALLBACK" , ConformanceTests::targetDevice };
|
||||||
}
|
}
|
||||||
|
|
||||||
|
inline const std::pair<std::string, std::string> generateDefaultBatchConfig() {
|
||||||
|
// auto-batching with batch 1 (no real batching in fact, but full machinery is in action)
|
||||||
|
return { CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , std::string(ConformanceTests::targetDevice)};
|
||||||
|
}
|
||||||
|
|
||||||
inline const std::vector<std::map<std::string, std::string>> generateConfigs(const std::string& targetDevice,
|
inline const std::vector<std::map<std::string, std::string>> generateConfigs(const std::string& targetDevice,
|
||||||
const std::vector<std::map<std::string, std::string>>& config = {}) {
|
const std::vector<std::map<std::string, std::string>>& config = {}) {
|
||||||
std::pair<std::string, std::string> defaultConfig;
|
std::pair<std::string, std::string> defaultConfig;
|
||||||
@ -49,6 +55,8 @@ inline const std::vector<std::map<std::string, std::string>> generateConfigs(con
|
|||||||
defaultConfig = generateDefaultMultiConfig();
|
defaultConfig = generateDefaultMultiConfig();
|
||||||
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_HETERO)) {
|
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_HETERO)) {
|
||||||
defaultConfig = generateDefaultHeteroConfig();
|
defaultConfig = generateDefaultHeteroConfig();
|
||||||
|
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_BATCH)) {
|
||||||
|
defaultConfig = generateDefaultBatchConfig();
|
||||||
} else {
|
} else {
|
||||||
throw std::runtime_error("Incorrect target device: " + targetDevice);
|
throw std::runtime_error("Incorrect target device: " + targetDevice);
|
||||||
}
|
}
|
||||||
@@ -70,7 +78,8 @@ inline const std::string generateComplexDeviceName(const std::string& deviceName
 
 inline const std::vector<std::string> returnAllPossibleDeviceCombination() {
     std::vector<std::string> res{ConformanceTests::targetDevice};
-    std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO, CommonTestUtils::DEVICE_MULTI};
+    std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO,
+                                     CommonTestUtils::DEVICE_BATCH, CommonTestUtils::DEVICE_MULTI};
     for (const auto& device : devices) {
         res.emplace_back(generateComplexDeviceName(device));
     }
@@ -33,4 +33,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestCallbackTests,
         ::testing::Values(CommonTestUtils::DEVICE_HETERO),
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestCallbackTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestCallbackTests,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestCallbackTests::getTestCaseName);
 }  // namespace
@@ -36,4 +36,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestIOBBlobTest,
         ::testing::Values(CommonTestUtils::DEVICE_HETERO),
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestIOBBlobTest::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestIOBBlobTest,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestIOBBlobTest::getTestCaseName);
 }  // namespace
@@ -38,4 +38,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestMultithreadingTests,
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestMultithreadingTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestMultithreadingTests,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestMultithreadingTests::getTestCaseName);
 
 }  // namespace
@@ -46,4 +46,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Hetero, InferRequestSetBlobByType,
         ::testing::Values(CommonTestUtils::DEVICE_HETERO),
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestSetBlobByType::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Batch, InferRequestSetBlobByType,
+    ::testing::Combine(::testing::ValuesIn(setBlobTypes),
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestSetBlobByType::getTestCaseName);
 }  // namespace
@@ -37,4 +37,9 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestWaitTests,
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestWaitTests::getTestCaseName);
 
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestWaitTests,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestWaitTests::getTestCaseName);
 }  // namespace
@@ -0,0 +1,31 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <auto_batching/auto_batching_tests.hpp>

const std::vector<bool> get_vs_set{ true, false };
const std::vector<size_t> num_streams{ 1, 2 };
const std::vector<size_t> num_requests{ 1, 3, 8, 9, 16, 64 };
const std::vector<size_t> num_batch{ 1, 4, 8, 16, 32, 64, 128, 256 };
using namespace AutoBatchingTests;

namespace {
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test,
                         ::testing::Combine(
                                 ::testing::Values(CommonTestUtils::DEVICE_CPU),
                                 ::testing::ValuesIn(get_vs_set),
                                 ::testing::ValuesIn(num_streams),
                                 ::testing::ValuesIn(num_requests),
                                 ::testing::ValuesIn(num_batch)),
                         AutoBatching_Test::getTestCaseName);
// TODO: for 22.2 (CVS-68949)
//INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test_DetectionOutput,
//                         ::testing::Combine(
//                                 ::testing::Values(CommonTestUtils::DEVICE_CPU),
//                                 ::testing::ValuesIn(get_vs_set),
//                                 ::testing::ValuesIn(num_streams),
//                                 ::testing::ValuesIn(num_requests),
//                                 ::testing::ValuesIn(num_batch)),
//                         AutoBatching_Test_DetectionOutput::getTestCaseName);

} // namespace
@ -21,16 +21,27 @@ using namespace ::testing;
|
|||||||
using namespace InferenceEngine;
|
using namespace InferenceEngine;
|
||||||
using namespace InferenceEngine::gpu;
|
using namespace InferenceEngine::gpu;
|
||||||
|
|
||||||
class RemoteBlob_Test : public CommonTestUtils::TestsCommon {
|
class RemoteBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<bool> {
|
||||||
protected:
|
protected:
|
||||||
std::shared_ptr<ngraph::Function> fn_ptr;
|
std::shared_ptr<ngraph::Function> fn_ptr;
|
||||||
|
std::string deviceName;
|
||||||
|
|
||||||
|
public:
|
||||||
void SetUp() override {
|
void SetUp() override {
|
||||||
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||||
|
deviceName = CommonTestUtils::DEVICE_GPU;
|
||||||
|
auto with_auto_batching = this->GetParam();
|
||||||
|
if (with_auto_batching) { // BATCH:GPU
|
||||||
|
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
|
||||||
|
auto with_auto_batch = obj.param;
|
||||||
|
return std::string("RemoteBlob_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
-TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
+TEST_P(RemoteBlob_Test, smoke_canInputUserBlob) {
 #if defined(ANDROID)
     GTEST_SKIP();
 #endif
@@ -41,7 +52,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
 
     // TODO: Issue: investigate issue with IECore
     auto ie = InferenceEngine::Core();
-    auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net = ie.LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net.CreateInferRequest();
@@ -70,6 +81,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
 
     Blob::Ptr shared_blob = make_shared_blob(net.getInputsInfo().begin()->second->getTensorDesc(), cldnn_context,
                                              shared_buffer);
+    shared_blob->allocate();
     inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, shared_blob);
 
     inf_req_shared.Infer();
@@ -85,7 +97,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
 }
 
 
-TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
+TEST_P(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
 #if defined(ANDROID)
     GTEST_SKIP();
 #endif
@@ -96,7 +108,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
 
     // TODO: Issue: investigate issue with IECore
     auto ie = InferenceEngine::Core();
-    auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net = ie.LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net.CreateInferRequest();
@@ -139,7 +151,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
 }
 
 
-TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
+TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
     auto fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
     CNNNetwork net(fn_ptr);
 
@@ -149,7 +161,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -161,7 +173,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
 
     // inference using remote blob
     auto ocl_instance = std::make_shared<OpenCL>();
-    auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_context.get());
+    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_context.get());
     auto exec_net_shared = ie->LoadNetwork(net, remote_context);
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
     inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, fakeImageData);
@@ -178,7 +190,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
     }
 }
 
-TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
+TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
 #if defined _WIN32
     GTEST_SKIP();
 #endif
@@ -191,7 +203,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -214,7 +226,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
 
     // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
     // without calling thread blocks
-    auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
+    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
     auto exec_net_shared = ie->LoadNetwork(net, remote_context);
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
@@ -270,7 +282,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
     }
 }
 
-TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
+TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
 #if defined _WIN32
     GTEST_SKIP();
 #endif
@@ -283,7 +295,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -307,7 +319,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
 
     // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
     // without calling thread blocks
-    auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
+    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
     auto exec_net_shared = ie->LoadNetwork(net, remote_context);
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
@@ -358,6 +370,10 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
     }
 }
 
+std::vector<bool> with_auto_batching {true, false};
+INSTANTIATE_TEST_SUITE_P(smoke_RemoteBlob, RemoteBlob_Test, ::testing::ValuesIn(with_auto_batching),
+                         RemoteBlob_Test::getTestCaseName);
 
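Note for reviewers: the new bool parameter above only decides which device string the fixture targets. A minimal sketch of that mapping, assuming the usual GPU target (the helper name below is illustrative, not part of the patch):

```cpp
#include <string>

// Hypothetical helper mirroring what the parameterized fixtures do in SetUp():
// prepend the auto-batching meta-device, e.g. "GPU" -> "BATCH:GPU".
inline std::string makeTestDeviceName(bool with_auto_batching, const std::string& target = "GPU") {
    return with_auto_batching ? ("BATCH:" + target) : target;
}
```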
 class BatchedBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<size_t> {
     void SetUp() override {
         num_batch = this->GetParam();
@@ -30,6 +30,7 @@ protected:
     }
 };
 
+std::vector<bool> ov_with_auto_batching {true, false};
 enum class RemoteTensorSharingType {
     USER_CL_TENSOR = 0,
     PLUGIN_CL_TENSOR = 1,
@@ -54,17 +55,34 @@ std::ostream& operator<<(std::ostream& stream, RemoteTensorSharingType sharing_t
     return stream;
 }
 
-class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test, public testing::WithParamInterface<RemoteTensorSharingType> {
+using RemoteTensorSharingTestOptionsParams = std::tuple<RemoteTensorSharingType, bool /*auto-batching*/>;
+
+class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test,
+        public testing::WithParamInterface<RemoteTensorSharingTestOptionsParams> {
+protected:
+    std::shared_ptr<ngraph::Function> fn_ptr;
+    std::string deviceName;
+
 public:
     void SetUp() override {
         fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
+        deviceName = CommonTestUtils::DEVICE_GPU;
+        RemoteTensorSharingType sharing_type;
+        bool with_auto_batching;
+        std::tie(sharing_type, with_auto_batching) = this->GetParam();
+        if (with_auto_batching) // BATCH:GPU
+            deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
     }
-    static std::string getTestCaseName(testing::TestParamInfo<RemoteTensorSharingType> obj) {
-        RemoteTensorSharingType sharing_type = obj.param;
+    static std::string getTestCaseName(const testing::TestParamInfo<RemoteTensorSharingTestOptionsParams>& obj) {
+        RemoteTensorSharingType sharing_type;
+        bool with_auto_batching;
+        std::tie(sharing_type, with_auto_batching) = obj.param;
 
         std::ostringstream result;
+        result << "OVRemoteTensorInputBlob_Test_";
         result << sharing_type;
+        if (with_auto_batching)
+            result << "_WITH_AUTO_BATCHING";
         return result.str();
     }
 };
@@ -81,9 +99,17 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
     p.input().preprocess().convert_element_type(ov::element::f32);
 
     auto function = p.build();
-    auto exec_net = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
-    RemoteTensorSharingType sharing_type = GetParam();
+    RemoteTensorSharingType sharing_type;
+    bool with_auto_batching;
+    std::tie(sharing_type, with_auto_batching) = GetParam();
+
+    // auto-batching relies on availability of the lock() for the tensor (and the *USM_DEVICE is not lockable)
+    if (with_auto_batching
+            && (RemoteTensorSharingType::USER_USM_DEVICE_TENSOR == sharing_type
+                || RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR == sharing_type))
+        GTEST_SKIP();
+
+    auto exec_net = ie.compile_model(function, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net.create_infer_request();
@@ -244,6 +270,7 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
 INSTANTIATE_TEST_SUITE_P(
     smoke_GPU,
     OVRemoteTensorInputBlob_Test,
+    ::testing::Combine(
     ::testing::ValuesIn(std::vector<RemoteTensorSharingType>{RemoteTensorSharingType::USER_CL_TENSOR,
                                                              RemoteTensorSharingType::PLUGIN_CL_TENSOR,
                                                              RemoteTensorSharingType::USER_USM_HOST_TENSOR,
@@ -251,9 +278,29 @@ INSTANTIATE_TEST_SUITE_P(
                                                              RemoteTensorSharingType::PLUGIN_USM_HOST_TENSOR,
                                                              RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR,
                                                              RemoteTensorSharingType::PLUGIN_HOST_TENSOR}),
+    ::testing::ValuesIn(ov_with_auto_batching)),
     OVRemoteTensorInputBlob_Test::getTestCaseName);
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
+class OVRemoteTensor_TestsWithContext : public OVRemoteTensor_Test, public testing::WithParamInterface<bool> {
+protected:
+    std::shared_ptr<ngraph::Function> fn_ptr;
+    std::string deviceName;
+public:
+    void SetUp() override {
+        fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
+        deviceName = CommonTestUtils::DEVICE_GPU;
+        auto with_auto_batching = this->GetParam();
+        if (with_auto_batching) { // BATCH:GPU
+            deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
+        }
+    }
+    static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
+        auto with_auto_batch = obj.param;
+        return std::string("RemoteTensor_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
+    }
+};
+
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContext) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -262,7 +309,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -296,7 +343,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
     }
 }
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContextWithMultipleDevices) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -305,7 +352,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -344,7 +391,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
     }
 }
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_out_of_order) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -353,7 +400,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -423,7 +470,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
     }
 }
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_in_order) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -432,7 +479,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -498,6 +545,9 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
     }
 }
 
+INSTANTIATE_TEST_SUITE_P(smoke_RemoteTensor, OVRemoteTensor_TestsWithContext, ::testing::ValuesIn(ov_with_auto_batching),
+                         OVRemoteTensor_TestsWithContext::getTestCaseName);
+
 TEST_F(OVRemoteTensor_Test, NV12toBGR_image) {
 #if defined(ANDROID)
     GTEST_SKIP();
@@ -0,0 +1,31 @@
+// Copyright (C) 2018-2021 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+#include <auto_batching/auto_batching_tests.hpp>
+
+const std::vector<size_t> num_streams{ 2 };
+const std::vector<bool> get_vs_set{ true, false };
+const std::vector<size_t> num_requests{ 1, 8, 16, 64 };
+const std::vector<size_t> num_batch{ 1, 8, 32, 256 };
+using namespace AutoBatchingTests;
+
+namespace AutoBatchingTests {
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_GPU),
+                                 ::testing::ValuesIn(get_vs_set),
+                                 ::testing::ValuesIn(num_streams),
+                                 ::testing::ValuesIn(num_requests),
+                                 ::testing::ValuesIn(num_batch)),
+                         AutoBatching_Test::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test_DetectionOutput,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_GPU),
+                                 ::testing::ValuesIn(get_vs_set),
+                                 ::testing::ValuesIn(num_streams),
+                                 ::testing::ValuesIn(num_requests),
+                                 ::testing::ValuesIn(num_batch)),
+                         AutoBatching_Test_DetectionOutput::getTestCaseName);
+} // namespace AutoBatchingTests
@@ -52,6 +52,10 @@ const std::vector<std::map<std::string, std::string>> autoConfig = {
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
 };
 
+const std::vector<std::map<std::string, std::string>> autoBatchConfig = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, ExecNetSetPrecision,
                          ::testing::Combine(
                                  ::testing::ValuesIn(netPrecisions),
@@ -72,4 +76,11 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetSetPrecision,
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoConfig)),
                          ExecNetSetPrecision::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, ExecNetSetPrecision,
+                         ::testing::Combine(
+                                 ::testing::ValuesIn(netPrecisions),
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(autoBatchConfig)),
+                         ExecNetSetPrecision::getTestCaseName);
 } // namespace
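The autoBatchConfig entries reach the BATCH plugin through the config key rather than through a "BATCH:GPU" device string. Roughly what such an instantiation ends up doing, as a sketch (function and variable names are placeholders):

```cpp
#include <map>
#include <string>
#include <ie_core.hpp>

// Sketch: load a network on the auto-batching meta-device and point it at GPU
// via AUTO_BATCH_DEVICE_CONFIG, the same key the test configs above pass.
InferenceEngine::ExecutableNetwork loadOnAutoBatch(InferenceEngine::Core& ie,
                                                   InferenceEngine::CNNNetwork& net) {
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU"}
    };
    return ie.LoadNetwork(net, "BATCH", config);
}
```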
@@ -22,27 +22,27 @@ namespace {
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_OPTIMAL_NUMBER_OF_INFER_REQUESTS,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_CONFIG_KEYS,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_METRICS,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_NETWORK_NAME,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_ThrowsUnsupported,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 //
@@ -19,6 +19,10 @@ const std::vector<std::map<std::string, std::string>> autoConfigs = {
     {InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
 };
 
+const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestCallbackTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestCallbackTests,
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoConfigs)),
                          InferRequestCallbackTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestCallbackTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(autoBatchConfigs)),
+                         InferRequestCallbackTests::getTestCaseName);
 } // namespace
@@ -18,6 +18,10 @@ const std::vector<std::map<std::string, std::string>> autoconfigs = {
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES, std::string(CommonTestUtils::DEVICE_CPU) + "," + CommonTestUtils::DEVICE_GPU}}
 };
 
+const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestMultithreadingTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestMultithreadingTes
                                  ::testing::ValuesIn(autoconfigs)),
                          InferRequestMultithreadingTests::getTestCaseName);
 
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestMultithreadingTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_configs)),
+                         InferRequestMultithreadingTests::getTestCaseName);
 } // namespace
@@ -19,6 +19,11 @@ namespace {
             CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
 };
 
+
+const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestWaitTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -32,9 +37,15 @@ namespace {
                          InferRequestWaitTests::getTestCaseName);
 
 INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestWaitTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoConfigs)),
                          InferRequestWaitTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestWaitTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(autoBatchConfigs)),
+                         InferRequestWaitTests::getTestCaseName);
 
 } // namespace
@@ -30,11 +30,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassNetworkTestP, OVClassNetworkTestP, ::tes
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_SUPPORTED_METRICS,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_AVAILABLE_DEVICES,
@@ -42,7 +42,7 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_FULL_DEVICE_NAME,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_OPTIMIZATION_CAPABILITIES,
@@ -62,11 +62,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_ThrowUnsupported,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetConfigTest,
                          OVClassGetConfigTest_ThrowUnsupported,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetAvailableDevices, OVClassGetAvailableDevices, ::testing::Values("GPU"));
 
@@ -104,6 +104,29 @@ namespace {
             CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU},
         {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "should be int"}}
 };
 
+
+const std::vector<std::map<std::string, std::string>> auto_batch_inconfigs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
+        {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "-1"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, "DOESN'T EXIST"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
+        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "-1"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, "ON"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "unknown_file"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_DUMP_KERNELS, "ON"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_TUNING_MODE, "TUNING_UNKNOWN_MODE"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_DEVICE_ID, "DEVICE_UNKNOWN"}},
+};
+
 IE_SUPPRESS_DEPRECATED_END
 
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, IncorrectConfigTests,
@@ -125,6 +148,12 @@ namespace {
                          IncorrectConfigTests::getTestCaseName);
 
 
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_inconfigs)),
+                         IncorrectConfigTests::getTestCaseName);
+
 const std::vector<std::map<std::string, std::string>> conf = {
     {}
 };
@@ -167,17 +196,6 @@ namespace {
 };
 IE_SUPPRESS_DEPRECATED_END
 
-const std::vector<std::map<std::string, std::string>> multiconf = {
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::THROUGHPUT}},
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY}},
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "1"}}
-};
-
 const std::vector<std::map<std::string, std::string>> autoConfigs = {
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
@@ -232,6 +250,12 @@ namespace {
         {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "2"}}
 };
 
+const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "1"}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, DefaultValuesConfigTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -255,4 +279,15 @@ namespace {
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoinconfigs)),
                          IncorrectConfigAPITests::getTestCaseName);
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigAPITests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_inconfigs)),
+                         IncorrectConfigAPITests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, CorrectConfigTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_configs)),
+                         CorrectConfigTests::getTestCaseName);
 } // namespace
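On the timeout entries above: "-1" sits in the incorrect-config list while "1" sits in the correct one, so AUTO_BATCH_TIMEOUT is expected to be a non-negative integer. A sketch of the two cases side by side (assuming the usual InferenceEngine config headers are included):

```cpp
#include <map>
#include <string>

// Accepted by the new CorrectConfigTests instantiation.
const std::map<std::string, std::string> valid_auto_batch_config = {
    {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU"},
    {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "1"},   // non-negative timeout
};

// Expected to be rejected by the IncorrectConfigTests instantiation.
const std::map<std::string, std::string> invalid_auto_batch_config = {
    {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU"},
    {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "-1"},  // negative timeout is treated as invalid
};
```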
@@ -35,12 +35,12 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_METRICS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -50,7 +50,7 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_FULL_DEVICE_NAME,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -80,12 +80,12 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetConfigTest, IEClassGetConfigTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -115,6 +115,26 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );
 
+using IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE = BehaviorTestsUtils::IEClassBaseTestP;
+TEST_P(IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE, GetMetricAndPrintNoThrow) {
+    SKIP_IF_CURRENT_TEST_IS_DISABLED()
+    InferenceEngine::Core ie;
+    InferenceEngine::Parameter p;
+
+    std::map<std::string, InferenceEngine::Parameter> _options = {{"MODEL_PTR", simpleCnnNetwork.getFunction()}};
+    ASSERT_NO_THROW(p = ie.GetMetric(deviceName, METRIC_KEY(OPTIMAL_BATCH_SIZE), _options).as<unsigned int>());
+    unsigned int t = p;
+
+    std::cout << "GPU device optimal batch size: " << t << std::endl;
+
+    ASSERT_METRIC_SUPPORTED_IE(METRIC_KEY(OPTIMAL_BATCH_SIZE));
+}
+
+INSTANTIATE_TEST_SUITE_P(
+        nightly_IEClassExecutableNetworkGetMetricTest, IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE,
+        ::testing::Values("GPU")
+);
+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
@@ -135,6 +155,7 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );
 
+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
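The new OPTIMAL_BATCH_SIZE test passes the model via the MODEL_PTR option because the metric is model-dependent. A hedged sketch of how a client could fold that metric into an explicit BATCH device string; the clamping policy here is illustrative, not what the plugin itself does:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <ie_core.hpp>

// Sketch: query the GPU's model-specific optimal batch and build a "BATCH:GPU(N)" name.
std::string batchDeviceFor(InferenceEngine::Core& ie, InferenceEngine::CNNNetwork& net,
                           unsigned int cap = 32) {
    std::map<std::string, InferenceEngine::Parameter> options = {{"MODEL_PTR", net.getFunction()}};
    unsigned int n = ie.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
    return "BATCH:GPU(" + std::to_string(std::min(n, cap)) + ")";
}
```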
@@ -16,6 +16,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     list(APPEND DEPENDENCIES ov_auto_plugin)
 endif()
 
+if(ENABLE_AUTO_BATCH)
+    list(APPEND DEPENDENCIES ov_auto_batch_plugin)
+endif()
+
+
 # remove once CVS-69781 is fixed
 if(ENABLE_OV_IR_FRONTEND)
     list(APPEND DEPENDENCIES ov_ir_frontend)
@@ -0,0 +1,161 @@
+// Copyright (C) 2018-2021 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include <string>
+#include <utility>
+#include <vector>
+#include <memory>
+
+#include <gpu/gpu_config.hpp>
+#include <common_test_utils/test_common.hpp>
+#include <functional_test_utils/plugin_cache.hpp>
+
+#include "ngraph_functions/subgraph_builders.hpp"
+#include "functional_test_utils/blob_utils.hpp"
+
+using namespace ::testing;
+using namespace InferenceEngine;
+
+namespace AutoBatchingTests {
+using AutoBatchTwoNetsParams = std::tuple<
+        std::string,  // device name
+        bool,         // get or set blob
+        size_t,       // number of streams
+        size_t,       // number of requests
+        size_t>;      // batch size
+
+class AutoBatching_Test : public CommonTestUtils::TestsCommon,
+                          public testing::WithParamInterface<AutoBatchTwoNetsParams> {
+    void SetUp() override {
+        std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
+        fn_ptrs = {ngraph::builder::subgraph::makeSingleConv(),
+                   ngraph::builder::subgraph::makeMultiSingleConv()};
+    };
+public:
+    static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
+        size_t streams, requests, batch;
+        bool use_get_blob;
+        std::string device_name;
+        std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
+        return device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") + "_batch_size_" +
+               std::to_string(batch) +
+               "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
+    }
+
+protected:
+    std::string device_name;
+    bool use_get_blob;
+    size_t num_streams;
+    size_t num_requests;
+    size_t num_batch;
+    std::vector<std::shared_ptr<ngraph::Function>> fn_ptrs;
+
+    void TestAutoBatch() {
+        std::vector<InferenceEngine::CNNNetwork> nets;
+        for (auto &fn_ptr : fn_ptrs) {
+            nets.push_back(CNNNetwork(fn_ptr));
+        }
+
+        auto ie = InferenceEngine::Core();
+        std::vector<std::string> outputs;
+        std::vector<InferRequest> irs;
+        std::vector<std::vector<uint8_t>> ref;
+        std::vector<int> outElementsCount;
+
+        for (size_t i = 0; i < nets.size(); ++i) {
+            auto net = nets[i];
+            auto inputs = net.getInputsInfo();
+            for (auto n : inputs) {
+                n.second->setPrecision(Precision::FP32);
+            }
+            std::map<std::string, std::string> config;
+            if (device_name.find("GPU") != std::string::npos)
+                config[CONFIG_KEY(GPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
+            if (device_name.find("CPU") != std::string::npos)
+                config[CONFIG_KEY(CPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
+            // minimize timeout to reduce test time
+            config[CONFIG_KEY(AUTO_BATCH_TIMEOUT)] = std::to_string(1);
+            auto exec_net_ref = ie.LoadNetwork(net, std::string(CommonTestUtils::DEVICE_BATCH) + ":" +
+                                                    device_name + "(" + std::to_string(num_batch) + ")",
+                                               config);
+
+            for (size_t j = 0; j < num_requests; j++) {
+                outputs.push_back(net.getOutputsInfo().begin()->first);  // single output
+                outElementsCount.push_back(
+                        std::accumulate(begin(fn_ptrs[i]->get_output_shape(0)), end(fn_ptrs[i]->get_output_shape(0)), 1,
+                                        std::multiplies<size_t>()));
+
+                auto inf_req = exec_net_ref.CreateInferRequest();
+                irs.push_back(inf_req);
+
+                std::vector<std::vector<uint8_t>> inData;
+                for (auto n : inputs) {
+                    auto blob = FuncTestUtils::createAndFillBlob(n.second->getTensorDesc());
+                    if (use_get_blob)
+                        memcpy(reinterpret_cast<void *>(inf_req.GetBlob(n.first)->buffer().as<uint8_t*>()),
+                               reinterpret_cast<const void *>(blob->cbuffer().as<uint8_t*>()), blob->byteSize());
+                    else
+                        inf_req.SetBlob(n.first, blob);
+
+                    const auto inBlob = inf_req.GetBlob(n.first);
+                    const auto blobSize = inBlob->byteSize();
+                    const auto inBlobBuf = inBlob->cbuffer().as<uint8_t *>();
+                    inData.push_back(std::vector<uint8_t>(inBlobBuf, inBlobBuf + blobSize));
+                }
+                auto refOutData = ngraph::helpers::interpreterFunction(fn_ptrs[i], {inData}).front().second;
+                ref.push_back(refOutData);
+            }
+        }
+
+        const int niter = 1;
+        for (int i = 0; i < niter; i++) {
+            for (auto ir : irs) {
+                ir.StartAsync();
+            }
+
+            for (auto ir : irs) {
+                ir.Wait(InferRequest::RESULT_READY);
+            }
+        }
+
+        auto thr = FuncTestUtils::GetComparisonThreshold(InferenceEngine::Precision::FP32);
+        for (size_t i = 0; i < irs.size(); ++i) {
+            const auto &refBuffer = ref[i].data();
+            ASSERT_EQ(outElementsCount[i], irs[i].GetBlob(outputs[i])->size());
+            FuncTestUtils::compareRawBuffers(irs[i].GetBlob(outputs[i])->buffer().as<float *>(),
                                             reinterpret_cast<const float *>(refBuffer), outElementsCount[i],
+                                             outElementsCount[i],
+                                             thr);
+        }
+    }
+};
+
+class AutoBatching_Test_DetectionOutput : public AutoBatching_Test {
+public:
+    void SetUp() override {
+        std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
+        fn_ptrs = {ngraph::builder::subgraph::makeEltwisePlusDetectionOutput(),
+                   ngraph::builder::subgraph::makeEltwisePlusDetectionOutput()};
+    };
+
+    static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
+        size_t streams, requests, batch;
+        bool use_get_blob;
+        std::string device_name;
+        std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
+        return "DetectionOutput_HETERO_" + device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") +
+               "_batch_size_" + std::to_string(batch) +
+               "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
+    }
+};
+
+TEST_P(AutoBatching_Test, compareAutoBatchingToSingleBatch) {
+    TestAutoBatch();
+}
+
+TEST_P(AutoBatching_Test_DetectionOutput, compareAutoBatchingToSingleBatch) {
+    TestAutoBatch();
+}
+
+} // namespace AutoBatchingTests
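TestAutoBatch() above drives everything through the explicit device string plus a short timeout. Outside the test harness, the same two knobs look roughly like this (batch size 4 is just an example value):

```cpp
#include <map>
#include <string>
#include <ie_core.hpp>

// Sketch: explicit batch size in the device name, short AUTO_BATCH_TIMEOUT in the config,
// mirroring what TestAutoBatch() composes for GPU.
InferenceEngine::ExecutableNetwork loadLikeTheTest(InferenceEngine::Core& ie,
                                                   InferenceEngine::CNNNetwork& net) {
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "1"}
    };
    return ie.LoadNetwork(net, "BATCH:GPU(4)", config);
}
```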
@@ -10,6 +10,7 @@ const char DEVICE_AUTO[] = "AUTO";
 const char DEVICE_CPU[] = "CPU";
 const char DEVICE_GNA[] = "GNA";
 const char DEVICE_GPU[] = "GPU";
+const char DEVICE_BATCH[] = "BATCH";
 const char DEVICE_HDDL[] = "HDDL";
 const char DEVICE_MYRIAD[] = "MYRIAD";
 const char DEVICE_KEEMBAY[] = "VPUX";
@@ -26,6 +26,9 @@ public:
     MOCK_METHOD3(ImportNetwork, InferenceEngine::SoExecutableNetworkInternal(
             std::istream&, const std::shared_ptr<InferenceEngine::RemoteContext>&, const std::map<std::string, std::string>&));
 
+    MOCK_METHOD2(CreateContext, InferenceEngine::RemoteContext::Ptr(const std::string& deviceName,
+            const InferenceEngine::ParamMap& params));
+
     MOCK_CONST_METHOD3(QueryNetwork, InferenceEngine::QueryNetworkResult(
             const InferenceEngine::CNNNetwork&, const std::string&, const std::map<std::string, std::string>&));
 
@@ -242,6 +242,44 @@ inline std::shared_ptr<ngraph::Function> makeSingleConv(std::vector<size_t> inpu
     return fn_ptr;
 }
 
+inline std::shared_ptr<ngraph::Function> makeEltwisePlusDetectionOutput(std::vector<std::vector<size_t>> inShapes =
+                                                                                {{1, 60}, {1, 165}, {1, 1, 75}},
+                                                                        ngraph::element::Type_t type = ngraph::element::Type_t::f32) {
+    // adding Eltwise so that we can test Auto-Batching's HETERO code-path that splits the DetectionOutput and the rest of the network
+    auto params = ngraph::builder::makeParams(ngraph::element::f32, inShapes);
+    auto paramOuts = ngraph::helpers::convert2OutputVector(
+            ngraph::helpers::castOps2Nodes<ngraph::opset3::Parameter>(params));
+    ngraph::OutputVector outs;
+    for (size_t i = 0; i < inShapes.size(); i++) {
+        auto shape = inShapes[i];
+        auto p = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::f32, ngraph::Shape{shape});
+        auto add = ngraph::builder::makeEltwise(paramOuts[i], p, ngraph::helpers::EltwiseTypes::ADD);
+        params.push_back(p);
+        outs.push_back(add->output(0));
+    }
+    ngraph::op::DetectionOutput::Attributes attr;
+    attr.num_classes = 11;
+    attr.background_label_id = 0;
+    attr.top_k = 75;
+    attr.variance_encoded_in_target = true;
+    attr.keep_top_k = {50};
+    attr.code_type = std::string{"caffe.PriorBoxParameter.CORNER"};
+    attr.share_location = true;
+    attr.nms_threshold = 0.5f;
+    attr.confidence_threshold = 0.5f;
+    attr.clip_after_nms = false;
+    attr.clip_before_nms = false;
+    attr.decrease_label_id = false;
+    attr.normalized = false;
+    attr.input_height = 1;
+    attr.input_width = 1;
+    attr.objectness_score = 0.4f;
+
+    auto detOut = ngraph::builder::makeDetectionOutput(outs, attr);
+    ngraph::ResultVector results{std::make_shared<ngraph::opset3::Result>(detOut)};
+    return std::make_shared<ngraph::Function>(results, params, "EltWiseWithDetectionOutput");
+}
+
 inline std::shared_ptr<ngraph::Function> makeMultiSingleConv(std::vector<size_t> inputShape = {1, 3, 24, 24},
                                                              ngraph::element::Type type = ngraph::element::Type_t::f32) {
     auto param0 = std::make_shared<ngraph::opset1::Parameter>(type, ngraph::Shape(inputShape));
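The builder added above is what the DetectionOutput auto-batching tests feed through the HETERO code-path mentioned in its comment. A short usage sketch with the default shapes (device string and batch value are only examples):

```cpp
#include <ie_core.hpp>
#include "ngraph_functions/subgraph_builders.hpp"

// Sketch: wrap the new subgraph builder into a CNNNetwork and load it through the BATCH device,
// as AutoBatching_Test_DetectionOutput does.
void runDetectionOutputSubgraph(InferenceEngine::Core& ie) {
    auto fn = ngraph::builder::subgraph::makeEltwisePlusDetectionOutput();
    InferenceEngine::CNNNetwork net(fn);
    auto exec_net = ie.LoadNetwork(net, "BATCH:GPU(4)");
}
```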
|
@ -38,6 +38,7 @@ using Config = std::map<std::string, std::string>;
|
|||||||
using namespace MockMultiDevice;
|
using namespace MockMultiDevice;
|
||||||
|
|
||||||
using ConfigParams = std::tuple<
|
using ConfigParams = std::tuple<
|
||||||
|
bool, // if THROUGHPUT
|
||||||
unsigned int, // cpu OPTIMAL_NUMBER_OF_INFER_REQUESTS
|
unsigned int, // cpu OPTIMAL_NUMBER_OF_INFER_REQUESTS
|
||||||
int, // cpu infer requet num of customer want
|
int, // cpu infer requet num of customer want
|
||||||
bool, // if cpu sleep, cpu device will load slow
|
bool, // if cpu sleep, cpu device will load slow
|
||||||
@ -77,12 +78,18 @@ public:
|
|||||||
unsigned int expectOptimalNum;
|
unsigned int expectOptimalNum;
|
||||||
bool cpuSleep;
|
bool cpuSleep;
|
||||||
bool gpuSleep;
|
bool gpuSleep;
|
||||||
std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
|
bool isThroughput;
|
||||||
|
std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
|
||||||
gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = obj.param;
|
gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = obj.param;
|
||||||
std::ostringstream result;
|
std::ostringstream result;
|
||||||
result << "cpuOptimalNum_" << cpuOptimalNum << "cpuCustomerNum_" << cpuCustomerNum;
|
result << "cpuOptimalNum_" << cpuOptimalNum << "cpuCustomerNum_" << cpuCustomerNum;
|
||||||
result << "gpuOptimalNum_" << gpuOptimalNum << "gpuCustomerNum_" << gpuCustomerNum;
|
result << "gpuOptimalNum_" << gpuOptimalNum << "gpuCustomerNum_" << gpuCustomerNum;
|
||||||
result << "expectOptimalNum_" << expectOptimalNum;
|
result << "expectOptimalNum_" << expectOptimalNum;
|
||||||
|
if (isThroughput) {
|
||||||
|
result << "_isThroughput" << "true";
|
||||||
|
} else {
|
||||||
|
result << "__isThroughput" << "false";
|
||||||
|
}
|
||||||
if (cpuSleep) {
|
if (cpuSleep) {
|
||||||
result << "_cpuSleep_" << "true";
|
result << "_cpuSleep_" << "true";
|
||||||
} else {
|
} else {
|
||||||
@ -147,7 +154,7 @@ public:
|
|||||||
IE_SET_METRIC(SUPPORTED_CONFIG_KEYS, supportConfigs, {});
|
IE_SET_METRIC(SUPPORTED_CONFIG_KEYS, supportConfigs, {});
|
||||||
ON_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _))
|
ON_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _))
|
||||||
.WillByDefault(RETURN_MOCK_VALUE(supportConfigs));
|
.WillByDefault(RETURN_MOCK_VALUE(supportConfigs));
|
||||||
EXPECT_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _)).Times(AnyNumber());
|
EXPECT_CALL(*core, GetMetric(_, _, _)).Times(AnyNumber());
|
||||||
|
|
||||||
// test auto plugin
|
// test auto plugin
|
||||||
config.insert({CONFIG_KEY_INTERNAL(MULTI_WORK_MODE_AS_AUTO), InferenceEngine::PluginConfigParams::YES});
|
config.insert({CONFIG_KEY_INTERNAL(MULTI_WORK_MODE_AS_AUTO), InferenceEngine::PluginConfigParams::YES});
|
||||||
@@ -168,11 +175,24 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
     unsigned int expectOptimalNum;
     bool cpuSleep;
     bool gpuSleep;
-    std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
+    bool isThroughput;
+    std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
              gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = this->GetParam();
-    metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
-    metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
+    if (isThroughput) {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, gpuCustomerNum, ""});
+        IE_SET_METRIC(OPTIMAL_BATCH_SIZE, optimalBatchNum, 256);
+        IE_SET_METRIC(RANGE_FOR_STREAMS, rangeOfStreams, std::make_tuple<unsigned int, unsigned int>(1, 2));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(OPTIMAL_BATCH_SIZE)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(optimalBatchNum));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(RANGE_FOR_STREAMS)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(rangeOfStreams));
+    } else {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
+    }
     ON_CALL(*plugin, SelectDevice(_, _, _)).WillByDefault(Return(metaDevices[1]));
     ON_CALL(*plugin, ParseMetaDevices(_, _)).WillByDefault(Return(metaDevices));
     EXPECT_CALL(*plugin, ParseMetaDevices(_, _)).Times(1);
||||||
@ -241,27 +261,28 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
// ConfigParams {unsigned int, int, bool,
|
// ConfigParams {bool, unsigned int, int, bool,
|
||||||
// unsigned int, int, bool, unsigned int}
|
// unsigned int, int, bool, unsigned int}
|
||||||
//
|
//
|
||||||
// every element for ConfigParams
|
// every element for ConfigParams
|
||||||
// {cpuOptimalNum, customer hope for cpu infer requset num, if cpu sleep when load,
|
// {is throughput mode, cpuOptimalNum, customer hope for cpu infer requset num, if cpu sleep when load,
|
||||||
// gpuOptimalNum, customer hope for gpu infer requset num, if gpu sleep when load,
|
// gpuOptimalNum, customer hope for gpu infer requset num, if gpu sleep when load,
|
||||||
// expectOptimalNum of Auto ExecNetwork}
|
// expectOptimalNum of Auto ExecNetwork}
|
||||||
//
|
//
|
||||||
const std::vector<ConfigParams> testConfigs = {
|
const std::vector<ConfigParams> testConfigs = {
|
||||||
ConfigParams {1, -1, false, 2, -1, true, 8},
|
ConfigParams {false, 1, -1, false, 2, -1, true, 8},
|
||||||
ConfigParams {1, -1, false, 10, -1, true, 8},
|
ConfigParams {false, 1, -1, false, 10, -1, true, 8},
|
||||||
ConfigParams {12, -1, false, 2, -1, true, 12},
|
ConfigParams {false, 12, -1, false, 2, -1, true, 12},
|
||||||
ConfigParams {12, -1, false, 10, -1, true, 12},
|
ConfigParams {false, 12, -1, false, 10, -1, true, 12},
|
||||||
ConfigParams {1, -1, true, 2, -1, false, 8},
|
ConfigParams {false, 1, -1, true, 2, -1, false, 8},
|
||||||
ConfigParams {1, -1, true, 10, -1, false, 10},
|
ConfigParams {false, 1, -1, true, 10, -1, false, 10},
|
||||||
ConfigParams {6, -1, true, 2, -1, false, 8},
|
ConfigParams {false, 6, -1, true, 2, -1, false, 8},
|
||||||
ConfigParams {6, -1, true, 10, -1, false, 10},
|
ConfigParams {false, 6, -1, true, 10, -1, false, 10},
|
||||||
ConfigParams {6, 4, false, 2, 3, true, 8},
|
ConfigParams {false, 6, 4, false, 2, 3, true, 8},
|
||||||
ConfigParams {6, 4, false, 10, 3, true, 8},
|
ConfigParams {false, 6, 4, false, 10, 3, true, 8},
|
||||||
ConfigParams {1, 4, true, 2, 3, false, 8},
|
ConfigParams {false, 1, 4, true, 2, 3, false, 8},
|
||||||
ConfigParams {1, 4, true, 10, 3, false, 10}
|
ConfigParams {false, 1, 4, true, 10, 3, false, 10},
|
||||||
|
ConfigParams {true, 1, 4, false, 10, 3, true, 512}
|
||||||
};
|
};
|
||||||
|
|
||||||
INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
|
INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
|
||||||
|
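The INSTANTIATE_TEST_SUITE_P call is cut off at the hunk boundary above. Purely for illustration (the argument list below is a hypothetical completion based on standard GoogleTest usage, not copied from the sources), the instantiation with these parameters would typically look like:

    // Hypothetical completion for illustration only; the real arguments are not shown in this diff.
    INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
                             ::testing::ValuesIn(testConfigs),
                             ExecNetworkGetMetric::getTestCaseName);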
@@ -14,6 +14,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 target_include_directories(${TARGET_NAME} PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/plugin_tests")

 target_link_libraries(${TARGET_NAME} PUBLIC
@@ -25,6 +25,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 set_ie_threading_interface_for(${TARGET_NAME})

 ie_faster_build(${TARGET_NAME}