diff --git a/docs/OV_Runtime_UG/supported_plugins/CPU.md b/docs/OV_Runtime_UG/supported_plugins/CPU.md index fee0b285ba2..26f4cd5a1b5 100644 --- a/docs/OV_Runtime_UG/supported_plugins/CPU.md +++ b/docs/OV_Runtime_UG/supported_plugins/CPU.md @@ -11,7 +11,7 @@ For an in-depth description of CPU plugin, see: ## Device Name The `CPU` device name is used for the CPU plugin. Even though there can be more than one physical socket on a platform, only one device of this kind is listed by OpenVINO. -On multi-socket platforms, load balancing and memory usage distribution between NUMA nodes are handled automatically. +On multi-socket platforms, load balancing and memory usage distribution between NUMA nodes are handled automatically. In order to use CPU for inference, the device name should be passed to the `ov::Core::compile_model()` method: @sphinxtabset @@ -38,7 +38,7 @@ CPU plugin supports the following data types as inference precision of internal - u8 - i8 - u1 - + [Hello Query Device C++ Sample](../../../samples/cpp/hello_query_device/README.md) can be used to print out supported data types for all detected devices. ### Quantized Data Types Specifics @@ -60,7 +60,7 @@ For more details about the `bfloat16` format, see the [BFLOAT16 – Hardware Num Using the `bf16` precision provides the following performance benefits: - Faster multiplication of two `bfloat16` numbers because of shorter mantissa of the `bfloat16` data. -- Reduced memory consumption since `bfloat16` data half the size of 32-bit float. +- Reduced memory consumption since `bfloat16` data half the size of 32-bit float. To check if the CPU device can support the `bfloat16` data type, use the [query device properties interface](./config_properties.md) to query `ov::device::capabilities` property, which should contain `BF16` in the list of CPU capabilities: @@ -76,11 +76,11 @@ To check if the CPU device can support the `bfloat16` data type, use the [query @endsphinxtabset -If the model has been converted to `bf16`, the `ov::hint::inference_precision` is set to `ov::element::bf16` and can be checked via the `ov::CompiledModel::get_property` call. The code below demonstrates how to get the element type: +If the model has been converted to `bf16`, the `ov::inference_precision` is set to `ov::element::bf16` and can be checked via the `ov::CompiledModel::get_property` call. The code below demonstrates how to get the element type: @snippet snippets/cpu/Bfloat16Inference1.cpp part1 -To infer the model in `f32` precision instead of `bf16` on targets with native `bf16` support, set the `ov::hint::inference_precision` to `ov::element::f32`. +To infer the model in `f32` precision instead of `bf16` on targets with native `bf16` support, set the `ov::inference_precision` to `ov::element::f32`. @sphinxtabset @@ -95,12 +95,12 @@ To infer the model in `f32` precision instead of `bf16` on targets with native ` @endsphinxtabset The `Bfloat16` software simulation mode is available on CPUs with Intel® AVX-512 instruction set that do not support the native `avx512_bf16` instruction. This mode is used for development purposes and it does not guarantee good performance. -To enable the simulation, the `ov::hint::inference_precision` has to be explicitly set to `ov::element::bf16`. +To enable the simulation, the `ov::inference_precision` has to be explicitly set to `ov::element::bf16`. -> **NOTE**: If ov::hint::inference_precision is set to ov::element::bf16 on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown. 
+> **NOTE**: If ov::inference_precision is set to ov::element::bf16 on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown. > **NOTE**: Due to the reduced mantissa size of the `bfloat16` data type, the resulting `bf16` inference accuracy may differ from the `f32` inference, especially for models that were not trained using the `bfloat16` data type. If the `bf16` inference accuracy is not acceptable, it is recommended to switch to the `f32` precision. - + ## Supported Features ### Multi-device Execution @@ -204,7 +204,7 @@ The plugin supports the following properties: All parameters must be set before calling `ov::Core::compile_model()` in order to take effect or passed as additional argument to `ov::Core::compile_model()` - `ov::enable_profiling` -- `ov::hint::inference_precision` +- `ov::inference_precision` - `ov::hint::performance_mode` - `ov::hint::num_request` - `ov::num_streams` diff --git a/docs/OV_Runtime_UG/supported_plugins/GNA.md b/docs/OV_Runtime_UG/supported_plugins/GNA.md index 0257fb89158..4fdd2f7320d 100644 --- a/docs/OV_Runtime_UG/supported_plugins/GNA.md +++ b/docs/OV_Runtime_UG/supported_plugins/GNA.md @@ -51,7 +51,7 @@ For details, see a description of the `ov::intel_gna::execution_mode` property. GNA is designed for real-time workloads i.e., noise reduction. For such workloads, processing should be time constrained. Otherwise, extra delays may cause undesired effects such as -*audio glitches*. The GNA driver provides a Quality of Service (QoS) mechanism to ensure that processing can satisfy real-time requirements. +*audio glitches*. The GNA driver provides a Quality of Service (QoS) mechanism to ensure that processing can satisfy real-time requirements. The mechanism interrupts requests that might cause high-priority Windows audio processes to miss the schedule. As a result, long running GNA tasks terminate early. @@ -101,7 +101,7 @@ GNA plugin supports the `i16` and `i8` quantized data types as inference precisi * Accuracy (i16 weights) * Performance (i8 weights) -For POT quantized model, the `ov::hint::inference_precision` property has no effect except cases described in Support for 2D Convolutions using POT. +For POT quantized model, the `ov::inference_precision` property has no effect except cases described in Support for 2D Convolutions using POT. ## Supported Features @@ -206,7 +206,7 @@ In order to take effect, the following parameters must be set before model compi - ov::cache_dir - ov::enable_profiling -- ov::hint::inference_precision +- ov::inference_precision - ov::hint::num_requests - ov::intel_gna::compile_target - ov::intel_gna::firmware_model_image_path @@ -272,7 +272,7 @@ The following tables provide a more explicit representation of the Intel(R) GNA For POT to successfully work with the models including GNA3.0 2D convolutions, the following requirements must be met: * All convolution parameters are natively supported by HW (see tables above). -* The runtime precision is explicitly set by the `ov::hint::inference_precision` property as `i8` for the models produced by the `performance mode` of POT, and as `i16` for the models produced by the `accuracy mode` of POT. +* The runtime precision is explicitly set by the `ov::inference_precision` property as `i8` for the models produced by the `performance mode` of POT, and as `i16` for the models produced by the `accuracy mode` of POT. ### Batch Size Limitation @@ -332,4 +332,4 @@ Increasing batch size only improves efficiency of `MatMul` layers. 
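As a quick illustration of the GNA precision controls described above, here is a minimal C++ sketch of compiling a model with the renamed `ov::inference_precision` property; the `model.xml` path is a placeholder, and `i16` simply corresponds to the accuracy-oriented choice (use `i8` for the performance-oriented one):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Hypothetical model path used only for illustration.
    auto model = core.read_model("model.xml");

    // Request i16 weights (accuracy-oriented); i8 would favor performance.
    // After this patch the property is ov::inference_precision, with
    // ov::hint::inference_precision kept as a backward-compatibility alias.
    auto compiled_model = core.compile_model(model, "GNA",
        ov::inference_precision(ov::element::i16));
    return 0;
}
```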
* [Supported Devices](Supported_Devices.md) * [Converting Model](../../MO_DG/prepare_model/convert_model/Converting_Model.md) -* [Convert model from Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) \ No newline at end of file +* [Convert model from Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) diff --git a/docs/OV_Runtime_UG/supported_plugins/GPU.md b/docs/OV_Runtime_UG/supported_plugins/GPU.md index b1d30f9fca1..9ddc26bf15b 100644 --- a/docs/OV_Runtime_UG/supported_plugins/GPU.md +++ b/docs/OV_Runtime_UG/supported_plugins/GPU.md @@ -138,7 +138,7 @@ It is done by specifying `MULTI:GPU.1,GPU.0` as a target device. For more details, see the [Multi-device execution](../multi_device.md). ### Automatic Batching -The GPU plugin is capable of reporting `ov::max_batch_size` and `ov::optimal_batch_size` metrics with respect to the current hardware +The GPU plugin is capable of reporting `ov::max_batch_size` and `ov::optimal_batch_size` metrics with respect to the current hardware platform and model. Therefore, automatic batching is enabled by default when `ov::optimal_batch_size` is `> 1` and `ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)` is set. Alternatively, it can be enabled explicitly via the device notion, for example `BATCH:GPU`. @@ -238,7 +238,7 @@ For usage examples, refer to the [RemoteTensor API](./GPU_RemoteTensor_API.md). For more details, see the [preprocessing API](../preprocessing_overview.md). ### Model Caching -Cache for the GPU plugin may be enabled via the common OpenVINO `ov::cache_dir` property. GPU plugin implementation supports only caching of compiled kernels, so all plugin-specific model transformations are executed on each `ov::Core::compile_model()` call regardless of the `cache_dir` option. +Cache for the GPU plugin may be enabled via the common OpenVINO `ov::cache_dir` property. GPU plugin implementation supports only caching of compiled kernels, so all plugin-specific model transformations are executed on each `ov::Core::compile_model()` call regardless of the `cache_dir` option. Still, since kernel compilation is a bottleneck in the model loading process, a significant load time reduction can be achieved with the `ov::cache_dir` property enabled. > **NOTE**: Full model caching support is currently implemented as a preview feature. To activate it, set the OV_GPU_CACHE_MODEL environment variable to 1. @@ -262,8 +262,9 @@ All parameters must be set before calling `ov::Core::compile_model()` in order t - ov::enable_profiling - ov::hint::model_priority - ov::hint::performance_mode +- ov::hint::execution_mode - ov::hint::num_requests -- ov::hint::inference_precision +- ov::inference_precision - ov::num_streams - ov::compilation_num_threads - ov::device::id diff --git a/docs/optimization_guide/dldt_deployment_optimization_guide.md b/docs/optimization_guide/dldt_deployment_optimization_guide.md index 5626900fcb3..2efc59bf003 100644 --- a/docs/optimization_guide/dldt_deployment_optimization_guide.md +++ b/docs/optimization_guide/dldt_deployment_optimization_guide.md @@ -17,11 +17,11 @@ @endsphinxdirective Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost. 
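To make the notion of "tuning inference parameters" from the paragraph above concrete, a minimal sketch follows; it mirrors the `ov_properties_api.cpp` snippet changed elsewhere in this patch, and the `sample.xml` path, `CPU` device, and `f32` override are illustrative rather than prescriptive:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");

    // A typical runtime configuration: a high-level performance hint plus an
    // explicit inference precision override (forcing f32 instead of a
    // reduced-precision device default).
    auto compiled_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
        ov::inference_precision(ov::element::f32));
    return 0;
}
```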
-`ov::hint::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model. +`ov::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model. Therefore, optimization should start with defining the use case. For example, if it is about processing millions of samples by overnight jobs in data centers, throughput could be prioritized over latency. On the other hand, real-time usages would likely trade off throughput to deliver the results at minimal latency. A combined scenario is also possible, targeting the highest possible throughput, while maintaining a specific latency threshold. -It is also important to understand how the full-stack application would use the inference component "end-to-end." For example, to know what stages need to be orchestrated to save workload devoted to fetching and preparing input data. +It is also important to understand how the full-stack application would use the inference component "end-to-end." For example, to know what stages need to be orchestrated to save workload devoted to fetching and preparing input data. For more information on this topic, see the following articles: * [feature support by device](@ref features_support_matrix) @@ -30,28 +30,28 @@ For more information on this topic, see the following articles: * [The 'get_tensor' Idiom](@ref tensor_idiom) * For variably-sized inputs, consider [dynamic shapes](../OV_Runtime_UG/ov_dynamic_shapes.md) -See the [latency](./dldt_deployment_optimization_latency.md) and [throughput](./dldt_deployment_optimization_tput.md) optimization guides, for **use-case-specific optimizations** +See the [latency](./dldt_deployment_optimization_latency.md) and [throughput](./dldt_deployment_optimization_tput.md) optimization guides, for **use-case-specific optimizations** ## Writing Performance-Portable Inference Applications Although inference performed in OpenVINO Runtime can be configured with a multitude of low-level performance settings, it is not recommended in most cases. Firstly, achieving the best performance with such adjustments requires deep understanding of device architecture and the inference engine. Secondly, such optimization may not translate well to other device-model combinations. In other words, one set of execution parameters is likely to result in different performance when used under different conditions. For example: - * both the CPU and GPU support the notion of [streams](./dldt_deployment_optimization_tput_advanced.md), yet they deduce their optimal number very differently. - * Even among devices of the same type, different execution configurations can be considered optimal, as in the case of instruction sets or the number of cores for the CPU and the batch size for the GPU. - * Different models have different optimal parameter configurations, considering factors such as compute vs memory-bandwidth, inference precision, and possible model quantization. - * Execution "scheduling" impacts performance strongly and is highly device-specific, for example, GPU-oriented optimizations like batching, combining multiple inputs to achieve the optimal throughput, [do not always map well to the CPU](dldt_deployment_optimization_internals.md). 
- - + * both the CPU and GPU support the notion of [streams](./dldt_deployment_optimization_tput_advanced.md), yet they deduce their optimal number very differently. + * Even among devices of the same type, different execution configurations can be considered optimal, as in the case of instruction sets or the number of cores for the CPU and the batch size for the GPU. + * Different models have different optimal parameter configurations, considering factors such as compute vs memory-bandwidth, inference precision, and possible model quantization. + * Execution "scheduling" impacts performance strongly and is highly device-specific, for example, GPU-oriented optimizations like batching, combining multiple inputs to achieve the optimal throughput, [do not always map well to the CPU](dldt_deployment_optimization_internals.md). + + To make the configuration process much easier and its performance optimization more portable, the option of [Performance Hints](../OV_Runtime_UG/performance_hints.md) has been introduced. It comprises two high-level "presets" focused on either **latency** or **throughput** and, essentially, makes execution specifics irrelevant. -The Performance Hints functionality makes configuration transparent to the application, for example, anticipates the need for explicit (application-side) batching or streams, and facilitates parallel processing of separate infer requests for different input sources +The Performance Hints functionality makes configuration transparent to the application, for example, anticipates the need for explicit (application-side) batching or streams, and facilitates parallel processing of separate infer requests for different input sources ## Additional Resources * [Using Async API and running multiple inference requests in parallel to leverage throughput](@ref throughput_app_design). -* [The throughput approach implementation details for specific devices](dldt_deployment_optimization_internals.md) +* [The throughput approach implementation details for specific devices](dldt_deployment_optimization_internals.md) * [Details on throughput](dldt_deployment_optimization_tput.md) * [Details on latency](dldt_deployment_optimization_latency.md) * [API examples and details](../OV_Runtime_UG/performance_hints.md). diff --git a/docs/snippets/cpu/Bfloat16Inference1.cpp b/docs/snippets/cpu/Bfloat16Inference1.cpp index 51850c6018d..58f42ebfcaf 100644 --- a/docs/snippets/cpu/Bfloat16Inference1.cpp +++ b/docs/snippets/cpu/Bfloat16Inference1.cpp @@ -6,7 +6,7 @@ using namespace InferenceEngine; ov::Core core; auto network = core.read_model("sample.xml"); auto exec_network = core.compile_model(network, "CPU"); -auto inference_precision = exec_network.get_property(ov::hint::inference_precision); +auto inference_precision = exec_network.get_property(ov::inference_precision); //! [part1] return 0; diff --git a/docs/snippets/cpu/Bfloat16Inference2.cpp b/docs/snippets/cpu/Bfloat16Inference2.cpp index c06a6491b89..762329269fc 100644 --- a/docs/snippets/cpu/Bfloat16Inference2.cpp +++ b/docs/snippets/cpu/Bfloat16Inference2.cpp @@ -4,7 +4,7 @@ int main() { using namespace InferenceEngine; //! [part2] ov::Core core; -core.set_property("CPU", ov::hint::inference_precision(ov::element::f32)); +core.set_property("CPU", ov::inference_precision(ov::element::f32)); //! 
[part2] return 0; diff --git a/docs/snippets/ov_hetero.cpp b/docs/snippets/ov_hetero.cpp index 791340afff5..2f5cf3f5c9e 100644 --- a/docs/snippets/ov_hetero.cpp +++ b/docs/snippets/ov_hetero.cpp @@ -49,7 +49,7 @@ auto compiled_model = core.compile_model(model, "HETERO", // profiling is enabled only for GPU ov::device::properties("GPU", ov::enable_profiling(true)), // FP32 inference precision only for CPU - ov::device::properties("CPU", ov::hint::inference_precision(ov::element::f32)) + ov::device::properties("CPU", ov::inference_precision(ov::element::f32)) ); //! [configure_fallback_devices] } diff --git a/docs/snippets/ov_properties_api.cpp b/docs/snippets/ov_properties_api.cpp index 1d971f52ced..d9a144fcf99 100644 --- a/docs/snippets/ov_properties_api.cpp +++ b/docs/snippets/ov_properties_api.cpp @@ -19,7 +19,7 @@ auto model = core.read_model("sample.xml"); //! [compile_model_with_property] auto compiled_model = core.compile_model(model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT), - ov::hint::inference_precision(ov::element::f32)); + ov::inference_precision(ov::element::f32)); //! [compile_model_with_property] } diff --git a/docs/snippets/ov_properties_migration.cpp b/docs/snippets/ov_properties_migration.cpp index 7be66b4a1d1..6ee3279395c 100644 --- a/docs/snippets/ov_properties_migration.cpp +++ b/docs/snippets/ov_properties_migration.cpp @@ -25,7 +25,7 @@ auto model = core.read_model("sample.xml"); auto compiled_model = core.compile_model(model, "MULTI", ov::device::priorities("GPU", "CPU"), ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT), - ov::hint::inference_precision(ov::element::f32)); + ov::inference_precision(ov::element::f32)); //! [core_compile_model] //! [compiled_model_set_property] diff --git a/samples/cpp/benchmark_app/main.cpp b/samples/cpp/benchmark_app/main.cpp index 08f1a58bd10..5be1d8eb9de 100644 --- a/samples/cpp/benchmark_app/main.cpp +++ b/samples/cpp/benchmark_app/main.cpp @@ -500,13 +500,13 @@ int main(int argc, char* argv[]) { auto it_device_infer_precision = device_infer_precision.find(device); if (it_device_infer_precision != device_infer_precision.end()) { // set to user defined value - if (supported(ov::hint::inference_precision.name())) { - device_config.emplace(ov::hint::inference_precision(it_device_infer_precision->second)); + if (supported(ov::inference_precision.name())) { + device_config.emplace(ov::inference_precision(it_device_infer_precision->second)); } else if (device == "MULTI" || device == "AUTO") { // check if the element contains the hardware device property auto value_vec = split(it_device_infer_precision->second, ' '); if (value_vec.size() == 1) { - auto key = ov::hint::inference_precision.name(); + auto key = ov::inference_precision.name(); device_config[key] = it_device_infer_precision->second; } else { // set device inference_precison properties in the AUTO/MULTI plugin @@ -523,16 +523,16 @@ int main(int argc, char* argv[]) { is_dev_set_property[it.first] = false; device_config.erase(it.first); device_config.insert( - ov::device::properties(it.first, ov::hint::inference_precision(it.second))); + ov::device::properties(it.first, ov::inference_precision(it.second))); } else { auto& property = device_config[it.first].as(); - property.emplace(ov::hint::inference_precision(it.second)); + property.emplace(ov::inference_precision(it.second)); } } } } else { throw std::logic_error("Device " + device + " doesn't support config key '" + - ov::hint::inference_precision.name() + "'! 
" + + ov::inference_precision.name() + "'! " + "Please specify -infer_precision for correct devices in format " ":,:" + " or via configuration file."); diff --git a/samples/cpp/speech_sample/main.cpp b/samples/cpp/speech_sample/main.cpp index f7839eceb66..17b06e6984c 100644 --- a/samples/cpp/speech_sample/main.cpp +++ b/samples/cpp/speech_sample/main.cpp @@ -220,7 +220,7 @@ int main(int argc, char* argv[]) { gnaPluginConfig[ov::intel_gna::scale_factors_per_input.name()] = scale_factors_per_input; } } - gnaPluginConfig[ov::hint::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16; + gnaPluginConfig[ov::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16; auto parse_target = [&](const std::string& target) -> ov::intel_gna::HWGeneration { auto hw_target = ov::intel_gna::HWGeneration::UNDEFINED; diff --git a/src/bindings/python/src/pyopenvino/core/properties/properties.cpp b/src/bindings/python/src/pyopenvino/core/properties/properties.cpp index e7db415cad8..efb4c97af45 100644 --- a/src/bindings/python/src/pyopenvino/core/properties/properties.cpp +++ b/src/bindings/python/src/pyopenvino/core/properties/properties.cpp @@ -38,6 +38,7 @@ void regmodule_properties(py::module m) { wrap_property_RO(m_properties, ov::optimal_batch_size, "optimal_batch_size"); wrap_property_RO(m_properties, ov::max_batch_size, "max_batch_size"); wrap_property_RO(m_properties, ov::range_for_async_infer_requests, "range_for_async_infer_requests"); + wrap_property_RW(m_properties, ov::inference_precision, "inference_precision"); // Submodule hint py::module m_hint = diff --git a/src/bindings/python/tests/test_runtime/test_properties.py b/src/bindings/python/tests/test_runtime/test_properties.py index fa946b069e0..524d2699c3a 100644 --- a/src/bindings/python/tests/test_runtime/test_properties.py +++ b/src/bindings/python/tests/test_runtime/test_properties.py @@ -199,6 +199,7 @@ def test_properties_ro(ov_property_ro, expected_value): ((properties.Affinity.NONE, properties.Affinity.NONE),), ), (properties.force_tbb_terminate, "FORCE_TBB_TERMINATE", ((True, True),)), + (properties.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)), (properties.hint.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)), ( properties.hint.model_priority, @@ -362,7 +363,7 @@ def test_single_property_setting(device): properties.cache_dir("./"), properties.inference_num_threads(9), properties.affinity(properties.Affinity.NONE), - properties.hint.inference_precision(Type.f32), + properties.inference_precision(Type.f32), properties.hint.performance_mode(properties.hint.PerformanceMode.LATENCY), properties.hint.num_requests(12), properties.streams.num(5), @@ -374,7 +375,7 @@ def test_single_property_setting(device): properties.cache_dir(): "./", properties.inference_num_threads(): 9, properties.affinity(): properties.Affinity.NONE, - properties.hint.inference_precision(): Type.f32, + properties.inference_precision(): Type.f32, properties.hint.performance_mode(): properties.hint.PerformanceMode.LATENCY, properties.hint.num_requests(): 12, properties.streams.num(): 5, diff --git a/src/inference/include/openvino/runtime/properties.hpp b/src/inference/include/openvino/runtime/properties.hpp index b6e685740f1..b590e0eacaf 100644 --- a/src/inference/include/openvino/runtime/properties.hpp +++ b/src/inference/include/openvino/runtime/properties.hpp @@ -228,16 +228,22 @@ static constexpr Property model_name{"NETWO static constexpr Property 
optimal_number_of_infer_requests{ "OPTIMAL_NUMBER_OF_INFER_REQUESTS"}; +/** + * @brief Hint for device to use specified precision for inference + * @ingroup ov_runtime_cpp_prop_api + */ +static constexpr Property inference_precision{"INFERENCE_PRECISION_HINT"}; + /** * @brief Namespace with hint properties */ namespace hint { /** - * @brief Hint for device to use specified precision for inference + * @brief An alias for inference_precision property for backward compatibility * @ingroup ov_runtime_cpp_prop_api */ -static constexpr Property inference_precision{"INFERENCE_PRECISION_HINT"}; +using ov::inference_precision; /** * @brief Enum to define possible priorities hints @@ -360,6 +366,56 @@ static constexpr Property> model{"MODEL_PTR"}; * @ingroup ov_runtime_cpp_prop_api */ static constexpr Property allow_auto_batching{"ALLOW_AUTO_BATCHING"}; + +/** + * @brief Enum to define possible execution mode hints + * @ingroup ov_runtime_cpp_prop_api + */ +enum class ExecutionMode { + UNDEFINED = -1, //!< Undefined value, settings may vary from device to device + PERFORMANCE = 1, //!< Optimize for max performance + ACCURACY = 2, //!< Optimize for max accuracy +}; + +/** @cond INTERNAL */ +inline std::ostream& operator<<(std::ostream& os, const ExecutionMode& mode) { + switch (mode) { + case ExecutionMode::UNDEFINED: + return os << "UNDEFINED"; + case ExecutionMode::PERFORMANCE: + return os << "PERFORMANCE"; + case ExecutionMode::ACCURACY: + return os << "ACCURACY"; + default: + throw ov::Exception{"Unsupported execution mode hint"}; + } +} + +inline std::istream& operator>>(std::istream& is, ExecutionMode& mode) { + std::string str; + is >> str; + if (str == "PERFORMANCE") { + mode = ExecutionMode::PERFORMANCE; + } else if (str == "ACCURACY") { + mode = ExecutionMode::ACCURACY; + } else if (str == "UNDEFINED") { + mode = ExecutionMode::UNDEFINED; + } else { + throw ov::Exception{"Unsupported execution mode: " + str}; + } + return is; +} +/** @endcond */ + +/** + * @brief High-level OpenVINO Execution hint + * unlike low-level properties that are individual (per-device), the hints are something that every device accepts + * and turns into device-specific settings + * Execution mode hint controls preferred optimization targets (performance or accuracy) for given model + * @ingroup ov_runtime_cpp_prop_api + */ +static constexpr Property execution_mode{"EXECUTION_MODE_HINT"}; + } // namespace hint /** diff --git a/src/plugins/intel_cpu/src/config.cpp b/src/plugins/intel_cpu/src/config.cpp index b7a52b2b21c..e11ca103be8 100644 --- a/src/plugins/intel_cpu/src/config.cpp +++ b/src/plugins/intel_cpu/src/config.cpp @@ -150,7 +150,7 @@ void Config::readProperties(const std::map &prop) { IE_THROW() << "Wrong value for property key " << PluginConfigParams::KEY_ENFORCE_BF16 << ". Expected only YES/NO"; } - } else if (key == ov::hint::inference_precision.name()) { + } else if (key == ov::inference_precision.name()) { if (val == "bf16") { if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core)) { enforceBF16 = true; @@ -162,7 +162,7 @@ void Config::readProperties(const std::map &prop) { enforceBF16 = false; manualEnforceBF16 = false; } else { - IE_THROW() << "Wrong value for property key " << ov::hint::inference_precision.name() + IE_THROW() << "Wrong value for property key " << ov::inference_precision.name() << ". 
Supported values: bf16, f32"; } } else if (key == PluginConfigParams::KEY_CACHE_DIR) { @@ -266,4 +266,3 @@ void Config::updateProperties() { } // namespace intel_cpu } // namespace ov - diff --git a/src/plugins/intel_cpu/src/exec_network.cpp b/src/plugins/intel_cpu/src/exec_network.cpp index 62ba6ccbd8a..117cd6263ed 100644 --- a/src/plugins/intel_cpu/src/exec_network.cpp +++ b/src/plugins/intel_cpu/src/exec_network.cpp @@ -305,7 +305,7 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const RO_property(ov::affinity.name()), RO_property(ov::inference_num_threads.name()), RO_property(ov::enable_profiling.name()), - RO_property(ov::hint::inference_precision.name()), + RO_property(ov::inference_precision.name()), RO_property(ov::hint::performance_mode.name()), RO_property(ov::hint::num_requests.name()), RO_property(ov::execution_devices.name()), @@ -341,10 +341,10 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const } else if (name == ov::enable_profiling.name()) { const bool perfCount = config.collectPerfCounters; return decltype(ov::enable_profiling)::value_type(perfCount); - } else if (name == ov::hint::inference_precision) { + } else if (name == ov::inference_precision) { const auto enforceBF16 = config.enforceBF16; const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32; - return decltype(ov::hint::inference_precision)::value_type(inference_precision); + return decltype(ov::inference_precision)::value_type(inference_precision); } else if (name == ov::hint::performance_mode) { const auto perfHint = ov::util::from_string(config.perfHintsConfig.ovPerfHint, ov::hint::performance_mode); return perfHint; diff --git a/src/plugins/intel_cpu/src/plugin.cpp b/src/plugins/intel_cpu/src/plugin.cpp index 871cc3a5381..159134fbdb8 100644 --- a/src/plugins/intel_cpu/src/plugin.cpp +++ b/src/plugins/intel_cpu/src/plugin.cpp @@ -505,10 +505,10 @@ Parameter Engine::GetConfig(const std::string& name, const std::map> inference_precision; if ((inference_precision != ov::element::i8) && (inference_precision != ov::element::i16)) { @@ -194,7 +194,7 @@ OPENVINO_SUPPRESS_DEPRECATED_END } gnaPrecision = (inference_precision == ov::element::i8) ? Precision::I8 : Precision::I16; } else if (key == GNA_CONFIG_KEY(PRECISION)) { - check_compatibility(ov::hint::inference_precision.name()); + check_compatibility(ov::inference_precision.name()); auto precision = Precision::FromStr(value); if (precision != Precision::I8 && precision != Precision::I16) { THROW_GNA_EXCEPTION << "Unsupported precision of GNA hardware, should be Int16 or Int8, but was: " @@ -329,7 +329,7 @@ void Config::AdjustKeyMapValues() { gnaFlags.exclusive_async_requests ? 
PluginConfigParams::YES: PluginConfigParams::NO; keyConfigMap[ov::hint::performance_mode.name()] = ov::util::to_string(performance_mode); if (inference_precision != ov::element::undefined) { - keyConfigMap[ov::hint::inference_precision.name()] = ov::util::to_string(inference_precision); + keyConfigMap[ov::inference_precision.name()] = ov::util::to_string(inference_precision); } else { keyConfigMap[GNA_CONFIG_KEY(PRECISION)] = gnaPrecision.name(); } @@ -370,7 +370,7 @@ Parameter Config::GetParameter(const std::string& name) const { ov::intel_gna::HWGeneration::UNDEFINED); } else if (name == ov::hint::performance_mode) { return performance_mode; - } else if (name == ov::hint::inference_precision) { + } else if (name == ov::inference_precision) { return inference_precision; } else { auto result = keyConfigMap.find(name); @@ -399,7 +399,7 @@ const Parameter Config::GetSupportedProperties(bool compiled) { { ov::intel_gna::pwl_design_algorithm.name(), model_mutability }, { ov::intel_gna::pwl_max_error_percent.name(), model_mutability }, { ov::hint::performance_mode.name(), ov::PropertyMutability::RW }, - { ov::hint::inference_precision.name(), model_mutability }, + { ov::inference_precision.name(), model_mutability }, { ov::hint::num_requests.name(), model_mutability }, { ov::log::level.name(), ov::PropertyMutability::RW }, { ov::execution_devices.name(), ov::PropertyMutability::RO }, diff --git a/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_executable_network/get_metric.cpp b/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_executable_network/get_metric.cpp index b33ac4e2f84..39091715cff 100644 --- a/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_executable_network/get_metric.cpp +++ b/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_executable_network/get_metric.cpp @@ -173,7 +173,7 @@ INSTANTIATE_TEST_SUITE_P( ::testing::Combine( ::testing::Values("GNA"), ::testing::Values(ov::intel_gna::scale_factors_per_input(std::map{{"0", 1.0f}}), - ov::hint::inference_precision(ngraph::element::i8), + ov::inference_precision(ngraph::element::i8), ov::hint::num_requests(2), ov::intel_gna::pwl_design_algorithm(ov::intel_gna::PWLDesignAlgorithm::UNIFORM_DISTRIBUTION), ov::intel_gna::pwl_max_error_percent(0.2), diff --git a/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_plugin/core_intergration.cpp b/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_plugin/core_intergration.cpp index 8dc3ed19bd6..8bc300bd79c 100644 --- a/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_plugin/core_intergration.cpp +++ b/src/plugins/intel_gna/tests/functional/shared_tests_instances/behavior/ov_plugin/core_intergration.cpp @@ -110,30 +110,35 @@ TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPrecisionHint) { ov::Core core; ov::element::Type precision; - OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision)); + OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision)); ASSERT_EQ(ov::element::undefined, precision); + OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i8))); + OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision)); + ASSERT_EQ(ov::element::i8, precision); + + OPENVINO_SUPPRESS_DEPRECATED_START OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i8))); OV_ASSERT_NO_THROW(precision = 
core.get_property("GNA", ov::hint::inference_precision)); - ASSERT_EQ(ov::element::i8, precision); + OPENVINO_SUPPRESS_DEPRECATED_END - OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i16))); - OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision)); + OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i16))); + OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision)); ASSERT_EQ(ov::element::i16, precision); - OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I8"}})); - OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision)); + OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I8"}})); + OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision)); ASSERT_EQ(ov::element::i8, precision); - OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I16"}})); - OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision)); + OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I16"}})); + OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision)); ASSERT_EQ(ov::element::i16, precision); - ASSERT_THROW(core.set_property("GNA", { ov::hint::inference_precision(ov::element::i8), + ASSERT_THROW(core.set_property("GNA", { ov::inference_precision(ov::element::i8), { GNA_CONFIG_KEY(PRECISION), "I16"}}), ov::Exception); - ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i32)), ov::Exception); - ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::undefined)), ov::Exception); - ASSERT_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "ABC"}}), ov::Exception); + ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i32)), ov::Exception); + ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::undefined)), ov::Exception); + ASSERT_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "ABC"}}), ov::Exception); } TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPerformanceHint) { diff --git a/src/plugins/intel_gna/tests/unit/gna_export_import_test.cpp b/src/plugins/intel_gna/tests/unit/gna_export_import_test.cpp index 7b6b10d3033..09fd35cbc78 100644 --- a/src/plugins/intel_gna/tests/unit/gna_export_import_test.cpp +++ b/src/plugins/intel_gna/tests/unit/gna_export_import_test.cpp @@ -169,7 +169,7 @@ protected: TEST_F(GNAExportImportTest, ExportImportI16) { const ov::AnyMap gna_config = { ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), - ov::hint::inference_precision(ngraph::element::i16) + ov::inference_precision(ngraph::element::i16) }; exported_file_name = "export_test.bin"; ExportModel(exported_file_name, gna_config); @@ -179,7 +179,7 @@ TEST_F(GNAExportImportTest, ExportImportI16) { TEST_F(GNAExportImportTest, ExportImportI8) { const ov::AnyMap gna_config = { ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), - ov::hint::inference_precision(ngraph::element::i8) + ov::inference_precision(ngraph::element::i8) }; exported_file_name = "export_test.bin"; ExportModel(exported_file_name, gna_config); @@ -202,4 +202,4 @@ TEST_F(GNAExportImportTest, ShowLibVersionFromModelInLogDebugMode) { const ov::AnyMap gna_config = {ov::log::level(ov::log::Level::DEBUG)}; 
EXPECT_THAT(ExportImportModelWithLogLevel(gna_config), HasSubstr(ov::intel_gna::common::get_openvino_version_string())); -} \ No newline at end of file +} diff --git a/src/plugins/intel_gna/tests/unit/gna_hw_precision_test.cpp b/src/plugins/intel_gna/tests/unit/gna_hw_precision_test.cpp index d3aaf7e7ee8..380b18db0c1 100644 --- a/src/plugins/intel_gna/tests/unit/gna_hw_precision_test.cpp +++ b/src/plugins/intel_gna/tests/unit/gna_hw_precision_test.cpp @@ -90,7 +90,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestDefault) { TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) { Run({ ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), - ov::hint::inference_precision(ngraph::element::i16) + ov::inference_precision(ngraph::element::i16) }); compare(ngraph::element::i16, ngraph::element::i32, sizeof(int16_t), sizeof(uint32_t)); } @@ -98,7 +98,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) { TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) { Run({ ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), - ov::hint::inference_precision(ngraph::element::i8) + ov::inference_precision(ngraph::element::i8) }); compare(ngraph::element::i16, ngraph::element::i32, sizeof(int8_t), Precision::fromType().size()); } @@ -106,7 +106,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) { TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8LP) { Run({ ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), - ov::hint::inference_precision(ngraph::element::i8) + ov::inference_precision(ngraph::element::i8) }, true); compare(ngraph::element::i8, ngraph::element::i32, sizeof(int8_t), sizeof(int8_t)); } diff --git a/src/plugins/intel_gna/tests/unit/gna_input_preproc_test.cpp b/src/plugins/intel_gna/tests/unit/gna_input_preproc_test.cpp index acae6284dd0..6ae0629823b 100644 --- a/src/plugins/intel_gna/tests/unit/gna_input_preproc_test.cpp +++ b/src/plugins/intel_gna/tests/unit/gna_input_preproc_test.cpp @@ -117,13 +117,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestFp32to ::testing::ValuesIn(std::vector { // gna config map {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 1.0f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 8.0f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 0.125f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, }), ::testing::Values(true), // gna device ::testing::Values(false), // use low precision @@ -141,13 +141,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestFp32to ::testing::ValuesIn(std::vector { // gna config map {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 1.0f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + ov::inference_precision(ngraph::element::i8)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 4.0f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + 
ov::inference_precision(ngraph::element::i8)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 0.25f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + ov::inference_precision(ngraph::element::i8)}, }), ::testing::Values(true), // gna device ::testing::Values(true), // use low precision @@ -189,13 +189,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestI16toI ::testing::ValuesIn(std::vector { // gna config map {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 1.0f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 4.0f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 0.25f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, }), ::testing::Values(true), // gna device ::testing::Values(false), // use low precision @@ -214,13 +214,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestI16toI ::testing::ValuesIn(std::vector { // gna config map, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 1.0f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + ov::inference_precision(ngraph::element::i8)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 10.0f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + ov::inference_precision(ngraph::element::i8)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 20.0f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + ov::inference_precision(ngraph::element::i8)}, }), ::testing::Values(true), // gna device ::testing::Values(true), // use low precision @@ -239,10 +239,10 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestU8toI1 ::testing::ValuesIn(std::vector { // gna config map {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 1.0f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 8.0f}}), - ov::hint::inference_precision(ngraph::element::i16)}, + ov::inference_precision(ngraph::element::i16)}, }), ::testing::Values(true), // gna device ::testing::Values(false), // use low precision @@ -261,10 +261,10 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestU8toI8 ::testing::ValuesIn(std::vector { // gna config map {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), ov::intel_gna::scale_factors_per_input(std::map{{"0", 1.0f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + ov::inference_precision(ngraph::element::i8)}, {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT), 
ov::intel_gna::scale_factors_per_input(std::map{{"0", 4.0f}}), - ov::hint::inference_precision(ngraph::element::i8)}, + ov::inference_precision(ngraph::element::i8)}, }), ::testing::Values(true), // gna device ::testing::Values(true), // use low precision diff --git a/src/plugins/intel_gpu/include/intel_gpu/runtime/execution_config.hpp b/src/plugins/intel_gpu/include/intel_gpu/runtime/execution_config.hpp index 93c3805fe5e..0af98bf1e95 100644 --- a/src/plugins/intel_gpu/include/intel_gpu/runtime/execution_config.hpp +++ b/src/plugins/intel_gpu/include/intel_gpu/runtime/execution_config.hpp @@ -142,6 +142,7 @@ public: protected: void apply_hints(const cldnn::device_info& info); + void apply_execution_hints(const cldnn::device_info& info); void apply_performance_hints(const cldnn::device_info& info); void apply_priority_hints(const cldnn::device_info& info); void apply_debug_options(const cldnn::device_info& info); diff --git a/src/plugins/intel_gpu/src/plugin/compiled_model.cpp b/src/plugins/intel_gpu/src/plugin/compiled_model.cpp index bcc170c2953..bdd985b51d1 100644 --- a/src/plugins/intel_gpu/src/plugin/compiled_model.cpp +++ b/src/plugins/intel_gpu/src/plugin/compiled_model.cpp @@ -473,10 +473,11 @@ InferenceEngine::Parameter CompiledModel::GetMetric(const std::string &name) con ov::PropertyName{ov::intel_gpu::enable_loop_unrolling.name(), PropertyMutability::RO}, ov::PropertyName{ov::cache_dir.name(), PropertyMutability::RO}, ov::PropertyName{ov::hint::performance_mode.name(), PropertyMutability::RO}, + ov::PropertyName{ov::hint::execution_mode.name(), PropertyMutability::RO}, ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RO}, ov::PropertyName{ov::num_streams.name(), PropertyMutability::RO}, ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RO}, - ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RO}, + ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RO}, ov::PropertyName{ov::device::id.name(), PropertyMutability::RO}, ov::PropertyName{ov::execution_devices.name(), PropertyMutability::RO} }; diff --git a/src/plugins/intel_gpu/src/plugin/legacy_api_helper.cpp b/src/plugins/intel_gpu/src/plugin/legacy_api_helper.cpp index 938dcebb857..15f201aa145 100644 --- a/src/plugins/intel_gpu/src/plugin/legacy_api_helper.cpp +++ b/src/plugins/intel_gpu/src/plugin/legacy_api_helper.cpp @@ -13,7 +13,7 @@ bool LegacyAPIHelper::is_new_api_property(const std::pair& static const std::vector new_properties_list = { ov::intel_gpu::hint::queue_priority.name(), ov::intel_gpu::hint::queue_throttle.name(), - ov::hint::inference_precision.name(), + ov::inference_precision.name(), ov::compilation_num_threads.name(), ov::num_streams.name(), }; diff --git a/src/plugins/intel_gpu/src/plugin/plugin.cpp b/src/plugins/intel_gpu/src/plugin/plugin.cpp index fb4b3cc6f52..9399d2c6d83 100644 --- a/src/plugins/intel_gpu/src/plugin/plugin.cpp +++ b/src/plugins/intel_gpu/src/plugin/plugin.cpp @@ -581,10 +581,11 @@ std::vector Plugin::get_supported_properties() const { ov::PropertyName{ov::intel_gpu::enable_loop_unrolling.name(), PropertyMutability::RW}, ov::PropertyName{ov::cache_dir.name(), PropertyMutability::RW}, ov::PropertyName{ov::hint::performance_mode.name(), PropertyMutability::RW}, + ov::PropertyName{ov::hint::execution_mode.name(), PropertyMutability::RW}, ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RW}, ov::PropertyName{ov::num_streams.name(), PropertyMutability::RW}, 
ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RW}, - ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RW}, + ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RW}, ov::PropertyName{ov::device::id.name(), PropertyMutability::RW}, }; diff --git a/src/plugins/intel_gpu/src/plugin/transformations_pipeline.cpp b/src/plugins/intel_gpu/src/plugin/transformations_pipeline.cpp index 064baabfbb4..6bb90f34851 100644 --- a/src/plugins/intel_gpu/src/plugin/transformations_pipeline.cpp +++ b/src/plugins/intel_gpu/src/plugin/transformations_pipeline.cpp @@ -206,7 +206,7 @@ void TransformationsPipeline::apply(std::shared_ptr func) { }; // Add conversion from FP data types to infer precision if it's specified - auto infer_precision = config.get_property(ov::hint::inference_precision); + auto infer_precision = config.get_property(ov::inference_precision); if (infer_precision != ov::element::undefined) { if (!fp_precision_supported(infer_precision)) infer_precision = fallback_precision; diff --git a/src/plugins/intel_gpu/src/runtime/execution_config.cpp b/src/plugins/intel_gpu/src/runtime/execution_config.cpp index 6bd5a344627..5a6cd0a770e 100644 --- a/src/plugins/intel_gpu/src/runtime/execution_config.cpp +++ b/src/plugins/intel_gpu/src/runtime/execution_config.cpp @@ -40,9 +40,10 @@ void ExecutionConfig::set_default() { std::make_tuple(ov::cache_dir, ""), std::make_tuple(ov::num_streams, 1), std::make_tuple(ov::compilation_num_threads, std::max(1, static_cast(std::thread::hardware_concurrency()))), - std::make_tuple(ov::hint::inference_precision, ov::element::f16, InferencePrecisionValidator()), + std::make_tuple(ov::inference_precision, ov::element::f16, InferencePrecisionValidator()), std::make_tuple(ov::hint::model_priority, ov::hint::Priority::MEDIUM), std::make_tuple(ov::hint::performance_mode, ov::hint::PerformanceMode::LATENCY, PerformanceModeValidator()), + std::make_tuple(ov::hint::execution_mode, ov::hint::ExecutionMode::PERFORMANCE), std::make_tuple(ov::hint::num_requests, 0), std::make_tuple(ov::intel_gpu::hint::host_task_priority, ov::hint::Priority::MEDIUM), @@ -119,6 +120,22 @@ Any ExecutionConfig::get_property(const std::string& name) const { return internal_properties.at(name); } +void ExecutionConfig::apply_execution_hints(const cldnn::device_info& info) { + if (is_set_by_user(ov::hint::execution_mode)) { + const auto mode = get_property(ov::hint::execution_mode); + if (!is_set_by_user(ov::inference_precision)) { + if (mode == ov::hint::ExecutionMode::ACCURACY) { + set_property(ov::inference_precision(ov::element::f32)); + } else if (mode == ov::hint::ExecutionMode::PERFORMANCE) { + if (info.supports_fp16) + set_property(ov::inference_precision(ov::element::f16)); + else + set_property(ov::inference_precision(ov::element::f32)); + } + } + } +} + void ExecutionConfig::apply_performance_hints(const cldnn::device_info& info) { if (is_set_by_user(ov::hint::performance_mode)) { const auto mode = get_property(ov::hint::performance_mode); @@ -158,6 +175,7 @@ void ExecutionConfig::apply_debug_options(const cldnn::device_info& info) { } void ExecutionConfig::apply_hints(const cldnn::device_info& info) { + apply_execution_hints(info); apply_performance_hints(info); apply_priority_hints(info); apply_debug_options(info); diff --git a/src/tests/functional/plugin/gpu/behavior/inference_precision.cpp b/src/tests/functional/plugin/gpu/behavior/inference_precision.cpp index 1b32ace027e..215ea840516 100644 --- 
a/src/tests/functional/plugin/gpu/behavior/inference_precision.cpp
+++ b/src/tests/functional/plugin/gpu/behavior/inference_precision.cpp
@@ -37,9 +37,9 @@ TEST_P(InferencePrecisionTests, smoke_canSetInferencePrecisionAndInfer) {
     ov::element::Type model_precision;
     ov::element::Type inference_precision;
     std::tie(model_precision, inference_precision) = GetParam();
-    auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice("GPU", {1, 1, 32, 32}, model_precision);
+    auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(CommonTestUtils::DEVICE_GPU, {1, 1, 32, 32}, model_precision);
     ov::CompiledModel compiled_model;
-    OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, "GPU", ov::hint::inference_precision(inference_precision)));
+    OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, CommonTestUtils::DEVICE_GPU, ov::inference_precision(inference_precision)));
     auto req = compiled_model.create_infer_request();
     OV_ASSERT_NO_THROW(req.infer());
 }
@@ -52,3 +52,35 @@ static const std::vector test_params = {
 };
 
 INSTANTIATE_TEST_SUITE_P(smoke_GPU_BehaviorTests, InferencePrecisionTests, ::testing::ValuesIn(test_params), InferencePrecisionTests::getTestCaseName);
+
+TEST(InferencePrecisionTests, CantSetInvalidInferencePrecision) {
+    ov::Core core;
+
+    ASSERT_NO_THROW(core.get_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision));
+    ASSERT_ANY_THROW(core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::bf16)));
+    ASSERT_ANY_THROW(core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::undefined)));
+}
+
+TEST(ExecutionModeTest, SetCompileGetInferPrecisionAndExecMode) {
+    ov::Core core;
+
+    core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
+    auto model = ngraph::builder::subgraph::makeConvPoolRelu();
+    {
+        auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
+        ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
+        ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
+    }
+
+    {
+        auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
+        ASSERT_EQ(ov::hint::ExecutionMode::ACCURACY, compiled_model.get_property(ov::hint::execution_mode));
+        ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
+    }
+
+    {
+        auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU);
+        ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
+        ASSERT_EQ(ov::element::f16, compiled_model.get_property(ov::hint::inference_precision));
+    }
+}
diff --git a/src/tests/functional/plugin/gpu/concurrency/gpu_concurrency_tests.cpp b/src/tests/functional/plugin/gpu/concurrency/gpu_concurrency_tests.cpp
index 846f5b17731..eda756c53eb 100644
--- a/src/tests/functional/plugin/gpu/concurrency/gpu_concurrency_tests.cpp
+++ b/src/tests/functional/plugin/gpu/concurrency/gpu_concurrency_tests.cpp
@@ -55,7 +55,7 @@ TEST_P(OVConcurrencyTest, canInferTwoExecNets) {
         auto fn = fn_ptrs[i];
 
         auto exec_net = ie.compile_model(fn_ptrs[i], CommonTestUtils::DEVICE_GPU,
-                                         ov::num_streams(num_streams), ov::hint::inference_precision(ov::element::f32));
+                                         ov::num_streams(num_streams), ov::inference_precision(ov::element::f32));
 
         auto input = fn_ptrs[i]->get_parameters().at(0);
         auto output = fn_ptrs[i]->get_results().at(0);
@@ -115,7 +115,7 @@ TEST(canSwapTensorsBetweenInferRequests, inputs) {
     auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
 
     auto ie = ov::Core();
-    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
+    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
 
     const int infer_requests_num = 2;
     ov::InferRequest infer_request1 = compiled_model.create_infer_request();
@@ -193,7 +193,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, usmHostIsNotChanged) {
     auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
 
     auto ie = ov::Core();
-    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
+    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
 
     ov::InferRequest infer_request1 = compiled_model.create_infer_request();
     ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@@ -232,7 +232,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, canSetSystemHostTensor) {
     auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
 
     auto ie = ov::Core();
-    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
+    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
 
     ov::InferRequest infer_request1 = compiled_model.create_infer_request();
     ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@@ -258,7 +258,7 @@ TEST(canSwapTensorsBetweenInferRequests, outputs) {
     auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
 
     auto ie = ov::Core();
-    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
+    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
 
     const int infer_requests_num = 2;
     ov::InferRequest infer_request1 = compiled_model.create_infer_request();
diff --git a/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp b/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp
index ecf8575d4fb..96904a9eead 100644
--- a/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp
+++ b/src/tests/functional/plugin/gpu/remote_blob_tests/cldnn_remote_blob_tests.cpp
@@ -40,7 +40,7 @@ public:
                      {CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "0"},
                     };
         }
-        config.insert({ov::hint::inference_precision.name(), "f32"});
+        config.insert({ov::inference_precision.name(), "f32"});
         fn_ptr = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(with_auto_batching ? CommonTestUtils::DEVICE_BATCH : deviceName);
     }
     static std::string getTestCaseName(const testing::TestParamInfo& obj) {
@@ -230,7 +230,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -277,7 +277,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -305,7 +305,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
     // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
    // without calling thread blocks
     auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
-    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
     // Allocate shared buffers for input and output data which will be set to infer request
@@ -375,7 +375,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -404,7 +404,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
     // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
     // without calling thread blocks
     auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
-    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
     // Allocate shared buffers for input and output data which will be set to infer request
@@ -469,7 +469,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -498,7 +498,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
     // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
     // without calling thread blocks
     auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
-    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
     // Allocate shared buffers for input and output data which will be set to infer request
@@ -601,7 +601,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
 
     /* XXX: is it correct to set KEY_CLDNN_NV12_TWO_INPUTS in case of remote blob? */
     auto exec_net_b = ie.LoadNetwork(net_remote, CommonTestUtils::DEVICE_GPU,
-                { { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::hint::inference_precision.name(), "f32"} });
+                { { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::inference_precision.name(), "f32"} });
     auto inf_req_remote = exec_net_b.CreateInferRequest();
     auto cldnn_context = exec_net_b.GetContext();
     cl_context ctx = std::dynamic_pointer_cast(cldnn_context)->get();
@@ -670,7 +670,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
     net_local.getInputsInfo().begin()->second->setPrecision(Precision::U8);
     net_local.getInputsInfo().begin()->second->getPreProcess().setColorFormat(ColorFormat::NV12);
 
-    auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::hint::inference_precision.name(), "f32"}});
+    auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::inference_precision.name(), "f32"}});
 
     auto inf_req_local = exec_net_b1.CreateInferRequest();
 
@@ -742,7 +742,7 @@ TEST_P(TwoNets_Test, canInferTwoExecNets) {
 
         auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU,
                                        {{PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS, std::to_string(num_streams)},
-                                        {ov::hint::inference_precision.name(), "f32"}});
+                                        {ov::inference_precision.name(), "f32"}});
 
         for (int j = 0; j < num_streams * num_requests; j++) {
             outputs.push_back(net.getOutputsInfo().begin()->first);
diff --git a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp
index 6ceb601ba02..3be67745fd7 100644
--- a/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp
+++ b/src/tests/functional/plugin/gpu/shared_tests_instances/behavior/ov_plugin/core_integration.cpp
@@ -87,6 +87,10 @@ INSTANTIATE_TEST_SUITE_P(
     smoke_OVClassSetModelPriorityConfigTest, OVClassSetModelPriorityConfigTest,
     ::testing::Values("MULTI", "AUTO"));
 
+INSTANTIATE_TEST_SUITE_P(
+    smoke_OVClassSetExecutionModeHintConfigTest, OVClassSetExecutionModeHintConfigTest,
+    ::testing::Values(CommonTestUtils::DEVICE_GPU));
+
 INSTANTIATE_TEST_SUITE_P(
     smoke_OVClassSetTBBForceTerminatePropertyTest, OVClassSetTBBForceTerminatePropertyTest,
     ::testing::Values("CPU", "GPU"));
@@ -346,14 +350,21 @@ TEST_P(OVClassGetPropertyTest_GPU, GetAndSetInferencePrecisionNoThrow) {
     auto value = ov::element::undefined;
     const auto expected_default_precision = ov::element::f16;
 
-    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
+    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
     ASSERT_EQ(expected_default_precision, value);
 
     const auto forced_precision = ov::element::f32;
 
-    OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision)));
-    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
+    OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::inference_precision(forced_precision)));
+    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
     ASSERT_EQ(value, forced_precision);
+
+    OPENVINO_SUPPRESS_DEPRECATED_START
+    const auto forced_precision_deprecated = ov::element::f16;
+    OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision_deprecated)));
+    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
+    ASSERT_EQ(value, forced_precision_deprecated);
+    OPENVINO_SUPPRESS_DEPRECATED_END
 }
 
 TEST_P(OVClassGetPropertyTest_GPU, GetAndSetModelPriorityNoThrow) {
@@ -715,6 +726,9 @@ const std::vector gpuCorrectConfigs = {
 
 auto gpuCorrectConfigsWithSecondaryProperties = []() {
     return std::vector{
+        {ov::device::properties(CommonTestUtils::DEVICE_GPU,
+                                ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
+                                ov::inference_precision(ov::element::f32))},
         {ov::device::properties(CommonTestUtils::DEVICE_GPU,
                                 ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
                                 ov::hint::allow_auto_batching(false))},
diff --git a/src/tests/functional/plugin/shared/include/behavior/ov_plugin/core_integration.hpp b/src/tests/functional/plugin/shared/include/behavior/ov_plugin/core_integration.hpp
index b17fe3b2af6..4180e6dd6ec 100644
--- a/src/tests/functional/plugin/shared/include/behavior/ov_plugin/core_integration.hpp
+++ b/src/tests/functional/plugin/shared/include/behavior/ov_plugin/core_integration.hpp
@@ -119,6 +119,7 @@ using OVClassLoadNetworkAfterCoreRecreateTest = OVClassBaseTestP;
 using OVClassLoadNetworkTest = OVClassQueryNetworkTest;
 using OVClassSetGlobalConfigTest = OVClassBaseTestP;
 using OVClassSetModelPriorityConfigTest = OVClassBaseTestP;
+using OVClassSetExecutionModeHintConfigTest = OVClassBaseTestP;
 using OVClassSetTBBForceTerminatePropertyTest = OVClassBaseTestP;
 using OVClassSetLogLevelConfigTest = OVClassBaseTestP;
 using OVClassSpecificDeviceTestSetConfig = OVClassBaseTestP;
@@ -430,6 +431,22 @@ TEST_P(OVClassSetModelPriorityConfigTest, SetConfigNoThrow) {
     EXPECT_EQ(value, ov::hint::Priority::HIGH);
 }
 
+TEST_P(OVClassSetExecutionModeHintConfigTest, SetConfigNoThrow) {
+    ov::Core ie = createCoreWithTemplate();
+
+    OV_ASSERT_PROPERTY_SUPPORTED(ov::hint::execution_mode);
+
+    ov::hint::ExecutionMode defaultMode{};
+    ASSERT_NO_THROW(defaultMode = ie.get_property(target_device, ov::hint::execution_mode));
+
+    ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::UNDEFINED));
+    ASSERT_EQ(ov::hint::ExecutionMode::UNDEFINED, ie.get_property(target_device, ov::hint::execution_mode));
+    ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
+    ASSERT_EQ(ov::hint::ExecutionMode::ACCURACY, ie.get_property(target_device, ov::hint::execution_mode));
+    ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
+    ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, ie.get_property(target_device, ov::hint::execution_mode));
+}
+
 TEST_P(OVClassSetDevicePriorityConfigTest, SetConfigAndCheckGetConfigNoThrow) {
     ov::Core ie = createCoreWithTemplate();
     std::string devicePriority;
diff --git a/src/tests/functional/plugin/shared/src/execution_graph_tests/normalize_l2_decomposition.cpp b/src/tests/functional/plugin/shared/src/execution_graph_tests/normalize_l2_decomposition.cpp
index 5eb9fb1b402..5de247f6f0d 100644
--- a/src/tests/functional/plugin/shared/src/execution_graph_tests/normalize_l2_decomposition.cpp
+++ b/src/tests/functional/plugin/shared/src/execution_graph_tests/normalize_l2_decomposition.cpp
@@ -36,7 +36,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNonContiguousAxes
     auto core = ov::Core();
     ov::AnyMap config;
     if (device_name == CommonTestUtils::DEVICE_GPU)
-        config.insert(ov::hint::inference_precision(ov::element::f32));
+        config.insert(ov::inference_precision(ov::element::f32));
     const auto compiled_model = core.compile_model(model, device_name, config);
 
     ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@@ -56,7 +56,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNormalizeOverAllA
     auto core = ov::Core();
     ov::AnyMap config;
     if (device_name == CommonTestUtils::DEVICE_GPU)
-        config.insert(ov::hint::inference_precision(ov::element::f32));
+        config.insert(ov::inference_precision(ov::element::f32));
     const auto compiled_model = core.compile_model(model, device_name, config);
 
     ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@@ -76,7 +76,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForNotSorted) {
     auto core = ov::Core();
     ov::AnyMap config;
     if (device_name == CommonTestUtils::DEVICE_GPU)
-        config.insert(ov::hint::inference_precision(ov::element::f32));
+        config.insert(ov::inference_precision(ov::element::f32));
     const auto compiled_model = core.compile_model(model, device_name, config);
 
     ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
@@ -96,7 +96,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForSingleAxis) {
     auto core = ov::Core();
     ov::AnyMap config;
     if (device_name == CommonTestUtils::DEVICE_GPU)
-        config.insert(ov::hint::inference_precision(ov::element::f32));
+        config.insert(ov::inference_precision(ov::element::f32));
     const auto compiled_model = core.compile_model(model, device_name, config);
 
     ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
diff --git a/src/tests/functional/shared_test_classes/src/base/ov_subgraph.cpp b/src/tests/functional/shared_test_classes/src/base/ov_subgraph.cpp
index ee2b91c8985..135abd0def9 100644
--- a/src/tests/functional/shared_test_classes/src/base/ov_subgraph.cpp
+++ b/src/tests/functional/shared_test_classes/src/base/ov_subgraph.cpp
@@ -225,7 +225,7 @@ void SubgraphBaseTest::compile_model() {
                 break;
             }
         }
-        configuration.insert({ov::hint::inference_precision.name(), hint});
+        configuration.insert({ov::inference_precision.name(), hint});
     }
 
     compiledModel = core->compile_model(function, targetDevice, configuration);
diff --git a/src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp b/src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp
index 30560a943cf..3ea4432c33a 100644
--- a/src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp
+++ b/src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp
@@ -54,7 +54,7 @@ void SnippetsTestsCommon::validateOriginalLayersNamesByType(const std::string& l
     ASSERT_TRUE(false) << "Layer type '" << layerType << "' was not found in compiled model";
 }
 void SnippetsTestsCommon::setInferenceType(ov::element::Type type) {
-    configuration.emplace(ov::hint::inference_precision(type));
+    configuration.emplace(ov::inference_precision(type));
 }
 
 } // namespace test