Added common ov::execution_mode hint (#15048)
* [GPU] Added common exec mode hint and gpu support
* Add ov::inference_precision and update usages. Deprecate ov::hint::inference_precision property
* [GPU] Execution mode tests and fixes
* Fixed code style
* Moved execution_mode test to common. Fixes for python API
* Remove deprecations for hint::inference_precision and just keep both
* Fix test
parent 53e699eaba
commit 2201a5f83e
@@ -76,11 +76,11 @@ To check if the CPU device can support the `bfloat16` data type, use the [query

@endsphinxtabset

-If the model has been converted to `bf16`, the `ov::hint::inference_precision` is set to `ov::element::bf16` and can be checked via the `ov::CompiledModel::get_property` call. The code below demonstrates how to get the element type:
+If the model has been converted to `bf16`, the `ov::inference_precision` is set to `ov::element::bf16` and can be checked via the `ov::CompiledModel::get_property` call. The code below demonstrates how to get the element type:

@snippet snippets/cpu/Bfloat16Inference1.cpp part1

-To infer the model in `f32` precision instead of `bf16` on targets with native `bf16` support, set the `ov::hint::inference_precision` to `ov::element::f32`.
+To infer the model in `f32` precision instead of `bf16` on targets with native `bf16` support, set the `ov::inference_precision` to `ov::element::f32`.

@sphinxtabset

@@ -95,9 +95,9 @@ To infer the model in `f32` precision instead of `bf16` on targets with native `

@endsphinxtabset

The `Bfloat16` software simulation mode is available on CPUs with Intel® AVX-512 instruction set that do not support the native `avx512_bf16` instruction. This mode is used for development purposes and it does not guarantee good performance.
-To enable the simulation, the `ov::hint::inference_precision` has to be explicitly set to `ov::element::bf16`.
+To enable the simulation, the `ov::inference_precision` has to be explicitly set to `ov::element::bf16`.

-> **NOTE**: If ov::hint::inference_precision is set to ov::element::bf16 on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.
+> **NOTE**: If ov::inference_precision is set to ov::element::bf16 on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.

> **NOTE**: Due to the reduced mantissa size of the `bfloat16` data type, the resulting `bf16` inference accuracy may differ from the `f32` inference, especially for models that were not trained using the `bfloat16` data type. If the `bf16` inference accuracy is not acceptable, it is recommended to switch to the `f32` precision.

@@ -204,7 +204,7 @@ The plugin supports the following properties:

All parameters must be set before calling `ov::Core::compile_model()` in order to take effect or passed as additional argument to `ov::Core::compile_model()`

- `ov::enable_profiling`
-- `ov::hint::inference_precision`
+- `ov::inference_precision`
- `ov::hint::performance_mode`
- `ov::hint::num_request`
- `ov::num_streams`
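For readers of the doc change above, a minimal C++ sketch of the two calls the referenced snippets perform: checking which precision the CPU plugin selected and forcing `f32`. The model path is a placeholder, and `ov::inference_precision` is the name this patch introduces (`ov::hint::inference_precision` remains as an alias).

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // Compile for CPU and check which precision was actually selected.
    auto compiled = core.compile_model(model, "CPU");
    auto precision = compiled.get_property(ov::inference_precision);
    // On bf16-capable CPUs this is expected to report ov::element::bf16.

    // Force f32 execution even on targets with native bf16 support.
    auto compiled_f32 = core.compile_model(model, "CPU",
                                           ov::inference_precision(ov::element::f32));
    return 0;
}
```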
@@ -101,7 +101,7 @@ GNA plugin supports the `i16` and `i8` quantized data types as inference precisi
* Accuracy (i16 weights)
* Performance (i8 weights)

-For POT quantized model, the `ov::hint::inference_precision` property has no effect except cases described in <a href="#support-for-2d-convolutions-using-pot">Support for 2D Convolutions using POT</a>.
+For POT quantized model, the `ov::inference_precision` property has no effect except cases described in <a href="#support-for-2d-convolutions-using-pot">Support for 2D Convolutions using POT</a>.

## Supported Features

@@ -206,7 +206,7 @@ In order to take effect, the following parameters must be set before model compi

- ov::cache_dir
- ov::enable_profiling
-- ov::hint::inference_precision
+- ov::inference_precision
- ov::hint::num_requests
- ov::intel_gna::compile_target
- ov::intel_gna::firmware_model_image_path
@@ -272,7 +272,7 @@ The following tables provide a more explicit representation of the Intel(R) GNA

For POT to successfully work with the models including GNA3.0 2D convolutions, the following requirements must be met:
* All convolution parameters are natively supported by HW (see tables above).
-* The runtime precision is explicitly set by the `ov::hint::inference_precision` property as `i8` for the models produced by the `performance mode` of POT, and as `i16` for the models produced by the `accuracy mode` of POT.
+* The runtime precision is explicitly set by the `ov::inference_precision` property as `i8` for the models produced by the `performance mode` of POT, and as `i16` for the models produced by the `accuracy mode` of POT.

### Batch Size Limitation
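A brief, hypothetical sketch of setting the renamed property for GNA, matching the i8/i16 weight precisions described above; the header path and model file are assumptions, not part of this patch.

```cpp
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gna/properties.hpp>  // assumed header location

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // Request i8 weights (the POT "performance mode" counterpart);
    // use ov::element::i16 for the "accuracy mode" counterpart instead.
    auto compiled = core.compile_model(model, "GNA",
        ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
        ov::inference_precision(ov::element::i8));
    return 0;
}
```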
@@ -262,8 +262,9 @@ All parameters must be set before calling `ov::Core::compile_model()` in order t
- ov::enable_profiling
- ov::hint::model_priority
- ov::hint::performance_mode
+- ov::hint::execution_mode
- ov::hint::num_requests
-- ov::hint::inference_precision
+- ov::inference_precision
- ov::num_streams
- ov::compilation_num_threads
- ov::device::id
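An illustrative sketch of combining the newly listed `ov::hint::execution_mode` hint with other GPU properties. The model path is hypothetical; the expected `f32` precision in ACCURACY mode follows the GPU tests later in this diff.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // ACCURACY mode keeps f32 math unless the user explicitly requests
    // a lower inference precision.
    auto compiled = core.compile_model(model, "GPU",
        ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY),
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    auto precision = compiled.get_property(ov::inference_precision);
    // Expected to report ov::element::f32 in ACCURACY mode.
    return 0;
}
```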
@@ -17,7 +17,7 @@

@endsphinxdirective

Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost.
-`ov::hint::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model.
+`ov::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model.

Therefore, optimization should start with defining the use case. For example, if it is about processing millions of samples by overnight jobs in data centers, throughput could be prioritized over latency. On the other hand, real-time usages would likely trade off throughput to deliver the results at minimal latency. A combined scenario is also possible, targeting the highest possible throughput, while maintaining a specific latency threshold.
@@ -6,7 +6,7 @@ using namespace InferenceEngine;
ov::Core core;
auto network = core.read_model("sample.xml");
auto exec_network = core.compile_model(network, "CPU");
-auto inference_precision = exec_network.get_property(ov::hint::inference_precision);
+auto inference_precision = exec_network.get_property(ov::inference_precision);
//! [part1]

return 0;
@@ -4,7 +4,7 @@ int main() {
using namespace InferenceEngine;
//! [part2]
ov::Core core;
-core.set_property("CPU", ov::hint::inference_precision(ov::element::f32));
+core.set_property("CPU", ov::inference_precision(ov::element::f32));
//! [part2]

return 0;
@@ -49,7 +49,7 @@ auto compiled_model = core.compile_model(model, "HETERO",
    // profiling is enabled only for GPU
    ov::device::properties("GPU", ov::enable_profiling(true)),
    // FP32 inference precision only for CPU
-    ov::device::properties("CPU", ov::hint::inference_precision(ov::element::f32))
+    ov::device::properties("CPU", ov::inference_precision(ov::element::f32))
);
//! [configure_fallback_devices]
}
@@ -19,7 +19,7 @@ auto model = core.read_model("sample.xml");
//! [compile_model_with_property]
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
-    ov::hint::inference_precision(ov::element::f32));
+    ov::inference_precision(ov::element::f32));
//! [compile_model_with_property]
}
@@ -25,7 +25,7 @@ auto model = core.read_model("sample.xml");
auto compiled_model = core.compile_model(model, "MULTI",
    ov::device::priorities("GPU", "CPU"),
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
-    ov::hint::inference_precision(ov::element::f32));
+    ov::inference_precision(ov::element::f32));
//! [core_compile_model]

//! [compiled_model_set_property]
@ -500,13 +500,13 @@ int main(int argc, char* argv[]) {
|
||||
auto it_device_infer_precision = device_infer_precision.find(device);
|
||||
if (it_device_infer_precision != device_infer_precision.end()) {
|
||||
// set to user defined value
|
||||
if (supported(ov::hint::inference_precision.name())) {
|
||||
device_config.emplace(ov::hint::inference_precision(it_device_infer_precision->second));
|
||||
if (supported(ov::inference_precision.name())) {
|
||||
device_config.emplace(ov::inference_precision(it_device_infer_precision->second));
|
||||
} else if (device == "MULTI" || device == "AUTO") {
|
||||
// check if the element contains the hardware device property
|
||||
auto value_vec = split(it_device_infer_precision->second, ' ');
|
||||
if (value_vec.size() == 1) {
|
||||
auto key = ov::hint::inference_precision.name();
|
||||
auto key = ov::inference_precision.name();
|
||||
device_config[key] = it_device_infer_precision->second;
|
||||
} else {
|
||||
// set device inference_precision properties in the AUTO/MULTI plugin
|
||||
@ -523,16 +523,16 @@ int main(int argc, char* argv[]) {
|
||||
is_dev_set_property[it.first] = false;
|
||||
device_config.erase(it.first);
|
||||
device_config.insert(
|
||||
ov::device::properties(it.first, ov::hint::inference_precision(it.second)));
|
||||
ov::device::properties(it.first, ov::inference_precision(it.second)));
|
||||
} else {
|
||||
auto& property = device_config[it.first].as<ov::AnyMap>();
|
||||
property.emplace(ov::hint::inference_precision(it.second));
|
||||
property.emplace(ov::inference_precision(it.second));
|
||||
}
|
||||
}
|
||||
}
|
||||
} else {
|
||||
throw std::logic_error("Device " + device + " doesn't support config key '" +
|
||||
ov::hint::inference_precision.name() + "'! " +
|
||||
ov::inference_precision.name() + "'! " +
|
||||
"Please specify -infer_precision for correct devices in format "
|
||||
"<dev1>:<infer_precision1>,<dev2>:<infer_precision2>" +
|
||||
" or via configuration file.");
|
||||
|
@ -220,7 +220,7 @@ int main(int argc, char* argv[]) {
|
||||
gnaPluginConfig[ov::intel_gna::scale_factors_per_input.name()] = scale_factors_per_input;
|
||||
}
|
||||
}
|
||||
gnaPluginConfig[ov::hint::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
|
||||
gnaPluginConfig[ov::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
|
||||
auto parse_target = [&](const std::string& target) -> ov::intel_gna::HWGeneration {
|
||||
auto hw_target = ov::intel_gna::HWGeneration::UNDEFINED;
|
||||
|
||||
|
@@ -38,6 +38,7 @@ void regmodule_properties(py::module m) {
    wrap_property_RO(m_properties, ov::optimal_batch_size, "optimal_batch_size");
    wrap_property_RO(m_properties, ov::max_batch_size, "max_batch_size");
    wrap_property_RO(m_properties, ov::range_for_async_infer_requests, "range_for_async_infer_requests");
+    wrap_property_RW(m_properties, ov::inference_precision, "inference_precision");

    // Submodule hint
    py::module m_hint =
@ -199,6 +199,7 @@ def test_properties_ro(ov_property_ro, expected_value):
|
||||
((properties.Affinity.NONE, properties.Affinity.NONE),),
|
||||
),
|
||||
(properties.force_tbb_terminate, "FORCE_TBB_TERMINATE", ((True, True),)),
|
||||
(properties.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
|
||||
(properties.hint.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
|
||||
(
|
||||
properties.hint.model_priority,
|
||||
@ -362,7 +363,7 @@ def test_single_property_setting(device):
|
||||
properties.cache_dir("./"),
|
||||
properties.inference_num_threads(9),
|
||||
properties.affinity(properties.Affinity.NONE),
|
||||
properties.hint.inference_precision(Type.f32),
|
||||
properties.inference_precision(Type.f32),
|
||||
properties.hint.performance_mode(properties.hint.PerformanceMode.LATENCY),
|
||||
properties.hint.num_requests(12),
|
||||
properties.streams.num(5),
|
||||
@ -374,7 +375,7 @@ def test_single_property_setting(device):
|
||||
properties.cache_dir(): "./",
|
||||
properties.inference_num_threads(): 9,
|
||||
properties.affinity(): properties.Affinity.NONE,
|
||||
properties.hint.inference_precision(): Type.f32,
|
||||
properties.inference_precision(): Type.f32,
|
||||
properties.hint.performance_mode(): properties.hint.PerformanceMode.LATENCY,
|
||||
properties.hint.num_requests(): 12,
|
||||
properties.streams.num(): 5,
|
||||
|
@@ -228,16 +228,22 @@ static constexpr Property<std::string, PropertyMutability::RO> model_name{"NETWO
static constexpr Property<uint32_t, PropertyMutability::RO> optimal_number_of_infer_requests{
    "OPTIMAL_NUMBER_OF_INFER_REQUESTS"};

+/**
+ * @brief Hint for device to use specified precision for inference
+ * @ingroup ov_runtime_cpp_prop_api
+ */
+static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};
+
/**
 * @brief Namespace with hint properties
 */
namespace hint {

/**
- * @brief Hint for device to use specified precision for inference
+ * @brief An alias for inference_precision property for backward compatibility
 * @ingroup ov_runtime_cpp_prop_api
 */
-static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};
+using ov::inference_precision;

/**
 * @brief Enum to define possible priorities hints
@@ -360,6 +366,56 @@ static constexpr Property<std::shared_ptr<ov::Model>> model{"MODEL_PTR"};
 * @ingroup ov_runtime_cpp_prop_api
 */
static constexpr Property<bool, PropertyMutability::RW> allow_auto_batching{"ALLOW_AUTO_BATCHING"};

+/**
+ * @brief Enum to define possible execution mode hints
+ * @ingroup ov_runtime_cpp_prop_api
+ */
+enum class ExecutionMode {
+    UNDEFINED = -1,  //!< Undefined value, settings may vary from device to device
+    PERFORMANCE = 1, //!< Optimize for max performance
+    ACCURACY = 2,    //!< Optimize for max accuracy
+};
+
+/** @cond INTERNAL */
+inline std::ostream& operator<<(std::ostream& os, const ExecutionMode& mode) {
+    switch (mode) {
+    case ExecutionMode::UNDEFINED:
+        return os << "UNDEFINED";
+    case ExecutionMode::PERFORMANCE:
+        return os << "PERFORMANCE";
+    case ExecutionMode::ACCURACY:
+        return os << "ACCURACY";
+    default:
+        throw ov::Exception{"Unsupported execution mode hint"};
+    }
+}
+
+inline std::istream& operator>>(std::istream& is, ExecutionMode& mode) {
+    std::string str;
+    is >> str;
+    if (str == "PERFORMANCE") {
+        mode = ExecutionMode::PERFORMANCE;
+    } else if (str == "ACCURACY") {
+        mode = ExecutionMode::ACCURACY;
+    } else if (str == "UNDEFINED") {
+        mode = ExecutionMode::UNDEFINED;
+    } else {
+        throw ov::Exception{"Unsupported execution mode: " + str};
+    }
+    return is;
+}
+/** @endcond */
+
+/**
+ * @brief High-level OpenVINO Execution hint
+ * unlike low-level properties that are individual (per-device), the hints are something that every device accepts
+ * and turns into device-specific settings
+ * Execution mode hint controls preferred optimization targets (performance or accuracy) for given model
+ * @ingroup ov_runtime_cpp_prop_api
+ */
+static constexpr Property<ExecutionMode> execution_mode{"EXECUTION_MODE_HINT"};
+
}  // namespace hint

/**
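As a usage note for the property and stream operators defined above, a small sketch (assumed, not part of this patch) showing both the typed form and the string form that the `operator>>` overload makes possible via an `AnyMap`.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Typed form.
    core.set_property("GPU", ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));

    // String form: operator>> above parses "PERFORMANCE", "ACCURACY"
    // and "UNDEFINED", so the hint can also be passed as a plain string.
    core.set_property("GPU", {{ov::hint::execution_mode.name(), "ACCURACY"}});

    auto mode = core.get_property("GPU", ov::hint::execution_mode);
    return 0;
}
```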
@@ -150,7 +150,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
            IE_THROW() << "Wrong value for property key " << PluginConfigParams::KEY_ENFORCE_BF16
                       << ". Expected only YES/NO";
        }
-    } else if (key == ov::hint::inference_precision.name()) {
+    } else if (key == ov::inference_precision.name()) {
        if (val == "bf16") {
            if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core)) {
                enforceBF16 = true;
@@ -162,7 +162,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
            enforceBF16 = false;
            manualEnforceBF16 = false;
        } else {
-            IE_THROW() << "Wrong value for property key " << ov::hint::inference_precision.name()
+            IE_THROW() << "Wrong value for property key " << ov::inference_precision.name()
                       << ". Supported values: bf16, f32";
        }
    } else if (key == PluginConfigParams::KEY_CACHE_DIR) {
@@ -266,4 +266,3 @@ void Config::updateProperties() {

}   // namespace intel_cpu
}   // namespace ov
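A small usage sketch (assumed, not taken from this patch) of the string values the branch above parses. Note that requesting `bf16` on hardware without native or simulated bfloat16 support is expected to throw, per the documentation earlier in this diff.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // The readProperties() branch above accepts the lowercase strings
    // "bf16" and "f32"; any other value raises an exception.
    core.set_property("CPU", {{ov::inference_precision.name(), "bf16"}});

    // Typed equivalent of the "f32" string value.
    core.set_property("CPU", ov::inference_precision(ov::element::f32));
    return 0;
}
```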
@ -305,7 +305,7 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
|
||||
RO_property(ov::affinity.name()),
|
||||
RO_property(ov::inference_num_threads.name()),
|
||||
RO_property(ov::enable_profiling.name()),
|
||||
RO_property(ov::hint::inference_precision.name()),
|
||||
RO_property(ov::inference_precision.name()),
|
||||
RO_property(ov::hint::performance_mode.name()),
|
||||
RO_property(ov::hint::num_requests.name()),
|
||||
RO_property(ov::execution_devices.name()),
|
||||
@ -341,10 +341,10 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
|
||||
} else if (name == ov::enable_profiling.name()) {
|
||||
const bool perfCount = config.collectPerfCounters;
|
||||
return decltype(ov::enable_profiling)::value_type(perfCount);
|
||||
} else if (name == ov::hint::inference_precision) {
|
||||
} else if (name == ov::inference_precision) {
|
||||
const auto enforceBF16 = config.enforceBF16;
|
||||
const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
|
||||
return decltype(ov::hint::inference_precision)::value_type(inference_precision);
|
||||
return decltype(ov::inference_precision)::value_type(inference_precision);
|
||||
} else if (name == ov::hint::performance_mode) {
|
||||
const auto perfHint = ov::util::from_string(config.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
|
||||
return perfHint;
|
||||
|
@ -505,10 +505,10 @@ Parameter Engine::GetConfig(const std::string& name, const std::map<std::string,
|
||||
} else if (name == ov::enable_profiling.name()) {
|
||||
const bool perfCount = engConfig.collectPerfCounters;
|
||||
return decltype(ov::enable_profiling)::value_type(perfCount);
|
||||
} else if (name == ov::hint::inference_precision) {
|
||||
} else if (name == ov::inference_precision) {
|
||||
const auto enforceBF16 = engConfig.enforceBF16;
|
||||
const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
|
||||
return decltype(ov::hint::inference_precision)::value_type(inference_precision);
|
||||
return decltype(ov::inference_precision)::value_type(inference_precision);
|
||||
} else if (name == ov::hint::performance_mode) {
|
||||
const auto perfHint = ov::util::from_string(engConfig.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
|
||||
return perfHint;
|
||||
@ -594,7 +594,7 @@ Parameter Engine::GetMetric(const std::string& name, const std::map<std::string,
|
||||
RW_property(ov::affinity.name()),
|
||||
RW_property(ov::inference_num_threads.name()),
|
||||
RW_property(ov::enable_profiling.name()),
|
||||
RW_property(ov::hint::inference_precision.name()),
|
||||
RW_property(ov::inference_precision.name()),
|
||||
RW_property(ov::hint::performance_mode.name()),
|
||||
RW_property(ov::hint::num_requests.name()),
|
||||
};
|
||||
|
@ -224,14 +224,21 @@ TEST(OVClassBasicTest, smoke_SetConfigHintInferencePrecision) {
|
||||
auto value = ov::element::f32;
|
||||
const auto precision = InferenceEngine::with_cpu_x86_bfloat16() ? ov::element::bf16 : ov::element::f32;
|
||||
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
|
||||
ASSERT_EQ(precision, value);
|
||||
|
||||
const auto forcedPrecision = ov::element::f32;
|
||||
|
||||
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::hint::inference_precision(forcedPrecision)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::inference_precision(forcedPrecision)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
|
||||
ASSERT_EQ(value, forcedPrecision);
|
||||
|
||||
OPENVINO_SUPPRESS_DEPRECATED_START
|
||||
const auto forced_precision_deprecated = ov::element::f32;
|
||||
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::hint::inference_precision(forced_precision_deprecated)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
|
||||
ASSERT_EQ(value, forced_precision_deprecated);
|
||||
OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
}
|
||||
|
||||
TEST(OVClassBasicTest, smoke_SetConfigEnableProfiling) {
|
||||
|
@ -185,7 +185,7 @@ OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
}
|
||||
} else if (key == ov::hint::performance_mode) {
|
||||
performance_mode = ov::util::from_string(value, ov::hint::performance_mode);
|
||||
} else if (key == ov::hint::inference_precision) {
|
||||
} else if (key == ov::inference_precision) {
|
||||
std::stringstream ss(value);
|
||||
ss >> inference_precision;
|
||||
if ((inference_precision != ov::element::i8) && (inference_precision != ov::element::i16)) {
|
||||
@ -194,7 +194,7 @@ OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
}
|
||||
gnaPrecision = (inference_precision == ov::element::i8) ? Precision::I8 : Precision::I16;
|
||||
} else if (key == GNA_CONFIG_KEY(PRECISION)) {
|
||||
check_compatibility(ov::hint::inference_precision.name());
|
||||
check_compatibility(ov::inference_precision.name());
|
||||
auto precision = Precision::FromStr(value);
|
||||
if (precision != Precision::I8 && precision != Precision::I16) {
|
||||
THROW_GNA_EXCEPTION << "Unsupported precision of GNA hardware, should be Int16 or Int8, but was: "
|
||||
@ -329,7 +329,7 @@ void Config::AdjustKeyMapValues() {
|
||||
gnaFlags.exclusive_async_requests ? PluginConfigParams::YES: PluginConfigParams::NO;
|
||||
keyConfigMap[ov::hint::performance_mode.name()] = ov::util::to_string(performance_mode);
|
||||
if (inference_precision != ov::element::undefined) {
|
||||
keyConfigMap[ov::hint::inference_precision.name()] = ov::util::to_string(inference_precision);
|
||||
keyConfigMap[ov::inference_precision.name()] = ov::util::to_string(inference_precision);
|
||||
} else {
|
||||
keyConfigMap[GNA_CONFIG_KEY(PRECISION)] = gnaPrecision.name();
|
||||
}
|
||||
@ -370,7 +370,7 @@ Parameter Config::GetParameter(const std::string& name) const {
|
||||
ov::intel_gna::HWGeneration::UNDEFINED);
|
||||
} else if (name == ov::hint::performance_mode) {
|
||||
return performance_mode;
|
||||
} else if (name == ov::hint::inference_precision) {
|
||||
} else if (name == ov::inference_precision) {
|
||||
return inference_precision;
|
||||
} else {
|
||||
auto result = keyConfigMap.find(name);
|
||||
@ -399,7 +399,7 @@ const Parameter Config::GetSupportedProperties(bool compiled) {
|
||||
{ ov::intel_gna::pwl_design_algorithm.name(), model_mutability },
|
||||
{ ov::intel_gna::pwl_max_error_percent.name(), model_mutability },
|
||||
{ ov::hint::performance_mode.name(), ov::PropertyMutability::RW },
|
||||
{ ov::hint::inference_precision.name(), model_mutability },
|
||||
{ ov::inference_precision.name(), model_mutability },
|
||||
{ ov::hint::num_requests.name(), model_mutability },
|
||||
{ ov::log::level.name(), ov::PropertyMutability::RW },
|
||||
{ ov::execution_devices.name(), ov::PropertyMutability::RO },
|
||||
|
@ -173,7 +173,7 @@ INSTANTIATE_TEST_SUITE_P(
|
||||
::testing::Combine(
|
||||
::testing::Values("GNA"),
|
||||
::testing::Values(ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8),
|
||||
ov::inference_precision(ngraph::element::i8),
|
||||
ov::hint::num_requests(2),
|
||||
ov::intel_gna::pwl_design_algorithm(ov::intel_gna::PWLDesignAlgorithm::UNIFORM_DISTRIBUTION),
|
||||
ov::intel_gna::pwl_max_error_percent(0.2),
|
||||
|
@ -110,30 +110,35 @@ TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPrecisionHint) {
|
||||
ov::Core core;
|
||||
ov::element::Type precision;
|
||||
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::undefined, precision);
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i8)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i8, precision);
|
||||
|
||||
OPENVINO_SUPPRESS_DEPRECATED_START
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i8)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
ASSERT_EQ(ov::element::i8, precision);
|
||||
OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i16)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i16)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i16, precision);
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I8"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I8"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i8, precision);
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I16"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I16"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i16, precision);
|
||||
|
||||
ASSERT_THROW(core.set_property("GNA", { ov::hint::inference_precision(ov::element::i8),
|
||||
ASSERT_THROW(core.set_property("GNA", { ov::inference_precision(ov::element::i8),
|
||||
{ GNA_CONFIG_KEY(PRECISION), "I16"}}), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i32)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::undefined)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "ABC"}}), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i32)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::undefined)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "ABC"}}), ov::Exception);
|
||||
}
|
||||
|
||||
TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPerformanceHint) {
|
||||
|
@ -169,7 +169,7 @@ protected:
|
||||
TEST_F(GNAExportImportTest, ExportImportI16) {
|
||||
const ov::AnyMap gna_config = {
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i16)
|
||||
ov::inference_precision(ngraph::element::i16)
|
||||
};
|
||||
exported_file_name = "export_test.bin";
|
||||
ExportModel(exported_file_name, gna_config);
|
||||
@ -179,7 +179,7 @@ TEST_F(GNAExportImportTest, ExportImportI16) {
|
||||
TEST_F(GNAExportImportTest, ExportImportI8) {
|
||||
const ov::AnyMap gna_config = {
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i8)
|
||||
ov::inference_precision(ngraph::element::i8)
|
||||
};
|
||||
exported_file_name = "export_test.bin";
|
||||
ExportModel(exported_file_name, gna_config);
|
||||
|
@ -90,7 +90,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestDefault) {
|
||||
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) {
|
||||
Run({
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i16)
|
||||
ov::inference_precision(ngraph::element::i16)
|
||||
});
|
||||
compare(ngraph::element::i16, ngraph::element::i32, sizeof(int16_t), sizeof(uint32_t));
|
||||
}
|
||||
@ -98,7 +98,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) {
|
||||
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {
|
||||
Run({
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i8)
|
||||
ov::inference_precision(ngraph::element::i8)
|
||||
});
|
||||
compare(ngraph::element::i16, ngraph::element::i32, sizeof(int8_t), Precision::fromType<gna_compound_bias_t>().size());
|
||||
}
|
||||
@ -106,7 +106,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {
|
||||
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8LP) {
|
||||
Run({
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i8)
|
||||
ov::inference_precision(ngraph::element::i8)
|
||||
}, true);
|
||||
compare(ngraph::element::i8, ngraph::element::i32, sizeof(int8_t), sizeof(int8_t));
|
||||
}
|
||||
|
@ -117,13 +117,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestFp32to
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.125f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(false), // use low precision
|
||||
@ -141,13 +141,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestFp32to
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
@ -189,13 +189,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestI16toI
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(false), // use low precision
|
||||
@ -214,13 +214,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestI16toI
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map,
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 10.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 20.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
@ -239,10 +239,10 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestU8toI1
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(false), // use low precision
|
||||
@ -261,10 +261,10 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestU8toI8
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
|
@ -142,6 +142,7 @@ public:
|
||||
|
||||
protected:
|
||||
void apply_hints(const cldnn::device_info& info);
|
||||
void apply_execution_hints(const cldnn::device_info& info);
|
||||
void apply_performance_hints(const cldnn::device_info& info);
|
||||
void apply_priority_hints(const cldnn::device_info& info);
|
||||
void apply_debug_options(const cldnn::device_info& info);
|
||||
|
@ -473,10 +473,11 @@ InferenceEngine::Parameter CompiledModel::GetMetric(const std::string &name) con
|
||||
ov::PropertyName{ov::intel_gpu::enable_loop_unrolling.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::cache_dir.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::hint::performance_mode.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::hint::execution_mode.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::num_streams.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::device::id.name(), PropertyMutability::RO},
|
||||
ov::PropertyName{ov::execution_devices.name(), PropertyMutability::RO}
|
||||
};
|
||||
|
@ -13,7 +13,7 @@ bool LegacyAPIHelper::is_new_api_property(const std::pair<std::string, ov::Any>&
|
||||
static const std::vector<std::string> new_properties_list = {
|
||||
ov::intel_gpu::hint::queue_priority.name(),
|
||||
ov::intel_gpu::hint::queue_throttle.name(),
|
||||
ov::hint::inference_precision.name(),
|
||||
ov::inference_precision.name(),
|
||||
ov::compilation_num_threads.name(),
|
||||
ov::num_streams.name(),
|
||||
};
|
||||
|
@ -581,10 +581,11 @@ std::vector<ov::PropertyName> Plugin::get_supported_properties() const {
|
||||
ov::PropertyName{ov::intel_gpu::enable_loop_unrolling.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::cache_dir.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::hint::performance_mode.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::hint::execution_mode.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::num_streams.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RW},
|
||||
ov::PropertyName{ov::device::id.name(), PropertyMutability::RW},
|
||||
};
|
||||
|
||||
|
@ -206,7 +206,7 @@ void TransformationsPipeline::apply(std::shared_ptr<ov::Model> func) {
|
||||
};
|
||||
|
||||
// Add conversion from FP data types to infer precision if it's specified
|
||||
auto infer_precision = config.get_property(ov::hint::inference_precision);
|
||||
auto infer_precision = config.get_property(ov::inference_precision);
|
||||
if (infer_precision != ov::element::undefined) {
|
||||
if (!fp_precision_supported(infer_precision))
|
||||
infer_precision = fallback_precision;
|
||||
|
@@ -40,9 +40,10 @@ void ExecutionConfig::set_default() {
        std::make_tuple(ov::cache_dir, ""),
        std::make_tuple(ov::num_streams, 1),
        std::make_tuple(ov::compilation_num_threads, std::max(1, static_cast<int>(std::thread::hardware_concurrency()))),
-        std::make_tuple(ov::hint::inference_precision, ov::element::f16, InferencePrecisionValidator()),
+        std::make_tuple(ov::inference_precision, ov::element::f16, InferencePrecisionValidator()),
        std::make_tuple(ov::hint::model_priority, ov::hint::Priority::MEDIUM),
        std::make_tuple(ov::hint::performance_mode, ov::hint::PerformanceMode::LATENCY, PerformanceModeValidator()),
+        std::make_tuple(ov::hint::execution_mode, ov::hint::ExecutionMode::PERFORMANCE),
        std::make_tuple(ov::hint::num_requests, 0),

        std::make_tuple(ov::intel_gpu::hint::host_task_priority, ov::hint::Priority::MEDIUM),
@@ -119,6 +120,22 @@ Any ExecutionConfig::get_property(const std::string& name) const {
    return internal_properties.at(name);
}

+void ExecutionConfig::apply_execution_hints(const cldnn::device_info& info) {
+    if (is_set_by_user(ov::hint::execution_mode)) {
+        const auto mode = get_property(ov::hint::execution_mode);
+        if (!is_set_by_user(ov::inference_precision)) {
+            if (mode == ov::hint::ExecutionMode::ACCURACY) {
+                set_property(ov::inference_precision(ov::element::f32));
+            } else if (mode == ov::hint::ExecutionMode::PERFORMANCE) {
+                if (info.supports_fp16)
+                    set_property(ov::inference_precision(ov::element::f16));
+                else
+                    set_property(ov::inference_precision(ov::element::f32));
+            }
+        }
+    }
+}
+
void ExecutionConfig::apply_performance_hints(const cldnn::device_info& info) {
    if (is_set_by_user(ov::hint::performance_mode)) {
        const auto mode = get_property(ov::hint::performance_mode);
@@ -158,6 +175,7 @@ void ExecutionConfig::apply_debug_options(const cldnn::device_info& info) {
}

void ExecutionConfig::apply_hints(const cldnn::device_info& info) {
+    apply_execution_hints(info);
    apply_performance_hints(info);
    apply_priority_hints(info);
    apply_debug_options(info);
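A short illustrative sketch (not part of this patch) of the precedence `apply_execution_hints()` implements: the execution mode only selects a precision when the user has not set one explicitly. The model path is a placeholder.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // ACCURACY alone: the GPU plugin switches itself to f32 internally.
    auto acc = core.compile_model(model, "GPU",
        ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));

    // ACCURACY plus an explicit f16 request: the user's f16 setting is kept,
    // because the code above only adjusts ov::inference_precision when it
    // was not set by the user.
    auto acc_f16 = core.compile_model(model, "GPU",
        ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY),
        ov::inference_precision(ov::element::f16));
    return 0;
}
```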
@ -37,9 +37,9 @@ TEST_P(InferencePrecisionTests, smoke_canSetInferencePrecisionAndInfer) {
|
||||
ov::element::Type model_precision;
|
||||
ov::element::Type inference_precision;
|
||||
std::tie(model_precision, inference_precision) = GetParam();
|
||||
auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice("GPU", {1, 1, 32, 32}, model_precision);
|
||||
auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(CommonTestUtils::DEVICE_GPU, {1, 1, 32, 32}, model_precision);
|
||||
ov::CompiledModel compiled_model;
|
||||
OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, "GPU", ov::hint::inference_precision(inference_precision)));
|
||||
OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, CommonTestUtils::DEVICE_GPU, ov::inference_precision(inference_precision)));
|
||||
auto req = compiled_model.create_infer_request();
|
||||
OV_ASSERT_NO_THROW(req.infer());
|
||||
}
|
||||
@ -52,3 +52,35 @@ static const std::vector<params> test_params = {
|
||||
};
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(smoke_GPU_BehaviorTests, InferencePrecisionTests, ::testing::ValuesIn(test_params), InferencePrecisionTests::getTestCaseName);
|
||||
|
||||
TEST(InferencePrecisionTests, CantSetInvalidInferencePrecision) {
|
||||
ov::Core core;
|
||||
|
||||
ASSERT_NO_THROW(core.get_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision));
|
||||
ASSERT_ANY_THROW(core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::bf16)));
|
||||
ASSERT_ANY_THROW(core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::undefined)));
|
||||
}
|
||||
|
||||
TEST(ExecutionModeTest, SetCompileGetInferPrecisionAndExecMode) {
|
||||
ov::Core core;
|
||||
|
||||
core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
|
||||
auto model = ngraph::builder::subgraph::makeConvPoolRelu();
|
||||
{
|
||||
auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
|
||||
ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
|
||||
ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
|
||||
}
|
||||
|
||||
{
|
||||
auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
|
||||
ASSERT_EQ(ov::hint::ExecutionMode::ACCURACY, compiled_model.get_property(ov::hint::execution_mode));
|
||||
ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
|
||||
}
|
||||
|
||||
{
|
||||
auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU);
|
||||
ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
|
||||
ASSERT_EQ(ov::element::f16, compiled_model.get_property(ov::hint::inference_precision));
|
||||
}
|
||||
}
|
||||
|
@ -55,7 +55,7 @@ TEST_P(OVConcurrencyTest, canInferTwoExecNets) {
|
||||
auto fn = fn_ptrs[i];
|
||||
|
||||
auto exec_net = ie.compile_model(fn_ptrs[i], CommonTestUtils::DEVICE_GPU,
|
||||
ov::num_streams(num_streams), ov::hint::inference_precision(ov::element::f32));
|
||||
ov::num_streams(num_streams), ov::inference_precision(ov::element::f32));
|
||||
|
||||
auto input = fn_ptrs[i]->get_parameters().at(0);
|
||||
auto output = fn_ptrs[i]->get_results().at(0);
|
||||
@ -115,7 +115,7 @@ TEST(canSwapTensorsBetweenInferRequests, inputs) {
|
||||
auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||
|
||||
auto ie = ov::Core();
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
|
||||
|
||||
const int infer_requests_num = 2;
|
||||
ov::InferRequest infer_request1 = compiled_model.create_infer_request();
|
||||
@ -193,7 +193,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, usmHostIsNotChanged) {
|
||||
auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
|
||||
|
||||
auto ie = ov::Core();
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
|
||||
|
||||
ov::InferRequest infer_request1 = compiled_model.create_infer_request();
|
||||
ov::InferRequest infer_request2 = compiled_model.create_infer_request();
|
||||
@ -232,7 +232,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, canSetSystemHostTensor) {
|
||||
auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
|
||||
|
||||
auto ie = ov::Core();
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
|
||||
|
||||
ov::InferRequest infer_request1 = compiled_model.create_infer_request();
|
||||
ov::InferRequest infer_request2 = compiled_model.create_infer_request();
|
||||
@ -258,7 +258,7 @@ TEST(canSwapTensorsBetweenInferRequests, outputs) {
|
||||
auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||
|
||||
auto ie = ov::Core();
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
|
||||
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
|
||||
|
||||
const int infer_requests_num = 2;
|
||||
ov::InferRequest infer_request1 = compiled_model.create_infer_request();
|
||||
|
@ -40,7 +40,7 @@ public:
|
||||
{CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "0"},
|
||||
};
|
||||
}
|
||||
config.insert({ov::hint::inference_precision.name(), "f32"});
|
||||
config.insert({ov::inference_precision.name(), "f32"});
|
||||
fn_ptr = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(with_auto_batching ? CommonTestUtils::DEVICE_BATCH : deviceName);
|
||||
}
|
||||
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
|
||||
@ -230,7 +230,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
|
||||
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
|
||||
|
||||
auto ie = PluginCache::get().ie();
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net_regular.CreateInferRequest();
|
||||
@ -277,7 +277,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
|
||||
|
||||
auto ie = PluginCache::get().ie();
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net_regular.CreateInferRequest();
|
||||
@ -305,7 +305,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
|
||||
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
|
||||
// without calling thread blocks
|
||||
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
|
||||
auto inf_req_shared = exec_net_shared.CreateInferRequest();
|
||||
|
||||
// Allocate shared buffers for input and output data which will be set to infer request
|
||||
@ -375,7 +375,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
|
||||
|
||||
auto ie = PluginCache::get().ie();
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net_regular.CreateInferRequest();
|
||||
@ -404,7 +404,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
|
||||
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
|
||||
// without calling thread blocks
|
||||
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
|
||||
auto inf_req_shared = exec_net_shared.CreateInferRequest();
|
||||
|
||||
// Allocate shared buffers for input and output data which will be set to infer request
|
||||
@ -469,7 +469,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
|
||||
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
|
||||
|
||||
auto ie = PluginCache::get().ie();
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
|
||||
|
||||
// regular inference
|
||||
auto inf_req_regular = exec_net_regular.CreateInferRequest();
|
||||
@ -498,7 +498,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
|
||||
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
|
||||
// without calling thread blocks
|
||||
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
|
||||
auto inf_req_shared = exec_net_shared.CreateInferRequest();
|
||||
|
||||
// Allocate shared buffers for input and output data which will be set to infer request
|
||||
@ -601,7 +601,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
|
||||
|
||||
/* XXX: is it correct to set KEY_CLDNN_NV12_TWO_INPUTS in case of remote blob? */
|
||||
auto exec_net_b = ie.LoadNetwork(net_remote, CommonTestUtils::DEVICE_GPU,
|
||||
{ { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::hint::inference_precision.name(), "f32"} });
|
||||
{ { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::inference_precision.name(), "f32"} });
|
||||
auto inf_req_remote = exec_net_b.CreateInferRequest();
|
||||
auto cldnn_context = exec_net_b.GetContext();
|
||||
cl_context ctx = std::dynamic_pointer_cast<ClContext>(cldnn_context)->get();
|
||||
@ -670,7 +670,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
|
||||
net_local.getInputsInfo().begin()->second->setPrecision(Precision::U8);
|
||||
net_local.getInputsInfo().begin()->second->getPreProcess().setColorFormat(ColorFormat::NV12);
|
||||
|
||||
auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::hint::inference_precision.name(), "f32"}});
|
||||
auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::inference_precision.name(), "f32"}});
|
||||
|
||||
auto inf_req_local = exec_net_b1.CreateInferRequest();
|
||||
|
||||
@ -742,7 +742,7 @@ TEST_P(TwoNets_Test, canInferTwoExecNets) {
|
||||
|
||||
auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU,
|
||||
{{PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS, std::to_string(num_streams)},
|
||||
{ov::hint::inference_precision.name(), "f32"}});
|
||||
{ov::inference_precision.name(), "f32"}});
|
||||
|
||||
for (int j = 0; j < num_streams * num_requests; j++) {
|
||||
outputs.push_back(net.getOutputsInfo().begin()->first);
|
||||
|
@ -87,6 +87,10 @@ INSTANTIATE_TEST_SUITE_P(
|
||||
smoke_OVClassSetModelPriorityConfigTest, OVClassSetModelPriorityConfigTest,
|
||||
::testing::Values("MULTI", "AUTO"));
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
smoke_OVClassSetExecutionModeHintConfigTest, OVClassSetExecutionModeHintConfigTest,
|
||||
::testing::Values(CommonTestUtils::DEVICE_GPU));
|
||||
|
||||
INSTANTIATE_TEST_SUITE_P(
|
||||
smoke_OVClassSetTBBForceTerminatePropertyTest, OVClassSetTBBForceTerminatePropertyTest,
|
||||
::testing::Values("CPU", "GPU"));
|
||||
@ -346,14 +350,21 @@ TEST_P(OVClassGetPropertyTest_GPU, GetAndSetInferencePrecisionNoThrow) {
|
||||
auto value = ov::element::undefined;
|
||||
const auto expected_default_precision = ov::element::f16;
|
||||
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
|
||||
ASSERT_EQ(expected_default_precision, value);
|
||||
|
||||
const auto forced_precision = ov::element::f32;
|
||||
|
||||
OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::inference_precision(forced_precision)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
|
||||
ASSERT_EQ(value, forced_precision);
|
||||
|
||||
OPENVINO_SUPPRESS_DEPRECATED_START
|
||||
const auto forced_precision_deprecated = ov::element::f16;
|
||||
OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision_deprecated)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
|
||||
ASSERT_EQ(value, forced_precision_deprecated);
|
||||
OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
}
|
||||
|
||||
TEST_P(OVClassGetPropertyTest_GPU, GetAndSetModelPriorityNoThrow) {
|
||||
@ -715,6 +726,9 @@ const std::vector<ov::AnyMap> gpuCorrectConfigs = {
|
||||
|
||||
auto gpuCorrectConfigsWithSecondaryProperties = []() {
|
||||
return std::vector<ov::AnyMap>{
|
||||
{ov::device::properties(CommonTestUtils::DEVICE_GPU,
|
||||
ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
|
||||
ov::inference_precision(ov::element::f32))},
|
||||
{ov::device::properties(CommonTestUtils::DEVICE_GPU,
|
||||
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
|
||||
ov::hint::allow_auto_batching(false))},
|
||||
|
@ -119,6 +119,7 @@ using OVClassLoadNetworkAfterCoreRecreateTest = OVClassBaseTestP;
|
||||
using OVClassLoadNetworkTest = OVClassQueryNetworkTest;
|
||||
using OVClassSetGlobalConfigTest = OVClassBaseTestP;
|
||||
using OVClassSetModelPriorityConfigTest = OVClassBaseTestP;
|
||||
using OVClassSetExecutionModeHintConfigTest = OVClassBaseTestP;
|
||||
using OVClassSetTBBForceTerminatePropertyTest = OVClassBaseTestP;
|
||||
using OVClassSetLogLevelConfigTest = OVClassBaseTestP;
|
||||
using OVClassSpecificDeviceTestSetConfig = OVClassBaseTestP;
|
||||
@ -430,6 +431,22 @@ TEST_P(OVClassSetModelPriorityConfigTest, SetConfigNoThrow) {
|
||||
EXPECT_EQ(value, ov::hint::Priority::HIGH);
|
||||
}
|
||||
|
||||
TEST_P(OVClassSetExecutionModeHintConfigTest, SetConfigNoThrow) {
|
||||
ov::Core ie = createCoreWithTemplate();
|
||||
|
||||
OV_ASSERT_PROPERTY_SUPPORTED(ov::hint::execution_mode);
|
||||
|
||||
ov::hint::ExecutionMode defaultMode{};
|
||||
ASSERT_NO_THROW(defaultMode = ie.get_property(target_device, ov::hint::execution_mode));
|
||||
|
||||
ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::UNDEFINED));
|
||||
ASSERT_EQ(ov::hint::ExecutionMode::UNDEFINED, ie.get_property(target_device, ov::hint::execution_mode));
|
||||
ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
|
||||
ASSERT_EQ(ov::hint::ExecutionMode::ACCURACY, ie.get_property(target_device, ov::hint::execution_mode));
|
||||
ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
|
||||
ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, ie.get_property(target_device, ov::hint::execution_mode));
|
||||
}
|
||||
|
||||
TEST_P(OVClassSetDevicePriorityConfigTest, SetConfigAndCheckGetConfigNoThrow) {
|
||||
ov::Core ie = createCoreWithTemplate();
|
||||
std::string devicePriority;
|
||||
|
@ -36,7 +36,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNonContiguousAxes
|
||||
auto core = ov::Core();
|
||||
ov::AnyMap config;
|
||||
if (device_name == CommonTestUtils::DEVICE_GPU)
|
||||
config.insert(ov::hint::inference_precision(ov::element::f32));
|
||||
config.insert(ov::inference_precision(ov::element::f32));
|
||||
const auto compiled_model = core.compile_model(model, device_name, config);
|
||||
|
||||
ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
|
||||
@ -56,7 +56,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNormalizeOverAllA
|
||||
auto core = ov::Core();
|
||||
ov::AnyMap config;
|
||||
if (device_name == CommonTestUtils::DEVICE_GPU)
|
||||
config.insert(ov::hint::inference_precision(ov::element::f32));
|
||||
config.insert(ov::inference_precision(ov::element::f32));
|
||||
const auto compiled_model = core.compile_model(model, device_name, config);
|
||||
|
||||
ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
|
||||
@ -76,7 +76,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForNotSorted) {
|
||||
auto core = ov::Core();
|
||||
ov::AnyMap config;
|
||||
if (device_name == CommonTestUtils::DEVICE_GPU)
|
||||
config.insert(ov::hint::inference_precision(ov::element::f32));
|
||||
config.insert(ov::inference_precision(ov::element::f32));
|
||||
const auto compiled_model = core.compile_model(model, device_name, config);
|
||||
|
||||
ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
|
||||
@ -96,7 +96,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForSingleAxis) {
|
||||
auto core = ov::Core();
|
||||
ov::AnyMap config;
|
||||
if (device_name == CommonTestUtils::DEVICE_GPU)
|
||||
config.insert(ov::hint::inference_precision(ov::element::f32));
|
||||
config.insert(ov::inference_precision(ov::element::f32));
|
||||
const auto compiled_model = core.compile_model(model, device_name, config);
|
||||
|
||||
ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
|
||||
|
@ -225,7 +225,7 @@ void SubgraphBaseTest::compile_model() {
|
||||
break;
|
||||
}
|
||||
}
|
||||
configuration.insert({ov::hint::inference_precision.name(), hint});
|
||||
configuration.insert({ov::inference_precision.name(), hint});
|
||||
}
|
||||
|
||||
compiledModel = core->compile_model(function, targetDevice, configuration);
|
||||
|
@ -54,7 +54,7 @@ void SnippetsTestsCommon::validateOriginalLayersNamesByType(const std::string& l
|
||||
ASSERT_TRUE(false) << "Layer type '" << layerType << "' was not found in compiled model";
|
||||
}
|
||||
void SnippetsTestsCommon::setInferenceType(ov::element::Type type) {
|
||||
configuration.emplace(ov::hint::inference_precision(type));
|
||||
configuration.emplace(ov::inference_precision(type));
|
||||
}
|
||||
|
||||
} // namespace test
|
||||
|