Merge branch 'itikhono/ts/slice' of https://github.com/itikhono/openvino into itikhono/ts/slice
This commit is contained in:
commit bf9bc8628b
@ -1,49 +1,45 @@
|
||||
# Asynchronous Inference Request {#openvino_docs_ie_plugin_dg_async_infer_request}
|
||||
# Asynchronous Inference Request {#openvino_docs_ov_plugin_dg_async_infer_request}
|
||||
|
||||
Asynchronous Inference Request runs an inference pipeline asynchronously in one or several task executors depending on a device pipeline structure.
|
||||
OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class:
|
||||
OpenVINO Runtime Plugin API provides the base ov::IAsyncInferRequest class:
|
||||
|
||||
- The class has the `_pipeline` field of `std::vector<std::pair<ITaskExecutor::Ptr, Task> >`, which contains pairs of an executor and executed task.
|
||||
- The class has the `m_pipeline` field of `std::vector<std::pair<std::shared_ptr<ov::threading::ITaskExecutor>, ov::threading::Task> >`, which contains pairs of an executor and executed task.
|
||||
- All executors are passed as arguments to the class constructor; they are already in the running state and ready to run tasks.
|
||||
- The class has the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method, which waits for `_pipeline` to finish in a class destructor. The method does not stop task executors and they are still in the running stage, because they belong to the executable network instance and are not destroyed.
|
||||
- The class has the ov::IAsyncInferRequest::stop_and_wait method, which waits for `m_pipeline` to finish in the class destructor. The method does not stop the task executors; they remain in the running state, because they belong to the compiled model instance and are not destroyed.
|
||||
|
||||
`AsyncInferRequest` Class
|
||||
AsyncInferRequest Class
|
||||
------------------------
|
||||
|
||||
OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:
|
||||
OpenVINO Runtime Plugin API provides the base ov::IAsyncInferRequest class for a custom asynchronous inference request implementation:
|
||||
|
||||
@snippet src/async_infer_request.hpp async_infer_request:header
|
||||
|
||||
#### Class Fields
|
||||
### Class Fields
|
||||
|
||||
- `_inferRequest` - a reference to the [synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
|
||||
- `_waitExecutor` - a task executor that waits for a response from a device about device tasks completion
|
||||
- `m_wait_executor` - a task executor that waits for a response from a device about device tasks completion
|
||||
|
||||
> **NOTE**: If a plugin can work with several instances of a device, `_waitExecutor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
|
||||
> **NOTE**: If a plugin can work with several instances of a device, `m_wait_executor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
|
||||
|
||||
### `AsyncInferRequest()`
|
||||
### AsyncInferRequest()
|
||||
|
||||
The main goal of the `AsyncInferRequest` constructor is to define a device pipeline `_pipeline`. The example below demonstrates `_pipeline` creation with the following stages:
|
||||
The main goal of the `AsyncInferRequest` constructor is to define a device pipeline `m_pipeline`. The example below demonstrates `m_pipeline` creation with the following stages:
|
||||
|
||||
- `inferPreprocess` is a CPU compute task.
|
||||
- `startPipeline` is a CPU lightweight task to submit tasks to a remote device.
|
||||
- `waitPipeline` is a CPU non-compute task that waits for a response from a remote device.
|
||||
- `inferPostprocess` is a CPU compute task.
|
||||
- `infer_preprocess_and_start_pipeline` is a CPU lightweight task to submit tasks to a remote device.
|
||||
- `wait_pipeline` is a CPU non-compute task that waits for a response from a remote device.
|
||||
- `infer_postprocess` is a CPU compute task.
|
||||
|
||||
@snippet src/async_infer_request.cpp async_infer_request:ctor
|
||||
|
||||
The stages are distributed among two task executors in the following way:
|
||||
|
||||
- `inferPreprocess` and `startPipeline` are combined into a single task and run on `_requestExecutor`, which computes CPU tasks.
|
||||
- `infer_preprocess_and_start_pipeline` prepares input tensors and runs on `m_request_executor`, which computes CPU tasks.
|
||||
- You need at least two executors to overlap compute tasks of a CPU and a remote device the plugin works with. Otherwise, CPU and device tasks are executed serially one by one.
|
||||
- `waitPipeline` is sent to `_waitExecutor`, which works with the device.
|
||||
- `wait_pipeline` is sent to `m_wait_executor`, which works with the device.
|
||||
|
||||
> **NOTE**: `callbackExecutor` is also passed to the constructor and it is used in the base InferenceEngine::AsyncInferRequestThreadSafeDefault class, which adds a pair of `callbackExecutor` and a callback function set by the user to the end of the pipeline.
|
||||
> **NOTE**: `m_callback_executor` is also passed to the constructor and is used in the base ov::IAsyncInferRequest class, which adds a pair of `m_callback_executor` and a callback function set by the user to the end of the pipeline.
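To make the pipeline structure concrete, below is a minimal standalone sketch of a pipeline of executor/task pairs. It is an illustration only: `Executor` and `Task` are simplified stand-ins for `ov::threading::ITaskExecutor` and `ov::threading::Task`, and the stage names mirror the ones above rather than the actual Template plugin code.

```cpp
// Minimal sketch: a pipeline of (executor, task) pairs, similar in spirit to m_pipeline.
// Executor and Task here are simplified stand-ins, not the real OpenVINO types.
#include <functional>
#include <future>
#include <iostream>
#include <utility>
#include <vector>

using Task = std::function<void()>;
using Executor = std::function<void(const Task&)>;  // runs a task and returns when it is done

int main() {
    // In a real plugin these would be ov::threading::ITaskExecutor instances.
    Executor request_executor = [](const Task& t) { std::async(std::launch::async, t).get(); };
    Executor wait_executor    = [](const Task& t) { std::async(std::launch::async, t).get(); };

    // Analogue of m_pipeline: each stage runs on its own executor, in order.
    std::vector<std::pair<Executor, Task>> pipeline = {
        {request_executor, [] { std::cout << "infer_preprocess_and_start_pipeline\n"; }},
        {wait_executor,    [] { std::cout << "wait_pipeline\n"; }},
        {request_executor, [] { std::cout << "infer_postprocess\n"; }},
        // The base class would append a {callback_executor, user_callback} pair here.
    };

    for (const auto& stage : pipeline)
        stage.first(stage.second);
    return 0;
}
```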
|
||||
|
||||
Inference request stages are also profiled using IE_PROFILING_AUTO_SCOPE, which shows how pipelines of multiple asynchronous inference requests are run in parallel via the [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) tool.
|
||||
### ~AsyncInferRequest()
|
||||
|
||||
### `~AsyncInferRequest()`
|
||||
|
||||
In the asynchronous request destructor, it is necessary to wait for a pipeline to finish. It can be done using the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method of the base class.
|
||||
In the asynchronous request destructor, it is necessary to wait for a pipeline to finish. It can be done using the ov::IAsyncInferRequest::stop_and_wait method of the base class.
|
||||
|
||||
@snippet src/async_infer_request.cpp async_infer_request:dtor
|
||||
|
@ -4,8 +4,8 @@ ov::CompiledModel class functionality:
|
||||
- Compile an ov::Model instance to a backend specific graph representation
|
||||
- Create an arbitrary number of ov::InferRequest objects
|
||||
- Hold some common resources shared between different instances of ov::InferRequest. For example:
|
||||
- ov::ICompiledModel::m_task_executor task executor to implement asynchronous execution
|
||||
- ov::ICompiledModel::m_callback_executor task executor to run an asynchronous inference request callback in a separate thread
|
||||
- ov::ICompiledModel::m_task_executor task executor to implement asynchronous execution
|
||||
- ov::ICompiledModel::m_callback_executor task executor to run an asynchronous inference request callback in a separate thread
|
||||
|
||||
CompiledModel Class
|
||||
------------------------
|
||||
@ -54,7 +54,7 @@ The method creates a synchronous inference request and returns it.
|
||||
While the public OpenVINO API has a single interface for inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:
|
||||
|
||||
- [Synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request), which defines pipeline stages and runs them synchronously in the `infer` method.
|
||||
- [Asynchronous inference request](@ref openvino_docs_ie_plugin_dg_async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can has one or several stages:
|
||||
- [Asynchronous inference request](@ref openvino_docs_ov_plugin_dg_async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can have one or several stages:
|
||||
- For single-stage pipelines, there is no need to define this method and create a class derived from ov::IAsyncInferRequest. In this case, the default implementation of the method creates an ov::IAsyncInferRequest that wraps the synchronous inference request and runs it asynchronously in the `m_request_executor` executor.
|
||||
- For pipelines with multiple stages, such as performing some preprocessing on the host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device use and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with the preprocessing and postprocessing stages, giving better performance.
|
||||
> **IMPORTANT**: It is up to you to decide how many task executors you need to optimally execute a device pipeline.
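For context, the single request object of the public API mentioned above covers both modes. A minimal user-level example (the model path and device name are placeholders):

```cpp
// Public OpenVINO API: one ov::InferRequest is used for both modes.
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");               // placeholder path
    auto compiled_model = core.compile_model(model, "CPU");  // placeholder device
    ov::InferRequest request = compiled_model.create_infer_request();

    request.infer();        // synchronous: blocks until the pipeline finishes

    request.start_async();  // asynchronous: returns immediately
    request.wait();         // ... and the caller decides when to wait
    return 0;
}
```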
|
||||
|
@ -2,7 +2,7 @@
|
||||
|
||||
`InferRequest` class functionality:
|
||||
- Allocate input and output tensors needed for a backend-dependent network inference.
|
||||
- Define functions for inference process stages (for example, `preprocess`, `upload`, `infer`, `download`, `postprocess`). These functions can later be used to define an execution pipeline during [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) implementation.
|
||||
- Define functions for inference process stages (for example, `preprocess`, `upload`, `infer`, `download`, `postprocess`). These functions can later be used to define an execution pipeline during [Asynchronous Inference Request](@ref openvino_docs_ov_plugin_dg_async_infer_request) implementation.
|
||||
- Call inference stages one by one synchronously (see the sketch after this list).
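A toy illustration of this structure is shown below. It is standalone C++ that does not derive from the real ov::ISyncInferRequest; the stage names follow the examples above.

```cpp
// Toy synchronous request: stages are plain methods called in order by infer().
#include <iostream>

class ToySyncInferRequest {
public:
    void infer() {       // public entry point: calls the stages one by one
        preprocess();    // prepare input tensors
        upload();        // copy inputs to the device
        execute();       // the "infer" stage: run the model on the device
        download();      // copy outputs back to the host
        postprocess();   // convert outputs to the user-visible format
    }

private:
    // Each stage can later be reused as a separate task in an asynchronous pipeline.
    void preprocess()  { std::cout << "preprocess inputs\n"; }
    void upload()      { std::cout << "upload inputs to the device\n"; }
    void execute()     { std::cout << "run the model on the device\n"; }
    void download()    { std::cout << "download outputs\n"; }
    void postprocess() { std::cout << "postprocess outputs\n"; }
};

int main() {
    ToySyncInferRequest request;
    request.infer();
    return 0;
}
```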
|
||||
|
||||
InferRequest Class
|
||||
@ -81,4 +81,4 @@ The method returns the profiling info which was measured during pipeline stages
|
||||
|
||||
@snippet src/sync_infer_request.cpp infer_request:get_profiling_info
|
||||
|
||||
The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) class.
|
||||
The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref openvino_docs_ov_plugin_dg_async_infer_request) class.
|
||||
|
@ -10,7 +10,9 @@
|
||||
Implement Plugin Functionality <openvino_docs_ov_plugin_dg_plugin>
|
||||
Implement Compiled Model Functionality <openvino_docs_ov_plugin_dg_compiled_model>
|
||||
Implement Synchronous Inference Request <openvino_docs_ov_plugin_dg_infer_request>
|
||||
Implement Asynchronous Inference Request <openvino_docs_ie_plugin_dg_async_infer_request>
|
||||
Implement Asynchronous Inference Request <openvino_docs_ov_plugin_dg_async_infer_request>
|
||||
Implement Remote Context <openvino_docs_ov_plugin_dg_remote_context>
|
||||
Implement Remote Tensor <openvino_docs_ov_plugin_dg_remote_tensor>
|
||||
openvino_docs_ov_plugin_dg_plugin_build
|
||||
openvino_docs_ov_plugin_dg_plugin_testing
|
||||
openvino_docs_ie_plugin_detailed_guides
|
||||
@ -28,23 +30,25 @@ OpenVINO Plugin Library
|
||||
OpenVINO plugin dynamic library consists of several main components:
|
||||
|
||||
1. [Plugin class](@ref openvino_docs_ov_plugin_dg_plugin):
|
||||
- Provides information about devices of a specific type.
|
||||
- Can create an [compiled model](@ref openvino_docs_ov_plugin_dg_compiled_model) instance which represents a Neural
|
||||
Network backend specific graph structure for a particular device in opposite to the ov::Model
|
||||
which is backend-independent.
|
||||
- Can import an already compiled graph structure from an input stream to an
|
||||
[compiled model](@ref openvino_docs_ov_plugin_dg_compiled_model) object.
|
||||
2. [Compiled Modek class](@ref openvino_docs_ov_plugin_dg_compiled_model):
|
||||
- Is an execution configuration compiled for a particular device and takes into account its capabilities.
|
||||
- Holds a reference to a particular device and a task executor for this device.
|
||||
- Can create several instances of [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request).
|
||||
- Can export an internal backend specific graph structure to an output stream.
|
||||
- Provides information about devices of a specific type.
|
||||
- Can create a [compiled model](@ref openvino_docs_ov_plugin_dg_compiled_model) instance which represents a Neural Network backend-specific graph structure for a particular device, as opposed to the ov::Model
|
||||
which is backend-independent.
|
||||
- Can import an already compiled graph structure from an input stream to an
|
||||
[compiled model](@ref openvino_docs_ov_plugin_dg_compiled_model) object.
|
||||
2. [Compiled Model class](@ref openvino_docs_ov_plugin_dg_compiled_model):
|
||||
- Is an execution configuration compiled for a particular device and takes into account its capabilities.
|
||||
- Holds a reference to a particular device and a task executor for this device.
|
||||
- Can create several instances of [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request).
|
||||
- Can export an internal backend specific graph structure to an output stream.
|
||||
3. [Inference Request class](@ref openvino_docs_ov_plugin_dg_infer_request):
|
||||
- Runs an inference pipeline serially.
|
||||
- Can extract performance counters for an inference pipeline execution profiling.
|
||||
4. [Asynchronous Inference Request class](@ref openvino_docs_ie_plugin_dg_async_infer_request):
|
||||
- Wraps the [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) class and runs pipeline stages in parallel
|
||||
on several task executors based on a device-specific pipeline structure.
|
||||
4. [Asynchronous Inference Request class](@ref openvino_docs_ov_plugin_dg_async_infer_request):
|
||||
- Wraps the [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) class and runs pipeline stages in parallel on several task executors based on a device-specific pipeline structure.
|
||||
5. [Remote Context](@ref openvino_docs_ov_plugin_dg_remote_context):
|
||||
- Provides the device specific remote context. The context allows creating remote tensors.
|
||||
6. [Remote Tensor](@ref openvino_docs_ov_plugin_dg_remote_tensor)
|
||||
- Provides the device specific remote tensor API and implementation.
|
||||
|
||||
> **NOTE**: This documentation is written based on the `Template` plugin, which demonstrates plugin
|
||||
|
||||
@ -57,7 +61,7 @@ Detailed guides
|
||||
|
||||
* [Build](@ref openvino_docs_ov_plugin_dg_plugin_build) a plugin library using CMake
|
||||
* Plugin and its components [testing](@ref openvino_docs_ov_plugin_dg_plugin_testing)
|
||||
* [Quantized networks](@ref openvino_docs_ie_plugin_dg_quantized_networks)
|
||||
* [Quantized networks](@ref openvino_docs_ov_plugin_dg_quantized_models)
|
||||
* [Low precision transformations](@ref openvino_docs_OV_UG_lpt) guide
|
||||
* [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide
|
||||
|
||||
|
@ -85,7 +85,7 @@ Actual model compilation is done in the `CompiledModel` constructor. Refer to th
|
||||
|
||||
The function accepts a const shared pointer to an `ov::Model` object and applies common and device-specific transformations on a copied model to make it more friendly to hardware operations. For details on how to write custom device-specific transformations, refer to the [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide. See detailed topics about model representation:
|
||||
* [Intermediate Representation and Operation Sets](@ref openvino_docs_MO_DG_IR_and_opsets)
|
||||
* [Quantized models](@ref openvino_docs_ie_plugin_dg_quantized_networks).
|
||||
* [Quantized models](@ref openvino_docs_ov_plugin_dg_quantized_models).
|
||||
|
||||
@snippet template/src/plugin.cpp plugin:transform_model
|
||||
|
||||
|
@ -8,7 +8,7 @@ OpenVINO Plugin tests are included in the `openvino::funcSharedTests` CMake targ
|
||||
|
||||
Test definitions are split into tests class declaration (see `src/tests/functional/plugin/shared/include`) and tests class implementation (see `src/tests/functional/plugin/shared/src`) and include the following scopes of plugin conformance tests:
|
||||
|
||||
1. **Behavior tests** (`behavior` sub-folder), which are a separate test group to check that a plugin satisfies basic OpenVINO concepts: plugin creation, multiple executable networks support, multiple synchronous and asynchronous inference requests support, and so on. See the next section with details how to instantiate the tests definition class with plugin-specific parameters.
|
||||
1. **Behavior tests** (`behavior` sub-folder), which are a separate test group to check that a plugin satisfies basic OpenVINO concepts: plugin creation, multiple compiled models support, multiple synchronous and asynchronous inference requests support, and so on. See the next section for details on how to instantiate the test definition class with plugin-specific parameters.
|
||||
|
||||
2. **Single layer tests** (`single_layer_tests` sub-folder). This group of tests checks that a particular single layer can be inferred on a device. An example of test instantiation based on a test definition from the `openvino::funcSharedTests` library:
|
||||
|
||||
|
@ -1,8 +1,8 @@
|
||||
# Quantized networks compute and restrictions {#openvino_docs_ie_plugin_dg_quantized_networks}
|
||||
# Quantized models compute and restrictions {#openvino_docs_ov_plugin_dg_quantized_models}
|
||||
|
||||
One of the feature of Inference Engine is the support of quantized networks with different precisions: INT8, INT4, etc.
|
||||
One of the features of OpenVINO is support for quantized models with different precisions: INT8, INT4, etc.
|
||||
However, it is up to the plugin to define what exact precisions are supported by the particular HW.
|
||||
All quantized networks which can be expressed in IR have a unified representation by means of *FakeQuantize* operation.
|
||||
All quantized models which can be expressed in IR have a unified representation by means of the *FakeQuantize* operation.
|
||||
For more details about low-precision model representation please refer to this [document](@ref openvino_docs_ie_plugin_dg_lp_representation).
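For intuition, the element-wise semantics of *FakeQuantize* can be sketched as a simplified scalar function (based on the operation specification; broadcasting of the input and output ranges is omitted):

```cpp
// Simplified scalar FakeQuantize: quantizes x into `levels` values between
// output_low and output_high; values outside [input_low, input_high] saturate.
#include <cmath>

float fake_quantize(float x,
                    float input_low, float input_high,
                    float output_low, float output_high,
                    int levels) {
    if (x <= input_low)  return output_low;
    if (x >  input_high) return output_high;
    const float q = std::round((x - input_low) / (input_high - input_low) * (levels - 1));
    return q / (levels - 1) * (output_high - output_low) + output_low;
}
```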
|
||||
|
||||
### Interpreting FakeQuantize at runtime
|
||||
@ -44,6 +44,6 @@ Below we define these rules as follows:
|
||||
- Per-channel quantization of activations for channel-wise and element-wise operations, e.g. Depthwise Convolution, Eltwise Add/Mul, ScaleShift.
|
||||
- Symmetric and asymmetric quantization of weights and activations with the support of per-channel scales and zero-points.
|
||||
- Non-unified quantization parameters for Eltwise and Concat operations.
|
||||
- Non-quantized network output, i.e. there are no quantization parameters for it.
|
||||
- Non-quantized model output, i.e. there are no quantization parameters for it.
|
||||
|
||||
[qdq_propagation]: images/qdq_propagation.png
|
||||
|
docs/IE_PLUGIN_DG/RemoteContext.md (new file, 49 lines)
@ -0,0 +1,49 @@
|
||||
# Remote Context {#openvino_docs_ov_plugin_dg_remote_context}
|
||||
|
||||
ov::RemoteContext class functionality:
|
||||
- Represents device specific inference context.
|
||||
- Allows creating remote device-specific tensors.
|
||||
|
||||
> **NOTE**: If a plugin provides a public API for its own Remote Context, the API should be header-only and must not depend on the plugin library.
|
||||
|
||||
|
||||
RemoteContext Class
|
||||
------------------------
|
||||
|
||||
OpenVINO Plugin API provides the interface ov::IRemoteContext which should be used as a base class for a plugin-specific remote context. Based on that, a declaration of a custom remote context class can look as follows:
|
||||
|
||||
@snippet src/remote_context.hpp remote_context:header
|
||||
|
||||
### Class Fields
|
||||
|
||||
The example class has several fields:
|
||||
|
||||
- `m_name` - Device name.
|
||||
- `m_property` - Device-specific context properties. They can be used to cast the RemoteContext to a device-specific type (see the sketch after this list).
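A toy declaration combining these fields could look as follows. It is an illustration only: `std::map` stands in for ov::AnyMap and the class does not derive from the real ov::IRemoteContext interface.

```cpp
// Toy remote context: holds a device name and device-specific properties.
#include <map>
#include <string>

class ToyRemoteContext {
public:
    ToyRemoteContext(std::string device_name, std::map<std::string, std::string> properties)
        : m_name(std::move(device_name)), m_property(std::move(properties)) {}

    // Accessors mirroring get_device_name() / get_property() described below.
    const std::string& get_device_name() const { return m_name; }
    const std::map<std::string, std::string>& get_property() const { return m_property; }

private:
    std::string m_name;                             // device name
    std::map<std::string, std::string> m_property;  // device-specific context properties
};
```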
|
||||
|
||||
### RemoteContext Constructor
|
||||
|
||||
This constructor should initialize the remote context device name and properties.
|
||||
|
||||
@snippet src/remote_context.cpp remote_context:ctor
|
||||
|
||||
### get_device_name()
|
||||
|
||||
The function returns the device name from the remote context.
|
||||
|
||||
@snippet src/remote_context.cpp remote_context:get_device_name
|
||||
|
||||
### get_property()
|
||||
|
||||
The implementation returns the remote context properties.
|
||||
|
||||
@snippet src/remote_context.cpp remote_context:get_property
|
||||
|
||||
|
||||
### create_tensor()
|
||||
|
||||
The method creates a device-specific remote tensor.
|
||||
|
||||
@snippet src/remote_context.cpp remote_context:create_tensor
|
||||
|
||||
The next step to support device-specific tensors is the creation of a device-specific [Remote Tensor](@ref openvino_docs_ov_plugin_dg_remote_tensor) class.
|
docs/IE_PLUGIN_DG/RemoteTensor.md (new file, 87 lines)
@ -0,0 +1,87 @@
|
||||
# Remote Tensor {#openvino_docs_ov_plugin_dg_remote_tensor}
|
||||
|
||||
ov::RemoteTensor class functionality:
|
||||
- Provides an interface to work with device-specific memory.
|
||||
|
||||
> **NOTE**: If a plugin provides a public API for its own Remote Tensor, the API should be header-only and must not depend on the plugin library.
|
||||
|
||||
|
||||
Device Specific Remote Tensor Public API
|
||||
------------------------------------------
|
||||
|
||||
The public interface for working with device-specific remote tensors should be header-only and must not depend on the plugin library.
|
||||
|
||||
@snippet include/template/remote_tensor.hpp remote_tensor:public_header
|
||||
|
||||
The implementation below has several methods:
|
||||
|
||||
### type_check()
|
||||
|
||||
A static method used to check whether an abstract remote tensor can be cast to this particular remote tensor type.
|
||||
|
||||
### get_data()
|
||||
|
||||
A set of helper methods (specific to this example; other implementations can expose a different API) that provide access to the remote data.
|
||||
|
||||
Device Specific Internal tensor implementation
|
||||
-----------------------------------------------
|
||||
|
||||
The plugin should have an internal implementation of the remote tensor which can communicate with the public API.
|
||||
The example contains an implementation of a remote tensor which wraps memory from an STL vector.
|
||||
|
||||
OpenVINO Plugin API provides the interface ov::IRemoteTensor which should be used as a base class for remote tensors.
|
||||
|
||||
The example implementation has two remote tensor classes:
|
||||
|
||||
- An internal type-dependent implementation which takes the vector type as a template argument and creates the type-specific tensor.
|
||||
- A type-independent implementation which works with the type-dependent tensor inside.
|
||||
|
||||
Based on that, an implementation of a type independent remote tensor class can look as follows:
|
||||
|
||||
@snippet src/remote_context.cpp vector_impl:implementation
|
||||
|
||||
The implementation provides a helper to get the wrapped STL tensor, overrides all important methods of the ov::IRemoteTensor class, and calls into the type-dependent implementation.
|
||||
|
||||
The type-dependent remote tensor has the following implementation:
|
||||
|
||||
@snippet src/remote_context.cpp vector_impl_t:implementation
|
||||
|
||||
### Class Fields
|
||||
|
||||
The class has several fields:
|
||||
|
||||
- `m_element_type` - Tensor element type.
|
||||
- `m_shape` - Tensor shape.
|
||||
- `m_strides` - Tensor strides.
|
||||
- `m_data` - Wrapped vector.
|
||||
- `m_dev_name` - Device name.
|
||||
- `m_properties` - Remote tensor specific properties which can be used to detect the type of the remote tensor.
|
||||
|
||||
### VectorTensorImpl()
|
||||
|
||||
The constructor of the remote tensor implementation. It creates a vector with data, initializes the device name and properties, and updates the shape, element type, and strides.
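A condensed sketch of such a type-dependent tensor is shown below. It is an illustration only: strings and `size_t` vectors stand in for ov::element::Type, ov::Shape, and ov::Strides, and the class does not derive from the real ov::IRemoteTensor.

```cpp
// Toy type-dependent remote tensor wrapping an std::vector<T>.
#include <cstddef>
#include <map>
#include <string>
#include <vector>

template <typename T>
class ToyVectorTensorImpl {
public:
    ToyVectorTensorImpl(std::string element_type, std::vector<size_t> shape, std::string dev_name)
        : m_element_type(std::move(element_type)),
          m_shape(std::move(shape)),
          m_dev_name(std::move(dev_name)) {
        // Allocate the wrapped vector and compute row-major strides (in elements).
        size_t total = 1;
        for (size_t d : m_shape) total *= d;
        m_data.resize(total);
        m_strides.assign(m_shape.size(), 1);
        for (size_t i = m_shape.size(); i-- > 1;)
            m_strides[i - 1] = m_strides[i] * m_shape[i];
        m_properties["toy_tensor_type"] = "vector";  // used to detect the tensor type
    }

    const std::vector<size_t>& get_shape() const { return m_shape; }
    const std::vector<size_t>& get_strides() const { return m_strides; }
    const std::string& get_device_name() const { return m_dev_name; }

private:
    std::string m_element_type;                       // tensor element type
    std::vector<size_t> m_shape;                      // tensor shape
    std::vector<size_t> m_strides;                    // tensor strides
    std::vector<T> m_data;                            // wrapped vector
    std::string m_dev_name;                           // device name
    std::map<std::string, std::string> m_properties;  // tensor-specific properties
};
```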
|
||||
|
||||
|
||||
### get_element_type()
|
||||
|
||||
The method returns tensor element type.
|
||||
|
||||
### get_shape()
|
||||
|
||||
The method returns tensor shape.
|
||||
|
||||
### get_strides()
|
||||
|
||||
The method returns tensor strides.
|
||||
|
||||
### set_shape()
|
||||
|
||||
The method allows setting a new shape for the remote tensor.
|
||||
|
||||
### get_properties()
|
||||
|
||||
The method returns tensor specific properties.
|
||||
|
||||
### get_device_name()
|
||||
|
||||
The method returns tensor specific device name.
|
@ -6,13 +6,13 @@
|
||||
:maxdepth: 1
|
||||
:hidden:
|
||||
|
||||
openvino_docs_ie_plugin_dg_quantized_networks
|
||||
openvino_docs_ov_plugin_dg_quantized_models
|
||||
openvino_docs_OV_UG_lpt
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
The guides below provide extra information about specific OpenVINO features that are useful to understand during OpenVINO plugin development:
|
||||
|
||||
* [Quantized networks](@ref openvino_docs_ie_plugin_dg_quantized_networks)
|
||||
* [Quantized networks](@ref openvino_docs_ov_plugin_dg_quantized_models)
|
||||
* [Low precision transformations](@ref openvino_docs_OV_UG_lpt) guide
|
||||
* [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide
|
||||
|
@ -4,7 +4,7 @@
|
||||
<tab type="usergroup" url="index.html" visibile="yes" title="GUIDE">
|
||||
<tab type="usergroup" url="index.html" title="Developer Guide for OpenVINO Plugin Library">
|
||||
<tab type="user" url="@ref plugin" visibile="yes" title="Implement Plugin Functionality"/>
|
||||
<tab type="user" url="@ref executable_network" visibile="yes" title="Implement Executable Network Functionality">
|
||||
<tab type="user" url="@ref compiled_model" visibile="yes" title="Implement Executable Network Functionality">
|
||||
<tab type="usergroup" title="Low Precision Transformations" url="@ref openvino_docs_OV_UG_lpt">
|
||||
<tab type="user" title="Attributes" url="@ref openvino_docs_OV_UG_lpt_attributes">
|
||||
<tab type="user" title="AvgPoolPrecisionPreserved" url="@ref openvino_docs_OV_UG_lpt_AvgPoolPrecisionPreserved"/>
|
||||
@ -79,6 +79,8 @@
|
||||
</tab>
|
||||
<tab type="user" url="@ref infer_request" visibile="yes" title="Implement Synchronous Inference Request"/>
|
||||
<tab type="user" url="@ref async_infer_request" visibile="yes" title="Implement Asynchronous Inference Request"/>
|
||||
<tab type="user" url="@ref remote_context" visibile="yes" title="Implement Remote Context"/>
|
||||
<tab type="user" url="@ref remote_tensor" visibile="yes" title="Implement Remote Tensor"/>
|
||||
</tab>
|
||||
</tab>
|
||||
<!-- Additional resources -->
|
||||
|
@ -1,109 +1,168 @@
|
||||
# Getting Performance Numbers {#openvino_docs_MO_DG_Getting_Performance_Numbers}
|
||||
|
||||
This guide explains how to use the benchmark_app to get performance numbers. It also explains how the performance numbers are reflected through internal inference performance counters and execution graphs. It also includes information on using ITT and Intel® VTune™ Profiler to get performance insights.
|
||||
|
||||
## Test performance with the benchmark_app
|
||||
@sphinxdirective
|
||||
|
||||
### Prerequisites
|
||||
This guide explains how to use the benchmark_app to get performance numbers. It also explains how the performance
|
||||
numbers are reflected through internal inference performance counters and execution graphs. It also includes
|
||||
information on using ITT and Intel® VTune™ Profiler to get performance insights.
|
||||
|
||||
To run benchmarks, you need both OpenVINO developer tools and Runtime installed. Follow the [Installation guide](../../install_guides/installing-model-dev-tools.md) and make sure to install the latest general release package with support for frameworks of the models you want to test.
|
||||
Test performance with the benchmark_app
|
||||
###########################################################
|
||||
|
||||
To test performance of your model, make sure you [prepare the model for use with OpenVINO](../../Documentation/model_introduction.md). For example, if you use [OpenVINO's automation tools](@ref omz_tools_downloader), these two lines of code will download the resnet-50-tf and convert it to OpenVINO IR.
|
||||
Prerequisites
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
To run benchmarks, you need both OpenVINO developer tools and Runtime installed. Follow the
|
||||
:doc:`Installation guide <openvino_docs_install_guides_install_dev_tools>` and make sure to install the latest
|
||||
general release package with support for frameworks of the models you want to test.
|
||||
|
||||
To test performance of your model, make sure you :doc:`prepare the model for use with OpenVINO <openvino_docs_model_processing_introduction>`.
|
||||
For example, if you use :doc:`OpenVINO's automation tools <omz_tools_downloader>`, these two lines of code will download the
|
||||
resnet-50-tf and convert it to OpenVINO IR.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
```bash
|
||||
omz_downloader --name resnet-50-tf
|
||||
omz_converter --name resnet-50-tf
|
||||
```
|
||||
|
||||
### Running the benchmark application
|
||||
|
||||
For a detailed description, see the dedicated articles: [benchmark_app for C++](../../../samples/cpp/benchmark_app/README.md) and [benchmark_app for Python](../../../tools/benchmark_tool/README.md).
|
||||
Running the benchmark application
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
For a detailed description, see the dedicated articles:
|
||||
:doc:`benchmark_app for C++ <openvino_inference_engine_samples_benchmark_app_README>` and
|
||||
:doc:`benchmark_app for Python <openvino_inference_engine_tools_benchmark_tool_README>`.
|
||||
|
||||
The benchmark_app includes a lot of device-specific options, but the primary usage is as simple as:
|
||||
|
||||
```bash
|
||||
.. code-block:: bash
|
||||
|
||||
benchmark_app -m <model> -d <device> -i <input>
|
||||
```
|
||||
|
||||
Each of the [OpenVINO supported devices](../../OV_Runtime_UG/supported_plugins/Supported_Devices.md) offers performance settings that contain command-line equivalents in the Benchmark app.
|
||||
|
||||
While these settings provide really low-level control for the optimal model performance on the _specific_ device, it is recommended to always start performance evaluation with the [OpenVINO High-Level Performance Hints](../../OV_Runtime_UG/performance_hints.md) first, like so:
|
||||
Each of the :doc:`OpenVINO supported devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>` offers
|
||||
performance settings that contain command-line equivalents in the Benchmark app.
|
||||
|
||||
While these settings provide really low-level control for the optimal model performance on the *specific* device,
|
||||
it is recommended to always start performance evaluation with the :doc:`OpenVINO High-Level Performance Hints <openvino_docs_OV_UG_Performance_Hints>` first, like so:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
```bash
|
||||
# for throughput prioritization
|
||||
benchmark_app -hint tput -m <model> -d <device>
|
||||
# for latency prioritization
|
||||
benchmark_app -hint latency -m <model> -d <device>
|
||||
```
|
||||
|
||||
## Additional benchmarking considerations
|
||||
|
||||
### 1 - Select a Proper Set of Operations to Measure
|
||||
Additional benchmarking considerations
|
||||
###########################################################
|
||||
|
||||
1 - Select a Proper Set of Operations to Measure
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
When evaluating performance of a model with OpenVINO Runtime, it is required to measure a proper set of operations.
|
||||
|
||||
- Avoid including one-time costs such as model loading.
|
||||
- Track operations that occur outside OpenVINO Runtime (such as video decoding) separately.
|
||||
|
||||
> **NOTE**: Some image pre-processing can be baked into OpenVINO IR and accelerated accordingly. For more information, refer to [Embedding the Pre-processing](Additional_Optimizations.md) and [General Runtime Optimizations](../../optimization_guide/dldt_deployment_optimization_common.md).
|
||||
|
||||
### 2 - Try to Get Credible Data
|
||||
.. note::
|
||||
|
||||
Performance conclusions should be build upon reproducible data. As for the performance measurements, they should be done with a large number of invocations of the same routine. Since the first iteration is almost always significantly slower than the subsequent ones, an aggregated value can be used for the execution time for final projections:
|
||||
|
||||
- If the warm-up run does not help or execution time still varies, you can try running a large number of iterations and then average or find a mean of the results.
|
||||
- If the time values range too much, consider geomean.
|
||||
- Be aware of the throttling and other power oddities. A device can exist in one of several different power states. When optimizing your model, consider fixing the device frequency for better performance data reproducibility. However, the end-to-end (application) benchmarking should also be performed under real operational conditions.
|
||||
Some image pre-processing can be baked into OpenVINO IR and accelerated accordingly. For more information,
|
||||
refer to :doc:`Embedding Pre-processing <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>` and
|
||||
:doc:`General Runtime Optimizations <openvino_docs_deployment_optimization_guide_common>`.
|
||||
|
||||
|
||||
### 3 - Compare Performance with Native/Framework Code
|
||||
2 - Try to Get Credible Data
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
Performance conclusions should be built upon reproducible data. As for the performance measurements, they should
|
||||
be done with a large number of invocations of the same routine. Since the first iteration is almost always significantly
|
||||
slower than the subsequent ones, an aggregated value can be used for the execution time for final projections:
|
||||
|
||||
- If the warm-up run does not help or execution time still varies, you can try running a large number of iterations
|
||||
and then average or find a mean of the results.
|
||||
- If the time values range too much, consider geomean.
|
||||
- Be aware of the throttling and other power oddities. A device can exist in one of several different power states.
|
||||
When optimizing your model, consider fixing the device frequency for better performance data reproducibility.
|
||||
However, the end-to-end (application) benchmarking should also be performed under real operational conditions.
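A minimal warm-up-and-average measurement loop might look like the following sketch (user-level OpenVINO C++ API; the model path, device, and iteration counts are placeholders, and input filling is omitted):

.. code-block:: cpp

   // Sketch: measure the average latency of request.infer() after a warm-up phase.
   #include <chrono>
   #include <iostream>
   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;
       auto compiled_model = core.compile_model("model.xml", "CPU");  // placeholders
       ov::InferRequest request = compiled_model.create_infer_request();

       const int warmup_iterations = 10;     // discard the slow first runs
       const int measured_iterations = 100;

       for (int i = 0; i < warmup_iterations; ++i)
           request.infer();

       const auto start = std::chrono::steady_clock::now();
       for (int i = 0; i < measured_iterations; ++i)
           request.infer();
       const auto end = std::chrono::steady_clock::now();

       const double total_ms = std::chrono::duration<double, std::milli>(end - start).count();
       std::cout << "Average latency: " << total_ms / measured_iterations << " ms\n";
       return 0;
   }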
|
||||
|
||||
|
||||
3 - Compare Performance with Native/Framework Code
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
When comparing the OpenVINO Runtime performance with the framework or another reference code, make sure that both versions are as similar as possible:
|
||||
|
||||
- Wrap the exact inference execution (refer to the [Benchmark app](../../../samples/cpp/benchmark_app/README.md) for examples).
|
||||
- Wrap the exact inference execution (for examples, see :doc:`Benchmark app <openvino_inference_engine_samples_benchmark_app_README>`).
|
||||
- Do not include model loading time.
|
||||
- Ensure that the inputs are identical for OpenVINO Runtime and the framework. For example, watch out for random values that can be used to populate the inputs.
|
||||
- In situations when any user-side pre-processing should be tracked separately, consider [image pre-processing and conversion](../../OV_Runtime_UG/preprocessing_overview.md).
|
||||
- When applicable, leverage the [Dynamic Shapes support](../../OV_Runtime_UG/ov_dynamic_shapes.md).
|
||||
- If possible, demand the same accuracy. For example, TensorFlow allows `FP16` execution, so when comparing to that, make sure to test the OpenVINO Runtime with the `FP16` as well.
|
||||
- In situations when any user-side pre-processing should be tracked separately, consider :doc:`image pre-processing and conversion <openvino_docs_OV_UG_Preprocessing_Overview>`.
|
||||
- When applicable, leverage the :doc:`Dynamic Shapes support <openvino_docs_OV_UG_DynamicShapes>`.
|
||||
- If possible, demand the same accuracy. For example, TensorFlow allows ``FP16`` execution, so when comparing to that, make sure to test the OpenVINO Runtime with the ``FP16`` as well.
|
||||
|
||||
### Internal Inference Performance Counters and Execution Graphs <a name="performance-counters"></a>
|
||||
Internal Inference Performance Counters and Execution Graphs
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
More detailed insights into inference performance breakdown can be achieved with device-specific performance counters and/or execution graphs.
|
||||
Both [C++](../../../samples/cpp/benchmark_app/README.md) and [Python](../../../tools/benchmark_tool/README.md) versions of the `benchmark_app` support a `-pc` command-line parameter that outputs internal execution breakdown.
|
||||
Both :doc:`C++ <openvino_inference_engine_samples_benchmark_app_README>` and :doc:`Python <openvino_inference_engine_tools_benchmark_tool_README>`
|
||||
versions of the *benchmark_app* support a ``-pc`` command-line parameter that outputs internal execution breakdown.
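The same per-layer breakdown is also available programmatically. A minimal sketch with the user-level C++ API (the model path and device are placeholders; profiling is enabled at compile time via ``ov::enable_profiling``):

.. code-block:: cpp

   // Sketch: print per-node performance counters after an inference.
   #include <iostream>
   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;
       // Enable performance counters when compiling the model (placeholders below).
       auto compiled_model = core.compile_model("model.xml", "CPU", ov::enable_profiling(true));
       ov::InferRequest request = compiled_model.create_infer_request();
       request.infer();

       for (const ov::ProfilingInfo& info : request.get_profiling_info()) {
           std::cout << info.node_name << " [" << info.exec_type << "] "
                     << info.real_time.count() << " us\n";
       }
       return 0;
   }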
|
||||
|
||||
For example, the table shown below is part of performance counters for quantized [TensorFlow implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on [CPU Plugin](../../OV_Runtime_UG/supported_plugins/CPU.md).
|
||||
Keep in mind that since the device is CPU, the `realTime` wall clock and the `cpu` time layers are the same. Information about layer precision is also stored in the performance counters.
|
||||
|
||||
| layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) |
|
||||
| --------------------------------------------------------- | ---------- | ------------ | -------------------- | ------------- | ------------ |
|
||||
| resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.377 | 0.377 |
|
||||
| resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
|
||||
| resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_I8 | 0.499 | 0.499 |
|
||||
| resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
|
||||
| resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.399 | 0.399 |
|
||||
| resnet\_model/add\_4/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
|
||||
| resnet\_model/add\_4 | NOT\_RUN | Eltwise | undef | 0 | 0 |
|
||||
| resnet\_model/add\_5/fq\_input\_1 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
|
||||
For example, the table shown below is part of performance counters for quantized
|
||||
`TensorFlow implementation of ResNet-50 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf>`__
|
||||
model inference on :doc:`CPU Plugin <openvino_docs_OV_UG_supported_plugins_CPU>`.
|
||||
Keep in mind that since the device is CPU, the ``realTime`` wall clock and the ``cpu`` time layers are the same.
|
||||
Information about layer precision is also stored in the performance counters.
|
||||
|
||||
|
||||
The `exeStatus` column of the table includes the following possible values:
|
||||
- `EXECUTED` - the layer was executed by standalone primitive.
|
||||
- `NOT_RUN` - the layer was not executed by standalone primitive or was fused with another operation and executed in another layer primitive.
|
||||
|
||||
The `execType` column of the table includes inference primitives with specific suffixes. The layers could have the following marks:
|
||||
* The `I8` suffix is for layers that had 8-bit data type input and were computed in 8-bit precision.
|
||||
* The `FP32` suffix is for layers computed in 32-bit precision.
|
||||
=========================================================== ============= ============== ===================== ================= ==============
|
||||
layerName execStatus layerType execType realTime (ms) cpuTime (ms)
|
||||
=========================================================== ============= ============== ===================== ================= ==============
|
||||
resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add EXECUTED Convolution jit\_avx512\_1x1\_I8 0.377 0.377
|
||||
resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 NOT\_RUN FakeQuantize undef 0 0
|
||||
resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add EXECUTED Convolution jit\_avx512\_I8 0.499 0.499
|
||||
resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 NOT\_RUN FakeQuantize undef 0 0
|
||||
resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add EXECUTED Convolution jit\_avx512\_1x1\_I8 0.399 0.399
|
||||
resnet\_model/add\_4/fq\_input\_0 NOT\_RUN FakeQuantize undef 0 0
|
||||
resnet\_model/add\_4 NOT\_RUN Eltwise undef 0 0
|
||||
resnet\_model/add\_5/fq\_input\_1 NOT\_RUN FakeQuantize undef 0 0
|
||||
=========================================================== ============= ============== ===================== ================= ==============
|
||||
|
||||
All `Convolution` layers are executed in `int8` precision. The rest of the layers are fused into Convolutions using post-operation optimization, as described in [CPU Device](../../OV_Runtime_UG/supported_plugins/CPU.md).
|
||||
This contains layer names (as seen in OpenVINO IR), type of the layer, and execution statistics.
|
||||
| The ``execStatus`` column of the table includes the following possible values:
|
||||
| - ``EXECUTED`` - the layer was executed by standalone primitive.
|
||||
| - ``NOT_RUN`` - the layer was not executed by standalone primitive or was fused with another operation and executed in another layer primitive.
|
||||
|
|
||||
| The ``execType`` column of the table includes inference primitives with specific suffixes. The layers could have the following marks:
|
||||
| - The ``I8`` suffix is for layers that had 8-bit data type input and were computed in 8-bit precision.
|
||||
| - The ``FP32`` suffix is for layers computed in 32-bit precision.
|
||||
|
|
||||
| All ``Convolution`` layers are executed in ``int8`` precision. The rest of the layers are fused into Convolutions using post-operation optimization,
|
||||
as described in :doc:`CPU Device <openvino_docs_OV_UG_supported_plugins_CPU>`. The table also contains layer names
(as seen in OpenVINO IR), the type of the layer, and execution statistics.
|
||||
|
||||
Both `benchmark_app` versions also support the `exec_graph_path` command-line option. It requires OpenVINO to output the same execution statistics per layer, but in the form of plugin-specific [Netron-viewable](https://netron.app/) graph to the specified file.
|
||||
|
||||
Especially when performance-debugging the [latency](../../optimization_guide/dldt_deployment_optimization_latency.md), note that the counters do not reflect the time spent in the `plugin/device/driver/etc` queues. If the sum of the counters is too different from the latency of an inference request, consider testing with less inference requests. For example, running single [OpenVINO stream](../../optimization_guide/dldt_deployment_optimization_tput.md) with multiple requests would produce nearly identical counters as running a single inference request, while the actual latency can be quite different.
|
||||
Both *benchmark_app* versions also support the ``exec_graph_path`` command-line option. It requires OpenVINO to output the same execution
|
||||
statistics per layer, but in the form of plugin-specific `Netron-viewable <https://netron.app/>`__ graph to the specified file.
|
||||
|
||||
Lastly, the performance statistics with both performance counters and execution graphs are averaged, so such data for the [inputs of dynamic shapes](../../OV_Runtime_UG/ov_dynamic_shapes.md) should be measured carefully, preferably by isolating the specific shape and executing multiple times in a loop, to gather the reliable data.
|
||||
Especially when performance-debugging the :doc:`latency <openvino_docs_deployment_optimization_guide_latency>`, note that the counters
|
||||
do not reflect the time spent in the ``plugin/device/driver/etc`` queues. If the sum of the counters is too different from the latency
|
||||
of an inference request, consider testing with fewer inference requests. For example, running a single
|
||||
:doc:`OpenVINO stream <openvino_docs_deployment_optimization_guide_tput>` with multiple requests would produce nearly identical
|
||||
counters as running a single inference request, while the actual latency can be quite different.
|
||||
|
||||
Lastly, the performance statistics with both performance counters and execution graphs are averaged,
|
||||
so such data for the :doc:`inputs of dynamic shapes <openvino_docs_OV_UG_DynamicShapes>` should be measured carefully,
|
||||
preferably by isolating the specific shape and executing multiple times in a loop, to gather reliable data.
|
||||
|
||||
Use ITT to Get Performance Insights
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
In general, OpenVINO and its individual plugins are heavily instrumented with Intel® Instrumentation and Tracing Technology (ITT).
|
||||
Therefore, you can also compile OpenVINO from the source code with ITT enabled and use tools like
|
||||
`Intel® VTune™ Profiler <https://software.intel.com/en-us/vtune>`__ to get detailed inference performance breakdown and additional
|
||||
insights in the application-level performance on the timeline view.
|
||||
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
### Use ITT to Get Performance Insights
|
||||
|
||||
In general, OpenVINO and its individual plugins are heavily instrumented with Intel® Instrumentation and Tracing Technology (ITT). Therefore, you can also compile OpenVINO from the source code with ITT enabled and use tools like [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) to get detailed inference performance breakdown and additional insights in the application-level performance on the timeline view.
|
||||
|
@ -7,13 +7,13 @@ To run inference on multiple devices, you can choose either of the following way
|
||||
- Use the :ref:`CUMULATIVE_THROUGHPUT option <cumulative throughput>` of the Automatic Device Selection mode. This way, you can use all available devices in the system without the need to specify them.
|
||||
- Use the Multi-Device execution mode. This page will explain how it works and how to use it.
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
## How MULTI Works
|
||||
How MULTI Works
|
||||
####################
|
||||
|
||||
The Multi-Device execution mode, or MULTI for short, acts as a "virtual" or a "proxy" device, which does not bind to a specific type of hardware. Instead, it assigns available computing devices to particular inference requests, which are then executed in parallel.
|
||||
|
||||
The potential gains from using Multi-Device execution are:
|
||||
|
||||
* improved throughput from using multiple devices at once,
|
||||
* increase in performance stability due to multiple devices sharing inference workload.
|
||||
|
||||
@ -22,31 +22,29 @@ Importantly, the Multi-Device mode does not change the application logic, so it
|
||||
Note that the performance increase in this mode comes from utilizing multiple devices at once. This means that you need to provide the devices with enough inference requests to keep them busy, otherwise you will not benefit much from using MULTI.
|
||||
|
||||
|
||||
## Using the Multi-Device Mode
|
||||
Using the Multi-Device Mode
|
||||
###########################
|
||||
|
||||
Following the OpenVINO™ naming convention, the Multi-Device mode is assigned the label of “MULTI.” The only configuration option available for it is a prioritized list of devices to use:
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
+---------------------------+---------------------------------+------------------------------------------------------------+
|
||||
| Property | Property values | Description |
|
||||
+===========================+=================================+============================================================+
|
||||
| <device list> | | MULTI: <device names> | | Specifies the devices available for selection. |
|
||||
| | | comma-separated, no spaces | | The device sequence will be taken as priority |
|
||||
+---------------------------+---------------------------------+ | from high to low. |
|
||||
| ov::device::priorities | | device names | | Priorities can be set directly as a string. |
|
||||
| | | comma-separated, no spaces | |
|
||||
+---------------------------+---------------------------------+------------------------------------------------------------+
|
||||
+----------------------------+---------------------------------+------------------------------------------------------------+
|
||||
| Property | Property values | Description |
|
||||
+============================+=================================+============================================================+
|
||||
| <device list> | | MULTI: <device names> | | Specifies the devices available for selection. |
|
||||
| | | comma-separated, no spaces | | The device sequence will be taken as priority |
|
||||
+----------------------------+---------------------------------+ | from high to low. |
|
||||
| ``ov::device::priorities`` | | device names | | Priorities can be set directly as a string. |
|
||||
| | | comma-separated, no spaces | |
|
||||
+----------------------------+---------------------------------+------------------------------------------------------------+
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
Specifying the device list explicitly is required by MULTI, as it defines the devices available for inference and sets their priorities. Importantly, the list may also specify the number of requests for MULTI to keep for each device, as described below.
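For illustration, the device list can be passed directly when compiling a model with the C++ API (which devices are actually available depends on the system):

.. code-block:: cpp

   // Sketch: compile a model for the MULTI device with an explicit priority list.
   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;
       auto model = core.read_model("model.xml");  // placeholder path

       // GPU is listed first, so it gets the higher priority.
       auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU");

       // Equivalent formulation using the ov::device::priorities property.
       auto compiled_model2 = core.compile_model(model, "MULTI", ov::device::priorities("GPU,CPU"));
       return 0;
   }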
|
||||
|
||||
Note that OpenVINO™ Runtime enables you to use “GPU” as an alias for “GPU.0” in function calls. More details on enumerating devices can be found in [Working with devices](supported_plugins/Device_Plugins.md).
|
||||
Note that OpenVINO™ Runtime enables you to use “GPU” as an alias for “GPU.0” in function calls. More details on enumerating devices can be found in :doc:`Working with devices <openvino_docs_OV_UG_Working_with_devices>`.
|
||||
|
||||
The following commands are accepted by the API:
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
.. tab:: C++
|
||||
|
||||
@ -60,11 +58,9 @@ The following commands are accepted by the API:
|
||||
:language: python
|
||||
:fragment: [MULTI_0]
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
Notice that MULTI allows you to **change device priorities on the fly**. You can alter the order, exclude a device, and bring an excluded device back. Still, it does not allow adding new devices.
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
.. tab:: C++
|
||||
|
||||
@ -78,19 +74,17 @@ Notice that MULTI allows you to **change device priorities on the fly**. You can
|
||||
:language: python
|
||||
:fragment: [MULTI_1]
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
One more thing you can define is the **number of requests to allocate for each device**. You can do it simply by adding the number to each device in parentheses, like this: ``"MULTI:CPU(2),GPU(2)"``. However, this method is not recommended as it is not performance-portable. The suggested approach is to configure individual devices and query the resulting number of requests to be used at the application level, as described in `Configuring Individual Devices and Creating MULTI On Top <#configuring-individual-devices-and-creating-the-multi-device-on-top>`__.
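A sketch of this query-based approach with the C++ API (the model path and device list are placeholders):

.. code-block:: cpp

   // Sketch: query the optimal number of requests instead of hard-coding
   // per-device request counts in the device string.
   #include <cstdint>
   #include <iostream>
   #include <vector>
   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;
       auto compiled_model = core.compile_model("model.xml", "MULTI:GPU,CPU");  // placeholders

       const uint32_t nireq = compiled_model.get_property(ov::optimal_number_of_infer_requests);
       std::cout << "Optimal number of infer requests: " << nireq << "\n";

       std::vector<ov::InferRequest> requests;
       for (uint32_t i = 0; i < nireq; ++i)
           requests.push_back(compiled_model.create_infer_request());
       return 0;
   }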
|
||||
|
||||
To check what devices are present in the system, you can use the Device API. For information on how to do it, check :doc:`Query device properties and configuration <openvino_docs_OV_UG_query_api>`.
|
||||
|
||||
|
||||
Configuring Individual Devices and Creating the Multi-Device On Top
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
One more thing you can define is the **number of requests to allocate for each device**. You can do it simply by adding the number to each device in parentheses, like this: `"MULTI:CPU(2),GPU(2)"`. However, this method is not recommended as it is not performance-portable. The suggested approach is to configure individual devices and query the resulting number of requests to be used at the application level, as described in [Configuring Individual Devices and Creating MULTI On Top](#config-multi-on-top).
|
||||
|
||||
To check what devices are present in the system, you can use the Device API. For information on how to do it, check [Query device properties and configuration](supported_plugins/config_properties.md).
|
||||
|
||||
|
||||
### <a name="config-multi-on-top"></a> Configuring Individual Devices and Creating the Multi-Device On Top
|
||||
As mentioned previously, executing inference with MULTI may be set up by configuring individual devices before creating the "MULTI" device on top. It may be considered for performance reasons.
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
.. tab:: C++
|
||||
|
||||
@ -104,17 +98,15 @@ As mentioned previously, executing inference with MULTI may be set up by configu
|
||||
:language: python
|
||||
:fragment: [MULTI_4]
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
Alternatively, you can combine all the individual device settings into a single config file and load it for MULTI to parse. See the code example in the next section.
|
||||
|
||||
Querying the Optimal Number of Inference Requests
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
|
||||
### Querying the Optimal Number of Inference Requests
|
||||
When using MULTI, you don't need to sum over the included devices yourself; you can query the optimal number of requests directly,
|
||||
using the [configure devices](supported_plugins/config_properties.md) property:
|
||||
using the :doc:`configure devices <openvino_docs_OV_UG_query_api>` property:
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
.. tab:: C++
|
||||
|
||||
@ -122,56 +114,52 @@ using the [configure devices](supported_plugins/config_properties.md) property:
|
||||
:language: cpp
|
||||
:fragment: [part5]
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
|
||||
|
||||
## Using the Multi-Device with OpenVINO Samples and Benchmarking Performance
|
||||
Using the Multi-Device with OpenVINO Samples and Benchmarking Performance
|
||||
#########################################################################
|
||||
|
||||
To see how the Multi-Device execution is used in practice and test its performance, take a look at OpenVINO's Benchmark Application which presents the optimal performance of the plugin without the need for additional settings, like the number of requests or CPU threads.
|
||||
Here is an example command to evaluate performance of CPU + GPU:
|
||||
|
||||
```sh
|
||||
./benchmark_app –d MULTI:CPU,GPU –m <model> -i <input> -niter 1000
|
||||
```
|
||||
.. code-block:: sh
|
||||
|
||||
./benchmark_app -d MULTI:CPU,GPU -m <model> -i <input> -niter 1000
|
||||
|
||||
|
||||
For more information, refer to the :doc:`C++ <openvino_inference_engine_samples_benchmark_app_README>` or :doc:`Python <openvino_inference_engine_tools_benchmark_tool_README>` version instructions.
For more information, refer to the [C++](../../samples/cpp/benchmark_app/README.md) or [Python](../../tools/benchmark_tool/README.md) version instructions.

@sphinxdirective
.. note::

   You can keep using the FP16 IR without converting it to FP32, even if some of the listed devices do not support it. The conversion will be done automatically for you.

   No demos are yet fully optimized for MULTI, by means of supporting the ov::optimal_number_of_infer_requests property, using the GPU streams/throttling, and so on.
@endsphinxdirective

No demos are yet fully optimized for MULTI, by means of supporting the ``ov::optimal_number_of_infer_requests`` property, using the GPU streams/throttling, and so on.

## Performance Considerations for the Multi-Device Execution

Performance Considerations for the Multi-Device Execution
#########################################################

For best performance when using the MULTI execution mode, you should consider a few recommendations:
- MULTI usually performs best when the fastest device is specified first in the device candidate list.
  This is particularly important when the request-level parallelism is not sufficient
  (e.g. the number of requests is not enough to saturate all devices).
- Just like with any throughput-oriented execution mode, it is highly recommended to query the optimal number of inference requests
  directly from the instance of the `ov::CompiledModel`. Refer to the code of the previously mentioned `benchmark_app` for more details.
- Execution on certain device combinations, for example CPU+GPU, performs better with certain knobs. Refer to the `benchmark_app` code for details. One specific example is disabling GPU driver polling, which in turn requires multiple GPU streams to balance out slower
  communication of inference completion from the device to the host.
- The MULTI logic always attempts to save on copying data between device-agnostic and user-facing inference requests,
  and device-specific 'worker' requests that are being actually scheduled behind the scene.
  To facilitate the copy savings, it is recommended to run the requests in the order in which they were created.

- MULTI usually performs best when the fastest device is specified first in the device candidate list. This is particularly important when the request-level parallelism is not sufficient (e.g. the number of requests is not enough to saturate all devices).
- Just like with any throughput-oriented execution mode, it is highly recommended to query the optimal number of inference requests directly from the instance of the ``ov::CompiledModel``. Refer to the code of the previously mentioned ``benchmark_app`` for more details.
- Execution on certain device combinations, for example CPU+GPU, performs better with certain knobs (see the sketch after this list). Refer to the ``benchmark_app`` code for details. One specific example is disabling GPU driver polling, which in turn requires multiple GPU streams to balance out slower communication of inference completion from the device to the host.
- The MULTI logic always attempts to save on copying data between device-agnostic and user-facing inference requests, and device-specific 'worker' requests that are being actually scheduled behind the scene. To facilitate the copy savings, it is recommended to run the requests in the order in which they were created.
- While performance of accelerators combines well with MULTI, the CPU+GPU execution may introduce certain performance issues. It is due to the devices sharing some resources, like power or bandwidth. Enabling the GPU throttling hint, which saves a CPU thread for CPU inference, is an example of a recommended solution addressing this issue.

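A minimal sketch tying these recommendations together is shown below; the device order, the stream count, and the model path are illustrative assumptions rather than tuned values:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path
    // The assumed-fastest device (GPU here) is listed first; a per-device knob is passed
    // only to CPU via ov::device::properties, while the global hint targets throughput.
    auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU",
                                             ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
                                             ov::device::properties("CPU", ov::num_streams(4)));
    // Create (and later run) requests in creation order to let MULTI avoid extra copies.
    auto request = compiled_model.create_infer_request();
    return 0;
}
```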
Additional Resources
####################

## Additional Resources

- :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`
- :doc:`Automatic Device Selection <openvino_docs_OV_UG_supported_plugins_AUTO>`

- [Supported Devices](supported_plugins/Supported_Devices.md)
- [Automatic Device Selection](./auto_device_selection.md)

@sphinxdirective
.. raw:: html

   <iframe allowfullscreen mozallowfullscreen msallowfullscreen oallowfullscreen webkitallowfullscreen width="560" height="315" src="https://www.youtube.com/embed/xbORYFEmrqU" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

@endsphinxdirective

> **NOTE**: This video is currently available only for C++, but many of the same concepts apply to Python.

.. note:: This video is currently available only for C++, but many of the same concepts apply to Python.

@endsphinxdirective

@ -10,17 +10,25 @@
|
||||
openvino_docs_MO_DG_Getting_Performance_Numbers
|
||||
|
||||
|
||||
@endsphinxdirective
|
||||
The `Intel® Distribution of OpenVINO™ toolkit <https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html>`__
|
||||
helps accelerate deep learning inference across a variety of Intel® processors and accelerators.
|
||||
|
||||
The [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) helps accelerate deep learning inference across a variety of Intel® processors and accelerators.
|
||||
|
||||
The benchmark results below demonstrate high performance gains on several public neural networks on multiple Intel® CPUs, GPUs and GNAs covering a broad performance range. The results may be helpful when deciding which hardware is best for your applications or to plan AI workload on the Intel computing already included in your solutions.
|
||||
The benchmark results presented here demonstrate high performance gains on several public neural networks on multiple Intel® CPUs,
|
||||
GPUs, and GNAs covering a broad performance range. The results may be helpful when deciding which hardware is best for your
|
||||
applications or to plan AI workload on the Intel computing already included in your solutions.
|
||||
|
||||
Benchmarks are available for:
|
||||
|
||||
* [Intel® Distribution of OpenVINO™ toolkit](performance_benchmarks_openvino.md).
|
||||
* :doc:`Intel® Distribution of OpenVINO™ toolkit <openvino_docs_performance_benchmarks_openvino>`.
|
||||
|
||||
You can also test performance for your system yourself, following the guide on :doc:`getting performance numbers <openvino_docs_MO_DG_Getting_Performance_Numbers>`.
|
||||
Performance of a particular application can also be evaluated virtually using `Intel® DevCloud for the Edge <https://devcloud.intel.com/edge/>`__.
|
||||
It is a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit.
|
||||
To learn more about it, visit `the website <https://www.intel.com/content/www/us/en/developer/tools/devcloud/edge/overview.html>`__
|
||||
or `create an account <https://www.intel.com/content/www/us/en/secure/forms/devcloud-enrollment/account-provisioning.html>`__.
|
||||
|
||||
|
||||
You can also test performance for your system yourself, following the guide on [getting performance numbers](../MO_DG/prepare_model/Getting_performance_numbers.md).
|
||||
Performance of a particular application can also be evaluated virtually using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/). It is a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. To learn more about it, visit [the website](https://www.intel.com/content/www/us/en/developer/tools/devcloud/edge/overview.html) or [create an account](https://www.intel.com/content/www/us/en/forms/idz/devcloud-registration.html?tgt=https://www.intel.com/content/www/us/en/secure/forms/devcloud-enrollment/account-provisioning.html).
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
|
||||
|
@ -6,7 +6,7 @@
|
||||
.. dropdown:: How often do performance benchmarks get updated?
|
||||
|
||||
New performance benchmarks are typically published on every
|
||||
`major.minor` release of the Intel® Distribution of OpenVINO™ toolkit.
|
||||
``major.minor`` release of the Intel® Distribution of OpenVINO™ toolkit.
|
||||
|
||||
.. dropdown:: Where can I find the models used in the performance benchmarks?
|
||||
|
||||
@ -22,7 +22,7 @@
|
||||
|
||||
All of the performance benchmarks are generated using the
|
||||
open-source tool within the Intel® Distribution of OpenVINO™ toolkit
|
||||
called `benchmark_app`. This tool is available
|
||||
called ``benchmark_app``. This tool is available
|
||||
:doc:`for C++ apps <openvino_inference_engine_samples_benchmark_app_README>`.
|
||||
as well as
|
||||
:doc:`for Python apps <openvino_inference_engine_tools_benchmark_tool_README>`.
|
||||
@ -42,63 +42,63 @@
|
||||
- Public Network
|
||||
- Task
|
||||
- Input Size
|
||||
* - `bert-base-cased <https://github.com/PaddlePaddle/PaddleNLP/tree/v2.1.1>`_
|
||||
* - `bert-base-cased <https://github.com/PaddlePaddle/PaddleNLP/tree/v2.1.1>`__
|
||||
- BERT
|
||||
- question / answer
|
||||
- 124
|
||||
* - `bert-large-uncased-whole-word-masking-squad-int8-0001 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/bert-large-uncased-whole-word-masking-squad-int8-0001>`_
|
||||
* - `bert-large-uncased-whole-word-masking-squad-int8-0001 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/bert-large-uncased-whole-word-masking-squad-int8-0001>`__
|
||||
- BERT-large
|
||||
- question / answer
|
||||
- 384
|
||||
* - `deeplabv3-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/deeplabv3>`_
|
||||
* - `deeplabv3-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/deeplabv3>`__
|
||||
- DeepLab v3 Tf
|
||||
- semantic segmentation
|
||||
- 513x513
|
||||
* - `densenet-121-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/densenet-121-tf>`_
|
||||
* - `densenet-121-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/densenet-121-tf>`__
|
||||
- Densenet-121 Tf
|
||||
- classification
|
||||
- 224x224
|
||||
* - `efficientdet-d0 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/efficientdet-d0-tf>`_
|
||||
* - `efficientdet-d0 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/efficientdet-d0-tf>`__
|
||||
- Efficientdet
|
||||
- classification
|
||||
- 512x512
|
||||
* - `faster_rcnn_resnet50_coco-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/faster_rcnn_resnet50_coco>`_
|
||||
* - `faster_rcnn_resnet50_coco-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/faster_rcnn_resnet50_coco>`__
|
||||
- Faster RCNN Tf
|
||||
- object detection
|
||||
- 600x1024
|
||||
* - `inception-v4-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/googlenet-v4-tf>`_
|
||||
* - `inception-v4-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/googlenet-v4-tf>`__
|
||||
- Inception v4 Tf (aka GoogleNet-V4)
|
||||
- classification
|
||||
- 299x299
|
||||
* - `mobilenet-ssd-CF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-ssd>`_
|
||||
* - `mobilenet-ssd-CF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-ssd>`__
|
||||
- SSD (MobileNet)_COCO-2017_Caffe
|
||||
- object detection
|
||||
- 300x300
|
||||
* - `mobilenet-v2-pytorch <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v2-pytorch>`_
|
||||
* - `mobilenet-v2-pytorch <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v2-pytorch>`__
|
||||
- Mobilenet V2 PyTorch
|
||||
- classification
|
||||
- 224x224
|
||||
* - `resnet-18-pytorch <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-18-pytorch>`_
|
||||
* - `resnet-18-pytorch <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-18-pytorch>`__
|
||||
- ResNet-18 PyTorch
|
||||
- classification
|
||||
- 224x224
|
||||
* - `resnet-50-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf>`_
|
||||
* - `resnet-50-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf>`__
|
||||
- ResNet-50_v1_ILSVRC-2012
|
||||
- classification
|
||||
- 224x224
|
||||
* - `ssd-resnet34-1200-onnx <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd-resnet34-1200-onnx>`_
|
||||
* - `ssd-resnet34-1200-onnx <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd-resnet34-1200-onnx>`__
|
||||
- ssd-resnet34 onnx model
|
||||
- object detection
|
||||
- 1200x1200
|
||||
* - `unet-camvid-onnx-0001 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/unet-camvid-onnx-0001>`_
|
||||
* - `unet-camvid-onnx-0001 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/unet-camvid-onnx-0001>`__
|
||||
- U-Net
|
||||
- semantic segmentation
|
||||
- 368x480
|
||||
* - `yolo-v3-tiny-tf <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tiny-tf>`_
|
||||
* - `yolo-v3-tiny-tf <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tiny-tf>`__
|
||||
- YOLO v3 Tiny
|
||||
- object detection
|
||||
- 416x416
|
||||
* - `yolo_v4-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf>`_
|
||||
* - `yolo_v4-TF <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf>`__
|
||||
- Yolo-V4 TF
|
||||
- object detection
|
||||
- 608x608
|
||||
@ -107,16 +107,16 @@
|
||||
.. dropdown:: Where can I purchase the specific hardware used in the benchmarking?
|
||||
|
||||
Intel partners with vendors all over the world. For a list of Hardware Manufacturers, see the
|
||||
`Intel® AI: In Production Partners & Solutions Catalog <https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/partners-solutions-catalog.html>`_.
|
||||
`Intel® AI: In Production Partners & Solutions Catalog <https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/partners-solutions-catalog.html>`__.
|
||||
For more details, see the :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`
documentation. Before purchasing any hardware, you can test and run
|
||||
models remotely, using `Intel® DevCloud for the Edge <http://devcloud.intel.com/edge/>`_.
|
||||
models remotely, using `Intel® DevCloud for the Edge <http://devcloud.intel.com/edge/>`__.
|
||||
|
||||
.. dropdown:: How can I optimize my models for better performance or accuracy?
|
||||
|
||||
Set of guidelines and recommendations to optimize models are available in the
|
||||
:doc:`optimization guide <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>`.
|
||||
Join the conversation in the `Community Forum <https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit>`_ for further support.
|
||||
Join the conversation in the `Community Forum <https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit>`__ for further support.
|
||||
|
||||
.. dropdown:: Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support?
|
||||
|
||||
|
@ -73,7 +73,7 @@ Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are ba
|
||||
|
||||
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of December 13, 2022 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.
|
||||
|
||||
Performance varies by use, configuration and other factors. Learn more at :ref:`www.intel.com/PerformanceIndex<https://www.intel.com/PerformanceIndex>`.
|
||||
Performance varies by use, configuration and other factors. Learn more at `www.intel.com/PerformanceIndex <https://www.intel.com/PerformanceIndex>`__.
|
||||
|
||||
Your costs and results may vary.
|
||||
|
||||
@ -81,4 +81,8 @@ Intel optimizations, for Intel compilers or other products, may not optimize to
|
||||
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
|
||||
@endsphinxdirective
|
||||
|
||||
|
||||
|
@ -3,7 +3,24 @@
|
||||
@endsphinxdirective
|
||||
# OpenVINO™ Model Server Benchmark Results {#openvino_docs_performance_benchmarks_ovms}
|
||||
|
||||
OpenVINO™ Model Server is an open-source, production-grade inference platform that exposes a set of models via a convenient inference API over gRPC or HTTP/REST. It employs the OpenVINO™ Runtime libraries from the Intel® Distribution of OpenVINO™ toolkit to extend workloads across Intel® hardware including CPU, GPU and others.
|
||||
|
||||
@sphinxdirective
|
||||
Click the "Benchmark Graphs" button to see the OpenVINO™ benchmark graphs. Select the models, the hardware platforms (CPU SKUs),
|
||||
precision and performance index from the lists and click the “Build Graphs” button.
|
||||
|
||||
.. button-link:: #
|
||||
:class: ov-toolkit-benchmark-results
|
||||
:color: primary
|
||||
:outline:
|
||||
|
||||
:material-regular:`bar_chart;1.4em` Benchmark Graphs
|
||||
|
||||
|
||||
OpenVINO™ Model Server is an open-source, production-grade inference platform that exposes a set of models via a convenient inference API
|
||||
over gRPC or HTTP/REST. It employs the OpenVINO™ Runtime libraries from the Intel® Distribution of OpenVINO™ toolkit to extend workloads
|
||||
across Intel® hardware including CPU, GPU and others.
|
||||
@endsphinxdirective
|
||||
|
||||
|
||||

|
||||
|
||||
@ -21,216 +38,49 @@ OpenVINO™ Model Server is measured in multiple-client-single-server configurat
|
||||
|
||||
* **Execution Controller** is launched on the client platform. It is responsible for synchronization of the whole measurement process, downloading metrics from the load balancer, and presenting the final report of the execution.
|
||||
|
||||
## bert-small-uncased-whole-word-masking-squad-002 (INT8)
|
||||

|
||||
## bert-small-uncased-whole-word-masking-squad-002 (FP32)
|
||||

|
||||
## densenet-121 (INT8)
|
||||

|
||||
## densenet-121 (FP32)
|
||||

|
||||
## efficientdet-d0 (INT8)
|
||||

|
||||
## efficientdet-d0 (FP32)
|
||||

|
||||
## inception-v4 (INT8)
|
||||

|
||||
## inception-v4 (FP32)
|
||||

|
||||
## mobilenet-ssd (INT8)
|
||||

|
||||
## mobilenet-ssd (FP32)
|
||||

|
||||
## mobilenet-v2 (INT8)
|
||||

|
||||
## mobilenet-v2 (FP32)
|
||||

|
||||
## resnet-18 (INT8)
|
||||

|
||||
## resnet-18 (FP32)
|
||||

|
||||
## resnet-50 (INT8)
|
||||

|
||||
## resnet-50 (FP32)
|
||||

|
||||
## ssd-resnet34-1200 (INT8)
|
||||

|
||||
## ssd-resnet34-1200 (FP32)
|
||||

|
||||
## unet-camvid-onnx-001 (INT8)
|
||||

|
||||
## unet-camvid-onnx-001 (FP32)
|
||||

|
||||
## yolo-v3-tiny (INT8)
|
||||

|
||||
## yolo-v3-tiny (FP32)
|
||||

|
||||
## yolo-v4 (INT8)
|
||||

|
||||
## yolo-v4 (FP32)
|
||||

|
||||
|
||||
|
||||
## Platform Configurations
|
||||
|
||||
OpenVINO™ Model Server performance benchmark numbers are based on release 2022.2. Performance results are based on testing as of November 16, 2022 and may not reflect all publicly available updates.
|
||||
|
||||
|
||||
@sphinxdirective
|
||||
.. dropdown:: Platform with Intel® Xeon® Platinum 8260M
|
||||
|
||||
.. table::
|
||||
:widths: 25 25 50
|
||||
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| | Server Platform | Client Platform |
|
||||
+==========================+===========================================+========================================+
|
||||
| Motherboard | Inspur YZMB-00882-104 NF5280M5 | Inspur YZMB-00882-104 NF5280M5 |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| Memory | Samsung 16 x 16GB @ 2666 MT/s DDR4 | Kingston 16 x 16GB @ 2666 MT/s DDR4 |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| CPU | Intel® Xeon® Platinum 8260M CPU @ 2.40GHz | Intel® Xeon® Gold 6238M CPU @ 2.10GHz |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| Selected CPU Flags | Hyper Threading, Turbo Boost, DL Boost | Hyper Threading, Turbo Boost, DL Boost |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| CPU Thermal Design Power | 162W | 150W |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| Operating System | Ubuntu 20.04.4 LTS | Ubuntu 20.04.4 LTS |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| Kernel Version | 5.4.0-107-generic | 5.4.0-107-generic |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| BIOS Vendor | American Megatrends Inc. | AMI |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| BIOS Version & Release | 4.1.16; date: 06/23/2020 | 4.1.16; date: 06/23/2020 |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| Docker Version | 20.10.3 | 20.10.3 |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
| Network Speed | 40 Gb/s | 40 Gb/s |
|
||||
+--------------------------+-------------------------------------------+----------------------------------------+
|
||||
|
||||
.. dropdown:: Platform with 6238M
|
||||
Platform & Configurations
|
||||
####################################
|
||||
|
||||
.. table::
|
||||
:widths: 25 25 50
|
||||
For a listing of all platforms and configurations used for testing, refer to the following:
|
||||
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| | Server Platform | Client Platform |
|
||||
+==========================+===========================================+============================================+
|
||||
| Motherboard | Inspur YZMB-00882-104 NF5280M5 | Inspur YZMB-00882-104 NF5280M5 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Memory | Kingston 16 x 16GB @ 2666 MT/s DDR4 | Samsung 16 x 16GB @ 2666 MT/s DDR4 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU | Intel® Xeon® Gold 6238M CPU @ 2.10GHz | Intel® Xeon® Platinum 8260M CPU @ 2.40GHz |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Selected CPU Flags | Hyper Threading, Turbo Boost, DL Boost | Hyper Threading, Turbo Boost, DL Boost |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU Thermal Design Power | 150W | 162W |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Operating System | Ubuntu 20.04.4 LTS | Ubuntu 20.04.4 LTS |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Kernel Version | 5.4.0-107-generic | 5.4.0-107-generic |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Vendor | AMI | American Megatrends Inc. |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Version & Release | 4.1.16; date: 06/23/2020 | 4.1.16; date: 06/23/2020 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Docker Version | 20.10.3 | 20.10.3 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Network Speed | 40 Gb/s | 40 Gb/s |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
.. button-link:: _static/benchmarks_files/platform_list_22.3.pdf
|
||||
:color: primary
|
||||
:outline:
|
||||
|
||||
.. dropdown:: Platform with Intel® Core™ i9-10920X
|
||||
:material-regular:`download;1.5em` Click for Hardware Platforms [PDF]
|
||||
|
||||
.. table::
|
||||
:widths: 25 25 50
|
||||
.. button-link:: _static/benchmarks_files/OV-2022.3-system-info-detailed.xlsx
|
||||
:color: primary
|
||||
:outline:
|
||||
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| | Server Platform | Client Platform |
|
||||
+==========================+===========================================+============================================+
|
||||
| Motherboard | ASUSTeK COMPUTER INC. PRIME X299-A II | ASUSTeK COMPUTER INC. PRIME Z370-P |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Memory | Corsair 4 x 16GB @ 2666 MT/s DDR4 | Corsair 4 x 16GB @ 2133 MT/s DDR4 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU | Intel® Core™ i9-10920X CPU @ 3.50GHz | Intel® Core™ i7-8700T CPU @ 2.40GHz |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Selected CPU Flags | Hyper Threading, Turbo Boost, DL Boost | Hyper Threading, Turbo Boost, DL Boost |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU Thermal Design Power | 165W | 35 W |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Operating System | Ubuntu 20.04.4 LTS | Ubuntu 20.04.4 LTS |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Kernel Version | 5.4.0-107-generic | 5.4.0-107-generic |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Vendor | American Megatrends Inc. | American Megatrends Inc. |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Version & Release | 0702; date: 06/10/2020 | 2401; date: 07/15/2019 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Docker Version | 19.03.13 | 19.03.14 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Network Speed | 10 Gb/s | 10 Gb/s |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
|
||||
:material-regular:`download;1.5em` Click for Configuration Details [XLSX]
|
||||
|
||||
.. dropdown:: Platform with Intel® Core™ i7-8700T
|
||||
.. the files above need to be changed to the proper ones!!!
|
||||
|
||||
.. table::
|
||||
:widths: 25 25 50
|
||||
The presented performance benchmark numbers are based on the release 2022.2 of the Intel® Distribution of OpenVINO™ toolkit.
|
||||
The benchmark application loads the OpenVINO™ Runtime and executes inferences on the specified hardware (CPU, GPU or GNA).
|
||||
It measures the time spent on actual inference (excluding any pre or post processing) and then reports on the inferences per second (or Frames Per Second).
|
||||
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| | Server Platform | Client Platform |
|
||||
+==========================+===========================================+============================================+
|
||||
| Motherboard | ASUSTeK COMPUTER INC. PRIME Z370-P | ASUSTeK COMPUTER INC. PRIME X299-A II |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Memory | Corsair 4 x 16GB @ 2133 MT/s DDR4 | Corsair 4 x 16GB @ 2666 MT/s DDR4 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Selected CPU Flags | Hyper Threading, Turbo Boost | Hyper Threading, Turbo Boost |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU Thermal Design Power | 35W | 165 W |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Operating System | Ubuntu 20.04.4 LTS | Ubuntu 20.04.4 LTS |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Kernel Version | 5.4.0-107-generic | 5.4.0-107-generic |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Vendor | American Megatrends Inc. | American Megatrends Inc. |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Version & Release | 2401; date: 07/15/2019 | 0702; date: 06/10/2020 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Docker Version | 19.03.14 | 19.03.13 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Network Speed | 10 Gb/s | 10 Gb/s |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
Disclaimers
|
||||
####################################
|
||||
|
||||
.. dropdown:: Platform with Intel® Core™ i5-8500
|
||||
Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2022.3.
|
||||
|
||||
.. table::
|
||||
:widths: 25 25 50
|
||||
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of November 16, 2022 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.
|
||||
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| | Server Platform | Client Platform |
|
||||
+==========================+===========================================+============================================+
|
||||
| Motherboard | ASUSTeK COMPUTER INC. PRIME Z370-A | Gigabyte Technology Co., Ltd. Z390 UD |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Memory | Corsair 2 x 16GB @ 2133 MT/s DDR4 | 029E 4 x 8GB @ 2400 MT/s DDR4 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i3-8100 CPU @ 3.60GHz |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Selected CPU Flags | Turbo Boost | |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| CPU Thermal Design Power | 65W | 65 W |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Operating System | Ubuntu 20.04.4 LTS | Ubuntu 20.04.1 LTS |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Kernel Version | 5.4.0-113-generic | 5.4.0-52-generic |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Vendor | American Megatrends Inc. | American Megatrends Inc. |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| BIOS Version & Release | 3004; date: 07/12/2021 | F10j; date: 09/16/2020 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Docker Version | 19.03.13 | 20.10.0 |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
| Network Speed | 40 Gb/s | 40 Gb/s |
|
||||
+--------------------------+-------------------------------------------+--------------------------------------------+
|
||||
Performance varies by use, configuration and other factors. Learn more at `www.intel.com/PerformanceIndex <https://www.intel.com/PerformanceIndex>`__.
|
||||
|
||||
@endsphinxdirective
|
||||
Your costs and results may vary.
|
||||
|
||||
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
|
||||
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
|
||||
|
||||
|
||||
@endsphinxdirective
|
||||
|
@ -1,12 +1,16 @@
|
||||
# Model Accuracy {#openvino_docs_performance_int8_vs_fp32}
|
||||
|
||||
The following table presents the absolute accuracy drop calculated as the accuracy difference between FP32 and INT8 representations of a model on two platforms
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
The following table presents the absolute accuracy drop calculated as the accuracy difference
|
||||
between FP32 and INT8 representations of a model on two platforms.
|
||||
|
||||
* A - Intel® Core™ i9-9000K (AVX2)
|
||||
* B - Intel® Xeon® 6338, (VNNI)
|
||||
* C - Intel® Flex-170
|
||||
|
||||
@sphinxdirective
|
||||
|
||||
.. list-table:: Model Accuracy
|
||||
:header-rows: 1
|
||||
|
||||
|
@ -563,7 +563,7 @@ ov::Tensor get_random_tensor(const std::pair<std::string, benchmark_app::InputIn
|
||||
} else if (type == ov::element::f64) {
|
||||
return create_tensor_random<double, double>(inputInfo.second);
|
||||
} else if (type == ov::element::f16) {
|
||||
return create_tensor_random<short, short>(inputInfo.second);
|
||||
return create_tensor_random<ov::float16, float>(inputInfo.second);
|
||||
} else if (type == ov::element::i32) {
|
||||
return create_tensor_random<int32_t, int32_t>(inputInfo.second);
|
||||
} else if (type == ov::element::i64) {
|
||||
|
@ -356,11 +356,10 @@ int main(int argc, char* argv[]) {
|
||||
|
||||
bool perf_counts = false;
|
||||
// check if using the virtual device
|
||||
auto if_auto = std::find(devices.begin(), devices.end(), "AUTO") != devices.end();
|
||||
auto if_multi = std::find(devices.begin(), devices.end(), "MULTI") != devices.end();
|
||||
auto is_virtual = is_virtual_device_found(devices);
|
||||
auto hardware_devices = devices;
|
||||
// Remove the hardware devices if AUTO/MULTI appears in the devices list.
|
||||
if (if_auto || if_multi) {
|
||||
// Remove the hardware devices if AUTO/MULTI/HETERO appears in the devices list.
|
||||
if (is_virtual) {
|
||||
devices.clear();
|
||||
// Parse out the current virtual device as the target device.
|
||||
std::string virtual_device = split(device_name, ':').at(0);
|
||||
@ -376,8 +375,11 @@ int main(int argc, char* argv[]) {
|
||||
auto& device_config = config[device];
|
||||
|
||||
// high-level performance modes
|
||||
auto ov_perf_hint = get_performance_hint(device, core);
|
||||
device_config.emplace(ov::hint::performance_mode(ov_perf_hint));
|
||||
if (!device_config.count(ov::hint::performance_mode.name())) {
|
||||
device_config.emplace(ov::hint::performance_mode(get_performance_hint(device, core)));
|
||||
}
|
||||
auto ov_perf_hint = device_config.at(ov::hint::performance_mode.name()).as<ov::hint::PerformanceMode>();
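// Note (illustrative, not part of the patch): a performance hint that is already present in the
// loaded/parsed device config takes precedence; get_performance_hint() is only the fallback, and
// the effective value is read back here for later use (e.g. when deriving the number of requests).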
|
||||
|
||||
if (FLAGS_nireq != 0)
|
||||
device_config.emplace(ov::hint::num_requests(unsigned(FLAGS_nireq)));
|
||||
|
||||
@ -415,7 +417,7 @@ int main(int argc, char* argv[]) {
|
||||
std::end(supported_properties);
|
||||
};
|
||||
// the rest are individual per-device settings (overriding the values set with perf modes)
|
||||
auto setThroughputStreams = [&]() {
|
||||
auto set_throughput_streams = [&]() {
|
||||
std::string key = getDeviceTypeFromName(device) + "_THROUGHPUT_STREAMS";
|
||||
auto it_device_nstreams = device_nstreams.find(device);
|
||||
if (it_device_nstreams != device_nstreams.end()) {
|
||||
@ -426,34 +428,13 @@ int main(int argc, char* argv[]) {
|
||||
// Use API 2.0 key for streams
|
||||
key = ov::num_streams.name();
|
||||
device_config[key] = it_device_nstreams->second;
|
||||
} else if (device == "MULTI" || device == "AUTO") {
|
||||
// check if the element contains the hardware device property
|
||||
auto value_vec = split(it_device_nstreams->second, ' ');
|
||||
if (value_vec.size() == 1) {
|
||||
key = ov::num_streams.name();
|
||||
device_config[key] = it_device_nstreams->second;
|
||||
} else {
|
||||
// set device nstreams properties in the AUTO/MULTI plugin
|
||||
std::stringstream strm(it_device_nstreams->second);
|
||||
std::map<std::string, std::string> devices_property;
|
||||
ov::util::Read<std::map<std::string, std::string>>{}(strm, devices_property);
|
||||
for (const auto& it : devices_property) {
|
||||
if (device_config.find(it.first) == device_config.end() ||
|
||||
(is_load_config && is_dev_set_property[it.first])) {
|
||||
// Create ov::device::properties with ov::num_stream and
|
||||
// 1. Insert this ov::device::properties into device config if this
|
||||
// ov::device::properties isn't existed. Otherwise,
|
||||
// 2. Replace the existed ov::device::properties within device config.
|
||||
is_dev_set_property[it.first] = false;
|
||||
device_config.erase(it.first);
|
||||
device_config.insert(
|
||||
ov::device::properties(it.first, ov::num_streams(std::stoi(it.second))));
|
||||
} else {
|
||||
auto& property = device_config[it.first].as<ov::AnyMap>();
|
||||
property.emplace(ov::num_streams(std::stoi(it.second)));
|
||||
}
|
||||
}
|
||||
}
|
||||
} else if (is_virtual_device(device)) {
|
||||
key = ov::num_streams.name();
|
||||
update_device_config_for_virtual_device(it_device_nstreams->second,
|
||||
device_config,
|
||||
ov::num_streams,
|
||||
is_dev_set_property,
|
||||
is_load_config);
|
||||
} else {
|
||||
throw std::logic_error("Device " + device + " doesn't support config key '" + key + "' " +
|
||||
"and '" + ov::num_streams.name() + "'!" +
|
||||
@ -477,7 +458,7 @@ int main(int argc, char* argv[]) {
|
||||
// Use API 2.0 key for streams
|
||||
key = ov::num_streams.name();
|
||||
device_config[key] = ov::streams::AUTO;
|
||||
} else if (device == "MULTI" || device == "AUTO") {
|
||||
} else if (is_virtual_device(device)) {
|
||||
// Set nstreams to default value auto if no nstreams specified from cmd line.
|
||||
for (auto& hwdevice : hardware_devices) {
|
||||
std::string key = std::string(getDeviceTypeFromName(hwdevice) + "_THROUGHPUT_STREAMS");
|
||||
@ -502,34 +483,12 @@ int main(int argc, char* argv[]) {
|
||||
// set to user defined value
|
||||
if (supported(ov::inference_precision.name())) {
|
||||
device_config.emplace(ov::inference_precision(it_device_infer_precision->second));
|
||||
} else if (device == "MULTI" || device == "AUTO") {
|
||||
// check if the element contains the hardware device property
|
||||
auto value_vec = split(it_device_infer_precision->second, ' ');
|
||||
if (value_vec.size() == 1) {
|
||||
auto key = ov::inference_precision.name();
|
||||
device_config[key] = it_device_infer_precision->second;
|
||||
} else {
|
||||
// set device inference_precison properties in the AUTO/MULTI plugin
|
||||
std::stringstream strm(it_device_infer_precision->second);
|
||||
std::map<std::string, std::string> devices_property;
|
||||
ov::util::Read<std::map<std::string, std::string>>{}(strm, devices_property);
|
||||
for (const auto& it : devices_property) {
|
||||
if (device_config.find(it.first) == device_config.end() ||
|
||||
(is_load_config && is_dev_set_property[it.first])) {
|
||||
// Create ov::device::properties with ov::inference_precision and
|
||||
// 1. Insert this ov::device::properties into device config if this
|
||||
// ov::device::properties isn't existed. Otherwise,
|
||||
// 2. Replace the existed ov::device::properties within device config.
|
||||
is_dev_set_property[it.first] = false;
|
||||
device_config.erase(it.first);
|
||||
device_config.insert(
|
||||
ov::device::properties(it.first, ov::inference_precision(it.second)));
|
||||
} else {
|
||||
auto& property = device_config[it.first].as<ov::AnyMap>();
|
||||
property.emplace(ov::inference_precision(it.second));
|
||||
}
|
||||
}
|
||||
}
|
||||
} else if (is_virtual_device(device)) {
|
||||
update_device_config_for_virtual_device(it_device_infer_precision->second,
|
||||
device_config,
|
||||
ov::inference_precision,
|
||||
is_dev_set_property,
|
||||
is_load_config);
|
||||
} else {
|
||||
throw std::logic_error("Device " + device + " doesn't support config key '" +
|
||||
ov::inference_precision.name() + "'! " +
|
||||
@ -556,7 +515,7 @@ int main(int argc, char* argv[]) {
|
||||
if (supported(property_name) || device_name == "AUTO") {
|
||||
// create nthreads/pin primary property for HW device or AUTO if -d is AUTO directly.
|
||||
device_config.emplace(property);
|
||||
} else if (if_auto || if_multi) {
|
||||
} else if (is_virtual) {
|
||||
// Create secondary property of -nthreads/-pin only for CPU if CPU device appears in the devices
|
||||
// list specified by -d.
|
||||
for (auto& device : hardware_devices) {
|
||||
@ -571,38 +530,10 @@ int main(int argc, char* argv[]) {
|
||||
if (isFlagSetInCommandLine("pin"))
|
||||
set_nthreads_pin("pin");
|
||||
|
||||
if (device.find("CPU") != std::string::npos || device.find("GPU") != std::string::npos) {
|
||||
// CPU supports few special performance-oriented keys
|
||||
// for CPU and GPU execution, more throughput-oriented execution via streams
|
||||
setThroughputStreams();
|
||||
set_infer_precision();
|
||||
} else if (device.find("GNA") != std::string::npos) {
|
||||
set_infer_precision();
|
||||
} else if (device.find("AUTO") != std::string::npos) {
|
||||
setThroughputStreams();
|
||||
set_infer_precision();
|
||||
device_nstreams.erase(device);
|
||||
} else if (device.find("MULTI") != std::string::npos) {
|
||||
setThroughputStreams();
|
||||
set_infer_precision();
|
||||
if ((device_name.find("GPU") != std::string::npos) && (device_name.find("CPU") != std::string::npos)) {
|
||||
slog::warn << "GPU throttling is turned on. Multi-device execution with "
|
||||
"the CPU + GPU performs best with GPU throttling hint, "
|
||||
<< "which releases another CPU thread (that is otherwise "
|
||||
"used by the GPU driver for active polling)."
|
||||
<< slog::endl;
|
||||
set_throughput_streams();
|
||||
set_infer_precision();
|
||||
|
||||
device_config.insert(ov::device::properties("GPU", {{GPU_CONFIG_KEY(PLUGIN_THROTTLE), 1}}));
|
||||
// limit threading for CPU portion of inference
|
||||
if (!isFlagSetInCommandLine("pin")) {
|
||||
auto it_affinity = device_config.find(ov::affinity.name());
|
||||
if (it_affinity != device_config.end()) {
|
||||
slog::warn << "Turn off threads pinning for " << device
|
||||
<< " device since multi-scenario with GPU device is used." << slog::endl;
|
||||
it_affinity->second = ov::Affinity::NONE;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (is_virtual_device(device)) {
|
||||
device_nstreams.erase(device);
|
||||
}
|
||||
}
|
||||
@ -905,7 +836,21 @@ int main(int argc, char* argv[]) {
|
||||
if (cfg == ov::supported_properties)
|
||||
continue;
|
||||
auto prop = compiledModel.get_property(cfg);
|
||||
slog::info << " " << cfg << ": " << prop.as<std::string>() << slog::endl;
|
||||
if (cfg == ov::device::properties) {
|
||||
auto devices_properties = prop.as<ov::AnyMap>();
|
||||
for (auto& item : devices_properties) {
|
||||
slog::info << " " << item.first << ": " << slog::endl;
|
||||
for (auto& item2 : item.second.as<ov::AnyMap>()) {
|
||||
if (item2.first == ov::supported_properties ||
|
||||
item2.first == METRIC_KEY(SUPPORTED_CONFIG_KEYS) ||
|
||||
item2.first == METRIC_KEY(SUPPORTED_METRICS))
|
||||
continue;
|
||||
slog::info << " " << item2.first << ": " << item2.second.as<std::string>() << slog::endl;
|
||||
}
|
||||
}
|
||||
} else {
|
||||
slog::info << " " << cfg << ": " << prop.as<std::string>() << slog::endl;
|
||||
}
|
||||
}
|
||||
|
||||
// Update number of streams
|
||||
|
@ -107,13 +107,27 @@ std::vector<float> split_float(const std::string& s, char delim) {
|
||||
return result;
|
||||
}
|
||||
|
||||
static const std::vector<std::string> meta_plugins{"MULTI", "HETERO", "AUTO"};
|
||||
bool is_virtual_device(const std::string& device_name) {
|
||||
return std::find(meta_plugins.begin(), meta_plugins.end(), device_name) != meta_plugins.end();
|
||||
}
|
||||
|
||||
bool is_virtual_device_found(const std::vector<std::string>& device_names) {
|
||||
for (const auto& device_name : device_names) {
|
||||
if (is_virtual_device(device_name)) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
return false;
|
||||
}
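// Illustrative usage of the helpers above (device strings are examples only, not from the patch):
//   parse_devices("MULTI:GPU,CPU")             -> {"MULTI", "GPU", "CPU"}
//   is_virtual_device_found({"MULTI", "GPU"})  -> true, because MULTI is a meta plugin
//   is_virtual_device("CPU")                   -> false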
|
||||
|
||||
std::vector<std::string> parse_devices(const std::string& device_string) {
|
||||
std::string comma_separated_devices = device_string;
|
||||
auto colon = comma_separated_devices.find(":");
|
||||
std::vector<std::string> result;
|
||||
if (colon != std::string::npos) {
|
||||
auto target_device = comma_separated_devices.substr(0, colon);
|
||||
if (target_device == "AUTO" || target_device == "MULTI") {
|
||||
if (is_virtual_device(target_device)) {
|
||||
result.push_back(target_device);
|
||||
}
|
||||
auto bracket = comma_separated_devices.find("("); // e.g. in BATCH:GPU(4)
|
||||
@ -137,8 +151,8 @@ void parse_value_for_virtual_device(const std::string& device, std::map<std::str
|
||||
// Remove the element that the key is virtual device MULTI
|
||||
// e.g. MULTI:xxx -nstreams 2 will set nstreams 2 to xxx.
|
||||
values_string.erase(item_virtual);
|
||||
} else if (device == "AUTO") {
|
||||
// Just keep the element that the key is virtual device AUTO
|
||||
} else if ((device == "AUTO") || (device == "HETERO")) {
|
||||
// Just keep the element that the key is virtual device AUTO/HETERO
|
||||
// e.g. AUTO:xxx,xxx -nstreams 2 will trigger an exception because the AUTO plugin does not support the nstreams property.
|
||||
auto value = item_virtual->second;
|
||||
values_string.clear();
|
||||
@ -146,23 +160,92 @@ void parse_value_for_virtual_device(const std::string& device, std::map<std::str
|
||||
return;
|
||||
}
|
||||
}
|
||||
std::stringstream ss;
|
||||
auto iter = values_string.begin();
|
||||
while (iter != values_string.end()) {
|
||||
if (iter->first == device) {
|
||||
iter++;
|
||||
continue;
|
||||
}
|
||||
values_string[device] += iter->first + " " + iter->second + " ";
|
||||
if (ss.str().empty())
|
||||
ss << '{';
|
||||
else
|
||||
ss << ',';
|
||||
ss << iter->first << ":" << iter->second;
|
||||
iter = values_string.erase(iter);
|
||||
}
|
||||
if (values_string.find(device) != values_string.end()) {
|
||||
auto& nstreams = values_string[device];
|
||||
// Remove the space at the tail.
|
||||
nstreams.pop_back();
|
||||
if (!ss.str().empty()) {
|
||||
ss << '}';
|
||||
values_string[device] = ss.str();
|
||||
}
|
||||
return;
|
||||
}
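// Illustrative example (not part of the patch): with -d MULTI:CPU,GPU and -nstreams CPU:2,GPU:4,
// the per-device pairs are folded into a single value under the "MULTI" key, e.g. "{CPU:2,GPU:4}",
// which update_device_config_for_virtual_device() below parses back into a device-to-value map.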
|
||||
|
||||
template <typename T>
|
||||
void update_device_config_for_virtual_device(const std::string& value,
|
||||
ov::AnyMap& device_config,
|
||||
ov::Property<T, ov::PropertyMutability::RW> property,
|
||||
std::map<std::string, bool>& is_dev_set_property,
|
||||
bool is_load_config) {
|
||||
// check if the element contains the hardware device property
|
||||
if (split(value, ':').size() == 1) {
|
||||
device_config[property.name()] = value;
|
||||
} else {
|
||||
// set device nstreams properties in the AUTO/MULTI/HETERO plugin
|
||||
std::stringstream strm(value);
|
||||
std::map<std::string, std::string> devices_property;
|
||||
ov::util::Read<std::map<std::string, std::string>>{}(strm, devices_property);
|
||||
for (const auto& it : devices_property) {
|
||||
const auto& device_name = it.first;
|
||||
const auto& device_value = it.second;
|
||||
if (device_config.find(ov::device::properties.name()) == device_config.end() ||
|
||||
(is_load_config && is_dev_set_property[device_name])) {
|
||||
// Create ov::device::properties with ov::num_streams/ov::inference_precision and
// 1. Insert this ov::device::properties into the device config if it does not exist yet. Otherwise,
// 2. Replace the existing ov::device::properties within the device config.
|
||||
is_dev_set_property[device_name] = false;
|
||||
device_config.erase(device_name);
|
||||
device_config[ov::device::properties.name()] = ov::AnyMap{};
|
||||
auto& secondary_property = device_config.at(ov::device::properties.name()).as<ov::AnyMap>();
|
||||
secondary_property[device_name] = ov::AnyMap{{property.name(), device_value}};
|
||||
} else {
|
||||
auto& secondary_property = device_config.at(ov::device::properties.name()).as<ov::AnyMap>();
|
||||
if (secondary_property.count(device_name)) {
|
||||
auto& device_property = secondary_property.at(device_name).as<ov::AnyMap>();
|
||||
device_property.emplace(property(device_value));
|
||||
} else {
|
||||
secondary_property[device_name] = ov::AnyMap{{property.name(), device_value}};
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
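// Illustrative result (assumption, not from the patch): for the value "{CPU:2,GPU:4}" and the
// property ov::num_streams, device_config ends up with a nested secondary property map roughly like
//   DEVICE_PROPERTIES -> { CPU: { NUM_STREAMS: 2 }, GPU: { NUM_STREAMS: 4 } }
// which the AUTO/MULTI/HETERO plugin later forwards to the corresponding hardware plugins.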
|
||||
|
||||
void update_device_config_for_virtual_device(const std::string& value,
|
||||
ov::AnyMap& device_config,
|
||||
ov::Property<ov::streams::Num, ov::PropertyMutability::RW> property,
|
||||
std::map<std::string, bool>& is_dev_set_property,
|
||||
bool is_load_config) {
|
||||
return update_device_config_for_virtual_device<ov::streams::Num>(value,
|
||||
device_config,
|
||||
property,
|
||||
is_dev_set_property,
|
||||
is_load_config);
|
||||
}
|
||||
|
||||
void update_device_config_for_virtual_device(const std::string& value,
|
||||
ov::AnyMap& device_config,
|
||||
ov::Property<ov::element::Type, ov::PropertyMutability::RW> property,
|
||||
std::map<std::string, bool>& is_dev_set_property,
|
||||
bool is_load_config) {
|
||||
return update_device_config_for_virtual_device<ov::element::Type>(value,
|
||||
device_config,
|
||||
property,
|
||||
is_dev_set_property,
|
||||
is_load_config);
|
||||
}
|
||||
|
||||
std::map<std::string, std::string> parse_value_per_device(const std::vector<std::string>& devices,
|
||||
const std::string& values_string) {
|
||||
// Format: <device1>:<value1>,<device2>:<value2> or just <value>
|
||||
@ -691,27 +774,12 @@ void dump_config(const std::string& filename, const std::map<std::string, ov::An
|
||||
nlohmann::json jsonConfig;
|
||||
for (const auto& item : config) {
|
||||
std::string deviceName = item.first;
|
||||
std::map<std::string, ov::AnyMap> device_properties;
|
||||
for (const auto& option : item.second) {
|
||||
if (option.second.is<ov::AnyMap>()) {
|
||||
// hw device properties
|
||||
device_properties[option.first] = option.second.as<ov::AnyMap>();
|
||||
} else {
|
||||
// primary property
|
||||
std::stringstream strm;
|
||||
option.second.print(strm);
|
||||
auto property_string = strm.str();
|
||||
jsonConfig[deviceName][option.first] = property_string;
|
||||
}
|
||||
if (!device_properties.empty()) {
|
||||
for (auto& item : device_properties) {
|
||||
auto hw_device_name = item.first;
|
||||
for (auto& property : item.second) {
|
||||
jsonConfig[deviceName]["DEVICE_PROPERTIES"][hw_device_name][property.first] =
|
||||
property.second.as<std::string>();
|
||||
}
|
||||
}
|
||||
}
|
||||
// primary property
|
||||
std::stringstream strm;
|
||||
option.second.print(strm);
|
||||
auto property_string = strm.str();
|
||||
jsonConfig[deviceName][option.first] = property_string;
|
||||
}
|
||||
}
|
||||
|
||||
@ -740,23 +808,7 @@ void load_config(const std::string& filename, std::map<std::string, ov::AnyMap>&
|
||||
const std::string& deviceName = item.key();
|
||||
const auto& itemValue = item.value();
|
||||
for (auto option = itemValue.cbegin(), itemValueEnd = itemValue.cend(); option != itemValueEnd; ++option) {
|
||||
if (option.key() != "DEVICE_PROPERTIES") {
|
||||
config[deviceName][option.key()] = option.value().get<std::string>();
|
||||
continue;
|
||||
}
|
||||
const auto& optionValue = option.value();
|
||||
for (auto hw_properties = optionValue.cbegin(), optionValueEnd = optionValue.cend();
|
||||
hw_properties != optionValueEnd;
|
||||
++hw_properties) {
|
||||
const std::string& hw_device_name = hw_properties.key();
|
||||
std::map<std::string, ov::Any> hw_device_properties;
|
||||
const auto& hw_propertiesValue = hw_properties.value();
|
||||
for (auto property = hw_propertiesValue.cbegin(), hw_propertiesEnd = hw_propertiesValue.cend();
|
||||
property != hw_propertiesEnd;
|
||||
++property)
|
||||
hw_device_properties[property.key()] = property.value().get<std::string>();
|
||||
config[deviceName][hw_device_name] = hw_device_properties;
|
||||
}
|
||||
config[deviceName][option.key()] = option.value().get<std::string>();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -58,11 +58,19 @@ using InputsInfo = std::map<std::string, InputInfo>;
|
||||
using PartialShapes = std::map<std::string, ngraph::PartialShape>;
|
||||
} // namespace benchmark_app
|
||||
|
||||
bool is_virtual_device(const std::string& device_name);
|
||||
bool is_virtual_device_found(const std::vector<std::string>& device_names);
|
||||
std::vector<std::string> parse_devices(const std::string& device_string);
|
||||
uint32_t device_default_device_duration_in_seconds(const std::string& device);
|
||||
std::map<std::string, std::string> parse_value_per_device(const std::vector<std::string>& devices,
|
||||
const std::string& values_string);
|
||||
void parse_value_for_virtual_device(const std::string& device, std::map<std::string, std::string>& values_string);
|
||||
template <typename T>
|
||||
void update_device_config_for_virtual_device(const std::string& value,
|
||||
ov::AnyMap& device_config,
|
||||
ov::Property<T, ov::PropertyMutability::RW> property,
|
||||
std::map<std::string, bool>& is_dev_set_property,
|
||||
bool is_load_config = false);
|
||||
std::string get_shapes_string(const benchmark_app::PartialShapes& shapes);
|
||||
size_t get_batch_size(const benchmark_app::InputsInfo& inputs_info);
|
||||
std::vector<std::string> split(const std::string& s, char delim);
|
||||
|
@ -255,7 +255,11 @@ int main(int argc, char* argv[]) {
|
||||
// -----------------------------------------------------------------------------------------------------
|
||||
// --------------------------- Step 2. Loading model to the device ------------------------------------------
|
||||
if (useGna) {
|
||||
genericPluginConfig.insert(std::begin(gnaPluginConfig), std::end(gnaPluginConfig));
|
||||
if (useHetero) {
|
||||
genericPluginConfig.insert(ov::device::properties("GNA", gnaPluginConfig));
|
||||
} else {
|
||||
genericPluginConfig.insert(std::begin(gnaPluginConfig), std::end(gnaPluginConfig));
|
||||
}
|
||||
}
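// Illustrative note (assumption based on this change): with HETERO the GNA-specific options must be
// nested as ov::device::properties("GNA", ...) so that only the GNA plugin receives them, whereas a
// plain GNA target can take them directly in the generic plugin config.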
|
||||
auto t0 = Time::now();
|
||||
ms loadTime = std::chrono::duration_cast<ms>(Time::now() - t0);
|
||||
|
@ -6,6 +6,7 @@
|
||||
|
||||
#include "pyopenvino/core/common.hpp"
|
||||
#include "pyopenvino/graph/any.hpp"
|
||||
#include "pyopenvino/utils/utils.hpp"
|
||||
|
||||
namespace py = pybind11;
|
||||
|
||||
@@ -158,6 +159,31 @@ void regmodule_properties(py::module m) {
    wrap_property_RO(m_device, ov::device::capabilities, "capabilities");
    wrap_property_RO(m_device, ov::device::uuid, "uuid");

    // Special case: ov::device::properties
    m_device.def("properties", []() {
        return ov::device::properties.name();
    });

    m_device.def("properties", [](py::args& args) {
        ov::AnyMap value = {};
        for (auto v : args) {
            if (!py::isinstance<py::dict>(v)) {
                throw py::type_error("Incorrect passed value: " + std::string(py::str(v)) +
                                     ", expected dictionary instead of " + typeid(v).name());
            }
            auto dict = py::cast<py::dict>(v);
            for (auto item : dict) {
                if (!py::isinstance<py::str>(item.first)) {
                    throw py::type_error("Incorrect passed key in value: " + std::string(py::str(item.first)) +
                                         ", expected string instead of " + typeid(item.first).name());
                }
                value[py::cast<std::string>(item.first)] =
                    Common::utils::py_object_to_any(py::cast<py::object>(item.second));
            }
        }
        return ov::device::properties(value);
    });

    // Modules made in pybind cannot easily register attributes, thus workaround is needed.
    // Let's simulate module with attributes by creating empty proxy class called FakeModuleName.
    class FakeCapability {};

@@ -99,6 +99,10 @@ py::object from_ov_any(const ov::Any& any) {
    else if (any.is<std::map<ov::element::Type, float>>()) {
        return py::cast(any.as<std::map<ov::element::Type, float>>());
    }
    // Check for std::map<std::string, Any> {
    else if (any.is<std::map<std::string, ov::Any>>()) {
        return py::cast(any.as<std::map<std::string, ov::Any>>());
    }
    // Check for std::vector<ov::PropertyName>
    else if (any.is<std::vector<ov::PropertyName>>()) {
        auto val = any.as<std::vector<ov::PropertyName>>();
@@ -194,6 +198,33 @@ void deprecation_warning(const std::string& function_name, const std::string& ve
    PyErr_WarnEx(PyExc_DeprecationWarning, ss.str().data(), 2);
}

bool py_object_is_any_map(const py::object& py_obj) {
    if (!py::isinstance<py::dict>(py_obj)) {
        return false;
    }
    auto dict = py::cast<py::dict>(py_obj);
    return std::all_of(dict.begin(), dict.end(), [&](const std::pair<py::object::handle, py::object::handle>& elem) {
        return py::isinstance<py::str>(elem.first);
    });
}

ov::AnyMap py_object_to_any_map(const py::object& py_obj) {
    OPENVINO_ASSERT(py_object_is_any_map(py_obj), "Unsupported attribute type.");
    ov::AnyMap return_value = {};
    for (auto& item : py::cast<py::dict>(py_obj)) {
        std::string key = py::cast<std::string>(item.first);
        py::object value = py::cast<py::object>(item.second);
        if (py::isinstance<ov::Affinity>(value)) {
            return_value[key] = py::cast<ov::Affinity>(value);
        } else if (py_object_is_any_map(value)) {
            return_value[key] = Common::utils::py_object_to_any_map(value);
        } else {
            return_value[key] = Common::utils::py_object_to_any(value);
        }
    }
    return return_value;
}

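As a point of reference for what the recursive conversion above produces, here is a small hedged C++ sketch that builds the equivalent nested `ov::AnyMap` directly and reads a leaf value back; the keys and values are illustrative only:

```cpp
#include <openvino/core/any.hpp>

#include <iostream>

int main() {
    // A nested Python dict {"CPU": {"NUM_STREAMS": 4}} becomes a nested ov::AnyMap.
    ov::AnyMap device_cfg{{"NUM_STREAMS", 4}};
    ov::AnyMap cfg{{"CPU", device_cfg}};

    // Inner maps are stored as ov::Any and unwrapped with as<ov::AnyMap>().
    auto streams = cfg.at("CPU").as<ov::AnyMap>().at("NUM_STREAMS").as<int>();
    std::cout << streams << "\n";  // prints 4
    return 0;
}
```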
ov::Any py_object_to_any(const py::object& py_obj) {
    // Python types
    if (py::isinstance<py::str>(py_obj)) {
@@ -244,6 +275,8 @@ ov::Any py_object_to_any(const py::object& py_obj) {
            OPENVINO_ASSERT(false, "Unsupported attribute type.");
        }
        // OV types
    } else if (py_object_is_any_map(py_obj)) {
        return py_object_to_any_map(py_obj);
    } else if (py::isinstance<ov::Any>(py_obj)) {
        return py::cast<ov::Any>(py_obj);
    } else if (py::isinstance<ov::element::Type>(py_obj)) {
@@ -23,6 +23,10 @@ namespace utils {

    void deprecation_warning(const std::string& function_name, const std::string& version = std::string(), const std::string& message = std::string());

    bool py_object_is_any_map(const py::object& py_obj);

    ov::AnyMap py_object_to_any_map(const py::object& py_obj);

    ov::Any py_object_to_any(const py::object& py_obj);

    ov::pass::Serialize::Version convert_to_version(const std::string& version);

@@ -33,18 +33,18 @@ def test_any_list(values, data_type):
    assert ovany.get() == values


@pytest.mark.parametrize(("value_dict", "data_type"), [
    ({"key": "value"}, str),
    ({21: 37}, int),
    ({21.0: 37.0}, float),
@pytest.mark.parametrize(("value_dict", "value_type", "data_type"), [
    ({"key": "value"}, OVAny, str),
    ({21: 37}, int, int),
    ({21.0: 37.0}, float, float),
])
def test_any_dict(value_dict, data_type):
def test_any_dict(value_dict, value_type, data_type):
    ovany = OVAny(value_dict)
    key = list(value_dict.keys())[0]
    assert isinstance(ovany.value, dict)
    assert ovany[key] == list(value_dict.values())[0]
    assert len(ovany.value) == 1
    assert type(ovany.value[key]) == data_type
    assert type(ovany.value[key]) == value_type
    assert type(list(value_dict.values())[0]) == data_type
    assert ovany.get() == value_dict

@@ -1092,6 +1092,8 @@ def test_mixed_scalar_infer(device, shared_flag, input_data):
])
def test_mixed_dynamic_infer(device, shared_flag, input_data):
    core = Core()
    if device == "CPU" and "Intel" not in core.get_property(device, "FULL_DEVICE_NAME"):
        pytest.skip("This test fails on ARM plugin because it doesn't support dynamic shapes.")
    param0 = ops.parameter([], np.float32, name="data0")
    param1 = ops.parameter(["?"], np.float32, name="data1")
    add = ops.add(param0, param1, name="add")
@@ -305,6 +305,30 @@ def test_properties_device_priorities():
    assert f"Incorrect passed value: {value} , expected string values." in str(e.value)


def test_properties_device_properties():
    assert properties.device.properties() == "DEVICE_PROPERTIES"

    def make_dict(*arg):
        return dict(  # noqa: C406
            [*arg])

    def check(value1, value2):
        assert properties.device.properties(value1) == ("DEVICE_PROPERTIES", OVAny(value2))

    check({"CPU": {properties.streams.num(): 2}},
          {"CPU": {"NUM_STREAMS": 2}})
    check({"CPU": make_dict(properties.streams.num(2))},
          {"CPU": {"NUM_STREAMS": properties.streams.Num(2)}})
    check({"GPU": make_dict(properties.inference_precision(Type.f32))},
          {"GPU": {"INFERENCE_PRECISION_HINT": Type.f32}})
    check({"CPU": make_dict(properties.streams.num(2), properties.inference_precision(Type.f32))},
          {"CPU": {"INFERENCE_PRECISION_HINT": Type.f32, "NUM_STREAMS": properties.streams.Num(2)}})
    check({"CPU": make_dict(properties.streams.num(2), properties.inference_precision(Type.f32)),
           "GPU": make_dict(properties.streams.num(1), properties.inference_precision(Type.f16))},
          {"CPU": {"INFERENCE_PRECISION_HINT": Type.f32, "NUM_STREAMS": properties.streams.Num(2)},
           "GPU": {"INFERENCE_PRECISION_HINT": Type.f16, "NUM_STREAMS": properties.streams.Num(1)}})


def test_properties_streams():
    # Test extra Num class
    assert properties.streams.Num().to_integer() == -1
@@ -28,6 +28,9 @@ namespace ov {
class Plugin;
/** @cond INTERNAL */
class Any;

using AnyMap = std::map<std::string, Any>;

namespace util {

OPENVINO_API bool equal(std::type_index lhs, std::type_index rhs);
@@ -126,6 +129,11 @@ struct OPENVINO_API Read<std::tuple<unsigned int, unsigned int>> {
    void operator()(std::istream& is, std::tuple<unsigned int, unsigned int>& tuple) const;
};

template <>
struct OPENVINO_API Read<AnyMap> {
    void operator()(std::istream& is, AnyMap& map) const;
};

template <typename T>
auto from_string(const std::string& str) -> const
    typename std::enable_if<std::is_same<T, std::string>::value, T>::type& {
@@ -210,14 +218,36 @@ struct Read<
    std::map<K, T, C, A>,
    typename std::enable_if<std::is_default_constructible<K>::value && std::is_default_constructible<T>::value>::type> {
    void operator()(std::istream& is, std::map<K, T, C, A>& map) const {
        while (is.good()) {
            std::string str;
            is >> str;
            auto k = from_string<K>(str);
            is >> str;
            auto v = from_string<T>(str);
            map.emplace(std::move(k), std::move(v));
        char c;

        is >> c;
        OPENVINO_ASSERT(c == '{', "Failed to parse std::map<K, T>. Starting symbols is not '{', it's ", c);

        while (c != '}') {
            std::string key, value;
            std::getline(is, key, ':');
            size_t enclosed_container_level = 0;

            while (is.good()) {
                is >> c;
                if (c == ',') {  // delimiter between map's pairs
                    if (enclosed_container_level == 0)  // we should interrupt after delimiter
                        break;
                }
                if (c == '{' || c == '[')  // case of enclosed maps / arrays
                    ++enclosed_container_level;
                if (c == '}' || c == ']') {
                    if (enclosed_container_level == 0)
                        break;  // end of map
                    --enclosed_container_level;
                }

                value += c;  // accumulate current value
            }
            map.emplace(from_string<K>(key), from_string<T>(value));
        }

        OPENVINO_ASSERT(c == '}', "Failed to parse std::map<K, T>. Ending symbols is not '}', it's ", c);
    }
};

@@ -322,14 +352,14 @@ struct Write<std::map<K, T, C, A>> {
    void operator()(std::ostream& os, const std::map<K, T, C, A>& map) const {
        if (!map.empty()) {
            std::size_t i = 0;
            os << '{';
            for (auto&& v : map) {
                os << to_string(v.first);
                os << ' ';
                os << to_string(v.second);
                os << to_string(v.first) << ':' << to_string(v.second);
                if (i < (map.size() - 1))
                    os << ' ';
                    os << ',';
                ++i;
            }
            os << '}';
        }
    }
};
@@ -914,8 +944,6 @@ public:
    const void* addressof() const;
};

using AnyMap = std::map<std::string, Any>;

using RTMap = AnyMap;

using AnyVector = std::vector<ov::Any>;
@@ -34,12 +34,30 @@ public:

    std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

    /// \brief Set the output ROI feature map (pooled_h, pooled_w).
    /// \param output_size Shape with pooling attributes pooled_h and pooled_w sizes.
    void set_output_roi(Shape output_size);

    /// \brief Get the output ROI feature map shape (H x W)
    /// \return Shape with pooled_h and pooled_w attributes.
    const Shape& get_output_roi() const;

    OPENVINO_DEPRECATED("Use 'get_output_roi' instead. Use of this member can be ambiguous with Node base "
                        "'get_output_size' which return number of outputs.")
    const Shape& get_output_size() const {
        return m_output_size;
    }

    /// \brief Set the spatial scale value.
    /// \param scale Scale value to set.
    void set_spatial_scale(float scale);
    float get_spatial_scale() const {
        return m_spatial_scale;
    }

    /// \brief Set the method of pooling
    /// \param method_name Pooling method name.
    void set_method(std::string method_name);
    const std::string& get_method() const {
        return m_method;
    }
@@ -47,7 +65,7 @@ public:

private:
    Shape m_output_size{0, 0};
    float m_spatial_scale{0};
    float m_spatial_scale{0.0f};
    std::string m_method = "max";
};
} // namespace v0
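The renamed accessors above are exercised by the type_prop tests later in this commit. A minimal hedged sketch of using them directly (tensor shapes and attribute values are illustrative only):

```cpp
#include <openvino/op/parameter.hpp>
#include <openvino/op/roi_pooling.hpp>

#include <memory>

int main() {
    using namespace ov;
    auto feat = std::make_shared<op::v0::Parameter>(element::f32, Shape{1, 3, 6, 6});
    auto rois = std::make_shared<op::v0::Parameter>(element::f32, Shape{4, 5});
    auto roi_pool = std::make_shared<op::v0::ROIPooling>(feat, rois, Shape{2, 2}, 0.625f, "max");

    roi_pool->set_output_roi({3, 3});           // replaces the deprecated get/set_output_size pair
    roi_pool->validate_and_infer_types();
    const Shape& pooled = roi_pool->get_output_roi();  // Shape{3, 3}
    (void)pooled;
    return 0;
}
```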
src/core/shape_inference/include/roi_pooling_shape_inference.hpp (new file, 107 lines)
@@ -0,0 +1,107 @@
|
||||
// Copyright (C) 2018-2023 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <cmath>
|
||||
|
||||
#include "compare.hpp"
|
||||
#include "dimension_util.hpp"
|
||||
#include "openvino/op/roi_pooling.hpp"
|
||||
|
||||
namespace ov {
|
||||
namespace op {
|
||||
namespace pooling {
|
||||
namespace validate {
|
||||
template <class TROIPooling, class TShape>
|
||||
void rois_input_shape(const TROIPooling* op, const TShape rois_shape) {
|
||||
if (rois_shape.rank().is_static()) {
|
||||
NODE_VALIDATION_CHECK(op,
|
||||
rois_shape.size() == 2,
|
||||
"Expected a 2D tensor for the ROIs input with box coordinates. Got: ",
|
||||
rois_shape);
|
||||
|
||||
NODE_VALIDATION_CHECK(op,
|
||||
rois_shape[1].compatible(5),
|
||||
"The second dimension of ROIs input should contain batch id and box coordinates. ",
|
||||
"This dimension is expected to be equal to 5. Got: ",
|
||||
rois_shape[1]);
|
||||
}
|
||||
}
|
||||
|
||||
template <class TROIPooling>
|
||||
void output_roi_attr(const TROIPooling* op) {
|
||||
const auto& out_roi = op->get_output_roi();
|
||||
|
||||
NODE_VALIDATION_CHECK(op,
|
||||
out_roi.size() == 2,
|
||||
"The dimension of pooled size is expected to be equal to 2. Got: ",
|
||||
out_roi.size());
|
||||
|
||||
NODE_VALIDATION_CHECK(op,
|
||||
std::none_of(out_roi.cbegin(), out_roi.cend(), cmp::Less<size_t>(1)),
|
||||
"Pooled size attributes pooled_h and pooled_w should should be positive integers. Got: ",
|
||||
out_roi[0],
|
||||
" and: ",
|
||||
out_roi[1],
|
||||
"respectively");
|
||||
}
|
||||
|
||||
template <class TROIPooling>
|
||||
void scale_attr(const TROIPooling* op) {
|
||||
const auto scale = op->get_spatial_scale();
|
||||
NODE_VALIDATION_CHECK(op,
|
||||
std::isnormal(scale) && !std::signbit(scale),
|
||||
"The spatial scale attribute should be a positive floating point number. Got: ",
|
||||
scale);
|
||||
}
|
||||
|
||||
template <class TROIPooling>
|
||||
void method_attr(const TROIPooling* op) {
|
||||
const auto& method = op->get_method();
|
||||
NODE_VALIDATION_CHECK(op,
|
||||
method == "max" || method == "bilinear",
|
||||
"Pooling method attribute should be either \'max\' or \'bilinear\'. Got: ",
|
||||
method);
|
||||
}
|
||||
} // namespace validate
|
||||
} // namespace pooling
|
||||
|
||||
namespace v0 {
|
||||
template <class TShape>
|
||||
std::vector<TShape> shape_infer(const ROIPooling* op, const std::vector<TShape>& input_shapes) {
|
||||
NODE_VALIDATION_CHECK(op, input_shapes.size() == 2);
|
||||
using namespace ov::util;
|
||||
|
||||
const auto& feat_shape = input_shapes[0];
|
||||
const auto& rois_shape = input_shapes[1];
|
||||
const auto& feat_rank = feat_shape.rank();
|
||||
|
||||
NODE_VALIDATION_CHECK(op,
|
||||
feat_rank.compatible(4),
|
||||
"Expected a 4D tensor for the feature maps input. Got: ",
|
||||
feat_shape);
|
||||
|
||||
pooling::validate::rois_input_shape(op, rois_shape);
|
||||
pooling::validate::output_roi_attr(op);
|
||||
pooling::validate::scale_attr(op);
|
||||
pooling::validate::method_attr(op);
|
||||
|
||||
TShape out_shape;
|
||||
out_shape.reserve(4);
|
||||
|
||||
out_shape.emplace_back(rois_shape.rank().is_static() ? rois_shape[0] : dim::inf_bound);
|
||||
out_shape.emplace_back(feat_rank.is_static() ? feat_shape[1] : dim::inf_bound);
|
||||
std::copy(op->get_output_roi().cbegin(), op->get_output_roi().cend(), std::back_inserter(out_shape));
|
||||
|
||||
return {out_shape};
|
||||
}
|
||||
|
||||
template <class TShape>
|
||||
void shape_infer(const ROIPooling* op, const std::vector<TShape>& input_shapes, std::vector<TShape>& output_shapes) {
|
||||
output_shapes = shape_infer(op, input_shapes);
|
||||
}
|
||||
} // namespace v0
|
||||
} // namespace op
|
||||
} // namespace ov
|
@@ -216,6 +216,39 @@ void Read<std::tuple<unsigned int, unsigned int, unsigned int>>::operator()(
    Read<unsigned int>{}(is, std::get<2>(tuple));
}

void Read<AnyMap>::operator()(std::istream& is, AnyMap& map) const {
    std::string key, value;
    char c;

    is >> c;
    OPENVINO_ASSERT(c == '{', "Failed to parse ov::AnyMap. Starting symbols is not '{', it's ", c);

    while (c != '}') {
        std::getline(is, key, ':');
        size_t enclosed_container_level = 0;

        while (is.good()) {
            is >> c;
            if (c == ',') {  // delimiter between map's pairs
                if (enclosed_container_level == 0)  // we should interrupt after delimiter
                    break;
            }
            if (c == '{' || c == '[')  // case of enclosed maps / arrays
                ++enclosed_container_level;
            if (c == '}' || c == ']') {
                if (enclosed_container_level == 0)
                    break;  // end of map
                --enclosed_container_level;
            }

            value += c;  // accumulate current value
        }
        map.emplace(std::move(key), std::move(value));
    }

    OPENVINO_ASSERT(c == '}', "Failed to parse ov::AnyMap. Ending symbols is not '}', it's ", c);
}

void Read<std::tuple<unsigned int, unsigned int>>::operator()(std::istream& is,
                                                              std::tuple<unsigned int, unsigned int>& tuple) const {
    Read<unsigned int>{}(is, std::get<0>(tuple));
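The unit tests added later in this commit (for example `AnyAsMapOfMapOfAnysFromString`) drive this parser through `ov::Any::as<ov::AnyMap>()`. A hedged round-trip sketch mirroring those tests; the concrete keys and values are illustrative only:

```cpp
#include <openvino/core/any.hpp>

#include <cassert>
#include <string>

int main() {
    // The brace/colon/comma syntax produced by Write<std::map> and consumed by Read<AnyMap>.
    ov::Any any(std::string{"{CPU:{NUM_STREAMS:4},GPU:{PERF_COUNT:YES}}"});

    auto map = any.as<ov::AnyMap>();  // top-level parse
    assert(map.size() == 2);

    // Nested values stay as strings until they are converted further.
    auto cpu = map.at("CPU").as<ov::AnyMap>();
    assert(cpu.at("NUM_STREAMS").as<int>() == 4);

    // Serializing the parsed map reproduces the same text.
    assert(ov::Any(map).as<std::string>() == any.as<std::string>());
    return 0;
}
```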
@@ -179,14 +179,26 @@ ov::Tensor or_tensor(const ov::Tensor& lhs, const ov::Tensor& rhs) {
}

struct TensorVectorCmp {
    // Comparing Tensor vectors as numbers composed with pointers as digits.
    // Indexed loop used to preserve order of comparison.
    bool operator()(const ov::TensorVector& lhs, const ov::TensorVector& rhs) const {
        auto rhs_it = rhs.begin();
        return std::any_of(lhs.begin(), lhs.end(), [&rhs_it](const ov::Tensor& lhs) {
            bool is_less =
                (lhs && *rhs_it) ? lhs.data() < rhs_it->data() : static_cast<bool>(lhs) < static_cast<bool>(*rhs_it);
            ++rhs_it;
            return is_less;
        });
        const auto lhs_size = lhs.size();
        const auto rhs_size = rhs.size();

        if (lhs_size < rhs_size)
            return true;
        if (lhs_size > rhs_size)
            return false;

        for (size_t i = 0; i < lhs_size; ++i) {
            if (lhs[i].data() < rhs[i].data())
                return true;
            if (lhs[i].data() > rhs[i].data())
                return false;
        }

        // if all equals
        return false;
    }
};

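The rewritten comparator is a conventional size-then-lexicographic ordering, which, unlike the replaced `std::any_of` version, is a valid strict weak ordering for `std::set`. An illustrative stand-alone analog over plain handles (not the OpenVINO types) showing how duplicate variants collapse, which is what `interval_bound_evaluator` relies on below:

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical analog of TensorVectorCmp: order vectors of handles by size first,
// then lexicographically by the address each element refers to.
struct HandleVectorCmp {
    using Handles = std::vector<const void*>;
    bool operator()(const Handles& lhs, const Handles& rhs) const {
        if (lhs.size() != rhs.size())
            return lhs.size() < rhs.size();
        for (size_t i = 0; i < lhs.size(); ++i) {
            if (lhs[i] != rhs[i])
                return std::less<const void*>()(lhs[i], rhs[i]);
        }
        return false;  // equal vectors are not "less", so std::set deduplicates them
    }
};

int main() {
    int a = 0, b = 0;
    // When lower and upper bounds alias the same tensors, duplicate variants collapse.
    std::set<HandleVectorCmp::Handles, HandleVectorCmp> variants{{&a, &b}, {&a, &b}, {&b, &a}};
    assert(variants.size() == 2);
    return 0;
}
```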
@@ -281,17 +293,14 @@ bool ov::interval_bound_evaluator(const Node* node,
    auto low_1 = ov::evaluate_lower_bound(node->get_input_source_output(1));
    auto up_0 = ov::evaluate_upper_bound(node->get_input_source_output(0));
    auto up_1 = ov::evaluate_upper_bound(node->get_input_source_output(1));
    if (!low_0 || !low_1 || !up_0 || !up_1)
        return false;

    std::set<TensorVector, TensorVectorCmp> input_variants = {{low_0, low_1},
                                                              {low_0, up_1},
                                                              {up_0, low_1},
                                                              {up_0, up_1}};

    for (const auto& variant_of_input_vector : input_variants)
        for (const auto& input_tensor : variant_of_input_vector)
            if (!input_tensor)
                return false;

    if (input_variants.size() == 1)
        return node->evaluate(upper_output_values, *input_variants.begin()) &&
               node->evaluate(lower_output_values, *input_variants.begin());
@ -2,18 +2,22 @@
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#include "ngraph/op/roi_pooling.hpp"
|
||||
#include "openvino/op/roi_pooling.hpp"
|
||||
|
||||
#include "itt.hpp"
|
||||
#include "openvino/core/validation_util.hpp"
|
||||
#include "roi_pooling_shape_inference.hpp"
|
||||
|
||||
using namespace std;
|
||||
using namespace ngraph;
|
||||
|
||||
op::ROIPooling::ROIPooling(const Output<Node>& input,
|
||||
const Output<Node>& coords,
|
||||
const ov::Shape& output_size,
|
||||
const float spatial_scale,
|
||||
const string& method)
|
||||
namespace ov {
|
||||
namespace op {
|
||||
namespace v0 {
|
||||
ROIPooling::ROIPooling(const Output<Node>& input,
|
||||
const Output<Node>& coords,
|
||||
const ov::Shape& output_size,
|
||||
const float spatial_scale,
|
||||
const string& method)
|
||||
: Op({input, coords}),
|
||||
m_output_size(output_size),
|
||||
m_spatial_scale(spatial_scale),
|
||||
@ -21,10 +25,10 @@ op::ROIPooling::ROIPooling(const Output<Node>& input,
|
||||
constructor_validate_and_infer_types();
|
||||
}
|
||||
|
||||
void op::ROIPooling::validate_and_infer_types() {
|
||||
void ROIPooling::validate_and_infer_types() {
|
||||
OV_OP_SCOPE(v0_ROIPooling_validate_and_infer_types);
|
||||
auto feat_maps_et = get_input_element_type(0);
|
||||
auto coords_et = get_input_element_type(1);
|
||||
const auto& feat_maps_et = get_input_element_type(0);
|
||||
const auto& coords_et = get_input_element_type(1);
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
feat_maps_et.is_real() && coords_et.is_real(),
|
||||
"The data type for input and ROIs is expected to be a floating point type. Got: ",
|
||||
@ -34,72 +38,16 @@ void op::ROIPooling::validate_and_infer_types() {
|
||||
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
feat_maps_et == coords_et,
|
||||
"Type of feature maps (inputs) and rois is expected to be the same. Got: ",
|
||||
"Type of feature maps (inputs) and ROIs is expected to be the same. Got: ",
|
||||
feat_maps_et,
|
||||
" and: ",
|
||||
coords_et);
|
||||
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
m_output_size.size() == 2,
|
||||
"The dimension of pooled size is expected to be equal to 2. Got: ",
|
||||
m_output_size.size());
|
||||
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
m_output_size[0] > 0 && m_output_size[1] > 0,
|
||||
"Pooled size attributes pooled_h and pooled_w should should be "
|
||||
"non-negative integers. Got: ",
|
||||
m_output_size[0],
|
||||
" and: ",
|
||||
m_output_size[1],
|
||||
"respectively");
|
||||
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
m_spatial_scale > 0,
|
||||
"The spatial scale attribute should be a positive floating point number. Got: ",
|
||||
m_spatial_scale);
|
||||
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
m_method == "max" || m_method == "bilinear",
|
||||
"Pooling method attribute should be either \'max\' or \'bilinear\'. Got: ",
|
||||
m_method);
|
||||
const auto output_shapes = shape_infer(this, get_node_input_partial_shapes(*this));
|
||||
set_output_type(0, feat_maps_et, output_shapes[0]);
|
||||
|
||||
const auto& feat_maps_ps = get_input_partial_shape(0);
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
feat_maps_ps.rank().compatible(4),
|
||||
"Expected a 4D tensor for the feature maps input. Got: ",
|
||||
feat_maps_ps);
|
||||
|
||||
const auto& coords_ps = get_input_partial_shape(1);
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
coords_ps.rank().compatible(2),
|
||||
"Expected a 2D tensor for the ROIs input with box coordinates. Got: ",
|
||||
coords_ps);
|
||||
|
||||
if (coords_ps.rank().is_static()) {
|
||||
const auto coords_second_dim = coords_ps[1];
|
||||
NODE_VALIDATION_CHECK(this,
|
||||
coords_second_dim.compatible(5),
|
||||
"The second dimension of ROIs input should contain batch id and box coordinates. ",
|
||||
"This dimension is expected to be equal to 5. Got: ",
|
||||
coords_second_dim);
|
||||
}
|
||||
|
||||
// output shape should be {NUM_ROIS, C, pooled_h, pooled_w}
|
||||
auto output_shape = ov::PartialShape{{Dimension::dynamic(),
|
||||
Dimension::dynamic(),
|
||||
Dimension{static_cast<int64_t>(m_output_size[0])},
|
||||
Dimension{static_cast<int64_t>(m_output_size[1])}}};
|
||||
|
||||
if (coords_ps.rank().is_static()) {
|
||||
output_shape[0] = coords_ps[0];
|
||||
}
|
||||
|
||||
if (feat_maps_ps.rank().is_static()) {
|
||||
output_shape[1] = feat_maps_ps[1];
|
||||
}
|
||||
|
||||
set_output_size(1);
|
||||
set_output_type(0, feat_maps_et, output_shape);
|
||||
|
||||
// if channel dimension, C, not known
|
||||
// feature maps input is used by shape specialization pass
|
||||
@ -114,13 +62,13 @@ void op::ROIPooling::validate_and_infer_types() {
|
||||
}
|
||||
}
|
||||
|
||||
shared_ptr<Node> op::ROIPooling::clone_with_new_inputs(const OutputVector& new_args) const {
|
||||
shared_ptr<Node> ROIPooling::clone_with_new_inputs(const OutputVector& new_args) const {
|
||||
OV_OP_SCOPE(v0_ROIPooling_clone_with_new_inputs);
|
||||
check_new_args_count(this, new_args);
|
||||
return make_shared<ROIPooling>(new_args.at(0), new_args.at(1), m_output_size, m_spatial_scale, m_method);
|
||||
}
|
||||
|
||||
bool op::ROIPooling::visit_attributes(AttributeVisitor& visitor) {
|
||||
bool ROIPooling::visit_attributes(AttributeVisitor& visitor) {
|
||||
OV_OP_SCOPE(v0_ROIPooling_visit_attributes);
|
||||
visitor.on_attribute("output_size", m_output_size);
|
||||
visitor.on_attribute("pooled_h", m_output_size[0]);
|
||||
@ -129,3 +77,21 @@ bool op::ROIPooling::visit_attributes(AttributeVisitor& visitor) {
|
||||
visitor.on_attribute("method", m_method);
|
||||
return true;
|
||||
}
|
||||
|
||||
void ROIPooling::set_output_roi(Shape output_size) {
|
||||
m_output_size = std::move(output_size);
|
||||
}
|
||||
const Shape& ROIPooling::get_output_roi() const {
|
||||
return m_output_size;
|
||||
}
|
||||
|
||||
void ROIPooling::set_spatial_scale(float scale) {
|
||||
m_spatial_scale = scale;
|
||||
}
|
||||
|
||||
void ROIPooling::set_method(std::string method_name) {
|
||||
m_method = std::move(method_name);
|
||||
}
|
||||
} // namespace v0
|
||||
} // namespace op
|
||||
} // namespace ov
|
||||
|
@@ -51,23 +51,6 @@ void ov::pass::PassBase::set_callback(const param_callback& callback) {
    m_pass_config->set_callback(callback);
}

namespace {
class RunLocker {
public:
    RunLocker(bool& flag) : m_flag(flag) {
        OPENVINO_ASSERT(m_flag == false,
                        "Cycle detected. run_on_model() or run_on_function() method should be overridden.");
        m_flag = true;
    }
    ~RunLocker() {
        m_flag = false;
    }

private:
    bool& m_flag;
};
} // namespace

// The symbols are requiered to be in cpp file to workaround RTTI issue on Android LLVM

ov::pass::ModelPass::~ModelPass() = default;
@ -161,6 +161,187 @@ TEST_F(AnyTests, AnyAsMapOfAnys) {
|
||||
ASSERT_EQ(refMap["testParamString"].as<std::string>(), testString);
|
||||
}
|
||||
|
||||
TEST_F(AnyTests, AnyAsMapOfMapOfAnys) {
|
||||
std::map<std::string, Any> refMap1;
|
||||
refMap1["testParamInt"] = 4;
|
||||
refMap1["testParamString"] = "test";
|
||||
|
||||
std::map<std::string, Any> refMap2;
|
||||
refMap2["testParamInt"] = 5;
|
||||
refMap2["testParamString"] = "test2";
|
||||
|
||||
std::map<std::string, Any> refMap;
|
||||
refMap["refMap1"] = refMap1;
|
||||
refMap["refMap2"] = refMap2;
|
||||
|
||||
Any p = refMap;
|
||||
bool isMap = p.is<std::map<std::string, Any>>();
|
||||
ASSERT_TRUE(isMap);
|
||||
auto testMap = p.as<std::map<std::string, Any>>();
|
||||
|
||||
ASSERT_NE(testMap.find("refMap1"), testMap.end());
|
||||
auto testMap1 = testMap.at("refMap1").as<std::map<std::string, Any>>();
|
||||
ASSERT_NE(testMap1.find("testParamInt"), testMap1.end());
|
||||
ASSERT_NE(testMap1.find("testParamString"), testMap1.end());
|
||||
|
||||
int testInt1 = testMap1["testParamInt"].as<int>();
|
||||
std::string testString1 = testMap1["testParamString"].as<std::string>();
|
||||
|
||||
ASSERT_EQ(refMap1["testParamInt"].as<int>(), testInt1);
|
||||
ASSERT_EQ(refMap1["testParamString"].as<std::string>(), testString1);
|
||||
|
||||
ASSERT_NE(testMap.find("refMap2"), testMap.end());
|
||||
auto testMap2 = testMap.at("refMap2").as<std::map<std::string, Any>>();
|
||||
ASSERT_NE(testMap2.find("testParamInt"), testMap2.end());
|
||||
ASSERT_NE(testMap2.find("testParamString"), testMap2.end());
|
||||
|
||||
int testInt2 = testMap2["testParamInt"].as<int>();
|
||||
std::string testString2 = testMap2["testParamString"].as<std::string>();
|
||||
|
||||
ASSERT_EQ(refMap2["testParamInt"].as<int>(), testInt2);
|
||||
ASSERT_EQ(refMap2["testParamString"].as<std::string>(), testString2);
|
||||
}
|
||||
|
||||
TEST_F(AnyTests, AnyAsMapOfMapOfAnysFromString) {
|
||||
const std::string string_props = "{map1:{prop1:1,prop2:2.0},map2:{prop1:value}}";
|
||||
ov::Any any(string_props);
|
||||
|
||||
ov::AnyMap map;
|
||||
ASSERT_TRUE(any.is<std::string>());
|
||||
ASSERT_FALSE(any.is<ov::AnyMap>());
|
||||
ASSERT_NO_THROW(map = any.as<ov::AnyMap>());
|
||||
ASSERT_EQ(string_props, ov::Any(map).as<std::string>());
|
||||
|
||||
// check map1
|
||||
using MapStrDouble = std::map<std::string, double>;
|
||||
MapStrDouble map1;
|
||||
ASSERT_TRUE(map["map1"].is<std::string>());
|
||||
ASSERT_FALSE(map["map1"].is<ov::AnyMap>());
|
||||
ASSERT_FALSE(map["map1"].is<MapStrDouble>());
|
||||
ASSERT_NO_THROW(map1 = map["map1"].as<MapStrDouble>());
|
||||
ASSERT_EQ(2, map1.size());
|
||||
|
||||
// check map1:prop1
|
||||
ASSERT_EQ(1.0, map1["prop1"]);
|
||||
// check map1:prop2
|
||||
ASSERT_EQ(2.0, map1["prop2"]);
|
||||
|
||||
// check map2
|
||||
ov::AnyMap map2;
|
||||
ASSERT_TRUE(map["map2"].is<std::string>());
|
||||
ASSERT_FALSE(map["map2"].is<ov::AnyMap>());
|
||||
ASSERT_NO_THROW(map2 = map["map2"].as<ov::AnyMap>());
|
||||
ASSERT_EQ(1, map2.size());
|
||||
|
||||
// check map1:prop1
|
||||
ASSERT_TRUE(map2["prop1"].is<std::string>());
|
||||
ASSERT_FALSE(map2["prop1"].is<int>());
|
||||
ASSERT_EQ("value", map2["prop1"].as<std::string>());
|
||||
}
|
||||
|
||||
TEST_F(AnyTests, AnyAsMapOfMapOfMapOfAnysFromString) {
|
||||
const std::string string_props = "{map1:{subprop_map:{prop:value}},prop1:1,prop2:2.0}";
|
||||
ov::Any any(string_props);
|
||||
|
||||
ov::AnyMap map;
|
||||
ASSERT_TRUE(any.is<std::string>());
|
||||
ASSERT_FALSE(any.is<ov::AnyMap>());
|
||||
ASSERT_NO_THROW(map = any.as<ov::AnyMap>());
|
||||
ASSERT_EQ(3, map.size());
|
||||
ASSERT_EQ(string_props, ov::Any(map).as<std::string>());
|
||||
|
||||
// check prop1
|
||||
ASSERT_TRUE(map["prop1"].is<std::string>());
|
||||
ASSERT_FALSE(map["prop1"].is<int>());
|
||||
ASSERT_EQ("1", map["prop1"].as<std::string>());
|
||||
ASSERT_EQ(1, map["prop1"].as<int>());
|
||||
|
||||
// check prop2
|
||||
ASSERT_TRUE(map["prop2"].is<std::string>());
|
||||
ASSERT_FALSE(map["prop2"].is<int>());
|
||||
ASSERT_FALSE(map["prop2"].is<double>());
|
||||
ASSERT_EQ("2.0", map["prop2"].as<std::string>());
|
||||
ASSERT_EQ(2, map["prop2"].as<int>());
|
||||
ASSERT_EQ(2.0, map["prop2"].as<double>());
|
||||
|
||||
// check map1
|
||||
ov::AnyMap map1;
|
||||
ASSERT_TRUE(map["map1"].is<std::string>());
|
||||
ASSERT_FALSE(map["map1"].is<ov::AnyMap>());
|
||||
ASSERT_NO_THROW(map1 = map["map1"].as<ov::AnyMap>());
|
||||
|
||||
// check subprop
|
||||
ov::AnyMap subprop_map;
|
||||
ASSERT_TRUE(map1["subprop_map"].is<std::string>());
|
||||
ASSERT_FALSE(map1["subprop_map"].is<ov::AnyMap>());
|
||||
ASSERT_NO_THROW(subprop_map = map1["subprop_map"].as<ov::AnyMap>());
|
||||
|
||||
// check prop
|
||||
ASSERT_TRUE(subprop_map["prop"].is<std::string>());
|
||||
ASSERT_FALSE(subprop_map["prop"].is<ov::AnyMap>());
|
||||
ASSERT_EQ("value", subprop_map["prop"].as<std::string>());
|
||||
}
|
||||
|
||||
TEST_F(AnyTests, AnyDoesNotShareValues) {
|
||||
// simple types
|
||||
{
|
||||
Any a = 1;
|
||||
Any b = a;
|
||||
a = 2;
|
||||
ASSERT_EQ(1, b.as<int>());
|
||||
ASSERT_EQ(2, a.as<int>());
|
||||
b = 3;
|
||||
ASSERT_EQ(2, a.as<int>());
|
||||
ASSERT_EQ(3, b.as<int>());
|
||||
}
|
||||
|
||||
// AnyMap's
|
||||
{
|
||||
AnyMap map{
|
||||
{"1", ov::Any(1)},
|
||||
{"2", ov::Any(2)},
|
||||
};
|
||||
|
||||
Any a = map;
|
||||
|
||||
// check initial state
|
||||
ASSERT_EQ(1, a.as<AnyMap>()["1"].as<int>());
|
||||
ASSERT_EQ(2, a.as<AnyMap>()["2"].as<int>());
|
||||
|
||||
map["1"] = 3; // change map
|
||||
ASSERT_EQ(1, a.as<AnyMap>()["1"].as<int>()); // Any is not changed
|
||||
|
||||
a.as<AnyMap>()["2"] = 4; // change Any
|
||||
ASSERT_EQ(2, map["2"].as<int>()); // map is not changed
|
||||
|
||||
// erase from Any's map
|
||||
AnyMap from_any_map = a.as<AnyMap>();
|
||||
from_any_map.erase(from_any_map.begin());
|
||||
ASSERT_EQ(2, map.size());
|
||||
|
||||
// erase from map
|
||||
map.erase(map.find("2"));
|
||||
ASSERT_NE(from_any_map.end(), from_any_map.find("2"));
|
||||
ASSERT_EQ(4, a.as<AnyMap>()["2"].as<int>());
|
||||
}
|
||||
}
|
||||
|
||||
TEST_F(AnyTests, DISABLED_AnyMapSharesValues) {
|
||||
AnyMap map{
|
||||
{"1", 1},
|
||||
{"2", 2},
|
||||
};
|
||||
|
||||
AnyMap copy_map = map;
|
||||
|
||||
// check initial state
|
||||
ASSERT_EQ(1, copy_map["1"].as<int>());
|
||||
ASSERT_EQ(2, copy_map["2"].as<int>());
|
||||
|
||||
map["1"].as<int>() = 110; // change map
|
||||
EXPECT_EQ(1, copy_map["1"].as<int>()); // TODO: why value is changed here?
|
||||
}
|
||||
|
||||
TEST_F(AnyTests, AnyNotEmpty) {
|
||||
Any p = 4;
|
||||
ASSERT_FALSE(p.empty());
|
||||
@ -401,7 +582,31 @@ TEST_F(AnyTests, PrintToMapOfAnys) {
|
||||
{
|
||||
Any p = refMap;
|
||||
ASSERT_NO_THROW(p.print(stream));
|
||||
ASSERT_EQ(stream.str(), std::string{"testParamInt 4 testParamString test"});
|
||||
ASSERT_EQ(stream.str(), std::string{"{testParamInt:4,testParamString:test}"});
|
||||
}
|
||||
}
|
||||
|
||||
TEST_F(AnyTests, PrintToMapOfMapsOfAnys) {
|
||||
std::map<std::string, Any> refMap1;
|
||||
refMap1["testParamInt"] = 4;
|
||||
refMap1["testParamString"] = "test";
|
||||
|
||||
std::map<std::string, Any> refMap2;
|
||||
refMap2["testParamInt"] = 5;
|
||||
refMap2["testParamString"] = "test2";
|
||||
|
||||
std::map<std::string, Any> refMap;
|
||||
refMap["refMap1"] = refMap1;
|
||||
refMap["refMap2"] = refMap2;
|
||||
|
||||
std::stringstream stream;
|
||||
{
|
||||
Any p = refMap;
|
||||
ASSERT_NO_THROW(p.print(stream));
|
||||
ASSERT_EQ(
|
||||
stream.str(),
|
||||
std::string{
|
||||
"{refMap1:{testParamInt:4,testParamString:test},refMap2:{testParamInt:5,testParamString:test2}}"});
|
||||
}
|
||||
}
|
||||
|
||||
|
@@ -51,3 +51,31 @@ TEST_F(EvaluateBoundTest, no_exception_when_node_has_output_with_dynamic_element

    EXPECT_NO_THROW(evaluate_both_bounds(fn_op));
}

using BoundEvaluatorTest = ::testing::Test;
TEST(BoundEvaluatorTest, no_exception_on_single_bound) {
    constexpr auto et = element::i32;
    const auto s = Shape{1, 1};
    const auto a = std::make_shared<Parameter>(et, PartialShape{s});
    const auto b = Constant::create(et, s, {1});
    const auto sub = std::make_shared<Subtract>(a, b);

    int32_t a_l[1] = {1};
    a->get_output_tensor(0).set_lower_value(Tensor{et, s, a_l});

    int32_t o_[1] = {INT32_MIN};  // initial value of output tensor is not needed, it's set to check whether changed
    TensorVector output{{et, s, o_}};
    // evaluations won't be performed due to missing upper bound tensor of parameter a
    ASSERT_NO_THROW(sub->evaluate_lower(output));
    EXPECT_EQ(o_[0], INT32_MIN);
    ASSERT_NO_THROW(sub->evaluate_upper(output));
    EXPECT_EQ(o_[0], INT32_MIN);

    int32_t a_u[1] = {11};
    a->get_output_tensor(0).set_upper_value(Tensor{et, s, a_u});
    // now both bounds of sub node can be calculated
    ASSERT_NO_THROW(sub->evaluate_lower(output));
    EXPECT_EQ(o_[0], 0);
    ASSERT_NO_THROW(sub->evaluate_upper(output));
    EXPECT_EQ(o_[0], 10);
}
@ -2,109 +2,171 @@
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#include "common_test_utils/test_assertions.hpp"
|
||||
#include "gtest/gtest.h"
|
||||
#include "ngraph/ngraph.hpp"
|
||||
#include "openvino/opsets/opset11.hpp"
|
||||
#include "type_prop.hpp"
|
||||
|
||||
using namespace std;
|
||||
using namespace ngraph;
|
||||
using namespace ov;
|
||||
using namespace ov::opset11;
|
||||
using namespace testing;
|
||||
|
||||
TEST(type_prop, roi_pooling_basic_shape_inference) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{1, 3, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, Shape{4, 5});
|
||||
const auto op = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f);
|
||||
ASSERT_EQ(op->get_method(), "max");
|
||||
ASSERT_EQ(op->get_shape(), (Shape{4, 3, 2, 2}));
|
||||
class TypePropROIPoolingV0 : public TypePropOpTest<op::v0::ROIPooling> {
|
||||
protected:
|
||||
float spatial_scale = 0.625f;
|
||||
Shape pooling_roi_2x2{2, 2};
|
||||
};
|
||||
|
||||
TEST_F(TypePropROIPoolingV0, default_ctor) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, PartialShape{{0, 3}, {1, 3}, {1, 6}, {1, 6}});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{{2, 4}, {1, 5}});
|
||||
|
||||
const auto op = make_op();
|
||||
op->set_arguments(OutputVector{feat_maps, rois});
|
||||
op->set_spatial_scale(spatial_scale);
|
||||
op->set_method("max");
|
||||
op->set_output_roi({3, 4});
|
||||
op->validate_and_infer_types();
|
||||
|
||||
EXPECT_FLOAT_EQ(op->get_spatial_scale(), spatial_scale);
|
||||
EXPECT_EQ(op->get_output_roi(), Shape({3, 4}));
|
||||
EXPECT_EQ(op->get_method(), "max");
|
||||
EXPECT_EQ(op->get_input_size(), 2);
|
||||
EXPECT_EQ(op->get_element_type(), element::f32);
|
||||
EXPECT_EQ(static_cast<Node*>(op.get())->get_output_size(), 1);
|
||||
EXPECT_EQ(op->get_output_partial_shape(0), (PartialShape{{2, 4}, {1, 3}, 3, 4}));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_dynamic_channels_dim) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, PartialShape{1, Dimension(), 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, Shape{4, 5});
|
||||
const auto op = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f, "max");
|
||||
ASSERT_TRUE(op->get_output_partial_shape(0).same_scheme(PartialShape{4, Dimension(), 2, 2}));
|
||||
TEST_F(TypePropROIPoolingV0, basic_shape_inference) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{1, 3, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, Shape{4, 5});
|
||||
const auto op = make_op(feat_maps, rois, pooling_roi_2x2, 0.625f);
|
||||
|
||||
EXPECT_EQ(op->get_element_type(), element::f32);
|
||||
EXPECT_EQ(op->get_method(), "max");
|
||||
EXPECT_EQ(op->get_shape(), (Shape{4, 3, 2, 2}));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_dynamic_num_rois_dim) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{1, 3, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, PartialShape{Dimension(), 5});
|
||||
const auto op = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f);
|
||||
ASSERT_TRUE(op->get_output_partial_shape(0).same_scheme(PartialShape{Dimension(), 3, 2, 2}));
|
||||
TEST_F(TypePropROIPoolingV0, dynamic_channels_dim) {
|
||||
auto feat_shape = PartialShape{1, -1, 6, 6};
|
||||
auto rois_shape = PartialShape{4, 5};
|
||||
set_shape_labels(feat_shape, 10);
|
||||
set_shape_labels(rois_shape, 20);
|
||||
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, feat_shape);
|
||||
const auto rois = make_shared<Parameter>(element::f32, rois_shape);
|
||||
const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale, "max");
|
||||
|
||||
EXPECT_EQ(op->get_element_type(), element::f32);
|
||||
EXPECT_EQ(op->get_output_partial_shape(0), (PartialShape{4, -1, 2, 2}));
|
||||
EXPECT_THAT(get_shape_labels(op->get_output_partial_shape(0)), ElementsAre(20, 11, ov::no_label, ov::no_label));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_dynamic_rank_feat_maps) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, PartialShape::dynamic());
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, Shape{4, 5});
|
||||
const auto op = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f);
|
||||
ASSERT_TRUE(op->get_output_partial_shape(0).same_scheme(PartialShape{4, Dimension(), 2, 2}));
|
||||
TEST_F(TypePropROIPoolingV0, dynamic_num_rois_dim) {
|
||||
auto feat_shape = PartialShape{1, 3, 6, 6};
|
||||
auto rois_shape = PartialShape{-1, 5};
|
||||
set_shape_labels(feat_shape, 10);
|
||||
set_shape_labels(rois_shape, 20);
|
||||
|
||||
const auto feat_maps = make_shared<Parameter>(element::f64, feat_shape);
|
||||
const auto rois = make_shared<Parameter>(element::f64, rois_shape);
|
||||
const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale, "bilinear");
|
||||
|
||||
EXPECT_EQ(op->get_element_type(), element::f64);
|
||||
EXPECT_EQ(op->get_output_partial_shape(0), (PartialShape{-1, 3, 2, 2}));
|
||||
EXPECT_THAT(get_shape_labels(op->get_output_partial_shape(0)), ElementsAre(20, 11, ov::no_label, ov::no_label));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_dynamic_rank_rois) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{1, 3, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, PartialShape::dynamic());
|
||||
const auto op = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f);
|
||||
ASSERT_TRUE(op->get_output_partial_shape(0).same_scheme(PartialShape{Dimension(), 3, 2, 2}));
|
||||
TEST_F(TypePropROIPoolingV0, dynamic_rank_feat_maps) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f16, PartialShape::dynamic());
|
||||
const auto rois = make_shared<Parameter>(element::f16, Shape{4, 5});
|
||||
const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale);
|
||||
|
||||
EXPECT_EQ(op->get_element_type(), element::f16);
|
||||
EXPECT_EQ(op->get_output_partial_shape(0), (PartialShape{4, -1, 2, 2}));
|
||||
EXPECT_THAT(get_shape_labels(op->get_output_partial_shape(0)), Each(ov::no_label));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_incompatible_input_rank) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{1, 3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, Shape{3, 5});
|
||||
// feat_maps must be of rank 4
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f, "max"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, dynamic_rank_feat_rois) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{1, 3, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape::dynamic());
|
||||
const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale);
|
||||
|
||||
EXPECT_EQ(op->get_element_type(), element::f32);
|
||||
EXPECT_EQ(op->get_output_partial_shape(0), (PartialShape{-1, 3, 2, 2}));
|
||||
EXPECT_THAT(get_shape_labels(op->get_output_partial_shape(0)), Each(ov::no_label));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_incompatible_pooling_shape) {
|
||||
Shape pool_shape{2, 2, 2};
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, Shape{3, 5});
|
||||
// pool_shape must be of rank 2 {pooled_h, pooled_w}
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, pool_shape, 0.625f, "max"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, incompatible_input_rank) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{1, 3, 6, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{3, 5});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale, "max"),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("Expected a 4D tensor for the feature maps input"));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_incompatible_rois_second_dim) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, Shape{3, 4});
|
||||
// the second dim of rois must be 5. [batch_id, x_1, y_1, x_2, y_2]
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f, "max"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, incompatible_pooling_shape) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{3, 5});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, Shape{2, 2, 2}, spatial_scale, "max"),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("The dimension of pooled size is expected to be equal to 2"));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_incompatible_feature_maps_element_type) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::i32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f32, Shape{3, 5});
|
||||
// feat_maps element type must be floating point type
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f, "max"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, incompatible_rois_second_dim) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{3, 4});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale, "max"),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("The second dimension of ROIs input should contain batch id and box coordinates. This "
|
||||
"dimension is expected to be equal to 5"));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_incompatible_rois_element_type) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f16, Shape{3, 5});
|
||||
// rois element type must be equal to feat_maps element type (floating point type)
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f, "bilinear"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, incompatible_feature_maps_element_type) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::i32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{3, 5});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale, "max"),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("The data type for input and ROIs is expected to be a floating point type"));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_invalid_pooling_method) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f16, Shape{3, 5});
|
||||
// ROIPooling method is invalid: not max nor bilinear
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, 0.625f, "invalid"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, incompatible_rois_element_type) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::i16, PartialShape{3, 5});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale, "bilinear"),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("The data type for input and ROIs is expected to be a floating point type"));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_invalid_spatial_scale) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f16, Shape{3, 5});
|
||||
// ROIPooling spatial scale attribute must be a positive floating point number
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{2, 2}, -0.625f, "max"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, invalid_pooling_method) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{3, 5});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, pooling_roi_2x2, spatial_scale, "invalid"),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("Pooling method attribute should be either \'max\' or \'bilinear\'"));
|
||||
}
|
||||
|
||||
TEST(type_prop, roi_pooling_invalid_pooled_size) {
|
||||
const auto feat_maps = make_shared<op::Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<op::Parameter>(element::f16, Shape{3, 5});
|
||||
// ROIPooling pooled_h and pooled_w must be non-negative integers
|
||||
ASSERT_THROW(const auto unused = make_shared<op::v0::ROIPooling>(feat_maps, rois, Shape{1, 0}, 0.625f, "max"),
|
||||
ngraph::NodeValidationFailure);
|
||||
TEST_F(TypePropROIPoolingV0, invalid_spatial_scale) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{3, 5});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, pooling_roi_2x2, -1.0f),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("The spatial scale attribute should be a positive floating point number"));
|
||||
}
|
||||
|
||||
TEST_F(TypePropROIPoolingV0, invalid_pooled_size) {
|
||||
const auto feat_maps = make_shared<Parameter>(element::f32, Shape{3, 2, 6, 6});
|
||||
const auto rois = make_shared<Parameter>(element::f32, PartialShape{3, 5});
|
||||
|
||||
OV_EXPECT_THROW(const auto op = make_op(feat_maps, rois, Shape{1, 0}, spatial_scale),
|
||||
NodeValidationFailure,
|
||||
HasSubstr("Pooled size attributes pooled_h and pooled_w should should be positive integers"));
|
||||
}
|
||||
|
@ -81,8 +81,7 @@ TEST(attributes, interpolate_op4) {
|
||||
TEST(attributes, interpolate_op11) {
|
||||
NodeBuilder::get_ops().register_factory<opset11::Interpolate>();
|
||||
const auto img = make_shared<op::Parameter>(element::f32, Shape{1, 3, 32, 32});
|
||||
const auto scales = op::v0::Constant::create(element::f32, {2}, {2.0, 2.0});
|
||||
const auto axes = op::v0::Constant::create(element::i32, {2}, {2, 3});
|
||||
const auto scales = op::v0::Constant::create(element::f32, {4}, {1.0, 1.0, 2.0, 2.0});
|
||||
|
||||
op::v11::Interpolate::InterpolateAttrs attrs;
|
||||
attrs.mode = op::v11::Interpolate::InterpolateMode::BILINEAR_PILLOW;
|
||||
@ -94,7 +93,7 @@ TEST(attributes, interpolate_op11) {
|
||||
attrs.antialias = true;
|
||||
attrs.cube_coeff = -0.75;
|
||||
|
||||
auto interpolate = make_shared<opset11::Interpolate>(img, scales, axes, attrs);
|
||||
auto interpolate = make_shared<opset11::Interpolate>(img, scales, attrs);
|
||||
NodeBuilder builder(interpolate, {img, scales});
|
||||
auto g_interpolate = ov::as_type_ptr<opset11::Interpolate>(builder.create());
|
||||
|
||||
|
@ -25,7 +25,7 @@ TEST(attributes, roi_pooling_op) {
|
||||
NodeBuilder builder(op, {data, coords});
|
||||
const auto g_op = ov::as_type_ptr<opset3::ROIPooling>(builder.create());
|
||||
|
||||
EXPECT_EQ(g_op->get_output_size(), op->get_output_size());
|
||||
EXPECT_EQ(g_op->get_output_roi(), op->get_output_roi());
|
||||
EXPECT_EQ(g_op->get_spatial_scale(), op->get_spatial_scale());
|
||||
EXPECT_EQ(g_op->get_method(), op->get_method());
|
||||
}
|
||||
|
@@ -60,7 +60,7 @@ protected:
    bool supported_impl(const std::vector<ov::Any>& variants) const override;
    ov::frontend::InputModel::Ptr load_impl(const std::vector<ov::Any>& variants) const override;

    std::map<std::string, PytorchCreatorFunction> m_op_translators;
    std::map<std::string, CreatorFunction> m_op_translators;
};

} // namespace pytorch
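The `PytorchCreatorFunction` to `CreatorFunction` rename pairs with the `const NodeContext&` signature adopted by the translators later in this commit (`translate_add`, `translate_addmm`, and so on). A hedged sketch of a translator written against the new signature; the op itself and the include paths are illustrative only, while `num_inputs_check`, `get_input`, and `mark_node` are the helpers used by the translators in this commit:

```cpp
#include "openvino/op/relu.hpp"
#include "utils.hpp"  // assumed frontend-internal header providing num_inputs_check

namespace ov {
namespace frontend {
namespace pytorch {
namespace op {

// Hypothetical translator following the new const-reference convention.
OutputVector translate_my_relu(const NodeContext& context) {
    num_inputs_check(context, 1, 1);
    auto x = context.get_input(0);
    return {context.mark_node(std::make_shared<ov::op::v0::Relu>(x))};
}

}  // namespace op
}  // namespace pytorch
}  // namespace frontend
}  // namespace ov
```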
@ -19,20 +19,22 @@ typedef std::unordered_map<size_t, Output<Node>> TensorMap;
|
||||
class NodeContext : public frontend::NodeContext {
|
||||
public:
|
||||
NodeContext(std::shared_ptr<TorchDecoder> decoder,
|
||||
TensorMap* tensor_map,
|
||||
ParameterVector* external_parameters,
|
||||
const TensorMap& ext_tensor_map,
|
||||
std::shared_ptr<TensorMap> tensor_map,
|
||||
std::shared_ptr<ParameterVector> external_parameters,
|
||||
std::shared_ptr<std::set<size_t>> mutated_tensors,
|
||||
TranslateSession* translate_session)
|
||||
: frontend::NodeContext(decoder->get_op_type()),
|
||||
m_decoder(decoder),
|
||||
m_tensor_map(tensor_map),
|
||||
m_ext_tensor_map(ext_tensor_map),
|
||||
m_tensor_map(tensor_map),
|
||||
m_external_parameters(external_parameters),
|
||||
m_mutated_tensors(mutated_tensors),
|
||||
m_translate_session(translate_session),
|
||||
m_decoder_inputs(decoder->inputs()),
|
||||
m_decoder_outputs(decoder->outputs()) {
|
||||
FRONT_END_GENERAL_CHECK(tensor_map != nullptr && external_parameters != nullptr &&
|
||||
translate_session != nullptr);
|
||||
FRONT_END_GENERAL_CHECK(m_tensor_map != nullptr && m_external_parameters != nullptr &&
|
||||
m_mutated_tensors != nullptr && m_translate_session != nullptr);
|
||||
}
|
||||
|
||||
// Do not search for input in tensor map; try to access it as a constant of specified type T and return its value
|
||||
@ -106,11 +108,7 @@ public:
|
||||
"There is no any named attributes in PyTorch node, query by attribute name is not implemented");
|
||||
}
|
||||
|
||||
void mutate_input(size_t index, Output<Node> ov_output);
|
||||
|
||||
std::set<size_t> get_mutated_tensors() const {
|
||||
return m_mutated_tensors;
|
||||
}
|
||||
void mutate_input(size_t index, Output<Node> ov_output) const;
|
||||
|
||||
std::shared_ptr<TorchDecoder> get_decoder() const {
|
||||
return m_decoder;
|
||||
@ -120,7 +118,7 @@ public:
|
||||
return m_translate_session;
|
||||
}
|
||||
|
||||
void add_tensor_to_context(size_t index, Output<Node> ov_output);
|
||||
void add_tensor_to_context(size_t index, Output<Node> ov_output) const;
|
||||
|
||||
Output<Node> get_tensor_from_model(size_t index) const {
|
||||
if (m_tensor_map->find(index) != m_tensor_map->end()) {
|
||||
@ -130,22 +128,22 @@ public:
|
||||
}
|
||||
}
|
||||
|
||||
Output<Node> get_tensor_from_model_or_create_input(size_t index);
|
||||
Output<Node> get_tensor_from_model_or_create_input(size_t index) const;
|
||||
Output<Node> get_input_from_visible_context(size_t index) const;
|
||||
std::shared_ptr<ov::Model> convert_subgraph(size_t index);
|
||||
std::shared_ptr<ov::Model> convert_subgraph(size_t index) const;
|
||||
|
||||
private:
|
||||
std::shared_ptr<TorchDecoder> m_decoder;
|
||||
std::set<size_t> m_mutated_tensors;
|
||||
TensorMap* m_tensor_map;
|
||||
const TensorMap& m_ext_tensor_map;
|
||||
ParameterVector* m_external_parameters;
|
||||
TranslateSession* m_translate_session;
|
||||
std::shared_ptr<TensorMap> m_tensor_map;
|
||||
std::shared_ptr<ParameterVector> m_external_parameters;
|
||||
std::shared_ptr<std::set<size_t>> m_mutated_tensors;
|
||||
TranslateSession* m_translate_session = nullptr;
|
||||
const std::vector<size_t> m_decoder_inputs;
|
||||
const std::vector<size_t> m_decoder_outputs;
|
||||
};
|
||||
|
||||
using PytorchCreatorFunction = std::function<OutputVector(NodeContext&)>;
|
||||
using CreatorFunction = std::function<ov::OutputVector(const ov::frontend::pytorch::NodeContext&)>;
|
||||
|
||||
} // namespace pytorch
|
||||
} // namespace frontend
|
||||
|
@@ -42,16 +42,16 @@ std::shared_ptr<Node> NodeContext::mark_node(std::shared_ptr<Node> ov_node) cons
    return m_decoder->mark_node(ov_node);
}

-void NodeContext::mutate_input(size_t index, Output<Node> ov_output) {
+void NodeContext::mutate_input(size_t index, Output<Node> ov_output) const {
    FRONT_END_GENERAL_CHECK(!m_decoder->input_is_none(index), "Input is none with index: ", index);
    auto input_id = m_decoder_inputs.at(index);
    FRONT_END_GENERAL_CHECK(m_tensor_map->count(input_id), "No tensor corresponding input: ", input_id, " exist.");
    m_translate_session->encode_tensor_name(ov_output, input_id, m_decoder->get_input_debug_name(index));
    (*m_tensor_map)[input_id] = ov_output;
-   m_mutated_tensors.insert(input_id);
+   m_mutated_tensors->insert(input_id);
}

-void NodeContext::add_tensor_to_context(size_t index, Output<Node> ov_output) {
+void NodeContext::add_tensor_to_context(size_t index, Output<Node> ov_output) const {
    if (m_tensor_map->count(index)) {
        OPENVINO_DEBUG << "[ WARNING ] Current context has tensor. Rewriting.\n";
    }

@@ -59,7 +59,7 @@ void NodeContext::add_tensor_to_context(size_t index, Output<Node> ov_output) {
    (*m_tensor_map)[index] = ov_output;
}

-Output<Node> NodeContext::get_tensor_from_model_or_create_input(size_t index) {
+Output<Node> NodeContext::get_tensor_from_model_or_create_input(size_t index) const {
    if (m_tensor_map->find(index) != m_tensor_map->end()) {
        return m_tensor_map->at(index);
    } else {

@@ -87,7 +87,7 @@ Output<Node> NodeContext::get_input_from_visible_context(size_t index) const {
    return input_tensor;
}

-std::shared_ptr<ov::Model> NodeContext::convert_subgraph(size_t index) {
+std::shared_ptr<ov::Model> NodeContext::convert_subgraph(size_t index) const {
    auto subgraph_decoder = m_decoder->get_subgraph_decoder(index);

    // Extend external context with internal tensors except Parameter nodes, because internal Parameters are created to
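These definitions can become const because, per the header change above, the mutable state (tensor map, mutated-tensor set, external parameters) is now held through std::shared_ptr members; a const member function only makes the pointer itself const, not the object it owns. A small standalone illustration of that C++ pattern (not frontend code):

#include <cstddef>
#include <memory>
#include <set>

struct Ctx {
    // Shared-pointer member: const applies shallowly to the pointer.
    std::shared_ptr<std::set<std::size_t>> m_mutated_tensors = std::make_shared<std::set<std::size_t>>();

    void mutate(std::size_t id) const {
        // Compiles: constness covers the shared_ptr, not the set it points to.
        m_mutated_tensors->insert(id);
    }
};

int main() {
    const Ctx ctx;
    ctx.mutate(42);  // allowed; the set now contains 42
    return ctx.m_mutated_tensors->count(42) == 1 ? 0 : 1;
}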
@@ -19,7 +19,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_adaptive_avg_pool3d(NodeContext& context) {
+OutputVector translate_adaptive_avg_pool3d(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto const_tile_params = context.mark_node(v0::Constant::create(element::i32, Shape{5}, {1, 1, 1, 1, 1}));
    auto const_0 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {0}));

@@ -11,7 +11,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_adaptive_max_pool2d(NodeContext& context) {
+OutputVector translate_adaptive_max_pool2d(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto x = context.get_input(0);
    auto y = context.get_input(1);

@@ -15,7 +15,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_add(NodeContext& context) {
+OutputVector translate_add(const NodeContext& context) {
    num_inputs_check(context, 2, 3);
    auto lhs = context.get_input(0);
    auto rhs = context.get_input(1);

@@ -17,7 +17,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_addcmul(NodeContext& context) {
+OutputVector translate_addcmul(const NodeContext& context) {
    num_inputs_check(context, 4, 4);
    const auto eltwise_mult = std::make_shared<v1::Multiply>(context.get_input(1), context.get_input(2));
    const auto value = context.get_input(3);

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_addmm(NodeContext& context) {
+OutputVector translate_addmm(const NodeContext& context) {
    num_inputs_check(context, 5, 5);
    auto input = context.get_input(0);
    auto m1 = context.get_input(1);

@@ -17,7 +17,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_arange(NodeContext& context) {
+OutputVector translate_arange(const NodeContext& context) {
    auto zero = context.mark_node(v0::Constant::create(element::i32, Shape{}, {0}));
    auto one = context.mark_node(v0::Constant::create(element::i32, Shape{}, {1}));
    int dtype_port = -1;

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_as_tensor(NodeContext& context) {
+OutputVector translate_as_tensor(const NodeContext& context) {
    // aten::tensor(t[] data, *, ScalarType? dtype=None, Device? device=None, bool requires_grad=False) -> Tensor
    num_inputs_check(context, 1, 4);
    auto dtype = element::f32;

@@ -18,7 +18,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_avg_poolnd(NodeContext& context) {
+OutputVector translate_avg_poolnd(const NodeContext& context) {
    num_inputs_check(context, 6, 7);
    auto input = context.get_input(0);
    auto kernel = context.const_input<Shape>(1);

@@ -32,7 +32,7 @@ Output<Node> broadcast_const_to_channel_dim(const NodeContext& context,
}
} // namespace

-OutputVector translate_batch_norm(NodeContext& context) {
+OutputVector translate_batch_norm(const NodeContext& context) {
    // Schema: aten::batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var,
    // bool training, float momentum, float eps, bool cudnn_enabled) -> Tensor
    num_inputs_check(context, 8, 9);
src/frontends/pytorch/src/op/bitwise_not.cpp (new file, 29 lines)
@@ -0,0 +1,29 @@
+// Copyright (C) 2018-2023 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include "openvino/frontend/pytorch/node_context.hpp"
+#include "openvino/op/logical_not.hpp"
+#include "utils.hpp"
+
+namespace ov {
+namespace frontend {
+namespace pytorch {
+namespace op {
+
+OutputVector translate_bitwise_not(const NodeContext& context) {
+    num_inputs_check(context, 1, 2);
+    auto x = context.get_input(0);
+    FRONT_END_OP_CONVERSION_CHECK(x.get_element_type().compatible(element::boolean),
+                                  "aten::bitwise_not suppored only for boolean input");
+    auto not_x = context.mark_node(std::make_shared<ov::op::v1::LogicalNot>(x));
+    if (!context.input_is_none(1)) {
+        context.mutate_input(1, not_x);
+    }
+    return {not_x};
+};
+
+} // namespace op
+} // namespace pytorch
+} // namespace frontend
+} // namespace ov
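The op-table registration for this new translator is not part of this excerpt. Assuming the frontend's usual op_table.cpp conventions (an OP_CONVERTER forward declaration plus an entry in the aten-name-to-converter map), the wiring would look roughly like the sketch below; both lines are assumptions about code not shown here:

// Sketch only; the actual op_table.cpp change is not shown in this diff.
OP_CONVERTER(translate_bitwise_not);
// ...and, inside the supported-ops map:
{"aten::bitwise_not", op::translate_bitwise_not},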
@@ -11,7 +11,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_bool(NodeContext& context) {
+OutputVector translate_bool(const NodeContext& context) {
    num_inputs_check(context, 1, 1);
    return {context.mark_node(std::make_shared<ov::op::v0::Convert>(context.get_input(0), element::boolean))};
};

@@ -12,7 +12,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_cat(NodeContext& context) {
+OutputVector translate_cat(const NodeContext& context) {
    // This translator is only needed to get axis as constant from external scope
    num_inputs_check(context, 2, 2);
    const auto&& list_elems = get_list_as_outputs(context.get_input(0));

@@ -15,7 +15,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_clamp(NodeContext& context) {
+OutputVector translate_clamp(const NodeContext& context) {
    num_inputs_check(context, 1, 3);
    auto x = context.get_input(0);
    if (!context.input_is_none(1)) {

@@ -9,7 +9,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_constant(NodeContext& context) {
+OutputVector translate_constant(const NodeContext& context) {
    return context.as_constant();
};

@@ -15,7 +15,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_conv_transposend(NodeContext& context) {
+OutputVector translate_conv_transposend(const NodeContext& context) {
    num_inputs_check(context, 8, 8);
    auto strides = context.const_input<Strides>(3);
    // PyTorch support only symmetric padding, padding sizes are the same for begins and ends for each dimension

@@ -15,7 +15,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_convnd(NodeContext& context) {
+OutputVector translate_convnd(const NodeContext& context) {
    num_inputs_check(context, 7, 7);
    auto strides = context.const_input<Strides>(3);
    // In torch pads at beginning are same as at end

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_convolution(NodeContext& context) {
+OutputVector translate_convolution(const NodeContext& context) {
    // Schema: aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[]
    // dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool
    // cudnn_enabled, bool allow_tf32) -> Tensor

@@ -15,7 +15,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_convolution_mode(NodeContext& context) {
+OutputVector translate_convolution_mode(const NodeContext& context) {
    // Schema: aten::_convolution_mode(Tensor input, Tensor weight, Tensor? bias, int[] stride, str padding, int[]
    // dilation, int groups) -> Tensor
    num_inputs_check(context, 7, 7);
@@ -3,11 +3,7 @@
//

#include "openvino/frontend/pytorch/node_context.hpp"
-#include "openvino/op/constant.hpp"
-#include "openvino/op/convert.hpp"
-#include "openvino/op/convert_like.hpp"
#include "openvino/op/cum_sum.hpp"
-#include "pt_framework_node.hpp"
#include "utils.hpp"

namespace ov {

@@ -17,21 +13,13 @@ namespace op {

using namespace ov::op;

-OutputVector translate_cumsum(NodeContext& context) {
+OutputVector translate_cumsum(const NodeContext& context) {
    // aten::cumsum(Tensor self, int dim, *, ScalarType? dtype=None, Tensor out=None)
    num_inputs_check(context, 2, 4);
    auto x = context.get_input(0);
    auto dim = context.get_input(1);
    if (!context.input_is_none(2)) {
-        if (std::dynamic_pointer_cast<v0::Constant>(context.get_input_from_visible_context(2).get_node_shared_ptr())) {
-            auto dtype = convert_dtype(context.const_input<int64_t>(2));
-            x = context.mark_node(std::make_shared<v0::Convert>(x, dtype));
-        } else if (const auto& fw_node = cast_fw_node(context.get_input(2).get_node_shared_ptr(), "prim::dtype")) {
-            auto out_tensor = fw_node->input_value(0);
-            x = context.mark_node(std::make_shared<v1::ConvertLike>(x, out_tensor));
-        } else {
-            FRONT_END_OP_CONVERSION_CHECK(false, "Couldn't get dtype input");
-        }
+        x = apply_dtype(context, 2, x);
    }
    auto result = context.mark_node(std::make_shared<v0::CumSum>(x, dim));
    if (!context.input_is_none(3)) {
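The apply_dtype helper that replaces the inline logic above belongs to the frontend's shared utilities and is not shown in this diff. Judging from its call sites and from the code it replaces, it resolves the target type either from a constant integer or from a prim::dtype framework node; the sketch below is reconstructed from the removed lines, and the real implementation may differ in details:

// Reconstructed sketch of the dtype-resolution behaviour; not the actual utils code.
Output<Node> apply_dtype_sketch(const NodeContext& context, size_t dtype_port, const Output<Node>& input) {
    if (std::dynamic_pointer_cast<v0::Constant>(
            context.get_input_from_visible_context(dtype_port).get_node_shared_ptr())) {
        // dtype supplied as a constant integer: map it to an OV element type and Convert.
        auto dtype = convert_dtype(context.const_input<int64_t>(dtype_port));
        return context.mark_node(std::make_shared<v0::Convert>(input, dtype));
    } else if (const auto& fw_node = cast_fw_node(context.get_input(dtype_port).get_node_shared_ptr(), "prim::dtype")) {
        // dtype taken from another tensor: convert like that tensor's element type.
        auto out_tensor = fw_node->input_value(0);
        return context.mark_node(std::make_shared<v1::ConvertLike>(input, out_tensor));
    }
    FRONT_END_OP_CONVERSION_CHECK(false, "Couldn't get dtype input");
    return input;
}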
@@ -12,7 +12,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_dim(NodeContext& context) {
+OutputVector translate_dim(const NodeContext& context) {
    num_inputs_check(context, 1, 1);
    Output<Node> rank;
    std::tie(std::ignore, rank) = get_shape_rank(context, context.get_input(0), true);

@@ -17,7 +17,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_div(NodeContext& context) {
+OutputVector translate_div(const NodeContext& context) {
    num_inputs_check(context, 2, 3);
    auto x = context.get_input(0);
    auto y = context.get_input(1);

@@ -12,7 +12,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_elu(NodeContext& context) {
+OutputVector translate_elu(const NodeContext& context) {
    // aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> Tensor
    num_inputs_check(context, 2, 4);
    auto x = context.get_input(0);

@@ -13,7 +13,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_embedding(NodeContext& context) {
+OutputVector translate_embedding(const NodeContext& context) {
    // aten::embedding(Tensor weight, Tensor indices, SymInt padding_idx=-1, bool scale_grad_by_freq=False, bool
    // sparse=False)
    num_inputs_check(context, 5, 5);

@@ -30,7 +30,7 @@ OutputVector base_expand(const NodeContext& context, const Output<Node>& x, cons
};
} // namespace

-OutputVector translate_expand(NodeContext& context) {
+OutputVector translate_expand(const NodeContext& context) {
    // aten::expand(Tensor(a) self, SymInt[] size, *, bool implicit=False) -> Tensor(a)
    num_inputs_check(context, 2, 3);
    auto x = context.get_input(0);

@@ -41,7 +41,7 @@ OutputVector translate_expand(NodeContext& context) {
    return base_expand(context, x, sizes);
};

-OutputVector translate_expand_as(NodeContext& context) {
+OutputVector translate_expand_as(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto x = context.get_input(0);
    auto y = context.get_input(1);

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_eye(NodeContext& context) {
+OutputVector translate_eye(const NodeContext& context) {
    size_t num_inputs = context.get_input_size();
    auto x = context.get_input(0);
    // num rows and cols should be integer, but at the moment conversion their data type can be unknown yet

@@ -18,7 +18,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_flatten(NodeContext& context) {
+OutputVector translate_flatten(const NodeContext& context) {
    num_inputs_check(context, 1, 3);
    auto x = context.get_input(0);
    int64_t start_dim = 0;

@@ -14,7 +14,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_floor_divide(NodeContext& context) {
+OutputVector translate_floor_divide(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto x = context.get_input(0);
    auto y = context.get_input(1);

@@ -11,7 +11,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_floordiv(NodeContext& context) {
+OutputVector translate_floordiv(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto x = context.get_input(0);
    auto y = context.get_input(1);
@@ -5,7 +5,6 @@
#include "openvino/frontend/pytorch/node_context.hpp"
#include "openvino/op/broadcast.hpp"
#include "openvino/op/constant.hpp"
-#include "openvino/op/convert.hpp"
#include "openvino/op/convert_like.hpp"
#include "openvino/op/shape_of.hpp"
#include "utils.hpp"

@@ -22,18 +21,6 @@ Output<Node> base_translate_full(const NodeContext& context, const Output<Node>&
    return context.mark_node(std::make_shared<v3::Broadcast>(value, sizes));
}

-Output<Node> base_translate_full_with_convert(const NodeContext& context,
-                                              const Output<Node>& sizes,
-                                              const Output<Node>& value,
-                                              size_t dtype_id) {
-    auto filled_tensor = base_translate_full(context, sizes, value);
-    if (!context.input_is_none(dtype_id)) {
-        auto dtype = convert_dtype(context.const_input<int64_t>(dtype_id));
-        filled_tensor = context.mark_node(std::make_shared<v0::Convert>(filled_tensor, dtype));
-    }
-    return filled_tensor;
-}

Output<Node> base_translate_full_with_convertlike(const NodeContext& context,
                                                  const Output<Node>& sizes,
                                                  const Output<Node>& value,

@@ -41,9 +28,21 @@ Output<Node> base_translate_full_with_convertlike(const NodeContext& context,
    auto filled_tensor = base_translate_full(context, sizes, value);
    return context.mark_node(std::make_shared<v1::ConvertLike>(filled_tensor, out));
}

+Output<Node> base_translate_full_with_convert(const NodeContext& context,
+                                              const Output<Node>& sizes,
+                                              Output<Node> value,
+                                              size_t dtype_id) {
+    if (!context.input_is_none(dtype_id)) {
+        value = apply_dtype(context, dtype_id, value);
+    }
+
+    auto filled_tensor = base_translate_full(context, sizes, value);
+    return filled_tensor;
+}
} // namespace

-OutputVector translate_full(NodeContext& context) {
+OutputVector translate_full(const NodeContext& context) {
    num_inputs_check(context, 2, 6);
    auto sizes = context.get_input(0);
    auto value = context.get_input(1);

@@ -60,7 +59,7 @@ OutputVector translate_full(NodeContext& context) {
    return {base_translate_full_with_convert(context, sizes, value, dtype_id)};
};

-OutputVector translate_full_like(NodeContext& context) {
+OutputVector translate_full_like(const NodeContext& context) {
    num_inputs_check(context, 2, 7);
    auto input = context.get_input(0);
    auto value = context.get_input(1);

@@ -72,7 +71,7 @@ OutputVector translate_full_like(NodeContext& context) {
    return {base_translate_full_with_convertlike(context, sizes, value, out)};
};

-OutputVector translate_fill_(NodeContext& context) {
+OutputVector translate_fill_(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto input = context.get_input(0);
    auto value = context.get_input(1);

@@ -80,7 +79,7 @@ OutputVector translate_fill_(NodeContext& context) {
    return {base_translate_full_with_convertlike(context, sizes, value, input)};
};

-OutputVector translate_new_full(NodeContext& context) {
+OutputVector translate_new_full(const NodeContext& context) {
    num_inputs_check(context, 3, 7);
    auto input = context.get_input(0);
    auto sizes = context.get_input(1);

@@ -91,7 +90,7 @@ OutputVector translate_new_full(NodeContext& context) {
    return {base_translate_full_with_convertlike(context, sizes, value, input)};
};

-OutputVector translate_zeros(NodeContext& context) {
+OutputVector translate_zeros(const NodeContext& context) {
    num_inputs_check(context, 2, 5);
    auto sizes = context.get_input(0);
    auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {0}));

@@ -108,7 +107,7 @@ OutputVector translate_zeros(NodeContext& context) {
    return {base_translate_full_with_convert(context, sizes, value, dtype_id)};
};

-OutputVector translate_zeros_like(NodeContext& context) {
+OutputVector translate_zeros_like(const NodeContext& context) {
    num_inputs_check(context, 1, 6);
    auto input = context.get_input(0);
    auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {0}));

@@ -120,7 +119,7 @@ OutputVector translate_zeros_like(NodeContext& context) {
    return {base_translate_full_with_convertlike(context, sizes, value, out)};
};

-OutputVector translate_new_zeros(NodeContext& context) {
+OutputVector translate_new_zeros(const NodeContext& context) {
    num_inputs_check(context, 2, 6);
    auto input = context.get_input(0);
    auto sizes = context.get_input(1);

@@ -131,7 +130,7 @@ OutputVector translate_new_zeros(NodeContext& context) {
    return {base_translate_full_with_convertlike(context, sizes, value, input)};
};

-OutputVector translate_ones(NodeContext& context) {
+OutputVector translate_ones(const NodeContext& context) {
    num_inputs_check(context, 1, 5);
    auto sizes = context.get_input(0);
    auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {1}));

@@ -148,7 +147,7 @@ OutputVector translate_ones(NodeContext& context) {
    return {base_translate_full_with_convert(context, sizes, value, dtype_id)};
};

-OutputVector translate_ones_like(NodeContext& context) {
+OutputVector translate_ones_like(const NodeContext& context) {
    num_inputs_check(context, 1, 6);
    auto input = context.get_input(0);
    auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {1}));

@@ -160,7 +159,7 @@ OutputVector translate_ones_like(NodeContext& context) {
    return {base_translate_full_with_convertlike(context, sizes, value, out)};
};

-OutputVector translate_new_ones(NodeContext& context) {
+OutputVector translate_new_ones(const NodeContext& context) {
    num_inputs_check(context, 2, 6);
    auto input = context.get_input(0);
    auto sizes = context.get_input(1);

@@ -171,8 +170,11 @@ OutputVector translate_new_ones(NodeContext& context) {
    return {base_translate_full_with_convertlike(context, sizes, value, input)};
};

-OutputVector translate_empty(NodeContext& context) {
-    num_inputs_check(context, 1, 5);
+OutputVector translate_empty(const NodeContext& context) {
+    // aten::empty(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool?
+    // pin_memory=None, MemoryFormat? memory_format=None) -> Tensor layout, device and work with memory ignored on our
+    // side, so just skip these parameters
+    num_inputs_check(context, 1, 6);
    auto sizes = context.get_input(0);
    // In OV uninitialised data is not supported, so we create a tensor filled with zeros with a given shape and type.
    auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {0}));
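One behavioural detail of the relocated base_translate_full_with_convert above: the optional dtype is now applied to the scalar fill value before broadcasting, instead of converting the already-broadcast tensor afterwards. The contrast, condensed from the removed and added helpers in this hunk (the wrapper names here are illustrative, not real frontend code):

// Old flow: broadcast first, then Convert the whole filled tensor.
Output<Node> fill_then_convert_old(const NodeContext& context, const Output<Node>& sizes, Output<Node> value, size_t dtype_id) {
    auto filled = base_translate_full(context, sizes, value);
    if (!context.input_is_none(dtype_id)) {
        auto dtype = convert_dtype(context.const_input<int64_t>(dtype_id));
        filled = context.mark_node(std::make_shared<v0::Convert>(filled, dtype));  // converts every element of the broadcast tensor
    }
    return filled;
}

// New flow: resolve the dtype on the scalar value, then broadcast once.
Output<Node> convert_then_fill_new(const NodeContext& context, const Output<Node>& sizes, Output<Node> value, size_t dtype_id) {
    if (!context.input_is_none(dtype_id)) {
        value = apply_dtype(context, dtype_id, value);  // converts only the scalar fill value
    }
    return base_translate_full(context, sizes, value);
}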
@@ -12,7 +12,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_gelu(NodeContext& context) {
+OutputVector translate_gelu(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto x = context.get_input(0);
    auto approximate = context.const_input<std::string>(1);

@@ -12,7 +12,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_get_attr(NodeContext& context) {
+OutputVector translate_get_attr(const NodeContext& context) {
    auto res = context.get_decoder()->try_decode_get_attr();
    FRONT_END_OP_CONVERSION_CHECK(res.size() > 0, "GetAttr must have at least one output.");
    return res;

@@ -13,7 +13,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_getitem(NodeContext& context) {
+OutputVector translate_getitem(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto input = context.get_input(0);
    if (std::dynamic_pointer_cast<ov::op::util::FrameworkNode>(input.get_node_shared_ptr())) {

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_glu(NodeContext& context) {
+OutputVector translate_glu(const NodeContext& context) {
    num_inputs_check(context, 2, 2);
    auto x = context.get_input(0);
    auto dim = context.input_is_none(1) ? context.mark_node(v0::Constant::create(element::i32, Shape{}, {-1}))

@@ -13,7 +13,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_grid_sampler(NodeContext& context) {
+OutputVector translate_grid_sampler(const NodeContext& context) {
    num_inputs_check(context, 4, 5);
    auto x = context.get_input(0);
    auto grid = context.get_input(1);

@@ -20,7 +20,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_group_norm(NodeContext& context) {
+OutputVector translate_group_norm(const NodeContext& context) {
    // aten::group_norm(Tensor input, int num_groups, Tensor? weight=None, Tensor? bias=None, float
    // eps=1.0000000000000001e-05, bool cudnn_enabled=True) -> Tensor
    num_inputs_check(context, 2, 6);

@@ -11,7 +11,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_hardtanh(NodeContext& context) {
+OutputVector translate_hardtanh(const NodeContext& context) {
    num_inputs_check(context, 1, 3);
    float min = -1;
    float max = 1;

@@ -13,7 +13,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_if(NodeContext& context) {
+OutputVector translate_if(const NodeContext& context) {
    auto if_node = std::make_shared<opset10::If>(context.get_input(0));
    context.mark_node(if_node);
    auto decoder = context.get_decoder();

@@ -56,7 +56,7 @@ std::shared_ptr<Node> get_im2col_indices_along_dim(const NodeContext& context,
}
} // namespace

-OutputVector translate_im2col(NodeContext& context) {
+OutputVector translate_im2col(const NodeContext& context) {
    num_inputs_check(context, 5, 5);
    auto input = context.get_input(0);
    auto kernel_size = context.const_input<std::vector<int64_t>>(1);
@@ -10,9 +10,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-using namespace ov::op;
-
-OutputVector translate_index_put_(NodeContext& context) {
+OutputVector translate_index_put_(const NodeContext& context) {
    // Pass as PtFrameworkNode to register as `inplace_op`. Conversion to OV operators is done as transformation.
    auto node = std::make_shared<PtFrameworkNode>(context.get_decoder(), context.inputs());
    return {context.mark_node(node)};

@@ -88,7 +88,7 @@ OutputVector translate_instance_norm_train(const NodeContext& context,

} // namespace

-OutputVector translate_instance_norm(NodeContext& context) {
+OutputVector translate_instance_norm(const NodeContext& context) {
    num_inputs_check(context, 8, 9);
    auto input = context.get_input(0);
    auto eps = context.const_input<float>(7);

@@ -11,7 +11,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_int(NodeContext& context) {
+OutputVector translate_int(const NodeContext& context) {
    num_inputs_check(context, 1, 1);
    return {context.mark_node(std::make_shared<ov::op::v0::Convert>(context.get_input(0), element::i32))};
};

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_layer_norm(NodeContext& context) {
+OutputVector translate_layer_norm(const NodeContext& context) {
    num_inputs_check(context, 5, 6);
    auto eps = context.const_input<float>(4);
    auto normalized_shape = context.const_input<Shape>(1);

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_len(NodeContext& context) {
+OutputVector translate_len(const NodeContext& context) {
    num_inputs_check(context, 1, 1);
    auto const_0 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {0}));
    auto const_1 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {1}));

@@ -11,7 +11,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_linear(NodeContext& context) {
+OutputVector translate_linear(const NodeContext& context) {
    // schema: aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> Tensor
    num_inputs_check(context, 2, 3);
    auto x = context.get_input(0);

@@ -15,7 +15,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_list_construct(NodeContext& context) {
+OutputVector translate_list_construct(const NodeContext& context) {
    // Process the case when prim::ListConstruct has all inputs constant
    auto const_0 = context.mark_node(v0::Constant::create(element::i32, Shape{}, {0}));
    ov::OutputVector consts;

@@ -17,7 +17,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_log(NodeContext& context) {
+OutputVector translate_log(const NodeContext& context) {
    // torch.log returns a tensor with the natural logarithm of the elements of input.
    num_inputs_check(context, 1, 1);
    auto x = context.get_input(0);

@@ -26,7 +26,7 @@ OutputVector translate_log(NodeContext& context) {
    return {log};
};

-OutputVector translate_log2(NodeContext& context) {
+OutputVector translate_log2(const NodeContext& context) {
    // torch.log2 returns a tensor with the logarithm to the base 2 of the elements of input.
    num_inputs_check(context, 1, 1);
    auto x = context.get_input(0);

@@ -13,7 +13,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_loop(NodeContext& context) {
+OutputVector translate_loop(const NodeContext& context) {
    const auto& inputs = context.inputs();
    FRONT_END_OP_CONVERSION_CHECK(inputs.size() >= 2, "Loop must have at least 2 inputs.");
    auto loop = std::make_shared<ov::op::v5::Loop>(inputs[0], inputs[1]);

@@ -18,7 +18,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_masked_fill(NodeContext& context) {
+OutputVector translate_masked_fill(const NodeContext& context) {
    num_inputs_check(context, 3, 3);
    auto data = context.get_input(0);
    auto mask = context.get_input(1);

@@ -13,7 +13,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_max_poolnd(NodeContext& context) {
+OutputVector translate_max_poolnd(const NodeContext& context) {
    num_inputs_check(context, 6, 6);
    auto kernel = context.const_input<Shape>(1);
    auto strides = context.const_input<Strides>(2);

@@ -11,7 +11,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_mean(NodeContext& context) {
+OutputVector translate_mean(const NodeContext& context) {
    num_inputs_check(context, 3, 4);
    auto x = context.get_input(0);
    auto y = context.get_input(1);

@@ -10,7 +10,7 @@ namespace frontend {
namespace pytorch {
namespace op {

-OutputVector translate_meshgrid(NodeContext& context) {
+OutputVector translate_meshgrid(const NodeContext& context) {
    std::string indexing = "ij";
    if (!context.input_is_none(1)) {
        indexing = context.const_input<std::string>(1);

@@ -20,7 +20,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_max(NodeContext& context) {
+OutputVector translate_max(const NodeContext& context) {
    // torch.max (same for torch.min) actually has two interfaces smashed together:
    // torch.max(x, dim, keepdim) and torch.max(x, y)
    num_inputs_check(context, 1, 3);

@@ -49,7 +49,7 @@ OutputVector translate_max(NodeContext& context) {
    return {values, indicies};
};

-OutputVector translate_min(NodeContext& context) {
+OutputVector translate_min(const NodeContext& context) {
    // torch.min (same for torch.max) actually has two interfaces smashed together:
    // torch.min(x, dim, keepdim) and torch.min(x, y)
    num_inputs_check(context, 1, 3);

@@ -16,7 +16,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_narrow(NodeContext& context) {
+OutputVector translate_narrow(const NodeContext& context) {
    num_inputs_check(context, 4, 4);

    auto const_1 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {1}));

@@ -15,7 +15,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_neg(NodeContext& context) {
+OutputVector translate_neg(const NodeContext& context) {
    num_inputs_check(context, 1, 1);
    auto x = context.get_input(0);
    auto const_neg_1 = context.mark_node(v0::Constant::create(element::i32, Shape{}, {-1}));

@@ -18,7 +18,7 @@ namespace op {

using namespace ov::op;

-OutputVector translate_nms(NodeContext& context) {
+OutputVector translate_nms(const NodeContext& context) {
    num_inputs_check(context, 3, 3);
    auto const_0 = context.mark_node(v0::Constant::create(element::i32, Shape{}, {0}));
    auto const_1 = context.mark_node(v0::Constant::create(element::i32, Shape{}, {1}));
Some files were not shown because too many files have changed in this diff.