618 lines
21 KiB
ReStructuredText
618 lines
21 KiB
ReStructuredText
Performance tricks in OpenVINO for latency mode
|
||
===============================================
|
||
|
||
The goal of this notebook is to provide a step-by-step tutorial for
|
||
improving performance for inferencing in a latency mode. Low latency is
|
||
especially desired in real-time applications when the results are needed
|
||
as soon as possible after the data appears. This notebook assumes
|
||
computer vision workflow and uses
|
||
`YOLOv5n <https://github.com/ultralytics/yolov5>`__ model. We will
|
||
simulate a camera application that provides frames one by one.
|
||
|
||
The performance tips applied in this notebook could be summarized in the
|
||
following figure. Some of the steps below can be applied to any device
|
||
at any stage, e.g., “shared_memory”; some can be used only to specific
|
||
devices, e.g., “inference num threads” to CPU. As the number of
|
||
potential configurations is vast, we recommend looking at the steps
|
||
below and then apply a trial-and-error approach. You can incorporate
|
||
many hints simultaneously, like more inference threads + shared memory.
|
||
It should give even better performance, but we recommend testing it
|
||
anyway.
|
||
|
||
**NOTE**: We especially recommend trying
|
||
``OpenVINO IR model + CPU + shared memory in latency mode`` or
|
||
``OpenVINO IR model + CPU + shared memory + more inference threads``.
|
||
|
||
The quantization and pre-post-processing API are not included here as
|
||
they change the precision (quantization) or processing graph
|
||
(prepostprocessor). You can find examples of how to apply them to
|
||
optimize performance on OpenVINO IR files in
|
||
`111-detection-quantization <../111-detection-quantization>`__ and
|
||
`118-optimize-preprocessing <../118-optimize-preprocessing>`__.
|
||
|
||
|image0|
|
||
|
||
**NOTE**: Many of the steps presented below will give you better
|
||
performance. However, some of them may not change anything if they
|
||
are strongly dependent on either the hardware or the model. Please
|
||
run this notebook on your computer with your model to learn which of
|
||
them makes sense in your case.
|
||
|
||
Prerequisites
|
||
-------------
|
||
|
||
.. |image0| image:: https://user-images.githubusercontent.com/4547501/229120774-01f4f972-424d-4280-8395-220dd432985a.png
|
||
|
||
.. code:: ipython3
|
||
|
||
!pip install -q seaborn ultralytics
|
||
|
||
.. code:: ipython3
|
||
|
||
import os
|
||
import sys
|
||
import time
|
||
from pathlib import Path
|
||
from typing import Any, List, Tuple
|
||
|
||
sys.path.append("../utils")
|
||
import notebook_utils as utils
|
||
|
||
Data
|
||
----
|
||
|
||
We will use the same image of the dog sitting on a bicycle for all
|
||
experiments below. The image is resized and preprocessed to fulfill the
|
||
requirements of this particular object detection model.
|
||
|
||
.. code:: ipython3
|
||
|
||
import numpy as np
|
||
import cv2
|
||
|
||
IMAGE_WIDTH = 640
|
||
IMAGE_HEIGHT = 480
|
||
|
||
# load image
|
||
image = utils.load_image("../data/image/coco_bike.jpg")
|
||
image = cv2.resize(image, dsize=(IMAGE_WIDTH, IMAGE_HEIGHT), interpolation=cv2.INTER_AREA)
|
||
|
||
# preprocess it for YOLOv5
|
||
input_image = image / 255.0
|
||
input_image = np.transpose(input_image, axes=(2, 0, 1))
|
||
input_image = np.expand_dims(input_image, axis=0)
|
||
|
||
# show the image
|
||
utils.show_array(image)
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_4_0.jpg
|
||
|
||
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
<DisplayHandle display_id=6f2409e74123f40cdda4989b2a2a5922>
|
||
|
||
|
||
|
||
Model
|
||
-----
|
||
|
||
We decided to go with
|
||
`YOLOv5n <https://github.com/ultralytics/yolov5>`__, one of the
|
||
state-of-the-art object detection models, easily available through the
|
||
PyTorch Hub and small enough to see the difference in performance.
|
||
|
||
.. code:: ipython3
|
||
|
||
import torch
|
||
from IPython.utils import io
|
||
|
||
# directory for all models
|
||
base_model_dir = Path("model")
|
||
|
||
model_name = "yolov5n"
|
||
model_path = base_model_dir / model_name
|
||
|
||
# load YOLOv5n from PyTorch Hub
|
||
pytorch_model = torch.hub.load("ultralytics/yolov5", "custom", path=model_path, device="cpu", skip_validation=True)
|
||
# don't print full model architecture
|
||
with io.capture_output():
|
||
pytorch_model.eval()
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
Using cache found in /opt/home/k8sworker/.cache/torch/hub/ultralytics_yolov5_master
|
||
YOLOv5 🚀 2023-4-21 Python-3.8.10 torch-1.13.1+cpu CPU
|
||
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
requirements: /opt/home/k8sworker/.cache/torch/hub/requirements.txt not found, check failed.
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5n.pt to model/yolov5n.pt...
|
||
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
0%| | 0.00/3.87M [00:00<?, ?B/s]
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
|
||
Fusing layers...
|
||
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients
|
||
Adding AutoShape...
|
||
|
||
|
||
Hardware
|
||
--------
|
||
|
||
The code below lists the available hardware we will use in the
|
||
benchmarking process.
|
||
|
||
**NOTE**: The hardware you have is probably completely different from
|
||
ours. It means you can see completely different results.
|
||
|
||
.. code:: ipython3
|
||
|
||
import openvino.runtime as ov
|
||
|
||
# initialize OpenVINO
|
||
core = ov.Core()
|
||
|
||
# print available devices
|
||
for device in core.available_devices:
|
||
device_name = core.get_property(device, "FULL_DEVICE_NAME")
|
||
print(f"{device}: {device_name}")
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
CPU: Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz
|
||
|
||
|
||
Helper functions
|
||
----------------
|
||
|
||
We’re defining a benchmark model function to use for all optimized
|
||
models below. It runs inference 1000 times, averages the latency time,
|
||
and prints two measures: seconds per image and frames per second (FPS).
|
||
|
||
.. code:: ipython3
|
||
|
||
INFER_NUMBER = 1000
|
||
|
||
|
||
def benchmark_model(model: Any, input_data: np.ndarray, benchmark_name: str, device_name: str = "CPU") -> float:
|
||
"""
|
||
Helper function for benchmarking the model. It measures the time and prints results.
|
||
"""
|
||
# measure the first inference separately - it may be slower as it contains also initialization
|
||
start = time.perf_counter()
|
||
model(input_data)
|
||
end = time.perf_counter()
|
||
first_infer_time = end - start
|
||
print(f"{benchmark_name} on {device_name}. First inference time: {first_infer_time :.4f} seconds")
|
||
|
||
# benchmarking
|
||
start = time.perf_counter()
|
||
for _ in range(INFER_NUMBER):
|
||
model(input_data)
|
||
end = time.perf_counter()
|
||
|
||
# elapsed time
|
||
infer_time = end - start
|
||
|
||
# print second per image and FPS
|
||
mean_infer_time = infer_time / INFER_NUMBER
|
||
mean_fps = INFER_NUMBER / infer_time
|
||
print(f"{benchmark_name} on {device_name}: {mean_infer_time :.4f} seconds per image ({mean_fps :.2f} FPS)")
|
||
|
||
return mean_infer_time
|
||
|
||
The following functions aim to post-process results and draw boxes on
|
||
the image.
|
||
|
||
.. code:: ipython3
|
||
|
||
# https://gist.github.com/AruniRC/7b3dadd004da04c80198557db5da4bda
|
||
classes = [
|
||
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant",
|
||
"stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
|
||
"giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
|
||
"baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork",
|
||
"knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut",
|
||
"cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard",
|
||
"cell phone", "microwave", "oven", "oaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
|
||
"hair drier", "toothbrush"
|
||
]
|
||
|
||
# Colors for the classes above (Rainbow Color Map).
|
||
colors = cv2.applyColorMap(
|
||
src=np.arange(0, 255, 255 / len(classes), dtype=np.float32).astype(np.uint8),
|
||
colormap=cv2.COLORMAP_RAINBOW,
|
||
).squeeze()
|
||
|
||
|
||
def postprocess(detections: np.ndarray) -> List[Tuple]:
|
||
"""
|
||
Postprocess the raw results from the model.
|
||
"""
|
||
# candidates - probability > 0.25
|
||
detections = detections[detections[..., 4] > 0.25]
|
||
|
||
boxes = []
|
||
labels = []
|
||
scores = []
|
||
for obj in detections:
|
||
xmin, ymin, ww, hh = obj[:4]
|
||
score = obj[4]
|
||
label = np.argmax(obj[5:])
|
||
# Create a box with pixels coordinates from the box with normalized coordinates [0,1].
|
||
boxes.append(
|
||
tuple(map(int, (xmin - ww // 2, ymin - hh // 2, ww, hh)))
|
||
)
|
||
labels.append(int(label))
|
||
scores.append(float(score))
|
||
|
||
# Apply non-maximum suppression to get rid of many overlapping entities.
|
||
# See https://paperswithcode.com/method/non-maximum-suppression
|
||
# This algorithm returns indices of objects to keep.
|
||
indices = cv2.dnn.NMSBoxes(
|
||
bboxes=boxes, scores=scores, score_threshold=0.25, nms_threshold=0.5
|
||
)
|
||
|
||
# If there are no boxes.
|
||
if len(indices) == 0:
|
||
return []
|
||
|
||
# Filter detected objects.
|
||
return [(labels[idx], scores[idx], boxes[idx]) for idx in indices.flatten()]
|
||
|
||
|
||
def draw_boxes(img: np.ndarray, boxes):
|
||
"""
|
||
Draw detected boxes on the image.
|
||
"""
|
||
for label, score, box in boxes:
|
||
# Choose color for the label.
|
||
color = tuple(map(int, colors[label]))
|
||
# Draw a box.
|
||
x2 = box[0] + box[2]
|
||
y2 = box[1] + box[3]
|
||
cv2.rectangle(img=img, pt1=box[:2], pt2=(x2, y2), color=color, thickness=2)
|
||
|
||
# Draw a label name inside the box.
|
||
cv2.putText(
|
||
img=img,
|
||
text=f"{classes[label]} {score:.2f}",
|
||
org=(box[0] + 10, box[1] + 20),
|
||
fontFace=cv2.FONT_HERSHEY_COMPLEX,
|
||
fontScale=img.shape[1] / 1200,
|
||
color=color,
|
||
thickness=1,
|
||
lineType=cv2.LINE_AA,
|
||
)
|
||
|
||
|
||
def show_result(results: np.ndarray):
|
||
"""
|
||
Postprocess the raw results, draw boxes and show the image.
|
||
"""
|
||
output_img = image.copy()
|
||
|
||
detections = postprocess(results)
|
||
draw_boxes(output_img, detections)
|
||
|
||
utils.show_array(output_img)
|
||
|
||
Optimizations
|
||
-------------
|
||
|
||
Below, we present the performance tricks for faster inference in the
|
||
latency mode. We release resources after every benchmarking to be sure
|
||
the same amount of resource is available for every experiment.
|
||
|
||
PyTorch model
|
||
~~~~~~~~~~~~~
|
||
|
||
First, we’re benchmarking the original PyTorch model without any
|
||
optimizations applied. We will treat it as our baseline.
|
||
|
||
.. code:: ipython3
|
||
|
||
import torch
|
||
|
||
with torch.no_grad():
|
||
result = pytorch_model(torch.as_tensor(input_image)).detach().numpy()[0]
|
||
show_result(result)
|
||
pytorch_infer_time = benchmark_model(pytorch_model, input_data=torch.as_tensor(input_image).float(), benchmark_name="PyTorch model")
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_14_0.jpg
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
PyTorch model on CPU. First inference time: 0.0288 seconds
|
||
PyTorch model on CPU: 0.0205 seconds per image (48.90 FPS)
|
||
|
||
|
||
ONNX model
|
||
~~~~~~~~~~
|
||
|
||
The first optimization is exporting the PyTorch model to ONNX and
|
||
running it in OpenVINO. It’s possible, thanks to the ONNX frontend. It
|
||
means we don’t necessarily have to convert the model to Intermediate
|
||
Representation (IR) to leverage the OpenVINO Runtime.
|
||
|
||
.. code:: ipython3
|
||
|
||
onnx_path = base_model_dir / Path(f"{model_name}_{IMAGE_WIDTH}_{IMAGE_HEIGHT}").with_suffix(".onnx")
|
||
|
||
# export PyTorch model to ONNX if it doesn't already exist
|
||
if not onnx_path.exists():
|
||
dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)
|
||
torch.onnx.export(pytorch_model, dummy_input, onnx_path)
|
||
|
||
# load and compile in OpenVINO
|
||
onnx_model = core.read_model(onnx_path)
|
||
onnx_model = core.compile_model(onnx_model, device_name="CPU")
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
/opt/home/k8sworker/.cache/torch/hub/ultralytics_yolov5_master/models/common.py:514: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
|
||
y = self.model(im, augment=augment, visualize=visualize) if augment or visualize else self.model(im)
|
||
/opt/home/k8sworker/.cache/torch/hub/ultralytics_yolov5_master/models/yolo.py:64: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
|
||
if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
|
||
|
||
|
||
.. code:: ipython3
|
||
|
||
result = onnx_model(input_image)[onnx_model.output(0)][0]
|
||
show_result(result)
|
||
onnx_infer_time = benchmark_model(model=onnx_model, input_data=input_image, benchmark_name="ONNX model")
|
||
|
||
del onnx_model # release resources
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_17_0.jpg
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
ONNX model on CPU. First inference time: 0.0165 seconds
|
||
ONNX model on CPU: 0.0091 seconds per image (109.62 FPS)
|
||
|
||
|
||
OpenVINO IR model
|
||
~~~~~~~~~~~~~~~~~
|
||
|
||
Let’s convert the ONNX model to OpenVINO Intermediate Representation
|
||
(IR) FP16 and run it. Reducing the precision is one of the well-known
|
||
methods for faster inference provided the hardware that supports lower
|
||
precision, such as FP16 or even INT8. If the hardware doesn’t support
|
||
lower precisions, the model will be inferred in FP32 automatically. We
|
||
could also use quantization (INT8), but we should experience a little
|
||
accuracy drop. That’s why we skip that step in this notebook.
|
||
|
||
.. code:: ipython3
|
||
|
||
from openvino.tools import mo
|
||
|
||
ov_model = mo.convert_model(onnx_path, compress_to_fp16=True)
|
||
# save the model on disk
|
||
ov.serialize(ov_model, xml_path=str(onnx_path.with_suffix(".xml")))
|
||
|
||
ov_cpu_model = core.compile_model(ov_model, device_name="CPU")
|
||
|
||
result = ov_cpu_model(input_image)[ov_cpu_model.output(0)][0]
|
||
show_result(result)
|
||
ov_cpu_infer_time = benchmark_model(model=ov_cpu_model, input_data=input_image, benchmark_name="OpenVINO model")
|
||
|
||
del ov_cpu_model # release resources
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_19_0.jpg
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
OpenVINO model on CPU. First inference time: 0.0152 seconds
|
||
OpenVINO model on CPU: 0.0092 seconds per image (109.13 FPS)
|
||
|
||
|
||
OpenVINO IR model on GPU
|
||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
Usually, a GPU device is faster than a CPU, so let’s run the above model
|
||
on the GPU. Please note you need to have an Intel GPU and `install
|
||
drivers <https://github.com/openvinotoolkit/openvino_notebooks/wiki/Ubuntu#1-install-python-git-and-gpu-drivers-optional>`__
|
||
to be able to run this step. In addition, offloading to the GPU helps
|
||
reduce CPU load and memory consumption, allowing it to be left for
|
||
routine processes. If you cannot observe a faster inference on GPU, it
|
||
may be because the model is too light to benefit from massive parallel
|
||
execution.
|
||
|
||
.. code:: ipython3
|
||
|
||
ov_gpu_infer_time = 0.0
|
||
if "GPU" in core.available_devices:
|
||
ov_gpu_model = core.compile_model(ov_model, device_name="GPU")
|
||
|
||
result = ov_gpu_model(input_image)[ov_gpu_model.output(0)][0]
|
||
show_result(result)
|
||
ov_gpu_infer_time = benchmark_model(model=ov_gpu_model, input_data=input_image, benchmark_name="OpenVINO model", device_name="GPU")
|
||
|
||
del ov_gpu_model # release resources
|
||
|
||
OpenVINO IR model + more inference threads
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
There is a possibility to add a config for any device (CPU in this
|
||
case). We will increase the number of threads to an equal number of our
|
||
cores. It should help us a lot. There are `more
|
||
options <https://docs.openvino.ai/latest/groupov_runtime_cpp_prop_api.html>`__
|
||
to be changed, so it’s worth playing with them to see what works best in
|
||
our case.
|
||
|
||
.. code:: ipython3
|
||
|
||
num_cores = os.cpu_count()
|
||
|
||
ov_cpu_config_model = core.compile_model(ov_model, device_name="CPU", config={"INFERENCE_NUM_THREADS": num_cores})
|
||
|
||
result = ov_cpu_config_model(input_image)[ov_cpu_config_model.output(0)][0]
|
||
show_result(result)
|
||
ov_cpu_config_infer_time = benchmark_model(model=ov_cpu_config_model, input_data=input_image, benchmark_name="OpenVINO model + more threads")
|
||
|
||
del ov_cpu_config_model # release resources
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_23_0.jpg
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
OpenVINO model + more threads on CPU. First inference time: 0.0143 seconds
|
||
OpenVINO model + more threads on CPU: 0.0094 seconds per image (106.13 FPS)
|
||
|
||
|
||
OpenVINO IR model in latency mode
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
OpenVINO offers a virtual device called
|
||
`AUTO <https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html>`__,
|
||
which can select the best device for us based on a performance hint.
|
||
There are three different hints: ``LATENCY``, ``THROUGHPUT``, and
|
||
``CUMULATIVE_THROUGHPUT``. As this notebook is focused on the latency
|
||
mode, we will use ``LATENCY``. The above hints can be used with other
|
||
devices as well.
|
||
|
||
.. code:: ipython3
|
||
|
||
ov_auto_model = core.compile_model(ov_model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
|
||
|
||
result = ov_auto_model(input_image)[ov_auto_model.output(0)][0]
|
||
show_result(result)
|
||
ov_auto_infer_time = benchmark_model(model=ov_auto_model, input_data=input_image, benchmark_name="OpenVINO model", device_name="AUTO")
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_25_0.jpg
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
OpenVINO model on AUTO. First inference time: 0.0134 seconds
|
||
OpenVINO model on AUTO: 0.0095 seconds per image (105.46 FPS)
|
||
|
||
|
||
OpenVINO IR model in latency mode + shared memory
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
OpenVINO is a C++ toolkit with Python wrappers (API). The default
|
||
behavior in the Python API is copying the input to the additional buffer
|
||
and then running processing in C++, which prevents many
|
||
multiprocessing-related issues. However, it also increases time cost. We
|
||
can create a tensor with enabled shared memory (keeping in mind we
|
||
cannot overwrite our input), save time for copying and improve the
|
||
performance!
|
||
|
||
.. code:: ipython3
|
||
|
||
# it must be assigned to a variable, not to be garbage collected
|
||
c_input_image = np.ascontiguousarray(input_image, dtype=np.float32)
|
||
input_tensor = ov.Tensor(c_input_image, shared_memory=True)
|
||
|
||
result = ov_auto_model(input_tensor)[ov_auto_model.output(0)][0]
|
||
show_result(result)
|
||
ov_auto_shared_infer_time = benchmark_model(model=ov_auto_model, input_data=input_tensor, benchmark_name="OpenVINO model + shared memory", device_name="AUTO")
|
||
|
||
del ov_auto_model # release resources
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_27_0.jpg
|
||
|
||
|
||
.. parsed-literal::
|
||
|
||
OpenVINO model + shared memory on AUTO. First inference time: 0.0140 seconds
|
||
OpenVINO model + shared memory on AUTO: 0.0055 seconds per image (181.99 FPS)
|
||
|
||
|
||
Other tricks
|
||
~~~~~~~~~~~~
|
||
|
||
There are other tricks for performance improvement, especially
|
||
quantization and prepostprocessing. To get even more from your model,
|
||
please visit
|
||
`111-detection-quantization <../111-detection-quantization>`__ and
|
||
`118-optimize-preprocessing <../118-optimize-preprocessing>`__.
|
||
|
||
Performance comparison
|
||
----------------------
|
||
|
||
The following graphical comparison is valid for the selected model and
|
||
hardware simultaneously. If you cannot see any improvement between some
|
||
steps, just skip them.
|
||
|
||
.. code:: ipython3
|
||
|
||
%matplotlib inline
|
||
|
||
.. code:: ipython3
|
||
|
||
from matplotlib import pyplot as plt
|
||
|
||
labels = ["PyTorch model", "ONNX model", "OpenVINO IR model", "OpenVINO IR model on GPU", "OpenVINO IR model + more inference threads",
|
||
"OpenVINO IR model in latency mode", "OpenVINO IR model in latency mode + shared memory"]
|
||
# make them milliseconds
|
||
times = list(map(lambda x: 1000 * x, [pytorch_infer_time, onnx_infer_time, ov_cpu_infer_time, ov_gpu_infer_time, ov_cpu_config_infer_time,
|
||
ov_auto_infer_time, ov_auto_shared_infer_time]))
|
||
|
||
bar_colors = colors[::10] / 255.0
|
||
|
||
fig, ax = plt.subplots(figsize=(16, 8))
|
||
ax.bar(labels, times, color=bar_colors)
|
||
|
||
ax.set_ylabel("Inference time [ms]")
|
||
ax.set_title("Performance difference")
|
||
|
||
plt.xticks(rotation='vertical')
|
||
plt.show()
|
||
|
||
|
||
|
||
.. image:: 109-latency-tricks-with-output_files/109-latency-tricks-with-output_30_0.png
|
||
|
||
|
||
Conclusions
|
||
-----------
|
||
|
||
We already showed the steps needed to improve the performance of an
|
||
object detection model. Even if you experience much better performance
|
||
after running this notebook, please note this may not be valid for every
|
||
hardware or every model. For the most accurate results, please use
|
||
``benchmark_app`` `command-line
|
||
tool <https://docs.openvino.ai/latest/openvino_inference_engine_samples_benchmark_app_README.html>`__.
|
||
Note that ``benchmark_app`` cannot measure the impact of some tricks
|
||
above, e.g., shared memory.
|