The attention center model with OpenVINO™
=========================================

.. _top:

This notebook demonstrates how to use the `attention center
model <https://github.com/google/attention-center/tree/main>`__ with
OpenVINO. This model is in the `TensorFlow Lite
format <https://www.tensorflow.org/lite>`__, which OpenVINO now supports
through its TFLite frontend.

Eye tracking is commonly used in visual neuroscience and cognitive
science to study questions about visual attention and decision making.
Computational models that predict where people look have direct
applications to a variety of computer vision tasks. The attention
center model takes an RGB image as input and returns a 2D point as
output. This 2D point is the predicted center of human attention on the
image, i.e. the most salient part of the image, which people look at
first. This makes it possible to find the most visually salient regions
and handle them as early as possible. For example, it could be used with
the latest generation of image formats (such as `JPEG
XL <https://github.com/libjxl/libjxl>`__), which support encoding the
parts that you pay attention to first. This can improve the user
experience, because the image appears to load faster.

The attention center model architecture is described as follows:

   The attention center model is a deep neural net, which takes an image
   as input, and uses a pre-trained classification network, e.g.,
   ResNet, MobileNet, etc., as the backbone. Several intermediate layers
   that output from the backbone network are used as input for the
   attention center prediction module. These different intermediate
   layers contain different information e.g., shallow layers often
   contain low level information like intensity/color/texture, while
   deeper layers usually contain higher and more semantic information
   like shape/object. All are useful for the attention prediction. The
   attention center prediction applies convolution, deconvolution and/or
   resizing operator together with aggregation and sigmoid function to
   generate a weighting map for the attention center. And then an
   operator (the Einstein summation operator in our case) can be applied
   to compute the (gravity) center from the weighting map. An L2 norm
   between the predicted attention center and the ground-truth attention
   center can be computed as the training loss.

Source: `Google AI blog
post <https://opensource.googleblog.com/2022/12/open-sourcing-attention-center-model.html>`__.

.. figure:: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxLCDJHzJNjB_von-vFlq8TJJFA41aB85T-QE3ZNxW8kshAf3HOEyIEJ4uggXjbJmZhsdj7j6i6mvvmXtyaxXJPm3JHuKILNRTPfX9KvICbFBRD8KNuDVmLABzYuhQci3BT2BqV-wM54IxaoAV1YDBbnpJC92UZfEBGvakLusiqND2AaPpWPr2gJV1/s1600/image4.png
   :alt: drawing

   drawing

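
The center-of-gravity step named in the architecture description can be
illustrated with a small NumPy sketch. This is not the model's actual
implementation: the helper name ``weighted_center`` and the toy 4x4
weighting map are assumptions made for illustration. Only the idea of
reducing a weighting map to one (x, y) point with an Einstein summation
follows the description.

```python
import numpy as np


def weighted_center(weight_map):
    """Return the (x, y) center of gravity of a 2D weighting map.

    Hypothetical helper for illustration; not taken from the model.
    """
    weight_map = np.asarray(weight_map, dtype=np.float64)
    height, width = weight_map.shape
    # Normalize so that the weights sum to 1.
    normalized = weight_map / weight_map.sum()
    ys = np.arange(height, dtype=np.float64)
    xs = np.arange(width, dtype=np.float64)
    # Einstein summation: sum over both axes of weight * coordinate.
    center_y = np.einsum("hw,h->", normalized, ys)
    center_x = np.einsum("hw,w->", normalized, xs)
    return float(center_x), float(center_y)


# Toy weighting map with all of its mass on row 1, columns 2 and 3.
toy_map = np.zeros((4, 4))
toy_map[1, 2] = 1.0
toy_map[1, 3] = 1.0
center = weighted_center(toy_map)
print(center)
```

With equal weights on two neighboring cells, the center falls halfway
between them on the x axis and stays on their shared row.
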
The attention center model has been trained with images from the `COCO
dataset <https://cocodataset.org/#home>`__ annotated with saliency from
the `SALICON dataset <http://salicon.net/>`__.

**Table of contents**:

- `Imports <#imports>`__
- `Download the attention-center model <#download-the-attention-center-model>`__

  - `Convert TensorFlow Lite model to OpenVINO IR format <#convert-tensorflow-lite-model-to-openvino-ir-format>`__

- `Select inference device <#select-inference-device>`__
- `Prepare image to use with attention-center model <#prepare-image-to-use-with-attention-center-model>`__
- `Load input image <#load-input-image>`__
- `Get result with OpenVINO IR model <#get-result-with-openvino-ir-model>`__

Imports `⇑ <#top>`__
###############################################################################################################################

.. code:: ipython3

    import cv2

    import numpy as np
    import tensorflow as tf
    from pathlib import Path
    import matplotlib.pyplot as plt

    from openvino.tools import mo
    from openvino.runtime import serialize, Core

.. parsed-literal::

    2023-08-15 23:14:52.395540: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
    2023-08-15 23:14:52.429075: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
    To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2023-08-15 23:14:52.969814: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

Download the attention-center model `⇑ <#top>`__
###############################################################################################################################

Download the model by cloning the `attention-center
repo <https://github.com/google/attention-center/tree/main>`__. The repo
includes the model in the ``./model`` folder.

.. code:: ipython3

    if not Path('./attention-center').exists():
        ! git clone https://github.com/google/attention-center

.. parsed-literal::

    Cloning into 'attention-center'...
    remote: Enumerating objects: 168, done.
    remote: Counting objects: 100% (168/168), done.
    remote: Compressing objects: 100% (132/132), done.
    remote: Total 168 (delta 73), reused 114 (delta 28), pack-reused 0
    Receiving objects: 100% (168/168), 26.22 MiB | 4.18 MiB/s, done.
    Resolving deltas: 100% (73/73), done.

Convert TensorFlow Lite model to OpenVINO IR format `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The attention-center model is a pre-trained model in TensorFlow Lite
format. In this notebook, the model is converted to OpenVINO IR format
with Model Optimizer. This step is skipped if the model has already been
converted. For more information about Model Optimizer, see the `Model
Optimizer Developer
Guide <https://docs.openvino.ai/2023.0/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html>`__.

The TFLite model format is also supported in OpenVINO through the TFLite
frontend, so the model can be passed directly to ``core.read_model()``.
You can find an example in
`002-openvino-api <https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/002-openvino-api>`__.

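
As a minimal sketch of the direct route, the ``.tflite`` file can be
read without the conversion step. The helper name
``read_tflite_directly`` is hypothetical. The sketch assumes an OpenVINO
version that includes the TFLite frontend and that the repository has
been cloned to ``./attention-center``; when the file is missing, nothing
is loaded.

```python
from pathlib import Path


def read_tflite_directly(tflite_path):
    """Read a .tflite file through OpenVINO's TFLite frontend.

    Hypothetical helper; the import is local so that this snippet can be
    loaded on a machine without OpenVINO installed.
    """
    from openvino.runtime import Core

    core = Core()
    # read_model takes the path to the .tflite file.
    return core.read_model(str(tflite_path))


# Path used elsewhere in this notebook for the downloaded model.
tflite_path = Path("./attention-center/model/center.tflite")
if tflite_path.exists():
    model_from_tflite = read_tflite_directly(tflite_path)
    print(model_from_tflite.inputs)
```

The returned model can be compiled in the same way as a model read from
IR files.
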
.. code:: ipython3

    tflite_model_path = Path("./attention-center/model/center.tflite")

    ir_model_path = Path("./model/ir_center_model.xml")

    core = Core()

    if not ir_model_path.exists():
        # convert the TFLite model and save it in OpenVINO IR format
        model = mo.convert_model(tflite_model_path)
        serialize(model, ir_model_path.as_posix())
        print("IR model saved to {}".format(ir_model_path))
    else:
        # reuse the IR model saved by a previous run
        print("Read IR model from {}".format(ir_model_path))
        model = core.read_model(ir_model_path)

.. parsed-literal::

    IR model saved to model/ir_center_model.xml

Select inference device `⇑ <#top>`__
###############################################################################################################################

Select a device from the dropdown list for running inference with
OpenVINO:

.. code:: ipython3

    import ipywidgets as widgets

    device = widgets.Dropdown(
        options=core.available_devices + ["AUTO"],
        value='AUTO',
        description='Device:',
        disabled=False,
    )

    device

.. parsed-literal::

    Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')

.. code:: ipython3

    compiled_model = core.compile_model(model=model, device_name=device.value)

Prepare image to use with attention-center model `⇑ <#top>`__
###############################################################################################################################

The attention-center model takes an RGB image with shape (480, 640),
that is, a height of 480 and a width of 640 pixels, as input.

.. code:: ipython3

    class Image():
        def __init__(self, model_input_image_shape, image_path=None, image=None):
            # model_input_image_shape is (height, width)
            self.model_input_image_shape = model_input_image_shape
            self.image = None
            self.real_input_image_shape = None

            if image_path is not None:
                # cv2.imread returns the image in BGR channel order
                self.image = cv2.imread(str(image_path))
                self.real_input_image_shape = self.image.shape
            elif image is not None:
                # the image is expected to be in BGR channel order
                self.image = image
                self.real_input_image_shape = self.image.shape
            else:
                raise ValueError("Image can't be found, please specify image_path or image")

        def prepare_image_tensor(self):
            rgb_image = cv2.cvtColor(self.image, cv2.COLOR_BGR2RGB)
            # cv2.resize expects the target size as (width, height)
            resized_image = cv2.resize(rgb_image, (self.model_input_image_shape[1], self.model_input_image_shape[0]))

            # add the batch dimension: (1, height, width, 3)
            image_tensor = tf.constant(np.expand_dims(resized_image, axis=0),
                                       dtype=tf.float32)
            return image_tensor

        def scalt_center_to_real_image_shape(self, predicted_center):
            # predicted_center is treated as (x, y) in model input resolution:
            # index 0 scales with the width, index 1 with the height
            new_center_x = round(predicted_center[0] * self.real_input_image_shape[1] / self.model_input_image_shape[1])
            new_center_y = round(predicted_center[1] * self.real_input_image_shape[0] / self.model_input_image_shape[0])
            return (int(new_center_x), int(new_center_y))

        def draw_attention_center_point(self, predicted_center):
            # draw on a copy so that the stored image stays unmodified
            image_with_circle = cv2.circle(self.image.copy(),
                                           predicted_center,
                                           radius=10,
                                           color=(3, 3, 255),
                                           thickness=-1)
            return image_with_circle

        def print_image(self, predicted_center=None):
            image_to_print = self.image
            if predicted_center is not None:
                image_to_print = self.draw_attention_center_point(predicted_center)

            plt.imshow(cv2.cvtColor(image_to_print, cv2.COLOR_BGR2RGB))

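
The coordinate scaling used in this section can be checked in isolation.
The sketch below is a standalone restatement under the same assumption
the notebook code makes, namely that the predicted center is an (x, y)
pair in model input resolution. The function name ``scale_point`` and
the example numbers are illustrative only.

```python
def scale_point(point_xy, model_shape_hw, image_shape_hw):
    """Map an (x, y) point from model resolution to image resolution.

    Shapes are (height, width[, channels]); x scales with the width and
    y scales with the height.
    """
    x, y = point_xy
    model_h, model_w = model_shape_hw[:2]
    image_h, image_w = image_shape_hw[:2]
    scaled_x = round(x * image_w / model_w)
    scaled_y = round(y * image_h / model_h)
    return int(scaled_x), int(scaled_y)


# A point in the middle of the 480x640 model input maps to the middle
# of an image that is twice as large in both directions.
scaled = scale_point((320.0, 240.0), (480, 640), (960, 1280, 3))
print(scaled)
```

Keeping the (height, width) order of array shapes separate from the
(x, y) order of points avoids the most common mistake in this kind of
conversion, which is swapping the two axes.
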
Load input image `⇑ <#top>`__
###############################################################################################################################

Upload an input image using the file upload button. If no file is
uploaded, the default image ``../data/image/coco.jpg`` is used.

.. code:: ipython3

    import ipywidgets as widgets

    load_file_widget = widgets.FileUpload(
        accept="image/*", multiple=False, description="Image file",
    )

    load_file_widget

.. parsed-literal::

    FileUpload(value=(), accept='image/*', description='Image file')

.. code:: ipython3

    import io
    import PIL.Image
    # read the uploaded image, or fall back to the default image
    image = PIL.Image.open(io.BytesIO(load_file_widget.value[-1]['content'])) if load_file_widget.value else PIL.Image.open("../data/image/coco.jpg")
    # convert() returns a new image, so the result has to be assigned
    image = image.convert("RGB")

    # reverse the channel order (RGB -> BGR), because the Image class expects BGR
    input_image = Image((480, 640), image=(np.ascontiguousarray(image)[:, :, ::-1]).astype(np.uint8))
    image_tensor = input_image.prepare_image_tensor()
    input_image.print_image()

.. parsed-literal::

    2023-08-15 23:15:04.645356: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...



.. image:: 216-attention-center-with-output_files/216-attention-center-with-output_14_1.png

Get result with OpenVINO IR model `⇑ <#top>`__
###############################################################################################################################

.. code:: ipython3

    output_layer = compiled_model.output(0)

    # run inference; the result is in model input resolution
    res = compiled_model([image_tensor])[output_layer]
    # scale the point to the original image resolution
    predicted_center = input_image.scalt_center_to_real_image_shape(res[0])
    print(f'Prediction attention center point {predicted_center}')
    input_image.print_image(predicted_center)

.. parsed-literal::

    Prediction attention center point (292, 277)



.. image:: 216-attention-center-with-output_files/216-attention-center-with-output_16_1.png
