openvino/docs/notebooks/401-object-detection-with-output.rst

Live Object Detection with OpenVINO™
====================================

This notebook demonstrates live object detection with OpenVINO, using
the `SSDLite
MobileNetV2 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssdlite_mobilenet_v2>`__
from `Open Model
Zoo <https://github.com/openvinotoolkit/open_model_zoo/>`__. Final part
of this notebook shows live inference results from a webcam.
Additionally, you can also upload a video file.

   **NOTE**: To use this notebook with a webcam, you need to run the
   notebook on a computer with a webcam. If you run the notebook on a
   server, the webcam will not work. However, you can still do inference
   on a video.

**Table of contents:**


-  `Preparation <#preparation>`__

   -  `Install requirements <#install-requirements>`__
   -  `Imports <#imports>`__

-  `The Model <#the-model>`__

   -  `Download the Model <#download-the-model>`__
   -  `Convert the Model <#convert-the-model>`__
   -  `Load the Model <#load-the-model>`__

-  `Processing <#processing>`__

   -  `Process Results <#process-results>`__
   -  `Main Processing Function <#main-processing-function>`__

-  `Run <#run>`__

   -  `Run Live Object Detection <#run-live-object-detection>`__
   -  `Run Object Detection on a Video
      File <#run-object-detection-on-a-video-file>`__

-  `References <#references>`__

Preparation
-----------------------------------------------------

Install requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    %pip install -q "openvino-dev>=2023.1.0"
    %pip install -q tensorflow
    %pip install -q opencv-python requests tqdm

    # Fetch `notebook_utils` module
    import urllib.request
    urllib.request.urlretrieve(
        url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
        filename='notebook_utils.py'
    )


.. parsed-literal::

    DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
    Note: you may need to restart the kernel to use updated packages.
    DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    fastapi 0.104.1 requires typing-extensions>=4.8.0, but you have typing-extensions 4.5.0 which is incompatible.
    pydantic 2.4.2 requires typing-extensions>=4.6.1, but you have typing-extensions 4.5.0 which is incompatible.
    pydantic-core 2.10.1 requires typing-extensions!=4.7.0,>=4.6.0, but you have typing-extensions 4.5.0 which is incompatible.
    pytorch-lightning 1.6.5 requires protobuf<=3.20.1, but you have protobuf 3.20.3 which is incompatible.
    Note: you may need to restart the kernel to use updated packages.
    DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
    Note: you may need to restart the kernel to use updated packages.


.. parsed-literal::

    ('notebook_utils.py', <http.client.HTTPMessage at 0x7f00847d95b0>)


Imports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    import collections
    import tarfile
    import time
    from pathlib import Path

    import cv2
    import numpy as np
    from IPython import display
    import openvino as ov
    from openvino.tools.mo.front import tf as ov_tf_front
    from openvino.tools import mo

    import notebook_utils as utils

The Model
---------------------------------------------------

Download the Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the ``download_file``, a function from the ``notebook_utils`` file.
It automatically creates a directory structure and downloads the
selected model. This step is skipped if the package is already
downloaded and unpacked. The chosen model comes from the public
directory, which means it must be converted into OpenVINO Intermediate
Representation (OpenVINO IR).

   **NOTE**: Using a model other than ``ssdlite_mobilenet_v2`` may
   require different conversion parameters as well as pre- and
   post-processing.

.. code:: ipython3

    # A directory where the model will be downloaded.
    base_model_dir = Path("model")

    # The name of the model from Open Model Zoo
    model_name = "ssdlite_mobilenet_v2"

    archive_name = Path(f"{model_name}_coco_2018_05_09.tar.gz")
    model_url = f"https://storage.openvinotoolkit.org/repositories/open_model_zoo/public/2022.1/{model_name}/{archive_name}"

    # Download the archive
    downloaded_model_path = base_model_dir / archive_name
    if not downloaded_model_path.exists():
        utils.download_file(model_url, downloaded_model_path.name, downloaded_model_path.parent)

    # Unpack the model
    tf_model_path = base_model_dir / archive_name.with_suffix("").stem / "frozen_inference_graph.pb"
    if not tf_model_path.exists():
        with tarfile.open(downloaded_model_path) as file:
            file.extractall(base_model_dir)


.. parsed-literal::

    model/ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz:   0%|          | 0.00/48.7M [00:00<?, ?B/s]


Convert the Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The pre-trained model is in TensorFlow format. To use it with OpenVINO,
convert it to OpenVINO IR format, using `model conversion Python
API <https://docs.openvino.ai/2023.0/openvino_docs_model_processing_introduction.html>`__
(``mo.convert_model`` function). If the model has been already
converted, this step is skipped.

.. code:: ipython3

    precision = "FP16"
    # The output path for the conversion.
    converted_model_path = Path("model") / f"{model_name}_{precision.lower()}.xml"

    # Convert it to IR if not previously converted
    trans_config_path = Path(ov_tf_front.__file__).parent / "ssd_v2_support.json"
    if not converted_model_path.exists():
        ov_model = mo.convert_model(
            tf_model_path,
            compress_to_fp16=(precision == 'FP16'),
            transformations_config=trans_config_path,
            tensorflow_object_detection_api_pipeline_config=tf_model_path.parent / "pipeline.config",
            reverse_input_channels=True
        )
        ov.save_model(ov_model, converted_model_path)
        del ov_model


.. parsed-literal::

    [ WARNING ]  The Preprocessor block has been removed. Only nodes performing mean value subtraction and scaling (if applicable) are kept.


Load the Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Only a few lines of code are required to run the model. First,
initialize OpenVINO Runtime. Then, read the network architecture and
model weights from the ``.bin`` and ``.xml`` files to compile for the
desired device. If you choose ``GPU`` you need to wait for a while, as
the startup time is much longer than in the case of ``CPU``.

There is a possibility to let OpenVINO decide which hardware offers the
best performance. For that purpose, just use ``AUTO``.

.. code:: ipython3

    import ipywidgets as widgets

    core = ov.Core()

    device = widgets.Dropdown(
        options=core.available_devices + ["AUTO"],
        value='AUTO',
        description='Device:',
        disabled=False,
    )

    device


.. parsed-literal::

    Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')


.. code:: ipython3

    # Read the network and corresponding weights from a file.
    model = core.read_model(model=converted_model_path)
    # Compile the model for CPU (you can choose manually CPU, GPU etc.)
    # or let the engine choose the best available device (AUTO).
    compiled_model = core.compile_model(model=model, device_name=device.value)

    # Get the input and output nodes.
    input_layer = compiled_model.input(0)
    output_layer = compiled_model.output(0)

    # Get the input size.
    height, width = list(input_layer.shape)[1:3]

Input and output layers have the names of the input node and output node
respectively. In the case of SSDLite MobileNetV2, there is 1 input and 1
output.

.. code:: ipython3

    input_layer.any_name, output_layer.any_name


.. parsed-literal::

    ('image_tensor:0', 'detection_boxes:0')


Processing
----------------------------------------------------

Process Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, list all available classes and create colors for them. Then, in
the post-process stage, transform boxes with normalized coordinates
``[0, 1]`` into boxes with pixel coordinates ``[0, image_size_in_px]``.
Afterward, use `non-maximum
suppression <https://paperswithcode.com/method/non-maximum-suppression>`__
to reject overlapping detections and those below the probability
threshold (0.5). Finally, draw boxes and labels inside them.

.. code:: ipython3

    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    classes = [
        "background", "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
        "truck", "boat", "traffic light", "fire hydrant", "street sign", "stop sign",
        "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
        "bear", "zebra", "giraffe", "hat", "backpack", "umbrella", "shoe", "eye glasses",
        "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
        "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle",
        "plate", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
        "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
        "couch", "potted plant", "bed", "mirror", "dining table", "window", "desk", "toilet",
        "door", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
        "toaster", "sink", "refrigerator", "blender", "book", "clock", "vase", "scissors",
        "teddy bear", "hair drier", "toothbrush", "hair brush"
    ]

    # Colors for the classes above (Rainbow Color Map).
    colors = cv2.applyColorMap(
        src=np.arange(0, 255, 255 / len(classes), dtype=np.float32).astype(np.uint8),
        colormap=cv2.COLORMAP_RAINBOW,
    ).squeeze()


    def process_results(frame, results, thresh=0.6):
        # The size of the original frame.
        h, w = frame.shape[:2]
        # The 'results' variable is a [1, 1, 100, 7] tensor.
        results = results.squeeze()
        boxes = []
        labels = []
        scores = []
        for _, label, score, xmin, ymin, xmax, ymax in results:
            # Create a box with pixels coordinates from the box with normalized coordinates [0,1].
            boxes.append(
                tuple(map(int, (xmin * w, ymin * h, (xmax - xmin) * w, (ymax - ymin) * h)))
            )
            labels.append(int(label))
            scores.append(float(score))

        # Apply non-maximum suppression to get rid of many overlapping entities.
        # See https://paperswithcode.com/method/non-maximum-suppression
        # This algorithm returns indices of objects to keep.
        indices = cv2.dnn.NMSBoxes(
            bboxes=boxes, scores=scores, score_threshold=thresh, nms_threshold=0.6
        )

        # If there are no boxes.
        if len(indices) == 0:
            return []

        # Filter detected objects.
        return [(labels[idx], scores[idx], boxes[idx]) for idx in indices.flatten()]


    def draw_boxes(frame, boxes):
        for label, score, box in boxes:
            # Choose color for the label.
            color = tuple(map(int, colors[label]))
            # Draw a box.
            x2 = box[0] + box[2]
            y2 = box[1] + box[3]
            cv2.rectangle(img=frame, pt1=box[:2], pt2=(x2, y2), color=color, thickness=3)

            # Draw a label name inside the box.
            cv2.putText(
                img=frame,
                text=f"{classes[label]} {score:.2f}",
                org=(box[0] + 10, box[1] + 30),
                fontFace=cv2.FONT_HERSHEY_COMPLEX,
                fontScale=frame.shape[1] / 1000,
                color=color,
                thickness=1,
                lineType=cv2.LINE_AA,
            )

        return frame

Main Processing Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run object detection on the specified source. Either a webcam or a video
file.

.. code:: ipython3

    # Main processing function to run object detection.
    def run_object_detection(source=0, flip=False, use_popup=False, skip_first_frames=0):
        player = None
        try:
            # Create a video player to play with target fps.
            player = utils.VideoPlayer(
                source=source, flip=flip, fps=30, skip_first_frames=skip_first_frames
            )
            # Start capturing.
            player.start()
            if use_popup:
                title = "Press ESC to Exit"
                cv2.namedWindow(
                    winname=title, flags=cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE
                )

            processing_times = collections.deque()
            while True:
                # Grab the frame.
                frame = player.next()
                if frame is None:
                    print("Source ended")
                    break
                # If the frame is larger than full HD, reduce size to improve the performance.
                scale = 1280 / max(frame.shape)
                if scale < 1:
                    frame = cv2.resize(
                        src=frame,
                        dsize=None,
                        fx=scale,
                        fy=scale,
                        interpolation=cv2.INTER_AREA,
                    )

                # Resize the image and change dims to fit neural network input.
                input_img = cv2.resize(
                    src=frame, dsize=(width, height), interpolation=cv2.INTER_AREA
                )
                # Create a batch of images (size = 1).
                input_img = input_img[np.newaxis, ...]

                # Measure processing time.

                start_time = time.time()
                # Get the results.
                results = compiled_model([input_img])[output_layer]
                stop_time = time.time()
                # Get poses from network results.
                boxes = process_results(frame=frame, results=results)

                # Draw boxes on a frame.
                frame = draw_boxes(frame=frame, boxes=boxes)

                processing_times.append(stop_time - start_time)
                # Use processing times from last 200 frames.
                if len(processing_times) > 200:
                    processing_times.popleft()

                _, f_width = frame.shape[:2]
                # Mean processing time [ms].
                processing_time = np.mean(processing_times) * 1000
                fps = 1000 / processing_time
                cv2.putText(
                    img=frame,
                    text=f"Inference time: {processing_time:.1f}ms ({fps:.1f} FPS)",
                    org=(20, 40),
                    fontFace=cv2.FONT_HERSHEY_COMPLEX,
                    fontScale=f_width / 1000,
                    color=(0, 0, 255),
                    thickness=1,
                    lineType=cv2.LINE_AA,
                )

                # Use this workaround if there is flickering.
                if use_popup:
                    cv2.imshow(winname=title, mat=frame)
                    key = cv2.waitKey(1)
                    # escape = 27
                    if key == 27:
                        break
                else:
                    # Encode numpy array to jpg.
                    _, encoded_img = cv2.imencode(
                        ext=".jpg", img=frame, params=[cv2.IMWRITE_JPEG_QUALITY, 100]
                    )
                    # Create an IPython image.
                    i = display.Image(data=encoded_img)
                    # Display the image in this notebook.
                    display.clear_output(wait=True)
                    display.display(i)
        # ctrl-c
        except KeyboardInterrupt:
            print("Interrupted")
        # any different error
        except RuntimeError as e:
            print(e)
        finally:
            if player is not None:
                # Stop capturing.
                player.stop()
            if use_popup:
                cv2.destroyAllWindows()

Run
---------------------------------------------

Run Live Object Detection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use a webcam as the video input. By default, the primary webcam is set
with ``source=0``. If you have multiple webcams, each one will be
assigned a consecutive number starting at 0. Set ``flip=True`` when
using a front-facing camera. Some web browsers, especially Mozilla
Firefox, may cause flickering. If you experience flickering, set
``use_popup=True``.

   **NOTE**: To use this notebook with a webcam, you need to run the
   notebook on a computer with a webcam. If you run the notebook on a
   server (for example, Binder), the webcam will not work. Popup mode
   may not work if you run this notebook on a remote computer (for
   example, Binder).

Run the object detection:

.. code:: ipython3

    run_object_detection(source=0, flip=True, use_popup=False)


.. parsed-literal::

    Cannot open camera 0


.. parsed-literal::

    [ WARN:0@44.947] global cap_v4l.cpp:982 open VIDEOIO(V4L2:/dev/video0): can't open camera by index
    [ERROR:0@44.947] global obsensor_uvc_stream_channel.cpp:156 getStreamChannelGroup Camera index out of range


Run Object Detection on a Video File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you do not have a webcam, you can still run this demo with a video
file. Any `format supported by
OpenCV <https://docs.opencv.org/4.5.1/dd/d43/tutorial_py_video_display.html>`__
will work.

.. code:: ipython3

    video_file = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/video/Coco%20Walking%20in%20Berkeley.mp4"

    run_object_detection(source=video_file, flip=False, use_popup=False)


.. image:: 401-object-detection-with-output_files/401-object-detection-with-output_21_0.png


.. parsed-literal::

    Source ended


References
----------------------------------------------------

1. `SSDLite
   MobileNetV2 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssdlite_mobilenet_v2>`__
2. `Open Model
   Zoo <https://github.com/openvinotoolkit/open_model_zoo/>`__
3. `Non-Maximum
   Suppression <https://paperswithcode.com/method/non-maximum-suppression>`__