Object segmentation with FastSAM and OpenVINO
==============================================

`The Fast Segment Anything Model
(FastSAM) <https://docs.ultralytics.com/models/fast-sam/>`__ is a
real-time CNN-based model that can segment any object within an image
based on various user prompts. The ``Segment Anything`` task is designed
to make vision tasks easier by providing an efficient way to identify
objects in an image. FastSAM significantly reduces computational demands
while maintaining competitive performance, making it a practical choice
for a variety of vision tasks.

FastSAM is a model that aims to overcome the limitations of the `Segment
Anything Model (SAM) <https://docs.ultralytics.com/models/sam/>`__,
which is a Transformer model that requires significant computational
resources. FastSAM tackles the segment anything task by dividing it into
two consecutive stages: all-instance segmentation and prompt-guided
selection.

In the first stage,
`YOLOv8-seg <https://docs.ultralytics.com/tasks/segment/>`__ is used
to produce segmentation masks for all instances in the image. In the
second stage, FastSAM outputs the region-of-interest corresponding to
the prompt.

.. figure:: https://user-images.githubusercontent.com/26833433/248551984-d98f0f6d-7535-45d0-b380-2e1440b52ad7.jpg
   :alt: pipeline

   pipeline

**Table of contents:**

- `Prerequisites <#prerequisites>`__

  - `Install requirements <#install-requirements>`__
  - `Imports <#imports>`__

- `FastSAM in Ultralytics <#fastsam-in-ultralytics>`__
- `Convert the model to OpenVINO Intermediate representation (IR)
  format <#convert-the-model-to-openvino-intermediate-representation-ir-format>`__
- `Embedding the converted models into the original
  pipeline <#embedding-the-converted-models-into-the-original-pipeline>`__

  - `Select inference device <#select-inference-device>`__
  - `Adapt OpenVINO models to the original
    pipeline <#adapt-openvino-models-to-the-original-pipeline>`__

- `Optimize the model using NNCF Post-training Quantization
  API <#optimize-the-model-using-nncf-post-training-quantization-api>`__

  - `Compare the performance of the Original and Quantized
    Models <#compare-the-performance-of-the-original-and-quantized-models>`__

- `Try out the converted pipeline <#try-out-the-converted-pipeline>`__

Prerequisites
-------------

Install requirements
~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    %pip install -q "ultralytics==8.0.200" onnx
    %pip install -q "openvino-dev>=2023.1.0"
    %pip install -q "nncf>=2.6.0"
    %pip install -q gradio

.. parsed-literal::

    DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
    Note: you may need to restart the kernel to use updated packages.

Imports
~~~~~~~

.. code:: ipython3

    import ipywidgets as widgets
    from pathlib import Path

    import openvino as ov
    import torch
    from PIL import Image, ImageDraw
    from ultralytics import FastSAM

    import urllib.request
    # Fetch skip_kernel_extension module
    urllib.request.urlretrieve(
        url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/skip_kernel_extension.py',
        filename='skip_kernel_extension.py'
    )
    # Fetch `notebook_utils` module
    urllib.request.urlretrieve(
        url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
        filename='notebook_utils.py'
    )
    from notebook_utils import download_file
    %load_ext skip_kernel_extension

FastSAM in Ultralytics
----------------------

To work with `Fast Segment Anything
Model <https://github.com/CASIA-IVA-Lab/FastSAM>`__ by
``CASIA-IVA-Lab``, we will use the `Ultralytics
package <https://docs.ultralytics.com/>`__. The Ultralytics package
exposes the ``FastSAM`` class, which simplifies model instantiation and
weight loading. The code below demonstrates how to initialize a
``FastSAM`` model and generate a segmentation map.

.. code:: ipython3

    model_name = "FastSAM-x"
    model = FastSAM(model_name)

    # Run inference on an image
    image_uri = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco_bike.jpg"
    image_uri = download_file(image_uri)
    results = model(image_uri, device="cpu", retina_masks=True, imgsz=1024, conf=0.6, iou=0.9)

.. parsed-literal::

    Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/FastSAM-x.pt to 'FastSAM-x.pt'...

.. parsed-literal::

    0%| | 0.00/138M [00:00<?, ?B/s]

.. parsed-literal::

    coco_bike.jpg: 0%| | 0.00/182k [00:00<?, ?B/s]

.. parsed-literal::

    image 1/1 /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/261-fast-segment-anything/coco_bike.jpg: 768x1024 37 objects, 631.0ms
    Speed: 3.8ms preprocess, 631.0ms inference, 21.8ms postprocess per image at shape (1, 3, 768, 1024)

The model returns segmentation maps for all the objects in the image.
Observe the results below.

.. code:: ipython3

    Image.fromarray(results[0].plot()[..., ::-1])

.. image:: 261-fast-segment-anything-with-output_files/261-fast-segment-anything-with-output_9_0.png

Convert the model to OpenVINO Intermediate representation (IR) format
---------------------------------------------------------------------

The Ultralytics Model export API enables conversion of PyTorch models to
OpenVINO IR format. Under the hood it utilizes the
``openvino.convert_model`` method to acquire OpenVINO IR versions of the
models. The method requires a model object and example input for model
tracing. The FastSAM model itself is based on the YOLOv8 model.

.. code:: ipython3

    # instance segmentation model
    ov_model_path = Path(f"{model_name}_openvino_model/{model_name}.xml")
    if not ov_model_path.exists():
        ov_model = model.export(format="openvino", dynamic=True, half=False)

.. parsed-literal::

    Ultralytics YOLOv8.0.200 🚀 Python-3.8.10 torch-1.13.1+cpu CPU (Intel Core(TM) i9-10920X 3.50GHz)

    PyTorch: starting from 'FastSAM-x.pt' with input shape (1, 3, 1024, 1024) BCHW and output shape(s) ((1, 37, 21504), (1, 32, 256, 256)) (138.2 MB)

    ONNX: starting export with onnx 1.15.0 opset 16...
    ONNX: export success ✅ 3.5s, saved as 'FastSAM-x.onnx' (275.5 MB)

    OpenVINO: starting export with openvino 2023.1.0-12185-9e6b00e51cd-releases/2023/1...
    OpenVINO: export success ✅ 1.0s, saved as 'FastSAM-x_openvino_model/' (275.9 MB)

    Export complete (7.5s)
    Results saved to /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/261-fast-segment-anything
    Predict: yolo predict task=segment model=FastSAM-x_openvino_model imgsz=1024
    Validate: yolo val task=segment model=FastSAM-x_openvino_model imgsz=1024 data=ultralytics/datasets/sa.yaml
    Visualize: https://netron.app

Embedding the converted models into the original pipeline
---------------------------------------------------------

OpenVINO™ Runtime Python API is used to compile the model in OpenVINO IR
format. The
`Core <https://docs.openvino.ai/2022.3/api/ie_python_api/_autosummary/openvino.runtime.Core.html>`__
class provides access to the OpenVINO Runtime API. The ``core`` object,
which is an instance of the ``Core`` class, represents the API and is
used to compile the model.

.. code:: ipython3

    core = ov.Core()

Select inference device
~~~~~~~~~~~~~~~~~~~~~~~

Select the device that will be used for model inference with OpenVINO
from the dropdown list:

.. code:: ipython3

    DEVICE = widgets.Dropdown(
        options=core.available_devices + ["AUTO"],
        value="AUTO",
        description="Device:",
        disabled=False,
    )

    DEVICE

.. parsed-literal::

    Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')

Adapt OpenVINO models to the original pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we create wrapper classes for the OpenVINO model that we want to
embed in the original inference pipeline. Here are some of the things to
consider when adapting an OV model:

- Make sure that parameters passed by the original pipeline are
  forwarded to the compiled OV model properly; sometimes the OV model
  uses only a portion of the input arguments and some are ignored,
  sometimes you need to convert the argument to another data type or
  unwrap some data structures such as tuples or dictionaries.
- Guarantee that the wrapper class returns results to the pipeline in
  an expected format. In the example below you can see how we pack OV
  model outputs into a tuple of ``torch`` tensors.
- Pay attention to the model method used in the original pipeline for
  calling the model; it may not be the ``forward`` method! In this
  example, the model is a part of a ``predictor`` object and is called
  as an object, so we need to redefine the magic ``__call__`` method.

.. code:: ipython3

    class OVWrapper:
        def __init__(self, ov_model, device="CPU", stride=32) -> None:
            self.model = core.compile_model(ov_model, device_name=device)

            self.stride = stride
            self.pt = True
            self.fp16 = False
            self.names = {0: "object"}

        def __call__(self, im, **_):
            result = self.model(im)
            return torch.from_numpy(result[0]), torch.from_numpy(result[1])

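The adaptation points listed above can be tried out without OpenVINO
installed. The following is a minimal sketch, not the notebook's
implementation: ``CallableModelWrapper`` and ``fake_backend`` are
hypothetical names introduced only for illustration, and the toy backend
stands in for a compiled model. It shows a wrapper that accepts and
ignores extra keyword arguments, exposes the attributes a pipeline might
read, and repacks the backend outputs into a tuple.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


class CallableModelWrapper:
    """Hypothetical stand-in for a wrapper such as ``OVWrapper``."""

    def __init__(self, backend: Callable[[Any], Dict[int, Any]], stride: int = 32) -> None:
        self.backend = backend      # plays the role of the compiled model
        self.stride = stride        # attribute read by the surrounding pipeline
        self.names = {0: "object"}  # a single generic class name

    def __call__(self, im: Any, **_: Any) -> Tuple[Any, Any]:
        # Extra keyword arguments passed by the pipeline are accepted and ignored.
        result = self.backend(im)
        # Repack the indexed backend outputs into the tuple the pipeline expects.
        return result[0], result[1]


def fake_backend(im: Sequence[float]) -> Dict[int, Any]:
    """Toy backend (an assumption): returns two outputs keyed by index."""
    return {0: [x * 2 for x in im], 1: len(im)}


wrapper = CallableModelWrapper(fake_backend)
detections, protos = wrapper([1.0, 2.0], augment=False, visualize=False)
print(detections, protos)
```

Because the object is invoked directly rather than through ``forward``,
only ``__call__`` has to be defined for the pipeline to use it.
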
Now we initialize the wrapper objects and load them to the FastSAM
pipeline.

.. code:: ipython3

    wrapped_model = OVWrapper(ov_model_path, device=DEVICE.value, stride=model.predictor.model.stride)
    model.predictor.model = wrapped_model

    ov_results = model(image_uri, device=DEVICE.value, retina_masks=True, imgsz=640, conf=0.6, iou=0.9)

.. parsed-literal::

    image 1/1 /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/261-fast-segment-anything/coco_bike.jpg: 480x640 33 objects, 353.6ms
    Speed: 3.5ms preprocess, 353.6ms inference, 14.7ms postprocess per image at shape (1, 3, 480, 640)

The next cell shows the output of the converted model; it closely
matches the output of the original model. Note that this run uses
``imgsz=640`` rather than 1024, so the number of detected objects
differs slightly (33 versus 37).

.. code:: ipython3

    Image.fromarray(ov_results[0].plot()[..., ::-1])

.. image:: 261-fast-segment-anything-with-output_files/261-fast-segment-anything-with-output_21_0.png

Optimize the model using NNCF Post-training Quantization API
------------------------------------------------------------

`NNCF <https://github.com/openvinotoolkit/nncf>`__ provides a suite of
advanced algorithms for neural network inference optimization in
OpenVINO with minimal accuracy drop. We will use 8-bit quantization in
post-training mode (without the fine-tuning pipeline) to optimize
FastSAM.

The optimization process contains the following steps:

1. Create a Dataset for quantization.
2. Run ``nncf.quantize`` to obtain a quantized model.
3. Save the INT8 model using the ``openvino.save_model()`` function.

.. code:: ipython3

    do_quantize = widgets.Checkbox(
        value=True,
        description='Quantization',
        disabled=False,
    )

    do_quantize

.. parsed-literal::

    Checkbox(value=True, description='Quantization')

The ``nncf.quantize`` function provides an interface for model
quantization. It requires an instance of the OpenVINO Model and a
quantization dataset. Optionally, some additional parameters for the
configuration of the quantization process (number of samples for
quantization, preset, ignored scope, etc.) can be provided. The YOLOv8
model backing FastSAM contains non-ReLU activation functions, which
require asymmetric quantization of activations. To achieve a better
result, we will use a ``mixed`` quantization preset. It provides
symmetric quantization of weights and asymmetric quantization of
activations. For more accurate results, we should keep the operations in
the postprocessing subgraph in floating point precision, using the
``ignored_scope`` parameter.

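To see why a non-ReLU activation benefits from asymmetric quantization,
the following sketch quantizes sample activation values to 8 bits in
both ways and compares the rounding error. It is a simplified
illustration in plain Python, not NNCF's algorithm: SiLU is assumed here
as a representative non-ReLU activation, and the helper names
(``quantize_symmetric``, ``quantize_asymmetric``) are hypothetical.

```python
import math
from typing import List, Sequence


def silu(x: float) -> float:
    """SiLU activation, x * sigmoid(x); its outputs are mostly positive."""
    return x / (1.0 + math.exp(-x))


def quantize_symmetric(values: Sequence[float], num_bits: int = 8) -> List[float]:
    """Quantize to a signed grid centred on zero, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def quantize_asymmetric(values: Sequence[float], num_bits: int = 8) -> List[float]:
    """Quantize to an unsigned grid spanning [min, max], then dequantize."""
    qmax = 2 ** num_bits - 1  # 255 for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)  # integer level that represents 0.0
    result = []
    for v in values:
        q = max(0, min(qmax, round(v / scale) + zero_point))
        result.append((q - zero_point) * scale)
    return result


def max_abs_error(reference: Sequence[float], approx: Sequence[float]) -> float:
    return max(abs(r - a) for r, a in zip(reference, approx))


# Activation values for inputs sampled evenly from [-6, 6].
activations = [silu(-6.0 + 12.0 * i / 2000) for i in range(2001)]

sym_error = max_abs_error(activations, quantize_symmetric(activations))
asym_error = max_abs_error(activations, quantize_asymmetric(activations))
print(f"symmetric max error:  {sym_error:.4f}")
print(f"asymmetric max error: {asym_error:.4f}")
```

The symmetric grid spends almost half of its levels on negative values
that this activation barely produces, so its step size is roughly twice
as large as that of the asymmetric grid over the same data.
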
The quantization algorithm is based on `The YOLOv8 quantization
example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/openvino/yolov8>`__
in the NNCF repo; refer to it for more details. Moreover, you can check
out other quantization tutorials in the `OV notebooks
repo <https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/230-yolov8-optimization>`__.

**Note**: Model post-training quantization is a time-consuming process.
Be patient, it can take several minutes depending on your hardware.

.. code:: ipython3

    %%skip not $do_quantize.value

    import pickle
    from contextlib import contextmanager
    from zipfile import ZipFile

    import cv2
    from tqdm.autonotebook import tqdm

    import nncf


    COLLECT_CALIBRATION_DATA = False
    calibration_data = []

    @contextmanager
    def calibration_data_collection():
        global COLLECT_CALIBRATION_DATA
        try:
            COLLECT_CALIBRATION_DATA = True
            yield
        finally:
            COLLECT_CALIBRATION_DATA = False


    class NNCFWrapper:
        def __init__(self, ov_model, stride=32) -> None:
            self.model = core.read_model(ov_model)
            self.compiled_model = core.compile_model(self.model, device_name="CPU")

            self.stride = stride
            self.pt = True
            self.fp16 = False
            self.names = {0: "object"}

        def __call__(self, im, **_):
            if COLLECT_CALIBRATION_DATA:
                calibration_data.append(im)

            result = self.compiled_model(im)
            return torch.from_numpy(result[0]), torch.from_numpy(result[1])

    # Fetch data from the web and describe a dataloader
    DATA_URL = "https://ultralytics.com/assets/coco128.zip"
    OUT_DIR = Path('.')

    download_file(DATA_URL, directory=OUT_DIR, show_progress=True)

    if not (OUT_DIR / "coco128/images/train2017").exists():
        with ZipFile('coco128.zip', "r") as zip_ref:
            zip_ref.extractall(OUT_DIR)

    class COCOLoader(torch.utils.data.Dataset):
        def __init__(self, images_path):
            self.images = list(Path(images_path).iterdir())

        def __getitem__(self, index):
            if isinstance(index, slice):
                return [self.read_image(image_path) for image_path in self.images[index]]
            return self.read_image(self.images[index])

        def read_image(self, image_path):
            image = cv2.imread(str(image_path))
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            return image

        def __len__(self):
            return len(self.images)


    def collect_calibration_data_for_decoder(model, calibration_dataset_size: int,
                                             calibration_cache_path: Path):
        global calibration_data


        if not calibration_cache_path.exists():
            coco_dataset = COCOLoader(OUT_DIR / 'coco128/images/train2017')
            with calibration_data_collection():
                for image in tqdm(coco_dataset[:calibration_dataset_size], desc="Collecting calibration data"):
                    model(image, retina_masks=True, imgsz=640, conf=0.6, iou=0.9, verbose=False)
            calibration_cache_path.parent.mkdir(parents=True, exist_ok=True)
            with open(calibration_cache_path, "wb") as f:
                pickle.dump(calibration_data, f)
        else:
            with open(calibration_cache_path, "rb") as f:
                calibration_data = pickle.load(f)

        return calibration_data


    def quantize(model, save_model_path: Path, calibration_cache_path: Path,
                 calibration_dataset_size: int, preset: nncf.QuantizationPreset):
        calibration_data = collect_calibration_data_for_decoder(
            model, calibration_dataset_size, calibration_cache_path)
        quantized_ov_decoder = nncf.quantize(
            model.predictor.model.model,
            calibration_dataset=nncf.Dataset(calibration_data),
            preset=preset,
            subset_size=len(calibration_data),
            fast_bias_correction=True,
            ignored_scope=nncf.IgnoredScope(
                types=["Multiply", "Subtract", "Sigmoid"],  # ignore operations
                names=[
                    "/model.22/dfl/conv/Conv",  # in the post-processing subgraph
                    "/model.22/Add",
                    "/model.22/Add_1",
                    "/model.22/Add_2",
                    "/model.22/Add_3",
                    "/model.22/Add_4",
                    "/model.22/Add_5",
                    "/model.22/Add_6",
                    "/model.22/Add_7",
                    "/model.22/Add_8",
                    "/model.22/Add_9",
                    "/model.22/Add_10",
                ],
            )
        )
        ov.save_model(quantized_ov_decoder, save_model_path)

    wrapped_model = NNCFWrapper(ov_model_path, stride=model.predictor.model.stride)
    model.predictor.model = wrapped_model

    calibration_dataset_size = 128
    quantized_model_path = Path(f"{model_name}_quantized") / "FastSAM-x.xml"
    calibration_cache_path = Path(f"calibration_data/coco{calibration_dataset_size}.pkl")
    if not quantized_model_path.exists():
        quantize(model, quantized_model_path, calibration_cache_path,
                 calibration_dataset_size=calibration_dataset_size,
                 preset=nncf.QuantizationPreset.MIXED)

.. parsed-literal::

    INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino

.. parsed-literal::

    coco128.zip: 0%| | 0.00/6.66M [00:00<?, ?B/s]

.. parsed-literal::

    Collecting calibration data: 0%| | 0/128 [00:00<?, ?it/s]

.. parsed-literal::

    INFO:nncf:12 ignored nodes was found by name in the NNCFGraph
    INFO:nncf:9 ignored nodes was found by types in the NNCFGraph
    INFO:nncf:Not adding activation input quantizer for operation: 204 /model.22/Sigmoid
    INFO:nncf:Not adding activation input quantizer for operation: 246 /model.22/dfl/conv/Conv
    INFO:nncf:Not adding activation input quantizer for operation: 275 /model.22/Sub
    INFO:nncf:Not adding activation input quantizer for operation: 276 /model.22/Add_10
    INFO:nncf:Not adding activation input quantizer for operation: 297 /model.22/Sub_1
    INFO:nncf:Not adding activation input quantizer for operation: 334 /model.22/Mul_5

.. parsed-literal::

    Statistics collection: 100%|██████████| 128/128 [01:07<00:00, 1.91it/s]
    Applying Fast Bias correction: 100%|██████████| 115/115 [00:30<00:00, 3.76it/s]

Compare the performance of the Original and Quantized Models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, we iterate both the OV model and the quantized model over the
calibration dataset to measure the performance.

.. code:: ipython3

    %%skip not $do_quantize.value

    import datetime

    coco_dataset = COCOLoader(OUT_DIR / 'coco128/images/train2017')
    calibration_dataset_size = 128

    wrapped_model = OVWrapper(ov_model_path, device=DEVICE.value, stride=model.predictor.model.stride)
    model.predictor.model = wrapped_model

    start_time = datetime.datetime.now()
    for image in tqdm(coco_dataset, desc="Measuring inference time"):
        model(image, retina_masks=True, imgsz=640, conf=0.6, iou=0.9, verbose=False)
    duration_base = (datetime.datetime.now() - start_time).seconds
    print("Segmented in", duration_base, "seconds.")
    print("Resulting in", round(calibration_dataset_size / duration_base, 2), "fps")

.. parsed-literal::

    Measuring inference time: 0%| | 0/128 [00:00<?, ?it/s]

.. parsed-literal::

    Segmented in 21 seconds.
    Resulting in 6.1 fps

.. code:: ipython3

    %%skip not $do_quantize.value

    quantized_wrapped_model = OVWrapper(quantized_model_path, device=DEVICE.value, stride=model.predictor.model.stride)
    model.predictor.model = quantized_wrapped_model

    start_time = datetime.datetime.now()
    for image in tqdm(coco_dataset, desc="Measuring inference time"):
        model(image, retina_masks=True, imgsz=640, conf=0.6, iou=0.9, verbose=False)
    duration_quantized = (datetime.datetime.now() - start_time).seconds
    print("Segmented in", duration_quantized, "seconds")
    print("Resulting in", round(calibration_dataset_size / duration_quantized, 2), "fps")
    print("That is", round(duration_base / duration_quantized, 2), "times faster!")

.. parsed-literal::

    Measuring inference time: 0%| | 0/128 [00:00<?, ?it/s]

.. parsed-literal::

    Segmented in 11 seconds
    Resulting in 11.64 fps
    That is 1.91 times faster!

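The measurements above read ``timedelta.seconds``, which keeps only
whole seconds, so the reported frame rates are coarse for short runs. A
possible alternative is sketched below using ``time.perf_counter``.
``measure_fps`` and ``fake_inference`` are hypothetical names used only
for illustration; in the notebook the callable would be the ``model``
invocation from the loops above.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_fps(run_once: Callable[[object], object],
                samples: Iterable[object]) -> Tuple[float, float]:
    """Return (elapsed seconds, samples per second) with sub-second resolution."""
    count = 0
    start = time.perf_counter()
    for sample in samples:
        run_once(sample)
        count += 1
    elapsed = time.perf_counter() - start
    fps = count / elapsed if elapsed > 0 else float("inf")
    return elapsed, fps


def fake_inference(sample: object) -> object:
    """Placeholder workload (an assumption) standing in for a model call."""
    time.sleep(0.001)
    return sample


elapsed, fps = measure_fps(fake_inference, range(20))
print(f"Processed 20 samples in {elapsed:.3f} s ({fps:.1f} fps)")
```

Counting the processed samples inside the loop also avoids relying on a
separately maintained dataset size when computing the frame rate.
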
Try out the converted pipeline
------------------------------

The demo app below is created using `Gradio
package <https://www.gradio.app/docs/interface>`__.

The app allows you to alter the model output interactively. Using the
``Pixel selector type`` switch, you can place foreground/background
points or bounding boxes on the input image.

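The effect of foreground and background points can be sketched in plain
Python. This is a simplified illustration that uses nested lists instead
of tensors; ``scale_point`` and ``merge_masks_by_points`` are
hypothetical helper names. Points are first rescaled to the resized
image, then masks are visited from largest to smallest: a foreground
point switches on every pixel of a mask that contains it, and a
background point switches those pixels off again.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def scale_point(point: Tuple[int, int], original_size: Tuple[int, int],
                input_size: int) -> Tuple[int, int]:
    """Map a click on the original image to the resized image."""
    width, height = original_size
    scale = input_size / max(width, height)
    return int(point[0] * scale), int(point[1] * scale)


def merge_masks_by_points(masks: Sequence[Mask],
                          points: Sequence[Tuple[int, int]],
                          labels: Sequence[int]) -> Mask:
    """Combine masks according to foreground (1) and background (0) points."""
    height, width = len(masks[0]), len(masks[0][0])
    merged = [[0] * width for _ in range(height)]
    # Visit larger masks first so that smaller ones can refine the result.
    ordered = sorted(masks, key=lambda m: sum(map(sum, m)), reverse=True)
    for mask in ordered:
        for (x, y), label in zip(points, labels):
            if not mask[y][x]:
                continue  # the point does not fall inside this mask
            for row in range(height):
                for col in range(width):
                    if mask[row][col]:
                        merged[row][col] = 1 if label == 1 else 0
    return merged


large = [[1, 1, 1, 0] for _ in range(4)]
small = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]

# One foreground point inside `large`, one background point inside `small`.
merged = merge_masks_by_points([small, large], points=[(0, 3), (3, 0)], labels=[1, 0])
for row in merged:
    print(row)

print(scale_point((200, 100), original_size=(2048, 1024), input_size=1024))
```

In this toy case the background point carves the overlap with ``small``
out of the region selected by the foreground point.
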
.. code:: ipython3

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    def fast_process(
        annotations,
        image,
        scale,
        better_quality=False,
        mask_random_color=True,
        bbox=None,
        use_retina=True,
        with_contours=True,
    ):

        original_h = image.height
        original_w = image.width

        if better_quality:
            for i, mask in enumerate(annotations):
                mask = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
                annotations[i] = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_OPEN, np.ones((8, 8), np.uint8))

        inner_mask = fast_show_mask(
            annotations,
            plt.gca(),
            random_color=mask_random_color,
            bbox=bbox,
            retinamask=use_retina,
            target_height=original_h,
            target_width=original_w,
        )

        if with_contours:
            contour_all = []
            temp = np.zeros((original_h, original_w, 1))
            for i, mask in enumerate(annotations):
                annotation = mask.astype(np.uint8)
                if not use_retina:
                    annotation = cv2.resize(
                        annotation,
                        (original_w, original_h),
                        interpolation=cv2.INTER_NEAREST,
                    )
                contours, _ = cv2.findContours(annotation, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
                for contour in contours:
                    contour_all.append(contour)
            cv2.drawContours(temp, contour_all, -1, (255, 255, 255), 2 // scale)
            color = np.array([0 / 255, 0 / 255, 255 / 255, 0.9])
            contour_mask = temp / 255 * color.reshape(1, 1, -1)

        image = image.convert("RGBA")
        overlay_inner = Image.fromarray((inner_mask * 255).astype(np.uint8), "RGBA")
        image.paste(overlay_inner, (0, 0), overlay_inner)

        if with_contours:
            overlay_contour = Image.fromarray((contour_mask * 255).astype(np.uint8), "RGBA")
            image.paste(overlay_contour, (0, 0), overlay_contour)

        return image


    # CPU post process
    def fast_show_mask(
        annotation,
        ax,
        random_color=False,
        bbox=None,
        retinamask=True,
        target_height=960,
        target_width=960,
    ):
        mask_sum = annotation.shape[0]
        height = annotation.shape[1]
        width = annotation.shape[2]
        # Sort masks by area in ascending order ([::1] keeps the argsort order)
        areas = np.sum(annotation, axis=(1, 2))
        sorted_indices = np.argsort(areas)[::1]
        annotation = annotation[sorted_indices]

        index = (annotation != 0).argmax(axis=0)
        if random_color:
            color = np.random.random((mask_sum, 1, 1, 3))
        else:
            color = np.ones((mask_sum, 1, 1, 3)) * np.array([30 / 255, 144 / 255, 255 / 255])
        transparency = np.ones((mask_sum, 1, 1, 1)) * 0.6
        visual = np.concatenate([color, transparency], axis=-1)
        mask_image = np.expand_dims(annotation, -1) * visual

        mask = np.zeros((height, width, 4))

        h_indices, w_indices = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
        indices = (index[h_indices, w_indices], h_indices, w_indices, slice(None))

        mask[h_indices, w_indices, :] = mask_image[indices]
        if bbox is not None:
            x1, y1, x2, y2 = bbox
            ax.add_patch(plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, edgecolor="b", linewidth=1))

        if not retinamask:
            mask = cv2.resize(mask, (target_width, target_height), interpolation=cv2.INTER_NEAREST)

        return mask

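The way ``fast_show_mask`` decides which mask colours a pixel can be
sketched in plain Python. The sketch below is an illustration of the
idea rather than the notebook's vectorised code, and
``top_mask_per_pixel`` is a hypothetical name: masks are ordered by
ascending area, and each pixel is assigned to the first mask in that
order that covers it, so the smallest covering mask wins. Uncovered
pixels are marked with ``None`` here, whereas the original code leaves
them transparent.

```python
from typing import List, Optional, Sequence

Mask = List[List[int]]


def top_mask_per_pixel(masks: Sequence[Mask]) -> List[List[Optional[int]]]:
    """For every pixel, return the index of the smallest mask covering it."""
    # Mask indices ordered by ascending area.
    order = sorted(range(len(masks)), key=lambda i: sum(map(sum, masks[i])))
    height, width = len(masks[0]), len(masks[0][0])
    owner: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            for index in order:
                if masks[index][row][col]:
                    owner[row][col] = index
                    break  # the first hit is the smallest covering mask
    return owner


big = [[1, 1, 1], [1, 1, 1]]
small = [[0, 1, 0], [0, 0, 0]]

owner = top_mask_per_pixel([big, small])
print(owner)
```

Letting the smallest mask win keeps small objects visible when they lie
inside a larger segmented region.
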
.. code:: ipython3

    import gradio as gr

    examples = [[image_uri], ["https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/empty_road_mapillary.jpg"],
                ["https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/wall.jpg"]]

    object_points = []
    background_points = []
    bbox_points = []
    last_image = examples[0][0]

This is the main callback function that is called to segment an image
based on user input.

.. code:: ipython3

    def segment(
            image,
            model_type,
            input_size=1024,
            iou_threshold=0.75,
            conf_threshold=0.4,
            better_quality=True,
            with_contours=True,
            use_retina=True,
            mask_random_color=True,
    ):
        if do_quantize.value and model_type == 'Quantized model':
            model.predictor.model = quantized_wrapped_model
        else:
            model.predictor.model = wrapped_model

        input_size = int(input_size)
        w, h = image.size
        scale = input_size / max(w, h)
        new_w = int(w * scale)
        new_h = int(h * scale)
        image = image.resize((new_w, new_h))

        results = model(image,
                        device=DEVICE.value,
                        retina_masks=use_retina,
                        iou=iou_threshold,
                        conf=conf_threshold,
                        imgsz=input_size,)

        masks = results[0].masks.data
        # Calculate annotations
        if not (object_points or bbox_points):
            annotations = masks.cpu().numpy()
        else:
            annotations = []

        if object_points:
            all_points = object_points + background_points
            labels = [1] * len(object_points) + [0] * len(background_points)
            scaled_points = [[int(x * scale) for x in point] for point in all_points]
            h, w = masks[0].shape[:2]
            assert max(h, w) == input_size
            onemask = np.zeros((h, w))
            for mask in sorted(masks, key=lambda x: x.sum(), reverse=True):
                mask_np = (mask == 1.0).cpu().numpy()
                for point, label in zip(scaled_points, labels):
                    if mask_np[point[1], point[0]] == 1 and label == 1:
                        onemask[mask_np] = 1
                    if mask_np[point[1], point[0]] == 1 and label == 0:
                        onemask[mask_np] = 0
            annotations.append(onemask >= 1)
        if len(bbox_points) >= 2:
            scaled_bbox_points = []
            for i, point in enumerate(bbox_points):
                x, y = int(point[0] * scale), int(point[1] * scale)
                x = max(min(x, new_w), 0)
                y = max(min(y, new_h), 0)
                scaled_bbox_points.append((x, y))

            for i in range(0, len(scaled_bbox_points) - 1, 2):
                x0, y0, x1, y1 = *scaled_bbox_points[i], *scaled_bbox_points[i + 1]

                intersection_area = torch.sum(masks[:, y0:y1, x0:x1], dim=(1, 2))
                masks_area = torch.sum(masks, dim=(1, 2))
                bbox_area = (y1 - y0) * (x1 - x0)

                union = bbox_area + masks_area - intersection_area
                iou = intersection_area / union
                max_iou_index = torch.argmax(iou)

                annotations.append(masks[max_iou_index].cpu().numpy())

        return fast_process(
            annotations=np.array(annotations),
            image=image,
            scale=(1024 // input_size),
            better_quality=better_quality,
            mask_random_color=mask_random_color,
            bbox=None,
            use_retina=use_retina,
            with_contours=with_contours
        )

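The bounding-box branch of ``segment`` picks, for each box, the mask
with the highest intersection-over-union (IoU) against that box. The
sketch below restates that selection in plain Python on nested lists so
it can be run without ``torch``. ``select_mask_by_box`` is a
hypothetical helper name, and the guard against an empty union is an
addition that the original code does not have.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def select_mask_by_box(masks: Sequence[Mask], box: Tuple[int, int, int, int]) -> int:
    """Return the index of the mask with the highest IoU against the box."""
    x0, y0, x1, y1 = box
    box_area = (x1 - x0) * (y1 - y0)
    best_index, best_iou = -1, -1.0
    for index, mask in enumerate(masks):
        mask_area = sum(sum(row) for row in mask)
        # Mask pixels that fall inside the box.
        inside = sum(sum(row[x0:x1]) for row in mask[y0:y1])
        union = box_area + mask_area - inside
        iou = inside / union if union else 0.0
        if iou > best_iou:
            best_index, best_iou = index, iou
    return best_index


mask_a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mask_b = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]

top_left = select_mask_by_box([mask_a, mask_b], box=(0, 0, 2, 2))
bottom = select_mask_by_box([mask_a, mask_b], box=(0, 2, 4, 4))
print(top_left, bottom)
```

Boxes are given as ``(x0, y0, x1, y1)`` with the upper bounds excluded,
which matches the slicing used in ``segment``.
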
.. code:: ipython3

    def select_point(img: Image.Image, point_type: str, evt: gr.SelectData) -> Image.Image:
        """Gradio select callback."""
        img = img.convert("RGBA")
        x, y = evt.index[0], evt.index[1]
        point_radius = np.round(max(img.size) / 100)
        if point_type == "Object point":
            object_points.append((x, y))
            color = (30, 255, 30, 200)
        elif point_type == "Background point":
            background_points.append((x, y))
            color = (255, 30, 30, 200)
        elif point_type == "Bounding Box":
            bbox_points.append((x, y))
            color = (10, 10, 255, 255)
            if len(bbox_points) % 2 == 0:
                # Draw a rectangle if number of points is even
                new_img = Image.new("RGBA", img.size, (255, 255, 255, 0))
                _draw = ImageDraw.Draw(new_img)
                x0, y0, x1, y1 = *bbox_points[-2], *bbox_points[-1]
                x0, x1 = sorted([x0, x1])
                y0, y1 = sorted([y0, y1])
                # Save sorted order
                bbox_points[-2] = (x0, y0)
                bbox_points[-1] = (x1, y1)
                _draw.rectangle((x0, y0, x1, y1), fill=(*color[:-1], 90))
                img = Image.alpha_composite(img, new_img)
        # Draw a point
        ImageDraw.Draw(img).ellipse(
            [(x - point_radius, y - point_radius), (x + point_radius, y + point_radius)],
            fill=color
        )
        return img

    def clear_points() -> (Image.Image, None):
        """Gradio clear points callback."""
        global object_points, background_points, bbox_points
        object_points = []
        background_points = []
        bbox_points = []
        return last_image, None

    def save_last_picked_image(img: Image.Image) -> None:
        """Gradio callback saves the last used image."""
        global last_image
        last_image = img
        # If we change the input image
        # we should clear all the previous points
        clear_points()
        # Removes the segmentation map output
        return None

    with gr.Blocks(title="Fast SAM") as demo:
        with gr.Row(variant="panel"):
            original_img = gr.Image(label="Input", value=examples[0][0], type="pil")
            segmented_img = gr.Image(label="Segmentation Map", type="pil")
        with gr.Row():
            point_type = gr.Radio(
                ["Object point", "Background point", "Bounding Box"],
                value="Object point", label="Pixel selector type"
            )
            model_type = gr.Radio(
                ["FP32 model", "Quantized model"] if do_quantize.value else ["FP32 model"],
                value="FP32 model", label="Select model variant"
            )
        with gr.Row(variant="panel"):
            segment_button = gr.Button("Segment", variant="primary")
            clear_button = gr.Button("Clear points", variant="secondary")
        gr.Examples(examples, inputs=original_img,
                    fn=save_last_picked_image, run_on_click=True, outputs=segmented_img
        )

        # Callbacks
        original_img.select(select_point,
                            inputs=[original_img, point_type],
                            outputs=original_img)
        original_img.upload(save_last_picked_image, inputs=original_img, outputs=segmented_img)
        clear_button.click(clear_points, outputs=[original_img, segmented_img])
        segment_button.click(segment, inputs=[original_img, model_type], outputs=segmented_img)

    try:
        demo.queue().launch(debug=False)
    except Exception:
        demo.queue().launch(share=True, debug=False)

    # If you are launching remotely, specify server_name and server_port
    # EXAMPLE: `demo.launch(server_name="your server name", server_port="server port in int")`
    # To learn more please refer to the Gradio docs: https://gradio.app/docs/

.. parsed-literal::

    Running on local URL: http://127.0.0.1:7860

    To create a public link, set `share=True` in `launch()`.
