Paint By Example: Exemplar-based Image Editing with Diffusion Models
====================================================================

Stable Diffusion in Diffusers library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To work with Stable Diffusion, we will use the Hugging Face
`Diffusers <https://github.com/huggingface/diffusers>`__ library. For
in-painting, Diffusers exposes the
`StableDiffusionInpaintPipeline <https://huggingface.co/docs/diffusers/using-diffusers/conditional_image_generation>`__,
which is used like the `other Diffusers
pipelines <https://huggingface.co/docs/diffusers/api/pipelines/overview>`__.
The code below loads the Paint-by-Example model through the generic
``DiffusionPipeline`` class and reuses its components. To create the
drawing tool, we will install Gradio to handle user interaction.

This is the overall flow of the application: |Flow Diagram|

This is the detailed flowchart for the pipeline: |pipeline-flowchart|

.. |Flow Diagram| image:: https://user-images.githubusercontent.com/103226580/236954918-f364b227-293c-4f78-a9bf-9dcebcb1034a.png
.. |pipeline-flowchart| image:: https://github.com/openvinotoolkit/openvino_notebooks/assets/103226580/cde2d5c4-2540-4a45-ad9c-339f7a69459d

.. code:: ipython3

    %pip install -q "gradio == 3.50.2"
    %pip install -q "diffusers>=0.14.0" "openvino>=2023.2.0" "transformers >= 4.25.1"


.. parsed-literal::

    Collecting gradio==3.50.2
      Downloading gradio-3.50.2-py3-none-any.whl (20.3 MB)
         ---------------------------------------- 20.3/20.3 MB 8.5 MB/s eta 0:00:00
    Requirement already satisfied: aiofiles<24.0,>=22.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (22.1.0)
    Requirement already satisfied: altair<6.0,>=4.2.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (4.2.2)
    Requirement already satisfied: fastapi in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (0.95.1)
    Requirement already satisfied: ffmpy in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (0.3.0)
    Collecting gradio-client==0.6.1 (from gradio==3.50.2)
      Downloading gradio_client-0.6.1-py3-none-any.whl (299 kB)
         -------------------------------------- 299.2/299.2 kB 6.3 MB/s eta 0:00:00
    Requirement already satisfied: httpx in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (0.24.0)
    Requirement already satisfied: huggingface-hub>=0.14.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (0.14.1)
    Collecting importlib-resources<7.0,>=1.3 (from gradio==3.50.2)
      Downloading importlib_resources-6.1.1-py3-none-any.whl (33 kB)
    Requirement already satisfied: jinja2<4.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (3.1.2)
    Requirement already satisfied: markupsafe~=2.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (2.1.2)
    Requirement already satisfied: matplotlib~=3.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (3.5.2)
    Requirement already satisfied: numpy~=1.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (1.23.4)
    Requirement already satisfied: orjson~=3.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (3.8.11)
    Requirement already satisfied: packaging in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (23.1)
    Requirement already satisfied: pandas<3.0,>=1.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (1.3.5)
    Requirement already satisfied: pillow<11.0,>=8.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (9.5.0)
    Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,<3.0.0,>=1.7.4 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (1.10.7)
    Requirement already satisfied: pydub in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (0.25.1)
    Requirement already satisfied: python-multipart in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (0.0.6)
    Requirement already satisfied: pyyaml<7.0,>=5.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (6.0)
    Requirement already satisfied: requests~=2.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (2.29.0)
    Requirement already satisfied: semantic-version~=2.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (2.10.0)
    Requirement already satisfied: typing-extensions~=4.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (4.5.0)
    Requirement already satisfied: uvicorn>=0.14.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (0.22.0)
    Requirement already satisfied: websockets<12.0,>=10.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio==3.50.2) (11.0.2)
    Requirement already satisfied: fsspec in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from gradio-client==0.6.1->gradio==3.50.2) (2023.4.0)
    Requirement already satisfied: entrypoints in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from altair<6.0,>=4.2.0->gradio==3.50.2) (0.4)
    Requirement already satisfied: jsonschema>=3.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from altair<6.0,>=4.2.0->gradio==3.50.2) (4.17.3)
    Requirement already satisfied: toolz in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from altair<6.0,>=4.2.0->gradio==3.50.2) (0.12.0)
    Requirement already satisfied: filelock in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from huggingface-hub>=0.14.0->gradio==3.50.2) (3.12.0)
    Requirement already satisfied: tqdm>=4.42.1 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from huggingface-hub>=0.14.0->gradio==3.50.2) (4.65.0)
    Requirement already satisfied: cycler>=0.10 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from matplotlib~=3.0->gradio==3.50.2) (0.11.0)
    Requirement already satisfied: fonttools>=4.22.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from matplotlib~=3.0->gradio==3.50.2) (4.39.3)
    Requirement already satisfied: kiwisolver>=1.0.1 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from matplotlib~=3.0->gradio==3.50.2) (1.4.4)
    Requirement already satisfied: pyparsing>=2.2.1 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from matplotlib~=3.0->gradio==3.50.2) (2.4.7)
    Requirement already satisfied: python-dateutil>=2.7 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from matplotlib~=3.0->gradio==3.50.2) (2.8.2)
    Requirement already satisfied: pytz>=2017.3 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from pandas<3.0,>=1.0->gradio==3.50.2) (2023.3)
    Requirement already satisfied: charset-normalizer<4,>=2 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from requests~=2.0->gradio==3.50.2) (3.1.0)
    Requirement already satisfied: idna<4,>=2.5 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from requests~=2.0->gradio==3.50.2) (3.4)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from requests~=2.0->gradio==3.50.2) (1.26.15)
    Requirement already satisfied: certifi>=2017.4.17 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from requests~=2.0->gradio==3.50.2) (2022.12.7)
    Requirement already satisfied: click>=7.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from uvicorn>=0.14.0->gradio==3.50.2) (8.1.3)
    Requirement already satisfied: h11>=0.8 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from uvicorn>=0.14.0->gradio==3.50.2) (0.14.0)
    Requirement already satisfied: starlette<0.27.0,>=0.26.1 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from fastapi->gradio==3.50.2) (0.26.1)
    Requirement already satisfied: httpcore<0.18.0,>=0.15.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from httpx->gradio==3.50.2) (0.17.0)
    Requirement already satisfied: sniffio in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from httpx->gradio==3.50.2) (1.3.0)
    Requirement already satisfied: colorama in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from click>=7.0->uvicorn>=0.14.0->gradio==3.50.2) (0.4.6)
    Requirement already satisfied: anyio<5.0,>=3.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from httpcore<0.18.0,>=0.15.0->httpx->gradio==3.50.2) (3.6.2)
    Requirement already satisfied: attrs>=17.4.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.50.2) (23.1.0)
    Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.50.2) (0.19.3)
    Requirement already satisfied: six>=1.5 in c:\hackathon\openvino_notebooks\venv310\lib\site-packages (from python-dateutil>=2.7->matplotlib~=3.0->gradio==3.50.2) (1.16.0)
    Installing collected packages: importlib-resources, gradio-client, gradio
      Attempting uninstall: gradio-client
        Found existing installation: gradio_client 0.1.4
        Uninstalling gradio_client-0.1.4:
          Successfully uninstalled gradio_client-0.1.4
      Attempting uninstall: gradio
        Found existing installation: gradio 3.28.1
        Uninstalling gradio-3.28.1:
          Successfully uninstalled gradio-3.28.1
    Successfully installed gradio-3.50.2 gradio-client-0.6.1 importlib-resources-6.1.1
    Note: you may need to restart the kernel to use updated packages.


.. parsed-literal::


    [notice] A new release of pip is available: 23.1 -> 23.3.1
    [notice] To update, run: python.exe -m pip install --upgrade pip



Download the model from `Hugging Face
Paint-by-Example <https://huggingface.co/Fantasy-Studio/Paint-by-Example>`__.
This might take several minutes because the model is over 5 GB.

.. code:: ipython3

    from diffusers import DPMSolverMultistepScheduler, DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained("Fantasy-Studio/Paint-By-Example")

    scheduler_inpaint = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)


.. parsed-literal::

    Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with:
    ```
    pip install accelerate
    ```
    .
    You are using a model of type clip_vision_model to instantiate a model of type clip. This is not supported for all configurations of models and can yield errors.


.. code:: ipython3

    import gc

    extractor = pipeline.feature_extractor
    image_encoder = pipeline.image_encoder
    image_encoder.eval()
    unet_inpaint = pipeline.unet
    unet_inpaint.eval()
    vae_inpaint = pipeline.vae
    vae_inpaint.eval()

    del pipeline
    gc.collect();
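
The cell above keeps references to the sub-models and then deletes the
pipeline wrapper to free memory. The following library-free sketch
illustrates why the extracted parts stay usable; ``Wrapper`` and ``Part``
are hypothetical stand-ins, not Diffusers classes.

.. code:: ipython3

    import gc
    import weakref


    class Part:
        """Hypothetical stand-in for a sub-model such as the U-Net."""

        def __init__(self, name):
            self.name = name


    class Wrapper:
        """Hypothetical stand-in for the pipeline object that owns the parts."""

        def __init__(self):
            self.unet = Part("unet")
            self.vae = Part("vae")


    wrapper = Wrapper()
    unet = wrapper.unet                  # keep a direct reference to one part
    wrapper_ref = weakref.ref(wrapper)   # lets us observe when the wrapper is freed
    vae_ref = weakref.ref(wrapper.vae)   # this part has no other strong reference

    del wrapper
    gc.collect()

    wrapper_alive = wrapper_ref() is not None
    vae_alive = vae_ref() is not None
    print(unet.name, wrapper_alive, vae_alive)

Only objects that are still referenced elsewhere survive the ``del``,
which is why the notebook extracts every needed component first.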

Download default images
~~~~~~~~~~~~~~~~~~~~~~~

Download the default input images into ``data/image`` and the reference
(exemplar) images into ``data/reference``.

.. code:: ipython3

    # Fetch `notebook_utils` module
    import urllib.request
    urllib.request.urlretrieve(
        url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
        filename='notebook_utils.py'
    )

    from notebook_utils import download_file

    download_file("https://github-production-user-asset-6210df.s3.amazonaws.com/103226580/286377210-edc98e97-0e43-4796-b771-dacd074c39ea.png", "0.png", "data/image")

    download_file("https://github-production-user-asset-6210df.s3.amazonaws.com/103226580/286377233-b2c2d902-d379-415a-8183-5bdd37c52429.png", "1.png", "data/image")

    download_file("https://github-production-user-asset-6210df.s3.amazonaws.com/103226580/286377248-da1db61e-3521-4cdb-85c8-1386d360ce22.png", "2.png", "data/image")

    download_file("https://github-production-user-asset-6210df.s3.amazonaws.com/103226580/286377279-fa496f17-e850-4351-87c5-2552dfbc4633.jpg", "bird.jpg", "data/reference")

    download_file("https://github-production-user-asset-6210df.s3.amazonaws.com/103226580/286377298-06a25ff2-84d8-4d46-95cd-8c25efa690d8.jpg", "car.jpg", "data/reference")

    download_file("https://github-production-user-asset-6210df.s3.amazonaws.com/103226580/286377318-8841a801-1933-4523-a433-7d2fb64c47e6.jpg", "dog.jpg", "data/reference")



.. parsed-literal::

    WindowsPath('C:/hackathon/openvino_notebooks/notebooks/272-paint-by-example/data/reference/dog.jpg')


Convert models to OpenVINO Intermediate Representation (IR) format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The conversion code is adapted from the `236 Stable Diffusion v2 Infinite Zoom
notebook <236-stable-diffusion-v2-with-output.html>`__.

.. code:: ipython3

    from pathlib import Path
    import torch
    import numpy as np
    import openvino as ov

    model_dir = Path("model")
    model_dir.mkdir(exist_ok=True)
    sd2_inpainting_model_dir = Path("model/paint_by_example")
    sd2_inpainting_model_dir.mkdir(exist_ok=True)
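
The two ``mkdir`` calls above work because the parent directory is
created first. As a minimal sketch, the same layout could also be created
in one call with ``parents=True``; the temporary directory is only there
to keep the example self-contained.

.. code:: ipython3

    import tempfile
    from pathlib import Path

    with tempfile.TemporaryDirectory() as tmp:
        nested_dir = Path(tmp) / "model" / "paint_by_example"
        # parents=True also creates "model"; exist_ok=True makes the call repeatable
        nested_dir.mkdir(parents=True, exist_ok=True)
        nested_dir.mkdir(parents=True, exist_ok=True)  # second call does nothing
        created = nested_dir.is_dir()

    print(created)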

Define the functions that convert each model to OpenVINO IR format:

.. code:: ipython3

    def cleanup_torchscript_cache():
        """
        Helper for removing cached model representation
        """
        torch._C._jit_clear_class_registry()
        torch.jit._recursive.concrete_type_store = torch.jit._recursive.ConcreteTypeStore()
        torch.jit._state._clear_class_state()


    def convert_image_encoder(image_encoder: torch.nn.Module, ir_path:Path):
        """
        Convert Image Encoder model to IR.
        Function accepts the image encoder, wraps it so that it also returns the unconditional vector,
        and prepares example inputs for conversion.
        Parameters:
            image_encoder (torch.nn.Module): image encoder PyTorch model
            ir_path (Path): File for storing model
        Returns:
            None
        """
        class ImageEncoderWrapper(torch.nn.Module):
            def __init__(self, image_encoder):
                super().__init__()
                self.image_encoder = image_encoder

            def forward(self, image):
                image_embeddings, negative_prompt_embeds = self.image_encoder(image, return_uncond_vector=True)
                return image_embeddings, negative_prompt_embeds

        if not ir_path.exists():
            image_encoder = ImageEncoderWrapper(image_encoder)
            # switch model to inference mode
            image_encoder.eval()
            # example image batch used for tracing
            input_ids = torch.randn((1,3,224,224))

            # disable gradients calculation for reducing memory consumption
            with torch.no_grad():
                ov_model = ov.convert_model(
                    image_encoder,
                    example_input=input_ids,
                    input=([1,3,224,224],)
                )
                ov.save_model(ov_model, ir_path)
                del ov_model
                cleanup_torchscript_cache()
            print('Image Encoder successfully converted to IR')


    def convert_unet(unet:torch.nn.Module, ir_path:Path, num_channels:int = 4, width:int = 64, height:int = 64):
        """
        Convert U-Net model to IR format.
        Function accepts the U-Net model and prepares example inputs for conversion.
        Parameters:
            unet (torch.nn.Module): UNet PyTorch model
            ir_path (Path): File for storing model
            num_channels (int, optional, 4): number of input channels
            width (int, optional, 64): input width
            height (int, optional, 64): input height
        Returns:
            None
        """
        dtype_mapping = {
            torch.float32: ov.Type.f32,
            torch.float64: ov.Type.f64
        }
        if not ir_path.exists():
            # prepare inputs
            encoder_hidden_state = torch.ones((2, 1, 768))
            latents_shape = (2, num_channels, height, width)  # NCHW layout: height before width
            latents = torch.randn(latents_shape)
            t = torch.from_numpy(np.array(1, dtype=np.float32))
            unet.eval()
            dummy_inputs = (latents, t, encoder_hidden_state)
            input_info = []
            for input_tensor in dummy_inputs:
                shape = ov.PartialShape(tuple(input_tensor.shape))
                element_type = dtype_mapping[input_tensor.dtype]
                input_info.append((shape, element_type))

            with torch.no_grad():
                ov_model = ov.convert_model(
                    unet,
                    example_input=dummy_inputs,
                    input=input_info
                )
                ov.save_model(ov_model, ir_path)
                del ov_model
                cleanup_torchscript_cache()
            print('U-Net successfully converted to IR')


    def convert_vae_encoder(vae: torch.nn.Module, ir_path: Path, width:int = 512, height:int = 512):
        """
        Convert VAE encoder model to IR format.
        Function accepts VAE model, creates a wrapper class that exports only the part needed for inference,
        and prepares example inputs for conversion.
        Parameters:
            vae (torch.nn.Module): VAE PyTorch model
            ir_path (Path): File for storing model
            width (int, optional, 512): input width
            height (int, optional, 512): input height
        Returns:
            None
        """
        class VAEEncoderWrapper(torch.nn.Module):
            def __init__(self, vae):
                super().__init__()
                self.vae = vae

            def forward(self, image):
                latents = self.vae.encode(image).latent_dist.sample()
                return latents

        if not ir_path.exists():
            vae_encoder = VAEEncoderWrapper(vae)
            vae_encoder.eval()
            image = torch.zeros((1, 3, height, width))  # NCHW layout: height before width
            with torch.no_grad():
                ov_model = ov.convert_model(vae_encoder, example_input=image, input=([1, 3, height, width],))
            ov.save_model(ov_model, ir_path)
            del ov_model
            cleanup_torchscript_cache()
            print('VAE encoder successfully converted to IR')


    def convert_vae_decoder(vae: torch.nn.Module, ir_path: Path, width:int = 64, height:int = 64):
        """
        Convert VAE decoder model to IR format.
        Function accepts VAE model, creates a wrapper class that exports only the part needed for inference,
        and prepares example inputs for conversion.
        Parameters:
            vae (torch.nn.Module): VAE model
            ir_path (Path): File for storing model
            width (int, optional, 64): input width
            height (int, optional, 64): input height
        Returns:
            None
        """
        class VAEDecoderWrapper(torch.nn.Module):
            def __init__(self, vae):
                super().__init__()
                self.vae = vae

            def forward(self, latents):
                latents = 1 / 0.18215 * latents
                return self.vae.decode(latents)

        if not ir_path.exists():
            vae_decoder = VAEDecoderWrapper(vae)
            latents = torch.zeros((1, 4, height, width))  # NCHW layout: height before width

            vae_decoder.eval()
            with torch.no_grad():
                ov_model = ov.convert_model(vae_decoder, example_input=latents, input=([1, 4, height, width],))
            ov.save_model(ov_model, ir_path)
            del ov_model
            cleanup_torchscript_cache()
            print('VAE decoder successfully converted to IR')
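
The decoder wrapper above divides the latents by ``0.18215`` before
decoding, and the pipeline further down multiplies the VAE encoder output
by the same constant. The NumPy sketch below walks through that round
trip; random numbers stand in for real latents, so it only illustrates the
arithmetic, not the models.

.. code:: ipython3

    import numpy as np

    SCALING_FACTOR = 0.18215  # constant used by the wrappers in this notebook

    rng = np.random.default_rng(0)
    # stand-in for the raw VAE encoder output (assumption: 1 x 4 x 64 x 64)
    raw_latents = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)

    scaled = SCALING_FACTOR * raw_latents     # scale applied after encoding
    restored = 1 / SCALING_FACTOR * scaled    # inverse scale applied before decoding

    print(np.allclose(restored, raw_latents, atol=1e-5))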

Do the conversion of the in-painting model:

.. code:: ipython3

    IMAGE_ENCODER_OV_PATH_INPAINT = sd2_inpainting_model_dir / "image_encoder.xml"

    if not IMAGE_ENCODER_OV_PATH_INPAINT.exists():
        convert_image_encoder(image_encoder, IMAGE_ENCODER_OV_PATH_INPAINT)
    else:
        print(f"Image encoder will be loaded from {IMAGE_ENCODER_OV_PATH_INPAINT}")

    del image_encoder
    gc.collect();


.. parsed-literal::

    Image encoder will be loaded from model\paint_by_example\image_encoder.xml
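
The conversion cells follow a convert-once pattern: the IR file is
written on the first run and reused afterwards. A minimal sketch of that
pattern follows; ``convert_if_missing`` and ``fake_convert`` are
hypothetical helpers, and the placeholder file stands in for the real
``ov.convert_model`` / ``ov.save_model`` calls.

.. code:: ipython3

    import tempfile
    from pathlib import Path


    def convert_if_missing(ir_path: Path, convert_fn) -> bool:
        """Run convert_fn(ir_path) only when ir_path is absent; return True if it ran."""
        if ir_path.exists():
            print(f"Model will be loaded from {ir_path.name}")
            return False
        convert_fn(ir_path)
        return True


    def fake_convert(ir_path: Path) -> None:
        # Stand-in for the real conversion: just writes a placeholder file.
        ir_path.write_text("<net/>")


    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "image_encoder.xml"
        first_run = convert_if_missing(path, fake_convert)   # file absent, converts
        second_run = convert_if_missing(path, fake_convert)  # file present, skips

    print(first_run, second_run)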


Do the conversion of the U-Net model:

.. code:: ipython3

    UNET_OV_PATH_INPAINT = sd2_inpainting_model_dir / 'unet.xml'
    if not UNET_OV_PATH_INPAINT.exists():
        convert_unet(unet_inpaint, UNET_OV_PATH_INPAINT, num_channels=9, width=64, height=64)
    else:
        print(f"U-Net will be loaded from {UNET_OV_PATH_INPAINT}")

    # the PyTorch U-Net is no longer needed in either case
    del unet_inpaint
    gc.collect();


.. parsed-literal::

    U-Net will be loaded from model\paint_by_example\unet.xml
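
The ``num_channels=9`` argument is consistent with an in-painting U-Net
whose input stacks 4 noisy latent channels, 4 masked-image latent channels
and 1 mask channel. That decomposition and the concatenation order are
assumptions made for this sketch; they are not spelled out in this
section. The factor 8 between image size and latent size matches the
``height // 8`` and ``width // 8`` used by the pipeline.

.. code:: ipython3

    import numpy as np

    height, width = 512, 512
    latent_h, latent_w = height // 8, width // 8  # 64 x 64, as passed to convert_unet

    batch = 2  # doubled batch, as in the example inputs used for U-Net conversion
    latents = np.zeros((batch, 4, latent_h, latent_w), dtype=np.float32)
    mask = np.ones((batch, 1, latent_h, latent_w), dtype=np.float32)
    masked_image_latents = np.zeros((batch, 4, latent_h, latent_w), dtype=np.float32)

    # Concatenation order is an assumption made for illustration.
    unet_input = np.concatenate([latents, masked_image_latents, mask], axis=1)
    print(unet_input.shape)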


Do the conversion of the VAE encoder and decoder models:

.. code:: ipython3

    VAE_ENCODER_OV_PATH_INPAINT = sd2_inpainting_model_dir / 'vae_encoder.xml'

    if not VAE_ENCODER_OV_PATH_INPAINT.exists():
        convert_vae_encoder(vae_inpaint, VAE_ENCODER_OV_PATH_INPAINT, 512, 512)
    else:
        print(f"VAE encoder will be loaded from {VAE_ENCODER_OV_PATH_INPAINT}")

    VAE_DECODER_OV_PATH_INPAINT = sd2_inpainting_model_dir / 'vae_decoder.xml'
    if not VAE_DECODER_OV_PATH_INPAINT.exists():
        convert_vae_decoder(vae_inpaint, VAE_DECODER_OV_PATH_INPAINT, 64, 64)
    else:
        print(f"VAE decoder will be loaded from {VAE_DECODER_OV_PATH_INPAINT}")

    del vae_inpaint
    gc.collect();


.. parsed-literal::

    VAE encoder will be loaded from model\paint_by_example\vae_encoder.xml
    VAE decoder will be loaded from model\paint_by_example\vae_decoder.xml


Prepare Inference pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~

The function below prepares the mask and the masked image.

It is adapted from the `236 Stable Diffusion v2 Infinite Zoom
notebook <236-stable-diffusion-v2-with-output.html>`__.

The main difference is that the pipeline now encodes an image as the
prompt instead of a text prompt.

.. code:: ipython3

    import inspect
    from typing import Optional, Union, Dict

    import PIL
    import cv2

    from transformers import CLIPImageProcessor
    from diffusers.pipelines.pipeline_utils import DiffusionPipeline
    from diffusers.schedulers import DDIMScheduler, LMSDiscreteScheduler, PNDMScheduler
    from openvino.runtime import Model


    def prepare_mask_and_masked_image(image:PIL.Image.Image, mask:PIL.Image.Image):
        """
        Prepares a pair (image, mask) to be consumed by the Stable Diffusion pipeline. This means that those inputs will be
        converted to ``np.array`` with shapes ``batch x channels x height x width`` where ``channels`` is ``3`` for the
        ``image`` and ``1`` for the ``mask``.

        The ``image`` will be converted to ``np.float32`` and normalized to be in ``[-1, 1]``. The ``mask`` will be
        inverted (so that ``1`` marks pixels to keep), binarized at ``0.5`` and cast to ``np.float32`` too.

        Args:
            image (Union[np.ndarray, PIL.Image.Image]): The image to inpaint.
                It can be a ``PIL.Image``, or a ``height x width x 3`` ``np.array``.
            mask (Union[np.ndarray, PIL.Image.Image]): The mask to apply to the image, i.e. regions to inpaint.
                It can be a ``PIL.Image``, or a ``height x width`` ``np.array``.

        Returns:
            tuple[np.array]: The pair (mask, masked_image) as ``np.array`` with 4
                dimensions: ``batch x channels x height x width``.
        """
        if isinstance(image, (PIL.Image.Image, np.ndarray)):
            image = [image]

        if isinstance(image, list) and isinstance(image[0], PIL.Image.Image):
            image = [np.array(i.convert("RGB"))[None, :] for i in image]
            image = np.concatenate(image, axis=0)
        elif isinstance(image, list) and isinstance(image[0], np.ndarray):
            image = np.concatenate([i[None, :] for i in image], axis=0)

        image = image.transpose(0, 3, 1, 2)
        image = image.astype(np.float32) / 127.5 - 1.0

        # preprocess mask
        if isinstance(mask, (PIL.Image.Image, np.ndarray)):
            mask = [mask]

        if isinstance(mask, list) and isinstance(mask[0], PIL.Image.Image):
            mask = np.concatenate([np.array(m.convert("L"))[None, None, :] for m in mask], axis=0)
            mask = mask.astype(np.float32) / 255.0
        elif isinstance(mask, list) and isinstance(mask[0], np.ndarray):
            mask = np.concatenate([m[None, None, :] for m in mask], axis=0)

        mask = 1 - mask

        mask[mask < 0.5] = 0
        mask[mask >= 0.5] = 1

        masked_image = image * mask

        return mask, masked_image
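
To see what this preprocessing does numerically, the sketch below repeats
the same arithmetic on a tiny hand-made 2 x 2 image and mask. The pixel
values are made up for illustration.

.. code:: ipython3

    import numpy as np

    # 1 x 2 x 2 x 3 uint8 "image" (batch, height, width, channels)
    image = np.array([[[[0, 0, 0], [255, 255, 255]],
                       [[127, 128, 129], [64, 64, 64]]]], dtype=np.uint8)
    # 1 x 1 x 2 x 2 mask in [0, 255]; bright pixels mark the region to repaint
    mask = np.array([[[[255, 0],
                       [0, 200]]]], dtype=np.uint8)

    # channels-first layout and normalization to [-1, 1]
    image = image.transpose(0, 3, 1, 2).astype(np.float32) / 127.5 - 1.0
    # invert the mask so that 1 means "keep this pixel", then binarize
    mask = 1 - mask.astype(np.float32) / 255.0
    mask[mask < 0.5] = 0
    mask[mask >= 0.5] = 1

    masked_image = image * mask  # the region to repaint is zeroed out

    print(image.min(), image.max())
    print(mask[0, 0])

After the inversion, pixels that were bright in the input mask become
``0``, so they are blanked in ``masked_image``.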

Class for the pipeline, which connects all the components together: the
image processor and image encoder, the VAE encoder, the U-Net, the
scheduler, and the VAE decoder.

.. code:: ipython3

    class OVStableDiffusionInpaintingPipeline(DiffusionPipeline):
        def __init__(
            self,
            vae_decoder: Model,
            image_encoder: Model,
            image_processor: CLIPImageProcessor,
            unet: Model,
            scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
            vae_encoder: Model = None,
        ):
            """
            Pipeline for exemplar-based image in-painting (Paint by Example) using Stable Diffusion.
            Parameters:
                vae_decoder (Model):
                    Variational Auto-Encoder (VAE) Model to decode latent representations to images.
                image_encoder (Model):
                    Image encoder Model used to encode the reference image, see
                    https://huggingface.co/Fantasy-Studio/Paint-by-Example/blob/main/image_encoder/config.json
                image_processor (CLIPImageProcessor):
                    Processor of class CLIPImageProcessor used to preprocess images for the image encoder.
                unet (Model): Conditional U-Net architecture to denoise the encoded image latents.
                vae_encoder (Model):
                    Variational Auto-Encoder (VAE) Model to encode images to latent representation.
                scheduler (SchedulerMixin):
                    A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of
                    DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.
            """
            super().__init__()
            self.scheduler = scheduler
            self.vae_decoder = vae_decoder
            self.vae_encoder = vae_encoder
            self.image_encoder = image_encoder
            self.unet = unet
            self._unet_output = unet.output(0)
            self._vae_d_output = vae_decoder.output(0)
            self._vae_e_output = vae_encoder.output(0) if vae_encoder is not None else None
            self.height = self.unet.input(0).shape[2] * 8
            self.width = self.unet.input(0).shape[3] * 8
            self.image_processor = image_processor

        def prepare_mask_latents(
            self,
            mask,
            masked_image,
            height=512,
            width=512,
            do_classifier_free_guidance=True,
        ):
            """
            Prepare mask as UNet input and encode the masked input image to latent space using the VAE encoder

            Parameters:
              mask (np.array): input mask array
              masked_image (np.array): masked input image tensor
              height (int, *optional*, 512): generated image height
              width (int, *optional*, 512): generated image width
              do_classifier_free_guidance (bool, *optional*, True): whether to use classifier free guidance or not
            Returns:
              mask (np.array): resized mask tensor
              masked_image_latents (np.array): masked image encoded into latent space using VAE
            """
            mask = torch.nn.functional.interpolate(torch.from_numpy(mask), size=(height // 8, width // 8))
            mask = mask.numpy()

            # encode the masked image into latent space so we can concatenate it to the latents
            masked_image_latents = self.vae_encoder(masked_image)[self._vae_e_output]
            masked_image_latents = 0.18215 * masked_image_latents

            mask = np.concatenate([mask] * 2) if do_classifier_free_guidance else mask
            masked_image_latents = (
                np.concatenate([masked_image_latents] * 2)
                if do_classifier_free_guidance
                else masked_image_latents
            )
            return mask, masked_image_latents

        def __call__(
            self,
            image: PIL.Image.Image,
            mask_image: PIL.Image.Image,
            reference_image: PIL.Image.Image,
            num_inference_steps: Optional[int] = 50,
            guidance_scale: Optional[float] = 7.5,
            eta: Optional[float] = 0,
            output_type: Optional[str] = "pil",
            seed: Optional[int] = None,
        ):
            """
            Function invoked when calling the pipeline for generation.
            Parameters:
                image (PIL.Image.Image):
                     Source image for inpainting.
                mask_image (PIL.Image.Image):
                     Mask area for inpainting
                reference_image (PIL.Image.Image):
                     Reference image to inpaint in mask area
                num_inference_steps (int, *optional*, defaults to 50):
                    The number of denoising steps. More denoising steps usually lead to a higher quality image at the
                    expense of slower inference.
                guidance_scale (float, *optional*, defaults to 7.5):
                    Guidance scale as defined in Classifier-Free Diffusion Guidance(https://arxiv.org/abs/2207.12598).
                    guidance_scale is defined as `w` of equation 2.
                    A higher guidance scale encourages generated content that is closely linked to the reference image,
                    usually at the expense of lower image quality.
                eta (float, *optional*, defaults to 0.0):
                    Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
                    [DDIMScheduler], will be ignored for others.
                output_type (`str`, *optional*, defaults to "pil"):
                    The output format of the generated image. Choose between
                    [PIL](https://pillow.readthedocs.io/en/stable/): PIL.Image.Image or np.array.
                seed (int, *optional*, None):
                    Seed for random generator state initialization.
            Returns:
                Dictionary with keys:
                    sample - the last generated image PIL.Image.Image or np.array
            """
            if seed is not None:
                np.random.seed(seed)
            # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
            # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
            # corresponds to doing no classifier free guidance.
            do_classifier_free_guidance = guidance_scale > 1.0

            # get reference image embeddings
            image_embeddings = self._encode_image(reference_image, do_classifier_free_guidance=do_classifier_free_guidance)

            # prepare mask
            mask, masked_image = prepare_mask_and_masked_image(image, mask_image)
            # set timesteps
            accepts_offset = "offset" in set(
                inspect.signature(self.scheduler.set_timesteps).parameters.keys()
            )
            extra_set_kwargs = {}
            if accepts_offset:
                extra_set_kwargs["offset"] = 1

            self.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs)
            timesteps, num_inference_steps = self.get_timesteps(num_inference_steps, 1)
            latent_timestep = timesteps[:1]

            # get the initial random noise unless the user supplied it
            latents, meta = self.prepare_latents(None, latent_timestep)
            mask, masked_image_latents = self.prepare_mask_latents(
                mask,
                masked_image,
                do_classifier_free_guidance=do_classifier_free_guidance,
            )

            # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
            # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
            # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
            # and should be between [0, 1]
            accepts_eta = "eta" in set(
                inspect.signature(self.scheduler.step).parameters.keys()
            )
            extra_step_kwargs = {}
            if accepts_eta:
                extra_step_kwargs["eta"] = eta

            for t in self.progress_bar(timesteps):
                # expand the latents if we are doing classifier free guidance
                latent_model_input = (
                    np.concatenate([latents] * 2)
                    if do_classifier_free_guidance
                    else latents
                )
                latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
                latent_model_input = np.concatenate(
                    [latent_model_input, masked_image_latents, mask], axis=1
                )
                # predict the noise residual
                noise_pred = self.unet(
                    [latent_model_input, np.array(t, dtype=np.float32), image_embeddings]
                )[self._unet_output]
                # perform guidance
                if do_classifier_free_guidance:
                    noise_pred_uncond, noise_pred_text = noise_pred[0], noise_pred[1]
                    noise_pred = noise_pred_uncond + guidance_scale * (
                        noise_pred_text - noise_pred_uncond
                    )

                # compute the previous noisy sample x_t -> x_t-1
                latents = self.scheduler.step(
                    torch.from_numpy(noise_pred),
                    t,
                    torch.from_numpy(latents),
                    **extra_step_kwargs,
                )["prev_sample"].numpy()
            # scale and decode the image latents with vae
            image = self.vae_decoder(latents)[self._vae_d_output]

            image = self.postprocess_image(image, meta, output_type)
            return {"sample": image}

        def _encode_image(self, image:PIL.Image.Image, do_classifier_free_guidance:bool = True):
            """
            Encodes the image into image encoder hidden states.

            Parameters:
                image (PIL.Image.Image): base image to encode
                do_classifier_free_guidance (bool): whether to use classifier free guidance or not
            Returns:
                image_embeddings (np.ndarray): image encoder hidden states
            """
            processed_image = self.image_processor(image)
            processed_image = processed_image['pixel_values'][0]
            processed_image = np.expand_dims(processed_image, axis=0)

            output = self.image_encoder(processed_image)
            image_embeddings = output[self.image_encoder.output(0)]
            negative_embeddings = output[self.image_encoder.output(1)]

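            if do_classifier_free_guidance:
                # stack unconditional and conditional embeddings to match the doubled latent batch
                image_embeddings = np.concatenate([negative_embeddings, image_embeddings])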

            return image_embeddings

        def prepare_latents(self, image:PIL.Image.Image = None, latent_timestep:torch.Tensor = None):
            """
            Function for getting initial latents for starting generation

            Parameters:
                image (PIL.Image.Image, *optional*, None):
                    Input image for generation, if not provided random noise will be used as starting point
                latent_timestep (torch.Tensor, *optional*, None):
                    Initial step predicted by the scheduler for image generation, required for mixing the latent image with noise
            Returns:
                latents (np.ndarray):
                    Image encoded in latent space
            """
            latents_shape = (1, 4, self.height // 8, self.width // 8)
            noise = np.random.randn(*latents_shape).astype(np.float32)
            if image is None:
                # if we use LMSDiscreteScheduler, let's make sure latents are multiplied by sigmas
                if isinstance(self.scheduler, LMSDiscreteScheduler):
                    noise = noise * self.scheduler.sigmas[0].numpy()
                return noise, {}
            input_image, meta = preprocess(image)
            moments = self.vae_encoder(input_image)[self._vae_e_output]
            mean, logvar = np.split(moments, 2, axis=1)
            std = np.exp(logvar * 0.5)
            latents = (mean + std * np.random.randn(*mean.shape)) * 0.18215
            latents = self.scheduler.add_noise(torch.from_numpy(latents), torch.from_numpy(noise), latent_timestep).numpy()
            return latents, meta

        def postprocess_image(self, image:np.ndarray, meta:Dict, output_type:str = "pil"):
            """
            Postprocessing for decoded image. Takes the generated image decoded by the VAE decoder, unpads it to the initial image size (if required),
            normalizes it and converts it to the [0, 255] pixel range. Optionally, converts it from np.ndarray to PIL.Image format

            Parameters:
                image (np.ndarray):
                    Generated image
                meta (Dict):
                    Metadata obtained on latents preparing step, can be empty
                output_type (str, *optional*, pil):
                    Output format for result, can be pil or numpy
            Returns:
                image (List of np.ndarray or PIL.Image.Image):
                    Postprocessed images
            """
            if "padding" in meta:
                pad = meta["padding"]
                (_, end_h), (_, end_w) = pad[1:3]
                h, w = image.shape[2:]
                unpad_h = h - end_h
                unpad_w = w - end_w
                image = image[:, :, :unpad_h, :unpad_w]
            image = np.clip(image / 2 + 0.5, 0, 1)
            image = np.transpose(image, (0, 2, 3, 1))
            # convert to PIL if requested
            if output_type == "pil":
                image = self.numpy_to_pil(image)
                if "src_height" in meta:
                    orig_height, orig_width = meta["src_height"], meta["src_width"]
                    image = [img.resize((orig_width, orig_height),
                                        PIL.Image.Resampling.LANCZOS) for img in image]
            else:
                if "src_height" in meta:
                    orig_height, orig_width = meta["src_height"], meta["src_width"]
                    image = [cv2.resize(img, (orig_width, orig_height))
                             for img in image]
            return image

        def get_timesteps(self, num_inference_steps:int, strength:float):
            """
            Helper function for getting scheduler timesteps for generation
            In case of image-to-image generation, it updates number of steps according to strength

            Parameters:
               num_inference_steps (int):
                  number of inference steps for generation
               strength (float):
                   value between 0.0 and 1.0, that controls the amount of noise that is added to the input image.
                   Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input.
            """
            # get the original timestep using init_timestep
            init_timestep = min(int(num_inference_steps * strength), num_inference_steps)

            t_start = max(num_inference_steps - init_timestep, 0)
            timesteps = self.scheduler.timesteps[t_start:]

            return timesteps, num_inference_steps - t_start
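
The role of ``strength`` in ``get_timesteps`` can be illustrated in isolation. The following is a minimal sketch, not part of the notebook: ``select_timesteps`` is a hypothetical standalone copy of the method's arithmetic, and the ten-entry ``schedule`` list is an assumed stand-in for ``scheduler.timesteps``.

```python
def select_timesteps(schedule, num_inference_steps, strength):
    """Hypothetical standalone version of the get_timesteps arithmetic."""
    # number of denoising steps that will actually be executed
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # index of the first timestep to keep; earlier (noisier) steps are skipped
    t_start = max(num_inference_steps - init_timestep, 0)
    return schedule[t_start:], num_inference_steps - t_start


# assumed stand-in for scheduler.timesteps: 10 descending timesteps
schedule = list(range(900, -1, -100))

full, full_steps = select_timesteps(schedule, 10, 1.0)
half, half_steps = select_timesteps(schedule, 10, 0.5)

print(full_steps, full[0])  # strength 1.0 keeps the whole schedule
print(half_steps, half[0])  # strength 0.5 keeps only the second half
```

The pipeline above always calls ``get_timesteps`` with a strength of ``1``, so inpainting runs the full schedule starting from pure noise.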

Select inference device
~~~~~~~~~~~~~~~~~~~~~~~


Select a device from the dropdown list to run inference with OpenVINO.

.. code:: ipython3

    from openvino.runtime import Core
    import ipywidgets as widgets

    core = Core()

    device = widgets.Dropdown(
        options=core.available_devices + ["AUTO"],
        value='AUTO',
        description='Device:',
        disabled=False,
    )

    device


.. parsed-literal::

    Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO')
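
Outside a notebook, where no widget is available, the same choice could be made in code. A minimal sketch, assuming a hypothetical helper ``pick_device`` and a hard-coded list standing in for ``core.available_devices``:

```python
def pick_device(requested, available):
    """Return `requested` if it is a valid option, otherwise fall back to "AUTO".

    Hypothetical helper; `available` plays the role of core.available_devices.
    """
    # mirror the dropdown: every detected device plus the "AUTO" virtual device
    options = list(available) + ["AUTO"]
    return requested if requested in options else "AUTO"


available = ["CPU", "GPU"]  # assumed example value, not queried from OpenVINO

print(pick_device("GPU", available))
print(pick_device("NPU", available))
```

Falling back to ``AUTO`` matches the default value of the dropdown above.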


Configure Inference Pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Configuration steps:

1. Load the models on the selected device.
2. Configure the image processor and scheduler.
3. Create an instance of the ``OVStableDiffusionInpaintingPipeline`` class.

This can take a while to run.

.. code:: ipython3

    ov_config = {"INFERENCE_PRECISION_HINT": "f32"} if device.value != "CPU" else {}

    image_encoder_inpaint = core.compile_model(IMAGE_ENCODER_OV_PATH_INPAINT, device.value)
    unet_model_inpaint = core.compile_model(UNET_OV_PATH_INPAINT, device.value)
    vae_decoder_inpaint = core.compile_model(VAE_DECODER_OV_PATH_INPAINT, device.value, ov_config)
    vae_encoder_inpaint = core.compile_model(VAE_ENCODER_OV_PATH_INPAINT, device.value, ov_config)

    ov_pipe_inpaint = OVStableDiffusionInpaintingPipeline(
        image_processor=extractor,
        image_encoder=image_encoder_inpaint,
        unet=unet_model_inpaint,
        vae_encoder=vae_encoder_inpaint,
        vae_decoder=vae_decoder_inpaint,
        scheduler=scheduler_inpaint,
    )
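
The compile configuration used above can be factored into a small helper. The sketch below is an illustration under assumptions rather than the notebook's code: ``build_compile_config`` is a hypothetical name, and the optional ``CACHE_DIR`` entry relies on OpenVINO's model caching property, which may shorten later compilations on devices that support it.

```python
def build_compile_config(device, cache_dir=None):
    """Build a properties dict for core.compile_model (hypothetical helper)."""
    config = {}
    # same rule as the cell above: request FP32 inference on non-CPU devices
    if device != "CPU":
        config["INFERENCE_PRECISION_HINT"] = "f32"
    # assumption: enable model caching only when a directory is supplied
    if cache_dir is not None:
        config["CACHE_DIR"] = str(cache_dir)
    return config


print(build_compile_config("CPU"))
print(build_compile_config("GPU", cache_dir="model_cache"))
```

The resulting dictionary could be passed as the third argument of ``core.compile_model``, in the same position as ``ov_config`` above.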

.. code:: ipython3

    # Code adapted from https://huggingface.co/spaces/Fantasy-Studio/Paint-by-Example/blob/main/app.py

    import os
    import gradio as gr

    def predict(dict: Dict[str, PIL.Image.Image], reference: PIL.Image.Image, seed: int, step: int):
        """
            This function runs when the 'Paint!' button is pressed. It resizes and center-crops the source image and mask to 512x512,
            runs the inpainting pipeline and saves the inputs and the result to the 'output' directory.

            Parameters:
                dict (Dict):
                    Contains two images in a dictionary
                        'image' is the image that will be painted on
                        'mask' is the black/white image specifying where to paint (white) and not to paint (black)
                reference (PIL.Image.Image):
                    Reference image that will be used by the model to know what to paint in the specified area
                seed (int):
                    Used to initialize the random number generator state
                step (int):
                    The number of denoising steps to run during inference. Low = fast/low quality, High = slow/higher quality
            Returns:
                image (PIL.Image.Image):
                    Postprocessed image
        """
        width,height = dict["image"].size

        # If the image is not 512x512 then resize
        if width < height:
            factor = width / 512.0
            width = 512
            height = int((height / factor) / 8.0) * 8
        else:
            factor = height / 512.0
            height = 512
            width = int((width / factor) / 8.0) * 8

        init_image = dict["image"].convert("RGB").resize((width,height))
        mask = dict["mask"].convert("RGB").resize((width,height))

        # If the image is not a 512x512 square then crop
        if width > height:
            buffer = (width - height) / 2
            input_image = init_image.crop((buffer, 0, width - buffer, 512))
            mask = mask.crop((buffer, 0, width - buffer, 512))
        elif width < height:
            buffer = (height - width) / 2
            input_image = init_image.crop((0, buffer, 512, height - buffer))
            mask = mask.crop((0, buffer, 512, height - buffer))
        else:
            input_image = init_image

        os.makedirs('output', exist_ok=True)
        input_image.save('output/init.png')
        mask.save('output/mask.png')
        reference.save('output/ref.png')

        mask = [mask]

        result = ov_pipe_inpaint(
            image=input_image,
            mask_image=mask,
            reference_image=reference,
            seed=seed if seed != 0 else None,  # 0 selects a random seed, as the slider label states
            num_inference_steps=step,
        )["sample"][0]

        # the 'output' directory was already created before saving the inputs
        result.save('output/result.png')

        return result


    example = {}
    ref_dir = 'data/reference'
    image_dir = 'data/image'
    ref_list = [os.path.join(ref_dir,file) for file in os.listdir(ref_dir)]
    ref_list.sort()
    image_list = [os.path.join(image_dir,file) for file in os.listdir(image_dir)]
    image_list.sort()


    image_blocks = gr.Blocks()
    with image_blocks as demo:
        with gr.Group():
            with gr.Box():
                with gr.Row():
                    with gr.Column():
                        image = gr.Image(source='upload', tool='sketch', elem_id="image_upload", type="pil", label="Source Image")
                        reference = gr.Image(source='upload', elem_id="image_upload", type="pil", label="Reference Image")

                    with gr.Column():
                        image_out = gr.Image(label="Output", elem_id="output-img")
                        steps = gr.Slider(label="Steps", value=15, minimum=2, maximum=75, step=1,interactive=True)

                        seed = gr.Slider(0, 10000, label='Seed (0 = random)', value=0, step=1)

                        with gr.Row(elem_id="prompt-container"):
                            btn = gr.Button("Paint!")

                with gr.Row():
                    with gr.Column():
                        gr.Examples(image_list, inputs=[image],label="Examples - Source Image",examples_per_page=12)
                    with gr.Column():
                        gr.Examples(ref_list, inputs=[reference],label="Examples - Reference Image",examples_per_page=12)

                btn.click(fn=predict, inputs=[image, reference, seed, steps], outputs=[image_out])

    # Launching the Gradio app
    try:
        image_blocks.launch(debug=False, height=680)
    except Exception:
        image_blocks.queue().launch(share=True, debug=False, height=680)
    # if you are launching remotely, specify server_name and server_port
    # image_blocks.launch(server_name='your server name', server_port='server port in int')
    # Read more in the docs: https://gradio.app/docs/


.. parsed-literal::

    Running on local URL:  http://127.0.0.1:7860

    To create a public link, set `share=True` in `launch()`.
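
The resize and center-crop arithmetic inside ``predict`` can be checked without loading any image. A minimal sketch, assuming a hypothetical helper ``resize_and_crop_box`` that reproduces the same steps on plain integers:

```python
def resize_and_crop_box(width, height, target=512, multiple=8):
    """Return the resized (width, height) and the center-crop box (hypothetical helper)."""
    # scale so that the shorter side equals `target`,
    # rounding the longer side down to a multiple of 8
    if width < height:
        factor = width / target
        new_w = target
        new_h = int((height / factor) / multiple) * multiple
    else:
        factor = height / target
        new_h = target
        new_w = int((width / factor) / multiple) * multiple

    # crop the longer side symmetrically to obtain a target x target square
    if new_w > new_h:
        buffer = (new_w - new_h) // 2
        box = (buffer, 0, new_w - buffer, target)
    elif new_w < new_h:
        buffer = (new_h - new_w) // 2
        box = (0, buffer, target, new_h - buffer)
    else:
        box = (0, 0, new_w, new_h)
    return (new_w, new_h), box


print(resize_and_crop_box(768, 1024))
print(resize_and_crop_box(1024, 768))
```

Both boxes span 512 pixels in each direction, which matches the 512x512 input size expected by the pipeline.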

