add-253 (#19500)
This commit is contained in:
parent 23cad1770e
commit 8aec490128

896 docs/notebooks/253-zeroscope-text2video-with-output.rst (new file)

@@ -0,0 +1,896 @@
Video generation with ZeroScope and OpenVINO
============================================

.. _top:

The ZeroScope model is a free and open-source text-to-video model that
can generate realistic and engaging videos from text descriptions. It is
based on the
`Modelscope <https://modelscope.cn/models/damo/text-to-video-synthesis/summary>`__
model, but it has been improved to produce higher-quality videos with a
16:9 aspect ratio and no Shutterstock watermark. The ZeroScope model is
available in two versions: ZeroScope_v2 576w, which is optimized for
rapid content creation at a resolution of 576x320 pixels, and
ZeroScope_v2 XL, which upscales videos to a high-definition resolution
of 1024x576.

The ZeroScope model is trained on a dataset of over 9,000 videos and
29,000 tagged frames. It uses a diffusion model to generate videos,
which means that it starts with a random noise image and gradually adds
detail to it until it matches the text description. The ZeroScope model
is still under development, but it has already been used to create some
impressive videos. For example, it has been used to create videos of
people dancing, playing sports, and even driving cars.

The ZeroScope model is a powerful tool that can be used to create
various videos, from simple animations to complex scenes. It is still
under development, but it has the potential to revolutionize the way we
create and consume video content.

Both versions of the ZeroScope model are available on Hugging Face:

- `ZeroScope_v2 576w <https://huggingface.co/cerspense/zeroscope_v2_576w>`__
- `ZeroScope_v2 XL <https://huggingface.co/cerspense/zeroscope_v2_XL>`__

We will use the first one.
**Table of contents**:

- `Install and import required packages <#install-and-import-required-packages>`__
- `Load the model <#load-the-model>`__
- `Convert the model <#convert-the-model>`__

  - `Define the conversion function <#define-the-conversion-function>`__
  - `UNet <#unet>`__
  - `VAE <#vae>`__
  - `Text encoder <#text-encoder>`__

- `Build a pipeline <#build-a-pipeline>`__
- `Inference with OpenVINO <#inference-with-openvino>`__

  - `Select inference device <#select-inference-device>`__
  - `Define a prompt <#define-a-prompt>`__
  - `Video generation <#video-generation>`__

- `Interactive demo <#interactive-demo>`__

.. important::

   This tutorial requires at least 24GB of free memory to generate a video with
   a frame size of 432x240 and 16 frames. Increasing either of these values will
   require more memory and take more time.

Install and import required packages `⇑ <#top>`__
###############################################################################################################################

To work with the text-to-video synthesis model, we will use Hugging Face's
`Diffusers <https://github.com/huggingface/diffusers>`__ library. It
provides an already pretrained model from ``cerspense``.

.. code:: ipython3

    !pip install -q "diffusers[torch]>=0.15.0" transformers "openvino==2023.1.0.dev20230811" numpy gradio

.. code:: ipython3

    import gc
    from pathlib import Path
    from typing import Optional, Union, List, Callable
    import base64
    import tempfile
    import warnings

    import diffusers
    import transformers
    import numpy as np
    import IPython
    import ipywidgets as widgets
    import torch
    import PIL
    import gradio as gr

    import openvino as ov


.. parsed-literal::

    2023-08-16 21:15:40.145184: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
    2023-08-16 21:15:40.146998: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
    2023-08-16 21:15:40.179214: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
    2023-08-16 21:15:40.180050: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
    To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2023-08-16 21:15:40.750499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

Original 576x320 inference requires a lot of RAM (>100GB), so let's run
our example on a smaller frame size, keeping the same aspect ratio. Try
reducing the values below to reduce memory consumption.

.. code:: ipython3

    WIDTH = 432  # must be divisible by 8
    HEIGHT = 240  # must be divisible by 8
    NUM_FRAMES = 16

Load the model `⇑ <#top>`__
###############################################################################################################################

The model is loaded from Hugging Face using the ``.from_pretrained`` method
of ``diffusers.DiffusionPipeline``.

.. code:: ipython3

    pipe = diffusers.DiffusionPipeline.from_pretrained('cerspense/zeroscope_v2_576w')


.. parsed-literal::

    vae/diffusion_pytorch_model.safetensors not found


.. parsed-literal::

    Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]


.. code:: ipython3

    unet = pipe.unet
    unet.eval()
    vae = pipe.vae
    vae.eval()
    text_encoder = pipe.text_encoder
    text_encoder.eval()
    tokenizer = pipe.tokenizer
    scheduler = pipe.scheduler
    vae_scale_factor = pipe.vae_scale_factor
    unet_in_channels = pipe.unet.config.in_channels
    sample_width = WIDTH // vae_scale_factor
    sample_height = HEIGHT // vae_scale_factor
    del pipe
    gc.collect();

Convert the model `⇑ <#top>`__
###############################################################################################################################

The architecture for generating videos from text comprises three
distinct sub-networks: one for extracting text features, another for
translating text features into the video latent space using a diffusion
model, and a final one for mapping the video latent space to the visual
space. The collective parameters of the entire model amount to
approximately 1.7 billion. It is capable of processing English input. The
diffusion model is built upon the Unet3D model and achieves video
generation by iteratively denoising a starting point of pure Gaussian
noise video.

.. image:: 253-zeroscope-text2video-with-output_files/253-zeroscope-text2video-with-output_01_02.png

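As an illustrative sketch only (it is not executed as part of this tutorial),
the denoising idea can be written with the PyTorch ``unet``, ``scheduler``,
``tokenizer`` and ``text_encoder`` objects loaded above; classifier-free
guidance is omitted for brevity, and the full OpenVINO-based implementation
is in the `Build a pipeline <#build-a-pipeline>`__ section.

.. code:: ipython3

    # Illustrative sketch of the diffusion process only - the complete pipeline
    # (with classifier-free guidance) is implemented in OVTextToVideoSDPipeline below.
    with torch.no_grad():
        ids = tokenizer("A panda eating bamboo on a rock.", padding="max_length",
                        max_length=tokenizer.model_max_length, truncation=True,
                        return_tensors="pt").input_ids
        cond = text_encoder(ids)[0]                       # text features, shape (1, 77, 1024)
        latents = torch.randn(1, unet_in_channels, NUM_FRAMES,
                              sample_height, sample_width)  # pure Gaussian noise "video"
        scheduler.set_timesteps(25)
        latents = latents * scheduler.init_noise_sigma
        for t in scheduler.timesteps:
            model_input = scheduler.scale_model_input(latents, t)
            noise_pred = unet(model_input, t, encoder_hidden_states=cond).sample
            latents = scheduler.step(noise_pred, t, latents).prev_sample
        # decoding `latents` with the VAE decoder maps the video latent space to pixels
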
Define the conversion function `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Model components are PyTorch modules that can be converted with the
``ov.convert_model`` function directly. We also use the ``ov.save_model``
function to serialize the result of the conversion.

.. code:: ipython3

    warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)

.. code:: ipython3

    def convert(model: torch.nn.Module, xml_path: str, **convert_kwargs) -> Path:
        xml_path = Path(xml_path)
        if not xml_path.exists():
            xml_path.parent.mkdir(parents=True, exist_ok=True)
            with torch.no_grad():
                converted_model = ov.convert_model(model, **convert_kwargs)
            ov.save_model(converted_model, xml_path)
            del converted_model
            gc.collect()
            torch._C._jit_clear_class_registry()
            torch.jit._recursive.concrete_type_store = torch.jit._recursive.ConcreteTypeStore()
            torch.jit._state._clear_class_state()
        return xml_path

UNet `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The main component of the text-to-video generation pipeline is a conditional
3D UNet model that takes a noisy sample, conditional state, and a timestep,
and returns a sample-shaped output.

.. code:: ipython3

    unet_xml_path = convert(
        unet,
        "models/unet.xml",
        example_input={
            "sample": torch.randn(2, 4, 2, 32, 32),
            "timestep": torch.tensor(1),
            "encoder_hidden_states": torch.randn(2, 77, 1024),
        },
        input=[
            ("sample", (2, 4, NUM_FRAMES, sample_height, sample_width)),
            ("timestep", ()),
            ("encoder_hidden_states", (2, 77, 1024)),
        ],
    )
    del unet
    gc.collect();


.. parsed-literal::

    WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11.


.. parsed-literal::

    [ WARNING ] Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s.

VAE `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The variational autoencoder (VAE) uses the UNet output to decode latents into
visual representations. Our VAE model has a KL loss for encoding images
into latents and decoding latent representations back into images. For
inference, we need only the decoder part.

.. code:: ipython3

    class VaeDecoderWrapper(torch.nn.Module):
        def __init__(self, vae):
            super().__init__()
            self.vae = vae

        def forward(self, z: torch.FloatTensor):
            return self.vae.decode(z)

.. code:: ipython3

    vae_decoder_xml_path = convert(
        VaeDecoderWrapper(vae),
        "models/vae.xml",
        example_input=torch.randn(2, 4, 32, 32),
        input=((NUM_FRAMES, 4, sample_height, sample_width)),
    )
    del vae
    gc.collect();

Text encoder `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The text encoder is used to encode the input prompt into a tensor. The default
tensor length is 77.
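The 77 comes from the CLIP tokenizer's ``model_max_length``: prompts are
padded or truncated to exactly 77 token IDs, which matches the static
``(1, 77)`` shape and ``i64`` type declared for conversion below. For
illustration:

.. code:: ipython3

    # Illustration: any prompt becomes a (1, 77) int64 tensor of token ids.
    ids = tokenizer(
        "A panda eating bamboo on a rock.",
        padding="max_length",
        max_length=tokenizer.model_max_length,  # 77 for the CLIP tokenizer
        truncation=True,
        return_tensors="pt",
    ).input_ids
    print(ids.shape, ids.dtype)  # torch.Size([1, 77]) torch.int64
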
.. code:: ipython3

    text_encoder_xml = convert(
        text_encoder,
        "models/text_encoder.xml",
        example_input=torch.ones(1, 77, dtype=torch.int64),
        input=((1, 77), (ov.Type.i64,)),
    )
    del text_encoder
    gc.collect();

Build a pipeline `⇑ <#top>`__
###############################################################################################################################

.. code:: ipython3

    def tensor2vid(video: torch.Tensor, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]) -> List[np.ndarray]:
        # This code is copied from https://github.com/modelscope/modelscope/blob/1509fdb973e5871f37148a4b5e5964cafd43e64d/modelscope/pipelines/multi_modal/text_to_video_synthesis_pipeline.py#L78
        # reshape to ncfhw
        mean = torch.tensor(mean, device=video.device).reshape(1, -1, 1, 1, 1)
        std = torch.tensor(std, device=video.device).reshape(1, -1, 1, 1, 1)
        # unnormalize back to [0,1]
        video = video.mul_(std).add_(mean)
        video.clamp_(0, 1)
        # prepare the final outputs
        i, c, f, h, w = video.shape
        images = video.permute(2, 3, 0, 4, 1).reshape(
            f, h, i * w, c
        )  # 1st (frames, h, batch_size, w, c) 2nd (frames, h, batch_size * w, c)
        images = images.unbind(dim=0)  # prepare a list of individual (consecutive frames)
        images = [(image.cpu().numpy() * 255).astype("uint8") for image in images]  # f h w c
        return images

.. code:: ipython3

    class OVTextToVideoSDPipeline(diffusers.DiffusionPipeline):
        def __init__(
            self,
            vae_decoder: ov.CompiledModel,
            text_encoder: ov.CompiledModel,
            tokenizer: transformers.CLIPTokenizer,
            unet: ov.CompiledModel,
            scheduler: diffusers.schedulers.DDIMScheduler,
        ):
            super().__init__()

            self.vae_decoder = vae_decoder
            self.text_encoder = text_encoder
            self.tokenizer = tokenizer
            self.unet = unet
            self.scheduler = scheduler
            self.vae_scale_factor = vae_scale_factor
            self.unet_in_channels = unet_in_channels
            self.width = WIDTH
            self.height = HEIGHT
            self.num_frames = NUM_FRAMES

        def __call__(
            self,
            prompt: Union[str, List[str]] = None,
            num_inference_steps: int = 50,
            guidance_scale: float = 9.0,
            negative_prompt: Optional[Union[str, List[str]]] = None,
            eta: float = 0.0,
            generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
            latents: Optional[torch.FloatTensor] = None,
            prompt_embeds: Optional[torch.FloatTensor] = None,
            negative_prompt_embeds: Optional[torch.FloatTensor] = None,
            output_type: Optional[str] = "np",
            return_dict: bool = True,
            callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
            callback_steps: int = 1,
        ):
            r"""
            Function invoked when calling the pipeline for generation.

            Args:
                prompt (`str` or `List[str]`, *optional*):
                    The prompt or prompts to guide the video generation. If not defined, one has to pass `prompt_embeds`
                    instead.
                num_inference_steps (`int`, *optional*, defaults to 50):
                    The number of denoising steps. More denoising steps usually lead to higher quality videos at the
                    expense of slower inference.
                guidance_scale (`float`, *optional*, defaults to 9.0):
                    Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
                    `guidance_scale` is defined as `w` of equation 2. of [Imagen
                    Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
                    1`. Higher guidance scale encourages generating videos that are closely linked to the text `prompt`,
                    usually at the expense of lower video quality.
                negative_prompt (`str` or `List[str]`, *optional*):
                    The prompt or prompts not to guide the video generation. If not defined, one has to pass
                    `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
                    less than `1`).
                eta (`float`, *optional*, defaults to 0.0):
                    Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
                    [`schedulers.DDIMScheduler`], will be ignored for others.
                generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
                    One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
                    to make generation deterministic.
                latents (`torch.FloatTensor`, *optional*):
                    Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for video
                    generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
                    tensor will be generated by sampling using the supplied random `generator`. Latents should be of shape
                    `(batch_size, num_channel, num_frames, height, width)`.
                prompt_embeds (`torch.FloatTensor`, *optional*):
                    Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
                    provided, text embeddings will be generated from `prompt` input argument.
                negative_prompt_embeds (`torch.FloatTensor`, *optional*):
                    Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
                    weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
                    argument.
                output_type (`str`, *optional*, defaults to `"np"`):
                    The output format of the generated video. Choose between `torch.FloatTensor` or `np.array`.
                return_dict (`bool`, *optional*, defaults to `True`):
                    Whether or not to return a [`~pipelines.stable_diffusion.TextToVideoSDPipelineOutput`] instead of a
                    plain tuple.
                callback (`Callable`, *optional*):
                    A function that will be called every `callback_steps` steps during inference. The function will be
                    called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
                callback_steps (`int`, *optional*, defaults to 1):
                    The frequency at which the `callback` function will be called. If not specified, the callback will be
                    called at every step.

            Returns:
                `List[np.ndarray]`: generated video frames
            """

            num_images_per_prompt = 1

            # 1. Check inputs. Raise error if not correct
            self.check_inputs(
                prompt,
                callback_steps,
                negative_prompt,
                prompt_embeds,
                negative_prompt_embeds,
            )

            # 2. Define call parameters
            if prompt is not None and isinstance(prompt, str):
                batch_size = 1
            elif prompt is not None and isinstance(prompt, list):
                batch_size = len(prompt)
            else:
                batch_size = prompt_embeds.shape[0]

            # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
            # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
            # corresponds to doing no classifier free guidance.
            do_classifier_free_guidance = guidance_scale > 1.0

            # 3. Encode input prompt
            prompt_embeds = self._encode_prompt(
                prompt,
                num_images_per_prompt,
                do_classifier_free_guidance,
                negative_prompt,
                prompt_embeds=prompt_embeds,
                negative_prompt_embeds=negative_prompt_embeds,
            )

            # 4. Prepare timesteps
            self.scheduler.set_timesteps(num_inference_steps)
            timesteps = self.scheduler.timesteps

            # 5. Prepare latent variables
            num_channels_latents = self.unet_in_channels
            latents = self.prepare_latents(
                batch_size * num_images_per_prompt,
                num_channels_latents,
                prompt_embeds.dtype,
                generator,
                latents,
            )

            # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
            extra_step_kwargs = {"generator": generator, "eta": eta}

            # 7. Denoising loop
            num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
            with self.progress_bar(total=num_inference_steps) as progress_bar:
                for i, t in enumerate(timesteps):
                    # expand the latents if we are doing classifier free guidance
                    latent_model_input = (
                        torch.cat([latents] * 2) if do_classifier_free_guidance else latents
                    )
                    latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

                    # predict the noise residual
                    noise_pred = self.unet(
                        {
                            "sample": latent_model_input,
                            "timestep": t,
                            "encoder_hidden_states": prompt_embeds,
                        }
                    )[0]
                    noise_pred = torch.tensor(noise_pred)

                    # perform guidance
                    if do_classifier_free_guidance:
                        noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                        noise_pred = noise_pred_uncond + guidance_scale * (
                            noise_pred_text - noise_pred_uncond
                        )

                    # reshape latents
                    bsz, channel, frames, width, height = latents.shape
                    latents = latents.permute(0, 2, 1, 3, 4).reshape(
                        bsz * frames, channel, width, height
                    )
                    noise_pred = noise_pred.permute(0, 2, 1, 3, 4).reshape(
                        bsz * frames, channel, width, height
                    )

                    # compute the previous noisy sample x_t -> x_t-1
                    latents = self.scheduler.step(
                        noise_pred, t, latents, **extra_step_kwargs
                    ).prev_sample

                    # reshape latents back
                    latents = (
                        latents[None, :]
                        .reshape(bsz, frames, channel, width, height)
                        .permute(0, 2, 1, 3, 4)
                    )

                    # call the callback, if provided
                    if i == len(timesteps) - 1 or (
                        (i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0
                    ):
                        progress_bar.update()
                        if callback is not None and i % callback_steps == 0:
                            callback(i, t, latents)

            video_tensor = self.decode_latents(latents)

            if output_type == "pt":
                video = video_tensor
            else:
                video = tensor2vid(video_tensor)

            if not return_dict:
                return (video,)

            return {"frames": video}

        # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline._encode_prompt
        def _encode_prompt(
            self,
            prompt,
            num_images_per_prompt,
            do_classifier_free_guidance,
            negative_prompt=None,
            prompt_embeds: Optional[torch.FloatTensor] = None,
            negative_prompt_embeds: Optional[torch.FloatTensor] = None,
        ):
            r"""
            Encodes the prompt into text encoder hidden states.

            Args:
                prompt (`str` or `List[str]`, *optional*):
                    prompt to be encoded
                num_images_per_prompt (`int`):
                    number of images that should be generated per prompt
                do_classifier_free_guidance (`bool`):
                    whether to use classifier free guidance or not
                negative_prompt (`str` or `List[str]`, *optional*):
                    The prompt or prompts not to guide the image generation. If not defined, one has to pass
                    `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
                    less than `1`).
                prompt_embeds (`torch.FloatTensor`, *optional*):
                    Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
                    provided, text embeddings will be generated from `prompt` input argument.
                negative_prompt_embeds (`torch.FloatTensor`, *optional*):
                    Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
                    weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
                    argument.
            """
            if prompt is not None and isinstance(prompt, str):
                batch_size = 1
            elif prompt is not None and isinstance(prompt, list):
                batch_size = len(prompt)
            else:
                batch_size = prompt_embeds.shape[0]

            if prompt_embeds is None:
                text_inputs = self.tokenizer(
                    prompt,
                    padding="max_length",
                    max_length=self.tokenizer.model_max_length,
                    truncation=True,
                    return_tensors="pt",
                )
                text_input_ids = text_inputs.input_ids
                untruncated_ids = self.tokenizer(
                    prompt, padding="longest", return_tensors="pt"
                ).input_ids

                if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
                    text_input_ids, untruncated_ids
                ):
                    removed_text = self.tokenizer.batch_decode(
                        untruncated_ids[:, self.tokenizer.model_max_length - 1 : -1]
                    )
                    print(
                        "The following part of your input was truncated because CLIP can only handle sequences up to"
                        f" {self.tokenizer.model_max_length} tokens: {removed_text}"
                    )

                prompt_embeds = self.text_encoder(text_input_ids)
                prompt_embeds = prompt_embeds[0]
                prompt_embeds = torch.tensor(prompt_embeds)

            bs_embed, seq_len, _ = prompt_embeds.shape
            # duplicate text embeddings for each generation per prompt, using mps friendly method
            prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
            prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, seq_len, -1)

            # get unconditional embeddings for classifier free guidance
            if do_classifier_free_guidance and negative_prompt_embeds is None:
                uncond_tokens: List[str]
                if negative_prompt is None:
                    uncond_tokens = [""] * batch_size
                elif type(prompt) is not type(negative_prompt):
                    raise TypeError(
                        f"`negative_prompt` should be the same type as `prompt`, but got {type(negative_prompt)} !="
                        f" {type(prompt)}."
                    )
                elif isinstance(negative_prompt, str):
                    uncond_tokens = [negative_prompt]
                elif batch_size != len(negative_prompt):
                    raise ValueError(
                        f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, but `prompt`:"
                        f" {prompt} has batch size {batch_size}. Please make sure that passed `negative_prompt` matches"
                        " the batch size of `prompt`."
                    )
                else:
                    uncond_tokens = negative_prompt

                max_length = prompt_embeds.shape[1]
                uncond_input = self.tokenizer(
                    uncond_tokens,
                    padding="max_length",
                    max_length=max_length,
                    truncation=True,
                    return_tensors="pt",
                )

                negative_prompt_embeds = self.text_encoder(uncond_input.input_ids)
                negative_prompt_embeds = negative_prompt_embeds[0]
                negative_prompt_embeds = torch.tensor(negative_prompt_embeds)

            if do_classifier_free_guidance:
                # duplicate unconditional embeddings for each generation per prompt, using mps friendly method
                seq_len = negative_prompt_embeds.shape[1]

                negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)
                negative_prompt_embeds = negative_prompt_embeds.view(
                    batch_size * num_images_per_prompt, seq_len, -1
                )

                # For classifier free guidance, we need to do two forward passes.
                # Here we concatenate the unconditional and text embeddings into a single batch
                # to avoid doing two forward passes
                prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])

            return prompt_embeds

        def prepare_latents(
            self,
            batch_size,
            num_channels_latents,
            dtype,
            generator,
            latents=None,
        ):
            shape = (
                batch_size,
                num_channels_latents,
                self.num_frames,
                self.height // self.vae_scale_factor,
                self.width // self.vae_scale_factor,
            )
            if isinstance(generator, list) and len(generator) != batch_size:
                raise ValueError(
                    f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                    f" size of {batch_size}. Make sure the batch size matches the length of the generators."
                )

            if latents is None:
                latents = diffusers.utils.randn_tensor(shape, generator=generator, dtype=dtype)

            # scale the initial noise by the standard deviation required by the scheduler
            latents = latents * self.scheduler.init_noise_sigma
            return latents

        def check_inputs(
            self,
            prompt,
            callback_steps,
            negative_prompt=None,
            prompt_embeds=None,
            negative_prompt_embeds=None,
        ):
            if self.height % 8 != 0 or self.width % 8 != 0:
                raise ValueError(
                    f"`height` and `width` have to be divisible by 8 but are {self.height} and {self.width}."
                )

            if (callback_steps is None) or (
                callback_steps is not None
                and (not isinstance(callback_steps, int) or callback_steps <= 0)
            ):
                raise ValueError(
                    f"`callback_steps` has to be a positive integer but is {callback_steps} of type"
                    f" {type(callback_steps)}."
                )

            if prompt is not None and prompt_embeds is not None:
                raise ValueError(
                    f"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
                    " only forward one of the two."
                )
            elif prompt is None and prompt_embeds is None:
                raise ValueError(
                    "Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined."
                )
            elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
                raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")

            if negative_prompt is not None and negative_prompt_embeds is not None:
                raise ValueError(
                    f"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:"
                    f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
                )

            if prompt_embeds is not None and negative_prompt_embeds is not None:
                if prompt_embeds.shape != negative_prompt_embeds.shape:
                    raise ValueError(
                        "`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but"
                        f" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`"
                        f" {negative_prompt_embeds.shape}."
                    )

        def decode_latents(self, latents):
            scale_factor = 0.18215
            latents = 1 / scale_factor * latents

            batch_size, channels, num_frames, height, width = latents.shape
            latents = latents.permute(0, 2, 1, 3, 4).reshape(
                batch_size * num_frames, channels, height, width
            )
            image = self.vae_decoder(latents)[0]
            image = torch.tensor(image)
            video = (
                image[None, :]
                .reshape(
                    (
                        batch_size,
                        num_frames,
                        -1,
                    )
                    + image.shape[2:]
                )
                .permute(0, 2, 1, 3, 4)
            )
            # we always cast to float32 as this does not cause significant overhead and is compatible with bfloat16
            video = video.float()
            return video

Inference with OpenVINO `⇑ <#top>`__
###############################################################################################################################

.. code:: ipython3

    core = ov.Core()

Select inference device `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Select the device from the dropdown list for running inference using OpenVINO.

.. code:: ipython3

    device = widgets.Dropdown(
        options=core.available_devices + ["AUTO"],
        value='AUTO',
        description='Device:',
        disabled=False,
    )

    device


.. parsed-literal::

    Dropdown(description='Device:', index=4, options=('CPU', 'GPU.0', 'GPU.1', 'GPU.2', 'AUTO'), value='AUTO')


.. code:: ipython3

    %%time
    ov_unet = core.compile_model(unet_xml_path, device_name=device.value)


.. parsed-literal::

    CPU times: user 14.1 s, sys: 5.62 s, total: 19.7 s
    Wall time: 10.6 s


.. code:: ipython3

    %%time
    ov_vae_decoder = core.compile_model(vae_decoder_xml_path, device_name=device.value)


.. parsed-literal::

    CPU times: user 456 ms, sys: 320 ms, total: 776 ms
    Wall time: 328 ms


.. code:: ipython3

    %%time
    ov_text_encoder = core.compile_model(text_encoder_xml, device_name=device.value)


.. parsed-literal::

    CPU times: user 1.78 s, sys: 1.44 s, total: 3.22 s
    Wall time: 1.13 s


Here we replace the pipeline parts with versions converted to OpenVINO
IR and compiled for the selected device. Note that we use the original
pipeline tokenizer and scheduler.

.. code:: ipython3

    ov_pipe = OVTextToVideoSDPipeline(ov_vae_decoder, ov_text_encoder, tokenizer, ov_unet, scheduler)
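Note that a compiled OpenVINO model is called like a function and returns
NumPy arrays (indexable by output number), which is why the pipeline above
wraps the UNet and text encoder outputs back into ``torch.tensor``. A quick
illustration with the compiled text encoder:

.. code:: ipython3

    # Illustration: compiled models accept array-like inputs and return NumPy outputs.
    dummy_ids = np.ones((1, 77), dtype=np.int64)   # any (1, 77) int64 token ids
    embeddings = ov_text_encoder(dummy_ids)[0]     # first model output as a NumPy array
    print(type(embeddings), embeddings.shape)      # <class 'numpy.ndarray'> (1, 77, 1024)
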
Define a prompt `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. code:: ipython3

    prompt = "A panda eating bamboo on a rock."

Let's generate a video for our prompt. For the full list of arguments, see
the ``__call__`` function definition of the ``OVTextToVideoSDPipeline`` class in
the `Build a pipeline <#build-a-pipeline>`__ section.

Video generation `⇑ <#top>`__
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. code:: ipython3

    frames = ov_pipe(prompt, num_inference_steps=25)['frames']


.. parsed-literal::

      0%|          | 0/25 [00:00<?, ?it/s]


.. code:: ipython3

    images = [PIL.Image.fromarray(frame) for frame in frames]
    images[0].save("output.gif", save_all=True, append_images=images[1:], duration=125, loop=0)
    with open("output.gif", "rb") as gif_file:
        b64 = f'data:image/gif;base64,{base64.b64encode(gif_file.read()).decode()}'
    IPython.display.HTML(f"<img src=\"{b64}\" />")


.. image:: 253-zeroscope-text2video-with-output_files/253-zeroscope-text2video-with-output_01_03.gif

Interactive demo `⇑ <#top>`__
###############################################################################################################################

.. code:: ipython3

    def generate(
        prompt, seed, num_inference_steps, _=gr.Progress(track_tqdm=True)
    ):
        generator = torch.Generator().manual_seed(seed)
        frames = ov_pipe(
            prompt,
            num_inference_steps=num_inference_steps,
            generator=generator,
        )["frames"]
        out_file = tempfile.NamedTemporaryFile(suffix=".gif", delete=False)
        images = [PIL.Image.fromarray(frame) for frame in frames]
        images[0].save(
            out_file, save_all=True, append_images=images[1:], duration=125, loop=0
        )
        return out_file.name


    demo = gr.Interface(
        generate,
        [
            gr.Textbox(label="Prompt"),
            gr.Slider(0, 1000000, value=42, label="Seed", step=1),
            gr.Slider(10, 50, value=25, label="Number of inference steps", step=1),
        ],
        gr.Image(label="Result"),
        examples=[
            ["An astronaut riding a horse.", 0, 25],
            ["A panda eating bamboo on a rock.", 0, 25],
            ["Spiderman is surfing.", 0, 25],
        ],
        allow_flagging="never"
    )

    try:
        demo.queue().launch(debug=True)
    except Exception:
        demo.queue().launch(share=True, debug=True)

    # if you are launching remotely, specify server_name and server_port
    # demo.launch(server_name='your server name', server_port='server port in int')
    # Read more in the docs: https://gradio.app/docs/
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f9b3abdf1818a885d159961285a1ef96a2c0c0c99d26eac96435b7813e28198d
size 41341

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c0786f897470a25d935d1f5e096132f086c7f96f42d441102f598828d6d39452
size 1366066
@@ -154,115 +154,117 @@ Demos that demonstrate inference on a particular model.

.. dropdown:: Explore more notebooks below.

+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| Notebook | Description | Preview |
+===============================================================================================================================+============================================================================================================================================+====================================================+
| `201-vision-monodepth <notebooks/201-vision-monodepth-with-output.html>`__ |br| |n201| |br| |c201| | Monocular depth estimation with images and video. | |n201-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `202-vision-superresolution-image <notebooks/202-vision-superresolution-image-with-output.html>`__ |br| |n202i| |br| |c202i| | Upscale raw images with a super resolution model. | |n202i-img1| → |n202i-img2| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `202-vision-superresolution-video <notebooks/202-vision-superresolution-video-with-output.html>`__ |br| |n202v| |br| |c202v| | Turn 360p into 1080p video using a super resolution model. | |n202v-img1| → |n202v-img2| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `203-meter-reader <notebooks/203-meter-reader-with-output.html>`__ |br| |n203| | PaddlePaddle pre-trained models to read industrial meter's value. | |n203-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `204-segmenter-semantic-segmentation <notebooks/204-segmenter-semantic-segmentation-with-output.html>`__ |br| |c204| | Semantic segmentation with OpenVINO™ using Segmenter. | |n204-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `206-vision-paddlegan-anime <notebooks/206-vision-paddlegan-anime-with-output.html>`__ | Turn an image into anime using a GAN. | |n206-img1| → |n206-img2| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `207-vision-paddlegan-superresolution <notebooks/207-vision-paddlegan-superresolution-with-output.html>`__ | Upscale small images with superresolution using a PaddleGAN model. | |n207-img1| → |n207-img2| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `208-optical-character-recognition <notebooks/208-optical-character-recognition-with-output.html>`__ | Annotate text on images using text recognition resnet. | |n208-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `212-pyannote-speaker-diarization <notebooks/212-pyannote-speaker-diarization-with-output.html>`__ | Run inference on speaker diarization pipeline. | |n212-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `210-slowfast-video-recognition <notebooks/210-slowfast-video-recognition-with-output.html>`__ |br| |n210| | Video Recognition using SlowFast and OpenVINO™ | |n210-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `213-question-answering <notebooks/213-question-answering-with-output.html>`__ |br| |n213| | Answer your questions basing on a context. | |n213-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `214-grammar-correction <notebooks/214-grammar-correction-with-output.html>`__ | Grammatical error correction with OpenVINO. | |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `216-attention-center <notebooks/216-attention-center-with-output.html>`__ | The attention center model with OpenVINO™ | |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `217-vision-deblur <notebooks/217-vision-deblur-with-output.html>`__ |br| |n217| | Deblur images with DeblurGAN-v2. | |n217-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `219-knowledge-graphs-conve <notebooks/219-knowledge-graphs-conve-with-output.html>`__ |br| |n219| | Optimize the knowledge graph embeddings model (ConvE) with OpenVINO. | |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
| `220-cross-lingual-books-alignment <notebooks/220-cross-lingual-books-alignment-with-output.html>`__ |br| |n220| |br| |c220| | Cross-lingual Books Alignment With Transformers and OpenVINO™ | |n220-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `221-machine-translation <notebooks/221-machine-translation-with-output.html>`__ |br| |n221| |br| |c221| | Real-time translation from English to German. | |
|
| `221-machine-translation <notebooks/221-machine-translation-with-output.html>`__ |br| |n221| |br| |c221| | Real-time translation from English to German. | |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `222-vision-image-colorization <notebooks/222-vision-image-colorization-with-output.html>`__ |br| |n222| | Use pre-trained models to colorize black & white images using OpenVINO. | |n222-img1| |
|
| `222-vision-image-colorization <notebooks/222-vision-image-colorization-with-output.html>`__ |br| |n222| | Use pre-trained models to colorize black & white images using OpenVINO. | |n222-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `223-text-prediction <notebooks/223-text-prediction-with-output.html>`__ |br| |c223| | Use pre-trained models to perform text prediction on an input sequence. | |n223-img1| |
|
| `223-text-prediction <notebooks/223-text-prediction-with-output.html>`__ |br| |c223| | Use pre-trained models to perform text prediction on an input sequence. | |n223-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `224-3D-segmentation-point-clouds <notebooks/224-3D-segmentation-point-clouds-with-output.html>`__ | Process point cloud data and run 3D Part Segmentation with OpenVINO. | |n224-img1| |
|
| `224-3D-segmentation-point-clouds <notebooks/224-3D-segmentation-point-clouds-with-output.html>`__ | Process point cloud data and run 3D Part Segmentation with OpenVINO. | |n224-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `225-stable-diffusion-text-to-image <notebooks/225-stable-diffusion-text-to-image-with-output.html>`__ | Text-to-image generation with Stable Diffusion method. | |n225-img1| |
|
| `225-stable-diffusion-text-to-image <notebooks/225-stable-diffusion-text-to-image-with-output.html>`__ | Text-to-image generation with Stable Diffusion method. | |n225-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `226-yolov7-optimization <notebooks/226-yolov7-optimization-with-output.html>`__ | Optimize YOLOv7, using NNCF PTQ API. | |n226-img1| |
|
| `226-yolov7-optimization <notebooks/226-yolov7-optimization-with-output.html>`__ | Optimize YOLOv7, using NNCF PTQ API. | |n226-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `227-whisper-subtitles-generation <notebooks/227-whisper-subtitles-generation-with-output.html>`__ |br| |c227| | Generate subtitles for video with OpenAI Whisper and OpenVINO. | |n227-img1| |
|
| `227-whisper-subtitles-generation <notebooks/227-whisper-subtitles-generation-with-output.html>`__ |br| |c227| | Generate subtitles for video with OpenAI Whisper and OpenVINO. | |n227-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `228-clip-zero-shot-convert <notebooks/228-clip-zero-shot-convert-with-output.html>`__ | Zero-shot Image Classification with OpenAI CLIP and OpenVINO™ | |n228-img1| |
|
| `228-clip-zero-shot-convert <notebooks/228-clip-zero-shot-convert-with-output.html>`__ | Zero-shot Image Classification with OpenAI CLIP and OpenVINO™ | |n228-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `228-clip-zero-shot-quantize <notebooks/228-clip-zero-shot-quantize-with-output.html>`__ | Post-Training Quantization of OpenAI CLIP model with NNCF | |n228-img2| |
|
| `228-clip-zero-shot-quantize <notebooks/228-clip-zero-shot-quantize-with-output.html>`__ | Post-Training Quantization of OpenAI CLIP model with NNCF | |n228-img2| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `229-distilbert-sequence-classification <notebooks/229-distilbert-sequence-classification-with-output.html>`__ |br| |n229| | Sequence classification with OpenVINO. | |n229-img1| |
|
| `229-distilbert-sequence-classification <notebooks/229-distilbert-sequence-classification-with-output.html>`__ |br| |n229| | Sequence classification with OpenVINO. | |n229-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `230-yolov8-optimization <notebooks/230-yolov8-optimization-with-output.html>`__ |br| |c230| | Optimize YOLOv8, using NNCF PTQ API. | |n230-img1| |
|
| `230-yolov8-optimization <notebooks/230-yolov8-optimization-with-output.html>`__ |br| |c230| | Optimize YOLOv8, using NNCF PTQ API. | |n230-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `231-instruct-pix2pix-image-editing <notebooks/231-instruct-pix2pix-image-editing-with-output.html>`__ | Image editing with InstructPix2Pix. | |n231-img1| |
|
| `231-instruct-pix2pix-image-editing <notebooks/231-instruct-pix2pix-image-editing-with-output.html>`__ | Image editing with InstructPix2Pix. | |n231-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `232-clip-language-saliency-map <notebooks/232-clip-language-saliency-map-with-output.html>`__ |br| |c232| | Language-visual saliency with CLIP and OpenVINO™. | |n232-img1| |
|
| `232-clip-language-saliency-map <notebooks/232-clip-language-saliency-map-with-output.html>`__ |br| |c232| | Language-visual saliency with CLIP and OpenVINO™. | |n232-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `233-blip-visual-language-processing <notebooks/233-blip-visual-language-processing-with-output.html>`__ | Visual question answering and image captioning using BLIP and OpenVINO™. | |n233-img1| |
|
| `233-blip-visual-language-processing <notebooks/233-blip-visual-language-processing-with-output.html>`__ | Visual question answering and image captioning using BLIP and OpenVINO™. | |n233-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `234-encodec-audio-compression <notebooks/234-encodec-audio-compression-with-output.html>`__ | Audio compression with EnCodec and OpenVINO™. | |n234-img1| |
|
| `234-encodec-audio-compression <notebooks/234-encodec-audio-compression-with-output.html>`__ | Audio compression with EnCodec and OpenVINO™. | |n234-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `235-controlnet-stable-diffusion <notebooks/235-controlnet-stable-diffusion-with-output.html>`__ | A text-to-image generation with ControlNet Conditioning and OpenVINO™. | |n235-img1| |
|
| `235-controlnet-stable-diffusion <notebooks/235-controlnet-stable-diffusion-with-output.html>`__ | A text-to-image generation with ControlNet Conditioning and OpenVINO™. | |n235-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.html>`__ | Text-to-image generation and Infinite Zoom with Stable Diffusion v2 and OpenVINO™. | |n236-img1| |
|
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.html>`__ | Text-to-image generation and Infinite Zoom with Stable Diffusion v2 and OpenVINO™. | |n236-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.html>`__ | Stable Diffusion v2.1 using Optimum-Intel OpenVINO and multiple Intel Hardware. | |n236-img4| |
|
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.html>`__ | Stable Diffusion v2.1 using Optimum-Intel OpenVINO and multiple Intel Hardware. | |n236-img4| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-optimum-demo-with-output.html>`__ | Stable Diffusion v2.1 using Optimum-Intel OpenVINO. | |n236-img4| |
|
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-optimum-demo-with-output.html>`__ | Stable Diffusion v2.1 using Optimum-Intel OpenVINO. | |n236-img4| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.html>`__ | Stable Diffusion Text-to-Image Demo. | |n236-img4| |
|
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.html>`__ | Stable Diffusion Text-to-Image Demo. | |n236-img4| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-text-to-image-with-output.html>`__ | Text-to-image generation with Stable Diffusion v2 and OpenVINO™. | |n236-img1| |
|
| `236-stable-diffusion-v2 <notebooks/236-stable-diffusion-v2-text-to-image-with-output.html>`__ | Text-to-image generation with Stable Diffusion v2 and OpenVINO™. | |n236-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `237-segment-anything <notebooks/237-segment-anything-with-output.html>`__ | Prompt based object segmentation mask generation, using Segment Anything and OpenVINO™. | |n237-img1| |
|
| `237-segment-anything <notebooks/237-segment-anything-with-output.html>`__ | Prompt based object segmentation mask generation, using Segment Anything and OpenVINO™. | |n237-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `238-deep-floyd-if <notebooks/238-deep-floyd-if-with-output.html>`__ | Text-to-image generation with DeepFloyd IF and OpenVINO™. | |n238-img1| |
|
| `238-deep-floyd-if <notebooks/238-deep-floyd-if-with-output.html>`__ | Text-to-image generation with DeepFloyd IF and OpenVINO™. | |n238-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `239-image-bind <notebooks/239-image-bind-convert-with-output.html>`__ | Binding multimodal data, using ImageBind and OpenVINO™. | |n239-img1| |
|
| `239-image-bind <notebooks/239-image-bind-convert-with-output.html>`__ | Binding multimodal data, using ImageBind and OpenVINO™. | |n239-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `240-dolly-2-instruction-following <notebooks/240-dolly-2-instruction-following-with-output.html>`__ | Instruction following using Databricks Dolly 2.0 and OpenVINO™. | |n240-img1| |
|
| `240-dolly-2-instruction-following <notebooks/240-dolly-2-instruction-following-with-output.html>`__ | Instruction following using Databricks Dolly 2.0 and OpenVINO™. | |n240-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `241-riffusion-text-to-music <notebooks/241-riffusion-text-to-music-with-output.html>`__ | Text-to-Music generation using Riffusion and OpenVINO™. | |n241-img1| |
|
| `241-riffusion-text-to-music <notebooks/241-riffusion-text-to-music-with-output.html>`__ | Text-to-Music generation using Riffusion and OpenVINO™. | |n241-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `242-freevc-voice-conversion <notebooks/242-freevc-voice-conversion-with-output.html>`__ | High-Quality Text-Free One-Shot Voice Conversion with FreeVC and OpenVINO™ | |
|
| `242-freevc-voice-conversion <notebooks/242-freevc-voice-conversion-with-output.html>`__ | High-Quality Text-Free One-Shot Voice Conversion with FreeVC and OpenVINO™ | |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `243-tflite-selfie-segmentation <notebooks/243-tflite-selfie-segmentation-with-output.html>`__ |br| |n243| |br| |c243| | Selfie Segmentation using TFLite and OpenVINO™. | |n243-img1| |
|
| `243-tflite-selfie-segmentation <notebooks/243-tflite-selfie-segmentation-with-output.html>`__ |br| |n243| |br| |c243| | Selfie Segmentation using TFLite and OpenVINO™. | |n243-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `244-named-entity-recognition <notebooks/244-named-entity-recognition-with-output.html>`__ |br| |c244| | Named entity recognition with OpenVINO™. | |
|
| `244-named-entity-recognition <notebooks/244-named-entity-recognition-with-output.html>`__ |br| |c244| | Named entity recognition with OpenVINO™. | |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `245-typo-detector <notebooks/245-typo-detector-with-output.html>`__ | English Typo Detection in sentences with OpenVINO™. | |n245-img1| |
|
| `245-typo-detector <notebooks/245-typo-detector-with-output.html>`__ | English Typo Detection in sentences with OpenVINO™. | |n245-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `246-depth-estimation-videpth <notebooks/246-depth-estimation-videpth-with-output.html>`__ | Monocular Visual-Inertial Depth Estimation with OpenVINO™. | |n246-img1| |
|
| `246-depth-estimation-videpth <notebooks/246-depth-estimation-videpth-with-output.html>`__ | Monocular Visual-Inertial Depth Estimation with OpenVINO™. | |n246-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `247-code-language-id <notebooks/247-code-language-id-with-output.html>`__ |br| |n247| | Identify the programming language used in an arbitrary code snippet. | |
|
| `247-code-language-id <notebooks/247-code-language-id-with-output.html>`__ |br| |n247| | Identify the programming language used in an arbitrary code snippet. | |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `248-stable-diffusion-xl <notebooks/248-stable-diffusion-xl-with-output.html>`__ | Image generation with Stable Diffusion XL and OpenVINO™. | |n248-img1| |
|
| `248-stable-diffusion-xl <notebooks/248-stable-diffusion-xl-with-output.html>`__ | Image generation with Stable Diffusion XL and OpenVINO™. | |n248-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `249-oneformer-segmentation <notebooks/249-oneformer-segmentation-with-output.html>`__ | Universal segmentation with OneFormer and OpenVINO™. | |n249-img1| |
|
| `249-oneformer-segmentation <notebooks/249-oneformer-segmentation-with-output.html>`__ | Universal segmentation with OneFormer and OpenVINO™. | |n249-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `250-music-generation <notebooks/250-music-generation-with-output.html>`__ |br| |n250| |br| |c250| | Controllable Music Generation with MusicGen and OpenVINO™. | |n250-img1| |
|
| `250-music-generation <notebooks/250-music-generation-with-output.html>`__ |br| |n250| |br| |c250| | Controllable Music Generation with MusicGen and OpenVINO™. | |n250-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `251-tiny-sd-image-generation <notebooks/251-tiny-sd-image-generation-with-output.html>`__ |br| |c251| | Image Generation with Tiny-SD and OpenVINO™. | |n251-img1| |
|
| `251-tiny-sd-image-generation <notebooks/251-tiny-sd-image-generation-with-output.html>`__ |br| |c251| | Image Generation with Tiny-SD and OpenVINO™. | |n251-img1| |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
||||||
| `252-fastcomposer-image-generation <notebooks/252-fastcomposer-image-generation-with-output.html>`__ | Image generation with FastComposer and OpenVINO™. | |
|
| `252-fastcomposer-image-generation <notebooks/252-fastcomposer-image-generation-with-output.html>`__ | Image generation with FastComposer and OpenVINO™. | |
|
||||||
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
|
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|
| `253-zeroscope-text2video <notebooks/253-zeroscope-text2video-with-output.html>`__                                             | Text-to-video synthesis with ZeroScope and OpenVINO™.                                                                                      | A panda eating bamboo on a rock. |br| |n253-img1|  |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+

Model Training
   :target: https://user-images.githubusercontent.com/76463150/260439306-81c81c8d-1f9c-41d0-b881-9491766def8e.png

.. |n251-img1| image:: https://user-images.githubusercontent.com/29454499/260904650-274fc2f9-24d2-46a3-ac3d-d660ec3c9a19.png
   :target: https://user-images.githubusercontent.com/29454499/260904650-274fc2f9-24d2-46a3-ac3d-d660ec3c9a19.png

.. |n253-img1| image:: https://user-images.githubusercontent.com/76161256/261102399-500956d5-4aac-4710-a77c-4df34bcda3be.gif
   :target: https://user-images.githubusercontent.com/76161256/261102399-500956d5-4aac-4710-a77c-4df34bcda3be.gif

.. |n301-img1| image:: https://user-images.githubusercontent.com/15709723/127779607-8fa34947-1c35-4260-8d04-981c41a2a2cc.png
   :target: https://user-images.githubusercontent.com/15709723/127779607-8fa34947-1c35-4260-8d04-981c41a2a2cc.png

.. |n401-img1| image:: https://user-images.githubusercontent.com/4547501/141471665-82b28c86-cf64-4bfe-98b3-c314658f2d96.gif