diff --git a/docs/Documentation/openvino_workflow.md b/docs/Documentation/openvino_workflow.md
index a304400f102..3d617a35155 100644
--- a/docs/Documentation/openvino_workflow.md
+++ b/docs/Documentation/openvino_workflow.md
@@ -17,6 +17,7 @@ Running Inference
   Deployment on a Local System
   Deployment on a Model Server
  pytorch_2_0_torch_compile

| :doc:`Model Preparation `
diff --git a/docs/Documentation/torch_compile.md b/docs/Documentation/torch_compile.md
new file mode 100644
index 00000000000..37c5c4da299
--- /dev/null
+++ b/docs/Documentation/torch_compile.md
@@ -0,0 +1,157 @@
# PyTorch Deployment via "torch.compile" {#pytorch_2_0_torch_compile}

@sphinxdirective


The ``torch.compile`` feature enables you to use OpenVINO for PyTorch-native applications.
It speeds up PyTorch code by JIT-compiling it into optimized kernels.
By default, Torch code runs in eager mode, but with the use of ``torch.compile`` it goes through the following steps:

1. **Graph acquisition** - the model is rewritten as blocks of subgraphs that are either:

   * compiled by TorchDynamo and "flattened",
   * falling back to eager mode, due to unsupported Python constructs (like control-flow code).

2. **Graph lowering** - all PyTorch operations are decomposed into their constituent kernels, specific to the chosen backend.
3. **Graph compilation** - the kernels call their corresponding low-level, device-specific operations.


How to Use
#################

To use ``torch.compile``, you need to add an import statement and define one of the two available backends:

| ``openvino``
| With this backend, Torch FX subgraphs are directly converted to the OpenVINO representation, without any additional PyTorch-based tracing/scripting.

| ``openvino_ts``
| With this backend, Torch FX subgraphs are first traced/scripted with PyTorch TorchScript, and then converted to the OpenVINO representation.


.. tab-set::

   .. tab-item:: openvino
      :sync: backend-openvino

      .. code-block:: python

         import openvino.torch
         ...
         model = torch.compile(model, backend='openvino')

      Execution diagram:

      .. image:: _static/images/torch_compile_backend_openvino.svg
         :width: 992px
         :height: 720px
         :scale: 60%
         :align: center

   .. tab-item:: openvino_ts
      :sync: backend-openvino-ts

      .. code-block:: python

         import openvino.torch
         ...
         model = torch.compile(model, backend='openvino_ts')

      Execution diagram:

      .. image:: _static/images/torch_compile_backend_openvino_ts.svg
         :width: 1088px
         :height: 720px
         :scale: 60%
         :align: center


Environment Variables
+++++++++++++++++++++++++++

* **OPENVINO_TORCH_BACKEND_DEVICE**: selects the hardware device on which the application runs.
  By default, the OpenVINO backend for ``torch.compile`` runs PyTorch applications on the CPU. Setting
  this variable to ``GPU.0``, for example, makes the application use the integrated graphics processor instead.
* **OPENVINO_TORCH_MODEL_CACHING**: enables saving the optimized model files to a hard drive after the first application run.
  This makes them available for subsequent application executions, reducing the first-inference latency.
  By default, this variable is set to ``False``. Setting it to ``True`` enables caching.
* **OPENVINO_TORCH_CACHE_DIR**: defines a custom directory for the model files (if model caching is set to ``True``).
  By default, the OpenVINO IR is saved in the ``cache`` sub-directory, created in the application's root directory.

Windows support
++++++++++++++++++++++++++

Currently, PyTorch does not officially support the ``torch.compile`` feature on Windows. However, it can be enabled
with the following steps:

1. Install the PyTorch nightly wheel file - `2.1.0.dev20230713 `__ .
2. Update the file at ``/Lib/site-packages/torch/_dynamo/eval_frame.py``.
3. Find the function called ``check_if_dynamo_supported()``:

   .. code-block:: python

      def check_if_dynamo_supported():
          if sys.platform == "win32":
              raise RuntimeError("Windows not yet supported for torch.compile")
          if sys.version_info >= (3, 11):
              raise RuntimeError("Python 3.11+ not yet supported for torch.compile")

4. Comment out the first two lines in this function, so it looks like this:

   .. code-block:: python

      def check_if_dynamo_supported():
          #if sys.platform == "win32":
          #    raise RuntimeError("Windows not yet supported for torch.compile")
          if sys.version_info >= (3, 11):
              raise RuntimeError("Python 3.11+ not yet supported for torch.compile")


Support for Automatic1111 Stable Diffusion WebUI
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Automatic1111 Stable Diffusion WebUI is an open-source repository that hosts a browser-based interface for
Stable Diffusion-based image generation. It allows users to create realistic and creative images from text prompts.
Stable Diffusion WebUI is supported on Intel CPUs, Intel integrated GPUs, and Intel discrete GPUs by leveraging the OpenVINO
``torch.compile`` capability. Detailed instructions are available in the
`Stable Diffusion WebUI repository. `__


Architecture
#################

The ``torch.compile`` feature is part of PyTorch 2.0, and is based on:

* **TorchDynamo** - a Python-level JIT that hooks into the frame evaluation API in CPython
  (PEP 523) to dynamically modify Python bytecode right before it is executed (PyTorch operators
  that cannot be extracted to an FX graph are executed in the native Python environment).
  It maintains the eager-mode capabilities using
  `Guards `__ to ensure the
  generated graphs are valid.

* **AOTAutograd** - generates the backward graph corresponding to the forward graph captured by TorchDynamo.
* **PrimTorch** - decomposes complicated PyTorch operations into simpler and more elementary ops.
* **TorchInductor** - a deep learning compiler that generates fast code for multiple accelerators and backends.


When a PyTorch module is wrapped with ``torch.compile``, TorchDynamo traces the module and
rewrites the Python bytecode to extract sequences of PyTorch operations into an FX graph,
which can be optimized by the OpenVINO backend. The Torch FX graphs are first converted to
inlined FX graphs, and the graph partitioning module traverses the inlined FX graph to identify
operators supported by OpenVINO.

All the supported operators are clustered into OpenVINO submodules, converted to the OpenVINO
graph using OpenVINO's PyTorch decoder, and executed in an optimized manner using the OpenVINO runtime.
All unsupported operators fall back to the native PyTorch runtime on CPU. If a subgraph
fails during OpenVINO conversion, it falls back to PyTorch's default inductor backend.


Additional Resources
############################

* `PyTorch 2.0 documentation `_

@endsphinxdirective
diff --git a/docs/_static/css/homepage_style.css b/docs/_static/css/homepage_style.css
index e76b61374d2..e505be4088e 100644
--- a/docs/_static/css/homepage_style.css
+++ b/docs/_static/css/homepage_style.css
@@ -167,3 +167,7 @@ h1 {
    max-width: 100%;
  }
}

.sd-row {
  --sd-gutter-x: 0rem!important;
}
diff --git a/docs/_static/images/torch_compile_backend_openvino.svg b/docs/_static/images/torch_compile_backend_openvino.svg
new file mode 100644
index 00000000000..4be98857e76
--- /dev/null
+++ b/docs/_static/images/torch_compile_backend_openvino.svg
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e26fe889ada0e02a3bbc03e451a7e1d4b06037723349971efff1d721b5e13f6
size 117253
diff --git a/docs/_static/images/torch_compile_backend_openvino_ts.svg b/docs/_static/images/torch_compile_backend_openvino_ts.svg
new file mode 100644
index 00000000000..1d0606b9fc9
--- /dev/null
+++ b/docs/_static/images/torch_compile_backend_openvino_ts.svg
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7c5ba73be918d2105b54e2afe36aa99e8fc5313489afc08f1a92ff75580667ba
size 173373
diff --git a/docs/home.rst b/docs/home.rst
index b2ba43180b7..0fd9241372e 100644
--- a/docs/home.rst
+++ b/docs/home.rst
@@ -26,6 +26,7 @@ OpenVINO 2023.0
  • Even more integrations in 2023.0!
    Load TensorFlow, TensorFlow Lite, and PyTorch models directly, without manual conversion.
    See the supported model formats...
  • CPU inference has become even better. ARM processors are supported and thread scheduling is available on 12th gen Intel® Core and up.
    See how to run OpenVINO on various devices...
  • Post-training optimization and quantization-aware training now in one tool!
    See the new NNCF capabilities...
  • OpenVINO is enabled as a PyTorch 2.0 torch.compile() backend.
    See how it works...
@@ -83,6 +84,13 @@ OpenVINO 2023.0
      Reach for performance with post-training and training-time compression with NNCF

   .. grid-item-card:: PyTorch 2.0 - torch.compile() backend
      :link: pytorch_2_0_torch_compile
      :link-alt: torch.compile
      :link-type: doc

      Optimize generation of the graph model with PyTorch 2.0 torch.compile() backend

Feature Overview
##############################
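Review note: the graph-partitioning behavior described in the new ``torch_compile.md`` page (supported operators clustered into OpenVINO submodules, unsupported ones falling back to native PyTorch) can be sketched with a toy partitioner. The operator names, the ``supported`` set, and the ``partition`` helper below are purely illustrative assumptions, not part of the OpenVINO or PyTorch API.

```python
# Toy sketch of greedy graph partitioning: consecutive operators supported by the
# backend are clustered into one submodule; unsupported operators are left to fall
# back to the host runtime. Operator names and the "supported" set are made up for
# illustration and do not reflect OpenVINO's actual operator coverage.

def partition(ops, supported):
    """Split a linear sequence of ops into (backend, [ops]) clusters."""
    clusters = []
    for op in ops:
        backend = "openvino" if op in supported else "pytorch"
        if clusters and clusters[-1][0] == backend:
            clusters[-1][1].append(op)        # extend the current cluster
        else:
            clusters.append((backend, [op]))  # start a new cluster
    return clusters

graph = ["conv2d", "relu", "custom_op", "matmul", "softmax"]
supported = {"conv2d", "relu", "matmul", "softmax"}
print(partition(graph, supported))
# [('openvino', ['conv2d', 'relu']), ('pytorch', ['custom_op']), ('openvino', ['matmul', 'softmax'])]
```

In the real backend the clusters are converted through OpenVINO's PyTorch decoder; this sketch only shows why a single unsupported operator splits the graph into multiple submodules.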