[DOCS] Moving files to new location for master (#21575)

* repo update
* repo update
* repo update
* Update LowPrecisionModelRepresentation.rst
* Update LowPrecisionModelRepresentation.rst
* Update CMakeLists.txt
* Update QuantizedNetworks.rst
* separate location for pypi publishing files
* Update src/bindings/python/CMakeLists.txt

fixing path in src/bindings/python/CMakeLists.txt

---------

Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
2023-12-12 13:27:01 +01:00 · 2023-12-12 13:27:01 +01:00 · 2f65827f9f
commit 2f65827f9f
parent f7d8bdef46
62 changed files with 63 additions and 1277 deletions
--- a/docs/IE_PLUGIN_DG/LowPrecisionModelRepresentation.rst
+++ b/docs/IE_PLUGIN_DG/LowPrecisionModelRepresentation.rst
@@ -1,21 +0,0 @@
-.. {#openvino_docs_ie_plugin_dg_lp_representation}
-
-Representation of low-precision models
-======================================
-The goal of this document is to describe how optimized models are represented in OpenVINO Intermediate Representation (IR) and provide guidance on interpretation rules for such models at runtime. 
-Currently, there are two groups of optimization methods that can influence the IR after they are applied to a full-precision model:
- **Sparsity**. It is represented by zeros inside the weights, and it is up to the hardware plugin how to interpret these zeros (use the weights as is or apply special compression algorithms and sparse arithmetic). No additional mask is provided with the model.
- **Quantization**. The rest of this document is dedicated to the representation of quantized models.
-
-## Representation of quantized models
-The OpenVINO Toolkit represents all the quantized models using the so-called FakeQuantize operation (see the description in [this document](@ref openvino_docs_ops_quantization_FakeQuantize_1)). This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. It can be considered as an emulation of the quantization process which happens at runtime.
-To execute a particular DL operation in low precision, all its inputs should be quantized, i.e. there should be a FakeQuantize operation between the operation and its data blobs. The figure below shows an example of a quantized Convolution, which contains two FakeQuantize nodes: one for weights and one for activations (bias is quantized using the same parameters).
-![quantized_convolution]
-<div align="center">Figure 1. Example of quantized Convolution operation.</div>
-
-Starting from the OpenVINO 2020.2 release, all quantized models are represented in compressed form. This means that the weights of low-precision operations are converted into the target precision (e.g. INT8), which substantially reduces the model size. The rest of the parameters can be represented in FLOAT32 or FLOAT16 precision, depending on the input full-precision model used in the quantization process. Fig. 2 below shows an example of a part of the compressed IR.
-![quantized_model_example]
-<div align="center">Figure 2. Example of compressed quantized model.</div>  
-
-[quantized_convolution]: images/quantized_convolution.png
-[quantized_model_example]: images/quantized_model_example.png
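The clamp, round, and reproject steps that the deleted document attributes to FakeQuantize can be sketched in plain Python. This is a minimal illustration, not the OpenVINO implementation: the function name, the scalar-only interface, the round-half-up rounding, and the assumption that `input_low < input_high` are all choices made here for brevity.

```python
import math


def fake_quantize(x, input_low, input_high, output_low, output_high, levels):
    """Emulate quantization of one float: clamp, discretize, then reproject."""
    # Values outside the input range are clamped to the output bounds.
    if x <= input_low:
        return output_low
    if x > input_high:
        return output_high
    # Affine map of the input range onto integer steps 0 .. levels - 1.
    scaled = (x - input_low) / (input_high - input_low) * (levels - 1)
    step = math.floor(scaled + 0.5)  # assumption: round half up
    # Reproject the discrete step back into the floating-point output range.
    return step / (levels - 1) * (output_high - output_low) + output_low


values = [-2.0, 0.0, 0.3, 1.0, 5.0]
quantized = [fake_quantize(v, 0.0, 1.0, 0.0, 1.0, levels=5) for v in values]
print(quantized)
```

With `levels=5` the range `[0, 1]` has only five representable points, so `0.3` snaps to `0.25`. An INT8 model would typically use `levels=256`.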
--- a/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_Caffe_Python_Layers.rst
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_Caffe_Python_Layers.rst
@@ -1,110 +0,0 @@
-# [LEGACY] Extending Model Optimizer with Caffe Python Layers {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers}
-
-
-.. meta::
-  :description: Learn how to extract operator attributes in Model Optimizer to 
-                support a custom Caffe operation written only in Python.
-
-.. danger::
-
-   The code described here has been **deprecated!** Do not use it to avoid working with a legacy solution. It will be kept for some time to ensure backwards compatibility, but **you should not use** it in contemporary applications.
-
-   This guide describes a deprecated TensorFlow conversion method. The guide on the new and recommended method, using a new frontend, can be found in the  :doc:`Frontend Extensions <openvino_docs_Extensibility_UG_Frontend_Extensions>` article. 
-
-This article provides instructions on how to support a custom Caffe operation written only in Python. For example, the
-`Faster-R-CNN model <https://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0>`__ implemented in
-Caffe contains a custom proposal layer written in Python. The layer is described in the
-`Faster-R-CNN prototxt <https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt>`__ in the following way:
-
-.. code-block:: sh
-
-   layer {
-     name: 'proposal'
-     type: 'Python'
-     bottom: 'rpn_cls_prob_reshape'
-     bottom: 'rpn_bbox_pred'
-     bottom: 'im_info'
-     top: 'rois'
-     python_param {
-       module: 'rpn.proposal_layer'
-       layer: 'ProposalLayer'
-       param_str: "'feat_stride': 16"
-     }
-   }
-
-
-This article describes only the procedure for extracting operator attributes in Model Optimizer. The rest of the
-operation enabling pipeline and information on how to support other Caffe operations (written in C++) are described in
-the :doc:`Customize Model Optimizer <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>` guide.
-
-========================================
-Writing Extractor for Caffe Python Layer
-========================================
-
-Custom Caffe Python layers have an attribute ``type`` (defining the type of the operation) equal to ``Python`` and two
-mandatory attributes ``module`` and ``layer`` in the ``python_param`` dictionary. The ``module`` defines the Python module name
-with the layer implementation, while the ``layer`` value is an operation type defined by a user. To extract
-attributes for such an operation, it is necessary to implement an extractor class inherited from the
-``CaffePythonFrontExtractorOp`` class instead of the ``FrontExtractorOp`` class used for standard framework layers. The ``op``
-class attribute value should be set to the ``module + "." + layer`` value so the extractor is triggered for this kind of
-operation.
-
-Below is a simplified example of the extractor for the custom operation Proposal from the mentioned Faster-R-CNN model.
-The full code with additional checks can be found `here <https://github.com/openvinotoolkit/openvino/blob/releases/2022/1/tools/mo/openvino/tools/mo/front/caffe/proposal_python_ext.py>`__.
-
-The sample code uses the ``ProposalOp`` operation, which corresponds to the ``Proposal`` operation described on the :doc:`Available Operations Sets <openvino_docs_ops_opset>`
-page. For a detailed explanation of the extractor, refer to the source code below.
-
-.. code-block:: py
-   :force:
-
-   from openvino.tools.mo.ops.proposal import ProposalOp
-   from openvino.tools.mo.front.extractor import CaffePythonFrontExtractorOp
-   
-   
-   class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp):
-       op = 'rpn.proposal_layer.ProposalLayer'  # module + "." + layer
-       enabled = True  # extractor is enabled
-   
-       @staticmethod
-       def extract_proposal_params(node, defaults):
-           param = node.pb.python_param  # get the protobuf message representation of the layer attributes
-           # parse attributes from the layer protobuf message to a Python dictionary
-           attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str)
-           update_attrs = defaults
-   
-           # the operation expects ratio and scale values to be called "ratio" and "scale" while Caffe uses different names
-           if 'ratios' in attrs:
-               attrs['ratio'] = attrs['ratios']
-               del attrs['ratios']
-           if 'scales' in attrs:
-               attrs['scale'] = attrs['scales']
-               del attrs['scales']
-   
-           update_attrs.update(attrs)
-           ProposalOp.update_node_stat(node, update_attrs)  # update the node attributes
-   
-       @classmethod
-       def extract(cls, node):
-           # define default values for the Proposal layer attributes
-           defaults = {
-               'feat_stride': 16,
-               'base_size': 16,
-               'min_size': 16,
-               'ratio': [0.5, 1, 2],
-               'scale': [8, 16, 32],
-               'pre_nms_topn': 6000,
-               'post_nms_topn': 300,
-               'nms_thresh': 0.7
-           }
-           cls.extract_proposal_params(node, defaults)
-           return cls.enabled
-
-====================
-Additional Resources
-====================
-
-* :doc:`Model Optimizer Extensibility <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>`
-* :doc:`Graph Traversal and Modification Using Ports and Connections <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections>`
-* :doc:`Model Optimizer Extensions <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions>`
-
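The attribute handling in the `ProposalPythonFrontExtractor` listing of the deleted file (parse `param_str`, rename the Caffe keys, overlay the result on the defaults) can be tried without Model Optimizer installed. The sketch below is standalone: `parse_param_str_sketch` and `merge_proposal_attrs` are hypothetical helpers written for this illustration, and parsing through `ast.literal_eval` is an assumption about how such a string can be read, not a statement about the real `CaffePythonFrontExtractorOp.parse_param_str`.

```python
import ast


def parse_param_str_sketch(param_str):
    """Hypothetical stand-in: turn "'feat_stride': 16" into {'feat_stride': 16}."""
    text = param_str.strip()
    if not text.startswith('{'):
        text = '{' + text + '}'
    return ast.literal_eval(text)


def merge_proposal_attrs(param_str, defaults):
    """Rename Caffe-style keys, then overlay parsed values on a copy of defaults."""
    attrs = parse_param_str_sketch(param_str)
    for caffe_name, op_name in (('ratios', 'ratio'), ('scales', 'scale')):
        if caffe_name in attrs:
            attrs[op_name] = attrs.pop(caffe_name)
    merged = dict(defaults)  # copy, so the caller's defaults stay untouched
    merged.update(attrs)
    return merged


defaults = {'feat_stride': 16, 'ratio': [0.5, 1, 2], 'scale': [8, 16, 32]}
result = merge_proposal_attrs("'feat_stride': 8, 'scales': [4, 8]", defaults)
print(result)
```

Unlike the listing, which updates the `defaults` dictionary in place, this sketch works on a copy so that the defaults can be reused across calls.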
--- a/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Extensions.rst
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Extensions.rst
@@ -1,60 +0,0 @@
-# [LEGACY] Model Optimizer Extensions {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions}
-
-
-.. meta::
-   :description: Learn about deprecated extensions, which enable injecting logic 
-                 to the model conversion pipeline without changing the Model 
-                 Optimizer core code.
-
-.. toctree::
-   :maxdepth: 1
-   :hidden:
-
-   openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Operation
-   openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Extractor
-   openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Transformation_Extensions
-
-.. danger::
-
-   The code described here has been **deprecated!** Do not use it to avoid working with a legacy solution. It will be kept for some time to ensure backwards compatibility, but **you should not use** it in contemporary applications.
-
-   This guide describes a deprecated TensorFlow conversion method. The guide on the new and recommended method, using a new frontend, can be found in the  :doc:`Frontend Extensions <openvino_docs_Extensibility_UG_Frontend_Extensions>` article. 
-
-Model Optimizer extensions enable you to inject logic into the model conversion pipeline without changing the Model
-Optimizer core code. There are three types of Model Optimizer extensions:
-
-1. A :doc:`Model Optimizer operation <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Operation>`.
-2. A :doc:`framework operation extractor <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Extractor>`.
-3. A :doc:`model transformation <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Transformation_Extensions>`, which can be executed during front, middle or back phase of the model conversion.
-
-An extension is just a plain text file with Python code. The file should contain a class (or classes) inherited from
-one of the extension base classes. Extension files should be saved to a directory with the following structure:
-
-.. code-block:: sh
-   
-   ./<MY_EXT>/
-              ops/                  - custom operations
-              front/                - framework independent front transformations
-                    <FRAMEWORK_1>/  - front transformations for <FRAMEWORK_1> models only and extractors for <FRAMEWORK_1> operations
-                    <FRAMEWORK_2>/  - front transformations for <FRAMEWORK_2> models only and extractors for <FRAMEWORK_2> operations
-                    ...
-              middle/               - middle transformations
-              back/                 - back transformations
-
-Model Optimizer uses the same layout internally to keep built-in extensions. The only exception is that the 
-``mo/ops/`` directory is also used as a source of the Model Optimizer operations due to historical reasons.
-
-.. note:: 
-   The name of a root directory with extensions should not be equal to "extensions" because it will result in a name conflict with the built-in Model Optimizer extensions.
-
-.. note:: 
-   Model Optimizer itself is built by using these extensions, so there is a huge number of examples of their usage in the Model Optimizer code.
-
-====================
-Additional Resources
-====================
-
-* :doc:`Model Optimizer Extensibility <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>`
-* :doc:`Graph Traversal and Modification Using Ports and Connections <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections>`
-* :doc:`Extending Model Optimizer with Caffe Python Layers <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers>`
-
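The extension directory layout described in the deleted file can be scaffolded with a few lines of standard-library Python. This is only a convenience sketch: `create_extension_layout` is a hypothetical helper, not part of Model Optimizer, and the framework folder names `tf` and `onnx` are taken from the built-in paths such as `extensions/front/tf/` mentioned elsewhere in these documents.

```python
from pathlib import Path
import tempfile


def create_extension_layout(root, frameworks):
    """Create the directory skeleton for a Model Optimizer extension root."""
    root = Path(root)
    # The docs warn that a root named "extensions" clashes with built-in ones.
    if root.name == 'extensions':
        raise ValueError('root must not be named "extensions" (name conflict)')
    subdirs = ['ops', 'front', 'middle', 'back']
    subdirs += [f'front/{name}' for name in frameworks]
    for subdir in subdirs:
        (root / subdir).mkdir(parents=True, exist_ok=True)
    # Return the created directories relative to the root, for inspection.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob('*') if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    layout = create_extension_layout(Path(tmp) / 'my_ext', ['tf', 'onnx'])
    print(layout)
```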
--- a/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Extractor.rst
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Extractor.rst
@@ -1,113 +0,0 @@
-# [LEGACY] Operation Extractor {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Extractor}
-
-
-.. meta::
-   :description: Learn about a deprecated generic extension in Model Optimizer, 
-                 which provides the operation extractor usable for all model 
-                 frameworks.
-
-
-.. danger::
-
-   The code described here has been **deprecated!** Do not use it to avoid working with a legacy solution. It will be kept for some time to ensure backwards compatibility, but **you should not use** it in contemporary applications.
-
-   This guide describes a deprecated TensorFlow conversion method. The guide on the new and recommended method, using a new frontend, can be found in the  :doc:`Frontend Extensions <openvino_docs_Extensibility_UG_Frontend_Extensions>` article. 
-
-Model Optimizer runs a specific extractor for each operation in the model during model loading.
-
-There are several types of Model Optimizer extractor extensions:
-
-1. The generic one, which is described in this article.
-2. The special extractor for Caffe models with Python layers. This kind of extractor is described in the :doc:`Extending Model Optimizer with Caffe Python Layers <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers>` guide.
-
-The generic extension provides a mechanism for operation extraction that is applicable to all frameworks. Model Optimizer provides the ``mo.front.extractor.FrontExtractorOp`` class as a base class for implementing an extractor. It has the ``extract`` class method, which takes a single parameter, ``Node``, corresponding to the graph node to extract data from. The operation description in the original framework format is stored in the ``pb`` attribute of the node. The goal of the extractor is to parse this attribute and save the necessary attributes to the corresponding node of the graph. Consider the extractor for the ``Const`` TensorFlow operation (refer to the ``extensions/front/tf/const_ext.py`` file):
-
-.. code-block:: py
-   :force:
-
-   from openvino.tools.mo.front.extractor import FrontExtractorOp
-   from openvino.tools.mo.front.tf.extractors.utils import tf_dtype_extractor, tf_tensor_shape, tf_tensor_content
-   from openvino.tools.mo.ops.const import Const
-   
-   
-   class ConstExtractor(FrontExtractorOp):
-       # The "op" class attribute defines a type of the operation in the framework (in this case it is a TensorFlow), 
-       # for which the extractor should be triggered.
-       op = 'Const'
-       enabled = True  # The flag that indicates that this extractor is enabled.
-   
-       @classmethod
-       def extract(cls, node):  # The entry point of the extractor.
-           # The `node.pb` attribute stores the TensorFlow representation of the operation, which is a Protobuf message of the
-           # specific format. In particular, the message contains the attribute called "value" containing the description of
-           # the constant. The string "pb.attr["value"].tensor" is just a Python binding for Protobuf message parsing.
-           pb_tensor = node.pb.attr["value"].tensor
-           # Get the shape of the tensor from the protobuf message, using the helper function "tf_tensor_shape".
-           shape = tf_tensor_shape(pb_tensor.tensor_shape)
-           # Create a dictionary with necessary attributes.
-           attrs = {
-               'shape': shape,
-               # Get the tensor value, using "tf_tensor_content" helper function.
-               'value': tf_tensor_content(pb_tensor.dtype, shape, pb_tensor),
-               # Get the tensor data type, using "tf_dtype_extractor" helper function.
-               'data_type': tf_dtype_extractor(pb_tensor.dtype),
-           }
-           # Update the node attributes, using default attributes from the "Const" operation and attributes saved to the
-           # "attrs" dictionary.
-           Const.update_node_stat(node, attrs)
-           return cls.enabled
-
-Consider another example with an extractor of the ``Constant`` ONNX operation (refer to the ``extensions/front/onnx/const_ext.py`` file):
-
-.. code-block:: py
-   :force:
-
-   from onnx import numpy_helper
-   from onnx.numpy_helper import to_array
-   
-   from openvino.tools.mo.front.extractor import FrontExtractorOp
-   from openvino.tools.mo.front.onnx.extractors.utils import onnx_attr
-   from openvino.tools.mo.ops.const import Const
-   
-   
-   class ConstantExtractor(FrontExtractorOp):
-       op = 'Constant'
-       enabled = True
-   
-       @classmethod
-       def extract(cls, node):
-           # Use "onnx_attr" helper method, which parses the Protobuf representation of the operation saved in the "node".
-           # Gets the value of the attribute with name "value" as "TensorProto" type (specified with a keyword "t").
-           pb_value = onnx_attr(node, 'value', 't')
-           # Use "numpy_helper.to_array()" ONNX helper method to convert "TensorProto" object to a numpy array.
-           value = numpy_helper.to_array(pb_value)
-   
-           attrs = {
-               'data_type': value.dtype,
-               'value': value,
-           }
-           # Update the node attributes, using default attributes from the "Const" operation and attributes saved to the
-           # "attrs" dictionary.
-           Const.update_node_stat(node, attrs)
-           return cls.enabled
-
-The extractors for operations from different frameworks work similarly. The only difference is in the helper methods used to parse operation attributes encoded with a framework-specific representation.
-
-A common practice is to use the ``update_node_stat()`` method of the dedicated ``Op`` class to update the node attributes. This method does the following:
-
-1. Sets values for common attributes like ``op``, ``type``, ``infer``, ``in_ports_count``, ``out_ports_count``, ``version`` to values specific to the dedicated operation (``Const`` operation in this case).
-2. Uses the ``supported_attrs()`` and ``backend_attrs()`` methods, defined in the ``Op`` class, to update the specific node attribute ``IE``. The IR emitter uses the value stored in the ``IE`` attribute to pre-process attribute values and save them to IR.
-3. Optionally sets additional attributes provided to the ``update_node_stat()`` function as a second parameter. Usually these attributes are parsed from the particular instance of the operation.
-
-.. note:: 
-   Model Optimizer uses numpy arrays to store values and numpy arrays of ``np.int64`` type to store shapes in the graph.
-
-====================
-Additional Resources
-====================
-
-* :doc:`Model Optimizer Extensibility <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>`
-* :doc:`Graph Traversal and Modification Using Ports and Connections <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections>`
-* :doc:`Model Optimizer Extensions <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions>`
-* :doc:`Extending Model Optimizer with Caffe Python Layers <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers>`
-
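The layering that the deleted file describes for `update_node_stat()` (operation-specific common attributes first, attributes parsed from the concrete instance last) can be pictured with plain dictionaries. This is a toy model only: `update_node_stat_sketch`, the dict-based node, and the `CONST_DEFAULTS` values are hypothetical, and the step that fills the `IE` attribute is deliberately left out.

```python
def update_node_stat_sketch(node, op_defaults, parsed_attrs=None):
    """Toy version: a node is a dict; operation defaults first, parsed attributes last."""
    node.update(op_defaults)         # step 1: operation-specific common attributes
    node.update(parsed_attrs or {})  # step 3: attributes parsed from this instance
    return node


# Hypothetical defaults for a Const-like operation.
CONST_DEFAULTS = {
    'op': 'Const', 'type': 'Const', 'version': 'opset1',
    'in_ports_count': 0, 'out_ports_count': 1,
}

node = {'name': 'weights'}
update_node_stat_sketch(node, CONST_DEFAULTS, {'shape': [2, 2], 'value': [[1, 2], [3, 4]]})
print(node)
```

Because the parsed attributes are applied last, an extractor can override any default simply by including that key in its `attrs` dictionary.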
--- a/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Operation.rst
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Operation.rst
@@ -1,110 +0,0 @@
-# [LEGACY] Model Optimizer Operation {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Operation}
-
-
-.. meta::
-   :description: Learn about the Op class, that contains operation attributes, 
-                 which are set to a node of the graph created during model 
-                 conversion with Model Optimizer.
-
-.. danger::
-
-   The code described here has been **deprecated!** Do not use it to avoid working with a legacy solution. It will be kept for some time to ensure backwards compatibility, but **you should not use** it in contemporary applications.
-
-   This guide describes a deprecated TensorFlow conversion method. The guide on the new and recommended method, using a new frontend, can be found in the  :doc:`Frontend Extensions <openvino_docs_Extensibility_UG_Frontend_Extensions>` article. 
-
-Model Optimizer defines a ``mo.ops.Op`` class (referred to as ``Op`` in the rest of this document), which is a base class
-for an operation used in Model Optimizer. An instance of the ``Op`` class serves several purposes:
-
-1. Stores the operation attributes.
-2. Stores the operation shape/value and type inference functions.
-3. Defines operation attributes to be saved to the corresponding IR section.
-4. Contains convenient methods to create a graph node from an ``Op`` object instance and connect it with the existing graph.
-5. Used in the extractors to store parsed attributes and operation specific attributes in the dedicated graph node.
-
-It is important to mention that there is no connection between the instance of the ``Op`` class and the ``Node`` object
-created from it. The ``Op`` class is just a container for attributes describing the operation. Model Optimizer uses the ``Op``
-class during a model conversion to create a node of the graph with attributes copied from the ``Op`` class instance. Graph
-manipulations are performed with graph ``Nodes`` and their attributes and do not involve ``Ops``.
-
-There are a number of common attributes used in the operations. Below is the list of these attributes with their descriptions.
-
-* ``id`` — **(Mandatory)** — unique identifier of a node in a graph. Generated automatically, equal to the number of nodes in the graph plus 1 if not specified.
-* ``name`` — **(Mandatory)** — name of the operation. Generated automatically, equal to the ``id`` if not specified.
-* ``type`` — **(Mandatory)** —  type of the operation according to the :doc:`opset specification <openvino_docs_ops_opset>`. For the internal Model Optimizer operations, this attribute should be set to ``None``. The model conversion fails if an operation with ``type`` equal to ``None`` comes to the IR emitting phase.
-* ``version`` — **(Mandatory)** —  the operation set (opset) name the operation belongs to. If not specified,  Model Optimizer sets it equal to ``experimental``. For more information about operation sets, refer to  :doc:`OpenVINO Model Representation <openvino_docs_OV_UG_Model_Representation>` section. 
-* ``op`` — Model Optimizer type of the operation. In many cases, the value of ``type`` is equal to the value of ``op``. However, when Model Optimizer cannot instantiate the opset operation during model loading, it creates an instance of an internal operation. Thus, the attribute ``op`` is used as a type of this internal operation. Later in the pipeline, the node created from an internal operation will be replaced during front, middle or back phase with node(s) created from the opset.
-* ``infer`` — the attribute defines a function calculating output tensor(s) shape and optional value(s). The attribute may be set to ``None`` for the internal Model Optimizer operations used during the front phase only. For more information  about the shape inference function, refer to the :ref:`Partial Inference <mo_partial_inference>`.
-* ``type_infer`` — the attribute defines a function calculating output tensor(s) data type. If the attribute is not defined, the default function is used. The function checks if the ``data_type`` node attribute is set and then propagates this type to the output tensor from the **port 0**. Otherwise, it propagates the data type of the tensor coming into the input **port 0** to the output tensor from the **port 0**.
-* ``in_ports_count`` — default number of input ports to be created for the operation. Additional ports can be created or redundant ports can be removed using dedicated ``Node`` class API methods.
-* ``out_ports_count`` — default number of output ports to be created for the operation. Additional ports can be created or redundant ports can be removed using dedicated ``Node`` class API methods.
-
-Below is an example of the Model Optimizer class for the :doc:`SoftMax <openvino_docs_ops_activation_SoftMax_1>` operation from
-the ``mo/ops/softmax.py`` file with the comments in code.
-
-.. code-block:: py
-   
-   class Softmax(Op):
-       # The class attribute defines a name of the operation so the operation class can be obtained using the
-       # "Op.get_op_class_by_name()" static method
-       op = 'SoftMax'
-   
-       # The operation works as an extractor by default. This is a legacy behavior, currently not recommended for use,
-       # thus "enabled" class attribute is set to False. The recommended approach is to use dedicated extractor extension.
-       enabled = False
-   
-       def __init__(self, graph: Graph, attrs: dict):
-           super().__init__(graph, {  # The constructor of the base class Op is called with additional default attributes.
-               'type': __class__.op,  # The operation is from the opset so the type is set to 'SoftMax'.
-               'op': __class__.op,  # Internal Model Optimizer operation has the same type.
-               'version': 'opset1',  # The operation corresponds to opset1.
-               'infer': Softmax.infer,  # Shape inference function is defined below.
-               'axis': 1,  # Default value for the "axis" attribute of the operation SoftMax.
-               'in_ports_count': 1,  # The operation has one input.
-               'out_ports_count': 1,  # The operation produces one output.
-           }, attrs)
-   
-       # The method returns operation specific attributes list. This method is important when implementing
-       # extractor inherited from CaffePythonFrontExtractorOp class to extract attribute for Caffe Python operation.
-       # However, it is currently used interchangeably with the "backend_attrs()" method. If the "backend_attrs()" is not used,
-       # then the "supported_attrs()" is used instead. In this particular case, the operation has just one attribute "axis".
-       def supported_attrs(self):
-           return ['axis']
-   
-       @staticmethod
-       def infer(node: Node):
-           "some code calculating output shape and values"
-
-There is a dedicated method called ``backend_attrs()`` defining a list of attributes to be saved to the IR. Consider an
-example from the ``mo/ops/pooling.py`` file:
-
-.. code-block:: py
-   
-       def backend_attrs(self):
-           return [
-               ('strides', lambda node: ','.join(map(str, node['stride'][node.spatial_dims]))),
-               ('kernel', lambda node: ','.join(map(str, node['window'][node.spatial_dims]))),
-   
-               ('pads_begin', lambda node: ','.join(map(str, get_backend_pad(node.pad, node.spatial_dims, 0)))),
-               ('pads_end', lambda node: ','.join(map(str, get_backend_pad(node.pad, node.spatial_dims, 1)))),
-   
-               ('pool-method', 'pool_method'),
-               ('exclude-pad', 'exclude_pad'),
-   
-               'rounding_type',
-               'auto_pad',
-           ]
-
-The ``backend_attrs()`` function returns a list of records. A record can be of one of the following formats:
-1. A string defining the attribute to be saved to the IR. If the value of the attribute is ``None``, the attribute is not saved. Examples of this case are ``rounding_type`` and ``auto_pad``.
-2. A tuple, where the first element is a string defining the name of the attribute as it will appear in the IR and the second element is a function to produce the value for this attribute. The function gets an instance of the ``Node`` as the only parameter and returns a string with the value to be saved to the IR. Examples of this case are ``strides``, ``kernel``, ``pads_begin`` and ``pads_end``.
-3. A tuple, where the first element is a string defining the name of the attribute as it will appear in the IR and the second element is the name of the ``Node`` attribute to get the value from. Examples of this case are ``pool-method`` and ``exclude-pad``.
-
-====================
-Additional Resources
-====================
-
-* :doc:`Model Optimizer Extensibility <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>`
-* :doc:`Graph Traversal and Modification Using Ports and Connections <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections>`
-* :doc:`Model Optimizer Extensions <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions>`
-* :doc:`Extending Model Optimizer with Caffe Python Layers <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers>`
-
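The three record formats that the deleted file lists for `backend_attrs()` (a plain string, a name paired with a function, a name paired with a node attribute name) can be resolved with one small loop. The sketch below models a node as a dictionary and is not Model Optimizer code: `resolve_backend_attrs` is a hypothetical helper, and skipping `None` values for all three formats is an assumption (the document states it only for the plain-string format).

```python
def resolve_backend_attrs(node, records):
    """Turn backend_attrs()-style records into a name -> value mapping for the IR."""
    result = {}
    for record in records:
        if isinstance(record, str):
            # Format 1: the IR name and the node attribute name are the same.
            ir_name, value = record, node.get(record)
        else:
            ir_name, source = record
            # Format 2: a callable computes the value from the node.
            # Format 3: a string names the node attribute to read.
            value = source(node) if callable(source) else node.get(source)
        if value is not None:  # assumption: None is skipped for every format
            result[ir_name] = str(value)
    return result


node = {'stride': [1, 1, 2, 2], 'pool_method': 'max', 'rounding_type': 'floor', 'auto_pad': None}
records = [
    ('strides', lambda n: ','.join(map(str, n['stride'][2:]))),
    ('pool-method', 'pool_method'),
    'rounding_type',
    'auto_pad',
]
print(resolve_backend_attrs(node, records))
```

Here `auto_pad` is dropped because its value is `None`, while `pool_method` is written under the IR name `pool-method`.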
--- a/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Ports_Connections.rst
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Ports_Connections.rst
@@ -1,186 +0,0 @@
-# [LEGACY] Graph Traversal and Modification {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections}
-
-
-.. meta::
-   :description: Learn about deprecated APIs and the Port and Connection classes 
-                 in Model Optimizer used for graph traversal and transformation.
-
-.. danger::
-
-   The code described here has been **deprecated!** Do not use it to avoid working with a legacy solution. It will be kept for some time to ensure backwards compatibility, but **you should not use** it in contemporary applications.
-
-   This guide describes a deprecated TensorFlow conversion method. The guide on the new and recommended method, using a new frontend, can be found in the  :doc:`Frontend Extensions <openvino_docs_Extensibility_UG_Frontend_Extensions>` article. 
-
-There are three APIs for graph traversal and transformation used in Model Optimizer:
-
-1. The API provided with the ``networkx`` Python library for the ``networkx.MultiDiGraph`` class, which is the base class for
-the ``mo.graph.graph.Graph`` object. For example, the following methods belong to this API level: 
-
-* ``graph.add_edges_from([list])``,
-* ``graph.add_node(x, attrs)``, 
-* ``graph.out_edges(node_id)`` 
-* other methods where ``graph`` is an instance of the ``networkx.MultiDiGraph`` class.
-
-**This is the lowest-level API. Avoid using it in the Model Optimizer transformations**. For more details, refer to the :ref:`Model Representation in Memory <mo_model_representation_in_memory>` section. 
-
-2. The API built around the ``mo.graph.graph.Node`` class. The ``Node`` class is the primary class to work with graph nodes
-and their attributes. Examples of such methods and functions are:
-
-* ``node.in_node(y)``, 
-* ``node.out_node(x)``,
-* ``node.get_outputs()``,
-* ``node.insert_node_after(n1, y)``,
-* ``create_edge(n1, n2)``
-
-**Some "Node" class methods are not recommended for use, and some functions defined in mo.graph.graph have been deprecated**. For more details, refer to the ``mo/graph/graph.py`` file.
-
-3. The high-level API called Model Optimizer Graph API, which uses ``mo.graph.graph.Graph``, ``mo.graph.port.Port`` and
-``mo.graph.connection.Connection`` classes. For example, the following methods belong to this API level:
-
-* ``node.in_port(x)``, 
-* ``node.out_port(y)``, 
-* ``port.get_connection()``, 
-* ``connection.get_source()``,
-* ``connection.set_destination(dest_port)``
-
-**This is the recommended API for the Model Optimizer transformations and operations implementation**.
-
-The main benefit of using the Model Optimizer Graph API is that it hides some internal implementation details (the fact that
-the graph contains data nodes), provides an API for safe and predictable graph manipulations, and adds operation
-semantics to the graph. This is achieved by introducing the concepts of ports and connections.
-
-.. note:: 
-   This article is dedicated to the Model Optimizer Graph API only and does not cover the other two non-recommended APIs.
-
-.. _mo_intro_ports:
-
-=====
-Ports
-=====
-
-An operation semantic describes how many inputs and outputs the operation has. For example, 
-:doc:`Parameter <openvino_docs_ops_infrastructure_Parameter_1>` and :doc:`Const <openvino_docs_ops_infrastructure_Constant_1>` operations have no
-inputs and have one output, :doc:`ReLU <openvino_docs_ops_activation_ReLU_1>` operation has one input and one output, 
-:doc:`Split <openvino_docs_ops_movement_Split_1>` operation has 2 inputs and a variable number of outputs depending on the value of the
-attribute ``num_splits``.
-
-Each operation node in the graph (an instance of the ``Node`` class) has 0 or more input and output ports (instances of
-the ``mo.graph.port.Port`` class). The ``Port`` object has several attributes:
-
-* ``node`` - the instance of the ``Node`` object the port belongs to.
-* ``idx`` - the port number. Input and output ports are numbered independently, starting from ``0``. Thus, 
-:doc:`ReLU <openvino_docs_ops_activation_ReLU_1>` operation has one input port (with index ``0``) and one output port (with index ``0``).
-* ``type`` - the type of the port. Could be equal to either ``"in"`` or ``"out"``.
-* ``data`` - the object that should be used to get attributes of the corresponding data node. This object has methods ``get_shape()`` / ``set_shape()`` and ``get_value()`` / ``set_value()`` to get/set shape/value of the corresponding data node. For example, ``in_port.data.get_shape()`` returns an input shape of a tensor connected to input port ``in_port`` (``in_port.type == 'in'``), ``out_port.data.get_value()`` returns a value of a tensor produced from output port ``out_port`` (``out_port.type == 'out'``).
-
-.. note:: 
-   Functions ``get_shape()`` and ``get_value()`` return ``None`` until the partial inference phase. For more information  about model conversion phases, refer to the :ref:`Model Conversion Pipeline <mo_model_conversion_pipeline>`. For information about partial inference phase, see the :ref:`Partial Inference <mo_partial_inference>`.
-
-There are several methods of the ``Node`` class to get the instance of a corresponding port:
-
-* ``in_port(x)`` and ``out_port(x)`` to get the input/output port with number ``x``.
-* ``in_ports()`` and ``out_ports()`` to get a dictionary, where key is a port number and the value is the corresponding input/output port.
-
-Attributes ``in_ports_count`` and ``out_ports_count`` of the ``Op`` class instance define default number of input and output
-ports to be created for the ``Node``. However, additional input/output ports can be added using methods
-``add_input_port()`` and ``add_output_port()``. Ports can also be removed using the ``delete_input_port()`` and
-``delete_output_port()`` methods.
-
-The ``Port`` class is just an abstraction that works with edges incoming/outgoing to/from a specific ``Node`` instance. For
-example, output port with ``idx = 1`` corresponds to the outgoing edge of a node with an attribute ``out = 1``, the input
-port with ``idx = 2`` corresponds to the incoming edge of a node with an attribute ``in = 2``.
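
The mapping between port indices and edge attributes can be illustrated with a small toy model. This is only a sketch: plain dictionaries stand in for graph edges, and the helper functions ``edges_of_output_port`` and ``edge_of_input_port`` are hypothetical names that do not exist in Model Optimizer.

```python
# Toy model only: plain dictionaries stand in for graph edges.
# This is NOT the Model Optimizer API; all names here are illustrative.
edges = [
    {"src": "op1", "dst": "op2", "out": 0, "in": 0},
    {"src": "op1", "dst": "op3", "out": 1, "in": 0},
    {"src": "op1", "dst": "op4", "out": 1, "in": 2},
]


def edges_of_output_port(node, idx):
    """Return outgoing edges of `node` whose `out` attribute equals the port index."""
    return [e for e in edges if e["src"] == node and e["out"] == idx]


def edge_of_input_port(node, idx):
    """Return the incoming edge of `node` whose `in` attribute equals the port index, or None."""
    matches = [e for e in edges if e["dst"] == node and e["in"] == idx]
    return matches[0] if matches else None


# Output port 1 of "op1" corresponds to every outgoing edge with out == 1.
consumers = [e["dst"] for e in edges_of_output_port("op1", 1)]
# Input port 2 of "op4" corresponds to the incoming edge with in == 2.
producer = edge_of_input_port("op4", 2)["src"]
```

An output port can map to several edges (one per consumer), while an input port maps to at most one edge, which is why the second helper returns a single edge or ``None``.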
-
-Consider the example of a graph part with 4 operation nodes "Op1", "Op2", "Op3", and "Op4" and a number of data nodes
-depicted with light green boxes.
-
-.. image:: _static/images/MO_ports_example_1.svg
-   :scale: 80 %
-   :align: center
-
-Operation nodes have input ports (yellow squares) and output ports (light purple squares). An input port may be left
-unconnected. For example, input **port 2** of node **Op1** has no incoming edge. An output port, by contrast, always has an
-associated data node (after partial inference, when the data nodes are added to the graph), although that data node may have no
-consumers.
-
-Ports can be used to traverse a graph. The method ``get_source()`` of an input port returns an output port producing the
-tensor consumed by the input port. It is important that the method works the same during front, middle and back phases of a
-model conversion even though the graph structure changes (there are no data nodes in the graph during the front phase).
-
-Let's assume that there are 4 instances of ``Node`` object ``op1, op2, op3``, and ``op4`` corresponding to nodes **Op1**, **Op2**,
-**Op3**, and **Op4**, respectively. The result of ``op2.in_port(0).get_source()`` and ``op4.in_port(1).get_source()`` is the
-same object ``op1.out_port(1)`` of type ``Port``.
-
-The method ``get_destination()`` of an output port returns the input port of the node consuming this tensor. If there are
-multiple consumers of this tensor, an error is raised. The method ``get_destinations()`` of an output port returns a
-list of input ports consuming the tensor.
-
-The method ``disconnect()`` removes a node incoming edge corresponding to the specific input port. The method removes
-several edges if it is applied during the front phase for a node output port connected with multiple nodes.
-
-The method ``port.connect(another_port)`` connects output port ``port`` and input port ``another_port``. The method handles
-situations when the graph contains data nodes (middle and back phases): it does not just create an edge between two nodes
-but also automatically creates a data node or reuses an existing one. If the method is used during the front phase and
-data nodes do not exist, the method creates an edge and properly sets the ``in`` and ``out`` edge attributes.
-
-For example, applying the following two methods to the graph above will result in the graph depicted below:
-
-.. code-block:: py
-   :force:
-
-   op4.in_port(1).disconnect()
-   op3.out_port(0).connect(op4.in_port(1))
-
-.. image:: _static/images/MO_ports_example_2.svg
-   :scale: 80 %
-   :align: center
-
-.. note:: 
-   For a full list of available methods, refer to the ``Node`` class implementation in the ``mo/graph/graph.py`` and ``Port`` class implementation in the ``mo/graph/port.py`` files.
-
-===========
-Connections
-===========
-
-A connection is a concept introduced to perform graph modifications easily and reliably. A connection corresponds to a
-link between a source output port and one or more destination input ports, or to a link between a destination input port
-and the source output port producing data. So each port is connected with one or more ports with the help of a connection.
-Model Optimizer uses the ``mo.graph.connection.Connection`` class to represent a connection.
-
-The ``Port`` class has a single method, ``get_connection()``, to get the instance of the corresponding ``Connection``
-object. If the port is not connected, the returned value is ``None``.
-
-For example, the ``op3.out_port(0).get_connection()`` method returns a ``Connection`` object encapsulating edges from node
-**Op3** to data node **data_3_0** and two edges from data node **data_3_0** to two ports of the node **Op4**.
-
-The ``Connection`` class provides methods to get the source and destination ports the connection corresponds to:
-
-* ``connection.get_source()`` - returns an output ``Port`` object producing the tensor.
-* ``connection.get_destinations()`` - returns a list of input ``Port`` objects consuming the data.
-* ``connection.get_destination()`` - returns a single input ``Port`` consuming the data. If there are multiple consumers, an exception is raised.
-
-The ``Connection`` class provides methods to modify a graph by changing a source or destination(s) of a connection. For
-example, the function call ``op3.out_port(0).get_connection().set_source(op1.out_port(0))`` changes source port of edges
-consuming data from port ``op3.out_port(0)`` to ``op1.out_port(0)``. The transformed graph from the sample above is depicted
-below:
-
-.. image:: _static/images/MO_connection_example_1.svg
-   :scale: 80 %
-   :align: center
-
-Another example is the ``connection.set_destination(dest_port)`` method. It disconnects ``dest_port`` and all input ports to which
-the connection is currently connected and connects the connection source port to ``dest_port``.
-
-Note that connection works seamlessly during front, middle, and back phases and hides the fact that the graph structure is
-different.
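
The rewiring behavior described in this section can be sketched with a toy model. ``ToyConnection`` is a hypothetical class written for illustration, ports are represented by plain strings, and the return value of ``get_destination()`` for a connection without consumers is an assumption; refer to ``mo/graph/connection.py`` for the actual behavior.

```python
class ToyConnection:
    """Hypothetical stand-in for a connection: one source port and its destination ports."""

    def __init__(self, source, destinations):
        self.source = source
        self.destinations = list(destinations)

    def get_source(self):
        return self.source

    def get_destinations(self):
        return list(self.destinations)

    def get_destination(self):
        # Mirrors the documented rule: multiple consumers raise an exception.
        if len(self.destinations) > 1:
            raise ValueError("connection has multiple destinations")
        # Assumption of this sketch: no consumers yields None.
        return self.destinations[0] if self.destinations else None

    def set_source(self, new_source):
        # All existing consumers now read from the new source port.
        self.source = new_source

    def set_destination(self, dest_port):
        # Existing consumers are dropped; only dest_port remains connected.
        self.destinations = [dest_port]


# Connection from output port 0 of Op3 feeding two input ports of Op4.
conn = ToyConnection("op3.out0", ["op4.in0", "op4.in1"])
# Both consumers are re-sourced to output port 0 of Op1.
conn.set_source("op1.out0")
```

Modeling the source and destinations as one object is what makes a single ``set_source()`` call sufficient to re-route every consumer at once.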
-
-.. note:: 
-   For a full list of available methods, refer to the ``Connection`` class implementation in the ``mo/graph/connection.py`` file.
-
-====================
-Additional Resources
-====================
-
-* :doc:`Model Optimizer Extensibility <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>`
-* :doc:`Model Optimizer Extensions <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions>`
-* :doc:`Extending Model Optimizer with Caffe Python Layers <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers>`
-
--- a/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Transformation_Extensions.rst
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Model_Optimizer_Transformation_Extensions.rst
@ -1,605 +0,0 @@
-# [LEGACY] Graph Transformation Extensions {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Transformation_Extensions}
-
-
-.. meta::
-  :description: Learn about various base classes for front, middle and back phase 
-                transformations applied during model conversion with Model Optimizer.
-
-.. danger::
-
-   The code described here has been **deprecated!** Do not use it to avoid working with a legacy solution. It will be kept for some time to ensure backwards compatibility, but **you should not use** it in contemporary applications.
-
-   This guide describes a deprecated TensorFlow conversion method. The guide on the new and recommended method, using a new frontend, can be found in the  :doc:`Frontend Extensions <openvino_docs_Extensibility_UG_Frontend_Extensions>` article. 
-
-Model Optimizer provides various base classes to implement :ref:`Front Phase Transformations <mo_front_phase_transformations>`,
-:ref:`Middle Phase Transformations <mo_middle_phase_transformations>`, and :ref:`Back Phase Transformations <mo_back_phase_transformations>`.
-All classes have the following common class attributes and methods:
-
-1. The ``enabled`` attribute specifies whether the transformation is enabled or not. The value can be changed during runtime to enable or disable execution of the transformation during a model conversion. Default value is ``True``.
-2. The ``id`` attribute (optional) specifies a unique transformation string identifier. This identifier can be used to enable (disable) the transformation by setting the environment variable ``MO_ENABLED_TRANSFORMS`` (``MO_DISABLED_TRANSFORMS``) to a comma-separated list of ``ids``. The environment variables override the value of the ``enabled`` attribute of the transformation. Instead of the ``id`` attribute value, you can add the fully qualified class name to the ``MO_ENABLED_TRANSFORMS`` (``MO_DISABLED_TRANSFORMS``) variable, for example, ``extensions.back.NormalizeToNormalizeL2.NormalizeToNormalizeL2``.
-3. The ``run_not_recursively`` attribute specifies whether the transformation should be executed in the sub-graphs, for example, body of the :doc:`TensorIterator <openvino_docs_ops_infrastructure_TensorIterator_1>` and the :doc:`Loop <openvino_docs_ops_infrastructure_Loop_5>`. Default value is ``True``.
-4. The ``force_clean_up`` attribute specifies whether the graph clean up should be executed after the transformation. The graph cleanup removes nodes of the graph not reachable from the model inputs. Default value is ``False``.
-5. The ``force_shape_inference`` attribute specifies whether the nodes marked with ``need_shape_inference`` attribute equal to ``True`` should be re-inferred after the transformation. Model Optimizer sets this attribute automatically for nodes, input(s) of which were changed during the transformation, or you can set this attribute manually in the transformation for the specific nodes. Default value is ``False``.
-6. The ``graph_condition`` attribute specifies a list of functions with one parameter -- ``Graph`` object. The transformation is executed if and only if all functions return ``True``. If the attribute is not set, no check is performed.
-7. The ``run_before()`` method returns a list of transformation classes which this transformation should be executed before.
-8. The ``run_after()`` method returns a list of transformation classes which this transformation should be executed after.
-
-.. note:: 
-   Some of the transformation types have specific class attributes and methods, which are explained in the corresponding sections of this document.
-
-Model Optimizer builds a graph of dependencies between registered transformations and executes them in the topological
-order. To execute the transformation during a proper model conversion phase, Model Optimizer defines several
-anchor transformations that do nothing. All transformations are ordered with respect to these anchor transformations.
-The diagram below shows anchor transformations, some of the built-in transformations, and dependencies between them:
-
-.. image:: _static/images/MO_transformations_graph.svg
-
-User-defined transformations are executed after the corresponding ``Start`` and before the corresponding ``Finish`` anchor
-transformations by default (if ``run_before()`` and ``run_after()`` methods have not been overridden).
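
The ordering rule can be sketched with the standard library. This is a simplified illustration that assumes Python 3.9+ for ``graphlib``; the ``ToyTransform`` classes and the ``execution_order`` helper are hypothetical and do not reproduce the actual Model Optimizer registration code.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+


class ToyTransform:
    """Hypothetical base class exposing the two ordering hooks."""

    def run_before(self):
        return []

    def run_after(self):
        return []


class Start(ToyTransform):  # anchor that does nothing
    pass


class Finish(ToyTransform):  # anchor that does nothing
    def run_after(self):
        return [Start]


class FuseA(ToyTransform):
    def run_after(self):
        return [Start]

    def run_before(self):
        return [Finish]


class FuseB(ToyTransform):
    def run_after(self):
        return [FuseA]

    def run_before(self):
        return [Finish]


def execution_order(transforms):
    """Build the dependency graph from the hooks and return class names in topological order."""
    sorter = TopologicalSorter()
    for transform in transforms:
        sorter.add(transform)
        for later in transform().run_before():
            sorter.add(later, transform)  # `later` must run after `transform`
        for earlier in transform().run_after():
            sorter.add(transform, earlier)  # `transform` must run after `earlier`
    return [t.__name__ for t in sorter.static_order()]


order = execution_order([FuseB, Finish, FuseA, Start])
```

The registration order of the classes does not matter here: only the dependencies declared through ``run_before()`` and ``run_after()`` determine the result.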
-
-.. note:: 
-   The ``PreMiddleStart`` and ``PostMiddleStart`` anchors were introduced for historical reasons while refactoring the Model Optimizer pipeline, which initially had a hardcoded order of transformations.
-
-.. _mo_front_phase_transformations:
-
-===========================
-Front Phase Transformations
-===========================
-
-There are several types of front phase transformations:
-
-1. :ref:`Pattern-Defined Front Phase Transformations <pattern_defined_front_phase_transformations>` triggered for each sub-graph of the original graph isomorphic to the specified pattern.
-2. :ref:`Specific Operation Front Phase Transformations <specific_operation_front_phase_transformations>` triggered for the node with a specific ``op`` attribute value.
-3. :ref:`Generic Front Phase Transformations <generic_front_phase_transformations>`.
-4. Manually enabled transformation, defined with a JSON configuration file (for TensorFlow, ONNX, Apache MXNet, and PaddlePaddle models), specified using the ``--transformations_config`` command-line parameter:
-
-   1. :ref:`Node Name Pattern Front Phase Transformations <node_name_pattern_front_phase_transformations>`.
-   2. :ref:`Front Phase Transformations Using Start and End Points <start_end_points_front_phase_transformations>`.
-   3. :ref:`Generic Front Phase Transformations Enabled with Transformations Configuration File <generic_transformations_config_front_phase_transformations>`.
-
-.. _pattern_defined_front_phase_transformations:
-
-Pattern-Defined Front Phase Transformations
-###########################################
-
-This type of transformation is implemented using ``mo.front.common.replacement.FrontReplacementSubgraph`` and
-``mo.front.common.replacement.FrontReplacementPattern`` as base classes and works as follows:
-
-1. Define a sub-graph to be matched, using a list of nodes with attributes and edges connecting them (edges may also have attributes).
-2. Model Optimizer searches for all sub-graphs of the original graph, isomorphic to the specified sub-graph (pattern).
-3. Model Optimizer executes the defined function performing graph transformation for each instance of a matched sub-graph. You can override different functions in the base transformation class to change how Model Optimizer works:
-
-   1. Override the ``replace_sub_graph(self, graph, match)`` method. In this case, Model Optimizer only executes the overridden function, passing the ``graph`` object and a dictionary describing the matched sub-graph. You are required to write the transformation and connect the newly created nodes to the rest of the graph.
-   2. Override the ``generate_sub_graph(self, graph, match)`` method. This case is not recommended for use because it is the most complicated approach. It can be effectively replaced with the previous approach.
-
-The sub-graph pattern is defined in the ``pattern()`` function. This function should return a dictionary with two keys:
-``nodes`` and ``edges``:
-
-* The value for the ``nodes`` key is a list of tuples with two elements.
-
-  * The first element is an alias name for a node that will be used to define edges between nodes and in the transformation function.
-  * The second element is a dictionary with attributes. The key is a name of an attribute that should exist in the node. The value for the attribute can be some specific value to match or a function that gets a single parameter - the attribute value from the node. The function should return the result of attribute comparison with a dedicated value.
-
-* The value for the ``edges`` key is a list of tuples with two or three elements.
-
-  * The first element is the alias name of the node producing a tensor.
-  * The second element is the alias name of the node consuming the tensor.
-  * The third element (optional) is the dictionary with expected edge attributes. This dictionary usually contains attributes like ``in`` and ``out``, defining input and output ports.
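
The structure of such a pattern dictionary, including an attribute matched by a function rather than by a fixed value, can be sketched as follows. The ``node_matches`` helper is a hypothetical illustration of the attribute comparison rule only; it is not the sub-graph isomorphism search that Model Optimizer performs, and the operation names are chosen just for the example.

```python
def node_matches(node_attrs, pattern_attrs):
    """Check one node against the attribute dictionary of one pattern node.

    Hypothetical helper: every attribute named in the pattern must exist in the
    node and either equal the expected value or satisfy the expected function.
    """
    for key, expected in pattern_attrs.items():
        if key not in node_attrs:
            return False
        actual = node_attrs[key]
        matched = expected(actual) if callable(expected) else actual == expected
        if not matched:
            return False
    return True


pattern = dict(
    nodes=[
        # Alias 'conv': the 'op' attribute is checked with a function.
        ('conv', dict(op=lambda op: op in ('Conv2D', 'Convolution'))),
        # Alias 'relu': the 'op' attribute is compared with a fixed value.
        ('relu', dict(op='ReLU')),
    ],
    edges=[
        # Producer alias, consumer alias, optional edge attributes.
        ('conv', 'relu', {'in': 0, 'out': 0}),
    ],
)

pattern_attrs = dict(pattern['nodes'])
conv_ok = node_matches({'op': 'Conv2D', 'name': 'conv1'}, pattern_attrs['conv'])
add_ok = node_matches({'op': 'Add', 'name': 'add1'}, pattern_attrs['conv'])
```

Attributes present in the node but absent from the pattern (``name`` above) are ignored, so a pattern only needs to list the attributes that matter for the match.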
-
-Consider the example of a front transformation implemented in the ``extensions/front/Mish_fusion.py`` file performing
-fusing of the sub-graph defining the :doc:`Mish <openvino_docs_ops_activation_Mish_4>` activation function into a single
-operation:
-
-.. code-block:: py
-   :force:
-
-   from openvino.tools.mo.front.Softplus_fusion import SoftplusFusion
-   from openvino.tools.mo.ops.activation_ops import Mish
-   from openvino.tools.mo.front.common.replacement import FrontReplacementSubgraph
-   from openvino.tools.mo.front.subgraph_matcher import SubgraphMatch
-   from openvino.tools.mo.graph.graph import Graph, rename_nodes
-   
-   
-   class MishFusion(FrontReplacementSubgraph):
-       """
-       The transformation looks for the pattern with Softplus defining the Mish function: Mish(x) = x * tanh(SoftPlus(x)).
-       """
-       enabled = True  # Transformation is enabled.
-   
-       def run_after(self):  # Run this transformation after "SoftplusFusion" transformation.
-           return [SoftplusFusion]
-   
-       def pattern(self):  # Define pattern according to the formula x * tanh(SoftPlus(x)).
-           return dict(
-               nodes=[
-                   ('mul', dict(op='Mul')),
-                   ('tanh', dict(op='Tanh')),
-                   ('softplus', dict(op='SoftPlus')),
-               ],
-               edges=[
-                   ('softplus', 'tanh'),
-                   ('tanh', 'mul'),
-               ])
-   
-       def replace_sub_graph(self, graph: Graph, match: [dict, SubgraphMatch]):  # Entry point for the transformation.
-           mul = match['mul']  # Get the Node corresponding to matched "mul" node.
-           mul_name = mul.soft_get('name', mul.id)
-           softplus = match['softplus']  # Get the Node corresponding to the matched "softplus" node.
-   
-           # Determine the input port of Mul which gets the 'input' node output.
-           input_port_idx = int(mul.in_port(0).get_connection().get_source().node.soft_get('op') == 'Tanh')
-   
-           # Check that the same tensor is provided as input to Mul and SoftPlus.
-           if mul.in_port(input_port_idx).get_source() != softplus.in_port(0).get_source():
-               return
-   
-           mish = Mish(graph, {}).create_node()  # Create Mish operation.
-           mish.in_port(0).connect(mul.in_port(input_port_idx).get_source())  # Connect input to the Mish.
-           mul.out_port(0).get_connection().set_source(mish.out_port(0))  # Reconnect outgoing edge from "mul" to Mish.
-   
-           # Rename the created Mish operation to have the name of the "mul" node, which produced the value equal to the
-           # Mish output.
-           rename_nodes([(mul, mul_name + '/TBR'), (mish, mul_name)])
-
-.. _specific_operation_front_phase_transformations:
-
-Specific Operation Front Phase Transformations
-##############################################
-
-This type of transformation is implemented using ``mo.front.common.replacement.FrontReplacementOp`` as base class and
-works as follows:
-
-1. Define an operation type to trigger the transformation.
-2. Model Optimizer searches for all nodes in the graph with the attribute ``op`` equal to the specified value.
-3. Model Optimizer executes the defined function performing graph transformation for each instance of a matched node. You can override different functions in the base transformation class to change how Model Optimizer works:
-
-   1. Override the ``replace_sub_graph(self, graph, match)`` method. In this case, Model Optimizer only executes the overridden function, passing the ``graph`` object and a dictionary with a single key ``op`` with the matched node as value. You are required to write the transformation and connect the newly created nodes to the rest of the graph.
-   2. Override the ``replace_op(self, graph, node)`` method. In this case, Model Optimizer executes the overridden function, passing the ``graph`` object and the matched node as the ``node`` parameter. If the function returns an ``id`` of some node, then the ``Node`` with this ``id`` is connected to the consumers of the matched node. After applying the transformation, the matched node is removed from the graph.
-
-The ``FrontReplacementOp`` class provides a simpler mechanism to match a single operation with a specific value of the ``op``
-attribute (set the ``op`` attribute in the class instead of defining a ``pattern()`` function) and perform the
-transformation.
-
-Consider an example transformation from the ``extensions/front/Pack.py`` file, which replaces the ``Pack`` operation from
-TensorFlow:
-
-.. code-block:: py
-   :force:
-   
-   from openvino.tools.mo.front.common.partial_infer.utils import int64_array
-   from openvino.tools.mo.front.common.replacement import FrontReplacementOp
-   from openvino.tools.mo.front.tf.graph_utils import create_op_with_const_inputs
-   from openvino.tools.mo.graph.graph import Node, Graph, rename_nodes
-   from openvino.tools.mo.ops.concat import Concat
-   from openvino.tools.mo.ops.unsqueeze import Unsqueeze
-   
-   
-   class Pack(FrontReplacementOp):
-       op = "Pack"  # Trigger transformation for all nodes in the graph with the op = "Pack" attribute 
-       enabled = True  # Transformation is enabled.
-   
-       def replace_op(self, graph: Graph, node: Node):  # Entry point for the transformation.
-           # Create a Concat operation with a number of inputs equal to a number of inputs to Pack.
-           out_node = Concat(graph, {'axis': node.axis, 'in_ports_count': len(node.in_ports())}).create_node()
-           pack_name = node.soft_get('name', node.id)
-   
-           for ind in node.in_ports():
-               # Add dimension of size 1 to all inputs of the Pack operation and add them as Concat inputs.
-               unsqueeze_node = create_op_with_const_inputs(graph, Unsqueeze, {1: int64_array([node.axis])},
-                                                            {'name': node.soft_get('name', node.id) + '/Unsqueeze'})
-               node.in_port(ind).get_connection().set_destination(unsqueeze_node.in_port(0))
-               unsqueeze_node.out_port(0).connect(out_node.in_port(ind))
-   
-           # Rename the created Concat operation to have the name of the "pack" node, which produced the value equal to the
-           # Concat output.
-           rename_nodes([(node, pack_name + '/TBR'), (out_node, pack_name)])
-           return [out_node.id]  # Reconnect the Pack operation consumers to get input from Concat instead.
-
-
-.. _generic_front_phase_transformations:
-
-Generic Front Phase Transformations
-###################################
-
-Model Optimizer provides a mechanism to implement generic front phase transformations. This type of transformation is
-implemented using ``mo.front.common.replacement.FrontReplacementSubgraph`` or
-``mo.front.common.replacement.FrontReplacementPattern`` as base classes. Make sure the transformation is enabled before trying to execute it.
-Then, Model Optimizer executes the ``find_and_replace_pattern(self, graph)`` method and
-provides a ``Graph`` object as an input.
-
-Consider the example of a generic front transformation from the ``extensions/front/SqueezeNormalize.py`` file performing
-normalization of the :doc:`Squeeze <openvino_docs_ops_shape_Squeeze_1>` operation. An older version of the operation had a list of
-axes to squeeze as an attribute, but now it is a separate input. For backward compatibility, the Model Optimizer
-operation supports both semantics. Before IR generation, however, the operation should be normalized according to the
-specification.
-
-.. code-block:: py
-   :force:
-
-   import logging as log
-   
-   from openvino.tools.mo.front.common.partial_infer.utils import int64_array
-   from openvino.tools.mo.front.common.replacement import FrontReplacementPattern
-   from openvino.tools.mo.graph.graph import Graph
-   from openvino.tools.mo.ops.const import Const
-   from openvino.tools.mo.utils.error import Error
-   
-   
-   class SqueezeNormalize(FrontReplacementPattern):
-       """
-       Normalizes inputs of the Squeeze layers. The layers should have two inputs: the input with data and input with the
-       dimensions to squeeze. If the second input is omitted then all dimensions of size 1 should be removed.
-       """
-       enabled = True  # The transformation is enabled.
-   
-       def find_and_replace_pattern(self, graph: Graph):  # The function is called unconditionally.
-           for squeeze_node in graph.get_op_nodes(op='Squeeze'):  # Iterate over all nodes with op='Squeeze'.
-               # If the operation has only 1 input node and a valid 'squeeze_dims' Node attribute, then convert the attribute to
-               # the operation input.
-               if len(squeeze_node.in_nodes()) == 1 and squeeze_node.has_valid('squeeze_dims'):
-                   dims_node = Const(graph, {'name': squeeze_node.id + '/Dims',
-                                             'value': int64_array(squeeze_node.squeeze_dims)}).create_node()
-                   squeeze_node.in_port(1).connect(dims_node.out_port(0))
-                   del squeeze_node['squeeze_dims']
-               # If two inputs already exist, that means the operation is already normalized.
-               elif len(squeeze_node.in_nodes()) == 2:
-                   log.debug('The Squeeze node "{}" is already normalized'.format(squeeze_node.name))
-               # In all other cases, raise an error.
-               else:
-                   raise Error('The Squeeze layer "{}" should either have 2 inputs or one input and a "squeeze_dims" '
-                               'attribute'.format(squeeze_node.soft_get('name')))
-
-For the details on implementation and how these front phase transformations work, refer to the ``mo/front/common/replacement.py``
-file.
-
-.. _node_name_pattern_front_phase_transformations:
-
-Node Name Pattern Front Phase Transformations
-#############################################
-
-TensorFlow uses a mechanism of scope to group related operation nodes. It is a good practice to put nodes performing
-a particular task into the same scope. This approach divides a graph into logical blocks that are easier to review in
-TensorBoard. The scope, in fact, just defines a common name prefix for the nodes belonging to it.
-
-For example, Inception topologies contain several types of so-called **Inception blocks**. Some of them are equal to each
-other, but located in different places of the network. For example, Inception V4 from the
-`TensorFlow-Slim image classification model library <https://github.com/tensorflow/models/tree/master/research/slim>`__ has
-``Mixed_5b``, ``Mixed_5c`` and ``Mixed_5d`` inception blocks with exactly the same nodes, with the same set of attributes.
-
-Consider a situation when these Inception blocks are implemented extremely efficiently using a single Inference
-Engine operation called ``InceptionBlock`` and these blocks in the model need to be replaced with instances of this operation.
-Model Optimizer provides a mechanism to trigger the transformation for a sub-graph of operations defined by node name
-regular expressions (scope). In this particular case, some of the patterns are: ``.*InceptionV4/Mixed_5b``,
-``.*InceptionV4/Mixed_5c`` and ``.*InceptionV4/Mixed_5d``. Each pattern starts with ``.*``, because the ``InceptionV4`` prefix
-is added to all node names during a model freeze.
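
How such scope patterns select nodes can be sketched with the standard ``re`` module. The node names below are hypothetical, and the rule used here (the pattern must match from the start of the name and be followed by ``/``) is an assumption made for illustration; the exact matching rule used by Model Optimizer may differ.

```python
import re

# Hypothetical node names of a frozen model; not taken from a real graph dump.
node_names = [
    "InceptionV4/InceptionV4/Mixed_5b/Branch_0/Conv2d_0a_1x1/Conv2D",
    "InceptionV4/InceptionV4/Mixed_5c/Branch_1/Conv2d_0b_3x3/Conv2D",
    "InceptionV4/InceptionV4/Mixed_6a/Branch_0/Conv2d_1a_3x3/Conv2D",
]

scope_patterns = [
    ".*InceptionV4/Mixed_5b",
    ".*InceptionV4/Mixed_5c",
    ".*InceptionV4/Mixed_5d",
]


def nodes_in_scope(pattern, names):
    """Return the names whose prefix matches the scope pattern followed by '/'."""
    regex = re.compile(pattern + "/")
    return [name for name in names if regex.match(name)]


# Map every scope pattern to the node names it selects.
matched = {pattern: nodes_in_scope(pattern, node_names) for pattern in scope_patterns}
```

The leading ``.*`` lets the pattern absorb the extra ``InceptionV4/`` prefix, so the same pattern works regardless of how many prefix levels the freeze step added.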
-
-This type of transformation is implemented using ``mo.front.tf.replacement.FrontReplacementFromConfigFileSubGraph`` as a
-base class and works as follows:
-
-1. Prepare a JSON configuration file template defining node name patterns.
-2. Run Model Optimizer with the ``--tensorflow_custom_operations_config_update`` command-line parameter, and Model Optimizer adds information about input and output nodes of the specified sub-graphs.
-3. Model Optimizer executes the defined transformation **only** when you specify the path to the configuration file updated in step 2 using the ``--transformations_config`` command-line parameter.
-
-Consider the following possible configuration file template for the Inception Block transformation:
-
-.. code-block:: json
-
-   [
-       {
-           "custom_attributes": {
-               "attr1_key": "attr1_value",
-               "attr2_key": 123456
-           },
-           "id": "InceptionBlockTransformation",
-           "instances": [
-               ".*InceptionV4/Mixed_5b",
-               ".*InceptionV4/Mixed_5c",
-               ".*InceptionV4/Mixed_5d"
-           ],
-           "match_kind": "scope"
-       }
-   ]
-
-The configuration file contains a list of dictionaries. Each dictionary defines one transformation. Each transformation
-is defined with several parameters:
-
-* ``id`` - **(Mandatory)** A unique identifier of the transformation. It is used in the Python code that implements the transformation to link the class and the transformation description from the configuration file.
-* ``match_kind`` - **(Mandatory)** A string that specifies the matching algorithm. For the node name pattern case, the value should be equal to ``scope``. Other possible values are described in the dedicated sections below.
-* ``instances`` - **(Mandatory)** Specifies instances of the sub-graph to be matched. For the ``scope`` match kind, it contains a list of node name prefix patterns.
-* ``custom_attributes`` - **(Optional)** A dictionary with attributes that can be used in the transformation code.
-
-After running Model Optimizer with the additional ``--tensorflow_custom_operations_config_update`` parameter pointing to
-the template configuration file, the content of the file is updated with two new sections: ``inputs`` and ``outputs``.
-The file content after the update is as follows:
-
-.. code-block:: json
-
-   [
-       {
-           "id": "InceptionBlockTransformation",
-           "custom_attributes": {
-               "attr1_key": "attr1_value",
-               "attr2_key": 123456
-           },
-           "instances": [
-               ".*InceptionV4/Mixed_5b",
-               ".*InceptionV4/Mixed_5c",
-               ".*InceptionV4/Mixed_5d"
-           ],
-           "match_kind": "scope",
-           "inputs": [
-               [
-                   {
-                       "node": "Branch_2/Conv2d_0a_1x1/Conv2D$",
-                       "port": 0
-                   },
-                   {
-                       "node": "Branch_3/AvgPool_0a_3x3/AvgPool$",
-                       "port": 0
-                   },
-                   {
-                       "node": "Branch_1/Conv2d_0a_1x1/Conv2D$",
-                       "port": 0
-                   },
-                   {
-                       "node": "Branch_0/Conv2d_0a_1x1/Conv2D$",
-                       "port": 0
-                   }
-               ]
-           ],
-           "outputs": [
-               {
-                   "node": "concat$",
-                   "port": 0
-               }
-           ]
-       }
-   ]
-
-The value for the ``inputs`` key is a list of lists describing input tensors of the sub-graph. Each element of the top-level
-list corresponds to one unique input tensor of the sub-graph. Each internal list describes the nodes consuming
-this tensor and the port numbers where the tensor is consumed. Model Optimizer generates regular expressions for the input
-node names to uniquely identify them in each instance of the sub-graph defined by ``instances``. Denote these nodes
-as input nodes of the sub-graph.
-
-In the InceptionV4 topology, the ``InceptionV4/Mixed_5b`` block has four input tensors from outside of the sub-graph,
-but all of them are produced by the ``InceptionV4/Mixed_5a/concat`` node. Therefore, the top-level list of ``inputs``
-contains one list corresponding to this tensor. Four input nodes of the sub-graph consume the tensor produced by
-the ``InceptionV4/Mixed_5a/concat`` node. In this case, all four input nodes consume the input tensor through port 0.
-
-The order of items in the internal list describing nodes does not matter, but the order of elements in the top-level
-list is important. This order defines how Model Optimizer attaches input tensors to a newly generated
-node if the sub-graph is replaced with a single node. The ``i``-th input node of the sub-graph is obtained using the
-``match.single_input_node(i)`` call in the sub-graph transformation code. More information about the API is given below. If it is
-necessary to change the order of input tensors, the configuration file can be edited in a text editor.
-
-The value for the ``outputs`` key is a list describing nodes of the sub-graph that produce a tensor that goes outside of the
-sub-graph or that do not have child nodes. Denote these nodes as output nodes of the sub-graph. The order of elements in
-the list is important. The ``i``-th element of the list describes the ``i``-th output tensor of the sub-graph, which can be
-obtained using the ``match.output_node(i)`` call. The order of elements can be manually changed in the configuration file.
-Model Optimizer uses this order to connect output edges if the sub-graph is replaced with a single node.
-
-For more examples of this type of transformation, refer to the :doc:`Converting TensorFlow Object Detection API Models <openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models>` guide.
-
-.. _start_end_points_front_phase_transformations:
-
-Front Phase Transformations Using Start and End Points
-######################################################
-
-This type of transformation is implemented using ``mo.front.tf.replacement.FrontReplacementFromConfigFileSubGraph`` as a
-base class and works as follows:
-
-1. Prepare a JSON configuration file that defines the sub-graph to match, using two lists of node names: "start" and "end" nodes.
-2. Model Optimizer executes the defined transformation **only** when you specify the path to the configuration file using the ``--transformations_config`` command-line parameter. Model Optimizer performs the following steps to match the sub-graph:
-
-   1. Starts a graph traversal from every start node following the direction of the graph edges. The search stops in an end node or in the case of a node without consumers. All visited nodes are added to the matched sub-graph.
-   2. Starts another graph traversal from each non-start node of the sub-graph, i.e. every node except nodes from the "start" list. In this step, the edges are traversed in the opposite direction. All newly visited nodes are added to the matched sub-graph. This step is needed to add the nodes required to calculate the values of internal nodes of the matched sub-graph.
-   3. Checks that all "end" nodes were reached from "start" nodes. If not, it exits with an error.
-   4. Checks that there are no :doc:`Parameter <openvino_docs_ops_infrastructure_Parameter_1>` operations among added nodes. If they exist, the sub-graph depends on the inputs of the model. Such a configuration is considered incorrect, so Model Optimizer exits with an error.
-
-This algorithm finds all nodes "between" start and end nodes and nodes needed for calculation of non-input nodes of the
-matched sub-graph.
-
-An example of a JSON configuration file for a transformation with start and end points is
-``extensions/front/tf/ssd_support_api_v1.15.json``:
-
-.. code-block:: json
-   
-   [
-       {
-           "custom_attributes": {
-               "code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
-               "pad_mode": "caffe.ResizeParameter.CONSTANT",
-               "resize_mode": "caffe.ResizeParameter.WARP",
-               "clip_before_nms": false,
-               "clip_after_nms": true
-           },
-           "id": "ObjectDetectionAPISSDPostprocessorReplacement",
-           "include_inputs_to_sub_graph": true,
-           "include_outputs_to_sub_graph": true,
-           "instances": {
-               "end_points": [
-                   "detection_boxes",
-                   "detection_scores",
-                   "num_detections"
-               ],
-               "start_points": [
-                   "Postprocessor/Shape",
-                   "Postprocessor/scale_logits",
-                   "Postprocessor/Tile",
-                   "Postprocessor/Reshape_1",
-                   "Postprocessor/Cast_1"
-               ]
-           },
-           "match_kind": "points"
-       }
-   ]
-
-The format of the file is similar to the one provided as an example in the
-:ref:`Node Name Pattern Front Phase Transformations <node_name_pattern_front_phase_transformations>` section. The differences are
-the value of the ``match_kind`` parameter, which should be equal to ``points``, and the format of the ``instances`` parameter,
-which should be a dictionary with two keys, ``start_points`` and ``end_points``, defining start and end node names
-respectively.
-
-.. note:: 
-   The ``include_inputs_to_sub_graph`` and ``include_outputs_to_sub_graph`` parameters are redundant and should always be equal to ``true``.
-
-.. note:: 
-   This sub-graph match algorithm has a limitation that each start node must have only one input. Therefore, it is not possible to specify, for example, the :doc:`Convolution <openvino_docs_ops_convolution_Convolution_1>` node as input because it has two inputs: data tensor and tensor with weights.
-
-For other examples of transformations with points, refer to the
-:doc:`Converting TensorFlow Object Detection API Models <openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models>` guide.
-
-.. _generic_transformations_config_front_phase_transformations:
-
-Generic Front Phase Transformations Enabled with Transformations Configuration File
-###################################################################################
-
-This type of transformation works similarly to the :ref:`Generic Front Phase Transformations <generic_front_phase_transformations>`
-but requires a JSON configuration file to enable it, similarly to
-:ref:`Node Name Pattern Front Phase Transformations <node_name_pattern_front_phase_transformations>` and
-:ref:`Front Phase Transformations Using Start and End Points <start_end_points_front_phase_transformations>`.
-
-The base class for this type of transformation is
-``mo.front.common.replacement.FrontReplacementFromConfigFileGeneral``. Model Optimizer executes the
-``transform_graph(self, graph, replacement_descriptions)`` method and provides the ``Graph`` object and a dictionary with values
-parsed from the ``custom_attributes`` attribute of the provided JSON configuration file.
-
-An example of the configuration file for this type of transformation is ``extensions/front/tf/yolo_v1_tiny.json``:
-
-.. code-block:: json
-   
-   [
-     {
-       "id": "TFYOLO",
-       "match_kind": "general",
-       "custom_attributes": {
-         "classes": 20,
-         "coords": 4,
-         "num": 2,
-         "do_softmax": 0
-       }
-     }
-   ]
-
-and the corresponding transformation file is ``./extensions/front/YOLO.py``:
-
-.. code-block:: py
-   :force:
-   
-   from openvino.tools.mo.front.no_op_eraser import NoOpEraser
-   from openvino.tools.mo.front.standalone_const_eraser import StandaloneConstEraser
-   from openvino.tools.mo.ops.regionyolo import RegionYoloOp
-   from openvino.tools.mo.front.tf.replacement import FrontReplacementFromConfigFileGeneral
-   from openvino.tools.mo.graph.graph import Node, Graph
-   from openvino.tools.mo.ops.result import Result
-   from openvino.tools.mo.utils.error import Error
-   
-   
-   class YoloRegionAddon(FrontReplacementFromConfigFileGeneral):
-       """
-       Replaces all Result nodes in graph with YoloRegion->Result nodes chain.
-       YoloRegion node attributes are taken from configuration file
-       """
-       replacement_id = 'TFYOLO'  # The identifier matching the "id" attribute in the JSON file.
-   
-       def run_after(self):
-           return [NoOpEraser, StandaloneConstEraser]
-   
-       def transform_graph(self, graph: Graph, replacement_descriptions):
-           op_outputs = [n for n, d in graph.nodes(data=True) if 'op' in d and d['op'] == 'Result']
-           for op_output in op_outputs:
-               last_node = Node(graph, op_output).in_node(0)
-               op_params = dict(name=last_node.id + '/YoloRegion', axis=1, end_axis=-1)
-               op_params.update(replacement_descriptions)
-               region_layer = RegionYoloOp(graph, op_params)
-               region_layer_node = region_layer.create_node([last_node])
-               # In here, 'axis' from 'dim_attrs' can be removed to avoid permutation from axis = 1 to axis = 2.
-               region_layer_node.dim_attrs.remove('axis')
-               Result(graph).create_node([region_layer_node])
-               graph.remove_node(op_output)
-
-The configuration file has only three parameters: ``id`` (the identifier of the transformation), ``match_kind`` (which should be equal
-to ``general``), and the ``custom_attributes`` dictionary with custom attributes accessible in the transformation.
-
-.. _mo_middle_phase_transformations:
-
-============================
-Middle Phase Transformations
-============================
-
-There are two types of middle phase transformations:
-
-1. :ref:`Pattern-Defined Middle Phase Transformations <pattern_defined_middle_phase_transformations>` triggered for each sub-graph of the original graph, isomorphic to the specified pattern.
-2. :ref:`Generic Middle Phase Transformations <generic_middle_phase_transformations>`.
-
-.. _pattern_defined_middle_phase_transformations:
-
-Pattern-Defined Middle Phase Transformations
-############################################
-
-This type of transformation is implemented using ``mo.middle.replacement.MiddleReplacementPattern`` as a base class and
-works similarly to the pattern-defined front phase transformations.
-There are two differences:
-
-1. The transformation entry function name is ``replace_pattern(self, graph, match)``.
-2. The pattern defining the graph should contain data nodes because the structure of the graph is different between front and middle phases. For more information about the graph structure changes, refer to the :ref:`Partial Inference <mo_partial_inference>`.
-
-For an example of a pattern-defined middle transformation, refer to the ``extensions/middle/L2NormToNorm.py`` file.
-
-.. _generic_middle_phase_transformations:
-
-Generic Middle Phase Transformations
-####################################
-
-Model Optimizer provides a mechanism to implement generic middle phase transformations. This type of transformation is
-implemented using ``mo.middle.replacement.MiddleReplacementPattern`` as a base class and works similarly to the
-:ref:`Generic Front Phase Transformations <generic_front_phase_transformations>`. The only difference is that the
-transformation entry function name is ``find_and_replace_pattern(self, graph: Graph)``.
-
-For the example of this transformation, refer to the ``extensions/middle/CheckForCycle.py`` file.
-
-.. _mo_back_phase_transformations:
-
-==========================
-Back Phase Transformations
-==========================
-
-There are two types of back phase transformations:
-
-1. :ref:`Pattern-Defined Back Phase Transformations <pattern_defined_back_phase_transformations>` triggered for each sub-graph of the original graph, isomorphic to the specified pattern.
-2. :ref:`Generic Back Phase Transformations <generic_back_phase_transformations>`.
-
-.. note:: 
-   The graph layout during the back phase is always NCHW. However, during the front and middle phases it could be NHWC if the original model was using it. For more details, refer to :ref:`Model Conversion Pipeline <mo_model_conversion_pipeline>`.
-
-.. _pattern_defined_back_phase_transformations:
-
-Pattern-Defined Back Phase Transformations
-##########################################
-
-This type of transformation is implemented using ``mo.back.replacement.BackReplacementPattern`` as a base class and
-works the same way as :ref:`Pattern-Defined Middle Phase Transformations <pattern_defined_middle_phase_transformations>`.
-
-For the example of a pattern-defined back transformation, refer to the ``extensions/back/ShufflenetReLUReorder.py`` file.
-
-.. _generic_back_phase_transformations:
-
-Generic Back Phase Transformations
-##################################
-
-Model Optimizer provides a mechanism to implement generic back phase transformations. This type of transformation is
-implemented using ``mo.back.replacement.BackReplacementPattern`` as a base class and works the same way as
-:ref:`Generic Middle Phase Transformations <generic_middle_phase_transformations>`.
-
-For the example of this transformation, refer to the ``extensions/back/GatherNormalizer.py`` file.
-
-====================
-Additional Resources
-====================
-
-* :doc:`Model Optimizer Extensibility <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>`
-* :doc:`Graph Traversal and Modification Using Ports and Connections <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections>`
-* :doc:`Model Optimizer Extensions <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions>`
-* :doc:`Extending Model Optimizer with Caffe Python Layers <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers>`
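
The sub-graph matching by start and end points described in the removed "Front Phase Transformations Using Start and End Points" text above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is a plain dictionary of successor lists, and the function name ``match_sub_graph`` and the toy node names are hypothetical, not part of the Model Optimizer API.

```python
from collections import deque


def match_sub_graph(successors, start_points, end_points, parameters=()):
    """Return the set of nodes matched between start and end points.

    `successors` maps each node name to the list of nodes consuming its output.
    Hypothetical helper for illustration; not the Model Optimizer implementation.
    """
    start_points, end_points = set(start_points), set(end_points)

    # Reverse adjacency, needed for the backward pass.
    predecessors = {node: [] for node in successors}
    for node, consumers in successors.items():
        for consumer in consumers:
            predecessors.setdefault(consumer, []).append(node)

    # Step 1: forward traversal from every start node; the search stops
    # in an end node or in a node without consumers.
    matched = set()
    queue = deque(start_points)
    while queue:
        node = queue.popleft()
        if node in matched:
            continue
        matched.add(node)
        if node not in end_points:
            queue.extend(successors.get(node, []))
    forward_reached = set(matched)

    # Step 2: backward traversal from every non-start node of the sub-graph;
    # newly visited producers are added to the match.
    queue = deque(node for node in matched if node not in start_points)
    while queue:
        node = queue.popleft()
        for producer in predecessors.get(node, []):
            if producer not in matched:
                matched.add(producer)
                queue.append(producer)

    # Step 3: every end node must have been reached from the start nodes.
    missing = end_points - forward_reached
    if missing:
        raise ValueError(f"end points not reached: {sorted(missing)}")

    # Step 4: the sub-graph must not depend on model inputs (Parameter nodes).
    if matched & set(parameters):
        raise ValueError("matched sub-graph depends on model inputs")
    return matched


# Toy graph: input -> backbone -> start -> mul -> end -> result, const -> mul
successors = {
    "input": ["backbone"],
    "backbone": ["start"],
    "start": ["mul"],
    "const": ["mul"],
    "mul": ["end"],
    "end": ["result"],
    "result": [],
}
print(sorted(match_sub_graph(successors, ["start"], ["end"], parameters=["input"])))
# prints ['const', 'end', 'mul', 'start']
```

The ``const`` node is picked up only by the backward pass, which corresponds to the nodes "needed for calculation of non-input nodes" mentioned in the text.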
-
--- a/docs/OV_Runtime_UG/Int8Inference.md
+++ b/docs/OV_Runtime_UG/Int8Inference.md
@ -1,59 +0,0 @@
-# Low-Precision 8-bit Integer Inference
-
-## Disclaimer
-
-Low-precision 8-bit inference is optimized for:
- Intel® architecture processors with the following instruction set architecture extensions:  
-  - Intel® Advanced Vector Extensions 512 Vector Neural Network Instructions (Intel® AVX-512 VNNI)
-  - Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
-  - Intel® Advanced Vector Extensions 2.0 (Intel® AVX2)
-  - Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2)
- Intel® processor graphics:
-  - Intel® Iris® Xe Graphics
-  - Intel® Iris® Xe MAX Graphics
-
-## Introduction
-
-For 8-bit integer computation, a model must be quantized. You can use a quantized model from [OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel) or quantize a model yourself. For more details on how to get a quantized model, refer to the [Model Optimization](@ref openvino_docs_model_optimization_guide) document.
-
-The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
-
-When you pass the quantized IR to the OpenVINO™ plugin, the plugin automatically recognizes it as a quantized model and performs 8-bit inference. Note that if you pass a quantized model to another plugin that does not support 8-bit inference but supports all operations from the model, the model is inferred in precision that this plugin supports.
-
-At runtime, the quantized model is loaded to the plugin. The plugin uses the `Low Precision Transformation` component to update the model to infer it in low precision:
-   - Update `FakeQuantize` layers to have quantized output tensors in low-precision range and add dequantization layers to compensate for the update. Dequantization layers are pushed through as many layers as possible to have more layers in low precision. After that, most layers have quantized input tensors in low-precision range and can be inferred in low precision. Ideally, dequantization layers should be fused in the next `FakeQuantize` layer.
-   - Weights are quantized and stored in `Constant` layers. 
-
-## Prerequisites
-
-Let's explore quantized [TensorFlow* implementation of the ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model. Use [Model Downloader](@ref omz_tools_downloader) to download the `FP16` model from [OpenVINO™ Toolkit - Open Model Zoo repository](https://github.com/openvinotoolkit/open_model_zoo):
-
-```sh
-omz_downloader --name resnet-50-tf --precisions FP16-INT8
-```
-After that you should quantize the model with the [Model Quantizer](@ref omz_tools_downloader) tool.
-```sh
-omz_quantizer --model_dir public/resnet-50-tf --dataset_dir <DATASET_DIR> --precisions=FP16-INT8
-```
-
-The simplest way to infer the model and collect performance counters is the [Benchmark Application](../../samples/cpp/benchmark_app/README.md): 
-```sh
-./benchmark_app -m resnet-50-tf.xml -d CPU -niter 1 -api sync -report_type average_counters  -report_folder pc_report_dir
-```
-If you infer the model with the OpenVINO™ CPU plugin and collect performance counters, all operations (except the last non-quantized SoftMax) are executed in INT8 precision.  
-
-## Low-Precision 8-bit Integer Inference Workflow
-
-For 8-bit integer computations, a model must be quantized. Quantized models can be downloaded from [Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel). If the model is not quantized, you can use the [Post-Training Optimization Tool](@ref pot_introduction) to quantize the model. The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
-
-8-bit inference pipeline includes two stages (also refer to the figure below):
-1. *Offline stage*, or *model quantization*. During this stage, [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers are added before most layers so that their input tensors are quantized and the accuracy drop for 8-bit integer inference stays within the specified threshold. The output of this stage is a quantized model. The precision of the quantized model is not changed: quantized tensors remain in the original precision range (`fp32`). The `FakeQuantize` layer has a `levels` attribute, which defines the number of quantization levels and therefore the precision used during inference. For the `int8` range, the `levels` attribute value has to be 255 or 256. To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_introduction) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.
-
-   When you pass the quantized IR to the OpenVINO™ plugin, the plugin automatically recognizes it as a quantized model and performs 8-bit inference. Note, if you pass a quantized model to another plugin that does not support 8-bit inference but supports all operations from the model, the model is inferred in precision that this plugin supports.
-
-2. *Runtime stage*. This stage is an internal procedure of the OpenVINO™ plugin. During this stage, the quantized model is loaded to the plugin. The plugin uses the `Low Precision Transformation` component to update the model to infer it in low precision:
-   - Update `FakeQuantize` layers to have quantized output tensors in low-precision range and add dequantization layers to compensate for the update. Dequantization layers are pushed through as many layers as possible to have more layers in low precision. After that, most layers have quantized input tensors in low-precision range and can be inferred in low precision. Ideally, dequantization layers should be fused in the next `FakeQuantize` layer.
-   - Weights are quantized and stored in `Constant` layers. 
-
-![int8_flow]
-
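
The role of the `levels` attribute in the removed text above can be illustrated with a scalar sketch of what `FakeQuantize` computes. It assumes the element-wise definition given in the FakeQuantize operation specification (clamp, discretize to `levels` steps, rescale to the output range). The function name and the sample ranges are illustrative, and Python's built-in `round` (round-half-to-even) stands in for whatever rounding the runtime applies.

```python
def fake_quantize(x, input_low, input_high, output_low, output_high, levels):
    """Scalar emulation of quantization; an illustrative sketch, not plugin code."""
    # Values outside the input range are clamped to the output bounds.
    if x <= min(input_low, input_high):
        return output_low
    if x > max(input_low, input_high):
        return output_high
    # Project onto one of `levels` discrete steps, then map back to the output range.
    steps = levels - 1
    quant = round((x - input_low) / (input_high - input_low) * steps)
    return quant / steps * (output_high - output_low) + output_low


# With levels=256 and the range [0, 255], every output lands on an integer grid point.
for value in (-3.0, 12.4, 300.0):
    print(value, fake_quantize(value, 0.0, 255.0, 0.0, 255.0, 256))
```

The result stays a floating-point number throughout, which matches the statement that quantized tensors remain in the original precision range until the runtime stage.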
--- a/docs/articles_en/Legal_Information.rst
+++ b/docs/articles_en/Legal_Information.rst
--- a/docs/articles_en/documentation/openvino_extensibility/openvino_plugin_library/detailed_guides/QuantizedNetworks.rst
+++ b/docs/articles_en/documentation/openvino_extensibility/openvino_plugin_library/detailed_guides/QuantizedNetworks.rst
@ -3,7 +3,11 @@
 Quantized models compute and restrictions
 =========================================

+.. toctree::
+   :maxdepth: 1
+   :hidden:

+   openvino_docs_ie_plugin_dg_lp_representation

 .. meta::
   :description: Learn about the support for quantized models with different 
--- a/docs/articles_en/documentation/openvino_extensibility/openvino_plugin_library/detailed_guides/QuantizedNetworks/LowPrecisionModelRepresentation.rst
+++ b/docs/articles_en/documentation/openvino_extensibility/openvino_plugin_library/detailed_guides/QuantizedNetworks/LowPrecisionModelRepresentation.rst
@ -0,0 +1,34 @@
+.. {#openvino_docs_ie_plugin_dg_lp_representation}
+
+Representation of low-precision models
+======================================
+
+The goal of this document is to describe how optimized models are represented in OpenVINO Intermediate Representation (IR) and provide guidance 
+on interpretation rules for such models at runtime. 
+
+Currently, there are two groups of optimization methods that can influence the IR after applying them to the full-precision model:
+
+- **Sparsity**. It is represented by zeros inside the weights, and it is up to the hardware plugin how to interpret these zeros
+  (use weights as is or apply special compression algorithms and sparse arithmetic). No additional mask is provided with the model.
+- **Quantization**. The rest of this document is dedicated to the representation of quantized models.
+
+Representation of quantized models
+###################################
+
+The OpenVINO Toolkit represents all the quantized models using the so-called FakeQuantize operation (see the description in 
+:doc:`this document <openvino_docs_ops_quantization_FakeQuantize_1>`). This operation is very expressive and allows mapping values from 
+arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision 
+data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. 
+It can be considered as an emulation of the quantization process which happens at runtime.
+To execute a particular DL operation in low precision, all its inputs should be quantized, i.e. should have FakeQuantize
+between the operation and data blobs. The figure below shows an example of quantized Convolution which contains two FakeQuantize nodes: one for
+weights and one for activations (bias is quantized using the same parameters).
+
+.. .. image:: _static/images/quantized_convolution.png
+
+Starting from the OpenVINO 2020.2 release, all quantized models are represented in the compressed form. It means that the weights
+of low-precision operations are converted into the target precision (e.g. INT8). It helps to substantially reduce the model size.
+The rest of the parameters can be represented in FLOAT32 or FLOAT16 precision depending on the input full-precision model used in
+the quantization process. Fig. 2 below shows an example of the part of the compressed IR.
+
+.. .. image:: _static/images/quantized_model_example.png
--- a/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/extending_model_optimizer_with_caffe_python_layers.rst
+++ b/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/extending_model_optimizer_with_caffe_python_layers.rst
@ -1,5 +1,7 @@
-# [LEGACY] Extending Model Optimizer with Caffe Python Layers {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers}
+.. {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers}

+[LEGACY] Extending Model Optimizer with Caffe Python Layers
+============================================================

 .. meta::
  :description: Learn how to extract operator attributes in Model Optimizer to 
--- a/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions.rst
+++ b/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions.rst
@ -1,5 +1,7 @@
-# [LEGACY] Model Optimizer Extensions {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions}
+.. {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions}

+[LEGACY] Model Optimizer Extensions
+=====================================

 .. meta::
   :description: Learn about deprecated extensions, which enable injecting logic 
--- a/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions/model_optimizer_extractor.rst
+++ b/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions/model_optimizer_extractor.rst
@ -1,5 +1,7 @@
-# [LEGACY] Operation Extractor {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Extractor}
+.. {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Extractor}

+[LEGACY] Operation Extractor
+=============================

 .. meta::
   :description: Learn about a deprecated generic extension in Model Optimizer, 
--- a/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions/model_optimizer_operation.rst
+++ b/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions/model_optimizer_operation.rst
@ -1,5 +1,7 @@
-# [LEGACY] Model Optimizer Operation {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Operation}
+.. {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Operation}

+[LEGACY] Model Optimizer Operation
+===================================

 .. meta::
   :description: Learn about the Op class, that contains operation attributes, 
--- a/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions/model_optimizer_transformation_extensions.rst
+++ b/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_extensions/model_optimizer_transformation_extensions.rst
@ -1,5 +1,7 @@
-# [LEGACY] Graph Transformation Extensions {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Transformation_Extensions}
+.. {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Model_Optimizer_Extensions_Model_Optimizer_Transformation_Extensions}

+[LEGACY] Graph Transformation Extensions
+==========================================

 .. meta::
  :description: Learn about various base classes for front, middle and back phase 
--- a/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_ports_connections.rst
+++ b/docs/articles_en/documentation/openvino_legacy_features/mo_ovc_transition/customize_model_optimizer/model_optimizer_ports_connections.rst
@ -1,5 +1,7 @@
-# [LEGACY] Graph Traversal and Modification {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections}
+.. {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer_Model_Optimizer_Ports_Connections}

+[LEGACY] Graph Traversal and Modification
+===========================================

 .. meta::
   :description: Learn about deprecated APIs and the Port and Connection classes 
--- a/docs/articles_en/glossary.rst
+++ b/docs/articles_en/glossary.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/CPU.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/CPU.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/GNA.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/GNA.rst
@ -359,7 +359,7 @@ and *W* is limited to 87 when there are 64 input channels.

 :download:`Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters (Input and Kernel Precision: i16) <../../../docs/OV_Runtime_UG/supported_plugins/files/GNA_Maximum_Input_Tensor_Widths_i16.csv>`

-:download:`Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters (Input and Kernel Precision: i8) <../../../docs/OV_Runtime_UG/supported_plugins/files/GNA_Maximum_Input_Tensor_Widths_i8.csv>`
+:download:`Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters (Input and Kernel Precision: i8) <../../../docs/OV_Runtime_UG/supported_plugins/files/GNA_Maximum_Input_Tensor_Widths_i8.csv>` 


 .. note:: 
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/GPU.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/GPU.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/GPU/GPU_RemoteTensor_API.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/GPU/GPU_RemoteTensor_API.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/NPU.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/NPU.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/config_properties.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/Device_Plugins/config_properties.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/ShapeInference.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/ShapeInference.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/ShapeInference/Troubleshooting_ReshapeMethod.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/ShapeInference/Troubleshooting_ReshapeMethod.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_common.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_common.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_internals.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_internals.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_latency.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_latency.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_latency/Model_caching_overview.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_latency/Model_caching_overview.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_tput.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_tput.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_tput_advanced.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/dldt_deployment_optimization_tput_advanced.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/memory_optimization_guide.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/memory_optimization_guide.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/performance_hints.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/performance_hints.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/precision_control.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/precision_control.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview/layout_overview.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview/layout_overview.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview/preprocessing_details.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview/preprocessing_details.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview/preprocessing_usecase_save.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/dldt_deployment_optimization_guide/preprocessing_overview/preprocessing_usecase_save.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/auto_device_selection.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/auto_device_selection.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/auto_device_selection/AutoPlugin_Debugging.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/auto_device_selection/AutoPlugin_Debugging.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/automatic_batching.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/automatic_batching.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/hetero_execution.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/hetero_execution.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/multi_device.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/inference_modes_overview/multi_device.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/Python_API_exclusives.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/Python_API_exclusives.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/Python_API_inference.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/Python_API_inference.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/model_representation.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/model_representation.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/ov_infer_request.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/integrate_with_your_application/ov_infer_request.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/model_state_intro.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/model_state_intro.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/model_state_intro/lowlatency2.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/model_state_intro/lowlatency2.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/ov_dynamic_shapes.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/ov_dynamic_shapes.rst
--- a/docs/articles_en/openvino_workflow/running_inference_with_openvino/ov_dynamic_shapes/ov_without_dynamic_shapes.rst
+++ b/docs/articles_en/openvino_workflow/running_inference_with_openvino/ov_dynamic_shapes/ov_without_dynamic_shapes.rst
--- a/docs/dev/pypi_publish/pre-release-note.md
+++ b/docs/dev/pypi_publish/pre-release-note.md
--- a/docs/dev/pypi_publish/pypi-openvino-dev.md
+++ b/docs/dev/pypi_publish/pypi-openvino-dev.md
--- a/docs/dev/pypi_publish/pypi-openvino-rt.md
+++ b/docs/dev/pypi_publish/pypi-openvino-rt.md
--- a/src/bindings/python/CMakeLists.txt
+++ b/src/bindings/python/CMakeLists.txt
@ -325,11 +325,11 @@ macro(ov_define_setup_py_dependencies)
        "${OpenVINO_SOURCE_DIR}/licensing/onednn_third-party-programs.txt"
        "${OpenVINO_SOURCE_DIR}/licensing/runtime-third-party-programs.txt"
        "${OpenVINO_SOURCE_DIR}/licensing/tbb_third-party-programs.txt"
-        "${OpenVINO_SOURCE_DIR}/docs/install_guides/pypi-openvino-rt.md")
+        "${OpenVINO_SOURCE_DIR}/docs/dev/pypi_publish/pypi-openvino-rt.md")

    if(wheel_pre_release)
        list(APPEND ov_setup_py_deps
-            "${OpenVINO_SOURCE_DIR}/docs/install_guides/pre-release-note.md")
+            "${OpenVINO_SOURCE_DIR}/docs/dev/pypi_publish/pre-release-note.md")
    endif()
 endmacro()

--- a/src/bindings/python/wheel/setup.py
+++ b/src/bindings/python/wheel/setup.py
@ -625,8 +625,8 @@ package_data: typing.Dict[str, list] = {}
 ext_modules = find_prebuilt_extensions(get_install_dirs_list(PY_INSTALL_CFG))
 entry_points = find_entry_points(PY_INSTALL_CFG)

-long_description_md = OPENVINO_SOURCE_DIR / "docs" / "install_guides" / "pypi-openvino-rt.md"
-md_files = [long_description_md, OPENVINO_SOURCE_DIR / "docs" / "install_guides" / "pre-release-note.md"]
+long_description_md = OPENVINO_SOURCE_DIR / "docs" / "dev" / "pypi_publish" / "pypi-openvino-rt.md"
+md_files = [long_description_md, OPENVINO_SOURCE_DIR / "docs" / "dev" / "pypi_publish" / "pre-release-note.md"]
 docs_url = "https://docs.openvino.ai/2023.0/index.html"

 if os.getenv("CI_BUILD_DEV_TAG"):
--- a/tools/openvino_dev/setup.py
+++ b/tools/openvino_dev/setup.py
@ -278,8 +278,8 @@ def concat_files(output_file, input_files):
                outfile.write(content)
    return output_file

-description_md = SCRIPT_DIR.parents[1] / 'docs' / 'install_guides' / 'pypi-openvino-dev.md'
-md_files = [description_md, SCRIPT_DIR.parents[1] / 'docs' / 'install_guides' / 'pre-release-note.md']
+description_md = SCRIPT_DIR.parents[1] / 'docs' / 'dev' / "pypi_publish" / 'pypi-openvino-dev.md'
+md_files = [description_md, SCRIPT_DIR.parents[1] / 'docs' / 'dev' / "pypi_publish" / 'pre-release-note.md']
 docs_url = 'https://docs.openvino.ai/2023.0/index.html'

 if(os.getenv('CI_BUILD_DEV_TAG')):
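
Review note: the hunks above update the same relocated paths in three places (`src/bindings/python/CMakeLists.txt`, `src/bindings/python/wheel/setup.py`, and `tools/openvino_dev/setup.py`). A minimal sketch of a consistency check follows, assuming the new `docs/dev/pypi_publish/` layout shown in this diff. The helper name `missing_files` and the constant `MOVED_FILES` are hypothetical and not part of the commit.

```python
from pathlib import Path
from typing import List

# Relative paths that the updated CMakeLists.txt and setup.py files in this diff point to.
MOVED_FILES = [
    "docs/dev/pypi_publish/pypi-openvino-rt.md",
    "docs/dev/pypi_publish/pypi-openvino-dev.md",
    "docs/dev/pypi_publish/pre-release-note.md",
]


def missing_files(source_dir: Path, relative_paths: List[str]) -> List[str]:
    """Return the relative paths that do not exist as files under source_dir.

    Hypothetical helper for illustration only; it is not part of the OpenVINO build.
    """
    return [rel for rel in relative_paths if not (source_dir / rel).is_file()]
```

Calling `missing_files(Path("<repo root>"), MOVED_FILES)` from a checkout would return an empty list if all three files are present at the new location. A non-empty result names the paths that the build scripts would fail to find.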