openvino/docs/notebooks/247-code-language-id-with-output.rst

Programming Language Classification with OpenVINO
=================================================

Overview
--------

This tutorial will be divided in 2 parts: 1. Create a simple inference
pipeline with a pre-trained model using the OpenVINO™ IR format. 2.
Conduct `post-training
quantization <https://docs.openvino.ai/latest/ptq_introduction.html>`__
on a pre-trained model using Hugging Face Optimum and benchmark
performance.

Feel free to use the notebook outline in Jupyter or your IDE for easy
navigation.

**Table of contents:**


-  `Introduction <#introduction>`__

   -  `Task <#task>`__
   -  `Model <#model>`__

-  `Part 1: Inference pipeline with
   OpenVINO <#part--inference-pipeline-with-openvino>`__

   -  `Install prerequisites <#install-prerequisites>`__
   -  `Imports <#imports>`__
   -  `Setting up HuggingFace cache <#setting-up-huggingface-cache>`__
   -  `Select inference device <#select-inference-device>`__
   -  `Download resources <#download-resources>`__
   -  `Create inference pipeline <#create-inference-pipeline>`__
   -  `Inference on new input <#inference-on-new-input>`__

-  `Part 2: OpenVINO post-training quantization with HuggingFace
   Optimum <#part--openvino-post-training-quantization-with-huggingface-optimum>`__

   -  `Define constants and
      functions <#define-constants-and-functions>`__
   -  `Load resources <#load-resources>`__
   -  `Load calibration dataset <#load-calibration-dataset>`__
   -  `Quantize model <#quantize-model>`__
   -  `Load quantized model <#load-quantized-model>`__
   -  `Inference on new input using quantized
      model <#inference-on-new-input-using-quantized-model>`__
   -  `Load evaluation set <#load-evaluation-set>`__
   -  `Evaluate model <#evaluate-model>`__

-  `Additional resources <#additional-resources>`__
-  `Clean up <#clean-up>`__

Introduction
------------


Task
~~~~


**Programming language classification** is the task of identifying which
programming language is used in an arbitrary code snippet. This can be
useful to label new data to include in a dataset, and potentially serve
as an intermediary step when input snippets need to be process based on
their programming language.

It is a relatively easy machine learning task given that each
programming language has its own formal symbols, syntax, and grammar.
However, there are some potential edge cases: - **Ambiguous short
snippets**: For example, TypeScript is a superset of JavaScript, meaning
it does everything JavaScript can and more. For a short input snippet,
it might be impossible to distinguish between the two. Given we know
TypeScript is a superset, and the model doesn’t, we should default to
classifying the input as JavaScript in a post-processing step. -
**Nested programming languages**: Some languages are typically used in
tandem. For example, most HTML contains CSS and JavaScript, and it is
not uncommon to see SQL nested in other scripting languages. For such
input, it is unclear what the expected output class should be. -
**Evolving programming language**: Even though programming languages are
formal, their symbols, syntax, and grammar can be revised and updated.
For example, the walrus operator (``:=``) was a symbol distinctively
used in Golang, but was later introduced in Python 3.8.

Model
~~~~~


The classification model that will be used in this notebook is
`CodeBERTa-language-id <https://huggingface.co/huggingface/CodeBERTa-language-id>`__
by HuggingFace. This model was fine-tuned from the masked language
modeling model
`CodeBERTa-small-v1 <https://huggingface.co/huggingface/CodeBERTa-small-v1>`__
trained on the
`CodeSearchNet <https://huggingface.co/huggingface/CodeBERTa-small-v1>`__
dataset (Husain, 2019).

It supports 6 programming languages: - Go - Java - JavaScript - PHP -
Python - Ruby

Part 1: Inference pipeline with OpenVINO
----------------------------------------


For this section, we will use the `HuggingFace
Optimum <https://huggingface.co/docs/optimum/index>`__ library, which
aims to optimize inference on specific hardware and integrates with the
OpenVINO toolkit. The code will be very similar to the `HuggingFace
Transformers <https://huggingface.co/docs/transformers/index>`__, but
will allow to automatically convert models to the OpenVINO™ IR format.

Install prerequisites
~~~~~~~~~~~~~~~~~~~~~


First, complete the `repository installation steps <../../README.md>`__.

Then, the following cell will install: - HuggingFace Optimum with
OpenVINO support - HuggingFace Evaluate to benchmark results

.. code:: ipython3

    %pip install -q "diffusers>=0.17.1" "openvino>=2023.1.0" "nncf>=2.5.0" "gradio" "onnx>=1.11.0" "transformers>=4.33.0" "evaluate" --extra-index-url https://download.pytorch.org/whl/cpu
    %pip install -q "git+https://github.com/huggingface/optimum-intel.git"


.. parsed-literal::

    DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 4.25.1 which is incompatible.
    pytorch-lightning 1.6.5 requires protobuf<=3.20.1, but you have protobuf 4.25.1 which is incompatible.
    tf2onnx 1.15.1 requires protobuf~=3.20.2, but you have protobuf 4.25.1 which is incompatible.
    Note: you may need to restart the kernel to use updated packages.
    DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
    Note: you may need to restart the kernel to use updated packages.


Imports
~~~~~~~


The import ``OVModelForSequenceClassification`` from Optimum is
equivalent to ``AutoModelForSequenceClassification`` from Transformers

.. code:: ipython3

    from functools import partial
    from pathlib import Path

    import pandas as pd
    from datasets import load_dataset, Dataset
    import evaluate
    from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
    from optimum.intel import OVModelForSequenceClassification
    from optimum.intel.openvino import OVConfig, OVQuantizer
    from huggingface_hub.utils import RepositoryNotFoundError


.. parsed-literal::

    2023-12-07 00:07:02.218482: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
    2023-12-07 00:07:02.252471: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
    To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2023-12-07 00:07:02.836089: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT


.. parsed-literal::

    INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino


.. parsed-literal::

    No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'


Setting up HuggingFace cache
~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Resources from HuggingFace will be downloaded in the local folder
``./model`` (next to this notebook) instead of the device global cache
for easy cleanup. Learn more
`here <https://huggingface.co/docs/transformers/installation?highlight=transformers_cache#cache-setup>`__.

.. code:: ipython3

    MODEL_NAME = "CodeBERTa-language-id"
    MODEL_ID = f"huggingface/{MODEL_NAME}"
    MODEL_LOCAL_PATH = Path("./model").joinpath(MODEL_NAME)

Select inference device
~~~~~~~~~~~~~~~~~~~~~~~


select device from dropdown list for running inference using OpenVINO

.. code:: ipython3

    import ipywidgets as widgets
    import openvino as ov

    core = ov.Core()

    device = widgets.Dropdown(
        options=core.available_devices + ["AUTO"],
        value='AUTO',
        description='Device:',
        disabled=False,
    )

    device


.. parsed-literal::

    Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')


Download resources
~~~~~~~~~~~~~~~~~~


.. code:: ipython3

    # try to load resources locally
    try:
        model = OVModelForSequenceClassification.from_pretrained(MODEL_LOCAL_PATH, device=device.value)
        tokenizer = AutoTokenizer.from_pretrained(MODEL_LOCAL_PATH)
        print(f"Loaded resources from local path: {MODEL_LOCAL_PATH.absolute()}")

    # if not found, download from HuggingFace Hub then save locally
    except (RepositoryNotFoundError, OSError):
        print("Downloading resources from HuggingFace Hub")
        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        tokenizer.save_pretrained(MODEL_LOCAL_PATH)

        # export=True is needed to convert the PyTorch model to OpenVINO
        model = OVModelForSequenceClassification.from_pretrained(MODEL_ID, export=True, device=device.value)
        model.save_pretrained(MODEL_LOCAL_PATH)
        print(f"Ressources cached locally at: {MODEL_LOCAL_PATH.absolute()}")


.. parsed-literal::

    Downloading resources from HuggingFace Hub


.. parsed-literal::

    Framework not specified. Using pt to export to ONNX.
    Some weights of the model checkpoint at huggingface/CodeBERTa-language-id were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
    - This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Using the export variant default. Available variants are:
        - default: The default ONNX variant.
    Using framework PyTorch: 1.13.1+cpu
    Overriding 1 configuration item(s)
    	- use_cache -> False


.. parsed-literal::

    WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11.


.. parsed-literal::

    [ WARNING ]  Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s.
    Compiling the model to AUTO ...


.. parsed-literal::

    Ressources cached locally at: /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-561/.workspace/scm/ov-notebook/notebooks/247-code-language-id/model/CodeBERTa-language-id


Create inference pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~


.. code:: ipython3

    code_classification_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

Inference on new input
~~~~~~~~~~~~~~~~~~~~~~


.. code:: ipython3

    # change input snippet to test model
    input_snippet = "df['speed'] = df.distance / df.time"
    output = code_classification_pipe(input_snippet)

    print(f"Input snippet:\n  {input_snippet}\n")
    print(f"Predicted label: {output[0]['label']}")
    print(f"Predicted score: {output[0]['score']:.2}")


.. parsed-literal::

    Input snippet:
      df['speed'] = df.distance / df.time

    Predicted label: python
    Predicted score: 0.81


Part 2: OpenVINO post-training quantization with HuggingFace Optimum
--------------------------------------------------------------------


In this section, we will quantize a trained model. At a high-level, this
process consists of using lower precision numbers in the model, which
results in a smaller model size and faster inference at the cost of a
potential marginal performance degradation. `Learn
more <https://docs.openvino.ai/latest/ptq_introduction.html>`__.

The HuggingFace Optimum library supports post-training quantization for
OpenVINO. `Learn
more <https://huggingface.co/docs/optimum/main/en/intel/index>`__.

Define constants and functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code:: ipython3

    QUANTIZED_MODEL_LOCAL_PATH = MODEL_LOCAL_PATH.with_name(f"{MODEL_NAME}-quantized")
    DATASET_NAME = "code_search_net"
    LABEL_MAPPING = {"go": 0, "java": 1, "javascript": 2, "php": 3, "python": 4, "ruby": 5}


    def preprocess_function(examples: dict, tokenizer):
        """Preprocess inputs by tokenizing the `func_code_string` column"""
        return tokenizer(
            examples["func_code_string"],
            padding="max_length",
            max_length=tokenizer.model_max_length,
            truncation=True,
        )


    def map_labels(example: dict) -> dict:
        """Convert string labels to integers"""
        label_mapping = {"go": 0, "java": 1, "javascript": 2, "php": 3, "python": 4, "ruby": 5}
        example["language"] = label_mapping[example["language"]]
        return example


    def get_dataset_sample(dataset_split: str, num_samples: int) -> Dataset:
        """Create a sample with equal representation of each class without downloading the entire data"""
        labels = ["go", "java", "javascript", "php", "python", "ruby"]
        example_per_label = num_samples // len(labels)

        examples = []
        for label in labels:
            subset = load_dataset("code_search_net", split=dataset_split, name=label, streaming=True)
            subset = subset.map(map_labels)
            examples.extend([example for example in subset.shuffle().take(example_per_label)])

        return Dataset.from_list(examples)

Load resources
~~~~~~~~~~~~~~


NOTE: the base model is loaded using
``AutoModelForSequenceClassification`` from ``Transformers``

.. code:: ipython3

    tokenizer = AutoTokenizer.from_pretrained(MODEL_LOCAL_PATH)
    base_model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

    quantizer = OVQuantizer.from_pretrained(base_model)
    quantization_config = OVConfig()


.. parsed-literal::

    Some weights of the model checkpoint at huggingface/CodeBERTa-language-id were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
    - This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Load calibration dataset
~~~~~~~~~~~~~~~~~~~~~~~~


The ``get_dataset_sample()`` function will sample up to ``num_samples``,
with an equal number of examples across the 6 programming languages.

NOTE: Uncomment the method below to download and use the full dataset
(5+ Gb).

.. code:: ipython3

    calibration_sample = get_dataset_sample(dataset_split="train", num_samples=120)
    calibration_sample = calibration_sample.map(partial(preprocess_function, tokenizer=tokenizer))

    # calibration_sample = quantizer.get_calibration_dataset(
    #     DATASET_NAME,
    #     preprocess_function=partial(preprocess_function, tokenizer=tokenizer),
    #     num_samples=120,
    #     dataset_split="train",
    #     preprocess_batch=True,
    # )


.. parsed-literal::

    Map:   0%|          | 0/120 [00:00<?, ? examples/s]


Quantize model
~~~~~~~~~~~~~~


Calling ``quantizer.quantize(...)`` will iterate through the calibration
dataset to quantize and save the model

.. code:: ipython3

    quantizer.quantize(
        quantization_config=quantization_config,
        calibration_dataset=calibration_sample,
        save_directory=QUANTIZED_MODEL_LOCAL_PATH,
    )


.. parsed-literal::

    INFO:nncf:Not adding activation input quantizer for operation: 12 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/NNCFEmbedding[token_type_embeddings]/embedding_0
    INFO:nncf:Not adding activation input quantizer for operation: 11 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/NNCFEmbedding[word_embeddings]/embedding_0
    INFO:nncf:Not adding activation input quantizer for operation: 3 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/ne_0
    INFO:nncf:Not adding activation input quantizer for operation: 4 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/int_0
    INFO:nncf:Not adding activation input quantizer for operation: 5 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/cumsum_0
    INFO:nncf:Not adding activation input quantizer for operation: 13 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/__add___2
    INFO:nncf:Not adding activation input quantizer for operation: 6 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/type_as_0
    INFO:nncf:Not adding activation input quantizer for operation: 7 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 8 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/__mul___0
    INFO:nncf:Not adding activation input quantizer for operation: 9 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/long_0
    INFO:nncf:Not adding activation input quantizer for operation: 10 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/__add___1
    INFO:nncf:Not adding activation input quantizer for operation: 14 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/NNCFEmbedding[position_embeddings]/embedding_0
    INFO:nncf:Not adding activation input quantizer for operation: 15 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/__iadd___0
    INFO:nncf:Not adding activation input quantizer for operation: 16 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 17 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEmbeddings[embeddings]/Dropout[dropout]/dropout_0
    INFO:nncf:Not adding activation input quantizer for operation: 30 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[0]/RobertaAttention[attention]/RobertaSelfAttention[self]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 33 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[0]/RobertaAttention[attention]/RobertaSelfAttention[self]/matmul_1
    INFO:nncf:Not adding activation input quantizer for operation: 39 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[0]/RobertaAttention[attention]/RobertaSelfOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 40 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[0]/RobertaAttention[attention]/RobertaSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 45 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[0]/RobertaOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 46 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[0]/RobertaOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 59 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[1]/RobertaAttention[attention]/RobertaSelfAttention[self]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 62 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[1]/RobertaAttention[attention]/RobertaSelfAttention[self]/matmul_1
    INFO:nncf:Not adding activation input quantizer for operation: 68 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[1]/RobertaAttention[attention]/RobertaSelfOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 69 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[1]/RobertaAttention[attention]/RobertaSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 74 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[1]/RobertaOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 75 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[1]/RobertaOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 88 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[2]/RobertaAttention[attention]/RobertaSelfAttention[self]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 91 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[2]/RobertaAttention[attention]/RobertaSelfAttention[self]/matmul_1
    INFO:nncf:Not adding activation input quantizer for operation: 97 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[2]/RobertaAttention[attention]/RobertaSelfOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 98 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[2]/RobertaAttention[attention]/RobertaSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 103 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[2]/RobertaOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 104 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[2]/RobertaOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 117 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[3]/RobertaAttention[attention]/RobertaSelfAttention[self]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 120 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[3]/RobertaAttention[attention]/RobertaSelfAttention[self]/matmul_1
    INFO:nncf:Not adding activation input quantizer for operation: 126 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[3]/RobertaAttention[attention]/RobertaSelfOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 127 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[3]/RobertaAttention[attention]/RobertaSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 132 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[3]/RobertaOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 133 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[3]/RobertaOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 146 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[4]/RobertaAttention[attention]/RobertaSelfAttention[self]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 149 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[4]/RobertaAttention[attention]/RobertaSelfAttention[self]/matmul_1
    INFO:nncf:Not adding activation input quantizer for operation: 155 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[4]/RobertaAttention[attention]/RobertaSelfOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 156 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[4]/RobertaAttention[attention]/RobertaSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 161 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[4]/RobertaOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 162 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[4]/RobertaOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 175 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[5]/RobertaAttention[attention]/RobertaSelfAttention[self]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 178 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[5]/RobertaAttention[attention]/RobertaSelfAttention[self]/matmul_1
    INFO:nncf:Not adding activation input quantizer for operation: 184 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[5]/RobertaAttention[attention]/RobertaSelfOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 185 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[5]/RobertaAttention[attention]/RobertaSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Not adding activation input quantizer for operation: 190 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[5]/RobertaOutput[output]/__add___0
    INFO:nncf:Not adding activation input quantizer for operation: 191 RobertaForSequenceClassification/RobertaModel[roberta]/RobertaEncoder[encoder]/ModuleList[layer]/RobertaLayer[5]/RobertaOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
    INFO:nncf:Collecting tensor statistics |█               | 33 / 300
    INFO:nncf:Collecting tensor statistics |███             | 66 / 300
    INFO:nncf:Collecting tensor statistics |█████           | 99 / 300
    INFO:nncf:Compiling and loading torch extension: quantized_functions_cpu...


.. parsed-literal::

    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
    	- Avoid using `tokenizers` before the fork if possible
    	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
    	- Avoid using `tokenizers` before the fork if possible
    	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
    	- Avoid using `tokenizers` before the fork if possible
    	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
    	- Avoid using `tokenizers` before the fork if possible
    	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


.. parsed-literal::

    INFO:nncf:Finished loading torch extension: quantized_functions_cpu


.. parsed-literal::

    Using framework PyTorch: 1.13.1+cpu
    Overriding 1 configuration item(s)
    	- use_cache -> False
    Configuration saved in model/CodeBERTa-language-id-quantized/openvino_config.json


Load quantized model
~~~~~~~~~~~~~~~~~~~~


NOTE: the argument ``export=True`` is not required since the quantized
model is already in the OpenVINO format.

.. code:: ipython3

    quantized_model = OVModelForSequenceClassification.from_pretrained(QUANTIZED_MODEL_LOCAL_PATH, device=device.value)
    quantized_code_classification_pipe = pipeline("text-classification", model=quantized_model, tokenizer=tokenizer)


.. parsed-literal::

    Compiling the model to AUTO ...
    Setting OpenVINO CACHE_DIR to model/CodeBERTa-language-id-quantized/model_cache


Inference on new input using quantized model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code:: ipython3

    input_snippet = "df['speed'] = df.distance / df.time"
    output = quantized_code_classification_pipe(input_snippet)

    print(f"Input snippet:\n  {input_snippet}\n")
    print(f"Predicted label: {output[0]['label']}")
    print(f"Predicted score: {output[0]['score']:.2}")


.. parsed-literal::

    Input snippet:
      df['speed'] = df.distance / df.time

    Predicted label: python
    Predicted score: 0.81


Load evaluation set
~~~~~~~~~~~~~~~~~~~


NOTE: Uncomment the method below to download and use the full dataset
(5+ Gb).

.. code:: ipython3

    validation_sample = get_dataset_sample(dataset_split="validation", num_samples=120)

    # validation_sample = load_dataset(DATASET_NAME, split="validation")

Evaluate model
~~~~~~~~~~~~~~


.. code:: ipython3

    # This class is needed due to a current limitation of the Evaluate library with multiclass metrics
    # ref: https://discuss.huggingface.co/t/combining-metrics-for-multiclass-predictions-evaluations/21792/16
    class ConfiguredMetric:
        def __init__(self, metric, *metric_args, **metric_kwargs):
            self.metric = metric
            self.metric_args = metric_args
            self.metric_kwargs = metric_kwargs

        def add(self, *args, **kwargs):
            return self.metric.add(*args, **kwargs)

        def add_batch(self, *args, **kwargs):
            return self.metric.add_batch(*args, **kwargs)

        def compute(self, *args, **kwargs):
            return self.metric.compute(*args, *self.metric_args, **kwargs, **self.metric_kwargs)

        @property
        def name(self):
            return self.metric.name

        def _feature_names(self):
            return self.metric._feature_names()

First, an ``Evaluator`` object for ``text-classification`` and a set of
``EvaluationModule`` are instantiated. Then, the evaluator
``.compute()`` method is called on both the base
``code_classification_pipe`` and the quantized
``quantized_code_classification_pipeline``. Finally, results are
displayed.

.. code:: ipython3

    code_classification_evaluator = evaluate.evaluator("text-classification")
    # instantiate an object that can contain multiple `evaluate` metrics
    metrics = evaluate.combine([
        ConfiguredMetric(evaluate.load('f1'), average='macro'),
    ])

    base_results = code_classification_evaluator.compute(
        model_or_pipeline=code_classification_pipe,
        data=validation_sample,
        input_column="func_code_string",
        label_column="language",
        label_mapping=LABEL_MAPPING,
        metric=metrics,
    )

    quantized_results = code_classification_evaluator.compute(
        model_or_pipeline=quantized_code_classification_pipe,
        data=validation_sample,
        input_column="func_code_string",
        label_column="language",
        label_mapping=LABEL_MAPPING,
        metric=metrics,
    )

    results_df = pd.DataFrame.from_records([base_results, quantized_results], index=["base", "quantized"])
    results_df


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>f1</th>
          <th>total_time_in_seconds</th>
          <th>samples_per_second</th>
          <th>latency_in_seconds</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>base</th>
          <td>1.0</td>
          <td>2.045702</td>
          <td>58.659569</td>
          <td>0.017048</td>
        </tr>
        <tr>
          <th>quantized</th>
          <td>1.0</td>
          <td>2.602893</td>
          <td>46.102553</td>
          <td>0.021691</td>
        </tr>
      </tbody>
    </table>
    </div>


Additional resources
--------------------

- `Grammatical Error Correction
with
OpenVINO <https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/214-grammar-correction/214-grammar-correction.ipynb>`__
- `Quantize a Hugging Face Question-Answering Model with
OpenVINO <https://github.com/huggingface/optimum-intel/blob/main/notebooks/openvino/question_answering_quantization.ipynb>`__\ \*\*

Clean up
--------


Uncomment and run cell below to delete all resources cached locally in
./model

.. code:: ipython3

    # import os
    # import shutil

    # try:
    #     shutil.rmtree(path=QUANTIZED_MODEL_LOCAL_PATH)
    #     shutil.rmtree(path=MODEL_LOCAL_PATH)
    #     os.remove(path="./compressed_graph.dot")
    #     os.remove(path="./original_graph.dot")
    # except FileNotFoundError:
    #     print("Directory was already deleted")