Alina Kladieva 29d55ef7f3 Add doc file for Smart CI (#21557 )

2023-12-16 12:49:19 +04:00

Smart CI Overview

Smart CI is a feature that optimizes the pre-commit CI workflow by running only those builds and tests that are actually required to validate the changes in a given PR (Pull Request). For example, if a PR changes only the CPU plugin, GPU plugin tests are skipped in the pre-commit for this PR, since they are unrelated. This decreases execution time for isolated changes in product components and minimizes the load on limited hardware resources.

A product component is a distinct functional unit responsible for a specific product feature. It is defined by the set of files in the repository (e.g. openvino/src/some_folder/**) that contain the feature implementation.

This document describes how Smart CI is implemented in our GitHub Actions pre-commit workflows and how to add or modify rules for it.

Prerequisites

Basic understanding of GitHub Actions workflows

Implementation

Smart CI is implemented as a custom GitHub Action stored in the openvino repository: .github/actions/smart-ci. In GitHub Actions workflows, this action is called as the first step of a separate job:

jobs:
  Smart_CI:
    outputs:
      affected_components: "${{ steps.smart_ci.outputs.affected_components }}"
      skip_workflow: "${{ steps.smart_ci.outputs.skip_workflow }}"
    steps:
      - name: Get affected components
        id: smart_ci
        uses: ./.github/actions/smart-ci
        ...

It takes PR data as input and outputs a list of product components affected by this PR, along with a validation scope for each of these components (either only "build" or both "build" and "test"; by design, we assume that testing a component requires it to be built). Here is an example of such output for a PR that changes only the TensorFlow Frontend component (files inside src/frontends/tensorflow):

changed_component_names: {'TF_FE'}  # TF_FE is an alias we chose for the TensorFlow Frontend component
affected_components={
    "TF_FE": {"test": true, "build": true}, 
    "MO": {"test": true, "build": true}, 
    "CPU": {"build": true}, 
    "Python_API": {"build": true}, 
    ...
}

Once the Smart CI job is finished, validation jobs are started. Based on the output from Smart CI, some jobs can be skipped entirely, while in other jobs only individual steps are skipped. This is done via GitHub Actions conditions. For example, the following job called TensorFlow_Hub_Models_Tests will be executed only if the PR affects the "TF_FE" component and requires running the "test" scope for it:

TensorFlow_Hub_Models_Tests:
  needs: [Build, Smart_CI]
  ...
  if: fromJSON(needs.smart_ci.outputs.affected_components).TF_FE.test
  steps:
    - ...
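
To make the condition above concrete, here is a minimal Python sketch of how such a lookup behaves. It is not the action's code: the helper name `scope_required` is hypothetical, the sample data is modeled on the TF_FE output shown earlier (without the elided entries), and it assumes that a missing component or scope is treated as false, as it is in the workflow expression.

```python
import json

def scope_required(affected_components_json: str, component: str, scope: str) -> bool:
    """Mimic `fromJSON(<output>).<component>.<scope>`; a missing key counts as false."""
    affected = json.loads(affected_components_json)
    return bool(affected.get(component, {}).get(scope, False))

# Sample output modeled on the TF_FE example (elided entries are left out).
smart_ci_output = json.dumps({
    "TF_FE": {"test": True, "build": True},
    "MO": {"test": True, "build": True},
    "CPU": {"build": True},
    "Python_API": {"build": True},
})

run_tf_hub_tests = scope_required(smart_ci_output, "TF_FE", "test")
run_cpu_tests = scope_required(smart_ci_output, "CPU", "test")
run_gpu_build = scope_required(smart_ci_output, "GPU", "build")
print(run_tf_hub_tests, run_cpu_tests, run_gpu_build)  # True False False
```

In this sample, a job guarded by the TF_FE "test" scope would run, while a job guarded by the CPU "test" scope would be skipped, because CPU is only scheduled for "build".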

How product components and the "smart" rules for them are defined is described below.

Configuration of Smart CI rules

Smart CI operates based on a set of rules described in two configuration files stored in the openvino repository.

Product components definition: .github/labeler.yml

This file maps source code paths to the corresponding component names. Essentially, this is a configuration for the actions/labeler GitHub Action, which we use to automatically assign labels to pull requests based on the PR changeset. We reuse it for Smart CI purposes, so that each label described in this configuration is considered a component name, and the labeler action automatically determines which components were changed in each PR. For example:

'category: CPU':
- 'src/plugins/intel_cpu/**/*'
- 'src/common/snippets/**/*'
- 'thirdparty/xbyak/**/*'

If a PR changes at least one file matching any of the minimatch glob patterns above, the label "category: CPU" will be assigned to this PR, and GitHub Actions workflows that use the Smart CI feature will consider the component named "CPU" changed (the "category:" prefix is omitted in the component name).
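
The label-to-component naming rule can be sketched in Python as follows. This is an illustration under assumptions, not the action's actual code: the function name `component_name_from_label` is hypothetical, the pattern "category: (.*)" is the component_pattern value used in the workflow example later in this document, and the space-to-underscore rule is the one given in the "Adding a new component" instruction.

```python
import re
from typing import Optional

def component_name_from_label(label: str, component_pattern: Optional[str] = None) -> Optional[str]:
    """Return the component name for a PR label, or None if the label does not match."""
    name = label
    if component_pattern:
        match = re.fullmatch(component_pattern, label)
        if match is None:
            return None  # not a component label, e.g. "docs"
        name = match.group(1)  # the part captured after the prefix
    return name.replace(" ", "_")

pattern = "category: (.*)"
print(component_name_from_label("category: CPU", pattern))                 # CPU
print(component_name_from_label("category: LP transformations", pattern))  # LP_transformations
print(component_name_from_label("docs", pattern))                          # None
```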

Definition of dependencies between components: .github/components.yml

Some components are not entirely independent, and changes in them may affect other components as well. In this case, in addition to the validation for the changed component itself (build + tests), validation for dependent components is also required (either only build or both build and tests). This file describes these relationships between components, for example:

PyTorch_FE:       # Component name
  revalidate:     # Defines list of components to revalidate (build + test) if the component above was changed
    - MO          # This component depends on PyTorch_FE and requires full revalidation
  build:          # Defines list of components to build if the PyTorch_FE was changed (test runs for them are skipped)
    - CPU         # This component and the component below must be built if PyTorch_FE was changed
    - Python_API

With the example above, the following pipeline will be executed on changes only to PyTorch_FE:

Build for PyTorch_FE
Tests for PyTorch_FE
Build for MO
Tests for MO
Build for CPU
Build for Python_API

Note: the dependencies are not transitive. If component "A" depends on component "B", and component "B" depends on component "C", we don't implicitly assume that "A" depends on "C". Each component must specify all its dependents explicitly.
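
The non-transitive lookup can be sketched in Python as below. This is a simplified model, not the real implementation: the function `affected_by` and the in-memory dictionaries are hypothetical stand-ins for components.yml, and only the PyTorch_FE record comes from the example above.

```python
from typing import Dict, List, Set

Config = Dict[str, Dict[str, List[str]]]

def affected_by(changed: str, components: Config) -> Dict[str, Set[str]]:
    """Scopes required when one component changes; only direct dependents are added."""
    record = components[changed]
    affected: Dict[str, Set[str]] = {changed: {"build", "test"}}
    for name in record.get("revalidate", []):
        affected.setdefault(name, set()).update({"build", "test"})
    for name in record.get("build", []):
        affected.setdefault(name, set()).add("build")
    return affected

# The record from the example above.
pytorch_example: Config = {
    "PyTorch_FE": {"revalidate": ["MO"], "build": ["CPU", "Python_API"]},
}
print(sorted(affected_by("PyTorch_FE", pytorch_example)))
# ['CPU', 'MO', 'PyTorch_FE', 'Python_API']

# "A" depends on "B" and "B" depends on "C": each record lists its direct dependents.
abc: Config = {
    "A": {"revalidate": [], "build": []},
    "B": {"revalidate": ["A"], "build": []},
    "C": {"revalidate": ["B"], "build": []},
}
print(sorted(affected_by("C", abc)))  # ['B', 'C'] ("A" is not pulled in)
```

For "A" to be validated on changes to "C" in this model, "A" would have to be listed directly in the record for "C".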

Specifics of pipeline behavior

If the changed component is not defined in components.yml, we assume that it affects all other components, and the full validation scope will be executed.
If more than one component is affected by a PR, all jobs required to validate all of these components will be executed.
If a PR changes files that are not related to any known component, the full validation scope will be executed, since we don't want to skip anything for unlabeled changes: they are under our control and may introduce regressions. For that, we use a patched version of actions/labeler v4.3.0 with the same functionality plus an additional feature that allows us to detect cases when a PR changes files that do not match any of the patterns in the labeler.yml configuration.
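
The three rules above can be modeled with a short Python sketch. The function names `needs_full_validation` and `merge_scopes` and the sample data are hypothetical; the sketch only illustrates the decision logic as described here and is not taken from the action's source.

```python
from typing import Dict, Iterable, List, Set

Scopes = Dict[str, Set[str]]

def needs_full_validation(changed: Iterable[str], defined: Iterable[str],
                          has_unmatched_files: bool) -> bool:
    """True when a changed component has no record or some files match no pattern."""
    defined = set(defined)
    return has_unmatched_files or any(name not in defined for name in changed)

def merge_scopes(per_component: List[Scopes]) -> Scopes:
    """Union the scopes required by each changed component."""
    merged: Scopes = {}
    for scopes in per_component:
        for name, scope in scopes.items():
            merged.setdefault(name, set()).update(scope)
    return merged

# Illustrative per-component results (hypothetical values).
from_pytorch_fe = {"PyTorch_FE": {"build", "test"}, "MO": {"build", "test"}, "CPU": {"build"}}
from_cpu = {"CPU": {"build", "test"}}

merged = merge_scopes([from_pytorch_fe, from_cpu])
print(sorted(merged["CPU"]))  # ['build', 'test']
print(needs_full_validation(["New_Component"], ["CPU", "MO"], False))  # True
print(needs_full_validation(["CPU"], ["CPU", "MO"], True))             # True
print(needs_full_validation(["CPU"], ["CPU", "MO"], False))            # False
```

Taking the union means a component that is build-only for one change and fully revalidated for another ends up with both scopes, so nothing required by either change is skipped.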

How to contribute

Adding a new component

Add a new record to .github/labeler.yml. The root-level key is a component (label) name, and the value is a set of globs that define which source code paths are related to this component. See the actions/labeler usage documentation to get familiar with the glob syntax.
Add a new record to .github/components.yml. The root-level key is a component name, which is the same as the label name defined in the previous step, but with the "category:" prefix omitted (if any). If the label name contains spaces, replace them with underscores. Example: 'category: LP transformations' in labeler.yml -> LP_transformations in components.yml. To fill in the value, review other components in components.yml and choose the ones that can be affected by changes in the new component. Put those that require full revalidation (build and test) under the revalidate key, and those requiring only build under the build key. Example record:
```
your_component:
  revalidate: 
    - component_1
    - component_2
  build: 
    - component_3
```
If your component does not affect anything else, specify an empty list under both revalidate and build:
```
your_component:
  revalidate: []
  build: []
```
If you wish to explicitly show that a component affects all other components, use the "all" notation as the value under revalidate (this will cause the full pipeline to be executed on changes to your component, which is equivalent to omitting its record from components.yml entirely) or build (this means that changes to your component will cause all other components to be built, but not tested):
```
your_component_1:
  build: 'all'

your_component_2:
  revalidate: 'all'
```
Review other components in components.yml: does your component itself need to be validated when any of the listed components change? If yes, add your component name under the revalidate or build sections of the respective components.
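
As an illustration of the "all" notation from the steps above, the following Python sketch expands a revalidate or build value into a concrete list of dependents. The helper `expand_dependents` is hypothetical, and taking the set of "all other components" from a given list of known names is an assumption of this sketch rather than a statement about the real implementation.

```python
from typing import Iterable, List, Union

def expand_dependents(value: Union[str, List[str], None], owner: str,
                      all_components: Iterable[str]) -> List[str]:
    """Expand a `revalidate`/`build` value; 'all' means every component except the owner."""
    if value == "all":
        return sorted(set(all_components) - {owner})
    return list(value or [])

# Component names taken from the example records above.
known = ["your_component_1", "your_component_2", "component_3"]

print(expand_dependents("all", "your_component_1", known))
# ['component_3', 'your_component_2']
print(expand_dependents(["component_3"], "your_component_2", known))  # ['component_3']
print(expand_dependents([], "component_3", known))                    # []
```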

Adding validation for a component

You may wish to add a new validation job to test your new component, or choose an existing one. For that, go to the desired workflow in .github/workflows (the main ones are linux.yml, windows.yml and mac.yml). If Smart CI is enabled for the pipeline, you will find the Smart_CI job at the beginning of the workflow:

jobs:
  Smart_CI:
    ...
    steps:
      - name: Get affected components
        id: smart_ci
      ...

Alternatively, you can create a separate workflow for testing your component. The following pages will be helpful:

Adding support for Smart CI to a workflow
using-workflows/about-workflows - official GitHub documentation

Once you have a job that validates your component:

Add Smart_CI to the "needs" block of this job. This ensures that the job gets access to the Smart CI outputs:

job_that_validates_your_component:
  needs: Smart_CI  # if another job is already specified here, add Smart_CI to the list like this: [Other_Job_ID, Smart_CI]
  ...

Add an "if" condition that refers to the Smart CI output. To run the whole job conditionally, add it on the same level as the "needs" key:

# The job below will be started if YOUR_COMPONENT was affected and "test" scope is required
job_that_validates_your_component:
  needs: [Build, Smart_CI]
  ...
  if: fromJSON(needs.smart_ci.outputs.affected_components).YOUR_COMPONENT.test # or <...>.build, if needed
  steps:
    - ...

If only an individual step within the job must be executed on changes to your component, add "if" to the desired steps (the syntax is described in the GitHub Actions workflow syntax documentation for step-level "if" conditions):

job_that_validates_your_component:
  needs: [Build, Smart_CI]
  ...
  steps:
    # The step below will be started if YOUR_COMPONENT was affected and "build" scope is required
    - name: step_name
      if: fromJSON(needs.smart_ci.outputs.affected_components).YOUR_COMPONENT.build # or <...>.test, if needed

Note: when adding a Smart CI condition to a step within a job, make sure that the job itself won't be skipped on changes only to your component. For that, look at the "if" condition on the job level: it must either be absent (in this case the job will always be executed) or evaluate to true in all cases where you want your conditional step to be executed.

You can also use any boolean operators to write complex conditions, for example:

  # The below condition will force the job/step to run when either COMPONENT_1 or COMPONENT_2 was changed
  if: fromJSON(needs.smart_ci.outputs.affected_components).COMPONENT_1.test ||
      fromJSON(needs.smart_ci.outputs.affected_components).COMPONENT_2.test

See the learn-github-actions/expressions page to learn more about expressions in conditions.

Adding support for Smart CI to a workflow

To use Smart CI in a workflow, add the following code under the jobs block, before all other jobs that will use it:

jobs:
  Smart_CI:
    runs-on: ubuntu-latest
    outputs:
      affected_components: "${{ steps.smart_ci.outputs.affected_components }}"
      skip_workflow: "${{ steps.smart_ci.outputs.skip_workflow }}"
    steps:
      - name: checkout action
        uses: actions/checkout@v4
        with:
          sparse-checkout: .github/actions/smart-ci

      - name: Get affected components
        id: smart_ci
        uses: ./.github/actions/smart-ci
        with:
          repository: ${{ github.repository }}
          pr: ${{ github.event.number }}
          commit_sha: ${{ github.sha }}
          component_pattern: "category: (.*)"
          repo_token: ${{ secrets.GITHUB_TOKEN }}

If needed, more parameters can be passed to the "Get affected components" step; the full list is available in .github/actions/smart-ci/action.yml.

After that, you can refer to the outputs from Smart_CI in validation jobs, as described in the Adding validation for a component section. To learn more about the syntax of GitHub Actions workflows, see also using-workflows/about-workflows.

Skipping the whole workflow for specific changes

When you want to skip not just a few jobs but the entire workflow, GitHub offers the paths-ignore feature by default. However, it has a limitation: it cannot be used in workflows that have jobs marked as "Required" for merge (see the GitHub documentation on required status checks for details). Since we want to keep our workflows required, a workaround was added on the Smart CI side: it returns an indicator that the workflow can be skipped completely if the PR is labeled only with the given labels and/or changes only files matching the given fnmatch patterns. These labels and patterns are passed as inputs to the Smart CI action in the with block, and the indicator is returned as a separate output called skip_workflow, for example:

  Smart_CI:
    runs-on: ubuntu-latest
    outputs:
      ...
      # The output below is set only if the workflow can be completely skipped, and empty otherwise
      skip_workflow: "${{ steps.smart_ci.outputs.skip_workflow }}"
    steps:
      ...
      - name: Get affected components
        id: smart_ci
        uses: ./.github/actions/smart-ci
        with:
          ...
          # Comma-separated rules for skipping the entire workflow
          skip_when_only_listed_labels_set: 'docs'
          skip_when_only_listed_files_changed: '*.md,*.rst,*.png,*.jpg,*.svg'

The skip_workflow output can then be used to conditionally run a parent job in a workflow (the job that must pass before all other jobs and is specified in the "needs" block of all of them, for example, Build). The condition looks like this:

Build:
    needs: Smart_CI
    ...
    if: "!needs.smart_ci.outputs.skip_workflow"
    ...

Note: if a workflow has more than one parent job, the condition must be added to each of them.

This approach works because GitHub treats skipped checks as successful, so they do not block the merge, unlike required workflows skipped by path filtering.
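
The skip decision can be sketched in Python with the standard fnmatch module. This is a simplified model under assumptions, not the action's code: the function names are hypothetical, and the "and/or" wording above is interpreted here as "either condition alone is sufficient". The rule strings are the comma-separated values from the example above.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

def parse_rules(raw: str) -> List[str]:
    """Split a comma-separated input such as '*.md,*.rst' into a list of rules."""
    return [item.strip() for item in raw.split(",") if item.strip()]

def can_skip_workflow(pr_labels: Iterable[str], changed_files: Iterable[str],
                      skip_labels_raw: str, skip_files_raw: str) -> bool:
    skip_labels = parse_rules(skip_labels_raw)
    skip_patterns = parse_rules(skip_files_raw)
    labels, files = list(pr_labels), list(changed_files)

    # The PR carries at least one label, and every label is in the skip list.
    only_listed_labels = bool(skip_labels) and bool(labels) and all(
        label in skip_labels for label in labels)
    # The PR changes at least one file, and every file matches some skip pattern.
    only_listed_files = bool(skip_patterns) and bool(files) and all(
        any(fnmatchcase(path, pattern) for pattern in skip_patterns) for path in files)
    return only_listed_labels or only_listed_files

labels_rule = "docs"
files_rule = "*.md,*.rst,*.png,*.jpg,*.svg"

print(can_skip_workflow(["docs"], ["docs/index.rst"], labels_rule, files_rule))           # True
print(can_skip_workflow(["category: CPU"], ["README.md", "docs/img/a.png"],
                        labels_rule, files_rule))                                         # True
print(can_skip_workflow(["category: CPU"], ["src/a.cpp", "README.md"],
                        labels_rule, files_rule))                                         # False
```

Note that in fnmatch patterns `*` also matches path separators, so `*.png` matches `docs/img/a.png`.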

Adding Smart CI for components outside openvino repository

Some components (like NVIDIA plugin or ONNX Runtime) are stored in their own repositories and therefore cannot be defined via pattern matching on source code in the openvino repository, but they still need to be validated together with core OpenVINO. To add Smart CI rules for such components, skip the first step of the Adding a new component instruction (modifying the labeler configuration) and go directly to the next steps:

Add a new record to .github/components.yml with empty values for the revalidate and build keys, like this:
```
NEW_EXTERNAL_COMPONENT:
  revalidate: []
  build: []
```
Review other components in components.yml, find those that have to be validated together with the new component, and add the new component's name under the revalidate or build sections of these components.
Add or find a job that performs integration validation of the new external component with OpenVINO, and provide it with an "if" condition: if: fromJSON(needs.smart_ci.outputs.affected_components).NEW_EXTERNAL_COMPONENT, as described in the Adding validation for a component section.

This ensures that integration validation for this external component is started only on changes to the chosen components in the openvino repository.
