* OpenCV 4.2 or higher built with `Intel® Distribution of OpenVINO™ Toolkit <https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html>`__ (building with `Intel® TBB <https://www.threadingbuildingblocks.org/intel-tbb-tutorial>`__ is a plus)
* The following pre-trained models from the :doc:`Open Model Zoo <omz_models_group_intel>`
We will implement a simple face beautification algorithm using a combination of modern Deep Learning techniques and traditional Computer Vision. The general idea behind the algorithm is to make face skin smoother while keeping face features like the eyes and the mouth contrasty. The algorithm identifies parts of the face using DNN inference, applies different filters to the parts found, and then combines them into the final result using basic image arithmetic:
Generating face element masks from a limited set of features (just 35 per face, covering all its parts) is not trivial and is described in the sections below.
The resulting graph is a mixture of G-API's standard operations, user-defined operations (namespace ``custom::``), and DNN inference. The generic function ``cv::gapi::infer<>()`` allows you to trigger inference within the pipeline; the networks to infer with are specified as template parameters. The sample code uses two versions of ``cv::gapi::infer<>()``:
* An ROI-list oriented one is used to run landmarks inference on a list of faces – this version produces an array of landmarks per face. More on this in "Face Analytics Pipeline" (:ref:`Building a GComputation <gapi_ifd_gcomputation>` section).
Unsharp mask in G-API
+++++++++++++++++++++
The unsharp mask :math:`U` for image :math:`I` is defined as:
.. math::

   U = I - s \cdot L(M(I))
where :math:`M()` is a median filter, :math:`L()` is the Laplace operator, and :math:`s` is a strength coefficient. While G-API doesn't provide this function out-of-the-box, it is expressed naturally with the existing G-API operations:
Note that the code snippet above is a regular C++ function defined with G-API types. Users can write functions like this to simplify graph construction; when called, such a function just adds the relevant nodes to the pipeline it is used in.
The face beautification graph uses custom operations extensively. This chapter focuses on the most interesting kernels; refer to the G-API Kernel API for general information on defining operations and implementing kernels in G-API.
The algorithm infers the locations of face elements (such as the eyes, the mouth, and the head contour itself) using a generic facial landmarks detector from the OpenVINO™ Open Model Zoo. However, the detected landmarks as-is are not enough to generate masks: this operation requires regions of interest on the face represented by closed contours, so some interpolation is applied to obtain them. This landmarks processing and interpolation is performed by the following kernel:
The kernel takes two arrays of denormalized landmark coordinates and returns an array of closed contours for face elements and an array of closed contours for whole faces; in other words, the first output lists the contours of image areas to be sharpened, and the second lists those to be smoothed.
Briefly, this function restores the bottom side of an eye as a half-ellipse based on two points at the left and right eye corners. In fact, ``cv::ellipse2Poly()`` is used to approximate the eye region, and the function only derives the ellipse parameters from those two points:
- The ellipse center and the :math:`X` half-axis, calculated from the two eye points.
- The :math:`Y` half-axis calculated according to the assumption that an average eye width is :math:`1/3` of its length.
- The start and end angles, which are 0 and 180 (refer to the ``cv::ellipse()`` documentation).
The use of ``atan2()`` instead of just ``atan()`` in the function ``custom::getLineInclinationAngleDegrees()`` is essential: its result is signed according to the signs of both ``x`` and ``y``, so we get the correct angle even when the face is upside down (provided the points are passed in the right order, of course).
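The essence of that helper can be sketched in plain C++ (the name and the coordinate-based signature here are illustrative; the sample's helper operates on ``cv::Point`` values):

```cpp
#include <cmath>

// Signed inclination of the (left -> right) line, in degrees.
// std::atan2() keeps the signs of both deltas, so the result covers the
// full (-180, 180] range, unlike atan()'s (-90, 90).
inline int getLineInclinationAngleDegrees(int xLeft, int yLeft, int xRight, int yRight)
{
    const double pi = std::acos(-1.0);
    const double dy = static_cast<double>(yRight - yLeft);
    const double dx = static_cast<double>(xRight - xLeft);
    return static_cast<int>(std::lround(std::atan2(dy, dx) * 180.0 / pi));
}
```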
As we have only jaw points among the detected landmarks, we have to build a half-ellipse from three points of the jaw: the leftmost, the rightmost, and the lowest one. The jaw width is assumed to be equal to the forehead width, and the latter is calculated from the left and right points. As for the :math:`Y` axis, we have no points to derive it directly, so we assume instead that the forehead height is about :math:`2/3` of the jaw height, which can be figured out from the face center (the midpoint between the left and right points) and the lowest jaw point.
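The arithmetic above can be sketched as follows (plain C++ with a hypothetical point struct; the names are illustrative):

```cpp
#include <cmath>

struct Pt { double x, y; };

static double dist(Pt a, Pt b)
{
    return std::hypot(b.x - a.x, b.y - a.y);
}

// Derive the forehead half-ellipse parameters from the three known jaw points.
void getForeheadEllipseParams(Pt jawLeft, Pt jawRight, Pt jawLowest,
                              Pt &center, double &axisX, double &axisY)
{
    // The face "center" is the midpoint between the leftmost and rightmost jaw points.
    center = { (jawLeft.x + jawRight.x) / 2.0, (jawLeft.y + jawRight.y) / 2.0 };
    // The forehead width is assumed equal to the jaw width.
    axisX = dist(jawLeft, jawRight) / 2.0;
    // The forehead height is assumed to be ~2/3 of the jaw height
    // (the distance from the face center to the lowest jaw point).
    axisY = dist(center, jawLowest) * 2.0 / 3.0;
}
```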
Once the graph is fully expressed, we can finally compile it and run on real data. G-API graph compilation is the stage where the G-API framework actually understands which kernels and networks to use. This configuration happens via G-API compilation arguments.
.. code-block:: cpp

   auto faceParams = cv::gapi::ie::Params<custom::FaceDetector>
   {
       /*std::string*/ faceXmlPath,
       /*std::string*/ faceBinPath,
       /*std::string*/ faceDevice
   };
   auto landmParams = cv::gapi::ie::Params<custom::LandmDetector>
   {
       /*std::string*/ landmXmlPath,
       /*std::string*/ landmBinPath,
       /*std::string*/ landmDevice
   };
Every ``cv::gapi::ie::Params<>`` object is associated with the network specified in its template argument. We should pass there the network type we defined with ``G_API_NET()`` at the very beginning of the tutorial.
Network parameters are then wrapped into a ``cv::gapi::GNetPackage``:
.. code-block:: cpp

   auto networks = cv::gapi::networks(faceParams, landmParams);
More details in "Face Analytics Pipeline" (:ref:`Configuring the Pipeline <gapi_ifd_configuration>` section).
In this example we use a lot of custom kernels; in addition, we use the Fluid backend to optimize memory consumption for G-API's standard kernels where applicable. The resulting kernel package is formed like this:
More on this in "Face Analytics Pipeline" (:ref:`Configuring the Pipeline <gapi_ifd_configuration>` section).
Running the streaming pipeline
++++++++++++++++++++++++++++++
In order to run the G-API streaming pipeline, all we need to do is specify the input video source, call ``cv::GStreamingCompiled::start()``, and then fetch the pipeline processing results:
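Put together, the streaming part could look like this. This is a sketch: ``pipeline``, ``kernels``, ``networks``, and ``inputPath`` are assumed to be defined as discussed in the previous sections.

```cpp
// Compile the graph in streaming mode with our kernels and networks...
cv::GStreamingCompiled stream =
    pipeline.compileStreaming(cv::compile_args(kernels, networks));

// ...feed it a video source, start it, and pull the results.
stream.setSource(cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(inputPath));
stream.start();

cv::Mat out;
while (stream.pull(cv::gout(out)))
{
    cv::imshow("Face Beautification", out);
    if (cv::waitKey(1) >= 0) break;  // stop on any key press
}
```

``pull()`` blocks until the next result is ready and returns ``false`` when the stream is over, which makes the fetch loop trivial.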
The tutorial has two goals: to show the use of brand-new G-API features introduced in OpenCV 4.2, and to give a basic understanding of a sample face beautification algorithm.
On the test machine (Intel® Core™ i7-8700) the G-API-optimized video pipeline outperforms its serial (non-pipelined) version by a factor of 2.7 – meaning that for such a non-trivial graph, the proper pipelining can bring almost 3x increase in performance.