Datumaro
@sphinxdirective
.. meta::
   :description: Start working with Datumaro, which offers functionalities for basic data import/export, validation, correction, filtration and transformations.

Datumaro provides basic data import/export (IE) for more than 35 public vision data
formats, along with manipulation functionalities such as validation, correction, filtration,
and some transformations. To support web-scale training, it also aims to merge multiple
heterogeneous datasets through its comparator and merger. Datumaro is integrated into Geti™,
OpenVINO™ Training Extensions, and CVAT to ease data preparation. Datumaro is open source and
available on `GitHub <https://github.com/openvinotoolkit/datumaro>`__.

Refer to the `official documentation <https://openvinotoolkit.github.io/datumaro/stable/docs/get-started/introduction.html>`__ to learn more.

In addition, the `Jupyter notebooks <https://github.com/openvinotoolkit/datumaro/tree/develop/notebooks>`__ show Datumaro used in practice.

Detailed Workflow
#################

.. image:: ./_static/images/datumaro.png

1. To start working with Datumaro, download public datasets or prepare your own annotated dataset.

   .. note::
      Datumaro provides a CLI command, ``datum download``, for downloading `TensorFlow Datasets <https://www.tensorflow.org/datasets>`__.

2. Import data into Datumaro and manipulate the dataset to improve data quality using ``Validator``, ``Corrector``, and ``Filter``.

3. Compare two datasets and transform the label schemas (category information) before merging them.

4. Merge two datasets into a large-scale dataset.

   .. note::
      Several mergers are available, namely ``ExactMerger``, ``IntersectMerger``, and ``UnionMerger``.

5. Split the unified dataset into subsets, e.g., ``train``, ``valid``, and ``test``, through ``Splitter``.

   .. note::
      Data can be split by a given ratio of subsets according to the number of samples or the number of annotations. See ``SplitTask`` for the task-specific split.

6. Export the cleaned and unified dataset for follow-up workflows such as model training. Go to :doc:`OpenVINO™ Training Extensions <ote_documentation>`.

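
The idea behind transforming label schemas before a merge can be illustrated in plain Python. The sketch below is only a conceptual illustration and does not use Datumaro or reproduce any of its mergers: the toy data model (a label list plus ``(item_id, label_index)`` pairs) and the helper names ``unify_labels`` and ``remap`` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy representation: a dataset is a label list plus
# (item_id, label_index) annotations. This is not Datumaro's data model.
Annotation = Tuple[str, int]


def unify_labels(labels_a: List[str], labels_b: List[str]) -> List[str]:
    """Return the union of two label lists, keeping first-seen order."""
    unified: List[str] = []
    for name in labels_a + labels_b:
        if name not in unified:
            unified.append(name)
    return unified


def remap(annotations: List[Annotation], labels: List[str],
          unified: List[str]) -> List[Annotation]:
    """Rewrite label indices so they point into the unified label list."""
    index_of: Dict[str, int] = {name: i for i, name in enumerate(unified)}
    return [(item_id, index_of[labels[idx]]) for item_id, idx in annotations]


# Two datasets whose label lists overlap but use different indices.
labels_a = ["cat", "dog"]
labels_b = ["dog", "bird"]
anns_a = [("a1", 0), ("a2", 1)]
anns_b = [("b1", 0), ("b2", 1)]

unified = unify_labels(labels_a, labels_b)
merged = remap(anns_a, labels_a, unified) + remap(anns_b, labels_b, unified)
```

Here index ``0`` means ``cat`` in the first dataset and ``dog`` in the second, so concatenating the annotations without remapping would silently mislabel items. Aligning the schemas first avoids that.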
If the results are unsatisfactory, add datasets and perform the same steps, starting with dataset annotation.
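
A ratio-based split such as the one described in the workflow can be sketched in plain Python. This is a minimal illustration that splits by number of samples only. It is not Datumaro's ``Splitter``, and the function name ``split_by_ratio`` and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_ratio(item_ids: Sequence[str],
                   ratios: Sequence[Tuple[str, float]],
                   seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle item ids and cut them into named subsets by ratio."""
    total = sum(ratio for _, ratio in ratios)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("subset ratios must sum to 1")

    ids = list(item_ids)
    random.Random(seed).shuffle(ids)  # local RNG keeps the split reproducible

    subsets: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0.0
    for i, (name, ratio) in enumerate(ratios):
        cumulative += ratio
        # The last subset takes the remainder so rounding never drops an item.
        is_last = i == len(ratios) - 1
        end = len(ids) if is_last else round(cumulative * len(ids))
        subsets[name] = ids[start:end]
        start = end
    return subsets


items = [f"img_{i:03d}" for i in range(10)]
subsets = split_by_ratio(items, [("train", 0.6), ("valid", 0.2), ("test", 0.2)])
```

Using a fixed seed makes the split repeatable, which matters when the workflow is rerun after adding datasets. A task-specific split would also need to balance labels or annotations across subsets, which this sketch does not attempt.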

Datumaro Components
###################

* `Datumaro CLIs <https://openvinotoolkit.github.io/datumaro/stable/docs/command-reference/overview.html>`__
* `Datumaro APIs <https://openvinotoolkit.github.io/datumaro/stable/docs/reference/datumaro_module.html>`__
* `Datumaro data format <https://openvinotoolkit.github.io/datumaro/stable/docs/data-formats/datumaro_format.html>`__
* `Supported data formats <https://openvinotoolkit.github.io/datumaro/stable/docs/data-formats/formats/index.html>`__

Tutorials
#########

* `Basic skills <https://openvinotoolkit.github.io/datumaro/stable/docs/level-up/basic_skills/index.html>`__
* `Intermediate skills <https://openvinotoolkit.github.io/datumaro/stable/docs/level-up/intermediate_skills/index.html>`__
* `Advanced skills <https://openvinotoolkit.github.io/datumaro/stable/docs/level-up/advanced_skills/index.html>`__

Python Hands-on Examples
########################

* `Data IE <https://openvinotoolkit.github.io/datumaro/stable/docs/jupyter_notebook_examples/dataset_IO.html>`__
* `Data manipulation <https://openvinotoolkit.github.io/datumaro/stable/docs/jupyter_notebook_examples/manipulate.html>`__
* `Data exploration <https://openvinotoolkit.github.io/datumaro/stable/docs/jupyter_notebook_examples/explore.html>`__
* `Data refinement <https://openvinotoolkit.github.io/datumaro/stable/docs/jupyter_notebook_examples/refine.html>`__
* `Data transformation <https://openvinotoolkit.github.io/datumaro/stable/docs/jupyter_notebook_examples/transform.html>`__
* `Deep learning end-to-end use-cases <https://openvinotoolkit.github.io/datumaro/stable/docs/jupyter_notebook_examples/e2e_example.html>`__

@endsphinxdirective