[Snippets] Moved infrastructure to Linear Intermediate Representation (#16402)

Alexandra Sidorova 2023-05-19 17:16:36 +04:00 committed by GitHub
parent 41de4ba638
commit 9fafcabb7c
247 changed files with 7714 additions and 5925 deletions


@@ -5,7 +5,7 @@ This document describes the design and rationale for a snippets code generator.
Core **CNN operators (convolution, gemm, fully connected) are limited by compute, the rest is memory bound**. Math approximations (like transcendental functions) are rare in emerging workloads and can be treated with the same machinery. **Snippets are designed to optimize a topology for memory**, while leaving compute-intensive kernels to backend developers.
The **potential speedup is proportional to the shrink in memory-walked bytes**. The problem can therefore be reduced to optimizing memory walks, regardless of which pattern a snippet matches and which operations it contains. The number of memory walks should be less than or equal to that of handcrafted optimizations. This guarantees performance improvements over the previous approach (excluding corner cases caused by cache effects). *The shrinkage factor might be encoded into a cost function in a future evolution of the code generator*. The snippets generator provides diagnostics to estimate this shrinkage factor via the `ov::snippets::op::Subgraph::print_statistics(bool verbose)` member.
The SnippetS generator is designed for back-end developers. The main purpose of inventing the snippets code generator is the decomposition of **operator fusion**, **register allocation** and **target kernel generation**. This allows modifications (like new fusion support) and feature extensions (like new operation support) to be done in a single point and avoids a combinatorial explosion across fusions/types/architectures.
@@ -28,7 +28,7 @@ Code generation is split into 2 phases, **tokenization** and **lowering**.
### Tokenization
Tokenization runs on a full topology nGraph function inside a specific plugin at the common-transformations stage. The input of tokenization is a topology graph. The output is a modified topology graph with `ov::snippets::op::Subgraph` operations installed. Each subgraph contains an nGraph function (called **body**) which holds a part of the original topology legal for snippet generation (it can be scheduled with a single schedule).
A procedure of finding subgraphs suitable for code generation is called **tokenization**. During tokenization the topology tree is split into subgraphs using the same greedy approach that is used for parsing an input stream of characters into tokens. It may also be seen as, and reformulated into, a basic block construction problem, since there is a leader and potentially terminators. See the example of implementation [here](https://github.com/openvinotoolkit/openvino/blob/master/src/common/snippets/src/pass/collapse_subgraph.cpp).
@@ -94,7 +94,7 @@ The goal of this step is to apply target-independent and schedule-related optimi
All input and output shapes are normalized to 6D for future schedule generation. If shape propagation fails or leads to inconsistent output shapes, an exception is raised.
The layout assigned by user code and passed to the `generate` function is propagated through a subgraph at this step as well. The layout is passed to the `generate` function as a `BlockedShapeVector`, which is a `std::vector<BlockedShape>`, while `BlockedShape` is `std::tuple<ov::Shape, ov::AxisVector, ov::element::Type>`. For example, if a backend supports the `NCHW16c` layout and a tensor has a size of `<1, 42, 17, 31>` and holds single-precision floating point, this structure should be `std::make_tuple(ov::Shape {1, 3, 17, 31, 16}, ov::AxisVector {0, 1, 2, 3, 1}, ov::element::f32);`. This allows a generic layout representation.
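The blocked-layout example above can be sketched as compilable code. This is a minimal illustration with stand-in aliases mirroring `ov::Shape`, `ov::AxisVector`, and `ov::element::Type` (so it builds without the OpenVINO headers), not the actual API:

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Stand-in aliases for ov::Shape / ov::AxisVector / ov::element::Type.
using Shape = std::vector<std::size_t>;
using AxisVector = std::vector<std::size_t>;
enum class ElementType { f32 };
using BlockedShape = std::tuple<Shape, AxisVector, ElementType>;
using BlockedShapeVector = std::vector<BlockedShape>;

// A planar <1, 42, 17, 31> f32 tensor reblocked to NCHW16c:
// 42 channels -> 3 blocks of 16 (rounded up), and axis 1 appears twice in the
// axis vector: once for the channel-block dimension, once for the inner
// 16-lane block.
BlockedShape make_nchw16c_example() {
    return std::make_tuple(Shape{1, 3, 17, 31, 16},
                           AxisVector{0, 1, 2, 3, 1},
                           ElementType::f32);
}
```

Note how the repeated axis index in the `AxisVector` is what encodes the channel blocking, which is why this tuple can represent arbitrary blocked layouts.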
##### Dialect conversion
@@ -191,17 +191,17 @@ Broadcast and regular streaming vector load is possible from the same pointer. B
#### Target-specific optimizations
Target developers can plug specific optimizations into the code generation pipeline by passing an `ov::pass::Manager` into the `generate` function of a `subgraph`. **Passes are executed on a subgraph in canonical form converted to a snippet dialect**.
*It might also be extended to provide an interface for target-independent optimizations in the future*
#### Register allocation
A canonicalized subgraph in a snippets dialect forms a basic block or region inside a snippet (kernel). Registers are allocated globally for the whole subgraph. Since all operations for a subgraph are assumed to be vector, only vector registers are allocated for the first generation of SnippetS. The linear scan register allocation algorithm is used. The register allocator is implemented as the `ov::snippets::pass::AssignRegisters` function pass and stores the allocated registers for each node in `rt_info`. `rt_info` for a node holds a register for the node's output. *However, this part should be refactored, either to become target independent or to use a target-specific abstraction to acquire a new register*
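For reference, linear scan allocation itself can be sketched in a few lines. This is a generic, self-contained illustration over a fixed pool of vector registers (stand-in types, not the actual `AssignRegisters` pass; a real allocator would spill when registers run out):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A live interval for one expression's output value: [start, end] positions
// in the topological order of the subgraph.
struct Interval { std::string value; std::size_t start, end; };

// Assigns each value one of num_regs registers, reusing a register once the
// value held in it is no longer live. Intervals must be sorted by start.
// Returns value -> register index; values are skipped instead of spilled.
std::map<std::string, std::size_t> linear_scan(std::vector<Interval> intervals,
                                               std::size_t num_regs) {
    std::map<std::string, std::size_t> assignment;
    std::vector<std::pair<std::size_t, std::size_t>> active;  // (end, reg)
    std::vector<std::size_t> free_regs;
    for (std::size_t r = num_regs; r-- > 0;) free_regs.push_back(r);

    for (const auto& iv : intervals) {
        // Expire intervals that ended before this one starts.
        for (auto it = active.begin(); it != active.end();) {
            if (it->first < iv.start) {
                free_regs.push_back(it->second);
                it = active.erase(it);
            } else {
                ++it;
            }
        }
        if (free_regs.empty()) continue;  // a real allocator spills here
        std::size_t reg = free_regs.back();
        free_regs.pop_back();
        active.emplace_back(iv.end, reg);
        assignment[iv.value] = reg;
    }
    return assignment;
}
```

With two registers and intervals `a[0,2]`, `b[1,3]`, `c[3,4]`, value `c` reuses the register freed when `a` expires, which is the property the pass relies on to fit a whole subgraph into the vector register file.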
#### Schedule generation
The goal of this step is to transform subgraphs in a scalar notation into kernel functions callable from user code. The `Kernel` and `Tile` operations are introduced for this purpose. Each of these operations has a constructor from a code region described as a collection of operation and operand pairs: `Kernel(const std::vector<std::pair<std::shared_ptr<ov::snippets::Emitter>, ov::snippets::RegInfo>>& region);`.
The example above can be used for the following hierarchical IR. If the scope is limited to layout-oblivious operations with broadcasting support, a `Tile` can be generated as a single loop over the fastest-varying dimension. The second `Tile` is generated to handle tails and can be omitted if not needed. A special pass replaces vector memory operations with scalar versions for the tail subgraph.
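In this scheme a kernel region is just an ordered list of (emitter, register info) pairs. A minimal self-contained sketch with stand-in types (the real constructor uses `ov::snippets::Emitter` and `RegInfo` from the snippets headers):

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-ins for ov::snippets::Emitter and RegInfo (input/output register
// index lists), so the sketch is self-contained.
struct Emitter { virtual ~Emitter() = default; };
using RegInfo = std::pair<std::vector<std::size_t>, std::vector<std::size_t>>;
using AllocatedEmitter = std::pair<std::shared_ptr<Emitter>, RegInfo>;

// A Kernel (and likewise a Tile) is constructed from a region: an ordered
// collection of (emitter, register info) pairs emitted in sequence.
struct Kernel {
    explicit Kernel(std::vector<AllocatedEmitter> region) : region(std::move(region)) {}
    std::vector<AllocatedEmitter> region;
};
```

Because a `Tile` is built from the same pair list, the tail `Tile` can be produced by rebuilding the loop-body subset of the region with scalar memory emitters substituted in.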
@@ -253,7 +253,7 @@ Where
Target code emission is table based. A target is responsible for filling the `jitters` table field in the `Generator` class.
```
std::map<const ov::DiscreteTypeInfo, std::function<std::shared_ptr<Emitter>(std::shared_ptr<ov::Node>)>> jitters;
```
##### Interface with a target
@@ -279,7 +279,7 @@ Once a schedule is generated, a target code is emitted from a kernel in `Generat
A target can potentially extend the snippets dialect with a target-specific operation for code emission. It should implement:
* an nGraph operation (for example, `class FMA : public ov::op::Op`)
* an emitter for the operation (for example, `class FmaEmitter : public Emitter`)
* registration of the pair in the `jitters` map
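The registration step above can be sketched as follows. This is a self-contained illustration with stand-ins for `ov::DiscreteTypeInfo`, `ov::Node`, and the snippets `Emitter`; it shows only the table-registration pattern, not the actual OpenVINO API:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Stand-in for ov::DiscreteTypeInfo: comparable key identifying an op type.
struct TypeInfo {
    const char* name;
    bool operator<(const TypeInfo& other) const { return std::string(name) < other.name; }
};
struct Node { virtual ~Node() = default; };
struct FMA : Node { static TypeInfo type; };
TypeInfo FMA::type{"FMA"};

struct Emitter { virtual ~Emitter() = default; };
struct FmaEmitter : Emitter {
    explicit FmaEmitter(const std::shared_ptr<Node>&) {}
};

// The jitters table maps an operation's type info to a factory that creates
// the matching emitter for a concrete node.
std::map<TypeInfo, std::function<std::shared_ptr<Emitter>(std::shared_ptr<Node>)>> jitters;

void register_fma() {
    jitters[FMA::type] = [](std::shared_ptr<Node> n) -> std::shared_ptr<Emitter> {
        return std::make_shared<FmaEmitter>(n);
    };
}
```

At emission time the generator looks up the node's type info in the table and invokes the factory, so adding an operation never touches the generator core.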


@@ -6,9 +6,10 @@
#include <vector>
#include <cstdint>

#include "openvino/core/node.hpp"

namespace ov {
namespace snippets {

using code = const uint8_t *;
@@ -24,11 +25,9 @@ public:
/**
 * @brief Default constructor
 */
Emitter(const std::shared_ptr<ov::Node>& n) {}

Emitter(std::vector<std::pair<std::shared_ptr<Emitter>, RegInfo>>& region) {}

/**
 * @brief called by the generator to produce target code for a specific operation
@@ -47,12 +46,12 @@ public:
/**
 * @brief called by the generator to emit a data section, if needed for a specific operation
 * @return void
 */
virtual void emit_data() const {}

virtual ~Emitter() = default;
};

using AllocatedEmitter = std::pair<std::shared_ptr<Emitter>, ov::snippets::RegInfo>;

} // namespace snippets
} // namespace ov


@@ -9,74 +9,13 @@
#pragma once

#include "snippets_isa.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/pass/pass.hpp"

namespace ov {
namespace snippets {
/**
 * @interface Schedule
 * @brief Return scheduling information and pointer to generated kernel code
@@ -117,7 +56,7 @@ public:
/**
 * @brief Default constructor
 */
Generator(const std::shared_ptr<TargetMachine>& t) : target(t), lowered_saved{} {}
/**
 * @brief Default destructor
 */
@@ -126,19 +65,6 @@ public:
 * @interface GeneratorConfig
 * @brief Allows to tweak the lowering process.
 */
/**
 * @brief virtual method any specific implementation should implement
 * @param m model in canonical form for table-based code generation
@@ -146,7 +72,11 @@ public:
 * @param compile_params parameters for generated code
 * @return pointer to generated code
 */
struct LoweringResult {
LoweringResult(code c) : binary_code(c) {}
code binary_code = nullptr;
};
LoweringResult generate(lowered::LinearIR& linear_ir, const lowered::Config& config, const void* compile_params = nullptr);
/**
 * @brief gets target machine
@@ -180,8 +110,8 @@ protected:
std::shared_ptr<TargetMachine> target;
// todo: we need to save lowered code to access compiled brgemm kernels at execution time (normally lowered is destructed by then).
// This is a temporary solution; remove it when kernel caching is implemented. Don't forget to make generate a const method.
lowered::LinearIR lowered_saved;
};

} // namespace snippets
} // namespace ov


@@ -11,15 +11,15 @@
#include <openvino/cc/ngraph/itt.hpp>

namespace ov {
namespace pass {
namespace itt {
namespace domains {
OV_ITT_DOMAIN(SnippetsTransform);
} // namespace domains
} // namespace itt
} // namespace pass
} // namespace ov

OV_CC_DOMAINS(internal_op);


@@ -0,0 +1,99 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <openvino/core/node.hpp>
#include <openvino/opsets/opset1.hpp>
#include "snippets/emitter.hpp"
#include "snippets/target_machine.hpp"
#include "snippets/lowered/port_connector.hpp"
#include "snippets/lowered/expression_port.hpp"
namespace ov {
namespace snippets {
namespace lowered {
class LinearIR;
class Expression : public std::enable_shared_from_this<Expression> {
friend class LinearIR;
friend class ExpressionPort;
public:
static size_t LOOP_NULL_ID;
Expression() = default;
virtual ~Expression() = default;
std::shared_ptr<Node> get_node() const;
std::shared_ptr<Emitter> get_emitter() const;
RegInfo get_reg_info() const;
void set_reg_info(RegInfo rinfo);
const PortConnectorPtr& get_input_port_connector(size_t i) const;
const PortConnectorPtr& get_output_port_connector(size_t i) const;
std::vector<PortConnectorPtr> get_input_port_connectors() const { return m_input_port_connectors; }
std::vector<PortConnectorPtr> get_output_port_connectors() const { return m_output_port_connectors; }
const PortDescriptorPtr& get_input_port_descriptor(size_t i) const;
const PortDescriptorPtr& get_output_port_descriptor(size_t i) const;
std::vector<PortDescriptorPtr> get_input_port_descriptors() const { return m_input_port_descriptors; }
std::vector<PortDescriptorPtr> get_output_port_descriptors() const { return m_output_port_descriptors; }
size_t get_input_count() const { return m_input_port_connectors.size(); }
size_t get_output_count() const { return m_output_port_connectors.size(); }
std::vector<size_t> get_loop_ids() const { return m_loop_ids; }
void set_loop_ids(const std::vector<size_t>& loops) { m_loop_ids = loops; }
void set_loop_id(size_t id, size_t idx);
void remove_loop_id(size_t id);
void validate() const;
void init_emitter(const std::shared_ptr<const TargetMachine>& target);
ExpressionPort get_input_port(size_t i);
ExpressionPort get_output_port(size_t i);
protected:
// Note: The constructor is protected, since an expression can be created only by a Linear IR.
// It must be used only by the Linear IR builder of expressions!
explicit Expression(const std::shared_ptr<Node>& n);
void replace_input(size_t port, PortConnectorPtr to);
std::shared_ptr<Node> m_source_node{nullptr};
std::shared_ptr<Emitter> m_emitter{nullptr};
std::vector<PortConnectorPtr> m_input_port_connectors{};
std::vector<PortConnectorPtr> m_output_port_connectors{};
std::vector<PortDescriptorPtr> m_input_port_descriptors{};
std::vector<PortDescriptorPtr> m_output_port_descriptors{};
// The order Loops identifies: Outer ---> Inner
std::vector<size_t> m_loop_ids;
};
using ExpressionPtr = std::shared_ptr<Expression>;
class IOExpression : public Expression {
friend class LinearIR;
public:
enum class io_type {INPUT, OUTPUT, UNDEFINED};
int64_t get_index() const { return m_index; }
io_type get_type() const { return m_type; }
private:
explicit IOExpression(const std::shared_ptr<ov::opset1::Parameter>& n, int64_t index);
explicit IOExpression(const std::shared_ptr<ov::opset1::Result>& n, int64_t index);
int64_t m_index = -1;
io_type m_type = io_type::UNDEFINED;
};
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,55 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "linear_ir.hpp"
#include "snippets/snippets_isa.hpp"
namespace ov {
namespace snippets {
namespace lowered {
class LinearIR::ExpressionFactory {
public:
template<class... Args>
static ExpressionPtr build(const std::shared_ptr<Node>& n, Args&&... params) {
if (const auto par = ov::as_type_ptr<ov::op::v0::Parameter>(n)) {
return create(par, params...);
} else if (const auto res = ov::as_type_ptr<ov::op::v0::Result>(n)) {
return create(res, params...);
} else if (const auto loop_begin = ov::as_type_ptr<op::LoopBegin>(n)) {
return create(loop_begin, params...);
} else if (const auto loop_end = ov::as_type_ptr<op::LoopEnd>(n)) {
return create(loop_end, params...);
}
return create(n, params...);
}
private:
/* -- Default Builders - initialize input port connectors from parents and create new output port connectors themselves */
static ExpressionPtr create(const std::shared_ptr<ov::op::v0::Parameter>& par, const LinearIR& linear_ir,
const std::shared_ptr<ov::Model>& model);
static ExpressionPtr create(const std::shared_ptr<ov::op::v0::Result>& res, const LinearIR& linear_ir,
const std::shared_ptr<ov::Model>& model);
static ExpressionPtr create(const std::shared_ptr<ov::Node>& n, const LinearIR& linear_ir,
const std::shared_ptr<ov::Model>& model);
/* -- Input Builders - get input port connectors from method parameters and create new output port connectors themselves */
static ExpressionPtr create(const std::shared_ptr<op::LoopBegin>& n, const std::vector<PortConnectorPtr>& inputs);
static ExpressionPtr create(const std::shared_ptr<op::LoopEnd>& n, const std::vector<PortConnectorPtr>& inputs);
static ExpressionPtr create(const std::shared_ptr<ov::Node>& n, const std::vector<PortConnectorPtr>& inputs);
// Creates inputs for expression using parent output port connectors
static void create_expression_inputs(const LinearIR& linear_ir, const ExpressionPtr& expr);
// Creates new output port connectors
static void create_expression_outputs(const ExpressionPtr& expr);
// The method verifies the input port connectors: it checks that the expression is registered as their consumer and adds it if missing
static void init_expression_inputs(const ExpressionPtr& expr, const std::vector<PortConnectorPtr>& inputs);
};
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,51 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <vector>
#include "port_descriptor.hpp"
namespace ov {
namespace snippets {
namespace lowered {
class PortConnector;
class Expression;
class ExpressionPort {
public:
enum Type {
Input,
Output
};
ExpressionPort() = default;
explicit ExpressionPort(const std::shared_ptr<Expression>& expr, Type type, size_t port);
const std::shared_ptr<Expression>& get_expr() const { return m_expr; }
Type get_type() const { return m_type; }
size_t get_index() const { return m_port_index; }
const PortDescriptorPtr& get_descriptor_ptr() const;
const std::shared_ptr<PortConnector>& get_port_connector_ptr() const;
// Returns connected ports to the current:
// - Input port returns one source (parent) port
// - Output port returns all consumer ports (children)
std::set<ExpressionPort> get_connected_ports() const;
friend bool operator==(const ExpressionPort& lhs, const ExpressionPort& rhs);
friend bool operator!=(const ExpressionPort& lhs, const ExpressionPort& rhs);
friend bool operator<(const ExpressionPort& lhs, const ExpressionPort& rhs);
private:
std::shared_ptr<Expression> m_expr;
Type m_type = Type::Output;
size_t m_port_index = 0;
};
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,112 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <list>
#include "expression.hpp"
namespace ov {
namespace snippets {
namespace lowered {
class Config {
public:
// True if the lowered Emitters need to be accessed during runtime. Normally they're destroyed after code emission.
bool m_save_expressions = false;
// True if we should check runtime info for nodes to call specific needed transformations
bool m_need_fill_tail_register = false;
size_t m_loop_depth = 1;
};
/* The control flow of Snippets is built on Linear Intermediate Representation (Linear IR).
* The class diagram is described in the documentation `snippets/docs/snippets_design_guide.md`.
*/
class LinearIR {
class ExpressionFactory;
public:
using container = std::list<ExpressionPtr>;
using io_container = std::list<std::shared_ptr<IOExpression>>;
using exprIt = container::iterator;
using constExprIt = container::const_iterator;
LinearIR() = default;
explicit LinearIR(const std::shared_ptr<ov::Model>& m, Config config = {});
ExpressionPtr create_expression(const std::shared_ptr<Node>& n, const std::vector<PortConnectorPtr>& inputs);
static LinearIR::container deep_copy_range(LinearIR::container::const_iterator begin, LinearIR::container::const_iterator end);
const container& get_ops() const {return m_expressions; }
const io_container& get_IO_ops() const {return m_io_expressions; }
Config get_config() {return m_config; }
const ExpressionPtr& get_expr_by_node(const std::shared_ptr<Node>& n) const;
void replace_input(const std::set<ExpressionPort>& consumers, const PortConnectorPtr& to);
void replace_input(const ExpressionPort& expr_port, const PortConnectorPtr& to);
/**
* @brief Move an expression from the position "from" to the position immediately before "to".
* Note: this method does NOT take care of data dependencies, performs no relevant checks,
* and does not touch internal maps.
*/
void move(constExprIt from, constExprIt to);
bool empty() const noexcept {return m_expressions.empty(); }
void debug_print(bool tds_as_pointers = false) const;
container::reference back() noexcept {return m_expressions.back();}
container::const_reference back() const noexcept {return m_expressions.back();}
container::reference front() noexcept {return m_expressions.front();}
container::const_reference front() const noexcept {return m_expressions.front();}
exprIt begin() noexcept {return m_expressions.begin();}
exprIt end() noexcept {return m_expressions.end();}
constExprIt begin() const noexcept {return cbegin();}
constExprIt end() const noexcept {return cend();}
constExprIt cbegin() const noexcept {return m_expressions.cbegin();}
constExprIt cend() const noexcept {return m_expressions.cend();}
container::reverse_iterator rbegin() noexcept {return m_expressions.rbegin();}
container::reverse_iterator rend() noexcept {return m_expressions.rend();}
container::const_reverse_iterator crbegin() const noexcept {return m_expressions.crbegin();}
container::const_reverse_iterator crend() const noexcept {return m_expressions.crend();}
exprIt insert(constExprIt pos, const ov::NodeVector& nodes);
exprIt insert(constExprIt pos, const std::shared_ptr<Node>& n);
exprIt insert(constExprIt pos, container::value_type&& value);
exprIt insert(constExprIt pos, const container::value_type& value);
exprIt insert(constExprIt pos, exprIt begin, exprIt end);
exprIt insert(constExprIt pos, constExprIt begin, constExprIt end);
exprIt erase(exprIt pos);
exprIt erase(constExprIt pos);
void init_emitters(const std::shared_ptr<TargetMachine>& target);
void serialize(const std::string& xml, const std::string& bin);
class LoopManager;
using LoopManagerPtr = std::shared_ptr<LoopManager>;
const LoopManagerPtr& get_loop_manager() const { return m_loop_manager; }
private:
static ov::NodeVector get_ordered_ops(const std::shared_ptr<ov::Model>& model);
// Default ctor - can be called only from Linear IR initialization as default way
ExpressionPtr create_expression(const std::shared_ptr<Node>& n, const std::shared_ptr<ov::Model>& model = nullptr);
void register_expression(const ExpressionPtr& expr, bool io_allowed = false);
void unregister_expression(const ExpressionPtr& expr);
container m_expressions{};
std::unordered_map<std::shared_ptr<Node>, std::shared_ptr<Expression>> m_node2expression_map;
io_container m_io_expressions;
Config m_config{};
LoopManagerPtr m_loop_manager = nullptr;
};
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,83 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "linear_ir.hpp"
#include <openvino/core/node.hpp>
#include <openvino/opsets/opset1.hpp>
#include "port_descriptor.hpp"
namespace ov {
namespace snippets {
namespace lowered {
class LinearIR::LoopManager {
public:
LoopManager() = default;
class LoopInfo {
public:
LoopInfo() = default;
LoopInfo(size_t work_amount, size_t increment,
const std::vector<ExpressionPort>& entries,
const std::vector<ExpressionPort>& exits)
: work_amount(work_amount), increment(increment), entry_exprs(entries), exit_exprs(exits) {}
size_t work_amount = 0;
size_t increment = 0;
// The order of entry and exit expressions is important:
// - The position before first entry expr is Loop Begin position
// - The position after last exit expr is Loop End position
// Note: Scalars aren't entry expressions but can be before first entry expr in Linear IR
std::vector<ExpressionPort> entry_exprs = {};
std::vector<ExpressionPort> exit_exprs = {};
};
using LoopInfoPtr = std::shared_ptr<LoopInfo>;
size_t add_loop_info(const LoopInfoPtr& loop);
void remove_loop_info(size_t index);
LoopInfoPtr get_loop_info(size_t index) const;
size_t get_loop_count() const { return m_map.size(); }
const std::map<size_t, LoopInfoPtr>& get_map() const;
void mark_loop(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
size_t loop_depth, size_t vector_size);
void mark_loop(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
size_t idx,
size_t work_amount,
size_t work_amount_increment,
const std::vector<ExpressionPort>& entries,
const std::vector<ExpressionPort>& exits);
void get_loop_bounds(const LinearIR& linear_ir,
size_t loop_id,
LinearIR::constExprIt& loop_begin_pos,
LinearIR::constExprIt& loop_end_pos) const;
static void get_loop_bounds(const LinearIR& linear_ir,
const std::vector<ExpressionPort>& entries,
const std::vector<ExpressionPort>& exits,
LinearIR::constExprIt& loop_begin_pos,
LinearIR::constExprIt& loop_end_pos,
size_t loop_id = Expression::LOOP_NULL_ID);
private:
static void exprs_marking(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
size_t loop_id, size_t idx);
static void get_io_loop_ports(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
std::vector<ExpressionPort>& entries,
std::vector<ExpressionPort>& exits);
std::map<size_t, LoopInfoPtr> m_map = {};
size_t next_id = 0;
};
} // namespace lowered
} // namespace snippets
} // namespace ov
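The LoopManager above is essentially a registry mapping monotonically growing loop IDs to LoopInfo records. A simplified, self-contained analogue of that bookkeeping (types are stand-ins, not the real snippets classes):

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>

// Simplified stand-in for LinearIR::LoopManager::LoopInfo.
struct LoopInfo {
    std::size_t work_amount = 0;
    std::size_t increment = 0;
};
using LoopInfoPtr = std::shared_ptr<LoopInfo>;

// Sketch of the add/get/remove bookkeeping: each registered loop gets the
// next free ID, and lookups of unknown IDs are an error.
class SimpleLoopManager {
public:
    std::size_t add_loop_info(const LoopInfoPtr& loop) {
        m_map[next_id] = loop;
        return next_id++;
    }
    void remove_loop_info(std::size_t index) { m_map.erase(index); }
    LoopInfoPtr get_loop_info(std::size_t index) const {
        auto it = m_map.find(index);
        if (it == m_map.end())
            throw std::out_of_range("unknown loop id");
        return it->second;
    }
    std::size_t get_loop_count() const { return m_map.size(); }
private:
    std::map<std::size_t, LoopInfoPtr> m_map;
    std::size_t next_id = 0;
};
```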


@ -0,0 +1,42 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
#include "snippets/snippets_isa.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface AllocateBuffers
* @brief The pass calculates the common size of the buffer scratchpad and propagates Buffer offsets to the connected MemoryAccess operations.
* Notes:
* - The pass implicitly enables in-place processing for some Buffers when possible:
*   it doesn't allocate new memory for in-place Buffers and propagates the same offsets to them.
* - The pass should be split into two passes: ProcessInplace (markup of Buffers which can use the same memory)
*   and AllocateBuffer (allocation of memory for Buffers using MemorySolver, which can reuse memory optimally).
* @ingroup snippets
*/
class AllocateBuffers : public Pass {
public:
OPENVINO_RTTI("AllocateBuffers", "Pass")
bool run(lowered::LinearIR& linear_ir) override;
size_t get_scratchpad_size() const { return m_buffer_scratchpad_size; }
private:
static void propagate_offset(const LinearIR& linear_ir, const ExpressionPtr& buffer_expr, size_t offset);
size_t m_buffer_scratchpad_size = 0;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
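A hedged sketch of the scratchpad sizing and in-place offset propagation described above: buffers sharing an ID are in-place and reuse one offset, while each new ID extends the scratchpad by that buffer's byte size. `Buf` and `allocate` are simplified stand-ins, not the real pass:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a Buffer expression: just its ID and byte size.
struct Buf {
    std::size_t id;
    std::size_t byte_size;
};

// Returns the total scratchpad size and fills one offset per buffer.
std::size_t allocate(const std::vector<Buf>& buffers, std::vector<std::size_t>& offsets) {
    std::map<std::size_t, std::size_t> id_to_offset;
    std::size_t scratchpad = 0;
    offsets.clear();
    for (const auto& b : buffers) {
        auto it = id_to_offset.find(b.id);
        if (it == id_to_offset.end()) {  // first buffer with this ID: new memory
            it = id_to_offset.emplace(b.id, scratchpad).first;
            scratchpad += b.byte_size;
        }
        offsets.push_back(it->second);   // in-place buffers share the same offset
    }
    return scratchpad;
}
```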


@ -0,0 +1,35 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
#include "snippets/generator.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface AssignRegisters
* @brief Assigns in/out abstract registers indexes to every operation.
* Note that changing of the IR is likely to invalidate register assignment.
* @ingroup snippets
*/
class AssignRegisters : public Pass {
public:
OPENVINO_RTTI("AssignRegisters", "Pass")
explicit AssignRegisters(const std::function<Generator::opRegType(const std::shared_ptr<Node>& op)>& mapper) : m_reg_type_mapper(mapper) {}
bool run(LinearIR& linear_ir) override;
private:
std::function<Generator::opRegType(const std::shared_ptr<Node>& op)> m_reg_type_mapper;
static constexpr size_t reg_count = 16lu;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
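The pass above assigns abstract register indexes out of a fixed budget (`reg_count = 16`). A deliberately naive, runnable sketch of the core constraint, wrapping indexes round-robin; the real pass additionally classifies ops by register type through the `m_reg_type_mapper` functor and tracks live ranges:

```cpp
#include <cstddef>
#include <vector>

// Fixed abstract register budget, matching reg_count in the pass above.
constexpr std::size_t reg_count = 16;

// Naive assignment: value i gets register i modulo the budget, so indexes
// wrap once there are more simultaneously live values than registers.
std::vector<std::size_t> assign_round_robin(std::size_t num_values) {
    std::vector<std::size_t> regs(num_values);
    for (std::size_t i = 0; i < num_values; ++i)
        regs[i] = i % reg_count;
    return regs;
}
```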


@ -0,0 +1,38 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface CleanRepeatedDataPointerShifts
* @brief The pass "fuses" (resets) ptr increments and finalization offsets for Loop ports that refer to
* the same data expression (a Buffer with the same ID, or the same parent of Loads) to avoid double ptr shifts.
* Note: Buffers always employ in-place logic by default. It means that if a loop has both
* an input and an output connected to Buffers, the corresponding register should nevertheless be
* incremented only once (when the input reg is incremented, the output is incremented automatically).
* This condition should be removed when Buffers stop being in-place by default.
* @ingroup snippets
*/
class CleanRepeatedDataPointerShifts: public Pass {
public:
OPENVINO_RTTI("CleanRepeatedDataPointerShifts", "Pass")
CleanRepeatedDataPointerShifts() = default;
bool run(LinearIR& linear_ir) override;
private:
bool reuse_increments(const LinearIR& linear_ir, const ExpressionPtr& loop_end_expr);
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,29 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface CleanupLoopOffsets
* @brief Loops are inserted with finalization offsets that reset all managed pointers to their initial values.
* This transformation "fuses" the offsets with an outer loop's ptr_increments, and zeroes the offsets before Results.
* @ingroup snippets
*/
class CleanupLoopOffsets : public Pass {
public:
OPENVINO_RTTI("CleanupLoopOffsets", "Pass")
bool run(LinearIR& linear_ir) override;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,61 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
#include "snippets/lowered/loop_manager.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface FuseLoops
* @brief The pass fuses marked Loops. The transformation supports the following loop fusions:
*
* - Upper Loop is fused into the Current Loop
* Loop_0 (Upper) |
* | => |
* Loop_1 (Current) Loop_0 + Loop_1 => new `Loop_1`
* * This is possible only if the other consumers of Loop_0 are after Loop_1 in the Linear IR,
*   because the upper Loop_0 will be explicitly moved before the current Loop_1 in the Linear IR,
*   and the control dependency must be preserved (to avoid cases where, after fusion, some consumers of Loop_0 are before this Loop)
*
* - Lower Loop is fused into the Current Loop
* Loop_0 (Current) Loop_0 + Loop_1 => new `Loop_0`
* | => |
* Loop_1 (Lower) |
* * This is possible only if the other parents of Loop_1 are before Loop_0 in the Linear IR,
*   because the lower Loop_1 will be explicitly moved after the current Loop_0 in the Linear IR,
*   and the control dependency must be preserved (to avoid cases where, after fusion, some parents of Loop_1 are after this Loop)
*
* The main conditions for fusion are equal increments and equal/broadcastable work amounts.
* @ingroup snippets
*/
class FuseLoops : public Pass {
public:
OPENVINO_RTTI("FuseLoops", "Pass")
FuseLoops();
bool run(LinearIR& linear_ir) override;
private:
static bool can_be_fused(const LinearIR::LoopManager::LoopInfoPtr& loop_current,
const LinearIR::LoopManager::LoopInfoPtr& loop_target);
static bool fuse_upper_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_entry_point,
size_t current_loop_id, size_t target_loop_id, size_t dim_idx,
LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos);
static bool fuse_lower_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_entry_point,
size_t current_loop_id, size_t target_loop_id, size_t dim_idx,
LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos);
static void fuse_points(std::vector<ExpressionPort>& exit_points, std::vector<ExpressionPort>& entry_points,
LinearIR::constExprIt loop_begin_pos, LinearIR::constExprIt loop_end_pos);
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
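The fusion criterion stated above (equal increments, equal or broadcastable work amounts) can be sketched as a small predicate. `LoopDesc` is a simplified stand-in for LoopInfo, and "broadcastable" is taken to mean one of the work amounts equals 1:

```cpp
#include <cstddef>

// Simplified stand-in for the LoopManager's LoopInfo.
struct LoopDesc {
    std::size_t work_amount;
    std::size_t increment;
};

// Loops can fuse when increments match and work amounts are equal or
// broadcastable (one of them is 1) - a sketch of can_be_fused above.
bool can_be_fused(const LoopDesc& current, const LoopDesc& target) {
    const bool equal_work = current.work_amount == target.work_amount;
    const bool broadcastable = current.work_amount == 1 || target.work_amount == 1;
    return current.increment == target.increment && (equal_work || broadcastable);
}
```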


@ -0,0 +1,48 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
#include "snippets/op/buffer.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface IdentifyBuffers
* @brief The pass sets identifiers for Buffers in the common Buffer system.
* Buffers with the same identifier will be assigned the same data register.
* The pass uses a greedy graph-coloring algorithm on an adjacency matrix:
* - Buffers are the vertices of the graph;
* - Loops, Brgemm (and similar ops) are the "edges" between Buffers (hubs of edges):
*   buffers connected to the same Loop are candidates for adjacency in the graph sense.
* - The vertices (buffers) are adjacent if they are connected to the same Loop and
*   their data pointers cannot be incremented proportionally in Loops: different ptr increments or data sizes;
* - Firstly, the adjacency matrix is created using the definition above;
* - Secondly, the same color is assigned to non-adjacent vertices of the graph (buffers), and different colors otherwise.
* Note: should be called before the ResetBuffer() pass to have correct offsets
* @ingroup snippets
*/
class IdentifyBuffers: public Pass {
public:
OPENVINO_RTTI("IdentifyBuffers", "Pass")
IdentifyBuffers() = default;
bool run(LinearIR& linear_ir) override;
private:
using BufferSet = std::vector<std::shared_ptr<op::Buffer>>;
std::vector<bool> create_adjacency_matrix(const LinearIR& linear_ir, const BufferSet& buffers) const;
std::map<size_t, BufferSet> coloring(BufferSet& buffers, std::vector<bool>& adj);
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
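The greedy coloring described above can be shown as a runnable standalone sketch: vertices are buffers, `adj` is a flattened N x N adjacency matrix, and each vertex takes the smallest color not used by its already-colored neighbours (names here are illustrative, not the pass's API):

```cpp
#include <cstddef>
#include <vector>

// Greedy graph coloring over a flattened n x n adjacency matrix.
// color[v] plays the role of the Buffer ID assigned by the pass.
std::vector<std::size_t> greedy_coloring(std::size_t n, const std::vector<bool>& adj) {
    std::vector<std::size_t> color(n, 0);
    for (std::size_t v = 1; v < n; ++v) {
        std::vector<bool> used(n, false);
        for (std::size_t u = 0; u < v; ++u)
            if (adj[v * n + u])
                used[color[u]] = true;  // a neighbour's color is taken
        std::size_t c = 0;
        while (used[c]) ++c;            // smallest free color
        color[v] = c;
    }
    return color;
}
```

Non-adjacent buffers end up with the same color (ID) and thus share a data register, exactly the property the pass relies on.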


@ -0,0 +1,41 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
#include "snippets/lowered/loop_manager.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface InitLoops
* @brief The pass explicitly inserts LoopBegin and LoopEnd operations into the Linear IR using LoopManager::LoopInfo from the Loop markup algorithm
* @ingroup snippets
*/
class InitLoops : public Pass {
public:
OPENVINO_RTTI("InitLoops", "Pass")
InitLoops();
bool run(LinearIR& linear_ir) override;
private:
static void insertion(LinearIR& linear_ir, const LinearIR::LoopManager::LoopInfoPtr& loop_info,
size_t loop_id, size_t dim_idx, bool has_outer_loop);
static std::vector<int64_t> init_ptr_increments(const std::vector<ExpressionPort>& loop_inputs,
const std::vector<ExpressionPort>& loop_outputs,
size_t dim_idx);
static std::vector<int64_t> init_finalization_offsets(const std::vector<int64_t>& finalization_offsets, size_t work_amount);
static std::vector<int64_t> init_element_type_sizes(const std::vector<ExpressionPort>& loop_inputs,
const std::vector<ExpressionPort>& loop_outputs);
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
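One of the loop parameters computed above is the set of finalization offsets: after a loop runs, each data pointer advanced by `ptr_increment` per iteration must be pulled back by `increment * work_amount` so the next consumer sees the original base pointer. A sketch of that rule (this mirrors the intent of `init_finalization_offsets`, not its exact signature):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// For each port's per-iteration pointer increment, the finalization offset
// rewinds the pointer to its base after work_amount iterations.
std::vector<int64_t> finalization_offsets(const std::vector<int64_t>& ptr_increments,
                                          std::size_t work_amount) {
    std::vector<int64_t> offsets;
    offsets.reserve(ptr_increments.size());
    for (int64_t inc : ptr_increments)
        offsets.push_back(-inc * static_cast<int64_t>(work_amount));  // reset to base
    return offsets;
}
```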


@ -0,0 +1,43 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface InsertBuffers
* @brief The pass inserts Buffer between exit points of one loop (or Brgemm) and
* entry points of another loop (or Brgemm) to store intermediate data.
* The pass should be called after FuseLoops.
* @param m_buffer_allocation_rank - rank of the shape for memory allocation: shape[shape_rank - normalize(m_buffer_allocation_rank) : shape_rank]
* @ingroup snippets
*/
class InsertBuffers : public Pass {
public:
OPENVINO_RTTI("InsertBuffers", "Pass")
InsertBuffers(int32_t buffer_allocation_rank);
bool run(LinearIR& linear_ir) override;
private:
void insertion(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, size_t loop_id,
const std::vector<ExpressionPort>& loop_entries, const std::vector<ExpressionPort>& loop_exits);
LinearIR::constExprIt insertion_position(const LinearIR& linear_ir,
const LinearIR::LoopManagerPtr& loop_manager,
const ExpressionPtr& up_expr,
const ExpressionPtr& down_expr);
int32_t m_buffer_allocation_rank;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,44 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
#include "snippets/lowered/loop_manager.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface InsertLoadStore
* @brief The pass inserts Load and Store expressions into the Linear IR after Parameters and Buffers, and before Results and Buffers, respectively.
* Note: The pass should be called after the FuseLoops and InsertBuffers passes so that all possible data expressions are present.
* @param m_vector_size - the count of elements for loading/storing
* @ingroup snippets
*/
class InsertLoadStore : public Pass {
public:
explicit InsertLoadStore(size_t vector_size);
OPENVINO_RTTI("InsertLoadStore", "Pass")
bool run(LinearIR& linear_ir) override;
private:
bool insert_load(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it);
bool insert_store(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it);
void update_loops(const LinearIR::LoopManagerPtr& loop_manager, const std::vector<size_t>& loop_ids,
const ExpressionPort& actual_port, const std::vector<ExpressionPort>& target_ports, bool is_entry = true);
void update_loop(const LinearIR::LoopManager::LoopInfoPtr& loop_info,
const ExpressionPort& actual_port, const std::vector<ExpressionPort>& target_ports, bool is_entry = true);
size_t get_count(const PortDescriptorPtr& port_desc) const;
size_t m_vector_size;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
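The `get_count` helper above derives the number of elements per Load/Store from a port descriptor. A hedged sketch of one plausible policy, assuming the count comes from the innermost subtensor dimension capped by the vector length (`m_vector_size`); the real pass may apply further rules (e.g. FULL_DIM handling):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical count selection: innermost subtensor dim, capped by the
// hardware vector length. An empty subtensor means "no hint": use a full vector.
std::size_t choose_count(const std::vector<std::size_t>& subtensor, std::size_t vector_size) {
    if (subtensor.empty())
        return vector_size;
    return std::min(vector_size, subtensor.back());
}
```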


@ -0,0 +1,33 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface InsertTailLoop
* @brief Injects tail-processing loop after a vector loop if required.
* Additional optimizations are performed if a loop body is executed only once.
* @ingroup snippets
*/
class InsertTailLoop : public Pass {
static void tail_transformations(LinearIR& linear_ir,
LinearIR::container::const_iterator tail_begin,
LinearIR::container::const_iterator tail_end,
size_t tail_size);
public:
OPENVINO_RTTI("InsertTailLoop", "Pass")
bool run(LinearIR& linear_ir) override;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
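The arithmetic behind the pass above: a vector loop covers `work_amount / increment` full iterations, and a scalar tail of `work_amount % increment` elements is appended when the work amount is not divisible by the vector length. A minimal sketch:

```cpp
#include <cstddef>
#include <utility>

// Splits a loop's work amount into (full vector iterations, tail elements).
// A zero second component means no tail loop is required.
std::pair<std::size_t, std::size_t> split_work(std::size_t work_amount, std::size_t increment) {
    return {work_amount / increment, work_amount % increment};
}
```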


@ -4,24 +4,26 @@
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface LoadMoveBroadcastToBroadcastLoad
* @brief Fuses consecutive Load and MoveBroadcast into a single load instruction.
* @ingroup snippets
*/
class LoadMoveBroadcastToBroadcastLoad: public Pass {
public:
LoadMoveBroadcastToBroadcastLoad() = default;
OPENVINO_RTTI("LoadMoveBroadcastToBroadcastLoad", "Pass")
bool run(LinearIR& linear_ir) override;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,36 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface MarkLoops
* @brief The pass marks expressions with Loop IDs.
* The pass iterates expression by expression while the following conditions hold:
* - the layouts and subtensors of the expressions are the same
* - the consumer of the expression is explicitly after this expression - this is how the pass marks off branches
* @ingroup snippets
*/
class MarkLoops : public Pass {
public:
OPENVINO_RTTI("MarkLoops", "Pass")
MarkLoops(size_t vector_size);
bool run(LinearIR& linear_ir) override;
private:
size_t m_vector_size;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,32 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface MoveResultOutOfLoop
* @brief After Loop-processing passes, Result expressions might end up inside a Loop.
* That means a Result can be placed before its parent and the LoopEnd; this situation breaks the control dependency and
* creates a cyclic dependency in the AssignRegisters algorithm.
* The pass extracts Result expressions from the Loop and inserts them after it.
* @ingroup snippets
*/
class MoveResultOutOfLoop : public Pass {
public:
OPENVINO_RTTI("MoveResultOutOfLoop", "Pass")
MoveResultOutOfLoop() = default;
bool run(LinearIR& linear_ir) override;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,35 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface MoveScalarToConsumer
* @brief As a result of loop insertion or fusion, Scalar operations might end up outside of the loop where their
* consumer is located. This transformation moves every scalar right before its consumer. This is needed to guarantee
* computation validity and also to optimize register allocation.
* Details:
* If ScalarEmitters are called outside the Loop, only the first Loop iteration would yield correct data
* (since the vector reg assigned to the scalar will get corrupted inside the loop body).
* To avoid such cases, we move Scalars in the Linear IR to right before their consumers, so the Scalar is executed on each Loop iteration.
* @ingroup snippets
*/
class MoveScalarToConsumer : public Pass {
public:
OPENVINO_RTTI("MoveScalarToConsumer", "Pass")
MoveScalarToConsumer() = default;
bool run(LinearIR& linear_ir) override;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,67 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "snippets/lowered/linear_ir.hpp"
#include "openvino/core/rtti.hpp"
#include "openvino/core/type.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface Pass
* @brief Base class for transformations on linear IR
* @ingroup snippets
*/
class Pass {
public:
Pass() = default;
virtual ~Pass() = default;
// Note that get_type_info_static and get_type_info are needed to mimic OPENVINO_RTTI interface,
// so the standard OPENVINO_RTTI(...) macros could be used in derived classes.
_OPENVINO_HIDDEN_METHOD static const ::ov::DiscreteTypeInfo& get_type_info_static() {
static ::ov::DiscreteTypeInfo type_info_static {"Pass"};
type_info_static.hash();
return type_info_static;
}
virtual const DiscreteTypeInfo& get_type_info() const {
return get_type_info_static();
}
const char* get_type_name() const {
return get_type_info().name;
}
virtual bool run(lowered::LinearIR& linear_ir) = 0;
};
class PassPipeline {
public:
PassPipeline() = default;
void register_pass(const std::shared_ptr<Pass>& pass);
template<typename T, class... Args>
void register_pass(Args&&... args) {
static_assert(std::is_base_of<Pass, T>::value, "Pass not derived from lowered::Pass");
auto pass = std::make_shared<T>(std::forward<Args>(args)...);
register_pass(pass);
}
void run(lowered::LinearIR& linear_ir);
private:
std::vector<std::shared_ptr<Pass>> m_passes;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
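The PassPipeline above stores passes in registration order and runs them sequentially, with a templated `register_pass<T>(...)` that constructs the pass in place. A minimal, self-contained analogue (an `int` stands in for LinearIR):

```cpp
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Minimal stand-in for lowered::pass::Pass.
struct MiniPass {
    virtual ~MiniPass() = default;
    virtual bool run(int& ir) = 0;
};

// Mirrors PassPipeline: ordered registration plus sequential execution.
class MiniPipeline {
public:
    void register_pass(std::shared_ptr<MiniPass> pass) { m_passes.push_back(std::move(pass)); }
    template <typename T, class... Args>
    void register_pass(Args&&... args) {
        static_assert(std::is_base_of<MiniPass, T>::value, "Pass not derived from MiniPass");
        register_pass(std::make_shared<T>(std::forward<Args>(args)...));
    }
    void run(int& ir) {
        for (auto& p : m_passes)
            p->run(ir);
    }
private:
    std::vector<std::shared_ptr<MiniPass>> m_passes;
};

// Example pass: mutates the "IR" by incrementing it.
struct AddOne : MiniPass {
    bool run(int& ir) override { ++ir; return true; }
};
```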


@ -0,0 +1,29 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface PropagateLayout
* @brief Propagates the layout from a Parameter's child to the Parameter and from a Result's parent to the Result. This is needed to calculate
* proper data pointer offsets in the Kernel.
* @ingroup snippets
*/
class PropagateLayout : public Pass {
public:
OPENVINO_RTTI("PropagateLayout", "Pass")
bool run(LinearIR& linear_ir) override;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,32 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
/**
* @interface SoftmaxDecomposition
* @brief Decomposes Softmax to a range of low-level operations on linear IR
* @ingroup snippets
*/
class SoftmaxDecomposition : public Pass {
public:
explicit SoftmaxDecomposition(size_t vector_size);
OPENVINO_RTTI("SoftmaxDecomposition", "Pass")
bool run(LinearIR& linear_ir) override;
private:
size_t m_vector_size;
};
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,43 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <vector>
#include "port_descriptor.hpp"
#include "expression_port.hpp"
namespace ov {
namespace snippets {
namespace lowered {
class Expression;
class PortConnector {
public:
PortConnector() = default;
explicit PortConnector(ExpressionPort source_descriptor, const std::set<ExpressionPort>& consumer_descriptors = {});
const ExpressionPort& get_source() const { return m_source_port; }
std::set<ExpressionPort> get_consumers() const { return m_consumer_ports; }
void add_consumer(const ExpressionPort& consumer);
void remove_consumer(const ExpressionPort& consumer);
bool found_consumer(const ExpressionPort& consumer) const;
std::set<ExpressionPort>::const_iterator find_consumer(const ExpressionPort& consumer) const;
std::set<ExpressionPort>::iterator find_consumer(const ExpressionPort& consumer);
private:
ExpressionPort m_source_port;
std::set<ExpressionPort> m_consumer_ports;
};
using PortConnectorPtr = std::shared_ptr<PortConnector>;
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,99 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "openvino/core/node.hpp"
#include "openvino/core/attribute_visitor.hpp"
namespace ov {
namespace snippets {
namespace lowered {
class PortDescriptor;
using PortDescriptorPtr = std::shared_ptr<PortDescriptor>;
class PortDescriptor {
public:
// The structure with service values for scheduling parameters
struct ServiceDimensions {
// The value for the subtensor that means that scheduling should be by full dimension
static size_t FULL_DIM;
};
explicit PortDescriptor(const ov::Input<ov::Node>& node,
std::vector<size_t> subtensor_shape = {},
std::vector<size_t> layout = {});
explicit PortDescriptor(const ov::Input<const ov::Node>& node,
std::vector<size_t> subtensor_shape = {},
std::vector<size_t> layout = {});
explicit PortDescriptor(const ov::Output<ov::Node>& node,
std::vector<size_t> subtensor_shape = {},
std::vector<size_t> layout = {});
explicit PortDescriptor(const ov::Output<const ov::Node>& node,
std::vector<size_t> subtensor_shape = {},
std::vector<size_t> layout = {});
PortDescriptor(std::vector<size_t> shape, std::vector<size_t> subtensor_shape, std::vector<size_t> layout = {});
PortDescriptor() = default;
std::vector<size_t> get_shape() const {return m_tensor_shape;}
std::vector<size_t> get_subtensor() const {return m_subtensor_shape;}
std::vector<size_t> get_layout() const {return m_layout;}
size_t get_reg() const { return m_reg; }
void set_shape(const std::vector<size_t>& tensor) { m_tensor_shape = tensor; }
void set_layout(const std::vector<size_t>& layout) { m_layout = layout; }
void set_subtensor(const std::vector<size_t>& subtensor) { m_subtensor_shape = subtensor; }
void set_reg(size_t reg) { m_reg = reg; }
std::string serialize() const;
bool empty() const { return m_layout.empty() && m_subtensor_shape.empty();}
PortDescriptorPtr clone() const;
friend bool operator==(const PortDescriptor& lhs, const PortDescriptor& rhs);
friend bool operator!=(const PortDescriptor& lhs, const PortDescriptor& rhs) {return !(lhs == rhs);}
private:
void validate_arguments();
/// \brief Original tensor shape
std::vector<size_t> m_tensor_shape{};
/// \brief Order of dimensions: NCHW == {0, 1, 2, 3}, NHWC == {0, 2, 3, 1}, NCHW16c == {0, 1, 2, 3, 1}
std::vector<size_t> m_layout{};
/// \brief Minimal tensor size that could be processed in one call
std::vector<size_t> m_subtensor_shape{};
/// \brief The corresponding abstract/physical register
size_t m_reg = 0;
};
class PortDescriptorUtils {
public:
static void set_port_descriptor_ptr(const ov::Input<ov::Node>& n, const PortDescriptorPtr& desc);
static void set_port_descriptor_ptr(const ov::Output<ov::Node>& n, const PortDescriptorPtr& desc);
static PortDescriptorPtr get_port_descriptor_ptr(const ov::Input<ov::Node>& in);
static PortDescriptorPtr get_port_descriptor_ptr(const ov::Input<const ov::Node>& in);
static PortDescriptorPtr get_port_descriptor_ptr(const ov::Output<ov::Node>& out);
static PortDescriptorPtr get_port_descriptor_ptr(const ov::Output<const ov::Node>& out);
static void clean(const std::shared_ptr<ov::Node>& node);
private:
static void init_default(std::vector<PortDescriptorPtr>& in_descs, std::vector<PortDescriptorPtr>& out_descs, const std::shared_ptr<ov::Node>& node);
};
class PortDescriptorVectorAttribute : public ov::RuntimeAttribute {
public:
OPENVINO_RTTI("PortDescriptorVectorAttribute", "", ov::RuntimeAttribute);
PortDescriptorVectorAttribute() = default;
explicit PortDescriptorVectorAttribute(std::vector<PortDescriptorPtr> in_descs = {}, std::vector<PortDescriptorPtr> out_descs = {})
: inputs(std::move(in_descs)), outputs(std::move(out_descs)) {}
std::vector<PortDescriptorPtr> inputs{};
std::vector<PortDescriptorPtr> outputs{};
};
} // namespace lowered
} // namespace snippets
} // namespace ov
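The `m_layout` member above encodes the order in which the tensor's dimensions are walked (e.g. NHWC == {0, 2, 3, 1}), so the planar (memory-order) shape is the tensor shape permuted by the layout. A small sketch of that permutation (function name is illustrative, not the class's API):

```cpp
#include <cstddef>
#include <vector>

// Permutes a tensor shape by a layout order: planar[i] = shape[layout[i]].
// An empty layout means the shape is already planar.
std::vector<std::size_t> apply_layout(const std::vector<std::size_t>& shape,
                                      const std::vector<std::size_t>& layout) {
    if (layout.empty())
        return shape;
    std::vector<std::size_t> planar(layout.size());
    for (std::size_t i = 0; i < layout.size(); ++i)
        planar[i] = shape[layout[i]];
    return planar;
}
```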


@ -1,13 +1,13 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "openvino/op/op.hpp"
#include "memory_access.hpp"
namespace ov {
namespace snippets {
namespace op {
@ -20,7 +20,8 @@ class Brgemm : public MemoryAccess {
public:
OPENVINO_OP("Brgemm", "SnippetsOpset", MemoryAccess);
Brgemm(const Output<Node>& A, const Output<Node>& B,
const size_t offset_a = 0lu, const size_t offset_b = 0lu, const size_t offset_c = 0lu,
std::vector<size_t> layout_a = {}, std::vector<size_t> layout_b = {}, std::vector<size_t> layout_c = {});
Brgemm() = default;
size_t get_offset_a() const { return get_input_offset(0); }
@ -34,9 +35,15 @@ public:
protected:
ov::element::Type get_output_type() const;
std::vector<ov::PartialShape> get_planar_input_shapes(const std::vector<ov::Input<ov::Node>>& inputs) const;
ov::PartialShape get_output_partial_shape(const std::vector<ov::PartialShape>& input_shapes) const;
ov::PartialShape get_planar_output_shape(const ov::PartialShape& output_shape) const;
private:
void custom_constructor_validate_and_infer_types(std::vector<size_t> layout_a, std::vector<size_t> layout_b, std::vector<size_t> layout_c);
void validate_inputs() const;
};
} // namespace op
} // namespace snippets
} // namespace ov


@ -6,9 +6,9 @@
#include <snippets/op/memory_access.hpp>
#include "openvino/op/op.hpp"
namespace ov {
namespace snippets {
namespace op {
@ -19,7 +19,7 @@ namespace op {
*/
class BroadcastLoad : public MemoryAccess {
public:
OPENVINO_OP("BroadcastLoad", "SnippetsOpset", ov::snippets::op::MemoryAccess);
BroadcastLoad(const Output<Node>& x, ov::PartialShape output_shape, size_t offset = 0lu);
BroadcastLoad() = default;
@ -36,4 +36,4 @@ private:
} // namespace op
} // namespace snippets
} // namespace ov


@ -4,9 +4,9 @@
#pragma once
#include "openvino/op/op.hpp"
namespace ov {
namespace snippets {
namespace op {
@ -15,7 +15,7 @@ namespace op {
* @brief Added to a subgraph if explicit broadcast instruction should be generated
* @ingroup snippets
*/
class BroadcastMove : public ov::op::Op {
public:
OPENVINO_OP("BroadcastMove", "SnippetsOpset");
@ -35,4 +35,4 @@ protected:
} // namespace op
} // namespace snippets
} // namespace ov


@ -1,12 +1,12 @@
// Copyright (C) 2018-2022 Intel Corporation // Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
// //
#pragma once #pragma once
#include <ngraph/op/op.hpp> #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -16,18 +16,22 @@ namespace op {
* If Buffer has a parent, the operation is for intermediate data storage - IntermediateMemory type. * If Buffer has a parent, the operation is for intermediate data storage - IntermediateMemory type.
* Otherwise, the operation is for allocation of new empty memory with shape `m_shape` - NewMemory type * Otherwise, the operation is for allocation of new empty memory with shape `m_shape` - NewMemory type
* Notes: * Notes:
* - All buffers in a graph have the same memory pointer. So if we have a few buffers, * - All buffers with the same ID in a graph share the same memory pointer. So if there are several buffers,
* each corresponding MemoryAccess op for a Buffer should have an offset relative to the common memory pointer of this Buffer
* - Buffer should be a single consumer for operation output port * - Buffer should be a single consumer for operation output port
* @param m_type - type of Buffer: IntermediateMemory/NewMemory
* @param m_shape - output allocation shape for Buffer with type NewMemory
* @param m_offset - offset in common Buffer scratchpad
* @param m_id - Buffer ID in common Buffer system
* @ingroup snippets * @ingroup snippets
*/ */
class Buffer : public ngraph::op::Op { class Buffer : public ov::op::Op {
public: public:
OPENVINO_OP("Buffer", "SnippetsOpset"); OPENVINO_OP("Buffer", "SnippetsOpset");
Buffer() = default; Buffer() = default;
Buffer(const ov::Shape& shape); Buffer(const ov::Shape& shape, size_t id = 0);
Buffer(const ov::Output<ov::Node>& arg, const ov::Shape& shape); Buffer(const ov::Output<ov::Node>& arg, const ov::Shape& shape, size_t id = 0);
Buffer(const ov::Output<ov::Node>& arg, int32_t allocation_rank = -1); Buffer(const ov::Output<ov::Node>& arg, int32_t allocation_rank = -1, size_t id = 0);
bool visit_attributes(AttributeVisitor& visitor) override; bool visit_attributes(AttributeVisitor& visitor) override;
void validate_and_infer_types() override; void validate_and_infer_types() override;
@ -38,8 +42,13 @@ public:
IntermediateMemory IntermediateMemory
}; };
size_t get_id() const { return m_id; }
Type get_type() const { return m_type; } Type get_type() const { return m_type; }
ov::Shape get_allocation_shape() const { return m_shape; } ov::Shape get_allocation_shape() const { return m_shape; }
int64_t get_offset() const { return m_offset; }
void set_id(size_t id) { m_id = id; }
void set_offset(int64_t offset) { m_offset = offset; }
size_t get_byte_size() const; size_t get_byte_size() const;
bool is_intermediate_memory() const { return m_type == Type::IntermediateMemory; } bool is_intermediate_memory() const { return m_type == Type::IntermediateMemory; }
@ -48,8 +57,10 @@ public:
private: private:
Type m_type = Type::IntermediateMemory; Type m_type = Type::IntermediateMemory;
ov::Shape m_shape = {}; ov::Shape m_shape = {};
int64_t m_offset = 0;
size_t m_id = 0; // Default ID - 0. All Buffers are from the same set
}; };
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov
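The Buffer comments above describe a shared scratchpad: buffers with the same ID alias one memory region, and each consumer addresses it as a common base pointer plus a per-buffer offset. A self-contained sketch of that bookkeeping (`BufferInfo` and `assign_offsets` are illustrative stand-ins, not the OpenVINO API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative stand-in for snippets Buffer bookkeeping:
// buffers sharing an id alias one region of a common scratchpad,
// and every consumer addresses it as base pointer + offset.
struct BufferInfo {
    std::size_t id = 0;              // buffers sharing an id share memory
    std::vector<std::size_t> shape;  // allocation shape (NewMemory case)
    std::size_t elem_size = 4;       // bytes per element, e.g. f32
    std::int64_t offset = 0;         // offset into the common scratchpad

    std::size_t byte_size() const {
        std::size_t elems = 1;
        for (auto d : shape) elems *= d;
        return elems * elem_size;
    }
};

// Assign offsets: buffers with the same id reuse one slot, distinct
// ids get disjoint ranges. Returns the total scratchpad size in bytes.
std::size_t assign_offsets(std::vector<BufferInfo>& buffers) {
    std::int64_t next = 0;
    std::vector<std::pair<std::size_t, std::int64_t>> known;  // id -> offset
    for (auto& b : buffers) {
        bool found = false;
        for (auto& k : known)
            if (k.first == b.id) { b.offset = k.second; found = true; break; }
        if (!found) {
            b.offset = next;
            known.emplace_back(b.id, next);
            next += static_cast<std::int64_t>(b.byte_size());
        }
    }
    return static_cast<std::size_t>(next);
}
```

With this model, two Buffers constructed with the same `id` receive the same offset, which is why the comments above stress that MemoryAccess ops must carry offsets for the shared pointer.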

@ -5,9 +5,9 @@
#pragma once #pragma once
#include <openvino/op/convert.hpp> #include <openvino/op/convert.hpp>
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -35,4 +35,4 @@ public:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -5,9 +5,9 @@
#pragma once #pragma once
#include <openvino/op/convert.hpp> #include <openvino/op/convert.hpp>
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -34,4 +34,4 @@ public:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -1,12 +1,12 @@
// Copyright (C) 2018-2022 Intel Corporation // Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
// //
#pragma once #pragma once
#include <ngraph/op/op.hpp> #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -20,7 +20,7 @@ namespace op {
* - fill_value - hexadecimal filling value * - fill_value - hexadecimal filling value
* @ingroup snippets * @ingroup snippets
*/ */
class Fill : public ngraph::op::Op { class Fill : public ov::op::Op {
public: public:
OPENVINO_OP("Fill", "SnippetsOpset"); OPENVINO_OP("Fill", "SnippetsOpset");
@ -44,4 +44,4 @@ protected:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -1,12 +1,12 @@
// Copyright (C) 2018-2022 Intel Corporation // Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
// //
#pragma once #pragma once
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -15,7 +15,7 @@ namespace op {
* @brief The operation calculates a horizon maximum of a vector register * @brief The operation calculates a horizon maximum of a vector register
* @ingroup snippets * @ingroup snippets
*/ */
class HorizonMax : public ngraph::op::Op { class HorizonMax : public ov::op::Op {
public: public:
OPENVINO_OP("HorizonMax", "SnippetsOpset"); OPENVINO_OP("HorizonMax", "SnippetsOpset");
@ -29,4 +29,4 @@ public:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov
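A "horizon" (horizontal) max collapses all lanes of a vector register into one scalar. A scalar model of the semantics (the real op is emitted as SIMD code; the 8-lane `std::array` here is only an illustration):

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Horizontal ("horizon") max: reduce all lanes of a vector register
// to a single scalar. Modeled here on an 8-lane f32 register.
float horizon_max(const std::array<float, 8>& reg) {
    float m = reg[0];
    for (float v : reg)
        m = std::max(m, v);
    return m;
}
```

HorizonSum below is the same reduction shape with `+` in place of `max`.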

@ -1,12 +1,12 @@
// Copyright (C) 2018-2022 Intel Corporation // Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
// //
#pragma once #pragma once
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -15,7 +15,7 @@ namespace op {
* @brief The operation calculates a horizon sum of a vector register * @brief The operation calculates a horizon sum of a vector register
* @ingroup snippets * @ingroup snippets
*/ */
class HorizonSum : public ngraph::op::Op { class HorizonSum : public ov::op::Op {
public: public:
OPENVINO_OP("HorizonSum", "SnippetsOpset"); OPENVINO_OP("HorizonSum", "SnippetsOpset");
@ -29,4 +29,4 @@ public:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -4,10 +4,10 @@
#pragma once #pragma once
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
#include "snippets/emitter.hpp" #include "snippets/lowered/linear_ir.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -16,22 +16,21 @@ namespace op {
* @brief Generated by Canonicalization and represents a compute kernel legal for scheduling * @brief Generated by Canonicalization and represents a compute kernel legal for scheduling
* @ingroup snippets * @ingroup snippets
*/ */
class Kernel : public ngraph::op::Op { class Kernel : public ov::op::Op {
public: public:
OPENVINO_OP("Kernel", "SnippetsOpset"); OPENVINO_OP("Kernel", "SnippetsOpset");
Kernel(std::vector<AllocatedEmitter> region, std::shared_ptr<const ov::Model> m); Kernel(lowered::LinearIR region);
Kernel() = default; Kernel() = default;
std::vector<AllocatedEmitter> region; lowered::LinearIR region;
const std::shared_ptr<const ov::Model> model;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override { std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
return std::make_shared<Kernel>(region, model); return std::make_shared<Kernel>(region);
} }
const void *compile_params = nullptr; const void *compile_params = nullptr;
}; };
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -4,10 +4,10 @@
#pragma once #pragma once
#include <ngraph/op/op.hpp> #include "openvino/op/op.hpp"
#include "snippets/op/memory_access.hpp" #include "snippets/op/memory_access.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -33,6 +33,9 @@ public:
void validate_and_infer_types() override; void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override; std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
protected:
void validate_memory_access_params() const;
}; };
/** /**
@ -60,4 +63,4 @@ private:
}; };
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -1,14 +1,14 @@
// Copyright (C) 2018-2022 Intel Corporation // Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
// //
#pragma once #pragma once
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
#include "snippets/emitter.hpp" #include "snippets/emitter.hpp"
#include "ngraph/op/parameter.hpp" #include "ngraph/op/parameter.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -17,20 +17,12 @@ namespace op {
* @brief Base class for LoopBegin and LoopEnd * @brief Base class for LoopBegin and LoopEnd
* @ingroup snippets * @ingroup snippets
*/ */
class LoopBase : public ngraph::op::Op { class LoopBase : public ov::op::Op {
public: public:
OPENVINO_OP("LoopBase", "SnippetsOpset"); OPENVINO_OP("LoopBase", "SnippetsOpset");
LoopBase(const std::vector<Output<Node>>& args, size_t work_amount, size_t increment); LoopBase(const std::vector<Output<Node>>& args);
LoopBase() = default; LoopBase() = default;
bool visit_attributes(AttributeVisitor& visitor) override;
size_t get_work_amount() const;
size_t get_increment() const;
bool get_evaluate_once() const;
protected: protected:
size_t work_amount;
size_t work_amount_increment;
bool evaluate_once; // true if the Loop is executed only once, used to skip setting and testing the loop counter
}; };
class LoopEnd; class LoopEnd;
/** /**
@ -45,18 +37,16 @@ class LoopBegin : public LoopBase {
public: public:
OPENVINO_OP("LoopBegin", "SnippetsOpset", LoopBase); OPENVINO_OP("LoopBegin", "SnippetsOpset", LoopBase);
explicit LoopBegin(const OutputVector& args); LoopBegin();
LoopBegin() = default;
void validate_and_infer_types() override; void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override; std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override;
std::shared_ptr<LoopEnd> get_loop_end(); std::shared_ptr<LoopEnd> get_loop_end() const;
// begin_address and input_regs are needed to communicate information between LoopBegin and LoopEnd emitters bool visit_attributes(AttributeVisitor& visitor) override;
// begin_address are needed to communicate information between LoopBegin and LoopEnd emitters
const uint8_t* begin_address; const uint8_t* begin_address;
std::vector<size_t> input_regs;
private: private:
void validate_and_infer_types_except_LoopEnd(); void validate_and_infer_types_except_LoopEnd();
LoopBegin(const std::vector<Output<Node>>& args, size_t work_amount, size_t work_amount_increment);
}; };
/** /**
@ -77,16 +67,21 @@ private:
class LoopEnd : public LoopBase { class LoopEnd : public LoopBase {
public: public:
OPENVINO_OP("LoopEnd", "SnippetsOpset", LoopBase); OPENVINO_OP("LoopEnd", "SnippetsOpset", LoopBase);
LoopEnd(const std::vector<Output<Node>>& args, size_t work_amount, size_t work_amount_increment, LoopEnd(const Output<Node>& loop_begin, size_t work_amount, size_t work_amount_increment,
std::vector<bool> apply_increment, std::vector<int64_t> finalization_offsets); std::vector<bool> apply_increment, std::vector<int64_t> finalization_offsets,
LoopEnd(const std::vector<Output<Node>>& args, size_t work_amount, size_t work_amount_increment, std::vector<int64_t> element_type_sizes, size_t input_num, size_t output_num);
std::vector<int64_t> ptr_increments, std::vector<int64_t> finalization_offsets); LoopEnd(const Output<Node>& loop_begin, size_t work_amount, size_t work_amount_increment,
std::vector<int64_t> ptr_increments, std::vector<int64_t> finalization_offsets,
std::vector<int64_t> element_type_sizes, size_t input_num, size_t output_num);
LoopEnd() = default; LoopEnd() = default;
std::shared_ptr<LoopBegin> get_loop_begin(); std::shared_ptr<LoopBegin> get_loop_begin();
void validate_and_infer_types() override; void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override; std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override;
const std::vector<int64_t>& get_finalization_offsets() const; const std::vector<int64_t>& get_finalization_offsets() const;
const std::vector<int64_t>& get_ptr_increments() const; const std::vector<int64_t>& get_ptr_increments() const;
const std::vector<int64_t>& get_element_type_sizes() const;
size_t get_input_num() const;
size_t get_output_num() const;
void set_finalization_offsets(std::vector<int64_t> offsets); void set_finalization_offsets(std::vector<int64_t> offsets);
void set_ptr_increments(std::vector<int64_t> new_ptr_increments); void set_ptr_increments(std::vector<int64_t> new_ptr_increments);
// update_ptr_increments resets non-zero increments to the new_increments. It's used when work_amount_increment is // update_ptr_increments resets non-zero increments to the new_increments. It's used when work_amount_increment is
@ -98,14 +93,23 @@ public:
// Used to propagate information about Loop structure, needed to simplify some optimizations. For example, // Used to propagate information about Loop structure, needed to simplify some optimizations. For example,
// to skip pointer increments when outer Loop is empty, and work_amount == vector_size (one inner vector Loop) // to skip pointer increments when outer Loop is empty, and work_amount == vector_size (one inner vector Loop)
// true by default; the optimizations are enabled if it is false // true by default; the optimizations are enabled if it is false
bool has_outer_loop; bool has_outer_loop = true;
size_t get_work_amount() const;
size_t get_increment() const;
bool get_evaluate_once() const;
bool visit_attributes(AttributeVisitor& visitor) override;
private: private:
std::vector<int64_t> ptr_increments; std::vector<int64_t> ptr_increments = {};
std::vector<int64_t> finalization_offsets; std::vector<int64_t> finalization_offsets = {};
size_t loop_io_size; std::vector<int64_t> element_type_sizes = {};
size_t work_amount = 0;
size_t work_amount_increment = 0;
size_t input_num = 0;
size_t output_num = 0;
bool evaluate_once = false; // true if the Loop is executed only once, used to skip setting and testing the loop counter
}; };
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov
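The reworked LoopEnd above carries `ptr_increments`, `element_type_sizes` and `finalization_offsets` per I/O port. A hedged scalar model of the pointer arithmetic a generated loop would perform (plain C++, not the emitted JIT code; scaling finalization offsets by element size is an assumption of this sketch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models the pointer bookkeeping of a generated Loop: every iteration
// advances each I/O data pointer by ptr_increment * element_type_size
// bytes; after the loop, finalization offsets re-position the pointers
// (e.g. to rewind an input that an outer loop will walk again).
std::vector<std::int64_t> run_loop_pointers(std::vector<std::int64_t> ptrs,
                                            std::size_t work_amount,
                                            std::size_t increment,
                                            const std::vector<std::int64_t>& ptr_increments,
                                            const std::vector<std::int64_t>& finalization_offsets,
                                            const std::vector<std::int64_t>& element_type_sizes) {
    for (std::size_t done = 0; done + increment <= work_amount; done += increment)
        for (std::size_t i = 0; i < ptrs.size(); ++i)
            ptrs[i] += ptr_increments[i] * element_type_sizes[i];
    for (std::size_t i = 0; i < ptrs.size(); ++i)
        ptrs[i] += finalization_offsets[i] * element_type_sizes[i];
    return ptrs;
}
```

For example, with `work_amount = 16`, `increment = 4` and a finalization offset of `-16` elements on the first port, that pointer is rewound to its starting position while the second keeps its final position, which is the kind of bookkeeping the `apply_increment`/`finalization_offsets` constructor arguments encode.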

@ -4,9 +4,9 @@
#pragma once #pragma once
#include <ngraph/op/op.hpp> #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -14,12 +14,12 @@ namespace op {
* @interface MemoryAccess * @interface MemoryAccess
* @brief This is a base class for memory access operations (like Load and Store). * @brief This is a base class for memory access operations (like Load and Store).
* It provides a universal interface to manipulate memory: load/store. * It provides a universal interface to manipulate memory: load/store.
* @param m_input_ports - vector of input descriptors: variables of PortDescriptor class * @param m_input_ports - map of input descriptors: variables of PortDescriptor class
* @param m_output_ports - vector of output descriptors: variables of PortDescriptor class * @param m_output_ports - map of output descriptors: variables of PortDescriptor class
* @ingroup snippets * @ingroup snippets
*/ */
class MemoryAccess : public ngraph::op::Op { class MemoryAccess : public ov::op::Op {
public: public:
OPENVINO_OP("MemoryAccess", "SnippetsOpset"); OPENVINO_OP("MemoryAccess", "SnippetsOpset");
@ -55,24 +55,35 @@ public:
size_t get_input_offset(size_t idx = 0) const; size_t get_input_offset(size_t idx = 0) const;
size_t get_output_offset(size_t idx = 0) const; size_t get_output_offset(size_t idx = 0) const;
size_t get_input_port_count() const { return m_input_ports.size(); } std::map<size_t, PortDescriptor> get_memory_access_input_ports() const { return m_input_ports; }
size_t get_output_port_count() const { return m_output_ports.size(); } std::map<size_t, PortDescriptor> get_memory_access_output_ports() const { return m_output_ports; }
bool is_memory_access_input_port(size_t idx) const;
bool is_memory_access_output_port(size_t idx) const;
// All input and output ports are MemoryAccess
bool is_full_memory_access_op() const;
bool visit_attributes(AttributeVisitor& visitor) override; bool visit_attributes(AttributeVisitor& visitor) override;
protected: protected:
explicit MemoryAccess(const OutputVector& arguments, size_t input_count = 0, size_t output_count = 0); explicit MemoryAccess(const OutputVector& arguments, size_t input_count = 0, size_t output_count = 0);
explicit MemoryAccess(const OutputVector& arguments, const std::set<size_t>& input_ports, const std::set<size_t>& output_ports);
MemoryAccess() = default; MemoryAccess() = default;
// This method can be called only in ctors
void ctor_initialize(const std::set<size_t>& input_ports, const std::set<size_t>& output_ports);
void set_input_port_descriptor(const PortDescriptor& desc, const size_t i); void set_input_port_descriptor(const PortDescriptor& desc, const size_t i);
void set_output_port_descriptor(const PortDescriptor& desc, const size_t i); void set_output_port_descriptor(const PortDescriptor& desc, const size_t i);
const PortDescriptor& get_input_port_descriptor(const size_t i) const; const PortDescriptor& get_input_port_descriptor(const size_t i) const;
const PortDescriptor& get_output_port_descriptor(const size_t i) const; const PortDescriptor& get_output_port_descriptor(const size_t i) const;
std::vector<PortDescriptor> m_input_ports; // [port_num, port_desc]
std::vector<PortDescriptor> m_output_ports; std::map<size_t, PortDescriptor> m_input_ports;
std::map<size_t, PortDescriptor> m_output_ports;
}; };
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov
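Replacing the port-descriptor vectors with maps keyed by port index allows an op in which only some ports are memory accesses. A minimal sketch of the predicate logic (`PortDesc` and `MemAccessSketch` are placeholders, not the real `PortDescriptor`/`MemoryAccess` classes):

```cpp
#include <cassert>
#include <cstddef>
#include <map>

struct PortDesc {
    std::size_t count = 1;   // elements accessed per call
    std::size_t offset = 0;  // offset from the port's data pointer
};

struct MemAccessSketch {
    // [port_num, port_desc]: only ports present in the map are memory accesses
    std::map<std::size_t, PortDesc> input_ports;
    std::map<std::size_t, PortDesc> output_ports;
    std::size_t num_inputs = 0, num_outputs = 0;

    bool is_memory_access_input_port(std::size_t idx) const {
        return input_ports.count(idx) != 0;
    }
    bool is_memory_access_output_port(std::size_t idx) const {
        return output_ports.count(idx) != 0;
    }
    // All input and output ports are memory accesses
    bool is_full_memory_access_op() const {
        return input_ports.size() == num_inputs &&
               output_ports.size() == num_outputs;
    }
};
```

A plain vector would force every port to carry a descriptor; the map makes "this op reads memory only on port 0" directly representable.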

@ -4,9 +4,9 @@
#pragma once #pragma once
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -15,7 +15,7 @@ namespace op {
* @brief Generated by Canonicalization and represents not-an-operation * @brief Generated by Canonicalization and represents not-an-operation
* @ingroup snippets * @ingroup snippets
*/ */
class Nop : public ngraph::op::Op { class Nop : public ov::op::Op {
public: public:
OPENVINO_OP("Nop", "SnippetsOpset"); OPENVINO_OP("Nop", "SnippetsOpset");
@ -29,4 +29,4 @@ public:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -4,11 +4,11 @@
#pragma once #pragma once
#include <ngraph/op/op.hpp> #include "openvino/op/op.hpp"
#include <ngraph/op/power.hpp> #include <ngraph/op/power.hpp>
#include <snippets/snippets_isa.hpp> #include <snippets/snippets_isa.hpp>
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -41,4 +41,4 @@ private:
}; };
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -4,10 +4,10 @@
#pragma once #pragma once
#include "ngraph/op/op.hpp" #include "openvino/op/op.hpp"
#include "ngraph/op/constant.hpp" #include "ngraph/op/constant.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -41,4 +41,4 @@ public:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -0,0 +1,37 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "openvino/op/op.hpp"
#include <snippets/snippets_isa.hpp>
#include <snippets/lowered/expression.hpp>
namespace ov {
namespace snippets {
namespace op {
/**
* @interface SerializationNode
* @brief Fake node needed to serialize lowered::Expression session IR
* @ingroup snippets
*/
class SerializationNode : public ov::op::Op {
public:
OPENVINO_OP("SerializationNode", "SnippetsOpset");
SerializationNode() = default;
SerializationNode(const Output<Node> &arg, const std::shared_ptr<lowered::Expression>& expr);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector &new_args) const override;
bool visit_attributes(AttributeVisitor &visitor) override;
private:
std::shared_ptr<lowered::Expression> m_expr;
};
} // namespace op
} // namespace snippets
} // namespace ov

@ -4,10 +4,10 @@
#pragma once #pragma once
#include <ngraph/op/op.hpp> #include "openvino/op/op.hpp"
#include "snippets/op/memory_access.hpp" #include "snippets/op/memory_access.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -37,4 +37,4 @@ public:
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@ -8,13 +8,13 @@
#include <openvino/core/model.hpp> #include <openvino/core/model.hpp>
#include <openvino/op/util/sub_graph_base.hpp> #include <openvino/op/util/sub_graph_base.hpp>
#include <ngraph/op/op.hpp> #include "openvino/op/op.hpp"
#include <ngraph/rt_info.hpp> #include "openvino/core/rt_info.hpp"
#include <ngraph/pass/manager.hpp> #include <ngraph/pass/manager.hpp>
#include "snippets/generator.hpp" #include "snippets/generator.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace op { namespace op {
@ -69,7 +69,7 @@ public:
// //
// D = < 1, 3, 17, 15, 32> < 0, 1, 2, 3, 4> // D = < 1, 3, 17, 15, 32> < 0, 1, 2, 3, 4>
// E = < 1, 3, 17, 1, 32> < 0, 1, 2, 3, 4> // E = < 1, 3, 17, 1, 32> < 0, 1, 2, 3, 4>
using BlockedShape = std::tuple<ngraph::PartialShape, ngraph::AxisVector, ngraph::element::Type>; using BlockedShape = std::tuple<ov::PartialShape, ov::AxisVector, ov::element::Type>;
using BlockedShapeVector = std::vector<BlockedShape>; using BlockedShapeVector = std::vector<BlockedShape>;
Subgraph() = default; Subgraph() = default;
@ -92,24 +92,25 @@ public:
const ov::Model& body() const { return *m_bodies[0]; } const ov::Model& body() const { return *m_bodies[0]; }
ov::Model& body() { return *m_bodies[0]; } ov::Model& body() { return *m_bodies[0]; }
const std::shared_ptr<ngraph::snippets::Generator>& get_generator() const { return m_generator; } const std::shared_ptr<ov::snippets::Generator>& get_generator() const { return m_generator; }
std::shared_ptr<ngraph::snippets::Generator> & get_generator() { return m_generator; } std::shared_ptr<ov::snippets::Generator>& get_generator() { return m_generator; }
size_t get_buffer_scratchpad_size() const { return m_buffer_scratchpad; } size_t get_buffer_scratchpad_size() const { return m_buffer_scratchpad; }
size_t get_virtual_port_count() const { return m_virtual_port_count; } size_t get_virtual_port_count() const { return m_virtual_port_count; }
bool is_buffer_needed() const { return m_buffer_needed; }
bool is_quantized() const { return config.m_is_quantized; } bool is_quantized() const { return config.m_is_quantized; }
bool has_domain_sensitive_ops() const { return config.m_has_domain_sensitive_ops; } bool has_domain_sensitive_ops() const { return config.m_has_domain_sensitive_ops; }
snippets::Schedule generate(const BlockedShapeVector& output_shapes, snippets::Schedule generate(const BlockedShapeVector& output_shapes,
const BlockedShapeVector& input_shapes, const BlockedShapeVector& input_shapes,
ngraph::pass::Manager& pre_dialect, ov::pass::Manager& pre_common,
ngraph::pass::Manager& post_dialect, ov::pass::Manager& post_common,
ngraph::pass::Manager& post_precision, ov::pass::Manager& post_precision,
lowered::pass::PassPipeline& target_lowered_pipeline,
const void* compile_params = nullptr); const void* compile_params = nullptr);
snippets::Schedule generate(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes, const void* compile_params = nullptr); snippets::Schedule generate(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes, const void* compile_params = nullptr);
snippets::Schedule generate(ngraph::pass::Manager& pre_dialect, snippets::Schedule generate(ov::pass::Manager& pre_common,
ngraph::pass::Manager& post_dialect, ov::pass::Manager& post_common,
ngraph::pass::Manager& post_precision, ov::pass::Manager& post_precision,
lowered::pass::PassPipeline& target_lowered_pipeline,
const void* compile_params = nullptr); const void* compile_params = nullptr);
snippets::Schedule generate(const void* compile_params = nullptr); snippets::Schedule generate(const void* compile_params = nullptr);
ov::PartialShape canonicalize(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes); ov::PartialShape canonicalize(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes);
@ -118,10 +119,9 @@ public:
// plugin sets generator for a snippet to some specific generator. // plugin sets generator for a snippet to some specific generator.
// it's going to be replaced with Jitters table later // it's going to be replaced with Jitters table later
void set_generator(std::shared_ptr<ngraph::snippets::Generator> generator); void set_generator(std::shared_ptr<ov::snippets::Generator> generator);
void set_tile_rank(size_t newRank) {tileRank = newRank;} void set_tile_rank(size_t newRank) {tileRank = newRank;}
void set_virtual_port_count(const size_t count); void set_virtual_port_count(const size_t count);
void set_buffer_needed(const bool need);
void print() const; void print() const;
void print_statistics(bool verbose); void print_statistics(bool verbose);
@ -129,7 +129,7 @@ public:
void serialize() const; void serialize() const;
void set_master_shape(ov::PartialShape new_shape) {master_shape = std::move(new_shape);} void set_master_shape(ov::PartialShape new_shape) {master_shape = std::move(new_shape);}
static auto wrap_node_as_subgraph(const std::shared_ptr<ngraph::Node>& node) -> std::shared_ptr<Subgraph>; static auto wrap_node_as_subgraph(const std::shared_ptr<ov::Node>& node) -> std::shared_ptr<Subgraph>;
static void fill_empty_output_names(const Output<Node>& target_output_node, const Output<Node>& replacement_output_node); static void fill_empty_output_names(const Output<Node>& target_output_node, const Output<Node>& replacement_output_node);
// Non-scalar Constants are tokenized as Parameters inside Subgraph body but some operations with constant inputs // Non-scalar Constants are tokenized as Parameters inside Subgraph body but some operations with constant inputs
@ -138,23 +138,23 @@ public:
static auto constant_input_should_be_inside_body(const std::shared_ptr<ov::Node>& node) -> bool; static auto constant_input_should_be_inside_body(const std::shared_ptr<ov::Node>& node) -> bool;
static bool check_broadcast(const std::shared_ptr<const ov::Node>& node) noexcept; static bool check_broadcast(const std::shared_ptr<const ov::Node>& node) noexcept;
// Return estimated unique buffer count (upper bound). It's needed for tokenization
static auto get_estimated_buffer_count(const ov::NodeVector& ops) -> size_t;
static auto is_domain_sensitive_op(const std::shared_ptr<ov::Node>& op) -> bool;
private: private:
void align_element_types(const BlockedShapeVector& outputShapes, const BlockedShapeVector& inputShapes); void align_element_types(const BlockedShapeVector& outputShapes, const BlockedShapeVector& inputShapes);
void convert_to_snippet_dialect(); void data_flow_transformations(ov::pass::Manager& pre_common, ov::pass::Manager& post_common, ov::pass::Manager& post_precision);
void control_flow_transformations(lowered::LinearIR& linear_ir, lowered::pass::PassPipeline& target_pipeline);
void init_config(); void init_config();
void initialize_buffer_scratchpad_size();
// Count of Subgraph virtual ports: // Count of Subgraph virtual ports:
// - Potential non-scalar Constants that will be created after some transformations (At the moment it's relevant only for FakeQuantize decomposition) // - Potential non-scalar Constants that will be created after some transformations (At the moment it's relevant only for FakeQuantize decomposition)
// Need Buffer op or not
// - Buffers. All Buffers are considered as one common additional virtual port. So we cannot summarize them as potential non-scalar Constants
// NOTE: To avoid overheads in each calculation of this count (for example, in validate_and_type_infer()), // NOTE: To avoid overheads in each calculation of this count (for example, in validate_and_type_infer()),
// we should MANUALLY calculate it where it is needed. // we should MANUALLY calculate it where it is needed.
size_t m_virtual_port_count = 0; size_t m_virtual_port_count = 0;
bool m_buffer_needed = false;
size_t m_buffer_scratchpad = 0lu; size_t m_buffer_scratchpad = 0lu;
Shape exec_domain = {}; Shape exec_domain = {};
std::shared_ptr<ngraph::snippets::Generator> m_generator = nullptr; std::shared_ptr<ov::snippets::Generator> m_generator = nullptr;
ov::PartialShape master_shape; ov::PartialShape master_shape;
size_t tileRank = 0; // set by plugin to specify the number of dimensions processed in a single kernel call size_t tileRank = 0; // set by plugin to specify the number of dimensions processed in a single kernel call
@ -171,9 +171,6 @@ private:
// True if body has operations that don't support plugin-side domain optimizations // True if body has operations that don't support plugin-side domain optimizations
// (e.g. Transpose, Softmax, MatMul in general doesn't support dimensions collapsing) // (e.g. Transpose, Softmax, MatMul in general doesn't support dimensions collapsing)
bool m_has_domain_sensitive_ops = false; bool m_has_domain_sensitive_ops = false;
// True if we should go through whole body to check for where loops should be explicitly inserted.
// Otherwise, we insert Loops on Parameters and Results - for example, it's optimized out for subgraph with only Eltwise ops
bool m_explicit_loop_insertion = false;
} config; } config;
}; };
@ -182,13 +179,13 @@ static inline std::ostream& operator<<(std::ostream& os, const op::Subgraph::Blo
return os; return os;
} }
static inline auto create_body(std::string name, const ngraph::ResultVector& results, const ngraph::ParameterVector& parameters) -> static inline auto create_body(std::string name, const ov::ResultVector& results, const ov::ParameterVector& parameters) ->
std::shared_ptr<ov::Model> { std::shared_ptr<ov::Model> {
auto body = std::make_shared<ov::Model>(results, parameters, name); auto body = std::make_shared<ov::Model>(results, parameters, name);
return body; return body;
}; };
static inline auto build_subgraph(const std::shared_ptr<ngraph::Node>& node, const ngraph::OutputVector& inputs, static inline auto build_subgraph(const std::shared_ptr<ov::Node>& node, const ov::OutputVector& inputs,
const std::shared_ptr<ov::Model>& body, const std::string name = "") const std::shared_ptr<ov::Model>& body, const std::string name = "")
-> std::shared_ptr<Subgraph>{ -> std::shared_ptr<Subgraph>{
auto subgraph = std::make_shared<Subgraph>(inputs, body); auto subgraph = std::make_shared<Subgraph>(inputs, body);
@ -197,16 +194,16 @@ static inline auto build_subgraph(const std::shared_ptr<ngraph::Node>& node, con
return subgraph; return subgraph;
}; };
// Need to update tensor name manually, since intel_cpu::Graph::Replicate() looks at input.get_tensor().get_name(); // Need to update tensor name manually, since intel_cpu::Graph::Replicate() looks at input.get_tensor().get_name();
// If subgraph->get_output_size() == 1, then the name will be restored correctly from the node name // If subgraph->get_output_size() == 1, then the name will be restored correctly from the node name
auto inline update_out_tensor_name(const std::shared_ptr<ngraph::snippets::op::Subgraph>& subgraph) -> void { auto inline update_out_tensor_name(const std::shared_ptr<ov::snippets::op::Subgraph>& subgraph) -> void {
bool not_set = true; bool not_set = true;
for (unsigned int i = 0; i < subgraph->get_output_size() && not_set; i++) { for (unsigned int i = 0; i < subgraph->get_output_size() && not_set; i++) {
for (const auto &in : subgraph->get_output_target_inputs(i)) { for (const auto& in : subgraph->get_output_target_inputs(i)) {
if (ov::is_type<ov::op::v0::Result>(in.get_node())) { if (ov::is_type<ov::op::v0::Result>(in.get_node())) {
const auto& body_result = subgraph->body_ptr()->get_output_op(i); const auto& body_result = subgraph->body_ptr()->get_output_op(i);
const auto& body_result_input = body_result->get_input_source_output(0); const auto& body_result_input = body_result->get_input_source_output(0);
ngraph::snippets::op::Subgraph::fill_empty_output_names( ov::snippets::op::Subgraph::fill_empty_output_names(
subgraph->output(i), body_result_input); subgraph->output(i), body_result_input);
not_set = false; not_set = false;
break; break;
@ -217,4 +214,4 @@ auto inline update_out_tensor_name(const std::shared_ptr<ngraph::snippets::op::S
} // namespace op } // namespace op
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

View File

@@ -1,12 +1,12 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/op/op.hpp>
+#include "openvino/op/op.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace op {
@@ -15,7 +15,7 @@ namespace op {
 * @brief The operation is for intermediate data storage in vector register
 * @ingroup snippets
 */
-class VectorBuffer : public ngraph::op::Op {
+class VectorBuffer : public ov::op::Op {
 public:
     OPENVINO_OP("VectorBuffer", "SnippetsOpset");
@@ -31,4 +31,4 @@ private:
 } // namespace op
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@ -1,34 +0,0 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/pass.hpp>
#include "snippets/generator.hpp"
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface AssignRegisters
* @brief Assigns internal `vector` register indexes to operations.
 * Changing the order of variables or dataflow leads to invalidation of register assignment.
* @ingroup snippets
*/
class AssignRegisters : public ngraph::pass::FunctionPass {
public:
explicit AssignRegisters(const std::function<Generator::opRegType(const std::shared_ptr<Node>& op)>& mapper) : m_reg_type_mapper(mapper) {
set_property(ngraph::pass::PassProperty::REQUIRE_STATIC_SHAPE, true);
}
bool run_on_model(const std::shared_ptr<ov::Model>& m) override;
private:
std::function<Generator::opRegType(const std::shared_ptr<Node>& op)> m_reg_type_mapper;
};
} // namespace pass
} // namespace snippets
} // namespace ngraph

View File

@@ -1,13 +1,13 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/pattern/matcher.hpp"
+#include "openvino/pass/graph_rewrite.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -17,7 +17,7 @@ namespace pass {
 *        Otherwise the pass removes Broadcast operation.
 * @ingroup snippets
 */
-class BroadcastToMoveBroadcast: public ngraph::pass::MatcherPass {
+class BroadcastToMoveBroadcast: public ov::pass::MatcherPass {
 public:
     BroadcastToMoveBroadcast();
 };
@@ -25,4 +25,4 @@ public:
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -4,12 +4,11 @@
 #pragma once
-#include <ngraph/ngraph.hpp>
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -35,16 +34,16 @@ namespace pass {
 *        Scalar constants are placed as is into subgraph due to optimization purpose
 * @ingroup snippets
 */
-class TokenizeSnippets: public ngraph::pass::MatcherPass {
+class TokenizeSnippets: public ov::pass::MatcherPass {
 public:
     OPENVINO_RTTI("TokenizeSnippets", "0");
     explicit TokenizeSnippets();
     static bool AppropriateForSubgraph(const std::shared_ptr<const Node>&);
-    static const std::set<ngraph::element::Type> supported_element_types;
+    static const std::set<ov::element::Type> supported_element_types;
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -1,22 +1,22 @@
-// Copyright (C) 2022 Intel Corporation
+// Copyright (C) 2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
-class CommonOptimizations : public ngraph::pass::MatcherPass {
+class CommonOptimizations : public ov::pass::MatcherPass {
 public:
-    NGRAPH_RTTI_DECLARATION;
+    OPENVINO_RTTI("CommonOptimizations", "0");
     CommonOptimizations();
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -1,13 +1,13 @@
-// Copyright (C) 2022 Intel Corporation
+// Copyright (C) 2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -17,11 +17,11 @@ namespace pass {
 *        Only single-value (0D) constants are currently supported.
 * @ingroup snippets
 */
-class ConvertConstantsToScalars: public ngraph::pass::MatcherPass {
+class ConvertConstantsToScalars: public ov::pass::MatcherPass {
 public:
     ConvertConstantsToScalars();
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -1,13 +1,13 @@
-// Copyright (C) 2022 Intel Corporation
+// Copyright (C) 2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -16,11 +16,11 @@ namespace pass {
 * @brief Replace Power with a scalar input with snippets::op::PowerStatic for generation of a more optimal code.
 * @ingroup snippets
 */
-class ConvertPowerToPowerStatic: public ngraph::pass::MatcherPass {
+class ConvertPowerToPowerStatic: public ov::pass::MatcherPass {
 public:
     ConvertPowerToPowerStatic();
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -1,13 +1,13 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -22,11 +22,11 @@ namespace pass {
 *        change Transpose order to {0, 2, 3, 1} which is supported by Snippets
 * @ingroup snippets
 */
-class ExplicitTransposeMatMulInputs: public ngraph::pass::MatcherPass {
+class ExplicitTransposeMatMulInputs: public ov::pass::MatcherPass {
 public:
     ExplicitTransposeMatMulInputs();
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -4,13 +4,12 @@
 #pragma once
-#include "ngraph/op/fake_quantize.hpp"
-#include "ngraph/pass/graph_rewrite.hpp"
-#include "ngraph/pass/constant_folding.hpp"
+#include "openvino/op/fake_quantize.hpp"
+#include "openvino/pass/graph_rewrite.hpp"
 #include "snippets/pass/transform_convert.hpp"
 #include "transformations_visibility.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -50,18 +49,18 @@ namespace pass {
 *
 */
-class FakeQuantizeDecomposition : public ngraph::pass::MatcherPass {
+class FakeQuantizeDecomposition : public ov::pass::MatcherPass {
 public:
     FakeQuantizeDecomposition();
-    static bool getScalesAndShifts(const std::shared_ptr<const ngraph::op::v0::FakeQuantize>& fq_node,
+    static bool getScalesAndShifts(const std::shared_ptr<const ov::op::v0::FakeQuantize>& fq_node,
                                    std::vector<float>& cl,
                                    std::vector<float>& ch,
                                    std::vector<float>& isc,
                                    std::vector<float>& ish,
                                    std::vector<float>& osc,
                                    std::vector<float>& osh);
-    static std::vector<float> calculateScales(const ngraph::element::Type& out_type,
+    static std::vector<float> calculateScales(const ov::element::Type& out_type,
                                               const std::vector<float>& cl,
                                               const std::vector<float>& ch,
                                               const std::vector<float>& isc,
@@ -80,11 +79,11 @@ public:
 * 2. ConstantFolding
 * 3. Validate
 */
-class CommonFakeQuantizeDecomposition: public ngraph::pass::FunctionPass {
+class CommonFakeQuantizeDecomposition: public ov::pass::ModelPass {
 public:
-    bool run_on_model(const std::shared_ptr<ngraph::Function>& m) override;
+    bool run_on_model(const std::shared_ptr<ov::Model>& m) override;
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov
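For orientation, the `cl/ch/isc/ish/osc/osh` vectors extracted by `getScalesAndShifts` turn the spec FakeQuantize formula into a scale/shift form that is cheap to emit. A minimal scalar sketch of that equivalence (an illustration only; the real helper works per channel on vectors, and the function names here are assumptions):

```cpp
#include <cassert>
#include <cmath>

// Reference FakeQuantize, per the operation spec:
//   y = round((x - il) / (ih - il) * (levels - 1)) / (levels - 1) * (oh - ol) + ol
float fake_quantize(float x, float il, float ih, float ol, float oh, int levels) {
    if (x <= il) return ol;
    if (x > ih)  return oh;
    return std::round((x - il) / (ih - il) * (levels - 1)) / (levels - 1) * (oh - ol) + ol;
}

// The same computation in scale/shift form, y = round(x * isc + ish) * osc + osh, with
//   isc = (levels - 1) / (ih - il),  ish = -il * isc,
//   osc = (oh - ol) / (levels - 1),  osh = ol.
float fake_quantize_scaled(float x, float il, float ih, float ol, float oh, int levels) {
    const float isc = (levels - 1) / (ih - il);
    const float ish = -il * isc;
    const float osc = (oh - ol) / (levels - 1);
    const float osh = ol;
    if (x <= il) return ol;
    if (x > ih)  return oh;
    return std::round(x * isc + ish) * osc + osh;
}
```

Algebraically, `round((x - il) / (ih - il) * (levels - 1))` equals `round(x * isc + ish)`, so the two forms match term by term.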

View File

@@ -1,13 +1,17 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include "ngraph/pass/graph_rewrite.hpp"
-#include "ngraph/pattern/matcher.hpp"
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
+
+#include "openvino/op/transpose.hpp"
+#include "snippets/lowered/port_descriptor.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -18,13 +22,16 @@ namespace pass {
 *        but only 0213 Transpose is currently supported.
 * @ingroup snippets
 */
-class FuseTransposeBrgemm: public ngraph::pass::MatcherPass {
+class FuseTransposeBrgemm: public ov::pass::MatcherPass {
 public:
     OPENVINO_RTTI("FuseTransposeBrgemm", "0");
     FuseTransposeBrgemm();
     static const std::set<std::vector<int>> supported_cases;
+
+private:
+    static bool is_supported_transpose(const Output<Node>& transpose_port);
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@ -1,30 +0,0 @@
// Copyright (C) 2018-2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/matcher.hpp>
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface InsertBuffer
 * @brief The pass inserts Buffers on the inputs and outputs of special operations [Softmax, Transpose] if needed
 * @param allocation_rank - rank of shape for Buffer memory allocation: shape[shape_rank - normalize(m_allocation_rank) : shape_rank].
 *        It is needed to allocate the required memory size, which depends on the Tile rank, for example.
 *        Default value is -1 (full shape)
* @ingroup snippets
*/
class InsertBuffer: public ngraph::pass::MatcherPass {
public:
InsertBuffer(const int32_t allocation_rank = -1);
};
} // namespace pass
} // namespace snippets
} // namespace ngraph
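The `allocation_rank` parameter above selects how many innermost dimensions of the shape are actually allocated. A hypothetical helper illustrating the `shape[shape_rank - normalize(m_allocation_rank) : shape_rank]` slicing (the function name and the exact normalization of negative ranks are assumptions for illustration, not the pass's code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of elements a Buffer would allocate for the innermost `allocation_rank`
// dimensions of `shape`; a negative rank counts from the full rank (-1 => full shape).
size_t buffer_allocation_size(const std::vector<size_t>& shape, int32_t allocation_rank) {
    const int32_t rank = static_cast<int32_t>(shape.size());
    // Assumed normalization: -1 -> rank (full shape), -2 -> rank - 1, etc.
    const int32_t norm = allocation_rank < 0 ? allocation_rank + rank + 1 : allocation_rank;
    size_t size = 1;
    for (int32_t i = rank - norm; i < rank; ++i)
        size *= shape[i];  // product of the selected innermost dimensions
    return size;
}
```

For example, with a {2, 3, 4, 5} shape, rank -1 allocates the full 120 elements, while rank 2 allocates only the innermost 4 * 5 = 20.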

View File

@ -1,39 +0,0 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/matcher.hpp>
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface InsertLoad
* @brief Inserts explicit load instruction after each parameter and buffer.
* The pass is used to convert model to a canonical form for code generation
* @ingroup snippets
*/
class InsertLoad: public ngraph::pass::MatcherPass {
public:
InsertLoad(const size_t count = 1lu);
};
/**
* @interface InsertStore
* @brief Inserts explicit store instruction before each result and buffer.
* The pass is used to convert model to a canonical form for code generation
* @ingroup snippets
*/
class InsertStore: public ngraph::pass::MatcherPass {
public:
InsertStore(const size_t count = 1lu);
};
} // namespace pass
} // namespace snippets
} // namespace ngraph

View File

@ -1,43 +0,0 @@
// Copyright (C) 2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/matcher.hpp>
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface InsertLoops
* @brief Insert explicit Loop operations into the body to process multiple data entities during one kernel execution
* @param master_shape - shape used to determine loop work amounts
* @param loop_depth - the number of last master_shape dimensions processed by loops (aka tileRank - obsolete), could be 1 or 2
* @param vector_size - the number of entities processed on one iteration of vector loop
 * @param single_loop_body - true if we can just insert LoopBegin on inputs and LoopEnd on outputs, otherwise
 *        the pass goes over the whole body analyzing where LoopBegin and LoopEnd should be inserted:
 *        synchronization nodes are MatMul, Buffer and other already existing Loops.
* @ingroup snippets
*/
class InsertLoops: public ngraph::pass::FunctionPass {
public:
OPENVINO_RTTI("InsertLoops", "0");
InsertLoops(ov::PartialShape master_shape, size_t loop_depth, size_t vector_size, bool is_optimized = true);
bool run_on_model(const std::shared_ptr<ngraph::Function>& m) override;
static std::vector<bool> calculate_inner_apply_increments(const ov::PartialShape& master, const std::vector<ov::PartialShape>& shapes);
static std::vector<bool> calculate_outer_apply_increments(const std::vector<ov::PartialShape>& shapes);
static std::vector<int64_t> calculate_finalization_offsets(const ov::PartialShape& master, const std::vector<ov::PartialShape>& shapes);
private:
ov::PartialShape m_master_shape;
size_t m_loop_depth;
size_t m_vector_size;
bool m_single_loop_body;
};
} // namespace pass
} // namespace snippets
} // namespace ngraph
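`calculate_inner_apply_increments` above decides, per input, whether the innermost loop should advance that input's data pointer: a dimension broadcast to size 1 must not be incremented, because every iteration rereads the same element. A hypothetical re-implementation of that idea on plain integer shapes (not the actual pass code, which works on `ov::PartialShape`):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each shape, increment its data pointer in the inner loop only if the
// innermost dimension is not broadcast, i.e. it matches the master dimension
// rather than being 1.
std::vector<bool> inner_apply_increments(const std::vector<size_t>& master,
                                         const std::vector<std::vector<size_t>>& shapes) {
    const size_t inner_dim = master.size() - 1;
    std::vector<bool> apply;
    apply.reserve(shapes.size());
    for (const auto& shape : shapes)
        apply.push_back(shape[inner_dim] != 1 && master[inner_dim] != 1);
    return apply;
}
```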

View File

@ -4,10 +4,10 @@
#pragma once #pragma once
#include <ngraph/pass/graph_rewrite.hpp> #include "openvino/pass/graph_rewrite.hpp"
#include <ngraph/pattern/matcher.hpp> #include "openvino/pass/pattern/matcher.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace pass { namespace pass {
@ -17,15 +17,15 @@ namespace pass {
* The pass is used to convert model to a canonical form for code generation * The pass is used to convert model to a canonical form for code generation
* @ingroup snippets * @ingroup snippets
*/ */
class InsertMoveBroadcast: public ngraph::pass::MatcherPass { class InsertMoveBroadcast: public ov::pass::MatcherPass {
public: public:
InsertMoveBroadcast(); InsertMoveBroadcast();
static Output<ngraph::Node> BroadcastNodeLastDim(const ngraph::Output<ngraph::Node>& value, static Output<ov::Node> BroadcastNodeLastDim(const ov::Output<ov::Node>& value,
const ov::PartialShape& target_shape, const ov::PartialShape& target_shape,
const ov::PartialShape& normalized_shape); const ov::PartialShape& normalized_shape);
}; };
} // namespace pass } // namespace pass
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

View File

@ -1,29 +0,0 @@
// Copyright (C) 2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/matcher.hpp>
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface LoopFusion
* @brief Fuse Loops into one Loop if their semantics allow it
* @ingroup snippets
*/
class LoopFusion: public ngraph::pass::MatcherPass {
public:
LoopFusion();
private:
bool Merge(const std::shared_ptr<op::LoopBegin>& buffer);
};
} // namespace pass
} // namespace snippets
} // namespace ngraph

View File

@ -1,99 +0,0 @@
// Copyright (C) 2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "ngraph/op/op.hpp"
#include "ngraph/op/parameter.hpp"
#include "snippets/op/loop.hpp"
namespace ngraph {
namespace snippets {
namespace op {
/* ==== LoopBegin === */
/**
* @interface insertLoopBeginAfterOutputs
* @brief Inserts LoopBegin operation after the group of operations described
* by the input argument (OutputVector). Use insertLoopBegin instead - it has a more universal interface.
* @ingroup snippets
*/
std::shared_ptr<LoopBegin> insertLoopBeginAfterOutputs(const OutputVector& originalOutputs);
/**
* @interface insertLoopBegin
* @brief Inserts LoopBegin operation after the group of operations described
* by the input argument (ParameterVector, NodeVector or OutputVector).
* @ingroup snippets
*/
template<typename T>
std::shared_ptr<LoopBegin> insertLoopBegin(const T& afterTheseNodes) {
static_assert(std::is_same<T, ParameterVector>() || std::is_same<T, NodeVector>(),
"Unsupported template parameter for insertLoopBegin. Only ParameterVector or NodeVector is allowed");
OutputVector originalOutputs;
std::vector<std::set<Input<Node>>> childInputs;
for (const auto &n : afterTheseNodes) {
const auto& nodeOutputs = n->outputs();
// Ignore the LoopBegin->LoopEnd edge to make it easier to construct enclosed Loops
std::move(nodeOutputs.begin(), nodeOutputs.end() - 1 * ov::is_type<LoopBegin>(n), std::back_inserter(originalOutputs));
}
return insertLoopBeginAfterOutputs(originalOutputs);
}
template<>
inline std::shared_ptr<LoopBegin> insertLoopBegin(const OutputVector& afterTheseNodes) {
return insertLoopBeginAfterOutputs(afterTheseNodes);
}
/* ============== */
/* ==== LoopEnd === */
/**
 * @interface insertLoopEndBeforeInputs
 * @brief Inserts LoopEnd operation before the group of operations described
 * by the input argument (vector of inputs). Use insertLoopEnd instead - it has a more universal interface.
* @param originalInputs LoopEnd will be inserted before these inputs
* @param loopBegin pointer to the beginning of the Loop region
* @param work_amount total number of evaluations to be processed by the loop
* @param increment number of evaluations processed in one iteration of the loop
* @param apply_increment describes which data pointers attributed to the loop should be incremented on every iteration.
* should be used when Loop is connected to Parameters and/or Results
* @param finalization_offsets pointer shifts that should be applied to data pointers before exiting the loop
* @ingroup snippets
*/
std::shared_ptr<LoopEnd> insertLoopEndBeforeInputs(const std::vector<Input<Node>>& originalInputs,
const std::shared_ptr<LoopBegin>& loopBegin,
size_t work_amount, size_t increment,
std::vector<bool> apply_increment = {},
std::vector<int64_t> finalization_offsets = {});
/**
* @interface insertLoopEnd
* @brief Inserts LoopEnd operation before the group of operations described
* by the input argument (ResultVector, NodeVector or OutputVector).
* @ingroup snippets
*/
template<typename T, typename ...Args>
std::shared_ptr<LoopEnd> insertLoopEnd(const T& beforeTheseNodes, Args ...args) {
static_assert(std::is_same<T, ResultVector>() || std::is_same<T, NodeVector>(),
"Unsupported template parameter for insertLoopEnd. Only ResultVector or NodeVector is allowed");
std::vector<Input<Node>> originalInputs;
for (const auto &n : beforeTheseNodes) {
const auto& nodeInputs = n->inputs();
// Ignore the LoopBegin->LoopEnd edge to facilitate enclosed Loops construction
std::move(nodeInputs.begin(), nodeInputs.end() - 1 * ov::is_type<LoopEnd>(n), std::back_inserter(originalInputs));
}
return insertLoopEndBeforeInputs(originalInputs, args...);
}
template<typename ...Args>
std::shared_ptr<LoopEnd> insertLoopEnd(const std::vector<Input<Node>>& beforeTheseNodes, Args ...args) {
return insertLoopEndBeforeInputs(beforeTheseNodes, args...);
}
/* ============== */
} // namespace op
} // namespace snippets
} // namespace ngraph
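The `work_amount`, `increment`, `apply_increment` and `finalization_offsets` parameters documented for `insertLoopEndBeforeInputs` fully determine the pointer arithmetic of a generated loop. An illustrative simulation of those semantics on abstract byte offsets (a sketch of the runtime behavior, not the emitted code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each iteration processes `increment` elements and advances the pointers
// flagged in apply_increment; finalization_offsets are applied once on loop
// exit, e.g. to rewind a pointer before the next loop reads the data back.
std::vector<int64_t> run_loop(size_t work_amount, size_t increment,
                              std::vector<int64_t> ptr_offsets,
                              const std::vector<bool>& apply_increment,
                              const std::vector<int64_t>& finalization_offsets) {
    for (size_t done = 0; done + increment <= work_amount; done += increment)
        for (size_t i = 0; i < ptr_offsets.size(); ++i)
            if (apply_increment[i])
                ptr_offsets[i] += static_cast<int64_t>(increment);
    for (size_t i = 0; i < ptr_offsets.size(); ++i)
        ptr_offsets[i] += finalization_offsets[i];
    return ptr_offsets;
}
```

For example, a loop with `work_amount = 16` and `increment = 4` advances an incremented pointer by 16; a finalization offset of -16 rewinds it to where it started.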

View File

@@ -1,28 +1,33 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include "ngraph/pass/graph_rewrite.hpp"
-#include "ngraph/pattern/matcher.hpp"
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
+
+#include "snippets/op/brgemm.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
 /**
 * @interface MatMulToBrgemm
-* @brief Replaces ngraph::MatMul with snippets::op::Brgemm operation (only non-transposing MatMuls are currently supported)
+* @brief Replaces ov::MatMul with snippets::op::Brgemm operation (only non-transposing MatMuls are currently supported)
 * @ingroup snippets
 */
-class MatMulToBrgemm: public ngraph::pass::MatcherPass {
+class MatMulToBrgemm: public ov::pass::MatcherPass {
 public:
     OPENVINO_RTTI("MatMulToBrgemm", "0");
     MatMulToBrgemm();
+
+private:
+    void init_ports(const std::shared_ptr<op::Brgemm>& brgemm) const;
 };
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -1,13 +1,13 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -17,7 +17,7 @@ namespace pass {
 *        TODO: Write pattern
 * @ingroup snippets
 */
-class TokenizeMHASnippets: public ngraph::pass::MatcherPass {
+class TokenizeMHASnippets: public ov::pass::MatcherPass {
 public:
     OPENVINO_RTTI("TokenizeMHASnippets", "0");
     TokenizeMHASnippets();
@@ -25,4 +25,4 @@ public:
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -8,7 +8,7 @@
 #include <ngraph/pass/pass.hpp>
 #include "snippets/generator.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -17,7 +17,7 @@ namespace pass {
 * @ingroup snippets
 * @brief PropagatePrecision transformation propagate precision from parameters to results.
 */
-class PropagatePrecision: public ngraph::pass::FunctionPass {
+class PropagatePrecision: public ov::pass::ModelPass {
 public:
     OPENVINO_RTTI("PropagatePrecision", "0");
     PropagatePrecision(const std::shared_ptr<const TargetMachine>& target_machine);
@@ -39,7 +39,7 @@ public:
         const element::Type& actual,
         const element::Type& required) noexcept;
-    static bool validate_and_infer_types_and_restore_outputs(const std::shared_ptr<ngraph::Node>& op);
+    static bool validate_and_infer_types_and_restore_outputs(const std::shared_ptr<ov::Node>& op);
 private:
     const std::shared_ptr<const TargetMachine> target_machine;
@@ -47,4 +47,4 @@ private:
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@ -1,29 +0,0 @@
// Copyright (C) 2018-2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/matcher.hpp>
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface ResetBufferState
 * @brief If there is a Buffer between loops, we should reset the Buffer pointer after the first loop execution (data storing),
 *        using finalization offsets, so that the next loop loads the data from the correct buffer position
* @ingroup snippets
*/
class ResetBufferState: public ngraph::pass::MatcherPass {
public:
ResetBufferState();
static int64_t calculate_required_finalization_offsets(const size_t inner_master_work_amount, const size_t inner_target_work_amount);
};
} // namespace pass
} // namespace snippets
} // namespace ngraph

View File

@ -0,0 +1,26 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "openvino/pass/graph_rewrite.hpp"
#include "openvino/pass/pattern/matcher.hpp"
namespace ov {
namespace snippets {
namespace pass {
/**
* @interface SetSoftmaxPorts
* @brief The pass updates port descriptors in accordance with the Softmax reduction axis
* @ingroup snippets
*/
class SetSoftmaxPorts: public ov::pass::MatcherPass {
public:
SetSoftmaxPorts();
};
} // namespace pass
} // namespace snippets
} // namespace ov

View File

@ -1,30 +0,0 @@
// Copyright (C) 2018-2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/matcher.hpp>
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface SoftmaxDecomposition
 * @brief The pass decomposes Softmax into explicit Snippets dialect operations
 * Note:
 * - At the moment Snippets supports Softmax only in the MHA pattern, where there are Buffer ops before and after Softmax.
 *   Snippets also supports Loops with Buffer ops on inputs and outputs only if the Buffers have the same byte size,
 *   because of the pointer-increment logic. So we have to set the Tile rank as the buffer allocation rank even if rank 1 is enough
* @ingroup snippets
*/
class SoftmaxDecomposition: public ngraph::pass::MatcherPass {
public:
SoftmaxDecomposition(const size_t vector_size, const int32_t buffer_allocation_rank = -1);
};
} // namespace pass
} // namespace snippets
} // namespace ngraph
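Conceptually, the decomposition replaces Softmax with the numerically stable sequence ReduceMax, Subtract, Exp, ReduceSum, Divide (the real dialect additionally involves Loops, Buffers and Load/Store ops). A scalar sketch of that computation, as an illustration of the sequence rather than of the pass output:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Softmax over one row, written as the explicit op sequence a decomposition emits:
// ReduceMax -> Subtract -> Exp -> ReduceSum -> Divide.
std::vector<float> softmax_decomposed(const std::vector<float>& x) {
    float max_val = x[0];
    for (float v : x)
        max_val = std::max(max_val, v);       // ReduceMax (for numerical stability)
    std::vector<float> e(x.size());
    float sum = 0.f;
    for (size_t i = 0; i < x.size(); ++i) {
        e[i] = std::exp(x[i] - max_val);      // Subtract + Exp
        sum += e[i];                          // ReduceSum
    }
    for (auto& v : e)
        v /= sum;                             // Divide
    return e;
}
```

Subtracting the row maximum keeps `exp` from overflowing for large inputs while leaving the result unchanged.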

View File

@@ -1,13 +1,13 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -16,7 +16,7 @@ namespace pass {
 * @brief The pass removes Reshape operations around Softmax if possible
 * @ingroup snippets
 */
-class SoftmaxReshapeElimination: public ngraph::pass::MatcherPass {
+class SoftmaxReshapeElimination: public ov::pass::MatcherPass {
 public:
     SoftmaxReshapeElimination();
 };
@@ -24,4 +24,4 @@ public:
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@@ -1,16 +1,16 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 #pragma once
-#include <ngraph/pass/graph_rewrite.hpp>
-#include <ngraph/pattern/matcher.hpp>
+#include "openvino/pass/graph_rewrite.hpp"
+#include "openvino/pass/pattern/matcher.hpp"
 #include "snippets/pass/mha_tokenization.hpp"
 #include "snippets/pass/collapse_subgraph.hpp"
-namespace ngraph {
+namespace ov {
 namespace snippets {
 namespace pass {
@@ -46,7 +46,7 @@ public:
 * 4. Some common transformations for Subgraphs. For example, FakeQuantize decomposition
 * @ingroup snippets
 */
-class SnippetsTokenization : public ngraph::pass::FunctionPass {
+class SnippetsTokenization : public ov::pass::ModelPass {
 public:
     OPENVINO_RTTI("SnippetsTokenization", "0");
     bool run_on_model(const std::shared_ptr<ov::Model>& m) override;
@@ -55,4 +55,4 @@ public:
 } // namespace pass
 } // namespace snippets
-} // namespace ngraph
+} // namespace ov

View File

@ -4,10 +4,10 @@
#pragma once #pragma once
#include <ngraph/pass/graph_rewrite.hpp> #include "openvino/pass/graph_rewrite.hpp"
#include <ngraph/pattern/matcher.hpp> #include "openvino/pass/pattern/matcher.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace pass { namespace pass {
@@ -18,11 +18,11 @@ namespace pass {
* This op is used for real Convert ops inside subgraph body in CPU Plugin * This op is used for real Convert ops inside subgraph body in CPU Plugin
* @ingroup snippets * @ingroup snippets
*/ */
class TransformConvertToConvertTruncation: public ngraph::pass::MatcherPass { class TransformConvertToConvertTruncation: public ov::pass::MatcherPass {
public: public:
TransformConvertToConvertTruncation(); TransformConvertToConvertTruncation();
}; };
} // namespace pass } // namespace pass
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@@ -1,13 +1,13 @@
// Copyright (C) 2022 Intel Corporation // Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
// //
#pragma once #pragma once
#include <ngraph/pass/graph_rewrite.hpp> #include "openvino/pass/graph_rewrite.hpp"
#include <ngraph/pattern/matcher.hpp> #include "openvino/pass/pattern/matcher.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace pass { namespace pass {
@@ -16,7 +16,7 @@ namespace pass {
* @brief Decompose Transpose to Load + Store wrapped in several loops. * @brief Decompose Transpose to Load + Store wrapped in several loops.
* @ingroup snippets * @ingroup snippets
*/ */
class TransposeDecomposition: public ngraph::pass::MatcherPass { class TransposeDecomposition: public ov::pass::MatcherPass {
public: public:
OPENVINO_RTTI("TransposeDecomposition", "0"); OPENVINO_RTTI("TransposeDecomposition", "0");
TransposeDecomposition(); TransposeDecomposition();
@@ -25,4 +25,4 @@ public:
} // namespace pass } // namespace pass
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@@ -1,40 +0,0 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/matcher.hpp>
namespace ngraph {
namespace snippets {
namespace pass {
/**
* @interface SetScalarCountForLoad
* @brief Set count `1` for Load to represent as ScalarLoad
 * The pass is used to change the element count for loading to "1" so that a scalar value is loaded
* Used for tail generation
* @ingroup snippets
*/
class SetScalarCountForLoad: public ngraph::pass::MatcherPass {
public:
SetScalarCountForLoad();
};
/**
* @interface SetScalarCountForStore
* @brief Set count `1` for Store to represent as ScalarStore
 * The pass is used to change the element count for storing to "1" so that a scalar value is stored
* Used for tail generation
* @ingroup snippets
*/
class SetScalarCountForStore: public ngraph::pass::MatcherPass {
public:
SetScalarCountForStore();
};
} // namespace pass
} // namespace snippets
} // namespace ngraph

@@ -4,8 +4,8 @@
#pragma once #pragma once
#include "ngraph/ops.hpp" #include "openvino/core/node.hpp"
#include <ngraph/opsets/opset1.hpp> #include "openvino/opsets/opset1.hpp"
#include "op/broadcastload.hpp" #include "op/broadcastload.hpp"
#include "op/broadcastmove.hpp" #include "op/broadcastmove.hpp"
@@ -25,12 +25,12 @@
#include "op/brgemm.hpp" #include "op/brgemm.hpp"
#include "op/vector_buffer.hpp" #include "op/vector_buffer.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace isa { namespace isa {
#define NGRAPH_OP(a, b) using b::a; #define OV_OP(a, b) using b::a;
#include "snippets_isa_tbl.hpp" #include "snippets_isa_tbl.hpp"
#undef NGRAPH_OP #undef OV_OP
} // namespace isa } // namespace isa
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@@ -4,82 +4,82 @@
#pragma once #pragma once
#ifndef NGRAPH_OP #ifndef OV_OP
#warning "NGRAPH_OP not defined" #warning "OV_OP not defined"
#define NGRAPH_OP(x, y) #define OV_OP(x, y)
#endif #endif
// SnippetS dialect // SnippetS dialect
NGRAPH_OP(Load, ngraph::snippets::op) OV_OP(Load, ov::snippets::op)
NGRAPH_OP(LoadReshape, ngraph::snippets::op) OV_OP(LoadReshape, ov::snippets::op)
NGRAPH_OP(LoopBegin, ngraph::snippets::op) OV_OP(LoopBegin, ov::snippets::op)
NGRAPH_OP(LoopEnd, ngraph::snippets::op) OV_OP(LoopEnd, ov::snippets::op)
NGRAPH_OP(Brgemm, ngraph::snippets::op) OV_OP(Brgemm, ov::snippets::op)
NGRAPH_OP(BroadcastLoad, ngraph::snippets::op) OV_OP(BroadcastLoad, ov::snippets::op)
NGRAPH_OP(Store, ngraph::snippets::op) OV_OP(Store, ov::snippets::op)
NGRAPH_OP(BroadcastMove, ngraph::snippets::op) OV_OP(BroadcastMove, ov::snippets::op)
NGRAPH_OP(Scalar, ngraph::snippets::op) OV_OP(Scalar, ov::snippets::op)
NGRAPH_OP(Nop, ngraph::snippets::op) OV_OP(Nop, ov::snippets::op)
// Layout-oblivious from opset1 // Layout-oblivious from opset1
// opset completeness // opset completeness
NGRAPH_OP(Constant, ngraph::op) OV_OP(Constant, ov::op::v0)
NGRAPH_OP(Parameter, ngraph::op::v0) OV_OP(Parameter, ov::op::v0)
NGRAPH_OP(Result, ngraph::op::v0) OV_OP(Result, ov::op::v0)
NGRAPH_OP(Broadcast, ngraph::op::v1) OV_OP(Broadcast, ov::op::v1)
NGRAPH_OP(ConvertTruncation, ngraph::snippets::op) OV_OP(ConvertTruncation, ov::snippets::op)
NGRAPH_OP(ConvertSaturation, ngraph::snippets::op) OV_OP(ConvertSaturation, ov::snippets::op)
// unary // unary
NGRAPH_OP(Abs, ngraph::op::v0) OV_OP(Abs, ov::op::v0)
NGRAPH_OP(Acos, ngraph::op::v0) OV_OP(Acos, ov::op::v0)
NGRAPH_OP(Asin, ngraph::op::v0) OV_OP(Asin, ov::op::v0)
NGRAPH_OP(Atan, ngraph::op::v0) OV_OP(Atan, ov::op::v0)
NGRAPH_OP(Ceiling, ngraph::op::v0) OV_OP(Ceiling, ov::op::v0)
NGRAPH_OP(Clamp, ngraph::op::v0) OV_OP(Clamp, ov::op::v0)
NGRAPH_OP(Cos, ngraph::op::v0) OV_OP(Cos, ov::op::v0)
NGRAPH_OP(Cosh, ngraph::op::v0) OV_OP(Cosh, ov::op::v0)
NGRAPH_OP(Elu, ngraph::op::v0) OV_OP(Elu, ov::op::v0)
NGRAPH_OP(Erf, ngraph::op::v0) OV_OP(Erf, ov::op::v0)
NGRAPH_OP(Exp, ngraph::op::v0) OV_OP(Exp, ov::op::v0)
NGRAPH_OP(Floor, ngraph::op::v0) OV_OP(Floor, ov::op::v0)
NGRAPH_OP(HardSigmoid, ngraph::op::v0) OV_OP(HardSigmoid, ov::op::v0)
NGRAPH_OP(Log, ngraph::op::v0) OV_OP(Log, ov::op::v0)
NGRAPH_OP(LogicalNot, ngraph::op::v1) OV_OP(LogicalNot, ov::op::v1)
NGRAPH_OP(Negative, ngraph::op::v0) OV_OP(Negative, ov::op::v0)
NGRAPH_OP(Relu, ngraph::op::v0) OV_OP(Relu, ov::op::v0)
NGRAPH_OP(Round, ngraph::op::v5) OV_OP(Round, ov::op::v5)
NGRAPH_OP(Selu, ngraph::op::v0) OV_OP(Selu, ov::op::v0)
NGRAPH_OP(Sign, ngraph::op::v0) OV_OP(Sign, ov::op::v0)
NGRAPH_OP(Sigmoid, ngraph::op::v0) OV_OP(Sigmoid, ov::op::v0)
NGRAPH_OP(Sin, ngraph::op::v0) OV_OP(Sin, ov::op::v0)
NGRAPH_OP(Sinh, ngraph::op::v0) OV_OP(Sinh, ov::op::v0)
NGRAPH_OP(Sqrt, ngraph::op::v0) OV_OP(Sqrt, ov::op::v0)
NGRAPH_OP(Tan, ngraph::op::v0) OV_OP(Tan, ov::op::v0)
NGRAPH_OP(Tanh, ngraph::op::v0) OV_OP(Tanh, ov::op::v0)
// binary // binary
NGRAPH_OP(Add, ngraph::op::v1) OV_OP(Add, ov::op::v1)
NGRAPH_OP(Divide, ngraph::op::v1) OV_OP(Divide, ov::op::v1)
NGRAPH_OP(Equal, ngraph::op::v1) OV_OP(Equal, ov::op::v1)
NGRAPH_OP(FloorMod, ngraph::op::v1) OV_OP(FloorMod, ov::op::v1)
NGRAPH_OP(Greater, ngraph::op::v1) OV_OP(Greater, ov::op::v1)
NGRAPH_OP(GreaterEqual, ngraph::op::v1) OV_OP(GreaterEqual, ov::op::v1)
NGRAPH_OP(Less, ngraph::op::v1) OV_OP(Less, ov::op::v1)
NGRAPH_OP(LessEqual, ngraph::op::v1) OV_OP(LessEqual, ov::op::v1)
NGRAPH_OP(LogicalAnd, ngraph::op::v1) OV_OP(LogicalAnd, ov::op::v1)
NGRAPH_OP(LogicalOr, ngraph::op::v1) OV_OP(LogicalOr, ov::op::v1)
NGRAPH_OP(LogicalXor, ngraph::op::v1) OV_OP(LogicalXor, ov::op::v1)
NGRAPH_OP(Maximum, ngraph::op::v1) OV_OP(Maximum, ov::op::v1)
NGRAPH_OP(Minimum, ngraph::op::v1) OV_OP(Minimum, ov::op::v1)
NGRAPH_OP(Mod, ngraph::op::v1) OV_OP(Mod, ov::op::v1)
NGRAPH_OP(Multiply, ngraph::op::v1) OV_OP(Multiply, ov::op::v1)
NGRAPH_OP(NotEqual, ngraph::op::v1) OV_OP(NotEqual, ov::op::v1)
NGRAPH_OP(Power, ngraph::op::v1) OV_OP(Power, ov::op::v1)
NGRAPH_OP(PRelu, ngraph::op::v0) OV_OP(PRelu, ov::op::v0)
NGRAPH_OP(SquaredDifference, ngraph::op::v0) OV_OP(SquaredDifference, ov::op::v0)
NGRAPH_OP(Subtract, ngraph::op::v1) OV_OP(Subtract, ov::op::v1)
NGRAPH_OP(Xor, ngraph::op::v0) OV_OP(Xor, ov::op::v0)

@@ -0,0 +1,80 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
/**
 * @brief This file contains the public interface for the target-independent code generator.
* @file generator.hpp
*/
#pragma once
#include "emitter.hpp"
namespace ov {
namespace snippets {
typedef std::pair<std::function<std::shared_ptr<Emitter>(const std::shared_ptr<ov::Node>&)>,
std::function<std::set<std::vector<element::Type>>(const std::shared_ptr<ov::Node>&)>> jitters_value;
/**
* @interface TargetMachine
 * @brief Base class for target machine representation. A target derives from this class to provide the generator with information about supported emitters
* @ingroup snippets
*/
class TargetMachine {
public:
/**
* @brief checks if target is natively supported
* @return true, if supported
*/
virtual bool is_supported() const = 0;
/**
* @brief finalizes code generation
* @return generated kernel binary
*/
virtual code get_snippet() const = 0;
/**
* @brief gets number of lanes supported by target's vector ISA
* @return number of lanes
*/
virtual size_t get_lanes() const = 0;
/**
 * @brief called by the generator to get the emitter for a given operation type
 * @return a callback that creates an instance of the emitter for the corresponding operation type
*/
std::function<std::shared_ptr<Emitter>(const std::shared_ptr<Node>)> get(const ov::DiscreteTypeInfo& type) const {
auto jitter = jitters.find(type);
if (jitter == jitters.end()) {
OPENVINO_THROW(std::string("Target code emitter is not available for ") + type.name + " operation.");
}
return jitter->second.first;
}
std::function<std::set<std::vector<element::Type>>(const std::shared_ptr<ov::Node>&)>
get_supported_precisions(const ov::DiscreteTypeInfo type) const {
auto jitter = jitters.find(type);
if (jitter == jitters.end()) {
OPENVINO_THROW(std::string("Target code emitter is not available for ") + type.name + " operation.");
}
return jitter->second.second;
}
/**
* @brief checks if emitter for a specific operation is supported
* @return true, if supported
*/
bool has(const ov::DiscreteTypeInfo type) const {
return jitters.find(type) != jitters.end();
}
virtual ~TargetMachine() = default;
protected:
std::map<const ov::DiscreteTypeInfo, jitters_value> jitters;
};
} // namespace snippets
} // namespace ov

@@ -12,28 +12,24 @@
#include "emitter.hpp" #include "emitter.hpp"
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
namespace utils { namespace utils {
// Get non-scalar Constant count that will be created after FakeQuantize decomposition. // Get non-scalar Constant count that will be created after FakeQuantize decomposition.
// This count is needed to know exact count of non-scalar Constants during tokenization. // This count is needed to know exact count of non-scalar Constants during tokenization.
auto get_non_scalar_constant_count_for_fq(const std::shared_ptr<ngraph::opset1::FakeQuantize>& fq) -> size_t; auto get_non_scalar_constant_count_for_fq(const std::shared_ptr<ov::opset1::FakeQuantize>& fq) -> size_t;
inline auto is_scalar_constant(const std::shared_ptr<ngraph::Node>& source_output_node) -> bool { inline auto is_scalar_constant(const std::shared_ptr<ov::Node>& source_output_node) -> bool {
return ngraph::is_type<ngraph::opset1::Constant>(source_output_node) && ngraph::shape_size(source_output_node->get_shape()) == 1; return ov::is_type<ov::opset1::Constant>(source_output_node) && ov::shape_size(source_output_node->get_shape()) == 1;
} }
ov::PartialShape get_port_planar_shape(const Input<Node>& out);
ov::PartialShape get_port_planar_shape(const Output<Node>& out); ov::PartialShape get_port_planar_shape(const Output<Node>& out);
ov::PartialShape get_reordered_planar_shape(const ov::PartialShape& shape, const std::vector<size_t>& layout); ov::PartialShape get_reordered_planar_shape(const ov::PartialShape& shape, const std::vector<size_t>& layout);
std::vector<size_t> get_node_output_layout(const std::shared_ptr<Node>& node);
std::vector<size_t> get_node_output_layout(const Node* node);
void set_transpose_output_layout(const ov::Output<Node>& port, const std::shared_ptr<opset1::Transpose>& node);
void set_output_layout(const ov::Output<Node>& port, const std::vector<size_t>& layout);
inline ov::Dimension get_inner_dim(const ov::PartialShape &shape) { return *(shape.rbegin()); } // Copy runtime info using default ngraph method but delete PortDescriptors which may be transferred after copying
inline ov::Dimension get_outer_dim(const ov::PartialShape &shape) { return *(shape.rbegin() + 1); } void safe_copy_runtime_info(const std::shared_ptr<ov::Node>&, const std::shared_ptr<ov::Node>& to);
inline auto normalize_rank(int32_t allocation_rank, const size_t shape_rank) -> int32_t { inline auto normalize_rank(int32_t allocation_rank, const size_t shape_rank) -> int32_t {
return allocation_rank < 0 ? allocation_rank + static_cast<int32_t>(shape_rank) + 1 : allocation_rank; return allocation_rank < 0 ? allocation_rank + static_cast<int32_t>(shape_rank) + 1 : allocation_rank;
@@ -56,4 +52,4 @@ constexpr bool everyone_is(T val, P item, Args... item_others) {
} }
} // namespace utils } // namespace utils
} // namespace snippets } // namespace snippets
} // namespace ngraph } // namespace ov

@@ -3,224 +3,53 @@
// //
#include "snippets/generator.hpp" #include "snippets/generator.hpp"
#include "snippets/pass/assign_registers.hpp"
#include "snippets/pass/vector_to_scalar.hpp" #include "snippets/lowered/linear_ir.hpp"
#include "snippets/pass/insert_load_store.hpp" #include "snippets/lowered/pass/assign_registers.hpp"
#include "snippets/op/loop.hpp" #include "snippets/lowered/pass/insert_tail_loop.hpp"
#include "snippets/op/subgraph.hpp"
#include "snippets/op/kernel.hpp" #include "snippets/op/kernel.hpp"
#include <snippets/itt.hpp>
#include <ngraph/pass/manager.hpp> #include "snippets/itt.hpp"
#include <openvino/core/type.hpp>
namespace ngraph { namespace ov {
namespace snippets { namespace snippets {
auto getRegisters(const std::shared_ptr<ngraph::Node> &n) -> RegInfo { Generator::LoweringResult Generator::generate(lowered::LinearIR& linear_ir, const lowered::Config& config, const void* compile_params) {
OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::getRegisters") OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::Generator::generate")
OV_ITT_TASK_CHAIN(GENERATE, ov::pass::itt::domains::SnippetsTransform, "Snippets::Generator", "::Transformations")
// ToDo: change to reg_t
std::vector<size_t> rin, rout;
for (const auto& output : n->outputs()) {
const auto& rt = output.get_tensor_ptr()->get_rt_info();
auto it_rt = rt.find("reginfo");
if (it_rt != rt.end())
rout.push_back(it_rt->second.as<size_t>());
}
for (const auto& input : n->inputs()) {
auto rt = input.get_source_output().get_tensor_ptr()->get_rt_info();
auto it_rt = rt.find("reginfo");
if (it_rt != rt.end())
rin.push_back(it_rt->second.as<size_t>());
}
return std::make_pair(rin, rout);
}
auto tail_transformations(NodeVector& tail, const size_t tail_size, const ngraph::snippets::Generator::GeneratorConfig& config) -> void {
NodeVector updated_tile;
auto insertFill = [tail_size](const ov::Input<ov::Node>& input) -> std::shared_ptr<ov::Node> {
auto copyRegInfo = [](const ov::descriptor::Tensor& from, ov::descriptor::Tensor& to) -> void {
auto rt = from.get_rt_info();
auto reginfo = rt.find("reginfo");
if (reginfo != rt.end()) {
to.get_rt_info()["reginfo"] = reginfo->second;
}
};
std::shared_ptr<ov::Node> fill = nullptr;
auto& rt = input.get_rt_info();
auto fill_rt = rt.find("set_fill");
if (fill_rt != rt.end()) {
const auto fill_value = fill_rt->second.as<uint32_t>();
fill = std::make_shared<ngraph::snippets::op::Fill>(input.get_source_output(), tail_size, fill_value);
input.get_node()->set_argument(input.get_index(), fill);
// we should explicitly copy reg info because we insert Fill after assign register
copyRegInfo(fill->get_input_tensor(0), fill->get_output_tensor(0));
}
return fill;
};
for (auto& op : tail) {
// We should fill vector regs by float_min and zero to have
// correct math calculations for ReduceMax and ReduceSum in scalar case.
// Note: We find Maximum and Add ops because HorizonMax and HorizonSum are outside Loop,
// so they are absent from <tail>
if (config.m_need_fill_tail_register &&
(ov::is_type<ov::op::v1::Maximum>(op) ||
ov::is_type<ov::op::v1::Add>(op))) {
for (size_t i = 0; i < op->inputs().size(); ++i) {
if (auto fill = insertFill(op->input(i))) {
updated_tile.push_back(fill);
}
}
} else if (const auto memory_access = std::dynamic_pointer_cast<ngraph::snippets::op::MemoryAccess>(op)) {
for (size_t i = 0; i < memory_access->get_input_port_count(); ++i) {
if (memory_access->get_input_count(i) > 1) {
memory_access->set_input_count(tail_size, i);
}
}
for (size_t i = 0; i < memory_access->get_output_port_count(); ++i) {
if (memory_access->get_output_count(i) > 1) {
memory_access->set_output_count(tail_size, i);
}
}
}
updated_tile.push_back(op);
}
tail = std::move(updated_tile);
}
ngraph::snippets::code ngraph::snippets::Generator::generate(std::shared_ptr<ov::Model>& m,
const GeneratorConfig& config,
const void* compile_params) {
OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::Generator::generate")
if (!target->is_supported()) if (!target->is_supported())
OPENVINO_THROW("unsupported architecture for code generation"); OPENVINO_THROW("unsupported architecture for code generation");
OV_ITT_TASK_CHAIN(GENERATE, ngraph::pass::itt::domains::SnippetsTransform, "Snippets::Generator", "::VectorTile") std::function<opRegType(const std::shared_ptr<Node>& op)> reg_type_mapper = [&](const std::shared_ptr<Node>& op) -> opRegType {
// vector loop return get_op_reg_type(op);
std::vector<AllocatedEmitter> lowered;
auto lower_ops = [&lowered, this](const NodeVector& ops){
std::transform(ops.begin(), ops.end(), std::back_inserter(lowered),
[this](const std::shared_ptr<Node>& n){
return std::make_pair(target->get(n->get_type_info())(n), ngraph::snippets::getRegisters(n));
});
}; };
// *1* solo vector/tail loop + empty outer loop lowered::pass::PassPipeline lowered_pipeline;
// => skip increments (both counter & ptr) : set evaluate_once flag lowered_pipeline.register_pass<lowered::pass::AssignRegisters>(reg_type_mapper);
// *2* solo vector/tail loop + non-empty outer loop lowered_pipeline.register_pass<lowered::pass::InsertTailLoop>();
// => skip counter increments but perform ptr increments : set evaluate_once, lowered_pipeline.run(linear_ir);
// and perform pointer increments through finalization offsets
// *3* vector loop(s) + one tail loop
// => vector as usual, tail depends on outer loop, see *1* and *2*
auto optimize_single_evaluation = [](const std::shared_ptr<op::LoopEnd>& loop, bool force_ptr_increment = false) {
if (loop->get_work_amount() < 2 * loop->get_increment()) {
loop->set_evaluate_once(true);
if (force_ptr_increment || loop->has_outer_loop) {
std::vector<int64_t> new_finalization_offsets(loop->get_finalization_offsets());
const auto& ptr_increments = loop->get_ptr_increments();
for (size_t i = 0; i < new_finalization_offsets.size(); i++) {
new_finalization_offsets[i] += ptr_increments[i];
}
loop->set_finalization_offsets(new_finalization_offsets);
}
return true;
} else {
return false;
}
};
const auto& ops = m->get_ordered_ops();
for (auto op = ops.begin(); op < ops.end(); op++) {
const auto& loop_begin = ov::as_type_ptr<ngraph::snippets::op::LoopBegin>(*op);
// ignore outer loops and possible manual scalar loops linear_ir.init_emitters(target);
if (loop_begin && loop_begin->get_increment() != 1) {
OV_ITT_TASK_NEXT(GENERATE, "::VectorLoop")
NodeVector vector_loop, tail_loop;
std::shared_ptr<op::LoopEnd> vector_loop_end, tail_loop_end;
vector_loop_end = loop_begin->get_loop_end();
tail_loop_end = nullptr;
while (*op != vector_loop_end)
vector_loop.push_back(*op++);
vector_loop.push_back(*op);
const auto work_amount = vector_loop_end->get_work_amount();
const auto increment = vector_loop_end->get_increment();
const auto tail_size = work_amount % increment;
const auto need_tail = tail_size != 0;
const auto need_vector_loop = work_amount >= increment;
// Note, that finalization_offsets could be modified inside optimize_single_evaluation,
// so need to save them here to cover (evaluate_once vector with non-zero finalization_offsets + tail)
std::vector<int64_t> tail_finalization_offsets = need_tail ? vector_loop_end->get_finalization_offsets() : std::vector<int64_t> {};
// vector loops are required => Just copy the body, original loop is already a vector one
if (need_vector_loop) {
// Note that finalization offsets should be applied after the last iteration.
// So if there is a tail, then we should apply offsets after it, but not now.
if (need_tail)
vector_loop_end->set_finalization_offsets(std::vector<int64_t>(tail_finalization_offsets.size(), 0));
if (config.m_optimize_single_evaluation) {
// force ptr increments if there is tail
optimize_single_evaluation(vector_loop_end, need_tail);
}
lower_ops(vector_loop);
}
OV_ITT_TASK_NEXT(GENERATE, "::TailLoop")
// tail is required => transform the body into a tail representation
// tail loop is fake loop because for tail we should calculate only
// finalization offsets which are supported by LoopEnd.
if (need_tail) {
NodeMap vector_to_tail_node_map;
tail_loop = ngraph::clone_nodes(vector_loop, vector_to_tail_node_map);
tail_transformations(tail_loop, tail_size, config);
tail_loop_end = ov::as_type_ptr<op::LoopEnd>(*tail_loop.rbegin());
tail_loop_end->set_finalization_offsets(tail_finalization_offsets);
tail_loop_end->set_increment(tail_size);
// ptr increments were set to the old increment, need to update them in accordance with the new one
tail_loop_end->update_ptr_increments(static_cast<int64_t>(tail_size));
tail_loop_end->set_work_amount(tail_size);
tail_loop_end->has_outer_loop = vector_loop_end->has_outer_loop;
if (config.m_optimize_single_evaluation) {
// tail loop is always executed once
optimize_single_evaluation(tail_loop_end);
}
lower_ops(tail_loop);
}
} else {
lower_ops({*op});
}
}
OV_ITT_TASK_NEXT(GENERATE, "::EmitCode") OV_ITT_TASK_NEXT(GENERATE, "::EmitCode")
//todo: Kernel needs info on i/o data access pattern and data shapes to calculate data offsets auto loops2DKernel = std::make_shared<op::Kernel>(linear_ir);
// pass Params and Results
// todo: it's probably better to move AllocaledEmitter creation inside Kernel constructor
// So Kernel accepts only model ptr and target, and creates AllocatedEmitter inside
//emission
auto loops2DKernel = std::make_shared<op::Kernel>(lowered, m);
loops2DKernel->compile_params = compile_params; loops2DKernel->compile_params = compile_params;
std::shared_ptr<Emitter> kernel = target->get(op::Kernel::get_type_info_static())(loops2DKernel); std::shared_ptr<Emitter> kernel = target->get(op::Kernel::get_type_info_static())(loops2DKernel);
kernel->emit_code({}, {}); kernel->emit_code({}, {});
OV_ITT_TASK_NEXT(GENERATE, "::EmitData") OV_ITT_TASK_NEXT(GENERATE, "::EmitData")
for (auto& op : lowered) { for (auto& l : linear_ir.get_ops()) {
op.first->emit_data(); l->get_emitter()->emit_data();
} }
OV_ITT_TASK_NEXT(GENERATE, "::GetSnippet") OV_ITT_TASK_NEXT(GENERATE, "::GetSnippet")
// todo: we save lowered to access compiled brgemm kernels on execution time (normally lowered is destructed by then) // todo: we save lowered to access compiled brgemm kernels on execution time (normally lowered is destructed by then)
// remove this when kernel caching is implemented. Don't forget to make generate const method. // remove this when kernel caching is implemented. Don't forget to make generate const method.
if (config.m_save_lowered_code) if (config.m_save_expressions)
lowered_saved = lowered; lowered_saved = linear_ir;
return target->get_snippet(); return { target->get_snippet() };
} }
std::shared_ptr<const TargetMachine> Generator::get_target_machine() const { std::shared_ptr<const TargetMachine> Generator::get_target_machine() const {
@@ -228,8 +57,8 @@ std::shared_ptr<const TargetMachine> Generator::get_target_machine() const {
} }
Generator::opRegType Generator::get_op_reg_type(const std::shared_ptr<Node>& op) const { Generator::opRegType Generator::get_op_reg_type(const std::shared_ptr<Node>& op) const {
if (std::dynamic_pointer_cast<opset1::Parameter>(op) || if (std::dynamic_pointer_cast<ov::op::v0::Parameter>(op) ||
std::dynamic_pointer_cast<opset1::Result>(op) || std::dynamic_pointer_cast<ov::op::v0::Result>(op) ||
std::dynamic_pointer_cast<op::LoopBegin>(op) || std::dynamic_pointer_cast<op::LoopBegin>(op) ||
std::dynamic_pointer_cast<op::LoopEnd>(op) || std::dynamic_pointer_cast<op::LoopEnd>(op) ||
std::dynamic_pointer_cast<op::Brgemm>(op) || std::dynamic_pointer_cast<op::Brgemm>(op) ||
@@ -244,10 +73,10 @@ Generator::opRegType Generator::get_op_reg_type(const std::shared_ptr<Node>& op)
ov::op::util::is_binary_elementwise_arithmetic(op) || ov::op::util::is_binary_elementwise_arithmetic(op) ||
ov::op::util::is_binary_elementwise_comparison(op) || ov::op::util::is_binary_elementwise_comparison(op) ||
ov::op::util::is_binary_elementwise_logical(op) || ov::op::util::is_binary_elementwise_logical(op) ||
std::dynamic_pointer_cast<opset1::LogicalNot>(op) || std::dynamic_pointer_cast<ov::op::v1::LogicalNot>(op) ||
std::dynamic_pointer_cast<opset1::PRelu>(op) || std::dynamic_pointer_cast<ov::op::v0::PRelu>(op) ||
std::dynamic_pointer_cast<opset1::Convert>(op) || std::dynamic_pointer_cast<ov::op::v0::Convert>(op) ||
std::dynamic_pointer_cast<opset1::Select>(op) || std::dynamic_pointer_cast<ov::op::v1::Select>(op) ||
std::dynamic_pointer_cast<op::VectorBuffer>(op) || std::dynamic_pointer_cast<op::VectorBuffer>(op) ||
std::dynamic_pointer_cast<op::BroadcastMove>(op) || std::dynamic_pointer_cast<op::BroadcastMove>(op) ||
std::dynamic_pointer_cast<op::Scalar>(op) || std::dynamic_pointer_cast<op::Scalar>(op) ||
@@ -262,6 +91,5 @@ Generator::opRegType Generator::get_specific_op_reg_type(const std::shared_ptr<o
OPENVINO_THROW("Register type of the operation " + std::string(op->get_type_name()) + " isn't determined!"); OPENVINO_THROW("Register type of the operation " + std::string(op->get_type_name()) + " isn't determined!");
} }
}// namespace snippets }// namespace snippets
}// namespace ngraph }// namespace ov

@@ -0,0 +1,133 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/expression.hpp"
#include "snippets/itt.hpp"
#include "snippets/utils.hpp"
#include "openvino/core/graph_util.hpp"
#include "openvino/core/type.hpp"
namespace ov {
namespace snippets {
namespace lowered {
size_t Expression::LOOP_NULL_ID = SIZE_MAX;
Expression::Expression(const std::shared_ptr<Node>& n)
: m_source_node{n}, m_emitter{nullptr}, m_input_port_connectors{}, m_output_port_connectors{} {
m_input_port_descriptors.reserve(n->get_input_size());
m_output_port_descriptors.reserve(n->get_output_size());
for (const auto& input : n->inputs()) {
m_input_port_descriptors.push_back(PortDescriptorUtils::get_port_descriptor_ptr(input));
}
for (const auto& output : n->outputs()) {
m_output_port_descriptors.push_back(PortDescriptorUtils::get_port_descriptor_ptr(output));
}
}
const PortConnectorPtr& Expression::get_input_port_connector(size_t i) const {
OPENVINO_ASSERT(i < m_input_port_connectors.size(), "Failed to get input port connector: target input port must be less than input count!");
return m_input_port_connectors[i];
}
const PortConnectorPtr& Expression::get_output_port_connector(size_t i) const {
OPENVINO_ASSERT(i < m_output_port_connectors.size(), "Failed to get output port connector: target output port must be less than output count!");
return m_output_port_connectors[i];
}
const PortDescriptorPtr& Expression::get_input_port_descriptor(size_t i) const {
OPENVINO_ASSERT(i < m_input_port_descriptors.size(), "Failed to get input port descriptor: target input port must be less than input count!");
return m_input_port_descriptors[i];
}
const PortDescriptorPtr& Expression::get_output_port_descriptor(size_t i) const {
OPENVINO_ASSERT(i < m_output_port_descriptors.size(), "Failed to get output port descriptor: target output port must be less than output count!");
return m_output_port_descriptors[i];
}
std::shared_ptr<Node> Expression::get_node() const {
if (!m_source_node)
OPENVINO_THROW("An attempt to get uninitialized node from lowered expression");
return m_source_node;
}
std::shared_ptr<Emitter> Expression::get_emitter() const {
return m_emitter;
}
RegInfo Expression::get_reg_info() const {
RegInfo reg_info;
reg_info.first.reserve(m_input_port_descriptors.size());
reg_info.second.reserve(m_output_port_descriptors.size());
for (const auto& port : m_input_port_descriptors)
reg_info.first.push_back(port->get_reg());
for (const auto& port : m_output_port_descriptors)
reg_info.second.push_back(port->get_reg());
return reg_info;
}
void Expression::set_reg_info(RegInfo rinfo) {
const auto& in = rinfo.first;
const auto& out = rinfo.second;
OPENVINO_ASSERT(m_input_port_descriptors.size() == in.size(), "Incorrect count of input physical registers");
OPENVINO_ASSERT(m_output_port_descriptors.size() == out.size(), "Incorrect count of output physical registers");
for (size_t i = 0; i < m_input_port_descriptors.size(); ++i) {
m_input_port_descriptors[i]->set_reg(in[i]);
}
for (size_t i = 0; i < m_output_port_descriptors.size(); ++i) {
m_output_port_descriptors[i]->set_reg(out[i]);
}
}
void Expression::init_emitter(const std::shared_ptr<const TargetMachine>& target) {
m_emitter = target->get(m_source_node->get_type_info())(m_source_node);
}
void Expression::validate() const {
OPENVINO_ASSERT(m_input_port_descriptors.size() == m_input_port_connectors.size(),
"The count of input ports and input port connectors must be equal");
OPENVINO_ASSERT(m_output_port_descriptors.size() == m_output_port_connectors.size(),
"The count of output ports and output port connectors must be equal");
OPENVINO_ASSERT(m_source_node != nullptr,
"The expression has null source node");
}
void Expression::replace_input(size_t port, PortConnectorPtr to) {
OPENVINO_ASSERT(port < m_input_port_connectors.size(), "Failed to replace: target input port must be less than input count!");
m_input_port_connectors[port] = std::move(to);
}
void Expression::set_loop_id(size_t id, size_t idx) {
if (id != LOOP_NULL_ID) {
OPENVINO_ASSERT((std::find(m_loop_ids.begin(), m_loop_ids.end(), id) == m_loop_ids.end()),
"Expression cannot have several the same Loops");
}
if (m_loop_ids.size() <= idx) {
m_loop_ids.resize(idx + 1, LOOP_NULL_ID);
}
m_loop_ids[idx] = id;
}
void Expression::remove_loop_id(size_t id) {
auto it = std::find(m_loop_ids.begin(), m_loop_ids.end(), id);
OPENVINO_ASSERT(it != m_loop_ids.end(), "Expression doesn't have the Loop with ID " + std::to_string(id));
*it = Expression::LOOP_NULL_ID;
}
ExpressionPort Expression::get_input_port(size_t i) {
return ExpressionPort(this->shared_from_this(), ExpressionPort::Type::Input, i);
}
ExpressionPort Expression::get_output_port(size_t i) {
return ExpressionPort(this->shared_from_this(), ExpressionPort::Type::Output, i);
}
IOExpression::IOExpression(const std::shared_ptr<ov::opset1::Parameter>& par, int64_t index)
: Expression(par), m_index(index), m_type{io_type::INPUT} {}
IOExpression::IOExpression(const std::shared_ptr<ov::opset1::Result>& res, int64_t index)
: Expression(res), m_index(index), m_type{io_type::OUTPUT} {}
}// namespace lowered
}// namespace snippets
}// namespace ov

@@ -0,0 +1,128 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/expression_factory.hpp"
#include "snippets/snippets_isa.hpp"
namespace ov {
namespace snippets {
namespace lowered {
void LinearIR::ExpressionFactory::create_expression_inputs(const LinearIR& linear_ir, const ExpressionPtr& expr) {
OPENVINO_ASSERT(expr != nullptr, "Failed expression inputs creation: expression is null");
const auto& node = expr->get_node();
expr->m_input_port_connectors.resize(node->get_input_size(), nullptr);
for (const auto& input : node->inputs()) {
const auto input_source = input.get_source_output();
const auto in_index = input.get_index();
const auto& parent_expr = linear_ir.get_expr_by_node(input_source.get_node_shared_ptr());
const auto& port_connector = parent_expr->get_output_port_connector(input_source.get_index());
port_connector->add_consumer(expr->get_input_port(in_index));
expr->m_input_port_connectors[in_index] = port_connector;
}
}
void LinearIR::ExpressionFactory::create_expression_outputs(const ExpressionPtr& expr) {
OPENVINO_ASSERT(expr != nullptr, "Failed expression outputs creation: expression is null");
const auto& node = expr->get_node();
expr->m_output_port_connectors.resize(node->get_output_size(), nullptr);
for (const auto& output : node->outputs()) {
const auto out_index = output.get_index();
const auto source = expr->get_output_port(out_index);
expr->m_output_port_connectors[out_index] = std::make_shared<PortConnector>(source);
}
}
// The method verifies that the expression is present among the consumers of each input port connector and adds it if it is missing
void LinearIR::ExpressionFactory::init_expression_inputs(const ExpressionPtr& expr, const std::vector<PortConnectorPtr>& inputs) {
for (size_t i = 0; i < inputs.size(); ++i) {
const auto& input = inputs[i];
const auto consumers = input->get_consumers();
const auto found = std::find_if(consumers.begin(), consumers.end(),
[&](const ExpressionPort& desc) {
return desc.get_index() == i && desc.get_expr() == expr;
});
if (found == consumers.end()) {
input->add_consumer(expr->get_input_port(i));
}
}
expr->m_input_port_connectors = inputs;
}
ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr<ov::op::v0::Parameter>& par,
const LinearIR& linear_ir, const std::shared_ptr<ov::Model>& model) {
// Note: make_shared is not a friend of Expression, so we cannot call std::make_shared<Expression>(args) directly
OPENVINO_ASSERT(model != nullptr, "To create an IOExpression from a Parameter, an initialized model is required!");
auto expr = std::make_shared<IOExpression>(IOExpression(par, model->get_parameter_index(par)));
create_expression_outputs(expr);
expr->validate();
return expr;
}
ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr<ov::op::v0::Result>& res,
const LinearIR& linear_ir, const std::shared_ptr<ov::Model>& model) {
// Note: make_shared is not a friend of Expression, so we cannot call std::make_shared<Expression>(args) directly
OPENVINO_ASSERT(model != nullptr, "To create an IOExpression from a Result, an initialized model is required!");
auto expr = std::make_shared<IOExpression>(IOExpression(res, model->get_result_index(res)));
create_expression_inputs(linear_ir, expr);
// The Result node doesn't need an output port (by the semantics of the node), but every node in ngraph must have at least one output.
// The port descriptors are automatically created in the constructor, so we manually clear the output ports.
expr->m_output_port_descriptors.clear();
expr->validate();
return expr;
}
ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr<ov::Node>& n, const LinearIR& linear_ir,
const std::shared_ptr<ov::Model>& model) {
OPENVINO_ASSERT(!ov::is_type<op::LoopBase>(n), "Default expression builder doesn't support LoopBegin and LoopEnd");
// Note: make_shared is not a friend of Expression, so we cannot call std::make_shared<Expression>(args) directly
auto expr = std::make_shared<Expression>(Expression(n));
create_expression_inputs(linear_ir, expr);
create_expression_outputs(expr);
expr->validate();
return expr;
}
ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr<op::LoopBegin>& n, const std::vector<PortConnectorPtr>& inputs) {
OPENVINO_ASSERT(inputs.empty(), "LoopBegin cannot have inputs");
auto expr = std::make_shared<Expression>(Expression(n));
init_expression_inputs(expr, inputs);
create_expression_outputs(expr);
expr->validate();
return expr;
}
ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr<op::LoopEnd>& n, const std::vector<PortConnectorPtr>& inputs) {
OPENVINO_ASSERT(!inputs.empty(), "LoopEnd expression must have at least one input (the one connected to LoopBegin)");
auto expr = std::make_shared<Expression>(Expression(n));
expr->m_input_port_descriptors.resize(inputs.size(), nullptr);
for (size_t i = 0; i < inputs.size() - 1; ++i) {
expr->m_input_port_descriptors[i] = std::make_shared<PortDescriptor>();
}
const auto& last_input = inputs.back()->get_source();
OPENVINO_ASSERT(ov::is_type<op::LoopBegin>(last_input.get_expr()->get_node()), "LoopEnd expression expects LoopBegin on last input");
expr->m_input_port_descriptors[inputs.size() - 1] = last_input.get_descriptor_ptr()->clone();
init_expression_inputs(expr, inputs);
// The LoopEnd node doesn't need an output port (by the semantics of the node), but every node in ngraph must have at least one output.
// The port descriptors are automatically created in the constructor, so we manually clear the output ports.
expr->m_output_port_descriptors.clear();
expr->validate();
return expr;
}
ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr<ov::Node>& n, const std::vector<PortConnectorPtr>& inputs) {
OPENVINO_ASSERT(!ov::is_type<ov::op::v0::Parameter>(n) &&
!ov::is_type<ov::op::v0::Result>(n),
"Expression builder with inputs doesn't support Result and Parameter");
auto expr = std::make_shared<Expression>(Expression(n));
init_expression_inputs(expr, inputs);
create_expression_outputs(expr);
expr->validate();
return expr;
}
}// namespace lowered
}// namespace snippets
}// namespace ov

@@ -0,0 +1,57 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/expression_port.hpp"
#include "snippets/utils.hpp"
namespace ov {
namespace snippets {
namespace lowered {
ExpressionPort::ExpressionPort(const std::shared_ptr<Expression>& expr, Type type, size_t port)
: m_expr(expr), m_type(type), m_port_index(port) {}
const PortDescriptorPtr& ExpressionPort::get_descriptor_ptr() const {
const auto& descs = m_type == Type::Input ? m_expr->m_input_port_descriptors
: m_expr->m_output_port_descriptors;
OPENVINO_ASSERT(m_port_index < descs.size(), "Incorrect index of port");
return descs[m_port_index];
}
const std::shared_ptr<PortConnector>& ExpressionPort::get_port_connector_ptr() const {
const auto& connectors = m_type == Type::Input ? m_expr->m_input_port_connectors
: m_expr->m_output_port_connectors;
OPENVINO_ASSERT(m_port_index < connectors.size(), "Incorrect index of port");
return connectors[m_port_index];
}
std::set<ExpressionPort> ExpressionPort::get_connected_ports() const {
if (ExpressionPort::m_type == Type::Input) {
return { m_expr->m_input_port_connectors[m_port_index]->get_source() };
}
if (ExpressionPort::m_type == Type::Output) {
return m_expr->m_output_port_connectors[m_port_index]->get_consumers();
}
OPENVINO_THROW("ExpressionPort supports only Input and Output types");
}
bool operator==(const ExpressionPort& lhs, const ExpressionPort& rhs) {
if (&lhs == &rhs)
return true;
OPENVINO_ASSERT(lhs.get_type() == rhs.get_type(), "Incorrect ExpressionPort comparison");
return lhs.get_index() == rhs.get_index() && lhs.get_expr() == rhs.get_expr();
}
bool operator!=(const ExpressionPort& lhs, const ExpressionPort& rhs) {
return !(lhs == rhs);
}
bool operator<(const ExpressionPort& lhs, const ExpressionPort& rhs) {
OPENVINO_ASSERT(lhs.get_type() == rhs.get_type(), "Incorrect ExpressionPort comparison");
return (lhs.get_index() < rhs.get_index()) || (lhs.get_index() == rhs.get_index() && lhs.get_expr() < rhs.get_expr());
}
}// namespace lowered
}// namespace snippets
}// namespace ov

@@ -0,0 +1,263 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/itt.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/lowered/expression_factory.hpp"
#include "snippets/op/serialization_node.hpp"
#include "snippets/utils.hpp"
#include "openvino/core/graph_util.hpp"
#include "openvino/core/type.hpp"
namespace ov {
namespace snippets {
namespace lowered {
LinearIR::LinearIR(const std::shared_ptr<ov::Model>& model, Config config)
: m_io_expressions{}, m_config{std::move(config)}, m_loop_manager(std::make_shared<LoopManager>()) {
constExprIt last_param = m_expressions.end();
for (const auto& n : get_ordered_ops(model)) {
constExprIt insertion_pos = m_expressions.end();
const auto expr = create_expression(n, model);
// Scalars must be placed at the beginning of the Linear IR, right after the Parameters, so the expression
// order remains valid after the Loop passes. After these passes, the MoveScalarToConsumer() pass must be
// called to restore correctness. For more details, see the pass description
if (ov::is_type<op::Scalar>(n)) {
insertion_pos = std::next(last_param);
}
register_expression(expr, true);
const auto& it = m_expressions.insert(insertion_pos, expr);
if (const auto io_expr = std::dynamic_pointer_cast<IOExpression>(expr)) {
m_io_expressions.push_back(io_expr);
if (ov::is_type<ov::op::v0::Parameter>(n))
last_param = it;
}
}
}
ExpressionPtr LinearIR::create_expression(const std::shared_ptr<Node>& n, const std::shared_ptr<ov::Model>& model) {
return ExpressionFactory::build(n, *this, model);
}
ExpressionPtr LinearIR::create_expression(const std::shared_ptr<Node>& n, const std::vector<PortConnectorPtr>& inputs) {
return ExpressionFactory::build(n, inputs);
}
ov::NodeVector LinearIR::get_ordered_ops(const std::shared_ptr<ov::Model>& m) {
if (!m->get_sinks().empty())
OPENVINO_THROW("Linear IR is not supposed to work for models with sinks. Check your transformation pipeline.");
// Note that an important difference between this impl and Model::get_ordered_ops is that Results and Parameters
// are added in REVERSE order, so they will be visited in DIRECT order compared to get_parameters() and get_results()
NodeVector nodes;
const auto& results = m->get_results();
std::copy(results.rbegin(), results.rend(), std::back_inserter(nodes));
const auto& params = m->get_parameters();
std::copy(params.rbegin(), params.rend(), std::back_inserter(nodes));
return ov::topological_sort(nodes);
}
void LinearIR::serialize(const std::string& xml, const std::string& bin) {
auto first_node = std::make_shared<ov::op::v0::Parameter>(element::f32, Shape{});
first_node->set_friendly_name("Start");
first_node->get_rt_info()["execTimeMcs"] = 0;
std::shared_ptr<Node> body_node = first_node;
for (const auto& expr : m_expressions) {
body_node = std::make_shared<op::SerializationNode>(body_node, expr);
}
auto last_node = std::make_shared<ov::op::v0::Result>(body_node);
last_node->set_friendly_name("End");
const auto tmp_model = std::make_shared<ov::Model>(ResultVector {last_node},
ParameterVector {first_node},
"Lowered_IR_Serialization");
ov::pass::Serialize(xml, bin).run_on_model(tmp_model);
}
LinearIR::container LinearIR::deep_copy_range(LinearIR::container::const_iterator begin, LinearIR::container::const_iterator end) {
auto deep_clone_ports = [](std::vector<PortDescriptorPtr>& ports) {
for (auto& port : ports) { port = port->clone(); }
};
LinearIR::container result;
NodeVector original_nodes;
for (auto it = begin; it != end; it++)
original_nodes.push_back((*it)->get_node());
ngraph::NodeMap node_map;
ngraph::clone_nodes(original_nodes, node_map);
for (auto it = begin; it != end; it++) {
// copy by value, so the resulting shared_ptrs point to new objects
Expression new_expr = **it;
new_expr.m_source_node = node_map[(*it)->get_node().get()];
deep_clone_ports(new_expr.m_input_port_descriptors);
deep_clone_ports(new_expr.m_output_port_descriptors);
result.emplace_back(std::make_shared<Expression>(new_expr));
}
return result;
}
void LinearIR::debug_print(bool tds_as_pointers) const {
auto print_rinfo = [](const RegInfo& rinfo) {
std::cerr << " : {";
for (auto i : rinfo.first)
std::cerr << i << " ";
std::cerr << " => ";
for (auto i : rinfo.second)
std::cerr << i << " ";
std::cerr << "}";
};
std::map<PortConnectorPtr, int> td2int;
int td_counter = 0;
int counter = 0;
for (const auto& expr : m_expressions) {
const auto& node = expr->get_node();
std::cerr << counter++ << " : " <<
node->get_friendly_name() << " : ";
if (tds_as_pointers) {
for (const auto& in : expr->m_input_port_connectors) {
if (td2int.count(in) == 0)
OPENVINO_THROW("Undefined input descriptor for op");
std::cerr << td2int.at(in) << ", ";
}
std::cerr << "\b\b => ";
for (const auto& out : expr->m_output_port_connectors) {
if (td2int.count(out) == 0)
td2int.insert({out, td_counter++});
std::cerr << td2int.at(out) << ", ";
}
} else {
for (const auto& port_desc : expr->m_input_port_descriptors)
std::cerr << port_desc << ", ";
std::cerr << "\b\b => ";
for (const auto& port_desc : expr->m_output_port_descriptors)
std::cerr << port_desc << ", ";
}
std::cerr << "\b\b";
const auto& rinfo = expr->get_reg_info();
if (!rinfo.first.empty() || !rinfo.second.empty())
print_rinfo(expr->get_reg_info());
std::cerr << "\n";
}
}
void LinearIR::init_emitters(const std::shared_ptr<TargetMachine>& target) {
for (auto& expr : m_expressions) {
if (!expr->get_emitter())
expr->init_emitter(target);
}
}
const ExpressionPtr& LinearIR::get_expr_by_node(const std::shared_ptr<Node>& n) const {
auto found = m_node2expression_map.find(n);
OPENVINO_ASSERT(found != m_node2expression_map.end(), "The node " + n->get_friendly_name() + " hasn't been found in Linear IR");
return found->second;
}
void LinearIR::replace_input(const std::set<ExpressionPort>& consumers, const PortConnectorPtr& to) {
for (const auto& consumer_input : consumers) {
replace_input(consumer_input, to);
}
}
void LinearIR::replace_input(const ExpressionPort& expr_port, const PortConnectorPtr& to) {
const auto port = expr_port.get_index();
const auto& expr = expr_port.get_expr();
OPENVINO_ASSERT(expr_port.get_type() == ExpressionPort::Type::Input, "Failed to replace: target input port must have Input type");
OPENVINO_ASSERT(expr_port.get_index() < expr->get_input_count(), "Failed to replace: target input port must be less than input count!");
const auto& from = expr->get_input_port_connector(port);
if (from == to)
return;
if (!to->found_consumer(expr_port)) {
to->add_consumer(expr_port);
}
from->remove_consumer(expr_port);
expr->replace_input(port, to);
}
void LinearIR::register_expression(const ExpressionPtr& expr, bool io_allowed) {
const auto& node = expr->get_node();
if (!io_allowed && (is_type<ov::op::v0::Result>(node) || is_type<ov::op::v0::Parameter>(node)))
OPENVINO_THROW("LinearIR::insert can't be used to add Parameters or Results to IR");
{
const auto& res = m_node2expression_map.insert({node, expr});
if (!res.second)
OPENVINO_THROW("Duplicate node is detected in linear IR: " + std::string(node->get_friendly_name()));
}
}
void LinearIR::unregister_expression(const ExpressionPtr& expr) {
for (size_t i = 0; i < expr->get_input_count(); ++i) {
const auto& input = expr->get_input_port_connector(i);
input->remove_consumer(expr->get_input_port(i));
}
m_node2expression_map.erase(expr->get_node());
}
LinearIR::exprIt LinearIR::insert(constExprIt pos, container::value_type&& value) {
register_expression(value);
return m_expressions.insert(pos, value);
}
LinearIR::exprIt LinearIR::insert(constExprIt pos, const container::value_type& value) {
register_expression(value);
return m_expressions.insert(pos, value);
}
LinearIR::exprIt LinearIR::insert(constExprIt pos, exprIt begin, exprIt end) {
constExprIt cbegin = begin;
constExprIt cend = end;
return insert(pos, cbegin, cend);
}
LinearIR::exprIt LinearIR::insert(constExprIt pos, constExprIt begin, constExprIt end) {
for (auto b = begin; b != end; b++)
register_expression(*b);
return m_expressions.insert(pos, begin, end);
}
LinearIR::exprIt LinearIR::insert(LinearIR::constExprIt pos, const NodeVector& nodes) {
auto ret = m_expressions.end();
for (const auto& n : nodes) {
const auto& expr = create_expression(n);
register_expression(expr);
ret = m_expressions.insert(pos, expr);
}
// Need to return iterator to the first of the inserted values
return std::prev(ret, static_cast<int64_t>(nodes.size()));
}
LinearIR::exprIt LinearIR::insert(LinearIR::constExprIt pos, const std::shared_ptr<Node>& n) {
const auto& expr = create_expression(n);
register_expression(expr);
return m_expressions.insert(pos, expr);
}
LinearIR::exprIt LinearIR::erase(LinearIR::exprIt pos) {
unregister_expression(*pos);
return m_expressions.erase(pos);
}
LinearIR::exprIt LinearIR::erase(LinearIR::constExprIt pos) {
unregister_expression(*pos);
return m_expressions.erase(pos);
}
void LinearIR::move(LinearIR::constExprIt from, LinearIR::constExprIt to) {
// Instead of `insert()` + `erase()`, we use `splice()` for the same list
m_expressions.splice(to, m_expressions, from);
}
}// namespace lowered
}// namespace snippets
}// namespace ov

@@ -0,0 +1,198 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/lowered/expression.hpp"
#include "snippets/utils.hpp"
#include "openvino/core/graph_util.hpp"
#include "openvino/core/type.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
size_t LinearIR::LoopManager::add_loop_info(const LoopInfoPtr &loop) {
const auto index = next_id;
m_map[index] = loop;
next_id++;
return index;
}
void LinearIR::LoopManager::remove_loop_info(size_t index) {
m_map.erase(index);
}
using LoopInfoPtr = LinearIR::LoopManager::LoopInfoPtr;
const std::map<size_t, LoopInfoPtr> &LinearIR::LoopManager::get_map() const {
return m_map;
}
LoopInfoPtr LinearIR::LoopManager::get_loop_info(size_t index) const {
const auto it = m_map.find(index);
OPENVINO_ASSERT(it != m_map.end(), "LoopInformation hasn't been found!");
return it->second;
}
void LinearIR::LoopManager::get_loop_bounds(const LinearIR &linear_ir,
size_t loop_id,
LinearIR::constExprIt &loop_begin_pos,
LinearIR::constExprIt &loop_end_pos) const {
const auto loop_info = get_loop_info(loop_id);
get_loop_bounds(linear_ir, loop_info->entry_exprs, loop_info->exit_exprs, loop_begin_pos, loop_end_pos, loop_id);
}
void LinearIR::LoopManager::get_loop_bounds(const LinearIR &linear_ir,
const std::vector<ExpressionPort> &entries,
const std::vector<ExpressionPort> &exits,
LinearIR::constExprIt &loop_begin_pos,
LinearIR::constExprIt &loop_end_pos,
size_t loop_id) {
OPENVINO_ASSERT(!entries.empty(), "Loop must have entry points");
OPENVINO_ASSERT(!exits.empty(), "Loop must have exit points");
const auto& entry_expr = entries.front().get_expr();
loop_begin_pos = std::find(linear_ir.begin(), linear_ir.end(), entry_expr);
OPENVINO_ASSERT(loop_begin_pos != linear_ir.end(), "Loop begin hasn't been found!");
// Some operations inside a Loop can come before the first entry point: Scalars, VectorBuffer.
// We should step back over them while the previous expression still belongs to the corresponding Loop
auto prev_loop_ids = (*std::prev(loop_begin_pos))->get_loop_ids();
while (std::find(prev_loop_ids.begin(), prev_loop_ids.end(), loop_id) != prev_loop_ids.end()) {
loop_begin_pos = std::prev(loop_begin_pos);
prev_loop_ids = (*std::prev(loop_begin_pos))->get_loop_ids();
}
// At the moment all Loops must have exit points
const auto& exit_expr = exits.back().get_expr();
loop_end_pos = std::next(std::find(loop_begin_pos, linear_ir.end(), exit_expr));
OPENVINO_ASSERT(loop_end_pos != linear_ir.end(), "Loop end hasn't been found!");
}
void LinearIR::LoopManager::get_io_loop_ports(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
std::vector<ExpressionPort> &entries,
std::vector<ExpressionPort> &exits) {
entries.clear();
exits.clear();
for (auto expr_it = loop_begin_pos; expr_it != loop_end_pos; ++expr_it) {
const auto& expr = *expr_it;
for (size_t i = 0; i < expr->get_input_count(); ++i) {
const auto in_port = expr->get_input_port(i);
const auto& parent_expr = in_port.get_connected_ports().begin()->get_expr();
if (!ov::is_type<ov::op::v0::Constant>(parent_expr->get_node()) &&
std::find(loop_begin_pos, expr_it, parent_expr) == expr_it) {
entries.push_back(in_port);
}
}
for (size_t i = 0; i < expr->get_output_count(); ++i) {
const auto out_port = expr->get_output_port(i);
const auto consumer_ports = out_port.get_connected_ports();
for (const auto& consumer : consumer_ports) {
const auto& consumer_expr = consumer.get_expr();
if (std::find(expr_it, loop_end_pos, consumer_expr) == loop_end_pos) {
exits.push_back(out_port);
break;
}
}
}
}
}
void LinearIR::LoopManager::mark_loop(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
size_t loop_depth, size_t vector_size) {
std::vector<ExpressionPort> loop_entry_points, loop_exit_points;
LoopManager::get_io_loop_ports(loop_begin_pos, loop_end_pos, loop_entry_points, loop_exit_points);
auto broadcast = [](std::vector<size_t>& lhs, const std::vector<size_t>& rhs, size_t index) -> void {
if (rhs == lhs)
return;
const auto lhs_size = lhs.size();
const auto rhs_size = rhs.size();
const auto size = std::max(lhs_size, rhs_size);
lhs.resize(size, 1);
OPENVINO_ASSERT(index < size, "Incorrect index for broadcasting");
const auto lhs_value = index < lhs_size ? *(lhs.crbegin() + index) : 1;
const auto rhs_value = index < rhs_size ? *(rhs.crbegin() + index) : 1;
OPENVINO_ASSERT(lhs_value == rhs_value || lhs_value == 1 || rhs_value == 1,
"Output shapes of Loop must be broadcastable!");
*(lhs.rbegin() + index) = std::max(lhs_value, rhs_value);
};
auto is_outside_loop = [](const std::vector<size_t>& subtensor) {
return std::all_of(subtensor.begin(), subtensor.end(), [](size_t lhs) { return lhs == PortDescriptor::ServiceDimensions::FULL_DIM; });
};
std::vector<size_t> loop_subtensor;
std::vector<size_t> loop_tensor(loop_depth, 1);
for (const auto& exit_point : loop_exit_points) {
const auto& desc = exit_point.get_descriptor_ptr();
const auto shape = utils::get_reordered_planar_shape(ov::PartialShape(desc->get_shape()), desc->get_layout()).get_shape();
auto subtensor = desc->get_subtensor();
if (subtensor.empty()) {
subtensor.resize(loop_depth, 1);
subtensor[subtensor.size() - 1] = vector_size;
}
const size_t resizing_value = is_outside_loop(subtensor) ? PortDescriptor::ServiceDimensions::FULL_DIM : 1;
while (subtensor.size() < loop_depth)
subtensor.insert(subtensor.begin(), resizing_value);
if (loop_subtensor.empty())
loop_subtensor = subtensor;
OPENVINO_ASSERT(std::equal(loop_subtensor.crbegin(), loop_subtensor.crbegin() + loop_depth, subtensor.crbegin()),
"Incorrect scheduling parameters for loop");
for (size_t dim_idx = 0; dim_idx < loop_depth; ++dim_idx) {
if (*(subtensor.rbegin() + dim_idx) != PortDescriptor::ServiceDimensions::FULL_DIM) {
broadcast(loop_tensor, shape, dim_idx);
}
}
}
for (size_t dim_idx = 0; dim_idx < loop_depth; ++dim_idx) {
if (*(loop_subtensor.rbegin() + dim_idx) == PortDescriptor::ServiceDimensions::FULL_DIM) {
exprs_marking(loop_begin_pos, loop_end_pos, Expression::LOOP_NULL_ID, loop_depth - dim_idx - 1);
continue;
}
OPENVINO_ASSERT(dim_idx < loop_tensor.size(), "Incorrect indexes of Loop for markup");
const auto work_amount =
loop_tensor.size() > dim_idx ? *(loop_tensor.rbegin() + dim_idx)
: 0;
const auto work_amount_increment =
loop_subtensor.size() > dim_idx ? *(loop_subtensor.rbegin() + dim_idx)
: (dim_idx == 0 ? vector_size : 1);
mark_loop(loop_begin_pos, loop_end_pos, loop_depth - dim_idx - 1, work_amount,
work_amount_increment, loop_entry_points, loop_exit_points);
}
}
void LinearIR::LoopManager::mark_loop(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
size_t idx,
size_t work_amount,
size_t work_amount_increment,
const std::vector<ExpressionPort> &entries,
const std::vector<ExpressionPort> &exits) {
const auto loop_info = std::make_shared<LoopManager::LoopInfo>(work_amount, work_amount_increment, entries, exits);
const auto loop_id = this->add_loop_info(loop_info);
exprs_marking(loop_begin_pos, loop_end_pos, loop_id, idx);
}
void LinearIR::LoopManager::exprs_marking(LinearIR::constExprIt loop_begin_pos,
LinearIR::constExprIt loop_end_pos,
size_t loop_id, size_t idx) {
for (auto expr_it = loop_begin_pos; expr_it != loop_end_pos; ++expr_it) {
expr_it->get()->set_loop_id(loop_id, idx);
}
}
}// namespace lowered
}// namespace snippets
}// namespace ov

@@ -0,0 +1,108 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/allocate_buffers.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
void AllocateBuffers::propagate_offset(const LinearIR& linear_ir, const ExpressionPtr& buffer_expr, const size_t offset) {
// If the Buffer has an offset, we set this offset in the connected MemoryAccess ops
// so that data is read and written correctly, because all Buffers share a common data pointer into the buffer scratchpad
const auto buffer = ov::as_type_ptr<op::Buffer>(buffer_expr->get_node());
// Propagate upwards: to the Store. A Buffer can have only one Store
{
if (buffer->is_intermediate_memory()) {
OPENVINO_ASSERT(buffer_expr->get_input_port_connectors().size() == 1, "Buffer with intermediate memory must have one parent");
const auto& parent_output = buffer_expr->get_input_port_connector(0)->get_source();
const auto& parent_expr = parent_output.get_expr();
const auto port = parent_output.get_index();
const auto& parent_node = parent_expr->get_node();
auto memory_access = ov::as_type_ptr<ov::snippets::op::MemoryAccess>(parent_node);
if (memory_access && memory_access->is_memory_access_output_port(port)) {
memory_access->set_output_offset(offset, port);
} else {
OPENVINO_THROW(
"Buffer::set_offset() was called when Buffer didn't have the corresponding MemoryAccess op for offset propagation");
}
}
}
// Propagate downwards: to the Loads. A Buffer can have several Loads
const auto& buffer_out = buffer_expr->get_output_port_connector(0);
for (const auto& child_expr_input : buffer_out->get_consumers()) {
const auto& child_expr = child_expr_input.get_expr();
const auto port = child_expr_input.get_index();
const auto& child_node = child_expr->get_node();
auto memory_access = ov::as_type_ptr<ov::snippets::op::MemoryAccess>(child_node);
if (memory_access && memory_access->is_memory_access_input_port(port)) {
memory_access->set_input_offset(offset, port);
} else if (ov::is_type<op::LoopEnd>(child_node)) {
// After Loop initialization, Buffer can be connected to LoopEnd - it's ok
continue;
} else {
OPENVINO_THROW(
"Buffer::set_offset() was called when Buffer didn't have the corresponding MemoryAccess op for offset propagation");
}
}
}
bool AllocateBuffers::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::AllocateBuffers");
bool modified = false;
size_t offset = 0;
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) {
const auto& expr = *expr_it;
if (auto buffer = as_type_ptr<op::Buffer>(expr->get_node())) {
const auto buffer_size = buffer->get_byte_size();
// If it's the first buffer, offsets are zero => nothing to propagate, can continue
if (m_buffer_scratchpad_size == 0) {
m_buffer_scratchpad_size += buffer_size;
continue;
}
if (buffer->is_intermediate_memory()) {
const auto& parent_expr = expr->get_input_port_connector(0)->get_source().get_expr();
const auto& parent_node = parent_expr->get_node();
// Full MemoryAccess ops need new memory. The previous logic was to check that the parent isn't a Loop.
// TODO: This should be unified in a MemoryManager with memory reuse in the near future
const auto ma = ov::as_type_ptr<op::MemoryAccess>(parent_node);
if (ma && ma->is_full_memory_access_op()) {
offset = m_buffer_scratchpad_size;
buffer->set_offset(static_cast<int64_t>(offset));
propagate_offset(linear_ir, *expr_it, offset);
m_buffer_scratchpad_size += buffer_size;
continue;
}
const auto current_allocated_memory_size = m_buffer_scratchpad_size - offset;
if (buffer_size > current_allocated_memory_size) {
m_buffer_scratchpad_size += (buffer_size - current_allocated_memory_size);
// Note: we don't update offset because we just add memory to needed size
}
propagate_offset(linear_ir, *expr_it, offset);
} else {
// A Buffer without inputs should allocate new memory
offset = m_buffer_scratchpad_size;
buffer->set_offset(static_cast<int64_t>(offset));
propagate_offset(linear_ir, *expr_it, offset);
m_buffer_scratchpad_size += buffer_size;
}
modified = true;
}
}
return modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov

@@ -1,120 +1,135 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/assign_registers.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
// This header is needed to avoid MSVC warning "C2039: 'inserter': is not a member of 'std'"
#include <iterator>
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
bool AssignRegisters::run(LinearIR& linear_ir) {
    OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::AssignRegisters")
    using Reg = size_t;
    using tensor = PortConnectorPtr;
    const auto& expressions = linear_ir.get_ops();
    std::vector<std::pair<Generator::opRegType, ExpressionPtr>> typed_ops;
    NodeVector ops;
    Reg num_parameters = 0;
    Reg num_results = 0;
    Reg num_expressions = 0;
    for (auto& expr : expressions) {
        auto op = expr->get_node();
        auto reg_type = m_reg_type_mapper(op);
        typed_ops.emplace_back(reg_type, expr);
        num_parameters += is_type<ov::op::v0::Parameter>(op);
        num_results += is_type<ov::op::v0::Result>(op);
        ops.push_back(op);
        num_expressions++;
    }
    size_t counter_vec = 0;
    size_t counter_gpr = 0;
    std::map<tensor, Reg> regs_vec, regs_gpr;
    // Define a set of immune tensors that will be ignored by auto reg allocation => their reg allocation is done manually
    std::map<tensor, Reg> manually_assigned_gprs, manually_assigned_vecs;
    const auto IS_MANUALLY_ALLOCATED_REG = SIZE_MAX;
    auto accumulator_reg = 0lu;
    for (const auto& expr : expressions) {
        auto op = expr->get_node();
        if (const auto io_expr = std::dynamic_pointer_cast<IOExpression>(expr)) {
            if (io_expr->get_type() == IOExpression::io_type::INPUT)
                manually_assigned_gprs[expr->get_output_port_connector(0)] = io_expr->get_index();
            else if (io_expr->get_type() == IOExpression::io_type::OUTPUT)
                manually_assigned_gprs[expr->get_input_port_connector(0)] = num_parameters + io_expr->get_index();
            else
                OPENVINO_THROW("Unsupported io_type detected");
        } else if (const auto& buffer = ov::as_type_ptr<op::Buffer>(op)) {
            const auto buffer_id = buffer->get_id();
            // All buffers have one common data pointer
            if (buffer->is_intermediate_memory()) {
                manually_assigned_gprs[expr->get_input_port_connector(0)] =
                        static_cast<Reg>(num_results + num_parameters + buffer_id);
            }
            manually_assigned_gprs[expr->get_output_port_connector(0)] =
                    static_cast<Reg>(num_results + num_parameters + buffer_id);
        } else if (ov::is_type<op::HorizonMax>(op) || ov::is_type<op::HorizonSum>(op)) {
            // Only in SoftmaxDecomposition ReduceMax and ReduceSum use HorizonMax/HorizonSum and VectorBuffer.
            // We should manually set the one vector register for VectorBuffer and Max/Sum output to simulate an accumulator
            // TODO [96351]: We should rewrite accumulator pattern using another way
            const auto& input_tensor = expr->get_input_port_connector(0);
            const auto& input_expr = input_tensor->get_source().get_expr();
if (ov::is_type<op::VectorBuffer>(input->get_input_node_shared_ptr(i))) { const auto& input_expr_input_tensors = input_expr->get_input_port_connectors();
manually_assigned_vecs[input->input(i).get_tensor_ptr()] = for (const auto& tensor : input_expr_input_tensors) {
if (ov::is_type<op::VectorBuffer>(tensor->get_source().get_expr()->get_node())) {
manually_assigned_vecs[tensor] = static_cast<Reg>(accumulator_reg);
}
}
const auto& output_tensor = expr->get_output_port_connector(0);
manually_assigned_vecs[input_tensor] = static_cast<Reg>(accumulator_reg);
manually_assigned_vecs[output_tensor] = static_cast<Reg>(accumulator_reg);
for (const auto& child_expr_input : output_tensor->get_consumers()) {
if (ov::is_type<op::BroadcastMove>(child_expr_input.get_expr()->get_node())) {
manually_assigned_vecs[child_expr_input.get_expr()->get_output_port_connector(0)] =
static_cast<Reg>(accumulator_reg); static_cast<Reg>(accumulator_reg);
} }
} }
manually_assigned_vecs[input->output(0).get_tensor_ptr()] = // TODO: Fix via common pipeline using LoopEnd:
static_cast<Reg>(accumulator_reg); // All operations `outside loop` after Horizon ops should have the same register to avoid using it in the next Loop
manually_assigned_vecs[op->output(0).get_tensor_ptr()] = const auto current_loops_ids = expr->get_loop_ids();
auto next_expr = output_tensor->get_consumers().begin()->get_expr();
while (next_expr->get_loop_ids() == current_loops_ids) {
manually_assigned_vecs[next_expr->get_output_port_connector(0)] =
static_cast<Reg>(accumulator_reg); static_cast<Reg>(accumulator_reg);
next_expr = next_expr->get_output_port_connector(0)->get_consumers().begin()->get_expr();
}
// If there is Broadcast, it should have the same register as Horizon op
// because it's a result of the accumulator as well
for (auto& out : op->output(0).get_target_inputs()) {
const auto child = out.get_node()->shared_from_this();
if (ov::is_type<op::BroadcastMove>(child)) {
manually_assigned_vecs[child->output(0).get_tensor_ptr()] =
static_cast<Reg>(accumulator_reg);
}
}
accumulator_reg++; accumulator_reg++;
} }
} }
auto enumerate_out_tensors = [IS_MANUALLY_ALLOCATED_REG] (const std::shared_ptr<ov::Node>& op, // Note: have to specify default capture "=" due to MSVC bug (it doesn't capture const expressions implicitly)
// Otherwise WIN build fails with "IS_MANUALLY_ALLOCATED_REG cannot be implicitly captured because no default capture mode has been specified"
// the same problem with all the other lambdas in this file
auto enumerate_out_tensors = [=] (const ExpressionPtr& expr,
decltype(regs_vec)& reg_map, decltype(regs_vec)& reg_map,
const std::map<tensor, Reg>& manually_assigned_regs, const std::map<tensor, Reg>& manually_assigned_regs,
size_t& counter) { size_t& counter) {
for (const auto& output : op->outputs()) { for (const auto& out_tensor : expr->get_output_port_connectors()) {
const auto& t = output.get_tensor_ptr();
// Note that some ops might have identical input&output tensors (Result and Tile* for ex.) // Note that some ops might have identical input&output tensors (Result and Tile* for ex.)
// so we have to check that the tensor has not been enumerated already // so we have to check that the tensor has not been enumerated already
if (reg_map.count(t) == 0) { if (reg_map.count(out_tensor) == 0) {
reg_map[t] = manually_assigned_regs.count(t) == 0 ? counter++ : IS_MANUALLY_ALLOCATED_REG; reg_map[out_tensor] = manually_assigned_regs.count(out_tensor) == 0 ? counter++ : IS_MANUALLY_ALLOCATED_REG;
} }
} }
}; };
for (const auto& t_op : typed_ops) { for (const auto& t_op : typed_ops) {
switch (t_op.first) { switch (t_op.first) {
case opRegType::vec2vec: case Generator::opRegType::vec2vec:
case opRegType::gpr2vec: case Generator::opRegType::gpr2vec:
enumerate_out_tensors(t_op.second, regs_vec, manually_assigned_vecs, counter_vec); enumerate_out_tensors(t_op.second, regs_vec, manually_assigned_vecs, counter_vec);
break; break;
case opRegType::gpr2gpr: case Generator::opRegType::gpr2gpr:
case opRegType::vec2gpr: case Generator::opRegType::vec2gpr:
enumerate_out_tensors(t_op.second, regs_gpr, manually_assigned_gprs, counter_gpr); enumerate_out_tensors(t_op.second, regs_gpr, manually_assigned_gprs, counter_gpr);
break; break;
} }
} }
// todo: make one for gpr and one for vector // todo: make one for gpr and one for vector
std::vector<std::set<Reg>> used_gpr(ops.size(), std::set<Reg>()); // used = used as an input std::vector<std::set<Reg>> used_gpr(num_expressions, std::set<Reg>()); // used = used as an input
std::vector<std::set<Reg>> defined_gpr(ops.size(), std::set<Reg>()); // defined = used as output std::vector<std::set<Reg>> defined_gpr(num_expressions, std::set<Reg>()); // defined = used as output
std::vector<std::set<Reg>> used_vec(ops.size(), std::set<Reg>()); std::vector<std::set<Reg>> used_vec(num_expressions, std::set<Reg>());
std::vector<std::set<Reg>> defined_vec(ops.size(), std::set<Reg>()); std::vector<std::set<Reg>> defined_vec(num_expressions, std::set<Reg>());
auto tensor2reg = [IS_MANUALLY_ALLOCATED_REG] (const std::vector<tensor>& tensors, const std::map<tensor, Reg>& reg_map) { auto tensor2reg = [=] (const std::vector<tensor>& tensors, const std::map<tensor, Reg>& reg_map) {
std::set<Reg> result; std::set<Reg> result;
for (const auto& t : tensors) { for (const auto& t : tensors) {
if (reg_map.count(t) == 0) if (reg_map.count(t) == 0)
@ -128,25 +143,24 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr
for (size_t i = 0; i < typed_ops.size(); i++) { for (size_t i = 0; i < typed_ops.size(); i++) {
const auto& t_op = typed_ops[i]; const auto& t_op = typed_ops[i];
std::vector<tensor> used_tensors, defined_tensors; std::vector<tensor> used_tensors, defined_tensors;
for (const auto& in : t_op.second->inputs()) { for (const auto& in : t_op.second->get_input_port_connectors())
used_tensors.push_back(in.get_tensor_ptr()); used_tensors.push_back(in);
} for (const auto& out : t_op.second->get_output_port_connectors())
for (const auto& out : t_op.second->outputs()) defined_tensors.push_back(out);
defined_tensors.push_back(out.get_tensor_ptr());
switch (t_op.first) { switch (t_op.first) {
case opRegType::vec2vec: case Generator::opRegType::vec2vec:
used_vec[i] = tensor2reg(used_tensors, regs_vec); used_vec[i] = tensor2reg(used_tensors, regs_vec);
defined_vec[i] = tensor2reg(defined_tensors, regs_vec); defined_vec[i] = tensor2reg(defined_tensors, regs_vec);
break; break;
case opRegType::gpr2gpr: case Generator::opRegType::gpr2gpr:
used_gpr[i] = tensor2reg(used_tensors, regs_gpr); used_gpr[i] = tensor2reg(used_tensors, regs_gpr);
defined_gpr[i] = tensor2reg(defined_tensors, regs_gpr); defined_gpr[i] = tensor2reg(defined_tensors, regs_gpr);
break; break;
case opRegType::gpr2vec: case Generator::opRegType::gpr2vec:
used_gpr[i] = tensor2reg(used_tensors, regs_gpr); used_gpr[i] = tensor2reg(used_tensors, regs_gpr);
defined_vec[i] = tensor2reg(defined_tensors, regs_vec); defined_vec[i] = tensor2reg(defined_tensors, regs_vec);
break; break;
case opRegType::vec2gpr: case Generator::opRegType::vec2gpr:
used_vec[i] = tensor2reg(used_tensors, regs_vec); used_vec[i] = tensor2reg(used_tensors, regs_vec);
defined_gpr[i] = tensor2reg(defined_tensors, regs_gpr); defined_gpr[i] = tensor2reg(defined_tensors, regs_gpr);
break; break;
@ -174,19 +188,28 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr
std::inserter(life_in_vec[n], life_in_vec[n].begin())); std::inserter(life_in_vec[n], life_in_vec[n].begin()));
} }
for (size_t n = 0; n < typed_ops.size(); n++) { for (size_t n = 0; n < typed_ops.size(); n++) {
auto op = typed_ops[n].second; const auto& expr = typed_ops[n].second;
for (const auto& out : op->outputs()) { if (is_type<op::LoopEnd>(expr->get_node()) || is_type<ov::op::v0::Result>(expr->get_node()))
for (const auto& port : out.get_target_inputs()) { continue;
size_t k = std::find(ops.begin(), ops.end(), port.get_node()->shared_from_this()) - ops.begin(); for (const auto& out : expr->get_output_port_connectors()) {
if (k == ops.size()) for (const auto& child_expr_input : out->get_consumers()) {
const auto& child_expr = child_expr_input.get_expr();
auto child_it = linear_ir.begin();
std::advance(child_it, n);
size_t k = n;
while (child_it != linear_ir.end() && *child_it != child_expr) {
child_it++;
k++;
}
if (k == typed_ops.size())
OPENVINO_THROW("assign registers can't find target op in the body"); OPENVINO_THROW("assign registers can't find target op in the body");
switch (typed_ops[k].first) { switch (typed_ops[k].first) {
case opRegType::vec2vec: case Generator::opRegType::vec2vec:
case opRegType::vec2gpr: case Generator::opRegType::vec2gpr:
life_out_vec[n].insert(life_in_vec[k].begin(), life_in_vec[k].end()); life_out_vec[n].insert(life_in_vec[k].begin(), life_in_vec[k].end());
break; break;
case opRegType::gpr2gpr: case Generator::opRegType::gpr2gpr:
case opRegType::gpr2vec: case Generator::opRegType::gpr2vec:
life_out_gpr[n].insert(life_in_gpr[k].begin(), life_in_gpr[k].end()); life_out_gpr[n].insert(life_in_gpr[k].begin(), life_in_gpr[k].end());
break; break;
} }
@ -281,8 +304,7 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr
std::map<tensor, Reg> assigned_regs(std::move(manually_assigned_gprs)); std::map<tensor, Reg> assigned_regs(std::move(manually_assigned_gprs));
assigned_regs.insert(manually_assigned_vecs.begin(), manually_assigned_vecs.end()); assigned_regs.insert(manually_assigned_vecs.begin(), manually_assigned_vecs.end());
auto register_assigned_regs = [IS_MANUALLY_ALLOCATED_REG, &assigned_regs](const std::map<tensor, Reg>& unique_regs, auto register_assigned_regs = [=, &assigned_regs](const std::map<tensor, Reg>& unique_regs, const std::map<Reg, Reg>& unique2reused) {
const std::map<Reg, Reg>& unique2reused) {
for (const auto& reg : unique_regs) { for (const auto& reg : unique_regs) {
if (reg.second == IS_MANUALLY_ALLOCATED_REG) if (reg.second == IS_MANUALLY_ALLOCATED_REG)
continue; continue;
@ -294,16 +316,22 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr
register_assigned_regs(regs_vec, unique2reused_map_vec); register_assigned_regs(regs_vec, unique2reused_map_vec);
register_assigned_regs(regs_gpr, unique2reused_map_gpr); register_assigned_regs(regs_gpr, unique2reused_map_gpr);
for (const auto& t_op : typed_ops) { for (auto& t_op : typed_ops) {
for (const auto& out : t_op.second->outputs()) { RegInfo rinfo;
const auto& t = out.get_tensor_ptr(); const auto& expr = t_op.second;
auto& rt = t->get_rt_info(); for (const auto& in : expr->get_input_port_connectors()) {
rt["reginfo"] = static_cast<size_t>(assigned_regs[t]); rinfo.first.push_back(assigned_regs[in]);
} }
for (const auto& out : expr->get_output_port_connectors()) {
rinfo.second.push_back(assigned_regs[out]);
}
t_op.second->set_reg_info(rinfo);
} }
return false; return false;
} }
#if defined(__clang__) } // namespace pass
# pragma clang diagnostic pop } // namespace lowered
#endif } // namespace snippets
} // namespace ov


@@ -0,0 +1,109 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/clean_repeated_ptr_shifts.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
bool CleanRepeatedDataPointerShifts::reuse_increments(const LinearIR& linear_ir, const ExpressionPtr& loop_end_expr) {
const auto loop_end = ov::as_type_ptr<op::LoopEnd>(loop_end_expr->get_node());
if (!loop_end)
return false;
const auto loop_connectors = loop_end_expr->get_input_port_connectors();
const auto input_count = loop_end->get_input_num();
const auto output_count = loop_end->get_output_num();
std::set<size_t> resetting_data_indexes;
std::set<size_t> buffers_ids;
// We count expressions only on inputs of Loop because we can only read from the same data but not write to the same data.
// Parameter
// | |
// Load_0 Load_1
std::set<ExpressionPtr> read_data_exprs;
for (size_t i = 0; i < input_count; ++i) {
const auto& parent_output = loop_connectors[i]->get_source().get_expr();
if (const auto buffer = ov::as_type_ptr<op::Buffer>(parent_output->get_node())) {
// If the Buffer is not in the set yet, just record it - this is its first occurrence
if (buffers_ids.count(buffer->get_id()) == 0) {
buffers_ids.insert(buffer->get_id());
} else {
// A Buffer with the same ID is already in the set - add this index to the set of Buffers for resetting
resetting_data_indexes.insert(i);
}
} else {
// Remember the current expression if missed
if (read_data_exprs.count(parent_output) == 0) {
read_data_exprs.insert(parent_output);
} else {
// Otherwise we have several Load-semantic expressions which read from the same data.
// We have to zero the ptr increments and finalization offsets for all expressions except one.
resetting_data_indexes.insert(i);
}
}
}
for (size_t i = 0; i < output_count; ++i) {
const auto consumer_inputs = loop_connectors[input_count + i]->get_consumers();
size_t buffer_count = 0;
size_t loop_count = 0;
for (const auto& consumer_input : consumer_inputs) {
const auto& child_node = consumer_input.get_expr()->get_node();
if (const auto buffer = ov::as_type_ptr<op::Buffer>(child_node)) {
buffer_count++;
// If the Buffer is not in the set yet, just record it - this is its first occurrence
if (buffers_ids.count(buffer->get_id()) == 0) {
buffers_ids.insert(buffer->get_id());
} else {
// A Buffer with the same ID is already in the set - add this index to the set of Buffers for resetting
resetting_data_indexes.insert(input_count + i);
}
} else if (ov::is_type<op::LoopEnd>(child_node)) {
loop_count++;
}
}
if (buffer_count > 0) {
OPENVINO_ASSERT((buffer_count == 1) && (buffer_count + loop_count == consumer_inputs.size()),
"Loop output must have not more than 1 Buffer");
}
}
if (resetting_data_indexes.empty())
return false;
auto new_ptr_increments = loop_end->get_ptr_increments();
auto new_finalization_offsets = loop_end->get_finalization_offsets();
for (auto idx_to_drop : resetting_data_indexes) {
new_ptr_increments[idx_to_drop] = 0;
new_finalization_offsets[idx_to_drop] = 0;
}
loop_end->set_ptr_increments(new_ptr_increments);
loop_end->set_finalization_offsets(new_finalization_offsets);
return true;
}
bool CleanRepeatedDataPointerShifts::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::CleanRepeatedDataPointerShifts")
bool modified = false;
for (const auto& expr : linear_ir) {
const auto& node = expr->get_node();
if (ov::is_type<op::LoopEnd>(node)) {
modified |= reuse_increments(linear_ir, expr);
}
}
return modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,66 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/cleanup_loop_offsets.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
bool CleanupLoopOffsets::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::CleanupLoopOffsets")
if (linear_ir.empty())
return false;
bool is_modified = false;
// Note: it doesn't make sense to check the last expression - it must always be Result
const auto before_last = std::prev(linear_ir.end());
for (auto expr_it = linear_ir.begin(); expr_it != before_last; expr_it++) {
const auto& node = expr_it->get()->get_node();
if (auto loop_end = as_type_ptr<op::LoopEnd>(node)) {
auto next_expr_it = std::next(expr_it);
const auto& next_node = next_expr_it->get()->get_node();
// Note: Finalization offsets before the Result can be safely disregarded
// TODO: Need to verify that Buffers on the inputs don't have other consumers (other Loops)
// and this Loop doesn't have Buffer on other outputs.
if (is_type<ov::op::v0::Result>(next_node)) {
const auto& fin_offsets = loop_end->get_finalization_offsets();
loop_end->set_finalization_offsets(std::vector<int64_t>(fin_offsets.size(), 0));
is_modified = true;
}
if (auto outer_loop_end = as_type_ptr<op::LoopEnd>(next_node)) {
auto fin_offsets = loop_end->get_finalization_offsets();
std::unordered_map<PortConnectorPtr, size_t> per_port_connector_offset;
const auto& loop_inputs = expr_it->get()->get_input_port_connectors();
for (size_t i = 0; i < fin_offsets.size(); i++)
per_port_connector_offset[loop_inputs[i]] = i;
auto outer_ptr_increments = outer_loop_end->get_ptr_increments();
const auto& outer_loop_inputs = next_expr_it->get()->get_input_port_connectors();
for (size_t i = 0; i < outer_ptr_increments.size(); i++) {
const auto& managed_connector = outer_loop_inputs[i];
const auto& found = per_port_connector_offset.find(managed_connector);
if (found != per_port_connector_offset.end()) {
outer_ptr_increments[i] += fin_offsets[found->second];
fin_offsets[found->second] = 0;
is_modified = true;
}
}
outer_loop_end->set_ptr_increments(outer_ptr_increments);
loop_end->set_finalization_offsets(fin_offsets);
}
}
}
return is_modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,343 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/fuse_loops.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
using LoopManager = LinearIR::LoopManager;
using LoopInfoPtr = LoopManager::LoopInfoPtr;
FuseLoops::FuseLoops() : Pass() {}
bool FuseLoops::can_be_fused(const LoopInfoPtr& loop_current, const LoopInfoPtr& loop_target) {
auto current_work_amount = loop_current->work_amount;
auto current_increment = loop_current->increment;
auto target_work_amount = loop_target->work_amount;
auto target_increment = loop_target->increment;
// Loop fusion is supported only if Loops have equal increments and the equal/broadcastable work amounts.
// Note: For example, Broadcastable work amounts are possible in the following case:
// Relu_0 [16x1] Relu_1 [16x128]
// \ /
// Add [16x128]
// Because of expression order in linear IR and work of MarkLoop algorithm, there are 2 Inner Loops:
// - Relu_0 with work amount `1` and increment `vector size`
// - Relu_1 and Add with work amount `128` and increment `vector size`
// We can fuse them into one Loop with work amount `128` and increment `vector size`
const auto supported_work_amount = current_work_amount == target_work_amount || current_work_amount == 1 || target_work_amount == 1;
const auto supported_increment = current_increment == target_increment;
return supported_work_amount && supported_increment;
}
void FuseLoops::fuse_points(std::vector<ExpressionPort>& exit_points, std::vector<ExpressionPort>& entry_points,
LinearIR::constExprIt loop_begin_pos, LinearIR::constExprIt loop_end_pos) {
std::vector<ExpressionPort> new_exit_points;
for (const auto& exit_point : exit_points) {
const auto consumers_inputs = exit_point.get_connected_ports();
std::set<ExpressionPort> mapped_entry_points;
std::set<ExpressionPtr> outside_consumers;
for (const auto& consumer_input : consumers_inputs) {
const auto entry_point_it = std::find(entry_points.begin(), entry_points.end(), consumer_input);
if (entry_point_it != entry_points.end()) {
mapped_entry_points.insert(*entry_point_it);
continue;
}
const auto& consumer = consumer_input.get_expr();
const auto inside_it = std::find(loop_begin_pos, loop_end_pos, consumer);
if (inside_it == loop_end_pos) {
outside_consumers.insert(consumer);
}
}
// Remove entry points which are mapped
auto last_point = entry_points.end();
for (const auto& mapped_entry_point : mapped_entry_points) {
last_point = std::remove(entry_points.begin(), last_point, mapped_entry_point);
}
entry_points.resize(entry_points.size() - mapped_entry_points.size());
// Leave exit point if there are consumers outside after fusion
if (!outside_consumers.empty()) {
new_exit_points.push_back(exit_point);
}
}
exit_points = new_exit_points;
}
bool FuseLoops::fuse_upper_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_entry_point,
size_t current_loop_id, size_t target_loop_id, size_t dim_idx,
LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos) {
const auto& loop_current = loop_manager->get_loop_info(current_loop_id);
const auto& loop_target = loop_manager->get_loop_info(target_loop_id);
if (!can_be_fused(loop_current, loop_target))
return false;
LinearIR::constExprIt target_loop_begin_pos, target_loop_end_pos;
loop_manager->get_loop_bounds(linear_ir, target_loop_id, target_loop_begin_pos, target_loop_end_pos);
// We can fuse Loop_up into Loop_down only when the other consumers of Loop_up are after Loop_down,
// because Loop_up has to be explicitly moved before Loop_down in the linear IR, and we must preserve the control dependency
bool is_fusion_allowed = true;
for (size_t i = 0; i < loop_target->exit_exprs.size() && is_fusion_allowed; ++i) {
const auto target_exit_point = loop_target->exit_exprs[i];
const auto consumer_inputs = target_exit_point.get_connected_ports();
for (const auto& consumer_input : consumer_inputs) {
const auto& consumer = consumer_input.get_expr();
if (ov::is_type<ov::op::v0::Result>(consumer->get_node()) || consumer == current_entry_point.get_expr())
continue;
// The fusion is only valid if the target Loop consumer (which is outside of the target Loop)
// is after the current Loop (after Loop_down).
is_fusion_allowed = consumer->get_loop_ids()[dim_idx] == target_loop_id || // is inside target Loop
consumer->get_loop_ids()[dim_idx] == current_loop_id || // is inside current Loop
std::find(current_loop_end_pos, linear_ir.cend(), consumer) != linear_ir.end(); // is after current Loop
}
}
if (!is_fusion_allowed)
return false;
// Update entry and exit points in the current Loop information before moving, while the Loop iterators are still valid
auto current_entry_points = loop_current->entry_exprs;
auto current_exit_points = loop_current->exit_exprs;
auto target_entry_points = loop_target->entry_exprs;
auto target_exit_points = loop_target->exit_exprs;
fuse_points(target_exit_points, current_entry_points, target_loop_begin_pos, target_loop_end_pos);
const auto insertion_place = current_loop_begin_pos;
const auto is_move_needed = target_loop_end_pos != current_loop_begin_pos;
for (auto it = target_loop_begin_pos; it != target_loop_end_pos;) {
auto expr_it = it;
const auto& expr = *expr_it;
// After moving, `it` will point to a new place in the current Loop,
// but for the markup we need the expression from the target Loop.
// Because of that, we manually increment the iterator before moving
it = std::next(it);
expr->set_loop_id(current_loop_id, dim_idx);
if (is_move_needed)
linear_ir.move(expr_it, insertion_place);
}
// Update current Loop bounds:
current_loop_begin_pos = target_loop_begin_pos;
// Update work_amount for Loop (increment is constant because increments must be identical for fusion):
loop_current->work_amount = std::max(loop_current->work_amount, loop_target->work_amount);
std::vector<ExpressionPort> new_entries = target_entry_points;
new_entries.insert(new_entries.end(), current_entry_points.begin(), current_entry_points.end());
std::vector<ExpressionPort> new_exits = target_exit_points;
new_exits.insert(new_exits.end(), current_exit_points.begin(), current_exit_points.end());
loop_current->entry_exprs = new_entries;
loop_current->exit_exprs = new_exits;
return true;
}
bool FuseLoops::fuse_lower_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_exit_point,
size_t current_loop_id, size_t target_loop_id, size_t dim_idx,
LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos) {
const auto& loop_current = loop_manager->get_loop_info(current_loop_id);
const auto& loop_target = loop_manager->get_loop_info(target_loop_id);
if (!can_be_fused(loop_current, loop_target))
return false;
// We can fuse Loop_down into Loop_up only when the other parents of Loop_down are before Loop_up,
// because Loop_down has to be explicitly moved after Loop_up in the linear IR, and we must preserve the control dependency
bool is_fusion_allowed = true;
for (size_t i = 0; i < loop_target->entry_exprs.size() && is_fusion_allowed; ++i) {
const auto target_entry_point = loop_target->entry_exprs[i];
const auto parent_expr_output = *target_entry_point.get_connected_ports().begin();
const auto& parent_expr = parent_expr_output.get_expr();
if (ov::is_type<ov::op::v0::Parameter>(parent_expr->get_node()) || parent_expr == current_exit_point.get_expr())
continue;
is_fusion_allowed = parent_expr->get_loop_ids()[dim_idx] == current_loop_id || // The parent expr is from the same current Loop
std::find(linear_ir.cbegin(), current_loop_begin_pos, parent_expr) != current_loop_begin_pos; // The parent is before current Loop
}
if (!is_fusion_allowed)
return false;
LinearIR::constExprIt target_loop_begin_pos, target_loop_end_pos;
loop_manager->get_loop_bounds(linear_ir, target_loop_id, target_loop_begin_pos, target_loop_end_pos);
// Update entry and exit points in the current Loop information before moving, while the Loop iterators are still valid
auto current_entry_points = loop_current->entry_exprs;
auto current_exit_points = loop_current->exit_exprs;
auto target_entry_points = loop_target->entry_exprs;
auto target_exit_points = loop_target->exit_exprs;
fuse_points(current_exit_points, target_entry_points, current_loop_begin_pos, current_loop_end_pos);
const auto insertion_place = current_loop_end_pos;
const auto is_move_needed = insertion_place != target_loop_begin_pos;
for (auto it = target_loop_begin_pos; it != target_loop_end_pos;) {
auto expr_it = it;
const auto& expr = *expr_it;
// After moving, `it` will point to a new place in the current Loop,
// but for the markup we need the expression from the target Loop.
// Because of that, we manually increment the iterator before moving
it = std::next(it);
expr->set_loop_id(current_loop_id, dim_idx);
if (is_move_needed)
linear_ir.move(expr_it, insertion_place);
}
// Update current Loop bounds:
if (!is_move_needed)
current_loop_end_pos = target_loop_end_pos;
// Update work_amount for Loop (increment is constant because increments must be identical for fusion):
loop_current->work_amount = std::max(loop_current->work_amount, loop_target->work_amount);
std::vector<ExpressionPort>& new_entries = current_entry_points;
new_entries.insert(new_entries.end(), target_entry_points.begin(), target_entry_points.end());
std::vector<ExpressionPort>& new_exits = current_exit_points;
new_exits.insert(new_exits.end(), target_exit_points.begin(), target_exit_points.end());
loop_current->entry_exprs = new_entries;
loop_current->exit_exprs = new_exits;
return true;
}
bool FuseLoops::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::FuseLoops")
if (linear_ir.empty())
return false;
const auto& loop_manager = linear_ir.get_loop_manager();
std::vector<size_t> prev_expr_loops;
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) {
const auto expr = *expr_it;
const auto& node = expr->get_node();
if (ov::is_type<ov::op::v0::Parameter>(node) ||
ov::is_type<ov::op::v0::Constant>(node) ||
ov::is_type<ov::op::v0::Result>(node))
continue;
// Outer Loop ----> Inner Loop
const auto expr_loops = expr->get_loop_ids();
const auto loop_depth = expr_loops.size();
size_t diff_idx = 0;
if (prev_expr_loops.empty()) {
prev_expr_loops = expr_loops;
} else {
OPENVINO_ASSERT(loop_depth == prev_expr_loops.size(),
"Expressions in Linear IR must have the same count of Loop identifiers");
for (; diff_idx < loop_depth; ++diff_idx) {
if (expr_loops[diff_idx] != prev_expr_loops[diff_idx])
break;
}
}
for (size_t dim_idx = diff_idx; dim_idx < loop_depth; ++dim_idx) {
const auto loop_id = expr_loops[dim_idx];
if (loop_id == Expression::LOOP_NULL_ID)
continue;
const auto loop_info = loop_manager->get_loop_info(loop_id);
LinearIR::constExprIt loop_begin_pos, loop_end_pos;
loop_manager->get_loop_bounds(linear_ir, loop_id, loop_begin_pos, loop_end_pos);
// We fuse upper Loops into the current one while it is possible.
// After that we fuse lower Loops into the current one while it is possible.
// If we have fused on outputs, we should verify possible fusions on inputs again because of the new entry points
bool need_fusion_checks = true;
while (need_fusion_checks) {
// Loop_0 (Upper) |
// | => |
// Loop_1 (Current) Loop_0 + Loop_1 => new `Loop_1`
auto entry_points = loop_info->entry_exprs;
bool was_fusion_up = false;
for (size_t in_port = 0; in_port < entry_points.size() && !was_fusion_up; ++in_port) {
const auto entry_point = entry_points[in_port];
const auto parent_expr_output = *entry_point.get_connected_ports().begin();
const auto& parent_expr = parent_expr_output.get_expr();
const auto parent = parent_expr->get_node();
if (ov::is_type<ov::op::v0::Constant>(parent) ||
ov::is_type<ov::op::v0::Parameter>(parent) ||
ov::is_type<op::Buffer>(parent)) {
continue;
}
const auto loop_ids_target = parent_expr->get_loop_ids();
OPENVINO_ASSERT(loop_depth == loop_ids_target.size(),
"Expressions in Linear IR must have the same count of Loop identifiers");
const auto loop_id_target = loop_ids_target[dim_idx];
OPENVINO_ASSERT(loop_id != loop_id_target,
"Loops cannot have parents of entry points with the same identifier");
if (loop_id_target == Expression::LOOP_NULL_ID)
continue;
if (fuse_upper_into_current(linear_ir, loop_manager, entry_point, loop_id, loop_id_target,
dim_idx, loop_begin_pos, loop_end_pos)) {
was_fusion_up = true;
loop_manager->remove_loop_info(loop_id_target);
}
}
// If Loops were fused and there are new entry_exprs, we should check for possible fusion again
if (was_fusion_up && entry_points != loop_info->entry_exprs)
continue;
// Loop_0 (Current) Loop_0 + Loop_1 => new `Loop_0`
// | => |
// Loop_1 (Lower) |
auto exit_points = loop_info->exit_exprs;
bool was_fusion_down = false;
for (size_t out_port = 0; out_port < exit_points.size() && !was_fusion_down; ++out_port) {
const auto exit_point = exit_points[out_port];
const auto consumer_exprs_inputs = exit_point.get_connected_ports();
for (const auto& consumer_expr_input : consumer_exprs_inputs) {
const auto& consumer_expr = consumer_expr_input.get_expr();
const auto consumer = consumer_expr->get_node();
if (ov::is_type<ov::op::v0::Result>(consumer) ||
ov::is_type<op::Buffer>(consumer)) {
continue;
}
const auto loop_ids_target = consumer_expr->get_loop_ids();
OPENVINO_ASSERT(loop_depth == loop_ids_target.size(),
"Expressions in Linear IR must have the same count of Loop identifiers");
// The exit point of a Loop can have several consumers, and some of them can be inside this Loop as well,
// so we skip such consumers.
const auto loop_id_target = loop_ids_target[dim_idx];
if (loop_id == loop_id_target || loop_id_target == Expression::LOOP_NULL_ID)
continue;
if (fuse_lower_into_current(linear_ir, loop_manager, exit_point, loop_id, loop_id_target,
dim_idx, loop_begin_pos, loop_end_pos)) {
was_fusion_down = true;
loop_manager->remove_loop_info(loop_id_target);
// Need to check for possible fusion again because of new input expressions for Loop
break;
}
}
}
// We iterated over all exit points and didn't fuse any new Loops, so we can finish checking for possible fusions on outputs.
if (!was_fusion_down)
need_fusion_checks = false;
}
}
}
return true;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
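The `while (need_fusion_checks)` loop above alternates between upward and downward fusion until a fixpoint is reached. Below is a minimal standalone sketch of just that control flow, with the fusion attempts abstracted as callbacks; `fuse_up`, `fuse_down`, and `run_fusion_fixpoint` are illustrative stand-ins, not the actual pass API, which fuses through the LoopManager.

```cpp
#include <cassert>
#include <functional>

// Fixpoint driver: try an upward fusion first; a successful upward fusion may
// expose new entry points, so inputs are re-checked before looking downward.
// A successful downward fusion restarts the whole check; otherwise we are done.
// Returns the total number of fusions performed.
int run_fusion_fixpoint(const std::function<bool()>& fuse_up,
                        const std::function<bool()>& fuse_down) {
    int fusions = 0;
    bool need_fusion_checks = true;
    while (need_fusion_checks) {
        if (fuse_up()) {
            ++fusions;
            continue;  // new entry points appeared: verify inputs again
        }
        if (fuse_down()) {
            ++fusions;  // new input expressions for the Loop: check again
        } else {
            need_fusion_checks = false;
        }
    }
    return fusions;
}
```

With two possible upward fusions and one downward fusion, the driver performs all three before terminating, re-checking inputs after the downward fusion exactly as the pass does.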


@ -0,0 +1,181 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/identify_buffers.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
namespace {
inline size_t index(size_t col_num, size_t row, size_t col) {
return row * col_num + col;
}
} // namespace
std::vector<bool> IdentifyBuffers::create_adjacency_matrix(const LinearIR& linear_ir, const BufferSet& buffers) const {
// The synchronization point for the adjacency check is a Loop, because pointers are incremented only inside Loops.
// So if some Buffers in the same Loop have a conflict (cannot be inplace: different ptr increments or data sizes),
// they are considered adjacent
const auto size = buffers.size();
// TODO: Can we use a triangular matrix? Need to verify using tests
std::vector<bool> adj(size * size, false);
for (size_t i = 0; i < size; ++i)
adj[index(size, i, i)] = true;
// < ptr_increment, finalization_offset >
using ShiftPtrParams = std::pair<int64_t, int64_t>;
auto get_buffer_idx = [&](const std::shared_ptr<op::Buffer>& buffer) {
const auto iter = std::find(buffers.cbegin(), buffers.cend(), buffer);
OPENVINO_ASSERT(iter != buffers.cend(), "Buffer wasn't found in the Buffer system of the Subgraph");
return std::distance(buffers.cbegin(), iter);
};
auto update_adj_matrix = [&](const std::pair<std::shared_ptr<op::Buffer>, ShiftPtrParams>& buffer,
const std::pair<std::shared_ptr<op::Buffer>, ShiftPtrParams>& neighbour_buffer) {
const bool equal_ptr_params_shifting = buffer.second == neighbour_buffer.second;
const bool equal_element_type_sizes = buffer.first->get_element_type().size() == neighbour_buffer.first->get_element_type().size();
if (!equal_ptr_params_shifting || ((buffer.second.first != 0 || buffer.second.second != 0) && !equal_element_type_sizes)) {
const auto buffer_idx = get_buffer_idx(buffer.first);
const auto neighbour_idx = get_buffer_idx(neighbour_buffer.first);
adj[index(size, neighbour_idx, buffer_idx)] = adj[index(size, buffer_idx, neighbour_idx)] = true;
}
};
for (auto expr_it = linear_ir.cbegin(); expr_it != linear_ir.cend(); expr_it++) {
const auto &expr = *expr_it;
const auto& loop_end = ov::as_type_ptr<op::LoopEnd>(expr->get_node());
if (!loop_end)
continue;
const auto input_count = loop_end->get_input_num();
const auto output_count = loop_end->get_output_num();
const auto ptr_increments = loop_end->get_ptr_increments();
const auto finalization_offsets = loop_end->get_finalization_offsets();
// Buffer -> <ptr increment, finalization_offsets>
std::map<std::shared_ptr<op::Buffer>, ShiftPtrParams> buffer_neighbours;
for (size_t i = 0; i < input_count; ++i) {
const auto& parent_output = expr->get_input_port_connector(i)->get_source().get_expr();
if (const auto buffer = ov::as_type_ptr<op::Buffer>(parent_output->get_node())) {
buffer_neighbours[buffer] = { ptr_increments[i], finalization_offsets[i] };
}
}
for (size_t i = 0; i < output_count; ++i) {
// The consumers of the corresponding Store ops
const auto port_index = input_count + i;  // renamed to avoid shadowing the index() helper
const auto consumer_inputs = expr->get_input_port_connector(port_index)->get_consumers();
size_t buffer_count = 0;
size_t loop_count = 0;
for (const auto& consumer_input : consumer_inputs) {
const auto& child_node = consumer_input.get_expr()->get_node();
if (const auto buffer = ov::as_type_ptr<op::Buffer>(child_node)) {
buffer_neighbours[buffer] = { ptr_increments[port_index], finalization_offsets[port_index] };
buffer_count++;
} else if (ov::is_type<op::LoopEnd>(child_node)) {
loop_count++;
}
}
if (buffer_count > 0) {
OPENVINO_ASSERT((buffer_count == 1) && (buffer_count + loop_count == consumer_inputs.size()),
"Loop output must have not more than 1 Buffer");
}
}
for (auto buffer_it = buffer_neighbours.begin(); buffer_it != buffer_neighbours.end(); ++buffer_it) {
for (auto neighbour_it = std::next(buffer_it); neighbour_it != buffer_neighbours.end(); ++neighbour_it) {
update_adj_matrix(*buffer_it, *neighbour_it);
}
}
}
return adj;
}
auto IdentifyBuffers::coloring(BufferSet& buffers, std::vector<bool>& adj) -> std::map<size_t, BufferSet> {
size_t color = 0;
std::map<size_t, BufferSet> color_groups;
const auto size = buffers.size();
for (size_t i = 0; i < size; i++) {
// The Buffer is already colored (visited) - skip
if (!buffers[i])
continue;
const auto& buffer = buffers[i];
color_groups[color].push_back(buffer); // Add to Color Group
buffers[i] = nullptr; // Remove from graph vertices
// While Buffer `i` has non-colored non-neighbors (while row `i` contains 0)
while (!std::accumulate(adj.begin() + i * size, adj.begin() + (i + 1) * size, true, std::logical_and<bool>())) {
size_t j = i + 1;
// Find the first non-adjacent and non-visited (non-colored) Buffer to color it with the same color
for (; j < size; ++j) {
if (!adj[index(size, i, j)] && buffers[j])
break;
}
// If there are no remaining non-adjacent and non-colored Buffers,
// we should break - all potential Buffers for the current color are already colored
if (j == size)
break;
const auto& neighbour_buffer = buffers[j];
color_groups[color].push_back(neighbour_buffer); // Add to Color Group
buffers[j] = nullptr; // Remove from graph vertices
// Unite adjacency links:
// All the neighbors of Buffer `j` are added to the neighbors of Buffer `i` (the `vertices` are pulled together).
// The result is an updated i-th row of the adjacency matrix,
// in which 0 remains only in columns whose `vertices` are adjacent to neither the i-th nor the j-th `vertex`.
// Mathematically, this can be replaced by the OR operation on the Boolean vectors representing rows i and j.
std::transform(adj.begin() + i * size, adj.begin() + (i + 1) * size, adj.begin() + j * size,
adj.begin() + i * size, std::logical_or<bool>());
}
color++;
}
return color_groups;
}
bool IdentifyBuffers::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::IdentifyBuffers")
// Unite Buffers using a graph coloring algorithm.
// Note: We identify only Buffers with intermediate memory because Buffers with new memory are used only in the Brgemm case,
//       so these Buffers are always nonadjacent to the intermediate ones
BufferSet buffer_exprs;
for (const auto& expr : linear_ir) {
if (const auto buffer = ov::as_type_ptr<op::Buffer>(expr->get_node())) {
buffer_exprs.push_back(buffer);
}
}
// Create the adjacency matrix
auto adj = create_adjacency_matrix(linear_ir, buffer_exprs);
// Graph coloring algorithm
const auto color_groups = coloring(buffer_exprs, adj);
for (const auto& pair : color_groups) {
const auto color = pair.first;
const auto& united_buffers = pair.second;
for (const auto& buffer : united_buffers) {
buffer->set_id(color);
}
}
return true;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
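`coloring` above is a greedy graph coloring over a dense boolean adjacency matrix, where the row-OR step merges the neighbor sets of vertices that received the same color. A minimal standalone sketch of the same idea, with plain integer indices standing in for Buffer expressions (`greedy_coloring` is an illustrative name, not the actual pass API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Greedy coloring: vertices that are pairwise non-adjacent share a color.
// `adj` is a row-major size*size boolean matrix with `true` on the diagonal.
std::vector<size_t> greedy_coloring(std::vector<bool> adj, size_t size) {
    std::vector<size_t> colors(size, SIZE_MAX);  // SIZE_MAX == not colored yet
    size_t color = 0;
    for (size_t i = 0; i < size; ++i) {
        if (colors[i] != SIZE_MAX)
            continue;  // already colored (visited)
        colors[i] = color;
        for (size_t j = i + 1; j < size; ++j) {
            // Non-adjacent to every vertex already merged into row i, and still uncolored
            if (!adj[i * size + j] && colors[j] == SIZE_MAX) {
                colors[j] = color;
                // Unite adjacency links: row i |= row j
                for (size_t k = 0; k < size; ++k)
                    adj[i * size + k] = adj[i * size + k] || adj[j * size + k];
            }
        }
        ++color;
    }
    return colors;
}
```

Because row `i` absorbs the neighbors of every vertex it pulls in, a later candidate is only accepted if it conflicts with none of the already-grouped vertices, which is exactly the invariant `IdentifyBuffers` relies on to reuse one Buffer ID per group.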


@ -0,0 +1,202 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/init_loops.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
namespace {
void filter_ports(LinearIR& linear_ir,
std::vector<ExpressionPort>& loop_entries, std::vector<ExpressionPort>& loop_exits) {
std::vector<ExpressionPort> new_loop_entries;
std::vector<ExpressionPort> new_loop_exits;
new_loop_entries.reserve(loop_entries.size());
new_loop_exits.reserve(loop_exits.size());
for (const auto& loop_entry_point : loop_entries) {
const auto& expr = loop_entry_point.get_expr();
const auto ma = ov::as_type_ptr<op::MemoryAccess>(expr->get_node());
if (ma && ma->is_memory_access_input_port(loop_entry_point.get_index())) {
new_loop_entries.push_back(loop_entry_point);
}
}
for (const auto& loop_exit_point : loop_exits) {
const auto& expr = loop_exit_point.get_expr();
const auto ma = ov::as_type_ptr<op::MemoryAccess>(expr->get_node());
if (ma && ma->is_memory_access_output_port(loop_exit_point.get_index())) {
new_loop_exits.push_back(loop_exit_point);
}
}
loop_entries = new_loop_entries;
loop_exits = new_loop_exits;
}
int64_t get_dim_stride(const size_t dim, const std::vector<size_t>& layout, const std::vector<size_t>& shape) {
int64_t stride = 1;
for (int i = static_cast<int>(layout.size()) - 1; i >= 0; i--) {
if (layout[i] == dim)
break;
stride *= static_cast<int64_t>(shape[layout[i]]);
}
return stride;
}
} // namespace
InitLoops::InitLoops() : Pass() {}
std::vector<int64_t> InitLoops::init_ptr_increments(const std::vector<ExpressionPort>& loop_inputs,
const std::vector<ExpressionPort>& loop_outputs,
size_t dim_idx) {
std::vector<int64_t> ptr_increments;
// Note: Need to find max relevant dim expr to account for broadcasting, collect relevant_dims as well
size_t max_relevant_dim_size = 1;
for (const auto& loop_input : loop_inputs) {
const auto& layout = loop_input.get_descriptor_ptr()->get_layout();
const auto& shape = loop_input.get_descriptor_ptr()->get_shape();
const auto& dim = *(layout.rbegin() + dim_idx);
max_relevant_dim_size = std::max(shape[dim], max_relevant_dim_size);
}
for (const auto& loop_output : loop_outputs) {
const auto& layout = loop_output.get_descriptor_ptr()->get_layout();
const auto& shape = loop_output.get_descriptor_ptr()->get_shape();
const auto& dim = *(layout.rbegin() + dim_idx);
max_relevant_dim_size = std::max(shape[dim], max_relevant_dim_size);
}
for (const auto& loop_input : loop_inputs) {
// For strides we have to use layout from source since source writes data by special rules
const auto source = *loop_input.get_connected_ports().begin();
const auto& layout = loop_input.get_descriptor_ptr()->get_layout();
const auto& shape = loop_input.get_descriptor_ptr()->get_shape();
const auto& dim = *(layout.rbegin() + dim_idx);
int64_t ptr_increment = 0;
// If relevant dim is not broadcasted, then ptr_increment is the dim stride in the new layout
if (!(shape[dim] == 1 && max_relevant_dim_size != 1))
ptr_increment = get_dim_stride(dim, source.get_descriptor_ptr()->get_layout(), shape);
ptr_increments.push_back(ptr_increment);
}
for (const auto& loop_output : loop_outputs) {
const auto& layout = loop_output.get_descriptor_ptr()->get_layout();
const auto& shape = loop_output.get_descriptor_ptr()->get_shape();
const auto& dim = *(layout.rbegin() + dim_idx);
int64_t ptr_increment = 0;
// If relevant dim is not broadcasted, then ptr_increment is the dim stride in the new layout
if (!(shape[dim] == 1 && max_relevant_dim_size != 1))
ptr_increment = get_dim_stride(dim, layout, shape);
ptr_increments.push_back(ptr_increment);
}
return ptr_increments;
}
std::vector<int64_t> InitLoops::init_finalization_offsets(const std::vector<int64_t>& ptr_increments, size_t work_amount) {
std::vector<int64_t> finalization_offsets;
for (const auto& ptr_incr : ptr_increments) {
int64_t offset = -ptr_incr * static_cast<int64_t>(work_amount);
finalization_offsets.push_back(offset);
}
return finalization_offsets;
}
std::vector<int64_t> InitLoops::init_element_type_sizes(const std::vector<ExpressionPort>& loop_inputs,
const std::vector<ExpressionPort>& loop_outputs) {
std::vector<int64_t> element_types;
element_types.reserve(loop_inputs.size() + loop_outputs.size());
for (const auto& in : loop_inputs) {
element_types.push_back(in.get_expr()->get_node()->get_input_element_type(in.get_index()).size());
}
for (const auto& out : loop_outputs) {
element_types.push_back(out.get_expr()->get_node()->get_output_element_type(out.get_index()).size());
}
return element_types;
}
void InitLoops::insertion(LinearIR& linear_ir, const LinearIR::LoopManager::LoopInfoPtr& loop_info,
size_t loop_id, size_t dim_idx, bool has_outer_loop) {
auto loop_entries = loop_info->entry_exprs;
auto loop_exits = loop_info->exit_exprs;
const auto work_amount = loop_info->work_amount;
const auto work_amount_increment = loop_info->increment;
LinearIR::constExprIt loop_begin_pos, loop_end_pos;
LinearIR::LoopManager::get_loop_bounds(linear_ir, loop_entries, loop_exits, loop_begin_pos, loop_end_pos, loop_id);
filter_ports(linear_ir, loop_entries, loop_exits);
auto ptr_increments = init_ptr_increments(loop_entries, loop_exits, dim_idx);
auto finalization_offsets = init_finalization_offsets(ptr_increments, work_amount);
const auto io_data_sizes = init_element_type_sizes(loop_entries, loop_exits);
const auto& loop_begin = std::make_shared<op::LoopBegin>();
const auto& loop_begin_expr = linear_ir.create_expression(loop_begin, std::vector<PortConnectorPtr>{});
linear_ir.insert(loop_begin_pos, loop_begin_expr);
const auto& loop_end = std::make_shared<op::LoopEnd>(
loop_begin->output(0), work_amount, work_amount_increment, ptr_increments, finalization_offsets,
io_data_sizes, loop_entries.size(), loop_exits.size());
loop_end->has_outer_loop = has_outer_loop;
std::vector<PortConnectorPtr> loop_end_inputs;
for (const auto& expr_port : loop_entries)
loop_end_inputs.push_back(expr_port.get_expr()->get_input_port_connector(expr_port.get_index()));
for (const auto& expr_port : loop_exits)
loop_end_inputs.push_back(expr_port.get_expr()->get_output_port_connector(expr_port.get_index()));
loop_end_inputs.push_back(loop_begin_expr->get_output_port_connector(0));
const auto& loop_end_expr = linear_ir.create_expression(loop_end, loop_end_inputs);
linear_ir.insert(loop_end_pos, loop_end_expr);
}
bool InitLoops::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::InitLoops")
if (linear_ir.empty())
return false;
const auto& loop_manager = linear_ir.get_loop_manager();
std::set<size_t> inserted_loops;
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) {
const auto expr = *expr_it;
const auto& node = expr->get_node();
if (ov::is_type<op::LoopBase>(node) ||
ov::is_type<op::Buffer>(node) || // Need to cover Buffer
ov::is_type<ov::op::v0::Parameter>(node) ||
ov::is_type<ov::op::v0::Result>(node))
continue;
// Outer Loop ----> Inner Loop
const auto expr_loops = expr->get_loop_ids();
const auto loop_depth = expr_loops.size();
for (size_t i = 0; i < loop_depth; ++i) {
const auto loop_id = expr_loops[i];
if (loop_id == Expression::LOOP_NULL_ID)
continue;
bool need_to_insert = inserted_loops.find(loop_id) == inserted_loops.end();
if (need_to_insert) {
const auto loop_info = loop_manager->get_loop_info(loop_id);
const bool has_outer_loop = i > 0 && inserted_loops.find(expr_loops[i - 1]) != inserted_loops.end();
insertion(linear_ir, loop_info, loop_id, loop_depth - i - 1, has_outer_loop);
inserted_loops.insert(loop_id); // save Loop ID
}
}
}
return true;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@ -0,0 +1,238 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/insert_buffers.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
InsertBuffers::InsertBuffers(int32_t buffer_allocation_rank)
: Pass(), m_buffer_allocation_rank(buffer_allocation_rank) {}
LinearIR::constExprIt InsertBuffers::insertion_position(const LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager,
const ExpressionPtr& up_expr, const ExpressionPtr& down_expr) {
const auto up_loops = up_expr->get_loop_ids();
const auto down_loops = down_expr->get_loop_ids();
OPENVINO_ASSERT(up_loops.size() == down_loops.size(), "The Loop IDs must be normalized!");
size_t loop_idx = 0;
for (; loop_idx < up_loops.size(); ++loop_idx) {
if (up_loops[loop_idx] != down_loops[loop_idx])
break;
}
// If loop_ids of expressions are equal and don't contain LOOP_NULL_ID, it's attempt to insert Buffer between expressions from the same Loop!
if (loop_idx == up_loops.size() && std::none_of(up_loops.begin(), up_loops.end(), [](const size_t id) { return id == Expression::LOOP_NULL_ID; }))
OPENVINO_THROW("Buffer isn't supported in Inner Loop at the moment!");
// If the both expressions are outside Loops, insert Buffer explicitly after first Expression
if (loop_idx == up_loops.size()) {
return std::next(std::find(linear_ir.begin(), linear_ir.end(), up_expr));
}
const auto up_loop_id = up_loops[loop_idx];
const auto down_loop_id = down_loops[loop_idx];
if (up_loop_id != Expression::LOOP_NULL_ID) {
// If upper expression is inside Loop, we should insert Buffer after this Loop
const auto loop_info = loop_manager->get_loop_info(up_loop_id);
LinearIR::constExprIt loop_begin_pos, loop_end_pos;
loop_manager->get_loop_bounds(linear_ir, up_loop_id, loop_begin_pos, loop_end_pos);
return loop_end_pos;
} else if (down_loop_id != Expression::LOOP_NULL_ID) {
// If lower expression is inside Loop, we should insert Buffer before this Loop
const auto loop_info = loop_manager->get_loop_info(down_loop_id);
LinearIR::constExprIt loop_begin_pos, loop_end_pos;
loop_manager->get_loop_bounds(linear_ir, down_loop_id, loop_begin_pos, loop_end_pos);
return loop_begin_pos;
} else {
OPENVINO_THROW("Incorrect configuration for Buffer insertion!");
}
}
void InsertBuffers::insertion(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, size_t loop_id,
const std::vector<ExpressionPort>& loop_entries, const std::vector<ExpressionPort>& loop_exits) {
for (const auto& entry_point : loop_entries) {
const auto& expr = entry_point.get_expr();
const auto port = entry_point.get_index();
const auto node = expr->get_node();
const auto& input_connector = expr->get_input_port_connector(port);
const auto& parent_expr_output = input_connector->get_source();
const auto& parent_expr = parent_expr_output.get_expr();
const auto parent_port = parent_expr_output.get_index();
const auto parent = parent_expr->get_node();
if (ov::is_type<op::Buffer>(parent) ||
ov::is_type<op::VectorBuffer>(parent) ||
ov::is_type<ov::op::v0::Parameter>(parent) ||
ov::is_type<ov::op::v0::Constant>(parent))
continue;
// Each MemoryAccess op needs Buffer
const auto parent_ma = ov::as_type_ptr<op::MemoryAccess>(parent);
const auto node_ma = ov::as_type_ptr<op::MemoryAccess>(node);
bool is_buffer_needed = (parent_ma && parent_ma->is_memory_access_output_port(parent_port)) ||
(node_ma && node_ma->is_memory_access_input_port(port));
if (!is_buffer_needed) {
const auto current_loops = expr->get_loop_ids();
const auto parent_loops = parent_expr->get_loop_ids();
const auto current_loop_count = current_loops.size();
const auto parent_loop_count = parent_loops.size();
OPENVINO_ASSERT(current_loop_count == parent_loop_count);
const auto current_loop_lvl = std::distance(current_loops.begin(), std::find(current_loops.begin(), current_loops.end(), loop_id));
for (size_t i = current_loop_lvl; i < current_loop_count; i++) {
if (current_loops[i] != parent_loops[i] &&
current_loops[i] != Expression::LOOP_NULL_ID &&
parent_loops[i] != Expression::LOOP_NULL_ID) {
is_buffer_needed = true;
break;
}
}
}
if (is_buffer_needed) {
// We should insert the Buffer between the first differing Loops.
// Example: Target Parent Loop identifiers: 3, 2, 1
//          Current expr Loop identifiers:  3, 4, 6
// Need to insert between the 2nd and 4th Loops - after the 2nd Loop
const auto pos = insertion_position(linear_ir, loop_manager, parent_expr, expr);
const auto buffer = std::make_shared<op::Buffer>(parent->output(parent_port), m_buffer_allocation_rank);
PortDescriptorUtils::set_port_descriptor_ptr(buffer->output(0), parent_expr_output.get_descriptor_ptr()->clone());
// Output connector is automatically filled from PortDescriptor
const auto buffer_expr = linear_ir.create_expression(buffer, {input_connector});
linear_ir.insert(pos, buffer_expr);
linear_ir.replace_input(entry_point, buffer_expr->get_output_port_connector(0));
}
}
for (const auto& exit_point : loop_exits) {
const auto& expr = exit_point.get_expr();
const auto port = exit_point.get_index();
const auto node = expr->get_node();
const auto output_connector = exit_point.get_port_connector_ptr();
const auto child_exprs_inputs = output_connector->get_consumers();
const auto current_loops = expr->get_loop_ids();
const auto current_loop_count = current_loops.size();
const std::vector<PortConnectorPtr> node_outs = {output_connector};
std::set<ExpressionPort> potential_consumers;
std::set<ExpressionPtr> buffers;
const auto current_loop_lvl = std::distance(current_loops.begin(), std::find(current_loops.begin(), current_loops.end(), loop_id));
for (const auto& child_expr_input : child_exprs_inputs) {
const auto& child_expr = child_expr_input.get_expr();
const auto child_port = child_expr_input.get_index();
const auto& child = child_expr->get_node();
if (ov::is_type<ov::op::v0::Result>(child))
continue;
if (ov::is_type<op::Buffer>(child)) {
buffers.insert(child_expr);
continue;
}
// Each MemoryAccess op needs Buffer
const auto child_ma = ov::as_type_ptr<op::MemoryAccess>(child);
const auto node_ma = ov::as_type_ptr<op::MemoryAccess>(node);
if ((child_ma && child_ma->is_memory_access_input_port(child_port)) ||
(node_ma && node_ma->is_memory_access_output_port(port))) {
potential_consumers.insert(child_expr_input);
continue;
}
const auto child_loops = child_expr->get_loop_ids();
const auto child_loop_count = child_loops.size();
OPENVINO_ASSERT(current_loop_count == child_loop_count, "The Loop IDs must be normalized!");
for (size_t i = current_loop_lvl; i < child_loop_count; i++) {
if (current_loops[i] != child_loops[i] &&
current_loops[i] != Expression::LOOP_NULL_ID &&
child_loops[i] != Expression::LOOP_NULL_ID) {
potential_consumers.insert(child_expr_input);
break;
}
}
}
if (!potential_consumers.empty() || buffers.size() > 1) {
// If some children of one common port are different Buffers,
// we should remove them and insert one common Buffer on this port
if (!buffers.empty()) {
for (const auto& buffer : buffers) {
const auto& buffer_out = buffer->get_output_port_connector(0);
const auto buffer_consumers_inputs = buffer_out->get_consumers();
linear_ir.replace_input(buffer_consumers_inputs, output_connector);
potential_consumers.insert(buffer_consumers_inputs.begin(), buffer_consumers_inputs.end());
linear_ir.erase(std::find(linear_ir.begin(), linear_ir.end(), buffer));
}
}
// We should insert the Buffer between the first differing Loops.
// Example: Current expr Loop identifiers:     3, 2, 1
//          Target consumers Loop identifiers: 3, 4, 6
// Need to insert after the 2nd Loops
// Note: All potential consumers must have the same count of equal leading Loop identifiers and the same count of differing trailing identifiers
// TODO: Need to verify that
const auto pos = insertion_position(linear_ir, loop_manager, expr, (*potential_consumers.begin()).get_expr());
auto buffer = std::make_shared<op::Buffer>(node->output(port), m_buffer_allocation_rank);
PortDescriptorUtils::set_port_descriptor_ptr(buffer->output(0), exit_point.get_descriptor_ptr()->clone());
// We cannot insert the Node output connector on the Buffer output because not all consumers of the Node need the Buffer
// Example:
// Add
// / \ <- It should be the same TD
// Result Buffer
// | <- It should be new TD
// Relu
// Output port connector is automatically filled from PortDescriptor
const auto buffer_expr = linear_ir.create_expression(buffer, node_outs);
linear_ir.insert(pos, buffer_expr);
linear_ir.replace_input(potential_consumers, buffer_expr->get_output_port_connector(0));
}
}
}
bool InsertBuffers::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::InsertBuffers")
if (linear_ir.empty())
return false;
const auto& loop_manager = linear_ir.get_loop_manager();
const auto loop_data_map = loop_manager->get_map();
for (const auto& loop_data : loop_data_map) {
const auto loop_id = loop_data.first;
const auto loop_info = loop_data.second;
const auto loop_entries = loop_info->entry_exprs;
const auto loop_exits = loop_info->exit_exprs;
insertion(linear_ir, loop_manager, loop_id, loop_entries, loop_exits);
}
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) {
const auto expr = *expr_it;
const auto node = (*expr_it)->get_node();
const auto ma = ov::as_type_ptr<op::MemoryAccess>(node);
if (!ma)
continue;
const auto input_ports = ma->get_memory_access_input_ports();
const auto output_ports = ma->get_memory_access_output_ports();
std::vector<ExpressionPort> loop_entries(input_ports.size()), loop_exits(output_ports.size());
for (const auto& p : input_ports) {
loop_entries[p.first] = expr->get_input_port(p.first);
}
for (const auto& p : output_ports) {
loop_exits[p.first] = expr->get_output_port(p.first);
}
insertion(linear_ir, loop_manager, Expression::LOOP_NULL_ID, loop_entries, loop_exits);
}
return true;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
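`insertion_position` above is driven by the first index at which the producer's and consumer's loop-ID vectors diverge; the comment example (parent `3, 2, 1` vs. consumer `3, 4, 6` diverging at level 1) is worth checking in isolation. A sketch of just that comparison, assuming normalized (equal-length) ID vectors; `first_diff_loop_level` is an illustrative name, not the pass API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first index at which the two (equal-length) loop-ID vectors differ,
// or their common size if they are identical. The Buffer must be placed between
// the Loops at this level.
size_t first_diff_loop_level(const std::vector<size_t>& up, const std::vector<size_t>& down) {
    size_t idx = 0;
    while (idx < up.size() && up[idx] == down[idx])
        ++idx;
    return idx;
}
```

When the returned index equals the vector size, both expressions share all Loops, which is exactly the case the pass either rejects (no `LOOP_NULL_ID` present) or handles by inserting right after the upper expression.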


@ -0,0 +1,175 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/insert_load_store.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
namespace {
auto get_inner_loop_id(const std::vector<size_t>& loop_ids) -> size_t {
size_t inner_loop = Expression::LOOP_NULL_ID;
for (int i = static_cast<int>(loop_ids.size()) - 1; i >= 0; --i) {
if (loop_ids[i] != Expression::LOOP_NULL_ID) {
inner_loop = loop_ids[i];
break;
}
}
return inner_loop;
}
} // namespace
using LoopManager = LinearIR::LoopManager;
using LoopInfoPtr = LoopManager::LoopInfoPtr;
InsertLoadStore::InsertLoadStore(size_t vector_size) : m_vector_size(vector_size) {}
void InsertLoadStore::update_loops(const LinearIR::LoopManagerPtr& loop_manager, const std::vector<size_t>& loop_ids,
const ExpressionPort& actual_port, const std::vector<ExpressionPort>& target_ports, bool is_entry) {
for (auto loop_id : loop_ids) {
if (loop_id != Expression::LOOP_NULL_ID)
update_loop(loop_manager->get_loop_info(loop_id), actual_port, target_ports, is_entry);
}
}
void InsertLoadStore::update_loop(const LinearIR::LoopManager::LoopInfoPtr& loop_info,
const ExpressionPort& actual_port, const std::vector<ExpressionPort>& target_ports, bool is_entry) {
auto& ports = is_entry ? loop_info->entry_exprs : loop_info->exit_exprs;
auto port_it = std::find(ports.begin(), ports.end(), actual_port);
if (port_it == ports.end())
return;
port_it = ports.erase(port_it);
ports.insert(port_it, target_ports.cbegin(), target_ports.cend());
}
size_t InsertLoadStore::get_count(const PortDescriptorPtr& port_desc) const {
const auto layout = port_desc->get_layout();
const auto shape = port_desc->get_shape();
// Find last dimension by layout
const auto last_dim_idx = std::find(layout.begin(), layout.end(), layout.size() - 1);
OPENVINO_ASSERT(last_dim_idx != layout.end(), "Load/Store expression has an incorrect layout");
const auto dim = shape[*last_dim_idx];
return dim == 1 ? 1 : m_vector_size;
}
bool InsertLoadStore::insert_load(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it) {
const auto& loop_manager = linear_ir.get_loop_manager();
const auto& data_expr = *data_expr_it;
const auto& data_node = data_expr->get_node();
const auto& output_connector = data_expr->get_output_port_connector(0);
const auto consumer_inputs = output_connector->get_consumers();
bool was_inserted = false;
for (const auto& consumer_input : consumer_inputs) {
const auto& consumer_expr = consumer_input.get_expr();
const auto port = consumer_input.get_index();
const auto& consumer = consumer_expr->get_node();
const auto ma = ov::as_type_ptr<op::MemoryAccess>(consumer);
if (ma && ma->is_memory_access_input_port(port))
return false;
// Find Inner Loop
const auto& loop_ids = consumer_expr->get_loop_ids();
const auto inner_loop = get_inner_loop_id(loop_ids);
OPENVINO_ASSERT(inner_loop != Expression::LOOP_NULL_ID, "Loop hasn't been found!");
const auto load = std::make_shared<op::Load>(data_node->output(0), get_count(data_expr->get_output_port_descriptor(0)));
PortDescriptorUtils::set_port_descriptor_ptr(load->output(0), consumer_input.get_descriptor_ptr()->clone());
const auto load_expr = linear_ir.create_expression(load, {output_connector});
linear_ir.insert(std::find(data_expr_it, linear_ir.cend(), consumer_expr), load_expr);
linear_ir.replace_input(consumer_input, load_expr->get_output_port_connector(0));
// Copy Loop identifiers
load_expr->set_loop_ids(loop_ids);
// Need to update all the corresponding Loops with the same Entry Point
const auto prev_entry_point = consumer_input;
const auto new_entry_point = load_expr->get_input_port(0);
update_loops(loop_manager, loop_ids, prev_entry_point, {new_entry_point}, true);
was_inserted = true;
}
return was_inserted;
}
bool InsertLoadStore::insert_store(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it) {
const auto& loop_manager = linear_ir.get_loop_manager();
const auto& data_expr = *data_expr_it;
const auto& input_connector = data_expr->get_input_port_connector(0);
const auto& parent_output = input_connector->get_source();
const auto& parent_expr = parent_output.get_expr();
const auto port = parent_output.get_index();
const auto& parent = parent_expr->get_node();
const auto ma = ov::as_type_ptr<op::MemoryAccess>(parent);
if (ma && ma->is_memory_access_output_port(port))
return false;
// Find Inner Loop
const auto& loop_ids = parent_expr->get_loop_ids();
const auto inner_loop = get_inner_loop_id(loop_ids);
OPENVINO_ASSERT(inner_loop != Expression::LOOP_NULL_ID, "Loop hasn't been found!");
const auto store = std::make_shared<op::Store>(parent->output(port), get_count(data_expr->get_input_port_descriptor(0)));
PortDescriptorUtils::set_port_descriptor_ptr(store->output(0), parent_output.get_descriptor_ptr()->clone());
const auto store_expr = linear_ir.create_expression(store, {input_connector});
const auto& reverse_insertion_pos = std::find(std::reverse_iterator<LinearIR::constExprIt>(data_expr_it), linear_ir.crend(), parent_expr);
const auto& insertion_pos = reverse_insertion_pos.base();
linear_ir.insert(insertion_pos, store_expr);
linear_ir.replace_input(data_expr->get_input_port(0), store_expr->get_output_port_connector(0));
// Copy Loop identifiers
store_expr->set_loop_ids(loop_ids);
// Need to update all the corresponding Loops with the same Exit Point
const auto prev_exit_point = parent_output;
// The output port of the previous exit point can have several consumers that can be potential exit points
// So we should check all possible future exit points
const auto consumer_inputs = input_connector->get_consumers();
const auto should_be_saved = std::any_of(consumer_inputs.begin(), consumer_inputs.end(),
[](const ExpressionPort& input_port) {
const auto& node = input_port.get_expr()->get_node();
return ov::is_type<ov::op::v0::Result>(node) || ov::is_type<op::Buffer>(node);
});
const auto new_exit_point = store_expr->get_output_port(0);
const auto new_exit_points = should_be_saved ? std::vector<ExpressionPort>{prev_exit_point, new_exit_point}
: std::vector<ExpressionPort>{new_exit_point};
update_loops(loop_manager, loop_ids, prev_exit_point, new_exit_points, false);
return true;
}
bool InsertLoadStore::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::InsertLoadStore")
bool modified = false;
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) {
const auto expr = *expr_it;
const auto& node = expr->get_node();
if (ov::is_type<ov::op::v0::Parameter>(node)) {
modified |= insert_load(linear_ir, expr_it);
continue;
}
if (ov::is_type<ov::op::v0::Result>(node)) {
modified |= insert_store(linear_ir, expr_it);
continue;
}
if (auto buffer = ov::as_type_ptr<op::Buffer>(node)) {
modified |= insert_load(linear_ir, expr_it);
if (buffer->is_intermediate_memory())
modified |= insert_store(linear_ir, expr_it);
continue;
}
}
return modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
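The store insertion above finds the position right after `parent_expr` by searching backwards with a `std::reverse_iterator` and then calling `.base()`. The idiom is worth spelling out in isolation (a standalone sketch with toy types, not the OpenVINO expression classes): for a reverse iterator `rit`, `rit.base()` points one element to the right of `*rit`, so a reverse `std::find` plus `.base()` yields the forward position immediately after the found element.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Find the position just AFTER the last occurrence of `value` that precedes `from`.
// Mirrors the reverse_iterator + .base() idiom used in insert_store above.
template <typename T>
typename std::list<T>::const_iterator insertion_pos_after(const std::list<T>& l,
                                                          typename std::list<T>::const_iterator from,
                                                          const T& value) {
    auto rbegin = std::reverse_iterator<typename std::list<T>::const_iterator>(from);
    auto rit = std::find(rbegin, l.crend(), value);
    // rit.base() points one element to the right of *rit in forward order,
    // i.e. exactly the slot after the found element (or cbegin() if not found).
    return rit.base();
}
```

If the element is not found, `rit == crend()` and `.base()` degrades gracefully to `cbegin()`.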


@@ -0,0 +1,210 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/insert_tail_loop.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
void InsertTailLoop::tail_transformations(LinearIR& linear_ir,
LinearIR::container::const_iterator tail_begin,
LinearIR::container::const_iterator tail_end,
const size_t tail_size) {
const auto& config = linear_ir.get_config();
auto insertFill = [tail_size](const ov::Input<ov::Node>& input) -> std::shared_ptr<ov::Node> {
std::shared_ptr<ov::Node> fill = nullptr;
auto& rt = input.get_rt_info();
auto fill_rt = rt.find("set_fill");
if (fill_rt != rt.end()) {
const auto fill_value = fill_rt->second.as<uint32_t>();
fill = std::make_shared<ov::snippets::op::Fill>(input.get_source_output(), tail_size, fill_value);
input.get_node()->set_argument(input.get_index(), fill);
}
return fill;
};
for (auto expr_it = tail_begin; expr_it != tail_end; expr_it++) {
// We should fill vector registers with float_min and zero to get
// correct math calculations for ReduceMax and ReduceSum in the scalar case.
// Note: We find Maximum and Add ops because HorizonMax and HorizonSum are outside the Loop,
// so they are not included in the tail body
auto op = (*expr_it)->get_node();
if (config.m_need_fill_tail_register &&
(ov::is_type<ov::op::v1::Maximum>(op) ||
ov::is_type<ov::op::v1::Add>(op))) {
for (size_t i = 0; i < op->inputs().size(); ++i) {
if (auto fill = insertFill(op->input(i))) {
const auto& input = expr_it->get()->get_input_port_connector(i);
const auto consumers = input->get_consumers();
auto fill_expr = linear_ir.create_expression(fill, {input});
linear_ir.insert(expr_it, fill_expr);
linear_ir.replace_input(consumers, fill_expr->get_output_port_connector(0));
// in_reg == out_reg since we want to modify vector reg inplace
const auto reg = expr_it->get()->get_input_port_descriptor(0)->get_reg();
fill_expr->get_input_port_descriptor(0)->set_reg(reg);
fill_expr->get_output_port_descriptor(0)->set_reg(reg);
}
}
} else if (const auto memory_access = std::dynamic_pointer_cast<ov::snippets::op::MemoryAccess>(op)) {
for (const auto p : memory_access->get_memory_access_input_ports()) {
const auto port = p.first;
if (memory_access->get_input_count(port) > 1) {
memory_access->set_input_count(tail_size, port);
}
}
for (const auto p : memory_access->get_memory_access_output_ports()) {
const auto port = p.first;
if (memory_access->get_output_count(port) > 1) {
memory_access->set_output_count(tail_size, port);
}
}
}
}
}
bool InsertTailLoop::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::insertTailLoop")
bool modified = false;
// *1* solo vector/tail loop + empty outer loop
// => skip increments (both counter & ptr) : set evaluate_once flag
// *2* solo vector/tail loop + non-empty outer loop
// => skip counter increments but perform ptr increments : set evaluate_once,
// and perform pointer increments through finalization offsets
// *3* vector loop(s) + one tail loop
// => vector as usual, tail depends on outer loop, see *1* and *2*
auto optimize_single_evaluation = [](const std::shared_ptr<op::LoopEnd>& loop, bool force_ptr_increment = false) {
if (loop->get_work_amount() < 2 * loop->get_increment()) {
loop->set_evaluate_once(true);
if (force_ptr_increment || loop->has_outer_loop) {
std::vector<int64_t> new_finalization_offsets(loop->get_finalization_offsets());
const auto& ptr_increments = loop->get_ptr_increments();
const auto work_amount_incr = static_cast<int64_t>(loop->get_increment());
for (size_t i = 0; i < new_finalization_offsets.size(); i++) {
new_finalization_offsets[i] += ptr_increments[i] * work_amount_incr;
}
loop->set_finalization_offsets(new_finalization_offsets);
}
return true;
} else {
return false;
}
};
auto is_loop_with_buffers = [&linear_ir](const std::shared_ptr<op::LoopEnd>& loop_end) {
auto is_buffer_input = [](const PortConnectorPtr& input) {
const auto& parent_expr = input->get_source().get_expr();
return ov::is_type<op::Buffer>(parent_expr->get_node());
};
auto is_buffer_output = [](const PortConnectorPtr& output) {
const auto child_exprs_inputs = output->get_consumers();
return std::any_of(child_exprs_inputs.begin(), child_exprs_inputs.end(),
[](const ExpressionPort& lp) {return ov::is_type<op::Buffer>(lp.get_expr()->get_node());});
};
const auto& loop_end_expr = linear_ir.get_expr_by_node(loop_end);
const auto inputs = loop_end_expr->get_input_port_connectors();
const auto in_num = loop_end->get_input_num();
const auto out_num = loop_end->get_output_num();
OPENVINO_ASSERT(inputs.size() == (in_num + out_num + 1),
"The LoopEnd expression must have the number of inputs equal to "
"the number of Loop inputs and outputs plus one for the work amount");
const std::vector<PortConnectorPtr> loop_ins(inputs.begin(), inputs.begin() + in_num);
const std::vector<PortConnectorPtr> loop_outs(inputs.begin() + in_num, inputs.begin() + in_num + out_num);
return std::any_of(loop_ins.begin(), loop_ins.end(), is_buffer_input) ||
std::any_of(loop_outs.begin(), loop_outs.end(), is_buffer_output);
};
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end();) {
const auto& loop_begin = ov::as_type_ptr<ov::snippets::op::LoopBegin>((*expr_it)->get_node());
if (!loop_begin) {
expr_it++;
continue;
}
// ignore outer loops and possible manual scalar loops
const auto& loop_end = loop_begin->get_loop_end();
if (loop_end->get_increment() != 1) {
auto loop_begin_expr_it = expr_it;
const auto vector_loop_end = loop_end;
while ((*expr_it)->get_node() != vector_loop_end)
expr_it++;
// Note that expr_it points to the element AFTER loop_end
expr_it++;
const bool is_there_buffer = is_loop_with_buffers(vector_loop_end);
const auto work_amount = vector_loop_end->get_work_amount();
const auto increment = vector_loop_end->get_increment();
const auto tail_size = work_amount % increment;
const auto need_tail = tail_size != 0;
const auto need_vector_loop = work_amount >= increment;
// Note, that finalization_offsets could be modified inside optimize_single_evaluation,
// so need to save them here to cover (evaluate_once vector with non-zero finalization_offsets + tail)
std::vector<int64_t> tail_finalization_offsets = need_tail ? vector_loop_end->get_finalization_offsets()
: std::vector<int64_t> {};
// vector loops are required => Just copy the body, original loop is already a vector one
if (need_vector_loop) {
// Note that finalization offsets should be applied after the last iteration.
// So if there is a tail, then we should apply offsets after it, but not now.
if (need_tail)
vector_loop_end->set_finalization_offsets(
std::vector<int64_t>(tail_finalization_offsets.size(), 0));
// force ptr increments if there is tail
optimize_single_evaluation(vector_loop_end, need_tail || is_there_buffer);
}
// tail is required => transform the body into a tail representation
// tail loop is fake loop because for tail we should calculate only
// finalization offsets which are supported by LoopEnd.
if (need_tail) {
LinearIR::constExprIt tail_begin;
LinearIR::constExprIt tail_end;
if (need_vector_loop) {
auto vector_loop_deep_copy = LinearIR::deep_copy_range(loop_begin_expr_it, expr_it);
auto is_par_or_res = [](const ExpressionPtr& expr) {
return is_type<ov::op::v0::Parameter>(expr->get_node()) ||
is_type<ov::op::v0::Result>(expr->get_node());
};
// Note: It's illegal to insert Parameter or Result into the IR, but they can appear inside the vector loop
// So we have to remove them before injecting the tail loop into linear_ir
auto to_erase = std::remove_if(vector_loop_deep_copy.begin(), vector_loop_deep_copy.end(), is_par_or_res);
vector_loop_deep_copy.erase(to_erase, vector_loop_deep_copy.end());
tail_begin = linear_ir.insert(expr_it, vector_loop_deep_copy.begin(), vector_loop_deep_copy.end());
tail_end = expr_it;
} else {
tail_begin = loop_begin_expr_it;
tail_end = expr_it;
}
tail_transformations(linear_ir, tail_begin, tail_end, tail_size);
std::shared_ptr<op::LoopEnd> tail_loop_end =
ov::as_type_ptr<op::LoopBegin>((*tail_begin)->get_node())->get_loop_end();
tail_loop_end->set_finalization_offsets(tail_finalization_offsets);
tail_loop_end->set_increment(tail_size);
// ptr increments were set to the old increment, need to update them in accordance with the new one
tail_loop_end->set_work_amount(tail_size);
tail_loop_end->has_outer_loop = vector_loop_end->has_outer_loop;
// Note: despite the fact that the tail loop is always executed once, we still need
// to keep finalization_offsets to reset Buffer
optimize_single_evaluation(tail_loop_end, is_there_buffer);
}
modified = true;
} else {
// if there is a loop, then expr_it already points to the next statement (after loop end)
// so we need to increment the iterator only if there was no loop
expr_it++;
}
}
return modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,65 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/load_movebroadcast_to_broadcastload.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
bool LoadMoveBroadcastToBroadcastLoad::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::LoadMoveBroadcastToBroadcastLoad")
bool modified = false;
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) {
const auto& expr = *expr_it;
const auto& op = expr->get_node();
// Match on BroadcastMove because it is a rare node in bodies
if (const auto move_broadcast = ov::as_type_ptr<op::BroadcastMove>(op)) {
const auto& interm_connector = expr->get_input_port_connector(0);
const auto parent_expr = interm_connector->get_source().get_expr();
const auto load = ov::as_type_ptr<op::Load>(parent_expr->get_node());
if (!load)
continue;
// Cannot rewrite Broadcast + Load if load has more than 1 user
// or more than one input, or if Broadcast has several inputs
const auto load_consumers_inputs = interm_connector->get_consumers();
size_t count = 0;
for (const auto& consumer_expr_input : load_consumers_inputs) {
const auto consumer = consumer_expr_input.get_expr()->get_node();
if (!ov::is_type<op::LoopEnd>(consumer))
count++;
}
if (count > 1)
continue;
const auto& outshape = move_broadcast->get_output_partial_shape(0);
const auto broadcastload = std::make_shared<snippets::op::BroadcastLoad>(load->input_value(0), outshape, load->get_offset());
const auto move_consumers = expr->get_output_port_connector(0)->get_consumers();
PortDescriptorUtils::set_port_descriptor_ptr(broadcastload->output(0), expr->get_output_port(0).get_descriptor_ptr()->clone());
const auto broadcastload_expr = linear_ir.create_expression(broadcastload, { parent_expr->get_input_port_connector(0) });
const auto mv_expr_it = expr_it;
const auto insertion_pos = std::next(expr_it);
expr_it = linear_ir.insert(insertion_pos, broadcastload_expr);
linear_ir.erase(std::find(linear_ir.begin(), mv_expr_it, parent_expr));
linear_ir.erase(mv_expr_it);
linear_ir.replace_input(move_consumers, broadcastload_expr->get_output_port_connector(0));
modified |= true;
}
}
return modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,99 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/mark_loops.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
MarkLoops::MarkLoops(size_t vector_size) : Pass(), m_vector_size(vector_size) {}
bool MarkLoops::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::MarkLoops")
if (linear_ir.empty())
return false;
const auto& lowering_config = linear_ir.get_config();
const auto& loop_manager = linear_ir.get_loop_manager();
auto loop_depth = lowering_config.m_loop_depth;
// Parameters, Results, or Constants are ignored: they can't be used as a loop starting point
auto is_not_start_point = [](const std::shared_ptr<ov::Node>& node) {
return ov::is_type<ov::op::v0::Result>(node) ||
ov::is_type<ov::op::v0::Constant>(node) ||
ov::is_type<ov::op::v0::Parameter>(node);
};
auto are_conflicted = [](const ExpressionPort& lhs, const ExpressionPort& rhs) {
const auto& lhs_desc = lhs.get_descriptor_ptr();
const auto& rhs_desc = rhs.get_descriptor_ptr();
return lhs_desc->get_subtensor() != rhs_desc->get_subtensor() ||
lhs_desc->get_layout() != rhs_desc->get_layout() ||
lhs_desc->get_shape() != rhs_desc->get_shape();
};
for (auto expr_it = linear_ir.cbegin(); expr_it != linear_ir.cend(); expr_it++) {
const auto expr = *expr_it;
const auto& node = expr->get_node();
if (is_not_start_point(node))
continue;
auto loop_begin_pos = expr_it;
auto loop_end_pos = loop_begin_pos;
bool collapse = true;
do {
const auto& prev_expr = *loop_end_pos;
loop_end_pos++;
// If iterator is the last, we should finish Loop
if (loop_end_pos == linear_ir.end())
break;
// If the next expression is a Result or Constant, we should finish the Loop
const auto& current_expr = *loop_end_pos;
const auto& current_node = current_expr->get_node();
if (ov::is_type<ov::op::v0::Result>(current_node) ||
ov::is_type<ov::op::v0::Constant>(current_node))
break;
// We finish the Loop if
// - the next expr isn't a real consumer
// - there is a conflict between the corresponding ports
bool is_connected = false;
bool is_conflicted = false;
for (size_t i = 0; i < prev_expr->get_output_count(); ++i) {
const auto& connector = prev_expr->get_output_port_connector(i);
const auto consumers = connector->get_consumers();
const auto found = std::find_if(consumers.begin(), consumers.end(), [&loop_end_pos](const ExpressionPort& consumer) {
return consumer.get_expr() == *loop_end_pos;
});
if (found != consumers.end()) {
if (are_conflicted(*found, connector->get_source())) {
is_conflicted = true;
break;
}
is_connected = true;
}
}
collapse = is_connected && !is_conflicted;
} while (collapse);
loop_manager->mark_loop(loop_begin_pos, loop_end_pos, loop_depth, m_vector_size);
expr_it = std::prev(loop_end_pos);
}
return true;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
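The fusion criterion above is that adjacent expressions can share a loop only if the producing and consuming ports agree on subtensor, layout, and shape. A minimal standalone model of `are_conflicted` (toy types, not the snippets port descriptors):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a port descriptor: only the three fields the pass compares.
struct PortDesc {
    std::vector<size_t> subtensor;
    std::vector<size_t> layout;
    std::vector<size_t> shape;
};

// Two connected ports conflict (and thus break loop fusion) if any field differs.
inline bool are_conflicted(const PortDesc& lhs, const PortDesc& rhs) {
    return lhs.subtensor != rhs.subtensor ||
           lhs.layout != rhs.layout ||
           lhs.shape != rhs.shape;
}
```

In the pass, the first conflicting consumer found on any output connector stops the `do`/`while` collapse and closes the current loop at that point.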


@@ -0,0 +1,74 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/move_result_out_of_loop.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
bool MoveResultOutOfLoop::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::MoveResultOutOfLoop")
if (linear_ir.empty())
return false;
bool modified = false;
const auto loop_manager = linear_ir.get_loop_manager();
// Visit expressions in reverse order, so we'll move Result to an already visited area.
// This is needed to avoid extra hits when we match the same Result twice
for (auto expr_it = linear_ir.crbegin(); expr_it != linear_ir.crend(); expr_it++) {
const auto& forward_it = std::prev(expr_it.base());
const auto& expr = *expr_it;
const auto& node = expr->get_node();
if (!ov::is_type<ov::op::v0::Result>(node)) {
continue;
}
const auto& input_connector = expr->get_input_port_connector(0);
const auto& parent_expr = input_connector->get_source().get_expr();
const auto parent_loop_ids = parent_expr->get_loop_ids();
int outer_loop_id = static_cast<int>(parent_loop_ids.size()) - 1;
for (; outer_loop_id >= 0; --outer_loop_id) {
if (parent_loop_ids[outer_loop_id] != Expression::LOOP_NULL_ID) {
break;
}
}
// Parent is out of Loop: just verify that Result is after Parent
if (outer_loop_id < 0) {
const auto parent_it = std::find(forward_it, linear_ir.cend(), parent_expr);
// If Parent is found after Result, we should move Result
if (parent_it != linear_ir.cend()) {
const auto insertion_pos = std::next(parent_it);
const auto result_it = forward_it;
expr_it = std::prev(expr_it); // save iterator before moving
linear_ir.move(result_it, insertion_pos);
modified = true;
}
continue;
}
LinearIR::constExprIt loop_begin_pos, loop_end_pos;
loop_manager->get_loop_bounds(linear_ir, parent_loop_ids[outer_loop_id], loop_begin_pos, loop_end_pos);
// If the Result isn't found after the Outer LoopEnd, we need to move it there
if (std::find(loop_end_pos, linear_ir.cend(), expr) == linear_ir.cend()) {
expr_it = std::prev(expr_it); // save iterator before moving
linear_ir.move(forward_it, loop_end_pos);
modified = true;
}
}
return modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov


@@ -0,0 +1,50 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/move_scalar_to_consumer.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
bool MoveScalarToConsumer::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::MoveScalarToConsumer")
if (linear_ir.empty())
return false;
bool modified = false;
// Visit expressions in reverse order, so we'll move Scalar to an already visited area.
// This is needed to avoid extra hits when we match the same Scalar twice
for (auto expr_it = linear_ir.rbegin(); expr_it != linear_ir.rend(); expr_it++) {
const auto expr = expr_it->get();
if (ov::is_type<op::Scalar>(expr->get_node())) {
const auto consumers = expr->get_output_port_connector(0)->get_consumers();
OPENVINO_ASSERT(consumers.size() == 1, "Scalar expression is expected to have a single consumer");
const auto& consumer_expr = consumers.begin()->get_expr();
// Move something only if consumer is not already the next one (previous since the iterator is a reverse one)
auto forward_it = std::prev(expr_it.base());
if (consumer_expr != *std::next(forward_it)) {
expr_it = std::prev(expr_it); // save iterator before moving
auto consumer_it = forward_it;
while (*consumer_it != consumer_expr)
consumer_it++;
linear_ir.move(forward_it, consumer_it);
modified = true;
}
}
}
return modified;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
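Moving a Scalar expression right next to its consumer is a constant-time relink of list nodes; `LinearIR::move` can be pictured as `std::list::splice` (a toy illustration, assuming the IR is list-backed as the pass's iterator usage suggests):

```cpp
#include <cassert>
#include <list>
#include <string>

// Move the element at `what` so it sits immediately before `before`.
// No elements are copied; only list links are rewired, and iterators stay valid.
inline void move_before(std::list<std::string>& ir,
                        std::list<std::string>::const_iterator what,
                        std::list<std::string>::const_iterator before) {
    ir.splice(before, ir, what);
}
```

Because splicing within the same list invalidates no iterators, the pass only has to re-seat its own reverse iterator (the `expr_it = std::prev(expr_it)` bookkeeping) before calling `move`.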


@@ -0,0 +1,26 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/pass.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
void PassPipeline::register_pass(const std::shared_ptr<Pass>& pass) {
m_passes.push_back(pass);
}
void PassPipeline::run(LinearIR& linear_ir) {
for (const auto& pass : m_passes) {
pass->run(linear_ir);
}
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
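`PassPipeline` is the classic sequential pass-manager pattern: store polymorphic passes and run them in registration order over one mutable IR. A self-contained sketch of the same shape (toy `LinearIR` and passes, not the real ones):

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct LinearIR { int value = 0; };  // toy IR: the passes below just mutate an int

struct Pass {
    virtual ~Pass() = default;
    virtual bool run(LinearIR& ir) = 0;  // returns whether the IR was modified
};

struct AddOne : Pass { bool run(LinearIR& ir) override { ir.value += 1; return true; } };
struct Twice  : Pass { bool run(LinearIR& ir) override { ir.value *= 2; return true; } };

class PassPipeline {
public:
    void register_pass(const std::shared_ptr<Pass>& pass) { m_passes.push_back(pass); }
    void run(LinearIR& ir) {
        for (const auto& pass : m_passes)
            pass->run(ir);  // registration order defines execution order
    }
private:
    std::vector<std::shared_ptr<Pass>> m_passes;
};
```

Because order matters (`AddOne` then `Twice` gives a different result from the reverse), the registration sequence in the real lowering flow effectively encodes the pass dependencies.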


@@ -0,0 +1,62 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "snippets/lowered/pass/propagate_layout.hpp"
#include "snippets/lowered/linear_ir.hpp"
#include "snippets/lowered/loop_manager.hpp"
#include "snippets/snippets_isa.hpp"
#include "snippets/itt.hpp"
namespace ov {
namespace snippets {
namespace lowered {
namespace pass {
bool PropagateLayout::run(LinearIR& linear_ir) {
OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::PropagateLayout")
if (linear_ir.empty())
return false;
for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) {
const auto& expr = *expr_it;
const auto io_expr = std::dynamic_pointer_cast<IOExpression>(expr);
if (!io_expr)
continue;
const bool is_input = io_expr->get_type() == IOExpression::io_type::INPUT;
const auto& connectors = is_input ? expr->get_output_port_connectors() : expr->get_input_port_connectors();
if (connectors.size() != 1)
OPENVINO_THROW("Parameter/Results should have exactly one output/input");
// If input - we should be looking downstream, if output - upstream
const auto& target_connector = connectors.front();
if (is_input) {
const auto consumer_inputs = target_connector->get_consumers();
// Note that here we consider only the first child (which is usually load),
// but often there is another child - LoopEnd
std::set<std::vector<size_t>> child_layouts;
for (const auto& child_input : consumer_inputs) {
const auto& child = child_input.get_expr();
const auto port = child_input.get_index();
const auto& n = child->get_node();
const auto ma = ov::as_type_ptr<op::MemoryAccess>(n);
if (ma && ma->is_memory_access_input_port(port)) {
child_layouts.insert(child_input.get_descriptor_ptr()->get_layout());
}
}
OPENVINO_ASSERT(child_layouts.size() == 1, "All children of an input expression must have the same layout");
io_expr->get_output_port_descriptor(0)->set_layout(*child_layouts.begin());
} else {
io_expr->get_input_port_descriptor(0)->set_layout(target_connector->get_source().get_descriptor_ptr()->get_layout());
}
}
return true;
}
} // namespace pass
} // namespace lowered
} // namespace snippets
} // namespace ov
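Collecting every memory-access child's layout into a `std::set` and requiring exactly one element is a compact way to assert agreement before propagating the layout to the Parameter. A standalone sketch of that check (toy layout type, not the port descriptor API):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Returns the single layout shared by all children; like the pass above,
// it asserts that exactly one distinct layout exists.
inline std::vector<size_t> unique_child_layout(const std::vector<std::vector<size_t>>& child_layouts) {
    std::set<std::vector<size_t>> unique(child_layouts.begin(), child_layouts.end());
    assert(unique.size() == 1 && "All children of an input expression must have the same layout");
    return *unique.begin();
}
```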

Some files were not shown because too many files have changed in this diff.