It is sometimes useful to mark fast stages (e.g. stages that process fewer than 100 elements) so that they can be picked out of the final performance report and their contribution to overall performance estimated.
* Added a pass for marking fast stages
* Introduced unit tests
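
For illustration only, a minimal sketch of what such a marking pass might look like. The `Stage` class, the `fast` attribute, the `mark_fast_stages` and `fast_stage_share` helpers, and the report layout are all hypothetical and not taken from the actual change; the threshold of 100 elements comes from the example above.

```python
from dataclasses import dataclass, field

# Assumed threshold, borrowed from the "less than 100 elements" example.
FAST_STAGE_MAX_ELEMENTS = 100


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage."""
    name: str
    num_elements: int
    attrs: dict = field(default_factory=dict)


def mark_fast_stages(stages, max_elements=FAST_STAGE_MAX_ELEMENTS):
    """Tag every stage that processes fewer than max_elements elements."""
    for stage in stages:
        if stage.num_elements < max_elements:
            stage.attrs["fast"] = True
    return stages


def fast_stage_share(report):
    """Return the fraction of total time spent in marked stages.

    report is a list of (stage, elapsed_seconds) pairs (assumed layout).
    """
    total = sum(elapsed for _, elapsed in report)
    if total == 0:
        return 0.0
    fast = sum(elapsed for stage, elapsed in report if stage.attrs.get("fast"))
    return fast / total
```

With the marker in place, a report parser only needs to check the attribute to separate fast stages from the rest, rather than re-deriving the element count for each stage.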