The motivation for this PR is that currently the build fails on my
Ubuntu 17.10 laptop with two processes because that machine "only" has
8 GB of RAM (granted, the optimization options may have been a bit too
excessive). under the new scheme, each specialization of the simulator
is put into a separate compile unit which is part of
libopmsimulators. this has the advantages that the specialized
simulators and the main binary automatically stay consistent, the
compilation is faster (2m25s vs 4m16s on my machine) because all
compile units can be built in parallel and that compilation takes up
less RAM because there is no need to instantiate all specializations
in a single compile unit.
on the minus side, all specializations must now always be compiled,
the approach means slightly more work for the maintainers and the
flow_* startup code gets even more complicated.