opm-simulators/doc/handbook/designpatterns.tex
Andreas Lauser 43764cfc40 handbook: replace \textit by \emph
thanks to [at]pgdr for the suggestion.
2017-01-16 12:34:24 +01:00

476 lines
20 KiB
TeX

\chapter{Design Patterns}
\label{chap:DesignPatterns}
This chapter tries to give a high-level understanding of some of the
fundamental techniques which are used by \Dune and \eWoms and the
motivation behind these design decisions. It is assumed that the
reader already has basic experience in object oriented programming
with \Cplusplus.
First, a quick motivation of \Cplusplus template programming is given. Then
follows an introduction to polymorphism and its opportunities as
opened by the template mechanism. After that, some drawbacks
associated with template programming are discussed, in particular the
blow-up of identifier names, the proliferation of template arguments
and some approaches on how to deal with these problems.
\section{\Cplusplus Template Programming}
One of the main features of modern versions of the \Cplusplus programming
language is robust support for templates. Templates are a mechanism
for code generation built directly into the compiler. For the
motivation behind templates, consider a linked list of \texttt{double}
values which could be implemented like this:
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
struct DoubleList {
DoubleList(const double &val, DoubleList *prevNode = 0)
{ value = val; if (prevNode) prevNode->next = this; };
double value;
DoubleList *next;
};
int main() {
DoubleList *head, *tail;
head = tail = new DoubleList(1.23);
tail = new DoubleList(2.34, tail);
tail = new DoubleList(3.56, tail);
};
\end{lstlisting}
But what should be done if a list of strings is also required? The
only ``clean'' way to achive this without templates would be to copy
\texttt{DoubleList}, then rename it and change the type of the
\texttt{value} attribute. It is obvious that this is a very
cumbersome, error-prone and unproductive process. For this reason,
recent standards of the \Cplusplus programming language specify the template
mechanism, which is a way of letting the compiler do the tedious work. Using
templates, a generic linked list can be implemented like this:
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
template <class ValueType>
struct List {
List(const ValueType &val, List *prevNode = 0)
{ value = val; if (prevNode) prevNode->next = this; };
ValueType value;
List *next;
};
int main() {
typedef List<double> DoubleList;
DoubleList *head, *tail;
head = tail = new DoubleList(1.23);
tail = new DoubleList(2.34, tail);
tail = new DoubleList(3.56, tail);
typedef List<const char*> StringList;
StringList *head2, *tail2;
head2 = tail2 = new StringList("Hello");
tail2 = new StringList(", ", tail2);
tail2 = new StringList("World!", tail2);
};
\end{lstlisting}
Compared to approaches which use external tools for code generation --
which is the approach chosen for example by the
FEniCS~\cite{FENICS-HP} project -- or heavy (ab)use of the C
preprocessor -- as done for example by the UG framework~\cite{UG-HP}
-- templates have several advantages:
\begin{description}
\item[Well Programmable:] Programming errors are directly detected by
the \Cplusplus compiler. Thus, diagnostic messages from the compiler are
directly useful because the source code compiled is the
same as the one written by the developer. This is not the case
for code generators and C-preprocessor macros where the actual
statements processed by the compiler are obfuscated.
\item[Easily Debugable:] Programs which use the template mechanism can be
debugged almost as easily as \Cplusplus programs which do not use
templates. This is due to the fact that the debugger always knows
the ``real'' source file and line number.
\end{description}
For these reasons \Dune and \eWoms extensively use the template
mechanism. Both projects also try to avoid duplicating functionality
provided by the Standard Template Library (STL,~\cite{STL-REF-HP})
which is part of the \Cplusplus-2003 standard and was further extended
in the \Cplusplus11 standard.
\section{Polymorphism}
In object oriented programming, some methods often make sense for all
classes in a hierarchy, but what actually needs to be \emph{done}
can differ for each concrete class. This observation motivates
\emph{polymorphism}. Fundamentally, polymorphism means all
techniques in which a method call results in the processor executing code
which is specific to the type of object for which the method is
called\footnote{This is the \emph{poly} of polymorphism: There are
multiple ways to achieve the same goal.}.
In \Cplusplus, there are two common ways to achieve polymorphism: The
traditional dynamic polymorphism which does not require template
programming, and static polymorphism which is made possible by the
template mechanism.
\subsection*{Dynamic Polymorphism}
To utilize \emph{dynamic polymorphism} in \Cplusplus, the polymorphic
methods are marked with the \texttt{virtual} keyword in the base
class. Internally, the compiler realizes dynamic polymorphism by
storing a pointer to a so-called \texttt{vtable} within each object of
polymorphic classes. The \texttt{vtable} itself stores the entry point
of each method which is declared \texttt{virtual}. If such a method is
called on an object, the compiler generates code which retrieves the
method's memory address from the object's \texttt{vtable} and then
continues execution at this address. This explains why this mechanism
is called \textbf{dynamic} polymorphism: the code which is actually
executed is dynamically determined at run time.
\begin{example}
\label{example:DynPoly}
A class called \texttt{Car} may feature the methods
\texttt{gasUsage} (on line \ref{designpatterns:virtual-usage}, which
by default roughly corresponds to the current $CO_2$ emission goal
of the European Union, but can be changed by classes representing
actual cars. Also, a method called \texttt{fuelTankSize} makes
sense for all cars, but since there is no useful default, its
\texttt{vtable} entry is set to $0$ in the base class on line
\ref{designpatterns:totally-virtual}. This tells the compiler that
it is mandatory for this method to be defined in derived
classes. Finally, the method \texttt{range} may calculate the
expected remaining kilometers the car can drive given a fill level
of the fuel tank. Since the \texttt{range} method can retrieve the
information it needs, it does not need to be polymorphic.
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
// The base class
class Car
{public:
virtual double gasUsage()
{ return 4.5; };/*@\label{designpatterns:virtual-usage}@*/
virtual double fuelTankSize() = 0;/*@\label{designpatterns:totally-virtual}@*/
double range(double fuelTankFillLevel)
{ return 100*fuelTankFillLevel*fuelTankSize()/gasUsage(); }
};
\end{lstlisting}
\noindent
Actual car models can now be derived from the base class like this:
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
// A Mercedes S-class car
class S : public Car
{public:
virtual double gasUsage() { return 9.0; };
virtual double fuelTankSize() { return 65.0; };
};
// A VW Lupo
class Lupo : public Car
{public:
virtual double gasUsage() { return 2.99; };
virtual double fuelTankSize() { return 30.0; };
};
\end{lstlisting}
\noindent
Calling the \texttt{range} method yields the correct result for any
kind of car:
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
void printMaxRange(Car &car)
{ std::cout << "Maximum Range: " << car.range(1.00) << "\n"; }
int main()
{
Lupo lupo;
S s;
std::cout << "VW Lupo:";
std::cout << "Median range: " << lupo.range(0.50) << "\n";
printMaxRange(lupo);
std::cout << "Mercedes S-Class:";
std::cout << "Median range: " << s.range(0.50) << "\n";
printMaxRange(s);
}
\end{lstlisting}
For both types of cars, \texttt{Lupo} and \texttt{S} the function
\texttt{printMaxRange} works as expected, it yields
$1003.3\;\mathrm{km}$ for the Lupo and $722.2\;\mathrm{km}$ for the
S-Class.
\end{example}
\begin{exc}
What happens if \dots
\begin{itemize}
\item \dots the \texttt{gasUsage} method is removed from the \texttt{Lupo} class?
\item \dots the \texttt{virtual} qualifier is removed in front of the
\texttt{gasUsage} method in the base class?
\item \dots the \texttt{fuelTankSize} method is removed from the \texttt{Lupo} class?
\item \dots the \texttt{range} method in the \texttt{S} class is
overwritten?
\end{itemize}
\end{exc}
\subsection*{Static Polymorphism}
Dynamic polymorphism has a few disadvantages. The most relevant in the
context of \eWoms is that the compiler can not see ``inside'' the
called methods and thus cannot optimize properly. For example, modern
\Cplusplus compilers 'inline' short methods, i.e. they copy the method's body
to where it is called. First, inlining allows to save a few
instructions by avoiding to jump into and out of the method. Second,
and more importantly, inlining also allows further optimizations which
depend on specific properties of the function arguments (e.g. constant
value elimination) or the contents of the function body (e.g. loop
unrolling). Unfortunately, inlining and other cross-method
optimizations are made next to impossible by dynamic
polymorphism. This is because these optimizations are accomplished by the
compiler (i.e. at compile time) whilst the code which actually gets
executed is only determined at run time for \texttt{virtual}
methods. To overcome this issue, template programming can be used to
achive polymorphism at compile time. This works by supplying the type
of the derived class as an additional template parameter to the base
class. Whenever the base class needs to call back the derived class, the \texttt{this} pointer of the base class is reinterpreted as
being a pointer to an object of the derived class and the method is
then called on the reinterpreted pointer. This scheme gives the \Cplusplus
compiler complete transparency of the code executed and thus opens
much better optimization oportunities. Since this mechanism completely
happens at compile time, it is called ``static polymorphism'' because
the called method cannot be dynamically changed at runtime.
\begin{example}
Using static polymorphism, the base class of example \ref{example:DynPoly}
can be implemented like this:
\begin{lstlisting}[name=staticcars,basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
// The base class. The 'Imp' template parameter is the
// type of implementation, i.e. the derived class
template <class Imp>
class Car
{public:
double gasUsage()
{ return 4.5; };
double fuelTankSize()
{ throw "The derived class needs to implement the fuelTankSize() method"; };
double range(double fuelTankFillLevel)
{ return 100*fuelTankFillLevel*asImp_().fuelTankSize()/asImp_().gasUsage(); }
protected:
// reinterpret 'this' as a pointer to an object of type 'Imp'
Imp &asImp_() { return *static_cast<Imp*>(this); }
};
\end{lstlisting}
(Notice the \texttt{asImp\_()} calls in the \texttt{range} method.) The
derived classes may now be defined like this:
\begin{lstlisting}[name=staticcars,basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
// A Mercedes S-class car
class S : public Car<S>
{public:
double gasUsage() { return 9.0; };
double fuelTankSize() { return 65.0; };
};
// A VW Lupo
class Lupo : public Car<Lupo>
{public:
double gasUsage() { return 2.99; };
double fuelTankSize() { return 30.0; };
};
\end{lstlisting}
\end{example}
\noindent
Analogous to example \ref{example:DynPoly}, the two kinds of cars can
be used generically within (template) functions:
\begin{lstlisting}[name=staticcars,basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
template <class CarType>
void printMaxRange(CarType &car)
{ std::cout << "Maximum Range: " << car.range(1.00) << "\n"; }
int main()
{
Lupo lupo;
S s;
std::cout << "VW Lupo:";
std::cout << "Median range: " << lupo.range(0.50) << "\n";
printMaxRange(lupo);
std::cout << "Mercedes S-Class:";
std::cout << "Median range: " << s.range(0.50) << "\n";
printMaxRange(s);
return 0;
}
\end{lstlisting}
%\textbf{TODO: Exercise}
\section{Common Template Programming Related Problems}
Although \Cplusplus template programming opens a few intriguing
possibilities, it also has a few disadvantages. In this section, a few
of them are outlined and some hints on how they can be dealt with are
provided.
\subsection*{Identifier-Name Blow-Up}
One particular problem with the advanced use of \Cplusplus templates is that the
canonical identifier names for types and methods quickly become really
long and unintelligible. For example, a typical error message
generated using GCC 4.5 and \Dune-PDELab looks like this
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize, numbersep=5pt]
test_pdelab.cc:171:9: error: no matching function for call to Dune::\
PDELab::GridOperatorSpace<Dune::PDELab::PowerGridFunctionSpace<Dune::\
PDELab::GridFunctionSpace<Dune::GridView<Dune::DefaultLeafGridViewTraits\
<const Dune::UGGrid<3>, (Dune::PartitionIteratorType)4u> >, Dune::\
PDELab::Q1LocalFiniteElementMap<double, double, 3>, Dune::PDELab::\
NoConstraints, Ewoms::PDELab::BoxISTLVectorBackend<Ewoms::Properties::\
TTag::LensProblem> >, 2, Dune::PDELab::GridFunctionSpaceBlockwiseMapper>\
, Dune::PDELab::PowerGridFunctionSpace<Dune::PDELab::GridFunctionSpace<\
Dune::GridView<Dune::DefaultLeafGridViewTraits<const Dune::UGGrid<3>, \
(Dune::PartitionIteratorType)4u> >, Dune::PDELab::Q1LocalFiniteElementMap\
<double, double, 3>, Dune::PDELab::NoConstraints, Ewoms::PDELab::\
BoxISTLVectorBackend<Ewoms::Properties::TTag::LensProblem> >, 2, Dune::\
PDELab::GridFunctionSpaceBlockwiseMapper>, Ewoms::PDELab::BoxLocalOperator\
<Ewoms::Properties::TTag::LensProblem>, Dune::PDELab::\
ConstraintsTransformation<long unsigned int, double>, Dune::PDELab::\
ConstraintsTransformation<long unsigned int, double>, Dune::PDELab::\
ISTLBCRSMatrixBackend<2, 2>, true>::GridOperatorSpace()
\end{lstlisting}
This seriously complicates diagnostics. Although there is no full
solution to this problem yet, an effective way of dealing with such
kinds of error messages is to ignore the type information and to just
look at the location given at the beginning of the line. If nested
templates are used, the lines printed by the compiler above the actual
error message specify how exactly the code was instantiated (the lines
starting with ``\texttt{instantiated from}''). In this case it is
advisable to look at the innermost source code location of ``recently
added'' source code.
\subsection*{Proliferation of Template Parameters}
Templates often need a large number of template parameters. For
example, the error message above was produced by the following
snipplet:
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
int main()
{
enum {numEq = 2};
enum {dim = 3};
typedef Dune::UGGrid<dim> Grid;
typedef Grid::LeafGridView GridView;
typedef Dune::PDELab::Q1LocalFiniteElementMap<double,double,dim> FEM;
typedef TTAG(LensProblem) TypeTag;
typedef Dune::PDELab::NoConstraints Constraints;
typedef Dune::PDELab::GridFunctionSpace<
GridView, FEM, Constraints, Ewoms::PDELab::BoxISTLVectorBackend<TypeTag>
>
doubleGridFunctionSpace;
typedef Dune::PDELab::PowerGridFunctionSpace<
doubleGridFunctionSpace,
numEq,
Dune::PDELab::GridFunctionSpaceBlockwiseMapper
>
GridFunctionSpace;
typedef typename GridFunctionSpace::ConstraintsContainer<double>::Type
ConstraintsTrafo;
typedef Ewoms::PDELab::BoxLocalOperator<TypeTag> LocalOperator;
typedef Dune::PDELab::GridOperatorSpace<
GridFunctionSpace,
GridFunctionSpace,
LocalOperator,
ConstraintsTrafo,
ConstraintsTrafo,
Dune::PDELab::ISTLBCRSMatrixBackend<numEq, numEq>,
true
>
GOS;
GOS gos; // instantiate grid operator space
}
\end{lstlisting}
Although the code above is not really intuitivly readable, this does
not pose a severe problem as long as the type (in this case the grid
operator space) needs to be specified exactly once in the whole
program. If, on the other hand, it needs to be consistend over
multiple locations in the source code, measures have to be taken in
order to keep the code maintainable.
\section{Traits Classes}
A classic approach to reducing the number of template parameters is to
gather all the arguments in a special class, a so-called traits
class. Instead of writing
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
template <class A, class B, class C, class D>
class MyClass {};
\end{lstlisting}
one can use
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
template <class Traits>
class MyClass {};
\end{lstlisting}
where the \texttt{Traits} class contains public type definitions for
\texttt{A}, \texttt{B}, \texttt{C} and \texttt{D}, e.g.
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
struct MyTraits
{
typedef float A;
typedef double B;
typedef short C;
typedef int D;
};
\end{lstlisting}
\noindent
As there is no free lunch, the traits approach comes with a few
disadvantages of its own:
\begin{enumerate}
\item Hierarchies of traits classes are problematic. This is due to
the fact that each level of the hierarchy must be self-contained. As
a result, it is impossible to define parameters in the base class
which depend on parameters which only later get specified by a
derived traits class.
\item Traits quickly lead to circular dependencies. In practice
this means that traits classes can not extract any information from
templates which get the traits class as an argument -- even if the
extracted information does not require the traits class.
\end{enumerate}
\noindent
To see the point of the first issue, consider the following:
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
struct MyBaseTraits {
typedef int Scalar;
typedef std::vector<Scalar> Vector;
typedef std::list<Scalar> List;
typedef std::array<Scalar> Array;
typedef std::set<Scalar> Set;
};
struct MyDoubleTraits : public MyBaseTraits {
typedef double Scalar;
};
int main() {
MyDoubleTraits::Vector v{1.41421, 1.73205, 2};
for (int i = 0; i < v.size(); ++i)
std::cout << v[i]*v[i] << std::endl;
}
\end{lstlisting}
Contrary to what is intended, \texttt{v} is a vector of integers. This
problem can not be solved using static polymorphism, either, since it
would lead to a cyclic dependency between \texttt{MyBaseTraits} and
\texttt{MyDoubleTraits}.
The second issue is illuminated by the following example, where one
would expect the \texttt{MyTraits:: VectorType} to be \texttt{std::vector<double>}:
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize,numbers=left,numberstyle=\tiny, numbersep=5pt]
template <class Traits>
class MyClass {
public: typedef double ScalarType;
private: typedef typename Traits::VectorType VectorType;
};
struct MyTraits {
typedef MyClass<MyTraits>::ScalarType ScalarType;
typedef std::vector<ScalarType> VectorType
};
\end{lstlisting}
Although this example seems to be quite pathetic, in practice it is
often useful to specify parameters in such a way.
% TODO: section about separation of functions, parameters and
% independent variables. (e.g. the material laws: BrooksCorey
% (function), BrooksCoreyParams (function parameters), wetting
% saturation/fluid state (independent variables)