The general loop through all raw records should be based on the iterator
interface of the RawKeyword, but to resolve INCLUDE statements we have
implemented a special case method to get the first record.
Changes the implementation of numbers parsing from std::atoi/f to
std::strtod/l. These support setting the optional end-of-string pointer
which are used to determine if a parsing was successful or not. This has
the nice side effect of *greatly* simplifying the logic, at the expense
of some C-style details.
Tests added to verify that the different edge cases are handled
properly.
Since RawRecords now has automatic storage, managed by std::list,
offering iterators is feasible. The random access
RawKeyword::getRecord's real use was accessing the records in order,
which now is handled via iterators.
By changing the underlying storage of RawKeyword to std::list, we ensure
that the RawRecords aren't reallocated and moved, preserving the
validity of string_view's. This changes the access complexity of
RawRecord from O(1) to O(n).
These functions are called a lot and are trivial accessors to the
underlying containers. By opening them for inlining we get a decent
performance benefit "for free" via optimisation opportunities.
The accumulated strings are moved into RawRecords, which reduces
execution time (rough measurements indicates 4-8%). To facilitate this,
RawRecords are stored directly in the vector in favour of via
shared_ptrs.
An attacker using very long decimal integers as input could trigger a
buffer overflow write during int/double parsing.
The vulnerability has been fixed and raw buffer boundaries are checked.
Additionally, integer buffer size is determined by platform 'int' width.
'double' uses a heuristic to support both pure decimal formats (up to 64
characters long) and float formats.
The push_front() method can cause reallocation of expanded_items,
thereby invalidating iterators already stored in m_recordItems.
Switching to std::list fixes this.
readValueToken spent almost half its time dealing with weirdly formed or
broken floats. Now has a shorter path that can early return a
successfully parsed float and only do slow handling of cases that need
it (notably zero, fortran style exponent and errors).
The boost provided lexical cast are inefficient and is shown to be a
slowdown in the inner loop. Replaces them with std::atoi/std::atof and
some simple correctness checking.
Modifies RawRecord to internally use string_view instead of copies of
the substrings. This *vastly* reduces copying in the processing of each
record and subsequently improves performance. Reduces total memory usage
in Deck construction.
The templated readValueToken has been moved to source file, and uses
explicit instantiation and linking. The deprecated float specialisation
has been removed.
Every header is self-contained and includes only what it must to
function, relying on users include what they need in source files,
adopting a pay-what-you-use model (in particular for internal
dependencies).
i.e. first try to convert it normally and only if this fails, replace
'd' by 'e' and try again.
this is because the slowdown when always taking the second path was
about 7 to 9% for the Norne deck on my machine. (I think this is quite
surprising.)
thanks to [at]joakim-hove for the hint.
- use a opm-macro to reduce code duplication
- add a 'test-suite' target which builds tests. for use if BUILD_TESTING
is 0.
- add a 'check' target which builds tests, then executes them
note that comment handling is currently a bit too simplistic as stuff
like
FOO
'-- hello' /
won't work. as far as I can see, this is not different from the state
before this patch, though.
this is just the result of
```
find -iname "*.[ch]pp" | xargs sed -i "s/ *$//"
find opm/parser/share/keywords -type f | xargs sed -i "s/ *$//"
```
so if it causes conflicts with other patches, the others should get
priority. The rationale behind this patch is that some people tell
their editor to remove white space which leads to larger than
necessary patches...
i.e., make keywords ALL_UPPERCASE before using them because Eclipse
seems to be case-insensitive (although this is one of its undocumented
features...)
The Norne deck actually exhibits this atrocity in form of the
'fluxnum' keyword in the file 'INCLUDE/PETRO/FLUXNUM_0704.prop'.
I don't know if Eclipse cares about the case of the keywords, but
opm-parser currently does for sure. (If Eclipse turns out to be
case-insensitive, the easiest fix for us is to just make all keywords
ALL_UPPERCASE...)
This allows to arbitrary characters like stars into strings. e.g.
MYKEYWORD
'123*456' 2*'Hello, World! (*)' /
is now a valid record with three strings while it threw an exception
before.
This patch works by transferring the removal of the quotes from the
RawDeck class to the readValueToken<T>() function which now has a
specialization for strings that deals with quotes. One small
complication is that the RawDeck needs to be adapted in order not to
split tokens in the middle of strings.
Finally, the StarToken class does not deal with the conversion from
string to the value type of the item anymore which allows it to become
a normal class instead of a template...
these includes are required by the headers. If the affected files
would have been included without the headers included before, a
compiler error would have been produced.
i.e. MULT[XYZ]- was trimmed to MULT[XYZ]. Also, the RawKeyword now
uses ParserKeyword::isValidDeckName() instead of a regular expression
which makes it automatically consistent and also should make it
slightly faster...