Commit Graph

137 Commits

Author SHA1 Message Date
Joakim Hove
af7693ef6b Add parse option keep '/' in data sections
The records are normally terminated on the first unquoted '/', but for the UDQ
and ACTIONX keywords the data sections of a keyword contain mathematical
expressions which can contain '/' literals. This commit adds a per-keyword
ability to terminate the records on the last '/' instead of the first '/'.
2019-03-05 07:15:29 +01:00
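As a minimal sketch of the two termination rules described in af7693ef6b: the helper name find_terminator and its last_slash flag are hypothetical and are not taken from opm-parser. It assumes single quotes delimit strings and that a quoted '/' never terminates a record.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the opm-parser API. Returns the index of the
// unquoted '/' that terminates the record, or std::string::npos if there
// is none. With last_slash == false the first unquoted '/' wins (the
// normal rule); with last_slash == true the last unquoted '/' wins, which
// leaves earlier '/' characters available as division operators.
std::size_t find_terminator(const std::string& record, bool last_slash) {
    std::size_t pos = std::string::npos;
    bool in_quote = false;
    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (c == '\'') {
            in_quote = !in_quote;          // toggle on every single quote
        } else if (c == '/' && !in_quote) {
            pos = i;
            if (!last_slash) break;        // first unquoted slash terminates
        }
    }
    return pos;
}
```

With a record such as "DEFINE WUX A / B /" the default rule would cut the expression at the division sign, while the last-slash rule keeps "A / B" intact.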
Jørgen Kvalsvik
e884b0664c Redesign cmake
Tune the makefile according to new principles, which adds a few bells
and whistles and improves clarity.

Synopsis:

* The dependency on opm-common is completely gone. This is reflected in
  travis and appveyor as well. No non-kitware cmake modules are used.
* Directories are flattened, quite a bit - source code is located in the
  lib/ directory if it belongs to opm-parser, and external/ if third
  party.
* The sibling build feature is implemented through cmake's
  export(PACKAGE) rather than implicitly looking through source files.
* Targets explicitly set required public and private include
  directories, compile options and definitions, which cmake will handle
  and propagate.
* opm-parser-config.cmake for downstream users is now provided.
* Dependencies are set up using targets. In the future, when cmake 3.x+
  can be used, these should be either targets from newer Find modules,
  or interface libraries.
* Fewer system specific assumptions are coded in; instead we assume
  cmake or users set up system specific details.
* All module wide configuration and looking up libraries is handled in
  the root makefile - all sub directories only set up libraries and
  compile options for the module in question.
* Targets are defined and links handled transitively because cmake now
  is told about them. ${module_LIBRARIES} variables are gone.

This is largely guided by the principles outlined in
https://rix0r.nl/blog/2015/08/13/cmake-guide/

Most source files are just moved - if they have some content change then
it's nothing more than include fixes or similar in order to make them
compile.
2017-06-01 15:29:23 +02:00
Jørgen Kvalsvik
0c1dae7016 Check for separator/quote table overflow
Just relying on the char data type is not sufficient to guard against
overflows, and several input decks would invoke undefined behaviour.
This code path is extremely hot, so we're essentially only reading the
least significant 7 bits to achieve branchless lookup.
2016-10-21 10:33:28 +02:00
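A minimal sketch of a branchless, overflow-safe table lookup in the spirit of 0c1dae7016. The table contents and the names separator_table and is_separator_masked are illustrative assumptions, not the opm-parser definitions.

```cpp
#include <array>

// Illustrative 128-entry table, one slot per 7-bit character code.
inline const std::array<bool, 128>& separator_table() {
    static const std::array<bool, 128> table = [] {
        std::array<bool, 128> t{};                       // all false
        t[' '] = t['\t'] = t['\n'] = t['\r'] = t[','] = true;
        return t;
    }();
    return table;
}

inline bool is_separator_masked(char ch) {
    // char may be signed, so a byte >= 0x80 would otherwise become a
    // negative index. Keeping only the low 7 bits bounds the index to
    // [0, 127] without a branch.
    return separator_table()[static_cast<unsigned char>(ch) & 0x7F];
}
```

One consequence of masking in this sketch is that bytes above 127 alias onto their 7-bit counterparts (0xA0 is treated like a space, for instance). That trade-off buys a lookup with no range check.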
Jørgen Kvalsvik
37c04328ca Remove shared_ptr typedefs 2016-10-19 20:38:28 +02:00
Jørgen Kvalsvik
a51127f0c8 Make character lookup table boolean 2016-10-06 13:17:06 +02:00
Jørgen Kvalsvik
d9443c7355 Replace number parser with boost::spirit::qi
The hand-written number parser functions implemented using strtod and
friends were rather slow (profiling indicates that typically 30% of the
program is spent inside of strtod internals). By using
boost::spirit::qi, which we already depend on through boost-filesystem
and others, this portion typically seems to be reduced to 20% (via
instruction count) and with somewhat better cache performance.
Rudimentary measuring indicates ~15% speedup overall.

Additionally, the intention is a lot clearer this way, so readability
received a boost. Compilation time of StarToken goes through the roof.
2016-08-28 11:08:30 +02:00
Jørgen Kvalsvik
db33a9cc55 Prefer function objects in Parser::clean
Implementing these checks as function objects improves performance
slightly (5% or so according to my measurements), probably due to the
functions being inlined rather than reduced to function pointers.
2016-08-08 09:42:41 +02:00
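A minimal sketch of the difference db33a9cc55 refers to. IsBlank, is_blank_fn and strip_if are hypothetical names; the real checks in Parser::clean are not shown in this log.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Function object: the predicate's type identifies the exact call
// operator, so the compiler can usually inline it into the algorithm.
struct IsBlank {
    bool operator()(char c) const { return c == ' ' || c == '\t'; }
};

// Free function: when passed to a template it decays to a function
// pointer, and the call may stay indirect unless the optimiser can
// prove which function is behind the pointer.
bool is_blank_fn(char c) { return c == ' ' || c == '\t'; }

// Copies every character for which pred(c) is false.
template <typename Pred>
std::string strip_if(const std::string& in, Pred pred) {
    std::string out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [&](char c) { return !pred(c); });
    return out;
}
```

Both strip_if(text, IsBlank{}) and strip_if(text, is_blank_fn) give the same result. Only the first hands the compiler a distinct type per predicate.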
Jørgen Kvalsvik
f571f21171 is_separator includes comma
This deprecates the comma replace function in the reader.
2016-08-04 16:05:53 +02:00
Jørgen Kvalsvik
7e9158d319 Anon namespace, removal of unused string constant 2016-08-04 16:05:53 +02:00
Jørgen Kvalsvik
b2f206d54a Replace RawRecord expanded items; reuse view
Reusing the original record's string_view rather than expanding to the
same std::string saves some allocations and memory cache misses, and
simplifies the class slightly.

Additionally, the uninteresting add-multiple-identical-records logic
ParserItem did before has been moved into RawRecord and is now performed
by std::deque (which also means it can allocate better for itself). The
addition of prepend deprecates push_front.
2016-08-04 16:05:53 +02:00
Jørgen Kvalsvik
11b4bc2dcd Lookup tables in is_separator/quote
The use of lookup tables reduces branching and seems to improve
performance by a few percent.
2016-08-04 16:05:53 +02:00
Jørgen Kvalsvik
33a87a1ced Fix warnings in StarToken 2016-07-13 23:40:09 +02:00
Jørgen Kvalsvik
06c90c4bc7 RawKeyword uses uppercase from util 2016-06-14 17:01:50 +02:00
Magne Sjaastad
85e3ae61b3 VS2015 : Added missing include to cctype 2016-05-25 10:39:19 +02:00
Magne Sjaastad
393bdb42f2 VS2015 : Added missing include to ssize_t 2016-05-25 10:39:19 +02:00
Jørgen Kvalsvik
1f2c2ba98d Name cases where TITLE is unset
When the TITLE keyword was present in the deck, but no parameter was
given, the parser would consume the next keyword as the simulation TITLE.
Override this by writing a default TITLE if it's unset.
2016-05-03 15:42:30 +02:00
Jørgen Kvalsvik
0966a9cb8c Fix RawKeyword tests to reflect assumptions 2016-05-03 13:39:36 +02:00
Jørgen Kvalsvik
120a30e94b Replace std::isspace in parser; add \r to is_sep 2016-05-03 09:16:28 +02:00
Jørgen Kvalsvik
784a1a5d78 Replace std::isspace with hand-rolled version
Profiling indicates isspace isn't inlined properly, so we replace it with
a hand-rolled easier-to-inline version.
2016-05-03 09:16:28 +02:00
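A hand-rolled whitespace test of the kind 784a1a5d78 describes might look like the sketch below. The name is_space_inline is an assumption, and the character set is the one std::isspace accepts in the "C" locale.

```cpp
// Hypothetical stand-in for std::isspace in the "C" locale: space, plus
// the contiguous range '\t' (9) through '\r' (13), which covers
// '\t', '\n', '\v', '\f' and '\r'.
inline bool is_space_inline(char ch) {
    return ch == ' ' || (ch >= '\t' && ch <= '\r');
}
```

Besides being trivially inlinable, this form is well defined for negative char values, whereas passing a negative char directly to std::isspace is undefined behaviour.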
Jørgen Kvalsvik
8a4eb5279c Splitting records with string_view; test updates
The splitting of RawRecords into individual symbols uses string_view.
Also updates tests since RawRecord now assumes that the record string it
receives is complete and does *not* contain the terminating slash.
2016-05-03 09:16:28 +02:00
Jørgen Kvalsvik
c2b5da457c Re-implement is_separator to use isspace 2016-05-03 09:16:28 +02:00
Jørgen Kvalsvik
07eea89c34 Remove redundant overloads
These overloads were written to allow testing, but with string_view
accepting char* they're unnecessary and confusing.
2016-05-03 09:16:28 +02:00
Jørgen Kvalsvik
6b64796d49 Add char* constructor to string_view
Mostly relevant for testing, this enables string_view to work as
expected with string literals.
2016-05-03 09:16:28 +02:00
Jørgen Kvalsvik
bfa3f799b9 Improved fast path in number parsing
Since string_view uses char* for representation, there is no longer a
need to copy into a local char array for most code paths (only when the
input must be modified due to fortran float formatting).
2016-05-03 09:16:28 +02:00
Jørgen Kvalsvik
b648719513 Base string_view on char*, not string::iterator
By representing string_view as char* instead of
std::string::const_iterator, the string_view class possibly gets
slightly lower overhead, but mostly this enables some optimisations.
2016-05-03 09:16:27 +02:00
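A minimal sketch of a pointer-based view along the lines of b648719513 and 6b64796d49. The class name sketch_view and its members are assumptions for illustration; the parser's own string_view is not reproduced in this log.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Non-owning view over a character range, stored as two raw pointers.
// The viewed characters must outlive the view.
class sketch_view {
public:
    sketch_view(const char* first, const char* last)
        : fst(first), lst(last) {}
    sketch_view(const std::string& s)
        : fst(s.data()), lst(s.data() + s.size()) {}
    // char* constructor: lets string literals be used directly in tests.
    sketch_view(const char* literal)
        : fst(literal), lst(literal + std::strlen(literal)) {}

    const char* begin() const { return fst; }
    const char* end() const { return lst; }
    std::size_t size() const { return static_cast<std::size_t>(lst - fst); }

    bool operator==(const sketch_view& rhs) const {
        return size() == rhs.size() &&
               std::equal(begin(), end(), rhs.begin());
    }

private:
    const char* fst;
    const char* lst;
};
```

Because begin() is a plain const char*, the range can be handed to C functions such as std::strtod or std::memcpy without first copying it into a std::string.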
Jørgen Kvalsvik
a1ae0e0067 Updated various functions to accept string_view
To take advantage of parser's internal string_view representation of
keyword names, several signatures have been updated to accept
string_view.
2016-05-03 09:16:27 +02:00
Jørgen Kvalsvik
096e843d08 Moved special-casing of TITLE to RawKeyword
Replaces the special-casing of the TITLE keyword where a terminating
slash is added onto the record with a non-mutating version.
2016-05-03 09:16:27 +02:00
Jørgen Kvalsvik
7d29d63bea Replace string with string_view in inner parser
Several inner parser functions are modified to use string_view to reduce
unnecessary copying (and, indirectly, allocation) related to passing
strings around.
2016-05-03 09:16:27 +02:00
Jørgen Kvalsvik
db6bb58f60 Remove redundant application of uppercase()
When considering if a keyword is valid, the parser procedures convert
the same string to uppercase twice, with copies. This behaviour has been
changed, and ParserKeyword now assumes it will be given a
correctly-formatted keyword to look up.
2016-05-03 09:15:46 +02:00
Jørgen Kvalsvik
9ce28fb7ea Restructure addRawRecordString
Reduces indentation slightly and adds multiple returns to better
communicate that some conditions actually logically terminate at several
points.
2016-05-03 09:15:45 +02:00
Jørgen Kvalsvik
4b4d2c02c0 Reimplementation of stripComments
The implementation has been rewritten to use iterators and renamed for
internal use. The public static function still exists for testability
and easy verification, but should be considered an internal part of the
parser.
2016-05-03 09:15:22 +02:00
Jørgen Kvalsvik
bfb7f6ec0c RawRecord internals use ranges over string methods
Rewrites the internal string manipulation functions of RawRecord to use
std::algorithm and ranges over string methods.
2016-05-03 09:15:22 +02:00
Joakim Hove
bdb1313f41 RawKeyword getRecord( int ) -> getFirstRecord()
The general loop through all raw records should be based on the iterator
interface of the RawKeyword, but to resolve INCLUDE statements we have
implemented a special case method to get the first record.
2016-04-04 16:21:52 +02:00
Joakim Hove
e57d61468d Added non const record iterators to RawKeyword. 2016-04-03 22:00:30 +02:00
Jørgen Kvalsvik
8e4e9b15d8 ReadValueToken correctly parses numbers
Changes the implementation of number parsing from std::atoi/f to
std::strtod/l. These support setting the optional end-of-string pointer,
which is used to determine whether parsing was successful. This has
the nice side effect of *greatly* simplifying the logic, at the expense
of some C-style details.

Tests added to verify that the different edge cases are handled
properly.
2016-04-01 10:13:37 +02:00
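A minimal sketch of the end-pointer check that 8e4e9b15d8 describes. The function name parse_double_strict is hypothetical. std::atof returns 0.0 for unparsable input, which cannot be told apart from a genuine zero, while std::strtod reports where it stopped reading.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: succeeds only if the whole token is a number.
bool parse_double_strict(const std::string& token, double& out) {
    if (token.empty()) return false;

    char* end = nullptr;
    const double value = std::strtod(token.c_str(), &end);

    // strtod sets 'end' to the first character it could not consume.
    // Anything short of the end of the token means trailing garbage
    // or no conversion at all.
    if (end != token.c_str() + token.size()) return false;

    out = value;
    return true;
}
```

Note that std::strtod also accepts leading whitespace and forms such as "inf" or hexadecimal floats, so a real parser may want additional checks on top of this.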
Jørgen Kvalsvik
046afdd3be RawKeyword iterator support
Since RawRecords now has automatic storage, managed by std::list,
offering iterators is feasible. The random access
RawKeyword::getRecord's real use was accessing the records in order,
which now is handled via iterators.
2016-03-30 12:47:33 +02:00
Jørgen Kvalsvik
83ae276d67 Fix string_view iterator invalidation bug
By changing the underlying storage of RawKeyword to std::list, we ensure
that the RawRecords aren't reallocated and moved, preserving the
validity of string_views. This changes the access complexity of
RawRecord from O(1) to O(n).
2016-03-30 12:31:00 +02:00
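A minimal sketch of why std::list helps here, using a hypothetical Record type rather than the real RawRecord. List nodes are never relocated when the container grows, so pointers into a stored record stay valid. A std::vector may move its elements on reallocation, which invalidates such pointers.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a record that owns its text.
struct Record {
    std::string text;
};

// Takes a pointer into the first record, grows the list, and checks that
// the pointer still refers to the same characters.
bool view_survives_growth(std::size_t extra_records) {
    std::list<Record> records;
    records.push_back(Record{"PORO 0.3"});
    const char* view_begin = records.front().text.data();

    for (std::size_t i = 0; i < extra_records; ++i)
        records.push_back(Record{"another record"});

    return view_begin == records.front().text.data();
}

// The price: reaching the n-th record walks the list, O(n) instead of O(1).
const Record& nth_record(const std::list<Record>& records, std::size_t n) {
    return *std::next(records.begin(), static_cast<std::ptrdiff_t>(n));
}
```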
Jørgen Kvalsvik
9e76ec5f78 Inline hot-but-trivial functions
These functions are called a lot and are trivial accessors to the
underlying containers. By opening them for inlining we get a decent
performance benefit "for free" via optimisation opportunities.
2016-03-22 14:45:17 +01:00
Jørgen Kvalsvik
0da5cadc75 RawRecords auto store, strings moved to RawRecord
The accumulated strings are moved into RawRecords, which reduces
execution time (rough measurements indicate 4-8%). To facilitate this,
RawRecords are stored directly in the vector in favour of via
shared_ptrs.
2016-03-22 08:58:48 +01:00
Jørgen Kvalsvik
aa064a9050 Fix buffer overflow vulnerability
An attacker using very long decimal integers as input could trigger a
buffer overflow write during int/double parsing.

The vulnerability has been fixed and raw buffer boundaries are checked.
Additionally, integer buffer size is determined by platform 'int' width.
'double' uses a heuristic to support both pure decimal formats (up to 64
characters long) and float formats.
2016-03-15 16:42:02 +01:00
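A minimal sketch of a bounds-checked integer parse in the spirit of aa064a9050. The function name parse_int_checked and the exact buffer sizing are assumptions. The point illustrated is that the token length is checked against the buffer size before anything is copied.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <limits>

// Hypothetical helper: parses [begin, begin + length) as a decimal int.
// The input range does not need to be NUL-terminated.
bool parse_int_checked(const char* begin, std::size_t length, int& out) {
    // Sized from the platform's int: every decimal digit, an optional
    // sign, and the terminating NUL.
    constexpr std::size_t buffer_size =
        std::numeric_limits<int>::digits10 + 3;
    char buffer[buffer_size];

    // Reject empty tokens and tokens that would not fit, rather than
    // writing past the end of the buffer.
    if (length == 0 || length >= buffer_size) return false;

    std::memcpy(buffer, begin, length);
    buffer[length] = '\0';

    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(buffer, &end, 10);

    if (end != buffer + length) return false;   // trailing garbage
    if (errno == ERANGE) return false;          // does not fit in long
    if (value < std::numeric_limits<int>::min() ||
        value > std::numeric_limits<int>::max())
        return false;                           // does not fit in int

    out = static_cast<int>(value);
    return true;
}
```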
Atgeirr Flø Rasmussen
78e0870bad Use std::list instead of std::vector to fix push_front().
The push_front() method can cause reallocation of expanded_items,
thereby invalidating iterators already stored in m_recordItems.
Switching to std::list fixes this.
2016-03-14 14:54:29 +01:00
Atgeirr Flø Rasmussen
f23af386cf Add missing include, remove unused function. 2016-03-14 13:22:22 +01:00
Jørgen Kvalsvik
55b46da658 Moved RawRecord::isTerminator out of interface
This feature is internal to the raw records and is removed from its
public interface.
2016-03-14 08:29:54 +01:00
Jørgen Kvalsvik
2a650d5972 RawRecord refactoring
Some simple refactoring to remove a redundant check and clean up some
initialisation routines.
2016-03-14 08:29:54 +01:00
Jørgen Kvalsvik
dc094cbb16 More efficient findTerminatingSlash
Uses some heuristics and quick exits to avoid always paying worst case
cost for finding terminating slash.
2016-03-14 08:29:54 +01:00
Jørgen Kvalsvik
28eb195ac3 readValueToken< double > split into fast/slow path.
readValueToken spent almost half its time dealing with weirdly formed or
broken floats. Now has a shorter path that can early return a
successfully parsed float and only do slow handling of cases that need
it (notably zero, fortran style exponent and errors).
2016-03-14 08:29:54 +01:00
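A minimal sketch of a fast/slow split like the one 28eb195ac3 describes. The names strtod_whole and parse_deck_double are hypothetical, and std::strtod is used for brevity even though the commit itself predates the move to strtod. Only the Fortran-style exponent case is shown on the slow path.

```cpp
#include <cstdlib>
#include <string>

// True if the entire string converts to a double.
inline bool strtod_whole(const std::string& s, double& out) {
    if (s.empty()) return false;
    char* end = nullptr;
    const double value = std::strtod(s.c_str(), &end);
    if (end != s.c_str() + s.size()) return false;
    out = value;
    return true;
}

// Hypothetical two-path parser for deck-style floating point tokens.
bool parse_deck_double(const std::string& token, double& out) {
    // Fast path: well-formed input is parsed in place, with no copy.
    if (strtod_whole(token, out)) return true;

    // Slow path: copy the token and rewrite a Fortran-style exponent
    // marker ('D' or 'd', as in 1.0D3) to 'E', then retry.
    std::string fixed = token;
    for (char& c : fixed)
        if (c == 'D' || c == 'd') c = 'E';
    return strtod_whole(fixed, out);
}
```

The copy and the character rewrite are only paid for tokens that fail the first attempt, which is the point of separating the two paths.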
Jørgen Kvalsvik
38f88b4e14 RawKeyword::isTerminator uses is_separator 2016-03-14 08:29:54 +01:00
Jørgen Kvalsvik
1d1715b421 RawConsts::is_separator function
This replaces the inefficient RawConsts::separators.find( char ) with an
available, efficient and inlinable is_separator.
2016-03-14 08:29:53 +01:00
Jørgen Kvalsvik
93b7c0739b Replace boost::lexical_cast<> with std functions
The Boost-provided lexical casts are inefficient and have been shown to
be a slowdown in the inner loop. Replaces them with std::atoi/std::atof
and some simple correctness checking.
2016-03-14 08:29:53 +01:00
Jørgen Kvalsvik
e4ddf884f1 Using operator+ and stream operators 2016-03-14 08:29:53 +01:00