* For 64-bit images, use the physical address as the test pattern in test 2.
This will make it easier to diagnose faults.
* Disable test 1 by default (issue #155).
Test 2 provides the same test coverage. Test 1 may make it slightly easier
to diagnose faults with a 32-bit image, so leave it as an option.
* For 32 bit images, use the physical address to generate the offset in test 2.
Detecting a stage change and using that to reset the offset counter
could fail when the config menu was used to skip to the next test
(issue #224).
If the memory map contains very small segments and we have many active CPUs,
the tests that split the segments into chunks distributed across the CPUs may
end up with chunks that are too small for the test algorithm. With 4K pages
and the current limit of 256 active CPUs, this is currently only a problem
for the block move and modulo-n tests, but if we ever support more than 512
active CPUs, it could affect the other tests too.
For now, just skip segments that are too small in the affected tests. As it
only affects the block move and modulo-n tests and only affects very small
regions of memory, the loss of test coverage is negligable.
This may fix issue #216.
* Improve abstraction in vmem.h and limit memory benchmarking to first 2GB.
The third GB may get used for remapping memory regions that are only
accessed during startup, so it's not safe to use it for the memory
speed tests.
* Fix calculation of end limit for locating memory benchmark workspace.
* Document vmem.h.
* Use window number, not current start address, to detect first window.
* Increase the program low-load range from 1MB to 4MB and make more robust.
If the BIOS has reserved some parts of low memory, there may not be
enough contiguous space left to load the program there (issue #49).
So increase the low-load range to include the first 3MB of high
memory. Also guard against the program being initially loaded
straddling the new boundary.
Co-authored-by: Martin Whitaker <memtest@martin-whitaker.me.uk>
The old barrier implementation was very slow when running on a multi-socket
machine (pcmemtest issue 16).
The new implementation provides two options:
- when blocked, spin on a thread-local flag
- when blocked, execute a HLT instruction and wait for a NMI
The first option might be faster, but we need to measure it to find out. A
new boot command line option is provided to select between the two, with a
third setting that uses a mixture of the two.
Mostly we write and read large chunks of data which will make it likely
that the data is no longer in the cache when we come to verify it. But
this is not always true, and in any case, we shouldn't rely on it.