The old barrier implementation was very slow when running on a multi-socket
machine (pcmemtest issue 16).
The new implementation provides two options:
- when blocked, spin on a thread-local flag
- when blocked, execute a HLT instruction and wait for a NMI
The first option might be faster, but we need to measure it to find out. A
new boot command line option is provided to select between the two, with a
third setting that uses a mixture of the two.