Commit Graph

23 Commits

Author SHA1 Message Date
Chao Li
97922cc4cf
Add LoongArch support (#410)
* lib/assert: Add LoongArch assert support

Added LoongArch break 3 assert instruction.

Signed-off-by: Chao Li <lichao@loongson.cn>

* lib/barrier: Add barrier method for LoongArch

Added LoongArch barriers in barrier_spin_wait and barrier_halt_wait
functions.

Signed-off-by: Chao Li <lichao@loognson.cn>

* lib/spinlock: Add LoongArch CPU pause

Because the LoongArch haven't pause instruction, using eight nops to
replace the pause.

Signed-off-by: Chao Li <lichao@loongson.cn>

* lib/string: Make LoongArch use the string function in the file

Since LoongArch GCC doesn't have built-in string functions, use the
string function instance in the sting.c

Signed-off-by: Chao Li <lichao@loongson.cn>

* lib/unistd: Add LoongArch CPU pause

Because the LoongArch haven't pause instruction, using eight nops to
replace the pause.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/acpi: Reduce the way of search RSDP for non-x86 ARCHs

Searching RSDP from legacy BIOS EDBA and reserved areas is available
only on i386 and x64.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/cache: Add LoongArch64 cache operations support

Added cache operations support for LoongArch64.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/cpuid: Add the compile limit

Make the `cpuid` function action only on i386/x64.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/heap: Add heap support for LoongArch64

LoongArch64 uses the low 256MB as the low memory.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/memrw: Add 8-bit and 16-bit memory operations

Added 8-bit and 16-bit memory access operations, which 8-bit uses
`movb` and 16-bit is `movw`.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/memrw: Add LoongArch memory access operations

Added 8/16/32/64-bit memory access operations for LoongArch64.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system: Add Loongson PCI vendor ID and Loongson 7A chipset EHCI workaround

1. Added Loongson PCI vendor ID.
2. Added Loongson 7A chipset ECHI workaround.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/io: Add LoongArch64 IO port operations

Added IO port operations for LoongArch64.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/reloc64: Add LoongArch64 relocations support

Added R_LARCH_RELATIVE and R_LARCH_NONE relocations support for
LoongArch64.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/serial: Add Loongson CPU serial port support

Add the serial port address perfix of Loongson CPU and obtain serial
port clock method.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/smbus: Rename smbus.c to i2c_x86.c

Renamed the smbus.c to i2c_x86.c in i386 and x64 platforms.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/smp: Add LoongArch SMP support

Added LoongArch multi-core support and a way of map to node numbers if
the NUMA is enabled.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/timers: Add LoongArch supports

In LoongArch, there is a stable counter that is independent of other
clocks, it like the TSC in x64. Using it to count the ticks per
millisecond.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/tsc: Add LoongArch support

Usually the frequency of stable counter is not same to CPU frequency, so
using the performance counter for the delay operations.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/usbhcd: Add LoongArch MMIO perfix

Added LoongArch64 MMIO address perfix, use for address the PCI memory
space.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/usbhcd: Add Loongson 7A2000 chipset OHCI BAR offset fix

If the BAR address is not fixed for the Loongson 7A2000 OHCI controller,
some prots will not be usable, This change currently only affects the
LoongArch platform.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system: Add the way to IO access via MMIO

Usually, it is access the IO like PCI IO via MMIO on non-X86 ARCHs, so
a method to access IO via MMIO is added.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system: Add the way to access PCI memory space via MMIO

Some uniformly address ARCHs access the PCI memory depended the MMIO, so
the method to access PCI memory via MMIO is added.

Signed-off-by: Chao Li <lichao@loongson.cn>

* app: Add LoongArch version support

Reduced the version field by two characters to support ARCH name
abbreviations with more than three characters, and added "la64" ARCH
version display.

Singed-off-by: Chao Li <lichao@loongson.cn>

* test/block_move: Add block move test via ASM for LoongArch

Add block move test inline assembly instance for LoongArch.

Signed-off-by: Chao Li <lichao@loongson.cn>

* test/mov_inv_fixed: Add LoongArch ASM version word write operation

Add LoongArch ASM version word write cycle if it uses the HAND_OPTIMISED.

Signed-off-by: Chao Li <lichao@loongson.cn>

* boot: Adjust the AP stack size for LoongArch

LoongArch exception will store all of the GP, FP and CSR on stack, it
need more stack size, make LoongArch AP using 2KB stack size.

Signed-off-by: Chao Li <lichao@loongson.cn>

* boot/efisetup: Add LoongArch CPU halt instruction

Add "idle 0" for LoongArch

Signed-off-by: Chao Li <lichao@loongson.cn>

* boot/efi: Limiting the ms_abi using scope

Make the ms_abi only work on i386 and x64.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/imc/loongson: Add Loongson LoongArch IMC support

Added the Loongson LoongArch CPU IMC instance, support read out the IMC
sequence, currently only supports reading MC0.

Signed-off-by: Chao Li <lichao@loongson.cn>

* app/loongarch: Add intrrupt handler for LoongArch

Added the LoongArch IRQ handler support.

Signed-off-by: Chao Li <lichao@loongson.cn>

* system/loongarch: Add LoongArch ARCH specific files

Added LoongArch ARCH specific files: cpuid.c, cpuinfo.c, hwctrl.c,
memctrl.c, temperature.c, vmem.c, registers.h

They use the same pubilc API for i386 and x64 platforms.

Signed-off-by: Chao Li <lichao@loongson.cn>

* boot: Add LoongArch startup and header

Added the header.S and startup64.S for LoongArch, CPU works on:
1. Page mode.
2. Load and store is cacheable.
3. Instructions is cacheable.
4. DMWn 0 and 1 is used.
5. To access non-cacheable areas, use the perfix 0x8000000000000000.

Signed-off Chao Li <lichao@loongson.cn>

* build64/la64: Add LoongArch64 build files

Add infrastructure files to build memtest86 plus for LoongArch64
platform.

Signed-off-by: Chao Li <lichao@loongson.cn>

* workflows: Add LoongArch64 CI supports

Adjust workflow logci, remvoe 32 and 64 wordsize, replace with "i386,
x86_64 and la64", add LoongArch64 build CI check.

Signed-off-by: Chao Li <lichao@loongson.cn>

---------

Signed-off-by: Chao Li <lichao@loongson.cn>
Signed-off-by: Chao Li <lichao@loognson.cn>
2024-08-30 13:38:46 +02:00
Chao Li
e99ce97648 Add the 64-bit and 32-bit CC flag
Added a new CC flag into build32 and build64 Makefiles to distinguish
whether compiling to 32-bit or 64-bit code.

[Lionel Debroux: rebased on the memrw functions refactor.]

Signed-off-by: Chao Li <lichao@loongson.cn>
2024-07-22 22:50:15 +02:00
Lionel Debroux
8d966d98f4
Refactor the memrw functions to reduce the redundancy. (#415)
The impact is limited now, but will increase when adding support for more architectures and more bit widths.
2024-07-16 08:55:13 +01:00
Lionel Debroux
53ca89f8ae
Add initial NUMA awareness support (#378)
* Add a file containing useful macro definitions, currently a single top-level macro for obtaining the size of an array; use it to replace a sizeof(x) / sizeof(x[0]) construct in system/smbus.c . This requires switching the GCC build mode from C11 to C11 with GCC extensions.

* Initial NUMA awareness (#12) support: parse the ACPI SRAT to build up new internal structures related to proximity domains and affinity; use these structures in setup_vm_map() and calculate_chunk() to skip the work on the processors which don't belong to the proximity domain currently being tested.

Tested on a number of 1S single-domain, 2S multi-domain and 4S multi-domain platforms.

SKIP_RANGE(iterations) trick by Martin Whitaker.
2024-03-13 01:43:26 +01:00
Lionel Debroux
9b9c65b968
Reduce padding and relocations (#355)
* Optimize the JEP106 list by using __attribute__((packed)) to remove padding. The x86 & x86_64 series support unaligned accesses just fine, after all, and this is not remotely a hot path.

* Optimize several string-related constructs by switching to fixed-length char arrays, which avoids pointers + relocations.

* app/interrupt.c: array of different-length strings, but most of those are lengthy enough for this to be a clear win, especially on x86_64;
* system/usbhcd.c: array of same-length strings;
* tests/tests.h: array of structs containing same-length strings.

* Reduce the size of the list of tests by using a narrower type for the cpu mode, which reduces padding.
2023-11-29 12:45:17 +01:00
Lionel Debroux
34eb8186fd
Significantly optimize test_mov_inv_walk1() for size by moving the ternary operators related to inverse and pattern outside hot paths, which prevents generated code duplication and shortens one of the sides of duplicated loops. (#351)
Before:
   text    data     bss     dec     hex filename
   3019       0       0    3019     bcb build32/tests/mov_inv_walk1.o
   2640       0       0    2640     a50 build64/tests/mov_inv_walk1.o

After:
   text    data     bss     dec     hex filename
   1705       0       0    1705     6a9 build32/tests/mov_inv_walk1.o
   1464       0       0    1464     5b8 build64/tests/mov_inv_walk1.o
2023-11-19 17:14:08 +01:00
martinwhitaker
186ef6e913
Improved own addr test (#219)
* For 64-bit images, use the physical address as the test pattern in test 2.

This will make it easier to diagnose faults.

* Disable test 1 by default (issue #155).

Test 2 provides the same test coverage. Test 1 may make it slightly easier
to diagnose faults with a 32-bit image, so leave it as an option.

* For 32 bit images, use the physical address to generate the offset in test 2.

Detecting a stage change and using that to reset the offset counter
could fail when the config menu was used to skip to the next test
(issue #224).
2023-01-04 23:26:22 +01:00
Martin Whitaker
5a2bc4c960 Skip segments in tests where the calculated chunk size is too small.
If the memory map contains very small segments and we have many active CPUs,
the tests that split the segments into chunks distributed across the CPUs may
end up with chunks that are too small for the test algorithm. With 4K pages
and the current limit of 256 active CPUs, this is currently only a problem
for the block move and modulo-n tests, but if we ever support more than 512
active CPUs, it could affect the other tests too.

For now, just skip segments that are too small in the affected tests. As it
only affects the block move and modulo-n tests and only affects very small
regions of memory, the loss of test coverage is negligable.

This may fix issue #216.
2022-12-10 15:24:26 +00:00
a1346054
9660eead4e
Simple maintenance improvements (#145)
* Fix typos

* Add missing final newline

* Trim trailing whitespace
2022-08-15 17:51:48 +02:00
martinwhitaker
93c9c8ded5
Rework memory mapping to allow for larger program size (#54)
* Improve abstraction in vmem.h and limit memory benchmarking to first 2GB.

The third GB may get used for remapping memory regions that are only
accessed during startup, so it's not safe to use it for the memory
speed tests.

* Fix calculation of end limit for locating memory benchmark workspace.

* Document vmem.h.

* Use window number, not current start address, to detect first window.

* Increase the program low-load range from 1MB to 4MB and make more robust.

If the BIOS has reserved some parts of low memory, there may not be
enough contiguous space left to load the program there (issue #49).
So increase the low-load range to include the first 3MB of high
memory. Also guard against the program being initially loaded
straddling the new boundary.

Co-authored-by: Martin Whitaker <memtest@martin-whitaker.me.uk>
2022-04-28 23:04:01 +02:00
Martin Whitaker
e92f488753 Improve efficiency of random number generation (discussion #8).
Use a more efficient algorithm that can be in-lined, and keep the
generator state in a local variable.
2022-03-05 20:04:32 +00:00
Martin Whitaker
4078b7760e Faster barrier implementation.
The old barrier implementation was very slow when running on a multi-socket
machine (pcmemtest issue 16).

The new implementation provides two options:

  - when blocked, spin on a thread-local flag
  - when blocked, execute a HLT instruction and wait for a NMI

The first option might be faster, but we need to measure it to find out. A
new boot command line option is provided to select between the two, with a
third setting that uses a mixture of the two.
2022-02-28 22:05:21 +00:00
Martin Whitaker
3245b6d916 Don't turn the cache off in test 0 when performing dummy runs.
This should fix the slow startup on multi-socket machines (issue #16).
2022-02-19 20:55:41 +00:00
Martin Whitaker
f8b82eb0bd Exclude copyright notices from Doxygen file descriptions. 2022-02-19 19:56:55 +00:00
Martin Whitaker
76adad2fe6 Add ability to generate internal API documentation using Doxygen. 2022-02-19 16:17:40 +00:00
Martin Whitaker
0e61b1605e Remove volatile qualifier from testword pointers.
Now we use the atomic read/write functions, these are redundant.
2022-02-19 13:01:42 +00:00
Martin Whitaker
1888f5c611 Add change to tests/test.c missed in commit dcac5270. 2022-02-02 15:33:25 +00:00
Martin Whitaker
ccab9ab081 Fix operation with a subset of CPU cores enabled.
The last commit removed too much - there are a couple of places where
we need to use a virtual CPU number rather than a physical CPU number.
2022-02-01 15:38:06 +00:00
Martin Whitaker
16d55b7dad Remove distinction between physical and virtual CPUs.
This is no longer needed, now we can display as many CPUs as we can
physically handle.
2022-01-31 22:59:14 +00:00
Martin Whitaker
d9fee4dcbb Flush caches between writing and verifying test data.
Mostly we write and read large chunks of data which will make it likely
that the data is no longer in the cache when we come to verify it. But
this is not always true, and in any case, we shouldn't rely on it.
2021-12-23 11:00:10 +00:00
Martin Whitaker
11c0c6c2f5 Use atomic memory read/write functions in tests.
This ensures compiler optimisations won't interfere with the tests.
2021-12-23 10:07:55 +00:00
Martin Whitaker
8f1d81b65d Add missing includes of stdbool.h.
To ensure we aren't dependent on the order of inclusion.
2021-12-05 13:50:25 +00:00
Martin Whitaker
fbd3376668 Initial commit. 2020-05-24 21:30:55 +01:00