After previous patches it is not used (and should not be used) outside
of qemu_domain.c.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function qemuGetMemoryBackingPath() does not need the @def any more
and priv->memoryBackingDir can be used instead of constructing the path
by calling qemuGetMemoryBackingDomainPath().
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This way we keep the path for each running VM.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This way we _can_ (but do not, yet) remember the memory backing path for
running domains even after configuration change and daemon restart.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This way it does not use the driver, since it will be reworked later, and
the following patches will hopefully be cleaner.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch simplifies (?) the qemuDomainChangeNet() code while
fixing some incorrect decisions about exactly when it's necessary to
re-attach an interface's bridge device, or to fail the device update
(needReconnect[*]) because the type of connection has changed (or
within bridge and direct (macvtap) type because some attribute of the
connection has changed that can't actually be modified after the
tap/macvtap device of the interface is created).
Example 1: it's pointless to require the bridge device to be
reattached just because the interface has been switched to a different
network (i.e. the name of the network is different), since the new
network could be using the same bridge as the old network (very
uncommon, but technically possible). Instead we should only care if
the name of the *bridge device* changes (or if something in
<virtualport> changes - see Example 3).
Example 2: wrt changing the "type" of the interface, a change should
be allowed if old and new type both used a bridge device (whether or
not the name of the bridge changes), or if old and new type are both
"direct" *and* the device being linked and macvtap mode remain the
same. Any other change in interface type cannot be accommodated and
should be a failure (i.e. needReconnect).
Example 3: there is no valid reason to fail just because the interface
has a <virtualport> element - the <virtualport> could just say
"type='openvswitch'" in both the before and after cases (in which case
it isn't a change by itself, and so is completely acceptable), and
even if the interfaceid changes, or the <virtualport> disappears
completely, that can still be reconciled by simply re-attaching the
bridge device. (If, on the other hand, the modified <virtualport> is
for a type='direct' interface, we can't modify that, and so must
fail (needReconnect).)
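The three examples above can be condensed into a small decision
function. What follows is only a sketch under stated assumptions: NetConn,
NetChange, strEq() and netCompare() are hypothetical, simplified stand-ins
and not the structures or helpers that qemuDomainChangeNet() really uses,
and the <virtualport> is flattened into a single string for brevity.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical connection classes; libvirt's real interface types are
 * finer-grained than this. */
typedef enum { NET_BRIDGE_BACKED, NET_DIRECT, NET_OTHER } NetKind;

typedef struct {
    NetKind kind;
    const char *bridge;    /* bridge device name (bridge-backed only) */
    const char *linkdev;   /* host device being linked (direct only) */
    int macvtapMode;       /* macvtap mode (direct only) */
    const char *virtPort;  /* flattened <virtualport> contents, or NULL */
} NetConn;

typedef struct {
    bool needBridgeChange; /* re-attach the tap device to the bridge */
    bool needReconnect;    /* cannot be done live, the update must fail */
} NetChange;

/* NULL-safe string comparison. */
static bool
strEq(const char *a, const char *b)
{
    if (!a || !b)
        return a == b;
    return strcmp(a, b) == 0;
}

static NetChange
netCompare(const NetConn *oldc, const NetConn *newc)
{
    NetChange c = { false, false };

    if (oldc->kind == NET_BRIDGE_BACKED && newc->kind == NET_BRIDGE_BACKED) {
        /* Examples 1 and 3: the network name is irrelevant; only the
         * bridge device name and <virtualport> decide on a re-attach. */
        if (!strEq(oldc->bridge, newc->bridge) ||
            !strEq(oldc->virtPort, newc->virtPort))
            c.needBridgeChange = true;
    } else if (oldc->kind == NET_DIRECT && newc->kind == NET_DIRECT) {
        /* Examples 2 and 3: nothing about a macvtap connection can be
         * modified once the device exists. */
        if (!strEq(oldc->linkdev, newc->linkdev) ||
            oldc->macvtapMode != newc->macvtapMode ||
            !strEq(oldc->virtPort, newc->virtPort))
            c.needReconnect = true;
    } else if (oldc->kind != newc->kind) {
        /* Any other change of connection class cannot be accommodated.
         * (The sketch ignores type changes within NET_OTHER.) */
        c.needReconnect = true;
    }

    return c;
}
```

With this sketch, moving from bridge "br0" to "br1" only sets
needBridgeChange, while changing the linked device of a direct interface
sets needReconnect.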
(I tried splitting this into multiple patches, but they were so
intertwined that the intermediate patches made no sense.)
[*] "needReconnect" was a flag added to this function way back in
2012, when I still believed that QEMU might someday support connecting
a new & different device backend (the way the virtual device connects
to the host) to an already existing guest netdev (the virtual device
as it appears to the guest). Sadly that has never happened, so for the
purposes of qemuDomainChangeNet() "needReconnect" is equivalent to
"fail".
Resolves: https://issues.redhat.com/browse/RHEL-7036
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The new function does what the old qemuDomainChangeNetBridge() did
manually, except that:
1) the new function supports changing from a bridge of one type to
another, e.g. from a Linux host bridge to an OVS
bridge. (previously that wasn't handled)
2) the new function doesn't emit audit log messages. This is actually
a good thing, because the old code would just log a "detach"
followed immediately by "attach" for the same MAC address, so it's
essentially a NOP. (the audit logs don't have any more detailed
info about the connection - just the VM name and MAC address, so it
makes no sense to log the detach/attach pair as it's not providing
any information).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Attempts to use update-device to modify just the link state of a guest
interface were failing due to a supposed attempt to modify something
in the interface that can't be modified live (even though the only
thing that was changing was the link state, which *can* be modified
live).
It turned out that this failure happened because the guest interface
in question was type='network', and the network in question was a
'direct' network that provides each guest interface with one device
from a pool of network devices. As a part of qemuDomainChangeNet() we
would always allocate a new port from the network driver for the
updated interface definition (by way of calling
virDomainNetAllocateActualDevice(newdev)), and this new port (ie the
ActualNetDef in newdev) would of course be allocated a new host device
from the pool (which would of course be different from the one
currently in use by the guest interface (in olddev)). Because direct
interfaces don't support changing the host device in a live update,
this would cause the update to fail.
The solution to this is to realize that as long as the interface
doesn't get switched to a different network as a part of the update,
the network port information (ie the ActualNetDef) will not change as
a part of updating the guest interface itself. So for sake of
comparison we can just point the newdev at the ActualNetDef of olddev,
and then clear out one or the other when we're done (to avoid a double
free or, more likely, attempt to reference freed memory).
(If, on the other hand, the name of the network has changed, or if the
interface type has changed to type='network' from something else, then
we *do* need to allocate a new port (actual device) from the network
driver (as we used to do in all cases when the new type was
'network'), and also indicate that we'll need to replace olddev in the
domain with newdev (because either of these changes is major enough
that we shouldn't just try to fix up olddev).)
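A minimal sketch of the pointer handling described above, assuming
hypothetical NetDef/ActualDef types and helpers (allocPort(),
prepareActual() and finishActual() are made up for illustration and are
not libvirt functions). The point is only that a borrowed pointer must be
cleared in newdev before either definition is freed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; libvirt's real structures are far richer. */
typedef struct {
    char hostdev[16];      /* host device handed out from the pool */
} ActualDef;

typedef struct {
    const char *network;   /* network name, NULL if not type='network' */
    ActualDef *actual;     /* port allocated from the network driver */
} NetDef;

/* Stand-in for allocating a new port from the network driver. */
static ActualDef *
allocPort(const char *hostdev)
{
    ActualDef *a = calloc(1, sizeof(*a));

    if (a)
        strncpy(a->hostdev, hostdev, sizeof(a->hostdev) - 1);
    return a;
}

/* Returns true if newdev borrowed olddev's port instead of getting a
 * freshly allocated one. */
static bool
prepareActual(const NetDef *olddev, NetDef *newdev, const char *freeHostdev)
{
    if (olddev->network && newdev->network &&
        strcmp(olddev->network, newdev->network) == 0) {
        /* Same network: reuse the existing port for the comparison. */
        newdev->actual = olddev->actual;
        return true;
    }

    /* Network changed (or became type='network'): allocate a new port. */
    newdev->actual = allocPort(freeHostdev);
    return false;
}

/* Drop a borrowed pointer so that only olddev ever frees the port. */
static void
finishActual(NetDef *newdev, bool borrowed)
{
    if (borrowed)
        newdev->actual = NULL;
}
```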
Partially-Resolves: https://issues.redhat.com/browse/RHEL-7036
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'charstr' is unused since 36d06a5637, breaking the build on some
platforms. Remove it.
Fixes: 36d06a5637
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
QEMU supports only 'raw' and 'telnet' in the
<protocol type='telnets'/>
element. Reject 'telnets' and 'tls'. TLS transport for qemu chardevs is
configured via "tls='yes'" attribute added to the "<source>" element
instead, so this prevents potential misconfig as the value would be
silently accepted.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/412
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that we have a unified generator of chardev backend which is also
validated against the QMP schema we can replace the old generator with
it.
This patch modifies the monitor code to take virJSONValue 'props'
instead of the chardev definition and adds the conversion from the
chardev definition to JSON on higher levels.
The monitor code now also attempts to extract the returned 'pty' if
returned from qemu, so higher level code needs to report the error if
the path is needed and missing.
The current monitor generator is for now abandoned in place and will be
removed later.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The upcoming refactor of the monitor code will make the hotplug code
paths use the same generator we have for commandline -chardev backends
which doesn't refuse to format certain backends which can't be
hotplugged.
To prepare for this we add a check to qemuHotplugChardevAttach()
refusing such hotplug and remove 'qemumonitorjsontest' test cases which
will not make sense any more.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Similarly to how we approach the generators for
-device/-object/-blockdev/-netdev rewrite the generator of -chardev to
be unified with the generator for the monitor.
Unfortunately with -chardev it will be a bit more quirky when compared
to the others as the generator itself will need to know whether it
generates command line output or not as a few field names change and data
is nested differently.
This first step adds the generator and uses it only for command line
generation. This was possible to achieve without changing any of the
output in tests.
In further patches the same generator will then be used also in the
monitor code replacing both.
As basis for the generator I took the monitor code but modified it to
have the same field order as the commandline code and extended it
further to support all backend types, even those which are not
hotpluggable.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
I added that capability a long time ago when I was converting various
stuff to use JSON but the support in '-chardev' didn't yet materialize.
Fix the comment to make that clear and also that it'll be used in tests
for the upcoming refactor of the chardev code (so that we can validate
generator against the schema even if that doesn't yet work).
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Recent fix to use the proper 'async' monitor function would cause
libvirt to leak some of the objects it's supposed to clean up in other
places besides qemu.
Don't skip the whole function on failure to enter the job but just the
monitor section.
Fixes: 9b22c25548
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'qemuBackupDiskDataCleanupOne()' is entering the monitor while we're in
the async backup job inside 'qemuBackupBegin()' which is semantically
wrong and per upstream report causes crashes if some monitoring commands
are run in parallel.
Use qemuDomainObjEnterMonitorAsync() instead.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/668
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The function can return directly rather than setting 'ret' as there's no
cleanup.
It also doesn't make sense to conditionally compile out the 'break'
statement when checking whether a disk has rawio enabled if
'CAP_SYS_RAWIO' is _not_ defined as the function will still behave the
same.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
pvpanic-pci is the only reasonable implementation of a panic
device for aarch64/virt guests. Right now we're asking users to
provide the model name manually, but we can be more helpful and
fill it in automatically instead.
With this change, the aarch64-panic-no-model test no longer
fails and so it's no longer useful to us. Instead, we can amend
the aarch64-virt-default-models test case to include panic
coverage, something that until now wasn't possible.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Right now the fallback behavior is to use MODEL_ISA if we
haven't been able to find a better match, but that's not very
useful as we're still going to hit an error later, when
QEMU_CAPS_DEVICE_PANIC is not found at Validate time.
Instead of doing that, allow MODEL_DEFAULT to get all the
way to Validate and report an error upon encountering it.
The reported error changes slightly, but other than that the
set of configurations that are allowed and blocked remains
the same.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Perform decisions based on the architecture and machine type
in a single place instead of duplicating them.
This technically adds new behavior for MODEL_ISA in
qemuDomainDefAddDefaultDevices(), but it doesn't make any
difference functionally since we don't set addPanicDevice
outside of ppc64(le) and s390(x). If we did, the lack of
handling for that value would be a latent bug.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This advertises the feature only for the architectures and
machine types where it can actually be used.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We will soon need to use it in a context where we don't have
a virDomainDef handy.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
From: Praveen K Paladugu <prapal@linux.microsoft.com>
Move methods to connect domain interfaces to host bridges to hypervisor.
This is to allow reuse between qemu and ch drivers.
Signed-off-by: Praveen K Paladugu <praveenkpaladugu@gmail.com>
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
qemu supports this enlightenment since version 7.1.0.
From the qemu commit:
Hyper-V specification allows to pass parameters for certain hypercalls
using XMM registers ("XMM Fast Hypercall Input"). When the feature is
in use, it allows for faster hypercalls processing as KVM can avoid
reading guest's memory.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
qemu supports this enlightenment since version 7.1.0.
From the qemu commit:
The newly introduced enlightenment allow L0 (KVM) and L1 (Hyper-V)
hypervisors to collaborate to avoid unnecessary updates to L2
MSR-Bitmap upon vmexits.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Currently, qemuProcessStop() unlocks given domain object right in
the middle of cleanup process. This is dangerous because there
might be another thread which is executing virDomainObjListAdd().
And since the domain object is on the list of domain objects AND
by the time qemuProcessStop() unlocks it the object is also
marked as inactive, the other thread acquires the lock and
switches vm->def pointer.
The unlocking of the domain object is needed though, to allow the event
processing thread to finish its queue. Well, the processing can be
done before any cleanup is attempted.
Therefore, use freshly introduced virEventThreadStop() to join
the event thread and drop lock/unlock from the middle of
qemuProcessStop().
Now, there's a comment being removed that mentions
qemuDomainObjStopWorker() and why it has to be called only after
the domain is marked as dead. This comment is no longer
applicable because the call to qemuDomainObjStopWorker() is removed
too. Moreover, priv->beingDestroyed is set to true before
unlocking the domain object, thus any event processing callback
is going to see the domain being destroyed and can choose to
either exit early or finish processing the event.
Fixes: 3865410e7f
Resolves: https://issues.redhat.com/browse/RHEL-49607
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This introduces a new 'ps2' feature which, when disabled, results in
no implicit PS/2 bus input devices being automatically added to the
domain and addition of the 'i8042=off' machine option to the QEMU
command-line.
A notable side effect of disabling the i8042 controller in QEMU is that
the vmport device won't be created. For this reason we will not allow
setting the vmport feature if the ps2 feature is explicitly disabled.
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This capability tells us whether given QEMU binary supports the
'-machine xxx,i8042=on/off' toggle used to enable/disable PS/2
controller emulation.
A few facts:
- This option was introduced in QEMU 7.0 and defaults to 'on'
- QEMU versions before 7.0 enabled i8042 controller emulation implicitly
- This option (and i8042 controller emulation itself) is only supported
by descendants of the generic PC machine type (e.g. i440fx, q35, etc.)
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Up until now, we've assumed that all x86 machines have a PS/2
controller built-in. This assumption was correct until QEMU v4.2
introduced a new x86-based machine type - microvm.
Due to this assumption, a pair of unnecessary PS/2 inputs are implicitly
added to all microvm domains. This patch fixes that by whitelisting
machine types which are known to include the i8042 PS/2 controller.
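As a rough illustration of the whitelisting idea, the check could look
like the sketch below. machineHasPS2() is a hypothetical name, and the
list of machine names is an assumption based on the i440fx and q35
families mentioned in this series; it is not the list libvirt actually
uses.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true only for machine types assumed to
 * include the i8042 PS/2 controller. Anything not on the list (for
 * example "microvm") gets no implicit PS/2 input devices. */
static bool
machineHasPS2(const char *machine)
{
    static const char *const exact[] = { "pc", "q35" };
    static const char *const prefix[] = { "pc-i440fx-", "pc-q35-" };
    size_t i;

    if (!machine)
        return false;

    for (i = 0; i < sizeof(exact) / sizeof(exact[0]); i++) {
        if (strcmp(machine, exact[i]) == 0)
            return true;
    }

    for (i = 0; i < sizeof(prefix) / sizeof(prefix[0]); i++) {
        if (strncmp(machine, prefix[i], strlen(prefix[i])) == 0)
            return true;
    }

    return false;
}
```

The design choice worth noting is that the default answer is "no": a new
machine type stays free of implicit PS/2 devices until it is added to the
list explicitly.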
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Attempting to start qemu with or hotplug an empty 'usb-storage' based
disk results in the following error:
qemu-system-x86_64: -device {"driver":"usb-storage","bus":"usb.0","port":"2","id":"usb-disk1","removable":true}: drive property not set
Reject such config at validation step and adjust tests.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Some code paths, such as a failed hotplug of an empty cdrom, can cause
'qemuBlockStorageSourceChainDetach' to be called with 'NULL'
@data as there is no backend for the disk.
The above case became possible once we allowed hotplug of cdroms and
subsequently fixed the case when users would hotplug an empty cdrom
which ultimately caused the possibility of having no backend in the
hotplug code path which was not possible before (see 'Fixes:' below and
also the commit linked from there).
Make 'qemuBlockStorageSourceChainDetach' tolerate NULL @data by simply
returning early.
Fixes: 894c6c5c16
Resolves: https://issues.redhat.com/browse/RHEL-54550
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There is a family of convenient macros: NULLSTR, NULLSTR_EMPTY,
NULLSTR_STAR, NULLSTR_MINUS which hide the ternary operator.
Generated using the following spatch (and its obvious variants):
@@
expression s;
@@
<+...
- s ? s : "<null>"
+ NULLSTR(s)
...+>
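For reference, the macros can be sketched as follows. The definitions
mirror what libvirt's internal header provides as far as I am aware, but
treat them as an illustration; describe() is a hypothetical caller that
exists only to show the effect of the substitution.

```c
#include <stdio.h>

/* Each macro replaces a NULL string with a fixed placeholder. */
#define NULLSTR(s) ((s) ? (s) : "<null>")
#define NULLSTR_EMPTY(s) ((s) ? (s) : "")
#define NULLSTR_STAR(s) ((s) ? (s) : "*")
#define NULLSTR_MINUS(s) ((s) ? (s) : "-")

/* Hypothetical caller: formats two possibly-NULL strings without
 * spelling out the ternary operator at the call site. Returns the
 * number of characters snprintf() would have written. */
static int
describe(char *buf, size_t len, const char *name, const char *alias)
{
    return snprintf(buf, len, "name=%s alias=%s",
                    NULLSTR(name), NULLSTR_MINUS(alias));
}
```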
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
On failure to plug the device, the cleanup path didn't roll back the FD
passing to qemu, so qemu would hold the FDs indefinitely.
Resolves: https://issues.redhat.com/browse/RHEL-53964
Fixes: b79abf9c3c (vdpafd)
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add dma-translation attribute to qemu command line if specified in
domain conf.
Signed-off-by: Sandesh Patel <sandesh.patel@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add dma_translation attribute to iommu to enable/disable DMA translation
for intel-iommu.
Signed-off-by: Sandesh Patel <sandesh.patel@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Depending on timing between QEMU and libvirt an attempt to resume failed
post-copy migration could immediately report a failure in post-copy
phase again even though the migration actually resumed and is
progressing just fine.
This is caused by QEMU reporting the original migration state (i.e.,
postcopy-paused) until migration is successfully resumed and QEMU
switches to postcopy-active. QEMU 9.1 introduced a new
postcopy-recover-setup migration state which is entered immediately
after requesting migration to be resumed and we can reliably wait for
the migration to either continue or fail without being confused by the
old state.
https://issues.redhat.com/browse/RHEL-22166
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds support for recognizing the new migration state reported
by QEMU when post-copy recovery is requested. It is not actually used
for anything yet.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The s390(x) machines never supported ACPI. That didn't stop users
enabling ACPI in their config. As of libvirt-9.2 (98c4e3d073) with new
enough qemu we reject configs which require ACPI, but qemu can't satisfy
it.
This breaks migration of existing VMs with the old wrong configs to new
libvirt installations.
To address this introduce a post-parse fixup removing the ACPI flag
specifically for s390 machines which do enable it in the definition.
The advantage of doing it in post-parse, rather than simply relaxing the
ABI stability check to allow users providing a fixed XML when migrating
(allowing change of the ACPI flag for s390 in ABI stability check, as it
doesn't impact ABI), is that only the destination installation needs to
be patched in order to preserve migration.
To mitigate the disadvantage of simply stripping it from all s390(x)
configs the hack is not applied when defining or starting a new domain
from the XML, to preserve the error about unsupported configuration.
Resolves: https://issues.redhat.com/browse/RHEL-49516
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
So far we are relying on QEMU or sysadmin to create the file for
pstore. This is suboptimal as in the case of the former we can
not set proper seclabels (there's nothing to set seclabels on
until QEMU is started).
Therefore, make sure the file is created before launching QEMU
and that it has the correct size.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Introduced only a couple of commits ago (in
v10.5.0-84-g90e50e67c6) the pstore device acts as a nonvolatile
storage, where guest kernel can store information about crashes.
This device, however, expects a file in the host from which the
crash data is read. So far, we expected users to provide a path,
but we can autogenerate one if missing. Just put it next to
per-domain's NVRAM stores.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Nothing special going on here.
Resolves: https://issues.redhat.com/browse/RHEL-24746
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
The aim of the pstore device is to provide a bit of NVRAM storage for
the guest kernel to record oops/panic logs just before it
crashes. It is typically used in combination with a
watchdog so that the logs can be inspected after the watchdog
rebooted the machine. While Linux kernel (and possibly Windows
too) support many backends, in QEMU there's just 'acpi-erst'
device so stick with that for now. The device must be attached to
a PCI bus and needs two additional values (well, corresponding
memory-backend-file needs them): size and path. Despite using
memory-backend-file this does NOT add any additional RAM to the
guest and thus I've decided to expose it as another device type
instead of memory model.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
The new option style renamed one of the cache modes.
https://issues.redhat.com/browse/RHEL-50329
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When a QEMU process takes longer to die than the window in which SIGTERM
and SIGKILL are issued, do not simply fail and leave the VM in state
VIR_DOMAIN_SHUTDOWN until the daemon stops. Instead, set up an fd on
/proc/$pid and get notified when the QEMU process has finally terminated
so that the VM state can be cleaned up.
Resolves: https://issues.redhat.com/browse/RHEL-28819
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If there are absent values in an already existing element
specifying rom settings, we simply use the old ones. This
behaviour is not desired, as users might think that deleting the
element from XML would delete the setting (because the hotplug
succeeds) - which does not happen. Because of that, we should not
accept an interface without elements that cannot be changed.
Therefore, we should not allow absent values for already existing
rom setting during hotplug.
Resolves: https://issues.redhat.com/browse/RHEL-7109
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The new element had a confusing name. Since the patch introducing it
hasn't propagated yet, the old name ('rlimit_nofile') was changed
to 'openfiles':
...
<binary>
<openfiles max='122333'/>
</binary>
...
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
By definition. Accordingly, filter them out when looking for
a read/write image.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If the configuration explicitly requests a specific type of
firmware image, be it pflash or ROM, we should ignore all images
that are not of that type.
If no specific type has been requested, of course, any type is
considered a match and the selection will be based upon the
other attributes.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Recent commit v10.4.0-87-gd9935a5c4f made a reasonable change to only
reset beingDestroyed back to false when vm->def->id is reset to make
sure other code can detect a domain is (about to become) inactive. It
even added a comment saying any caller of qemuProcessBeginStopJob is
supposed to call qemuProcessStop to clear beingDestroyed. But not every
caller really does so because they first call qemuProcessBeginStopJob
and then check whether a domain is still running. If not the
qemuProcessStop call is skipped leaving beingDestroyed=true. In case of
a persistent domain this may block incoming migrations of such domain as
the migration code would think the domain died unexpectedly (even though
it's still running).
The qemuProcessBeginStopJob function is a wrapper around
virDomainObjBeginJob, but virDomainObjEndJob was used directly for
cleanup. This patch introduces a new qemuProcessEndStopJob wrapper
around virDomainObjEndJob to properly undo everything
qemuProcessBeginStopJob did.
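The problem and the fix can be sketched with a toy model. Dom,
beginStopJob(), endStopJob(), processStop() and destroy() below are
hypothetical simplifications, not the real libvirt functions, and the
exact order of operations inside qemuProcessBeginStopJob is an assumption
here.

```c
#include <stdbool.h>

/* Toy model of a domain object. */
typedef struct {
    bool jobActive;
    bool beingDestroyed;
    bool running;
} Dom;

/* Models qemuProcessBeginStopJob: marks the domain as being destroyed
 * and acquires the job. */
static int
beginStopJob(Dom *vm)
{
    if (vm->jobActive)
        return -1;
    vm->beingDestroyed = true;
    vm->jobActive = true;
    return 0;
}

/* Models the new wrapper: undoes everything beginStopJob() did. Ending
 * the job directly would leave beingDestroyed set. */
static void
endStopJob(Dom *vm)
{
    vm->beingDestroyed = false;
    vm->jobActive = false;
}

/* Models qemuProcessStop: tears the domain down and clears the flag. */
static void
processStop(Dom *vm)
{
    vm->running = false;
    vm->beingDestroyed = false;
}

/* A caller that skips processStop() when there is nothing to stop. */
static int
destroy(Dom *vm)
{
    int ret = -1;

    if (beginStopJob(vm) < 0)
        return -1;

    if (!vm->running)
        goto endjob;       /* processStop() is skipped on this path */

    processStop(vm);
    ret = 0;

 endjob:
    endStopJob(vm);
    return ret;
}
```

In this model the early-exit path no longer leaves beingDestroyed set,
because the flag is reset by the same wrapper that releases the job.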
https://issues.redhat.com/browse/RHEL-43309
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Allow migration if the "migrate-precopy" capability is present or
libvirt is not the one running the virtiofs daemon.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Run the daemon with --print-capabilities first, to see what it supports.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
As of now, libvirt supports a few essential stats as
part of virDomainGetJobStats for live migration, such
as memory transferred, dirty rate, number of iterations,
etc. Currently it does not have support for the vfio
stats returned via QEMU. This patch adds support for that.
Signed-off-by: Kshitij Jha <kshitij.jha@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This commit removes the redundant call to qemuSecurityGetNested() in
qemuStateInitialize(). In qemuSecurityGetModel(), the first security manager
in the stack is already used by default, so this change helps to
simplify the code.
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Fix libvirtd hang since fork() was called while another thread had
security manager locked.
We have the stack security driver, which internally manages other security drivers,
just call them "top" and "nested".
We call virSecurityStackPreFork() to lock the top one, and it also locks
and then unlocks the nested drivers prior to fork. Then in qemuSecurityPostFork(),
it unlocks the top one, but not the nested ones. Thus, if one of the nested
drivers ("dac" or "selinux") is still locked, it will cause a deadlock. If we always
surround nested locks with the top lock, this is always safe, because we have taken
the top lock before forking the child libvirtd.
However, that is not always the case in the current code. We discovered this case:
the nested list obtained through qemuSecurityGetNested() is locked directly
for subsequent use, such as in virQEMUDriverCreateCapabilities(), where the nested list
is locked using qemuSecurityGetDOI(), but the top one is not locked beforehand.
The problem stack is as follows:
libvirtd thread1      libvirtd thread2          child libvirtd
        |                     |                        |
        |                     |                        |
virsh capabilities    qemuProcessLaunch                |
        |                     |                        |
        |                 lock top                     |
        |                     |                        |
   lock nested                |                        |
        |                     |                        |
        |                   fork---------------------->|(nested lock held by thread1)
        |                     |                        |
        |                     |                        |
  unlock nested          unlock top               unlock top
                                                       |
                                                       |
                                          qemuSecuritySetSocketLabel
                                                       |
                                                       |
                                            lock nested (deadlock)
In this commit, we ensure that the top lock is acquired before the nested lock,
so during fork, it's not possible for another task to acquire the nested lock.
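The lock ordering can be illustrated with plain pthread mutexes. This is
a sketch only: SecMgr, SecStack and stackGetNestedDOI() are hypothetical
names and do not correspond to the security manager API. The one property
it demonstrates is that the nested lock is never taken without the top
lock already being held.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical nested driver ("dac", "selinux", ...). */
typedef struct {
    pthread_mutex_t lock;
    const char *doi;
} SecMgr;

/* Hypothetical stack driver wrapping one nested driver. */
typedef struct {
    pthread_mutex_t lock;  /* the "top" lock, held across fork() */
    SecMgr nested;
} SecStack;

/* Copies the nested driver's DOI into buf. The nested lock is only ever
 * taken while the top lock is held, so a thread that holds the top lock
 * around fork() knows no other thread can be inside the nested lock. */
static int
stackGetNestedDOI(SecStack *st, char *buf, size_t len)
{
    int ret = -1;

    pthread_mutex_lock(&st->lock);
    pthread_mutex_lock(&st->nested.lock);

    if (strlen(st->nested.doi) < len) {
        strcpy(buf, st->nested.doi);
        ret = 0;
    }

    pthread_mutex_unlock(&st->nested.lock);
    pthread_mutex_unlock(&st->lock);
    return ret;
}
```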
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1303031
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In a domain created with an interface with a <driver> subelement,
the device contains a non-NULL virDomainVirtioOptions struct, even
for non-virtio NIC models. The subelement need not be present again
after libvirt restarts, or when the interface is passed to clients.
When clients such as virsh domif-setlink put back the modified
interface XML, the new device's virtio attribute is NULL. This may
fail the equality checks for virtio options in qemuDomainChangeNet,
depending on whether libvirtd was restarted since define or not.
This patch modifies the check for non-virtio models, to ignore olddev
value of virtio (assumed valid), and to allow either NULL or a struct
with all values ABSENT in the new virtio options.
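A sketch of the relaxed check, assuming a hypothetical VirtioOptions
struct with tristate members (the member names and the helper functions
are illustrative, not the real definitions). For virtio models the sketch
simply requires equality, which is an assumption about the unchanged part
of the check.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { TRISTATE_ABSENT = 0, TRISTATE_YES, TRISTATE_NO } Tristate;

/* Hypothetical stand-in for the virtio options of an interface. */
typedef struct {
    Tristate iommu;
    Tristate ats;
    Tristate packed;
} VirtioOptions;

/* NULL and "every value ABSENT" are treated as the same thing. */
static bool
virtioOptionsAllAbsent(const VirtioOptions *v)
{
    return !v || (v->iommu == TRISTATE_ABSENT &&
                  v->ats == TRISTATE_ABSENT &&
                  v->packed == TRISTATE_ABSENT);
}

static bool
virtioOptionsEqual(const VirtioOptions *a, const VirtioOptions *b)
{
    if (!a || !b)
        return a == b;
    return a->iommu == b->iommu && a->ats == b->ats &&
           a->packed == b->packed;
}

/* Non-virtio models: ignore the old value and only require that the new
 * definition sets nothing. Virtio models: require equality. */
static bool
virtioChangeAllowed(bool isVirtioModel,
                    const VirtioOptions *oldv,
                    const VirtioOptions *newv)
{
    if (!isVirtioModel)
        return virtioOptionsAllAbsent(newv);
    return virtioOptionsEqual(oldv, newv);
}
```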
Signed-off-by: Miroslav Los <mirlos@cisco.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This wires up the emulator 'debug' parameter to control the
/usr/bin/swtpm 'level' parameter for logging.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Changing the portgroup attribute caused unexpected behavior.
Although it can be implemented, it has a non-trivial solution.
No requirement or use has yet been found for implementing this
feature, so it has been disabled for hot-plug.
Resolves: https://issues.redhat.com/browse/RHEL-7299
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The 'hostFips' member of the _virQEMUDriver struct is not really
used any more, due to previous cleanups. Drop it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The support for VXHS device was removed in QEMU commit
v5.1.0-rc1~16^2~10. Since we require QEMU-5.2.0 at least there's
no QEMU that has the device and thus the corresponding capability
can be retired.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that the minimal required version of QEMU is 5.2.0 the
conditional setting of QEMU_CAPS_ENABLE_FIPS and
QEMU_CAPS_NETDEV_USER is effectively dead code. Drop it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
According to repology.org and/or distro repos these are the versions of QEMU:
CentOS Stream 9: qemu-kvm-9.0.0
Debian 11: qemu-5.2.0
Fedora 39: qemu-8.1.3
openSUSE Leap 15.3: qemu-5.2.0
RHEL-8: qemu-6.2.0
Ubuntu 22.04: qemu-6.2.0
Since the minimal version is 5.2.0 we can bump from 4.2.0 to
5.2.0.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It may happen that QEMU is compiled without SLIRP but with
support for passt. In such a case it is acceptable to alter the user
provided configuration and switch the backend to passt as it offers
all the same features as SLIRP.
Resolves: https://issues.redhat.com/browse/RHEL-45518
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that the logic for detecting supported net backend types has
been moved to domain capabilities generation, we can just use it
when validating net backend type. Just like we do for device
models and so on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that we have a capability for each domain net backend we can
start validating user's selection against QEMU capabilities.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Since -netdev user can be disabled during QEMU compilation, we
can't blindly expect it to just be there. We need a capability
that tracks its presence.
For qemu-4.2.0 we are not able to detect the capability so do the
next best thing - assume the capability is there. This is
consistent with our current behaviour where we blindly assume the
capability, anyway.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When enabling switchover-ack on qemu from libvirt, the .party value
was set to both source and target; however, qemuMigrationParamsCheck()
only takes that into account to validate that the remote side of the
migration supports the flag if it is marked optional or auto/always on.
In the case of switchover-ack, when enabled on only the dst and not
the src, the migration will fail if the src qemu does not support
switchover-ack, as the dst qemu will issue a switchover-ack msg:
  qemu/migration/savevm.c ->
    loadvm_process_command ->
      migrate_send_rp_switchover_ack(mis) ->
        migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL)
Since the src qemu doesn't understand messages with header_type ==
MIG_RP_MSG_SWITCHOVER_ACK, qemu will kill the migration with error:
qemu-kvm: RP: Received invalid message 0x0007 length 0x0000
qemu-kvm: Unable to write to socket: Bad file descriptor
Looking at the original commit [1] for optional migration capabilities,
it seems that the spirit of optional handling was to enhance a given
existing capability where possible. Given that switchover-ack
exclusively depends on return-path, adding it as optional to that cap
feels right.
[1] 61e34b0856 ("qemu: Add support for optional migration capabilities")
Fixes: 1cc7737f69 ("qemu: add support for qemu switchover-ack")
Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: YangHang Liu <yanghliu@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Now that the logic for detecting supported launchSecurity types
has been moved to domain capabilities generation, we can just use
it when validating launchSecurity type. Just like we do for
device models and so on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The inspiration for these rules comes from
qemuValidateDomainDef().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While it's very unlikely to have QEMU that supports SEV-SNP but
doesn't support plain SEV, for completeness sake we ought to
query SEV capabilities if QEMU supports either. And similarly to
QEMU_CAPS_SEV_GUEST we need to clear the capability if talking to
QEMU proves SEV is not really supported.
This in turn removes the 'sev-snp-guest' capability from one of
our test cases as Peter's machine he uses to refresh capabilities
is not SEV capable. But that's okay. It's consistent with
'sev-guest' capability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
An iSCSI device with zero hosts will result in a segmentation fault. This patch
adds a check for the number of hosts, which must be one in the case of iSCSI.
Minimal reproducing XML:
<domain type='qemu'>
  <name>MyGuest</name>
  <uuid>4dea22b3-1d52-d8f3-2516-782e98ab3fa0</uuid>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <memory>4096</memory>
  <devices>
    <disk type='network'>
      <source name='dummy' protocol='iscsi'/>
      <target dev='vda'/>
    </disk>
  </devices>
</domain>
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
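The host-count check for iSCSI described above can be sketched as
follows. This is only an illustration under stated assumptions: the
struct net_source type and validate_net_source() are hypothetical
stand-ins, not the real libvirt storage source structures:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down network disk source. */
struct net_source {
    const char *protocol;
    size_t nhosts;
};

/* Returns 0 when the source is acceptable, -1 otherwise.
 * iSCSI needs exactly one host, so zero hosts is rejected up front
 * instead of letting later code dereference a missing host entry. */
int validate_net_source(const struct net_source *src)
{
    if (strcmp(src->protocol, "iscsi") == 0 && src->nhosts != 1)
        return -1;

    return 0;
}
```

The reproducer XML above corresponds to protocol "iscsi" with nhosts
equal to 0, which this sketch rejects.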
Add plumbing for QEMU's switchover-ack migration capability, which
helps lower the downtime during VFIO migrations. This capability is
enabled by default as long as both the source and destination support
it.
Note: switchover-ack depends on the return path capability, so this may
not be used when VIR_MIGRATE_TUNNELLED flag is set.
Extensive details about the qemu switchover-ack implementation are
available in the qemu series v6 cover letter [1] where the highlight is
the extreme reduction in guest visible downtime. In addition to the
original test results below, I saw a roughly 20% reduction in downtime
for VFIO VGPU devices at minimum.
=== Test results ===
The below table shows the downtime of two identical migrations. In the
first migration switchover ack is disabled and in the second it is
enabled. The migrated VM is assigned with a mlx5 VFIO device which has
300MB of device data to be migrated.
+----------------------+-----------------------+----------+
| Switchover ack       | VFIO device data size | Downtime |
+----------------------+-----------------------+----------+
| Disabled             | 300MB                 | 1900ms   |
| Enabled              | 300MB                 | 420ms    |
+----------------------+-----------------------+----------+
Switchover ack gives a roughly 4.5 times improvement in downtime.
The 1480ms difference is time that is used for resource allocation for
the VFIO device in the destination. Without switchover ack, this time is
spent when the source VM is stopped and thus the downtime is much
higher. With switchover ack, the time is spent when the source VM is
still running.
[1] https://patchwork.kernel.org/project/qemu-devel/cover/20230621111201.29729-1-avihaih@nvidia.com/
Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: YangHang Liu <yanghliu@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Commit 7c8e606b64 attempted to fix
the specification of the ramfb property for vfio-pci devices, but it
failed when ramfb is explicitly set to 'off'. This is because only the
'vfio-pci-nohotplug' device supports the 'ramfb' property. Since we use
the base 'vfio-pci' device unless ramfb is enabled, attempting to set
the 'ramfb' parameter to 'off' will result in an error like the
following:
error: internal error: QEMU unexpectedly closed the monitor
(vm='rhel'): 2024-06-06T04:43:22.896795Z qemu-kvm: -device
{"driver":"vfio-pci","host":"0000:b1:00.4","id":"hostdev0","display":"on
","ramfb":false,"bus":"pci.7","addr":"0x0"}: Property 'vfio-pci.ramfb'
not found.
This also more closely matches what is done for mdev devices.
Resolves: https://issues.redhat.com/browse/RHEL-28808
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The attribute 'discard_no_unref' of <disk/> is not allowed to be
changed while the virtual machine is running.
Resolves: https://issues.redhat.com/browse/RHEL-37542
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The firmware descriptors have 'amd-sev-snp' feature which
describes whether firmware is suitable for SEV-SNP guests.
Provide necessary implementation to detect the feature and pick
the right firmware if guest is SEV-SNP enabled.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Pretty straightforward as qemu has 'sev-snp-guest' object whose
attributes map pretty much 1:1 to our XML model. Except for
@vcek where QEMU has 'vcek-disabled', an inverted boolean, while
we model it as virTristateBool. But that's easy to map too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
SEV-SNP is an enhancement of SEV/SEV-ES and thus it shares some
fields with it. Nevertheless, on XML level, it's yet another type
of <launchSecurity/>.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This capability tracks sev-snp-guest object availability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In QEMU commit v9.0.0-1155-g59d3740cb4 the return type of
'query-sev' monitor command changed to accommodate SEV-SNP. Even
though we currently support launching only plain SEV guests, this
will soon change.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In a few instances there is a plain if() check for
_virDomainSecDef::sectype. While this works perfectly for now,
soon there'll be another type and we can utilize compiler to
identify all the places that need adaptation. Switch those if()
statements to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
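To illustrate why switch() helps here, a minimal sketch with a
hypothetical enum (sec_type and its values are stand-ins, not the real
virDomainLaunchSecurity). With no 'default' label, gcc and clang warn
via -Wswitch (part of -Wall) when a newly added enumerator is not
handled, which a plain if() comparison never does:

```c
#include <stddef.h>

/* Hypothetical launch security types. */
typedef enum {
    SEC_NONE,
    SEC_SEV,
    SEC_PV,
} sec_type;

/* Returns an illustrative label for the type, or NULL for none.
 * Every enumerator is listed and there is no 'default' label, so adding
 * a new value to sec_type makes the compiler point at this switch. */
const char *sec_type_label(sec_type t)
{
    switch (t) {
    case SEC_SEV:
        return "sev";
    case SEC_PV:
        return "pv";
    case SEC_NONE:
        return NULL;
    }

    /* Reached only for values outside the enum. */
    return NULL;
}
```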
The sectype member of _virDomainSecDef struct is already declared
as of virDomainLaunchSecurity type. There's no need to typecast
it to the very same type when passing it to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Some parts of SEV are to be shared with SEV SNP. In order to
reuse XML parsing / formatting code cleanly, let's move those
common bits into a new struct (virDomainSEVCommonDef) and adjust
rest of the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While working on qemuMonitorJSONGetSEVMeasurement() and
qemuMonitorJSONGetSEVInfo() I've noticed that if these functions
fail, they do so without appropriate error set. Fill in error
reporting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When a VM terminates itself while it's being migrated in running state
libvirt would report a wrong error:
error: cannot get locked memory limit of process 2502057: No such file or directory
rather than the proper error:
error: operation failed: domain is not running
Remember the error on error paths in qemuMigrationSrcConfirmPhase and
qemuMigrationSrcPerformPhase.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'qemuProcessStop()' clears the 'current' job data. While the code under
the 'error' label in 'qemuMigrationSrcRun()' does check that the VM is
active before accessing the job, it also invokes multiple helper
functions to clean up the migration including
'qemuMigrationSrcNBDCopyCancel()' which calls 'qemuDomainObjWait()'
invalidating the result of the liveness check as it unlocks the VM.
Duplicate the liveness check and explain why. The rest of the code e.g.
accessing the monitor is safe as 'qemuDomainEnterMonitorAsync()'
performs a liveness check. The cleanup path just ignores the return
values of those functions.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function is a pointless wrapper on top of
qemuMigrationDstWaitForCompletion.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Similarly to the one change in commit 4d1a1fdffd
we should be checking that the VM is not yet being destroyed if we've
invoked qemuDomainObjWait().
Use the new helper qemuDomainObjIsActive().
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The helper checks whether VM is active including the internal qemu
state. This helper will become useful in situations when an async job
is in use as VIR_JOB_DESTROY can run along async jobs thus both checks
are necessary.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
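A minimal sketch of the combined liveness check such a helper performs,
assuming a simplified VM object. The struct and function names are
hypothetical; the only facts taken from the surrounding commits are that
an id and a 'beingDestroyed' flag are both involved:

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down VM state. */
struct vm_state {
    int id;                /* assumed to be -1 while the VM is inactive */
    bool being_destroyed;  /* set while a destroy job tears the VM down */
};

/* The VM counts as active only if it has an id AND is not being
 * destroyed. A destroy job can run alongside an async job, so the id
 * alone is not sufficient. */
bool vm_is_active(const struct vm_state *vm)
{
    return vm->id != -1 && !vm->being_destroyed;
}
```

Code that waited on the VM inside an async job would re-run this check
after waking up, because the lock was dropped while waiting.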
Prevent the possibility that a VM could be considered as alive while
inside qemuProcessStop.
A recently fixed bug which unlocked the domain object while inside
qemuProcessStop showed that the VM could be mistakenly considered
active while 'qemuProcessStop' is processing shutdown of the VM.
Ensure that this doesn't happen by clearing the 'beingDestroyed' flag
only after the VM id is cleared.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
There are a few function calls done while cleaning up a stopped VM which
do require the old VM id, to e.g. clean up paths containing the 'short'
domain name in the path.
Anything else, which doesn't strictly require it can be moved after
clearing the 'id' in order to decrease likelihood of potential bugs.
This patch moves all the code which does not require the 'id' (except
for the log entry and closing the monitor socket) after the statement
clearing the id and adds a comment explaining that anything in the
section must not unlock the VM object.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'qemuDomainObjStopWorker()' which is meant to dispose of the event loop
thread for the monitor unlocks the VM object while disposing the thread
to prevent possible deadlocks with events waiting on the monitor thread.
Unfortunately 'qemuDomainObjStopWorker()' is called *before* the VM is
marked as inactive by clearing 'vm->def->id', but at the same time it's
no longer marked as 'beingDestroyed' when we're inside
'qemuProcessStop()'.
If 'vm' would be kept locked this wouldn't be a problem. Same way it's
not a problem for anything that uses non-ASYNC VM jobs, or when the
monitor is accessed in an async job, as the 'destroy' job interlocks
with those.
It is a problem for code inside an async job which uses
'qemuDomainObjWait()' though. The API contract of qemuDomainObjWait()
assures the caller that the VM is still active on successful return from
it, but in this specific case that doesn't hold, as 'beingDestroyed' is
already false while 'vm->def->id' is not yet cleared.
To fix the issue move the 'qemuDomainObjStopWorker()' call *after*
clearing 'vm->def->id' and also add a note stating what the function is
doing.
Fixes: 860a999802
Closes: https://gitlab.com/libvirt/libvirt/-/issues/640
Reported-by: luzhipeng <luzhipeng@cestc.cn>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Document why this function exists and meaning of return values.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Clear the 'disk' member of 'blockjob' as we're freeing the disk object
at this point. While this should not normally happen it was observed
when another bug allowed the VM to be cleared while other threads hadn't
yet finished.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Similarly to other blockjob handlers, if there's no disk associated with
the blockjob the handler needs to behave correctly. This is needed as
the disk might have been de-associated on unplug or other operations.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Sometimes in the release hook it is useful to know whether the VM shutdown
was graceful or not. This is especially useful for doing cleanup based on the
VM shutdown failure reason in the release hook. This patch proposes to use the
last argument 'extra' to pass the VM shutoff reason in the call to the release
hook. Make this change for QEMU and LXC.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-23833
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Although virDomainDeviceDefValidate() is called as a part of
parsing device XML routine, it validates only that single device.
The virDomainDefValidate() function performs a more comprehensive
check. It should detect errors resulting from dependencies
between devices, or a device and some other part of XML config.
Therefore, a call to virDomainDefValidate() is added at the end
of qemuDomainAttachDeviceConfig().
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In one of my previous commits I've made us subtract isolcpus
from all online CPUs when setting affinity on QEMU threads. See
commit below for more info on that. Nevertheless, this is
something that surely deserves an entry in log. I've chosen INFO
priority for now. We can promote that to a regular WARN if users
complain.
Fixes: da95bcb6b2
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We currently hardcode the systemd sysusersdir, but it is desirable to be
able to choose a different location in some cases. For example, Fedora
flatpak builds change the RPM %_sysusersdir macro, but we can't currently
honour that.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reported-by: Yaakov Selkowitz <yselkowi@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The support will be dropped soon by qemu, and libvirt is not rejecting
such configurations. Add validation of this explicitly requested config.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Everywhere we use TPM 2.0 as our default, the chances of TPM
1.2 being supported by the guest OS are very slim. Just reject
such configurations outright.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
TPM 1.2 is a pretty bad default these days, especially for
architectures which were introduced when TPM 2.0 already existed.
We're already carving out exceptions for several scenarios, but
that's basically backwards: at this point, using TPM 1.2 is the
exception.
Restructure the code so that it reflects reality and we don't
have to remember to update it every time a new architecture is
introduced.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
While __attribute__((sentinel)) (exposed by glib under
G_GNUC_NULL_TERMINATED macro) is a gcc extension, it's supported
by clang too. It's already being used throughout our code but
some functions that take variadic arguments and expect NULL at
the end were lacking such annotation. Fill them in.
After this, there are still some functions left untouched because
they expect a different sentinel than NULL. Unfortunately, glib
does not provide macro for different sentinels. We may come up
with our own, but let's save that for future work.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
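For illustration, a small variadic function carrying the sentinel
annotation discussed above. count_strings() is a hypothetical example
and not a libvirt function; with glib the attribute would be spelled
G_GNUC_NULL_TERMINATED instead:

```c
#include <stdarg.h>
#include <stddef.h>

/* Counts the string arguments up to the terminating NULL.
 * The attribute lets gcc and clang warn at call sites where the
 * argument list does not end with a NULL pointer. */
__attribute__((sentinel))
size_t count_strings(const char *first, ...)
{
    va_list ap;
    size_t n = 0;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);

    return n;
}
```

A call such as count_strings("a", "b") without the trailing NULL would
read past the supplied arguments at run time; with the annotation the
compiler flags it at build time instead.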
When hot-plugging a FS device with un-assigned address with a bootindex
the recently-added validation check would fail as validation on hotplug
is done prior to address assignment.
To fix this problem we can simply relax the check to also pass on _NONE
addresses. Unsupported configurations will still be caught as previous
commit re-checks the definition after address assignment prior to
hotplug.
Resolves: https://issues.redhat.com/browse/RHEL-39271
Fixes: 4690058b6d
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Some of the checks make sense only after the address is allocated and
thus we need to re-do the validation after the address is assigned.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In one of my recent commits, I've introduced
virDomainInterfaceClearQoS() which is a helper that either calls
virNetDevBandwidthClear() ('tc' implementation) or
virNetDevOpenvswitchInterfaceClearQos() (for ovs ifaces). But I
made a micro optimization which leads to a bug: the function
checks whether passed iface has any QoS set and returns early if
it has none. In majority of cases this is right thing to do, but
when removing QoS on virDomainUpdateDeviceFlags() this is
problematic. The new definition (passed as argument to
virDomainInterfaceClearQoS()) contains no QoS (because user
requested its removal) and thus instead of removing the old QoS
setting nothing is done.
Fortunately, the fix is simple - pass olddev which contains the
old QoS setting.
Fixes: 812a146dfe
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
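The early-return pitfall described above can be reduced to a few lines.
This is a sketch under stated assumptions: struct iface_def and
iface_clear_qos() are hypothetical, and the actual clearing is replaced
by a return value saying whether a clear would have been issued:

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down interface definition. */
struct iface_def {
    bool has_qos;
};

/* Returns true if a QoS clear would actually be issued for 'def'. */
bool iface_clear_qos(const struct iface_def *def)
{
    /* The micro optimization: nothing configured, nothing to clear. */
    if (!def->has_qos)
        return false;

    /* Real code would call the 'tc' or OVS clearing routine here. */
    return true;
}
```

When the user removes QoS through a device update, the new definition
has no QoS, so passing it makes the function return early and the old
settings stay in place. Passing the old definition, which still carries
the QoS, triggers the clear.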
The code cleaning up virStorageSource doesn't free data allocated by
virStorageSourceInit() so we need to call virStorageSourceDeinit()
explicitly.
Fixes: 8e66473781
Resolves: https://issues.redhat.com/browse/RHEL-33044
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The virStateDriver struct has .stateInitialize callback which is
declared to return virDrvStateInitResult enum. But some drivers
return a plain int in their implementation which is UB.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
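A minimal sketch of a callback whose implementation matches the declared
enum return type. All names here (init_result, state_driver,
demo_initialize) are hypothetical. The point is only that the function
assigned to the callback has exactly the declared type, since an enum
type is not guaranteed to be compatible with plain int:

```c
/* Hypothetical result enum for a state driver initializer. */
typedef enum {
    INIT_ERROR = -1,
    INIT_SKIPPED,
    INIT_COMPLETE,
} init_result;

/* The callback is declared to return the enum, not int. */
typedef init_result (*state_initialize_cb)(void);

struct state_driver {
    const char *name;
    state_initialize_cb initialize;
};

/* The implementation uses the very same return type as the callback
 * declaration, so calling it through the pointer is well defined. */
init_result demo_initialize(void)
{
    return INIT_COMPLETE;
}

const struct state_driver demo_driver = {
    .name = "demo",
    .initialize = demo_initialize,
};
```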
There was no test for this and we mistakenly used 'B' rather than 'T'
when constructing the json value for this parameter. Thus, a value of
'off' was VIR_TRISTATE_SWITCH_OFF=2, which was translated to a boolean
value of 'true'.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
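The effect of treating a tristate as a plain boolean, as described
above, can be shown in isolation. The enum and function names are
hypothetical; only the fact that "off" has the numeric value 2 comes
from the commit message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tristate switch; OFF is 2 as in the text above. */
typedef enum {
    TRISTATE_ABSENT = 0,
    TRISTATE_ON = 1,
    TRISTATE_OFF = 2,
} tristate;

/* The buggy interpretation: the raw value is used as a boolean, so
 * every non-zero value, including OFF, becomes true. */
bool tristate_as_plain_bool(tristate t)
{
    return t != 0;
}

/* Tristate-aware conversion. Returns false when the value is absent
 * (nothing should be emitted); otherwise stores the real boolean in
 * *out and returns true. */
bool tristate_to_bool(tristate t, bool *out)
{
    assert(out != NULL);

    if (t == TRISTATE_ABSENT)
        return false;

    *out = (t == TRISTATE_ON);
    return true;
}
```

tristate_as_plain_bool(TRISTATE_OFF) yields true, which is the bug;
tristate_to_bool(TRISTATE_OFF, &v) stores false in v.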
Unlike other input types, evdev is not a true device since it's backed by
'-object'. We must use object-add/object-del monitor commands instead of
device-add/device-del in this particular case.
This patch adds support for handling live attachment and
detachment of evdev type devices.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/529
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Previously, the network device hotplug logic would try to ensure only CCW or
PCI addresses. With recent support for the usb-net model, this patch will
ensure USB addresses for usb-net network devices.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/14
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'virQEMUCapsSearchData' has been unused since
commit bc33b8c639 ("qemu: capabilities: Drop the
virQEMUCapsCacheLookupByArch function")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When starting a domain and there's no vCPU/emulator pinning set,
we query the list of all online physical CPUs and set affinity of
the child process (which eventually becomes QEMU) to that list.
We can't assume libvirtd itself had affinity to all online CPUs
and since affinity of the child process is inherited, we should
fix it afterwards. But that's not necessarily correct. Users
might isolate some physical CPUs and we should avoid touching
them unless explicitly told so (i.e. vCPU/emulator pinning told
us so).
Therefore, when attempting to set affinity to all online CPUs
subtract the isolated ones.
Before this commit:
root@localhost:~# cat /sys/devices/system/cpu/isolated
19,21,23
root@virtlab414:~# taskset -cp $(pgrep qemu)
pid 14835's current affinity list: 0-23
After:
root@virtlab414:~# taskset -cp $(pgrep qemu)
pid 17153's current affinity list: 0-18,20,22
Resolves: https://issues.redhat.com/browse/RHEL-33082
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
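The subtraction described above amounts to a bitmap AND-NOT. A sketch
using a 64-bit mask, which is an assumption made for brevity (real hosts
may have more CPUs, and all names here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set means CPU i is part of the set. */
typedef uint64_t cpu_mask;

/* Mask containing the single CPU 'cpu'. */
cpu_mask cpu_bit(unsigned cpu)
{
    assert(cpu < 64);
    return (cpu_mask)1 << cpu;
}

/* Mask containing CPUs first..last inclusive. */
cpu_mask cpu_range(unsigned first, unsigned last)
{
    cpu_mask m = 0;

    assert(first <= last && last < 64);
    for (unsigned i = first; i <= last; i++)
        m |= cpu_bit(i);
    return m;
}

/* Default affinity: all online CPUs minus the isolated ones. */
cpu_mask default_affinity(cpu_mask online, cpu_mask isolated)
{
    return online & ~isolated;
}
```

With online CPUs 0-23 and isolated CPUs 19, 21 and 23, the result holds
CPUs 0-18, 20 and 22, matching the taskset output in the "After" case.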
Extend the list of supported formats, update and clarify comment
in qemu.conf.in (removed misleading sentence about the order of
compression format types).
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/589
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Features removed from a CPU model are marked with "removed='yes'"
attribute in the CPU map. Such features will always be present in a CPU
definition produced by libvirt regardless of their state. In other words
a running domain (even saved in a file) will always explicitly contain
states of all features removed from the specified CPU model. This
enables migration to older libvirt which would otherwise think the
affected features should be enabled as they are still included in the
CPU model in the older version of CPU map. Migration from an old libvirt
to a new one would be broken as the new libvirt would think the removed
features should be disabled (because they are not included in the CPU
model anymore), which might not be the case on the source host. Thus we
were refusing to remove CPU features unless they were never working and
no domain could even be running with those features enabled.
This patch removes the limitation. When handling CPU definitions with
missing features marked as removed in the specified CPU model, we know
whether it comes from a running domain, in which case it must have been
created by older libvirt where the missing CPU features were not removed
yet. This means the features must have been enabled on the source and we
can automatically fix the definition by adding the missing features with
correct states.
We can safely remove any CPU feature from our CPU models now, but it
should only be used for features removed from all versions of a given
CPU model in QEMU because unversioned models correspond to v1.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
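A sketch of how missing removed features could be filled in depending on
where the definition comes from. Everything here is hypothetical
(struct cpu_def, the two-value policy enum, the fixed-size array); it
only illustrates the rule stated above: a definition from a running
domain implies the feature was enabled, anything else gets it disabled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    POLICY_DISABLE,
    POLICY_REQUIRE,
} feature_policy;

struct feature {
    const char *name;
    feature_policy policy;
};

#define MAX_FEATURES 8

/* Hypothetical, trimmed-down CPU definition. */
struct cpu_def {
    struct feature features[MAX_FEATURES];
    size_t nfeatures;
};

/* Adds 'name' with 'policy' unless the definition already mentions it.
 * Returns true if the feature was added. */
bool add_feature_if_missing(struct cpu_def *def, const char *name,
                            feature_policy policy)
{
    for (size_t i = 0; i < def->nfeatures; i++) {
        if (strcmp(def->features[i].name, name) == 0)
            return false;
    }

    assert(def->nfeatures < MAX_FEATURES);
    def->features[def->nfeatures].name = name;
    def->features[def->nfeatures].policy = policy;
    def->nfeatures++;
    return true;
}

/* Policy for a feature the CPU model marks as removed but the
 * definition omits. */
feature_policy removed_feature_policy(bool from_running_domain)
{
    return from_running_domain ? POLICY_REQUIRE : POLICY_DISABLE;
}
```

A feature that the definition already lists keeps whatever state it was
given; only omitted ones are filled in.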
virCPUUpdate checks the CPU definition for features that were marked as
removed in the specified CPU model and explicitly adds those that were
not mentioned in the definition. So far such features were added with
VIR_CPU_FEATURE_DISABLE policy, but the caller may want to use a
different policy in some situations, which is now possible via the
removedPolicy parameter.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The virCPUDefAddFeatureInternal helper function only fails if it is
called with VIR_CPU_ADD_FEATURE_MODE_EXCLUSIVE, which is only used in
virCPUDefAddFeature. The other callers (virCPUDefUpdateFeature and
virCPUDefAddFeatureIfMissing) will never get anything but 0 from
virCPUDefAddFeatureInternal and their return type can be changed to
void.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Allow generation of command line for virtio-sound-pci and virtio-sound-device
devices along with additional virtio options.
A new testcase is added to test virtio-sound-pci. The
arm-vexpressa9-virtio testcase is also extended to test virtio-sound-device.
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This patch adds parsing of the virtio sound model, along with parsing
of virtio options and PCI/virtio-mmio address assignment.
A new 'streams' attribute is added for configuring number of PCM streams
(default is 2) in virtio sound devices. QEMU additionally has jacks and chmaps
parameters but these are currently stubbed, hence they are excluded in this
patch series.
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capability can be used to detect if the qemu binary already
supports 'ras' feature for 'virt' machine type.
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
While QEMU accepts and interprets an empty string in the tls-hostname
field in migration parameters as if it's unset, the same does not apply
for the 'tls-hostname' field when 'blockdev-add'-ing a NBD backend for
non-shared storage migration.
When libvirt sets up migration with TLS in 'qemuMigrationParamsEnableTLS'
the QEMU_MIGRATION_PARAM_TLS_HOSTNAME migration parameter will be set to
an empty string when the 'hostname' argument is passed as NULL.
Later on, when setting up the NBD connections for non-shared storage
migration, 'qemuMigrationParamsGetTLSHostname' fetches the value of the
aforementioned TLS parameter and thus returns the empty string.
This bug was mostly latent until recently as libvirt used
MIGRATION_DEST_CONNECT_HOST mode in most cases which required the
hostname to be passed, thus the parameter was set properly.
This changed with 8d693d79c4 for post-copy migration, where libvirt now
instructs qemu to connect and thus passes NULL hostname to
qemuMigrationParamsEnableTLS, which in turn causes libvirt to try to
add NBD connection with empty string as tls-hostname resulting in:
error: internal error: unable to execute QEMU command 'blockdev-add': Certificate does not match the hostname
To address this modify 'qemuMigrationParamsGetTLSHostname' to undo the
weird semantics the migration code uses to handle TLS hostname and make
it return NULL if the hostname is an empty string.
Fixes: e8fa09d66b
Resolves: https://issues.redhat.com/browse/RHEL-32880
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
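The normalization described above boils down to mapping an empty string
back to NULL. A minimal sketch with a hypothetical helper name, not the
actual body of qemuMigrationParamsGetTLSHostname:

```c
#include <stddef.h>

/* Returns NULL for a missing or empty hostname, otherwise the hostname
 * itself. Callers can then omit 'tls-hostname' entirely instead of
 * passing an empty string that would fail certificate matching. */
const char *tls_hostname_or_null(const char *hostname)
{
    if (hostname == NULL || hostname[0] == '\0')
        return NULL;

    return hostname;
}
```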
The CCW variant of the 'vhost-user-fs' device in qemu deliberately
doesn't support the 'bootindex' attribute as the machine is unable
to boot from such a device.
Reject '<boot order' on non-PCI virtiofs, add tests validating that it's
rejected as well as that virtiofs on PCI-based hosts but without address
specified will be accepted.
Resolves: https://issues.redhat.com/browse/RHEL-22728
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Pretty straightforward. Just put mem-reserve attribute whenever
it's set. Previous commit ensures it's set only for valid
controller models.
Resolves: https://issues.redhat.com/browse/RHEL-7461
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Only two controller models allow setting mem-reserve:
pcie-root-port and pci-bridge. Reflect this fact during
validation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Ages ago origCPU in domain private data was introduced to provide
backward compatibility when migrating to an old libvirt, which did not
support fetching updated CPU definition from QEMU. Thus origCPU will
contain the original CPU definition before such update. But only if the
update actually changed anything. Let's always fill origCPU with the
original definition when starting a domain so that we can rely on it
being always set, even if it matches the updated definition.
This fixes migration or save operations with custom domain XML after
commit v10.1.0-88-g14d3517410, which expected origCPU to be always set
to the CPU definition from inactive XML to check features explicitly
requested by a user.
https://issues.redhat.com/browse/RHEL-30622
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Tested-by: Han Han <hhan@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The only thing we need to free in the cleanup code is virCPUDef and for
that we already have g_autoptr handler.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In QEMU and LXC drivers in a few places only
virNetDevBandwidthClear() is called. This means that if an
interface is of openvswitch vport profile, its QoS is not
removed. And to make matters worse - OVS is designed to remember
state even when corresponding interface is gone. This leads to
stale QoS settings piling up in OVS database.
To resolve this, introduce virDomainInterfaceClearQoS() which
looks at given interface and calls corresponding QoS clear
function. Then, basically replace virNetDevBandwidthClear() calls
in those hypervisor drivers with this new function.
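
A minimal sketch of the dispatch idea, using simplified hypothetical
types and stub functions in place of the real libvirt ones. Whether the
real helper calls exactly one clearing function per interface is an
assumption made here for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified interface description. */
typedef enum {
    SKETCH_VPORT_NONE,
    SKETCH_VPORT_OPENVSWITCH,
} sketchVPortType;

typedef struct {
    const char *ifname;
    sketchVPortType vport;
    bool tcCleared;  /* set by the stub standing in for virNetDevBandwidthClear() */
    bool ovsCleared; /* set by the stub standing in for the OVS QoS removal */
} sketchIface;

/* Stub: pretend to remove the generic bandwidth settings. */
static int
sketchBandwidthClear(sketchIface *iface)
{
    iface->tcCleared = true;
    return 0;
}

/* Stub: pretend to remove the QoS records kept in the OVS database. */
static int
sketchOvsClearQoS(sketchIface *iface)
{
    iface->ovsCleared = true;
    return 0;
}

/* Look at the interface and call the matching QoS clearing function. */
static int
sketchInterfaceClearQoS(sketchIface *iface)
{
    if (iface->vport == SKETCH_VPORT_OPENVSWITCH)
        return sketchOvsClearQoS(iface);

    return sketchBandwidthClear(iface);
}
```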
Resolves: https://issues.redhat.com/browse/RHEL-30373
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Both LXC and QEMU drivers have the same code to remove vport when
removing a domain's interface. Instead of repeating the same
pattern in both drivers, move the code into hypervisor agnostic
location (src/hypervisor/) and switch to calling this new
function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This patch will allow usb-net devices to be automatically assigned a USB
address (and skip any attempt to assign a PCI one).
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Implement display="on" and ramfb="on" for vfio PCI host devices in qemu.
This enables passthrough PCI devices for display just like we did for
mdevs.
Resolves: https://issues.redhat.com/browse/RHEL-28808
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The XML metadata for a snapshot contains only a single list of disk
overlays from the moment when the snapshot was taken. When a user creates
multiple branches of snapshots, the parent snapshot will still list only
the original disk overlays. This may cause an issue in a specific scenario:
s1
|
+- s2
+- s3 (active)
For this snapshot topology, when we delete s2 the metadata for s1 is not
updated. When we then delete s1, the code operates on the incorrect
overlays from the s1 metadata in order to update the s3 metadata,
resulting in no changes to the s3 metadata.
Now when the user tries to delete s3 it fails with the following error:
error: Failed to delete snapshot s3
error: operation failed: snapshot VM disk source and parent disk source are not the same
For the actual deletion there is code to figure out the correct disk
source, but it was not used to update the metadata as well. Because of
how block commit works in libvirt, we need to create a copy of that disk
source in order to have it available when updating the metadata, as the
original source will be freed at that point.
Resolves: https://issues.redhat.com/browse/RHEL-26276
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Calling this function when deleting internal snapshot isn't required
because with internal snapshots all changes are done within the file
itself so there is no file deletion and no need to update snapshot
metadata.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
If the original code detected a missing or null boot index in the
new XML, it automatically added the current value. This
autocompletion was incorrect because it was impossible to
distinguish between user intent and user error - changing the
boot order itself is forbidden and should always be an error.
Resolves: https://issues.redhat.com/browse/RHEL-23416
Fixes: aa3e07caec
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Change the log level for pauses of guests due to watchdog timeouts
or io errors from debug to warn to enhance the visibility of such
events.
Signed-off-by: Lennart Fricke <lennart.fricke@drehpunkt.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Implement support for loongarch64 in the QEMU driver.
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Add loongarch CPU support: define the new CPU type 'loongarch64'
and implement its driver functions.
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Current entries should always be listed before obsolete ones.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
The source tag sets the rootdir property of the device, which is
the directory exposed to the guest via the MTP device. The target
tag sets the desc property. This device supports read-only mode
as well. Like virtiofs, it does not support additional access
modes.
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Expose usb-mtp device as another type of <filesystem/>.
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This capability reflects presence of -device usb-mtp.
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use svirt_t instead of virtd_t, since virtd_t is not available in the
session mode and qemu with svirt_t won't be able to talk to unconfined_t
socket.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
On domain startup, qemuSetupCgroupForExtDevices checks
if a cgroup controller is present and skips the setup if not.
Add a similar check to qemuVirtioFSSetupCgroup to prevent
crashing when hotplugging a virtiofs filesystem.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Features marked with added='yes' in CPU model definitions have to be
removed before migration, otherwise older libvirt would complain about
unknown CPU features. We only do this for features that were enabled for
a given CPU model even with older libvirt, which just ignored the
features. And only for features we added ourselves when updating CPU
definition during domain startup, that is we do not remove features
which were explicitly mentioned by a user.
That said, this is not the safest thing we could do, but it's
effectively the same thing we did before the affected features were
added: we ignored them completely on both sides of migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The content is arch specific and checking for Icelake-Server CPU model
on non-x86 architectures does not make sense.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
In near future we will want to check whether capabilities for
given virtType exist, but report an error on our own. Introduce
reportError argument which makes the function report an error iff
set.
In one specific case (virQEMUCapsGetDefaultVersion()) we were
even overwriting (more specific) error message reported by
virCapabilitiesDomainDataLookup(). Drop that too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The 'display-reload' QMP command was introduced in QEMU 6.0.0:
9cc0765165
Currently it only supports reloading TLS certificates for VNC.
Resolves: https://issues.redhat.com/browse/RHEL-16333
Signed-off-by: Zheng Yan <yanzheng759@huawei.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The 'display-reload' QMP command was introduced in QEMU 6.0.0, so we
add a compatible capability to check if target QEMU binary supports it.
{"execute":"display-reload", "arguments":{"type": "vnc", "tls-certs": true}}
For the new QMP command, refer to:
9cc0765165
Signed-off-by: Zheng Yan <yanzheng759@huawei.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Currently all machine types which do honour '-usb' are already covered
by code which will either select a proper controller model or would
select the same one which '-usb' would use.
Thus all of the legacy -usb controller code can be removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
- 'virt*' machines already don't allow downgrade
- 'versatilepb' and 'realview' machines use 'pci-ohci' controller with '-usb'
- all other machines ignore '-usb' (some have sysbus-based USB
controller which we don't even consider)
For the 'versatilepb' and 'realview' machines libvirt would already
resort to picking either an existing controller model or trying to pick
the one which '-usb' would select and thus fail either way.
All other machine types ignore it.
We can thus remove the fallback for all arm-based machines.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
- 'pseries' machines already don't allow downgrade
- 'g3beige' and 'mac99' machines use 'pci-ohci' controller with '-usb'
- all other machines ignore '-usb'
For 'g3beige' and 'mac99' libvirt already has 'pci-ohci' as one of the
controller models it would select when picking a model, thus it's
impossible to reach a situation where '-usb' would be honoured.
All other machine types ignore it.
We can thus remove the fallback for all ppc-based machines.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
The default USB device auto-selection code for 'pseries' machines picks
controller models which are also selected when '-usb' is used thus it's
impossible to end up in the case when using '-usb' would be possible:
$ qemu-system-ppc64 --machine pseries,usb=on
qemu-system-ppc64: could not find a module for type 'nec-usb-xhci'
$ qemu-system-ppc64 --machine pseries-2.5,usb=on
qemu-system-ppc64: could not find a module for type 'pci-ohci'
Remove the impossible downgrade and adjust tests.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
- 'q35' machine type already explicitly forbids fallback
- 'isapc' never supported USB and '-usb' is ignored
- 'i440fx' does support '-usb' and translates it into 'piix3-uhci' which
is identical to what libvirt selects
- we currently don't care about 'microvm'
Attempting to start a 'pc' (i440fx) machine with -usb when 'piix3-uhci'
is compiled out will fail and in any other case libvirt will use the
proper explicitly selected controller.
Drop the '-usb' downgrade for x86 arch.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This controller is used as the default/implicit USB controller by
multiple machine types which honour the '-usb' flag of qemu. Add it as
fallback in libvirt too.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
The machine types historically have a default USB controller populated
via '-usb' which libvirt assumed implicitly. Qemu will use 'pci-ohci'
for both if '-usb' is used.
Unfortunately a USB controller instantiated via '-usb' is unusable as
the bus name libvirt generates doesn't reflect the real name qemu uses,
and thus no libvirt-defined USB devices can be put on the controller.
This patch will populate the default USB controller into the XML and
set its model to 'pci-ohci' unconditionally as the machine would
fail to start with '-usb' if that controller model is not available.
This patch doesn't try to make any other assumptions about
auto-populated model of USB controllers, which means that for an
explicit USB controller without model a different model will be picked.
Note that this will likely cause ABI differences and break migration for
the two machine types, in the corner case when the default USB
controller would be populated, but given that both are obsolete board
types and USB was unusable it doesn't make sense to keep supporting this
specific case when '-usb' was formatted.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Assign VIR_DOMAIN_CONTROLLER_MODEL_USB_DEFAULT rather than -1.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Most machine types are made available in all arches by qemu. This is also
true for the 'versatilepb' machine type example in the tests.
Move all the ARM architectures together so that they are handled in
sync.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Ideally check='partial' would check exactly the features QEMU would want
to enable when asked for a specific CPU model (and features). But there
is no way we could ask QEMU what a specific CPU would look like. So we
use our definition from CPU map, which may slightly differ as QEMU adds
or removes features from CPU models, and thus we may end up checking
features which QEMU would not enable while missing some required ones.
We can do better in specific cases, though. If a CPU definition uses
only a model and disabled features (or none at all), we already know
whether QEMU can enable all features required by the CPU model as that's
what we use to set usable='yes' attribute in the list of available CPU
models in domain capabilities XML. So when a usable CPU model is
requested without asking for additional features (disabling features is
fine) we can avoid our possibly inaccurate check using our CPU map.
For backward compatibility we only consider usable models. If a
specified model is not usable, we still check it the old way and even
let QEMU start it (and disable some features) in case our definition
lacks some features compared to QEMU.
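
The decision described above can be sketched as follows. The types and
names are hypothetical, and feature policies are reduced to just
"require" and "disable" for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical feature policies. */
typedef enum {
    SKETCH_FEATURE_REQUIRE,
    SKETCH_FEATURE_DISABLE,
} sketchPolicy;

typedef struct {
    bool modelUsable;             /* usable='yes' in domain capabilities */
    const sketchPolicy *features; /* explicit per-feature policies */
    size_t nfeatures;
} sketchCPU;

/* Return true if the CPU map based check can be skipped: the model is
 * usable and the definition only disables features (or lists none). */
static bool
sketchCanSkipMapCheck(const sketchCPU *cpu)
{
    size_t i;

    if (!cpu->modelUsable)
        return false;

    for (i = 0; i < cpu->nfeatures; i++) {
        if (cpu->features[i] != SKETCH_FEATURE_DISABLE)
            return false;
    }

    return true;
}
```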
Fixes: https://gitlab.com/libvirt/libvirt/-/issues/608
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
At the moment, any kind of issue being detected in any of the
firmware descriptor files will result in the entire process
being aborted.
In particular, installing a build of edk2 for an architecture
that libvirt doesn't yet know about, for example loongarch64,
will break most firmware-related functionality: it will no
longer be possible to define new EFI VMs, start existing ones,
or even just obtain the domcapabilities for any architecture.
This is obviously unnecessarily harsh. Adopt a more relaxed
approach and simply ignore the firmware descriptors that we
are unable to parse correctly.
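
A rough sketch of the relaxed approach, with a stub parser and made-up
descriptor file names. None of the identifiers below exist in libvirt;
they only illustrate skipping a bad entry instead of aborting.

```c
#include <stdlib.h>
#include <string.h>

/* Stub parser: treats descriptors that mention an unknown architecture
 * as unparsable. Purely illustrative. */
static int
sketchParseDescriptor(const char *path)
{
    return strstr(path, "loongarch64") ? -1 : 0;
}

/* Copy the paths that parse successfully into a newly allocated,
 * NULL-terminated list; failures are skipped rather than aborting.
 * Returns the number of kept entries, or -1 on allocation failure. */
static int
sketchLoadDescriptors(const char *const *paths,
                      size_t npaths,
                      const char ***kept)
{
    const char **list = calloc(npaths + 1, sizeof(*list));
    size_t n = 0;
    size_t i;

    if (!list)
        return -1;

    for (i = 0; i < npaths; i++) {
        if (sketchParseDescriptor(paths[i]) < 0)
            continue; /* ignore this descriptor and keep going */

        list[n++] = paths[i];
    }

    /* calloc() zeroed the array, so list[n] is already NULL */
    *kept = list;
    return (int) n;
}
```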
https://bugzilla.redhat.com/show_bug.cgi?id=2258946
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Instead of returning the list of paths exactly as obtained
from qemuFirmwareFetchConfigs(), and allocating the list of
firmwares to be exactly that size right away, start with two
empty lists and add elements to them one by one.
At the moment this only makes things more verbose, but later
we're going to change things so that it's possible that some
of the paths/firmwares are not included in the lists returned
to the caller, and at that point the changes will pay off.
Note that we can't use g_auto() for the new list of paths,
because until the very last moment it's not null-terminated,
so g_strfreev() wouldn't be able to handle it correctly.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In a couple of cases, we were reporting an error without
actually terminating the parse process.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The current implementation sets the guest-sync timeout to the
smaller value between the default value (QEMU_AGENT_WAIT_TIME)
and agent->timeout, without considering the timeout passed
via the qga command.
This patch enhances the guest-sync timeout logic to use the
minimum value among the default value, agent->timeout, and
the timeout passed via the qga command.
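
The new selection can be sketched as below. The macro is a hypothetical
stand-in for QEMU_AGENT_WAIT_TIME with an illustrative value, and the
sketch assumes all three timeouts are positive numbers of seconds;
special negative values are not modelled.

```c
/* Hypothetical stand-in for QEMU_AGENT_WAIT_TIME; value is illustrative. */
#define SKETCH_AGENT_WAIT_TIME 5

static int
sketchMin(int a, int b)
{
    return a < b ? a : b;
}

/* Pick the guest-sync timeout as the smallest of the default, the
 * agent-wide timeout and the per-command timeout. */
static int
sketchGuestSyncTimeout(int agentTimeout, int commandTimeout)
{
    return sketchMin(SKETCH_AGENT_WAIT_TIME,
                     sketchMin(agentTimeout, commandTimeout));
}
```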
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/590
Signed-off-by: ray <honglei.wang@smartx.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Move the assumption from the code pre-creating the storage to
qemuMigrationDstPrepareStorage where it's checked for other cases.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Migrating into a 'directory' won't ever work as we ask qemu to emulate a
fat filesystem, so restoring of the files won't be possible. Same for
'vhost-user' disks which don't support blockjobs as there's no block
backend used in qemu.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Check the existence of storage per-type rather than trying to come up
with a common "path".
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Now that we have a switch statement, the code adding the 'slice' for
block devices of non-equal sizes can be moved to appropriate location.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Automatically free helper variables, remove the 'cleanup' label and
use virBufferCurrentContent() to take the XML from the buffer rather
than extracting it to a separate variable.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Allow storage migration of VDPA devices by properly checking that they
exist on the destination. Pre-creation is not supported but if the
device exists the migration should be able to succeed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Decrease the likelihood that addition of a new storage type will be
forgotten.
This patch also unifies the type check to consult the 'actual' type of
the storage in both cases as the NVMe check looked for the XML declared
type while virStorageSourceIsLocalStorage() looks for the
actual/translated type.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This option controls whether the sysctl config for enabling unprivileged
userfaultfd will be installed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
/dev/userfaultfd device is preferred over userfaultfd syscall for
post-copy migrations. Unless qemu driver is configured to disable mount
namespace or to forbid access to /dev/userfaultfd in cgroup_device_acl,
we will copy it to the limited /dev filesystem QEMU will have access to
and label it appropriately. So in the default configuration post-copy
migration will be allowed even without enabling
vm.unprivileged_userfaultfd sysctl.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Previously we were only starting or stopping nbdkit when the guest was
started or stopped or when hotplugging/unplugging a disk. But when doing
block operations, the disk backing store sources can also be added or
removed independently of the disk device. When this happens the nbdkit
backend was not being handled properly. For example, when doing a
blockcopy from a nbdkit-backed disk to a new disk and pivoting to that
new location, the nbdkit process did not get cleaned up properly. Add
some functionality to qemuDomainStorageSourceAccessModify() to handle
this scenario.
Since we're now starting nbdkit from the ChainAccessAllow/Revoke()
functions, we no longer need to explicitly start nbdkit in hotplug code
paths because the hotplug functions already call these allow/revoke
functions and will start/stop nbdkit if necessary.
Add a check to qemuNbdkitProcessStart() to report an error if we
are trying to start nbdkit for a disk source that already has a running
nbdkit process. This shouldn't happen, and if it does it indicates an
error in another part of our code.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When starting nbdkit processes for the backing store of a disk, we were
returning an error if any backing store failed, but we were not cleaning
up processes that succeeded higher in the chain. Make sure that if we
return a failure status from qemuNbdkitStartStorageSource() that we roll
back any processes that had been started.
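
The rollback pattern can be sketched as follows, with a hypothetical
backing chain type and stub start/stop helpers standing in for the real
nbdkit process handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element of a disk backing chain. */
typedef struct sketchSource sketchSource;
struct sketchSource {
    bool failStart;        /* make the stub start fail, for illustration */
    bool running;          /* whether the helper process is "running" */
    sketchSource *backing; /* next element of the backing chain */
};

/* Stub standing in for starting one helper process. */
static int
sketchStartOne(sketchSource *src)
{
    if (src->failStart)
        return -1;

    src->running = true;
    return 0;
}

/* Stub standing in for stopping one helper process. */
static void
sketchStopOne(sketchSource *src)
{
    src->running = false;
}

/* Start a process for every source in the chain; on failure stop the
 * ones that were already started higher in the chain. */
static int
sketchStartChain(sketchSource *top)
{
    sketchSource *src;
    sketchSource *undo;

    for (src = top; src; src = src->backing) {
        if (sketchStartOne(src) < 0) {
            for (undo = top; undo != src; undo = undo->backing)
                sketchStopOne(undo);
            return -1;
        }
    }

    return 0;
}
```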
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This will allow us to start or stop nbdkit for just a single disk source
or for every source in the backing chain. This will be used in following
patches.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
After previous cleanups, qemuMonitorIOWriteWithFD() is but a thin wrapper
over virSocketSendMsgWithFDs(). Replace the body of the former
with a call to the latter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The 'raw' driver without any special configuration is not needed and
creates overhead in qemu.
Stop using the 'raw' format driver in cases when it's not needed. A
special case when it is needed is for FD passed images with only a
single writable FD passed, where we need an overlay driver to properly
reflect the 'read-only' flag.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Store whether qemu supports the appropriate option for block-stream and
block-commit commands and always use it if available.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capability is asserted when both block-stream and block-commit QMP
commands support the 'backing-mask-protocol' argument.
The argument causes qemu to record 'raw' as the backing file format in
case when a protocol node is used directly. This is needed to preserve
compatibility of images after a block-commit or block-pull libvirt
operation with older libvirt versions in case when we'll want to remove
the unneeded 'raw' format drivers from the block graph.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Move domain interface management methods from qemu to hypervisor. This
refactoring allows the domain management methods to be shared between CH and
qemu drivers.
This commit does not introduce any functional changes.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>