This patch adds support for SUSPEND_DISK event; both lifecycle and
separated. The support is added for QEMU, machines are changed to
PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes
to shut-off. This and much more needs to be done in order for libvirt
to work with transient devices, wake-ups etc. This patch is not
aiming for that functionality.
Using VIR_DOMAIN_XML_MIGRATABLE flag, one can request domain's XML
configuration that is suitable for migration or save/restore. Such XML
may contain extra run-time stuff internal to libvirt and some default
configuration may be removed for better compatibility of the XML with
older libvirt releases.
This flag may serve as an easy way to get the XML that can be passed
(after desired modifications) to APIs that accept custom XMLs, such as
virDomainMigrate{,ToURI}2 or virDomainSaveFlags.
The qemuMonitorSetCapabilities() API is used to initialize the QMP
protocol capabilities. It has since been abused to initialize some
libvirt internal capabilities based on command/event existance too.
Move the latter code out into qemuCapsProbeQMP() in the QEMU
capabilities source file instead
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove all use of the existing APIs for querying QEMU
capability flags. Instead obtain a qemuCapsPtr object
from the global cache. This avoids the execution of
'qemu -help' (and related commands) when launching new
guests.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the qemuAgentClose method is called from a place which holds
the domain lock, it is theoretically possible to get a deadlock
in the agent destroy callback. This has not been observed, but
the equivalent code in the QEMU monitor destroy callback has seen
a deadlock.
Remove the redundant locking while unrefing the object and the
bogus assignment
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some users report (very rarely) seeing a deadlock in the QEMU
monitor callbacks
Thread 10 (Thread 0x7fcd11e20700 (LWP 26753)):
#0 0x00000030d0e0de4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000030d0e09ca6 in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00000030d0e09ba8 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fcd162f416d in virMutexLock (m=<optimized out>)
at util/threads-pthread.c:85
#4 0x00007fcd1632c651 in virDomainObjLock (obj=<optimized out>)
at conf/domain_conf.c:14256
#5 0x00007fcd0daf05cc in qemuProcessHandleMonitorDestroy (mon=0x7fcccc0029e0,
vm=0x7fcccc00a850) at qemu/qemu_process.c:1026
#6 0x00007fcd0db01710 in qemuMonitorDispose (obj=0x7fcccc0029e0)
at qemu/qemu_monitor.c:249
#7 0x00007fcd162fd4e3 in virObjectUnref (anyobj=<optimized out>)
at util/virobject.c:139
#8 0x00007fcd0db027a9 in qemuMonitorClose (mon=<optimized out>)
at qemu/qemu_monitor.c:860
#9 0x00007fcd0daf61ad in qemuProcessStop (driver=driver@entry=0x7fcd04079d50,
vm=vm@entry=0x7fcccc00a850,
reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED, flags=flags@entry=0)
at qemu/qemu_process.c:4057
#10 0x00007fcd0db323cf in qemuDomainDestroyFlags (dom=<optimized out>,
flags=<optimized out>) at qemu/qemu_driver.c:1977
#11 0x00007fcd1637ff51 in virDomainDestroyFlags (
domain=domain@entry=0x7fccf00c1830, flags=1) at libvirt.c:2256
At frame #10 we are holding the domain lock, we call into
qemuProcessStop() to cleanup QEMU, which triggers the monitor
to close, which invokes qemuProcessHandleMonitorDestroy() which
tries to obtain the domain lock again. This is a non-recursive
lock, hence hang.
Since qemuMonitorPtr is a virObject, the unref call in
qemuProcessHandleMonitorDestroy no longer needs mutex
protection. The assignment of priv->mon = NULL, can be
instead done by the caller of qemuMonitorClose(), thus
removing all need for locking.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If QEMU quits immediately after we opened the monitor it was
possible we would skip the clearing of the SELinux process
socket context
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In the cgroups APIs we have a virCgroupKillPainfully function
which does the loop sending SIGTERM, then SIGKILL and waiting
for the process to exit. There is similar functionality for
simple processes in qemuProcessKill, but it is tangled with
the QEMU code. Untangle it to provide a virProcessKillPainfuly
function
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are a number of process related functions spread
across multiple files. Start to consolidate them by
creating a virprocess.{c,h} file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://www.gnu.org/licenses/gpl-howto.html recommends that
the 'If not, see <url>.' phrase be a separate sentence.
* tests/securityselinuxhelper.c: Remove doubled line.
* tests/securityselinuxtest.c: Likewise.
* globally: s/; If/. If/
Currently, we mark domain PAUSED (but not emit an event)
just before we issue 'stop' on monitor; This command can
take ages to finish, esp. when domain's doing a lot of
IO - users can enforce qemu to open files with O_DIRECT
which doesn't return from write() until data reaches the
block device. Having said that, we report PAUSED even if
domain is not paused yet.
On agent EOF the qemuProcessHandleAgentEOF() callback is called
which locks virDomainObjPtr. Then qemuAgentClose() is called
(with domain object locked) which eventually calls qemuAgentDispose()
and qemuProcessHandleAgentDestroy(). This tries to lock the
domain object again. Hence the deadlock.
The current qemu capabilities are stored in a virBitmapPtr
object, whose type is exposed to callers. We want to store
more data besides just the flags, so we need to move to a
struct type. This object will also need to be reference
counted, since we'll be maintaining a cache of data per
binary. This change introduces a 'qemuCapsPtr' virObject
class. Most of the change is just renaming types and
variables in all the callers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When reboot using qemu guest agent was requested, qemu driver kept
waiting for SHUTDOWN event from qemu. However, such event is never
emitted during guest reboot and qemu driver would keep waiting forever.
After fixing the last review comments on remote port searching (commit
a14b4aea51), the commit right after that
wasn't modified accordingly, therefore two values weren't changed as
they should and the configurable ports don't work as expected.
This simple commit changes last two values missed and fixes the issue.
In my quest for reusing variables I failed to edit one variable when
fixing details between two patch versions. That results in a failure
to start qemu with autoport and spice tls, because qemu is trying to
bind two sockets to the same port.
Emulator threads should also be pinned by sched_setaffinity(), just
the same as vcpu threads.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Create a new cgroup and move all emulator threads to the new cgroup.
And then we can do the other things:
1. limit only vcpu usage rather than the whole qemu
2. limit for emulator threads(include vhost-net threads)
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
After the cleanup of remote display port allocation, I noticed some
messages that didn't make a lot of sense the way they were written. So
I rephrased them.
The defines QEMU_REMOTE_PORT_MIN and QEMU_REMOTE_PORT_MAX were used to
find free port when starting domains. As this was hard-coded to the
same ports as default VNC servers, there were races with these other
programs. This patch includes the possibility to change the default
starting port as well as the maximum port (mostly for completeness) in
qemu config file.
Support for two new config options in qemu.conf is added:
- remote_port_min (defaults to QEMU_REMOTE_PORT_MIN and
must be >= than this value)
- remote_port_max (defaults to QEMU_REMOTE_PORT_MAX and
must be <= than this value)
Port allocations for SPICE and VNC behave almost the same (with
default ports), but there is some mess in the code. This patch clears
these inconsistencies and makes sure the same behavior will be used
when ports for remote displays are changed.
Changes:
- hard-coded number 5900 removed (handled elsewhere like with VNC)
- reservedVNCPorts renamed to reservedRemotePorts (it's not just for
VNC anymore)
- QEMU_VNC_PORT_{MIN,MAX} renamed to QEMU_REMOTE_PORT_{MIN,MAX}
- port allocation unified for VNC and SPICE
These changes make the security drivers able to find and handle the
correct security label information when more than one label is
available. They also update the DAC driver to be used as an usual
security driver.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>
This patch updates the structures that store information about each
domain and each hypervisor to support multiple security labels and
drivers. It also updates all the remaining code to use the new fields.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>
Rename qemuDefaultScsiControllerModel to qemuCheckScsiControllerModel.
When scsi model is given explicitly in XML(model > 0) checking if the
underlying QEMU supports it or not first, raise an error on checking
failure.
When the model is not given(mode <= 0), return LSI by default, if
the QEMU doesn't support it, raise an error.
Switch virDomainObjPtr to use the virObject APIs for reference
counting. The main change is that virObjectUnref does not return
the reference count, merely a bool indicating whether the object
still has any refs left. Checking the return value is also not
mandatory.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This converts the following public API datatypes to use the
virObject infrastructure:
virConnectPtr
virDomainPtr
virDomainSnapshotPtr
virInterfacePtr
virNetworkPtr
virNodeDevicePtr
virNWFilterPtr
virSecretPtr
virStreamPtr
virStorageVolPtr
virStoragePoolPtr
The code is significantly simplified, since the mutex in the
virConnectPtr object now only needs to be held when accessing
the per-connection virError object instance. All other operations
are completely lock free.
* src/datatypes.c, src/datatypes.h, src/libvirt.c: Convert
public datatypes to use virObject
* src/conf/domain_event.c, src/phyp/phyp_driver.c,
src/qemu/qemu_command.c, src/qemu/qemu_migration.c,
src/qemu/qemu_process.c, src/storage/storage_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xend_internal.c,
tests/qemuxml2argvtest.c, tests/qemuxmlnstest.c,
tests/sexpr2xmltest.c, tests/xmconfigtest.c: Convert
to use virObjectUnref/virObjectRef
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Per the FSF address could be changed from time to time, and GNU
recommends the following now: (http://www.gnu.org/licenses/gpl-howto.html)
You should have received a copy of the GNU General Public License
along with Foobar. If not, see <http://www.gnu.org/licenses/>.
This patch removes the explicit FSF address, and uses above instead
(of course, with inserting 'Lesser' before 'General').
Except a bunch of files for security driver, all others are changed
automatically, the copyright for securify files are not complete,
that's why to do it manually:
src/security/security_selinux.h
src/security/security_driver.h
src/security/security_selinux.c
src/security/security_apparmor.h
src/security/security_apparmor.c
src/security/security_driver.c
The previous check for YAJL would have many undesirable
consequences, the most important being that it caused the
capabilities XML to lose all <guest> elements. There is
no user visible feedback as to what is wrong in this respect,
merely a syslog message. The empty capabilities causes
libvirtd to then throw away all guest XML configs that are
stored.
This changes the code so that the check for YAJL is only
performed at the time we attempt to spawn a QEMU process
error: Failed to start domain vm-vnc
error: unsupported configuration: this qemu binary requires libvirt to be compiled with yajl
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce new members in the virMacAddr 'class'
- virMacAddrSet: set virMacAddr from a virMacAddr
- virMacAddrSetRaw: setting virMacAddr from raw 6 byte MAC address buffer
- virMacAddrGetRaw: writing virMacAddr into raw 6 byte MAC address buffer
- virMacAddrCmp: comparing two virMacAddr
- virMacAddrCmpRaw: comparing a virMacAddr with a raw 6 byte MAC address buffer
then replace raw MAC addresses by replacing
- 'unsigned char *' with virMacAddrPtr
- 'unsigned char ... [VIR_MAC_BUFLEN]' with virMacAddr
and introduce usage of above functions where necessary.
If QEMU supports the BALLOON_EVENT QMP event, then we can
avoid invoking 'query-balloon' when returning XML or the
domain info.
* src/qemu/qemu_capabilities.c, src/qemu/qemu_capabilities.h:
Add QEMU_CAPS_BALLOON_EVENT
* src/qemu/qemu_driver.c: Skip query-balloon in
qemudDomainGetInfo and qemuDomainGetXMLDesc if we have
QEMU_CAPS_BALLOON_EVENT set
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Check
for BALLOON_EVENT at connect to monitor. Add callback
for balloon change notifications
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h:
Add handling of BALLOON_EVENT and impl 'query-events'
check
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This is in preparation of the enablement of s390 guests with virtio devices.
The assignment of device addresses happens in different places, i.e. the
qemu driver and process modules as well as in the unit tests in slightly
different flavors. Currently, these are PPC spapr-vio and PCI
devices, virtio-s390 (not PCI based) will follow.
By optionally passing to qemuDomainAssignAddresses the domain
object and the capabilities it is now possible to call the function
from most of the places (except for hotplug) where address assignment
is done.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
If no 'listen' attribute or <listen> element is set in the
guest XML, the default driver configured listen address is
used. There is no way to client applications to determine
what this address is though. When starting the guest, we
should update the live XML to include this default listen
address
With latest changes to qemu-ga success on some commands is not reported
anymore, e.g. guest-shutdown or guest-suspend-*. However, errors are
still being reported. Therefore, we need to find different source of
indication if operation was successful. Events.
A core use case of the hook scripts is to be able to do things
to a guest's network configuration. It is possible to hook into
the 'start' operation for a QEMU guest which runs just before
the guest is started. The TAP devices will exist at this point,
but the QEMU process will not. It can be desirable to have a
'started' hook too, which runs once QEMU has started.
If libvirtd is restarted it will re-populate firewall rules,
but there is no QEMU hook to trigger for existing domains.
This is solved with a 'reconnect' hook.
Finally, if attaching to an external QEMU process there needs
to be an 'attach' hook script.
This all also applies to the LXC driver
* docs/hooks.html.in: Document new operations
* src/util/hooks.c, src/util/hooks.c: Add 'started', 'reconnect'
and 'attach' operations for QEMU. Add 'prepare', 'started',
'release' and 'reconnect' operations for LXC
* src/lxc/lxc_driver.c: Add hooks for 'prepare', 'started',
'release' and 'reconnect' operations
* src/qemu/qemu_process.c: Add hooks for 'started', 'reconnect'
and 'reconnect' operations
Currently, if qemuProcessStart fail at some point, e.g. because
domain being started wants a PCI/USB device already assigned to
a different domain, we jump to cleanup label where qemuProcessStop
is performed. This unconditionally calls virSecurityManagerRestoreAllLabel
which is wrong because the other domain is still using those devices.
However, once we successfully label all devices/paths in
qemuProcessStart() from that point on, we have to perform a rollback
on failure - that is - we have to virSecurityManagerRestoreAllLabel.
When libvirtd is started and there is an unusable/not-connectable
leftover from earlier started machine, it's more reasonable to say
that the machine "crashed" if we know it was started with
"-no-shutdown".
This patch fixes that and also changes the other result (when machine
was started without "-no-shutdown") to "unknown", because the previous
"failed" reason means (according to include/libvirt/libvirt.h.in:174),
that the machine failed to start.
Like for 'static' placement, when the memory policy mode is
'strict', set the memory policy by writing the advisory nodeset
returned from numad to cgroup file cpuset.mems,
On some of the NUMA platforms, the CPU index in each NUMA node
grows non-consecutive. While on other platforms, it can be inconsecutive,
E.g.
% numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 131058 MB
node 0 free: 86531 MB
node 1 cpus: 1 5 9 13 17 21 25 29
node 1 size: 131072 MB
node 1 free: 127070 MB
node 2 cpus: 2 6 10 14 18 22 26 30
node 2 size: 131072 MB
node 2 free: 127758 MB
node 3 cpus: 3 7 11 15 19 23 27 31
node 3 size: 131072 MB
node 3 free: 127226 MB
node distances:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 20
2: 20 20 10 20
3: 20 20 20 10
This patch is to fix the problem by using the CPU index in
caps->host.numaCell[i]->cpus[i] to set the bitmask instead of
assuming the CPU index of the NUMA nodes are always sequential.
This patch lifts the limit of calling thread detection code only on KVM
guests. With upstream qemu the thread mappings are reported also on
non-KVM machines.
QEMU adopted the thread_id information from the kvm branch.
To remain compatible with older upstream versions of qemu the check is
attempted but the failure to detect threads (or even run the monitor
command - on older versions without SMP support) is treated non-fatal
and the code reports one vCPU with pid of the hypervisor (in same
fashion this was done on non-KVM guests).
Though numad will manage the memory allocation of task dynamically,
it wants management application (libvirt) to pre-set the memory
policy according to the advisory nodeset returned from querying numad,
(just like pre-bind CPU nodeset for domain process), and thus the
performance could benefit much more from it.
This patch introduces new XML tag 'placement', value 'auto' indicates
whether to set the memory policy with the advisory nodeset from numad,
and its value defaults to the value of <vcpu> placement, or 'static'
if 'nodeset' is specified. Example of the new XML tag's usage:
<numatune>
<memory placement='auto' mode='interleave'/>
</numatune>
Just like what current "numatune" does, the 'auto' numa memory policy
setting uses libnuma's API too.
If <vcpu> "placement" is "auto", and <numatune> is not specified
explicitly, a default <numatume> will be added with "placement"
set as "auto", and "mode" set as "strict".
The following XML can now fully drive numad:
1) <vcpu> placement is 'auto', no <numatune> is specified.
<vcpu placement='auto'>10</vcpu>
2) <vcpu> placement is 'auto', no 'placement' is specified for
<numatune>.
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='interleave'/>
</numatune>
And it's also able to control the CPU placement and memory policy
independently. e.g.
1) <vcpu> placement is 'auto', and <numatune> placement is 'static'
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='strict' nodeset='0-10,^7'/>
</numatune>
2) <vcpu> placement is 'static', and <numatune> placement is 'auto'
<vcpu placement='static' cpuset='0-24,^12'>10</vcpu>
<numatune>
<memory mode='interleave' placement='auto'/>
</numatume>
A follow up patch will change the XML formatting codes to always output
'placement' for <vcpu>, even it's 'static'.
When we added the default USB controller into domain XML, we efficiently
broke migration to older versions of libvirt that didn't support USB
controllers at all (0.9.4 and earlier) even for domains that don't use
anything that the older libvirt can't provide. We still want to present
the default USB controller in any XML seen by a user/app but we can
safely remove it from the domain XML used during migration. If we are
migrating to a new enough libvirt, it will add the controller XML back,
while older libvirt won't be confused with it although it will still
tell qemu to create the controller.
Similar approach can be used in the future whenever we find out we
always enabled some kind of device without properly advertising it in
domain XML.
More bug extermination in the category of:
Error: CHECKED_RETURN:
/libvirt/src/conf/network_conf.c:595:
check_return: Calling function "virAsprintf" without checking return value (as is done elsewhere 515 out of 543 times).
/libvirt/src/qemu/qemu_process.c:2780:
unchecked_value: No check of the return value of "virAsprintf(&msg, "was paused (%s)", virDomainPausedReasonTypeToString(reason))".
/libvirt/tests/commandtest.c:809:
check_return: Calling function "setsid" without checking return value (as is done elsewhere 4 out of 5 times).
/libvirt/tests/commandtest.c:830:
unchecked_value: No check of the return value of "virTestGetDebug()".
/libvirt/tests/commandtest.c:831:
check_return: Calling function "virTestGetVerbose" without checking return value (as is done elsewhere 41 out of 42 times).
/libvirt/tests/commandtest.c:833:
check_return: Calling function "virInitialize" without checking return value (as is done elsewhere 18 out of 21 times).
One note about the error in commandtest line 809: setsid() seems to fail when running the test -- could be removed ?
If console[0] is an alias for serial[0], do not enforce the former to
have a PTY source type. This breaks serial consoles on stdio and makes
no sense.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Currently, we have 3 boolean arguments we have to pass
to qemuProcessStart(). As libvirt grows it is harder and harder
to remember them and their position. Therefore we should
switch to flags instead.
Instead of returning a CPUs list, numad returns NUMA node
list instead, this patch is to convert the node list to
cpumap before affinity setting. Otherwise, the domain
processes will be pinned only to CPU[$numa_cell_num],
which will cause significiant performance losses.
Also because numad will balance the affinity dynamically,
reflecting the cpuset from numad back doesn't make much
sense then, and it may just could produce confusion for
the users. Thus the better way is not to reflect it back
to XML. And in this case, it's better to ignore the cpuset
when parsing XML.
The codes to update the cpuset is removed in this patch
incidentally, and there will be a follow up patch to ignore
the manually specified "cpuset" if "placement" is "auto",
and document will be updated too.
In case an API fails with "cannot acquire state change lock", searching
for the API that possibly forgot to end its job is not always easy.
Let's keep track of the job owner and print it out for easier
identification.
As reported by Daniel Berrangé, we have a huge performance regression
for virDomainGetInfo() due to the change which makes virDomainEndJob()
save the XML status file every time it is called. Previous to that
change, 2000 calls to virDomainGetInfo() took ~2.5 seconds. After that
change, 2000 calls to virDomainGetInfo() take 2 *minutes* 45 secs.
We made the change to be able to recover from libvirtd restart in the
middle of a job. However, only destroy and async jobs are taken care of.
Thus it makes more sense to only save domain state XML when these jobs
are started/stopped.
If the daemon is restarted it will lose list of active
USB devices assigned to active domains. Therefore we need
to rebuild this list on qemuProcessReconnect().
Originally, qemuDomainCheckEjectableMedia was entering monitor with qemu
driver lock. Commit 2067e31bf9, which I
made to fix that, revealed another issue we had (but didn't notice it
since the driver was locked): we didn't set nested job when
qemuDomainCheckEjectableMedia is called during migration. Thus the
original fix I made was wrong.
Since Xen 3.1 the clock=variable semantic is supported. In addition to
qemu/kvm Xen also knows about a variant where the offset is relative to
'localtime' instead of 'utc'.
Extends the libvirt structure with a flag 'basis' to specify, if the
offset is relative to 'localtime' or 'utc'.
Extends the libvirt structure with a flag 'reset' to force the reset
behaviour of 'localtime' and 'utc'; this is needed for backward
compatibility with previous versions of libvirt, since they report
incorrect XML.
Adapt the only user 'qemu' to the new name.
Extend the RelaxNG schema accordingly.
Document the new 'basis' attribute in the HTML documentation.
Adapt test for the new attribute.
Signed-off-by: Philipp Hahn <hahn@univention.de>
The code is splattered with a mix of
sizeof foo
sizeof (foo)
sizeof(foo)
Standardize on sizeof(foo) and add a syntax check rule to
enforce it
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When qemu cannot start, we may call qemuProcessStop() twice.
We have check whether the vm is running at the beginning of
qemuProcessStop() to avoid libvirt deadlock. We call
qemuProcessStop() with driver and vm locked. It seems that
we can avoid libvirt deadlock. But unfortunately we may
unlock driver and vm in the function qemuProcessKill() while
vm->def->id is not -1. So qemuProcessStop() will be run twice,
and monitor will be freed unexpectedly. So we should set
vm->def->id to -1 at the beginning of qemuProcessStop().
Found when attempting to build on Fedora 17 alpha with:
./autogen.sh --system --enable-compile-warnings=error
(this same build command works without problem on Fedora 16). Since
the consumer of the qemuProcessReconnectData doesn't assume that the
other fields of the struct are initialized (although it uses them
internally), the simpler solution is to just switch to C99-style
struct initialization (which doesn't require specification of all
fields).
Return statements with parameter enclosed in parentheses were modified
and parentheses were removed. The whole change was scripted, here is how:
List of files was obtained using this command:
git grep -l -e '\<return\s*([^()]*\(([^()]*)[^()]*\)*)\s*;' | \
grep -e '\.[ch]$' -e '\.py$'
Found files were modified with this command:
sed -i -e \
's_^\(.*\<return\)\s*(\(\([^()]*([^()]*)[^()]*\)*\))\s*\(;.*$\)_\1 \2\4_' \
-e 's_^\(.*\<return\)\s*(\([^()]*\))\s*\(;.*$\)_\1 \2\3_'
Then checked for nonsense.
The whole command looks like this:
git grep -l -e '\<return\s*([^()]*\(([^()]*)[^()]*\)*)\s*;' | \
grep -e '\.[ch]$' -e '\.py$' | xargs sed -i -e \
's_^\(.*\<return\)\s*(\(\([^()]*([^()]*)[^()]*\)*\))\s*\(;.*$\)_\1 \2\4_' \
-e 's_^\(.*\<return\)\s*(\([^()]*\))\s*\(;.*$\)_\1 \2\3_'
This introduces a new running reason VIR_DOMAIN_RUNNING_WAKEUP,
and new suspend event type VIR_DOMAIN_EVENT_STARTED_WAKEUP.
While a wakeup event is emitted, the domain which entered into
VIR_DOMAIN_PMSUSPENDED will be transferred to "running"
with reason VIR_DOMAIN_RUNNING_WAKEUP, and a new domain lifecycle
event emitted with type VIR_DOMAIN_EVENT_STARTED_WAKEUP.
This patch introduces a new event type for the QMP event
SUSPEND:
VIR_DOMAIN_EVENT_ID_PMSUSPEND
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventSuspendCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This patch introduces a new event type for the QMP event
WAKEUP:
VIR_DOMAIN_EVENT_ID_PMWAKEUP
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventWakeupCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This patch introduces a new event type for the QMP event
DEVICE_TRAY_MOVED, which occurs when the tray of a removable
disk is moved (i.e opened or closed):
VIR_DOMAIN_EVENT_ID_TRAY_CHANGE
The event's data includes the device alias and the reason
for tray status' changing, which indicates why the tray
status was changed. Thus the callback definition for the event
is:
enum {
VIR_DOMAIN_EVENT_TRAY_CHANGE_OPEN = 0,
VIR_DOMAIN_EVENT_TRAY_CHANGE_CLOSE,
\#ifdef VIR_ENUM_SENTINELS
VIR_DOMAIN_EVENT_TRAY_CHANGE_LAST
\#endif
} virDomainEventTrayChangeReason;
typedef void
(*virConnectDomainEventTrayChangeCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *devAlias,
int reason,
void *opaque);
numad is an user-level daemon that monitors NUMA topology and
processes resource consumption to facilitate good NUMA resource
alignment of applications/virtual machines to improve performance
and minimize cost of remote memory latencies. It provides a
pre-placement advisory interface, so significant processes can
be pre-bound to nodes with sufficient available resources.
More details: http://fedoraproject.org/wiki/Features/numad
"numad -w ncpus:memory_amount" is the advisory interface numad
provides currently.
This patch add the support by introducing a new XML attribute
for <vcpu>. e.g.
<vcpu placement="auto">4</vcpu>
<vcpu placement="static" cpuset="1-10^6">4</vcpu>
The returned advisory nodeset from numad will be printed
in domain's dumped XML. e.g.
<vcpu placement="auto" cpuset="1-10^6">4</vcpu>
If placement is "auto", the number of vcpus and the current
memory amount specified in domain XML will be used for numad
command line (numad uses MB for memory amount):
numad -w $num_of_vcpus:$current_memory_amount / 1024
The advisory nodeset returned from numad will be used to set
domain process CPU affinity then. (e.g. qemuProcessInitCpuAffinity).
If the user specifies both CPU affinity policy (e.g.
(<vcpu cpuset="1-10,^7,^8">4</vcpu>) and placement == "auto"
the specified CPU affinity will be overridden.
Only QEMU/KVM drivers support it now.
See docs update in patch for more details.
Even though we say in documentation setting (tls-)port to -1 is legacy
compat style for enabling autoport, we're roughly doing this for VNC.
However, in case of SPICE auto enable autoport iff both port & tlsPort
are equal -1 as documentation says autoport plays with both.
Currently, startupPolicy='requisite' was determining cold boot
by migrateFrom != NULL. That means, if domain was started up
with migrateFrom set we didn't require disk source path and allowed
it to be dropped. However, on snapshot-revert domain wasn't migrated
but according to documentation, requisite should drop disk source
as well.
Using 'unsigned long' for memory values is risky on 32-bit platforms,
as a PAE guest can have more than 4GiB memory. Our API is
(unfortunately) locked at 'unsigned long' and a scale of 1024, but
the rest of our system should consistently use 64-bit values,
especially since the previous patch centralized overflow checking.
* src/conf/domain_conf.h (_virDomainDef): Always use 64-bit values
for memory. Change hugepage_backed to a bool.
* src/conf/domain_conf.c (virDomainDefParseXML)
(virDomainDefCheckABIStability, virDomainDefFormatInternal): Fix
clients.
* src/vmx/vmx.c (virVMXFormatConfig): Likewise.
* src/xenxs/xen_sxpr.c (xenParseSxpr, xenFormatSxpr): Likewise.
* src/xenxs/xen_xm.c (xenXMConfigGetULongLong): New function.
(xenXMConfigGetULong, xenXMConfigSetInt): Avoid truncation.
(xenParseXM, xenFormatXM): Fix clients.
* src/phyp/phyp_driver.c (phypBuildLpar): Likewise.
* src/openvz/openvz_driver.c (openvzDomainSetMemoryInternal):
Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainDefineXML): Likewise.
* src/qemu/qemu_command.c (qemuBuildCommandLine): Likewise.
* src/qemu/qemu_process.c (qemuProcessStart): Likewise.
* src/qemu/qemu_monitor.h (qemuMonitorGetBalloonInfo): Likewise.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONGetBalloonInfo):
Likewise.
* src/qemu/qemu_driver.c (qemudDomainGetInfo)
(qemuDomainGetXMLDesc): Likewise.
* src/uml/uml_conf.c (umlBuildCommandLine): Likewise.
No thanks to 64-bit windows, with 64-bit pid_t, we have to avoid
constructs like 'int pid'. Our API in libvirt-qemu cannot be
changed without breaking ABI; but then again, libvirt-qemu can
only be used on systems that support UNIX sockets, which rules
out Windows (even if qemu could be compiled there) - so for all
points on the call chain that interact with this API decision,
we require a different variable name to make it clear that we
audited the use for safety.
Adding a syntax-check rule only solves half the battle; anywhere
that uses printf on a pid_t still needs to be converted, but that
will be a separate patch.
* cfg.mk (sc_correct_id_types): New syntax check.
* src/libvirt-qemu.c (virDomainQemuAttach): Document why we didn't
use pid_t for pid, and validate for overflow.
* include/libvirt/libvirt-qemu.h (virDomainQemuAttach): Tweak name
for syntax check.
* src/vmware/vmware_conf.c (vmwareExtractPid): Likewise.
* src/driver.h (virDrvDomainQemuAttach): Likewise.
* tools/virsh.c (cmdQemuAttach): Likewise.
* src/remote/qemu_protocol.x (qemu_domain_attach_args): Likewise.
* src/qemu_protocol-structs (qemu_domain_attach_args): Likewise.
* src/util/cgroup.c (virCgroupPidCode, virCgroupKillInternal):
Likewise.
* src/qemu/qemu_command.c(qemuParseProcFileStrings): Likewise.
(qemuParseCommandLinePid): Use pid_t for pid.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* src/conf/domain_conf.h (_virDomainObj): Likewise.
* src/probes.d (rpc_socket_new): Likewise.
* src/qemu/qemu_command.h (qemuParseCommandLinePid): Likewise.
* src/qemu/qemu_driver.c (qemudGetProcessInfo, qemuDomainAttach):
Likewise.
* src/qemu/qemu_process.c (qemuProcessAttach): Likewise.
* src/qemu/qemu_process.h (qemuProcessAttach): Likewise.
* src/uml/uml_driver.c (umlGetProcessInfo): Likewise.
* src/util/virnetdev.h (virNetDevSetNamespace): Likewise.
* src/util/virnetdev.c (virNetDevSetNamespace): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput): Likewise.
* src/conf/storage_conf.h (_virStoragePerms): Use mode_t, uid_t,
and gid_t rather than int.
* src/security/security_dac.c (virSecurityDACSetOwnership): Likewise.
* src/conf/storage_conf.c (virStorageDefParsePerms): Avoid
compiler warning.
* src/qemu/qemu_process.c (qemuFindAgentConfig): avoid crash libvirtd due to
deref a NULL pointer.
* How to reproduce?
1. virsh edit the following xml into guest configuration:
<channel type='pty'>
<target type='virtio'/>
</channel>
2. virsh start <domain>
or
% virt-install -n foo -r 1024 --disk path=/var/lib/libvirt/images/foo.img,size=1 \
--channel pty,target_type=virtio -l <installation tree>
Signed-off-by: Alex Jia <ajia@redhat.com>
This patch allows libvirt to add interfaces to already
existing Open vSwitch bridges. The following syntax in
domain XML file can be used:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'/>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
or if libvirt should auto-generate the interfaceid use
following syntax:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
It is also possible to pass an optional profileid. To do that
use following syntax:
<interface type='bridge'>
<source bridge='ovsbr'/>
<mac address='00:55:1a:65:a2:8d'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'
profileid='test-profile'/>
</virtualport>
</interface>
To create Open vSwitch bridge install Open vSwitch and
run the following command:
ovs-vsctl add-br ovsbr
The current default method of terminating the qemu process is to send
a SIGTERM, wait for up to 1.6 seconds for it to cleanly shutdown, then
send a SIGKILL and wait for up to 1.4 seconds more for the process to
terminate. This is problematic because occasionally 1.6 seconds is not
long enough for the qemu process to flush its disk buffers, so the
guest's disk ends up in an inconsistent state.
Since this only occasionally happens when the timeout prior to SIGKILL
is 1.6 seconds, this patch increases that timeout to 10 seconds. At
the very least, this should reduce the occurrence from "occasionally"
to "extremely rarely". (Once SIGKILL is sent, it waits another 5
seconds for the process to die before returning).
Note that in the cases where it takes less than this for qemu to
shutdown cleanly, libvirt will *not* wait for any longer than it would
without this patch - qemuProcessKill polls the process and returns as
soon as it is gone.
This patch is based on an earlier patch by Eric Blake which was never
committed:
https://www.redhat.com/archives/libvir-list/2011-November/msg00243.html
Aside from rebasing, this patch only drops the driver lock once (prior
to the first time the function sleeps), then leaves it dropped until
it returns (Eric's patch would drop and re-acquire the lock around
each call to sleep).
At the time Eric sent his patch, the response (from Dan Berrange) was
that, while it wasn't a good thing to be holding the driver lock while
sleeping, we really need to rethink locking wrt the driver object,
switching to a finer-grained approach that locks individual items
within the driver object separately to allow for greater concurrency.
This is a good plan, and at the time it made sense to not apply the
patch because there was no known bug related to the driver lock being
held in this function.
However, we now know that the length of the wait in qemuProcessKill is
sometimes too short to allow the qemu process to fully flush its disk
cache before SIGKILL is sent, so we need to lengthen the timeout (in
order to improve the situation with management applications until they
can be updated to use the new VIR_DOMAIN_DESTROY_GRACEFUL flag added
in commit 72f8a7f197). But, if we
lengthen the timeout, we also lengthen the amount of time that all
other threads in libvirtd are essentially blocked from doing anything
(since just about everything needs to acquire the driver lock, if only
for long enough to get a pointer to a domain).
The solution is to modify qemuProcessKill to drop the driver lock
while sleeping, as proposed in Eric's patch. Then we can increase the
timeout with a clear conscience, and thus at least lower the chances
that someone running with existing management software will suffer the
consequence's of qemu's disk cache not being flushed.
In the meantime, we still should work on Dan's proposal to make
locking within the driver object more fine grained.
(NB: although I couldn't find any instance where qemuProcessKill() was
called with no jobs active for the domain (or some other guarantee
that the current thread had at least one refcount on the domain
object), this patch still follows Eric's method of temporarily adding
a ref prior to unlocking the domain object, because I couldn't
convince myself 100% that this was the case.)
In the future (my next patch in fact) we may want to make
decisions depending on qemu having a monitor command or not.
Therefore, we want to set qemuCaps flag instead of querying
on the monitor each time we are about to make that decision.
When libvirt's virDomainDestroy API is shutting down the qemu process,
it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the
process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss
because the guest running in qemu doesn't have time to flush its disk
cache buffers before it's unceremoniously whacked.
This patch maintains that default behavior, but provides a new flag
VIR_DOMAIN_DESTROY_GRACEFUL to alter the behavior. If this flag is set
in the call to virDomainDestroyFlags, SIGKILL will never be sent to
the qemu process; instead, if the timeout is reached and the qemu
process still exists, virDomainDestroy will return an error.
Once this patch is in, the recommended method for applications to call
virDomainDestroyFlags will be with VIR_DOMAIN_DESTROY_GRACEFUL
included. If that fails, then the application can decide if and when
to call virDomainDestroyFlags again without
VIR_DOMAIN_DESTROY_GRACEFUL (to force the issue with SIGKILL).
(Note that this does not address the issue of existing applications
that have not yet been modified to use VIR_DOMAIN_DESTROY_GRACEFUL.
That is a separate patch.)
This patch revises qemuProcessStart() function for qemu
processes to retain CAP_SYS_RAWIO if needed.
And in case of that, add taint flag to domain.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp>
There is now a standard QEMU guest agent that can be installed
and given a virtio serial channel
<channel type='unix'>
<source mode='bind' path='/var/lib/libvirt/qemu/f16x86_64.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
The protocol that runs over the guest agent is JSON based and
very similar to the JSON monitor. We can't use exactly the same
code because there are some odd differences in the way messages
and errors are structured. The qemu_agent.c file is based on
a combination and simplification of qemu_monitor.c and
qemu_monitor_json.c
* src/qemu/qemu_agent.c, src/qemu/qemu_agent.h: Support for
talking to the agent for shutdown
* src/qemu/qemu_domain.c, src/qemu/qemu_domain.h: Add thread
helpers for talking to the agent
* src/qemu/qemu_process.c: Connect to agent whenever starting
a guest
* src/qemu/qemu_monitor_json.c: Make variable static