We still keep the policy as a collection of simple text files, located in the
/etc/qubes-rpc/policy/ directory in the AdminVM (dom0). This allows for
automating the packaging of policies into (trusted) RPM packages, as well as
policy customization from within our integrated Salt Stack via (trusted) Salt
state files.
Now, one of the coolest features we’ve introduced in Qubes 4.0 is the ability to
tag VMs and use these tags to make policy decisions.
Imagine we have several work-related domains. We can now tag all of them with
some tag of our choosing, say work:
[user@dom0 user ~]$ qvm-tags itl-email add work
[user@dom0 user ~]$ qvm-tags accounting add work
[user@dom0 user ~]$ qvm-tags project-liberation add work
Now we can easily construct a qrexec policy, e.g. to constrain the file (or
clipboard) copy operation, so that it’s allowed only between the VMs tagged as
work (while preventing file transfer to and from any VM not tagged with
work). All we need is to add the following 3 extra rules in the policy file
for qubes.FileCopy:
[user@dom0 user ~]$ cat /etc/qubes-rpc/policy/qubes.FileCopy
(...)
$tag:work $tag:work allow
$tag:work $anyvm deny
$anyvm $tag:work deny
$anyvm $anyvm ask
We can do the same for the clipboard: we just need to place the same 3 rules in
the qubes.ClipboardPaste policy file.
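The way these rules interact can be illustrated with a small Python model. This is only a sketch, not the actual policy checker: it assumes rules are evaluated top to bottom with the first match winning, and that a call matching no rule is refused. The Rule class, the evaluate helper, and the VM names personal and untrusted are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str   # e.g. "$tag:work", "$anyvm", or a literal VM name
    target: str
    action: str   # "allow", "deny" or "ask"


def matches(spec, vm, tags):
    """Return True if a policy token matches the given VM name."""
    if spec == "$anyvm":
        return True
    if spec.startswith("$tag:"):
        return spec[len("$tag:"):] in tags.get(vm, set())
    return spec == vm


def evaluate(rules, source, target, tags):
    """Return the action of the first rule matching (source, target)."""
    for rule in rules:
        if matches(rule.source, source, tags) and matches(rule.target, target, tags):
            return rule.action
    return "deny"  # assumption of this sketch: no matching rule means refusal


# The four qubes.FileCopy lines from the article, in the same order.
FILE_COPY_RULES = [
    Rule("$tag:work", "$tag:work", "allow"),
    Rule("$tag:work", "$anyvm", "deny"),
    Rule("$anyvm", "$tag:work", "deny"),
    Rule("$anyvm", "$anyvm", "ask"),
]

# Tags as assigned by the qvm-tags commands in the article.
TAGS = {
    "itl-email": {"work"},
    "accounting": {"work"},
    "project-liberation": {"work"},
}

if __name__ == "__main__":
    for src, dst in [("itl-email", "accounting"),
                     ("itl-email", "personal"),
                     ("personal", "accounting"),
                     ("personal", "untrusted")]:
        print(src, "->", dst, ":", evaluate(FILE_COPY_RULES, src, dst, TAGS))
```

Note that the order matters in this model: if the catch-all ask line came first, the three tag-based rules would never be reached.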
Also, as already discussed in the previous post on Admin API (https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/),
the Core Stack automatically tags VMs with a created-by-XYZ tag, where XYZ is
replaced by the name of the VM which invoked the admin.vm.Create* service.
This makes it possible to automatically constrain the power of a specific
management VM so that it can manage only its “own” VMs, and not others. Please
refer to the article linked above for examples.
Furthermore, Disposable VMs can also be referred to via tags in the policy, for
example:
# Allow personal VMs to start DispVMs created by the user via guidom:
$tag:created-by-guidom $dispvm:$tag:created-by-guidom allow
# Allow corpo-managed VMs to start DispVMs based on corpo-owned AppVMs:
$tag:created-by-mgmt-corpo $dispvm:$tag:created-by-mgmt-corpo allow
Of course, we can also explicitly specify Disposable VMs using the
$dispvm: prefix followed by the name of the AppVM used as their template, e.g.
to allow AppVMs tagged work to request a qrexec service from a Disposable VM
created off the work-printing AppVM (as noted earlier, work-printing would need
to have its template_for_dispvms property set for this to work):
$tag:work $dispvm:work-printing allow
In Qubes 4.0 we have also implemented stricter control over the destination
argument for qrexec calls. Until Qubes 4.0, the source VM (i.e., the VM that
calls a service) was responsible for providing a valid destination VM name to
which it wanted to direct the service call (e.g. qubes.FileCopy or
qubes.OpenInVM). Of course, the policy always had the last say in this
process, and if the policy had a deny rule for the specific case, the service
call was dropped.
What has changed in Qubes 4.0 is that whenever the policy says ask (in
contrast to allow or deny), then the VM-provided destination is essentially
ignored, and instead a trusted prompt is displayed in dom0 to ask the user to
select the destination (with a convenient drop down list).
(One could argue that the VM-provided destination argument in a qrexec call
is not entirely ignored, as it might be used to select a specific qrexec policy
line, in case several lines could match, such as:
$anyvm work allow
$anyvm work-web ask,default_target=work-web
$anyvm work-email ask
In that case, however, the VM-provided destination would still be overridden by
the user’s choice if rule #2 was selected.)
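The role of the VM-provided destination under an ask rule can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the real qrexec code: the resolve and parse_action helpers are hypothetical, the prompt_user callback stands in for the trusted dom0 prompt, and only literal names and $anyvm are handled. The requested destination selects the policy line, but under ask the final destination comes from the user.

```python
def parse_action(field):
    """Split 'ask,default_target=work-web' into ('ask', {'default_target': 'work-web'})."""
    action, *params = field.split(",")
    options = dict(p.split("=", 1) for p in params)
    return action, options


def resolve(rules, source, requested, prompt_user):
    """Return (decision, final_target) for a call from source to requested.

    prompt_user(default) models the trusted prompt: it receives the
    preselected default (or None) and returns the VM picked by the user,
    or None if the user cancels.
    """
    for src, dst, field in rules:
        if src in ("$anyvm", source) and dst in ("$anyvm", requested):
            action, options = parse_action(field)
            if action == "allow":
                return "allow", requested
            if action == "ask":
                chosen = prompt_user(options.get("default_target"))
                return ("allow", chosen) if chosen else ("deny", None)
            return "deny", None
    return "deny", None  # assumption of this sketch: no match means refusal


# The three example lines from the article.
RULES = [
    ("$anyvm", "work", "allow"),
    ("$anyvm", "work-web", "ask,default_target=work-web"),
    ("$anyvm", "work-email", "ask"),
]

if __name__ == "__main__":
    # The VM asks for work-web (selecting rule #2), but the user picks work-email.
    print(resolve(RULES, "untrusted", "work-web", lambda default: "work-email"))
```

In this model the source VM (here a hypothetical VM named untrusted) can influence which default is preselected, but never the destination that is finally used.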
A prime example of where this is used is the qubes.FileCopy service. However, we
should note that for most other services, in a well-configured system, there
should be very few ask rules. Instead, most policies should be either allow
or deny, thereby relieving the user from having to make a security decision with every
service invocation. Even the qubes.FileCopy service should be additionally guarded by
deny rules (e.g. forbidding any file transfers between personal and
work-related VMs), and we believe our integrated Salt Management Stack should be
helpful in creating such policies in larger deployments of Qubes within
corporations.
Another powerful feature of qrexec policy comes in handy here: the target=
specifier, which can be added after the action keyword. This forces the call to
be directed to the specific destination VM, no matter what the source VM
specified. A good place to make use of this is in policies for starting various
Disposable VMs. For example, we might have a special rule for Disposable VMs
which can be invoked only from VMs tagged with work (e.g. for the
qubes.OpenInVM service):
$tag:work $dispvm allow,target=$dispvm:work-printing
$anyvm $dispvm:work-printing deny
The first line means that every DispVM created by a VM tagged with work will
be based on (i.e., use as its template) the work-printing VM. (Recall that, in
order for this to succeed, the work-printing VM also has to have its
template_for_dispvms property set to true.) The second line means that any
other VM (i.e., any VM not tagged with work) will be denied from creating
DispVMs based on the work-printing VM.
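The effect of the target= specifier can be modelled with a short Python sketch. It is a simplification under stated assumptions rather than the real policy checker: destinations such as $dispvm are treated as plain strings, rules are matched top to bottom, a call matching no rule is refused, and the route helper and the VM name personal are hypothetical.

```python
def match(spec, name, tags):
    """Match a policy token against a VM name or a literal destination token."""
    if spec == "$anyvm":
        return True
    if spec.startswith("$tag:"):
        return spec[len("$tag:"):] in tags.get(name, set())
    return spec == name


def route(rules, source, requested, tags):
    """Return (decision, effective_target) for a call from source to requested."""
    for src, dst, field in rules:
        if match(src, source, tags) and match(dst, requested, tags):
            action, *params = field.split(",")
            options = dict(p.split("=", 1) for p in params)
            if action != "allow":
                return action, None
            # target= overrides whatever destination the source VM asked for.
            return "allow", options.get("target", requested)
    return "deny", None  # assumption of this sketch: no match means refusal


# The two example lines from the article.
RULES = [
    ("$tag:work", "$dispvm", "allow,target=$dispvm:work-printing"),
    ("$anyvm", "$dispvm:work-printing", "deny"),
]
TAGS = {"accounting": {"work"}}

if __name__ == "__main__":
    print(route(RULES, "accounting", "$dispvm", TAGS))
    print(route(RULES, "personal", "$dispvm:work-printing", TAGS))
```

The first call is redirected to a DispVM based on work-printing even though the source only asked for a generic $dispvm; the second is refused by the explicit deny line.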
qubes.xml, qubesd, and the Admin API
Since the beginning, Qubes Core Stack kept all the information about the Qubes
system’s (persistent) configuration within the /var/lib/qubes/qubes.xml file.
This has included information such as what VMs are defined, on which templates
they are based, how they are network-connected, etc. This file has never been
intended to be edited by the user by hand (except for rare system
troubleshooting). Instead, Qubes has provided lots of tools, such as
qvm-create, qvm-prefs and qubes-prefs, and many more, which operate on or
make use of the information from this file.
In Qubes 4.x we’ve introduced the qubesd daemon (service) which is now the
only entity which has direct access to the qubes.xml file, and which exposes a
well-defined API to other tools. This API is used by a few internal tools
running in dom0, such as some power management scripts, the qrexec policy
checker, and the qubesd-query(-fast) wrapper, which in turn is used to expose
most parts of this API to other VMs via the qrexec infrastructure. This is what
we call the Admin API, which we described in the previous post (https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/). While the
mapping between Admin API and the internal qubesd API is nearly “one-to-one”,
the primary difference is that Admin API is subject to policy mechanism via
qrexec, while the qubesd-exposed API is not policed, because it is only exposed
locally within the dom0. This architecture is depicted below.
As discussed in the post about Admin API (https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/), we have put lots of
thought into designing the API in such a way as to allow effective split between
the user and admin roles. Security-wise this means that admins should be able to
manage configurations and policies in the system, but not be able to access the
user data (i.e. AppVMs’ private images). Likewise it should be possible to
prevent users from changing the policies of the system, while allowing them to
use their data (but perhaps not export/leak them easily outside the system).
For completeness I’d like to mention that both qrexec and firewalling policies
are not included in the central qubes.xml file, but rather in separate
locations, i.e. in /etc/qubes-rpc/policy/ and
/var/lib/qubes//firewall.xml respectively. This allows for easy
updating of the policy files, e.g. from within trusted RPMs that are installed
in dom0 and which might be bringing new qrexec services, or from whatever tool
is used to create/manage firewalling policies.
Finally, a note that qubesd (as in “qubes-daemon”) should not be confused with
qubes-db (https://github.com/QubesOS/qubes-core-qubesdb) (as in
“qubes-database”). The latter is a Qubes-provided, security-optimized abstraction
for exposing static information from one VM to others (mostly from the AdminVM),
and is used e.g. by the agents in the VMs to learn the VM’s name,
type and other configuration options.
The new attack surface?
With all these changes to the Qubes Core Stack, an important question comes to
mind: how do all these changes affect the security of the system?
In an attempt to provide a somewhat meaningful answer to that question, we should
first observe that there exist a number of obvious configurations (including
the default one) of the system, in which there should be no security regression
compared to previous Qubes versions.
Indeed, by not allowing any other VM to access the Admin API (which is what the
default qrexec policy for Admin API does), we essentially reduce the attack
surface onto the Core Stack to what it has been in the previous versions (modulo
a potentially more complex policy parser, as discussed below).
Let us now imagine exposing some subset of the Admin API to select, trusted
management VMs, such as the upcoming GUI domain (in Qubes 4.1). As long as we
consider these select VMs as “trusted”, again the situation does not seem to be
any worse than what it was before (we can simply think of dom0 as having
comprised these additional VMs in previous versions of Qubes. Certainly there is
no security benefit here, but likewise there is no added risk).
Now let’s move a step further and relax our trustworthiness requirement for this,
say, GUI domain. We will now consider it only “somewhat trustworthy”. The whole
promise of the new Admin API is that, with a reasonably policed Admin API (see
also the previous post), even if this domain gets compromised, this will not
result in full system compromise, and ideally only in some kind of a DoS where
none of the user data will get compromised. Of course, in such a situation there
is additional attack surface that should be taken into account, such as the
qubesd-exposed interface. In case of a hypothetical bug in the implementation of
the qubesd-exposed interface (which is heavily sanitized and also written in
Python, but still) the attacker who compromised our “somewhat trustworthy” GUI
domain might compromise the whole system. But then again, let’s remember that
without the Admin API we would not have a “somewhat trustworthy” GUI domain in the
first place, and if we assume it was possible for the attacker to compromise
this VM, then she would also be able to compromise dom0 in earlier Qubes
versions. That would be fatal without any additional preconditions (e.g. for a
bug in qubesd).
Finally, we have the case of a “largely untrusted” management VM. The typical
scenario could be a management VM “owned” by an organization/corporation. As
explained in the previous post, the Admin API should allow us to grant such a VM
authority over only a subset of VMs, specifically only those which it created,
and not any other (through the convenient created-by-XYZ tags in the policy).
Now, if we consider this VM to become compromised, e.g. as a result of the
organization’s proprietary management agents getting compromised somehow, then
the question of how buggy the qubesd-exposed interface might be becomes a very
pressing one. Again, on most (all?) other client systems, such a situation
would be fatal immediately (i.e. no additional attacks would be required after
the attacker compromised the agent), while on Qubes this would only be the
prelude to attempting further attacks to get to dom0.
One other aspect which might not be immediately clear is the trade-off between a
more flexible architecture, which e.g. allows creating mutually untrusted
management VMs, on the one hand, and the increased complexity of e.g. the policy
checker, which is now required to also understand new keywords such as the
previously introduced $dispvm:xyz or $tag:xyz, on the other. In general we
believe that if we can introduce a significant new security improvement at the
architecture level, which allows us to further decompose the TCB of the system,
then it is worth it.
This is because architecture-level security should always go first, before
implementation-level security. Indeed, the latter can always be patched, and in
many cases won’t be critical (because e.g. smart architecture will keep it
outside of the TCB), while the architecture very often cannot be so easily
“fixed”. In fact this is the prime reason why we have Qubes OS, i.e. because
fixing of the monolithic architecture of the mainstream OSes has seemed hopeless
to us.
Summary
The new Qubes Core Stack provides a very flexible framework for managing a
compartmentalized desktop (client) oriented system. Compared to previous Qubes
Core Stacks, it offers much more flexibility, which translates to the ability to
further decompose the system into more (largely) mutually untrusting parts.
Some readers might wonder how the Qubes Core Stack actually compares to
various popular cloud/server/virtualization management APIs, such as
OpenStack/EC2 or even Docker.
While at first sight there might be quite a few similarities related to
management of VMs or containers, the primary differentiating factor is that
the Qubes Core Stack has been designed and optimized to bring the user a single desktop
system built on top of multiple isolated domains (currently implemented as Xen
Virtual Machines, but in the future maybe on top of something else), rather than
for management of service-oriented infrastructure, where the services are
largely independent from each other and where the prime consideration is
scalability.
The Qubes Core Stack is Xen- and virtualization-independent, and should be
easily portable to any compartmentalization technology.
In the upcoming article we will take a look at the updated device and new volume
management in Qubes 4.0.
RT @x0rz: You know the security model is broken when you can access BIOS through a #maldoc... go @QubesOS https://t.co/HZzK285s7i
Twitter
Requiem
Malicious update attack targeting BIOS from #maldoc and #powershell dropper 🤘 #VB2017
RT @ttaskett: Apple could learn something from @QubesOS about visual security context. #infosec https://t.co/z4mOHP06AB
Twitter
Felix Krause
📝 One of these is Apple asking you for your password and the other one is a phishing popup that steals your password https://t.co/PdOJcthqL7
Qubes Security Bulletin #34: a bunch of Xen bugs (believed to be DoS only) & a (rather minor) bug in GUI coloring:
https://t.co/aczRAsPxho
GitHub
QubesOS/qubes-secpack
qubes-secpack - Qubes Security Pack
QSB #34: GUI issue and Xen vulnerabilities (XSA-237 through XSA-244)
https://www.qubes-os.org/news/2017/10/12/qsb-34/
Dear Qubes Community,
We have just published Qubes Security Bulletin (QSB) #34:
GUI issue and Xen vulnerabilities (XSA-237 through XSA-244).
The text of this QSB is reproduced below. This QSB and its accompanying
signatures will always be available in the Qubes Security Pack (qubes-secpack).
View QSB #34 in the qubes-secpack:
https://github.com/QubesOS/qubes-secpack/blob/master/QSBs/qsb-034-2017.txt
Learn about the qubes-secpack, including how to obtain, verify, and read it:
https://www.qubes-os.org/security/pack/
View all past QSBs:
https://www.qubes-os.org/security/bulletins/
View the XSA Tracker:
https://www.qubes-os.org/security/xsa/
---===[ Qubes Security Bulletin #34 ]===---
October 12, 2017
GUI issue and Xen vulnerabilities (XSA-237 through XSA-244)
Summary
========
One of our developers, Simon Gaiser (aka HW42), while working on
improving support for device isolation in Qubes 4.0, discovered a
potential security problem with the way Xen handles MSI-capable devices.
The Xen Security Team has classified this problem as XSA-237 [01], which
was published today.
At the same time, the Xen Security Team released several other Xen
Security Advisories (XSA-238 through XSA-244). The impact of these
advisories ranges from system crashes to potential privilege
escalations. However, the latter seem to be mostly theoretical. See our
commentary below for details.
Finally, Eric Larsson discovered a situation in which Qubes GUI
virtualization could allow a VM to produce a window that has no colored
borders (which are used in Qubes as front-line indicators of trust).
A VM cannot use this vulnerability to draw different borders in place of
the correct one, however. We discuss this issue extensively below.
Technical details
==================
Xen issues
-----------
Xen Security Advisory 237 [01]:
| Multiple issues exist with the setup of PCI MSI interrupts:
| - unprivileged guests were permitted access to devices not owned by
| them, in particular allowing them to disable MSI or MSI-X on any
| device
| - HVM guests can trigger a codepath intended only for PV guests
| - some failure paths partially tear down previously configured
| interrupts, leaving inconsistent state
| - with XSM enabled, caller and callee of a hook disagreed about the
| data structure pointed to by a type-less argument
|
| A malicious or buggy guest may cause the hypervisor to crash, resulting
| in Denial of Service (DoS) affecting the entire host. Privilege
| escalation and information leaks cannot be excluded.
Xen Security Advisory 238 [02]:
| DMOPs (which were a subgroup of HVMOPs in older releases) allow guests
| to control and drive other guests. The I/O request server page mapping
| interface uses range sets to represent I/O resources the emulation of
| which is provided by a given I/O request server. The internals of the
| range set implementation require that ranges have a starting value no
| lower than the ending one. Checks for this fact were missing.
|
| Malicious or buggy stub domain kernels or tool stacks otherwise living
| outside of Domain0 can mount a denial of service attack which, if
| successful, can affect the whole system.
|
| Only domains controlling HVM guests can exploit this vulnerability.
| (This includes domains providing hardware emulation services to HVM
| guests.)
Xen Security Advisory 239 [03]:
| Intercepted I/O operations may deal with less than a full machine
| word's worth of data. While read paths had been the subject of earlier
| XSAs (and hence have been fixed), at least one write path was found
| where the data stored into an internal structure could contain bits
| from an uninitialized hypervisor stack slot. A subsequent emulated
| read would then be able to retrieve these bits.
|
| interrupts, leaving inconsistent state
| - with XSM enabled, caller and callee of a hook disagreed about the
| data structure pointed to by a type-less argument
|
| A malicious or buggy guest may cause the hypervisor to crash, resulting
| in Denial of Service (DoS) affecting the entire host. Privilege
| escalation and information leaks cannot be excluded.
Xen Security Advisory 238 [02]:
| DMOPs (which were a subgroup of HVMOPs in older releases) allow guests
| to control and drive other guests. The I/O request server page mapping
| interface uses range sets to represent I/O resources the emulation of
| which is provided by a given I/O request server. The internals of the
| range set implementation require that ranges have a starting value no
| lower than the ending one. Checks for this fact were missing.
|
| Malicious or buggy stub domain kernels or tool stacks otherwise living
| outside of Domain0 can mount a denial of service attack which, if
| successful, can affect the whole system.
|
| Only domains controlling HVM guests can exploit this vulnerability.
| (This includes domains providing hardware emulation services to HVM
| guests.)
Xen Security Advisory 239 [03]:
| Intercepted I/O operations may deal with less than a full machine
| word's worth of data. While read paths had been the subject of earlier
| XSAs (and hence have been fixed), at least one write path was found
| where the data stored into an internal structure could contain bits
| from an uninitialized hypervisor stack slot. A subsequent emulated
| read would then be able to retrieve these bits.
|
| A malicious unprivileged x86 HVM guest may be able to obtain sensitive
| information from the host or other guests.
Xen Security Advisory 240 [04]:
| x86 PV guests are permitted to set up certain forms of what is often
| called "linear page tables", where pagetables contain references to
| other pagetables at the same level or higher. Certain restrictions
| apply in order to fit into Xen's page type handling system. An
| important restriction was missed, however: Stacking multiple layers
| of page tables of the same level on top of one another is not very
| useful, and the tearing down of such an arrangement involves
| recursion. With sufficiently many layers such recursion will result
| in a stack overflow, commonly resulting in Xen to crash.
|
| A malicious or buggy PV guest may cause the hypervisor to crash,
| resulting in Denial of Service (DoS) affecting the entire host.
| Privilege escalation and information leaks cannot be excluded.
Xen Security Advisory 241 [05]:
| x86 PV guests effect TLB flushes by way of a hypercall. Xen tries to
| reduce the number of TLB flushes by delaying them as much as possible.
| When the last type reference of a page is dropped, the need for a TLB
| flush (before the page is re-used) is recorded. If a guest TLB flush
| request involves an Inter Processor Interrupt (IPI) to a CPU in which
| is the process of dropping the last type reference of some page, and
| if that IPI arrives at exactly the right instruction boundary, a stale
| time stamp may be recorded, possibly resulting in the later omission
| of the necessary TLB flush for that page.
|
| A malicious x86 PV guest may be able to access all of system memory,
| allowing for all of privilege escalation, host crashes, and
| information leaks.
Xen Security Advisory 242 [06]:
| The page type system of Xen requires cleanup when the last reference
| for a given page is being dropped. In order to exclude simultaneous
| updates to a given page by multiple parties, pages which are updated
| are locked beforehand. This locking includes temporarily increasing
| the type reference count by one. When the page is later unlocked, the
| context precludes cleanup, so the reference that is then dropped must
| not be the last one. This was not properly enforced.
|
| A malicious or buggy PV guest may cause a memory leak upon shutdown
| of the guest, ultimately perhaps resulting in Denial of Service (DoS)
| affecting the entire host.
Xen Security Advisory 243 [07]:
| The shadow pagetable code uses linear mappings to inspect and modify the
| shadow pagetables. A linear mapping which points back to itself is known as
| self-linear. For translated guests, the shadow linear mappings (being in a
| separate address space) are not intended to be self-linear. For
| non-translated guests, the shadow linear mappings (being the same
| address space) are intended to be self-linear.
|
| When constructing a monitor pagetable for Xen to run on a vcpu with, the shadow
| linear slot is filled with a self-linear mapping, and for translated guests,
| shortly thereafter replaced with a non-self-linear mapping, when the guest's
| %cr3 is shadowed.
|
| However when writeable heuristics are used, the shadow mappings are used as
| part of shadowing %cr3, causing the heuristics to be applied to Xen's
| pagetables, not the guest shadow pagetables.
|
| While investigating, it was also identified that PV auto-translate mode was
| insecure. This mode was removed in Xen 4.7 due to being unused, unmaintained
| and presumed broken. We are not aware of any guest implementation of PV
| auto-translate mode.
|
| A malicious or buggy HVM guest may cause a hypervisor crash, resulting in a
| Denial of Service (DoS) affecting the entire host, or cause hypervisor memory
| corruption. We cannot rule out a guest being able to escalate its privilege.
Xen Security Advisory 244 [08]:
| The x86-64 architecture allows interrupts to be run on distinct stacks.
| The choice of stack is encoded in a field of the corresponding
| interrupt descriptor in the Interrupt Descriptor Table (IDT). That
| field selects an entry from the active Task State Segment (TSS).
|
| Since, on AMD hardware, Xen switches to an HVM guest's TSS before
| actually entering the guest, with the Global Interrupt Flag still set,
| the selectors in the IDT entry are switched when guest context is
| loaded/unloaded.
|
| When a new CPU is brought online, its IDT is copied from CPU0's IDT,
| including those selector fields. If CPU0 happens at that moment to be
| in HVM context, wrong values for those IDT fields would be installed
| for the new CPU. If the first guest vCPU to be run on that CPU
| belongs to a PV guest, it will then have the ability to escalate its
| privilege or crash the hypervisor.
|
| A malicious or buggy x86 PV guest could escalate its privileges or
| crash the hypervisor.
|
| Avoiding to online CPUs at runtime will avoid this vulnerability.
GUI daemon issue
-----------------
Qubes OS's GUI virtualization enforces colored borders around all VM
windows. There are two types of windows. The first type are normal
windows (with borders, titlebars, etc.). In this case, we modify the
window manager to take care of coloring the borders. The second type are
borderless windows (with the override_redirect property set to True in
X11 terminology). Here, the window manager is not involved at all, and
our GUI daemon needs to draw a border itself. This is done by drawing a
2px border whenever window content is changed beneath that area. The bug
was that if the VM application had never sent any updates for (any part
of) the border area, the frame was never drawn. The relevant code is in
the gui-daemon component [09], specifically in gui-daemon/xside.c [10]:
/* update given fragment of window image
 * can be requested by VM (MSG_SHMIMAGE) and Xserver (XExposeEvent)
 * parameters are not sanitized earlier - we must check it carefully
 * also do not let to cover forced colorful frame (for undecoraded windows)
 */
static void do_shm_update(Ghandles * g, struct windowdata *vm_window,
        int untrusted_x, int untrusted_y, int untrusted_w,
        int untrusted_h)
{
    /* ... */
    if (!vm_window->image && !(g->screen_window && g->screen_window->image))
        return;
    /* force frame to be visible: */
    /* * left */
    delta = border_width - x;
    if (delta > 0) {
        w -= delta;
        x = border_width;
        do_border = 1;
    }
    /* * right */
    delta = x + w - (vm_window->width - border_width);
    if (delta > 0) {
        w -= delta;
        do_border = 1;
    }
    /* * top */
    delta = border_width - y;
    if (delta > 0) {
        h -= delta;
        y = border_width;
        do_border = 1;
    }
    /* * bottom */
    delta = y + h - (vm_window->height - border_width);
    if (delta > 0) {
        h -= delta;
        do_border = 1;
    }
    /* ... */
}
The above code is responsible for deciding whether the colored border
needs to be updated. It is updated only if both of the following hold:
a) there is any window image (vm_window->image)
b) the updated area includes a border anywhere
If either of these conditions is not met, no border is drawn. Note that if
the VM tries to draw anything there (for example, a fake border in a
different color), whatever is drawn will be overridden with the correct
borders, which will stay there until the window is destroyed.
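The clipping step can be tried in isolation. The following is a minimal,
self-contained sketch of that logic only; struct update_rect and
clip_to_border() are hypothetical names introduced for illustration and
are not part of qubes-gui-daemon. It assumes the same arithmetic as the
excerpt above: the update rectangle is shrunk so that it never covers the
border strip, and a flag reports whether it overlapped the border at all.

```c
#include <assert.h>

/* Hypothetical stand-in for the (already sanitized) update rectangle. */
struct update_rect {
    int x, y, w, h;
};

/* Shrink *r so it no longer overlaps the border strip of a
 * win_w x win_h window. Returns 1 if the original rectangle overlapped
 * the border anywhere (the frame must then be redrawn), 0 otherwise. */
static int clip_to_border(struct update_rect *r, int win_w, int win_h,
                          int border_width)
{
    int do_border = 0;
    int delta;

    assert(border_width >= 0);

    /* left */
    delta = border_width - r->x;
    if (delta > 0) {
        r->w -= delta;
        r->x = border_width;
        do_border = 1;
    }
    /* right */
    delta = r->x + r->w - (win_w - border_width);
    if (delta > 0) {
        r->w -= delta;
        do_border = 1;
    }
    /* top */
    delta = border_width - r->y;
    if (delta > 0) {
        r->h -= delta;
        r->y = border_width;
        do_border = 1;
    }
    /* bottom */
    delta = r->y + r->h - (win_h - border_width);
    if (delta > 0) {
        r->h -= delta;
        do_border = 1;
    }
    return do_border;
}
```

For a 200x100 window with a 2px border, an update at (10, 10) of size
50x50 is left unchanged and the function returns 0, so no frame would be
drawn. An update covering the whole window is clipped to x=2, y=2, w=196,
h=96 and the function returns 1.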
Eric Larsson discovered that this situation (not updating the border
area) is reachable -- and even happens with some real world applications
-- when the VM shows a splash screen with a custom shape. While custom
window shapes are not supported in Qubes OS, VMs do not know this. The
VM still thinks the custom-shaped window is there, so it does not send
updates of content outside of that custom shape.
We fixed the issue by forcing an update of the whole window before
making it visible:
static void handle_map(Ghandles * g, struct windowdata *vm_window)
{
    /* ... */
    /* added code */
    if (vm_window->override_redirect) {
        /* force window update to draw colorful frame, even when VM have not
         * sent any content yet */
        do_shm_update(g, vm_window, 0, 0, vm_window->width, vm_window->height);
    }
    (void) XMapWindow(g->display, vm_window->local_winid);
}
This needs some auxiliary changes in the do_shm_update function, to draw
the frame also in cases when there is no window content yet
(vm_window->image is NULL).
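The effect of the fix can be illustrated with a toy model. This is only a
sketch under simplifying assumptions: struct toy_window, toy_update() and
toy_map() are hypothetical names, the 2px width is taken from the
description above, and no X11 or shared-memory handling is modeled. It
shows why a forced full-window update at map time always reaches the
frame, even for a window that has no image and whose VM never touched the
border area.

```c
#include <assert.h>
#include <stdbool.h>

#define BORDER_WIDTH 2

/* Hypothetical, heavily simplified window state. */
struct toy_window {
    int width, height;
    bool override_redirect; /* borderless window, frame drawn by the daemon */
    bool has_image;         /* VM has sent window content */
    bool frame_drawn;       /* colored frame is on screen */
    bool mapped;            /* window is visible */
};

/* True if the rectangle overlaps the border strip on any side. */
static bool touches_border(const struct toy_window *win,
                           int x, int y, int w, int h)
{
    return x < BORDER_WIDTH || y < BORDER_WIDTH ||
           x + w > win->width - BORDER_WIDTH ||
           y + h > win->height - BORDER_WIDTH;
}

/* Models the patched update path: the frame is drawn whenever the
 * update overlaps the border, whether or not has_image is set. */
static void toy_update(struct toy_window *win, int x, int y, int w, int h)
{
    assert(w >= 0 && h >= 0);
    if (touches_border(win, x, y, w, h))
        win->frame_drawn = true;
}

/* Models the patched map path: force a full-window update first. */
static void toy_map(struct toy_window *win)
{
    if (win->override_redirect)
        toy_update(win, 0, 0, win->width, win->height);
    win->mapped = true;
}
```

In this model, a 300x200 splash window that only ever receives an update
at (50, 50) of size 100x100 has no frame before mapping. Calling
toy_map() on it draws the frame before the window becomes visible.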
Commentary from the Qubes Security Team
========================================
For the most part, this batch of Xen Security Advisories affects Qubes
OS 3.2 only theoretically. In the case of Qubes OS 4.0, half of them do
not apply at all. We'll comment briefly on each one:
XSA-237 - The impact is believed to be denial of service only. In addition,
we believe proper use of Interrupt Remapping should offer a generic
solution to similar problems, to reduce them to denial of
service at worst.
XSA-238 - The stated impact is denial of service only.
XSA-239 - The attacking domain has no control over what information
is leaked.
XSA-240 - The practical impact is believed to be denial of service (and does not
affect HVMs).
XSA-241 - The issue applies only to PV domains, so the attack vector
is largely limited in Qubes OS 4.0, which uses HVM domains
by default. In addition, the Xen Security Team considers this
bug to be hard to exploit in practice (see advisory).
XSA-242 - The stated impact is denial of service only. In addition, the
issue applies only to PV domains.
XSA-243 - The practical impact is believed to be denial of service. In addition,
the vulnerable code (shadow page tables) is build-time disabled
in Qubes OS 4.0.
XSA-244 - The vulnerable code path (runtime CPU hotplug) is not used
in Qubes OS.
These results reassure us that switching to HVM domains in Qubes OS 4.0
was a good decision.
Compromise Recovery
====================
Starting with Qubes 3.2, we offer Paranoid Backup Restore Mode, which
was designed specifically to aid in the recovery of a (potentially)
compromised Qubes OS system. Thus, if you believe your system might have
been compromised (perhaps because of the bugs discussed in this
bulletin), then you should read and follow the procedure described here:
https://www.qubes-os.org/news/2017/04/26/qubes-compromise-recovery/
Patching
=========
The specific packages that resolve the problems discussed in this
bulletin are as follows:
For Qubes 3.2:
- Xen packages, version 4.6.6-32
- qubes-gui-dom0, version 3.2.12
For Qubes 4.0:
- Xen packages, version 4.8.2-6
- qubes-gui-dom0, version 4.0.5
The packages are to be installed in dom0 via the Qubes VM Manager or via
the qubes-dom0-update command as follows:
For updates from the stable repository (not immediately available):
$ sudo qubes-dom0-update
For updates from the security-testing repository:
$ sudo qubes-dom0-update --enablerepo=qubes-dom0-security-testing
A system restart will be required afterwards.
These packages will migrate from the security-testing repository to the
current (stable) repository over the next two weeks after being tested
by the community.
If you use Anti Evil Maid, you will need to reseal your secret
passphrase to new PCR values, as PCR18+19 will change due to the new
Xen binaries.
Credits
========
The GUI daemon issue was discovered by Eric Larsson.
The PCI MSI issues were discovered by Simon Gaiser (aka HW42).
For other issues, see the original Xen Security Advisories.
References
===========
[01] https://xenbits.xen.org/xsa/advisory-237.html
[02] https://xenbits.xen.org/xsa/advisory-238.html
[03] https://xenbits.xen.org/xsa/advisory-239.html
[04] https://xenbits.xen.org/xsa/advisory-240.html
[05] https://xenbits.xen.org/xsa/advisory-241.html
[06] https://xenbits.xen.org/xsa/advisory-242.html
[07] https://xenbits.xen.org/xsa/advisory-243.html
[08] https://xenbits.xen.org/xsa/advisory-244.html
[09] https://github.com/QubesOS/qubes-gui-daemon/
[10] https://github.com/QubesOS/qubes-gui-daemon/blob/master/gui-daemon/xside.c#L1317-L1447
--
The Qubes Security Team
https://www.qubes-os.org/security/
RT @kylerankin: There's a reason @QubesOS marks the network VM as untrusted. Safer to treat your network that way #KRACK or not.
A Brief Introduction to the Xen Project and Virtualization from Mohsen Mostafa Jokar
https://blog.xenproject.org/2017/10/17/a-brief-introduction-to-the-xen-project-and-virtualization-from-mohsen-mostafa-jokar/
Mohsen Mostafa Jokar is a Linux administrator who works at the newspaper Hamshahri as a network and virtualization administrator. His interest in virtualization goes back to when he was at school and saw a Microsoft Virtual PC for the first time. He installed it on a PC with 256 MB of RAM and used it […]
MSI support for PCI device pass-through with stub domains
https://www.qubes-os.org/news/2017/10/18/msi-support/
Introduction
In this post, we will describe how we fixed MSI support for VMs running in HVM mode in Qubes 4.0.
First, allow us to provide some background about the MSI feature and why we need it in the first place.
In Qubes 4.0, we switched from paravirtualized (PV) virtual machines to hardware virtual machines (HVMs, also known as “fully virtualized” or “hardware-assisted” VMs) for improved security (see the 4.0-rc1 announcement (https://www.qubes-os.org/news/2017/07/31/qubes-40-rc1/#fully-virtualized-vms-for-better-isolation) for details).
For VMs running as HVMs, Xen relies on QEMU, software that emulates hardware (such as network cards), to provide device emulation.
By default, Xen runs QEMU in the most trusted domain, dom0, and QEMU has quite a large attack surface.
Running QEMU in dom0 would jeopardize the security of Qubes, so it is necessary to run QEMU outside of dom0.
We do this by using a Xen feature that allows us to run QEMU inside a second “helper” VM called a “stub domain”.*
This way, an attacker who exploits a bug in QEMU will be confined to the stub domain rather than getting full access to dom0.
Admittedly, stub domains run in PV mode, which means that an attacker who were to successfully exploit QEMU would gain the ability to exploit potential Xen bugs in paravirtualization.
Nonetheless, we believe using HVMs to host PCI devices is still a considerable improvement.
Of course, in the long term, we would like to switch to using PVH VMs, but at the moment this is not feasible.
In our testing, we found that pass-through PCI devices did not work in HVMs on some machines.
On the affected machines, networking devices and USB devices, for example, were not usable as they are in Qubes 3.2.
(The kernel driver failed to initialize the device.)
This was a major problem that would have blocked us from moving entirely from PV to HVM in Qubes 4.0.
For this reason, the Qubes 4.0-rc1 installer configures all VMs that have attached PCI devices to use PV mode so that those PCI devices will function correctly.
Problems
After much further testing, we discovered that the affected PCI devices don’t work without MSI support.
(MSI is a method to trigger an interrupt from a PCI device.)
The devices we observed to be problematic were all newer Intel devices (integrated USB controllers and a Wi-Fi card).
While the PCIe standard allows for devices that don’t support legacy interrupts, all the affected devices advertised support for legacy interrupts.
But no interrupts were ever delivered after the driver configured the device.
This made the bug tricky to track down, since we were looking for an error on the software side.
To get those devices working, we needed MSI support.
When running QEMU in dom0, MSI support (and therefore the problematic devices) worked, but with stub domains, it was broken.
This is why, until now, we’ve had patches in place to hide MSI capability from the guest so that the driver doesn’t try to use it (one patch for the Mini-OS-based stub domain (https://github.com/QubesOS/qubes-vmm-xen/blob/ff5eaaa777e9d6ba42242479d1cabacfbdc728ca/patches.misc/hvmpt02-disable-msi-caps.patch) and another for the new Linux-based stub domain (https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/71a01b41a9cf69d580c652a7147c0a8eb33ced97/qemu/patches/disable-msi-caps.patch)).
We found two issues that were preventing MSI support from working with stub domains.
First, the stub domain did not have the required permission on the IRQ, which is reserved for the MSI in the map_pirq hypercall QEMU makes.
(The IRQ is basically a number to distinguish between interrupts from different devices.)
https://www.qubes-os.org/news/2017/10/18/msi-support/
Introduction
In this post, we will describe how we fixed MSI support for VMs running in HVM mode in Qubes 4.0.
First, allow us to provide some background about the MSI feature and why we need it in the first place.
In Qubes 4.0, we switched from paravirtualized (PV) virtual machines to hardware virtual machines (HVMs, also known as “fully virtualized” or “hardware-assisted” VMs) for improved security (see the 4.0-rc1 announcement (https://www.qubes-os.org/news/2017/07/31/qubes-40-rc1/#fully-virtualized-vms-for-better-isolation) for details).
For VMs running as HVMs, Xen relies on QEMU, software that emulates hardware (such as network cards), to provide device emulation.
By default, Xen runs QEMU in the most trusted domain, dom0, and QEMU has quite a large attack surface.
Running QEMU in dom0 would jeopardize the security of Qubes, so it is necessary to run QEMU outside of dom0.
We do this by using a Xen feature that allows us to run QEMU inside a second “helper” VM called a “stub domain”.*
This way, an attacker who exploits a bug in QEMU will be confined to the stub domain rather than getting full access to dom0.
Admittedly, stub domains run in PV mode, which means that an attacker who successfully exploited QEMU would gain the ability to exploit potential Xen bugs in paravirtualization.
Nonetheless, we believe using HVMs to host PCI devices is still a considerable improvement.
Of course, in the long term, we would like to switch to using PVH VMs, but at the moment this is not feasible.
In our testing, we found that pass-through PCI devices did not work in HVMs on some machines.
On the affected machines, networking devices and USB devices, for example, were not usable as they are in Qubes 3.2.
(The kernel driver failed to initialize the device.)
This was a major problem that would have blocked us from moving entirely from PV to HVM in Qubes 4.0.
For this reason, the Qubes 4.0-rc1 installer configures all VMs that have attached PCI devices to use PV mode so that those PCI devices will function correctly.
Problems
After much further testing, we discovered that the affected PCI devices don’t work without MSI support.
(MSI is a method to trigger an interrupt from a PCI device.)
The devices we observed to be problematic were all newer Intel devices (integrated USB controllers and a Wi-Fi card).
While the PCIe standard allows for devices that don’t support legacy interrupts, all the affected devices advertised support for legacy interrupts.
But no interrupts were ever delivered after the driver configured the device.
This made the bug tricky to track down, since we were looking for an error on the software side.
To get those devices working, we needed MSI support.
When running QEMU in dom0, MSI support (and therefore the problematic devices) worked, but with stub domains, it was broken.
This is why, until now, we’ve had patches in place to hide MSI capability from the guest so that the driver doesn’t try to use it (one patch for the Mini-OS-based stub domain (https://github.com/QubesOS/qubes-vmm-xen/blob/ff5eaaa777e9d6ba42242479d1cabacfbdc728ca/patches.misc/hvmpt02-disable-msi-caps.patch) and another for the new Linux-based stub domain (https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/71a01b41a9cf69d580c652a7147c0a8eb33ced97/qemu/patches/disable-msi-caps.patch)).
We found two issues that were preventing MSI support from working with stub domains.
First, the stub domain did not have the required permission on the IRQ, which is reserved for the MSI in the map_pirq hypercall QEMU makes.
(The IRQ is basically a number to distinguish between interrupts from different devices.)
Fortunately, this problem had already been tracked down by OpenXT (http://openxt.org/), and they made a patch for it (the original (https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch) and our pull request (https://github.com/QubesOS/qubes-vmm-xen/pull/15/commits/2a5229f24296347a40ba3250465a61ca425a6146) based on their patch).
The second problem was that, after setting MSI up, it needed to be enabled.
This is done by setting the MSI enable flag in the PCI config space, which is a special memory-mapped region used to configure a PCI device.
However, this write did not reach the device, and therefore no interrupts were delivered.
When running in dom0, the config space write from QEMU goes directly to the real PCI device.
By contrast, inside a stub domain, the write goes to the pcifront driver inside the stub domain and is then blocked by the pciback running in dom0.
With a test patch, we verified that the only problem was with the enable flag write.
No other config space writes were problematic.
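To make the "enable flag" concrete, here is a minimal sketch of where that bit lives in a device's config space. It works on a fake, in-memory config space rather than real hardware, and the helper names (find_capability, msi_enabled, set_msi_enable, make_fake_config_space) are ours for illustration only. The register offsets and the MSI capability ID are the standard PCI values.

```python
import struct

# Standard PCI config space constants.
PCI_STATUS = 0x06            # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10   # status bit 4: a capability list is present
PCI_CAPABILITY_LIST = 0x34   # 8-bit pointer to the first capability
PCI_CAP_ID_MSI = 0x05        # capability ID of the MSI capability
MSI_FLAGS = 2                # Message Control register, relative to the capability
MSI_FLAGS_ENABLE = 0x0001    # Message Control bit 0: MSI enable


def find_capability(cfg, cap_id):
    """Walk the capability list; return the offset of cap_id or None."""
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()                      # guards against a looping list
    while pos and pos not in seen:
        seen.add(pos)
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC     # byte 1 of a capability is the next pointer
    return None


def msi_enabled(cfg):
    """Return True if the MSI enable flag is set."""
    pos = find_capability(cfg, PCI_CAP_ID_MSI)
    if pos is None:
        return False
    (flags,) = struct.unpack_from("<H", cfg, pos + MSI_FLAGS)
    return bool(flags & MSI_FLAGS_ENABLE)


def set_msi_enable(cfg, enable):
    """Set or clear only the MSI enable flag, leaving other bits untouched."""
    pos = find_capability(cfg, PCI_CAP_ID_MSI)
    if pos is None:
        raise LookupError("device has no MSI capability")
    (flags,) = struct.unpack_from("<H", cfg, pos + MSI_FLAGS)
    if enable:
        flags |= MSI_FLAGS_ENABLE
    else:
        flags &= ~MSI_FLAGS_ENABLE
    struct.pack_into("<H", cfg, pos + MSI_FLAGS, flags & 0xFFFF)


def make_fake_config_space():
    """Hypothetical 256-byte config space with a PM and an MSI capability."""
    cfg = bytearray(256)
    struct.pack_into("<H", cfg, PCI_STATUS, PCI_STATUS_CAP_LIST)
    cfg[PCI_CAPABILITY_LIST] = 0x40
    cfg[0x40] = 0x01            # power management capability
    cfg[0x41] = 0x50            # next capability at 0x50
    cfg[0x50] = PCI_CAP_ID_MSI  # MSI capability
    cfg[0x51] = 0x00            # end of list
    return cfg


if __name__ == "__main__":
    cfg = make_fake_config_space()
    set_msi_enable(cfg, True)
    print("MSI enabled:", msi_enabled(cfg))
```

In the stub domain case described above, it is the equivalent of the final write in set_msi_enable that never reached the device.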
Solutions
It appeared that we had to choose from the following options:
1. Enable permissive mode. In permissive mode, pciback allows (almost) all writes to the PCI config space. This seems to be the solution OpenXT chose (see this issue (https://openxt.atlassian.net/browse/OXT-894) and this commit (https://github.com/OpenXT/manager/pull/52/commits/5950ebe73f2411f3af37f5dd56c5c70619e5d99f)).
2. Allow just the write to this specific config space location (i.e. the enable flag). Unlike option (1), this option entails allowing only the required write and no others.
3. Since pcifront has the ability to issue a specific command to pciback to enable MSI, maybe QEMU should send this command instead of the write to the config space.
4. Something else. For example, maybe the MSI config should not be handled by QEMU at all in the case of stub domains but should instead be handled directly by Xen.
Option (3) didn’t appear promising.
The enable command that pcifront sends is intended for the normal PV use case where the device is passed to the VM itself (via pcifront) rather than to the stub domain target.
While the command is called enable_msi, pciback does much more than simply setting the enable flag.
It also configures IRQ handling in the dom0 kernel, adapts the MSI masking, and more.
This makes sense in the PV case, but in the HVM case, the MSI configuration is done by QEMU, so this most likely won’t work correctly.
Option (1) would have been the easiest solution.
We would just need to set the option in the domain config.
After discussing it, however, we weren’t convinced that this option is safe (but we also don’t claim it isn’t).
See the paper discussed below and this thread (https://lists.xenproject.org/archives/html/xen-devel/2010-07/msg00257.html) for some potential problems.
So, what about option (2)?
We had to think about whether this might enable a new attack.
If we were to implement option (2), the security scenario would be different from the scenario in which QEMU runs in dom0.
When QEMU runs in dom0, it ensures that MSI is configured in a certain way before enabling MSIs (details (https://git.qemu.org/?p=qemu.git;a=blob;f=hw/xen/xen_pt_config_init.c;h=6f18366f6768ee3d7b72f588dc990a6329124a04;hb=359c41abe32638adad503e386969fa428cecff52#l1114)).
However, we’ve put QEMU in a stub domain so that we don’t have to trust it.
This means that we can no longer trust it to ensure that MSI is configured safely.
What would happen if, for example, a malicious stub domain were to set the enable flag of a PCI device without first configuring it?
As it turns out, ITL has published research relevant to assessing this risk.
In “Following the White Rabbit: Software attacks against Intel(R) VT-d technology” (https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf), Rafał Wojtczuk and Joanna Rutkowska describe an attack against VT-d on machines without interrupt remapping support.
For our purposes, the result they describe on page 8 is very important:
Even without access to the PCI config space, a malicious guest is, in many cases, able to generate arbitrary MSIs.
So long as writing to the MSI enable flag does not have any unrelated side effects, there’s no obvious way in which allowing it can worsen security, since an attacker who can set it can already generate arbitrary MSIs anyway.
Meanwhile, we reap the benefits of using HVMs to better isolate VMs with attached PCI devices.
So, we decided to implement option (2).
Based on the analysis above, one could argue that we might as well allow writes to the enable flags for all VMs with attached PCI devices, since doing so shouldn’t decrease security.
To be extra cautious, however, we only allow writes to the enable flags for stub domains.
In other cases, it’s not necessary.
(Here are our patches for pciback (https://github.com/QubesOS/qubes-linux-kernel/pull/12/commits/96b956b38cb24230848a563d3e1ce359c8d8db66) and libxl (https://github.com/QubesOS/qubes-vmm-xen/pull/15/commits/55ef595451d9e2e5583a31c4a3600507ae5500f7).)
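The policy behind option (2) can be sketched as a small decision function. This is a toy model under our own assumptions, not the actual pciback patch linked above: the function name, its parameters, and the choice to let no-op writes through are all hypothetical. It only illustrates the rule "a stub domain may flip the MSI enable flag and nothing else", with permissive mode (option (1)) shown for contrast.

```python
MSI_FLAGS_ENABLE = 0x0001  # bit 0 of the 16-bit MSI Message Control register


def allow_msi_flags_write(is_stubdom, old_flags, new_flags, permissive=False):
    """Decide whether a write to the MSI Message Control register may pass.

    Hypothetical model of the filtering policy, not the real pciback code.
    """
    if permissive:
        # Option (1): permissive mode lets (almost) every write through.
        return True
    changed = (old_flags ^ new_flags) & 0xFFFF
    if changed == 0:
        # Assumption of this sketch: a write that changes nothing is harmless.
        return True
    if not is_stubdom:
        # Only stub domains get the exception.
        return False
    # Option (2): the enable flag is the only bit that may change.
    return changed == MSI_FLAGS_ENABLE


if __name__ == "__main__":
    # A stub domain setting only the enable flag is allowed.
    print(allow_msi_flags_write(True, 0x0080, 0x0081))
    # The same write from a domain that is not a stub domain is refused.
    print(allow_msi_flags_write(False, 0x0080, 0x0081))
    # A stub domain changing any other bit is refused.
    print(allow_msi_flags_write(True, 0x0080, 0x0091))
```

Comparing old and new values, rather than looking only at the written value, is what keeps the exception narrow: a write that sets the enable flag while also changing another bit is still rejected.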
Now, the previously problematic devices function correctly inside HVMs.
(Here are the full pull requests: 1 (https://github.com/QubesOS/qubes-linux-kernel/pull/12), 2 (https://github.com/QubesOS/qubes-vmm-xen/pull/15), 3 (https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/pull/3).)
We just merged this feature, and it will be included in Qubes 4.0-rc2, which we plan to release next week.
After these patches undergo further testing, we plan to upstream them so that all Xen users can benefit from our work.
If you have any questions or comments, please write to us on qubes-devel (https://www.qubes-os.org/mailing-lists/#qubes-devel).
*We’ve switched from the Mini-OS-based stub domain to a Linux-based stub domain in Qubes 4.0 based on patches (https://lists.xenproject.org/archives/html/xen-devel/2015-02/msg00426.html) from Anthony Perard and Eric Shelton.
The switch is not significant for the purposes of this article.
New article by Qubes team member Simon Gaiser (HW42): "MSI support for PCI device pass-through with stub domains"
https://t.co/eDVzduPoWY
Announcing the Xen Project 4.10 RC and Test Day Schedules
https://blog.xenproject.org/2017/10/19/announcing-the-xen-project-4-10-rc-and-test-day-schedules/
On Monday, we created Xen 4.10 RC1 and will release a new release candidate every MONDAY, until we declare a release candidate as the final candidate and cut the Xen 4.10 release. We will also hold a Test Day every WEDNESDAY for the release candidate that was released the week prior to the Test Day […]