Qubes OS – Telegram
Qubes OS
1.99K subscribers
51 photos
2 videos
819 links
A reasonably secure operating system for personal computers.

Qubes-OS.org

⚠️This channel is updated after devs make an announcement to the project.

[Community ran channel]

Help?
English: @QubesChat

German: @QubesOS_user_de

Boost: t.me/QubesOS?boost
Download Telegram
There are also a bunch of additional components not shown on the diagram above,
and which technically are not part of a Qubes system, but which are instrumental
in building of the system:

qubes-builder,
template-builder,
tons of automatic tests,
as well as all the supporting infrastructure for building, testing and
distributing Qubes and the updates, and hosting of the website and
documentation.
As you can see Qubes is a pretty complex creature, and the Qubes Core Stack is
central to its existence.

Qubes VMs: the building blocks for Explicit Partitioning Model

Qubes implements explicit partitioning security model, which means that users
(and/or admins) can define multiple security domains and decide what these
domains can and cannot do. This is a different model than the popular sandboxing
model, as implemented by increasingly many applications and products today,
where every application is automatically sandboxed and some more-or-less
pre-defined set of rules is used to prevent behaviour considered “unsafe”
(whatever that might mean…). I believe the explicit partitioning model
provides many benefits over the sandboxing model, among the most important one
being that it is information-oriented, rather than application-oriented. In
other words it tries to limit damage to the (user’s) data, rather to the
(vendor’s) code and infrastructure.

There have always been a few different kinds of VMs in Qubes: AppVMs, Template
VMs, Standalone VMs, NetVMs, ProxyVMs, DispVM, etc. In Qubes 4 we have slightly
simplified and cleaned up these categories.

First we’ve hidden the PV vs HVM distinction. Now each VM has a property named
virt_mode which is used to decide whether it should be virtualized using Xen
para-virtualization (PV), full virtualization with auxiliary qemu in isolated
“stub domain” (HVM), or – in the near future – as full virtualization
without qemu and the additional stub domain (PVH). This means we no longer
classify VMs as PV vs HVMs, because every VM can be easily switched between
various modes of virtualization with a flip of a property.

We also no longer distinguish between AppVMs, NetVMs and Proxy VMs. Instead,
each VM has a property provides_network, which is false by default, except for
the VMs which we want to expose networking to other VMs (e.g. because they might
have some networking device assigned, such as cellular modem or WiFi device, or
because they act as VPNs or proxies of some sort).

We discuss what the properties are and how to check/set them, later in this
article.

So, to recap, starting with Qubes 4.0 we have only the following classes of VMs:

AppVM - covers what we called AppVMs, NetVMs, and ProxyVMs in earlier Qubes
releases,
DispVM - defined by not having a persistent private image across VM
reboots (see also below),
TemplateVM - these provide root filesystem for AppVMs, the semantics
haven’t changed in Qubes 4,
StandaloneVM - these are like AppVMs, but are not based on any
TemplateVM,
AdminVM a singleton (i.e. a class which has only one instance), which
represents the Administrator VM (Up until recently called dom0, a term
we’re departing from, given it is Xen-specific).
One can list all the VM classes known to the stack using the following command:

[user@dom0 ~]$ qvm-create --help-classes



BTW, we’ve been recently promoting the use of alternative names to “VM(s)”, such
as domain(s), and - more recently - “qube(s)”. This is to stress the connection
between Qubes and virtualization technology is less tight that many might
perceive. Another reason is that, we believe, for many users these alternative
terms might be more friendly than “VMs”, which apparently have strong technical
connotation. The author is quite used to the term VM, however, so should be
excused for (ab)using this term throughout her writings.

Properties, Features and Tags

Each VM (domain) in Qubes can have a number of properties, features and tags,
which describe both how it should behave, as well as what features or services
it offers to other VMs:


Properties, as the name suggests, are used to change the behaviour of the
VM and/or how it is treated by the Qubes Core Stack. The virt_mode property
mentioned above is a good example. For a list of other properties, take a
look here (https://dev.qubes-os.org/projects/core-admin/en/latest/qubes.html#properties). A command which can be used to list, read and
set properties for a VM is qvm-prefs (see below).


Features are very similar to properties, except that they are mostly
opaque to the Core (unlike properties, which are well defined and sanitized).
Features are essentially a dictionary of ‘key=value’ pairs assigned to each
VM. They can be used in various places outside of the main logic of the Core
Stack, specifically in Core Extensions (https://dev.qubes-os.org/projects/core-admin/en/latest/qubes-ext.html). A new tool in
Qubes 4 used to inspect or set features is called qvm-features.


A good example of a mechanism implemented on top of features are
services, which have been reimplemented in Qubes 4. Services are used to
let the VM agents know about whether various additional services, such as
Network Manager, should be enabled or not. Indeed, the qvm-service tool now
(i.e. in Qubes 4.0) internally simply creates features named service.XYZ,
where XYZ is the name of the service passed to qvm-service. It also takes
care of interpreting the true/false values.


Finally, each VM can also have some tags associated with it. Unlike
features, tags do not have a value – a VM can either have a specific tag
(“can be tagged with it”) or not. Unlike properties, they are not interpreted
by any core logic of the Core Stack. The sole purpose of tags is that they
can be used for qrexec policy rules, as discussed below. (This is also the
reason why we wanted to keep them absolutely simple – any complexity within
qrexec policy parsing code would definitely be asking for troubles…).

Last but not least, we should mention that each of the VM has a unique
name associated with it. VM names in Qubes are like filenames in the
filesystem – not only they are unique, but they also are used as a primary
way of identifying VMs, especially for security-related decisions (e.g. for
qrexec policy construction, see below).

And just as one can get away with using generic tagging schemes in place of
referring to paths and filenames in some security systems, similarly in Qubes OS
one can refer to tags (mentioned above and discussed later) in the policy
(but currently not in firewalling rules).

Internally, a VM’s name is implemented as the property
name and thus can be read using qvm-prefs. It cannot be changed, however,
because, starting with Qubes 4.0, we treat it as an immutable property.
User-exposed tools, however, do have a “VM rename” operation, which is
implemented as creating a copy of the VM with the new name and removing the old
VM. Thanks to the new volume
manager we also introduced in Qubes 4 (and which will be the topic of another
post), this operation is actually very cheap, disk-wise.

So, let us now start a console in AdminVM (dom0) and play a bit with these
mechanisms.

Let’s start with something very simple and let us take a look at the properties
of the AdminVM (in the current Qubes implemented by Xen’s dom0):

[user@dom0 ~]$ qvm-prefs dom0
default_dispvm D fedora-25-dvm
label - black
name D dom0
qid D 0
uuid D 00000000-0000-0000-0000-000000000000



As we can see, there aren’t many properties for the AdminVM, and none of them
can be modified by the user or admin. While we’re here, we should mention there
exists a similar tool, qubes-prefs which allows one to view and modify the
global system properties, and these properties should not be confused with
the properties of the AdminVM:

[user@dom0 ~]$ qubes-prefs
check_updates_vm D True
clockvm - sys-net
default_dispvm - fedora-25-dvm
default_fw_netvm D None
default_kernel - 4.9.45-21
default_netvm - sys-firewall
default_pool D lvm
default_pool_kernel - linux-kernel
default_pool_private D lvm
default_pool_root D lvm
default_pool_volatile D lvm
default_template - fedora-25
stats_interval D 3
updatevm - sys-firewall



Now, let’s inspect an ordinary AppVM, say work (which is one of the default
VMs created by the installer, but, of course, the user might have it named
differently):

[user@dom0 ~]$ qvm-prefs work
autostart D False
backup_timestamp D
debug D False
default_dispvm D fedora-25-dvm
default_user D user
gateway D
include_in_backups D True
installed_by_rpm D False
ip D
kernel D 4.9.45-21
kernelopts D nopat
label - blue
mac D 00:16:3E:5E:6C:00
maxmem D 4000
memory D 400
name - work
netvm - None
provides_network D False
qid - 8
qrexec_timeout D 60
stubdom_mem U
stubdom_xid D 6
template - fedora-25
template_for_dispvms D False
updateable D False
uuid - fab9a577-2531-4971-bce1-ca0c9b511f27
vcpus D 2
virt_mode D hvm
visible_gateway D
visible_ip D
visible_netmask D
xid D 5



We see lots of properties which have a D flag displayed, which indicates the
property uses the core-provided default value. In most cases the user is able to
override these values for specific VMs. For example the virt_mode property is
by default set to hvm for all AppVMs, which means that full virtualization
mode is used for the domain, but we can override this for specific VM(s) with
pv and thus force them to be virtualized using para-virtualization mode, which
is what the installer noscripts do for the sys-net and sys-usb VMs in Qubes
4.0-rc1 in order to work around problems with PCI passthrough support that we’ve
observed on some platforms.

As a simple exercise with changing the property, we can switch off networking
for one of the AppVMs by setting its netvm property to an empty value:

[user@dom0 ~]$ qvm-prefs --set work netvm ""



We can now confirm that indeed the VM named work has no network. E.g. we can
use the qvm-ls command (it should print - in the column which indicated the
netvm), or open a terminal in the work VM itself and try to see if there is a
virtual networking device exposing the network (we can e.g. use the ifconfig
or ip a commands).

Instead of setting an empty value as the netvm, we could have also set some
other VM, thus forcing the networking traffic to pass through that other VM,
e.g. to route all the VM’s traffic through a Whonix Tor gateway:

[user@dom0 ~]$ qvm-prefs --set work netvm sys-whonix



The VM which we want to use as the provider of networking services to other VMs,
such as sys-whonix in the example above, should have its provides_network
property set to true.

To revert back to the system-default netvm (as specified by qubes-prefs),
we can use the --default switch:

[user@dom0 ~]$ qvm-prefs --default work netvm



Now let us take a look at the features and services. It might be most
illustrative to look at both of these mechanisms together. Let’s suppose we
would like to enable the Network Manager service in some VM. Perhaps we would
like to use it to create a VPN connection terminated in one of the AppVMs, say
in the work VM. (An alternative option would be to create a separate VM, say
work-vpn, set its provides_network property to true, enable the
network-manager service in there, and use it to provide networking to the other
VMs, e.g. work.)
To enable the Network Manager service and a handy widget that shows the status
of the VPN, we can do:

[user@dom0 ~]$ qvm-service --enable work network-manager



This should result in the following output:

[user@dom0 ~]$ qvm-service work
network-manager on



We see the service is enabled, and if we restart the work VM, we should see
the Network Manager widget appear in the tray.

Now let’s view the features of this same VM:

[user@dom0 ~]$ qvm-features work
service.network-manager 1



We can now clearly see how the services are internally implemented on top of
features. This is done internally by the qvm-services command and by the
noscripts in the VMs, which interpret specifically-formatted features as services.
The qvm-service tool also takes care about properly setting and interpreting
the true/false and empty/non-empty strings to make sense for services states
(on/off), eliminating user mistakes.

One thing left for us to explore are tags, discussed above, but we will defer
this discussion until a later chapter, which talks about qrexec policy, as tags
are exclusively for use within policies. For now, suffice to say, that there is
a dedicated tool, qvm-tags used to check and set/modify the tags for each of
the VM (in addition to some tags being automatically created by the Core Stack,
such as e.g. the created-by-XYZ tag, discussed in the previous article on
Admin API).

As a final note we should stress one more time that VM names, despite being
normal property, are special in that the are immutable. In order to change the
name one needs to 1) create a new VM, 2) import the volume from the original VM
(using admin.vm.volume.CloneFrom and admin.vm.volume.CloneTo calls), as well
as 3) copy all the properties, tags, and features from the original VM, and
4) remove the original VM.

Each of these operations can be controlled independently via qrexec policy.
While this might seem like a superficial complication at first sight, we believe
it allows to simplify the implementation, as well as to minimize the amount of
potential mistakes when creating policies, notably when there is more than one
management VMs in the system, such as e.g. the (de-privileged) GUI domain and
some corporate-owned (but also semi-privileged) management VM. This has been
discussed in more details in the previous post on Admin API (https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/).

Disposable VMs redesigned

We have also redesigned how Disposable VMs (DispVMs) work in Qubes 4.0. Before
4.0, DispVMs had two kinds of largely unrelated characteristics: 1) being
disposable, i.e. having no persistent private image, and 2) booting from
pre-created savefiles in order to save on startup time.

In Qubes 4.0 we have redefined DispVMs solely by this first property, i.e.
private-image-not-having, which we believe to be the essential characteristics
of a Disposable VM. The underlying mechanism, e.g. whether the VM is restored
from a savefile to speed up the startup time, might or might not be implemented
for any VM and should not concern the user.

Another major change to DispVMs semantics in Qubes 4 (largely allowed by this
relaxation of DispVM definition) is that any of the AppVMs can be now used as a
template for the DispVM.

This provides lots of flexibility, e.g. it’s easy to create a customized DispVM
for signing PDF documents, or various disposable service VMs, such as Net- and
VPN- VMs. One just creates a normal AppVM, sets everything up there, e.g. loads
keys, configures VPNs, connects to known WiFi networks, and then shuts down the
AppVM and in place of it starts a DispVM based on that AppVM.

This way, whenever one has a gut feeling that something’s wrong with the VM
(e.g. because one just connected a USB stick to copy some slides), it’s easy to
just restart the DispVM in question and benefit from a new clean root
filesystem! Admittedly sometimes this might not be enough to get rid of an
infection, as discussed in the recent post on Compromise
Recovery (https://www.qubes-os.org/news/2017/04/26/qubes-compromise-recovery/), but for many cases this feature is highly desirable.

But opening up this flexibility comes at a price. We have to be careful about
which VMs can create which DispVMs. After all, it would be a disaster to allow
your casual Internet-browsing AppVM to spawn a DispVM based on your sensitive
work AppVM. The casual Internet-browsing VM could have the new DispVM open a
malicious file that compromises the DispVM. The compromised DispVM would then be
able to leak sensitive work-related data, since it uses the work VM as its
template.

There are two mechanisms in place to prevent such mistakes:


Each AppVM has a property called template_for_dispvms, which controls
whether this VM can serve as a template for Disposable VMs (i.e., whether any
DispVMs based on this VM are allowed in the system. By default,
this property is false for all AppVMs and needs to be manually enabled.


The choice of the template (i.e. specific AppVM) for the Disposable VM must
be provided by the qrexec policy (which the source VM cannot modify), and
defaults to the source VM’s default_dispvm property, which by default has a
value as specified via qubes-prefs. The resulting AppVM must have the
template_for_dispvms property set, or otherwise an error will occur.

Here we will take a look at how the template can be specified explicitly when
starting the DispVM from dom0:

[user@dom0 ~]$ qvm-run --dispvm=work --service qubes.StartApp+firefox
Running 'qubes.StartApp+firefox' on $dispvm:work
$dispvm:work: Refusing to create DispVM out of this AppVM, because template_for_dispvms=False



As mentioned above, we also need to explicitly enable use of the work AppVM as
Disposable VMs templates:

[user@dom0 ~]$ qvm-prefs --set work template_for_dispvms True
[user@dom0 ~]$ qvm-run --dispvm=work --service qubes.StartApp+firefox
Running 'qubes.StartApp+firefox' on $dispvm:work



We will look into how the DispVM can be specified via qrexec policy in a later
chapter below.

Qubes Remote Execution (qrexec): the underlying integration framework

Qubes OS is more than just a collection of isolated domains (currently
implemented as Xen VMs). The essential feature of Qubes OS, which sets
it apart from ordinary virtualization systems, is the unique way in which it
securely integrates these isolated domains for use in a single endpoint
system.

It’s probably accurate to say that the great majority of our efforts is on
building this integration in a manner so that it doesn’t ruin the isolation that
Xen provides to us.

There are several layers of integration infrastructure in Qubes OS, as depicted
in the following diagram (click for full size) and described below:
First, there is the Xen-provided mechanism based on shared memory for
inter-VM communication (it’s called “grant tables” in Xen parlance). This is
then used to implement various Xen-provided drivers for networking and
virtual disk (called “block backends/frontends” by Xen). Generally, any
virtualization or containerization system provides some sort of similar
mechanisms, and we could easily use of any them on Qubes, because our Core
Stack uses libvirt to configure these mechanisms. Qubes does have some
specific requirements, however, the most unique one being that we want to put
the backends in unprivileged VMs, rather than in dom0 (or host or root
partition, however it is called in other systems). This is one of the reasons
why Xen still is our hypervisor of choice – most (all?) other systems assume
all the backends to be placed in the privileged domain, which clearly is
undesirable from the security point of view.


Next, we have the Qubes infrastructure layer, which builds on top of the
Xen-provided shared memory communication mechanism. The actual layer of
abstraction is called “vchan”. If we moved to another hypervisor, we would
need to have vchan implemented on top of whatever inter-VM infrastructure
that other hypervisor was to make available to us. Various Qubes-specific
mechanisms, such as our security-optimized GUI virtualization, builds on top
of vchan.


Finally, and most importantly, there is the Qubes-specific qrexec
infrastructure, which also builds on top of vchan, and which exposes
socket-like interfaces to processes running inside the VMs. This
infrastructure is governed by a centralized policy (enforced by the AdminVM).
Most of the Qubes-specific services and apps are built on top of qrexec, with
the notable exception of GUI virtualization, as mentioned earlier.

It’s important to understand several key properties of this qrexec
infrastructure, which distinguish it from other, seemingly similar, solutions.
First, qrexec does not attempt to perform any serialization (à la RPC). Instead,
it exposes only a simple “pipe”, pretty much like a TCP layer. This allows us to
keep the code which handles the incoming (untrusted) data very simple. Any kind
of (de-)serialization and parsing is offloaded to the specific code which has
been registered for that particular service (e.g. qubes.FileCopy).

This might seem like a superficial win. After all, the data needs to be
de-serialized and parsed somewhere, so why would it matter where exactly?
True, but what this model allows us to do is to selectively decide how much
risk we want to take for any specific domain by allowing (or not) specific
service calls from (specific or all) other domains.

For example, by decoupling the (more complex) logic of data parsing as used by
the qubes.FileCopy service from the core code of qrexec we can eliminate
potential attacks on the qubes.FileCopy server code by not allowing e.g. any
of the VMs tagged personal to issue this call to any of the VMs tagged work,
while at the same time still allowing all of these VMs to communicate with the
default clockvm (by default sys-net, adjustable via policy redirect as
discussed below) to request the qubes.GetDate service. We would not have such
a flexibility in trust partitioning if our qrexec infrastructure had
serialization built in (e.g. if it was implementing a protocol like DBus between
VMs). Of course, specific services might decide to use some complex serializing
protocols on top of qrexec (e.g. DBus) very easily, because qrexec connection is
seen as a socket by applications running in the VMs.

Another important characteristic is the policing of qrexec, which is
something we discuss in the next section.

Let’s write a simple qrexec service, which would be illustrative for what we
just discussed. Imagine we want to get the price of a Bitcoin (BTC), but the VM
where we need it has no network connectivity, perhaps because it contains some
very sensitive data and we want to cut off as much interfaces to the untrusted
external world as possible.

In the untrusted VM we will create our simple server with the following body:

curl -s https://blockchain.info/q/24hrprice



The above should be pasted into the /usr/local/etc/qubes-rpc/my.GetBTCprice
file in our untrusted AppVM (let’s name it… untrusted). Let’s also test if
the service work (still from the untrusted VM):

[user@untrusted ~]$ sudo chmod +x /etc/qubes-rpc/my.GetBTCprice
[user@untrusted ~]$ /usr/local/etc/qubes-rpc/my.GetBTCprice
(...)



We should see the recent price of Bitcoin displayed.

Now, let’s create a network-disconnect, trusted AppVM called wallet:

[user@dom0 ~]$ qvm-create wallet -l blue --property netvm=""
[user@dom0 ~]$ qvm-ls



And now from a console in this new AppVM, let’s try to call our newly created
service:

[user@wallet ~]$ qrexec-client-vm untrusted my.GetBTCprice
(...)



The above command will invoke the (trusted) dialog box asking to confirm the
request and select the destination VM. For this experiment just select
untrusted from the “Target” drop-down list and acknowledge (in case you wonder
why the target VM is specified twice – once in the requesting VM and for the
2nd time in the trusted dialog box: we will discuss it in the next section). You
should see the price of the bitcoin printed as a result.

Before we move on to discussing the flexibility of qrexec policies, let’s pause
for a moment and recap what has just happened:


The network-disconnected, trusted VM called wallet requested the service
my.GetBTCprice from a VM named untrusted. The wallet VM had no way to
get the price of BTC, because it has no networking (for security).


After the user confirmed the call (by clicking on the trusted
dialog box, or by having a specific allow policy, as discussed below), it
got passed to the destination VM: untrusted.


The qrexec agent running in the untrusted VM invoked the handling
code for the my.GetBTCprice service. This code, in turn, performed a number
of complex actions: it TCP-connected to some server on the internet
(blockchain.info), performed some very complex crypto handshake to establish
an HTTPS connection to that server, then retrieved some complex data over
that connection, and finally returned the price of Bitcoin. There’s
likely hundreds of thousands lines of code involved in this operation.


Finally, whatever the my.GetBTCprice service returned on its stdout was
automagically taken by qrexec agent and piped back to the requesting VM, our
wallet VM.

The wallet VM got the data it wanted without the need to get involved in
all these complex operations, which take hundreds of thousands of lines of code
talking to untrusted computers over the network. That’s how we can improve the
security of this process without spending effort on auditing or hardening
the programs used (e.g. curl).

Of course that was a very simple example. But a very similar approach is used
among many Qubes services. E.g. system updates for dom0 are downloaded in
untrusted VMs and exposed to (otherwise network-disconnected) dom0 via the
qubes.ReceiveUpdates service (which later verifies digital signatures on the
packages). Another example is qubes.PdfConvert, which offloads the complex
parsing and rendering of PDFs to Disposable VMs and retrieves back only a very
simple format that is easily verified to be non-malicious. This simple format is
then converted back to a (now trusted) PDF.

More expressive qrexec policies

Because pretty much everything in Qubes which provides integration over the
compartmentalized domains is based on qrexec, it is imperative to have a
convenient (i.e. simple to use), secure (i.e. simple in implementation) yet
expressive enough mechanism to control who can request which qrexec services
from whom. Since the original qrexec policing was introduced in Qubes
release 1, the mechanism has undergone some slight gradual improvements.
We still keep the policy as a collection of simple text files, located in
/etc/qubes-rpc/policy/ directory in the AdminVM (dom0). This allows for
automating of the policy packaging into RPM (trusted) packages, as well policy
customization from within our integrated Salt Stack via (trusted) Salt state
files.

Now, one of the coolest features we’ve introduced in Qubes 4.0 is the ability to
tag VMs and use these to make policy decisions.

Imagine we have several work-related domains. We can now tag all them with some
tag of our choosing, say work:

[user@dom0 user ~]$ qvm-tags itl-email add work
[user@dom0 user ~]$ qvm-tags accounting add work
[user@dom0 user ~]$ qvm-tags project-liberation add work



Now we can easily construct a qrexec policy, e.g. to constrain the file (or
clipbooard) copy operation, so that it’s allowed only between the VMs tagged as
work (while preventing file transfer to and from any VM not tagged with
work) – all we need is to add the following 3 extra rules in the policy file
for qubes.FileCopy:

[user@dom0 user ~]$ cat /etc/qubes-rpc/polisy/qubes.FileCopy
(...)
$tag:work $tag:work allow
$tag:work $anyvm deny
$anyvm $tag:work deny

$anyvm $anyvm ask



We can do the same for clipboard, just need to place the same 3 rules in
qubes.ClipboardPaste policy file.

Also, as already discussed in the previous post on Admin API (https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/),
the Core Stack automatically tags VMs with created-by-XYZ tag, where XYZ is
replaced by the name of the VM which invoked the admin.vm.Create* service.
This allows to automatically constrain power of specific management VMs to only
manage its “own” VMs, and not others. Please refer to the article linked above
for the examples.

Furthermore, Disposable VMs can also be referred to via tags in the policy, for
example:

# Allow personal VMs to start DispVMs created by the user via guidom:
$tag:created-by-guidom $dispvm:$tag:created-by-guidom allow

# Allow corpo-managed VMs to start DispVMs based on corpo-owned AppVMs:
$tag:created-by-mgmt-corpo $dispvm:$tag:created-by-mgmt-corpo allow



Of course, we can also explicitly specify Disposable VMs using the
$dispvm: syntax, e.g. to allow AppVMs tagged
work to request qrexec service from a Disposable VM created off
work-printing AppVM (as noted earlier, the “work-printing” would need to have
it’s property template_for_dispvms set for this to work):

$anyvm $dispvm:work-printing allow



In Qubes 4.0 we have also implemented more strict control over the destination
argument for qrexec calls. Until Qubes 4.0, the source VM (i.e., the VM that
calls a service) was responsible for providing a valid destination VM name to
which it wanted to direct the service call (e.g. qubes.FileCopy or
qubes.OpenInVM). Of course, the policy always had the last say in this
process, and if the policy had a deny rule for the specific case, the service
call was dropped.

What has changed in Qubes 4.0 is that whenever the policy says ask (in
contrast to allow or deny), then the VM-provided destination is essentially
ignored, and instead a trusted prompt is displayed in dom0 to ask the user to
select the destination (with a convenient drop down list).

(One could argue that the VM-provided destination argument in qrexec policy call
is not entirely ignored, as it might be used to select a specific qrexec policy
line, in case there were a few matching, such as:

$anyvm work allow
$anyvm work-web ask,default_target=work-web
$anyvm work-email ask



In that case, however, the VM-provided call would still be overridden by the
user choice in case the rule #2 was selected.)
A prime example of where this is used is the qubes.FileCopy service. However, we
should note that for most other services, in a well configured system, there
should be very few ask rules. Instead most policies should be either allow
or deny, thereby relieving the user from having to make a security decision with every
service invocation. Even the qubes.FileCopy service should be additionally guarded by
deny rules (e.g. forbidding any file transfers between personal and
work-related VMs), and we believe our integrated Salt Management Stack should be
helpful in creating such policies in larger deployments of Qubes within
corporations.

Here comes in handy another powerful feature of qrexec policy: the target=
specifier, which can be added after the action keyword. This forces the call to
be directed to the specific destination VM, no matter what the source VM
specified. A good place to make use of this is in policies for starting various
Disposable VMs. For example, we might have a special rule for Disposable VMs
which can be invoked only from VMs tagged with work (e.g. for the
qubes.OpenInVM service):

$tag:work $dispvm allow,target=$dispvm:work-printing
$anyvm $dispvm:work-printing deny



The first line means that every DispVM created by a VM tagged with work will
be based on (i.e., use as its template) the work-printing VM. (Recall that, in
order for this to succeed, the work-printing VM also has to have its
template_for_dispvms property set to true.) The second line means that any
other VM (i.e., any VM not tagged with work) will be denied from creating
DispVMs based on the work-printing VM.

qubes.xml, qubesd, and the Admin API

Since the beginning, Qubes Core Stack kept all the information about the Qubes
system’s (persistent) configuration within the /var/lib/qubes/qubes.xml file.
This has included information such as what VMs are defined, on which templates
they are based, how they are network-connected, etc. This file has never been
intended to be edited by the user by hand (except for rare system
troubleshooting). Instead, Qubes has provided lots of tools, such as
qvm-create, qvm-prefs and qubes-prefs, and many more, which operate or
make use of the information from this file.

In Qubes 4.x we’ve introduced the qubesd daemon (service) which is now the
only entity which has direct access to the qubes.xml file, and which exposes a
well-defined API to other tools. This API is used by a few internal tools
running in dom0, such as some power management noscripts, qrexec policy checker,
and qubesd-query(-fast) wrapper, which in turn is used to expose most parts of
this API to other VMs via the qrexec infrastructure. This is what we call Admin
API, and which we I described in the previous post (https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/). While the
mapping between Admin API and the internal qubesd API is nearly “one-to-one”,
the primary difference is that Admin API is subject to policy mechanism via
qrexec, while the qubesd-exposed API is not policed, because it is only exposed
locally within the dom0. This architecture is depicted below.
As discussed in the post about Admin API (https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/), we have put lots of
thought into designing the API in such a way as to allow effective split between
the user and admin roles. Security-wise this means that admins should be able to
manage configurations and policies in the system, but not be able to access the
user data (i.e. AppVMs’ private images). Likewise it should be possible to
prevent users from changing the policies of the system, while allowing them to
use their data (but perhaps not export/leak them easily outside the system).

For completeness I’d like to mention that both qrexec and firewalling policies
are not included in the central qubes.xml file, but rather in separate
locations, i.e. in /etc/qubes-rpc/policy/ and
/var/lib/qubes//firewall.xml respectively. This allows for easy
updating of the policy files, e.g. from within trusted RPMs that are installed
in dom0 and which might be brining new qrexec services, or from whatever tool
used to create/manage firewalling policies.

Finally, a note that qubesd (as in “qubes-daemon”) should not be confused with
[qubes-db][https://github.com/QubesOS/qubes-core-qubesdb] (as in
“qubes-database”). The latter is a Qubes-provided, security-optimized abstraction
for exposing static informations from one VMs to others (mostly from AdminVM),
and which is used e.g. for the agents in the VMs to get to know the VM’s name,
type and other configuration options.

The new attack surface?

With all these changes to the Qubes Core Stack, an important question comes to
mind: how do all these changes affect the security of the system?

In an attempt to provide somewhat meaningful answer to that question, we should
first observe that there exists a number of obvious configurations (including
the default one) of the system, in which there should be no security regression
compared to previous Qubes versions.

Indeed, by not allowing any other VM to access the Admin API (which is what the
default qrexec policy for Admin API does), we essentially reduce the attack
surface onto the Core Stack to what is has been in the previous versions (modulo
potentially more complex policy parser, as discussed below).

Let us now imagine exposing some subset of the Admin API to select, trusted
management VMs, such as the upcoming GUI domain (in Qubes 4.1). As long as we
consider these select VMs as “trusted”, again the situation does not seem to be
any worse that what it was before (we can simply think of dom0 as having
comprised these additional VMs in previous versions of Qubes. Certainly there is
no security benefit here, but likewise there is no added risk).

Now let’s move a step further and relax our trustworthiness requirement for this,
say, GUI domain. We will now consider it only “somewhat trustworthy”. The whole
promise of the new Admin API is that, with a reasonably policed Admin API (see
also the previous post), even if this domain gets compromised, this will not
result in full system compromise, and ideally only in some kind of a DoS where
none of the user data will get compromised. Of course, in such a situation there
is additional attack surface that should be taken into account, such as the
qubesd-exposed interface. In case of a hypothetical bug in the implementation of
the qubesd-exposed interface (which is heavily sanitized and also written in
Python, but still) the attacker who compromised our “somewhat trustworthy” GUI
domain might compromise the whole system. But then again, let’s remember that
without the Admin API we would not have a “somewhat trustworthy” GUI domain in the
first place, and if we assume it was possible for the attacker to compromise
this VM, then she would also be able to compromise dom0 in earlier Qubes
versions. That would be fatal without any additional preconditions (e.g. for a
bug in qubesd).

Finally, we have the case of a “largely untrusted” management VM. The typical
scenario could be a management VM “owned” by an organization/corporation. As
explained in the previous post, the Admin API should allow us to grant such a VM
authority over only a subset of VMs, specifically only those which it created,
and not any other (through the convenient created-by-XYZ tags in the policy).
Now, if we consider this VM to become compromised, e.g. as a result of the
organization’s proprietary management agents getting compromised somehow, then
it becomes a very urging question to answer how buggy the qubesd-exposed
interface might be. Again, on most (all?) other client system, such a situation
would be fatal immediately (i.e. no additional attacks would be required after
the attacker compromised the agent), while on Qubes this would only be the
prelude for trying another attacks to get to dom0.

One other aspect which might not be immediately clear is the trade-off between a
more flexible architecture, which e.g. allows to create mutually untrusted
management VMs on the one hand, and the increased complexity of e.g. the policy
checker, which is required to now also understand new keywords such as the
previously introduced $dispvm:xyz or $tag:xyz. In general we believe that if
we can introduce a significant new security improvement on architecture level,
which allows to further decompose the TCB of the system, than it is worth it.
This is because architecture-level security should always go first, before
implementation-level security. Indeed, the latter can always be patched, and in
many cases won’t be critical (because e.g. smart architecture will keep it
outside of the TCB), while the architecture very often cannot be so easily
“fixed”. In fact this is the prime reason why we have Qubes OS, i.e. because
fixing of the monolithic architecture of the mainstream OSes has seemed hopeless
to us.

Summary

The new Qubes Core Stack provides a very flexible framework for managing a
compartmentalized desktop (client) oriented system. Compared to previous Qubes
Core Stacks, it offers much more flexibility, which translates to ability to
further decompose the system into more (largely) mutually untrusting parts.

Some readers might wonder how does the Qubes Core Stack actually compare to
various popular cloud/server/virtualization management APIs, such as
OpenStack/EC2 or even Docker?

While at first sight there might be quite a few similarities related to
management of VMs or containers, the primary differentiating factor is that
Qubes Core Stack has been designed and optimized to bring user one desktop
system built on top of multiple isolated domains (currently implemented as Xen
Virtual Machines, but in the future maybe on top of something else), rather than
for management of service-oriented infrastructure, where the services are
largely independent from each other and where the prime consideration is
scalability.

The Qubes Core Stack is Xen- and virtualization-independent, and should be
easily portable to any compartmentalization technology.

In the upcoming article we will take a look at the updated device and new volume
management in Qubes 4.0.
Qubes Security Bulletin #34: a bunch of Xen bugs (believed to be DoS only) & a (rather minor) bug in GUI coloring:
https://t.co/aczRAsPxho
QSB #34: GUI issue and Xen vulnerabilities (XSA-237 through XSA-244)
https://www.qubes-os.org/news/2017/10/12/qsb-34/

Dear Qubes Community,

We have just published Qubes Security Bulletin (QSB) #34:
GUI issue and Xen vulnerabilities (XSA-237 through XSA-244).
The text of this QSB is reproduced below. This QSB and its accompanying
signatures will always be available in the Qubes Security Pack (qubes-secpack).

View QSB #34 in the qubes-secpack:

https://github.com/QubesOS/qubes-secpack/blob/master/QSBs/qsb-034-2017.txt

Learn about the qubes-secpack, including how to obtain, verify, and read it:

https://www.qubes-os.org/security/pack/

View all past QSBs:

https://www.qubes-os.org/security/bulletins/

View the XSA Tracker:

https://www.qubes-os.org/security/xsa/

---===[ Qubes Security Bulletin #34 ]===---

October 12, 2017


GUI issue and Xen vulnerabilities (XSA-237 through XSA-244)

Summary
========

One of our developers, Simon Gaiser (aka HW42), while working on
improving support for device isolation in Qubes 4.0, discovered a
potential security problem with the way Xen handles MSI-capable devices.
The Xen Security Team has classified this problem as XSA-237 [01], which
was published today.

At the same time, the Xen Security Team released several other Xen
Security Advisories (XSA-238 through XSA-244). The impact of these
advisories ranges from system crashes to potential privilege
escalations. However, the latter seem to be mostly theoretical. See our
commentary below for details.

Finally, Eric Larsson discovered a situation in which Qubes GUI
virtualization could allow a VM to produce a window that has no colored
borders (which are used in Qubes as front-line indicators of trust).
A VM cannot use this vulnerability to draw different borders in place of
the correct one, however. We discuss this issue extensively below.

Technical details
==================

Xen issues
-----------

Xen Security Advisory 237 [01]:

| Multiple issues exist with the setup of PCI MSI interrupts:
| - unprivileged guests were permitted access to devices not owned by
| them, in particular allowing them to disable MSI or MSI-X on any
| device
| - HVM guests can trigger a codepath intended only for PV guests
| - some failure paths partially tear down previously configured
| interrupts, leaving inconsistent state
| - with XSM enabled, caller and callee of a hook disagreed about the
| data structure pointed to by a type-less argument
|
| A malicious or buggy guest may cause the hypervisor to crash, resulting
| in Denial of Service (DoS) affecting the entire host. Privilege
| escalation and information leaks cannot be excluded.

Xen Security Advisory 238 [02]:

| DMOPs (which were a subgroup of HVMOPs in older releases) allow guests
| to control and drive other guests. The I/O request server page mapping
| interface uses range sets to represent I/O resources the emulation of
| which is provided by a given I/O request server. The internals of the
| range set implementation require that ranges have a starting value no
| lower than the ending one. Checks for this fact were missing.
|
| Malicious or buggy stub domain kernels or tool stacks otherwise living
| outside of Domain0 can mount a denial of service attack which, if
| successful, can affect the whole system.
|
| Only domains controlling HVM guests can exploit this vulnerability.
| (This includes domains providing hardware emulation services to HVM
| guests.)

Xen Security Advisory 239 [03]:

| Intercepted I/O operations may deal with less than a full machine
| word's worth of data. While read paths had been the subject of earlier
| XSAs (and hence have been fixed), at least one write path was found
| where the data stored into an internal structure could contain bits
| from an uninitialized hypervisor stack slot. A subsequent emulated
| read would then be able to retrieve these bits.
|
| A malicious unprivileged x86 HVM guest may be able to obtain sensitive
| information from the host or other guests.

Xen Security Advisory 240 [04]:

| x86 PV guests are permitted to set up certain forms of what is often
| called "linear page tables", where pagetables contain references to
| other pagetables at the same level or higher. Certain restrictions
| apply in order to fit into Xen's page type handling system. An
| important restriction was missed, however: Stacking multiple layers
| of page tables of the same level on top of one another is not very
| useful, and the tearing down of such an arrangement involves
| recursion. With sufficiently many layers such recursion will result
| in a stack overflow, commonly resulting in Xen to crash.
|
| A malicious or buggy PV guest may cause the hypervisor to crash,
| resulting in Denial of Service (DoS) affecting the entire host.
| Privilege escalation and information leaks cannot be excluded.

Xen Security Advisory 241 [05]:

| x86 PV guests effect TLB flushes by way of a hypercall. Xen tries to
| reduce the number of TLB flushes by delaying them as much as possible.
| When the last type reference of a page is dropped, the need for a TLB
| flush (before the page is re-used) is recorded. If a guest TLB flush
| request involves an Inter Processor Interrupt (IPI) to a CPU in which
| is the process of dropping the last type reference of some page, and
| if that IPI arrives at exactly the right instruction boundary, a stale
| time stamp may be recorded, possibly resulting in the later omission
| of the necessary TLB flush for that page.
|
| A malicious x86 PV guest may be able to access all of system memory,
| allowing for all of privilege escalation, host crashes, and
| information leaks.

Xen Security Advisory 242 [06]:

| The page type system of Xen requires cleanup when the last reference
| for a given page is being dropped. In order to exclude simultaneous
| updates to a given page by multiple parties, pages which are updated
| are locked beforehand. This locking includes temporarily increasing
| the type reference count by one. When the page is later unlocked, the
| context precludes cleanup, so the reference that is then dropped must
| not be the last one. This was not properly enforced.
|
| A malicious or buggy PV guest may cause a memory leak upon shutdown
| of the guest, ultimately perhaps resulting in Denial of Service (DoS)
| affecting the entire host.

Xen Security Advisory 243 [07]:

| The shadow pagetable code uses linear mappings to inspect and modify the
| shadow pagetables. A linear mapping which points back to itself is known as
| self-linear. For translated guests, the shadow linear mappings (being in a
| separate address space) are not intended to be self-linear. For
| non-translated guests, the shadow linear mappings (being the same
| address space) are intended to be self-linear.
|
| When constructing a monitor pagetable for Xen to run on a vcpu with, the shadow
| linear slot is filled with a self-linear mapping, and for translated guests,
| shortly thereafter replaced with a non-self-linear mapping, when the guest's
| %cr3 is shadowed.
|
| However when writeable heuristics are used, the shadow mappings are used as
| part of shadowing %cr3, causing the heuristics to be applied to Xen's
| pagetables, not the guest shadow pagetables.
|
| While investigating, it was also identified that PV auto-translate mode was
| insecure. This mode was removed in Xen 4.7 due to being unused, unmaintained
| and presumed broken. We are not aware of any guest implementation of PV
| auto-translate mode.
|
| A malicious or buggy HVM guest may cause a hypervisor crash, resulting in a
| Denial of Service (DoS) affecting the entire host, or cause hypervisor memory
| corruption. We cannot rule out a guest being able to escalate its privilege.

Xen Security Advisory 244 [08]:

| The x86-64 architecture allows interrupts to be run on distinct stacks.
| The choice of stack is encoded in a field of the corresponding
| interrupt denoscriptor in the Interrupt Denoscriptor Table (IDT). That
| field selects an entry from the active Task State Segment (TSS).
|
| Since, on AMD hardware, Xen switches to an HVM guest's TSS before
| actually entering the guest, with the Global Interrupt Flag still set,
| the selectors in the IDT entry are switched when guest context is
| loaded/unloaded.
|
| When a new CPU is brought online, its IDT is copied from CPU0's IDT,
| including those selector fields. If CPU0 happens at that moment to be
| in HVM context, wrong values for those IDT fields would be installed
| for the new CPU. If the first guest vCPU to be run on that CPU
| belongs to a PV guest, it will then have the ability to escalate its
| privilege or crash the hypervisor.
|
| A malicious or buggy x86 PV guest could escalate its privileges or
| crash the hypervisor.
|
| Avoiding to online CPUs at runtime will avoid this vulnerability.


GUI daemon issue
-----------------

Qubes OS's GUI virtualization enforces colored borders around all VM
windows. There are two types of windows. The first type are normal
windows (with borders, noscriptbars, etc.). In this case, we modify the
window manager to take care of coloring the borders. The second type are
borderless windows (with the override_redirect property set to True in
X11 terminology). Here, the window manager is not involved at all, and
our GUI daemon needs to draw a border itself. This is done by drawing a
2px border whenever window content is changed beneath that area. The bug
was that if the VM application had never sent any updates for (any part
of) the border area, the frame was never drawn. The relevant code is in
the gui-daemon component [09], specifically in gui-daemon/xside.c [10]:

/* update given fragment of window image
* can be requested by VM (MSG_SHMIMAGE) and Xserver (XExposeEvent)
* parameters are not sanitized earlier - we must check it carefully
* also do not let to cover forced colorful frame (for undecoraded windows)
*/
static void do_shm_update(Ghandles * g, struct windowdata *vm_window,
int untrusted_x, int untrusted_y, int untrusted_w,
int untrusted_h)
{

/* ... */

if (!vm_window->image && !(g->screen_window && g->screen_window->image))
return;
/* force frame to be visible: */
/* * left */
delta = border_width - x;
if (delta > 0) {
w -= delta;
x = border_width;
do_border = 1;
}
/* * right */
delta = x + w - (vm_window->width - border_width);
if (delta > 0) {
w -= delta;
do_border = 1;
}
/* * top */
delta = border_width - y;
if (delta > 0) {
h -= delta;
y = border_width;
do_border = 1;
}
/* * bottom */
delta = y + h - (vm_window->height - border_width);
if (delta > 0) {
h -= delta;
do_border = 1;
}

/* ... */

}

The above code is responsible for deciding whether the colored border
needs to be updated. It is updated if both:
a) there is any window image (vm_window->image)
b) the updated area includes a border anywhere

If neither of these conditions is met, no border is drawn. Note that if
the VM tries to draw anything there (for example, a fake border in a
different color), whatever is drawn will be overridden with the correct
borders, which will stay there until the window is destroyed.

Eric Larsson discovered that this situation (not updating the border
area) is reachable -- and even happens with some real world applications
-- when the VM shows a splash screen with a custom shape. While custom
window shapes are not supported in Qubes OS, VMs do not know this. The
VM still thinks the custom-shaped window is there, so it does not send
updates of content outside of that custom shape.

We fixed the issue by forcing an update of the whole window before
making it visible:

static void handle_map(Ghandles * g, struct windowdata *vm_window)
{

/* ... */

/* added code */
if (vm_window->override_redirect) {
/* force window update to draw colorful frame, even when VM have not
* sent any content yet */
do_shm_update(g, vm_window, 0, 0, vm_window->width, vm_window->height);
}

(void) XMapWindow(g->display, vm_window->local_winid);
}

This needs some auxiliary changes in the do_shm_update function, to draw
the frame also in cases when there is no window content yet
(vm_window->image is NULL).

Commentary from the Qubes Security Team
========================================

For the most part, this batch of Xen Security Advisories affects Qubes
OS 3.2 only theoretically. In the case of Qubes OS 4.0, half of them do
not apply at all. We'll comment briefly on each one:

XSA-237 - The impact is believed to be denial of service only. In addition,
we believe proper use of Interrupt Remapping should offer a generic
solution to similar problems, to reduce them to denial of
service at worst.

XSA-238 - The stated impact is denial of service only.

XSA-239 - The attacking domain has no control over what information
is leaked.

XSA-240 - The practical impact is believed to be denial of service (and does not
affect HVMs).

XSA-241 - The issue applies only to PV domains, so the attack vector
is largely limited in Qubes OS 4.0, which uses HVM domains
by default. In addition, the Xen Security Team considers this
bug to be hard to exploit in practice (see advisory).

XSA-242 - The stated impact is denial of service only. In addition, the
issue applies only to PV domains.

XSA-243 - The practical impact is believed to be denial of service. In addition,
the vulnerable code (shadow page tables) is build-time disabled
in Qubes OS 4.0.

XSA-244 - The vulnerable code path (runtime CPU hotplug) is not used
in Qubes OS.

These results reassure us that switching to HVM domains in Qubes OS 4.0
was a good decision.

Compromise Recovery
====================

Starting with Qubes 3.2, we offer Paranoid Backup Restore Mode, which
was designed specifically to aid in the recovery of a (potentially)
compromised Qubes OS system. Thus, if you believe your system might have
been compromised (perhaps because of the bugs discussed in this
bulletin), then you should read and follow the procedure described here:

https://www.qubes-os.org/news/2017/04/26/qubes-compromise-recovery/

Patching
=========

The specific packages that resolve the problems discussed in this
bulletin are as follows:

For Qubes 3.2:
- Xen packages, version 4.6.6-32
- qubes-gui-dom0, version 3.2.12

For Qubes 4.0:
- Xen packages, version 4.8.2-6
- qubes-gui-dom0, version 4.0.5

The packages are to be installed in dom0 via the Qubes VM Manager or via
the qubes-dom0-update command as follows:

For updates from the stable repository (not immediately available):
$ sudo qubes-dom0-update

For updates from the security-testing repository:
$ sudo qubes-dom0-update --enablerepo=qubes-dom0-security-testing

A system restart will be required afterwards.

These packages will migrate from the security-testing repository to the
current (stable) repository over the next two weeks after being tested
by the community.

If you use Anti Evil Maid, you will need to reseal your secret
passphrase to new PCR values, as PCR18+19 will change due to the new
Xen binaries.

Credits
========

The GUI daemon issue was discovered by Eric Larsson.

The PCI MSI issues were discovered by Simon Gaiser (aka HW42).

For other issues, see the original Xen Security Advisories.

References
===========