Bits, Please!: 2016

30/06/2016

Extracting Qualcomm's KeyMaster Keys - Breaking Android Full Disk Encryption

After covering a TrustZone kernel vulnerability and exploit in the previous blog post, I thought this time it might be interesting to explore some of the implications of code-execution within the TrustZone kernel. In this blog post, I'll demonstrate how TrustZone kernel code-execution can be used to effectively break Android's Full Disk Encryption (FDE) scheme. We'll also see some of the inherent issues stemming from the design of Android's FDE scheme, even without any TrustZone vulnerability.

I've been in contact with Qualcomm regarding the issue prior to the release of this post, and have let them review the blog post. As always, they've been very helpful and fast to respond. Unfortunately, it seems as though fixing the issue is not simple, and might require hardware changes.

If you aren't interested in the technical details and just want to read the conclusions - feel free to jump right to the "Conclusions" section. In the same vein, if you're only interested in the code, jump directly to the "Code" section.

[UPDATE: I've made a factual mistake in the original blog post, and have corrected it in the post below. Apparently Qualcomm are not able to sign firmware images, only OEMs can do so. As such, they cannot be coerced to create a custom TrustZone image. I apologise for the mistake.]

And now without further ado, let's get to it!

Setting the Stage

A couple of months ago the highly-publicised case of Apple vs. FBI brought attention to the topic of privacy - especially in the context of mobile devices. Following the 2015 San Bernardino terrorist attack, the FBI seized a mobile phone belonging to the shooter, Syed Farook, with the intent to search it for any additional evidence or leads related to the ongoing investigation. However, despite being in possession of the device, the FBI were unable to unlock the phone and access its contents.

This may sound puzzling at first. "Surely if the FBI has access to the phone, could they not extract the user data stored on it using forensic tools?". Well, the answer is not that simple. You see, the device in question was an iPhone 5c, running iOS 9.

As you may well know, starting with iOS 8, Apple has automatically enabled Full Disk Encryption (FDE) using an encryption key which is derived from the user's password. In order to access the data on the device, the FBI would have to crack that encryption. Barring any errors in cryptographic design, this would most probably be achieved by cracking the user's password.

"So why not just brute-force the password?". That sounds like a completely valid approach - especially since most users are notoriously bad at choosing strong passwords, even more so when it comes to mobile devices.

However, the engineers at Apple were not oblivious to this concern when designing their FDE scheme. In order to try and mitigate this kind of attack, they've designed the encryption scheme so that the generated encryption key is bound to the hardware of the device.

In short, each device has an immutable 256-bit unique key called the UID, which is randomly generated and fused into the device's hardware at the factory. The key is stored in a way which completely prevents access to it using software or firmware (it can only be set as a key for the AES Engine), meaning that even Apple cannot extract it from the device once it's been set. This device-specific key is then used in combination with the provided user's password in order to generate the resulting encryption key used to protect the data on the device. This effectively 'tangles' the password and the UID key.

Apple's FDE KDF

Binding the encryption key to the device's hardware allows Apple to make the job much harder for would-be attackers. It essentially forces attackers to use the device for each cracking attempt. This, in turn, allows Apple to introduce a whole array of defences that would make cracking attempts on the device unattractive.

For starters, the key-derivation function shown above is engineered in such a way so that it would take a substantial amount of time to compute on the device. Specifically, Apple chose the function's parameters so that a single key derivation would take approximately 80 milliseconds. This delay would make cracking short alphanumeric passwords slow (~2 weeks for a 4-character alphanumeric password), and cracking longer passwords completely infeasible.

In order to further mitigate brute-force attacks on the device itself, Apple has also introduced an incrementally increasingly delay between subsequent password guesses. On the iPhone 5c, this delay was facilitated completely using software. Lastly, Apple has allowed for an option to completely erase all of the information stored on the device after 10 failed password attempts. This configuration, coupled with the software-induced delays, made cracking the password on the device itself rather infeasible as well.

With this in mind, it's a lot more reasonable that the FBI were unable to crack the device's encryption.

Had they been able to extract the UID key, they could have used as much (specialized) hardware as needed in order to rapidly guess many passwords, which would most probably allow them to eventually guess the correct password. However, seeing as the UID key cannot be extracted by means of software or firmware, that option is ruled out.

As for cracking the password on the device, the software-induced delays between password attempts and the possibility of obliterating all the data on the device made that option rather unattractive. That is, unless they could bypass the software protections... However, this is where the story gets rather irrelevant to this blog post, so we'll keep it at that.

If you'd like to read more, you can check out Dan Guido's superb post about the technical aspects of Apple v. FBI, or Matthew Green's great overview on Apple's FDE, or better yet, the iOS Security Guide.

Going back to the issue at hand - we can see that Apple has cleverly designed their FDE scheme in order to make it very difficult to crack. Android, being the mature operating system that it is, was not one to lag behind. In fact, Android has also offered full disk encryption, which has been enabled by default since Android 5.0.

So how does Android's FDE scheme fare? Let's find out.

Android Full Disk Encryption

Starting with Android 5.0, Android devices automatically protect all of the user's information by enabling full disk encryption.

Android FDE is based on a Linux Kernel subsystem called dm-crypt, which is widely deployed and researched. Off the bat, this is already good news - dm-crypt has withstood the test of time, and as such seems like a great candidate for an FDE implementation. However, while the encryption scheme may be robust, the system is only as strong as the key being used to encrypt the information. Additionally, mobile devices tend to cause users to choose poorer passwords in general. This means the key derivation function is hugely important in this setting.

So how is the encryption key generated?

This process is described in great detail in the official documentation of Android FDE, and in even greater detail in Nikolay Elenkov's blog, "Android Explorations". In short, the device generates a randomly-chosen 128-bit master key (which we'll refer to as the Device Encryption Key - DEK) and a 128-bit randomly-chosen salt. The DEK is then protected using an elaborate key derivation scheme, which uses the user's provided unlock credentials (PIN/Password/Pattern) in order to derive a key which will ultimately encrypt the DEK. The encrypted DEK is then stored on the device, inside a special unencrypted structure called the "crypto footer".

The encrypted disk can then be decrypted by simply taking the user's provided credentials, passing them through the key derivation function, and using the resulting key to decrypt the stored DEK. Once the DEK is decrypted, it can be used to decrypt user's information.

However, this is where it gets interesting! Just like Apple's FDE scheme, Android FDE seeks to prevent brute-force cracking attacks; both on the device and especially off of it.

Naturally, in order to prevent on-device cracking attacks, Android introduced delays between decryption attempts and an option to wipe the user's information after a few subsequent failed decryption attempts (just like iOS). But what about preventing off-device brute-force attacks? Well, this is achieved by introducing a step in the key derivation scheme which binds the key to the device's hardware. This binding is performed using Android's Hardware-Backed Keystore - KeyMaster.

KeyMaster

The KeyMaster module is intended to assure the protection of cryptographic keys generated by applications. In order to guarantee that this protection cannot be tampered with, the KeyMaster module runs in a Trusted Execution Environment (TEE), which is completely separate from the Android operating system. In keeping with the TrustZone terminology, we'll refer to the Android operating system as the "Non-Secure World", and to the TEE as the "Secure World".

Put simply, the KeyMaster module can be used to generate encryption keys, and to perform cryptographic operations on them, without ever revealing the keys to the Non-Secure World.

Once the keys are generated in the KeyMaster module, they are encrypted using a hardware-backed encryption key, and returned to Non-Secure World. Whenever the Non-Secure World wishes to perform an operation using the generated keys, it must supply the encrypted "key blob" to the KeyMaster module. The KeyMaster module can then decrypt the stored key, use it to perform the wanted cryptographic operation, and finally return the result to the Non-Secure World.

Since this is all done without ever revealing the cryptographic keys used to protect the key blobs to the Non-Secure World, this means that all cryptographic operations performed using key blobs must be handled by the KeyMaster module, directly on the device itself.

With this in mind, let's see exactly how KeyMaster is used in Android's FDE scheme. We'll do so by taking a closer look at the hardware-bound key derivation function used in Android's FDE scheme. Here's a short schematic detailing the KDF (based on a similar schematic created by Nikolay Elenkov):

Android FDE's KDF

As you can see, in order to bind the KDF to the hardware of the device, an additional field is stored in the crypto footer - a KeyMaster-generated key blob. This key blob contains a KeyMaster-encrypted RSA-2048 private key, which is used to sign the encryption key in an intermediate step in the KDF - thus requiring the use of the KeyMaster module in order to produce the intermediate key used decrypt the DEK in each decryption attempt.

Moreover, the crypto footer also contains an additional field that doesn't serve any direct purpose in the decryption process; the value returned from running scrypt on the final intermediate key (IK3). This value is referred to as the "scrypted_intermediate_key" (Scrypted IK in the diagram above). It is used to verify the validity of the supplied FDE password in case of errors during the decryption process. This is important since it allows Android to know when a given encryption key is valid but the disk itself is faulty. However, knowing this value still shouldn't help the attacker "reverse" it to retrieve the IK3, so it still can't be used to help attackers aiming to guess the password off the device.

As we've seen, the Android FDE's KDF is "bound" to the hardware of the device by the intermediate KeyMaster signature. But how secure is the KeyMaster module? How are the key blobs protected? Unfortunately, this is hard to say. The implementation of the KeyMaster module is provided by the SoC OEMs and, as such, is completely undocumented (essentially a black-box). We could try and rely on the official Android documentation, which states that the KeyMaster module: "...offers an opportunity for Android devices to provide hardware-backed, strong security services...". But surely that's not enough.

So... Are you pondering what I'm pondering?

Reversing Qualcomm's KeyMaster

As we've seen in the previous blog posts, Qualcomm provides a Trusted Execution Environment called QSEE (Qualcomm Secure Execution Environment). The QSEE environment allows small applications, called "Trustlets", to execute on a dedicated secured processor within the "Secure World" of TrustZone. One such QSEE trustlet running in the "Secure World" is the KeyMaster application. As we've already seen how to reverse-engineer QSEE trustlets, we can simply apply the same techniques in order to reverse engineer the KeyMaster module and gain some insight into its inner workings.

First, let's take a look at the Android source code which is used to interact with the KeyMaster application. Doing so reveals that the trustlet only supports four different commands:

As we're interested in the protections guarding the generated key blobs, let's take a look at the KEYMASTER_SIGN_DATA command. This command receives a previously encrypted key blob and somehow performs an operation using the encapsulated cryptographic key. Ergo, by reverse-engineering this function, we should be able to deduce how the encrypted key blobs are decapsulated by the KeyMaster module.

The command's signature is exactly as you'd imagine - the user provides an encrypted key blob, the signature parameters, and the address and length of the data to be signed. The trustlet then decapsulates the key, calculates the signature, and writes it into the shared result buffer.

As luck would have it, the key blob's structure is actually defined in the supplied header files. Here's what it looks like:

Okay! This is pretty interesting.

First, we can see that the key blob contains the unencrypted modulus and public exponent of the generated RSA key. However, the private exponent seems to be encrypted in some way. Not only that, but the whole key blob's authenticity is verified by using an HMAC. So where is the encryption key stored? Where is the HMAC key stored? We'll have to reverse-engineer the KeyMaster module to find out.

Let's take a look at the KeyMaster trustlet's implementation of the KEYMASTER_SIGN_DATA command. The function starts with some boilerplate validations in order to make sure the supplied parameters are valid. We'll skip those, since they aren't the focus of this post. After verifying all the parameters, the function maps-in the user-supplied data buffer, so that it will be accessible to the "Secure World". Eventually, we reach the "core" logic of the function:

Okay, we're definitely getting somewhere!

First of all, we can see that the code calls some function which I've taken the liberty of calling get_some_kind_of_buffer, and stores the results in the variables buffer_0 and buffer_1. Immediately after retrieving these buffers, the code calls the qsee_hmac function in order to calculate the HMAC of the first 0x624 bytes of the user-supplied key blob. This makes sense, since the size of the key blob structure we've seen before is exactly 0x624 bytes (without the HMAC field).

But wait! We've already seen the qsee_hmac function before - in the Widevine application. Specifically, we know it receives the following arguments:

The variable that we've called buffer_1 is passed in as the fourth argument to qsee_hmac. This can only mean one thing... It is in fact the HMAC key!

What about buffer_0? We can already see that it is used in the function do_something_with_keyblob. Not only that, but immediately after calling that function, the signature is calculated and written to the destination buffer. However, as we've previously seen, the private exponent is encrypted in the key blob. Obviously the RSA signature cannot be calculated until the private exponent is decrypted... So what does do_something_with_keyblob do? Let's see:

Aha! Just as we suspected. The function do_something_with_keyblob simply decrypts the private exponent, using buffer_0 as the encryption key!

Finally, let's take a look at the function that was used to retrieve the HMAC and encryption keys (now bearing a more appropriate name):

As we can see in the code above, the HMAC key and the encryption key are both generated using some kind of key derivation function. Each key is generated by invoking the KDF using a pair of hard-coded strings as inputs. The resulting derived key is then stored in the KeyMaster application's global buffer, and the pointer to the key is returned to the caller. Moreover, if we are to trust the provided strings, the internal key derivation function uses an actual hardware key, called the SHK, which would no doubt be hard to extract using software...

...But this is all irrelevant! The decapsulation code we have just reverse-engineered has revealed a very important fact.

Instead of creating a scheme which directly uses the hardware key without ever divulging it to software or firmware, the code above performs the encryption and validation of the key blobs using keys which are directly available to the TrustZone software! Note that the keys are also constant - they are directly derived from the SHK (which is fused into the hardware) and from two "hard-coded" strings.

Let's take a moment to explore some of the implications of this finding.

Conclusions

The key derivation is not hardware bound. Instead of using a real hardware key which cannot be extracted by software (for example, the SHK), the KeyMaster application uses a key derived from the SHK and directly available to TrustZone.
OEMs can comply with law enforcement to break Full Disk Encryption. Since the key is available to TrustZone, OEMs could simply create and sign a TrustZone image which extracts the KeyMaster keys and flash it to the target device. This would allow law enforcement to easily brute-force the FDE password off the device using the leaked keys.
Patching TrustZone vulnerabilities does not necessarily protect you from this issue. Even on patched devices, if an attacker can obtain the encrypted disk image (e.g. by using forensic tools), they can then "downgrade" the device to a vulnerable version, extract the key by exploiting TrustZone, and use them to brute-force the encryption. Since the key is derived directly from the SHK, and the SHK cannot be modified, this renders all down-gradable devices directly vulnerable.
Android FDE is only as strong as the TrustZone kernel or KeyMaster. Finding a TrustZone kernel vulnerability or a vulnerability in the KeyMaster trustlet, directly leads to the disclosure of the KeyMaster keys, thus enabling off-device attacks on Android FDE.

During my communication with Qualcomm I voiced concerns about the usage of a software-accessible key derived from the SHK. I suggested using the SHK (or another hardware key) directly. As far as I know, the SHK cannot be extracted from software, and is only available to the cryptographic processors (similarly to Apple's UID). Therefore, using it would thwart any attempt at off-device brute force attacks (barring the use of specialized hardware to extract the key).

However, reality is not that simple. The SHK is used for many different purposes. Allowing the user to directly encrypt data using the SHK would compromise those use-cases. Not only that, but the KeyMaster application is widely used in the Android operating-system. Modifying its behaviour could "break" applications which rely on it. Lastly, the current design of the KeyMaster application doesn't differentiate between requests which use the KeyMaster application for Android FDE and other requests for different use-cases. This makes it harder to incorporate a fix which only modifies the KeyMaster application.

Regardless, I believe this issue underscores the need for a solution that entangles the full disk encryption key with the device's hardware in a way which cannot be bypassed using software. Perhaps that means redesigning the FDE's KDF. Perhaps this can be addressed using additional hardware. I think this is something Google and OEMs should definitely get together and think about.

Extracting the KeyMaster Keys

Now that we've set our sights on the KeyMaster keys, we are still left with the challenge of extracting the keys directly from TrustZone.

Previously on the zero-to-TrustZone series of blog posts, we've discovered an exploit which allowed us to achieve code-execution within QSEE, namely, within the Widevine DRM application. However, is that enough?

Perhaps we could read the keys directly from the KeyMaster trustlet's memory from the context of the hijacked Widevine trustlet? Unfortunately, the answer is no. Any attempt to access a different QSEE application's memory causes an XPU violation, and subsequently crashes the violating trustlet (even when switching to a kernel context). What about calling the same KDF used by the KeyMaster module to generate the keys from the context of the Widevine trustlet? Unfortunately the answer is no once again. The KDF is only present in the KeyMaster application's code segment, and QSEE applications cannot modify their own code or allocate new executable pages.

Luckily, we've also previously discovered an additional privilege escalation from QSEE to the TrustZone kernel. Surely code execution within the TrustZone kernel would allow us to hijack any QSEE application! Then, once we control the KeyMaster application, we can simply use it to leak the HMAC and encryption keys and call it a day.

Recall that in the previous blog post we reverse-engineered the mechanism behind the invocation of system calls in the TrustZone kernel. Doing so revealed that most system-calls are invoked indirectly by using a set of globally-stored pointers, each of which pointing to a different table of supported system-calls. Each system-call table simply contained a bunch of consecutive 64-bit entries; a 32-bit value representing the syscall number, followed by a 32-bit pointer to the syscall handler function itself. Here is one such table:

Since these tables are used by all QSEE trustlets, they could serve as a highly convenient entry point in order to hijack the code execution within the KeyMaster application!

All we would need to do is to overwrite a system-call handler entry in the table, and point it to a function of our own. Then, once the KeyMaster application invokes the target system-call, it would execute our own handler instead of the original one! This also enables us not to worry about restoring execution after executing our code, which is a nice added bonus.

But there's a tiny snag - in order to direct the handler at a function of our own, we need some way to allocate a chunk of code which will be globally available in the "Secure World". This is because, as mentioned above, different QSEE applications cannot access each other's memory segments. This renders our previous method of overwriting the code segments of the Widevine application useless in this case. However, as we've seen in the past, the TrustZone Kernel's code segments (which are accessible to all QSEE application when executing in kernel context) are protected using a special hardware component called an XPU. Therefore, even when running within the TrustZone kernel and disabling access protection faults in the ARM MMU, we are still unable to modify them.

This is where some brute-force comes in handy... I've written a small snippet of code that quickly iterates over all of the TrustZone Kernel's code segments, and attempts to modify them. If there is any (mistakenly?) XPU-unprotected region, we will surely find it. Indeed, after iterating through the code segments, one rather large segment, ranging from addresses 0xFE806000 to 0xFE810000, appeared to be unprotected!

Since we don't want to disrupt the regular operation of the TrustZone kernel, it would be wise to find a small code-cave in that region, or a small chunk of code that would be harmless to overwrite. Searching around for a bit reveals a small bunch of logging strings in the segment - surely we can overwrite them without any adverse effects:

Now that we have a modifiable code cave in the TrustZone kernel, we can proceed to write a small stub that, when called, will exfiltrate the KeyMaster keys directly from the KeyMaster trustlet's memory!

Lastly, we need a simple way to cause the KeyMaster application to execute the hijacked system-call. Remember, we can easily send commands to the KeyMaster application which, in turn, will cause the KeyMaster application to call quite a few system-calls. Reviewing the KeyMaster's key-generation command reveals that one good candidate to hijack would be the "qsee_hmac" system-call:

KeyMaster's "Generate Key" Flow

Where qsee_hmac's signature is:

This is a good candidate for a few reasons:

The "data" argument that's passed in is a buffer that's shared with the non-secure world. This means whatever we write to it can easily retrieved after returning from the "Secure World".
The qsee_hmac function is not called very often, so hijacking it for a couple of seconds would probably be harmless.
The function receives the address of the HMAC key as one of the arguments. This saves us the need to find the KeyMaster application's address dynamically and calculate the addresses of the keys in memory.

Finally, all our shellcode would have to do is to read the HMAC and encryption keys from the KeyMaster application's global buffer (at the locations we saw earlier on), and "leak" them into the shared buffer. After returning from the command request, we could then simply fish-out the leaked keys from the shared buffer. Here's a small snippet of THUMB assembly that does just that:

Shellcode which leaks KeyMaster Keys

Putting it all together

Finally, we have all the pieces of the puzzle. All we need to do in order to extract the KeyMaster keys is to:

Enable the DACR in the TrustZone kernel to allow us to modify the code cave.
Write a small shellcode stub in the code cave which reads the keys from the KeyMaster application.
Hijack the "qsee_hmac" system-call and point it at our shellcode stub.
Call the KeyMaster's key-generation command, causing it to trigger the poisoned system-call and exfiltrate the keys into the shared buffer.
Read the leaked keys from the shared buffer.

Here's a diagram detailing all of these steps:

The Code

Finally, as always, I've provided the full source code for the attack described above. The code builds upon the two previously disclosed issues in the zero-to-TrustZone series, and allows you to leak the KeyMaster keys directly from your device! After successfully executing the exploit, the KeyMaster keys should be printed to the console, like so:

You can find the full source code of the exploit here:

https://github.com/laginimaineb/ExtractKeyMaster

I've also written a set of python scripts which can be used to brute-force Android full disk encryption off the device. You can find the scripts here:

https://github.com/laginimaineb/android_fde_bruteforce

Simply invoke the python script fde_bruteforce.py using:

The crypto footer from the device
The leaked KeyMaster keys
The word-list containing possible passwords

Currently, the script simply enumerates each password from a given word-list, and attempts to match the encryption result with the "scrypted intermediate key" stored in the crypto footer. That is, it passes each word in the word-list through the Android FDE KDF, scrypts the result, and compares it to the value stored in the crypto footer. Since the implementation is fully in python, it is rather slow... However, those seeking speed could port it to a much faster platform, such as hashcat/oclHashcat.

Here's what it looks like after running it on my own Nexus 6, encrypted using the password "secret":

Lastly, I've also written a script which can be used to decrypt already-generated KeyMaster key blobs. If you simply have a KeyMaster key blob that you'd like to decrypt using the leaked keys, you can do so by invoking the script km_keymaster.py, like so:

Final Thoughts

Full disk encryption is used world-wide, and can sometimes be instrumental to ensuring the privacy of people's most intimate pieces of information. As such, I believe the encryption scheme should be designed to be as "bullet-proof" as possible, against all types of adversaries. As we've seen, the current encryption scheme is far from bullet-proof, and can be hacked by an adversary or even broken by the OEMs themselves (if they are coerced to comply with law enforcement).

I hope that by shedding light on the subject, this research will motivate OEMs and Google to come together and think of a more robust solution for FDE. I realise that in the Android ecosystem this is harder to guarantee, due to the multitude of OEMs. However, I believe a concentrated effort on both sides can help the next generation of Android devices be truly "uncrackable".

15/06/2016

TrustZone Kernel Privilege Escalation (CVE-2016-2431)

In this blog post we'll continue our journey from zero permissions to code execution in the TrustZone kernel. Having previously elevated our privileges to QSEE, we are left with the task of exploiting the TrustZone kernel itself.

"Why?", I hear you ask.

Well... There are quite a few interesting things we can do solely from the context of the TrustZone kernel. To name a few:

We could hijack any QSEE application directly, thus exposing all of it's internal secrets. For example, we could directly extract the stored real-life fingerprint or various secret encryption keys (more on this in the next blog post!).
We could disable the hardware protections provided by the SoC's XPUs, allowing us to read and write directly to all of the DRAM. This includes the memory used by the peripherals on the board (such as the modem).
As we've previously seen, we could blow the QFuses responsible for various device features. In certain cases, this could allow us to unlock a locked bootloader (depending on how the lock is implemented).

So now that we've set the stage, let's start by surveying the attack surface!

War of the Worlds - Hijacking the Linux Kernel from QSEE

After seeing a full QSEE vulnerability and exploit in the previous blog post, I thought it might be nice to see some QSEE shellcode in action.

As we've previously discussed, QSEE is extremely privileged - not only can it interact directly with the TrustZone kernel and access the hardware-secured TrustZone file-system (SFS), but it also has some direct form of access to the system's memory.

In this blog post we'll see how we can make use of this direct memory access in the "Secure World" in order to hijack the Linux Kernel running in the "Normal World", without even requiring a kernel vulnerability.

QSEE privilege escalation vulnerability and exploit (CVE-2015-6639)

In this blog post we'll discover and exploit a vulnerability which will allow us to gain code execution within Qualcomm's Secure Execution Environment (QSEE). I've responsibly disclosed this vulnerability to Google and it has been fixed - for the exact timeline, see the "Timeline" section below.

The QSEE Attack Surface

As we've seen in the previous blog post, Qualcomm's TrustZone implementation enables the "Normal World" operating system to load trusted applications (called trustlets) into a user-space environment within the "Secure World", called QSEE.

This service is provided to the "Normal World" by sending specific SMC calls which are handled by the "Secure World" kernel. However, since SMC calls cannot be invoked from user-mode, all communication between the "Normal World" and a trustlet must pass through the "Normal World" operating system's kernel.

Having said that, regular user-space processes within the "Normal World" sometimes need to communicate with trustlets which provide specific services to them. For example, when playing a DRM protected media file, the process in charge of handling media within Android, "mediaserver", must communicate with the appropriate DRM trustlet in order to decrypt and render the viewed media file. Similarly, the process in charge of handling cryptographic keys, "keystore", needs to be able to communicate with a special trustlet ("keymaster") which provides secure storage and operation on cryptographic keys.

So if communicating with trustlets requires the ability to issue SMCs, and this cannot be done from user-mode, then how do these processes actually communicate with the trustlets?

The answer is by using a Linux kernel device, called "qseecom", which enables user-space processes to perform a wide range of TrustZone-related operations, such as loading trustlets into the secure environment and communicating with loaded trustlets.

However! Although necessary, this is very dangerous; communication with TrustZone exposes a large (!) attack surface - if any trustlet that can be loaded on a particular device contains a vulnerability, we can exploit it in order to gain code execution within the trusted execution environment. Moreover, since the trusted execution environment has the ability to map-in and write to all physical memory belonging to the "Normal World", it can also be used in order to infect the "Normal World" operating system's kernel without there even being a vulnerability in the kernel (simply by directly modifying the kernel's code from the "Secure World").

Because of the dangers outlined above, the access to this device is restricted to the minimal set of processes that require it. A previous dive into the permissions required in order to access the driver has shown that only four processes are able to access "qseecom":

surfaceflinger (running with "system" user-ID)
drmserver (running with "drm" user-ID)
mediaserver (running with "media" user-ID)
keystore (running with "keystore" user-ID)

This means that if we manage to get a hold of any of these four processes, we would then be able to directly attack any trustlet of our choice, directly bypassing the Linux kernel in the process! In fact, this is exactly what we'll do - but we'll get to that later in the series.

For this blog post, let's assume that we already have code-execution within the "mediaserver" process, thus allowing us to directly focus on the attack surface provided by trustlets. Here's an illustration to help visualise the path of the exploit chain we'll cover during the series and the focus of this post:

Vulnerability Scope

I haven't been able to confirm the exact scope of this issue. I've statically checked quite a few devices (such as the Droid Turbo, Nexus 6, Moto X 2nd Gen), and they were all vulnerable. In fact, I believe the issue was very wide-spread, and may have affected most Qualcomm-based devices at the time.

So why was this issue so prevalent? As we'll see shortly, the vulnerability is contained in a trustlet and so does not rely on the TrustZone kernel (which tends to change substantially between SoCs), but rather on code which is designed to be able to execute in the same manner on many different devices. As such, all devices containing the trustlet were made vulnerable, regardless of their SoC.

Also note that on some devices the vulnerable code was present but appeared slightly different (it may have been an older version of the same code). Those devices are also vulnerable, although the indicators and strings you might search for could be slightly different. This means that if you're searching for the exact strings mentioned in this post and don't find them, don't be dissuaded! Instead, reverse-engineer the trustlet using the tools from the previous blog post, and check for yourself.

Enter Widevine

Previously, we decided to focus our research efforts on the "widevine" trustlet, which enables playback of DRM encrypted media using Widevine's DRM platform. This trustlet seems like a good candidate since it is moderately complex (~125KB) and very wide-spread (according to their website, it is available on over 2 billion devices).

After assembling the raw trustlet, we are left with an ELF file, waiting to be analysed. Let's start by taking a look at the function registered by the trustlet in order to handle incoming commands:

As we can see, the first 32-bit value in the command is used to specify the command code, the high-word of which is used to sort the commands into four different categories.

Taking a peek at each of the category-handling functions reveals that the categories are quite rich - all in all, there are about 70 different supported commands - great! However, going over 70 different commands would be a pretty lengthy process - perhaps we can find a shortcut that'll point us in the right direction? For example, maybe there's a category of commands that were accidentally left in even though they're not used on production devices?

Since the libraries which are used to interact with the trustlets are also proprietary, we can't look through the source code to find the answers. Instead, I wrote a small IDAPython script to lookup all the references to "QSEECom_send_cmd", the function used to send commands to trustlets, and check what the "command-code" value is for each reference. Then I simply grouped the results into the categories above, producing the following results:

So... Nobody is using 5X commands. Suspicious!

Sifting through the functions in the 5X category, we reach the following command:

Pretty straight-forward: copies the data from our request buffer into a "malloc"-ed buffer (note that the length field here is not controlled by us, but is derived from the real buffer length passed to QSEOS). Then, the function's flow diverges according to a flag in our request buffer. Let's follow the flow leading to "PRDiagVerifyProvisioning":

Finally, we found a vulnerability!

After some simple validation (such as checking that the first DWORD in the command buffer is indeed zero), the function checks the value of the fourth DWORD in our crafted command buffer. As we can see above, setting that value to zero will lead us to a code-path in which a fully-controlled copy is performed from our command buffer into some global buffer, using the third DWORD as the length argument. Since this code-path only performs the vulnerable memcpy and nothing else, it is much more convenient to deal with (since it doesn't have unwanted side-effects), so we'll stick to this code-path (rather than the one above it, which seems more complex).

Moreover, you might be wondering what is the "global buffer" that's referred to in the function above. After all, it looks a little strange - it isn't passed in to the function at any point, by is simply referred to "magically", by using the register R9.

Remember how the trustlets that we analysed in the previous blog post had a large read-write data segment? This is the data segment in which all the modifiable data of the trustlet is stored - the stack, the heap and the global variables. In order to quickly access this segment from any location in the code, Qualcomm decided to use the platform-specific R9 register as a "global register" whose value is never modified, and which always points to the beginning of the aforementioned segment. According to the ARM AAPCS, this is actually valid behaviour:

What now?

Now that we have a primitive, it's time to try and understand which pieces of data are controllable by our overflow. Again, using a short IDAPython script, we can search for all references to the "global buffer" (R9) which reside after the overflown buffer's start address (that is, after offset 0x10FC). Here are the results:

Disappointingly, nearly all of these functions don't perform any "meaningful" operations of the controllable pieces of data. Specifically, the vast majority of these functions simply store file-system paths in those memory locations, which imply no obvious way to hijack control flow.

Primitive Technology

Since there aren't any function pointers or immediate ways to manipulate control flow directly after the overflown buffer, we'll need to upgrade our buffer overflow primitive into a stronger primitive before we can gain code execution.

Going through the list of functions above, we come across interesting block of data referred to by several functions:

As you can see above, the block of 0x32 DWORDs, starting at offset 0x169C, are used to store "sessions". Whenever a client sends commands to the Widevine trustlet, they must first create a new "session", and all subsequent operations are performed using the specific session identifier issued during the session's creation. This is needed, for example, in order to allow more than a single application to decrypt DRM content at the same time while having completely different internal states.

In any case, as luck would have it, the sessions are complex structures - hinting that they may be used in order to subtly introduce side-effects in our favour. They are also within our line-of-fire, as they are stored at an offset greater than that of the overflown buffer. But, unfortunately, the 0x32 DWORD block mentioned above only stores the pointers to these session objects, not the objects themselves. This means that if we want to overwrite these values, they must point to addresses which are accessible from QSEE (otherwise, trying to access them will simply result in the trustlet crashing).

Finding Ourselves

In order to craft legal session pointers, we'll need to find out where our trustlet is loaded. Exploring the relevant code reveals that QSEOS goes to great lengths in order to protect trustlets from the "Normal World". This is done by creating a special memory region, referred to as "secapp-region", from which the trustlet's memory segments are carved. This area is also protected by an MPU, which prevents the "Normal World" from accessing it in any way (attempting to access those physical addresses from the "Normal World" causes the device to reset).

On the other hand, trustlets reside within the secure region and can obviously access their own memory segments. Not only that, but in fact trustlets can access all allocated memory within the "secapp" region, even memory belonging to other trustlets! However, any attempt to access unallocated memory within the region results in the trustlet immediately crashing.

...Sounds like we're beginning to form a plan!

We can use the overflow primitive in order to overwrite a session pointer to a location within the "secapp" region. Now, we can find a command which causes a read attempt using our poisoned session pointer. If the trustlet crashes after issuing the command, we guessed wrong (in that case, we can simply reload the trustlet). Otherwise, we found an allocated page in the "secapp" region.

But... How do we know which trustlet that page belongs to?

We already have a way to differentiate between allocated and unallocated pages. Now, we need some way to distinguish between pages based on their contents.

Here's an idea - let's look for a function that behaves differently based on the read value in a the session pointer:

Okay! This function tries to access the data at session_pointer + 0xDA. If that value is equal to one, it will return the value 24, otherwise, it will return 35.

This is just like finding a good watermelon; by "tapping" on various memory locations and listening to the "sound" they make, we can deduce something about their contents. Now we just need to give our trustlet a unique "sound" that we can identify by tapping on it.

Since we can only listen to differences between one and non-one values, let's mark our trustlet by creating a unique pattern containing ones and zeros within it. For example, here's a pattern which doesn't occur in any other trustlet:

Now, we can simply write this pattern to the trustlet's data segment by using over overflow primitive, effectively giving it its own distinct "sound".

Finally, we can repeat the following strategy until we find the trustlet:

Randomly tap a memory location in the "secapp" region:

If it sounds "hollow" (i.e., the trustlet crashes) - there's nothing there, so reload our trustlet
Otherwise, tap the sequence of locations within the page which should contain our distinct marking pattern. If it sounds like the pattern above, we found our trustlet

Of course, inspecting the allocation scheme used by QSEOS could allow us to speed things further by only checking relevant memory locations. For example, QSEOS seems to allocate trustlets consecutively, meaning that simply scanning from the end of the "secapp" region to its beginning using increments of half the trustlet's size will guarantee a successful match.

A (messy) write primitive

Now that we have a way to find the trustlet in the secure region, we are able to craft "valid" session pointers, which point to locations within the trustlet. Next up, we need to find a way to create a write primitive. So... are there any functions which write controllable data into a session pointer?

Surprisingly, nearly all functions that do write data to the session pointer do not allow for arbitrary control over the data being written. Nonetheless, one function looks like it could be of some help:

This function generates a random DWORD to be used as a "nonce", then checks if enough time elapsed since the previous time it was called. If so, it adds the random value to the session pointer by calling "addNonceToCache".

First of all, since the "time" field is saved in the global buffer after our overflown buffer, we can easily clear it using our overflow primitive, thus removing the time limitation and allowing us to call the function as frequently as we'd like. Also, note that the generated nonce's random value is written into a buffer which is returned to the user - this means that after a nonce is generated, the caller also learns the value of the nonce.

Let's take a peek at how the nonces are stored in the session pointer:

So there's an array of 16 nonces in the session object - starting at offset 0x88. Whenever a nonce is added, all the previous nonce values are "rolled over" one position to the right (discarding the last nonce), and the new nonce is written into the first location in the nonce array.

See where we're going with this? This is actually a pretty powerful write primitive (albeit a little messy)!

Whenever we want to write a value to a specific location, we can simply set the session pointer to point to that location (minus the offset of the nonces array). Then, we can start generating nonces, until the least-significant byte (this is a little-endian machine) in the generated nonce matches the byte we would like to write. Then, once we get a match, we can increment the session pointer by one, generate the next byte, and so forth.

This allows us to generate any arbitrary value with an expectancy of only 256 nonce-generation calls per byte (since this is a geometric random variable). But at what cost?

Since the values in the nonce cache are "rotated" after every call, this means that we mess-up the 15 DWORDs after the last written memory location. We'll have to work our way around that when we design the exploit.

Writing an exploit

We finally have enough primitives to craft a full exploit! All we need to do is find a value that we can overwrite using the messy write primitive, which will allow us to hijack the control flow of the application.

Let's take a look at the function in charge of handling the "6X" category of commands:

As you can see, the function calls the requested commands by using the command ID specified as an index into an array stored in the global buffer. Each supported command is represented by a 12-byte entry in the array, containing four pieces of information:

The command code (32-bits)
A pointer to the handling function itself (32-bits)
The minimal input length (16-bits)
The minimal output length (16-bits)

If this information is valid, the function pointer is executed, passing in the user's input buffer as the first argument and the output buffer as the second argument.

If we choose an innocuous 6X command, we can overwrite the corresponding entry in the array above so that its function pointer will be directed at any piece of code we'd like to execute. Then, simply calling this command will cause the trustlet to execute the code at our controlled memory location. Great!

We should be wary, however, not to choose a function which lies directly before an "important" command that we might need later. This is because our messy write primitive will destroy the following 15 DWORDs (or rather, the next 5 array entries). Let's take a look at the function which populates the entries in the command array:

There are six consecutive entries corresponding to unused functions. Therefore, if we choose to overwrite the entry directly before them, we'll stay out of trouble.

Universal Shellcode Machine

Although we can now hijack the control flow, we still can't quite execute arbitrary code within QSEE yet. The regular course of action at point would probably be to find a stack-pivot gadget and write a short ROP chain which will enable us to allocate shellcode - however, since the trustlets' code segments aren't writeable, and the TrustZone kernel doesn't expose any system call to QSEE to allow the allocation of new executable pages, we are left with no way to create executable shellcode.

So does this mean we need to write all our logic as a ROP chain? That would be extremely inconvenient (even with the aid of automatic "ROP"-compilers), and might even not be possible if the ROP gadgets in the trustlet are not Turing-Complete.

Luckily, after some careful consideration, we can actually avoid the need to write longer ROP chain. If we think of our shellcode as a Turing Machine, we would like to create a "Universal Turing Machine" (or simulator), which will enable us to execute any given shellcode as if it were running completely within QSEE.

Given a piece of code, we can easily simulate all the control-flow and logic in the "Normal World", simply by executing the code fully in the "Normal World". But what about operations which behave differently in a QSEE-context? If we think about it, there are only a few such operations:

Reading and writing memory
Calling system calls exposed by the TrustZone kernel

These operations must execute within QSEE. However, we can actually execute both of these operations in QSEE by writing one small ROP chain!

All we need is a single ROP chain which will:

Hijack control flow to a separate stack
Prepare arguments for a function call
Call the wanted QSEE function
Return the result to the user and restore execution in QSEE

As you can see, all this chain do is to enable us to execute any given QSEE function using any supplied arguments. But how can we use it to simulate the special operations?

Well, since all system-calls in QSEE have matching calling-stubs in each trustlet, we can use our ROP chain to execute any system call with ease. As for memory accesses - there is an abundance of QSEE functions which can be used as read and write gadgets. Hence, both operations are simple to execute using our short ROP chain.

This leaves us with the following model:

This also means that executing arbitrary shellcode in QSEE doesn't require any engineering effort! All the shellcode developer needs to do is to delegate memory accesses and system calls to specific APIs exposed by the exploit. The rest of the shellcode's logic can remain unchanged and execute completely in the "Normal World". We'll see an example of some shellcode using this model shortly.

Finding a stack pivot

In order to execute a ROP chain, we need to find a convenient stack-pivot gadget. When dealing with large or medium-sized applications, this is not a daunting task - there is simply enough code for us to find at least one gadget that we can use.

However, since we're only dealing with ~125KB of code, we might not be that lucky. Not only that, but at the point at which we hijack the control flow, we only have control over the registers R0 and R1, which point to the input and output buffers, respectively.

After fully disassembling the trustlet's code we are faced with the harsh truth - it seems as though there is no usable stack pivot using our controlled registers. So what can we do?

Recall that ARM opcodes can be decoded in more than one way, depending on the value of the T bit in the CPSR. When the bit is set, the processor is executing in "Thumb" mode, in which the instruction length is 16-bits. Otherwise, the processor is in "ARM" mode, with an instruction length of 32-bits.

We can easily switch between these modes by using the least-significant bit of the PC register when performing a jump. If the least-significant bit is set, the T bit will be set, and the processor will switch to "Thumb" mode. Otherwise, it will switch to ARM mode.

Looking at the trustlet's code - it seems to contain mostly "Thumb" instructions. But perhaps if we were to forcibly decode the instructions as if they were "ARM" instructions, we'd be able to find a hidden stack pivot which was not visible beforehand.

Indeed, that is the case! Searching through the ARM opcodes reveals a very convenient stack-pivot:

By executing this opcode, we will be able to fully control the stack pointer, program counter and other registers by using the values stored in R0 - which, as we saw above, points to the fully user-controlled input buffer. Great!

As for the rest of the ROP chain - it is pretty standard. In order to execute a function and return all we need to do is build a short chain which:

Sets the low registers (R0-R3) and the stack arguments to the function's arguments
Set the link register to point to the rest of our chain
Jump to the function's start address
When control is returned via the crafted LR value, store the return value in user-accessible memory location, such as the supplied output buffer
Restore the stack pointer to the original location and return to the location from which control was originally hijacked

You can find the complete ROP chain and gadgets in the provided exploit code, but I imagine it's exactly what you'd expect.

Putting it all together

At long last, we have all the pieces needed to create a fully functional exploit. Here's a short run-down of the exploit's stages:

Find the Widevine application by repeatedly "tapping" the secapp region and "listening"
Create a "messy" write gadget using the nonce-generation command
Overwrite an unused 6X command entry using the write gadget to direct it to a stack-pivot
Execute any arbitrary code using a small ROP chain under the "Universal Shellcode Machine"

The Exploit

As always, the full exploit code is available here:

https://github.com/laginimaineb/cve-2015-6639

I've also included a sample shellcode using the model described earlier. The shellcode reads a file from TrustZone's secure file-system - SFS. This file-system is encrypted using a special hardware key which should be inaccessible to software running on the device - you can read more about it here. Regardless, running within the "Secure World" allows us to access SFS fully, and even extract critical encryption keys, such as those used to decrypt DRM content.

In fact, this is all it takes:

Also, please note that there are quite a few small details that I did not go into in this blog post (for brevity’s sake, and to keep it interesting). However, every single detail is documented in the exploit's code. So by all means, if you have any unanswered questions regarding the exploit, I encourage you to take a look at the code and documentation.

What's next?

Although we have full code-execution within QSEE, there are still some things beyond our reach. Specifically, we are limited only to the API provided by the system-calls exposed by the TrustZone kernel. For example, if we were looking to unlock a bootloader, we would probably need to be able to blow the device's QFuses. This is, understandably, not possible from QSEE.

With that in mind, in the next blog post, we'll attempt to further elevate our privileges from QSEE to the TrustZone kernel!

Timeline

27.09.2015 - Vulnerability disclosed
27.09.2015 - Initial response from Google
01.10.2015 - PoC sent to Google
14.12.2015 - Vulnerability fixed, patch distributed

I would also like to mention that on 19.10.2015 I was notified by Google that this issue has already been internally discovered and reported by Qualcomm. However, for some reason, the fix was not applied to Nexus devices.

Moreover, there are quite a few firmware images for other devices (such as the Droid Turbo) that I've downloaded from that same time period that appeared to still contain the vulnerability! This suggests that there may have been a hiccup when internally reporting the vulnerability or when applying the fix.

Regardless, as Google has included the issue in the bulletin on 14.12.2015, any OEMs that may have missed the opportunity to patch the issue beforehand, got another reminder.

Bits, Please!

30/06/2016

Extracting Qualcomm's KeyMaster Keys - Breaking Android Full Disk Encryption

Setting the Stage

Android Full Disk Encryption

KeyMaster

Reversing Qualcomm's KeyMaster

Conclusions

Extracting the KeyMaster Keys

Putting it all together

The Code

Final Thoughts

15/06/2016

TrustZone Kernel Privilege Escalation (CVE-2016-2431)

05/05/2016

War of the Worlds - Hijacking the Linux Kernel from QSEE

02/05/2016

QSEE privilege escalation vulnerability and exploit (CVE-2015-6639)

The QSEE Attack Surface

Vulnerability Scope

Enter Widevine

What now?

Primitive Technology

Finding Ourselves

A (messy) write primitive

Writing an exploit

Universal Shellcode Machine

Finding a stack pivot

Putting it all together

The Exploit

What's next?

Timeline

Blog Archive