Effectively bypassing kptr_restrict on Android

In this blog post, we'll take a look at a few ways I've discovered to bypass kptr_restrict on Android, allowing for easier exploitation of vulnerabilities that require some knowledge of the virtual addresses at which the kernel is loaded. But first, for those of you who aren't familiar with the "protection" offered by kptr_restrict, let's get you up to speed on the subject.

What's kptr_restrict?

As we've seen in the previous blog post, sometimes exploits require knowledge of internal kernel pointers - either in order to hijack them, or in order to corrupt them in a controllable manner.

This fact has been known for quite some time - enough time, in fact, for it to be addressed directly. The Linux kernel contains a feature which enables it to filter out such addresses in order to avoid leaking them to a potential attacker. This configurable feature is called "kptr_restrict", and has been present in the Android kernel source tree for at least two years.

As with nearly all configurable kernel parameters, there is a special file which controls how this feature behaves when filtering kernel addresses. In the case of kptr_restrict, the file resides in "/proc/sys/kernel/kptr_restrict", but has some daunting permissions set:

Essentially, only root can modify its value, but any user can read it.

So how does kptr_restrict work? Well, first of all, kernel developers needed a way to mark kernel pointers as such, whenever those are outputted. This is achieved by using a new format specifier, "%pK", which is used to denote that the value written into that specifier contains a kernel pointer, and as such, should be protected.

There are three different values which control the protection offered by kptr_restrict:
  • 0 - The feature is completely disabled
  • 1 - Kernel pointers which are printed using "%pK" are hidden (replaced with zeroes), unless the user has the CAP_SYSLOG capability, and has not changed their UID/GID (to prevent leaking pointers from files opened before dropping permissions).
  • 2 - All kernel pointers printed using "%pK" are hidden
The default value of this configuration is chosen when building the kernel (via CONFIG_SECURITY_KPTR_RESTRICT), but for all modern Android devices that I've ever encountered, this value is always set to "2".
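
To make the three modes concrete, here is a minimal Python sketch that reads the setting and describes it. The helper names (describe_kptr_restrict, read_kptr_restrict) are my own and purely illustrative; the only thing taken from the text above is the file path and the meaning of each value.

```python
from pathlib import Path

# Meanings of the three values, as described above.
KPTR_RESTRICT_MODES = {
    0: "disabled: %pK pointers are printed as-is",
    1: "%pK pointers are zeroed unless the reader has CAP_SYSLOG",
    2: "%pK pointers are always zeroed",
}


def describe_kptr_restrict(raw: str) -> str:
    """Turn the raw file contents (e.g. "2\\n") into a short description."""
    value = int(raw.strip())
    return KPTR_RESTRICT_MODES.get(value, f"unknown value {value}")


def read_kptr_restrict(path: str = "/proc/sys/kernel/kptr_restrict") -> str:
    """Read the current setting; the file is readable by any user."""
    return describe_kptr_restrict(Path(path).read_text())
```

Calling read_kptr_restrict() from any unprivileged process on the device should be enough to learn which mode is active, since reading the file needs no special permissions.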

However - how many kernel developers actually know of the need to protect kernel pointers by using "%pK"? This can be easily answered by grepping the kernel for this format string. The answer is, as expected, quite sad:

Merely 35 times (in 23 files) within the entire kernel source code. Needless to say, kernel pointers are very often printed using the "normal" pointer format specifier, "%p" - a simple search shows many hundreds of such uses.
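
If you'd like to repeat the count on your own kernel tree, something along these lines should do. This is a rough sketch, not the exact search I ran: the function name is made up, it only looks at .c and .h files, and it treats "%p" followed by a letter or digit as one of the extended pointer formats rather than a plain "%p".

```python
import re
from pathlib import Path

PK_PATTERN = re.compile(r"%pK")
# Plain "%p" only: "%p" followed by a letter or digit is one of the
# extended specifiers (including "%pK") and is not counted here.
PLAIN_P_PATTERN = re.compile(r"%p(?![A-Za-z0-9])")


def count_pointer_specifiers(root: str) -> dict:
    """Count "%pK" and plain "%p" uses in the .c and .h files under root."""
    counts = {"%pK": 0, "%p": 0, "files_with_%pK": 0}
    for path in Path(root).rglob("*"):
        if path.suffix not in (".c", ".h") or not path.is_file():
            continue
        text = path.read_text(errors="replace")
        pk_hits = len(PK_PATTERN.findall(text))
        counts["%pK"] += pk_hits
        counts["%p"] += len(PLAIN_P_PATTERN.findall(text))
        if pk_hits:
            counts["files_with_%pK"] += 1
    return counts
```

The numbers will naturally differ between kernel versions and vendor trees, so treat the result as an indication of the ratio rather than an exact figure.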

So now that we've set the stage, let's see why the protection offered by kptr_restrict is insufficient on its own.

Method #1 - Getting dmesg from shell

All log messages printed by the kernel are written to a circular buffer held within the kernel's memory. Users may read from this buffer by invoking the "dmesg" (display message) command. This command actually accesses the buffer by invoking the syslog system call, as you can see from this strace output:

However, the syslog system call can't be accessed by just any user - specifically, the caller must either possess the extremely powerful CAP_SYS_ADMIN capability, or the weaker (and more specific) capability of CAP_SYSLOG.
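
A quick way to see whether a given process holds either capability is to look at the "CapEff" line of its /proc/<pid>/status file. The sketch below does just that; the function names are illustrative, and the bit numbers (21 for CAP_SYS_ADMIN, 34 for CAP_SYSLOG) are the ones defined in the kernel's capability header.

```python
CAP_SYS_ADMIN = 21  # bit numbers from <linux/capability.h>
CAP_SYSLOG = 34


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def can_read_kernel_log(cap_eff: int) -> bool:
    """True if the mask holds CAP_SYSLOG or CAP_SYS_ADMIN."""
    return bool(cap_eff & ((1 << CAP_SYSLOG) | (1 << CAP_SYS_ADMIN)))
```

For example, can_read_kernel_log(parse_cap_eff(open("/proc/self/status").read())) tells you whether the current process passes the capability check described above. It says nothing about any other policy (such as SELinux) that may also apply.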

Either way, most Android processes do not, in fact, have these capabilities, and therefore can't access the kernel log. Or can they? :)

Recall that within Android, the "init" process maintains a list of "services" which can be started or stopped as needed. These services are loaded by "init" upon boot, from a hard-coded list of configuration files, which are almost always stored on the root (read-only) partition, and are therefore read-only.

The configuration files are actually written using a language specific to Android, called the "Android Init Language". This language is pretty simple and easy to use, and allows full control over the permissions with which services are launched (UID/GID) as well as their parameters and "type" (for more information about the language itself, check out the link above).

Another feature of Android is "system properties" - these are key-value pairs maintained by the "property service", which also runs as a thread within the init process. This service allows basic access-control on various "sensitive" system properties, which prevents users from freely modifying any property they please.

These access-permissions for most properties used to be (until Android 4.4) hard-coded within the property service (since Android 5, the permissions are handled by using SELinux labels instead):

However, some properties get special treatment, namely - the "ctl.start" and "ctl.stop" system properties, which are used to either start or stop system services (defined, as mentioned before, using the "Android Init Language").

These properties are checked strictly using SELinux labels, in order to make sure that the privilege of modifying the status of system services is reserved strictly to certain users.

But here comes the surprising part - when connecting locally to the device using "adb" (Android Debug Bridge), we gain execution as the "shell" user. This user is always permitted to start and stop one particular service - "dumpstate". Actually, this is used by a feature offered by the "adb" command-line utility, which enables developers to create bug reports containing full information from the device.

Running "adb" with this command-line argument (or simply executing "bugreport" from the adb shell), actually starts the "dumpstate" service by setting the "ctl.start" system property:

So let's take a look at the configuration for the "dumpstate" service:

Since the service has no "user" or "group" configurations, it is actually executed with the root user-ID and group-ID, which could be quite dangerous...

Luckily, the developers of the service were well aware of the potential security risks of running with such high privileges, and therefore immediately after starting, the service drops them by modifying its user-ID, group-ID and capabilities, like so:

In short, the service sets the user and group IDs to those of the shell user, but makes sure that it keeps the CAP_SYSLOG capability explicitly.

Reading on reveals that "dumpstate" actually reads the kernel log using the syslog system call (which it is capable of executing since it has the CAP_SYSLOG capability), and writes the contents read back to the caller. Essentially, this means that within the context of the "adb shell", we can freely read the kernel log simply by executing the "bugreport" program. Nice.

However, this still doesn't solve the problem of getting needed symbols for exploits - since, as mentioned earlier, these symbols should generally be printed using the "%pK" format specifier, which means they would appear "censored" in the kernel log.

But alas, most pointers within the kernel are certainly not printed using the special format specifier, but instead use the regular "%p" format, and are therefore left uncensored. This means that the kernel log is typically a treasure trove of useful kernel pointers.
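
Sifting through a captured log by hand gets old quickly, so here is a small sketch that pulls out anything that looks like a kernel pointer. It assumes a 32-bit ARM device with the kernel mapped at 0xC0000000 and above, and it will happily report any 8-digit hex word in that range, so expect a few false positives. The function name is hypothetical.

```python
import re

# Assumption: 32-bit ARM with the kernel mapped at 0xC0000000 and above.
KERNEL_BASE = 0xC0000000
HEX_WORD = re.compile(r"\b(?:0x)?([0-9a-fA-F]{8})\b")


def find_kernel_pointers(log_text: str) -> list:
    """Return (line, address) pairs for 8-digit hex values in kernel space."""
    hits = []
    for line in log_text.splitlines():
        for match in HEX_WORD.finditer(line):
            address = int(match.group(1), 16)
            if address >= KERNEL_BASE:
                hits.append((line, address))
    return hits
```

Feeding it the kernel log section of a bug report gives a list of candidate addresses together with the log line each one came from, which is usually enough context to tell what the pointer refers to.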

For example, when the kernel boots, the memory map of the kernel's different segments is printed, like so:

Now, assuming there's a single symbol we would like to find, we could simply dump the list of all kernel symbols using the virtual file containing all the symbols - /proc/kallsyms. When kptr_restrict is enabled, the list returned by kallsyms is censored (since it is printed using "%pK"), and therefore won't show any kernel pointers.

Censored symbols from kallsyms  

However, the symbols returned by kallsyms are ordered by their addresses, even if those addresses aren't shown, so a symbol's position in the list still reveals where it lies relative to its neighbours. Moreover, locating a symbol is made easier by the fact that each segment is prefixed and postfixed by specially named marker symbols:

Segment Name    Start Marker    End Marker
.text           _text           _etext
.init           __init_begin    __init_end
.data           _sdata          _edata
.bss            __bss_start     __bss_end
    We can then use this list to deduce the location of different symbols by simply counting the number of symbols from the start or end marker to our wanted symbol, while adding up the sizes of each of the symbols encountered.
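
The counting approach can be sketched as follows. Note the assumptions: the per-symbol sizes are not part of kallsyms, so the "sizes" dictionary here is a hypothetical input you'd have to obtain elsewhere (for instance from a matching kernel build), and alignment padding between symbols is ignored. Both function names are my own.

```python
def symbols_from_marker(kallsyms_text: str, marker: str, target: str) -> list:
    """List the symbol names from marker up to (not including) target.

    Relies only on the ordering of /proc/kallsyms, so it works even when
    every address column reads 00000000.
    """
    names = [line.split()[2] for line in kallsyms_text.splitlines()
             if len(line.split()) >= 3]
    start = names.index(marker)
    end = names.index(target, start)
    return names[start:end]


def estimate_address(marker_address: int, preceding: list, sizes: dict) -> int:
    """Add the size of every symbol preceding the target to the marker's address."""
    return marker_address + sum(sizes[name] for name in preceding)
```

The marker's own address comes from the kernel log (the memory map printed at boot), and marker symbols themselves take up no space, so their size is zero.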

    Another technique would be to cause a wanted kernel pointer to be written to the kernel log. For example, on Qualcomm-based devices (based on the "msm" kernel), whenever the video device is opened, the kernel virtual address of the video device is written to the kernel log:

    msm_vidc_open leaks the pointer to the kernel log

    Method #2 - Retrieving the kernel symbols statically

    Why use this method?
    In many cases, although the device itself is accessible, it may be heavily locked down - for example, in extreme cases, adb access may be disabled (however poorly), which would complicate the usage of the first method (unless we manage to gain shell access). In this case, we may wish to build the complete list of kernel symbols from the kernel image itself, statically, without interacting directly with the device.

    Also, since KASLR (Kernel Address Space Layout Randomization) is currently still unused in Android devices, there is no need to consider any kind of runtime modification to the location of the symbols present in the kernel image. This means that the kernel image must contain all the information needed to build the complete list of symbols, including their addresses, exactly as they would appear on a real "live" device.

    How do I get a kernel image?

    Assuming you have full access to a live device, you could read the kernel image directly from the MMC, via /dev/block. However, in most cases reading the MMC blocks directly requires root permissions, which would make this method pretty pointless, since with root access we could already disable kptr_restrict.

    The more reasonable path to obtaining the kernel image would be to simply download the firmware file for your particular device, and unpack it. Many tools enable firmware unpacking for different devices (for example, I wrote a script to unpack the Nexus 5's bootloader - here), and they are typically a google-search away.

    Just one word of caution - make sure you download the exact kernel image matching the kernel on your device. You can find the running kernel's version by simply running "uname -a":

    I have the image - now what?

    In order to understand how to extract the full symbol list from a kernel image, we must first inspect the way in which a kernel image is built. Looking over the code reveals that, as part of the build process, a special program is used to emit the needed symbols into the kernel's image in a special format. This program receives the symbol map containing the location of each kernel symbol in the kernel's virtual address space, and outputs an assembly file containing the compressed symbol table, which is assembled into the resulting kernel image.

    This means that all we need to do in order to rebuild this table from a raw kernel binary is to understand the exact format in which this symbol table is written. However, for a normally compiled kernel with no additional symbols, this turns out to be a little tricky.

    Since the labels written by the script are not visible in the resulting kernel binary, the first thing we'd have to solve is how to find the beginning of the symbol table within the binary. Luckily, the solution turns out to be pretty simple - remember when we previously had a look at the symbol table from kallsyms? The first two symbols were marker symbols pointing to the beginning of the kernel's text segment. Since the kernel's code is loaded at a known address (typically, 0xC0008000), we can search for this value appearing at least twice consecutively within the binary, and attempt to parse the symbol table's structure starting at that address.
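
As a rough illustration of that search (and not the code of my actual script), the following sketch scans a raw image for the text base appearing several times in a row. It assumes a little-endian, 32-bit kernel image that has already been decompressed, and that the table is word-aligned within it; the function name is made up.

```python
import struct


def find_table_candidates(image: bytes, text_base: int = 0xC0008000,
                          repeats: int = 2) -> list:
    """Return word-aligned offsets where text_base appears `repeats` times in a row.

    Assumes a little-endian, 32-bit kernel image that is already decompressed.
    """
    needle = struct.pack("<I", text_base) * repeats
    offsets = []
    position = image.find(needle)
    while position != -1:
        if position % 4 == 0:
            offsets.append(position)
        position = image.find(needle, position + 1)
    return offsets
```

Each returned offset is only a candidate: the value may show up elsewhere in the binary, so every hit still needs to be validated by actually parsing the table that follows it.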

    Going over the symbol table itself reveals that it is terminated by a NULL address. Then, immediately following the symbol table, the actual number of symbols is written, which means we can easily verify that the table is actually well-formed.
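
The well-formedness check can be sketched like this, following the layout exactly as described above: 32-bit little-endian addresses, a NULL terminator, and the symbol count stored right after it. Real images may contain alignment padding that this simplified version doesn't handle, and the function name is hypothetical.

```python
import struct


def parse_address_table(image: bytes, offset: int) -> list:
    """Read 32-bit little-endian addresses from offset until a NULL entry,
    then check them against the symbol count stored right after it."""
    addresses = []
    while True:
        (value,) = struct.unpack_from("<I", image, offset)
        offset += 4
        if value == 0:
            break
        addresses.append(value)
    (count,) = struct.unpack_from("<I", image, offset)
    if count != len(addresses):
        raise ValueError(f"expected {count} symbols, parsed {len(addresses)}")
    return addresses
```

A candidate offset that raises an error here (either ValueError for a mismatched count, or struct.error for running off the end of the image) is simply not the symbol table, and the search can move on to the next one.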

    Then, two tables of "markers" and "symbols" are written into the file. This is done in order to compress the symbol names within the table, and by doing so reduce the size of the kernel binary. The compression maps each of the 256 most used substrings (which are called tokens) to a single byte value. Then, each symbol's name is compressed into a pascal-style string of bytes (meaning, a byte marking the length of the string, followed by the string's bytes). Each byte in the compressed name maps to a single token, which in turn corresponds to a single "most commonly used" substring. Putting it together, it looks like this:

    According to kernel developers, this usually produces a compression ratio of about 50%.
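
To illustrate the token scheme, here is a minimal sketch of expanding a single compressed name. It assumes the token table has already been extracted into a list of 256 strings (how that table is located in the binary is not shown), and the function name is my own.

```python
def decode_name(table: bytes, offset: int, tokens: list) -> tuple:
    """Decode one length-prefixed compressed name starting at offset.

    Returns the expanded name and the offset of the next entry.
    `tokens` maps each byte value (0-255) to its substring.
    """
    length = table[offset]
    indices = table[offset + 1 : offset + 1 + length]
    name = "".join(tokens[index] for index in indices)
    return name, offset + 1 + length
```

Because each call returns the offset of the next entry, walking the whole name table is just a matter of calling decode_name in a loop, once per symbol.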

    I've written a python script which, given a raw kernel binary, extracts the full symbol table from the binary, in the exact same format as it is written within kallsyms. You can find it here. Please let me know if you find the script useful!

    Method #3 - Finding information disclosures within the kernel

    This is the "classical" method which is commonly used in order to bypass the restrictions imposed by kptr_restrict. For a remote attacker wishing to target a wide variety of devices, it is quite often the best choice, since:
    • The first method typically requires shell access to the device, in order to execute the "bugreport" service
    • The second method requires you to obtain the kernel image, which could be tiresome to do for a very wide variety of devices
    Sadly, it appears that kernel developers are far less aware of the possible risks of leaking kernel pointers than they are of other (e.g., memory corruption) vulnerabilities.

    As a result, finding a kernel memory leak is usually a very short and simple task. To prove this point, after poking around for five minutes on a live device, I've come across such a leak, which is accessible from any context.

    Whenever a socket is opened within Android, it is tagged using a netfilter driver called "qtaguid". This driver accounts for all the data sent or received by every socket (and tag), and allows some restrictions to be placed on sockets, based on the tag assigned to them. Android uses this feature in order to account for data usage by the device. The actual per-process breakdown is done by assigning each process a specific tag, and monitoring the data used by the process based on that tag.

    The driver also exposes a control interface, with which a user can query the current sockets and their tags, along with the user-ID and process-ID from which the socket has been opened. This control interface is facilitated by a world-accessible file, under /proc/net/xt_qtaguid/ctrl.

    However, reading this file reveals that it actually contains the kernel virtual address of each of the sockets, completely uncensored:

    Looking at the source code for the virtual file's "read" implementation reveals that the address is written without using the special "%pK" format specifier:

    For those interested - the actual pointer written is to the "sock" structure, which is the kernel structure containing the actual "socket" structure, which in turn contains all the function pointers to the operations within this socket.

    This means that if, for example, we have a vulnerability that allows us to overwrite a specific kernel address (like the vulnerability presented in the previous blog post), we could simply:
    • Open a socket and tag it with "qtaguid"
    • Look for the socket's address within /proc/net/xt_qtaguid/ctrl
    • Overwrite the pointer to the "socket" structure with an address within our address-space
    • Populate the overwritten address with a dummy "socket" structure containing fully controlled function pointers
    • Perform any operation on the socket (like closing it), in order to cause the kernel to execute our own code
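
The second step above (looking up the socket's address) could be sketched as follows. The line layout in the comment is an assumption about what the control file prints; the parser only relies on the "sock=" and "pid=" fields being present. The function name is hypothetical, and tagging the socket itself is not shown.

```python
import re

# Assumed line layout: "sock=<hex> tag=0x<hex> (uid=<n>) pid=<n> f_count=<n>"
SOCK_LINE = re.compile(r"sock=([0-9a-fA-F]+)\s.*?\bpid=(\d+)")


def socket_addresses_for_pid(ctrl_text: str, pid: int) -> list:
    """Return the kernel addresses of the tagged sockets owned by pid."""
    addresses = []
    for line in ctrl_text.splitlines():
        match = SOCK_LINE.search(line)
        if match and int(match.group(2)) == pid:
            addresses.append(int(match.group(1), 16))
    return addresses
```

In practice you would pass in the contents of /proc/net/xt_qtaguid/ctrl together with os.getpid(), right after tagging a freshly opened socket, so that the only address returned is the one you're after.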

    Summing it all up

    Just like any other mitigation, kptr_restrict adds a layer of defence which can sometimes slow down an attacker, but is generally not a show-stopper for anyone determined enough. However, unlike most other mitigations, kptr_restrict requires the cooperation of kernel developers to be effective. Right now, things aren't so great. Hopefully this changes :)


    1. Great article! I was unaware that it protected ANY use of "%pK". I just thought it zero'd the output of /proc/kallsyms. I suppose an LKM invoking kallsyms_lookup_name() will get the real symbol address no matter the value of kptr_restrict?

      1. Yup - kallsyms_lookup_name() would succeed, and trying to remove that functionality could break some device drivers that use it. However, getting the symbol table is easy once you have the kernel image (using static_kallsyms), which you certainly do if you're running in the kernel.

    2. Haha cool I just went in to ktpr_restrict and modified the value to 0 to disable it. I do have root but I didn't know if it was write protected against root users :) Nice Article

    3. Great work, I've been looking everywhere for something like your script (too lazy to make one myself)
      I forked my own version of the script with some added fixes and 64 bit support, you can check it out at https://github.com/omershv/static_kallsyms