I’ve been on a sabbatical this academic year, and my goal is to understand the state of the art in exploitation and vulnerability analysis by doing it myself, which I expanded on previously.
This post describes the first vulnerability that I found in XNU, the kernel of the operating systems that run a number of Apple products, including Macs, iPhones, iPads, and lots of other i-devices really.
The vulnerability is a 20-year-old use-after-free in XNU's ndrv.c, which can be triggered by a root user who creates an AF_NDRV socket.
I learned a ton through identifying the root cause, working out the fix, and creating a proof-of-concept that triggers the vulnerability.
And it was quite cool to see my name on the security notes.
Root Cause
An attacker with root privileges can create a dangling pointer in the nd_multiaddrs linked list: the entry's memory is freed, but the entry is never removed from the linked list.
In ndrv.c:ndrv_do_remove_multicast, the nd_multiaddrs linked list is iterated over with a for loop to remove the entry ndrv_entry from the list.
Now, consider a struct ndrv_multiaddr* linked list of the following form:
A -> B -> NULL
where A is nd_multiaddrs (the head of the list) and B is ndrv_entry (the entry that we are deleting).
The start of the for loop sets cur to cur->next, and the if condition in the for loop compares cur->next to ndrv_entry in order to remove ndrv_entry from our list.
In our example, this sets cur = B at the start of the for loop and then tests NULL == B (because cur->next is NULL), so the if condition never triggers.
Thus, even though B is freed after this loop (in the call to ndrv_multiaddr_free), the nd_multiaddrs linked list in our example still looks like:
A -> B -> NULL
where B is now a dangling pointer to freed memory.
The conditions for triggering this vulnerability are that there must be at least two elements on the nd_multiaddrs list and that the second element in the list is the one removed.
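To make the walkthrough concrete, here is a minimal user-space sketch of the removal loop as described above. It is not the XNU source: struct node, buggy_remove, and list_has_dangling are hypothetical names, the head-of-list special case is an assumption inferred from the fact that single-element lists behave correctly, and a freed flag stands in for the call to ndrv_multiaddr_free so that the demo never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ndrv_multiaddr; only the link matters. */
struct node {
    struct node *next;
    bool freed;   /* set instead of calling free(), so the demo stays well-defined */
};

/* Removal loop as described in the text: the for loop starts at cur->next. */
void buggy_remove(struct node **head, struct node *entry)
{
    struct node *cur = *head;

    assert(cur != NULL && entry != NULL);
    if (cur == entry) {
        /* Assumed head special case: removing the first element works. */
        *head = cur->next;
    } else {
        for (cur = cur->next; cur != NULL; cur = cur->next) {
            /* Never true when entry is the second node: cur already is entry. */
            if (cur->next == entry) {
                cur->next = cur->next->next;
                break;
            }
        }
    }
    entry->freed = true;   /* models the later call to ndrv_multiaddr_free */
}

/* Returns true if a node marked as freed is still reachable from head. */
bool list_has_dangling(const struct node *head)
{
    for (const struct node *cur = head; cur != NULL; cur = cur->next) {
        if (cur->freed) {
            return true;
        }
    }
    return false;
}
```

Building the list A -> B -> NULL, calling buggy_remove(&head, &B), and then calling list_has_dangling(head) reports that B is still reachable from A even though it has been marked as freed.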
Real Root Cause
One of the things that I love about discovering vulnerabilities is trying to put myself in the shoes of the developer to understand why the bug occurred.
I can completely relate to the developer here: it took me a while of walking through the code to even believe that there was a vulnerability.
At first glance, everything looks fine.
The other aspect to consider is the conditions that have to occur for the bug to be triggered.
Usually an off-by-one error (which is essentially what this is) would be caught by the developer during normal testing, because the system wouldn’t do what it was supposed to do.
However, this bug does not trigger when the list has one element; it only occurs if you delete a specific item in a list that has more than one item.
Therefore, I can completely relate to the developer making this mistake and not noticing this bug.
My Proposed Fix
A fix for the vulnerability is to not advance the cur pointer before entering the loop, so that the search for ndrv_entry starts at the head of the list.
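A minimal sketch of that change, written as a user-space model rather than a kernel patch. struct node and fixed_remove are hypothetical names, and the head-of-list special case is assumed rather than taken from the source. The only difference from the loop described in the Root Cause section is that the for loop no longer begins with cur = cur->next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ndrv_multiaddr; only the link matters. */
struct node {
    struct node *next;
};

/* Unlinks entry from the list; the caller is expected to free it afterwards. */
void fixed_remove(struct node **head, struct node *entry)
{
    struct node *cur = *head;

    assert(cur != NULL && entry != NULL);
    if (cur == entry) {
        /* Assumed head special case. */
        *head = cur->next;
        return;
    }
    /* cur starts at the head, so the head's successor is examined too. */
    for (; cur != NULL; cur = cur->next) {
        if (cur->next == entry) {
            cur->next = cur->next->next;
            break;
        }
    }
}
```

With the list A -> B -> C -> NULL, removing B now leaves A -> C -> NULL, which is the case the original loop missed.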
The Real Fix
Looking at the patched version, it seems that something similar was done, with the deletion logic abstracted into a function, ndrv_cb_remove_multiaddr.
It is fascinating to learn how the developers fix these underlying bugs. Beyond the bug fix itself, I noticed two interesting additions here:
1. Abstracting the functionality of removing a ndrv_multiaddr into a single function, ndrv_cb_remove_multiaddr. In addition to being good software development practice, doing so will help prevent future bugs, because developers have a single function to call to delete a ndrv_multiaddr rather than doing it themselves (and introducing another bug).
2. The developers also added an ASSERT(removed); at the end of the function. This is important because it essentially encodes the security requirements of the function into the ASSERT statement. If future developers change functionality here, it is unlikely that the bug will be reintroduced (although it might not be clear to future developers why a function that attempts to remove from a linked list should never fail, so perhaps they would remove the assertion).
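The two additions discussed above, a single removal helper and an assertion that the removal really happened, can be sketched in user-space C as follows. This is not Apple's implementation: list_remove and struct node are hypothetical names, the standard assert macro stands in for the kernel's ASSERT, and the pointer-to-pointer walk is a design choice of this sketch that avoids a separate head case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ndrv_multiaddr; only the link matters. */
struct node {
    struct node *next;
};

/* Hypothetical helper: the one place that unlinks an entry from the list. */
void list_remove(struct node **head, struct node *entry)
{
    bool removed = false;

    /* link points at whichever pointer currently leads to the node under test. */
    for (struct node **link = head; *link != NULL; link = &(*link)->next) {
        if (*link == entry) {
            *link = entry->next;
            removed = true;
            break;
        }
    }
    /* The caller frees entry next, so failing to unlink it must not pass silently. */
    assert(removed);
}
```

Because the caller frees the entry right after this helper returns, a silent failure to unlink is exactly what produces a dangling pointer; the assertion turns that failure into an immediate, visible error in builds where assertions are enabled.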
Affected Versions
From what I can tell, the vulnerable code was introduced in XNU 344, which shipped with Mac OS X Jaguar (10.2).
The bug has therefore been present since ~2002, which makes this a 20-year-old bug!
First commit: https://github.com/apple-oss-distributions/xnu/blob/fad439e77835295998e796a2547c75c42f4bc623/bsd/net/ndrv.c#L1054
POC
The POC that I wrote to trigger this bug uses close on the socket to call ndrv_do_detach and then ndrv_remove_all_multicast, which dereferences the dangling object.
Vulnerability Discovery
I’m not going to say much publicly at the moment as to how I found this vulnerability, because I’m still using it to find bugs.
I’ll release everything toward the end of my sabbatical, but for now it suffices to say that fuzzing techniques are amazing for finding these types of tricky corner cases.