yakubin’s notes

Debugging vtable corruption with mmap

In the world of C++ programming, memory corruption bugs are common. Sometimes you can track down their source with ASAN, but more often than not ASAN is not a practical solution given real-life constraints. Here is a short demonstration of how one can instead tweak how objects of a certain type are allocated in order to catch corruption of the vtable pointer. Inspired by real-life events.

Let’s say we have a program of roughly this shape:

a.hpp

#pragma once

class A {
public:
    virtual ~A() = default;
    virtual int GetInt() const = 0;
    void PrintInt() const;
};

a.cpp

#include "a.hpp"

#include <stdio.h>

void A::PrintInt() const {
    int n = GetInt();
    printf("%d\n", n);
}

b.hpp

#pragma once

#include "a.hpp"

class B : public A {
public:
    int GetInt() const override;
};

b.cpp

#include "b.hpp"

int B::GetInt() const {
    return 42;
}

test1.cpp

#include "b.hpp"

int main() {
    A* a = new B;
    void** vtable_ptr = (void**) a;
    *vtable_ptr = (void*)0xdeadbeef;
    a->PrintInt();
    delete a;
}

In such a short program the memory corruption is trivial to find and unlikely to be made. In a real-life program, let me assure you, it is much harder to find and much easier to introduce.

To compile:

g++ -std=c++17 -Wall -Wextra -Og -g -o test1 test1.cpp a.cpp b.cpp

Now let’s run this program:

$ ./test1
zsh: segmentation fault  ./test1

A crash.

Finding the cause of the crash

Let’s run it under a debugger. Here we do it live. In reality you’ll probably only get a core dump, but in this instance a core dump would also be entirely sufficient.

$ gdb ./test1
--- GDB spam omitted ---
(gdb) r
Starting program: /home/yakubin/code/vtable-corruption/test1 

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555199 in A::PrintInt (this=0x55555556aeb0) at a.cpp:6
6       int n = GetInt();

The first step is to determine which instruction causes the segfault. To find out, we disassemble the instruction that the program counter points to:

(gdb) set disassembly-flavor intel
(gdb) x/i $pc
=> 0x555555555199 <_ZNK1A8PrintIntEv+7>:    call   QWORD PTR [rax+0x10]

Ok. It tries to call a function whose address is stored 16 bytes past the address contained in the rax register. Where does the value of rax come from? Let’s disassemble the current function up to the current instruction to get some context:

(gdb) x/3i _ZNK1A8PrintIntEv
   0x555555555192 <_ZNK1A8PrintIntEv>:  sub    rsp,0x8
   0x555555555196 <_ZNK1A8PrintIntEv+4>:    mov    rax,QWORD PTR [rdi]
=> 0x555555555199 <_ZNK1A8PrintIntEv+7>:    call   QWORD PTR [rax+0x10]

It seems that rax contains a value loaded from the address stored in rdi. This pattern – a read from memory, followed by another read from memory displaced by a constant, followed by a call (on x86 the last two are fused into one, but on load-store architectures like ARM they’re separate) – always screams to me “vtable indirection”, i.e. a virtual method call. The mov reads the address of the vtable from the memory that rdi points to. The call reads the address of a virtual method from the vtable and jumps to it.

Another piece of corroborating evidence for this hypothesis is the fact that the supposed vtable address is read through the rdi register, which, according to the Unix System V AMD64 ABI, holds the first function argument. In C++, this is implicitly passed as the first argument, and the vtable address is typically stored at the beginning of each object that has a vtable. We can check that rdi indeed holds the value of this:

(gdb) p/x this
$1 = 0x55555556aeb0
(gdb) p/x $rdi
$2 = 0x55555556aeb0

Similar heuristics can be applied on other platforms based on cursory familiarity with their ABIs.
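
The claim that the vtable address sits at the start of the object can also be checked from inside a program. The following is a minimal sketch, assuming the Itanium C++ ABI used by GCC and Clang on Linux; Shape, Circle, Square and ReadVptr are hypothetical names introduced for illustration and are not part of the program above. Two objects of the same dynamic type should share one vtable address, and objects of different types should not.

```cpp
#include <cstdint>
#include <cstring>

struct Shape {
    virtual ~Shape() = default;
    virtual int Sides() const = 0;
};

struct Circle : Shape {
    int Sides() const override { return 0; }
};

struct Square : Shape {
    int Sides() const override { return 4; }
};

// Copies the first pointer-sized word out of the object. Under the Itanium
// C++ ABI (an assumption here, not a language guarantee) that word is the
// vtable address for a class like Shape.
std::uintptr_t ReadVptr(const Shape& obj) {
    static_assert(sizeof(std::uintptr_t) == sizeof(void*), "pointer-sized read");
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, static_cast<const void*>(&obj), sizeof vptr);
    return vptr;
}
```

This is the same word that the mov instruction loads into rax, which is why overwriting the first bytes of an object is enough to break every virtual call on it.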

Finally, we can inspect what the current vtable address is:

(gdb) p/x *(void**)$rdi
$3 = 0xdeadbeef

Alternatively:

(gdb) p/x $rax
$4 = 0xdeadbeef

That’s the value we put in our code to simulate memory corruption.

Finding the memory corruption

We know some of our memory is corrupted, but we don’t know who did it. To find out, we’ll use memory permissions. We can allocate our objects in such a way that the vtable address is going to be stored in read-only memory, while everything else will be in read-write memory. This way if someone tries to overwrite it, the program will crash. So the end result is going to be the same as before, but this time the backtrace is going to show us who’s really at fault.

Memory permissions can be set with page granularity, so we’re going to need two pages for each object (assuming each object fits in a single page, which it should or else you have very big structs).
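
The page arithmetic behind this layout can be written down explicitly. This is a sketch under assumptions: ObjectOffset and FitsInTwoPages are hypothetical helper names, and the vtable address is assumed to occupy the first sizeof(void*) bytes of the object. The object starts one pointer size before the page boundary, so only its first word lies in the page that will become read-only.

```cpp
#include <cstddef>

// Offset from the start of the two-page mapping at which the object is
// placed: its first pointer-sized word ends exactly at the page boundary.
constexpr std::size_t ObjectOffset(std::size_t pagesize) {
    return pagesize - sizeof(void*);
}

// True if an object of object_size bytes placed at ObjectOffset(pagesize)
// ends within the second page of the mapping.
constexpr bool FitsInTwoPages(std::size_t pagesize, std::size_t object_size) {
    return ObjectOffset(pagesize) + object_size <= 2 * pagesize;
}

// 4096 is only an example page size; query the real one at run time.
static_assert(ObjectOffset(4096) + sizeof(void*) == 4096,
              "the first word ends at the page boundary");
static_assert(FitsInTwoPages(4096, 4096 + sizeof(void*)),
              "largest object that still fits");
static_assert(!FitsInTwoPages(4096, 4096 + sizeof(void*) + 1),
              "one byte too many");
```

One caveat of this placement: the resulting address is aligned to sizeof(void*) and no more, so a type that requires stricter alignment (for example one with a 16-byte-aligned member) would end up misaligned there.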

The code:

test2.cpp

#include <new>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#include "b.hpp"

int main() {
    long pagesize = sysconf(_SC_PAGESIZE);

    // Map 2 pages with read-write permissions.
    char* p = (char*) mmap(NULL, 2 * pagesize, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Calculate the address that is one pointer size before the end of
    // the first page.
    char* buf = p + pagesize - sizeof(void*);

    // We use placement new operator to place the object in previously allocated
    // memory, running its constructor, initialising the vtable address etc.
    A* a = new (buf) B;
    void** vtable_ptr = (void**) a;

    // Make the first page read-only after the object is constructed
    // (i.e. the vtable address is already written).
    if (mprotect(p, pagesize, PROT_READ) != 0) {
        perror("mprotect");
        return 2;
    }

    *vtable_ptr = (void*)0xdeadbeef;
    a->PrintInt();

    // Make the first page read-write again before calling the destructor.
    if (mprotect(p, pagesize, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");
        return 3;
    }

    // Calling the destructor and freeing memory.
    a->~A();
    munmap(p, 2 * pagesize);
}

Compile:

g++ -std=c++17 -Wall -Wextra -Og -g -o test2 test2.cpp a.cpp b.cpp

And this time gdb shows the line that corrupted the memory instead of the one that crashed later because of it:

(gdb) r
Starting program: /home/yakubin/code/vtable-corruption/test2 

Program received signal SIGSEGV, Segmentation fault.
0x000055555555520f in main () at test2.cpp:35
35      *vtable_ptr = (void*)0xdeadbeef;
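
In a real program you would want this for every object of the suspect type, not for a single one in main. The following is a minimal sketch of how the steps of test2.cpp might be wrapped into a pair of helpers. NewGuarded, DeleteGuarded and Widget are hypothetical names; the sketch assumes that the vtable address is the first word of the object, that the type needs no more than pointer alignment, and that the pointer passed to DeleteGuarded is exactly the one NewGuarded returned for that same type.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <new>
#include <utility>

// Constructs a T so that only its first pointer-sized word (assumed to be the
// vtable address) lies in a page that is read-only after construction.
template <class T, class... Args>
T* NewGuarded(Args&&... args) {
    static_assert(alignof(T) <= alignof(void*), "placement is only pointer-aligned");
    const std::size_t pagesize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    if (sizeof(T) > pagesize + sizeof(void*)) throw std::bad_alloc();

    void* mapping = mmap(nullptr, 2 * pagesize, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mapping == MAP_FAILED) throw std::bad_alloc();
    char* base = static_cast<char*>(mapping);

    T* obj = nullptr;
    try {
        obj = new (base + pagesize - sizeof(void*)) T(std::forward<Args>(args)...);
    } catch (...) {
        munmap(base, 2 * pagesize);
        throw;
    }
    // The vtable address has been written; freeze the first page.
    if (mprotect(base, pagesize, PROT_READ) != 0) {
        obj->~T();
        munmap(base, 2 * pagesize);
        throw std::bad_alloc();
    }
    return obj;
}

// Destroys an object created by NewGuarded<T> and unmaps its two pages.
template <class T>
void DeleteGuarded(T* obj) {
    const std::size_t pagesize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    char* base = reinterpret_cast<char*>(obj) + sizeof(void*) - pagesize;
    // A destructor may rewrite the vtable address, so restore write access first.
    mprotect(base, pagesize, PROT_READ | PROT_WRITE);
    obj->~T();
    munmap(base, 2 * pagesize);
}

// Hypothetical polymorphic type used only to exercise the helpers.
struct Widget {
    explicit Widget(int v) : value(v) {}
    virtual ~Widget() = default;
    virtual int Get() const { return value; }
    int value;
};
```

A pointer obtained this way must not be passed to delete, since the memory does not come from operator new. Overloading the class-level operator new looks tempting, but it runs before the constructor, so on its own it has no point at which to call mprotect after the vtable address has been written. Two pages and several system calls per object are expensive, so this is a debugging aid for one suspect type rather than a general-purpose allocator.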

Watchpoints

You could theoretically use GDB watchpoints for the same purpose, but:

  1. Each CPU supports only a limited number of hardware watchpoints – e.g. amd64 supports 4. Once you set one more watchpoint, GDB emulates watchpoints in software, single-stepping the program one instruction at a time. This makes reproduction of the issue unlikely, since the program becomes too slow to be usable long before the corruption happens. So if we don’t know ahead of time which of a large number of objects of a given type is going to have its vtable pointer corrupted, this is not practical. (There is the rr reverse debugger, but its range of supported hardware platforms is so narrow that it might as well not exist.)
  2. To set watchpoints, you need to be live-debugging the issue. In reality you’re probably not even present when the issue is reproduced and all you get is a core dump. With the approach presented here, on the other hand, you don’t need to be present during reproduction, and the device where the issue is reproduced doesn’t even need to have gdb or gdbserver installed. Your level of interference with the environment is generally minimal.