Recovering Symbols from .NET AOT Compiled Binaries

Hey space travelers 🚀

Today we’ll be forensically recovering function symbols from an AOT compiled .NET binary.

I was feeling inspired by some posts on this topic to explore what a simple, repeatable symbol recovery technique could look like.

Background

.NET AOT binaries have gained traction with developers and bad-faith actors alike. In addition to the full power of the .NET framework, they offer malware developers the following:

  1. Additional difficulty reverse engineering samples

    • With thousands of framework methods and native data structures the difficulty of separating runtime from true application logic increases
  2. New avenues for stealthiness / detection bypass

If symbols for these binaries can be recovered it drastically speeds up the time it takes to perform an investigation. This post aims to demonstrate how symbol recovery can be performed in a relatively painless and repeatable way.

A Target Application

I’ll be using challenge #7 “fullspeed” from this year’s Flare-On competition as a target application. It’s a solid representation of a reverse engineering task made more difficult by AOT compilation.

Opening the application in the analysis tool of your choice reveals thousands of symbol-stripped framework/library methods, with only a few narrow touchpoints (e.g. intermodular calls) that can be used to identify core application logic.

Discovery

Using Sysinternals’ strings to search both wide and ASCII strings in the application reveals its library dependencies. Searching for “.NETCoreApp” fast-tracks us to the embedded assembly information.

In this case the dependency that would be the greatest aid to reverse engineering the sample is “BouncyCastle”.
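If you'd rather script the discovery step than run strings by hand, here is a minimal Python sketch of the same idea. It assumes printable ASCII and UTF-16LE runs are enough to surface the assembly names; `find_strings`, `find_dependency_hints`, and the toy `sample` bytes are hypothetical stand-ins, not part of any tool used in this post.

```python
import re


def find_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return printable ASCII and UTF-16LE ("wide") strings found in data."""
    ascii_re = re.compile(rb'[\x20-\x7e]{%d,}' % min_len)
    wide_re = re.compile(rb'(?:[\x20-\x7e]\x00){%d,}' % min_len)
    found = [m.group().decode('ascii') for m in ascii_re.finditer(data)]
    found += [m.group().decode('utf-16-le') for m in wide_re.finditer(data)]
    return found


def find_dependency_hints(data: bytes, needles: tuple[str, ...]) -> list[str]:
    """Keep only the strings that mention one of the needles."""
    return [s for s in find_strings(data) if any(n in s for n in needles)]


# Toy stand-in for the bytes of a real binary (hypothetical data)
sample = (
    b'\x00\x01.NETCoreApp,Version=v8.0\x00\xff'
    + 'BouncyCastle.Cryptography'.encode('utf-16-le')
    + b'\x00\x00'
)
hints = find_dependency_hints(sample, ('.NETCoreApp', 'BouncyCastle'))
```

Against a real sample you would pass the file contents (for example `open(path, 'rb').read()`) instead of the toy bytes.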

Recovery Strategy

With the target framework and dependencies revealed by string searching, we can take the following familiar high-level approach to symbol recovery:

  1. Compile a reference binary with the same dependencies and runtime as our target sample
    • Also take note of whether the target was published in Release mode; the build configuration needs to match as well
  2. Enumerate the symbols of our reference binary and create a function signature for each
  3. Disassemble the target sample and search it for instances of our function signatures

Creating our Symbol Binary

I used the following automation to build a binary containing the symbols and function signatures I needed:

# .\Setup.ps1
$deps = @("BouncyCastle.Cryptography")

if (Test-Path BuildStub) {
    Remove-Item -Recurse -Force BuildStub
}
dotnet new console --output BuildStub --framework net8.0

$file = (Get-Content BuildStub/BuildStub.csproj) -as [Collections.ArrayList]
$file.Insert($file.Count - 2, "`r`n<PropertyGroup><PublishAot>true</PublishAot></PropertyGroup>")
foreach ($dep in $deps) {
    $file.Insert($file.Count - 2, "`r`n<ItemGroup><TrimmerRootAssembly Include=`"${dep}`" /></ItemGroup>")
}
$file | Set-Content BuildStub/BuildStub.csproj

Push-Location BuildStub
foreach ($dep in $deps) { 
    dotnet add package $dep
}
dotnet publish -c Release -r win-x64
Pop-Location

🔔 Nota Bene

  • An arbitrary number of dependencies can be added to $deps to capture symbols from many libraries at one time
  • The script inserts 2 critical pieces of config into the project file:
    • <PublishAot>true</PublishAot> which sets the project to AOT publish mode
    • <TrimmerRootAssembly Include="${dep}" /> which ensures that library dependencies are not optimized out of the build
      • This is something I’ve seen folks get majorly slowed down by. Configuring libraries as root assemblies lets us avoid having to write harnesses that exercise the functions we care about - providing a full symbol table for the library
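For anyone who wants to make the same project-file edit outside of PowerShell, here is a rough Python sketch using the standard library's ElementTree. The `add_aot_settings` helper and the trimmed-down `CSPROJ` template are hypothetical; only the `PublishAot` and `TrimmerRootAssembly` elements come from the script above.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for what `dotnet new console` generates (assumption)
CSPROJ = """<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>
</Project>"""


def add_aot_settings(csproj_xml: str, deps: list[str]) -> str:
    """Append the AOT switch and one root-assembly entry per dependency."""
    root = ET.fromstring(csproj_xml)
    props = ET.SubElement(root, 'PropertyGroup')
    ET.SubElement(props, 'PublishAot').text = 'true'
    items = ET.SubElement(root, 'ItemGroup')
    for dep in deps:
        # Rooting the assembly keeps the trimmer from dropping unused methods
        ET.SubElement(items, 'TrimmerRootAssembly', Include=dep)
    return ET.tostring(root, encoding='unicode')


patched = add_aot_settings(CSPROJ, ['BouncyCastle.Cryptography'])
```

Parsing the XML instead of inserting at a fixed line offset is a little more forgiving if the project template ever changes shape.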

We now have an easy and repeatable method for creating our symbol binaries.

Creating Signatures

Users of IDA, Binja, etc… are probably comfortable at this point creating their FLIRT/WARP/SigKit sigs and loading them into their tools of choice.

As an academic exercise I wanted to go a little further with this exploration and show how we might create a signature matching tool by hand. The aforementioned mechanisms are incredible for their approaches to solving the very difficult problems in function signature matching. However, we’re going to put aside concerns like unconventional control flow, caller/callee signatures, and highly optimized lookup speed for a moment and consider what a basic bespoke signature matcher could look like.

Leaning on a few highly capable libraries, we can generate our own naive signature database in just a handful of lines of code:

# python generate_signatures.py
import json
from capstone import CS_ARCH_X86, CS_MODE_64, Cs
import pefile
from windows.debug import symbols
import re


def is_valid_ptr(ptr: int, pe) -> bool:
    return pe.OPTIONAL_HEADER.ImageBase <= ptr < pe.OPTIONAL_HEADER.ImageBase + pe.OPTIONAL_HEADER.SizeOfImage


def erase_ptrs(disasm: str, pe) -> str:
    m = re.search(r'0x[0-9a-fA-F]+', disasm)
    if m and is_valid_ptr(int(m.group(), 16), pe):
        disasm = disasm[:m.start()] + disasm[m.end():]
    m = re.search(r'rip [+-] 0x[0-9a-fA-F]+', disasm)
    if m:
        disasm = disasm[:m.start()] + disasm[m.end():]
    return disasm


def main():
    sigs: dict[str, list[str]] = {}

    pe = pefile.PE('BuildStub\\bin\\Release\\net8.0\\win-x64\\native\\BuildStub.exe', fast_load=True)
    symbols.engine.options = 0
    sh = symbols.VirtualSymbolHandler()
    sh.load_file('BuildStub\\bin\\Release\\net8.0\\win-x64\\native\\BuildStub.exe', addr=pe.OPTIONAL_HEADER.ImageBase)

    md = Cs(CS_ARCH_X86, CS_MODE_64)
    for section in pe.sections:
        if section.IMAGE_SCN_MEM_EXECUTE:
            for i in md.disasm(section.get_data().rstrip(b'\0'), pe.OPTIONAL_HEADER.ImageBase + section.VirtualAddress):
                if i.mnemonic not in ('int3', 'nop') and sh[i.address]:
                    sym_name = str(sh[i.address]).partition('+')[0]
                    # Group every instruction under its owning symbol (offset stripped above)
                    sigs.setdefault(sym_name, []).append(f'{i.mnemonic} {erase_ptrs(i.op_str, pe)}'.strip())

    with open('signature_db.json', 'w') as f:
        json.dump({k: v for k, v in sigs.items() if len(v) > 3 and v[-1] != 'jmp'}, f, indent=2)


if __name__ == '__main__':
    main()

Breaking the major components down a bit:

To create signatures we look up the address of each disassembled instruction in our symbol database, which yields symbol strings in the following format:

BuildStub!S_P_CoreLib_System_SpanHelpers__BinarySearch_0<UInt8__UInt8>+0x4

or in other words

Module!Method+Offset

Leaning on this, a simple function signature database can be created with the following structure:

{
    "symbol name": [
        "disassembly 1",
        "disassembly n"
    ]
}

Each time we query the symbol handler we strip the offset from the result and append the instruction's disassembly to the listing for that symbol. In this way we create a basic but effective database mapping each symbol to (in most cases) its full function disassembly.
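The grouping step is easy to try in isolation. Below is a small sketch on hand-written rows rather than real disassembly; `split_symbol`, `build_db`, and the `Foo__Bar` / `Foo__Baz` symbols are hypothetical names used only for illustration.

```python
from collections import defaultdict


def split_symbol(sym: str) -> tuple[str, str, int]:
    """Split 'Module!Method+0xNN' into (module, method, offset)."""
    name, _, offset = sym.partition('+')
    module, _, method = name.partition('!')
    return module, method, int(offset, 16) if offset else 0


def build_db(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group disassembly lines under their symbol name, offset stripped."""
    db: dict[str, list[str]] = defaultdict(list)
    for sym, text in rows:
        module, method, _ = split_symbol(sym)
        db[f'{module}!{method}'].append(text)
    return dict(db)


# Hypothetical (symbol, disassembly) pairs in address order
rows = [
    ('BuildStub!Foo__Bar', 'push rbx'),
    ('BuildStub!Foo__Bar+0x1', 'sub rsp, 0x20'),
    ('BuildStub!Foo__Baz', 'xor eax, eax'),
    ('BuildStub!Foo__Baz+0x2', 'ret'),
]
db = build_db(rows)
```

Instructions that resolve to the same symbol land in the same list regardless of offset, which is all the database format above needs.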

🔔 Nota Bene

  • The key to making these function signatures portable is removing absolute pointers from them
    • By regex searching for addresses and removing them when they fall inside the module address space (or are rip-relative) we create crude but effective-enough portable signatures
  • Padding such as interfunction int3s and nops is also removed for signature portability
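To see the pointer wiping in action without loading a real PE, here is a sketch that runs the same two helpers against a stubbed header. The `fake_pe` object, its image base and size, and the `normalize` wrapper are assumptions made for the demo.

```python
import re
from types import SimpleNamespace

# Stub exposing only the two header fields the helpers read (hypothetical values)
fake_pe = SimpleNamespace(
    OPTIONAL_HEADER=SimpleNamespace(ImageBase=0x140000000, SizeOfImage=0x200000)
)


def is_valid_ptr(ptr: int, pe) -> bool:
    base = pe.OPTIONAL_HEADER.ImageBase
    return base <= ptr < base + pe.OPTIONAL_HEADER.SizeOfImage


def erase_ptrs(disasm: str, pe) -> str:
    # Drop the first hex literal if it points inside the module
    m = re.search(r'0x[0-9a-fA-F]+', disasm)
    if m and is_valid_ptr(int(m.group(), 16), pe):
        disasm = disasm[:m.start()] + disasm[m.end():]
    # Drop rip-relative displacements, which move between builds
    m = re.search(r'rip [+-] 0x[0-9a-fA-F]+', disasm)
    if m:
        disasm = disasm[:m.start()] + disasm[m.end():]
    return disasm


def normalize(mnemonic: str, op_str: str, pe) -> str:
    return f'{mnemonic} {erase_ptrs(op_str, pe)}'.strip()


wiped_call = normalize('call', '0x140012340', fake_pe)
wiped_lea = normalize('lea', 'rcx, [rip + 0x1a2b3c]', fake_pe)
kept_sub = normalize('sub', 'rsp, 0x20', fake_pe)
```

Small immediates and register-relative displacements fall outside the module range, so they survive and keep the signature distinctive.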

Example signature:

"BuildStub!BouncyCastle_Cryptography_Org_BouncyCastle_Math_EC_ECCurve__Equals_0": [
    "push rbx",
    "sub rsp, 0x20",
    "mov rbx, rcx",
    "lea rcx, []",   <-- address removed
    "call",          <-- address removed
    "mov rdx, rax",
    "mov rcx, rbx",
    "mov rax, qword ptr [rbx]",
    "add rsp, 0x20",
    "pop rbx",
    "jmp qword ptr [rax + 0x130]"
  ],

Matching Signatures

Given a database of signatures we can now match these against our target binary. A naive approach looks incredibly similar to our signature generator.

High level:

  • Disassemble the executable sections, storing the disassembly and address of each instruction
    • This pass wipes addresses the same way the generator does, so both listings stay comparable
  • Enumerate the signatures and search the disassembly listing for matching function bodies

# python match_signatures.py
import json
from capstone import CS_ARCH_X86, CS_MODE_64, Cs
import pefile
import re


def is_valid_ptr(ptr: int, pe) -> bool:
    return pe.OPTIONAL_HEADER.ImageBase <= ptr < pe.OPTIONAL_HEADER.ImageBase + pe.OPTIONAL_HEADER.SizeOfImage


def erase_ptrs(disasm: str, pe) -> str:
    m = re.search(r'0x[0-9a-fA-F]+', disasm)
    if m and is_valid_ptr(int(m.group(), 16), pe):
        disasm = disasm[:m.start()] + disasm[m.end():]
    m = re.search(r'rip [+-] 0x[0-9a-fA-F]+', disasm)
    if m:
        disasm = disasm[:m.start()] + disasm[m.end():]
    return disasm


def main():
    addrs: list[int] = []
    disasm: list[str] = []
    sigs: dict[str, list[str]]
    matches: list[tuple[int, str]] = []

    with open('signature_db.json', 'r') as f:
        sigs = json.load(f)

    pe = pefile.PE('fullspeed.exe', fast_load=True)
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    for section in pe.sections:
        if section.IMAGE_SCN_MEM_EXECUTE:
            for i in md.disasm(section.get_data().rstrip(b'\0'), pe.OPTIONAL_HEADER.ImageBase + section.VirtualAddress):
                if i.mnemonic not in ('int3', 'nop'):
                    addrs.append(i.address)
                    disasm.append(f'{i.mnemonic} {erase_ptrs(i.op_str, pe)}'.strip())

    sig_entries = list(sigs.items())
    for n, (sym_name, fn_sig) in enumerate(sig_entries, start=1):
        print(f'[*] [{n}/{len(sig_entries)}] {sym_name}...')
        # Slide the signature over the listing; a match is an exact run of identical lines
        for pos, addr in enumerate(addrs):
            if disasm[pos:pos+len(fn_sig)] == fn_sig:
                print(f'\t[+] Match found at 0x{addr:x}')
                matches.append((addr, sym_name))

    with open('match_db.json', 'w') as f:
        json.dump(matches, f, indent=2)


if __name__ == '__main__':
    main()

It’s not fast, it doesn’t handle edge cases - but it does get the job done well enough, and it is easy-as-pie to understand 🎉
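If the run time starts to hurt, one cheap improvement (not what the script above does) is to index the listing by instruction text so each signature is only compared at positions where its first line occurs. A sketch on toy data, where `indexed_match`, `naive_match`, and the sample listing are all hypothetical:

```python
from collections import defaultdict


def naive_match(disasm: list[str], sig: list[str]) -> list[int]:
    """Reference implementation: try every position in the listing."""
    return [p for p in range(len(disasm)) if disasm[p:p + len(sig)] == sig]


def indexed_match(disasm: list[str], sigs: dict[str, list[str]]) -> dict[str, list[int]]:
    """Only test positions whose instruction equals the signature's first line.

    Assumes every signature is non-empty (the generator drops short ones).
    """
    index: dict[str, list[int]] = defaultdict(list)
    for pos, line in enumerate(disasm):
        index[line].append(pos)
    return {
        name: [p for p in index.get(sig[0], []) if disasm[p:p + len(sig)] == sig]
        for name, sig in sigs.items()
    }


# Toy listing and signatures (hypothetical)
listing = ['push rbx', 'sub rsp, 0x20', 'call', 'ret',
           'push rbx', 'mov rax, rcx', 'ret',
           'push rbx', 'sub rsp, 0x20', 'call', 'ret']
toy_sigs = {
    'A': ['push rbx', 'sub rsp, 0x20', 'call', 'ret'],
    'B': ['push rbx', 'mov rax, rcx', 'ret'],
    'C': ['xor eax, eax', 'ret'],
}
hits = indexed_match(listing, toy_sigs)
```

The returned positions are indices into the listing, so they map back to addresses through the same parallel `addrs` list the matcher already keeps.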

Kinda cool right!?

Extra Credit: Enriching the x64dbg Program Database

As a final note, I thought it might be helpful to show how you might use these matched signatures to enrich downstream tooling. When I originally solved “fullspeed” I remember badly wanting symbol information in x64dbg during my debug sessions.

from ctypes import CDLL, wintypes
import ctypes
import json
import pefile


lz4 = CDLL(r'E:\re\x64dbg\release\x64\lz4.dll')
LZ4_decompress_fileW = lz4.LZ4_decompress_fileW
LZ4_decompress_fileW.argtypes = [wintypes.LPCWSTR, wintypes.LPCWSTR]
LZ4_decompress_fileW.restype = ctypes.c_longlong
LZ4_compress_fileW = lz4.LZ4_compress_fileW
LZ4_compress_fileW.argtypes = [wintypes.LPCWSTR, wintypes.LPCWSTR]
LZ4_compress_fileW.restype = ctypes.c_longlong

pe = pefile.PE('fullspeed.exe', fast_load=True)

assert LZ4_decompress_fileW(r'E:\re\x64dbg\release\x64\db\fullspeed.exe.dd64', r'fullspeed.exe.db') == 0

with open('fullspeed.exe.db', 'rb') as f:
    db = json.load(f)

with open('match_db.json', 'r') as f:
    matches = json.load(f)

db['labels'] = db.get('labels', [])
for addr, sym_name in matches:
    db['labels'].append({
        "manual": False,
        "module": "fullspeed.exe",
        "text": sym_name.replace('BuildStub!', ''),
        "address": hex(addr - pe.OPTIONAL_HEADER.ImageBase)
    })

with open('fullspeed.exe.db', 'w') as f:
    json.dump(db, f)

assert LZ4_compress_fileW(r'fullspeed.exe.db', r'E:\re\x64dbg\release\x64\db\fullspeed.exe.dd64') == 0

The x64dbg program database is loosely documented on their wiki. You’ll note I opted to lean on their compression library directly, and from there it’s a little mapping of addresses and names to insert these symbols as labels.
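The label mapping itself can be tested without x64dbg or the LZ4 library. This sketch builds the same label dictionaries from a list of matches; `build_labels` is a hypothetical helper, and skipping addresses that already have a label is my own assumption about how to handle several signatures matching the same function.

```python
def build_labels(matches: list[tuple[int, str]], image_base: int, module: str) -> list[dict]:
    """Turn (address, symbol) matches into label entries keyed by RVA."""
    labels = []
    seen: set[int] = set()
    for addr, sym_name in matches:
        if addr in seen:
            continue  # keep only the first name seen for an address (assumption)
        seen.add(addr)
        labels.append({
            'manual': False,
            'module': module,
            # Drop the 'Module!' prefix so only the method name is shown
            'text': sym_name.partition('!')[2] or sym_name,
            'address': hex(addr - image_base),
        })
    return labels


# Hypothetical matches: two signatures hit the same address
toy_matches = [
    (0x140001000, 'BuildStub!Foo__Bar'),
    (0x140001000, 'BuildStub!Foo__Baz'),
    (0x140002040, 'BuildStub!Qux'),
]
labels = build_labels(toy_matches, 0x140000000, 'fullspeed.exe')
```

The result can be appended to `db['labels']` exactly as in the script above.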

Well doesn’t this look nicer to debug now?

Happy hacking, until next time 🪇