Reverse Engineering Challenge: WeeperVM -- Level 1 by Ben_Lolo

Challenge Link

BLUF: Reverse engineering a custom virtual machine architecture and instruction set. Light crypto cracking.

Foreword

A big thanks to Ben_Lolo for putting this together! This was a fantastic and well crafted out-of-the-box challenge.

The challenge’s ISA made for a great reverse engineering exercise, and contained a few interesting twist turns to boot. The embedded cryptography threw me, and this solution will demonstrate both easy and the hard ways to tackle finding the flag.

Goals

  1. Reverse engineer the VM instruction set and architecture
  2. Recover the flag

Lab Environment

- Kali GNU/Linux Rolling (2025.1)
- Python 3.13 (64-bit)
- ghidra 11.3.1 PUBLIC 20250219

Analysis

On decompilation we see the binary is a concise and straightforward presentation of its task. A few analysis notes:

Encrypted strings

We see a runtime string decoder @00011a28. Cleaning up the arguments and types nets us a readable method:

char * string_decoder(byte *ciphertext,long c_len,byte *k,long k_len)
{
  char *out_buf;
  ulong i;
  int stop;
  
  stop = (int)c_len;
  out_buf = (char *)calloc((long)(stop + 1),1);
  for (i = 0; (int)i < stop; i = i + 1) {
    out_buf[i] = k[(int)((long)((ulong)(uint)((int)i >> 0x1f) << 0x20 | i & 0xffffffff) %
                        (long)(int)k_len)] ^ ciphertext[i];
  }
  out_buf[stop] = '\0';
  return out_buf;
}

Which lets us put together our own quick and dirty decoder, example:

def string_decoder(c: bytes, k: bytes) -> str:
    plaintext = bytearray()
    for i in range(len(c)):
        plaintext.append(c[i] ^ k[i % len(k)])
    return plaintext.decode()


k = b'sIU82!q0Kb<F^K'
c = bytes.fromhex('203c38515c065151220c1b327e391a2e3d4c1c0f5f3a')
print(string_decoder(c, k)) 
    # Sumin' ain't right...

Decoding strings in the codebase provides some good context on the apps behavior and pitfalls. For instance the application is nice enough to announce when it’s terminating due to anti-* measures, so decoded strings help orient quite a bit.

Anti-Dynamic Analysis

The challenge contains a number of tricks to prevent debugging and tracing. I did not cover these in depth, because I opted to solve the majority of the challenge statically. It should be noted however, that memory can still be analyzed without tripping protection measures. Part of how I verified my lifted VM code was valid was comparing to the VM data section at runtime:

import subprocess
import threading
import time

proc: subprocess.Popen | None = None
last_mbuf = None

def memory_sentinel():
    global last_mbuf
    while True:
        if proc is None:
            continue
        try:
            with open(f'/proc/{proc.pid}/mem', 'rb') as reader:
                # Observe VM's encrypted input
                reader.seek(0x00012f20 + 0x40)
                last_mbuf = reader.read(32)
        except FileNotFoundError:
            print(f"debug: /proc/{proc.pid}/mem not found.")
            quit()
        time.sleep(0.3)

def vm(inp: bytes) -> int:
    global proc
    proc = subprocess.Popen(['./WeeperVM--Level_1'], 
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
    proc.communicate(input=inp)

if __name__ == '__main__':
    t = threading.Thread(target=memory_sentinel)
    t.start()

    vm('w1234567890abcdefghhijklmnopqrst\n'.encode())
    print(last_mbuf.hex(), last_mbuf)

    # e.g. 
    # 8b2f2d302a33d9c0c53593ce6a457b8f76320f0c168b1bd64ba9203767b24f2d 
    # b'\x8b/-0*3\xd9\xc0\xc55\x93\xcejE{\x8fv2\x0f\x0c\x16\x8b\x1b\xd6K\xa9 7g\xb2O-'

Since the address space of this binary is not randomized we can create memory sentinels at specific VM data offsets to observe behavior.

checksec --file=WeeperVM--Level_1
RELRO           STACK CANARY      NX            PIE
No RELRO        No canary found   NX enabled    No PIE

Virtual Architecture

The challenge is best understood starting @00011a28. This method decodes a single instruction and executes it given program integrity checks passing. This is a foundational starting point for understanding the VM.

The VM components consist of a single readwrite memory range, a bytecode section, and 16 32-bit registers. It can be modeled as follows:

weeper_elf = lief.parse('WeeperVM--Level_1')

# bytecode
weep = weeper_elf.get_section(".weep").content
vm_program = [int.from_bytes(weep[i:i + 4], 'little') for i in range(0, len(weep), 4)]

# readwrite program memory
vm_data = bytearray(weeper_elf.get_section(".lotus").content)

# register space
vm_ctx = [0 for _ in range(16)]

A little ghidra cleanup in 00011a28 yields an instruction decoder along the lines of:

  if (sync_check == 0) {
    expanded_op = (opcode >> 0x10 & 0xff) * 0x1010101 ^ opcode;
    expanded_op = exec_opcode(expanded_op >> 0x1c,expanded_op >> 0x1b & 1,expanded_op >> 8 & 0xff,
                              expanded_op >> 0x19 & 3,expanded_op & 0xff);
    return expanded_op;
  }

Instructions are a consistent 32-bit width and decode to:

  • expanded_op >> 0x1c -> opcode
  • expanded_op >> 0x1b & 1 -> destination r/m/c config
  • expanded_op >> 8 & 0xff -> destination c
  • expanded_op >> 0x19 & 3 -> source r/m/c config
  • expanded_op & 0xff -> source c

Browsing into exec_opcode@000119c0 we see a switch that interprets the opcode and associated operand configuration, executing a single instruction.

Opcodes are trivially understood based on unique operators in each switch case.

Operands generally follow the general structure:

  • r/m/c of 0, c = reg index
  • r/m/c of 1, c = memory offset
  • r/m/c of 2, c = immediate

Control flow is achieved using a flags register which roughly decodes to:

  • 1, ==
  • 2, >
  • 4, <=
  • 32, ?something to do with input state?
  • 64, Unconditional

Notably, there is no stack based frame or calling. Method calls are approximated using the following convention:

  • r14 = callee address
  • r15 = return address

There is no call nesting.

Decompilation and Manual Lifting

Knowing the architecture we can write a quick and dirty decompiler that lifts the VM bytecode to an x86-esque assembly listing.

import lief


CONDITIONS = {
    1: 'E',
    2: 'G',
    4: 'LE',
    32: '_HAS_INPUT',
    64: 'MP',
}


weeper_elf = lief.parse('WeeperVM--Level_1')
weep = weeper_elf.get_section(".weep").content
vm_program = [int.from_bytes(weep[i:i + 4], 'little') for i in range(0, len(weep), 4)]
vm_data = bytearray(weeper_elf.get_section(".lotus").content)
pc = 0


def instruction_decode(i32: int) -> tuple[int, int, int, int, int]:
    i32 = (i32 >> 0x10 & 0xff) * 0x1010101 ^ i32
    instr = i32 >> 0x1c
    dst_rm = (i32 >> 0x1b) & 1
    dst_c = i32 >> 8 & 0xff
    src_rmc = (i32 >> 0x19) & 3
    src_c = i32 & 0xff
    return instr, dst_rm, dst_c, src_rmc, src_c


def operand(rmc: int, dst_c: int) -> int:
    if rmc == 0:
        return f'r{dst_c}'
    elif rmc == 1:
        return f'[0x{dst_c:02X}]'
    elif rmc == 2:
        return f'0x{src_c:02X}'
    else:
        raise ValueError("Invalid operand r/m value")


r14 = r15 = 0
while pc < len(vm_program):
    instr, dst_rm, dst_c, src_rmc, src_c = instruction_decode(vm_program[pc])

    match instr:
        case 0:
            lval = operand(dst_rm, dst_c)
            rval = operand(src_rmc, src_c)
            if lval == 'r14':
                r14 = int(rval, 16)
            elif lval == 'r15':
                r15 = int(rval, 16)
            print(f'{pc:04X}: MOVD {lval}, {rval}')
        case 1:
            if dst_rm == 0:
                print(f'{pc:04X}: MOVB {operand(dst_rm, dst_c)}, [*{operand(src_rmc, src_c)}]')
            else:
                print(f'{pc:04X}: MOVB [*{operand(dst_rm, dst_c)}], {operand(src_rmc, src_c)}')
        case 2:
            lval = operand(dst_rm, dst_c)
            rval = operand(src_rmc, src_c)
            if lval == 'r14':
                r14 += int(rval, 16)
                append = f' ; r14 = {r14:04X}'
            elif lval == 'r15':
                r15 += int(rval, 16)
                append = f' ; r15 = {r15:04X}'
            else:
                append = ''
            print(f'{pc:04X}: ADD {lval}, {rval} {append}')
        case 3:
            print(f'{pc:04X}: SUB {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 4:
            print(f'{pc:04X}: MUL {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 5:
            print(f'{pc:04X}: MOD {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 6:
            print(f'{pc:04X}: OUTS [{dst_c:02X}], {operand(src_rmc, src_c)} ;  {chr(vm_data[dst_c]) if vm_data[dst_c] != 0xa else "\\n"}')
        case 7:
            print(f'{pc:04X}: INS 0x{dst_c:02X}, {operand(src_rmc, src_c)}')
        case 9:
            print(f'{pc:04X}: CMP {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 10:
            print(f'{pc:04X}: J{CONDITIONS[src_c]} {operand(dst_rm, dst_c)}')
        case 0xb:
            print(f'{pc:04X}: AND {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 0xc:
            print(f'{pc:04X}: OR {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 0xd:
            print(f'{pc:04X}: XOR {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 0xe:
            print(f'{pc:04X}: SHL {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case 0xf:
            print(f'{pc:04X}: SHR {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
        case _:
            print(f"Unknown instruction: {instr}")
            quit(-1)

    pc += 1

Which yields something along the lines of:

...
0164: OUTS [FA], 0x01 ;  :
0165: OUTS [FF], 0x01 ;   
0166: MOVD r4, 0x20
0167: INS 0x40, r4
0168: OUTS [FB], 0x01 ;  \n
0169: MOVD r12, 0x00
016A: MOVD r14, 0xFF
016B: ADD r14, 0x87  ; r14 = 0186
016C: MOVD r15, 0xFF
016D: ADD r15, 0xFF  ; r15 = 01FE
016E: ADD r15, 0xFF  ; r15 = 02FD
016F: ADD r15, 0x1F  ; r15 = 031C
0170: MOVD r6, 0x40
0171: ADD r6, r12 
0172: MOVB r5, [*r6]
0173: CMP r5, [0x3E]
0174: JE r14
0175: CMP r5, [0x3F]
0176: JE r14
0177: CMP r5, [0xFE]
0178: JE r14
0179: MOVD r2, [0x33]
...

Note the program terminates in a hardcoded ciphertext check:

...
02AA: CMP [0x40], 0xDC
02AB: JG r14
02AC: JLE r14
02AD: CMP [0x41], 0x6F
02AE: JG r14
02AF: JLE r14
02B0: CMP [0x42], 0x56
02B1: JG r14
02B2: JLE r14
...

🔔 This is our target state for finding the flag. We need to find an input resulting in that VM state.

The program listing is short and easily understood. I opted to take a meditative break and lift this by hand to python (mostly done just using find and replace regexes).

Being able to run and debug the program in python was extremely helpful for me to understand it.

# Lifted VM
import lief

weeper_elf = lief.parse('WeeperVM--Level_1')

def vm(test_flag: bytes):
    data = bytearray(weeper_elf.get_section(".lotus").content)

    data[0x40:0x60] = test_flag

    for i in range(0x10):
        data[0xA0 + i + 0x20] = data[0xA0 + i] ^ data[0xA0 + i + 0x10]

    for w128_ix in range(0, 0x20, 0x10):
        n = 0
        for _ in range(0x10):
            w128_n = n
            for w64_ix in range(0x8):
                c1 = data[0x48 + w128_ix + w64_ix]
                ch = (c1 & 0xF0) >> 0x04
                cl = c1 & 0x0F

                if data[0xC0 + w128_n] & data[0x97 - w64_ix] != 0x00:
                    c1 = ((data[0x80 + ch] & 0x0F) << 0x04) | ((data[0x80 + cl] & 0xF0) >> 0x04)
                else:
                    c1 = (data[0x80 + ch] & 0xF0) | (data[0x80 + cl] & 0x0F)

                msk = c1 ^ data[0xC0 + n]

                sb = 0x00
                sb |= (data[0x90] & msk) >> 0x03
                sb |= (data[0x91] & msk) >> 0x04
                sb |= (data[0x92] & msk) << 0x02
                sb |= (data[0x93] & msk) >> 0x01
                sb |= (data[0x94] & msk) << 0x02
                sb |= (data[0x95] & msk) << 0x04
                sb |= (data[0x96] & msk) >> 0x01
                sb |= (data[0x97] & msk) << 0x01

                c0_ix = 0x40 + w128_ix 
                r10 = 0x90
                r9 = 0x98

                ix = (w64_ix + 0x07) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x06) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x02) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x01) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x05) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = w64_ix % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x03) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x04) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01

                if w64_ix < 0x07:
                    n = (n + 0x01) % 0x10

            for w64_ix in range(0x8):
                a = data[0x40 + w128_ix + w64_ix]
                b = data[0x48 + w128_ix + w64_ix]
                data[0x40 + w128_ix + w64_ix] = b
                data[0x48 + w128_ix + w64_ix] = a

        for w64_ix in range(0x8):
            a = data[0x40 + w128_ix + w64_ix]
            b = data[0x48 + w128_ix + w64_ix]
            data[0x40 + w128_ix + w64_ix] = b
            data[0x48 + w128_ix + w64_ix] = a

        for i in range(0x10):
            data[0xD0 + ((i + 0x09) % 0x10)] = data[0xC0 + i]

        for i in range(0x10):
            data[0xC0 + i] = data[0xD0 + i]

    return data[0x40:0x60]

x = vm(b'waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
print(x)

Finding the Flag

Walking the data flow and observing the VM memory shows us an unknown encryption algorithm. It operates in place on our input data[0x40:0x60] and checks against a hardcoded ciphertext. The key interestingly enough appears to be hardcoded in the script. Hooray for small victories 🥳

I found myself somewhat stuck at this point – I was unable to fingerprint the algorithm the author used, and opted to solve using some intuitive cryptanalysis instead.

# feck, no crypto constant fingerprints!
import lief
import yara
weeper_elf = lief.parse('WeeperVM--Level_1')
data = bytearray(weeper_elf.get_section(".lotus").content)
rules = yara.compile(filepath='crypto.yara')
matches = rules.match(data=data)
print(matches) # []

From the lifted python we can generally that the algorithm is symmetrical and operates on 128-bit blocks with halves being S-BOX’d / swapped (relevant: Feistel cipher)

Comparing to similar algorithms like DES and Blowfish we can hopefully take an approach where we create a decrypter by simply:

  1. Reversing the 64-bit half swaps
  2. Recreating encryption state at a given round
  3. Running the encrypted bytes through rounds with reversed state

By printing the result of encryption round 1 in data[0x40:0x60] and running the output through round 1 a second time I was able to verify that assumptions 1-3 are true. The algorithm reverses the round, and we’re given our original input 🎉🎉🎉

Given that, we’re able to make a few small modifications to our original lifted VM to make it a functional decrypter. See comments inline.

import lief

weeper_elf = lief.parse('WeeperVM--Level_1')
N = [0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9, 0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9]

def vm_decrypt(test_flag: bytes):
    data = bytearray(weeper_elf.get_section(".lotus").content)

    data[0x40:0x60] = test_flag

    for i in range(0x10):
        data[0xA0 + i + 0x20] = data[0xA0 + i] ^ data[0xA0 + i + 0x10]

    for w128_ix in [0x10, 0x00]:
        # Relocated 64-bit swap
        for w64_ix in range(0x8):
            a = data[0x40 + w128_ix + w64_ix]
            b = data[0x48 + w128_ix + w64_ix]
            data[0x40 + w128_ix + w64_ix] = b
            data[0x48 + w128_ix + w64_ix] = a

        for _ in range(0x10):
            # Replay rolling modulo in reverse, using captured state (cheesy but works)
            w128_n = n = N.pop()

            # Replay rolling state in reverse, using captured state (cheesy but works)
            if w128_ix == 0x00:
                data[0xc0:0xd0] = bytes.fromhex('46fc63eda6418771dc8d409cda61da1a')
            else:
                data[0xc0:0xd0] = bytes.fromhex('71dc8d409cda61da1a46fc63eda64187')

            # Relocated 64-bit swap
            for w64_ix in range(0x8):
                a = data[0x40 + w128_ix + w64_ix]
                b = data[0x48 + w128_ix + w64_ix]
                data[0x40 + w128_ix + w64_ix] = b
                data[0x48 + w128_ix + w64_ix] = a

            for w64_ix in range(0x8):
                c1 = data[0x48 + w128_ix + w64_ix]
                ch = (c1 & 0xF0) >> 0x04
                cl = c1 & 0x0F

                if data[0xC0 + w128_n] & data[0x97 - w64_ix] != 0x00:
                    c1 = ((data[0x80 + ch] & 0x0F) << 0x04) | ((data[0x80 + cl] & 0xF0) >> 0x04)
                else:
                    c1 = (data[0x80 + ch] & 0xF0) | (data[0x80 + cl] & 0x0F)

                msk = c1 ^ data[0xC0 + n]

                sb = 0x00
                sb |= (data[0x90] & msk) >> 0x03
                sb |= (data[0x91] & msk) >> 0x04
                sb |= (data[0x92] & msk) << 0x02
                sb |= (data[0x93] & msk) >> 0x01
                sb |= (data[0x94] & msk) << 0x02
                sb |= (data[0x95] & msk) << 0x04
                sb |= (data[0x96] & msk) >> 0x01
                sb |= (data[0x97] & msk) << 0x01

                c0_ix = 0x40 + w128_ix 
                r10 = 0x90
                r9 = 0x98

                ix = (w64_ix + 0x07) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x06) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x02) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x01) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x05) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = w64_ix % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x03) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01
                ix = (w64_ix + 0x04) % 0x08
                data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
                r10 += 0x01
                r9 += 0x01

                if w64_ix < 0x07:
                    n = (n + 0x01) % 0x10

    return data[0x40:0x60]


x = vm_decrypt(bytes.fromhex('DC6F56C2D34E507769B24DE98FF52D122A9581E535DB8A1014FFA704D0C1E574')) 
print(x.decode())

# Win!!!
# ()v3Ng34nce_I5_4n_1D10T5_g4m3_()

Neat! Using a little restructuring + replay we’re able to create a shitty but functional decrypter for the hardcoded ciphertext.

Wrap-up

After solving I asked Ben_Lolo what algorithm the VM implemented, and he was kind enough to point me towards Lucifer. The easier way to solve would certainly have been extracting the key from the VM and decrypting using a reference implementation.

At the same time there are some fun lessons to be learned from basic analysis of the algorithm, so I opted to document that approach regardless.

🎉🎉 happy hacking, until next time 🎉🎉

darbonzo