Reverse Engineering Challenge: WeeperVM -- Level 1 by Ben_Lolo
BLUF: Reverse engineering a custom virtual machine architecture and instruction set. Light crypto cracking.
Foreword
A big thanks to Ben_Lolo for putting this together! This was a fantastic and well crafted out-of-the-box challenge.
The challenge’s ISA made for a great reverse engineering exercise, and contained a few interesting twist turns to boot. The embedded cryptography threw me, and this solution will demonstrate both easy and the hard ways to tackle finding the flag.
Goals
- Reverse engineer the VM instruction set and architecture
- Recover the flag
Lab Environment
- Kali GNU/Linux Rolling (2025.1)
- Python 3.13 (64-bit)
- ghidra 11.3.1 PUBLIC 20250219
Analysis
On decompilation we see the binary is a concise and straightforward presentation of its task. A few analysis notes:
Encrypted strings
We see a runtime string decoder @00011a28
. Cleaning up the arguments and types nets us a readable method:
char * string_decoder(byte *ciphertext,long c_len,byte *k,long k_len)
{
char *out_buf;
ulong i;
int stop;
stop = (int)c_len;
out_buf = (char *)calloc((long)(stop + 1),1);
for (i = 0; (int)i < stop; i = i + 1) {
out_buf[i] = k[(int)((long)((ulong)(uint)((int)i >> 0x1f) << 0x20 | i & 0xffffffff) %
(long)(int)k_len)] ^ ciphertext[i];
}
out_buf[stop] = '\0';
return out_buf;
}
Which lets us put together our own quick and dirty decoder, example:
def string_decoder(c: bytes, k: bytes) -> str:
plaintext = bytearray()
for i in range(len(c)):
plaintext.append(c[i] ^ k[i % len(k)])
return plaintext.decode()
k = b'sIU82!q0Kb<F^K'
c = bytes.fromhex('203c38515c065151220c1b327e391a2e3d4c1c0f5f3a')
print(string_decoder(c, k))
# Sumin' ain't right...
Decoding strings in the codebase provides some good context on the apps behavior and pitfalls. For instance the application is nice enough to announce when it’s terminating due to anti-* measures, so decoded strings help orient quite a bit.
Anti-Dynamic Analysis
The challenge contains a number of tricks to prevent debugging and tracing. I did not cover these in depth, because I opted to solve the majority of the challenge statically. It should be noted however, that memory can still be analyzed without tripping protection measures. Part of how I verified my lifted VM code was valid was comparing to the VM data section at runtime:
import subprocess
import threading
import time
proc: subprocess.Popen | None = None
last_mbuf = None
def memory_sentinel():
global last_mbuf
while True:
if proc is None:
continue
try:
with open(f'/proc/{proc.pid}/mem', 'rb') as reader:
# Observe VM's encrypted input
reader.seek(0x00012f20 + 0x40)
last_mbuf = reader.read(32)
except FileNotFoundError:
print(f"debug: /proc/{proc.pid}/mem not found.")
quit()
time.sleep(0.3)
def vm(inp: bytes) -> int:
global proc
proc = subprocess.Popen(['./WeeperVM--Level_1'],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
proc.communicate(input=inp)
if __name__ == '__main__':
t = threading.Thread(target=memory_sentinel)
t.start()
vm('w1234567890abcdefghhijklmnopqrst\n'.encode())
print(last_mbuf.hex(), last_mbuf)
# e.g.
# 8b2f2d302a33d9c0c53593ce6a457b8f76320f0c168b1bd64ba9203767b24f2d
# b'\x8b/-0*3\xd9\xc0\xc55\x93\xcejE{\x8fv2\x0f\x0c\x16\x8b\x1b\xd6K\xa9 7g\xb2O-'
Since the address space of this binary is not randomized we can create memory sentinels at specific VM data offsets to observe behavior.
checksec --file=WeeperVM--Level_1
RELRO STACK CANARY NX PIE
No RELRO No canary found NX enabled No PIE
Virtual Architecture
The challenge is best understood starting @00011a28
. This method decodes a single instruction and executes it given program integrity checks passing. This is a foundational starting point for understanding the VM.
The VM components consist of a single readwrite memory range, a bytecode section, and 16 32-bit registers. It can be modeled as follows:
weeper_elf = lief.parse('WeeperVM--Level_1')
# bytecode
weep = weeper_elf.get_section(".weep").content
vm_program = [int.from_bytes(weep[i:i + 4], 'little') for i in range(0, len(weep), 4)]
# readwrite program memory
vm_data = bytearray(weeper_elf.get_section(".lotus").content)
# register space
vm_ctx = [0 for _ in range(16)]
A little ghidra cleanup in 00011a28
yields an instruction decoder along the lines of:
if (sync_check == 0) {
expanded_op = (opcode >> 0x10 & 0xff) * 0x1010101 ^ opcode;
expanded_op = exec_opcode(expanded_op >> 0x1c,expanded_op >> 0x1b & 1,expanded_op >> 8 & 0xff,
expanded_op >> 0x19 & 3,expanded_op & 0xff);
return expanded_op;
}
Instructions are a consistent 32-bit width and decode to:
expanded_op >> 0x1c
->opcode
expanded_op >> 0x1b & 1
->destination r/m/c config
expanded_op >> 8 & 0xff
->destination c
expanded_op >> 0x19 & 3
->source r/m/c config
expanded_op & 0xff
->source c
Browsing into exec_opcode
@000119c0
we see a switch that interprets the opcode and associated operand configuration, executing a single instruction.
Opcodes are trivially understood based on unique operators in each switch case.
Operands generally follow the general structure:
- r/m/c of 0,
c = reg index
- r/m/c of 1,
c = memory offset
- r/m/c of 2,
c = immediate
Control flow is achieved using a flags register which roughly decodes to:
- 1,
==
- 2,
>
- 4,
<=
- 32,
?something to do with input state?
- 64,
Unconditional
Notably, there is no stack based frame or calling. Method calls are approximated using the following convention:
r14 = callee address
r15 = return address
There is no call nesting.
Decompilation and Manual Lifting
Knowing the architecture we can write a quick and dirty decompiler that lifts the VM bytecode to an x86-esque assembly listing.
import lief
CONDITIONS = {
1: 'E',
2: 'G',
4: 'LE',
32: '_HAS_INPUT',
64: 'MP',
}
weeper_elf = lief.parse('WeeperVM--Level_1')
weep = weeper_elf.get_section(".weep").content
vm_program = [int.from_bytes(weep[i:i + 4], 'little') for i in range(0, len(weep), 4)]
vm_data = bytearray(weeper_elf.get_section(".lotus").content)
pc = 0
def instruction_decode(i32: int) -> tuple[int, int, int, int, int]:
i32 = (i32 >> 0x10 & 0xff) * 0x1010101 ^ i32
instr = i32 >> 0x1c
dst_rm = (i32 >> 0x1b) & 1
dst_c = i32 >> 8 & 0xff
src_rmc = (i32 >> 0x19) & 3
src_c = i32 & 0xff
return instr, dst_rm, dst_c, src_rmc, src_c
def operand(rmc: int, dst_c: int) -> int:
if rmc == 0:
return f'r{dst_c}'
elif rmc == 1:
return f'[0x{dst_c:02X}]'
elif rmc == 2:
return f'0x{src_c:02X}'
else:
raise ValueError("Invalid operand r/m value")
r14 = r15 = 0
while pc < len(vm_program):
instr, dst_rm, dst_c, src_rmc, src_c = instruction_decode(vm_program[pc])
match instr:
case 0:
lval = operand(dst_rm, dst_c)
rval = operand(src_rmc, src_c)
if lval == 'r14':
r14 = int(rval, 16)
elif lval == 'r15':
r15 = int(rval, 16)
print(f'{pc:04X}: MOVD {lval}, {rval}')
case 1:
if dst_rm == 0:
print(f'{pc:04X}: MOVB {operand(dst_rm, dst_c)}, [*{operand(src_rmc, src_c)}]')
else:
print(f'{pc:04X}: MOVB [*{operand(dst_rm, dst_c)}], {operand(src_rmc, src_c)}')
case 2:
lval = operand(dst_rm, dst_c)
rval = operand(src_rmc, src_c)
if lval == 'r14':
r14 += int(rval, 16)
append = f' ; r14 = {r14:04X}'
elif lval == 'r15':
r15 += int(rval, 16)
append = f' ; r15 = {r15:04X}'
else:
append = ''
print(f'{pc:04X}: ADD {lval}, {rval} {append}')
case 3:
print(f'{pc:04X}: SUB {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 4:
print(f'{pc:04X}: MUL {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 5:
print(f'{pc:04X}: MOD {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 6:
print(f'{pc:04X}: OUTS [{dst_c:02X}], {operand(src_rmc, src_c)} ; {chr(vm_data[dst_c]) if vm_data[dst_c] != 0xa else "\\n"}')
case 7:
print(f'{pc:04X}: INS 0x{dst_c:02X}, {operand(src_rmc, src_c)}')
case 9:
print(f'{pc:04X}: CMP {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 10:
print(f'{pc:04X}: J{CONDITIONS[src_c]} {operand(dst_rm, dst_c)}')
case 0xb:
print(f'{pc:04X}: AND {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 0xc:
print(f'{pc:04X}: OR {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 0xd:
print(f'{pc:04X}: XOR {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 0xe:
print(f'{pc:04X}: SHL {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case 0xf:
print(f'{pc:04X}: SHR {operand(dst_rm, dst_c)}, {operand(src_rmc, src_c)}')
case _:
print(f"Unknown instruction: {instr}")
quit(-1)
pc += 1
Which yields something along the lines of:
...
0164: OUTS [FA], 0x01 ; :
0165: OUTS [FF], 0x01 ;
0166: MOVD r4, 0x20
0167: INS 0x40, r4
0168: OUTS [FB], 0x01 ; \n
0169: MOVD r12, 0x00
016A: MOVD r14, 0xFF
016B: ADD r14, 0x87 ; r14 = 0186
016C: MOVD r15, 0xFF
016D: ADD r15, 0xFF ; r15 = 01FE
016E: ADD r15, 0xFF ; r15 = 02FD
016F: ADD r15, 0x1F ; r15 = 031C
0170: MOVD r6, 0x40
0171: ADD r6, r12
0172: MOVB r5, [*r6]
0173: CMP r5, [0x3E]
0174: JE r14
0175: CMP r5, [0x3F]
0176: JE r14
0177: CMP r5, [0xFE]
0178: JE r14
0179: MOVD r2, [0x33]
...
Note the program terminates in a hardcoded ciphertext check:
...
02AA: CMP [0x40], 0xDC
02AB: JG r14
02AC: JLE r14
02AD: CMP [0x41], 0x6F
02AE: JG r14
02AF: JLE r14
02B0: CMP [0x42], 0x56
02B1: JG r14
02B2: JLE r14
...
🔔 This is our target state for finding the flag. We need to find an input resulting in that VM state.
The program listing is short and easily understood. I opted to take a meditative break and lift this by hand to python (mostly done just using find and replace regexes).
Being able to run and debug the program in python was extremely helpful for me to understand it.
# Lifted VM
import lief
weeper_elf = lief.parse('WeeperVM--Level_1')
def vm(test_flag: bytes):
data = bytearray(weeper_elf.get_section(".lotus").content)
data[0x40:0x60] = test_flag
for i in range(0x10):
data[0xA0 + i + 0x20] = data[0xA0 + i] ^ data[0xA0 + i + 0x10]
for w128_ix in range(0, 0x20, 0x10):
n = 0
for _ in range(0x10):
w128_n = n
for w64_ix in range(0x8):
c1 = data[0x48 + w128_ix + w64_ix]
ch = (c1 & 0xF0) >> 0x04
cl = c1 & 0x0F
if data[0xC0 + w128_n] & data[0x97 - w64_ix] != 0x00:
c1 = ((data[0x80 + ch] & 0x0F) << 0x04) | ((data[0x80 + cl] & 0xF0) >> 0x04)
else:
c1 = (data[0x80 + ch] & 0xF0) | (data[0x80 + cl] & 0x0F)
msk = c1 ^ data[0xC0 + n]
sb = 0x00
sb |= (data[0x90] & msk) >> 0x03
sb |= (data[0x91] & msk) >> 0x04
sb |= (data[0x92] & msk) << 0x02
sb |= (data[0x93] & msk) >> 0x01
sb |= (data[0x94] & msk) << 0x02
sb |= (data[0x95] & msk) << 0x04
sb |= (data[0x96] & msk) >> 0x01
sb |= (data[0x97] & msk) << 0x01
c0_ix = 0x40 + w128_ix
r10 = 0x90
r9 = 0x98
ix = (w64_ix + 0x07) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x06) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x02) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x01) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x05) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = w64_ix % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x03) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x04) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
if w64_ix < 0x07:
n = (n + 0x01) % 0x10
for w64_ix in range(0x8):
a = data[0x40 + w128_ix + w64_ix]
b = data[0x48 + w128_ix + w64_ix]
data[0x40 + w128_ix + w64_ix] = b
data[0x48 + w128_ix + w64_ix] = a
for w64_ix in range(0x8):
a = data[0x40 + w128_ix + w64_ix]
b = data[0x48 + w128_ix + w64_ix]
data[0x40 + w128_ix + w64_ix] = b
data[0x48 + w128_ix + w64_ix] = a
for i in range(0x10):
data[0xD0 + ((i + 0x09) % 0x10)] = data[0xC0 + i]
for i in range(0x10):
data[0xC0 + i] = data[0xD0 + i]
return data[0x40:0x60]
x = vm(b'waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
print(x)
Finding the Flag
Walking the data flow and observing the VM memory shows us an unknown encryption algorithm. It operates in place on our input data[0x40:0x60]
and checks against a hardcoded ciphertext. The key interestingly enough appears to be hardcoded in the script. Hooray for small victories 🥳
I found myself somewhat stuck at this point – I was unable to fingerprint the algorithm the author used, and opted to solve using some intuitive cryptanalysis instead.
# feck, no crypto constant fingerprints!
import lief
import yara
weeper_elf = lief.parse('WeeperVM--Level_1')
data = bytearray(weeper_elf.get_section(".lotus").content)
rules = yara.compile(filepath='crypto.yara')
matches = rules.match(data=data)
print(matches) # []
From the lifted python we can generally that the algorithm is symmetrical and operates on 128-bit blocks with halves being S-BOX’d / swapped (relevant: Feistel cipher)
Comparing to similar algorithms like DES and Blowfish we can hopefully take an approach where we create a decrypter by simply:
- Reversing the 64-bit half swaps
- Recreating encryption state at a given round
- Running the encrypted bytes through rounds with reversed state
By printing the result of encryption round 1 in data[0x40:0x60]
and running the output through round 1 a second time I was able to verify that assumptions 1-3 are true. The algorithm reverses the round, and we’re given our original input 🎉🎉🎉
Given that, we’re able to make a few small modifications to our original lifted VM to make it a functional decrypter. See comments inline.
import lief
weeper_elf = lief.parse('WeeperVM--Level_1')
N = [0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9, 0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9]
def vm_decrypt(test_flag: bytes):
data = bytearray(weeper_elf.get_section(".lotus").content)
data[0x40:0x60] = test_flag
for i in range(0x10):
data[0xA0 + i + 0x20] = data[0xA0 + i] ^ data[0xA0 + i + 0x10]
for w128_ix in [0x10, 0x00]:
# Relocated 64-bit swap
for w64_ix in range(0x8):
a = data[0x40 + w128_ix + w64_ix]
b = data[0x48 + w128_ix + w64_ix]
data[0x40 + w128_ix + w64_ix] = b
data[0x48 + w128_ix + w64_ix] = a
for _ in range(0x10):
# Replay rolling modulo in reverse, using captured state (cheesy but works)
w128_n = n = N.pop()
# Replay rolling state in reverse, using captured state (cheesy but works)
if w128_ix == 0x00:
data[0xc0:0xd0] = bytes.fromhex('46fc63eda6418771dc8d409cda61da1a')
else:
data[0xc0:0xd0] = bytes.fromhex('71dc8d409cda61da1a46fc63eda64187')
# Relocated 64-bit swap
for w64_ix in range(0x8):
a = data[0x40 + w128_ix + w64_ix]
b = data[0x48 + w128_ix + w64_ix]
data[0x40 + w128_ix + w64_ix] = b
data[0x48 + w128_ix + w64_ix] = a
for w64_ix in range(0x8):
c1 = data[0x48 + w128_ix + w64_ix]
ch = (c1 & 0xF0) >> 0x04
cl = c1 & 0x0F
if data[0xC0 + w128_n] & data[0x97 - w64_ix] != 0x00:
c1 = ((data[0x80 + ch] & 0x0F) << 0x04) | ((data[0x80 + cl] & 0xF0) >> 0x04)
else:
c1 = (data[0x80 + ch] & 0xF0) | (data[0x80 + cl] & 0x0F)
msk = c1 ^ data[0xC0 + n]
sb = 0x00
sb |= (data[0x90] & msk) >> 0x03
sb |= (data[0x91] & msk) >> 0x04
sb |= (data[0x92] & msk) << 0x02
sb |= (data[0x93] & msk) >> 0x01
sb |= (data[0x94] & msk) << 0x02
sb |= (data[0x95] & msk) << 0x04
sb |= (data[0x96] & msk) >> 0x01
sb |= (data[0x97] & msk) << 0x01
c0_ix = 0x40 + w128_ix
r10 = 0x90
r9 = 0x98
ix = (w64_ix + 0x07) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x06) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x02) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x01) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x05) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = w64_ix % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x03) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
ix = (w64_ix + 0x04) % 0x08
data[c0_ix + ix] = ((sb ^ data[c0_ix + ix]) & data[r10]) | (data[c0_ix + ix] & data[r9])
r10 += 0x01
r9 += 0x01
if w64_ix < 0x07:
n = (n + 0x01) % 0x10
return data[0x40:0x60]
x = vm_decrypt(bytes.fromhex('DC6F56C2D34E507769B24DE98FF52D122A9581E535DB8A1014FFA704D0C1E574'))
print(x.decode())
# Win!!!
# ()v3Ng34nce_I5_4n_1D10T5_g4m3_()
Neat! Using a little restructuring + replay we’re able to create a shitty but functional decrypter for the hardcoded ciphertext.
Wrap-up
After solving I asked Ben_Lolo what algorithm the VM implemented, and he was kind enough to point me towards Lucifer. The easier way to solve would certainly have been extracting the key from the VM and decrypting using a reference implementation.
At the same time there are some fun lessons to be learned from basic analysis of the algorithm, so I opted to document that approach regardless.
🎉🎉 happy hacking, until next time 🎉🎉
darbonzo