$ file john
john: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=297bb194bf0ae17829e37240b7c7b6aa8a327572, stripped
[*] '/home/zerocool/chall/todo/john/john'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE (0x8048000)
As we know a packer is a type of malware that has an encrypted payload in order to avoid detection during static analysis. At runtime the malicious portion of the code is unpacked, and execute. This means that we need to look for some procedure that manipulates the .text section of the executable. From :
Self-Injection is just one of the techniques used by malware authors for obfuscation, there are many other techniques like Process Injection (or Process Hollowing), Classic DLL Injection and Thread Execution Hijacking. There are a few different techniques for Self-Injection itself, but a common technique is to first unpack a small stub in-memory, it transfers the execution to the stub, the stub code then changes the permission of a section in the process, write the malicious code into those sections and transfer the execution back to the overwritten sections of the PE file.
The first argument of the mprotect is the address in memory from which we want to start changing permissions. The length is 4096 bytes. This probably means that the malicious code segment is located at param_1. By looking at the code at runtime, we can see that the address on which the mprotect is called is probably 0x804970e. Also it looks like it ends at 0x804990e. This means that it is located in this memory page:
Basically I was almost right about what was happening here. We have that the main is calling an unpacking routine, which is the unpack function above. The num parameter is a counter for a loop:
The counter is likely the size of the function to unpack. Note that the key used to decrypt the function depends on the address of the function itself.
We looked at the unpacking routine, but we're still missing the main part.
Where do we start?
We have two approaches:
Static: Since we understand the unpacking routine we can build an unpacker and look at it with ghidra.
Dynamic: we could run the binary and use ghidra to look at the unpacked running code, or we can dump the memory of the running binary and look at it.
Ok this makes sense: push ebp is how an actual disassembled function starts. Note that we have another function call:
0x8049757: call 0x804922b
Note that this could be another packer... Actually its the address of the unpack function. So we're calling another function trough the unpacker. Here's the new function:
We've got a call to the strstr function, and another call to the packer.
From this behaviour we unedrstand that the check on the flag is performed incrementally by calling unpack and the pack on some code, and so on. This means that we'll not have a point in time in which during execution the binary is completely unpacked. Extracting the unpacked code dynamically can be very time consuming, which means that a more efficient method would be to write a script that reverse engineer the packing algorithm, which then can be used to create a binary containing an unpacked payload. This is what the static approach does in a nutshell.
Let's reverse engineer the packer with python:
import sys
from pwn import u32, p32
if len(sys.argv) < 4:
print("usage : %s <inputfile> <address> <size>" % sys.argv[0])
exit(0)
filepath = sys.argv[1]
address = int(sys.argv[2], 16)
size = int(sys.argv[3], 16)
BEG_BIN = 0x08048000
KEY = [0x04030201, 0x40302010, 0x42303042, 0x44414544, 0xffffffff]
ff = open(filepath, "rb")
f = ff.read()
ff.close()
off = address - BEG_BIN
to_decode = f[off: off+(size*4)]
k = KEY[address % 5]
decode = b""
for i in range(size):
decode += p32(u32(to_decode[i*4: (i+1)*4]) ^ k)
f = f[:off] + decode + f[off+(size*4):]
ff = open(filepath, "wb")
ff.write(f)
ff.close()
This gives us a decent unpacked main:
void FUN_0804970e(void)
{
int iVar1;
int iVar2;
int iVar3;
int iVar4;
int iVar5;
int iVar6;
int iVar7;
int in_stack_00000014;
undefined4 *in_stack_00000018;
if (in_stack_00000014 < 2) {
printf("Usage:\n %s flag{<key>}\n",*in_stack_00000018);
/* WARNING: Subroutine does not return */
exit(0);
}
iVar1 = FUN_0804922b(FUN_080492a0,0x11,in_stack_00000018[1]);
iVar2 = FUN_0804922b(FUN_080492e5,0x11,in_stack_00000018[1]);
iVar3 = FUN_0804922b(FUN_08049329,0x17,in_stack_00000018[1]);
iVar4 = FUN_0804922b(FUN_080496ab,0x18,in_stack_00000018[1]);
iVar5 = FUN_0804922b(FUN_080495e4,0x31,in_stack_00000018[1]);
iVar6 = FUN_0804922b(FUN_08049546,0x27,in_stack_00000018[1],0);
iVar7 = FUN_0804922b(FUN_0804951f,9,in_stack_00000018[1]);
if (iVar1 + iVar2 + iVar3 + iVar4 + iVar5 + iVar6 + iVar7 == 7) {
printf("\x1b[1;37mYou got the flag: \x1b[1;32m%s\x1b[0m\n",in_stack_00000018[1]);
}
else {
printf("\x1b[1;31mLoser\n\x1b[0m");
}
return;
}
We need to repeat this to unpack every function used to create the flag. The first three, as already seen, are respectively checks on the prefix (flag{}), the suffix (}) and the type of characters of the flag (must be ascii). The last one is a check on the flag length (==33). As for the other function we have the following structure:
Where the indented functions are calls that are made inside the outer ones. The first function determines the first 6 characters of the flag, which are 'packer'. We are still missing 21 characters, which are originated from FUN_080495e4 and FUN_08049546.
FUN_080496ab
After unpacking we get:
int FUN_080496ab(void) {
int flagchar;
int in_stack_00000014;
int i;
char usrinput;
i = 1;
while( true ) {
if (6 < i) {
return 1;
}
usrinput = *(char *)(in_stack_00000014 + i + 4);
flagchar = FUN_0804922b(FUN_08049385,0x36,i);
if (usrinput != flagchar) break;
i = i + 1;
}
return 0;
}
Ok looks like there's another function to unpack: FUN_08049385, which becomes:
Which we can either reverse engineer with z3, or by looking at which characters are outputed in registers by using gdb (again dynamic approach). After reverse engineering this we'll get that the first part of the flag is flag{packer. Since the flag is 33 chars long, we are still missing 21.
FUN_080495e4
Looks like we need to reverse engineer this:
undefined4 FUN_080495e4(void) {
undefined4 uVar1;
int i;
undefined4 *puVar2;
undefined4 *puVar3;
int in_GS_OFFSET;
int in_stack_00000014;
int j;
undefined4 local_7c [23];
int canary;
canary = *(int *)(in_GS_OFFSET + 0x14);
puVar2 = &weird_string;
puVar3 = local_7c;
/* filling local_7c with hardcoded data
*/
for (i = 22; i != 0; i = i + -1) {
*puVar3 = *puVar2;
puVar2 = puVar2 + 1;
puVar3 = puVar3 + 1;
}
j = 0;
do {
if (10 < j) {
uVar1 = 1;
LAB_08049692:
/* exit */
if (canary != *(int *)(in_GS_OFFSET + 0x14)) {
/* WARNING: Subroutine does not return */
__stack_chk_fail();
}
return uVar1;
}
/* here's the real magic */
i = FUN_0804922b(FUN_0804945e,0x30,(int)*(char *)(in_stack_00000014 + j + 11),local_7c[j * 2],
local_7c[j * 2 + 1]);
if (i == 0) {
uVar1 = 0;
goto LAB_08049692;
}
if ((*(byte *)(in_stack_00000014 + 0x11) & 1) == 0) {
uVar1 = 0;
goto LAB_08049692;
}
j = j + 1;
} while( true );
}
WTF is this? From a first look it seems like we're preparing local_7c and then we're comparing its content with the flag passed by the user using FUN_0804945e. If for some reason the comparison does not go well the function is exited. More specifically it looks like we're comparing each even character of local_7c and its successor with every character of the user input. This is the code that is doing the comparison:
Which is not something that really makes sense translated in ascii. Actually neither the content of FUN_0804945e makes sense, really. If fact I was so done that I tried to bruteforce the checks. Basically, we know that the return value of FUN_0804945e must be 1. This means that we can script the execution with a gdbinit and try every possible characters for every position of the flag checked by the function to find what we need.
$ p y.py
13:22:02 - starting...
found a char: flag{packer-
found a char: flag{packer-4
found a char: flag{packer-4a3
found a char: flag{packer-4a3-
found a char: flag{packer-4a3-1
found a char: flag{packer-4a3-13
found a char: flag{packer-4a3-133
found a char: flag{packer-4a3-1337
finished. took 778.9688053131104 seconds
13:35:01 - characters found: -4a3-1337
exiting...
Ok now we know that the flag up to now is: flag{packer-4a3-1337 something }, where the missing part is 12 characters long.
FUN_08049546
Ok we're still missing 20 characters, identified by this function:
Ok this looks easy. index starts from zero. Every function call its incremented by one. The check performed every call compares index+22 against 33, which means that we'll have 11 loop iterations. For evey iteration the character at index+20 of the flag is XORed with key and compared with its follower. Since the XOR is invertible, I think that we can reverse the process to get back the original characters. Using z3:
from z3 import Solver, BitVec, Z3Exception
from IPython import embed
from string import printable
flag1 = 'flag{packer-4a3-1337'
orig_key = b'\x0b\x4c\x0f\x00\x01\x16\x10\x07\x09\x38\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
key = []
for i in range(len(orig_key)):
key.append(orig_key[i])
print('charset length: %s' % len(printable))
for char in printable:
print('\ntrying with %s...' % char)
flagchars = [BitVec('flag_' + str(i), 32) for i in range(20, 31)]
s = Solver()
for i in range(0, 10):
s.add(flagchars[i] ^ key[i] == flagchars[i+1])
s.add(flagchars[0] == ord(char))
s.check()
try:
m = s.model()
except Z3Exception:
continue
flag = []
for char in flag1:
flag.append(char)
j = 0
for el in m:
flag.append(chr(m[flagchars[j]].as_long()))
j += 1
flag.append('}')
print(''.join(flag))
#embed()