Packer

john

$ file john
john: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=297bb194bf0ae17829e37240b7c7b6aa8a327572, stripped

[*] '/home/zerocool/chall/todo/john/john'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)

As we know a packer is a type of malware that has an encrypted payload in order to avoid detection during static analysis. At runtime the malicious portion of the code is unpacked, and execute. This means that we need to look for some procedure that manipulates the .text section of the executable. From Unpacking Redaman Malware & Basics of Self-Injection Packers - ft. OALabs (liveoverflow.com):

Self-Injection is just one of the techniques used by malware authors for obfuscation, there are many other techniques like Process Injection (or Process Hollowing), Classic DLL Injection and Thread Execution Hijacking. There are a few different techniques for Self-Injection itself, but a common technique is to first unpack a small stub in-memory, it transfers the execution to the stub, the stub code then changes the permission of a section in the process, write the malicious code into those sections and transfer the execution back to the overwritten sections of the PE file.

We have something similare here:

int unpack(uint *param_1,int param_2) {
  uint uVar1;
  uint *puVar2;
  int iVar3;

  mprotect((void *)((uint)param_1 & 0xfffff000),0x1000,7);
  uVar1 = *(uint *)(&PTR_DAT_0804c03c)[(int)param_1 % 5];
  puVar2 = param_1;
  iVar3 = param_2;
  do {
    *puVar2 = *puVar2 ^ uVar1;
    puVar2 = puVar2 + 1;
    iVar3 = iVar3 + -1;
  } while (iVar3 != 0);
  (*(code *)param_1)();
  iVar3 = FUN_080491e6(param_1,param_2);
  return iVar3;
}

The first argument of the mprotect is the address in memory from which we want to start changing permissions. The length is 4096 bytes. This probably means that the malicious code segment is located at param_1. By looking at the code at runtime, we can see that the address on which the mprotect is called is probably 0x804970e. Also it looks like it ends at 0x804990e. This means that it is located in this memory page:

 0x8049000  0x804a000 r-xp     1000 1000   /home/zerocool/chall/todo/john/john

which is also the only executable page of the .text section. This should be the code we want to examine to find the correct key for this binary:

pwndbg> b *0x0804970e
Breakpoint 1 at 0804970e
pwndbg> r
pwndbg> x/30gi 0x804970e
=> 0x804970e:	pop    ss
   0x804970f:	mov    ebp,esp
   0x8049711:	sub    esp,0x18
   0x8049714:	cmp    DWORD PTR [ebp+0x18],0x1
   0x8049718:	jg     0x804973a
   0x804971a:	mov    eax,DWORD PTR [ebp+0x1c]
   0x804971d:	mov    eax,DWORD PTR [eax]
   0x804971f:	sub    esp,0x8
   0x8049722:	push   eax
   0x8049723:	push   0x804a0f8
   0x8049728:	call   0x8049050 <printf@plt>
   0x804972d:	add    esp,0x10
   0x8049730:	sub    esp,0xc
   0x8049733:	push   0x0
   0x8049735:	call   0x8049080 <exit@plt>
   0x804973a:	mov    DWORD PTR [ebp-0xc],0x0
   0x8049741:	mov    eax,DWORD PTR [ebp+0x1c]
   0x8049744:	add    eax,0x4
   0x8049747:	mov    eax,DWORD PTR [eax]
   0x8049749:	sub    esp,0x4
   0x804974c:	push   eax
   0x804974d:	push   0x11
   0x8049752:	push   0x80492a0
   0x8049757:	call   0x804922b
   0x804975c:	add    esp,0x10
   0x804975f:	add    DWORD PTR [ebp-0xc],eax
   0x8049762:	mov    eax,DWORD PTR [ebp+0x1c]
   0x8049765:	add    eax,0x4
   0x8049768:	mov    eax,DWORD PTR [eax]
   0x804976a:	sub    esp,0x4

Notes from lesson

Basically I was almost right about what was happening here. We have that the main is calling an unpacking routine, which is the unpack function above. The num parameter is a counter for a loop:

int unpack(uint *func,int size) {
  uint *cur_ptr;
  undefined4 uVar1;
  int count;
  uint key;
                    /* here we're preparing the memory page for malicious code */
  mprotect((void *)((uint)func & 0xfffff000),0x1000,7);
  key = *(uint *)(&arr)[(int)func % 5];
  cur_ptr = func;
  count = size;
  do {
    *cur_ptr = *cur_ptr ^ key;
    cur_ptr = cur_ptr + 1;
    count = count + -1;
  } while (count != 0);
  uVar1 = (*(code *)func)();
  count = pack(uVar1,func,size);
  return count;
}

The counter is likely the size of the function to unpack. Note that the key used to decrypt the function depends on the address of the function itself.

We looked at the unpacking routine, but we're still missing the main part.

Where do we start?

We have two approaches:

Static: Since we understand the unpacking routine we can build an unpacker and look at it with ghidra.
Dynamic: we could run the binary and use ghidra to look at the unpacked running code, or we can dump the memory of the running binary and look at it.

Dynamic approach

We'll break at the unpack function with gdb:

unpack((char *)malicious_func_maybe,83,argc,argv);

                         *******************************************************
                         *                      FUNCTION                       *
                         *******************************************************
                         undefined malicious_func_maybe()
           undefined       AL:1         <RETURN>
                         malicious_func_maybe                      XREF[3]:   																																				main:08049878(*),
                         																			0804a228,
                                                              0804a444(*)
      0804970e 17            POP       SS

pwndbg> b *0x0804970e
Breakpoint 1 at 0804970e
pwndbg> r
pwndbg> x/30gi 0x804970e
=> 0x804970e:	pop    ss
   0x804970f:	mov    ebp,esp
   0x8049711:	sub    esp,0x18
   0x8049714:	cmp    DWORD PTR [ebp+0x18],0x1
   0x8049718:	jg     0x804973a
   0x804971a:	mov    eax,DWORD PTR [ebp+0x1c]
   0x804971d:	mov    eax,DWORD PTR [eax]
   0x804971f:	sub    esp,0x8
   0x8049722:	push   eax
   0x8049723:	push   0x804a0f8
   0x8049728:	call   0x8049050 <printf@plt>
   0x804972d:	add    esp,0x10
   0x8049730:	sub    esp,0xc
   0x8049733:	push   0x0
   0x8049735:	call   0x8049080 <exit@plt>
   0x804973a:	mov    DWORD PTR [ebp-0xc],0x0
   0x8049741:	mov    eax,DWORD PTR [ebp+0x1c]
   0x8049744:	add    eax,0x4
   0x8049747:	mov    eax,DWORD PTR [eax]
   0x8049749:	sub    esp,0x4
   0x804974c:	push   eax
   0x804974d:	push   0x11
   0x8049752:	push   0x80492a0
   0x8049757:	call   0x804922b
   0x804975c:	add    esp,0x10
   0x804975f:	add    DWORD PTR [ebp-0xc],eax
   0x8049762:	mov    eax,DWORD PTR [ebp+0x1c]
   0x8049765:	add    eax,0x4
   0x8049768:	mov    eax,DWORD PTR [eax]
   0x804976a:	sub    esp,0x4

Ok up until now I was on the right track. My problem here was that I have no idea on how to read those assembly instructions...

The problem is that what we're looking at are packed instructions. If we break at the following line:

uVar1 = (*(code *)func)();

And we print the same memory cells we'll see the real code used to get the key!

pwndbg> b *0x0804928a
pwndbg> x/30gi 0x804970e
pwndbg> x/30gi 0x804970e
   0x804970e:	push   ebp
   0x804970f:	mov    ebp,esp
   0x8049711:	sub    esp,0x18
   0x8049714:	cmp    DWORD PTR [ebp+0x18],0x1
   0x8049718:	jg     0x804973a
   0x804971a:	mov    eax,DWORD PTR [ebp+0x1c]
   0x804971d:	mov    eax,DWORD PTR [eax]
   0x804971f:	sub    esp,0x8
   0x8049722:	push   eax
   0x8049723:	push   0x804a0f8
   0x8049728:	call   0x8049050 <printf@plt>
   0x804972d:	add    esp,0x10
   0x8049730:	sub    esp,0xc
   0x8049733:	push   0x0
   0x8049735:	call   0x8049080 <exit@plt>
   0x804973a:	mov    DWORD PTR [ebp-0xc],0x0
   0x8049741:	mov    eax,DWORD PTR [ebp+0x1c]
   0x8049744:	add    eax,0x4
   0x8049747:	mov    eax,DWORD PTR [eax]
   0x8049749:	sub    esp,0x4
   0x804974c:	push   eax
   0x804974d:	push   0x11
   0x8049752:	push   0x80492a0
   0x8049757:	call   0x804922b
   0x804975c:	add    esp,0x10
   0x804975f:	add    DWORD PTR [ebp-0xc],eax
   0x8049762:	mov    eax,DWORD PTR [ebp+0x1c]
   0x8049765:	add    eax,0x4
   0x8049768:	mov    eax,DWORD PTR [eax]
   0x804976a:	sub    esp,0x4

Ok this makes sense: push ebp is how an actual disassembled function starts. Note that we have another function call:

   0x8049757:	call   0x804922b

Note that this could be another packer... Actually its the address of the unpack function. So we're calling another function trough the unpacker. Here's the new function:

pwndbg> x/30gi 0x80492a0
   0x80492a0:	push   ebp
   0x80492a1:	mov    ebp,esp
   0x80492a3:	sub    esp,0x18
   0x80492a6:	sub    esp,0x8
   0x80492a9:	push   0x804a039
   0x80492ae:	push   DWORD PTR [ebp+0x18]
   0x80492b1:	call   0x8049030 <strstr@plt>
   0x80492b6:	add    esp,0x10
   0x80492b9:	mov    DWORD PTR [ebp-0xc],eax
   0x80492bc:	mov    eax,DWORD PTR [ebp-0xc]
   0x80492bf:	cmp    eax,DWORD PTR [ebp+0x18]
   0x80492c2:	jne    0x80492cb
   0x80492c4:	mov    eax,0x1
   0x80492c9:	jmp    0x80492e3
   0x80492cb:	sub    esp,0x8
   0x80492ce:	push   DWORD PTR [ebp+0x18]
   0x80492d1:	push   0x804a03f
   0x80492d6:	call   0x8049050 <printf@plt>
   0x80492db:	add    esp,0x10
   0x80492de:	mov    eax,0x0
   0x80492e3:	leave
   0x80492e4:	ret
   0x80492e5:	pop    ss
   0x80492e6:	mov    ecx,0x38aec1d5
   0x80492eb:	mov    bl,0xae
   0x80492ed:	dec    esi
   0x80492ee:	iret
   0x80492ef:	inc    ebp
   0x80492f0:	pop    edx
   0x80492f1:	stos   BYTE PTR es:[edi],al

If we look at the code with gdb:

0x80492b1    call   strstr@plt                     <strstr@plt>
        haystack: 0xffffd518 <— 'flag{esketit}'
        needle: 0x804a039 <— 'flag{'

We've got a call to the strstr function, and another call to the packer.

From this behaviour we unedrstand that the check on the flag is performed incrementally by calling unpack and the pack on some code, and so on. This means that we'll not have a point in time in which during execution the binary is completely unpacked. Extracting the unpacked code dynamically can be very time consuming, which means that a more efficient method would be to write a script that reverse engineer the packing algorithm, which then can be used to create a binary containing an unpacked payload. This is what the static approach does in a nutshell.

Let's reverse engineer the packer with python:

import sys
from pwn import u32, p32

if len(sys.argv) < 4:
    print("usage : %s <inputfile> <address> <size>" % sys.argv[0])
    exit(0)

filepath = sys.argv[1]
address = int(sys.argv[2], 16)
size = int(sys.argv[3], 16)

BEG_BIN = 0x08048000
KEY = [0x04030201, 0x40302010, 0x42303042, 0x44414544, 0xffffffff]
ff = open(filepath, "rb")
f = ff.read()
ff.close()

off = address - BEG_BIN
to_decode = f[off: off+(size*4)]
k = KEY[address % 5]

decode = b""
for i in range(size):
    decode += p32(u32(to_decode[i*4: (i+1)*4]) ^ k)

f = f[:off] + decode + f[off+(size*4):]

ff = open(filepath, "wb")
ff.write(f)
ff.close()

This gives us a decent unpacked main:

void FUN_0804970e(void)

{
  int iVar1;
  int iVar2;
  int iVar3;
  int iVar4;
  int iVar5;
  int iVar6;
  int iVar7;
  int in_stack_00000014;
  undefined4 *in_stack_00000018;

  if (in_stack_00000014 < 2) {
    printf("Usage:\n %s flag{<key>}\n",*in_stack_00000018);
                    /* WARNING: Subroutine does not return */
    exit(0);
  }
  iVar1 = FUN_0804922b(FUN_080492a0,0x11,in_stack_00000018[1]);
  iVar2 = FUN_0804922b(FUN_080492e5,0x11,in_stack_00000018[1]);
  iVar3 = FUN_0804922b(FUN_08049329,0x17,in_stack_00000018[1]);
  iVar4 = FUN_0804922b(FUN_080496ab,0x18,in_stack_00000018[1]);
  iVar5 = FUN_0804922b(FUN_080495e4,0x31,in_stack_00000018[1]);
  iVar6 = FUN_0804922b(FUN_08049546,0x27,in_stack_00000018[1],0);
  iVar7 = FUN_0804922b(FUN_0804951f,9,in_stack_00000018[1]);
  if (iVar1 + iVar2 + iVar3 + iVar4 + iVar5 + iVar6 + iVar7 == 7) {
    printf("\x1b[1;37mYou got the flag: \x1b[1;32m%s\x1b[0m\n",in_stack_00000018[1]);
  }
  else {
    printf("\x1b[1;31mLoser\n\x1b[0m");
  }
  return;
}

We need to repeat this to unpack every function used to create the flag. The first three, as already seen, are respectively checks on the prefix (flag{}), the suffix (}) and the type of characters of the flag (must be ascii). The last one is a check on the flag length (==33). As for the other function we have the following structure:

FUN_080496ab (0x18)
	FUN_08049385 (0x36)

FUN_080495e4 (0x31)
	FUN_0804945e (0x30)

FUN_08049546 (0x27)

Where the indented functions are calls that are made inside the outer ones. The first function determines the first 6 characters of the flag, which are 'packer'. We are still missing 21 characters, which are originated from FUN_080495e4 and FUN_08049546.

FUN_080496ab

After unpacking we get:

int FUN_080496ab(void) {
  int flagchar;
  int in_stack_00000014;
  int i;
  char usrinput;

  i = 1;
  while( true ) {
    if (6 < i) {
      return 1;
    }
    usrinput = *(char *)(in_stack_00000014 + i + 4);
    flagchar = FUN_0804922b(FUN_08049385,0x36,i);
    if (usrinput != flagchar) break;
    i = i + 1;
  }
  return 0;
}

Ok looks like there's another function to unpack: FUN_08049385, which becomes:

int FUN_08049385(void) {
  double dVar1;
  double dVar2;
  double dVar3;
  double dVar4;
  int in_stack_00000014;

  dVar1 = pow((double)in_stack_00000014,5.0);
  dVar2 = pow((double)in_stack_00000014,4.0);
  dVar3 = pow((double)in_stack_00000014,3.0);
  dVar4 = pow((double)in_stack_00000014,2.0);
  return (int)((float)in_stack_00000014 * 99.65 +
               ((float)(dVar3 * 45.83333358 + (dVar1 * 0.5166666688 - dVar2 * 8.125000037)) -
               (float)dVar4 * 109.875) + 84.0);
}

Which we can either reverse engineer with z3, or by looking at which characters are outputed in registers by using gdb (again dynamic approach). After reverse engineering this we'll get that the first part of the flag is flag{packer. Since the flag is 33 chars long, we are still missing 21.

FUN_080495e4

Looks like we need to reverse engineer this:

undefined4 FUN_080495e4(void) {
  undefined4 uVar1;
  int i;
  undefined4 *puVar2;
  undefined4 *puVar3;
  int in_GS_OFFSET;
  int in_stack_00000014;
  int j;
  undefined4 local_7c [23];
  int canary;

  canary = *(int *)(in_GS_OFFSET + 0x14);
  puVar2 = &weird_string;
  puVar3 = local_7c;
                    /* filling local_7c with hardcoded data
                         */
  for (i = 22; i != 0; i = i + -1) {
    *puVar3 = *puVar2;
    puVar2 = puVar2 + 1;
    puVar3 = puVar3 + 1;
  }
  j = 0;
  do {
    if (10 < j) {
      uVar1 = 1;
LAB_08049692:
                    /* exit */
      if (canary != *(int *)(in_GS_OFFSET + 0x14)) {
                    /* WARNING: Subroutine does not return */
        __stack_chk_fail();
      }
      return uVar1;
    }
                    /* here's the real magic */
    i = FUN_0804922b(FUN_0804945e,0x30,(int)*(char *)(in_stack_00000014 + j + 11),local_7c[j * 2],
                     local_7c[j * 2 + 1]);
    if (i == 0) {
      uVar1 = 0;
      goto LAB_08049692;
    }
    if ((*(byte *)(in_stack_00000014 + 0x11) & 1) == 0) {
      uVar1 = 0;
      goto LAB_08049692;
    }
    j = j + 1;
  } while( true );
}

WTF is this? From a first look it seems like we're preparing local_7c and then we're comparing its content with the flag passed by the user using FUN_0804945e. If for some reason the comparison does not go well the function is exited. More specifically it looks like we're comparing each even character of local_7c and its successor with every character of the user input. This is the code that is doing the comparison:

bool FUN_0804945e(void) {
  float10 fVar1;
  double dVar2;
  int in_stack_00000014;
  uint in_stack_00000018;
  uint in_stack_0000001c;
  uint local_34;
  uint uStack48;

  dVar2 = sqrt((double)in_stack_00000014);
  fVar1 = (float10)powl((float10)in_stack_00000014,SUB104((float10)dVar2,0),
                        (int6)((unkuint10)(float10)dVar2 >> 0x20));
  if (_DAT_0804a1a0 <= fVar1) {
    local_34 = (uint)(longlong)ROUND(fVar1 - _DAT_0804a1a0);
    uStack48 = (uint)((ulonglong)(longlong)ROUND(fVar1 - _DAT_0804a1a0) >> 0x20);
    uStack48 = uStack48 ^ 0x80000000;
  }
  else {
    local_34 = (uint)(longlong)ROUND(fVar1);
    uStack48 = (uint)((ulonglong)(longlong)ROUND(fVar1) >> 0x20);
  }
  return (local_34 + 0x15 ^ in_stack_00000018 |
         uStack48 + (0xffffffea < local_34) ^ in_stack_0000001c) == 0;
}

First of all, this is the content of local_7c:

DDE76FA6 1C000000 F8FC7A35 27020000 15000000 00000000 546C155C 6C010000 DDE76FA6 1C000000 66CE3EE9 9D000000 546C155C 6C010000 546C155C 6C010000 414244F3 56070000 C5A46046 01000000 DDE76FA6 1C000000

Which is not something that really makes sense translated in ascii. Actually neither the content of FUN_0804945e makes sense, really. If fact I was so done that I tried to bruteforce the checks. Basically, we know that the return value of FUN_0804945e must be 1. This means that we can script the execution with a gdbinit and try every possible characters for every position of the flag checked by the function to find what we need.

$ p y.py
13:22:02 - starting...
found a char:  flag{packer-
found a char:  flag{packer-4
found a char:  flag{packer-4a3
found a char:  flag{packer-4a3-
found a char:  flag{packer-4a3-1
found a char:  flag{packer-4a3-13
found a char:  flag{packer-4a3-133
found a char:  flag{packer-4a3-1337
finished. took 778.9688053131104 seconds
13:35:01 - characters found: -4a3-1337
exiting...

Ok now we know that the flag up to now is: flag{packer-4a3-1337 something }, where the missing part is 12 characters long.

FUN_08049546

Ok we're still missing 20 characters, identified by this function:

undefined4 FUN_08049546(void) {
  size_t sVar1;
  undefined4 uVar2;
  char *in_stack_00000014;
  int in_stack_00000018;

  sVar1 = strlen(in_stack_00000014);
  if (in_stack_00000018 + 0x16U < sVar1) {
    if ((char)(in_stack_00000014[in_stack_00000018 + 0x14] ^ (&DAT_0804a081)[in_stack_00000018]) ==
        in_stack_00000014[in_stack_00000018 + 0x15]) {
      uVar2 = FUN_08049546(0xdeadb00b,0xdeadb00b,0xdeadb00b,0xdeadb00b,in_stack_00000014,
                           in_stack_00000018 + 1);
    }
    else {
      uVar2 = 0;
    }
  }
  else {
    uVar2 = 1;
  }
  return uVar2;
}

After fixing the function signature with Ghidra it becomes:

bool FUN_08049546(int a,int b,int c,int d,char *user_input,int index) {
  bool chk;
  int userin_len;
  userin_len = strlen(user_input);
  if (index + 22U < (uint)userin_len) {
    if ((char)(user_input[index + 20] ^ (&key)[index]) == user_input[index + 21]) {
      chk = FUN_08049546(L'\xdeadb00b',-0x21524ff5,-0x21524ff5,-0x21524ff5,user_input,index + 1);
    }
    else {
      chk = false;
    }
  }
  else {
    chk = true;
  }
  return chk;
}

Ok this looks easy. index starts from zero. Every function call its incremented by one. The check performed every call compares index+22 against 33, which means that we'll have 11 loop iterations. For evey iteration the character at index+20 of the flag is XORed with key and compared with its follower. Since the XOR is invertible, I think that we can reverse the process to get back the original characters. Using z3:

from z3 import Solver, BitVec, Z3Exception
from IPython import embed
from string import printable

flag1 = 'flag{packer-4a3-1337'
orig_key = b'\x0b\x4c\x0f\x00\x01\x16\x10\x07\x09\x38\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
key = []
for i in range(len(orig_key)):
    key.append(orig_key[i])

print('charset length: %s' % len(printable))
for char in printable:
    print('\ntrying with %s...' % char)

    flagchars = [BitVec('flag_' + str(i), 32) for i in range(20, 31)]
    s = Solver()

    for i in range(0, 10):
        s.add(flagchars[i] ^ key[i] == flagchars[i+1])
    s.add(flagchars[0] == ord(char))

    s.check()
    try:
        m = s.model()
    except Z3Exception:
        continue

    flag = []
    for char in flag1:
        flag.append(char)

    j = 0
    for el in m:
        flag.append(chr(m[flagchars[j]].as_long()))
        j += 1
    flag.append('}')
    print(''.join(flag))
#embed()

And with this we should be all set.

PreviousXSS

Last updated 2 years ago