2022年亚洲午夜一区二区福利,国产精品黄色大片无需下载!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

展開全文

hack the virtual memory, the stack, registers and assembly code

This is the fifth chapter in a series about virtual memory. The goal is to learn some CS basics in a different and more practical way.

If you missed the previous chapters, you should probably start there:

Chapter 0: Hack The Virtual Memory: C strings & /proc
Chapter 1: Hack The Virtual Memory: Python bytes
Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break

The Stack

As we have seen in chapter 2, the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? What are the registers used? In this chapter we will have a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables.

Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let’s start!

Note: We will talk only about the user stack, as opposed to the kernel stack

Prerequisites

In order to fully understand this article, you will need to know:

The basics of the C programming language (especially pointers)

Environment

All scripts and programs have been tested on the following system:

Ubuntu

Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Tools used:

gcc
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
objdump
GNU objdump (GNU Binutils for Ubuntu) 2.2

Everything we cover will be true for this system/environment, but may be different on another system

Automatic allocation

Let’s first look at a very simple program that has one function that uses one variable (0-main.c):

#include <stdio.h>

int main(void)
{
    int a;

    a = 972;
    printf("a = %d\n", a);
    return (0);
}

Let’s compile this program and disassemble it using objdump:

holberton$ gcc 0-main.c
holberton$ objdump -d -j .text -M intel

The assembly code produced for our main function is the following:

000000000040052d <main>:
  40052d:       55                      push   rbp
  40052e:       48 89 e5                mov    rbp,rsp
  400531:       48 83 ec 10             sub    rsp,0x10
  400535:       c7 45 fc cc 03 00 00    mov    DWORD PTR [rbp-0x4],0x3cc
  40053c:       8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  40053f:       89 c6                   mov    esi,eax
  400541:       bf e4 05 40 00          mov    edi,0x4005e4
  400546:       b8 00 00 00 00          mov    eax,0x0
  40054b:       e8 c0 fe ff ff          call   400410 <printf@plt>
  400550:       b8 00 00 00 00          mov    eax,0x0
  400555:       c9                      leave  
  400556:       c3                      ret    
  400557:       66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
  40055e:       00 00

Let’s focus on the first three lines for now:

000000000040052d <main>:
  40052d:       55                      push   rbp
  40052e:       48 89 e5                mov    rbp,rsp
  400531:       48 83 ec 10             sub    rsp,0x10

The first lines of the function main refers to rbp and rsp; these are special purpose registers. rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer, which points to the top of the current stack frame.

Let’s decompose step by step what is happening here. This is the state of the stack when we enter the function main before the first instruction is run:

the stack

push rbp instruction pushes the value of the register rbp onto the stack. Because it “pushes” onto the stack, now the value of rsp is the memory address of the new top of the stack. The stack and the registers now look like this:

the stack

mov rbp, rsp copies the value of the stack pointer rsp to the base pointer rbp -> rpb and rsp now both point to the top of the stack

the stack

sub rsp, 0x10 creates a space to store values of local variables. The space between rbp and rsp is this space. Note that this space is large enough to store our variable of type integer

the stack

We have just created a space in memory – on the stack – for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables.

Using local variables

The fourth line of assembly code of our main function is the following:

  400535:       c7 45 fc cc 03 00 00    mov    DWORD PTR [rbp-0x4],0x3cc

0x3cc is actually the value 972 in hexadecimal. This line corresponds to our C-code line:

a = 972;

mov DWORD PTR [rbp-0x4],0x3cc is setting the memory at address rbp - 4 to 972. [rbp - 4] IS our local variable a. The computer doesn’t actually know the name of the variable we use in our code, it simply refers to memory addresses on the stack.

This is the state of the stack and the registers after this operation:

the stack

`leave`, Automatic de-allocation

If we look now at the end of the function, we will find this:

  400555:       c9                      leave

The instruction leave sets rsp to rbp, and then pops the top of the stack into rbp.

the stack

Because we pushed the previous value of rbp onto the stack when we entered the function, rbp is now set to the previous value of rbp. This is how:

The local variables are “de-allocated”, and
the stack frame of the previous function is restored before we leave the current function.

The state of the stack and the registers rbp and rsp are restored to the same state as when we entered our main function.

Playing with the stack

When the variables are automatically de-allocated from the stack, they are not completely “destroyed”. Their values are still in memory, and this space will potentially be used by other functions.

This is why it is important to initialize your variables when you write your code, because otherwise, they will take whatever value there is on the stack at the moment when the program is running.

Let’s consider the following C code (1-main.c):

#include <stdio.h>

void func1(void)
{
     int a;
     int b;
     int c;

     a = 98;
     b = 972;
     c = a + b;
     printf("a = %d, b = %d, c = %d\n", a, b, c);
}

void func2(void)
{
     int a;
     int b;
     int c;

     printf("a = %d, b = %d, c = %d\n", a, b, c);
}

int main(void)
{
    func1();
    func2();
    return (0);
}

As you can see, func2 does not set the values of its local vaiables a, b and c, yet if we compile and run this program it will print…

holberton$ gcc 1-main.c && ./a.out 
a = 98, b = 972, c = 1070
a = 98, b = 972, c = 1070
holberton$

… the same variable values of func1! This is because of how the stack works. The two functions declared the same amount of variables, with the same type, in the same order. Their stack frames are exactly the same. When func1 ends, the memory where the values of its local variables reside are not cleared – only rsp is incremented.
As a consequence, when we call func2 its stack frame sits at exactly the same place of the previous func1 stack frame, and the local variables of func2 have the same values of the local variables of func1 when we left func1.

Let’s examine the assembly code to prove it:

holberton$ objdump -d -j .text -M intel

000000000040052d <func1>:
  40052d:       55                      push   rbp
  40052e:       48 89 e5                mov    rbp,rsp
  400531:       48 83 ec 10             sub    rsp,0x10
  400535:       c7 45 f4 62 00 00 00    mov    DWORD PTR [rbp-0xc],0x62
  40053c:       c7 45 f8 cc 03 00 00    mov    DWORD PTR [rbp-0x8],0x3cc
  400543:       8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
  400546:       8b 55 f4                mov    edx,DWORD PTR [rbp-0xc]
  400549:       01 d0                   add    eax,edx
  40054b:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
  40054e:       8b 4d fc                mov    ecx,DWORD PTR [rbp-0x4]
  400551:       8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
  400554:       8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
  400557:       89 c6                   mov    esi,eax
  400559:       bf 34 06 40 00          mov    edi,0x400634
  40055e:       b8 00 00 00 00          mov    eax,0x0
  400563:       e8 a8 fe ff ff          call   400410 <printf@plt>
  400568:       c9                      leave  
  400569:       c3                      ret    

000000000040056a <func2>:
  40056a:       55                      push   rbp
  40056b:       48 89 e5                mov    rbp,rsp
  40056e:       48 83 ec 10             sub    rsp,0x10
  400572:       8b 4d fc                mov    ecx,DWORD PTR [rbp-0x4]
  400575:       8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
  400578:       8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
  40057b:       89 c6                   mov    esi,eax
  40057d:       bf 34 06 40 00          mov    edi,0x400634
  400582:       b8 00 00 00 00          mov    eax,0x0
  400587:       e8 84 fe ff ff          call   400410 <printf@plt>
  40058c:       c9                      leave  
  40058d:       c3                      ret  

000000000040058e <main>:
  40058e:       55                      push   rbp
  40058f:       48 89 e5                mov    rbp,rsp
  400592:       e8 96 ff ff ff          call   40052d <func1>
  400597:       e8 ce ff ff ff          call   40056a <func2>
  40059c:       b8 00 00 00 00          mov    eax,0x0
  4005a1:       5d                      pop    rbp
  4005a2:       c3                      ret    
  4005a3:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
  4005aa:       00 00 00 
  4005ad:       0f 1f 00                nop    DWORD PTR [rax]

As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same.

push   rbp
mov    rbp,rsp
sub    rsp,0x10

And both functions end with the leave statement.

The variables a, b and c are referenced the same way in the two functions:

a lies at memory address rbp - 0xc
b lies at memory address rbp - 0x8
c lies at memory address rbp - 0x4

Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables in the stack.

So, this is the state of the stack and the registers rbp and rsp before we leave func1:

the stack

When we leave the function func1, we hit the instruction leave; as previously explained, this is the state of the stack, rbp and rsp right before returning to the function main:

the stack

So when we enter func2, the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as the local variables of the function func1.

the stack

ret

You might have noticed that all our example functions end with the instruction ret. ret pops the return address from stack and jumps there. When functions are called the program uses the instruction call to push the return address before it jumps to the first instruction of the function called.
This is how the program is able to call a function and then return from said function the calling function to execute its next instruction.

So this means that there are more than just variables on the stack, there are also memory addresses of instructions. Let’s revisit our 1-main.c code.

When the main function calls func1,

  400592:       e8 96 ff ff ff          call   40052d <func1>

it pushes the memory address of the next instruction onto the stack, and then jumps to func1.
As a consequence, before executing any instructions in func1, the top of the stack contains this address, so rsp points to this value.

the stack

After the stack frame of func1 is formed, the stack looks like this:

the stack

Wrapping everything up

Given what we just learned, we can directly use rbp to directly access all our local variables (without using the C variables!), as well as the saved rbp value on the stack and the return address values of our functions.

To do so in C, we can use:

    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

Here is the listing of the program 2-main.c:

#include <stdio.h>

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}

Getting the values of the variables

the stack

From our previous discoveries, we know that our variables are referenced via rbp – 0xX:

a is at rbp - 0xc
b is at rbp - 0x8
c is at rbp - 0x4

So in order to get the values of those variables, we need to dereference rbp. For the variable a:

cast our variable rbp to a char *: (char *)rbp
subtract the correct amount of bytes to get the address of where the variable is in memory: (char *)rbp) - 0xc
cast it again to a pointer pointing to an int since a is of type int: (int *)(((char *)rbp) - 0xc)
and dereference it to get the value sitting at this address: *(int *)(((char *)rbp) - 0xc)

The saved `rbp` value

the stack

Looking at the above diagram, the current rbp directly points to the saved rbp, so we simply have to cast our variable rbp to a pointer to an unsigned long int and dereference it: *(unsigned long int *)rbp.

The return address value

the stack

The return address value is right before the saved previous rbp on the stack. rbp is 8 bytes long, so we simply need to add 8 to the current value of rbp to get the address where this return value is on the stack. This is how we do it:

cast our variable rbp to a char *: (char *)rbp
add 8 to this value: ((char *)rbp + 8)
cast it to point to an unsigned long int: (unsigned long int *)((char *)rbp + 8)
dereference it to get the value at this address: *(unsigned long int *)((char *)rbp + 8)

The output of our program

holberton$ gcc 2-main.c && ./a.out 
main, rpb = 7ffc78e71b70
main, rsp = 7ffc78e71b70
a = 98, b = 972, c = 1070
func1, rpb = 7ffc78e71b60
func1, rsp = 7ffc78e71b50
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7ffc78e71b70
func1, return address value = 400697
func2, a = 98, b = 972, c = 1070
func2, rpb = 7ffc78e71b60
func2, rsp = 7ffc78e71b50
holberton$

We can see that:

from func1 we can access all our variables correctly via rbp
from func1 we can get the rbp of the function main
we confirm that func1 and func2 do have the same rbp and rsp values
the difference between rsp and rbp is 0x10, as seen in the assembly code (sub rsp,0x10)
in the main function, rsp == rbp because there are no local variables

The return address from func1 is 0x400697. Let’s double check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call of func1 in the main function.

holberton$ objdump -d -j .text -M intel | less

0000000000400664 <main>:
  400664:       55                      push   rbp
  400665:       48 89 e5                mov    rbp,rsp
  400668:       48 89 e8                mov    rax,rbp
  40066b:       48 89 c6                mov    rsi,rax
  40066e:       bf 3b 08 40 00          mov    edi,0x40083b
  400673:       b8 00 00 00 00          mov    eax,0x0
  400678:       e8 93 fd ff ff          call   400410 <printf@plt>
  40067d:       48 89 e0                mov    rax,rsp
  400680:       48 89 c6                mov    rsi,rax
  400683:       bf 4c 08 40 00          mov    edi,0x40084c
  400688:       b8 00 00 00 00          mov    eax,0x0
  40068d:       e8 7e fd ff ff          call   400410 <printf@plt>
  400692:       e8 96 fe ff ff          call   40052d <func1>
  400697:       e8 7a ff ff ff          call   400616 <func2>
  40069c:       b8 00 00 00 00          mov    eax,0x0
  4006a1:       5d                      pop    rbp
  4006a2:       c3                      ret    
  4006a3:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
  4006aa:       00 00 00 
  4006ad:       0f 1f 00                nop    DWORD PTR [rax]

And yes! \o/

Hack the stack!

Now that we know where to find the return address on the stack, what if we were to modify this value? Could we alter the flow of a program and make func1 return to somewhere else? Let’s add a new function, called bye to our program (3-main.c):

#include <stdio.h>
#include <stdlib.h>

void bye(void)
{
    printf("[x] I am in the function bye!\n");
    exit(98);
}

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}

Let’s see at which address the code of this function starts:

holberton$ gcc 3-main.c && objdump -d -j .text -M intel | less

00000000004005bd <bye>:
  4005bd:       55                      push   rbp
  4005be:       48 89 e5                mov    rbp,rsp
  4005c1:       bf d8 07 40 00          mov    edi,0x4007d8
  4005c6:       e8 b5 fe ff ff          call   400480 <puts@plt>
  4005cb:       bf 62 00 00 00          mov    edi,0x62
  4005d0:       e8 eb fe ff ff          call   4004c0 <exit@plt>

Now let’s replace the return address on the stack from the func1 function with the address of the beginning of the function bye, 4005bd (4-main.c):

#include <stdio.h>
#include <stdlib.h>

void bye(void)
{
    printf("[x] I am in the function bye!\n");
    exit(98);
}

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
    /* hack the stack! */
    *(unsigned long int *)((char *)rbp + 8) = 0x4005bd;
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}

holberton$ gcc 4-main.c && ./a.out
main, rpb = 7fff62ef1b60
main, rsp = 7fff62ef1b60
a = 98, b = 972, c = 1070
func1, rpb = 7fff62ef1b50
func1, rsp = 7fff62ef1b40
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7fff62ef1b60
func1, return address value = 40074d
[x] I am in the function bye!
holberton$ echo $?
98
holberton$

We have called the function bye, without calling it!

Outro

I hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if you have anything you would like me to cover in the next chapter.

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed.

Files

This repo contains the source code (X-main.c files) for programs created in this tutorial.

Hack the Virtual Memory: malloc, the heap & the program break

Posted by Julien Barbier

Hack the VM!

This is the fourth chapter in a series around virtual memory. The goal is to learn some CS basics, but in a different and more practical way.

If you missed the previous chapters, you should probably start there:

Chapter 0: Hack The Virtual Memory: C strings & /proc
Chapter 1: Hack The Virtual Memory: Python bytes
Chapter 2: Hack The Virtual Memory: Drawing the VM diagram

The heap

In this chapter we will look at the heap and malloc in order to answer some of the questions we ended with at the end of the previous chapter:

Why doesn’t our allocated memory start at the very beginning of the heap (0x2050010 vs 02050000)? What are those first 16 bytes used for?
Is the heap actually growing upwards?

Prerequisites

In order to fully understand this article, you will need to know:

The basics of the C programming language (especially pointers)
The very basics of the Linux filesystem and the shell
We will also use the /proc/[pid]/maps file (see man proc or read our first article Hack The Virtual Memory, chapter 0: C strings & /proc)

Environment

All scripts and programs have been tested on the following system:

Ubuntu

Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Tools used:

gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4

glibc 2.19 (see version.c if you need to check your glibc version)
strace

strace — version 4.8

Everything we will write will be true for this system/environment, but may be different on another system

We will also go through the Linux source code. If you are on Ubuntu, you can download the sources of your current kernel by running this command:

apt-get source linux-image-$(uname -r)

`malloc`

malloc is the common function used to dynamically allocate memory. This memory is allocated on the “heap”.
Note: malloc is not a system call.

From man malloc:

[...] allocate dynamic memory[...]
void *malloc(size_t size);
[...]
The malloc() function allocates size bytes and returns a pointer to the allocated memory.

No malloc, no [heap]

Let’s look at memory regions of a process that does not call malloc (0-main.c).

#include <stdlib.h>
#include <stdio.h>

/**
 * main - do nothing
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    getchar();
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 0-main.c -o 0
julien@holberton:~/holberton/w/hackthevm3$ ./0

Quick reminder (1/3): the memory regions of a process are listed in the /proc/[pid]/maps file. As a result, we first need to know the PID of the process. That is done using the ps command; the second column of ps aux output will give us the PID of the process. Please read chapter 0 to learn more.

julien@holberton:/tmp$ ps aux | grep \ \./0$
julien     3638  0.0  0.0   4200   648 pts/9    S+   12:01   0:00 ./0

Quick reminder (2/3): from the above output, we can see that the PID of the process we want to look at is 3638. As a result, the maps file will be found in the directory /proc/3638.

julien@holberton:/tmp$ cd /proc/3638

Quick reminder (3/3): The maps file contains the memory regions of the process. The format of each line in this file is:
address perms offset dev inode pathname

julien@holberton:/proc/3638$ cat maps
00400000-00401000 r-xp 00000000 08:01 174583                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
00600000-00601000 r--p 00000000 08:01 174583                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
00601000-00602000 rw-p 00001000 08:01 174583                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
7f38f87d7000-7f38f8991000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8991000-7f38f8b91000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b91000-7f38f8b95000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b95000-7f38f8b97000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b97000-7f38f8b9c000 rw-p 00000000 00:00 0 
7f38f8b9c000-7f38f8bbf000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8da3000-7f38f8da6000 rw-p 00000000 00:00 0 
7f38f8dbb000-7f38f8dbe000 rw-p 00000000 00:00 0 
7f38f8dbe000-7f38f8dbf000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8dbf000-7f38f8dc0000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8dc0000-7f38f8dc1000 rw-p 00000000 00:00 0 
7ffdd85c5000-7ffdd85e6000 rw-p 00000000 00:00 0                          [stack]
7ffdd85f2000-7ffdd85f4000 r--p 00000000 00:00 0                          [vvar]
7ffdd85f4000-7ffdd85f6000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
julien@holberton:/proc/3638$

Note: hackthevm3 is a symbolic link to hack_the_virtual_memory/03. The Heap/

-> As we can see from the above maps file, there’s no [heap] region allocated.

`malloc(x)`

Let’s do the same but with a program that calls malloc (1-main.c):

#include <stdio.h>
#include <stdlib.h>

/**
 * main - 1 call to malloc
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    malloc(1);
    getchar();
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 1-main.c -o 1
julien@holberton:~/holberton/w/hackthevm3$ ./1

julien@holberton:/proc/3638$ ps aux | grep \ \./1$
julien     3718  0.0  0.0   4332   660 pts/9    S+   12:09   0:00 ./1
julien@holberton:/proc/3638$ cd /proc/3718
julien@holberton:/proc/3718$ cat maps 
00400000-00401000 r-xp 00000000 08:01 176964                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
00600000-00601000 r--p 00000000 08:01 176964                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
00601000-00602000 rw-p 00001000 08:01 176964                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
01195000-011b6000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/3718$

-> the [heap] is here.

Let’s check the return value of malloc to make sure the returned address is in the heap region (2-main.c):

#include <stdio.h>
#include <stdlib.h>

/**
 * main - prints the malloc returned address
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;

    p = malloc(1);
    printf("%p\n", p);
    getchar();
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 2-main.c -o 2
julien@holberton:~/holberton/w/hackthevm3$ ./2 
0x24d6010

julien@holberton:/proc/3718$ ps aux | grep \ \./2$
julien     3834  0.0  0.0   4336   676 pts/9    S+   12:48   0:00 ./2
julien@holberton:/proc/3718$ cd /proc/3834
julien@holberton:/proc/3834$ cat maps
00400000-00401000 r-xp 00000000 08:01 176966                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
00600000-00601000 r--p 00000000 08:01 176966                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
00601000-00602000 rw-p 00001000 08:01 176966                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
024d6000-024f7000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/3834$

-> 024d6000 <0x24d6010 < 024f7000

The returned address is inside the heap region. And as we have seen in the previous chapter, the returned address does not start exactly at the beginning of the region; we’ll see why later.

`strace`, `brk` and `sbrk`

malloc is a “regular” function (as opposed to a system call), so it must call some kind of syscall in order to manipulate the heap. Let’s use strace to find out.

strace is a program used to trace system calls and signals. Any program will always use a few syscalls before your main function is executed. In order to know which syscalls are used by malloc, we will add a write syscall before and after the call to malloc(3-main.c).

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**
 * main - let's find out which syscall malloc is using
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;

    write(1, "BEFORE MALLOC\n", 14);
    p = malloc(1);
    write(1, "AFTER MALLOC\n", 13);
    printf("%p\n", p);
    getchar();
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 3-main.c -o 3
julien@holberton:~/holberton/w/hackthevm3$ strace ./3 
execve("./3", ["./3"], [/* 61 vars */]) = 0
...
write(1, "BEFORE MALLOC\n", 14BEFORE MALLOC
)         = 14
brk(0)                                  = 0xe70000
brk(0xe91000)                           = 0xe91000
write(1, "AFTER MALLOC\n", 13AFTER MALLOC
)          = 13
...
read(0,

From the above listing we can focus on this:

brk(0)                                  = 0xe70000
brk(0xe91000)                           = 0xe91000

-> malloc is using the brk system call in order to manipulate the heap. From brk man page (man brk), we can see what this system call is doing:

...
       int brk(void *addr);
       void *sbrk(intptr_t increment);
...
DESCRIPTION
       brk() and sbrk() change the location of the program  break,  which  defines
       the end of the process's data segment (i.e., the program break is the first
       location after the end of the uninitialized data segment).  Increasing  the
       program  break has the effect of allocating memory to the process; decreas‐
       ing the break deallocates memory.

       brk() sets the end of the data segment to the value specified by addr, when
       that  value  is  reasonable,  the system has enough memory, and the process
       does not exceed its maximum data size (see setrlimit(2)).

       sbrk() increments the program's data space  by  increment  bytes.   Calling
       sbrk()  with  an increment of 0 can be used to find the current location of
       the program break.

The program break is the address of the first location beyond the current end of the data region of the program in the virual memory.

program break before the call to malloc / brk

By increasing the value of the program break, via brk or sbrk, the function malloc creates a new space that can then be used by the process to dynamically allocate memory (using malloc).

program break after the malloc / brk call

So the heap is actually an extension of the data segment of the program.

The first call to brk (brk(0)) returns the current address of the program break to malloc. And the second call is the one that actually creates new memory (since 0xe91000 > 0xe70000) by increasing the value of the program break. In the above example, the heap is now starting at 0xe70000 and ends at 0xe91000. Let’s double check with the /proc/[PID]/maps file:

julien@holberton:/proc/3855$ ps aux | grep \ \./3$
julien     4011  0.0  0.0   4748   708 pts/9    S+   13:04   0:00 strace ./3
julien     4014  0.0  0.0   4336   644 pts/9    S+   13:04   0:00 ./3
julien@holberton:/proc/3855$ cd /proc/4014
julien@holberton:/proc/4014$ cat maps 
00400000-00401000 r-xp 00000000 08:01 176967                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00600000-00601000 r--p 00000000 08:01 176967                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00601000-00602000 rw-p 00001000 08:01 176967                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00e70000-00e91000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/4014$

-> 00e70000-00e91000 rw-p 00000000 00:00 0 [heap] matches the pointers returned back to malloc by brk.

That’s great, but wait, why didmalloc increment the heap by 00e91000 – 00e70000 = 0x21000 or 135168 bytes, when we only asked for only 1 byte?

Many mallocs

What will happen if we call malloc several times? (4-main.c)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**
 * main - many calls to malloc
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;

    write(1, "BEFORE MALLOC #0\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #0\n", 16);
    printf("%p\n", p);

    write(1, "BEFORE MALLOC #1\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #1\n", 16);
    printf("%p\n", p);

    write(1, "BEFORE MALLOC #2\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #2\n", 16);
    printf("%p\n", p);

    write(1, "BEFORE MALLOC #3\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #3\n", 16);
    printf("%p\n", p);

    getchar();
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 4-main.c -o 4
julien@holberton:~/holberton/w/hackthevm3$ strace ./4 
execve("./4", ["./4"], [/* 61 vars */]) = 0
...
write(1, "BEFORE MALLOC #0\n", 17BEFORE MALLOC #0
)      = 17
brk(0)                                  = 0x1314000
brk(0x1335000)                          = 0x1335000
write(1, "AFTER MALLOC #0\n", 16AFTER MALLOC #0
)       = 16
...
write(1, "0x1314010\n", 100x1314010
)             = 10
write(1, "BEFORE MALLOC #1\n", 17BEFORE MALLOC #1
)      = 17
write(1, "AFTER MALLOC #1\n", 16AFTER MALLOC #1
)       = 16
write(1, "0x1314420\n", 100x1314420
)             = 10
write(1, "BEFORE MALLOC #2\n", 17BEFORE MALLOC #2
)      = 17
write(1, "AFTER MALLOC #2\n", 16AFTER MALLOC #2
)       = 16
write(1, "0x1314830\n", 100x1314830
)             = 10
write(1, "BEFORE MALLOC #3\n", 17BEFORE MALLOC #3
)      = 17
write(1, "AFTER MALLOC #3\n", 16AFTER MALLOC #3
)       = 16
write(1, "0x1314c40\n", 100x1314c40
)             = 10
...
read(0,

-> malloc is NOT calling brk each time we call it.

The first time, malloc creates a new space (the heap) for the program (by increasing the program break location). The following times, malloc uses the same space to give our program “new” chunks of memory. Those “new” chunks of memory are part of the memory previously allocated using brk. This way, malloc doesn’t have to use syscalls (brk) every time we call it, and thus it makes malloc – and our programs using malloc – faster. It also allows malloc and free to optimize the usage of the memory.

Let’s double check that we have only one heap, allocated by the first call to brk:

julien@holberton:/proc/4014$ ps aux | grep \ \./4$
julien     4169  0.0  0.0   4748   688 pts/9    S+   13:33   0:00 strace ./4
julien     4172  0.0  0.0   4336   656 pts/9    S+   13:33   0:00 ./4
julien@holberton:/proc/4014$ cd /proc/4172
julien@holberton:/proc/4172$ cat maps
00400000-00401000 r-xp 00000000 08:01 176973                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
00600000-00601000 r--p 00000000 08:01 176973                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
00601000-00602000 rw-p 00001000 08:01 176973                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
01314000-01335000 rw-p 00000000 00:00 0                                  [heap]
7f4a3f2c4000-7f4a3f47e000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f47e000-7f4a3f67e000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f67e000-7f4a3f682000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f682000-7f4a3f684000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f684000-7f4a3f689000 rw-p 00000000 00:00 0 
7f4a3f689000-7f4a3f6ac000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f890000-7f4a3f893000 rw-p 00000000 00:00 0 
7f4a3f8a7000-7f4a3f8ab000 rw-p 00000000 00:00 0 
7f4a3f8ab000-7f4a3f8ac000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f8ac000-7f4a3f8ad000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f8ad000-7f4a3f8ae000 rw-p 00000000 00:00 0 
7ffd1ba73000-7ffd1ba94000 rw-p 00000000 00:00 0                          [stack]
7ffd1bbed000-7ffd1bbef000 r--p 00000000 00:00 0                          [vvar]
7ffd1bbef000-7ffd1bbf1000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
julien@holberton:/proc/4172$

-> We have only one [heap] and the addresses match those returned by sbrk: 0x1314000 & 0x1335000

Naive malloc

Based on the above, and assuming we won’t ever need to free anything, we can now write our own (naive) version of malloc, that would move the program break each time it is called.

#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * malloc - naive version of malloc: dynamically allocates memory on the heap using sbrk                         
 * @size: number of bytes to allocate                                                          
 *                                                                                             
 * Return: the memory address newly allocated, or NULL on error                                
 *                                                                                             
 * Note: don't do this at home :)                                                              
 */
void *malloc(size_t size)
{
    void *previous_break;

    previous_break = sbrk(size);
    /* check for error */
    if (previous_break == (void *) -1)
    {
        /* on error malloc returns NULL */
        return (NULL);
    }
    return (previous_break);
}

The 0x10 lost bytes

If we look at the output of the previous program (4-main.c), we can see that the first memory address returned by malloc doesn’t start at the beginning of the heap, but 0x10 bytes after: 0x1314010 vs 0x1314000. Also, when we call malloc(1024) a second time, the address should be 0x1314010 (the returned value of the first call to malloc) + 1024 (or 0x400 in hexadecimal, since the first call to malloc was asking for 1024 bytes) = 0x1318010. But the return value of the second call to malloc is 0x1314420. We have lost 0x10 bytes again! Same goes for the subsequent calls.

Let’s look at what we can find inside those “l(fā)ost” 0x10-byte memory spaces (5-main.c) and whether the memory loss stays constant:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - the 0x10 lost bytes
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;

    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1));
        printf("%p\n", p);
        printf("bytes at %p:\n", (void *)((char *)p - 0x10));
        pmem((char *)p - 0x10, 0x10);
    }
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 5-main.c -o 5
julien@holberton:~/holberton/w/hackthevm3$ ./5
0x1fa8010
bytes at 0x1fa8000:
00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
0x1fa8420
bytes at 0x1fa8410:
00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
0x1fa8c30
bytes at 0x1fa8c20:
00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
0x1fa9840
bytes at 0x1fa9830:
00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
0x1faa850
bytes at 0x1faa840:
00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00
0x1fabc60
bytes at 0x1fabc50:
00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
0x1fad470
bytes at 0x1fad460:
00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
0x1faf080
bytes at 0x1faf070:
00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
0x1fb1090
bytes at 0x1fb1080:
00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00
0x1fb34a0
bytes at 0x1fb3490:
00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
julien@holberton:~/holberton/w/hackthevm3$

There is one clear pattern: the size of the malloc’ed memory chunk is always found in the preceding 0x10 bytes. For instance, the first malloc call is malloc’ing 1024 (0x0400) bytes and we can find 11 04 00 00 00 00 00 00 in the preceding 0x10 bytes. Those last bytes represent the number 0x 00 00 00 00 00 00 04 11 = 0x400 (1024) + 0x10 (the block size preceding those 1024 bytes + 1 (we’ll talk about this “+1” later in this chapter). If we look at each 0x10 bytes preceding the addresses returned by malloc, they all contain the size of the chunk of memory asked to malloc + 0x10 + 1.

At this point, given what we said and saw earlier, we can probably guess that those 0x10 bytes are a sort of data structure used by malloc (and free) to deal with the heap. And indeed, even though we don’t understand everything yet, we can already use this data structure to go from one malloc’ed chunk of memory to the other (6-main.c) as long as we have the address of the beginning of the heap (and as long as we have never called free):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - using the 0x10 bytes to jump to next malloc'ed chunks
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;
    void *heap_start;
    size_t size_of_the_block;

    heap_start = sbrk(0);
    write(1, "START\n", 6);
    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1)); 
        *((int *)p) = i;
        printf("%p: [%i]\n", p, i);
    }
    p = heap_start;
    for (i = 0; i < 10; i++)
    {
        pmem(p, 0x10);
        size_of_the_block = *((size_t *)((char *)p + 8)) - 1;
        printf("%p: [%i] - size = %lu\n",
              (void *)((char *)p + 0x10),
              *((int *)((char *)p + 0x10)),
              size_of_the_block);
        p = (void *)((char *)p + size_of_the_block);
    }
    write(1, "END\n", 4);
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 6-main.c -o 6
julien@holberton:~/holberton/w/hackthevm3$ ./6 
START
0x9e6010: [0]
0x9e6420: [1]
0x9e6c30: [2]
0x9e7840: [3]
0x9e8850: [4]
0x9e9c60: [5]
0x9eb470: [6]
0x9ed080: [7]
0x9ef090: [8]
0x9f14a0: [9]
00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
0x9e6010: [0] - size = 1040
00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
0x9e6420: [1] - size = 2064
00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
0x9e6c30: [2] - size = 3088
00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
0x9e7840: [3] - size = 4112
00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00
0x9e8850: [4] - size = 5136
00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
0x9e9c60: [5] - size = 6160
00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
0x9eb470: [6] - size = 7184
00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
0x9ed080: [7] - size = 8208
00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00
0x9ef090: [8] - size = 9232
00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
0x9f14a0: [9] - size = 10256
END
julien@holberton:~/holberton/w/hackthevm3$

One of our open questions from the previous chapter is now answered: malloc is using 0x10 additional bytes for each malloc’ed memory block to store the size of the block.

0x10 bytes preceeding malloc

This data will actually be used by free to save it to a list of available blocks for future calls to malloc.

But our study also raises a new question: what are the first 8 bytes of the 16 (0x10 in hexadecimal) bytes used for? It seems to always be zero. Is it just padding?

RTFSC

At this stage, we probably want to check the source code of malloc to confirm what we just found (malloc.c from the glibc).

1055 /*
1056       malloc_chunk details:
1057    
1058        (The following includes lightly edited explanations by Colin Plumb.)
1059    
1060        Chunks of memory are maintained using a `boundary tag' method as
1061        described in e.g., Knuth or Standish.  (See the paper by Paul
1062        Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
1063        survey of such techniques.)  Sizes of free chunks are stored both
1064        in the front of each chunk and at the end.  This makes
1065        consolidating fragmented chunks into bigger chunks very fast.  The
1066        size fields also hold bits representing whether chunks are free or
1067        in use.
1068    
1069        An allocated chunk looks like this:
1070    
1071    
1072        chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1073                |             Size of previous chunk, if unallocated (P clear)  |
1074                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1075                |             Size of chunk, in bytes                     |A|M|P|
1076          mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1077                |             User data starts here...                          .
1078                .                                                               .
1079                .             (malloc_usable_size() bytes)                      .
1080                .                                                               |
1081    nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1082                |             (size of chunk, but used for application data)    |
1083                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1084                |             Size of next chunk, in bytes                |A|0|1|
1085                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1086    
1087        Where "chunk" is the front of the chunk for the purpose of most of
1088        the malloc code, but "mem" is the pointer that is returned to the
1089        user.  "Nextchunk" is the beginning of the next contiguous chunk.

-> We were correct \o/. Right before the address returned by malloc to the user, we have two variables:

Size of previous chunk, if unallocated: we never free’d any chunks so that is why it was always 0
Size of chunk, in bytes

Let’s free some chunks to confirm that the first 8 bytes are used the way the source code describes it (7-main.c):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - confirm the source code
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;
    size_t size_of_the_chunk;
    size_t size_of_the_previous_chunk;
    void *chunks[10];

    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1));
        chunks[i] = (void *)((char *)p - 0x10);
        printf("%p\n", p);
    }
    free((char *)(chunks[3]) + 0x10);
    free((char *)(chunks[7]) + 0x10);
    for (i = 0; i < 10; i++)
    {
        p = chunks[i];
        printf("chunks[%d]: ", i);
        pmem(p, 0x10);
        size_of_the_chunk = *((size_t *)((char *)p + 8)) - 1;
        size_of_the_previous_chunk = *((size_t *)((char *)p));
        printf("chunks[%d]: %p, size = %li, prev = %li\n",
              i, p, size_of_the_chunk, size_of_the_previous_chunk);
    }
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 7-main.c -o 7
julien@holberton:~/holberton/w/hackthevm3$ ./7
0x1536010
0x1536420
0x1536c30
0x1537840
0x1538850
0x1539c60
0x153b470
0x153d080
0x153f090
0x15414a0
chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
chunks[0]: 0x1536000, size = 1040, prev = 0
chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
chunks[1]: 0x1536410, size = 2064, prev = 0
chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
chunks[2]: 0x1536c20, size = 3088, prev = 0
chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
chunks[3]: 0x1537830, size = 4112, prev = 0
chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00
chunks[4]: 0x1538840, size = 5135, prev = 4112
chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
chunks[5]: 0x1539c50, size = 6160, prev = 0
chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
chunks[6]: 0x153b460, size = 7184, prev = 0
chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
chunks[7]: 0x153d070, size = 8208, prev = 0
chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
chunks[8]: 0x153f080, size = 9231, prev = 8208
chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
chunks[9]: 0x1541490, size = 10256, prev = 0
julien@holberton:~/holberton/w/hackthevm3$

As we can see from the above listing, when the previous chunk has been free’d, the malloc chunk’s first 8 bytes contain the size of the previous unallocated chunk. So the correct representation of a malloc chunk is the following:

malloc chunk

Also, it seems that the first bit of the next 8 bytes (containing the size of the current chunk) serves as a flag to check if the previous chunk is used (1) or not (0). So the correct updated version of our program should be written this way (8-main.c):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - updating with correct checks
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;
    size_t size_of_the_chunk;
    size_t size_of_the_previous_chunk;
    void *chunks[10];
    char prev_used;

    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1));
        chunks[i] = (void *)((char *)p - 0x10);
    }
    free((char *)(chunks[3]) + 0x10);
    free((char *)(chunks[7]) + 0x10);
    for (i = 0; i < 10; i++)
    {
        p = chunks[i];
        printf("chunks[%d]: ", i);
        pmem(p, 0x10);
        size_of_the_chunk = *((size_t *)((char *)p + 8));
        prev_used = size_of_the_chunk & 1;
        size_of_the_chunk -= prev_used;
        size_of_the_previous_chunk = *((size_t *)((char *)p));
        printf("chunks[%d]: %p, size = %li, prev (%s) = %li\n",
              i, p, size_of_the_chunk,
              (prev_used? "allocated": "unallocated"), size_of_the_previous_chunk);
    }
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 8-main.c -o 8
julien@holberton:~/holberton/w/hackthevm3$ ./8 
chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
chunks[0]: 0x1031000, size = 1040, prev (allocated) = 0
chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
chunks[1]: 0x1031410, size = 2064, prev (allocated) = 0
chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
chunks[2]: 0x1031c20, size = 3088, prev (allocated) = 0
chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
chunks[3]: 0x1032830, size = 4112, prev (allocated) = 0
chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00
chunks[4]: 0x1033840, size = 5136, prev (unallocated) = 4112
chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
chunks[5]: 0x1034c50, size = 6160, prev (allocated) = 0
chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
chunks[6]: 0x1036460, size = 7184, prev (allocated) = 0
chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
chunks[7]: 0x1038070, size = 8208, prev (allocated) = 0
chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
chunks[8]: 0x103a080, size = 9232, prev (unallocated) = 8208
chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
chunks[9]: 0x103c490, size = 10256, prev (allocated) = 0
julien@holberton:~/holberton/w/hackthevm3$

Is the heap actually growing upwards?

The last question left unanswered is: “Is the heap actually growing upwards?”. From the brk man page, it seems so:

DESCRIPTION
       brk() and sbrk() change the location of the program break, which defines the end  of  the
       process's  data  segment  (i.e., the program break is the first location after the end of
       the uninitialized data segment).  Increasing the program break has the effect of allocat‐
       ing memory to the process; decreasing the break deallocates memory.

Let’s check! (9-main.c)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**
 * main - moving the program break
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int i;

    write(1, "START\n", 6);
    malloc(1);
    getchar();
    write(1, "LOOP\n", 5);
    for (i = 0; i < 0x25000 / 1024; i++)
    {
        malloc(1024);
    }
    write(1, "END\n", 4);
    getchar();
    return (EXIT_SUCCESS);
}

Now let’s confirm this assumption with strace:

julien@holberton:~/holberton/w/hackthevm3$ strace ./9 
execve("./9", ["./9"], [/* 61 vars */]) = 0
...
write(1, "START\n", 6START
)                  = 6
brk(0)                                  = 0x1fd8000
brk(0x1ff9000)                          = 0x1ff9000
...
write(1, "LOOP\n", 5LOOP
)                   = 5
brk(0x201a000)                          = 0x201a000
write(1, "END\n", 4END
)                    = 4
...
julien@holberton:~/holberton/w/hackthevm3$

clearly, malloc made only two calls to brk to increase the allocated space on the heap. And the second call is using a higher memory address argument (0x201a000 > 0x1ff9000). The second syscall was triggered when the space on the heap was too small to host all the malloc calls.

Let’s double check with /proc.

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 9-main.c -o 9
julien@holberton:~/holberton/w/hackthevm3$ ./9
START

julien@holberton:/proc/7855$ ps aux | grep \ \./9$
julien     7972  0.0  0.0   4332   684 pts/9    S+   19:08   0:00 ./9
julien@holberton:/proc/7855$ cd /proc/7972
julien@holberton:/proc/7972$ cat maps
...
00901000-00922000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/7972$

-> 00901000-00922000 rw-p 00000000 00:00 0 [heap]
Let’s hit Enter and look at the [heap] again:

LOOP
END

julien@holberton:/proc/7972$ cat maps
...
00901000-00943000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/7972$

-> 00901000-00943000 rw-p 00000000 00:00 0 [heap]
The beginning of the heap is still the same, but the size has increased upwards from 00922000 to 00943000.

The Address Space Layout Randomisation (ASLR)

You may have noticed something “strange” in the /proc/pid/maps listing above, that we want to study:

The program break is the address of the first location beyond the current end of the data region – so the address of the first location beyond the executable in the virtual memory. As a consequence, the heap should start right after the end of the executable in memory. As you can see in all above listing, it is NOT the case. The only thing that is true is that the heap is always the next memory region after the executable, which makes sense since the heap is actually part of the data segment of the executable itself. Also, if we look even closer, the memory gap size between the executable and the heap is never the same:

Format of the following lines: [PID of the above maps listings]: address of the beginning of the [heap] – address of the end of the executable = memory gap size

[3718]: 01195000 – 00602000 = b93000
[3834]: 024d6000 – 00602000 = 1ed4000
[4014]: 00e70000 – 00602000 = 86e000
[4172]: 01314000 – 00602000 = d12000
[7972]: 00901000 – 00602000 = 2ff000

It seems that this gap size is random, and indeed, it is. If we look at the ELF binary loader source code (fs/binfmt_elf.c) we can find this:

        if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
                current->mm->brk = current->mm->start_brk =
                        arch_randomize_brk(current->mm);
#ifdef compat_brk_randomized
                current->brk_randomized = 1;
#endif
        }

where current->mm->brk is the address of the program break. The arch_randomize_brk function can be found in the arch/x86/kernel/process.c file:

unsigned long arch_randomize_brk(struct mm_struct *mm)
{
        unsigned long range_end = mm->brk + 0x02000000;
        return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
}

The randomize_range returns a start address such that:

    [...... <range> .....]
  start                  end

Source code of the randomize_range function (drivers/char/random.c):

/*
 * randomize_range() returns a start address such that
 *
 *    [...... <range> .....]
 *  start                  end
 *
 * a <range> with size "len" starting at the return value is inside in the
 * area defined by [start, end], but is otherwise randomized.
 */
unsigned long
randomize_range(unsigned long start, unsigned long end, unsigned long len)
{
        unsigned long range = end - len - start;

        if (end <= start + len)
                return 0;
        return PAGE_ALIGN(get_random_int() % range + start);
}

As a result, the offset between the data section of the executable and the program break initial position when the process runs can have a size of anywhere between 0 and 0x02000000. This randomization is known as Address Space Layout Randomisation (ASLR). ASLR is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the positions of the heap and the stack.

The updated VM diagram

With all the above in mind, we can now update our VM diagram:

Virtual memory diagram

`malloc(0)`

Did you ever wonder what was happening when we call malloc with a size of 0? Let’s check! (10-main.c)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - moving the program break
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    size_t size_of_the_chunk;
    char prev_used;

    p = malloc(0);
    printf("%p\n", p);
    pmem((char *)p - 0x10, 0x10);
    size_of_the_chunk = *((size_t *)((char *)p - 8));
    prev_used = size_of_the_chunk & 1;
    size_of_the_chunk -= prev_used;
    printf("chunk size = %li bytes\n", size_of_the_chunk);
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 10-main.c -o 10
julien@holberton:~/holberton/w/hackthevm3$ ./10
0xd08010
00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00
chunk size = 32 bytes
julien@holberton:~/holberton/w/hackthevm3$

-> malloc(0) is actually using 32 bytes, including the first 0x10 bytes.

Again, note that this will not always be the case. From the man page (man malloc):

NULL may also be returned by a successful call to malloc() with a size of zero

Outro

We have learned a couple of things about malloc and the heap. But there is actually more than brk and sbrk. You can try malloc’ing a big chunk of memory, strace it, and look at /proc to learn more before we cover it in a next chapter

Also, studying how free works in coordination with malloc is something we haven’t covered yet. If you want to look at it, you will find part of the answer to why the minimum chunk size is 32 (when we ask malloc for 0 bytes) vs 16 (0x10 in hexadecimal) or 0.

As usual, to be continued! Let me know if you have something you would like me to cover in the next chapter.

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed.

Files

This repo contains the source code (naive_malloc.c, version.c & “X-main.c` files) for programs created in this tutorial.

Hack the Virtual Memory: drawing the VM diagram

Posted by Julien Barbier

hack the virtual memory

Hack The Virtual Memory, chapter 2: Drawing the VM diagram

We previously talked about what you could find in the virtual memory of a process, and where you could find it. Today, we will try to “reconstruct” (part of) the following diagram by making our process print addresses of various elements of the program.

the virtual memory

Prerequisites

In order to fully understand this article, you will need to know:

The basics of the C programming language
A little bit of assembly (but not required)
The very basics of the Linux filesystem and the shell
We will also use the /proc/[pid]/maps file (see man proc or read our first article Hack The Virtual Memory, chapter 0: C strings & /proc)

Environment

All scripts and programs have been tested on the following system:

Ubuntu

Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Everything we will write will be true for this system, but may be different on another system

Tools used:

gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4

objdump

GNU objdump (GNU Binutils for Ubuntu) 2.24

udcli

udis86 1.7.2

bc 1.06.95

The stack

The first thing we want to locate in our diagram is the stack. We know that in C, local variables are located on the stack. So if we print the address of a local variable, it should give us an idea on where we would find the stack in the virtual memory. Let’s use this program (main-1.c) to find out:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;

    printf("Address of a: %p\n", (void *)&a);
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-0.c -o 0
julien@holberton:~/holberton/w/hackthevm2$ ./0
Address of a: 0x7ffd14b8bd9c
julien@holberton:~/holberton/w/hackthevm2$

This will be our first point of reference when we will compare other elements’ addresses.

The heap

The heap is used when you malloc space for your variables. Let’s add a line to use malloc and see where the memory address returned by malloc is located (main-1.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-1.c -o 1
julien@holberton:~/holberton/w/hackthevm2$ ./1 
Address of a: 0x7ffd4204c554
Allocated space in the heap: 0x901010
julien@holberton:~/holberton/w/hackthevm2$

It’s now clear that the heap (0x901010) is way below the stack (0x7ffd4204c554). At this point we can already draw this diagram:

heap and stack

The executable

Your program is also in the virtual memory. If we print the address of the main function, we should have an idea of where the program is located compared to the stack and the heap. Let’s see if we find it below the heap as expected (main-2.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    printf("Address of function main: %p\n", (void *)main);
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-2.c -o 2
julien@holberton:~/holberton/w/hackthevm2$ ./2 
Address of a: 0x7ffdced37d74
Allocated space in the heap: 0x2199010
Address of function main: 0x40060d
julien@holberton:~/holberton/w/hackthevm2$

It seems that our program (0x40060d) is located below the heap (0x2199010), just as expected.
But let’s make sure that this is the actual code of our program, and not some sort of pointer to another location. Let’s disassemble our program 2 with objdump to look at the “memory address” of the main function:

julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 2 | grep '<main>:' -A 5
000000000040060d <main>:
  40060d:   55                      push   rbp
  40060e:   48 89 e5                mov    rbp,rsp
  400611:   48 83 ec 10             sub    rsp,0x10
  400615:   48 8d 45 f4             lea    rax,[rbp-0xc]
  400619:   48 89 c6                mov    rsi,rax

000000000040060d <main> -> we find the exact same address (0x40060d). If you still have any doubts, you can print the first bytes located at this address, to make sure they match the output of objdump (main-3.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;
    unsigned int i;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    printf("Address of function main: %p\n", (void *)main);
    printf("First bytes of the main function:\n\t");
    for (i = 0; i < 15; i++)
    {
        printf("%02x ", ((unsigned char *)main)[i]);
    }
    printf("\n");
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-3.c -o 3
julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 3 | grep '<main>:' -A 5
000000000040064d <main>:
  40064d:   55                      push   rbp
  40064e:   48 89 e5                mov    rbp,rsp
  400651:   48 83 ec 10             sub    rsp,0x10
  400655:   48 8d 45 f0             lea    rax,[rbp-0x10]
  400659:   48 89 c6                mov    rsi,rax
julien@holberton:~/holberton/w/hackthevm2$ ./3 
Address of a: 0x7ffeff0f13b0
Allocated space in the heap: 0x8b3010
Address of function main: 0x40064d
First bytes of the main function:
    55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6 
julien@holberton:~/holberton/w/hackthevm2$ echo "55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6" | udcli -64 -x -o 40064d
000000000040064d 55               push rbp                
000000000040064e 4889e5           mov rbp, rsp            
0000000000400651 4883ec10         sub rsp, 0x10           
0000000000400655 488d45f0         lea rax, [rbp-0x10]     
0000000000400659 4889c6           mov rsi, rax            
julien@holberton:~/holberton/w/hackthevm2$

-> We can see that we print the same address and the same content. We are now triple sure this is our main function.

You can download the Udis86 Disassembler Library here.

Here is the updated diagram, based on what we have learned:

stack, heap and executable

Command line arguments and environment variables

The main function can take arguments:

The command line arguments

the first argument of the main function (usually named argc or ac) is the number of command line arguments
the second argument of the main function (usually named argv or av) is an array of pointers to the arguments (C strings)

The environment variables

the third argument of the main function (usally named env or envp) is an array of pointers to the environment variables (C strings)

Let’s see where those elements stand in the virtual memory of our process (main-4.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(int ac, char **av, char **env)
{
        int a;
        void *p;
        int i;

        printf("Address of a: %p\n", (void *)&a);
        p = malloc(98);
        if (p == NULL)
        {
                fprintf(stderr, "Can't malloc\n");
                return (EXIT_FAILURE);
        }
        printf("Allocated space in the heap: %p\n", p);
        printf("Address of function main: %p\n", (void *)main);
        printf("First bytes of the main function:\n\t");
        for (i = 0; i < 15; i++)
        {
                printf("%02x ", ((unsigned char *)main)[i]);
        }
        printf("\n");
        printf("Address of the array of arguments: %p\n", (void *)av);
        printf("Addresses of the arguments:\n\t");
        for (i = 0; i < ac; i++)
        {
                printf("[%s]:%p ", av[i], av[i]);
        }
        printf("\n");
        printf("Address of the array of environment variables: %p\n", (void *)env);
    printf("Address of the first environment variable: %p\n", (void *)(env[0]));
        return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-4.c -o 4
julien@holberton:~/holberton/w/hackthevm2$ ./4 Hello Holberton School!
Address of a: 0x7ffe7d6d8da0
Allocated space in the heap: 0xc8c010
Address of function main: 0x40069d
First bytes of the main function:
    55 48 89 e5 48 83 ec 30 89 7d ec 48 89 75 e0 
Address of the array of arguments: 0x7ffe7d6d8e98
Addresses of the arguments:
    [./4]:0x7ffe7d6da373 [Hello]:0x7ffe7d6da377 [Holberton]:0x7ffe7d6da37d [School!]:0x7ffe7d6da387 
Address of the array of environment variables: 0x7ffe7d6d8ec0
Address of the first environment variables:
    [0x7ffe7d6da38f]:"XDG_VTNR=7"
    [0x7ffe7d6da39a]:"XDG_SESSION_ID=c2"
    [0x7ffe7d6da3ac]:"CLUTTER_IM_MODULE=xim"
julien@holberton:~/holberton/w/hackthevm2$

These elements are above the stack as expected, but now we know the exact order: stack (0x7ffe7d6d8da0) < argv (0x7ffe7d6d8e98) < env (0x7ffe7d6d8ec0) < arguments (from 0x7ffe7d6da373 to 0x7ffe7d6da387 + 8 (8 = size of the string "school" + 1 for the '\0' char)) < environment variables (starting at 0x7ffe7d6da38f).

Actually, we can also see that all the command line arguments are next to each other in the memory, and also right next to the environment variables.

Are the `argv` and `env` arrays next to each other?

The array argv is 5 elements long (there were 4 arguments from the command line + 1 NULL element at the end (argv always ends with NULL to mark the end of the array)). Each element is a pointer to a char and since we are on a 64-bit machine, a pointer is 8 bytes (if you want to make sure, you can use the C operator sizeof() to get the size of a pointer). As a result our argv array is of size 5 * 8 = 40. 40 in base 10 is 0x28 in base 16. If we add this value to the address of the beginning of the array 0x7ffe7d6d8e98, we get… 0x7ffe7d6d8ec0 (The address of the env array)! So the two arrays are next to each other in memory.

Is the first command line argument stored right after the `env` array?

In order to check this we need to know the size of the env array. We know that it ends with a NULL pointer, so in order to get the number of its elements we simply need to loop through it, checking if the “current” element is NULL. Here’s the updated C code (main-5.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**                                                                                                      
 * main - print locations of various elements                                                            
 *                                                                                                       
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS                                      
 */
int main(int ac, char **av, char **env)
{
     int a;
     void *p;
     int i;
     int size;

     printf("Address of a: %p\n", (void *)&a);
     p = malloc(98);
     if (p == NULL)
     {
          fprintf(stderr, "Can't malloc\n");
          return (EXIT_FAILURE);
     }
     printf("Allocated space in the heap: %p\n", p);
     printf("Address of function main: %p\n", (void *)main);
     printf("First bytes of the main function:\n\t");
     for (i = 0; i < 15; i++)
     {
          printf("%02x ", ((unsigned char *)main)[i]);
     }
     printf("\n");
     printf("Address of the array of arguments: %p\n", (void *)av);
     printf("Addresses of the arguments:\n\t");
     for (i = 0; i < ac; i++)
     {
          printf("[%s]:%p ", av[i], av[i]);
     }
     printf("\n");
     printf("Address of the array of environment variables: %p\n", (void *)env);
     printf("Address of the first environment variables:\n");
     for (i = 0; i < 3; i++)
     {
          printf("\t[%p]:\"%s\"\n", env[i], env[i]);
     }
     /* size of the env array */
     i = 0;
     while (env[i] != NULL)
     {
          i++;
     }
     i++; /* the NULL pointer */
     size = i * sizeof(char *);
     printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
     return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ ./5 Hello Betty Holberton!
Address of a: 0x7ffc77598acc
Allocated space in the heap: 0x2216010
Address of function main: 0x40069d
First bytes of the main function:
    55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 
Address of the array of arguments: 0x7ffc77598bc8
Addresses of the arguments:
    [./5]:0x7ffc7759a374 [Hello]:0x7ffc7759a378 [Betty]:0x7ffc7759a37e [Holberton!]:0x7ffc7759a384 
Address of the array of environment variables: 0x7ffc77598bf0
Address of the first environment variables:
    [0x7ffc7759a38f]:"XDG_VTNR=7"
    [0x7ffc7759a39a]:"XDG_SESSION_ID=c2"
    [0x7ffc7759a3ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
julien@holberton:~/holberton/w/hackthevm2$ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=16
ibase=16
1F0+7FFC77598BF0
7FFC77598DE0
quit
julien@holberton:~/holberton/w/hackthevm2$

-> 7FFC77598DE0 != (but still <) 0x7ffc7759a374. So the answer is no

Wrapping up

Let’s update our diagram with what we learned.

virtual memory with command line arguments and environment variables

Is the stack really growing downwards?

Let’s call a function and figure this out! If this is true, then the variables of the calling function will be higher in memory than those from the called function (main-6.c).

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**                                                                                                      
 * f - print locations of various elements                                                               
 *                                                                                                       
 * Returns: nothing                                                                                      
 */
void f(void)
{
     int a;
     int b;
     int c;

     a = 98;
     b = 1024;
     c = a * b;
     printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c);
     printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c);
}

/**                                                                                                      
 * main - print locations of various elements                                                            
 *                                                                                                       
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS                                      
 */
int main(int ac, char **av, char **env)
{
     int a;
     void *p;
     int i;
     int size;

     printf("Address of a: %p\n", (void *)&a);
     p = malloc(98);
     if (p == NULL)
     {
          fprintf(stderr, "Can't malloc\n");
          return (EXIT_FAILURE);
     }
     printf("Allocated space in the heap: %p\n", p);
     printf("Address of function main: %p\n", (void *)main);
     printf("First bytes of the main function:\n\t");
     for (i = 0; i < 15; i++)
     {
          printf("%02x ", ((unsigned char *)main)[i]);
     }
     printf("\n");
     printf("Address of the array of arguments: %p\n", (void *)av);
     printf("Addresses of the arguments:\n\t");
     for (i = 0; i < ac; i++)
     {
          printf("[%s]:%p ", av[i], av[i]);
     }
     printf("\n");
     printf("Address of the array of environment variables: %p\n", (void *)env);
     printf("Address of the first environment variables:\n");
     for (i = 0; i < 3; i++)
     {
          printf("\t[%p]:\"%s\"\n", env[i], env[i]);
     }
     /* size of the env array */
     i = 0;
     while (env[i] != NULL)
     {
          i++;
     }
     i++; /* the NULL pointer */
     size = i * sizeof(char *);
     printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
     f();
     return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-6.c -o 6
julien@holberton:~/holberton/w/hackthevm2$ ./6
Address of a: 0x7ffdae53ea4c
Allocated space in the heap: 0xf32010
Address of function main: 0x4006f9
First bytes of the main function:
    55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 
Address of the array of arguments: 0x7ffdae53eb48
Addresses of the arguments:
    [./6]:0x7ffdae54038b 
Address of the array of environment variables: 0x7ffdae53eb58
Address of the first environment variables:
    [0x7ffdae54038f]:"XDG_VTNR=7"
    [0x7ffdae54039a]:"XDG_SESSION_ID=c2"
    [0x7ffdae5403ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
[f] a = 98, b = 1024, c = a * b = 100352
[f] Adresses of a: 0x7ffdae53ea04, b = 0x7ffdae53ea08, c = 0x7ffdae53ea0c
julien@holberton:~/holberton/w/hackthevm2$

-> True! (address of var a in function f) 0x7ffdae53ea04 < 0x7ffdae53ea4c (address of var a in function main)

We now update our diagram:

stack is growing downwards

`/proc`

Let’s double check everything we found so far with /proc/[pid]/maps (man proc or refer to the first article in this series to learn about the proc filesystem if you don’t know what it is).

Let’s add a getchar() to our program so that we can look at its “/proc” (main-7.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**                                                                                                      
 * f - print locations of various elements                                                               
 *                                                                                                       
 * Returns: nothing                                                                                      
 */
void f(void)
{
     int a;
     int b;
     int c;

     a = 98;
     b = 1024;
     c = a * b;
     printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c);
     printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c);
}

/**                                                                                                      
 * main - print locations of various elements                                                            
 *                                                                                                       
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS                                      
 */
int main(int ac, char **av, char **env)
{
     int a;
     void *p;
     int i;
     int size;

     printf("Address of a: %p\n", (void *)&a);
     p = malloc(98);
     if (p == NULL)
     {
          fprintf(stderr, "Can't malloc\n");
          return (EXIT_FAILURE);
     }
     printf("Allocated space in the heap: %p\n", p);
     printf("Address of function main: %p\n", (void *)main);
     printf("First bytes of the main function:\n\t");
     for (i = 0; i < 15; i++)
     {
          printf("%02x ", ((unsigned char *)main)[i]);
     }
     printf("\n");
     printf("Address of the array of arguments: %p\n", (void *)av);
     printf("Addresses of the arguments:\n\t");
     for (i = 0; i < ac; i++)
     {
          printf("[%s]:%p ", av[i], av[i]);
     }
     printf("\n");
     printf("Address of the array of environment variables: %p\n", (void *)env);
     printf("Address of the first environment variables:\n");
     for (i = 0; i < 3; i++)
     {
          printf("\t[%p]:\"%s\"\n", env[i], env[i]);
     }
     /* size of the env array */
     i = 0;
     while (env[i] != NULL)
     {
          i++;
     }
     i++; /* the NULL pointer */
     size = i * sizeof(char *);
     printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
     f();
     getchar();
     return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-7.c -o 7
julien@holberton:~/holberton/w/hackthevm2$ ./7 Rona is a Legend SRE
Address of a: 0x7fff16c8146c
Allocated space in the heap: 0x2050010
Address of function main: 0x400739
First bytes of the main function:
    55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 
Address of the array of arguments: 0x7fff16c81568
Addresses of the arguments:
    [./7]:0x7fff16c82376 [Rona]:0x7fff16c8237a [is]:0x7fff16c8237f [a]:0x7fff16c82382 [Legend]:0x7fff16c82384 [SRE]:0x7fff16c8238b 
Address of the array of environment variables: 0x7fff16c815a0
Address of the first environment variables:
    [0x7fff16c8238f]:"XDG_VTNR=7"
    [0x7fff16c8239a]:"XDG_SESSION_ID=c2"
    [0x7fff16c823ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
[f] a = 98, b = 1024, c = a * b = 100352
[f] Adresses of a: 0x7fff16c81424, b = 0x7fff16c81428, c = 0x7fff16c8142c

julien@holberton:~$ ps aux | grep "./7" | grep -v grep
julien     5788  0.0  0.0   4336   628 pts/8    S+   18:04   0:00 ./7 Rona is a Legend SRE
julien@holberton:~$ cat /proc/5788/maps
00400000-00401000 r-xp 00000000 08:01 171828                             /home/julien/holberton/w/hackthevm2/7
00600000-00601000 r--p 00000000 08:01 171828                             /home/julien/holberton/w/hackthevm2/7
00601000-00602000 rw-p 00001000 08:01 171828                             /home/julien/holberton/w/hackthevm2/7
02050000-02071000 rw-p 00000000 00:00 0                                  [heap]
7f68caa1c000-7f68cabd6000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68cabd6000-7f68cadd6000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68cadd6000-7f68cadda000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68cadda000-7f68caddc000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68caddc000-7f68cade1000 rw-p 00000000 00:00 0 
7f68cade1000-7f68cae04000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f68cafe8000-7f68cafeb000 rw-p 00000000 00:00 0 
7f68cafff000-7f68cb003000 rw-p 00000000 00:00 0 
7f68cb003000-7f68cb004000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f68cb004000-7f68cb005000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f68cb005000-7f68cb006000 rw-p 00000000 00:00 0 
7fff16c62000-7fff16c83000 rw-p 00000000 00:00 0                          [stack]
7fff16d07000-7fff16d09000 r--p 00000000 00:00 0                          [vvar]
7fff16d09000-7fff16d0b000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
julien@holberton:~$

Let’s check a few things:

The stack starts at 7fff16c62000 and ends at 7fff16c83000. Our variables are all inside this region (0x7fff16c8146c, 0x7fff16c81424, 0x7fff16c81428, 0x7fff16c8142c)
The heap starts at 02050000 and ends at 02071000. Our allocated memory is in there (0x2050010)
Our code (the main function) is located at address 0x400739, so in the following region:
00400000-00401000 r-xp 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
It comes from the file /home/julien/holberton/w/hackthevm2/7 (our executable) and this region has execution permissions, which also makes sense.
The arguments and environment variables (from 0x7fff16c81568 to 0x7fff16c8238f + 0x1f0) are located in the region starting at 7fff16c62000 and ending at 7fff16c83000… the stack! So they are IN the stack, not outside the stack.

This also brings up more questions:

Why is our executable “divided” into three different regions with different permissions? What is inside these two regions?

00600000-00601000 r--p 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
00601000-00602000 rw-p 00001000 08:01 171828 /home/julien/holberton/w/hackthevm2/7

What are all those other regions?
Why our allocated memory does not start at the very beginning of the heap (0x2050010 vs 02050000)? What are those first 16 bytes used for?

There is also another fact that we haven’t checked: Is the heap actually growing upwards?

We’ll find out another day! But before we end this chapter, let’s update our diagram with everything we’ve learned:

the virtual memory

Outro

We have learned a ton of things by simply printing informations from our executables! But we still have open questions that we will explore in a future chapter to complete our diagram of the virtual memory. In the meantime, you should try to find out yourself.

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed.

Files

This repo contains the source code (main-X.c files) for programs created in this tutorial.

Hack The Virtual Memory: Python bytes

Posted by Julien Barbier

hack the vm!

Hack The Virtual Memory, Chapter 1: Python bytes

For this second chapter, we’ll do almost the same thing as for chapter 0: C strings & /proc, but instead we’ll access the virtual memory of a running Python 3 script. It won’t be as straightfoward.
Let’s take this as an excuse to look at some Python 3 internals!

Prerequisites

This article is based on everything we learned in the previous chapter. Please read (and understand) chapter 0: C strings & /proc before reading this one.

In order to fully understand this article, you need to know:

The basics of the C programming language
Some Python
The very basics of the Linux filesystem and the shell
The very basics of the /proc filesystem (see chapter 0: C strings & /proc for an intro on this topic)

Environment

All scripts and programs have been tested on the following system:

Ubuntu

Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4

Python 3:

Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux

Python script

We’ll first use this script (main.py) and try to modify the “string” Holberton in the virtual memory of the process running it.

#!/usr/bin/env python3
'''
Prints a b"string" (bytes object), reads a char from stdin
and prints the same (or not :)) string again
'''

import sys

s = b"Holberton"
print(s)
sys.stdin.read(1)
print(s)

About the bytes object

bytes vs str

As you can see, we are using a bytes object (we use a b in front of our string literal) to store our string. This type will store the characters of the string as bytes (vs potentially multibytes – you can read the unicodeobject.h to learn more about how Python 3 encodes strings). This ensures that the string will be a succession of ASCII-values in the virtual memory of the process running the script.

Technically s is not a Python string (but it doesn’t matter in our context):

julien@holberton:~/holberton/w/hackthevm1$ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "Betty"
>>> type(s)
<class 'str'>
>>> s = b"Betty"
>>> type(s)
<class 'bytes'>
>>> quit()

Everything is an object

Everything in Python is an object: integers, strings, bytes, functions, everything. So the line s = b"Holberton" should create an object of type bytes, and store the string b"Holberton somewhere in memory. Probably in the heap since it has to reserve space for the object and the bytes referenced by or stored in the object (at this point we don’t know about the exact implementation).

Running `read_write_heap.py` against the Python script

Note: read_write_heap.py is a script we wrote in the previous chapter chapter 0: C strings & /proc

Let’s run the above script and then run our read_write_heap.py script:

julien@holberton:~/holberton/w/hackthevm1$ ./main.py 
b'Holberton'

At this point main.py is waiting for the user to hit Enter. That corresponds to the line sys.stdin.read(1) in our code.
Let’s run read_write_heap.py:

julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep
julien     3929  0.0  0.7  31412  7848 pts/0    S+   15:10   0:00 python3 ./main.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 3929 Holberton "~ Betty ~"
[*] maps: /proc/3929/maps
[*] mem: /proc/3929/mem
[*] Found [heap]:
    pathname = [heap]
    addresses = 022dc000-023c6000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [22dc000] | end [23c6000]
[*] Found 'Holberton' at 8e192
[*] Writing '~ Betty ~' at 236a192
julien@holberton:~/holberton/w/hackthevm1$

Easy! As expected, we found the string on the heap and replaced it. Now when we will hit Enter in the main.py script, it will print b'~ Betty ~':


b'Holberton'
julien@holberton:~/holberton/w/hackthevm1$

Wait, WAT?!

WAT!

We found the string “Holberton” and replaced it, but it was not the correct string?
Before we go down the rabbit hole, we have one more thing to check. Our script stops when it finds the first occurence of the string. Let’s run it several times to see if there are more occurences of the same string in the heap.

julien@holberton:~/holberton/w/hackthevm1$ ./main.py 
b'Holberton'


julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep
julien     4051  0.1  0.7  31412  7832 pts/0    S+   15:53   0:00 python3 ./main.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 4051 Holberton "~ Betty ~"
[*] maps: /proc/4051/maps
[*] mem: /proc/4051/mem
[*] Found [heap]:
    pathname = [heap]
    addresses = 00bf4000-00cde000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [bf4000] | end [cde000]
[*] Found 'Holberton' at 8e162
[*] Writing '~ Betty ~' at c82162
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 4051 Holberton "~ Betty ~"
[*] maps: /proc/4051/maps
[*] mem: /proc/4051/mem
[*] Found [heap]:
    pathname = [heap]
    addresses = 00bf4000-00cde000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [bf4000] | end [cde000]
Can't find 'Holberton'
julien@holberton:~/holberton/w/hackthevm1$

Only one occurence. So where is the string “Holberton” that is used by the script? Where is our Python bytes object in memory? Could it be in the stack? Let’s replace “[heap]” by “[stack]”* in our read_write_heap.py script to create the read_write_stack.py:

(*) _see previous article, the stack region is called “[stack]” on the /proc/[pid]/maps file_

#!/usr/bin/env python3
'''
Locates and replaces the first occurrence of a string in the stack
of a process

Usage: ./read_write_stack.py PID search_string replace_by_string
Where:
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
search_string with
'''

import sys

def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    sys.exit(1)

# check usage
if len(sys.argv) != 4:
    print_usage_and_exit()

# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string  == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if search_string  == "":
    print_usage_and_exit()

# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))

# try opening the maps file
try:
    maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    sys.exit(1)

for line in maps_file:
    sline = line.split(' ')
    # check if we found the stack
    if sline[-1][:-1] != "[stack]":
        continue
    print("[*] Found [stack]:")

    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]
    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermisions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))

    # check if there is read and write permission
    if perm[0] != 'r' or perm[1] != 'w':
        print("[*] {} does not have read/write permission".format(pathname))
        maps_file.close()
        exit(0)

    # get start and end of the stack in the virtual memory
    addr = addr.split("-")
    if len(addr) != 2: # never trust anyone, not even your OS :)
        print("[*] Wrong addr format")
        maps_file.close()
        exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))

    # open and read mem
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()
        exit(1)

    # read stack
    mem_file.seek(addr_start)
    stack = mem_file.read(addr_end - addr_start)

    # find string
    try:
        i = stack.index(bytes(search_string, "ASCII"))
    except Exception:
        print("Can't find '{}'".format(search_string))
        maps_file.close()
        mem_file.close()
        exit(0)
    print("[*] Found '{}' at {:x}".format(search_string, i))

    # write the new stringprint("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
    mem_file.seek(addr_start + i)
    mem_file.write(bytes(write_string, "ASCII"))

    # close filesmaps_file.close()
    mem_file.close()

    # there is only one stack in our example
    break

The above script (read_write_heap.py) does exactly the same thing than the previous one (read_write_stack.py), the same way. Except we’re looking at the stack, instead of the heap. Let’s try to find our string in the stack:

julien@holberton:~/holberton/w/hackthevm1$ ./main.py
b'Holberton'

julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep
julien     4124  0.2  0.7  31412  7848 pts/0    S+   16:10   0:00 python3 ./main.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_stack.py 4124 Holberton "~ Betty ~"
[sudo] password for julien: 
[*] maps: /proc/4124/maps
[*] mem: /proc/4124/mem
[*] Found [stack]:
    pathname = [stack]
    addresses = 7fff2997e000-7fff2999f000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [7fff2997e000] | end [7fff2999f000]
Can't find 'Holberton'
julien@holberton:~/holberton/w/hackthevm1$

So our string is not in the heap and not in the stack: Where is it? It’s time to dig into Python 3 internals and locate the string using what we will learn. Brace yourselves, the fun will begin now

Locating the string in the virtual memory

Note: It is important to note that there are many implementations of Python 3. In this article, we are using the original and most commonly used: CPython (coded in C). What we are about to say about Python 3 will only be true for this implementation.

id

There is a simple way to know where the object (be careful, the object is not the string) is in the virtual memory. CPython has a specific implementation of the id() builtin: id() will return the address of the object in memory.

If we add a line to the Python script to print the id of our object, we should get its address (main_id.py):

#!/usr/bin/env python3
'''
Prints:
- the address of the bytes object
- a b"string" (bytes object)
reads a char from stdin
and prints the same (or not :)) string again
'''

import sys

s = b"Holberton"
print(hex(id(s)))
print(s)
sys.stdin.read(1)
print(s)

julien@holberton:~/holberton/w/hackthevm1$ ./main_id.py
0x7f343f010210
b'Holberton'

-> 0x7f343f010210. Let’s look at /proc/ to understand where exactly our object is located.

julien@holberton:/usr/include/python3.4$ ps aux | grep main_id.py | grep -v grep
julien     4344  0.0  0.7  31412  7856 pts/0    S+   16:53   0:00 python3 ./main_id.py
julien@holberton:/usr/include/python3.4$ cat /proc/4344/maps
00400000-006fa000 r-xp 00000000 08:01 655561                             /usr/bin/python3.4
008f9000-008fa000 r--p 002f9000 08:01 655561                             /usr/bin/python3.4
008fa000-00986000 rw-p 002fa000 08:01 655561                             /usr/bin/python3.4
00986000-009a2000 rw-p 00000000 00:00 0 
021ba000-022a4000 rw-p 00000000 00:00 0                                  [heap]
7f343d797000-7f343de79000 r--p 00000000 08:01 663747                     /usr/lib/locale/locale-archive
7f343de79000-7f343df7e000 r-xp 00000000 08:01 136303                     /lib/x86_64-linux-gnu/libm-2.19.so
7f343df7e000-7f343e17d000 ---p 00105000 08:01 136303                     /lib/x86_64-linux-gnu/libm-2.19.so
7f343e17d000-7f343e17e000 r--p 00104000 08:01 136303                     /lib/x86_64-linux-gnu/libm-2.19.so
7f343e17e000-7f343e17f000 rw-p 00105000 08:01 136303                     /lib/x86_64-linux-gnu/libm-2.19.so
7f343e17f000-7f343e197000 r-xp 00000000 08:01 136416                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e197000-7f343e396000 ---p 00018000 08:01 136416                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e396000-7f343e397000 r--p 00017000 08:01 136416                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e397000-7f343e398000 rw-p 00018000 08:01 136416                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e398000-7f343e3bf000 r-xp 00000000 08:01 136275                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e3bf000-7f343e5bf000 ---p 00027000 08:01 136275                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e5bf000-7f343e5c1000 r--p 00027000 08:01 136275                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e5c1000-7f343e5c2000 rw-p 00029000 08:01 136275                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e5c2000-7f343e5c4000 r-xp 00000000 08:01 136408                     /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e5c4000-7f343e7c3000 ---p 00002000 08:01 136408                     /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e7c3000-7f343e7c4000 r--p 00001000 08:01 136408                     /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e7c4000-7f343e7c5000 rw-p 00002000 08:01 136408                     /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e7c5000-7f343e7c8000 r-xp 00000000 08:01 136270                     /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e7c8000-7f343e9c7000 ---p 00003000 08:01 136270                     /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e9c7000-7f343e9c8000 r--p 00002000 08:01 136270                     /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e9c8000-7f343e9c9000 rw-p 00003000 08:01 136270                     /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e9c9000-7f343eb83000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f343eb83000-7f343ed83000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f343ed83000-7f343ed87000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f343ed87000-7f343ed89000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f343ed89000-7f343ed8e000 rw-p 00000000 00:00 0 
7f343ed8e000-7f343eda7000 r-xp 00000000 08:01 136373                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343eda7000-7f343efa6000 ---p 00019000 08:01 136373                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343efa6000-7f343efa7000 r--p 00018000 08:01 136373                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343efa7000-7f343efa8000 rw-p 00019000 08:01 136373                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343efa8000-7f343efac000 rw-p 00000000 00:00 0 
7f343efac000-7f343efcf000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f343f000000-7f343f1b6000 rw-p 00000000 00:00 0 
7f343f1c5000-7f343f1cc000 r--s 00000000 08:01 918462                     /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
7f343f1cc000-7f343f1ce000 rw-p 00000000 00:00 0 
7f343f1ce000-7f343f1cf000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f343f1cf000-7f343f1d0000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f343f1d0000-7f343f1d1000 rw-p 00000000 00:00 0 
7ffccf1fd000-7ffccf21e000 rw-p 00000000 00:00 0                          [stack]
7ffccf23c000-7ffccf23e000 r--p 00000000 00:00 0                          [vvar]
7ffccf23e000-7ffccf240000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
julien@holberton:/usr/include/python3.4$

-> Our object is stored in the following memory region: 7f343f000000-7f343f1b6000 rw-p 00000000 00:00 0, which is not the heap, and not the stack. That confirms what we saw earlier. But that doesn’t mean that the string itself is stored in the same memory region. For instance, the bytes object could store a pointer to the string, and not a copy of a string. Of course, at this point we could search for our string in this memory region, but we want to understand and be positive we’re looking in the right area, and not use “brute force” to find the solution. It’s time to learn more about bytes objects.

`bytesobject.h`

We are using the C implementation of Python (CPython), so let’s look at the header file for bytes objects.

Note: If you don’t have the Python 3 header files, you can use this command on Ubuntu: sudo apt-get install python3-dev to downloadd them on your system. If you are using the exact same environment as me (see the “Environment” section above), you should then be able to see the Python 3 header files in the /usr/include/python3.4/ directory.

From bytesobject.h:

typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     */
} PyBytesObject;

What does that tell us?

A Python 3 bytes object is represented internally using a variable of type PyBytesObject
ob_sval holds the entire string
The string ends with 0
ob_size stores the length of the string (to find ob_size, look at the definition of the macro PyObject_VAR_HEAD in objects.h. We’ll look at it later)

So in our example, if we were able to print the bytes object, we should see this:

ob_sval: “Holberton” -> Bytes values: 48 6f 6c 62 65 72 74 6f 6e 00
ob_size: 9

Based on what we did learn previously, this means that the string is “inside” the bytes object. So inside the same memory region \o/

What if we didn’t know about the way id was implemented in CPython? There is actually another way that we can use to find where the string is: looking at the actual object in memory.

Looking at the `bytes` object in memory

If we want to look directly at the PyBytesObject variable, we will need to create a C function, and call this C function from Python. There are different ways to call a C function from Python. We will use the simplest one: using a dynamic library.

Creating our C function

So the idea is to create a C function that is called from Python with the object as a parameter, and then “explore” this object to get the exact address of the string (as well as other information about the object).

The function prototype should be: void print_python_bytes(PyObject *p);, where p is a pointer to our object (so p stores the address of our object in the virtual memory). It doesn’t have to return anything.

`object.h`

You probably have noticed that we don’t use a parameter of type PyBytesObject. To understand why, let’s have a look at the object.h header file and see what we can learn from it:

/* Object and type object interface */

/*
Objects are structures allocated on the heap.  Special rules apply to
the use of objects to ensure they are properly garbage-collected.
Objects are never allocated statically or on the stack; they must be
...
*/

“Objects are never allocated statically or on the stack” -> ok, now we know why it was not on the stack.
“Objects are structures allocated on the heap” -> wait… WAT? We searched for the string in the heap and it was NOT there… I’m confused! We’ll discuss this later, in another article

What else can we read in this file:

/*
...
Objects do not float around in memory; once allocated an object keeps
the same size and address.  Objects that must hold variable-size data
...
*/

“Objects do not float around in memory; once allocated an object keeps the same size and address”. Good to know. That means that if we modify the correct string, it will always be modfied, and the addresses will never change
“once allocated” -> allocation? but not using the heap? I’m confused! We’ll discuss this later in another article

/*
...
Objects are always accessed through pointers of the type 'PyObject *'.
The type 'PyObject' is a structure that only contains the reference count
and the type pointer.  The actual memory allocated for an object
contains other data that can only be accessed after casting the pointer
to a pointer to a longer structure type.  This longer type must start
with the reference count and type fields; the macro PyObject_HEAD should be
used for this (to accommodate for future changes).  The implementation
of a particular object type can cast the object pointer to the proper
type and back.
...
*/

“Objects are always accessed through pointers of the type ‘PyObject *'” -> that is why we have to have to take a pointer of type PyObject (vs PyBytesObject) as the parameter of our function
“The actual memory allocated for an object contains other data that can only be accessed after casting the pointer to a pointer to a longer structure type.” -> So we will have to cast our function parameter to PyBytesObject * in order to access all its information. This is possible because the beginning of the PyBytesObject starts with a PyVarObject which itself starts with a PyObject:

/* PyObject_VAR_HEAD defines the initial segment of all variable-size
 * container objects.  These end with a declaration of an array with 1
 * element, but enough space is malloc'ed so that the array actually
 * has room for ob_size elements.  Note that ob_size is an element count,
 * not necessarily a byte count.
 */
#define PyObject_VAR_HEAD      PyVarObject ob_base;
#define Py_INVALID_SIZE (Py_ssize_t)-1

/* Nothing is actually declared to be a PyObject, but every pointer to
 * a Python object can be cast to a PyObject*.  This is inheritance built
 * by hand.  Similarly every pointer to a variable-size Python object can,
 * in addition, be cast to PyVarObject*.
 */
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

-> Here is the ob_size that bytesobject.h was mentioning.

The C function

Based on everything we just learned, the C code is pretty straightforward (bytes.c):

#include "Python.h"

/**
 * print_python_bytes - prints info about a Python 3 bytes object
 * @p: a pointer to a Python 3 bytes object
 * 
 * Return: Nothing
 */
void print_python_bytes(PyObject *p)
{
     /* The pointer with the correct type.*/
     PyBytesObject *s;
     unsigned int i;

     printf("[.] bytes object info\n");
     /* casting the PyObject pointer to a PyBytesObject pointer */
     s = (PyBytesObject *)p;
     /* never trust anyone, check that this is actually
        a PyBytesObject object. */
     if (s && PyBytes_Check(s))
     {
          /* a pointer holds the memory address of the first byte
         of the data it points to */
          printf("  address of the object: %p\n", (void *)s);
          /* op_size is in the ob_base structure, of type PyVarObject. */
          printf("  size: %ld\n", s->ob_base.ob_size);
          /* ob_sval is the array of bytes, ending with the value 0:
         ob_sval[ob_size] == 0 */
          printf("  trying string: %s\n", s->ob_sval);
          printf("  address of the data: %p\n", (void *)(s->ob_sval));
          printf("  bytes:");
          /* printing each byte at a time, in case this is not
         a "string". bytes doesn't have to be strings.
         ob_sval contains space for 'ob_size+1' elements.
         ob_sval[ob_size] == 0. */
          for (i = 0; i < s->ob_base.ob_size + 1; i++)
          {
               printf(" %02x", s->ob_sval[i] & 0xff);
          }
          printf("\n");
     }
     /* if this is not a PyBytesObject print an error message */
     else
     {
          fprintf(stderr, "  [ERROR] Invalid Bytes Object\n");
     }
}

Calling the C function from the python script

Creating the dynamic library

As we said earlier, we will use the “dynamic library method” to call our C function from Python 3. So we just need to compile our C file with this command:

gcc -Wall -Wextra -pedantic -Werror -std=c99 -shared -Wl,-soname,libPython.so -o libPython.so -fPIC -I/usr/include/python3.4 bytes.c

Don’t forget to include the Python 3 header files directory: -I/usr/include/python3.4

Hopefully, this should have created a dynamic library called libPython.so.

Using the dynamic library from Python 3

In order to use our function we simply need to add these lines in the Python script:

import ctypes

lib = ctypes.CDLL('./libPython.so')
lib.print_python_bytes.argtypes = [ctypes.py_object]

and call our function this way:

lib.print_python_bytes(s)

The new Python script

Here is the complete source code of the new Python 3 script (main_bytes.py):

#!/usr/bin/env python3
'''
Prints:
- the address of the bytes object
- a b"string" (bytes object)
- information about the bytes object
And then:
- reads a char from stdin
- prints the same (or not :)) information again
'''

import sys
import ctypes

lib = ctypes.CDLL('./libPython.so')
lib.print_python_bytes.argtypes = [ctypes.py_object]

s = b"Holberton"
print(hex(id(s)))
print(s)
lib.print_python_bytes(s)

sys.stdin.read(1)

print(hex(id(s)))
print(s)
lib.print_python_bytes(s)

Let’s run it!

julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py 
0x7f04d721b210
b'Holberton'
[.] bytes object info
  address of the object: 0x7f04d721b210
  size: 9
  trying string: Holberton
  address of the data: 0x7f04d721b230
  bytes: 48 6f 6c 62 65 72 74 6f 6e 00

As expected:

id() returns the address of the object itself (0x7f04d721b210)
the size of the data of our object (ob_size) is 9
the data of our object is “Holberton”, 48 6f 6c 62 65 72 74 6f 6e 00 (and it ends with 00 as specified on the header file bytesobject.h)

Annnnnnd, we have found the exact address of our string: 0x7f04d721b230 \o/

monty python

Sorry I had to add at least one Monty Python reference (why)

`rw_all.py`

Now that we undertand a little bit more about what’s happening, it’s ok to “brute-force” the mapped memory regions. Let’s update the script that replaces the string. Instead of looking only in the stack or the heap, let’s look in all readable and writeable memory regions of the process. Here’s the source code:

#!/usr/bin/env python3
'''
Locates and replaces (if we have permission) all occurrences of
an ASCII string in the entire virtual memory of a process.

Usage: ./rw_all.py PID search_string replace_by_string
Where:
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
search_string with
'''

import sys

def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    exit(1)

# check usage
if len(sys.argv) != 4:
    print_usage_and_exit()

# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string  == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if search_string  == "":
    print_usage_and_exit()

# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))

# try opening the file
try:
    maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    exit(1)

for line in maps_file:
    # print the name of the memory region
    sline = line.split(' ')
    name = sline[-1][:-1];
    print("[*] Searching in {}:".format(name))

    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]

    # check if there are read and write permissions
    if perm[0] != 'r' or perm[1] != 'w':
        print("\t[\x1B[31m!\x1B[m] {} does not have read/write permissions ({})".format(pathname, perm))
        continue

    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermisions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))

    # get start and end of the memoy region
    addr = addr.split("-")
    if len(addr) != 2: # never trust anyone
        print("[*] Wrong addr format")
        maps_file.close()
        exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))

    # open and read the memory region
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()

    # read the memory region
    mem_file.seek(addr_start)
    region = mem_file.read(addr_end - addr_start)

    # find string
    nb_found = 0;
    try:
        i = region.index(bytes(search_string, "ASCII"))
        while (i):
            print("\t[\x1B[32m:)\x1B[m] Found '{}' at {:x}".format(search_string, i))
            nb_found = nb_found + 1
            # write the new string
        print("\t[:)] Writing '{}' at {:x}".format(write_string, addr_start + i))
            mem_file.seek(addr_start + i)
            mem_file.write(bytes(write_string, "ASCII"))
            mem_file.flush()

            # update our buffer
        region.write(bytes(write_string, "ASCII"), i)

            i = region.index(bytes(search_string, "ASCII"))
    except Exception:
        if nb_found == 0:
            print("\t[\x1B[31m:(\x1B[m] Can't find '{}'".format(search_string))
    mem_file.close()

# close files
maps_file.close()

Let’s run it!

julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py 
0x7f37f1e01210
b'Holberton'
[.] bytes object info
  address of the object: 0x7f37f1e01210
  size: 9
  trying string: Holberton
  address of the data: 0x7f37f1e01230
  bytes: 48 6f 6c 62 65 72 74 6f 6e 00

julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main_bytes.py | grep -v grep
julien     4713  0.0  0.8  37720  8208 pts/0    S+   18:48   0:00 python3 ./main_bytes.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./rw_all.py 4713 Holberton "~ Betty ~"
[*] maps: /proc/4713/maps
[*] mem: /proc/4713/mem
[*] Searching in /usr/bin/python3.4:
    [!] /usr/bin/python3.4 does not have read/write permissions (r-xp)
...
[*] Searching in [heap]:
    pathname = [heap]
    addresses = 00e26000-00f11000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [e26000] | end [f11000]
    [:)] Found 'Holberton' at 8e422
    [:)] Writing '~ Betty ~' at eb4422
...
[*] Searching in :
    pathname = 
    addresses = 7f37f1df1000-7f37f1fa7000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [7f37f1df1000] | end [7f37f1fa7000]
    [:)] Found 'Holberton' at 10230
    [:)] Writing '~ Betty ~' at 7f37f1e01230
...
[*] Searching in [stack]:
    pathname = [stack]
    addresses = 7ffdc3d0c000-7ffdc3d2d000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [7ffdc3d0c000] | end [7ffdc3d2d000]
    [:(] Can't find 'Holberton'
...
julien@holberton:~/holberton/w/hackthevm1$

And if we hit enter in the running main_bytes.py…

julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py 
0x7f37f1e01210
b'Holberton'
[.] bytes object info
  address of the object: 0x7f37f1e01210
  size: 9
  trying string: Holberton
  address of the data: 0x7f37f1e01230
  bytes: 48 6f 6c 62 65 72 74 6f 6e 00

0x7f37f1e01210
b'~ Betty ~'
[.] bytes object info
  address of the object: 0x7f37f1e01210
  size: 9
  trying string: ~ Betty ~
  address of the data: 0x7f37f1e01230
  bytes: 7e 20 42 65 74 74 79 20 7e 00
julien@holberton:~/holberton/w/hackthevm1$

BOOM!

yeah!

Outro

We managed to modify the string used by our Python 3 script. Awesome! But we still have questions to answer:

What is the “Holberton” string that is in the [heap] memory region?
How does Python 3 allocate memory outside of the heap?
If Python 3 is not using the heap, what does it refer to when it says “Objects are structures allocated on the heap” in object.h?

That will be for another time

In the meantime, if you are too curious to wait for the next article, you can try to find out yourself.

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments.

Files

This repo contains the source code for all scripts and dynamic libraries created in this tutorial:

main.py: the first target
main_id.py: the second target, printing the id of the bytes object
main_bytes.py: the final target, printing also the information about the bytes object, using our dynamic library
read_write_heap.py: the “original” script to find and replace strings in the heap of a process
read_write_stack.py: same but, searches and replaces in the stack instead of the heap
rw_all.py: same but in every memory regions that are readable and writable
bytes.c: the C function to print info about a Python 3 bytes object

Read on!

Chapter 0: Hack The Virtual Memory: C strings & /proc
Chapter 1: Hack The Virtual Memory: Python bytes
Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break

Many thanks to Tim for English proof-reading & Guillaume for PEP8 proof-reading

Hack The Virtual Memory: C strings & /proc

Posted by Julien Barbier

hack the vm!

Intro

Hack The Virtual Memory, Chapter 0: Play with C strings & `/proc`

This is the first in a series of small articles / tutorials based around virtual memory. The goal is to learn some CS basics, but in a different and more practical way.

For this first piece, we’ll use /proc to find and modify variables (in this example, an ASCII string) contained inside the virtual memory of a running process, and learn some cool things along the way.

Environment

All scripts and programs have been tested on the following system:

Ubuntu 14.04 LTS

Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4

Python 3:

Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux

Prerequisites

In order to fully understand this article, you need to know:

The basics of the C programming language
Some Python
The very basics of the Linux filesystem and the shell

Virtual Memory

In computing, virtual memory is a memory management technique that is implemented using both hardware and software. It maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage (as seen by a process or task) appears as a contiguous address space, or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit or MMU, automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.

The primary benefits of virtual memory include freeing applications from having to manage a shared memory space, increased security due to memory isolation, and being able to conceptually use more memory than might be physically available, using the technique of paging.

You can read more about the virtual memory on Wikipedia.

In chapter 2, we’ll go into more details and do some fact checking on what lies inside the virtual memory and where. For now, here are some key points you should know before you read on:

Each process has its own virtual memory
The amount of virtual memory depends on your system’s architecture
Each OS handles virtual memory differently, but for most modern operating systems, the virtual memory of a process looks like this:

virtual memory

In the high memory addresses you can find (this is a non exhaustive list, there’s much more to be found, but that’s not today’s topic):

The command line arguments and environment variables
The stack, growing “downwards”. This may seem counter-intuitive, but this is the way the stack is implemented in virtual memory

In the low memory addresses you can find:

Your executable (it’s a little more complicated than that, but this is enough to understand the rest of this article)
The heap, growing “upwards”

The heap is a portion of memory that is dynamically allocated (i.e. containing memory allocated using malloc).

Also, keep in mind that virtual memory is not the same as RAM.

C program

Let’s start with this simple C program:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - uses strdup to create a new string, and prints the
 * address of the new duplcated string
 *
 * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    char *s;

    s = strdup("Holberton");
    if (s == NULL)
    {
        fprintf(stderr, "Can't allocate mem with malloc\n");
        return (EXIT_FAILURE);
    }
    printf("%p\n", (void *)s);
    return (EXIT_SUCCESS);
}

strdup

Take a moment to think before going further. How do you think strdup creates a copy of the string “Holberton”? How can you confirm that?

strdup has to create a new string, so it first has to reserve space for it. The function strdup is probably using malloc. A quick look at its man page can confirm:

DESCRIPTION
       The  strdup()  function returns a pointer to a new string which is a duplicate of the string s.
       Memory for the new string is obtained with malloc(3), and can be freed with free(3).

Take a moment to think before going further. Based on what we said earlier about virtual memory, where do you think the duplicate string will be located? At a high or low memory address?

Probably in the lower addresses (in the heap). Let’s compile and run our small C program to test our hypothesis:

julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror main.c -o holberton
julien@holberton:~/holberton/w/hackthevm0$ ./holberton 
0x1822010
julien@holberton:~/holberton/w/hackthevm0$

Our duplicated string is located at the address 0x1822010. Great. But is this a low or a high memory address?

How big is the virtual memory of a process

The size of the virtual memory of a process depends on your system architecture. In this example I am using a 64-bit machine, so theoretically the size of each process’ virtual memory is 2^64 bytes. In theory, the highest memory address possible is 0xffffffffffffffff (1.8446744e+19), and the lowest is 0x0.

0x1822010 is small compared to 0xffffffffffffffff, so the duplicated string is probably located at a lower memory address. We will be able to confirm this when we will be looking at the proc filesystem).

The proc filesystem

From man proc:

The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures.  It is commonly mounted at `/proc`.  Most of it is read-only, but some files allow kernel variables to be changed.

If you list the contents of your /proc directory, you will probably see a lot of files. We will focus on two of them:

/proc/[pid]/mem
/proc/[pid]/maps

mem

From man proc:

      /proc/[pid]/mem
              This file can be used to access the pages of a process's memory
          through open(2), read(2), and lseek(2).

Awesome! So, can we access and modify the entire virtual memory of any process?

maps

From man proc:

      /proc/[pid]/maps
              A  file containing the currently mapped memory regions and their access permissions.
          See mmap(2) for some further information about memory mappings.

              The format of the file is:

       address           perms offset  dev   inode       pathname
       00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon
       00651000-00652000 r--p 00051000 08:02 173521      /usr/bin/dbus-daemon
       00652000-00655000 rw-p 00052000 08:02 173521      /usr/bin/dbus-daemon
       00e03000-00e24000 rw-p 00000000 00:00 0           [heap]
       00e24000-011f7000 rw-p 00000000 00:00 0           [heap]
       ...
       35b1800000-35b1820000 r-xp 00000000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a20000-35b1a21000 rw-p 00020000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a21000-35b1a22000 rw-p 00000000 00:00 0
       35b1c00000-35b1dac000 r-xp 00000000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1dac000-35b1fac000 ---p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870  /usr/lib64/libc-2.15.so
       ...
       f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0    [stack:986]
       ...
       7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0   [stack]
       7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0   [vdso]

              The address field is the address space in the process that the mapping occupies.
          The perms field is a set of permissions:

                   r = read
                   w = write
                   x = execute
                   s = shared
                   p = private (copy on write)

              The offset field is the offset into the file/whatever;
          dev is the device (major:minor); inode is the inode on that device.   0  indicates
              that no inode is associated with the memory region,
          as would be the case with BSS (uninitialized data).

              The  pathname field will usually be the file that is backing the mapping.
          For ELF files, you can easily coordinate with the offset field
              by looking at the Offset field in the ELF program headers (readelf -l).

              There are additional helpful pseudo-paths:

                   [stack]
                          The initial process's (also known as the main thread's) stack.

                   [stack:<tid>] (since Linux 3.4)
                          A thread's stack (where the <tid> is a thread ID).
              It corresponds to the /proc/[pid]/task/[tid]/ path.

                   [vdso] The virtual dynamically linked shared object.

                   [heap] The process's heap.

              If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function.
          There is no easy  way  to  coordinate
              this back to a process's source, short of running it through gdb(1), strace(1), or similar.

              Under Linux 2.0 there is no field giving pathname.

This means that we can look at the /proc/[pid]/mem file to locate the heap of a running process. If we can read from the heap, we can locate the string we want to modify. And if we can write to the heap, we can replace this string with whatever we want.

pid

A process is an instance of a program, with a unique process ID. This process ID (PID) is used by many functions and system calls to interact with and manipulate processes.

We can use the program ps to get the PID of a running process (man ps).

C program

We now have everything we need to write a script or program that finds a string in the heap of a running process and then replaces it with another string (of the same length or shorter). We will work with the following simple program that infinitely loops and prints a “strduplicated” string.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/**              
 * main - uses strdup to create a new string, loops forever-ever
 *                
 * Return: EXIT_FAILURE if malloc failed. Other never returns
 */
int main(void)
{
     char *s;
     unsigned long int i;

     s = strdup("Holberton");
     if (s == NULL)
     {
          fprintf(stderr, "Can't allocate mem with malloc\n");
          return (EXIT_FAILURE);
     }
     i = 0;
     while (s)
     {
          printf("[%lu] %s (%p)\n", i, s, (void *)s);
          sleep(1);
          i++;
     }
     return (EXIT_SUCCESS);
}

Compiling and running the above source code should give you this output, and loop indefinitely until you kill the process.

julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror loop.c -o loop
julien@holberton:~/holberton/w/hackthevm0$ ./loop 
[0] Holberton (0xfbd010)
[1] Holberton (0xfbd010)
[2] Holberton (0xfbd010)
[3] Holberton (0xfbd010)
[4] Holberton (0xfbd010)
[5] Holberton (0xfbd010)
[6] Holberton (0xfbd010)
[7] Holberton (0xfbd010)
...

If you would like, pause the reading now and try to write a script or program that finds a string in the heap of a running process before reading further.

looking at /proc

Let’s run our loop program.

julien@holberton:~/holberton/w/hackthevm0$ ./loop 
[0] Holberton (0x10ff010)
[1] Holberton (0x10ff010)
[2] Holberton (0x10ff010)
[3] Holberton (0x10ff010)
...

The first thing we need to find is the PID of the process.

julien@holberton:~/holberton/w/hackthevm0$ ps aux | grep ./loop | grep -v grep
julien     4618  0.0  0.0   4332   732 pts/14   S+   17:06   0:00 ./loop

In the above example, the PID is 4618 (it will be different each time we run it, and it is probably a different number if you are trying this on your own computer). As a result, the maps and mem files we want to look at are located in the /proc/4618 directory:

/proc/4618/maps
/proc/4618/mem

A quick ls -la in the directory should give you something like this:

julien@ubuntu:/proc/4618$ ls -la
total 0
dr-xr-xr-x   9 julien julien 0 Mar 15 17:07 .
dr-xr-xr-x 257 root   root   0 Mar 15 10:20 ..
dr-xr-xr-x   2 julien julien 0 Mar 15 17:11 attr
-rw-r--r--   1 julien julien 0 Mar 15 17:11 autogroup
-r--------   1 julien julien 0 Mar 15 17:11 auxv
-r--r--r--   1 julien julien 0 Mar 15 17:11 cgroup
--w-------   1 julien julien 0 Mar 15 17:11 clear_refs
-r--r--r--   1 julien julien 0 Mar 15 17:07 cmdline
-rw-r--r--   1 julien julien 0 Mar 15 17:11 comm
-rw-r--r--   1 julien julien 0 Mar 15 17:11 coredump_filter
-r--r--r--   1 julien julien 0 Mar 15 17:11 cpuset
lrwxrwxrwx   1 julien julien 0 Mar 15 17:11 cwd -> /home/julien/holberton/w/funwthevm
-r--------   1 julien julien 0 Mar 15 17:11 environ
lrwxrwxrwx   1 julien julien 0 Mar 15 17:11 exe -> /home/julien/holberton/w/funwthevm/loop
dr-x------   2 julien julien 0 Mar 15 17:07 fd
dr-x------   2 julien julien 0 Mar 15 17:11 fdinfo
-rw-r--r--   1 julien julien 0 Mar 15 17:11 gid_map
-r--------   1 julien julien 0 Mar 15 17:11 io
-r--r--r--   1 julien julien 0 Mar 15 17:11 limits
-rw-r--r--   1 julien julien 0 Mar 15 17:11 loginuid
dr-x------   2 julien julien 0 Mar 15 17:11 map_files
-r--r--r--   1 julien julien 0 Mar 15 17:11 maps
-rw-------   1 julien julien 0 Mar 15 17:11 mem
-r--r--r--   1 julien julien 0 Mar 15 17:11 mountinfo
-r--r--r--   1 julien julien 0 Mar 15 17:11 mounts
-r--------   1 julien julien 0 Mar 15 17:11 mountstats
dr-xr-xr-x   5 julien julien 0 Mar 15 17:11 net
dr-x--x--x   2 julien julien 0 Mar 15 17:11 ns
-r--r--r--   1 julien julien 0 Mar 15 17:11 numa_maps
-rw-r--r--   1 julien julien 0 Mar 15 17:11 oom_adj
-r--r--r--   1 julien julien 0 Mar 15 17:11 oom_score
-rw-r--r--   1 julien julien 0 Mar 15 17:11 oom_score_adj
-r--------   1 julien julien 0 Mar 15 17:11 pagemap
-r--------   1 julien julien 0 Mar 15 17:11 personality
-rw-r--r--   1 julien julien 0 Mar 15 17:11 projid_map
lrwxrwxrwx   1 julien julien 0 Mar 15 17:11 root -> /
-rw-r--r--   1 julien julien 0 Mar 15 17:11 sched
-r--r--r--   1 julien julien 0 Mar 15 17:11 schedstat
-r--r--r--   1 julien julien 0 Mar 15 17:11 sessionid
-rw-r--r--   1 julien julien 0 Mar 15 17:11 setgroups
-r--r--r--   1 julien julien 0 Mar 15 17:11 smaps
-r--------   1 julien julien 0 Mar 15 17:11 stack
-r--r--r--   1 julien julien 0 Mar 15 17:07 stat
-r--r--r--   1 julien julien 0 Mar 15 17:11 statm
-r--r--r--   1 julien julien 0 Mar 15 17:07 status
-r--------   1 julien julien 0 Mar 15 17:11 syscall
dr-xr-xr-x   3 julien julien 0 Mar 15 17:11 task
-r--r--r--   1 julien julien 0 Mar 15 17:11 timers
-rw-r--r--   1 julien julien 0 Mar 15 17:11 uid_map
-r--r--r--   1 julien julien 0 Mar 15 17:11 wchan

/proc/pid/maps

As we have seen earlier, the /proc/pid/maps file is a text file, so we can directly read it. The content of the maps file of our process looks like this:

julien@ubuntu:/proc/4618$ cat maps
00400000-00401000 r-xp 00000000 08:01 1070052                            /home/julien/holberton/w/funwthevm/loop
00600000-00601000 r--p 00000000 08:01 1070052                            /home/julien/holberton/w/funwthevm/loop
00601000-00602000 rw-p 00001000 08:01 1070052                            /home/julien/holberton/w/funwthevm/loop
010ff000-01120000 rw-p 00000000 00:00 0                                  [heap]
7f144c052000-7f144c20c000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c20c000-7f144c40c000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c40c000-7f144c410000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c410000-7f144c412000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c412000-7f144c417000 rw-p 00000000 00:00 0 
7f144c417000-7f144c43a000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f144c61e000-7f144c621000 rw-p 00000000 00:00 0 
7f144c636000-7f144c639000 rw-p 00000000 00:00 0 
7f144c639000-7f144c63a000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63a000-7f144c63b000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63b000-7f144c63c000 rw-p 00000000 00:00 0 
7ffc94272000-7ffc94293000 rw-p 00000000 00:00 0                          [stack]
7ffc9435e000-7ffc94360000 r--p 00000000 00:00 0                          [vvar]
7ffc94360000-7ffc94362000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Circling back to what we said earlier, we can see that the stack ([stack]) is located in high memory addresses and the heap ([heap]) in the lower memory addresses.

[heap]

Using the maps file, we can find all the information we need to locate our string:

010ff000-01120000 rw-p 00000000 00:00 0                                  [heap]

The heap:

Starts at address 0x010ff000 in the virtual memory of the process
Ends at memory address: 0x01120000
Is readable and writable (rw)

A quick look back to our (still running) loop program:

...
[1024] Holberton (0x10ff010)
...

-> 0x010ff000 < 0x10ff010 < 0x01120000. This confirms that our string is located in the heap. More precisely, it is located at index 0x10 of the heap. If we open the /proc/pid/mem/ file (in this example /proc/4618/mem) and seek to the memory address 0x10ff010, we can write to the heap of the running process, overwriting the “Holberton” string!

Let’s write a script or program that does just that. Choose your favorite language and let’s do it!

If you would like, stop reading now and try to write a script or program that finds a string in the heap of a running process, before reading further. The next paragraph will give away the source code of the answer!

Overwriting the string in the virtual memory

We’ll be using Python 3 for writing the script, but you could write this in any language. Here is the code:

#!/usr/bin/env python3
'''             
Locates and replaces the first occurrence of a string in the heap
of a process    

Usage: ./read_write_heap.py PID search_string replace_by_string
Where:           
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
  search_string with
'''

import sys

def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    sys.exit(1)

# check usage  
if len(sys.argv) != 4:
    print_usage_and_exit()

# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string  == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if search_string  == "":
    print_usage_and_exit()

# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))

# try opening the maps file
try:
    maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    sys.exit(1)

for line in maps_file:
    sline = line.split(' ')
    # check if we found the heap
    if sline[-1][:-1] != "[heap]":
        continue
    print("[*] Found [heap]:")

    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]
    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermisions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))

    # check if there is read and write permission
    if perm[0] != 'r' or perm[1] != 'w':
        print("[*] {} does not have read/write permission".format(pathname))
        maps_file.close()
        exit(0)

    # get start and end of the heap in the virtual memory
    addr = addr.split("-")
    if len(addr) != 2: # never trust anyone, not even your OS :)
        print("[*] Wrong addr format")
        maps_file.close()
        exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))

    # open and read mem
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()
        exit(1)

    # read heap  
    mem_file.seek(addr_start)
    heap = mem_file.read(addr_end - addr_start)

    # find string
    try:
        i = heap.index(bytes(search_string, "ASCII"))
    except Exception:
        print("Can't find '{}'".format(search_string))
        maps_file.close()
        mem_file.close()
        exit(0)
    print("[*] Found '{}' at {:x}".format(search_string, i))

    # write the new string
    print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
    mem_file.seek(addr_start + i)
    mem_file.write(bytes(write_string, "ASCII"))

    # close files
    maps_file.close()
    mem_file.close()

    # there is only one heap in our example
    break

Note: You will need to run this script as root, otherwise you won’t be able to read or write to the /proc/pid/mem file, even if you are the owner of the process.

Running the script

julien@holberton:~/holberton/w/hackthevm0$ sudo ./read_write_heap.py 4618 Holberton "Fun w vm!"
[*] maps: /proc/4618/maps
[*] mem: /proc/4618/mem
[*] Found [heap]:
    pathname = [heap]
    addresses = 010ff000-01120000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [10ff000] | end [1120000]
[*] Found 'Holberton' at 10
[*] Writing 'Fun w vm!' at 10ff010
julien@holberton:~/holberton/w/hackthevm0$

Note that this address corresponds to the one we found manually:

The heap lies from addresses 0x010ff000 to 0x01120000 in the virtual memory of the running process
Our string is at index 0x10 in the heap, so at the memory address 0x10ff010

If we go back to our loop program, it should now print “fun w vm!”

...
[2676] Holberton (0x10ff010)
[2677] Holberton (0x10ff010)
[2678] Holberton (0x10ff010)
[2679] Holberton (0x10ff010)
[2680] Holberton (0x10ff010)
[2681] Holberton (0x10ff010)
[2682] Fun w vm! (0x10ff010)
[2683] Fun w vm! (0x10ff010)
[2684] Fun w vm! (0x10ff010)
[2685] Fun w vm! (0x10ff010)
...

hack the virtual memory of a process: mind blowing !

Outro

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments.

Files

This repo contains the source code for all programs shown in this tutorial:

main.c: the first C program that prints the location of the string and exits
loop.c: the second C program that loops indefinitely
read_write_heap.py: the script used to modify the string in the running C program

What’s next?

In the next chapter we will do almost the same thing, but instead we’ll access the memory of a running Python 3 script. It won’t be that straightfoward. We’ll take this as an excuse to look at some Python 3 internals. If you are curious, try to do it yourself, and find out why the above read_write_heap.py script won’t work to modify a Python 3 ASCII string.

See you next time and Happy Hacking!

Read on!

Chapter 0: Hack The Virtual Memory: C strings & /proc
Chapter 1: Hack The Virtual Memory: Python bytes
Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break

Many thanks to Kristine, Tim for English proof-reading & Guillaume for PEP8 proof-reading

久久国产成人av_抖音国产毛片_a片网站免费观看_A片无码播放手机在线观看,色五月在线观看,亚洲精品m在线观看,女人自慰的免费网址,悠悠在线观看精品视频,一级日本片免费的,亚洲精品久,国产精品成人久久久久久久

Hack the virtual memory Archives

The Stack

Prerequisites

Environment

Automatic allocation

Using local variables

leave, Automatic de-allocation

Playing with the stack

ret

Wrapping everything up

Getting the values of the variables

The saved rbp value

The return address value

The output of our program

Hack the stack!

Outro

Questions? Feedback?

Thank you for reading!

Files

Read more about the virtual memory

The heap

Prerequisites

Environment

malloc

No malloc, no [heap]

malloc(x)

strace, brk and sbrk

Many mallocs

Naive malloc

The 0x10 lost bytes

RTFSC

Is the heap actually growing upwards?

The Address Space Layout Randomisation (ASLR)

The updated VM diagram

malloc(0)

Outro

Questions? Feedback?

Thank you for reading!

Files

Read more about the virtual memory

Hack The Virtual Memory, chapter 2: Drawing the VM diagram

Prerequisites

Environment

The stack

The heap

The executable

Command line arguments and environment variables

Are the argv and env arrays next to each other?

Is the first command line argument stored right after the env array?

Wrapping up

Is the stack really growing downwards?

/proc

Outro

Questions? Feedback?

Thank you for reading!

Files

Read more about the virtual memory

Hack The Virtual Memory, Chapter 1: Python bytes

Prerequisites

Environment

Python script

About the bytes object

bytes vs str

Everything is an object

Running read_write_heap.py against the Python script

Locating the string in the virtual memory

id

bytesobject.h

Looking at the bytes object in memory

Creating our C function

object.h

The C function

Calling the C function from the python script

Creating the dynamic library

Using the dynamic library from Python 3

The new Python script

rw_all.py

Outro

Questions? Feedback?

`leave`, Automatic de-allocation

The saved `rbp` value

`malloc`

`malloc(x)`

`strace`, `brk` and `sbrk`

`malloc(0)`

Are the `argv` and `env` arrays next to each other?

Is the first command line argument stored right after the `env` array?

`/proc`

Running `read_write_heap.py` against the Python script

`bytesobject.h`

Looking at the `bytes` object in memory

`object.h`

`rw_all.py`

Hack The Virtual Memory, Chapter 0: Play with C strings & `/proc`