This is the fifth chapter in a series about virtual memory. The goal is to learn some CS basics in a different and more practical way. If you missed the previous chapters, you should probably start there: The StackAs we have seen in chapter 2, the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? What are the registers used? In this chapter we will have a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables. Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let’s start! Note: We will talk only about the user stack, as opposed to the kernel stack PrerequisitesIn order to fully understand this article, you will need to know: EnvironmentAll scripts and programs have been tested on the following system: Everything we cover will be true for this system/environment, but may be different on another system Automatic allocationLet’s first look at a very simple program that has one function that uses one variable (0-main.c ): #include <stdio.h>
int main(void)
{
int a;
a = 972;
printf("a = %d\n", a);
return (0);
}
Let’s compile this program and disassemble it using objdump : holberton$ gcc 0-main.c
holberton$ objdump -d -j .text -M intel
The assembly code produced for our main function is the following: 000000000040052d <main>:
40052d: 55 push rbp
40052e: 48 89 e5 mov rbp,rsp
400531: 48 83 ec 10 sub rsp,0x10
400535: c7 45 fc cc 03 00 00 mov DWORD PTR [rbp-0x4],0x3cc
40053c: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
40053f: 89 c6 mov esi,eax
400541: bf e4 05 40 00 mov edi,0x4005e4
400546: b8 00 00 00 00 mov eax,0x0
40054b: e8 c0 fe ff ff call 400410 <printf@plt>
400550: b8 00 00 00 00 mov eax,0x0
400555: c9 leave
400556: c3 ret
400557: 66 0f 1f 84 00 00 00 nop WORD PTR [rax+rax*1+0x0]
40055e: 00 00
Let’s focus on the first three lines for now: 000000000040052d <main>:
40052d: 55 push rbp
40052e: 48 89 e5 mov rbp,rsp
400531: 48 83 ec 10 sub rsp,0x10
The first lines of the function main refers to rbp and rsp ; these are special purpose registers. rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer, which points to the top of the current stack frame. Let’s decompose step by step what is happening here. This is the state of the stack when we enter the function main before the first instruction is run: We have just created a space in memory – on the stack – for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables. Using local variablesThe fourth line of assembly code of our main function is the following: 400535: c7 45 fc cc 03 00 00 mov DWORD PTR [rbp-0x4],0x3cc
0x3cc is actually the value 972 in hexadecimal. This line corresponds to our C-code line:
a = 972;
mov DWORD PTR [rbp-0x4],0x3cc is setting the memory at address rbp - 4 to 972 . [rbp - 4] IS our local variable a . The computer doesn’t actually know the name of the variable we use in our code, it simply refers to memory addresses on the stack.
This is the state of the stack and the registers after this operation: leave , Automatic de-allocation
If we look now at the end of the function, we will find this: 400555: c9 leave
The instruction leave sets rsp to rbp , and then pops the top of the stack into rbp . Because we pushed the previous value of rbp onto the stack when we entered the function, rbp is now set to the previous value of rbp . This is how: The local variables are “de-allocated”, and the stack frame of the previous function is restored before we leave the current function.
The state of the stack and the registers rbp and rsp are restored to the same state as when we entered our main function. Playing with the stackWhen the variables are automatically de-allocated from the stack, they are not completely “destroyed”. Their values are still in memory, and this space will potentially be used by other functions. This is why it is important to initialize your variables when you write your code, because otherwise, they will take whatever value there is on the stack at the moment when the program is running. Let’s consider the following C code (1-main.c ): #include <stdio.h>
void func1(void)
{
int a;
int b;
int c;
a = 98;
b = 972;
c = a + b;
printf("a = %d, b = %d, c = %d\n", a, b, c);
}
void func2(void)
{
int a;
int b;
int c;
printf("a = %d, b = %d, c = %d\n", a, b, c);
}
int main(void)
{
func1();
func2();
return (0);
}
As you can see, func2 does not set the values of its local vaiables a , b and c , yet if we compile and run this program it will print… holberton$ gcc 1-main.c && ./a.out
a = 98, b = 972, c = 1070
a = 98, b = 972, c = 1070
holberton$
… the same variable values of func1 ! This is because of how the stack works. The two functions declared the same amount of variables, with the same type, in the same order. Their stack frames are exactly the same. When func1 ends, the memory where the values of its local variables reside are not cleared – only rsp is incremented. As a consequence, when we call func2 its stack frame sits at exactly the same place of the previous func1 stack frame, and the local variables of func2 have the same values of the local variables of func1 when we left func1 . Let’s examine the assembly code to prove it: holberton$ objdump -d -j .text -M intel
000000000040052d <func1>:
40052d: 55 push rbp
40052e: 48 89 e5 mov rbp,rsp
400531: 48 83 ec 10 sub rsp,0x10
400535: c7 45 f4 62 00 00 00 mov DWORD PTR [rbp-0xc],0x62
40053c: c7 45 f8 cc 03 00 00 mov DWORD PTR [rbp-0x8],0x3cc
400543: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
400546: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc]
400549: 01 d0 add eax,edx
40054b: 89 45 fc mov DWORD PTR [rbp-0x4],eax
40054e: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4]
400551: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
400554: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
400557: 89 c6 mov esi,eax
400559: bf 34 06 40 00 mov edi,0x400634
40055e: b8 00 00 00 00 mov eax,0x0
400563: e8 a8 fe ff ff call 400410 <printf@plt>
400568: c9 leave
400569: c3 ret
000000000040056a <func2>:
40056a: 55 push rbp
40056b: 48 89 e5 mov rbp,rsp
40056e: 48 83 ec 10 sub rsp,0x10
400572: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4]
400575: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
400578: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
40057b: 89 c6 mov esi,eax
40057d: bf 34 06 40 00 mov edi,0x400634
400582: b8 00 00 00 00 mov eax,0x0
400587: e8 84 fe ff ff call 400410 <printf@plt>
40058c: c9 leave
40058d: c3 ret
000000000040058e <main>:
40058e: 55 push rbp
40058f: 48 89 e5 mov rbp,rsp
400592: e8 96 ff ff ff call 40052d <func1>
400597: e8 ce ff ff ff call 40056a <func2>
40059c: b8 00 00 00 00 mov eax,0x0
4005a1: 5d pop rbp
4005a2: c3 ret
4005a3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
4005aa: 00 00 00
4005ad: 0f 1f 00 nop DWORD PTR [rax]
As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same. push rbp
mov rbp,rsp
sub rsp,0x10
And both functions end with the leave statement. The variables a , b and c are referenced the same way in the two functions: a lies at memory address rbp - 0xc
b lies at memory address rbp - 0x8
c lies at memory address rbp - 0x4
Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables in the stack. So, this is the state of the stack and the registers rbp and rsp before we leave func1 : When we leave the function func1 , we hit the instruction leave ; as previously explained, this is the state of the stack, rbp and rsp right before returning to the function main : So when we enter func2 , the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as the local variables of the function func1 . retYou might have noticed that all our example functions end with the instruction ret . ret pops the return address from stack and jumps there. When functions are called the program uses the instruction call to push the return address before it jumps to the first instruction of the function called. This is how the program is able to call a function and then return from said function the calling function to execute its next instruction. So this means that there are more than just variables on the stack, there are also memory addresses of instructions. Let’s revisit our 1-main.c code. When the main function calls func1 , 400592: e8 96 ff ff ff call 40052d <func1>
it pushes the memory address of the next instruction onto the stack, and then jumps to func1 . As a consequence, before executing any instructions in func1 , the top of the stack contains this address, so rsp points to this value. After the stack frame of func1 is formed, the stack looks like this: Wrapping everything upGiven what we just learned, we can directly use rbp to directly access all our local variables (without using the C variables!), as well as the saved rbp value on the stack and the return address values of our functions. To do so in C, we can use: register long rsp asm ("rsp");
register long rbp asm ("rbp");
Here is the listing of the program 2-main.c : #include <stdio.h>
void func1(void)
{
int a;
int b;
int c;
register long rsp asm ("rsp");
register long rbp asm ("rbp");
a = 98;
b = 972;
c = a + b;
printf("a = %d, b = %d, c = %d\n", a, b, c);
printf("func1, rpb = %lx\n", rbp);
printf("func1, rsp = %lx\n", rsp);
printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}
void func2(void)
{
int a;
int b;
int c;
register long rsp asm ("rsp");
register long rbp asm ("rbp");
printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
printf("func2, rpb = %lx\n", rbp);
printf("func2, rsp = %lx\n", rsp);
}
int main(void)
{
register long rsp asm ("rsp");
register long rbp asm ("rbp");
printf("main, rpb = %lx\n", rbp);
printf("main, rsp = %lx\n", rsp);
func1();
func2();
return (0);
}
Getting the values of the variablesFrom our previous discoveries, we know that our variables are referenced via rbp – 0xX: a is at rbp - 0xc
b is at rbp - 0x8
c is at rbp - 0x4
So in order to get the values of those variables, we need to dereference rbp . For the variable a : cast our variable rbp to a char * : (char *)rbp subtract the correct amount of bytes to get the address of where the variable is in memory: (char *)rbp) - 0xc cast it again to a pointer pointing to an int since a is of type int : (int *)(((char *)rbp) - 0xc) and dereference it to get the value sitting at this address: *(int *)(((char *)rbp) - 0xc)
The saved rbp valueLooking at the above diagram, the current rbp directly points to the saved rbp , so we simply have to cast our variable rbp to a pointer to an unsigned long int and dereference it: *(unsigned long int *)rbp . The return address valueThe return address value is right before the saved previous rbp on the stack. rbp is 8 bytes long, so we simply need to add 8 to the current value of rbp to get the address where this return value is on the stack. This is how we do it: cast our variable rbp to a char * : (char *)rbp add 8 to this value: ((char *)rbp + 8) cast it to point to an unsigned long int : (unsigned long int *)((char *)rbp + 8) dereference it to get the value at this address: *(unsigned long int *)((char *)rbp + 8)
The output of our programholberton$ gcc 2-main.c && ./a.out
main, rpb = 7ffc78e71b70
main, rsp = 7ffc78e71b70
a = 98, b = 972, c = 1070
func1, rpb = 7ffc78e71b60
func1, rsp = 7ffc78e71b50
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7ffc78e71b70
func1, return address value = 400697
func2, a = 98, b = 972, c = 1070
func2, rpb = 7ffc78e71b60
func2, rsp = 7ffc78e71b50
holberton$
We can see that: from func1 we can access all our variables correctly via rbp from func1 we can get the rbp of the function main we confirm that func1 and func2 do have the same rbp and rsp values the difference between rsp and rbp is 0x10, as seen in the assembly code (sub rsp,0x10 ) in the main function, rsp == rbp because there are no local variables
The return address from func1 is 0x400697 . Let’s double check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call of func1 in the main function. holberton$ objdump -d -j .text -M intel | less
0000000000400664 <main>:
400664: 55 push rbp
400665: 48 89 e5 mov rbp,rsp
400668: 48 89 e8 mov rax,rbp
40066b: 48 89 c6 mov rsi,rax
40066e: bf 3b 08 40 00 mov edi,0x40083b
400673: b8 00 00 00 00 mov eax,0x0
400678: e8 93 fd ff ff call 400410 <printf@plt>
40067d: 48 89 e0 mov rax,rsp
400680: 48 89 c6 mov rsi,rax
400683: bf 4c 08 40 00 mov edi,0x40084c
400688: b8 00 00 00 00 mov eax,0x0
40068d: e8 7e fd ff ff call 400410 <printf@plt>
400692: e8 96 fe ff ff call 40052d <func1>
400697: e8 7a ff ff ff call 400616 <func2>
40069c: b8 00 00 00 00 mov eax,0x0
4006a1: 5d pop rbp
4006a2: c3 ret
4006a3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
4006aa: 00 00 00
4006ad: 0f 1f 00 nop DWORD PTR [rax]
And yes! \o/ Hack the stack!Now that we know where to find the return address on the stack, what if we were to modify this value? Could we alter the flow of a program and make func1 return to somewhere else? Let’s add a new function, called bye to our program (3-main.c ): #include <stdio.h>
#include <stdlib.h>
void bye(void)
{
printf("[x] I am in the function bye!\n");
exit(98);
}
void func1(void)
{
int a;
int b;
int c;
register long rsp asm ("rsp");
register long rbp asm ("rbp");
a = 98;
b = 972;
c = a + b;
printf("a = %d, b = %d, c = %d\n", a, b, c);
printf("func1, rpb = %lx\n", rbp);
printf("func1, rsp = %lx\n", rsp);
printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}
void func2(void)
{
int a;
int b;
int c;
register long rsp asm ("rsp");
register long rbp asm ("rbp");
printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
printf("func2, rpb = %lx\n", rbp);
printf("func2, rsp = %lx\n", rsp);
}
int main(void)
{
register long rsp asm ("rsp");
register long rbp asm ("rbp");
printf("main, rpb = %lx\n", rbp);
printf("main, rsp = %lx\n", rsp);
func1();
func2();
return (0);
}
Let’s see at which address the code of this function starts: holberton$ gcc 3-main.c && objdump -d -j .text -M intel | less
00000000004005bd <bye>:
4005bd: 55 push rbp
4005be: 48 89 e5 mov rbp,rsp
4005c1: bf d8 07 40 00 mov edi,0x4007d8
4005c6: e8 b5 fe ff ff call 400480 <puts@plt>
4005cb: bf 62 00 00 00 mov edi,0x62
4005d0: e8 eb fe ff ff call 4004c0 <exit@plt>
Now let’s replace the return address on the stack from the func1 function with the address of the beginning of the function bye , 4005bd (4-main.c ): #include <stdio.h>
#include <stdlib.h>
void bye(void)
{
printf("[x] I am in the function bye!\n");
exit(98);
}
void func1(void)
{
int a;
int b;
int c;
register long rsp asm ("rsp");
register long rbp asm ("rbp");
a = 98;
b = 972;
c = a + b;
printf("a = %d, b = %d, c = %d\n", a, b, c);
printf("func1, rpb = %lx\n", rbp);
printf("func1, rsp = %lx\n", rsp);
printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
/* hack the stack! */
*(unsigned long int *)((char *)rbp + 8) = 0x4005bd;
}
void func2(void)
{
int a;
int b;
int c;
register long rsp asm ("rsp");
register long rbp asm ("rbp");
printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
printf("func2, rpb = %lx\n", rbp);
printf("func2, rsp = %lx\n", rsp);
}
int main(void)
{
register long rsp asm ("rsp");
register long rbp asm ("rbp");
printf("main, rpb = %lx\n", rbp);
printf("main, rsp = %lx\n", rsp);
func1();
func2();
return (0);
}
holberton$ gcc 4-main.c && ./a.out
main, rpb = 7fff62ef1b60
main, rsp = 7fff62ef1b60
a = 98, b = 972, c = 1070
func1, rpb = 7fff62ef1b50
func1, rsp = 7fff62ef1b40
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7fff62ef1b60
func1, return address value = 40074d
[x] I am in the function bye!
holberton$ echo $?
98
holberton$
We have called the function bye , without calling it! OutroI hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if you have anything you would like me to cover in the next chapter. Questions? Feedback?If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42. Haters, please send your comments to /dev/null . Happy Hacking! Thank you for reading!As always, no one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed. FilesThis repo contains the source code (X-main.c files) for programs created in this tutorial. Read more about the virtual memoryFollow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the fifth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them: Many thanks to Naomi for proof-reading!
This is the fourth chapter in a series around virtual memory. The goal is to learn some CS basics, but in a different and more practical way. If you missed the previous chapters, you should probably start there: The heapIn this chapter we will look at the heap and malloc in order to answer some of the questions we ended with at the end of the previous chapter: PrerequisitesIn order to fully understand this article, you will need to know: EnvironmentAll scripts and programs have been tested on the following system: Tools used: Everything we will write will be true for this system/environment, but may be different on another system We will also go through the Linux source code. If you are on Ubuntu, you can download the sources of your current kernel by running this command: apt-get source linux-image-$(uname -r)
malloc
malloc is the common function used to dynamically allocate memory. This memory is allocated on the “heap”. Note: malloc is not a system call.
From man malloc : [...] allocate dynamic memory[...]
void *malloc(size_t size);
[...]
The malloc() function allocates size bytes and returns a pointer to the allocated memory.
No malloc, no [heap]Let’s look at memory regions of a process that does not call malloc (0-main.c ). #include <stdlib.h>
#include <stdio.h>
/**
* main - do nothing
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
getchar();
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 0-main.c -o 0
julien@holberton:~/holberton/w/hackthevm3$ ./0
Quick reminder (1/3): the memory regions of a process are listed in the /proc/[pid]/maps file. As a result, we first need to know the PID of the process. That is done using the ps command; the second column of ps aux output will give us the PID of the process. Please read chapter 0 to learn more. julien@holberton:/tmp$ ps aux | grep \ \./0$
julien 3638 0.0 0.0 4200 648 pts/9 S+ 12:01 0:00 ./0
Quick reminder (2/3): from the above output, we can see that the PID of the process we want to look at is 3638 . As a result, the maps file will be found in the directory /proc/3638 . julien@holberton:/tmp$ cd /proc/3638
Quick reminder (3/3): The maps file contains the memory regions of the process. The format of each line in this file is: address perms offset dev inode pathname julien@holberton:/proc/3638$ cat maps
00400000-00401000 r-xp 00000000 08:01 174583 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
00600000-00601000 r--p 00000000 08:01 174583 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
00601000-00602000 rw-p 00001000 08:01 174583 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
7f38f87d7000-7f38f8991000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8991000-7f38f8b91000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b91000-7f38f8b95000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b95000-7f38f8b97000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b97000-7f38f8b9c000 rw-p 00000000 00:00 0
7f38f8b9c000-7f38f8bbf000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8da3000-7f38f8da6000 rw-p 00000000 00:00 0
7f38f8dbb000-7f38f8dbe000 rw-p 00000000 00:00 0
7f38f8dbe000-7f38f8dbf000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8dbf000-7f38f8dc0000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8dc0000-7f38f8dc1000 rw-p 00000000 00:00 0
7ffdd85c5000-7ffdd85e6000 rw-p 00000000 00:00 0 [stack]
7ffdd85f2000-7ffdd85f4000 r--p 00000000 00:00 0 [vvar]
7ffdd85f4000-7ffdd85f6000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
julien@holberton:/proc/3638$
Note: hackthevm3 is a symbolic link to hack_the_virtual_memory/03. The Heap/ -> As we can see from the above maps file, there’s no [heap] region allocated. malloc(x)
Let’s do the same but with a program that calls malloc (1-main.c ): #include <stdio.h>
#include <stdlib.h>
/**
* main - 1 call to malloc
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
malloc(1);
getchar();
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 1-main.c -o 1
julien@holberton:~/holberton/w/hackthevm3$ ./1
julien@holberton:/proc/3638$ ps aux | grep \ \./1$
julien 3718 0.0 0.0 4332 660 pts/9 S+ 12:09 0:00 ./1
julien@holberton:/proc/3638$ cd /proc/3718
julien@holberton:/proc/3718$ cat maps
00400000-00401000 r-xp 00000000 08:01 176964 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
00600000-00601000 r--p 00000000 08:01 176964 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
00601000-00602000 rw-p 00001000 08:01 176964 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
01195000-011b6000 rw-p 00000000 00:00 0 [heap]
...
julien@holberton:/proc/3718$
-> the [heap] is here. Let’s check the return value of malloc to make sure the returned address is in the heap region (2-main.c ): #include <stdio.h>
#include <stdlib.h>
/**
* main - prints the malloc returned address
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
p = malloc(1);
printf("%p\n", p);
getchar();
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 2-main.c -o 2
julien@holberton:~/holberton/w/hackthevm3$ ./2
0x24d6010
julien@holberton:/proc/3718$ ps aux | grep \ \./2$
julien 3834 0.0 0.0 4336 676 pts/9 S+ 12:48 0:00 ./2
julien@holberton:/proc/3718$ cd /proc/3834
julien@holberton:/proc/3834$ cat maps
00400000-00401000 r-xp 00000000 08:01 176966 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
00600000-00601000 r--p 00000000 08:01 176966 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
00601000-00602000 rw-p 00001000 08:01 176966 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
024d6000-024f7000 rw-p 00000000 00:00 0 [heap]
...
julien@holberton:/proc/3834$
-> 024d6000 <0x24d6010 < 024f7000 The returned address is inside the heap region. And as we have seen in the previous chapter, the returned address does not start exactly at the beginning of the region; we’ll see why later. strace , brk and sbrk
malloc is a “regular” function (as opposed to a system call), so it must call some kind of syscall in order to manipulate the heap. Let’s use strace to find out.
strace is a program used to trace system calls and signals. Any program will always use a few syscalls before your main function is executed. In order to know which syscalls are used by malloc , we will add a write syscall before and after the call to malloc (3-main.c ).
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* main - let's find out which syscall malloc is using
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
write(1, "BEFORE MALLOC\n", 14);
p = malloc(1);
write(1, "AFTER MALLOC\n", 13);
printf("%p\n", p);
getchar();
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 3-main.c -o 3
julien@holberton:~/holberton/w/hackthevm3$ strace ./3
execve("./3", ["./3"], [/* 61 vars */]) = 0
...
write(1, "BEFORE MALLOC\n", 14BEFORE MALLOC
) = 14
brk(0) = 0xe70000
brk(0xe91000) = 0xe91000
write(1, "AFTER MALLOC\n", 13AFTER MALLOC
) = 13
...
read(0,
From the above listing we can focus on this: brk(0) = 0xe70000
brk(0xe91000) = 0xe91000
-> malloc is using the brk system call in order to manipulate the heap. From brk man page (man brk ), we can see what this system call is doing: ...
int brk(void *addr);
void *sbrk(intptr_t increment);
...
DESCRIPTION
brk() and sbrk() change the location of the program break, which defines
the end of the process's data segment (i.e., the program break is the first
location after the end of the uninitialized data segment). Increasing the
program break has the effect of allocating memory to the process; decreas‐
ing the break deallocates memory.
brk() sets the end of the data segment to the value specified by addr, when
that value is reasonable, the system has enough memory, and the process
does not exceed its maximum data size (see setrlimit(2)).
sbrk() increments the program's data space by increment bytes. Calling
sbrk() with an increment of 0 can be used to find the current location of
the program break.
The program break is the address of the first location beyond the current end of the data region of the program in the virual memory. By increasing the value of the program break, via brk or sbrk , the function malloc creates a new space that can then be used by the process to dynamically allocate memory (using malloc ). So the heap is actually an extension of the data segment of the program. The first call to brk (brk(0) ) returns the current address of the program break to malloc . And the second call is the one that actually creates new memory (since 0xe91000 > 0xe70000 ) by increasing the value of the program break. In the above example, the heap is now starting at 0xe70000 and ends at 0xe91000 . Let’s double check with the /proc/[PID]/maps file: julien@holberton:/proc/3855$ ps aux | grep \ \./3$
julien 4011 0.0 0.0 4748 708 pts/9 S+ 13:04 0:00 strace ./3
julien 4014 0.0 0.0 4336 644 pts/9 S+ 13:04 0:00 ./3
julien@holberton:/proc/3855$ cd /proc/4014
julien@holberton:/proc/4014$ cat maps
00400000-00401000 r-xp 00000000 08:01 176967 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00600000-00601000 r--p 00000000 08:01 176967 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00601000-00602000 rw-p 00001000 08:01 176967 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00e70000-00e91000 rw-p 00000000 00:00 0 [heap]
...
julien@holberton:/proc/4014$
-> 00e70000-00e91000 rw-p 00000000 00:00 0 [heap] matches the pointers returned back to malloc by brk . That’s great, but wait, why didmalloc increment the heap by 00e91000 – 00e70000 = 0x21000 or 135168 bytes, when we only asked for only 1 byte? Many mallocsWhat will happen if we call malloc several times? (4-main.c ) #include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* main - many calls to malloc
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
write(1, "BEFORE MALLOC #0\n", 17);
p = malloc(1024);
write(1, "AFTER MALLOC #0\n", 16);
printf("%p\n", p);
write(1, "BEFORE MALLOC #1\n", 17);
p = malloc(1024);
write(1, "AFTER MALLOC #1\n", 16);
printf("%p\n", p);
write(1, "BEFORE MALLOC #2\n", 17);
p = malloc(1024);
write(1, "AFTER MALLOC #2\n", 16);
printf("%p\n", p);
write(1, "BEFORE MALLOC #3\n", 17);
p = malloc(1024);
write(1, "AFTER MALLOC #3\n", 16);
printf("%p\n", p);
getchar();
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 4-main.c -o 4
julien@holberton:~/holberton/w/hackthevm3$ strace ./4
execve("./4", ["./4"], [/* 61 vars */]) = 0
...
write(1, "BEFORE MALLOC #0\n", 17BEFORE MALLOC #0
) = 17
brk(0) = 0x1314000
brk(0x1335000) = 0x1335000
write(1, "AFTER MALLOC #0\n", 16AFTER MALLOC #0
) = 16
...
write(1, "0x1314010\n", 100x1314010
) = 10
write(1, "BEFORE MALLOC #1\n", 17BEFORE MALLOC #1
) = 17
write(1, "AFTER MALLOC #1\n", 16AFTER MALLOC #1
) = 16
write(1, "0x1314420\n", 100x1314420
) = 10
write(1, "BEFORE MALLOC #2\n", 17BEFORE MALLOC #2
) = 17
write(1, "AFTER MALLOC #2\n", 16AFTER MALLOC #2
) = 16
write(1, "0x1314830\n", 100x1314830
) = 10
write(1, "BEFORE MALLOC #3\n", 17BEFORE MALLOC #3
) = 17
write(1, "AFTER MALLOC #3\n", 16AFTER MALLOC #3
) = 16
write(1, "0x1314c40\n", 100x1314c40
) = 10
...
read(0,
-> malloc is NOT calling brk each time we call it. The first time, malloc creates a new space (the heap) for the program (by increasing the program break location). The following times, malloc uses the same space to give our program “new” chunks of memory. Those “new” chunks of memory are part of the memory previously allocated using brk . This way, malloc doesn’t have to use syscalls (brk ) every time we call it, and thus it makes malloc – and our programs using malloc – faster. It also allows malloc and free to optimize the usage of the memory. Let’s double check that we have only one heap, allocated by the first call to brk : julien@holberton:/proc/4014$ ps aux | grep \ \./4$
julien 4169 0.0 0.0 4748 688 pts/9 S+ 13:33 0:00 strace ./4
julien 4172 0.0 0.0 4336 656 pts/9 S+ 13:33 0:00 ./4
julien@holberton:/proc/4014$ cd /proc/4172
julien@holberton:/proc/4172$ cat maps
00400000-00401000 r-xp 00000000 08:01 176973 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
00600000-00601000 r--p 00000000 08:01 176973 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
00601000-00602000 rw-p 00001000 08:01 176973 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
01314000-01335000 rw-p 00000000 00:00 0 [heap]
7f4a3f2c4000-7f4a3f47e000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f47e000-7f4a3f67e000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f67e000-7f4a3f682000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f682000-7f4a3f684000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f684000-7f4a3f689000 rw-p 00000000 00:00 0
7f4a3f689000-7f4a3f6ac000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f890000-7f4a3f893000 rw-p 00000000 00:00 0
7f4a3f8a7000-7f4a3f8ab000 rw-p 00000000 00:00 0
7f4a3f8ab000-7f4a3f8ac000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f8ac000-7f4a3f8ad000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f8ad000-7f4a3f8ae000 rw-p 00000000 00:00 0
7ffd1ba73000-7ffd1ba94000 rw-p 00000000 00:00 0 [stack]
7ffd1bbed000-7ffd1bbef000 r--p 00000000 00:00 0 [vvar]
7ffd1bbef000-7ffd1bbf1000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
julien@holberton:/proc/4172$
-> We have only one [heap] and the addresses match those returned by sbrk : 0x1314000 & 0x1335000 Naive mallocBased on the above, and assuming we won’t ever need to free anything, we can now write our own (naive) version of malloc , that would move the program break each time it is called. #include <stdlib.h>
#include <unistd.h>
/**
* malloc - naive version of malloc: dynamically allocates memory on the heap using sbrk
* @size: number of bytes to allocate
*
* Return: the memory address newly allocated, or NULL on error
*
* Note: don't do this at home :)
*/
void *malloc(size_t size)
{
void *previous_break;
previous_break = sbrk(size);
/* check for error */
if (previous_break == (void *) -1)
{
/* on error malloc returns NULL */
return (NULL);
}
return (previous_break);
}
The 0x10 lost bytesIf we look at the output of the previous program (4-main.c ), we can see that the first memory address returned by malloc doesn’t start at the beginning of the heap, but 0x10 bytes after: 0x1314010 vs 0x1314000 . Also, when we call malloc(1024) a second time, the address should be 0x1314010 (the returned value of the first call to malloc ) + 1024 (or 0x400 in hexadecimal, since the first call to malloc was asking for 1024 bytes) = 0x1318010 . But the return value of the second call to malloc is 0x1314420 . We have lost 0x10 bytes again! Same goes for the subsequent calls. Let’s look at what we can find inside those “l(fā)ost” 0x10 -byte memory spaces (5-main.c ) and whether the memory loss stays constant: #include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* pmem - print mem
* @p: memory address to start printing from
* @bytes: number of bytes to print
*
* Return: nothing
*/
void pmem(void *p, unsigned int bytes)
{
unsigned char *ptr;
unsigned int i;
ptr = (unsigned char *)p;
for (i = 0; i < bytes; i++)
{
if (i != 0)
{
printf(" ");
}
printf("%02x", *(ptr + i));
}
printf("\n");
}
/**
* main - the 0x10 lost bytes
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
int i;
for (i = 0; i < 10; i++)
{
p = malloc(1024 * (i + 1));
printf("%p\n", p);
printf("bytes at %p:\n", (void *)((char *)p - 0x10));
pmem((char *)p - 0x10, 0x10);
}
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 5-main.c -o 5
julien@holberton:~/holberton/w/hackthevm3$ ./5
0x1fa8010
bytes at 0x1fa8000:
00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
0x1fa8420
bytes at 0x1fa8410:
00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
0x1fa8c30
bytes at 0x1fa8c20:
00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
0x1fa9840
bytes at 0x1fa9830:
00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
0x1faa850
bytes at 0x1faa840:
00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00
0x1fabc60
bytes at 0x1fabc50:
00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
0x1fad470
bytes at 0x1fad460:
00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
0x1faf080
bytes at 0x1faf070:
00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
0x1fb1090
bytes at 0x1fb1080:
00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00
0x1fb34a0
bytes at 0x1fb3490:
00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
julien@holberton:~/holberton/w/hackthevm3$
There is one clear pattern: the size of the malloc’ed memory chunk is always found in the preceding 0x10 bytes. For instance, the first malloc call is malloc’ing 1024 (0x0400 ) bytes and we can find 11 04 00 00 00 00 00 00 in the preceding 0x10 bytes. Those last bytes represent the number 0x 00 00 00 00 00 00 04 11 = 0x400 (1024) + 0x10 (the block size preceding those 1024 bytes + 1 (we’ll talk about this “+1” later in this chapter). If we look at each 0x10 bytes preceding the addresses returned by malloc , they all contain the size of the chunk of memory asked to malloc + 0x10 + 1 . At this point, given what we said and saw earlier, we can probably guess that those 0x10 bytes are a sort of data structure used by malloc (and free ) to deal with the heap. And indeed, even though we don’t understand everything yet, we can already use this data structure to go from one malloc’ed chunk of memory to the other (6-main.c ) as long as we have the address of the beginning of the heap (and as long as we have never called free ): #include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* pmem - print mem
* @p: memory address to start printing from
* @bytes: number of bytes to print
*
* Return: nothing
*/
void pmem(void *p, unsigned int bytes)
{
unsigned char *ptr;
unsigned int i;
ptr = (unsigned char *)p;
for (i = 0; i < bytes; i++)
{
if (i != 0)
{
printf(" ");
}
printf("%02x", *(ptr + i));
}
printf("\n");
}
/**
* main - using the 0x10 bytes to jump to next malloc'ed chunks
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
int i;
void *heap_start;
size_t size_of_the_block;
heap_start = sbrk(0);
write(1, "START\n", 6);
for (i = 0; i < 10; i++)
{
p = malloc(1024 * (i + 1));
*((int *)p) = i;
printf("%p: [%i]\n", p, i);
}
p = heap_start;
for (i = 0; i < 10; i++)
{
pmem(p, 0x10);
size_of_the_block = *((size_t *)((char *)p + 8)) - 1;
printf("%p: [%i] - size = %lu\n",
(void *)((char *)p + 0x10),
*((int *)((char *)p + 0x10)),
size_of_the_block);
p = (void *)((char *)p + size_of_the_block);
}
write(1, "END\n", 4);
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 6-main.c -o 6
julien@holberton:~/holberton/w/hackthevm3$ ./6
START
0x9e6010: [0]
0x9e6420: [1]
0x9e6c30: [2]
0x9e7840: [3]
0x9e8850: [4]
0x9e9c60: [5]
0x9eb470: [6]
0x9ed080: [7]
0x9ef090: [8]
0x9f14a0: [9]
00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
0x9e6010: [0] - size = 1040
00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
0x9e6420: [1] - size = 2064
00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
0x9e6c30: [2] - size = 3088
00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
0x9e7840: [3] - size = 4112
00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00
0x9e8850: [4] - size = 5136
00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
0x9e9c60: [5] - size = 6160
00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
0x9eb470: [6] - size = 7184
00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
0x9ed080: [7] - size = 8208
00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00
0x9ef090: [8] - size = 9232
00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
0x9f14a0: [9] - size = 10256
END
julien@holberton:~/holberton/w/hackthevm3$
One of our open questions from the previous chapter is now answered: malloc is using 0x10 additional bytes for each malloc’ed memory block to store the size of the block. This data will actually be used by free to save it to a list of available blocks for future calls to malloc . But our study also raises a new question: what are the first 8 bytes of the 16 (0x10 in hexadecimal) bytes used for? It seems to always be zero. Is it just padding? RTFSCAt this stage, we probably want to check the source code of malloc to confirm what we just found (malloc.c from the glibc). 1055 /*
1056 malloc_chunk details:
1057
1058 (The following includes lightly edited explanations by Colin Plumb.)
1059
1060 Chunks of memory are maintained using a `boundary tag' method as
1061 described in e.g., Knuth or Standish. (See the paper by Paul
1062 Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
1063 survey of such techniques.) Sizes of free chunks are stored both
1064 in the front of each chunk and at the end. This makes
1065 consolidating fragmented chunks into bigger chunks very fast. The
1066 size fields also hold bits representing whether chunks are free or
1067 in use.
1068
1069 An allocated chunk looks like this:
1070
1071
1072 chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1073 | Size of previous chunk, if unallocated (P clear) |
1074 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1075 | Size of chunk, in bytes |A|M|P|
1076 mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1077 | User data starts here... .
1078 . .
1079 . (malloc_usable_size() bytes) .
1080 . |
1081 nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1082 | (size of chunk, but used for application data) |
1083 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1084 | Size of next chunk, in bytes |A|0|1|
1085 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1086
1087 Where "chunk" is the front of the chunk for the purpose of most of
1088 the malloc code, but "mem" is the pointer that is returned to the
1089 user. "Nextchunk" is the beginning of the next contiguous chunk.
-> We were correct \o/. Right before the address returned by malloc to the user, we have two variables: Let’s free some chunks to confirm that the first 8 bytes are used the way the source code describes it (7-main.c ): #include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* pmem - print mem
* @p: memory address to start printing from
* @bytes: number of bytes to print
*
* Return: nothing
*/
void pmem(void *p, unsigned int bytes)
{
unsigned char *ptr;
unsigned int i;
ptr = (unsigned char *)p;
for (i = 0; i < bytes; i++)
{
if (i != 0)
{
printf(" ");
}
printf("%02x", *(ptr + i));
}
printf("\n");
}
/**
* main - confirm the source code
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
int i;
size_t size_of_the_chunk;
size_t size_of_the_previous_chunk;
void *chunks[10];
for (i = 0; i < 10; i++)
{
p = malloc(1024 * (i + 1));
chunks[i] = (void *)((char *)p - 0x10);
printf("%p\n", p);
}
free((char *)(chunks[3]) + 0x10);
free((char *)(chunks[7]) + 0x10);
for (i = 0; i < 10; i++)
{
p = chunks[i];
printf("chunks[%d]: ", i);
pmem(p, 0x10);
size_of_the_chunk = *((size_t *)((char *)p + 8)) - 1;
size_of_the_previous_chunk = *((size_t *)((char *)p));
printf("chunks[%d]: %p, size = %li, prev = %li\n",
i, p, size_of_the_chunk, size_of_the_previous_chunk);
}
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 7-main.c -o 7
julien@holberton:~/holberton/w/hackthevm3$ ./7
0x1536010
0x1536420
0x1536c30
0x1537840
0x1538850
0x1539c60
0x153b470
0x153d080
0x153f090
0x15414a0
chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
chunks[0]: 0x1536000, size = 1040, prev = 0
chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
chunks[1]: 0x1536410, size = 2064, prev = 0
chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
chunks[2]: 0x1536c20, size = 3088, prev = 0
chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
chunks[3]: 0x1537830, size = 4112, prev = 0
chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00
chunks[4]: 0x1538840, size = 5135, prev = 4112
chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
chunks[5]: 0x1539c50, size = 6160, prev = 0
chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
chunks[6]: 0x153b460, size = 7184, prev = 0
chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
chunks[7]: 0x153d070, size = 8208, prev = 0
chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
chunks[8]: 0x153f080, size = 9231, prev = 8208
chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
chunks[9]: 0x1541490, size = 10256, prev = 0
julien@holberton:~/holberton/w/hackthevm3$
As we can see from the above listing, when the previous chunk has been free’d, the malloc chunk’s first 8 bytes contain the size of the previous unallocated chunk. So the correct representation of a malloc chunk is the following: Also, it seems that the first bit of the next 8 bytes (containing the size of the current chunk) serves as a flag to check if the previous chunk is used (1 ) or not (0 ). So the correct updated version of our program should be written this way (8-main.c ): #include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* pmem - print mem
* @p: memory address to start printing from
* @bytes: number of bytes to print
*
* Return: nothing
*/
void pmem(void *p, unsigned int bytes)
{
unsigned char *ptr;
unsigned int i;
ptr = (unsigned char *)p;
for (i = 0; i < bytes; i++)
{
if (i != 0)
{
printf(" ");
}
printf("%02x", *(ptr + i));
}
printf("\n");
}
/**
* main - updating with correct checks
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
int i;
size_t size_of_the_chunk;
size_t size_of_the_previous_chunk;
void *chunks[10];
char prev_used;
for (i = 0; i < 10; i++)
{
p = malloc(1024 * (i + 1));
chunks[i] = (void *)((char *)p - 0x10);
}
free((char *)(chunks[3]) + 0x10);
free((char *)(chunks[7]) + 0x10);
for (i = 0; i < 10; i++)
{
p = chunks[i];
printf("chunks[%d]: ", i);
pmem(p, 0x10);
size_of_the_chunk = *((size_t *)((char *)p + 8));
prev_used = size_of_the_chunk & 1;
size_of_the_chunk -= prev_used;
size_of_the_previous_chunk = *((size_t *)((char *)p));
printf("chunks[%d]: %p, size = %li, prev (%s) = %li\n",
i, p, size_of_the_chunk,
(prev_used? "allocated": "unallocated"), size_of_the_previous_chunk);
}
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 8-main.c -o 8
julien@holberton:~/holberton/w/hackthevm3$ ./8
chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
chunks[0]: 0x1031000, size = 1040, prev (allocated) = 0
chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
chunks[1]: 0x1031410, size = 2064, prev (allocated) = 0
chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
chunks[2]: 0x1031c20, size = 3088, prev (allocated) = 0
chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
chunks[3]: 0x1032830, size = 4112, prev (allocated) = 0
chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00
chunks[4]: 0x1033840, size = 5136, prev (unallocated) = 4112
chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
chunks[5]: 0x1034c50, size = 6160, prev (allocated) = 0
chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
chunks[6]: 0x1036460, size = 7184, prev (allocated) = 0
chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
chunks[7]: 0x1038070, size = 8208, prev (allocated) = 0
chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
chunks[8]: 0x103a080, size = 9232, prev (unallocated) = 8208
chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
chunks[9]: 0x103c490, size = 10256, prev (allocated) = 0
julien@holberton:~/holberton/w/hackthevm3$
Is the heap actually growing upwards?The last question left unanswered is: “Is the heap actually growing upwards?”. From the brk man page, it seems so: DESCRIPTION
brk() and sbrk() change the location of the program break, which defines the end of the
process's data segment (i.e., the program break is the first location after the end of
the uninitialized data segment). Increasing the program break has the effect of allocat‐
ing memory to the process; decreasing the break deallocates memory.
Let’s check! (9-main.c ) #include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* main - moving the program break
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
int i;
write(1, "START\n", 6);
malloc(1);
getchar();
write(1, "LOOP\n", 5);
for (i = 0; i < 0x25000 / 1024; i++)
{
malloc(1024);
}
write(1, "END\n", 4);
getchar();
return (EXIT_SUCCESS);
}
Now let’s confirm this assumption with strace: julien@holberton:~/holberton/w/hackthevm3$ strace ./9
execve("./9", ["./9"], [/* 61 vars */]) = 0
...
write(1, "START\n", 6START
) = 6
brk(0) = 0x1fd8000
brk(0x1ff9000) = 0x1ff9000
...
write(1, "LOOP\n", 5LOOP
) = 5
brk(0x201a000) = 0x201a000
write(1, "END\n", 4END
) = 4
...
julien@holberton:~/holberton/w/hackthevm3$
clearly, malloc made only two calls to brk to increase the allocated space on the heap. And the second call is using a higher memory address argument (0x201a000 > 0x1ff9000 ). The second syscall was triggered when the space on the heap was too small to host all the malloc calls. Let’s double check with /proc . julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 9-main.c -o 9
julien@holberton:~/holberton/w/hackthevm3$ ./9
START
julien@holberton:/proc/7855$ ps aux | grep \ \./9$
julien 7972 0.0 0.0 4332 684 pts/9 S+ 19:08 0:00 ./9
julien@holberton:/proc/7855$ cd /proc/7972
julien@holberton:/proc/7972$ cat maps
...
00901000-00922000 rw-p 00000000 00:00 0 [heap]
...
julien@holberton:/proc/7972$
-> 00901000-00922000 rw-p 00000000 00:00 0 [heap] Let’s hit Enter and look at the [heap] again: LOOP
END
julien@holberton:/proc/7972$ cat maps
...
00901000-00943000 rw-p 00000000 00:00 0 [heap]
...
julien@holberton:/proc/7972$
-> 00901000-00943000 rw-p 00000000 00:00 0 [heap] The beginning of the heap is still the same, but the size has increased upwards from 00922000 to 00943000 . The Address Space Layout Randomisation (ASLR)You may have noticed something “strange” in the /proc/pid/maps listing above, that we want to study: The program break is the address of the first location beyond the current end of the data region – so the address of the first location beyond the executable in the virtual memory. As a consequence, the heap should start right after the end of the executable in memory. As you can see in all above listing, it is NOT the case. The only thing that is true is that the heap is always the next memory region after the executable, which makes sense since the heap is actually part of the data segment of the executable itself. Also, if we look even closer, the memory gap size between the executable and the heap is never the same: Format of the following lines: [PID of the above maps listings]: address of the beginning of the [heap] – address of the end of the executable = memory gap size [3718]: 01195000 – 00602000 = b93000 [3834]: 024d6000 – 00602000 = 1ed4000 [4014]: 00e70000 – 00602000 = 86e000 [4172]: 01314000 – 00602000 = d12000 [7972]: 00901000 – 00602000 = 2ff000
It seems that this gap size is random, and indeed, it is. If we look at the ELF binary loader source code (fs/binfmt_elf.c ) we can find this: if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
current->mm->brk = current->mm->start_brk =
arch_randomize_brk(current->mm);
#ifdef compat_brk_randomized
current->brk_randomized = 1;
#endif
}
where current->mm->brk is the address of the program break. The arch_randomize_brk function can be found in the arch/x86/kernel/process.c file: unsigned long arch_randomize_brk(struct mm_struct *mm)
{
unsigned long range_end = mm->brk + 0x02000000;
return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
}
The randomize_range returns a start address such that: [...... <range> .....]
start end
Source code of the randomize_range function (drivers/char/random.c ): /*
* randomize_range() returns a start address such that
*
* [...... <range> .....]
* start end
*
* a <range> with size "len" starting at the return value is inside in the
* area defined by [start, end], but is otherwise randomized.
*/
unsigned long
randomize_range(unsigned long start, unsigned long end, unsigned long len)
{
unsigned long range = end - len - start;
if (end <= start + len)
return 0;
return PAGE_ALIGN(get_random_int() % range + start);
}
As a result, the offset between the data section of the executable and the program break initial position when the process runs can have a size of anywhere between 0 and 0x02000000 . This randomization is known as Address Space Layout Randomisation (ASLR). ASLR is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the positions of the heap and the stack. The updated VM diagramWith all the above in mind, we can now update our VM diagram: malloc(0)
Did you ever wonder what was happening when we call malloc with a size of 0 ? Let’s check! (10-main.c ) #include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/**
* pmem - print mem
* @p: memory address to start printing from
* @bytes: number of bytes to print
*
* Return: nothing
*/
void pmem(void *p, unsigned int bytes)
{
unsigned char *ptr;
unsigned int i;
ptr = (unsigned char *)p;
for (i = 0; i < bytes; i++)
{
if (i != 0)
{
printf(" ");
}
printf("%02x", *(ptr + i));
}
printf("\n");
}
/**
* main - moving the program break
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
void *p;
size_t size_of_the_chunk;
char prev_used;
p = malloc(0);
printf("%p\n", p);
pmem((char *)p - 0x10, 0x10);
size_of_the_chunk = *((size_t *)((char *)p - 8));
prev_used = size_of_the_chunk & 1;
size_of_the_chunk -= prev_used;
printf("chunk size = %li bytes\n", size_of_the_chunk);
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 10-main.c -o 10
julien@holberton:~/holberton/w/hackthevm3$ ./10
0xd08010
00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00
chunk size = 32 bytes
julien@holberton:~/holberton/w/hackthevm3$
-> malloc(0) is actually using 32 bytes, including the first 0x10 bytes. Again, note that this will not always be the case. From the man page (man malloc ): NULL may also be returned by a successful call to malloc() with a size of zero
OutroWe have learned a couple of things about malloc and the heap. But there is actually more than brk and sbrk . You can try malloc’ing a big chunk of memory, strace it, and look at /proc to learn more before we cover it in a next chapter Also, studying how free works in coordination with malloc is something we haven’t covered yet. If you want to look at it, you will find part of the answer to why the minimum chunk size is 32 (when we ask malloc for 0 bytes) vs 16 (0x10 in hexadecimal) or 0 . As usual, to be continued! Let me know if you have something you would like me to cover in the next chapter. Questions? Feedback?If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42. Haters, please send your comments to /dev/null . Happy Hacking! Thank you for reading!As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed. FilesThis repo contains the source code (naive_malloc.c , version.c & “X-main.c` files) for programs created in this tutorial. Read more about the virtual memoryFollow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! Many thanks to Tim, Anne and Ian for proof-reading!
Hack The Virtual Memory, chapter 2: Drawing the VM diagramWe previously talked about what you could find in the virtual memory of a process, and where you could find it. Today, we will try to “reconstruct” (part of) the following diagram by making our process print addresses of various elements of the program. PrerequisitesIn order to fully understand this article, you will need to know: The basics of the C programming language A little bit of assembly (but not required) The very basics of the Linux filesystem and the shell We will also use the /proc/[pid]/maps file (see man proc or read our first article Hack The Virtual Memory, chapter 0: C strings & /proc)
EnvironmentAll scripts and programs have been tested on the following system: Ubuntu Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Everything we will write will be true for this system, but may be different on another system
Tools used: The stackThe first thing we want to locate in our diagram is the stack. We know that in C, local variables are located on the stack. So if we print the address of a local variable, it should give us an idea on where we would find the stack in the virtual memory. Let’s use this program (main-1.c ) to find out: #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
int a;
printf("Address of a: %p\n", (void *)&a);
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-0.c -o 0
julien@holberton:~/holberton/w/hackthevm2$ ./0
Address of a: 0x7ffd14b8bd9c
julien@holberton:~/holberton/w/hackthevm2$
This will be our first point of reference when we will compare other elements’ addresses. The heapThe heap is used when you malloc space for your variables. Let’s add a line to use malloc and see where the memory address returned by malloc is located (main-1.c ): #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
int a;
void *p;
printf("Address of a: %p\n", (void *)&a);
p = malloc(98);
if (p == NULL)
{
fprintf(stderr, "Can't malloc\n");
return (EXIT_FAILURE);
}
printf("Allocated space in the heap: %p\n", p);
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-1.c -o 1
julien@holberton:~/holberton/w/hackthevm2$ ./1
Address of a: 0x7ffd4204c554
Allocated space in the heap: 0x901010
julien@holberton:~/holberton/w/hackthevm2$
It’s now clear that the heap (0x901010 ) is way below the stack (0x7ffd4204c554 ). At this point we can already draw this diagram: The executableYour program is also in the virtual memory. If we print the address of the main function, we should have an idea of where the program is located compared to the stack and the heap. Let’s see if we find it below the heap as expected (main-2.c ): #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
int a;
void *p;
printf("Address of a: %p\n", (void *)&a);
p = malloc(98);
if (p == NULL)
{
fprintf(stderr, "Can't malloc\n");
return (EXIT_FAILURE);
}
printf("Allocated space in the heap: %p\n", p);
printf("Address of function main: %p\n", (void *)main);
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-2.c -o 2
julien@holberton:~/holberton/w/hackthevm2$ ./2
Address of a: 0x7ffdced37d74
Allocated space in the heap: 0x2199010
Address of function main: 0x40060d
julien@holberton:~/holberton/w/hackthevm2$
It seems that our program (0x40060d ) is located below the heap (0x2199010 ), just as expected. But let’s make sure that this is the actual code of our program, and not some sort of pointer to another location. Let’s disassemble our program 2 with objdump to look at the “memory address” of the main function: julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 2 | grep '<main>:' -A 5
000000000040060d <main>:
40060d: 55 push rbp
40060e: 48 89 e5 mov rbp,rsp
400611: 48 83 ec 10 sub rsp,0x10
400615: 48 8d 45 f4 lea rax,[rbp-0xc]
400619: 48 89 c6 mov rsi,rax
000000000040060d <main> -> we find the exact same address (0x40060d ). If you still have any doubts, you can print the first bytes located at this address, to make sure they match the output of objdump (main-3.c ):
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
int a;
void *p;
unsigned int i;
printf("Address of a: %p\n", (void *)&a);
p = malloc(98);
if (p == NULL)
{
fprintf(stderr, "Can't malloc\n");
return (EXIT_FAILURE);
}
printf("Allocated space in the heap: %p\n", p);
printf("Address of function main: %p\n", (void *)main);
printf("First bytes of the main function:\n\t");
for (i = 0; i < 15; i++)
{
printf("%02x ", ((unsigned char *)main)[i]);
}
printf("\n");
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-3.c -o 3
julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 3 | grep '<main>:' -A 5
000000000040064d <main>:
40064d: 55 push rbp
40064e: 48 89 e5 mov rbp,rsp
400651: 48 83 ec 10 sub rsp,0x10
400655: 48 8d 45 f0 lea rax,[rbp-0x10]
400659: 48 89 c6 mov rsi,rax
julien@holberton:~/holberton/w/hackthevm2$ ./3
Address of a: 0x7ffeff0f13b0
Allocated space in the heap: 0x8b3010
Address of function main: 0x40064d
First bytes of the main function:
55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6
julien@holberton:~/holberton/w/hackthevm2$ echo "55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6" | udcli -64 -x -o 40064d
000000000040064d 55 push rbp
000000000040064e 4889e5 mov rbp, rsp
0000000000400651 4883ec10 sub rsp, 0x10
0000000000400655 488d45f0 lea rax, [rbp-0x10]
0000000000400659 4889c6 mov rsi, rax
julien@holberton:~/holberton/w/hackthevm2$
-> We can see that we print the same address and the same content. We are now triple sure this is our main function. You can download the Udis86 Disassembler Library here. Here is the updated diagram, based on what we have learned: Command line arguments and environment variablesThe main function can take arguments: Let’s see where those elements stand in the virtual memory of our process (main-4.c ): #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(int ac, char **av, char **env)
{
int a;
void *p;
int i;
printf("Address of a: %p\n", (void *)&a);
p = malloc(98);
if (p == NULL)
{
fprintf(stderr, "Can't malloc\n");
return (EXIT_FAILURE);
}
printf("Allocated space in the heap: %p\n", p);
printf("Address of function main: %p\n", (void *)main);
printf("First bytes of the main function:\n\t");
for (i = 0; i < 15; i++)
{
printf("%02x ", ((unsigned char *)main)[i]);
}
printf("\n");
printf("Address of the array of arguments: %p\n", (void *)av);
printf("Addresses of the arguments:\n\t");
for (i = 0; i < ac; i++)
{
printf("[%s]:%p ", av[i], av[i]);
}
printf("\n");
printf("Address of the array of environment variables: %p\n", (void *)env);
printf("Address of the first environment variable: %p\n", (void *)(env[0]));
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-4.c -o 4
julien@holberton:~/holberton/w/hackthevm2$ ./4 Hello Holberton School!
Address of a: 0x7ffe7d6d8da0
Allocated space in the heap: 0xc8c010
Address of function main: 0x40069d
First bytes of the main function:
55 48 89 e5 48 83 ec 30 89 7d ec 48 89 75 e0
Address of the array of arguments: 0x7ffe7d6d8e98
Addresses of the arguments:
[./4]:0x7ffe7d6da373 [Hello]:0x7ffe7d6da377 [Holberton]:0x7ffe7d6da37d [School!]:0x7ffe7d6da387
Address of the array of environment variables: 0x7ffe7d6d8ec0
Address of the first environment variables:
[0x7ffe7d6da38f]:"XDG_VTNR=7"
[0x7ffe7d6da39a]:"XDG_SESSION_ID=c2"
[0x7ffe7d6da3ac]:"CLUTTER_IM_MODULE=xim"
julien@holberton:~/holberton/w/hackthevm2$
These elements are above the stack as expected, but now we know the exact order: stack (0x7ffe7d6d8da0 ) < argv (0x7ffe7d6d8e98 ) < env (0x7ffe7d6d8ec0 ) < arguments (from 0x7ffe7d6da373 to 0x7ffe7d6da387 + 8 (8 = size of the string "school" + 1 for the '\0' char)) < environment variables (starting at 0x7ffe7d6da38f ). Actually, we can also see that all the command line arguments are next to each other in the memory, and also right next to the environment variables. Are the argv and env arrays next to each other?The array argv is 5 elements long (there were 4 arguments from the command line + 1 NULL element at the end (argv always ends with NULL to mark the end of the array)). Each element is a pointer to a char and since we are on a 64-bit machine, a pointer is 8 bytes (if you want to make sure, you can use the C operator sizeof() to get the size of a pointer). As a result our argv array is of size 5 * 8 = 40 . 40 in base 10 is 0x28 in base 16 . If we add this value to the address of the beginning of the array 0x7ffe7d6d8e98 , we get… 0x7ffe7d6d8ec0 (The address of the env array)! So the two arrays are next to each other in memory. Is the first command line argument stored right after the env array?In order to check this we need to know the size of the env array. We know that it ends with a NULL pointer, so in order to get the number of its elements we simply need to loop through it, checking if the “current” element is NULL . Here’s the updated C code (main-5.c ): #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(int ac, char **av, char **env)
{
int a;
void *p;
int i;
int size;
printf("Address of a: %p\n", (void *)&a);
p = malloc(98);
if (p == NULL)
{
fprintf(stderr, "Can't malloc\n");
return (EXIT_FAILURE);
}
printf("Allocated space in the heap: %p\n", p);
printf("Address of function main: %p\n", (void *)main);
printf("First bytes of the main function:\n\t");
for (i = 0; i < 15; i++)
{
printf("%02x ", ((unsigned char *)main)[i]);
}
printf("\n");
printf("Address of the array of arguments: %p\n", (void *)av);
printf("Addresses of the arguments:\n\t");
for (i = 0; i < ac; i++)
{
printf("[%s]:%p ", av[i], av[i]);
}
printf("\n");
printf("Address of the array of environment variables: %p\n", (void *)env);
printf("Address of the first environment variables:\n");
for (i = 0; i < 3; i++)
{
printf("\t[%p]:\"%s\"\n", env[i], env[i]);
}
/* size of the env array */
i = 0;
while (env[i] != NULL)
{
i++;
}
i++; /* the NULL pointer */
size = i * sizeof(char *);
printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ ./5 Hello Betty Holberton!
Address of a: 0x7ffc77598acc
Allocated space in the heap: 0x2216010
Address of function main: 0x40069d
First bytes of the main function:
55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0
Address of the array of arguments: 0x7ffc77598bc8
Addresses of the arguments:
[./5]:0x7ffc7759a374 [Hello]:0x7ffc7759a378 [Betty]:0x7ffc7759a37e [Holberton!]:0x7ffc7759a384
Address of the array of environment variables: 0x7ffc77598bf0
Address of the first environment variables:
[0x7ffc7759a38f]:"XDG_VTNR=7"
[0x7ffc7759a39a]:"XDG_SESSION_ID=c2"
[0x7ffc7759a3ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
julien@holberton:~/holberton/w/hackthevm2$ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
obase=16
ibase=16
1F0+7FFC77598BF0
7FFC77598DE0
quit
julien@holberton:~/holberton/w/hackthevm2$
-> 7FFC77598DE0 != (but still <) 0x7ffc7759a374 . So the answer is no Wrapping upLet’s update our diagram with what we learned. Is the stack really growing downwards?Let’s call a function and figure this out! If this is true, then the variables of the calling function will be higher in memory than those from the called function (main-6.c ). #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* f - print locations of various elements
*
* Returns: nothing
*/
void f(void)
{
int a;
int b;
int c;
a = 98;
b = 1024;
c = a * b;
printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c);
printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c);
}
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(int ac, char **av, char **env)
{
int a;
void *p;
int i;
int size;
printf("Address of a: %p\n", (void *)&a);
p = malloc(98);
if (p == NULL)
{
fprintf(stderr, "Can't malloc\n");
return (EXIT_FAILURE);
}
printf("Allocated space in the heap: %p\n", p);
printf("Address of function main: %p\n", (void *)main);
printf("First bytes of the main function:\n\t");
for (i = 0; i < 15; i++)
{
printf("%02x ", ((unsigned char *)main)[i]);
}
printf("\n");
printf("Address of the array of arguments: %p\n", (void *)av);
printf("Addresses of the arguments:\n\t");
for (i = 0; i < ac; i++)
{
printf("[%s]:%p ", av[i], av[i]);
}
printf("\n");
printf("Address of the array of environment variables: %p\n", (void *)env);
printf("Address of the first environment variables:\n");
for (i = 0; i < 3; i++)
{
printf("\t[%p]:\"%s\"\n", env[i], env[i]);
}
/* size of the env array */
i = 0;
while (env[i] != NULL)
{
i++;
}
i++; /* the NULL pointer */
size = i * sizeof(char *);
printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
f();
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-6.c -o 6
julien@holberton:~/holberton/w/hackthevm2$ ./6
Address of a: 0x7ffdae53ea4c
Allocated space in the heap: 0xf32010
Address of function main: 0x4006f9
First bytes of the main function:
55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0
Address of the array of arguments: 0x7ffdae53eb48
Addresses of the arguments:
[./6]:0x7ffdae54038b
Address of the array of environment variables: 0x7ffdae53eb58
Address of the first environment variables:
[0x7ffdae54038f]:"XDG_VTNR=7"
[0x7ffdae54039a]:"XDG_SESSION_ID=c2"
[0x7ffdae5403ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
[f] a = 98, b = 1024, c = a * b = 100352
[f] Adresses of a: 0x7ffdae53ea04, b = 0x7ffdae53ea08, c = 0x7ffdae53ea0c
julien@holberton:~/holberton/w/hackthevm2$
-> True! (address of var a in function f ) 0x7ffdae53ea04 < 0x7ffdae53ea4c (address of var a in function main ) We now update our diagram: /proc
Let’s double check everything we found so far with /proc/[pid]/maps (man proc or refer to the first article in this series to learn about the proc filesystem if you don’t know what it is). Let’s add a getchar() to our program so that we can look at its “/proc ” (main-7.c ): #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* f - print locations of various elements
*
* Returns: nothing
*/
void f(void)
{
int a;
int b;
int c;
a = 98;
b = 1024;
c = a * b;
printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c);
printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c);
}
/**
* main - print locations of various elements
*
* Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
*/
int main(int ac, char **av, char **env)
{
int a;
void *p;
int i;
int size;
printf("Address of a: %p\n", (void *)&a);
p = malloc(98);
if (p == NULL)
{
fprintf(stderr, "Can't malloc\n");
return (EXIT_FAILURE);
}
printf("Allocated space in the heap: %p\n", p);
printf("Address of function main: %p\n", (void *)main);
printf("First bytes of the main function:\n\t");
for (i = 0; i < 15; i++)
{
printf("%02x ", ((unsigned char *)main)[i]);
}
printf("\n");
printf("Address of the array of arguments: %p\n", (void *)av);
printf("Addresses of the arguments:\n\t");
for (i = 0; i < ac; i++)
{
printf("[%s]:%p ", av[i], av[i]);
}
printf("\n");
printf("Address of the array of environment variables: %p\n", (void *)env);
printf("Address of the first environment variables:\n");
for (i = 0; i < 3; i++)
{
printf("\t[%p]:\"%s\"\n", env[i], env[i]);
}
/* size of the env array */
i = 0;
while (env[i] != NULL)
{
i++;
}
i++; /* the NULL pointer */
size = i * sizeof(char *);
printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
f();
getchar();
return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-7.c -o 7
julien@holberton:~/holberton/w/hackthevm2$ ./7 Rona is a Legend SRE
Address of a: 0x7fff16c8146c
Allocated space in the heap: 0x2050010
Address of function main: 0x400739
First bytes of the main function:
55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0
Address of the array of arguments: 0x7fff16c81568
Addresses of the arguments:
[./7]:0x7fff16c82376 [Rona]:0x7fff16c8237a [is]:0x7fff16c8237f [a]:0x7fff16c82382 [Legend]:0x7fff16c82384 [SRE]:0x7fff16c8238b
Address of the array of environment variables: 0x7fff16c815a0
Address of the first environment variables:
[0x7fff16c8238f]:"XDG_VTNR=7"
[0x7fff16c8239a]:"XDG_SESSION_ID=c2"
[0x7fff16c823ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
[f] a = 98, b = 1024, c = a * b = 100352
[f] Adresses of a: 0x7fff16c81424, b = 0x7fff16c81428, c = 0x7fff16c8142c
julien@holberton:~$ ps aux | grep "./7" | grep -v grep
julien 5788 0.0 0.0 4336 628 pts/8 S+ 18:04 0:00 ./7 Rona is a Legend SRE
julien@holberton:~$ cat /proc/5788/maps
00400000-00401000 r-xp 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
00600000-00601000 r--p 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
00601000-00602000 rw-p 00001000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
02050000-02071000 rw-p 00000000 00:00 0 [heap]
7f68caa1c000-7f68cabd6000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f68cabd6000-7f68cadd6000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f68cadd6000-7f68cadda000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f68cadda000-7f68caddc000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f68caddc000-7f68cade1000 rw-p 00000000 00:00 0
7f68cade1000-7f68cae04000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f68cafe8000-7f68cafeb000 rw-p 00000000 00:00 0
7f68cafff000-7f68cb003000 rw-p 00000000 00:00 0
7f68cb003000-7f68cb004000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f68cb004000-7f68cb005000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f68cb005000-7f68cb006000 rw-p 00000000 00:00 0
7fff16c62000-7fff16c83000 rw-p 00000000 00:00 0 [stack]
7fff16d07000-7fff16d09000 r--p 00000000 00:00 0 [vvar]
7fff16d09000-7fff16d0b000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
julien@holberton:~$
Let’s check a few things: The stack starts at 7fff16c62000 and ends at 7fff16c83000 . Our variables are all inside this region (0x7fff16c8146c , 0x7fff16c81424 , 0x7fff16c81428 , 0x7fff16c8142c ) The heap starts at 02050000 and ends at 02071000 . Our allocated memory is in there (0x2050010 ) Our code (the main function) is located at address 0x400739 , so in the following region:
00400000-00401000 r-xp 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7 It comes from the file /home/julien/holberton/w/hackthevm2/7 (our executable) and this region has execution permissions, which also makes sense. The arguments and environment variables (from 0x7fff16c81568 to 0x7fff16c8238f + 0x1f0 ) are located in the region starting at 7fff16c62000 and ending at 7fff16c83000 … the stack! So they are IN the stack, not outside the stack.
This also brings up more questions: Why is our executable “divided” into three different regions with different permissions? What is inside these two regions? What are all those other regions? Why our allocated memory does not start at the very beginning of the heap (0x2050010 vs 02050000 )? What are those first 16 bytes used for?
There is also another fact that we haven’t checked: Is the heap actually growing upwards? We’ll find out another day! But before we end this chapter, let’s update our diagram with everything we’ve learned: OutroWe have learned a ton of things by simply printing informations from our executables! But we still have open questions that we will explore in a future chapter to complete our diagram of the virtual memory. In the meantime, you should try to find out yourself. Questions? Feedback?If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42. Haters, please send your comments to /dev/null . Happy Hacking! Thank you for reading!As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed. FilesThis repo contains the source code (main-X.c files) for programs created in this tutorial. Read more about the virtual memoryFollow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the third chapter in our series on the virtual memory. Many thanks to Tim and SantaCruzDad for English proof-reading!
Hack The Virtual Memory, Chapter 1: Python bytesFor this second chapter, we’ll do almost the same thing as for chapter 0: C strings & /proc, but instead we’ll access the virtual memory of a running Python 3 script. It won’t be as straightfoward. Let’s take this as an excuse to look at some Python 3 internals! PrerequisitesThis article is based on everything we learned in the previous chapter. Please read (and understand) chapter 0: C strings & /proc before reading this one. In order to fully understand this article, you need to know: The basics of the C programming language Some Python The very basics of the Linux filesystem and the shell The very basics of the /proc filesystem (see chapter 0: C strings & /proc for an intro on this topic)
EnvironmentAll scripts and programs have been tested on the following system: Python scriptWe’ll first use this script (main.py ) and try to modify the “string” Holberton in the virtual memory of the process running it. #!/usr/bin/env python3
'''
Prints a b"string" (bytes object), reads a char from stdin
and prints the same (or not :)) string again
'''
import sys
s = b"Holberton"
print(s)
sys.stdin.read(1)
print(s)
About the bytes objectbytes vs strAs you can see, we are using a bytes object (we use a b in front of our string literal) to store our string. This type will store the characters of the string as bytes (vs potentially multibytes – you can read the unicodeobject.h to learn more about how Python 3 encodes strings). This ensures that the string will be a succession of ASCII-values in the virtual memory of the process running the script. Technically s is not a Python string (but it doesn’t matter in our context): julien@holberton:~/holberton/w/hackthevm1$ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "Betty"
>>> type(s)
<class 'str'>
>>> s = b"Betty"
>>> type(s)
<class 'bytes'>
>>> quit()
Everything is an objectEverything in Python is an object: integers, strings, bytes, functions, everything. So the line s = b"Holberton" should create an object of type bytes , and store the string b"Holberton somewhere in memory. Probably in the heap since it has to reserve space for the object and the bytes referenced by or stored in the object (at this point we don’t know about the exact implementation). Running read_write_heap.py against the Python scriptNote: read_write_heap.py is a script we wrote in the previous chapter chapter 0: C strings & /proc Let’s run the above script and then run our read_write_heap.py script: julien@holberton:~/holberton/w/hackthevm1$ ./main.py
b'Holberton'
At this point main.py is waiting for the user to hit Enter . That corresponds to the line sys.stdin.read(1) in our code. Let’s run read_write_heap.py : julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep
julien 3929 0.0 0.7 31412 7848 pts/0 S+ 15:10 0:00 python3 ./main.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 3929 Holberton "~ Betty ~"
[*] maps: /proc/3929/maps
[*] mem: /proc/3929/mem
[*] Found [heap]:
pathname = [heap]
addresses = 022dc000-023c6000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [22dc000] | end [23c6000]
[*] Found 'Holberton' at 8e192
[*] Writing '~ Betty ~' at 236a192
julien@holberton:~/holberton/w/hackthevm1$
Easy! As expected, we found the string on the heap and replaced it. Now when we will hit Enter in the main.py script, it will print b'~ Betty ~' : b'Holberton'
julien@holberton:~/holberton/w/hackthevm1$
Wait, WAT?! We found the string “Holberton” and replaced it, but it was not the correct string? Before we go down the rabbit hole, we have one more thing to check. Our script stops when it finds the first occurence of the string. Let’s run it several times to see if there are more occurences of the same string in the heap. julien@holberton:~/holberton/w/hackthevm1$ ./main.py
b'Holberton'
julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep
julien 4051 0.1 0.7 31412 7832 pts/0 S+ 15:53 0:00 python3 ./main.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 4051 Holberton "~ Betty ~"
[*] maps: /proc/4051/maps
[*] mem: /proc/4051/mem
[*] Found [heap]:
pathname = [heap]
addresses = 00bf4000-00cde000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [bf4000] | end [cde000]
[*] Found 'Holberton' at 8e162
[*] Writing '~ Betty ~' at c82162
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 4051 Holberton "~ Betty ~"
[*] maps: /proc/4051/maps
[*] mem: /proc/4051/mem
[*] Found [heap]:
pathname = [heap]
addresses = 00bf4000-00cde000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [bf4000] | end [cde000]
Can't find 'Holberton'
julien@holberton:~/holberton/w/hackthevm1$
Only one occurence. So where is the string “Holberton” that is used by the script? Where is our Python bytes object in memory? Could it be in the stack? Let’s replace “[heap]” by “[stack]”* in our read_write_heap.py script to create the read_write_stack.py : (*) _see previous article, the stack region is called “[stack]” on the /proc/[pid]/maps file_ #!/usr/bin/env python3
'''
Locates and replaces the first occurrence of a string in the stack
of a process
Usage: ./read_write_stack.py PID search_string replace_by_string
Where:
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
search_string with
'''
import sys
def print_usage_and_exit():
print('Usage: {} pid search write'.format(sys.argv[0]))
sys.exit(1)
# check usage
if len(sys.argv) != 4:
print_usage_and_exit()
# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string == "":
print_usage_and_exit()
write_string = str(sys.argv[3])
if search_string == "":
print_usage_and_exit()
# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))
# try opening the maps file
try:
maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
print("[ERROR] Can not open file {}:".format(maps_filename))
print(" I/O error({}): {}".format(e.errno, e.strerror))
sys.exit(1)
for line in maps_file:
sline = line.split(' ')
# check if we found the stack
if sline[-1][:-1] != "[stack]":
continue
print("[*] Found [stack]:")
# parse line
addr = sline[0]
perm = sline[1]
offset = sline[2]
device = sline[3]
inode = sline[4]
pathname = sline[-1][:-1]
print("\tpathname = {}".format(pathname))
print("\taddresses = {}".format(addr))
print("\tpermisions = {}".format(perm))
print("\toffset = {}".format(offset))
print("\tinode = {}".format(inode))
# check if there is read and write permission
if perm[0] != 'r' or perm[1] != 'w':
print("[*] {} does not have read/write permission".format(pathname))
maps_file.close()
exit(0)
# get start and end of the stack in the virtual memory
addr = addr.split("-")
if len(addr) != 2: # never trust anyone, not even your OS :)
print("[*] Wrong addr format")
maps_file.close()
exit(1)
addr_start = int(addr[0], 16)
addr_end = int(addr[1], 16)
print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))
# open and read mem
try:
mem_file = open(mem_filename, 'rb+')
except IOError as e:
print("[ERROR] Can not open file {}:".format(mem_filename))
print(" I/O error({}): {}".format(e.errno, e.strerror))
maps_file.close()
exit(1)
# read stack
mem_file.seek(addr_start)
stack = mem_file.read(addr_end - addr_start)
# find string
try:
i = stack.index(bytes(search_string, "ASCII"))
except Exception:
print("Can't find '{}'".format(search_string))
maps_file.close()
mem_file.close()
exit(0)
print("[*] Found '{}' at {:x}".format(search_string, i))
# write the new stringprint("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
mem_file.seek(addr_start + i)
mem_file.write(bytes(write_string, "ASCII"))
# close filesmaps_file.close()
mem_file.close()
# there is only one stack in our example
break
The above script (read_write_heap.py ) does exactly the same thing than the previous one (read_write_stack.py ), the same way. Except we’re looking at the stack, instead of the heap. Let’s try to find our string in the stack: julien@holberton:~/holberton/w/hackthevm1$ ./main.py
b'Holberton'
julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep
julien 4124 0.2 0.7 31412 7848 pts/0 S+ 16:10 0:00 python3 ./main.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_stack.py 4124 Holberton "~ Betty ~"
[sudo] password for julien:
[*] maps: /proc/4124/maps
[*] mem: /proc/4124/mem
[*] Found [stack]:
pathname = [stack]
addresses = 7fff2997e000-7fff2999f000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [7fff2997e000] | end [7fff2999f000]
Can't find 'Holberton'
julien@holberton:~/holberton/w/hackthevm1$
So our string is not in the heap and not in the stack: Where is it? It’s time to dig into Python 3 internals and locate the string using what we will learn. Brace yourselves, the fun will begin now Locating the string in the virtual memoryNote: It is important to note that there are many implementations of Python 3. In this article, we are using the original and most commonly used: CPython (coded in C). What we are about to say about Python 3 will only be true for this implementation. idThere is a simple way to know where the object (be careful, the object is not the string) is in the virtual memory. CPython has a specific implementation of the id() builtin: id() will return the address of the object in memory. If we add a line to the Python script to print the id of our object, we should get its address (main_id.py ): #!/usr/bin/env python3
'''
Prints:
- the address of the bytes object
- a b"string" (bytes object)
reads a char from stdin
and prints the same (or not :)) string again
'''
import sys
s = b"Holberton"
print(hex(id(s)))
print(s)
sys.stdin.read(1)
print(s)
julien@holberton:~/holberton/w/hackthevm1$ ./main_id.py
0x7f343f010210
b'Holberton'
-> 0x7f343f010210 . Let’s look at /proc/ to understand where exactly our object is located. julien@holberton:/usr/include/python3.4$ ps aux | grep main_id.py | grep -v grep
julien 4344 0.0 0.7 31412 7856 pts/0 S+ 16:53 0:00 python3 ./main_id.py
julien@holberton:/usr/include/python3.4$ cat /proc/4344/maps
00400000-006fa000 r-xp 00000000 08:01 655561 /usr/bin/python3.4
008f9000-008fa000 r--p 002f9000 08:01 655561 /usr/bin/python3.4
008fa000-00986000 rw-p 002fa000 08:01 655561 /usr/bin/python3.4
00986000-009a2000 rw-p 00000000 00:00 0
021ba000-022a4000 rw-p 00000000 00:00 0 [heap]
7f343d797000-7f343de79000 r--p 00000000 08:01 663747 /usr/lib/locale/locale-archive
7f343de79000-7f343df7e000 r-xp 00000000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so
7f343df7e000-7f343e17d000 ---p 00105000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so
7f343e17d000-7f343e17e000 r--p 00104000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so
7f343e17e000-7f343e17f000 rw-p 00105000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so
7f343e17f000-7f343e197000 r-xp 00000000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e197000-7f343e396000 ---p 00018000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e396000-7f343e397000 r--p 00017000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e397000-7f343e398000 rw-p 00018000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8
7f343e398000-7f343e3bf000 r-xp 00000000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e3bf000-7f343e5bf000 ---p 00027000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e5bf000-7f343e5c1000 r--p 00027000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e5c1000-7f343e5c2000 rw-p 00029000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7f343e5c2000-7f343e5c4000 r-xp 00000000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e5c4000-7f343e7c3000 ---p 00002000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e7c3000-7f343e7c4000 r--p 00001000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e7c4000-7f343e7c5000 rw-p 00002000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so
7f343e7c5000-7f343e7c8000 r-xp 00000000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e7c8000-7f343e9c7000 ---p 00003000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e9c7000-7f343e9c8000 r--p 00002000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e9c8000-7f343e9c9000 rw-p 00003000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so
7f343e9c9000-7f343eb83000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f343eb83000-7f343ed83000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f343ed83000-7f343ed87000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f343ed87000-7f343ed89000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f343ed89000-7f343ed8e000 rw-p 00000000 00:00 0
7f343ed8e000-7f343eda7000 r-xp 00000000 08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343eda7000-7f343efa6000 ---p 00019000 08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343efa6000-7f343efa7000 r--p 00018000 08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343efa7000-7f343efa8000 rw-p 00019000 08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f343efa8000-7f343efac000 rw-p 00000000 00:00 0
7f343efac000-7f343efcf000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f343f000000-7f343f1b6000 rw-p 00000000 00:00 0
7f343f1c5000-7f343f1cc000 r--s 00000000 08:01 918462 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
7f343f1cc000-7f343f1ce000 rw-p 00000000 00:00 0
7f343f1ce000-7f343f1cf000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f343f1cf000-7f343f1d0000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f343f1d0000-7f343f1d1000 rw-p 00000000 00:00 0
7ffccf1fd000-7ffccf21e000 rw-p 00000000 00:00 0 [stack]
7ffccf23c000-7ffccf23e000 r--p 00000000 00:00 0 [vvar]
7ffccf23e000-7ffccf240000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
julien@holberton:/usr/include/python3.4$
-> Our object is stored in the following memory region: 7f343f000000-7f343f1b6000 rw-p 00000000 00:00 0 , which is not the heap, and not the stack. That confirms what we saw earlier. But that doesn’t mean that the string itself is stored in the same memory region. For instance, the bytes object could store a pointer to the string, and not a copy of a string. Of course, at this point we could search for our string in this memory region, but we want to understand and be positive we’re looking in the right area, and not use “brute force” to find the solution. It’s time to learn more about bytes objects. bytesobject.h
We are using the C implementation of Python (CPython), so let’s look at the header file for bytes objects. Note: If you don’t have the Python 3 header files, you can use this command on Ubuntu : sudo apt-get install python3-dev to downloadd them on your system. If you are using the exact same environment as me (see the “Environment” section above), you should then be able to see the Python 3 header files in the /usr/include/python3.4/ directory. From bytesobject.h : typedef struct {
PyObject_VAR_HEAD
Py_hash_t ob_shash;
char ob_sval[1];
/* Invariants:
* ob_sval contains space for 'ob_size+1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the string or -1 if not computed yet.
*/
} PyBytesObject;
What does that tell us? A Python 3 bytes object is represented internally using a variable of type PyBytesObject ob_sval holds the entire string
The string ends with 0 ob_size stores the length of the string (to find ob_size , look at the definition of the macro PyObject_VAR_HEAD in objects.h . We’ll look at it later)
So in our example, if we were able to print the bytes object, we should see this: Based on what we did learn previously, this means that the string is “inside” the bytes object. So inside the same memory region \o/ What if we didn’t know about the way id was implemented in CPython? There is actually another way that we can use to find where the string is: looking at the actual object in memory. Looking at the bytes object in memoryIf we want to look directly at the PyBytesObject variable, we will need to create a C function, and call this C function from Python. There are different ways to call a C function from Python. We will use the simplest one: using a dynamic library. Creating our C functionSo the idea is to create a C function that is called from Python with the object as a parameter, and then “explore” this object to get the exact address of the string (as well as other information about the object). The function prototype should be: void print_python_bytes(PyObject *p); , where p is a pointer to our object (so p stores the address of our object in the virtual memory). It doesn’t have to return anything. object.h
You probably have noticed that we don’t use a parameter of type PyBytesObject . To understand why, let’s have a look at the object.h header file and see what we can learn from it: /* Object and type object interface */
/*
Objects are structures allocated on the heap. Special rules apply to
the use of objects to ensure they are properly garbage-collected.
Objects are never allocated statically or on the stack; they must be
...
*/
“Objects are never allocated statically or on the stack” -> ok, now we know why it was not on the stack. “Objects are structures allocated on the heap” -> wait… WAT? We searched for the string in the heap and it was NOT there… I’m confused! We’ll discuss this later, in another article
What else can we read in this file: /*
...
Objects do not float around in memory; once allocated an object keeps
the same size and address. Objects that must hold variable-size data
...
*/
“Objects do not float around in memory; once allocated an object keeps the same size and address”. Good to know. That means that if we modify the correct string, it will always be modfied, and the addresses will never change “once allocated” -> allocation? but not using the heap? I’m confused! We’ll discuss this later in another article
/*
...
Objects are always accessed through pointers of the type 'PyObject *'.
The type 'PyObject' is a structure that only contains the reference count
and the type pointer. The actual memory allocated for an object
contains other data that can only be accessed after casting the pointer
to a pointer to a longer structure type. This longer type must start
with the reference count and type fields; the macro PyObject_HEAD should be
used for this (to accommodate for future changes). The implementation
of a particular object type can cast the object pointer to the proper
type and back.
...
*/
“Objects are always accessed through pointers of the type ‘PyObject *'” -> that is why we have to have to take a pointer of type PyObject (vs PyBytesObject ) as the parameter of our function “The actual memory allocated for an object contains other data that can only be accessed after casting the pointer to a pointer to a longer structure type.” -> So we will have to cast our function parameter to PyBytesObject * in order to access all its information. This is possible because the beginning of the PyBytesObject starts with a PyVarObject which itself starts with a PyObject :
/* PyObject_VAR_HEAD defines the initial segment of all variable-size
* container objects. These end with a declaration of an array with 1
* element, but enough space is malloc'ed so that the array actually
* has room for ob_size elements. Note that ob_size is an element count,
* not necessarily a byte count.
*/
#define PyObject_VAR_HEAD PyVarObject ob_base;
#define Py_INVALID_SIZE (Py_ssize_t)-1
/* Nothing is actually declared to be a PyObject, but every pointer to
* a Python object can be cast to a PyObject*. This is inheritance built
* by hand. Similarly every pointer to a variable-size Python object can,
* in addition, be cast to PyVarObject*.
*/
typedef struct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
typedef struct {
PyObject ob_base;
Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
-> Here is the ob_size that bytesobject.h was mentioning. The C functionBased on everything we just learned, the C code is pretty straightforward (bytes.c ): #include "Python.h"
/**
* print_python_bytes - prints info about a Python 3 bytes object
* @p: a pointer to a Python 3 bytes object
*
* Return: Nothing
*/
void print_python_bytes(PyObject *p)
{
/* The pointer with the correct type.*/
PyBytesObject *s;
unsigned int i;
printf("[.] bytes object info\n");
/* casting the PyObject pointer to a PyBytesObject pointer */
s = (PyBytesObject *)p;
/* never trust anyone, check that this is actually
a PyBytesObject object. */
if (s && PyBytes_Check(s))
{
/* a pointer holds the memory address of the first byte
of the data it points to */
printf(" address of the object: %p\n", (void *)s);
/* op_size is in the ob_base structure, of type PyVarObject. */
printf(" size: %ld\n", s->ob_base.ob_size);
/* ob_sval is the array of bytes, ending with the value 0:
ob_sval[ob_size] == 0 */
printf(" trying string: %s\n", s->ob_sval);
printf(" address of the data: %p\n", (void *)(s->ob_sval));
printf(" bytes:");
/* printing each byte at a time, in case this is not
a "string". bytes doesn't have to be strings.
ob_sval contains space for 'ob_size+1' elements.
ob_sval[ob_size] == 0. */
for (i = 0; i < s->ob_base.ob_size + 1; i++)
{
printf(" %02x", s->ob_sval[i] & 0xff);
}
printf("\n");
}
/* if this is not a PyBytesObject print an error message */
else
{
fprintf(stderr, " [ERROR] Invalid Bytes Object\n");
}
}
Calling the C function from the python scriptCreating the dynamic libraryAs we said earlier, we will use the “dynamic library method” to call our C function from Python 3. So we just need to compile our C file with this command: gcc -Wall -Wextra -pedantic -Werror -std=c99 -shared -Wl,-soname,libPython.so -o libPython.so -fPIC -I/usr/include/python3.4 bytes.c
Don’t forget to include the Python 3 header files directory: -I/usr/include/python3.4 Hopefully, this should have created a dynamic library called libPython.so . Using the dynamic library from Python 3In order to use our function we simply need to add these lines in the Python script: import ctypes
lib = ctypes.CDLL('./libPython.so')
lib.print_python_bytes.argtypes = [ctypes.py_object]
and call our function this way: lib.print_python_bytes(s)
The new Python scriptHere is the complete source code of the new Python 3 script (main_bytes.py ): #!/usr/bin/env python3
'''
Prints:
- the address of the bytes object
- a b"string" (bytes object)
- information about the bytes object
And then:
- reads a char from stdin
- prints the same (or not :)) information again
'''
import sys
import ctypes
lib = ctypes.CDLL('./libPython.so')
lib.print_python_bytes.argtypes = [ctypes.py_object]
s = b"Holberton"
print(hex(id(s)))
print(s)
lib.print_python_bytes(s)
sys.stdin.read(1)
print(hex(id(s)))
print(s)
lib.print_python_bytes(s)
Let’s run it! julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py
0x7f04d721b210
b'Holberton'
[.] bytes object info
address of the object: 0x7f04d721b210
size: 9
trying string: Holberton
address of the data: 0x7f04d721b230
bytes: 48 6f 6c 62 65 72 74 6f 6e 00
As expected: id() returns the address of the object itself (0x7f04d721b210 )
the size of the data of our object (ob_size ) is 9 the data of our object is “Holberton”, 48 6f 6c 62 65 72 74 6f 6e 00 (and it ends with 00 as specified on the header file bytesobject.h )
Annnnnnd, we have found the exact address of our string: 0x7f04d721b230 \o/ Sorry I had to add at least one Monty Python reference (why) rw_all.py
Now that we undertand a little bit more about what’s happening, it’s ok to “brute-force” the mapped memory regions. Let’s update the script that replaces the string. Instead of looking only in the stack or the heap, let’s look in all readable and writeable memory regions of the process. Here’s the source code: #!/usr/bin/env python3
'''
Locates and replaces (if we have permission) all occurrences of
an ASCII string in the entire virtual memory of a process.
Usage: ./rw_all.py PID search_string replace_by_string
Where:
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
search_string with
'''
import sys
def print_usage_and_exit():
print('Usage: {} pid search write'.format(sys.argv[0]))
exit(1)
# check usage
if len(sys.argv) != 4:
print_usage_and_exit()
# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string == "":
print_usage_and_exit()
write_string = str(sys.argv[3])
if search_string == "":
print_usage_and_exit()
# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))
# try opening the file
try:
maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
print("[ERROR] Can not open file {}:".format(maps_filename))
print(" I/O error({}): {}".format(e.errno, e.strerror))
exit(1)
for line in maps_file:
# print the name of the memory region
sline = line.split(' ')
name = sline[-1][:-1];
print("[*] Searching in {}:".format(name))
# parse line
addr = sline[0]
perm = sline[1]
offset = sline[2]
device = sline[3]
inode = sline[4]
pathname = sline[-1][:-1]
# check if there are read and write permissions
if perm[0] != 'r' or perm[1] != 'w':
print("\t[\x1B[31m!\x1B[m] {} does not have read/write permissions ({})".format(pathname, perm))
continue
print("\tpathname = {}".format(pathname))
print("\taddresses = {}".format(addr))
print("\tpermisions = {}".format(perm))
print("\toffset = {}".format(offset))
print("\tinode = {}".format(inode))
# get start and end of the memoy region
addr = addr.split("-")
if len(addr) != 2: # never trust anyone
print("[*] Wrong addr format")
maps_file.close()
exit(1)
addr_start = int(addr[0], 16)
addr_end = int(addr[1], 16)
print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))
# open and read the memory region
try:
mem_file = open(mem_filename, 'rb+')
except IOError as e:
print("[ERROR] Can not open file {}:".format(mem_filename))
print(" I/O error({}): {}".format(e.errno, e.strerror))
maps_file.close()
# read the memory region
mem_file.seek(addr_start)
region = mem_file.read(addr_end - addr_start)
# find string
nb_found = 0;
try:
i = region.index(bytes(search_string, "ASCII"))
while (i):
print("\t[\x1B[32m:)\x1B[m] Found '{}' at {:x}".format(search_string, i))
nb_found = nb_found + 1
# write the new string
print("\t[:)] Writing '{}' at {:x}".format(write_string, addr_start + i))
mem_file.seek(addr_start + i)
mem_file.write(bytes(write_string, "ASCII"))
mem_file.flush()
# update our buffer
region.write(bytes(write_string, "ASCII"), i)
i = region.index(bytes(search_string, "ASCII"))
except Exception:
if nb_found == 0:
print("\t[\x1B[31m:(\x1B[m] Can't find '{}'".format(search_string))
mem_file.close()
# close files
maps_file.close()
Let’s run it! julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py
0x7f37f1e01210
b'Holberton'
[.] bytes object info
address of the object: 0x7f37f1e01210
size: 9
trying string: Holberton
address of the data: 0x7f37f1e01230
bytes: 48 6f 6c 62 65 72 74 6f 6e 00
julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main_bytes.py | grep -v grep
julien 4713 0.0 0.8 37720 8208 pts/0 S+ 18:48 0:00 python3 ./main_bytes.py
julien@holberton:~/holberton/w/hackthevm1$ sudo ./rw_all.py 4713 Holberton "~ Betty ~"
[*] maps: /proc/4713/maps
[*] mem: /proc/4713/mem
[*] Searching in /usr/bin/python3.4:
[!] /usr/bin/python3.4 does not have read/write permissions (r-xp)
...
[*] Searching in [heap]:
pathname = [heap]
addresses = 00e26000-00f11000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [e26000] | end [f11000]
[:)] Found 'Holberton' at 8e422
[:)] Writing '~ Betty ~' at eb4422
...
[*] Searching in :
pathname =
addresses = 7f37f1df1000-7f37f1fa7000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [7f37f1df1000] | end [7f37f1fa7000]
[:)] Found 'Holberton' at 10230
[:)] Writing '~ Betty ~' at 7f37f1e01230
...
[*] Searching in [stack]:
pathname = [stack]
addresses = 7ffdc3d0c000-7ffdc3d2d000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [7ffdc3d0c000] | end [7ffdc3d2d000]
[:(] Can't find 'Holberton'
...
julien@holberton:~/holberton/w/hackthevm1$
And if we hit enter in the running main_bytes.py … julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py
0x7f37f1e01210
b'Holberton'
[.] bytes object info
address of the object: 0x7f37f1e01210
size: 9
trying string: Holberton
address of the data: 0x7f37f1e01230
bytes: 48 6f 6c 62 65 72 74 6f 6e 00
0x7f37f1e01210
b'~ Betty ~'
[.] bytes object info
address of the object: 0x7f37f1e01210
size: 9
trying string: ~ Betty ~
address of the data: 0x7f37f1e01230
bytes: 7e 20 42 65 74 74 79 20 7e 00
julien@holberton:~/holberton/w/hackthevm1$
BOOM! OutroWe managed to modify the string used by our Python 3 script. Awesome! But we still have questions to answer: What is the “Holberton” string that is in the [heap] memory region? How does Python 3 allocate memory outside of the heap? If Python 3 is not using the heap, what does it refer to when it says “Objects are structures allocated on the heap” in object.h ?
That will be for another time In the meantime, if you are too curious to wait for the next article, you can try to find out yourself. Questions? Feedback?If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42. Haters, please send your comments to /dev/null . Happy Hacking! Thank you for reading!As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments. FilesThis repo contains the source code for all scripts and dynamic libraries created in this tutorial: main.py : the first target
main_id.py : the second target, printing the id of the bytes object
main_bytes.py : the final target, printing also the information about the bytes object, using our dynamic library
read_write_heap.py : the “original” script to find and replace strings in the heap of a process
read_write_stack.py : same but, searches and replaces in the stack instead of the heap
rw_all.py : same but in every memory regions that are readable and writable
bytes.c : the C function to print info about a Python 3 bytes object
Read on!Many thanks to Tim for English proof-reading & Guillaume for PEP8 proof-reading
IntroHack The Virtual Memory, Chapter 0: Play with C strings & /proc This is the first in a series of small articles / tutorials based around virtual memory. The goal is to learn some CS basics, but in a different and more practical way. For this first piece, we’ll use /proc to find and modify variables (in this example, an ASCII string) contained inside the virtual memory of a running process, and learn some cool things along the way. EnvironmentAll scripts and programs have been tested on the following system: Ubuntu 14.04 LTS gcc Python 3:
PrerequisitesIn order to fully understand this article, you need to know: Virtual MemoryIn computing, virtual memory is a memory management technique that is implemented using both hardware and software. It maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage (as seen by a process or task) appears as a contiguous address space, or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit or MMU, automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer. The primary benefits of virtual memory include freeing applications from having to manage a shared memory space, increased security due to memory isolation, and being able to conceptually use more memory than might be physically available, using the technique of paging. You can read more about the virtual memory on Wikipedia. In chapter 2, we’ll go into more details and do some fact checking on what lies inside the virtual memory and where. For now, here are some key points you should know before you read on: Each process has its own virtual memory The amount of virtual memory depends on your system’s architecture Each OS handles virtual memory differently, but for most modern operating systems, the virtual memory of a process looks like this:
In the high memory addresses you can find (this is a non exhaustive list, there’s much more to be found, but that’s not today’s topic): The command line arguments and environment variables The stack, growing “downwards”. This may seem counter-intuitive, but this is the way the stack is implemented in virtual memory
In the low memory addresses you can find: Your executable (it’s a little more complicated than that, but this is enough to understand the rest of this article) The heap, growing “upwards”
The heap is a portion of memory that is dynamically allocated (i.e. containing memory allocated using malloc ). Also, keep in mind that virtual memory is not the same as RAM. C programLet’s start with this simple C program: #include <stdlib.h>
#include <stdio.h>
#include <string.h>
/**
* main - uses strdup to create a new string, and prints the
* address of the new duplcated string
*
* Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
*/
int main(void)
{
char *s;
s = strdup("Holberton");
if (s == NULL)
{
fprintf(stderr, "Can't allocate mem with malloc\n");
return (EXIT_FAILURE);
}
printf("%p\n", (void *)s);
return (EXIT_SUCCESS);
}
strdupTake a moment to think before going further. How do you think strdup creates a copy of the string “Holberton”? How can you confirm that? . . . strdup has to create a new string, so it first has to reserve space for it. The function strdup is probably using malloc . A quick look at its man page can confirm:
DESCRIPTION
The strdup() function returns a pointer to a new string which is a duplicate of the string s.
Memory for the new string is obtained with malloc(3), and can be freed with free(3).
Take a moment to think before going further. Based on what we said earlier about virtual memory, where do you think the duplicate string will be located? At a high or low memory address? . . . Probably in the lower addresses (in the heap). Let’s compile and run our small C program to test our hypothesis: julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror main.c -o holberton
julien@holberton:~/holberton/w/hackthevm0$ ./holberton
0x1822010
julien@holberton:~/holberton/w/hackthevm0$
Our duplicated string is located at the address 0x1822010 . Great. But is this a low or a high memory address? How big is the virtual memory of a processThe size of the virtual memory of a process depends on your system architecture. In this example I am using a 64-bit machine, so theoretically the size of each process’ virtual memory is 2^64 bytes. In theory, the highest memory address possible is 0xffffffffffffffff (1.8446744e+19), and the lowest is 0x0 . 0x1822010 is small compared to 0xffffffffffffffff , so the duplicated string is probably located at a lower memory address. We will be able to confirm this when we will be looking at the proc filesystem).
The proc filesystemFrom man proc : The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures. It is commonly mounted at `/proc`. Most of it is read-only, but some files allow kernel variables to be changed.
If you list the contents of your /proc directory, you will probably see a lot of files. We will focus on two of them: /proc/[pid]/mem
/proc/[pid]/maps
memFrom man proc : /proc/[pid]/mem
This file can be used to access the pages of a process's memory
through open(2), read(2), and lseek(2).
Awesome! So, can we access and modify the entire virtual memory of any process? mapsFrom man proc : /proc/[pid]/maps
A file containing the currently mapped memory regions and their access permissions.
See mmap(2) for some further information about memory mappings.
The format of the file is:
address perms offset dev inode pathname
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon
00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon
00652000-00655000 rw-p 00052000 08:02 173521 /usr/bin/dbus-daemon
00e03000-00e24000 rw-p 00000000 00:00 0 [heap]
00e24000-011f7000 rw-p 00000000 00:00 0 [heap]
...
35b1800000-35b1820000 r-xp 00000000 08:02 135522 /usr/lib64/ld-2.15.so
35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522 /usr/lib64/ld-2.15.so
35b1a20000-35b1a21000 rw-p 00020000 08:02 135522 /usr/lib64/ld-2.15.so
35b1a21000-35b1a22000 rw-p 00000000 00:00 0
35b1c00000-35b1dac000 r-xp 00000000 08:02 135870 /usr/lib64/libc-2.15.so
35b1dac000-35b1fac000 ---p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870 /usr/lib64/libc-2.15.so
...
f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0 [stack:986]
...
7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0 [stack]
7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0 [vdso]
The address field is the address space in the process that the mapping occupies.
The perms field is a set of permissions:
r = read
w = write
x = execute
s = shared
p = private (copy on write)
The offset field is the offset into the file/whatever;
dev is the device (major:minor); inode is the inode on that device. 0 indicates
that no inode is associated with the memory region,
as would be the case with BSS (uninitialized data).
The pathname field will usually be the file that is backing the mapping.
For ELF files, you can easily coordinate with the offset field
by looking at the Offset field in the ELF program headers (readelf -l).
There are additional helpful pseudo-paths:
[stack]
The initial process's (also known as the main thread's) stack.
[stack:<tid>] (since Linux 3.4)
A thread's stack (where the <tid> is a thread ID).
It corresponds to the /proc/[pid]/task/[tid]/ path.
[vdso] The virtual dynamically linked shared object.
[heap] The process's heap.
If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function.
There is no easy way to coordinate
this back to a process's source, short of running it through gdb(1), strace(1), or similar.
Under Linux 2.0 there is no field giving pathname.
This means that we can look at the /proc/[pid]/mem file to locate the heap of a running process. If we can read from the heap, we can locate the string we want to modify. And if we can write to the heap, we can replace this string with whatever we want. pidA process is an instance of a program, with a unique process ID. This process ID (PID) is used by many functions and system calls to interact with and manipulate processes. We can use the program ps to get the PID of a running process (man ps ). C programWe now have everything we need to write a script or program that finds a string in the heap of a running process and then replaces it with another string (of the same length or shorter). We will work with the following simple program that infinitely loops and prints a “strduplicated” string. #include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
/**
* main - uses strdup to create a new string, loops forever-ever
*
* Return: EXIT_FAILURE if malloc failed. Other never returns
*/
int main(void)
{
char *s;
unsigned long int i;
s = strdup("Holberton");
if (s == NULL)
{
fprintf(stderr, "Can't allocate mem with malloc\n");
return (EXIT_FAILURE);
}
i = 0;
while (s)
{
printf("[%lu] %s (%p)\n", i, s, (void *)s);
sleep(1);
i++;
}
return (EXIT_SUCCESS);
}
Compiling and running the above source code should give you this output, and loop indefinitely until you kill the process. julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror loop.c -o loop
julien@holberton:~/holberton/w/hackthevm0$ ./loop
[0] Holberton (0xfbd010)
[1] Holberton (0xfbd010)
[2] Holberton (0xfbd010)
[3] Holberton (0xfbd010)
[4] Holberton (0xfbd010)
[5] Holberton (0xfbd010)
[6] Holberton (0xfbd010)
[7] Holberton (0xfbd010)
...
If you would like, pause the reading now and try to write a script or program that finds a string in the heap of a running process before reading further. . . . looking at /procLet’s run our loop program. julien@holberton:~/holberton/w/hackthevm0$ ./loop
[0] Holberton (0x10ff010)
[1] Holberton (0x10ff010)
[2] Holberton (0x10ff010)
[3] Holberton (0x10ff010)
...
The first thing we need to find is the PID of the process. julien@holberton:~/holberton/w/hackthevm0$ ps aux | grep ./loop | grep -v grep
julien 4618 0.0 0.0 4332 732 pts/14 S+ 17:06 0:00 ./loop
In the above example, the PID is 4618 (it will be different each time we run it, and it is probably a different number if you are trying this on your own computer). As a result, the maps and mem files we want to look at are located in the /proc/4618 directory: /proc/4618/maps
/proc/4618/mem
A quick ls -la in the directory should give you something like this: julien@ubuntu:/proc/4618$ ls -la
total 0
dr-xr-xr-x 9 julien julien 0 Mar 15 17:07 .
dr-xr-xr-x 257 root root 0 Mar 15 10:20 ..
dr-xr-xr-x 2 julien julien 0 Mar 15 17:11 attr
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 autogroup
-r-------- 1 julien julien 0 Mar 15 17:11 auxv
-r--r--r-- 1 julien julien 0 Mar 15 17:11 cgroup
--w------- 1 julien julien 0 Mar 15 17:11 clear_refs
-r--r--r-- 1 julien julien 0 Mar 15 17:07 cmdline
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 comm
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 coredump_filter
-r--r--r-- 1 julien julien 0 Mar 15 17:11 cpuset
lrwxrwxrwx 1 julien julien 0 Mar 15 17:11 cwd -> /home/julien/holberton/w/funwthevm
-r-------- 1 julien julien 0 Mar 15 17:11 environ
lrwxrwxrwx 1 julien julien 0 Mar 15 17:11 exe -> /home/julien/holberton/w/funwthevm/loop
dr-x------ 2 julien julien 0 Mar 15 17:07 fd
dr-x------ 2 julien julien 0 Mar 15 17:11 fdinfo
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 gid_map
-r-------- 1 julien julien 0 Mar 15 17:11 io
-r--r--r-- 1 julien julien 0 Mar 15 17:11 limits
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 loginuid
dr-x------ 2 julien julien 0 Mar 15 17:11 map_files
-r--r--r-- 1 julien julien 0 Mar 15 17:11 maps
-rw------- 1 julien julien 0 Mar 15 17:11 mem
-r--r--r-- 1 julien julien 0 Mar 15 17:11 mountinfo
-r--r--r-- 1 julien julien 0 Mar 15 17:11 mounts
-r-------- 1 julien julien 0 Mar 15 17:11 mountstats
dr-xr-xr-x 5 julien julien 0 Mar 15 17:11 net
dr-x--x--x 2 julien julien 0 Mar 15 17:11 ns
-r--r--r-- 1 julien julien 0 Mar 15 17:11 numa_maps
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 oom_adj
-r--r--r-- 1 julien julien 0 Mar 15 17:11 oom_score
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 oom_score_adj
-r-------- 1 julien julien 0 Mar 15 17:11 pagemap
-r-------- 1 julien julien 0 Mar 15 17:11 personality
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 projid_map
lrwxrwxrwx 1 julien julien 0 Mar 15 17:11 root -> /
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 sched
-r--r--r-- 1 julien julien 0 Mar 15 17:11 schedstat
-r--r--r-- 1 julien julien 0 Mar 15 17:11 sessionid
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 setgroups
-r--r--r-- 1 julien julien 0 Mar 15 17:11 smaps
-r-------- 1 julien julien 0 Mar 15 17:11 stack
-r--r--r-- 1 julien julien 0 Mar 15 17:07 stat
-r--r--r-- 1 julien julien 0 Mar 15 17:11 statm
-r--r--r-- 1 julien julien 0 Mar 15 17:07 status
-r-------- 1 julien julien 0 Mar 15 17:11 syscall
dr-xr-xr-x 3 julien julien 0 Mar 15 17:11 task
-r--r--r-- 1 julien julien 0 Mar 15 17:11 timers
-rw-r--r-- 1 julien julien 0 Mar 15 17:11 uid_map
-r--r--r-- 1 julien julien 0 Mar 15 17:11 wchan
/proc/pid/mapsAs we have seen earlier, the /proc/pid/maps file is a text file, so we can directly read it. The content of the maps file of our process looks like this: julien@ubuntu:/proc/4618$ cat maps
00400000-00401000 r-xp 00000000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop
00600000-00601000 r--p 00000000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop
00601000-00602000 rw-p 00001000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop
010ff000-01120000 rw-p 00000000 00:00 0 [heap]
7f144c052000-7f144c20c000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c20c000-7f144c40c000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c40c000-7f144c410000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c410000-7f144c412000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c412000-7f144c417000 rw-p 00000000 00:00 0
7f144c417000-7f144c43a000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f144c61e000-7f144c621000 rw-p 00000000 00:00 0
7f144c636000-7f144c639000 rw-p 00000000 00:00 0
7f144c639000-7f144c63a000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63a000-7f144c63b000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63b000-7f144c63c000 rw-p 00000000 00:00 0
7ffc94272000-7ffc94293000 rw-p 00000000 00:00 0 [stack]
7ffc9435e000-7ffc94360000 r--p 00000000 00:00 0 [vvar]
7ffc94360000-7ffc94362000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Circling back to what we said earlier, we can see that the stack ([stack] ) is located in high memory addresses and the heap ([heap] ) in the lower memory addresses. [heap]Using the maps file, we can find all the information we need to locate our string: 010ff000-01120000 rw-p 00000000 00:00 0 [heap]
The heap: Starts at address 0x010ff000 in the virtual memory of the process Ends at memory address: 0x01120000 Is readable and writable (rw )
A quick look back to our (still running) loop program: ...
[1024] Holberton (0x10ff010)
...
-> 0x010ff000 < 0x10ff010 < 0x01120000 . This confirms that our string is located in the heap. More precisely, it is located at index 0x10 of the heap. If we open the /proc/pid/mem/ file (in this example /proc/4618/mem ) and seek to the memory address 0x10ff010 , we can write to the heap of the running process, overwriting the “Holberton” string! Let’s write a script or program that does just that. Choose your favorite language and let’s do it! If you would like, stop reading now and try to write a script or program that finds a string in the heap of a running process, before reading further. The next paragraph will give away the source code of the answer! . . . Overwriting the string in the virtual memoryWe’ll be using Python 3 for writing the script, but you could write this in any language. Here is the code: #!/usr/bin/env python3
'''
Locates and replaces the first occurrence of a string in the heap
of a process
Usage: ./read_write_heap.py PID search_string replace_by_string
Where:
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
search_string with
'''
import sys
def print_usage_and_exit():
print('Usage: {} pid search write'.format(sys.argv[0]))
sys.exit(1)
# check usage
if len(sys.argv) != 4:
print_usage_and_exit()
# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string == "":
print_usage_and_exit()
write_string = str(sys.argv[3])
if search_string == "":
print_usage_and_exit()
# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))
# try opening the maps file
try:
maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
print("[ERROR] Can not open file {}:".format(maps_filename))
print(" I/O error({}): {}".format(e.errno, e.strerror))
sys.exit(1)
for line in maps_file:
sline = line.split(' ')
# check if we found the heap
if sline[-1][:-1] != "[heap]":
continue
print("[*] Found [heap]:")
# parse line
addr = sline[0]
perm = sline[1]
offset = sline[2]
device = sline[3]
inode = sline[4]
pathname = sline[-1][:-1]
print("\tpathname = {}".format(pathname))
print("\taddresses = {}".format(addr))
print("\tpermisions = {}".format(perm))
print("\toffset = {}".format(offset))
print("\tinode = {}".format(inode))
# check if there is read and write permission
if perm[0] != 'r' or perm[1] != 'w':
print("[*] {} does not have read/write permission".format(pathname))
maps_file.close()
exit(0)
# get start and end of the heap in the virtual memory
addr = addr.split("-")
if len(addr) != 2: # never trust anyone, not even your OS :)
print("[*] Wrong addr format")
maps_file.close()
exit(1)
addr_start = int(addr[0], 16)
addr_end = int(addr[1], 16)
print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))
# open and read mem
try:
mem_file = open(mem_filename, 'rb+')
except IOError as e:
print("[ERROR] Can not open file {}:".format(mem_filename))
print(" I/O error({}): {}".format(e.errno, e.strerror))
maps_file.close()
exit(1)
# read heap
mem_file.seek(addr_start)
heap = mem_file.read(addr_end - addr_start)
# find string
try:
i = heap.index(bytes(search_string, "ASCII"))
except Exception:
print("Can't find '{}'".format(search_string))
maps_file.close()
mem_file.close()
exit(0)
print("[*] Found '{}' at {:x}".format(search_string, i))
# write the new string
print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
mem_file.seek(addr_start + i)
mem_file.write(bytes(write_string, "ASCII"))
# close files
maps_file.close()
mem_file.close()
# there is only one heap in our example
break
Note: You will need to run this script as root, otherwise you won’t be able to read or write to the /proc/pid/mem file, even if you are the owner of the process. Running the scriptjulien@holberton:~/holberton/w/hackthevm0$ sudo ./read_write_heap.py 4618 Holberton "Fun w vm!"
[*] maps: /proc/4618/maps
[*] mem: /proc/4618/mem
[*] Found [heap]:
pathname = [heap]
addresses = 010ff000-01120000
permisions = rw-p
offset = 00000000
inode = 0
Addr start [10ff000] | end [1120000]
[*] Found 'Holberton' at 10
[*] Writing 'Fun w vm!' at 10ff010
julien@holberton:~/holberton/w/hackthevm0$
Note that this address corresponds to the one we found manually: The heap lies from addresses 0x010ff000 to 0x01120000 in the virtual memory of the running process Our string is at index 0x10 in the heap, so at the memory address 0x10ff010
If we go back to our loop program, it should now print “fun w vm!” ...
[2676] Holberton (0x10ff010)
[2677] Holberton (0x10ff010)
[2678] Holberton (0x10ff010)
[2679] Holberton (0x10ff010)
[2680] Holberton (0x10ff010)
[2681] Holberton (0x10ff010)
[2682] Fun w vm! (0x10ff010)
[2683] Fun w vm! (0x10ff010)
[2684] Fun w vm! (0x10ff010)
[2685] Fun w vm! (0x10ff010)
...
OutroQuestions? Feedback?If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42. Haters, please send your comments to /dev/null . Happy Hacking! Thank you for reading!As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments. FilesThis repo contains the source code for all programs shown in this tutorial: main.c : the first C program that prints the location of the string and exits
loop.c : the second C program that loops indefinitely
read_write_heap.py : the script used to modify the string in the running C program
What’s next?In the next chapter we will do almost the same thing, but instead we’ll access the memory of a running Python 3 script. It won’t be that straightfoward. We’ll take this as an excuse to look at some Python 3 internals. If you are curious, try to do it yourself, and find out why the above read_write_heap.py script won’t work to modify a Python 3 ASCII string. See you next time and Happy Hacking! Read on!Many thanks to Kristine, Tim for English proof-reading & Guillaume for PEP8 proof-reading
|