@robertojrojas
Forked from x0nu11byt3/elf_format_cheatsheet.md
Created February 22, 2025 21:34
Revisions

  1. @lockedbyte lockedbyte revised this gist Oct 6, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1308,7 +1308,7 @@ As explained previously, they get loaded in the process memory space, and the li
    - `__libc_csu_init`: These run any program-level initializers (kind of like constructors for your whole program).
    - `__libc_csu_fini`: These run any program-level finalizers (kind of like destructors for your whole program).
    - `main`: For libc-linked programs, this is the function called by `__libc_start_main`, and where the first user-written code is executed.
    - `.eh_frame`: GCC exception handling features.
    - `.eh_frame`: DWARF-based debugging features such as stack unwinding.
    Summary:
  2. @lockedbyte lockedbyte revised this gist Oct 5, 2020. 1 changed file with 20 additions and 7 deletions.
    27 changes: 20 additions & 7 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1201,13 +1201,26 @@ typedef struct
    Auxv type:
    ```
    #define AT_EXECFD 2 /* File descriptor of program */
    #define AT_PHDR 3 /* Program headers for program */
    #define AT_PHENT 4 /* Size of program header entry */
    #define AT_PHNUM 5 /* Number of program headers */
    #define AT_PAGESZ 6 /* System page size */
    #define AT_ENTRY 9 /* Entry point of program */
    #define AT_UID 11 /* Real uid */
    /* Legal values for a_type (entry type). */
    #define AT_NULL 0 /* End of vector */
    #define AT_IGNORE 1 /* Entry should be ignored */
    #define AT_EXECFD 2 /* File descriptor of program */
    #define AT_PHDR 3 /* Program headers for program */
    #define AT_PHENT 4 /* Size of program header entry */
    #define AT_PHNUM 5 /* Number of program headers */
    #define AT_PAGESZ 6 /* System page size */
    #define AT_BASE 7 /* Base address of interpreter */
    #define AT_FLAGS 8 /* Flags */
    #define AT_ENTRY 9 /* Entry point of program */
    #define AT_NOTELF 10 /* Program is not ELF */
    #define AT_UID 11 /* Real uid */
    #define AT_EUID 12 /* Effective uid */
    #define AT_GID 13 /* Real gid */
    #define AT_EGID 14 /* Effective gid */
    #define AT_CLKTCK 17 /* Frequency of times() */
    /* Pointer to the global system page used for system calls and other nice things. */
    #define AT_SYSINFO 32
    #define AT_SYSINFO_EHDR 33
    ```
    The auxiliary vector is a special structure that is for passing information directly from the kernel to the newly running program. It contains system specific information that may be required, such as the default size of a virtual memory page on the system or hardware capabilities; that is specific features that the kernel has identified the underlying hardware has that userspace programs can take advantage of.
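    A program can read these entries itself. The following is a minimal sketch, assuming Linux with glibc 2.16 or later, where `getauxval()` from `<sys/auxv.h>` is available; the helper names `auxv_page_size` and `print_some_auxv` are illustrative and not part of the cheatsheet.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <sys/auxv.h>   /* getauxval() and the AT_* constants */
    #include <unistd.h>     /* sysconf() */

    /* Hypothetical helper: the page size the kernel stored in AT_PAGESZ,
       or 0 if the entry is missing from the auxiliary vector. */
    unsigned long auxv_page_size(void)
    {
        return getauxval(AT_PAGESZ);
    }

    /* Hypothetical helper: prints a handful of auxv entries. */
    void print_some_auxv(void)
    {
        printf("AT_PAGESZ = %lu\n", getauxval(AT_PAGESZ));
        printf("AT_PHNUM  = %lu\n", getauxval(AT_PHNUM));
        printf("AT_ENTRY  = %#lx\n", getauxval(AT_ENTRY));
        printf("AT_UID    = %lu\n", getauxval(AT_UID));
    }
    ```

    On a typical system `auxv_page_size()` should agree with `sysconf(_SC_PAGESIZE)`, since libc usually derives the latter from the same entry.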
  3. @lockedbyte lockedbyte revised this gist Oct 5, 2020. 1 changed file with 24 additions and 7 deletions.
    31 changes: 24 additions & 7 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1136,13 +1136,26 @@ typedef struct
    Auxv type:
    ```
    #define AT_EXECFD 2 /* File descriptor of program */
    #define AT_PHDR 3 /* Program headers for program */
    #define AT_PHENT 4 /* Size of program header entry */
    #define AT_PHNUM 5 /* Number of program headers */
    #define AT_PAGESZ 6 /* System page size */
    #define AT_ENTRY 9 /* Entry point of program */
    #define AT_UID 11 /* Real uid */
    /* Legal values for a_type (entry type). */
    #define AT_NULL 0 /* End of vector */
    #define AT_IGNORE 1 /* Entry should be ignored */
    #define AT_EXECFD 2 /* File descriptor of program */
    #define AT_PHDR 3 /* Program headers for program */
    #define AT_PHENT 4 /* Size of program header entry */
    #define AT_PHNUM 5 /* Number of program headers */
    #define AT_PAGESZ 6 /* System page size */
    #define AT_BASE 7 /* Base address of interpreter */
    #define AT_FLAGS 8 /* Flags */
    #define AT_ENTRY 9 /* Entry point of program */
    #define AT_NOTELF 10 /* Program is not ELF */
    #define AT_UID 11 /* Real uid */
    #define AT_EUID 12 /* Effective uid */
    #define AT_GID 13 /* Real gid */
    #define AT_EGID 14 /* Effective gid */
    #define AT_CLKTCK 17 /* Frequency of times() */
    /* Pointer to the global system page used for system calls and other nice things. */
    #define AT_SYSINFO 32
    #define AT_SYSINFO_EHDR 33
    ```
    The auxiliary vector is a special structure that is for passing information directly from the kernel to the newly running program. It contains system specific information that may be required, such as the default size of a virtual memory page on the system or hardware capabilities; that is specific features that the kernel has identified the underlying hardware has that userspace programs can take advantage of.
    @@ -1169,6 +1182,10 @@ The stack for that process address space is set up in a very specific way to pas
    ![Stack init](https://i.imgur.com/Zy6Js20.png)
    Sample view:
    ![View](https://i.imgur.com/DHxTr7n.png)
    Struct:
    ```
  4. @lockedbyte lockedbyte revised this gist Oct 5, 2020. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1167,6 +1167,8 @@ The stack for that process address space is set up in a very specific way to pas
    ![Auxiliary vector](https://i.imgur.com/sRZkt21.png)
    ![Stack init](https://i.imgur.com/Zy6Js20.png)
    Struct:
    ```
  5. @lockedbyte lockedbyte revised this gist Oct 5, 2020. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1370,6 +1370,14 @@ Rela, has an addend, Rel doesn't.
    - [https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/](https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/)
    - [https://gcc.gnu.org/onlinedocs/gccint/Initialization.html](https://gcc.gnu.org/onlinedocs/gccint/Initialization.html)
    - [https://gcc.gnu.org/wiki/TransactionalMemory](https://gcc.gnu.org/wiki/TransactionalMemory)
    - [http://pmarlier.free.fr/gcc-tm-tut.html](http://pmarlier.free.fr/gcc-tm-tut.html)
    - [https://github.com/gcc-mirror/gcc/blob/master/libgcc/crtstuff.c](https://github.com/gcc-mirror/gcc/blob/master/libgcc/crtstuff.c)
    - [https://www.bottomupcs.com/starting_a_process.xhtml](https://www.bottomupcs.com/starting_a_process.xhtml)
    - [https://www.gabriel.urdhr.fr/2015/01/22/elf-linking/](https://www.gabriel.urdhr.fr/2015/01/22/elf-linking/)
  6. @lockedbyte lockedbyte revised this gist Oct 5, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1276,7 +1276,7 @@ As explained previously, they get loaded in the process memory space, and the li
    - `__libc_csu_init`: These run any program-level initializers (kind of like constructors for your whole program).
    - `__libc_csu_fini`: These run any program-level finalizers (kind of like destructors for your whole program).
    - `main`: For libc-linked programs, this is the function called by `__libc_start_main`, and where the first user-written code is executed.
    - `.eh_frame`
    - `.eh_frame`: GCC exception handling features.
    Summary:
  7. @lockedbyte lockedbyte revised this gist Oct 5, 2020. 1 changed file with 16 additions and 12 deletions.
    28 changes: 16 additions & 12 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1257,24 +1257,24 @@ As explained previously, they get loaded in the process memory space, and the li
    ## Common objects and functions
    - `frame_dummy`
    - `frame_dummy`: This function lives in the `.init` section. It is defined as `void frame_dummy ( void )` and its whole point in life is to call `__register_frame_info_bases` which has arguments.
    - `_start`: This is where `e_entry` points to, and first code to be executed.
    - `_init`:
    - `_init`: The dynamic loader executes the (INIT) function before control is passed to the `_start` function, and executes the (FINI) function just before control is passed back to the OS kernel. The `_init` function is the default function used for the (INIT) tag. It calls several functions such as `__gmon_start__`, `frame_dummy` and `__do_global_ctors_aux`.
    - `_fini`: The dynamic loader executes the (FINI) function just before control is passed back to the OS kernel.
    - `.init`: Code to be executed when the program starts.
    - `.fini`: Code to be executed at the end of the program.
    - `.init_array`: Array of pointers to use as constructors.
    - `.fini_array`: Array of pointers to use as destructors.
    - `__libc_start_main`: Libc function that performs the remaining process setup and then calls `main()`.
    - `deregister_tm_clones`
    - `register_tm_clones`
    - `__stack_chk_fail`:
    - `__do_global_dtors_aux`
    - `__do_global_dtors_aux_fini_array_entry`
    - `__init_array_end`
    - `__frame_dummy_init_array_entry`
    - `__init_array_start`
    - `__libc_csu_init`
    - `__libc_csu_fini`
    - `deregister_tm_clones`: Transactional memory is intended to make programming with threads simpler. It is an alternative to lock-based synchronization. These routines tear down and setup, respectively, a table used by the library (libitm) which supports these functions.
    - `register_tm_clones`: Transactional memory is intended to make programming with threads simpler. It is an alternative to lock-based synchronization. These routines tear down and setup, respectively, a table used by the library (libitm) which supports these functions.
    - `__register_frame_info_bases`:
    - `__stack_chk_fail`: Stack-smashing protector function, called when a function's stack canary check fails; it aborts the program.
    - `__do_global_dtors_aux`: Runs all the global destructors on exit from the program on systems where `.fini_array` is not available.
    - `__do_global_dtors_aux_fini_array_entry`: The `.fini_array` entry that points to `__do_global_dtors_aux`; `.fini_array` contains pointers to all the program-level finalizers.
    - `__frame_dummy_init_array_entry`, `__init_array_start` and `__init_array_end`: The first is the `.init_array` entry that points to `frame_dummy`; the other two mark the start and end of the `.init_array` section, which contains pointers to all the program-level initializers.
    - `__libc_csu_init`: These run any program-level initializers (kind of like constructors for your whole program).
    - `__libc_csu_fini`: These run any program-level finalizers (kind of like destructors for your whole program).
    - `main`: For libc-linked programs, this is the function called by `__libc_start_main`, and where the first user-written code is executed.
    - `.eh_frame`
    @@ -1286,6 +1286,8 @@ Summary:
    - `__libc_start_main` calls the executable `main()`;
    - `__libc_start_main` calls the executable `exit()`.
    ![Diagram](https://i.imgur.com/SwFHy2M.png)
    ## FAQ (Frequently Asked Questions)
    ### Why do we need sections?
    @@ -1372,6 +1374,8 @@ Rela, has an addend, Rel doesn't.
    - [https://www.gabriel.urdhr.fr/2015/01/22/elf-linking/](https://www.gabriel.urdhr.fr/2015/01/22/elf-linking/)
    - [https://web.archive.org/web/20191210114310/http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html](https://web.archive.org/web/20191210114310/http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html)
    - `/usr/include/elf.h`
    - `ELF(5)` man pages
  8. @lockedbyte lockedbyte revised this gist Oct 5, 2020. 1 changed file with 9 additions and 5 deletions.
    14 changes: 9 additions & 5 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1258,20 +1258,24 @@ As explained previously, they get loaded in the process memory space, and the li
    ## Common objects and functions
    - `frame_dummy`
    - `_start`
    - `_init`
    - `__libc_start_main`
    - `_start`: This is where `e_entry` points to, and first code to be executed.
    - `_init`:
    - `.init`: Code to be executed when the program starts.
    - `.fini`: Code to be executed at the end of the program.
    - `.init_array`: Array of pointers to use as constructors.
    - `.fini_array`: Array of pointers to use as destructors.
    - `__libc_start_main`: Libc function that performs the remaining process setup and then calls `main()`.
    - `deregister_tm_clones`
    - `register_tm_clones`
    - `__stack_chk_fail`
    - `__stack_chk_fail`:
    - `__do_global_dtors_aux`
    - `__do_global_dtors_aux_fini_array_entry`
    - `__init_array_end`
    - `__frame_dummy_init_array_entry`
    - `__init_array_start`
    - `__libc_csu_init`
    - `__libc_csu_fini`
    - `main`: For libc-linked programs, this is the function called by `__libc_start_main`, and where the first user-written code is executed.
    - `.eh_frame`
    Summary:
  9. @lockedbyte lockedbyte revised this gist Oct 4, 2020. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1336,6 +1336,10 @@ Rela, has an addend, Rel doesn't.
    -- TO DO --
    ### What happens if A (program) which uses libc, imports also B (library) which also uses libc?
    -- TO DO --
    ### When a() (local) calls b() (libc) and b() calls c() (libc too) is c() imported in .dynsym?
    -- TO DO --
  10. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1318,7 +1318,7 @@ When the binary is executed on another system, the interpreter tries to find tha
    As a rule of thumb, Rel is used on 32-bit targets such as i386, whereas Rela is used on 64-bit ones such as x86-64.
    Rela, has an addend, Rel no.
    Rela, has an addend, Rel doesn't.
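    The difference is visible in the entry layouts from `<elf.h>`. The sketch below is only an illustration, assuming a Linux system with glibc's `<elf.h>`; the helpers `apply_relative_rela` and `apply_relative_rel` are hypothetical and model just the simplest case, a base-relative relocation. Which form an object uses is fixed by the architecture's ABI (x86-64 uses Rela, i386 uses Rel).

    ```c
    #include <assert.h>
    #include <elf.h>      /* Elf64_Rel, Elf64_Rela, ELF64_R_TYPE, R_X86_64_RELATIVE */
    #include <stdint.h>

    /* Elf64_Rela is Elf64_Rel plus an explicit r_addend field. */
    _Static_assert(sizeof(Elf64_Rela) == sizeof(Elf64_Rel) + sizeof(Elf64_Sxword),
                   "Rela = Rel + addend");

    /* Hypothetical helper: value stored by an R_X86_64_RELATIVE relocation,
       which is load base + addend (no symbol lookup involved). */
    uint64_t apply_relative_rela(uint64_t load_base, const Elf64_Rela *rela)
    {
        assert(ELF64_R_TYPE(rela->r_info) == R_X86_64_RELATIVE);
        return load_base + (uint64_t)rela->r_addend;
    }

    /* With Rel the addend is not stored in the entry: it is read from the
       location being patched (the "implicit addend"). */
    uint64_t apply_relative_rel(uint64_t load_base, const Elf64_Rel *rel,
                                uint64_t value_at_target)
    {
        (void)rel;  /* the entry only says where and how to patch */
        return load_base + value_at_target;
    }
    ```

    The trade-off is size against convenience: Rel entries are smaller, while Rela entries can be re-applied without first reading the patched location.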
    ### How is process address selected?
  11. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1279,7 +1279,7 @@ Summary:
    - `_start` calls the libc `__libc_start_main`;
    - `__libc_start_main` calls the executable `__libc_csu_init` (statically-linked part of the libc);
    - `__libc_csu_init` calls the executable constructors (and other initialisations);
    - `__libc_start_main` calls the executable `main();
    - `__libc_start_main` calls the executable `main()`;
    - `__libc_start_main` calls the executable `exit()`.
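    The ordering in this summary can be observed from C. This is a minimal sketch, assuming GCC or Clang, since the `constructor` and `destructor` attributes are compiler extensions; the names `my_ctor`, `my_dtor` and `constructor_ran_first` are made up for illustration. It does not depend on which libc routine walks `.init_array`.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Incremented by the startup hook so the order can be checked later. */
    static int startup_step = 0;
    static int ctor_step = 0;

    /* Places a pointer to this function in .init_array, so the startup
       code calls it before main(). */
    __attribute__((constructor))
    static void my_ctor(void)
    {
        ctor_step = ++startup_step;
        puts("constructor: runs before main()");
    }

    /* Placed in .fini_array: runs during exit(), after main() returns. */
    __attribute__((destructor))
    static void my_dtor(void)
    {
        /* The constructor must already have run by the time we get here. */
        assert(ctor_step == 1);
        puts("destructor: runs after main()");
    }

    /* Hypothetical helper: returns 1 if the constructor has already run. */
    int constructor_ran_first(void)
    {
        return ctor_step == 1;
    }
    ```

    Calling `constructor_ran_first()` from `main()` should return 1, because the initializers have already run by the time `main()` is entered.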
    ## FAQ (Frequently Asked Questions)
  12. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1273,6 +1273,14 @@ As explained previously, they get loaded in the process memory space, and the li
    - `__libc_csu_fini`
    - `.eh_frame`
    Summary:
    - `_start` calls the libc `__libc_start_main`;
    - `__libc_start_main` calls the executable `__libc_csu_init` (statically-linked part of the libc);
    - `__libc_csu_init` calls the executable constructors (and other initialisations);
    - `__libc_start_main` calls the executable `main();
    - `__libc_start_main` calls the executable `exit()`.
    ## FAQ (Frequently Asked Questions)
  13. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 6 additions and 0 deletions.
    6 changes: 6 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1152,6 +1152,12 @@ Then reads the interpreter code and starts it from it's entry point. The interpr
    The interpreter loads the binary, and gives the control to the entry point of the binary.
    Summary:
    - The kernel maps the program in memory (and the vDSO);
    - The kernel sets up the stack and registers (passing information such as the argument and environment variables) and calls the main program entry point.
    - The executable is loaded at a fixed address and no relocation is needed.
    ### Dynamically-linked executable files
  14. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 40 additions and 3 deletions.
    43 changes: 40 additions & 3 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1187,12 +1187,13 @@ Auxv type:
    The auxiliary vector is a special structure that is for passing information directly from the kernel to the newly running program. It contains system specific information that may be required, such as the default size of a virtual memory page on the system or hardware capabilities; that is specific features that the kernel has identified the underlying hardware has that userspace programs can take advantage of.
    Then the operating system maps an interpreter into the process's virtual memory (Usually `ld-linux.so`).
    It then reads the interpreter code and starts it from its entry point. The interpreter can be retrieved by the `.interp` section in the ELF file.
    After the program code has been loaded into memory as described previously, the ELF handler also loads the ELF interpreter program into memory with `load_elf_interp()`. This process is similar to the process of loading the original program: the code checks the format information in the ELF header, reads in the ELF program header, maps all of the `PT_LOAD` segments from the file into the new program's memory, and leaves room for the interpreter's `BSS` segment. The interpreter can be retrieved by the `.interp` section in the ELF file.
    The execution start address for the program is also set to be the entry point of the interpreter, rather than that of the program itself. When the `execve()` system call completes, execution then begins with the ELF interpreter, which takes care of satisfying the linkage requirements of the program from user space — finding and loading the shared libraries that the program depends on, and resolving the program's undefined symbols to the correct definitions in those libraries. Once this linkage process is done (which relies on a much deeper understanding of the ELF format than the kernel has), the interpreter can start the execution of the new program itself, at the address previously recorded in the `AT_ENTRY` auxiliary value.
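    The interpreter path that the kernel looks up can also be read directly from the `PT_INTERP` program header. The following is a rough sketch, assuming a 64-bit ELF file and glibc's `<elf.h>`; `elf64_interp_path` is a hypothetical helper, not a kernel or libc function.

    ```c
    #include <assert.h>
    #include <elf.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: copies the PT_INTERP path of a 64-bit ELF file
       into buf. Returns 1 if found, 0 if the file has no interpreter
       (e.g. a static executable), -1 on I/O or format errors. */
    int elf64_interp_path(const char *file, char *buf, size_t buflen)
    {
        assert(buf != NULL && buflen > 0);
        FILE *f = fopen(file, "rb");
        if (!f)
            return -1;

        int result = -1;
        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1 ||
            memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
            eh.e_ident[EI_CLASS] != ELFCLASS64)
            goto out;

        result = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;
                break;
            }
            if (ph.p_type != PT_INTERP)
                continue;
            /* p_filesz includes the terminating NUL of the path string. */
            if (ph.p_filesz == 0 || ph.p_filesz > buflen ||
                fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
                fread(buf, 1, ph.p_filesz, f) != ph.p_filesz) {
                result = -1;
                break;
            }
            buf[ph.p_filesz - 1] = '\0';
            result = 1;
            break;
        }
    out:
        fclose(f);
        return result;
    }
    ```

    For example, `elf64_interp_path("/proc/self/exe", path, sizeof path)` inspects the running program itself; a dynamically-linked executable should report a path to the dynamic linker, while a statically-linked one should return 0.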
    We mentioned previously that system calls are slow, and modern systems have mechanisms to avoid the overheads of calling a trap to the processor.
    In Linux, this is implemented by a neat trick between the dynamic loader and the kernel, all communicated with the AUXV structure. The kernel actually adds a small shared library into the address space of every newly created process which contains a function that makes system calls for you. The beauty of this system is that if the underlying hardware supports a fast system call mechanism the kernel (being the creator of the library) can use it, otherwise it can use the old scheme of generating a trap. This library is named linux-gate.so.1, so called because it is a gateway to the inner workings of the kernel.
    In Linux, this is implemented by a neat trick between the dynamic loader and the kernel, all communicated with the `AUXV` structure. The kernel actually adds a small shared library into the address space of every newly created process which contains a function that makes system calls for you. The beauty of this system is that if the underlying hardware supports a fast system call mechanism the kernel (being the creator of the library) can use it, otherwise it can use the old scheme of generating a trap. This library is named linux-gate.so.1, so called because it is a gateway to the inner workings of the kernel.
    When the kernel starts the dynamic linker it adds an entry to the auxv called AT_SYSINFO_EHDR, which is the address in memory that the special kernel library lives in. When the dynamic linker starts it can look for the AT_SYSINFO_EHDR pointer, and if found load that library for the program. The program has no idea this library exists; this is a private arrangement between the dynamic linker and the kernel.
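    Although the arrangement is private, any process can still look at the `AT_SYSINFO_EHDR` entry. The sketch below assumes Linux with glibc 2.16 or later (`getauxval()`); `vdso_is_elf` is a hypothetical helper. Note that the library is named `linux-gate.so.1` on 32-bit x86 and `linux-vdso.so.1` on x86-64.

    ```c
    #include <assert.h>
    #include <elf.h>        /* ELFMAG, SELFMAG */
    #include <string.h>
    #include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

    /* Hypothetical helper: returns 1 if the kernel mapped a vDSO whose image
       starts with the ELF magic, 0 if the magic is wrong, -1 if the auxv has
       no AT_SYSINFO_EHDR entry (vDSO disabled or unsupported). */
    int vdso_is_elf(void)
    {
        unsigned long base = getauxval(AT_SYSINFO_EHDR);
        if (base == 0)
            return -1;
        /* The vDSO is an ordinary ELF shared object living in our address
           space, so its first bytes are "\177ELF". */
        return memcmp((const void *)base, ELFMAG, SELFMAG) == 0 ? 1 : 0;
    }
    ```

    Because the image is a regular ELF object, the same header and program-header parsing used for files on disk applies to it in memory.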
    @@ -1210,6 +1211,40 @@ Once `__libc_start_main` has completed with the `_init` call it finally calls th
    Finally, it calls the termination functions and then calls `exit()` with the return value from `main()`.
    The dynamic linker's remaining work is to resolve library functions through lazy binding as they are first called.
    Using the library's symbols, the dynamic symbols from your executable, and the relocations for the GOT, dynamic linking can be performed successfully.
    Summary:
    - locate and map all dependencies (as well as shared objects specified in LD_PRELOAD);
    - relocate the files.
    This is a very high-level overview as I understand it:
    - the kernel initialises the process:
      - it maps the main program, the interpreter (dynamic linker) segments and the vDSO in the virtual address space;
      - it sets up the stack (passing the arguments, environment) and calls the dynamic linker entry point;
    - the dynamic linker loads the different ELF objects and binds them together:
      - it relocates itself (!);
      - it finds and loads the necessary libraries;
      - it does the relocations (which binds the ELF objects);
      - it calls the initialisation functions of the shared objects;
        - Those functions are specified in the DT_INIT and DT_INIT_ARRAY entries of the ELF objects.
      - it calls the main program entry point;
        - The main program entry point is found in the AT_ENTRY entry of the auxiliary vector: it has been initialised by the kernel from the e_entry ELF header field.
    - the executable then initialises itself.
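    The result of these steps can be inspected from inside a running process. The following is a small sketch, assuming glibc, which provides `dl_iterate_phdr()` in `<link.h>` when `_GNU_SOURCE` is defined; `print_object` and `list_loaded_objects` are illustrative names.

    ```c
    #define _GNU_SOURCE     /* for dl_iterate_phdr() */
    #include <assert.h>
    #include <link.h>
    #include <stdio.h>

    /* Callback invoked once per loaded ELF object (main program, vDSO,
       shared libraries, the dynamic linker itself). */
    static int print_object(struct dl_phdr_info *info, size_t size, void *data)
    {
        (void)size;
        int *count = data;
        const char *name = info->dlpi_name[0] ? info->dlpi_name : "(main program)";
        printf("%-40s base=%#lx phnum=%u\n", name,
               (unsigned long)info->dlpi_addr, (unsigned)info->dlpi_phnum);
        (*count)++;
        return 0;   /* a non-zero value would stop the iteration */
    }

    /* Hypothetical helper: prints every loaded object, returns how many. */
    int list_loaded_objects(void)
    {
        int count = 0;
        dl_iterate_phdr(print_object, &count);
        return count;
    }
    ```

    In a dynamically-linked program the list normally includes the main executable, libc, the dynamic linker and the vDSO; each `base` is the load address that the relocations were applied against.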
    ### Shared libraries
    As explained previously, they get loaded in the process memory space, and the linker does the dynamic-linking work.
    @@ -1313,6 +1348,8 @@ Rela, has an addend, Rel no.
    - [https://www.bottomupcs.com/starting_a_process.xhtml](https://www.bottomupcs.com/starting_a_process.xhtml)
    - [https://www.gabriel.urdhr.fr/2015/01/22/elf-linking/](https://www.gabriel.urdhr.fr/2015/01/22/elf-linking/)
    - `/usr/include/elf.h`
    - `ELF(5)` man pages
  15. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 83 additions and 4 deletions.
    87 changes: 83 additions & 4 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1115,23 +1115,100 @@ They are not supposed to be loaded as some relocations are pending to create a f
    ### Statically-linked executable files
    First, when we decide to run an executable, the kernel sets up a process and gives it a virtual memory space.
    First, when we decide to run an executable, the kernel sets up a process and gives it a virtual memory space, a stack, etc.
    The stack for that process address space is set up in a very specific way to pass information to the dynamic linker. This particular setup and arrangement of information is known as the auxiliary vector or auxv.
    ![Auxiliary vector](https://i.imgur.com/sRZkt21.png)
    Struct:
    ```
    typedef struct
    {
      uint64_t a_type;
      union
      {
        uint64_t a_val;
      } a_un;
    } Elf64_auxv_t;
    ```
    Auxv type:
    ```
    #define AT_EXECFD 2 /* File descriptor of program */
    #define AT_PHDR 3 /* Program headers for program */
    #define AT_PHENT 4 /* Size of program header entry */
    #define AT_PHNUM 5 /* Number of program headers */
    #define AT_PAGESZ 6 /* System page size */
    #define AT_ENTRY 9 /* Entry point of program */
    #define AT_UID 11 /* Real uid */
    ```
    The auxiliary vector is a special structure that is for passing information directly from the kernel to the newly running program. It contains system specific information that may be required, such as the default size of a virtual memory page on the system or hardware capabilities; that is specific features that the kernel has identified the underlying hardware has that userspace programs can take advantage of.
    Then the operating system maps an interpreter into the process's virtual memory (Usually `ld-linux.so`).
    The interpreter can be retrieved by the `.interp` section in the ELF file.
    It then reads the interpreter code and starts it from its entry point. The interpreter can be retrieved by the `.interp` section in the ELF file.
    The interpreter loads the binary, and gives the control to the entry point of the binary.
    ### Dynamically-linked executable files
    First, when we decide to run an executable, the kernel sets up a process and gives it a virtual memory space.
    First, when we decide to run an executable, the kernel sets up a process and gives it a virtual memory space, a stack, etc.
    The stack for that process address space is set up in a very specific way to pass information to the dynamic linker. This particular setup and arrangement of information is known as the auxiliary vector or auxv.
    ![Auxiliary vector](https://i.imgur.com/sRZkt21.png)
    Struct:
    ```
    typedef struct
    {
      uint64_t a_type;
      union
      {
        uint64_t a_val;
      } a_un;
    } Elf64_auxv_t;
    ```
    Auxv type:
    ```
    #define AT_EXECFD 2 /* File descriptor of program */
    #define AT_PHDR 3 /* Program headers for program */
    #define AT_PHENT 4 /* Size of program header entry */
    #define AT_PHNUM 5 /* Number of program headers */
    #define AT_PAGESZ 6 /* System page size */
    #define AT_ENTRY 9 /* Entry point of program */
    #define AT_UID 11 /* Real uid */
    ```
    The auxiliary vector is a special structure that is for passing information directly from the kernel to the newly running program. It contains system specific information that may be required, such as the default size of a virtual memory page on the system or hardware capabilities; that is specific features that the kernel has identified the underlying hardware has that userspace programs can take advantage of.
    Then the operating system maps an interpreter into the process's virtual memory (Usually `ld-linux.so`).
    The interpreter can be retrieved by the `.interp` section in the ELF file.
    It then reads the interpreter code and starts it from its entry point. The interpreter can be retrieved by the `.interp` section in the ELF file.
    We mentioned previously that system calls are slow, and modern systems have mechanisms to avoid the overheads of calling a trap to the processor.
    In Linux, this is implemented by a neat trick between the dynamic loader and the kernel, all communicated with the AUXV structure. The kernel actually adds a small shared library into the address space of every newly created process which contains a function that makes system calls for you. The beauty of this system is that if the underlying hardware supports a fast system call mechanism the kernel (being the creator of the library) can use it, otherwise it can use the old scheme of generating a trap. This library is named linux-gate.so.1, so called because it is a gateway to the inner workings of the kernel.
    When the kernel starts the dynamic linker it adds an entry to the auxv called AT_SYSINFO_EHDR, which is the address in memory that the special kernel library lives in. When the dynamic linker starts it can look for the AT_SYSINFO_EHDR pointer, and if found load that library for the program. The program has no idea this library exists; this is a private arrangement between the dynamic linker and the kernel.
    The interpreter loads the binary and parses it to find which libraries the binary needs. It maps them with `mmap` or similar calls, and then performs any necessary last-minute relocations in the binary’s code sections to fill in the correct addresses for references to the dynamic libraries.
    The dynamic linker will jump to the entry point address as given in the ELF binary.
    The entry point is the `_start` function in the binary. At this point we can see in the disassembly that some values are pushed onto the stack. The first value is the address of the `__libc_csu_fini` function, another is the address of `__libc_csu_init`, and finally the address of the `main()` function. After this, the `__libc_start_main` function is called.
    At this stage we can see that the `__libc_start_main` function will receive quite a few input parameters on the stack. Firstly, it will have access to the program arguments, environment variables and auxiliary vector from the kernel. Then the initialization function will have pushed onto the stack the addresses of the functions that handle `init` and `fini`, and finally the address of the `main()` function itself.
    The last value pushed onto the stack for `__libc_start_main` was the initialisation function `__libc_csu_init`. If we follow the call chain through from `__libc_csu_init` we can see it does some setup and then calls the `_init` function in the executable. The `_init` function eventually calls some functions called `__do_global_ctors_aux`, `frame_dummy` and `call_gmon_start`.
    Once `__libc_start_main` has completed with the `_init` call it finally calls the `main()` function. Remember that it had the stack set up initially with the arguments and environment pointers from the kernel; this is how main gets its `argc`, `argv[]`, `envp[]` arguments. The process now runs and the setup phase is complete.
    Finally, it calls the termination functions and then calls `exit()` with the return value from `main()`.
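    The stack layout described above (arguments, then environment pointers, then the auxiliary vector) can be checked from C. This is a sketch under two assumptions: a Linux process, and an `environ` that still points at the original array on the stack, which holds until a call such as `setenv()` reallocates it. `auxv_from_stack` is a hypothetical helper.

    ```c
    #include <assert.h>
    #include <elf.h>
    #include <link.h>       /* ElfW(): picks Elf32_ or Elf64_ for this build */
    #include <stddef.h>

    extern char **environ;  /* initially points at the envp array on the stack */

    /* Hypothetical helper: finds an auxv entry by walking past the end of
       the environment array. Only valid while envp is the original stack
       array. Returns 1 and stores the value, or 0 if the type is absent. */
    int auxv_from_stack(char **envp, unsigned long type, unsigned long *value)
    {
        assert(envp != NULL && value != NULL);
        /* Skip the environment strings up to their NULL terminator. */
        while (*envp != NULL)
            envp++;
        /* The auxiliary vector starts right after that NULL. */
        const ElfW(auxv_t) *av = (const ElfW(auxv_t) *)(envp + 1);
        for (; av->a_type != AT_NULL; av++) {
            if (av->a_type == type) {
                *value = av->a_un.a_val;
                return 1;
            }
        }
        return 0;
    }
    ```

    Passing `environ` (or the `envp` argument of a three-argument `main`) together with `AT_PAGESZ` should yield the system page size. Portable code should prefer `getauxval()`, which does not rely on the environment array being untouched.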
    ### Shared libraries
    @@ -1234,6 +1311,8 @@ Rela, has an addend, Rel no.
    - [https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/](https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/)
    - [https://www.bottomupcs.com/starting_a_process.xhtml](https://www.bottomupcs.com/starting_a_process.xhtml)
    - `/usr/include/elf.h`
    - `ELF(5)` man pages
  16. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 24 additions and 0 deletions.
    24 changes: 24 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1170,26 +1170,50 @@ When the binary is executed on another system, the interpreter tries to find tha
    ### When using PIC/PIE executables, how do the addresses get patched so the offset is added?
    -- TO DO --
    ### What is the difference between .got, .plt.got, .plt and .got.plt?
    `.got` is for relocations regarding global 'variables', while `.got.plt` is an auxiliary section that works together with `.plt` when resolving procedures' absolute addresses.
    ### Where is mmap space located?
    -- TO DO --
    ### Where is ld loaded?
    -- TO DO --
    ### Where are needed libraries loaded?
    -- TO DO --
    ### What is the difference between Rel and Rela?
    As a rule of thumb, Rel is used on 32-bit targets such as i386, whereas Rela is used on 64-bit ones such as x86-64.
    Rela, has an addend, Rel no.
    ### How is process address selected?
    -- TO DO --
    ### How does alignment work?
    -- TO DO --
    ### How are other segments included in PT_LOAD ones?
    -- TO DO --
    ### What happens if we include more than one shared-library?
    -- TO DO --
    ### When a() (local) calls b() (libc) and b() calls c() (libc too) is c() imported in .dynsym?
    -- TO DO --
    ## References
    - [Practical Linux Binary Analysis: Build Your Own Linux Tools for Binary Instrumentation, Analysis, and Disassembly](https://www.amazon.es/Practical-Binary-Analysis-Instrumentation-Disassembly/dp/1593279124) By Dennis Andriesse
  17. @lockedbyte lockedbyte revised this gist Jul 28, 2020. 1 changed file with 0 additions and 2 deletions.
    2 changes: 0 additions & 2 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1208,8 +1208,6 @@ When the binary is executed on another system, the interpreter tries to find tha
    - [https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/](https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/)
    - [https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/](https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/)
    - [https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/](https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/)
    - `/usr/include/elf.h`
  18. @lockedbyte lockedbyte revised this gist Jul 27, 2020. No changes.
  19. @lockedbyte lockedbyte revised this gist Jul 27, 2020. No changes.
  20. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 19 additions and 15 deletions.
    34 changes: 19 additions & 15 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1139,22 +1139,22 @@ As explained previously, they get loaded in the process memory space, and the li
    ## Common objects and functions
    - `frame_dummy`:
    - `_start`:
    - `_init`:
    - `__libc_start_main`:
    - `deregister_tm_clones`:
    - `register_tm_clones`:
    - `__stack_chk_fail`:
    - `__do_global_dtors_aux`:
    - `__do_global_dtors_aux_fini_array_entry`:
    - `__init_array_end`:
    - `__frame_dummy_init_array_entry`:
    - `__init_array_start`:
    - `__libc_csu_init`:
    - `__libc_csu_fini`:
    - `frame_dummy`
    - `_start`
    - `_init`
    - `__libc_start_main`
    - `deregister_tm_clones`
    - `register_tm_clones`
    - `__stack_chk_fail`
    - `__do_global_dtors_aux`
    - `__do_global_dtors_aux_fini_array_entry`
    - `__init_array_end`
    - `__frame_dummy_init_array_entry`
    - `__init_array_start`
    - `__libc_csu_init`
    - `__libc_csu_fini`
    - `.eh_frame`:
    - `.eh_frame`
    ## FAQ (Frequently Asked Questions)
    @@ -1208,6 +1208,10 @@ When the binary is executed on another system, the interpreter tries to find tha
    - [https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/](https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/)
    - [https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/](https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/)
    - [https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/](https://www.it-swarm-es.tech/es/c/que-funciones-agrega-gcc-al-linux-elf/822753373/)
    - `/usr/include/elf.h`
    - `ELF(5)` man pages
  21. @lockedbyte lockedbyte revised this gist Jul 27, 2020. No changes.
  22. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 39 additions and 13 deletions.
    52 changes: 39 additions & 13 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1109,7 +1109,33 @@ The other structure is mostly the same as dynamically-linked executables.
    ## Step-by-step ELF loading for each object type, ASLR and PIC/PIE
    First, when an ELF file gets executed, the kernel sets up a process and creates a virtual memory address space.
    ### Relocatable files
    They are not supposed to be loaded as some relocations are pending to create a fully working executable first.
    ### Statically-linked executable files
    First, when we run an executable, the kernel sets up a process and gives it a virtual memory space.
    A statically-linked executable has no `.interp` section (and no `PT_INTERP` segment), so no interpreter such as `ld-linux.so` is mapped.
    The kernel maps the `PT_LOAD` segments of the binary itself.
    It then transfers control directly to the entry point of the binary.
    ### Dynamically-linked executable files
    First, when we run an executable, the kernel sets up a process and gives it a virtual memory space.
    Then the kernel maps an interpreter into the process's virtual memory (usually `ld-linux.so`).
    The interpreter's path is stored in the `.interp` section of the ELF file.
    The interpreter parses the binary to find which libraries it needs, maps them with `mmap` or similar, and then performs any necessary last-minute relocations in the binary to fill in the correct addresses for references to the dynamic libraries.
    ### Shared libraries
    As explained previously, they get loaded into the process memory space, and the dynamic linker does the dynamic-linking work.
    ## Common objects and functions
    @@ -1136,33 +1162,33 @@ First, when an ELF file gets executed, the kernel set ups a process, and starts
    Sections are there just to make the linker's work easier. For example, a relocation entry in an `ET_REL` file specifies its target as an offset within a section.
    ## How does the compiler make dynamically-linked executables (DT_NEEDED)?
    ### How does the compiler make dynamically-linked executables (DT_NEEDED)?
    When the toolchain builds a dynamically-linked executable, it does not copy the library code in from a `.a` archive as static linking would; instead, the linker adds a `DT_NEEDED` entry to the `.dynamic` section for each required library, which refers to a string with the library name (e.g. `libc.so.6`).
    When the binary is executed, even on another system, the interpreter tries to find each library by name and loads it into memory to start the dynamic-linking process.
    ## When using PIC/PIE executables, how do the addresses get patched so the offset is added?
    ### When using PIC/PIE executables, how do the addresses get patched so the offset is added?
    ## What is the difference between .got, .plt.got, .plt and .got.plt?
    ### What is the difference between .got, .plt.got, .plt and .got.plt?
    ## Where is mmap space located?
    ### Where is mmap space located?
    ## Where is ld loaded?
    ### Where is ld loaded?
    ## Where are needed libraries loaded?
    ### Where are needed libraries loaded?
    ## What is the difference between Rel and Rela?
    ### What is the difference between Rel and Rela?
    ## How is process address selected?
    ### How is process address selected?
    ## How does alignment work?
    ### How does alignment work?
    ## How are other segments included in PT_LOAD ones?
    ### How are other segments included in PT_LOAD ones?
    ## What happens if we include more than one shared-library?
    ### What happens if we include more than one shared-library?
    ## When a() (local) calls b() (libc) and b() calls c() (libc too) is c() imported in .dynsym?
    ### When a() (local) calls b() (libc) and b() calls c() (libc too) is c() imported in .dynsym?
    ## References
  23. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1144,6 +1144,8 @@ When the binary is executed on another system, the interpreter tries to find tha
    ## When using PIC/PIE executables, how do the addresses get patched so the offset is added?
    ## What is the difference between .got, .plt.got, .plt and .got.plt?
    ## Where is mmap space located?
    ## Where is ld loaded?
  24. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 21 additions and 1 deletion.
    22 changes: 21 additions & 1 deletion elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1071,22 +1071,42 @@ It provides information to the linker to, once it's time to link it to the rest
    The object file's contents differ from those of other ELF files such as `ET_EXEC` and `ET_DYN`.
    #### Executable files
    They usually have `.rela.text` and `.rela.eh_frame` sections.
    As an object file is not a completely formed ELF yet, no executable-specific sections have been created; therefore you will find just common code and data sections, and symbols.
    #### Statically-linked executable files
    Executable files are those that do not depend on external libraries, so no relocations should be pending for them, as they can load without external objects.
    They do not need `.dynamic` or the Dynamic segment, and they do not need the GOT or PLT, as function calls go directly to the function address without any intermediate step.
    So in this type of ELF file you will find common code and data sections, and symbols (which can be removed).
    As they are static, if they use libc functions the total size will be considerably larger.
    #### Dynamically-linked executable files
    They are still executables. Being dynamically linked does not by itself make them position independent, but on modern distributions they are usually built as PIE, that is, from PIC (Position Independent Code).
    They need the GOT and PLT as intermediates to use external functions from shared libraries, such as `printf()`.
    In this type of executable you will usually find common code and data sections, the GOT, the PLT, and dynamic-linking symbol sections such as `.dynsym` and `.dynstr` (as well as static symbols, which are not needed at runtime).
    You will also find the `.dynamic` section, which is crucial for dynamic linking, and `.rela.dyn` and `.rela.plt`.
    #### Shared libraries
    They get loaded into a process's memory to provide functions to the executable that is going to use them.
    They are similar to dynamically-linked executables, but not identical.
    Here there is usually no `PT_INTERP` segment, as the shared library is not loaded by the kernel but by the dynamic linker.
    Also, the functions the library defines and exports are included in `.dynsym` (not just in `.symtab`), and `__libc_start_main` is not imported.
    The rest of the structure is mostly the same as in dynamically-linked executables.
    ## Step-by-step ELF loading for each object type, ASLR and PIC/PIE
    First, when an ELF file gets executed, the kernel sets up a process and creates a virtual memory address space.
  25. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 11 additions and 1 deletion.
    12 changes: 11 additions & 1 deletion elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1093,12 +1093,22 @@ First, when an ELF file gets executed, the kernel set ups a process, and starts
    ## Common objects and functions
    - `frame_dummy`:
    - `frame_dummy`:
    - `_start`:
    - `_init`:
    - `__libc_start_main`:
    - `deregister_tm_clones`:
    - `register_tm_clones`:
    - `__stack_chk_fail`:
    - `__do_global_dtors_aux`:
    - `__do_global_dtors_aux_fini_array_entry`:
    - `__init_array_end`:
    - `__frame_dummy_init_array_entry`:
    - `__init_array_start`:
    - `__libc_csu_init`:
    - `__libc_csu_fini`:
    - `.eh_frame`:
    ## FAQ (Frequently Asked Questions)
  26. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 6 additions and 16 deletions.
    22 changes: 6 additions & 16 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1093,22 +1093,12 @@ First, when an ELF file gets executed, the kernel set ups a process, and starts
    ## Common objects and functions
    frame_dummy
    _start
    __libc_start_main
    deregister_tm_clones
    __stack_chk_fail
    __overflow
    __cxa_atexit
    __cxa_finalize
    __fprintf_chk
    __errno_location
    __ctype_get_mb_cur_max
    __assert_fail
    __printf_chk
    _ITM_deregisterTMCloneTable
    __gmon_start__
    _ITM_registerTMCloneTable
    - `frame_dummy`:
    - `_start`:
    - `_init`:
    - `__libc_start_main`:
    - `deregister_tm_clones`:
    - `__stack_chk_fail`:
    ## FAQ (Frequently Asked Questions)
  27. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 13 additions and 25 deletions.
    38 changes: 13 additions & 25 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -622,8 +622,6 @@ Division of segments / sections:
    - `.got.plt`
    - `.bss`

    -- TO DO --

    ## Symbols

    Symbols are a symbolic reference to some type of data or code such as a global
    @@ -737,8 +735,6 @@ st_type defines:
    #define STT_HIPROC 15 /* End of processor-specific */
    ```
    -- TO DO --
    ## Dynamic Linking
    ![ELF Dynamic Linking](https://i.imgur.com/3j1GTPL.png)
    @@ -910,8 +906,6 @@ d_tag defines:
    #define DT_PROCNUM DT_MIPS_NUM /* Most used by any processor */
    ```
    -- TO DO --
    ## Relocation
    Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Relocation entries are these data.
    @@ -1023,19 +1017,21 @@ They generally are used for debugging purposes, and they make the reverse engine
    But, as dynamic symbols are still present, you can view the imported functions from external libraries like glibc.
    -- TO DO --
    ## Differences between 32-bit and 64-bit ELF objects
    The main differences are:
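    - In the ELF header, `e_ident[EI_CLASS]` (`ELFCLASS32` or `ELFCLASS64`) and, usually, `e_machine` change.
    - The sizes of many fields throughout the ELF file change too (addresses and offsets take 4 bytes in 32-bit objects and 8 bytes in 64-bit ones).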
    -- TO DO --
    ## Sections VS Segments
    Segments group one or more sections, and each section serves a specific purpose in the ELF file.
    Sections are not needed at runtime; they are mainly useful at link time.
    Segments describe blocks of memory with specific permissions and the contents to store there.
    In contrast to other file formats, ELF files are composed of sections and segments. As previously mentioned, sections gather all the information needed to link a given object file and build an executable, while program headers split the executable into segments with different attributes, which will eventually be loaded into memory.
    To understand the relationship between sections and segments, we can picture segments as a tool to make the Linux loader's life easier: they group sections by attributes into single segments, which makes loading the executable more efficient than loading each individual section into memory. The following diagram attempts to illustrate this concept:
    @@ -1046,6 +1042,13 @@ In order to understand the relationship between Sections and Segments, we can pi
    ## In-memory loaded ELF VS ELF file
    An ELF file on disk is just a format that describes how to load the program into memory so that it runs correctly.
    On disk it also carries information that is not needed at runtime, such as `.symtab` and `.strtab`, which are there mainly for debugging and linking purposes.
    Size in memory usually differs from size on disk. For example, uninitialized variables are stored in `.bss`: on disk only its size is recorded, without occupying that space. Once loaded, that space must actually exist and is filled with zeroes, so the memory needed to hold the ELF grows.
    Basic overview:
    ELF file in disk:
    @@ -1058,8 +1061,6 @@ ELF loaded in memory:
    ![In-memory ELF](https://i.imgur.com/sGtvRnH.png)
    -- TO DO --
    ## Differences between ELF objects
    #### Object Files
    @@ -1070,36 +1071,26 @@ It provides information to the linker to, once it's time to link it to the rest
    The object file's contents differ from those of other ELF files such as `ET_EXEC` and `ET_DYN`.
    -- TO DO --
    #### Executable files
    Executable files are those that do not depend on external libraries, so no relocations should be pending for them, as they can load without external objects.
    They do not need `.dynamic` or the Dynamic segment, and they do not need the GOT or PLT, as function calls go directly to the function address without any intermediate step.
    -- TO DO --
    #### Dynamically-linked executable files
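    They are still executables. Being dynamically linked does not by itself make them position independent, but on modern distributions they are usually built as PIE, that is, from PIC (Position Independent Code).
    They need the GOT and PLT as intermediates to use external functions from shared libraries, such as `printf()`.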
    -- TO DO --
    #### Shared libraries
    They get loaded into a process's memory to provide functions to the executable that is going to use them.
    -- TO DO --
    ## Step-by-step ELF loading for each object type, ASLR and PIC/PIE
    First, when an ELF file gets executed, the kernel sets up a process and creates a virtual memory address space.
    -- TO DO --
    ## Common objects and functions
    frame_dummy
    @@ -1135,13 +1126,10 @@ When the binary is executed on another system, the interpreter tries to find tha
    ## Where is mmap space located?
    ## Where is ld loaded?
    ## Where are needed libraries loaded?
    ## What is the difference between Rel and Rela?
    ## How is process address selected?
  28. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1170,6 +1170,8 @@ When the binary is executed on another system, the interpreter tries to find tha
    - [https://lwn.net/Articles/631631/](https://lwn.net/Articles/631631/)
    - [https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/](https://codywu2010.wordpress.com/2014/11/29/about-elf-pie-pic-and-else/)
    - `/usr/include/elf.h`
    - `ELF(5)` man pages
  29. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 34 additions and 0 deletions.
    34 changes: 34 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1119,6 +1119,40 @@ _ITM_deregisterTMCloneTable
    __gmon_start__
    _ITM_registerTMCloneTable
    ## FAQ (Frequently Asked Questions)
    ### Why do we need sections?
    Sections are there just to make the linker's work easier. For example, a relocation entry in an `ET_REL` file specifies its target as an offset within a section.
    ## How does the compiler make dynamically-linked executables (DT_NEEDED)?
    When the toolchain builds a dynamically-linked executable, it does not copy the library code in from a `.a` archive as static linking would; instead, the linker adds a `DT_NEEDED` entry to the `.dynamic` section for each required library, which refers to a string with the library name (e.g. `libc.so.6`).
    When the binary is executed, even on another system, the interpreter tries to find each library by name and loads it into memory to start the dynamic-linking process.
    ## When using PIC/PIE executables, how do the addresses get patched so the offset is added?
    ## Where is mmap space located?
    ## Where is ld loaded?
    ## Where are needed libraries loaded?
    ## What is the difference between Rel and Rela?
    ## How is process address selected?
    ## How does alignment work?
    ## How are other segments included in PT_LOAD ones?
    ## What happens if we include more than one shared-library?
    ## When a() (local) calls b() (libc) and b() calls c() (libc too) is c() imported in .dynsym?
    ## References
  30. @lockedbyte lockedbyte revised this gist Jul 27, 2020. 1 changed file with 6 additions and 0 deletions.
    6 changes: 6 additions & 0 deletions elf_format_cheatsheet.md
    Original file line number Diff line number Diff line change
    @@ -1130,6 +1130,12 @@ _ITM_registerTMCloneTable
    - [https://hydrasky.com/malware-analysis/elf-file-chapter-2-relocation-and-dynamic-linking/](https://hydrasky.com/malware-analysis/elf-file-chapter-2-relocation-and-dynamic-linking/)
    - [https://www.intezer.com/blog/research/executable-linkable-format-101-part1-sections-segments/](https://www.intezer.com/blog/research/executable-linkable-format-101-part1-sections-segments/)
    - [https://man7.org/linux/man-pages/man8/ld.so.8.html](https://man7.org/linux/man-pages/man8/ld.so.8.html)
    - [https://lwn.net/Articles/631631/](https://lwn.net/Articles/631631/)
    - `/usr/include/elf.h`
    - `ELF(5)` man pages