NOTE: This is a working copy. This tutorial is unfinished and may contain inaccuracies.
I've written the title of this tutorial in Chinese, as I suspect that its contents may, at first glance, appear similarly incomprehensible to the audience.
However, just as I can sketch for you the following...
可執行文件 = (可 = can) + (執行 = execute) + (文件 = file) = executable (file)
內容 = contents
分析 = analysis
工具 = tools
Linux可執行文件ㄉ內容分析工具 = Linux Tools for Analysing the Contents of Executables
(more specifically, of course, ELF object files)
... so can we break down the majority of the complexity behind:
- ELF (the Executable and Linkable Format),
- object files,
- how executables are executed,
- how C programmes work,
- how the C standard library works,
- and what three tools (nm, objdump, and readelf) are used for.
In the end, it comes down to reading source code and understanding the protocols and mechanisms that have evolved around programme execution.
The purpose of this tutorial isn't to teach you everything about these tools or about executable formats. It is to demystify aspects of programme execution on Linux and demonstrate how accessible this knowledge is.
The word ELF in this tutorial refers to a specific format and protocol for executable files. There is no equivalent protocol in Python, because Python is an interpreted language. An interpreter sits between the Python programme you want to execute and the operating system on which the programme is executed. That interpreter may itself be written in an interpreted language, but, inevitably, some piece of code must interact directly with the operating system, and this code will need to be structured in a fashion that the OS can understand. ELF is one such format, and there are many others. You can even write your own binary format!
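To make that last point concrete, here is a minimal sketch in Python (chosen only for brevity) that asks whether the interpreter running it is itself an ELF file. Every ELF file begins with the four bytes 0x7f 'E' 'L' 'F'. The function names has_elf_magic and describe_interpreter are made up for this illustration, and the result depends on your system: on Linux the interpreter is normally an ELF binary, but nothing here guarantees it.

```python
import os
import sys

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def has_elf_magic(data):
    """Return True if `data` (bytes from the start of a file) begins with the ELF magic."""
    return data[:4] == ELF_MAGIC


def describe_interpreter():
    """Say whether the running Python interpreter is an ELF file (hypothetical helper)."""
    # sys.executable is often a symlink (python3 -> python3.x), so resolve it first.
    path = os.path.realpath(sys.executable)
    with open(path, "rb") as f:
        head = f.read(4)
    kind = "an ELF file" if has_elf_magic(head) else "not an ELF file"
    return f"{path} is {kind}"
```

Calling print(describe_interpreter()) from a Python prompt shows the resolved path and the verdict for your own machine.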
Here's a simple introductory document on ELF:
ELF How-To: http://cs.mipt.ru/docs/comp/eng/os/linux/howto/howto_english/elf/elf-howto.html
Wikipedia also has a decent overview: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
As you can see from these documents, ELF is really just a file format that is interpreted by some code in the kernel. If you're curious, take a look at fs/exec.c:do_execve_common (named do_execveat_common in more recent kernels), which is called when we execute a programme:
https://github.com/torvalds/linux/blob/96c57ade7e9ba2d1deba635a5989cc111f185dca/fs/exec.c#L1428
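For ELF binaries, take a look at fs/binfmt_elf.c:load_elf_binary (this link follows master, so the line anchor may have drifted; search for the function name if it has):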
https://github.com/torvalds/linux/blob/master/fs/binfmt_elf.c#L571
In the end, this comes down to just formats and code to process files in those formats.
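The idea of "formats and code to process them" can be sketched in a few lines. When a programme is executed, the kernel reads the start of the file and offers it to each registered binary-format handler in turn: the ELF handler (fs/binfmt_elf.c) looks for the ELF magic bytes, and the script handler (fs/binfmt_script.c) looks for "#!". The Python below is a heavily simplified imitation of that dispatch, not the kernel's logic. The names HANDLERS and pick_handler are hypothetical, and the real handlers do far more than compare a few bytes.

```python
def looks_like_elf(head):
    """ELF files start with 0x7f followed by the letters E, L, F."""
    return head[:4] == b"\x7fELF"


def looks_like_script(head):
    """Interpreter scripts start with the two characters '#!'."""
    return head[:2] == b"#!"


# Hypothetical stand-in for the kernel's list of registered format handlers:
# (name, predicate) pairs that are tried in order.
HANDLERS = [
    ("elf", looks_like_elf),
    ("script", looks_like_script),
]


def pick_handler(head):
    """Return the name of the first handler that recognises `head`, or None."""
    for name, matches in HANDLERS:
        if matches(head):
            return name
    return None
```

Given the first bytes of a file, e.g. pick_handler(open(path, "rb").read(128)), this returns "elf", "script", or None. Adding your own binary format would amount to appending another pair to the list.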
As a side note, if you use apt, you can retrieve the source code for any package installed on your computer using apt-get source (provided source repositories, i.e. deb-src entries, are enabled). For the tools discussed here, do:
$ apt-get source binutils
The three tools discussed here are packaged as binutils and distributed by GNU/FSF.
Here's the homepage for these tools with descriptions of what they each do:
https://www.gnu.org/software/binutils/
- nm - Lists symbols from object files.
- objdump - Displays information from object files.
- readelf - Displays information from any ELF format object file.
These tools are just programmes written in C that read files in the ELF format and print interesting information about them.
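To show how little magic is involved, here is a rough sketch of the first thing a tool like readelf does: decode the fixed-layout header at the start of the file. It is written in Python rather than C purely for brevity, reads only a handful of fields (word size, byte order, file type, machine, entry point), and knows only a few machine codes. The function name parse_elf_header and the dictionary keys are inventions for this example, not anything taken from binutils.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_TYPES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
# A few common e_machine values; anything else is reported as a number.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_header(data):
    """Decode a few ELF header fields from `data`, the first bytes of a file."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]  # 1/2 = 32/64-bit, 1/2 = little/big endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    addr = "I" if ei_class == 1 else "Q"  # the entry address is 4 or 8 bytes wide
    fmt = endian + "HHI" + addr  # e_type, e_machine, e_version, e_entry
    if len(data) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    # These fields follow directly after the 16-byte e_ident array.
    e_type, e_machine, _e_version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "byte_order": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, e_type),
        "machine": MACHINES.get(e_machine, e_machine),
        "entry": e_entry,
    }
```

To try it, pass in the first 64 bytes of any binary, e.g. parse_elf_header(open("/bin/ls", "rb").read(64)), and compare the result with what readelf -h /bin/ls prints.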
If you're curious how these tools work, just read the source!
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=tree
But before you read the source, read the manpages to get an overview of what these tools do:
man readelf
http://linuxcommand.org/man_pages/readelf1.html
man objdump
http://linuxcommand.org/man_pages/objdump1.html
man nm