Revisions

  1. @mikesmullin mikesmullin revised this gist Nov 27, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion x86-assembly-notes.md
    Original file line number Diff line number Diff line change
    @@ -1129,7 +1129,7 @@ References:
    http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
    - How Rust encodes exceptions and interrupts
    https://os.phil-opp.com/handling-exceptions/
    - NeHe's famous OpenGL game dev tutorials incl. examples in Windows MASM
    - NeHe's famous OpenGL game dev tutorials incl. examples in Windows MASM
    http://nehe.gamedev.net/tutorial/creating_an_opengl_window_(win32)/13001/
    - How Debuggers Work w/ Breakpoints
    https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints
  2. @mikesmullin mikesmullin revised this gist Nov 27, 2018. 1 changed file with 0 additions and 26 deletions.
    26 changes: 0 additions & 26 deletions x86-assembly-notes.md
    @@ -1,31 +1,5 @@
    # Mike's x86-64 Assembly (ASM) Notes

    Most CS graduates learned only the bare minimum of assembly needed to pass an exam.
    I am learning it after 15+ years of web programming industry experience because I have always been self-taught, and it took a long time for it to become relevant to me. Now I harbor several motivations:

    - **Information Security**:
    - **Reverse Engineering**: Blue team malware analysis
    - **Penetration Testing**: Red team tooling and AV evasion
    - Writing my own [secure] [server and/or personal] **Operating System**,
    or contributing to a cool one that already enjoys broad support (e.g., Linux kernel, hardening)
    - **Anonymity and Anti-Surveillance**: Understanding how undocumented features of commodity CPU vendors' hardware lead to
    some of the most egregious vulnerabilities and exploits in history (e.g., Spectre, Meltdown)
    - **Systems programming** (Windows assembly, Linux driver contribution, debugging)
    - **Game Development and Hacking** (WebAssembly, porting old games, cross-platform compatibility, DRM, shipping assets in tight binary packages, cheats and trainers)
    - **Productivity**: One day I might like to develop my own set of build tools including
    an assembler, linker, custom high-level language, and IDE (or again, contribute to a cool one that already exists like Golang or Rust)

    I feel these notes are compiled with a little more love and attention than the average
    graduate paper or recycled academic book from the 1990s, but a lot of this information
    hasn't changed since then, so I make good use of references as well. When doing my
    own research, it was sad how much of it was poorly formatted (systems engineers are
    rarely also web developers) and suffering from major link rot (many proud assembly
    warriors coded their own web servers, then got auto-pwned circa 2016
    by vulnerability scanners and remote code execution, and just went offline). So
    if a link here is dead, be sure to check archive.org for it. I tried to
    provide summaries as a mirror of the most important stuff, and then link to
    the remaining works, which are more thorough and exhaustive.

    ## Assembling Binary Machine Code

    ### Operating `Modes`:
  3. @mikesmullin mikesmullin revised this gist Nov 26, 2018. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions x86-assembly-notes.md
    @@ -985,6 +985,8 @@ References:
    http://wwwcdf.pd.infn.it/localdoc/gdbint.pdf
    - Windbg Commands
    https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/commands
    - x64dbg is perhaps the best Windows debugger today
    https://x64dbg.com/
    - Hex-Rays Interactive Disassembler (IDA); most professional, but expensive
    https://www.hex-rays.com/products/ida/index.shtml
    - Binary Ninja; less featureful but cheap, modern interactive disassembler
  4. @mikesmullin mikesmullin revised this gist Nov 25, 2018. 1 changed file with 9 additions and 9 deletions.
    18 changes: 9 additions & 9 deletions x86-assembly-notes.md
    @@ -334,16 +334,16 @@ when referring to these registers, which describes both a) operand width, and b)
    While there are several places you may reference a register, including `MODRM.reg`, `MODRM.rm`, `SIB.index`, `SIB.base`,
    and `PO.reg`, you'll find they all use the same `3` or `4`-bit mapping convention, as follows:

    |Register<br>Reference|<br>Low `8`-bits³|(`3`-bit / `4th`-bit=`0b1`)<br>High `8`-bits¹ ³|<br>Low `16`-bits|<br>Low `32`-bits⁴|<br>Full `64`-bit Register
    |Register<br>Reference|(`3`-bit / `4th`-bit=`0b1`)<br>Low `8`-bits³|<br>High `8`-bits¹ ³|<br>Low `16`-bits|<br>Low `32`-bits⁴|<br>Full `64`-bit Register
    -|-|-|-|-|-
    `0b000`| |`AL`/`R8B` |`AX`/`R8W` |`EAX`/`R8D` |`RAX`/`R8`
    `0b001`| |`CL`/`R9B` |`CX`/`R9W` |`ECX`/`R9D` |`RCX`/`R9`
    `0b010`| |`DL`/`R10B` |`DX`/`R10W`|`EDX`/`R10D`|`RDX`/`R10`
    `0b011`| |`BL`/`R11B` |`BX`/`R11W`|`EBX`/`R11D`|`RBX`/`R11`
    `0b100`|`AH`|`SPL`²/`R12B`|`SP`/`R12W`|`ESP`/`R12D`|`RSP`/`R12`
    `0b101`|`CH`|`BPL`²/`R13B`|`BP`/`R13W`|`EBP`/`R13D`|`RBP`/`R13`
    `0b110`|`DH`|`SIL`²/`R14B`|`SI`/`R14W`|`ESI`/`R14D`|`RSI`/`R14`
    `0b111`|`BH`|`DIL`²/`R15B`|`DI`/`R15W`|`EDI`/`R15D`|`RDI`/`R15`
    `0b000`|`AL`/`R8B` | |`AX`/`R8W` |`EAX`/`R8D` |`RAX`/`R8`
    `0b001`|`CL`/`R9B` | |`CX`/`R9W` |`ECX`/`R9D` |`RCX`/`R9`
    `0b010`|`DL`/`R10B` | |`DX`/`R10W`|`EDX`/`R10D`|`RDX`/`R10`
    `0b011`|`BL`/`R11B` | |`BX`/`R11W`|`EBX`/`R11D`|`RBX`/`R11`
    `0b100`|`SPL`²/`R12B`|`AH`|`SP`/`R12W`|`ESP`/`R12D`|`RSP`/`R12`
    `0b101`|`BPL`²/`R13B`|`CH`|`BP`/`R13W`|`EBP`/`R13D`|`RBP`/`R13`
    `0b110`|`SIL`²/`R14B`|`DH`|`SI`/`R14W`|`ESI`/`R14D`|`RSI`/`R14`
    `0b111`|`DIL`²/`R15B`|`BH`|`DI`/`R15W`|`EDI`/`R15D`|`RDI`/`R15`
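
The lookup in the table above can be sketched in code. This is a minimal Python sketch (Python is an assumption; the notes themselves contain none), and the function name `register_name` and its parameters are hypothetical. It assumes the 4th bit is supplied by a REX prefix, and that without any REX prefix the `8`-bit codes `0b100`-`0b111` select `AH`/`CH`/`DH`/`BH` instead of `SPL`/`BPL`/`SIL`/`DIL`.

```python
# Hypothetical helper: map a 3-bit register field (plus the optional
# 4th extension bit) to a register name, following the table above.
LOW8 = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"]
HIGH8 = {0b100: "AH", 0b101: "CH", 0b110: "DH", 0b111: "BH"}
BASE16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
EXT_SUFFIX = {8: "B", 16: "W", 32: "D", 64: ""}


def register_name(code3, ext_bit=0, width=64, rex_present=False):
    """Return the register selected by a 3-bit code and extension bit.

    width is the operand width in bits (8, 16, 32 or 64).
    rex_present only matters for 8-bit operands with ext_bit == 0.
    """
    if not 0 <= code3 <= 0b111:
        raise ValueError("code3 must fit in 3 bits")
    if width not in EXT_SUFFIX:
        raise ValueError("width must be 8, 16, 32 or 64")
    if ext_bit:
        # 4th bit set: registers R8..R15 with a width suffix.
        return f"R{8 + code3}{EXT_SUFFIX[width]}"
    if width == 8:
        # Legacy high-byte registers are only reachable without REX.
        if not rex_present and code3 in HIGH8:
            return HIGH8[code3]
        return LOW8[code3]
    base = BASE16[code3]
    return {16: base, 32: "E" + base, 64: "R" + base}[width]


if __name__ == "__main__":
    for code in range(8):
        print(f"{code:#05b}", register_name(code, 0, 64), register_name(code, 1, 64))
```

Keeping the extension bit as a separate argument mirrors how the encoding splits it from the 3-bit field, so the same function serves `MODRM.reg`, `MODRM.rm`, `SIB.index`, and `SIB.base`.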

    **NOTES:**

  5. @mikesmullin mikesmullin revised this gist Nov 22, 2018. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions x86-assembly-notes.md
    @@ -989,6 +989,8 @@ References:
    https://www.hex-rays.com/products/ida/index.shtml
    - Binary Ninja; less featureful but cheap, modern interactive disassembler
    https://binary.ninja/
    - Reversed non-standard opcode mappings which may confuse normal disassemblers
    https://github.com/XlogicX/irasm

    ---

  6. @mikesmullin mikesmullin revised this gist Nov 22, 2018. 1 changed file with 29 additions and 5 deletions.
    34 changes: 29 additions & 5 deletions x86-assembly-notes.md
    @@ -1,6 +1,30 @@
    # Mike's x86-64 Assembly (ASM) Notes

    > One day I might like to make my own version of ASM or C language, similar to Golang or Rust and/or NASM.
    Most CS graduates learned only the bare minimum of assembly needed to pass an exam.
    I am learning it after 15+ years of web programming industry experience because I have always been self-taught, and it took a long time for it to become relevant to me. Now I harbor several motivations:

    - **Information Security**:
    - **Reverse Engineering**: Blue team malware analysis
    - **Penetration Testing**: Red team tooling and AV evasion
    - Writing my own [secure] [server and/or personal] **Operating System**,
    or contributing to a cool one that already enjoys broad support (e.g., Linux kernel, hardening)
    - **Anonymity and Anti-Surveillance**: Understanding how undocumented features of commodity CPU vendors' hardware lead to
    some of the most egregious vulnerabilities and exploits in history (e.g., Spectre, Meltdown)
    - **Systems programming** (Windows assembly, Linux driver contribution, debugging)
    - **Game Development and Hacking** (WebAssembly, porting old games, cross-platform compatibility, DRM, shipping assets in tight binary packages, cheats and trainers)
    - **Productivity**: One day I might like to develop my own set of build tools including
    an assembler, linker, custom high-level language, and IDE (or again, contribute to a cool one that already exists like Golang or Rust)

    I feel these notes are compiled with a little more love and attention than the average
    graduate paper or recycled academic book from the 1990s, but a lot of this information
    hasn't changed since then, so I make good use of references as well. When doing my
    own research, it was sad how much of it was poorly formatted (systems engineers are
    rarely also web developers) and suffering from major link rot (many proud assembly
    warriors coded their own web servers, then got auto-pwned circa 2016
    by vulnerability scanners and remote code execution, and just went offline). So
    if a link here is dead, be sure to check archive.org for it. I tried to
    provide summaries as a mirror of the most important stuff, and then link to
    the remaining works, which are more thorough and exhaustive.

    ## Assembling Binary Machine Code

    @@ -646,7 +670,7 @@ a long array of floats to/from a block of memory in one operation.
    The integer part is encoded as a simple unsigned int.
    However, the fractional part is encoded as a base2 binary fraction,
    which commonly results in a [repeating fraction](https://en.wikipedia.org/wiki/Repeating_decimal) pattern,
    which gets truncated--and can lead to infamous FPU rounding errors if not handled carefully.
    which gets truncated--and can lead to infamous FPU rounding errors if not handled carefully.
    ex: `3.1f` = `0b11` + `0b000 1100 1100 1100 1100 110...` _(the pattern would repeat infinitely if not truncated)_
    The bits are left-aligned within the field, so any zero-fill happens on the right side.

    @@ -988,7 +1012,7 @@ Here is some useful trivia about that:
    match the exact release version and compiler used.
    - Confusingly, there is no trivial way to tell static and dynamic `.lib` files apart,
    except that [dynamic] import libraries for DLLs will be much smaller than the
    matching static library would be.
    matching static library would be.
    - `.lib` files may only be used at compile time to build statically linked binaries.
    - `.dll` files are intended to be used only at runtime, as dynamically linked binaries.
    - Technically `.dll` files contain enough information that a reverse engineer could
    @@ -1005,7 +1029,7 @@ Here is some useful trivia about that:
    code which uses the `.dll`, or via fuzz testing.
    - The version of `gcc` toolchain GNU linker (`ld`) ported to Windows can statically
    link using `.dll` inputs directly, which means it is able to implicitly synthesize
    the normally required but missing `.lib` stubs automagically!
    the normally required but missing `.lib` stubs automagically!
    - **Decorated names** or **mangled names** are a symbol naming convention used in the
    COFF files. They are a series of ASCII prefixes and suffixes which guarantee that
    each function is named uniquely when merged into the same flat COFF table format.
    @@ -1030,7 +1054,7 @@ References:
    https://docs.microsoft.com/en-us/windows/desktop/Debug/pe-format
    - MSDN Article from 2002 going into tremendous depth on history and intentions
    http://www.delphibasics.info/home/delphibasicsarticles/anin-depthlookintothewin32portableexecutablefileformat-part1
    http://www.delphibasics.info/home/delphibasicsarticles/anin-depthlookintothewin32portableexecutablefileformat-part2
    http://www.delphibasics.info/home/delphibasicsarticles/anin-depthlookintothewin32portableexecutablefileformat-part2
    - MSDN Article from Mar 2002 detailing steps the Windows Loader takes with PE binaries
    https://www.cnblogs.com/binsys/articles/2711010.html
    - Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
  7. @mikesmullin mikesmullin revised this gist Nov 22, 2018. 1 changed file with 88 additions and 1 deletion.
    89 changes: 88 additions & 1 deletion x86-assembly-notes.md
    @@ -522,7 +522,10 @@ References:

    ---

    # Appendix: Extensions
    # Appendix: x86 Extensions

    As new models of the x86 family are released, the instruction set is extended with new features.
    Here we provide a chronologically ordered summary of what was added, when, and why.

    ## History of the FPU

    @@ -617,6 +620,84 @@ References:

    ---

    # Floating Point Numbers (IEEE-754)

    Floats come in various sizes. When serialized for compact transmission over the
    network, a clever dev may try to encode them as a string, or a tuple of 1-byte
    integers (integer and mantissa, optionally an exponent). But when you need the
    processor to do really quick, especially bulk, binary floating point math, the
    following is the standard form used everywhere.

    x87 floating point instructions operate on the `ST0-7` register stack, while SIMD
    instructions operate on `MM0-7` (MMX, integer only) and more recently the `XMM0-15`
    (SSE) registers. When filling the wide `80`/`128`-bit registers,
    you may need to perform multiple `MOV` and `PUSH` operations,
    since the other registers and `immediate` operands are much smaller.
    As an optimization, some instructions accept a memory pointer operand to read/write
    a long array of floats to/from a block of memory in one operation.

    ### Data Structure:

    - **`1`-bit Sign** (`0`=positive)
    - **`8`-bit base2 Exponent with a `+127` bias** ([why not signed two's complement?](https://stackoverflow.com/a/2835476))
    For values `>= 1`: take the whole number integer part, convert it to binary, remove any [insignificant] leading zeros, count the digits, and subtract one; that is the binary exponent.
    Then add the `+127` bias and store the result as an `8`-bit unsigned integer.
    - **`23`-bit Mantissa a.k.a. Significand**
    This is the integer and fractional parts concatenated, normalized so that exactly one `1` bit sits left of the binary point.
    That leading `1` is implicit and is not stored; only the `23` bits after it are.
    The integer part is encoded as a simple unsigned int.
    However, the fractional part is encoded as a base2 binary fraction,
    which commonly results in a [repeating fraction](https://en.wikipedia.org/wiki/Repeating_decimal) pattern,
    which gets rounded to fit--and can lead to infamous FPU rounding errors if not handled carefully.
    ex: `3.1f` = `0b11` + `0b000 1100 1100 1100 1100 110...` _(the pattern would repeat infinitely if not cut off)_
    The bits are left-aligned within the field, so any zero-fill happens on the right side.

    ## Let's manually encode `1.0f`!

    - **sign:** `0b0` = a positive number
    - **mantissa:** `0b1` (the implicit leading bit, which is not stored) + `0b0` zero-extended to `23` bits
    _(It is easier to calculate in this order because the mantissa value informs the exponent value.)_
    - **exponent:** `0d0` + `0d127` = `0d127` = `0b01111111`

    ```
    IEEE-754 32-bit (single precision) Floating Point (x86; little-endian)
    offset  0 1         9                           32
    single [0 0111 1111 0000 0000 0000 0000 0000 000] = 0x3f800000 = 1.0f
            | |       | |                          |
    sign    1 |       | |                          |
    exponent  |<--8-->| |                          |
    mantissa            |<-----------23----------->|
    ```
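
The hand encoding above can be cross-checked in software. The following is a minimal sketch in Python (an assumption; the notes themselves contain no Python) using the standard `struct` module. The helper name `float32_fields` is hypothetical.

```python
import struct


def float32_fields(value):
    """Split a number, encoded as IEEE-754 single precision, into fields.

    Returns (raw_bits, sign, biased_exponent, stored_mantissa).
    """
    # Pack as a big-endian float, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the bit fields can be masked out.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, still includes the +127 bias
    mantissa = bits & 0x7FFFFF        # 23 stored bits (implicit leading 1 omitted)
    return bits, sign, exponent, mantissa


if __name__ == "__main__":
    bits, sign, exponent, mantissa = float32_fields(1.0)
    print(f"{bits:#010x} sign={sign} exponent={exponent} mantissa={mantissa:#x}")
    # Byte order as laid out in memory on a little-endian machine such as x86.
    print(struct.pack("<f", 1.0).hex())
```

For `1.0` the biased exponent is `127` and the stored mantissa is zero, because the leading `1` is implicit.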

    The structure is the same for `64`-bit (`double` precision) floats
    except the exponent has `11` bits, and a bias of `+1023`.

    The exponent and mantissa fields have four special combinations with reserved meanings:

    Exponent | Mantissa | Meaning
    -|-|-
    `0b0` | `0b0` | zero (`0d0`)
    `0b0` | non-zero | denormalized
    all `0b1`'s | `0b0` | `Infinity`
    all `0b1`'s | non-zero | `NaN` ¹

    **NOTES**:
    1. You can hide data inside the mantissa of `NaN` structures.
    Some compilers use this to specify more precise reason codes (e.g., if the `NaN` resulted from a failed computation).
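
The table of reserved values above can be expressed as a small classifier. This is a minimal Python sketch under the same assumptions as the table (single precision, `8`-bit exponent, `23`-bit mantissa); the names `classify_float32` and `float_to_bits` are hypothetical.

```python
import struct


def float_to_bits(value):
    """Return the raw 32-bit pattern of value encoded as a single."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits


def classify_float32(bits):
    """Classify a raw 32-bit pattern using the exponent/mantissa table."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 0xFF:
        return "Infinity" if mantissa == 0 else "NaN"
    return "normal"


if __name__ == "__main__":
    for value in (0.0, 1.0, float("inf"), float("nan")):
        bits = float_to_bits(value)
        print(f"{bits:#010x} {classify_float32(bits)}")
    # Any non-zero mantissa with an all-ones exponent is still a NaN,
    # which is what leaves room for a payload.
    print(classify_float32(0x7FC00000 | 0x1234))
```

The sign bit is ignored on purpose, so `-0.0` and `-Infinity` land in the same rows as their positive counterparts.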

    References:
    - How to encode a float by hand
    https://www.youtube.com/watch?v=8afbTaA-gOQ
    - University lecture explaining the math
    https://www.youtube.com/watch?v=03fhijH6e2w
    - University lecture performing addition by hand
    https://www.youtube.com/watch?v=KiWz-mGFqHI
    - Interactive hosted calculator
    https://babbage.cs.qc.cuny.edu/ieee-754.old/decimal.html
    - Explaining floating point rounding errors
    https://www.youtube.com/watch?v=PZRI1IfStY0

    ---

    # Appendix: Stack vs. Heap

    The stack is a data structure in memory the processor can understand and maintain,
    @@ -726,6 +807,8 @@ References:
    https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
    - Stack Frame Layout on x64
    https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64
    - Microsoft __fastcall 64 ABI calling convention
    https://msdn.microsoft.com/en-us/library/ms235286.aspx

    ---

    @@ -1044,10 +1127,14 @@ References:
    http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
    - How Rust encodes exceptions and interrupts
    https://os.phil-opp.com/handling-exceptions/
    - NeHe's famous OpenGL game dev tutorials incl. examples in Windows MASM
    http://nehe.gamedev.net/tutorial/creating_an_opengl_window_(win32)/13001/
    - How Debuggers Work w/ Breakpoints
    https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints
    - Ralf Brown's BIOS Interrupt List
    http://www.ctyme.com/rbrown.htm
    - Agner Fog's books and blog, renowned for advanced assembly information
    https://www.agner.org/optimize/
    - Intel® 64 and IA-32 Architectures Software Developer Manuals
    https://software.intel.com/en-us/articles/intel-sdm
    - AMD64 Architecture Programmer's Manual
  8. @mikesmullin mikesmullin revised this gist Nov 18, 2018. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions x86-assembly-notes.md
    @@ -724,6 +724,8 @@ References:
    https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
    - Anatomy of a Program in Memory
    https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
    - Stack Frame Layout on x64
    https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64

    ---

    @@ -846,6 +848,8 @@ References:
    https://www.pcjs.org/pubs/pc/software/tools/microsoft/masm/5.00/
    - Third-party community support forums (anecdotal information and references)
    http://www.masm32.com/board/
    - Art of Assembly (contains summary of MASM syntax)
    http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_8/CH08-1.html#top
    - Steve Gibson's MASM enthusiast page
    https://www.grc.com/smgassembly.htm
    - Netwide Assembler (NASM) Documentation
  9. @mikesmullin mikesmullin revised this gist Nov 17, 2018. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion x86-assembly-notes.md
    @@ -974,10 +974,12 @@ References:
    https://en.wikipedia.org/wiki/Address_space_layout_randomization
    - In 2017 "ASLR⊕Cache" attack demonstrated defeating ASLR from a web browser using JavaScript
    https://www.vusec.net/projects/anc/
    - NTSTATUS values (Windows `%errorlevel%` codes)
    - NTSTATUS values (Windows %errorlevel% codes)
    https://msdn.microsoft.com/en-us/library/cc704588.aspx
    - Official intro and reference for Windows-based graphical user interfaces
    https://docs.microsoft.com/en-us/windows/desktop/winmsg/windowing
    - Windows System Error Codes
    https://docs.microsoft.com/en-us/windows/desktop/Debug/system-error-codes

    ---

  10. @mikesmullin mikesmullin revised this gist Nov 17, 2018. No changes.
  11. @mikesmullin mikesmullin revised this gist Nov 17, 2018. 1 changed file with 0 additions and 2 deletions.
    2 changes: 0 additions & 2 deletions x86-assembly-notes.md
    @@ -813,8 +813,6 @@ Calculated in some of the following ways:
    the segment_selector refers to the GDT which refers to a protected memory page
    the offset is the address relative to that.

    **NOTE:** pointers are only used by JMP and CALL instructions.

    References:
    - Using Short/Relative vs. Far Jumps
    https://thestarman.pcministry.com/asm/2bytejumps.htm
  12. @mikesmullin mikesmullin revised this gist Nov 17, 2018. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions x86-assembly-notes.md
    @@ -978,6 +978,8 @@ References:
    https://www.vusec.net/projects/anc/
    - NTSTATUS values (Windows `%errorlevel%` codes)
    https://msdn.microsoft.com/en-us/library/cc704588.aspx
    - Official intro and reference for Windows-based graphical user interfaces
    https://docs.microsoft.com/en-us/windows/desktop/winmsg/windowing

    ---

  13. @mikesmullin mikesmullin revised this gist Nov 17, 2018. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion x86-assembly-notes.md
    @@ -104,7 +104,7 @@ References:
    https://stackoverflow.com/questions/21165678/why-64-bit-mode-long-mode-doesnt-use-segment-registers
    - How much memory can a 64-bit machine address? (physically, logically, and theoretically)
    https://superuser.com/questions/168114/how-much-memory-can-a-64bit-machine-address-at-a-time
    - Open Security Training: Intermedia Intel x86: Architecture, Assembly, and Applications
    - Open Security Training: Intermediate Intel x86: Architecture, Assembly, and Applications
    https://www.youtube.com/playlist?list=PL8F8D45D6C1FFD177

    ##### REX Prefix Byte Data Structure (8 bits)
    @@ -854,6 +854,8 @@ References:
    https://www.nasm.us/doc/
    - NASM Tutorial
    http://cs.lmu.edu/~ray/notes/nasmtutorial/
    - SASM: Simple crossplatform IDE for NASM, MASM, GAS, FASM assembly languages
    https://dman95.github.io/SASM/english.html

    ---

  14. @mikesmullin mikesmullin revised this gist Nov 16, 2018. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion x86-assembly-notes.md
    @@ -873,7 +873,7 @@ References:
    - GDB Internals
    http://wwwcdf.pd.infn.it/localdoc/gdbint.pdf
    - Windbg Commands
    http://windbg.info/doc/1-common-cmds.html
    https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/commands
    - Hex-Rays Interactive Disassembler (IDA); most professional, but expensive
    https://www.hex-rays.com/products/ida/index.shtml
    - Binary Ninja; less featureful but cheap, modern interactive disassembler
    @@ -974,6 +974,8 @@ References:
    https://en.wikipedia.org/wiki/Address_space_layout_randomization
    - In 2017 "ASLR⊕Cache" attack demonstrated defeating ASLR from a web browser using JavaScript
    https://www.vusec.net/projects/anc/
    - NTSTATUS values (Windows `%errorlevel%` codes)
    https://msdn.microsoft.com/en-us/library/cc704588.aspx

    ---

  15. @mikesmullin mikesmullin revised this gist Nov 15, 2018. 1 changed file with 25 additions and 3 deletions.
    28 changes: 25 additions & 3 deletions x86-assembly-notes.md
    @@ -885,7 +885,8 @@ References:

    Windows executables (`*.exe`, `*.dll`) use **Portable Executable** (`PE`) format,
    which is a wrapper around the **Common Object File Format** (`COFF`), which is
    used by binary linker files (`*.obj`, `*.lib`).
    used by binary linker files (`*.obj`, `*.lib`). Technically Windows 64-bit uses
    a version internally called PE32+.

    A linker (e.g., `link.exe`, `cl.exe`, `ld`) is basically designed to parse one
    or more `COFF` files, and wrap them into a single executable with a `PE` header.
    @@ -895,9 +896,12 @@ Here is some useful trivia about that:
    - Microsoft COFF is an extended version of the original by AT&T.
    - `.obj` and `.lib` files contain a simple table data structure mapping
    unique ASCII string symbol names to code or address offsets in another file.
    - `.lib` may include source code, but most of the time (e.g., in Visual Studio)
    they are just headers with pointers to address offsets in a `.dll` which must
    - `.lib` may include compiled object code (static), but most of the time (e.g., in Visual Studio)
    they are just header stubs (dynamic) with pointers to address offsets in a `.dll` which must
    match the exact release version and compiler used.
    - Confusingly, there is no trivial way to tell static and dynamic `.lib` files apart,
    except that [dynamic] import libraries for DLLs will be much smaller than the
    matching static library would be.
    - `.lib` files may only be used at compile time to build statically linked binaries.
    - `.dll` files are intended to be used only at runtime, as dynamically linked binaries.
    - Technically `.dll` files contain enough information that a reverse engineer could
    @@ -912,6 +916,9 @@ Here is some useful trivia about that:
    when, and what effect they have on the `.dll` functions.
    Though a determined hacker could successfully guess them by looking at example
    code which uses the `.dll`, or via fuzz testing.
    - The version of `gcc` toolchain GNU linker (`ld`) ported to Windows can statically
    link using `.dll` inputs directly, which means it is able to implicitly synthesize
    the normally required but missing `.lib` stubs automagically!
    - **Decorated names** or **mangled names** are a symbol naming convention used in the
    COFF files. They are a series of ASCII prefixes and suffixes which guarantee that
    each function is named uniquely when merged into the same flat COFF table format.
    @@ -934,6 +941,11 @@ Here is some useful trivia about that:
    References:
    - PE Format
    https://docs.microsoft.com/en-us/windows/desktop/Debug/pe-format
    - MSDN Article from 2002 going into tremendous depth on history and intentions
    http://www.delphibasics.info/home/delphibasicsarticles/anin-depthlookintothewin32portableexecutablefileformat-part1
    http://www.delphibasics.info/home/delphibasicsarticles/anin-depthlookintothewin32portableexecutablefileformat-part2
    - MSDN Article from Mar 2002 detailing steps the Windows Loader takes with PE binaries
    https://www.cnblogs.com/binsys/articles/2711010.html
    - Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
    https://msdn.microsoft.com/en-us/library/ms809762.aspx
    - Portable Executable
    @@ -952,6 +964,16 @@ References:
    https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names?view=vs-2017
    - Linking Explicitly
    https://msdn.microsoft.com/en-us/library/784bt7z7.aspx
    - Creating the smallest possible PE executable
    https://web.archive.org/web/20101024125357/http://www.phreedom.org:80/solar/code/tinype/
    - DLL search order
    https://docs.microsoft.com/en-us/windows/desktop/dlls/dynamic-link-library-search-order
    - CppCon 2017: James McNellis “Everything You Ever Wanted to Know about DLLs”
    https://www.youtube.com/watch?v=JPQWQfDhICA
    - Address Space Layout Randomization (ASLR)
    https://en.wikipedia.org/wiki/Address_space_layout_randomization
    - In 2017 "ASLR⊕Cache" attack demonstrated defeating ASLR from a web browser using JavaScript
    https://www.vusec.net/projects/anc/

    ---

  16. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions x86-assembly-notes.md
    @@ -948,6 +948,10 @@ References:
    http://wjradburn.com/software/
    - Difference between .lib and .dll
    http://www.differencebetween.net/technology/difference-between-lib-and-dll/
    - Decorated/Mangled Names
    https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names?view=vs-2017
    - Linking Explicitly
    https://msdn.microsoft.com/en-us/library/784bt7z7.aspx

    ---

  17. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion x86-assembly-notes.md
    @@ -1000,6 +1000,8 @@ References:
    https://www.youtube.com/watch?v=oaVwzYN6BP4
    - Visual x86, x64, and ARM Emulator
    https://www.codeproject.com/Articles/478527/X86-ARM-Emulator
    - Build an 8-bit computer from scratch
    https://eater.net/8bit/parts
    - C to Linux x86-64 Assembly (ASM) examples
    https://gist.github.com/mikesmullin/6330894
    - Linux x86_64 Syscall Table
    @@ -1013,4 +1015,4 @@ References:
    - Intel® 64 and IA-32 Architectures Software Developer Manuals
    https://software.intel.com/en-us/articles/intel-sdm
    - AMD64 Architecture Programmer's Manual
    https://www.amd.com/system/files/TechDocs/24594.pdf
    https://www.amd.com/system/files/TechDocs/24594.pdf
  18. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion x86-assembly-notes.md
    @@ -906,7 +906,7 @@ Here is some useful trivia about that:
    constants passed as function arguments. These are typically shared in the form of
    a C header (`*.h`) file, as part of an SDK (e.g,
    [windows sdk](https://docs.microsoft.com/en-us/windows/desktop/winmsg/wm-destroy)
    , [opengl sdk](view-source:https://www.khronos.org/registry/OpenGL/api/GLES2/gl2.h)),
    , [opengl sdk](https://www.khronos.org/registry/OpenGL/api/GLES2/gl2.h)),
    if the developer wants you to have them.
    The other thing you may not have is the documentation about what inputs are valid,
    when, and what effect they have on the `.dll` functions.
  19. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion x86-assembly-notes.md
    @@ -906,7 +906,7 @@ Here is some useful trivia about that:
    constants passed as function arguments. These are typically shared in the form of
    a C header (`*.h`) file, as part of an SDK (e.g,
    [windows sdk](https://docs.microsoft.com/en-us/windows/desktop/winmsg/wm-destroy)
    , [opengl sdk](https://www.khronos.org/registry/OpenGL/api/GLES2/gl2.h)),
    , [opengl sdk](view-source:https://www.khronos.org/registry/OpenGL/api/GLES2/gl2.h)),
    if the developer wants you to have them.
    The other thing you may not have is the documentation about what inputs are valid,
    when, and what effect they have on the `.dll` functions.
  20. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion x86-assembly-notes.md
    @@ -938,7 +938,7 @@ References:
    https://msdn.microsoft.com/en-us/library/ms809762.aspx
    - Portable Executable
    https://en.wikipedia.org/wiki/Portable_Executable
    - Official Microsoft PE/COFF Technical Specification for Rev 6., 1999
    - Official Microsoft PE/COFF Technical Specification for Rev 6., 1999
    https://courses.cs.washington.edu/courses/cse378/03wi/lectures/LinkerFiles/coff.pdf
    - Handy Quick-Reference Posters
    https://github.com/corkami/pics/blob/master/binary/README.md#executables
  21. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions x86-assembly-notes.md
    @@ -913,8 +913,8 @@ Here is some useful trivia about that:
    Though a determined hacker could successfully guess them by looking at example
    code which uses the `.dll`, or via fuzz testing.
    - **Decorated names** or **mangled names** are a symbol naming convention used in the
    `.obj` files. They are a series of ASCII prefix and suffixes which guarantee that
    each function is named uniquely when merged into the same flat `.obj` table format.
    COFF files. They are a series of ASCII prefixes and suffixes which guarantee that
    each function is named uniquely when merged into the same flat COFF table format.
    The additional data mangled into the name includes:
    - The **function name**.
    - The **class name** that the function is a member of, if it is a member function.
  22. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 18 additions and 24 deletions.
    42 changes: 18 additions & 24 deletions x86-assembly-notes.md
    @@ -65,6 +65,10 @@ References:
    https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
    - Why is Displacement limited to 32-bits?
    https://stackoverflow.com/questions/31853189/x86-64-assembly-why-displacement-not-64-bits
    - Opcode Reference (Complex)
    http://ref.x86asm.net/
    - Opcode Reference (Simple)
    http://www.felixcloutier.com/x86/

    #### The Prefix

    @@ -120,7 +124,8 @@ Trivia:
    References:
    - Nice illustration of REX bits being prepended
    https://paul.bone.id.au/2018/09/26/more-x86-addressing/

    - Good explanation of encoding the REX prefix for Long mode 64-bit registers
    https://www.systutorials.com/72643/beginners-guide-x86-64-instruction-encoding/

    ### The Operation Code (Opcode)

    @@ -338,7 +343,9 @@ References:
    http://lomont.org/Math/Papers/2009/Introduction%20to%20x64%20Assembly.pdf
    - x86 Oddities, Ange Albertini (Reverse Engineer), 2017
    https://github.com/corkami/docs/blob/master/x86/x86.md

    - Good table of Registers
    https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture
    https://www.tortall.net/projects/yasm/manual/html/arch-x86-registers.html

    ### The `Memory` Address Operand

    @@ -436,6 +443,8 @@ References:
    https://stackoverflow.com/a/33328318
    - Addressing modes
    https://en.wikipedia.org/wiki/Addressing_mode#Simple_addressing_modes_for_data
    - Memory Translation and Segmentation
    https://manybutfinite.com/post/memory-translation-and-segmentation/

    ---

    @@ -713,6 +722,8 @@ References:
    https://stackoverflow.com/questions/21021223/how-does-the-gcc-determine-stack-size-the-function-based-on-c-will-use
    - C dynamic memory allocation
    https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
    - Anatomy of a Program in Memory
    https://manybutfinite.com/post/anatomy-of-a-program-in-memory/

    ---

    @@ -841,6 +852,8 @@ References:
    https://www.grc.com/smgassembly.htm
    - Netwide Assembler (NASM) Documentation
    https://www.nasm.us/doc/
    - NASM Tutorial
    http://cs.lmu.edu/~ray/notes/nasmtutorial/

    ---

    @@ -971,35 +984,18 @@ References:

    ---

    # Appendix: General References
    # Appendix: Miscellaneous Tools & References

    - Manually assembling mnemonic opcodes to binary machine code
    https://en.wikibooks.org/wiki/X86_Assembly/Machine_Language_Conversion#8086_instruction_format_(16_bit)
    - The x86 Instruction Structure
    https://www.codeproject.com/articles/662301/x-instruction-encoding-revealed-bit-twiddling-fo
    - X86-64 Instruction Encoding
    https://wiki.osdev.org/X86-64_Instruction_Encoding
    - CPU Rings Privilege and Protection
    https://manybutfinite.com/post/cpu-rings-privilege-and-protection/
    - Memory Translation and Segmentation
    https://manybutfinite.com/post/memory-translation-and-segmentation/
    - Stack vs. Heap: RAM Memory Layout
    https://imgur.com/gallery/DflKz1C
    - Opcode Reference (Complex)
    http://ref.x86asm.net/
    - Opcode Reference (Simple)
    http://www.felixcloutier.com/x86/
    - Good table of Registers
    https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture
    https://www.tortall.net/projects/yasm/manual/html/arch-x86-registers.html
    - Interesting overview from Haskell to Machine code
    http://www.stephendiehl.com/posts/monads_machine_code.html
    - Good explanation of encoding the REX prefix for Long mode 64-bit registers
    https://www.systutorials.com/72643/beginners-guide-x86-64-instruction-encoding/
    - Intel: Introduction to x64 Assembly (an official guide)
    https://software.intel.com/en-us/articles/introduction-to-x64-assembly/
    - Determining whether Operand and Address size is 8, 16, 32, or 64 bits
    https://wiki.osdev.org/X86-64_Instruction_Encoding#Operand-size_and_address-size_override_prefix
    - Punching Cards (for FORTRAN programming)
    https://www.youtube.com/watch?v=oaVwzYN6BP4
    - Visual x86, x64, and ARM Emulator
    @@ -1008,15 +1004,13 @@ References:
    https://gist.github.com/mikesmullin/6330894
    - Linux x86_64 Syscall Table
    http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
    - Anatomy of a Program in Memory
    https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
    - How Rust encodes exceptions and interrupts
    https://os.phil-opp.com/handling-exceptions/
    - How Debuggers Work w/ Breakpoints
    https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints
    - Ralf Brown's BIOS Interrupt List
    http://www.ctyme.com/rbrown.htm
    - NASM Tutorial
    http://cs.lmu.edu/~ray/notes/nasmtutorial/
    - Intel® 64 and IA-32 Architectures Software Developer Manuals
    https://software.intel.com/en-us/articles/intel-sdm
    - AMD64 Architecture Programmer's Manual
    https://www.amd.com/system/files/TechDocs/24594.pdf
  23. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 3 additions and 5 deletions.
    8 changes: 3 additions & 5 deletions x86-assembly-notes.md
    @@ -900,11 +900,9 @@ Here is some useful trivia about that:
    Though a determined hacker could successfully guess them by looking at example
    code which uses the `.dll`, or via fuzz testing.
    - **Decorated names** or **mangled names** are a symbol naming convention used in the
    `.obj` files. They are a series of ASCII prefix and suffixes which communicate
    additional detail about the function in order to reduce the likelihood that
    a function by the same name but different overloaded signature will not be
    clobbered when merged into the same flat `.obj` table format. The additional data
    mangled into the name includes:
    `.obj` files. They are a series of ASCII prefix and suffixes which guarantee that
    each function is named uniquely when merged into the same flat `.obj` table format.
    The additional data mangled into the name includes:
    - The **function name**.
    - The **class name** that the function is a member of, if it is a member function.
    This may include the class that encloses the class that contains the function, and so on.
  24. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion x86-assembly-notes.md
    @@ -900,7 +900,7 @@ Here is some useful trivia about that:
    Though a determined hacker could successfully guess them by looking at example
    code which uses the `.dll`, or via fuzz testing.
    - **Decorated names** or **mangled names** are a symbol naming convention used in the
    `.obj` files. they are a series of ASCII prefix and suffixes which communicate
    `.obj` files. They are a series of ASCII prefix and suffixes which communicate
    additional detail about the function in order to reduce the likelihood that
    a function by the same name but different overloaded signature will not be
    clobbered when merged into the same flat `.obj` table format. The additional data
  25. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions x86-assembly-notes.md
    @@ -861,6 +861,10 @@ References:
    http://wwwcdf.pd.infn.it/localdoc/gdbint.pdf
    - Windbg Commands
    http://windbg.info/doc/1-common-cmds.html
    - Hex-Rays Interactive Disassembler (IDA); most professional, but expensive
    https://www.hex-rays.com/products/ida/index.shtml
    - Binary Ninja; less featureful but cheap, modern interactive disassembler
    https://binary.ninja/

    ---

  26. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions x86-assembly-notes.md
    @@ -882,15 +882,15 @@ Here is some useful trivia about that:
    they are just headers with pointers to address offsets in a `.dll` which must
    match the exact release version and compiler used.
    - `.lib` files may only be used at compile time to build statically linked binaries.
    - `.dll` files are intended to only be used at runtime as dynamically linked binaries,
    - `.dll` files are intended to only be used at runtime as dynamically linked binaries.
    - Technically `.dll` files contain enough information that a reverse engineer could
    statically link them without a `.lib`, if they wanted to.
    - If you only have a `.dll`, you may be missing the compile-time
    constants passed as function arguments. These are typically shared in the form of
    a C header (`*.h`) file, as part of an SDK (e.g.,
    [windows sdk](https://docs.microsoft.com/en-us/windows/desktop/winmsg/wm-destroy)
    , [opengl sdk](https://www.khronos.org/registry/OpenGL/api/GLES2/gl2.h)
    ), if the developer wants you to have them.
    , [opengl sdk](https://www.khronos.org/registry/OpenGL/api/GLES2/gl2.h)),
    if the developer wants you to have them.
    The other thing you may not have is the documentation about what inputs are valid,
    when, and what effect they have on the `.dll` functions.
    Though a determined hacker could successfully guess them by looking at example
  27. @mikesmullin mikesmullin revised this gist Nov 13, 2018. 1 changed file with 93 additions and 1 deletion.
    94 changes: 93 additions & 1 deletion x86-assembly-notes.md
    @@ -810,6 +810,40 @@ References:

    ---

    # Appendix: Brief History of Assemblers

    One of the earliest commercial-grade assembler tools was **Microsoft Macro Assembler** (`MASM`) in 1981.
    It was initially marketed for commercial use, and included documentation.
    Beginning with v7 (1991) it was only available packaged with various
    Microsoft SDKs and C compilers, and its license required you to own a copy of
    Visual Studio. Since then its documentation has also become sparse and difficult
    to get ahold of.

    Its early influence led to many derivatives; importantly, it inspired the
    open-source **Netwide Assembler** (`NASM`) project, which is basically MASM with
    improvements that allow it to work across all platforms.

    Some hardcore enthusiasts still author primarily in MASM and hone their techniques
    by collecting, preserving, and resharing rare code artifacts from fellow enthusiasts.

    Today there are numerous assemblers to choose from, including the GNU Project's
    GNU Assembler (`GAS`), which ships as part of GNU Binutils, but these are the most
    common choices.

    References:
    - Current [but sparse] Official Microsoft Macro Assembler Reference
    https://docs.microsoft.com/en-us/cpp/assembler/masm/microsoft-macro-assembler-reference?view=vs-2017
    - PCjs Project: kindly hosted mirror of old Microsoft Macro Assembler 5.00 Manuals (1987)
    https://www.pcjs.org/pubs/pc/software/tools/microsoft/masm/5.00/
    - Third-party community support forums (anecdotal information and references)
    http://www.masm32.com/board/
    - Steve Gibson's MASM enthusiast page
    https://www.grc.com/smgassembly.htm
    - Netwide Assembler (NASM) Documentation
    https://www.nasm.us/doc/

    ---

    # Appendix: Reverse Engineering & Malware Analysis

    References:
    @@ -830,7 +864,57 @@ References:

    ---

    # Appendix: Windows PE Binary format
    # Appendix: Windows PE/COFF Binary format

    Windows executables (`*.exe`, `*.dll`) use **Portable Executable** (`PE`) format,
    which is a wrapper around the **Common Object File Format** (`COFF`), which is
    used by binary linker files (`*.obj`, `*.lib`).

    A linker (e.g. `link.exe`, `ld`, or `cl.exe`, which invokes `link.exe` for you) is
    basically designed to parse one or more `COFF` files, and wrap them into a single
    executable with a `PE` header.
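
To make the wrapping concrete, here is a minimal Python sketch that locates the `PE` signature and the 20-byte COFF file header inside an image. It relies only on the documented layout (the `MZ` magic, the 4-byte offset stored at `0x3C`, then `PE\0\0` followed by the COFF header). The function names `read_coff_header` and `fake_image`, and the sample field values, are made up for illustration; a real `.exe` has far more structure than this synthetic one.

```python
import struct

COFF_FIELDS = (
    "machine", "sections", "timestamp", "symtab_offset",
    "symbol_count", "optional_header_size", "characteristics",
)

def read_coff_header(image: bytes) -> dict:
    """Return the COFF file header fields of a PE image as a dict."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the signature (little-endian fields).
    values = struct.unpack_from("<HHIIIHH", image, pe_offset + 4)
    return dict(zip(COFF_FIELDS, values))

def fake_image() -> bytes:
    """Build a tiny synthetic image (assumption: sample values only)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature right after the stub
    coff = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 0xF0, 0x0022)
    return bytes(dos) + b"PE\0\0" + coff

header = read_coff_header(fake_image())
print(hex(header["machine"]))  # 0x8664 is the machine type for x64
print(header["sections"])
```

To try it on a real file, pass `open(path, "rb").read()` instead of `fake_image()`.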

    Here is some useful trivia about that:
    - `.obj` is Windows COFF, `.o` is the equivalent Linux ELF; same purpose, different formats.
    - Microsoft COFF is an extended version of the original by AT&T.
    - `.obj` and `.lib` files contain a simple table data structure mapping
    unique ASCII string symbol names to code or address offsets in another file.
    - `.lib` may include compiled code, but most of the time (e.g., in Visual Studio)
    they are just headers with pointers to address offsets in a `.dll` which must
    match the exact release version and compiler used.
    - `.lib` files may only be used at compile time to build statically linked binaries.
    - `.dll` files are intended to only be used at runtime as dynamically linked binaries,
    - Technically `.dll` files contain enough information that a reverse engineer could
    statically link them without a `.lib`, if they wanted to.
    - If you only have a `.dll`, you may be missing the compile-time
    constants passed as function arguments. These are typically shared in the form of
    a C header (`*.h`) file, as part of an SDK (e.g.,
    [windows sdk](https://docs.microsoft.com/en-us/windows/desktop/winmsg/wm-destroy)
    , [opengl sdk](https://www.khronos.org/registry/OpenGL/api/GLES2/gl2.h)
    ), if the developer wants you to have them.
    The other thing you may not have is the documentation about what inputs are valid,
    when, and what effect they have on the `.dll` functions.
    Though a determined hacker could successfully guess them by looking at example
    code which uses the `.dll`, or via fuzz testing.
    - **Decorated names** or **mangled names** are a symbol naming convention used in the
    `.obj` files. they are a series of ASCII prefix and suffixes which communicate
    additional detail about the function in order to reduce the likelihood that
    a function by the same name but different overloaded signature will not be
    clobbered when merged into the same flat `.obj` table format. The additional data
    mangled into the name includes:
    - The **function name**.
    - The **class name** that the function is a member of, if it is a member function.
    This may include the class that encloses the class that contains the function, and so on.
    - The **namespace** the function belongs to, if it is part of a namespace.
    - The C function **parameter types**, in order.
    - The **calling convention**.
    - The **return type** of the function.
    - You can decode decorated/mangled names using supplied tools, like so:
    ```
    > "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\dumpbin.exe" /symbols "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\lib\amd64\msvcrt.lib"
    > "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\undname.exe" "??$?RUTlsDtorNode@@@__crt_internal_free_policy@@QEBAXQEBUTlsDtorNode@@@Z"
    Undecoration of :- "??$?RUTlsDtorNode@@@__crt_internal_free_policy@@QEBAXQEBUTlsDtorNode@@@Z"
    is :- "public: void __cdecl __crt_internal_free_policy::operator()<struct TlsDtorNode>(struct TlsDtorNode const * __ptr64 const)const __ptr64"
    ```
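
Why decoration is needed at all can be sketched with a toy flat symbol table in Python. The `decorate` function below is a hypothetical scheme invented for this illustration. It is not the MSVC decoration format, and the offsets are made-up numbers. It only shows that overloads collide under a plain name and stay distinct once scope and parameter types are folded into the name.

```python
def decorate(name, param_types, scope=()):
    """Hypothetical decoration: name, enclosing scope, then parameter types."""
    return "@".join([name, *scope]) + "$" + "_".join(param_types)

# (function name, parameter types, enclosing class, code offset)
overloads = [
    ("area", ("int",), (), 0x1000),
    ("area", ("double",), (), 0x1040),
    ("area", ("double",), ("Shape",), 0x1080),
]

plain = {}      # flat table keyed by the bare function name
decorated = {}  # flat table keyed by the decorated name
for name, params, scope, offset in overloads:
    plain[name] = offset  # later overloads clobber earlier ones
    decorated[decorate(name, params, scope)] = offset

print(len(plain))      # 1
print(len(decorated))  # 3
print(sorted(decorated))
```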

    References:
    - PE Format
    @@ -839,8 +923,16 @@ References:
    https://msdn.microsoft.com/en-us/library/ms809762.aspx
    - Portable Executable
    https://en.wikipedia.org/wiki/Portable_Executable
    - Official Microsoft PE/COFF Technical Specification for Rev 6., 1999
    https://courses.cs.washington.edu/courses/cse378/03wi/lectures/LinkerFiles/coff.pdf
    - Handy Quick-Reference Posters
    https://github.com/corkami/pics/blob/master/binary/README.md#executables
    - CFF Explorer Suite: view structure of PE files (not COFF files tho)
    https://ntcore.com/?page_id=388
    - PEView: view structure of 32-bit PE/COFF files
    http://wjradburn.com/software/
    - Difference between .lib and .dll
    http://www.differencebetween.net/technology/difference-between-lib-and-dll/

    ---

  28. @mikesmullin mikesmullin revised this gist Nov 11, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion x86-assembly-notes.md
    @@ -305,7 +305,7 @@ when referring to these registers, which describes both a) operand width, and b)
    While there are several places you may reference a register, including `MODRM.reg`, `MODRM.rm`, `SIB.index`, `SIB.base`,
    and `PO.reg`, you'll find they all use the same `3` or `4`-bit mapping convention, as follows:

    |Register<br>Reference|(`3`-bit / `4th`-bit=`0b1`)<br>Low `8`-bits³|<br>High `8`-bits¹ ³|<br>Low `16`-bits|<br>Low `32`-bits⁴|<br>Full `64`-bit Register
    |Register<br>Reference|<br>Low `8`-bits³|(`3`-bit / `4th`-bit=`0b1`)<br>High `8`-bits¹ ³|<br>Low `16`-bits|<br>Low `32`-bits⁴|<br>Full `64`-bit Register
    -|-|-|-|-|-
    `0b000`| |`AL`/`R8B` |`AX`/`R8W` |`EAX`/`R8D` |`RAX`/`R8`
    `0b001`| |`CL`/`R9B` |`CX`/`R9W` |`ECX`/`R9D` |`RCX`/`R9`
  29. @mikesmullin mikesmullin revised this gist Nov 11, 2018. 1 changed file with 52 additions and 18 deletions.
    70 changes: 52 additions & 18 deletions x86-assembly-notes.md
    @@ -292,27 +292,29 @@ As a helpful mnemonic convention when programming assembly and referencing docum
    when referring to these registers, which describes both a) operand width, and b) where those bits are located within the full register.

    ```
    [0100011101001111010011110100010001001010010011110100001000100001] A register
    64 32 16 8 0 offset
    | | | |<----->| AL (Low 8-bits)
    | | |<----->| AH (High 8-bits)
    | |<---------------------------->| EAX (Low 32-bits)
    |<------------------------------------------------------------>| RAX (Full 64-bit register)
    | If least significant byte first (little-endian)
    A register [0100011101001111010011110100010001001010010011110100001000100001]
    offset 0 8 16 32 64
    (Low 8-bits) AL |<----->| | | |
    (High 8-bits) AH |<----->| | |
    (Low 16-bits) AX |<------------->| | |
    (Low 32-bits) EAX |<---------------------------->| |
    (Full 64-bit register) RAX |<------------------------------------------------------------>|
    ```

    While there are several places you may reference a register, including `MODRM.reg`, `MODRM.rm`, `SIB.index`, `SIB.base`,
    and `PO.reg`, you'll find they all use the same `3` or `4`-bit mapping convention, as follows:

    |`3`-bit<br>Reference|(`4th` bit = `0b0`/`0b1`)<br>Full `64`-bit Register|<br>Low `32`-bits|<br>Low `16`-bits|<br>Low `8`-bits³|<br>High `8`-bits¹ ³
    |Register<br>Reference|(`3`-bit / `4th`-bit=`0b1`)<br>Low `8`-bits³|<br>High `8`-bits¹ ³|<br>Low `16`-bits|<br>Low `32`-bits|<br>Full `64`-bit Register
    -|-|-|-|-|-
    `0b000`|`RAX`/`R8`|`EAX`/`R8D`|`AX`/`R8W`|`AL`/`R8B`
    `0b001`|`RCX`/`R9`|`ECX`/`R9D`|`CX`/`R9W`|`CL`/`R9B`
    `0b010`|`RDX`/`R10`|`EDX`/`R10D`|`DX`/`R10W`|`DL`/`R10B`
    `0b011`|`RBX`/`R11`|`EBX`/`R11D`|`BX`/`R11W`|`BL`/`R11B`
    `0b100`|`RSP`/`R12`|`ESP`/`R12D`|`SP`/`R12W`|`SPL`²/`R12B`|`AH`
    `0b101`|`RBP`/`R13`|`EBP`/`R13D`|`BP`/`R13W`|`BPL`²/`R13B`|`CH`
    `0b110`|`RSI`/`R14`|`ESI`/`R14D`|`SI`/`R14W`|`SIL`²/`R14B`|`DH`
    `0b111`|`RDI`/`R15`|`EDI`/`R15D`|`DI`/`R15W`|`DIL`²/`R15B`|`BH`
    `0b000`| |`AL`/`R8B` |`AX`/`R8W` |`EAX`/`R8D` |`RAX`/`R8`
    `0b001`| |`CL`/`R9B` |`CX`/`R9W` |`ECX`/`R9D` |`RCX`/`R9`
    `0b010`| |`DL`/`R10B` |`DX`/`R10W`|`EDX`/`R10D`|`RDX`/`R10`
    `0b011`| |`BL`/`R11B` |`BX`/`R11W`|`EBX`/`R11D`|`RBX`/`R11`
    `0b100`|`AH`|`SPL`²/`R12B`|`SP`/`R12W`|`ESP`/`R12D`|`RSP`/`R12`
    `0b101`|`CH`|`BPL`²/`R13B`|`BP`/`R13W`|`EBP`/`R13D`|`RBP`/`R13`
    `0b110`|`DH`|`SIL`²/`R14B`|`SI`/`R14W`|`ESI`/`R14D`|`RSI`/`R14`
    `0b111`|`BH`|`DIL`²/`R15B`|`DI`/`R15W`|`EDI`/`R15D`|`RDI`/`R15`

    **NOTES:**

    @@ -322,7 +324,7 @@ and `PO.reg`, you'll find they all use the same `3` or `4`-bit mapping conventio
    In fact, the lower `8` bits of `SP`, `BP`, `SI`, and `DI` were not even addressable before x64 `Long` mode.
    3. Both high and low `8`-bit registers are only directly addressable from `Real` mode or `Virtual 8086` mode,
    but you can always grab the larger-width version of the same register, and it will contain those bytes, of course.
    4. _WARNING:_ `32`-bit registers are sign-extended when used in `Long` mode.
    4. _WARNING:_ `32`-bit registers are zero-extended when used in `Long` mode.
    (e.g. `INC EAX` will zero the upper `32` bits of `RAX`, but `INC AL` or `INC AX` will leave the upper bits unchanged.)
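
The zero-extension rule in note 4 can be modelled in a few lines of Python. This is a simulation of the register-write behaviour under stated assumptions, not real machine state: `write_reg` is a helper name invented here, and it only covers writes to the low `8`, `16`, `32`, or `64` bits.

```python
MASK64 = (1 << 64) - 1

def write_reg(rax: int, value: int, width: int) -> int:
    """Return the new RAX after writing `value` to its low `width` bits."""
    mask = (1 << width) - 1
    if width == 32:
        # A 32-bit write clears bits 63..32 of the full register.
        return value & mask
    # 8-bit (low) and 16-bit writes leave the remaining upper bits untouched.
    return (rax & ~mask & MASK64) | (value & mask)

rax = 0xFFFFFFFFFFFFFFFF
after_inc_eax = write_reg(rax, (rax & 0xFFFFFFFF) + 1, 32)  # models INC EAX
after_inc_ax = write_reg(rax, (rax & 0xFFFF) + 1, 16)       # models INC AX
after_inc_al = write_reg(rax, (rax & 0xFF) + 1, 8)          # models INC AL

print(hex(after_inc_eax))  # 0x0
print(hex(after_inc_ax))   # 0xffffffffffff0000
print(hex(after_inc_al))   # 0xffffffffffffff00
```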

    References:
    @@ -714,6 +716,36 @@ References:

    ---

    # Appendix: Big vs. Little Endianness

    Endianness only applies at the byte level. It is the order in which the bytes of a
    multi-byte value are stored in memory. The x86 processor is little-endian, which
    means the least significant byte is stored first (at the lowest address).

    e.g. `0d2` is stored as the bytes `02 00 00 00` in `32`-bit little-endian, and `0d-2` as `fe ff ff ff`,
    whereas the same values in big-endian would be `00 00 00 02` and `ff ff ff fe`.
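
A quick way to see both byte orders side by side is Python's `int.to_bytes`. This is only a sketch of the memory layout of a `32`-bit value; the variable names are arbitrary.

```python
value = 2

# The same 32-bit integer laid out in each byte order.
little = value.to_bytes(4, byteorder="little", signed=True)
big = value.to_bytes(4, byteorder="big", signed=True)
print(little.hex(" "))  # 02 00 00 00
print(big.hex(" "))     # 00 00 00 02

# Negative values use two's complement, so -2 is fe ff ff ff in little-endian.
neg_little = (-2).to_bytes(4, byteorder="little", signed=True)
print(neg_little.hex(" "))  # fe ff ff ff

# Reading little-endian bytes back as big-endian gives a different number.
misread = int.from_bytes(little, byteorder="big", signed=True)
print(hex(misread))  # 0x2000000
```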

    **WARNING:** Sometimes tools like debuggers, disassemblers, calculators, etc. will
    print the values opposite to what you are expecting for the architecture in context.
    In these cases, they are simply trying to be too helpful. Be aware of the byte order,
    and maybe check with a hex editor or multiple tools to be certain when it matters.

    **QUIRK:** Registers are typically drawn with `EAX`, `AX`, `AH`, `AL` on the right-hand
    side, but in fact if you set a value like `0d24` in `RAX` and then print the values
    of `RAX`, `EAX`, `AX`, `AL` you will see they all equal `0d24`, and `AH` equals `0d0`,
    which means that their slices all actually begin from the least significant byte.
    I like to think that the registers are stored little endian too, for consistency,
    and that all those drawings are backwards. It's uncommon to set `RAX` only to select
    `EAX`, so it may not matter, but it's a little trivia to be aware of.
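
The experiment described in this quirk can be reproduced without a debugger by treating `RAX` as a 64-bit integer and masking out each view. This is a sketch only; `views` is a helper name made up for the example.

```python
MASK64 = (1 << 64) - 1

def views(rax: int) -> dict:
    """Return the sub-register views of a 64-bit RAX value."""
    rax &= MASK64
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFFFFFF,  # low 32 bits
        "AX": rax & 0xFFFF,       # low 16 bits
        "AL": rax & 0xFF,         # low 8 bits
        "AH": (rax >> 8) & 0xFF,  # bits 15..8
    }

small = views(24)
print(small)  # {'RAX': 24, 'EAX': 24, 'AX': 24, 'AL': 24, 'AH': 0}

# A value with distinct bytes shows which byte each view picks up.
wide = views(0x1122334455667788)
print({name: hex(v) for name, v in wide.items()})
```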

    References:
    - Endianness
    https://en.wikipedia.org/wiki/Endianness
    - Endianness inside CPU registers
    https://stackoverflow.com/questions/4504775/endianness-inside-cpu-registers

    ---

    # Appendix: Other Registers

    As you master your understanding of x86 architecture, there are a few registers
    @@ -791,6 +823,10 @@ References:
    http://wiki.cheatengine.org/index.php?title=Assembler&redirect=no
    - Types of Compiler Optimizations (useful to identify what you are reverse engineering)
    https://en.wikipedia.org/wiki/Compiler_optimization
    - GDB Internals
    http://wwwcdf.pd.infn.it/localdoc/gdbint.pdf
    - Windbg Commands
    http://windbg.info/doc/1-common-cmds.html

    ---

    @@ -884,8 +920,6 @@ References:
    https://os.phil-opp.com/handling-exceptions/
    - How Debuggers Work w/ Breakpoints
    https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints
    - GDB Internals
    http://wwwcdf.pd.infn.it/localdoc/gdbint.pdf
    - Ralf Brown's BIOS Interrupt List
    http://www.ctyme.com/rbrown.htm
    - NASM Tutorial
  30. @mikesmullin mikesmullin revised this gist Nov 11, 2018. 1 changed file with 20 additions and 0 deletions.
    20 changes: 20 additions & 0 deletions x86-assembly-notes.md
    @@ -707,6 +707,10 @@ References:
    https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
    - Strategies with various implementations of `malloc`
    https://softwareengineering.stackexchange.com/a/319060
    - How GCC calculates stack size and layout
    https://stackoverflow.com/questions/21021223/how-does-the-gcc-determine-stack-size-the-function-based-on-c-will-use
    - C dynamic memory allocation
    https://en.wikipedia.org/wiki/C_dynamic_memory_allocation

    ---

    @@ -768,6 +772,10 @@ Calculated in some of the following ways:

    **NOTE:** pointers are only used by JMP and CALL instructions.

    References:
    - Using Short/Relative vs. Far Jumps
    https://thestarman.pcministry.com/asm/2bytejumps.htm

    ---

    # Appendix: Reverse Engineering & Malware Analysis
    @@ -872,3 +880,15 @@ References:
    http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
    - Anatomy of a Program in Memory
    https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
    - How Rust encodes exceptions and interrupts
    https://os.phil-opp.com/handling-exceptions/
    - How Debuggers Work w/ Breakpoints
    https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints
    - GDB Internals
    http://wwwcdf.pd.infn.it/localdoc/gdbint.pdf
    - Ralf Brown's BIOS Interrupt List
    http://www.ctyme.com/rbrown.htm
    - NASM Tutorial
    http://cs.lmu.edu/~ray/notes/nasmtutorial/
    - AMD64 Architecture Programmer's Manual
    https://www.amd.com/system/files/TechDocs/24594.pdf