Gist fatgrass/cbf8ccccdae6ee9831d55817d88bf08a

Revisions

  1. yrp604 revised this gist Jun 8, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion xnu-syscall-life-amd64.md
    Original file line number Diff line number Diff line change
    @@ -44,7 +44,7 @@
    | +-----------------------+ | +---------------+
    | | | |
    +-------v---------+ +-------v---------+ +-----------v-----+ +-------v---------+
    |hndl_mach_scall64| |hndl_unix_scall64| |hndl_mdep_scall64| |hndl_diag_scall64|
    |hndl_unix_scall64| |hndl_mach_scall64| |hndl_mdep_scall64| |hndl_diag_scall64|
    +-------+---------+ +-------+---------+ +-------+---------+ +-------+---------+
    | | | |
    | | | |
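The revision above swaps the first two `hndl_*` boxes so that each one sits over the C function it actually reaches. As a hedged reading aid (not xnu source), that pairing can be written as a lookup keyed by the class byte of `rax`. The class values 1 to 4 and the shift of 24 come from the gist's own list. Treating the class as a full 8-bit field is my assumption, and `route_for` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Class values as listed in the gist; shift of 24 from its "<< 24" notation. */
#define CLASS_SHIFT 24
#define CLASS_MASK  (0xFFu << CLASS_SHIFT) /* assumption: 8-bit class field */

struct scall_route {
    unsigned    class_id;    /* value of (rax >> 24) & 0xff */
    const char *asm_handler; /* label hndl_syscall branches to */
    const char *c_function;  /* C function that label calls */
};

static const struct scall_route routes[] = {
    { 1, "hndl_mach_scall64", "mach_call_munger64" },
    { 2, "hndl_unix_scall64", "unix_syscall64"     },
    { 3, "hndl_mdep_scall64", "machdep_syscall64"  },
    { 4, "hndl_diag_scall64", "diagCall64"         },
};

/* Hypothetical helper: return the route for an rax value, or NULL when the
 * class is not one of the four handlers drawn in the chart. */
static const struct scall_route *route_for(uint64_t rax) {
    unsigned cls = (unsigned)((rax & CLASS_MASK) >> CLASS_SHIFT);
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        if (routes[i].class_id == cls)
            return &routes[i];
    }
    return NULL;
}
```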
  2. yrp604 revised this gist Mar 25, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion xnu-syscall-life-amd64.md
    @@ -107,7 +107,7 @@ From [OSDev](http://wiki.osdev.org/Sysenter), `sysenter` is not an interrupt, ra
    wrmsr64(MSR_IA32_SYSENTER_EIP, (uintptr_t)hi64_sysenter);
    ```

    This means when `sysenter` executes, the value of `rip` is set to `hi64_sysenter` which is defined in `xnu/osfmk/x86_64/idt64.s`.
    This means when `sysenter` executes, the value of `rip` is set to `hi64_sysenter`, which is defined in `xnu/osfmk/x86_64/idt64.s`.

    Interestingly, neither `int 0x80` nor `sysenter` will work on amd64. If we trace the code out, we always end up in the 32bit code path, which kicks us to the 64bit code path (we end up in `hndl_alltraps` which calls `user_trap` from `xnu/osfmk/i386/trap.c`. This _does not_ link us to any of the syscall dispatching that we need, and thus will not execute our system calls. As far as I can tell, from a 64bit binary you must enter the kernel through `syscall`.
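Because a 64-bit binary has to come in through `syscall`, here is a hedged sketch of issuing one by hand from C with GCC/Clang extended inline assembly. The following are assumptions, not statements from the gist: BSD `write` is number 4 and `exit` is number 1 (as in macOS `<sys/syscall.h>`), the first three arguments go in `rdi`, `rsi`, `rdx`, and xnu signals failure through the carry flag, which this sketch does not inspect. The raw call only compiles on Apple x86_64. On other platforms only the number-building helper remains.

```c
/* 0x02 << 24: the unix syscall class prefix described in the gist. */
#define UNIX_CLASS_PREFIX 0x2000000L

/* rax value for BSD syscall n (assumed: 4 = write, 1 = exit on macOS). */
static long unix_syscall_rax(long n) {
    return UNIX_CLASS_PREFIX | n;
}

#if defined(__APPLE__) && defined(__x86_64__)
/* Three-argument raw syscall. The instruction itself overwrites rcx (saved
 * rip) and r11 (saved rflags); the argument registers and r8-r10 are also
 * treated, conservatively, as possibly modified. The carry flag (error
 * indicator, assumed) is not checked: on failure rax holds an errno value. */
static long raw_syscall3(long rax, long a1, long a2, long a3) {
    __asm__ volatile("syscall"
                     : "+a"(rax), "+D"(a1), "+S"(a2), "+d"(a3)
                     :
                     : "rcx", "r8", "r9", "r10", "r11", "cc", "memory");
    return rax;
}

/* Example: write "hi\n" to fd 1, then exit(0), without going through libc. */
static void hello_then_exit(void) {
    static const char msg[] = "hi\n";
    raw_syscall3(unix_syscall_rax(4), 1, (long)msg, (long)(sizeof msg - 1));
    raw_syscall3(unix_syscall_rax(1), 0, 0, 0);
}
#endif
```

Dropping the `0x2000000` prefix from either call would leave the class byte at 0, which the gist's list does not route to any handler.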

  3. yrp604 revised this gist Mar 25, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions xnu-syscall-life-amd64.md
    @@ -28,6 +28,7 @@
    |sysenter|--->|hi64_sysenter|--------+ +->|hndl_alltraps|-->|user_trap|
    +--------+ +-------------+ +-------------+ +---------+
    +--------+ +-------------+ +--------------+ +----------------+ +-----------------+
    |syscall |--->|hi64_syscall |--->|L_dispatch_U64|--->|L_dispatch_64bit|--->|L_common_dispatch|-+
    +--------+ +-------------+ +--------------+ +----------------+ +-----------------+ |
  4. yrp604 revised this gist Mar 25, 2016. 1 changed file with 12 additions and 12 deletions.
    24 changes: 12 additions & 12 deletions xnu-syscall-life-amd64.md
    @@ -28,15 +28,15 @@
    |sysenter|--->|hi64_sysenter|--------+ +->|hndl_alltraps|-->|user_trap|
    +--------+ +-------------+ +-------------+ +---------+
    +--------+ +-------------+ +--------------+ +----------------+
    |syscall |--->|hi64_syscall |-------------->|L_dispatch_U64|-->|L_dispatch_64bit|-+
    +--------+ +-------------+ +--------------+ +----------------+ |
    |
    +----------------------------------------+
    |
    | +-----------------+ +------------+
    +->|L_common_dispatch|->|hndl_syscall|
    +-----------------+ +-+--+--+--+-+
    +--------+ +-------------+ +--------------+ +----------------+ +-----------------+
    |syscall |--->|hi64_syscall |--->|L_dispatch_U64|--->|L_dispatch_64bit|--->|L_common_dispatch|-+
    +--------+ +-------------+ +--------------+ +----------------+ +-----------------+ |
    |
    +-----------------------+
    |
    +-----v------+
    |hndl_syscall|
    +-+--+--+--+-+
    | | | |
    +-------------------------------------------+ | | |
    | | | |
    @@ -47,9 +47,9 @@
    +-------+---------+ +-------+---------+ +-------+---------+ +-------+---------+
    | | | |
    | | | |
    +-------v------+ +-------v----------+ +-------v---------+ +---v------+
    |unix_syscall64| |mach_call_munger64| |machdep_syscall64| |diagCall64|
    +--------------+ +------------------+ +-----------------+ +----------+
    +-------v------+ +-------v----------+ +-------v---------+ +----v-----+
    |unix_syscall64| |mach_call_munger64| |machdep_syscall64| |diagCall64|
    +--------------+ +------------------+ +-----------------+ +----------+
    ```

    ## The question
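The chart can also be transcribed as data. The sketch below copies only the labels drawn in the chart for a 64-bit caller. It is not derived from `idt64.s` directly, so treat any mismatch with the real source as a transcription error on my side. `path_for` and `reaches_hndl_syscall` are hypothetical helpers. The point it encodes is the one the text makes: every entry except `syscall` ends in `user_trap`.

```c
#include <stddef.h>
#include <string.h>

enum entry { ENTRY_INT80, ENTRY_INT81, ENTRY_INT82, ENTRY_SYSENTER, ENTRY_SYSCALL };

/* Labels after the entry stub, as drawn in the chart (64-bit caller). */
static const char *const rejected_tail[] = {
    "L_32bit_entry_check", "L_64bit_entry_reject", "L_dispatch_U64",
    "L_dispatch_64bit", "L_common_dispatch", "hndl_alltraps", "user_trap", NULL
};
static const char *const syscall_tail[] = {
    "L_dispatch_U64", "L_dispatch_64bit", "L_common_dispatch", "hndl_syscall", NULL
};

/* First box each entry instruction reaches. */
static const char *entry_stub(enum entry e) {
    switch (e) {
    case ENTRY_INT80:    return "idt64_unix_scall";
    case ENTRY_INT81:    return "idt64_mach_scall";
    case ENTRY_INT82:    return "idt64_mdep_scall";
    case ENTRY_SYSENTER: return "hi64_sysenter";
    case ENTRY_SYSCALL:  return "hi64_syscall";
    }
    return NULL;
}

/* Fill out[] with the full path (stub first); returns the label count. */
static size_t path_for(enum entry e, const char **out, size_t cap) {
    const char *const *tail = (e == ENTRY_SYSCALL) ? syscall_tail : rejected_tail;
    size_t n = 0;
    if (n < cap)
        out[n++] = entry_stub(e);
    for (size_t i = 0; tail[i] != NULL && n < cap; i++)
        out[n++] = tail[i];
    return n;
}

/* True if the path ends in syscall dispatch rather than in user_trap. */
static int reaches_hndl_syscall(enum entry e) {
    const char *path[16];
    size_t n = path_for(e, path, 16);
    return n > 0 && strcmp(path[n - 1], "hndl_syscall") == 0;
}
```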
  5. yrp604 revised this gist Mar 25, 2016. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions xnu-syscall-life-amd64.md
    @@ -90,7 +90,7 @@ To figure out where these values are pushed from registers to memory, we're goin
    We'll start at the definition of the interrupt handler in `xnu/osfmk/x86_64/idt_table.h`. Here we see a few interesting things:
    ```C
    ```
    USER_TRAP_SPC(0x80, idt64_unix_scall)
    USER_TRAP_SPC(0x81, idt64_mach_scall)
    USER_TRAP_SPC(0x82, idt64_mdep_scall)
    @@ -106,7 +106,9 @@ From [OSDev](http://wiki.osdev.org/Sysenter), `sysenter` is not an interrupt, ra
    wrmsr64(MSR_IA32_SYSENTER_EIP, (uintptr_t)hi64_sysenter);
    ```

    This means when `sysenter` executes, the value of `rip` is set to `hi64_sysenter` which is defined in `xnu/osfmk/x86_64/idt64.s`. However, interestingly neither `int 80` nor `sysenter` will work on amd64. If we trace the code out, we always end up in the 32bit code path, which kicks us to the 64bit code path (we end up in `hndl_alltraps` which calls `user_trap` from `xnu/osfmk/i386/trap.c`. This _does not_ link us to any of the syscall dispatching that we need, and thus will not execute our system calls. As far as I can tell, from a 64bit binary you must enter the kernel through `syscall`.
    This means when `sysenter` executes, the value of `rip` is set to `hi64_sysenter` which is defined in `xnu/osfmk/x86_64/idt64.s`.

    Interestingly, neither `int 0x80` nor `sysenter` will work on amd64. If we trace the code out, we always end up in the 32bit code path, which kicks us to the 64bit code path (we end up in `hndl_alltraps` which calls `user_trap` from `xnu/osfmk/i386/trap.c`. This _does not_ link us to any of the syscall dispatching that we need, and thus will not execute our system calls. As far as I can tell, from a 64bit binary you must enter the kernel through `syscall`.

    ### `syscall`
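The chart's two annotations (the IDT stubs push their dispatch function, and `L_64bit_entry_reject` overwrites it with `hndl_alltraps`) describe a function-pointer slot in the saved frame. This is the same slot that `hi64_syscall` fills with `hndl_syscall` at `ISF64_TRAPFN`. A hedged C model of that mechanism follows. `struct frame`, the `dispatch_*` functions and `simulate_int` are hypothetical stand-ins, and the 32-bit behaviour modeled here is only what the chart implies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved trap frame. */
struct frame {
    bool caller_is_64bit;
    const char *(*trapfn)(struct frame *); /* plays the role of ISF64_TRAPFN */
};

/* Hypothetical dispatchers; each just names where control would go. */
static const char *dispatch_unix(struct frame *f) { (void)f; return "unix syscall dispatch"; }
static const char *dispatch_mach(struct frame *f) { (void)f; return "mach trap dispatch"; }
static const char *dispatch_mdep(struct frame *f) { (void)f; return "mdep dispatch"; }
static const char *to_hndl_alltraps(struct frame *f) { (void)f; return "user_trap"; }

/* Each IDT stub (USER_TRAP_SPC 0x80/0x81/0x82) records its own dispatcher.
 * Other vectors are treated as ordinary traps (assumption). */
static void idt_stub(struct frame *f, int vector) {
    switch (vector) {
    case 0x80: f->trapfn = dispatch_unix; break;
    case 0x81: f->trapfn = dispatch_mach; break;
    case 0x82: f->trapfn = dispatch_mdep; break;
    default:   f->trapfn = to_hndl_alltraps; break;
    }
}

/* Models L_32bit_entry_check / L_64bit_entry_reject: a 64-bit caller that
 * arrived through an interrupt gets its saved dispatcher replaced. */
static void entry_check(struct frame *f) {
    if (f->caller_is_64bit)
        f->trapfn = to_hndl_alltraps;
}

/* Models L_common_dispatch: call whatever function the frame holds. */
static const char *common_dispatch(struct frame *f) {
    return f->trapfn(f);
}

static const char *simulate_int(int vector, bool caller_is_64bit) {
    struct frame f = { caller_is_64bit, NULL };
    idt_stub(&f, vector);
    entry_check(&f);
    return common_dispatch(&f);
}
```

In this model, the finding that neither `int 0x80` nor `sysenter` works for 64-bit code comes down to `entry_check`: for a 64-bit caller, the stub's choice of dispatcher is discarded before `common_dispatch` ever sees it.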

  6. yrp604 revised this gist Mar 25, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions xnu-syscall-life-amd64.md
    @@ -81,6 +81,7 @@ These come from a combination of the constants defined in `xnu/osfmk/mach/i386/s
    ## Tracing execution flow
    To figure out where these values are pushed from registers to memory, we're going to trace execution from from userland to the respective kernel function. There are three common ways (coming from a linux background) to transition from usermode to kernelmode:
    1. `int 0x80`
    2. `sysenter`
    3. `syscall`
  7. yrp604 revised this gist Mar 25, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion xnu-syscall-life-amd64.md
    @@ -76,7 +76,7 @@ Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent sysc
    For example: if we wanted unix syscall 1 (`exit()`), `rax` would need to be equal to `0x02 << 24 | 1`, or `0x2000001`. If we wanted mach trap 31 (`mach_msg()`), `rax` would need to be `0x100001f`.
    These come from a combination of the constants defined in `xnu/osfmk/mach/i386/syscall_sw.h` and `hndl_syscall` from `xnu/osfmk/x86_64/idt64.s`. Reading `hndl_syscall` will explain why when shellcoding for xnu you must add `0x20000000` to your syscall numbers -- otherwise they won't be appropriately dispatched to the right handler.
    These come from a combination of the constants defined in `xnu/osfmk/mach/i386/syscall_sw.h` and `hndl_syscall` from `xnu/osfmk/x86_64/idt64.s`. Reading `hndl_syscall` will explain why when shellcoding for xnu you must add `0x2000000` to your syscall numbers -- otherwise they won't be appropriately dispatched to the right handler.
    ## Tracing execution flow
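The following small worked check shows why the corrected constant matters. The helper names are mine. `syscall_sw.h` has its own macros for this, which may differ in detail, and the 24-bit number field is an assumption based on the `<< 24` shifts in the text. With the extra zero, the class byte becomes `0x20` instead of `0x02`, which matches none of the listed classes.

```c
#include <assert.h>
#include <stdint.h>

#define CLASS_SHIFT 24
#define NUMBER_MASK 0x00FFFFFFu /* assumption: low 24 bits hold the call number */

enum { CLASS_MACH = 1, CLASS_UNIX = 2, CLASS_MDEP = 3, CLASS_DIAG = 4, CLASS_IPC = 5 };

/* Build an rax value: class in bits 24..31, per-class call number below it. */
static uint32_t make_syscall(uint32_t cls, uint32_t number) {
    assert(number <= NUMBER_MASK); /* the number must not spill into the class byte */
    return (cls << CLASS_SHIFT) | number;
}

/* Recover the class byte that dispatch would inspect. */
static uint32_t syscall_class(uint32_t rax) {
    return rax >> CLASS_SHIFT;
}
```

`make_syscall(CLASS_UNIX, 1)` reproduces the `0x2000001` for `exit()` given in the text, and `make_syscall(CLASS_MACH, 31)` reproduces the `0x100001f` for `mach_msg()`.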
  8. yrp604 revised this gist Mar 25, 2016. 1 changed file with 5 additions and 3 deletions.
    8 changes: 5 additions & 3 deletions xnu-syscall-life-amd64.md
    @@ -66,14 +66,16 @@ Where `p` is the process executing the syscall, `uap` is a pointer to a struct c
    ## Background on syscall numbers
    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative syscall no.), unix syscalls (positive syscall no. under 0x6000), and PPC syscalls (positive syscalls no. over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls, and with different constants. The syscall number is stored in `rax`. The following defines the conditions under which various syscalls are dispatched to their respective handlers:
    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative syscall no.), unix syscalls (positive syscall no. under `0x6000`), and PPC syscalls (positive syscalls no. over `0x6000`) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls, and with different constants. The syscall number is stored in `rax`. The following defines how syscalls are dispatched based on the value in `rax`:
    * Mach Traps: `rax & 0x01 << 24`
    * Unix Syscall: `rax & 0x02 << 24`
    * Machine Dependent: `rax & 0x03 << 24`
    * Diagnostics: `rax & 0x04 << 24`
    * Mach IPC (unused?): `rax & 0x05 << 24`
    For example: if we wanted unix syscall 1 (`exit()`), `rax` would need to be equal to `0x02 << 24 | 1`, or `0x2000001`. If we wanted mach trap 31 (`mach_msg()`), `rax` would need to be `0x100001f`.
    These come from a combination of the constants defined in `xnu/osfmk/mach/i386/syscall_sw.h` and `hndl_syscall` from `xnu/osfmk/x86_64/idt64.s`. Reading `hndl_syscall` will explain why when shellcoding for xnu you must add `0x20000000` to your syscall numbers -- otherwise they won't be appropriately dispatched to the right handler.
    ## Tracing execution flow
    @@ -97,13 +99,13 @@ This is kinda cool -- we can directly jump into the dispatch functions by varryi

    ### `sysenter`

    From [OSDev](http://wiki.osdev.org/Sysenter), `sysenter` is not an interrupt, rather it's an instruction which transitions us to kernelspace from userspace. Specifically, the value of `rip` will be loaded from a model specific register (MSR) amoungst other things when the `sysenter` instruction is executed. A bit of grepping leads us to `osfmk/i386/mp_desc.c`:
    From [OSDev](http://wiki.osdev.org/Sysenter), `sysenter` is not an interrupt, rather it's an instruction which transitions us to kernelspace from userspace. Specifically, the value of `rip` will be loaded from a model specific register (MSR) amongst other things when the `sysenter` instruction is executed. A bit of grepping leads us to `osfmk/i386/mp_desc.c`:

    ```C
    wrmsr64(MSR_IA32_SYSENTER_EIP, (uintptr_t)hi64_sysenter);
    ```
    This means when `sysenter` executes, the value of `rip` is set to `hi64_sysenter` which is defined in `xnu/osfmk/x86_64/idt64.s`. However, interestingly either `int 80` nor `sysenter` will work on amd64. If we trace the code out, we always end up in the 32bit code path, which kicks us to the 64bit code path (we end up in `hndl_alltraps` which calls `user_trap` from `xnu/osfmk/i386/trap.c`. This _does not_ link us to any of the syscall dispatching that we need, and thus will not execute our system calls. As far as I can tell, from a 64bit binary you must enter the kernel through `sysenter`.
    This means when `sysenter` executes, the value of `rip` is set to `hi64_sysenter` which is defined in `xnu/osfmk/x86_64/idt64.s`. However, interestingly neither `int 80` nor `sysenter` will work on amd64. If we trace the code out, we always end up in the 32bit code path, which kicks us to the 64bit code path (we end up in `hndl_alltraps` which calls `user_trap` from `xnu/osfmk/i386/trap.c`. This _does not_ link us to any of the syscall dispatching that we need, and thus will not execute our system calls. As far as I can tell, from a 64bit binary you must enter the kernel through `syscall`.
    ### `syscall`
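A note on the bullet notation in this revision. Read literally as C, `rax & 0x03 << 24` parses as `rax & (0x03 << 24)`. That bit test is nonzero for classes 1, 2 and 3 alike, so it cannot be how the handlers are told apart. The minimal sketch below shows the difference, assuming the class sits in bits 24 to 31: dispatch needs to extract the class and compare it for equality.

```c
#include <stdbool.h>
#include <stdint.h>

/* Literal reading of the bullet list: a bitwise test against (cls << 24). */
static bool naive_matches(uint64_t rax, uint64_t cls) {
    return (rax & (cls << 24)) != 0; /* same as the list's rax & cls << 24 */
}

/* What dispatch needs: extract the class byte and compare for equality. */
static bool class_matches(uint64_t rax, uint64_t cls) {
    return ((rax >> 24) & 0xFF) == cls;
}
```

For `exit()` (`0x2000001`), `naive_matches(rax, 3)` is true, because bit 25 is set, while `class_matches(rax, 3)` is false.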
  9. yrp604 revised this gist Mar 25, 2016. 1 changed file with 12 additions and 12 deletions.
    24 changes: 12 additions & 12 deletions xnu-syscall-life-amd64.md
    @@ -16,26 +16,26 @@
    |int 0x80|--->|idt64_unix_scall|-+-+-+----->|L_32bit_entry_check|--->|L_64bit_entry_reject|-+
    +--------+ +----------------+ ^ ^ ^ +-------------------+ +--------------------+ |
    | | | |
    +--------+ +----------------+ | | | +-------------------------------------------------+
    |int 0x81|--->|idt64_mach_scall|-+ | | |
    +--------+ +----------------+ | | | +--------------+ +----------------+ +-----------------+
    | | +>|L_dispatch_U64|->|L_dispatch_64bit|->|L_common_dispatch|-+
    +--------+ +----------------+ | | | +--------------------------------------------------+
    |int 0x81|--->|idt64_mach_scall|-+ | | |
    +--------+ +----------------+ | | | +--------------+ +----------------+ +-----------------+
    | | +->|L_dispatch_U64|->|L_dispatch_64bit|->|L_common_dispatch|-+
    +--------+ +----------------+ | | +--------------+ +----------------+ +-----------------+ |
    |int 0x82|--->|idt64_mdep_scall|---+ | |
    +--------+ +----------------+ | +-----------------------------------------------------------+
    | |
    +--------+ +-------------+ | | +-------------+ +---------+
    |sysenter|--->|hi64_sysenter|--------+ +>|hndl_alltraps|-->|user_trap|
    +--------+ +----------------+ | +------------------------------------------------------------+
    | |
    +--------+ +-------------+ | | +-------------+ +---------+
    |sysenter|--->|hi64_sysenter|--------+ +->|hndl_alltraps|-->|user_trap|
    +--------+ +-------------+ +-------------+ +---------+
    +--------+ +-------------+ +--------------+ +----------------+
    |syscall |--->|hi64_syscall |-------------->|L_dispatch_U64|-->|L_dispatch_64bit|-+
    +--------+ +-------------+ +--------------+ +----------------+ |
    |
    +---------------------------------------+
    |
    | +-----------------+ +------------+
    +>|L_common_dispatch|->|hndl_syscall|
    +----------------------------------------+
    |
    | +-----------------+ +------------+
    +->|L_common_dispatch|->|hndl_syscall|
    +-----------------+ +-+--+--+--+-+
    | | | |
    +-------------------------------------------+ | | |
  10. yrp604 revised this gist Mar 25, 2016. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions xnu-syscall-life-amd64.md
    @@ -13,29 +13,29 @@
    v v
    +--------+ +----------------+ +-------------------+ +--------------------+
    |int 0x80+--->|idt64_unix_scall+-+-+-+----->|L_32bit_entry_check+--->|L_64bit_entry_reject+-+
    |int 0x80|--->|idt64_unix_scall|-+-+-+----->|L_32bit_entry_check|--->|L_64bit_entry_reject|-+
    +--------+ +----------------+ ^ ^ ^ +-------------------+ +--------------------+ |
    | | | |
    +--------+ +----------------+ | | | +-------------------------------------------------+
    |int 0x81+--->|idt64_mach_scall+-+ | | |
    |int 0x81|--->|idt64_mach_scall|-+ | | |
    +--------+ +----------------+ | | | +--------------+ +----------------+ +-----------------+
    | | +>|L_dispatch_U64+->|L_dispatch_64bit+->|L_common_dispatch+-+
    | | +>|L_dispatch_U64|->|L_dispatch_64bit|->|L_common_dispatch|-+
    +--------+ +----------------+ | | +--------------+ +----------------+ +-----------------+ |
    |int 0x82+--->|idt64_mdep_scall+---+ | |
    |int 0x82|--->|idt64_mdep_scall|---+ | |
    +--------+ +----------------+ | +-----------------------------------------------------------+
    | |
    +--------+ +-------------+ | | +-------------+ +---------+
    |sysenter+--->|hi64_sysenter+--------+ +>|hndl_alltraps+-->|user_trap|
    |sysenter|--->|hi64_sysenter|--------+ +>|hndl_alltraps|-->|user_trap|
    +--------+ +-------------+ +-------------+ +---------+
    +--------+ +-------------+ +--------------+ +----------------+
    |syscall +--->|hi64_syscall +-------------->|L_dispatch_U64+-->|L_dispatch_64bit+-+
    |syscall |--->|hi64_syscall |-------------->|L_dispatch_U64|-->|L_dispatch_64bit|-+
    +--------+ +-------------+ +--------------+ +----------------+ |
    |
    +---------------------------------------+
    |
    | +-----------------+ +------------+
    +>|L_common_dispatch+->|hndl_syscall|
    +>|L_common_dispatch|->|hndl_syscall|
    +-----------------+ +-+--+--+--+-+
    | | | |
    +-------------------------------------------+ | | |
  11. yrp604 revised this gist Mar 25, 2016. 1 changed file with 8 additions and 8 deletions.
    16 changes: 8 additions & 8 deletions xnu-syscall-life-amd64.md
    @@ -13,29 +13,29 @@
    v v
    +--------+ +----------------+ +-------------------+ +--------------------+
    |int 0x80+---->idt64_unix_scall+-+-+-+------>L_32bit_entry_check+---->L_64bit_entry_reject+-+
    |int 0x80+--->|idt64_unix_scall+-+-+-+----->|L_32bit_entry_check+--->|L_64bit_entry_reject+-+
    +--------+ +----------------+ ^ ^ ^ +-------------------+ +--------------------+ |
    | | | |
    +--------+ +----------------+ | | | +-------------------------------------------------+
    |int 0x81+---->idt64_mach_scall+-+ | | |
    |int 0x81+--->|idt64_mach_scall+-+ | | |
    +--------+ +----------------+ | | | +--------------+ +----------------+ +-----------------+
    | | +->L_dispatch_U64+-->L_dispatch_64bit+-->L_common_dispatch+-+
    | | +>|L_dispatch_U64+->|L_dispatch_64bit+->|L_common_dispatch+-+
    +--------+ +----------------+ | | +--------------+ +----------------+ +-----------------+ |
    |int 0x82+---->idt64_mdep_scall+---+ | |
    |int 0x82+--->|idt64_mdep_scall+---+ | |
    +--------+ +----------------+ | +-----------------------------------------------------------+
    | |
    +--------+ +-------------+ | | +-------------+ +---------+
    |sysenter+---->hi64_sysenter+--------+ +->hndl_alltraps+--->user_trap|
    |sysenter+--->|hi64_sysenter+--------+ +>|hndl_alltraps+-->|user_trap|
    +--------+ +-------------+ +-------------+ +---------+
    +--------+ +-------------+ +--------------+ +----------------+
    |syscall +---->hi64_syscall +--------------->L_dispatch_U64+--->L_dispatch_64bit+-+
    |syscall +--->|hi64_syscall +-------------->|L_dispatch_U64+-->|L_dispatch_64bit+-+
    +--------+ +-------------+ +--------------+ +----------------+ |
    |
    +---------------------------------------+
    |
    | +-----------------+ +------------+
    +->L_common_dispatch+-->hndl_syscall|
    +>|L_common_dispatch+->|hndl_syscall|
    +-----------------+ +-+--+--+--+-+
    | | | |
    +-------------------------------------------+ | | |
    @@ -54,7 +54,7 @@

    ## The question

    A while ago when starting to audit XNU syscalls, I noticed something kind of funny and wanted to track it down. To preface, everything here is specific only to xnu 10.11.2 on amd64, though may apply to other architectures. Additionally, my kernel debugger is currently broken (thanks VMWare!), so take this with a grain of salt as it's not been verified. Please let me know if anything is mistaken or unclear.
    A while ago when starting to audit xnu syscalls, I noticed something kind of funny and wanted to track it down. To preface, everything here is specific only to xnu 10.11.2 on amd64, though may apply to other architectures. Additionally, my kernel debugger is currently broken (thanks VMWare!), so take this with a grain of salt as it's not been verified. Please let me know if anything is mistaken or unclear.

    Let's use the `exit()` syscall as an example. Exit is defined in `xnu/bsd/kern/kern_exit.c` as:

  12. yrp604 revised this gist Mar 25, 2016. 1 changed file with 84 additions and 12 deletions.
    96 changes: 84 additions & 12 deletions xnu-syscall-life-amd64.md
    @@ -1,54 +1,125 @@
    A while ago when starting to audit XNU syscalls, I noticed something kind of funny. To preface, everything here is specific only to xnu on amd64, though may apply to other architectures. Additionally, my kernel debugger is currently broken (thanks VMWare!), so take this with a grain of salt as it's not been verified.
    # XNU syscall path

    Let's use the `exit()` syscall as an example. Exit is defined in `xnu/bsd/kern/kern_exit.c` as:
    ## Chart

    ```
    +------------------+
    |These push their | +-----------------------+
    |respective syscall| |This overwrites the |
    |dispatch functions| |saved dispatch function|
    |onto the stack | |with hndl_alltraps |
    +--------+---------+ +-----------+-----------+
    | |
    v v
    +--------+ +----------------+ +-------------------+ +--------------------+
    |int 0x80+---->idt64_unix_scall+-+-+-+------>L_32bit_entry_check+---->L_64bit_entry_reject+-+
    +--------+ +----------------+ ^ ^ ^ +-------------------+ +--------------------+ |
    | | | |
    +--------+ +----------------+ | | | +-------------------------------------------------+
    |int 0x81+---->idt64_mach_scall+-+ | | |
    +--------+ +----------------+ | | | +--------------+ +----------------+ +-----------------+
    | | +->L_dispatch_U64+-->L_dispatch_64bit+-->L_common_dispatch+-+
    +--------+ +----------------+ | | +--------------+ +----------------+ +-----------------+ |
    |int 0x82+---->idt64_mdep_scall+---+ | |
    +--------+ +----------------+ | +-----------------------------------------------------------+
    | |
    +--------+ +-------------+ | | +-------------+ +---------+
    |sysenter+---->hi64_sysenter+--------+ +->hndl_alltraps+--->user_trap|
    +--------+ +-------------+ +-------------+ +---------+
    +--------+ +-------------+ +--------------+ +----------------+
    |syscall +---->hi64_syscall +--------------->L_dispatch_U64+--->L_dispatch_64bit+-+
    +--------+ +-------------+ +--------------+ +----------------+ |
    |
    +---------------------------------------+
    |
    | +-----------------+ +------------+
    +->L_common_dispatch+-->hndl_syscall|
    +-----------------+ +-+--+--+--+-+
    | | | |
    +-------------------------------------------+ | | |
    | | | |
    | +-----------------------+ | +---------------+
    | | | |
    +-------v---------+ +-------v---------+ +-----------v-----+ +-------v---------+
    |hndl_mach_scall64| |hndl_unix_scall64| |hndl_mdep_scall64| |hndl_diag_scall64|
    +-------+---------+ +-------+---------+ +-------+---------+ +-------+---------+
    | | | |
    | | | |
    +-------v------+ +-------v----------+ +-------v---------+ +---v------+
    |unix_syscall64| |mach_call_munger64| |machdep_syscall64| |diagCall64|
    +--------------+ +------------------+ +-----------------+ +----------+
    ```

    ## The question

    A while ago when starting to audit XNU syscalls, I noticed something kind of funny and wanted to track it down. To preface, everything here is specific only to xnu 10.11.2 on amd64, though may apply to other architectures. Additionally, my kernel debugger is currently broken (thanks VMWare!), so take this with a grain of salt as it's not been verified. Please let me know if anything is mistaken or unclear.

    Let's use the `exit()` syscall as an example. Exit is defined in `xnu/bsd/kern/kern_exit.c` as:

    ```C
    void exit(proc_t p, struct exit_args *uap, int *retval)
    ```
    Where `p` is the process executing the syscall, `uap` is a pointer to a struct containing the user args, and `retval` is a pointer that will contain the result of the syscall. However, this seems kind of odd -- OS X uses the SystemV ABI everywhere, including syscalls and this means the syscall arguments are passed in registers (`rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`) with the syscall number in `rax`. This raises an obvious question: where do these values get moved from registers to memory, and where is that memory located (userspace vs kernelspace).
    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative), unix syscalls (positive syscalls under 0x6000), and PPC syscalls (positive syscalls over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls, and with different constants. The syscall number is still stored in `rax`.
    ## Background on syscall numbers
    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative syscall no.), unix syscalls (positive syscall no. under 0x6000), and PPC syscalls (positive syscalls no. over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls, and with different constants. The syscall number is stored in `rax`. The following defines the conditions under which various syscalls are dispatched to their respective handlers:
    * Mach Traps: `rax & 0x01 << 24`
    * Unix Syscall: `rax & 0x02 << 24`
    * Machine Dependent: `rax & 0x03 << 24`
    * Diagnostics: `rax & 0x04 << 24`
    * Mach IPC (unused?): `rax & 0x05 << 24`
    These come from a combination of the constants defined in `xnu/osfmk/mach/i386/syscall_sw.h` and `hndl_syscall` from `xnu/osfmk/x86_64/idt64.s`. Reading `hndl_syscall` will explain why when shellcoding for xnu you must add `0x20000000` to your syscall numbers -- otherwise they won't be appropriately dispatched to the right syscall handler.
    These come from a combination of the constants defined in `xnu/osfmk/mach/i386/syscall_sw.h` and `hndl_syscall` from `xnu/osfmk/x86_64/idt64.s`. Reading `hndl_syscall` will explain why when shellcoding for xnu you must add `0x20000000` to your syscall numbers -- otherwise they won't be appropriately dispatched to the right handler.
    We'll start at the definition of the interrupt handler in `xnu/osfmk/x86_64/idt_table.h`. Here we see a few interesting things:
    ## Tracing execution flow
    To figure out where these values are pushed from registers to memory, we're going to trace execution from from userland to the respective kernel function. There are three common ways (coming from a linux background) to transition from usermode to kernelmode:
    1. `int 0x80`
    2. `sysenter`
    3. `syscall`
    ### `int 0x80`
    We'll start at the definition of the interrupt handler in `xnu/osfmk/x86_64/idt_table.h`. Here we see a few interesting things:
    ```C
    USER_TRAP_SPC(0x80, idt64_unix_scall)
    USER_TRAP_SPC(0x81, idt64_mach_scall)
    USER_TRAP_SPC(0x82, idt64_mdep_scall)
    ```

    This is kinda cool -- we can directly jump into the dispatch functions by varrying our interrupt number. Traditionally on x86 machines, `int 0x80` was a syscall. However, this indicates we can actually call to the kernel from usermode with any of these three interrupts (as long as we want the appropriate type of call). In fact, we must use the correct interrupt number when attempting to call into the kernel in this fashion (e.g a unix syscall with `int 0x81` will fail).

    ### `sysenter`

    From [OSDev](http://wiki.osdev.org/Sysenter), `sysenter` is not an interrupt, rather it's an instruction which transitions us to kernelspace from userspace. Specifically, the value of `rip` will be loaded from a model specific register (MSR) amoungst other things when the `sysenter` instruction is executed. A bit of grepping leads us to `osfmk/i386/mp_desc.c`:

    ```C
    wrmsr64(MSR_IA32_SYSENTER_EIP, (uintptr_t)hi64_sysenter);
    ```
    This is kinda cool -- we can directly jump into the dispatch functions by varrying our interrupt number. Traditionally on x86 machines, `int 0x80` was a syscall. However, this indicates we can actually call to the kernel from usermode with any of these three interrupts (as long as we want the appropriate type). In fact, we must use the correct interrupt number when attempting to call into the kernel in this fashion (e.g a unix syscall with `int 0x81` will fail). With the introduction of amd64 we got the `sysenter` and `syscall` instructions to enter the kernel without the overhead of an interrupt. These function a bit differently and require more explination to follow the code path we're interested in.
    This means when `sysenter` executes, the value of `rip` is set to `hi64_sysenter` which is defined in `xnu/osfmk/x86_64/idt64.s`. However, interestingly either `int 80` nor `sysenter` will work on amd64. If we trace the code out, we always end up in the 32bit code path, which kicks us to the 64bit code path (we end up in `hndl_alltraps` which calls `user_trap` from `xnu/osfmk/i386/trap.c`. This _does not_ link us to any of the syscall dispatching that we need, and thus will not execute our system calls. As far as I can tell, from a 64bit binary you must enter the kernel through `sysenter`.
    From [OSDev](http://wiki.osdev.org/Sysenter), `syscall` is not an interrupt, rather it's an instruction which transitions us to kernelspace from userspace. Specifically, the value of `rip` will be loaded from a model specific register (MSR) amoungst other things when the `syscall` instruction is executed. A bit of grepping leads us to `osfmk/i386/mp_desc.c`:
    ### `syscall`
    `syscall` is very similar to `sysenter`, only with a different MSR. Again in `osfmk/i386/mp_desc.c` we find the relevant code:
    ```C
    wrmsr64(MSR_IA32_LSTAR, (uintptr_t)hi64_syscall);
    ```

    From this we can take away that when `syscall` executes, `rip` will be set to `hi64_syscall`, which is another function defined in our old friend `xnu/osfmk/x86_64/idt64.s`. From here, we'll see that we're loading `hndl_syscall` onto the stack, at the offset `ISF64_TRAPFN` (it's a macro which corresponds to a structure offset).

    ```asm
    leaq HNDL_SYSCALL(%rip), %r11;
    movq %r11, ISF64_TRAPFN(%rsp)
    ```


    From here we branch to `L_dispatch_U64` where `rsp` gets copied to `r15` and then into `L_dispatch_64bit` which saves our user register state to the kernel stack. This means `r15` is a pointer to a `x86_saved_state_t`, which is defined as a `x86_saved_state64` (in `xnu/osfmk/mach/i386/thread_status.h`). We store the earlier saved value from `ISF64_TRAPFN` (which was `hndl_syscall`) in `rdx` and jump to `L_common_dispatch` which finally calls the function stored in `rdx`.

    Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64` which in turn calls `unix_syscall64` with a single argument of `r15` (still our saved state). This function is defined in `xnu/bsd/dev/i386/systemcalls.c`. From here, it's easiest to just snip the relevant code to our question:
    @@ -74,8 +145,9 @@ To briefly explain this code: first we're getting the current thread struct. Sec
    This pretty much solves our original mystery: the entry code saves all the user registers on the kernel stack, and the argument registers from that saved state are then copied into a buffer inside the thread's struct (`uu_arg`). The address of that buffer is passed to our syscall, which uses it to reference all of its arguments.
    ## Notes
    As we've listed quite a few functions, below is a sequential list of every function or label a standard unix syscall should hit in xnu between the `syscall` instruction and the start of the syscall function:
    ```
    hi64_syscall
  13. @yrp604 yrp604 revised this gist Mar 24, 2016. 1 changed file with 14 additions and 1 deletion.
    15 changes: 14 additions & 1 deletion xnu-syscall-life-amd64.md
    @@ -1,9 +1,13 @@
    A while ago when starting to audit XNU syscalls, I noticed something kind of funny. To preface, everything here is specific to xnu on amd64, though it may apply to other architectures. Let's use the `exit()` syscall as an example. Exit is defined in `xnu/bsd/kern/kern_exit.c` as:
    A while ago when starting to audit XNU syscalls, I noticed something kind of funny. To preface, everything here is specific to xnu on amd64, though it may apply to other architectures. Additionally, my kernel debugger is currently broken (thanks VMware!), so take this with a grain of salt, as it has not been verified.

    Let's use the `exit()` syscall as an example. Exit is defined in `xnu/bsd/kern/kern_exit.c` as:


    ```C
    void exit(proc_t p, struct exit_args *uap, int *retval)
    ```
    Where `p` is the process executing the syscall, `uap` is a pointer to a struct containing the user args, and `retval` is a pointer that will contain the result of the syscall. However, this seems kind of odd -- OS X uses the System V ABI everywhere, including syscalls, which means the syscall arguments are passed in registers (`rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`; `r10` takes the place of `rcx`, which the `syscall` instruction clobbers) with the syscall number in `rax`. This raises an obvious question: where do these values get moved from registers to memory, and where is that memory located (userspace or kernelspace)?
    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative), unix syscalls (positive syscalls under 0x6000), and PPC syscalls (positive syscalls over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls, and with different constants. The syscall number is still stored in `rax`.
    @@ -18,31 +22,38 @@ These come from a combination of the constants defined in `xnu/osfmk/mach/i386/s
    We'll start at the definition of the interrupt handler in `xnu/osfmk/x86_64/idt_table.h`. Here we see a few interesting things:
    ```C
    USER_TRAP_SPC(0x80, idt64_unix_scall)
    USER_TRAP_SPC(0x81, idt64_mach_scall)
    USER_TRAP_SPC(0x82, idt64_mdep_scall)
    ```


    This is kinda cool -- we can jump directly into the dispatch functions by varying our interrupt number. Traditionally on x86 machines, `int 0x80` was the syscall interrupt. However, this indicates we can actually call into the kernel from usermode with any of these three interrupts (as long as we want the appropriate type). In fact, we must use the correct interrupt number when calling into the kernel this way (e.g. a unix syscall with `int 0x81` will fail). x86 CPUs also provide the `sysenter` and `syscall` instructions, which enter the kernel without the overhead of an interrupt; on amd64, `syscall` is the usual choice for 64bit code. These function a bit differently and require more explanation to follow the code path we're interested in.
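Here is a toy C model of the routing described in the paragraph above. Each user-callable vector from `idt_table.h` is wired to exactly one syscall class, so sending a unix syscall through the wrong vector fails. The `toy_` names and the 0/-1 convention are hypothetical. The sketch only models the vector-to-class mapping. It says nothing about whether these paths actually reach the syscall dispatcher from a 64bit binary.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_UNIX, TOY_MACH, TOY_MDEP };

/* Hypothetical model of the three USER_TRAP_SPC entries: one class per vector. */
struct toy_idt_entry { unsigned vector; enum toy_kind kind; };

static const struct toy_idt_entry toy_idt[] = {
    { 0x80, TOY_UNIX },  /* idt64_unix_scall */
    { 0x81, TOY_MACH },  /* idt64_mach_scall */
    { 0x82, TOY_MDEP },  /* idt64_mdep_scall */
};

/* Returns 0 if a request of class `wanted` raised via `vector` reaches a
 * handler for that class, -1 if the vector is unknown or routes elsewhere. */
static int toy_int(unsigned vector, enum toy_kind wanted) {
    assert(vector <= 0xFF);  /* IDT vectors are 0..255 */
    for (size_t i = 0; i < sizeof toy_idt / sizeof toy_idt[0]; i++) {
        if (toy_idt[i].vector == vector)
            return toy_idt[i].kind == wanted ? 0 : -1;
    }
    return -1;
}
```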

    From [OSDev](http://wiki.osdev.org/Sysenter), `syscall` is not an interrupt; rather, it's an instruction which transitions us from userspace to kernelspace. Specifically, when the `syscall` instruction is executed, the value of `rip` is loaded from a model-specific register (MSR), amongst other things. A bit of grepping leads us to `osfmk/i386/mp_desc.c`:


    ```C
    wrmsr64(MSR_IA32_LSTAR, (uintptr_t)hi64_syscall);
    ```
    From this we can take away that when `syscall` executes, `rip` will be set to `hi64_syscall`, another entry point defined in our old friend `xnu/osfmk/x86_64/idt64.s`. From there, we see the address of `hndl_syscall` being stored on the stack at offset `ISF64_TRAPFN` (a macro corresponding to a structure offset).
    ```asm
    leaq HNDL_SYSCALL(%rip), %r11;
    movq %r11, ISF64_TRAPFN(%rsp)
    ```


    From here we branch to `L_dispatch_U64`, where `rsp` gets copied to `r15`, and then into `L_dispatch_64bit`, which saves our user register state to the kernel stack. This means `r15` is a pointer to an `x86_saved_state_t`, which is defined as an `x86_saved_state64` (in `xnu/osfmk/mach/i386/thread_status.h`). We store the earlier saved value from `ISF64_TRAPFN` (which was `hndl_syscall`) in `rdx` and jump to `L_common_dispatch`, which finally calls the function stored in `rdx`.

    Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64` which in turn calls `unix_syscall64` with a single argument of `r15` (still our saved state). This function is defined in `xnu/bsd/dev/i386/systemcalls.c`. From here, it's easiest to just snip the relevant code to our question:


    ```C
    thread = current_thread();
    uthread = get_bsdthread_info(thread);
    @@ -58,12 +69,14 @@ Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64`
    error = (*(callp->sy_call))((void *)p, vt, &(uthread->uu_rval[0]));
    ```
    To briefly explain this code: first we're getting the current thread struct. Second, we're getting the system call entry out of the syscall table; this includes the number of arguments the syscall expects, as well as the function pointer (`sy_call`). Third, we're getting a chunk of memory out of the current thread struct, and finally we're copying the arguments from the saved register state into that memory in the kernel's thread struct.
    This pretty much solves our original mystery: the entry code saves all the user registers on the kernel stack, and the argument registers from that saved state are then copied into a buffer inside the thread's struct (`uu_arg`). The address of that buffer is passed to our syscall, which uses it to reference all of its arguments.
    As we've listed quite a few functions, below is a sequential list of every function or label a standard unix syscall should hit in xnu between the `syscall` instruction and the start of the syscall function:
    ```
    hi64_syscall
    L_dispatch_U64
  14. @yrp604 yrp604 revised this gist Mar 24, 2016. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions xnu-syscall-life-amd64.md
    @@ -24,7 +24,7 @@ USER_TRAP_SPC(0x81, idt64_mach_scall)
    USER_TRAP_SPC(0x82, idt64_mdep_scall)
    ```

    This is kinda cool -- we can jump directly into the dispatch functions by varying our interrupt number. Traditionally on x86 machines, `int 0x80` was the syscall interrupt. However, this indicates we can actually call into the kernel from usermode with any of these three interrupts (as long as we want the appropriate type). In fact, we must use the correct interrupt number when calling into the kernel this way (e.g. a unix syscall with `int 0x81` will fail). x86 CPUs also provide the `sysenter` and `syscall` instructions, which enter the kernel without the overhead of an interrupt; on amd64, `syscall` is the usual choice for 64bit code. These function a bit differently and require more explanation to follow the code path.
    This is kinda cool -- we can jump directly into the dispatch functions by varying our interrupt number. Traditionally on x86 machines, `int 0x80` was the syscall interrupt. However, this indicates we can actually call into the kernel from usermode with any of these three interrupts (as long as we want the appropriate type). In fact, we must use the correct interrupt number when calling into the kernel this way (e.g. a unix syscall with `int 0x81` will fail). x86 CPUs also provide the `sysenter` and `syscall` instructions, which enter the kernel without the overhead of an interrupt; on amd64, `syscall` is the usual choice for 64bit code. These function a bit differently and require more explanation to follow the code path we're interested in.

    From [OSDev](http://wiki.osdev.org/Sysenter), `syscall` is not an interrupt; rather, it's an instruction which transitions us from userspace to kernelspace. Specifically, when the `syscall` instruction is executed, the value of `rip` is loaded from a model-specific register (MSR), amongst other things. A bit of grepping leads us to `osfmk/i386/mp_desc.c`:

    @@ -39,9 +39,9 @@ leaq HNDL_SYSCALL(%rip), %r11;
    movq %r11, ISF64_TRAPFN(%rsp)
    ```

    From here we branch to `L_dispatch_U64`, where `rsp` gets copied to `r15`, and then into `L_dispatch_64bit`, which saves our user register state to the kernel stack. This means `r15` is a pointer to an `x86_saved_state_t`, which is defined as an `x86_saved_state64` (in `xnu/osfmk/mach/i386/thread_status.h`). We store the earlier saved value from `ISF64_TRAPFN` (which was `hndl_syscall`) in `rdx` and jump to `L_common_dispatch`, which finally calls the function in `rdx`.
    From here we branch to `L_dispatch_U64`, where `rsp` gets copied to `r15`, and then into `L_dispatch_64bit`, which saves our user register state to the kernel stack. This means `r15` is a pointer to an `x86_saved_state_t`, which is defined as an `x86_saved_state64` (in `xnu/osfmk/mach/i386/thread_status.h`). We store the earlier saved value from `ISF64_TRAPFN` (which was `hndl_syscall`) in `rdx` and jump to `L_common_dispatch`, which finally calls the function stored in `rdx`.

    Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64` which in turn calls `unix_syscall64` with a single argument of `r15`. This function is defined in `xnu/bsd/dev/i386/systemcalls.c`. From here, it's easiest to just snip the relevant code to our question:
    Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64` which in turn calls `unix_syscall64` with a single argument of `r15` (still our saved state). This function is defined in `xnu/bsd/dev/i386/systemcalls.c`. From here, it's easiest to just snip the relevant code to our question:

    ```C
    thread = current_thread();
  15. @yrp604 yrp604 revised this gist Mar 24, 2016. 1 changed file with 42 additions and 4 deletions.
    46 changes: 42 additions & 4 deletions xnu-syscall-life-amd64.md
    @@ -1,12 +1,12 @@
    A while ago when starting to audit XNU syscalls, I noticed something kind of funny. To preface, everything here is specific to xnu on amd64, though it may apply to other architectures. Let's use the `exit()` syscall as an example. Exit is defined in `xnu/bsd/kern/kern_exit.c` as:

    ```
    ```C
    void exit(proc_t p, struct exit_args *uap, int *retval)
    ```
    Where `p` is the process executing the syscall, `uap` is a pointer to a struct containing the user args, and `retval` is a pointer that will contain the result of the syscall. However, this seems kind of odd -- OS X uses the System V ABI everywhere, including syscalls, which means the syscall arguments are passed in registers (`rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`; `r10` takes the place of `rcx`, which the `syscall` instruction clobbers) with the syscall number in `rax`. This raises an obvious question: where do these values get moved from registers to memory, and where is that memory located (userspace or kernelspace)?
    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative), unix syscalls (positive syscalls under 0x6000), and PPC syscalls (positive syscalls over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls and different constants. The syscall number is still stored in `rax`.
    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative), unix syscalls (positive syscalls under 0x6000), and PPC syscalls (positive syscalls over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls, and with different constants. The syscall number is still stored in `rax`.
    * Mach Traps: `(rax >> 24) == 0x01`
    * Unix Syscall: `(rax >> 24) == 0x02`
    @@ -16,9 +16,34 @@ Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent sysc
    These come from a combination of the constants defined in `xnu/osfmk/mach/i386/syscall_sw.h` and `hndl_syscall` from `xnu/osfmk/x86_64/idt64.s`. Reading `hndl_syscall` explains why, when shellcoding for xnu, you must add `0x2000000` (that is, `0x02 << 24`) to your unix syscall numbers -- otherwise they won't be dispatched to the right syscall handler.
    Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64` which in turn calls `unix_syscall64` with a single argument of `r15`. This function is defined in `xnu/bsd/dev/i386/systemcalls.c`. From here, it's easiest to just snip the relevant code to our question:
    We'll start at the definition of the interrupt handler in `xnu/osfmk/x86_64/idt_table.h`. Here we see a few interesting things:
    ```C
    USER_TRAP_SPC(0x80, idt64_unix_scall)
    USER_TRAP_SPC(0x81, idt64_mach_scall)
    USER_TRAP_SPC(0x82, idt64_mdep_scall)
    ```

    This is kinda cool -- we can jump directly into the dispatch functions by varying our interrupt number. Traditionally on x86 machines, `int 0x80` was the syscall interrupt. However, this indicates we can actually call into the kernel from usermode with any of these three interrupts (as long as we want the appropriate type). In fact, we must use the correct interrupt number when calling into the kernel this way (e.g. a unix syscall with `int 0x81` will fail). x86 CPUs also provide the `sysenter` and `syscall` instructions, which enter the kernel without the overhead of an interrupt; on amd64, `syscall` is the usual choice for 64bit code. These function a bit differently and require more explanation to follow the code path.

    From [OSDev](http://wiki.osdev.org/Sysenter), `syscall` is not an interrupt; rather, it's an instruction which transitions us from userspace to kernelspace. Specifically, when the `syscall` instruction is executed, the value of `rip` is loaded from a model-specific register (MSR), amongst other things. A bit of grepping leads us to `osfmk/i386/mp_desc.c`:

    ```C
    wrmsr64(MSR_IA32_LSTAR, (uintptr_t)hi64_syscall);
    ```
    From this we can take away that when `syscall` executes, `rip` will be set to `hi64_syscall`, another entry point defined in our old friend `xnu/osfmk/x86_64/idt64.s`. From there, we see the address of `hndl_syscall` being stored on the stack at offset `ISF64_TRAPFN` (a macro corresponding to a structure offset).
    ```asm
    leaq HNDL_SYSCALL(%rip), %r11;
    movq %r11, ISF64_TRAPFN(%rsp)
    ```

    From here we branch to `L_dispatch_U64`, where `rsp` gets copied to `r15`, and then into `L_dispatch_64bit`, which saves our user register state to the kernel stack. This means `r15` is a pointer to an `x86_saved_state_t`, which is defined as an `x86_saved_state64` (in `xnu/osfmk/mach/i386/thread_status.h`). We store the earlier saved value from `ISF64_TRAPFN` (which was `hndl_syscall`) in `rdx` and jump to `L_common_dispatch`, which finally calls the function in `rdx`.

    Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64` which in turn calls `unix_syscall64` with a single argument of `r15`. This function is defined in `xnu/bsd/dev/i386/systemcalls.c`. From here, it's easiest to just snip the relevant code to our question:

    ```C
    thread = current_thread();
    uthread = get_bsdthread_info(thread);
    // regs is derived from r15 ...
    @@ -35,4 +60,17 @@ Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64`
    To briefly explain this code: first we're getting the current thread struct. Second, we're getting the system call entry out of the syscall table; this includes the number of arguments the syscall expects, as well as the function pointer (`sy_call`). Third, we're getting a chunk of memory out of the current thread struct, and finally we're copying the arguments from the saved register state into that memory in the kernel's thread struct.
    This pretty much solves our mystery: the entry code saves all the user registers on the kernel stack, and the argument registers from that saved state are then copied into a buffer inside the thread's struct (`uu_arg`). The address of that buffer is passed to our syscall, which uses it to reference all of its arguments.
    This pretty much solves our original mystery: the entry code saves all the user registers on the kernel stack, and the argument registers from that saved state are then copied into a buffer inside the thread's struct (`uu_arg`). The address of that buffer is passed to our syscall, which uses it to reference all of its arguments.
    As we've listed quite a few functions, below is a sequential list of every function or label a standard unix syscall should hit in xnu between the `syscall` instruction and the start of the syscall function:
    ```
    hi64_syscall
    L_dispatch_U64
    L_dispatch_64bit
    L_common_dispatch
    hndl_syscall // rdx, pushed in hi64_syscall
    hndl_unix_scall64
    unix_syscall64
    error = (*(callp->sy_call))((void *)p, vt, &(uthread->uu_rval[0])); // now we're there
    ```
  16. @yrp604 yrp604 created this gist Mar 24, 2016.
    38 changes: 38 additions & 0 deletions xnu-syscall-life-amd64.md
    @@ -0,0 +1,38 @@
    A while ago when starting to audit XNU syscalls, I noticed something kind of funny. To preface, everything here is specific to xnu on amd64, though it may apply to other architectures. Let's use the `exit()` syscall as an example. Exit is defined in `xnu/bsd/kern/kern_exit.c` as:

    ```
    void exit(proc_t p, struct exit_args *uap, int *retval)
    ```

    Where `p` is the process executing the syscall, `uap` is a pointer to a struct containing the user args, and `retval` is a pointer that will contain the result of the syscall. However, this seems kind of odd -- OS X uses the System V ABI everywhere, including syscalls, which means the syscall arguments are passed in registers (`rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`; `r10` takes the place of `rcx`, which the `syscall` instruction clobbers) with the syscall number in `rax`. This raises an obvious question: where do these values get moved from registers to memory, and where is that memory located (userspace or kernelspace)?

    Starting in `xnu/osfmk/x86_64/idt64.s` we find the interrupt and subsequent syscall handling code. Specifically, we find something kind of interesting: xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles we see him discuss three types of syscalls: mach traps (negative), unix syscalls (positive syscalls under 0x6000), and PPC syscalls (positive syscalls over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls and different constants. The syscall number is still stored in `rax`.

    * Mach Traps: `(rax >> 24) == 0x01`
    * Unix Syscall: `(rax >> 24) == 0x02`
    * Machine Dependent: `(rax >> 24) == 0x03`
    * Diagnostics: `(rax >> 24) == 0x04`
    * Mach IPC (unused?): `(rax >> 24) == 0x05`

    These come from a combination of the constants defined in `xnu/osfmk/mach/i386/syscall_sw.h` and `hndl_syscall` from `xnu/osfmk/x86_64/idt64.s`. Reading `hndl_syscall` explains why, when shellcoding for xnu, you must add `0x2000000` (that is, `0x02 << 24`) to your unix syscall numbers -- otherwise they won't be dispatched to the right syscall handler.
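The class encoding described above is easy to get wrong by a zero, so here is a small C sketch of building and decoding the value that goes in `rax`. It assumes the class occupies bits 24..31, as in the list. The `TOY_` constant names only imitate the style of `syscall_sw.h` and are not copied from it. For example, unix `exit` is syscall 1, so its `rax` value is `0x2000001`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: class in bits 24..31, syscall number in bits 0..23. */
#define TOY_SYSCALL_CLASS_SHIFT 24
#define TOY_SYSCALL_CLASS_MASK  (0xFFu << TOY_SYSCALL_CLASS_SHIFT)
#define TOY_SYSCALL_NUMBER_MASK (~TOY_SYSCALL_CLASS_MASK)

enum toy_syscall_class {
    TOY_CLASS_MACH = 1,
    TOY_CLASS_UNIX = 2,
    TOY_CLASS_MDEP = 3,
    TOY_CLASS_DIAG = 4,
    TOY_CLASS_IPC  = 5,
};

/* Build the value userland places in rax, e.g. unix exit (1) -> 0x2000001. */
static uint32_t toy_syscall_construct(uint32_t cls, uint32_t number) {
    assert(cls <= 0xFF);
    return (cls << TOY_SYSCALL_CLASS_SHIFT) | (number & TOY_SYSCALL_NUMBER_MASK);
}

/* Roughly what hndl_syscall looks at to pick a handler. */
static uint32_t toy_syscall_class(uint32_t rax) {
    return (rax & TOY_SYSCALL_CLASS_MASK) >> TOY_SYSCALL_CLASS_SHIFT;
}

/* The part unix_syscall64 keeps via SYSCALL_NUMBER_MASK. */
static uint32_t toy_syscall_number(uint32_t rax) {
    return rax & TOY_SYSCALL_NUMBER_MASK;
}
```

Note that `0x20000001` (one zero too many) decodes to class `0x20`, which matches none of the handlers listed above.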

    Following the unix syscall path in `hndl_syscall` we jump to `hndl_unix_scall64` which in turn calls `unix_syscall64` with a single argument of `r15`. This function is defined in `xnu/bsd/dev/i386/systemcalls.c`. From here, it's easiest to just snip the relevant code to our question:

    ```
    thread = current_thread();
    uthread = get_bsdthread_info(thread);
    // regs is derived from r15 ...
    code = regs->rax & SYSCALL_NUMBER_MASK;
    callp = (code >= NUM_SYSENT) ? &sysent[63] : &sysent[code];
    // ...
    vt = (void *)uthread->uu_arg;
    // ...
    memcpy(vt, args_start_at_rdi ? &regs->rdi : &regs->rsi,
    args_in_regs * sizeof(syscall_arg_t));
    // ...
    error = (*(callp->sy_call))((void *)p, vt, &(uthread->uu_rval[0]));
    ```

    To briefly explain this code: first we're getting the current thread struct. Second, we're getting the system call entry out of the syscall table; this includes the number of arguments the syscall expects, as well as the function pointer (`sy_call`). Third, we're getting a chunk of memory out of the current thread struct, and finally we're copying the arguments from the saved register state into that memory in the kernel's thread struct.
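The four steps above can be reproduced in a self-contained C sketch. This is a simplified, hypothetical re-creation, not XNU's `unix_syscall64`. It keeps the snippet's names (`uu_arg`, `uu_rval`, `sy_narg`, `sy_call`, `args_start_at_rdi`) but uses a two-entry `toy_` table and a made-up one-argument syscall. It assumes, as the single `memcpy` in the snippet implies, that the saved argument registers are contiguous in `rdi, rsi, rdx, r10, r8, r9` order. It also assumes, based on my reading of XNU, that syscall 0 is the indirect syscall, whose real number arrives in `rdi` with arguments starting at `rsi`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t syscall_arg_t;

/* Assumed subset of x86_saved_state64: argument registers adjacent, in ABI order. */
struct toy_saved_state64 {
    uint64_t rdi, rsi, rdx, r10, r8, r9;
    uint64_t rax;
};
static_assert(offsetof(struct toy_saved_state64, r9) -
              offsetof(struct toy_saved_state64, rdi) == 5 * sizeof(uint64_t),
              "argument registers must be contiguous");

struct toy_uthread {
    syscall_arg_t uu_arg[8];  /* per-thread argument buffer ("vt") */
    int uu_rval[2];
};

typedef int (*toy_sy_call_t)(void *p, void *uap, int *retval);
struct toy_sysent { int sy_narg; toy_sy_call_t sy_call; };

/* Hypothetical one-argument syscall: copies its argument into retval. */
struct toy_echo_args { syscall_arg_t value; };
static int toy_echo(void *p, void *uap, int *retval) {
    (void)p;
    *retval = (int)((struct toy_echo_args *)uap)->value;
    return 0;
}

static const struct toy_sysent toy_sysent_table[] = {
    { 0, NULL },      /* 0: indirect syscall, real number arrives in rdi */
    { 1, toy_echo },  /* 1: hypothetical */
};
#define TOY_NUM_SYSENT (sizeof toy_sysent_table / sizeof toy_sysent_table[0])

/* Simplified unix_syscall64(): choose the table entry, copy the register
 * arguments into the thread's buffer, then call through sy_call. */
static int toy_unix_syscall64(struct toy_saved_state64 *regs,
                              struct toy_uthread *uthread, void *p) {
    uint64_t code = regs->rax & 0x00FFFFFFu;  /* strip the class bits */
    int args_start_at_rdi = 1;
    if (code == 0) {              /* indirect: number in rdi, args from rsi */
        code = regs->rdi;
        args_start_at_rdi = 0;
    }
    if (code >= TOY_NUM_SYSENT || toy_sysent_table[code].sy_call == NULL)
        return -1;                /* XNU uses sysent[63] instead; we just fail */

    const struct toy_sysent *callp = &toy_sysent_table[code];
    size_t in_regs = args_start_at_rdi ? 6 : 5;
    size_t n = (size_t)callp->sy_narg < in_regs ? (size_t)callp->sy_narg : in_regs;
    const char *src = (const char *)regs +
        (args_start_at_rdi ? offsetof(struct toy_saved_state64, rdi)
                           : offsetof(struct toy_saved_state64, rsi));
    void *vt = uthread->uu_arg;
    memcpy(vt, src, n * sizeof(syscall_arg_t));
    return callp->sy_call(p, vt, &uthread->uu_rval[0]);
}
```

The copy is done from a byte pointer into the whole struct rather than from `&regs->rdi` directly. That keeps the multi-register read well-defined in standard C while copying the same bytes as the original `memcpy`.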

    This pretty much solves our mystery: the entry code saves all the user registers on the kernel stack, and the argument registers from that saved state are then copied into a buffer inside the thread's struct (`uu_arg`). The address of that buffer is passed to our syscall, which uses it to reference all of its arguments.