The life of an XNU unix syscall on amd64

A while ago, when I started auditing XNU syscalls, I noticed something kind of funny. To preface, everything here is specific to xnu on amd64, though some of it may apply to other architectures. Let's use the exit() syscall as an example. Exit is defined in xnu/bsd/kern/kern_exit.c as:

void exit(proc_t p, struct exit_args *uap, int *retval)

Where p is the process executing the syscall, uap is a pointer to a struct containing the user args, and retval is a pointer that will contain the result of the syscall. However, this seems kind of odd -- OS X uses the System V ABI everywhere, including syscalls, and this means the syscall arguments are passed in registers (rdi, rsi, rdx, r10, r8, r9; r10 stands in for rcx, which the syscall instruction overwrites with the return address) with the syscall number in rax. This raises an obvious question: where do these values get moved from registers to memory, and where is that memory located (userspace or kernelspace)?

Starting in xnu/osfmk/x86_64/idt64.s we find the interrupt and subsequent syscall handling code. One thing stands out here. xnu is well known for having two "types" of syscalls: traditional unix syscalls and mach traps. Going back to old nemo articles, we see him discuss three types of syscalls: mach traps (negative), unix syscalls (positive syscalls under 0x6000), and PPC syscalls (positive syscalls over 0x6000) [uninformed 4.3]. Today the layout is conceptually the same, but with more types of syscalls and different constants. The syscall number is still stored in rax, with the syscall type (class) encoded in bits 24 and up:

Mach Traps: rax = (0x01 << 24) | number
Unix Syscall: rax = (0x02 << 24) | number
Machine Dependent: rax = (0x03 << 24) | number
Diagnostics: rax = (0x04 << 24) | number
Mach IPC (unused?): rax = (0x05 << 24) | number

These come from a combination of the constants defined in xnu/osfmk/mach/i386/syscall_sw.h and hndl_syscall from xnu/osfmk/x86_64/idt64.s. Reading hndl_syscall explains why, when shellcoding for xnu, you must add 0x2000000 (that is, 0x02 << 24) to your unix syscall numbers -- otherwise they won't be dispatched to the right syscall handler.
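The class/number split described above can be sketched in a few lines of C. This is an illustration only: the macro and function names are hypothetical stand-ins rather than the kernel's own definitions, and the number field is assumed to occupy the low 24 bits.

```c
#include <stdint.h>

/* Hypothetical names for illustration; the real constants live in
 * xnu/osfmk/mach/i386/syscall_sw.h and may be spelled differently. */
#define SKETCH_CLASS_SHIFT 24
#define SKETCH_CLASS_MASK  (0xFFULL << SKETCH_CLASS_SHIFT)
#define SKETCH_NUMBER_MASK ((1ULL << SKETCH_CLASS_SHIFT) - 1)

enum sketch_class {
    SKETCH_CLASS_MACH = 1, /* Mach traps */
    SKETCH_CLASS_UNIX = 2, /* Unix syscalls */
    SKETCH_CLASS_MDEP = 3, /* machine dependent */
    SKETCH_CLASS_DIAG = 4, /* diagnostics */
    SKETCH_CLASS_IPC  = 5  /* Mach IPC */
};

/* Build the value userspace would place in rax. */
uint64_t sketch_make_syscall(uint64_t cls, uint64_t number)
{
    return (cls << SKETCH_CLASS_SHIFT) | (number & SKETCH_NUMBER_MASK);
}

/* Recover the class from the high bits of rax. */
uint64_t sketch_syscall_class(uint64_t rax)
{
    return (rax & SKETCH_CLASS_MASK) >> SKETCH_CLASS_SHIFT;
}

/* Recover the index into the per-class table from the low bits. */
uint64_t sketch_syscall_number(uint64_t rax)
{
    return rax & SKETCH_NUMBER_MASK;
}
```

For exit(), which is unix syscall number 1, sketch_make_syscall(SKETCH_CLASS_UNIX, 1) yields 0x2000001, the value shellcode loads into rax before executing syscall.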

Following the unix syscall path in hndl_syscall, we jump to hndl_unix_scall64, which in turn calls unix_syscall64 with a single argument taken from r15. This function is defined in xnu/bsd/dev/i386/systemcalls.c. From here, it's easiest to just snip the code relevant to our question:

  thread = current_thread();
  uthread = get_bsdthread_info(thread);
  // regs is derived from r15 ...
  code = regs->rax & SYSCALL_NUMBER_MASK;
  callp = (code >= NUM_SYSENT) ? &sysent[63] : &sysent[code];
  // ...
  vt = (void *)uthread->uu_arg;
  // ...
  memcpy(vt, args_start_at_rdi ? &regs->rdi : &regs->rsi,
        args_in_regs * sizeof(syscall_arg_t));
  // ...
  error = (*(callp->sy_call))((void *)p, vt, &(uthread->uu_rval[0]));

To briefly explain this code: first, we get the current thread struct. Second, we get the system call entry out of the syscall table. This entry includes the number of arguments the syscall expects, as well as the function pointer (sy_call). Third, we get a chunk of memory out of the current thread struct (uu_arg), and finally we copy the arguments from the saved register state into that memory on the kernel's thread struct.

This pretty much solves our mystery: the interrupt handler pushes all the registers onto the kernel stack, and the argument registers from that saved state are in turn copied into the thread's struct. The address of that memory inside the thread struct is passed to our syscall as uap, which uses it to reference all of its arguments. So the arguments live in kernelspace, not userspace.
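To make the copy-then-dispatch flow concrete, here is a small userspace model of it. Everything in it is a simplified, hypothetical stand-in: the struct layouts, the two-entry table, and the sketch_add "syscall" are invented for illustration and are not XNU's definitions. The argument registers are modelled as an array so the memcpy stays well-defined C.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t sketch_arg_t;

/* Stand-in for the saved register state: rax plus the six argument
 * registers (rdi, rsi, rdx, r10, r8, r9) stored contiguously. */
struct sketch_regs {
    uint64_t     rax;
    sketch_arg_t arg[6];
};

/* Stand-in for the per-thread BSD state holding args and return values. */
struct sketch_uthread {
    sketch_arg_t uu_arg[8];
    int          uu_rval[2];
};

typedef int (*sketch_call_t)(void *p, void *uap, int *retval);

struct sketch_sysent {
    int           sy_narg; /* number of register arguments */
    sketch_call_t sy_call; /* handler */
};

/* Argument struct for the made-up "add" syscall. */
struct sketch_add_args { sketch_arg_t a; sketch_arg_t b; };

static int sketch_nosys(void *p, void *uap, int *retval)
{
    (void)p; (void)uap; (void)retval;
    return ENOSYS;
}

static int sketch_add(void *p, void *uap, int *retval)
{
    struct sketch_add_args *args = uap; /* reads from the thread struct */
    (void)p;
    *retval = (int)(args->a + args->b);
    return 0;
}

static const struct sketch_sysent sketch_table[] = {
    { 0, sketch_nosys }, /* 0: fallback entry */
    { 2, sketch_add   }, /* 1: hypothetical add(a, b) */
};
#define SKETCH_NUM_SYSENT (sizeof(sketch_table) / sizeof(sketch_table[0]))

/* Mirrors the shape of the snippet above: mask off the class, pick the
 * table entry, copy the register args into the thread struct, dispatch. */
int sketch_dispatch(const struct sketch_regs *regs, struct sketch_uthread *ut)
{
    uint64_t code = regs->rax & 0xFFFFFF;
    const struct sketch_sysent *callp =
        (code >= SKETCH_NUM_SYSENT) ? &sketch_table[0] : &sketch_table[code];

    memcpy(ut->uu_arg, regs->arg, (size_t)callp->sy_narg * sizeof(sketch_arg_t));
    return callp->sy_call(NULL, ut->uu_arg, &ut->uu_rval[0]);
}
```

Calling sketch_dispatch with rax set to (0x02 << 24) | 1 and the first two argument slots set to 40 and 2 leaves 42 in uu_rval[0]. The handler never touches the register state directly; it only sees the copy in uu_arg.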
