## Problem

pkgsrc builds on macOS have historically been significantly less reliable than on any other operating system. In my most recent builds there were over 400 instances of the error:

```
pkg_add: no pkg found for '', sorry.
```

This never happens on any other OS, even though I use NFS for the packages directory on illumos, Linux, NetBSD, and macOS.

## Reproduce

Two separate chroots. Each has the packages directory NFS mounted.

In the first session, repeatedly update the directory:

```
$ while true; do
    dd if=/dev/random of=/Volumes/data/packages/Darwin/12.3/arm64/All/blah.tgz count=1000
    rm -f /Volumes/data/packages/Darwin/12.3/arm64/All/blah.tgz
  done
```

Then attempt to simply install a package in the other session:

```
$ pkg_add digest-20220214
pkg_add: no pkg found for 'digest-20220214', sorry.
pkg_add: 1 package addition failed
```

dtruss(1) sheds some light:

```
$ dtruss pkg_add digest-20220214
...
open_nocancel("/Volumes/data/packages/Darwin/12.3/arm64/All\0", 0x1100004, 0x0) = 3 0
fstatfs64(0x3, 0x16FCAED50, 0x0) = 0 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8112 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8168 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8112 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8176 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8136 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8128 0
getdirentries64(0x3, 0x147809000, 0x2000) = -1 Err#2
close_nocancel(0x3) = 0 0
write_nocancel(0x2, "pkg_add: \0", 0x9) = 9 0
write_nocancel(0x2, "no pkg found for 'digest-20220214', sorry.\0", 0x2A) = 42 0
write_nocancel(0x2, "\n\0", 0x1) = 1 0
write_nocancel(0x2, "pkg_add: \0", 0x9) = 9 0
write_nocancel(0x2, "1 package addition failed\0", 0x19) = 25 0
write_nocancel(0x2, "\n\0", 0x1) = 1 0
```

dtrace for the stack:

```
$ dtrace -n 'syscall::getdirentries64:return/errno != 0/ {ustack();}' -c "pkg_add digest-20220214"
dtrace: description 'syscall::getdirentries64:return' matched 1 probe
pkg_add: no pkg found for 'digest-20220214', sorry.
pkg_add: 1 package addition failed
dtrace: pid 14747 has exited
CPU     ID                    FUNCTION:NAME
  2    855          getdirentries64:return
              libsystem_kernel.dylib`__getdirentries64+0x8
              libsystem_c.dylib`readdir+0x2c
              pkg_add`fetchListFile+0x4c
              pkg_add`find_best_package_int+0x114
              pkg_add`find_best_package+0x78
              pkg_add`find_archive+0x118
              pkg_add`pkg_do+0x4c
              pkg_add`pkg_perform+0x38
              pkg_add`main+0x314
              dyld`start+0x8bc
```

`getdirentries64()` is returning `ENOENT` (the `Err#2` above) because the in-progress `readdir()` is invalidated when the directory is updated underneath it.

[`fetchListFile()`](https://github.com/NetBSD/pkgsrc/blob/trunk/net/libfetch/files/file.c#L232) is just doing a bog-standard `opendir()`/`readdir()`/`closedir()` loop, the same as many other pieces of software and the manual page examples, with no `ENOENT` handling whatsoever.

## Open Questions

Why is this only a problem on macOS?

In all the third-party code and manual page examples I have looked at, none retry a `readdir()` loop after checking for `ENOENT`. The macOS manual page doesn't even list `ENOENT` as a valid errno!

## Workaround

A stupid workaround is to increase the NFS sizes:

```
$ mount_nfs -o rwsize=1048576,dsize=1048576
```

This doesn't make the problem go away, but it does make it significantly less likely.

## Fix

Yes please!
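
Until there is a real fix, an application-side mitigation might be to restart the directory scan whenever `readdir()` fails with `ENOENT`. The following is only a minimal sketch of that idea, not the libfetch code and not a tested fix: the function name `count_dir_entries` and the `max_retries` parameter are made up for illustration, and it assumes that simply reopening the directory and rescanning is enough to get past the invalidated read.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: count the entries in `path`, excluding "." and "..".
 *
 * If readdir() fails with ENOENT partway through the scan (the failure
 * seen in the dtruss output), the partial result is discarded and the
 * directory is reopened and rescanned, up to `max_retries` extra times.
 *
 * Returns the number of entries, or -1 with errno set on failure.
 */
long count_dir_entries(const char *path, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        DIR *dir = opendir(path);
        if (dir == NULL)
            return -1;              /* errno was set by opendir() */

        long count = 0;
        struct dirent *de;

        for (;;) {
            /* readdir() returns NULL for both end-of-directory and
             * error, so errno must be cleared before each call. */
            errno = 0;
            de = readdir(dir);
            if (de == NULL)
                break;
            if (strcmp(de->d_name, ".") == 0 ||
                strcmp(de->d_name, "..") == 0)
                continue;
            count++;
        }

        int saved = errno;          /* 0 means a clean end-of-directory */
        closedir(dir);

        if (saved == 0)
            return count;
        if (saved != ENOENT || attempt >= max_retries) {
            errno = saved;          /* report the readdir() error */
            return -1;
        }
        /* ENOENT mid-scan: loop around and rescan from the start. */
    }
}
```

A caller that builds a list, as `fetchListFile()` does, would have to throw away any partially built list before each rescan. An `ENOENT` from `opendir()` itself is deliberately not retried here, since that means the directory really is missing.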