This is very important for [reproducible builds](https://reproducible-builds.org/), which are a basis for secure delivery.

* [Reproducible builds: Archives](https://reproducible-builds.org/docs/archives/#zip-files) has a hint to use the `-X` option
* https://fekir.info/post/reproducible-zip-archives/ a more detailed article on how to make a deterministic ZIP in Python and with a shell command
* https://www.npmjs.com/package/deterministic-zip a NodeJS package to zip reproducibly
* https://stackoverflow.com/questions/62524167/zip-non-deterministic-result-in-linux has some hints
* https://wiki.debian.org/ReproducibleBuilds/TimestampsInZip Debian detects when non-deterministic zips are used and reports it as an error. Some workarounds are proposed.

An archive preserves the user and group (usually only their ids, uid/gid) and the time of last modification, `mtime`.
The time is almost never important, so you can set a standard static reproducible date, 1 Feb 1980.
The date is used in many tools like Maven, Gradle etc.
Or instead you can use the `SOURCE_DATE_EPOCH` env variable and take a date from the git log.
The owner uid/gid can simply be zeroed.

## deterministically archive folder to .tar.gz and remove the folder

```sh
reproducible_tar() {
  src_folder=$1
  tar \
    --remove-files \
    --sort=name \
    --mtime='UTC 1980-02-01' \
    --owner=0 --group=0 --numeric-owner \
    --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
    -z \
    -cf "$src_folder.tar.gz" "$src_folder/"
  # give the archive file itself the same fixed timestamp
  TZ=UTC touch -a -m -t 198002010000.00 "$src_folder.tar.gz"
}
reproducible_tar test_targz
```

Here

* `--remove-files` removes the source folder once it has been successfully archived.
* `--sort=name` sorts the files so that their order in the tar is always the same.
* `--mtime='UTC 1980-02-01'` sets the modification time to the standard static reproducible date 1 Feb 1980 in UTC.
* `--owner=0 --group=0 --numeric-owner` sets the owner uid and gid to 0 and stores only the numeric ids.
* `--pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime` gives the PAX extended headers a fixed name and removes the access time `atime` and the status change time `ctime`.
* `-z` compresses the tar to gzip format. You may use `--use-compress-program 'gzip -9'` to set `gzip` options like max compression. Note that gzip may record its own timestamp in the archive header unless it is given `-n`, e.g. `--use-compress-program 'gzip -9 -n'`.

It is also better to fix the `mtime` of the resulting archive. The `touch -a -m` sets the access and modification times.

Now to uncompress use:

```sh
pushd ./folder_with_archive || exit 1
tar -xf "$src_folder.tar.gz"
rm -f "$src_folder.tar.gz"
```

The tar doesn't have an option to delete the archive during decompression.
In gzip this is the default behaviour, and to keep an archive you should add the `-k` or `--keep` option.
Not sure why the tar doesn't work like that.
So the last command is a manual removal of the archive with `rm`.

You may have a big archive, and removing parts of it while extracting may sometimes save space, for example on a router with a small NAND flash.
But unarchiving may fail, so this would be dangerous anyway.
Deletion should happen only after a successful decompression.

## deterministically archive folder to .zip and remove the folder

```sh
reproducible_zip() {
  src_folder=$1
  # zip can't override mtime, so reset it on every file and symlink first
  TZ=UTC find "$src_folder" -exec touch --no-dereference -a -m -t 198002010000.00 {} +
  TZ=UTC zip -q --move --recurse-paths --symlinks -X "$src_folder.zip" "$src_folder"
  TZ=UTC touch -a -m -t 198002010000.00 "$src_folder.zip"
}
reproducible_zip test_zip
```

The zip command doesn't have an option to set the `mtime`, so we have to change the `mtime` of all files and symlinks in the folder and only then zip it.

Zip command options:

* `-q` is quiet
* `--move` or `-m` removes the files once the zip is complete
* `--recurse-paths` compresses all subfolders
* `--symlinks` stores symlinks as links, otherwise zip stores the files they point to
* `-X` or `--no-extra` (not supported for some reason) removes the extra fields such as `uid`/`gid`.

It is also better to fix the `mtime` of the resulting archive. The `touch -a -m` sets the access and modification times.

To unarchive the zip you also have to specify the `UTC` timezone, otherwise the file times will be set in the local timezone.
If you have symlinks, you additionally have to touch them to set their `mtime`.

```sh
pushd ./folder_with_archive || exit 1
TZ=UTC unzip -q "$src_folder.zip"
# unzip doesn't restore mtime of symlinks (bug?), so update it manually
TZ=UTC find . -exec touch --no-dereference -a -m -t 198002010000.00 {} +
rm -f "$src_folder.zip"
```

The unzip doesn't have an option to delete the archive during decompression.
So the last command is a manual removal of the archive with `rm`.
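
One way to check that the tar recipe above really is deterministic is to pack the same folder twice, changing the file times in between, and compare the two archives. The following is only a sketch under some assumptions: `pack_deterministic` and `check_reproducible` are hypothetical helper names, the source folder is kept (no `--remove-files`), the stream is piped through `gzip -n` so that gzip does not store its own timestamp, and GNU tar 1.28 or newer is assumed for `--sort=name`.

```sh
# Hypothetical helper: pack folder $1 into archive $2 without deleting the source.
pack_deterministic() {
  src_dir=$1
  out=$2
  # -C switches to the parent folder so only the folder name is stored in the archive
  tar \
    --sort=name \
    --mtime='UTC 1980-02-01' \
    --owner=0 --group=0 --numeric-owner \
    --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
    -C "$(dirname "$src_dir")" \
    -cf - "$(basename "$src_dir")" | gzip -n > "$out"
}

# Hypothetical check: pack, change every mtime, pack again, compare byte by byte.
check_reproducible() {
  src_dir=$1
  work=$(mktemp -d)
  pack_deterministic "$src_dir" "$work/a.tar.gz"
  # move all timestamps to a different date to prove they don't leak into the archive
  find "$src_dir" -exec touch -t 200101010000.00 {} +
  pack_deterministic "$src_dir" "$work/b.tar.gz"
  cmp -s "$work/a.tar.gz" "$work/b.tar.gz"
  result=$?
  rm -rf "$work"
  return "$result"
}
```

For example, `check_reproducible test_targz && echo reproducible` returns success only when both archives are identical.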