This is very important for reproducible builds, which are a basis for secure delivery.
- Reproducible builds: Archives has a hint to use the -X option
- https://fekir.info/post/reproducible-zip-archives/ a more detailed article on how to make a deterministic ZIP in Python and with a shell command
- https://www.npmjs.com/package/deterministic-zip a NodeJS package to reproducibly zip
- https://stackoverflow.com/questions/62524167/zip-non-deterministic-result-in-linux has some hints
- https://wiki.debian.org/ReproducibleBuilds/TimestampsInZip Debian detects when non-deterministic zip archives are used and reports this as an error. A workaround is proposed.
An archive preserves the user and group of each file (usually only their ids, uid/gid) and the time of last modification, mtime.
The time is almost never important, so you can set it to a standard, static, reproducible date: 1 Feb 1980.
This date is used by many tools such as Maven, Gradle, etc.
Alternatively, you can use the SOURCE_DATE_EPOCH environment variable and take the date from git log.
The owner uid/gid can simply be zeroed.
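As a rough illustration of the SOURCE_DATE_EPOCH approach, the sketch below picks a timestamp in this order: an already exported SOURCE_DATE_EPOCH, then the last commit time from git log, then the fixed 1 Feb 1980 date as an epoch (318211200). The helper name resolve_source_date_epoch is hypothetical, and the fallback order is an assumption rather than a convention of any particular tool.

```shell
# Hypothetical helper: decide which timestamp to stamp archive entries with.
resolve_source_date_epoch() {
    if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
        # Respect a value already provided by the build environment.
        printf '%s\n' "$SOURCE_DATE_EPOCH"
    elif epoch=$(git log -1 --format=%ct 2>/dev/null) && [ -n "$epoch" ]; then
        # Committer time of the latest commit, in seconds since the Unix epoch.
        printf '%s\n' "$epoch"
    else
        # Not in a git checkout: fall back to 1980-02-01 00:00:00 UTC.
        printf '%s\n' 318211200
    fi
}

SOURCE_DATE_EPOCH=$(resolve_source_date_epoch)
export SOURCE_DATE_EPOCH
echo "$SOURCE_DATE_EPOCH"
```

With GNU tar the value can then be passed as --mtime="@$SOURCE_DATE_EPOCH" instead of the fixed date.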
reproducible_tar() {
    src_folder=$1
    # GNU tar: fixed entry order, fixed mtime, zeroed owner, no atime/ctime headers
    tar \
        --remove-files \
        --sort=name \
        --mtime='UTC 1980-02-01' \
        --owner=0 --group=0 --numeric-owner \
        --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
        -z \
        -cf "$src_folder.tar.gz" "$src_folder/"
    # give the archive file itself the same fixed timestamp
    TZ=UTC touch -a -m -t 198002010000.00 "$src_folder.tar.gz"
}
reproducible_tar test_targz
Here:
- --remove-files removes the source folder once it has been successfully compressed.
- --sort=name sorts the files so that their order in the tar is always the same.
- --mtime='UTC 1980-02-01' sets the modification time to the standard static reproducible date, 1 Feb 1980 in UTC.
- --owner=0 --group=0 --numeric-owner zeroes the owner uid and gid.
- --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime removes the headers with the access time atime and the change time ctime.
- -z compresses the tar in gzip format. You may use --use-compress-program 'gzip -9' to set gzip options such as maximum compression.
It is also better to set the mtime of the resulting archive itself. touch -a -m sets the access and modification times.
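One way to check that these flags really give a reproducible result is to build the same folder twice and compare checksums. The sketch below assumes GNU tar (1.28 or newer, for --sort) and sha256sum from coreutils. It leaves out --remove-files so the folder can be archived twice, and it pipes through gzip -n, which is an extra precaution of this sketch (not part of the function above) to keep gzip from storing its own name and timestamp. The folder and file names are made up for the demonstration.

```shell
# Build a small throwaway tree to archive.
work=$(mktemp -d)
mkdir -p "$work/demo/sub"
echo 'hello' > "$work/demo/a.txt"
echo 'world' > "$work/demo/sub/b.txt"

build() {
    # Same normalising flags as reproducible_tar, writing to the path given in $1.
    tar -C "$work" \
        --sort=name \
        --mtime='UTC 1980-02-01' \
        --owner=0 --group=0 --numeric-owner \
        --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
        -cf - demo/ | gzip -n > "$1"
}

build "$work/first.tar.gz"
touch "$work/demo/a.txt"    # change the on-disk mtime between the two builds
build "$work/second.tar.gz"

sum1=$(sha256sum < "$work/first.tar.gz")
sum2=$(sha256sum < "$work/second.tar.gz")
if [ "$sum1" = "$sum2" ]; then result=identical; else result=different; fi
echo "$result"
rm -rf "$work"
```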
Now, to uncompress, use:
pushd ./folder_with_archive || exit 1
tar -xf "$src_folder.tar.gz"
rm -f "$src_folder.tar.gz"
tar doesn't have an option to delete the archive during decompression.
In gzip this is the default behaviour, and to keep the archive you have to add the -k or --keep option.
Not sure why tar doesn't work like that.
So the last command manually removes the archive with rm.
You may have a big archive, and removing parts of it while extracting may sometimes save space, for example on a router with a small NAND flash.
But unarchiving may fail, so this would be dangerous anyway.
Deletion should happen only after a successful extraction.
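The "delete only after success" rule can be sketched as a small wrapper that removes the archive only when tar exits successfully. The function name extract_and_remove and its optional destination argument are hypothetical, chosen just for this illustration.

```shell
# Hypothetical helper: extract an archive and delete it only if extraction succeeded.
extract_and_remove() {
    archive=$1
    dest=${2:-.}    # destination directory, defaults to the current one
    if tar -xf "$archive" -C "$dest"; then
        rm -f -- "$archive"
    else
        echo "extraction of $archive failed, keeping the archive" >&2
        return 1
    fi
}
```

For example, extract_and_remove test_targz.tar.gz ./folder_with_archive leaves the archive in place if tar reports an error.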
reproducible_zip() {
    src_folder=$1
    # reset the timestamps of the files and symlinks in the source folder only
    TZ=UTC find "$src_folder" -exec touch --no-dereference -a -m -t 198002010000.00 {} +
    TZ=UTC zip -q --move --recurse-paths --symlinks -X "$src_folder.zip" "$src_folder"
    # give the archive file itself the same fixed timestamp
    TZ=UTC touch -a -m -t 198002010000.00 "$src_folder.zip"
}
reproducible_zip test_zip
zip doesn't have an option to set the mtime, so we have to change the mtime of all files and symlinks in the folder and only then zip it.
Zip command options:
- -q is quiet.
- --move or -m removes the files once the zip is complete.
- --recurse-paths compresses all subfolders.
- --symlinks stores symlinks as symlinks; otherwise zip stores the files they point to.
- -X or --no-extra (not supported for some reason) removes the uid/gid fields.
It is also better to set the mtime of the resulting archive itself. touch -a -m sets the access and modification times.
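Because zip relies on touch, a SOURCE_DATE_EPOCH value has to be converted into the [[CC]YY]MMDDhhmm[.SS] form that touch -t expects. A minimal sketch, assuming GNU date (the -d @EPOCH syntax); the helper name epoch_to_touch_stamp is hypothetical:

```shell
# Hypothetical helper: convert seconds since the Unix epoch into a touch -t stamp, in UTC.
epoch_to_touch_stamp() {
    TZ=UTC date -d "@$1" +%Y%m%d%H%M.%S
}

# Use SOURCE_DATE_EPOCH when it is set, otherwise 1980-02-01 00:00:00 UTC.
stamp=$(epoch_to_touch_stamp "${SOURCE_DATE_EPOCH:-318211200}")
echo "$stamp"
```

For the epoch 318211200 this yields 198002010000.00, the same stamp that is hard-coded in the touch commands here, so "$stamp" can be passed to touch -t in its place.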
To unarchive the zip you also have to specify the UTC timezone, otherwise unzip will set the file times in the local timezone.
If you have symlinks, you additionally have to touch them to set their mtime.
pushd ./folder_with_archive || exit 1
TZ=UTC unzip -q "$src_folder.zip"
# unzip doesn't restore mtime of symlinks (bug?), so update it manually
TZ=UTC find . -exec touch --no-dereference -a -m -t 198002010000.00 {} +
rm -f "$src_folder.zip"
unzip doesn't have an option to delete the archive during decompression.
So the last command manually removes the archive with rm.