Side-by-side visual comparison of similar directory hierarchies with comm, vimdiff and rhash

Quick snippets for side-by-side visual comparison of two similar directory hierarchies (trees). There are two examples:
one using comm, one using vimdiff.

Includes a demonstration of how to prune a path with find.

These snippets give a quick indication of differences between two similar directory hierarchies (trees).
They're a good first step before deciding whether a recursive hash comparison is worthwhile.

I wrote and used these snippets to compare two root-filesystem backups from a KVM, but they could be used for any pair of similar directory hierarchies.

They let me spot deltas between the two hierarchies and confirm the newer backup superseded the older one, so the old copy could be discarded.

comm pipeline

  • Runs two process substitutions, each producing a list of file paths and file sizes, sorted by path:
    • Left: cd to /mnt/tmp-vey-disk-2, skip ./var, find files and print path+size, sort by path.
    • Right: cd to /mnt/tmp-vey-disk-1, find files, print path+size, sort by path.
  • comm -3 compares the two sorted streams and prints lines unique to each (suppresses common lines).
  • Pipe to less -S to view results with horizontal scrolling.
comm -3 \
  <( cd /mnt/tmp-vey-disk-2 && find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
  <( cd /mnt/tmp-vey-disk-1 && find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ) | less -S
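To see how the output reads, here is a minimal sketch that runs the same kind of pipeline against two throwaway trees. The directory names, file names and contents are made up for illustration; only the find, sort and comm usage mirrors the snippet above. It assumes bash and GNU find (for -printf).

```shell
#!/usr/bin/env bash
# Build two small scratch trees (hypothetical contents, for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/left/etc" "$demo/left/var/log" "$demo/right/etc"
printf 'same\n'  > "$demo/left/etc/hosts"       # identical on both sides
printf 'same\n'  > "$demo/right/etc/hosts"
printf 'old\n'   > "$demo/left/etc/motd"        # 4 bytes on the left
printf 'newer\n' > "$demo/right/etc/motd"       # 6 bytes on the right
printf 'noise\n' > "$demo/left/var/log/syslog"  # under ./var, pruned on the left

# Same shape as the real pipeline: the left side prunes ./var, the right does not.
result=$(comm -3 \
  <( cd "$demo/left"  && find . -path './var' -prune -o \( -type f -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
  <( cd "$demo/right" && find . -type f -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ))
printf '%s\n' "$result"

rm -rf "$demo"
```

Lines starting at the left margin exist only in the left listing; lines indented by one tab exist only in the right listing. Here ./etc/motd shows up on both sides with different sizes, ./etc/hosts is suppressed because it matches, and nothing under ./var appears because the left find pruned it.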

vimdiff pipeline

Same as above, but with vimdiff in place of comm -3 and without the less -S pager.

vimdiff \
  <( cd /mnt/tmp-vey-disk-2 && find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
  <( cd /mnt/tmp-vey-disk-1 && find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 )
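Both pipelines repeat the same listing step, so it can be factored into a small helper. This is only a sketch: list_tree is a hypothetical name, not something from the original snippets, and it assumes bash and GNU find.

```shell
#!/usr/bin/env bash
# list_tree ROOT [PRUNE_PATH]
# Hypothetical helper: prints "path<TAB>size" for every regular file under
# ROOT, sorted by path. PRUNE_PATH (for example ./var) is optional and is
# matched relative to ROOT, the same way -path './var' is used above.
list_tree() {
  local root=$1 prune=${2:-}
  (
    cd "$root" || exit 1   # stop here rather than list the wrong directory
    if [ -n "$prune" ]; then
      find . -path "$prune" -prune -o \( -type f -printf '%p\t%s\n' \)
    else
      find . -type f -printf '%p\t%s\n'
    fi | sort -t$'\t' -k1,1
  )
}

# Usage with the paths from this gist:
#   comm -3 <( list_tree /mnt/tmp-vey-disk-2 ./var ) <( list_tree /mnt/tmp-vey-disk-1 ) | less -S
#   vimdiff <( list_tree /mnt/tmp-vey-disk-2 ./var ) <( list_tree /mnt/tmp-vey-disk-1 )
```

Running the listing inside a subshell keeps the cd from affecting the calling shell.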

What about a recursive hash comparison?

I'd suggest using rhash for this. Run it once for the left directory hierarchy and once for the right.

For example, replacing <checksum_filename> and <path_to_dir> to suit your scenario:

time rhash --recursive --speed --percents --sfv --output=<checksum_filename>.sfv <path_to_dir>

Then sort and compare the two SFV files to find files whose contents differ between the two directory hierarchies.
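The sort-and-compare step could look something like the sketch below. sfv_diff is a hypothetical helper name, not part of rhash. It assumes both SFV files record paths with the same relative prefix (for example, by running rhash from inside each hierarchy against .), and that comment lines begin with ; as is conventional for SFV.

```shell
#!/usr/bin/env bash
# sfv_diff LEFT.sfv RIGHT.sfv
# Hypothetical helper: drops SFV comment lines (starting with ';'), sorts the
# remaining "path checksum" lines, and prints only the lines that differ.
sfv_diff() {
  comm -3 \
    <( grep -v '^;' "$1" | sort ) \
    <( grep -v '^;' "$2" | sort )
}

# Usage (file names are placeholders):
#   sfv_diff left.sfv right.sfv | less -S
```

As with the comm pipeline, unindented lines come only from the left file and tab-indented lines only from the right. A path that appears on both sides with different checksums has different content, while a path on one side only is missing from the other hierarchy.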
