Last active
August 30, 2025 11:49
Revisions
kyle0r revised this gist
Aug 30, 2025 . 1 changed file with 73 additions and 13 deletions.
@@ -1,30 +1,89 @@

I wrote and used these snippets to compare two root-filesystem backups from a KVM, but they could be used for any similar directory hierarchy.
The snippets helped me spot deltas between the two hierarchies and confirm the newer backup superseded the older one, so the old copy could be discarded.

# Summary

This Gist presents a concise yet practical set of **shell snippet recipes** that help to visually and functionally compare two similar directory hierarchies (trees). The worked example compares two root filesystem backups, but the recipes work for any pair of paths.

The gist outlines three main approaches:

1. **comm** → review output via `less -S`.
2. **vimdiff** → interactively explore differences.
3. **rhash** → compare CRC32 checksums for content-level verification, then view differences with `comm` or `vimdiff`.

- Great for **backup verification, spotting file changes, and ensuring integrity**.
- Provides **well-documented, adaptable, and safe pipelines**.
- Useful for **sysadmins, DevOps engineers, and anyone working with filesystem snapshots**.

# Details

1. **Using `comm`**
   - Generate sorted lists of files (including their paths and sizes) from two directories, optionally pruning specific paths (like `./var`).
   - Use `comm -3` to show items unique to each path list.
2. **Using `vimdiff`**
   - Similar to `comm`, but feeds the two lists to `vimdiff` for a side-by-side interactive visual diff.
3. **Using `rhash` with recursive checksum comparison**
   - Generate SFV (CRC32) checksum lists of each directory's non-empty files (`--sfv`).
   - Optionally prune subdirectories and skip performance stats.
   - Use `comm -3` or `vimdiff` on sorted checksum lists to highlight true content differences.

## Pipeline best practices covered

- Subshell usage preserves the current working directory and anchors the comparison by removing parent paths that would otherwise cause mismatches.
- Progress feedback (`time`, `--speed`, `--percents`)
- Safe handling of filenames with whitespace (`-print0`, `xargs -0`)
- Ignoring SFV comment lines (`grep -v '^;'`)
- Ensuring proper sorting for reliable diffs
- Efficient browsing with `less -S`

## What a reader can learn

- **Quick Directory Diffs with Size Context**
  Spot missing or added files using `comm`.
- **Visual Confirmation**
  Explore side-by-side diffs interactively with `vimdiff`.
- **Content Integrity Check**
  Verify identical files using checksum-based methods with `rhash`.
- **Flexible Pruning**
  Exclude irrelevant paths (like logs or caches).
- **Safe File Handling**
  Use proper flags and piping to handle edge cases (spaces, special chars).
- **Deeper Tool Understanding**
  See how Unix tools like `comm`, `vimdiff`, `rhash`, `grep`, and `less` combine for robust comparisons.

## Reader's guide

### 1. Choose Your Comparison Strategy

- **Fast and basic**: Use `comm` for quick detection of additions/deletions.
- **Visual and interactive**: Use `vimdiff` for detailed inspection.
- **Thorough and content-based**: Use `rhash` to check that file contents match.

### 2. Adapt the Snippets

- Change directory paths to suit your case.
- Adjust prunes (e.g., skip `./var`).
- For large sets of small files, drop the `--speed --percents` flags from `rhash` to reduce overhead.

---

# The snippets

## `comm` pipeline

- Runs two subshells producing lists of file paths and file sizes, sorted by file path:
  - Subshells are anchored to the specified dir
  - Left: `cd` to `/mnt/tmp-vey-disk-2`, skip `./var`, `find` files and print path+size, `sort` by path.
  - Right: `cd` to `/mnt/tmp-vey-disk-1`, `find` files, print path+size, `sort` by path.
- `comm -3` compares the two sorted streams and prints lines unique to each (suppresses common lines).
- Pipe to `less -S` to view results with horizontal scrolling.

```
comm -3 \
<( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
<( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ) | less -S
```

## `vimdiff` pipeline

As the previous invocation, but swaps `comm` for `vimdiff` and removes the `less -S`.

```
vimdiff \
```

@@ -121,3 +180,4 @@ As above but prunes a path during the find invocation.

## Explainer for `rhash` results comparison with `vimdiff`

As above but invokes a visual diff with `vimdiff`.
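To try the `comm` recipe without touching real backups, here is a minimal sketch that runs the same pipeline shape on two throwaway trees. The scratch directory, the file names and the `result` variable are assumptions made up for illustration; it needs bash (for process substitution) and GNU `find` (for `-printf`). It uses `cd ... &&` rather than `cd ... ;` so that `find` does not run in the wrong directory if the `cd` fails.

```bash
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # make sort and comm agree on byte-wise ordering

# Hypothetical scratch trees standing in for the two backups.
scratch=$(mktemp -d)
mkdir -p "$scratch/left/etc" "$scratch/left/var/log" "$scratch/right/etc"
printf 'same\n'  > "$scratch/left/etc/hosts"
printf 'same\n'  > "$scratch/right/etc/hosts"
printf 'old\n'   > "$scratch/left/etc/motd"      # 4 bytes
printf 'newer\n' > "$scratch/right/etc/motd"     # 6 bytes
printf 'noise\n' > "$scratch/left/var/log/syslog" # pruned below

# Left list prunes ./var; right list has no ./var to prune.
# Each list is "path<TAB>size", sorted on the path field.
result=$(
  comm -3 \
    <( cd "$scratch/left"  && find . -path './var' -prune -o \( -type f -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
    <( cd "$scratch/right" && find . -type f -printf '%p\t%s\n' | sort -t$'\t' -k1,1 )
)

printf '%s\n' "$result"
rm -rf "$scratch"
```

Only `./etc/motd` should appear, once in the left column with size 4 and once (tab-indented) in the right column with size 6; `./etc/hosts` is common to both and `./var` is pruned. A same-size file with different content would not show up here, which is where the checksum approach comes in.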
kyle0r revised this gist
Aug 26, 2025 . 1 changed file with 2 additions and 2 deletions.
@@ -1,6 +1,6 @@

Some shell snippets to document side-by-side visual comparison of similar directory hierarchies (trees). Three examples: one using `comm`, one using `vimdiff` and one using `rhash`.

Includes a demonstration of how to prune a path with `find`.
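As a rough illustration of the `find` pruning mentioned above, the sketch below lists the same throwaway tree with and without a prune. The scratch tree and the `pruned`/`unpruned` variable names are assumptions for the example only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical throwaway tree.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc" "$scratch/var/cache" "$scratch/home/var"
touch "$scratch/etc/fstab" "$scratch/var/cache/blob" "$scratch/home/var/notes"

# -path './var' matches only the top-level ./var (the pattern must match the
# whole path), and -prune stops find from descending into it.
# The -o branch handles everything else and carries its own explicit action.
pruned=$( cd "$scratch" && find . -path './var' -prune -o \( -type f -print \) | LC_ALL=C sort )
unpruned=$( cd "$scratch" && find . -type f -print | LC_ALL=C sort )

printf 'pruned:\n%s\n\nunpruned:\n%s\n' "$pruned" "$unpruned"
rm -rf "$scratch"
```

The pruned listing keeps `./home/var/notes` because only the path `./var` itself matches the pattern; a nested directory that merely shares the name is untouched.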
kyle0r revised this gist
Aug 26, 2025 . 1 changed file with 22 additions and 20 deletions.
@@ -1,3 +1,4 @@

Quick snippet to document side-by-side visual comparison of similar directory hierarchies (trees). Three examples: one using `comm`, one using `vimdiff` and one using `rhash`.

@@ -12,6 +13,7 @@ They let me spot deltas between the two hierarchies and confirm the newer backup

# `comm` pipeline

- Runs two subshells producing lists of file paths and file sizes, sorted by file path:
  - Subshells are anchored to the specified dir
  - Left: cd to /mnt/tmp-vey-disk-2, skip ./var, find files and print path+size, sort by path.
  - Right: cd to /mnt/tmp-vey-disk-1, find files, print path+size, sort by path.
- comm -3 compares the two sorted streams and prints lines unique to each (suppresses common lines).

@@ -30,6 +32,8 @@ vimdiff \

```
<( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 )
```

-----

# What about a recursive hash comparison?

Here is a set of pipelines that should give you a useful result:

@@ -47,7 +51,6 @@ Here is a set of pipelines that should give you a useful result:

```
| xargs -0 -- rhash --speed --percents --sfv -- \
) > /tmp/right-tmp-find-pipeline.sfv
```

As above but prunes a path.

💡🏁 `rhash` will be more performant with a lot of small files if you remove the `--speed --percents` options, which will mute the statistics calculations and output.

@@ -67,24 +70,23 @@ vimdiff \

## Explainer for creating the left-hand checksums

- Runs a subshell rooted at the specified dir, so the current shell's cwd is unchanged.
- Inside that subshell:
  - `time` measures and prints the wall/CPU time for the pipeline.
  - `find . -size +0c -a -type f -a -print0`
    - starts at `.`, finds regular files (`-type f`) with size > 0 bytes (`-size +0c`).
    - `-print0` emits NUL-terminated filenames (safe for whitespace/newlines).
  - `| xargs -0 -- rhash --speed --percents --sfv --`
    - `xargs -0` reads the NUL-separated names and supplies them as arguments to `rhash`.
    - the `--` after `xargs` stops its option parsing; the `--` passed to `rhash` indicates no more options (file args follow).
    - `rhash` options:
      - `--speed` and `--percents` show progress/performance.
      - `--sfv` requests SFV (CRC32) output.
    - Given file arguments, `rhash` computes checksums and writes the SFV to stdout.
- The final shell redirection (`> /tmp/left-tmp-find-pipeline.sfv`) captures rhash's stdout (the SFV) into `/tmp/left-tmp-find-pipeline.sfv`.

**Summary:** In a subshell rooted at `./tmp-vey-disk-1`, this finds all non-empty regular files, computes CRC32 SFV entries for them with progress output, saves the SFV to `/tmp/left-tmp-find-pipeline.sfv`, and reports timing.

## Explainer for creating the right-hand checksums

@@ -97,25 +99,25 @@ As above but prunes a path during the find invocation.

  - `-3` suppresses column 3 (lines common to both), leaving only lines unique to A (output column 1) and unique to B (output column 2).
- Process substitution for left stream: `<( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort )`
  - `grep -v '^;' /tmp/left-tmp-find-pipeline.sfv`
    - Remove lines starting with ';' (rhash SFV comment/header lines).
  - `| grep .`
    - Remove empty lines (keep only non-blank lines).
  - `| sort`
    - Ensure the stream is sorted; required by `comm` to work correctly.
- Process substitution for right stream: `<( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )`
  - Same steps as left but operating on the right SFV file.
- Pipe to `less -S`
  - `less -S` chops long lines instead of wrapping them and lets you scroll horizontally.
  - Useful because SFV lines contain a path followed by its CRC32 checksum and can be long.
- Overall effect
  - Produce a paged view of entries present in only one SFV or the other (paths and checksums), excluding SFV comment/header lines and blank lines. This helps spot files that differ at the binary level between the left and right directory hierarchies.

## Explainer for `rhash` results comparison with `vimdiff`

As above but invokes a visual diff with `vimdiff`.
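The NUL-safe `find ... -print0 | xargs -0` checksum step described above can be sketched on scratch data as follows. Note the swap: `rhash` may not be installed everywhere, so this sketch substitutes `sha256sum` from coreutils, which is not the tool used in this gist and emits `hash  path` lines without SFV `;` comments. The scratch trees, the `checksum_tree` helper and the `differing` variable are likewise assumptions for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch trees; one file name contains a space on purpose.
scratch=$(mktemp -d)
mkdir -p "$scratch/left/docs" "$scratch/right/docs"
printf 'same\n' > "$scratch/left/docs/read me.txt"
printf 'same\n' > "$scratch/right/docs/read me.txt"
printf 'abcd\n' > "$scratch/left/docs/data.bin"
printf 'abce\n' > "$scratch/right/docs/data.bin"   # same size, different content
: > "$scratch/left/docs/empty"                     # skipped by -size +0c

# Checksum every non-empty regular file, with paths relative to the tree root.
# The subshell keeps the caller's working directory unchanged.
checksum_tree() {
  ( cd "$1" && find . -size +0c -type f -print0 | xargs -0 -r -- sha256sum -- )
}

checksum_tree "$scratch/left"  > "$scratch/left.sums"
checksum_tree "$scratch/right" > "$scratch/right.sums"

# Whole-line sort, then keep only lines unique to one side.
differing=$(
  LC_ALL=C comm -3 \
    <( LC_ALL=C sort "$scratch/left.sums" ) \
    <( LC_ALL=C sort "$scratch/right.sums" )
)

printf '%s\n' "$differing"
rm -rf "$scratch"
```

Because both `data.bin` files are five bytes long, a path+size listing would treat them as equal; the checksum lists flag them, while the identically named and identical `read me.txt` drops out as a common line.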
kyle0r revised this gist
Aug 26, 2025 . 1 changed file with 12 additions and 4 deletions.
@@ -36,12 +36,16 @@ Here is a set of pipelines that should give you a useful result:

## `rhash` the left-hand side

```
( cd ./tmp-vey-disk-1 ; time find . -size +0c -a -type f -a -print0 \
| xargs -0 -- rhash --speed --percents --sfv -- \
) > /tmp/left-tmp-find-pipeline.sfv
```

## `rhash` the right-hand side

```
( cd ./tmp-vey-disk-2 ; time find . -path './var' -a -prune -o \( -size +0c -a -type f -a -print0 \) \
| xargs -0 -- rhash --speed --percents --sfv -- \
) > /tmp/right-tmp-find-pipeline.sfv
```

As above but prunes a path.

@@ -50,12 +54,16 @@ As above but prunes a path.

## Compare the results with `comm`

```
comm -3 \
<( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) \
<( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort ) | less -S
```

## Compare the results with `vimdiff`

```
vimdiff \
<( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) \
<( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )
```

## Explainer for creating the left-hand checksums
kyle0r revised this gist
Aug 26, 2025 . 1 changed file with 78 additions and 8 deletions.
@@ -1,9 +1,9 @@

Quick snippet to document side-by-side visual comparison of similar directory hierarchies (trees). Three examples: one using `comm`, one using `vimdiff` and one using `rhash`.

Includes a demonstration of how to prune a path with `find`.

The `comm` and `vimdiff` snippets give a quick indication of differences between two similar directory hierarchies (trees).
They're a good first step before deciding whether a recursive hash comparison is worthwhile.

I wrote and used these snippets to compare two root-filesystem backups from a KVM, but they could be used for any similar directory hierarchy.

@@ -23,7 +23,7 @@ comm -3 \

# `vimdiff` pipeline

As the previous invocation, but swaps `comm` for `vimdiff` and removes the `less -S`.

```
vimdiff \
<( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
```

@@ -32,12 +32,82 @@ vimdiff \

# What about a recursive hash comparison?

Here is a set of pipelines that should give you a useful result:

## `rhash` the left-hand side

```
( cd ./tmp-vey-disk-1 ; time find . -size +0c -a -type f -a -print0 | xargs -0 -- rhash --speed --percents --sfv -- ) > /tmp/left-tmp-find-pipeline.sfv
```

## `rhash` the right-hand side

```
( cd ./tmp-vey-disk-2 ; time find . -path './var' -a -prune -o \( -size +0c -a -type f -a -print0 \) | xargs -0 -- rhash --speed --percents --sfv -- ) > /tmp/right-tmp-find-pipeline.sfv
```

As above but prunes a path.

💡🏁 `rhash` will be more performant with a lot of small files if you remove the `--speed --percents` options, which will mute the statistics calculations and output.

## Compare the results with `comm`

```
comm -3 <( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) <( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort ) | less -S
```

## Compare the results with `vimdiff`

```
vimdiff <( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) <( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )
```

## Explainer for creating the left-hand checksums

- Runs a subshell so the current shell's cwd is unchanged: ( cd ./tmp-vey-disk-1 ; ... )
- Inside that subshell:
  - time measures and prints the wall/CPU time for the pipeline.
  - find . -size +0c -a -type f -a -print0
    - starts at ., finds regular files (-type f) with size > 0 bytes (-size +0c).
    - -print0 emits NUL-terminated filenames (safe for whitespace/newlines).
  - | xargs -0 -- rhash --speed --percents --sfv --
    - xargs -0 reads the NUL-separated names and supplies them as arguments to rhash.
    - the -- after xargs stops its option parsing; the -- passed to rhash indicates no more options (file args follow).
    - rhash options:
      - --speed and --percents show progress/performance.
      - --sfv requests SFV (CRC32) output.
    - Given file arguments, rhash computes checksums and writes the SFV to stdout.
- The final shell redirection (> /tmp/left-tmp-find-pipeline.sfv) captures rhash's stdout (the SFV) into /tmp/left-tmp-find-pipeline.sfv.

Summary: In a subshell rooted at ./tmp-vey-disk-1, this finds all non-empty regular files, computes CRC32 SFV entries for them with progress output, saves the SFV to /tmp/left-tmp-find-pipeline.sfv, and reports timing.

## Explainer for creating the right-hand checksums

As above but prunes a path during the find invocation.

## Explainer for `rhash` results comparison with `comm`

- comm -3 A B
  - Compares two sorted text streams A and B.
  - -3 suppresses column 3 (lines common to both), leaving only lines unique to A (output column 1) and unique to B (output column 2).
- Process substitution for left stream: <( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort )
  - grep -v '^;' /tmp/left-tmp-find-pipeline.sfv
    - Remove lines starting with ';' (rhash SFV comment/header lines).
  - | grep .
    - Remove empty lines (keep only non-blank lines).
  - | sort
    - Ensure the stream is sorted; required by comm to work correctly.
- Process substitution for right stream: <( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )
  - Same steps as left but operating on the right SFV file.
- Pipe to less -S
  - less -S chops long lines instead of wrapping them and lets you scroll horizontally.
  - Useful because SFV lines contain a path followed by its CRC32 checksum and can be long.
- Overall effect
  - Produce a paged view of entries present in only one SFV or the other (paths and checksums), excluding SFV comment/header lines and blank lines. This helps spot files that differ at the binary level between the left and right directory hierarchies.

## Explainer for `rhash` results comparison with `vimdiff`

As above but invokes a visual diff with `vimdiff`.
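The filter-then-compare stage from the explainer above (`grep -v '^;' | grep . | sort`, then `comm -3`) can be exercised without running `rhash` at all. The two SFV files below are hand-written for illustration: their CRC32 values are placeholders rather than real checksums, and the `clean_sfv` helper and `only_diffs` variable are names assumed for this sketch.

```bash
#!/usr/bin/env bash
set -euo pipefail

scratch=$(mktemp -d)

# Hand-written SFV samples; the CRC32 values are placeholders, not real checksums.
cat > "$scratch/left.sfv" <<'EOF'
; Generated by hand for illustration
./etc/motd 11111111

./etc/hosts AAAAAAAA
EOF

cat > "$scratch/right.sfv" <<'EOF'
; A different header line that must not show up as a difference
./etc/hosts AAAAAAAA
./etc/motd 22222222
EOF

# Drop ';' comment lines, drop blank lines, then sort for comm.
clean_sfv() { grep -v '^;' "$1" | grep . | LC_ALL=C sort; }

only_diffs=$(
  LC_ALL=C comm -3 \
    <( clean_sfv "$scratch/left.sfv" ) \
    <( clean_sfv "$scratch/right.sfv" )
)

printf '%s\n' "$only_diffs"
rm -rf "$scratch"
```

Without the `grep -v '^;'` step the two differing header comments would be reported as differences, and without the `sort` step `comm` would be comparing unsorted input. Swapping `comm -3` for `vimdiff` on the same two process substitutions gives the side-by-side view instead.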
kyle0r renamed this gist
Aug 26, 2025 . 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
kyle0r renamed this gist
Aug 26, 2025 . 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
kyle0r created this gist
Aug 26, 2025
@@ -0,0 +1,43 @@

Quick snippet to document side-by-side visual comparison of similar directory hierarchies (trees). Two examples: one using `comm`, one using `vimdiff`.

Includes a demonstration of how to prune a path with `find`.

These snippets give a quick indication of differences between two similar directory hierarchies (trees).
They're a good first step before deciding whether a recursive hash comparison is worthwhile.

I wrote and used these snippets to compare two root-filesystem backups from a KVM, but they could be used for any similar directory hierarchy.
They let me spot deltas between the two hierarchies and confirm the newer backup superseded the older one, so the old copy could be discarded.

# `comm` pipeline

- Runs two subshells producing lists of file paths and file sizes, sorted by file path:
  - Left: cd to /mnt/tmp-vey-disk-2, skip ./var, find files and print path+size, sort by path.
  - Right: cd to /mnt/tmp-vey-disk-1, find files, print path+size, sort by path.
- comm -3 compares the two sorted streams and prints lines unique to each (suppresses common lines).
- Pipe to less -S to view results with horizontal scrolling.

```
comm -3 \
<( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
<( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ) | less -S
```

# `vimdiff` pipeline

As above, but swaps `comm` for `vimdiff` and removes the `less -S`.

```
vimdiff \
<( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
<( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 )
```

# What about a recursive hash comparison?

I'd suggest using `rhash` for this. Run it once for the left directory hierarchy and once for the right. For example:

Replace `<checksum_filename>` and `<path_to_dir>` based on your scenario.

```
time rhash --recursive --speed --percents --sfv --output=<checksum_filename>.sfv <path_to_dir>
```

Then sort and compare the SFV files to locate binary differences between the two directory hierarchies.