  1. @invalid-email-address Anonymous created this gist May 14, 2012.
    # Git Compression of Blobs and Packfiles
    Many users of Git are [curious about](http://programmers.stackexchange.com/questions/148434/why-do-git-mercurial-repositories-use-less-space/148498#148498) the lack of delta compression at the object (blob) level when commits are first written. Git defers that saving until it writes a packfile. Until then, each version of a file is stored as a loose object that is zlib-compressed but not delta-encoded against earlier versions.

    A simple run-through of a commit sequence, making only the smallest change to an image (stored as uncompressed TIFF to amplify the observable behavior), helps show how this deferred approach works.
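
    As a rough illustration of what "compressed, but non-delta" means, the sketch below builds a loose blob the way Git's documented object format describes it: a short header, the raw content, a SHA-1 over both, and zlib compression of the result. The function name `loose_object` is hypothetical, and the sketch assumes the default SHA-1 object format.

```python
import hashlib
import zlib
from typing import Tuple


def loose_object(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, compressed bytes) for a blob holding `content`."""
    # A blob is hashed and stored as b"blob <size>\0" followed by the raw content.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The loose object file holds the zlib-compressed form of that byte string.
    return object_id, zlib.compress(store)


# An empty file, like the README committed in this walkthrough.
oid, compressed = loose_object(b"")
print(oid)
# Loose objects live at .git/objects/<first 2 hex chars>/<remaining 38 chars>.
print(f".git/objects/{oid[:2]}/{oid[2:]}")
```

    The empty blob hashes to an id beginning with e6, which is consistent with the objects/e6 directory that appears in the listings after the first commit. Nothing in this step looks at any other object, which is why a nearly identical file still produces a full, separate blob.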


    ## The command sequence:

    Create the repo:

    $ git init test6
    Initialized empty Git repository in /Users/mccm06/Documents/Temp/Scratch/test6/.git/
    $ cd test6
    $ touch README
    $ git add .
    $ git commit -m"First-commit"
    [master (root-commit) 05e9c3e] First-commit
    0 files changed
    create mode 100644 README

    $ du -c
    72 ./.git/hooks
    8 ./.git/info
    8 ./.git/logs/refs/heads
    8 ./.git/logs/refs
    16 ./.git/logs
    8 ./.git/objects/05
    8 ./.git/objects/54
    8 ./.git/objects/e6
    0 ./.git/objects/info
    0 ./.git/objects/pack
    24 ./.git/objects
    8 ./.git/refs/heads
    0 ./.git/refs/tags
    8 ./.git/refs
    0 ./.git/rr-cache
    168 ./.git
    168 .
    168 total

    The entire repo and working directory total only 168 blocks. On macOS, du counts 512-byte blocks by default, so that is about 84 KiB.

    Now copy in the white image:

    $ cp ../completely-white.tiff .

    And show how large that image is in uncompressed TIFF format (5,294,996 bytes, about 5.3 MB):

    $ ls -al
    total 10344
    drwxr-xr-x 5 170 .
    drwxrwxr-x 6 204 ..
    drwxr-xr-x 15 510 .git
    -rw-r--r-- 1 0 README
    -rw-r--r-- 1 5294996 completely-white.tiff

    And show the size of the entire repo and working copy:

    $ du -c
    72 ./.git/hooks
    8 ./.git/info
    8 ./.git/logs/refs/heads
    8 ./.git/logs/refs
    16 ./.git/logs
    8 ./.git/objects/05
    8 ./.git/objects/54
    8 ./.git/objects/e6
    0 ./.git/objects/info
    0 ./.git/objects/pack
    24 ./.git/objects
    8 ./.git/refs/heads
    0 ./.git/refs/tags
    8 ./.git/refs
    0 ./.git/rr-cache
    168 ./.git
    10512 .
    10512 total
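
    The du figures in these listings read most naturally as 512-byte blocks rather than kilobytes. The short check below reproduces the jump from 168 to 10512 under two assumptions that are not stated in the original: du reports 512-byte blocks, and the filesystem allocates space in 4096-byte units. The names `DU_BLOCK`, `ALLOC_BLOCK`, and `du_blocks` are hypothetical.

```python
import math

DU_BLOCK = 512      # bytes per block reported by du (assumed BSD/macOS default)
ALLOC_BLOCK = 4096  # bytes per filesystem allocation unit (assumption)


def du_blocks(size_bytes: int) -> int:
    """Estimate the du block count for a file of the given size."""
    # Space is handed out in whole allocation units, then counted in du blocks.
    alloc_units = math.ceil(size_bytes / ALLOC_BLOCK)
    return alloc_units * ALLOC_BLOCK // DU_BLOCK


tiff_blocks = du_blocks(5294996)  # size of completely-white.tiff from ls -al
print(tiff_blocks)                # 10344
print(168 + tiff_blocks)          # 10512
print(168 * DU_BLOCK / 1024)      # 84.0
```

    Under those assumptions the image alone accounts for 10344 blocks, matching both the "total 10344" line from ls and the growth in the du total.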

    Now add that file to the staging area and commit it.

    $ git add .
    $ git commit -m"White image added"
    [master f93724b] White image added
    1 file changed, 0 insertions(+), 0 deletions(-)
    create mode 100644 completely-white.tiff

    $ du -c
    72 ./.git/hooks
    8 ./.git/info
    8 ./.git/logs/refs/heads
    8 ./.git/logs/refs
    16 ./.git/logs
    8 ./.git/objects/05
    8 ./.git/objects/54
    8 ./.git/objects/87
    56 ./.git/objects/90
    8 ./.git/objects/e6
    8 ./.git/objects/f9
    0 ./.git/objects/info
    0 ./.git/objects/pack
    96 ./.git/objects
    8 ./.git/refs/heads
    0 ./.git/refs/tags
    8 ./.git/refs
    0 ./.git/rr-cache
    240 ./.git
    10584 .
    10584 total

    The file is zlib-compressed when saved to its blob in the objects/90 directory, so it occupies only 56 blocks (512 bytes each, about 28 KiB) on disk.
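
    To get a feel for why a 5.3 MB image shrinks so dramatically, the sketch below compresses a stand-in buffer with zlib, the same compression Git applies to loose objects. The buffer of identical 0xFF bytes is an assumption. The real TIFF also carries headers and metadata, so its compressed size will differ.

```python
import zlib

IMAGE_SIZE = 5294996  # size in bytes of completely-white.tiff from the ls listing

# Assumption: approximate an all-white uncompressed image as identical 0xFF bytes.
white = b"\xff" * IMAGE_SIZE
compressed = zlib.compress(white)

print(len(white), len(compressed))
print(f"ratio: {len(white) / len(compressed):.0f}x")
```

    Highly repetitive data is close to the best case for zlib, which is why uncompressed TIFF was chosen to make the effect visible. A photograph or an already compressed format such as JPEG would shrink far less.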

    Make minor edits to the image:

    $ open completely-white.tiff -a Pixelmator

    $ git status
    # On branch master
    # Changes not staged for commit:
    # (use "git add <file>..." to update what will be committed)
    # (use "git checkout -- <file>..." to discard changes in working directory)
    #
    # modified: completely-white.tiff
    #
    no changes added to commit (use "git add" and/or "git commit -a")

    $ git add .
    $ git commit -m"Dot added to white image"
    [master 3015c4a] Dot added to white image
    1 file changed, 0 insertions(+), 0 deletions(-)

    Now, after committing the minor edit, there is a second, equally sized directory called 31 (56 blocks of 512 bytes, about 28 KiB) that contains a complete blob for the modified image:

    $ du -c
    72 ./.git/hooks
    8 ./.git/info
    8 ./.git/logs/refs/heads
    8 ./.git/logs/refs
    16 ./.git/logs
    8 ./.git/objects/05
    8 ./.git/objects/30
    56 ./.git/objects/31
    8 ./.git/objects/47
    8 ./.git/objects/54
    8 ./.git/objects/87
    56 ./.git/objects/90
    8 ./.git/objects/e6
    8 ./.git/objects/f9
    0 ./.git/objects/info
    0 ./.git/objects/pack
    168 ./.git/objects
    8 ./.git/refs/heads
    0 ./.git/refs/tags
    8 ./.git/refs
    0 ./.git/rr-cache
    312 ./.git
    10656 .
    10656 total
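
    The doubling seen here follows from content addressing: any change to the file yields a new object id, and the new blob is compressed on its own without reference to the old one. The sketch below illustrates this with two stand-in buffers that differ by three bytes. The buffer contents, sizes, and the helper name `blob_id` are assumptions made for illustration.

```python
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """SHA-1 object id of a blob, using Git's b"blob <size>\\0" header."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    return hashlib.sha1(store).hexdigest()


# Assumption: a uniform buffer stands in for the white image.
white = b"\xff" * 1_000_000
# The "dot": three bytes changed in the middle of an otherwise identical buffer.
edited = bytearray(white)
edited[500_000:500_003] = b"\x00\x00\x00"
dotted = bytes(edited)

for name, data in (("white", white), ("dotted", dotted)):
    # Each version gets its own id (and directory) and its own full compressed copy.
    print(name, blob_id(data)[:2], len(zlib.compress(data)))
```

    Both versions compress to a similar size, and both are stored in full, so the loose object store grows by roughly one whole compressed blob per edit.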

    Lastly, we will compress the history into a single packfile instead of loose objects:

    $ git gc --aggressive
    Counting objects: 9, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (7/7), done.
    Writing objects: 100% (9/9), done.
    Total 9 (delta 1), reused 0 (delta 0)

    $ du -c
    72 ./.git/hooks
    16 ./.git/info
    8 ./.git/logs/refs/heads
    8 ./.git/logs/refs
    16 ./.git/logs
    8 ./.git/objects/info
    32 ./.git/objects/pack
    40 ./.git/objects
    0 ./.git/refs/heads
    0 ./.git/refs/tags
    0 ./.git/refs
    0 ./.git/rr-cache
    192 ./.git
    10536 .
    10536 total

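    The objects directory now takes 40 blocks (512 bytes each, about 20 KiB), of which the pack directory accounts for 32, down from 168 blocks (about 84 KiB) when the same history was stored as loose objects, including two full blobs in separate object directories. The "delta 1" in the gc output shows that one object is now stored as a delta against another.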
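
    The idea behind delta storage can be sketched with a toy encoder that records only the bytes that differ between a base and a target. This is not Git's actual packfile delta format, which uses copy and insert instructions against a base object. It is a deliberately simplified stand-in, and `make_delta` and `apply_delta` are hypothetical names.

```python
from typing import Tuple

Delta = Tuple[int, int, bytes]  # (shared prefix length, shared suffix length, middle)


def make_delta(base: bytes, target: bytes) -> Delta:
    """Toy delta: keep only the part of target between the shared prefix and suffix."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    return prefix, suffix, target[prefix:len(target) - suffix]


def apply_delta(base: bytes, delta: Delta) -> bytes:
    """Rebuild the target from the base plus the stored middle section."""
    prefix, suffix, middle = delta
    return base[:prefix] + middle + base[len(base) - suffix:]


# Assumption: small uniform buffers stand in for the two image versions.
base = b"\xff" * 200_000
edited = bytearray(base)
edited[100_000:100_003] = b"\x00\x00\x00"
target = bytes(edited)

delta = make_delta(base, target)
print(delta[0], delta[1], len(delta[2]))
print(apply_delta(base, delta) == target)
```

    Storing one full version plus a few changed bytes, instead of two full versions, is the kind of saving that only becomes available once objects are packed together.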

    ## References
    * [Git Internals: Git References](http://git-scm.com/book/en/Git-Internals-Git-References)
    * [Git Internals: Packfiles](http://git-scm.com/book/en/Git-Internals-Packfiles)
    * [Programmers Stack Exchange question on Git efficiency](http://programmers.stackexchange.com/questions/148434/why-do-git-mercurial-repositories-use-less-space/148498#148498)