# Migrating From SVN to Git

This gist details the following:

1. [Converting a Subversion (SVN) repository into a Git repository](https://gist.github.com/barrysteyn/2ba947313e0a4ad086c3#migrating-from-svn-to-git-is-roughly-split-into-three-steps)
2. [Purging the resultant Git repository of large files](https://gist.github.com/barrysteyn/2ba947313e0a4ad086c3#find-and-purge-large-files-from-git-history)

## Migrating from SVN to Git is roughly split into three steps:

1. Retrieve a list of SVN commit usernames
2. Match SVN usernames to email addresses
3. Migrate to Git using the `git svn clone` command

### Step 1: Retrieve A List Of SVN Commit Usernames

An SVN commit records only a user's *username*. Git records more detail: at the very least, a Git commit author needs both a *username* and an *email* address associated with that username. Since the email address is not available in SVN, it has to be matched manually, which requires a list of the usernames recorded by SVN.

The following command, run from inside an SVN working copy, produces a file called *authors.txt* containing one line per SVN username:

```
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt
```

### Step 2: Match SVN usernames to email addresses

The contents of *authors.txt* are in the following format:

```
jwilkins = jwilkins <jwilkins>
```

Each line needs to be converted into this form, with the author's full name and real email address (`user@example.com` is a placeholder):

```
jwilkins = John Albin Wilkins <user@example.com>
```

### Step 3: Migrate To Git Using git-svn clone Command

Create a folder where the Git clone is to be stored, and then run the following from inside it, substituting `SVN-REPO-URL` with the URL of the SVN repository:

```
git svn clone --stdlayout --authors-file=path/to/authors.txt SVN-REPO-URL
```

The `--stdlayout` option assumes the repository uses the standard `trunk`/`branches`/`tags` layout. This last step may take some time, but it should result in a Git repo.

## Find And Purge Large Files From Git History

Git hosting services tend to be stricter than SVN regarding large files; GitHub, for example, rejects pushes containing files larger than 100 MB. To migrate an SVN repository to Git, one may therefore need to purge such files from the Git history.

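Before digging through history, it can help to see whether the current checkout already contains oversized files. The following is a rough sketch, assuming a `find` that accepts the `k`/`M` size suffixes (GNU and BSD `find` do). The function name `list_large_worktree_files` and the 100 MB default are illustrative choices, not part of the original procedure, and the check only inspects the working tree, not older revisions.

```shell
# Hypothetical helper: list files in the working tree above a size threshold.
# $1 is a threshold in find(1) -size syntax; "+100M" is an assumed default.
list_large_worktree_files() {
  # Skip Git's own object store, then print regular files over the threshold.
  find . -path ./.git -prune -o -type f -size "${1:-+100M}" -print
}
```

For example, `list_large_worktree_files +50M` lists files over roughly 50 MB. Files that were large in the past but have since been deleted will not show up here, which is why the history scan in the following steps is still needed.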
### Step 1: Determine The Files That Are Large

Go to the newly created Git repo and do the following:

```
git rev-list --objects --all | sort -k 2 > allfileshas.txt
git gc && git verify-pack -v .git/objects/pack/pack-*.idx | grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt
```

This will result in two files:

1. *allfileshas.txt* - a list of all object SHAs in the Git repo, with their paths
2. *bigobjects.txt* - a list of blob SHAs with their sizes, sorted from largest to smallest

To combine these two files into a list of file names sorted by size in descending order:

```
# For each blob (largest first), look up its path and append "SHA size path".
for SHA in $(cut -f 1 -d ' ' < bigobjects.txt); do
  echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt
done
```

**NOTE**: The above script runs `grep` twice for every blob, so it may take a very long time on a large repository. Because the largest objects are processed first, it is safe to stop it with Ctrl-C after about 2 minutes; the entries written so far are the ones that matter.

The resulting file, `bigtosmall.txt`, will contain one line per object (SHA, size in bytes, file name), sorted from largest to smallest.

### Step 2: Purge The Files From The Git History

Select the files (or even a directory) from `bigtosmall.txt` that you want purged. Then run the following for each one, substituting `MY-BIG-DIRECTORY-OR-FILE` with the directory or file that is to be purged:

```
git filter-branch -f --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all
```
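
After `git filter-branch` finishes, the purged objects usually remain on disk, because the backup refs that `filter-branch` writes under `refs/original/` and the reflog still point at them. Below is a minimal sketch of a cleanup, assuming the rewritten history has already been checked and the backups are no longer wanted. The function name `shrink_after_purge` is hypothetical; the individual Git commands are standard.

```shell
# Hypothetical helper: drop filter-branch backups and unreachable objects.
# Run from the top of the repository, and only once the rewrite looks right.
shrink_after_purge() {
  # Delete the backup refs that filter-branch stored under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
  # Expire every reflog entry so the old commits become unreachable.
  git reflog expire --expire=now --all
  # Repack and prune the now-unreachable objects immediately.
  git gc --prune=now --aggressive
}
```

Running `du -sh .git` before and after calling `shrink_after_purge` gives a quick indication of whether the purge actually reduced the repository size.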