http://github.com/matthewmccullough/git-workshop
# Filter-Branch
This chapter was inspired by the Grails repo-splitting script from [Jeff Scott Brown of SpringSource](http://www.springsource.com/people/jbrown).

## Pruning out folders from a repo
`filter-branch` is commonly used on a clone of the repo to split a too-large repo into smaller ones.

## `filter-branch` Reference Page

[Filter Branch Command Documentation](http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html)

## Process
We'll start with `bigrepo` and create a new repo that contains only the `c` directory. If we wanted to split into multiple repos, we would simply make one clone for each desired final repo and run a `filter-branch` on each.

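The multi-repository split described above can be sketched as one clone and one `filter-branch` run per target. This is a minimal sketch that assumes a hypothetical `bigrepo` with top-level folders `a`, `b` and `c`, built from scratch so the script runs on its own. It uses `--subdirectory-filter`, which is covered under Pruning. The `split-*` names, the demo identity and the temporary directory are all placeholders.

```shell
#!/bin/sh
# Sketch: split a hypothetical "bigrepo" into one repository per top-level
# folder. Folder names a, b, c and the split-* names are placeholders.
set -eu

# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions print a warning and pause before filter-branch runs
# unless this variable is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# Build a small stand-in for bigrepo: one commit per folder.
git init -q bigrepo
for d in a b c; do
  mkdir "bigrepo/$d"
  echo "$d" > "bigrepo/$d/file.txt"
  git -C bigrepo add "$d"
  git -C bigrepo commit -q -m "add $d"
done

# One fresh, independent clone per final repository, each filtered on its own.
for d in a b c; do
  git clone -q --no-hardlinks "$work/bigrepo" "split-$d"
  (cd "split-$d" && git filter-branch --prune-empty --subdirectory-filter "$d" HEAD >/dev/null)
done

git -C split-c ls-tree --name-only HEAD   # file.txt (moved up from c/)
```

Cloning once per target keeps each rewrite isolated. A mistake in one split never touches the others or the original.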
## Preparation
`filter-branch` is a destructive command, so we need to clone the repository before we start operating on it. We will effectively be creating a new repository and leaving the old one behind.

Since Git can optimize local-disk clones of repositories with hardlinks, we want to create a clone that is entirely separate from the original one:

    git clone --no-hardlinks /path/to/originalrepo newrepo

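To check that a `--no-hardlinks` clone really shares no object files with its source, you can count the hard-linked files under its `.git/objects`. This is a minimal sketch that assumes throwaway repositories named after the placeholders `originalrepo` and `newrepo`:

```shell
#!/bin/sh
# Sketch: confirm a --no-hardlinks clone shares no object files with its
# source. "originalrepo" and "newrepo" are placeholder names.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q originalrepo
echo hello > originalrepo/README
git -C originalrepo add README
git -C originalrepo commit -q -m "initial commit"

git clone -q --no-hardlinks "$work/originalrepo" newrepo

# POSIX find: "-links +1" matches files whose hard-link count exceeds 1.
shared=$(find newrepo/.git/objects -type f -links +1 | wc -l | tr -d ' ')
echo "hard-linked object files in newrepo: $shared"
```

Without `--no-hardlinks`, a local clone on the same filesystem would typically report a nonzero count, because Git hardlinks object files when it can.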
## Pruning

### With a Subdirectory

If we wish to save only the files in the `c` directory, we can run the following command. It rewrites only the current branch (`HEAD`) and leaves all other branches unfiltered. Oddly, lightweight tags are kept here, still pointing at the original, unfiltered commits. The command _relocates_ all files in the `c` subfolder to the _root_ of the new repository. This is usually not what we want: users typically want to specify what to prune away, leaving all other folder structures intact.

    git filter-branch --subdirectory-filter c HEAD

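Here is a minimal sketch of the relocation. It assumes a throwaway repository with hypothetical folders `a` and `c/lib` and a hypothetical lightweight tag `v1`, and it also shows that the tag stays on the original commit:

```shell
#!/bin/sh
# Sketch: --subdirectory-filter moves the contents of c/ to the repository
# root. Folder names and the tag v1 are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p a c/lib
echo keep > a/keep.txt
echo util > c/lib/util.txt
echo docs > c/readme.txt
git add .
git commit -q -m "initial layout"
git tag v1                      # lightweight tag on the original commit

git filter-branch --subdirectory-filter c HEAD >/dev/null

git ls-tree --name-only HEAD    # lib, readme.txt: c/ is now the root
git ls-tree --name-only v1      # a, c: the tag still names the old commit
```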
The same command can take additional options to rewrite all the branches, not just the current one:

* `--` separates `filter-branch` options from revision options
* `--all` rewrites all branches and tags
* `--prune-empty` removes commits that no longer change anything after filtering

`git filter-branch --prune-empty --subdirectory-filter c -- --all`

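This sketch shows that `-- --all` reaches every branch. It assumes a throwaway repository with a hypothetical second branch `feature`. After the rewrite, both branches have the contents of `c` at their root:

```shell
#!/bin/sh
# Sketch: with "-- --all", every branch is rewritten, not just HEAD.
# The branch name "feature" and all file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir a c
echo one > a/a.txt
echo one > c/c.txt
git add .
git commit -q -m "base"
git branch feature              # second branch starting at "base"

echo two > c/c.txt
git commit -q -am "change c on the first branch"

git checkout -q feature
echo three > c/extra.txt
git add c
git commit -q -m "add extra on feature"
git checkout -q -               # back to the first branch

git filter-branch --prune-empty --subdirectory-filter c -- --all >/dev/null

git ls-tree --name-only HEAD     # c.txt
git ls-tree --name-only feature  # c.txt, extra.txt
```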
### With the Tree and Checkouts

Alternatively, we can use a tree filter that _prunes away_ the selected folder or filename pattern using shell commands. It checks out each commit and runs the command against it. This allows the full power of any shell command to be leveraged, including `grep`.

* `-f` forces the `rm`; without it, the shell command would fail on commits where that file didn't exist.
* `--prune-empty` removes any commits that become empty (no longer change any files) after the shell command performs its surgery.

`git filter-branch --tree-filter "rm -rf c" --prune-empty HEAD`

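This sketch combines the tree filter with `--prune-empty`. It assumes a throwaway repository whose middle commit touches only `c`. Once `c` is removed, that commit changes nothing and is dropped, so three commits become two:

```shell
#!/bin/sh
# Sketch: --tree-filter with --prune-empty. File names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir a c
echo v1 > a/a.txt
git add a && git commit -q -m "add a"
echo secret > c/c.txt
git add c && git commit -q -m "add c"    # touches only c/
echo v2 > a/a.txt
git commit -q -am "edit a"

# Each commit is checked out into a temporary directory, "rm -rf c" runs
# there, and the resulting files become that commit's new tree.
git filter-branch --tree-filter "rm -rf c" --prune-empty HEAD >/dev/null

git log --format=%s HEAD        # "add c" is gone: it became empty
```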
### With the Index

A variation on this is the `--index-filter`, which is much faster. It does not check out each commit into a working directory; instead it rewrites only the index (the staging area) for each commit. The command therefore typically uses git commands that act on the index, such as `git rm --cached`, rather than shell commands that act on checked-out files.

* `--cached` makes `git rm` remove the paths from the index only, leaving working-directory files alone.
* `--ignore-unmatch` lets the command succeed for every commit, even those in which the file didn't exist.

`git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD`

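Both filters remove the same paths, so they should yield the same trees. They differ only in how much work each commit costs. This sketch runs each filter on its own clone of a hypothetical `src` repository and prints the resulting `HEAD` tree IDs, which are expected to match:

```shell
#!/bin/sh
# Sketch: the tree filter and the index filter remove c/ with the same result.
# Repository and clone names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

work=$(mktemp -d)
cd "$work"
git init -q src
mkdir src/a src/c
echo a > src/a/a.txt
echo c > src/c/c.txt
git -C src add .
git -C src commit -q -m "add a and c"
echo more > src/c/more.txt
git -C src add .
git -C src commit -q -m "more c"          # touches only c/

git clone -q --no-hardlinks "$work/src" via-tree
git clone -q --no-hardlinks "$work/src" via-index

# Tree filter: checks out every commit and runs a shell command on the files.
(cd via-tree && git filter-branch --tree-filter "rm -rf c" --prune-empty HEAD >/dev/null)
# Index filter: edits only the index of each commit, with no checkout.
(cd via-index && git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD >/dev/null)

tree_a=$(git -C via-tree rev-parse 'HEAD^{tree}')
tree_b=$(git -C via-index rev-parse 'HEAD^{tree}')
echo "tree filter:  $tree_a"
echo "index filter: $tree_b"
```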
The following variation adds `--tag-name-filter` and `-- --all`. It keeps the original tags under `.git/refs/original/refs/tags`, keeps all references to the original tags in the `.git/info/refs` file, and rewrites refs such as the `.git/refs/tags/AGOODPOINT` tag and the `.git/refs/heads/addingonefile` branch.

* `--tag-name-filter cat` rewrites all tags; `cat` passes each tag name through unchanged, so the rewritten tags keep their names

`git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty --tag-name-filter cat -- --all`

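Here is a minimal sketch that assumes a throwaway repository with a hypothetical annotated tag `v1`. After the rewrite with `--tag-name-filter cat`, the tag keeps its name and stays an annotated tag. It now points at the rewritten commit, whose tree no longer contains `c`.

```shell
#!/bin/sh
# Sketch: --tag-name-filter cat moves an annotated tag onto the rewritten
# commit. The tag name v1 and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir a c
echo a > a/a.txt
echo c > c/c.txt
git add .
git commit -q -m "add a and c"
git tag -a v1 -m "release 1"             # annotated tag on the original commit
old=$(git rev-parse 'v1^{commit}')

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" \
  --prune-empty --tag-name-filter cat -- --all >/dev/null

new=$(git rev-parse 'v1^{commit}')
echo "v1: $old -> $new"
git ls-tree -r --name-only v1            # a/a.txt
```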
## Cleanup

### Remove any `original` refs

Many `filter-branch` invocations create a `.git/refs/original` folder to allow a restore after a `filter-branch` run. These are still first-class references and will cause the old objects to be retained. If you have reviewed the results of the filter and are satisfied with them, remove these refs so that the objects can be cleaned up in the next steps.

`git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d`

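This sketch makes the one-liner's effect visible. It assumes a throwaway repository rewritten with the index filter shown earlier, and it counts the refs under `refs/original/` before and after deleting them. The `count_original` helper is hypothetical.

```shell
#!/bin/sh
# Sketch: count refs under refs/original/ before and after removing them.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir c
echo c > c/c.txt
echo top > top.txt
git add .
git commit -q -m "initial"
git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" HEAD >/dev/null

# Hypothetical helper: number of backup refs filter-branch left behind.
count_original() {
  git for-each-ref --format="%(refname)" refs/original/ | wc -l | tr -d ' '
}

before=$(count_original)
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
after=$(count_original)
echo "refs under refs/original/: $before before, $after after"
```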
### Expire all entries from the reflog

Keep in mind that `filter-branch` only removes references from the DAG (history); it doesn't purge the `.git/objects` directory. Git always defers repository cleanup to a separate step that usually runs only periodically.

Expire all the old `reflog` entries now instead of waiting for them to age out (by default, after 90 days, or after 30 days for entries that are no longer reachable):

    git reflog expire --expire=now --all

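Removing the `original` refs and expiring the reflog only make the old objects unreachable. Neither step deletes them. This sketch assumes a throwaway repository and a hypothetical `old` variable holding the pre-rewrite commit ID, and it uses `git cat-file -e` to check whether the object is still stored after both steps:

```shell
#!/bin/sh
# Sketch: the old commit object survives ref removal and reflog expiry;
# only garbage collection deletes it. File names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir c
echo c > c/c.txt
echo top > top.txt
git add .
git commit -q -m "initial"
old=$(git rev-parse HEAD)                # commit ID before the rewrite

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" HEAD >/dev/null
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all

# "cat-file -e" exits 0 when the object exists in the object database.
if git cat-file -e "$old" 2>/dev/null; then
  echo "old commit $old is still stored"
fi
```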
### Reset Working Directory

Reset to the "new" (possibly different) `HEAD` state now that entries have been removed with `filter-branch`.

    git reset --hard

### Garbage Collection

Garbage-collect any orphaned objects. As the `git gc` man page notes:

> `git gc` tries very hard to be safe about the garbage it collects. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote-tracking branches, refs saved by git `filter-branch` in `refs/original/`, or reflogs (which may reference commits in branches that were later amended or rewound).

> If you are expecting some objects to be collected and they aren't, check all of those locations and decide whether it makes sense in your case to remove those references.

* `--aggressive` optimizes the repository more thoroughly, at the cost of taking much more time
* `--prune=now` prunes all unreachable (orphaned) objects immediately, regardless of their age, without a separate invocation of `git prune` (by default, `git gc` only prunes unreachable objects older than two weeks)

`git gc --aggressive --prune=now`

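This sketch puts the cleanup steps together. It assumes a throwaway repository created directly with `git init` rather than cloned, so no `origin` remote-tracking refs keep the old history alive. It runs every step of this chapter and then checks whether the pre-rewrite commit, held in a hypothetical `old` variable, survived garbage collection:

```shell
#!/bin/sh
# Sketch: full cleanup after a rewrite; the old commit should not survive gc.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # placeholder identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in recent Git

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir c
echo c > c/c.txt
echo top > top.txt
git add .
git commit -q -m "initial"
old=$(git rev-parse HEAD)                # commit ID before the rewrite

# Rewrite, then run each cleanup step from this chapter in order.
git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD >/dev/null
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git reset -q --hard
git gc -q --aggressive --prune=now

if git cat-file -e "$old" 2>/dev/null; then
  echo "old commit still present"
else
  echo "old commit removed"
fi
```

In a real clone rewritten with only `HEAD`, the `origin` remote-tracking refs still point at the old history. As the man page excerpt notes, they keep those objects alive. They would need to be removed too, for example with `git remote remove origin`, before `git gc` can drop them.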