# Filter-Branch

This chapter was inspired by the Grails repo-splitting script from [Jeff Scott Brown of SpringSource](http://www.springsource.com/people/jbrown).

## Pruning out folders from a repo

`filter-branch` is commonly used on a clone of the repo to split a too-large repo into smaller ones.

## `filter-branch` Reference Page

[Filter Branch Command Documentation](http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html)

## Process

We'll start with `bigrepo` and create a new repo that contains only `c`. If we wanted to split into multiple repos, we would simply clone for each desired final repo and run a `filter-branch` on each.
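
The per-repository loop can be sketched as follows. This is a minimal, hypothetical example. It builds a throwaway sample `bigrepo` with top-level folders `a`, `b`, and `c`, which are names invented for illustration. It then makes one `--no-hardlinks` clone per folder and runs the subdirectory filter (described below) in each one. `FILTER_BRANCH_SQUELCH_WARNING` and the `GIT_AUTHOR_*`/`GIT_COMMITTER_*` variables are assumptions added only so the script runs unattended on a machine without a configured Git identity.

```shell
#!/bin/sh
# Sketch: split one repository into one repository per top-level folder.
# The sample repo and its folder names (a, b, c) are hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Build a throwaway "bigrepo" with three folders in a single commit.
WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
for dir in a b c; do
  mkdir "$dir"
  echo "content of $dir" > "$dir/$dir.txt"
done
git add .
git commit -q -m "Add folders a, b and c"

# One independent clone per desired final repository, each filtered
# down to a single folder that becomes the new repository root.
for dir in a b c; do
  git clone -q --no-hardlinks "$WORK/bigrepo" "$WORK/repo-$dir"
  (cd "$WORK/repo-$dir" && git filter-branch --subdirectory-filter "$dir" HEAD)
done

# Show what each split repository now contains at its root.
for dir in a b c; do
  echo "repo-$dir: $(git -C "$WORK/repo-$dir" ls-tree --name-only HEAD)"
done
```

Each `repo-*` clone should end up with its folder's single file at the root. If you would rather keep the original folder layout, swap in one of the pruning filters described below.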

## Preparation

`filter-branch` is a destructive command, so we need to clone the repository before we start operating on it. We will effectively be creating a new repository and leaving the old repository behind.

Since Git can optimize local-disk clones of repositories with hardlinks, we want to create a clone that is entirely separate from the original one:

`git clone --no-hardlinks /path/to/originalrepo newrepo`
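
Here is a sketch of this preparation step against a throwaway repository. The names `originalrepo` and `newrepo` follow the command above. The sample history inside them is invented for illustration, and the identity variables exist only so the commit works on an unconfigured machine.

```shell
#!/bin/sh
# Sketch: prepare an independent copy before any history rewriting.
# The sample history below is hypothetical.
set -eu
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/originalrepo"
cd "$WORK/originalrepo"
mkdir a b c
echo a > a/a.txt
echo b > b/b.txt
echo c > c/c.txt
git add .
git commit -q -m "Sample history"

# --no-hardlinks copies the object files instead of hardlinking them,
# so newrepo shares no files on disk with originalrepo.
git clone -q --no-hardlinks "$WORK/originalrepo" "$WORK/newrepo"

# Same commits in both; every later filter-branch step runs in newrepo.
git -C "$WORK/originalrepo" rev-parse HEAD
git -C "$WORK/newrepo" rev-parse HEAD
```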

## Pruning

### With a Subdirectory

If we wish to keep only the files in the `c` directory, we can run the following command. It rewrites only the current branch. Tags are not rewritten: they are kept as they were, still pointing at the original commits. This _relocates_ all files in the `c` subfolder to the _root_ of the new repository, which is usually not what we want. Users typically want to specify what to prune away, leaving all other folder structures intact.

`git filter-branch --subdirectory-filter c HEAD`
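
The following sketch runs that command on a hypothetical sample repository with folders `a`, `b`, and `c` and a lightweight tag `v1`, all invented for illustration. It works directly on the throwaway repository rather than on a clone. It shows both effects described above: `c/c.txt` ends up at the root of `HEAD`, while `v1` still points at the original, unfiltered commit.

```shell
#!/bin/sh
# Sketch: --subdirectory-filter on a throwaway repo. The folders a, b, c
# and the tag v1 are hypothetical; use a clone for real data.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a b c
echo a > a/a.txt
echo b > b/b.txt
echo c > c/c.txt
git add .
git commit -q -m "Add a, b and c"
git tag v1                          # lightweight tag on the original commit

# Rewrite only the current branch, keeping just the contents of c/.
git filter-branch --subdirectory-filter c HEAD

# c/c.txt now sits at the root of HEAD as c.txt ...
git ls-tree --name-only HEAD
# ... while the tag was not rewritten and still lists a, b and c.
git ls-tree --name-only v1
```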

The same command can instead rewrite every branch rather than only the current one:

* the `--` separates filter-branch options from revision options
* the `--all` rewrites all branches and tags
* the `--prune-empty` removes commits that would no longer have any content

`git filter-branch --prune-empty --subdirectory-filter c -- --all`
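
Here is a minimal sketch of the `-- --all` form. It assumes a hypothetical sample repository with a second branch named `feature` and a lightweight tag `v1`. After the rewrite, the branch, the tag, and `HEAD` should all contain only `c.txt`. Annotated tags are a different case: they need `--tag-name-filter`, which is covered below.

```shell
#!/bin/sh
# Sketch: rewrite every branch and tag with -- --all. The branch name
# "feature" and the tag "v1" are hypothetical sample refs.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a b c
echo a > a/a.txt
echo b > b/b.txt
echo c > c/c.txt
git add .
git commit -q -m "Add a, b and c"
git branch feature                  # a second branch
git tag v1                          # a lightweight tag

# Everything after "--" is handed to rev-list; --all selects every ref.
git filter-branch --prune-empty --subdirectory-filter c -- --all

# Each ref should now contain only c.txt at its root.
for ref in HEAD feature v1; do
  echo "$ref: $(git ls-tree --name-only "$ref")"
done
```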

### With the Tree and Checkouts

Alternatively, we can use a tree filter, which _prunes away_ the selected folder or filename pattern using shell commands. It checks out each commit and runs the command against that checkout. This lets us leverage the full power of any shell command, including `grep`.

* `-f` forces the `rm`. Without it, the shell command would fail on commits where the file didn't exist, and the rewrite would abort.
* `--prune-empty` removes any commits that are left empty (they no longer change anything) after the shell command performs its surgery.

`git filter-branch --tree-filter "rm -rf c" --prune-empty HEAD`
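
Here is a sketch of the tree filter on a hypothetical two-commit history. The first commit holds `a` and `b`, and the second commit only adds `c`. `rm -rf` must succeed on the first commit, where `c` does not exist yet. `--prune-empty` should then drop the second commit, because removing `c` leaves it with no changes.

```shell
#!/bin/sh
# Sketch: --tree-filter with --prune-empty on a hypothetical two-commit history.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a b
echo a > a/a.txt
echo b > b/b.txt
git add .
git commit -q -m "Add a and b"
mkdir c
echo c > c/c.txt
git add c
git commit -q -m "Add c"            # this commit only touches c

# Each commit is checked out and "rm -rf c" runs in that checkout.
# -f keeps rm from failing on the first commit, which has no c yet;
# the second commit ends up with no changes and --prune-empty drops it.
git filter-branch --tree-filter "rm -rf c" --prune-empty HEAD

git log --oneline                   # only "Add a and b" remains
git ls-tree --name-only HEAD        # a and b
```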

### With the Index

A variation on this is the `--index-filter`, which is much faster. It never checks out a working tree. Instead, it operates only on the index (staging area) that Git builds from each commit's tree. The filter still runs in the shell, but it should use Git commands that edit the index, such as `git rm --cached`, rather than commands that edit files on disk.

* `-r` removes the `c` directory recursively.
* `--cached` makes `git rm` remove paths from the index only, without touching any working-tree files.
* `--ignore-unmatch` lets the command succeed for every commit, even those where `c` didn't exist.

`git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD`
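
Here is a sketch of the index filter on a hypothetical two-commit history: `a` and `b` first, then a commit that only adds `c`. `--ignore-unmatch` keeps the first commit, which has no `c` yet, from failing the filter. The end result should be a single commit containing only `a` and `b`.

```shell
#!/bin/sh
# Sketch: --index-filter on a hypothetical two-commit history.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a b
echo a > a/a.txt
echo b > b/b.txt
git add .
git commit -q -m "Add a and b"
mkdir c
echo c > c/c.txt
git add c
git commit -q -m "Add c"            # this commit only touches c

# Nothing is checked out: "git rm --cached" edits each commit's index.
# --ignore-unmatch keeps the first commit, which has no c yet, from
# failing; the second commit becomes empty and --prune-empty drops it.
git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD

git log --oneline                   # only "Add a and b" remains
git ls-tree --name-only HEAD        # a and b
```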

The following variation adds `--tag-name-filter` and `-- --all`. Applied to our repository, it has three effects. It keeps the `.git/refs/original/refs/tags` folder. It keeps all references to the original tags in the `.git/info/refs` file. Finally, it rewrites the tag `.git/refs/tags/AGOODPOINT` and the branch `.git/refs/heads/addingonefile`.

* `--tag-name-filter cat` rewrites each tag that points at a rewritten commit, keeping its original name (`cat` passes the name through unchanged)

`git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty --tag-name-filter cat -- --all`
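
Here is a sketch of the tag-rewriting variant on a hypothetical sample repository. It reuses two names from the paragraph above: `AGOODPOINT` (here an annotated tag on the first commit) and `addingonefile` (a branch). The history itself is invented. After the run, the tag should still be an annotated tag, now pointing at a commit without `c`. Its pre-rewrite target should be backed up under `refs/original/refs/tags/`.

```shell
#!/bin/sh
# Sketch: rewrite branches and tags, including an annotated tag.
# AGOODPOINT and addingonefile reuse the names from the text; the
# history itself is hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a b c
echo a > a/a.txt
echo b > b/b.txt
echo c > c/c.txt
git add .
git commit -q -m "Add a, b and c"
git tag -a AGOODPOINT -m "A good point"   # annotated tag on the first commit
echo "one more line" >> a/a.txt
git commit -q -a -m "Change a"
git branch addingonefile                  # branch at the latest commit

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" \
  --prune-empty --tag-name-filter cat -- --all

git cat-file -t AGOODPOINT                # still "tag" (annotated)
git ls-tree --name-only AGOODPOINT        # a and b, no c
git ls-tree --name-only addingonefile     # a and b, no c
git for-each-ref --format="%(refname)" refs/original/   # pre-rewrite backups
```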

## Cleanup

### Remove any `original` refs

Many `filter-branch` invocations create a `.git/refs/original` folder so the original history can be restored after a `filter-branch` execution. These are still first-class references and will cause the old objects to be retained. Once you have reviewed the results of the filter and are satisfied with them, remove these refs so that the objects can be cleaned up in the next steps.

`git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d`
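
Here is a sketch of the backup refs appearing after a rewrite and then being deleted with the command above. The sample repository and its folders are hypothetical. The sample commit contains `c` so that the filter actually changes something, because `filter-branch` only writes a backup for refs it rewrites.

```shell
#!/bin/sh
# Sketch: inspect and delete the refs/original backups after a rewrite.
# The sample repository is hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a c
echo a > a/a.txt
echo c > c/c.txt
git add .
git commit -q -m "Add a and c"

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD

echo "Backup refs after filter-branch:"
git for-each-ref --format="%(refname)" refs/original/

# Delete each backup ref; xargs -n 1 runs one update-ref -d per name.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

echo "Backup refs after cleanup:"
git for-each-ref --format="%(refname)" refs/original/
```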

### Expire all entries from the reflog

Keep in mind that `filter-branch` only rewrites the history (DAG) and moves the references. It doesn't purge the old objects from the `.git/objects` directory. Git always leaves that cleanup to a separate step, which normally happens later. Reflog entries, for example, expire only after a configurable period: by default, 30 or 90 days, depending on whether the entry is still reachable.

Expire all the old `reflog` entries now instead of at the scheduled time:

`git reflog expire --expire=now --all`
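
Here is a sketch of why this step matters, using a hypothetical sample repository. Even after the `refs/original` backups are deleted, `HEAD@{1}` in the reflog still resolves to the pre-rewrite commit, which keeps its objects alive. After `git reflog expire --expire=now --all`, that entry no longer resolves.

```shell
#!/bin/sh
# Sketch: the reflog keeps the pre-rewrite commit reachable until expired.
# The sample repository is hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a c
echo a > a/a.txt
echo c > c/c.txt
git add .
git commit -q -m "Add a and c"
OLD=$(git rev-parse HEAD)

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# With the backups gone, the reflog still points at the old commit.
BEFORE=$(git rev-parse 'HEAD@{1}')
echo "HEAD@{1}: $BEFORE (original HEAD was $OLD)"

git reflog expire --expire=now --all

# Every entry has expired, so HEAD@{1} no longer resolves.
if git rev-parse -q --verify 'HEAD@{1}' >/dev/null 2>&1; then
  echo "reflog still holds old entries"
else
  echo "reflog entries expired"
fi
```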

### Reset Working Directory

Reset the index and working directory to the new (possibly different) `HEAD` now that history has been rewritten with `filter-branch`.

`git reset --hard`

### Garbage Collection

Garbage collect any unreachable (orphaned) objects. From the git-gc man page, please note that:

> `git gc` tries very hard to be safe about the garbage it collects. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote-tracking branches, refs saved by git `filter-branch` in `refs/original/`, or reflogs (which may reference commits in branches that were later amended or rewound).
>
> If you are expecting some objects to be collected and they aren’t, check all of those locations and decide whether it makes sense in your case to remove those references.

* `--aggressive` optimizes the repository more thoroughly, at the cost of a much slower run
* `--prune=now` prunes all unreachable (orphaned) objects immediately, rather than only those older than the default two-week grace period, without a separate invocation of `git prune`

`git gc --aggressive --prune=now`
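
Here is an end-to-end sketch of the cleanup steps on a `--no-hardlinks` clone of a hypothetical sample repository. It records the blob for `c/c.txt` before the rewrite, then uses `git cat-file -e` afterwards to check that the blob is gone. One step goes beyond the list above. The quoted man page says remote-tracking branches keep objects alive, and the clone's `origin/*` refs still point at the old history. For that reason the sketch also runs `git remote rm origin`. This is an assumption about your workflow, so skip it if you still need that remote.

```shell
#!/bin/sh
# Sketch: full cleanup after filter-branch, verified by checking that the
# removed file's blob is gone. The sample repository is hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git 2.24+ warning and pause
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

WORK=$(mktemp -d)
git init -q "$WORK/bigrepo"
cd "$WORK/bigrepo"
mkdir a c
echo a > a/a.txt
echo "data that should disappear" > c/c.txt
git add .
git commit -q -m "Add a and c"

git clone -q --no-hardlinks "$WORK/bigrepo" "$WORK/newrepo"
cd "$WORK/newrepo"
BLOB=$(git rev-parse HEAD:c/c.txt)   # only c/c.txt refers to this blob

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch c" --prune-empty HEAD

# Cleanup steps from this section, in order.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# Assumption beyond the steps above: origin/* still references the old
# history, so drop the remote (and its remote-tracking refs) as well.
git remote rm origin
git reflog expire --expire=now --all
git reset --hard
git gc --aggressive --prune=now

# cat-file -e exits non-zero once the object no longer exists.
if git cat-file -e "$BLOB" 2>/dev/null; then
  echo "blob $BLOB is still present"
else
  echo "blob $BLOB has been removed"
fi
```

If the final check reports the blob as still present, the man page excerpt above lists the places to inspect: the index, remote-tracking branches, `refs/original/`, and reflogs.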