/blog/2021/05/12/why-you-shouldn't-use-git-lfs/index.html
HTML | 262 lines | 221 code | 30 blank | 11 comment | 0 complexity | 1cd851d42fa513839b57d926ce1e3965 MD5 | raw file
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- <!--
- Design by Free CSS Templates
- http://www.freecsstemplates.org
- Released for free under a Creative Commons Attribution 2.5 License
- Name : Pollinating
- Description: A two-column, fixed-width design with dark color scheme.
- Version : 1.0
- Released : 20101114
- -->
- <html xmlns="http://www.w3.org/1999/xhtml">
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
-
- <title>Gregory Szorc's Digital Home
- | Why You Shouldn't Use Git LFS
- </title>
- <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/blog/feed" />
- <link rel="alternate" type="application/atom+xml" title="Atom 1.0"
- href="/blog/feed/atom" />
- <link rel="stylesheet" href="/style/style.css" type="text/css" />
- <link rel="stylesheet" href="/css/pygments_murphy.css" type="text/css" />
- </head>
- <body>
- <div id="wrapper">
-
- <div id="menu">
- <ul>
- <li><a href="/">Home</a></li>
- <li><a href="/blog/">Blog</a></li>
- <li><a href="/notes">Notes</a></li>
- <li><a href="/work.html">Work</a></li>
- <li><a href="/skills.html">Skills</a></li>
- <li><a href="/thoughts.html">Thoughts</a></li>
- <li><a href="/resume.pdf">Resume</a></li>
- </ul>
- </div>
- <div id="page">
- <div id="page-bgtop">
- <div id="page-bgbtm">
- <div id="content">
-
- <div class="blog_post">
- <a name="why-you-shouldn't-use-git-lfs"></a>
- <h2 class="blog_post_title"><a href="/blog/2021/05/12/why-you-shouldn't-use-git-lfs" rel="bookmark" title="Permanent Link to Why You Shouldn't Use Git LFS">Why You Shouldn't Use Git LFS</a></h2>
- <small>May 12, 2021 at 10:30 AM | categories:
- <a href='/blog/category/mercurial'>Mercurial</a>, <a href='/blog/category/git'>Git</a>
- </small><p/>
- <div class="post_prose">
-
- <p>I have long held the opinion that you should avoid Git LFS if possible.
- Since people keeping asking me why, I figured I'd capture my thoughts
- in a blog post so I have something to refer them to.</p>
- <p>Here are my reasons for not using Git LFS.</p>
- <h2>Git LFS is a Stop Gap Solution</h2>
- <p>Git LFS was developed outside the official Git project to fulfill a
- very real market need that Git didn't/doesn't handle large files very
- well.</p>
- <p>I believe it is inevitable that Git will gain better support for
- handling of large files, as this seems like a critical feature for
- a popular version control tool.</p>
- <p>If you make this long bet, LFS is only an interim solution and its
- value proposition disappears after Git has better native support
- for large files.</p>
- <p>LFS as a stop gap solution would be tolerable except for the fact
- that...</p>
- <h2>Git LFS is a One Way Door</h2>
- <p>The adoption or removal of Git LFS in a repository is an irreversible
- decision that requires rewriting history and losing your original
- commit SHAs.</p>
- <p>In some contexts, rewriting history is tolerable. In many others, it
- is an extremely expensive proposition. My experience maintaining
- version control in professional contexts aligns with the opinion
- that rewriting history is expensive and should only be considered a
- measure of last resort. Maybe if tools made it easier to rewrite
- history without the negative consequences (e.g. GitHub would redirect
- references to old SHA1 in URLs and API calls) I would change my opinion
- here. Until that day, the drawbacks of losing history are just too
- high to stomach for many.</p>
- <p>The reason adoption or removal of LFS is irreversible is due to the
- way Git LFS works. What LFS does is change the blob content that a
- Git commit/tree references. Instead of the content itself, it stores
- a pointer to the content. At checkout and commit time, LFS blobs/records
- are treated specially via a mechanism in Git that allows content to be
- rewritten as it moves between Git's core storage and its materialized
- representation. (The same filtering mechanism is responsible for
- normalizing line endings in text files. Although that feature is built
- into the core Git product and doesn't work exactly the same way. But
- the principles are the same.)</p>
- <p>Since the LFS pointer is part of the Merkle tree that a Git commit
- derives from, you can't add or remove LFS from a repo without rewriting
- existing Git commit SHAs.</p>
- <p>I want to explicitly call out that even if a rewrite is acceptable in
- the short term, things may change in the future. If you adopt LFS
- today, you are committing to a) running an LFS server forever b)
- incurring a history rewrite in the future in order to remove LFS from
- your repo, or c) ceasing to provide an LFS server and locking out people
- from using older Git commits. I don't think any of these are great
- options: I would prefer if there were a way to offboard from LFS in
- the future with minimal disruption. This is theoretically possible, but
- it requires the Git core product to recognize LFS blobs/records natively.
- There's no guarantee this will happen. So adoption of Git LFS is a one
- way door that can't be easily reversed.</p>
- <h2>LFS is More Complexity</h2>
- <p>LFS is more complex for Git end users.</p>
- <p>Git users have to install, configure, and sometimes know about the
- existence of Git LFS. Version control should <em>just work</em>. Large file
- handling should <em>just work</em>. End-users shouldn't have to care that
- large files are handled slightly differently from small files.</p>
- <p>The usability of Git LFS is generally pretty good. However, there's an
- upper limit on that usability as long as LFS exists outside the core
- Git product. And LFS will likely never be integrated into the core Git
- product because the Git maintainers know that LFS is only a stop gap
- solution. They would rather solve large files storage <em>correctly</em> than
- ~forever carry the legacy baggage of having to support LFS in the core
- product.</p>
- <p>LFS is more complexity for Git server operators as well. Instead of
- a self-contained Git repository and server to support, you now have
- to support a likely separate HTTP server to facilitate LFS access.
- This isn't the hardest thing in the world, especially since we're
- talking about key-value blob storage, which is arguably a solved problem.
- But it's another piece of infrastructure to support and secure and it
- increases the surface area of complexity instead of minimizing it.
- As a server operator, I would much prefer if the large file storage
- were integrated into the core Git product and I simply needed to provide
- some settings for it to <em>just work</em>.</p>
- <h2>Mercurial Does LFS Slightly Better</h2>
- <p>Since I'm a maintainer of the Mercurial version control tool, I thought
- I'd throw out how Mercurial handles large file storage better than
- Git. Mercurial's large file handling isn't great, but I believe it
- is strictly better with regards to the trade-offs of adopting large
- file storage.</p>
- <p>In Mercurial, use of LFS is a dynamic feature that server/repo operators
- can choose to enable or disable whenever they want. When the Mercurial
- server sends file content to a client, presence of external/LFS storage
- is a <em>flag</em> set on that file revision. Essentially, the flag says <em>the
- data you are receiving is an LFS record, not the file content itself</em> and
- the client knows how to resolve that record into content.</p>
- <p>Conceptually, this is little different from Git LFS records in terms of
- content resolution. However, the big difference is this flag is part
- of the repository interchange data, not the core repository data as it
- is with Git. Since this flag isn't part of the Merkle tree used to
- derive the commit SHA, adding, removing, or altering the content of the
- LFS records doesn't require rewriting commit SHAs. The tracked content
- SHA - the data now stored in LFS - is still tracked as part of the Merkle
- tree, so the integrity of the commit / repository can still be verified.</p>
- <p>In Mercurial, the choice of whether to use LFS and what to use LFS for
- is made by the server operator and settings can change over time. For
- example, you could start with no use of LFS and then one day decide to
- use LFS for all file revisions larger than 10 MB. Then a year later
- you lower that to all revisions larger than 1 MB. Then a year after
- that Mercurial gains better <em>native</em> support for large files and
- you decide to stop using LFS altogether.</p>
- <p>Also in Mercurial, it is possible for clients to push a large file
- <em>inline</em> as part of the push operation. When the server sees that
- large file, it can be like <em>this is a large file: I'm going to add
- it to the blob store and advertise it as LFS</em>. Because the large
- file record isn't part of the Merkle tree, you can have nice things
- like this.</p>
- <p>I suspect it is only a matter of time before Git's wire protocol learns
- the ability to dynamically advertise <em>remote servers</em> for content
- retrieval and this feature will be leveraged for better large file
- handling. Until that day, I suppose we're stuck with having to
- rewrite history with LFS and/or funnel large blobs through Git natively,
- with all the pain that entails.</p>
- <h2>Conclusion</h2>
- <p>This post summarized reasons to avoid Git LFS. Are there justifiable
- scenarios for using LFS? Absolutely! If you insist on using Git and
- insist on tracking many <em>large</em> files in version control, you
- should definitely consider LFS. (Although, if you are a heavy user
- of large files in version control, I would consider Plastic SCM instead,
- as they seem to have the most mature solution for large files handling.)</p>
- <p>The main point of this post is to highlight some drawbacks with
- using Git LFS because Git LFS is most definitely not a magic bullet. If
- you can stomach the short and long term effects of Git LFS adoption, by
- all means, use Git LFS. But please make an informed decision either way.</p>
- </div>
- </div>
- </div>
-
- <div id="sidebar">
- <ul>
- <li>
- <h2>Categories</h2>
- <ul>
- <li><a href="/blog/category/apple">Apple</a></li>
- <li><a href="/blog/category/bugzilla">Bugzilla</a></li>
- <li><a href="/blog/category/ci">CI</a></li>
- <li><a href="/blog/category/clang">Clang</a></li>
- <li><a href="/blog/category/docker">Docker</a></li>
- <li><a href="/blog/category/firefox">Firefox</a></li>
- <li><a href="/blog/category/git">Git</a></li>
- <li><a href="/blog/category/javascript">JavaScript</a></li>
- <li><a href="/blog/category/mercurial">Mercurial</a></li>
- <li><a href="/blog/category/mozreview">MozReview</a></li>
- <li><a href="/blog/category/mozilla">Mozilla</a></li>
- <li><a href="/blog/category/personal">Personal</a></li>
- <li><a href="/blog/category/programming">Programming</a></li>
- <li><a href="/blog/category/puppet">Puppet</a></li>
- <li><a href="/blog/category/pyoxidizer">PyOxidizer</a></li>
- <li><a href="/blog/category/python">Python</a></li>
- <li><a href="/blog/category/review-board">Review Board</a></li>
- <li><a href="/blog/category/rust">Rust</a></li>
- <li><a href="/blog/category/sync">Sync</a></li>
- <li><a href="/blog/category/browsers">browsers</a></li>
- <li><a href="/blog/category/build-system">build system</a></li>
- <li><a href="/blog/category/code-review">code review</a></li>
- <li><a href="/blog/category/compilers">compilers</a></li>
- <li><a href="/blog/category/internet">internet</a></li>
- <li><a href="/blog/category/logging">logging</a></li>
- <li><a href="/blog/category/mach">mach</a></li>
- <li><a href="/blog/category/make">make</a></li>
- <li><a href="/blog/category/misc">misc</a></li>
- <li><a href="/blog/category/movies">movies</a></li>
- <li><a href="/blog/category/packaging">packaging</a></li>
- <li><a href="/blog/category/pymake">pymake</a></li>
- <li><a href="/blog/category/security">security</a></li>
- <li><a href="/blog/category/sysadmin">sysadmin</a></li>
- <li><a href="/blog/category/testing">testing</a></li>
- </ul>
- </li>
- </ul>
- </div>
- <div style="clear: both;"> </div>
- </div>
- </div>
- </div>
- <div id="footer">
-
- <hr/>
- <p>Copyright (c) 2012- Gregory Szorc. All rights reserved. Design by <a href="http://www.freecsstemplates.org/"> CSS Templates</a>.</p>
- </div>
- </div>
- </body>
- </html>