/blog/2021/05/12/why-you-shouldn't-use-git-lfs/index.html

https://github.com/indygreg/indygreg.github.com
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--
Design by Free CSS Templates
http://www.freecsstemplates.org
Released for free under a Creative Commons Attribution 2.5 License
Name : Pollinating
Description: A two-column, fixed-width design with dark color scheme.
Version : 1.0
Released : 20101114
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Gregory Szorc's Digital Home
| Why You Shouldn't Use Git LFS
</title>
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/blog/feed" />
<link rel="alternate" type="application/atom+xml" title="Atom 1.0"
href="/blog/feed/atom" />
<link rel="stylesheet" href="/style/style.css" type="text/css" />
<link rel="stylesheet" href="/css/pygments_murphy.css" type="text/css" />
</head>
<body>
<div id="wrapper">
<div id="menu">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/blog/">Blog</a></li>
<li><a href="/notes">Notes</a></li>
<li><a href="/work.html">Work</a></li>
<li><a href="/skills.html">Skills</a></li>
<li><a href="/thoughts.html">Thoughts</a></li>
<li><a href="/resume.pdf">Resume</a></li>
</ul>
</div>
<div id="page">
<div id="page-bgtop">
<div id="page-bgbtm">
<div id="content">
<div class="blog_post">
<a name="why-you-shouldn't-use-git-lfs"></a>
<h2 class="blog_post_title"><a href="/blog/2021/05/12/why-you-shouldn't-use-git-lfs" rel="bookmark" title="Permanent Link to Why You Shouldn't Use Git LFS">Why You Shouldn't Use Git LFS</a></h2>
<small>May 12, 2021 at 10:30 AM | categories:
<a href='/blog/category/mercurial'>Mercurial</a>, <a href='/blog/category/git'>Git</a>
</small><p/>
<div class="post_prose">
<p>I have long held the opinion that you should avoid Git LFS if possible.
Since people keep asking me why, I figured I'd capture my thoughts
in a blog post so I have something to refer them to.</p>
<p>Here are my reasons for not using Git LFS.</p>
<h2>Git LFS is a Stop Gap Solution</h2>
<p>Git LFS was developed outside the official Git project to fulfill a
very real market need: Git didn't (and still doesn't) handle large files
very well.</p>
<p>I believe it is inevitable that Git will gain better support for
handling of large files, as this seems like a critical feature for
a popular version control tool.</p>
<p>If you make this long bet, LFS is only an interim solution and its
value proposition disappears after Git has better native support
for large files.</p>
<p>LFS as a stop gap solution would be tolerable except for the fact
that...</p>
<h2>Git LFS is a One Way Door</h2>
<p>The adoption or removal of Git LFS in a repository is an irreversible
decision that requires rewriting history and losing your original
commit SHAs.</p>
<p>In some contexts, rewriting history is tolerable. In many others, it
is an extremely expensive proposition. My experience maintaining
version control in professional contexts aligns with the opinion
that rewriting history is expensive and should only be considered a
measure of last resort. Maybe if tools made it easier to rewrite
history without the negative consequences (e.g. GitHub would redirect
references to old SHA-1s in URLs and API calls) I would change my opinion
here. Until that day, the drawbacks of losing history are just too
high to stomach for many.</p>
<p>Adoption or removal of LFS is irreversible because of the way Git LFS
works. What LFS does is change the blob content that a
Git commit/tree references. Instead of the content itself, it stores
a pointer to the content. At checkout and commit time, LFS blobs/records
are treated specially via a mechanism in Git that allows content to be
rewritten as it moves between Git's core storage and its materialized
representation. (The same filtering mechanism is responsible for
normalizing line endings in text files, although that feature is built
into the core Git product and doesn't work exactly the same way. The
principles are the same.)</p>
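<p>To make the pointer mechanism concrete, here is a minimal Python sketch
of the two directions of that content rewriting. The pointer layout follows
the published Git LFS pointer format as I understand it. Everything else is
an assumption made for illustration: <code>LFS_STORE</code> is a hypothetical
in-memory stand-in for an LFS server, and <code>clean</code> and
<code>smudge</code> are simplified stand-ins for the real filter programs.</p>

```python
import hashlib

# Hypothetical in-memory stand-in for the LFS server's blob store.
LFS_STORE = {}


def clean(content: bytes) -> bytes:
    """Working tree to Git storage: swap the file content for a small pointer."""
    oid = hashlib.sha256(content).hexdigest()
    LFS_STORE[oid] = content  # the real content lives outside the Git repository
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )
    return pointer.encode("utf-8")


def smudge(pointer: bytes) -> bytes:
    """Git storage to working tree: resolve a pointer back into content."""
    lines = pointer.decode("utf-8").splitlines()
    fields = dict(line.split(" ", 1) for line in lines)
    oid = fields["oid"].split(":", 1)[1]
    return LFS_STORE[oid]


large_file = b"\x00" * 1_000_000
pointer = clean(large_file)      # this is what the Git blob actually holds
restored = smudge(pointer)       # this is what lands in the checkout
print(len(pointer), len(restored))
```

<p>The point to notice is that Git itself only ever stores the short pointer
text. The large content is reachable only as long as something can answer
the lookup that <code>smudge</code> performs.</p>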
<p>Since the LFS pointer is part of the Merkle tree that a Git commit
derives from, you can't add or remove LFS from a repo without rewriting
existing Git commit SHAs.</p>
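<p>A small Python sketch shows why. It computes the object ID Git assigns to
a blob (SHA-1 over a <code>blob</code> header plus the content, assuming a
repository using the default SHA-1 object format) for a file's real content
and for an LFS-style pointer to it. The sample content and the helper name
<code>git_blob_id</code> are made up for illustration.</p>

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID Git assigns to a blob in a SHA-1 repository."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


# Illustrative content; imagine this is a large binary asset.
content = b"pretend this is a large binary asset\n"

# The pointer that would be stored in place of the content under LFS.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(content).hexdigest()}\n"
    f"size {len(content)}\n"
).encode("utf-8")

print("blob id with raw content:", git_blob_id(content))
print("blob id with LFS pointer:", git_blob_id(pointer))
```

<p>The two IDs differ. A tree records the blob ID of each file and a commit
records its tree ID, so swapping content for a pointer (or the reverse)
changes the tree, the commit, and every descendant commit.</p>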
<p>I want to explicitly call out that even if a rewrite is acceptable in
the short term, things may change in the future. If you adopt LFS
today, you are committing to a) running an LFS server forever, b)
incurring a history rewrite in the future in order to remove LFS from
your repo, or c) ceasing to provide an LFS server and locking out people
from using older Git commits. I don't think any of these are great
options: I would prefer if there were a way to offboard from LFS in
the future with minimal disruption. This is theoretically possible, but
it requires the Git core product to recognize LFS blobs/records natively.
There's no guarantee this will happen. So adoption of Git LFS is a one
way door that can't be easily reversed.</p>
<h2>LFS is More Complexity</h2>
<p>LFS is more complex for Git end users.</p>
<p>Git users have to install, configure, and sometimes know about the
existence of Git LFS. Version control should <em>just work</em>. Large file
handling should <em>just work</em>. End-users shouldn't have to care that
large files are handled slightly differently from small files.</p>
<p>The usability of Git LFS is generally pretty good. However, there's an
upper limit on that usability as long as LFS exists outside the core
Git product. And LFS will likely never be integrated into the core Git
product because the Git maintainers know that LFS is only a stop gap
solution. They would rather solve large file storage <em>correctly</em> than
~forever carry the legacy baggage of having to support LFS in the core
product.</p>
<p>LFS is more complexity for Git server operators as well. Instead of
a self-contained Git repository and server to support, you now have
to support a likely separate HTTP server to facilitate LFS access.
This isn't the hardest thing in the world, especially since we're
talking about key-value blob storage, which is arguably a solved problem.
But it's another piece of infrastructure to support and secure and it
increases the surface area of complexity instead of minimizing it.
As a server operator, I would much prefer if the large file storage
were integrated into the core Git product and I simply needed to provide
some settings for it to <em>just work</em>.</p>
<h2>Mercurial Does LFS Slightly Better</h2>
<p>Since I'm a maintainer of the Mercurial version control tool, I thought
I'd throw out how Mercurial handles large file storage better than
Git. Mercurial's large file handling isn't great, but I believe it
is strictly better with regard to the trade-offs of adopting large
file storage.</p>
<p>In Mercurial, use of LFS is a dynamic feature that server/repo operators
can choose to enable or disable whenever they want. When the Mercurial
server sends file content to a client, presence of external/LFS storage
is a <em>flag</em> set on that file revision. Essentially, the flag says <em>the
data you are receiving is an LFS record, not the file content itself</em> and
the client knows how to resolve that record into content.</p>
<p>Conceptually, this is little different from Git LFS records in terms of
content resolution. However, the big difference is this flag is part
of the repository interchange data, not the core repository data as it
is with Git. Since this flag isn't part of the Merkle tree used to
derive the commit SHA, adding, removing, or altering the content of the
LFS records doesn't require rewriting commit SHAs. The tracked content
SHA - the data now stored in LFS - is still tracked as part of the Merkle
tree, so the integrity of the commit / repository can still be verified.</p>
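<p>As a rough illustration of that separation, here is a deliberately
simplified Python model. It is not Mercurial's actual storage format or API:
the <code>FileRevision</code> class and its fields are hypothetical, and the
real node computation also involves parent revisions, which are left out
here. The only thing it is meant to show is an identity derived from content
alone, with the external-storage flag carried alongside it.</p>

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileRevision:
    # Hypothetical, simplified model; not Mercurial's real data structures.
    content: bytes
    stored_externally: bool = False  # the "flag" described in the text

    def node(self) -> str:
        # Identity derives from the tracked content only, never from the flag.
        return hashlib.sha1(self.content).hexdigest()


rev = FileRevision(content=b"large asset bytes")
before = rev.node()

rev.stored_externally = True   # the operator decides to move this revision to LFS
after = rev.node()

print(before == after)  # True: toggling the flag leaves the identity untouched
```

<p>Contrast this with the Git LFS design, where the pointer replaces the
content inside the hashed data, so the equivalent toggle changes the hash.</p>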
<p>In Mercurial, the choice of whether to use LFS and what to use LFS for
is made by the server operator and settings can change over time. For
example, you could start with no use of LFS and then one day decide to
use LFS for all file revisions larger than 10 MB. Then a year later
you lower that to all revisions larger than 1 MB. Then a year after
that Mercurial gains better <em>native</em> support for large files and
you decide to stop using LFS altogether.</p>
<p>Also in Mercurial, it is possible for clients to push a large file
<em>inline</em> as part of the push operation. When the server sees that
large file, it can be like <em>this is a large file: I'm going to add
it to the blob store and advertise it as LFS</em>. Because the large
file record isn't part of the Merkle tree, you can have nice things
like this.</p>
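<p>The server-side policy from the last two paragraphs can be sketched in a
few lines of Python. This is a toy model under stated assumptions, not
Mercurial code: the <code>Server</code> class, its <code>lfs_threshold</code>
setting, and the <code>receive</code> method are all hypothetical names
invented for this example.</p>

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    # Hypothetical server model; names and behavior are illustrative only.
    lfs_threshold: Optional[int] = None  # size in bytes; None disables LFS
    blob_store: dict = field(default_factory=dict)

    def receive(self, content: bytes) -> str:
        """Accept a file pushed inline and return how it will be advertised."""
        if self.lfs_threshold is not None and len(content) > self.lfs_threshold:
            # Large file: move it to the blob store and advertise it as LFS.
            self.blob_store[hashlib.sha256(content).hexdigest()] = content
            return "lfs"
        return "inline"


MB = 1024 * 1024
asset = b"\x00" * (5 * MB)

server = Server()                  # start with no use of LFS
first = server.receive(asset)

server.lfs_threshold = 10 * MB     # one day: LFS for revisions larger than 10 MB
second = server.receive(asset)

server.lfs_threshold = 1 * MB      # a year later: lower the cutoff to 1 MB
third = server.receive(asset)

print(first, second, third)
```

<p>The client pushed the same bytes all three times. Only the operator's
setting changed, and nothing about the pushed content had to be rewritten
to accommodate it.</p>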
<p>I suspect it is only a matter of time before Git's wire protocol learns
the ability to dynamically advertise <em>remote servers</em> for content
retrieval and this feature will be leveraged for better large file
handling. Until that day, I suppose we're stuck with having to
rewrite history with LFS and/or funnel large blobs through Git natively,
with all the pain that entails.</p>
<h2>Conclusion</h2>
<p>This post summarized reasons to avoid Git LFS. Are there justifiable
scenarios for using LFS? Absolutely! If you insist on using Git and
insist on tracking many <em>large</em> files in version control, you
should definitely consider LFS. (Although, if you are a heavy user
of large files in version control, I would consider Plastic SCM instead,
as they seem to have the most mature solution for large file handling.)</p>
<p>The main point of this post is to highlight some drawbacks with
using Git LFS because Git LFS is most definitely not a magic bullet. If
you can stomach the short and long term effects of Git LFS adoption, by
all means, use Git LFS. But please make an informed decision either way.</p>
</div>
</div>
</div>
<div id="sidebar">
<ul>
<li>
<h2>Categories</h2>
<ul>
<li><a href="/blog/category/apple">Apple</a></li>
<li><a href="/blog/category/bugzilla">Bugzilla</a></li>
<li><a href="/blog/category/ci">CI</a></li>
<li><a href="/blog/category/clang">Clang</a></li>
<li><a href="/blog/category/docker">Docker</a></li>
<li><a href="/blog/category/firefox">Firefox</a></li>
<li><a href="/blog/category/git">Git</a></li>
<li><a href="/blog/category/javascript">JavaScript</a></li>
<li><a href="/blog/category/mercurial">Mercurial</a></li>
<li><a href="/blog/category/mozreview">MozReview</a></li>
<li><a href="/blog/category/mozilla">Mozilla</a></li>
<li><a href="/blog/category/personal">Personal</a></li>
<li><a href="/blog/category/programming">Programming</a></li>
<li><a href="/blog/category/puppet">Puppet</a></li>
<li><a href="/blog/category/pyoxidizer">PyOxidizer</a></li>
<li><a href="/blog/category/python">Python</a></li>
<li><a href="/blog/category/review-board">Review Board</a></li>
<li><a href="/blog/category/rust">Rust</a></li>
<li><a href="/blog/category/sync">Sync</a></li>
<li><a href="/blog/category/browsers">browsers</a></li>
<li><a href="/blog/category/build-system">build system</a></li>
<li><a href="/blog/category/code-review">code review</a></li>
<li><a href="/blog/category/compilers">compilers</a></li>
<li><a href="/blog/category/internet">internet</a></li>
<li><a href="/blog/category/logging">logging</a></li>
<li><a href="/blog/category/mach">mach</a></li>
<li><a href="/blog/category/make">make</a></li>
<li><a href="/blog/category/misc">misc</a></li>
<li><a href="/blog/category/movies">movies</a></li>
<li><a href="/blog/category/packaging">packaging</a></li>
<li><a href="/blog/category/pymake">pymake</a></li>
<li><a href="/blog/category/security">security</a></li>
<li><a href="/blog/category/sysadmin">sysadmin</a></li>
<li><a href="/blog/category/testing">testing</a></li>
</ul>
</li>
</ul>
</div>
<div style="clear: both;">&nbsp;</div>
</div>
</div>
</div>
<div id="footer">
<hr/>
<p>Copyright (c) 2012- Gregory Szorc. All rights reserved. Design by <a href="http://www.freecsstemplates.org/"> CSS Templates</a>.</p>
</div>
</div>
</body>
</html>