<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--
Design by Free CSS Templates
http://www.freecsstemplates.org
Released for free under a Creative Commons Attribution 2.5 License
Name : Pollinating
Description: A two-column, fixed-width design with dark color scheme.
Version : 1.0
Released : 20101114
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Gregory Szorc's Digital Home
| Append I/O Performance on Windows
</title>
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/blog/feed" />
<link rel="alternate" type="application/atom+xml" title="Atom 1.0"
href="/blog/feed/atom" />
<link rel="stylesheet" href="/style/style.css" type="text/css" />
<link rel="stylesheet" href="/css/pygments_murphy.css" type="text/css" />
</head>
<body>
<div id="wrapper">
<div id="menu">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/blog/">Blog</a></li>
<li><a href="/notes">Notes</a></li>
<li><a href="/work.html">Work</a></li>
<li><a href="/skills.html">Skills</a></li>
<li><a href="/thoughts.html">Thoughts</a></li>
<li><a href="/resume.pdf">Resume</a></li>
</ul>
</div>
<div id="page">
<div id="page-bgtop">
<div id="page-bgbtm">
<div id="content">
<div class="blog_post">
<a name="append-i/o-performance-on-windows"></a>
<h2 class="blog_post_title"><a href="/blog/2015/10/22/append-i/o-performance-on-windows" rel="bookmark" title="Permanent Link to Append I/O Performance on Windows">Append I/O Performance on Windows</a></h2>
<small>October 22, 2015 at 02:15 AM | categories:
<a href='/blog/category/mozilla'>Mozilla</a>
</small><p/>
<div class="post_prose">
<p>A few weeks ago, some coworkers were complaining about the relative
performance of Mercurial cloning on Windows. I investigated on my
brand new i7-6700K Windows 10 desktop machine and, sure enough, they were
correct: cloning times on Windows were several minutes slower than
on Linux on the same hardware. What gives?</p>
<p>I performed a few clones with Python under a profiler. It pointed to a
potential slowdown in file I/O. I wanted more details so I fired up
<a href="https://technet.microsoft.com/en-us/library/bb896645.aspx">Sysinternals Process Monitor</a>
(strace for Windows) and captured data for a clone.</p>
<p>As I was looking at the raw system calls related to I/O, something
immediately popped out: <em>CloseFile()</em> operations were frequently
taking 1-5 <em>milliseconds</em> whereas other operations like opening, reading,
and writing files only took 1-5 <em>microseconds</em>. <strong>That's a 1000x
difference!</strong></p>
<p>I wrote a custom Python script to analyze an export of Process Monitor's
data. Sure enough, it said we were spending hundreds of seconds in
<em>CloseFile()</em> operations (which were being called a few hundred thousand
times). I posted the findings to some mailing lists.
<a href="https://groups.google.com/d/msg/mozilla.dev.platform/yupx2ToQ5T4/WAMC_Q-DCAAJ">Follow-ups</a>
on Mozilla's dev-platform list pointed me to
<a href="http://blogs.msdn.com/b/oldnewthing/archive/2011/09/23/10215586.aspx">an old MSDN blog post</a>
that documents behavior similar to what I was seeing.</p>
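<p>A minimal sketch of that kind of analysis, not the original script. It
assumes the Process Monitor capture was exported as CSV with a
<em>Duration</em> column holding seconds; the column names
<em>Operation</em> and <em>Duration</em>, the helper name
<em>summarize_durations</em>, and the sample rows are all assumptions made for
illustration.</p>

```python
import csv
import io
from collections import defaultdict


def summarize_durations(csv_file):
    """Return {operation: (call_count, total_seconds)} for a CSV export.

    Assumes each row has an "Operation" column and a "Duration" column
    holding seconds as a decimal string. Rows without a parsable
    duration are skipped.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for row in csv.DictReader(csv_file):
        try:
            operation = row["Operation"]
            duration = float(row["Duration"])
        except (KeyError, TypeError, ValueError):
            continue
        entry = totals[operation]
        entry[0] += 1
        entry[1] += duration
    return {op: (count, total) for op, (count, total) in totals.items()}


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real export.
    sample = io.StringIO(
        '"Operation","Path","Duration"\n'
        '"WriteFile","a.i","0.0000030"\n'
        '"CloseFile","a.i","0.0021000"\n'
        '"CloseFile","b.i","0.0034000"\n'
    )
    for op, (count, total) in sorted(summarize_durations(sample).items()):
        print(f"{op}: {count} calls, {total * 1000:.3f} ms total")
```

<p>Sorting the result by total time makes it easy to see which operation
dominates a capture.</p>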
<p>Long story short, <strong>closing file handles that have been appended to is
slow on Windows.</strong> This is apparently due to an implementation detail of
NTFS. Writing to a file in place is fine and only takes microseconds for
the open, write, and close. But if you append to a file, closing the
associated file handle is going to take a few milliseconds. Even if you
are using Overlapped I/O (async I/O on Windows), the <em>CloseHandle()</em>
call to close the file handle blocks the calling thread! Seriously.</p>
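<p>One rough way to check this on your own machine is to time only the
close step for appended files versus files rewritten in place. The sketch
below is an illustration under assumptions (the helper names
<em>time_close</em> and <em>compare</em>, the file count, and the payload size
are all made up), and the numbers it produces will vary with the OS,
filesystem, and any antivirus software.</p>

```python
import os
import tempfile
import time


def time_close(path, mode, payload, seek_to_start):
    """Open path, write payload, and return the seconds spent in close()."""
    fh = open(path, mode)
    if seek_to_start:
        fh.seek(0)
    fh.write(payload)
    fh.flush()  # hand the data to the OS so close() is not doing the write
    start = time.perf_counter()
    fh.close()
    return time.perf_counter() - start


def compare(count=200, payload=b"x" * 4096):
    """Return average close() seconds for append vs. in-place rewrite."""
    totals = {"append": 0.0, "in_place": 0.0}
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(count):
            path = os.path.join(tmp, f"f{i}")
            # Create the file with some initial content.
            with open(path, "wb") as fh:
                fh.write(payload)
            # "ab" grows the file by appending to it.
            totals["append"] += time_close(path, "ab", payload, False)
            # "r+b" overwrites existing bytes without changing the size.
            totals["in_place"] += time_close(path, "r+b", payload, True)
    return {name: total / count for name, total in totals.items()}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds * 1e6:.1f} us per close")
```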
<p>This behavior is in stark contrast to Linux and OS X, where the equivalent
system I/O functions take microseconds (assuming your I/O subsystem can keep up).</p>
<p>There are two ways to work around this issue:</p>
<ol>
<li>Reduce the number of file close operations on appended files.</li>
<li>Use multiple threads for I/O on Windows.</li>
</ol>
<p>Armed with this knowledge, I dug into the guts of Mercurial and
proceeded to write a
<a href="https://selenic.com/repo/hg/rev/836291420d53">number</a>
<a href="https://selenic.com/repo/hg/rev/39d643252b9f">of</a>
<a href="https://selenic.com/repo/hg/rev/56a640b0f656">patches</a>
that drastically reduced the number of file I/O system calls during
clone and pull operations. While I intend to write a blog post with the
full details, <strong>cloning the Firefox repository with Mercurial 3.6 on
Windows is now several minutes faster.</strong> Pretty much all of this is due
to reducing the number of file close operations by aggressively reusing
file handles.</p>
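<p>To illustrate the general idea of reusing file handles (this is a
sketch, not Mercurial's actual implementation), the hypothetical
<em>HandleCache</em> class below keeps a bounded number of append-mode
handles open, so repeated appends to the same file do not pay for an
open and a close each time.</p>

```python
import os
import tempfile
from collections import OrderedDict


class HandleCache:
    """Keep up to `limit` append-mode file handles open for reuse."""

    def __init__(self, limit=64):
        self._limit = limit
        self._handles = OrderedDict()  # path -> open file, oldest first
        self.closes = 0  # number of close() calls actually made

    def append(self, path, data):
        fh = self._handles.pop(path, None)
        if fh is None:
            fh = open(path, "ab")
        self._handles[path] = fh  # (re)insert as most recently used
        fh.write(data)
        # Evict the least recently used handle once over the limit.
        while len(self._handles) > self._limit:
            _, old = self._handles.popitem(last=False)
            old.close()
            self.closes += 1

    def close(self):
        for fh in self._handles.values():
            fh.close()
            self.closes += 1
        self._handles.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with HandleCache(limit=2) as cache:
            # Six appends spread over two files.
            for i in range(6):
                cache.append(os.path.join(tmp, f"file{i % 2}"), b"chunk\n")
        print(cache.closes)  # 2 closes instead of 6
```

<p>The limit bounds how many handles stay open at once; the right value
depends on the process's handle limits and access pattern.</p>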
<p>I also
<a href="https://selenic.com/pipermail/mercurial-devel/2015-September/073788.html">experimented</a>
with moving file close operations to a separate thread on Windows. While
this change didn't make it into Mercurial 3.6, the results were very
promising. Even on Python (where the GIL prevents threads from running
Python code truly in parallel), moving file closing to a background thread freed up the
main thread to do the CPU-heavy work of processing data. This made clones
several minutes faster. (Python does release the GIL when performing an
I/O system call.) Furthermore, <strong>simply creating a dedicated thread for
closing file handles made Mercurial faster than 7-zip at writing tens of
thousands of files from an uncompressed tar archive</strong>. (I'm not going to
post the time for <em>tar</em> on Windows because it is embarrassing.) That's a
Python process on Windows faster than a native executable that is lauded
for its speed (7-zip). Just by offloading file closing to a single
separate thread. Crazy.</p>
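<p>A minimal sketch of the background-closing technique, assuming a single
worker thread fed by a bounded queue. The <em>BackgroundCloser</em> class and
its method names are hypothetical and are not taken from the Mercurial
patch.</p>

```python
import os
import queue
import tempfile
import threading


class BackgroundCloser:
    """Close file objects on a worker thread instead of the caller's."""

    def __init__(self, max_pending=100):
        # A bounded queue caps how many handles can be waiting to close.
        self._queue = queue.Queue(maxsize=max_pending)
        self._errors = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            fh = self._queue.get()
            if fh is None:  # sentinel: no more work
                return
            try:
                fh.close()  # the potentially slow call happens here
            except OSError as exc:
                self._errors.append(exc)

    def close(self, fh):
        """Hand a file object off for closing (blocks only if queue is full)."""
        fh.flush()  # surface write errors on the calling thread
        self._queue.put(fh)

    def join(self):
        """Wait for all pending closes, then re-raise the first failure."""
        self._queue.put(None)
        self._thread.join()
        if self._errors:
            raise self._errors[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        closer = BackgroundCloser()
        for i in range(200):
            fh = open(os.path.join(tmp, f"file{i}"), "ab")
            fh.write(b"data")
            closer.close(fh)
        closer.join()  # all handles are closed past this point
```

<p>The queue is bounded so that a slow closer thread applies back-pressure
rather than letting the number of open handles grow without limit.</p>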
<p>I can optimize file closing in Mercurial all I want. However,
Mercurial's storage model relies on a large number of files. For the Firefox
repository, we have to write ~225,000 files during clone. Assuming
1ms per file close (which is generous, given the 1-5ms observed), that's 225s (or 3:45) of wall
time spent performing file closes. That's not going to scale. I've already
started experimenting with alternative storage modes that initially use
1-6 files. This should enable Mercurial clones to run at over 100 MB/s
(yes, Python and Windows can do I/O that quickly if you are smart about
things).</p>
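<p>The back-of-the-envelope estimate above works out as follows, using the
post's figure of ~225,000 files and the assumed 1ms per close:</p>

```python
files = 225_000  # approximate files written during a Firefox clone
close_ms = 1     # assumed cost per file close, in milliseconds

total_seconds = files * close_ms / 1000
minutes, seconds = divmod(int(total_seconds), 60)
print(f"{total_seconds:.0f}s = {minutes}:{seconds:02d}")  # 225s = 3:45
```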
<p>My primary takeaway is that creating/appending to thousands of files
is slow on Windows and should be addressed at the architecture level
by not requiring thousands of files and at the implementation level
by minimizing the number of file close operations after write.
If you absolutely must create/append to thousands of files, use multiple
threads for at least closing file handles.</p>
<p>My secondary takeaway is that Sysinternals Process Monitor is amazing.
I used it against Firefox and immediately found
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1211090">performance concerns</a>.
It can be extremely eye-opening to see how your higher-level code
is translated into function calls into your operating system and where
the performance hot spots are or aren't at the OS level.</p>
</div>
</div>
</div>
<div id="sidebar">
<ul>
<li>
<h2>Categories</h2>
<ul>
<li><a href="/blog/category/apple">Apple</a></li>
<li><a href="/blog/category/bugzilla">Bugzilla</a></li>
<li><a href="/blog/category/ci">CI</a></li>
<li><a href="/blog/category/clang">Clang</a></li>
<li><a href="/blog/category/docker">Docker</a></li>
<li><a href="/blog/category/firefox">Firefox</a></li>
<li><a href="/blog/category/git">Git</a></li>
<li><a href="/blog/category/javascript">JavaScript</a></li>
<li><a href="/blog/category/mercurial">Mercurial</a></li>
<li><a href="/blog/category/mozreview">MozReview</a></li>
<li><a href="/blog/category/mozilla">Mozilla</a></li>
<li><a href="/blog/category/personal">Personal</a></li>
<li><a href="/blog/category/programming">Programming</a></li>
<li><a href="/blog/category/puppet">Puppet</a></li>
<li><a href="/blog/category/pyoxidizer">PyOxidizer</a></li>
<li><a href="/blog/category/python">Python</a></li>
<li><a href="/blog/category/review-board">Review Board</a></li>
<li><a href="/blog/category/rust">Rust</a></li>
<li><a href="/blog/category/sync">Sync</a></li>
<li><a href="/blog/category/browsers">browsers</a></li>
<li><a href="/blog/category/build-system">build system</a></li>
<li><a href="/blog/category/code-review">code review</a></li>
<li><a href="/blog/category/compilers">compilers</a></li>
<li><a href="/blog/category/internet">internet</a></li>
<li><a href="/blog/category/logging">logging</a></li>
<li><a href="/blog/category/mach">mach</a></li>
<li><a href="/blog/category/make">make</a></li>
<li><a href="/blog/category/misc">misc</a></li>
<li><a href="/blog/category/movies">movies</a></li>
<li><a href="/blog/category/packaging">packaging</a></li>
<li><a href="/blog/category/pymake">pymake</a></li>
<li><a href="/blog/category/security">security</a></li>
<li><a href="/blog/category/sysadmin">sysadmin</a></li>
<li><a href="/blog/category/testing">testing</a></li>
</ul>
</li>
</ul>
</div>
<div style="clear: both;">&nbsp;</div>
</div>
</div>
</div>
<div id="footer">
<hr/>
<p>Copyright (c) 2012- Gregory Szorc. All rights reserved. Design by <a href="http://www.freecsstemplates.org/"> CSS Templates</a>.</p>
</div>
</div>
</body>
</html>