/blog/2015/10/22/append-i/o-performance-on-windows/index.html
HTML | 219 lines | 178 code | 30 blank | 11 comment | 0 complexity | d0d235d85ae5bd02b8b47f92d4c46459 MD5 | raw file
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- <!--
- Design by Free CSS Templates
- http://www.freecsstemplates.org
- Released for free under a Creative Commons Attribution 2.5 License
- Name : Pollinating
- Description: A two-column, fixed-width design with dark color scheme.
- Version : 1.0
- Released : 20101114
- -->
- <html xmlns="http://www.w3.org/1999/xhtml">
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
-
- <title>Gregory Szorc's Digital Home
- | Append I/O Performance on Windows
- </title>
- <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/blog/feed" />
- <link rel="alternate" type="application/atom+xml" title="Atom 1.0"
- href="/blog/feed/atom" />
- <link rel="stylesheet" href="/style/style.css" type="text/css" />
- <link rel="stylesheet" href="/css/pygments_murphy.css" type="text/css" />
- </head>
- <body>
- <div id="wrapper">
-
- <div id="menu">
- <ul>
- <li><a href="/">Home</a></li>
- <li><a href="/blog/">Blog</a></li>
- <li><a href="/notes">Notes</a></li>
- <li><a href="/work.html">Work</a></li>
- <li><a href="/skills.html">Skills</a></li>
- <li><a href="/thoughts.html">Thoughts</a></li>
- <li><a href="/resume.pdf">Resume</a></li>
- </ul>
- </div>
- <div id="page">
- <div id="page-bgtop">
- <div id="page-bgbtm">
- <div id="content">
-
- <div class="blog_post">
- <a name="append-i/o-performance-on-windows"></a>
- <h2 class="blog_post_title"><a href="/blog/2015/10/22/append-i/o-performance-on-windows" rel="bookmark" title="Permanent Link to Append I/O Performance on Windows">Append I/O Performance on Windows</a></h2>
- <small>October 22, 2015 at 02:15 AM | categories:
- <a href='/blog/category/mozilla'>Mozilla</a>
- </small><p/>
- <div class="post_prose">
-
- <p>A few weeks ago, some coworkers were complaining about the relative
- performance of Mercurial cloning on Windows. I investigated on my
- brand new i7-6700K Windows 10 desktop machine and sure enough they were
- correct: cloning times on Windows were several minutes slower than
- Linux on the same hardware. What gives?</p>
- <p>I performed a few clones with Python under a profiler. It pointed to a
- potential slowdown in file I/O. I wanted more details so I fired up
- <a href="https://technet.microsoft.com/en-us/library/bb896645.aspx">Sysinternals Process Monitor</a>
- (strace for Windows) and captured data for a clone.</p>
- <p>As I was looking at the raw system calls related to I/O, something
- immediately popped out: <em>CloseFile()</em> operations were frequently
- taking 1-5 <em>milliseconds</em> whereas other operations like opening, reading,
- and writing files only took 1-5 <em>microseconds</em>. <strong>That's a 1000x
- difference!</strong></p>
- <p>I wrote a custom Python script to analyze an export of Process Monitor's
- data. Sure enough, it said we were spending hundreds of seconds in
- <em>CloseFile()</em> operations (it was being called a few hundred thousand
- times). I posted the findings to some mailing lists.
- <a href="https://groups.google.com/d/msg/mozilla.dev.platform/yupx2ToQ5T4/WAMC_Q-DCAAJ">Follow-ups</a>
- in Mozilla's dev-platform list pointed me to
- <a href="http://blogs.msdn.com/b/oldnewthing/archive/2011/09/23/10215586.aspx">an old MSDN blog post</a>
- where it documents behavior similar to what I was seeing.</p>
- <p>Long story short, <strong>closing file handles that have been appended to is
- slow on Windows.</strong> This is apparently due to an implementation detail of
- NTFS. Writing to a file in place is fine and only takes microseconds for
- the open, write, and close. But if you append a file, closing the
- associated file handle is going to take a few milliseconds. Even if you
- are using Overlapped I/O (async I/O on Windows), the <em>CloseHandle()</em>
- call to close the file handle blocks the calling thread! Seriously.</p>
- <p>This behavior is in stark contrast to Linux and OS X, where system I/O
- functions take microseconds (assuming your I/O subsystem can keep up).</p>
- <p>There are two ways to work around this issue:</p>
- <ol>
- <li>Reduce the amount of file closing operations on appended files.</li>
- <li>Use multiple threads for I/O on Windows.</li>
- </ol>
- <p>Armed with this knowledge, I dug into the guts of Mercurial and
- proceeded to write a
- <a href="https://selenic.com/repo/hg/rev/836291420d53">number</a>
- <a href="https://selenic.com/repo/hg/rev/39d643252b9f">of</a>
- <a href="https://selenic.com/repo/hg/rev/56a640b0f656">patches</a>
- that drastically reduced the amount of file I/O system calls during
- clone and pull operations. While I intend to write a blog post with the
- full details, <strong>cloning the Firefox repository with Mercurial 3.6 on
- Windows is now several minutes faster.</strong> Pretty much all of this is due
- to reducing the number of file close operations by aggressively reusing
- file handles.</p>
- <p>I also
- <a href="https://selenic.com/pipermail/mercurial-devel/2015-September/073788.html">experimented</a>
- with moving file close operations to a separate thread on Windows. While
- this change didn't make it into Mercurial 3.6, the results were very
- promising. Even on Python (which doesn't have real asynchronous threads
- due to the GIL), moving file closing to a background thread freed up the
- main thread to do the CPU heavy work of processing data. This made clones
- several minutes faster. (Python does release the GIL when performing an
- I/O system call.) Furthermore, <strong>simply creating a dedicated thread for
- closing file handles made Mercurial faster than 7-zip at writing tens of
- thousands of files from an uncompressed tar archive</strong>. (I'm not going to
- post the time for <em>tar</em> on Windows because it is embarassing.) That's a
- Python process on Windows faster than a native executable that is lauded
- for its speed (7-zip). Just by offloading file closing to a single
- separate thread. Crazy.</p>
- <p>I can optimize file closing in Mercurial all I want. However,
- Mercurial's storage model relies on several files. For the Firefox
- repository, we have to write ~225,000 files during clone. Assuming
- 1ms per file close (which is generous), that's 225s (or 3:45) wall
- time performing file closes. That's not going to scale. I've already
- started experimenting with alternative storage modes that initially use
- 1-6 files. This should enable Mercurial clones to run at over 100 MB/s
- (yes, Python and Windows can do I/O that quickly if you are smart about
- things).</p>
- <p>My primary takeaway is that creating/appending to thousands of files
- is slow on Windows and should be addressed at the architecture level
- by not requiring thousands of files and at the implementation level
- by minimizing the number of file close operations after write.
- If you absolutely must create/append to thousands of files, use multiple
- threads for at least closing file handles.</p>
- <p>My secondary takeaway is that Sysinternals Process Monitor is amazing.
- I used it against Firefox and immediately found
- <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1211090">performance concerns</a>.
- It can be extremely eye opening to see how your higher-level code
- is translated into function calls into your operating system and where
- the performance hot spots are or aren't at the OS level.</p>
- </div>
- </div>
- </div>
-
- <div id="sidebar">
- <ul>
- <li>
- <h2>Categories</h2>
- <ul>
- <li><a href="/blog/category/apple">Apple</a></li>
- <li><a href="/blog/category/bugzilla">Bugzilla</a></li>
- <li><a href="/blog/category/ci">CI</a></li>
- <li><a href="/blog/category/clang">Clang</a></li>
- <li><a href="/blog/category/docker">Docker</a></li>
- <li><a href="/blog/category/firefox">Firefox</a></li>
- <li><a href="/blog/category/git">Git</a></li>
- <li><a href="/blog/category/javascript">JavaScript</a></li>
- <li><a href="/blog/category/mercurial">Mercurial</a></li>
- <li><a href="/blog/category/mozreview">MozReview</a></li>
- <li><a href="/blog/category/mozilla">Mozilla</a></li>
- <li><a href="/blog/category/personal">Personal</a></li>
- <li><a href="/blog/category/programming">Programming</a></li>
- <li><a href="/blog/category/puppet">Puppet</a></li>
- <li><a href="/blog/category/pyoxidizer">PyOxidizer</a></li>
- <li><a href="/blog/category/python">Python</a></li>
- <li><a href="/blog/category/review-board">Review Board</a></li>
- <li><a href="/blog/category/rust">Rust</a></li>
- <li><a href="/blog/category/sync">Sync</a></li>
- <li><a href="/blog/category/browsers">browsers</a></li>
- <li><a href="/blog/category/build-system">build system</a></li>
- <li><a href="/blog/category/code-review">code review</a></li>
- <li><a href="/blog/category/compilers">compilers</a></li>
- <li><a href="/blog/category/internet">internet</a></li>
- <li><a href="/blog/category/logging">logging</a></li>
- <li><a href="/blog/category/mach">mach</a></li>
- <li><a href="/blog/category/make">make</a></li>
- <li><a href="/blog/category/misc">misc</a></li>
- <li><a href="/blog/category/movies">movies</a></li>
- <li><a href="/blog/category/packaging">packaging</a></li>
- <li><a href="/blog/category/pymake">pymake</a></li>
- <li><a href="/blog/category/security">security</a></li>
- <li><a href="/blog/category/sysadmin">sysadmin</a></li>
- <li><a href="/blog/category/testing">testing</a></li>
- </ul>
- </li>
- </ul>
- </div>
- <div style="clear: both;"> </div>
- </div>
- </div>
- </div>
- <div id="footer">
-
- <hr/>
- <p>Copyright (c) 2012- Gregory Szorc. All rights reserved. Design by <a href="http://www.freecsstemplates.org/"> CSS Templates</a>.</p>
- </div>
- </div>
- </body>
- </html>