/doc/cpuprofile.html

http://google-perftools.googlecode.com/ · HTML · 520 lines · 441 code · 76 blank · 3 comment · 0 complexity · 00628739023f29ea9f50e3fc8b9da5c8 MD5 · raw file

  1. <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
  2. <HTML>
  3. <HEAD>
  4. <link rel="stylesheet" href="designstyle.css">
  5. <title>Google CPU Profiler</title>
  6. </HEAD>
  7. <BODY>
  8. <p align=right>
  9. <i>Last modified
  10. <script type=text/javascript>
  11. var lm = new Date(document.lastModified);
  12. document.write(lm.toDateString());
  13. </script></i>
  14. </p>
  15. <p>This is the CPU profiler we use at Google. There are three parts
  16. to using it: linking the library into an application, running the
  17. code, and analyzing the output.</p>
  18. <p>On the off-chance that you should need to understand it, the CPU
  19. profiler data file format is documented separately,
  20. <a href="cpuprofile-fileformat.html">here</a>.
  21. <H1>Linking in the Library</H1>
  22. <p>To install the CPU profiler into your executable, add
  23. <code>-lprofiler</code> to the link-time step for your executable.
  24. (It's also probably possible to add in the profiler at run-time using
  25. <code>LD_PRELOAD</code>, e.g.
  26. <code>% env LD_PRELOAD="/usr/lib/libprofiler.so" &lt;binary&gt;</code>,
  27. but this isn't necessarily recommended.)</p>
  28. <p>This does <i>not</i> turn on CPU profiling; it just inserts the
  29. code. For that reason, it's practical to just always link
  30. <code>-lprofiler</code> into a binary while developing; that's what we
  31. do at Google. (However, since any user can turn on the profiler by
  32. setting an environment variable, it's not necessarily recommended to
  33. install profiler-linked binaries into a production, running
  34. system.)</p>
  35. <H1>Running the Code</H1>
  36. <p>There are several alternatives to actually turn on CPU profiling
  37. for a given run of an executable:</p>
  38. <ol>
  39. <li> <p>Define the environment variable CPUPROFILE to the filename
  40. to dump the profile to. For instance, to profile
  41. <code>/usr/local/bin/my_binary_compiled_with_libprofiler_so</code>:</p>
  42. <pre>% env CPUPROFILE=/tmp/mybin.prof /usr/local/bin/my_binary_compiled_with_libprofiler_so</pre>
  43. <li> <p>In your code, bracket the code you want profiled in calls to
  44. <code>ProfilerStart()</code> and <code>ProfilerStop()</code>.
  45. (These functions are declared in <code>&lt;google/profiler.h&gt;</code>.)
  46. <code>ProfilerStart()</code> will take
  47. the profile-filename as an argument.</p>
  48. </ol>
  49. <p>In Linux 2.6 and above, profiling works correctly with threads,
  50. automatically profiling all threads. In Linux 2.4, profiling only
  51. profiles the main thread (due to a kernel bug involving itimers and
  52. threads). Profiling works correctly with sub-processes: each child
  53. process gets its own profile with its own name (generated by combining
  54. CPUPROFILE with the child's process id).</p>
  55. <p>For security reasons, CPU profiling will not write to a file -- and
  56. is thus not usable -- for setuid programs.</p>
  57. <p>See the include-file <code>google/profiler.h</code> for
  58. advanced-use functions, including <code>ProfilerFlush()</code> and
  59. <code>ProfilerStartWithOptions()</code>.</p>
  60. <H2>Modifying Runtime Behavior</H2>
  61. <p>You can more finely control the behavior of the CPU profiler via
  62. environment variables.</p>
  63. <table frame=box rules=sides cellpadding=5 width=100%>
  64. <tr valign=top>
  65. <td><code>CPUPROFILE_FREQUENCY=<i>x</i></code></td>
  66. <td>default: 100</td>
  67. <td>
  68. How many interrupts/second the cpu-profiler samples.
  69. </td>
  70. </tr>
  71. <tr valign=top>
  72. <td><code>CPUPROFILE_REALTIME=1</code></td>
  73. <td>default: [not set]</td>
  74. <td>
  75. If set to any value (including 0 or the empty string), use
  76. ITIMER_REAL instead of ITIMER_PROF to gather profiles. In
  77. general, ITIMER_REAL is not as accurate as ITIMER_PROF, and also
  78. interacts badly with use of alarm(), so prefer ITIMER_PROF unless
  79. you have a reason prefer ITIMER_REAL.
  80. </td>
  81. </tr>
  82. </table>
  83. <h1><a name="pprof">Analyzing the Output</a></h1>
  84. <p><code>pprof</code> is the script used to analyze a profile. It has
  85. many output modes, both textual and graphical. Some give just raw
  86. numbers, much like the <code>-pg</code> output of <code>gcc</code>,
  87. and others show the data in the form of a dependency graph.</p>
  88. <p>pprof <b>requires</b> <code>perl5</code> to be installed to run.
  89. It also requires <code>dot</code> to be installed for any of the
  90. graphical output routines, and <code>gv</code> to be installed for
  91. <code>--gv</code> mode (described below).
  92. </p>
  93. <p>Here are some ways to call pprof. These are described in more
  94. detail below.</p>
  95. <pre>
  96. % pprof /bin/ls ls.prof
  97. Enters "interactive" mode
  98. % pprof --text /bin/ls ls.prof
  99. Outputs one line per procedure
  100. % pprof --gv /bin/ls ls.prof
  101. Displays annotated call-graph via 'gv'
  102. % pprof --gv --focus=Mutex /bin/ls ls.prof
  103. Restricts to code paths including a .*Mutex.* entry
  104. % pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
  105. Code paths including Mutex but not string
  106. % pprof --list=getdir /bin/ls ls.prof
  107. (Per-line) annotated source listing for getdir()
  108. % pprof --disasm=getdir /bin/ls ls.prof
  109. (Per-PC) annotated disassembly for getdir()
  110. % pprof --text localhost:1234
  111. Outputs one line per procedure for localhost:1234
  112. % pprof --callgrind /bin/ls ls.prof
  113. Outputs the call information in callgrind format
  114. </pre>
  115. <h3>Analyzing Text Output</h3>
  116. <p>Text mode has lines of output that look like this:</p>
  117. <pre>
  118. 14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
  119. </pre>
  120. <p>Here is how to interpret the columns:</p>
  121. <ol>
  122. <li> Number of profiling samples in this function
  123. <li> Percentage of profiling samples in this function
  124. <li> Percentage of profiling samples in the functions printed so far
  125. <li> Number of profiling samples in this function and its callees
  126. <li> Percentage of profiling samples in this function and its callees
  127. <li> Function name
  128. </ol>
  129. <h3>Analyzing Callgrind Output</h3>
  130. <p>Use <a href="http://kcachegrind.sourceforge.net">kcachegrind</a> to
  131. analyze your callgrind output:</p>
  132. <pre>
  133. % pprof --callgrind /bin/ls ls.prof > ls.callgrind
  134. % kcachegrind ls.callgrind
  135. </pre>
  136. <p>The cost is specified in 'hits', i.e. how many times a function
  137. appears in the recorded call stack information. The 'calls' from
  138. function a to b record how many times function b was found in the
  139. stack traces directly below function a.</p>
  140. <p>Tip: if you use a debug build the output will include file and line
  141. number information and kcachegrind will show an annotated source
  142. code view.</p>
  143. <h3>Node Information</h3>
  144. <p>In the various graphical modes of pprof, the output is a call graph
  145. annotated with timing information, like so:</p>
  146. <A HREF="pprof-test-big.gif">
  147. <center><table><tr><td>
  148. <img src="pprof-test.gif">
  149. </td></tr></table></center>
  150. </A>
  151. <p>Each node represents a procedure. The directed edges indicate
  152. caller to callee relations. Each node is formatted as follows:</p>
  153. <center><pre>
  154. Class Name
  155. Method Name
  156. local (percentage)
  157. <b>of</b> cumulative (percentage)
  158. </pre></center>
  159. <p>The last one or two lines contains the timing information. (The
  160. profiling is done via a sampling method, where by default we take 100
  161. samples a second. Therefor one unit of time in the output corresponds
  162. to about 10 milliseconds of execution time.) The "local" time is the
  163. time spent executing the instructions directly contained in the
  164. procedure (and in any other procedures that were inlined into the
  165. procedure). The "cumulative" time is the sum of the "local" time and
  166. the time spent in any callees. If the cumulative time is the same as
  167. the local time, it is not printed.</p>
  168. <p>For instance, the timing information for test_main_thread()
  169. indicates that 155 units (about 1.55 seconds) were spent executing the
  170. code in <code>test_main_thread()</code> and 200 units were spent while
  171. executing <code>test_main_thread()</code> and its callees such as
  172. <code>snprintf()</code>.</p>
  173. <p>The size of the node is proportional to the local count. The
  174. percentage displayed in the node corresponds to the count divided by
  175. the total run time of the program (that is, the cumulative count for
  176. <code>main()</code>).</p>
  177. <h3>Edge Information</h3>
  178. <p>An edge from one node to another indicates a caller to callee
  179. relationship. Each edge is labelled with the time spent by the callee
  180. on behalf of the caller. E.g, the edge from
  181. <code>test_main_thread()</code> to <code>snprintf()</code> indicates
  182. that of the 200 samples in <code>test_main_thread()</code>, 37 are
  183. because of calls to <code>snprintf()</code>.</p>
  184. <p>Note that <code>test_main_thread()</code> has an edge to
  185. <code>vsnprintf()</code>, even though <code>test_main_thread()</code>
  186. doesn't call that function directly. This is because the code was
  187. compiled with <code>-O2</code>; the profile reflects the optimized
  188. control flow.</p>
  189. <h3>Meta Information</h3>
  190. <p>The top of the display should contain some meta information
  191. like:</p>
  192. <pre>
  193. /tmp/profiler2_unittest
  194. Total samples: 202
  195. Focusing on: 202
  196. Dropped nodes with &lt;= 1 abs(samples)
  197. Dropped edges with &lt;= 0 samples
  198. </pre>
  199. <p>This section contains the name of the program, and the total
  200. samples collected during the profiling run. If the
  201. <code>--focus</code> option is on (see the <a href="#focus">Focus</a>
  202. section below), the legend also contains the number of samples being
  203. shown in the focused display. Furthermore, some unimportant nodes and
  204. edges are dropped to reduce clutter. The characteristics of the
  205. dropped nodes and edges are also displayed in the legend.</p>
  206. <h3><a name=focus>Focus and Ignore</a></h3>
  207. <p>You can ask pprof to generate a display focused on a particular
  208. piece of the program. You specify a regular expression. Any portion
  209. of the call-graph that is on a path which contains at least one node
  210. matching the regular expression is preserved. The rest of the
  211. call-graph is dropped on the floor. For example, you can focus on the
  212. <code>vsnprintf()</code> libc call in <code>profiler2_unittest</code>
  213. as follows:</p>
  214. <pre>
  215. % pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof
  216. </pre>
  217. <A HREF="pprof-vsnprintf-big.gif">
  218. <center><table><tr><td>
  219. <img src="pprof-vsnprintf.gif">
  220. </td></tr></table></center>
  221. </A>
  222. <p>Similarly, you can supply the <code>--ignore</code> option to
  223. ignore samples that match a specified regular expression. E.g., if
  224. you are interested in everything except calls to
  225. <code>snprintf()</code>, you can say:</p>
  226. <pre>
  227. % pprof --gv --ignore=snprintf /tmp/profiler2_unittest test.prof
  228. </pre>
  229. <h3>Interactive mode</a></h3>
  230. <p>By default -- if you don't specify any flags to the contrary --
  231. pprof runs in interactive mode. At the <code>(pprof)</code> prompt,
  232. you can run many of the commands described above. You can type
  233. <code>help</code> for a list of what commands are available in
  234. interactive mode.</p>
  235. <h3><a name=options>pprof Options</a></h3>
  236. For a complete list of pprof options, you can run <code>pprof
  237. --help</code>.
  238. <h4>Output Type</h4>
  239. <p>
  240. <center>
  241. <table frame=box rules=sides cellpadding=5 width=100%>
  242. <tr valign=top>
  243. <td><code>--text</code></td>
  244. <td>
  245. Produces a textual listing. (Note: If you have an X display, and
  246. <code>dot</code> and <code>gv</code> installed, you will probably
  247. be happier with the <code>--gv</code> output.)
  248. </td>
  249. </tr>
  250. <tr valign=top>
  251. <td><code>--gv</code></td>
  252. <td>
  253. Generates annotated call-graph, converts to postscript, and
  254. displays via gv (requres <code>dot</code> and <code>gv</code> be
  255. installed).
  256. </td>
  257. </tr>
  258. <tr valign=top>
  259. <td><code>--dot</code></td>
  260. <td>
  261. Generates the annotated call-graph in dot format and
  262. emits to stdout (requres <code>dot</code> be installed).
  263. </td>
  264. </tr>
  265. <tr valign=top>
  266. <td><code>--ps</code></td>
  267. <td>
  268. Generates the annotated call-graph in Postscript format and
  269. emits to stdout (requres <code>dot</code> be installed).
  270. </td>
  271. </tr>
  272. <tr valign=top>
  273. <td><code>--pdf</code></td>
  274. <td>
  275. Generates the annotated call-graph in PDF format and emits to
  276. stdout (requires <code>dot</code> and <code>ps2pdf</code> be
  277. installed).
  278. </td>
  279. </tr>
  280. <tr valign=top>
  281. <td><code>--gif</code></td>
  282. <td>
  283. Generates the annotated call-graph in GIF format and
  284. emits to stdout (requres <code>dot</code> be installed).
  285. </td>
  286. </tr>
  287. <tr valign=top>
  288. <td><code>--list=&lt;<i>regexp</i>&gt;</code></td>
  289. <td>
  290. <p>Outputs source-code listing of routines whose
  291. name matches &lt;regexp&gt;. Each line
  292. in the listing is annotated with flat and cumulative
  293. sample counts.</p>
  294. <p>In the presence of inlined calls, the samples
  295. associated with inlined code tend to get assigned
  296. to a line that follows the location of the
  297. inlined call. A more precise accounting can be
  298. obtained by disassembling the routine using the
  299. --disasm flag.</p>
  300. </td>
  301. </tr>
  302. <tr valign=top>
  303. <td><code>--disasm=&lt;<i>regexp</i>&gt;</code></td>
  304. <td>
  305. Generates disassembly of routines that match
  306. &lt;regexp&gt;, annotated with flat and
  307. cumulative sample counts and emits to stdout.
  308. </td>
  309. </tr>
  310. </table>
  311. </center>
  312. <h4>Reporting Granularity</h4>
  313. <p>By default, pprof produces one entry per procedure. However you can
  314. use one of the following options to change the granularity of the
  315. output. The <code>--files</code> option seems to be particularly
  316. useless, and may be removed eventually.</p>
  317. <center>
  318. <table frame=box rules=sides cellpadding=5 width=100%>
  319. <tr valign=top>
  320. <td><code>--addresses</code></td>
  321. <td>
  322. Produce one node per program address.
  323. </td>
  324. </tr>
  325. <td><code>--lines</code></td>
  326. <td>
  327. Produce one node per source line.
  328. </td>
  329. </tr>
  330. <td><code>--functions</code></td>
  331. <td>
  332. Produce one node per function (this is the default).
  333. </td>
  334. </tr>
  335. <td><code>--files</code></td>
  336. <td>
  337. Produce one node per source file.
  338. </td>
  339. </tr>
  340. </table>
  341. </center>
  342. <h4>Controlling the Call Graph Display</h4>
  343. <p>Some nodes and edges are dropped to reduce clutter in the output
  344. display. The following options control this effect:</p>
  345. <center>
  346. <table frame=box rules=sides cellpadding=5 width=100%>
  347. <tr valign=top>
  348. <td><code>--nodecount=&lt;n&gt;</code></td>
  349. <td>
  350. This option controls the number of displayed nodes. The nodes
  351. are first sorted by decreasing cumulative count, and then only
  352. the top N nodes are kept. The default value is 80.
  353. </td>
  354. </tr>
  355. <tr valign=top>
  356. <td><code>--nodefraction=&lt;f&gt;</code></td>
  357. <td>
  358. This option provides another mechanism for discarding nodes
  359. from the display. If the cumulative count for a node is
  360. less than this option's value multiplied by the total count
  361. for the profile, the node is dropped. The default value
  362. is 0.005; i.e. nodes that account for less than
  363. half a percent of the total time are dropped. A node
  364. is dropped if either this condition is satisfied, or the
  365. --nodecount condition is satisfied.
  366. </td>
  367. </tr>
  368. <tr valign=top>
  369. <td><code>--edgefraction=&lt;f&gt;</code></td>
  370. <td>
  371. This option controls the number of displayed edges. First of all,
  372. an edge is dropped if either its source or destination node is
  373. dropped. Otherwise, the edge is dropped if the sample
  374. count along the edge is less than this option's value multiplied
  375. by the total count for the profile. The default value is
  376. 0.001; i.e., edges that account for less than
  377. 0.1% of the total time are dropped.
  378. </td>
  379. </tr>
  380. <tr valign=top>
  381. <td><code>--focus=&lt;re&gt;</code></td>
  382. <td>
  383. This option controls what region of the graph is displayed
  384. based on the regular expression supplied with the option.
  385. For any path in the callgraph, we check all nodes in the path
  386. against the supplied regular expression. If none of the nodes
  387. match, the path is dropped from the output.
  388. </td>
  389. </tr>
  390. <tr valign=top>
  391. <td><code>--ignore=&lt;re&gt;</code></td>
  392. <td>
  393. This option controls what region of the graph is displayed
  394. based on the regular expression supplied with the option.
  395. For any path in the callgraph, we check all nodes in the path
  396. against the supplied regular expression. If any of the nodes
  397. match, the path is dropped from the output.
  398. </td>
  399. </tr>
  400. </table>
  401. </center>
  402. <p>The dropped edges and nodes account for some count mismatches in
  403. the display. For example, the cumulative count for
  404. <code>snprintf()</code> in the first diagram above was 41. However
  405. the local count (1) and the count along the outgoing edges (12+1+20+6)
  406. add up to only 40.</p>
  407. <h1>Caveats</h1>
  408. <ul>
  409. <li> If the program exits because of a signal, the generated profile
  410. will be <font color=red>incomplete, and may perhaps be
  411. completely empty</font>.
  412. <li> The displayed graph may have disconnected regions because
  413. of the edge-dropping heuristics described above.
  414. <li> If the program linked in a library that was not compiled
  415. with enough symbolic information, all samples associated
  416. with the library may be charged to the last symbol found
  417. in the program before the library. This will artificially
  418. inflate the count for that symbol.
  419. <li> If you run the program on one machine, and profile it on
  420. another, and the shared libraries are different on the two
  421. machines, the profiling output may be confusing: samples that
  422. fall within shared libaries may be assigned to arbitrary
  423. procedures.
  424. <li> If your program forks, the children will also be profiled
  425. (since they inherit the same CPUPROFILE setting). Each process
  426. is profiled separately; to distinguish the child profiles from
  427. the parent profile and from each other, all children will have
  428. their process-id appended to the CPUPROFILE name.
  429. <li> Due to a hack we make to work around a possible gcc bug, your
  430. profiles may end up named strangely if the first character of
  431. your CPUPROFILE variable has ascii value greater than 127.
  432. This should be exceedingly rare, but if you need to use such a
  433. name, just set prepend <code>./</code> to your filename:
  434. <code>CPUPROFILE=./&Auml;gypten</code>.
  435. </ul>
  436. <hr>
  437. <address>Sanjay Ghemawat<br>
  438. <!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
  439. <!-- hhmts start -->
  440. Last modified: Fri May 9 14:41:29 PDT 2008
  441. <!-- hhmts end -->
  442. </address>
  443. </BODY>
  444. </HTML>