/hphp/runtime/vm/jit/refcount-opts.cpp

https://gitlab.com/Blueprint-Marketing/hhvm
/*
   +----------------------------------------------------------------------+
   | HipHop for PHP                                                       |
   +----------------------------------------------------------------------+
   | Copyright (c) 2010-2014 Facebook, Inc. (http://www.facebook.com)     |
   +----------------------------------------------------------------------+
   | This source file is subject to version 3.01 of the PHP license,      |
   | that is bundled with this package in the file LICENSE, and is        |
   | available through the world-wide-web at the following url:           |
   | http://www.php.net/license/3_01.txt                                  |
   | If you did not receive a copy of the PHP license and are unable to   |
   | obtain it through the world-wide-web, please send a note to          |
   | license@php.net so we can mail you a copy immediately.               |
   +----------------------------------------------------------------------+
*/

//////////////////////////////////////////////////////////////////////

/*

Welcome to refcount-opts. Theoretically reading this block comment first will
make the rest of this file make more sense.

-- Overview --

This file contains passes that attempt to reduce the number and strength of
reference counting operations in an IR program. It uses a few strategies, but
fundamentally most of what's going on is about trying to prove that an IncRef
is post-dominated by a DecRef that provably can't go to zero, with no events
in between that can tell if the IncRef happened, and if so, removing the
pair.

This doc comment is going to explain a few concepts, interleaved with
discussion on how they are used by the various analysis and optimization
passes in this module.
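
As a minimal sketch of the core idea (hypothetical IR; the exact instructions
are illustrative only), consider:

   IncRef t1
   StLoc<1> fp, t1   // pure store: cannot observe t1's reference count
   DecRefNZ t1       // provably can't go to zero

Nothing between the IncRef and the DecRefNZ can tell whether the IncRef
happened, so the pair can be removed, leaving only the store.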
  29. -- Must/May Alias Sets --
  30. The passes in this file operate on groups of SSATmp's called "must-alias-set"s
  31. (or often "asets" in the code). These are sets of SSATmp names that are known
  32. to alias the same object, in a "semi"-flow-insensitive way (see below). Every
  33. SSATmp that may have a reference counted type is assigned to a must-alias-set.
  34. Crucially, if two SSATmps belong to different must-alias-sets, they still /may/
  35. point to the same object. For SSATmps a and b, in this module we use
  36. (a&b).maybe(Counted) as a flow-insensitive May-Alias(a,b) predicate: if two
  37. tmps may alias but only in a way that is not reference counted, we don't care
  38. for our purposes.
  39. A subtle point worth considering is that it is possible (and not uncommon) that
  40. some of the SSATmps in one must-alias-set May-Alias some but not all of the
  41. SSATmps in another must-alias-set: the reason is that the must-alias-sets are
  42. still subject to SSA rules about where tmps are defined, and some of the tmps
  43. in a set may be defined by instructions that take conditional jumps if the
  44. object doesn't satisfy some condition (e.g. CheckType). This is why it may
  45. make sense to think of the must-alias-sets as "semi"-flow-insensitive: it's
  46. globally true that the names all refer to the same object, but the names
  47. involved aren't globally defined.
  48. The first thing this module does is run a function to map SSATmps to their
  49. must-alias-sets, and then, for each must-alias-set S, compute which of the
  50. other must-alias-sets contain any tmps that May-Alias any tmp from S.
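
For example (hypothetical IR), both tmps below end up in the same
must-alias-set even though t2 is only defined on the path where the type
check succeeds:

   t1 = LdLoc<Cell>(0) fp
   t2 = CheckType<Obj> t1 -> B3   // t2 names the same object as t1

This is the "semi"-flow-insensitivity mentioned above: t1 and t2 must alias,
but t2 is not defined on the taken path.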
  51. -- Weakening DecRefs --
  52. This file contains a relatively cheap pass that can weaken DecRefs into
  53. DecRefNZ by proving that they can't go to zero (unless there is already a bug
  54. in the program).
  55. The way this works is to do a backward dataflow analysis, computing
  56. "will_be_used_again" information. This dataflow analysis has a boolean for
  57. each must-alias-set, indicating whether all paths from a program point contain
  58. a use of that object in a way that implies their reference count is not zero
  59. (for example, if every path decrefs it again). Then, it converts any DecRef
  60. instruction on tmps whose must-alias-sets say they "will_be_used_again" to
  61. DecRefNZ.
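
A sketch of what this enables (hypothetical IR): if the program is not
already buggy, the second DecRef below implies the count was at least two at
the first one, which can therefore be weakened:

   DecRef t1    // becomes DecRefNZ
   ...
   DecRef t1    // a later use implying the count above was not zero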

One rule this pass relies on is that it is illegal to DecRef an object in a
way that takes its refcount to zero, and then IncRef it again after that.
This is not illegal for trivial reasons, because object __destruct methods
can resurrect an object in PHP. But within one JIT region, right now we
declare it illegal to generate IR that uses an object after a DecRef that
might take it to zero.

Since this pass converts instructions that may (in general) re-enter the
VM---running arbitrary PHP code for a destructor---it's potentially
profitable to run it earlier than other parts of refcount opts. For example,
it can allow heap memory accesses to be proven redundant that otherwise would
not be, and can prevent the rest of this analysis from assuming some DecRefs
can re-enter that actually can't.
  74. -- RC Flowgraphs --
  75. Other optimizations in this file are performed on "RC flowgraphs", which are an
  76. abstract representation of only the effects of the IR program that matter for
  77. the optimization, on a single must-alias-set at a time. The RC graphs contain
  78. explicit control flow nodes ("phi" nodes for joins, "sigma" nodes for splits),
  79. as well as nodes for things like decref instructions, incref instructions, and
  80. "req" nodes that indicate that the reference count of an object may be observed
  81. at that point up to some level. Nodes in an RC graph each come with a "lower
  82. bound" on the reference count for the graph's must-alias-set at that program
  83. point (more about lower bounds below)---these lower bounds are the lower bound
  84. before that node in the flowgraph. We build independent graphs for each
  85. must-alias-set, and they do not need to contain specific nodes relating to
  86. possible cross-set effects (based on May-Alias relationships)---that
  87. information is available in these graphs through the "req" nodes and lower
  88. bound information.
  89. The graphs are constructed after first computing information that allows us to
  90. process each must-alias-set independently. Then they are processed one at a
  91. time with a set of "legal transformation rules". The rules are applied in a
  92. single pass over the flowgraph, going forwards, but potentially backtracking
  93. when certain rules apply, since they may enable more rules to apply to previous
  94. nodes. At this point it might help to go look at one or two of the
  95. transformation rule examples below (e.g. rule_inc_dec_fold), but that
  96. documentation is not duplicated here.
  97. The intention is that these rules are smaller and easier to understand the
  98. correctness of than trying to do these transformations without an explicit data
  99. structure, but a disadvantage is that this pass needs to allocate a lot of
  100. temporary information in these graphs. The backtracking also seemed a bit
  101. convoluted to do directly on the IR. We may eventually change this to work
  102. without the extra data structure, but that's how it works right now.
  103. Most of the analysis code in this module is about computing the information we
  104. need to build these flowgraphs, before we do the actual optimizations on them.
  105. The rest of this doc-comment talks primarily about that analysis---see the
  106. comments near the rule_* functions for more about the flowgraph optimizations
  107. themselves, and the comments near the Node structure for a description of the
  108. node types in these graphs.
  109. -- RC "lower bounds" --
  110. A lower bound on the reference count of a must-alias-set indicates a known
  111. minimum for the value of its object's count field at that program point. This
  112. minimum value can be interpreted as a minimum value of the actual integer in
  113. memory at each point, if the program were not modified by this pass. A lower
  114. bound is therefore always non-negative.
  115. The first utility of this information is pretty obvious: if a DecRef
  116. instruction is encountered when the lower bound of must-alias-set is greater
  117. than one, that DecRef instruction can be converted to DecRefNZ, since it can't
  118. possibly free the object. (See the flowgraph rule_decnz.) Knowledge of the
  119. lower bound is also required for folding unobservable incref/decref pairs, and
  120. generally this information is inspected by most of the things done as RC
  121. flowgraph transformations.
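
For instance (hypothetical IR; tracked lower bounds shown in comments):

   t1 = ...      // instruction that produces a reference: lb(t1) = 1
   IncRef t1     // lb(t1) = 2
   DecRef t1     // lb is 2 here, so rule_decnz can make this a DecRefNZ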

The lower bound must be tracked conservatively to ensure that our
transformations are correct. This means we can increase a lower bound only
when we see instructions that /must/ imply an increase in the object's count
field, but we must decrease a lower bound whenever we see instructions that
/may/ imply a decrease in the count. It might clarify this a little to list
the reasons that a must-alias-set's lower bounds can be increased:

   o An explicit IncRef instruction in the instruction stream of a tmp in
     the must-alias-set.

   o Instructions that "produce references" (generally instructions that
     allocate new objects).

   o Some situations with loads from memory (described later).

A must-alias-set's lower bound can be decreased in many situations,
including:

   o An explicit DecRef or DecRefNZ of an SSATmp that maps to this
     must-alias-set.

   o In some situations, executing an instruction that could decref pointers
     that live in memory, for example by re-entering and running arbitrary
     php code. (Memory is discussed more shortly; this concept is called
     "memory support".)

   o An SSATmp in this must-alias-set is passed as an argument to an IR
     instruction that may decref it. (This is the "consumes reference"
     IRInstruction flag.)

   o We see an instruction that may reduce the lower bound of a different
     must-alias-set, for any reason, and that different set May-Alias this
     set.

If the last point were the exact rule we used, it would potentially mean lots
of reductions in lower bounds, which could be very pessimistic, so to obviate
the need to do it all the time we introduce an "exclusivity" principle on
tracking lower bounds. What this principle means is the following: if we see
some reason in the IR to increment the lower bound in an alias set A, we can
/only/ increment the lower bound of A, even if that same information could
also be used to increase the lower bound of other asets. If we could
theoretically use the same information to increase the lower bound of a
different set B, we can't do that at the same time---we have to choose one to
apply it to. (This situation comes up with loads and is discussed further in
"About Loads".)

This exclusivity principle provides the following rule for dealing with
instructions that may decrease reference counts because of May-Alias
relationships: when we need to decrease the lower bound of a must-alias-set,
if its lower bound is currently non-zero, we have no obligation to decrement
the lower bound in any other must-alias-set, regardless of May-Alias
relationships. The exclusivity of the lower bound means we know we're just
cancelling out something that raised the lower bound on this set and no
other, so the state on other sets can't be affected.

The pessimistic case still applies, however, if you need to reduce the lower
bound on a must-alias-set S that currently has a lower bound of zero. Then
all the other sets that May-Alias S must have their lower bound reduced as
well.
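
To illustrate the exclusivity rule (hypothetical sets): suppose asets A and B
May-Alias each other, lb(A) is 1 because of an IncRef we credited exclusively
to A, and lb(B) is 0. If we then see an instruction that may decref A, we
reduce lb(A) to 0 and are done---B needs no adjustment, since the IncRef
raised only A's bound. Only if lb(A) had already been 0 would we also have
to reduce the lower bound of every set that May-Alias A.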
  166. -- Memory Support --
  167. This section is going to introduce the "memory support" concept, but details
  168. will be fleshed out further in following sections.
  169. The key idea behind this concept is that we can keep lower bounds higher than
  170. we would otherwise be able to by tracking at least some of the pointers to the
  171. object that may be in memory. An alternative, conservative approach to stores,
  172. for example, might be to eagerly attempt to reduce the lower bound on the must
  173. alias set for value being stored at the location of the store itself. By
  174. instead keeping track of the fact that that memory location may contain a
  175. pointer to that must-alias-set until it may be decref'd later, we can keep the
  176. lower bound higher for longer.
  177. The state of memory support for each must-alias-set is a bitvector of the
  178. memory locations AliasAnalysis has identified in the program. If a bit is set,
  179. it indicates that that memory location may contain a pointer to that must alias
  180. set. When a must-alias-set has any memory support bits set, it is going to be
  181. analyzed more conservatively than if it doesn't. And importantly, the memory
  182. support bits are may-information: just because a bit is set, doesn't mean that
  183. we know for sure that memory location contains a pointer to this object. It
  184. just means that it might, and that our lower bound may have been "kept higher
  185. for longer" using that knowledge at some point.
  186. The primary thing we need to do more conservatively with must-alias-sets that
  187. have memory support is reduce their lower bound if they could be decref'd
  188. through that pointer in memory. Since this effect on the analysis just reduces
  189. lower bounds, it would never be incorrect to leave the memory support bit set
  190. forever in this situation, which is also conceptually necessary for this to
  191. work as may-information.
  192. However, if we see an instruction that could DecRef one of these objects
  193. through a pointer in memory and its lower_bound is currently non-zero, we can
  194. be sure we've accounted for that may-DecRef by balancing it with a IncRef of
  195. some sort that we've already observed. In this situation, we can remove the
  196. memory support bit to avoid futher reductions in the lower bound of that set
  197. via that memory location.
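
A small example of memory support in action (hypothetical IR; "Loc 0" stands
for the AliasAnalysis location of local slot 0, and the ops are illustrative):

   t1 = LdLoc<Obj>(0) fp   // balanced location: lb(t1) = 1, support = {Loc 0}
   IncRef t1               // lb(t1) = 2
   CallBuiltin ...         // may decref through local 0: lb(t1) drops to 1,
                           // and the Loc 0 support bit can be removed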

Since this is may-information that makes analysis more conservative, the
memory support bits should conceptually be or'd at merge points. It is fine
to think of it that way for general understanding of the analysis here, but
in this implementation we don't actually treat it that way when merging.
Because we want to be able to quickly find memory-supported must-alias-sets
from a given memory location when analyzing memory effects of IR instructions
(i.e. without iterating every tracked alias set), we restrict the state to
require that at most one must-alias-set is supported by a given memory
location during the analysis. If we reach situations that would break that
restriction, we must handle it conservatively (using a `pessimized' state,
which is discussed somewhat later, as a last resort). The
merge_memory_support function elaborates on the details of how this is done.

Another thing to note about memory support is that we may have more bits set
than the current lower bound for an object. This situation can arise due to
conservatively reducing the lower bound, or due to pure stores happening
before IncRef instructions that raise the lower bound for that new pointer.

Most of the complexity in this analysis is related to instructions that load
or store from memory, and therefore interacts with memory support. There are
enough details to discuss it further in the next several sections of this
doc.
  217. -- About Loads --
  218. On entry to a region, it is assumed that normal VM reference count invariants
  219. hold for all memory---specificaly, each reference count on each object in the
  220. heap is exactly the number of live pointers to that object. And in general,
  221. accesses to memory must maintain this invariant when they are done outside of
  222. small regions that may temporarily break that invariant. We make use of this
  223. fact to increase object lower bounds.
  224. Accesses to memory within an IR compilation unit may be lowered into
  225. instructions that separate reference count manipulation from stores and loads
  226. (which is necessary for this pass to be able to optimize the former), so we
  227. can't just assume loading a value from somewhere implies that there is a
  228. reference count on the object, since our input program itself may have changed
  229. that. Furthermore, our input program may contain complex instructions other
  230. than lowered stores that can "store over" memory locations we've already loaded
  231. from, with a decref of the old value, and our analysis pass needs to reduce
  232. lower bounds when we see those situations if we were using that memory location
  233. to increase a lower bound on a loaded value.
  234. To accomplish this, first this module performs a forward dataflow analysis to
  235. compute program locations at which each memory location assigned an id by
  236. AliasAnalysis are known to be "balanced" with regard to reference counting.
  237. The gist of this is that if the last thing to manipulate a memory location must
  238. have been code outside of this region, future loads from the memory location
  239. define SSATmps that we know must have a lower bound of 1, corresponding to the
  240. live pointer in that memory location. However, when this analysis observes a
  241. PureStore---i.e. a lowered, within-region store that does not imply reference
  242. counting---a future load does not imply anything about the reference count,
  243. because the program may have potentially written a pointer there that is not
  244. yet "balanced" (we would need to see an IncRef or some other instruction
  245. associated with the stored value to know that it has a reference).
  246. Using the results of this analysis, we can add to the lower bound of some
  247. must-alias-sets when we see loads from locations that are known to be balanced
  248. at that program point. When we do this, we must also track that the object has
  249. a pointer in memory, which could cause a reduction in the lower bound later if
  250. someone could decref it through that pointer, so the location must be added to
  251. the memory support bitvector for that must-alias-set. Whenever we see complex
  252. instructions that may store to memory with the normal VM semantics of decrefing
  253. old values, if they could overwrite locations that are currently "supporting"
  254. one of our alias sets, we need to remove one from the alias set's lower bound
  255. in case it decided to overwrite (and decref) the pointer that was in memory.
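
Concretely (hypothetical IR): if nothing in the region has stored to local 0
yet, it is "balanced" on entry, so

   t1 = LdLoc<Obj>(0) fp   // raises lb(t1) to 1; Loc 0 now supports t1's set

and any later instruction that may store over local 0 with hhbc semantics
must both clear that support bit and remove one from lb(t1), since it may
have decref'd the pointer that was in memory.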

The "exclusivity" guarantee of our lower bounds requires that if we want to
raise the lower bound of an object because it was loaded from a memory
location known to be "balanced", then we only raise the lower bound for this
reason on at most one must-alias-set at a time. This means if we see a load
from a location that is known to contain a balanced pointer, but we were
already using that location as memory support on a different set, we either
need to remove one from the lower bound of the other set before adding one to
the new set, or leave everything alone. This commonly happens across php
calls right now, where values must be reloaded from memory because SSATmps
can't span calls.

The way this pass currently handles this is the following: if we can reduce
the lower bound on the other set (because it has a non-zero lower bound),
we'll take the first choice, since the previously supported aset will
probably not be used again if we're spanning a call. On the other hand, if
the old set has a lower bound of zero, so we can't compensate for removing
it, we leave everything alone.

-- Effects of Pure Stores on Memory Support --

There are two main kinds of stores from the perspective of this module.
There are lowered stores (PureStore and PureSpillFrame) that happen within
our IR compilation unit and don't imply reference count manipulation, and
there are stores that happen with "hhbc semantics" outside of the visibility
of this compilation unit, which imply decreffing the value that used to live
in a memory location as it's replaced with a new one. This module needs some
understanding of both types, and both of these types of stores affect memory
support, but in different ways.

For any instruction that may do non-lowered stores outside of our unit
("stores with hhbc semantics"), if the location(s) it may be storing to could
be supporting the lower bound in any must-alias-set, we should remove the
support and decrement the lower bound, because it could DecRef the value in
order to replace it with something else. If we can't actually reduce the
lower bound (because it's already zero), we must leave the memory support
flag alone, because we haven't really accounted for the reference that lived
in that memory location, and it might not have actually been DecRef'd at that
program point, and could be DecRef'd later after we've seen another IncRef.
If we didn't leave the bit alone in this situation, the lower bound could end
up too high after a later IncRef.

On the other hand, for a PureStore with a known destination, we don't need to
reduce the lower bound of any set that was supported by that location, since
it never implies a DecRef. If the input IR program itself is trying to store
to that location "with hhbc semantics", then the program will also explicitly
contain the other lowered parts of this high level store, including any
appropriate loads and DecRefs of the old value, so we won't miss their
effects. So, for a PureStore we can simply mark the location as no longer
providing memory support on the set it used to, but leave the lower bound
alone.
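
Sketch (hypothetical IR; the location is illustrative): if Loc 0 currently
supports aset A with lb(A) = 1, then

   StLoc<0> fp, t2   // PureStore to a known target

clears Loc 0 from A's memory support but leaves lb(A) at 1, because a pure
store never decrefs the old value itself.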

The final case is a PureStore to an unknown location (either because it was
not supplied an AliasAnalysis id, or because it stored to something like a
PtrToGen that could refer to anything in memory). In this situation, it may
or may not be overwriting a location we had been using for memory
support---however, it's harmless to leave that state alone, with the
following rationale:

If it doesn't actually overwrite it, then obviously things are the same, and
we're good. On the other hand, if it does actually overwrite it, then we
don't need to adjust the lower bound still, because it's a pure store (i.e.
for the same reason we didn't reduce the lower bound in the case above where
we knew where the store was going). If we do nothing to our state, the only
difference from the known location, then, is that we may have "unnecessarily"
left a must-alias-set marked as getting memory support when it doesn't need
to be anymore. But the point of marking part of a lower bound as coming from
memory support is just so that future stores (or loads) can potentially
/reduce/ its lower bound, so at worst it could reduce it later when it
wouldn't really have needed to if we had better information about where the
store was going. In other words, it can be thought of as an optimization to
clear the memory support state when we see a PureStore with a known target
location: it's not required for correctness.

-- Effects of Pure Stores on the Must-Alias-Set Being Stored --

The other thing to take into account with stores is that they put a (possibly
new) pointer in memory, which means it now could be loaded and DecRef'd
later, possibly by code we can't directly see in our compilation unit. To
handle this, we can divide things into four situations, based on two boolean
attributes: whether or not we have an AliasAnalysis bit for the location
being stored to ("known" vs "unknown" location), and whether or not the lower
bound on the must-alias-set for the value being stored is currently above
zero.

The reason the lower bound matters when we see the store is the following:
we've possibly created a pointer in memory, which could be DecRef'd later,
but if the lower bound is zero we don't have a way to account for that, since
it can't go negative. It's not ok to just ignore this. Take the following
example, where t1 has a lower bound of zero:

   StMem ptr, t1
   IncRef t1
   IncRef t1
   RaiseWarning "something" // any instruction that can re-enter and decref t1
   DecRef t1
   ...

If we simply ignored the fact that a new pointer has been created at the
store, that means the lower bound would be two after the two IncRefs, with no
memory support flags. Then when we see the RaiseWarning, we won't know we
need to reduce the lower bound, since we didn't account for the store, and
now we'll think we can change the DecRef to DecRefNZ, but this is not
actually a sound transformation.

If the input program is not malformed, it will in fact be doing a 'balancing'
IncRef for any new pointers it creates, before anything could access it---in
fact it may have done that before the store, but our analysis in general
could've lost that information in the tracked lower bound because of a
May-Alias decref, or because it was done through an SSATmp that is mapped to
a different must-alias-set that actually is the same object (although we
don't know).

With this in mind, we'll discuss all four cases:

  Unknown target, Zero LB:

     We flag all must-alias-sets as "pessimized". This state inserts a Halt
     node in each of the RC flowgraphs, and stops all optimizations along
     that control flow path: it prevents us from doing anything else in any
     successor blocks.

  Known target, Zero LB:

     Unlike the above, this case is not that uncommon. Since we know where
     it is going, we don't have to give up on everything. Instead, we leave
     the lower bound at zero, but set a memory support bit for the new
     location. Recall that we can in general have more memory support
     locations for one aset than the tracked lower bound---this is one
     situation that can cause that.

  Unknown target, Non-Zero LB:

     We don't know where the store is going, but we can account for balancing
     the possibly-new pointer. In this case, we decrement the lower bound
     and just eagerly behave as if the must-alias-set for the stored value
     may be decref'd right there. Since the lower bound is non-zero, we
     don't need to worry about changing lower bounds in other sets that
     May-Alias this one, because of the "exclusivity" rule for lower bounds.

  Known target, Non-Zero LB:

     Since we know where the new pointer will be, similar to the second case,
     we don't need to reduce the lower bound yet---we can wait until we see
     an instruction that might decref our object through that pointer. In
     this case, we can just mark the target location as memory support for
     the must-alias-set for the stored value, and leave its lower bound
     alone.

-- More about Memory --

Another consideration about memory in this module arises from the fact that
our analysis passes make no attempt to track which object pointers may be
escaped. For that matter, much of the optimization we currently do here is
removing redundant reference counting of locals and eval stack slots, which
arises from lowering the HHBC stack machine semantics to HHIR---generally
speaking these values could be accessible through the heap as far as we know.
This is important because it means that we can make no transformations to the
program that would affect the behavior of increfs or decrefs in memory
locations we aren't tracking, on the off chance they happen to contain a
pointer to one of our tracked objects.

The way we maintain correctness here is to never move or eliminate reference
counting operations unless we know about at least /two/ references to the
object being counted. The importance of this is easiest to illustrate with
delayed increfs (relevant to rules inc_pass_req, inc_pass_phi, and
inc_pass_sig), although it applies to inc/dec pair removal also: it is fine
to move an incref forward in the IR instruction stream, as long as nothing
could observe the difference between the reference count the object "should"
have, and the one it will have after we delay the incref. We need to
consider how reachability from the heap can affect this.

If the lower bound at an incref instruction is two or greater, we know we can
push the incref down as much as we want (basically until we reach an exit
from the compilation unit, or until we reach something that may decref the
object and reduce the lower bound). On the other hand, if the lower bound
before the incref is zero, in order to move the incref forward, we would need
to stop at any instruction that could decref /anything/ in any memory
location, since we're making the assumption that there may be other live
pointers to the object---if we were to push that incref forward, we could
change whether other pointers to the object are considered the last
reference, and cause a decref to free the object when it shouldn't. (We
could try to do this on the rc flowgraphs, but at least in a trivial
implementation it would lead to a much larger number of flowgraph nodes, so
instead we leave easy cases to a separate, local, "remove_trivial_incdecs"
pass and ignore hard cases.)
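
As a sketch of the easy case (hypothetical IR), with a lower bound of two
before the incref:

   IncRef t1         // lb(t1) = 2: safe to sink this...
   StLoc<3> fp, t2   // ...past anything that can't observe or decref t1
   // ...IncRef t1 could be placed down here instead

whereas with a lower bound of zero, the incref could not move past anything
that might decref any memory location at all.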

The above two cases are relatively straightforward. The remaining case is
when the lower bound before an incref is one. It turns out to be safe to
sink in this case, and it fits the idea that we "know about two references".
Whatever caused the lower bound to be one before the incref will ensure that
the object's liveness is not affected---here's why:

There are two possibilities: the object is either reachable through at least
one unknown pointer, or it isn't. If it isn't, then the safety of moving the
incref is relatively straightforward: we'll be pushing the actual /second/
reference down, and it is safe to push it as long as we don't move it through
something that may decref it (or until we reach an exit from the compilation
unit). For the other possibility, it is sufficient to consider only having
one unknown pointer: in this situation, we're pushing the actual /third/
reference down, and if anything decrefs the object through the pointer we
don't know about, it will still know not to free it because we left the
second reference alone (whatever was causing our lower bound to be one), and
therefore a decref through this unknown pointer won't think it is removing
the last reference.

Also worth discussing is that there are several runtime objects in the VM
with operations that have behavioral differences based on whether the
reference count is greater than one. For instance, types like KindOfString
and KindOfArray do in-place updates when they have a refcount of one, and
KindOfRef is treated "observably" as a php reference only if the refcount is
greater than one. Making sure we don't change these situations is actually
the same condition as discussed above: by the above scheme for not changing
whether pointers we don't know about constitute the last counted reference to
an object, we are both preventing decrefs from going to zero when they
shouldn't, and preventing modifications to objects from failing to COW when
they should.

A fundamental meta-rule that arises out of all the above considerations for
any of the RC flowgraph transformation rules is that we cannot move (or
remove) increfs unless the lower bound on the incref node is at least one
(meaning after the incref we "know about two references"). Similarly,
anything that could reduce the lower bound must put a node in the RC
flowgraph to update that information (a Req{1} node usually) so we don't push
increfs too far or remove them when we shouldn't.

-- "Trivial" incdec removal pass --

This module also contains a local optimization that removes IncRef/DecRefNZ
pairs in a block that have no non-"pure" memory-accessing instructions in
between them.
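
For example (hypothetical IR):

   IncRef t1
   t2 = AddInt t3, t4   // pure: no memory access, can't observe the count
   DecRefNZ t1          // pairs with the IncRef above: both can be removed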

This optimization can be performed without regard to the lower bound of any
objects involved, and the DecRef -> DecRefNZ transformations the rest of the
code makes can create situations where these opportunities are visible. Some
of these situations would be removable by the main pass if we had a more
complicated scheme for dealing with "unknown heap pointers" (i.e. the stuff
in the "more about memory" section described above). But other situations
may also occur because the main pass may create unnecessary Req nodes in the
middle of code sequences that don't really observe references when we're
dealing with unrelated PureStores of possibly-aliasing tmps that have lower
bounds of zero.

In general it is a simple pass to reason about the correctness of, and it
cleans up some things we can miss, so it is easier to do some of the work
this way than to complicate the main pass further.

*/

//////////////////////////////////////////////////////////////////////

#include "hphp/runtime/vm/jit/opt.h"

#include <algorithm>
#include <cstdio>
#include <string>
#include <limits>
#include <sstream>
#include <array>
#include <tuple>

#include <folly/Format.h>
#include <folly/ScopeGuard.h>
#include <folly/Conv.h>

#include <boost/dynamic_bitset.hpp>

#include "hphp/util/safe-cast.h"
#include "hphp/util/dataflow-worklist.h"
#include "hphp/util/match.h"
#include "hphp/util/trace.h"

#include "hphp/runtime/vm/jit/ir-unit.h"
#include "hphp/runtime/vm/jit/pass-tracer.h"
#include "hphp/runtime/vm/jit/cfg.h"
#include "hphp/runtime/vm/jit/state-vector.h"
#include "hphp/runtime/vm/jit/ir-instruction.h"
#include "hphp/runtime/vm/jit/analysis.h"
#include "hphp/runtime/vm/jit/block.h"
#include "hphp/runtime/vm/jit/containers.h"
#include "hphp/runtime/vm/jit/memory-effects.h"
#include "hphp/runtime/vm/jit/alias-analysis.h"
#include "hphp/runtime/vm/jit/mutation.h"
#include "hphp/runtime/vm/jit/timer.h"

namespace HPHP { namespace jit {

namespace {

TRACE_SET_MOD(hhir_refcount);

//////////////////////////////////////////////////////////////////////

/*
 * Id's of must-alias-sets.  We use -1 as an invalid id.
 */
using ASetID = int32_t;

struct MustAliasSet {
  explicit MustAliasSet(Type widestType, SSATmp* representative)
    : widestType(widestType)
    , representative(representative)
  {}

  /*
   * Widest type for this MustAliasSet, used for computing may-alias
   * information.
   *
   * Because of how we build MustAliasSets (essentially canonical(), or
   * groups of LdCtx instructions), it is guaranteed that this widestType
   * includes all possible values for the set.  However it is not the case
   * that every tmp in the set necessarily has a subtype of widestType,
   * because of situations that can occur with AssertType and interface
   * types.  This does not affect correctness, but it's worth being aware
   * of.
   */
  Type widestType;

  /*
   * A representative of the set.  This is only used for debug tracing, and
   * is currently the first instruction (in an rpo traversal) that defined a
   * tmp in the must-alias-set.  (I.e. it'll be the canonical() tmp, or the
   * first LdCtx we saw.)
   */
  SSATmp* representative;

  /*
   * Set of ids of the other MustAliasSets that this set may alias, in a
   * flow insensitive way, and not including this set itself.  This is based
   * only on the type of the representative.  See the comments at the top of
   * this file.
   */
  jit::flat_set<ASetID> may_alias;
};

//////////////////////////////////////////////////////////////////////

// Analysis results for memory locations known to contain balanced reference
// counts.  See populate_mrinfo.
struct MemRefAnalysis {
  struct BlockInfo {
    uint32_t rpoId;
    ALocBits avail_in;
    ALocBits avail_out;
    ALocBits kill;
    ALocBits gen;
  };

  explicit MemRefAnalysis(IRUnit& unit) : info(unit, BlockInfo{}) {}

  StateVector<Block,BlockInfo> info;
};

//////////////////////////////////////////////////////////////////////

// Per must-alias-set state information for rc_analyze.
struct ASetInfo {
  /*
   * A lower bound of the actual reference count of the object that this
   * alias set refers to.  See "RC lower bounds" in the documentation---there
   * are some subtleties here.
   */
  int32_t lower_bound{0};

  /*
   * Set of memory location ids that are being used to support the lower
   * bound of this object.  The purpose of this set is to reduce lower bounds
   * when we see memory events that might decref a pointer: this means it's
   * never incorrect to leave a bit set in memory_support conservatively, but
   * there are situations where we must set bits here or our analysis will be
   * wrong.
   *
   * An important note is that the bits in memory_support can represent
   * memory locations that possibly alias (via ALocMeta::conflicts).  Setting
   * only one bit from the conflict set is sufficient when we know something
   * must be in memory in the set---any memory effects that can affect other
   * may-aliasing locations will still apply to all of them as needed.
   *
   * However, whenever we handle removing memory support, if you need to
   * remove one bit, you generally speaking are going to need to remove the
   * support for the whole conflict set.
   */
  ALocBits memory_support;

  /*
   * Sometimes we lose too much track of what's going on to do anything
   * useful.  In this situation, all the sets get flagged as `pessimized', we
   * don't do anything to them anymore, and a Halt node is added to all
   * graphs.
   *
   * Note: right now this state is per-ASetInfo, but we must pessimize
   * everything at once if we pessimize anything, because of how the analyzer
   * will lose track of aliasing effects.  (We will probably either change it
   * to be per-RCState later or fix the alias handling.)
   */
  bool pessimized{false};
};

// State structure for rc_analyze.
struct RCState {
  bool initialized{false};
  jit::vector<ASetInfo> asets;

  /*
   * MemRefAnalysis availability state.  This is just part of this struct for
   * convenience when stepping through RCAnalysis results.  It is used to
   * know when loads can provide memory support.
   */
  ALocBits avail;

  /*
   * Map from AliasClass ids to the must-alias-set that has it as
   * memory_support, if any do.  At most one ASet will be supported by any
   * location at a time, to fit the "exclusivity" condition on lower bounds.
   * The mapped value is -1 if no ASet is currently supported by that
   * location.
   */
  std::array<ASetID,kMaxTrackedALocs> support_map;
};

// The analysis result structure for rc_analyze.  This structure gets fed
// into build_graphs to create our RC graphs.
struct RCAnalysis {
  struct BlockInfo {
    uint32_t rpoId;
    RCState state_in;
  };

  explicit RCAnalysis(IRUnit& unit) : info(unit, BlockInfo{}) {}

  StateVector<Block,BlockInfo> info;
};

//////////////////////////////////////////////////////////////////////

struct Env {
  explicit Env(IRUnit& unit)
    : unit(unit)
    , rpoBlocks(rpoSortCfg(unit))
    , idoms(findDominators(unit, rpoBlocks, numberBlocks(unit, rpoBlocks)))
    , ainfo(collect_aliases(unit, rpoBlocks))
    , mrinfo(unit)
    , asetMap(unit, -1)
  {}

  IRUnit& unit;
  BlockList rpoBlocks;
  IdomVector idoms;
  Arena arena;
  AliasAnalysis ainfo;
  MemRefAnalysis mrinfo;
  StateVector<SSATmp,ASetID> asetMap;  // -1 is invalid (not-Counted tmps)
  jit::vector<MustAliasSet> asets;
};

//////////////////////////////////////////////////////////////////////

/*
 * Nodes in the RC flowgraphs.
 */
enum class NT : uint8_t { Inc, Dec, Req, Phi, Sig, Halt, Empty };

struct Node {
  Node* next{nullptr};
  Node* prev{nullptr};  // unused for Phi nodes, as they may have > 1 preds
  int32_t lower_bound{0};
  NT type;

  // Counter used by the optimize pass to wait to visit Phis until after
  // non-backedge predecessors.
  int16_t visit_counter{0};

protected:
  explicit Node(NT type) : type(type) {}
  Node(const Node&) = default;
  Node& operator=(const Node&) = default;
};

/*
 * IncRef and DecRef{NZ,} nodes.
 */
struct NInc : Node {
  explicit NInc(IRInstruction* inst) : Node(NT::Inc), inst(inst) {}
  IRInstruction* inst;
};
struct NDec : Node {
  explicit NDec(IRInstruction* inst) : Node(NT::Dec), inst(inst) {}
  IRInstruction* inst;
};

/*
 * Control flow splits and joins.
 */
struct NPhi : Node {
  explicit NPhi(Block* block) : Node(NT::Phi), block(block) {}
  Block* block;
  Node** pred_list{nullptr};
  uint32_t pred_list_cap{0};
  uint32_t pred_list_sz{0};
  uint32_t back_edge_preds{0};
};
struct NSig : Node {
  explicit NSig(Block* block) : Node(NT::Sig), block(block) {}
  Block* block;
  Node* taken{nullptr};
};

/*
 * Halt means to stop processing along this control flow path---something
 * during analysis had to pessimize and we can't continue.
 *
 * When we've pessimized a set, we also guarantee that all successors have a
 * lower_bound of zero, which will block all rcfg transformation rules from
 * applying, so it's actually not necessary to halt---it just prevents
 * processing parts of the graph unnecessarily.
 *
 * For the case of join points which were halted on one side, optimize_graph
 * will not process through the join because the visit_counter will never be
 * high enough.  In the case of back edges, it may process through the loop
 * unnecessarily, but it won't make any illegal transformations because the
 * lower_bound will be zero.
 */
struct NHalt : Node { explicit NHalt() : Node(NT::Halt) {} };

/*
 * Empty nodes are useful for building graphs, since not every node type can
 * have control flow edges, but they have no meaning later.
 */
struct NEmpty : Node { explicit NEmpty() : Node(NT::Empty) {} };

/*
 * Req nodes mean the reference count of the object may be observed, up to
 * some "level".  The level is a number we have to keep the lower_bound above
 * to avoid changing program behavior.  It will be INT32_MAX on exits from
 * the compilation unit.
 */
struct NReq : Node {
  explicit NReq(int32_t level) : Node(NT::Req), level(level) {}
  int32_t level;
};

// Checked downcasts from Node* to the concrete node types.
#define X(Kind, kind)                               \
  UNUSED N##Kind* to_##kind(Node* n) {              \
    assertx(n->type == NT::Kind);                   \
    return static_cast<N##Kind*>(n);                \
  }                                                 \
  UNUSED const N##Kind* to_##kind(const Node* n) {  \
    return to_##kind(const_cast<Node*>(n));         \
  }

X(Inc, inc)
X(Dec, dec)
X(Req, req)
X(Phi, phi)
X(Sig, sig)
X(Halt, halt)
X(Empty, empty)

#undef X

//////////////////////////////////////////////////////////////////////

template<class Kill, class Gen>
void mrinfo_step_impl(Env& env,
                      const IRInstruction& inst,
                      Kill kill,
                      Gen gen) {
  auto do_store = [&] (AliasClass dst, SSATmp* value) {
    /*
     * Pure stores potentially (temporarily) break the heap's reference
     * count invariants on a memory location, but only if the value being
     * stored is possibly counted.
     */
    if (value->type().maybe(TCounted)) {
      kill(env.ainfo.may_alias(canonicalize(dst)));
    }
  };

  auto const effects = memory_effects(inst);
  match<void>(
    effects,
    [&] (IrrelevantEffects) {},
    [&] (ExitEffects) {},
    [&] (ReturnEffects) {},
    [&] (GeneralEffects) {},
    [&] (UnknownEffects) { kill(ALocBits{}.set()); },
    [&] (PureStore x) { do_store(x.dst, x.value); },

    /*
     * Note that loads do not kill a location.  In fact, it's possible that
     * the IR program itself could cause a location to not be `balanced'
     * using only PureLoads.  (For example, it could load a local to decref
     * it as part of a return sequence.)
     *
     * It's safe not to add it to the kill set, though, because if the IR
     * program is destroying a memory location, it is already malformed if
     * it loads the location again and then uses it in a way that relies on
     * the pointer still being dereferenceable.  Moreover, in these
     * situations, even though the avail bit from mrinfo will be set on the
     * second load, we won't be able to remove support from the previous
     * aset, and won't raise the lower bound on the new loaded value.
     */
    [&] (PureLoad) {},

    /*
     * Since there's no semantically correct way to do PureLoads from the
     * locations in a PureSpillFrame unless something must have stored over
     * them again first, we don't need to kill anything here.
     */
    [&] (PureSpillFrame x) {},

    [&] (CallEffects x) {
      /*
       * Because PHP callees can side-exit (or for that matter throw from
       * their prologue), the program is ill-formed unless we have balanced
       * reference counting for all memory locations.  Even if the call has
       * the destroys_locals flag this is the case---after it destroys the
       * locals the new value will have a fully synchronized reference
       * count.
       *
       * This may need modifications after we allow php values to span
       * calls in SSA registers.
       */
      gen(ALocBits{}.set());
    }
  );
}

// Helper for stepping after we've created a MemRefAnalysis.
void mrinfo_step(Env& env, const IRInstruction& inst, ALocBits& avail) {
  mrinfo_step_impl(
    env,
    inst,
    [&] (ALocBits kill) { avail &= ~kill; },
    [&] (ALocBits gen)  { avail |= gen; }
  );
}

/*
 * Perform an analysis to determine memory locations that are known to hold
 * "balanced" values with respect to reference counting.  This means the
 * location "owns" a reference in the normal sense---i.e. the count on the
 * object is at least one on account of the pointer in that location.
 *
 * Normal ("hhbc-semantics") operations on php values in memory all preserve
 * balanced reference counts (i.e. a pointer in memory corresponds to one
 * value in the count field of the pointee).  However, when we lower hhbc
 * opcodes to HHIR, some opcodes split up the reference counting operations
 * from the memory operations: when we observe a "pure store" instruction,
 * therefore, the location involved may no longer be "balanced" with regard
 * to reference counting.  See further discussion in the doc comment at the
 * top of this file.
 */
void populate_mrinfo(Env& env) {
  FTRACE(1, "populate_mrinfo ---------------------------------------\n");
  FTRACE(3, "locations:\n{}\n", show(env.ainfo));

  /*
   * 1. Compute block summaries.
   */
  for (auto& blk : env.rpoBlocks) {
    for (auto& inst : blk->instrs()) {
      mrinfo_step_impl(
        env,
        inst,
        [&] (ALocBits kill) {
          env.mrinfo.info[blk].kill |= kill;
          env.mrinfo.info[blk].gen &= ~kill;
        },
        [&] (ALocBits gen) {
          env.mrinfo.info[blk].gen |= gen;
          env.mrinfo.info[blk].kill &= ~gen;
        }
      );
    }
  }

  FTRACE(3, "summaries:\n{}\n",
    [&] () -> std::string {
      auto ret = std::string{};
      for (auto& blk : env.rpoBlocks) {
        folly::format(&ret, "  B{: <3}: {}\n"
                            "      : {}\n",
          blk->id(),
          show(env.mrinfo.info[blk].kill),
          show(env.mrinfo.info[blk].gen)
        );
      }
      return ret;
    }()
  );

  /*
   * 2. Find the fixed point of avail_in:
   *
   *   avail_out = avail_in - kill + gen
   *   avail_in  = isect(pred) avail_out
   *
   * Locations that are marked "avail" mean they imply a non-zero lower
   * bound on the object they point to, if they contain a reference counted
   * type, and assuming they are actually legal to load from.
   */
  auto incompleteQ = dataflow_worklist<uint32_t>(env.rpoBlocks.size());
  for (auto rpoId = uint32_t{0}; rpoId < env.rpoBlocks.size(); ++rpoId) {
    env.mrinfo.info[env.rpoBlocks[rpoId]].rpoId = rpoId;
  }

  // avail_outs all default construct to zeros.
  // avail_in on the entry block (with no preds) will be set to all 1 below.
  incompleteQ.push(0);
  do {
    auto const blk = env.rpoBlocks[incompleteQ.pop()];
    auto& binfo = env.mrinfo.info[blk];

    binfo.avail_in.set();
    blk->forEachPred([&] (Block* pred) {
      binfo.avail_in &= env.mrinfo.info[pred].avail_out;
    });

    auto const old = binfo.avail_out;
    binfo.avail_out = (binfo.avail_in & ~binfo.kill) | binfo.gen;
    if (binfo.avail_out != old) {
      if (auto const t = blk->taken()) {
        incompleteQ.push(env.mrinfo.info[t].rpoId);
      }
      if (auto const n = blk->next()) {
        incompleteQ.push(env.mrinfo.info[n].rpoId);
      }
    }
  } while (!incompleteQ.empty());

  FTRACE(4, "fixed point:\n{}\n",
    [&] () -> std::string {
      auto ret = std::string{};
      for (auto& blk : env.rpoBlocks) {
        folly::format(&ret, "  B{: <3}: {}\n",
          blk->id(),
          show(env.mrinfo.info[blk].avail_in)
        );
      }
      return ret;
    }()
  );
}

//////////////////////////////////////////////////////////////////////

using HPHP::jit::show;

DEBUG_ONLY std::string show(const boost::dynamic_bitset<>& bs) {
  std::ostringstream out;
  out << bs;
  return out.str();
}

/*
 * This helper for weaken_decrefs reports uses of reference-counted values
 * that imply that their reference count cannot be zero (or it would be a
 * bug).  This includes any use of an SSATmp that implies the pointer isn't
 * already freed.
 *
 * For now, it's limited to the reference counting operations on these
 * values, because other types of uses will need to be evaluated on a
 * per-instruction basis: we can't just check instruction srcs blindly to
 * find these types of uses, because in general a use of an SSATmp with a
 * reference-counted pointer type (like Obj, Arr, etc), implies only a use
 * of the SSA-defined pointer value (i.e. the pointer bits sitting in a
 * virtual SSA register), not necessarily of the value pointed to, which is
 * what we care about here and isn't represented in SSA.
 */
template<class Gen>
void weaken_decref_step(const Env& env, const IRInstruction& inst, Gen gen) {
  switch (inst.op()) {
  case DecRef:
  case DecRefNZ:
  case IncRef:
    {
      auto const asetID = env.asetMap[inst.src(0)];
      if (asetID != -1) gen(asetID);
    }
    break;
  default:
    break;
  }
}

/*
 * Backward pass that weakens DecRefs to DecRefNZ if they cannot go to zero
 * based on future use of the value that they are DecRefing.  See "Weakening
 * DecRefs" in the doc comment at the top of this file.
 */
void weaken_decrefs(Env& env) {
  FTRACE(2, "weaken_decrefs ----------------------------------------\n");
  auto const poBlocks = [&] {
    auto ret = env.rpoBlocks;
    std::reverse(begin(ret), end(ret));
    return ret;
  }();

  /*
   * 0. Initialize block state structures and put all blocks in the
   *    worklist.
   */
  auto incompleteQ = dataflow_worklist<uint32_t>(poBlocks.size());
  struct BlockInfo {
    BlockInfo() {}
    uint32_t poId;
    boost::dynamic_bitset<> in_used;
    boost::dynamic_bitset<> out_used;
    boost::dynamic_bitset<> gen;
  };
  StateVector<Block,BlockInfo> blockInfos(env.unit, BlockInfo{});
  for (auto poId = uint32_t{0}; poId < poBlocks.size(); ++poId) {
    auto const blk = poBlocks[poId];
    blockInfos[bl