/hphp/runtime/vm/jit/refcount-opts.cpp
/*
   +----------------------------------------------------------------------+
   | HipHop for PHP                                                       |
   +----------------------------------------------------------------------+
   | Copyright (c) 2010-2014 Facebook, Inc. (http://www.facebook.com)     |
   +----------------------------------------------------------------------+
   | This source file is subject to version 3.01 of the PHP license,      |
   | that is bundled with this package in the file LICENSE, and is        |
   | available through the world-wide-web at the following url:           |
   | http://www.php.net/license/3_01.txt                                  |
   | If you did not receive a copy of the PHP license and are unable to   |
   | obtain it through the world-wide-web, please send a note to          |
   | license@php.net so we can mail you a copy immediately.               |
   +----------------------------------------------------------------------+
*/

//////////////////////////////////////////////////////////////////////
/*

Welcome to refcount-opts. Theoretically reading this block comment first will
make the rest of this file make more sense.

-- Overview --

This file contains passes that attempt to reduce the number and strength of
reference counting operations in an IR program. It uses a few strategies, but
fundamentally most of what's going on is about trying to prove that an IncRef
is post-dominated by a DecRef that provably can't go to zero, with no events in
between that can tell if the IncRef happened, and if so, removing the pair.

This doc comment is going to explain a few concepts, interleaved with
discussion on how they are used by the various analysis and optimization passes
in this module.

-- Must/May Alias Sets --

The passes in this file operate on groups of SSATmp's called "must-alias-set"s
(or often "asets" in the code). These are sets of SSATmp names that are known
to alias the same object, in a "semi"-flow-insensitive way (see below). Every
SSATmp that may have a reference counted type is assigned to a must-alias-set.
Crucially, if two SSATmps belong to different must-alias-sets, they still /may/
point to the same object. For SSATmps a and b, in this module we use
(a&b).maybe(Counted) as a flow-insensitive May-Alias(a,b) predicate: if two
tmps may alias but only in a way that is not reference counted, we don't care
for our purposes.

A subtle point worth considering is that it is possible (and not uncommon) that
some of the SSATmps in one must-alias-set May-Alias some but not all of the
SSATmps in another must-alias-set: the reason is that the must-alias-sets are
still subject to SSA rules about where tmps are defined, and some of the tmps
in a set may be defined by instructions that take conditional jumps if the
object doesn't satisfy some condition (e.g. CheckType). This is why it may
make sense to think of the must-alias-sets as "semi"-flow-insensitive: it's
globally true that the names all refer to the same object, but the names
involved aren't globally defined.

The first thing this module does is run a function to map SSATmps to their
must-alias-sets, and then, for each must-alias-set S, compute which of the
other must-alias-sets contain any tmps that May-Alias any tmp from S.
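The grouping step can be pictured as a toy union-find over integer tmp ids. This is only a sketch of the idea: the id scheme and the mustAlias() entry point are illustrative inventions, while the real pass derives the sets from canonical() and groups of LdCtx instructions.

```cpp
#include <vector>

// Toy must-alias-set grouping: tmps are plain ints here, and callers
// record "b must alias a" facts (e.g. b defined by a CheckType of a).
struct MustAliasSets {
  // parent[i] == i means tmp i is its set's representative.
  std::vector<int> parent;

  explicit MustAliasSets(int nTmps) : parent(nTmps) {
    for (int i = 0; i < nTmps; ++i) parent[i] = i;
  }

  // Find the representative, with path halving for near-constant cost.
  int find(int t) {
    while (parent[t] != t) t = parent[t] = parent[parent[t]];
    return t;
  }

  // Record that a and b are known to refer to the same object.
  void mustAlias(int a, int b) { parent[find(a)] = find(b); }
};
```

After all must-alias facts are unioned in, two tmps are in the same set exactly when their representatives match, which is the property the later May-Alias computation is built on top of.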
-- Weakening DecRefs --

This file contains a relatively cheap pass that can weaken DecRefs into
DecRefNZ by proving that they can't go to zero (unless there is already a bug
in the program).

The way this works is to do a backward dataflow analysis, computing
"will_be_used_again" information. This dataflow analysis has a boolean for
each must-alias-set, indicating whether all paths from a program point contain
a use of that object in a way that implies its reference count is not zero
(for example, if every path decrefs it again). Then, it converts any DecRef
instruction on tmps whose must-alias-sets say they "will_be_used_again" to
DecRefNZ.
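Restricted to a single block, the backward scan can be sketched as follows. The Instr representation, the opcode strings, and the "CallArbitrary" barrier are all illustrative stand-ins for the real IR and dataflow framework; the point is only the direction of the scan and the weakening rule.

```cpp
#include <string>
#include <vector>

// Toy single-block DecRef weakening: scanning backwards, if a later
// decref of the same must-alias-set runs on every path from here, an
// earlier DecRef cannot take the count to zero and may become DecRefNZ.
struct Instr { std::string op; int aset; };

void weakenDecRefs(std::vector<Instr>& block, int nSets) {
  std::vector<bool> willBeUsedAgain(nSets, false);
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    if (it->op == "DecRef" || it->op == "DecRefNZ") {
      if (it->op == "DecRef" && willBeUsedAgain[it->aset]) {
        it->op = "DecRefNZ";  // a later use proves the count stays nonzero
      }
      // Either way, this decref implies the count was nonzero before it.
      willBeUsedAgain[it->aset] = true;
    } else if (it->op == "CallArbitrary") {
      // Stand-in for anything that may consume references: reset all facts.
      willBeUsedAgain.assign(nSets, false);
    }
  }
}
```

The real pass runs this as a fixed-point dataflow over the CFG, intersecting the per-set booleans at block boundaries so the "all paths" condition holds across control flow.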
One rule this pass relies on is that it is illegal to DecRef an object in a way
that takes its refcount to zero, and then IncRef it again after that. This is
not illegal for trivial reasons, because object __destruct methods can
resurrect an object in PHP. But within one JIT region, right now we declare it
illegal to generate IR that uses an object after a DecRef that might take it to
zero.

Since this pass converts instructions that may (in general) re-enter the
VM---running arbitrary PHP code for a destructor---it's potentially profitable
to run it earlier than other parts of refcount opts. For example, it can allow
heap memory accesses to be proven redundant that otherwise would not be, and
can prevent the rest of this analysis from assuming some DecRefs can re-enter
that actually can't.
-- RC Flowgraphs --

Other optimizations in this file are performed on "RC flowgraphs", which are an
abstract representation of only the effects of the IR program that matter for
the optimization, on a single must-alias-set at a time. The RC graphs contain
explicit control flow nodes ("phi" nodes for joins, "sigma" nodes for splits),
as well as nodes for things like decref instructions, incref instructions, and
"req" nodes that indicate that the reference count of an object may be observed
at that point up to some level. Nodes in an RC graph each come with a "lower
bound" on the reference count for the graph's must-alias-set at that program
point (more about lower bounds below)---these lower bounds are the lower bound
before that node in the flowgraph. We build independent graphs for each
must-alias-set, and they do not need to contain specific nodes relating to
possible cross-set effects (based on May-Alias relationships)---that
information is available in these graphs through the "req" nodes and lower
bound information.

The graphs are constructed after first computing information that allows us to
process each must-alias-set independently. Then they are processed one at a
time with a set of "legal transformation rules". The rules are applied in a
single pass over the flowgraph, going forwards, but potentially backtracking
when certain rules apply, since they may enable more rules to apply to previous
nodes. At this point it might help to go look at one or two of the
transformation rule examples below (e.g. rule_inc_dec_fold), but that
documentation is not duplicated here.

The intention is that these rules are smaller and easier to understand the
correctness of than trying to do these transformations without an explicit data
structure, but a disadvantage is that this pass needs to allocate a lot of
temporary information in these graphs. The backtracking also seemed a bit
convoluted to do directly on the IR. We may eventually change this to work
without the extra data structure, but that's how it works right now.

Most of the analysis code in this module is about computing the information we
need to build these flowgraphs, before we do the actual optimizations on them.
The rest of this doc-comment talks primarily about that analysis---see the
comments near the rule_* functions for more about the flowgraph optimizations
themselves, and the comments near the Node structure for a description of the
node types in these graphs.
-- RC "lower bounds" --

A lower bound on the reference count of a must-alias-set indicates a known
minimum for the value of its object's count field at that program point. This
minimum value can be interpreted as a minimum value of the actual integer in
memory at each point, if the program were not modified by this pass. A lower
bound is therefore always non-negative.

The first utility of this information is pretty obvious: if a DecRef
instruction is encountered when the lower bound of a must-alias-set is greater
than one, that DecRef instruction can be converted to DecRefNZ, since it can't
possibly free the object. (See the flowgraph rule_decnz.) Knowledge of the
lower bound is also required for folding unobservable incref/decref pairs, and
generally this information is inspected by most of the things done as RC
flowgraph transformations.

The lower bound must be tracked conservatively to ensure that our
transformations are correct. This means we can increase a lower bound only
when we see instructions that /must/ imply an increase in the object's count
field, but we must decrease a lower bound whenever we see instructions that
/may/ imply a decrease in the count. It might clarify this a little to list
the reasons that a must-alias-set's lower bounds can be increased:

  o An explicit IncRef instruction in the instruction stream of a tmp in the
    must-alias-set.

  o Instructions that "produce references" (generally instructions that
    allocate new objects).

  o Some situations with loads from memory (described later).

A must-alias-set's lower bound can be decreased in many situations, including:

  o An explicit DecRef or DecRefNZ of an SSATmp that maps to this
    must-alias-set.

  o In some situations, executing an instruction that could decref pointers
    that live in memory, for example by re-entering and running arbitrary php
    code. (Memory is discussed more shortly; this concept is called "memory
    support".)

  o An SSATmp in this must-alias-set is passed as an argument to an IR
    instruction that may decref it. (This is the "consumes reference"
    IRInstruction flag.)

  o We see an instruction that may reduce the lower bound of a different
    must-alias-set, for any reason, and that different set May-Alias this set.
If the last point were the exact rule we used, it would potentially mean lots
of reductions in lower bounds, which could be very pessimistic, so to obviate
the need to do it all the time we introduce an "exclusivity" principle on
tracking lower bounds. What this principle means is the following: if we see
some reason in the IR to increment the lower bound in an alias set A, we can
/only/ increment the lower bound of A, even if that same information could also
be used to increase the lower bound of other asets. If we could theoretically
use the same information to increase the lower bound of a different set B, we
can't do that at the same time---we have to choose one to apply it to. (This
situation comes up with loads and is discussed further in "About Loads".)

This exclusivity principle provides the following rule for dealing with
instructions that may decrease reference counts because of May-Alias
relationships: when we need to decrease the lower bound of a must-alias-set, if
its lower bound is currently non-zero, we have no obligation to decrement the
lower bound in any other must-alias-set, regardless of May-Alias relationships.
The exclusivity of the lower bound means we know we're just cancelling out
something that raised the lower bound on this set and no other, so the state on
other sets can't be affected.

The pessimistic case still applies, however, if you need to reduce the lower
bound on a must-alias-set S that currently has a lower bound of zero. Then all
the other sets that May-Alias S must have their lower bound reduced as well.
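The reduction rule with the exclusivity principle can be sketched in a few lines. The mayAlias adjacency lists and per-set integer bounds are illustrative stand-ins for the pass's real state:

```cpp
#include <vector>

// Sketch of reducing a lower bound under the "exclusivity" principle.
// If the target set's bound is nonzero, only it is reduced (we are just
// cancelling an increase that applied exclusively to it). If it is
// already zero, the decref may really belong to any aliasing set, so
// every May-Alias set must be reduced as well.
void reduceLowerBound(std::vector<int>& lowerBound,
                      const std::vector<std::vector<int>>& mayAlias,
                      int aset) {
  if (lowerBound[aset] > 0) {
    --lowerBound[aset];
    return;
  }
  // Pessimistic case: lower the bound of everything that May-Alias aset.
  for (int other : mayAlias[aset]) {
    if (lowerBound[other] > 0) --lowerBound[other];
  }
}
```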
-- Memory Support --

This section is going to introduce the "memory support" concept, but details
will be fleshed out further in following sections.

The key idea behind this concept is that we can keep lower bounds higher than
we would otherwise be able to by tracking at least some of the pointers to the
object that may be in memory. An alternative, conservative approach to stores,
for example, might be to eagerly attempt to reduce the lower bound on the must
alias set for the value being stored at the location of the store itself. By
instead keeping track of the fact that that memory location may contain a
pointer to that must-alias-set until it may be decref'd later, we can keep the
lower bound higher for longer.

The state of memory support for each must-alias-set is a bitvector of the
memory locations AliasAnalysis has identified in the program. If a bit is set,
it indicates that that memory location may contain a pointer to that must alias
set. When a must-alias-set has any memory support bits set, it is going to be
analyzed more conservatively than if it doesn't. And importantly, the memory
support bits are may-information: just because a bit is set doesn't mean that
we know for sure that memory location contains a pointer to this object. It
just means that it might, and that our lower bound may have been "kept higher
for longer" using that knowledge at some point.

The primary thing we need to do more conservatively with must-alias-sets that
have memory support is reduce their lower bound if they could be decref'd
through that pointer in memory. Since this effect on the analysis just reduces
lower bounds, it would never be incorrect to leave the memory support bit set
forever in this situation, which is also conceptually necessary for this to
work as may-information.

However, if we see an instruction that could DecRef one of these objects
through a pointer in memory and its lower bound is currently non-zero, we can
be sure we've accounted for that may-DecRef by balancing it with an IncRef of
some sort that we've already observed. In this situation, we can remove the
memory support bit to avoid further reductions in the lower bound of that set
via that memory location.

Since this is may-information that makes analysis more conservative, the memory
support bits should conceptually be or'd at merge points. It is fine to think
of it that way for general understanding of the analysis here, but in this
implementation we don't actually treat it that way when merging. Because we
want to be able to quickly find memory-supported must-alias-sets from a given
memory location when analyzing memory effects of IR instructions (i.e. without
iterating every tracked alias set), we restrict the state to require that at
most one must-alias-set is supported by a given memory location during the
analysis. If we reach situations that would break that restriction, we must
handle it conservatively (using a `pessimized' state, which is discussed
later, as a last resort). The merge_memory_support function elaborates on the
details of how this is done.
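A toy version of such a merge might look like the following. The location-to-set map, the pessimized set, and the conflict policy are all illustrative assumptions; the real merge_memory_support works on bitvectors and the pass's own state, and its exact conflict handling is more involved:

```cpp
#include <map>
#include <set>

// Toy merge of memory-support maps (location -> supported aset id),
// keeping the invariant that each location supports at most one
// must-alias-set. A location supported on only one side is kept, since
// the bits are may-information. When the two sides disagree about a
// location, we drop it and flag both claimants as `pessimized'.
std::map<int, int> mergeMemorySupport(const std::map<int, int>& a,
                                      const std::map<int, int>& b,
                                      std::set<int>& pessimized) {
  std::map<int, int> out = a;
  for (auto& kv : b) {
    auto it = out.find(kv.first);
    if (it == out.end()) {
      out.insert(kv);                 // supported on one side only: keep
    } else if (it->second != kv.second) {
      pessimized.insert(kv.second);   // two sets claim one location
      pessimized.insert(it->second);
      out.erase(it);
    }
  }
  return out;
}
```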
Another thing to note about memory support is that we may have more bits set
than the current lower bound for an object. This situation can arise due to
conservatively reducing the lower bound, or due to pure stores happening before
IncRef instructions that raise the lower bound for that new pointer.

Most of the complexity in this analysis is related to instructions that load or
store from memory, and therefore interacts with memory support. There are
enough details to discuss it further in the next several sections of this doc.
-- About Loads --

On entry to a region, it is assumed that normal VM reference count invariants
hold for all memory---specifically, each reference count on each object in the
heap is exactly the number of live pointers to that object. And in general,
accesses to memory must maintain this invariant when they are done outside of
small regions that may temporarily break that invariant. We make use of this
fact to increase object lower bounds.

Accesses to memory within an IR compilation unit may be lowered into
instructions that separate reference count manipulation from stores and loads
(which is necessary for this pass to be able to optimize the former), so we
can't just assume loading a value from somewhere implies that there is a
reference count on the object, since our input program itself may have changed
that. Furthermore, our input program may contain complex instructions other
than lowered stores that can "store over" memory locations we've already loaded
from, with a decref of the old value, and our analysis pass needs to reduce
lower bounds when we see those situations if we were using that memory location
to increase a lower bound on a loaded value.

To accomplish this, first this module performs a forward dataflow analysis to
compute program locations at which each memory location assigned an id by
AliasAnalysis is known to be "balanced" with regard to reference counting.
The gist of this is that if the last thing to manipulate a memory location must
have been code outside of this region, future loads from the memory location
define SSATmps that we know must have a lower bound of 1, corresponding to the
live pointer in that memory location. However, when this analysis observes a
PureStore---i.e. a lowered, within-region store that does not imply reference
counting---a future load does not imply anything about the reference count,
because the program may have potentially written a pointer there that is not
yet "balanced" (we would need to see an IncRef or some other instruction
associated with the stored value to know that it has a reference).
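Restricted to a straight-line sequence of memory operations, the "balanced" idea reduces to a simple forward scan. The MemOp representation is an illustrative stand-in for the real IR and the fixed-point dataflow over the CFG:

```cpp
#include <vector>

// Toy forward scan for "balanced" locations: every location starts
// balanced (VM invariants hold on region entry), and a PureStore makes
// its target unbalanced. A load from a balanced location lets us claim
// a lower bound of 1 for the loaded value; otherwise we claim nothing.
struct MemOp { enum Kind { PureStore, Load } kind; int loc; };

std::vector<int> loadLowerBounds(const std::vector<MemOp>& ops, int nLocs) {
  std::vector<bool> balanced(nLocs, true);
  std::vector<int> bounds;   // one entry per Load, in order
  for (auto& op : ops) {
    if (op.kind == MemOp::PureStore) {
      balanced[op.loc] = false;  // stored pointer not yet accounted for
    } else {
      bounds.push_back(balanced[op.loc] ? 1 : 0);
    }
  }
  return bounds;
}
```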
Using the results of this analysis, we can add to the lower bound of some
must-alias-sets when we see loads from locations that are known to be balanced
at that program point. When we do this, we must also track that the object has
a pointer in memory, which could cause a reduction in the lower bound later if
someone could decref it through that pointer, so the location must be added to
the memory support bitvector for that must-alias-set. Whenever we see complex
instructions that may store to memory with the normal VM semantics of decrefing
old values, if they could overwrite locations that are currently "supporting"
one of our alias sets, we need to remove one from the alias set's lower bound
in case it decided to overwrite (and decref) the pointer that was in memory.

The "exclusivity" guarantee of our lower bounds requires that if we want to
raise the lower bound of an object because it was loaded from a memory location
known to be "balanced", then we only raise the lower bound for this reason on
at most one must-alias-set at a time. This means if we see a load from a
location that is known to contain a balanced pointer, but we were already using
that location as memory support on a different set, we either need to remove
one from the lower bound of the other set before adding one to the new set, or
leave everything alone. This commonly happens across php calls right now,
where values must be reloaded from memory because SSATmps can't span calls.

The way this pass currently handles this is the following: if we can reduce the
lower bound on the other set (because it has a non-zero lower bound), we'll
take the first choice, since the previously supported aset will probably not be
used again if we're spanning a call. On the other hand, if the old set has a
lower bound of zero, so we can't compensate for removing it, we leave
everything alone.
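That support-transfer policy is small enough to sketch directly. The SupportState layout is an illustrative assumption (the real pass keeps bitvectors per aset, not a location-to-set array):

```cpp
#include <vector>

// Sketch of taking a balanced load. Raising the loaded set's lower
// bound requires exclusive use of the location as memory support, so if
// another set already uses that location we may "steal" the support
// only when we can compensate by reducing the old set's bound.
struct SupportState {
  std::vector<int> lowerBound;   // per must-alias-set
  std::vector<int> supportedBy;  // location -> aset id, or -1 for none
};

void loadBalanced(SupportState& st, int loc, int loadedSet) {
  int prev = st.supportedBy[loc];
  if (prev == loadedSet) return;            // already supporting this set
  if (prev != -1) {
    if (st.lowerBound[prev] == 0) return;   // can't compensate: leave alone
    --st.lowerBound[prev];                  // pay for moving the support
  }
  st.supportedBy[loc] = loadedSet;
  ++st.lowerBound[loadedSet];
}
```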
-- Effects of Pure Stores on Memory Support --

There are two main kinds of stores from the perspective of this module. There
are lowered stores (PureStore and PureSpillFrame) that happen within our IR
compilation unit, and don't imply reference count manipulation, and there are
stores that happen with "hhbc semantics" outside of the visibility of this
compilation unit, which imply decreffing the value that used to live in a
memory location as it's replaced with a new one. This module needs some
understanding of both types, and both of these types of stores affect memory
support, but in different ways.

For any instruction that may do non-lowered stores outside of our unit ("stores
with hhbc semantics"), if the location(s) it may be storing to could be
supporting the lower bound in any must-alias-set, we should remove the support
and decrement the lower bound, because it could DecRef the value in order to
replace it with something else. If we can't actually reduce the lower bound
(because it's already zero), we must leave the memory support flag alone,
because we haven't really accounted for the reference that lived in that memory
location, and it might not have actually been DecRef'd at that program point,
and could be DecRef'd later after we've seen another IncRef. If we didn't
leave the bit alone in this situation, the lower bound could end up too high
after a later IncRef.

On the other hand, for a PureStore with a known destination, we don't need to
reduce the lower bound of any set that was supported by that location, since it
never implies a DecRef. If the input IR program itself is trying to store to
that location "with hhbc semantics", then the program will also explicitly
contain the other lowered parts of this high level store, including any
appropriate loads and DecRefs of the old value, so we won't miss their effects.
So, for a PureStore we can simply mark the location as no-longer providing
memory support on the set it used to, but leave the lower bound alone.

The final case is a PureStore to an unknown location (either because it was not
supplied an AliasAnalysis id, or because it stored to something like a PtrToGen
that could refer to anything in memory). In this situation, it may or may not
be overwriting a location we had been using for memory support---however, it's
harmless to leave that state alone, with the following rationale:

If it doesn't actually overwrite it, then obviously things are the same, and
we're good. On the other hand, if it does actually overwrite it, then we don't
need to adjust the lower bound still, because it's a pure store (i.e. for the
same reason we didn't reduce the lower bound in the case above where we knew
where the store was going). If we do nothing to our state, the only difference
from the known location, then, is that we may have "unnecessarily" left a
must-alias-set marked as getting memory support when it doesn't need to be
anymore. But the point of marking part of a lower bound as coming from memory
support is just so that future stores (or loads) can potentially /reduce/ its
lower bound, so at worst it could reduce it later when it wouldn't really have
needed to if we had better information about where the store was going. In
other words, it can be thought of as an optimization to clear the memory
support state when we see a PureStore with a known target location: it's not
required for correctness.
-- Effects of Pure Stores on the Must-Alias-Set Being Stored --

The other thing to take into account with stores is that they put a (possibly
new) pointer in memory, which means it now could be loaded and DecRef'd later,
possibly by code we can't directly see in our compilation unit. To handle
this, we can divide things into four situations, based on two boolean
attributes: whether or not we have an AliasAnalysis bit for the location being
stored to ("known" vs "unknown" location), and whether or not the lower bound
on the must-alias-set for the value being stored is currently above zero.

The reason the lower bound matters when we see the store is the following:
we've possibly created a pointer in memory, which could be DecRef'd later, but
if the lower bound is zero we don't have a way to account for that, since it
can't go negative. It's not ok to just ignore this. Take the following
example, where t1 has a lower bound of zero:

   StMem ptr, t1
   IncRef t1
   IncRef t1
   RaiseWarning "something"  // any instruction that can re-enter and decref t1
   DecRef t1
   ...

If we simply ignored the fact that a new pointer has been created at the store,
that means the lower bound would be two after the two IncRefs, with no memory
support flags. Then when we see the RaiseWarning, we won't know we need to
reduce the lower bound, since we didn't account for the store, and now we'll
think we can change the DecRef to DecRefNZ, but this is not actually a sound
transformation.

If the input program is not malformed, it will in fact be doing a 'balancing'
IncRef for any new pointers it creates, before anything could access it---in
fact it may have done that before the store, but our analysis in general
could've lost that information in the tracked lower bound because of a
May-Alias decref, or because it was done through an SSATmp that is mapped to a
different must-alias-set that actually is the same object (although we don't
know).
With this in mind, we'll discuss all four cases:

 Unknown target, Zero LB:

     We flag all must-alias-sets as "pessimized". This state inserts a Halt
     node in each of the RC flowgraphs, and stops all optimizations along that
     control flow path: it prevents us from doing anything else in any
     successor blocks.

 Known target, Zero LB:

     Unlike the above, this case is not that uncommon. Since we know where it
     is going, we don't have to give up on everything. Instead, we leave the
     lower bound at zero, but set a memory support bit for the new location.
     Recall that we can in general have more memory support locations for one
     aset than the tracked lower bound---this is one situation that can cause
     that.

 Unknown target, Non-Zero LB:

     We don't know where the store is going, but we can account for balancing
     the possibly-new pointer. In this case, we decrement the lower bound and
     just eagerly behave as if the must-alias-set for the stored value may be
     decref'd right there. Since the lower bound is non-zero, we don't need to
     worry about changing lower bounds in other sets that May-Alias this one,
     because of the "exclusivity" rule for lower bounds.

 Known target, Non-Zero LB:

     Since we know where the new pointer will be, similar to the second case,
     we don't need to reduce the lower bound yet---we can wait until we see an
     instruction that might decref our object through that pointer. In this
     case, we can just mark the target location as memory support for the
     must-alias-set for the stored value, and leave its lower bound alone.
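The four-way case split can be summarized as a small decision function. The State layout and the `loc == -1` convention for an unknown target are illustrative assumptions, not the pass's real representation:

```cpp
#include <vector>

// Sketch of the four pure-store cases: loc == -1 stands for an unknown
// target; the support bits and lower bounds stand in for real pass state.
struct State {
  std::vector<int> lowerBound;             // per must-alias-set
  std::vector<std::vector<bool>> support;  // aset -> location bits
  bool pessimized = false;
};

void pureStore(State& st, int loc, int storedSet) {
  bool knownLoc  = loc >= 0;
  bool nonZeroLB = st.lowerBound[storedSet] > 0;
  if (!knownLoc && !nonZeroLB) {
    st.pessimized = true;                  // give up on this path entirely
  } else if (knownLoc && !nonZeroLB) {
    st.support[storedSet][loc] = true;     // support can exceed the bound
  } else if (!knownLoc) {
    --st.lowerBound[storedSet];            // eagerly balance the new pointer
  } else {
    st.support[storedSet][loc] = true;     // defer: track support, keep LB
  }
}
```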
-- More about Memory --

Another consideration about memory in this module arises from the fact that our
analysis passes make no attempt to track which object pointers may be escaped.
For that matter, much of the optimization we currently do here is removing
redundant reference counting of locals and eval stack slots, which arises from
lowering the HHBC stack machine semantics to HHIR---generally speaking these
values could be accessible through the heap as far as we know. This is
important because it means that we can make no transformations to the program
that would affect the behavior of increfs or decrefs in memory locations we
aren't tracking, on the off chance they happen to contain a pointer to one of
our tracked objects.

The way we maintain correctness here is to never move or eliminate reference
counting operations unless we know about at least /two/ references to the
object being counted. The importance of this is easiest to illustrate with
delayed increfs (relevant to rules inc_pass_req, inc_pass_phi, and
inc_pass_sig), although it applies to inc/dec pair removal also: it is fine to
move an incref forward in the IR instruction stream, as long as nothing could
observe the difference between the reference count the object "should" have,
and the one it will have after we delay the incref. We need to consider how
reachability from the heap can affect this.

If the lower bound at an incref instruction is two or greater, we know we can
push the incref down as much as we want (basically until we reach an exit from
the compilation unit, or until we reach something that may decref the object
and reduce the lower bound). On the other hand, if the lower bound before the
incref is zero, in order to move the incref forward, we would need to stop at
any instruction that could decref /anything/ in any memory location, since
we're making the assumption that there may be other live pointers to the
object---if we were to push that incref forward, we could change whether other
pointers to the object are considered the last reference, and cause a decref to
free the object when it shouldn't. (We could try to do this on the rc
flowgraphs, but at least in a trivial implementation it would lead to a much
larger number of flowgraph nodes, so instead we leave easy cases to a separate,
local, "remove_trivial_incdecs" pass and ignore hard cases.)

The above two cases are relatively straightforward. The remaining case is when
the lower bound before an incref is one. It turns out to be safe to sink in
this case, and it fits the idea that we "know about two references". Whatever
caused the lower bound to be one before the incref will ensure that the
object's liveness is not affected---here's why:

There are two possibilities: the object is either reachable through at least
one unknown pointer, or it isn't. If it isn't, then the safety of moving the
incref is relatively straightforward: we'll be pushing the actual /second/
reference down, and it is safe to push it as long as we don't move it through
something that may decref it (or until we reach an exit from the compilation
unit). For the other possibility, it is sufficient to consider only having one
unknown pointer: in this situation, we're pushing the actual /third/ reference
down, and if anything decrefs the object through the pointer we don't know
about, it will still know not to free it because we left the second reference
alone (whatever was causing our lower bound to be one), and therefore a decref
through this unknown pointer won't think it is removing the last reference.

Also worth discussing is that there are several runtime objects in the VM with
operations that have behavioral differences based on whether the reference
count is greater than one. For instance, types like KindOfString and
KindOfArray do in place updates when they have a refcount of one, and KindOfRef
is treated "observably" as a php reference only if the refcount is greater than
one. Making sure we don't change these situations is actually the same
condition as discussed above: by the above scheme for not changing whether
pointers we don't know about constitute the last counted reference to an
object, we are both preventing decrefs from going to zero when they shouldn't,
and preventing modifications to objects from failing to COW when they should.

A fundamental meta-rule that arises out of all the above considerations for any
of the RC flowgraph transformation rules is that we cannot move (or remove)
increfs unless the lower bound on the incref node is at least one (meaning
after the incref we "know about two references"). Similarly, anything that
could reduce the lower bound must put a node in the RC flowgraph to update that
information (a Req{1} node usually) so we don't push increfs too far or remove
them when we shouldn't.
- -- "Trivial" incdec removal pass --
- This module also contains a local optimization that removes IncRef/DecRefNZ
- pairs in a block that have no non-"pure" memory-accessing instructions in
- between them.
- This optimization can be performed without regard to the lower bound of any
- objects involved, and the DecRef -> DecRefNZ transformations the rest of the
- code makes can create situations where these opportunities are visible. Some
- of these situations would be removable by the main pass if we had a more
- complicated scheme for dealing with "unknown heap pointers" (i.e. the stuff in
- the "more about memory" section described above). But other situations may
- also occur because the main pass may create unnecessary Req nodes in the middle
- of code sequences that don't really observe references when we're dealing with
- unrelated PureStores of possibly-aliasing tmps that have lower bounds of zero.
- In general this pass's correctness is simple to reason about, and it cleans
- up some things the main pass can miss, so it is easier to do some of the work
- this way than to complicate the main pass further.
- */
- //////////////////////////////////////////////////////////////////////
- #include "hphp/runtime/vm/jit/opt.h"
- #include <algorithm>
- #include <array>
- #include <cstdio>
- #include <limits>
- #include <sstream>
- #include <string>
- #include <tuple>
- #include <boost/dynamic_bitset.hpp>
- #include <folly/Conv.h>
- #include <folly/Format.h>
- #include <folly/ScopeGuard.h>
- #include "hphp/util/dataflow-worklist.h"
- #include "hphp/util/match.h"
- #include "hphp/util/safe-cast.h"
- #include "hphp/util/trace.h"
- #include "hphp/runtime/vm/jit/alias-analysis.h"
- #include "hphp/runtime/vm/jit/analysis.h"
- #include "hphp/runtime/vm/jit/block.h"
- #include "hphp/runtime/vm/jit/cfg.h"
- #include "hphp/runtime/vm/jit/containers.h"
- #include "hphp/runtime/vm/jit/ir-instruction.h"
- #include "hphp/runtime/vm/jit/ir-unit.h"
- #include "hphp/runtime/vm/jit/memory-effects.h"
- #include "hphp/runtime/vm/jit/mutation.h"
- #include "hphp/runtime/vm/jit/pass-tracer.h"
- #include "hphp/runtime/vm/jit/state-vector.h"
- #include "hphp/runtime/vm/jit/timer.h"
- namespace HPHP { namespace jit {
- namespace {
- TRACE_SET_MOD(hhir_refcount);
- //////////////////////////////////////////////////////////////////////
- /*
- * Ids of must-alias-sets. We use -1 as an invalid id.
- */
- using ASetID = int32_t;
- struct MustAliasSet {
- explicit MustAliasSet(Type widestType, SSATmp* representative)
- : widestType(widestType)
- , representative(representative)
- {}
- /*
- * Widest type for this MustAliasSet, used for computing may-alias
- * information.
- *
- * Because of how we build MustAliasSets (essentially canonical(), or groups
- * of LdCtx instructions), it is guaranteed that this widestType includes all
- * possible values for the set. However it is not the case that every tmp in
- * the set necessarily has a subtype of widestType, because of situations
- * that can occur with AssertType and interface types. This does not affect
- * correctness, but it's worth being aware of.
- */
- Type widestType;
- /*
- * A representative of the set. This is only used for debug tracing, and is
- * currently the first instruction (in an rpo traversal) that defined a tmp
- * in the must-alias-set. (I.e. it'll be the canonical() tmp, or the first
- * LdCtx we saw.)
- */
- SSATmp* representative;
- /*
- * Set of ids of the other MustAliasSets that this set may alias, in a flow
- * insensitive way, and not including this set itself. This is based only on
- * the type of the representative. See the comments at the top of this file.
- */
- jit::flat_set<ASetID> may_alias;
- };
- //////////////////////////////////////////////////////////////////////
- // Analysis results for memory locations known to contain balanced reference
- // counts. See populate_mrinfo.
- struct MemRefAnalysis {
- struct BlockInfo {
- uint32_t rpoId;
- ALocBits avail_in;
- ALocBits avail_out;
- ALocBits kill;
- ALocBits gen;
- };
- explicit MemRefAnalysis(IRUnit& unit) : info(unit, BlockInfo{}) {}
- StateVector<Block,BlockInfo> info;
- };
- //////////////////////////////////////////////////////////////////////
- // Per must-alias-set state information for rc_analyze.
- struct ASetInfo {
- /*
- * A lower bound of the actual reference count of the object that this alias
- * set refers to. See "RC lower bounds" in the documentation---there are
- * some subtleties here.
- */
- int32_t lower_bound{0};
- /*
- * Set of memory location ids that are being used to support the lower bound
- * of this object. The purpose of this set is to reduce lower bounds when we
- * see memory events that might decref a pointer: this means it's never
- * incorrect to leave a bit set in memory_support conservatively, but there
- * are situations where we must set bits here or our analysis will be wrong.
- *
- * An important note is that the bits in memory_support can represent memory
- * locations that possibly alias (via ALocMeta::conflicts). Setting only one
- * bit from the conflict set is sufficient when we know something must be in
- * memory in the set---any memory effects that can affect other may-aliasing
- * locations will still apply to all of them as needed.
- *
- * However, whenever we handle removing memory support, if you need to remove
- * one bit, you generally speaking are going to need to remove the support
- * for the whole conflict set.
- */
- ALocBits memory_support;
- /*
- * Sometimes we lose too much track of what's going on to do anything useful.
- * In this situation, all the sets get flagged as `pessimized', we don't do
- * anything to them anymore, and a Halt node is added to all graphs.
- *
- * Note: right now this state is per-ASetInfo, but we must pessimize
- * everything at once if we pessimize anything, because of how the analyzer
- * will lose track of aliasing effects. (We will probably either change it to
- * be per-RCState later or fix the alias handling.)
- */
- bool pessimized{false};
- };
- // State structure for rc_analyze.
- struct RCState {
- bool initialized{false};
- jit::vector<ASetInfo> asets;
- /*
- * MemRefAnalysis availability state. This is just part of this struct for
- * convenience when stepping through RCAnalysis results. It is used to know
- * when loads can provide memory support.
- */
- ALocBits avail;
- /*
- * Map from AliasClass ids to the must-alias-set that has it as
- * memory_support, if any do. At most one ASet will be supported by any
- * location at a time, to fit the "exclusivity" condition on lower bounds.
- * The mapped value is -1 if no ASet is currently supported by that location.
- */
- std::array<ASetID,kMaxTrackedALocs> support_map;
- };
- // The analysis result structure for rc_analyze. This structure gets fed into
- // build_graphs to create our RC graphs.
- struct RCAnalysis {
- struct BlockInfo {
- uint32_t rpoId;
- RCState state_in;
- };
- explicit RCAnalysis(IRUnit& unit) : info(unit, BlockInfo{}) {}
- StateVector<Block,BlockInfo> info;
- };
- //////////////////////////////////////////////////////////////////////
- struct Env {
- explicit Env(IRUnit& unit)
- : unit(unit)
- , rpoBlocks(rpoSortCfg(unit))
- , idoms(findDominators(unit, rpoBlocks, numberBlocks(unit, rpoBlocks)))
- , ainfo(collect_aliases(unit, rpoBlocks))
- , mrinfo(unit)
- , asetMap(unit, -1)
- {}
- IRUnit& unit;
- BlockList rpoBlocks;
- IdomVector idoms;
- Arena arena;
- AliasAnalysis ainfo;
- MemRefAnalysis mrinfo;
- StateVector<SSATmp,ASetID> asetMap; // -1 is invalid (not-Counted tmps)
- jit::vector<MustAliasSet> asets;
- };
- //////////////////////////////////////////////////////////////////////
- /*
- * Nodes in the RC flowgraphs.
- */
- enum class NT : uint8_t { Inc, Dec, Req, Phi, Sig, Halt, Empty };
- struct Node {
- Node* next{nullptr};
- Node* prev{nullptr}; // unused for Phi nodes, as they may have more than one pred
- int32_t lower_bound{0};
- NT type;
- // Counter used by optimize pass to wait to visit Phis until after
- // non-backedge predecessors.
- int16_t visit_counter{0};
- protected:
- explicit Node(NT type) : type(type) {}
- Node(const Node&) = default;
- Node& operator=(const Node&) = default;
- };
- /*
- * IncRef and DecRef{NZ,} nodes.
- */
- struct NInc : Node {
- explicit NInc(IRInstruction* inst) : Node(NT::Inc), inst(inst) {}
- IRInstruction* inst;
- };
- struct NDec : Node {
- explicit NDec(IRInstruction* inst) : Node(NT::Dec), inst(inst) {}
- IRInstruction* inst;
- };
- /*
- * Control flow splits and joins.
- */
- struct NPhi : Node {
- explicit NPhi(Block* block) : Node(NT::Phi), block(block) {}
- Block* block;
- Node** pred_list{nullptr};
- uint32_t pred_list_cap{0};
- uint32_t pred_list_sz{0};
- uint32_t back_edge_preds{0};
- };
- struct NSig : Node {
- explicit NSig(Block* block) : Node(NT::Sig), block(block) {}
- Block* block;
- Node* taken{nullptr};
- };
- /*
- * Halt means to stop processing along this control flow path---something
- * during analysis had to pessimize and we can't continue.
- *
- * When we've pessimized a set, we also guarantee that all successors have a
- * lower_bound of zero, which will block all rcfg transformation rules from
- * applying, so it's actually not necessary to halt---it just prevents
- * processing parts of the graph unnecessarily.
- *
- * For the case of join points which were halted on one side, optimize_graph
- * will not process through the join because the visit_counter will never be
- * high enough. In the case of back edges, it may process through the loop
- * unnecessarily, but it won't make any illegal transformations because the
- * lower_bound will be zero.
- */
- struct NHalt : Node { explicit NHalt() : Node(NT::Halt) {} };
- /*
- * Empty nodes are useful for building graphs, since not every node type can
- have control flow edges, but they have no meaning later.
- */
- struct NEmpty : Node { explicit NEmpty() : Node(NT::Empty) {} };
- /*
- * Req nodes mean the reference count of the object may be observed, up to some
- * "level". The level is a number we have to keep the lower_bound above to
- * avoid changing program behavior. It will be INT32_MAX on exits from the
- * compilation unit.
- */
- struct NReq : Node {
- explicit NReq(int32_t level) : Node(NT::Req), level(level) {}
- int32_t level;
- };
- #define X(Kind, kind) \
- UNUSED N##Kind* to_##kind(Node* n) { \
- assertx(n->type == NT::Kind); \
- return static_cast<N##Kind*>(n); \
- } \
- UNUSED const N##Kind* to_##kind(const Node* n) { \
- return to_##kind(const_cast<Node*>(n)); \
- }
- X(Inc, inc)
- X(Dec, dec)
- X(Req, req)
- X(Phi, phi)
- X(Sig, sig)
- X(Halt, halt)
- X(Empty, empty)
- #undef X
- //////////////////////////////////////////////////////////////////////
- template<class Kill, class Gen>
- void mrinfo_step_impl(Env& env,
- const IRInstruction& inst,
- Kill kill,
- Gen gen) {
- auto do_store = [&] (AliasClass dst, SSATmp* value) {
- /*
- * Pure stores potentially (temporarily) break the heap's reference count
- * invariants on a memory location, but only if the value being stored is
- * possibly counted.
- */
- if (value->type().maybe(TCounted)) {
- kill(env.ainfo.may_alias(canonicalize(dst)));
- }
- };
- auto const effects = memory_effects(inst);
- match<void>(
- effects,
- [&] (IrrelevantEffects) {},
- [&] (ExitEffects) {},
- [&] (ReturnEffects) {},
- [&] (GeneralEffects) {},
- [&] (UnknownEffects) { kill(ALocBits{}.set()); },
- [&] (PureStore x) { do_store(x.dst, x.value); },
- /*
- * Note that loads do not kill a location. In fact, it's possible that the
- * IR program itself could cause a location to not be `balanced' using only
- * PureLoads. (For example, it could load a local to decref it as part of
- * a return sequence.)
- *
- * It's safe not to add it to the kill set, though, because if the IR
- * program is destroying a memory location, it is already malformed if it
- * loads the location again and then uses it in a way that relies on the
- * pointer still being dereferenceable. Moreover, in these situations,
- * even though the avail bit from mrinfo will be set on the second load, we
- * won't be able to remove support from the previous aset, and won't raise
- * the lower bound on the new loaded value.
- */
- [&] (PureLoad) {},
- /*
- * Since there's no semantically correct way to do PureLoads from the
- * locations in a PureSpillFrame unless something must have stored over
- * them again first, we don't need to kill anything here.
- */
- [&] (PureSpillFrame) {},
- [&] (CallEffects x) {
- /*
- * Because PHP callees can side-exit (or for that matter throw from their
- * prologue), the program is ill-formed unless we have balanced reference
- * counting for all memory locations. Even if the call has the
- * destroys_locals flag this is the case---after it destroys the locals
- * the new value will have a fully synchronized reference count.
- *
- * This may need modifications after we allow php values to span calls in
- * SSA registers.
- */
- gen(ALocBits{}.set());
- }
- );
- }
- // Helper for stepping after we've created a MemRefAnalysis.
- void mrinfo_step(Env& env, const IRInstruction& inst, ALocBits& avail) {
- mrinfo_step_impl(
- env,
- inst,
- [&] (ALocBits kill) { avail &= ~kill; },
- [&] (ALocBits gen) { avail |= gen; }
- );
- }
- /*
- * Perform an analysis to determine memory locations that are known to hold
- * "balanced" values with respect to reference counting. This means the
- * location "owns" a reference in the normal sense---i.e. the count on the
- * object is at least one on account of the pointer in that location.
- *
- * Normal ("hhbc-semantics") operations on php values in memory all preserve
- * balanced reference counts (i.e. a pointer in memory corresponds to one value
- * in the count field of the pointee). However, when we lower hhbc opcodes to
- * HHIR, some opcodes split up the reference counting operations from the
- * memory operations: when we observe a "pure store" instruction, therefore,
- * the location involved may no longer be "balanced" with regard to reference
- * counting. See further discussion in the doc comment at the top of this
- * file.
- */
- void populate_mrinfo(Env& env) {
- FTRACE(1, "populate_mrinfo ---------------------------------------\n");
- FTRACE(3, "locations:\n{}\n", show(env.ainfo));
- /*
- * 1. Compute block summaries.
- */
- for (auto& blk : env.rpoBlocks) {
- for (auto& inst : blk->instrs()) {
- mrinfo_step_impl(
- env,
- inst,
- [&] (ALocBits kill) {
- env.mrinfo.info[blk].kill |= kill;
- env.mrinfo.info[blk].gen &= ~kill;
- },
- [&] (ALocBits gen) {
- env.mrinfo.info[blk].gen |= gen;
- env.mrinfo.info[blk].kill &= ~gen;
- }
- );
- }
- }
- FTRACE(3, "summaries:\n{}\n",
- [&] () -> std::string {
- auto ret = std::string{};
- for (auto& blk : env.rpoBlocks) {
- folly::format(&ret, " B{: <3}: {}\n"
- " : {}\n",
- blk->id(),
- show(env.mrinfo.info[blk].kill),
- show(env.mrinfo.info[blk].gen)
- );
- }
- return ret;
- }()
- );
- /*
- * 2. Find fixed point of avail_in:
- *
- * avail_out = avail_in - kill + gen
- * avail_in = isect(pred) avail_out
- *
- * Locations that are marked "avail" mean they imply a non-zero lower bound
- * on the object they point to, if they contain a reference counted type, and
- * assuming they are actually legal to load from.
- */
- auto incompleteQ = dataflow_worklist<uint32_t>(env.rpoBlocks.size());
- for (auto rpoId = uint32_t{0}; rpoId < env.rpoBlocks.size(); ++rpoId) {
- env.mrinfo.info[env.rpoBlocks[rpoId]].rpoId = rpoId;
- }
- // avail_outs all default construct to zeros.
- // avail_in on the entry block (with no preds) will be set to all 1 below.
- incompleteQ.push(0);
- do {
- auto const blk = env.rpoBlocks[incompleteQ.pop()];
- auto& binfo = env.mrinfo.info[blk];
- binfo.avail_in.set();
- blk->forEachPred([&] (Block* pred) {
- binfo.avail_in &= env.mrinfo.info[pred].avail_out;
- });
- auto const old = binfo.avail_out;
- binfo.avail_out = (binfo.avail_in & ~binfo.kill) | binfo.gen;
- if (binfo.avail_out != old) {
- if (auto const t = blk->taken()) {
- incompleteQ.push(env.mrinfo.info[t].rpoId);
- }
- if (auto const n = blk->next()) {
- incompleteQ.push(env.mrinfo.info[n].rpoId);
- }
- }
- } while (!incompleteQ.empty());
- FTRACE(4, "fixed point:\n{}\n",
- [&] () -> std::string {
- auto ret = std::string{};
- for (auto& blk : env.rpoBlocks) {
- folly::format(&ret, " B{: <3}: {}\n",
- blk->id(),
- show(env.mrinfo.info[blk].avail_in)
- );
- }
- return ret;
- }()
- );
- }
- //////////////////////////////////////////////////////////////////////
- using HPHP::jit::show;
- DEBUG_ONLY std::string show(const boost::dynamic_bitset<>& bs) {
- std::ostringstream out;
- out << bs;
- return out.str();
- }
- /*
- * This helper for weaken_decrefs reports uses of reference-counted values that
- * imply that their reference count cannot be zero (or it would be a bug).
- * This includes any use of an SSATmp that implies the pointer isn't already
- * freed.
- *
- * For now, it's limited to the reference counting operations on these values,
- because other types of uses will need to be evaluated on a per-instruction
- * basis: we can't just check instruction srcs blindly to find these types of
- * uses, because in general a use of an SSATmp with a reference-counted pointer
- * type (like Obj, Arr, etc), implies only a use of the SSA-defined pointer
- * value (i.e. the pointer bits sitting in a virtual SSA register), not
- * necessarily of the value pointed to, which is what we care about here and
- * isn't represented in SSA.
- */
- template<class Gen>
- void weaken_decref_step(const Env& env, const IRInstruction& inst, Gen gen) {
- switch (inst.op()) {
- case DecRef:
- case DecRefNZ:
- case IncRef:
- {
- auto const asetID = env.asetMap[inst.src(0)];
- if (asetID != -1) gen(asetID);
- }
- break;
- default:
- break;
- }
- }
- /*
- * Backward pass that weakens DecRefs to DecRefNZ if they cannot go to zero
- * based on future use of the value that they are DecRefing. See "Weakening
- * DecRefs" in the doc comment at the top of this file.
- */
- void weaken_decrefs(Env& env) {
- FTRACE(2, "weaken_decrefs ----------------------------------------\n");
- auto const poBlocks = [&] {
- auto ret = env.rpoBlocks;
- std::reverse(begin(ret), end(ret));
- return ret;
- }();
- /*
- * 0. Initialize block state structures and put all blocks in the worklist.
- */
- auto incompleteQ = dataflow_worklist<uint32_t>(poBlocks.size());
- struct BlockInfo {
- BlockInfo() {}
- uint32_t poId;
- boost::dynamic_bitset<> in_used;
- boost::dynamic_bitset<> out_used;
- boost::dynamic_bitset<> gen;
- };
- StateVector<Block,BlockInfo> blockInfos(env.unit, BlockInfo{});
- for (auto poId = uint32_t{0}; poId < poBlocks.size(); ++poId) {
- auto const blk = poBlocks[poId];
- blockInfos[bl