/hphp/runtime/vm/jit/refcount-opts.cpp
/*
   +----------------------------------------------------------------------+
   | HipHop for PHP                                                       |
   +----------------------------------------------------------------------+
   | Copyright (c) 2010-2014 Facebook, Inc. (http://www.facebook.com)     |
   +----------------------------------------------------------------------+
   | This source file is subject to version 3.01 of the PHP license,      |
   | that is bundled with this package in the file LICENSE, and is        |
   | available through the world-wide-web at the following url:           |
   | http://www.php.net/license/3_01.txt                                  |
   | If you did not receive a copy of the PHP license and are unable to   |
   | obtain it through the world-wide-web, please send a note to          |
   | license@php.net so we can mail you a copy immediately.               |
   +----------------------------------------------------------------------+
*/

//////////////////////////////////////////////////////////////////////
/*

Welcome to refcount-opts. Theoretically reading this block comment first will
make the rest of this file make more sense.

-- Overview --

This file contains passes that attempt to reduce the number and strength of
reference counting operations in an IR program. It uses a few strategies, but
fundamentally most of what's going on is about trying to prove that an IncRef
is post-dominated by a DecRef that provably can't go to zero, with no events in
between that can tell if the IncRef happened, and if so, removing the pair.

This doc comment is going to explain a few concepts, interleaved with
discussion on how they are used by the various analysis and optimization passes
in this module.

-- Must/May Alias Sets --

The passes in this file operate on groups of SSATmp's called "must-alias-set"s
(or often "asets" in the code). These are sets of SSATmp names that are known
to alias the same object, in a "semi"-flow-insensitive way (see below). Every
SSATmp that may have a reference counted type is assigned to a must-alias-set.
Crucially, if two SSATmps belong to different must-alias-sets, they still /may/
point to the same object. For SSATmps a and b, in this module we use
(a&b).maybe(Counted) as a flow-insensitive May-Alias(a,b) predicate: if two
tmps may alias but only in a way that is not reference counted, we don't care
for our purposes.

A subtle point worth considering is that it is possible (and not uncommon) that
some of the SSATmps in one must-alias-set May-Alias some but not all of the
SSATmps in another must-alias-set: the reason is that the must-alias-sets are
still subject to SSA rules about where tmps are defined, and some of the tmps
in a set may be defined by instructions that take conditional jumps if the
object doesn't satisfy some condition (e.g. CheckType). This is why it may
make sense to think of the must-alias-sets as "semi"-flow-insensitive: it's
globally true that the names all refer to the same object, but the names
involved aren't globally defined.

The first thing this module does is run a function to map SSATmps to their
must-alias-sets, and then, for each must-alias-set S, compute which of the
other must-alias-sets contain any tmps that May-Alias any tmp from S.
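The grouping step can be pictured as a toy union-find over integer tmp ids. This is only a sketch of the idea: the id scheme and the mustAlias() entry point are illustrative inventions, while the real pass derives the sets from canonical() and groups of LdCtx instructions.

```cpp
#include <vector>

// Toy must-alias-set grouping: tmps are plain ints here, and callers
// record "b must alias a" facts (e.g. b defined by a CheckType of a).
struct MustAliasSets {
  // parent[i] == i means tmp i is its set's representative.
  std::vector<int> parent;

  explicit MustAliasSets(int nTmps) : parent(nTmps) {
    for (int i = 0; i < nTmps; ++i) parent[i] = i;
  }

  // Find the representative, with path halving for near-constant cost.
  int find(int t) {
    while (parent[t] != t) t = parent[t] = parent[parent[t]];
    return t;
  }

  // Record that a and b are known to refer to the same object.
  void mustAlias(int a, int b) { parent[find(a)] = find(b); }
};
```

After all must-alias facts are unioned in, two tmps are in the same set exactly when their representatives match, which is the property the later May-Alias computation is built on top of.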
-- Weakening DecRefs --

This file contains a relatively cheap pass that can weaken DecRefs into
DecRefNZ by proving that they can't go to zero (unless there is already a bug
in the program).

The way this works is to do a backward dataflow analysis, computing
"will_be_used_again" information. This dataflow analysis has a boolean for
each must-alias-set, indicating whether all paths from a program point contain
a use of that object in a way that implies its reference count is not zero
(for example, if every path decrefs it again). Then, it converts any DecRef
instruction on tmps whose must-alias-sets say they "will_be_used_again" to
DecRefNZ.
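Restricted to a single block, the backward scan can be sketched as follows. The Instr representation, the opcode strings, and the "CallArbitrary" barrier are all illustrative stand-ins for the real IR and dataflow framework; the point is only the direction of the scan and the weakening rule.

```cpp
#include <string>
#include <vector>

// Toy single-block DecRef weakening: scanning backwards, if a later
// decref of the same must-alias-set runs on every path from here, an
// earlier DecRef cannot take the count to zero and may become DecRefNZ.
struct Instr { std::string op; int aset; };

void weakenDecRefs(std::vector<Instr>& block, int nSets) {
  std::vector<bool> willBeUsedAgain(nSets, false);
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    if (it->op == "DecRef" || it->op == "DecRefNZ") {
      if (it->op == "DecRef" && willBeUsedAgain[it->aset]) {
        it->op = "DecRefNZ";  // a later use proves the count stays nonzero
      }
      // Either way, this decref implies the count was nonzero before it.
      willBeUsedAgain[it->aset] = true;
    } else if (it->op == "CallArbitrary") {
      // Stand-in for anything that may consume references: reset all facts.
      willBeUsedAgain.assign(nSets, false);
    }
  }
}
```

The real pass runs this as a fixed-point dataflow over the CFG, intersecting the per-set booleans at block boundaries so the "all paths" condition holds across control flow.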
One rule this pass relies on is that it is illegal to DecRef an object in a way
that takes its refcount to zero, and then IncRef it again after that. This is
not illegal for trivial reasons, because object __destruct methods can
resurrect an object in PHP. But within one JIT region, right now we declare it
illegal to generate IR that uses an object after a DecRef that might take it to
zero.

Since this pass converts instructions that may (in general) re-enter the
VM---running arbitrary PHP code for a destructor---it's potentially profitable
to run it earlier than other parts of refcount opts. For example, it can allow
heap memory accesses to be proven redundant that otherwise would not be, and
can prevent the rest of this analysis from assuming some DecRefs can re-enter
that actually can't.
-- RC Flowgraphs --

Other optimizations in this file are performed on "RC flowgraphs", which are an
abstract representation of only the effects of the IR program that matter for
the optimization, on a single must-alias-set at a time. The RC graphs contain
explicit control flow nodes ("phi" nodes for joins, "sigma" nodes for splits),
as well as nodes for things like decref instructions, incref instructions, and
"req" nodes that indicate that the reference count of an object may be observed
at that point up to some level. Nodes in an RC graph each come with a "lower
bound" on the reference count for the graph's must-alias-set at that program
point (more about lower bounds below)---these lower bounds are the lower bound
before that node in the flowgraph. We build independent graphs for each
must-alias-set, and they do not need to contain specific nodes relating to
possible cross-set effects (based on May-Alias relationships)---that
information is available in these graphs through the "req" nodes and lower
bound information.

The graphs are constructed after first computing information that allows us to
process each must-alias-set independently. Then they are processed one at a
time with a set of "legal transformation rules". The rules are applied in a
single pass over the flowgraph, going forwards, but potentially backtracking
when certain rules apply, since they may enable more rules to apply to previous
nodes. At this point it might help to go look at one or two of the
transformation rule examples below (e.g. rule_inc_dec_fold), but that
documentation is not duplicated here.

The intention is that these rules are smaller and easier to understand the
correctness of than trying to do these transformations without an explicit data
structure, but a disadvantage is that this pass needs to allocate a lot of
temporary information in these graphs. The backtracking also seemed a bit
convoluted to do directly on the IR. We may eventually change this to work
without the extra data structure, but that's how it works right now.

Most of the analysis code in this module is about computing the information we
need to build these flowgraphs, before we do the actual optimizations on them.
The rest of this doc-comment talks primarily about that analysis---see the
comments near the rule_* functions for more about the flowgraph optimizations
themselves, and the comments near the Node structure for a description of the
node types in these graphs.
-- RC "lower bounds" --

A lower bound on the reference count of a must-alias-set indicates a known
minimum for the value of its object's count field at that program point. This
minimum value can be interpreted as a minimum value of the actual integer in
memory at each point, if the program were not modified by this pass. A lower
bound is therefore always non-negative.

The first utility of this information is pretty obvious: if a DecRef
instruction is encountered when the lower bound of a must-alias-set is greater
than one, that DecRef instruction can be converted to DecRefNZ, since it can't
possibly free the object. (See the flowgraph rule_decnz.) Knowledge of the
lower bound is also required for folding unobservable incref/decref pairs, and
generally this information is inspected by most of the things done as RC
flowgraph transformations.

The lower bound must be tracked conservatively to ensure that our
transformations are correct. This means we can increase a lower bound only
when we see instructions that /must/ imply an increase in the object's count
field, but we must decrease a lower bound whenever we see instructions that
/may/ imply a decrease in the count. It might clarify this a little to list
the reasons that a must-alias-set's lower bounds can be increased:

  o An explicit IncRef instruction in the instruction stream of a tmp in the
    must-alias-set.

  o Instructions that "produce references" (generally instructions that
    allocate new objects).

  o Some situations with loads from memory (described later).

A must-alias-set's lower bound can be decreased in many situations, including:

  o An explicit DecRef or DecRefNZ of an SSATmp that maps to this
    must-alias-set.

  o In some situations, executing an instruction that could decref pointers
    that live in memory, for example by re-entering and running arbitrary php
    code. (Memory is discussed more shortly; this concept is called "memory
    support".)

  o An SSATmp in this must-alias-set is passed as an argument to an IR
    instruction that may decref it. (This is the "consumes reference"
    IRInstruction flag.)

  o We see an instruction that may reduce the lower bound of a different
    must-alias-set, for any reason, and that different set May-Alias this set.
If the last point were the exact rule we used, it would potentially mean lots
of reductions in lower bounds, which could be very pessimistic, so to obviate
the need to do it all the time we introduce an "exclusivity" principle on
tracking lower bounds. What this principle means is the following: if we see
some reason in the IR to increment the lower bound in an alias set A, we can
/only/ increment the lower bound of A, even if that same information could also
be used to increase the lower bound of other asets. If we could theoretically
use the same information to increase the lower bound of a different set B, we
can't do that at the same time---we have to choose one to apply it to. (This
situation comes up with loads and is discussed further in "About Loads".)

This exclusivity principle provides the following rule for dealing with
instructions that may decrease reference counts because of May-Alias
relationships: when we need to decrease the lower bound of a must-alias-set, if
its lower bound is currently non-zero, we have no obligation to decrement the
lower bound in any other must-alias-set, regardless of May-Alias relationships.
The exclusivity of the lower bound means we know we're just cancelling out
something that raised the lower bound on this set and no other, so the state on
other sets can't be affected.

The pessimistic case still applies, however, if you need to reduce the lower
bound on a must-alias-set S that currently has a lower bound of zero. Then all
the other sets that May-Alias S must have their lower bound reduced as well.
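The reduction rule with the exclusivity principle can be sketched in a few lines. The mayAlias adjacency lists and per-set integer bounds are illustrative stand-ins for the pass's real state:

```cpp
#include <vector>

// Sketch of reducing a lower bound under the "exclusivity" principle.
// If the target set's bound is nonzero, only it is reduced (we are just
// cancelling an increase that applied exclusively to it). If it is
// already zero, the decref may really belong to any aliasing set, so
// every May-Alias set must be reduced as well.
void reduceLowerBound(std::vector<int>& lowerBound,
                      const std::vector<std::vector<int>>& mayAlias,
                      int aset) {
  if (lowerBound[aset] > 0) {
    --lowerBound[aset];
    return;
  }
  // Pessimistic case: lower the bound of everything that May-Alias aset.
  for (int other : mayAlias[aset]) {
    if (lowerBound[other] > 0) --lowerBound[other];
  }
}
```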
-- Memory Support --

This section is going to introduce the "memory support" concept, but details
will be fleshed out further in following sections.

The key idea behind this concept is that we can keep lower bounds higher than
we would otherwise be able to by tracking at least some of the pointers to the
object that may be in memory. An alternative, conservative approach to stores,
for example, might be to eagerly attempt to reduce the lower bound on the must
alias set for the value being stored at the location of the store itself. By
instead keeping track of the fact that that memory location may contain a
pointer to that must-alias-set until it may be decref'd later, we can keep the
lower bound higher for longer.

The state of memory support for each must-alias-set is a bitvector of the
memory locations AliasAnalysis has identified in the program. If a bit is set,
it indicates that that memory location may contain a pointer to that must alias
set. When a must-alias-set has any memory support bits set, it is going to be
analyzed more conservatively than if it doesn't. And importantly, the memory
support bits are may-information: just because a bit is set doesn't mean that
we know for sure that memory location contains a pointer to this object. It
just means that it might, and that our lower bound may have been "kept higher
for longer" using that knowledge at some point.

The primary thing we need to do more conservatively with must-alias-sets that
have memory support is reduce their lower bound if they could be decref'd
through that pointer in memory. Since this effect on the analysis just reduces
lower bounds, it would never be incorrect to leave the memory support bit set
forever in this situation, which is also conceptually necessary for this to
work as may-information.

However, if we see an instruction that could DecRef one of these objects
through a pointer in memory and its lower bound is currently non-zero, we can
be sure we've accounted for that may-DecRef by balancing it with an IncRef of
some sort that we've already observed. In this situation, we can remove the
memory support bit to avoid further reductions in the lower bound of that set
via that memory location.

Since this is may-information that makes analysis more conservative, the memory
support bits should conceptually be or'd at merge points. It is fine to think
of it that way for general understanding of the analysis here, but in this
implementation we don't actually treat it that way when merging. Because we
want to be able to quickly find memory-supported must-alias-sets from a given
memory location when analyzing memory effects of IR instructions (i.e. without
iterating every tracked alias set), we restrict the state to require that at
most one must-alias-set is supported by a given memory location during the
analysis. If we reach situations that would break that restriction, we must
handle it conservatively (using a `pessimized' state, which is discussed
later, as a last resort). The merge_memory_support function elaborates on the
details of how this is done.
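A toy version of such a merge might look like the following. The location-to-set map, the pessimized set, and the conflict policy are all illustrative assumptions; the real merge_memory_support works on bitvectors and the pass's own state, and its exact conflict handling is more involved:

```cpp
#include <map>
#include <set>

// Toy merge of memory-support maps (location -> supported aset id),
// keeping the invariant that each location supports at most one
// must-alias-set. A location supported on only one side is kept, since
// the bits are may-information. When the two sides disagree about a
// location, we drop it and flag both claimants as `pessimized'.
std::map<int, int> mergeMemorySupport(const std::map<int, int>& a,
                                      const std::map<int, int>& b,
                                      std::set<int>& pessimized) {
  std::map<int, int> out = a;
  for (auto& kv : b) {
    auto it = out.find(kv.first);
    if (it == out.end()) {
      out.insert(kv);                 // supported on one side only: keep
    } else if (it->second != kv.second) {
      pessimized.insert(kv.second);   // two sets claim one location
      pessimized.insert(it->second);
      out.erase(it);
    }
  }
  return out;
}
```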
Another thing to note about memory support is that we may have more bits set
than the current lower bound for an object. This situation can arise due to
conservatively reducing the lower bound, or due to pure stores happening before
IncRef instructions that raise the lower bound for that new pointer.

Most of the complexity in this analysis is related to instructions that load or
store from memory, and therefore interacts with memory support. There are
enough details to discuss it further in the next several sections of this doc.
-- About Loads --

On entry to a region, it is assumed that normal VM reference count invariants
hold for all memory---specifically, each reference count on each object in the
heap is exactly the number of live pointers to that object. And in general,
accesses to memory must maintain this invariant when they are done outside of
small regions that may temporarily break that invariant. We make use of this
fact to increase object lower bounds.

Accesses to memory within an IR compilation unit may be lowered into
instructions that separate reference count manipulation from stores and loads
(which is necessary for this pass to be able to optimize the former), so we
can't just assume loading a value from somewhere implies that there is a
reference count on the object, since our input program itself may have changed
that. Furthermore, our input program may contain complex instructions other
than lowered stores that can "store over" memory locations we've already loaded
from, with a decref of the old value, and our analysis pass needs to reduce
lower bounds when we see those situations if we were using that memory location
to increase a lower bound on a loaded value.

To accomplish this, first this module performs a forward dataflow analysis to
compute program locations at which each memory location assigned an id by
AliasAnalysis is known to be "balanced" with regard to reference counting.
The gist of this is that if the last thing to manipulate a memory location must
have been code outside of this region, future loads from the memory location
define SSATmps that we know must have a lower bound of 1, corresponding to the
live pointer in that memory location. However, when this analysis observes a
PureStore---i.e. a lowered, within-region store that does not imply reference
counting---a future load does not imply anything about the reference count,
because the program may have potentially written a pointer there that is not
yet "balanced" (we would need to see an IncRef or some other instruction
associated with the stored value to know that it has a reference).
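Restricted to a straight-line sequence of memory operations, the "balanced" idea reduces to a simple forward scan. The MemOp representation is an illustrative stand-in for the real IR and the fixed-point dataflow over the CFG:

```cpp
#include <vector>

// Toy forward scan for "balanced" locations: every location starts
// balanced (VM invariants hold on region entry), and a PureStore makes
// its target unbalanced. A load from a balanced location lets us claim
// a lower bound of 1 for the loaded value; otherwise we claim nothing.
struct MemOp { enum Kind { PureStore, Load } kind; int loc; };

std::vector<int> loadLowerBounds(const std::vector<MemOp>& ops, int nLocs) {
  std::vector<bool> balanced(nLocs, true);
  std::vector<int> bounds;   // one entry per Load, in order
  for (auto& op : ops) {
    if (op.kind == MemOp::PureStore) {
      balanced[op.loc] = false;  // stored pointer not yet accounted for
    } else {
      bounds.push_back(balanced[op.loc] ? 1 : 0);
    }
  }
  return bounds;
}
```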
Using the results of this analysis, we can add to the lower bound of some
must-alias-sets when we see loads from locations that are known to be balanced
at that program point. When we do this, we must also track that the object has
a pointer in memory, which could cause a reduction in the lower bound later if
someone could decref it through that pointer, so the location must be added to
the memory support bitvector for that must-alias-set. Whenever we see complex
instructions that may store to memory with the normal VM semantics of decrefing
old values, if they could overwrite locations that are currently "supporting"
one of our alias sets, we need to remove one from the alias set's lower bound
in case it decided to overwrite (and decref) the pointer that was in memory.

The "exclusivity" guarantee of our lower bounds requires that if we want to
raise the lower bound of an object because it was loaded from a memory location
known to be "balanced", then we only raise the lower bound for this reason on
at most one must-alias-set at a time. This means if we see a load from a
location that is known to contain a balanced pointer, but we were already using
that location as memory support on a different set, we either need to remove
one from the lower bound of the other set before adding one to the new set, or
leave everything alone. This commonly happens across php calls right now,
where values must be reloaded from memory because SSATmps can't span calls.

The way this pass currently handles this is the following: if we can reduce the
lower bound on the other set (because it has a non-zero lower bound), we'll
take the first choice, since the previously supported aset will probably not be
used again if we're spanning a call. On the other hand, if the old set has a
lower bound of zero, so we can't compensate for removing it, we leave
everything alone.
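That support-transfer policy is small enough to sketch directly. The SupportState layout is an illustrative assumption (the real pass keeps bitvectors per aset, not a location-to-set array):

```cpp
#include <vector>

// Sketch of taking a balanced load. Raising the loaded set's lower
// bound requires exclusive use of the location as memory support, so if
// another set already uses that location we may "steal" the support
// only when we can compensate by reducing the old set's bound.
struct SupportState {
  std::vector<int> lowerBound;   // per must-alias-set
  std::vector<int> supportedBy;  // location -> aset id, or -1 for none
};

void loadBalanced(SupportState& st, int loc, int loadedSet) {
  int prev = st.supportedBy[loc];
  if (prev == loadedSet) return;            // already supporting this set
  if (prev != -1) {
    if (st.lowerBound[prev] == 0) return;   // can't compensate: leave alone
    --st.lowerBound[prev];                  // pay for moving the support
  }
  st.supportedBy[loc] = loadedSet;
  ++st.lowerBound[loadedSet];
}
```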
-- Effects of Pure Stores on Memory Support --

There are two main kinds of stores from the perspective of this module. There
are lowered stores (PureStore and PureSpillFrame) that happen within our IR
compilation unit, and don't imply reference count manipulation, and there are
stores that happen with "hhbc semantics" outside of the visibility of this
compilation unit, which imply decreffing the value that used to live in a
memory location as it's replaced with a new one. This module needs some
understanding of both types, and both of these types of stores affect memory
support, but in different ways.

For any instruction that may do non-lowered stores outside of our unit ("stores
with hhbc semantics"), if the location(s) it may be storing to could be
supporting the lower bound in any must-alias-set, we should remove the support
and decrement the lower bound, because it could DecRef the value in order to
replace it with something else. If we can't actually reduce the lower bound
(because it's already zero), we must leave the memory support flag alone,
because we haven't really accounted for the reference that lived in that memory
location, and it might not have actually been DecRef'd at that program point,
and could be DecRef'd later after we've seen another IncRef. If we didn't
leave the bit alone in this situation, the lower bound could end up too high
after a later IncRef.

On the other hand, for a PureStore with a known destination, we don't need to
reduce the lower bound of any set that was supported by that location, since it
never implies a DecRef. If the input IR program itself is trying to store to
that location "with hhbc semantics", then the program will also explicitly
contain the other lowered parts of this high level store, including any
appropriate loads and DecRefs of the old value, so we won't miss their effects.
So, for a PureStore we can simply mark the location as no-longer providing
memory support on the set it used to, but leave the lower bound alone.

The final case is a PureStore to an unknown location (either because it was not
supplied an AliasAnalysis id, or because it stored to something like a PtrToGen
that could refer to anything in memory). In this situation, it may or may not
be overwriting a location we had been using for memory support---however, it's
harmless to leave that state alone, with the following rationale:

If it doesn't actually overwrite it, then obviously things are the same, and
we're good. On the other hand, if it does actually overwrite it, then we don't
need to adjust the lower bound still, because it's a pure store (i.e. for the
same reason we didn't reduce the lower bound in the case above where we knew
where the store was going). If we do nothing to our state, the only difference
from the known location, then, is that we may have "unnecessarily" left a
must-alias-set marked as getting memory support when it doesn't need to be
anymore. But the point of marking part of a lower bound as coming from memory
support is just so that future stores (or loads) can potentially /reduce/ its
lower bound, so at worst it could reduce it later when it wouldn't really have
needed to if we had better information about where the store was going. In
other words, it can be thought of as an optimization to clear the memory
support state when we see a PureStore with a known target location: it's not
required for correctness.
-- Effects of Pure Stores on the Must-Alias-Set Being Stored --

The other thing to take into account with stores is that they put a (possibly
new) pointer in memory, which means it now could be loaded and DecRef'd later,
possibly by code we can't directly see in our compilation unit. To handle
this, we can divide things into four situations, based on two boolean
attributes: whether or not we have an AliasAnalysis bit for the location being
stored to ("known" vs "unknown" location), and whether or not the lower bound
on the must-alias-set for the value being stored is currently above zero.

The reason the lower bound matters when we see the store is the following:
we've possibly created a pointer in memory, which could be DecRef'd later, but
if the lower bound is zero we don't have a way to account for that, since it
can't go negative. It's not ok to just ignore this. Take the following
example, where t1 has a lower bound of zero:

   StMem ptr, t1
   IncRef t1
   IncRef t1
   RaiseWarning "something"  // any instruction that can re-enter and decref t1
   DecRef t1
   ...

If we simply ignored the fact that a new pointer has been created at the store,
that means the lower bound would be two after the two IncRefs, with no memory
support flags. Then when we see the RaiseWarning, we won't know we need to
reduce the lower bound, since we didn't account for the store, and now we'll
think we can change the DecRef to DecRefNZ, but this is not actually a sound
transformation.

If the input program is not malformed, it will in fact be doing a 'balancing'
IncRef for any new pointers it creates, before anything could access it---in
fact it may have done that before the store, but our analysis in general
could've lost that information in the tracked lower bound because of a
May-Alias decref, or because it was done through an SSATmp that is mapped to a
different must-alias-set that actually is the same object (although we don't
know).
With this in mind, we'll discuss all four cases:

 Unknown target, Zero LB:

     We flag all must-alias-sets as "pessimized". This state inserts a Halt
     node in each of the RC flowgraphs, and stops all optimizations along that
     control flow path: it prevents us from doing anything else in any
     successor blocks.

 Known target, Zero LB:

     Unlike the above, this case is not that uncommon. Since we know where it
     is going, we don't have to give up on everything. Instead, we leave the
     lower bound at zero, but set a memory support bit for the new location.
     Recall that we can in general have more memory support locations for one
     aset than the tracked lower bound---this is one situation that can cause
     that.

 Unknown target, Non-Zero LB:

     We don't know where the store is going, but we can account for balancing
     the possibly-new pointer. In this case, we decrement the lower bound and
     just eagerly behave as if the must-alias-set for the stored value may be
     decref'd right there. Since the lower bound is non-zero, we don't need to
     worry about changing lower bounds in other sets that May-Alias this one,
     because of the "exclusivity" rule for lower bounds.

 Known target, Non-Zero LB:

     Since we know where the new pointer will be, similar to the second case,
     we don't need to reduce the lower bound yet---we can wait until we see an
     instruction that might decref our object through that pointer. In this
     case, we can just mark the target location as memory support for the
     must-alias-set for the stored value, and leave its lower bound alone.
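The four-way case split can be summarized as a small decision function. The State layout and the `loc == -1` convention for an unknown target are illustrative assumptions, not the pass's real representation:

```cpp
#include <vector>

// Sketch of the four pure-store cases: loc == -1 stands for an unknown
// target; the support bits and lower bounds stand in for real pass state.
struct State {
  std::vector<int> lowerBound;             // per must-alias-set
  std::vector<std::vector<bool>> support;  // aset -> location bits
  bool pessimized = false;
};

void pureStore(State& st, int loc, int storedSet) {
  bool knownLoc  = loc >= 0;
  bool nonZeroLB = st.lowerBound[storedSet] > 0;
  if (!knownLoc && !nonZeroLB) {
    st.pessimized = true;                  // give up on this path entirely
  } else if (knownLoc && !nonZeroLB) {
    st.support[storedSet][loc] = true;     // support can exceed the bound
  } else if (!knownLoc) {
    --st.lowerBound[storedSet];            // eagerly balance the new pointer
  } else {
    st.support[storedSet][loc] = true;     // defer: track support, keep LB
  }
}
```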
-- More about Memory --

Another consideration about memory in this module arises from the fact that our
analysis passes make no attempt to track which object pointers may be escaped.
For that matter, much of the optimization we currently do here is removing
redundant reference counting of locals and eval stack slots, which arises from
lowering the HHBC stack machine semantics to HHIR---generally speaking these
values could be accessible through the heap as far as we know. This is
important because it means that we can make no transformations to the program
that would affect the behavior of increfs or decrefs in memory locations we
aren't tracking, on the off chance they happen to contain a pointer to one of
our tracked objects.

The way we maintain correctness here is to never move or eliminate reference
counting operations unless we know about at least /two/ references to the
object being counted. The importance of this is easiest to illustrate with
delayed increfs (relevant to rules inc_pass_req, inc_pass_phi, and
inc_pass_sig), although it applies to inc/dec pair removal also: it is fine to
move an incref forward in the IR instruction stream, as long as nothing could
observe the difference between the reference count the object "should" have,
and the one it will have after we delay the incref. We need to consider how
reachability from the heap can affect this.

If the lower bound at an incref instruction is two or greater, we know we can
push the incref down as much as we want (basically until we reach an exit from
the compilation unit, or until we reach something that may decref the object
and reduce the lower bound). On the other hand, if the lower bound before the
incref is zero, in order to move the incref forward, we would need to stop at
any instruction that could decref /anything/ in any memory location, since
we're making the assumption that there may be other live pointers to the
object---if we were to push that incref forward, we could change whether other
pointers to the object are considered the last reference, and cause a decref to
free the object when it shouldn't. (We could try to do this on the rc
flowgraphs, but at least in a trivial implementation it would lead to a much
larger number of flowgraph nodes, so instead we leave easy cases to a separate,
local, "remove_trivial_incdecs" pass and ignore hard cases.)

The above two cases are relatively straightforward. The remaining case is when
the lower bound before an incref is one. It turns out to be safe to sink in
this case, and it fits the idea that we "know about two references". Whatever
caused the lower bound to be one before the incref will ensure that the
object's liveness is not affected---here's why:

There are two possibilities: the object is either reachable through at least
one unknown pointer, or it isn't. If it isn't, then the safety of moving the
incref is relatively straightforward: we'll be pushing the actual /second/
reference down, and it is safe to push it as long as we don't move it through
something that may decref it (or until we reach an exit from the compilation
unit). For the other possibility, it is sufficient to consider only having one
unknown pointer: in this situation, we're pushing the actual /third/ reference
down, and if anything decrefs the object through the pointer we don't know
about, it will still know not to free it because we left the second reference
alone (whatever was causing our lower bound to be one), and therefore a decref
through this unknown pointer won't think it is removing the last reference.

Also worth discussing is that there are several runtime objects in the VM with
operations that have behavioral differences based on whether the reference
count is greater than one. For instance, types like KindOfString and
KindOfArray do in place updates when they have a refcount of one, and KindOfRef
is treated "observably" as a php reference only if the refcount is greater than
one. Making sure we don't change these situations is actually the same
condition as discussed above: by the above scheme for not changing whether
pointers we don't know about constitute the last counted reference to an
object, we are both preventing decrefs from going to zero when they shouldn't,
and preventing modifications to objects from failing to COW when they should.

A fundamental meta-rule that arises out of all the above considerations for any
of the RC flowgraph transformation rules is that we cannot move (or remove)
increfs unless the lower bound on the incref node is at least one (meaning
after the incref we "know about two references"). Similarly, anything that
could reduce the lower bound must put a node in the RC flowgraph to update that
information (a Req{1} node usually) so we don't push increfs too far or remove
them when we shouldn't.
- -- "Trivial" incdec removal pass --
- This module also contains a local optimization that removes IncRef/DecRefNZ
- pairs in a block that have no non-"pure" memory-accessing instructions in
- between them.
- This optimization can be performed without regard to the lower bound of any
- objects involved, and the DecRef -> DecRefNZ transformations the rest of the
- code makes can create situations where these opportunities are visible. Some
- of these situations would be removable by the main pass if we had a more
- complicated scheme for dealing with "unknown heap pointers" (i.e. the stuff in
- the "more about memory" section described above). But other situations may
- also occur because the main pass may create unnecessary Req nodes in the middle
- of code sequences that don't really observe references when we're dealing with
- unrelated PureStores of possibly-aliasing tmps that have lower bounds of zero.
- In general this pass's correctness is simple to reason about, and it cleans
- up some things the main pass can miss, so it is easier to do some of the work
- this way than to complicate the main pass further.
- */
- //////////////////////////////////////////////////////////////////////
- #include "hphp/runtime/vm/jit/opt.h"
- #include <algorithm>
- #include <array>
- #include <cstdio>
- #include <limits>
- #include <sstream>
- #include <string>
- #include <tuple>
- #include <boost/dynamic_bitset.hpp>
- #include <folly/Conv.h>
- #include <folly/Format.h>
- #include <folly/ScopeGuard.h>
- #include "hphp/util/dataflow-worklist.h"
- #include "hphp/util/match.h"
- #include "hphp/util/safe-cast.h"
- #include "hphp/util/trace.h"
- #include "hphp/runtime/vm/jit/alias-analysis.h"
- #include "hphp/runtime/vm/jit/analysis.h"
- #include "hphp/runtime/vm/jit/block.h"
- #include "hphp/runtime/vm/jit/cfg.h"
- #include "hphp/runtime/vm/jit/containers.h"
- #include "hphp/runtime/vm/jit/ir-instruction.h"
- #include "hphp/runtime/vm/jit/ir-unit.h"
- #include "hphp/runtime/vm/jit/memory-effects.h"
- #include "hphp/runtime/vm/jit/mutation.h"
- #include "hphp/runtime/vm/jit/pass-tracer.h"
- #include "hphp/runtime/vm/jit/state-vector.h"
- #include "hphp/runtime/vm/jit/timer.h"
- namespace HPHP { namespace jit {
- namespace {
- TRACE_SET_MOD(hhir_refcount);
- //////////////////////////////////////////////////////////////////////
- /*
- * Ids of must-alias-sets. We use -1 as an invalid id.
- */
- using ASetID = int32_t;
- struct MustAliasSet {
- explicit MustAliasSet(Type widestType, SSATmp* representative)
- : widestType(widestType)
- , representative(representative)
- {}
- /*
- * Widest type for this MustAliasSet, used for computing may-alias
- * information.
- *
- * Because of how we build MustAliasSets (essentially canonical(), or groups
- * of LdCtx instructions), it is guaranteed that this widestType includes all
- * possible values for the set. However it is not the case that every tmp in
- * the set necessarily has a subtype of widestType, because of situations
- * that can occur with AssertType and interface types. This does not affect
- * correctness, but it's worth being aware of.
- */
- Type widestType;
- /*
- * A representative of the set. This is only used for debug tracing, and is
- * currently the first instruction (in an rpo traversal) that defined a tmp
- * in the must-alias-set. (I.e. it'll be the canonical() tmp, or the first
- * LdCtx we saw.)
- */
- SSATmp* representative;
- /*
- * Set of ids of the other MustAliasSets that this set may alias, in a flow
- * insensitive way, and not including this set itself. This is based only on
- * the type of the representative. See the comments at the top of this file.
- */
- jit::flat_set<ASetID> may_alias;
- };
- //////////////////////////////////////////////////////////////////////
- // Analysis results for memory locations known to contain balanced reference
- // counts. See populate_mrinfo.
- struct MemRefAnalysis {
- struct BlockInfo {
- uint32_t rpoId;
- ALocBits avail_in;
- ALocBits avail_out;
- ALocBits kill;
- ALocBits gen;
- };
- explicit MemRefAnalysis(IRUnit& unit) : info(unit, BlockInfo{}) {}
- StateVector<Block,BlockInfo> info;
- };
- //////////////////////////////////////////////////////////////////////
- // Per must-alias-set state information for rc_analyze.
- struct ASetInfo {
- /*
- * A lower bound of the actual reference count of the object that this alias
- * set refers to. See "RC lower bounds" in the documentation---there are
- * some subtleties here.
- */
- int32_t lower_bound{0};
- /*
- * Set of memory location ids that are being used to support the lower bound
- * of this object. The purpose of this set is to reduce lower bounds when we
- * see memory events that might decref a pointer: this means it's never
- * incorrect to leave a bit set in memory_support conservatively, but there
- * are situations where we must set bits here or our analysis will be wrong.
- *
- * An important note is that the bits in memory_support can represent memory
- * locations that possibly alias (via ALocMeta::conflicts). Setting only one
- * bit from the conflict set is sufficient when we know something must be in
- * memory in the set---any memory effects that can affect other may-aliasing
- * locations will still apply to all of them as needed.
- *
- * However, whenever we handle removing memory support, if you need to remove
- * one bit, you generally speaking are going to need to remove the support
- * for the whole conflict set.
- */
- ALocBits memory_support;
- /*
- * Sometimes we lose too much track of what's going on to do anything useful.
- * In this situation, all the sets get flagged as `pessimized', we don't do
- * anything to them anymore, and a Halt node is added to all graphs.
- *
- * Note: right now this state is per-ASetInfo, but we must pessimize
- * everything at once if we pessimize anything, because of how the analyzer
- * will lose track of aliasing effects. (We will probably either change it to
- * be per-RCState later or fix the alias handling.)
- */
- bool pessimized{false};
- };
- // State structure for rc_analyze.
- struct RCState {
- bool initialized{false};
- jit::vector<ASetInfo> asets;
- /*
- * MemRefAnalysis availability state. This is just part of this struct for
- * convenience when stepping through RCAnalysis results. It is used to know
- * when loads can provide memory support.
- */
- ALocBits avail;
- /*
- * Map from AliasClass ids to the must-alias-set that has it as
- * memory_support, if any do. At most one ASet will be supported by any
- * location at a time, to fit the "exclusivity" condition on lower bounds.
- * The mapped value is -1 if no ASet is currently supported by that location.
- */
- std::array<ASetID,kMaxTrackedALocs> support_map;
- };
- // The analysis result structure for rc_analyze. This structure gets fed into
- // build_graphs to create our RC graphs.
- struct RCAnalysis {
- struct BlockInfo {
- uint32_t rpoId;
- RCState state_in;
- };
- explicit RCAnalysis(IRUnit& unit) : info(unit, BlockInfo{}) {}
- StateVector<Block,BlockInfo> info;
- };
- //////////////////////////////////////////////////////////////////////
- struct Env {
- explicit Env(IRUnit& unit)
- : unit(unit)
- , rpoBlocks(rpoSortCfg(unit))
- , idoms(findDominators(unit, rpoBlocks, numberBlocks(unit, rpoBlocks)))
- , ainfo(collect_aliases(unit, rpoBlocks))
- , mrinfo(unit)
- , asetMap(unit, -1)
- {}
- IRUnit& unit;
- BlockList rpoBlocks;
- IdomVector idoms;
- Arena arena;
- AliasAnalysis ainfo;
- MemRefAnalysis mrinfo;
- StateVector<SSATmp,ASetID> asetMap; // -1 is invalid (not-Counted tmps)
- jit::vector<MustAliasSet> asets;
- };
- //////////////////////////////////////////////////////////////////////
- /*
- * Nodes in the RC flowgraphs.
- */
- enum class NT : uint8_t { Inc, Dec, Req, Phi, Sig, Halt, Empty };
- struct Node {
- Node* next{nullptr};
- Node* prev{nullptr}; // unused for Phi nodes, as they may have more than one pred
- int32_t lower_bound{0};
- NT type;
- // Counter used by optimize pass to wait to visit Phis until after
- // non-backedge predecessors.
- int16_t visit_counter{0};
- protected:
- explicit Node(NT type) : type(type) {}
- Node(const Node&) = default;
- Node& operator=(const Node&) = default;
- };
- /*
- * IncRef and DecRef{NZ,} nodes.
- */
- struct NInc : Node {
- explicit NInc(IRInstruction* inst) : Node(NT::Inc), inst(inst) {}
- IRInstruction* inst;
- };
- struct NDec : Node {
- explicit NDec(IRInstruction* inst) : Node(NT::Dec), inst(inst) {}
- IRInstruction* inst;
- };
- /*
- * Control flow splits and joins.
- */
- struct NPhi : Node {
- explicit NPhi(Block* block) : Node(NT::Phi), block(block) {}
- Block* block;
- Node** pred_list{nullptr};
- uint32_t pred_list_cap{0};
- uint32_t pred_list_sz{0};
- uint32_t back_edge_preds{0};
- };
- struct NSig : Node {
- explicit NSig(Block* block) : Node(NT::Sig), block(block) {}
- Block* block;
- Node* taken{nullptr};
- };
- /*
- * Halt means to stop processing along this control flow path---something
- * during analysis had to pessimize and we can't continue.
- *
- * When we've pessimized a set, we also guarantee that all successors have a
- * lower_bound of zero, which will block all rcfg transformation rules from
- * applying, so it's actually not necessary to halt---it just prevents
- * processing parts of the graph unnecessarily.
- *
- * For the case of join points which were halted on one side, optimize_graph
- * will not process through the join because the visit_counter will never be
- * high enough. In the case of back edges, it may process through the loop
- * unnecessarily, but it won't make any illegal transformations because the
- * lower_bound will be zero.
- */
- struct NHalt : Node { explicit NHalt() : Node(NT::Halt) {} };
- /*
- * Empty nodes are useful for building graphs, since not every node type can
- have control flow edges, but they have no meaning later.
- */
- struct NEmpty : Node { explicit NEmpty() : Node(NT::Empty) {} };
- /*
- * Req nodes mean the reference count of the object may be observed, up to some
- * "level". The level is a number we have to keep the lower_bound above to
- * avoid changing program behavior. It will be INT32_MAX on exits from the
- * compilation unit.
- */
- struct NReq : Node {
- explicit NReq(int32_t level) : Node(NT::Req), level(level) {}
- int32_t level;
- };
- #define X(Kind, kind) \
- UNUSED N##Kind* to_##kind(Node* n) { \
- assertx(n->type == NT::Kind); \
- return static_cast<N##Kind*>(n); \
- } \
- UNUSED const N##Kind* to_##kind(const Node* n) { \
- return to_##kind(const_cast<Node*>(n)); \
- }
- X(Inc, inc)
- X(Dec, dec)
- X(Req, req)
- X(Phi, phi)
- X(Sig, sig)
- X(Halt, halt)
- X(Empty, empty)
- #undef X
- //////////////////////////////////////////////////////////////////////
- template<class Kill, class Gen>
- void mrinfo_step_impl(Env& env,
- const IRInstruction& inst,
- Kill kill,
- Gen gen) {
- auto do_store = [&] (AliasClass dst, SSATmp* value) {
- /*
- * Pure stores potentially (temporarily) break the heap's reference count
- * invariants on a memory location, but only if the value being stored is
- * possibly counted.
- */
- if (value->type().maybe(TCounted)) {
- kill(env.ainfo.may_alias(canonicalize(dst)));
- }
- };
- auto const effects = memory_effects(inst);
- match<void>(
- effects,
- [&] (IrrelevantEffects) {},
- [&] (ExitEffects) {},
- [&] (ReturnEffects) {},
- [&] (GeneralEffects) {},
- [&] (UnknownEffects) { kill(ALocBits{}.set()); },
- [&] (PureStore x) { do_store(x.dst, x.value); },
- /*
- * Note that loads do not kill a location. In fact, it's possible that the
- * IR program itself could cause a location to not be `balanced' using only
- * PureLoads. (For example, it could load a local to decref it as part of
- * a return sequence.)
- *
- * It's safe not to add it to the kill set, though, because if the IR
- * program is destroying a memory location, it is already malformed if it
- * loads the location again and then uses it in a way that relies on the
- * pointer still being dereferenceable. Moreover, in these situations,
- * even though the avail bit from mrinfo will be set on the second load, we
- * won't be able to remove support from the previous aset, and won't raise
- * the lower bound on the new loaded value.
- */
- [&] (PureLoad) {},
- /*
- * Since there's no semantically correct way to do PureLoads from the
- * locations in a PureSpillFrame unless something must have stored over
- * them again first, we don't need to kill anything here.
- */
- [&] (PureSpillFrame) {},
- [&] (CallEffects x) {
- /*
- * Because PHP callees can side-exit (or for that matter throw from their
- * prologue), the program is ill-formed unless we have balanced reference
- * counting for all memory locations. Even if the call has the
- * destroys_locals flag this is the case---after it destroys the locals
- * the new value will have a fully synchronized reference count.
- *
- * This may need modifications after we allow php values to span calls in
- * SSA registers.
- */
- gen(ALocBits{}.set());
- }
- );
- }
- // Helper for stepping after we've created a MemRefAnalysis.
- void mrinfo_step(Env& env, const IRInstruction& inst, ALocBits& avail) {
- mrinfo_step_impl(
- env,
- inst,
- [&] (ALocBits kill) { avail &= ~kill; },
- [&] (ALocBits gen) { avail |= gen; }
- );
- }
- /*
- * Perform an analysis to determine memory locations that are known to hold
- * "balanced" values with respect to reference counting. This means the
- * location "owns" a reference in the normal sense---i.e. the count on the
- * object is at least one on account of the pointer in that location.
- *
- * Normal ("hhbc-semantics") operations on php values in memory all preserve
- * balanced reference counts (i.e. a pointer in memory corresponds to one value
- * in the count field of the pointee). However, when we lower hhbc opcodes to
- * HHIR, some opcodes split up the reference counting operations from the
- * memory operations: when we observe a "pure store" instruction, therefore,
- * the location involved may no longer be "balanced" with regard to reference
- * counting. See further discussion in the doc comment at the top of this
- * file.
- */
- void populate_mrinfo(Env& env) {
- FTRACE(1, "populate_mrinfo ---------------------------------------\n");
- FTRACE(3, "locations:\n{}\n", show(env.ainfo));
- /*
- * 1. Compute block summaries.
- */
- for (auto& blk : env.rpoBlocks) {
- for (auto& inst : blk->instrs()) {
- mrinfo_step_impl(
- env,
- inst,
- [&] (ALocBits kill) {
- env.mrinfo.info[blk].kill |= kill;
- env.mrinfo.info[blk].gen &= ~kill;
- },
- [&] (ALocBits gen) {
- env.mrinfo.info[blk].gen |= gen;
- env.mrinfo.info[blk].kill &= ~gen;
- }
- );
- }
- }
- FTRACE(3, "summaries:\n{}\n",
- [&] () -> std::string {
- auto ret = std::string{};
- for (auto& blk : env.rpoBlocks) {
- folly::format(&ret, " B{: <3}: {}\n"
- " : {}\n",
- blk->id(),
- show(env.mrinfo.info[blk].kill),
- show(env.mrinfo.info[blk].gen)
- );
- }
- return ret;
- }()
- );
- /*
- * 2. Find fixed point of avail_in:
- *
- * avail_out = avail_in - kill + gen
- * avail_in = isect(pred) avail_out
- *
- * Locations that are marked "avail" mean they imply a non-zero lower bound
- * on the object they point to, if they contain a reference counted type, and
- * assuming they are actually legal to load from.
- */
- auto incompleteQ = dataflow_worklist<uint32_t>(env.rpoBlocks.size());
- for (auto rpoId = uint32_t{0}; rpoId < env.rpoBlocks.size(); ++rpoId) {
- env.mrinfo.info[env.rpoBlocks[rpoId]].rpoId = rpoId;
- }
- // avail_outs all default construct to zeros.
- // avail_in on the entry block (with no preds) will be set to all 1 below.
- incompleteQ.push(0);
- do {
- auto const blk = env.rpoBlocks[incompleteQ.pop()];
- auto& binfo = env.mrinfo.info[blk];
- binfo.avail_in.set();
- blk->forEachPred([&] (Block* pred) {
- binfo.avail_in &= env.mrinfo.info[pred].avail_out;
- });
- auto const old = binfo.avail_out;
- binfo.avail_out = (binfo.avail_in & ~binfo.kill) | binfo.gen;
- if (binfo.avail_out != old) {
- if (auto const t = blk->taken()) {
- incompleteQ.push(env.mrinfo.info[t].rpoId);
- }
- if (auto const n = blk->next()) {
- incompleteQ.push(env.mrinfo.info[n].rpoId);
- }
- }
- } while (!incompleteQ.empty());
- FTRACE(4, "fixed point:\n{}\n",
- [&] () -> std::string {
- auto ret = std::string{};
- for (auto& blk : env.rpoBlocks) {
- folly::format(&ret, " B{: <3}: {}\n",
- blk->id(),
- show(env.mrinfo.info[blk].avail_in)
- );
- }
- return ret;
- }()
- );
- }
- //////////////////////////////////////////////////////////////////////
- using HPHP::jit::show;
- DEBUG_ONLY std::string show(const boost::dynamic_bitset<>& bs) {
- std::ostringstream out;
- out << bs;
- return out.str();
- }
- /*
- * This helper for weaken_decrefs reports uses of reference-counted values that
- * imply that their reference count cannot be zero (or it would be a bug).
- * This includes any use of an SSATmp that implies the pointer isn't already
- * freed.
- *
- * For now, it's limited to the reference counting operations on these values,
- because other types of uses will need to be evaluated on a per-instruction
- * basis: we can't just check instruction srcs blindly to find these types of
- * uses, because in general a use of an SSATmp with a reference-counted pointer
- * type (like Obj, Arr, etc), implies only a use of the SSA-defined pointer
- * value (i.e. the pointer bits sitting in a virtual SSA register), not
- * necessarily of the value pointed to, which is what we care about here and
- * isn't represented in SSA.
- */
- template<class Gen>
- void weaken_decref_step(const Env& env, const IRInstruction& inst, Gen gen) {
- switch (inst.op()) {
- case DecRef:
- case DecRefNZ:
- case IncRef:
- {
- auto const asetID = env.asetMap[inst.src(0)];
- if (asetID != -1) gen(asetID);
- }
- break;
- default:
- break;
- }
- }
- /*
- * Backward pass that weakens DecRefs to DecRefNZ if they cannot go to zero
- * based on future use of the value that they are DecRefing. See "Weakening
- * DecRefs" in the doc comment at the top of this file.
- */
- void weaken_decrefs(Env& env) {
- FTRACE(2, "weaken_decrefs ----------------------------------------\n");
- auto const poBlocks = [&] {
- auto ret = env.rpoBlocks;
- std::reverse(begin(ret), end(ret));
- return ret;
- }();
- /*
- * 0. Initialize block state structures and put all blocks in the worklist.
- */
- auto incompleteQ = dataflow_worklist<uint32_t>(poBlocks.size());
- struct BlockInfo {
- BlockInfo() {}
- uint32_t poId;
- boost::dynamic_bitset<> in_used;
- boost::dynamic_bitset<> out_used;
- boost::dynamic_bitset<> gen;
- };
- StateVector<Block,BlockInfo> blockInfos(env.unit, BlockInfo{});
- for (auto poId = uint32_t{0}; poId < poBlocks.size(); ++poId) {
- auto const blk = poBlocks[poId];
- blockInfos[bl