PageRenderTime 42ms CodeModel.GetById 25ms app.highlight 5ms RepoModel.GetById 1ms app.codeStats 1ms

/compiler/specialise/SpecConstr.lhs

https://github.com/luite/ghc
Haskell | 1872 lines | 1278 code | 334 blank | 260 comment | 71 complexity | 7806c64adb5e6f9087b3fd713c4eb6c9 MD5 | raw file

Large files files are truncated, but you can click here to view the full file

   1ToDo [Nov 2010]
   2~~~~~~~~~~~~~~~
   31. Use a library type rather than an annotation for ForceSpecConstr
   42. Nuke NoSpecConstr
   5
   6%
   7% (c) The GRASP/AQUA Project, Glasgow University, 1992-1998
   8%
   9\section[SpecConstr]{Specialise over constructors}
  10
  11\begin{code}
  12module SpecConstr(
  13        specConstrProgram
  14#ifdef GHCI
  15        , SpecConstrAnnotation(..)
  16#endif
  17    ) where
  18
  19#include "HsVersions.h"
  20
  21import CoreSyn
  22import CoreSubst
  23import CoreUtils
  24import CoreUnfold       ( couldBeSmallEnoughToInline )
  25import CoreFVs          ( exprsFreeVars )
  26import CoreMonad
  27import Literal          ( litIsLifted )
  28import HscTypes         ( ModGuts(..) )
  29import WwLib            ( mkWorkerArgs )
  30import DataCon
  31import Coercion         hiding( substTy, substCo )
  32import Rules
  33import Type             hiding ( substTy )
  34import TyCon            ( isRecursiveTyCon )
  35import Id
  36import MkCore           ( mkImpossibleExpr )
  37import Var
  38import VarEnv
  39import VarSet
  40import Name
  41import BasicTypes
  42import DynFlags         ( DynFlags(..) )
  43import StaticFlags      ( opt_PprStyle_Debug )
  44import Maybes           ( orElse, catMaybes, isJust, isNothing )
  45import Demand
  46import Serialized       ( deserializeWithData )
  47import Util
  48import Pair
  49import UniqSupply
  50import Outputable
  51import FastString
  52import UniqFM
  53import MonadUtils
  54import Control.Monad    ( zipWithM )
  55import Data.List
  56
  57
  58-- See Note [SpecConstrAnnotation]
  59#ifndef GHCI
  60type SpecConstrAnnotation = ()
  61#else
  62import TyCon            ( TyCon )
  63import GHC.Exts( SpecConstrAnnotation(..) )
  64#endif
  65\end{code}
  66
  67-----------------------------------------------------
  68                        Game plan
  69-----------------------------------------------------
  70
  71Consider
  72        drop n []     = []
  73        drop 0 xs     = []
  74        drop n (x:xs) = drop (n-1) xs
  75
  76After the first time round, we could pass n unboxed.  This happens in
  77numerical code too.  Here's what it looks like in Core:
  78
  79        drop n xs = case xs of
  80                      []     -> []
  81                      (y:ys) -> case n of
  82                                  I# n# -> case n# of
  83                                             0 -> []
  84                                             _ -> drop (I# (n# -# 1#)) xs
  85
  86Notice that the recursive call has an explicit constructor as argument.
  87Noticing this, we can make a specialised version of drop
  88
  89        RULE: drop (I# n#) xs ==> drop' n# xs
  90
  91        drop' n# xs = let n = I# n# in ...orig RHS...
  92
  93Now the simplifier will apply the specialisation in the rhs of drop', giving
  94
  95        drop' n# xs = case xs of
  96                      []     -> []
  97                      (y:ys) -> case n# of
  98                                  0 -> []
  99                                  _ -> drop' (n# -# 1#) xs
 100
 101Much better!
 102
 103We'd also like to catch cases where a parameter is carried along unchanged,
 104but evaluated each time round the loop:
 105
 106        f i n = if i>0 || i>n then i else f (i*2) n
 107
 108Here f isn't strict in n, but we'd like to avoid evaluating it each iteration.
 109In Core, by the time we've w/wd (f is strict in i) we get
 110
 111        f i# n = case i# ># 0 of
 112                   False -> I# i#
 113                   True  -> case n of { I# n# ->
 114                            case i# ># n# of
 115                                False -> I# i#
 116                                True  -> f (i# *# 2#) n
 117
 118At the call to f, we see that the argument, n is known to be (I# n#),
 119and n is evaluated elsewhere in the body of f, so we can play the same
 120trick as above.
 121
 122
 123Note [Reboxing]
 124~~~~~~~~~~~~~~~
 125We must be careful not to allocate the same constructor twice.  Consider
 126        f p = (...(case p of (a,b) -> e)...p...,
 127               ...let t = (r,s) in ...t...(f t)...)
 128At the recursive call to f, we can see that t is a pair.  But we do NOT want
 129to make a specialised copy:
 130        f' a b = let p = (a,b) in (..., ...)
 131because now t is allocated by the caller, then r and s are passed to the
 132recursive call, which allocates the (r,s) pair again.
 133
 134This happens if
 135  (a) the argument p is used in other than a case-scrutinsation way.
 136  (b) the argument to the call is not a 'fresh' tuple; you have to
 137        look into its unfolding to see that it's a tuple
 138
 139Hence the "OR" part of Note [Good arguments] below.
 140
 141ALTERNATIVE 2: pass both boxed and unboxed versions.  This no longer saves
 142allocation, but does perhaps save evals. In the RULE we'd have
 143something like
 144
 145  f (I# x#) = f' (I# x#) x#
 146
 147If at the call site the (I# x) was an unfolding, then we'd have to
 148rely on CSE to eliminate the duplicate allocation.... This alternative
 149doesn't look attractive enough to pursue.
 150
 151ALTERNATIVE 3: ignore the reboxing problem.  The trouble is that
 152the conservative reboxing story prevents many useful functions from being
 153specialised.  Example:
 154        foo :: Maybe Int -> Int -> Int
 155        foo   (Just m) 0 = 0
 156        foo x@(Just m) n = foo x (n-m)
 157Here the use of 'x' will clearly not require boxing in the specialised function.
 158
 159The strictness analyser has the same problem, in fact.  Example:
 160        f p@(a,b) = ...
 161If we pass just 'a' and 'b' to the worker, it might need to rebox the
 162pair to create (a,b).  A more sophisticated analysis might figure out
 163precisely the cases in which this could happen, but the strictness
 164analyser does no such analysis; it just passes 'a' and 'b', and hopes
 165for the best.
 166
 167So my current choice is to make SpecConstr similarly aggressive, and
 168ignore the bad potential of reboxing.
 169
 170
 171Note [Good arguments]
 172~~~~~~~~~~~~~~~~~~~~~
 173So we look for
 174
 175* A self-recursive function.  Ignore mutual recursion for now,
 176  because it's less common, and the code is simpler for self-recursion.
 177
 178* EITHER
 179
 180   a) At a recursive call, one or more parameters is an explicit
 181      constructor application
 182        AND
 183      That same parameter is scrutinised by a case somewhere in
 184      the RHS of the function
 185
 186  OR
 187
 188    b) At a recursive call, one or more parameters has an unfolding
 189       that is an explicit constructor application
 190        AND
 191      That same parameter is scrutinised by a case somewhere in
 192      the RHS of the function
 193        AND
 194      Those are the only uses of the parameter (see Note [Reboxing])
 195
 196
 197What to abstract over
 198~~~~~~~~~~~~~~~~~~~~~
 199There's a bit of a complication with type arguments.  If the call
 200site looks like
 201
 202        f p = ...f ((:) [a] x xs)...
 203
 204then our specialised function look like
 205
 206        f_spec x xs = let p = (:) [a] x xs in ....as before....
 207
 208This only makes sense if either
 209  a) the type variable 'a' is in scope at the top of f, or
 210  b) the type variable 'a' is an argument to f (and hence fs)
 211
 212Actually, (a) may hold for value arguments too, in which case
 213we may not want to pass them.  Supose 'x' is in scope at f's
 214defn, but xs is not.  Then we'd like
 215
 216        f_spec xs = let p = (:) [a] x xs in ....as before....
 217
 218Similarly (b) may hold too.  If x is already an argument at the
 219call, no need to pass it again.
 220
 221Finally, if 'a' is not in scope at the call site, we could abstract
 222it as we do the term variables:
 223
 224        f_spec a x xs = let p = (:) [a] x xs in ...as before...
 225
 226So the grand plan is:
 227
 228        * abstract the call site to a constructor-only pattern
 229          e.g.  C x (D (f p) (g q))  ==>  C s1 (D s2 s3)
 230
 231        * Find the free variables of the abstracted pattern
 232
 233        * Pass these variables, less any that are in scope at
 234          the fn defn.  But see Note [Shadowing] below.
 235
 236
 237NOTICE that we only abstract over variables that are not in scope,
 238so we're in no danger of shadowing variables used in "higher up"
 239in f_spec's RHS.
 240
 241
 242Note [Shadowing]
 243~~~~~~~~~~~~~~~~
 244In this pass we gather up usage information that may mention variables
 245that are bound between the usage site and the definition site; or (more
 246seriously) may be bound to something different at the definition site.
 247For example:
 248
 249        f x = letrec g y v = let x = ...
 250                             in ...(g (a,b) x)...
 251
 252Since 'x' is in scope at the call site, we may make a rewrite rule that
 253looks like
 254        RULE forall a,b. g (a,b) x = ...
 255But this rule will never match, because it's really a different 'x' at
 256the call site -- and that difference will be manifest by the time the
 257simplifier gets to it.  [A worry: the simplifier doesn't *guarantee*
 258no-shadowing, so perhaps it may not be distinct?]
 259
 260Anyway, the rule isn't actually wrong, it's just not useful.  One possibility
 261is to run deShadowBinds before running SpecConstr, but instead we run the
 262simplifier.  That gives the simplest possible program for SpecConstr to
 263chew on; and it virtually guarantees no shadowing.
 264
 265Note [Specialising for constant parameters]
 266~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 267This one is about specialising on a *constant* (but not necessarily
 268constructor) argument
 269
 270    foo :: Int -> (Int -> Int) -> Int
 271    foo 0 f = 0
 272    foo m f = foo (f m) (+1)
 273
 274It produces
 275
 276    lvl_rmV :: GHC.Base.Int -> GHC.Base.Int
 277    lvl_rmV =
 278      \ (ds_dlk :: GHC.Base.Int) ->
 279        case ds_dlk of wild_alH { GHC.Base.I# x_alG ->
 280        GHC.Base.I# (GHC.Prim.+# x_alG 1)
 281
 282    T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) ->
 283    GHC.Prim.Int#
 284    T.$wfoo =
 285      \ (ww_sme :: GHC.Prim.Int#) (w_smg :: GHC.Base.Int -> GHC.Base.Int) ->
 286        case ww_sme of ds_Xlw {
 287          __DEFAULT ->
 288        case w_smg (GHC.Base.I# ds_Xlw) of w1_Xmo { GHC.Base.I# ww1_Xmz ->
 289        T.$wfoo ww1_Xmz lvl_rmV
 290        };
 291          0 -> 0
 292        }
 293
 294The recursive call has lvl_rmV as its argument, so we could create a specialised copy
 295with that argument baked in; that is, not passed at all.   Now it can perhaps be inlined.
 296
 297When is this worth it?  Call the constant 'lvl'
 298- If 'lvl' has an unfolding that is a constructor, see if the corresponding
 299  parameter is scrutinised anywhere in the body.
 300
 301- If 'lvl' has an unfolding that is a inlinable function, see if the corresponding
 302  parameter is applied (...to enough arguments...?)
 303
 304  Also do this is if the function has RULES?
 305
 306Also
 307
 308Note [Specialising for lambda parameters]
 309~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 310    foo :: Int -> (Int -> Int) -> Int
 311    foo 0 f = 0
 312    foo m f = foo (f m) (\n -> n-m)
 313
 314This is subtly different from the previous one in that we get an
 315explicit lambda as the argument:
 316
 317    T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) ->
 318    GHC.Prim.Int#
 319    T.$wfoo =
 320      \ (ww_sm8 :: GHC.Prim.Int#) (w_sma :: GHC.Base.Int -> GHC.Base.Int) ->
 321        case ww_sm8 of ds_Xlr {
 322          __DEFAULT ->
 323        case w_sma (GHC.Base.I# ds_Xlr) of w1_Xmf { GHC.Base.I# ww1_Xmq ->
 324        T.$wfoo
 325          ww1_Xmq
 326          (\ (n_ad3 :: GHC.Base.Int) ->
 327             case n_ad3 of wild_alB { GHC.Base.I# x_alA ->
 328             GHC.Base.I# (GHC.Prim.-# x_alA ds_Xlr)
 329             })
 330        };
 331          0 -> 0
 332        }
 333
 334I wonder if SpecConstr couldn't be extended to handle this? After all,
 335lambda is a sort of constructor for functions and perhaps it already
 336has most of the necessary machinery?
 337
 338Furthermore, there's an immediate win, because you don't need to allocate the lamda
 339at the call site; and if perchance it's called in the recursive call, then you
 340may avoid allocating it altogether.  Just like for constructors.
 341
 342Looks cool, but probably rare...but it might be easy to implement.
 343
 344
 345Note [SpecConstr for casts]
 346~~~~~~~~~~~~~~~~~~~~~~~~~~~
 347Consider
 348    data family T a :: *
 349    data instance T Int = T Int
 350
 351    foo n = ...
 352       where
 353         go (T 0) = 0
 354         go (T n) = go (T (n-1))
 355
 356The recursive call ends up looking like
 357        go (T (I# ...) `cast` g)
 358So we want to spot the constructor application inside the cast.
 359That's why we have the Cast case in argToPat
 360
 361Note [Local recursive groups]
 362~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 363For a *local* recursive group, we can see all the calls to the
 364function, so we seed the specialisation loop from the calls in the
 365body, not from the calls in the RHS.  Consider:
 366
 367  bar m n = foo n (n,n) (n,n) (n,n) (n,n)
 368   where
 369     foo n p q r s
 370       | n == 0    = m
 371       | n > 3000  = case p of { (p1,p2) -> foo (n-1) (p2,p1) q r s }
 372       | n > 2000  = case q of { (q1,q2) -> foo (n-1) p (q2,q1) r s }
 373       | n > 1000  = case r of { (r1,r2) -> foo (n-1) p q (r2,r1) s }
 374       | otherwise = case s of { (s1,s2) -> foo (n-1) p q r (s2,s1) }
 375
 376If we start with the RHSs of 'foo', we get lots and lots of specialisations,
 377most of which are not needed.  But if we start with the (single) call
 378in the rhs of 'bar' we get exactly one fully-specialised copy, and all
 379the recursive calls go to this fully-specialised copy. Indeed, the original
 380function is later collected as dead code.  This is very important in
 381specialising the loops arising from stream fusion, for example in NDP where
 382we were getting literally hundreds of (mostly unused) specialisations of
 383a local function.
 384
 385In a case like the above we end up never calling the original un-specialised
 386function.  (Although we still leave its code around just in case.)
 387
 388However, if we find any boring calls in the body, including *unsaturated*
 389ones, such as
 390      letrec foo x y = ....foo...
 391      in map foo xs
 392then we will end up calling the un-specialised function, so then we *should*
 393use the calls in the un-specialised RHS as seeds.  We call these
 394"boring call patterns", and callsToPats reports if it finds any of these.
 395
 396
 397Note [Do not specialise diverging functions]
 398~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 399Specialising a function that just diverges is a waste of code.
 400Furthermore, it broke GHC (simpl014) thus:
 401   {-# STR Sb #-}
 402   f = \x. case x of (a,b) -> f x
 403If we specialise f we get
 404   f = \x. case x of (a,b) -> fspec a b
 405But fspec doesn't have decent strictnes info.  As it happened,
 406(f x) :: IO t, so the state hack applied and we eta expanded fspec,
 407and hence f.  But now f's strictness is less than its arity, which
 408breaks an invariant.
 409
 410Note [SpecConstrAnnotation]
 411~~~~~~~~~~~~~~~~~~~~~~~~~~~
 412SpecConstrAnnotation is defined in GHC.Exts, and is only guaranteed to
 413be available in stage 2 (well, until the bootstrap compiler can be
 414guaranteed to have it)
 415
 416So we define it to be () in stage1 (ie when GHCI is undefined), and
 417'#ifdef' out the code that uses it.
 418
 419See also Note [Forcing specialisation]
 420
 421Note [Forcing specialisation]
 422~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 423With stream fusion and in other similar cases, we want to fully specialise
 424some (but not necessarily all!) loops regardless of their size and the
 425number of specialisations. We allow a library to specify this by annotating
 426a type with ForceSpecConstr and then adding a parameter of that type to the
 427loop. Here is a (simplified) example from the vector library:
 428
 429  data SPEC = SPEC | SPEC2
 430  {-# ANN type SPEC ForceSpecConstr #-}
 431
 432  foldl :: (a -> b -> a) -> a -> Stream b -> a
 433  {-# INLINE foldl #-}
 434  foldl f z (Stream step s _) = foldl_loop SPEC z s
 435    where
 436      foldl_loop !sPEC z s = case step s of
 437                              Yield x s' -> foldl_loop sPEC (f z x) s'
 438                              Skip       -> foldl_loop sPEC z s'
 439                              Done       -> z
 440
 441SpecConstr will spot the SPEC parameter and always fully specialise
 442foldl_loop. Note that
 443
 444  * We have to prevent the SPEC argument from being removed by
 445    w/w which is why (a) SPEC is a sum type, and (b) we have to seq on
 446    the SPEC argument.
 447
 448  * And lastly, the SPEC argument is ultimately eliminated by
 449    SpecConstr itself so there is no runtime overhead.
 450
 451This is all quite ugly; we ought to come up with a better design.
 452
 453ForceSpecConstr arguments are spotted in scExpr' and scTopBinds which then set
 454sc_force to True when calling specLoop. This flag does three things:
 455  * Ignore specConstrThreshold, to specialise functions of arbitrary size
 456        (see scTopBind)
 457  * Ignore specConstrCount, to make arbitrary numbers of specialisations
 458        (see specialise)
 459  * Specialise even for arguments that are not scrutinised in the loop
 460        (see argToPat; Trac #4488)
 461  * Only specialise on recursive types a finite number of times
 462        (see is_too_recursive; Trac #5550)
 463
 464This flag is inherited for nested non-recursive bindings (which are likely to
 465be join points and hence should be fully specialised) but reset for nested
 466recursive bindings.
 467
 468What alternatives did I consider? Annotating the loop itself doesn't
 469work because (a) it is local and (b) it will be w/w'ed and having
 470w/w propagating annotations somehow doesn't seem like a good idea. The
 471types of the loop arguments really seem to be the most persistent
 472thing.
 473
 474Annotating the types that make up the loop state doesn't work,
 475either, because (a) it would prevent us from using types like Either
 476or tuples here, (b) we don't want to restrict the set of types that
 477can be used in Stream states and (c) some types are fixed by the user
 478(e.g., the accumulator here) but we still want to specialise as much
 479as possible.
 480
 481ForceSpecConstr is done by way of an annotation:
 482  data SPEC = SPEC | SPEC2
 483  {-# ANN type SPEC ForceSpecConstr #-}
 484But SPEC is the *only* type so annotated, so it'd be better to
 485use a particular library type.
 486
 487Alternatives to ForceSpecConstr
 488~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 489Instead of giving the loop an extra argument of type SPEC, we
 490also considered *wrapping* arguments in SPEC, thus
 491  data SPEC a = SPEC a | SPEC2
 492
 493  loop = \arg -> case arg of
 494                     SPEC state ->
 495                        case state of (x,y) -> ... loop (SPEC (x',y')) ...
 496                        S2 -> error ...
 497The idea is that a SPEC argument says "specialise this argument
 498regardless of whether the function case-analyses it".  But this
 499doesn't work well:
 500  * SPEC must still be a sum type, else the strictness analyser
 501    eliminates it
 502  * But that means that 'loop' won't be strict in its real payload
 503This loss of strictness in turn screws up specialisation, because
 504we may end up with calls like
 505   loop (SPEC (case z of (p,q) -> (q,p)))
 506Without the SPEC, if 'loop' were strict, the case would move out
 507and we'd see loop applied to a pair. But if 'loop' isn't strict
 508this doesn't look like a specialisable call.
 509
 510Note [NoSpecConstr]
 511~~~~~~~~~~~~~~~~~~~
 512The ignoreDataCon stuff allows you to say
 513    {-# ANN type T NoSpecConstr #-}
 514to mean "don't specialise on arguments of this type.  It was added
 515before we had ForceSpecConstr.  Lacking ForceSpecConstr we specialised
 516regardless of size; and then we needed a way to turn that *off*.  Now
 517that we have ForceSpecConstr, this NoSpecConstr is probably redundant.
 518(Used only for PArray.)
 519
 520-----------------------------------------------------
 521                Stuff not yet handled
 522-----------------------------------------------------
 523
 524Here are notes arising from Roman's work that I don't want to lose.
 525
 526Example 1
 527~~~~~~~~~
 528    data T a = T !a
 529
 530    foo :: Int -> T Int -> Int
 531    foo 0 t = 0
 532    foo x t | even x    = case t of { T n -> foo (x-n) t }
 533            | otherwise = foo (x-1) t
 534
 535SpecConstr does no specialisation, because the second recursive call
 536looks like a boxed use of the argument.  A pity.
 537
 538    $wfoo_sFw :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int#
 539    $wfoo_sFw =
 540      \ (ww_sFo [Just L] :: GHC.Prim.Int#) (w_sFq [Just L] :: T.T GHC.Base.Int) ->
 541         case ww_sFo of ds_Xw6 [Just L] {
 542           __DEFAULT ->
 543                case GHC.Prim.remInt# ds_Xw6 2 of wild1_aEF [Dead Just A] {
 544                  __DEFAULT -> $wfoo_sFw (GHC.Prim.-# ds_Xw6 1) w_sFq;
 545                  0 ->
 546                    case w_sFq of wild_Xy [Just L] { T.T n_ad5 [Just U(L)] ->
 547                    case n_ad5 of wild1_aET [Just A] { GHC.Base.I# y_aES [Just L] ->
 548                    $wfoo_sFw (GHC.Prim.-# ds_Xw6 y_aES) wild_Xy
 549                    } } };
 550           0 -> 0
 551
 552Example 2
 553~~~~~~~~~
 554    data a :*: b = !a :*: !b
 555    data T a = T !a
 556
 557    foo :: (Int :*: T Int) -> Int
 558    foo (0 :*: t) = 0
 559    foo (x :*: t) | even x    = case t of { T n -> foo ((x-n) :*: t) }
 560                  | otherwise = foo ((x-1) :*: t)
 561
 562Very similar to the previous one, except that the parameters are now in
 563a strict tuple. Before SpecConstr, we have
 564
 565    $wfoo_sG3 :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int#
 566    $wfoo_sG3 =
 567      \ (ww_sFU [Just L] :: GHC.Prim.Int#) (ww_sFW [Just L] :: T.T
 568    GHC.Base.Int) ->
 569        case ww_sFU of ds_Xws [Just L] {
 570          __DEFAULT ->
 571        case GHC.Prim.remInt# ds_Xws 2 of wild1_aEZ [Dead Just A] {
 572          __DEFAULT ->
 573            case ww_sFW of tpl_B2 [Just L] { T.T a_sFo [Just A] ->
 574            $wfoo_sG3 (GHC.Prim.-# ds_Xws 1) tpl_B2             -- $wfoo1
 575            };
 576          0 ->
 577            case ww_sFW of wild_XB [Just A] { T.T n_ad7 [Just S(L)] ->
 578            case n_ad7 of wild1_aFd [Just L] { GHC.Base.I# y_aFc [Just L] ->
 579            $wfoo_sG3 (GHC.Prim.-# ds_Xws y_aFc) wild_XB        -- $wfoo2
 580            } } };
 581          0 -> 0 }
 582
 583We get two specialisations:
 584"SC:$wfoo1" [0] __forall {a_sFB :: GHC.Base.Int sc_sGC :: GHC.Prim.Int#}
 585                  Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int a_sFB)
 586                  = Foo.$s$wfoo1 a_sFB sc_sGC ;
 587"SC:$wfoo2" [0] __forall {y_aFp :: GHC.Prim.Int# sc_sGC :: GHC.Prim.Int#}
 588                  Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int (GHC.Base.I# y_aFp))
 589                  = Foo.$s$wfoo y_aFp sc_sGC ;
 590
 591But perhaps the first one isn't good.  After all, we know that tpl_B2 is
 592a T (I# x) really, because T is strict and Int has one constructor.  (We can't
 593unbox the strict fields, because T is polymorphic!)
 594
 595%************************************************************************
 596%*                                                                      *
 597\subsection{Top level wrapper stuff}
 598%*                                                                      *
 599%************************************************************************
 600
 601\begin{code}
 602specConstrProgram :: ModGuts -> CoreM ModGuts
 603specConstrProgram guts
 604  = do
 605      dflags <- getDynFlags
 606      us     <- getUniqueSupplyM
 607      annos  <- getFirstAnnotations deserializeWithData guts
 608      let binds' = fst $ initUs us (go (initScEnv dflags annos) (mg_binds guts))
 609      return (guts { mg_binds = binds' })
 610  where
 611    go _   []           = return []
 612    go env (bind:binds) = do (env', bind') <- scTopBind env bind
 613                             binds' <- go env' binds
 614                             return (bind' : binds')
 615\end{code}
 616
 617
 618%************************************************************************
 619%*                                                                      *
 620\subsection{Environment: goes downwards}
 621%*                                                                      *
 622%************************************************************************
 623
 624Note [Work-free values only in environment]
 625~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 626The sc_vals field keeps track of in-scope value bindings, so 
 627that if we come across (case x of Just y ->...) we can reduce the
 628case from knowing that x is bound to a pair.
 629
 630But only *work-free* values are ok here. For example if the envt had
 631    x -> Just (expensive v)
 632then we do NOT want to expand to
 633     let y = expensive v in ...
 634because the x-binding still exists and we've now duplicated (expensive v).
 635
 636This seldom happens because let-bound constructor applications are 
 637ANF-ised, but it can happen as a result of on-the-fly transformations in
 638SpecConstr itself.  Here is Trac #7865:
 639
 640        let {
 641          a'_shr =
 642            case xs_af8 of _ {
 643              [] -> acc_af6;
 644              : ds_dgt [Dmd=<L,A>] ds_dgu [Dmd=<L,A>] ->
 645                (expensive x_af7, x_af7
 646            } } in
 647        let {
 648          ds_sht =
 649            case a'_shr of _ { (p'_afd, q'_afe) ->
 650            TSpecConstr_DoubleInline.recursive
 651              (GHC.Types.: @ GHC.Types.Int x_af7 wild_X6) (q'_afe, p'_afd)
 652            } } in
 653
 654When processed knowing that xs_af8 was bound to a cons, we simplify to 
 655   a'_shr = (expensive x_af7, x_af7)
 656and we do NOT want to inline that at the occurrence of a'_shr in ds_sht.
 657(There are other occurrences of a'_shr.)  No no no.
 658
 659It would be possible to do some on-the-fly ANF-ising, so that a'_shr turned
 660into a work-free value again, thus
 661   a1 = expensive x_af7
 662   a'_shr = (a1, x_af7)
 663but that's more work, so until its shown to be important I'm going to 
 664leave it for now.
 665
 666\begin{code}
 667data ScEnv = SCE { sc_dflags    :: DynFlags,
 668                   sc_size      :: Maybe Int,   -- Size threshold
 669                   sc_count     :: Maybe Int,   -- Max # of specialisations for any one fn
 670                                                -- See Note [Avoiding exponential blowup]
 671
 672                   sc_recursive :: Int,         -- Max # of specialisations over recursive type.
 673                                                -- Stops ForceSpecConstr from diverging.
 674
 675                   sc_force     :: Bool,        -- Force specialisation?
 676                                                -- See Note [Forcing specialisation]
 677
 678                   sc_subst     :: Subst,       -- Current substitution
 679                                                -- Maps InIds to OutExprs
 680
 681                   sc_how_bound :: HowBoundEnv,
 682                        -- Binds interesting non-top-level variables
 683                        -- Domain is OutVars (*after* applying the substitution)
 684
 685                   sc_vals      :: ValueEnv,
 686                        -- Domain is OutIds (*after* applying the substitution)
 687                        -- Used even for top-level bindings (but not imported ones)
 688                        -- The range of the ValueEnv is *work-free* values
 689                        -- such as (\x. blah), or (Just v)
 690                        -- but NOT (Just (expensive v))
 691                        -- See Note [Work-free values only in environment]
 692
 693                   sc_annotations :: UniqFM SpecConstrAnnotation
 694             }
 695
 696---------------------
 697-- As we go, we apply a substitution (sc_subst) to the current term
 698type InExpr = CoreExpr          -- _Before_ applying the subst
 699type InVar  = Var
 700
 701type OutExpr = CoreExpr         -- _After_ applying the subst
 702type OutId   = Id
 703type OutVar  = Var
 704
 705---------------------
 706type HowBoundEnv = VarEnv HowBound      -- Domain is OutVars
 707
 708---------------------
 709type ValueEnv = IdEnv Value             -- Domain is OutIds
 710data Value    = ConVal AltCon [CoreArg] -- _Saturated_ constructors
 711                                        --   The AltCon is never DEFAULT
 712              | LambdaVal               -- Inlinable lambdas or PAPs
 713
 714instance Outputable Value where
 715   ppr (ConVal con args) = ppr con <+> interpp'SP args
 716   ppr LambdaVal         = ptext (sLit "<Lambda>")
 717
 718---------------------
 719initScEnv :: DynFlags -> UniqFM SpecConstrAnnotation -> ScEnv
 720initScEnv dflags anns
 721  = SCE { sc_dflags      = dflags,
 722          sc_size        = specConstrThreshold dflags,
 723          sc_count       = specConstrCount     dflags,
 724          sc_recursive   = specConstrRecursive dflags,
 725          sc_force       = False,
 726          sc_subst       = emptySubst,
 727          sc_how_bound   = emptyVarEnv,
 728          sc_vals        = emptyVarEnv,
 729          sc_annotations = anns }
 730
 731data HowBound = RecFun  -- These are the recursive functions for which
 732                        -- we seek interesting call patterns
 733
 734              | RecArg  -- These are those functions' arguments, or their sub-components;
 735                        -- we gather occurrence information for these
 736
 737instance Outputable HowBound where
 738  ppr RecFun = text "RecFun"
 739  ppr RecArg = text "RecArg"
 740
 741scForce :: ScEnv -> Bool -> ScEnv
 742scForce env b = env { sc_force = b }
 743
 744lookupHowBound :: ScEnv -> Id -> Maybe HowBound
 745lookupHowBound env id = lookupVarEnv (sc_how_bound env) id
 746
 747scSubstId :: ScEnv -> Id -> CoreExpr
 748scSubstId env v = lookupIdSubst (text "scSubstId") (sc_subst env) v
 749
 750scSubstTy :: ScEnv -> Type -> Type
 751scSubstTy env ty = substTy (sc_subst env) ty
 752
 753scSubstCo :: ScEnv -> Coercion -> Coercion
 754scSubstCo env co = substCo (sc_subst env) co
 755
 756zapScSubst :: ScEnv -> ScEnv
 757zapScSubst env = env { sc_subst = zapSubstEnv (sc_subst env) }
 758
 759extendScInScope :: ScEnv -> [Var] -> ScEnv
 760        -- Bring the quantified variables into scope
 761extendScInScope env qvars = env { sc_subst = extendInScopeList (sc_subst env) qvars }
 762
 763        -- Extend the substitution
 764extendScSubst :: ScEnv -> Var -> OutExpr -> ScEnv
 765extendScSubst env var expr = env { sc_subst = extendSubst (sc_subst env) var expr }
 766
 767extendScSubstList :: ScEnv -> [(Var,OutExpr)] -> ScEnv
 768extendScSubstList env prs = env { sc_subst = extendSubstList (sc_subst env) prs }
 769
 770extendHowBound :: ScEnv -> [Var] -> HowBound -> ScEnv
 771extendHowBound env bndrs how_bound
 772  = env { sc_how_bound = extendVarEnvList (sc_how_bound env)
 773                            [(bndr,how_bound) | bndr <- bndrs] }
 774
 775extendBndrsWith :: HowBound -> ScEnv -> [Var] -> (ScEnv, [Var])
 776extendBndrsWith how_bound env bndrs
 777  = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndrs')
 778  where
 779    (subst', bndrs') = substBndrs (sc_subst env) bndrs
 780    hb_env' = sc_how_bound env `extendVarEnvList`
 781                    [(bndr,how_bound) | bndr <- bndrs']
 782
 783extendBndrWith :: HowBound -> ScEnv -> Var -> (ScEnv, Var)
 784extendBndrWith how_bound env bndr
 785  = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndr')
 786  where
 787    (subst', bndr') = substBndr (sc_subst env) bndr
 788    hb_env' = extendVarEnv (sc_how_bound env) bndr' how_bound
 789
 790extendRecBndrs :: ScEnv -> [Var] -> (ScEnv, [Var])
 791extendRecBndrs env bndrs  = (env { sc_subst = subst' }, bndrs')
 792                      where
 793                        (subst', bndrs') = substRecBndrs (sc_subst env) bndrs
 794
 795extendBndr :: ScEnv -> Var -> (ScEnv, Var)
 796extendBndr  env bndr  = (env { sc_subst = subst' }, bndr')
 797                      where
 798                        (subst', bndr') = substBndr (sc_subst env) bndr
 799
 800extendValEnv :: ScEnv -> Id -> Maybe Value -> ScEnv
 801extendValEnv env _  Nothing   = env
 802extendValEnv env id (Just cv) 
 803 | valueIsWorkFree cv      -- Don't duplicate work!!  Trac #7865
 804 = env { sc_vals = extendVarEnv (sc_vals env) id cv }
 805extendValEnv env _ _ = env
 806
 807extendCaseBndrs :: ScEnv -> OutExpr -> OutId -> AltCon -> [Var] -> (ScEnv, [Var])
 808-- When we encounter
 809--      case scrut of b
 810--          C x y -> ...
 811-- we want to bind b, to (C x y)
 812-- NB1: Extends only the sc_vals part of the envt
 813-- NB2: Kill the dead-ness info on the pattern binders x,y, since
 814--      they are potentially made alive by the [b -> C x y] binding
 815extendCaseBndrs env scrut case_bndr con alt_bndrs
 816   = (env2, alt_bndrs')
 817 where
 818   live_case_bndr = not (isDeadBinder case_bndr)
 819   env1 | Var v <- scrut = extendValEnv env v cval
 820        | otherwise      = env  -- See Note [Add scrutinee to ValueEnv too]
 821   env2 | live_case_bndr = extendValEnv env1 case_bndr cval
 822        | otherwise      = env1
 823
 824   alt_bndrs' | case scrut of { Var {} -> True; _ -> live_case_bndr }
 825              = map zap alt_bndrs
 826              | otherwise
 827              = alt_bndrs
 828
 829   cval = case con of
 830                DEFAULT    -> Nothing
 831                LitAlt {}  -> Just (ConVal con [])
 832                DataAlt {} -> Just (ConVal con vanilla_args)
 833                      where
 834                        vanilla_args = map Type (tyConAppArgs (idType case_bndr)) ++
 835                                       varsToCoreExprs alt_bndrs
 836
 837   zap v | isTyVar v = v                -- See NB2 above
 838         | otherwise = zapIdOccInfo v
 839
 840
 841decreaseSpecCount :: ScEnv -> Int -> ScEnv
 842-- See Note [Avoiding exponential blowup]
 843decreaseSpecCount env n_specs
 844  = env { sc_count = case sc_count env of
 845                       Nothing -> Nothing
 846                       Just n  -> Just (n `div` (n_specs + 1)) }
 847        -- The "+1" takes account of the original function;
 848        -- See Note [Avoiding exponential blowup]
 849
 850---------------------------------------------------
 851-- See Note [SpecConstrAnnotation]
 852ignoreType    :: ScEnv -> Type   -> Bool
 853ignoreDataCon  :: ScEnv -> DataCon -> Bool
 854forceSpecBndr :: ScEnv -> Var    -> Bool
 855#ifndef GHCI
 856ignoreType    _ _  = False
 857ignoreDataCon  _ _ = False
 858forceSpecBndr _ _  = False
 859
 860#else /* GHCI */
 861
 862ignoreDataCon env dc = ignoreTyCon env (dataConTyCon dc)
 863
 864ignoreType env ty
 865  = case tyConAppTyCon_maybe ty of
 866      Just tycon -> ignoreTyCon env tycon
 867      _          -> False
 868
 869ignoreTyCon :: ScEnv -> TyCon -> Bool
 870ignoreTyCon env tycon
 871  = lookupUFM (sc_annotations env) tycon == Just NoSpecConstr
 872
 873forceSpecBndr env var = forceSpecFunTy env . snd . splitForAllTys . varType $ var
 874
 875forceSpecFunTy :: ScEnv -> Type -> Bool
 876forceSpecFunTy env = any (forceSpecArgTy env) . fst . splitFunTys
 877
 878forceSpecArgTy :: ScEnv -> Type -> Bool
 879forceSpecArgTy env ty
 880  | Just ty' <- coreView ty = forceSpecArgTy env ty'
 881
 882forceSpecArgTy env ty
 883  | Just (tycon, tys) <- splitTyConApp_maybe ty
 884  , tycon /= funTyCon
 885      = lookupUFM (sc_annotations env) tycon == Just ForceSpecConstr
 886        || any (forceSpecArgTy env) tys
 887
 888forceSpecArgTy _ _ = False
 889#endif /* GHCI */
 890\end{code}
 891
 892Note [Add scrutinee to ValueEnv too]
 893~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 894Consider this:
 895   case x of y
 896     (a,b) -> case b of c
 897                I# v -> ...(f y)...
 898By the time we get to the call (f y), the ValueEnv
 899will have a binding for y, and for c
 900    y -> (a,b)
 901    c -> I# v
 902BUT that's not enough!  Looking at the call (f y) we
 903see that y is pair (a,b), but we also need to know what 'b' is.
 904So in extendCaseBndrs we must *also* add the binding
 905   b -> I# v
 906else we lose a useful specialisation for f.  This is necessary even
 907though the simplifier has systematically replaced uses of 'x' with 'y'
 908and 'b' with 'c' in the code.  The use of 'b' in the ValueEnv came
 909from outside the case.  See Trac #4908 for the live example.
 910
 911Note [Avoiding exponential blowup]
 912~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 913The sc_count field of the ScEnv says how many times we are prepared to
 914duplicate a single function.  But we must take care with recursive
 915specialiations.  Consider
 916
 917        let $j1 = let $j2 = let $j3 = ...
 918                            in
 919                            ...$j3...
 920                  in
 921                  ...$j2...
 922        in
 923        ...$j1...
 924
 925If we specialise $j1 then in each specialisation (as well as the original)
 926we can specialise $j2, and similarly $j3.  Even if we make just *one*
 927specialisation of each, because we also have the original we'll get 2^n
 928copies of $j3, which is not good.
 929
 930So when recursively specialising we divide the sc_count by the number of
 931copies we are making at this level, including the original.
 932
 933
 934%************************************************************************
 935%*                                                                      *
 936\subsection{Usage information: flows upwards}
 937%*                                                                      *
 938%************************************************************************
 939
 940\begin{code}
 941data ScUsage
 942   = SCU {
 943        scu_calls :: CallEnv,           -- Calls
 944                                        -- The functions are a subset of the
 945                                        --      RecFuns in the ScEnv
 946
 947        scu_occs :: !(IdEnv ArgOcc)     -- Information on argument occurrences
 948     }                                  -- The domain is OutIds
 949
 950type CallEnv = IdEnv [Call]
 951type Call = (ValueEnv, [CoreArg])
 952        -- The arguments of the call, together with the
 953        -- env giving the constructor bindings at the call site
 954
 955nullUsage :: ScUsage
 956nullUsage = SCU { scu_calls = emptyVarEnv, scu_occs = emptyVarEnv }
 957
 958combineCalls :: CallEnv -> CallEnv -> CallEnv
 959combineCalls = plusVarEnv_C (++)
 960
 961combineUsage :: ScUsage -> ScUsage -> ScUsage
 962combineUsage u1 u2 = SCU { scu_calls = combineCalls (scu_calls u1) (scu_calls u2),
 963                           scu_occs  = plusVarEnv_C combineOcc (scu_occs u1) (scu_occs u2) }
 964
 965combineUsages :: [ScUsage] -> ScUsage
 966combineUsages [] = nullUsage
 967combineUsages us = foldr1 combineUsage us
 968
 969lookupOccs :: ScUsage -> [OutVar] -> (ScUsage, [ArgOcc])
 970lookupOccs (SCU { scu_calls = sc_calls, scu_occs = sc_occs }) bndrs
 971  = (SCU {scu_calls = sc_calls, scu_occs = delVarEnvList sc_occs bndrs},
 972     [lookupVarEnv sc_occs b `orElse` NoOcc | b <- bndrs])
 973
 974data ArgOcc = NoOcc     -- Doesn't occur at all; or a type argument
 975            | UnkOcc    -- Used in some unknown way
 976
 977            | ScrutOcc  -- See Note [ScrutOcc]
 978                 (DataConEnv [ArgOcc])   -- How the sub-components are used
 979
 980type DataConEnv a = UniqFM a     -- Keyed by DataCon
 981
 982{- Note  [ScrutOcc]
 983~~~~~~~~~~~~~~~~~~~
 984An occurrence of ScrutOcc indicates that the thing, or a `cast` version of the thing,
 985is *only* taken apart or applied.
 986
 987  Functions, literal: ScrutOcc emptyUFM
 988  Data constructors:  ScrutOcc subs,
 989
 990where (subs :: UniqFM [ArgOcc]) gives usage of the *pattern-bound* components,
 991The domain of the UniqFM is the Unique of the data constructor
 992
 993The [ArgOcc] is the occurrences of the *pattern-bound* components
 994of the data structure.  E.g.
 995        data T a = forall b. MkT a b (b->a)
 996A pattern binds b, x::a, y::b, z::b->a, but not 'a'!
 997
 998-}
 999
1000instance Outputable ArgOcc where
1001  ppr (ScrutOcc xs) = ptext (sLit "scrut-occ") <> ppr xs
1002  ppr UnkOcc        = ptext (sLit "unk-occ")
1003  ppr NoOcc         = ptext (sLit "no-occ")
1004
1005evalScrutOcc :: ArgOcc
1006evalScrutOcc = ScrutOcc emptyUFM
1007
1008-- Experimentally, this vesion of combineOcc makes ScrutOcc "win", so
1009-- that if the thing is scrutinised anywhere then we get to see that
1010-- in the overall result, even if it's also used in a boxed way
1011-- This might be too agressive; see Note [Reboxing] Alternative 3
1012combineOcc :: ArgOcc -> ArgOcc -> ArgOcc
1013combineOcc NoOcc         occ           = occ
1014combineOcc occ           NoOcc         = occ
1015combineOcc (ScrutOcc xs) (ScrutOcc ys) = ScrutOcc (plusUFM_C combineOccs xs ys)
1016combineOcc UnkOcc        (ScrutOcc ys) = ScrutOcc ys
1017combineOcc (ScrutOcc xs) UnkOcc        = ScrutOcc xs
1018combineOcc UnkOcc        UnkOcc        = UnkOcc
1019
1020combineOccs :: [ArgOcc] -> [ArgOcc] -> [ArgOcc]
1021combineOccs xs ys = zipWithEqual "combineOccs" combineOcc xs ys
1022
1023setScrutOcc :: ScEnv -> ScUsage -> OutExpr -> ArgOcc -> ScUsage
1024-- _Overwrite_ the occurrence info for the scrutinee, if the scrutinee
1025-- is a variable, and an interesting variable
1026setScrutOcc env usg (Cast e _) occ      = setScrutOcc env usg e occ
1027setScrutOcc env usg (Tick _ e) occ      = setScrutOcc env usg e occ
1028setScrutOcc env usg (Var v)    occ
1029  | Just RecArg <- lookupHowBound env v = usg { scu_occs = extendVarEnv (scu_occs usg) v occ }
1030  | otherwise                           = usg
1031setScrutOcc _env usg _other _occ        -- Catch-all
1032  = usg
1033\end{code}
1034
1035%************************************************************************
1036%*                                                                      *
1037\subsection{The main recursive function}
1038%*                                                                      *
1039%************************************************************************
1040
1041The main recursive function gathers up usage information, and
1042creates specialised versions of functions.
1043
1044\begin{code}
1045scExpr, scExpr' :: ScEnv -> CoreExpr -> UniqSM (ScUsage, CoreExpr)
1046        -- The unique supply is needed when we invent
1047        -- a new name for the specialised function and its args
1048
1049scExpr env e = scExpr' env e
1050
1051
1052scExpr' env (Var v)      = case scSubstId env v of
1053                            Var v' -> return (mkVarUsage env v' [], Var v')
1054                            e'     -> scExpr (zapScSubst env) e'
1055
1056scExpr' env (Type t)     = return (nullUsage, Type (scSubstTy env t))
1057scExpr' env (Coercion c) = return (nullUsage, Coercion (scSubstCo env c))
1058scExpr' _   e@(Lit {})   = return (nullUsage, e)
1059scExpr' env (Tick t e)   = do (usg, e') <- scExpr env e
1060                              return (usg, Tick t e')
1061scExpr' env (Cast e co)  = do (usg, e') <- scExpr env e
1062                              return (usg, Cast e' (scSubstCo env co))
1063scExpr' env e@(App _ _)  = scApp env (collectArgs e)
1064scExpr' env (Lam b e)    = do let (env', b') = extendBndr env b
1065                              (usg, e') <- scExpr env' e
1066                              return (usg, Lam b' e')
1067
1068scExpr' env (Case scrut b ty alts)
1069  = do  { (scrut_usg, scrut') <- scExpr env scrut
1070        ; case isValue (sc_vals env) scrut' of
1071                Just (ConVal con args) -> sc_con_app con args scrut'
1072                _other                 -> sc_vanilla scrut_usg scrut'
1073        }
1074  where
1075    sc_con_app con args scrut'  -- Known constructor; simplify
1076     = do { let (_, bs, rhs) = findAlt con alts
1077                                  `orElse` (DEFAULT, [], mkImpossibleExpr ty)
1078                alt_env'     = extendScSubstList env ((b,scrut') : bs `zip` trimConArgs con args)
1079          ; scExpr alt_env' rhs }
1080
1081    sc_vanilla scrut_usg scrut' -- Normal case
1082     = do { let (alt_env,b') = extendBndrWith RecArg env b
1083                        -- Record RecArg for the components
1084
1085          ; (alt_usgs, alt_occs, alts')
1086                <- mapAndUnzip3M (sc_alt alt_env scrut' b') alts
1087
1088          ; let scrut_occ  = foldr combineOcc NoOcc alt_occs
1089                scrut_usg' = setScrutOcc env scrut_usg scrut' scrut_occ
1090                -- The combined usage of the scrutinee is given
1091                -- by scrut_occ, which is passed to scScrut, which
1092                -- in turn treats a bare-variable scrutinee specially
1093
1094          ; return (foldr combineUsage scrut_usg' alt_usgs,
1095                    Case scrut' b' (scSubstTy env ty) alts') }
1096
1097    sc_alt env scrut' b' (con,bs,rhs)
1098     = do { let (env1, bs1) = extendBndrsWith RecArg env bs
1099                (env2, bs2) = extendCaseBndrs env1 scrut' b' con bs1
1100          ; (usg, rhs') <- scExpr env2 rhs
1101          ; let (usg', b_occ:arg_occs) = lookupOccs usg (b':bs2)
1102                scrut_occ = case con of
1103                               DataAlt dc -> ScrutOcc (unitUFM dc arg_occs)
1104                               _          -> ScrutOcc emptyUFM
1105          ; return (usg', b_occ `combineOcc` scrut_occ, (con, bs2, rhs')) }
1106
1107scExpr' env (Let (NonRec bndr rhs) body)
1108  | isTyVar bndr        -- Type-lets may be created by doBeta
1109  = scExpr' (extendScSubst env bndr rhs) body
1110
1111  | otherwise
1112  = do  { let (body_env, bndr') = extendBndr env bndr
1113        ; (rhs_usg, rhs_info)  <- scRecRhs env (bndr',rhs)
1114
1115        ; let body_env2         = extendHowBound body_env [bndr'] RecFun
1116                                   -- Note [Local let bindings]
1117              RI _ rhs' _ _ _   = rhs_info
1118              body_env3         = extendValEnv body_env2 bndr' (isValue (sc_vals env) rhs')
1119
1120        ; (body_usg, body') <- scExpr body_env3 body
1121
1122          -- NB: For non-recursive bindings we inherit sc_force flag from
1123          -- the parent function (see Note [Forcing specialisation])
1124        ; (spec_usg, specs) <- specialise env
1125                                          (scu_calls body_usg)
1126                                          rhs_info
1127                                          (SI [] 0 (Just rhs_usg))
1128
1129        ; return (body_usg { scu_calls = scu_calls body_usg `delVarEnv` bndr' }
1130                    `combineUsage` rhs_usg `combineUsage` spec_usg,
1131                  mkLets [NonRec b r | (b,r) <- specInfoBinds rhs_info specs] body')
1132        }
1133
1134
1135-- A *local* recursive group: see Note [Local recursive groups]
1136scExpr' env (Let (Rec prs) body)
1137  = do  { let (bndrs,rhss)      = unzip prs
1138              (rhs_env1,bndrs') = extendRecBndrs env bndrs
1139              rhs_env2          = extendHowBound rhs_env1 bndrs' RecFun
1140              force_spec        = any (forceSpecBndr env) bndrs'
1141                -- Note [Forcing specialisation]
1142
1143        ; (rhs_usgs, rhs_infos) <- mapAndUnzipM (scRecRhs rhs_env2) (bndrs' `zip` rhss)
1144        ; (body_usg, body')     <- scExpr rhs_env2 body
1145
1146        -- NB: start specLoop from body_usg
1147        ; (spec_usg, specs) <- specLoop (scForce rhs_env2 force_spec)
1148                                        (scu_calls body_usg) rhs_infos nullUsage
1149                                        [SI [] 0 (Just usg) | usg <- rhs_usgs]
1150                -- Do not unconditionally generate specialisations from rhs_usgs
1151                -- Instead use them only if we find an unspecialised call
1152                -- See Note [Local recursive groups]
1153
1154        ; let rhs_usg = combineUsages rhs_usgs
1155              all_usg = spec_usg `combineUsage` rhs_usg `combineUsage` body_usg
1156              bind'   = Rec (concat (zipWith specInfoBinds rhs_infos specs))
1157
1158        ; return (all_usg { scu_calls = scu_calls all_usg `delVarEnvList` bndrs' },
1159                  Let bind' body') }
1160\end{code}
1161
1162Note [Local let bindings]
1163~~~~~~~~~~~~~~~~~~~~~~~~~
1164It is not uncommon to find this
1165
1166   let $j = \x. <blah> in ...$j True...$j True...
1167
1168Here $j is an arbitrary let-bound function, but it often comes up for
1169join points.  We might like to specialise $j for its call patterns.
1170Notice the difference from a letrec, where we look for call patterns
1171in the *RHS* of the function.  Here we look for call patterns in the
1172*body* of the let.
1173
1174At one point I predicated this on the RHS mentioning the outer
1175recursive function, but that's not essential and might even be
1176harmful.  I'm not sure.
1177
1178
1179\begin{code}
1180scApp :: ScEnv -> (InExpr, [InExpr]) -> UniqSM (ScUsage, CoreExpr)
1181
1182scApp env (Var fn, args)        -- Function is a variable
1183  = ASSERT( not (null args) )
1184    do  { args_w_usgs <- mapM (scExpr env) args
1185        ; let (arg_usgs, args') = unzip args_w_usgs
1186              arg_usg = combineUsages arg_usgs
1187        ; case scSubstId env fn of
1188            fn'@(Lam {}) -> scExpr (zapScSubst env) (doBeta fn' args')
1189                        -- Do beta-reduction and try again
1190
1191            Var fn' -> return (arg_usg `combineUsage` mkVarUsage env fn' args',
1192                               mkApps (Var fn') args')
1193
1194            other_fn' -> return (arg_usg, mkApps other_fn' args') }
1195                -- NB: doing this ignores any usage info from the substituted
1196                --     function, but I don't think that matters.  If it does
1197                --     we can fix it.
1198  where
1199    doBeta :: OutExpr -> [OutExpr] -> OutExpr
1200    -- ToDo: adjust for System IF
1201    doBeta (Lam bndr body) (arg : args) = Let (NonRec bndr arg) (doBeta body args)
1202    doBeta fn              args         = mkApps fn args
1203
1204-- The function is almost always a variable, but not always.
1205-- In particular, if this pass follows float-in,
1206-- which it may, we can get
1207--      (let f = ...f... in f) arg1 arg2
1208scApp env (other_fn, args)
1209  = do  { (fn_usg,   fn')   <- scExpr env other_fn
1210        ; (arg_usgs, args') <- mapAndUnzipM (scExpr env) args
1211        ; return (combineUsages arg_usgs `combineUsage` fn_usg, mkApps fn' args') }
1212
1213----------------------
1214mkVarUsage :: ScEnv -> Id -> [CoreExpr] -> ScUsage
1215mkVarUsage env fn args
1216  = case lookupHowBound env fn of
1217        Just RecFun -> SCU { scu_calls = unitVarEnv fn [(sc_vals env, args)]
1218                           , scu_occs  = emptyVarEnv }
1219        Just RecArg -> SCU { scu_calls = emptyVarEnv
1220                           , scu_occs  = unitVarEnv fn arg_occ }
1221        Nothing     -> nullUsage
1222  where
1223    -- I rather think we could use UnkOcc all the time
1224    arg_occ | null args = UnkOcc
1225            | otherwise = evalScrutOcc
1226
1227----------------------
1228scTopBind :: ScEnv -> CoreBind -> UniqSM (ScEnv, CoreBind)
1229scTopBind env (Rec prs)
1230  | Just threshold <- sc_size env
1231  , not force_spec
1232  , not (all (couldBeSmallEnoughToInline (sc_dflags env) threshold) rhss)
1233                -- No specialisation
1234  = do  { let (rhs_env,bndrs') = extendRecBndrs env bndrs
1235        ; (_, rhss') <- mapAndUnzipM (scExpr rhs_env) rhss
1236        ; return (rhs_env, Rec (bndrs' `zip` rhss')) }
1237  | otherwise   -- Do specialisation
1238  = do  { let (rhs_env1,bndrs') = extendRecBndrs env bndrs
1239              rhs_env2          = extendHowBound rhs_env1 bndrs' RecFun
1240
1241        ; (rhs_usgs, rhs_infos) <- mapAndUnzipM (scRecRhs rhs_env2) (bndrs' `zip` rhss)
1242        ; let rhs_usg = combineUsages rhs_usgs
1243
1244        ; (_, specs) <- specLoop (scForce rhs_env2 force_spec)
1245                                 (scu_calls rhs_usg) rhs_infos nullUsage
1246                                 [SI [] 0 Nothing | _ <- bndrs]
1247
1248        ; return (rhs_env1,  -- For the body of the letrec, delete the RecFun business
1249                  Rec (concat (zipWith specInfoBinds rhs_infos specs))) }
1250  where
1251    (bndrs,rhss) = unzip prs
1252    force_spec   = any (forceSpecBndr env) bndrs
1253      -- Note [Forcing specialisation]
1254
1255scTopBind env (NonRec bndr rhs)
1256  = do  { (_, rhs') <- scExpr env rhs
1257        ; let (env1, bndr') = extendBndr env bndr
1258              env2          = extendValEnv env1 bndr' (isValue (sc_vals env) rhs')
1259        ; return (env2, NonRec bndr' rhs') }
1260
1261----------------------
1262scRecRhs :: ScEnv -> (OutId, InExpr) -> UniqSM (ScUsage, RhsInfo)
1263scRecRhs env (bndr,rhs)
1264  = do  { let (arg_bndrs,body)       = collectBinders rhs
1265              (body_env, arg_bndrs') = extendBndrsWith RecArg env arg_bndrs
1266        ; (body_usg, body')         <- scExpr body_env body
1267        ; let (rhs_usg, arg_occs)    = lookupOccs body_usg arg_bndrs'
1268        ; return (rhs_usg, RI bndr (mkLams arg_bndrs' body')
1269                                   arg_bndrs body arg_occs) }
1270                -- The arg_occs says how the visible,
1271                -- lambda-bound binders of the RHS are used
1272                -- (including the TyVar binders)
1273                -- Two pats are the same if they match both ways
1274
1275----------------------
1276specInfoBinds :: RhsInfo -> SpecInfo -> [(Id,CoreExpr)]
1277specInfoBinds (RI fn new_rhs _ _ _) (SI specs _ _)
1278  = [(id,rhs) | OS _ _ id rhs <- specs] ++
1279              -- First the specialised bindings
1280
1281    [(fn `addIdSpecialisations` rules, new_rhs)]
1282              -- And now the original binding
1283  where
1284    rules = [r | OS _ r _ _ <- specs]
1285\end{code}
1286
1287
1288%************************************************************************
1289%*                                                                      *
1290                The specialiser itself
1291%*                                                                      *
1292%*************…

Large files files are truncated, but you can click here to view the full file