
/compiler/cmm/CmmLayoutStack.hs

http://github.com/ghc/ghc
{-# LANGUAGE CPP, RecordWildCards, GADTs #-}
module CmmLayoutStack (
       cmmLayoutStack, setInfoTableStackMap
  ) where

import StgCmmUtils      ( callerSaveVolatileRegs ) -- XXX layering violation
import StgCmmForeign    ( saveThreadState, loadThreadState ) -- XXX layering violation

import BasicTypes
import Cmm
import CmmInfo
import BlockId
import CLabel
import CmmUtils
import MkGraph
import ForeignCall
import CmmLive
import CmmProcPoint
import SMRep
import Hoopl
import UniqSupply
import StgCmmUtils      ( newTemp )
import Maybes
import UniqFM
import Util

import DynFlags
import FastString
import Outputable hiding ( isEmpty )
import qualified Data.Set as Set
import Control.Monad.Fix
import Data.Array as Array
import Data.Bits
import Data.List (nub)
import Control.Monad (liftM)

import Prelude hiding ((<*>))

#include "HsVersions.h"

{- Note [Stack Layout]

The job of this pass is to

 - replace references to abstract stack Areas with fixed offsets from Sp.

 - replace the CmmHighStackMark constant used in the stack check with
   the maximum stack usage of the proc.

 - save any variables that are live across a call, and reload them as
   necessary.

Before stack allocation, local variables remain live across native
calls (CmmCall{ cmm_cont = Just _ }), and after stack allocation local
variables are clobbered by native calls.

We want to do stack allocation so that as far as possible
 - stack use is minimized, and
 - unnecessary stack saves and loads are avoided.

The algorithm we use is a variant of linear-scan register allocation,
where the stack is our register file.

 - First, we do a liveness analysis, which annotates every block with
   the variables live on entry to the block.

 - We traverse blocks in reverse postorder DFS; that is, we visit at
   least one predecessor of a block before the block itself.  The
   stack layout flowing from the predecessor of the block will
   determine the stack layout on entry to the block.

 - We maintain a data structure

     Map Label StackMap

   which describes the contents of the stack and the stack pointer on
   entry to each block that is a successor of a block that we have
   visited.

 - For each block we visit:

    - Look up the StackMap for this block.

    - If this block is a proc point (or a call continuation, if we
      aren't splitting proc points), emit instructions to reload all
      the live variables from the stack, according to the StackMap.

    - Walk forwards through the instructions:
      - At an assignment  x = Sp[loc]
        - Record the fact that Sp[loc] contains x, so that we won't
          need to save x if it ever needs to be spilled.
      - At an assignment  x = E
        - If x was previously on the stack, it isn't any more
      - At the last node, if it is a call or a jump to a proc point
        - Lay out the stack frame for the call (see setupStackFrame)
        - emit instructions to save all the live variables
        - Remember the StackMaps for all the successors
        - emit an instruction to adjust Sp
      - If the last node is a branch, then the current StackMap is the
        StackMap for the successors.

    - Manifest Sp: replace references to stack areas in this block
      with real Sp offsets.  We cannot do this until we have laid out
      the stack area for the successors above.

      In this phase we also eliminate redundant stores to the stack;
      see elimStackStores.

 - There is one important gotcha: sometimes we'll encounter a control
   transfer to a block that we've already processed (a join point),
   and in that case we might need to rearrange the stack to match
   what the block is expecting.  (exactly the same as in linear-scan
   register allocation, except here we have the luxury of an infinite
   supply of temporary variables).

 - Finally, we update the magic CmmHighStackMark constant with the
   stack usage of the function, and eliminate the whole stack check
   if there was no stack use.  (in fact this is done as part of the
   main traversal, by feeding the high-water-mark output back in as
   an input.  I hate cyclic programming, but it's just too convenient
   sometimes.)

There are plenty of tricky details: update frames, proc points, return
addresses, foreign calls, and some ad-hoc optimisations that are
convenient to do here and effective in common cases.  Comments in the
code below explain these.

-}

-- All stack locations are expressed as positive byte offsets from the
-- "base", which is defined to be the address above the return address
-- on the stack on entry to this CmmProc.
--
-- Lower addresses have higher StackLocs.
--
type StackLoc = ByteOff

{-
 A StackMap describes the stack at any given point.  At a continuation
 it has a particular layout, like this:

         |             | <- base
         |-------------|
         |    ret0     | <- base + 8
         |-------------|
         .  upd frame  . <- base + sm_ret_off
         |-------------|
         |             |
         .    vars     .
         . (live/dead) .
         |             | <- base + sm_sp - sm_args
         |-------------|
         |    ret1     |
         .  ret vals   . <- base + sm_sp    (<--- Sp points here)
         |-------------|

Why do we include the final return address (ret0) in our stack map?  I
have absolutely no idea, but it seems to be done that way consistently
in the rest of the code generator, so I played along here. --SDM

Note that we will be constructing an info table for the continuation
(ret1), which needs to describe the stack down to, but not including,
the update frame (or ret0, if there is no update frame).
-}

data StackMap = StackMap
 { sm_sp   :: StackLoc
       -- ^ the offset of Sp relative to the base on entry
       -- to this block.
 , sm_args :: ByteOff
       -- ^ the number of bytes of arguments in the area for this block
       -- Defn: the offset of young(L) relative to the base is given by
       -- (sm_sp - sm_args) of the StackMap for block L.
 , sm_ret_off :: ByteOff
       -- ^ Number of bytes of stack that we do not describe with an info
       -- table, because it contains an update frame.
 , sm_regs :: UniqFM (LocalReg,StackLoc)
       -- ^ regs on the stack
 }
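
-- A worked illustration (editor's sketch, not from the original
-- source): on a 64-bit target, a continuation with one word of return
-- values, an update frame ending 16 bytes from the base, and a single
-- saved variable x at byte offset 24 might be described by
--
--   StackMap { sm_sp = 32, sm_args = 8, sm_ret_off = 16
--            , sm_regs = unitUFM x (x, 24) }
--
-- so the young area for this continuation starts at
-- sm_sp - sm_args = 24 bytes from the base.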

instance Outputable StackMap where
  ppr StackMap{..} =
     text "Sp = " <> int sm_sp $$
     text "sm_args = " <> int sm_args $$
     text "sm_ret_off = " <> int sm_ret_off $$
     text "sm_regs = " <> pprUFM sm_regs ppr

cmmLayoutStack :: DynFlags -> ProcPointSet -> ByteOff -> CmmGraph
               -> UniqSM (CmmGraph, BlockEnv StackMap)
cmmLayoutStack dflags procpoints entry_args
               graph@(CmmGraph { g_entry = entry })
  = do
    -- We need liveness info.  Dead assignments are removed later
    -- by the sinking pass.
    let liveness = cmmLocalLiveness dflags graph
        blocks = postorderDfs graph

    (final_stackmaps, _final_high_sp, new_blocks) <-
          mfix $ \ ~(rec_stackmaps, rec_high_sp, _new_blocks) ->
            layout dflags procpoints liveness entry entry_args
                   rec_stackmaps rec_high_sp blocks

    new_blocks' <- mapM (lowerSafeForeignCall dflags) new_blocks
    return (ofBlockList entry new_blocks', final_stackmaps)

layout :: DynFlags
       -> BlockSet               -- proc points
       -> BlockEnv CmmLocalLive  -- liveness
       -> BlockId                -- entry
       -> ByteOff                -- stack args on entry
       -> BlockEnv StackMap      -- [final] stack maps
       -> ByteOff                -- [final] Sp high water mark
       -> [CmmBlock]             -- [in] blocks
       -> UniqSM
          ( BlockEnv StackMap    -- [out] stack maps
          , ByteOff              -- [out] Sp high water mark
          , [CmmBlock]           -- [out] new blocks
          )

layout dflags procpoints liveness entry entry_args final_stackmaps final_sp_high blocks
  = go blocks init_stackmap entry_args []
  where
    (updfr, cont_info) = collectContInfo blocks

    init_stackmap = mapSingleton entry StackMap{ sm_sp   = entry_args
                                               , sm_args = entry_args
                                               , sm_ret_off = updfr
                                               , sm_regs = emptyUFM
                                               }

    go [] acc_stackmaps acc_hwm acc_blocks
      = return (acc_stackmaps, acc_hwm, acc_blocks)

    go (b0 : bs) acc_stackmaps acc_hwm acc_blocks
      = do
       let (entry0@(CmmEntry entry_lbl tscope), middle0, last0) = blockSplit b0

       let stack0@StackMap { sm_sp = sp0 }
               = mapFindWithDefault
                     (pprPanic "no stack map for" (ppr entry_lbl))
                     entry_lbl acc_stackmaps

       -- (a) Update the stack map to include the effects of
       --     assignments in this block
       let stack1 = foldBlockNodesF (procMiddle acc_stackmaps) middle0 stack0

       -- (b) Insert assignments to reload all the live variables if this
       --     block is a proc point
       let middle1 = if entry_lbl `setMember` procpoints
                        then foldr blockCons middle0 (insertReloads stack0)
                        else middle0

       -- (c) Look at the last node and if we are making a call or
       --     jumping to a proc point, we must save the live
       --     variables, adjust Sp, and construct the StackMaps for
       --     each of the successor blocks.  See handleLastNode for
       --     details.
       (middle2, sp_off, last1, fixup_blocks, out)
           <- handleLastNode dflags procpoints liveness cont_info
                             acc_stackmaps stack1 tscope middle0 last0

       -- (d) Manifest Sp: run over the nodes in the block and replace
       --     CmmStackSlot with CmmLoad from Sp with a concrete offset.
       --
       -- our block:
       --    middle1          -- the original middle nodes
       --    middle2          -- live variable saves from handleLastNode
       --    Sp = Sp + sp_off -- Sp adjustment goes here
       --    last1            -- the last node
       --
       let middle_pre = blockToList $ foldl blockSnoc middle1 middle2

           final_blocks = manifestSp dflags final_stackmaps stack0 sp0
                              final_sp_high entry0 middle_pre sp_off
                              last1 fixup_blocks

           acc_stackmaps' = mapUnion acc_stackmaps out

           -- If this block jumps to the GC, then we do not take its
           -- stack usage into account for the high-water mark.
           -- Otherwise, if the only stack usage is in the stack-check
           -- failure block itself, we will do a redundant stack
           -- check.  The stack has a buffer designed to accommodate
           -- the largest amount of stack needed for calling the GC.
           --
           this_sp_hwm | isGcJump last0 = 0
                       | otherwise      = sp0 - sp_off

           hwm' = maximum (acc_hwm : this_sp_hwm : map sm_sp (mapElems out))

       go bs acc_stackmaps' hwm' (final_blocks ++ acc_blocks)

-- -----------------------------------------------------------------------------

-- Not foolproof, but GCFun is the culprit we most want to catch
isGcJump :: CmmNode O C -> Bool
isGcJump (CmmCall { cml_target = CmmReg (CmmGlobal l) })
  = l == GCFun || l == GCEnter1
isGcJump _something_else = False

-- -----------------------------------------------------------------------------

-- This doesn't seem right somehow.  We need to find out whether this
-- proc will push some update frame material at some point, so that we
-- can avoid using that area of the stack for spilling.  The
-- updfr_space field of the CmmProc *should* tell us, but it doesn't
-- (I think maybe it gets filled in later when we do proc-point
-- splitting).
--
-- So we'll just take the max of all the cml_ret_offs.  This could be
-- unnecessarily pessimistic, but probably not in the code we
-- generate.

collectContInfo :: [CmmBlock] -> (ByteOff, BlockEnv ByteOff)
collectContInfo blocks
  = (maximum ret_offs, mapFromList (catMaybes mb_argss))
 where
  (mb_argss, ret_offs) = mapAndUnzip get_cont blocks

  get_cont :: Block CmmNode x C -> (Maybe (Label, ByteOff), ByteOff)
  get_cont b =
     case lastNode b of
        CmmCall { cml_cont = Just l, .. }
           -> (Just (l, cml_ret_args), cml_ret_off)
        CmmForeignCall { .. }
           -> (Just (succ, ret_args), ret_off)
        _other -> (Nothing, 0)

-- -----------------------------------------------------------------------------
-- Updating the StackMap from middle nodes

-- Look for loads from stack slots, and update the StackMap.  This is
-- purely for optimisation reasons, so that we can avoid saving a
-- variable back to a different stack slot if it is already on the
-- stack.
--
-- This happens a lot: for example when function arguments are passed
-- on the stack and need to be immediately saved across a call, we
-- want to just leave them where they are on the stack.
--
procMiddle :: BlockEnv StackMap -> CmmNode e x -> StackMap -> StackMap
procMiddle stackmaps node sm
  = case node of
     CmmAssign (CmmLocal r) (CmmLoad (CmmStackSlot area off) _)
       -> sm { sm_regs = addToUFM (sm_regs sm) r (r,loc) }
       where loc = getStackLoc area off stackmaps
     CmmAssign (CmmLocal r) _other
       -> sm { sm_regs = delFromUFM (sm_regs sm) r }
     _other
       -> sm

getStackLoc :: Area -> ByteOff -> BlockEnv StackMap -> StackLoc
getStackLoc Old       n _         = n
getStackLoc (Young l) n stackmaps =
  case mapLookup l stackmaps of
    Nothing -> pprPanic "getStackLoc" (ppr l)
    Just sm -> sm_sp sm - sm_args sm + n
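
-- For example (editor's illustration, not from the original source):
-- if the StackMap for block L has sm_sp = 32 and sm_args = 8, the
-- young area for L starts 24 bytes from the base, and
-- getStackLoc (Young L) 8 evaluates to 32 - 8 + 8 = 32.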

-- -----------------------------------------------------------------------------
-- Handling stack allocation for a last node

-- We take a single last node and turn it into:
--
--    C1 (some statements)
--    Sp = Sp + N
--    C2 (some more statements)
--    call f()          -- the actual last node
--
-- plus possibly some more blocks (we may have to add some fixup code
-- between the last node and the continuation).
--
-- C1: is the code for saving the variables across this last node onto
-- the stack, if the continuation is a call or jumps to a proc point.
--
-- C2: if the last node is a safe foreign call, we have to inject some
-- extra code that goes *after* the Sp adjustment.

handleLastNode
   :: DynFlags -> ProcPointSet -> BlockEnv CmmLocalLive -> BlockEnv ByteOff
   -> BlockEnv StackMap -> StackMap -> CmmTickScope
   -> Block CmmNode O O
   -> CmmNode O C
   -> UniqSM
      ( [CmmNode O O]      -- nodes to go *before* the Sp adjustment
      , ByteOff            -- amount to adjust Sp
      , CmmNode O C        -- new last node
      , [CmmBlock]         -- new blocks
      , BlockEnv StackMap  -- stackmaps for the continuations
      )

handleLastNode dflags procpoints liveness cont_info stackmaps
               stack0@StackMap { sm_sp = sp0 } tscp middle last
 = case last of
    --  At each return / tail call,
    --  adjust Sp to point to the last argument pushed, which
    --  is cml_args, after popping any other junk from the stack.
    CmmCall{ cml_cont = Nothing, .. } -> do
      let sp_off = sp0 - cml_args
      return ([], sp_off, last, [], mapEmpty)

    --  At each CmmCall with a continuation:
    CmmCall{ cml_cont = Just cont_lbl, .. } ->
       return $ lastCall cont_lbl cml_args cml_ret_args cml_ret_off

    CmmForeignCall{ succ = cont_lbl, .. } -> do
       return $ lastCall cont_lbl (wORD_SIZE dflags) ret_args ret_off
            -- one word of args: the return address

    CmmBranch {}     -> handleBranches
    CmmCondBranch {} -> handleBranches
    CmmSwitch {}     -> handleBranches

  where
     -- Calls and ForeignCalls are handled the same way:
     lastCall :: BlockId -> ByteOff -> ByteOff -> ByteOff
              -> ( [CmmNode O O]
                 , ByteOff
                 , CmmNode O C
                 , [CmmBlock]
                 , BlockEnv StackMap
                 )
     lastCall lbl cml_args cml_ret_args cml_ret_off
      =  ( assignments
         , spOffsetForCall sp0 cont_stack cml_args
         , last
         , []   -- no new blocks
         , mapSingleton lbl cont_stack )
      where
         (assignments, cont_stack) = prepareStack lbl cml_ret_args cml_ret_off

     prepareStack lbl cml_ret_args cml_ret_off
       | Just cont_stack <- mapLookup lbl stackmaps
             -- If we have already seen this continuation before, then
             -- we just have to make the stack look the same:
       = (fixupStack stack0 cont_stack, cont_stack)
             -- Otherwise, we have to allocate the stack frame
       | otherwise
       = (save_assignments, new_cont_stack)
       where
         (new_cont_stack, save_assignments)
            = setupStackFrame dflags lbl liveness cml_ret_off cml_ret_args stack0

     -- For other last nodes (branches), if any of the targets is a
     -- proc point, we have to set up the stack to match what the proc
     -- point is expecting.
     --
     handleBranches :: UniqSM ( [CmmNode O O]
                              , ByteOff
                              , CmmNode O C
                              , [CmmBlock]
                              , BlockEnv StackMap )

     handleBranches
         -- Note [diamond proc point]
       | Just l <- futureContinuation middle
       , (nub $ filter (`setMember` procpoints) $ successors last) == [l]
       = do
         let cont_args = mapFindWithDefault 0 l cont_info
             (assigs, cont_stack) = prepareStack l cont_args (sm_ret_off stack0)
             out = mapFromList [ (l', cont_stack)
                               | l' <- successors last ]
         return ( assigs
                , spOffsetForCall sp0 cont_stack (wORD_SIZE dflags)
                , last
                , []
                , out)

       | otherwise = do
         pps <- mapM handleBranch (successors last)
         let lbl_map :: LabelMap Label
             lbl_map = mapFromList [ (l,tmp) | (l,tmp,_,_) <- pps ]
             fix_lbl l = mapFindWithDefault l l lbl_map
         return ( []
                , 0
                , mapSuccessors fix_lbl last
                , concat [ blk | (_,_,_,blk) <- pps ]
                , mapFromList [ (l, sm) | (l,_,sm,_) <- pps ] )

     -- For each successor of this block
     handleBranch :: BlockId -> UniqSM (BlockId, BlockId, StackMap, [CmmBlock])
     handleBranch l
        --   (a) if the successor already has a stackmap, we need to
        --       shuffle the current stack to make it look the same.
        --       We have to insert a new block to make this happen.
        | Just stack2 <- mapLookup l stackmaps
        = do
             let assigs = fixupStack stack0 stack2
             (tmp_lbl, block) <- makeFixupBlock dflags sp0 l stack2 tscp assigs
             return (l, tmp_lbl, stack2, block)

        --   (b) if the successor is a proc point, save everything
        --       on the stack.
        | l `setMember` procpoints
        = do
             let cont_args = mapFindWithDefault 0 l cont_info
                 (stack2, assigs) =
                      setupStackFrame dflags l liveness (sm_ret_off stack0)
                                                        cont_args stack0
             (tmp_lbl, block) <- makeFixupBlock dflags sp0 l stack2 tscp assigs
             return (l, tmp_lbl, stack2, block)

        --   (c) otherwise, the current StackMap is the StackMap for
        --       the continuation.  But we must remember to remove any
        --       variables from the StackMap that are *not* live at
        --       the destination, because this StackMap might be used
        --       by fixupStack if this is a join point.
        | otherwise = return (l, l, stack1, [])
        where live = mapFindWithDefault (panic "handleBranch") l liveness
              stack1 = stack0 { sm_regs = filterUFM is_live (sm_regs stack0) }
              is_live (r,_) = r `elemRegSet` live

makeFixupBlock :: DynFlags -> ByteOff -> Label -> StackMap
               -> CmmTickScope -> [CmmNode O O]
               -> UniqSM (Label, [CmmBlock])
makeFixupBlock dflags sp0 l stack tscope assigs
  | null assigs && sp0 == sm_sp stack = return (l, [])
  | otherwise = do
      tmp_lbl <- liftM mkBlockId $ getUniqueM
      let sp_off = sp0 - sm_sp stack
          block = blockJoin (CmmEntry tmp_lbl tscope)
                            (maybeAddSpAdj dflags sp_off (blockFromList assigs))
                            (CmmBranch l)
      return (tmp_lbl, [block])

-- Sp is currently pointing to current_sp,
-- we want it to point to
--    (sm_sp cont_stack - sm_args cont_stack + args)
-- so the difference is
--    sp0 - (sm_sp cont_stack - sm_args cont_stack + args)
spOffsetForCall :: ByteOff -> StackMap -> ByteOff -> ByteOff
spOffsetForCall current_sp cont_stack args
  = current_sp - (sm_sp cont_stack - sm_args cont_stack + args)
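
-- For example (editor's illustration, not from the original source):
-- with current_sp = 32 and a continuation whose StackMap has
-- sm_sp = 24 and sm_args = 8, a call pushing args = 8 bytes gives an
-- adjustment of 32 - (24 - 8 + 8) = 8 bytes.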

-- | create a sequence of assignments to establish the new StackMap,
-- given the old StackMap.
fixupStack :: StackMap -> StackMap -> [CmmNode O O]
fixupStack old_stack new_stack = concatMap move new_locs
 where
     old_map  = sm_regs old_stack
     new_locs = stackSlotRegs new_stack

     move (r,n)
       | Just (_,m) <- lookupUFM old_map r, n == m = []
       | otherwise = [CmmStore (CmmStackSlot Old n)
                               (CmmReg (CmmLocal r))]
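
-- For example (editor's illustration): if the new StackMap wants x at
-- slot 16 and y at slot 24, and the old StackMap already has x at
-- slot 16, then fixupStack emits just the single store
-- Sp[Old + 24] = y.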

setupStackFrame
             :: DynFlags
             -> BlockId                 -- label of continuation
             -> BlockEnv CmmLocalLive   -- liveness
             -> ByteOff                 -- updfr
             -> ByteOff                 -- bytes of return values on stack
             -> StackMap                -- current StackMap
             -> (StackMap, [CmmNode O O])

setupStackFrame dflags lbl liveness updfr_off ret_args stack0
  = (cont_stack, assignments)
  where
      -- get the set of LocalRegs live in the continuation
      live = mapFindWithDefault Set.empty lbl liveness

      -- the stack from the base to updfr_off is off-limits.
      -- our new stack frame contains:
      --   * saved live variables
      --   * the return address [young(C) + 8]
      --   * the args for the call,
      --     which are replaced by the return values at the return
      --     point.

      -- everything up to updfr_off is off-limits
      -- stack1 contains updfr_off, plus everything we need to save
      (stack1, assignments) = allocate dflags updfr_off live stack0

      -- And the Sp at the continuation is:
      --   sm_sp stack1 + ret_args
      cont_stack = stack1{ sm_sp = sm_sp stack1 + ret_args
                         , sm_args = ret_args
                         , sm_ret_off = updfr_off
                         }

-- -----------------------------------------------------------------------------
-- Note [diamond proc point]
--
-- This special case looks for the pattern we get from a typical
-- tagged case expression:
--
--    Sp[young(L1)] = L1
--    if (R1 & 7) != 0 goto L1 else goto L2
--  L2:
--    call [R1] returns to L1
--  L1: live: {y}
--    x = R1
--
-- If we let the generic case handle this, we get
--
--    Sp[-16] = L1
--    if (R1 & 7) != 0 goto L1a else goto L2
--  L2:
--    Sp[-8] = y
--    Sp = Sp - 16
--    call [R1] returns to L1
--  L1a:
--    Sp[-8] = y
--    Sp = Sp - 16
--    goto L1
--  L1:
--    x = R1
--
-- The code for saving the live vars is duplicated in each branch, and
-- furthermore there is an extra jump in the fast path (assuming L1 is
-- a proc point, which it probably is if there is a heap check).
--
-- So to fix this we want to set up the stack frame before the
-- conditional jump.  How do we know when to do this, and when it is
-- safe?  The basic idea is, when we see the assignment
--
--   Sp[young(L)] = L
--
-- we know that
--   * we are definitely heading for L
--   * there can be no more reads from another stack area, because young(L)
--     overlaps with it.
--
-- We don't necessarily know that everything live at L is live now
-- (some might be assigned between here and the jump to L).  So we
-- simplify and only do the optimisation when we see
--
--   (1) a block containing an assignment of a return address L
--   (2) ending in a branch where one (and only one) continuation goes to L,
--       and no other continuations go to proc points.
--
-- then we allocate the stack frame for L at the end of the block,
-- before the branch.
--
-- We could generalise (2), but that would make it a bit more
-- complicated to handle, and this currently catches the common case.

futureContinuation :: Block CmmNode O O -> Maybe BlockId
futureContinuation middle = foldBlockNodesB f middle Nothing
   where f :: CmmNode a b -> Maybe BlockId -> Maybe BlockId
         f (CmmStore (CmmStackSlot (Young l) _) (CmmLit (CmmBlock _))) _
               = Just l
         f _ r = r
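
-- For example (editor's illustration): a middle node arising from
--    Sp[young(L1)] = L1
-- is a CmmStore of the block literal L1 into the Young L1 area, so
-- futureContinuation returns Just L1 for a block containing it.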

-- -----------------------------------------------------------------------------
-- Saving live registers

-- | Given a set of live registers and a StackMap, save all the registers
-- on the stack and return the new StackMap and the assignments to do
-- the saving.
--
allocate :: DynFlags -> ByteOff -> LocalRegSet -> StackMap
         -> (StackMap, [CmmNode O O])
allocate dflags ret_off live stackmap@StackMap{ sm_sp = sp0
                                              , sm_regs = regs0 }
 =
   -- we only have to save regs that are not already in a slot
   let to_save = filter (not . (`elemUFM` regs0)) (Set.elems live)
       regs1   = filterUFM (\(r,_) -> elemRegSet r live) regs0
   in

   -- make a map of the stack
   let stack = reverse $ Array.elems $
               accumArray (\_ x -> x) Empty (1, toWords dflags (max sp0 ret_off)) $
                 ret_words ++ live_words
            where ret_words =
                   [ (x, Occupied)
                   | x <- [ 1 .. toWords dflags ret_off] ]
                  live_words =
                   [ (toWords dflags x, Occupied)
                   | (r,off) <- nonDetEltsUFM regs1,
                     -- See Note [Unique Determinism and code generation]
                     let w = localRegBytes dflags r,
                     x <- [ off, off - wORD_SIZE dflags .. off - w + 1] ]
   in

   -- Pass over the stack: find slots to save all the new live variables,
   -- choosing the oldest slots first (hence a foldr).
   let
       save slot ([], stack, n, assigs, regs) -- no more regs to save
          = ([], slot:stack, plusW dflags n 1, assigs, regs)
       save slot (to_save, stack, n, assigs, regs)
          = case slot of
               Occupied -> (to_save, Occupied:stack, plusW dflags n 1, assigs, regs)
               Empty
                 | Just (stack', r, to_save') <-
                       select_save to_save (slot:stack)
                 -> let assig = CmmStore (CmmStackSlot Old n')
                                         (CmmReg (CmmLocal r))
                        n' = plusW dflags n 1
                    in
                        (to_save', stack', n', assig : assigs, (r,(r,n')):regs)

                 | otherwise
                 -> (to_save, slot:stack, plusW dflags n 1, assigs, regs)

       -- we should do better here: right now we'll fit the smallest first,
       -- but it would make more sense to fit the biggest first.
       select_save :: [LocalReg] -> [StackSlot]
                   -> Maybe ([StackSlot], LocalReg, [LocalReg])
       select_save regs stack = go regs []
          where go []     _no_fit = Nothing
                go (r:rs) no_fit
                  | Just rest <- dropEmpty words stack
                  = Just (replicate words Occupied ++ rest, r, rs++no_fit)
                  | otherwise
                  = go rs (r:no_fit)
                  where words = localRegWords dflags r

       -- fill in empty slots as much as possible
       (still_to_save, save_stack, n, save_assigs, save_regs)
          = foldr save (to_save, [], 0, [], []) stack

       -- push any remaining live vars on the stack
       (push_sp, push_assigs, push_regs)
          = foldr push (n, [], []) still_to_save
          where
              push r (n, assigs, regs)
                = (n', assig : assigs, (r,(r,n')) : regs)
                where
                  n' = n + localRegBytes dflags r
                  assig = CmmStore (CmmStackSlot Old n')
                                   (CmmReg (CmmLocal r))

       trim_sp
          | not (null push_regs) = push_sp
          | otherwise
          = plusW dflags n (- length (takeWhile isEmpty save_stack))

       final_regs = regs1 `addListToUFM` push_regs
                          `addListToUFM` save_regs

   in

   -- XXX should be an assert
   if ( n /= max sp0 ret_off ) then pprPanic "allocate" (ppr n <+> ppr sp0 <+> ppr ret_off) else

   if (trim_sp .&. (wORD_SIZE dflags - 1)) /= 0 then pprPanic "allocate2" (ppr trim_sp <+> ppr final_regs <+> ppr push_sp) else

   ( stackmap { sm_regs = final_regs , sm_sp = trim_sp }
   , push_assigs ++ save_assigs )
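
-- A worked illustration of the stack-slot map above (editor's sketch,
-- not from the original source): on a 64-bit target with sp0 = 32,
-- ret_off = 16 and one live register already saved at byte offset 32,
-- words 1 and 2 (the update-frame area) and word 4 (the saved
-- register) are Occupied, leaving word 3 (byte offset 24) Empty and
-- available for spilling another one-word register.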

-- -----------------------------------------------------------------------------
-- Manifesting Sp

-- | Manifest Sp: turn all the CmmStackSlots into CmmLoads from Sp.  The
-- block looks like this:
--
--    middle_pre       -- the middle nodes
--    Sp = Sp + sp_off -- Sp adjustment goes here
--    last             -- the last node
--
-- And we have some extra blocks too (that don't contain Sp adjustments)
--
-- The adjustment for middle_pre will be different from that for
-- middle_post, because the Sp adjustment intervenes.
--
manifestSp
   :: DynFlags
   -> BlockEnv StackMap  -- StackMaps for other blocks
   -> StackMap           -- StackMap for this block
   -> ByteOff            -- Sp on entry to the block
   -> ByteOff            -- SpHigh
   -> CmmNode C O        -- first node
   -> [CmmNode O O]      -- middle
   -> ByteOff            -- sp_off
   -> CmmNode O C        -- last node
   -> [CmmBlock]         -- new blocks
   -> [CmmBlock]         -- final blocks with Sp manifest

manifestSp dflags stackmaps stack0 sp0 sp_high
           first middle_pre sp_off last fixup_blocks
  = final_block : fixup_blocks'
  where
    area_off = getAreaOff stackmaps

    adj_pre_sp, adj_post_sp :: CmmNode e x -> CmmNode e x
    adj_pre_sp  = mapExpDeep (areaToSp dflags sp0            sp_high area_off)
    adj_post_sp = mapExpDeep (areaToSp dflags (sp0 - sp_off) sp_high area_off)

    -- Add unwind pseudo-instructions to document Sp level for debugging
    add_unwind_info block
      | debugLevel dflags > 0 = CmmUnwind Sp sp_unwind : block
      | otherwise             = block
    sp_unwind = CmmRegOff (CmmGlobal Sp) (sp0 - wORD_SIZE dflags)

    final_middle = maybeAddSpAdj dflags sp_off $
                   blockFromList $
                   add_unwind_info $
                   map adj_pre_sp $
                   elimStackStores stack0 stackmaps area_off $
                   middle_pre

    final_last    = optStackCheck (adj_post_sp last)

    final_block   = blockJoin first final_middle final_last

    fixup_blocks' = map (mapBlock3' (id, adj_post_sp, id)) fixup_blocks

getAreaOff :: BlockEnv StackMap -> (Area -> StackLoc)
getAreaOff _ Old = 0
getAreaOff stackmaps (Young l) =
  case mapLookup l stackmaps of
    Just sm -> sm_sp sm - sm_args sm
    Nothing -> pprPanic "getAreaOff" (ppr l)

maybeAddSpAdj :: DynFlags -> ByteOff -> Block CmmNode O O -> Block CmmNode O O
maybeAddSpAdj _      0      block = block
maybeAddSpAdj dflags sp_off block
   = block `blockSnoc` CmmAssign spReg (cmmOffset dflags (CmmReg spReg) sp_off)

{-
Sp(L) is the Sp offset on entry to block L relative to the base of the
OLD area.

SpArgs(L) is the size of the young area for L, i.e. the number of
arguments.

 - in block L, each reference to [old + N] turns into
   [Sp + Sp(L) - N]

 - in block L, each reference to [young(L') + N] turns into
   [Sp + Sp(L) - Sp(L') + SpArgs(L') - N]

 - be careful with the last node of each block: Sp has already been adjusted
   to be Sp + Sp(L) - Sp(L')
-}
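
-- For example (editor's illustration, not from the original source):
-- if Sp(L) = 32, a reference to [old + 8] in block L becomes
-- [Sp + 24]; and if Sp(L') = 16 with SpArgs(L') = 8, a reference to
-- [young(L') + 8] becomes [Sp + 32 - 16 + 8 - 8] = [Sp + 16].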

areaToSp :: DynFlags -> ByteOff -> ByteOff -> (Area -> StackLoc) -> CmmExpr -> CmmExpr

areaToSp dflags sp_old _sp_hwm area_off (CmmStackSlot area n)
  = cmmOffset dflags (CmmReg spReg) (sp_old - area_off area - n)
    -- Replace (CmmStackSlot area n) with an offset from Sp

areaToSp dflags _ sp_hwm _ (CmmLit CmmHighStackMark)
  = mkIntExpr dflags sp_hwm
    -- Replace CmmHighStackMark with the number of bytes of stack used,
    -- the sp_hwm.  See Note [Stack usage] in StgCmmHeap

areaToSp dflags _ _ _ (CmmMachOp (MO_U_Lt _) args)
  | falseStackCheck args
  = zeroExpr dflags
areaToSp dflags _ _ _ (CmmMachOp (MO_U_Ge _) args)
  | falseStackCheck args
  = mkIntExpr dflags 1
    -- Replace a stack-overflow test that cannot fail with a no-op
    -- See Note [Always false stack check]

areaToSp _ _ _ _ other = other

-- | Determine whether a stack check cannot fail.
falseStackCheck :: [CmmExpr] -> Bool
falseStackCheck [ CmmMachOp (MO_Sub _)
                      [ CmmRegOff (CmmGlobal Sp) x_off
                      , CmmLit (CmmInt y_lit _)]
                , CmmReg (CmmGlobal SpLim)]
  = fromIntegral x_off >= y_lit
falseStackCheck _ = False

-- Note [Always false stack check]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- We can optimise stack checks of the form
--
--   if ((Sp + x) - y < SpLim) then .. else ..
--
-- where x and y are non-negative integer byte offsets.  Since we know
-- that SpLim <= Sp (remember the stack grows downwards), this test
-- must yield False if (x >= y), so we can rewrite the comparison to
-- False.  A subsequent sinking pass will later drop the dead code.
-- Optimising this away depends on knowing that SpLim <= Sp, so it is
-- really the job of the stack layout algorithm, hence we do it now.
--
-- The control flow optimiser may negate a conditional to increase
-- the likelihood of a fallthrough if the branch is not taken.  But
-- not every conditional is inverted as the control flow optimiser
-- places some requirements on the predecessors of both branch targets.
-- So we better look for the inverted comparison too.
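
-- For example (editor's illustration): with x = 16 and y = 8 the test
-- reads  if ((Sp + 16) - 8 < SpLim) , i.e.  Sp + 8 < SpLim .  Since
-- SpLim <= Sp, this is always False; falseStackCheck returns True
-- here (x_off = 16 >= y_lit = 8), so the comparison is rewritten.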

optStackCheck :: CmmNode O C -> CmmNode O C
optStackCheck n = -- Note [Always false stack check]
 case n of
   CmmCondBranch (CmmLit (CmmInt 0 _)) _true false _ -> CmmBranch false
   CmmCondBranch (CmmLit (CmmInt _ _)) true _false _ -> CmmBranch true
   other -> other

-- -----------------------------------------------------------------------------

-- | Eliminate stores of the form
--
--    Sp[area+n] = r
--
-- when we know that r is already in the same slot as Sp[area+n].  We
-- could do this in a later optimisation pass, but that would involve
-- a separate analysis and we already have the information to hand
-- here.  It helps clean up some extra stack stores in common cases.
--
-- Note that we may have to modify the StackMap as we walk through the
-- code using procMiddle, since an assignment to a variable in the
-- StackMap will invalidate its mapping there.
--
elimStackStores :: StackMap
                -> BlockEnv StackMap
                -> (Area -> ByteOff)
                -> [CmmNode O O]
                -> [CmmNode O O]
elimStackStores stackmap stackmaps area_off nodes
  = go stackmap nodes
  where
    go _stackmap [] = []
    go stackmap (n:ns)
     = case n of
         CmmStore (CmmStackSlot area m) (CmmReg (CmmLocal r))
            | Just (_,off) <- lookupUFM (sm_regs stackmap) r
            , area_off area + m == off
            -> go stackmap ns
         _otherwise
            -> n : go (procMiddle stackmaps n stackmap) ns
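
-- For example (editor's illustration): in
--    x = Sp[Old + 24]; ...; Sp[Old + 24] = x
-- the load records (via procMiddle) that x lives in slot 24, so the
-- later store would write a value that is already there and is
-- dropped.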

-- -----------------------------------------------------------------------------
-- Update info tables to include stack liveness

setInfoTableStackMap :: DynFlags -> BlockEnv StackMap -> CmmDecl -> CmmDecl
setInfoTableStackMap dflags stackmaps (CmmProc top_info@TopInfo{..} l v g)
  = CmmProc top_info{ info_tbls = mapMapWithKey fix_info info_tbls } l v g
  where
    fix_info lbl info_tbl@CmmInfoTable{ cit_rep = StackRep _ } =
       info_tbl { cit_rep = StackRep (get_liveness lbl) }
    fix_info _ other = other

    get_liveness :: BlockId -> Liveness
    get_liveness lbl
      = case mapLookup lbl stackmaps of
          Nothing -> pprPanic "setInfoTableStackMap" (ppr lbl <+> ppr info_tbls)
          Just sm -> stackMapToLiveness dflags sm

setInfoTableStackMap _ _ d = d

stackMapToLiveness :: DynFlags -> StackMap -> Liveness
stackMapToLiveness dflags StackMap{..} =
   reverse $ Array.elems $
        accumArray (\_ x -> x) True (toWords dflags sm_ret_off + 1,
                                     toWords dflags (sm_sp - sm_args)) live_words
   where
     live_words =  [ (toWords dflags off, False)
                   | (r,off) <- nonDetEltsUFM sm_regs
                   , isGcPtrType (localRegType r) ]
                   -- See Note [Unique Determinism and code generation]
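
-- For example (editor's illustration): with sm_ret_off = 0,
-- sm_sp - sm_args = 16 on a 64-bit target (words 1 and 2), and a
-- GC-pointer register saved at byte offset 8, the result is
-- [True, False]: the array starts all True and only the word holding
-- the pointer is set to False.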

-- -----------------------------------------------------------------------------
-- Lowering safe foreign calls

{-
Note [Lower safe foreign calls]

We start with

   Sp[young(L1)] = L1
 ,-----------------------
 | r1 = foo(x,y,z) returns to L1
 '-----------------------
 L1:
   R1 = r1 -- copyIn, inserted by mkSafeCall
   ...

the stack layout algorithm will arrange to save and reload everything
live across the call.  Our job now is to expand the call so we get

   Sp[young(L1)] = L1
 ,-----------------------
 | SAVE_THREAD_STATE()
 | token = suspendThread(BaseReg, interruptible)
 | r = foo(x,y,z)
 | BaseReg = resumeThread(token)
 | LOAD_THREAD_STATE()
 | R1 = r  -- copyOut
 | jump Sp[0]
 '-----------------------
 L1:
   r = R1 -- copyIn, inserted by mkSafeCall
   ...

Note the copyOut, which saves the results in the places that L1 is
expecting them (see Note [safe foreign call convention]).  Note also
that the safe foreign call is replaced by an unsafe one in the Cmm
graph.
-}

lowerSafeForeignCall :: DynFlags -> CmmBlock -> UniqSM CmmBlock
lowerSafeForeignCall dflags block
  | (entry@(CmmEntry _ tscp), middle, CmmForeignCall { .. }) <- blockSplit block
  = do
    -- Both 'id' and 'new_base' are KindNonPtr because they're
    -- RTS-only objects and are not subject to garbage collection
    id <- newTemp (bWord dflags)
    new_base <- newTemp (cmmRegType dflags (CmmGlobal BaseReg))
    let (caller_save, caller_load) = callerSaveVolatileRegs dflags
    save_state_code <- saveThreadState dflags
    load_state_code <- loadThreadState dflags
    let suspend = save_state_code <*>
                  caller_save <*>
                  mkMiddle (callSuspendThread dflags id intrbl)
        midCall = mkUnsafeCall tgt res args
        resume  = mkMiddle (callResumeThread new_base id) <*>
                  -- Assign the result to BaseReg: we
                  -- might now have a different Capability!
                  mkAssign (CmmGlobal BaseReg) (CmmReg (CmmLocal new_base)) <*>
                  caller_load <*>
                  load_state_code

        (_, regs, copyout) =
             copyOutOflow dflags NativeReturn Jump (Young succ)
                            (map (CmmReg . CmmLocal) res)
                            ret_off []

        -- NB. after resumeThread returns, the top-of-stack probably contains
        -- the stack frame for succ, but it might not: if the current thread
        -- received an exception during the call, then the stack might be
        -- different.  Hence we continue by jumping to the top stack frame,
        -- not by jumping to succ.
        jump = CmmCall { cml_target    = entryCode dflags $
                                         CmmLoad (CmmReg spReg) (bWord dflags)
                       , cml_cont      = Just succ
                       , cml_args_regs = regs
                       , cml_args      = widthInBytes (wordWidth dflags)
                       , cml_ret_args  = ret_args
                       , cml_ret_off   = ret_off }

    graph' <- lgraphOfAGraph ( suspend <*>
                               midCall <*>
                               resume  <*>
                               copyout <*>
                               mkLast jump, tscp)

    case toBlockList graph' of
      [one] -> let (_, middle', last) = blockSplit one
               in return (blockJoin entry (middle `blockAppend` middle') last)
      _ -> panic "lowerSafeForeignCall0"

  -- Block doesn't end in a safe foreign call:
  | otherwise = return block

foreignLbl :: FastString -> CmmExpr
foreignLbl name = CmmLit (CmmLabel (mkForeignLabel name Nothing ForeignLabelInExternalPackage IsFunction))

callSuspendThread :: DynFlags -> LocalReg -> Bool -> CmmNode O O
callSuspendThread dflags id intrbl =
  CmmUnsafeForeignCall
       (ForeignTarget (foreignLbl (fsLit "suspendThread"))
        (ForeignConvention CCallConv [AddrHint, NoHint] [AddrHint] CmmMayReturn))
       [id] [CmmReg (CmmGlobal BaseReg), mkIntExpr dflags (fromEnum intrbl)]

callResumeThread :: LocalReg -> LocalReg -> CmmNode O O
callResumeThread new_base id =
  CmmUnsafeForeignCall
       (ForeignTarget (foreignLbl (fsLit "resumeThread"))
        (ForeignConvention CCallConv [AddrHint] [AddrHint] CmmMayReturn))
       [new_base] [CmmReg (CmmLocal id)]

-- -----------------------------------------------------------------------------

plusW :: DynFlags -> ByteOff -> WordOff -> ByteOff
plusW dflags b w = b + w * wORD_SIZE dflags

data StackSlot = Occupied | Empty
     -- Occupied: a return address or part of an update frame

instance Outputable StackSlot where
  ppr Occupied = text "XXX"
  ppr Empty    = text "---"

dropEmpty :: WordOff -> [StackSlot] -> Maybe [StackSlot]
dropEmpty 0 ss           = Just ss
dropEmpty n (Empty : ss) = dropEmpty (n-1) ss
dropEmpty _ _            = Nothing
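
-- For example (editor's illustration):
--   dropEmpty 2 [Empty, Empty, Occupied]  gives  Just [Occupied]
--   dropEmpty 2 [Empty, Occupied]         gives  Nothing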

isEmpty :: StackSlot -> Bool
isEmpty Empty = True
isEmpty _     = False

localRegBytes :: DynFlags -> LocalReg -> ByteOff
localRegBytes dflags r
    = roundUpToWords dflags (widthInBytes (typeWidth (localRegType r)))

localRegWords :: DynFlags -> LocalReg -> WordOff
localRegWords dflags = toWords dflags . localRegBytes dflags

toWords :: DynFlags -> ByteOff -> WordOff
toWords dflags x = x `quot` wORD_SIZE dflags

insertReloads :: StackMap -> [CmmNode O O]
insertReloads stackmap =
   [ CmmAssign (CmmLocal r) (CmmLoad (CmmStackSlot Old sp)
                                     (localRegType r))
   | (r,sp) <- stackSlotRegs stackmap
   ]

stackSlotRegs :: StackMap -> [(LocalReg, StackLoc)]
stackSlotRegs sm = nonDetEltsUFM (sm_regs sm)
  -- See Note [Unique Determinism and code generation]