gif up gif gif
Next: 3. Compiler Effectiveness Up: 2. A Compilation Model Previous: 2.4 Causes of loss

2.5 Placement of LOD events

 

When the compiler detects that an LOD must be inserted into the code to satisfy a dependence, it is often the case that the dependencies imply a region over which the LOD can be placed rather than an exact position. Consider the example illustrated below:


280            do 100 mrs = 1, nrs
281    C     
282              do 10 mq = 1, nq
283                do 10 mi = 1, morb
284    10            xrsiq(mi, mq) = zero
285              do 40 mp = 1, np
286                do 30 mq = 1, mp
287                  mrspq = mrspq+1
288                  val = xrspq(mrspq)
289                  if (val .eq. zero) 
289                    goto 30
290                  do 20 mi = 1, morb
291                    xrsiq(mi, mq) = xrsiq(mi, mq)+val*v(mp, mi)
292                    xrsiq(mi, mp) = xrsiq(mi, mp)+val*v(mq, mi)
293    20              continue
294    30            continue
295    40          continue
296    C     
297              mrsij = mrsij0
298              do 90 mi = 1, morb
299                do 50 mj = 1, mi
300    50            xij(mj) = zero
301                do 70 mq = 1, nq
302                  val = xrsiq(mi, mq)
303                  if (val .eq. zero) 
303                    goto 70
304                  do 60 mj = 1, mi
305    60              xij(mj) = xij(mj)+val*v(mq, mj)
306    70            continue
307                do 80 mj = 1, mi
308                  mrsij = mrsij+1
309    80            xrsij(mrsij) = xij(mj)
310    90          continue
311    C     
312              mrsij0 = mrsij0+nrs
313    100       continue


In this example, the lines marked with icons (i.e. 284, 291 and 292) represent DU definitions of the xrsiq array. Similarly, the code marked with at line 302, represents a CU use of xrsiq. Together these define three dependencies which must each be broken by a CU/DU synchronization (i.e. an LOD). There are, however, several choices as to the position in the code where the synchronization should be placed. It could be placed immediately prior to the CU use of xrsiq, but then the LOD would be executed 2,821,875 times. Clearly, the optimimum position of the LOD is at a point in the code which dominates the CU use and which post-dominates all DU definitions, but which executes least frequently.

  In the experimental compiler, the decoupler uses a graph algorithm to hoist LODs to a basic block in the intermediate code which satisfies the above criteria. In this example, it places the LOD at line 298, resulting in an LOD frequency of just 2,625.

It is also possible that the target of the dependence is executed less frequently than the sources, and perhaps even less frequently than the ``obvious'' position that the hoisting algorithm might suggest. This could occur, for example, if the sources of the dependencies execute conditionally and their conditions are rarely true. In such cases it pays to migrate the LOD towards the sources, possibly replicating the LOD where sources are guarded by different conditions.



gif up gif gif
Next: 3. Compiler Effectiveness Up: 2. A Compilation Model Previous: 2.4 Causes of loss



npt@dcs.ed.ac.uk