When the compiler detects that an LOD must be inserted into the code to satisfy a dependence, it is often the case that the dependencies imply a region over which the LOD can be placed rather than an exact position. Consider the example illustrated below:
280 do 100 mrs = 1, nrs 281 C 282 do 10 mq = 1, nq 283 do 10 mi = 1, morb 284 10 xrsiq(mi, mq) = zero 285 do 40 mp = 1, np 286 do 30 mq = 1, mp 287 mrspq = mrspq+1 288 val = xrspq(mrspq) 289 if (val .eq. zero) 289 goto 30 290 do 20 mi = 1, morb 291 xrsiq(mi, mq) = xrsiq(mi, mq)+val*v(mp, mi) 292 xrsiq(mi, mp) = xrsiq(mi, mp)+val*v(mq, mi) 293 20 continue 294 30 continue 295 40 continue 296 C 297 mrsij = mrsij0 298 do 90 mi = 1, morb 299 do 50 mj = 1, mi 300 50 xij(mj) = zero 301 do 70 mq = 1, nq 302 val = xrsiq(mi, mq) 303 if (val .eq. zero) 303 goto 70 304 do 60 mj = 1, mi 305 60 xij(mj) = xij(mj)+val*v(mq, mj) 306 70 continue 307 do 80 mj = 1, mi 308 mrsij = mrsij+1 309 80 xrsij(mrsij) = xij(mj) 310 90 continue 311 C 312 mrsij0 = mrsij0+nrs 313 100 continue
xrsiq
array. Similarly, the code marked with
at line 302, represents a CU use of xrsiq
. Together
these define three dependencies which must each be broken by a CU/DU synchronization
(i.e. an LOD). There are, however, several choices as to the position in the code
where the synchronization should be placed. It could be placed immediately prior to the
CU use of xrsiq
, but then the LOD would be executed 2,821,875 times. Clearly, the
optimimum position of the LOD is at a point in the code which dominates the CU use
and which post-dominates all DU definitions, but which executes least frequently.
In the experimental compiler, the decoupler uses a graph algorithm to hoist LODs to a basic block in the intermediate code which satisfies the above criteria. In this example, it places the LOD at line 298, resulting in an LOD frequency of just 2,625.
It is also possible that the target of the dependence is executed less frequently than the sources, and perhaps even less frequently than the ``obvious'' position that the hoisting algorithm might suggest. This could occur, for example, if the sources of the dependencies execute conditionally and their conditions are rarely true. In such cases it pays to migrate the LOD towards the sources, possibly replicating the LOD where sources are guarded by different conditions.