One of my old notes 🦥.
Collapse of Nested Loops
The `collapse` clause converts a perfect loop nest into a single loop and then parallelizes it. A loop nest is perfect when the inner loop is tightly nested inside the outer loop, with no other code lying between them:
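```c
#pragma omp parallel for collapse(2)
for (int i = 0; i < N; i++) {      // N, M: placeholder trip counts
    for (int j = 0; j < M; j++) {  // nothing may sit between the two loop headers
        task(i, j);                // placeholder for the per-(i, j) work
    }
}
```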
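To make the transformation concrete, the sketch below shows conceptually what `collapse(2)` does: the two loops are fused into one loop over `N * M` iterations, and each flat index `k` is mapped back to `(i, j) = (k / M, k % M)`. The sizes `N` and `M` and the `work` function are hypothetical placeholders, and the real lowering is up to the compiler. Compiled without `-fopenmp`, the pragmas are ignored and both versions simply run serially.

```c
#define N 3 // hypothetical outer trip count
#define M 4 // hypothetical inner trip count

// Hypothetical per-iteration task: any pure function of (i, j) works here.
static int work(int i, int j) {
    return 10 * i + j;
}

// Version 1: let OpenMP collapse the perfect nest into one iteration space.
void with_collapse(int out[N][M]) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            out[i][j] = work(i, j);
        }
    }
}

// Version 2: the same nest collapsed by hand into a single flat loop.
void manual_collapse(int out[N][M]) {
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < N * M; k++) {
        int i = k / M; // recover the outer index
        int j = k % M; // recover the inner index
        out[i][j] = work(i, j);
    }
}
```

Both functions fill `out` identically; the only difference is who writes the index arithmetic.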
Such a condition is hard to meet. Moreover, `collapse` works best (1) with the `static` scheduler rather than the `dynamic` one, and (2) when the outer loop offers less parallelism than there are threads, i.e., when its trip count is smaller than `num_threads`.
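Point (2) can be seen with simple arithmetic. The sketch below models a `static` schedule as handing each thread one contiguous block of iterations, with block sizes differing by at most one (a common default, although the OpenMP specification leaves the exact split to the implementation), and counts how many threads receive any work at all. The function names and sizes are hypothetical.

```c
// Model of schedule(static) without a chunk size: n iterations are split
// into T contiguous blocks whose sizes differ by at most one.
// Thread tid gets the half-open range [*begin, *end).
void static_block(int n, int T, int tid, int *begin, int *end) {
    int base = n / T;  // every thread gets at least this many
    int extra = n % T; // the first `extra` threads get one more
    *begin = tid * base + (tid < extra ? tid : extra);
    *end = *begin + base + (tid < extra ? 1 : 0);
}

// Count the threads that receive at least one iteration.
int busy_threads(int n, int T) {
    int busy = 0;
    for (int tid = 0; tid < T; tid++) {
        int b, e;
        static_block(n, T, tid, &b, &e);
        if (e > b) busy++;
    }
    return busy;
}
```

For example, with an outer trip count of 4 on 8 threads, `busy_threads(4, 8)` returns 4, so half of the threads idle; collapsing with a hypothetical inner trip count of 100 gives `busy_threads(400, 8)`, which returns 8.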
When the workload is imbalanced and the outer loop already offers enough parallelism, the `dynamic` scheduler without `collapse` still performs better, as in our test cases.
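Why `dynamic` copes better with imbalance can be illustrated with a toy simulation. The sketch below assumes iteration `i` costs `cost[i]` time units and ignores scheduling overhead entirely, so it only captures the load-balancing side of the trade-off. It compares the finishing time of the slowest thread under a block-wise `static` split and under a greedy `dynamic` split with chunk size 1. All names and numbers are hypothetical.

```c
#define MAX_THREADS 64 // precondition: 1 <= T <= MAX_THREADS

// static: each thread runs one contiguous block (sizes differ by at most one);
// the makespan is the cost of the most expensive block.
long static_makespan(const long *cost, int n, int T) {
    long worst = 0;
    int base = n / T, extra = n % T, i = 0;
    for (int tid = 0; tid < T; tid++) {
        int len = base + (tid < extra ? 1 : 0);
        long sum = 0;
        for (int k = 0; k < len; k++) sum += cost[i++];
        if (sum > worst) worst = sum;
    }
    return worst;
}

// dynamic, chunk size 1: the next iteration goes to whichever thread
// becomes free first (ties go to the lowest thread id).
long dynamic_makespan(const long *cost, int n, int T) {
    long finish[MAX_THREADS] = {0};
    for (int i = 0; i < n; i++) {
        int t = 0;
        for (int k = 1; k < T; k++)
            if (finish[k] < finish[t]) t = k;
        finish[t] += cost[i];
    }
    long worst = 0;
    for (int k = 0; k < T; k++)
        if (finish[k] > worst) worst = finish[k];
    return worst;
}
```

With costs 1, 2, ..., 8 on two threads, `static_makespan` returns 26 and `dynamic_makespan` returns 20, against an ideal of 36 / 2 = 18.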
Experimental setting: only the scheduling policy of the Scatter phase is changed, as listed below:
scheduler | time (s) |
---|---|
dynamic collapse | 2.004 |
static collapse | 1.329 |
dynamic | 1.276 |
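The three rows of the table map onto OpenMP pragmas roughly as sketched below. The Scatter loop itself is not shown in this note, so `scatter_item`, `ROWS` and `COLS` are hypothetical stand-ins, and default chunk sizes are assumed because the actual ones are not recorded.

```c
#define ROWS 4 // hypothetical outer trip count
#define COLS 5 // hypothetical inner trip count

// Hypothetical stand-in for the real Scatter work on item (i, j).
static void scatter_item(int out[ROWS][COLS], int i, int j) {
    out[i][j] = i * COLS + j;
}

// Row "dynamic collapse": both loops fused, iterations handed out on demand.
void scatter_dynamic_collapse(int out[ROWS][COLS]) {
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            scatter_item(out, i, j);
}

// Row "static collapse": both loops fused, split into fixed blocks up front.
void scatter_static_collapse(int out[ROWS][COLS]) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            scatter_item(out, i, j);
}

// Row "dynamic": only the outer loop is distributed; each thread runs whole rows.
void scatter_dynamic(int out[ROWS][COLS]) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            scatter_item(out, i, j);
}
```

Keeping the loop body identical across the three variants means any timing difference comes from the pragma alone.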