One of my old notes 🦥.

Collapse of Nested Loops

The OpenMP collapse clause converts a perfectly nested loop into a single loop and then parallelizes it. A loop nest is perfectly nested when the inner loop sits directly inside the outer loop, with no other code between them:

for(int i = 0 ... ) {
  for(int j = 0 ...) {
    task[i][j];
  }
}

Such a condition is hard to meet. Moreover, collapse works best (1) with the static scheduler rather than the dynamic one, and (2) when the outer loop has fewer iterations than there are threads, i.e., the trip count of i is smaller than num_threads.
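To make the idea concrete, here is a minimal sketch of what collapse(2) roughly amounts to: the two loops are merged into one flat iteration space, and the original indices are recovered by division and modulo. The names ROWS, COLS, work, fill_collapsed, fill_flattened and check_same are made up for illustration, and the sizes are assumed (4 outer iterations, so fewer than a typical thread count).

```c
#include <assert.h>

#define ROWS 4      /* assumed: fewer outer iterations than threads */
#define COLS 1000   /* assumed inner trip count */

/* Hypothetical per-element work; stands in for task[i][j]. */
double work(int i, int j) { return (double)(i * COLS + j); }

/* Perfectly nested loop: collapse(2) shares all ROWS*COLS iterations
   among the threads instead of only the ROWS outer ones. */
void fill_collapsed(double out[ROWS][COLS]) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            out[i][j] = work(i, j);
        }
    }
}

/* Hand-written equivalent: one flat loop, indices recovered by div/mod. */
void fill_flattened(double out[ROWS][COLS]) {
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < ROWS * COLS; k++) {
        int i = k / COLS;
        int j = k % COLS;
        out[i][j] = work(i, j);
    }
}

/* Sanity check: both versions must produce the same array. */
void check_same(double a[ROWS][COLS], double b[ROWS][COLS]) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            assert(a[i][j] == b[i][j]);
}
```

With GCC this can be built with -fopenmp; without that flag the pragmas are ignored and both functions simply run serially, which is handy for checking correctness first.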

When the workload is imbalanced and the outer loop already exposes enough parallelism, the dynamic scheduler without collapse still performs better in our test cases.
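A small sketch of the imbalanced case, assuming a made-up workload where row i costs about i inner steps. The names N, row_cost, sum_dynamic and sum_static are hypothetical and are not the code used in the experiment. With schedule(dynamic) each thread grabs the next row when it becomes free (the default chunk size is 1), whereas schedule(static) fixes the assignment up front, so the threads that get the expensive rows finish last.

```c
#include <assert.h>

#define N 512   /* assumed: many more outer iterations than threads */

/* Hypothetical imbalanced task: row i performs i + 1 inner steps. */
long row_cost(int i) {
    long acc = 0;
    for (int j = 0; j <= i; j++) {
        acc += j;
    }
    return acc;
}

/* Dynamic schedule on the outer loop only: idle threads pick up the next row. */
long sum_dynamic(void) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < N; i++) {
        total += row_cost(i);
    }
    return total;
}

/* Static schedule for comparison: rows are divided among threads in advance. */
long sum_static(void) {
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < N; i++) {
        total += row_cost(i);
    }
    return total;
}
```

Both functions return the same value; only the distribution of rows across threads differs, so any timing difference comes from load balance and scheduling overhead.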

Experimental setting: only the scheduling policy of the Scatter phase is changed, as follows:

scheduler          time (s)
dynamic collapse   2.004
static collapse    1.329
dynamic            1.276