The content of this post is extracted from my previous random notes. I am too lazy to update and organize it 🦥.

C++17 new feature – parallel algorithms

The parallel algorithms and execution policies are introduced in C++17. Unfortuantely, according to CppReference, only GCC and Intel support these features. Clang still leaves them unimplemented.

A blog about it.

The parallel library brough by C++17 requires the usage of Intel’s oneTBB for multithreading.
However, there are version conflicts between gcc and oneTBB, as mentioned in issue1 and issue2. Thus, we need to match the gcc with oneTBB for correct version:

  1. g++11 with oneTBB.2021 (tested).
  2. g++9/10 with oneTBB.2019 (untested).

Moreoever, the TBB-backboned parallel algorithms does not promise superior performance (perphase due to the implementation overheads), according to the discussion here. Programmers may need to implement their own parallel algorithms to achieve optimal speed.

Fast Parallel Algorithms

For implementation, we test std::, tbb-based std::parallel with par and par_unseq, and gnu_parallel for performance evaluation. gnu_parallel performs as the fastest toolkits.

TODO: I should implement all those algorithms by myself in the near future.

Sorting

gnu_parallel is favored by someones

When operating on a vector of size 2^31, the performance of various implementations are:

methodstime (10^-6s)
std::15.87
par2373.95
par_unseq11.50
gnu_parallel6.54

Prefix Sum

It is also a well-studied algorithm, as descripted by link.

When operating on a vector of size 2^31, the performance of various implementations are:

methodstime (10^-6s)
std::5.08
par5.58
par_unseq5.42
gnu_parallel4.25

Conclusion

libstdc++ offers built-in parallel implementations for a variety of algorithms, including sort and partial_sum. The parallel mode is implicitly enabled during the compilation with -fopenmp and _GLIBCXX_PARALLEL.

Moreover, the parallel components called by e.g., std::sort are in fact the gnu_parallel codes. We can also explicitly call gnu_parallel by including the header, e.g.,<parallel/algorithm>. Compared with the parallel std::, gnu_parallel incurs smaller overhead and thus delivers (slightly) better performance.

The tbb-based parallel methods, which is the new feature of C++17, are unsatisfactory. The par policy behaves extremely poorly in Sorting and a bit bad in Prefix Sum. Suprisingly, the par_unseq policy (parallelism + vectorization) is rather good in Sorting, only second to gnu_parallel. The TBB and its optimization strategies remain to be explored in the future.

More details regarding gnu_parallel can be found on this page.