Operator Overload

Reference: here. The return type of an overloaded operator should be a reference; otherwise, return-by-value creates a temporary rvalue that cannot be passed to the next operation f2 by non-const reference, i.e., an rvalue cannot bind to a non-const reference.

```cpp
#include <vector>
#include <iostream>
#include <functional>
#include <concepts>  // std::invocable

template<typename T, typename FN>
requires std::invocable<FN, T&> // diff std::invocable?
std::vector<T>& operator| (std::vector<T>& vec, FN fn) noexcept {
    for(auto& e: vec) { fn(e); }
    return vec;
}

int main(){
    std::vector v{1, 2, 3};
    auto f1 = [](int& i) { i *= i; };
    std::function f2 {[](const int& i) { std::cout << i << ' '; }};
    v | f1 | f2;
}
```

August 17, 2023 · 103 words · Me

Multidimensional Subscript Operator []

Finally, C++23 allows the subscript operator [] to be overloaded with multiple dimensions. Before that, we normally used either: a vector of vectors to form a matrix, accessed as mat[i][j]; or a class containing a big 1-D vector that behaves as 2-D by overloading operator (), e.g., mat(i,j). Now, with C++23, we improve the second option (which offers efficient memory access) with a better indexing approach as follows:

```cpp
template <typename T, size_t R, size_t C>
struct matrix {
    T& operator[](size_t const r, size_t const c) noexcept {
        return data_[r * C + c];
    }
    T const& operator[](size_t const r, size_t const c) const noexcept {
        return data_[r * C + c];
    }
    static constexpr size_t Rows = R;
    static constexpr size_t Columns = C;
private:
    std::array<T, R * C> data_;
};

int main() {
    matrix<int, 3, 2> m;
    for(size_t i = 0; i < m....
```

May 13, 2023 · 198 words · Yac

Bitwise Op

🦥 An old note. Bitwise vs Arithmetic: running on a vector of size 2^31, bitwise operations are significantly faster than their arithmetic counterparts:

```cpp
seg = 64;
volume = (vec_size - 1) / seg + 1;
unsigned bs = log2(seg);
unsigned bv = log2(volume);
unsigned bbv = volume - 1;

// Arithmetic:
out[i] = i % volume * seg + i / volume;
// Bitwise:
out[i] = ((i & bbv) << bs) + (i >> bv);
```
...

May 7, 2023 · 80 words · Me

Omp Parallel Region

The results look suspicious to me… But I wrote down this note many days ago 🦥. Maybe I need to evaluate it again.

Multiple Parallel Regions

Constructing a parallel region is expensive in OpenMP. Let's use two examples for illustration: three loops operating on a vector of size 2^31, e.g.,

```cpp
for(size_t i = 0; i < vec.size(); i++) vec[i] += 1;
for(size_t i = 0; i < vec.size(); i++) vec[i] *= 0.9;
for(size_t i = 0; i < vec.size(); i++) vec[i] /= 7;
```

Case 1: a large parallel region including the three loops by omp parallel { omp for }...

May 2, 2023 · 238 words · Me

Omp Collapse

One of my old-day notes 🦥.

Collapse of Nested Loops

The collapse clause converts a perfectly nested loop into a single loop and then parallelizes it. The condition for a perfectly nested loop is that the inner loop is tightly enclosed by the outer loop, with no other code lying between:

```cpp
for(int i = 0 ... ) {
    for(int j = 0 ...) {
        task[i][j];
    }
}
```

Such a condition is hard to meet....

May 2, 2023 · 158 words · Yac

Vector vs Array

Another post recycled from my earlier notes. I really don't have the motivation to improve it further 🦥.

Vector vs Array

Initialization

The Vector is the preferred choice for data storage in modern C++. It is internally implemented on top of the Array. However, the performance gap between the two is indeed noticeable. A Vector can be initialized via std::vector<T> vec(size), while an Array is initialized by T* arr = new T[size]...

May 1, 2023 · 460 words · Yac

Gather with SIMD

Writing SIMD code that works across different platforms can be a challenging task. The following log illustrates how a seemingly simple operation in C++ can quickly escalate into a significant problem. Let's look into the code below, where the elements of x are accessed through the indices specified by idx.

normal code

```cpp
std::vector<float> x = /*some data*/;
std::vector<int> idx = /* index */;
for(auto i: idx) {
    auto data = x[i];
}
```

Gather with Intel

In AVX512, Gather is a specific intrinsic function that transfers data from a data array to a target vector according to an index vector....

April 27, 2023 · 1014 words · Yac

SIMD is Pain

Writing code with SIMD for vectorization is painful. It deserves a blog series to record all sorts of pains I have encountered and (partially) overcome. Indeed, once the pain of coding and debugging is over, the program is lightning fast. Nonetheless, I am here to complain instead of praise. Let me state why writing SIMD code causes me emotional damage: a single line of normal C++ code can easily be inflated into a dozen lines of code....

April 25, 2023 · 477 words · Yac

Parallel Algorithms from Libraries

The content of this post is extracted from my previous random notes. I am too lazy to update and organize it 🦥.

C++17 new feature – parallel algorithms

Parallel algorithms and execution policies were introduced in C++17. Unfortunately, according to CppReference, only GCC and Intel support these features; Clang still leaves them unimplemented. A blog about it. The parallel library brought by C++17 requires Intel's oneTBB for multithreading....

April 25, 2023 · 382 words · Yac