Gather with SIMD

Writing SIMD code that works across different platforms can be challenging. The following log illustrates how a seemingly simple operation in C++ can quickly escalate into a significant problem. Consider the code below, where the elements of x are accessed through the indices specified by idx. Normal code: std::vector<float> x = /* some data */; std::vector<int> idx = /* index */; for (auto i : idx) { auto data = x[i]; } Gather with Intel: In AVX512, gather is a dedicated intrinsic that loads elements from a data array into a target vector according to an index vector....

April 27, 2023 · 1014 words · Yac
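
As a rough illustration of the gather described in the excerpt above, here is a minimal sketch rather than the post's actual code. The helper name `gather` is hypothetical, and the sketch assumes every index is within range of `x`. It takes the AVX-512 path only when the compiler defines `__AVX512F__` and otherwise falls back to the plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper (not from the post): out[k] = x[idx[k]] for every k.
// Assumption: every value in idx is a valid index into x.
std::vector<float> gather(const std::vector<float>& x,
                          const std::vector<int>& idx) {
    std::vector<float> out(idx.size());
    std::size_t k = 0;
#if defined(__AVX512F__)
    // Vector path: load 16 32-bit indices, gather 16 floats, store them.
    for (; k + 16 <= idx.size(); k += 16) {
        __m512i vindex = _mm512_loadu_si512(idx.data() + k);
        // The scale is the element size in bytes (4 for float).
        __m512 v = _mm512_i32gather_ps(vindex, x.data(), 4);
        _mm512_storeu_ps(out.data() + k, v);
    }
#endif
    // Scalar path: handles the tail, or everything without AVX-512F.
    for (; k < idx.size(); ++k) {
        assert(idx[k] >= 0 && static_cast<std::size_t>(idx[k]) < x.size());
        out[k] = x[idx[k]];
    }
    return out;
}
```

Calling `gather(x, idx)` returns one element of `x` per entry of `idx`, in index order. Even in this small sketch the one-line scalar loop grows into a guarded vector loop plus a scalar tail.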

SIMD is Pain

Writing vectorized code with SIMD is painful. It deserves a blog series to record all sorts of pains I have encountered and (partially) overcome. Admittedly, once the pain of coding and debugging is over, the program is lightning fast. Nonetheless, I am here to complain, not to praise. Let me state why writing SIMD code causes me emotional damage: a single line of normal C++ code can easily inflate into a dozen lines....

April 25, 2023 · 477 words · Yac