How do I vectorize code with intrinsics or std::simd in C++?

Vectorization improves the performance of C++ code by using SIMD (Single Instruction, Multiple Data) instructions, which apply one operation to several data elements at once. With compiler intrinsics or a library such as std::simd, you can speed up data-parallel loops, for example by adding two arrays four or eight floats at a time instead of one at a time.
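To make the idea concrete before any SIMD-specific code appears, here is a minimal sketch in plain C++ of the loop shape that vectorized code usually takes: a main loop over full blocks of "lanes" plus a scalar tail for the leftovers. The names kLanes and add_blocked are hypothetical and exist only for illustration. Whether a compiler turns the inner loop into a single SIMD instruction depends on the compiler, flags, and target.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical lane count for illustration; 4 matches a 128-bit register of floats.
constexpr std::size_t kLanes = 4;

// Adds a[i] + b[i] for i in [0, n), written block by block so that each
// inner loop corresponds to what one 4-wide SIMD add would do at once.
void add_blocked(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));  // Precondition: valid pointers when n > 0
    std::size_t i = 0;
    // Main loop: full blocks of kLanes elements.
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            out[i + lane] = a[i + lane] + b[i + lane];
        }
    }
    // Tail loop: the remaining 0 to kLanes - 1 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Calling add_blocked(a, b, out, 6) on six-element arrays runs one full block of four and then two tail iterations. Intrinsics and SIMD libraries replace the inner loop with a single vector operation.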

Using Intrinsics

Compilers for x86 and other architectures expose intrinsic functions that map almost directly to the CPU's SIMD instructions. Here's a basic example that uses SSE intrinsics to add two float arrays element by element:

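#include <immintrin.h> // x86 SSE/AVX intrinsics
#include <iostream>

// Adds a[i] + b[i] into result[i] for i in [0, N).
void vector_add(const float* a, const float* b, float* result, int N) {
    int i = 0;
    // Main loop: 4 floats per iteration. The unaligned load/store variants
    // are used because the arrays are not guaranteed to be 16-byte aligned.
    for (; i + 4 <= N; i += 4) {
        __m128 vec_a = _mm_loadu_ps(&a[i]);            // Load 4 floats from array a
        __m128 vec_b = _mm_loadu_ps(&b[i]);            // Load 4 floats from array b
        __m128 vec_result = _mm_add_ps(vec_a, vec_b);  // Add the 4 pairs at once
        _mm_storeu_ps(&result[i], vec_result);         // Store 4 results
    }
    // Scalar tail for the case where N is not a multiple of 4.
    for (; i < N; ++i) {
        result[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 8;
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {10, 9, 8, 7, 6, 5, 4, 3};
    float result[N];

    vector_add(a, b, result, N);

    for (const auto& r : result) {
        std::cout << r << " ";
    }
    std::cout << "\n";
    return 0;
}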
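A note on alignment: the aligned intrinsics _mm_load_ps and _mm_store_ps require addresses on a 16-byte boundary, and a plain float array is not guaranteed to satisfy that. The following is a minimal sketch, assuming an x86 target with SSE, of a variant that states and checks this requirement. The names is_aligned16 and vector_add_aligned are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Hypothetical helper: true if p sits on a 16-byte boundary,
// which is what _mm_load_ps and _mm_store_ps require.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Preconditions: all three pointers are 16-byte aligned and n is a multiple of 4.
void vector_add_aligned(const float* a, const float* b, float* result, std::size_t n) {
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(result));
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);                 // Aligned load of 4 floats
        __m128 vb = _mm_load_ps(b + i);                 // Aligned load of 4 floats
        _mm_store_ps(result + i, _mm_add_ps(va, vb));   // Aligned store of 4 sums
    }
}
```

Callers can meet the precondition by declaring their buffers with alignas(16), for example alignas(16) float a[8]. If you cannot control how the data is allocated, the unaligned variants _mm_loadu_ps and _mm_storeu_ps accept any address.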

Using std::simd

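As an alternative to intrinsics, you can use the portable data-parallel types from the Parallelism TS v2, available as std::experimental::simd in the <experimental/simd> header (shipped with libstdc++ since GCC 11). These types were not part of C++23; a standardized std::simd has been adopted for C++26. They offer a more generic and readable way to write SIMD code. Here is how you can perform the same vector addition with std::experimental::simd: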

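#include <experimental/simd> // Parallelism TS v2 (libstdc++, GCC 11 or later)
#include <iostream>

namespace stdx = std::experimental;

// SIMD vector of floats with the native width of the compilation target.
using floatv = stdx::native_simd<float>;

// Adds a[i] + b[i] into result[i] for i in [0, N).
void vector_add(const float* a, const float* b, float* result, int N) {
    constexpr int W = floatv::size(); // Number of floats per SIMD vector
    int i = 0;
    // Main loop: W floats per iteration. element_aligned only assumes the
    // natural alignment of float, so any float array is acceptable.
    for (; i + W <= N; i += W) {
        floatv vec_a(&a[i], stdx::element_aligned);
        floatv vec_b(&b[i], stdx::element_aligned);
        floatv vec_result = vec_a + vec_b;
        vec_result.copy_to(&result[i], stdx::element_aligned);
    }
    // Scalar tail for the case where N is not a multiple of W.
    for (; i < N; ++i) {
        result[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 8;
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {10, 9, 8, 7, 6, 5, 4, 3};
    float result[N];

    vector_add(a, b, result, N);

    for (const auto& r : result) {
        std::cout << r << " ";
    }
    std::cout << "\n";
    return 0;
}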
