How do I vectorize code with intrinsics or std::simd in C++?

Vectorization speeds up code by using SIMD (Single Instruction, Multiple Data) instructions, which apply one operation to several data elements at once. In C++ you can vectorize explicitly in two ways: with compiler intrinsics that map to specific CPU instructions, or with a portable library type such as std::simd.
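
For reference, the scalar operation that the examples below vectorize can be sketched as follows. The function name add_scalar is a hypothetical helper chosen for this illustration. It performs one addition per loop iteration; a SIMD version performs several of these additions with a single instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar baseline: one addition per loop iteration.
// add_scalar is a hypothetical name used only for this illustration.
std::vector<float> add_scalar(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<float> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        result[i] = a[i] + b[i];  // a SIMD version handles several i at once
    }
    return result;
}
```

A baseline like this is also useful for checking that a vectorized version produces the same results.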

Using Intrinsics

Intel and other CPU vendors expose SIMD instructions through intrinsic functions, which look like ordinary function calls but compile to specific instructions. The example below uses x86 SSE intrinsics from <immintrin.h> to add two float arrays four elements at a time:


        #include <immintrin.h>  // x86 SSE/AVX intrinsics
        #include <iostream>

        void vector_add(const float* a, const float* b, float* result, int N) {
            int i = 0;
            // Process 4 floats per iteration while a full vector remains
            for (; i + 4 <= N; i += 4) {
                __m128 vec_a = _mm_loadu_ps(&a[i]); // Load 4 floats from a (no alignment required)
                __m128 vec_b = _mm_loadu_ps(&b[i]); // Load 4 floats from b
                __m128 vec_result = _mm_add_ps(vec_a, vec_b); // Add all 4 lanes at once
                _mm_storeu_ps(&result[i], vec_result); // Store 4 results
            }
            // Scalar tail: handle the remaining N % 4 elements
            for (; i < N; ++i) {
                result[i] = a[i] + b[i];
            }
        }

        int main() {
            const int N = 8;
            float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
            float b[N] = {10, 9, 8, 7, 6, 5, 4, 3};
            float result[N];
            vector_add(a, b, result, N);

            for (const auto& r : result) {
                std::cout << r << " ";
            }
            std::cout << '\n';
            return 0;
        }
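
The aligned intrinsics _mm_load_ps and _mm_store_ps require their addresses to be 16-byte aligned, which a plain float array does not guarantee. The following is a minimal sketch of an aligned variant, assuming an x86 target and a length that is a multiple of 4. The names is_aligned16 and vector_add_aligned are hypothetical helpers for this illustration.

```cpp
#include <immintrin.h>  // SSE intrinsics (x86/x86-64 only)
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when p sits on a 16-byte boundary, as _mm_load_ps/_mm_store_ps require.
// is_aligned16 is a hypothetical helper name used only in this sketch.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Aligned variant of the addition. Assumes n is a multiple of 4 and that
// all three pointers are 16-byte aligned; both are checked in debug builds.
void vector_add_aligned(const float* a, const float* b, float* result,
                        std::size_t n) {
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(result));
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);                // aligned load of 4 floats
        __m128 vb = _mm_load_ps(b + i);                // aligned load of 4 floats
        _mm_store_ps(result + i, _mm_add_ps(va, vb));  // aligned store of 4 sums
    }
}
```

To call it, declare the arrays with an explicit alignment, for example alignas(16) float a[8]. When alignment cannot be guaranteed, the unaligned forms _mm_loadu_ps and _mm_storeu_ps are the safer choice.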

Using std::simd

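As an alternative to intrinsics, you can use std::simd, a portable data-parallel type that lets you write SIMD code without targeting a specific instruction set. It is not part of C++23: it was adopted for C++26, and before that it is available as std::experimental::simd from the Parallelism TS v2 (header <experimental/simd>, shipped for example with recent versions of libstdc++). Here is the same vector addition using the experimental version: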


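        #include <iostream>
        #include <experimental/simd>

        namespace stdx = std::experimental;
        using floatv = stdx::native_simd<float>; // Widest float vector for the target CPU

        void vector_add(const float* a, const float* b, float* result, int N) {
            const int width = static_cast<int>(floatv::size()); // Lanes per vector
            int i = 0;
            // Process `width` floats per iteration while a full vector remains
            for (; i + width <= N; i += width) {
                floatv vec_a(&a[i], stdx::element_aligned); // Load `width` floats from a
                floatv vec_b(&b[i], stdx::element_aligned); // Load `width` floats from b
                floatv vec_result = vec_a + vec_b;          // Element-wise addition
                vec_result.copy_to(&result[i], stdx::element_aligned); // Store results
            }
            // Scalar tail: handle the remaining elements
            for (; i < N; ++i) {
                result[i] = a[i] + b[i];
            }
        }

        int main() {
            const int N = 8;
            float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
            float b[N] = {10, 9, 8, 7, 6, 5, 4, 3};
            float result[N];
            vector_add(a, b, result, N);

            for (const auto& r : result) {
                std::cout << r << " ";
            }
            std::cout << '\n';
            return 0;
        }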
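
Because the vector width and element type are template parameters rather than part of an intrinsic's name, the same loop can be written once for several element types. The following is a sketch under the assumption that <experimental/simd> is available (for example libstdc++ 11 or newer). The name add_arrays is a hypothetical helper for this illustration.

```cpp
#include <experimental/simd>  // Parallelism TS v2; assumes libstdc++ 11 or newer
#include <cassert>
#include <cstddef>

namespace stdx = std::experimental;

// Element-wise addition for any arithmetic type T supported by simd.
// add_arrays is a hypothetical name used only in this sketch.
template <typename T>
void add_arrays(const T* a, const T* b, T* result, std::size_t n) {
    assert(n == 0 || (a && b && result));  // non-null inputs when there is work
    using V = stdx::native_simd<T>;        // widest vector of T for the target
    std::size_t i = 0;
    // Full vectors: V::size() elements per iteration
    for (; i + V::size() <= n; i += V::size()) {
        V va(a + i, stdx::element_aligned);
        V vb(b + i, stdx::element_aligned);
        V sum = va + vb;
        sum.copy_to(result + i, stdx::element_aligned);
    }
    // Scalar tail for the elements that do not fill a whole vector
    for (; i < n; ++i) {
        result[i] = a[i] + b[i];
    }
}
```

The same template can be called with float, double, or int arrays; the number of elements processed per iteration then depends on the type and on the instruction set the compiler targets.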

