Vectorization enhances the performance of C++ code by leveraging SIMD (Single Instruction, Multiple Data) instructions, which perform the same operation on multiple data points in a single instruction. You can use these instructions directly through intrinsics or through a library such as std::experimental::simd.
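For contrast, a plain scalar loop handles one element per iteration. The sketch below is only a reference point for the SIMD versions; the name scalar_add is hypothetical, and an optimizing compiler may auto-vectorize a loop like this on its own.

```cpp
#include <cstddef>

// Scalar reference: one addition per loop iteration.
// scalar_add is a hypothetical name used only for this illustration.
void scalar_add(const float* a, const float* b, float* result, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        result[i] = a[i] + b[i];
    }
}
```

A SIMD version performs the same additions, but several elements at a time.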
Compilers for Intel and other CPUs expose intrinsic functions, declared in headers such as <immintrin.h>, that map directly to SIMD instructions. Here's a basic example that uses SSE intrinsics to add two float arrays four elements at a time:
#include <immintrin.h> // For Intel intrinsics
#include <iostream>

void vector_add(const float* a, const float* b, float* result, int N) {
    int i = 0;
    // Process 4 floats per iteration. The unaligned load/store variants are
    // used because the arrays are not guaranteed to be 16-byte aligned.
    for (; i + 4 <= N; i += 4) {
        __m128 vec_a = _mm_loadu_ps(&a[i]); // Load 4 floats from array a
        __m128 vec_b = _mm_loadu_ps(&b[i]); // Load 4 floats from array b
        __m128 vec_result = _mm_add_ps(vec_a, vec_b); // Perform vector addition
        _mm_storeu_ps(&result[i], vec_result); // Store result
    }
    // Handle leftover elements when N is not a multiple of 4
    for (; i < N; ++i) {
        result[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 8;
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {10, 9, 8, 7, 6, 5, 4, 3};
    float result[N];
    vector_add(a, b, result, N);
    for (const auto& r : result) {
        std::cout << r << " ";
    }
    return 0;
}
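The aligned intrinsics _mm_load_ps and _mm_store_ps require addresses on a 16-byte boundary; passing an unaligned pointer can fault at run time. The following is a minimal sketch of an aligned variant, assuming the caller controls buffer alignment and that the length is a multiple of 4. The names vector_add_aligned and is_aligned_16 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <immintrin.h>

// Hypothetical helper: true when p sits on a 16-byte boundary.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Aligned variant: callers must pass 16-byte-aligned buffers and n % 4 == 0.
void vector_add_aligned(const float* a, const float* b, float* result, int n) {
    assert(is_aligned_16(a) && is_aligned_16(b) && is_aligned_16(result));
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);               // Aligned load of 4 floats
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(result + i, _mm_add_ps(va, vb)); // Aligned store of the sums
    }
}
```

To satisfy the precondition, the arrays can be declared with an alignment specifier, for example alignas(16) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};. When alignment cannot be guaranteed, _mm_loadu_ps and _mm_storeu_ps are the safe choice.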
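As an alternative to intrinsics, you can use std::experimental::simd from the Parallelism TS v2 (header <experimental/simd>, shipped with libstdc++ since GCC 11), which provides a more generic and easier way to write SIMD code. It is not part of C++23; a standardized std::simd is planned for C++26. Here is how you can perform the same vector addition using std::experimental::simd: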
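#include <experimental/simd>
#include <iostream>

namespace stdx = std::experimental;

void vector_add(const float* a, const float* b, float* result, int N) {
    using simd_t = stdx::native_simd<float>;
    const int width = static_cast<int>(simd_t::size()); // Lanes per vector on this target
    int i = 0;
    for (; i + width <= N; i += width) {
        simd_t vec_a(&a[i], stdx::element_aligned); // Load `width` floats from array a
        simd_t vec_b(&b[i], stdx::element_aligned); // Load `width` floats from array b
        simd_t vec_result = vec_a + vec_b;          // Element-wise addition
        vec_result.copy_to(&result[i], stdx::element_aligned); // Store result
    }
    // Handle leftover elements when N is not a multiple of the vector width
    for (; i < N; ++i) {
        result[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 8;
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {10, 9, 8, 7, 6, 5, 4, 3};
    float result[N];
    vector_add(a, b, result, N);
    for (const auto& r : result) {
        std::cout << r << " ";
    }
    return 0;
}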
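One way to see the "more generic" point is that the same loop can be templated on the element type, which intrinsics do not allow without a separate function per type. The following is a sketch only, assuming a standard library that ships <experimental/simd> (for example libstdc++ in GCC 11 or later, compiled as C++17); simd_add is a hypothetical name.

```cpp
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// simd_add is a hypothetical name; T may be any arithmetic type that
// stdx::native_simd supports (float, double, int, ...).
template <typename T>
void simd_add(const T* a, const T* b, T* result, std::size_t n) {
    using V = stdx::native_simd<T>;
    std::size_t i = 0;
    // Full vectors: V::size() elements per iteration, whatever the target width is.
    for (; i + V::size() <= n; i += V::size()) {
        V va(a + i, stdx::element_aligned);
        V vb(b + i, stdx::element_aligned);
        (va + vb).copy_to(result + i, stdx::element_aligned);
    }
    // Scalar tail for the remaining elements.
    for (; i < n; ++i) {
        result[i] = a[i] + b[i];
    }
}
```

Calling simd_add with double or int arrays uses the same source; the number of lanes per vector is chosen by the library for the compilation target.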