GPU621/Intel Advisor

925 bytes added, 11:05, 23 November 2018
* AVX
* AVX2
 
=== Example ===
 
With SSE4.1 or later, you can multiply two 128-bit vectors of four signed 32-bit integers each, keeping the low 32 bits of every product, with the following Intel intrinsic function:
 
<source lang="cpp">
__m128i _mm_mullo_epi32(__m128i a, __m128i b)
</source>
 
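Before SSE4.1, the same result can be obtained with the following sequence of SSE2 intrinsics, which computes the even and odd lanes as separate 64-bit products and then interleaves their low halves: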
 
<source lang="cpp">
// Body of: Vec4i operator * (Vec4i const & a, Vec4i const & b)
// Branch used when SSE4.1 is not available; a and b are treated as __m128i.
__m128i a13 = _mm_shuffle_epi32(a, 0xF5); // (-,a3,-,a1)
__m128i b13 = _mm_shuffle_epi32(b, 0xF5); // (-,b3,-,b1)
__m128i prod02 = _mm_mul_epu32(a, b); // (-,a2*b2,-,a0*b0)
__m128i prod13 = _mm_mul_epu32(a13, b13); // (-,a3*b3,-,a1*b1)
__m128i prod01 = _mm_unpacklo_epi32(prod02, prod13); // (-,-,a1*b1,a0*b0)
__m128i prod23 = _mm_unpackhi_epi32(prod02, prod13); // (-,-,a3*b3,a2*b2)
__m128i prod = _mm_unpacklo_epi64(prod01, prod23); // (ab3,ab2,ab1,ab0)
</source>
[https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2 Intel Intrinsics SSE,SSE2,SSE3,SSSE3,SSE4.1,SSE4.2]
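
The SSE2 sequence above can be wrapped in a function and checked against ordinary integer multiplication. The following is a minimal sketch, not code from the original page: the names <code>mullo_epi32_sse2</code> and <code>multiply4</code> are hypothetical helpers introduced only for illustration, and it assumes an x86 compiler that provides <code>&lt;emmintrin.h&gt;</code>.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cstdint>

// Hypothetical helper: low 32 bits of each lane-wise product, using SSE2 only.
inline __m128i mullo_epi32_sse2(__m128i a, __m128i b) {
    __m128i a13    = _mm_shuffle_epi32(a, 0xF5);          // (-,a3,-,a1)
    __m128i b13    = _mm_shuffle_epi32(b, 0xF5);          // (-,b3,-,b1)
    __m128i prod02 = _mm_mul_epu32(a, b);                 // 64-bit products a2*b2, a0*b0
    __m128i prod13 = _mm_mul_epu32(a13, b13);             // 64-bit products a3*b3, a1*b1
    __m128i prod01 = _mm_unpacklo_epi32(prod02, prod13);  // (-,-,a1*b1,a0*b0)
    __m128i prod23 = _mm_unpackhi_epi32(prod02, prod13);  // (-,-,a3*b3,a2*b2)
    return _mm_unpacklo_epi64(prod01, prod23);            // (ab3,ab2,ab1,ab0)
}

// Hypothetical convenience wrapper: load two arrays of four int32 values,
// multiply them lane by lane, and store the result back into an array.
inline std::array<std::int32_t, 4> multiply4(const std::array<std::int32_t, 4>& x,
                                             const std::array<std::int32_t, 4>& y) {
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x.data()));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y.data()));
    std::array<std::int32_t, 4> out{};
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), mullo_epi32_sse2(a, b));
    return out;
}
```

Calling <code>multiply4</code> on <code>{1, -2, 3, -4}</code> and <code>{5, 6, -7, -8}</code> yields <code>{5, -12, -21, 32}</code>. Although <code>_mm_mul_epu32</code> performs an unsigned multiply, the low 32 bits of a product are the same for signed and unsigned operands, which is why the sequence also works for signed integers.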