Changes

GPU621/Intel Advisor

925 bytes added, 11:05, 23 November 2018

* AVX

* AVX2

=== Example ===

With SSE4.1 or later, you can multiply two 128-bit vectors of four signed 32-bit integers each with a single Intel intrinsic function, which keeps the low 32 bits of each product:

<source lang="cpp">
__m128i _mm_mullo_epi32(__m128i a, __m128i b)
</source>
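As a minimal usage sketch, the intrinsic can be wrapped in a small helper that loads two arrays, multiplies them lane by lane, and stores the result. The helper name <code>mul4_sse41</code> and the use of <code>std::array</code> are assumptions made for illustration only. The <code>target</code> attribute is a GCC/Clang feature that allows the function to be compiled without passing <code>-msse4.1</code>; other compilers may need a different approach.

```cpp
#include <smmintrin.h>  // SSE4.1 intrinsics, including _mm_mullo_epi32
#include <array>
#include <cstdint>

// Hypothetical helper (not part of any library): multiplies four int32 lanes
// and keeps the low 32 bits of each product.
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("sse4.1")))  // enable SSE4.1 for this function only
#endif
std::array<std::int32_t, 4> mul4_sse41(const std::array<std::int32_t, 4>& a,
                                       const std::array<std::int32_t, 4>& b) {
    // Unaligned loads, so the arrays need no special alignment.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));

    // One instruction multiplies all four lanes.
    __m128i prod = _mm_mullo_epi32(va, vb);

    std::array<std::int32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), prod);
    return out;
}
```

Products that do not fit in 32 bits wrap around, because only the low half of each 64-bit product is kept.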

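Before SSE4.1, no single intrinsic does this. The same result can be obtained with the following sequence of SSE2 intrinsics, which computes the even and odd lanes as separate 64-bit products and then recombines their low 32 bits. Because only the low 32 bits of each product are kept, the unsigned multiply <code>_mm_mul_epu32</code> gives the correct result for signed inputs as well.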

<source lang="cpp">
// Body of Vec4i operator * (Vec4i const & a, Vec4i const & b)
// for targets without SSE4.1
__m128i a13    = _mm_shuffle_epi32(a, 0xF5);          // (-,a3,-,a1)
__m128i b13    = _mm_shuffle_epi32(b, 0xF5);          // (-,b3,-,b1)
__m128i prod02 = _mm_mul_epu32(a, b);                 // (-,a2*b2,-,a0*b0)
__m128i prod13 = _mm_mul_epu32(a13, b13);             // (-,a3*b3,-,a1*b1)
__m128i prod01 = _mm_unpacklo_epi32(prod02, prod13);  // (-,-,a1*b1,a0*b0)
__m128i prod23 = _mm_unpackhi_epi32(prod02, prod13);  // (-,-,a3*b3,a2*b2)
__m128i prod   = _mm_unpacklo_epi64(prod01, prod23);  // (ab3,ab2,ab1,ab0)
</source>
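The SSE2 sequence above can be packaged as a self-contained function and checked against a plain scalar loop. This is only a sketch: the names <code>mullo_epi32_sse2</code>, <code>mul4_sse2</code> and <code>mul4_scalar</code> are hypothetical and do not come from the original source. It uses SSE2 only, which is part of the x86-64 baseline.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cstdint>

// Hypothetical wrapper around the SSE2 sequence: low 32 bits of a[i] * b[i].
inline __m128i mullo_epi32_sse2(__m128i a, __m128i b) {
    __m128i a13    = _mm_shuffle_epi32(a, 0xF5);          // move lanes 1 and 3 into lanes 0 and 2
    __m128i b13    = _mm_shuffle_epi32(b, 0xF5);
    __m128i prod02 = _mm_mul_epu32(a, b);                 // 64-bit products of lanes 0 and 2
    __m128i prod13 = _mm_mul_epu32(a13, b13);             // 64-bit products of lanes 1 and 3
    __m128i prod01 = _mm_unpacklo_epi32(prod02, prod13);  // low halves of products 0 and 1
    __m128i prod23 = _mm_unpackhi_epi32(prod02, prod13);  // low halves of products 2 and 3
    return _mm_unpacklo_epi64(prod01, prod23);            // (ab3,ab2,ab1,ab0)
}

// Array-based front end so the result is easy to inspect.
std::array<std::int32_t, 4> mul4_sse2(const std::array<std::int32_t, 4>& a,
                                      const std::array<std::int32_t, 4>& b) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
    std::array<std::int32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), mullo_epi32_sse2(va, vb));
    return out;
}

// Scalar reference: multiply as unsigned so overflow wraps instead of being undefined.
std::array<std::int32_t, 4> mul4_scalar(const std::array<std::int32_t, 4>& a,
                                        const std::array<std::int32_t, 4>& b) {
    std::array<std::int32_t, 4> out;
    for (int i = 0; i < 4; ++i) {
        std::uint32_t p = static_cast<std::uint32_t>(a[i]) * static_cast<std::uint32_t>(b[i]);
        out[i] = static_cast<std::int32_t>(p);
    }
    return out;
}
```

Comparing <code>mul4_sse2</code> with <code>mul4_scalar</code> on inputs that include negative values and overflowing products is a quick way to confirm that the unsigned multiply does not change the low 32 bits.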

[https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2 Intel Intrinsics SSE,SSE2,SSE3,SSSE3,SSE4.1,SSE4.2]

Jespiritu
