GPU621/Intel Advisor

__m128i prod = _mm_unpacklo_epi64(prod01, prod23); // (ab3,ab2,ab1,ab0)
The code sample was taken from the StackOverflow thread "Fastest way to multiply two vectors of 32bit integers in C++, with SSE".
For an interactive reference to these intrinsics, see the Intel Intrinsics Guide (SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2).
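
The final `_mm_unpacklo_epi64` line above is the last step of a 32-bit multiply built from SSE2 instructions only. The sketch below shows one way the whole sequence can fit together, assuming the common shuffle / `_mm_mul_epu32` / unpack pattern. The function names `mul32_sse2` and `mul4` are hypothetical and chosen for illustration; the intermediate names `prod01` and `prod23` follow the line shown above, while `a13`, `b13`, `prod02`, and `prod13` are assumptions.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cstdint>

// Hypothetical helper: multiply four packed 32-bit integers lane by lane,
// keeping the low 32 bits of each product, using SSE2 only.
inline __m128i mul32_sse2(__m128i a, __m128i b) {
    // Copy the odd lanes (1 and 3) into the even positions.
    __m128i a13    = _mm_shuffle_epi32(a, 0xF5);          // (-,a3,-,a1)
    __m128i b13    = _mm_shuffle_epi32(b, 0xF5);          // (-,b3,-,b1)
    // _mm_mul_epu32 multiplies lanes 0 and 2 into two 64-bit products.
    __m128i prod02 = _mm_mul_epu32(a, b);                 // (-,a2*b2,-,a0*b0)
    __m128i prod13 = _mm_mul_epu32(a13, b13);             // (-,a3*b3,-,a1*b1)
    // Interleave the low 32 bits of each product back into lane order.
    __m128i prod01 = _mm_unpacklo_epi32(prod02, prod13);  // (-,-,a1*b1,a0*b0)
    __m128i prod23 = _mm_unpackhi_epi32(prod02, prod13);  // (-,-,a3*b3,a2*b2)
    return _mm_unpacklo_epi64(prod01, prod23);            // (ab3,ab2,ab1,ab0)
}

// Hypothetical convenience wrapper: load two arrays, multiply, store the result.
inline std::array<std::int32_t, 4> mul4(const std::array<std::int32_t, 4>& a,
                                        const std::array<std::int32_t, 4>& b) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
    std::array<std::int32_t, 4> out{};
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), mul32_sse2(va, vb));
    return out;
}
```

Because only the low 32 bits of each product are kept, the unsigned multiply gives the same result as a wrapping signed multiply. On hardware with SSE4.1, the single intrinsic `_mm_mullo_epi32` performs the same operation.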

