Changes

GPU621/Intel Advisor

186 bytes added, 14:45, 23 November 2018

no edit summary

</pre>

=== SSE Examples ===

For SSE4.1 and later, to multiply two 128-bit vectors of signed 32-bit integers, you would use the following Intel intrinsic function:
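A minimal sketch, assuming the intrinsic meant here is <code>_mm_mullo_epi32</code> (SSE4.1, declared in <code>smmintrin.h</code>), which multiplies four pairs of signed 32-bit integers in one instruction and keeps the low 32 bits of each product. The function name <code>mul4_i32</code> is hypothetical. The <code>target</code> attribute is a GCC/Clang extension that enables SSE4.1 for this one function; compiling the whole file with <code>-msse4.1</code> works as well.

```c
#include <smmintrin.h> /* SSE4.1 intrinsics such as _mm_mullo_epi32 */
#include <stdint.h>

/* Hypothetical helper: out[k] = a[k] * b[k] for k = 0..3.
 * _mm_mullo_epi32 keeps only the low 32 bits of each 64-bit product.
 * target("sse4.1") is a GCC/Clang extension; -msse4.1 is an alternative. */
__attribute__((target("sse4.1")))
void mul4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* load 4 ints (unaligned OK) */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vp = _mm_mullo_epi32(va, vb);             /* 4 multiplies at once */
    _mm_storeu_si128((__m128i *)out, vp);             /* store 4 results */
}
```

If the full 64-bit products are needed instead, SSE4.1 also provides <code>_mm_mul_epi32</code>, which multiplies only the low signed 32-bit integer of each 64-bit lane and returns two signed 64-bit results.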

Here is a link to an interactive guide to Intel Intrinsics: [https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2 Intel Intrinsics SSE,SSE2,SSE3,SSSE3,SSE4.1,SSE4.2]

== Vectorization Examples ==

[INSERT IMAGE HERE]

Pointer aliasing occurs when two pointers point to the same location in memory or to overlapping regions of memory.

If you compile the vec_samples project with the <code>NOALIAS</code> macro defined, the <code>matvec</code> function declaration will include the <code>restrict</code> keyword. The <code>restrict</code> keyword tells the compiler that pointers <code>a</code> and <code>b</code> do not overlap, so the compiler is free to optimize the code that uses them.

[INSERT IMAGE HERE]

</source>

To learn more about the <code>restrict</code> keyword and how the compiler can optimize code when it knows that two pointers do not overlap, see this StackOverflow thread: [https://stackoverflow.com/a/30827880 What does the restrict keyword mean in C++?]
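The sketch below illustrates the idea in C99 with a hypothetical <code>matvec_restrict</code> function; it is not the exact vec_samples signature. The matrix <code>a</code> is stored row-major in a flat array. Because all three pointers are declared <code>restrict</code>, the compiler may keep <code>b[i]</code> in a register for the whole inner loop instead of reloading it after every store. Without <code>restrict</code>, a store to <code>b[i]</code> could in principle change an element of <code>a</code> or <code>x</code>, so the compiler has to be more conservative.

```c
#include <stddef.h>

/* Hypothetical example: computes b += A * x, where A is rows x cols and is
 * stored row-major in the flat array a. The caller initializes b (e.g. to 0).
 * 'restrict' (C99) promises that a, b and x never overlap, so a store to
 * b[i] cannot change any value read through a or x. */
void matvec_restrict(size_t rows, size_t cols,
                     const float *restrict a,
                     float *restrict b,
                     const float *restrict x)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            /* With restrict, b[i] can stay in a register across this loop;
             * without it, the compiler must assume the store may modify
             * a[...] or x[...] and reload them. */
            b[i] += a[i * cols + j] * x[j];
        }
    }
}
```

Passing overlapping arrays to a <code>restrict</code>-qualified function (for example, the same buffer as <code>a</code> and <code>b</code>) is undefined behavior, so the keyword is only a safe promise when the caller guarantees the arrays are disjoint.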

=== Loop-Carried Dependency ===

Pointers that overlap one another may introduce a loop-carried dependency when those pointers refer to arrays of data. Unless the compiler can prove that the pointers do not overlap, the vectorizer must assume that such a dependency may exist and, as a result, will generally not auto-vectorize the code.

In the code example below, <code>a</code> is a function of <code>b</code>. If pointers <code>a</code> and <code>b</code> overlap, then modifying <code>a</code> may also modify <code>b</code>, which can create a loop-carried dependency. This means the loop cannot be safely vectorized.
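A minimal C sketch of how overlap creates a loop-carried dependency, using a hypothetical <code>add_one</code> function that is not taken from vec_samples. When <code>a</code> points one element past <code>b</code>, each iteration reads the value written by the previous iteration, so the result depends on the iterations running one at a time, in order. A vector version that loaded four elements of <code>b</code> before storing four elements of <code>a</code> would read stale values, which is why the compiler cannot vectorize the loop unless it knows the pointers are disjoint.

```c
#include <stddef.h>

/* Hypothetical example: a[i] = b[i] + 1 for n elements.
 * No 'restrict' here, so the compiler must allow a and b to overlap. */
void add_one(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* If a == b + 1, this store changes b[i + 1], which the next
         * iteration reads: a loop-carried dependency. */
        a[i] = b[i] + 1.0f;
    }
}
```

Called as <code>add_one(buf + 1, buf, 4)</code> on a zeroed five-element buffer, the sequential loop leaves 1, 2, 3, 4 in <code>buf[1]</code> through <code>buf[4]</code>; with two separate arrays, every output is simply 1.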

=== Alignment ===

To align data elements on an <code>x</code>-byte boundary in memory, use the <code>align</code> macro.

Code snippet used to align the data elements in the vec_samples project.

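Aligning the start of an array is not always enough: if the length of each row is not a multiple of the vector width, the rows after the first will start at misaligned addresses. To address this issue, add some padding.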

For example, if you have a <code>4 x 19</code> array of floats and your system has 128-bit vector registers, add one column to make the array <code>4 x 20</code>, so that the number of columns is evenly divisible by the number of floats that fit in a 128-bit vector register (4 floats).
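The arithmetic behind this example: a 19-float row occupies 19 × 4 = 76 bytes, which is not a multiple of 16, so if row 0 starts on a 16-byte boundary, row 1 starts 12 bytes past one. A padded 20-float row occupies 80 = 5 × 16 bytes, so every row starts aligned. The sketch below uses the standard C11 <code>alignas</code> specifier in place of the project's <code>align</code> macro, and the names <code>PADDED_COLS</code>, <code>matrix</code> and <code>rows_are_aligned</code> are hypothetical.

```c
#include <stdalign.h> /* C11 alignas */
#include <stddef.h>
#include <stdint.h>

#define ROWS 4
#define COLS 19           /* logical number of columns */
#define FLOATS_PER_VEC 4  /* 128-bit register / 32-bit float */

/* Round COLS up to the next multiple of FLOATS_PER_VEC: 19 -> 20. */
#define PADDED_COLS (((COLS) + FLOATS_PER_VEC - 1) / FLOATS_PER_VEC * FLOATS_PER_VEC)

/* alignas(16) places matrix[0][0] on a 16-byte boundary. Each padded row is
 * 20 * 4 = 80 bytes, a multiple of 16, so every row start is aligned too.
 * The extra column is padding and holds no real data. */
alignas(16) float matrix[ROWS][PADDED_COLS];

/* Returns 1 if every row of matrix begins on a 16-byte boundary, else 0. */
int rows_are_aligned(void)
{
    for (size_t r = 0; r < ROWS; r++) {
        if ((uintptr_t)&matrix[r][0] % 16 != 0)
            return 0;
    }
    return 1;
}
```

Loops over the data should still stop at <code>COLS</code>, so the padding column is never treated as part of the result.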

[INSERT IMAGE HERE]

= Summary =

Jespiritu

49

edits
