Team NP Complete

40 bytes removed, 12:58, 22 December 2017

=OpenMP=

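OpenMP is a parallel programming API for C, C++, and Fortran. Its specification is maintained by the OpenMP Architecture Review Board, and it is implemented by compilers such as Intel's. It enables flexible implementation of parallel algorithms, letting a program use all the cores available to it on a wide range of machines.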

In this project, the <nowiki>#pragma omp parallel for</nowiki> directive was used in several places in the program where for loops had no dependencies. Where there were dependencies, math was used to rewrite the for loops so that they no longer had them. Their usage will be discussed further down.
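As a minimal sketch of the dependency-free case (not the project's actual code; the function name <code>square_all</code> and its data are hypothetical), the directive can be placed directly in front of a loop whose iterations each touch only their own element:

```cpp
#include <vector>

// Hypothetical helper, for illustration only: squares every element in place.
// Iteration i reads and writes only data[i], so no iteration depends on
// another and the loop can safely be split across threads.
void square_all(std::vector<double>& data) {
    const int n = static_cast<int>(data.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        data[i] = data[i] * data[i];
    }
}
```

If the compiler's OpenMP option is not enabled (for example <code>-fopenmp</code> with GCC or Clang), the pragma is ignored and the loop simply runs serially with the same result.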

=Program=

==Without OpenMP==

[[File:FFT.png|thumb|fig 3.1 Fourier transformation code block.]]

[[File:NonParallel.png|1000px|center|Non-Parallel Process]]

==With OpenMP and Optimization==

[[File:PFFT.png|thumb|fig 4.1 Fourier transformation code block, with OpenMP parallelization.]]

[[File:PFT.png|thumb|fig 4.2 Fourier transformations called in OpenMP.]]

[[File:Parallel.png|1000px|center|Parallel Process]]

=Comparison and Analysis=

We parallelized the code for rendering and state evolution above. We have not yet, however, parallelized the FFT code, because we initially thought that it contained a loop-carried dependency that could not be resolved:

[[File:FFT_crop.png|500px|none|Hotspot]]

The issue is the <code>T *= phiT;</code> line: every time the loop counter <code>l</code> increases, <code>T</code> gets multiplied by <code>phiT</code>. This seems painfully obvious, but it prevents us from parallelizing the loop, since each iteration needs the value of <code>T</code> left behind by the previous one and the iterations can't be done in an arbitrary order. What it does mean, however, is that we can remove that line and replace any usage of <code>T</code> with <code>pow(phiT, l)</code>. We can then parallelize the loop, since the iterations are now order-invariant.

When we do that, the FPS somehow does not change. In fact, if we then remove the parallel for construct, the FPS drops to a meager 1 frame per second. This is awful, and likely because the <code>pow()</code> operation is very computationally expensive.

Are we stuck? Of course not. We can apply math. There is a property of complex numbers (De Moivre's formula) which allows us to turn exponentiation into multiplication. If we write the complex number <code>phiT</code> as <code>phiT = cos(arg) + i * sin(arg)</code>, which we can since it has norm 1, we have <code>phiT ** l = cos(l * arg) + i * sin(l * arg)</code>. This gave a tremendous speedup, since the trigonometric functions are apparently less costly than exponentiation. The code and new vTune analyses are below.
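The rewrite described above can be sketched as follows. This is an illustration under stated assumptions, not the team's FFT code (which appears only in the screenshots): the function names are hypothetical, <code>T</code> is assumed to start at 1, and each iteration is assumed to use <code>T</code> before it is updated.

```cpp
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Original pattern: T carries state from one iteration to the next,
// so the iterations must run in order (loop-carried dependency).
std::vector<cplx> twiddles_sequential(double arg, int n) {
    const cplx phiT = std::polar(1.0, arg);  // cos(arg) + i*sin(arg)
    std::vector<cplx> out(n);
    cplx T(1.0, 0.0);                        // assumption: T starts at 1
    for (int l = 0; l < n; ++l) {
        out[l] = T;                          // use T for this iteration
        T *= phiT;                           // depends on the previous T
    }
    return out;
}

// Rewritten pattern: phiT^l is computed directly as
// cos(l*arg) + i*sin(l*arg), so every iteration is independent.
std::vector<cplx> twiddles_parallel(double arg, int n) {
    std::vector<cplx> out(n);
    #pragma omp parallel for
    for (int l = 0; l < n; ++l) {
        out[l] = std::polar(1.0, l * arg);   // no state shared between iterations
    }
    return out;
}
```

Both functions should produce the same values up to floating-point rounding; the second trades one complex multiplication per iteration for one sine and one cosine, in exchange for iterations that can run in any order.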

Claffan
