Changes

GPUSquad

496 bytes added, 14:05, 7 April 2018

→‎Assignment 1

</source>

The hotspot appears to be the double for-loop over m and n in the Jacobi iteration code of the dojacobi() function. I believe these matrix calculations could be parallelized for improved performance. Note that the for-loop enclosing the double loop runs a constant number of times, iters, so it does not grow with the problem size. The total cost is O(iters * n^2), which is still O(n^2), not O(n^3).

</pre>
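The complexity argument for the Jacobi hotspot can be written out as a short worked equation. This is only a sketch: it assumes an m-by-n grid and a constant cost c per cell update (c is a symbol introduced here, not taken from the original code).

<math>T(m, n) = \sum_{k=1}^{\mathit{iters}} \sum_{i=1}^{m} \sum_{j=1}^{n} c = c \cdot \mathit{iters} \cdot m \cdot n</math>

With m = n and iters held fixed, this gives <math>T(n) = (c \cdot \mathit{iters})\, n^2 = O(n^2)</math>. The cost would only reach O(n^3) if iters itself grew in proportion to n.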

==== Idea 2 - LZW Compression ====

=== Assignment 2 ===

We parallelized the original code by moving the Jacobi calculations into a kernel. For this initial parallel version, we used only 1D threading and had each thread run a for-loop over the other dimension.

The iters loop launches one kernel per iteration. We use double buffering: the kernel is launched with its arguments ordered either (d_a, d_b) or (d_b, d_a), so the output of one iteration becomes the input of the next. This alternation takes the place of the pointer swap used in the serial code.
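The 1D threading and double buffering described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the kernel name jacobiStep, the 4-point averaging stencil, the fixed boundary handling, and the grid sizes are all assumptions made for the example. Only d_a, d_b, and iters come from the description above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical Jacobi step: the real dojacobi() stencil may differ.
// 1D threading: one thread per column j, each looping over all rows i.
__global__ void jacobiStep(const float* in, float* out, int m, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;                      // guard threads past the last column
    for (int i = 0; i < m; ++i) {
        int idx = i * n + j;                 // row-major indexing
        if (i == 0 || i == m - 1 || j == 0 || j == n - 1) {
            out[idx] = in[idx];              // boundary cells keep their values
        } else {
            out[idx] = 0.25f * (in[idx - n] + in[idx + n] +
                                in[idx - 1] + in[idx + 1]);
        }
    }
}

int main() {
    const int m = 64, n = 64, iters = 100;   // illustrative sizes only
    const size_t bytes = size_t(m) * n * sizeof(float);

    std::vector<float> h_a(size_t(m) * n, 0.0f);
    for (int j = 0; j < n; ++j) h_a[j] = 1.0f;   // assumed "hot" top edge

    float* d_a = nullptr;
    float* d_b = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess ||
        cudaMalloc(&d_b, bytes) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);

    const int ntpb = 256;                        // threads per block
    const int nblks = (n + ntpb - 1) / ntpb;     // enough blocks to cover n columns

    // Double buffering: even iterations read d_a and write d_b,
    // odd iterations read d_b and write d_a.
    for (int it = 0; it < iters; ++it) {
        if (it % 2 == 0)
            jacobiStep<<<nblks, ntpb>>>(d_a, d_b, m, n);
        else
            jacobiStep<<<nblks, ntpb>>>(d_b, d_a, m, n);
    }
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The final result sits in d_a after an even number of iterations,
    // and in d_b after an odd number.
    float* d_result = (iters % 2 == 0) ? d_a : d_b;
    cudaMemcpy(h_a.data(), d_result, bytes, cudaMemcpyDeviceToHost);
    std::printf("value just below the top edge: %f\n", h_a[1 * n + n / 2]);

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

Because the array is stored row-major and each thread owns one column, neighbouring threads touch neighbouring addresses on every pass of the inner loop. The copy back to the host has to pick the buffer that was written last, which depends on whether iters is even or odd.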

Tsarkarcd

