Jump to: navigation, search


1,209 bytes added, 18:26, 11 April 2018
Assignment 3
In our attempts to make the kernel scalable with ghost cells, we scaled along one dimension. However, we were inconsistent in our scaling. The 1D kernel scaled along the n (y) dimension while the 2d kernels scaled along the m (x) dimension. Scaling along the x dimension, while allowing results to be testable between serial and 2D parallelized versions of the code, produced distributions that were strangely banded and skewed. In other words, we made the code render weird things faster:
FINAL TIMINGS <pre style="color: red"> THE GRAPH IMMEDIATELY BELOW IS INCORRECT: there was an error recording the 1D runtimes for assignment 2</pre>
Unfortunately our group's inability to effectively use profiling tools has left this discrepancy as a mystery.
In conclusion, while it may be possible to parallelize the algorithm we chose well, the effort to do so would involve ensuring that shared memory is properly synchronized in two block dimensions (2 dimensions of ghost cells rather than the 1 we implemented), and to ensure that shared memory is allocated appropriately such that maximum occupancy is established within the GPU. Unfortunately, our attempts fell short, and while implementing constant memory seemed to speed up the kernel a bit, our solution was not fully scalable in both dimensions, and shared memory was not implemented in a way that improved kernel efficiency.

Navigation menu