


730 bytes added, 20:18, 7 April 2018
=== Assignment 3 ===
Our kernel implementation was a large improvement over the serial version, but profiling the Assignment 2 version showed that there was still room to improve. <br><br>Problem:
The kernels were executing concurrently, but the percentage of concurrency was quite low.
Initialize the thread count based on the compute capability of the CUDA device.
The number of threads per block was calculated from the device's limits on resident threads and resident blocks.
The number of blocks in the grid was then recalculated to account for the complexity of the image and the new number of threads per block.
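
The calculation described above might look roughly like the following host-side sketch. It is not the code used in the assignment: the <code>SmLimits</code> struct and both function names are hypothetical, and it assumes that the "complexity of the image" can be expressed as a count of work items (for example, pixels). In a real program the limits would come from <code>cudaGetDeviceProperties</code> (for instance <code>maxThreadsPerMultiProcessor</code> and <code>maxThreadsPerBlock</code>) together with the resident-blocks figure listed for the device's compute capability in the CUDA C Programming Guide.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the per-multiprocessor limits that depend on
// the device's compute capability. The values are assumed to be filled in
// by the caller; nothing here queries the GPU.
struct SmLimits {
    int maxResidentThreads;  // resident threads per multiprocessor
    int maxResidentBlocks;   // resident blocks per multiprocessor
    int maxThreadsPerBlock;  // hard upper limit on the block size
    int warpSize;            // threads per warp (32 on current devices)
};

// Choose a block size so that the maximum number of resident blocks,
// each of this size, fills the multiprocessor's resident-thread limit.
int threadsPerBlock(const SmLimits& sm) {
    assert(sm.maxResidentBlocks > 0 && sm.warpSize > 0);
    int threads = sm.maxResidentThreads / sm.maxResidentBlocks;
    // Round down to a whole number of warps.
    threads = (threads / sm.warpSize) * sm.warpSize;
    // Keep the result between one warp and the per-block limit.
    threads = std::max(threads, sm.warpSize);
    return std::min(threads, sm.maxThreadsPerBlock);
}

// Number of blocks needed to cover every work item (ceiling division).
// Assumption: one thread handles one work item.
int blocksPerGrid(long long workItems, int threads) {
    assert(threads > 0 && workItems >= 0);
    return static_cast<int>((workItems + threads - 1) / threads);
}
```

With limits of 2048 resident threads and 32 resident blocks per multiprocessor, <code>threadsPerBlock</code> returns 64, and <code>blocksPerGrid(1000, 64)</code> returns 16. Rounding the block size to a multiple of the warp size avoids launching partially filled warps.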
=== Results ===
