

Avengers

428 bytes added, 04:51, 31 March 2019
=== Assignment 3 ===
To optimize our code, we used shared memory inside the kernel. Allocating arrays in shared memory inside the kernel required a constant value for the number of threads per block, so the number of threads per block could no longer be calculated at run time. Instead, we set the number of threads per block to 1024 and declared it as a constant at the beginning of the application. This allowed us to use shared memory inside the kernel and optimize our application.
Below is a picture of the optimized kernel:
[[File:QuickerKernel.PNG]]
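The constraint described above comes from how statically declared shared arrays work: the size inside <code>__shared__ float s[N];</code> must be known at compile time, so a block size computed at run time cannot be used for it. The sketch below illustrates the pattern only. It is not our kernel (which is shown in the picture above); the constant <code>NTPB</code>, the block-sum kernel <code>blockSum</code>, and the host helper <code>sumOnGpu</code> are hypothetical names chosen for illustration. The grid size can still be computed at run time; only the block size is fixed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical constant: threads per block fixed at compile time so it can
// size a statically allocated __shared__ array (1024 = max threads per block).
const int NTPB = 1024;

// Hypothetical example kernel: each block sums its slice of `in` into
// blockSums[blockIdx.x] using a tree reduction in shared memory.
__global__ void blockSum(const float* in, float* blockSums, int n) {
    __shared__ float s[NTPB];           // size must be a compile-time constant
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? in[i] : 0.0f;    // stage global data in fast shared memory
    __syncthreads();

    // Halve the number of active threads each step (NTPB is a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = s[0];
}

// Hypothetical host helper: returns the sum of all elements of h.
float sumOnGpu(const std::vector<float>& h) {
    int n = static_cast<int>(h.size());
    if (n == 0) return 0.0f;                // a launch with zero blocks is invalid
    int nBlocks = (n + NTPB - 1) / NTPB;    // grid size is still computed at run time

    float* dIn = nullptr;
    float* dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dSums, nBlocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<nBlocks, NTPB>>>(dIn, dSums, n);

    std::vector<float> sums(nBlocks);
    cudaMemcpy(sums.data(), dSums, nBlocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);

    float total = 0.0f;                     // combine the per-block partial sums on the host
    for (float v : sums) total += v;
    return total;
}
```

For example, summing a vector of 5000 ones launches 5 blocks of 1024 threads and returns 5000.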
 
On average, the run time for all 7 different problem sizes was reduced by approximately 216 milliseconds.
Below is a graph that illustrates the time saved on all 7 different problem sizes with the optimized kernel:
[[File:OptimizationGraph.PNG]]
