Changes

Avengers

428 bytes added, 04:51, 31 March 2019

=== Assignment 3 ===

To optimize our code, we used shared memory inside the kernel. For our purposes, allocating arrays in shared memory inside the kernel required a compile-time constant for the number of threads per block. This meant that the number of threads per block could not be calculated at run time. Instead, we set the number of threads per block to 1024 and declared it as a constant at the beginning of the application. This allowed us to use shared memory inside the kernel and optimize our application.
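As a rough illustration of why the constant is needed, here is a minimal CUDA sketch of a statically sized shared-memory array. It is not the kernel shown in the picture: the names `NTPB`, `blockSum`, and `sumOnDevice` are hypothetical, and a simple block-wise sum stands in for our actual computation. Error checking is reduced to a single assertion for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Threads per block, fixed at compile time (1024, as in the text above).
// NTPB is a hypothetical name chosen for this sketch.
constexpr int NTPB = 1024;

// Hypothetical kernel: each block copies its slice of `in` into shared
// memory, reduces it there, and writes one partial sum per block.
__global__ void blockSum(const float* in, float* partial, int n) {
    // A statically declared shared array needs a compile-time size,
    // which is why NTPB cannot be computed at run time here.
    __shared__ float s[NTPB];

    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    s[t] = (i < n) ? in[i] : 0.0f;   // pad the last block with zeros
    __syncthreads();

    // Tree reduction in shared memory (NTPB is a power of two).
    for (int stride = NTPB / 2; stride > 0; stride /= 2) {
        if (t < stride) s[t] += s[t + stride];
        __syncthreads();
    }
    if (t == 0) partial[blockIdx.x] = s[0];
}

// Hypothetical host helper: launches blockSum and adds up the partial sums.
float sumOnDevice(const std::vector<float>& h) {
    int n = static_cast<int>(h.size());
    if (n == 0) return 0.0f;
    int nblks = (n + NTPB - 1) / NTPB;   // only the grid size varies at run time

    float* dIn = nullptr;
    float* dPartial = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, nblks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<nblks, NTPB>>>(dIn, dPartial, n);
    assert(cudaGetLastError() == cudaSuccess);

    std::vector<float> partial(nblks);
    cudaMemcpy(partial.data(), dPartial, nblks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `sumOnDevice` on a host vector launches as many blocks as the problem size requires, while the block size stays fixed at `NTPB`. The trade-off is that the block size can no longer be tuned per problem size without recompiling.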

Below is a picture of the optimized kernel:

[[File:QuickerKernel.PNG]]

Averaged across all 7 problem sizes, the run time was reduced by approximately 216 milliseconds.

Below is a graph that illustrates the time saved by the optimized kernel for each of the 7 problem sizes:

[[File:OptimizationGraph.PNG]]

Jsidhu26
