Changes


Happy Valley

535 bytes added, 09:02, 9 April 2018
Assignment 3
- Low Memcpy/Compute Overlap
Concurrent Kernel Execution lets CUDA programmers launch several kernels asynchronously by using streams. Unfortunately, it is not applicable to the Bitonic sort algorithm, for the same reason that we cannot parallelize the two outer loops: each step reads the values written by the previous step. If we launch the kernels in parallel, they will start 'competing' for the same data values and we will end up with race conditions. Low memcpy/compute overlap is related to Concurrent Kernel Execution. In theory, you can copy chunks of the input array to the device asynchronously and hand each chunk to its own kernel. However, it seems hard to partition the input data in any meaningful way, because in the later steps the compare-and-swap partners are up to half the array apart.
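The dependency between steps can be illustrated with a minimal host-only C++ sketch. It is not the assignment's CUDA code: the names bitonic_step and bitonic_sort are hypothetical, and the input length is assumed to be a power of two. The inner loop over i stands in for the threads of a single kernel launch, while the two outer loops correspond to successive launches that have to run one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One (k, j) step of bitonic sort (hypothetical helper).
// On a GPU this loop body would be one kernel launch, one thread per index i.
// Within a step every element belongs to exactly one (i, partner) pair,
// so the iterations are independent of each other.
void bitonic_step(std::vector<int>& a, std::size_t j, std::size_t k) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::size_t partner = i ^ j;          // element compared with a[i]
        if (partner > i) {                    // handle each pair only once
            bool ascending = (i & k) == 0;    // sort direction of this block
            bool out_of_order = ascending ? a[i] > a[partner]
                                          : a[i] < a[partner];
            if (out_of_order) std::swap(a[i], a[partner]);
        }
    }
}

// Full sort (hypothetical helper). The two outer loops are sequential:
// step (k, j) reads what the previous step wrote, so the corresponding
// kernels cannot run concurrently on the same array.
void bitonic_sort(std::vector<int>& a) {
    std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);     // assumption: n is a power of two
    for (std::size_t k = 2; k <= n; k <<= 1) {
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            bitonic_step(a, j, k);
        }
    }
}
```

In the last stage (k equal to the array length) the first step uses j equal to half the length, so each element is paired with one in the other half of the array. This is why splitting the input into independent chunks for separate streams does not work in a straightforward way.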
''' Source Code '''