Changes


TudyBert

302 bytes added, 13:01, 19 April 2013
=== Assignment 2 ===
Here's the code for the newly parallelized method:
 // Each thread enlarges one source row (idx) of the image.
 int idx = blockIdx.x * blockDim.x + threadIdx.x;
 int enlargeRow, enlargeCol;
 // Per-thread value: declared __shared__, every thread in the block would race on it.
 int pixel;
 for (int j = 0; j < nj; j++) {
     pixel = work[idx * nj + j];
     enlargeRow = idx * factor;
     enlargeCol = j * factor;
     // Copy the source pixel into its factor x factor block of the result.
     for (int c = enlargeRow; c < (enlargeRow + factor); c++) {
         for (int d = enlargeCol; d < (enlargeCol + factor); d++) {
             // nj * factor is the row width of the enlarged image.
             result[d + c * nj * factor] = pixel;
         }
     }
 }

While I did see a decrease in the time taken to run 50 loops, the decrease wasn't as significant as I had hoped. This kernel isn't optimized yet, so I'm looking forward to some more impressive results as I update the code.
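One possible direction for that update, sketched here as an assumption rather than as the version that was eventually written: map one thread to each output pixel instead of one thread to each source row, so that neighbouring threads write neighbouring addresses in the result. The names `enlargeKernel`, `enlargeHost`, and `enlargeMatchesReference`, the `ni` row-count parameter, and the 16 x 16 block size are all hypothetical; only `work`, `result`, `nj`, and `factor` come from the code above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per OUTPUT pixel of the enlarged image.
__global__ void enlargeKernel(const int* work, int* result,
                              int ni, int nj, int factor) {
    int outCol = blockIdx.x * blockDim.x + threadIdx.x;
    int outRow = blockIdx.y * blockDim.y + threadIdx.y;
    int outCols = nj * factor;   // row width of the enlarged image
    int outRows = ni * factor;   // row count of the enlarged image
    if (outRow >= outRows || outCol >= outCols) return;  // guard partial blocks
    // Integer division maps each output pixel back to its source pixel.
    result[outRow * outCols + outCol] =
        work[(outRow / factor) * nj + (outCol / factor)];
}

// CPU reference with the same loop structure as the original method.
static void enlargeHost(const std::vector<int>& work, std::vector<int>& result,
                        int ni, int nj, int factor) {
    for (int i = 0; i < ni; i++)
        for (int j = 0; j < nj; j++)
            for (int c = i * factor; c < (i + 1) * factor; c++)
                for (int d = j * factor; d < (j + 1) * factor; d++)
                    result[d + c * nj * factor] = work[i * nj + j];
}

// Runs the kernel on a synthetic image and compares it with the CPU result.
bool enlargeMatchesReference(int ni, int nj, int factor) {
    std::vector<int> work(ni * nj);
    for (int i = 0; i < ni * nj; i++) work[i] = i % 256;  // synthetic pixels
    std::vector<int> expected(work.size() * factor * factor);
    std::vector<int> actual(expected.size());
    enlargeHost(work, expected, ni, nj, factor);

    int* dWork = nullptr;
    int* dResult = nullptr;
    if (cudaMalloc(&dWork, work.size() * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dResult, actual.size() * sizeof(int)) != cudaSuccess) {
        cudaFree(dWork);
        return false;
    }
    cudaMemcpy(dWork, work.data(), work.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block size, not tuned
    dim3 grid((nj * factor + block.x - 1) / block.x,
              (ni * factor + block.y - 1) / block.y);
    enlargeKernel<<<grid, block>>>(dWork, dResult, ni, nj, factor);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(actual.data(), dResult,
                                 actual.size() * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dWork);
    cudaFree(dResult);
    return err == cudaSuccess && actual == expected;
}

int main() {
    bool ok = enlargeMatchesReference(37, 53, 3);
    std::printf(ok ? "match\n" : "mismatch\n");
    return ok ? 0 : 1;
}
```

Whether this mapping is actually faster depends on the image size and the GPU, so it would need to be timed over the same 50 loops before drawing any conclusion.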
=== Assignment 3 ===