

1,283 bytes added, 12:53, 19 April 2013
A large chunk of the processing time is wasted on copying the two arrays over from one image to the other. If I have time I might look into parallelizing this as well. It would be interesting to see whether the speed of the GPU can overcome the overhead of copying to and from the device.
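One way to check whether the GPU gain survives the transfer overhead is to time the copies directly with CUDA events. This is only a sketch, not code from the assignment; the array size `N` and the pointer names are placeholders I've assumed:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int N = 1 << 22;               /* assumed test size: ~4M ints */
    size_t bytes = (size_t)N * sizeof(int);
    int *h = (int *)malloc(bytes);
    int *d;
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* Time the host-to-device copy. */
    cudaEventRecord(start);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D copy of %zu bytes: %.3f ms\n", bytes, ms);

    cudaFree(d);
    free(h);
    return 0;
}
```

Comparing this copy time against the kernel's own runtime would show directly whether the transfers dominate.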
=== Assignment 2 ===
For Assignment 2 I simply put the four for loops into a kernel and replaced the outermost loop with thread indices. I made a helper method that sets up memory on the device and launches the kernel with a one-dimensional grid of blocks, each containing a single thread. I launched as many one-thread blocks as there were rows in the image file; I figured this was the quickest way to parallelize the method. Unfortunately I hit a wall with my data sizes: the CPU version of the enlarge-image method fails when run for more than 50 loops. The error thrown is a Visual Studio debugging error, so I think VS isn't too happy with having the CPU hogged for so long. As a result I've had to extrapolate times for larger loop counts by assuming a linear increase in the time taken.
Here's the code for the newly parallelized method:
__global__ void enlargeKernel(const int *work, int *result, int ni, int nj, int factor)
{
    // One thread per source row: idx is the row index.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= ni) return;

    for (int j = 0; j < nj; j++) {
        int pixel = work[idx * nj + j];  // local, not __shared__: each thread needs its own copy
        int enlargeRow = idx * factor;
        int enlargeCol = j * factor;
        // Write the pixel into a factor-by-factor block of the enlarged image.
        for (int c = enlargeRow; c < enlargeRow + factor; c++)
            for (int d = enlargeCol; d < enlargeCol + factor; d++)
                result[d + c * nj * factor] = pixel;  // nj * factor = enlarged row width
    }
}
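The host-side helper described above (device allocation, copies, and a one-thread-per-block launch with one block per row) might look roughly like this. The function name `enlargeOnDevice` and its parameter list are my assumptions, since the original helper isn't shown:

```cuda
#include <cuda_runtime.h>

void enlargeOnDevice(const int *work, int *result, int ni, int nj, int factor)
{
    int *d_work, *d_result;
    size_t inBytes  = (size_t)ni * nj * sizeof(int);
    size_t outBytes = inBytes * factor * factor;

    // Set up memory on the device and copy the source image over.
    cudaMalloc(&d_work, inBytes);
    cudaMalloc(&d_result, outBytes);
    cudaMemcpy(d_work, work, inBytes, cudaMemcpyHostToDevice);

    // One block per image row, one thread per block, as described above.
    enlargeKernel<<<ni, 1>>>(d_work, d_result, ni, nj, factor);

    // Copy the enlarged image back and release device memory.
    cudaMemcpy(result, d_result, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_work);
    cudaFree(d_result);
}
```

Note that a one-thread-per-block launch leaves most of each SM idle; using, say, 128 threads per block and dividing the rows among them would be a natural next step.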
=== Assignment 3 ===
