
Team CNT

684 bytes added, 22:03, 18 April 2013

Assignment 3

Now I realized that I had not used shared memory in Assignment 2, so I changed that: my Assignment 3 kernel uses shared memory. As in Assignment 2, I negate the image 1000 times. The last lectures of this course helped a lot in learning how to optimize, especially coalesced memory access. It is amazing! In Assignment 2, moving the negate computation into a CUDA kernel made it almost twice as fast, and I thought there was nothing more to be done. With the Assignment 3 optimizations, the negate computation is now more than twice as fast as in Assignment 2. This is my profile for Assignment 3:

[[Image:Profiler_scr.png]]
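A rough count of memory transactions shows why coalesced access makes such a difference. The sketch below is a simplified model, assuming a 32-thread warp whose loads are served in aligned 128-byte segments (real transaction sizes depend on the GPU generation and caching mode). `segmentsTouched` is a hypothetical helper written for illustration, not part of the assignment code; it is plain host C++ that compiles inside a `.cu` file. With `i` taken from `threadIdx.x`, an index like `j * ni + i` gives neighbouring threads neighbouring addresses (stride 1), while swapping the roles, as in `i * nj + j`, spaces them `nj` elements apart.

```cuda
#include <cstddef>
#include <set>

// Simplified model (assumption): each of the 32 threads in a warp loads one
// element, and memory is fetched in whole 128-byte aligned segments.
// Thread t reads element (base + t * stride); returns how many distinct
// segments the warp touches.
int segmentsTouched(std::size_t base, std::size_t stride, std::size_t elemSize)
{
    const std::size_t kWarpSize = 32;
    const std::size_t kSegmentBytes = 128;
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        std::size_t byteAddr = (base + t * stride) * elemSize;
        segments.insert(byteAddr / kSegmentBytes);  // segment index of this load
    }
    return static_cast<int>(segments.size());
}
```

In this model, a warp reading 4-byte `int` pixels with stride 1 touches one segment (`segmentsTouched(0, 1, 4)` returns 1), while a stride of 512 elements touches 32 (`segmentsTouched(0, 512, 4)` returns 32), so 32 times as many bytes are fetched for the same 32 pixels.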

Here is the kernel code:

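<pre>
// Negates an ni x nj image stored row-major (ni = width, nj = height).
// Launch with blockDim = (NTPB, NTPB); NTPB is defined elsewhere in the program.
__global__ void cudaNegateImage2D_Coalescence(int *result, const int *work, int ni, int nj)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // column index (fastest-varying)
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row index

    __shared__ int cache_a[NTPB][NTPB];
    __shared__ int cache_b[NTPB][NTPB];

    bool inside = (i < ni && j < nj);  // edge blocks may extend past the image

    // Consecutive threadIdx.x values read consecutive addresses: coalesced load.
    if (inside)
        cache_a[threadIdx.y][threadIdx.x] = work[j * ni + i];
    __syncthreads();  // kept outside the branch so every thread reaches it

    if (inside)
        cache_b[threadIdx.y][threadIdx.x] = -cache_a[threadIdx.y][threadIdx.x] + 255;
    __syncthreads();

    // Coalesced store, guarded so edge threads do not write past the image.
    if (inside)
        result[j * ni + i] = cache_b[threadIdx.y][threadIdx.x];
}
</pre>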
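The host side of "negate the image 1000 times" might look roughly like the sketch below. It is an assumption-laden illustration, not the code used for the profile: `negateRepeated` and `check` are hypothetical names, `NTPB` is assumed to be 16, and the kernel is a condensed, bounds-checked variant of the one above (one shared array instead of two). The grid size is rounded up so images whose sides are not multiples of `NTPB` are fully covered, and the launches are timed with CUDA events.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define NTPB 16  // assumption: 16 x 16 threads per block

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Condensed, bounds-checked variant of the shared-memory negate kernel.
__global__ void cudaNegateImage2D_Coalescence(int *result, const int *work, int ni, int nj)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    __shared__ int cache[NTPB][NTPB];
    bool inside = (i < ni && j < nj);
    if (inside) cache[threadIdx.y][threadIdx.x] = work[j * ni + i];
    __syncthreads();
    if (inside) result[j * ni + i] = 255 - cache[threadIdx.y][threadIdx.x];
}

// Hypothetical driver: negates an ni x nj image `iterations` times on the GPU,
// copies the result back, and returns the average time per launch in ms.
float negateRepeated(int *h_result, const int *h_image, int ni, int nj, int iterations)
{
    size_t bytes = static_cast<size_t>(ni) * nj * sizeof(int);
    int *d_work = nullptr, *d_result = nullptr;
    check(cudaMalloc(&d_work, bytes), "cudaMalloc work");
    check(cudaMalloc(&d_result, bytes), "cudaMalloc result");
    check(cudaMemcpy(d_work, h_image, bytes, cudaMemcpyHostToDevice), "copy in");

    dim3 block(NTPB, NTPB);
    dim3 grid((ni + NTPB - 1) / NTPB, (nj + NTPB - 1) / NTPB);  // round up

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "event create");
    check(cudaEventCreate(&stop), "event create");
    check(cudaEventRecord(start), "event record");
    for (int k = 0; k < iterations; ++k)
        cudaNegateImage2D_Coalescence<<<grid, block>>>(d_result, d_work, ni, nj);
    check(cudaGetLastError(), "kernel launch");
    check(cudaEventRecord(stop), "event record");
    check(cudaEventSynchronize(stop), "event sync");

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "elapsed time");
    check(cudaMemcpy(h_result, d_result, bytes, cudaMemcpyDeviceToHost), "copy out");

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_work);
    cudaFree(d_result);
    return iterations > 0 ? ms / iterations : 0.0f;
}
```

Because every launch reads the original image from `d_work`, the result after 1000 launches is the same as after one; the loop only repeats the work so the timing is less sensitive to timer resolution and launch-to-launch noise.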

Natalia Levkevitch
