Changes

Jump to: navigation, search

GPU610/TeamKappa

1,669 bytes added, 09:09, 16 December 2015
Code Snippet of CUDA kernel
'''kernel CUDA version work in progress...and comparasions'''
 
==== Assignment 2 - Code Snippet of CUDA kernel====
<code><pre>__global__ void montecarlo(const double* d_x, const double* d_y, int* d_score) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if ((d_x[idx]) * (d_x[idx]) +
(d_y[idx]) * (d_y[idx]) <= 1){
d_score[idx] = 1;
}
else
d_score[idx] = 0;
}
</pre></code>
 
 
It is not so much of a huge performance difference over the CPU code. The issue is with the data being transferred and initialized.
It appears the calculation time of the program were only a fraction of the computation. The problem lies when the generation of random
of points on the CPU on host and copying it over device. However, the GPU compute the results much quicker than the CPU. The next approach is to find a way to generate the random numbers concurrently on the GPU reducing the amount of serial work on the CPU.
 
 
 
==== Assignment 3 - Code Snippet of CUDA kernel Optimized version====
<code><pre>___global__ void montecarlo(const double* d_x, const double* d_y, int* d_score) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if ((d_x[idx]) * (d_x[idx]) +
(d_y[idx]) * (d_y[idx]) <= 1){
d_score[idx] = 1;
}
else
d_score[idx] = 0;
}
 
 
__global__ void reduce(int* c, int* d, int n) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
int t = threadIdx.x;
__shared__ float s_c[1024];
if (i < n)
s_c[t] = c[i];
else
s_c[t] = 0;
__syncthreads();
 
for (int stride = 1; stride < blockDim.x; stride *= 2) {
if (t % (2 * stride) == 0)
s_c[t] += s_c[t + stride];
__syncthreads();
}
 
if (t == 0)
d[blockIdx.x] = s_c[0];
}
</pre></code>
 
Currently having problems trying to generate random numbers using cuRAND
== Assignment 2 ==
13
edits

Navigation menu