Changes

Kernal Blas

189 bytes removed, 09:37, 4 April 2018
Assignment 3
----
After realizing that cudaMemcpy took quite a bit of time, we focused our efforts on optimizing it.
It was difficult to find a solution because the initial copy always takes a bit of time.<br>
We tried using cudaMallocHost instead of malloc to allocate the host memory.<br>
cudaMallocHost allocates pinned (page-locked) memory, which stays resident in RAM and can be accessed directly by the GPU's DMA engine.
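
The allocation change described above can be sketched roughly as follows. This is a minimal illustration, not our actual assignment code: the buffer size <code>count</code>, the <code>check</code> helper, and the event-based timing are assumptions added for the example. Only <code>cudaMallocHost</code> replacing <code>malloc</code> comes from the text.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t count = 1 << 20;               // assumed number of floats to copy
    const size_t bytes = count * sizeof(float);

    float *host = nullptr;    // pinned host buffer
    float *device = nullptr;  // device buffer

    // Pinned allocation; this replaces: host = (float *)malloc(bytes);
    check(cudaMallocHost((void **)&host, bytes), "cudaMallocHost");
    check(cudaMalloc((void **)&device, bytes), "cudaMalloc");
    check(cudaMemset(device, 0, bytes), "cudaMemset");

    // Time the device-to-host copy with CUDA events.
    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "cudaEventCreate");
    check(cudaEventCreate(&stop), "cudaEventCreate");

    check(cudaEventRecord(start), "cudaEventRecord");
    check(cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost), "cudaMemcpy");
    check(cudaEventRecord(stop), "cudaEventRecord");
    check(cudaEventSynchronize(stop), "cudaEventSynchronize");

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "cudaEventElapsedTime");
    std::printf("copied %zu bytes in %.3f ms\n", bytes, ms);

    // Pinned memory must be released with cudaFreeHost, not free.
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(device);
    cudaFreeHost(host);
    return 0;
}
```

Swapping the <code>cudaMallocHost</code> line for a plain <code>malloc</code> (and <code>cudaFreeHost</code> for <code>free</code>) gives a pageable baseline to compare copy times against.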
We changed one part of our code.
The kernel code we used to optimize our code:
<syntaxhighlight lang="cpp">
#include <curand_kernel.h>

// Host-side change: pinned allocation instead of malloc
// cudaMallocHost((void **)&host, size);

// Each thread estimates pi from n random points in the unit square.
__global__ void gpu_monte_carlo(float *estimate, curandState *states, float n) {
    unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;
    float points_in_circle = 0;
    float x, y;

    curand_init(1234, tid, 0, &states[tid]); // Initialize CURAND

    for (int i = 0; i < n; i++) {
        x = curand_uniform(&states[tid]);
        y = curand_uniform(&states[tid]);
        points_in_circle += (x*x + y*y <= 1.0f); // count the point if (x, y) lies inside the circle
    }
    estimate[tid] = 4.0f * points_in_circle / n; // this thread's estimate of pi
}
</syntaxhighlight>
We optimized and improved the code from assignment 2 by asking the user for input for the pi calculation instead of using a randomized number.<br/>
The error in the pi estimate is how far the estimate is from the known value of pi, 3.1415926535.
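
The error measure can be written out as a small host-side function. This is a sketch that assumes "how far" means the absolute difference; the names <code>KNOWN_PI</code> and <code>pi_error</code> are made up for the example.

```cpp
#include <cmath>

// Known value of pi as quoted in the text.
const double KNOWN_PI = 3.1415926535;

// Absolute difference between an estimate and the known value of pi
// (assumed definition of the error).
double pi_error(double estimate) {
    return std::fabs(estimate - KNOWN_PI);
}
```

For example, an estimate of 3.0 gives an error of about 0.1416 under this definition.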
<br>
[[File:kernal-blas-optimized.png]]
[[File:Chartp3.PNG]]