Changes

Kernal Blas

358 bytes removed, 10:06, 4 April 2018
Assignment 2
'''Parallelizing'''
One of the suggested improvements in the linked algorithm post is to change char& c to a const char in the for loop <syntaxhighlight lang="cpp">  for (char& c : input) { </syntaxhighlight> since c is never modified inside the loop. Otherwise, we did not see any other way to parallelize the compression algorithm.
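As a minimal sketch of that suggestion: the loop body below is a hypothetical stand-in (a simple run-length encoder named runLengthEncode), not the compression code from the assignment. It only illustrates that a range-based for loop which merely reads each character can bind it as const char.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the compression loop: run-length encodes `input`.
// The function name and the encoding scheme are assumptions for illustration.
std::string runLengthEncode(const std::string& input) {
    std::string out;
    char prev = '\0';
    std::size_t count = 0;

    // `c` is only read, so it can be a const copy instead of `char&`.
    for (const char c : input) {
        if (count > 0 && c == prev) {
            ++count;                       // same character: extend the run
        } else {
            if (count > 0) {               // flush the previous run, if any
                out += prev;
                out += std::to_string(count);
            }
            prev = c;
            count = 1;
        }
    }
    if (count > 0) {                       // flush the final run
        out += prev;
        out += std::to_string(count);
    }
    return out;
}
```

Because the character is copied as a const value, the compiler rejects any accidental write to c inside the loop.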
=== Assignment 2 ===
[[File:Prof.PNG]] <br>
Profiling the code shows that '''cudaMalloc''' takes up most of the time spent. Even when <br>
there are only 10 iterations, the time remains at 300 milliseconds. <br>
Once the iteration count passes 25 million, we suspect a memory leak, which produces inaccurate results. <br><br>
----
After realizing that cudaMemcpy and cudaMalloc take quite a bit of time, we focused our efforts on optimizing them.
It was difficult to find a solution because the initial copy always takes a bit of time to set up.<br>
We tried using cudaMallocHost instead of malloc to allocate the host memory. <br>
cudaMallocHost allocates pinned (page-locked) memory, which stays in RAM and can be accessed directly by the GPU's DMA.
We changed one part of our code
<br/>
 
The error in the pi estimate is how far the estimate is from the known value of pi, PI = 3.1415926535.
 
<br/>
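The error measure described above can be sketched as follows. The names kPi and piError are hypothetical, and taking the absolute difference is an assumption about what "how far" means here; the reference value is the one quoted in the text.

```cpp
#include <cmath>

// Reference value of pi as quoted in the text (10 decimal places).
constexpr double kPi = 3.1415926535;

// Hypothetical helper: absolute distance between an estimate and kPi.
double piError(double estimate) {
    return std::fabs(estimate - kPi);
}
```

For example, piError(3.14) is roughly 0.0016, so a result that drifts well beyond that at high iteration counts points to a problem in the computation rather than to sampling noise.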
Here we can see where an error first occurs. From that point onward, we suspect that a memory leak causes the error in the pi calculation.
'''Optimized time run results'''