Changes

Kernal Blas

358 bytes removed, 10:06, 4 April 2018

→‎Assignment 2

'''Parallelizing'''

~~From one of the suggested improvements in the algorithm post link. A potential improvement is changing from char& c to a const char in the for loop~~ ~~<syntaxhighlight lang="cpp">~~ ~~for (char& c : input) {~~ ~~</syntaxhighlight >~~ ~~since char& c is not being modified. Otherwise we~~ We did not see any other way to parallelize ~~compression~~ the algorithm.

=== Assignment 2 ===

[[File:Prof.PNG]]

Profiling the code shows that '''~~memcpy~~ cudaMalloc''' takes up most of the run time. Even with only 10 iterations, the time remains at 300 milliseconds.

Once the iteration count passes 25 million, we have a small memory leak, which leads to inaccurate results.

----

After realizing that cudaMemcpy and cudaMalloc take quite a bit of time, we focused our efforts on optimizing them.

It was difficult to find a solution because the initial copy ~~always~~ takes a bit of time to set up.

We tried using cudaMallocHost instead of malloc to allocate the host memory.

~~cadaMallocHost~~ cudaMallocHost allocates pinned (page-locked) memory, which stays in RAM and can be accessed directly by the GPU's DMA engine.

We changed one part of our code

~~The error in PI estimation is how far it is from the known value of pi. PI = 3.1415926535~~

Here we can see where an error occurs ~~and onward where~~. We suspect that a memory leak causes the problem, resulting in an error in the pi calculation.
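The error referred to here is how far the estimate is from the known value of pi. A minimal sketch of that calculation, assuming the estimate is held in a double; the names `kPiReference` and `piError` are illustrative and do not come from the project code.

```cpp
#include <cmath>

// Reference value of pi, truncated to ten decimal places.
constexpr double kPiReference = 3.1415926535;

// Hypothetical helper: absolute error of an estimate, i.e. its distance
// from the reference value of pi.
double piError(double estimate) {
    return std::fabs(estimate - kPiReference);
}
```

An error that grows suddenly at a particular iteration count, rather than shrinking as iterations increase, is the kind of symptom described above.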

'''Optimized time run results'''

Jpham14, 96 edits
