Changes

DPS915 C U D A B O Y S

136 bytes removed, 17:01, 5 November 2015

→‎Description

And replacing it with

...

~~if (mode == 0)~~

~~getInversionBuffer << < dGrid, dBlock >> >(d_a, bufferSize, d_output);~~

if (mode == 1)

getCycleBuffer << < dGrid, dBlock >> >(d_a, bufferSize, d_output);

if (mode == 2)

getRC4Buffer << < dGrid, dBlock >> >(d_a, bufferSize, d_output);

~~...~~

~~Removing the CPU bottleneck inside the <code>xorCipher</code> method:~~

~~for (int i = 0; i < bufferSize; i++){~~

~~// inverting every byte in the buffer~~

~~buffer[i] = buffer[i] ^ keyBuffer[i];~~

~~}~~

~~And replacing it with~~

~~...~~

~~getXorBuffer << < (n + ntpb - 1) / ntpb, ntpb >> >(d_a, d_b, bufferSize);~~

...

'''Creating Kernels'''

We created kernels for 2 of the cipher methods that the program handles (RC4 and Cycle, but not the others; read on):

/**

* Description: RC4 Cuda Kernel

}

/**

* Description: Inversion Cuda Kernel

**/

~~__global__ void getInversionBuffer(char * buffer, int bufferSize) {~~

~~int idx = blockIdx.x * blockDim.x + threadIdx.x;~~

~~if (idx < bufferSize)~~

~~buffer[idx] = ~buffer[idx];~~

~~}~~

~~/**~~

~~* Description: XOR Cuda Kernel~~

~~**/~~

~~__global__ void getXorBuffer(char * buffer, char * keyBuffer, int bufferSize) {~~

~~int idx = blockIdx.x * blockDim.x + threadIdx.x;~~

~~if (idx < bufferSize)~~

~~buffer[idx] = buffer[idx] ^ keyBuffer[idx];~~

~~}~~

What about the two other cipher methods, '''byte inversion''' and '''xor cipher'''? As it turns out, these methods run perfectly fine on the CPU and are usually faster there than on the GPU. We had initially converted these functions to CUDA, but soon discovered that the conversion was unnecessary because they ran faster on the CPU than on the GPU.

Here's an example of the run time of the Xor Cipher on both the GPU and the CPU with the 789MB file:

GPU: http://i.imgur.com/0PsLxzQ.png -- 6.263 seconds

CPU: http://i.imgur.com/ktn14q3.png -- 3.722 seconds

As we can see, the CPU finishes in roughly 60% of the GPU's time (3.722 vs. 6.263 seconds): no parallelization needed here!
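For reference, the two methods that stay on the CPU can be sketched in plain C++ roughly as follows. This is only a minimal sketch: the function names <code>invertBuffer</code> and <code>xorBuffer</code> and the use of <code>std::vector<char></code> are assumptions made for illustration, not the project's actual code. The loop bodies mirror the byte inversion and XOR operations that appear in this change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU byte inversion: flip every bit of every byte in place.
void invertBuffer(std::vector<char>& buffer) {
    for (std::size_t i = 0; i < buffer.size(); i++)
        buffer[i] = ~buffer[i];
}

// Hypothetical CPU XOR cipher: combine each byte with the matching key byte.
// Assumption: keyBuffer holds at least as many bytes as buffer.
void xorBuffer(std::vector<char>& buffer, const std::vector<char>& keyBuffer) {
    assert(keyBuffer.size() >= buffer.size());
    for (std::size_t i = 0; i < buffer.size(); i++)
        buffer[i] = buffer[i] ^ keyBuffer[i];
}
```

Both operations are their own inverse, so calling either function twice with the same arguments restores the original buffer.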

==== Profiling ====

Johnathan Ragimov

