
CDOT Wiki β


BetaT

748 bytes added, 19:10, 12 April 2017
}
==== HOW THE ALGORITHM WORKS ====

This section describes only the algorithm inside the CALCULATE kernel.

1. We begin with two arrays.

[[File:2Arrazs.png]]

2. The First array is initialized by the INITIALIZE kernel: the values in its first column are set depending on a condition.

[[File:Initialize.png]]

3. The Second array copies the values from the first column of the First array.

[[File:Copy1stColumn.png]]

4. The First array copies a single value from the Second array: the element at [0,1] of the First array is set to the element at [1,0] of the Second array.

[[File:2ndCall.png]]

5. The remaining values of the 2nd column of the First array are calculated from the Second array as follows.

[[File:3rdCall.png]]

6. The 2nd column of the First array is now copied into the 2nd column of the Second array, and the cycle repeats until the end of the Second array is reached.
[[File:LAstReset.png]]
== CPU VS GPU LOOP COMPARISONS ONLY ==
== THIRD OPTIMIZATION ==
=== SAVING TRAVEL COSTS BY REMOVING THE UNNECESSARY ARRAY ===
As discovered above, the second array is unnecessary when all calculations are performed in shared memory (see section 3.3.2). Removing it lets us optimize the kernel further by reducing the time spent transferring data across the PCI bus. The image below shows the data transfer times for the CALCULATE kernel.
[[File:OPTIMIZATIONCOMPARISON.png]]
 
 
= CONCLUSIONS =
 
== OVERALL TIME COMPARISONS ==
 
Below are the final comparisons of all execution times between the CPU and GPU.
 
All times are in milliseconds.
 
[[File:finalCompare.png]]
 
== APPLICATION OUTPUT ==
 
When the application finishes, it writes the output of the algorithm to a file. The image below compares that output for the original program and the parallelized program.
 
[[File:outputs.png]]
 
== FINAL THOUGHTS ==
 
Upon completing this project, I learned a few things:
 
First, not all programs can be parallelized, even if they seem to be good candidates at first.
 
Second, understanding the application's algorithm is key to optimizing the solution. Sometimes the code must be rearranged to get better performance from the GPU, and understanding the algorithm helps ensure that the program's final output stays the same.
 
Third, resources and constraints must be managed. Registers, shared memory, constant memory, latency, threads, and multiprocessors are all factors to consider when using the GPU. Understanding how each of these affects your program helps you decide which ones to use in a given situation.