Changes


BetaT

748 bytes added, 19:10, 12 April 2017

no edit summary

}

==== HOW THE ALGORITHM WORKS ====

This section focuses on the algorithm inside the CALCULATE Kernel only.

1. We begin with two arrays.

[[File:2Arrazs.png]]

2. The first column of the First array is initialized by the INITIALIZE Kernel.

[[File:Initialize.png]]

3. The Second array copies the values from the first column of the First array.

[[File:Copy1stColumn.png]]

4. The First array copies a single value from the Second array.

[[File:2ndCall.png]]

5. The remaining values for the 2nd column of the First array are calculated through the Second array as follows.

[[File:3rdCall.png]]

6. The 2nd column of the First array is now copied into the 2nd column of the Second array, and the cycle is repeated until finished.

[[File:LAstReset.png]]
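The six steps above can be sketched on the CPU as follows. This is only a minimal illustration, not the project's CALCULATE kernel: the names `Grid`, `initialize`, and `calculate` are hypothetical, the initial condition is made up, and the update formula in step 5 is an assumed upwind-style difference, since the formula itself is not given here. Which single value is copied in step 4 is also an assumption.

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;  // grid[row][column]

// Step 2: fill the first column of the First array.
// ASSUMPTION: the step-shaped initial condition is purely illustrative.
void initialize(Grid& first) {
    const std::size_t rows = first.size();
    for (std::size_t i = 0; i < rows; ++i)
        first[i][0] = (i >= rows / 4 && i <= rows / 2) ? 2.0f : 1.0f;
}

// Steps 3 to 6, repeated for every remaining column.
// Both grids must have the same shape, with at least 2 rows and 1 column.
void calculate(Grid& first, Grid& second, float c) {
    const std::size_t rows = first.size();
    const std::size_t cols = first[0].size();

    // Step 3: the Second array copies the first column of the First array.
    for (std::size_t i = 0; i < rows; ++i)
        second[i][0] = first[i][0];

    for (std::size_t j = 1; j < cols; ++j) {
        // Step 4: the First array copies a single value from the Second array.
        // ASSUMPTION: the top cell takes the second cell of the previous column.
        first[0][j] = second[1][j - 1];

        // Step 5: the rest of column j is calculated through the Second array.
        // ASSUMPTION: hypothetical upwind-style update with coefficient c.
        for (std::size_t i = 1; i < rows; ++i)
            first[i][j] = second[i][j - 1]
                        - c * (second[i][j - 1] - second[i - 1][j - 1]);

        // Step 6: column j of the First array is copied into the Second array.
        for (std::size_t i = 0; i < rows; ++i)
            second[i][j] = first[i][j];
    }
}
```

In this sketch the Second array only ever holds copies of columns that the First array already contains, which is worth keeping in mind when reading the optimization sections.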

== CPU VS GPU Loop Comparisons Only ==

== THIRD OPTIMIZATION ==

=== SAVING TRAVEL COSTS BY REMOVING THE UNNECESSARY ARRAY ===

As we discovered above, the second array is unnecessary once all the calculations are performed in Shared Memory (see section 3.3.2). Removing it lets us further optimize our Kernel by reducing the time spent transferring data across the PCI bus. The image below shows the data transfer times for the CALCULATE kernel.

[[File:OPTIMIZATIONCOMPARISON.png]]
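To illustrate why a single array can be enough, here is a minimal CPU sketch in which each new column is computed from the previous column of the same array. It is not the project's kernel: `calculateSingleArray` and `transferBytes` are hypothetical names, the update formula is an assumed upwind-style difference, and the local buffer `prev` merely stands in for the Shared Memory staging that the GPU version would use.

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;  // grid[row][column]

// Single-array version: column j is computed from column j-1 of the same grid.
// The grid must have at least 2 rows and 1 column, with column 0 already set.
void calculateSingleArray(Grid& u, float c) {
    const std::size_t rows = u.size();
    const std::size_t cols = u[0].size();
    std::vector<float> prev(rows);  // stand-in for a shared-memory buffer

    for (std::size_t j = 1; j < cols; ++j) {
        // Stage the previous column once, instead of keeping a second grid.
        for (std::size_t i = 0; i < rows; ++i)
            prev[i] = u[i][j - 1];

        // ASSUMPTION: the top cell takes the second cell of the previous column.
        u[0][j] = prev[1];

        // ASSUMPTION: hypothetical upwind-style update with coefficient c.
        for (std::size_t i = 1; i < rows; ++i)
            u[i][j] = prev[i] - c * (prev[i] - prev[i - 1]);
    }
}

// Bytes moved across the bus, in one direction, for `arrays` float grids.
std::size_t transferBytes(std::size_t rows, std::size_t cols, std::size_t arrays) {
    return arrays * rows * cols * sizeof(float);
}
```

Under these assumptions, `transferBytes(rows, cols, 1)` is half of `transferBytes(rows, cols, 2)`, which is the kind of saving this optimization aims for. The measured times in the image above remain the authoritative numbers.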

= CONCLUSIONS =

== OVERALL TIME COMPARISONS ==

Below are the final comparisons of all execution times between the CPU and GPU.

All times are in milliseconds.

[[File:finalCompare.png]]

== APPLICATION OUTPUT ==

When the application completes, it writes the output of the algorithm to a file. The image below compares the output of the original program with that of the parallelized program.

[[File:outputs.png]]

== FINAL THOUGHTS ==

Upon completing this project, I learned a few things:

First, I learned that not every program can be parallelized, even if it seems to be a good candidate to begin with.

Secondly, understanding the algorithm of the application is key to optimizing the solution. Sometimes you need to rearrange the code to obtain better performance from the GPU, and understanding the algorithm helps ensure that the output of the program remains the same.

Thirdly, resources and constraints must be managed: registers, shared memory, constant memory, latency, threads, and multiprocessors are all factors that need to be considered when using the GPU. Understanding how these resources affect your program helps you decide which ones to use in specific situations.

Jadach1


CDOT Wiki β
