From CDOT Wiki

Revision as of 20:22, 13 April 2018

Solo Act

Team Members

  1. Nick Simas, All of the things.

Progress

Assignment 1

Profile

For Assignment 1, I selected an open-source BSP dungeon generator project from GitHub.

https://github.com/DivineChili/BSP-Dungeon-Generator/

Dungeon bsp2.png

Dungeon bsp7.png

As the images above show, the purpose of this program is to generate a map image for game content. It does this by repeatedly splitting a 2D space with a binary space partitioning (BSP) algorithm.
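A minimal sketch of the splitting idea, assuming axis-aligned integer rectangles that are always cut across their longer side at a fixed ratio. The `Rect`, `splitRect`, and `bspSplit` names and the 0.5 default ratio are hypothetical and not taken from the BSP-Dungeon-Generator source, which may pick split positions differently (for example, randomly).

```cpp
#include <cassert>
#include <vector>

// Hypothetical axis-aligned rectangle in grid cells (not the project's type).
struct Rect {
    int x, y, w, h;
};

// Cut a rectangle across its longer side at `ratio` of that side.
// The two halves are written to a and b.
void splitRect(const Rect& r, double ratio, Rect& a, Rect& b) {
    if (r.w >= r.h) {
        int cut = static_cast<int>(r.w * ratio);  // vertical cut
        a = {r.x, r.y, cut, r.h};
        b = {r.x + cut, r.y, r.w - cut, r.h};
    } else {
        int cut = static_cast<int>(r.h * ratio);  // horizontal cut
        a = {r.x, r.y, r.w, cut};
        b = {r.x, r.y + cut, r.w, r.h - cut};
    }
}

// Recursively partition `r` to the given depth; leaves are appended to `out`.
void bspSplit(const Rect& r, int depth, std::vector<Rect>& out, double ratio = 0.5) {
    assert(depth >= 0);
    if (depth == 0) {
        out.push_back(r);  // leaf: a region a room could later be placed in
        return;
    }
    Rect a, b;
    splitRect(r, ratio, a, b);
    bspSplit(a, depth - 1, out, ratio);
    bspSplit(b, depth - 1, out, ratio);
}
```

For example, splitting a 64x32 area three times with `bspSplit(Rect{0, 0, 64, 32}, 3, leaves)` yields eight 16x16 leaf regions that exactly tile the original area.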

The project was written for Windows, so I initially profiled it with the built-in Visual Studio profiler.

NjsimasProfileresults.png

The above image shows the inclusive time percentage for each function. The two functions of interest account for 22% and 15% of inclusive time, respectively; the only entries higher are main and the library components used for printing. This indicates that most of the project's own processing occurs in these two functions, so both are hotspots that may benefit from parallelization.

Assignment 2

Parallelize

One immediate problem I noticed with this project was that the target functions were far too large: parallelizing them would likely require three or more kernels, which was beyond the scope of this assignment. Instead, I decided to focus on the BSP tree portion of the code and decouple it from the actual dungeon-generation logic.


Sourcedecouple.png


The decoupled source code is shown above, along with the timing logic in its main function.
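The timing code itself only appears as a screenshot. A minimal sketch of comparable timing logic, assuming `std::chrono::steady_clock` is used (the helper name `timeMs` is hypothetical, and the project's main function may measure time differently):

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` once and return the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename F>
double timeMs(F&& work) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    assert(elapsed.count() >= 0.0);  // steady_clock never goes backwards
    return elapsed.count();
}
```

The code under test is wrapped in a lambda, e.g. `double ms = timeMs([&] { /* call being measured */ });`. If a timer like this wraps a CUDA kernel launch, the lambda should also call `cudaDeviceSynchronize()`, because kernel launches return to the host before the kernel finishes.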


Mytreedia.png


Parallelizing a BSP tree required some analysis, shown above. The tree must be stored in memory according to some layout, so I had to decide how to arrange its nodes and leaves linearly. The image above shows how the tree structure corresponds to that linear arrangement in memory.
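The exact layout is given in the diagram. As a sketch, assuming a common breadth-first (heap-order) layout in which the root sits at index 0 and the children of node i sit at 2i+1 and 2i+2, the index arithmetic looks like this (all helper names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Breadth-first (heap-order) layout of a complete binary tree in a flat array.
// Assumption: this may differ in detail from the layout in the diagram.

// Indices of the left and right children of node i.
inline std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
inline std::size_t rightChild(std::size_t i) { return 2 * i + 2; }

// Index of the parent of node i; only valid for i > 0.
inline std::size_t parent(std::size_t i) {
    assert(i > 0 && "the root has no parent");
    return (i - 1) / 2;
}

// First index of level d (the root is level 0): 2^d - 1.
inline std::size_t levelStart(unsigned d) { return (std::size_t{1} << d) - 1; }

// Number of nodes on level d: 2^d.
inline std::size_t levelSize(unsigned d) { return std::size_t{1} << d; }

// Total nodes in a complete tree with levels 0..d: 2^(d+1) - 1.
inline std::size_t treeSize(unsigned d) { return levelStart(d + 1); }
```

A useful property of this layout is that every level occupies a contiguous block of indices, so the leaves produced in one round can be mapped to consecutive thread indices with `levelStart(d) + t`.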


Warpss.png


The image above shows the design I chose for warp and thread behavior. Each warp processes a single 'round' of the tree, using as many threads as there are leaves in that round.
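A serial C++ sketch of this round-by-round scheme, assuming a heap-order array where each node stores a rectangle and is split at the midpoint of its longer side. In round r, work item t (one per leaf of the previous round) reads one parent and writes its two children. On the GPU, the inner loop body would correspond to one thread and the outer loop to one synchronized step or kernel launch per round. `Rect`, `splitHalf`, and `buildTreeByRounds` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node payload: an axis-aligned rectangle.
struct Rect {
    int x, y, w, h;
};

// Split across the longer side at its midpoint (hypothetical rule).
static void splitHalf(const Rect& r, Rect& a, Rect& b) {
    if (r.w >= r.h) {
        int cut = r.w / 2;
        a = {r.x, r.y, cut, r.h};
        b = {r.x + cut, r.y, r.w - cut, r.h};
    } else {
        int cut = r.h / 2;
        a = {r.x, r.y, r.w, cut};
        b = {r.x, r.y + cut, r.w, r.h - cut};
    }
}

// Build a complete BSP tree with `rounds` splitting rounds, in heap order.
// Round r has 2^(r-1) independent work items; each reads only its own parent
// and writes only its own two children, so items in a round can run in parallel.
std::vector<Rect> buildTreeByRounds(const Rect& root, unsigned rounds) {
    std::vector<Rect> tree((std::size_t{1} << (rounds + 1)) - 1);
    tree[0] = root;
    for (unsigned r = 1; r <= rounds; ++r) {
        std::size_t parents = std::size_t{1} << (r - 1);  // leaves of round r-1
        std::size_t firstParent = parents - 1;            // start of level r-1
        for (std::size_t t = 0; t < parents; ++t) {       // one "thread" per leaf
            std::size_t p = firstParent + t;
            splitHalf(tree[p], tree[2 * p + 1], tree[2 * p + 2]);
        }
    }
    return tree;
}
```

Since a warp has 32 threads, once a round has more than 32 work items it necessarily spans several warps, so a real kernel would need to account for that when assigning rounds to warps.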


Kerneldecouple.png


The above image shows the kernel, along with the benchmarking logic in the main function.


Outputgood.png


The image above shows example output for the first five elements.


Njsimasgraph.png


Finally, the graphs above compare the performance of the decoupled serial function and the kernel. The original recursive function is faster up to the 600th element, where the two are equal; beyond that point, the parallel kernel outperforms the recursive function.
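To illustrate what the recursive baseline does, here is a sketch of a depth-first fill of a heap-order array, assuming the same midpoint split rule (the names and layout are hypothetical, not the project's code). It performs 2^rounds - 1 splits one after another, whereas a level-by-level kernel needs only `rounds` sequential steps because the splits within a round are independent. One plausible reason the serial version still wins for small inputs is that kernel launches and host-device copies add a fixed overhead that only pays off once there is enough work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node payload: an axis-aligned rectangle.
struct Rect {
    int x, y, w, h;
};

// Split across the longer side at its midpoint (hypothetical rule).
static void splitHalf(const Rect& r, Rect& a, Rect& b) {
    if (r.w >= r.h) {
        int cut = r.w / 2;
        a = {r.x, r.y, cut, r.h};
        b = {r.x + cut, r.y, r.w - cut, r.h};
    } else {
        int cut = r.h / 2;
        a = {r.x, r.y, r.w, cut};
        b = {r.x, r.y + cut, r.w, r.h - cut};
    }
}

// Depth-first recursion: split node i into 2i+1 and 2i+2, then recurse into
// each child. `splits` counts the sequential split steps performed.
static void fillRecursive(std::vector<Rect>& tree, std::size_t i,
                          unsigned depthLeft, std::size_t& splits) {
    if (depthLeft == 0) return;
    splitHalf(tree[i], tree[2 * i + 1], tree[2 * i + 2]);
    ++splits;
    fillRecursive(tree, 2 * i + 1, depthLeft - 1, splits);
    fillRecursive(tree, 2 * i + 2, depthLeft - 1, splits);
}

// Build the whole tree recursively; `splits` reports the number of splits.
std::vector<Rect> buildTreeRecursive(const Rect& root, unsigned rounds, std::size_t& splits) {
    std::vector<Rect> tree((std::size_t{1} << (rounds + 1)) - 1);
    tree[0] = root;
    splits = 0;
    fillRecursive(tree, 0, rounds, splits);
    return tree;
}
```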

Assignment 3

Optimize


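Divergence.png

The first performance consideration for this project was thread divergence. As the image above (from the course notes) shows, the way a BSP tree is processed is similar to the reduction example in the course notes.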
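Thread divergence happens when threads in the same warp disagree on a branch, so some lanes sit masked off while others do work. The sketch below counts, for every step of a block-level reduction, how many warps contain both active and inactive threads. It assumes the two addressing patterns commonly used in reduction examples (interleaved `t % (2 * s) == 0` versus contiguous `t < s`); whether these match the exact kernels in the course notes is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Count warps in a block whose threads disagree on `active`: some lanes take
// the branch while others are masked off, which wastes part of the warp.
int divergentWarps(int blockSize, int warpSize, const std::function<bool(int)>& active) {
    int count = 0;
    for (int w = 0; w < blockSize; w += warpSize) {
        int lanes = (w + warpSize <= blockSize) ? warpSize : blockSize - w;
        int on = 0;
        for (int t = w; t < w + lanes; ++t)
            if (active(t)) ++on;
        if (on > 0 && on < lanes) ++count;
    }
    return count;
}

// Interleaved addressing: at stride s (doubling), thread t works if t % (2s) == 0.
int totalDivergenceInterleaved(int blockSize, int warpSize) {
    int total = 0;
    for (int s = 1; s < blockSize; s *= 2)
        total += divergentWarps(blockSize, warpSize, [s](int t) { return t % (2 * s) == 0; });
    return total;
}

// Contiguous addressing: at stride s (halving), thread t works if t < s.
int totalDivergenceContiguous(int blockSize, int warpSize) {
    int total = 0;
    for (int s = blockSize / 2; s > 0; s /= 2)
        total += divergentWarps(blockSize, warpSize, [s](int t) { return t < s; });
    return total;
}
```

For a 256-thread block with 32-thread warps, the interleaved pattern produces 47 divergent warp-steps, while the contiguous pattern produces only 5 (the final steps, when fewer than 32 threads remain). Keeping the active threads of each step packed into the lowest indices is what removes most of the divergence.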




Cudasplitblock.png



Outputclobber.png



Fig04-z-curve.png



Fig06-numbering.png