Revision as of 20:22, 13 April 2018

Solo Act

Team Members

Nick Simas, All of the things.

Progress

Assignment 1

Profile

For assignment 1, I selected an open-source dungeon generator project from GitHub.

https://github.com/DivineChili/BSP-Dungeon-Generator/

As you can see from the above images, the purpose of this program is to generate a map image for game content. The program achieves this by repeatedly splitting a 2D space using a binary space partitioning (BSP) algorithm.
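To make the splitting idea concrete, here is a minimal C++ sketch of recursive binary space partitioning. It is not taken from the linked project: the `Rect` type, the `splitSpace` function, and the `minSize` and `depth` parameters are all hypothetical names chosen for illustration, and the rule of always cutting across the longer side is an assumption.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical rectangle type (not from the original project).
struct Rect { int x, y, w, h; };

// Recursively split `area` in two until the depth budget is used up or the
// longer side is too short to yield two pieces of at least `minSize`.
// Finished pieces are appended to `leaves`.
void splitSpace(const Rect& area, int minSize, int depth,
                std::mt19937& rng, std::vector<Rect>& leaves) {
    assert(minSize > 0);
    const bool splitVertically = area.w >= area.h;  // assumed rule: cut the longer side
    const int length = splitVertically ? area.w : area.h;
    if (depth == 0 || length < 2 * minSize) {
        leaves.push_back(area);
        return;
    }
    // Pick a cut position that leaves at least minSize on both sides.
    std::uniform_int_distribution<int> cut(minSize, length - minSize);
    const int c = cut(rng);
    Rect first = area;
    Rect second = area;
    if (splitVertically) {
        first.w = c;
        second.x = area.x + c;
        second.w = area.w - c;
    } else {
        first.h = c;
        second.y = area.y + c;
        second.h = area.h - c;
    }
    splitSpace(first, minSize, depth - 1, rng, leaves);
    splitSpace(second, minSize, depth - 1, rng, leaves);
}
```

Calling `splitSpace(Rect{0, 0, 64, 48}, 4, 4, rng, leaves)` with a seeded `std::mt19937` fills `leaves` with at most 16 rectangles that together cover the original area. A dungeon generator would then place rooms inside those leaves.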

The project was written for Windows, so I decided to profile initially with the built-in Visual Studio profiler.

The above image shows each function's inclusive time as a percentage. The results are 22% and 15% respectively. The only entries higher are Main and the library components used for printing. This indicates that a large share of the program's own processing time is spent in these two functions. Both are therefore hotspots and may benefit from parallelization.

Assignment 2

Parallelize

One of the immediate problems I noticed with this project was that the target functions were far too large. Parallelizing them would require possibly three or more kernels, which was beyond the scope of this assignment. Instead, I decided to focus on the BSP tree portion of the code and decouple it from the actual dungeon-generating logic.

The decoupled source code can be seen above, along with the timing logic in its respective main function.
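The original timing code is only shown as an image. As a rough sketch of how such timing might look, the following hypothetical helper `elapsedMilliseconds` (an illustrative name, not the author's) measures one call with `std::chrono::steady_clock`.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is used because it never jumps backwards.
template <typename F>
double elapsedMilliseconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The helper accepts any callable, for example `elapsedMilliseconds([&] { buildTree(n); })`, where `buildTree` stands in for whichever function is being measured. The same helper can then wrap both the recursive version and the kernel launch.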

Parallelizing a BSP tree required some analysis, which can be seen above. The tree itself must be stored in memory according to some design. The way I decided to organize the leaves and nodes of the tree, in a linear context, can be seen below. The above image shows how the tree design corresponds to the linear arrangement in memory.
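The exact layout is only shown in the images. One common way to store a complete binary tree in a flat array is the heap-style arrangement sketched below. This is an assumption, not necessarily the design used here, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed heap-style layout: the root sits at index 0 and each round
// (tree level) is stored contiguously, directly after the previous one.
constexpr std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
constexpr std::size_t rightChild(std::size_t i) { return 2 * i + 2; }

// Parent of node i; the root (index 0) has no parent.
inline std::size_t parent(std::size_t i) {
    assert(i > 0);
    return (i - 1) / 2;
}

// Index of the first node in round r (round 0 is the root).
constexpr std::size_t roundStart(unsigned r) { return (std::size_t{1} << r) - 1; }

// Number of nodes in round r.
constexpr std::size_t roundSize(unsigned r) { return std::size_t{1} << r; }

// Total number of nodes in a complete tree with the given number of rounds.
constexpr std::size_t nodeCount(unsigned rounds) { return (std::size_t{1} << rounds) - 1; }
```

With this layout, round 3 occupies indices 7 through 14, and the children of the first node of one round are the first two nodes of the next round. Each round can therefore be addressed as one contiguous slice of the array.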

The image above reflects the design I chose with respect to warp and thread behavior. Each warp processes a single 'round' of the tree, using as many threads as there are leaves in that 'round'.
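The round-by-round idea can be sketched on the CPU in plain C++. In the sketch below, each iteration of the inner loop stands in for one GPU thread. It assumes a heap-style array layout and a simple midpoint split, neither of which is confirmed by the source, and `Rect` and `buildByRounds` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rectangle type (not from the original project).
struct Rect { int x, y, w, h; };

// Builds a complete BSP tree with `rounds` levels in a flat array,
// processing one round (tree level) at a time.
std::vector<Rect> buildByRounds(const Rect& root, unsigned rounds) {
    assert(rounds >= 1 && rounds < 31);
    std::vector<Rect> tree((std::size_t{1} << rounds) - 1);
    tree[0] = root;
    for (unsigned r = 0; r + 1 < rounds; ++r) {
        const std::size_t start = (std::size_t{1} << r) - 1;  // first node of round r
        const std::size_t count = std::size_t{1} << r;        // nodes in round r
        // Each iteration reads only its own node and writes only its own two
        // children, so iterations within a round are independent of each other.
        // On a GPU, `t` would be the thread index.
        for (std::size_t t = 0; t < count; ++t) {
            const std::size_t i = start + t;
            const Rect p = tree[i];
            Rect a = p;
            Rect b = p;
            if (p.w >= p.h) {           // assumed rule: halve the longer side
                a.w = p.w / 2;
                b.x = p.x + a.w;
                b.w = p.w - a.w;
            } else {
                a.h = p.h / 2;
                b.y = p.y + a.h;
                b.h = p.h - a.h;
            }
            tree[2 * i + 1] = a;
            tree[2 * i + 2] = b;
        }
    }
    return tree;
}
```

Because a round depends only on the round before it, the outer loop must stay sequential while the inner loop is the part that can be spread across threads. This also suggests why the parallel version only pays off for larger trees: early rounds contain very few nodes.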

The above image shows the kernel with the benchmarking logic in the main function.

This next image shows an output example for the first five elements.

Finally, the graphs above compare the performance of the first, decoupled function with that of the kernel. As you can see, the original, recursive function is faster up to the 600th element, where the two are equal. Beyond that point, the parallel kernel outperforms the recursive function.

Assignment 3

Optimize

The first performance consideration for this project was thread divergence. As you can see in the above image, from the course notes, the way a BSP tree is processed is similar to the reduction example in the course notes.

[[File:Cudasplitblock.png|500px]]
[[File:Fig04-z-curve.png|500px]]
[[File:Fig06-numbering.png|500px]]
