Avengers

2,640 bytes added, 15:27, 30 March 2019
=== Assignment 2 ===
For Assignment 2, we decided to parallelize the application selected by Bruno.
Profiling showed that the calculateDimensions() function dominates the run time: the flat profile attributes 97.67% of the execution time to it.
 
calculateDimensions() has three nested for loops, each of which sets the value of one side of the triangle. The innermost loop squares the two shorter sides and adds the results together. A condition then checks whether this sum equals the square of the hypotenuse, and the three values are printed when it does.
 
[[File:NestedLoops.PNG]]
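
The serial search described above can be sketched in C++ as follows. This is a minimal illustration rather than the project's actual code: the name findTriplesSerial, the Triple struct, and the loop bounds (a <= b < c) are assumptions, and the triples are collected in a vector instead of being printed so that the result can be checked.

```cpp
#include <vector>

// Hypothetical result type: one Pythagorean triple.
struct Triple { int a, b, c; };

// Assumed stand-in for the serial calculateDimensions(): returns every
// (a, b, c) with a <= b < c <= maxHypotenuse and a*a + b*b == c*c.
// Plain int arithmetic is enough for small limits; a wider type would be
// needed once c*c no longer fits in an int.
std::vector<Triple> findTriplesSerial(int maxHypotenuse) {
    std::vector<Triple> triples;
    for (int c = 1; c <= maxHypotenuse; ++c) {      // hypotenuse
        for (int b = 1; b < c; ++b) {               // longer leg
            for (int a = 1; a <= b; ++a) {          // shorter leg
                if (a * a + b * b == c * c) {
                    triples.push_back(Triple{a, b, c});
                }
            }
        }
    }
    return triples;
}
```

The work grows roughly with the cube of the maximum hypotenuse, which is consistent with this function dominating the profile.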
 
The nested for loops represent the serial way of calculating the dimensions. To parallelize this, we did the following:
 
1. Used the CUDA device properties to design the grid and blocks.
 
[[File:cudaDevProps.PNG]]
 
2. Adjusted the number of threads to be used in the grid depending on the value passed in by the user (the maximum hypotenuse value).
 
[[File:adjustNTPG.PNG]]
 
3. Allocated two arrays on the device, each initialized with the values from 1 to the maximum hypotenuse (given by the user as an argument):
* One array represents the hypotenuse.
* One array represents one of the other sides of the triangle.
* Both arrays were created using CUDA's Thrust library.
 
[[File:allocInit.PNG]]
 
4. Calculated the number of blocks required and iterated through the Thrust vector, passing each element in turn to a kernel launch along with the two previously allocated arrays.
 
[[File:NBandLaunch.PNG]]
 
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.
 
[[File:kernel.PNG]]
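
The steps above can be modelled on the host in plain C++ to make the index arithmetic explicit. This is only a sketch under stated assumptions, not the kernel shown in the screenshots: the names threadsPerBlock, blocksNeeded, launchModel and TripleHit are hypothetical, the thread-count rule is assumed to be a simple clamp, and each simulated thread is assumed to take one side value and scan the hypotenuse array. In real CUDA code the two outer loops would be replaced by a kernel launch, and the index would come from blockIdx.x * blockDim.x + threadIdx.x.

```cpp
#include <vector>

// Hypothetical result type: one triple found by a simulated thread.
struct TripleHit { int a, b, c; };

// Assumed rule for step 2: never use more threads per block than elements.
int threadsPerBlock(int n, int deviceMaxThreadsPerBlock) {
    return n < deviceMaxThreadsPerBlock ? n : deviceMaxThreadsPerBlock;
}

// Step 4: round up so that every element is covered by some thread.
int blocksNeeded(int n, int ntpb) {
    return (n + ntpb - 1) / ntpb;
}

// Host-side model of one launch for a fixed side `a` (assumed structure).
// Each (block, thread) pair stands for one GPU thread.
void launchModel(int a, const std::vector<int>& sides,
                 const std::vector<int>& hypotenuses, int ntpb,
                 std::vector<TripleHit>& out) {
    const int n = static_cast<int>(sides.size());
    const int nblks = blocksNeeded(n, ntpb);
    for (int block = 0; block < nblks; ++block) {
        for (int thread = 0; thread < ntpb; ++thread) {
            const int i = block * ntpb + thread;   // global thread index
            if (i >= n) continue;                  // guard for the last block
            const int b = sides[i];
            if (b < a) continue;                   // skip mirrored duplicates
            for (int c : hypotenuses) {
                if (a * a + b * b == c * c) {
                    out.push_back(TripleHit{a, b, c});
                }
            }
        }
    }
}
```

The bounds guard matters because the rounded-up block count usually creates more threads than there are elements.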
 
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.
 
[[File:Timings.PNG]]
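
One way to time two such functions with the same clock is a small helper built on std::chrono. This is a hedged sketch, not the timing code from the screenshot: timeMs is a hypothetical name. When timing CUDA code this way, the timed callable should wait for the device (for example with cudaDeviceSynchronize()) before returning, because kernel launches are asynchronous and the clock would otherwise stop too early.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename F>
double timeMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();   // for GPU work, this callable should synchronize before returning
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling the helper once with calculateSerial() and once with calculateCUDA() would give two figures measured in the same way.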
 
calculateSerial() contains the initial version of the application. It uses three nested for loops and takes a serial approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.
 
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.
 
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.
 
(Image)
 
=== Assignment 3 ===