https://wiki.cdot.senecacollege.ca/w/api.php?action=feedcontributions&user=Jsidhu26&feedformat=atom
CDOT Wiki - User contributions [en] - 2024-03-28T12:43:47Z - MediaWiki 1.30.0
https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138839
Avengers - 2019-04-06T05:56:44Z
<p>Jsidhu26: /* Other Optimizations */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm calculates all Pythagorean triples (not only primitive ones) up to a user-supplied bound. It needs some logical improvements to display only values within the requested range; these should not be hard to implement. Even so, it serves its purpose: it calculates every Pythagorean triple from the simplest up to the maximum hypotenuse value given by the user. The code has been slightly modified to receive command-line parameters, facilitating its execution in UNIX environments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once, is responsible for almost all of the execution time. For this profiling run the program calculated all triples from the most primitive (3, 4, 5) up to a maximum hypotenuse value of 1500. With its three nested for loops the function is O(n^3), demanding a lot of computational power to generate every possible triple. <br />
<br />
Since every loop iteration executes the same instructions, I believe parallelizing this code should be straightforward and would make it a lot faster. The calculations could be divided into tasks, or offloaded in order, since one calculation does not explicitly depend on another. <br />
<br />
For further utility, the results could be sorted before display, since the GPU will return them in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to write a program that creates and populates a 2-dimensional array of size n (provided by the user) with random numbers. In this case, the function applied to each element of the array is 'std::rand()/100'. The code is available at the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
Profiling the program with '''gprof''' generates a call graph with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph, it is evident that the generateRandom() function is the hotspot: it accounts for 100% of the execution time. The function consists of two for loops, one nested inside the other, giving it a complexity of O(n^2).<br />
<br />
The computation for each element of the array is independent of all the other elements, which makes this function a strong candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays, with a process assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
The function that took up a significant amount of time was calculateDimensions(); the flat profile indicates that it accounts for 97.67% of the execution time.<br />
<br />
==== Identifying Parallelizable Code ====<br />
<br />
calculateDimensions() has 3 nested for loops, each of which sets the value of one of the triangle's sides. The innermost loop squares the two shorter sides and adds the results together, then checks whether that sum is equal to the square of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Used CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjusted the number of threads used in the grid according to the value passed in by the user (the maximum hypotenuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on the device, each initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done using CUDA's Thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the Thrust vector, passing each element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple; when a triple is found, its values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial and parallel versions, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both functions was timed to see which was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
=== Assignment 3 ===<br />
==== Using Shared Memory ====<br />
To optimize our code, we used shared memory inside the kernel. Allocating statically sized shared-memory arrays in the kernel requires the number of threads per block to be a compile-time constant, which means it cannot be calculated at run time. Instead, we set the number of threads per block to 1024 and declared it as a constant at the beginning of the application. This allowed us to use shared memory inside the kernel and optimize our application. <br />
<br />
Below is a picture of the optimized kernel:<br />
<br />
[[File:QuickerKernel.PNG]]<br />
<br />
<br />
==== Results ====<br />
<br />
On average, across the 7 different problem sizes, the run time was reduced by approximately 216 milliseconds.<br />
<br />
Below is a graph that illustrates the time saved on all 7 different problem sizes with the optimized kernel:<br />
<br />
[[File:OptimizationGraph.PNG]]<br />
<br />
==== Other Optimizations ====<br />
<br />
Another optimization would be to improve the original algorithm itself. The pow() function is very expensive because it is implemented in software. A better alternative is to rely on hardware arithmetic: it is quicker to multiply a value by itself with the (*) operator than to raise it to the power of 2.<br />
<br />
[[File:OptimizedAlgorithm.PNG]]<br />
<br />
After modifying the code to reflect this change, both the GPU and CPU methods are roughly equal to each other (in terms of execution time) when the value of n is 800.</div>
<hr />
<div>A graph that reflects the time taken for each problem size after optimizing the original algorithm</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138837Avengers2019-04-06T05:51:19Z<p>Jsidhu26: /* Other Optimizations */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. In this case, the function that is applied to each element in the 2-dimensional array is 'std::rand()/100'. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot as it is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
=== Assignment 3 ===<br />
==== Using Shared Memory ====<br />
To optimize our code, we used shared memory inside the kernel. For our purposes, allocating arrays in the kernel using shared memory required a constant value for the number of threads per block. This meant that the number of threads per block could not be calculated at run time. Instead, we set the number of threads per block to 1024 and declared it as a constant in the beginning of the application. This allowed us to use shared memory inside the kernel and optimize our application. <br />
<br />
Below is a picture of the optimized kernel:<br />
<br />
[[File:QuickerKernel.PNG]]<br />
<br />
<br />
==== Results ====<br />
<br />
On average, the run time for all 7 different problem sizes was reduced by approximately 216 milliseconds.<br />
<br />
Below is a graph that illustrates the time saved on all 7 different problem sizes with the optimized kernel:<br />
<br />
[[File:OptimizationGraph.PNG]]<br />
<br />
==== Other Optimizations ====<br />
<br />
Another optimization would be to optimize the original algorithm itself. The pow() function itself very expensive because it is software implemented. A better alternative would be to use hardware functions. In this case, it would be quicker to use the (*) operator and multiply the values with themselves as opposed to setting them to the power of 2.<br />
<br />
(Image)<br />
<br />
After modifying the code to reflect this change, both the GPU and CPU methods are roughly equal to each other (in terms of execution time) when the value of n is 800.</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138836Avengers2019-04-06T05:43:27Z<p>Jsidhu26: /* Results */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. In this case, the function that is applied to each element in the 2-dimensional array is 'std::rand()/100'. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot as it is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
=== Assignment 3 ===<br />
==== Using Shared Memory ====<br />
To optimize our code, we used shared memory inside the kernel. For our purposes, allocating arrays in the kernel using shared memory required a constant value for the number of threads per block. This meant that the number of threads per block could not be calculated at run time. Instead, we set the number of threads per block to 1024 and declared it as a constant in the beginning of the application. This allowed us to use shared memory inside the kernel and optimize our application. <br />
<br />
Below is a picture of the optimized kernel:<br />
<br />
[[File:QuickerKernel.PNG]]<br />
<br />
<br />
==== Results ====<br />
<br />
On average, the run time for all 7 different problem sizes was reduced by approximately 216 milliseconds.<br />
<br />
Below is a graph that illustrates the time saved on all 7 different problem sizes with the optimized kernel:<br />
<br />
[[File:OptimizationGraph.PNG]]<br />
<br />
==== Other Optimizations ====<br />
<br />
Another optimization would be to optimize the original algorithm itself. The pow() function itself very expensive because it is software implemented. A better alternative would be to use hardware functions. In this case, it would be quicker to use the (*) operator and multiply the values with themselves as opposed to setting them to the power of 2.<br />
<br />
After modifying the code to reflect this change, both the GPU and CPU methods are roughly equal to each other (in terms of execution time) when the value of n is 800.</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138671Avengers2019-04-01T00:32:59Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
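The serial search shown in the image above can be sketched as follows. This is a hedged reconstruction, not the actual project code: the function name findTriples() and the exact loop bounds are illustrative, chosen so each triple is found once.<br />

```cpp
#include <array>
#include <vector>

// Sketch of the serial triple-nested search: for every candidate
// (a, b, c) up to a maximum hypotenuse, test a^2 + b^2 == c^2.
std::vector<std::array<int, 3>> findTriples(int maxHyp) {
    std::vector<std::array<int, 3>> triples;
    for (int c = 1; c <= maxHyp; c++)          // hypotenuse
        for (int b = 1; b < c; b++)            // longer leg
            for (int a = 1; a <= b; a++)       // shorter leg
                if (a * a + b * b == c * c)    // Pythagorean condition
                    triples.push_back({a, b, c});
    return triples;
}
```

Every iteration evaluates the same independent predicate, which is why the loop nest maps naturally onto GPU threads.<br />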
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Used CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjusted the number of threads used in the grid based on the value passed in by the user (the maximum hypotenuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on the device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done using CUDA's Thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
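The block count in step 4 is the usual ceiling division of the problem size by the threads per block, so that every element gets a thread. A minimal sketch of that arithmetic (the function name blocksFor() is illustrative):<br />

```cpp
// Round the problem size up to a whole number of blocks so that
// n threads are always covered; surplus threads in the last block
// simply do no work.
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock; // ceiling division
}
```

For example, with 1024 threads per block, a maximum hypotenuse of 1500 needs 2 blocks.<br />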
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
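The per-thread check inside the kernel can be sketched on the CPU as follows. This is not the actual kernel code shown above: the name matchingSide() and its bounds are illustrative, standing in for the work one thread (selecting one candidate side a for a given hypotenuse c) would do.<br />

```cpp
// CPU sketch of one kernel thread's work: given hypotenuse c and
// candidate shorter side a, scan the remaining side b and return
// the b that completes a Pythagorean triple, or 0 if none exists.
int matchingSide(int a, int c) {
    for (int b = a; b < c; b++)
        if (a * a + b * b == c * c)
            return b;
    return 0;
}
```

On the device, the found triple would be printed directly from the kernel, as described in step 5.<br />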
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to contain 2 functions: calculateCUDA() and calculateSerial(). Both functions were timed to determine which was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
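The wall-clock comparison used for the two functions can be sketched with std::chrono. This is a generic sketch, not the project's timing code from the image above; the helper name timeMs() is illustrative.<br />

```cpp
#include <chrono>

// Time an arbitrary callable in milliseconds, the same kind of
// wall-clock measurement used to compare calculateSerial() and
// calculateCUDA().
template <typename F>
long long timeMs(F f) {
    auto start = std::chrono::steady_clock::now();
    f();                                    // run the work being timed
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
        .count();
}
```

Each version is passed to the timer and its elapsed time printed after execution.<br />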
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
=== Assignment 3 ===<br />
<br />
To optimize our code, we used shared memory inside the kernel. For our purposes, allocating shared-memory arrays in the kernel required the number of threads per block to be a compile-time constant, so it could no longer be computed at run time. We therefore fixed the number of threads per block at 1024 and declared it as a constant at the beginning of the application, which allowed us to use shared memory inside the kernel and optimize the application.<br />
<br />
Below is a picture of the optimized kernel:<br />
<br />
[[File:QuickerKernel.PNG]]<br />
<br />
<br />
On average, the run time for all 7 different problem sizes was reduced by approximately 216 milliseconds.<br />
<br />
Below is a graph that illustrates the time saved on all 7 different problem sizes with the optimized kernel:<br />
<br />
[[File:OptimizationGraph.PNG]]</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138657Avengers2019-03-31T08:42:57Z<p>Jsidhu26: /* Assignment 3 */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
=== Assignment 3 ===<br />
<br />
To optimize our code, we used shared memory inside the kernel. On average, the run time for all 7 different problem sizes was reduced by approximately 216 milliseconds.<br />
<br />
Below is a picture of the optimized kernel:<br />
<br />
[[File:QuickerKernel.PNG]]<br />
<br />
<br />
Below is a graph that illustrates the time saved on all 7 different problem sizes with the optimized kernel:<br />
<br />
[[File:OptimizationGraph.PNG]]</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:OptimizationGraph.PNG&diff=138656File:OptimizationGraph.PNG2019-03-31T08:41:09Z<p>Jsidhu26: this graph illustrates the optimization for each problem size</p>
<hr />
<div>this graph illustrates the optimization for each problem size</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138655Avengers2019-03-31T08:30:50Z<p>Jsidhu26: /* Results */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
=== Assignment 3 ===<br />
<br />
To optimize our code, we used shared memory inside the kernel. This reduced the run time for each problem size</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:TimeComparison.PNG&diff=138653File:TimeComparison.PNG2019-03-31T08:29:14Z<p>Jsidhu26: Jsidhu26 uploaded a new version of File:TimeComparison.PNG</p>
<hr />
<div>A graph that shows the timings for both serial and parallel approaches to finding Pythagorean triples</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:TimeComparison.PNG&diff=138654File:TimeComparison.PNG2019-03-31T08:29:04Z<p>Jsidhu26: Jsidhu26 uploaded a new version of File:TimeComparison.PNG</p>
<hr />
<div>A graph that shows the timings for both serial and parallel approaches to finding Pythagorean triples</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:TimeComparison.PNG&diff=138652File:TimeComparison.PNG2019-03-31T08:25:02Z<p>Jsidhu26: Jsidhu26 uploaded a new version of File:TimeComparison.PNG</p>
<hr />
<div>A graph that shows the timings for both serial and parallel approaches to finding Pythagorean triples</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138651Avengers2019-03-31T08:23:02Z<p>Jsidhu26: /* Assignment 2 & 3 */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
<br />
=== Assignment 3 ===<br />
<br />
To optimize our code, we used shared memory inside the kernel. This reduced the run time for each problem size</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:TimeComparison.PNG&diff=138650File:TimeComparison.PNG2019-03-31T08:22:27Z<p>Jsidhu26: Jsidhu26 uploaded a new version of File:TimeComparison.PNG</p>
<hr />
<div>A graph that shows the timings for both serial and parallel approaches to finding Pythagorean triples</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:QuickerKernel.PNG&diff=138649File:QuickerKernel.PNG2019-03-31T08:10:31Z<p>Jsidhu26: using shared memory, the kernel's timings are optimized and printing from the kernel still works properly</p>
<hr />
<div>using shared memory, the kernel's timings are optimized and printing from the kernel still works properly</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138648Avengers2019-03-31T06:54:33Z<p>Jsidhu26: /* Assignment 3 */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triples (not just primitive ones) up to a maximum hypotenuse value. It still needs some logical improvement to display only values within a chosen starting range; those improvements should not be hard to implement, but as written it already serves its purpose of calculating every Pythagorean triple from the simplest up to the maximum hypotenuse given by the user. The code has been slightly modified to accept command-line parameters, facilitating its execution in UNIX environments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once, is responsible for almost all of the execution time. For this profiling run, the program calculated all triples from the most primitive (3, 4, 5) up to every triple with a hypotenuse of at most 1500. The function is O(n^3) because of its three nested for loops, which enumerate all possible values for each triple, so generating all of them demands a lot of computational power. <br />
<br />
Since every loop iteration executes the same instructions, I believe parallelizing this code should be straightforward and would make it a lot faster. The calculations could be divided into tasks, or offloaded in order, since no calculation explicitly depends on another. <br />
<br />
For further utility, the results could be sorted before display, since the GPU will print the values in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to write a program that creates and populates a 2-dimensional array of size n (provided by the user) with random numbers. The code is available at the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After profiling my program with '''gprof''', the following call graph was generated:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph, it is evident that the generateRandom() function is the hotspot: it accounts for 100% of the execution time. The function consists of two for loops, one nested inside the other, giving it a complexity of O(n^2).<br />
<br />
The computation for each element in the array is independent of the rest of the elements, which makes this function a strong candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays, with a process assigned to each sub-array.<br />
<br />
=== Assignment 2 & 3 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelizable Code ====<br />
<br />
calculateDimensions() has three nested for loops. Each loop sets the value of one of the triangle's sides. The innermost loop squares the two shorter sides and adds the results together; a condition then checks whether this sum equals the square of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Used CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjusted the number of threads used in the grid according to the value passed in by the user (the maximum hypotenuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated two arrays on the device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required, then iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138647Avengers2019-03-31T06:54:04Z<p>Jsidhu26: /* Assignment 2 */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 & 3 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
[[File:TimeComparison.PNG]]<br />
<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:TimeComparison.PNG&diff=138646File:TimeComparison.PNG2019-03-31T06:53:40Z<p>Jsidhu26: A graph that shows the timings for both serial and parallel approaches to finding Pythagorean triples</p>
<hr />
<div>A graph that shows the timings for both serial and parallel approaches to finding Pythagorean triples</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138642Avengers2019-03-30T19:41:24Z<p>Jsidhu26: /* Assignment 2 */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
(Image)<br />
<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138641Avengers2019-03-30T19:39:16Z<p>Jsidhu26: /* Team Avengers */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
<br />
==== Offloading Process ====<br />
<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (max hyptonuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
<br />
3. Allocated 2 arrays on device, initialized from 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
<br />
4. Calculated the number of blocks required and iterated through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
<br />
calculateSerial() contains the initial version of the application. It has 3 nested for loops and has a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
(Image)<br />
<br />
<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=138640Avengers2019-03-30T19:36:45Z<p>Jsidhu26: /* Progress */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triples (not just primitive ones) starting from initial values for the sides of a triangle. The algorithm needs some logical improvements to display only values from the starting range; these should not be hard to implement, and it still serves its purpose of calculating all Pythagorean triples from the simplest up to the maximum hypotenuse given by the user. The code has been slightly modified to receive command-line parameters, facilitating its execution in UNIX environments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once, is responsible for almost all of the execution time. For this profiling run the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy a maximum hypotenuse value of 1500. This function is O(n^3) because of its three nested for loops, which try every possible value for each triple, demanding a lot of computational power. <br />
<br />
Since every loop iteration executes the same instructions, I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided into tasks, or the offload could be done in order, since one calculation does not depend explicitly on another. <br />
<br />
For further utility, the results could be sorted for later display, since the GPU will produce values in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates and populates a 2-dimensional array of size n (provided by the user) with random numbers. The code is available at the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, the following call graph is generated:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph, it is evident that the generateRandom() function is the hotspot: it accounts for 100% of the execution time. The function consists of 2 for loops, one nested in the other, giving it a Big-O complexity of O(n^2).<br />
<br />
The computations involved with each element of the array are independent of the rest of the elements, making this function a strong candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays, with a process assigned to each sub-array.<br />
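The hotspot described above can be sketched as follows (a hypothetical reconstruction from the description; the value range and names are assumed, not taken from the actual source):<br />

```cpp
#include <cstdlib>
#include <vector>

// Sketch of the generateRandom() hotspot: fill an n x n array with random
// numbers using two nested loops, hence O(n^2). Each element is independent
// of the others, which is what makes the loop parallelizable.
std::vector<std::vector<int>> generateRandom(int n) {
    std::vector<std::vector<int>> a(n, std::vector<int>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i][j] = std::rand() % 100;  // range assumed for illustration
    return a;
}
```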
<br />
=== Assignment 2 ===<br />
For Assignment 2, we decided to parallelize the application selected by Bruno.<br />
In the code, the function that took up a significant amount of time was the calculateDimensions() function. The flat profile indicates that this function takes 97.67% of the execution time.<br />
<br />
==== Identifying Parallelize-able Code ====<br />
calculateDimensions() has 3 nested for loops. Each for loop is used to set the value of one of the triangle sides. The inner-most for loop compares the two shorter sides of the triangle by first squaring them and then adding the squared results together. A condition is used to check if the sum of the squared side values is equivalent to the squared value of the hypotenuse. The results are printed when the condition is true.<br />
<br />
[[File:NestedLoops.PNG]]<br />
<br />
The nested for loops represent the serial way of calculating the dimensions. <br />
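A minimal sketch of that loop structure (simplified for illustration; the real calculateDimensions works on doubles, carries extra parameters, and prints via printDimensions as shown in the screenshot):<br />

```cpp
// Simplified sketch of the serial triple-nested loop: try every (a, b, c)
// with a <= b <= c up to maxHypotenuse and count the Pythagorean triples.
// The real function prints each match instead of counting.
int countTriplesSerial(int maxHypotenuse) {
    int count = 0;
    for (int c = 1; c <= maxHypotenuse; ++c)      // hypotenuse
        for (int b = 1; b <= c; ++b)              // longer leg
            for (int a = 1; a <= b; ++a)          // shorter leg
                if (a * a + b * b == c * c)
                    ++count;
    return count;
}
```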
<br />
==== Offloading Process ====<br />
To parallelize the code mentioned above, we did the following:<br />
<br />
1. Use CUDA device properties to design the grid and blocks.<br />
<br />
[[File:cudaDevProps.PNG]]<br />
<br />
2. Adjust the number of threads to be used in the grid depending on the value passed in by the user (the maximum hypotenuse value).<br />
<br />
[[File:adjustNTPG.PNG]]<br />
<br />
3. Allocate 2 arrays on the device, initialized with the values 1 to the maximum hypotenuse (given by the user as an argument):<br />
<br />
* 1 array represents the hypotenuse side<br />
* 1 array represents one of the sides of the triangle<br />
* This was done by using CUDA's thrust library.<br />
<br />
[[File:allocInit.PNG]]<br />
<br />
4. Calculate the number of blocks required and iterate through the thrust vector, passing each individual element to the kernel launch along with the two previously allocated arrays.<br />
<br />
[[File:NBandLaunch.PNG]]<br />
<br />
5. The kernel contains the instructions for verifying whether the value passed in is part of a Pythagorean triple. If a Pythagorean triple is found, the values are printed out.<br />
<br />
[[File:kernel.PNG]]<br />
<br />
==== Time Logging ====<br />
To compare the timings of the serial version and the parallel version, we modified the original file to have 2 functions: calculateCUDA() and calculateSerial(). The execution of both of these functions was timed to see which function was quicker.<br />
<br />
[[File:Timings.PNG]]<br />
<br />
calculateSerial() contains the initial version of the application. It uses 3 nested for loops and takes a serialized approach to finding the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
calculateCUDA() contains the parallelized version of the application. It sets the properties of a grid and its blocks, and launches a kernel to find the Pythagorean triples. The time taken to find the triples is printed out after execution.<br />
<br />
<br />
==== Results ====<br />
Below is a graph that shows the time taken for execution of both the serial approach and the parallel approach.<br />
<br />
(Image)<br />
<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:Timings.PNG&diff=138638File:Timings.PNG2019-03-30T19:25:19Z<p>Jsidhu26: shows how the timings were reported for the serial and parallel way of finding Pythagorean triples</p>
<hr />
<div>shows how the timings were reported for the serial and parallel way of finding Pythagorean triples</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:NestedLoops.PNG&diff=138637File:NestedLoops.PNG2019-03-30T19:23:00Z<p>Jsidhu26: Shows the serialized approach for finding Pythagorean triples</p>
<hr />
<div>Shows the serialized approach for finding Pythagorean triples</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:NBandLaunch.PNG&diff=138636File:NBandLaunch.PNG2019-03-30T19:21:51Z<p>Jsidhu26: shows the adjustment for blocks in grid, and launch of kernel</p>
<hr />
<div>shows the adjustment for blocks in grid, and launch of kernel</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:Kernel.PNG&diff=138635File:Kernel.PNG2019-03-30T19:21:25Z<p>Jsidhu26: an image of the kernel used for parallelizing</p>
<hr />
<div>an image of the kernel used for parallelizing</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:CudaDevProps.PNG&diff=138634File:CudaDevProps.PNG2019-03-30T19:20:05Z<p>Jsidhu26: used the device properties to design the grid</p>
<hr />
<div>used the device properties to design the grid</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:AllocInit.PNG&diff=138633File:AllocInit.PNG2019-03-30T19:18:50Z<p>Jsidhu26: shows the allocation and initialization of the device arrays</p>
<hr />
<div>shows the allocation and initialization of the device arrays</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:AdjustNTPG.PNG&diff=138632File:AdjustNTPG.PNG2019-03-30T19:17:30Z<p>Jsidhu26: shows the adjusting of the number of threads to be used in the grid</p>
<hr />
<div>shows the adjusting of the number of threads to be used in the grid</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137820Avengers2019-02-16T04:30:00Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915 Link to Array Processing code]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into sub-arrays and a process can be assigned to each sub-array.<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137819Avengers2019-02-16T04:28:18Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
[https://github.com/jsidhu26/DPS915]<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).<br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into subarrays and a process can be assigned to each subarray.<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137804Avengers2019-02-16T03:08:10Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted before display, since the GPU will return the values in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to write a program that creates and populates a 2-dimensional array of size n (provided by the user) with random numbers. The code is available at the link below:<br />
<br />
'''! Insert link here !'''<br />
<br />
After using '''gprof''' to profile my program, the following call graph is generated:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph, it is evident that the generateRandom() function is the hotspot: it accounts for 100% of the execution time. The function consists of two for loops, one nested inside the other, giving it a Big-O complexity of O(n^2). In a serial manner, the function accesses each element in the array and assigns a random number to it.<br />
<br />
The computations involved with each element of the array are independent of the rest of the elements, so this function is a strong candidate for parallelization. Additionally, the array elements can be evenly distributed into subarrays, with a process assigned to each subarray.<br />
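The kernel described above can be sketched as follows. This is our reconstruction from the description (the original generateRandom source is behind the link above); the matrix type and the modulus are assumptions:

```cpp
#include <cstdlib>
#include <vector>

// Fill an n x n matrix with random numbers. The two nested loops are the
// O(n^2) hotspot gprof flags; each write depends only on its own (i, j),
// so rows (or any even split of elements) can go to separate workers.
std::vector<std::vector<int>> generateRandom(int n) {
    std::vector<std::vector<int>> grid(n, std::vector<int>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            grid[i][j] = std::rand() % 1000;  // independent per element
    return grid;
}
```

Because no element reads any other, assigning each worker a contiguous band of rows reproduces exactly the subarray distribution proposed above.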
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137800Avengers2019-02-16T03:06:36Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
'''! Insert link here !'''<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2). In a serial manner, the function accesses each element in the array and assigns a random number to it. <br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into subarrays and a process can be assigned to each subarray.<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137799Avengers2019-02-16T03:05:27Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
'''! Insert link here !'''<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other. In a serial manner, the function accesses each element in the array and assigns a random number to it. <br />
<br />
The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into subarrays and a process can be assigned to each subarray.<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137798Avengers2019-02-16T02:45:36Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
'''! Insert link here !'''<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137797Avengers2019-02-16T02:44:36Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
!Insert link here!<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg]]<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137796Avengers2019-02-16T02:44:18Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
!Insert link here!<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpeg|200px|thumb|left]]<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137794Avengers2019-02-16T02:43:16Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them. <br />
<br />
Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly. <br />
<br />
For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."<br />
<br />
I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:<br />
<br />
!Insert link here!<br />
<br />
After using '''gprof''' to profile my program, a call graph is generated with this content:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpg|200px|thumb|left]]<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137793Avengers2019-02-16T02:42:26Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once, is responsible for almost all of the execution time. For this profiling run, the program calculated all triples from the most primitive (3, 4, 5) up to a maximum hypotenuse value of 1500. The function is O(n^3): its three nested for loops enumerate all possible values for each triple, demanding a lot of computational power. <br />
<br />
Since every loop iteration executes the same instructions, I believe parallelizing this code should be straightforward and would make it a lot faster. The calculations could be divided into tasks, or offloaded as-is, since no calculation explicitly depends on another. <br />
<br />
For further utility, the results could be sorted before display, since the GPU will return them in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array elements; a function is evaluated on each array element."<br />
<br />
I used the provided pseudo-code to create a program that builds and populates a 2-dimensional array of size n (provided by the user) with random numbers. The code is available at the link below:<br />
<br />
!Insert link here!<br />
<br />
Profiling the program with '''gprof''' generated the following call graph:<br />
<br />
[[File:Array_Processing_-_Call_Graph.jpg]]<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137792Avengers2019-02-16T02:41:17Z<p>Jsidhu26: /* Jaideep - Array Processing */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triples (not only primitive ones) starting from initial values for the sides of a triangle. The algorithm needs some logical improvement to display only values within the starting range; these improvements should not be hard to implement, and the program still serves its purpose of calculating all Pythagorean triples, from the simplest up to the maximum hypotenuse value given by the user. The code has been slightly modified to accept command-line parameters, facilitating its execution in UNIX environments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once, is responsible for almost all of the execution time. For this profiling run, the program calculated all triples from the most primitive (3, 4, 5) up to a maximum hypotenuse value of 1500. The function is O(n^3): its three nested for loops enumerate all possible values for each triple, demanding a lot of computational power. <br />
<br />
Since every loop iteration executes the same instructions, I believe parallelizing this code should be straightforward and would make it a lot faster. The calculations could be divided into tasks, or offloaded as-is, since no calculation explicitly depends on another. <br />
<br />
For further utility, the results could be sorted before display, since the GPU will return them in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array elements; a function is evaluated on each array element."<br />
<br />
I used the provided pseudo-code to create a program that builds and populates a 2-dimensional array of size n (provided by the user) with random numbers. The code is available at the link below:<br />
<br />
!Insert link here!<br />
<br />
Profiling the program with '''gprof''' generated the following call graph:<br />
<br />
[[File:Array Processing - Call Graph.jpg]]<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=File:Array_Processing_-_Call_Graph.jpeg&diff=137791File:Array Processing - Call Graph.jpeg2019-02-16T02:39:23Z<p>Jsidhu26: This image contains the call graph that was created when gprof was used to profile my program</p>
<hr />
<div>This image contains the call graph that was created when gprof was used to profile my program</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137790Avengers2019-02-16T02:36:57Z<p>Jsidhu26: /* Jaideep */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triples (not only primitive ones) starting from initial values for the sides of a triangle. The algorithm needs some logical improvement to display only values within the starting range; these improvements should not be hard to implement, and the program still serves its purpose of calculating all Pythagorean triples, from the simplest up to the maximum hypotenuse value given by the user. The code has been slightly modified to accept command-line parameters, facilitating its execution in UNIX environments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once, is responsible for almost all of the execution time. For this profiling run, the program calculated all triples from the most primitive (3, 4, 5) up to a maximum hypotenuse value of 1500. The function is O(n^3): its three nested for loops enumerate all possible values for each triple, demanding a lot of computational power. <br />
<br />
Since every loop iteration executes the same instructions, I believe parallelizing this code should be straightforward and would make it a lot faster. The calculations could be divided into tasks, or offloaded as-is, since no calculation explicitly depends on another. <br />
<br />
For further utility, the results could be sorted before display, since the GPU will return them in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep - Array Processing ====<br />
<br />
In [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Blaise Barney's Notes on Array Processing], an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array elements; a function is evaluated on each array element."<br />
<br />
I used the provided pseudo-code to create a program that builds and populates a 2-dimensional array of size n (provided by the user) with random numbers. The code is available at the link below:<br />
<br />
!Insert link here!<br />
<br />
Profiling the program with '''gprof''' generated the following call graph:<br />
<br />
[[File:C:\Users\Jaideep Sidhu\Desktop\Seneca\Semester 6\DPS915\callgraph.jpg]]<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137763Avengers2019-02-15T17:48:16Z<p>Jsidhu26: /* Team Members */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Pythagorean Triples<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Array Processing<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno - Pythagorean Triples ====<br />
<br />
[https://www.daniweb.com/programming/software-development/threads/345155/calculating-the-pythagorean-triples Pythagorean Triples by Violet_82]<br />
<br />
"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:<br />
<br />
a^2+b^2=c^2. <br />
<br />
The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]<br />
<br />
In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."<br />
<br />
Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html<br />
<br />
This algorithm (supposedly) calculates all Pythagorean triples (not only primitive ones) starting from initial values for the sides of a triangle. The algorithm needs some logical improvement to display only values within the starting range; these improvements should not be hard to implement, and the program still serves its purpose of calculating all Pythagorean triples, from the simplest up to the maximum hypotenuse value given by the user. The code has been slightly modified to accept command-line parameters, facilitating its execution in UNIX environments.<br />
<br />
The flat profile generated by the program is as follows:<br />
<br />
Flat profile:<br />
<br />
Each sample counts as 0.01 seconds.<br />
% cumulative self self total <br />
time seconds seconds calls Ts/call Ts/call name <br />
97.67 5.98 5.98 Triangle::calculateDimensions(double, double, double, int)<br />
0.00 5.98 0.00 2842 0.00 0.00 Triangle::printDimensions(double, double, double)<br />
0.00 5.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN8TriangleC2Edddi<br />
<br />
We can see that the function calculateDimensions, which is called only once, is responsible for almost all of the execution time. For this profiling run, the program calculated all triples from the most primitive (3, 4, 5) up to a maximum hypotenuse value of 1500. The function is O(n^3): its three nested for loops enumerate all possible values for each triple, demanding a lot of computational power. <br />
<br />
Since every loop iteration executes the same instructions, I believe parallelizing this code should be straightforward and would make it a lot faster. The calculations could be divided into tasks, or offloaded as-is, since no calculation explicitly depends on another. <br />
<br />
For further utility, the results could be sorted before display, since the GPU will return them in no particular order.<br />
<br />
[https://github.com/brucremo/DPS915 Code and Execution Instructions on GitHub]<br />
<br />
==== Jaideep ====<br />
Subject:<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137746Avengers2019-02-15T03:29:45Z<p>Jsidhu26: /* Jaideep Sidhu */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Some responsibility <br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Some different responsibility<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno ====<br />
Subject: [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Array Processing by Blaise Barney]<br />
<br />
==== Jaideep ====<br />
Subject:<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137745Avengers2019-02-15T03:29:16Z<p>Jsidhu26: /* Team Members */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Some responsibility <br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Some different responsibility<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno ====<br />
Subject: [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Array Processing by Blaise Barney]<br />
<br />
==== Jaideep Sidhu ====<br />
Subject:<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=Avengers&diff=137744Avengers2019-02-15T03:28:10Z<p>Jsidhu26: /* Team Members */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
= Team Avengers =<br />
== Team Members == <br />
# [mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Bruno Alexander Cremonese de Morais], Some responsibility <br />
# ...<br />
[mailto:bacremonese-de-morai@senecacollege.ca?subject=gpu610 Email All]<br />
<br />
# [mailto:jsidhu26@senecacollege.ca?subject=gpu610 Jaideep Sidhu], Some responsibility <br />
# ...<br />
[mailto:jsidhu26@senecacollege.ca?subject=gpu610 Email All]<br />
<br />
== Progress ==<br />
=== Assignment 1 ===<br />
==== Bruno ====<br />
Subject: [https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesArray Array Processing by Blaise Barney]<br />
<br />
==== Jaideep Sidhu ====<br />
Subject:<br />
<br />
=== Assignment 2 ===<br />
=== Assignment 3 ===</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=GPU610/DPS915_G_P_Index_20191&diff=137743GPU610/DPS915 G P Index 201912019-02-15T03:26:46Z<p>Jsidhu26: /* Presentation Schedule */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
<br />
Please add an overview of your group here and create a separate project page for your group!<br />
<br />
= Project Rules =<br />
<br />
# Use the Group page for a Journal of your activities throughout the course of the project<br />
# Project should cover material that differs from the material on the course web site<br />
# Presentation can be in Powerpoint or as a Walkthrough the group project page<br />
# Link to the project page should be included in the Student List table<br />
# Presentation slots (see below) are on a first-come first-served basis<br />
# Attendance at all presentations is mandatory - marks will be deducted for absenteeism<br />
# Marks will be awarded for both Group Wiki page and for the Presentation proper<br />
<br />
<br /><br />
<br />
= Potential Projects =<br />
<br />
* [[GPU610/DPS915_G_P_Index_20157 | Fall 2015 semester (Former Students)]]<br />
* [[GPU610/DPS915_G_P_Index_20171 | Winter 2017 semester (Former Students)]]<br />
* [[GPU610/DPS915_G_P_Index_20181 | Winter 2018 semester (Former Students)]]<br />
<br />
=== Suggested Projects ===<br />
<br />
* image processing - [http://cimg.eu/ CImg Library], [http://dlib.net/imaging.html dlib C++ library]<br />
* data compression - [http://codereview.stackexchange.com/questions/86543/simple-lzw-compression-algorithm LWZ algorithm], [http://www.mattmahoney.net/dc/dce.html Explained by Matt Mahoney]<br />
* grep - [http://www.boost.org/doc/libs/1_36_0/libs/regex/example/grep/grep.cpp Boost], [http://stackoverflow.com/questions/5731035/how-to-implement-grep-in-c-so-it-works-with-pipes-stdin-etc Stack Overflow ]<br />
* exclusive scan - [http://15418.courses.cs.cmu.edu/spring2016/article/4 CMU Assignment 2 Part 2]<br />
* simple circle renderer - [http://15418.courses.cs.cmu.edu/spring2016/article/4 CMU Assignment 2 Part 3]<br />
* object detection/tracking - [http://dlib.net/imaging.html#scan_fhog_pyramid dlib C++ library]<br />
* ray tracing - [http://khrylx.github.io/DSGPURayTracing/ by Yuan Ling (CMU) ] [https://github.com/jazztext/VRRayTracing/ by Kaffine Shearer (CMU)] [https://github.com/szellmann/visionaray Visionaray]<br />
* sorting algorithms - [http://www.cprogramming.com/tutorial/computersciencetheory/sortcomp.html Alex Allain cprogramming.com], [https://www.toptal.com/developers/sorting-algorithms Animations]<br />
* Jacobi's method for Poisson's equation - [https://math.berkeley.edu/~wilken/228A.F07/chr_lecture.pdf Rycroft's Lecture Note]<br />
* Gaussian Regression - [http://abhishekjoshi2.github.io/cuGP/ cuGP]<br />
* Halide - [http://haboric-hu.github.io/ Convolutional Networks]<br />
* Sudoku - [http://www.andrew.cmu.edu/user/astian/ by Tian Debebe (CMU)]<br />
<br />
=== C++ Open Source Libraries ===<br />
* List of open source libraries - [http://en.cppreference.com/w/cpp/links/libs cppreference.com]<br />
<br />
=== Carnegie-Mellon University Links ===<br />
* [http://15418.courses.cs.cmu.edu/spring2016/article/17 Spring 2016]<br />
* [http://15418.courses.cs.cmu.edu/spring2015/competition Spring 2015]<br />
* [http://15418.courses.cs.cmu.edu/spring2014/article/12 Spring 2014]<br />
<br />
=== Other Links ===<br />
* [https://sites.google.com/a/nirmauni.ac.in/cudacodes/cuda-projects Nirma University - restricted use of code to students of Nirma but may be a source of ideas]<br />
<br />
=== Reference Papers ===<br />
* [http://www.cs.utexas.edu/~pingali/CS378/2008sp/papers/GPUSurvey.pdf 2008 Survey Paper - you can search this paper for traditional topic ideas]<br />
* [http://www.nvidia.com/object/cuda_showcase_html.html Nvidia Showcase - probably too challenging - but could lead to simpler ideas]<br />
<br />
=== Interesting aspects to consider in your project ===<br />
* Try a different language - Javascript (Node.js bindings), Python (pyCUDA bindings)<br />
* Try APIs - [http://halide-lang.org/ Halide], OpenCV, Caffe, Latte<br />
* Compare CPU and GPU performance<br />
* Compare different blocksizes<br />
* Compare different algorithms on different machines<br />
* Implement your project on a Jetson TK1 board<br />
<br />
<br /><br />
<br />
= Presentation Schedule =<br />
<br />
<br />
{| border="1"<br />
|-<br />
|Team Name<br />
|Date and Time<br />
|-<br />
|<br />
|March 25 8:00<br />
|-<br />
<br />
|<br />
|March 25 8:20<br />
|-<br />
<br />
|<br />
|March 25 8:40<br />
|-<br />
<br />
|<br />
|March 25 9:00<br />
|-<br />
<br />
|<br />
|March 25 9:20<br />
|-<br />
<br />
|<br />
|March 29 8:00<br />
|-<br />
<br />
|<br />
|March 29 8:20<br />
|-<br />
<br />
|<br />
|March 29 8:40<br />
|-<br />
<br />
|Algo-holics<br />
|March 29 9:00<br />
|-<br />
<br />
|<br />
|March 29 9:20<br />
|-<br />
<br />
|<br />
|April 1 8:00<br />
|-<br />
<br />
|<br />
|April 1 8:20<br />
|-<br />
<br />
|<br />
|April 1 8:40<br />
|-<br />
<br />
|[[Avengers | Avengers]]<br />
|April 1 9:00<br />
|-<br />
<br />
|<br />
|April 1 9:20<br />
|-<br />
<br />
|[[triForce |triForce]] <br />
|April 5 8:00<br />
|-<br />
<br />
|[[Ghost Cells | Ghost Cells]]<br />
|April 5 8:20<br />
|-<br />
<br />
|<br />
|April 5 8:40<br />
|-<br />
<br />
|<br />
|April 5 9:00<br />
|-<br />
<br />
|group 6<br />
|April 5 9:20<br />
|-<br />
<br />
|}<br />
<br />
<br /><br />
<br />
= Group and Project Index =<br />
<br />
You can find a sample project page template [[GPU610/DPS915_Sample_Project_Page | here]]<br />
<br />
== [[GPU610/DPS915_Sample_Project_Page | Sample Group Title and email addresses]] ==<br />
<br />
# [mailto:chris.szalwinski@senecacollege.ca?subject=DPS915 Chris Szalwinski]<br />
# [mailto:fardad.soleimanloo@senecacollege.ca?subject=DPS915 Fardad Soleimanloo]<br />
# [mailto:chris.szalwinski@senecacollege.ca;fardad.soleimanloo@senecacollege.ca?subject=DPS915 eMail All]</div>Jsidhu26https://wiki.cdot.senecacollege.ca/w/index.php?title=GPU610/DPS915_Student_List_20191&diff=137705GPU610/DPS915 Student List 201912019-02-12T23:14:58Z<p>Jsidhu26: /* Student List */</p>
<hr />
<div>{{GPU610/DPS915 Index | 20191}}<br />
<br />
Please add your information to the student list below!<br />
<br />
= Participation =<br />
== Student List Syntax ==<br />
Insert the following at the end of the table (if you are a student in GPU610/DPS915).<br /><br />
<big><pre>|[[User:WN | FN]] ||LN|| [[PN |GN]] ||SB|| [mailto:ID@myseneca.ca?subject=SB ID]<br />
|-</pre></big><br />
Replace the following with your own information: <br /><br />
* WN: Your Wiki User name<br />
* FN: Your First Name<br />
* LN: Your Last Name<br />
* PN: Your Group's Project Page Name on the wiki<br />
* GN: Your Group name<br />
* SB: Your Subject(example: GPU610)<br />
* ID: Your email ID (myseneca id)<br />
<br />
== Student List ==<br />
<br />
<br />
{| class="wikitable sortable" border="1" cellpadding="5"<br />
|+ GPU610/DPS915 - Student List<br />
! First Name !! Last Name !! Team Name !! Subject !! Seneca Id<br />
|-<br />
|[[User:Chris Szalwinski | Chris]]||Szalwinski||[[GPU610/DPS915 Sample Team Page|Team Name]]||GPU610||[mailto:chris.szalwinski@senecacollege.ca?subject=gpu610 chris.szalwinski]<br />
|-<br />
|[[User:ysim2 | Yoosuk]]||Sim||[[Ghost Cells|Ghost Cells]]||DPS915||[mailto:ysim2@myseneca.ca?subject=dps915 ysim2]<br />
|-<br />
|[[User:rdittrich | Robert]] ||Dittrich|| [[Ghost Cells |Ghost Cells]] ||DPS915|| [mailto:rdittrich@myseneca.ca?subject=SB rdittrich]<br />
|-<br />
|[[User:Wwpark | Woosle]]||Park||[[N/A|N/A]]||DPS915||[mailto:wwpark@myseneca.ca?subject=dps915 wwpark]<br />
|-<br />
|[[User:PriyankaDhiman | Priyanka]] ||Dhiman|| [[GPU610/DPS915 Sample Team Page|Team Name]] ||GPU610|| [mailto:pdhiman2@myseneca.ca?subject=SB pdhiman2]<br />
|-<br />
|Dillon||Coull||-||GPU610||[mailto:dcoull@myseneca.ca?subject=gpu610 dcoull]<br />
|-<br />
|[[User:Akkabia | Abdul]]||Kabia||[[GPU610/gpuchill|GPU n' Chill]]||GPU610||[mailto:akkabia@myseneca.ca?subject=gpu610 akkabia]<br />
|-<br />
|[[User:Achisholm1 | Alex]]||achisholm1||[[GPU610/DPS915 Sample Team Page|Team Name]]||GPU610||[mailto:achisholm1@myseneca.ca?subject=gpu610 achisholm1]<br />
|-<br />
|[[User:afaux | Andrew]] ||Faux|| [[GPU610 | Team Name]] ||GPU610|| [mailto:afaux@myseneca.ca?subject=SB ID]<br />
|-<br />
|[[User:ssdhillon20 | Sukhbeer]]||Dhillon||[[Algo_holics |Algo-holics]]||GPU610||[mailto:ssdhillon20@myseneca.ca?subject=GPU610 ssdhillon20]<br />
|-<br />
|[[User:gsingh520 | Gurpreet]]||Singh||[[Algo_holics |Algo-holics]]||GPU610||[mailto:gsingh520@myseneca.ca?subject=GPU610 gsingh520]<br />
|-<br />
|[[User:Edgar Giang | Edgar]]||Giang||[[Algo_holics |Algo-holics]]||GPU610||[mailto:egiang1@myseneca.ca?subject=GPU610 egiang1]<br />
|-<br />
|[[User:DavidFerri | David]] ||Ferri|| [[triForce |triForce]] ||GPU610|| [mailto:dpferri@myseneca.ca?subject=GPU610 dpferri]<br />
|-<br />
|[[User:Vincent Terpstra | Vincent]] ||Terpstra|| [[triForce |triForce]] ||GPU610|| [mailto:vterpstra@myseneca.ca?subject=GPU610 vterpstra]<br />
|-<br />
|[[User:Raymond Kiguru | Raymond]] ||Kiguru|| [[triForce |triForce]] ||GPU610|| [mailto:rkiguru@myseneca.ca?subject=GPU610 rkiguru]<br />
|-<br />
|[[User: Xiaowei Huang | Xiaowei]]||Huang||[[GPU610/DPS915 Sample Team Page|group 6]]||GPU610||[mailto:xhuang110@myseneca.ca?subject=gpu610 xhuang110]<br />
|-<br />
|[[User:Akshat | Akshatkumar]] ||Patel|| [[N/A|N/A]] ||DPS915|| [mailto:apatel271@myseneca.ca?subject=dps915 apatel271]<br />
|-<br />
|[[User:Yyuan34 | Yihang]] ||Yuan|| [[GPU610/DPS915 Sample Team Page |group 6]] ||GPU610|| [mailto:yyuan34@myseneca.ca?subject=gpu610 yyuan34]<br />
|-<br />
|[[User:Dserpa | Daniel]] ||Serpa|| [[GPU610/gpuchill|GPU n' Chill]] ||GPU610|| [mailto:dserpa@myseneca.ca?subject=GPU610 dserpa]<br />
|-<br />
|[[User:jtardif1 | Josh]] ||Tardif|| [[GPU610/gpuchill|GPU n' Chill]] ||GPU610|| [mailto:jtardif1@myseneca.ca?subject=GPU610 jtardif1]<br />
|-<br />
|[[User:Henryleung | Henry]] ||Leung|| [[GPU610/DPS915 Sample Team Page |Team Name]] ||GPU610|| [mailto:hleung16@myseneca.ca?subject=GPU610 hleung16]<br />
|-<br />
|[[User:zzhou33 | Zhijian]] ||Zhou|| [[GPU610/DPS915 | group 6]] ||DPS915|| [mailto:zzhou33@myseneca.ca?subject=DPS915 zzhou33]<br />
|-<br />
|[[User:jPitters | Jordan]] ||Pitters|| [[N/A|N/A]] ||DPS915|| [mailto:jpitters@myseneca.ca?subject=dps915 jpitters]<br />
|-<br />
|[[User:Izhogova | Inna]] ||Zhogova|| [[Ghost Cells|Ghost Cells]] ||DPS915|| [mailto:izhogova@myseneca.ca?subject=dps915 izhogova]<br />
|-<br />
|[[User:Brucremo | Bruno Alexander]] ||Cremonese de Morais|| [[Avengers|Avengers]] ||DPS915|| [mailto:bacremonese-de-morai@myseneca.ca?subject=dps915 brucremo]<br />
|-<br />
|[[User:Jsidhu26 | Jaideep ]] ||Sidhu|| [[Avengers|Avengers]] ||DPS915|| [mailto:jsidhu26@myseneca.ca?subject=dps915 jsidhu26]<br />
|-<br />
|}</div>Jsidhu26