Difference between revisions of "DPS915/CodeKirin"

From CDOT Wiki

Revision as of 17:45, 3 December 2014

Calculations of Pi

Team Member

  1. Tony Yu

Progress

Assignment 1

Findings

The code is taken from http://www.cplusplus.com/forum/beginner/1149/, with some changes.

This program calculates pi to a precision based on the value entered by the user. Currently it displays the result to 9 decimal places.

I will try to find a way to calculate pi to as many decimal places as possible, which should drastically increase the time it takes to run the program. The current candidate solution is to use the BigNumber library.

(Updated)

The code used is taken from the following site https://helloacm.com/cc-coding-exercise-finding-approximation-of-pi-using-monto-carlo-algorithm/ with some changes to it.

This program calculates pi using the Monte Carlo approach, to a precision that depends on the value (the number of points) entered by the user.
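For reference, the Monte Carlo approach can be sketched roughly as below. This is a minimal serial sketch, not the code from the linked site: the function name estimatePi, the seed parameter, and the use of std::mt19937 are assumptions made for illustration. It samples points in the unit square and counts the ones that land inside the quarter circle.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Estimate pi by sampling `samples` points uniformly in the unit square
// and counting how many satisfy x*x + y*y <= 1.0 (inside the quarter circle).
// estimatePi and its seed parameter are hypothetical names for this sketch.
double estimatePi(std::uint64_t samples, std::uint32_t seed = 12345u) {
    assert(samples > 0);
    std::mt19937 gen(seed);  // assumed generator; the original may use another
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::uint64_t inside = 0;
    for (std::uint64_t i = 0; i < samples; ++i) {
        const double x = dist(gen);
        const double y = dist(gen);
        if (x * x + y * y <= 1.0) {
            ++inside;
        }
    }
    // The quarter circle covers pi/4 of the unit square.
    return 4.0 * static_cast<double>(inside) / static_cast<double>(samples);
}
```

Calling estimatePi(1000000) should give a value close to 3.14. The error shrinks only with the square root of the number of points, which is why the larger runs take so much longer for a modest gain in precision.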

Value of 1 Million

Profile a1(million).JPG

Value of 10 Million

Profile a1(10 million).JPG

Value of 100 Million

Profile a1(100 million).JPG

Value of 1 Billion

Profile a1(billion).JPG

Assignment 2 and 3

I have combined the 2nd and 3rd parts of the assignment, since I had some issues with the kernel.

The following results compare the upgraded code with the original and show a significant increase in speed.

(Note)

For some reason the code crashes my graphics driver beyond 8000000 (8 million) dots, and even at 8 million it crashes most of the time. The Nvidia Visual Profiler doesn't work either; it gets stuck on generating the timeline, so I used clock_t in the code instead to measure the execution time of the kernel. I don't think this is 100% accurate, though.
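One possible way to do host-side timing is sketched below. It is not the code used for these results: it swaps in std::chrono::steady_clock in place of clock_t, and the helper name timeMs is an assumption. Note that a CUDA kernel launch returns to the host before the kernel finishes, so the timed work would need to wait for the device (for example with cudaDeviceSynchronize()) before the stop time is read; missing that wait is one likely source of inaccurate numbers.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Return the wall-clock time, in milliseconds, taken by `work`.
// timeMs is a hypothetical helper name for this sketch.
double timeMs(const std::function<void()>& work) {
    assert(static_cast<bool>(work));
    const auto start = std::chrono::steady_clock::now();
    // If `work` launches a CUDA kernel, it should also wait for the
    // device to finish before returning, or only the launch is timed.
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The lambda passed to timeMs would wrap the kernel launch and the wait, so that the whole run falls inside the measured interval.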

Value of 1 Million

MillionMonteCarlo.JPG

Value of 5 Million

5MillionMonteCarlo.JPG

Value of 8 Million

8MillionMonteCarlo.JPG


Comparison Chart

ChartMonteCarlo.JPG


Some Code Snippets

The kernel sets the element at index tid (threadIdx.x) of the temp array in shared memory to 1 when the total is <= 1.0, then syncs the threads. It then sums all the 1s in the array for that block and writes the result to another array.

Code1.JPG
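Since the snippet above is only available as an image, here is a host-side C++ emulation of the step it describes. It is not the kernel itself: the function name countPerBlock, the hits vector (standing in for the "total <= 1.0" test on each point), and the threadsPerBlock parameter are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulate the per-block counting on the host.
// hits[i] is true when point i passed the "total <= 1.0" test.
// Returns one count per emulated block.
std::vector<unsigned> countPerBlock(const std::vector<bool>& hits,
                                    std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    const std::size_t blocks =
        (hits.size() + threadsPerBlock - 1) / threadsPerBlock;
    std::vector<unsigned> blockCounts(blocks, 0u);

    for (std::size_t block = 0; block < blocks; ++block) {
        // temp plays the role of the shared-memory array, indexed by tid.
        std::vector<unsigned> temp(threadsPerBlock, 0u);
        for (std::size_t tid = 0; tid < threadsPerBlock; ++tid) {
            const std::size_t i = block * threadsPerBlock + tid;
            if (i < hits.size() && hits[i]) {
                temp[tid] = 1u;
            }
        }
        // On the GPU, a thread barrier would sit here so that every
        // thread has written its flag before the sum is taken.
        unsigned sum = 0;
        for (unsigned flag : temp) {
            sum += flag;
        }
        blockCounts[block] = sum;
    }
    return blockCounts;
}
```

The barrier matters on the GPU because the threads of a block run concurrently; summing temp before every thread has written its flag would undercount.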

After copying the results from the device to the host, the program sums the partial results and calculates the value of pi.

Code2.JPG
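The final step can be sketched roughly as follows, assuming the array copied back from the device holds one count of "inside" points per block. The function name piFromBlockCounts and its parameters are hypothetical; the actual code is in the image above.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Sum the per-block counts copied back from the device and turn the
// ratio of inside points to total points into an estimate of pi.
// piFromBlockCounts is a hypothetical name for this sketch.
double piFromBlockCounts(const std::vector<unsigned>& blockCounts,
                         std::uint64_t totalPoints) {
    assert(totalPoints > 0);
    // Accumulate in 64 bits so that large runs cannot overflow the sum.
    const std::uint64_t inside = std::accumulate(
        blockCounts.begin(), blockCounts.end(), std::uint64_t{0});
    return 4.0 * static_cast<double>(inside) /
           static_cast<double>(totalPoints);
}
```

Doing this last sum on the host keeps the kernel simple, at the cost of copying one value per block back from the device.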