DPS915 Toad

From CDOT Wiki

Revision as of 17:14, 14 October 2015

Project Name Goes here

Team Members

  1. Sandeep Saldanha
  2. Kris Vukasinovic
  3. ...

Progress

Assignment 1

PI Calculation

Explanation

One of the programs we profiled was the Monte Carlo method for approximating the value of PI. We found that each time we increased the number of iterations by a factor of 10, we obtained more correct digits of PI. However, we believe the scope of this program is too small for a group of two to analyze, so we are not using it as our project.

Code
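#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    // Number of random points (assumed to come from the command line; 1,000,000 if omitted)
    int npoints = (argc > 1) ? atoi(argv[1]) : 1000000;
    if (npoints <= 0) { cerr << "npoints must be positive" << endl; return 1; }
    int circle_count = 0;
    srand(static_cast<unsigned>(time(nullptr)));   // seed the random number generator

    double xValue, yValue;
    for(int i = 0; i < npoints; i++)
    {
        //Generate a random point in the unit square
        xValue = (double) rand()/RAND_MAX;
        yValue = (double) rand()/RAND_MAX;

        // Count points that fall inside the quarter circle of radius 1
        if(sqrt((xValue*xValue)+(yValue*yValue)) <= 1)
        {
            circle_count++;
        }
    }

    double pi, ds;

    cout<<circle_count<<"/"<<npoints<<endl;
    ds = (double)circle_count/npoints;   // fraction inside approximates PI/4
    cout<<ds<<endl;
    pi = ((4.0)*ds);

    cout<<"PI = "<< pi <<endl;
    return 0;
}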

Sample GPROF
Flat profile:

Each sample counts as 0.01 seconds.
no time accumulated

 %   cumulative   self              self     total
time   seconds   seconds    calls  Ts/call  Ts/call  name
 0.00      0.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
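
The sketch below illustrates the claim that more iterations give more correct digits. It is not our original program. It replaces `rand()` with the standard `<random>` generator `std::mt19937` and uses a fixed seed so that runs are reproducible. The function names `estimatePi` and `piEstimateError` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Estimate PI by sampling n random points in the unit square and counting how
// many land inside the quarter circle of radius 1. A fixed-seed std::mt19937
// replaces rand() (an assumption) so that runs are reproducible.
double estimatePi(long long n, unsigned seed = 42)
{
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    long long inside = 0;
    for (long long i = 0; i < n; ++i) {
        const double x = dist(gen);
        const double y = dist(gen);
        // x*x + y*y <= 1 is equivalent to sqrt(x*x + y*y) <= 1, without the sqrt
        if (x * x + y * y <= 1.0) ++inside;
    }
    // The fraction of points inside approximates PI / 4
    return 4.0 * static_cast<double>(inside) / static_cast<double>(n);
}

// Absolute error of the estimate compared with PI computed as acos(-1)
double piEstimateError(long long n, unsigned seed = 42)
{
    return std::fabs(estimatePi(n, seed) - std::acos(-1.0));
}
```

Calling `piEstimateError` for n = 10, 100, ..., 10,000,000 should show the error shrinking roughly in proportion to 1/sqrt(n). As a result, each extra correct digit costs about 100 times more iterations.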

Image Processing

For this assignment, I originally intended to process .png and .jpeg files, but this turned out to be much harder than I expected. Without the skills to write my own image parser, I had to rely on established libraries such as `ImageMagick` and `libpng`. However, getting these libraries to run on both my personal machine and the matrix servers proved problematic.

I came across open-source code written by Christopher Ginac (http://www.dreamincode.net/forums/topic/76816-image-processing-tutorial/) that I was able to use to manipulate images (create negatives, rotate, flip, etc.) and to time each operation. When I began profiling, I quickly found that small images simply do not require much computational power to manipulate. My solution was to take a simple 2.2 MB image, enlarge it to 110 MB, perform the operations on that larger image, and then shrink it back down to its original 2.2 MB size.
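
Enlarging the image and shrinking it back can be sketched as follows. This is not Ginac's `enlargeImage`/`shrinkImage` code. It is a hypothetical nearest-neighbour version that operates on a plain `std::vector` matrix of ints; `Matrix`, `enlarge` and `shrink` are illustrative names. If both width and height are scaled by the same factor k, the uncompressed size grows by k². Going from 2.2 MB to 110 MB (50 times larger) therefore corresponds to k ≈ 7.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grayscale image: a rows x cols matrix of int pixel values.
using Matrix = std::vector<std::vector<int>>;

// Nearest-neighbour enlargement: every source pixel becomes a k x k block.
Matrix enlarge(const Matrix& src, int k)
{
    assert(k > 0 && !src.empty());
    const std::size_t rows = src.size(), cols = src[0].size();
    Matrix dst(rows * k, std::vector<int>(cols * k));
    for (std::size_t r = 0; r < dst.size(); ++r)
        for (std::size_t c = 0; c < dst[r].size(); ++c)
            dst[r][c] = src[r / k][c / k];
    return dst;
}

// Shrink by keeping the top-left pixel of each k x k block (simple decimation).
Matrix shrink(const Matrix& src, int k)
{
    assert(k > 0 && !src.empty());
    const std::size_t rows = src.size() / k, cols = src[0].size() / k;
    Matrix dst(rows, std::vector<int>(cols));
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            dst[r][c] = src[r * k][c * k];
    return dst;
}
```

Each output pixel simply copies one input pixel, so `shrink(enlarge(img, k), k)` returns the original image exactly. This makes it easy to verify that the round trip is lossless.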

This is the profile:

 %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
34.89      3.52     3.52        5     0.70     0.70  Image::operator=(Image const&)
22.99      5.84     2.32        4     0.58     0.58  Image::Image(int, int, int)
14.67      7.32     1.48                             Image::rotateImage(int, Image&)
11.10      8.44     1.12        1     1.12     1.12  Image::Image(Image const&)
10.11      9.46     1.02                             writeImage(char*, Image&)
 2.28      9.69     0.23                             Image::negateImage(Image&)
 1.88      9.88     0.19                             Image::reflectImage(bool, Image&)
 1.68     10.05     0.17                             Image::enlargeImage(int, Image&)
 0.30     10.08     0.03                             Image::shrinkImage(int, Image&)
 0.10     10.09     0.01                             readImage(char*, Image&)
 0.00     10.09     0.00        5     0.00     0.00  Image::~Image()
 0.00     10.09     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN5ImageC2Ev


As the profile shows, rotating the image by 180 degrees accounted for about 14.67 percent of the total run time. Although that is only 1.48 seconds, it still seems quite long for a single operation.

Consider this: games on my home machine render between 80 and 120 frames per second. Each frame has a resolution of 1920x1080, which I estimate at around 3.5 MB per image (an uncompressed 24-bit RGB frame is about 6.2 MB). Multiplying by roughly 100 frames per second gives about 350 MB of data every second. Far more than rotations happens to each frame, so it quickly becomes clear that parallel programming is needed to process this much data at the required rate. The CPU is designed for low latency but offers comparatively low throughput. The GPU accepts higher latency in exchange for much higher throughput, so it should be able to help here.
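
The frame-size arithmetic can be checked with a few lines of C++. This sketch assumes uncompressed 24-bit RGB (3 bytes per pixel) with no padding. `frameBytes` and `bytesPerSecond` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for one uncompressed frame: width x height pixels at bytesPerPixel
// bytes each (no compression or row padding assumed).
std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                         std::uint64_t bytesPerPixel)
{
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}

// Bytes that must be processed every second at a given frame rate.
std::uint64_t bytesPerSecond(std::uint64_t bytesPerFrame, std::uint64_t framesPerSecond)
{
    return bytesPerFrame * framesPerSecond;
}
```

For example, `frameBytes(1920, 1080, 3)` is 6,220,800 bytes. At 100 frames per second, `bytesPerSecond` gives 622,080,000 bytes (about 622 MB) per second. Plugging in the 3.5 MB estimate instead reproduces the 350 MB per second figure.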

When I opened the code responsible for rotating the image, I found two nested for loops, one after the other (four for loops in total). Each computation in these loops operates on an element of a matrix (a 2D array of ints), so it makes sense to move that work into a CUDA kernel that runs in parallel to reduce rendering time.
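
The sketch below shows why a 180 degree rotation suits the GPU. It is a single-pass rewrite on a flattened row-major array, not the original two pairs of nested loops, whose exact contents are not reproduced here. The names `rotate180Source` and `rotate180` are hypothetical. For an image with `total` pixels, output pixel `i` reads input pixel `total - 1 - i`, so every output pixel can be computed independently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Source index for output pixel i when rotating a row-major image by 180
// degrees. Pixel (r, c) comes from (rows-1-r, cols-1-c), which in a flattened
// array of `total` pixels is simply index total - 1 - i.
std::size_t rotate180Source(std::size_t i, std::size_t total)
{
    assert(i < total);
    return total - 1 - i;
}

// CPU version: one loop iteration per output pixel. No iteration depends on
// another, so each one could become a separate GPU thread.
std::vector<int> rotate180(const std::vector<int>& src)
{
    std::vector<int> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[rotate180Source(i, src.size())];
    return dst;
}
```

In a CUDA kernel the loop would disappear. Each thread would compute its own index `i` (typically `blockIdx.x * blockDim.x + threadIdx.x`), check `i < total`, and perform the same single copy.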

My plan is to process the image many times in a loop to simulate game rendering, which significantly increases the amount of data passed to the processor. This should reveal a clear difference between native C++ code executing on the CPU and CUDA code running on the NVIDIA GPU.
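
A minimal timing harness for the simulated rendering loop might look like this. It assumes an 8-bit grayscale image (maximum gray level 255) stored as a flat `std::vector<int>`, and it uses negation as the per-frame operation. `negate` and `simulateFrames` are hypothetical names, and `std::chrono::steady_clock` measures wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical per-frame operation: negate a grayscale image with the given
// maximum gray level (255 assumed), so each pixel p becomes maxGray - p.
void negate(std::vector<int>& img, int maxGray = 255)
{
    for (int& p : img) p = maxGray - p;
}

// Apply the operation to the same image `frames` times to simulate a stream
// of rendered frames, returning the elapsed wall-clock time in milliseconds.
double simulateFrames(std::vector<int>& img, int frames)
{
    assert(frames >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int f = 0; f < frames; ++f)
        negate(img);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Once a CUDA version exists, it should be timed the same way. Including the host-to-device and device-to-host copies in the measurement, or reporting them separately, keeps the CPU and GPU comparison fair.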

Assignment 2

Assignment 3