Difference between revisions of "Team CNT"

Revision as of 22:49, 13 March 2013

Team CNT

Team Members

  1. Natalia Levkevitch
  2. Tony Yu


Progress

Assignment 1

Natalia's Findings

I found an open-source C++ program that was uploaded to Dreamincode.net. The application is called "Basic Image Processor". At first I was stuck trying to convert a JPG file into a format the application could process, but after some effort I figured out how to compile it. The application lets the user select what to do with the image, including rotating it, which is what I am thinking of parallelizing. There are other options that could be interesting to parallelize as well. I'll see how it goes.

Since the application was designed to prompt the user for options, I had to modify the code in order to profile it. I hard-coded the image file name and preselected the "rotate the image" option.

Example of profiling with a PGM image of size 35 MB

Profile.png

As you can see, most of the time is spent rotating the image. Overall it still does not take as long as I expected, so I'll try a bigger picture. Either way, it will be interesting to try to reduce even this time.
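To show why rotation looks like a reasonable candidate for parallelizing, here is a minimal sketch of a nearest-neighbour rotation over a flat grayscale buffer. It is not the code from "Basic Image Processor": the GrayImage struct and the rotate function are hypothetical names made up for this illustration, and the real application may compute the rotation differently. The point is only that each output pixel is computed from the input image alone, so the loop iterations do not depend on each other.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat grayscale image (row-major), not the application's Image class.
struct GrayImage {
    int rows;
    int cols;
    std::vector<int> pixels;  // expected size: rows * cols
};

// Rotate about the image centre by `degrees` using nearest-neighbour sampling.
// Output pixels with no source pixel inside the input are left at 0.
GrayImage rotate(const GrayImage& in, double degrees) {
    assert(in.pixels.size() == static_cast<std::size_t>(in.rows) * in.cols);

    const double pi = std::acos(-1.0);
    const double rad = degrees * pi / 180.0;
    const double c = std::cos(rad);
    const double s = std::sin(rad);
    const double centreRow = (in.rows - 1) / 2.0;
    const double centreCol = (in.cols - 1) / 2.0;

    GrayImage out{in.rows, in.cols, std::vector<int>(in.pixels.size(), 0)};

    for (int r = 0; r < out.rows; ++r) {
        for (int col = 0; col < out.cols; ++col) {
            // Inverse mapping: find the input pixel that lands on (r, col).
            const double dr = r - centreRow;
            const double dc = col - centreCol;
            const int srcRow = static_cast<int>(std::lround(centreRow + dr * c + dc * s));
            const int srcCol = static_cast<int>(std::lround(centreCol - dr * s + dc * c));

            // Each iteration reads only from `in` and writes one element of `out`,
            // so iterations are independent of one another.
            if (srcRow >= 0 && srcRow < in.rows && srcCol >= 0 && srcCol < in.cols) {
                out.pixels[r * out.cols + col] = in.pixels[srcRow * in.cols + srcCol];
            }
        }
    }
    return out;
}
```

A call such as rotate(img, 45.0) returns a new image of the same size. Because no iteration reads what another one writes, the two loops could in principle be mapped onto one GPU thread per output pixel.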

Example of profiling with a PGM image of size 1.8 MB and with different options

Options(1.8).jpg

I decided to enable all of the options when processing the file: rotate, translate, shrink, enlarge, reflect, and mean grey. What I discovered is that the program aborts after a while. After trying different images, I realized that image size matters: the program can't perform every operation on a large image of around 30 MB, but it works fine with smaller images. The screenshot above shows the profile for the 1.8 MB image; the screenshot below shows the profile for the 4 MB image.


Example of profiling with a PGM image of size 4 MB and with different options

Options(4).jpg


Assignment 2

I am not sure whether we are doing this as team work or individually, but since I did not hear anything from my teammate, I decided to continue with my first assignment. I ran into many difficulties adapting the existing C++ code to move some computations to the GPU. In my first assignment I perform various manipulations on an image, and Image is a class. This is one of the reasons my second assignment was delayed: apparently, I can't pass a class to the kernel, which accepts only variables of primitive types. So all this time I was trying different approaches to parallelizing my code. I even bought a new computer with a CUDA-compatible GPU. Finally, I found that the method image.negate() would be easy to compute in a kernel, but then I ran into other difficulties. When my code was done, I was happy. I decided to run 1000 negates of the image so that parallelizing the code would make sense. I rewrote some of the code from my first assignment and redid its profile with 1000 negate operations; it took around 13 seconds on Linux. Then I profiled my kernel version with the CUDA profiler, and it took around 60 seconds. So here is my new profile for assignment 1: Profile.jpg
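The step of getting from a class to something a kernel can work on can be sketched in plain C++. This is only an illustration under assumptions: the Image class below is a hypothetical stand-in (the real class from the application is not shown here), and negateBuffer and negateRepeated are made-up names. The idea is to pull the raw pixel pointer, the pixel count, and the maximum grey value out of the object once, and then work only on those plain values, which is the shape of arguments a kernel would take.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the application's Image class.
class Image {
public:
    Image(int rows, int cols, int maxGray)
        : rows_(rows), cols_(cols), maxGray_(maxGray),
          pixels_(static_cast<std::size_t>(rows) * cols, 0) {}

    int rows() const { return rows_; }
    int cols() const { return cols_; }
    int maxGray() const { return maxGray_; }
    std::size_t size() const { return pixels_.size(); }
    int* data() { return pixels_.data(); }  // raw pixel buffer, row-major

private:
    int rows_;
    int cols_;
    int maxGray_;
    std::vector<int> pixels_;
};

// Kernel-shaped function: takes only a plain pointer, a length, and a scalar.
// On a GPU the loop body would run once per thread instead of in a loop.
void negateBuffer(int* pixels, std::size_t n, int maxGray) {
    assert(pixels != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        pixels[i] = maxGray - pixels[i];
    }
}

// Unpack the object once, then repeat the operation on the flat buffer.
void negateRepeated(Image& img, int times) {
    int* pixels = img.data();
    const std::size_t n = img.size();
    const int maxGray = img.maxGray();
    for (int t = 0; t < times; ++t) {
        negateBuffer(pixels, n, maxGray);
    }
}
```

Calling negateRepeated(img, 1000) corresponds to the 1000-negate experiment on the CPU. The page does not say what caused the 60 seconds in the GPU version. One thing that may be worth checking is whether the image is copied to and from the GPU on each of the 1000 iterations; copying it to the device once before the loop and back once after it would keep the transfer cost from being paid 1000 times.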

Assignment 3