BLAStoise

From CDOT Wiki

Revision as of 17:58, 12 February 2017


GPU610/DPS915 | Student List | Group and Project Index | Student Resources | Glossary

Project Name Goes here

Team Members

  1. Matt Babol, Sudoku
  2. Jonathan Desmond, Oil Painting
  3. Sallie Jiang, Some other responsibility


                      _
           _,..-"""--' `,.-".
         ,'      __.. --',  |
       _/   _.-"' |    .' | |       ____
 ,.-""'    `-"+.._|     `.' | `-..,',--.`.
|   ,.                      '    j 7    l \__
|.-'                            /| |    j||  .
`.                   |         / L`.`""','|\  \
  `.,----..._       ,'`"'-.  ,'   \ `""'  | |  l
    Y        `-----'       v'    ,'`,.__..' |   .
     `.                   /     /   /     `.|   |
       `.                /     l   j       ,^.  |L
         `._            L       +. |._   .' \|  | \
           .`--...__,..-'""'-._  l L  """    |  |  \
         .'  ,`-......L_       \  \ \     _.'  ,'.  l
      ,-"`. / ,-.---.'  `.      \  L..--"'  _.-^.|   l
.-"".'"`.  Y  `._'   '    `.     | | _,.--'"     |   |
 `._'   |  |,-'|      l     `.   | |"..          |   l
 ,'.    |  |`._'      |      `.  | |_,...---"""""`    L
/   |   j _|-' `.     L       | j ,|              |   |
`-,"._,-+' /`---^..../._____,.L',' `.             |\  |
  |,'      L                   |     `-.          | \j
           .                    \       `,        |  |
            \                __`.Y._      -.     j   |
             \           _.,'       `._     \    |  j
             ,-"`-----""""'           |`.    \  7   |
            /  `.        '            |  \    \ /   |
           |     `      /             |   \    Y    |
           |      \    .             ,'    |   L_.-')
            L      `.  |            /      ]     _.-^._
             \   ,'  `-7         ,-'      / |  ,'      `-._
            _,`._       `.   _,-'        ,',^.-            `.
         ,-'     v....  _.`"',          _:'--....._______,.-'
       ._______./     /',,-'"'`'--.  ,-'  `.
                """""`.,'         _\`----...'
                       --------""'

Progress

Assignment 1

Sudoku Solver by Matt B.

Sudoku solver is a program that solves a sudoku puzzle. The user can either supply an existing puzzle file for the program to solve or enter the values manually. The puzzle is a 9x9 grid, and the input must follow a specific format: 9 rows of 9 values, with the values in each row separated by spaces. A value of 0 marks an empty cell that the program must solve.
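To make the input format concrete, here is a minimal C++ sketch of reading such a file. The Grid type and the readPuzzle function are hypothetical and are not taken from the original source. Because operator>> skips all whitespace, this sketch checks that there are 81 values in the range 0-9 rather than enforcing the line layout.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream is convenient for feeding test input

// Hypothetical grid type: 9 rows of 9 cells, where 0 marks an empty cell.
using Grid = std::array<std::array<int, 9>, 9>;

// Reads 81 whitespace-separated values into grid, row by row.
// Returns false if a value is missing, non-numeric, or outside 0-9.
bool readPuzzle(std::istream& in, Grid& grid) {
    for (int r = 0; r < 9; ++r) {
        for (int c = 0; c < 9; ++c) {
            int value = 0;
            if (!(in >> value) || value < 0 || value > 9) {
                return false;
            }
            grid[r][c] = value;
        }
    }
    return true;
}
```

For example, `std::ifstream file("puzzle"); Grid grid; bool ok = readPuzzle(file, grid);` would load the easy puzzle below (this additionally needs `<fstream>`).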

Original source code can be found here.


Easy puzzle

To compile the program, open a terminal, go to the project's directory and run:

$ g++ -std=c++0x -pg solver.cpp checks.cpp checksolution.cpp -o Sudoku

This creates an executable file called Sudoku. The -pg flag instruments the executable for profiling: each time the program is run (with its arguments), it writes a gmon.out file that gprof then reads to produce the profile.

The first puzzle run through the program is the easy sudoku puzzle below. The file is saved as 'puzzle' in the same directory.

0 6 0 0 0 0 9 7 2
0 5 0 0 0 2 0 0 3
0 7 0 3 9 0 5 0 0
2 0 0 0 0 5 4 0 8
0 0 0 0 0 0 0 0 0
3 0 1 8 0 0 0 0 6
0 0 4 0 2 3 0 8 0
7 0 0 9 0 0 0 2 0
9 2 5 0 0 0 0 4 0

Run the code with

$ ./Sudoku puzzle

When the program finishes running, the result is:

1 6 3 4 5 8 9 7 2 
4 5 9 7 1 2 8 6 3 
8 7 2 3 9 6 5 1 4 
2 9 7 1 6 5 4 3 8 
5 8 6 2 3 4 1 9 7 
3 4 1 8 7 9 2 5 6 
6 1 4 5 2 3 7 8 9 
7 3 8 9 4 1 6 2 5 
9 2 5 6 8 7 3 4 1

Test Case

To profile the program, run this command.

$ gprof  -p -b ./Sudoku gmon.out > Sudoku.flt

The profiling result

Flat profile:

Each sample counts as 0.01 seconds.
no time accumulated

 %   cumulative   self              self     total           
time   seconds   seconds    calls  Ts/call  Ts/call  name    
 0.00      0.00     0.00     4539     0.00     0.00  checkRow(int, int)
 0.00      0.00     0.00     1620     0.00     0.00  checkColumn(int, int)
 0.00      0.00     0.00     1120     0.00     0.00  placeNum(int, int)
 0.00      0.00     0.00      698     0.00     0.00  checkSquare(int, int, int)
 0.00      0.00     0.00      476     0.00     0.00  goBack(int&, int&)
 0.00      0.00     0.00        2     0.00     0.00  print(int (*) [9])
 0.00      0.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_sudoku
 0.00      0.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_temp
 0.00      0.00     0.00        1     0.00     0.00  solveSudoku()
 0.00      0.00     0.00        1     0.00     0.00  storePositions()
 0.00      0.00     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
 0.00      0.00     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)


		Call graph
 
granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name
                0.00    0.00    4539/4539        placeNum(int, int) [10]
[8]      0.0    0.00    0.00    4539         checkRow(int, int) [8]
-----------------------------------------------
                0.00    0.00    1620/1620        placeNum(int, int) [10]
[9]      0.0    0.00    0.00    1620         checkColumn(int, int) [9]
-----------------------------------------------
                0.00    0.00    1120/1120        solveSudoku() [16]
[10]     0.0    0.00    0.00    1120         placeNum(int, int) [10]
                0.00    0.00    4539/4539        checkRow(int, int) [8]
                0.00    0.00    1620/1620        checkColumn(int, int) [9]
                0.00    0.00     698/698         checkSquare(int, int, int) [11]
-----------------------------------------------
                0.00    0.00     698/698         placeNum(int, int) [10]
[11]     0.0    0.00    0.00     698         checkSquare(int, int, int) [11]
-----------------------------------------------
                0.00    0.00     476/476         solveSudoku() [16]
[12]     0.0    0.00    0.00     476         goBack(int&, int&) [12]
-----------------------------------------------
                0.00    0.00       2/2           main [6]
[13]     0.0    0.00    0.00       2         print(int (*) [9]) [13]
-----------------------------------------------
                0.00    0.00       1/1           __libc_csu_init [30]
[14]     0.0    0.00    0.00       1         _GLOBAL__sub_I_sudoku [14]
                0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [18]
-----------------------------------------------
                0.00    0.00       1/1           __libc_csu_init [30]
[15]     0.0    0.00    0.00       1         _GLOBAL__sub_I_temp [15]
                0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
                0.00    0.00       1/1           main [6]
[16]     0.0    0.00    0.00       1         solveSudoku() [16]
                0.00    0.00    1120/1120        placeNum(int, int) [10]
                0.00    0.00     476/476         goBack(int&, int&) [12]
-----------------------------------------------
                0.00    0.00       1/1           main [6]
[17]     0.0    0.00    0.00       1         storePositions() [17]
-----------------------------------------------
                0.00    0.00       1/1           _GLOBAL__sub_I_sudoku [14]
[18]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [18]
-----------------------------------------------
                0.00    0.00       1/1           _GLOBAL__sub_I_temp [15]
[19]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
  
Index by function name

 [14] _GLOBAL__sub_I_sudoku  [16] solveSudoku()          [13] print(int (*) [9])
 [15] _GLOBAL__sub_I_temp    [17] storePositions()       [12] goBack(int&, int&)
  [9] checkColumn(int, int)  [18] __static_initialization_and_destruction_0(int, int) [8] checkRow(int, int)
 [11] checkSquare(int, int, int) [19] __static_initialization_and_destruction_0(int, int) [10] placeNum(int, int)


Analysis

The flat profile shows that the program solved this puzzle too quickly for gprof to accumulate any time ("no time accumulated"). The call counts are still informative: checkRow and checkColumn were called the most (4539 and 1620 times). These two functions check whether the row and column are correct. A harder puzzle is needed for a deeper analysis.
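The bodies of the original checkRow, checkColumn and checkSquare functions are not shown on this page, so the following is only a sketch of what such checks typically look like. The names rowAllows, columnAllows, boxAllows and canPlace are hypothetical. Each check scans up to 9 cells for a duplicate of the candidate value.

```cpp
#include <array>
#include <cassert>

using Grid = std::array<std::array<int, 9>, 9>;  // 0 = empty cell

// True if value does not already appear in the given row.
bool rowAllows(const Grid& g, int row, int value) {
    for (int c = 0; c < 9; ++c)
        if (g[row][c] == value) return false;
    return true;
}

// True if value does not already appear in the given column.
bool columnAllows(const Grid& g, int col, int value) {
    for (int r = 0; r < 9; ++r)
        if (g[r][col] == value) return false;
    return true;
}

// True if value does not already appear in the 3x3 box containing (row, col).
bool boxAllows(const Grid& g, int row, int col, int value) {
    const int r0 = row - row % 3;  // top-left corner of the box
    const int c0 = col - col % 3;
    for (int r = r0; r < r0 + 3; ++r)
        for (int c = c0; c < c0 + 3; ++c)
            if (g[r][c] == value) return false;
    return true;
}

// A value may be placed only if its row, column and box all allow it.
// && short-circuits, so the column and box checks only run when the earlier checks pass.
bool canPlace(const Grid& g, int row, int col, int value) {
    return rowAllows(g, row, value) && columnAllows(g, col, value) &&
           boxAllows(g, row, col, value);
}
```

Because the checks are chained in the order row, column, box, a later check only runs when the earlier ones pass. If the original placeNum is organised in a similar way, that would be consistent with checkRow being called more often than checkColumn, and checkColumn more often than checkSquare, but this is an inference rather than something the profile confirms.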


Hard puzzle

For the hard puzzle, the input file is shown below, followed by the result:

0 0 0 0 0 0 0 0 0 
0 0 0 0 0 3 0 8 5 
0 0 1 0 2 0 0 0 0 
0 0 0 5 0 7 0 0 0 
0 0 4 0 0 0 1 0 0 
0 9 0 0 0 0 0 0 0 
5 0 0 0 0 0 0 7 3 
0 0 2 0 1 0 0 0 0 
0 0 0 0 4 0 0 0 9 

9 8 7 6 5 4 3 2 1 
2 4 6 1 7 3 9 8 5 
3 5 1 9 2 8 7 4 6 
1 2 8 5 3 7 6 9 4 
6 3 4 8 9 2 1 5 7 
7 9 5 4 6 1 8 3 2 
5 1 9 2 8 6 4 7 3 
4 7 2 3 1 9 5 6 8 
8 6 3 7 4 5 2 1 9 

The profiling results are

Flat profile:

Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
45.34      7.76     7.76 622577597     0.00     0.00  checkRow(int, int)
18.63     10.94     3.19 223365661     0.00     0.00  checkColumn(int, int)
14.38     13.40     2.46 157353814     0.00     0.00  placeNum(int, int)
13.24     15.67     2.27 100608583     0.00     0.00  checkSquare(int, int, int)
 4.80     16.49     0.82 69175252     0.00     0.00  goBack(int&, int&)
 3.46     17.08     0.59        1     0.59    17.08  solveSudoku()
 0.29     17.13     0.05        1     0.05     0.05  _GLOBAL__sub_I_sudoku
 0.00     17.13     0.00        2     0.00     0.00  print(int (*) [9])
 0.00     17.13     0.00        1     0.00     0.00  _GLOBAL__sub_I_temp
 0.00     17.13     0.00        1     0.00     0.00  storePositions()
 0.00     17.13     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
 0.00     17.13     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)


		Call graph

granularity: each sample hit covers 2 byte(s) for 0.06% of 17.13 seconds 

index % time    self  children    called     name
                                                 <spontaneous>
[1]     99.7    0.00   17.08                 main [1]
                0.59   16.49       1/1           solveSudoku() [2]
                0.00    0.00       2/2           print(int (*) [9]) [16]
                0.00    0.00       1/1           storePositions() [18]
-----------------------------------------------
                0.59   16.49       1/1           main [1]
[2]     99.7    0.59   16.49       1         solveSudoku() [2]
                2.46   13.21 157353814/157353814     placeNum(int, int) [3]
                0.82    0.00 69175252/69175252     goBack(int&, int&) [7]
-----------------------------------------------
                2.46   13.21 157353814/157353814     solveSudoku() [2]
[3]     91.5    2.46   13.21 157353814         placeNum(int, int) [3]
                7.76    0.00 622577597/622577597     checkRow(int, int) [4]
                3.19    0.00 223365661/223365661     checkColumn(int, int) [5]
                2.27    0.00 100608583/100608583     checkSquare(int, int, int) [6]
-----------------------------------------------
                7.76    0.00 622577597/622577597     placeNum(int, int) [3]
[4]     45.3    7.76    0.00 622577597         checkRow(int, int) [4]
-----------------------------------------------
                3.19    0.00 223365661/223365661     placeNum(int, int) [3]
[5]     18.6    3.19    0.00 223365661         checkColumn(int, int) [5]
-----------------------------------------------
                2.27    0.00 100608583/100608583     placeNum(int, int) [3]
[6]     13.2    2.27    0.00 100608583         checkSquare(int, int, int) [6]
-----------------------------------------------
                0.82    0.00 69175252/69175252     solveSudoku() [2]
[7]      4.8    0.82    0.00 69175252         goBack(int&, int&) [7]
-----------------------------------------------
                0.05    0.00       1/1           __libc_csu_init [9]
[8]      0.3    0.05    0.00       1         _GLOBAL__sub_I_sudoku [8]
                0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
                                                 <spontaneous>
[9]      0.3    0.00    0.05                 __libc_csu_init [9]
                0.05    0.00       1/1           _GLOBAL__sub_I_sudoku [8]
                0.00    0.00       1/1           _GLOBAL__sub_I_temp [17]
-----------------------------------------------
                0.00    0.00       2/2           main [1]
[16]     0.0    0.00    0.00       2         print(int (*) [9]) [16]
-----------------------------------------------
                0.00    0.00       1/1           __libc_csu_init [9]
[17]     0.0    0.00    0.00       1         _GLOBAL__sub_I_temp [17]
                0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [20]
-----------------------------------------------
                0.00    0.00       1/1           main [1]
[18]     0.0    0.00    0.00       1         storePositions() [18]
-----------------------------------------------
                0.00    0.00       1/1           _GLOBAL__sub_I_sudoku [8]
[19]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
                0.00    0.00       1/1           _GLOBAL__sub_I_temp [17]
[20]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [20]
-----------------------------------------------

Index by function name

   [8] _GLOBAL__sub_I_sudoku   [2] solveSudoku()          [16] print(int (*) [9])
  [17] _GLOBAL__sub_I_temp    [18] storePositions()        [7] goBack(int&, int&)
   [5] checkColumn(int, int)  [19] __static_initialization_and_destruction_0(int, int) [4] checkRow(int, int)
   [6] checkSquare(int, int, int) [20] __static_initialization_and_destruction_0(int, int) [3] placeNum(int, int)

Analysis

With the harder puzzle, the time needed to solve the puzzle increased significantly: gprof recorded 17.13 seconds in total. The program spends almost half of its running time (45.3%) in checkRow, which checks whether the row is correct, and another 18.6% in checkColumn, which checks whether the column is correct. These checks are called from placeNum hundreds of millions of times (about 623 million calls to checkRow and 223 million calls to checkColumn), which is why the program would be an excellent project for parallelizing.
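One common way to reduce the cost of these checks, independent of parallelization, is to keep a bitmask of the digits already used in each row, column and 3x3 box, so that checking a candidate becomes a few bitwise operations instead of a scan of 9 cells. The sketch below applies this technique to a simple recursive backtracking solver. It is a swapped-in approach, not the original program's solveSudoku/placeNum/goBack logic; solveWithMasks, backtrack and Masks are hypothetical names, and its speed on the hard puzzle has not been measured here.

```cpp
#include <array>
#include <cassert>

using Grid = std::array<std::array<int, 9>, 9>;  // 0 = empty cell

// Bit v of rowUsed[r] is set when digit v already appears in row r;
// colUsed and boxUsed work the same way for columns and 3x3 boxes.
struct Masks {
    std::array<unsigned, 9> rowUsed{}, colUsed{}, boxUsed{};
};

int boxIndex(int r, int c) { return (r / 3) * 3 + c / 3; }

// Recursive backtracking over the cells in row-major order; pos = r * 9 + c.
bool backtrack(Grid& g, Masks& m, int pos) {
    if (pos == 81) return true;  // every cell is filled
    const int r = pos / 9, c = pos % 9, b = boxIndex(r, c);
    if (g[r][c] != 0) return backtrack(g, m, pos + 1);  // given digit: skip it
    for (int v = 1; v <= 9; ++v) {
        const unsigned bit = 1u << v;
        if ((m.rowUsed[r] | m.colUsed[c] | m.boxUsed[b]) & bit) continue;  // conflict
        g[r][c] = v;
        m.rowUsed[r] |= bit; m.colUsed[c] |= bit; m.boxUsed[b] |= bit;
        if (backtrack(g, m, pos + 1)) return true;
        // Undo the placement and try the next digit.
        g[r][c] = 0;
        m.rowUsed[r] &= ~bit; m.colUsed[c] &= ~bit; m.boxUsed[b] &= ~bit;
    }
    return false;  // no digit fits here: the caller tries its next candidate
}

// Builds the masks from the given digits, then solves in place.
// Returns false if the givens conflict or the puzzle has no solution.
bool solveWithMasks(Grid& g) {
    Masks m;
    for (int r = 0; r < 9; ++r) {
        for (int c = 0; c < 9; ++c) {
            if (int v = g[r][c]) {
                const unsigned bit = 1u << v;
                const int b = boxIndex(r, c);
                if ((m.rowUsed[r] | m.colUsed[c] | m.boxUsed[b]) & bit) return false;
                m.rowUsed[r] |= bit; m.colUsed[c] |= bit; m.boxUsed[b] |= bit;
            }
        }
    }
    return backtrack(g, m, 0);
}
```

In this sketch every branch shares one grid and one set of masks. If each thread instead worked on its own copy, the candidate digits for an early empty cell could be explored by different threads in parallel; whether that pays off would need to be measured.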



Oil Painting by Sallie J.

This program uses OpenCV to convert a regular image into a stylized oil painting. The painting algorithm depends on the brush size and the colour intensity. The program takes three command-line arguments: an integer brush size, an integer intensity and the file name of an image. When it finishes, the program outputs the original image along with the oil-painted version and the total time required in seconds.

The original source code can be found here.

However, some changes were made to make testing and profiling slightly easier. Mainly, the for-loop logic was moved out of main into a separate function, and the program was modified to take command-line arguments instead of hard-coded values.

Running the program

To compile the program on Linux, download the OpenCV library and create a makefile that builds the executable OilPaint.exe. To compile the program in Visual Studio, set the project properties for OpenCV: C/C++ > General > Additional Include Directories, Linker > General > Additional Library Directories and Linker > Input > Additional Dependencies (opencv_world320d.lib).

Run the executable with three arguments: the brush size (5 in this example), the colour intensity (20) and the image file name, including its file extension:

$ "OilPaint" 5 20 filename.jpg
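Since the refactored program reads its settings from the command line, some validation of argv is useful. The following is a minimal sketch; the PaintOptions struct and the parseArgs function are hypothetical, and the requirement that both numbers be positive is an assumption rather than a documented constraint of the original program.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical holder for the three command-line arguments.
struct PaintOptions {
    int brushSize = 0;
    int intensity = 0;
    std::string fileName;
};

// Parses argv = {program, brushSize, intensity, fileName}.
// Returns false, leaving opts untouched, if the argument count is wrong
// or either number is not a whole positive integer.
bool parseArgs(int argc, char* argv[], PaintOptions& opts) {
    if (argc != 4) return false;
    char* end = nullptr;
    const long brush = std::strtol(argv[1], &end, 10);
    if (*end != '\0' || brush < 1) return false;  // rejects "5x", "" and non-positive values
    const long levels = std::strtol(argv[2], &end, 10);
    if (*end != '\0' || levels < 1) return false;
    opts.brushSize = static_cast<int>(brush);
    opts.intensity = static_cast<int>(levels);
    opts.fileName = argv[3];
    return true;
}
```

In main, `PaintOptions opts; if (!parseArgs(argc, argv, opts)) { /* print usage and exit */ }` would reject input such as `5x` for the brush size before any image processing starts.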

Output:

Images: t2.jpg (original) and OilVersion-t2.jpg (oil-painted version)

gprof Output:

 Flat profile:
 Each sample counts as 0.01 seconds.
   %   cumulative   self                 self     total           
  time   seconds   seconds      calls   s/call   s/call  name    
  79.79     20.17    20.17          1    20.17    24.77  paint(cv::Mat&, cv::Mat&, int, int, int, int)
  12.53     23.34     3.17 1742140400     0.00     0.00  cv::Vec<unsigned char, 3>& cv::Mat::at<cv::Vec<unsigned char, 3> >(int, int)
   5.62     24.76     1.42 1737354300     0.00     0.00  cv::Vec<unsigned char, 3>::operator[](int)
   1.94     25.25     0.49          1     0.49     0.49  cv::Size_<int>::Size_(int, int)
   ... (there are a lot of other calls to the library that did not significantly affect the profiling)

Visual Studio Performance Profiler output: VSPROFILE.png

The running time depends largely on the size of the file being converted: around 5 seconds for a 50KB image and 100 seconds for a 1MB image. It also depends on the brush size and the intensity level.
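The dependence on brush size can be estimated from the bounds of the simplified loop in the Analysis section: the two outer loops skip a border of BrushSize pixels on each side, and the two inner loops visit a (2B + 1) x (2B + 1) neighbourhood. For a W x H image with brush size B, the innermost body therefore runs (W - 2B)(H - 2B)(2B + 1)^2 times. The helper below is hypothetical and only computes that count. File size is just a rough proxy for W x H, since JPEG compression ratios vary from image to image.

```cpp
#include <cassert>
#include <cstdint>

// Number of innermost-loop iterations of the simplified paint loop:
// rows and columns within brushSize of the border are skipped, and each
// remaining pixel visits a (2*brushSize+1) x (2*brushSize+1) neighbourhood.
std::uint64_t innerIterations(std::uint64_t width, std::uint64_t height,
                              std::uint64_t brushSize) {
    if (width <= 2 * brushSize || height <= 2 * brushSize) return 0;  // no interior pixels
    const std::uint64_t window = 2 * brushSize + 1;
    return (width - 2 * brushSize) * (height - 2 * brushSize) * window * window;
}
```

For a 1000 x 1000 image, innerIterations(1000, 1000, 5) is 118,592,100 while innerIterations(1000, 1000, 10) is 423,536,400, so doubling the brush size makes the inner loop body run about 3.6 times as often.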

Analysis

The profiling revealed that almost all of the processing time is spent in the paint function, where the for-loop logic is located: paint and the functions it calls account for 24.77 of the 25.25 seconds (about 98%), and the body of paint itself accounts for 79.79%. Of the time paint spends in called functions, roughly 2/3 (3.17 seconds) goes to reading pixel data through the "at" function of the OpenCV Mat class (n-dimensional dense array class), and roughly 1/3 (1.42 seconds) goes to element access through the subscript operator of OpenCV's Vec class (short numerical vectors). The loop visits every pixel that lies at least BrushSize pixels from the image border. For each of these pixels, it examines the surrounding block of pixels determined by the brush size and finds the colour of each pixel in that block. Finally, it averages the intensity to produce the final colour. This is what makes the program ideal for parallelizing: each iteration of the outer loops calculates the final colour of a single pixel. (This is a SIMD type of process: the single instruction is finding the final colour, and the multiple data are the pixels.)

 //Simplified for-loop structure of paint()
 for (int y = BrushSize; y < (height - BrushSize); y++) //for each row, skipping a border of BrushSize pixels
 {
 	for (int x = BrushSize; x < (width - BrushSize); x++) //for each column, skipping a border of BrushSize pixels
 	{
 		for (int j = -BrushSize; j <= BrushSize; j++) //for each row of the neighbourhood around (x, y)
 		{
 			for (int i = -BrushSize; i <= BrushSize; i++) //for each column of the neighbourhood around (x, y)
 			{
 				//algorithm and logic for colour calculations
 			}
 		}
 	}
 }
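The colour-calculation body is elided in the simplified loop above. The sketch below fills it in with one common formulation of the oil-paint effect: bucket the neighbourhood by intensity level, pick the most frequent level, and output that bucket's average colour. The original source may compute the colour differently, and treating the intensity argument as the number of levels is an assumption. The sketch works on a hypothetical plain RGB buffer (Image, oilPaint) rather than cv::Mat, reading pixels through pointer arithmetic on contiguous memory instead of a per-element accessor. In OpenCV, the analogous change would be to fetch a row pointer with Mat::ptr once per row instead of calling Mat::at for every access, although any speed-up would need to be confirmed by re-profiling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type: interleaved 8-bit RGB, row-major, width * height * 3 bytes.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> rgb;

    // Pointer to the 3 bytes of pixel (x, y) in the contiguous buffer.
    std::uint8_t* pixel(int x, int y) {
        return &rgb[(static_cast<std::size_t>(y) * width + x) * 3];
    }
    const std::uint8_t* pixel(int x, int y) const {
        return &rgb[(static_cast<std::size_t>(y) * width + x) * 3];
    }
};

// For every pixel at least brushSize from the border: bucket the neighbourhood by
// intensity level, pick the most frequent level, write that bucket's average colour.
void oilPaint(const Image& src, Image& dst, int brushSize, int levels) {
    assert(brushSize >= 0 && levels >= 1);
    dst = src;  // border pixels are not visited by the loop and keep their colour
    std::vector<int> count(levels);
    std::vector<long> sumR(levels), sumG(levels), sumB(levels);

    for (int y = brushSize; y < src.height - brushSize; ++y) {
        for (int x = brushSize; x < src.width - brushSize; ++x) {
            std::fill(count.begin(), count.end(), 0);
            std::fill(sumR.begin(), sumR.end(), 0L);
            std::fill(sumG.begin(), sumG.end(), 0L);
            std::fill(sumB.begin(), sumB.end(), 0L);

            for (int j = -brushSize; j <= brushSize; ++j) {
                for (int i = -brushSize; i <= brushSize; ++i) {
                    const std::uint8_t* p = src.pixel(x + i, y + j);
                    const int intensity = (p[0] + p[1] + p[2]) / 3;  // 0..255
                    const int level = intensity * (levels - 1) / 255;  // 0..levels-1
                    ++count[level];
                    sumR[level] += p[0];
                    sumG[level] += p[1];
                    sumB[level] += p[2];
                }
            }

            int best = 0;  // most frequent intensity level in the neighbourhood
            for (int k = 1; k < levels; ++k) {
                if (count[k] > count[best]) best = k;
            }
            // count[best] >= 1 because the neighbourhood always contains at least one pixel.
            std::uint8_t* out = dst.pixel(x, y);
            out[0] = static_cast<std::uint8_t>(sumR[best] / count[best]);
            out[1] = static_cast<std::uint8_t>(sumG[best] / count[best]);
            out[2] = static_cast<std::uint8_t>(sumB[best] / count[best]);
        }
    }
}
```

In this sketch each output pixel is written once and depends only on src, so the rows of the y loop could be split across CPU threads or mapped to GPU threads without synchronisation between pixels.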

Assignment 2

Assignment 3