Difference between revisions of "0xCAFEBABE"

From CDOT Wiki
Jump to: navigation, search
(Team Members)
Line 113: Line 113:
 
  11.06    19.30    2.13 334007584    0.00    0.00  swap(int*, int*)
 
  11.06    19.30    2.13 334007584    0.00    0.00  swap(int*, int*)
 
   0.00    19.30    0.00        2    0.00    0.00  printArray(int*, int)
 
   0.00    19.30    0.00        2    0.00    0.00  printArray(int*, int)
 
- Shashank Polugari
 
 
== Profile 5: Rotation Program ==
 
Flat profile:
 
 
Each sample counts as 0.01 seconds.
 
  %  cumulative  self              self    total         
 
time  seconds  seconds    calls  Ts/call  Ts/call  name   
 
99.70      3.33    3.33                            rotate(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&)
 
  0.30      3.34    0.01                            _ZNSt6vectorIiSaIiEE19_M_emplace_back_auxIJiEEEvDpOT_
 
  0.00      3.34    0.00        1    0.00    0.00  _GLOBAL__sub_I__Z6rotateRSt6vectorIS_IiSaIiEESaIS1_EE
 
  
 
- Theo Dule
 
- Theo Dule

Revision as of 15:47, 13 November 2015

0xCAFEBABE

Team Members

  1. Luv Kapur
  2. Martin Ristov
  3. Theo Dule
  4. Steven De Filippis

Email All

Progress

Assignment 1

Profile 1: Solving unsteady 1D heat transfer equation

For this project, the code analyzed was to solve a problem with constant value boundary condition.

The code was borrowed from Dr. Pengfei Du, and I had to split up his class calls into function calls, respectively, to help me better analyze it.

Flat profile: Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total          
time   seconds   seconds    calls  Ts/call  Ts/call  name   
 0.03      0.01     0.01        1     0.00     0.00  _GLOBAL__sub_I__Z5setupR14pdu1dsteadyExp
 99.78     1.03     1.02        1     1.01     1.01  _GLOBAL__sub_I__Z5solvingR15pdu1dsteadyExp


As shown, most of the time taken was in the solving function, which took care of calculating the wave. Here is the result in a graphical animation: http://pengfeidu.net/Media/gif/1d-animated.gif

- Martin Ristov

Profile 2: Solving a Rubiks Cube

My choice of an application to profile is a Rubiks cube solver. While a Rubiks cube is typically thought of as a 3x3 cube, you can exponentially increase the number of squares. By doing so, solving a much larger Rubiks cube on a typical CPU becomes useless.

I came across a github repo for a (GPL v3) Rubiks cube solver at: https://github.com/brownan/Rubiks-Cube-Solver

I compiled Brown's code with profiling flags and got some interesting results.

Flat profile:

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
44.24    301.99   301.99 1768367109     0.00     0.00  cube_turn
41.94    588.30   286.31 1768367110     0.00     0.00  edge_hash2
13.51    680.53    92.24        1    92.24   682.02  edge_generate
 0.10    681.24     0.71 132162051     0.00     0.00  stack_push
 0.06    681.67     0.43                             cube_120convert
 0.05    682.03     0.36                             edge_hash1
 0.04    682.33     0.31 132162010     0.00     0.00  stack_peek_cube
 0.03    682.52     0.19 132162010     0.00     0.00  stack_peek_distance
 0.03    682.70     0.18 132162051     0.00     0.00  stack_pop
 0.02    682.81     0.11 132162010     0.00     0.00  stack_peek_turn
 0.00    682.82     0.02                             _GLOBAL__sub_I_65535_0_cube_solved
 0.00    682.82     0.00        1     0.00     0.00  edge_write
 0.00    682.82     0.00        1     0.00   682.02  make_edge

cube_turn and edge_hash2 are the top two hotspots.

cube_turn - https://github.com/brownan/Rubiks-Cube-Solver/blob/master/cube.c#L181

edge_hash2 - https://github.com/brownan/Rubiks-Cube-Solver/blob/master/edgetable.c#L227


I feel like either two of these hotspots would be an excellent opportunity to parallelize.

~ Steven


Profile 3: Solving a Maze

I decided to profile an application which generates a maze according to the user provided rows and columns and then provides with the solution for the generated maze. As a maze can quickly rise in complexity when dealing with larger number of rows and columns, I believe that this type of an application could be a perfect fit to parallelize and make use of a much powerful GPU.

The above application has been taken form the following github repo : https://github.com/Syntaf/MazeSolver


Flat Profile :

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
99.97     13.00    13.00                             mazeGen::printMazeData(std::string) const
 0.08     13.01     0.01        1    10.00    10.00  AStar::findPath(int const&, int const&, int, int const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >)
 0.00     13.01     0.00   159198     0.00     0.00  disjointSets::setFind(int)
 0.00     13.01     0.00    70742     0.00     0.00  void std::__adjust_heap<__gnu_cxx::__normal_iterator<node*, std::vector<node, std::allocator<node> > >, long, node, std::less<node> >(__gnu_cxx::__normal_iterator<node*, std::vector<node, std::allocator<node> > >, long, long, node, std::less<node>)
 0.00     13.01     0.00    39999     0.00     0.00  disjointSets::setUnion(int, int)
 0.00     13.01     0.00     4433     0.00     0.00  void std::vector<int, std::allocator<int> >::_M_emplace_back_aux<int>(int&&)
 0.00     13.01     0.00      803     0.00     0.00  void std::vector<char, std::allocator<char> >::emplace_back<char>(char&&) [clone .constprop.131]
 0.00     13.01     0.00       10     0.00     0.00  void std::vector<std::vector<char, std::allocator<char> >, std::allocator<std::vector<char, std::allocator<char> > > >::_M_emplace_back_aux<std::vector<char, std::allocator<char> > const&>(std::vector<char, std::allocator<char> > const&)
 0.00     13.01     0.00        9     0.00     0.00  void std::vector<node, std::allocator<node> >::_M_emplace_back_aux<node const&>(node const&)
 0.00     13.01     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN12disjointSetsC2Ei
 0.00     13.01     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN4node14updatePriorityERKiS1_
 0.00     13.01     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
 0.00     13.01     0.00        1     0.00     0.00  disjointSets::disjointSets(int)
 0.00     13.01     0.00        1     0.00     0.00  disjointSets::~disjointSets()
 0.00     13.01     0.00        1     0.00     0.00  AStar::makeReadyMap(std::vector<std::vector<char, std::allocator<char> >, std::allocator<std::vector<char, std::allocator<char> > > > const&)

As you can see the maximum time spent by the CPU is while generating the maze, due to a number of nested loops which evaluate heavy computation of the maze data

- Luv

Profile 4: Color Sorter

Flat profile:

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
49.79      9.61     9.61                             main
39.15     17.16     7.55        1     7.55     9.69  sortColors(int*, int)
11.06     19.30     2.13 334007584     0.00     0.00  swap(int*, int*)
 0.00     19.30     0.00        2     0.00     0.00  printArray(int*, int)

- Theo Dule

Assignment 2

Assignment 3