# 0xCAFEBABE

## Profile 1: Solving unsteady 1D heat transfer equation

For this project, the code analyzed was to solve a problem with constant value boundary condition.

The code was borrowed from Dr. Pengfei Du, and I had to split up his class calls into function calls, respectively, to help me better analyze it.

Flat profile: Each sample counts as 0.01 seconds.

``` %   cumulative   self              self     total
time   seconds   seconds    calls  Ts/call  Ts/call  name
0.03      0.01     0.01        1     0.00     0.00  _GLOBAL__sub_I__Z5setupR14pdu1dsteadyExp
99.78     1.03     1.02        1     1.01     1.01  _GLOBAL__sub_I__Z5solvingR15pdu1dsteadyExp
```

As shown, most of the time taken was in the solving function, which took care of calculating the wave. Here is the result in a graphical animation: http://pengfeidu.net/Media/gif/1d-animated.gif

- Martin Ristov

## Profile 2: Solving a Rubiks Cube

My choice of an application to profile is a Rubiks cube solver. While a Rubiks cube is typically thought of as a 3x3 cube, you can exponentially increase the number of squares. By doing so, solving a much larger Rubiks cube on a typical CPU becomes useless.

I came across a github repo for a (GPL v3) Rubiks cube solver at: https://github.com/brownan/Rubiks-Cube-Solver

I compiled Brown's code with profiling flags and got some interesting results.

Flat profile:

Each sample counts as 0.01 seconds.

``` %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
44.24    301.99   301.99 1768367109     0.00     0.00  cube_turn
41.94    588.30   286.31 1768367110     0.00     0.00  edge_hash2
13.51    680.53    92.24        1    92.24   682.02  edge_generate
0.10    681.24     0.71 132162051     0.00     0.00  stack_push
0.06    681.67     0.43                             cube_120convert
0.05    682.03     0.36                             edge_hash1
0.04    682.33     0.31 132162010     0.00     0.00  stack_peek_cube
0.03    682.52     0.19 132162010     0.00     0.00  stack_peek_distance
0.03    682.70     0.18 132162051     0.00     0.00  stack_pop
0.02    682.81     0.11 132162010     0.00     0.00  stack_peek_turn
0.00    682.82     0.02                             _GLOBAL__sub_I_65535_0_cube_solved
0.00    682.82     0.00        1     0.00     0.00  edge_write
0.00    682.82     0.00        1     0.00   682.02  make_edge
```

cube_turn and edge_hash2 are the top two hotspots.

I feel like either two of these hotspots would be an excellent opportunity to parallelize.

~ Steven

## Profile 3: Solving a Maze

I decided to profile an application which generates a maze according to the user provided rows and columns and then provides with the solution for the generated maze. As a maze can quickly rise in complexity when dealing with larger number of rows and columns, I believe that this type of an application could be a perfect fit to parallelize and make use of a much powerful GPU.

The above application has been taken form the following github repo : https://github.com/Syntaf/MazeSolver

Flat Profile :

Each sample counts as 0.01 seconds.

``` %   cumulative   self              self     total
time   seconds   seconds    calls  ms/call  ms/call  name
99.97     13.00    13.00                             mazeGen::printMazeData(std::string) const
0.08     13.01     0.01        1    10.00    10.00  AStar::findPath(int const&, int const&, int, int const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >)
0.00     13.01     0.00   159198     0.00     0.00  disjointSets::setFind(int)
0.00     13.01     0.00    70742     0.00     0.00  void std::__adjust_heap<__gnu_cxx::__normal_iterator<node*, std::vector<node, std::allocator<node> > >, long, node, std::less<node> >(__gnu_cxx::__normal_iterator<node*, std::vector<node, std::allocator<node> > >, long, long, node, std::less<node>)
0.00     13.01     0.00    39999     0.00     0.00  disjointSets::setUnion(int, int)
0.00     13.01     0.00     4433     0.00     0.00  void std::vector<int, std::allocator<int> >::_M_emplace_back_aux<int>(int&&)
0.00     13.01     0.00      803     0.00     0.00  void std::vector<char, std::allocator<char> >::emplace_back<char>(char&&) [clone .constprop.131]
0.00     13.01     0.00       10     0.00     0.00  void std::vector<std::vector<char, std::allocator<char> >, std::allocator<std::vector<char, std::allocator<char> > > >::_M_emplace_back_aux<std::vector<char, std::allocator<char> > const&>(std::vector<char, std::allocator<char> > const&)
0.00     13.01     0.00        9     0.00     0.00  void std::vector<node, std::allocator<node> >::_M_emplace_back_aux<node const&>(node const&)
0.00     13.01     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN12disjointSetsC2Ei
0.00     13.01     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN4node14updatePriorityERKiS1_
0.00     13.01     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
0.00     13.01     0.00        1     0.00     0.00  disjointSets::disjointSets(int)
0.00     13.01     0.00        1     0.00     0.00  disjointSets::~disjointSets()
0.00     13.01     0.00        1     0.00     0.00  AStar::makeReadyMap(std::vector<std::vector<char, std::allocator<char> >, std::allocator<std::vector<char, std::allocator<char> > > > const&)
```

As you can see the maximum time spent by the CPU is while generating the maze, due to a number of nested loops which evaluate heavy computation of the maze data

- Luv

## Profile 4: Color Sorter

Flat profile:

Each sample counts as 0.01 seconds.

``` %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
49.79      9.61     9.61                             main
39.15     17.16     7.55        1     7.55     9.69  sortColors(int*, int)
11.06     19.30     2.13 334007584     0.00     0.00  swap(int*, int*)
0.00     19.30     0.00        2     0.00     0.00  printArray(int*, int)
```

- Theo Dule