# Changes

## Algo holics

, 00:23, 2 March 2019
Progress
<h5>SUDOKU PUZZLE SOLVER ====Sudoku Puzzle Solver by Gurpreet Singh</h5> ====
Is it a program that solves Sudoku puzzles(9X9) using Bruteforce algorithm. The user can either pass a Sudoku files as an input or enter the values manually. Moreover, the file or the manual entry must strictly have 9 rows and 9 columns in them. Last but not the least, all the cells must be separated by a space and the cells that needs to be solved must have 0 in them as their value.
The original source code can be found at [https://github.com/shafeeq/Sudoku Link]
<h5>=====Logic</h5>=====
In this program the Bruteforce algorithm first put 1 in the first cell. Then it moves to the second cell and put 1 in there and check if it satisfies all the rules and conditions. If it don't, then the algorithm will increment it's value to 2 and then check again. The value can change from 0-9 to find the correct value for a cell. If none of the value from the range of 0-9 satisfies the cell, then the program will iterate back and change the value of the first cell to 2 and then try the whole process again. In this way it will solve the puzzle.
<h5>=====Compiling the program</h5>=====
Enter the following commands:
<h5>=====Analysis</h5>=====
To analyze the call graph, enter the following command:
gprof -q -b a> a.clg
From the above Call graph we can see that for a harder Sudoku puzzle, the time increased significantly. Moreover, it can also be seen that almost 50% of the time is consumed by the checkRow function, 18.8% by checkColumn and finally 12% by the checkSquare function. Thousand of calls were made to these 3 functions, if we parallelizing these functions then the efficiency of the program can be increased significantly.
----
==== Simple Artificial Neural Network by Sukhbeer====
=====Introduction=====
I am very interested in neural networks and I started learning about them recently. I think this is a good opportunity to build on my knowledge of a NN while also parallelising it. For that purpose, I have selected a very basic neural network which feeds forward with ReLu and softmax and back-propagates on a sample batch from MNIST handwritten digits dataset. In each iteration, the weights are adjusted to train the network for better predictions.
The code performs matrix-multiplication(dot-product) each time that the activation vector and delta vector are calculated for the next node and the previous node respectively.

=====Source Code=====
Here is the [https://cognitivedemons.wordpress.com/2018/06/08/neural-network-in-c-part-2-mnist-handwritten-digits-dataset/ source code] used.
The result given below is the comparison of predictions as made by the trained network after 10000 iterations. The ground truth is the actual value of the labels between 0-9 (true for the corresponding digit in the dataset).

-----------------------------------------------Epoch 10000--------------------------------------------------
Predictions:
0.000848207 9.07445e-06 0.000145165 0.797735 4.94866e-06 0.19374 1.55013e-06 0.000244941 0.00657041 0.000700498
1.36476e-05 1.07548e-07 8.3835e-05 0.000744837 0.299883 9.37717e-05 3.53349e-05 0.00822595 0.00210021 0.688819
5.11556e-06 0.000616957 0.000233088 0.87458 2.20579e-05 0.0140489 5.03569e-08 0.000518445 0.0826038 0.0273714
0.0178851 3.64621e-08 0.0174107 0.000322792 0.716312 0.00120967 0.189534 0.00303238 0.00613965 0.0481543
7.40077e-07 0.96872 0.014224 0.00555447 2.56397e-05 0.000115577 0.000157107 0.00366156 0.00669771 0.000842866
7.37584e-05 0.00306397 0.0184482 0.056542 0.000217984 0.0807415 0.000430994 1.09367e-05 0.838792 0.00167921
1.23026e-05 1.10682e-09 6.47478e-07 0.000129503 1.28475e-05 1.20242e-05 1.18166e-09 0.953265 2.63176e-05 0.046541
0.974183 3.50241e-18 1.99895e-07 3.4534e-07 2.3755e-11 0.0257772 1.96811e-09 6.99407e-09 3.92052e-05 2.28711e-08
2.21581e-05 9.26954e-09 0.000182046 0.00336899 3.40876e-05 0.0800376 8.35955e-07 1.2496e-07 0.914781 0.00157335
8.59312e-07 4.1739e-05 0.000106891 0.000122639 0.00018295 4.02451e-05 7.21105e-07 0.898311 0.00405182 0.0971408

Ground truth:
0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1 0 0
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1 0 0
Loss 0.184251
--------------------------------------------End of Epoch :(------------------------------------------------

=====Profiling=====
Here are the results of profiling the program
{| class="wikitable mw-collapsible mw-collapsed"
! Flat profile
|-
|

Flat profile:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
97.98 1061.73 1061.73 dot(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&, int, int, int)
1.41 1076.95 15.23 transpose(float*, int, int)
0.16 1078.65 1.70 operator-(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
0.14 1080.13 1.48 operator*(float, std::vector<float, std::allocator<float> > const&)
0.12 1081.47 1.33 relu(std::vector<float, std::allocator<float> > const&)
0.08 1082.34 0.87 519195026 1.68 1.68 void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&)
0.07 1083.07 0.73 operator*(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
0.05 1083.63 0.56 reluPrime(std::vector<float, std::allocator<float> > const&)
0.03 1083.93 0.30 softmax(std::vector<float, std::allocator<float> > const&, int)
0.02 1084.14 0.21 442679 474.87 474.87 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&)
0.02 1084.31 0.17 13107321 12.98 12.98 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float const&>(float const&)
0.01 1084.45 0.14 operator/(std::vector<float, std::allocator<float> > const&, float)
0.01 1084.58 0.13 462000 281.67 281.67 void std::vector<std::string, std::allocator<std::string> >::_M_emplace_back_aux<std::string const&>(std::string const&)
0.01 1084.68 0.10 split(std::string const&, char)
0.00 1084.68 0.00 3 0.00 0.00 std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&)
0.00 1084.68 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z5printRKSt6vectorIfSaIfEEii

|}
{| class="wikitable mw-collapsible mw-collapsed"
! Call graph
|-
|

Call graph

granularity: each sample hit covers 2 byte(s) for 0.00% of 1084.68 seconds

index % time self children called name
<spontaneous>
[1] 97.9 1061.73 0.00 dot(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&, int, int, int) [1]
-----------------------------------------------
<spontaneous>
[2] 1.4 15.23 0.00 transpose(float*, int, int) [2]
-----------------------------------------------
<spontaneous>
[3] 0.2 1.70 0.00 operator-(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&) [3]
-----------------------------------------------
<spontaneous>
[4] 0.1 0.56 0.97 reluPrime(std::vector<float, std::allocator<float> > const&) [4]
0.82 0.00 491520000/519195026 void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&) [7]
0.15 0.00 310000/442679 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&) [11]
-----------------------------------------------
<spontaneous>
[5] 0.1 1.48 0.00 operator*(float, std::vector<float, std::allocator<float> > const&) [5]
-----------------------------------------------
<spontaneous>
[6] 0.1 1.33 0.01 relu(std::vector<float, std::allocator<float> > const&) [6]
0.00 0.00 307321/13107321 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float const&>(float const&) [12]
0.00 0.00 2075026/519195026 void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&) [7]
0.00 0.00 2679/442679 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&) [11]
-----------------------------------------------
0.00 0.00 2075026/519195026 relu(std::vector<float, std::allocator<float> > const&) [6]
0.04 0.00 25600000/519195026 softmax(std::vector<float, std::allocator<float> > const&, int) [9]
0.82 0.00 491520000/519195026 reluPrime(std::vector<float, std::allocator<float> > const&) [4]
[7] 0.1 0.87 0.00 519195026 void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&) [7]
-----------------------------------------------
<spontaneous>
[8] 0.1 0.73 0.00 operator*(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&) [8]
-----------------------------------------------
<spontaneous>
[9] 0.1 0.30 0.27 softmax(std::vector<float, std::allocator<float> > const&, int) [9]
0.17 0.00 12800000/13107321 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float const&>(float const&) [12]
0.06 0.00 130000/442679 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&) [11]
0.04 0.00 25600000/519195026 void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&) [7]
-----------------------------------------------
<spontaneous>
[10] 0.0 0.10 0.13 split(std::string const&, char) [10]
0.13 0.00 462000/462000 void std::vector<std::string, std::allocator<std::string> >::_M_emplace_back_aux<std::string const&>(std::string const&) [14]
-----------------------------------------------
0.00 0.00 2679/442679 relu(std::vector<float, std::allocator<float> > const&) [6]
0.06 0.00 130000/442679 softmax(std::vector<float, std::allocator<float> > const&, int) [9]
0.15 0.00 310000/442679 reluPrime(std::vector<float, std::allocator<float> > const&) [4]
[11] 0.0 0.21 0.00 442679 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&) [11]
-----------------------------------------------
0.00 0.00 307321/13107321 relu(std::vector<float, std::allocator<float> > const&) [6]
0.17 0.00 12800000/13107321 softmax(std::vector<float, std::allocator<float> > const&, int) [9]
[12] 0.0 0.17 0.00 13107321 void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float const&>(float const&) [12]
-----------------------------------------------
<spontaneous>
[13] 0.0 0.14 0.00 operator/(std::vector<float, std::allocator<float> > const&, float) [13]
-----------------------------------------------
0.13 0.00 462000/462000 split(std::string const&, char) [10]
[14] 0.0 0.13 0.00 462000 void std::vector<std::string, std::allocator<std::string> >::_M_emplace_back_aux<std::string const&>(std::string const&) [14]
-----------------------------------------------
0.00 0.00 3/3 random_vector(int) [28]
[22] 0.0 0.00 0.00 3 std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&) [22]
-----------------------------------------------
0.00 0.00 1/1 __libc_csu_init [38]
[23] 0.0 0.00 0.00 1 _GLOBAL__sub_I__Z5printRKSt6vectorIfSaIfEEii [23]
-----------------------------------------------
Index by function name

[23] _GLOBAL__sub_I__Z5printRKSt6vectorIfSaIfEEii (nn.cpp) [2] transpose(float*, int, int) [13] operator/(std::vector<float, std::allocator<float> > const&, float)
[1] dot(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&, int, int, int) [14] void std::vector<std::string, std::allocator<std::string> >::_M_emplace_back_aux<std::string const&>(std::string const&) [3] operator-(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
[6] relu(std::vector<float, std::allocator<float> > const&) [7] void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&) [8] operator*(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
[10] split(std::string const&, char) [11] void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&) [5] operator*(float, std::vector<float, std::allocator<float> > const&)
[9] softmax(std::vector<float, std::allocator<float> > const&, int) [12] void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float const&>(float const&)
[4] reluPrime(std::vector<float, std::allocator<float> > const&) [22] std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&)

|}

=====Analysis=====
The total execution time of the program is around 10 minutes. As is evident from profiling results, most of the execution time is taken up by the ''dot()'' function as it does matrix-matrix multiplication. This is the hotspot of the program that can be made efficient by doing this computation and other vector multiplications on the GPU.
----
=== Assignment 2 ===
=== Assignment 3 ===
57
edits