Pixels

== Part 2 ==
On Mr. Szalwinski's advice we started looking into how to invert a matrix using NVIDIA's cuSOLVER library. Our research showed that inverting a matrix amounts to solving a linear system of equations, and if the goal is to solve a system such as A*x = b, there are better ways of doing so: factoring A and back-substituting is both cheaper and more numerically stable than forming the inverse explicitly. <br/><br/>
The NVIDIA cuSOLVER library provides high-level API functions for common matrix factorization and triangular solve routines on dense matrices. It also provides routines for sparse matrix factorization, but those lie beyond the scope of this assignment. Our code includes 3 factorization methods on the GPU - LU, QR and Cholesky decomposition - as well as an LU decomposition on the CPU to benchmark the runtimes. <br/><br/>
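As a quick point of reference (our own minimal sketch, not the assignment code), every cuSOLVER dense routine used below operates through a <code>cusolverDnHandle_t</code>; the comments map the three factorizations to the routines involved.

<source lang="cpp">
// Minimal sketch of the cuSOLVER dense (Dn) API surface used in this section.
// LU       : cusolverDnDgetrf_bufferSize / cusolverDnDgetrf / cusolverDnDgetrs
// QR       : cusolverDnDgeqrf_bufferSize / cusolverDnDgeqrf / cusolverDnDormqr
// Cholesky : cusolverDnDpotrf_bufferSize / cusolverDnDpotrf / cusolverDnDpotrs
#include <cusolverDn.h>

int main() {
    cusolverDnHandle_t handle = nullptr;
    cusolverDnCreate(&handle);   // every dense routine takes this handle

    // ... allocate device matrices, query workspace, factorize, solve ...

    cusolverDnDestroy(handle);
    return 0;
}
</source>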
[[File:file9.png|500px]]<br/>
LU stands for Lower & Upper. <b>This method involves two steps:</b><br/>
1. Factoring the original matrix into two parts - a lower and an upper triangular matrix - such that multiplying them together reproduces the original matrix. Since the lower matrix has 1's on its diagonal, we only need to store everything below the diagonal. This means both the upper and lower matrices can be stored in the space of the original matrix, although extra workspace is needed for the calculations.<br/>
2. With the two triangular matrices we can solve the system by forward-backward substitution (or by Gaussian elimination).<br/><br/>
[[File:file10.png|800px]] <br/><br/>
The code above (PART 1) creates an NxN matrix and two vectors of size N in order to solve for b1 in A*b1 = b. It then uses a cblas level-2 function to perform the matrix-vector multiplication that produces the right-hand side b.<br/><br/>
[[File:file11.png|750px]] <br/><br/>
The code above (PART 2) allocates memory on the device, and on line 54 it queries the workspace size required for the factorization, even though the end result is stored back in the original matrix. On line 61 the function <code>cusolverDnDgetrf()</code> takes the d_A matrix and decomposes it into two triangular matrices with partial pivoting (only rows are exchanged). On line 62 <code>cusolverDnDgetrs()</code> solves the triangular system using the two triangular matrices stored in d_A and the right-hand side we computed earlier in d_B. The output of this function is a vector which is placed in d_B. At this point the resulting vector d_B should match the B1 vector initialized on line 29, which is how we verify the system was solved correctly (a hedged code sketch of this workflow is included near the end of this page). <br/><br/>
A similar decomposition workflow is used for QR and Cholesky.
----
=== LU decomposition on the CPU, with O(n^3) runtime ===
===== Runner =====
[[File:file12.png|600px]] <br/><br/>
===== Solvers =====
[[File:file13.png|700px]] <br/><br/>
----
=== Run Times ===
We decided to include information about the GPU in our final output because we tested on different graphics cards.<br/>
[[File:file14.png|700px]][[File:file15.png|700px]] <br/><br/>
Looking at the matrix size of 10,000 x 10,000, the LU decomposition on the GPU runs about <b>39 times</b> faster than on the CPU, which is a substantial improvement in runtime. <br/><br/>
----
It is also interesting to note the difference in runtime between the LU, QR and Cholesky methods. <br/>
===== QR =====
Uses: ...
===== Cholesky =====
Uses a symmetric positive-definite matrix that satisfies the relationship A = A^T, which means the data is mirrored across the diagonal; since only one triangle of the matrix has to be factored, it runs roughly twice as fast as LU.
----
<br/><br/>
The complete code can be found at the Github repo [https://github.com/YuriyKartuzov/cuSOLVER HERE].
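----
Since the PART 1 / PART 2 listings above are screenshots, here is a self-contained sketch of the same LU workflow under our own assumptions (column-major storage, double precision, a tiny N, and error checking omitted). Variable names such as <code>d_A</code> and <code>d_B</code> mirror the description above, but this is not the repository code.

<source lang="cpp">
// Sketch of the LU solve path described above: build A and b1 on the host,
// form b = A * b1 with a level-2 BLAS call, then recover b1 on the GPU with
// cusolverDnDgetrf (factorization) and cusolverDnDgetrs (triangular solves).
#include <cblas.h>
#include <cuda_runtime.h>
#include <cusolverDn.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const int N = 4;
    std::vector<double> A(N * N), b1(N), b(N, 0.0);
    for (int i = 0; i < N * N; ++i)                        // random A, boosted diagonal
        A[i] = rand() / double(RAND_MAX) + (i % (N + 1) == 0 ? N : 0.0);
    for (int i = 0; i < N; ++i) b1[i] = i + 1.0;           // known solution

    // PART 1: b = A * b1 (column-major, BLAS level 2)
    cblas_dgemv(CblasColMajor, CblasNoTrans, N, N, 1.0, A.data(), N,
                b1.data(), 1, 0.0, b.data(), 1);

    // PART 2: device buffers for the matrix, right-hand side, pivots and status
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);
    double *d_A, *d_B, *d_work;
    int *d_piv, *d_info, lwork = 0;
    cudaMalloc((void**)&d_A, N * N * sizeof(double));
    cudaMalloc((void**)&d_B, N * sizeof(double));
    cudaMalloc((void**)&d_piv, N * sizeof(int));
    cudaMalloc((void**)&d_info, sizeof(int));
    cudaMemcpy(d_A, A.data(), N * N * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, b.data(), N * sizeof(double), cudaMemcpyHostToDevice);

    // workspace query, then in-place LU factorization with partial pivoting
    cusolverDnDgetrf_bufferSize(handle, N, N, d_A, N, &lwork);
    cudaMalloc((void**)&d_work, lwork * sizeof(double));
    cusolverDnDgetrf(handle, N, N, d_A, N, d_work, d_piv, d_info);

    // forward/backward substitution; the solution overwrites d_B
    cusolverDnDgetrs(handle, CUBLAS_OP_N, N, 1, d_A, N, d_piv, d_B, N, d_info);

    std::vector<double> x(N);
    cudaMemcpy(x.data(), d_B, N * sizeof(double), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        std::printf("x[%d] = %f (expected %f)\n", i, x[i], b1[i]);  // should match b1

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_piv); cudaFree(d_info); cudaFree(d_work);
    cusolverDnDestroy(handle);
    return 0;
}
</source>

This would be compiled with something like <code>nvcc lu_sketch.cu -lcusolver -lcblas</code>, assuming a CBLAS implementation such as OpenBLAS is installed.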
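For the CPU baseline, a plain O(n^3) Doolittle-style factorization without pivoting, followed by forward and backward substitution, looks roughly like this (again only a sketch, not the Runner/Solvers code shown above).

<source lang="cpp">
// Sketch of an O(n^3) LU decomposition on the CPU (Doolittle form, no
// pivoting) followed by forward/backward substitution.
#include <cstdio>
#include <vector>

// Factor A (n x n, row-major) in place: multipliers of L below the diagonal,
// U on and above it.
void luDecompose(std::vector<double>& A, int n) {
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < n; ++i) {
            A[i * n + k] /= A[k * n + k];                  // multiplier l_ik
            for (int j = k + 1; j < n; ++j)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
}

// Solve L*y = b (forward substitution), then U*x = y (backward substitution);
// the solution is returned in b.
void luSolve(const std::vector<double>& A, std::vector<double>& b, int n) {
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j)
            b[i] -= A[i * n + j] * b[j];
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j)
            b[i] -= A[i * n + j] * b[j];
        b[i] /= A[i * n + i];
    }
}

int main() {
    const int n = 3;
    std::vector<double> A = {4, 1, 2,  1, 5, 3,  2, 3, 6};
    std::vector<double> b = {7, 9, 11};                     // A * (1,1,1)^T
    luDecompose(A, n);
    luSolve(A, b, n);
    for (int i = 0; i < n; ++i) std::printf("%f\n", b[i]);  // expect 1 1 1
    return 0;
}
</source>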
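The Cholesky path follows the same handle/workspace/factorize/solve pattern but touches only one triangle of the matrix. The sketch below assumes <code>d_A</code> already holds a symmetric positive-definite matrix on the device and <code>d_B</code> the right-hand side, as in the LU sketch; the function name is our own.

<source lang="cpp">
// Sketch of the Cholesky solve path on the GPU: only one triangle of the
// symmetric positive-definite matrix is factored (A = L * L^T), which is why
// it finishes in roughly half the time of LU.
#include <cuda_runtime.h>
#include <cusolverDn.h>

void choleskySolve(cusolverDnHandle_t handle, double* d_A, double* d_B, int n) {
    int lwork = 0;
    int* d_info = nullptr;
    double* d_work = nullptr;
    cudaMalloc((void**)&d_info, sizeof(int));

    // workspace query + in-place factorization of the lower triangle
    cusolverDnDpotrf_bufferSize(handle, CUBLAS_FILL_MODE_LOWER, n, d_A, n, &lwork);
    cudaMalloc((void**)&d_work, lwork * sizeof(double));
    cusolverDnDpotrf(handle, CUBLAS_FILL_MODE_LOWER, n, d_A, n, d_work, lwork, d_info);

    // forward/backward substitution against L and L^T; solution overwrites d_B
    cusolverDnDpotrs(handle, CUBLAS_FILL_MODE_LOWER, n, 1, d_A, n, d_B, n, d_info);

    cudaFree(d_work);
    cudaFree(d_info);
}
</source>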