Profiling results
There is a stark difference between the two versions. The parallel code performs better than we expected and appears to be very efficient.
Assignment 3
For further optimization, the code could use CUDA streams, which would allow the slices to be processed concurrently.
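As a rough illustration of the stream idea, the sketch below gives each slice its own CUDA stream, so that the copy and kernel work for one slice can overlap with the work for another. It is not taken from our code: the kernel processSlice, the slice count, and the slice length are hypothetical placeholders, and the data layout (slices stored contiguously in one buffer) is an assumption. Whether any overlap actually happens depends on the device and on the host buffer being pinned.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical per-slice kernel: a stand-in for the real per-slice work.
__global__ void processSlice(float *slice, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        slice[i] = 2.0f * slice[i];
    }
}

int main()
{
    const int numSlices = 4;        // assumed number of slices
    const int sliceLen  = 1 << 20;  // assumed elements per slice
    const size_t sliceBytes = sliceLen * sizeof(float);

    float *hData = nullptr;
    float *dData = nullptr;
    // Pinned (page-locked) host memory lets cudaMemcpyAsync run asynchronously.
    CUDA_CHECK(cudaMallocHost(&hData, numSlices * sliceBytes));
    CUDA_CHECK(cudaMalloc(&dData, numSlices * sliceBytes));
    for (int i = 0; i < numSlices * sliceLen; ++i) {
        hData[i] = 1.0f;
    }

    // One stream per slice.
    cudaStream_t streams[numSlices];
    for (int s = 0; s < numSlices; ++s) {
        CUDA_CHECK(cudaStreamCreate(&streams[s]));
    }

    const int threads = 256;
    const int blocks  = (sliceLen + threads - 1) / threads;

    // Queue copy-in, kernel, and copy-out for each slice on its own stream.
    // Operations within one stream run in order; different streams may overlap.
    for (int s = 0; s < numSlices; ++s) {
        float *h = hData + (size_t)s * sliceLen;
        float *d = dData + (size_t)s * sliceLen;
        CUDA_CHECK(cudaMemcpyAsync(d, h, sliceBytes,
                                   cudaMemcpyHostToDevice, streams[s]));
        processSlice<<<blocks, threads, 0, streams[s]>>>(d, sliceLen);
        CUDA_CHECK(cudaGetLastError());
        CUDA_CHECK(cudaMemcpyAsync(h, d, sliceBytes,
                                   cudaMemcpyDeviceToHost, streams[s]));
    }

    // Wait for every stream to finish before reading the results on the host.
    CUDA_CHECK(cudaDeviceSynchronize());

    // Check that every element was doubled by the placeholder kernel.
    int errors = 0;
    for (int i = 0; i < numSlices * sliceLen; ++i) {
        if (hData[i] != 2.0f) {
            ++errors;
        }
    }
    printf("mismatches: %d\n", errors);

    for (int s = 0; s < numSlices; ++s) {
        CUDA_CHECK(cudaStreamDestroy(streams[s]));
    }
    CUDA_CHECK(cudaFree(dData));
    CUDA_CHECK(cudaFreeHost(hData));
    return errors == 0 ? 0 : 1;
}
```

The sketch can be built with nvcc. To see whether the slices really run concurrently, the timeline view of a profiler such as Nsight Systems shows whether the per-stream copies and kernels overlap.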

