====Profiling results====
The profiles of the two versions differ starkly. The parallel code performed better than we expected and appears to be very efficient.
=== Assignment 3 ===
For further optimization, the code can be changed to use CUDA streams, which would allow the slices to be processed concurrently.
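
As a rough illustration of the idea, the sketch below gives each slice its own stream, so the copy for one slice can overlap with the kernel for another. It is not our project code: the kernel <code>processSlice</code>, the helper <code>runStreamed</code>, and the slice layout (a flat float array split into equal chunks) are assumptions made for the example. Only the CUDA runtime calls themselves are real.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Minimal error check for a sketch: returns -1 without releasing resources.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
            return -1;                                                    \
        }                                                                 \
    } while (0)

// Hypothetical per-slice kernel: doubles every element of one slice.
__global__ void processSlice(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Splits n elements into up to nStreams slices and processes each slice in
// its own stream. Returns the number of wrong results, or -1 on a CUDA error.
int runStreamed(int n, int nStreams) {
    float* h = nullptr;
    float* d = nullptr;
    // Pinned host memory is required for async copies to overlap with kernels.
    CHECK(cudaMallocHost(&h, n * sizeof(float)));
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    std::vector<cudaStream_t> streams(nStreams);
    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    const int threads = 256;
    const int sliceSize = (n + nStreams - 1) / nStreams;  // round up
    for (int s = 0; s < nStreams; ++s) {
        int offset = s * sliceSize;
        int count = (n - offset < sliceSize) ? n - offset : sliceSize;
        if (count <= 0) break;  // no elements left for this stream
        size_t bytes = count * sizeof(float);
        // Copy in, compute, copy out: all queued on the same stream, so they
        // run in order for this slice but may overlap with other slices.
        CHECK(cudaMemcpyAsync(d + offset, h + offset, bytes,
                              cudaMemcpyHostToDevice, streams[s]));
        processSlice<<<(count + threads - 1) / threads, threads, 0,
                       streams[s]>>>(d + offset, count);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + offset, d + offset, bytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for every stream to finish

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f * static_cast<float>(i)) ++mismatches;

    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return mismatches;
}

int main() {
    int result = runStreamed(1 << 16, 4);
    printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Whether this helps in practice depends on the device having a copy engine that can run alongside kernels and on each slice being large enough to hide the launch overhead, so the streamed version would need to be profiled again before drawing conclusions.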

