Changes


Installation Wizards

1,856 bytes added, 18:42, 1 April 2017
We have decided to pursue the image processing project because it has more potential for a significant speedup from parallel processing. We will focus our time on speeding up the functions used to negate, reflect and enlarge the image.
=== Assignment 2 ===
== Parallel Image Processor ==
 
For the second part of this assignment we converted the C implementation of the image processor to a CUDA implementation that uses parallel programming to improve performance. The first hurdle we encountered was that the image in our image processor was stored as a 2D array, which would create problems in a CUDA implementation. Matt took responsibility for rewriting the code to use a 1D array that stores the pixels in column-major order.
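The 2D-to-1D conversion can be sketched as follows. This is a minimal illustration, not the code Matt wrote: the names `colMajorIndex` and `flattenColumnMajor`, the `int` pixel type and the `std::vector` containers are all assumptions. A likely reason the 2D array was a problem is that a pointer-per-row array is not one contiguous block, whereas a single cudaMemcpy() call transfers one contiguous range.

```cuda
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of pixel (row, col) in a column-major 1D buffer.
// Marked __host__ __device__ so the same mapping can be used inside kernels.
__host__ __device__ inline int colMajorIndex(int row, int col, int rows) {
    return col * rows + row;
}

// Hypothetical helper: flattens a rows x cols 2D image (assumed int pixels)
// into one contiguous buffer, storing each column's pixels back to back.
std::vector<int> flattenColumnMajor(const std::vector<std::vector<int>>& img) {
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    std::vector<int> flat(static_cast<std::size_t>(rows) * cols);
    for (int c = 0; c < cols; ++c) {
        for (int r = 0; r < rows; ++r) {
            flat[colMajorIndex(r, c, rows)] = img[r][c];
        }
    }
    return flat;
}
```

With this layout, pixel (r, c) of an image with `rows` rows lives at offset `c * rows + r`, so a 2 x 3 image with rows {1, 2, 3} and {4, 5, 6} is stored as 1, 4, 2, 5, 3, 6.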
 
Once we had the 1D array to store the image, we were able to start writing kernels for the compute-intensive functions of our processor. We created kernels for reflectImage(), enlargeImage() and negateImage() and began to test for any sort of performance increase.
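As a rough illustration of what such kernels can look like, the sketch below assigns one thread to each pixel. It is not our actual kernel code: the signatures, the `int` pixel type, the `maxGray` parameter, the left-to-right direction of the reflection and the host wrapper `reflectOnGpu` are assumptions made for the example, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// One thread per pixel: out = maxGray - in. maxGray (e.g. 255) is an assumed parameter.
__global__ void negateKernel(const int* in, int* out, int n, int maxGray) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = maxGray - in[i];
}

// One thread per pixel: mirror left-to-right in a column-major image,
// so the pixel at (r, c) is written to (r, cols - 1 - c).
__global__ void reflectKernel(const int* in, int* out, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols) {
        int r = i % rows;
        int c = i / rows;
        out[(cols - 1 - c) * rows + r] = in[i];
    }
}

// Hypothetical host wrapper: copy the image to the GPU, reflect it, copy it back.
// Error checking is omitted for brevity.
std::vector<int> reflectOnGpu(const std::vector<int>& img, int rows, int cols) {
    const int n = rows * cols;
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(int);
    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, img.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover every pixel
    reflectKernel<<<blocks, threads>>>(dIn, dOut, rows, cols);

    std::vector<int> result(n);
    // This blocking copy waits for the kernel on the default stream to finish.
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The negate kernel would be launched the same way, with `n` and `maxGray` in place of `rows` and `cols`. Writing to a separate output buffer avoids the race that an in-place reflection would cause when two threads swap the same pair of pixels.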
 
Results:
{| class="wikitable"
! Test !! CUDA !! C++
|-
| Negating and reflecting image || 2.988 s || 2.752 s
|-
| Enlarge by scale 8, negate and reflect || 5.728 s || 6.027 s
|}
 
As the results above show, the C++ implementation runs in about the same time as the CUDA implementation when the image is only negated and reflected (2.752 s versus 2.988 s, with CUDA slightly slower). Once the image is enlarged, however, the CUDA implementation is faster: about 5% at a scale of 8 (5.728 s versus 6.027 s). It is also worth noting that the profiled times from the CUDA implementation varied much more from run to run than those of the C++ implementation, which we think comes from variation in the time cudaMemcpy() takes to run.
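One way to check the cudaMemcpy() hypothesis would be to time the copy on its own with CUDA events and compare several runs. The sketch below is only an illustration of that idea: the helper name `timeHostToDeviceCopyMs` and the `int` pixel buffer are assumptions, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the time in milliseconds taken by one
// host-to-device copy of the image buffer, measured with CUDA events.
float timeHostToDeviceCopyMs(const std::vector<int>& img) {
    const std::size_t bytes = img.size() * sizeof(int);
    int* dImg = nullptr;
    cudaMalloc(&dImg, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                  // mark the point just before the copy
    cudaMemcpy(dImg, img.data(), bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);                   // mark the point just after the copy
    cudaEventSynchronize(stop);              // wait until the stop event has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time between the two events

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dImg);
    return ms;
}
```

Calling this helper repeatedly on the same image and comparing the spread of the returned values with the spread of the kernel times would indicate whether the copies account for the run-to-run variation.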
 
To continue our optimizations, we think we can gain more performance by minimizing the amount of data copied to and from the GPU. We will look into keeping the image on the GPU until all transformations are done and only then copying the result back to the host.
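A sketch of this plan, under the assumption that the transformations are a negate followed by a left-to-right reflect: the image is copied to the GPU once, both kernels run on device buffers, and only the final result is copied back. The kernel signatures, the `int` pixel type, `maxGray` and the wrapper `negateThenReflectOnGpu` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// One thread per pixel: out = maxGray - in (maxGray is an assumed parameter).
__global__ void negateKernel(const int* in, int* out, int n, int maxGray) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = maxGray - in[i];
}

// One thread per pixel: mirror left-to-right in a column-major image.
__global__ void reflectKernel(const int* in, int* out, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols) {
        int r = i % rows;
        int c = i / rows;
        out[(cols - 1 - c) * rows + r] = in[i];
    }
}

// Hypothetical wrapper: one copy in, two kernels on device memory, one copy out.
std::vector<int> negateThenReflectOnGpu(const std::vector<int>& img,
                                        int rows, int cols, int maxGray) {
    const int n = rows * cols;
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(int);
    int* dA = nullptr;
    int* dB = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);

    // Single host-to-device transfer for the whole pipeline.
    cudaMemcpy(dA, img.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    // Kernels on the default stream run in order, so the intermediate image
    // stays in dB on the GPU and never travels back to the host.
    negateKernel<<<blocks, threads>>>(dA, dB, n, maxGray);
    reflectKernel<<<blocks, threads>>>(dB, dA, rows, cols);

    // Single device-to-host transfer of the final result.
    std::vector<int> result(n);
    cudaMemcpy(result.data(), dA, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    return result;
}
```

Compared with copying the image back after every transformation, this version performs two cudaMemcpy() calls in total regardless of how many kernels are chained, at the cost of holding two device buffers for the duration of the pipeline.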
 
=== Assignment 3 ===