GPU610/TeamEh

From CDOT Wiki
Revision as of 20:12, 1 October 2014 by Benjamin Snively (talk | contribs) (Assignment 1: - Added Benjamin's results)



Team Eh

Team Members

  1. Benjamin Snively, Some responsibility
  2. Brad Hoover, Some other responsibility
  3. Balint Czunyi, Some other responsibility
  1. ...


Progress

Assignment 1

Benjamin Snively's Results

Introduction

This image processing program was found on GitHub. It processes and manipulates images using convolution matrices (kernels). It has several different functions, including aligning and sharpening images.

To convolve an image, the kernel is applied to each pixel: the pixel's value is combined with the values of its neighbors, weighted by the kernel, to produce a new pixel value. This program implements the filter with two nested loops that visit each pixel in sequence. For a fixed kernel size, convolving an image is an O(rows x columns) operation. Because the computation for each output pixel is independent of the others, it is a strong candidate for parallelization.
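
The per-pixel loop described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the single-channel `Image` type, the `convolve` function, and the clamp-to-edge border handling are all assumptions made for the example (the profiled program stores RGB pixels in its own Matrix class).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image, for illustration only. The profiled
// program uses Matrix<std::tuple<unsigned int, unsigned int, unsigned int> >.
struct Image {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // row-major, rows * cols values

    Image(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Apply a square k x k kernel (k odd, stored row-major) to every pixel.
// The kernel is applied without flipping, which gives the same result as a
// true convolution for symmetric kernels. Neighbors outside the image are
// clamped to the nearest edge pixel; that border policy is an assumption.
Image convolve(const Image& in, const std::vector<double>& kernel, std::size_t k) {
    assert(k % 2 == 1 && kernel.size() == k * k);
    const long radius = static_cast<long>(k / 2);
    const long last_row = static_cast<long>(in.rows) - 1;
    const long last_col = static_cast<long>(in.cols) - 1;
    Image out(in.rows, in.cols);

    // Two outer loops visit each pixel in sequence: O(rows x columns)
    // iterations, each doing k * k multiply-adds.
    for (std::size_t r = 0; r < in.rows; ++r) {
        for (std::size_t c = 0; c < in.cols; ++c) {
            double sum = 0.0;
            for (long dr = -radius; dr <= radius; ++dr) {
                for (long dc = -radius; dc <= radius; ++dc) {
                    const long rr = std::min(std::max(static_cast<long>(r) + dr, 0L), last_row);
                    const long cc = std::min(std::max(static_cast<long>(c) + dc, 0L), last_col);
                    const std::size_t ki = static_cast<std::size_t>(dr + radius) * k +
                                           static_cast<std::size_t>(dc + radius);
                    sum += kernel[ki] * in.at(static_cast<std::size_t>(rr),
                                              static_cast<std::size_t>(cc));
                }
            }
            // Only this output pixel is written; the input is read-only.
            out.at(r, c) = sum;
        }
    }
    return out;
}
```

In this sketch each iteration of the two outer loops reads only from `in` and writes only its own `out.at(r, c)`, so the iterations could in principle run in any order or concurrently. Passing the kernel 0,0,0,0,1,0,0,0,0 with k = 3 returns the input unchanged.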

To profile the application, I created a large bitmap file (about 800 x 800 pixels, 2 MB) and ran it through three different operations. To conserve space, only these three operations are profiled here, and only the top entries of each gprof flat profile are shown.

Gaussian Blur

Command: --gassian 5

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
44.81     88.57    88.57                             _mcount_private
31.92    151.66    63.09                             __fentry__
 4.85    161.25     9.59        1     9.59    45.84  Gauss_filter::smooth_ord(Matrix<std::tuple<unsigned int, unsigned int, unsigned int> >&)
 1.92    165.05     3.80 633231640     0.00     0.00  Matrix<std::tuple<unsigned int, unsigned int, unsigned int> >::operator()(unsigned int, unsigned int)
 1.43    167.88     2.83 633887736     0.00     0.00  std::__shared_ptr<std::tuple<unsigned int, unsigned int, unsigned int>, (__gnu_cxx::_Lock_policy)2>::get() const
 1.37    170.58     2.70 630508256     0.00     0.00  std::_Tuple_impl<0ul, int&, int&, int&>& std::_Tuple_impl<0ul, int&, int&, int&>::operator=<unsigned int, unsigned int, unsigned int>(std::_Tuple_impl<0ul, unsigned int, unsigned int, unsigned int> const&)
 0.92    172.40     1.82 630508256     0.00     0.00  std::_Head_base<0ul, int&, false>::_Head_base(int&)
 0.87    174.12     1.72 630508256     0.00     0.00  std::_Tuple_impl<2ul, int&>& std::_Tuple_impl<2ul, int&>::operator=<unsigned int>(std::_Tuple_impl<2ul, unsigned int> const&)
 0.86    175.81     1.69 630508256     0.00     0.00  std::_Head_base<1ul, int&, false>::_Head_base(int&)
 0.84    177.47     1.66 630508256     0.00     0.00  std::_Tuple_impl<1ul, int&, int&>& std::_Tuple_impl<1ul, int&, int&>::operator=<unsigned int, unsigned int>(std::_Tuple_impl<1ul, unsigned int, unsigned int> const&)
 0.78    179.02     1.55 630508256     0.00     0.00  std::_Head_base<2ul, int&, false>::_Head_base(int&)
 0.77    180.54     1.52 630508256     0.00     0.00  std::tuple<int&, int&, int&> std::tie<int, int, int>(int&, int&, int&)
 0.74    182.00     1.46 630508256     0.00     0.00  std::_Tuple_impl<2ul, int&>::_Tuple_impl(int&)
 0.65    183.28     1.28 630508256     0.00     0.00  std::tuple<int&, int&, int&>& std::tuple<int&, int&, int&>::operator=<unsigned int, unsigned int, unsigned int, void>(std::tuple<unsigned int, unsigned int, unsigned int> const&)
 0.57    184.41     1.13 630508256     0.00     0.00  std::tuple<int&, int&, int&>::tuple(int&, int&, int&)
 0.55    185.50     1.09 630508256     0.00     0.00  std::_Head_base<0ul, int&, false>::_M_head(std::_Head_base<0ul, int&, false>&)
 0.52    186.53     1.03 630508256     0.00     0.00  std::_Tuple_impl<0ul, int&, int&, int&>::_Tuple_impl(int&, int&, int&)

Sharpen

Command: --unsharp

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
44.44      0.96     0.96                             _mcount_private
27.31      1.55     0.59                             __fentry__
 7.41      1.71     0.16        1   160.00   458.44  unsharp(Matrix<std::tuple<unsigned int, unsigned int, unsigned int> >)
 5.56      1.83     0.12 20345464     0.00     0.00  Matrix<std::tuple<unsigned int, unsigned int, unsigned int> >::operator()(unsigned int, unsigned int)
 1.39      1.86     0.03 21001560     0.00     0.00  std::__shared_ptr<std::tuple<unsigned int, unsigned int, unsigned int>, (__gnu_cxx::_Lock_policy)2>::get() const
 1.39      1.89     0.03  7876396     0.00     0.00  std::_Tuple_impl<0ul, unsigned int, unsigned int, unsigned int>::_M_head(std::_Tuple_impl<0ul, unsigned int, unsigned int, unsigned int>&)
 1.39      1.92     0.03   656096     0.00     0.00  std::_Tuple_impl<2ul, unsigned int>& std::_Tuple_impl<2ul, unsigned int>::operator=<unsigned char>(std::_Tuple_impl<2ul, unsigned char>&&)
 0.93      1.94     0.02  7876396     0.00     0.00  std::_Tuple_impl<1ul, unsigned int, unsigned int>::_M_head(std::_Tuple_impl<1ul, unsigned int, unsigned int>&)
 0.93      1.96     0.02  1968288     0.00     0.00  unsigned char&& std::forward<unsigned char>(std::remove_reference<unsigned char>::type&)
 0.93      1.98     0.02   656096     0.00     0.00  std::_Head_base<0ul, unsigned char, false>::_M_head(std::_Head_base<0ul, unsigned char, false>&)

Identity

Command: --custom '0,0,0,0,1,0,0,0,0'

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
53.61      1.71     1.71                             _mcount_private
28.21      2.61     0.90                             __fentry__
 4.39      2.75     0.14        2    70.00   218.45  Use_kernel::new_im()
 1.88      2.81     0.06  8542240     0.00     0.00  Matrix<std::tuple<unsigned int, unsigned int, unsigned int> >::operator()(unsigned int, unsigned int)
 0.94      2.84     0.03  5904864     0.00     0.00  std::_Tuple_impl<0ul, int&, int&, int&>::_Tuple_impl(int&, int&, int&)
 0.63      2.86     0.02 13126802     0.00     0.00  __gnu_cxx::__enable_if<std::__is_integer<int>::__value, double>::__type std::floor<int>(int)
 0.63      2.88     0.02  9841440     0.00     0.00  double& std::forward<double&>(std::remove_reference<double&>::type&)
 0.63      2.90     0.02  7223552     0.00     0.00  std::_Head_base<0ul, unsigned int, false>::_M_head(std::_Head_base<0ul, unsigned int, false> const&)
 0.63      2.92     0.02  5904864     0.00     0.00  std::_Tuple_impl<1ul, int&, int&>& std::_Tuple_impl<1ul, int&, int&>::operator=<unsigned int, unsigned int>(std::_Tuple_impl<1ul, unsigned int, unsigned int> const&)
 0.63      2.94     0.02  5904864     0.00     0.00  std::_Tuple_impl<2ul, int&>::_Tuple_impl(int&)

Summary

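The functions that perform the filtering are Gauss_filter::smooth_ord, unsharp, and Use_kernel::new_im(). For a fixed kernel size, each is O(r x c) in the image dimensions, so these functions are where the biggest gains from parallelization will be found. The top two entries in every profile, _mcount_private and __fentry__, appear to be profiling instrumentation hooks rather than application code, so they are not optimization targets themselves.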
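
The call counts in the profiles above give a rough check on how much work is done per pixel. The sketch below rests on one assumption: that 656,096, a call count that recurs in the Sharpen profile, is the number of pixels in the test image. That would be consistent with the stated size of about 800 x 800 (for example, 808 x 812), but the source does not state it. The names `PerPixel` and `calls_per_pixel` are hypothetical helpers for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Counts copied from the gprof flat profiles in this section.
const std::uint64_t kGaussTieCalls = 630508256ULL;  // std::tie calls, Gaussian blur profile
const std::uint64_t kAssumedPixels = 656096ULL;     // recurring count in the Sharpen profile,
                                                    // ASSUMED to be one call per pixel

// Quotient and remainder of a call count divided by a pixel count.
struct PerPixel {
    std::uint64_t quotient;
    std::uint64_t remainder;
};

// Hypothetical helper: how many calls were made per pixel, and whether the
// division is exact.
PerPixel calls_per_pixel(std::uint64_t calls, std::uint64_t pixels) {
    assert(pixels != 0);
    PerPixel result = {calls / pixels, calls % pixels};
    return result;
}
```

Under that assumption, `calls_per_pixel(kGaussTieCalls, kAssumedPixels)` gives exactly 961 with no remainder, and 961 = 31 x 31. That would be consistent with the Gaussian blur visiting a 31 x 31 neighborhood for every pixel, which suggests the running time depends on the kernel area as well as on r x c.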

Assignment 2

Assignment 3