GPU610/Team AGC

From CDOT Wiki
Revision as of 14:31, 6 October 2014 by Christopher Markieta (talk | contribs) (Fixed Mail All)
Jump to: navigation, search


GPU610/DPS915 | Student List | Group and Project Index | Student Resources | Glossary

Team AGC

Team Members

  1. Andy Cooc, Some responsibility
  2. Gabriel Castro, Some other responsibility
  3. Christopher Markieta, Some other responsibility

Email All

Progress

Assignment 1

Andy's Findings

...

Gabriel's Findings

Pillow Library (Python)

The Pillow library is an image manipulation library for python. It has many common functions like re-sizing, cropping, rotating, blur, sharpen etc. It's available on github at https://github.com/python-pillow/Pillow

I used 4 filters for testing

filter run time
Blur 1.305s
Gaussian Blur 4.492
Auto Contrast 0.092
Grey Scale 0.046

See the code on github.

Gaussian Blur

A gaussian blur is a filter that can be applied to an image to make it blurry, in its simplest form it goes through every pixel in an image and averages it with its surrounding pixels. In the Pillow library it is implemented in a C module. This helps with processing time as C is hundreds of times faster than python. A quick look at the source shows that algorithm contains a nested for loop that's O(x * y) and then one that's O(x * y * r), that means that time grows n^2 based on the size of the image and n^3 based on the size of the blur radius.

Conclusion

With many nested operations, this a really good candidate for GPU paralization.

Christopher's Findings

xor_me

xor_me is an open source, brute force password cracker for doc and xls files.

xor_me was written by Benoît Sibaud and is copyrighted under the terms of the GNU Lesser General Public License version 3 only, as published by the Free Software Foundation. Here is a link to my fork of this repository:

https://github.com/Markieta/xor_me

Here is the profile of a short run of brute_force:

Flat profile:

Each sample counts as 0.01 seconds.
%   cumulative   self              self     total
time   seconds   seconds    calls  Ts/call  Ts/call  name
99.35    10.58    10.58                             lclGetKey(unsigned char const*, long)
0.19     10.60     0.02                             dump_exit(int)
0.00     10.60     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z9lclGetKeyPKhl

As you can see, lclGetKey is the function where I will spend most of my time trying to parallelize code.

This application is quite CPU intensive, but hasn't been optimized for multiple threads. It is using 100% of 1 thread from the 8 available on my CPU. See the list of processes below for an example.

top - 23:25:32 up  1:16,  3 users,  load average: 0.94, 0.65, 0.60
Tasks: 256 total,   2 running, 253 sleeping,   0 stopped,   1 zombie
%Cpu(s): 13.7 us,  0.7 sy,  0.0 ni, 85.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   3979012 total,  3270196 used,   708816 free,   352592 buffers
KiB Swap:  4124668 total,        0 used,  4124668 free.   984048 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                        
 5229 markieta  20   0   12664   1064    896 R 100.0  0.0   1:06.04 brute_force                                    
 1467 root      20   0  538620 208272 166664 S   3.7  5.2   6:55.85 Xorg                                           
 2540 markieta  20   0 1084328 169500  53460 S   3.3  4.3  11:57.88 chrome                                         
 2075 markieta   9 -11  509016   6840   4500 S   2.7  0.2   3:21.47 pulseaudio                                     
 3022 markieta  20   0 1094124  79252  10220 S   2.3  2.0   3:05.77 chrome                                         
 2932 markieta  20   0  829212  43176  19848 S   1.7  1.1   0:31.29 /usr/bin/termin                                
 2989 markieta  20   0 1039100 117292  23968 S   1.7  2.9   1:48.53 chrome                                         
 2296 markieta  20   0 1553220 170248  65592 S   1.0  4.3   3:16.43 compiz                                         
 2789 markieta  20   0 1064160  99040  20116 S   0.7  2.5   1:24.61 chrome                                         
 2993 markieta  20   0 1112596 155316  26192 S   0.7  3.9   1:31.43 chrome                                         
 3482 root      20   0       0      0      0 S   0.3  0.0   0:07.34 kworker/2:2                                    
 4364 root      20   0       0      0      0 S   0.3  0.0   0:00.96 kworker/0:0                                    
 5230 markieta  20   0   24952   1776   1176 R   0.3  0.0   0:00.14 top

Here I ran the brute_force function again to see how long it would really take. I did not run this command for too long, hence the the termination ^C after about 2 minutes.

markieta@Y560:~/Documents/xor_me$ time ./brute_force 0x499a 0xcc61
Key: 499a
Hash: cc61
Password: '1950'
Password: 'w2#1'
^CState: 20:22:48:46:7e:48:62:c3c0:00000184023807d008a011c010800000:6c62477d45472100

real	2m16.888s
user	2m16.082s
sys	0m0.640s

I am hoping to see a lot of improvement after my research and implementation with CUDA and OpenCL.

Team AGC has decided to collaborate on this project for assignment 2 onwards.

Assignment 2

Assignment 3