CDOT Wiki β

DPS915 C U D A B O Y S

81,447 bytes removed, 12:57, 8 December 2015
Assignment 3
== Progress ==
== Assignment 1 ==
=== <span style="color: green">&#x2713; Profile 0: File Encryption</span> ===
==== Description ====
This piece of software takes a file and encrypts it in one of four ways:
# Byte Inversion
# Byte Cycle
# Xor Cipher
# RC4 Cipher

Inside the <code>byteCipher</code> method there is a for loop that could be optimized. Within this loop, the lines that call the <code>cycle</code> and <code>rc4_output</code> functions take the longest to execute:

 for (int i = 0; i < bufferSize; i++){ // going over every byte in the file
     switch (mode) {
         case 0: // inversion
             buffer[i] = ~buffer[i];
             break;
         case 1: // cycle
             buffer [i] = cycle (buffer [i]);
             break;
         case 2: // RC4
             buffer [i] = buffer [i] ^ rc4_output();
             break;
     }
 }

Here is what the <code>cycle</code> and <code>rc4_output</code> functions look like:

 char cycle (char value) {
     int leftMask = 170;
     int rightMask = 85;
     int iLeft = value & leftMask;
     int iRight = value & rightMask;
     iLeft = iLeft >> 1;
     iRight = iRight << 1;
     return iLeft | iRight;
 }
 
 unsigned char rc4_output() {
     unsigned char temp;
     i = (i + 1) & 0xFF;
     j = (j + S[i]) & 0xFF;
     temp = S[i];
     S[i] = S[j];
     S[j] = temp;
     return S[(S[i] + S[j]) & 0xFF];
 }

We need to change these two functions so that they are added to the CUDA device as "device functions".

==== Profiling on Linux ====
The following test runs were performed on this virtual machine:
* CentOS 7
* i7-3820 @ 3.6 GHz
* 2GB DDR3
* gcc version 4.8.3

Using compiler settings:
 g++ -c -O2 -g -pg -std=c++11 encFile.cpp

'''RC4 Cipher - 283 MB mp3 File'''
 [root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/music.mp3 /home/johny/aes/music.mp3
 * * * File Protector * * *
 Mode 4: RC4 cipher
 Please enter the RC4 key (8 chars min)
 testing123
 The password is: testing123
 Beginning encryption
 Completed: 100%
 Cipher completed.
 Program terminated.
 
 real 0m6.758s
 user 0m3.551s
 sys 0m0.068s
 
 Flat profile:
 Each sample counts as 0.01 seconds.
   %   cumulative   self               self     total
  time   seconds   seconds     calls  ms/call  ms/call  name
  84.05      1.70     1.70 296271519     0.00     0.00  rc4_output()
  13.39      1.97     0.27                              byteCipher(int, std::string)
   2.73      2.02     0.06         1    55.09    55.09  rc4_init(unsigned char*, unsigned int)
   0.00      2.02     0.00         1     0.00     0.00  _GLOBAL__sub_I_S

As the profile shows, the <code>rc4_output</code> and <code>byteCipher</code> functions take up most of the processing time (over 97% combined).

'''RC4 Cipher - 636 MB iso File'''
 [root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/cent.iso /home/johny/aes/cent.iso
 * * * File Protector * * *
 Mode 4: RC4 cipher
 Please enter the RC4 key (8 chars min)
 testing123
 The password is: testing123
 Beginning encryption
 Completed: 100%
 Cipher completed.
 Program terminated.
 
 real 0m10.293s
 user 0m8.235s
 sys 0m0.312s
 
 Flat profile:
 Each sample counts as 0.01 seconds.
   %   cumulative   self               self     total
  time   seconds   seconds     calls  ms/call  ms/call  name
  74.86      3.59     3.59 666894336     0.00     0.00  rc4_output()
  23.21      4.70     1.11                              byteCipher(int, std::string)
   2.09      4.80     0.10         1   100.16   100.16  rc4_init(unsigned char*, unsigned int)
   0.00      4.80     0.00         1     0.00     0.00  _GLOBAL__sub_I_S

'''RC4 Cipher - 789 MB iso File'''
 [root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/xu.iso /home/johny/aes/xu.iso
 * * * File Protector * * *
 Mode 4: RC4 cipher
 Please enter the RC4 key (8 chars min)
 testing123
 The password is: testing123
 Beginning encryption
 Completed: 100%
 Cipher completed.
 Program terminated.
 
 real 0m12.566s
 user 0m10.170s
 sys 0m0.228s
 
 Flat profile:
 Each sample counts as 0.01 seconds.
   %   cumulative   self               self     total
  time   seconds   seconds     calls  ms/call  ms/call  name
  75.51      4.40     4.40 827326464     0.00     0.00  rc4_output()
  23.02      5.74     1.34                              byteCipher(int, std::string)
   1.63      5.84     0.10         1    95.15    95.15  rc4_init(unsigned char*, unsigned int)
   0.00      5.84     0.00         1     0.00     0.00  _GLOBAL__sub_I_S

==== Profiling on Windows ====
The following test runs were performed on this machine:
* Windows 10
* i7-4790k @ 4GHz
* 16GB DDR3
* Visual Studio 2013

'''RC4 Cipher - 283 MB mp3 File'''
[[File:winmp3.png]]

'''RC4 Cipher - 636 MB iso File'''
[[File:wincent.png]]

'''RC4 Cipher - 789 MB iso File'''
[[File:winxu.png]]

'''Byte Cycle - 283 MB mp3 File'''
[[File:winmp32.png]]

'''Byte Cycle - 636 MB iso File'''
[[File:wincent2.png]]

'''Byte Cycle - 789 MB iso File'''
[[File:winxu2.png]]
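As a rough illustration of the Profile 0 transforms, the following standalone C++ sketch re-implements the byte cycle and an RC4 keystream on unsigned bytes. The names <code>cycleByte</code>, <code>Rc4State</code>, <code>Mode</code> and <code>applyCipher</code> are hypothetical and do not appear in the project, and the key schedule is the textbook RC4 one, which is assumed (not confirmed) to match the project's <code>rc4_init</code>. It is meant to show why inversion and cycling look like straightforward one-thread-per-byte candidates, while the RC4 keystream is order-dependent: each call mutates the state that the next call reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Swap each pair of adjacent bits: bits under mask 0xAA (170) move right,
// bits under mask 0x55 (85) move left. Each byte is independent of the others.
inline std::uint8_t cycleByte(std::uint8_t value) {
    return static_cast<std::uint8_t>(((value & 0xAAu) >> 1) | ((value & 0x55u) << 1));
}

// Textbook RC4 state (assumption: the project's rc4_init uses the same key schedule).
struct Rc4State {
    std::uint8_t S[256];
    std::uint8_t i = 0;
    std::uint8_t j = 0;

    explicit Rc4State(const std::string& key) {
        assert(!key.empty());
        for (int k = 0; k < 256; ++k) S[k] = static_cast<std::uint8_t>(k);
        std::uint8_t t = 0;
        for (int k = 0; k < 256; ++k) {
            t = static_cast<std::uint8_t>(t + S[k] + static_cast<std::uint8_t>(key[k % key.size()]));
            std::swap(S[k], S[t]);
        }
    }

    // Same steps as rc4_output(): advance i and j, swap, emit one keystream byte.
    std::uint8_t next() {
        i = static_cast<std::uint8_t>(i + 1);
        j = static_cast<std::uint8_t>(j + S[i]);
        std::swap(S[i], S[j]);
        return S[static_cast<std::uint8_t>(S[i] + S[j])];
    }
};

enum class Mode { Invert, Cycle, Rc4 };  // hypothetical mode names

// Apply one transform to every byte of the buffer, in place.
inline void applyCipher(std::vector<std::uint8_t>& buffer, Mode mode, const std::string& key = "") {
    if (mode == Mode::Rc4) {
        Rc4State rc4(key);
        // Sequential by construction: keystream byte n depends on the state left by byte n-1.
        for (auto& b : buffer) b ^= rc4.next();
        return;
    }
    // Inversion and cycle touch each byte independently, so iteration order does not matter.
    for (auto& b : buffer) {
        b = (mode == Mode::Invert) ? static_cast<std::uint8_t>(~b) : cycleByte(b);
    }
}
```

For example, calling <code>applyCipher(buf, Mode::Rc4, "Key")</code> twice on the same buffer restores the original bytes, because XOR with the same keystream is its own inverse. The same holds for the inversion and cycle modes.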
=== <span style="color: red">&#x2717; Profile 1: PI Approximation</span> ===
* Sample run:
operation - took - 47.1807910000 secs
3.1415537704
operation - took - 47.1643760000 secs
3.1415782660
operation - took - 47.1696770000 secs
3.1415815554
operation - took - 47.2050050000 secs
real 3m33.129s
user 3m32.925s
0.00 106.93 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z10reportTimePKcNSt6chrono8durationIlSt5ratioILl1ELl1000000EEEE
=== <span style="color: red">&#x2717; Profile 2: Wave Form Generator</span> ===
<s>'''This is the program we selected to optimize. It is a good candidate because its two primary functions each contain a few for loops. One function reads an MP3 file and writes wave data to a file; this function takes quite a bit of time to execute. The other function takes this data and converts it to a viewable sound wave image. Both functions would benefit greatly from the extra processing power that a GPU provides: MP3 read/decode time would be greatly reduced.'''</s>

'This piece of code is too complex and requires a Linux environment to run. Please see Profile 0 for the one we are currently using.'
* Sample Run
 0.00 7.29 0.00 7272 0.00 0.00 BstdRead
 0.00 7.29 0.00 7271 0.00 0.00 BstdFileEofP
 0.00 7.29 0.00 176 0.00 0.00 __gnu_cxx::__normal_iterator<std::string const*, std::vector<std::string, std::allocator<std::string> > >::base() const
 0.00 7.29 0.00 144 0.00 0.00 std::string* std::__addressof<std::string>(std::string&)
 0.00 7.29 0.00 130 0.00 0.00 std::less<std::string>::operator()(std::string const&, std::string const&) const
 0.00 7.29 0.00 130 0.00 0.00 bool std::operator< <char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
 0.00 7.29 0.00 129 0.00 0.00 std::_Select1st<std::pair<std::string const, boost::program_options::variable_value> >::operator()(std::pair<std::string const, boost::program_options::variable_value> const&) const
 0.00 7.29 0.00 117 0.00 0.00 std::vector<short, std::allocator<short> >::size() const
 0.00 7.29 0.00 107 0.00 0.00 std::_Vector_base<std::string, std::allocator<std::string> >::_M_get_Tp_allocator()
 0.00 7.29 0.00 99 0.00 0.00 std::_Rb_tree<std::string, std::pair<std::string const, boost::program_options::variable_value>, std::_Select1st<std::pair<std::string const, boost::program_options::variable_value> >, std::less<std::string>, std::allocator<std::pair<std::string const, boost::program_options::variable_value> > >: .....

=== <span style="color: red">&#x2717; Profile 3: String Processor</span> ===
* Sample run:

 ext_string_example
 es + 123 = ext_string123
 456 + es = 456ext_string
 es * 3 = ext_stringext_stringext_string
 3 * es = ext_stringext_stringext_string
 original: abc1234?abc1234?abc1234
 es - abc = 1234?1234?1234
 es - 123 = abc?abc?abc
 es - ? = abc1234abc1234abc1234
 ext_string == eXt_StRiNg
 original: eXt_StRiNg
 lowercase: ext_string
 uppercase: EXT_STRING
 original: [ ext_string ]
 remove leading space: [ext_string ]
 remove trailing space: [ext_string]
 es: abc, ijk, pqr, xyz ---> split: (abc) (ijk) (pqr) (xyz)
 es: abc, ijk, pqr, xyz ---> split_n(3): (abc) (ijk) (pqr)
 es: 1, -23, 456, -7890 ---> parse: (1) (-23) (456) (-7890)
 es: 1.1, -23.32, 456.654, -7890.0987 ---> parsed: (1.1000000000000001) (-23.32) (456.654) (-7890.0986999999996)
 non_repeated_char_example
 No non-repeated chars in string.
 First non repeated char: a
 First non repeated char: b
=== Assignment 2 ===
=== Assignment 3 ===
[[File:a3graph.png]]