# Difference between revisions of "GPUSquad"

Jump to: navigation, search

# GPUSquad

```         (           (                         (
(       )\ )        )\ )   (            (     )\ )
)\ )   (()/(    (  (()/( ( )\      (    )\   (()/(
(()/(    /(_))   )\  /(_)))((_)     )\((((_)(  /(_))
/(_))_ (_))  _ ((_)(_)) ((_)_   _ ((_))\ _ )\(_))_
(_)) __|| _ \| | | |/ __| / _ \ | | | |(_)_\(_)|   \
| (_ ||  _/| |_| |\__ \| (_) || |_| | / _ \  | |) |
\___||_|   \___/ |___/ \__\_\ \___/ /_/ \_\ |___/```

## Progress

### Assignment 1

#### Idea 1 - Jacobi Method for 2D Poisson Problem

This topic is based on the following pdf: https://math.berkeley.edu/~wilken/228A.F07/chr_lecture.pdf

Background:

Poisson's equation according to Wikipedia:

"In mathematics, Poisson's equation is a partial differential equation of elliptic type with broad utility in mechanical engineering and theoretical physics. It arises, for instance, to describe the potential field caused by a given charge or mass density distribution; with the potential field known, one can then calculate gravitational or electrostatic field. It is a generalization of Laplace's equation, which is also frequently seen in physics."

According to the intro in the pdf above, finite-difference and finite-element methods are the solution techniques of choice for solving elliptic PDE problems. Regardless of which type of technique you choose, you will end up with sets of linear relationships between various variables, and will need to solve for the solution u_k which satisfies all the linear relationships prescribed by the PDE.

This can be written, according to the pdf, as a matrix "Au = b, where we wish to find a solution u, given that A is a matrix capturing the differentiation operator, and b corresponds to any source or boundary terms". The Jacobi method can be used to solve this matrix, and is used in the code sample later.

Jacobi method according to Wikipedia:

"In numerical linear algebra, the Jacobi method (or Jacobi iterative method[1]) is an algorithm for determining the solutions of a diagonally dominant system of linear equations. Each diagonal element is solved for, and an approximate value is plugged in. The process is then iterated until it converges."

**************************

CODE - Based on PDF

**************************

The following code is based on the jacobi.cc and common.cc code at the bottom of the source PDF, but there are some changes for these tests. The files use .cpp extensions instead of .cc, the code in common.cc was added to jacobi.cpp so it's a single file, and code for carrying out the Jacobi iterations in the main were placed into a seperate function called dojacobi() for gprof profiling.

The std::cout line inside the output_and_error function was commented out to avoid excessive printing to the terminal.

Compilation on Linux: g++ -std=c++0x -O2 -g -pg -o jacobi jacobi.cpp

```// Load standard libraries
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <cmath>
using namespace std;

// Set grid size and number of iterations
const int save_iters=20;
const int total_iters=5000;
const int error_every=2;
const int m=330,n=330;
const double xmin=-1,xmax=1;
const double ymin=-1,ymax=1;

// Compute useful constants
const double pi=3.1415926535897932384626433832795;
const double omega=2/(1+sin(2*pi/n));
const double dx=(xmax-xmin)/(m-1);
const double dy=(ymax-ymin)/(n-1);
const double dxxinv=1/(dx*dx);
const double dyyinv=1/(dy*dy);
const double dcent=1/(2*(dxxinv+dyyinv));

// Input function
inline double f(int i,int j) {
double x=xmin+i*dx,y=ymin+j*dy;
return abs(x)>0.5||abs(y)>0.5?0:1;
}

// Common output and error routine
void output_and_error(char* filename,double *a,const int sn) {
// Computes the error if sn%error every==0
if(sn%error_every==0) {
double z,error=0;int ij;
for(int j=1;j<n-1;j++) {
for(int i=1;i<m-1;i++) {
ij=i+m*j;
z=f(i,j)-a[ij]*(2*dxxinv+2*dyyinv)
+dxxinv*(a[ij-1]+a[ij+1])
+dyyinv*(a[ij-m]+a[ij+m]);
error+=z*z;
}
}
//cout << sn << " " << error*dx*dy << endl;
}

// Saves the matrix if sn<=save iters
if(sn<=save_iters) {
int i,j,ij=0,ds=sizeof(float);
float x,y,data_float;const char *pfloat;
pfloat=(const char*)&data_float;

ofstream outfile;
static char fname[256];
sprintf(fname,"%s.%d",filename,sn);
outfile.open(fname,fstream::out
| fstream::trunc | fstream::binary);

data_float=m;outfile.write(pfloat,ds);

for(i=0;i<m;i++) {
x=xmin+i*dx;
data_float=x;outfile.write(pfloat,ds);
}

for(j=0;j<n;j++) {
y=ymin+j*dy;
data_float=y;
outfile.write(pfloat,ds);

for(i=0;i<m;i++) {
data_float=a[ij++];
outfile.write(pfloat,ds);
}
}

outfile.close();
}
}

void dojacobi(int i, int j, int ij, int k, double error, double u[],
double v[], double z, double* a, double* b) {

// Set initial guess to be identically zero
for(ij=0;ij<m*n;ij++) u[ij]=v[ij]=0;
output_and_error("jacobi out",u,0);

// Carry out Jacobi iterations
for(k=1;k<=total_iters;k++) {
// Alternately flip input and output matrices
if (k%2==0) {a=u;b=v;} else {a=v;b=u;}

// Compute Jacobi iteration
for(j=1;j<n-1;j++) {
for(i=1;i<m-1;i++) {
ij=i+m*j;
a[ij]=(f(i,j)+dxxinv*(b[ij-1]+b[ij+1])
+dyyinv*(b[ij-m]+b[ij+m]))*dcent;
}
}

// Save and compute error if necessary
output_and_error("jacobi out",a,k);
}

}

int main() {
int i,j,ij,k;
double error,u[m*n],v[m*n],z;
double *a,*b;

dojacobi(i, j, ij, k, error, u, v, z, a, b);

return 0;
}```

************************

Inputs / Performance

************************

In a 2D PDE such as the 2D Poisson problem, the matrix will have m * n gridpoints.

At m = 33, n = 33, iterations = 5000:

```Flat profile:

Each sample counts as 0.01 seconds.
%   cumulative   self              self     total
time   seconds   seconds    calls  Ts/call  Ts/call  name
100.00      0.05     0.05                             dojacobi(int, int, int, int, double, double*, double*, double, double*, double*)
0.00      0.05     0.00     5001     0.00     0.00  output_and_error(char*, double*, int)
0.00      0.05     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z16output_and_errorPcPdi```

At m = 165, n = 165, iterations = 5000:

```Flat profile:

Each sample counts as 0.01 seconds.
%   cumulative   self              self     total
time   seconds   seconds    calls  us/call  us/call  name
99.15      1.16     1.16                             dojacobi(int, int, int, int, double, double*, double*, double, double*, double*)
0.85      1.17     0.01     5001     2.00     2.00  output_and_error(char*, double*, int)
0.00      1.17     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z16output_and_errorPcPdi```

At m = 330, n = 330, iterations = 5000:

```Flat profile:

Each sample counts as 0.01 seconds.
%   cumulative   self              self     total
time   seconds   seconds    calls  us/call  us/call  name
99.43      5.26     5.26                             dojacobi(int, int, int, int, double, double*, double*, double, double*, double*)
0.57      5.29     0.03     5001     6.00     6.00  output_and_error(char*, double*, int)
0.00      5.29     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z16output_and_errorPcPdi```

The hotspot seems to clearly be the triple for-loop in the Jacobi iterations code of the dojacobi() function. I believe these matrix calculations could be parallelized for improved performance.

#### Idea 2 - LZW Compression

BACKGROUND:(Paraphrased from "LZW compression" at http://whatis.techtarget.com/definition/LZW-compression) LZW compression is a compression algorithm that creates a dictionary of tokens from collections of characters. These tokens correspond to different patterns of bit values. Unique tokens are generated from the longest possible series of characters that recur in sequence any time later in a body of text. These patterns of bits that correspond to string tokens are written to an output file. Since commonly recurring sequences of words are registered to the dictionary as smaller (perhaps 12 bit) tokens, the corresponding dictionary bit code comes to represent a series of characters that would otherwise be longer than 12 bits. This is what facilitates the compression.

CODE:

The following code is an example program that performs Compression and Decompression of text files using a LZW algorithm that encodes to 12 bit values.

This is algorithm was provided with a link as a potential project through the group projects page, but here it is again: https://codereview.stackexchange.com/questions/86543/simple-lzw-compression-algorithm

The body of code for the algorithm is as follows:

// Compile with gcc 4.7.2 or later, using the following command line: // // g++ -std=c++0x lzw.c -o lzw // //LZW algorithm implemented using fixed 12 bit codes.

1. include <iostream>
2. include <sstream>
3. include <fstream>
1. include <bitset>
2. include <string>
3. include <unordered_map>
1. define MAX_DEF 4096

using namespace std;

string convert_int_to_bin(int number) {

```   string result = bitset<12>(number).to_string();
return result;
```

}

void compress(string input, int size, string filename) {

```   unordered_map<string, int> compress_dictionary(MAX_DEF);
//Dictionary initializing with ASCII
for ( int unsigned i = 0 ; i < 256 ; i++ ){
compress_dictionary[string(1,i)] = i;
}
string current_string;
unsigned int code;
unsigned int next_code = 256;
//Output file for compressed data
ofstream outputFile;
outputFile.open(filename + ".lzw");
```
```   for(char& c: input){
current_string = current_string + c;
if ( compress_dictionary.find(current_string) ==compress_dictionary.end() ){
if (next_code <= MAX_DEF)
compress_dictionary.insert(make_pair(current_string, next_code++));
current_string.erase(current_string.size()-1);
outputFile << convert_int_to_bin(compress_dictionary[current_string]);
current_string = c;
}
}
if (current_string.size())
outputFile << convert_int_to_bin(compress_dictionary[current_string]);
outputFile.close();
```

}

void decompress(string input, int size, string filename) {

```   unordered_map<unsigned int, string> dictionary(MAX_DEF);
//Dictionary initializing with ASCII
for ( int unsigned i = 0 ; i < 256 ; i++ ){
dictionary[i] = string(1,i);
}
string previous_string;
unsigned int code;
unsigned int next_code = 256;
//Output file for decompressed data
ofstream outputFile;
outputFile.open(filename + "_uncompressed.txt");
```
```   int i =0;
while (i<size){
//Extracting 12 bits and converting binary to decimal
string subinput = input.substr(i,12);
bitset<12> binary(subinput);
code = binary.to_ullong();
i+=12;
```
```       if ( dictionary.find(code) ==dictionary.end() )
dictionary.insert(make_pair(code,(previous_string + previous_string.substr(0,1))));
outputFile<<dictionary[code];
if ( previous_string.size())
dictionary.insert(make_pair(next_code++,previous_string + dictionary[code][0]));
previous_string = dictionary[code];
}
outputFile.close();
```

}

string convert_char_to_string(const char *pCh, int arraySize){

```   string str;
if (pCh[arraySize-1] == '\0') str.append(pCh);
else for(int i=0; i<arraySize; i++) str.append(1,pCh[i]);
return str;
```

}

static void show_usage() {

```       cerr << "Usage: \n"
<< "Specify the file that needs to be compressed or decompressed\n"
<<"lzw -c input    #compress file input\n"
<<"lzw -d input    #decompress file input\n"
<<"Compressed data will be found in a file with the same name but with a .lzw extension\n"
<<"Decompressed data can be found in a file with the same name and a _uncompressed.txt extension\n"
<< endl;
```

}

int main (int argc, char* argv[]) {

```   streampos size;
char * memblock;
```
```   if (argc <2)
{
show_usage();
return(1);
}
ifstream file (argv[2], ios::in|ios::binary|ios::ate);
if (file.is_open())
{
size = file.tellg();
memblock = new char[size];
file.seekg (0, ios::beg);
file.read (memblock, size);
file.close();
string input = convert_char_to_string(memblock,size);
if (string( "-c" ) == argv[1] )
compress(input,size, argv[2]);
else if (string( "-d" ) == argv[1] )
decompress(input,size, argv[2]);
else
show_usage();
}
else {
cout << "Unable to open file."<<endl;
show_usage();
}
return 0;
```

}

PROFILING:

The above program needs an input file to compress and decompress text in. For the purposes of testing, the Gutenberg press' complete works of Shakespeare was used as an input text file (http://www.gutenberg.org/files/100/100-0.txt) because it represents a large enough body of text to actually have perceptible run times for compression.

The text inside the file may have to be duplicated several times later to provide a reasonable data size if we proceed to optimize the compression algorithm for execution on a gpu for it to create perceptible run times.