Open main menu

CDOT Wiki β

Changes

GPU610 Team Tsubame

15,118 bytes added, 14:54, 12 April 2017
Attempt 1:
= TBD... Maze =
== Team Members ==
# [mailto:mavillaflor@myseneca.ca?subject=GPU610 Mark Anthony Villaflor] (Leader; Wiki; Maze)# [mailto:hli206@myseneca.ca?subject=GPU610 Huachen Li](Wiki; Prime Numbers)# [mailto:ylei11@myseneca.ca?subject=GPU610 Yanhao Lei](Wiki; Pi) [mailto:mavillaflor@myseneca.ca,;hli206@myseneca.ca,;ylei11@myseneca.ca?subject=GPU610 eMail All]
== Progress ==
 
* February 10, 2017 - Completed Phase 1 (Select and Assess)
* March 20, 2017 - Added instructions to compile and execute Maze in Visual Studio
* April 4, 2017 - Updated Phase 2 and 3
== PHASE 1 ==
29 std::chrono::steady_clock::time_point ts, te;
30 ts = std::chrono::steady_clock::now();
31 long double pi = 4.0 * arctan1( n ); 32 te = std::chrono::steady_clock::now(); 33 reportTime("Arctangent(1) ", te - ts);
34
35 // Maximum length of a long double is 64 digits; minus "3." gives 62 digits.
04 #include <chrono>
05
06 // Duplicated from [https://scs.senecac.on.ca/~gpu610/pages/workshops/w2.htmlWorkshop 2 - BLAS]
07 void reportTime(const char* msg, std::chrono::steady_clock::duration span) {
08 auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(span);
Example:
> maze 5000 5000
 
'''Update (03.20.2017) - How to setup in Visual Studio 2015'''
 
1. Download source files from https://github.com/corzani/maze
 
2. Create a new Visual Studio project
 
3. Copy all header and cpp files from the downloaded maze directory to the '''project''' directory (not the solution directory)
 
4. In VS, right click on the project in the Solution explorer and Add -> Existing Item; pick the copied files from the previous step
 
5. In VS, go to menu Tools -> NuGet Package Manager -> Package Manager Console; in the new console (at the bottom of the screen in default configuration), execute:
PM> '''Install-Package libpng'''
 
6. In VS, Open MazePng.cpp, remove the following line:
#include <unistd.h>
 
7. In VS, change the Configuration from '''Debug to Release''' and Platform from '''x86 to x64'''
 
8. In VS, go to menu Build -> Build Solution
'''Analysis:'''
0.00 3.84 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN9MazeDebugC2Ejj
0.00 3.84 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
 
*Profiling with windows environment:
 
[[File:SDiagram.PNG]]
'''Summary:'''
----
 
=== Prime Numbers ===
 
Script started on Fri 10 Feb 2017 12:21:37 AM EST
hli206@matrix:~/GPU610/Assignment/primeNum> cat primeNum.custom.cpp
#include<iostream>
#include<ctime>
#include<chrono>
#include<cstdlib>
#include<math.h> // Math.h For sqrt function
using namespace std;
using namespace std::chrono;
void init(float* a, int n) {
const float randf = 1.0f / (float) RAND_MAX;
for (int i = 0; i < n; i++)
a[i] = std::rand() * randf;
}
void reportTime(const char* msg, steady_clock::duration span) {
auto ms = duration_cast<milliseconds>(span);
std::cout << msg << " - took - " <<
ms.count() << " millisecs" << std::endl;
}
void primeFinder(int num, int bn) {
for (int i=2; i<num; i++)
for (int j=2; j*j<=i; j++)
{
if (i % j == 0)
break;
else if (j+1 > sqrt(i)) {
if(bn == 1)
cout << i << endl;
}
}
}
int main() {
int n, boolN;
steady_clock::time_point ts, te;
cout<<"Enter the Number :";
cin>>n;
cout<<"Show result?(1/0)"<<endl;
cin>>boolN;
cout<<"List Of Prime Numbers Below "<<n<<endl;
std::srand(std::time(nullptr));
ts = steady_clock::now();
primeFinder(n, boolN);
te = steady_clock::now();
reportTime("Prime Number(); ", te - ts);
return 0;
}
hli206@matrix:~/GPU610/Assignment/primeNum> cat Makefile
# Makefile for Assignment1
#
VER=custom
GCC_VERSION = 5.2.0
PREFIX = /usr/local/gcc/${GCC_VERSION}/bin/
CC = ${PREFIX}gcc
CPP = ${PREFIX}g++
primeNum.${VER}: primeNum.${VER}.o
$(CPP) -oprimeNum.${VER} primeNum.${VER}.o
primeNum.${VER}.o: primeNum.${VER}.cpp
$(CPP) -c -O2 -std=c++14 primeNum.${VER}.cpp
clean:
rm *.o
hli206@matrix:~/GPU610/Assignment/primeNum> make
make: `primeNum.custom' is up to date.
hli206@matrix:~/GPU610/Assignment/primeNum> primeNum.custom
Enter the Number :50
Show result?(1/0)
1
List Of Prime Numbers Below 50
5
7
11
13
17
19
23
29
31
37
41
43
47
Prime Number(); - took - 0 millisecs
hli206@matrix:~/GPU610/Assignment/primeNum> primeNum.custom
Enter the Number :100000
Show result?(1/0)
0
List Of Prime Numbers Below 100000
Prime Number(); - took - 84 millisecs
hli206@matrix:~/GPU610/Assignment/primeNum> primeNum.custom
Enter the Number :1000000
Show result?(1/0)
0
List Of Prime Numbers Below 1000000
Prime Number(); - took - 2065 millisecs
hli206@matrix:~/GPU610/Assignment/primeNum> primNum.custom
If 'primNum.custom' is not a typo you can use command-not-found to lookup the package that contains it, like this:
cnf primNum.custom
hli206@matrix:~/GPU610/Assignment/primeNum> primeNum.custom
Enter the Number :10000000
Show result?(1/0)
0
List Of Prime Numbers Below 10000000
Prime Number(); - took - 53504 millisecs
hli206@matrix:~/GPU610/Assignment/primeNum> exit
exit
Script done on Fri 10 Feb 2017 12:25:53 AM EST
 
'''Analysis'''
 
This is a simple c++ program for finding any prime number within range. The program function primeFinder () is the only function in the program so it takes up to nearly 100% of the execution time. Moreover, primeFinder() has a runtime of O(n^2).
 
primeFinder() function:
void primeFinder(int num, int bn) {
for (int i=2; i<num; i++)
for (int j=2; j*j<=i; j++)
{
if (i % j == 0)
break;
else if (j+1 > sqrt(i)) {
if(bn == 1)
cout << i << endl;
}
}
}
 
 
----
 
=== Recommendation ===
 
Maze.
 
Although Pi may have more potential, its calculated results are too limited. On the other hand, Maze has a higher limit which will enable us to do better testing.
== PHASE 2 ==
=== Maze ===
 
[[File:Out2.png|thumb|Maze 3x3]]
 
[[File:Out1.png|thumb|Maze 10x10]]
 
'''Introduction:'''
 
The Maze program (by corzani@github) generates a maze image (of type PNG).
 
The following function creates the canvas of the Maze and initialize it by filling all the hexadecimal within each pixel. After the canvas is filled, the function calls the createImage function. Finally, the function write the result from the createImage function in a png file using the "lpng" extension.
 
void MazePng::toPng(unsigned int scale) {
...
color_type = PNG_COLOR_TYPE_RGB;
bit_depth = 8;
row_pointers = new png_bytep[height];
// This for loop filling all the hexadecimal within each pixel.
for (int i = 0; i < height; ++i) {
row_pointers[i] = new png_byte[width * 3];
// The inner for loop fills all three colours (red, green, blue) to the pixel.
for (int j = 0; j < width * 3; j += 3) {
row_pointers[i][j] = WALL;
row_pointers[i][j + 1] = WALL;
row_pointers[i][j + 2] = WALL;
}
}
createImage(row_pointers, 0);
...
}
 
The following function carves the path of the Maze by changing the colour of pixels to white.
void MazePng::createImage(png_bytep *row_pointers, unsigned int scale) {
...
//This loops sets the pixels of the row_pointers using the cells pattern
for (unsigned int y = 0; y < height; ++y) {
setPixel(row_pointers, 0, (y * 2) + 1, WALL);
for (unsigned int x = 0; x < width; ++x) {
switch ((cells[(y * width) + x] & 0xC0) >> 6) {
case 2:
temp[0] = PATH;
setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, temp[0]);
break;
case 1:
temp[1] = PATH;
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, temp[1]);
break;
case 0:
setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, temp[0]);
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, temp[1]);
break;
}
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 1, PATH);
}
}
}
'''Testing Environment:'''
* Operating system: Windows 8.1 Enterprise 64-bit (6.3, Build 9600) (9600.winblue_ltsb.160812-0914)
* Compiler: Visual Studio 2015
* Processor: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz (8 CPUs), ~3.6GHz
* System memory: 16384MB RAM
 
'''How to setup in Visual Studio 2015 in Cuda:'''
 
1. Download source files from https://github.com/corzani/maze
 
2. Create a new Cuda project
 
3. Copy all header and cpp files from the downloaded maze directory to the '''project''' directory (not the solution directory)
 
4. In VS, right click on the project in the Solution explorer and Add -> Existing Item; pick the copied files from the previous step
 
5. In VS, go to menu Tools -> NuGet Package Manager -> Package Manager Console; in the new console (at the bottom of the screen in default configuration), execute:
PM> '''Install-Package libpng'''
 
6. In VS, replace the entire MazePng.cpp file with the following code:
/*
* MazePng.cpp
*
* Created on: 12 Jul 2013
* Author: yac
*/
#include "MazePng.h"
// Include the parallel code...
#include "MazePng.cuh"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
MazePng::MazePng(const unsigned int width, const unsigned int height) : AbstractMaze(width, height) { }
MazePng::~MazePng() { }
void MazePng::toPng(unsigned int scale) {
png_byte color_type;
png_byte bit_depth;
png_structp png_ptr;
png_infop info_ptr;
png_bytep *row_pointers;
int height, width;
FILE *fp;
width = (this->width * 2) + 1;
height = (this->height * 2) + 1;
color_type = PNG_COLOR_TYPE_RGB;
bit_depth = 8;
row_pointers = new png_bytep[height];
// Call the kernel wrapper...
kw_drawWalls(row_pointers, this->cells, this->width, this->height, width, height, WALL, start);
fp = fopen("out.png", "wb");
png_ptr = png_create_write_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
info_ptr = png_create_info_struct(png_ptr);
png_init_io(png_ptr, fp);
png_set_IHDR(png_ptr, info_ptr, width, height, bit_depth, color_type,
PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_BASE,
PNG_FILTER_TYPE_BASE);
png_write_info(png_ptr, info_ptr);
png_write_image(png_ptr, row_pointers);
png_write_end(png_ptr, NULL);
png_destroy_write_struct(&png_ptr, &info_ptr);
for (int i = 0; i < height; ++i) {
delete[] row_pointers[i];
}
delete[] row_pointers;
fclose(fp);
}
 
7. In VS, right click on the project in the Solution explorer and Add -> '''New Item'''; Add a new '''Cuda c++ header''' file, name it '''"MazePng.cuh"'''
 
8. Copy and paste the following code into '''"MazePng.cuh"''':
 
// MazePng.cuh
#ifndef MAZEPNG_CUH_
#define MAZEPNG_CUH_
#include "MazePng.h"
#include <cuda_runtime.h>
const int ntpb = 1024;
void kw_drawWalls(png_bytep*& row_pointers, const short* cells, const int width, const int height, const int pixWidth, const int pixHeight, const png_byte WALL, const int start);
#endif
 
9. In VS, right click on the project in the Solution explorer and Add -> '''New Item'''; Add a new '''Cuda c++''' file, name it '''"MazePng.cu"'''
 
10. Copy and paste the following code into '''"MazePng.cu"''':
 
// MazePng.cu
#include "MazePng.cuh"
__global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < size) {
rows[i] = WALL;
__syncthreads();
int px = i % len;
int py = i / len;
int x = (px - 1) / 2;
int y = (py - 1) / 2;
if (px > 0 && py > 0 && x < width && y < height) {
int c = (cells[y * width + x] & 0xC0) >> 6;
int idx = py * len + 3 * px;
if (c == 2) {
if (py % 2 > 0 && px % 2 == 0) {
rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH;
}
}
else if (c == 1) {
if (py % 2 == 0 && px % 2 > 0) {
rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH;
}
}
else if (c == 0) {
if ((py % 2 > 0 && px % 2 == 0) || (py % 2 == 0 && px % 2 > 0)) {
rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH;
}
}
if (py % 2 > 0 && px % 2 > 0) {
rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH;
}
}
}
}
// Kernel wrapper...
void kw_drawWalls(png_bytep*& row_pointers, const short* cells, const int width, const int height, const int pixWidth, const int pixHeight, const png_byte WALL, const int start) {
int rowLen = pixWidth * 3;
for (int i = 0; i < pixHeight; ++i) {
row_pointers[i] = new png_byte[rowLen];
}
png_byte* d_rows;
short* d_cells;
int nblks = (pixHeight * rowLen + ntpb - 1) / ntpb;
int szRows = pixHeight * rowLen * sizeof(png_byte);
int szCells = height * width * sizeof(short);
cudaMalloc((void**)&d_rows, szRows);
cudaMalloc((void**)&d_cells, szCells);
cudaMemcpy(d_cells, cells, szCells, cudaMemcpyHostToDevice);
k_drawWalls << <nblks, ntpb >> > (d_rows, d_cells, width, height, rowLen, szRows);
cudaDeviceSynchronize();
cudaStream_t* stream = new cudaStream_t[pixHeight];
for (int i = 0; i < pixHeight; i++) {
cudaStreamCreate(&stream[i]);
}
for (int i = 0; i < pixHeight; i++) {
cudaMemcpyAsync(row_pointers[i], d_rows + (i * rowLen), rowLen * sizeof(png_byte), cudaMemcpyDeviceToHost, stream[i]);
}
for (int i = 0; i < pixHeight; i++) {
cudaStreamDestroy(stream[i]);
}
if (start < width) {
int x = start * 2 + 1;
row_pointers[0][3 * x] = row_pointers[0][3 * x + 1] = row_pointers[0][3 * x + 2] = PATH;
}
delete[] stream;
cudaFree(d_cells);
cudaFree(d_rows);
}
 
11. In VS, change the Configuration from '''Debug to Release''' and Platform from '''x86 to x64'''
 
12. In VS, go to menu Build -> Build Solution
 
'''Analysis:'''
 
[[File:SPDiagram.PNG]]
 
The diagram shows that maze less than the size of 2,250,000 cells perform better in the serial code. However, the parallelized code has better performance if there are more cells to process.
== PHASE 3 ==
 
=== Attempt 1 ===
 
We tried to optimize the maze program by removing some of the if statements to reduce thread divergence, however the attempt made the program slower than the phase 2 version and only a little bit faster than the serial version.
 
'''New Kernels:'''
// Initialize all pixels to black (hex 000)
__global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < size) {
rows[i] = WALL;
__syncthreads();
}
}
 
// Set pixels to white according to the pattern the cell belongs to
__global__ void k_drawPaths(png_byte* rows, const short* cells, const int width, const int height, const int len) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < width * height) {
int x = i % width;
int y = i / width;
int cas = (cells[i] & 0xC0) >> 6;
int oddY = (2 * y + 1) * len;
int evenY = (2 * y + 2) * len;
int oddX = (2 * x + 1) * 3;
int evenX = (2 * x + 2) * 3;
if (cas == 2) {
rows[oddY + evenX] = rows[oddY + evenX + 1] = rows[oddY + evenX + 2] = PATH;
}
else if (cas == 1) {
rows[evenY + oddX] = rows[evenY + oddX + 1] = rows[evenY + oddX + 2] = PATH;
}
else if (cas == 0) {
rows[oddY + evenX] = rows[oddY + evenX + 1] = rows[oddY + evenX + 2] = PATH;
rows[evenY + oddX] = rows[evenY + oddX + 1] = rows[evenY + oddX + 2] = PATH;
}
rows[oddY + oddX] = rows[oddY + oddX + 1] = rows[oddY + oddX + 2] = PATH;
}
}
 
'''Analysis:'''
 
[[File:SPODiagram.PNG]]
 
For this attempt, the kernel executes for each cell instead of for each byte (in phase 2); it is no longer processing more than 1 pixel in each cell for every thread, which may be the cause to the longer processing time.
 
=== Attempt 2 ===
 
We decided to use shared memory in the GPU to improve the speed of the program. However, the maze image does not show correctly: the paths were showing as randomly coloured pixels; the cause is due to the threads setting only a part (1/3, 2/3) of a pixel's values to hexadecimal F.
240
edits