Changes

Jump to: navigation, search

GPU610 Team Tsubame

196 bytes added, 14:54, 12 April 2017
Attempt 1:
== Team Members ==
# [mailto:mavillaflor@myseneca.ca?subject=GPU610 Mark Anthony Villaflor] (Leader; Wiki; Maze)# [mailto:hli206@myseneca.ca?subject=GPU610 Huachen Li] (Wiki; Prime Numbers)
# [mailto:ylei11@myseneca.ca?subject=GPU610 Yanhao Lei] (Wiki; Pi)
[mailto:mavillaflor@myseneca.ca;hli206@myseneca.ca;ylei11@myseneca.ca?subject=GPU610 eMail All]
* February 10, 2017 - Completed Phase 1 (Select and Assess)
* March 20, 2017 - Added instructions to compile and execute Maze in Visual Studio
* April 4, 2017 - Updated Phase 2 and 3
== PHASE 1 ==
0.00 3.84 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN9MazeDebugC2Ejj
0.00 3.84 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
 
*Profiling with windows environment:
 
[[File:SDiagram.PNG]]
'''Summary:'''
=== Maze ===
[[File:Out2.png|thumb|Maze 3x3]]
 
[[File:Out1.png|thumb|Maze 10x10]]
 
'''Introduction:'''
 
The Maze program (by corzani@github) generates a maze image (of type PNG).
 
The following function creates the canvas of the Maze and initialize it by filling all the hexadecimal within each pixel. After the canvas is filled, the function calls the createImage function. Finally, the function write the result from the createImage function in a png file using the "lpng" extension.
 
void MazePng::toPng(unsigned int scale) {
...
color_type = PNG_COLOR_TYPE_RGB;
bit_depth = 8;
row_pointers = new png_bytep[height];
// This for loop filling all the hexadecimal within each pixel.
for (int i = 0; i < height; ++i) {
row_pointers[i] = new png_byte[width * 3];
// The inner for loop fills all three colours (red, green, blue) to the pixel.
for (int j = 0; j < width * 3; j += 3) {
row_pointers[i][j] = WALL;
row_pointers[i][j + 1] = WALL;
row_pointers[i][j + 2] = WALL;
}
}
createImage(row_pointers, 0);
...
}
 
The following function carves the path of the Maze by changing the colour of pixels to white.
void MazePng::createImage(png_bytep *row_pointers, unsigned int scale) {
...
//This loops sets the pixels of the row_pointers using the cells pattern
for (unsigned int y = 0; y < height; ++y) {
setPixel(row_pointers, 0, (y * 2) + 1, WALL);
for (unsigned int x = 0; x < width; ++x) {
switch ((cells[(y * width) + x] & 0xC0) >> 6) {
case 2:
temp[0] = PATH;
setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, temp[0]);
break;
case 1:
temp[1] = PATH;
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, temp[1]);
break;
case 0:
setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, temp[0]);
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, temp[1]);
break;
}
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 1, PATH);
}
}
}
'''Testing Environment:'''
* Operating system: Windows 8.1 Enterprise 64-bit (6.3, Build 9600) (9600.winblue_ltsb.160812-0914)
PM> '''Install-Package libpng'''
6. In VS, Open MazePng.cpp, Replace replace the entire MazePng.cpp file with Paralleling code in the followingcode:
/*
* MazePng.cpp
// Call the kernel wrapper...
kw_drawWalls(row_pointers, this->cells, this->width, this->height, width, height, WALL, start);
// createImage(row_pointers, 0);
fp = fopen("out.png", "wb");
}
void inline MazePng::setPixel(png_bytep *row_pointers, unsigned int x, unsigned int y,
png_byte type) {
row_pointers[y][0 + 3 * x] = row_pointers[y][1 + 3 * x] = row_pointers[y][2 + 3 * x] = type;
}
void MazePng::createImage(png_bytep *row_pointers, unsigned int scale) {
if (start < width) {
setPixel(row_pointers, start * 2 + 1, 0, PATH);
}
for (unsigned int y = 0; y < height; ++y) { // for each row in range [1, actual height (not the number of pixels) )...
for (unsigned int x = 0; x < width; ++x) { // for each cell in range [1, actual width)...
switch ((cells[(y * width) + x] & 0xC0) >> 6) {
case 2:
setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, PATH); // even x, odd y
break;
case 1:
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, PATH); // odd x, even y
break;
case 0:
setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, PATH); // even x, odd y
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, PATH); // odd x, even y
break;
}
setPixel(row_pointers, 1 + (x * 2), (y * 2) + 1, PATH); // odd
}
}
}
7. In VS, right click on the project in the Solution explorer and Add -> '''New Item'''; Add a new '''Cuda c++ header''' file, name it '''"MazePng.cuh"'''
8. Cope Copy and paste the following code into '''"MazePng.cuh"''':
// MazePng.cuh
9. In VS, right click on the project in the Solution explorer and Add -> '''New Item'''; Add a new '''Cuda c++''' file, name it '''"MazePng.cu"'''
10. Cope Copy and paste the following code into '''"MazePng.cu"''':
// MazePng.cu
}
// Kernel wrapper...
void kw_drawWalls(png_bytep*& row_pointers, const short* cells, const int width, const int height, const int pixWidth, const int pixHeight, const png_byte WALL, const int start) {
int rowLen = pixWidth * 3;
k_drawWalls << <nblks, ntpb >> > (d_rows, d_cells, width, height, rowLen, szRows);
// cudaDeviceSynchronize();
cudaStream_t* stream = new cudaStream_t[pixHeight];
12. In VS, go to menu Build -> Build Solution
'''Analysis:'''
 
[[File:SPDiagram.PNG]]
'''Parallelize:'''The diagram shows that maze less than the size of 2,250,000 cells perform better in the serial code. However, the parallelized code has better performance if there are more cells to process.
== PHASE 3 ==
== Presentation ==
 
'''1. Introduction'''
* Maze
The program generates a maze in png file. It takes 2 argument, the height and the width of the maze.
 
* Analysis:
 
'''Parallelizable?'''
The program function named toPng() has a runtime of O(n^2) takes up an average of 42% (min: 35%; max: 50%) of the execution time.
 
for (int i = 0; i < height; ++i) {
row_pointers[i] = new png_byte[width * 3];
for (int j = 0; j < width * 3; j += 3) {
row_pointers[i][j] = WALL;
row_pointers[i][j + 1] = WALL;
row_pointers[i][j + 2] = WALL;
}
}
 
* Profiling:
[[File:SDiagram.PNG]]=== Attempt 1 ===
'''We tried to optimize the maze program by removing some of the if statements to reduce thread divergence, however the attempt made the program slower than the phase 2version and only a little bit faster than the serial version. Parallelize'''
*Kernel'''New Kernels:''' __global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < size) { rows[i] = WALL; __syncthreads(); int px = i % len; int py = i / len; int x = (px - 1) / 2; int y = (py - 1) / 2; if (px > 0 && py > 0 && x < width && y < height) { int c = (cells[y * width + x] & 0xC0) >> 6; int idx = py * len + 3 * px; if (c == 2) { if Initialize all pixels to black (py % 2 > 0 && px % 2 == 0hex 000) { rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH; } } else if (c == 1) { if (py % 2 == 0 && px % 2 > 0) { rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH; } } else if (c == 0) { if ((py % 2 > 0 && px % 2 == 0) || (py % 2 == 0 && px % 2 > 0)) { rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH; } } if (py % 2 > 0 && px % 2 > 0) { rows[idx] = rows[idx + 1] = rows[idx + 2] = PATH; } } } }  *Profiling: [[File:SPDiagram.PNG]]  *Sum up: **Maze size < 1500 * 1500**Maze size > 1500 * 1500 '''3. Optimize''' * New Kernel
__global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
}
// Set pixels to white according to the pattern the cell belongs to
__global__ void k_drawPaths(png_byte* rows, const short* cells, const int width, const int height, const int len) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
}
Because there was too many else if condition in the old Kernel, we rewrite the Kernel to avoid divergent.'''Analysis:'''
[[File:SPODiagram.PNG]]
*Sum up: For this attempt, the kernel executes for each cell instead of for each byte (in phase 2); it is no longer processing more than 1 pixel in each cell for every thread, which may be the cause to the longer processing time. === Attempt 2 === **Still too many if statement We decided to use shared memory in the Kernel**Almost GPU to improve the speed of the program. However, the maze image does not show correctly: the same section paths were showing as randomly coloured pixels; the cause is due to the serialthreads setting only a part (1/3, 2/3) of a pixel's values to hexadecimal F.
240
edits

Navigation menu