Difference between revisions of "GPU610/OctoPig"

From CDOT Wiki
Jump to: navigation, search
(Chad's Profiling Findings)
(Chad's Profiling Findings)
Line 23: Line 23:
 
I decided to scrap the randomization part of the original program for now because I want to have consistent profile results every run. I chose to focus on the seed found [http://maximecb.github.com/Turing-Drawings/#6,4,1,2,0,1,3,1,4,3,3,1,3,0,5,2,1,4,2,0,2,1,1,4,1,0,1,3,0,5,1,3,1,1,3,0,1,1,2,3,3,5,1,1,3,1,0,3,1,2,1,1,1,1,3,3,3,2,2,2,3,3,2,1,3,5,3,0,2,3,2,4,1,1 here] because who doesn't love the Matrix. I also decided to only generate 1000 images each time the program is run.
 
I decided to scrap the randomization part of the original program for now because I want to have consistent profile results every run. I chose to focus on the seed found [http://maximecb.github.com/Turing-Drawings/#6,4,1,2,0,1,3,1,4,3,3,1,3,0,5,2,1,4,2,0,2,1,1,4,1,0,1,3,0,5,1,3,1,1,3,0,1,1,2,3,3,5,1,1,3,1,0,3,1,2,1,1,1,1,3,3,3,2,2,2,3,3,2,1,3,5,3,0,2,3,2,4,1,1 here] because who doesn't love the Matrix. I also decided to only generate 1000 images each time the program is run.
  
After profiling I determined that 99% of the run was being spent in two functions, 89% and 10% respectively. The code for the two functions is below.
+
After profiling I determined that 99% of the run was being spent in two functions, 89% and 10% respectively. It's also interesting to note that the function that took 89% gets called 70000 times. The code for the two functions is below.
  
 
<source lang="cpp">
 
<source lang="cpp">
Line 118: Line 118:
 
}
 
}
 
</source>
 
</source>
 +
 +
Because ~90% of the run time is take up by one function it's really the only possible candidate for moving to the GPU and optimizing. Upon closer inspection though you can see that the update function only contains one for loop and the for loop essentially snakes through the image map 1 pixel at a time. I don't think it can be parallelized because each iteration is dependent on the one before. The image creation is reliant on the snaking effect.
  
 
'''Analysis:'''
 
'''Analysis:'''

Revision as of 00:39, 21 February 2013


GPU610/DPS915 | Student List | Group and Project Index | Student Resources | Glossary

TBA

Team Members

  1. Chad Pilkey
  2. Justin Robinson

Email All

Progress

Assignment 1

We began by profiling two things: a string-comparison algorithm, and a program which develops quasi-random visuals from an initial seed using a Turing Machine algorithm.


Justin's Findings

String comparison was not a good choice. If anything, it makes more sense to do string comparison in a serial fashion than in parallel. Comparing two strings of over 13,000 characters each in such a fashion as to force the algorithm to compare them all just barely results in a runtime of 0.001 seconds.


Chad's Profiling Findings

I started with this blog post I found and the demo that Maxime Chevalier-Boisvert (the original creator) posted as a github page here. Make sure to press random a few times to see the different results. She also posted the source code here, the only issue was that it's all written in JavaScript. My main task for completely the assignment was to somehow port the code into C++. The biggest challenge while porting was migrating from a loosely typed language to a strongly typed one, but I managed in the end to successfully port and compile the code. I'm sure there's a lot of cleaning up I can do with my C++ code, but this first assignment was more just a proof of concept than anything else.

I decided to scrap the randomization part of the original program for now because I want to have consistent profile results every run. I chose to focus on the seed found here because who doesn't love the Matrix. I also decided to only generate 1000 images each time the program is run.

After profiling I determined that 99% of the run was being spent in two functions, 89% and 10% respectively. It's also interesting to note that the function that took 89% gets called 70000 times. The code for the two functions is below.

void Program::update(int numItrs) {
	for (int i = 0; i < numItrs; ++i)
	{
		int sy = map[mapWidth * yPos + xPos];
		int st = state;

		int idx = (numStates * sy + st) * 3;
		st = table[idx + 0];
		sy = table[idx + 1];
		int ac = table[idx + 2];

		// Update the current state
		state = st;

		// Write the new symbol
		map[mapWidth * yPos + xPos] = sy;

		// Perform the transition action
		switch (ac)
		{
			case ACTION_LEFT:
			xPos += 1;
			if (xPos >= mapWidth)
				xPos -= mapWidth;
			break;

			case ACTION_RIGHT:
			xPos -= 1;
			if (xPos < 0)
				xPos += mapWidth;
			break;

			case ACTION_UP:
			yPos -= 1;
			if (yPos < 0)
				yPos += mapHeight;
			break;

			case ACTION_DOWN:
			yPos += 1;
			if (yPos >= mapHeight)
				yPos -= mapHeight;
			break;

			default:
			error("invalid action: " + ac);
		}

		itrCount++;
	}
}
void updateRender(tmach::Program* program) {
	float startTime = tmach::getTimeMillis();
	int startItrc = program->itrCount;

		// Until the update time is exhausted
	do {
		// Update the program
		program->update(5000);

		float curTime = tmach::getTimeMillis();
		int curItrc = program->itrCount;

		if (curItrc - startItrc >= UPDATE_ITRS ||
			curTime - startTime >= UPDATE_TIME)
		break;
	} while (true);

	// Produce the image data
	int* map = program->map;
	int map_size = program->size_map;
	unsigned char data [WIDTH * HEIGHT * 3];
	for (int i = 0; i < program->size_map; ++i)
	{
		int sy = map[i];

		int r = colorMap[3 * sy + 0];
		int g = colorMap[3 * sy + 1];
		int b = colorMap[3 * sy + 2];

		data[3 * i + 0] = r;
		data[3 * i + 1] = g;
		data[3 * i + 2] = b;
	}

	// Show the image data
	tmach::createBMP(WIDTH, HEIGHT, data, WIDTH*HEIGHT*4, "temp.bmp");
}

Because ~90% of the run time is take up by one function it's really the only possible candidate for moving to the GPU and optimizing. Upon closer inspection though you can see that the update function only contains one for loop and the for loop essentially snakes through the image map 1 pixel at a time. I don't think it can be parallelized because each iteration is dependent on the one before. The image creation is reliant on the snaking effect.

Analysis: Of our two selections, the choice to move forward with the pattern generator is clear.

Difficulties Met:

One apparent difficulty is the need to port the pattern-generator from Javascript into C in order to make use of CUDA.


Summary:

  • String comparison is not an efficient use of parallel programming practises.


Resources:

Assignment 2

Assignment 3