GPU610/TeamKCM

370 bytes added, 15:20, 5 December 2014
===== Sample Data = 500 =====
[[File:a3 as3v1-500.png]]
===== Sample Data = 1000 =====
[[File:a3 as3v1-2500.png]]
===== Sample Data = 5000 =====
[[File:a3 as3v1-5000.png]]
===== Sample Data = 10000 =====
[[File:a3 as3v1-10000.png]]
==== Chart ====
[[File:a3 duration chart.png]]
 
 
=== Assignment 3 v2 ===
==== Description of v2 ====
Changed double-precision values to single-precision floats.
 
==== Sample Data = 500 ====
[[File:as3v2-500.png]]
 
==== Sample Data = 1000 ====
[[File:as3v2-1000.png]]
 
==== Sample Data = 5000 ====
[[File:as3v2-5000.png]]
 
==== Sample Data = 10000 ====
[[File:as3v2-10000.png]]
 
==== Chart ====
[[File:Table.png]]
 
 
==== Conclusions / Problems Encountered ====
Using CUDA, our team achieved roughly an 8000% speed-up in total run time compared to the original project, while producing the same final result. We were confident that adding a kernel to the original project would yield a large speed-up, because each heat point at a given time was computed serially in a loop. To start, we focused on understanding the original project: finding the "hot spot" in the code and identifying the different variables and their uses. Working with the heat-equation problem was challenging, because we had to understand how heat points are calculated from the equation and determine which parts and variables were, and were not, needed to develop a kernel. The original project took no user input, so we also had to decide which part of the equation should depend on user input.

As we developed the kernel and worked with the program's resources, we had to decide how to use the different memory spaces (accessing global memory, using shared memory, etc.). A core piece of the program's logic is that computing a new heat value requires the previous heat value, so while developing the kernel our team had to figure out how many memory copies were needed and which parts could stay on the GPU side (in shared memory and registers). In the end, we managed to minimize the number of memory copies between CPU and GPU and to make greater use of shared memory and registers.