Changes

GPU610/Team DAG

198 bytes added, 13:28, 4 March 2013

=== Assignment 1 ===

A profile of the Drive_God_lin program running on only 1 CPU core/thread (forcing serialized execution of all OpenMP pragmas in the C and Fortran code) showed 4 primary targets to rewrite as CUDA kernels.

All 4 of these procedure calls are part of the Fortran library included for executing the analysis. Some portions of the 'main' function in Drive_God_lin.c also include a parallel OpenMP pragma. These could be tuned to improve initialization of the data arrays, but may not improve the reading of the data file.

{| class="wikitable" border="1"
|+ Methods most likely to offer parallel improvements via CUDA kernels (Top 5 based on Flat Profile). (Each sample counts as 0.01 seconds.)
! % Time !! Cum Sec !! Self Sec !! Calls !! Self ms/Call !! Total ms/Call !! Name
|-
| 47.66 || 9.99 || 9.99 || 314400 || 0.03 || 0.03 || zfunr_
|-
| 30.45 || 16.37 || 6.38 || 524 || 12.18 || 12.18 || ordres_
|-
| 10.84 || 18.64 || 2.27 || 314400 || 0.01 || 0.01 || cfft_
|-
| 3.53 || 19.38 || 0.74 || 314400 || 0.00 || 0.04 || tunelasr_
|-
| 3.34 || 20.08 || 0.70 || 1048 || 0.67 || 13.42 || spectrum_
|}

The method with the highest self time per call is ordres_ (12.18 ms per call). This indicates it may be the best target for a parallel implementation.
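The per-call figures in the flat profile can be re-derived from the self seconds and call counts. The following is a minimal sketch in Python rather than the project's C or Fortran. The names `profile` and `self_ms_per_call` are hypothetical helpers introduced here for illustration, and the numbers are copied from the profile above.

```python
# Self seconds and call counts copied from the flat profile above.
# The dictionary layout and helper name are illustrative assumptions.
profile = {
    "zfunr_": (9.99, 314400),
    "ordres_": (6.38, 524),
    "cfft_": (2.27, 314400),
    "tunelasr_": (0.74, 314400),
    "spectrum_": (0.70, 1048),
}


def self_ms_per_call(self_seconds, calls):
    """Average self time of a single call, in milliseconds."""
    return 1000.0 * self_seconds / calls


per_call = {
    name: self_ms_per_call(seconds, calls)
    for name, (seconds, calls) in profile.items()
}

# The routine with the largest average self time per call.
slowest = max(per_call, key=per_call.get)

for name, ms in sorted(per_call.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} {ms:8.2f} ms/call")
print("highest self time per call:", slowest)
```

Ranking by self time per call singles out ordres_, while ranking by total self seconds would put zfunr_ first, so which metric to use depends on whether the goal is to shorten individual long calls or to reduce overall run time.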

=== Assignment 2 ===

=== Assignment 3 ===

Christopher Schreiber

