Revision as of 13:28, 4 March 2013


Team DAG

Team Members

Chris Schreiber, Team Lead

Progress

Assignment 1

Project selection discussed with Chris Szalwinski. Configuring local working environment and hardware for working with the CERN project source code.

A profile of the Drive_God_lin program restricted to 1 core/thread on the CPU (forcing serialized execution of all OpenMP pragmas in the C and Fortran code) showed 4 primary targets to rewrite as CUDA kernels. All 4 of these procedures are part of the Fortran library included for executing the analysis. Some portions of the 'main' function in Drive_God_lin.c contain a parallel OpenMP pragma; these could also be tuned to speed up initialization of the data arrays, but tuning them may not improve the reading of the data file.


Methods most likely to offer parallel improvements via CUDA kernels (top 5 based on the flat profile). (Each sample counts as 0.01 seconds.)
% Time	Cumulative Sec	Self Sec	Calls	Self ms/call	Total ms/call	Name
47.66	9.99	9.99	314400	0.03	0.03	zfunr_
30.45	16.37	6.38	524	12.18	12.18	ordres_
10.84	18.64	2.27	314400	0.01	0.01	cfft_
3.53	19.38	0.74	314400	0.00	0.04	tunelasr_
3.34	20.08	0.70	1048	0.67	13.42	spectrum_
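The columns of the flat profile are related by simple arithmetic: the cumulative column is the running sum of self seconds, and self ms/call is self seconds divided by the number of calls. The following is a minimal sketch that recomputes both from the self seconds and call counts in the table; the names `ROWS` and `recompute` are illustrative and not part of the project.

```python
# Flat-profile rows taken from the table: (name, self seconds, calls).
ROWS = [
    ("zfunr_", 9.99, 314400),
    ("ordres_", 6.38, 524),
    ("cfft_", 2.27, 314400),
    ("tunelasr_", 0.74, 314400),
    ("spectrum_", 0.70, 1048),
]


def recompute(rows):
    """Return (name, cumulative seconds, self ms/call) for each row."""
    out = []
    running = 0.0
    for name, self_sec, calls in rows:
        running += self_sec                   # cumulative = running sum of self seconds
        self_ms = self_sec * 1000.0 / calls   # seconds -> milliseconds per call
        out.append((name, round(running, 2), round(self_ms, 2)))
    return out


derived = recompute(ROWS)
for name, cum, ms in derived:
    print(f"{name:10s} cum={cum:6.2f} s  self={ms:6.2f} ms/call")
```

Recomputing the derived columns this way is a quick check that a hand-copied profile table is internally consistent.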

The method call with the highest self time per call is the ordres_ method, at 12.18 ms per call over 524 calls. This indicates it may be the best target for a parallel implementation.
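As a rough illustration of how the candidates compare, the sketch below ranks the profiled routines two ways: by self time per call and by share of total run time. It also prints an idealized upper bound on whole-program speedup if a routine's self time were eliminated entirely (Amdahl's law). This bound is an assumption added for illustration: it ignores host-device transfer costs and kernel launch overhead, and it is not a measurement from the project. The names `PROFILE` and `amdahl_limit` are hypothetical.

```python
# (name, percent of run time, self seconds, calls) from the flat profile.
PROFILE = [
    ("zfunr_", 47.66, 9.99, 314400),
    ("ordres_", 30.45, 6.38, 524),
    ("cfft_", 10.84, 2.27, 314400),
    ("tunelasr_", 3.53, 0.74, 314400),
    ("spectrum_", 3.34, 0.70, 1048),
]


def amdahl_limit(fraction):
    """Upper bound on overall speedup if `fraction` of run time drops to zero."""
    return 1.0 / (1.0 - fraction)


# Rank by self time per call (long-running individual calls first).
by_per_call = sorted(PROFILE, key=lambda r: r[2] / r[3], reverse=True)
# Rank by share of total run time (largest overall cost first).
by_share = sorted(PROFILE, key=lambda r: r[1], reverse=True)

print("highest self time per call:", by_per_call[0][0])
print("largest share of run time: ", by_share[0][0])
for name, pct, _, _ in PROFILE:
    print(f"{name:10s} idealized speedup limit: {amdahl_limit(pct / 100.0):.2f}x")
```

The two rankings can disagree: a routine with a few long calls may be easier to offload as a single kernel, while a routine with many short calls may account for more total time.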

@@ Line 11: / Line 11: @@
 Profile of the Drive_God_lin program utilizing only 1 core/thread on the CPU (forcing serialized execution of all OpenMP Pragmas in the C+ and Fortran code) showed 4 primary targets to rewrite using CUDA kernels.
 All 4 of these procedure calls are part of the Fortran library included for executing the analysis.  There are some portions of the 'main' method in the Drive_God_lin.c code which include a parallel OpenMP pragma, and this could also be tuned for some improvement for initialization of the data arrays, but may not provide improvement for the reading of the data file.
-Methods most likely to offer parallel improvements via CUDA kernels (Top 5 based on Flat Profile).
+(Each sample counts as 0.01 seconds.)
-Each sample counts as 0.01 seconds.
-  %   cumulative   self              self     total
+{| class="wikitable" border="1"
+|+ Methods most likely to offer parallel improvements via CUDA kernels (Top 5 based on Flat Profile).
+ (Each sample counts as 0.01 seconds.)
+! %Time !! Cum Sec !! Self Sec !! Calls !! Self ms/Call !! Total ms/call !! Name
+|-
+| 47.66 || 9.99 || 9.99 || 314400 || 0.03 || 0.03 || zfunr_
-time   seconds   seconds    calls  ms/call  ms/call  name
+|-
+| 30.45 || 16.73 || 6.38 || 524 || 12.18 || 12.18 || ordres_
+|-
- 47.66      9.99     9.99   314400     0.03     0.03  zfunr_
+| 10.84 || 18.64 || 2.27 || 314400 || 0.01 || 0.01 || cfft_
- 30.45     16.37     6.38      524    12.18    12.18  ordres_
+|-
- 10.84     18.64     2.27   314400     0.01     0.01  cfft_
+|  3.53 || 19.38 || 0.74 || 314400 || 0.00 || 0.04 || tunelasr_
-  3.53     19.38     0.74   314400     0.00     0.04  tunelasr_
+|-
-  3.34     20.08     0.70     1048     0.67    13.42  spectrum_
+|  3.34 || 20.08 || 0.70 || 1048 || 0.67 || 13.42 || spectrum_
+|}
+The method call with the highest time per call is the  ordres_ method.  This indicates it may be the best target for a parallel implementation.
 === Assignment 2 ===
 === Assignment 3 ===

Difference between revisions of "GPU610/Team DAG"
