Changes


GPU621 Team 1

3,361 bytes removed, 16:00, 1 December 2016
== VTune Tutorial 2: Locks and Wait Tutorial ==
Reference: https://software.intel.com/en-us/node/471876

Work flow of analyzing locks and waits:

[[File:1.gif]]

The sample application we are using for this tutorial is called "tachyon". For application configuration options and to get the file, refer to this: https://software.intel.com/en-us/node/471878

'''1. Prepare for Analysis'''

Note: the configuration step is skipped and the default application configuration is used.

Determine the baseline (the total execution time against which you will compare subsequent runs of the application). Do this by running the application for the first time. After running the application, the baseline for the first run is 6.063s.
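A baseline is easier to compare against if it is measured the same way on every run. The following is a minimal sketch, not part of the tutorial, of timing a command from the outside with Python's standard library. The helper name `measure_baseline` and the placeholder command are assumptions for illustration; substitute the path to your own build of the sample application.

```python
import subprocess
import sys
import time


def measure_baseline(command, runs=3):
    """Run `command` several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the program exits with an error
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Placeholder target (hypothetical): replace with your own executable and arguments.
    placeholder = [sys.executable, "-c", "sum(range(100000))"]
    print(f"baseline: {measure_baseline(placeholder):.3f}s")
```

Taking the fastest of several runs reduces noise from other processes. The number will not match the Elapsed Time reported by VTune exactly, because it also includes process start-up.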
[[File:1.PNG]]

'''2. Find lock'''

'''2a.''' Choose and run the Locks and Waits analysis:

In Visual Studio, click "New Analysis" and choose your analysis target (the application executable). Click the Analysis Type tab. Under Algorithm Analysis, click "Locks and Waits", then click Start to run the analysis.

[[File:3.PNG]]

You should see the Locks and Waits viewpoint and the summary of the results.

[[File:4.PNG]]

'''2b.''' Interpret result data:

To interpret the data on the sample code performance, do the following:

1. Analyze the basic performance metrics.

2. Identify locks.

Analyze the basic performance metrics:

The Result Summary section provides data on the overall application performance per the following metrics:

[[File:5.PNG]]

1.) Elapsed Time is the total time the application ran, including data allocation and calculations.

2.) Wait Time occurs when software threads are waiting due to APIs that block or cause synchronization. Wait Time is calculated per thread, so the total Wait Time may exceed the application Elapsed Time. Expand the Wait Time metric to view a distribution per processor utilization level. In the sample application, most of the Wait Time is characterized by ineffective processor usage.

3.) Wait Count is the overall number of times the system wait API was called for the analyzed application.

4.) Spin Time is the time a thread is active in a synchronization construct. The current value exceeds the threshold, so it is classified as a performance issue and highlighted in pink.

5.) CPU Time is the sum of CPU time for all threads.

6.) Total Thread Count is the number of threads in the application.

7.) Paused Time is the amount of Elapsed Time during which the analysis was paused via GUI, CLI commands, or user API.
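The point that Wait Time is summed per thread, and can therefore exceed Elapsed Time, can be reproduced with a small experiment. The sketch below is an illustration in Python rather than code from the tachyon sample. The function `run_contended` and its parameters are hypothetical names. It makes several threads compete for one lock and records how long each thread waited and how many acquisitions had to block.

```python
import threading
import time


def run_contended(num_threads=4, hold_seconds=0.05):
    """Let threads compete for one lock; return (elapsed, per-thread waits, wait count)."""
    lock = threading.Lock()
    waits = [0.0] * num_threads
    blocked = [0] * num_threads

    def worker(index):
        start = time.perf_counter()
        if not lock.acquire(blocking=False):  # lock is busy, so this thread must wait
            blocked[index] += 1
            lock.acquire()
        waits[index] = time.perf_counter() - start
        try:
            time.sleep(hold_seconds)  # stand-in for work inside the critical section
        finally:
            lock.release()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    begin = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - begin
    return elapsed, waits, sum(blocked)


if __name__ == "__main__":
    elapsed, waits, wait_count = run_contended()
    print(f"elapsed={elapsed:.3f}s total_wait={sum(waits):.3f}s wait_count={wait_count}")
```

With four threads that each hold the lock for the same time, the threads run one after another, so the later threads wait for all earlier ones. The waits add up to more than the wall-clock time of the whole run, which is the same effect the Summary window reports.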
For the analyze_locks application, the Wait Time is high. To identify the cause, you need to understand how this Wait Time is distributed across synchronization objects. The Top Waiting Objects section lists the synchronization objects with the highest Wait Time and Wait Count, sorted by the Wait Time metric.

[[File:6.PNG]]

For the analyze_locks application, focus on the first three objects and explore the Bottom-up pane for more details.

The Thread Concurrency Histogram represents the Elapsed Time and concurrency level for the specified number of running threads. Ideally, the highest bar of your chart should be within the OK or Ideal utilization range.

[[File:7.PNG]]

Note the Target Concurrency value. By default, this number is equal to the number of physical cores. Consider this number as your optimization goal. For the sample code, the chart shows that analyze_locks is a multithreaded application running a maximum of 4 threads simultaneously on a machine with 4 cores, but it is not using the available cores effectively. Hover over the second bar to see how long the application ran serially. The tooltip shows that the application ran one thread for almost 6.611 seconds, which is classified as Poor concurrency.

The CPU Usage Histogram represents the Elapsed Time and usage level for the logical CPUs. Ideally, the highest bar of your chart should be within the OK or Ideal utilization range.

[[File:8.PNG]]
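To make the Thread Concurrency Histogram more concrete, the sketch below shows one way such a histogram could be computed from a list of time intervals in which threads were running. This is an assumption about the general idea, not VTune's actual implementation. The function name `concurrency_histogram` and the sample trace are made up for illustration.

```python
from collections import defaultdict


def concurrency_histogram(intervals):
    """Map each concurrency level to the total time spent at that level.

    `intervals` is a list of (start, end) pairs, one per period in which
    some thread was running.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a thread starts running
        events.append((end, -1))    # a thread stops running
    events.sort()  # at equal timestamps, stops (-1) sort before starts (+1)

    histogram = defaultdict(float)
    level = 0
    previous = None
    for timestamp, delta in events:
        if previous is not None and timestamp > previous:
            histogram[level] += timestamp - previous
        level += delta
        previous = timestamp
    return dict(histogram)


if __name__ == "__main__":
    # Made-up trace: one thread runs for 10 units, three others run only briefly.
    trace = [(0, 10), (2, 4), (2, 4), (3, 4)]
    print(concurrency_histogram(trace))  # {1: 8.0, 3: 1.0, 4: 1.0}
```

In this made-up trace, 8 of the 10 time units are spent with a single running thread. That is the same shape as the tall bar at one thread in the sample results, and it is what the histogram classifies as poor concurrency on a 4-core machine.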
== VTune Tutorial 3: Disk input Output Analysis ==
== Presentation ==
[https://docs.google.com/presentation/d/1zCTrwVQe-oJmkgK0mWI35bUyyqOtJHZYl9QEmmjgpb8/edit?usp=sharing Link To Google Slides Presentation]