GPU621 Team 1

Revision as of 19:19, 30 November 2016 by Timothy Chun Yan Ngai (VTune Tutorial 1: Finding HotSpot)

What is VTune Amplifier?

An application that runs different types of analysis on your program
Helps programmers debug and improve their programs
Provides a GUI
Available both as a standalone application and as an IDE add-on

Where to Get VTune?

Price: $899
Windows / Linux
Free Software Tools for:
- Students
- Educators
- Academic researchers
- Open source contributors


Download Here

Getting Started

VTune Tutorial 1: Finding HotSpot

 

This example program is installed along with VTune Amplifier. The sample code from Intel is in the following directory:

[Program Files]\IntelSWTools\VTune Amplifier XE <version>\samples\en\C++\tachyon_vtune_amp_xe.zip

Open the project in Visual Studio. Then launch VTune Amplifier and click New Analysis. (VTune Amplifier must be installed for its tab to appear in Visual Studio.)

 

This is the next page you should see. Here you choose the type of analysis. We are going to run a Basic Hotspots analysis. Click Start to begin the analysis.

 

The program runs automatically once the analysis begins. You will notice that the image loads from the bottom to the top. After the program finishes running, it will take a while for Amplifier to generate the report.

 

 

The first page shows a summary of the run: the elapsed time, the top hotspots, CPU usage, and so on. We will focus on the Hotspots table. Notice that "initialize_2D_buffer" uses the most CPU time. If you look at the code in find_hotspots.cpp, you will see that it is a function defined in that file.

 

Next, go to the Bottom-up tab. It shows a graph of the functions listed in the Hotspots table. You can clearly see that "initialize_2D_buffer" uses the most time compared to the other functions.

 

If we double-click this function, Amplifier shows the source code and which lines use the most time within that function. Now we can tell that most of the time is spent in the while loop.

 

To compare against a parallelized version of this code, I have prepared a program that uses Cilk Plus to parallelize it. The link to download that code is below. Simply replace find_hotspots.cpp with this code, build it, and run the analysis again.

Link:File:Find hotspots.zip
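
To illustrate the idea of splitting a buffer-filling loop across workers, here is a minimal sketch. It is not the code in the linked zip and not Intel's sample: it uses std::thread instead of Cilk Plus so that it builds with any standard C++11 compiler, and the names shade, fill_serial and fill_parallel are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-pixel work; not the tachyon sample's code.
inline unsigned int shade(std::size_t x, std::size_t y) {
    return static_cast<unsigned int>((x * 31u + y * 17u) % 256u);
}

// Serial version: a single thread walks every row from first to last.
std::vector<unsigned int> fill_serial(std::size_t width, std::size_t height) {
    std::vector<unsigned int> buffer(width * height);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            buffer[y * width + x] = shade(x, y);
    return buffer;
}

// Parallel version: rows are split into contiguous bands, one band per worker.
// Each worker writes only its own rows, so no locking is needed.
std::vector<unsigned int> fill_parallel(std::size_t width, std::size_t height,
                                        unsigned int workers) {
    assert(workers > 0);
    std::vector<unsigned int> buffer(width * height);
    std::vector<std::thread> pool;
    const std::size_t band = (height + workers - 1) / workers;  // rows per worker, rounded up
    for (unsigned int w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(height, w * band);
        const std::size_t end = std::min(height, begin + band);
        if (begin == end) continue;  // more workers than rows: nothing left to do
        pool.emplace_back([&buffer, width, begin, end] {
            for (std::size_t y = begin; y < end; ++y)
                for (std::size_t x = 0; x < width; ++x)
                    buffer[y * width + x] = shade(x, y);
        });
    }
    for (std::thread& t : pool) t.join();
    return buffer;
}
```

Calling fill_parallel(width, height, 4) should produce the same buffer as fill_serial(width, height). Because each worker fills its own band of rows, several parts of an image would be drawn at once rather than one row after another.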



 

When you run the analysis this time, you will see the image load at several levels at the same time instead of loading from the bottom to the top.


 

This time we should be able to see that the Elapsed Time is about 10 seconds shorter than before and that the top hotspot is no longer "initialize_2D_buffer".


 

If we go to the Bottom-up tab, you can see that "initialize_2D_buffer" no longer appears, and the graph now shows the Cilk workers, which indicates that the program now runs in parallel.

VTune Tutorial 2: Locks and Wait Tutorial

1. Prepare for Analysis

Note: the configuration step is skipped and the default application configuration is used.

Determine the baseline (the total execution time against which you will compare subsequent runs of the application).

Do this by running the application for the first time.

After running the application, the baseline for the first run is 6.063s.
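
VTune reports the elapsed time for you, but the idea of a baseline can also be sketched in code. The following is a rough illustration only, assuming you want to time a piece of work yourself and compare later runs against the first one. The helper names elapsed_seconds and speedup are hypothetical and are not part of VTune.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` once and returns its wall-clock duration in seconds.
double elapsed_seconds(const std::function<void()>& work) {
    assert(work);  // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of a later run relative to the baseline (values above 1 mean faster).
double speedup(double baseline_seconds, double current_seconds) {
    assert(baseline_seconds > 0.0 && current_seconds > 0.0);
    return baseline_seconds / current_seconds;
}
```

For example, with the 6.063s baseline above, a later run that took half as long would give a speedup of 2.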

 


2. Find lock

a. Choose and run the Locks and Waits analysis:

In Visual Studio, click "New Analysis".

Choose your analysis target (the application executable).

Click the Analysis Type tab. Under Algorithm Analysis, click "Locks and Waits", then click Start to run the analysis.
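
The tutorial's sample application is not shown on this page, so the following is only a hypothetical sketch of the kind of pattern where threads spend time waiting on a lock. The function names count_contended and count_batched are made up for illustration. The first version takes a shared mutex on every increment; the second does its work privately and takes the mutex once per thread.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Every increment takes the same mutex, so threads keep waiting on each other.
long count_contended(unsigned int workers, long per_worker) {
    assert(workers > 0);
    long total = 0;
    std::mutex guard;
    std::vector<std::thread> pool;
    for (unsigned int w = 0; w < workers; ++w) {
        pool.emplace_back([&total, &guard, per_worker] {
            for (long i = 0; i < per_worker; ++i) {
                std::lock_guard<std::mutex> lock(guard);  // lock taken once per increment
                ++total;
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return total;
}

// Each thread counts privately, then takes the mutex once to publish its sum.
long count_batched(unsigned int workers, long per_worker) {
    assert(workers > 0);
    long total = 0;
    std::mutex guard;
    std::vector<std::thread> pool;
    for (unsigned int w = 0; w < workers; ++w) {
        pool.emplace_back([&total, &guard, per_worker] {
            long local = 0;
            for (long i = 0; i < per_worker; ++i) ++local;
            std::lock_guard<std::mutex> lock(guard);  // lock taken once per thread
            total += local;
        });
    }
    for (std::thread& t : pool) t.join();
    return total;
}
```

Both functions return workers * per_worker; they differ only in how often the threads contend for the mutex.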

 

VTune Tutorial 3: Disk Input and Output Analysis

Resources