GPU621 Team Tsubame

2,256 bytes added, 03:55, 21 November 2016
How is it actually used?
The following walk-through assumes that you have Visual Studio 2015 and Intel Advisor 2017 installed.
==== '''Preparations:''' ====
1. Download and unzip Prefix Scan.zip to a preferred location and open it with Visual Studio 2015.
4. Change the following project properties:
    1. In C/C++ > General > Additional Include Directories, add the Advisor’s directory using macro notation: $(ADVISOR_..._DIR)include (or $(ADVISOR_..._DIR)\include if the environment variable does not end with a backslash).
    2. In C/C++ > General > Debug Information Format, confirm it is set to Program Database (/Zi).
    3. In Linker > Debugging > Generate Debug Info, set it to Optimize for debugging (/DEBUG).
    4. In C/C++ > Optimization > Optimization, confirm it is set to Maximize Speed (/O2) or higher.
    5. On the same page, set Inline Function Expansion to Only __inline (/Ob1).
    6. In C/C++ > Code Generation > Runtime Library, confirm it has been set to Multi-threaded DLL (/MD); another option is to set this field to Multi-threaded Debug DLL (/MDd).
    7. Enable OpenMP under C/C++ > Language > OpenMP Support by setting it to Generate Parallel Code (/Qopenmp).
    8. Click OK to save the properties.
5. Comment out the “terminate” section in w3.main.cpp to end the application without waiting for user input.
16. Select OK to complete the project creation process.
 
''' Profiling: '''
1. Allow Advisor to survey the application by clicking on the Collect button under the Threading Workflow tab (on the left panel).
 
2. Continue profiling by running the Trip Counts and FLOPS analysis.
 
''' Further Analysis: '''
1. Looking at the report, you can pick targets from the list of Function Call Sites and Loops to annotate and determine if they are suitable for parallel framework code. For the purpose of this walkthrough, the inner loop of the upsweep in exclusive scan was chosen as the target for annotations.
 
2. To add annotations, include the <advisor-annotate.h> header file.
 
3. Mark a possible parallel site and task with the following macros:
 
4. Rebuild the project. You may need to re-run the Survey Analysis and, optionally, the Trip Counts and FLOPS Analysis.
 
5. Check the boxes beside the sites you want to mark for deeper analysis.
 
6. With one of the sites checked, run the Dependencies Analysis.
 
7. For this example, there should be no dependencies. However, there is one warning: One task in parallel site. Right-click the warning and select the What Should I Do Next? option.
NOTE: the What Should I Do Next? option is very useful for opening the documentation on the item you have selected.
 
8. Go back to the Survey Report and uncheck the Deeper Analysis checkbox beside the target site.
NOTE: you can mark multiple sites and nest multiple tasks in each site, but the analyses will run longer.
 
9. Once you have annotated the sites and their tasks, run the Suitability Analysis.
 
10. Since OpenMP is the focus of this workshop, change the Threading Model to OpenMP. Next, set the CPU Count to the number of processors available on the machine.
 
11. Load Imbalance and Runtime Overhead will change as you modify the Avg. Number of Iterations (Tasks) and the Avg. Iteration (Task) Duration sliders and click Apply.
 
12. Estimated performance will also increase if you check the Runtime Modeling checkboxes that have benefits attached. The blue links explain how to enable each enhancement.
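
As an illustration of steps 2 and 3 of the Further Analysis, the sketch below marks the inner loop of the up-sweep as a parallel site whose iterations are tasks, using the ANNOTATE_SITE_BEGIN, ANNOTATE_ITERATION_TASK, and ANNOTATE_SITE_END macros from advisor-annotate.h. It is not the code from Prefix Scan.zip: the function exclusive_scan, the site and task names (upsweep_inner, upsweep_pair), and the power-of-two size restriction are assumptions made for the example. The no-op fallback macros are also an assumption, added only so the sketch compiles on a machine where the Advisor header is not on the include path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use the real Advisor header when it is on the include path; otherwise fall
// back to no-op macros (an assumption of this sketch) so it still compiles.
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END()
#  define ANNOTATE_ITERATION_TASK(name)
#endif

// Hypothetical in-place exclusive scan (up-sweep followed by down-sweep).
// Assumes data.size() is zero or a power of two.
void exclusive_scan(std::vector<int>& data) {
    const std::size_t n = data.size();
    if (n == 0) return;
    assert((n & (n - 1)) == 0 && "size must be a power of two");

    // Up-sweep: build partial sums in place, doubling the stride each pass.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        // The inner loop is the candidate parallel site; each iteration
        // touches a disjoint pair of elements, so it is modelled as a task.
        ANNOTATE_SITE_BEGIN(upsweep_inner);
        for (std::size_t i = 0; i < n; i += 2 * stride) {
            ANNOTATE_ITERATION_TASK(upsweep_pair);
            data[i + 2 * stride - 1] += data[i + stride - 1];
        }
        ANNOTATE_SITE_END();
    }

    // Down-sweep: clear the root, then push prefix sums back down the tree.
    data[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 0; i < n; i += 2 * stride) {
            const int left = data[i + stride - 1];
            data[i + stride - 1] = data[i + 2 * stride - 1];
            data[i + 2 * stride - 1] += left;
        }
    }
}
```

Calling exclusive_scan on a vector holding 1 through 8 leaves it holding 0, 1, 3, 6, 10, 15, 21, 28. The annotations do not change what the program computes; they only tell Advisor which region to model in the Dependencies and Suitability analyses.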
 
=== Resources ===
For more information, please refer to the Intel Advisor tutorials at https://software.intel.com/en-us/articles/advisorxe-tutorials.