OpenMP Debugging in Visual Studio / Team Debug

From CDOT Wiki
Revision as of 15:45, 6 December 2017 by Sofimofi

Group Members

  1. Sofia Ngo-Trong
  2. Azusa Shimazaki
  3. Orlandson Asturiano


Processes (Rough)

Why would you have multiple projects in one solution? (https://stackoverflow.com/questions/8678251/benefits-of-multiple-projects-and-one-solution) Common reasons include:

  • services
  • custom setup actions
  • working with multiple languages
  • creating libraries used in different places
  • large programs could be made up of many smaller projects for better management
  • working with multiple applications that interact with each other

Configuration

https://msdn.microsoft.com/en-us/library/jj919165.aspx

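By default, breaking, stepping, and stopping apply to all processes being debugged, but this can be changed if needed (for example, with the "Break all processes when one process breaks" option under Tools > Options > Debugging > General).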

To debug an additional process, the debugger needs access to that process's .pdb files.

A .pdb (program database) file holds the debugging and project state information that is created at compile time.


Multiple processes

Each project runs as an individual process.

If a solution contains more than one project, you can choose which projects the debugger starts.

You can also attach the debugger to a process that was started outside of it, including a process on a remote device, but your ability to inspect that process may be limited.

You can also set a process to start automatically in the debugger, which is useful for services and custom setup actions.

When debugging multiple processes, only one process is active in the debugger at a time, and you must be in break mode to switch between processes.

When you switch to a process, all debugger windows show information for that process only.

When you stop debugging, a process that was launched from the debugger is terminated; however, if you attached the debugger to a process started outside of Visual Studio, the debugger detaches and leaves that process running.
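When a process is started outside Visual Studio (for example, launched by another program in the solution), it may run past the interesting code before you manage to attach. One common trick, which is not a Visual Studio feature, is to make the process park in a loop until you release it from the debugger. The sketch below uses a hypothetical helper named `wait_for_debugger`; after attaching, you would break, set the flag to false in the Watch window, and continue.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not a Visual Studio API): parks the calling thread
// until *hold becomes false or the timeout expires. After attaching the
// debugger, clear the flag (e.g. in the Watch window) to let the program go on.
// Returns true if released, false if the timeout expired first.
bool wait_for_debugger(volatile bool* hold, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (*hold) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;  // nobody released us in time
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    return true;
}
```

A typical use would be a global such as `static volatile bool g_hold = true;` followed by `wait_for_debugger(&g_hold, std::chrono::minutes(5));` near the start of main. The flag is `volatile` so the compiler re-reads it on every iteration instead of caching a value the debugger changes behind its back.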

Background

How can we debug a parallel program? Our test environment is Visual Studio 2015 with Intel Parallel Studio XE 2016.

User Interface

Attach to Process dialog box

Processes window

Process window1

-shortcut

-how to open

-description/How to use

-info you can see

-what is attach

-what is detach

Threads window

Thread window1

-shortcut

-how to open

-description/how to use

-info you can see

Source window

Debug Location toolbar

Parallel Stacks window

The Parallel Stacks Window shows call stack information for all the threads in your application. You can focus on different threads and see the stack frames for them.


Setup:

1. Start debugging (F5), then click Debug > Windows > Parallel Stacks.
2. To see more detailed debug information in the Parallel Stacks window, go to Debug > Options, uncheck "Enable Just My Code" under Debugging > General, and check "Microsoft Symbol Servers" under Debugging > Symbols.

Parallel Tasks window

Parallel Watch window

GPU Threads window

Walkthrough

Case A

How to use the Processes window and the Threads window with multiple OpenMP projects

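1. Enable OpenMP support (/openmp) in each project (Project Properties > C/C++ > Language > OpenMP Support).

2. Create multiple projects in one solution.

3. Set up multiple startup projects (Solution Properties > Common Properties > Startup Project > Multiple startup projects).

4. See how the debug windows show the status of multiple projects.

5. See how each tool can help you find useful information.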

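Project 1: test1.cpp

#include <iostream>
using namespace std;

// Build with OpenMP enabled (/openmp in Visual C++).
int main() {
#pragma omp parallel
	{
#pragma omp for
		for (int i = 0; i < 10; i++)
			// lines printed by different threads may interleave
			cout << " now i at test1= " << i << endl;
	}
}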

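Project 2: test2.cpp

#include <iostream>
using namespace std;

// Build with OpenMP enabled (/openmp in Visual C++).
int main() {
#pragma omp parallel
	{
#pragma omp for
		for (int j = 0; j < 10; j++)
			// lines printed by different threads may interleave
			cout << " now j at test2= " << j << endl;
	}
}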
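To connect the console output of test1.cpp with what the Threads window shows, it can help to record which OpenMP thread ran each iteration. The following is a minimal sketch using a hypothetical helper, `record_iteration_owners`, that is not part of the projects above. `omp_get_thread_num()` is the standard OpenMP call that returns the calling thread's number within its team, and the `_OPENMP` guard lets the file also build without /openmp (every entry is then 0).

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: for each iteration i in [0, n), records the OpenMP
// thread number of the thread that executed it. Each iteration writes only
// its own element, so no synchronization is needed.
std::vector<int> record_iteration_owners(int n) {
    if (n <= 0)
        return {};
    std::vector<int> owner(n, -1);
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();  // 0 is the master thread
#else
        owner[i] = 0;                     // serial build: a single thread
#endif
    }
    return owner;
}
```

Setting a breakpoint inside the loop and inspecting `owner` lets you compare the recorded numbers with the threads listed in the Threads window. Keep in mind that the OpenMP thread number is an index within the team, not the thread ID that the Threads window displays.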

Case B - Using the Parallel Stacks and the Parallel Watch Window

We will use the following program, which uses the Cilk Plus API for parallelization, to experiment with the Parallel Stacks and Parallel Watch windows:


// cilkthreads.cpp

#include <iostream>
#include <cstdio>	// printf
#include <cilk/cilk.h>
#include <cilk/cilk_api.h>
#include <thread>	// std::this_thread::sleep_for
#include <chrono>	// std::chrono::seconds
 
int foo(int i);
int boo(int i);
int coo(int i);
int doo(int i);
int zoo(int i);
int bla(int i);
int blo(int i);
int blu(int i);
int vroom(int i);
int beep(int i);
int screech(int i);
int woof(int i);
int meow(int i);
int oink(int i);
int zzz(int i);
int cough(int i);

int main() {
	int nwt = __cilkrts_get_nworkers();
	std::cout << "Number of workers is " << nwt << std::endl; 	

	int i = 1;

	cilk_spawn foo(i);
	cilk_spawn coo(i);
 	cilk_spawn boo(i);
	cilk_spawn doo(i);
	cilk_spawn zoo(i);

	foo(i);

	cilk_sync;
 
	return 0;
}

int foo(int i) {
	++i;
	int tid = __cilkrts_get_worker_number();
	std::this_thread::sleep_for(std::chrono::seconds(1));
	printf("Foo! from worker %d\n", tid);
	return i;
}

int boo(int i) {
	i += 5; 
	int tid = __cilkrts_get_worker_number();
	std::this_thread::sleep_for(std::chrono::seconds(1));
	printf("Boo! from worker %d\n", tid);
	i = zzz(i);
	return i ;
}

int coo(int i) {
	i *= 10;
	int tid = __cilkrts_get_worker_number();
	std::this_thread::sleep_for(std::chrono::seconds(1));
	printf("Coo! from worker %d\n", tid);
	i = bla(i);
	return i;
}

int doo(int i) {
	i *= 100;
	int tid = __cilkrts_get_worker_number();
	std::this_thread::sleep_for(std::chrono::seconds(1));
	printf("Doo! from worker %d\n", tid);
	i = vroom(i);
	return i;
}

int zoo(int i) {
	--i; 
	int tid = __cilkrts_get_worker_number();
	std::this_thread::sleep_for(std::chrono::seconds(1));
	printf("Zoo! from worker %d\n", tid);
	i = woof(i);
	i = meow(i);
	i = oink(i);
	return i;
}

int bla(int i) {
	i *= 3; 
	int tid = __cilkrts_get_worker_number();
	printf("Bla bla! from worker %d\n", tid);
	i = blo(i);
	return i;
}

int blo(int i) {
	i += 4;
	int tid = __cilkrts_get_worker_number();
	printf("Blo Blo! from worker %d\n", tid);
	i = blu(i);
	return i;
}

int blu(int i) {
	i *= 7;
	int tid = __cilkrts_get_worker_number();
	printf("Blu Blu! from worker %d\n", tid);
 	return i;
}

int vroom(int i) {
	i += 5;
	int tid = __cilkrts_get_worker_number();
	printf("Vroom! from worker %d\n", tid);
	i = beep(i);
	return i;
}

int beep(int i) {
	i -= 2;
	int tid = __cilkrts_get_worker_number();
	printf("Beep beep! from worker %d\n", tid);
	i = screech(i);
	return i;
}

int screech(int i) {
	i -= 5;
 	int tid = __cilkrts_get_worker_number();
	printf("Screeeeeeeeechhhhhh! from worker %d\n", tid);
	return i;
}

int woof(int i) {
	i += 12;
	int tid = __cilkrts_get_worker_number();
	printf("Woof! from worker %d\n", tid); 
	return i;
}

int meow(int i) {
	i *= 6;
	int tid = __cilkrts_get_worker_number();
	printf("Meow! from worker %d\n", tid);
	return i;
}

int oink(int i) {
	i++;
	int tid = __cilkrts_get_worker_number();
	printf("Oink! from worker %d\n", tid);
	return i;
}

int zzz(int i) {
	i -= 10;
	int tid = __cilkrts_get_worker_number();
	// loop counter named k so it does not shadow the parameter i
	for (int k = 0; k < 10; k++) {
		cough(k);
	}
	printf("Zzzzzzzzz...... from worker %d\n", tid);
	return i;
}

int cough(int i) {
	i += 8;
	int tid = __cilkrts_get_worker_number();
	printf("cough! from worker %d\n", tid); 
 	return i;
}

In the above code, main calls functions that may in turn call other functions. A cilk_spawn does not itself create a new thread; it allows the spawned call to run in parallel with the rest of the calling function, and the Cilk runtime distributes this work among a fixed pool of worker threads. If each function does very little work, the spawned calls may never be distributed to different workers, because each call finishes before an idle worker can take over any of the work. That was originally the case: all of the function calls were done by one thread. Therefore, each of the spawned functions (foo, boo, coo, doo and zoo) was made to sleep for 1 second. This made the calls take long enough that the work was spread across multiple worker threads.

Here is the output of the program:

Output

From the output we can see that 4 different worker threads executed the 6 function calls. In the order of the calls in the code, worker 0 took foo(), worker 3 took coo(), worker 2 took boo(), worker 1 took doo(), worker 0 took zoo(), and finally worker 3 took the remaining foo() call. Which worker takes which call can change from run to run. The Parallel Stacks window allows us to see the call stack information for all active threads at any point in our program.

Setup:

1. Put a breakpoint at all function calls, all function definitions, and cilk_sync.


Walkthrough:

First function call: At our first function call,

 cilk_spawn foo(i);

we can see the Main Thread in the Threads window, with a yellow arrow pointing at it:

Main Thread

and the respective view for Parallel Stacks:

Stacks - Main Thread

In the screenshot above, the blue-highlighted boxes show the call stack of the current thread, the Main Thread, indicated by the yellow arrow. The program begins with 4 threads; one of them becomes our Main Thread, and the other 3 branch off elsewhere.
We can hover our mouse over any row in the boxes to get more info:

Step 1 - stacks

Hovering over "main" in the Main Thread's "1 Thread" box shows which line of code the current stack frame is at.

As we keep pressing F5, execution stops at each breakpoint in turn. Here, our foo() function is executing, and we can see its call to sleep_for, all within the Main Thread:

Step 1b - stacks

At this point, another thread has begun. We can see "Worker 3" has started in the Threads window:

Step 1b - Worker 3

And if we double click on it, the focus will shift to its call stack in the Parallel Stacks window:

Step 1b - Worker 3

To hide threads that have nothing to do with our cilk-spawned work, we can flag the threads we want in the Threads window and then click the flag icon at the top of the Parallel Stacks window, which shows only the call stacks of the flagged threads:

Step 1b - Flagged Threads


From this point forward, we will only view the call stacks of the flagged threads: the Main Thread and the 3 worker threads.

Second function call:

cilk_spawn coo(i);

Now Worker 3 has picked up the next spawned call; its call stack is highlighted in blue:

Worker 3

Also, the middle box shows 2 threads, Worker 1 and Worker 2, which seem to be waiting for work. The Main Thread on the left also seems to be waiting.

Third function call:

cilk_spawn boo(i);

Now Worker 1 has taken the next spawned call, highlighted in blue:

Worker 1

Fourth function call:

cilk_spawn doo(i);

Next, Worker 2 picks up the next spawned call:

Worker 2

At this point, the Main Thread, in function foo(), has already printed its line.

Fifth function call:

cilk_spawn zoo(i);

Finally, the Main Thread is free and picks up the next spawned call, zoo().

Main thread

As we stepped through the breakpoints set at each function definition, we saw that Worker 1's call stack had gone from boo() to zzz() to cough(), Worker 2's had gone from doo() to vroom() to beep(), and Worker 3's had gone from coo() to bla().


Sixth function call:

foo(i);

The next thread to free up was Worker 2, which picked up the remaining call to foo().

Worker 2


Worker 3 is free:

Now, Worker 3 has finished its work and is just waiting, as shown in the rightmost box:

Worker 2


Syncing up:

cilk_sync;

At this point, all spawned calls have synced up, as shown in the right box, and the Main Thread continues, as shown in the left box.

Synced up


The Parallel Stacks window is a valuable tool for debugging multi-threaded applications, as it gives a view of all threads and their call stacks at any given time. It allows us to see how work is delegated to different threads as they become free and the Cilk runtime hands them work.
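Cilk Plus has since been deprecated (GCC removed it in version 8), so readers without Intel's compiler may want to reproduce the same spawn/sync pattern with OpenMP tasks. This is a swapped-in technique, not the program used above: `#pragma omp task` stands in for `cilk_spawn`, `#pragma omp taskwait` for `cilk_sync`, and the helper name `count_threads_used` is hypothetical. It runs six calls the way main() in cilkthreads.cpp does and reports how many distinct threads executed them.

```cpp
#include <chrono>
#include <set>
#include <thread>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of the calling OpenMP thread (0 in a serial build).
static int current_thread() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Hypothetical helper mimicking main() in cilkthreads.cpp with OpenMP tasks:
// five "spawned" calls run as tasks, a sixth runs directly in the thread that
// created them, and taskwait plays the role of cilk_sync.
// Returns how many distinct threads ran the six calls.
int count_threads_used(bool sleep_in_calls) {
    std::vector<int> who(6, -1);
    auto work = [&](int slot) {
        if (sleep_in_calls)  // like the 1-second sleep added to foo(), boo(), ...
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        who[slot] = current_thread();  // each call writes only its own slot
    };
#pragma omp parallel
#pragma omp single
    {
        for (int s = 0; s < 5; s++) {
#pragma omp task firstprivate(s)
            work(s);  // like cilk_spawn foo(i), coo(i), boo(i), doo(i), zoo(i)
        }
        work(5);      // like the final, non-spawned foo(i)
#pragma omp taskwait
    }
    return static_cast<int>(std::set<int>(who.begin(), who.end()).size());
}
```

Comparing `count_threads_used(false)` with `count_threads_used(true)` is a rough way to reproduce this section's observation that very short calls may all end up on one thread. The exact counts depend on the runtime, the machine, and whether OpenMP is enabled, so treat them as illustrative only.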

Case C
