GPU621/Distributed Workload

From CDOT Wiki



Threading Building Blocks (TBB) is a template library developed by Intel to facilitate parallel programming. It does this by dividing a computation into tasks that can be scheduled to run in parallel on the threads of a multi-core processor.
TBB includes algorithms, concurrent containers, locks and memory allocation tools.
TBB is designed to work with any C++ compiler.

#include <tbb/tbb.h>

// a and b are assumed to be arrays of at least 40 elements
tbb::blocked_range<int> range(0, 40);
for (auto i = range.begin(); i != range.end(); i++) {
    b[i] = 2 * a[i] + b[i];
}


The Standard Template Library (STL) also provides useful functionality, including generic data structures, containers, iterators and algorithms that can be used to write clean, efficient code.
The STL grew out of the work of Alexander Stepanov, who first became interested in the ideas of generic programming in 1979. His work at AT&T Bell Laboratories eventually led to a proposal to the ANSI/ISO committee for the standardization of the STL as part of the C++ standard.

#include <iostream>
#include <vector>
int main () {
  std::vector<int> myvector;
  for (int i=0; i < 6; i++) myvector.push_back(i);

  // walk the vector with an iterator and print each element
  for (std::vector<int>::iterator it = myvector.begin(); it != myvector.end(); ++it)
    std::cout << ' ' << *it;
  std::cout << '\n';
}


Both libraries use C++ templates to provide generic programming structures. The libraries overlap in some of the functionality they provide; however, STL is designed for general use, while TBB specializes in parallel programming with threads.


Both libraries use random access iterators to ease navigation of containers. TBB follows the conventions set by STL and the ISO C++ standard, but it also extends them so that containers such as tbb::concurrent_vector<T> can be used safely from parallel threads.
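
To illustrate what "random access" means for an iterator, here is a minimal sketch using only the standard library. The helper name middle_element is hypothetical and exists only for this example; it assumes a non-empty vector.

```cpp
#include <vector>

// Hypothetical helper (not part of STL or TBB): returns the middle element
// of a non-empty vector using iterator arithmetic.
int middle_element(const std::vector<int>& v) {
    std::vector<int>::const_iterator first = v.begin();
    // Random access iterators can be subtracted to get a distance and
    // advanced by an arbitrary offset in constant time.
    std::vector<int>::const_iterator mid = first + (v.end() - first) / 2;
    return *mid;
}
```

The same arithmetic (it + n, it2 - it1, it[n]) is what lets a range of iterators be cut into halves cheaply, which is one reason random access iteration suits parallel work.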


STL implements the following common containers:

  • vector
  • list
  • queue
  • stack
  • map
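
As a rough illustration of how some of the containers listed above behave, the following sketch uses std::map for counting and std::stack for last-in, first-out ordering. The helper names count_words and reverse_with_stack are hypothetical and were chosen only for this example.

```cpp
#include <map>
#include <stack>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each word occurs, keyed in a std::map.
std::map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& w : words) {
        ++counts[w];  // operator[] inserts a zero count the first time a key is seen
    }
    return counts;
}

// Hypothetical helper: reverses a sequence by pushing it through a std::stack.
std::vector<int> reverse_with_stack(const std::vector<int>& in) {
    std::stack<int> s;
    for (int x : in) {
        s.push(x);
    }
    std::vector<int> out;
    while (!s.empty()) {
        out.push_back(s.top());  // the most recently pushed element comes out first
        s.pop();
    }
    return out;
}
```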

TBB does not implement as many containers; however, it does include some that are useful in parallel programming and extends their functionality:

  • blocked_range<T>
  • concurrent_hash_map<Key, T>
  • concurrent_vector<T>
  • concurrent_queue<T>


STL provides serial algorithms that perform tasks such as searching and sorting. These functions, such as std::merge() and std::sort(), typically operate on the contents of the containers.
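
A minimal sketch of the two algorithms named above working together. The helper name sort_and_merge is hypothetical; note that std::merge() expects both of its input ranges to be sorted already.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: sorts two vectors, then merges them into one sorted vector.
std::vector<int> sort_and_merge(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    // std::merge requires sorted inputs and writes a sorted result
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```
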
The algorithms in TBB are much more central to the usefulness of the library. TBB provides templated functions such as:

  • parallel_for(range, body [, partitioner]);
  • parallel_scan(range, body [, partitioner]);
  • parallel_reduce(range, body [, partitioner]);

These functions operate on a range, typically a blocked_range, and perform the operations described by the body object in parallel. The body usually does its work in an overloaded () operator. The following code snippet demonstrates a simple parallel_reduce implementation.

#include <cstddef>
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

using namespace tbb;

struct Sum {
    float value;
    Sum() : value(0) {}
    // splitting constructor: starts a new partial sum for a split-off subrange
    Sum( Sum& s, split ) {value = 0;}
    // accumulates the elements of subrange r into value
    void operator()( const blocked_range<float*>& r ) {
        float temp = value;
        for( float* a=r.begin(); a!=r.end(); ++a ) {
            temp += *a;
        }
        value = temp;
    }
    // merges the partial sum held by another body into this one
    void join( Sum& rhs ) {value += rhs.value;}
};

float ParallelSum( float array[], size_t n ) {
    Sum total;
    parallel_reduce( blocked_range<float*>( array, array+n ), total );
    return total.value;
}

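A few things to notice about this code are as follows. All of the reduction work is done in the overloaded () operator, which accumulates a partial sum over the subrange it is given. The splitting constructor Sum(Sum& s, split) creates a fresh body with a zeroed sum when parallel_reduce splits the blocked_range into subranges, and join() merges the partial sum of one body back into another once both subranges have been processed.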
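
To make the roles of the splitting constructor and join() more concrete, here is a serial, standard-library-only sketch of the same pattern. It is not how TBB is implemented: SplitTag, SumBody, reduce_sketch and sum_sketch are hypothetical names invented for this example, the range is a plain pointer pair instead of blocked_range, and the two halves run one after the other rather than on separate threads.

```cpp
#include <cstddef>

// Hypothetical stand-in for tbb::split; it only selects the splitting constructor.
struct SplitTag {};

// Same shape as the Sum body above, without any TBB dependency.
struct SumBody {
    float value;
    SumBody() : value(0) {}
    SumBody(SumBody&, SplitTag) : value(0) {}  // fresh accumulator for a split-off half
    void operator()(const float* first, const float* last) {
        float temp = value;
        for (const float* a = first; a != last; ++a) {
            temp += *a;
        }
        value = temp;
    }
    void join(SumBody& rhs) { value += rhs.value; }
};

// Recursively halves [first, last) until a piece holds at most `grain` elements.
void reduce_sketch(const float* first, const float* last, SumBody& body, std::size_t grain) {
    std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= grain || n < 2) {
        body(first, last);             // small enough: accumulate directly
        return;
    }
    const float* mid = first + n / 2;
    SumBody right(body, SplitTag());   // split: new body for the right half
    reduce_sketch(first, mid, body, grain);
    reduce_sketch(mid, last, right, grain);  // a parallel scheduler could run this half elsewhere
    body.join(right);                  // join: fold the right half's partial sum back in
}

float sum_sketch(const float* array, std::size_t n) {
    SumBody total;
    reduce_sketch(array, array + n, total, 2);  // grain size of 2 is an arbitrary choice
    return total.value;
}
```

Because each half accumulates into its own body, no two pieces ever write to the same value, which is why the pattern needs no locking; join() is the only point where partial results meet.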