Difference between revisions of "GPU621/Analyzing False Sharing"

From CDOT Wiki

Revision as of 11:27, 23 November 2022

Group Members


  1. Ryan Leong
  2. Yash Padsala
  3. Shani Patel

Example Of A False Sharing

#include <thread>
#include <vector>
#include <iostream>
#include <chrono>
#include <functional> // for ref()
using namespace std;
const int sizeOfNumbers = 10000; // Size of vector numbers
const int sizeOfSum = 2;     // Size of vector sum
// Adds up every index i with i % sizeOfSum == id and stores the total in sum[id].
// numbers is passed by value, so each call (and each thread) works on its own copy.
void sumUp(const vector<int> numbers, vector<long>& sum, int id) {
    for (int i = 0; i < numbers.size(); i++) {
        if (i % sizeOfSum == id) {
            sum[id] += i; // numbers[i] == i here, so this is also the sum of the elements
        }
    }
    // The output of this sum
    cout << "sum " << id << " = " << sum[id] << endl;
}
int main() {
    vector<int> numbers;
    for (int i = 0; i < sizeOfNumbers; i++) { // Fill numbers with 0 to sizeOfNumbers - 1
        numbers.push_back(i);
    }
    cout << "-----Thread-----" << endl;
    {   // Thread
        vector<long> sum(sizeOfSum, 0); // Set size=sizeOfSum and all to 0
        auto start = chrono::steady_clock::now();
        vector<thread> td;
        for (int i = 0; i < sizeOfSum; i++) {
            // Each thread writes to its own element of the shared vector sum
            td.emplace_back(sumUp, numbers, ref(sum), i);
        }
        for (int i = 0; i < sizeOfSum; i++) {
            td[i].join();
        }
        auto end = chrono::steady_clock::now();
        // The duration is measured in microseconds, so the unit is "us"
        cout << "Thread time consumption: " << chrono::duration_cast<chrono::microseconds>(end - start).count() << "us" << endl;
    }
    cout << endl << "-----Serial-----" << endl;
    {   // Serial
        vector<long> sum(sizeOfSum, 0);
        auto start = chrono::steady_clock::now();
        for (int i = 0; i < sizeOfSum; i++) {
            sumUp(numbers, sum, i);
        }
        auto end = chrono::steady_clock::now();
        cout << "Serial time consumption: " << chrono::duration_cast<chrono::microseconds>(end - start).count() << "us" << endl;
    }
}

In this program, the main function first declares a vector numbers and fills it with the values 0 to 9999 (sizeOfNumbers is 10000). The rest of main is divided into two blocks. The first is the Thread block, which runs the work concurrently on multiple threads. The second is the Serial block, which runs the same work sequentially on a single thread.

The sumUp function adds up the elements at the even positions or the odd positions of the first vector argument, depending on the int argument: id 0 selects the even positions and id 1 the odd ones. The total is stored in the second vector argument, using the int argument as the index. In the Thread block, both threads therefore write to the same vector sum, each to its own element.

Which block of code do you feel will take less time?

Theoretically on a multicore machine it should be the Thread code block that is faster, right? But the result is.