DPS921/Group 8
= Location of the problem - Local cache =
===What is Cache===
A cache is a small, fast array of memory placed between the processor core and main memory that stores portions of recently referenced main memory. The processor uses cache memory instead of main memory whenever possible to increase system performance. False sharing is a common issue in symmetric multiprocessor (SMP) systems. In an SMP system, each processor has its own local cache sitting in front of a shared main memory. Data objects used by different processors can end up in the same cache line managed by the caching mechanism, simply because they are small enough to fit in one line together. When that happens, a write by one processor affects every other processor holding a copy of that line, even though the other processors never touch the modified element. The next section explains why the coherency protocol forces this behaviour.
[[File:Smp.PNG]]
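Whether two variables can falsely share depends only on whether their addresses fall in the same cache line. The short sketch below is not part of the original notes and assumes the common 64-byte line size; it prints the line index of four adjacent ints to show that they normally map to the same line:
<nowiki>
#include <iostream>
#include <cstdint>

int main()
{
    // Four adjacent 4-byte ints span only 16 bytes, so with a 64-byte cache
    // line (an assumption typical of current x86 CPUs) they usually share
    // one line: the printed line indices will mostly, or all, be equal.
    int data[4] = {};
    const std::uintptr_t lineSize = 64;

    for (int i = 0; i < 4; i++) {
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&data[i]);
        std::cout << "data[" << i << "] is in cache line "
                  << addr / lineSize << std::endl;
    }
    return 0;
}
</nowiki>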
 
===Caching Coherency Protocol===
The cache coherency protocol does not allow two cores to modify the same cache line at the same time: each line is protected by a hardware write lock that only one core can hold at a time. The problem is that this lock covers the whole line, including data that merely happens to share the line with the data being written. When two processors alter different elements of the same line, the protocol still forces each write to take exclusive ownership of the entire line, despite a lack of logical necessity. Each update of an individual element marks the line as invalid. Other processors accessing a different element in the same line see the line marked as invalid, and they are forced to fetch a more recent copy of the line from memory or elsewhere, even though the element they access has not been modified. This is because cache coherency is maintained on a cache-line basis, not for individual elements. As a result, interconnect traffic and overhead increase.
 
 
[[File:Painter_1.PNG]]
 
[[File:Painter_2.PNG]]
 
[[File:Painter_3.PNG]]
===How the MESI protocol works===
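As a brief summary (not taken from the original notes), MESI tracks every cache line in one of four states: Modified, Exclusive, Shared, and Invalid. The sketch below schematically walks through how those states change for one cache line shared by two cores in the false-sharing pattern described above; the state names are standard, the scenario is illustrative:
<nowiki>
#include <iostream>

// The four states the MESI protocol can assign to a cache line.
enum class MesiState { Modified, Exclusive, Shared, Invalid };

int main()
{
    // Typical false-sharing sequence for one cache line, two cores:
    MesiState core0 = MesiState::Exclusive; // core 0 loads the line alone
    MesiState core1 = MesiState::Invalid;   // core 1 has no copy yet

    core1 = MesiState::Shared;              // core 1 reads: both copies Shared
    core0 = MesiState::Shared;

    core0 = MesiState::Modified;            // core 0 writes its element ...
    core1 = MesiState::Invalid;             // ... core 1's copy is invalidated

    // Core 1 must now re-fetch the whole line before its next access,
    // even though the element it uses was never changed.
    std::cout << "core1 state: "
              << (core1 == MesiState::Invalid ? "Invalid" : "other")
              << std::endl;
    return 0;
}
</nowiki>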
= Signs of false sharing =
False sharing requires two or more threads updating at least two independent elements that lie in the same cache line. It therefore has two identifying characteristics: first, the data elements being written are adjacent in memory, close enough that they occupy the same cache line; second, at least two threads participate in the execution and write to that line. When both conditions are met, the threads repeatedly invalidate each other's cached copies even though they never touch each other's data.
Let's take an example.
= Solutions =
We will now explore two typical ways to deal with false sharing in an OpenMP environment.
 
____________________________________________________________________________________________________________________________________________________________
 
<nowiki>
#include <iostream>
#include <omp.h>

// Each element is a bare int, so the four elements of testArr occupy
// 16 contiguous bytes and all fit inside a single 64-byte cache line.
struct MyStruct
{
    int value;
};

int main(int argc, char** argv)
{
    MyStruct testArr[4];

    omp_set_num_threads(4);
    double start_time = omp_get_wtime();

    // Each thread repeatedly writes its own element, but because the four
    // elements share one cache line, every write invalidates the copies of
    // the line held by the other cores (false sharing).
    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        testArr[i].value = 0;
        for (int j = 0; j < 10000; j++) {
            testArr[i].value = testArr[i].value + 1;
        }
    }

    std::cout << "testArr value 1: " << testArr[0].value << std::endl;
    std::cout << "testArr value 2: " << testArr[1].value << std::endl;
    std::cout << "testArr value 3: " << testArr[2].value << std::endl;
    std::cout << "testArr value 4: " << testArr[3].value << std::endl;

    double time = omp_get_wtime() - start_time;
    std::cout << "Execution Time: " << time << std::endl;

    return 0;
}
</nowiki>
 
[[File:serial.png]]
 
_________________________________________________________________________________________________________________________________________
== '''Padding''' ==
<nowiki>
#include <iostream>
#include <omp.h>

// Pad the struct out to a full 64-byte cache line so that each element of
// testArr lands in its own line and the threads stop invalidating each
// other's copies.
struct MyStruct
{
    int value;
    int padding[15];   // 4 + 15*4 = 64 bytes, one full cache line
};

int main(int argc, char** argv)
{
    MyStruct testArr[4];

    // Set to 1, 2, 3, or 4 to reproduce the thread-count runs shown below.
    omp_set_num_threads(4);
    double start_time = omp_get_wtime();

    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        testArr[i].value = 0;
        for (int j = 0; j < 10000; j++) {
            testArr[i].value = testArr[i].value + 1;
        }
    }

    std::cout << "testArr value 1: " << testArr[0].value << std::endl;
    std::cout << "testArr value 2: " << testArr[1].value << std::endl;
    std::cout << "testArr value 3: " << testArr[2].value << std::endl;
    std::cout << "testArr value 4: " << testArr[3].value << std::endl;

    double time = omp_get_wtime() - start_time;
    std::cout << "Execution Time: " << time << std::endl;

    return 0;
}
</nowiki>
____________________________________________________________________________________________________________________________________________________________
[[File:1threads_padding.jpg|1000px]]  [[File:2threads_padding.jpg|1000px]]  [[File:3threads_padding.jpg|1000px]]  [[File:4threads_padding.jpg|1000px]]
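Manual padding works, but it hard-codes an assumed 64-byte line size into the struct by hand. As a minimal alternative sketch, not part of the original example, C++11 alignas can be used to let the compiler insert the padding; alignas(64) still assumes a 64-byte cache line, which is typical on current x86 hardware:
<nowiki>
#include <iostream>
#include <omp.h>

// alignas(64) forces each array element onto its own 64-byte boundary, so
// sizeof(PaddedStruct) becomes 64 and no two elements share a cache line.
// The value 64 is an assumption about the target machine's line size.
struct alignas(64) PaddedStruct
{
    int value;
};

int main()
{
    PaddedStruct testArr[4];

    omp_set_num_threads(4);

    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        testArr[i].value = 0;
        for (int j = 0; j < 10000; j++) {
            testArr[i].value = testArr[i].value + 1;
        }
    }

    std::cout << "sizeof(PaddedStruct): " << sizeof(PaddedStruct) << std::endl;
    return 0;
}
</nowiki>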
== '''Critical construct''' ==
<nowiki>
#include <iostream>
#include <omp.h>

struct MyStruct
{
    int value;   // no padding this time
};

int main(int argc, char** argv)
{
    MyStruct testArr[4];

    omp_set_num_threads(4);
    double start_time = omp_get_wtime();

    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        testArr[i].value = 0;

        // Accumulate into a variable that is local (private) to the thread,
        // so the hot inner loop never touches the shared cache line.
        int partial_val = 0;
        for (int j = 0; j < 10000; j++) {
            partial_val = partial_val + 1;
        }

        // Write back to the shared array once, inside a critical section.
        #pragma omp critical
        testArr[i].value += partial_val;
    }

    std::cout << "testArr value 1: " << testArr[0].value << std::endl;
    std::cout << "testArr value 2: " << testArr[1].value << std::endl;
    std::cout << "testArr value 3: " << testArr[2].value << std::endl;
    std::cout << "testArr value 4: " << testArr[3].value << std::endl;

    double time = omp_get_wtime() - start_time;
    std::cout << "Execution Time: " << time << std::endl;

    return 0;
}
</nowiki>
______________________________________________________________________________________________________________________________________________________________
[[File:4threads_critical.jpg|1000px]]
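When the per-thread results are ultimately combined into a single value, OpenMP's built-in reduction clause achieves the same thread-local-accumulation effect without writing the critical section by hand. The sketch below is not part of the original experiment; it sums the same total work into one scalar:
<nowiki>
#include <iostream>
#include <omp.h>

int main()
{
    long total = 0;

    omp_set_num_threads(4);

    // reduction(+:total) gives every thread its own private copy of total,
    // which OpenMP combines after the loop, so the threads never contend
    // for a shared cache line inside the loop itself.
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < 4 * 10000; i++) {
        total = total + 1;
    }

    std::cout << "total: " << total << std::endl;   // expected 40000
    return 0;
}
</nowiki>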
= Conclusion =
False sharing is a lurking problem that hinders the scalability of a program, and it can be easily missed. It is very important to keep an eye out for it and to recognize it quickly in parallel programming, where performance is key. The two methods we explored, padding and thread-local accumulation, are both reliable solutions to false sharing, but giving each thread a local variable is generally preferable to padding, since padding wastes memory, which is counterintuitive in parallel programming.