Difference between revisions of "DPS921/Group 8"

From CDOT Wiki

Revision as of 10:48, 26 November 2018

Group 8

Our Project: Analyzing False Sharing - Case Studies

https://wiki.cdot.senecacollege.ca/wiki/DPS921/Group_8

Group Members

  1. Aditya Rahman
  2. Zhijian Zhou

False Sharing in Parallel Programming

Introduction

Location of the problem - Local cache

False sharing is a common issue in symmetric multiprocessor (SMP) systems. In an SMP system, each processor has its own local cache in front of the shared memory. Small, unrelated data items used by different processors can end up in the same cache line, which the caching mechanism manages as a single unit. The caching protocol treats that whole line as if it were protected by a hardware write lock that only one core can hold at a time, so other processors cannot write to their own data in the line while another core holds it. When two processors both alter data in the same line, the protocol forces the whole line to be updated and passed between them, despite a lack of logical necessity. To ensure data consistency across multiple caches, multiprocessor-capable Intel® processors follow the MESI protocol. Each update of an individual element of a cache line marks the line as invalid in the other processors' caches. Other processors accessing a different element in the same line see the line marked as invalid and are forced to fetch a more recent copy of the line from memory or elsewhere, even though the element they access has not been modified. This is because cache coherency is maintained on a cache-line basis, not for individual elements. As a result, interconnect traffic and overhead increase.
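The layout problem described above can be sketched in C++ as two logically independent counters that sit next to each other. This is a minimal sketch under stated assumptions: a 64-byte cache line (typical of current x86-64 CPUs, and an assumption here), and hypothetical names `SharedCounters`, `same_cache_line` and `run_false_sharing`. Each thread touches only its own counter, so there is no data race, yet every write invalidates the other core's copy of the shared line. Relaxed atomics are used only so the compiler cannot collapse each loop into a single store.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Two independent counters declared side by side. The struct is aligned to a
// line boundary, so both 8-byte counters are guaranteed to share one line.
struct alignas(kCacheLine) SharedCounters {
    std::atomic<long> a{0};  // written only by thread 1
    std::atomic<long> b{0};  // written only by thread 2
};

// True if both addresses fall into the same kCacheLine-sized block.
bool same_cache_line(const void* p, const void* q) {
    const auto line_p = reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
    const auto line_q = reinterpret_cast<std::uintptr_t>(q) / kCacheLine;
    return line_p == line_q;
}

// Each thread updates only its own counter; the line still ping-pongs
// between the two cores because coherency is tracked per line.
void run_false_sharing(SharedCounters& c, long iterations) {
    std::thread t1([&c, iterations] {
        for (long i = 0; i < iterations; ++i) c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&c, iterations] {
        for (long i = 0; i < iterations; ++i) c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
}
```

The final counts are correct either way; only the speed suffers. Where the standard library supports it, C++17's `std::hardware_destructive_interference_size` (in `<new>`) can replace the hard-coded 64.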

How the MESI protocol works

When a cache line is first loaded, it is marked ‘Exclusive’. When the processor finds that the same line has been loaded by other processors, the line's state changes to ‘Shared’. If the processor stores to a cache line marked ‘Shared’, the line is marked ‘Modified’ and all other processors are sent an ‘Invalid’ message for that line. If the processor sees the same cache line, now marked ‘Modified’, being accessed by another processor, it stores the cache line back to memory and marks its copy ‘Shared’.
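The transitions above can be traced with a toy model of one cache line across several cores. This is a sketch, not how hardware is built: `MesiLine`, `read` and `write` are hypothetical names, no memory contents, bus or timing are modelled, and the write-miss path (a write from ‘Invalid’ that invalidates every other copy) is assumed from standard MESI because the paragraph does not describe it.

```cpp
#include <cstddef>
#include <vector>

enum class State { Modified, Exclusive, Shared, Invalid };

// Toy model: tracks one line's state in each core's cache and counts events.
struct MesiLine {
    std::vector<State> state;  // state[c] = the line's state in core c
    int invalidations = 0;     // copies invalidated in other caches
    int writebacks = 0;        // 'Modified' copies stored back to memory

    explicit MesiLine(std::size_t cores) : state(cores, State::Invalid) {}

    // Core c loads the line.
    void read(std::size_t c) {
        if (state[c] != State::Invalid) return;  // cache hit: no change
        bool others = false;
        for (std::size_t o = 0; o < state.size(); ++o) {
            if (o == c || state[o] == State::Invalid) continue;
            if (state[o] == State::Modified) ++writebacks;  // store back first
            state[o] = State::Shared;
            others = true;
        }
        // First copy in the system is 'Exclusive'; otherwise all share it.
        state[c] = others ? State::Shared : State::Exclusive;
    }

    // Core c stores to the line.
    void write(std::size_t c) {
        for (std::size_t o = 0; o < state.size(); ++o) {
            if (o == c || state[o] == State::Invalid) continue;
            if (state[o] == State::Modified) ++writebacks;
            state[o] = State::Invalid;  // other copies are now stale
            ++invalidations;
        }
        state[c] = State::Modified;
    }
};
```

Calling `write(0)` and `write(1)` alternately, as two threads do under false sharing, invalidates and writes back the line on almost every store, even though the two cores could be writing different bytes of it.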

Signs of false sharing

Solutions

We will now explore two typical ways to deal with false sharing in an OpenMP (OMP) environment.

Padding

One way to eliminate false sharing is to add padding to the data. Padding is generally used for memory alignment; here it is used to give each thread's data a cache line of its own, so that cache line invalidation no longer interferes with reads and writes of neighbouring elements.
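A minimal sketch of the idea, assuming 64-byte cache lines: each thread writes to its own `PaddedSlot` (a hypothetical name), and the padding array stretches every slot to a full line. Because consecutive `value` fields are then exactly 64 bytes apart, no two of them can fall into the same 64-byte line. The OpenMP pragma is ignored if OpenMP is not enabled, and the loop then simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// One slot per thread: the value plus enough padding to fill the line.
struct PaddedSlot {
    long value;
    char padding[kCacheLine - sizeof(long)];
};
static_assert(sizeof(PaddedSlot) == kCacheLine, "one slot per cache line");

// Iteration i writes only slots[i]; the padding keeps those writes on
// different cache lines, so threads do not invalidate each other's lines.
void fill_slots(std::vector<PaddedSlot>& slots, long iterations) {
    const long n = static_cast<long>(slots.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < iterations; ++j) {
            slots[i].value += 1;
        }
    }
}
```

The cost is memory: each `long` now occupies 64 bytes. Declaring the struct `alignas(64)` is an alternative way to get the same per-line size and additionally starts each slot on a line boundary.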

How padding works: suppose we have an int array num[10]. In memory it is stored as 40 bytes (10 * 4 bytes), while a typical cache line is 64 bytes. That means 24 bytes need to be padded; otherwise another element will occupy that region, which can result in two or more threads accessing the same cache line and causing false sharing.
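The arithmetic above can be checked at compile time. This sketch assumes a 4-byte `int` and a 64-byte cache line; `PaddedBlock` and the constant names are hypothetical.

```cpp
#include <cstddef>

// Assumptions: 4-byte int, 64-byte cache line.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kDataBytes = 10 * sizeof(int);         // 10 * 4 = 40
constexpr std::size_t kPadBytes = kCacheLine - kDataBytes;   // 64 - 40 = 24

// One block of ten ints padded so the next block starts a full line later
// instead of immediately after num[9].
struct PaddedBlock {
    int num[10];
    char padding[kPadBytes];
};

static_assert(kDataBytes == 40, "assumes a 4-byte int");
static_assert(kPadBytes == 24, "24 bytes of padding fill the line");
static_assert(sizeof(PaddedBlock) == kCacheLine, "one block per cache line");
```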

Cacheline1.jpeg

Cacheline2.jpeg

____________________________________________________________________________________________________________________________________________________________

//Test.cpp: test padding execution time
// (Visual Studio projects that use precompiled headers must put #include "pch.h" first.)
#include <iostream>
#include <omp.h>

// Pad each element to one typical 64-byte cache line:
// 4-byte float + 15 * 4-byte ints = 64 bytes.
struct MyStruct { float value; int padding[15]; };

int main() {
    MyStruct testArr[4] = {};  // zero-initialize so each value starts at 0

    omp_set_num_threads(3);
    double start_time = omp_get_wtime();

    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 10000; j++) {
            //#pragma omp critical
            testArr[i].value = testArr[i].value + 2;
        }
    }

    double time = omp_get_wtime() - start_time;
    std::cout << "Execution Time: " << time << std::endl;

    return 0;
}

____________________________________________________________________________________________________________________________________________________________

Synchronization

The other way to deal with false sharing is to implement a mutual exclusion construct. This can be preferable to padding because no memory is wasted; the trade-off is that the protected statements are executed by only one thread at a time. In an OpenMP environment, mutual exclusion is programmed with the critical construct, which restricts the enclosed statements to a single thread at a time. Making variables local to a single thread, and updating shared data only inside the critical section, ensures that multiple threads do not keep writing data to the same cache line.
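A minimal sketch of this approach, with `sum_with_critical` as a hypothetical example function: each thread accumulates into a variable declared inside the parallel region (and therefore private to that thread), and the shared `total` is touched only once per thread inside `#pragma omp critical`. Without OpenMP enabled, the pragmas are ignored and the function runs serially with the same result.

```cpp
#include <vector>

long long sum_with_critical(const std::vector<int>& v) {
    long long total = 0;  // shared by all threads
    const long n = static_cast<long>(v.size());
    #pragma omp parallel
    {
        long long local = 0;  // private: each thread has its own copy
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            local += v[i];  // no writes to shared data inside the loop
        }
        #pragma omp critical
        total += local;  // one serialized update per thread
    }
    return total;
}
```

Placing the critical construct around every `total += v[i]` instead would serialize each iteration and keep the line holding `total` bouncing between cores. OpenMP's `reduction(+:total)` clause applies the same private-then-combine pattern automatically.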

Conclusion