GPU621/False Sharing

16 bytes removed, 18:01, 6 December 2021
Results
[[File:naive_implementation.png|500px|thumb|Execution time of the naive implementation with optimizations disabled (Od).]]
The algorithm calculates the correct answer, but the performance is very poor. The reason is that an int array is a contiguous block of memory, with each integer taking up 4 bytes. Even though each thread modifies only its own indexed element, spatial locality means the system brings in the neighboring elements of the array as part of the same cache line. Assuming a 64-byte cache line, our entire array takes up only half of a cache line, which opens up the possibility for multiple threads to share the same cache line, resulting in false sharing. Although there were cases where a higher thread count produced better results, there were many cases that performed worse than a single thread. This is due to the scheduling of thread execution, which is out of the programmer's hands. It is possible that the selected schedule happened to minimize the frequency of false sharing, giving better performance. However, this is extremely unreliable, so we need a better solution to false sharing.
<br><br><br><br><br><br><br><br><br><br><br><br>