CDOT Wiki β

Changes

GPU621/Analyzing False Sharing

782 bytes added, 14:45, 23 November 2022
As we said above, the smallest unit the CPU operates on in the cache is a cache line, which is typically 64 bytes. As you can see in our program code, sum is a vector that stores two values of type long. Both values are in fact located on the same cache line. When two threads read and write sum[0] and sum[1] separately, it looks as though they are not using the same variable, but in fact they are affecting each other. For example, if a thread running on CPU0 updates sum[0], the copy of that cache line held by CPU1, which also contains sum[1], is invalidated. CPU1 then has to re-read the line before it can continue, which makes the program take longer to run.
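To make the shared cache line concrete, here is a minimal sketch that checks whether two addresses fall inside the same line. The names kCacheLineSize, sameCacheLine and Sums are hypothetical and not part of our program, and the 64-byte line size is an assumption that depends on the hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size; 64 bytes is common but hardware-dependent.
constexpr std::size_t kCacheLineSize = 64;

// Returns true when both addresses fall inside the same cache line,
// i.e. when dividing each address by the line size gives the same index.
bool sameCacheLine(const void* a, const void* b) {
    auto lineOf = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) / kCacheLineSize;
    };
    return lineOf(a) == lineOf(b);
}

// Hypothetical stand-in for the two adjacent sums. The array is aligned
// so that it starts exactly on a cache line boundary.
struct Sums {
    alignas(kCacheLineSize) long sum[2];
};
```

With a `Sums s{};` object, `sameCacheLine(&s.sum[0], &s.sum[1])` returns true: the two elements are only sizeof(long) bytes apart, so a write to one invalidates the line that holds the other.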
 
We then changed sizeOfSum to 8, which means changing the size of the vector sum to 8 and the if condition in the sumUp function to i%8==id, so that the program has more data to process. Nevertheless, the Thread block's time still did not come any closer to that of the Serial block, and the Serial block still took less time.
 
What would it take to show the advantages of multicore? As you can see in our program, the threads are not operating on the same variable; each thread simply reads and writes its own variable many times, and those variables sit in the same cache line. We can therefore introduce a local variable, so that each thread accumulates privately and touches the shared cache line as little as possible.
 
We implement another function sumUp2 in the program code:
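A minimal sketch of how such a function might look is given here. The signature, the global sum array and the data parameter are assumptions made for illustration, not necessarily the exact code of our program. The idea is that each thread accumulates into a local variable and writes to the shared array only once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the globals used in the program.
constexpr int sizeOfSum = 2;
long sum[sizeOfSum] = {0, 0};

// Sketch of sumUp2: add up every element whose index satisfies
// i % sizeOfSum == id, using a local accumulator inside the loop.
void sumUp2(const std::vector<int>& data, int id) {
    const std::size_t stride = static_cast<std::size_t>(sizeOfSum);
    const std::size_t slot = static_cast<std::size_t>(id);
    long local = 0;  // private to this thread, so no cache line is shared
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i % stride == slot) {
            local += data[i];
        }
    }
    sum[id] = local;  // single write to the shared cache line
}
```

Because the loop only updates local, the cache line holding sum is written once per thread instead of once per element, so the threads no longer invalidate each other's cached copy on every iteration.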
 
Make some more changes to our main function:
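As a rough sketch of the kind of change involved, the helper below starts one thread per id, waits for all of them, and returns the elapsed time. The name timeThreads and its signature are hypothetical; our main function may organize the timing differently.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs work(id) on numThreads threads and returns
// the elapsed wall-clock time in milliseconds.
double timeThreads(int numThreads, const std::function<void(int)>& work) {
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(numThreads));
    for (int id = 0; id < numThreads; ++id) {
        workers.emplace_back(work, id);  // each thread receives its own id
    }
    for (auto& w : workers) {
        w.join();  // wait until every thread has finished
    }

    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In main, a call along the lines of `timeThreads(sizeOfSum, [&](int id) { sumUp2(data, id); })` could then be compared against the time of the Serial block, where data stands for whatever input the program sums.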
 
When we ran the program again, we came to this conclusion: