Changes

GPU621/Analyzing False Sharing

34 bytes added, 14:14, 23 November 2022

→‎Example Of A False Sharing

In theory, on a multicore machine the threaded block of this code should run faster than the serial block. But the result is:

[[File:exampleOutput1.jpg|left]]

Or this:

To our surprise, the serial block took much less time, no matter how many times we ran it. This seems to contradict what we know about parallel code, but the explanation is false sharing.

As we said above, the smallest unit the CPU operates on in the cache is a cache line, which is typically 64 bytes. As the program code shows, sum is a vector that stores two long values. Both values are in fact located on the same cache line. When two threads read and write sum[0] and sum[1] separately, it looks like they are not using the same variable, but in fact they are affecting each other. For example, if a thread on CPU0 updates sum[0], the copy of that cache line held by CPU1, which also contains sum[1], is invalidated. CPU1 must then re-read the line before it can use sum[1] again, and because this happens on nearly every update, the program takes longer to run.
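The layout problem described above can be illustrated with a minimal C++ sketch. It is not the program from this page: the names PackedSums, PaddedSums, add_n and run_two_threads are hypothetical, and the 64-byte line size is an assumption taken from the text. One struct keeps the two counters side by side, so they share a cache line; the other uses alignas to give each counter its own line.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines, as stated in the text.
constexpr std::size_t kCacheLine = 64;

// Two counters stored side by side: both fit in a single cache line,
// so writes from different cores invalidate each other's copy.
struct PackedSums {
    long a = 0;
    long b = 0;
};

// Each counter is aligned to the start of its own cache line.
struct PaddedSums {
    alignas(kCacheLine) long a = 0;
    alignas(kCacheLine) long b = 0;
};

static_assert(sizeof(PackedSums) <= kCacheLine,
              "packed counters fit within one cache line");
static_assert(sizeof(PaddedSums) >= 2 * kCacheLine,
              "padded counters span at least two cache lines");

// Increment one counter n times. The volatile reference keeps the
// compiler from collapsing the loop into a single addition, so every
// iteration really touches memory.
void add_n(volatile long& slot, long n) {
    assert(n >= 0);
    for (long i = 0; i < n; ++i) {
        slot = slot + 1;
    }
}

// Run two threads, each updating its own counter, and return the total.
// The threads never write the same variable, so there is no data race;
// only the cache-line layout differs between PackedSums and PaddedSums.
template <typename Sums>
long run_two_threads(long n) {
    Sums sums;
    std::thread t0([&sums, n] { add_n(sums.a, n); });
    std::thread t1([&sums, n] { add_n(sums.b, n); });
    t0.join();
    t1.join();
    return sums.a + sums.b;
}
```

Calling run_two_threads<PackedSums>(n) and run_two_threads<PaddedSums>(n) with a large n and timing each call (for example with std::chrono::steady_clock) is one way to compare the two layouts. Both return the same total; how much faster the padded version runs, if at all, depends on the hardware and the compiler settings.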

Ryan Leong (118 edits)
