3.2. Race condition
Multi-threading can make a program faster, but it can also introduce bugs.
When threads run in parallel, their instructions can interleave with each other
in unpredictable ways, which can lead to incorrect results. As a simple example, x += 1 is not a single step:
the CPU may execute it as a read, a compute, and a write back, like this:
x = 0 # Initial value
temp1 = x # Read
temp1 += 1 # Compute
x = temp1 # Write back
If two threads each execute x += 1 concurrently, one possible execution looks like this, where Thread 1 finishes all three steps before Thread 2 starts:
x = 0 # Initial value
temp1 = x # Thread 1 Read
temp1 += 1 # Thread 1 Compute, temp1 is 1.
x = temp1 # Thread 1 Write back, x is now 1.
temp2 = x # Thread 2 Read
temp2 += 1 # Thread 2 Compute, temp2 is 2.
x = temp2 # Thread 2 Write back, x is now 2.
This gives the correct result of 2. However, another possible execution has the instructions interleaved like this:
x = 0 # Initial value
temp1 = x # Thread 1 Read
temp1 += 1 # Thread 1 Compute, temp1 is 1.
temp2 = x # Thread 2 Read
temp2 += 1 # Thread 2 Compute, temp2 is 1.
x = temp1 # Thread 1 Write back, x is now 1.
x = temp2 # Thread 2 Write back, x is still 1.
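This gives the wrong result of 1: Thread 2 read x before Thread 1 wrote its
result back, so Thread 2's write overwrites Thread 1's update instead of
adding to it. Which execution actually happens depends on the exact timing of
the threads, which can vary between runs. The code may work correctly most of
the time and fail only occasionally, making the bug very hard to reproduce and
debug. The same bug can occur in matmul if several threads accumulate into the
same output element, because += is used when computing the inner product.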
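The two executions above can be replayed deterministically with a small sketch. Nothing here uses real threads: run_schedule is a hypothetical helper, introduced only for illustration, that takes the order in which the threads take their steps and applies each thread's read, compute, and write back in turn.

```python
def run_schedule(schedule):
    """Replay the three steps of x += 1 for each thread in the given order.

    schedule is a list of thread ids; each appearance of an id executes
    that thread's next step (read, then compute, then write back).
    """
    x = 0            # shared variable, initial value
    temp = {}        # each thread's private temporary (temp1, temp2, ...)
    next_step = {}   # how many steps each thread has completed so far
    for tid in schedule:
        step = next_step.get(tid, 0)
        if step == 0:
            temp[tid] = x       # Read
        elif step == 1:
            temp[tid] += 1      # Compute
        else:
            x = temp[tid]       # Write back
        next_step[tid] = step + 1
    return x


# Thread 1 finishes all three steps before Thread 2 starts.
sequential = run_schedule([1, 1, 1, 2, 2, 2])
# Both threads read x before either one writes back.
interleaved = run_schedule([1, 1, 2, 2, 1, 2])

print(sequential)   # 2
print(interleaved)  # 1
```

The only difference between the two calls is the order of the steps, which is exactly what thread timing decides in a real program.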
This situation, in which multiple threads write to the same variable concurrently, or one writes while others read, is known as a race condition. It arises when instructions from different threads interleave without waiting for or blocking each other, which is known as asynchronous execution.
So asynchronous execution can save time by running things in parallel, but it requires careful management: whenever multiple threads modify the same variable asynchronously, a race condition is possible.
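One common form of such management is to make the read, compute, and write back of x += 1 happen as an uninterrupted unit by guarding it with a lock. The sketch below is a minimal illustration using Python's standard threading module, not necessarily the approach this text adopts. The names NUM_THREADS, INCREMENTS, and worker are assumptions made for the example.

```python
import threading

NUM_THREADS = 4       # assumed thread count, for illustration only
INCREMENTS = 10_000   # assumed number of increments per thread

x = 0                     # shared variable
lock = threading.Lock()   # guards every update of x


def worker():
    global x
    for _ in range(INCREMENTS):
        # Only one thread at a time can hold the lock, so no other thread
        # can read or write x between this thread's read and write back.
        with lock:
            x += 1


threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # wait for every thread to finish before reading x

print(x)  # 40000
```

The trade-off is that threads now wait for each other at the lock, so the guarded part of the code no longer runs in parallel.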
Note that we often use async and sync as short forms of asynchronous and synchronous, respectively.