Multithreading

Multitasking can be accomplished through multiple processes or multiple threads within a single process.

As previously mentioned, a process is made up of one or more threads; every process has at least one thread.

Since threads are execution units directly supported by the operating system, high-level languages often have built-in support for multithreading. Python is no exception, and Python threads are real operating-system threads (POSIX threads on Unix-like systems), not simulated threads.

Python’s standard library provides two modules: _thread and threading. The _thread module is a low-level module, while threading is a higher-level module that wraps around _thread. In most cases, we only need to use the threading module.

To start a thread, pass a function as the target when creating a Thread instance, then call start() to begin execution:

```python
import time
import threading

# Code executed by the new thread:
def loop():
    print('Thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('Thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('Thread %s ended.' % threading.current_thread().name)

print('Thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('Thread %s ended.' % threading.current_thread().name)
```

The output will be:

Thread MainThread is running...
Thread LoopThread is running...
Thread LoopThread >>> 1
Thread LoopThread >>> 2
Thread LoopThread >>> 3
Thread LoopThread >>> 4
Thread LoopThread >>> 5
Thread LoopThread ended.
Thread MainThread ended.

Each process starts with a default thread called the main thread, which can spawn new threads. The threading module has a function called current_thread() that always returns the instance of the current thread. The name of the main thread instance is MainThread, while the name of the child thread can be specified during its creation (in this case, it’s named LoopThread). The name is solely for display purposes and has no other significance; if not specified, Python automatically names threads Thread-1, Thread-2, etc.
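The automatic naming can be observed directly. A small sketch (note that recent CPython versions may append the target function's name to the default, so only the Thread- prefix is guaranteed):

```python
import threading

def work():
    pass

# Threads created without an explicit name get automatic names:
t1 = threading.Thread(target=work)
t2 = threading.Thread(target=work)
print(t1.name)  # e.g. Thread-1 (exact form varies by Python version)
print(t2.name)  # e.g. Thread-2
print(threading.current_thread().name)  # MainThread
```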

Lock

The primary difference between multithreading and multiprocessing is that in multiprocessing, each variable has its own copy within each process, which does not affect the others. In contrast, in multithreading, all variables are shared among all threads, meaning any variable can be modified by any thread. This shared data can lead to significant risks, as multiple threads may attempt to modify a variable simultaneously, leading to inconsistent data.

Let’s look at an example of multiple threads modifying a variable:

```python
# multithread
import time
import threading

# Assume this is your bank balance:
balance = 0

def change_it(n):
    # Deposit first, then withdraw; the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(10000000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)
```

We define a shared variable balance, initialized to 0, and start two threads that will deposit and withdraw money. Theoretically, the result should be 0. However, due to thread scheduling determined by the operating system, the result for balance may not be 0 after enough iterations.

The issue arises because high-level language statements can comprise multiple lower-level operations. For instance, even a simple computation:

```python
balance = balance + n
```

is broken down into two steps:

  1. Compute balance + n and store it in a temporary variable.
  2. Assign the temporary variable's value back to balance.

This can be visualized as:

```python
x = balance + n
balance = x
```

Since x is a local variable, each thread has its own copy of x. During normal (non-interleaved) execution:

  • Initial balance = 0

For thread t1:

  1. x1 = balance + 5 # x1 = 0 + 5 = 5
  2. balance = x1 # balance = 5
  3. x1 = balance - 5 # x1 = 5 - 5 = 0
  4. balance = x1 # balance = 0

For thread t2:

  1. x2 = balance + 8 # x2 = 0 + 8 = 8
  2. balance = x2 # balance = 8
  3. x2 = balance - 8 # x2 = 8 - 8 = 0
  4. balance = x2 # balance = 0

Result: balance = 0.

However, if t1 and t2 run alternately, the following sequence can occur:

  • Initial balance = 0

For thread t1:

  1. x1 = balance + 5 # x1 = 0 + 5 = 5

For thread t2:

  1. x2 = balance + 8 # x2 = 0 + 8 = 8
  2. balance = x2 # balance = 8

Continuing with thread t1:

  1. balance = x1 # balance = 5
  2. x1 = balance - 5 # x1 = 5 - 5 = 0
  3. balance = x1 # balance = 0

Continuing with thread t2:

  1. x2 = balance - 8 # x2 = 0 - 8 = -8
  2. balance = x2 # balance = -8

Result: balance = -8.

The root cause is that modifying balance requires multiple statements, and during execution, the threads may be interrupted, leading to conflicts over the shared variable.

To ensure that balance is calculated correctly, we need to add a lock around the change_it() function. When a thread begins executing change_it(), it acquires the lock, preventing other threads from executing it simultaneously. Only after releasing the lock can other threads proceed to modify balance. A lock can be created using threading.Lock():

```python
balance = 0
lock = threading.Lock()

def run_thread(n):
    for i in range(100000):
        # First, acquire the lock:
        lock.acquire()
        try:
            # Modify with confidence:
            change_it(n)
        finally:
            # Always release the lock after modification:
            lock.release()
```

When multiple threads attempt to execute lock.acquire(), only one thread will successfully acquire the lock and continue executing, while others will wait until the lock is released.

It is essential for the thread that acquired the lock to release it after completing its work; otherwise, the waiting threads will remain blocked indefinitely, leading to a deadlock. Thus, we use a try...finally block to ensure that the lock is always released.
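Because acquire-in-try, release-in-finally is such a common pattern, a Lock can also be used as a context manager, which releases the lock automatically even if the protected code raises. Here is a complete, runnable variant of the example using with (iteration count reduced so it finishes quickly):

```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    # Deposit first, then withdraw; the net effect should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        # 'with lock' acquires the lock and always releases it:
        with lock:
            change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)  # with the lock in place, this is always 0
```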

The advantage of using locks is that it guarantees that a specific block of code can only be executed by one thread at a time. However, the downside is that it prevents concurrent execution, effectively turning that section of code into a single-threaded execution, which can significantly reduce efficiency. Additionally, if multiple locks are used and different threads hold different locks while trying to acquire each other's locks, deadlocks can occur, causing all threads to hang indefinitely.

Multi-Core CPU

If you are fortunate enough to have a multi-core CPU, you might wonder if it can execute multiple threads simultaneously.

Let’s explore what happens when we run a busy-wait loop.

Open the Activity Monitor on macOS or the Task Manager on Windows to monitor the CPU usage of a process.

You will observe that a single busy-wait thread will utilize 100% of one CPU core.

With two busy-wait threads running on a multi-core CPU, you will see a total CPU usage of 200%, indicating that both CPU cores are in use.

To fully utilize an N-core CPU, you would need to start N busy-wait threads.

Let’s write a Python program to create a busy-wait loop:

```python
import threading
import multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()
```

Even though we start one thread per CPU core, on a 4-core CPU you will likely see total CPU usage of only about 102%, indicating that only a single core is effectively utilized.

However, if you rewrite the same busy-wait loop in C, C++, or Java, you could easily push the CPU usage to 400% on a 4-core CPU or 800% on an 8-core CPU. Why doesn’t Python achieve this?

The reason is that although Python threads are real operating-system threads, their execution in the interpreter is governed by a Global Interpreter Lock (GIL). Before a Python thread can execute bytecode, it must first acquire the GIL, and the interpreter periodically forces it to release the GIL so that other threads get a chance to run (early CPython switched after every 100 bytecode instructions; modern CPython switches on a time interval, 5 ms by default). The GIL effectively serializes all Python bytecode execution, so multithreaded Python code can only run threads alternately: even 100 threads on a 100-core CPU will utilize only one core.
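In modern CPython the thread switch is time-based rather than instruction-counted, and the interval can be inspected and tuned via the sys module (the default is 5 milliseconds):

```python
import sys

# How long a thread may hold the GIL before the interpreter asks it
# to yield, in seconds (0.005 by default in CPython 3):
print(sys.getswitchinterval())

# It can be tuned, e.g. restored to the default explicitly:
sys.setswitchinterval(0.005)
```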

The GIL is a historical design decision in the Python interpreter, and most commonly used implementations, like CPython, maintain this lock. To truly take advantage of multi-core processing, a non-GIL interpreter would need to be implemented.

Therefore, while multithreading is available in Python, it does not effectively leverage multiple cores. To utilize multiple cores through threads, one would need to resort to C extensions, but this approach sacrifices Python's simplicity and ease of use.

However, there is no need to worry too much: even if multithreading cannot utilize multiple cores effectively, Python can achieve multi-core tasks through multiprocessing. Each Python process has its own independent GIL, allowing them to run concurrently without interference.

Summary

Multithreaded programming is complex: conflicts arise easily, locks are needed to isolate shared state, and locks in turn bring the risk of deadlock.

Due to the GIL in the Python interpreter, multithreading cannot effectively utilize multiple cores. Parallelism with threads in Python remains a beautiful dream.
