# Multiprocessing
To implement multiprocessing in Python, we first need to understand some related operating-system concepts.

Unix/Linux provides a special system call, `fork()`. Unlike a regular function call, which returns once per invocation, `fork()` returns twice: the operating system copies the current process (the parent process) into a new process (the child process), and then returns in both the parent and the child.

In the child process, `fork()` always returns 0, while in the parent it returns the child's process ID. This design lets a parent call `fork()` to create multiple children and keep track of each child's PID, while a child can always obtain its parent's PID by calling `getppid()`.
Python's `os` module encapsulates common system calls, including `fork`, making it easy to create child processes in Python programs:
```python
import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))
```
The output will be as follows:
```
Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.
```
Since Windows does not have a `fork` call, the code above cannot run on Windows. macOS, whose kernel is derived from BSD (a flavor of Unix), runs it without issue. It is recommended to learn Python on a Mac!
With the `fork` call, a process can create a child process to handle a new task. A common example is the Apache server: the parent process listens on a port, and whenever a new HTTP request arrives, it forks a child process to handle that request.
## multiprocessing
If you plan to write a multi-process service program, Unix/Linux is undoubtedly the right choice. But since Windows has no `fork` call, does that mean we cannot write multi-process programs in Python on Windows?

Since Python is cross-platform, it naturally provides cross-platform multiprocessing support: the `multiprocessing` module is the cross-platform equivalent of `fork`-based multiprocessing.
The `multiprocessing` module provides a `Process` class to represent a process object. The following example demonstrates how to start a child process and wait for it to finish:
```python
from multiprocessing import Process
import os

# Code to be executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')
```
The output will be as follows:
```
Parent process 928.
Child process will start.
Run child process test (929)...
Child process end.
```
To create a child process, you only need to pass an executable function and its arguments, create a `Process` instance, and call its `start()` method. This is simpler than using `fork()` directly.
The `join()` method waits for the child process to finish before continuing; it is usually used for inter-process synchronization.
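As a minimal sketch of this synchronization pattern, the following starts several `Process` instances and joins each one; after a successful `join()`, a child's `exitcode` attribute is 0. (This sketch explicitly requests the Unix-only `'fork'` start method via `get_context`, so it does not need the `if __name__ == '__main__'` guard; that choice is an assumption matching the Unix focus of this chapter.)

```python
from multiprocessing import get_context
import os

def work(n):
    # Trivial task executed in the child process
    print('Worker %s handled task %s' % (os.getpid(), n))

ctx = get_context('fork')  # 'fork' start method: Unix/Linux/Mac only
procs = [ctx.Process(target=work, args=(i,)) for i in range(3)]
for p in procs:
    p.start()
for p in procs:
    p.join()               # block until each child has finished
print([p.exitcode for p in procs])
```

Joining every child before reading its `exitcode` guarantees the parent sees the final status rather than a still-running process.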
## Pool
If you want to start a large number of child processes, you can use a process pool to create them in batches:
```python
from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')
```
The output will be as follows:
```
Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 1 runs 0.27 seconds.
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.
```
Code explanation:

Calling `join()` on the `Pool` object waits for all child processes to complete. You must call `close()` before `join()`; once `close()` has been called, no new process can be submitted to the pool.
Note the output: tasks 0, 1, 2, and 3 start immediately, while task 4 waits until one of the earlier tasks has completed. This is because the pool size on my computer defaults to 4, so at most four processes run at a time. This limit is a deliberate design of `Pool`, not a restriction of the operating system. If you change it to:
```python
p = Pool(5)
```
You can run five processes simultaneously.
Since the default size of the pool matches the number of CPU cores, if you happen to have an 8-core CPU, you must submit at least 9 child processes to see the waiting effect mentioned above.
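To make the pool-size point concrete, here is a small sketch: `os.cpu_count()` reports the core count that determines the default pool size, and `Pool.map` distributes a function over an iterable and returns the results in order. (The explicit `'fork'` context is an assumption, again Unix-only, so the sketch runs without the `__main__` guard.)

```python
from multiprocessing import get_context
import os

def square(x):
    return x * x

print('CPU cores:', os.cpu_count())  # determines the default Pool size
ctx = get_context('fork')            # Unix-only start method
with ctx.Pool(2) as pool:            # a pool of just 2 worker processes
    results = pool.map(square, range(5))
print(results)
```

Unlike `apply_async`, `map` blocks until all results are available, so no explicit `close()`/`join()` pair is needed here; the `with` block tears the pool down on exit.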
## Child Processes
Often, the child process is not the program itself but an external process. After creating a child process, we may also need to control its input and output.
The `subprocess` module makes it easy to start a child process and then control its input and output.
The following example demonstrates how to run the command `nslookup www.python.org` in Python, with the same effect as running it directly from the command line:
```python
import subprocess

print('$ nslookup www.python.org')
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)
```
The output will be:
```
$ nslookup www.python.org
Server:         192.168.19.4
Address:        192.168.19.4#53

Non-authoritative answer:
www.python.org  canonical name = python.map.fastly.net.
Name:   python.map.fastly.net
Address: 199.27.79.223

Exit code: 0
```
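On Python 3.5+, `subprocess.run()` is the recommended higher-level interface for the same task. As a self-contained sketch that does not depend on `nslookup` or network access, the following launches a child Python interpreter (`sys.executable`) instead; the command being run is the only assumption here.

```python
import subprocess
import sys

# Run a child Python interpreter; capture_output/text collect its
# stdout and stderr as strings instead of letting them print through.
r = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],
    capture_output=True, text=True,
)
print(r.stdout.strip())
print('Exit code:', r.returncode)
```

The returned `CompletedProcess` object bundles the exit code and captured output, which is usually more convenient than the bare integer `subprocess.call()` returns.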
If the child process also requires input, it can be supplied via the `communicate()` method:
```python
import subprocess

print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)
```
The code above is equivalent to running the `nslookup` command on the command line and manually entering:

```
set q=mx
python.org
exit
```
The output will be as follows:
```
$ nslookup
Server:         192.168.19.4
Address:        192.168.19.4#53

Non-authoritative answer:
python.org      mail exchanger = 50 mail.python.org.

Authoritative answers can be found from:
mail.python.org internet address = 82.94.164.166
mail.python.org has AAAA address 2001:888:2000:d::a6

Exit code: 0
```
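With `subprocess.run()`, the same stdin-feeding pattern collapses into the `input=` argument, which internally uses `communicate()`. This sketch again substitutes a child Python interpreter for `nslookup` so it runs anywhere; that substitution is the only assumption.

```python
import subprocess
import sys

# The child reads its stdin and echoes it back uppercased; input=
# supplies the data that communicate() would otherwise write.
r = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
    input='set q=mx\n', capture_output=True, text=True,
)
print(r.stdout)
```

Because `text=True` is set, `input` is a `str` and the captured output is decoded automatically, avoiding the manual `b'...'`/`decode('utf-8')` dance of the `Popen` version.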
## Inter-Process Communication
Processes certainly need to communicate with each other, and the operating system provides various mechanisms for this. Python's `multiprocessing` module wraps these low-level mechanisms and provides several ways to exchange data, such as `Queue` and `Pipe`.
Using `Queue` as an example, we create two child processes in the parent process: one writes data to the queue, and the other reads data from it:
```python
from multiprocessing import Process, Queue
import os, time, random

# Code executed by the process that writes data:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the process that reads data:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__=='__main__':
    # The parent process creates a Queue and passes it to the child processes:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the child process pw to write:
    pw.start()
    # Start the child process pr to read:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop and cannot be joined, so it must be forcibly terminated:
    pr.terminate()
```
The output will be as follows:
```
Process to write: 50563
Put A to queue...
Process to read: 50564
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
```
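A `Pipe` is the other channel mentioned above: it returns two connected endpoints, and whatever one end `send()`s, the other end can `recv()`. As a minimal sketch (using the Unix-only `'fork'` context as an assumption, so no `__main__` guard is needed):

```python
from multiprocessing import get_context

def answer(conn):
    # Child: receive one request, send back a reply, then close its end.
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()

ctx = get_context('fork')                 # Unix-only start method
parent_conn, child_conn = ctx.Pipe()      # two connected endpoints
p = ctx.Process(target=answer, args=(child_conn,))
p.start()
parent_conn.send('ping')                  # parent writes on its end...
reply = parent_conn.recv()                # ...and reads the child's reply
p.join()
print('Got reply:', reply)
```

Unlike a `Queue`, which any number of processes can share, a `Pipe` connects exactly two endpoints, making it a natural fit for one-to-one request/reply exchanges.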
On Unix/Linux, the `multiprocessing` module encapsulates the `fork()` call, so we don't have to worry about its details. Since Windows has no `fork()` call, the `multiprocessing` module has to "simulate" its effect: all Python objects from the parent process must be serialized with `pickle` and passed to the child process. So if `multiprocessing` fails on Windows, first consider whether `pickle` is the part that is failing.
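A common instance of such a `pickle` failure is passing a function that has no importable name: module-level functions pickle by reference, but a lambda (or a locally defined function) cannot be pickled at all. This sketch demonstrates the difference directly with `pickle`, without starting any process:

```python
import pickle

def module_level(x):
    # Module-level functions pickle by reference (their qualified name),
    # so they can be sent to a child process on Windows.
    return x * 2

data = pickle.dumps(module_level)
print('Pickled function:', len(data), 'bytes')

lambda_ok = True
try:
    pickle.dumps(lambda x: x * 2)   # no importable name -> cannot pickle
except Exception:
    lambda_ok = False
print('Lambda picklable?', lambda_ok)
```

This is why, on Windows, the `target=` of a `Process` should be a function defined at module level rather than a lambda or nested function.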
## Summary
- On Unix/Linux, we can use the `fork()` call to implement multiprocessing.
- For cross-platform multiprocessing, we can use the `multiprocessing` module.
- Inter-process communication is achieved through `Queue`, `Pipe`, and other means.