# Multiprocessing
To implement multiprocessing in Python, we first need to understand some related operating-system concepts.

Unix/Linux provides a special system call, `fork()`. Unlike a regular function call, which returns once per invocation, `fork()` returns twice: the operating system copies the current process (the parent process) into a new process (the child process), and then returns in both the parent and the child.

In the child process, `fork()` always returns 0, while in the parent it returns the child's process ID. This design lets a parent call `fork()` to create multiple children and keep track of each child's PID, while a child can always obtain its parent's PID by calling `getppid()`.
Python's `os` module encapsulates common system calls, including `fork`, making it easy to create child processes in Python programs:
```python
import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))
```
The output will be as follows:
```
Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.
```
Since Windows does not have a `fork` call, the code above cannot run on Windows. macOS, whose kernel is derived from BSD (a flavor of Unix), runs it without issue. It is recommended to learn Python on a Mac!
With the `fork` call, a process can create a child process to handle a new task. A common example is the Apache server: the parent process listens on a port, and whenever a new HTTP request arrives, it forks a child process to handle that request.
## multiprocessing
If you plan to write a multi-process service program, Unix/Linux is undoubtedly the right choice. But since Windows has no `fork` call, does that mean we cannot write multi-process programs in Python on Windows?

Since Python is cross-platform, it naturally provides cross-platform multiprocessing support: the `multiprocessing` module is the cross-platform equivalent of `fork`-based multiprocessing.
The `multiprocessing` module provides a `Process` class to represent a process object. The following example demonstrates how to start a child process and wait for it to finish:
```python
from multiprocessing import Process
import os

# Code to be executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')
```
The output will be as follows:
```
Parent process 928.
Child process will start.
Run child process test (929)...
Child process end.
```
To create a child process, you only need to pass an executable function and its arguments, create a `Process` instance, and call its `start()` method. This is simpler than using `fork()` directly.
The `join()` method waits for the child process to finish before continuing; it is usually used for inter-process synchronization.
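As a minimal sketch of this synchronization pattern, the following starts several `Process` instances and joins each one; after a successful `join()`, a child's `exitcode` attribute is 0. (This sketch explicitly requests the Unix-only `'fork'` start method via `get_context`, so it does not need the `if __name__ == '__main__'` guard; that choice is an assumption matching the Unix focus of this chapter.)

```python
from multiprocessing import get_context
import os

def work(n):
    # Trivial task executed in the child process
    print('Worker %s handled task %s' % (os.getpid(), n))

ctx = get_context('fork')  # 'fork' start method: Unix/Linux/Mac only
procs = [ctx.Process(target=work, args=(i,)) for i in range(3)]
for p in procs:
    p.start()
for p in procs:
    p.join()               # block until each child has finished
print([p.exitcode for p in procs])
```

Joining every child before reading its `exitcode` guarantees the parent sees the final status rather than a still-running process.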
## Pool
If you want to start a large number of child processes, you can use a process pool to create them in batches:
```python
from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')
```
The output will be as follows:
```
Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 1 runs 0.27 seconds.
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.
```
Code explanation:

Calling `join()` on the `Pool` object waits for all child processes to complete. You must call `close()` before `join()`; once `close()` has been called, no new process can be submitted to the pool.
Note the output: tasks 0, 1, 2, and 3 start immediately, while task 4 waits until one of the earlier tasks has completed. This is because the pool size on my computer defaults to 4, so at most four processes run at a time. This limit is a deliberate design of `Pool`, not a restriction of the operating system. If you change it to:
```python
p = Pool(5)
```
You can run five processes simultaneously.
Since the default size of the pool matches the number of CPU cores, if you happen to have an 8-core CPU, you must submit at least 9 child processes to see the waiting effect mentioned above.
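To make the pool-size point concrete, here is a small sketch: `os.cpu_count()` reports the core count that determines the default pool size, and `Pool.map` distributes a function over an iterable and returns the results in order. (The explicit `'fork'` context is an assumption, again Unix-only, so the sketch runs without the `__main__` guard.)

```python
from multiprocessing import get_context
import os

def square(x):
    return x * x

print('CPU cores:', os.cpu_count())  # determines the default Pool size
ctx = get_context('fork')            # Unix-only start method
with ctx.Pool(2) as pool:            # a pool of just 2 worker processes
    results = pool.map(square, range(5))
print(results)
```

Unlike `apply_async`, `map` blocks until all results are available, so no explicit `close()`/`join()` pair is needed here; the `with` block tears the pool down on exit.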
## Child Processes
Often, the child process is not the program itself but an external process. After creating a child process, we may also need to control its input and output.
The `subprocess` module makes it easy to start a child process and then control its input and output.
The following example demonstrates how to run the command `nslookup www.python.org` in Python, with the same effect as running it directly from the command line:
```python
import subprocess

print('$ nslookup www.python.org')
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)
```
The output will be:
```
$ nslookup www.python.org
Server:         192.168.19.4
Address:        192.168.19.4#53

Non-authoritative answer:
www.python.org  canonical name = python.map.fastly.net.
Name:   python.map.fastly.net
Address: 199.27.79.223

Exit code: 0
```
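On Python 3.5+, `subprocess.run()` is the recommended higher-level interface for the same task. As a self-contained sketch that does not depend on `nslookup` or network access, the following launches a child Python interpreter (`sys.executable`) instead; the command being run is the only assumption here.

```python
import subprocess
import sys

# Run a child Python interpreter; capture_output/text collect its
# stdout and stderr as strings instead of letting them print through.
r = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],
    capture_output=True, text=True,
)
print(r.stdout.strip())
print('Exit code:', r.returncode)
```

The returned `CompletedProcess` object bundles the exit code and captured output, which is usually more convenient than the bare integer `subprocess.call()` returns.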
If the child process also requires input, it can be supplied via the `communicate()` method:
```python
import subprocess

print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)
```
The code above is equivalent to running the `nslookup` command on the command line and manually entering:

```
set q=mx
python.org
exit
```
The output will be as follows:
```
$ nslookup
Server:         192.168.19.4
Address:        192.168.19.4#53

Non-authoritative answer:
python.org      mail exchanger = 50 mail.python.org.

Authoritative answers can be found from:
mail.python.org internet address = 82.94.164.166
mail.python.org has AAAA address 2001:888:2000:d::a6

Exit code: 0
```
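With `subprocess.run()`, the same stdin-feeding pattern collapses into the `input=` argument, which internally uses `communicate()`. This sketch again substitutes a child Python interpreter for `nslookup` so it runs anywhere; that substitution is the only assumption.

```python
import subprocess
import sys

# The child reads its stdin and echoes it back uppercased; input=
# supplies the data that communicate() would otherwise write.
r = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
    input='set q=mx\n', capture_output=True, text=True,
)
print(r.stdout)
```

Because `text=True` is set, `input` is a `str` and the captured output is decoded automatically, avoiding the manual `b'...'`/`decode('utf-8')` dance of the `Popen` version.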
## Inter-Process Communication
Processes certainly need to communicate with each other, and the operating system provides various mechanisms for this. Python's `multiprocessing` module wraps these low-level mechanisms and provides several ways to exchange data, such as `Queue` and `Pipe`.
Using `Queue` as an example, we create two child processes in the parent process: one writes data to the queue, and the other reads data from it:
```python
from multiprocessing import Process, Queue
import os, time, random

# Code executed by the process that writes data:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the process that reads data:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__=='__main__':
    # The parent process creates a Queue and passes it to the child processes:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the child process pw to write:
    pw.start()
    # Start the child process pr to read:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop and cannot be joined, so it must be forcibly terminated:
    pr.terminate()
```
The output will be as follows:
```
Process to write: 50563
Put A to queue...
Process to read: 50564
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
```
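A `Pipe` is the other channel mentioned above: it returns two connected endpoints, and whatever one end `send()`s, the other end can `recv()`. As a minimal sketch (using the Unix-only `'fork'` context as an assumption, so no `__main__` guard is needed):

```python
from multiprocessing import get_context

def answer(conn):
    # Child: receive one request, send back a reply, then close its end.
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()

ctx = get_context('fork')                 # Unix-only start method
parent_conn, child_conn = ctx.Pipe()      # two connected endpoints
p = ctx.Process(target=answer, args=(child_conn,))
p.start()
parent_conn.send('ping')                  # parent writes on its end...
reply = parent_conn.recv()                # ...and reads the child's reply
p.join()
print('Got reply:', reply)
```

Unlike a `Queue`, which any number of processes can share, a `Pipe` connects exactly two endpoints, making it a natural fit for one-to-one request/reply exchanges.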
On Unix/Linux, the `multiprocessing` module encapsulates the `fork()` call, so we don't have to worry about its details. Since Windows has no `fork()` call, the `multiprocessing` module has to "simulate" its effect: all Python objects from the parent process must be serialized with `pickle` and passed to the child process. So if `multiprocessing` fails on Windows, first consider whether `pickle` is the part that is failing.
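A common instance of such a `pickle` failure is passing a function that has no importable name: module-level functions pickle by reference, but a lambda (or a locally defined function) cannot be pickled at all. This sketch demonstrates the difference directly with `pickle`, without starting any process:

```python
import pickle

def module_level(x):
    # Module-level functions pickle by reference (their qualified name),
    # so they can be sent to a child process on Windows.
    return x * 2

data = pickle.dumps(module_level)
print('Pickled function:', len(data), 'bytes')

lambda_ok = True
try:
    pickle.dumps(lambda x: x * 2)   # no importable name -> cannot pickle
except Exception:
    lambda_ok = False
print('Lambda picklable?', lambda_ok)
```

This is why, on Windows, the `target=` of a `Process` should be a function defined at module level rather than a lambda or nested function.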
## Summary
- On Unix/Linux, we can use the `fork()` call to implement multiprocessing.
- For cross-platform multiprocessing, we can use the `multiprocessing` module.
- Inter-process communication is achieved through `Queue`, `Pipe`, and other means.