Appearance
File Read and Write
Reading and writing files is one of the most common IO operations. Python has built-in functions for file reading and writing that are compatible with C.
Before we delve into file operations, it's important to understand that the capability to read and write files on disk is provided by the operating system. Modern operating systems do not allow regular programs to manipulate disks directly, so reading and writing files involves requesting the operating system to open a file object (often referred to as a file descriptor). The program can then read data from this file object (reading) or write data to it (writing) using the interfaces provided by the operating system.
Reading Files
To open a file object in read mode, use Python's built-in open()
function with the file name and mode as arguments:
python
>>> f = open('/Users/michael/test.txt', 'r')
The mode 'r'
indicates reading, so we have successfully opened a file.
If the file does not exist, the open()
function will raise an IOError
, providing an error code and a detailed message:
python
>>> f = open('/Users/michael/notfound.txt', 'r')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/Users/michael/notfound.txt'
If the file opens successfully, you can call the read()
method to read the entire content of the file into memory as a string object:
python
>>> f.read()
'Hello, world!'
The final step is to call the close()
method to close the file. Files must be closed after use because file objects consume resources from the operating system, and there is a limit to how many files the operating system can have open at once:
python
>>> f.close()
Since reading and writing files can potentially produce IOError
, if an error occurs, the subsequent f.close()
will not be called. Therefore, to ensure that the file is closed correctly regardless of errors, you can use a try ... finally
block:
python
try:
f = open('/path/to/file', 'r')
print(f.read())
finally:
if f:
f.close()
However, writing this every time can be cumbersome. Python introduces the with
statement to automatically call the close()
method:
python
with open('/path/to/file', 'r') as f:
print(f.read())
This is equivalent to the previous try ... finally
block, but the code is more concise and you do not need to call f.close()
explicitly.
Calling read()
reads the entire content of the file at once, which can be problematic if the file is very large (e.g., 10GB), as it may exceed memory limits. To be safer, you can repeatedly call read(size)
to read up to size
bytes at a time. Additionally, you can use readline()
to read one line at a time or readlines()
to read all lines at once and return a list. Therefore, you should decide how to call these methods based on your needs.
If the file is small, read()
is the most convenient method. If the file size is uncertain, repeatedly calling read(size)
is safer. If you're dealing with a configuration file, readlines()
is the easiest method:
python
for line in f.readlines():
print(line.strip()) # Remove the trailing '\n'
File-like Objects
Objects returned by the open()
function that have a read()
method are collectively referred to as file-like objects in Python. Besides file objects, this can also include memory byte streams, network streams, custom streams, and so on. A file-like object does not need to inherit from a specific class; it only needs to implement a read()
method.
StringIO
is an example of a file-like object created in memory, commonly used as a temporary buffer.
Binary Files
The previous discussion primarily addressed reading text files, specifically UTF-8 encoded text files. To read binary files, such as images or videos, open the file with the 'rb'
mode:
python
>>> f = open('/Users/michael/test.jpg', 'rb')
>>> f.read()
b'\xff\xd8\xff\xe1\x00\x18Exif\x00\x00...' # Hexadecimal representation of bytes
Character Encoding
To read non-UTF-8 encoded text files, you need to provide the encoding
parameter to the open()
function. For example, to read a GBK encoded file:
python
>>> f = open('/Users/michael/gbk.txt', 'r', encoding='gbk')
>>> f.read()
'测试'
If you encounter some files with inconsistent encoding, you might encounter a UnicodeDecodeError
because they may contain invalid characters. In such cases, the open()
function also accepts an errors
parameter to specify how to handle encoding errors. The simplest way is to ignore them:
python
>>> f = open('/Users/michael/gbk.txt', 'r', encoding='gbk', errors='ignore')
Writing Files
Writing files is similar to reading files, with the only difference being that you pass 'w'
or 'wb'
to the open()
function to indicate writing text or binary files, respectively:
python
>>> f = open('/Users/michael/test.txt', 'w')
>>> f.write('Hello, world!')
>>> f.close()
You can repeatedly call write()
to write to the file, but it is essential to call f.close()
to close the file. When writing files, the operating system often does not immediately write data to the disk; it may be cached in memory and written later. The operating system guarantees that all unwritten data is flushed to disk only when close()
is called. Failing to call close()
can result in only part of the data being written to disk, leading to data loss. Therefore, it is safer to use the with
statement:
python
with open('/Users/michael/test.txt', 'w') as f:
f.write('Hello, world!')
To write text files with specific encoding, provide the encoding
parameter to the open()
function to automatically convert the string to the specified encoding.
It's important to note that if you write to a file in 'w'
mode and the file already exists, it will be overwritten (effectively deleting the old file and creating a new one). If you want to append data to the end of the file, you can use the 'a'
mode to write in append mode.
You can refer to the official Python documentation for definitions and meanings of all modes.
Exercise
Please read a local text file into a string and print it out:
python
fpath = '/etc/timezone'
with open(fpath, 'r') as f:
s = f.read()
print(s)
Summary
In Python, file reading and writing are accomplished through file objects opened with the open()
function. Using the with
statement to handle file IO is considered a good practice.