IO
IO stands for Input/Output. Both directions are defined relative to memory:
Input refers to reading data from external sources into memory, such as loading files from disk into memory or reading data from the network into memory.
Output refers to sending data from memory to external sources, such as writing data from memory to a file or transmitting data from memory to the network.
Why must data be read into memory before it can be processed? Because code runs in memory, the data it works on must be in memory as well, where it is typically represented as byte arrays, strings, and so on.
From the perspective of Java code, input essentially involves reading content from an external source, such as a file on the hard drive, into memory and representing it using a specific Java data type, such as byte[] or String, so that subsequent code can process this data.
Because memory is volatile, processed data must eventually be output in some form, such as being written to a file. Output essentially involves sending Java data, such as a byte[] or String, to a specified destination.
IO streams are a model for sequentially reading and writing data, characterized by unidirectional flow. The data flows like water through a pipe, which is why we refer to it as an IO stream.
InputStream / OutputStream
IO streams operate with bytes as the smallest unit, thus they are also referred to as byte streams. For example, when we read a file from the disk that contains 6 bytes, it means we are reading in 6 bytes of data:
╔═══════════╗
║  Memory   ║
╚═══════════╝
      ▲
      │0x48
      │0x65
      │0x6c
      │0x6c
      │0x6f
      │0x21
╔═══════════╗
║ Hard Disk ║
╚═══════════╝
These 6 bytes are read in sequentially, thus forming an input byte stream.
Conversely, when we write these 6 bytes from memory to a disk file, it constitutes an output byte stream:
╔═══════════╗
║  Memory   ║
╚═══════════╝
      │0x21
      │0x6f
      │0x6c
      │0x6c
      │0x65
      │0x48
      ▼
╔═══════════╗
║ Hard Disk ║
╚═══════════╝
In Java, InputStream represents the input byte stream, while OutputStream represents the output byte stream. These are the two fundamental types of IO streams.
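To make the two diagrams concrete, here is a minimal sketch that writes the same 6 bytes to a file through an OutputStream and reads them back one at a time through an InputStream. The class name ByteStreamDemo, the helper methods, and the temporary file are assumptions made for illustration; FileInputStream and FileOutputStream are the standard file-backed implementations.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteStreamDemo {

    // Writes the given bytes to the file, one after another, through an OutputStream.
    static void writeBytes(File file, byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            for (byte b : data) {
                out.write(b); // write(int) sends the low 8 bits, i.e. exactly one byte
            }
        }
    }

    // Reads the file one byte at a time; read() returns -1 at the end of the stream.
    static String readAsHex(File file) throws IOException {
        StringBuilder hex = new StringBuilder();
        try (InputStream in = new FileInputStream(file)) {
            int b;
            while ((b = in.read()) != -1) {
                hex.append(String.format("0x%02x ", b));
            }
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary file, used only for this illustration.
        File file = File.createTempFile("bytes-demo", ".bin");
        file.deleteOnExit();

        byte[] data = {0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21};
        writeBytes(file, data);
        System.out.println(readAsHex(file)); // 0x48 0x65 0x6c 0x6c 0x6f 0x21
    }
}
```

The bytes come back in the order they were written, which is the sequential, one-way behavior the stream model describes.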
Reader / Writer
If we need to read and write characters, and not all of them are single-byte ASCII characters, it is more convenient to read and write in units of char. This type of stream is called a character stream.
Java provides Reader and Writer to represent character streams, where the smallest data unit transmitted is char.
For example, if we write a char[] array containing the characters "Hi你好" using a Writer character stream with UTF-8 encoding, the final content of the file consists of 8 bytes: the English characters 'H' and 'i' each occupy one byte, while the Chinese characters '你' and '好' each occupy 3 bytes:
0x48
0x69
0xe4bda0
0xe5a5bd
Conversely, if we use a Reader to read these 8 UTF-8 encoded bytes, we get back the characters "Hi你好".
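The round trip described above can be sketched as follows. This is an illustration under stated assumptions: the class name CharStreamDemo and its helpers are made up, and an in-memory byte array stands in for the file so that the encoded bytes are easy to inspect. The string literal uses Unicode escapes for '你' (U+4F60) and '好' (U+597D) so that the example does not depend on the source file's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CharStreamDemo {

    // Encodes the characters to UTF-8 bytes by writing them through a Writer.
    static byte[] encode(char[] chars) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(chars);
        } // closing the Writer flushes any buffered bytes into the sink
        return sink.toByteArray();
    }

    // Decodes UTF-8 bytes back into characters by reading them through a Reader.
    static String decode(byte[] bytes) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) { // read() returns one char, or -1 at the end
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "Hi\u4f60\u597d"; // the four characters H, i, 你, 好
        byte[] bytes = encode(original.toCharArray());
        System.out.println(bytes.length);                    // 8
        System.out.println(decode(bytes).equals(original));  // true
    }
}
```

Four chars go in and eight bytes come out: the Writer performs the encoding, and the Reader reverses it.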
Thus, Reader and Writer essentially function as an InputStream and OutputStream that automatically decode and encode data.
When using Reader, although the data source consists of bytes, the data we read consists of char values, because Reader internally decodes the input bytes into char. When using InputStream, the data we read is exactly the original binary data, represented as a byte[] array, although we can manually convert this byte[] array into a string using a specific encoding. Ultimately, whether to use Reader or InputStream depends on the context. If the data source is not text, use InputStream. If the data source is text, Reader is more convenient. The same applies to Writer and OutputStream.
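As an illustration of the manual route, the sketch below reads raw bytes through an InputStream and converts them to a String afterwards. The class name ManualDecodeDemo, the helper methods, and the deliberately tiny 4-byte buffer are assumptions chosen for the example; the in-memory source again stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ManualDecodeDemo {

    // Reads every byte from the stream unchanged, a few bytes at a time.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4]; // small on purpose: a 3-byte character may span two reads
        int n;
        while ((n = in.read(buffer)) != -1) {
            collected.write(buffer, 0, n);
        }
        return collected.toByteArray();
    }

    // Converts raw bytes to text; the caller has to know which charset applies.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "Hi\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(source)) {
            byte[] raw = readAll(in);
            System.out.println(raw.length);                                        // 8
            System.out.println(decode(raw, StandardCharsets.UTF_8).length());      // 4
            System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // 8
        }
    }
}
```

Decoding only after all bytes are collected avoids splitting a multi-byte character between two buffers. The last line shows what goes wrong with the wrong charset: the same 8 bytes turn into 8 unrelated characters, which is the bookkeeping a Reader takes care of.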
Synchronous and Asynchronous
Synchronous IO means that the code must wait for the data to be returned before it can continue executing. The advantage is that the code is simpler to write; the downside is lower CPU efficiency, because the thread sits idle while it waits.
Asynchronous IO means that the code issues a read or write request and then continues executing immediately, without waiting for the result. The advantage is higher CPU efficiency; the disadvantage is that the code is more complex to write.
The Java standard library's java.io package provides synchronous IO, while java.nio provides non-blocking and asynchronous IO. The InputStream, OutputStream, Reader, and Writer discussed above are all abstract classes for synchronous IO, with concrete implementations such as FileInputStream, FileOutputStream, FileReader, and FileWriter for files.
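The difference can be seen in a small sketch. It assumes a temporary file created just for the demonstration, and the class and method names are hypothetical. The synchronous version uses FileInputStream from java.io; the asynchronous version uses AsynchronousFileChannel from java.nio.channels, which is one of several asynchronous APIs and is shown here only for contrast.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class SyncVsAsyncDemo {

    // Synchronous: read() blocks the calling thread until data is available.
    static int readSync(Path file, byte[] dst) throws IOException {
        try (InputStream in = new FileInputStream(file.toFile())) {
            return in.read(dst);
        }
    }

    // Asynchronous: read() returns a Future at once; the result is collected later.
    static int readAsync(Path file, ByteBuffer dst)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            Future<Integer> pending = channel.read(dst, 0); // start reading at file position 0
            // Other work could run here while the read is in flight.
            return pending.get(); // wait for the number of bytes read
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical temporary file, used only for this illustration.
        Path file = Files.createTempFile("io-demo", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "Hello!".getBytes(StandardCharsets.US_ASCII));

        System.out.println(readSync(file, new byte[16]));            // 6
        System.out.println(readAsync(file, ByteBuffer.allocate(16))); // 6
    }
}
```

Even in this small case the asynchronous version needs a channel, a buffer, a Future, and extra exception types, which reflects the trade-off between simplicity and efficiency described above.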
In this section, we will focus solely on Java's synchronous IO, specifically the IO model of input/output streams.
Summary
IO streams are a model for streaming data input/output:
- Binary data flows in one direction through InputStream / OutputStream, with byte as the smallest unit.
- Character data flows in one direction through Reader / Writer, with char as the smallest unit.
The Java standard library's java.io package provides synchronous IO functionality:
- Byte stream abstract classes: InputStream / OutputStream
- Character stream abstract classes: Reader / Writer