IO

IO stands for Input/Output. Both directions are defined relative to memory:

Input refers to reading data from external sources into memory, such as loading files from disk into memory or reading data from the network into memory.

Output refers to sending data from memory to external sources, such as writing data from memory to a file or transmitting data from memory to the network.

Why must data be read into memory before it can be processed? Because code runs in memory, the data it works on must be loaded there too. That data is ultimately represented as byte arrays, strings, and similar types, all of which are stored in memory.

From the perspective of Java code, input essentially involves reading content from an external source, such as a file on the hard drive, into memory and representing it using a specific Java data type, such as byte[] or String, so that subsequent code can process this data.

Because memory is volatile, processed data must eventually be output in some form, such as writing it to a file. Output essentially involves sending data held in a Java data type, such as byte[] or String, to a specified destination.

IO streams are a model for sequentially reading and writing data, characterized by unidirectional flow. The data flows like water through a pipe, which is why we refer to it as an IO stream.

InputStream / OutputStream

IO streams operate with bytes as the smallest unit, thus they are also referred to as byte streams. For example, when we read a file from the disk that contains 6 bytes, it means we are reading in 6 bytes of data:

╔═══════════╗
║  Memory   ║
╚═══════════╝

      │0x48
      │0x65
      │0x6c
      │0x6c
      │0x6f
      │0x21
╔═══════════╗
║ Hard Disk ║
╚═══════════╝

These 6 bytes are read in sequentially, thus forming an input byte stream.

Conversely, when we write these 6 bytes from memory to a disk file, it constitutes an output byte stream:

╔═══════════╗
║  Memory   ║
╚═══════════╝
      │0x21
      │0x6f
      │0x6c
      │0x6c
      │0x65
      │0x48

╔═══════════╗
║ Hard Disk ║
╚═══════════╝

In Java, InputStream represents the input byte stream, while OutputStream represents the output byte stream. These are the two fundamental types of IO streams.
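As a minimal sketch of the two diagrams above, the following copies the same 6 bytes from an InputStream to an OutputStream one byte at a time. The class name ByteStreamDemo and the helper copy are illustrative names, not part of the standard library, and the in-memory ByteArrayInputStream and ByteArrayOutputStream stand in for a disk file so that the example runs anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteStreamDemo {
    // Hypothetical helper: copies bytes from in to out one at a time
    // and returns how many bytes were copied.
    static int copy(InputStream in, OutputStream out) throws IOException {
        int count = 0;
        int b;
        // read() returns the next byte as an int in 0..255, or -1 at end of stream
        while ((b = in.read()) != -1) {
            out.write(b);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The 6 bytes from the diagrams: "Hello!" in ASCII
        byte[] source = {0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21};
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            int n = copy(in, out);
            System.out.println(n + " bytes copied"); // prints "6 bytes copied"
        }
    }
}
```

Because copy only depends on the abstract InputStream and OutputStream types, the same method would work unchanged with file-based streams such as FileInputStream and FileOutputStream.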

Reader / Writer

Text is often not plain single-byte ASCII, so when we need to read and write characters it is more convenient to work in units of char than in raw bytes. A stream of this kind is called a character stream.

Java provides Reader and Writer to represent character streams, where the smallest data unit transmitted is char.

For example, if we write a char[] array containing the characters "Hi你好" using a Writer character stream with UTF-8 encoding, the final content of the file consists of 8 bytes: the English characters 'H' and 'i' each occupy one byte, while the Chinese characters '你' and '好' each occupy 3 bytes:

0x48
0x69
0xe4bda0
0xe5a5bd

Conversely, if we use a Reader to read these 8 bytes encoded in UTF-8, we will get the characters "Hi你好" from the Reader.

Thus, Reader and Writer essentially function as InputStream and OutputStream that can automatically encode and decode data.
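The "Hi你好" round trip described above can be sketched as follows. This is an illustration under stated assumptions: the class CharStreamDemo and its methods encode and decode are hypothetical names, an in-memory byte array replaces the file, and the two Chinese characters are written as Unicode escapes (\u4f60 and \u597d) so the source compiles regardless of file encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CharStreamDemo {
    // Hypothetical helper: writes chars through a Writer that encodes them as UTF-8.
    static byte[] encode(char[] chars) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(chars);
        } // closing the writer flushes any buffered bytes
        return bytes.toByteArray();
    }

    // Hypothetical helper: reads chars through a Reader that decodes UTF-8 bytes.
    static String decode(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            // read() returns the next char as an int, or -1 at end of stream
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Hi" followed by U+4F60 and U+597D
        char[] chars = "Hi\u4f60\u597d".toCharArray();
        byte[] bytes = encode(chars);
        System.out.println(bytes.length);                               // prints 8
        System.out.println(decode(bytes).equals(new String(chars)));   // prints true
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which can differ between machines.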

When using Reader, the data source is still bytes, but the data we read consists of char values because Reader internally decodes the input bytes into char. With InputStream, the data we read is exactly the original binary data, represented as a byte[] array. We can still convert that byte[] array into a string manually by applying a specific encoding.

Ultimately, whether to use Reader or InputStream depends on the context. If the data source is not text, only InputStream is appropriate. If the data source is text, Reader is more convenient. The same applies to Writer and OutputStream.
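The manual conversion mentioned above might look like the following sketch. The class ManualDecodeDemo and the method readUtf8 are hypothetical names, the input is assumed to be UTF-8 text, and InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ManualDecodeDemo {
    // Hypothetical helper: reads the raw bytes first, then decodes them ourselves.
    static String readUtf8(InputStream in) throws IOException {
        byte[] raw = in.readAllBytes();                  // exact binary content (Java 9+)
        return new String(raw, StandardCharsets.UTF_8);  // explicit decoding step
    }

    public static void main(String[] args) throws IOException {
        // The 8 UTF-8 bytes of "Hi" followed by U+4F60 and U+597D
        byte[] data = {0x48, 0x69,
                (byte) 0xe4, (byte) 0xbd, (byte) 0xa0,
                (byte) 0xe5, (byte) 0xa5, (byte) 0xbd};
        try (InputStream in = new ByteArrayInputStream(data)) {
            String text = readUtf8(in);
            System.out.println(text.length()); // prints 4: eight bytes decode to four chars
        }
    }
}
```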

Synchronous and Asynchronous

Synchronous IO means the code must wait for the read or write to complete before it can continue with subsequent code. The advantage of this approach is that the code is simpler to write. The downside is that the thread sits idle while it waits, so the CPU is used less efficiently.

Asynchronous IO, on the other hand, means the code issues a read/write request and continues executing immediately, without waiting for the result. The advantage here is more efficient use of the CPU, while the disadvantage is that the code is more complex to write.

The Java standard library's java.io package provides synchronous IO, while java.nio adds non-blocking and asynchronous IO (for example, the asynchronous channels in java.nio.channels). The InputStream, OutputStream, Reader, and Writer discussed above are all abstract classes for synchronous IO, with concrete implementations such as FileInputStream, FileOutputStream, FileReader, and FileWriter for files.
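To make the contrast concrete, here is a rough sketch that reads the same small file both ways. It is only an illustration: the class SyncAsyncDemo and the methods readSync and readAsync are hypothetical names, the file is a temporary file created for the demo, and AsynchronousFileChannel is just one of several asynchronous options in the standard library.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class SyncAsyncDemo {
    // Synchronous: each read() call blocks the thread until data is available.
    static int readSync(Path file) throws IOException {
        try (InputStream in = new FileInputStream(file.toFile())) {
            byte[] buffer = new byte[16];
            int total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        }
    }

    // Asynchronous: read() returns a Future immediately; we block only in get().
    static int readAsync(Path file)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(16);
            Future<Integer> pending = channel.read(buffer, 0); // request issued
            // Other work could run here while the read is in progress.
            return pending.get(); // wait for the result only when it is needed
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("io-demo", ".bin");
        try {
            Files.write(file, new byte[] {0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21});
            System.out.println("sync read:  " + readSync(file) + " bytes");
            System.out.println("async read: " + readAsync(file) + " bytes");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The asynchronous version needs extra machinery (a channel, a buffer, a Future) even for this trivial read, which illustrates why the synchronous stream model is the simpler starting point.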

In this section, we will focus solely on Java's synchronous IO, specifically the IO model of input/output streams.

Summary

IO streams are a model for streaming data input/output:

  • Binary data flows in a unidirectional manner in InputStream/OutputStream, with bytes as the smallest unit.
  • Character data flows in a unidirectional manner in Reader/Writer, with char as the smallest unit.

The Java standard library's java.io package provides synchronous IO functionality:

  • Byte stream abstract classes: InputStream/OutputStream
  • Character stream abstract classes: Reader/Writer