
Serialization

Serialization refers to converting a Java object into binary content, essentially a byte[] array.

Why serialize a Java object? Because a byte[] can be saved to a file or sent over a network, serialization lets us store a Java object in a file or transmit it to another program.

The reverse operation is deserialization, which converts binary content (a byte[] array) back into a Java object. With deserialization, a byte[] stored in a file, or read from the network, can be turned back into a Java object.

Let’s look at how to serialize a Java object.

To be serializable, a Java object's class must implement a special interface, java.io.Serializable, which is defined as follows:

```java
public interface Serializable {
}
```

The Serializable interface declares no methods; it is an empty interface. Such an empty interface is called a "marker interface": a class implements it only to mark itself, without gaining any methods.
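To see what the marker actually changes, here is a minimal sketch with two hypothetical classes: Person, which implements Serializable, and Point, which does not. ObjectOutputStream checks the marker at run time, so writeObject() accepts the first and throws NotSerializableException for the second. The helper canSerialize is also hypothetical.

```java
import java.io.*;

// Hypothetical class tagged with the marker interface.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "Bob";
}

// Hypothetical class WITHOUT the marker interface.
class Point {
    int x = 1;
}

public class Main {
    // Returns true if writeObject() accepts obj, false if it throws NotSerializableException.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream output = new ObjectOutputStream(new ByteArrayOutputStream())) {
            output.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Person() instanceof Serializable); // true
        System.out.println(canSerialize(new Person()));           // true
        System.out.println(canSerialize(new Point()));            // false
    }
}
```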

To convert a Java object into a byte[] array, we need to use ObjectOutputStream, which writes a Java object to a byte stream:

```java
import java.io.*;
import java.util.Arrays;

public class Main {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
            // Write int:
            output.writeInt(12345);
            // Write String:
            output.writeUTF("Hello");
            // Write Object:
            output.writeObject(Double.valueOf(123.456));
        }
        System.out.println(Arrays.toString(buffer.toByteArray()));
    }
}
```

ObjectOutputStream can write primitive types such as int and boolean, String values (writeUTF() uses a modified UTF-8 encoding), and any object that implements the Serializable interface.

Writing an object also records its type information (such as the class name and field descriptions), so the output is considerably larger than the raw data.
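A quick way to observe this overhead is to serialize the same number once as a primitive and once as an Integer object, then compare the byte counts. The names WriteAction, sizeOf, primitiveSize and objectSize are hypothetical helpers; because exact sizes depend on the stream format, the sketch only prints them.

```java
import java.io.*;

public class Main {
    // Hypothetical callback type: something that writes to an ObjectOutputStream.
    interface WriteAction {
        void write(ObjectOutputStream output) throws IOException;
    }

    // Runs the action against a fresh stream and returns how many bytes were produced.
    static int sizeOf(WriteAction action) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
            action.write(output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size(); // read after close(), so all buffered data has been flushed
    }

    // 12345 as a raw int: the 4 data bytes plus stream framing.
    static int primitiveSize() {
        return sizeOf(output -> output.writeInt(12345));
    }

    // 12345 as an Integer object: also includes class descriptors for Integer and Number.
    static int objectSize() {
        return sizeOf(output -> output.writeObject(Integer.valueOf(12345)));
    }

    public static void main(String[] args) {
        System.out.println("writeInt:    " + primitiveSize() + " bytes");
        System.out.println("writeObject: " + objectSize() + " bytes");
    }
}
```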

Deserialization

Conversely, ObjectInputStream reads Java objects from a byte stream:

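```java
// "..." stands for the InputStream that holds the serialized bytes
try (ObjectInputStream input = new ObjectInputStream(...)) {
    int n = input.readInt();                // read values in the same order they were written
    String s = input.readUTF();
    Double d = (Double) input.readObject(); // readObject() returns Object, so cast it
}
```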

Primitive types and String have dedicated read methods, but readObject() always returns an Object, so you must cast the result to the expected type.
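Putting both halves together, this sketch writes the same three values as the earlier writing example into memory and reads them back. It assumes an in-memory ByteArrayInputStream as the source stream; roundTrip is a hypothetical helper name.

```java
import java.io.*;

public class Main {
    // Serializes an int, a String and a Double, then deserializes them in the same order.
    static String roundTrip() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
                output.writeInt(12345);
                output.writeUTF("Hello");
                output.writeObject(Double.valueOf(123.456));
            }
            // Assumption: the written bytes are read back from memory.
            try (ObjectInputStream input = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                int n = input.readInt();
                String s = input.readUTF();
                Double d = (Double) input.readObject(); // readObject() returns Object, so cast it
                return n + " " + s + " " + d;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // 12345 Hello 123.456
    }
}
```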

The readObject() method may throw the following exceptions:

  • ClassNotFoundException: the class of the serialized object cannot be found.
  • InvalidClassException: the local class does not match the class that was serialized.

ClassNotFoundException typically occurs when a Java program on one computer serializes an object, such as a Person, and sends it over the network to a Java program on another computer that does not have the Person class, so the object cannot be deserialized.
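The missing-class scenario can be simulated inside a single JVM. In this sketch the hypothetical MissingPersonInputStream overrides ObjectInputStream.resolveClass() and refuses to load Person, which roughly mimics a receiver that lacks the class: readObject() then throws ClassNotFoundException, while a java.lang.Double still deserializes normally. The helpers serialize and tryRead are also hypothetical.

```java
import java.io.*;

// Hypothetical class that the "receiving" side pretends not to have.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "Alice";
}

// Simulates a JVM without Person on its classpath by refusing to resolve that class.
class MissingPersonInputStream extends ObjectInputStream {
    MissingPersonInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        if (desc.getName().equals(Person.class.getName())) {
            throw new ClassNotFoundException(desc.getName());
        }
        return super.resolveClass(desc);
    }
}

public class Main {
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
            output.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads one object and describes the outcome instead of propagating ClassNotFoundException.
    static String tryRead(byte[] data) {
        try (ObjectInputStream input = new MissingPersonInputStream(new ByteArrayInputStream(data))) {
            return "ok: " + input.readObject();
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRead(serialize(Double.valueOf(1.5)))); // ok: 1.5
        System.out.println(tryRead(serialize(new Person())));
    }
}
```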

InvalidClassException occurs, for example, when a Person object was serialized while its age field was an int, but by the time it is deserialized the Person class has changed age to a long, so the stored data no longer matches the class.

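To control such incompatibilities caused by changes to a class definition, Java serialization lets a class declare a special static field called serialVersionUID that identifies the serialization "version" of the class; IDEs can usually generate it. If a class does not declare one, the JVM computes a default value from the class structure, which even small edits can change. When you add or modify fields, you can deliberately change serialVersionUID: the value is stored with the serialized data, and readObject() throws InvalidClassException when it differs from the current class's value, so mismatched class versions are rejected automatically: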

```java
import java.io.Serializable;

public class Person implements Serializable {
    // Version identifier stored with every serialized Person
    private static final long serialVersionUID = 2709425275741743919L;
}
```
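To check which version number a class will write, ObjectStreamClass.lookup() returns the class's serialization descriptor, and getSerialVersionUID() reports the value in use. In this sketch Person declares the value explicitly, as in the snippet above, while the hypothetical Pet class relies on the computed default; versionOf is a hypothetical helper.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Declares its serialization version explicitly.
class Person implements Serializable {
    private static final long serialVersionUID = 2709425275741743919L;
    private int age;
}

// Hypothetical class without serialVersionUID: the JVM computes one from the class structure.
class Pet implements Serializable {
    private int age;
}

public class Main {
    // lookup() returns the serialization descriptor; getSerialVersionUID() is the value written to streams.
    static long versionOf(Class<?> cl) {
        return ObjectStreamClass.lookup(cl).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(versionOf(Person.class)); // 2709425275741743919
        System.out.println(versionOf(Pet.class));    // computed value; editing Pet can change it
    }
}
```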

Important Deserialization Characteristics

During deserialization, the JVM creates the Java object without invoking the constructor of the serializable class, so any code in that constructor does not run. Only the no-argument constructor of the first non-serializable superclass (such as Object) is invoked.
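The sketch below makes this visible with hypothetical counters. Person's own constructor runs only for the explicit new, whereas the no-argument constructor of its non-serializable superclass Base also runs during deserialization. Static fields are not serialized, so the counters simply accumulate in the running JVM. Base, Person and copy are illustrative names.

```java
import java.io.*;

// Hypothetical non-serializable superclass: its no-arg constructor DOES run during deserialization.
class Base {
    static int baseCalls = 0;

    Base() {
        baseCalls++;
    }
}

// Hypothetical serializable class: its own constructor is skipped during deserialization.
class Person extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    static int personCalls = 0;
    String name;

    Person(String name) {
        this.name = name;
        personCalls++;
    }
}

public class Main {
    // Serializes and immediately deserializes a Person.
    static Person copy(Person person) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
                output.writeObject(person);
            }
            try (ObjectInputStream input = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Person) input.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person restored = copy(new Person("Alice"));
        System.out.println(restored.name);      // Alice
        System.out.println(Person.personCalls); // 1: only the explicit new Person(...)
        System.out.println(Base.baseCalls);     // 2: once for new, once during deserialization
    }
}
```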

Security

Java's serialization mechanism poses a security risk because it creates instances directly from a byte[] array without going through their constructors, so any checks a constructor would perform are skipped. Worse, a carefully crafted byte[] array can, when deserialized, trigger the execution of code already present on the classpath, which can lead to severe security vulnerabilities.
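One common way to narrow this attack surface, assuming Java 9 or later, is an ObjectInputFilter that restricts which classes a stream may instantiate. It is a mitigation, not a fix. This sketch allows only classes from java.lang (pattern "java.lang.*;!*") and shows that a java.util.ArrayList is rejected; a rejected class surfaces as InvalidClassException. The helpers serialize and isRejected are hypothetical.

```java
import java.io.*;
import java.util.ArrayList;

public class Main {
    // Allow classes in java.lang, reject everything else (the first matching pattern wins).
    static final String PATTERN = "java.lang.*;!*";

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
            output.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns true if the filter rejected the stream, false if the object was read.
    static boolean isRejected(Object obj) {
        try (ObjectInputStream input = new ObjectInputStream(new ByteArrayInputStream(serialize(obj)))) {
            // Must be set before the first readObject() call.
            input.setObjectInputFilter(ObjectInputFilter.Config.createFilter(PATTERN));
            input.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true; // a REJECTED filter status is reported as InvalidClassException
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(Double.valueOf(1.5)));     // false
        System.out.println(isRejected(new ArrayList<String>())); // true
    }
}
```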

In fact, Java's built-in object serialization has both security and compatibility problems. A better approach is a language-neutral data format such as JSON, which represents only basic values (numbers, strings, booleans, null) combined into arrays and objects, and stores no code-related information.

Summary

  • Serializable Java objects must implement the java.io.Serializable interface; empty interfaces like Serializable are called "marker interfaces."
  • During deserialization, the constructors of a serializable class are not called; a serialVersionUID can optionally be declared as its version number.
  • Java's serialization mechanism is only suitable for Java. To exchange data with other languages, a universal serialization method, such as JSON, should be used.