
Characters And Strings

In Java, characters and strings are two different types.

Character Type

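The character type char is a primitive (basic) data type; its name is short for character. A char holds a single Unicode character (strictly speaking, one UTF-16 code unit, so characters outside the Basic Multilingual Plane, such as most emoji, need two char values):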

java
char c1 = 'A';
char c2 = '中';

Because Java always represents characters in memory as Unicode, an English character and a Chinese character are both stored in a single char, and each occupies two bytes. To see the Unicode code of a character, simply assign the char value to an int variable:

java
int n1 = 'A'; // The Unicode encoding for the letter "A" is 65
int n2 = '中'; // The Unicode encoding of the Chinese character "中" is 20013
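To make the char/int relationship concrete, here is a minimal sketch (the helper names codeOf and charOf are illustrative only). Going from char to int needs no cast because int is wider, but going back needs an explicit (char) cast. Character.BYTES, available since Java 8, confirms that a char takes two bytes.

```java
public class Main {
    // Widening char -> int happens implicitly.
    static int codeOf(char c) {
        return c;
    }

    // Narrowing int -> char requires an explicit cast.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));     // 65
        System.out.println(codeOf('中'));    // 20013
        System.out.println(charOf(65));      // A
        System.out.println(charOf(20013));   // 中
        System.out.println(Character.BYTES); // 2
    }
}
```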

You can also write a character directly with the escape sequence \u followed by its Unicode code as four hexadecimal digits:

java
// Note that it is hexadecimal:
char c3 = '\u0041'; // 'A', because hex 0041 = decimal 65
char c4 = '\u4e2d'; // '中', because hex 4e2d = decimal 20013
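As a quick check, the sketch below compares the escaped characters with ordinary literals and uses Integer.toHexString to recover the hex digits used in the escape; hexOf is just an illustrative wrapper, not a JDK method.

```java
public class Main {
    // Returns the lowercase hex code of a char, e.g. 'A' gives "41".
    static String hexOf(char c) {
        return Integer.toHexString(c);
    }

    public static void main(String[] args) {
        char c3 = '\u0041';
        char c4 = '\u4e2d';
        // The escape and the literal produce exactly the same char value.
        System.out.println(c3 == 'A');   // true
        System.out.println(c4 == '中');  // true
        System.out.println(hexOf('A'));  // 41
        System.out.println(hexOf('中')); // 4e2d
    }
}
```

One caveat worth knowing: the compiler translates \u escapes before it reads literals, so writing '\u000a' for a newline in source code does not compile; use '\n' for that instead.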

String Type

Unlike char, the string type String is a reference type. A string is written in double quotes "..." and can contain zero or more characters:

java
String s = ""; // Empty string, containing 0 characters
String s1 = "A"; // contains one character
String s2 = "ABC"; // Contains 3 characters
String s3 = "中文 ABC"; // Contains 6 characters, including a space
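To check these counts yourself, String.length() returns the number of char values in a string, and isEmpty() tells you whether that number is zero. A minimal sketch:

```java
public class Main {
    public static void main(String[] args) {
        String s = "";
        String s2 = "ABC";
        String s3 = "中文 ABC";
        // length() returns the number of char values in the string.
        System.out.println(s.length());   // 0
        System.out.println(s2.length());  // 3
        System.out.println(s3.length());  // 6
        // isEmpty() is true exactly when length() is 0.
        System.out.println(s.isEmpty());  // true
        System.out.println(s3.isEmpty()); // false
    }
}
```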

Because double quotes "..." mark the beginning and end of a string, what if the string itself contains a " character? In "abc"xyz", for example, the compiler cannot tell whether the middle quote is part of the string or marks its end. In that case we use the escape character \ :

java
String s = "abc\"xyz"; // Contains 7 characters: a, b, c, ", x, y, z

Because \ is itself the escape character, \\ represents a single \ character:

java
String s = "abc\\xyz"; // Contains 7 characters: a, b, c, \, x, y, z
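The two escapes above can be checked by printing the strings along with their lengths; a small sketch using length() and charAt():

```java
public class Main {
    public static void main(String[] args) {
        String quoted = "abc\"xyz";
        String slashed = "abc\\xyz";
        System.out.println(quoted);           // abc"xyz
        System.out.println(quoted.length());  // 7
        System.out.println(slashed);          // abc\xyz
        System.out.println(slashed.length()); // 7
        // charAt(3) is the fourth character: the escaped quote or backslash.
        System.out.println(quoted.charAt(3) == '"');   // true
        System.out.println(slashed.charAt(3) == '\\'); // true
    }
}
```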

Common escape sequences include:

  • \" represents the character "
  • \' represents the character '
  • \\ represents the character \
  • \n represents a newline
  • \r represents a carriage return
  • \t represents a tab
  • \u#### represents the Unicode character whose code is the four hex digits ####

For example:

java
String s = "ABC\n\u4e2d\u6587"; // Contains 6 characters: A, B, C, \n, 中, 文
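A short sketch confirming that each escape in a string literal becomes one ordinary character, and showing \' and \t in use:

```java
public class Main {
    public static void main(String[] args) {
        String s = "ABC\n\u4e2d\u6587";
        System.out.println(s.length());            // 6
        System.out.println(s.charAt(3) == '\n');   // true: the newline is a single char
        System.out.println(s.equals("ABC\n中文")); // true: the escapes are ordinary characters
        // Inside a double-quoted string, an escaped ' and a plain ' are the same character.
        System.out.println("It\'s".equals("It's")); // true
        // The tab character lines up the two columns when printed.
        System.out.println("Name\tType");
        System.out.println("s\tString");
    }
}
```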

String Concatenation

The Java compiler treats strings specially: you can use + to concatenate a string with another string or with a value of any other type, which makes string processing much more convenient. For example:

java
public class Main {
    public static void main(String[] args) {
        String s1 = "Hello";
        String s2 = "world";
        String s = s1 + " " + s2 + "!";
        System.out.println(s); // Hello world!
    }
}

When + joins a string with a value of another type, that value is first converted to a string automatically, and then the two are concatenated:

java
public class Main {
    public static void main(String[] args) {
        int age = 25;
        String s = "age is " + age;
        System.out.println(s); // age is 25
    }
}
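One detail worth spelling out: + is evaluated from left to right, so it only switches to concatenation once a String operand has appeared. Before that, numbers (and chars, which are numeric) are added arithmetically. A small illustrative sketch:

```java
public class Main {
    public static void main(String[] args) {
        // 1 + 2 is int addition (3), then 3 + "x" concatenates.
        System.out.println(1 + 2 + "x");   // 3x
        // "x" + 1 gives "x1", then "x1" + 2 gives "x12".
        System.out.println("x" + 1 + 2);   // x12
        // Parentheses force the addition to happen first.
        System.out.println("x" + (1 + 2)); // x3
        // char + int is numeric addition: 'A' is 65.
        System.out.println('A' + 1);       // 66
        // Starting with "" makes every + a concatenation.
        System.out.println("" + 'A' + 1);  // A1
    }
}
```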

Multiline String

Writing a multi-line string by joining pieces with + is quite inconvenient:

java
String s = "first line \n"
         + "second line \n"
         + "end";

Starting from Java 13, a multi-line string can be written as a text block delimited by """...""" (text blocks were a preview feature in Java 13 and 14 and became a standard feature in Java 15). For example:

java
public class Main {
    public static void main(String[] args) {
        String s = """
                   SELECT * FROM
                     users
                   WHERE id > 100
                   ORDER BY name DESC
                   """;
        System.out.println(s);
    }
}

Because the closing """ sits on its own line, the text block above ends with a \n after the final DESC. If you don't want the string to end with \n, put the closing delimiter directly after the last line:

java
String s = """
           SELECT * FROM
             users
           WHERE id > 100
           ORDER BY name DESC""";
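To see the difference between the two forms, the sketch below (assuming Java 15 or later, where text blocks are standard) compares them with endsWith:

```java
public class Main {
    // Closing delimiter on its own line: the string ends with a newline.
    static final String WITH_NEWLINE = """
            SELECT * FROM
              users
            WHERE id > 100
            ORDER BY name DESC
            """;

    // Closing delimiter right after DESC: no trailing newline.
    static final String WITHOUT_NEWLINE = """
            SELECT * FROM
              users
            WHERE id > 100
            ORDER BY name DESC""";

    public static void main(String[] args) {
        System.out.println(WITH_NEWLINE.endsWith("\n"));    // true
        System.out.println(WITHOUT_NEWLINE.endsWith("\n")); // false
        // The two strings differ only by that final line terminator.
        System.out.println(WITH_NEWLINE.equals(WITHOUT_NEWLINE + "\n")); // true
    }
}
```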

Note also that the indentation shared by all lines of a text block is removed, that is:

java
String s = """
...........SELECT * FROM
...........  users
...........WHERE id > 100
...........ORDER BY name DESC
...........""";

The spaces marked with . are removed.

If the lines are indented unevenly, only the indentation they all have in common is removed:

java
String s = """
.........  SELECT * FROM
.........    users
.........WHERE id > 100
.........  ORDER BY name DESC
.........  """;

In other words, the line with the least leading whitespace (the line holding the closing """ counts too) determines how much indentation is stripped from every line.
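A sketch of both cases, assuming Java 15 or later. Each text block is compared with the equivalent ordinary string literal, which makes the removed indentation visible:

```java
public class Main {
    // Every line, including the closing delimiter, has the same indentation,
    // so all of it is removed.
    static final String REGULAR = """
            SELECT * FROM
              users
            WHERE id > 100
            ORDER BY name DESC
            """;

    // WHERE has the least indentation, so only that much is removed from
    // every line; the other lines keep their extra spaces.
    static final String IRREGULAR = """
              SELECT * FROM
                users
            WHERE id > 100
              ORDER BY name DESC
              """;

    public static void main(String[] args) {
        System.out.println(REGULAR.equals(
                "SELECT * FROM\n  users\nWHERE id > 100\nORDER BY name DESC\n"));     // true
        System.out.println(IRREGULAR.equals(
                "  SELECT * FROM\n    users\nWHERE id > 100\n  ORDER BY name DESC\n")); // true
    }
}
```

Only the common prefix is affected: indentation beyond it (such as the extra spaces before users) is preserved.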

Immutable Properties

Besides being a reference type, Java's String has another important property: strings are immutable. Consider the following code:

java
public class Main {
    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s); //  hello
        s = "world";
        System.out.println(s); //  world
    }
}

Look at the output: has the string s changed? In fact, what changes is not the string itself but what the variable s points to.

When String s = "hello"; is executed, the JVM first creates the string "hello" and then points the variable s to it:

      s
      │
      ▼
┌───┬───────────┬───┐
│   │  "hello"  │   │
└───┴───────────┴───┘

Next, when s = "world"; is executed, the JVM creates the string "world" and then points the variable s to it:

      s ──────────────┐
                      │
                      ▼
┌───┬───────────┬───┬───────────┬───┐
│   │  "hello"  │   │  "world"  │   │
└───┴───────────┴───┴───────────┴───┘

The original string "hello" still exists, but it can no longer be reached through the variable s. So the immutability of strings means that the content of a string cannot change; a variable, however, can point to the string "hello" at one moment and to "world" at another.
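Immutability also explains why String methods that seem to modify a string actually return a new one. In this minimal sketch, concat leaves the original string untouched, and the only way to "change" s is to reassign it:

```java
public class Main {
    public static void main(String[] args) {
        String s = "hello";
        // concat() cannot modify s; it returns a new String object.
        String t = s.concat(" world");
        System.out.println(s); // hello
        System.out.println(t); // hello world
        // Reassigning makes s point to a different string; "hello" itself is unchanged.
        s = s.concat("!");
        System.out.println(s); // hello!
    }
}
```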

Now that you understand how reference variables "point to" objects, try to explain the output of the following code:

java
// String is immutable
public class Main {
    public static void main(String[] args) {
        String s = "hello";
        String t = s;
        s = "world";
        System.out.println(t); // Is t "hello" or "world"?
    }
}

Null Value

A reference-type variable can hold the special value null, which means the variable does not point to any object. For example:

java
String s1 = null; // s1 is null
String s2 = s1; // s2 is also null
String s3 = ""; // s3 points to the empty string, not null

Be careful to distinguish null from the empty string "": the empty string is a valid String object that contains zero characters, so it is not equal to null.
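In practice, the difference shows up when you call a method on the variable: a method call on "" works, while the same call on null throws a NullPointerException. A small sketch; isNullOrEmpty is a hypothetical helper written for this example, not a JDK method:

```java
public class Main {
    // Hypothetical helper: true when s is null or has no characters.
    static boolean isNullOrEmpty(String s) {
        // Test for null first; calling isEmpty() on null would throw NullPointerException.
        return s == null || s.isEmpty();
    }

    public static void main(String[] args) {
        String s1 = null;
        String s3 = "";
        System.out.println(s1 == null);    // true
        System.out.println(s3 == null);    // false: "" is a real String object
        System.out.println(s3.isEmpty());  // true
        System.out.println("".equals(s1)); // false, and safe even though s1 is null
        try {
            s1.length(); // there is no object to ask for its length
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
        System.out.println(isNullOrEmpty(s1));  // true
        System.out.println(isNullOrEmpty(s3));  // true
        System.out.println(isNullOrEmpty("A")); // false
    }
}
```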

Exercise

Treat the following int values as the Unicode codes of characters and join them into a string:

java
public class Main {
    public static void main(String[] args) {
        // Please treat the following set of int values as the Unicode codes of characters and put them together into a string:
        int a = 72;
        int b = 105;
        int c = 65281;
        // FIXME:
        String s = a + b + c;
        System.out.println(s);
    }
}
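One possible solution, for checking your work (the helper name fromCodes is ours, not part of the exercise). The original line fails because a + b + c is plain int addition (65458), and an int cannot be assigned to a String. Casting each value to char and starting the expression with "" turns every + into concatenation; 65281 is hex FF01, the full-width exclamation mark ！.

```java
public class Main {
    // Cast each code to char, then concatenate; the leading "" ensures
    // every + is string concatenation rather than numeric addition.
    static String fromCodes(int a, int b, int c) {
        return "" + (char) a + (char) b + (char) c;
    }

    public static void main(String[] args) {
        int a = 72;
        int b = 105;
        int c = 65281;
        String s = fromCodes(a, b, c);
        System.out.println(s); // Hi！
    }
}
```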

Summary

Java's character type char is a basic type, and string type String is a reference type;

Variables of basic types "hold" a certain value, and variables of reference types "point to" an object;

Variables of reference types can be null ;

Distinguish between the null value null and the empty string "".
