Appearance
Non-Greedy Matching
Before introducing non-greedy matching, let’s first look at a simple problem:
Given a string representing a number, determine how many zeros are at the end of that number. For example:
- "123000": 3 zeros
- "10100": 2 zeros
- "1001": 0 zeros
It is straightforward to write the regex for this: (\d+)(0*)
. The Java code would be as follows:
java
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(\\d+)(0*)");
Matcher matcher = pattern.matcher("1230000");
if (matcher.matches()) {
System.out.println("group1=" + matcher.group(1)); // "1230000"
System.out.println("group2=" + matcher.group(2)); // ""
}
}
}
However, the printed second substring is an empty string ""
.
In fact, we expect the grouping match results to be:
input | \d+ | 0* |
---|---|---|
123000 | "123" | "000" |
10100 | "101" | "00" |
1001 | "1001" | "" |
But the actual grouping match results are:
input | \d+ | 0* |
---|---|---|
123000 | "123000" | "" |
10100 | "10100" | "" |
1001 | "1001" | "" |
Upon careful observation of the actual match results, we see that they are entirely reasonable because \d+
indeed matches any number of zeros that follow.
This happens because regular expressions use greedy matching by default: any rule will always try to match as much as possible moving forward, so \d+
will always include the following zeros.
To make \d+
match as little as possible and 0*
match as much as possible, we need to use non-greedy matching for \d+
. We can indicate non-greedy matching by adding a ?
after \d+
. Thus, we rewrite the regex as follows:
java
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(\\d+?)(0*)");
Matcher matcher = pattern.matcher("1230000");
if (matcher.matches()) {
System.out.println("group1=" + matcher.group(1)); // "123"
System.out.println("group2=" + matcher.group(2)); // "0000"
}
}
}
Therefore, by appending ?
to a matching rule, it changes to non-greedy matching.
Now let’s look at the regex (\d??)(9*)
. Note that \d?
indicates matching 0 or 1 digit, and the second ?
indicates non-greedy matching. Thus, given the string "9999", the matched substrings are ""
and "9999"
, because \d?
can match either 1 digit or 0 digits. However, since the following ?
indicates non-greedy matching, it matches 0 digits.
Summary
Regular expression matching uses greedy matching by default, but you can use ?
to indicate non-greedy matching for a particular rule.
It's important to distinguish the meanings of ?
: \d??
.