Skip to content
On this page

Non-Greedy Matching

Before introducing non-greedy matching, let’s first look at a simple problem:

Given a string representing a number, determine how many zeros are at the end of that number. For example:

  • "123000": 3 zeros
  • "10100": 2 zeros
  • "1001": 0 zeros

It is straightforward to write the regex for this: (\d+)(0*). The Java code would be as follows:

java
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+)(0*)");
        Matcher matcher = pattern.matcher("1230000");
        if (matcher.matches()) {
            System.out.println("group1=" + matcher.group(1)); // "1230000"
            System.out.println("group2=" + matcher.group(2)); // ""
        }
    }
}

However, the printed second substring is an empty string "".

In fact, we expect the grouping match results to be:

input\d+0*
123000"123""000"
10100"101""00"
1001"1001"""

But the actual grouping match results are:

input\d+0*
123000"123000"""
10100"10100"""
1001"1001"""

Upon careful observation of the actual match results, we see that they are entirely reasonable because \d+ indeed matches any number of zeros that follow.

This happens because regular expressions use greedy matching by default: any rule will always try to match as much as possible moving forward, so \d+ will always include the following zeros.

To make \d+ match as little as possible and 0* match as much as possible, we need to use non-greedy matching for \d+. We can indicate non-greedy matching by adding a ? after \d+. Thus, we rewrite the regex as follows:

java
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+?)(0*)");
        Matcher matcher = pattern.matcher("1230000");
        if (matcher.matches()) {
            System.out.println("group1=" + matcher.group(1)); // "123"
            System.out.println("group2=" + matcher.group(2)); // "0000"
        }
    }
}

Therefore, by appending ? to a matching rule, it changes to non-greedy matching.

Now let’s look at the regex (\d??)(9*). Note that \d? indicates matching 0 or 1 digit, and the second ? indicates non-greedy matching. Thus, given the string "9999", the matched substrings are "" and "9999", because \d? can match either 1 digit or 0 digits. However, since the following ? indicates non-greedy matching, it matches 0 digits.

Summary

Regular expression matching uses greedy matching by default, but you can use ? to indicate non-greedy matching for a particular rule.

It's important to distinguish the meanings of ?: \d??.

Non-Greedy Matching has loaded