
Group Matching

We previously mentioned that (...) can be used to group a sub-rule. Writing learn\s(java|php|go) allows us to match long strings more conveniently.

In fact, (...) has another important function: group matching.

Matching Area Code and Phone Number

Let’s look at how to use a regular expression to match a phone number written as an area code, a hyphen, and a local number. Using the matching rules discussed earlier, we can easily write this:

\d{3,4}\-\d{7,8}

This regex is simple, but a successful match is rarely the end of the job. The next step is usually to extract the area code and the phone number, for example to store them in a database. This raises the question: how do we extract the matched substrings?

We could use String methods such as indexOf() and substring(), but they do not provide a general way to extract substrings from a regex match. Any such code is tied to one particular format: if we later needed to extract the language name from a match of learn\s(java|php), we would have to write the extraction code all over again.
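To make that limitation concrete, here is a minimal sketch of the hand-written approach. The class name ManualExtract and the helper split() are hypothetical names chosen for illustration; they are not part of any library.

```java
public class ManualExtract {
    // Splits "areaCode-number" by hand at the first '-'.
    // Returns null if the input contains no '-'.
    public static String[] split(String s) {
        int dash = s.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String area = s.substring(0, dash);
        String tel = s.substring(dash + 1);
        return new String[] { area, tel };
    }

    public static void main(String[] args) {
        String[] parts = split("010-12345678");
        System.out.println(parts[0]); // 010
        System.out.println(parts[1]); // 12345678
    }
}
```

This code only knows about the '-' separator. It does not check that either part consists of digits, and it cannot be reused for a differently shaped rule.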

The correct approach is to use (...) to group the parts of the rule we want to extract, changing the regex to (\d{3,4})\-(\d{7,8}).

Now, another question arises: after matching, how do we extract substrings based on the parentheses?

At this point, the simple String.matches() method is no longer enough, because it only returns true or false. We must import the java.util.regex package, compile the regex into a Pattern object, and obtain a Matcher object from it. If the match is successful, we can retrieve the substrings with Matcher.group(index):

java
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");
        Matcher m = p.matcher("010-12345678");
        if (m.matches()) {
            String g1 = m.group(1);
            String g2 = m.group(2);
            System.out.println(g1); // 010
            System.out.println(g2); // 12345678
        } else {
            System.out.println("Match failed!");
        }
    }
}

Running the above code will yield two matched substrings: 010 and 12345678.

It's important to note that the parameter for Matcher.group(index) uses 1 to indicate the first substring and 2 for the second substring. What happens if we pass 0? The answer is 010-12345678, which is the entire string matched by the regex.
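The indexing can be checked with a short sketch that prints every group in turn. It relies on Matcher.groupCount(), which returns the number of capturing groups in the pattern and does not count group 0. The class name GroupIndexDemo and the helper allGroups() are hypothetical.

```java
import java.util.regex.*;

public class GroupIndexDemo {
    // Returns group 0 followed by each capturing group, in order.
    // Returns an empty array if the whole input does not match.
    public static String[] allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group 0, hence the +1 and the <= below.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] groups = allGroups("(\\d{3,4})\\-(\\d{7,8})", "010-12345678");
        for (int i = 0; i < groups.length; i++) {
            System.out.println(i + ": " + groups[i]);
        }
        // 0: 010-12345678
        // 1: 010
        // 2: 12345678
    }
}
```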

Pattern

In earlier sections, we matched regexes with the String.matches() method, while the group extraction code above uses the Pattern and Matcher classes from the java.util.regex package. The two approaches are essentially the same, because String.matches() internally uses the Pattern and Matcher classes.

However, repeatedly calling String.matches() with the same regex is less efficient, because every call compiles the regex into a new, identical Pattern object. Instead, we can create a Pattern object once and reuse it for multiple matches:

java
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");
        pattern.matcher("010-12345678").matches(); // true
        pattern.matcher("021-123456").matches(); // false
        pattern.matcher("022#1234567").matches(); // false
        // Obtain the Matcher object:
        Matcher matcher = pattern.matcher("010-12345678");
        if (matcher.matches()) {
            String whole = matcher.group(0); // "010-12345678", 0 indicates the entire matched string
            String area = matcher.group(1); // "010", 1 indicates the first matched substring
            String tel = matcher.group(2); // "12345678", 2 indicates the second matched substring
            System.out.println(area);
            System.out.println(tel);
        }
    }
}
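In a larger program, one common way to apply this is to keep the compiled Pattern in a static final field so that every call shares it. The following is a sketch under that assumption; the class name PhoneParser and the method areaCode() are hypothetical.

```java
import java.util.regex.*;

public class PhoneParser {
    // Compiled once when the class is initialized, then shared by every call.
    private static final Pattern PHONE = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");

    // Returns the area code, or null if the input does not match the rule.
    public static String areaCode(String s) {
        // A Matcher holds per-input state, so a fresh one is created for each input.
        Matcher m = PHONE.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(areaCode("010-12345678")); // 010
        System.out.println(areaCode("021-123456"));   // null
        System.out.println(areaCode("022#1234567"));  // null
    }
}
```

Pattern instances are immutable and safe to share between threads, whereas Matcher instances are not, which is why only the Pattern is stored in the field.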

When using Matcher, you must first call matches() to determine whether the match succeeded. Only after a successful match can you call group() to extract substrings; otherwise group() throws an IllegalStateException.
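To see what happens when this rule is ignored, the sketch below calls group(1) once before any match attempt and once after a failed match. The class and method names are hypothetical; the behavior shown is that of java.util.regex.Matcher.

```java
import java.util.regex.*;

public class GroupBeforeMatch {
    private static final Pattern PHONE = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");

    // Returns true if group(1) throws when matches() has not been called yet.
    public static boolean throwsWithoutMatch() {
        Matcher m = PHONE.matcher("010-12345678");
        try {
            m.group(1); // no match has been attempted
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Returns true if group(1) throws after matches() has returned false.
    public static boolean throwsAfterFailedMatch() {
        Matcher m = PHONE.matcher("022#1234567");
        boolean matched = m.matches(); // false: '#' is not '-'
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return !matched;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsWithoutMatch());     // true
        System.out.println(throwsAfterFailedMatch()); // true
    }
}
```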

Using the substring extraction feature, we can easily obtain both the area code and the phone number.

Exercise

Use group matching to extract hours, minutes, and seconds from the string "23:01:59".

Summary

The use of (...) for grouping in regular expressions allows for quick substring extraction via the Matcher object:

  • group(0) represents the entire matched string;
  • group(1) represents the substring captured by the first group, group(2) the second, and so on.