Group Matching
We previously mentioned that (...) can be used to group a sub-rule. Writing learn\s(java|php|go) allows us to match long strings more conveniently.
In fact, (...) has another important function: group matching.
Matching Area Code and Phone Number
Let’s look at how to use a regular expression to match the rule for an area code and a phone number. Using the matching rules discussed earlier, we can easily write this:
\d{3,4}\-\d{7,8}
Although this regex is simple, often after a successful match, the next step is to extract the area code and phone number to store them in a database. This raises the question: how do we extract the matched substrings?
While we could use String methods such as indexOf() and substring(), they are not a general way to extract substrings based on a regex match. For instance, to extract the language name matched by learn\s(java|php), we would have to write different code.
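To make the limitation concrete, here is a minimal sketch of hand-written extraction for the area code example. The class name ManualExtract and the helper splitByHyphen are hypothetical names chosen for illustration. Note that the code only knows about the hyphen: it does not check that either part consists of digits, and it cannot be reused for a different rule.

```java
public class ManualExtract {
    // Splits "areaCode-number" at the first hyphen by hand.
    // Returns null if the input contains no hyphen.
    public static String[] splitByHyphen(String s) {
        int i = s.indexOf('-');
        if (i < 0) {
            return null;
        }
        return new String[] { s.substring(0, i), s.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitByHyphen("010-12345678");
        System.out.println(parts[0]); // 010
        System.out.println(parts[1]); // 12345678
    }
}
```

Every new rule would need another method like this one, which is why a regex-based approach is preferable.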
The correct approach is to use (...) to group the rules we want to extract, changing the regex to (\d{3,4})\-(\d{7,8}).
Now, another question arises: after matching, how do we extract substrings based on the parentheses?
At this point, the simple String.matches() method is no longer enough. We must import the java.util.regex package, compile the regex into a Pattern object, and use it to obtain a Matcher object. If the match succeeds, Matcher.group(index) returns the substrings directly:
java
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");
        Matcher m = p.matcher("010-12345678");
        if (m.matches()) {
            String g1 = m.group(1);
            String g2 = m.group(2);
            System.out.println(g1); // 010
            System.out.println(g2); // 12345678
        } else {
            System.out.println("Match failed!");
        }
    }
}
Running the above code prints the two matched substrings: 010 and 12345678.
Note that the index passed to Matcher.group(index) is 1 for the first substring and 2 for the second. What happens if we pass 0? The result is 010-12345678, the entire string matched by the regex.
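The indexing can be checked with a short sketch that lists every group of a match. It relies on Matcher.groupCount(), which returns the number of capturing groups and does not count group 0. The class name GroupIndexDemo and the helper allGroups are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Returns group(0) through group(groupCount()) for a full match,
    // or null if the input does not match the regex.
    public static String[] allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = allGroups("(\\d{3,4})-(\\d{7,8})", "010-12345678");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
        // 0: 010-12345678
        // 1: 010
        // 2: 12345678
    }
}
```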
Pattern
In earlier sections, we used the String.matches() method with a regex, while in the group extraction code we used the Pattern and Matcher classes from the java.util.regex package. In essence, the two approaches are the same, since String.matches() internally calls the methods of the Pattern and Matcher classes.
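The equivalence can be seen in a small sketch that runs the same check both ways. The class name MatchesEquivalenceDemo and the helpers viaString and viaPattern are hypothetical names for this illustration; the calls themselves are the standard String.matches(), Pattern.compile(), and Matcher.matches().

```java
import java.util.regex.Pattern;

public class MatchesEquivalenceDemo {
    // Full-string match using the convenience method on String.
    public static boolean viaString(String regex, String input) {
        return input.matches(regex);
    }

    // The same full-string match spelled out with Pattern and Matcher.
    public static boolean viaPattern(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "(\\d{3,4})-(\\d{7,8})";
        System.out.println(viaString(regex, "010-12345678"));  // true
        System.out.println(viaPattern(regex, "010-12345678")); // true
        System.out.println(viaString(regex, "021-123456"));    // false
        System.out.println(viaPattern(regex, "021-123456"));   // false
    }
}
```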
However, repeatedly calling String.matches() with the same regex is less efficient, because every call compiles a new, identical Pattern object. Instead, we can create a Pattern object once and reuse it for multiple matches:
java
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");
        pattern.matcher("010-12345678").matches(); // true
        pattern.matcher("021-123456").matches(); // false
        pattern.matcher("022#1234567").matches(); // false
        // Obtain the Matcher object:
        Matcher matcher = pattern.matcher("010-12345678");
        if (matcher.matches()) {
            String whole = matcher.group(0); // "010-12345678", 0 indicates the entire matched string
            String area = matcher.group(1); // "010", 1 indicates the first matched substring
            String tel = matcher.group(2); // "12345678", 2 indicates the second matched substring
            System.out.println(area);
            System.out.println(tel);
        }
    }
}
When using Matcher, you must first call matches() to determine whether the match is successful. Only after a successful match can you call group() to extract substrings.
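The following sketch shows what happens when this order is not respected: Matcher.group() throws IllegalStateException if no match has been attempted yet or if the previous match failed. The class name GroupBeforeMatchDemo and the helper tryGroup are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupBeforeMatchDemo {
    // Returns the requested group, or the text "IllegalStateException"
    // if the matcher has no successful match to read from.
    public static String tryGroup(Matcher m, int index) {
        try {
            return m.group(index);
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{3,4})-(\\d{7,8})");

        // Case 1: group() called before any match attempt.
        Matcher notAttempted = p.matcher("010-12345678");
        System.out.println(tryGroup(notAttempted, 1)); // IllegalStateException

        // Case 2: group() called after a failed match.
        Matcher failed = p.matcher("021-123456");
        failed.matches(); // false
        System.out.println(tryGroup(failed, 1)); // IllegalStateException

        // Case 3: group() called after a successful match.
        Matcher ok = p.matcher("010-12345678");
        ok.matches(); // true
        System.out.println(tryGroup(ok, 1)); // 010
    }
}
```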
Using the substring extraction feature, we can easily obtain both the area code and the phone number.
Exercise
Use group matching to extract hours, minutes, and seconds from the string "23:01:59".
Summary
Using (...) for grouping in regular expressions allows for quick substring extraction via the Matcher object: group(0) represents the entire matched string; group(1) represents the first substring, group(2) represents the second substring, and so on.