
Complex Matching Rules

Matching Start and End

We use ^ to match the start and $ to match the end. By default they refer to the start and end of the whole input; when multiline matching is enabled, they also match at the start and end of each line. For example, ^A\d{3}$ can match "A001" and "A380".
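A minimal sketch of both behaviors, using the standard java.util.regex API (String.matches, Pattern.compile with Pattern.MULTILINE, and Matcher.find). The helper countMatches and the sample strings are illustrative choices, not part of the original text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    // Counts how many times the pattern is found in the input using Matcher.find().
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String re = "^A\\d{3}$";
        System.out.println("A001".matches(re)); // true
        System.out.println("A380".matches(re)); // true
        System.out.println("A38".matches(re));  // false: only two digits

        String text = "A001\nA380\nB123";
        // Without MULTILINE, ^ and $ anchor to the whole input, so nothing is found.
        System.out.println(countMatches(Pattern.compile(re), text)); // 0
        // With MULTILINE, ^ and $ also match at line boundaries: "A001" and "A380".
        System.out.println(countMatches(Pattern.compile(re, Pattern.MULTILINE), text)); // 2
    }
}
```

Note that String.matches() already requires the entire string to match, so the anchors matter most when searching inside a longer text with find().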

Matching Specific Ranges

If we specify that a 7-8 digit phone number cannot start with 0, how do we write the matching rule? \d{7,8} won't work because the first \d can match 0.

Using [...], we can match characters within a range. For example, [123456789] can match 1-9. Thus, we can write the aforementioned phone number rule as: [123456789]\d{6,7}.
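A small check of the phone-number rule against the naive \d{7,8}. The names PHONE, NAIVE and isPhone are hypothetical, chosen only for this sketch.

```java
public class Main {
    // The rule from the text: 7 or 8 digits in total, first digit 1-9.
    static final String PHONE = "[123456789]\\d{6,7}";
    // The naive rule, which also accepts a leading 0.
    static final String NAIVE = "\\d{7,8}";

    static boolean isPhone(String s) {
        return s.matches(PHONE);
    }

    public static void main(String[] args) {
        System.out.println(isPhone("1234567"));        // true: 7 digits
        System.out.println(isPhone("12345678"));       // true: 8 digits
        System.out.println(isPhone("01234567"));       // false: starts with 0
        System.out.println(isPhone("123456"));         // false: only 6 digits
        System.out.println("01234567".matches(NAIVE)); // true: the naive rule lets the 0 through
    }
}
```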

Listing every character is tedious, so [...] also accepts a range written with a hyphen: [1-9] is equivalent to [123456789].
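One way to convince yourself that the shorthand is equivalent is to test both classes against every digit. The helper agreeOnDigits is a hypothetical name used only for this sketch.

```java
public class Main {
    // Returns true if both patterns give the same result for every character '0' to '9'.
    static boolean agreeOnDigits(String a, String b) {
        for (char c = '0'; c <= '9'; c++) {
            String s = String.valueOf(c);
            if (s.matches(a) != s.matches(b)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnDigits("[123456789]", "[1-9]")); // true
        System.out.println("0".matches("[1-9]"));                  // false
        // The phone rule written with the shorthand.
        System.out.println("1234567".matches("[1-9]\\d{6,7}"));    // true
    }
}
```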

To match a single hexadecimal digit in either case (as in 1A2b3c), we can write [0-9a-fA-F], which matches any character in the following ranges:

  • 0-9: characters 0-9;
  • a-f: characters a-f;
  • A-F: characters A-F.

If we want to match a 6-digit hexadecimal number, we can continue to use {n} as mentioned earlier: [0-9a-fA-F]{6}.
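A minimal sketch of the 6-digit rule. As an alternative it also shows the standard Pattern.CASE_INSENSITIVE flag, which lets a lowercase-only class accept uppercase letters too; HEX6 and isHex6 are illustrative names.

```java
import java.util.regex.Pattern;

public class Main {
    // Exactly six hex digits; letters may be upper- or lowercase.
    static final String HEX6 = "[0-9a-fA-F]{6}";

    static boolean isHex6(String s) {
        return s.matches(HEX6);
    }

    public static void main(String[] args) {
        System.out.println(isHex6("1A2b3c")); // true
        System.out.println(isHex6("1A2b3"));  // false: only 5 characters
        System.out.println(isHex6("1A2b3g")); // false: g is not a hex digit

        // Alternative: lowercase-only class plus the CASE_INSENSITIVE flag.
        Pattern p = Pattern.compile("[0-9a-f]{6}", Pattern.CASE_INSENSITIVE);
        System.out.println(p.matcher("1A2b3c").matches()); // true
    }
}
```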

The [...] notation also supports exclusion: a ^ right after the opening bracket matches any character that does not belong to the specified ranges. For example, to match three characters, none of which is a digit from 1 to 9, we can write [^1-9]{3}:

  • It can match "ABC" because it does not contain characters 1-9;
  • It can match "A00" because it does not contain characters 1-9;
  • It cannot match "A01" because it includes character 1;
  • It cannot match "A05" because it includes character 5.
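The four cases above can be checked directly. Note that 0 is not in the excluded range 1-9; the last line is an added illustration showing that [^0-9] excludes every digit.

```java
public class Main {
    // Exactly three characters, none of which is a digit 1-9 (0 is still allowed).
    static final String NO_1_TO_9 = "[^1-9]{3}";

    public static void main(String[] args) {
        for (String s : new String[] {"ABC", "A00", "A01", "A05"}) {
            System.out.println(s + " -> " + s.matches(NO_1_TO_9));
        }
        // To exclude every digit, including 0, the range must be 0-9.
        System.out.println("A00".matches("[^0-9]{3}")); // false
    }
}
```

This prints ABC -> true, A00 -> true, A01 -> false, A05 -> false, then false.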

OR Rule Matching

Two regular rules connected by | represent an OR rule. For example, AB|CD means it can match either AB or CD.

Let’s look at the following regular expression java|php:

java
// regex
public class Main {
    public static void main(String[] args) {
        String re = "java|php";
        System.out.println("java".matches(re)); // true
        System.out.println("php".matches(re));  // true
        System.out.println("go".matches(re));   // false
    }
}

It can match "java" or "php", but cannot match "go".

To include "go" in the matches, we can rewrite it as java|php|go.
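The extended rule in the same style as the program above. The constant LANGS and the extra test strings are illustrative.

```java
public class Main {
    static final String LANGS = "java|php|go";

    public static void main(String[] args) {
        System.out.println("java".matches(LANGS));   // true
        System.out.println("php".matches(LANGS));    // true
        System.out.println("go".matches(LANGS));     // true
        System.out.println("rust".matches(LANGS));   // false
        // matches() requires the whole string to equal one alternative.
        System.out.println("golang".matches(LANGS)); // false
    }
}
```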

Using Parentheses

Now, how do we match the strings "learn java", "learn php", and "learn go"? A straightforward rule is learn\sjava|learn\sphp|learn\sgo, but it repeats learn\s three times. We can factor out the common part and use parentheses (...) to group the alternatives, giving learn\s(java|php|go).

java
// regex
public class Main {
    public static void main(String[] args) {
        String re = "learn\\s(java|php|go)";
        System.out.println("learn java".matches(re)); // true
        System.out.println("learn Java".matches(re)); // false: J is uppercase
        System.out.println("learn php".matches(re));  // true
        System.out.println("learn Go".matches(re));   // false: G is uppercase
    }
}
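The parentheses are required because | has the lowest precedence in a regex. A short comparison, with hypothetical constant names:

```java
public class Main {
    // With parentheses: "learn" + whitespace + one of the three words.
    static final String GROUPED = "learn\\s(java|php|go)";
    // Without parentheses this means "learn java" OR "php" OR "go".
    static final String UNGROUPED = "learn\\sjava|php|go";

    public static void main(String[] args) {
        System.out.println("learn php".matches(GROUPED));   // true
        System.out.println("learn php".matches(UNGROUPED)); // false
        System.out.println("php".matches(UNGROUPED));       // true
    }
}
```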

The rule learn\s(java|php|go) still does not match strings like "learn Java" and "learn Go". As an exercise, modify the regular expression so that it also matches words starting with an uppercase letter, such as "learn Java", "learn Php", and "learn Go".
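One possible answer, sketched under the assumption that only the first letter of each word may change case: put a two-character class in front of each alternative. Other solutions (for example, Pattern.CASE_INSENSITIVE) would also accept inputs such as "learn JAVA", which this rule rejects.

```java
public class Main {
    // Either case is allowed for the first letter of each word only.
    static final String RE = "learn\\s([jJ]ava|[pP]hp|[gG]o)";

    public static void main(String[] args) {
        String[] inputs = {"learn java", "learn Java", "learn php", "learn Php", "learn go", "learn Go", "learn JAVA"};
        for (String s : inputs) {
            System.out.println(s + " -> " + s.matches(RE));
        }
    }
}
```

Every input prints true except "learn JAVA".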

Summary

The main complex matching rules are as follows:

Regular Expression   Rule                                   Matches
^                    Start                                  Beginning of the string
$                    End                                    End of the string
[ABC]                Any character listed in [...]          A, B, C
[A-F0-9xy]           Any character in the specified ranges  A, ..., F, 0, ..., 9, x, y
[^A-F]               Any character outside the range        Any character except A-F
AB|CD|EF             AB or CD or EF                         AB, CD, EF