Difference between revisions of "Pattern Matching"
From Suhrid.net Wiki
Revision as of 10:30, 4 July 2011
Intro
- Classes in the java.util.regex package provide regular expression support.
- Basic example
import java.util.regex.*;

public class RegexTest1 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("lazy"); // The pattern to search for
        Matcher m = p.matcher("The quick brown fox jumps over the lazy dog"); // The source against which to match the pattern
        boolean found = false;
        while (m.find()) {
            System.out.println("Match found at " + m.start() + "," + m.end()); // Will print: Match found at 35,39
            found = true;
        }
        if (!found) {
            System.out.println("No match found");
        }
    }
}
- Rule of thumb: regex matching runs from left to right, and once a source character has been consumed by a match, it cannot be reused in a later match.
- In the example below, the pattern "aba" is found starting at positions 0 and 4, but not at 2, because the character at position 2 was already consumed by the match starting at 0.
import java.util.regex.*;

public class RegexTest2 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("aba");
        Matcher m = p.matcher("abababa");
        boolean found = false;
        while (m.find()) {
            System.out.println("Match found; starting at pos : " + m.start());
            found = true;
        }
        if (!found) {
            System.out.println("No match found");
        }
    }
}
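To make the "consumed characters" rule concrete, the following sketch compares a plain find() loop with a loop that restarts the search one character after each match start, using find(int). The class and method names are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapDemo {

    // Start positions reported by a plain find() loop:
    // characters consumed by one match are not reused by the next.
    static List<Integer> nonOverlappingStarts(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Start positions when the search is restarted one character
    // after the start of each match, so matches may overlap.
    static List<Integer> overlappingStarts(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        while (from <= source.length() && m.find(from)) {
            starts.add(m.start());
            from = m.start() + 1;
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(nonOverlappingStarts("aba", "abababa")); // [0, 4]
        System.out.println(overlappingStarts("aba", "abababa"));    // [0, 2, 4]
    }
}
```

The second method is only there to show which occurrence the default behaviour skips; the normal find() loop is what the rule of thumb describes.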
Metacharacters
- Metacharacters are regex tokens that have a special meaning in a search.
- \d - Matches a digit
- \s - Matches a whitespace char
- \w - Matches a word char (letters/digits or _)
import java.util.regex.*;

public class RegexTest3 {
    public static void main(String[] args) {
        // "\\d" in Java source is the regex \d (the backslash must be escaped in a string literal)
        Pattern p = Pattern.compile("\\d");
        Matcher m = p.matcher("The 15th of August");
        boolean found = false;
        while (m.find()) {
            System.out.println("Match found; starting at pos : " + m.start());
            found = true;
        }
        // Prints:
        // Match found; starting at pos : 4
        // Match found; starting at pos : 5
        if (!found) {
            System.out.println("No match found");
        }
    }
}
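The example above only exercises \d. As a rough companion sketch, the snippet below runs all three metacharacters against one short string; the class name, helper method and sample string are illustrative choices, not part of the original notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharDemo {

    // Collects the start index of every match of regex in source.
    static List<Integer> matchStarts(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String source = "a_1 b2";
        System.out.println(matchStarts("\\d", source)); // [2, 5]          digits
        System.out.println(matchStarts("\\s", source)); // [3]             whitespace
        System.out.println(matchStarts("\\w", source)); // [0, 1, 2, 4, 5] letters, digits and _
    }
}
```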
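- Character sets: use [] to specify a set of characters to search for
- [abc] - Matches a single a, b or c
- [a-f] - Matches a single character in the range a to f
- [a-fA-F] - Matches a to f in either lowercase or uppercase
- Dot - The "." metacharacter matches any character (by default, any character except a line terminator)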
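A small sketch of the character sets and the dot, using Pattern.matches, which requires the whole input to match. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class CharClassDemo {

    // True when the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b"));    // true
        System.out.println(fullMatch("[abc]", "d"));    // false
        System.out.println(fullMatch("[a-f]", "e"));    // true
        System.out.println(fullMatch("[a-f]", "E"));    // false, the range is case sensitive
        System.out.println(fullMatch("[a-fA-F]", "E")); // true
        System.out.println(fullMatch(".", "#"));        // true
        System.out.println(fullMatch(".", "\n"));       // false, dot skips line terminators by default
    }
}
```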
Quantifiers
- Used to specify the number of occurrences of a search pattern
- * - Zero or more occurrences
- ? - Zero or one occurrence.
- + - One or more occurrences.
- The above three are greedy quantifiers.
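- Example: the pattern abc(\d)* will find a match in -
- abc0
- abc13423
- abc - since * means 0 or more
- abcdef - for the same reason: the match is "abc" followed by zero digits
- abcs - also matches "abc" with zero digits; it would fail only if at least one digit were required, as in abc(\d)+
- It won't match -
- ab211 (doesn't contain abc)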
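The list above can be checked with a short sketch that reports the first substring find() matches for each sample. The class and method names are made up for illustration. Note that "abcs" is included to show that zero digits still counts as a match for (\d)* when using find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {

    // Returns the first substring of source matched by regex, or null if there is none.
    static String firstMatch(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = "abc(\\d)*";
        String[] sources = { "abc0", "abc13423", "abc", "abcdef", "abcs", "ab211" };
        for (String s : sources) {
            System.out.println(s + " -> " + firstMatch(regex, s));
        }
        // Prints:
        // abc0 -> abc0
        // abc13423 -> abc13423
        // abc -> abc
        // abcdef -> abc
        // abcs -> abc
        // ab211 -> null
    }
}
```

Replacing * with + in the pattern would require at least one digit, so "abc", "abcdef" and "abcs" would then yield no match.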
Greedy Quantifiers
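- A greedy quantifier first consumes as much of the source data as it can, then backs off one character at a time until the rest of the pattern matches.
- Appending ? to a quantifier (as in *?) makes it reluctant: it consumes as little as possible and expands only when needed.
See the example below: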
import java.util.regex.*;

public class Greedy {
    public static void main(String[] args) {
        String greedyPattern = ".*xx";
        String reluctantPattern = ".*?xx";
        String source = "yyxxxxyxx";

        Pattern gp = Pattern.compile(greedyPattern);
        Matcher gm = gp.matcher(source);
        boolean found = false;
        // Greedy: one match covering the whole source (starts at 0, "yyxxxxyxx")
        while (gm.find()) {
            System.out.println("Greedy Match found ! Starts at : " + gm.start()
                    + ", Matched portion : " + gm.group());
            found = true;
        }
        if (!found) {
            System.out.println("No match found");
        }

        found = false;
        Pattern rp = Pattern.compile(reluctantPattern);
        Matcher rm = rp.matcher(source);
        // Reluctant: three matches (0 "yyxx", 4 "xx", 6 "yxx")
        while (rm.find()) {
            System.out.println("Reluctant Match found ! Starts at : " + rm.start()
                    + ", Matched portion : " + rm.group());
            found = true;
        }
        if (!found) {
            System.out.println("No match found");
        }
    }
}
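The same contrast shows up with + as well as *. The sketch below is an extra illustration, not part of the original notes: the class name, the helper method and the tag-like sample string are assumptions chosen to keep the output short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Collects every substring of source matched by regex, in order.
    static List<String> allMatches(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String source = "<a><b>";
        // Greedy .+ runs to the end of the source and backs off to the last '>'.
        System.out.println(allMatches("<.+>", source));  // [<a><b>]
        // Reluctant .+? stops at the first '>' it can reach.
        System.out.println(allMatches("<.+?>", source)); // [<a>, <b>]
    }
}
```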