Difference between revisions of "Pattern Matching"

From Suhrid.net Wiki

Revision as of 10:30, 4 July 2011

Intro

  • Classes in the java.util.regex package provide regular expression support.
  • Basic example:
import java.util.regex.*;

public class RegexTest1 {

	public static void main(String[] args) {
		
		Pattern p = Pattern.compile("lazy"); //The pattern to search for
		Matcher m = p.matcher("The quick brown fox jumps over the lazy dog"); //The source against which to match the pattern
		boolean found = false;
		while(m.find()) {
			System.out.println("Match found at " + m.start() + "," + m.end()); //Will print : Match found at 35,39
			found = true;
		}
		
		if(!found) {
			System.out.println("No match found");
		}
	}

}
  • Rule of thumb: regex matching runs from left to right, and once a source character has been consumed by a match it cannot be reused by a later match.
  • In the example below, the pattern "aba" is matched starting at positions 0 and 4, but not at 2, because the characters at positions 2 and 3 have already been consumed by the match that starts at 0.
import java.util.regex.*;

public class RegexTest2 {

	public static void main(String[] args) {
		
		Pattern p = Pattern.compile("aba");
		Matcher m = p.matcher("abababa");
		boolean found = false;
		while(m.find()) {
			System.out.println("Match found; starting at pos : " + m.start());
			found = true;
		}
		
		if(!found) {
			System.out.println("No match found");
		}
	}

}
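
The no-reuse rule is easier to see when the end of each match is printed along with its start. The following is a minimal sketch; the class ConsumedDemo and the helper spans are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedDemo {

	// Hypothetical helper: returns every match as "start-end" (end is exclusive).
	static List<String> spans(String regex, String source) {
		List<String> result = new ArrayList<String>();
		Matcher m = Pattern.compile(regex).matcher(source);
		while (m.find()) {
			result.add(m.start() + "-" + m.end());
		}
		return result;
	}

	public static void main(String[] args) {
		// The second search resumes at index 3, so the "aba" at index 2 is never reported.
		System.out.println(spans("aba", "abababa")); // prints [0-3, 4-7]
	}
}
```

Each call to find() resumes at the first character not consumed by the previous match, which is why the overlapping occurrence is skipped.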

Metacharacters

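  • Metacharacters are regex keywords that have a special search meaning.
  • \d - Matches a digit
  • \s - Matches a whitespace char
  • \w - Matches a word char (a letter, a digit or _)
  • In a Java string literal the backslash itself must be escaped, so \d is written as "\\d".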
import java.util.regex.*;

public class RegexTest3 {

	public static void main(String[] args) {
		
		Pattern p = Pattern.compile("\\d");
		Matcher m = p.matcher("The 15th of August");
		boolean found = false;
		while(m.find()) {
			System.out.println("Match found; starting at pos : " + m.start());
			found = true;
		}

                // Match found; starting at pos : 4
                // Match found; starting at pos : 5
		
		if(!found) {
			System.out.println("No match found");
		}
	}

}
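  • A set of characters to search for is written using []
    • [abc] - Matches a single a, b or c
    • [a-f] - Matches a single character in the range a to f
    • [a-fA-F] - The same range, in lower case or upper case
  • Dot - the "." metacharacter matches any single character (by default, any character except a line terminator)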
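
The character sets and the dot described above can be tried out with a short program. This is only a sketch; the class CharClassDemo, the helper findAll and the sample strings are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {

	// Hypothetical helper: collects the text of every match found in source.
	static List<String> findAll(String regex, String source) {
		List<String> result = new ArrayList<String>();
		Matcher m = Pattern.compile(regex).matcher(source);
		while (m.find()) {
			result.add(m.group());
		}
		return result;
	}

	public static void main(String[] args) {
		// [abc] matches one character at a time: an a, a b or a c.
		System.out.println(findAll("[abc]", "cab fare"));  // prints [c, a, b, a]
		// [a-fA-F] accepts both cases; '0' and 'x' are outside the ranges.
		System.out.println(findAll("[a-fA-F]", "0xCafe")); // prints [C, a, f, e]
		// The dot stands for exactly one character, so "ac" is not matched.
		System.out.println(findAll("a.c", "abc a-c ac"));  // prints [abc, a-c]
	}
}
```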

Quantifiers

  • Quantifiers specify the number of occurrences of a search pattern.
  • * - Zero or more occurrences.
  • ? - Zero or one occurrence.
  • + - One or more occurrences.
  • The above three are greedy quantifiers.
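  • Example: when searched for with find(), the pattern abc(\d)* is found in -
    • abc0
    • abc13423
    • abc - since * means 0 or more
    • abcdef - for the same reason: the leading abc followed by zero digits is a match
    • abcs - again, only the leading abc is matched
  • It is not found in -
    • ab211 (doesn't contain abc)
  • With matches(), which requires the entire input to fit the pattern, abcdef and abcs are rejected because of the trailing non-digit characters.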
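
Whether an input counts as a match for abc(\d)* depends on how the pattern is applied: find() looks for the pattern anywhere in the input, while matches() requires the whole input to fit. A minimal sketch that prints both results for the inputs listed above; the class QuantifierDemo and its two helper methods are hypothetical names used only here.

```java
import java.util.regex.Pattern;

class QuantifierDemo {

	// True if the pattern occurs anywhere in the input.
	static boolean foundAnywhere(String regex, String source) {
		return Pattern.compile(regex).matcher(source).find();
	}

	// True only if the entire input fits the pattern.
	static boolean matchesWhole(String regex, String source) {
		return Pattern.matches(regex, source);
	}

	public static void main(String[] args) {
		String regex = "abc(\\d)*";
		String[] inputs = { "abc0", "abc13423", "abc", "abcdef", "ab211", "abcs" };
		for (String input : inputs) {
			System.out.println(input + " : find=" + foundAnywhere(regex, input)
					+ ", matches=" + matchesWhole(regex, input));
		}
	}
}
```

Only ab211 gives false for find(); abcdef and abcs give true for find() but false for matches(), since the characters after abc are not digits.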

Greedy Quantifiers

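  • A greedy quantifier first consumes as much of the source data as it can and then backs off one character at a time until the rest of the pattern matches, so it yields the longest possible match.
  • Appending ? to a quantifier (for example .*?) makes it reluctant: it consumes as little as possible and yields the shortest possible match.

See example below: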

import java.util.regex.*;

public class Greedy {

	public static void main(String[] args) {
		String greedyPattern = ".*xx";
		String reluctantPattern = ".*?xx";
		String source = "yyxxxxyxx";
		
		Pattern gp = Pattern.compile(greedyPattern);
		Matcher gm = gp.matcher(source);
		boolean found = false;
		while (gm.find()) {
			System.out.println("Greedy Match found ! Starts at : " + gm.start()
					+ ", Matched portion : " + gm.group());
			found = true;
		}
		// Greedy Match found ! Starts at : 0, Matched portion : yyxxxxyxx

		if (!found) {
			System.out.println("No match found");
		}
		
		
		found = false;
		Pattern rp = Pattern.compile(reluctantPattern);
		Matcher rm = rp.matcher(source);
		
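		while (rm.find()) {
			System.out.println("Reluctant Match found ! Starts at : " + rm.start()
					+ ", Matched portion : " + rm.group());
			found = true;
		}
		// Reluctant Match found ! Starts at : 0, Matched portion : yyxx
		// Reluctant Match found ! Starts at : 4, Matched portion : xx
		// Reluctant Match found ! Starts at : 6, Matched portion : yxx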

		if (!found) {
			System.out.println("No match found");
		}

		
	}

}
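
The same contrast shows up whenever a pattern is delimited by characters that repeat in the source. The sketch below uses a made-up string of angle-bracket tags; the class GreedyTagsDemo, the helper findAll and the sample input are assumptions for illustration, not part of the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyTagsDemo {

	// Hypothetical helper: collects the text of every match found in source.
	static List<String> findAll(String regex, String source) {
		List<String> result = new ArrayList<String>();
		Matcher m = Pattern.compile(regex).matcher(source);
		while (m.find()) {
			result.add(m.group());
		}
		return result;
	}

	public static void main(String[] args) {
		String source = "<b>one</b><b>two</b>";
		// Greedy: .* runs to the end of the input and backs off to the last '>'.
		System.out.println(findAll("<.*>", source));  // prints [<b>one</b><b>two</b>]
		// Reluctant: .*? stops at the first '>' it can reach.
		System.out.println(findAll("<.*?>", source)); // prints [<b>, </b>, <b>, </b>]
	}
}
```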