Tuesday, June 19, 2012

Regular Expression

A good tool for learning regular expressions

    Self Summary
  1. Intro
  2. Word Boundary - \bA\w*\b matches a word that starts with 'A'.
  3. \b denotes a word boundary (the start or end of a word). \b has zero length.
  4. Input Boundary - ^\w*$ matches a word only if it is the entire input. ^ and $ match the beginning and end of the input. Both have zero length.
  5. Repeat - \d{3} matches a digit repeated 3 times, \d{3,5} 3 to 5 times, \d{3,} at least 3 times.
  6. Whitespace - \s matches any whitespace character.
  7. Escape - the escape character \ makes a special character literal: for ^  .  \  (  ) use \^  \.  \\  \(  \) instead.
  8. One of - [abcde] matches any one character in the brackets.
  9. None of - [^abcde] matches any one character that is not in the brackets.
  10. Greedy and lazy
  11. a.*b The longest string starting with a and ending with b
    a.*?b The shortest string starting with a and ending with b
  12. [.]* matches only a run of literal dots, e.g. ….
  13. (.)* matches a run of any characters, e.g. 12345
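The summary items above can be tried out directly. The snippet below is a minimal sketch using Python's `re` module as a stand-in (the notes themselves use .NET's `Regex` class); the sample strings are made up for illustration.

```python
import re

# \bA\w*\b : whole words that start with 'A'
words_a = re.findall(r"\bA\w*\b", "An Apple a day, Amy")

# ^\w*$ : succeeds only when the entire input is a single word
single_word = re.search(r"^\w*$", "hello") is not None
two_words = re.search(r"^\w*$", "hello world") is not None

# \d{3,5} : runs of 3 to 5 digits
digit_runs = re.findall(r"\d{3,5}", "12 123 1234567")

# [abcde] matches characters in the set, [^abcde] characters outside it
in_set = re.findall(r"[abcde]", "badge")
not_in_set = re.findall(r"[^abcde]", "badge")

# [.]* accepts only literal dots, while (.)* accepts any characters
dots_only = re.fullmatch(r"[.]*", "....") is not None
dots_reject = re.fullmatch(r"[.]*", "12345") is not None
any_chars = re.fullmatch(r"(.)*", "12345") is not None

print(words_a, digit_runs, in_set, not_in_set)
```

Note that `\d{3,5}` takes the first five digits of 1234567 and then cannot match the remaining two.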

    Positive Matches
    \w    Match any alphanumeric character
    .     Match any character except newline
    \s    Match any whitespace character
    \d    Match any digit
    \b    Match the beginning or end of a word
    ^     Match the beginning of the string
    $     Match the end of the string

    Negative Matches
    \W        Match any character that is NOT alphanumeric
    \S        Match any character that is NOT whitespace
    \D        Match any character that is NOT a digit
    \B        Match a position that is NOT the beginning or end of a word
    [^x]      Match any character that is NOT x
    [^aeiou]  Match any character that is NOT one of the characters aeiou

    *       Repeat any number of times
    +       Repeat one or more times
    ?       Repeat zero or one time
    {n}     Repeat n times
    {n,m}   Repeat at least n, but no more than m times
    {n,}    Repeat at least n times

    Greedy and lazy
    *?       Repeat any number of times, but as few as possible
    +?       Repeat one or more times, but as few as possible
    ??       Repeat zero or one time, but as few as possible
    {n,m}?   Repeat at least n, but no more than m times, but as few as possible
    {n,}?    Repeat at least n times, but as few as possible
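To see the difference between greedy and lazy repetition, here is a small sketch in Python's `re` module (used here as a stand-in for .NET; the input strings are invented for the example).

```python
import re

text = "a1b2b"
# Greedy: .* grabs as much as it can, so the match runs to the last b
greedy = re.search(r"a.*b", text).group()
# Lazy: .*? grabs as little as it can, so the match stops at the first b
lazy = re.search(r"a.*?b", text).group()

digits = "12345"
# {2,4} takes up to four digits, {2,4}? settles for the minimum of two
at_most_four = re.search(r"\d{2,4}", digits).group()
as_few = re.search(r"\d{2,4}?", digits).group()
# +? needs at least one digit and stops there
one_lazy = re.search(r"\d+?", digits).group()

print(greedy, lazy, at_most_four, as_few, one_lazy)
```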

    (exp)          Match exp and capture it in an automatically numbered group
    (?<name>exp)   Match exp and capture it in a group named name
    (?:exp)        Match exp, but do not capture it
    (?=exp)        Match any position before a suffix exp (ing below), not including exp
    For example, \b\w+(?=ing\b) matches the stem of any word ending with ing:
    working abcing -> work and abc are matched (the ing suffix is not part of the match)

    (?<=exp)       Match any position after a prefix exp (re below), not including exp
    For example, (?<=\bre)\w+\b matches the rest of any word that starts with re:
    reduction -> duction is matched (the re prefix is not part of the match)
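The two lookaround patterns above can be checked with a short sketch. It uses Python's `re` module as a stand-in for .NET; the extra sample words (prepare, rewrite) are additions for illustration.

```python
import re

# Lookahead: the word must end in "ing", but "ing" is not part of the match
stems = re.findall(r"\b\w+(?=ing\b)", "working abcing")

# Lookbehind: the word must start with "re", but "re" is not part of the match
# "prepare" contains "re" twice, yet never right after a word boundary
tails = re.findall(r"(?<=\bre)\w+\b", "reduction prepare rewrite")

print(stems, tails)
```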

    (?!exp)        Match any position after which the suffix exp (123 below) is not found

    work123->work1234 (no match)

    work123 ->work1234

    The reason is that the match can end after work1: at the position
    work1|23 the text that follows is 23, not 123.

    (?<!exp)       Match any position before which the prefix exp (123 below) is not found

    123work->123work (no match)

    123work->123work (no match)

    The reason is that the match can start at 3work: at the position
    12|3work the text that precedes is 12, not 123.
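The backtracking effect described above is easy to reproduce. The patterns in this sketch are hypothetical ones chosen to show the behaviour (the exact patterns used in the examples above are not given), and Python's `re` module stands in for .NET.

```python
import re

# Negative lookahead: "work" must not be followed by 123
strict_ahead = re.search(r"work(?!123)", "work123")
# With an optional digit run, the engine can stop after "work1",
# where the text that follows is 23 rather than 123
loose_ahead = re.search(r"work\d*?(?!123)", "work123").group()

# Negative lookbehind: "work" must not be preceded by 123
strict_behind = re.search(r"(?<!123)work", "123work")
# With an optional leading digit, the match can start at "3work",
# where the text that precedes is 12 rather than 123
loose_behind = re.search(r"(?<!123)\d?work", "123work").group()

print(strict_ahead, loose_ahead, strict_behind, loose_behind)
```

The takeaway is that a negative lookaround only constrains the exact position where it is tested, so any flexible part of the pattern next to it may shift that position.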


    Special Characters (each shown with its escaped form)

    the opening square bracket [, \[
    the backslash \, \\
    the caret ^, \^
    the dollar sign $, \$
    the period or dot ., \.
    the vertical bar or pipe symbol |, \|
    the question mark ?, \?
    the asterisk or star *, \*
    the plus sign +, \+
    the opening round bracket (, \(
    the closing round bracket ), \)
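As a quick check of why escaping matters, the sketch below uses Python's `re` module (a stand-in for .NET, where `Regex.Escape` plays a similar role). `re.escape` adds the backslashes automatically; the sample strings are invented.

```python
import re

# Unescaped, the dot matches any character, so "3x50" is accepted
dot_is_wildcard = re.search(r"3.50", "3x50") is not None
# Escaped, the dot matches only a literal dot
dot_is_literal = re.search(r"3\.50", "3x50") is not None

# re.escape builds a pattern that matches the given text literally
literal = "cost (USD) is $3.50?"
matches_itself = re.fullmatch(re.escape(literal), literal) is not None

# Every special character from the list above matches itself once escaped
specials = "[\\^$.|?*+()"
all_escaped_ok = all(re.fullmatch(re.escape(ch), ch) for ch in specials)

print(dot_is_wildcard, dot_is_literal, matches_itself, all_escaped_ok)
```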

  14. About Replacement
  15. (?<name>pattern) denotes a group with the name "name"
    (pattern) denotes a group with no name (referred to by number)

    Named example
    strInput = Regex.Replace(strInput,"(?<first>abc)","def${first}")
    Every "abc" becomes "defabc"

    Unnamed example
    strInput = Regex.Replace(strInput,"(abc)","def$1")
    Every "abc" becomes "defabc"

    Summary of replacement symbols
    $&         matched text
    $_         original source string
    $`         text before match
    $'         text after match
    ${name}    text matched by named group
    $1, $2     text matched by numbered group
    $$         the literal "$"

    Another example

    sResult = Regex.Replace("The price is 31.95","\d+\.\d{2}","$$$&")
    This puts a $ in front of monetary values, giving "The price is $31.95"
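For comparison, the same three replacements can be sketched in Python's `re` module. Python spells things differently from .NET, so treat this as a rough equivalent rather than a translation: named groups are written `(?P<name>...)`, and the replacement string uses `\g<name>`, `\1` and `\g<0>` in place of `${name}`, `$1` and `$&`, while a plain `$` needs no escaping.

```python
import re

# Named group: insert "def" in front of every "abc"
named = re.sub(r"(?P<first>abc)", r"def\g<first>", "abc abc")

# Numbered group: same result using group 1
numbered = re.sub(r"(abc)", r"def\1", "xabcx")

# Whole match: \g<0> plays the role of $& in .NET
priced = re.sub(r"\d+\.\d{2}", r"$\g<0>", "The price is 31.95")

print(named, numbered, priced)
```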

  16. About Lookaround
  17. The advantage of lookaround is that the text it checks (the prefix or suffix) is not included in the match.

  18.  Misc
  19. Match {…} where … does not contain the word hede
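One possible way to write such a pattern is to place a negative lookahead in front of every character inside the braces. The pattern below is an assumption made for illustration, not necessarily the one these notes had in mind, and it is shown with Python's `re` module and invented sample text.

```python
import re

# \{ ... \}      literal braces
# (?!hede)       at each step, refuse to continue if "hede" starts here
# [^{}]          otherwise consume one character that is not a brace
no_hede = re.compile(r"\{(?:(?!hede)[^{}])*\}")

found = no_hede.findall("{abc} {a hede b} {xyz}")

print(found)
```

Using `[^{}]` instead of `.` keeps a single match from running across several brace pairs.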
