Tuesday, June 19, 2012

Regular Expression

A good tool for learning regular expressions



    Self Summary
  1. Intro
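  2. Word Boundary - \bA\w*\b matches a word that starts with 'A'.
  3. \b denotes a word boundary (the start or end of a word). \b has zero length.
  4. Input Boundary - ^\w*$ matches only if the whole input is a single word (or empty, since * allows zero characters). ^ and $ match the beginning and end of the input. Both have zero length.
  5. Repeat - \d{3} matches a digit repeated 3 times, \d{3,5} 3 to 5 times, \d{3,} at least 3 times.
  6. Whitespace - \s matches any whitespace character.
  7. Escape - to match the characters ^ . \ ( ) literally, escape them with \ and use \^ \. \\ \( \) instead.
  8. One of it - [abcde] matches any one character in the brackets.
  9. None of it - [^abcde] matches any one character that is not in the brackets.
  10. Greedy and lazy
  11. a.*b matches the longest string starting with a and ending with b.
    a.*?b matches as few characters as possible: from the first a up to the first b after it.
  12. [.]* matches a run of literal dots, such as "....", because inside square brackets . is an ordinary character.
  13. But (.)* matches any characters, such as "12345", because outside square brackets . matches any character except newline.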
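A small C# check of items 2 to 13, assuming .NET's System.Text.RegularExpressions (the library behind the Regex.Replace calls later in this post) and C# 9 top-level statements; the input strings are invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Item 2: \bA\w*\b matches whole words that start with a capital A (matching is case-sensitive).
MatchCollection aWords = Regex.Matches("Alice and Bob ate Apples", @"\bA\w*\b");
foreach (Match m in aWords)
    Console.WriteLine(m.Value); // prints Alice, then Apples

// Item 4: ^\w*$ succeeds only when the whole input is one word.
bool oneWord = Regex.IsMatch("hello", @"^\w*$");        // true
bool twoWords = Regex.IsMatch("hello world", @"^\w*$"); // false: the space is not \w

// Item 5: \d{3,5} is greedy, so it takes 5 digits when they are available.
string digits = Regex.Match("1234567", @"\d{3,5}").Value; // "12345"

// Item 7: an escaped \. only matches a real dot, an unescaped . matches anything.
bool escaped = Regex.IsMatch("31x95", @"31\.95"); // false
bool unescaped = Regex.IsMatch("31x95", "31.95"); // true

// Item 12: inside [ ] the dot is literal, so [.]* only consumes dots.
string dots = Regex.Match("....abc", @"[.]*").Value; // "...."

// Item 13: outside [ ] the dot matches any character; group 1 keeps only the last repetition.
Match anyChars = Regex.Match("12345", @"(.)*");
Console.WriteLine(anyChars.Value);           // 12345
Console.WriteLine(anyChars.Groups[1].Value); // 5
```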




    Positive Matches
    \w     Match any word character (letter, digit, or underscore)
    .      Match any character except newline
    \s     Match any whitespace character
    \d     Match any digit
    \b     Match the beginning or end of a word
    \n     Match a newline
    ^      Match the beginning of the string
    $      Match the end of the string
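A sketch of the positive metacharacters in C#, assuming .NET's System.Text.RegularExpressions; the sample text is made up. It also shows the two options (Singleline, Multiline) that change how . and ^ treat a newline:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "id_7 = 42\nok";

// \w includes the underscore, so the first word is "id_7".
string word = Regex.Match(text, @"\w+").Value;    // "id_7"
// \d\d finds the first two adjacent digits.
string number = Regex.Match(text, @"\d\d").Value; // "42"
// \s matches the two spaces and the newline.
int spaces = Regex.Matches(text, @"\s").Count;    // 3
// . does not match \n unless RegexOptions.Singleline is set.
bool dotNoNewline = Regex.IsMatch(text, @"42.ok");                            // false
bool dotSingleline = Regex.IsMatch(text, @"42.ok", RegexOptions.Singleline); // true
// ^ is the start of the whole string unless RegexOptions.Multiline is set.
bool okAtStart = Regex.IsMatch(text, @"^ok");                              // false
bool okAtLineStart = Regex.IsMatch(text, @"^ok", RegexOptions.Multiline); // true

Console.WriteLine($"{word} {number} {spaces} {dotNoNewline} {dotSingleline} {okAtStart} {okAtLineStart}");
```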

    Negative Matches
    \W         Match any character that is NOT a word character
    \S         Match any character that is NOT whitespace
    \D         Match any character that is NOT a digit
    \B         Match a position that is NOT the beginning or end of a word
    [^x]       Match any character that is NOT x
    [^aeiou]   Match any character that is NOT one of the characters aeiou
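The negated forms can be tried the same way; a minimal C# sketch assuming .NET's System.Text.RegularExpressions, with invented inputs:

```csharp
using System;
using System.Text.RegularExpressions;

// \W: every character that is not a letter, digit or underscore becomes "_".
string underscored = Regex.Replace("a-b c!", @"\W", "_"); // "a_b_c_"
// \D: removing every non-digit leaves only the digits.
string onlyDigits = Regex.Replace("a1 b2", @"\D", "");    // "12"
// \S+: runs of non-whitespace characters.
int tokens = Regex.Matches("a1  b2\tc3", @"\S+").Count;   // 3
// \B: zero-length positions inside a word; "cat" has two (c|a and a|t).
int inner = Regex.Matches("cat", @"\B").Count;            // 2
// [^aeiou]: delete everything that is not a lowercase vowel.
string vowels = Regex.Replace("regex", "[^aeiou]", "");   // "ee"

Console.WriteLine($"{underscored} {onlyDigits} {tokens} {inner} {vowels}");
```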

    Repeat
    *        Repeat zero or more times
    +        Repeat one or more times
    ?        Repeat zero or one time
    {n}      Repeat exactly n times
    {n,m}    Repeat at least n, but no more than m times
    {n,}     Repeat at least n times
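A C# sketch of each quantifier, assuming .NET's System.Text.RegularExpressions plus System.Linq; All is a hypothetical helper introduced here only to print every match on one line:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// All is a hypothetical helper for this sketch: it joins every match with commas.
static string All(string input, string pattern) =>
    string.Join(",", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

string star = All("ac abc abbc", "ab*c");          // "ac,abc,abbc"  zero or more b
string plus = All("ac abc abbc", "ab+c");          // "abc,abbc"     one or more b
string optional = All("color colour", "colou?r");  // "color,colour" zero or one u
string exactly = All("2012 06 19", @"\b\d{4}\b");  // "2012"         exactly 4 digits
string range = All("1234567", @"\d{2,3}");         // "123,456"      2 to 3 digits, greedy
string atLeast = All("1 22 333", @"\d{2,}");       // "22,333"       2 or more digits

Console.WriteLine(string.Join(" | ", star, plus, optional, exactly, range, atLeast));
```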

    Greedy and lazy
    *?       Repeat zero or more times, but as few as possible
    +?       Repeat one or more times, but as few as possible
    ??       Repeat zero or one time, but as few as possible
    {n,m}?   Repeat at least n, but no more than m times, but as few as possible
    {n,}?    Repeat at least n times, but as few as possible
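Greedy versus lazy in C#, assuming .NET's System.Text.RegularExpressions; the HTML-like string is just a convenient example. The last line shows that lazy matching still starts at the leftmost possible position:

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<b>bold</b>";
string greedy = Regex.Match(html, "<.*>").Value;  // "<b>bold</b>": .* grabs as much as it can
string lazy = Regex.Match(html, "<.*?>").Value;   // "<b>": .*? stops at the first >

string greedyRange = Regex.Match("12345", @"\d{2,4}").Value; // "1234"
string lazyRange = Regex.Match("12345", @"\d{2,4}?").Value;  // "12": takes the minimum n

// Lazy is not "globally shortest": the match begins at the leftmost a.
string leftmost = Regex.Match("aab", "a.*?b").Value; // "aab", not "ab"

Console.WriteLine($"{greedy} {lazy} {greedyRange} {lazyRange} {leftmost}");
```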


    Captures
    (exp)          Match exp and capture it in an automatically numbered group
    (?<name>exp)   Match exp and capture it in a group named name
    (?:exp)        Match exp, but do not capture it
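The three group forms side by side, as a C# sketch assuming .NET's System.Text.RegularExpressions and an invented date string. One .NET detail worth knowing: unnamed groups are numbered first, and named groups get the numbers after them:

```csharp
using System;
using System.Text.RegularExpressions;

Match date = Regex.Match("Date: 2012-06-19", @"(?<year>\d{4})-(\d{2})-(?:\d{2})");

string year = date.Groups["year"].Value;    // "2012": named group
string month = date.Groups[1].Value;        // "06": the only unnamed group is number 1
string yearByNumber = date.Groups[2].Value; // "2012": the named group is numbered after it
int groupCount = date.Groups.Count;         // 3: whole match, group 1, "year"; (?:...) adds none

Console.WriteLine($"{date.Value} {year} {month} {yearByNumber} {groupCount}");
```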

    Lookarounds
    (?=exp)
    Match any position before a suffix exp (ing below); exp itself is not included in the match.
    Say \b\w+(?=ing\b): it matches the part of any word ending with ing that comes before the ing.
    working abcing -> matches "work" and "abc"
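The lookahead example above, run in C# (assuming .NET's System.Text.RegularExpressions); only the stems are returned because the lookahead has zero length:

```csharp
using System;
using System.Text.RegularExpressions;

MatchCollection stems = Regex.Matches("working abcing", @"\b\w+(?=ing\b)");
foreach (Match m in stems)
    Console.WriteLine(m.Value); // prints work, then abc
```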

    (?<=exp)
    Match any position after a prefix exp (re below); exp itself is not included in the match.
    Say (?<=\bre)\w+\b: it matches the rest of any word that starts with re.
    reduction -> matches "duction"
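The lookbehind example in C#, assuming .NET's System.Text.RegularExpressions; the extra words "rerun" and "pre" are made up to show a word that matches and one that does not:

```csharp
using System;
using System.Text.RegularExpressions;

MatchCollection rests = Regex.Matches("reduction rerun pre", @"(?<=\bre)\w+\b");
foreach (Match m in rests)
    Console.WriteLine(m.Value); // prints duction, then run
// "pre" gives no match: its "re" does not start at a word boundary.
```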

    (?!exp)
    Match any position that is not followed by the suffix exp (123 below).

    work(?!123)
    work123 -> no match

    But,
    work1(?!123)
    work123 -> matches "work1"

    The reason is that after work1 the engine is at work1|23, where the position is followed by 23 instead of 123.
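Both negative lookahead cases in C#, assuming .NET's System.Text.RegularExpressions; "work456" is an extra invented input:

```csharp
using System;
using System.Text.RegularExpressions;

bool plainWork = Regex.IsMatch("work123", "work(?!123)");    // false: "work" is followed by 123
string work1 = Regex.Match("work123", "work1(?!123)").Value; // "work1": only "23" follows
bool otherSuffix = Regex.IsMatch("work456", "work(?!123)");  // true

Console.WriteLine($"{plainWork} {work1} {otherSuffix}");
```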

    (?<!exp)
    Match any position that is not preceded by the prefix exp (123 below).

    (?<!123)work
    123work -> no match

    But,
    (?<!123)3work
    123work -> matches "3work"

    The reason is that for 3work the engine starts at 12|3work, where the position is preceded by 12 instead of 123.
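The negative lookbehind cases in C#, assuming .NET's System.Text.RegularExpressions; "456work" is an extra invented input:

```csharp
using System;
using System.Text.RegularExpressions;

bool plainWork = Regex.IsMatch("123work", "(?<!123)work");         // false: preceded by 123
string threeWork = Regex.Match("123work", "(?<!123)3work").Value;  // "3work": only "12" precedes it
bool otherPrefix = Regex.IsMatch("456work", "(?<!123)work");       // true

Console.WriteLine($"{plainWork} {threeWork} {otherPrefix}");
```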



    Comment
    (?#comment)    An inline comment; the regex engine ignores it
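A tiny C# sketch, assuming .NET's System.Text.RegularExpressions, showing that inline comments do not change what a pattern matches:

```csharp
using System;
using System.Text.RegularExpressions;

string commented = @"\d{4}(?#year)-\d{2}(?#month)";
string plain = @"\d{4}-\d{2}";

string withComments = Regex.Match("on 2012-06", commented).Value; // "2012-06"
string withoutComments = Regex.Match("on 2012-06", plain).Value;  // "2012-06"

Console.WriteLine($"{withComments} {withoutComments}");
```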

    Special Characters

    the opening square bracket [, \[
    the backslash \, \\
    the caret ^, \^
    the dollar sign $, \$
    the period or dot ., \.
    the vertical bar or pipe symbol |, \|
    the question mark ?, \?
    the asterisk or star *, \*
    the plus sign +, \+
    the opening round bracket (, \(
    the closing round bracket ), \)
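.NET can add these backslashes for you with Regex.Escape. A short C# sketch, assuming .NET's System.Text.RegularExpressions and invented inputs:

```csharp
using System;
using System.Text.RegularExpressions;

string price = Regex.Escape("$5.00"); // \$5\.00
string sum = Regex.Escape("(1+1)");   // \(1\+1\)

bool found = Regex.IsMatch("costs $5.00 now", price); // true
bool looseDot = Regex.IsMatch("costs $5x00", price);  // false: \. only matches a real dot

Console.WriteLine($"{price} {sum} {found} {looseDot}");
```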




  14. About Replacement
  15. (?<named>pattern) denotes a group with the name "named".
    (pattern) denotes a group with no name (referred to by number).

    Named example
    strInput = Regex.Replace(strInput, "(?<first>abc)", "def${first}");
    Each "abc" becomes "defabc"

    Unnamed example
    strInput = Regex.Replace(strInput, "(abc)", "def$1");
    Each "abc" becomes "defabc"

    Summary of symbols
    $&              matched text
    $_              entire input string
    $`              text before the match
    $'              text after the match
    ${group_name}   text matched by the named group
    $1, $2          text matched by the numbered group
    $$              the literal "$"


    Another example

    sResult = Regex.Replace("The price is 31.95", @"\d+\.\d{2}", "$$$&");
    It puts $ in front of monetary values: sResult is "The price is $31.95".
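The replacement examples and the substitution symbols, checked in one C# sketch; it assumes .NET's System.Text.RegularExpressions and uses made-up inputs (the short string "abc" keeps the $`, $' and $_ results easy to read):

```csharp
using System;
using System.Text.RegularExpressions;

string input = "xx abc yy";
string named = Regex.Replace(input, "(?<first>abc)", "def${first}"); // "xx defabc yy"
string numbered = Regex.Replace(input, "(abc)", "def$1");            // "xx defabc yy"

// $$ is a literal $, $& is the whole match.
string price = Regex.Replace("The price is 31.95", @"\d+\.\d{2}", "$$$&"); // "The price is $31.95"

// Replace the "b" in "abc" with bracketed substitution results.
string before = Regex.Replace("abc", "b", "[$`]"); // "a[a]c": text before the match
string after = Regex.Replace("abc", "b", "[$']");  // "a[c]c": text after the match
string whole = Regex.Replace("abc", "b", "[$_]");  // "a[abc]c": entire input

Console.WriteLine($"{named} | {numbered} | {price} | {before} {after} {whole}");
```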


  16. About Lookaround
  17. The advantage of a lookaround is that the text it checks is not included in the match, because a lookaround has zero length.

  18. Misc
  19. Match {…} where the … part does not contain the word hede (each character is checked with (?!hede) before it is consumed):
    {((?!hede)[^}])*}
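The hede pattern in C#, assuming .NET's System.Text.RegularExpressions; the braces are escaped as \{ and \} here to be safe, and the inputs are invented:

```csharp
using System;
using System.Text.RegularExpressions;

string noHede = @"\{((?!hede)[^}])*\}";

bool clean = Regex.IsMatch("{abc}", noHede);                          // true
bool dirty = Regex.IsMatch("{has hede}", noHede);                     // false
string firstClean = Regex.Match("x {has hede} {fine}", noHede).Value; // "{fine}"

Console.WriteLine($"{clean} {dirty} {firstClean}");
```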
