
Regular expressions

Overview:
expression                returns
"Fats Waller" =~ /a/      1
/a/ =~ "Fats Waller"      1
After the match 'very interesting' =~ /t/:
$& receives the part of the string that was matched: "t"
$` receives the part that preceded the match: "very in"
$' receives the part that followed the match: "eresting"
A match also sets the thread-local variables $~ and $1 through $9.
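A minimal sketch of reading these variables after a match. The helper name match_details and the sample pattern are illustrative assumptions, not part of the notes above; $~ is the MatchData object for the last match.

```ruby
# Hypothetical helper: collect what Ruby records after a successful match.
def match_details(str, re)
  return nil unless str =~ re      # =~ returns the match index, or nil
  {
    index:  $~.begin(0),           # offset where the match starts
    match:  $~[0],                 # same text as $&
    pre:    $~.pre_match,          # same text as $`
    post:   $~.post_match,         # same text as $'
    groups: [$1, $2]               # text captured by the first two groups
  }
end

p match_details('It costs $12.', /(\d)(\d)/)
```

When the pattern does not match, =~ returns nil and the helper returns nil as well.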
Example:
def show_regexp(a, re)
  if a =~ re
    "#{$`}<<#{$&}>>#{$'}"
  else
    "no match"
  end
end
expression                              returns
show_regexp('very interesting', /t/)    very in<<t>>eresting
Exceptions:
Every character in a pattern matches itself except ., |, (, ), [, ], {, }, +, \, ^, $, *, and ?. To match one of these literally, precede it with a backslash.
expression                              returns
show_regexp('yes [ no', /\[/)           yes <<[>> no
show_regexp('yes (no)', /\(no\)/)       yes <<(no)>>
show_regexp('are you sure?', /e\?/)     are you sur<<e?>>
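When the literal text is not known in advance, escaping by hand is error-prone. A sketch using the standard Regexp.escape, which adds the backslashes; the helper name literal_regexp is an assumption made for this example, and show_regexp is repeated so the block runs on its own.

```ruby
def show_regexp(a, re)
  if a =~ re
    "#{$`}<<#{$&}>>#{$'}"
  else
    "no match"
  end
end

# Hypothetical helper: build a pattern that matches `text` literally.
def literal_regexp(text)
  Regexp.new(Regexp.escape(text))  # escape backslash-quotes every special character
end

puts show_regexp('yes (no)', literal_regexp('(no)'))
puts show_regexp('are you sure?', literal_regexp('e?'))
```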
Anchors:
expression                                              returns
/^option/ matches only if it appears at the start of a line
show_regexp("this is\nthe time", /^the/)                this is\n<<the>> time
/option$/ matches only if it appears at the end of a line
show_regexp("this is\nthe time", /is$/)                 this <<is>>\nthe time
\A matches the beginning of a string
show_regexp("this is\nthe time", /\Athis/)              <<this>> is\nthe time
\Z matches the end of a string, or just before a trailing newline
show_regexp("this time is\nthe time\n", /time\Z/)       this time is\nthe <<time>>\n
\b matches word boundaries
show_regexp("this is\nthe time", /\bis/)                this <<is>>\nthe time
\B matches nonword boundaries
show_regexp("this is\nthe time", /\Bis/)                th<<is>> is\nthe time
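A sketch comparing the line anchors with the string anchors on one multi-line string. The helper name anchor_positions is an assumption; \z (lowercase) does not appear in the list above and is included only for contrast, since it matches strictly at the very end of the string.

```ruby
# Hypothetical helper: report where each anchored pattern matches (nil = no match).
def anchor_positions(text)
  {
    line_start:   text =~ /^the/,    # ^ matches after any newline
    string_start: text =~ /\Athe/,   # \A matches only at offset 0
    line_end:     text =~ /time$/,   # $ matches before any newline
    upper_z:      text =~ /time\Z/,  # \Z tolerates one trailing newline
    lower_z:      text =~ /time\z/   # \z does not
  }
end

p anchor_positions("this is\nthe time\n")
```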
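Character Classes:
[characters] matches any single character listed between the brackets
[aeiou] matches a vowel
[\b] matches a backspace character
[\n] matches a newline
[\s] matches a whitespace character
The special meaning of ., |, (, ), [, {, +, ^, $, *, and ? is turned off inside the brackets (except a ^ placed first, which negates the class).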
Example Character Classes:
expression                                     returns
show_regexp('Price $12.', /[aeiou]/)           Pr<<i>>ce $12.
show_regexp('Price $12.', /[\s]/)              Price<< >>$12.
show_regexp('Price $12.', /[[:digit:]]/)       Price $<<1>>2.
show_regexp('Price $12.', /[[:space:]]/)       Price<< >>$12.
show_regexp('Price $12.', /[[:punct:]aeiou]/)  Pr<<i>>ce $12.
[^a-z] matches any character that isn’t a lowercase letter
a = 'see [Design Patterns-page 123]'
show_regexp(a, /[^a-z]/)      see<< >>[Design Patterns-page 123]
show_regexp(a, /[^a-z\s]/)    see <<[>>Design Patterns-page 123]
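show_regexp reports only the first match. To see every character a class accepts, String#scan can be used; a sketch, with the helper name class_matches assumed for illustration:

```ruby
# Hypothetical helper: list every character in `str` matched by each class.
def class_matches(str)
  {
    upper:         str.scan(/[A-Z]/),      # range inside brackets
    not_lower:     str.scan(/[^a-z\s]/),   # negated class, whitespace excluded too
    vowels:        str.scan(/[aeiou]/).uniq
  }
end

p class_matches('see [Design Patterns-page 123]')
```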
A period (.) outside brackets matches any character except a newline:
a = 'It costs $12.'
show_regexp(a, /c.s/)    It <<cos>>ts $12.
show_regexp(a, /./)      <<I>>t costs $12.
show_regexp(a, /\./)     It costs $12<<.>>
Character class abbreviations:
Sequence   As [...]           Meaning
\d         [0-9]              Digit character
\D         [^0-9]             Any character except a digit
\s         [ \t\r\n\f\v]      Whitespace character
\S         [^ \t\r\n\f\v]     Any character except whitespace
\w         [A-Za-z0-9_]       Word character
\W         [^A-Za-z0-9_]      Any character except a word character
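A sketch of the abbreviations in use with standard String methods (scan, split, gsub). The helper name abbreviation_demo is an assumption for this example.

```ruby
# Hypothetical helper: apply several abbreviations to one string.
def abbreviation_demo(str)
  {
    digits:      str.scan(/\d/),     # every digit character
    words:       str.scan(/\w+/),    # runs of word characters
    fields:      str.split(/\s+/),   # split on runs of whitespace
    digits_only: str.gsub(/\D/, ''), # drop everything that is not a digit
    non_word:    str.scan(/\W/)      # every non-word character
  }
end

p abbreviation_demo('It costs $12.')
```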
POSIX Character Classes:
Sequence     Meaning
[:alnum:]    Alphanumeric
[:alpha:]    Uppercase or lowercase letter
[:blank:]    Blank and tab
[:cntrl:]    Control characters (at least 0x00–0x1f, 0x7f)
[:digit:]    Digit
[:graph:]    Printable character excluding space
[:lower:]    Lowercase letter
[:print:]    Any printable character (including space)
[:punct:]    Printable character excluding space and alphanumeric
[:space:]    Whitespace (same as \s)
[:upper:]    Uppercase letter
[:xdigit:]   Hex digit (0–9, a–f, A–F)
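POSIX classes are written inside a bracket expression, so the pattern has two pairs of brackets, as in [[:digit:]]. A sketch over the 'Price $12.' string used earlier; the helper name posix_demo is an assumption for this example.

```ruby
# Hypothetical helper: scan one string with several POSIX classes.
def posix_demo(str)
  {
    alpha:  str.scan(/[[:alpha:]]+/),   # runs of letters
    digit:  str.scan(/[[:digit:]]+/),   # runs of digits
    upper:  str.scan(/[[:upper:]]/),    # uppercase letters
    space:  str.scan(/[[:space:]]/),    # whitespace characters
    xdigit: str.scan(/[[:xdigit:]]/)    # 0-9, a-f, A-F
  }
end

p posix_demo('Price $12.')
```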
Repetition:
