Appendix A: Regular Expressions
It’s All Greek to Me
Regular Expressions
• A pattern that matches a set of one or more strings
• May be a simple string, or contain wildcard characters or modifiers
• Used by programs such as vim, grep, awk, and sed
• Not the same as shell filename expansion (globbing): characters such as * mean different things in each
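To make the difference concrete, here is a small Python sketch. It assumes Python's standard `fnmatch` module as a stand-in for shell globbing and the `re` module as a stand-in for the regex engines in vim and grep (their syntax differs in details). The helper names `glob_matches`, `regex_matches`, and `is_valid_regex` are hypothetical.

```python
import fnmatch
import re


def glob_matches(name: str, pattern: str) -> bool:
    # Shell-style wildcard: * stands for any run of characters
    return fnmatch.fnmatchcase(name, pattern)


def regex_matches(name: str, pattern: str) -> bool:
    # Regular expression: the whole name must match the pattern
    return re.fullmatch(pattern, name) is not None


def is_valid_regex(pattern: str) -> bool:
    # In Python's re, a leading * is an error: there is nothing for it to repeat
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(glob_matches("notes.txt", "*.txt"))      # True
    print(regex_matches("notes.txt", r".*\.txt"))  # True: .* is the regex spelling of the glob *
    print(is_valid_regex("*.txt"))                 # False
```

The glob `*.txt` and the regex `.*\.txt` accept the same file names here, but the glob text itself is not a usable regular expression.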
• Characters
– Literals
– Special characters
• Delimiters
– Mark the beginning and end of a regular expression
– Usually /
– Sometimes ' (as with grep), though there the quotes belong to the shell, not to the regular expression
Simple Strings
• Contain no special characters
• Match that exact sequence of characters, anywhere in a line
• Ex: /foo/ matches:
– foo
– tomfoolery
– bar.foo.com
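A minimal Python sketch of the /foo/ example, assuming the standard `re` module behaves like grep for a plain string; `matching_lines` is a hypothetical helper. The last two sample lines show that the match is exact and case-sensitive.

```python
import re


def matching_lines(pattern: str, lines: list[str]) -> list[str]:
    # re.search finds the pattern anywhere in a line, the way grep does
    return [line for line in lines if re.search(pattern, line)]


lines = ["foo", "tomfoolery", "bar.foo.com", "fo o", "FOO"]
print(matching_lines("foo", lines))  # ['foo', 'tomfoolery', 'bar.foo.com']
```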
Special Characters
• Can match multiple strings
• Represent zero or more characters
• Always match the longest possible string (we’ll see examples in a bit)
Periods
• A period (.) matches any single character
• Ex: /.ing/
– I was talking
– bling
– he called ingred
• Ex: /spar.ing/
– sparring
– sparking
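The period examples can be checked with a short Python sketch, assuming Python's `re` treats `.` the same way vim and grep do for these inputs; `first_match` is a hypothetical helper. Note that in "he called ingred" the period matches the space before ing.

```python
import re


def first_match(pattern: str, text: str):
    # Return the first matched text, or None if the pattern does not occur
    m = re.search(pattern, text)
    return m.group() if m else None


print(repr(first_match(r".ing", "I was talking")))     # 'king'
print(repr(first_match(r".ing", "bling")))             # 'ling'
print(repr(first_match(r".ing", "he called ingred")))  # ' ing' (. matched the space)
print(repr(first_match(r".ing", "ingred")))            # None: nothing precedes ing
print(repr(first_match(r"spar.ing", "sparking")))      # 'sparking'
print(repr(first_match(r"spar.ing", "sparing")))       # None: . needs exactly one character
```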
Brackets
• Define a character class
• Match any one character in the class
• If a caret (^) is the first character in the class, the class matches any character not in it
• Most other special characters lose their special meaning inside a class (a hyphen between two characters denotes a range)
Brackets cont’d
• Ex /[jJ]ustin/ matches justin and Justin
• Ex /[A-Za-z]/ matches any letter
• Ex /[0-9]/ matches any digit
• Ex /[^a-z]/ matches any character except a lowercase letter
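The bracket examples translate directly to Python's `re` module, which is assumed here to handle character classes like vim and grep do. `re.findall` lists every single character that the class accepts.

```python
import re

# [jJ] accepts either case of the first letter only
print(bool(re.search(r"[jJ]ustin", "Justin")))  # True
print(bool(re.search(r"[jJ]ustin", "justin")))  # True
print(bool(re.search(r"[jJ]ustin", "JUSTIN")))  # False

text = "Room 42b"
print(re.findall(r"[A-Za-z]", text))  # ['R', 'o', 'o', 'm', 'b']
print(re.findall(r"[0-9]", text))     # ['4', '2']
print(re.findall(r"[^a-z]", text))    # ['R', ' ', '4', '2']

# Inside a class, . and * are ordinary characters
print(re.findall(r"[.*]", "a.b*c"))   # ['.', '*']
```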
Asterisks
• Match zero or more occurrences of the previous character
• So to match any number of any characters, use /.*/
• Ex /t.*ing/
– thing
– this is really annoying
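A short Python sketch of the asterisk, assuming `re` as a stand-in for vim and grep; `search` is a hypothetical helper. The first two lines show that * repeats only the character just before it, and the rest reproduce the /t.*ing/ example.

```python
import re


def search(pattern: str, text: str):
    # Return the matched text, or None if there is no match
    m = re.search(pattern, text)
    return m.group() if m else None


# * applies to the single character just before it
print(repr(search(r"ab*c", "ac")))     # 'ac' (zero b's)
print(repr(search(r"ab*c", "abbbc")))  # 'abbbc'

# .* stands for any number of any characters
print(repr(search(r"t.*ing", "thing")))                    # 'thing'
print(repr(search(r"t.*ing", "this is really annoying")))  # the whole line
print(repr(search(r"t.*ing", "ting")))                     # 'ting' (.* matched nothing)
```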
Trang 10Plus Signs and Question Marks
• Very similar to asterisks: both apply to the previous character
• + matches one or more occurrences (not zero)
• ? matches zero or one occurrence (no more)
• Ex /2+4?/ matches one or more 2’s followed by either zero or one 4
– 22224 and 2 match in full
– 4 does not match at all; 244 matches only in part (24), since only one 4 is allowed
• Part of extended regular expressions
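The /2+4?/ example is easier to read once a whole-string match is separated from a match anywhere in the string. This Python sketch assumes `re`, which, like egrep, accepts + and ? without backslashes; `whole_match` and `partial_match` are hypothetical helpers.

```python
import re

PATTERN = r"2+4?"


def whole_match(text: str) -> bool:
    # The entire string must fit the pattern
    return re.fullmatch(PATTERN, text) is not None


def partial_match(text: str):
    # Any part of the string may fit; return that part or None
    m = re.search(PATTERN, text)
    return m.group() if m else None


for text in ["22224", "2", "4", "244"]:
    print(text, whole_match(text), repr(partial_match(text)))
# 22224 True '22224'
# 2 True '2'
# 4 False None
# 244 False '24'
```

A tool like grep reports a line if any part of it matches, so a line containing 244 would still be listed.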
Carets & Dollar Signs
• If a regular expression starts with a ^, the match must start at the beginning of a line
• If a regular expression ends with a $, the match must end at the end of a line
• ^ and $ are referred to as anchors
• Ex /^T.*T$/ matches any line that starts and ends with T (and so is at least two characters long)
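A Python sketch of the anchor example. It assumes Python's `re`, where ^ and $ refer to the whole string unless the `re.MULTILINE` flag is given; with the flag they apply to each line, as in grep and vim.

```python
import re

text = "THAT IT\nTNT\nThe cat\nT\nat T"

# re.MULTILINE lets ^ and $ match at the start and end of every line
print(re.findall(r"^T.*T$", text, flags=re.MULTILINE))  # ['THAT IT', 'TNT']
# "The cat" ends in a lowercase t, the line "T" is too short (the pattern needs two T's),
# and "at T" does not start with T

# Without the flag, ^ and $ refer to the whole string only
print(re.findall(r"^T.*T$", text))  # []
```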
Quoting Special Characters
• To use a special character literally, put a backslash in front of it
• Ex /and\/or/ matches and/or
• Ex /\\/ matches \
• Ex /\**/ matches any number of asterisks, including none
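The quoting examples in Python, assuming `re` as a stand-in; `search` is a hypothetical helper. Python patterns have no / delimiter, so `\/` is unnecessary there, but the escape is accepted and matches a plain slash. The raw-string prefix `r"..."` keeps Python itself from consuming the backslashes.

```python
import re


def search(pattern: str, text: str):
    # Return the matched text, or None if there is no match
    m = re.search(pattern, text)
    return m.group() if m else None


print(repr(search(r"and\/or", "cats and/or dogs")))  # 'and/or'
print(repr(search(r"\\", r"C:\temp")))               # '\\' (a single backslash)
print(repr(search(r"\**", "***abc")))                # '***'
print(repr(search(r"\**", "abc")))                   # '' (zero asterisks is still a match)
print(repr(search(r"a.c", "abc")), repr(search(r"a\.c", "abc")))  # 'abc' None
```

Because /\**/ also accepts zero asterisks, it matches every line. Requiring at least one asterisk takes /\*\**/ (or /\*+/ in extended syntax).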
Longest Match
• Regular expressions match the longest possible string, starting from the leftmost match in the line
• Ex line: I (Justin) like coffee (lots)
• /(.*)/
– Matches (Justin) like coffee (lots)
• /([^)]*)/
– Matches (Justin)
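The longest-match example in Python. One assumption to flag: in Python's `re` (as in egrep), parentheses are grouping characters, so they must be escaped to match literally. In vim and basic grep, a bare ( is already literal, which is why the slide leaves them unescaped.

```python
import re

line = "I (Justin) like coffee (lots)"

# Escape ( and ) so Python matches them literally
greedy = re.search(r"\(.*\)", line).group()
bounded = re.search(r"\([^)]*\)", line).group()

print(greedy)   # (Justin) like coffee (lots)
print(bounded)  # (Justin)
print(re.findall(r"\([^)]*\)", line))  # ['(Justin)', '(lots)']
```

Excluding ) from the class stops the match at the first closing parenthesis. This is a common way to get a shorter match from a longest-match engine.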
Boolean OR
• You can match either of two distinct strings using OR (the pipe, |)
• Ex /CAT|DOG/
– Matches CAT or DOG
• Simpler expressions can be written using just a character class
– e.g. /a[bc]/ instead of /ab|ac/
• Also part of extended regular expressions
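A Python sketch of alternation, assuming `re` as a stand-in. It shows a whole-word match against a match anywhere, and checks that /a[bc]/ and /ab|ac/ accept the same sample strings. One caveat: Python tries alternatives from left to right, so it may return a shorter match than POSIX tools such as grep -E, which report the longest.

```python
import re

words = ["CAT", "DOG", "COW", "HOTDOG"]

# fullmatch: the entire word must be CAT or DOG
print([w for w in words if re.fullmatch(r"CAT|DOG", w)])  # ['CAT', 'DOG']
# search: CAT or DOG may appear anywhere, as grep would report it
print([w for w in words if re.search(r"CAT|DOG", w)])     # ['CAT', 'DOG', 'HOTDOG']

# A character class and an OR of single characters accept the same strings
samples = ["ab", "ac", "ad", "abc"]
same = all(
    bool(re.fullmatch(r"a[bc]", s)) == bool(re.fullmatch(r"ab|ac", s))
    for s in samples
)
print(same)  # True

# Python takes the first alternative that succeeds, not the longest
print(re.search(r"ab|abc", "abc").group())  # ab
```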
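Parentheses
• You can apply special characters to groups of characters in parentheses
• Also called bracketing
• A bracketed expression matches the same strings as the unbracketed one
• But modifiers such as * then apply to the whole group
• Ex /\(duck\)*\|\(goose\)/ in vim, or /(duck)*|(goose)/ with egrep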
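A Python sketch of grouping, assuming `re`. Like egrep, Python writes groups with plain ( ) and alternation with a plain |, while vim's basic syntax uses \( \) and \|. `full` is a hypothetical helper.

```python
import re


def full(pattern: str, text: str) -> bool:
    # True if the entire text fits the pattern
    return re.fullmatch(pattern, text) is not None


print(full(r"(duck)*", "duckduckduck"))   # True: * repeats the whole group
print(full(r"duck*", "duckduck"))         # False: * repeats only the k
print(full(r"duck*", "duckkk"))           # True
print(full(r"(duck)*|(goose)", "goose"))  # True
print(full(r"(duck)*|(goose)", ""))       # True: zero ducks
```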
Using with vim
• Use regular expressions for searching and substituting
• Searching:
– /string searches forward; ?string searches backward
• Substituting:
– :[g][address]s/string/replace[/g]
– g : global; substitute all lines
– string can be a regular expression; replace is literal text apart from a few special sequences such as &
– /g : global; replace all occurrences in the line
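The effect of the trailing /g flag can be imitated with Python's `re.sub`. This is only an analogy: `re.sub` replaces every occurrence by default, and `count=1` limits it to the first one, like :s without /g. The list comprehension plays the role of the % address.

```python
import re

line = "the cat saw the other cat"

# Like :s/cat/dog/ (first occurrence in the line only)
print(re.sub(r"cat", "dog", line, count=1))  # the dog saw the other cat
# Like :s/cat/dog/g (every occurrence in the line)
print(re.sub(r"cat", "dog", line))           # the dog saw the other dog

# Like :%s/cat/dog/g (every line in the buffer)
buffer = ["cat", "no match here", "cat and cat"]
print([re.sub(r"cat", "dog", text) for text in buffer])  # ['dog', 'no match here', 'dog and dog']
```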
Using with vim cont’d
• [address]
– n : line number
– n+x or n-x : x lines after or before line n
– n1,n2 : from line n1 to n2
– . : alias for current line
– $ : alias for last line in work buffer
– % : alias for entire work buffer
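vim examples
• /^if(
– Searches for the next line that begins with if(
• /end\.$
– Searches for the next line that ends with end.
• :%s/[Jj]ustin/Mr Awesome/g
– Replaces every justin or Justin in the work buffer with Mr Awesome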
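Using with vim cont’d
• Ampersand (&)
– Alias for the matched string when substituting
– Ex: :s/[A-Z][0-9]/_&_/ surrounds the first match in the line with underscores
• Quoted digit (\n)
– Refers to the text matched by the nth quoted part, \( ... \), of the regular expression
– Can be used to rearrange columns
– Ex: :s/\([^,]*\), \(.*\)/\2 \1/ turns Last, First into First Last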
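The same two substitutions can be done in Python with `re.sub`, assumed here as a stand-in for vim. Python has no & in replacement strings; the whole match is written `\g<0>` instead. Numbered groups `\1`, `\2` work as in vim, but the groups use plain parentheses. `count=1` mimics :s without the g flag.

```python
import re

# vim's & (the whole match) is written \g<0> in a Python replacement string
print(re.sub(r"[A-Z][0-9]", r"_\g<0>_", "cell B7 and C12", count=1))
# cell _B7_ and C12

# \1 and \2 refer to the first and second parenthesized groups
names = ["Doe, Jane", "Smith, John"]
print([re.sub(r"([^,]*), (.*)", r"\2 \1", n) for n in names])
# ['Jane Doe', 'John Smith']
```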
Using with grep
• To take advantage of extended regular expressions, use grep -E (or egrep) instead of plain grep
• Use single quotes as delimiters, so the shell passes special characters through unchanged
• Ex:
– egrep '^T.*T$' myfile
– Lists all lines in myfile that begin and end with T