CFMX1: $899.00
XTC99: $69.96
Total items found: 4
(?<=\$)[0-9.]+
ABC01: $23.45
HGG42: $5.31
CFMX1: $899.00
XTC99: $69.96
Total items found: 4
That did the trick. (?<=\$) matches $, but does not consume it, and so only the prices (without the leading $ signs) are returned.
Compare the first and last expressions used in this example. \$[0-9.]+ matched $ followed by a dollar amount. (?<=\$)[0-9.]+ also matched $ followed by a dollar amount. The difference between the two is not in what they located while performing the search; it is in what they included in the results. The former located and included the $. The latter located $ so as to correctly find the prices, but did not include that $ in the matched results.
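The comparison can be reproduced in a few lines of code. The following is a minimal sketch that assumes Python and its standard re module (the lesson itself is not tied to any one language); the variable names are illustrative only.

```python
import re

# Sample text from the example above.
text = """ABC01: $23.45
HGG42: $5.31
CFMX1: $899.00
XTC99: $69.96
Total items found: 4"""

# The $ is consumed here, so it is part of every match.
with_dollar = re.findall(r"\$[0-9.]+", text)

# The $ is only looked behind for, so it is left out of every match.
without_dollar = re.findall(r"(?<=\$)[0-9.]+", text)

print(with_dollar)     # ['$23.45', '$5.31', '$899.00', '$69.96']
print(without_dollar)  # ['23.45', '5.31', '899.00', '69.96']
```

Both patterns locate the same four positions in the text; only the returned strings differ.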
Lookahead patterns may be variable length; they may contain . and
+, for example, so as to be highly dynamic.
Lookbehind patterns, on the other hand, must generally be fixed
length. This is a restriction imposed by most regular
expression implementations.
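As one illustration of this restriction, Python's standard re module accepts a variable-length lookahead but refuses to compile a variable-length lookbehind. The sketch below assumes Python; the currency strings are made-up sample data, and other implementations may behave differently.

```python
import re

# Variable-length lookahead: digits followed by one or more spaces and "USD".
lookahead = re.compile(r"\d+(?=\s+USD)")
print(lookahead.findall("10 USD, 20   USD, 30 EUR"))  # ['10', '20']

# Variable-length lookbehind: \s* can match any number of characters,
# so Python's re module rejects the pattern at compile time.
try:
    re.compile(r"(?<=\$\s*)\d+")
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print("rejected:", exc)
```

A lookbehind such as (?<=\$) compiles without complaint because it always matches exactly one character.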
Combining Lookahead and Lookbehind
Lookahead and lookbehind operations may be combined, as in the following example (the solution to the problem at the start of this lesson):
<HEAD>
<TITLE>Ben Forta's Homepage</TITLE>
</HEAD>
(?<=<[tT][iI][tT][lL][eE]>).*(?=</[tT][iI][tT][lL][eE]>)
<HEAD>
<TITLE>Ben Forta's Homepage</TITLE>
</HEAD>
That worked. (?<=<[tT][iI][tT][lL][eE]>) is a lookbehind operation that matches (but does not consume) <TITLE>; (?=</[tT][iI][tT][lL][eE]>) similarly matches (but does not consume) </TITLE>. All that is returned is the title text (as that is all that was consumed).
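To try the combined pattern in code, here is a small sketch assuming Python's re module. The pattern is the one shown above; the variable names are illustrative.

```python
import re

# Sample text from the example above.
html = """<HEAD>
<TITLE>Ben Forta's Homepage</TITLE>
</HEAD>"""

# The lookbehind anchors the match just after <TITLE>; the lookahead stops it
# just before </TITLE>. Neither tag is consumed.
pattern = re.compile(r"(?<=<[tT][iI][tT][lL][eE]>).*(?=</[tT][iI][tT][lL][eE]>)")

match = pattern.search(html)
title = match.group(0) if match else None
print(title)  # Ben Forta's Homepage
```

The lookbehind here is fixed length (always seven characters), which is why it is accepted even by implementations that reject variable-length lookbehind.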
Tip
In the preceding example, it may be worthwhile to escape the <
(the first character being matched) to prevent ambiguity, so (?<=\<
instead of (?<=<.
Negating Lookaround
As seen thus far, lookahead and lookbehind are usually used to match text, essentially to specify the location of text to be returned (by specifying the text before or after the desired match). These are known as positive lookahead and positive lookbehind. The term positive refers to the fact that they look for a match.
A lesser-used form of lookaround is the negative lookaround. Negative lookahead looks ahead for text that does not match the specified pattern, and negative lookbehind similarly looks behind for text that does not match the specified pattern.
You might have expected to be able to use ^ to negate a lookaround, but no, the syntax is a little different. Lookaround operations are negated using ! (which replaces the =). Table 9.1 lists all the lookaround operations.
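Table 9.1 Lookaround Operations
Class    Description
(?=)     Positive lookahead
(?!)     Negative lookahead
(?<=)    Positive lookbehind
(?<!)    Negative lookbehind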
Tip
Generally, any regular expression implementation supporting
lookahead supports both positive and negative lookahead.
Similarly, those implementations supporting lookbehind support
both positive and negative lookbehind.
To demonstrate the difference between positive and negative lookbehind, here is
an example. The following block of text contains numbers, both prices and
quantities. First we'll just obtain the prices:
I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order
(?<=\$)\d+
I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order
This is very similar to the example seen previously. \d+ matches numbers (one or more digits), and (?<=\$) looks behind to match (but not consume) the $ (escaped as \$). Therefore, the numbers in the two prices were matched, but not the quantities.
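The positive lookbehind can be checked with a short sketch, assuming Python's re module (variable names are illustrative):

```python
import re

# Sample text from the example above.
text = """I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order"""

# Only digit runs that sit directly after a $ are returned.
prices = re.findall(r"(?<=\$)\d+", text)
print(prices)  # ['30', '5']
```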
Now we'll do the opposite, locating just the quantities but not the prices:
I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order
\b(?<!\$)\d+\b
I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order
Again, \d+ matched numbers, but this time only the quantities were matched and not the prices. The expression (?<!\$) is a negative lookbehind that matches only when what precedes the numbers is not a $. Changing the = to ! in the lookbehind changes the pattern from positive to negative.
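The same check works for the negative form. This is again a sketch assuming Python's re module, with illustrative variable names:

```python
import re

# Sample text from the example above.
text = """I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order"""

# Whole numbers (bounded by \b) that are NOT directly preceded by a $.
quantities = re.findall(r"\b(?<!\$)\d+\b", text)
print(quantities)  # ['100', '50', '60']
```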
You may be wondering why the pattern in the negative lookbehind example defines word boundaries (using \b). To understand why this is necessary, here is the same example without those boundaries:
I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order
(?<!\$)\d+
I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order
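Running the two patterns side by side makes the difference visible. The sketch below assumes Python's re module; the variable names are illustrative.

```python
import re

# Sample text from the example above.
text = """I paid $30 for 100 apples,
50 oranges, and 60 pears
I saved $5 on this order"""

# With word boundaries: a match must start at the beginning of a number.
with_boundaries = re.findall(r"\b(?<!\$)\d+\b", text)

# Without them: a match may start partway through a number, so the 0 in $30
# qualifies because the character before it is 3, not $.
without_boundaries = re.findall(r"(?<!\$)\d+", text)

print(with_boundaries)     # ['100', '50', '60']
print(without_boundaries)  # ['0', '100', '50', '60']
```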