Caution
As noted previously, you will need to modify the backreference
designator based on the implementation used. JavaScript users will
need to use $ instead of the previously used \. ColdFusion users
should use \ for both find and replace operations.
Tip
As seen in this example, a subexpression may be referred to
multiple times simply by referring to the backreference as needed.
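To make the point about reusing a backreference concrete, here is a minimal sketch in Python. The sample text, the simplified address pattern, and the variable names are all assumptions made for illustration; note also that Python writes backreferences in replacement strings as \1 rather than $1.

```python
import re

# Hypothetical sample text; the address is illustrative only.
text = "Contact ben@example.com for details."

# A deliberately simple pattern: one subexpression captures the whole address.
pattern = r"(\w+@\w+\.\w+)"

# The same backreference (\1) is used twice in the replacement:
# once as the link target and once as the link text.
replacement = r'<A HREF="mailto:\1">\1</A>'

linked = re.sub(pattern, replacement, text)
print(linked)
```

Because the subexpression is captured once and referenced twice, the address does not have to be matched a second time.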
Let's look at one more example. User information is stored in a database, and phone numbers are stored in the format 313-555-1234. However, you need to reformat the phone numbers as (313) 555-1234. Here is the example:
313-555-1234
248-555-9999
810-555-9000
(\d{3})(-)(\d{3})(-)(\d{4})
($1) $3-$5
(313) 555-1234
(248) 555-9999
(810) 555-9000
Again, two regular expression patterns are used here. The first looks far more complicated than it is, so let's walk through it.
(\d{3})(-)(\d{3})(-)(\d{4}) matches a phone number, but breaks it into five subexpressions (so as to isolate its parts). (\d{3}) matches the first three digits as the first subexpression, (-) matches - as the second subexpression, and so on. The end result is that the phone number is broken into five parts (each part its own subexpression): the area code, a hyphen, the first three digits of the number, another hyphen, and then the final four digits. These five parts can be used individually and as needed, and so ($1) $3-$5 simply reformats the number using only three of the subexpressions and ignoring the other two, thereby turning 313-555-1234 into (313) 555-1234.
Tip
When manipulating text for reformatting, it is often useful to break
the text into lots of little subexpressions so as to have greater
control over that text.
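The phone number example can be tried out with the following sketch, which assumes Python's re module. The find pattern is the one shown above; the replacement is rewritten in Python's own backreference syntax (\g<1> instead of $1), and the variable names are illustrative.

```python
import re

numbers = ["313-555-1234", "248-555-9999", "810-555-9000"]

# Five subexpressions: area code, hyphen, first three digits, hyphen, final four digits.
pattern = re.compile(r"(\d{3})(-)(\d{3})(-)(\d{4})")

# Only subexpressions 1, 3, and 5 are used; the two hyphen groups are ignored.
replacement = r"(\g<1>) \g<3>-\g<5>"

reformatted = [pattern.sub(replacement, number) for number in numbers]
for number in reformatted:
    print(number)
```

The parentheses in the replacement string are literal characters; they only act as grouping operators in the find pattern.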
Converting Case
Some regex implementations support the use of conversion operations via the metacharacters listed in Table 8.1.
Table 8.1 Case Conversion Metacharacters
\E Terminate \L or \U conversion
\l Convert next character to lowercase
\L Convert all characters up to \E to lowercase
\u Convert next character to uppercase
\U Convert all characters up to \E to uppercase
\l and \u are placed before a character (or expression) so as to convert the case of the next character. \L and \U convert the case of all characters until a terminating \E is reached.
Following is a simple example, converting the text within an <H1> tag pair to uppercase:
<BODY>
<H1>Welcome to my Homepage</H1>
Content is divided into two sections:<BR>
<H2>ColdFusion</H2>
Information about Macromedia ColdFusion
<H2>Wireless</H2>
Information about Bluetooth, 802.11, and more
<H2>This is not valid HTML</H3>
</BODY>
(<[Hh]1>)(.*?)(</[Hh]1>)
$1\U$2\E$3
<BODY>
<H1>WELCOME TO MY HOMEPAGE</H1>
Content is divided into two sections:<BR>
<H2>ColdFusion</H2>
Information about Macromedia ColdFusion
<H2>Wireless</H2>
Information about Bluetooth, 802.11, and more
<H2>This is not valid HTML</H3>
</BODY>
The pattern (<[Hh]1>)(.*?)(</[Hh]1>) breaks the header into three subexpressions: the opening tag, the text, and the closing tag. The second pattern then puts the text back together: $1 contains the start tag, \U$2\E converts the second subexpression (the header text) to uppercase, and $3 contains the end tag.
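Not every implementation supports these conversion metacharacters. Python's re module, for example, does not accept \U or \E in a replacement string, so the sketch below swaps in a different technique: a replacement function that uppercases the second subexpression. The function name upper_heading and the shortened sample text are assumptions made for illustration.

```python
import re

# A shortened version of the sample page used above.
html = "<H1>Welcome to my Homepage</H1>\n<H2>ColdFusion</H2>"

# Three subexpressions: opening tag, header text, closing tag.
pattern = re.compile(r"(<[Hh]1>)(.*?)(</[Hh]1>)")

def upper_heading(match):
    # Equivalent in effect to $1\U$2\E$3: keep both tags, uppercase the text.
    return match.group(1) + match.group(2).upper() + match.group(3)

converted = pattern.sub(upper_heading, html)
print(converted)
```

Only the <H1> text is changed; the <H2> line does not match the pattern and is left alone.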
Summary
Subexpressions are used to define sets of characters or expressions. In addition to being used for repeating matches (as seen in the previous lesson), subexpressions can be referred to within patterns. This type of reference is called a backreference (and unfortunately, there are implementation differences in backreference syntax). Backreferences are useful in text matching and in replace operations.
Lesson 9 Looking Ahead and Behind
All the expressions used thus far have matched text, but sometimes you may want to use expressions to mark the position of text to be matched (in contrast to the matched text itself). This involves the use of lookaround (the capability to look ahead and behind), which will be explained in this lesson.
Introducing Lookaround
Again, we'll start with an example. You need to extract the title of a Web page; HTML page titles are placed between <TITLE> and </TITLE> tags in the <HEAD> section of HTML code. Here's the example:
<HEAD>
<TITLE>Ben Forta's Homepage</TITLE>
</HEAD>
<[tT][iI][tT][lL][eE]>.*</[tT][iI][tT][lL][eE]>
<HEAD>
<TITLE>Ben Forta's Homepage</TITLE>
</HEAD>
<[tT][iI][tT][lL][eE]>.*</[tT][iI][tT][lL][eE]> matches the opening <TITLE> tag (in upper, lower, or mixed case), the closing </TITLE> tag, and whatever text is between them. That worked.
Or did it? What you needed was the title text, but what you got also contained the opening and closing <TITLE> tags. Is it possible to return just the title text?
One solution could be to use subexpressions (as seen in Lesson 7, "Using Subexpressions"). This would allow you to retrieve the matched text in three parts: the opening tag, the text, and the closing tag. With the matched text broken into parts, it would not be too difficult to extract just the part you want.
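The subexpression approach might look like the following sketch, assuming Python's re module. The pattern is the one used above with three pairs of parentheses added; the variable names are illustrative.

```python
import re

html = "<HEAD>\n<TITLE>Ben Forta's Homepage</TITLE>\n</HEAD>"

# The earlier pattern, split into three subexpressions:
# opening tag, title text, closing tag.
pattern = re.compile(r"(<[tT][iI][tT][lL][eE]>)(.*)(</[tT][iI][tT][lL][eE]>)")

match = pattern.search(html)
whole = match.group(0) if match else None   # full match, tags included
title = match.group(2) if match else None   # second subexpression only

print(whole)
print(title)
```

The full match still contains the tags; only the second subexpression holds the title text, so the unwanted parts are matched and then discarded.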
But it makes little sense to make the effort to retrieve something that you actually don't want, only to have to manually remove it. What you really need here is a way