


Caution

As noted previously, you will need to modify the backreference designator based on the implementation used. JavaScript users will need to use $ instead of the previously used \. ColdFusion users should use \ for both find and replace operations.

Tip

As seen in this example, a subexpression may be referred to multiple times simply by referring to the backreference as needed.
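As an illustration of both points, here is a minimal sketch in Python's `re` module (chosen only as an example implementation; it is not one of those the Caution mentions). Python writes a backreference in the replacement string as `\1` (or `\g<1>`) rather than `$1`, and the same subexpression can be used twice. The simplified e-mail pattern, the `linkify` function, and the sample address are hypothetical.

```python
import re

# Hypothetical, simplified e-mail pattern (not a full validator);
# the whole address is captured as subexpression 1.
EMAIL = r"(\w+[\w.]*@[\w.]+\.\w+)"

def linkify(text: str) -> str:
    # \1 appears twice: once inside the HREF attribute, once as the link text.
    return re.sub(EMAIL, r'<A HREF="mailto:\1">\1</A>', text)

print(linkify("Contact user@example.com for details."))
# → Contact <A HREF="mailto:user@example.com">user@example.com</A> for details.
```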

Let's look at one more example. User information is stored in a database, and phone numbers are stored in the format 313-555-1234. However, you need to reformat the phone numbers as (313) 555-1234. Here is the example:

313-555-1234

248-555-9999

810-555-9000

(\d{3})(-)(\d{3})(-)(\d{4})

($1) $3-$5

(313) 555-1234

(248) 555-9999


(810) 555-9000

Again, two regular expression patterns are used here. The first looks far more complicated than it is, so let's walk through it.

(\d{3})(-)(\d{3})(-)(\d{4}) matches a phone number, but breaks it into five subexpressions (so as to isolate its parts). (\d{3}) matches the first three digits as the first subexpression, (-) matches the hyphen as the second subexpression, and so on. The end result is that the phone number is broken into five parts (each part its own subexpression): the area code, a hyphen, the first three digits of the number, another hyphen, and then the final four digits. These five parts can be used individually and as needed, and so ($1) $3-$5 simply reformats the number using only three of the subexpressions and ignoring the other two, thereby turning 313-555-1234 into (313) 555-1234.
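A minimal sketch of the same replace operation in Python's `re` module, chosen only for illustration. The pattern is unchanged; the one difference is that Python writes backreferences in the replacement string as `\1` through `\5` where this example uses `$1` through `$5`. The function name `reformat_phone` is hypothetical.

```python
import re

# Five subexpressions: area code, hyphen, first three digits, hyphen, last four digits.
PHONE = r"(\d{3})(-)(\d{3})(-)(\d{4})"

def reformat_phone(number: str) -> str:
    # Only groups 1, 3, and 5 are used; the two hyphen groups (2 and 4) are dropped.
    return re.sub(PHONE, r"(\1) \3-\5", number)

for n in ["313-555-1234", "248-555-9999", "810-555-9000"]:
    print(reformat_phone(n))
# prints (313) 555-1234, (248) 555-9999, (810) 555-9000 on separate lines
```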

Tip

When manipulating text for reformatting, it is often useful to break the text into lots of little subexpressions so as to have greater control over that text.

Converting Case

Some regex implementations support case conversion operations via the metacharacters listed in Table 8.1.

Table 8.1 Case Conversion Metacharacters

Metacharacter   Description
\E              Terminate \L or \U conversion
\l              Convert next character to lowercase
\L              Convert all characters up to \E to lowercase
\u              Convert next character to uppercase
\U              Convert all characters up to \E to uppercase

\l and \u are placed before a character (or expression) so as to convert the case of the next character. \L and \U convert the case of all characters until a terminating \E is reached.
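Not every implementation supports these metacharacters. Python's standard `re` module, for instance, rejects `\U`, `\L`, `\u`, and `\l` in a replacement string. A common workaround, sketched below, is to pass a function as the replacement and change case in code. Here the hypothetical `title_case_words` function does what a replacement such as `\u$1\L$2\E` would do in an implementation that supports it: uppercase the first character of each word and lowercase the rest.

```python
import re

def title_case_words(text: str) -> str:
    # Group 1: first character of a word; group 2: the rest of the word.
    # The lambda performs the \u (next char upper) and \L...\E (rest lower)
    # conversions in code, since re.sub templates cannot express them.
    return re.sub(r"\b(\w)(\w*)",
                  lambda m: m.group(1).upper() + m.group(2).lower(),
                  text)

print(title_case_words("hELLO wORLD"))  # → Hello World
```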

Following is a simple example, converting the text within an <H1> tag pair to uppercase:

<BODY>

<H1>Welcome to my Homepage</H1>

Content is divided into two sections:<BR>

<H2>ColdFusion</H2>

Information about Macromedia ColdFusion

<H2>Wireless</H2>

Information about Bluetooth, 802.11, and more

<H2>This is not valid HTML</H3>

</BODY>

(<[Hh]1>)(.*?)(</[Hh]1>)


$1\U$2\E$3

<BODY>

<H1>WELCOME TO MY HOMEPAGE</H1>

Content is divided into two sections:<BR>

<H2>ColdFusion</H2>

Information about Macromedia ColdFusion

<H2>Wireless</H2>

Information about Bluetooth, 802.11, and more

<H2>This is not valid HTML</H3>

</BODY>

The pattern (<[Hh]1>)(.*?)(</[Hh]1>) breaks the header into three subexpressions: the opening tag, the text, and the closing tag. The second pattern then puts the text back together: $1 contains the start tag, \U$2\E converts the second subexpression (the header text) to uppercase, and $3 contains the end tag.
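Because Python's `re` module has no `\U`...`\E` in replacement strings, a sketch of this example in Python keeps the same three-part pattern but passes a function as the replacement, uppercasing only the second subexpression. The sample HTML is the example text; `uppercase_h1` is a hypothetical name.

```python
import re

HTML = """<BODY>
<H1>Welcome to my Homepage</H1>
Content is divided into two sections:<BR>
<H2>ColdFusion</H2>
Information about Macromedia ColdFusion
<H2>Wireless</H2>
Information about Bluetooth, 802.11, and more
<H2>This is not valid HTML</H3>
</BODY>"""

def uppercase_h1(html: str) -> str:
    # Group 1: opening tag; group 2: header text (lazy .*? stops at the
    # first closing tag); group 3: closing tag. Only group 2 is uppercased.
    return re.sub(r"(<[Hh]1>)(.*?)(</[Hh]1>)",
                  lambda m: m.group(1) + m.group(2).upper() + m.group(3),
                  html)

print(uppercase_h1(HTML))
```

Only the <H1> line changes; the <H2> headers are left alone because the pattern requires a 1 after the H.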

Summary

Subexpressions are used to define sets of characters or expressions. In addition to being used for repeating matches (as seen in the previous lesson), subexpressions can be referred to within patterns. This type of reference is called a backreference (and unfortunately, there are implementation differences in backreference syntax). Backreferences are useful in text matching and in replace operations.

Lesson 9 Looking Ahead and Behind


All the expressions used thus far have matched text, but sometimes you may want to use expressions to mark the position of text to be matched (in contrast to the matched text itself). This involves the use of lookaround (the capability to look ahead and behind), which will be explained in this lesson.

Introducing Lookaround

Again, we'll start with an example. You need to extract the title of a Web page; HTML page titles are placed between <TITLE> and </TITLE> tags in the <HEAD> section of HTML code. Here's the example:

<HEAD>

<TITLE>Ben Forta's Homepage</TITLE>

</HEAD>

<[tT][iI][tT][lL][eE]>.*</[tT][iI][tT][lL][eE]>

<HEAD>

<TITLE>Ben Forta's Homepage</TITLE>

</HEAD>

<[tT][iI][tT][lL][eE]>.*</[tT][iI][tT][lL][eE]> matches the opening <TITLE> tag (in upper, lower, or mixed case), the closing </TITLE> tag, and whatever text is between them. That worked.
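A quick check in Python's `re` module (used here only for illustration) shows what this pattern returns: the full match, `group(0)`, contains the title text together with both tags.

```python
import re

PAGE = """<HEAD>
<TITLE>Ben Forta's Homepage</TITLE>
</HEAD>"""

# Each [xX] class accepts either case, so <TITLE>, <title>, and <Title> all match.
TITLE = r"<[tT][iI][tT][lL][eE]>.*</[tT][iI][tT][lL][eE]>"

match = re.search(TITLE, PAGE)
print(match.group(0))  # → <TITLE>Ben Forta's Homepage</TITLE>
```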

Or did it? What you needed was the title text, but what you got also contained the opening and closing <TITLE> tags. Is it possible to return just the title text?


One solution could be to use subexpressions (as seen in Lesson 7, "Using Subexpressions"). This would allow you to retrieve the matched text in three parts: the opening tag, the text, and the closing tag. With the matched text broken into parts, it would not be too difficult to extract just the part you want.
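Sketched in Python (an illustrative choice), the subexpression approach wraps each part of the same pattern in parentheses and keeps only the second group. The tags are still matched and captured; they are simply ignored afterward.

```python
import re

PAGE = """<HEAD>
<TITLE>Ben Forta's Homepage</TITLE>
</HEAD>"""

# Group 1 = opening tag, group 2 = title text, group 3 = closing tag.
TITLE_PARTS = r"(<[tT][iI][tT][lL][eE]>)(.*)(</[tT][iI][tT][lL][eE]>)"

match = re.search(TITLE_PARTS, PAGE)
title = match.group(2)  # keep the text; groups 1 and 3 are discarded
print(title)  # → Ben Forta's Homepage
```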

But it makes little sense to make the effort to retrieve something that you actually don't want, only to have to manually remove it. What you really need here is a way
