17.4 Getting Control—The Metacharacters
Regular expression metacharacters are characters that do not represent themselves.
They are endowed with special powers that let you control the search pattern in some
way (e.g., find the pattern only at the beginning of the line, or at the end of the line, or
only if it starts with an upper- or lowercase letter). Metacharacters lose their special
meaning if preceded by a backslash. For example, the dot metacharacter represents
any single character except a newline, but when preceded by a backslash it is just a dot
or period.
If you see a backslash preceding a metacharacter, the backslash turns off the meaning
of the metacharacter, but if you see a backslash preceding an alphanumeric character in
a regular expression, then the backslash is used to create a metasymbol. A metasymbol
provides a simpler form to represent some of the regular expression metacharacters. For
example, [0-9] represents any single digit in the range from 0 to 9, and the metasymbol
\d represents the same thing. [0-9] uses the bracketed character class, whereas \d is a
metasymbol (see Table 17.6).
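The two roles of the backslash can be sketched in a few lines. The sample strings and variable names here are illustrative assumptions, not part of the original example set; the sketch only shows a backslash turning a metacharacter off and turning an alphanumeric character into a metasymbol.

```javascript
// A backslash before a metacharacter turns its special meaning off.
var anyChar = /3.14/;      // the dot matches any single character
var literalDot = /3\.14/;  // the dot matches only a period

console.log(anyChar.test("3.14"));     // true
console.log(anyChar.test("3x14"));     // true
console.log(literalDot.test("3.14"));  // true
console.log(literalDot.test("3x14"));  // false

// A backslash before an alphanumeric character creates a metasymbol.
var letterD = /d/;    // matches the letter d
var anyDigit = /\d/;  // matches any one digit

console.log(letterD.test("route 66"));   // false
console.log(anyDigit.test("route 66"));  // true
```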
EXPLANATION
1 A new array object is created.
2 The string “apples pears,peaches:plums,oranges” is assigned to the variable called
myString. The delimiters are a tab, comma, and colon.
3 The regular expression /[\t:,]/ is assigned to the variable called regex.
4 The String object’s split() method splits up the string using a tab, colon, or comma
as the delimiter. The delimiting characters are enclosed in square brackets, which
in regular expression parlance is called a character class. (See the section “Getting
Control—The Metacharacters” on page 733.) In simple terms, any one of the
characters listed within the brackets is a delimiter in the string. The split() method
will search for any one of these characters and split the string accordingly,
returning an array called splitArray.
5 Each of the array elements is displayed in the page. See Figure 17.10.
Figure 17.10 The string is split on tabs, colons, and commas.
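The splitting steps described above can be sketched as follows. The variable names come from the explanation; the tab between apples and pears is an assumption based on the stated delimiters, and console.log stands in for writing to the page.

```javascript
// Assumed input: a tab, commas, and a colon separate the items.
var myString = "apples\tpears,peaches:plums,oranges";
var regex = /[\t:,]/;  // character class: one tab, colon, or comma

// split() cuts the string wherever the character class matches.
var splitArray = myString.split(regex);

for (var i = 0; i < splitArray.length; i++) {
  console.log(i + ": " + splitArray[i]);
}
```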
EXAMPLE 17.10
EXPLANATION
This regular expression contains metacharacters (see Table 17.6). The first one is a
caret (^). The caret metacharacter matches a pattern only if it is at the beginning of
the line. The period (.) matches any single character, including whitespace, but not a
newline. This expression contains three periods, representing any three characters.
To find a literal period or any other character that does not represent itself, the
character must be preceded by a backslash to prevent interpretation.
The expression reads: Search at the beginning of the line for an a, followed by any
three single characters, followed by a c. It will match, for example, abbbc, a123c, a   c,
aAx3c, and so on, but only if those patterns are found at the beginning of the line.
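As a rough check of this reading, the pattern can be tried against a few strings. The pattern /^a...c/ is reconstructed from the description (a caret, an a, three periods, and a c), and the test strings are illustrative assumptions.

```javascript
// Pattern reconstructed from the description; not copied from a listing.
var pattern = /^a...c/;

var samples = ["abbbc", "a123c", "a   c", "aAx3c is here", "xabbbc", "abc"];
for (var i = 0; i < samples.length; i++) {
  console.log('"' + samples[i] + '" -> ' + pattern.test(samples[i]));
}
```

The first four strings match. The fifth fails because the pattern does not start at the beginning of the string, and the sixth fails because there are not three characters between the a and the c.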
Table 17.6 Metacharacters and Metasymbols
Metacharacter/Metasymbol   What It Matches

Character Class: Single Characters and Digits
.                Matches any character except newline
[a-z0-9]         Matches any single character in set
[^a-z0-9]        Matches any single character not in set
\d               Matches a digit, same as [0-9]
\D               Matches a nondigit, same as [^0-9]
\w               Matches an alphanumeric (word) character
\W               Matches a nonalphanumeric (nonword) character

Character Class: Whitespace Characters
\0               Matches a null character
\s               Matches a whitespace character: space, tab, or newline
\S               Matches a nonwhitespace character

Character Class: Anchored Characters
^                Matches at the beginning of the line
$                Matches at the end of the line
\A               Matches the beginning of the string only (Perl; not supported in JavaScript)
\b               Matches a word boundary (when not inside [ ])
\B               Matches a nonword boundary
\G               Matches where previous m//g left off (Perl; not supported in JavaScript)
\Z               Matches the end of the string or line (Perl; not supported in JavaScript)
\z               Matches the end of string only (Perl; not supported in JavaScript)

Character Class: Repeated Characters
x?               Matches 0 or 1 of x
x*               Matches 0 or more of x
x+               Matches 1 or more of x
(xyz)+           Matches one or more patterns of xyz
x{m,n}           Matches at least m of x and no more than n of x

Character Class: Alternative Characters
was|were|will    Matches one of was, were, or will

Character Class: Remembered Characters
(string)         Used for backreferencing (see the section “Remembering or
                 Capturing” on page 762)
\1 or $1         Matches first set of parentheses
\2 or $2         Matches second set of parentheses
\3 or $3         Matches third set of parentheses
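A few rows of Table 17.6 can be tried out in a short sketch. The patterns and test strings below are illustrative assumptions chosen to exercise the repetition, alternation, word-boundary, and backreference entries; they are not taken from the book's examples.

```javascript
var optional = /colou?r/;           // x?     : 0 or 1 of u
var oneOrMore = /go+gle/;           // x+     : 1 or more of o
var bounded = /^\d{2,4}$/;          // x{m,n} : at least 2 and at most 4 digits
var alternative = /was|were|will/;  // one of three words
var wholeWord = /\bcat\b/;          // word boundaries on both sides
var doubled = /(\w+) \1/;           // \1 repeats what the first group matched

console.log(optional.test("color"), optional.test("colour"));                // true true
console.log(oneOrMore.test("ggle"), oneOrMore.test("gooogle"));              // false true
console.log(bounded.test("123"), bounded.test("12345"));                     // true false
console.log(alternative.test("they were here"));                             // true
console.log(wholeWord.test("concatenate"), wholeWord.test("the cat sat"));   // false true
console.log(doubled.test("it is is here"));                                  // true
```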
If you are searching for a particular character within a regular expression, you can use
the dot metacharacter to represent a single character, or a character class that matches
on one character from a set of characters. In addition to the dot and character class,
JavaScript has added some backslashed symbols (called metasymbols) to represent single
characters. See Table 17.7 for the single-character metacharacters, and Table 17.8 on
page 742 for a list of metasymbols.
The dot metacharacter matches any single character with the exception of the newline
character. For example, the regular expression /a.b/ is matched if the string contains
an a, followed by any one single character (except the \n), followed by b, whereas the
expression /.../ matches any string containing at least three consecutive characters
other than newlines.
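A minimal sketch of the dot's behavior, including the newline exception, might look like this. The test strings are assumptions chosen for illustration.

```javascript
var aDotB = /a.b/;
console.log(aDotB.test("a-b"));    // true
console.log(aDotB.test("xaxbx"));  // true: axb appears inside the string
console.log(aDotB.test("ab"));     // false: nothing between a and b
console.log(aDotB.test("a\nb"));   // false: the dot does not match a newline

var threeDots = /.../;
console.log(threeDots.test("abc"));  // true
console.log(threeDots.test("ab"));   // false: fewer than three characters
```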
Table 17.6 Metacharacters and Metasymbols (continued)
New with JavaScript 1.5
(?:x)            Matches x but does not remember the match. These are called
                 noncapturing parentheses. The matched substring cannot be recalled
                 from the resulting array’s elements [1], ..., [n] or from the predefined
                 RegExp object’s properties $1, ..., $9.
x(?=y)           Matches x only if x is followed by y. For example, /Jack(?=Sprat)/
                 matches Jack only if it is followed by Sprat. /Jack(?=Sprat|Frost)/
                 matches Jack only if it is followed by Sprat or Frost. However, neither
                 Sprat nor Frost is part of the match results.
x(?!y)           Matches x only if x is not followed by y. For example, /\d+(?!\.)/
                 matches a number only if it is not followed by a decimal point.
                 /\d+(?!\.)/.exec("3.141") matches 141 but not 3.141.
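The lookahead and noncapturing entries can be checked with a short sketch. The patterns come from the table; the strings JackSprat, JackHorner, and Jack Sprat are assumptions added for illustration.

```javascript
// x(?=y): match Jack only when Sprat or Frost follows.
var lookahead = /Jack(?=Sprat|Frost)/;
console.log(lookahead.exec("JackSprat")[0]);  // Jack (Sprat is not part of the match)
console.log(lookahead.test("JackHorner"));    // false

// x(?!y): match digits that are not followed by a decimal point.
console.log(/\d+(?!\.)/.exec("3.141")[0]);    // 141

// (?:x): group without remembering the match.
var capturing = /(Jack) (Sprat)/.exec("Jack Sprat");
var nonCapturing = /(?:Jack) (Sprat)/.exec("Jack Sprat");
console.log(capturing[1]);     // Jack
console.log(nonCapturing[1]);  // Sprat: Jack was matched but not captured
```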
Table 17.7 Single-Character and Single-Digit Metacharacters
Metacharacter    What It Matches
.                Matches any character except newline
[a-z0-9_]        Matches any single character in set
[^a-z0-9_]       Matches any single character not in set
EXAMPLE 17.11
<html>
<head><title>The dot Metacharacter</title></head>
<body>
<script type="text/javascript">
/* 1 */ var textString = "Norma Jean";
/* 2 */ var reg_expression = /N..ma/;
/* 3 */ var result = reg_expression.test(textString);  // Returns true or false
        document.write(result + "<br />");
/* 4 */ if (reg_expression.test(textString)) {         // Same as if (result)
           document.write("<b>The reg_ex /N..ma/ matched the string \"" +
                          textString + "\".</b><br />");
        } else {
/* 5 */    document.write("No Match!");
        }
</script>
</body>
</html>
EXPLANATION
1 The variable textString is assigned the string “Norma Jean”.
2 The regular expression /N..ma/ is assigned to the variable reg_expression. A match
is found if the string being tested contains an uppercase N followed by any two
single characters (each dot represents one character), and an m and an a. It would
find Norma, No man, Normandy, and so on.
3 The test() method returns true if the string textString matches the regular expression
and false if it doesn’t. The variable result contains either true or false.
4 If the string “Norma Jean” contains the regular expression pattern /N..ma/, the return
from the test() method is true, and the output is sent to the screen as shown in
Figure 17.11.
5 If the pattern is not found, No Match! is displayed on the page.
Figure 17.11 The string Norma Jean contains an N followed by any two characters
and ma.
A character class represents one character from a set of characters. For example, [abc]
matches either an a, b, or c; [a-z] matches one character in the range from a to z; and
[0-9] matches one digit in the range from 0 to 9. If the character class contains a leading
caret, ^, then the class represents any one character not in the set; thus, [^a-zA-Z]
matches a single character not in the range from a to z or A to Z, and [^0-9] matches a
single character that is not a digit.
JavaScript provides additional symbols, called metasymbols, to represent a character
class. The symbols \d and \D represent a single digit and a single nondigit, respectively,
the same as [0-9] and [^0-9]; whereas \w and \W represent a single word character and
a single nonword character, respectively, the same as [A-Za-z_0-9] and [^A-Za-z_0-9].
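The following sketch exercises a plain class, a negated class, and the \w and \W metasymbols. The variable names and sample strings are assumptions made for illustration.

```javascript
var oneOfAbc = /[abc]/;       // one of a, b, or c
var notLetter = /[^a-zA-Z]/;  // one character that is not a letter

console.log(oneOfAbc.test("zebra"));    // true: contains b and a
console.log(notLetter.test("hello"));   // false: every character is a letter
console.log(notLetter.test("hello!"));  // true: ! is not a letter

// \w picks out the same characters as [A-Za-z_0-9] in this sample.
var sample = "user_42 @home";
var viaMetasymbol = sample.match(/\w/g).join("");
var viaClass = sample.match(/[A-Za-z_0-9]/g).join("");
console.log(viaMetasymbol === viaClass);  // true
console.log(sample.match(/\W/g));         // the space and the @
```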
EXAMPLE 17.12
<html>
<head><title>The Character Class</title></head>
<body>
<script type="text/javascript">
/* 1 */ var reg_expression = /[A-Z][a-z]eve/;
/* 2 */ var textString = prompt("Type a string of text", "");
/* 3 */ var result = reg_expression.test(textString);  // Returns true or false
        document.write(result + "<br />");
        if (result) {
           document.write("<b>The reg_ex /[A-Z][a-z]eve/ matched the string \"" +
                          textString + "\".</b><br />");
        } else {
           alert("No Match!");
        }
</script>
</body>
</html>
EXPLANATION
1 The variable reg_expression is assigned a regular expression built from bracketed
character classes. This regular expression matches a string that contains one
uppercase character in the range from A to Z, followed by one lowercase character
in the range from a to z, followed by eve.
2 The variable textString is assigned user input; in this example, Steven lives in
Cleveland was entered.
3 The regular expression test() method will return true because Steven contains an
uppercase character, followed by a lowercase character, and eve. Cleveland also
matches the pattern. The variable result contains either true or false. See the
output in Figures 17.12 and 17.13.
Figure 17.12 The user entered Steven lives in Cleveland. The pattern is one uppercase
letter [A-Z], followed by one lowercase letter [a-z], followed by eve. This matches both
Steven and Cleveland.
Figure 17.13 When the user entered Believe! (top), it didn’t match (bottom).
Would it have matched if he or she had entered BeLieve? Why?
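One way to explore the question is to ask which substring the pattern actually finds. This sketch uses exec() and match() instead of the prompt dialog box, so the three inputs are supplied directly; that substitution is an assumption made to keep the sketch self-contained.

```javascript
var reg_expression = /[A-Z][a-z]eve/;
var inputs = ["Steven lives in Cleveland", "Believe!", "BeLieve"];

for (var i = 0; i < inputs.length; i++) {
  var found = reg_expression.exec(inputs[i]);  // null when nothing matches
  console.log(inputs[i] + " -> " + (found ? found[0] : "No Match!"));
}

// With the g flag, match() lists every matching substring.
var allMatches = inputs[0].match(/[A-Z][a-z]eve/g);
console.log(allMatches.join(", "));
```

In BeLieve the uppercase L is followed by the lowercase i and then eve, so the pattern finds Lieve. In Believe! the only uppercase letter is B, and the text after Be is lie rather than eve.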
EXAMPLE 17.13
<html>
<head><title>The Character Class</title></head>
<body>
<script type="text/javascript">
        // Character class
/* 1 */ var reg_expression = /[A-Za-z0-9_]/;  // A single alphanumeric word character
/* 2 */ var textString = prompt("Type a string of text", "");
/* 3 */ var result = reg_expression.test(textString);  // Returns true or false
        document.write(result + "<br />");
        if (result) {
           document.write("<b>The reg_ex /[A-Za-z0-9_]/ matched the string \"" +
                          textString + "\".</b><br />");
        } else {
           alert("No Match!");
        }
</script>
</body>
</html>
EXPLANATION
1 A regular expression consisting of the bracketed character class [A-Za-z0-9_] is
assigned to the variable called reg_expression. This regular expression matches a
string that contains at least one character from the class: a letter from A to Z or
a to z, a digit from 0 to 9, or the underscore character, _.
2 User input is entered in the prompt dialog box and assigned to the variable
textString. In this example the user entered Take 5.
3 The regular expression test() method will return true because the string Take 5
contains at least one alphanumeric character (see Figure 17.14).
Figure 17.14 The user entered Take 5 (top). The string contained at least one
alphanumeric character (bottom).
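Because test() succeeds as soon as one character fits the class, it may help to contrast that with a pattern that requires every character to fit. The anchored pattern below is an illustrative assumption, not part of the example.

```javascript
var anyWordChar = /[A-Za-z0-9_]/;       // at least one word character anywhere
var onlyWordChars = /^[A-Za-z0-9_]+$/;  // every character must be in the class

console.log(anyWordChar.test("Take 5"));    // true
console.log(onlyWordChars.test("Take 5"));  // false: the space is not in the class
console.log(onlyWordChars.test("Take_5"));  // true
console.log(anyWordChar.test("?! ..."));    // false: no word characters at all
```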
EXAMPLE 17.14
<html>
<head><title>The Character Class and Negation</title></head>
<body>
<script type="text/javascript">
        // Negation within a Character Class
/* 1 */ var reg_expression = /[^0-9]/;
/* 2 */ var textString = prompt("Type a string of text", "");
/* 3 */ var result = reg_expression.test(textString);  // Returns true or false
        document.write(result + "<br />");
        if (result) {
           document.write("<b>The reg_ex /[^0-9]/ matched the string \"" +
                          textString + "\".</b><br />");
        } else {
           alert("No Match!");
        }
</script>
</body>
</html>
EXPLANATION
1 The caret inside a character class, when it is the first character after the opening
bracket, creates a negation, meaning any character not in this range. This regular
expression matches a string that contains at least one character that is not a digit
between 0 and 9.
2 User input is assigned to the variable textString. In this example, abc was entered.
3 The regular expression test() method will return true because the string abc
contains a character that is not in the range from 0 to 9 (see Figure 17.15).
Figure 17.15 The user entered abc. It contains a character that is not in the range
between 0 and 9.
Metasymbols offer an alternative way to represent a character class. For example, instead
of representing a digit as [0-9], it can be represented as \d, and the alternative for
representing a nondigit [^0-9] is \D. Metasymbols are easier to use and to type than
metacharacters.
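As a small sketch of how the metasymbols \d and \D stand in for the classes [0-9] and [^0-9], the loop below tests a few strings both ways. The input strings are assumptions chosen to cover letters only, digits only, and a mix.

```javascript
var inputs = ["abc", "2024", "route 66"];

for (var i = 0; i < inputs.length; i++) {
  var text = inputs[i];
  var hasDigit = /\d/.test(text);     // same result as /[0-9]/.test(text)
  var hasNondigit = /\D/.test(text);  // same result as /[^0-9]/.test(text)
  console.log(text + ": digit=" + hasDigit + ", nondigit=" + hasNondigit);
}

// A string made only of digits is the case where the negated class fails.
console.log(/\D/.test("2024") === /[^0-9]/.test("2024"));  // true: both are false
```

Note that a mixed string such as route 66 passes both tests, because each pattern needs only one matching character somewhere in the string.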