17 Regular Expressions and Pattern Matching
17.1 What Is a Regular Expression?
A user is asked to fill out an HTML form and provide his or her name, address, and birth
date. Before sending the form off to a server for further processing, a JavaScript program
checks the form to make sure the user actually entered something, and that the
information is in the requested format. We saw in Chapter 11, “Working with Forms and
Input Devices,” some basic ways that JavaScript can check form information, but now,
with the addition of regular expressions, form validation can be much more
sophisticated and precise. Regular expressions are also useful for searching for patterns in
input data, and replacing the data with something else or splitting it up into substrings. This
chapter is divided into two main parts: (1) how to create regular expressions and
regular expression metacharacters, and (2) how to validate form input data with regular
expressions. If you are savvy with Perl regular expressions (or the UNIX utilities grep,
sed, and awk), you can move rapidly through the first section, because JavaScript regular
expressions, for the most part, are identical to those found in Perl.
A regular expression is really just a sequence of characters that specifies a pattern to be
matched against a string of text when performing searches and replacements. A simple
regular expression consists of a character or set of characters that matches itself. The
regular expression is normally delimited by forward slashes; for example, /abc/.
Like Perl, JavaScript¹ provides a large variety of regular expression metacharacters to
control the way a pattern is found. A metacharacter is a special character that represents
something other than itself, such as ^, $, *, and so on. Metacharacters are placed within the
regular expression to control the search pattern; for example, /^abc/ means look for the
pattern abc at the beginning of the line. With the help of metacharacters, you can look for
strings containing only digits, only alphas, a digit at the beginning of the line followed
by any number of alphas, a line ending with a digit, and so on. When searching for a
pattern of characters, the possibilities for fine-tuning your search are endless.
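The searches just described can be sketched with a few short patterns. This is only an illustration: the bracket classes and the + and * quantifiers are assumed here ahead of their formal introduction, the variable names and sample strings are made up, and console.log() stands in for the alert() dialog so the lines can be run outside a browser.

```javascript
// Hypothetical patterns for the searches described in the text.
var startsWithAbc   = /^abc/;              // abc at the beginning of the line
var onlyDigits      = /^[0-9]+$/;          // a string containing only digits
var onlyAlphas      = /^[a-zA-Z]+$/;       // a string containing only alphas
var digitThenAlphas = /^[0-9][a-zA-Z]*/;   // a leading digit, then any number of alphas
var endsWithDigit   = /[0-9]$/;            // a line ending with a digit

console.log(startsWithAbc.test("abcdef"));    // true
console.log(startsWithAbc.test("xabc"));      // false
console.log(onlyDigits.test("2024"));         // true
console.log(onlyDigits.test("20a4"));         // false
console.log(onlyAlphas.test("JavaScript"));   // true
console.log(digitThenAlphas.test("7up"));     // true
console.log(endsWithDigit.test("Route 66"));  // true
```

Each pattern is tested with the test() method, which is covered later in this chapter.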
¹ JavaScript 1.2, NES 3.0. JavaScript 1.3 added the toSource() method. JavaScript 1.5, NES 6.0 added the m flag,
nongreedy modifier, noncapturing parentheses, and look-ahead assertions. ECMA 262, Edition 3.
Again, JavaScript regular expressions are used primarily to verify data input on the
client side. When a user fills out a form and presses the submit button, the form is sent
to a server, and then often to a server script such as PHP, ASP.NET, or a Java servlet for
further processing. Although forms can be validated by a server program, it is more
efficient to take care of the validation before sending the form to the server. This is an
important function of JavaScript. The user fills out the form and JavaScript checks to see
if all the boxes have been filled out correctly, and if not, the user is told to reenter the
data before the form is submitted to the server. Checking the form on the client side
allows for instant feedback, and less traveling back and forth between the browser and
server. It might be that the server-side program does its own validation anyway, but if
JavaScript has already done the job, it will still save time and inconvenience for the user.
With the power provided by regular expressions, the ability to check for any type of
input, such as e-mail addresses, passwords, Social Security numbers, and birthdates, is
greatly simplified. This chapter will teach you how regular expressions and their
metacharacters are used so that you will be able to read expressions even as complicated
as the one shown in Figure 17.1. There are a number of regular expression validators
and libraries on the Web. An excellent source is at http://www.regexlib.com.
Figure 17.1 A regular expression library. The user types “email” in the Search box.
See Figure 17.2 for results.
17.2 Creating a Regular Expression
A regular expression is a pattern of characters. It shouldn’t be any surprise by now that
JavaScript regular expressions are objects. When you create a regular expression, you test
the regular expression against a string. For example, the regular expression /green/ might
be matched against the string “The green grass grows”. If green is contained in the string,
then there is a successful match.
Building a regular expression is like building a JavaScript string. If you recall, you can
create a String object the literal way or you can use the String() constructor method. To
build a regular expression object, you can assign a literal regular expression to a variable,
or you can use the RegExp constructor to create and return a regular expression object.
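As a quick sketch of the /green/ example, the lines below match the pattern against two strings. The variable names and the second string are illustrative, and console.log() is used in place of a browser dialog box.

```javascript
// The pattern from the text, held in a variable like any other object.
var greenPattern = /green/;

// true: green is contained in the string
var found = greenPattern.test("The green grass grows");

// false: green does not appear in this made-up string
var missing = greenPattern.test("The grass grows");

console.log(typeof greenPattern);  // object
console.log(found, missing);       // true false
```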
17.2.1 The Literal Way
To create a regular expression object with the literal notation, you assign the regular
expression to a variable. The regular expression is a pattern of characters enclosed in
forward slashes. After the closing forward slash, options may be provided to modify
the search pattern. The options are i, g, and m. See Table 17.1.
If you are not going to change the regular expression, say, if it is hard-coded right into
your script, then this literal notation is faster, because the regular expression is compiled
when the script is loaded.
FORMAT
var variable_name = /regular expression/options;
EXAMPLE
var myreg = /love/;
var reobj = /san jose/ig;
Table 17.1 Options Used for Modifying Search Patterns
Option   Purpose
i        Used to ignore case.
g        Used to match for all occurrences of the pattern in the string.
m        Used to match over multiple lines.
Figure 17.2 The result of searching for the email regular expression considered to
be the best.
17.2.2 The Constructor Method
The constructor method, called RegExp(), creates a RegExp object. The RegExp()
constructor takes one or two arguments. The first argument is the regular expression; it is a
string representing the regular expression, for example, “green” represents the literal
regular expression /green/. The second, optional argument is a string of flags, such as i for
case insensitivity or g for global. The constructor method is used when the regular
expression is being provided from some other place, such as from user input, and can
change throughout the run of the program. With this method, the regular expression is
compiled at runtime.
FORMAT
var variable_name = new RegExp("regular expression", "options");
EXAMPLE
var myreg = new RegExp("love");
var reobj = new RegExp("san jose", "ig");
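To compare the two notations side by side, here is a minimal sketch. The helper function makeSearcher() and the variable userWord are hypothetical names introduced only for this illustration; userWord stands in for text that would normally come from a form field. console.log() is used instead of alert() so the lines can be run outside a browser.

```javascript
// The same pattern built both ways.
var literalRe = /san jose/ig;
var builtRe = new RegExp("san jose", "ig");
console.log(literalRe.source === builtRe.source);   // true
console.log(builtRe.global && builtRe.ignoreCase);  // true

// Inside a string, a backslash must be doubled to reach the pattern.
var digitsRe = new RegExp("\\d+");                   // same pattern as /\d+/
console.log(digitsRe.source === /\d+/.source);      // true

// Hypothetical helper: build a case-insensitive pattern from text
// that is only known while the program is running.
function makeSearcher(word) {
    return new RegExp(word, "i");
}

var userWord = "LOVE";  // stands in for user input
console.log(makeSearcher(userWord).test("My gloves are worn for wear."));  // true
```

Note that a word taken from real user input might itself contain metacharacters, which this sketch does not attempt to handle.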
17.2.3 Testing the Expression
The RegExp object has two methods that can be used to test for a match in a string,
the test() method and the exec() method, which are quite similar. The test() method
searches for a regular expression in a string and returns true if it matched and false if
it didn’t. The exec() method also searches for a regular expression in a string. If the
exec() method succeeds, it returns an array of information including the search string,
and the parts of the string that matched. If it fails, it returns null. This is similar to the
match() method of the String object. Table 17.2 summarizes the methods of the
RegExp object.
The test() Method. The RegExp object’s test() method is used to see if a string
contains the pattern represented in the regular expression. It returns a true or false
Boolean value. If a global search is done (the g option), then after a successful match the
lastIndex property of the RegExp object contains the position in the string where the next
search would start, that is, the position immediately following the last pattern that was
matched. (A string starts at character position 0.) Without the g option, lastIndex stays
at 0. (See Example 17.4 to see how the lastIndex property is used.)
Steps to test for a match:
1. Assign a regular expression to a variable.
2. Use the regular expression test() method to see if there is a match. If there is a
match, the test() method returns true; otherwise, it returns false.
There are also four string methods that can be used with regular expressions. (See the section
“String Methods Using Regular Expressions” on page 727.)
Table 17.2 Methods of the RegExp Object
Method   What It Does
exec     Executes a search for a match in a string and returns an array.
test     Tests for a match in a string and returns either true or false.
FORMAT
var string = "String to be tested goes here";
var regex = /regular expression/;  // Literal way
var regex = new RegExp("regular expression");  // Constructor way
regex.test(string);
or
/regular expression/.test("string");
EXAMPLE
var myString = "She wants attention now!";
var regex = /ten/;  // Literal way
var regex = new RegExp("ten");  // Constructor way
regex.test(myString);  // Looking for "ten" in myString
or
/ten/.test("She wants attention now!");
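The interaction between test() and lastIndex is easiest to see by calling test() several times on the same string. The following is a small sketch with made-up variable names, using console.log() in place of document.write(); it assumes a global pattern is reused for every call.

```javascript
var sampleText = "I love my new gloves!";

// With the g option, each successful test() moves lastIndex forward.
var globalRe = /love/g;
var first = globalRe.test(sampleText);
var afterFirst = globalRe.lastIndex;    // just past "love"
var second = globalRe.test(sampleText);
var afterSecond = globalRe.lastIndex;   // just past the "love" in "gloves"
var third = globalRe.test(sampleText);  // nothing left to find
var afterThird = globalRe.lastIndex;    // reset after the failed search

console.log(first, afterFirst);    // true 6
console.log(second, afterSecond);  // true 19
console.log(third, afterThird);    // false 0

// Without the g option, every search starts at position 0.
var plainRe = /love/;
console.log(plainRe.test(sampleText), plainRe.lastIndex);  // true 0
```

Because a global pattern remembers where it stopped, reusing one for a yes-or-no check on different strings can give surprising results; a pattern without the g option avoids that.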
EXAMPLE 17.1
<html>
<head><title>Regular Expression Objects the Literal Way</title>
<script type="text/javascript">
1   var myString="My gloves are worn for wear.";
2   var regex = /love/;  // Create a regular expression object
3   if (regex.test(myString)){
4       alert("Found pattern!");
    } else{
5       alert("No match.");
    }
</script>
</head>
<body></body>
</html>
EXPLANATION
1 “My gloves are worn for wear.” is assigned to a variable called myString.
2 The regular expression /love/ is assigned to the variable called regex. This is the
literal way of creating a regular expression object.
3 The test() method for the regular expression object tests to see if myString contains
the pattern, love. If love is found within gloves, the test() method will return true.
4 The alert dialog box will display Found pattern! if the test() method returned true.
5 If the pattern /love/ is not found in myString, the test() method returns false, and
the alert dialog box will display its message, No match.
EXAMPLE 17.2
<html>
<head>
<title>Regular Expression Objects with the Constructor</title>
<script type="text/javascript">
1   var myString="My gloves are worn for wear.";
2   var regex = new RegExp("love");  // Creating a regular
                                     // expression object
3   if ( regex.test(myString)){
4       alert("Found pattern love!");
    } else{
5       alert("No match.");
    }
</script>
</head>
<body></body>
</html>
EXPLANATION
1 The variable called myString is assigned “My gloves are worn for wear.”
2 The RegExp() constructor creates a new regular expression object, called regex.
This is the constructor way of creating a regular expression object. It is assigned
the string “love”, the regular expression.
3 The test() method for the regular expression object tests to see if myString
contains the pattern, love. If it finds love within gloves, it will return true.
4, 5 The alert dialog box will display Found pattern love! if the test() method returned true,
or No match. if it returns false. See Figure 17.3.
Figure 17.3 “My gloves are worn for wear.” contains the pattern love.
The exec() Method. The exec() method executes a search to find a match for a
specified pattern in a string. If it doesn’t find a match, exec() returns null; otherwise it returns
an array containing the string that matched the regular expression.
FORMAT
array = regular_expression.exec(string);
EXAMPLE
list = /ring/.exec("Don't string me along, just bring me the goods.");
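To see what exec() actually hands back, the sketch below inspects the returned array for the /ring/ example. The variable names, the parenthesized pattern, and the non-matching pattern /rung/ are illustrative additions, and console.log() replaces alert() so the lines can be run outside a browser.

```javascript
var sentence = "Don't string me along, just bring me the goods.";

// A successful search returns an array; element 0 is the matched text.
var list = /ring/.exec(sentence);
console.log(list[0]);                  // ring
console.log(list.index);               // 8, where the match starts (inside "string")
console.log(list.input === sentence);  // true, the string that was searched

// Parentheses in the pattern add the captured parts to the array.
var parts = /just (bring)/.exec(sentence);
console.log(parts[0]);                 // just bring
console.log(parts[1]);                 // bring

// A failed search returns null rather than an empty array.
var none = /rung/.exec(sentence);
console.log(none);                     // null
```

Without the g option, only the first occurrence is reported, which is why the match comes from string and not from bring.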
EXAMPLE 17.3
<html>
<head><title>The exec() method</title>
<script type="text/javascript">
1   var myString="My lovely gloves are worn for wear, Love.";
2   var regex = /love/i;  // Create a regular expression object
3   var array=regex.exec(myString);
4   if (array != null){  // exec() returned an array, so there was a match
        alert("Matched! " + array);
    } else{
        alert("No match.");
    }
</script>
</head>
<body></body>
</html>
EXPLANATION
1 The string “My lovely gloves are worn for wear, Love.” is assigned to myString.
2 The regular expression /love/i is assigned to the variable regex. The i option makes
the search case insensitive.
3 The exec() method returns an array of values that were found.
4 If the exec() method didn’t return null, then there was a match. See Figure 17.4.
Figure 17.4 The array returned by exec() contains love.
17.2.4 Properties of the RegExp Object
There are two types of properties that can be applied to a RegExp object. The first type is
called a class property (see Table 17.3) and applies to the RegExp object as a whole, not
to a single instance of a regular expression object. The input property is an example of a
class property. It contains the string against which the last match was performed, and is
applied directly to the RegExp object as RegExp.input.
The other type of property is called an instance property and is applied to an instance
of the object (see Table 17.4); for example, mypattern.lastIndex refers to the position
within the string where the next search will start for this instance of the regular
expression object, called mypattern. These properties will be explained in examples throughout
this chapter.
Table 17.3 Class Properties of the RegExp Object
Property          What It Describes
input             Represents the input string being matched.
lastMatch         Represents the last matched characters.
lastParen         Represents the last parenthesized substring pattern match.
leftContext       Represents the substring preceding the most recent pattern match.
rightContext      Represents the substring following the most recent pattern match.
RegExp.$*         Boolean value that specifies whether strings should be searched over
                  multiple lines; same as the multiline property.
RegExp.$&         Represents the last matched characters.
RegExp.$_         Represents the string input that is being matched.
RegExp.$`         Represents the substring preceding the most recent pattern match (see
                  the leftContext property).
RegExp.$'         Represents the substring following the most recent pattern match (see
                  the rightContext property).
RegExp.$+         Represents the last parenthesized substring pattern match (see the
                  lastParen property).
RegExp.$1,$2,$3   Used to capture substrings of matches.
Table 17.4 Instance Properties of the RegExp Object
Property     What It Describes
global       Boolean to specify if the g option was used to check the expression
             against all possible matches in the string.
ignoreCase   Boolean to specify if the i option was used to ignore case during a string
             search.
lastIndex    If the g option was used, specifies the character position immediately
             following the last match found by exec() or test().
multiline    Boolean to test if the m option was used to search across multiple lines.
source       The text of the regular expression.
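The instance properties in Table 17.4 can be read directly from any regular expression object. Here is a brief sketch, assuming a pattern named mypattern as in the text; the sample string is made up, and console.log() is used in place of document.write().

```javascript
// An instance created with the g and i options.
var mypattern = /love/gi;

console.log(mypattern.source);      // love
console.log(mypattern.global);      // true
console.log(mypattern.ignoreCase);  // true
console.log(mypattern.multiline);   // false
console.log(mypattern.lastIndex);   // 0, no search has been done yet

// After a successful global search, lastIndex points just past the match.
var matched = mypattern.test("I Love my new gloves!");
console.log(matched, mypattern.lastIndex);  // true 6
```

Note that source holds only the text between the slashes, without the delimiters or the options.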
EXAMPLE 17.4
<html>
<head>
<title>The test() method</title>
</head>
<body bgcolor="silver">
<font face="arial" size="+1">
<script type="text/javascript">
1   var myString="I love my new gloves!";
2   var regex = /love/g;  // Create a regular expression object
3   var booleanResult = regex.test(myString);
    if ( booleanResult != false ){
4       document.write("Tested regular expression <em>"+
            regex.source + ".</em> The result is <em>"
            + booleanResult + "</em>");
        document.write(".<br>Starts searching again at position " +
5           regex.lastIndex + " in string<em> \"" +
6           RegExp.input + "\"</em><br />");
        document.write("The last matched characters were: "+
7           RegExp.lastMatch+"<br />");
        document.write("The substring preceding the last match is: "+
8           RegExp.leftContext+"<br />");
        document.write("The substring following the last match is: "+
9           RegExp.rightContext+"<br />");
    } else{ alert("No match!"); }
</script>
</font>
</body>
</html>
EXPLANATION
1 The string object to be tested is created.
2 A regular expression object, called regex, is created.
3 The test() method returns true if the regular expression is matched in the
string, and false otherwise.
4 The source property is applied to regex, an instance of a RegExp object. It contains
the text of the regular expression, love.
5 The lastIndex property is applied to an instance of a RegExp object. It represents
the character position right after the last matched string.
6 The input class property represents the input string on which the pattern
matching (regular expression) is performed.
7 lastMatch is a class property that represents the characters that were last matched.