To demonstrate the use of string boundaries, look at the following example. Valid XML documents begin with <?xml> and likely have additional attributes
(possibly a version number, as in <?xml version="1.0" ?>). Following is a simple test to check whether text is an XML document:
<?xml version="1.0" encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"
<\?xml.*\?>
<?xml version="1.0" encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"
The pattern appeared to work. <\?xml matches <?xml, .* matches any other text (zero or more instances of .), and \?> matches the end ?>.
But this is a very inaccurate test. Look at the example that follows; the same pattern is being used to match text with extraneous text before the XML opening:
This is bad, real bad!
<?xml version="1.0" encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"
<\?xml.*\?>
This is bad, real bad!
<?xml version="1.0" encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"
The pattern <\?xml.*\?> matched the second line of the text. And although the opening XML tag may, in fact, be on the second line of text, this example is
definitely invalid (and processing the text as XML could cause all sorts of
problems).
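The problem can be reproduced in a few lines. The following is a minimal sketch using Python's re module (the choice of Python and the variable names are assumptions made for illustration; the text itself is engine-neutral). It shows that the unanchored pattern reports a match even when other text precedes the XML opening:

```python
import re

# Pattern from the text: no anchor, so it may match anywhere in the string.
xml_opening = re.compile(r"<\?xml.*\?>")

good = ('<?xml version="1.0" encoding="UTF-8" ?>\n'
        '<wsdl:definitions targetNamespace="http://tips.cf"')
bad = "This is bad, real bad!\n" + good

good_match = xml_opening.search(good)
bad_match = xml_opening.search(bad)

# Both searches succeed; only the starting offset differs.
print(good_match.start())  # the match begins at the very start of the string
print(bad_match.start())   # the match begins after the extraneous first line
```

Because search() is free to begin matching at any position, the extraneous first line does nothing to prevent a match.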
What is needed is a test that ensures that the opening XML tag is the first actual text in the string, and that's a perfect job for the ^ metacharacter as seen next:
<?xml version="1.0" encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"
^\s*<\?xml.*\?>
<?xml version="1.0" encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"
The opening ^ matches the start of the string; ^\s* therefore matches the start of the string followed by zero or more whitespace characters (thus handling legitimate spaces, tabs, or line breaks before the XML opening). The complete
^\s*<\?xml.*\?> thus matches an opening XML tag with any attributes and correctly handles whitespace, too.
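As a rough check of the anchored pattern, here is a short Python sketch. The helper name looks_like_xml is hypothetical, and Python's re module is assumed only as one convenient engine. Leading whitespace is accepted, while leading text is rejected:

```python
import re

# Anchored pattern from the text: optional whitespace, then the XML opening.
anchored = re.compile(r"^\s*<\?xml.*\?>")

def looks_like_xml(text):
    """Hypothetical helper: True if the XML opening is the first real text."""
    return anchored.search(text) is not None

declaration = '<?xml version="1.0" encoding="UTF-8" ?>'

print(looks_like_xml(declaration))                              # True
print(looks_like_xml("  \n\t" + declaration))                   # True
print(looks_like_xml("This is bad, real bad!\n" + declaration)) # False
```

Without multiline mode, ^ can match only at the very start of the string, so the third call fails even though the declaration sits at the start of the second line.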
Tip
The pattern ^\s*<\?xml.*\?> worked, but only because the
XML shown in this example is incomplete. Had a complete XML
listing been used, you would have seen an example of a greedy
quantifier at work. This is, therefore, a great example of when to
use *? instead of just *.
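To make the greedy behavior concrete, here is a small Python sketch. The sample text is invented for illustration and deliberately puts two ?> sequences on one line, because in Python . does not match a line break by default (other engines and modes may differ):

```python
import re

# Invented sample: an XML declaration followed by a processing instruction,
# both on a single line so that . can reach the second ?>.
text = '<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="style.xsl" ?>'

greedy = re.search(r"^\s*<\?xml.*\?>", text).group()
lazy = re.search(r"^\s*<\?xml.*?\?>", text).group()

print(greedy)  # runs through to the last ?> on the line
print(lazy)    # stops at the first ?>
```

The greedy .* consumes as much as it can and backs off only as far as the last ?>, whereas the lazy .*? stops at the first one, which is what an XML-opening test actually wants.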
$ is used much the same way. This pattern could be used to check that nothing comes after the closing </html> tag in a Web page:
</[Hh][Tt][Mm][Ll]>\s*$
Sets are used for each of the characters H, T, M, and L (so as to be able to handle any combination of upper- or lowercase characters), and \s*$ matches any
whitespace followed by the end of a string.
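The following Python sketch exercises this pattern against a few invented page fragments. The helper name ends_with_html_close is hypothetical, and the sample strings are not taken from the text:

```python
import re

# Pattern from the text: closing html tag in any letter case,
# optional trailing whitespace, then the end of the string.
closing_html = re.compile(r"</[Hh][Tt][Mm][Ll]>\s*$")

def ends_with_html_close(page):
    """Hypothetical helper: True if nothing but whitespace follows </html>."""
    return closing_html.search(page) is not None

print(ends_with_html_close("<html><body></body></HTML>\n  "))  # True
print(ends_with_html_close("<html><body></body></HtMl>"))      # True
print(ends_with_html_close("<html></html>\n<p>stray</p>"))     # False
```

In the last case, the whitespace after </html> is followed by more text, so $ never gets a chance to match.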
Note
The pattern ^.*$ is a syntactically correct regular expression; it
will almost always find a match, and it is utterly useless. Can you
work out what it matches and when it will not find a match?
Using Multiline Mode
^ matches the start of a string and $ matches the end of a string, at least usually. There is
an exception, or rather, a way to change this behavior.
Many regular expression implementations support the use of special
metacharacters that modify the behavior of other metacharacters, and one of these
is (?m), which enables multiline mode. Multiline mode forces the regular
expression engine to treat line breaks as a string separator, so that ^ matches the start of a string or the start after a line break (a new line), and $ matches the end of
a string or the end after a line break.
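The difference is easy to observe by running the same pattern with and without the (?m) prefix. The sketch below assumes Python's re module, which accepts (?m) at the front of a pattern, and uses a short invented code sample:

```python
import re

# Invented sample: a few lines of script, two of which are // comments.
code = (
    "function f() {\n"
    "  // Init\n"
    "  var x = 1;\n"
    "  // Done\n"
    "  return false;\n"
    "}"
)

# Without multiline mode, ^ matches only at the very start of the string.
without_multiline = re.findall(r"^\s*//.*$", code)

# With (?m), ^ and $ also match at the start and end of each line.
with_multiline = [m.strip() for m in re.findall(r"(?m)^\s*//.*$", code)]

print(without_multiline)  # no comment is found
print(with_multiline)     # each comment line is found
```

Without (?m), the only place ^ can match is before the word function, so no comment is found; with it, every line gets its own start and end.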
If used, (?m) must be placed at the very front of the pattern, as shown in the following example, which uses a regular expression to locate all JavaScript
comments within a block of code:
<SCRIPT>
function doSpellCheck(form, field) {
// Make sure not empty
if (field.value == '') {
return false;
}
// Init
var windowName='spellWindow';
var spellCheckURL='spell.cfm?formname=comment&fieldname='+field.name;
// Done
return false;
}
</SCRIPT>
(?m)^\s*//.*$
<SCRIPT>
function doSpellCheck(form, field) {
// Make sure not empty
if (field.value == '') {
return false;
}