localhost is 127.0.0.1
This pattern uses a series of nested subexpressions. The first is (((\d{1,2})|(1\d{2})|(2[0-4]\d)|(25[0-5]))\.), a set of four nested subexpressions. (\d{1,2}) matches any one- or two-digit number, or numbers 0 through 99. (1\d{2}) matches any three-digit number starting with 1 (1 followed by any 2 digits), or numbers 100 through 199. (2[0-4]\d) matches numbers 200 through 249. (25[0-5]) matches numbers 250 through 255. Each of these subexpressions is enclosed within another subexpression with an | between each (so that one of the four subexpressions has to match, but not all). After the range of numbers comes \. to match ., and then the entire series is enclosed into yet another subexpression and repeated three times using {3}. Finally, the range of numbers is repeated (this time without the trailing \.) to match the final IP address number. The pattern thus validates the format of the string to be matched (that it is four sets of numbers separated by .) and validates that each of the numbers has a value between 0 and 255.
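The nested structure described above can be tried out in a few lines of Python. This is only a sketch: the full pattern is assembled here from the pieces named in the text, and the names OCTET, IP_PATTERN, and is_valid_ip are illustrative rather than part of the original example. Anchoring the pattern to the whole string with fullmatch is also an assumption of this sketch.

```python
import re

# One number in the range 0-255, built from the four alternatives in the text:
# 0-99, 100-199, 200-249, and 250-255.
OCTET = r"((\d{1,2})|(1\d{2})|(2[0-4]\d)|(25[0-5]))"

# Three "number plus dot" groups, then a final number without the trailing dot.
IP_PATTERN = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)


def is_valid_ip(text: str) -> bool:
    """Return True if the whole string is four dot-separated numbers in 0-255."""
    # fullmatch requires the pattern to cover the entire string.
    return IP_PATTERN.fullmatch(text) is not None


for candidate in ["127.0.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, is_valid_ip(candidate))
```

Under these assumptions, 127.0.0.1 and 255.255.255.255 are accepted, while 256.1.1.1 (a number above 255) and 1.2.3 (only three numbers) are rejected.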
Note
This IP address example is explained in detail in Lesson 7, "Using Subexpressions."
URLs
URL matching is a complicated task, or rather, it can be complicated depending on how flexible the matching needs to be. At a minimum, URL matching should match the protocol (probably http and https), a hostname, an optional port, and a path.
http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/
https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?
http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/
https?:// matches http:// or https:// (the ? makes the s optional). [-\w.]+ matches the hostname. (:\d+)? matches an optional port (as seen in the second and sixth lines in the example). (/([\w/_.]*)?)? matches the path: the outer subexpression matches / if one exists, and the inner subexpression matches the path itself. As you can see, this pattern cannot handle query strings, and it misreads embedded username:password pairs. However, for most URLs it will work adequately (matching hostnames, ports, and paths).
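To see these limitations concretely, the pattern can be run against the six example lines. The following Python sketch is one way to do that; the names URL_PATTERN, TEXT, and find_urls are illustrative, and the re.IGNORECASE flag is passed so that matching is not case sensitive.

```python
import re

# The URL pattern from the example, compiled without case sensitivity.
URL_PATTERN = re.compile(r"https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?", re.IGNORECASE)

TEXT = """http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/"""


def find_urls(text: str) -> list[str]:
    """Return the full text of every match (group 0), ignoring subexpressions."""
    return [match.group(0) for match in URL_PATTERN.finditer(text)]


for url in find_urls(TEXT):
    print(url)
```

Four of the six lines are matched in full. For the fourth line only http://ben is matched, because the text after the colon is not a numeric port, and for the fifth line the match stops at index.php, just before the query string.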
Note
This regular expression should not be case sensitive.
Tip
To accept ftp URLs as well, replace the https? with (http|https|ftp).
You can do the same for other URL types if needed.
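The substitution described in this tip can be sketched in Python by building the protocol alternation from a list. The helper name build_url_pattern and the hostname ftp.example.com are hypothetical and used only for illustration.

```python
import re


def build_url_pattern(schemes: list[str]) -> re.Pattern:
    """Build the URL pattern with the given protocols in place of https?."""
    # Join the protocols with | so that any one of them may match.
    alternation = "|".join(re.escape(scheme) for scheme in schemes)
    return re.compile(
        r"(" + alternation + r")://[-\w.]+(:\d+)?(/([\w/_.]*)?)?",
        re.IGNORECASE,
    )


pattern = build_url_pattern(["http", "https", "ftp"])

for url in ["ftp://ftp.example.com/pub/file.txt",
            "https://www.forta.com:80/blog/index.cfm",
            "gopher://example.com"]:
    # fullmatch is used here so that the whole string must be a URL.
    print(url, pattern.fullmatch(url) is not None)
```

With this list of protocols the ftp and https URLs are accepted and the gopher URL is not; adding another protocol is a matter of extending the list.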
Complete URLs
A more complete (and slower) pattern would also match URL query strings (variable information passed to a URL and separated from the URL itself by a ?), as well as optional user login information, if specified.
http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/
https?://(\w*:\w*@)?[-\w.]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?
http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/
This pattern builds on the previous example. https?:// is now followed by (\w*:\w*@)?. This new pattern checks for embedded user and password (username and password separated by : and followed by @) as seen in the fourth line in the example. In addition, (\?\S+)? (after the path) matches the query string, ? followed by additional text, and this, too, is made optional with ?.
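As a quick check of the two additions, the complete pattern can be run against the same six lines. This Python sketch uses the illustrative names URL_PATTERN, TEXT, and find_urls, and passes re.IGNORECASE so that matching is not case sensitive.

```python
import re

# The complete URL pattern: optional user:password@, host, port, path, and query string.
URL_PATTERN = re.compile(
    r"https?://(\w*:\w*@)?[-\w.]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?",
    re.IGNORECASE,
)

TEXT = """http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/"""


def find_urls(text: str) -> list[str]:
    """Return the full text of every match (group 0), ignoring subexpressions."""
    return [match.group(0) for match in URL_PATTERN.finditer(text)]


for match in URL_PATTERN.finditer(TEXT):
    # Group 1 is the optional login subexpression; it is None when absent.
    print(match.group(0), "| login:", match.group(1))
```

Here every line is matched in full, including the login information in the fourth line and the query string in the fifth.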
Note
This regular expression should not be case sensitive.
Tip
Why not always use this pattern over the previous one? This is a
slightly more complex pattern, and so it will run slower; if the
extra functionality is not needed, it should not be used.
Email Addresses
Regular expressions are frequently used for email address validation, and yet
validating a simple email address is anything but simple
My name is Ben Forta, and my
email address is ben@forta.com
(\w+\.)*\w+@(\w+\.)+[A-Za-z]+
My name is Ben Forta, and my
email address is ben@forta.com
(\w+\.)*\w+ matches the name portion of an email address (everything before the @). (\w+\.)* matches zero or more instances of text followed by ., and \w+ matches required text (this combination matches both ben and ben.forta, for example). @ matches @. (\w+\.)+ then matches at least one instance of text followed by ., and [A-Za-z]+ matches the top-level domain (com, edu, us, or uk, and so on).
The rules governing valid email address formats are extremely complex. This pattern will not validate every possible email address. For example, it will allow ben forta@forta.com (which is invalid) and will not allow IP addresses as the hostname (which are allowed). Still, it will suffice for most email validation, and so it may work for you.
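The behavior described above can be explored with a short Python sketch. The names EMAIL_PATTERN, TEXT, and find_emails are illustrative, and searching within a larger string (rather than requiring the whole string to match) is an assumption of this sketch.

```python
import re

# The email pattern from the example, compiled without case sensitivity.
EMAIL_PATTERN = re.compile(r"(\w+\.)*\w+@(\w+\.)+[A-Za-z]+", re.IGNORECASE)

TEXT = """My name is Ben Forta, and my
email address is ben@forta.com"""


def find_emails(text: str) -> list[str]:
    """Return the full text of every match (group 0), ignoring subexpressions."""
    return [match.group(0) for match in EMAIL_PATTERN.finditer(text)]


for sample in [TEXT, "ben.forta@forta.com", "ben@127.0.0.1", "ben forta@forta.com"]:
    print(repr(sample), "->", find_emails(sample))
```

The address in the example text and ben.forta@forta.com are both found. Nothing is found in ben@127.0.0.1, because the final part of the hostname must be letters. In ben forta@forta.com the search still finds forta@forta.com, which is one way a pattern like this can let an invalid address through when the whole string is not checked.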
Note
Regular expressions used to match email addresses should usually
not be case sensitive.
HTML Comments
Comments in HTML pages are placed between <!-- and --> tags (use at least two hyphens, although more are allowed). Being able to locate all comments is useful when browsing (and debugging) Web pages.
<!-- Start of page -->
<HTML>
<!-- Start of head -->
<HEAD>
<TITLE>My Title</TITLE> <!-- Page title -->
</HEAD>