Chapter 7: The Internet
FIGURE 7-10 With this plug-in you can make the busiest of web pages load quickly on a mobile browser.
FIGURE 7-11 This is the original Yahoo! home page before the plug-in is applied.
Plug-in PHP: 100 Power Solutions
About the Plug-in
This plug-in accepts a string containing the HTML to be converted, along with other required arguments, and returns a properly formatted HTML document with various formatting elements removed. It takes these arguments:
• $html The HTML to convert
• $url The URL of the page being converted
• $style If “yes”, style and JavaScript elements are retained; otherwise they are stripped out
• $images If “yes”, images are kept; otherwise they are removed
Variables, Arrays, and Functions
changed to in order to ensure it is absolute
PIPHP_RelToAbsURL() Plug-in 21: This function converts a relative URL to an absolute one
How It Works
This function starts off by creating a DOM object that is loaded with the HTML from $html. Then an XPath object is created from this, with which all <a href=...> tags are extracted and placed in the object $hrefs. After initializing the arrays $links and $to, which will contain the links before and after converting to absolute format, all occurrences of &amp; are converted to & symbols, and then all & symbols to the token !!**1**!!, to avoid the suspected str_replace() bug that doesn’t handle & symbols well.
Next, the link parts of the tags are pulled out from $hrefs and placed into the array $links using a for loop, and all duplicate links are removed from the array, which is then sorted.
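The extraction step described above can be tried in isolation. The following is only a minimal sketch: the sample markup is invented for illustration and is not taken from the plug-in's test pages.

```php
<?php
// Hypothetical sample markup containing a duplicate link (an assumption).
$html = '<html><body><a href="b.html">B</a> <a href="a.html">A</a> '
      . '<a href="b.html">B again</a></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);               // @ suppresses warnings about imperfect markup
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

// Pull the href attribute out of every anchor element found
$links = array();
for ($j = 0; $j < $hrefs->length; ++$j)
    $links[] = $hrefs->item($j)->getAttribute('href');

$links = array_unique($links);        // drop the duplicate b.html
sort($links);                         // sort and reindex from 0

print_r($links);
```

Because sort() reindexes the array, the gap left by array_unique() disappears and the links can be walked in order.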
After this, the technique used in plug-ins 46 and 48 is implemented to swap all links in $html with numbered tokens. This ensures that multiple replaces don’t interfere with each other. First the $to array is loaded with a proper URL which has had any !!**1**!! tokens changed back to & symbols after running them through PIPHP_RelToAbsURL() to ensure they are absolute. This makes sure that legal URLs will be substituted when the tokens are later changed back.
To be flexible, the plug-in supports three types of links (double quoted, single quoted, and unquoted), each case being handled by one of the str_replace() calls. This function substitutes links within $html for the token !!$count!!. This means that the first link becomes !!0!!, the second !!1!!, and so on, as $count is incremented at each pass.
With all the tokens having been substituted, they can now be swapped with their associated links from the $to array. This is achieved using the second for loop in the plug-in listing.
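The two passes can be sketched on their own as follows. The input string, the base URL, and the simple concatenation standing in for PIPHP_RelToAbsURL() are all assumptions made for illustration, and the urlencode() call used by the plug-in is left out to keep the output readable.

```php
<?php
// Hypothetical input with one link of each quoting style (an assumption).
$html  = '<a href="a.html">A</a> <a href=\'b.html\'>B</a> <a href=c.html>C</a>';
$base  = 'http://example.com/';       // hypothetical base URL
$links = array('a.html', 'b.html', 'c.html');
$to    = array();
$count = 0;

// Pass 1: swap each link for a numbered token, whatever its quoting style
foreach ($links as $link) {
    // Stand-in for PIPHP_RelToAbsURL(): plain concatenation (an assumption)
    $to[$count] = $base . $link;
    $html = str_replace("href=\"$link\"", "href=\"!!$count!!\"", $html);
    $html = str_replace("href='$link'",   "href='!!$count!!'",   $html);
    $html = str_replace("href=$link",     "href=!!$count!!",     $html);
    ++$count;
}
$tokenized = $html;                   // keep a copy to inspect the tokens

// Pass 2: swap each token for its absolute URL
for ($j = 0; $j < $count; ++$j)
    $html = str_replace("!!$j!!", $to[$j], $html);

echo $tokenized, "\n", $html, "\n";
```

Since no token can appear inside a URL, a replacement made for one link cannot be picked up again when a later link is processed.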
Then, any remaining occurrences of the URL encoded format http%3A%2F%2F are rectified to http://, and any !!**1**!! tokens are returned to being & symbols.
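To see what the encoding and the http%3A%2F%2F correction do to a single link, the two calls can be run on their own. This is a sketch only, and the URL is a hypothetical stand-in for a value returned by PIPHP_RelToAbsURL().

```php
<?php
// Hypothetical absolute URL (an assumption for illustration).
$absolute = 'http://example.com/a.html?x=1&y=2';

$encoded  = urlencode($absolute);                               // encode every reserved character
$restored = str_replace('http%3A%2F%2F', 'http://', $encoded);  // decode the scheme prefix only

echo $encoded, "\n", $restored, "\n";
```

Note that only the leading scheme is restored by this replacement; characters such as / and ? later in the URL stay in their encoded form.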
Next, if $style does not have the value “yes”, then whitespace, styling, and JavaScript are removed from $html.
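The three preg_replace() calls involved can be tried on a small fragment. The markup below is made up for illustration and is not taken from a real page.

```php
<?php
// Hypothetical markup with multi-line style and script blocks (an assumption).
$html = "<style type=\"text/css\">\n  p { color: red; }\n</style>\n"
      . "<p>Hello</p>\n<script>\n  alert('hi');\n</script>";

// Collapse every run of whitespace, including newlines, to one space
$html = preg_replace('/[\s]+/', ' ', $html);

// Remove script and style elements together with their contents
$html = preg_replace('/<script[^>]*>.*?<\/script>/i', '', $html);
$html = preg_replace('/<style[^>]*>.*?<\/style>/i', '', $html);

echo trim($html), "\n";
```

The order matters here: without the s modifier the dot in these patterns does not match newlines, so the multi-line blocks are only matched because the whitespace has already been collapsed onto a single line.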
After this, $images is also tested and if it’s equal to “yes”, then images are allowed to remain in place. This is achieved, along with removing all remaining tags, by appending the tag <img> to the list of allowed tags in $allowed, which is then passed to the strip_tags() function, along with $html. If $images is not equal to “yes”, then the <img> tag will not be appended to $allowed, and consequently all image tags will also be removed by this function. Upon completing all the processing, the result (in $html) is returned.
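The effect of adding <img> to the allowed list can be sketched as follows. The sample markup and the shortened list of allowed tags are assumptions chosen to keep the example small.

```php
<?php
// Hypothetical markup mixing allowed and disallowed tags (an assumption).
$html    = '<div><p>Text<img src="x.png"><span>more</span></p></div>';
$allowed = "<a><p><i><b><u><s>";      // shortened list, for illustration only

// Images removed: <img> is not in the allowed list
$withoutImages = strip_tags($html, $allowed);

// Images kept: <img> is appended to the allowed list
$withImages = strip_tags($html, $allowed . "<img>");

echo $withoutImages, "\n", $withImages, "\n";
```

In both cases the <div> and <span> tags are stripped while the text they enclose is kept.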
How to Use It
To convert HTML to a format more suitable for mobile browsers, use the plug-in like this:
$url = "http://yahoo.com";
$html = file_get_contents($url);
$style = "no";
$images = "no";
echo PIPHP_HTMLToMobile($html, $url, $style, $images);
This loads in the HTML from the index page at www.yahoo.com and then passes it to the plug-in with both $style and $images set to “no”. This means that neither styling nor JavaScript will be allowed in the converted HTML, and neither will images.
If $style is set to “yes”, then style tags and JavaScript are retained in the HTML. If $images is also equal to “yes”, then some images will be retained, but not all, because a lot of the page’s content is removed.
If you play with this plug-in, you’ll find that often you can set both $style and $images to “yes” and many web pages will still return a lot less information, because the strip_tags() function removes plenty of HTML not strictly needed to use a web page.
Remember that this plug-in relies on plug-in 21, PIPHP_RelToAbsURL(). Therefore, you must also copy it into your program or otherwise include it.
The Plug-in
function PIPHP_HTMLToMobile($html, $url, $style, $images)
{
   $dom = new DOMDocument();
   @$dom->loadHTML($html);   // @ suppresses warnings about malformed markup
   $xpath = new DOMXPath($dom);
   $hrefs = $xpath->evaluate("/html/body//a");
   $links = array();
   $to    = array();
   $count = 0;

   // Decode &amp; entities, then protect every & with a token
   $html = str_replace('&amp;', '&', $html);
   $html = str_replace('&', '!!**1**!!', $html);

   // Collect the href of every link, then de-duplicate and sort
   for ($j = 0 ; $j < $hrefs->length ; ++$j)
      $links[] = $hrefs->item($j)->getAttribute('href');

   $links = array_unique($links);
   sort($links);

   // Swap each link in $html for a numbered token
   foreach ($links as $link)
   {
      if ($link != "")
      {
         $temp = str_replace('!!**1**!!', '&', $link);
         $to[$count] = urlencode(PIPHP_RelToAbsURL($url, $temp));
         $html = str_replace("href=\"$link\"", "href=\"!!$count!!\"", $html);
         $html = str_replace("href='$link'", "href='!!$count!!'", $html);
         $html = str_replace("href=$link", "href=!!$count!!", $html);
         ++$count;
      }
   }

   // Swap each token for its absolute URL
   for ($j = 0 ; $j < $count ; ++$j)
      $html = str_replace("!!$j!!", $to[$j], $html);

   $html = str_replace('http%3A%2F%2F', 'http://', $html);
   $html = str_replace('!!**1**!!', '&', $html);

   // Unless styling is wanted, strip whitespace, scripts, and styles
   if (strtolower($style) != "yes")
   {
      $html = preg_replace('/[\s]+/', ' ', $html);
      $html = preg_replace('/<script[^>]*>.*?<\/script>/i', '', $html);
      $html = preg_replace('/<style[^>]*>.*?<\/style>/i', '', $html);
   }

   // Append <img> to the allowed tags only if images are wanted
   $allowed = "<a><p><h><i><b><u><s>";
   if (strtolower($images) == "yes") $allowed .= "<img>";
   return strip_tags($html, $allowed);
}
CHAPTER 8 Chat and Messaging