Instead of using a loop to iterate through each element of $data, the much faster and more efficient array_map() function is called. This does the same thing, requiring only the name of a function to call for each element. To populate the $left array, which is assigned the left half of each line, the function PIPHP_PU_F1() is called. For the $right array, PIPHP_PU_F2() is called.
The reason for the split is that the checksum and URL are stored side by side on a line, separated only by the token !1!, which is unlikely to appear in any URL.
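The split described above can be tried in isolation. The following is a minimal sketch, assuming made-up sample data and two hypothetical helper names (left_half() and right_half()) that stand in for PIPHP_PU_F1() and PIPHP_PU_F2():

```php
<?php
// Hypothetical helpers; the plug-in itself uses PIPHP_PU_F1() and PIPHP_PU_F2().
function left_half(string $line): string
{
    // Everything before the !1! token (the URL).
    return explode("!1!", $line, 2)[0];
}

function right_half(string $line): string
{
    // Everything after the !1! token (the checksum), or "" if the token is missing.
    $parts = explode("!1!", $line, 2);
    return $parts[1] ?? "";
}

// Sample datafile contents, made up for illustration.
$rawfile = "http://a.example/!1!" . md5("first") . "\n"
         . "http://b.example/!1!" . md5("second") . "\n";

$data  = explode("\n", rtrim($rawfile));   // one element per line
$left  = array_map("left_half", $data);    // all URLs
$right = array_map("right_half", $data);   // all checksums

print_r($left);
print_r($right);
```

Because both arrays are built from the same $data array, element n of $left always pairs with element n of $right.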
A for loop is then started to iterate through the $left array and check whether $page already exists in the datafile. If so, $exists is set to point to the element number within the array where it is located. Using this pointer, the matching element in $right is compared with the value of $checksum and, if it is the same, zero is returned to indicate that the page is still the same as the last time the program checked.
If, on the other hand, $page exists in the datafile but $checksum does not match the saved value, then the page contents must have changed. In this case, the old checksum value in the datafile is overwritten with the new value in $checksum using the str_replace() function, the datafile is saved back to disk, and a value of 1 is returned to indicate that the web page has changed.
At the end of the if (file_exists($datafile)) set of statements, if the file does not already exist, then the string $rawfile is assigned the empty string.
Finally, whether or not the file exists, the contents of $rawfile are saved to disk, along with the values of $page and $checksum, separated by the token !1! and followed by a \n newline character. This either creates the datafile if it doesn’t exist or, if it does, appends a new line of data to it. Either way, a value of -1 is returned to indicate that the URL in $page was new to the datafile and has now been saved.
Note that the two functions PIPHP_PU_F1() and PIPHP_PU_F2() are for the exclusive use of the main plug-in and are not intended to be called elsewhere.
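To see the three return values without touching the network, the logic can be sketched as an offline variant. This is not the plug-in: page_status() is a hypothetical function that takes the page contents as an argument, and it rewrites only the matching line instead of calling str_replace() on the whole file.

```php
<?php
// Hypothetical offline variant: takes the page contents directly so it can be
// run without network access. Not the plug-in itself.
function page_status(string $page, string $contents, string $datafile): int
{
    $checksum = md5($contents);
    $lines = file_exists($datafile)
        ? file($datafile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];

    foreach ($lines as $i => $line) {
        [$url, $saved] = array_pad(explode("!1!", $line, 2), 2, "");
        if ($url !== $page) continue;
        if ($saved === $checksum) return 0;      // unchanged
        $lines[$i] = "$page!1!$checksum";        // overwrite the old checksum
        file_put_contents($datafile, implode("\n", $lines) . "\n");
        return 1;                                // changed
    }

    // New URL: append a line, creating the file if necessary.
    file_put_contents($datafile, "$page!1!$checksum\n", FILE_APPEND);
    return -1;
}

$file   = tempnam(sys_get_temp_dir(), "pu");
$first  = page_status("http://example.com/", "version one", $file);
$second = page_status("http://example.com/", "version one", $file);
$third  = page_status("http://example.com/", "version two", $file);
unlink($file);

echo "$first $second $third\n";   // prints "-1 0 1"
```

The first call records the page as new, the second finds an identical checksum, and the third detects the changed contents.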
How to Use It
To use this plug-in, call it like this:
$page = "http://pluginphp.com";
$datafile = "urldata.txt";
$result = PIPHP_PageUpdated($page, $datafile);
Then, to act on the value in $result, you might use code such as this:
// Strict comparison (===) is used because FALSE == 0 is true in PHP
echo "<pre>(1st call) The URL '$page' is ";
if ($result === -1) echo "New";
elseif ($result === 1) echo "Changed";
elseif ($result === 0) echo "Unchanged";
else echo "Inaccessible";
This will tell you (or your users) whether the index page at www.pluginphp.com has changed since the last time it was checked, or whether it is new to the datafile or even inaccessible. The first time you make the call regarding a new page, it will always report that the page is new. If you make an additional call (such as via the following code) immediately afterward on a site that is not dynamically generated, you will be informed that the page is unchanged; otherwise you’ll be told it has changed:
$result = PIPHP_PageUpdated($page, $datafile);
echo "<br />(2nd call) The URL '$page' is ";
if ($result === -1) echo "New";
elseif ($result === 1) echo "Changed";
elseif ($result === 0) echo "Unchanged";
else echo "Inaccessible";
You might prefer to send an e-mail instead of displaying this information to a browser, in which case just replace the echo statements with a call to plug-in 38, PIPHP_SendEmail(), sending the contents of the echo statements in the $message argument.
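One way to prepare such a message is to build the text in a string first. The sketch below uses a hypothetical helper, describe_result(); it stops short of calling PIPHP_SendEmail() because that plug-in's argument list is not shown in this section.

```php
<?php
// Hypothetical helper: turns the plug-in's return value into a message string.
function describe_result(string $page, $result): string
{
    // Strict comparison matters here: FALSE == 0 is true in PHP, so a loose
    // test would report an inaccessible page as unchanged.
    if ($result === -1)    $state = "New";
    elseif ($result === 1) $state = "Changed";
    elseif ($result === 0) $state = "Unchanged";
    else                   $state = "Inaccessible";

    return "The URL '$page' is $state";
}

// FALSE is what PIPHP_PageUpdated() returns when the page cannot be fetched.
$message = describe_result("http://pluginphp.com", FALSE);
echo $message, "\n";
```

The resulting $message string could then be supplied as the body of the e-mail.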
The Plug-in
function PIPHP_PageUpdated($page, $datafile)
{
   // Fetch the page; give up if it cannot be read
   $contents = @file_get_contents($page);
   if (!$contents) return FALSE;
   $checksum = md5($contents);

   if (file_exists($datafile))
   {
      // Split each line into its URL (left) and checksum (right)
      $rawfile = file_get_contents($datafile);
      $data    = explode("\n", rtrim($rawfile));
      $left    = array_map("PIPHP_PU_F1", $data);
      $right   = array_map("PIPHP_PU_F2", $data);
      $exists  = -1;

      for ($j = 0 ; $j < count($left) ; ++$j)
      {
         if ($left[$j] == $page)
         {
            $exists = $j;
            if ($right[$j] == $checksum) return 0;
         }
      }

      // Known URL with a different checksum: store the new one
      if ($exists > -1)
      {
         $rawfile = str_replace($right[$exists], $checksum, $rawfile);
         file_put_contents($datafile, $rawfile);
         return 1;
      }
   }
   else $rawfile = "";

   // New URL: append it (the . operator was missing in the original listing)
   file_put_contents($datafile, $rawfile . "$page!1!$checksum\n");
   return -1;
}

function PIPHP_PU_F1($s)
{
   list($a, $b) = explode("!1!", $s);
   return $a;
}

function PIPHP_PU_F2($s)
{
   list($a, $b) = explode("!1!", $s);
   return $b;
}
HTML To RSS
The popularity of RSS (Really Simple Syndication) feeds is still growing due to the ease with which you can subscribe to a feed and have updates automatically sent to the feed reader. In fact, most decent browsers also offer RSS reading facilities. But what if you’re too busy developing the HTML portion of your site to start building RSS feeds? Or what if you’d like to be able to view other web sites in RSS?
The solution comes with this plug-in, which will fetch a web page, analyze it, strip out non-essential and formatting items, and reformat it into RSS (see Figure 7-8 for an example).
FIGURE 7-8 The plug-in is used to output the McGraw-Hill web site as an RSS feed.
About the Plug-in
This plug-in accepts a string containing the HTML to be converted, along with other required arguments, and returns a properly formatted RSS document. It takes these arguments:
• $html The HTML to convert
• $title The RSS feed title to use
• $description The RSS description to use
• $url The URL to which the feed should link
• $webmaster The e-mail address of the responsible webmaster
• $copyright The copyright details
Variables, Arrays, and Functions
changed to in order to ensure it is absolute
PIPHP_RelToAbsURL() Plug-in 21: This function converts a relative URL to absolute.
How It Works
This plug-in starts by setting the string variable $date to the current date and time, in a format that is acceptable to RSS readers. Then all instances of &amp; (the XML and XHTML required form of the & symbol) are converted to just the & symbol, and then all & symbols are changed to a special token with the value !!**1**!!. As described in plug-in 46, this is done because the str_replace() function seems to have a bug relating to the use of the & symbol, so the token is substituted to avoid it. The & symbols will be swapped back later.
After that, the code has much in common with many of the other plug-ins in this chapter in that it must traverse an HTML DOM (Document Object Model), ensuring all a href= links are in absolute format. It does this by creating a new DOM object in $dom and then loading it up with the HTML tags from $html. Then a new XPath object is created in $xpath. This is used by $xpath->evaluate() to extract all the a href= tags into the $hrefs array.
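The date and ampersand handling described at the start of this section might look like the following sketch. The sample HTML is made up, and the use of the built-in DATE_RSS constant is an assumption; the plug-in may build its date format string differently.

```php
<?php
// Sample input, made up for illustration.
$html = '<a href="page.php?a=1&amp;b=2">Tom & Jerry</a>';

// RFC 822 style date as used by RSS 2.0 (DATE_RSS is a built-in PHP constant).
$date = date(DATE_RSS);

// Step 1: normalize the entity form &amp; to a bare &.
$html = str_replace('&amp;', '&', $html);

// Step 2: hide every & behind a token that should not occur in normal HTML.
$html = str_replace('&', '!!**1**!!', $html);

echo $date, "\n", $html, "\n";

// Later, once processing is finished, the token is swapped back.
$restored = str_replace('!!**1**!!', '&', $html);
echo $restored, "\n";
```

Doing the replacement in two steps means that both &amp; and a bare & end up as the same token, so the later restore produces a single consistent form.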
Next the arrays $links and $to are initialized. These will respectively contain all the encountered links and the absolute forms to which they should be changed. A counter that will index into these arrays, $count, is also initialized.
A for loop is then used to extract the links from each a href= tag into the array $links, which then has all duplicates removed using the array_unique() function. Because array_unique() removes duplicates but leaves the remaining elements under their original keys, the array is then sorted so that all elements are stored contiguously.
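The link extraction can be sketched as follows, assuming PHP's DOM extension is available. The sample HTML and the XPath expression /html/body//a are assumptions for illustration; the plug-in's own expression is not shown in this section.

```php
<?php
// Sample document with one duplicated link, made up for illustration.
$html = '<html><body><a href="b.html">B</a><a href="b.html">B again</a>'
      . '<a href="a.html">A</a></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);                      // @ hides warnings about sloppy markup
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");  // a DOMNodeList of <a> elements

$links = array();
for ($j = 0; $j < $hrefs->length; ++$j) {
    $links[] = $hrefs->item($j)->getAttribute('href');
}

$links = array_unique($links);   // keys 0 and 2 survive, key 1 is dropped
sort($links);                    // reindexes from 0, in sorted order

print_r($links);
```

After sort(), the array can safely be walked with a numeric index because no keys are missing.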
A foreach loop is then used to iterate through each link, first checking that a link actually has been assigned a value. If it has, the string variable $temp is assigned a version of $link without any !!**1**!! tokens that may have replaced any & symbols. This ensures a properly formed URL is ready for converting to absolute format using the PIPHP_RelToAbsURL() function, and for assigning to an element in the $to array.
Again, as in plug-in 46, tokens are substituted for all links within the main document to prevent potential clashes during multiple replace operations. Every form of allowable link is substituted, whether double-quoted, single-quoted, or unquoted: href="link", href='link', and href=link. The tokens take the form !!$count!! and therefore start at !!0!! and proceed on through !!1!! and so on each time a new link is substituted.
Once all the tokens are in place in the document, and there is no chance of clashes during string substitutions, a for loop is used to convert them into the absolute URLs held in the $to array.
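The two-pass substitution can be sketched like this. The function to_absolute() is a simplified, hypothetical stand-in for PIPHP_RelToAbsURL() (plug-in 21), and for brevity the sketch handles only the two quoted link forms, not the unquoted one.

```php
<?php
// Simplified stand-in for PIPHP_RelToAbsURL(); it handles only the cases below.
function to_absolute(string $base, string $link): string
{
    if (preg_match('#^https?://#i', $link)) return $link;
    return rtrim($base, '/') . '/' . ltrim($link, '/');
}

$url   = "http://myserver.com";
$html  = '<a href="news">News</a> <a href=\'news/old\'>Old</a>';
$links = array("news", "news/old");
$to    = array();
$count = 0;

foreach ($links as $link) {
    if ($link == "") continue;
    $to[$count] = to_absolute($url, $link);

    // Pass 1: swap each quoted form of the link for a numbered token.
    $html = str_replace("href=\"$link\"", "href=\"!!$count!!\"", $html);
    $html = str_replace("href='$link'",   "href='!!$count!!'",   $html);
    ++$count;
}

// Pass 2: swap each token for its absolute URL.
for ($j = 0; $j < $count; ++$j) {
    $html = str_replace("!!$j!!", $to[$j], $html);
}

echo $html, "\n";
```

Since an absolute URL is only written into the document after every original link has been replaced by a token, a later search for a short link such as news cannot match inside a URL that was inserted earlier.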
Next, any encoded URLs in which http:// has been turned into http%3A%2F%2F are restored back to http://, any & symbols are restored from the token !!**1**!!, and whitespace is collapsed using the preg_replace() function with a pattern of /[\s]+/, which replaces every run of one or more whitespace characters with a single space.
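A short sketch of the URL restoration and whitespace collapsing, using a made-up input string:

```php
<?php
// Sample input, made up for illustration.
$html = "<p>Line one\n\n   line\ttwo</p> <a href='http%3A%2F%2Fexample.com'>x</a>";

$html = str_replace('http%3A%2F%2F', 'http://', $html);  // undo the URL encoding
$html = preg_replace('/[\s]+/', ' ', $html);             // collapse whitespace runs

echo $html, "\n";
```

Note that whitespace is reduced to single spaces rather than removed outright, so words on separate lines do not run together.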
The next lines strip out any <script> and <style> tags and their contents, followed by ensuring that all <h> tags have their contents removed. This is done so a conversion can easily be made later into RSS headers.
With those tags removed, all remaining tags are also stripped out, with the exception of those listed in the string $ok. This process is handled by the function strip_tags(). In case you’re wondering, I tried to also remove the <script> and <style> tags using strip_tags(), but the function seems buggy and would not always remove them, so that’s why these are handled separately.
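The tag-stripping stage might be sketched as below. The regular expressions and the contents of $ok are assumptions, and the sketch reads "having their contents removed" as reducing heading tags such as <h1 class="x"> to a bare <h>; the plug-in's actual patterns are not shown in this section. Keep in mind that strip_tags() removes tags but leaves the text between them, which is one reason script and style bodies need separate handling.

```php
<?php
// Sample input, made up for illustration.
$html = '<h1 class="x">Title</h1><script type="text/javascript">var a = "<b>";</script>'
      . '<style>p { color: red }</style>'
      . '<p>Some <b>bold</b> and <font size="2">sized</font> text</p>';

// Remove <script> and <style> elements together with everything inside them.
$html = preg_replace('/<script[^>]*>.*?<\/script>/is', '', $html);
$html = preg_replace('/<style[^>]*>.*?<\/style>/is', '', $html);

// Reduce <h1>...<h6> and their closing tags to bare <h> and </h>.
$html = preg_replace('/<h[1-6][^>]*>/i', '<h>', $html);
$html = preg_replace('/<\/h[1-6]>/i', '</h>', $html);

// Keep only an allowed list of tags; this list is illustrative.
$ok   = '<h><p><b><i><a>';
$html = strip_tags($html, $ok);

echo $html, "\n";
```

Here the <font> tags are dropped while their text survives, and the heading is left in a uniform shape that a later replacement can turn into RSS item markup.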
After that, all remaining HTML characters are replaced with their RSS equivalents; so, for example, the < symbol becomes &lt;, the > becomes &gt;, and so on.
The final two preg_replace() calls substitute the opening and closing forms of the <h> tag, which previously had any contents stripped out, into the XML required for properly formatted RSS headers. In other words, this plug-in assumes that anything between <h> and </h> tags should be treated as RSS headers.
Finally, the RSS itself is returned within a return <<<_END ... _END construct, where you can see $title, $url, $description, and all the other variables in their correct places, all the way down to $html, the main contents of the feed on which this plug-in has performed all the processing.
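The heredoc return can be sketched as follows. build_feed() is a hypothetical function, and the element layout simply follows the RSS 2.0 channel elements (title, link, description, pubDate, webMaster, copyright); the plug-in's exact output is not shown in this section and may differ.

```php
<?php
// Hypothetical function: wraps already-prepared item markup in an RSS 2.0 channel.
function build_feed($title, $url, $description, $date, $webmaster, $copyright, $items)
{
    return <<<_END
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>$title</title>
<link>$url</link>
<description>$description</description>
<pubDate>$date</pubDate>
<webMaster>$webmaster</webMaster>
<copyright>$copyright</copyright>
$items
</channel>
</rss>
_END;
}

// Sample values, made up for illustration.
$items = "<item><title>Hello</title>"
       . "<description><![CDATA[<p>World</p>]]></description></item>";

$feed = build_feed("My feed", "http://myserver.com", "Demo feed",
                   date(DATE_RSS), "webmaster@myserver.com",
                   "Copyright 2009 myserver.com", $items);
echo $feed, "\n";
```

Variables inside a heredoc are interpolated just as in a double-quoted string, which is why the template can be written out almost exactly as the finished document.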
How to Use It
When you want to convert HTML to RSS, you can use code such as the following, in which your web site domain is assumed to be myserver.com:
$html = "Your HTML content goes here";
$title = "RSS version of my webpage";