howto / php / content syndication
Imitation is the sincerest form of flattery
In this fast-paced modern world where information is
power, being able to use constantly changing content can be
essential. I have used PHP to read in content from many dynamic external
sources, and it's not terribly difficult.
On the other hand, there are some restrictions, and legal opinions
conflict on whether you have the right to use other people's content.

Step A: reading in foreign content

For the simplest example, I can pull in the daily
weather forecast while ignoring all the other junk on the page. I use
fopen to establish an HTTP 1.0 connection to the server, read the page
one line at a time, find where the important stuff begins
and ends, then print out the pertinent content.
Examples:
wunderground forecast: 48103
My stripped, lynx friendly rendition
Sample Code:
<html>
<head>
<title>Ann Arbor Weather</title>
<link rel="stylesheet" href="styles.css" type="text/css">
</head>
<body bgcolor="#ffffff">
<p><a href="http://www.crh.noaa.gov/forecasts/MIZ075.php">http://www.crh.noaa.gov/forecasts/MIZ075.php</a></p>
<table cellpadding="7" cellspacing="0" border="0">
<tr>
<td valign="top">
<img src="http://www.bmcmedia.net/webcam/bmccam.jpg" width="352" height="288" alt="" border="0" />
<br /><br />
<img src="http://weather.yahoo.com/images/northeast_sat_440x297.jpg" width="440" height="297" alt="" border="0" />
</td>
<td>
<?php
$src = 'http://www.wunderground.com/cgi-bin/findweather/getForecast?query=48103';
$stop = 0;   // set to 1 once the end of the forecast is reached
$start = 1;  // cleared to 0 once the beginning of the forecast is found
$fp = fopen($src, "r");
// Checking $fp first avoids an endless loop when the connection fails.
while ($fp && !feof($fp) && !$stop) {
    $line = fgets($fp, 4096);
    // Either of these headings marks the beginning of the forecast.
    if (preg_match("/Nowcast as of/", $line)) { $start = 0; }
    if (preg_match("/Forecast for Washtenaw/", $line)) { $start = 0; }
    // Either of these marks the end of the forecast.
    if (preg_match("/Air Pollution/", $line)) { $stop = 1; }
    if (!$start && preg_match("/smalltableheader/", $line)) { $stop = 1; }
    if (!$start) {
        if (preg_match("/<table /", $line)) {
            $stop = 1;
        } elseif (preg_match("/<\/?table[^>]*>/", $line)) {
            ; // skip bare table tags
        } else {
            // Drop images, turn table cells into paragraphs, strip layout tags.
            $line = preg_replace("/<img src[^>]*>/", '', $line);
            $line = preg_replace("/<(\/)?td[^>]*>/", '<$1p>', $line);
            $line = preg_replace("/<\/?(tr[^>]*|font|center)>/", '', $line);
            $line = preg_replace("/<p><\/p>/", '', $line);
            echo $line;
        }
    }
}
if ($fp) { fclose($fp); }
?>
</td>
</tr>
</table>
</body>
</html>
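The start/stop scan in the sample can also be pulled out of the network read so it can be tried on its own. The following is a minimal sketch, not the code this site runs: the function name `extract_between` and the sample page lines are hypothetical, and the stop rule is simplified to a single pattern that only applies once the start marker has been seen.

```php
<?php
/**
 * Return the lines that fall between a start marker and a stop marker.
 * The line matching $startPattern is kept; the line matching
 * $stopPattern is not. (Hypothetical helper, for illustration only.)
 */
function extract_between(array $lines, string $startPattern, string $stopPattern): array
{
    $kept = [];
    $inside = false;
    foreach ($lines as $line) {
        if (!$inside && preg_match($startPattern, $line)) {
            $inside = true;
        }
        if ($inside) {
            if (preg_match($stopPattern, $line)) {
                break;
            }
            $kept[] = $line;
        }
    }
    return $kept;
}

// Made-up page content, standing in for lines read with fgets().
$page = [
    "<h1>Site banner</h1>\n",
    "<b>Forecast for Washtenaw</b>\n",
    "Sunny, high near 40.\n",
    "<b>Air Pollution</b>\n",
    "<p>Footer</p>\n",
];
echo implode('', extract_between($page, '/Forecast for Washtenaw/', '/Air Pollution/'));
```

Keeping the scan separate from the fetch makes it easy to adjust the markers when the remote page changes its layout.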

Step B: processing what you need

A similar strategy can be applied to stock quotes.
The task here is to get a pure number for a specific ticker symbol. Again,
regular expressions are your friend. I've been pulling my own reports
from Yahoo's finance board.
Please select a stock ticker symbol to see the next example in action:
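<?php
// Read the ticker symbol from the form post and encode it for use in a URL.
$symbol = isset($_POST['symbol']) ? $_POST['symbol'] : '';
$src = 'http://finance.yahoo.com/q?s=' . urlencode($symbol) . '&d=v1';
$found = 0;
$fp = fopen($src, "r");
while ($fp && !feof($fp) && !$found) {
    $line = fgets($fp, 4096);
    // The quote row begins with a link back to the ticker symbol.
    if (preg_match("/<font face=arial size=-1><a href=\"\/q\?s=/", $line)) {
        $found = 1;
        // Split the row into cells, then strip the remaining tags.
        $Pieces = preg_split("/<\/td>/", $line);
        $Pieces = preg_replace("/<[^>]*>/", "", $Pieces);
        echo "<p>$Pieces[0]: $Pieces[2]</p>\n";
    }
}
if ($fp) { fclose($fp); }
if (!$found) { echo "<p>Ticker symbol not found</p>\n"; }
?>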
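To get a "pure number" rather than a printed string, the row-splitting step can be wrapped in a small function. This is only a sketch under stated assumptions: `parse_quote_row` is a hypothetical name, and the sample row is made up to resemble the markup the pattern above matches (symbol in the first cell, price in the third), not copied from Yahoo.

```php
<?php
/**
 * Split one table row into cells, strip the tags, and return
 * [symbol, price], or null when the row does not look like a quote.
 * (Hypothetical helper; the cell positions are an assumption.)
 */
function parse_quote_row(string $row): ?array
{
    $cells = preg_split('/<\/td>/', $row);
    $cells = preg_replace('/<[^>]*>/', '', $cells);
    if (count($cells) < 3) {
        return null;
    }
    $symbol = trim($cells[0]);
    // Remove thousands separators before checking for a number.
    $price = str_replace(',', '', trim($cells[2]));
    if (!is_numeric($price)) {
        return null;
    }
    return [$symbol, (float) $price];
}

// Made-up row for illustration.
$row = '<tr><td><font face=arial size=-1><a href="/q?s=IBM">IBM</a></font></td>'
     . '<td>4:01PM</td><td><b>93.27</b></td></tr>';
$quote = parse_quote_row($row);
if ($quote !== null) {
    echo "<p>$quote[0]: $quote[1]</p>\n";
}
```

Returning null for rows that do not parse lets the caller fall back to the "Ticker symbol not found" message instead of printing garbage.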

Step C: Circumventing protection schemes

Google frowns upon "Automated Querying" and will block attempts to use fopen as demonstrated above. One way to get around this is to use cURL, PHP's Client URL Library functions. The search in the navigation strip on this site was implemented with a cURL call.
The functions used were:
- curl_init
- curl_setopt with parameters CURLOPT_URL, CURLOPT_HEADER, and CURLOPT_RETURNTRANSFER
- curl_exec
- curl_close
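Put together, the functions listed above might be used roughly as follows. This is a sketch, not the code from this site's navigation strip: `fetch_url` is a hypothetical wrapper, the values passed to CURLOPT_HEADER and CURLOPT_RETURNTRANSFER are assumptions, and CURLOPT_TIMEOUT is an addition that is not in the list.

```php
<?php
/**
 * Fetch a URL with cURL and return the response body, or null on failure.
 * (Hypothetical wrapper, for illustration only.)
 */
function fetch_url(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);         // body only, no response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the page instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // assumption: not in the original list
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : $body;
}

// Hypothetical usage; example.com stands in for the real target.
// $html = fetch_url('http://www.example.com/');
// if ($html !== null) { echo strlen($html), " bytes\n"; }
```

With CURLOPT_RETURNTRANSFER set, the page comes back as one string, so the same regular expressions used in the earlier steps can be applied to it with preg_match or preg_split.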
Unfortunately, Google has since changed its blocking
strategy to nullify the above procedures, and I've
once again had to disable the search on this site.
Oh well, I guess you can always find what you want here.
$Id: content_syndication.html,v 1.7 2005/02/17 20:01:18 willn Exp $