RSS/XML feed parser

Here's some PHP:

PHP
function xml_parser($page,$container,$tags,$number,$cdata) {
  if (!$number) {$number=100;}
  $stories=0;
  $xml=file_get_contents($page);
  preg_match_all("/<$container>.+<\/$container>/sU",$xml, $items);
  $items=$items[0];
  $itemsArray=array();
   foreach ($items as $item) {
    // $this is reserved in PHP and cannot be used as an ordinary variable,
    // so the tags of each item are collected in a plain local array instead.
    $row=array();
    for($i=0; $i<count($tags); $i++) {
     preg_match("/<$tags[$i](.+)(<\/$tags[$i]>)/sU", $item, $tag);
     $row[$i]=preg_replace("/<$tags[$i]>(.+)(<\/$tags[$i]>)/sU",'$1',$tag);
     $row[$i]=array_map('html_entity_decode', $row[$i]);
    }
    if (count($itemsArray)<$number) {array_push($itemsArray, $row);}
   }
  $theData="<dl>";
  foreach ($itemsArray as $item) {
  for($i=0; $i<count($tags); $i++) {
  $data[$i]=$item[$i][0];    }
   $title=$data[0];
   $dpatterns[0]="/<img(.+)><\/img>/sU"; $dreplacements[0]='<img$1>';
   $dpatterns[1]="/<img(.+)\/>/sU"; $dreplacements[1]='<img$1>';
   $dpatterns[2]="/<(\/|)content?(.+|)>/sU"; $dreplacements[2]='';
   $dpatterns[3]="/border=\"0\"/sU"; $dreplacements[3]='';
   if ($cdata!='hide') {
    $dpatterns[4]="/<\!\[CDATA\[(.+)\]\]>/sU"; $dreplacements[4]='$1';
   }
   else {
    $dpatterns[4]="/<\!\[CDATA\[(.+)\]\]>/sU"; $dreplacements[4]='';
   }
   $description=preg_replace($dpatterns,$dreplacements,$data[1]);
   $link=preg_replace("/<link.+href=\"(.+)\"(.+|)\/>/sU",'$1',$data[2]);
   $date=$data[3];
   $theData.="
   <dt><a href=\"$link\">$title</a></dt>
   <dd class=\"story\">$description</dd>
   <dd>Date: $date</dd>\r";
  }
$theData.="</dl>";
return $theData;
}

$container='item';
$tags=array('title','description','link','pubDate');
$bbc=xml_parser("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml",$container,$tags,10,'');
$cnn=xml_parser("http://rss.cnn.com/rss/cnn_topstories.rss",$container,$tags,10,'');

$tags=array('title','content:encoded','link','pubDate');
$lockergnome1=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'hide');

$tags=array('title','content:encoded','link','pubDate');
$lockergnome2=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'');

$container='entry';
$tags=array('title','content','link','published');
$flickr=xml_parser("http://api.flickr.com/services/feeds/photos_public.gne",$container,$tags,10,'');
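
To see what the regular-expression steps inside xml_parser() do without fetching a live feed, here is a minimal sketch that runs the same kind of patterns against a small inline feed. The sample XML, the helper name extract_tag_demo() and the variable names are made up for illustration; the tag pattern is a slightly tightened variant of the one above, not the exact one the function uses.

```php
<?php
// Hypothetical two-item feed, inlined so the example runs offline.
$xml = <<<XML
<rss><channel>
<item><title>First story</title><link>http://example.com/1</link></item>
<item><title>Second &amp; last</title><link>http://example.com/2</link></item>
</channel></rss>
XML;

// Step 1: grab every <item>...</item> block.
// s: "." also matches newlines; U: quantifiers are lazy, so each match
// stops at the first closing tag instead of the last one.
$container = 'item';
preg_match_all("/<{$container}>.+<\/{$container}>/sU", $xml, $matches);
$items = $matches[0];

// Hypothetical helper: returns the decoded text of the first <$tag>
// element in $item, or null when the tag is not found.
function extract_tag_demo($item, $tag) {
    $t = preg_quote($tag, '/');
    if (!preg_match("/<{$t}(?:\s[^>]*)?>(.*)<\/{$t}>/sU", $item, $m)) {
        return null;
    }
    return html_entity_decode($m[1]);
}

// Step 2: pull one tag out of each item.
$titles = array();
foreach ($items as $item) {
    $titles[] = extract_tag_demo($item, 'title');
}
print_r($titles);
```

With the sample feed, $items holds two blocks and $titles holds "First story" and "Second & last" (the &amp; entity is decoded by html_entity_decode()).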

Here's some HTML with PHP

HTML/PHP
<h2>bbc</h2>
<?php echo $bbc; ?>
<h2>cnn</h2>
<?php echo $cnn; ?>
<h2>lockergnome (hidden CDATA)</h2>
<?php echo $lockergnome1; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome2; ?>
<h2>flickr</h2>
<?php echo $flickr; ?>

Here's what we get: the latest feeds from the BBC, CNN, Lockergnome (once with the CDATA content hidden and once with it shown) and flickr.

bbc

Police 'risk public confidence'
Police failures to tackle anti-social behaviour risk public confidence, says the chief inspector of constabulary.
Date: Thu, 11 Mar 2010 01:50:31 GMT
Four due in court over expenses
Three Labour MPs and a Tory peer are due in court later to face charges in relation to their expenses claims.
Date: Thu, 11 Mar 2010 04:24:37 GMT
Union to consider BA strike dates
Union leaders representing British Airways cabin crew will meet later to decide whether to announce dates for a strike.
Date: Thu, 11 Mar 2010 04:00:00 GMT
Afghan criticisms unfair, PM says
Gordon Brown labels as "unfair" criticisms over the timing of his weekend visit to British troops in Afghanistan.
Date: Thu, 11 Mar 2010 04:00:33 GMT
High-speed rail line plan awaited
Plans for a new high-speed rail line between London and Birmingham are to be published by the government later.
Date: Thu, 11 Mar 2010 02:33:05 GMT
Greeks stage fresh general strike
Thousands of Greek workers are expected to bring the country to a halt with a second strike in a month over austerity measures.
Date: Thu, 11 Mar 2010 00:10:21 GMT
Clegg seeks £10bn to cut deficit
Liberal Democrat leader Nick Clegg calls for a £10bn "repayment" in the next financial year to help cut the UK's deficit.
Date: Thu, 11 Mar 2010 03:29:48 GMT
MPs query non-political ministers
Ennobling people from outside Parliament to make them ministers should be "exceptional", cross-party MPs say.
Date: Thu, 11 Mar 2010 03:09:22 GMT
Fire College 'failed fire safety'
The UK Fire Service College failed to comply with fire safety laws when part of its own premises burnt down, the BBC learns.
Date: Thu, 11 Mar 2010 01:09:24 GMT
Winter insurance claims hit £650m
Insurers paid out £650m from 335,000 claims made as a result of damage caused by the wintry weather in the UK.
Date: Thu, 11 Mar 2010 00:07:35 GMT

cnn

Who is Pennsylvania's alleged 'Jihad Jane'?
Colleen LaRose, the U.S. woman indicted for allegedly conspiring to support terrorists and kill a person in a foreign country, attempted suicide in 2005, police said.
Date: Wed, 10 Mar 2010 16:54:05 EST
Ex-Toyota lawyer: Documents withheld
When former in-house defense attorney Dimitrios Biller resigned from his top post at Toyota, he walked out with something potentially more valuable than his nearly $4 million severance package.
Date: Wed, 10 Mar 2010 22:11:33 EST
Chief justice: Obama speech 'troubling'
Simmering tension spilled into public this week when Chief Justice John Roberts labeled the political atmosphere at the State of the Union address "very troubling."
Date: Wed, 10 Mar 2010 14:35:06 EST
Myanmar bars Suu Kyi from elections
Myanmar's ruling junta has announced a new election law that disqualifies pro-democracy leader Aung San Suu Kyi from participating in upcoming national elections.
Date: Wed, 10 Mar 2010 17:11:26 EST
CDC: Most with herpes don't know it
As much as 16 percent of the U.S. population between the ages of 14 and 49 has genital herpes, according to a government study released Tuesday.
Date: Wed, 10 Mar 2010 12:42:29 EST
Workers trained to hack Defense Dept.
The Pentagon is training people to hack into its own computer networks.
Date: Wed, 10 Mar 2010 19:27:39 EST
Passenger admits disrupting flight
A Texas man who became enraged when a flight attendant refused to serve him alcohol and spent part of a flight locked in the lavatory has pleaded guilty to interfering with an airline flight crew.
Date: Wed, 10 Mar 2010 12:04:19 EST
Dems aim to ban for-profits' earmarks
House Democrats said Wednesday that they will ban earmarks directed to for-profit companies.
Date: Wed, 10 Mar 2010 18:47:53 EST
N.Y. state loses 2nd top cop in 2 weeks
New York state's top police official announced Wednesday he was quitting, the second acting superintendent to step down in as many weeks.
Date: Wed, 10 Mar 2010 13:50:14 EST

lockergnome (hidden CDATA)

The lockergnome feed seems to be down.

lockergnome

The lockergnome feed seems to be down.

flickr

IMG_4341

My Life Diary posted a photo:

IMG_4341

Date: 2010-03-11T04:37:58Z
Festival of the Son 2009 078

Friends of GO posted a photo:

Festival of the Son 2009 078

Date: 2010-03-11T04:37:59Z
DSC_5495

lightmod posted a photo:

DSC_5495

Date: 2010-03-11T04:37:59Z
Mammoth Tree

JTContinental posted a photo:

Mammoth Tree

Date: 2010-03-11T04:38:00Z
BOOBAH 065

mufasah posted a photo:

BOOBAH 065

Date: 2010-03-11T04:38:00Z
DSC_1483

Benji Holzman posted a photo:

DSC_1483

Date: 2010-03-11T04:38:00Z
DSCN3151.JPG

ecast_1028455 posted a photo:

DSCN3151.JPG

Effortlessly uploaded by Eye-Fi

Date: 2010-03-11T04:38:01Z
Taking Her Reflection

LaValle PDX posted a photo:

Taking Her Reflection

Date: 2010-03-11T04:38:01Z
IMG_8272l

Tata Yap posted a photo:

IMG_8272l

Date: 2010-03-11T04:38:01Z
DSC_4623

Bella Productions posted a photo:

DSC_4623

Date: 2010-03-11T04:38:01Z

Comments

#1
2007-03-02 dumb_dave says :

Sorry, I'm new to this stuff, willing to learn and all that, but I don't get the idea. Copy that snippet of PHP code into a file and call it, say, parser.php. Copy the other snippet of HTML into a file and call it, for lack of inventiveness, parser.html. Right so far? If so, where's the intermediate step? How does this HTML "call" or "include" the PHP in order to function? Or am I missing something so basic that even asking this will earn me the cherished "Idiot of the Day Award"? Thanks.

#2
2007-03-02 BonRouge says :

dave,
You can include the php or just have it in one page. The page would have a '.php' extension - not '.html.'
Here's a simple example of this page (with no style or anything) in one file.
Save it and change the extension to '.php'. If you don't have a server installed on your machine, you'll have to upload it to a remote server to view it.
If you want, you can take the php code out of that page and save it in a different file and include it into the page - that way, you could use it on more than one page if you wanted.

I hope that makes it a bit clearer.

#3
2007-03-02 dumb_dave says :

Thanks for the explanations. Much clearer now and ... yes, it indeed works like a champ. (Maybe I was just too tired? Putting 1 and 1 together and coming up with 11 instead of two?) Best regards and thanks for all the tips elsewhere as well.

#4
2007-03-07 dumb_dave says :

Useful indeed, BonRouge, but how does one display the <description> tagged material that is buried behind things like <![CDATA[ <p> etc.? Is the PHP code easily modified to handle that? And if so, can one apply it selectively? That is, show the fuller "description" material for one site but then reduce the next site entry to "headlines" only (i.e., "titles" and "links") and then toggle the next one back to fuller details? Hope this is not a major headache, but it's beyond my ability to work it out at this stage ... and everything tried brought the larger process to a grinding halt. (This isn't a do-my-homework-for-me question. I'm bewildered by the code.) Thanks.

#5
2007-03-07 BonRouge says :

dave,
I thought I'd already sorted out the problem of data wrapped in the CDATA stuff. Does the code have a problem? If you could show me where it's not working, I'll try to improve it.
As for choosing whether to show that particular data or not, yes - I think you could do that by adding another variable. You see near the top where there's a preg_replace() to remove the CDATA tags? You could put that in an if statement - if the variable is not present, remove the CDATA tags, if it is, leave them where they are.
Does that make sense?
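
The if-statement idea can be tried in isolation. The sketch below is not lifted from xml_parser(); the function name cdata_demo() and the sample string are invented for illustration. It mirrors the 'hide' switch used in the parser's last argument: depending on the flag, a CDATA section is either unwrapped (markers removed, content kept) or dropped altogether.

```php
<?php
// Hypothetical helper: $mode === 'hide' removes CDATA sections completely,
// any other value keeps their content and strips only the markers.
function cdata_demo($text, $mode = '') {
    // s: "." matches newlines; U: lazy, so each section is matched separately.
    $pattern = '/<!\[CDATA\[(.*)\]\]>/sU';
    $replacement = ($mode === 'hide') ? '' : '$1';
    return preg_replace($pattern, $replacement, $text);
}

$sample = 'Intro <![CDATA[<p>Full story</p>]]> outro';

$shown  = cdata_demo($sample);          // content kept
$hidden = cdata_demo($sample, 'hide');  // content dropped

echo $shown, "\n";
echo $hidden, "\n";
```

Calling it once per feed with a different flag is what lets one feed show full descriptions while the next shows headlines only.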

#6
2007-03-10 BonRouge says :

dave,
I think I found the problem and sorted it out. As you can see, it seems to work OK now. Some of the characters in the Lockergnome feed don't show right on this page though. I wonder if it's anything to do with me being in Japan. Do you see strange characters?

#7
2007-05-01 Ice says :

I have been trawling the web for days looking for something like this. Thanks a WHOLE lot man. I was also wondering if you can modify this parser to merge these fields and display, say, only the latest 10 items?

#8
2007-11-02 steve says :

Thanks, this sorted out my CDATA parsing problem. That doesn't seem too clear in the docs.

s
