RSS/XML feed parser

Here's some PHP:

PHP
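// Fetch a feed and return its first $number items as an HTML definition list.
// $page: feed URL; $container: item element ('item' for RSS, 'entry' for Atom);
// $tags: element names in the order title, description, link, date;
// $cdata: 'hide' drops CDATA sections, anything else unwraps and shows them.
function xml_parser($page,$container,$tags,$number,$cdata) {
  if (!$number) {$number=100;}
  $xml=file_get_contents($page);
  // s lets "." match newlines; U makes the match ungreedy, so each container is captured separately.
  preg_match_all("/<$container>.+<\/$container>/sU",$xml,$items);
  $items=$items[0];
  $itemsArray=array();
  foreach ($items as $item) {
    // $fields replaces the original $this, which is reserved in PHP and cannot be used as an ordinary variable.
    $fields=array();
    for ($i=0; $i<count($tags); $i++) {
      preg_match("/<$tags[$i](.+)(<\/$tags[$i]>)/sU",$item,$tag);
      // Strip the wrapping tags when the element has no attributes.
      $fields[$i]=preg_replace("/<$tags[$i]>(.+)(<\/$tags[$i]>)/sU",'$1',$tag);
      $fields[$i]=array_map('html_entity_decode',$fields[$i]);
    }
    if (count($itemsArray)<$number) {array_push($itemsArray,$fields);}
  }
  $theData="<dl>";
  foreach ($itemsArray as $item) {
    $data=array();
    for ($i=0; $i<count($tags); $i++) {
      // Element 0 holds the whole matched element; fall back to '' when the tag was missing.
      $data[$i]=isset($item[$i][0]) ? $item[$i][0] : '';
    }
    $title=$data[0];
    // Clean up the description: normalise <img> tags, drop <content> wrappers and border attributes.
    $dpatterns=array(); $dreplacements=array();
    $dpatterns[0]="/<img(.+)><\/img>/sU"; $dreplacements[0]='<img$1>';
    $dpatterns[1]="/<img(.+)\/>/sU"; $dreplacements[1]='<img$1>';
    $dpatterns[2]="/<(\/|)content?(.+|)>/sU"; $dreplacements[2]='';
    $dpatterns[3]="/border=\"0\"/sU"; $dreplacements[3]='';
    // CDATA sections are either unwrapped (shown) or removed entirely (hidden).
    $dpatterns[4]="/<\!\[CDATA\[(.+)\]\]>/sU";
    $dreplacements[4]=($cdata!='hide') ? '$1' : '';
    $description=preg_replace($dpatterns,$dreplacements,$data[1]);
    // Atom feeds carry the URL in the href attribute of a self-closing <link>.
    $link=preg_replace("/<link.+href=\"(.+)\"(.+|)\/>/sU",'$1',$data[2]);
    $date=$data[3];
    $theData.="
    <dt><a href=\"$link\">$title</a></dt>
    <dd class=\"story\">$description</dd>
    <dd>Date: $date</dd>\n";
  }
  $theData.="</dl>";
  return $theData;
}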

$container='item';
$tags=array('title','description','link','pubDate');
$bbc=xml_parser("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml",$container,$tags,10,'');
$cnn=xml_parser("http://rss.cnn.com/rss/cnn_topstories.rss",$container,$tags,10,'');

// Lockergnome with CDATA sections hidden...
$tags=array('title','content:encoded','link','pubDate');
$lockergnome1=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'hide');

// ...and with CDATA sections unwrapped and shown.
$lockergnome2=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'');

$container='entry';
$tags=array('title','content','link','published');
$flickr=xml_parser("http://api.flickr.com/services/feeds/photos_public.gne",$container,$tags,10,'');
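
To see what the two regular expressions in xml_parser() are doing without fetching a live feed, here is a minimal sketch run against an inline sample. The sample XML and the helper name extract_items() are made up for illustration, and the helper returns plain arrays rather than HTML; only the pattern style (the s and U modifiers) follows the function above.

```php
<?php
// Hypothetical two-item feed, used instead of a live URL so the sketch runs offline.
$xml = <<<XML
<rss><channel>
<item><title>First story</title><description>Fish &amp; chips</description></item>
<item><title>Second story</title><description>More news</description></item>
</channel></rss>
XML;

// extract_items() is an illustrative helper, not part of the original code.
function extract_items($xml, $container, $tags) {
    $c = preg_quote($container, '/');
    // s: let "." match newlines; U: ungreedy, so each container matches separately.
    preg_match_all("/<$c>.+<\/$c>/sU", $xml, $matches);
    $rows = array();
    foreach ($matches[0] as $item) {
        $row = array();
        foreach ($tags as $tagName) {
            $t = preg_quote($tagName, '/');
            // Capture the text between <tag> and </tag>, or '' if the tag is absent.
            $row[$tagName] = preg_match("/<$t>(.+)<\/$t>/sU", $item, $m)
                ? html_entity_decode($m[1]) : '';
        }
        $rows[] = $row;
    }
    return $rows;
}

$rows = extract_items($xml, 'item', array('title', 'description'));
echo count($rows), "\n";            // 2
echo $rows[0]['description'], "\n"; // Fish & chips
```

Without the U modifier the first pattern would be greedy and swallow everything from the first opening container tag to the last closing one, giving a single match instead of one per item.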

Here's some HTML with PHP:

HTML/PHP
<h2>bbc</h2>
<?php echo $bbc; ?>
<h2>cnn</h2>
<?php echo $cnn; ?>
<h2>lockergnome (hidden CDATA)</h2>
<?php echo $lockergnome1; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome2; ?>
<h2>flickr</h2>
<?php echo $flickr; ?>

Here's what we get: the latest feeds from the BBC, CNN, Lockergnome (once with CDATA hidden and once with it shown) and Flickr.

bbc

Resolution on Syria vetoed at UN
Russia and China veto an Arab and Western-backed UN resolution condemning the violent crackdown in Syria, hours after scores are killed in Homs.
Date: Sat, 04 Feb 2012 21:32:13 GMT
Warning as heavy snowfall hits UK
Snow is falling across much of the UK, from southern Scotland to the English Midlands, disrupting flights and leading to calls for drivers to take care.
Date: Sat, 04 Feb 2012 18:37:31 GMT
William starts Falklands duties
Prince William starts work as an RAF search and rescue pilot in the Falkland Islands, having arrived in the territory on a six-week routine deployment.
Date: Sat, 04 Feb 2012 17:49:24 GMT
Arrest in Betty Yates murder case
A man is arrested in connection with the murder of Worcestershire pensioner Betty Yates.
Date: Sat, 04 Feb 2012 14:56:33 GMT
Nevada holds Republican vote
Republicans in the US state of Nevada take part in caucuses to decide their choice of presidential candidate with Mitt Romney leading the field.
Date: Sat, 04 Feb 2012 17:34:06 GMT
Fidel Castro launches his memoirs
Former Cuban President Fidel Castro appears in public for the first time since April 2011 to launch a two-volume book of memoirs.
Date: Sat, 04 Feb 2012 19:22:55 GMT
Huhne's exit 'a loss to cabinet'
Chris Huhne's resignation as Energy Secretary will be a loss to the cabinet and the Lib Dems, his former parliamentary private secretary tells BBC News.
Date: Sat, 04 Feb 2012 11:32:54 GMT
Thousands in rival Moscow marches
Tens of thousands of people march in Moscow in protest at Prime Minister Vladimir Putin, while his supporters hold a rally elsewhere in Russia's capital.
Date: Sat, 04 Feb 2012 16:02:11 GMT
ANC youth leader appeal dismissed
South African youth leader Julius Malema loses his appeals against the ruling ANC's decision to suspend him for bringing the party into disrepute.
Date: Sat, 04 Feb 2012 14:29:43 GMT
Baths site reopens after 11 years
A swimming pool in Glasgow whose closure 11 years ago led to violent clashes with police is reopened by Scots actor and director Peter Mullan.
Date: Sat, 04 Feb 2012 13:59:33 GMT

cnn

Russia, China veto resolution aimed at Syrian violence
The draft resolution before the U.N. Security Council would demand an end to the bloody crackdown on government opponents in Syria.
Date: Sat, 04 Feb 2012 15:51:56 EST
Romney poised to win Nevada caucuses
Mitt Romney has a chance today in Nevada to do what no Republican presidential candidate has done so far this election cycle: win two contests back to back.
Date: Sat, 04 Feb 2012 12:12:55 EST
Martin: Why don't candidates see poor?
In the 1,257 GOP debates we have had to sit through, poverty and the poor have rarely come up, so it was no surprise that Mitt Romney would be dismissive of them in an interview this week with CNN's Soledad O'Brien.
Date: Sat, 04 Feb 2012 12:15:51 EST
Civilian deaths rise in Afghanistan
Across Afghanistan, civilians live in fear that they will become part of a grim and ever growing tally -- as innocent casualties in a decade of conflict. And according to a United Nations report released Saturday tracking civilian casualties, that rate of such deaths rose yet again in 2011.
Date: Sat, 04 Feb 2012 16:10:19 EST
Fidel Castro unveils 1,000-page memoir
Fidel Castro has released a previously unannounced two-volume memoir of his life, Cuban state-run media reported Saturday.
Date: Sat, 04 Feb 2012 15:46:07 EST
2nd teacher at L.A. school arrested
Another teacher has been arrested at Miramonte Elementary School in Los Angeles on allegations of lewd acts on young pupils, authorities said.
Date: Fri, 03 Feb 2012 20:44:23 EST
Police: Suspect on run kills Ala. officer
An Alabama robbery suspect fatally stabbed a police officer in jail, escaped in a stolen patrol car and wounded another officer before he was killed, authorities said Friday.
Date: Sat, 04 Feb 2012 03:02:53 EST
WikiLeaks: Soldier faces court martial
Pfc. Bradley E. Manning, who is suspected of leaking thousands of classified documents to WikiLeaks, will be court martialed on charges that could lead to a sentence of life in prison, the Army said Friday in a statement.
Date: Fri, 03 Feb 2012 21:37:16 EST
Lance Armstrong doping case dropped
Federal prosecutors said Friday that they are closing a criminal investigation into alleged use of performance-enhancing drugs by champion cyclist Lance Armstrong without filing charges.
Date: Fri, 03 Feb 2012 20:52:42 EST
Police arrest 6 at Occupy DC camp
U.S. Park Police in riot gear entered the Occupy DC camp in McPherson Square in downtown Washington early Saturday.
Date: Sat, 04 Feb 2012 14:16:30 EST

lockergnome (hidden CDATA)

The lockergnome feed seems to be down.

lockergnome

The lockergnome feed seems to be down.

flickr

ABC_0629-5

StewartJames posted a photo:

ABC_0629-5

Beverly Hills At Night

Date: 2012-02-04T21:40:47Z
085

pwbaker posted a photo:

085

Date: 2012-02-04T21:40:50Z
P1010939

KG Müschenbach posted a photo:

P1010939

Date: 2012-02-04T21:40:50Z
Grabsch

laurasia280 posted a photo:

Grabsch

Date: 2012-02-04T21:40:50Z
DSC04683

CLme posted a photo:

DSC04683

Date: 2012-02-04T21:40:52Z
Dani and Zoe

amymhathaway posted a photo:

Dani and Zoe

Date: 2012-02-04T21:40:52Z
DSC_9718LRLR

xstc posted a photo:

DSC_9718LRLR

Date: 2012-02-04T21:40:52Z
IMG_0584

Lisa Kettell posted a photo:

IMG_0584

Date: 2012-02-04T21:40:52Z
BMX Riders-428

steph1808 posted a photo:

BMX Riders-428

Date: 2012-02-04T21:40:52Z
DSC09583

Luis Serichol posted a photo:

DSC09583

Date: 2012-02-04T21:40:52Z

Comments

#1
2007-03-02 dumb_dave says :

Sorry, I'm new to this stuff, willing to learn and all that, but I don't get the idea. Copy that snippet of PHP code into a file and call it, say, parser.php. Copy the other snippet of HTML into a file and call it, for lack of inventiveness, parser.html. Right so far? If so, where's the intermediate step? How does this HTML "call" or "include" the PHP in order to function? Or am I missing something so basic that even asking this will earn me the cherished "Idiot of the Day Award"? Thanks.

#2
2007-03-02 BonRouge says :

dave,
You can include the php or just have it in one page. The page would have a '.php' extension - not '.html.'
Here's a simple example of this page (with no style or anything) in one file.
Save it and change the extension to '.php'. If you don't have a server installed on your machine, you'll have to upload it to a remote server to view it.
If you want, you can take the php code out of that page and save it in a different file and include it into the page - that way, you could use it on more than one page if you wanted.

I hope that makes it a bit clearer.
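
As a rough sketch of the include approach described in this comment: the file name parser_sketch.php and the function feed_heading() are hypothetical stand-ins (a real setup would keep xml_parser() in its own file next to the page). The stand-in file is written to the temp directory only so that the example runs by itself.

```php
<?php
// In practice the shared file would sit next to the page and hold xml_parser().
// Here a tiny stand-in is written to the temp directory so the sketch is self-contained.
$parserFile = sys_get_temp_dir() . '/parser_sketch.php';
file_put_contents(
    $parserFile,
    '<?php function feed_heading($name) { return "<h2>" . htmlspecialchars($name) . "</h2>"; }'
);

// This is the kind of line a .php page would start with to pull in the shared code.
require_once $parserFile;

// Functions defined in the included file are now available to the page.
echo feed_heading('bbc'), "\n"; // <h2>bbc</h2>
```

Any page that starts with the same require_once line can reuse the function, which is the point of moving it into its own file.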

#3
2007-03-02 dumb_dave says :

Thanks for the explanations. Much clearer now and ... yes, it indeed works like a champ. (Maybe I was just too tired? Putting 1 and 1 together and coming up with 11 instead of two?) Best regards and thanks for all the tips elsewhere as well.

#4
2007-03-07 dumb_dave says :

Useful indeed, BonRouge, but how does one display the <description> tagged material that is buried behind things like <![CDATA[ <p> etc.? Is the PHP code easily modified to handle that? And if so, can one apply it selectively? That is, show the fuller "description" material for one site but then reduce the next site entry to "headlines" only (i.e., "titles" and "links") and then toggle the next one back to fuller details? Hope this is not a major headache, but it's beyond my ability to work it out at this stage ... and everything tried brought the larger process to a grinding halt. (This isn't a do-my-homework-for-me question. I'm bewildered by the code.) Thanks.

#5
2007-03-07 BonRouge says :

dave,
I thought I'd already sorted out the problem of data wrapped in the CDATA stuff. Does the code have a problem? If you could show me where it's not working, I'll try to improve it.
As for choosing whether to show that particular data or not, yes - I think you could do that by adding another variable. You see near the top where there's a preg_replace() to remove the CDATA tags? You could put that in an if statement - if the variable is not present, remove the CDATA tags, if it is, leave them where they are.
Does that make sense?
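
A minimal sketch of that idea, following the 'hide' convention used in the function at the top of the page. The helper name handle_cdata() and the sample string are made up for illustration.

```php
<?php
// handle_cdata() is a hypothetical helper; $mode plays the role of the extra variable suggested above.
function handle_cdata($text, $mode = '') {
    // 'hide' removes CDATA sections entirely; any other value keeps their contents.
    $replacement = ($mode === 'hide') ? '' : '$1';
    return preg_replace('/<!\[CDATA\[(.+)\]\]>/sU', $replacement, $text);
}

$raw = 'Intro <![CDATA[<p>Full story</p>]]> end';
echo handle_cdata($raw), "\n";         // Intro <p>Full story</p> end
echo handle_cdata($raw, 'hide'), "\n"; // Intro  end
```

Because the mode is passed per call, one feed can show its full description while the next is reduced to titles and links.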

#6
2007-03-10 BonRouge says :

dave,
I think I found the problem and sorted it out. As you can see, it seems to work OK now. Some of the characters in the Lockergnome feed don't show right on this page though. I wonder if it's anything to do with me being in Japan. Do you see strange characters?

#7
2007-05-01 Ice says :

I have been trawling the web for days looking for something like this. Thanks a WHOLE lot man. I was also wondering if you can modify this parser to merge these fields and display, say, only the latest 10 items?

#8
2007-11-02 steve says :

thanks sorted out my cdata parsing problem, seems that is not too clear in the docs

s
