RSS/Atom parsing: Easy as pie, but only after a simple fix.

September 13, 2011

I recently discovered how much I love SimplePie, a PHP class that parses Atom and RSS feeds. I’ve tried parsing feeds by hand in the past, and let me tell you, it’s a pain in the butt.

However, there’s one little issue that makes the pie less simple. This is one of those times when the error message doesn’t relate to the actual error. The message itself was:

This XML document is invalid, likely due to invalid characters. XML error: Mismatched tag at line 156, column 11

But that couldn’t possibly be the actual issue. For one thing, the XML file was only about 50 lines long, so there was no way the error could be on line 156. Besides, the feed worked perfectly in the W3C’s validator, Windows Live Mail, and even the online SimplePie demo. Weirdest of all, when I saved a local copy of the feed, SimplePie was able to open that just fine.

It turns out that the problem came from a bug in SimplePie: The equals sign (=) and ampersand (&) were being URL-encoded to %3D and %26, respectively. This bug was supposed to be fixed in the 1.2.1-dev version, which I’m using, but somehow the characters only got added to one of the two lines where they were needed.

The end result was that any feed generated automatically from GET data in the URL wouldn’t work. I was testing with a feed from NASDAQ, and the URL was http://www.nasdaq.com/aspxcontent/NasdaqRSS.aspx?data=quotes&symbol=MSFT. When I changed the & and = characters in that URL to their encoded values (http://www.nasdaq.com/aspxcontent/NasdaqRSS.aspx?data%3Dquotes%26symbol%3DMSFT), I got a nice custom HTTP 500 error page.
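To see why the encoded URL stops working, here is a small sketch in Python (not PHP, and not SimplePie's actual code). The helper name `encode_url` and the two "safe" character lists are my own assumptions, made up for illustration; the only point is that leaving & and = out of the list of characters to keep produces exactly the mangled URL above.

```python
from urllib.parse import quote, urlsplit, parse_qs

url = "http://www.nasdaq.com/aspxcontent/NasdaqRSS.aspx?data=quotes&symbol=MSFT"


def encode_url(url, safe):
    """Hypothetical helper: percent-encode a URL, leaving the characters in `safe` alone."""
    return quote(url, safe=safe)


# Assumed "buggy" list: '&' and '=' are missing, so they get percent-encoded.
broken = encode_url(url, safe=":/?")

# Assumed "fixed" list: '&' and '=' are kept as-is.
fixed = encode_url(url, safe=":/?&=")

print(broken)
print(fixed)

# With the separators encoded, the query string no longer splits into parameters.
print(parse_qs(urlsplit(broken).query))
print(parse_qs(urlsplit(fixed).query))
```

In this sketch Python's own query parser finds no parameters at all in the broken URL, because the & and = that separate them are gone. How a given server reacts to such a request is up to the server; in my case it answered with the error page.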

That explains everything: SimplePie messed up the URL, then tried to parse the error page as a feed. Adding the & and = to the appropriate line in simplepie.inc solved the problem.
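If you run into a similarly misleading parse error, one way to catch it early is to check whether the response is a feed at all before handing it to a parser. The following is only a rough sketch in Python using the standard library's XML parser; the function name `looks_like_feed` and the list of accepted root elements are assumptions of mine, not anything SimplePie does.

```python
import xml.etree.ElementTree as ET

# Root element names used by RSS 2.0, Atom, and RSS 1.0 (RDF) documents.
FEED_ROOTS = {"rss", "feed", "RDF"}


def looks_like_feed(body):
    """Return True if `body` is well-formed XML whose root element is a feed element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, e.g. an HTML error page with unclosed tags.
        return False
    # Strip a namespace prefix such as "{http://www.w3.org/2005/Atom}".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


rss_body = '<rss version="2.0"><channel><title>Quotes</title></channel></rss>'
error_page = "<html><body><h1>Server Error</h1><br></body></html>"

print(looks_like_feed(rss_body))
print(looks_like_feed(error_page))
```

An HTML error page usually fails this check, either because its tags don't match up as XML or because its root element is html rather than a feed element. That is a much clearer hint than a mismatched-tag error on a line the feed doesn't even have.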

If you’re working on a PHP script that parses RSS or Atom feeds, I highly recommend giving SimplePie a try. This bug will probably be fixed soon, and in the meantime, adding two characters to one line of code is way easier than trying to write your own parser.

Categories: Programming