Archive for the ‘Programming’ Category

TDD: Why Coding to Pass the Tests Makes Sense

August 31, 2013

Lately I’ve been working on learning Test-Driven Development (TDD) and trying to use it for a project or two. I’m finding the TDD way of thinking a bit difficult to wrap my mind around. Much of it initially seems like the wrong way to do things. Eventually, though, I find that things just start falling into place and it becomes clear why TDD imposes such seemingly odd requirements.

For those who don’t know, Test-Driven Development is a methodology that aims to make unit testing easier by making developers produce more testable code. Its goal is to ensure that there is a test for every piece of functionality. One big advantage is that the resulting unit tests can double as a specification, so it aids documentation as well as testing. Since I’m still learning, though, you should probably not rely on me for an explanation. Instead, you might try the Wikipedia article on TDD or the tutorial in the SimpleTest documentation that I started with. (I know, PHPUnit is supposed to be better, but I haven’t had any luck setting it up yet.)

The part of TDD that bothered me was not the idea of writing the tests before the actual program code, but the insistence that the program code be written so that it just passes the tests. If you just have a test that checks for a return value, for instance, the program code isn’t supposed to do anything but return the expected value. This seemed silly to me: Why write code that passes the test, but doesn’t actually do anything? I felt like that defeated the whole purpose of testing by producing code that passed the tests but didn’t actually work. Sure, it makes sense for the tests to be implementation-agnostic, but the implementation should actually do something.

It finally clicked a couple of weeks ago when I started a new project and did things the TDD way even when they didn’t make sense to me. I wrote a test, wrote some code that just passed it, and then went back and wrote another test that would require the code to change. Eureka! For one thing, it dawned on me that all I needed to do was keep writing more tests that required the program to do what it needed to do. More to the point, I noticed that making these tiny, incremental changes helped ensure that I would never introduce any functionality for which there wasn’t already a functioning test. And isn’t ensuring that the tests cover as much of the program code as possible what TDD is all about?
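
To make that cycle concrete, here is a minimal sketch in plain PHP. The greet_v1() and greet_v2() functions and the check() helper are hypothetical names made up for illustration (they are not part of SimpleTest), and the two versions sit side by side only so the file runs as a single script. In a real TDD session you would edit the same function in place.

```php
<?php
// Hypothetical stand-in for a test framework's assertion method.
function check($condition, $message) {
    if (!$condition) {
        throw new Exception("FAILED: $message");
    }
}

// Step 1: the first test only asks for one specific return value,
// so the simplest code that passes is a hard-coded return.
function greet_v1($name) {
    return 'Hello, World!';
}
check(greet_v1('World') === 'Hello, World!', 'greets World');

// Step 2: a second test makes the hard-coded value insufficient,
// which forces the code to start using its argument.
function greet_v2($name) {
    return 'Hello, ' . $name . '!';
}
check(greet_v2('World') === 'Hello, World!', 'still greets World');
check(greet_v2('Alice') === 'Hello, Alice!', 'greets Alice');
```

The real behavior only appears once a test demands it, so there is never a line of code without a test that would fail in its absence.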

If you don’t mind my extrapolating a bit, I think I learned something else, too: If you’re going to learn a methodology, you have to accept it on its own terms. I don’t mean that programming requires blind faith; just that it’s important to keep an open mind rather than approaching a methodology with the stance that you’re going to keep the parts that fit your existing assumptions and reject the rest. In other words, learn to follow the rules before you break them.

Having had that insight, I’m eager to see what else I can learn from this experiment. If I come up with anything interesting, I’ll post it here.

Nice, Clean HTML output at any length

September 17, 2011

I have written previously about the rather spiffy SimplePie. For the record, I really like it so far.

But there’s one thing I want to do with feeds that SimplePie doesn’t support: Trimming output. It’s easy enough if you plan on stripping all markup from the output, but it gets trickier if you want to keep it. This feature is planned for the 2.0 release, which may or may not have to do with the “approximately 100 Jillion-Kabillion-Bazillion support questions” on the subject. (Now that’s a user base!)

Needing a solution that would work in the interim, I searched the Internet for a while and found an interesting function to truncate text and keep the HTML by Jonas Raoni Soares Silva. It worked well given good input, but it inserted extra end tags when the input contained improperly nested tags or missing end tags inside correctly paired tags.

Figuring that the last thing I need is a bunch of rogue </div> end tags breaking my layout, I decided to see if I could come up with something on my own. Here’s what I came up with:

function trim_html($string, $length = null, $suffix = '&hellip;'){

	// An empty string cannot be loaded as HTML, so there is nothing to repair.
	if ($string === ''){
		return '';
	} // endif

	// Trim the string to $length, but only if a number was given for $length
	// and the string is actually longer than that.
	if (is_numeric($length) && strlen($string) > $length){
		$string = substr($string, 0, (int) $length); // Get only the first $length bytes of $string.
		$string = preg_replace('/<[^>]*$/', '', $string); // Drop a tag that was cut off partway through.
		$string .= $suffix; // Since we trimmed, add the ellipsis or other specified suffix.
	} // endif

	// Next, create a DOM document from the trimmed string.
	// The DOMDocument will correct errors such as unclosed tags when loading the HTML.
	$dom = new DOMDocument();
	@$dom->loadHTML($string); // This can produce lots of warnings, so suppress them.
	$string = $dom->saveHTML();

	// Remove the extra HTML (doctype, <html> and <body> wrappers) added by saveHTML.
	$string = preg_replace('/^.*<body[^>]*>/is', '', $string);
	$string = preg_replace('/<\/body[^>]*>.*$/is', '', $string);

	return $string;
} // end trim_html()

Of course, there are some issues. (Nothing’s ever easy.) For one thing, the DOMDocument::saveHTML method in recent versions of PHP (5.3.6 and up) lets you pass a DOMNode and get back only “a subset of the document.” I would have liked to try this, as the regular expressions seem hackish to me for some reason. However, my test environment is running 5.3.5, so I can’t test this.
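
For anyone on PHP 5.3.6 or later, the node-based approach might look something like the sketch below. The function name body_inner_html() is made up for this example, and I could not run it on my 5.3.5 test environment, so treat it as a starting point rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: repair an HTML fragment and return only the
// contents of <body>, without using regular expressions.
// Assumes PHP 5.3.6+, where saveHTML() accepts a DOMNode argument.
function body_inner_html($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // Suppress warnings about malformed markup.

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // Nothing was parsed into a body element.
    }

    // Serialize each child of <body> individually and join the pieces.
    $output = '';
    foreach ($body->childNodes as $child) {
        $output .= $dom->saveHTML($child);
    }
    return $output;
}
```

Because only the children of <body> are serialized, the doctype and the <html> and <body> wrappers never make it into the output in the first place.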

Another issue is that, since a text node can’t be a child of <body>, loadHTML wraps orphaned text nodes in paragraphs. So trim_html('test') would return <p>test</p>. I suppose I could add code to wrap input in a <div> and then tweak the regular expressions to strip those out, should the need arise. For now, the extra paragraphs don’t bother me, so I’ll leave them alone.
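
A rough sketch of that <div> idea follows. The function name repair_fragment() is hypothetical, and the patterns assume that saveHTML emits the wrapper as a plain <div> directly inside <body>. One caveat: a stray </div> in the input would close the wrapper early, so this is only safe for input without unbalanced </div> tags.

```php
<?php
// Hypothetical variant: wrap the input in a <div> so that bare text
// has a parent element, then strip the wrapper again afterwards.
function repair_fragment($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML('<div>' . $html . '</div>'); // Suppress parser warnings.
    $output = $dom->saveHTML();

    // Remove everything up to and including the opening wrapper <div>.
    $output = preg_replace('/^.*?<body[^>]*>\s*<div>/is', '', $output);
    // Remove the closing wrapper </div> and everything after it.
    $output = preg_replace('/<\/div>\s*<\/body[^>]*>.*$/is', '', $output);

    return $output;
}
```

With this variant, plain text such as 'test' should come back without the extra paragraph tags.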

Here’s hoping that this is of some use to someone out there. If anyone has any suggestions for improvements, I’d love to hear them.

Categories: Programming

RSS/Atom parsing: Easy as pie, but only after a simple fix.

September 13, 2011

I recently discovered how much I love SimplePie, a PHP class that parses Atom and RSS feeds. I’ve tried doing it manually in the past, and let me tell you that it’s a pain in the butt.

However, there’s one little issue that makes the pie less simple. This is one of those times when the error message doesn’t relate to the actual error. The message itself was:

This XML document is invalid, likely due to invalid characters. XML error: Mismatched tag at line 156, column 11

But that couldn’t possibly be the actual issue. For one thing, the XML file was only about 50 lines long, so there was no way the error could be on line 156. Besides, it worked perfectly on the W3C’s validator, Windows Live Mail, and even the online SimplePie demo. Weirdest of all, when I saved a local copy of the feed, SimplePie was able to open that just fine.

It turns out that the problem came from a bug in SimplePie: The equals sign (=) and ampersand (&) were being URL-encoded to %3D and %26, respectively. This bug was supposed to be fixed in the 1.2.1-dev version, which I’m using, but somehow the characters only got added to one of the two lines where they were needed.

The end result was that any feed that was automatically generated based on GET data in the URL wouldn’t work. I was testing with a feed from NASDAQ, and the URL was http://www.nasdaq.com/aspxcontent/NasdaqRSS.aspx?data=quotes&symbol=MSFT. When I changed the & and = characters in that URL to their encoded values (http://www.nasdaq.com/aspxcontent/NasdaqRSS.aspx?data%3Dquotes%26symbol%3DMSFT), I got a nice custom HTTP 500 error page.
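
To see why the encoded URL fails, here is a small sketch that uses PHP's own query-string parser as a stand-in for whatever the server does. This is an assumption for illustration only: I have no idea how NASDAQ's server actually parses requests, but most parsers split on the literal & and = characters in the same way.

```php
<?php
$url = 'http://www.nasdaq.com/aspxcontent/NasdaqRSS.aspx?data=quotes&symbol=MSFT';

// With literal & and =, the query string splits into two parameters.
$query = parse_url($url, PHP_URL_QUERY);
parse_str($query, $params);
// $params now holds 'data' => 'quotes' and 'symbol' => 'MSFT'.

// Simulate the bug: the separators themselves get percent-encoded.
$broken = str_replace(array('=', '&'), array('%3D', '%26'), $query);
parse_str($broken, $brokenParams);
// With no literal separators left, the whole string is treated as a
// single parameter name with an empty value.

var_dump($params);
var_dump($brokenParams);
```

The server never sees a symbol parameter at all, which would explain why it answered with an error page instead of a feed.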

That explains everything: SimplePie messed up the URL, then tried to parse the error page as a feed. Adding the & and = to the appropriate line in simplepie.inc solved the problem.

If you’re working on a PHP script that parses RSS or Atom feeds, I highly recommend giving SimplePie a try. This bug will probably be fixed soon, and in the meantime, adding two characters to one line of code is way easier than trying to write your own parser.

Categories: Programming