
Why <b> and <i>?

September 29, 2012

Common Web development wisdom states that the <b> and <i> tags, and the various other presentational font style elements, should be scrupulously avoided because they are purely presentational in nature and carry no semantic meaning. So imagine my surprise to learn that not only would they be in HTML5, but they are standard already according to HTML 4.01 Strict! (Of the font style elements, only <u>, <s>, and <strike> are deprecated.)

My question is, why? That’s not to say I disagree with the decision, but what’s the rationale?

The WHATWG FAQ explains it thus:

The inclusion of these elements is a largely pragmatic decision based upon their widespread usage, and their usefulness for use cases which are not covered by more specific elements.

While there are a number of common use cases for italics which are covered by more specific elements, such as emphasis (em), citations (cite), definitions (dfn) and variables (var), there are many other use cases which are not covered well by these elements. For example, a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, or a ship name.

Similarly, although a number of common use cases for bold text are also covered by more specific elements such as strong emphasis (strong), headings (h1-h6) or table headers (th); there are others which are not, such as key words in a document abstract or product names in a review.

Some people argue that in such cases, the span element should be used with an appropriate class name and associated stylesheet. However, the b and i elements provide for a reasonable fallback styling in environments that don’t support stylesheets or which do not render visually, such as screen readers, and they also provide some indication that the text is somehow distinct from its surrounding content.

In essence, they convey distinct, though non-specific, semantics, which are to be determined by the reader in the context of their use. In other words, although they don’t convey specific semantics by themselves, they indicate that the content is somehow distinct from its surroundings and leaves the interpretation of the semantics up to the reader.

That makes sense. In addition to the given examples, I imagine it would be appropriate to use <i> for scare italics to indicate use-mention distinction. There’s also a pretty strong case to be made that using <cite> to mark up a book title even when it’s not technically a citation is a misuse. (Case in point: MLA or APA citations, in which the whole entry is the citation, but only the title is italicized, and only for certain kinds of works.)
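
As a rough sketch of the use cases listed in the FAQ, plus the use-mention case, markup along these lines would seem to fit. The class names (`taxonomy`, `ship`, `mention`) are made up for illustration and are not defined by any spec; the `lang` attribute is standard HTML.

```html
<!-- Class names below are hypothetical; pick whatever suits your stylesheet. -->
<p>The house cat (<i class="taxonomy">Felis catus</i>) is a common pet.</p>
<p>The <i class="ship">Titanic</i> sank in 1912.</p>
<!-- A phrase from another language: lang marks it as French. -->
<p>She has a certain <i lang="fr">je ne sais quoi</i>.</p>
<!-- Use-mention distinction: the word itself is being discussed. -->
<p>The word <i class="mention">lede</i> is journalism jargon.</p>
```

Without a stylesheet, each of these still falls back to italics, which is the fallback behavior the FAQ describes.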

I disagree with some of the other examples, though. For instance, wouldn’t it be more appropriate to use <q> with a class name and some CSS rules for thoughts? (I assume that they’re referring to the way thoughts are often italicized in prose fiction to distinguish them from spoken dialogue.) Aren’t keywords (often rendered in bold or italics) a special class of emphasis? If not, WHATWG HTML uses the <mark> tag to denote “relevance” as distinct from “importance.”
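
A minimal sketch of the <q>-based alternative for thoughts, assuming a hypothetical class name `thought`. The `quotes: none` declaration suppresses the quotation marks a browser would normally generate around <q>, and the italics come from CSS rather than from the tag.

```html
<style>
  /* "thought" is a hypothetical class name, not part of any spec. */
  q.thought {
    font-style: italic; /* italicize, as prose fiction often does */
    quotes: none;       /* suppress the generated quotation marks */
  }
</style>

<p>
  <q>I’ll be there at noon,</q> she said.
  <q class="thought">If the train is on time,</q> she added to herself.
</p>
```

Whether a thought really counts as a quotation is a judgment call; this only shows that the approach is workable.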

Another suggested use of <b> in the WHATWG spec is as a lede. A lede is just the first sentence or paragraph, after all. That means that the semantic meaning (such as it is) of a lede is already defined just by where it is in the document. So why bother with <b> at all? In the case of highlighting only the first sentence rather than the whole paragraph, why choose <b class="lede"> over <span class="lede">? I can’t think of any reason but the presentational nature of <b>, because (again) what little semantic meaning a lede has is already implied by its position in the document. A CSS solution using the :first-child pseudo-class to bold the first <p> in an <article> or <section> would work fine, perhaps using something like <p class="lede"> as a fallback. (You could even use :first-line if you don’t mind cheating the definition of “lede” a little bit. I don’t advocate cheating definitions, but I have seen printed works bold the first line rather than the lede, properly speaking.)
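
The CSS approach described above might look something like this. It is a sketch, not a tested recipe: the class name `lede` is only a convention, and `:first-child` matches the paragraph only when nothing (such as a heading) precedes it inside the container, which is where the class fallback comes in.

```html
<style>
  /* Bold the first paragraph when it is the first child of its container. */
  article > p:first-child,
  section > p:first-child {
    font-weight: bold;
  }

  /* Fallback: an explicit class for cases where the lede is not the
     first child, for example when a heading comes first. */
  p.lede {
    font-weight: bold;
  }

  /* Alternative that bolds only the first rendered line:
     article > p:first-child:first-line { font-weight: bold; } */
</style>

<article>
  <p>The lede goes here and is bolded by position alone.</p>
  <p>The rest of the story follows in normal weight.</p>
</article>

<article>
  <h1>Headline</h1>
  <p class="lede">With a heading first, the class does the work.</p>
  <p>The rest of the story follows in normal weight.</p>
</article>
```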

By far my biggest disagreement with WHATWG concerns this:

The problem with elements like <font> isn’t that they are presentational per se, it’s that they are media-dependent (they apply to visual browsers but not to speech browsers). While <b>, <i> and <small> historically have been presentational, they are defined in a media-independent manner in HTML5. For example, <small> corresponds to the really quickly spoken part at the end of radio advertisements.

First of all, it’s completely at odds with my understanding of the semantic Web. The way I learned it, HTML is supposed to describe content, not how that content should be displayed. Presentational information is the job of CSS, just as behavior is the job of JavaScript. Thus it actually is a problem that elements like <font> (and <b> and <i>) are presentational. However, maybe that’s just a difference of opinion between WHATWG and the sources from which I learned this attitude.

More importantly, I can see little difference between “presentational per se” and “media-dependent.” How something is presented depends on the medium in which it is presented. To illustrate, let’s look at that statement about <small>. You can’t have spoken words that are small, any more than you can have text (be it on screen or in print) that is spoken quickly, because by definition it is not spoken.

I object most strongly to the sentence before last: “[These tags] are defined in a media-independent manner in HTML5.” This misses the entire point. The <b> and <i> tags are explicitly named after bold and italics. That tag names are supposed to have meaning is clear from the existence of tags like <em>, <cite>, <address>, and <code>, or the introduction of tags like <article>, <section>, <nav>, and <header>. Granting the alleged distinction between presentation and media-dependence doesn’t help: They may claim that the tags are now media-independent, but defining in a spec that you can speak in italics doesn’t make it so.

Returning to the <small> example, it seems to me that what WHATWG is after isn’t size, speed, or any other presentational attribute, but rather de-emphasis or unimportance. The legalese in ads is printed small or spoken quickly because it is less important than the rest of the ad copy. The point of the ad is that you should buy the product, not that conditions apply and results may vary, even though that information is important enough to be included. The <aside> element, styled appropriately, would work well here. (Admittedly, <aside> is not suited for parentheticals within the flow of the text, but the ad copy example doesn’t have that issue. Use of <small> for such parentheticals, though, seems to differ from using a <span> only in presentation, which, again, is a job for CSS.)
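
For the ad copy case, a sketch along these lines keeps the de-emphasis in the markup and the size in the stylesheet. The class name `fine-print` and the ad text are invented for the example.

```html
<style>
  /* "fine-print" is a hypothetical class name. */
  aside.fine-print {
    font-size: smaller; /* the visual treatment lives in CSS */
  }
</style>

<article>
  <p>Buy the new Widget today!</p>
  <aside class="fine-print">
    <p>Conditions apply. Results may vary.</p>
  </aside>
</article>
```

A stylesheet for another medium could then present the same <aside> differently without touching the markup.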

After all this huffing and puffing on my part, the bottom line is that tags like <i> should be kept around because there are good reasons for using them, but that doesn’t mean there aren’t also bad reasons.

So what should Web authors do? All I can offer is my opinion: Use tags with clear semantic meaning (like <em>) wherever possible, but don’t stretch their definitions just to avoid using a presentational tag. Use the presentational tags, with appropriate class names, when the rules of grammar and style that you follow prescribe that presentation for the kind of content you’re marking up and there’s not a more semantic tag that will work. This follows both the meaning given in the FAQ (namely that there is some semantic difference from the surrounding text) and the literal meaning of the tag name (namely that the text ought to be displayed in, for instance, italics). It’s basically the same principle as using quotation mark characters rather than <q> for something that is supposed to be enclosed in quotation marks but isn’t a quotation. Again, this is just my opinion, and I’m not aware of whether it conflicts with any established best practices, but so far it seems like a good approach.
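
To make the distinction concrete, here is how those rules of thumb might play out in markup. This is one reading of the opinion above rather than an established best practice, and the class name `title` is hypothetical.

```html
<!-- Stress emphasis changes how the sentence is read: use <em>. -->
<p>I <em>never</em> said that.</p>

<!-- A title italicized by style-guide convention, with no emphasis and
     no citation intended. The class name "title" is hypothetical. -->
<p>I reread <i class="title">Moby-Dick</i> last summer.</p>

<!-- An actual quotation: <q> lets the browser supply the marks. -->
<p>She said, <q>Results may vary.</q></p>

<!-- Quotation marks around something that is not a quotation:
     type the characters instead of using <q>. -->
<p>The “free” gift cost ten dollars in shipping.</p>
```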

Categories: HTML