

First, we are a little concerned. In our data sample, there were twice as many pages that used the table element without using the td element as there were pages that used the samp element. For every page that you can find that uses the samp element, we can find two that use the table element in a completely bogus fashion!

Our initial thought was that these might be cases like the Geocities footer:

...but that isn't it, because our analysis only counted start tags, not end tags. If someone can explain why so many pages would use a <table> tag and then not put any cells in it, please let us know.
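To make the start-tag-only counting concrete, here is a minimal Python sketch built on the standard library's html.parser. It is not the tooling used for this survey (which isn't described here); the names StartTagCollector, elements_used and count_pages, and the sample pages, are hypothetical. Each page is reduced to the set of elements whose start tags it contains, and a page counts as a "table without cells" only if table is in that set and td is not.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records which element names appear as start tags on one page.

    End tags are never recorded, so a stray </table> on its own does not
    count as a use of the table element.
    """

    def __init__(self):
        super().__init__()
        self.elements = set()

    def handle_starttag(self, tag, attrs):
        self.elements.add(tag)  # html.parser lowercases tag names


def elements_used(html):
    parser = StartTagCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


def count_pages(pages, predicate):
    """Number of pages whose set of start tags satisfies predicate."""
    return sum(1 for page in pages if predicate(elements_used(page)))


pages = [
    "<table><tr><td>cell</td></tr></table>",
    "<table width=100%></table>",            # a table with no cells at all
    "<p>Sample output: <samp>OK</samp></p>",
    "</td></tr></table>",                     # end tags only: not counted
]

tables_without_cells = count_pages(pages, lambda e: "table" in e and "td" not in e)
pages_with_samp = count_pages(pages, lambda e: "samp" in e)
print(tables_without_cells, pages_with_samp)  # 1 1
```

Under this rule, a page consisting only of leftover end tags contributes nothing, which is why footer debris like the Geocities case cannot explain the numbers.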

There's not much surprise in the attributes; as we all know, HTML tables are mostly abused for layout purposes, and this is clearly visible in the fact that all but one of the thirty-five most commonly used table-related attributes are presentational (well, we could argue that class="", id="", colspan="", and rowspan="" were being used for semantic purposes, but who are we trying to kid here?).

The onmouseover and onmouseout attributes are seen on td elements; in fact the td element is the second most likely element to see these event handler attributes, after a and before img and div.

The count of caption/thead/tfoot elements probably represents the cases of non-presentational tables. The tbody elements are probably mostly from authoring tools that are internally written to support HTML, and therefore output the implied tbody elements explicitly. We have no explanation for the relatively high number of th elements.

Some of the semantic attributes on the table elements are used. The table element sees the summary attribute quite a bit, considering (though we have not checked how many of them simply say "Layout table, two columns, menu in the first column, content in the second" or some such). Also, although it isn't shown on the graphs above, the td element sees the headers, scope and abbr attributes sometimes too (and the last two are also sometimes seen on th as well). There was no measurable use of the axis attribute, though.

Typos were quite common; the td element, for example, had more pages with widht, witdh, aling, valing, with, and heigth attributes than it had pages with headers attributes.

Also common were misplaced attributes; again looking at td we see alt, face, and size attributes (possibly useful on other elements but certainly not td). There were also cases of "What were you thinking?", most notably <td wrap=""> (was nowrap intended?), and <td span=""> (colspan maybe?).
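Spotting typos and misplaced attributes like these is easy to sketch. The following Python example, again using html.parser, records which attribute names appear on td start tags and reports those outside a whitelist. The whitelist is our reading of the attributes HTML4 allows on td, the sample pages are made up, and the class and function names are hypothetical. Each unknown attribute is counted at most once per page, matching the pages-per-attribute counting used above.

```python
from collections import Counter
from html.parser import HTMLParser

# Attributes HTML4 allows on <td>: core, i18n and event attributes plus the
# cell-specific ones. Treat this whitelist as an illustrative assumption.
TD_ATTRIBUTES = {
    "id", "class", "style", "title", "lang", "dir",
    "onclick", "ondblclick", "onmousedown", "onmouseup", "onmouseover",
    "onmousemove", "onmouseout", "onkeypress", "onkeydown", "onkeyup",
    "abbr", "axis", "headers", "scope", "rowspan", "colspan",
    "align", "char", "charoff", "valign", "nowrap", "bgcolor",
    "width", "height",
}


class UnknownTdAttributes(HTMLParser):
    """Collects attribute names on <td> that are not in the whitelist."""

    def __init__(self):
        super().__init__()
        self.unknown = set()

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            for name, _value in attrs:  # names arrive lowercased
                if name not in TD_ATTRIBUTES:
                    self.unknown.add(name)


def unknown_td_attributes(pages):
    """For each unknown attribute name, count the pages using it on <td>."""
    pages_per_attribute = Counter()
    for page in pages:
        parser = UnknownTdAttributes()
        parser.feed(page)
        parser.close()
        pages_per_attribute.update(parser.unknown)  # at most once per page
    return pages_per_attribute


pages = [
    '<table><tr><td widht="50" aling="left">a</td></tr></table>',
    '<table><tr><td widht="20" face="Arial">b</td></tr></table>',
    '<table><tr><td wrap="">c</td><td headers="h1">d</td></tr></table>',
]
print(sorted(unknown_td_attributes(pages).items()))
# [('aling', 1), ('face', 1), ('widht', 2), ('wrap', 1)]
```

A whitelist cannot tell a typo (widht) from a misplaced attribute (face) or a guess (wrap); separating those still needs a human eye.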

The <p> element

The commonly used attributes on p are unsurprising:

A few sites use the dynamicanimation attribute on the p element, a FrontPage extension. A few also used the language attribute, though we cannot determine why. If it was an attempt at specifying the lang attribute then one would expect to see a lot of successful attempts, but in fact there weren't enough pages that used that attribute on the p element to even register on the radar, so it seems unlikely that that is the explanation. Maybe attempts to set the script element's attribute of the same name?

The <br> element

The br element is a simple one, yet used on so many pages that it is the 8th most-used element. It is used more than the p element.

There are very few legitimate semantic places to use this element (addresses and poems are the canonical examples), which means that most uses are probably presentational. Its two most commonly used attributes are certainly presentational, and the third is almost certainly used presentationally as well.

The soft attribute doesn't appear to be supported by any modern browsers. We couldn't find any formal documentation for it; presumably it is an obsolete Netscape feature.

The \ "attribute" is almost certainly the result of people writing markup like <br\> when intending to write <br/>. Of course, neither is particularly useful to browsers when the page is sent as text/html (as all these pages were).

The <html> element

The html element's popular attributes:

The most-used attribute on html elements is xmlns, from misguided people using XHTML but sending it as text/html. They even (just) outnumber the people who specify the lang attribute!

A whole slew of people are specifying the xml:lang attribute, which will have absolutely no effect (no HTML processor will look at that attribute; it's an XML attribute). And finally, the fourth most-used attribute on the html element is the dir attribute, used by authors writing in right-to-left languages so that their text renders in the correct direction.

All the other attributes used on html are invalid. Most (all?) of the xmlns:foo attributes are artefacts of Microsoft Office's creative "HTML" output, and the id attribute (not legal on the html element in HTML4!) was used by people to allow user stylesheets to target their sites. (This is now redundant since newer Web browsers give users that kind of control themselves.)

The <head> element

The head element is the most popular element, apparently. Do people specify any attributes on it?

Short answer: not very often! It turns out that a tiny but measurable number of people do use the profile attribute, though. The three most-often used values are,, and This makes XFN the most popular HTML metadata profile!

The other values of profile we found in the sample data were all below the threshold of significance, but for some reason a large number of sites seem to have one or two pages with profile attributes that point at themselves.

The <title> element

The title element is pretty boring:

Only one attribute is used on the title element often enough to appear on the radar, and that's the quite legitimate lang attribute.

We can't even really say anything about bad markup; the title element is the one element that is absolutely required on every HTML page, and indeed, it seems the overwhelming majority of pages specify it.

The <img> element

Few surprises with the img element:

It's a popular element; most pages, apparently, have an image somewhere. Most people give at least some of their images dimensions. Specifying the border is common too. Our guess (unverified) is that people are generally turning the border off (historically, images by default have a blue/purple border around them when they are part of a link).

Around three quarters of pages with images have at least one image with an alt attribute.

Comparatively few pages align their images with the align attribute. (The cynical amongst us concluded from this that most people probably put their images in tables and align the tables instead.) Even fewer give their images the hspace and vspace attributes (non-standard extensions equivalent to CSS margins). The valign attribute is virtually unused.

Image maps are not used on most pages, but they are used. In fact, nearly ten percent of pages in the sample used them, which is quite significant when you stop to think about when you last saw an image map. The numbers here map reasonably closely to the number of pages that use map and area elements, so this isn't a false positive:
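A false-positive check of this kind can be sketched as follows, assuming that "uses an image map" means at least one img start tag carrying a usemap attribute. The class, function and sample pages are hypothetical; the point is only that the three per-page flags should move together if the usemap figure is genuine.

```python
from html.parser import HTMLParser


class ImageMapUsage(HTMLParser):
    """Notes whether a page has <img usemap>, <map> and <area> start tags."""

    def __init__(self):
        super().__init__()
        self.img_usemap = False
        self.map = False
        self.area = False

    def handle_starttag(self, tag, attrs):
        if tag == "img" and any(name == "usemap" for name, _ in attrs):
            self.img_usemap = True
        elif tag == "map":
            self.map = True
        elif tag == "area":
            self.area = True


def image_map_pages(pages):
    """Count pages with each image-map ingredient (bools add as 0 or 1)."""
    totals = {"img usemap": 0, "map": 0, "area": 0}
    for page in pages:
        usage = ImageMapUsage()
        usage.feed(page)
        usage.close()
        totals["img usemap"] += usage.img_usemap
        totals["map"] += usage.map
        totals["area"] += usage.area
    return totals


pages = [
    '<img src="nav.gif" usemap="#nav"><map name="nav">'
    '<area href="/" coords="0,0,9,9"></map>',
    '<img src="logo.gif" alt="Logo">',
    '<img src="x.gif" usemap="#m">',  # usemap with no map: shows as a gap
]
print(image_map_pages(pages))  # {'img usemap': 2, 'map': 1, 'area': 1}
```

A large gap between the usemap count and the map/area counts would suggest the usemap hits were noise; the survey reports that the figures are close.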

The title attribute enjoys some use. We wonder, though, whether, if IE stopped showing the alt attribute as a tooltip, we would see a big increase in the number of pages that used title and a big decrease in the number of pages using the alt attribute.

Something that we found interesting in this data is the relatively low number of pages that use onmouseover and onmouseout. This likely maps quite closely to the number of pages doing image rollovers. It would be interesting to compare the results one would get by segregating pages by the year of their Last-Modified headers, to see if that makes a difference.

Another attribute that is used relatively rarely is ismap. Accessibility evangelism efforts can take from these results that server-side image maps are less of a problem than, say, the use of presentational markup.

Speaking of accessibility, the longdesc attribute did register as one of the top 1000 most-used attributes, but it isn't clear whether those hits were legitimate uses or merely programs being thorough (and useless). The latter is not unheard of; for example, the HTML4 DTD says that the default value of the a element's shape attribute is "rect", and so many pages actually explicitly and uselessly say <a href="..." shape="rect">. (Approximately no pages use the coords attribute on the a element, which shows that those uses of shape are indeed bogus; on the area element, by contrast, the coords attribute is used on more pages than shape.)
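Redundant defaults like this are simple to detect mechanically. This hypothetical Python sketch counts a start tags that say shape="rect" without any coords attribute, separately from those that do supply coords. It relies only on the HTML4 default mentioned above and on html.parser lowercasing attribute names; the class name and sample markup are ours.

```python
from html.parser import HTMLParser


class RedundantShapeFinder(HTMLParser):
    """Counts <a> start tags whose shape="rect" adds nothing.

    HTML4 declares "rect" as the default for shape, so spelling it out on a
    link without coords conveys no information.
    """

    def __init__(self):
        super().__init__()
        self.redundant = 0    # shape="rect" and no coords
        self.with_coords = 0  # links that actually describe a region

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "coords" in attributes:
            self.with_coords += 1
        # A bare <a shape> yields None as the value, hence the "or".
        elif (attributes.get("shape") or "").lower() == "rect":
            self.redundant += 1


finder = RedundantShapeFinder()
finder.feed('<a href="/a" shape="rect">A</a> <a href="/b" shape="RECT">B</a> '
            '<a href="/c" shape="rect" coords="0,0,10,10">C</a> <a href="/d">D</a>')
finder.close()
print(finder.redundant, finder.with_coords)  # 2 1
```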

The <script> element

The script element was used on roughly half the pages we checked. The most common attributes:

There really is no reason for using language these days. It's been deprecated since forever, and quite obviously a lot of people can't spell it. And given that more than half of pages specify the type attribute anyway... this is probably mostly a matter of "just in case" cargo-cult authoring. (It's worth noting that in the current HTML5 proposals, the language attribute is gone, leaving only type, and that that is being made optional, defaulting to text/javascript.)
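A per-page tally of script attributes can be sketched like this. The ScriptSurvey class and survey function are hypothetical, and the effective-type rule (use type when it is present and non-empty, otherwise fall back to text/javascript, ignoring language entirely) is our simplified reading of the HTML5 proposal described above, not a statement of any shipped specification.

```python
from collections import Counter
from html.parser import HTMLParser

# Default type described for the HTML5 proposals mentioned above.
DEFAULT_SCRIPT_TYPE = "text/javascript"


class ScriptSurvey(HTMLParser):
    """Collects attribute names and effective types of <script> elements."""

    def __init__(self):
        super().__init__()
        self.attribute_names = set()
        self.effective_types = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)
        self.attribute_names.update(attributes)
        # language is ignored; a missing or empty type falls back to the default.
        self.effective_types.append(attributes.get("type") or DEFAULT_SCRIPT_TYPE)


def survey(pages):
    pages_with = Counter()       # pages using each attribute on any <script>
    effective_types = Counter()  # one entry per <script> element
    for page in pages:
        parser = ScriptSurvey()
        parser.feed(page)
        parser.close()
        pages_with.update(parser.attribute_names)
        if {"language", "type"} <= parser.attribute_names:
            pages_with["language+type"] += 1
        effective_types.update(parser.effective_types)
    return pages_with, effective_types


pages = [
    '<script language="JavaScript" type="text/javascript">init()</script>',
    '<script src="menu.js"></script>',
    '<script language="javascript">track()</script>',
]
pages_with, effective_types = survey(pages)
print(pages_with["language"], pages_with["language+type"])  # 2 1
print(dict(effective_types))  # {'text/javascript': 3}
```

Every script in the sample ends up as text/javascript whether or not language was given, which is the sense in which language is cargo-cult markup here.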

Lots of src: more than half the pages we examined apparently use external scripts somewhere.

That's it as far as popular attributes go. In the rarely-used bucket, we have mostly IE-specific things. The defer attribute, currently implemented only by IE (to our knowledge), is underdefined in HTML4. There have been proposals for dropping it in HTML5, but apparently it is used, so maybe instead the specification will have to describe what it actually means (exactly). The for and event attributes are IE extensions similar to XML Events (although simpler). It is interesting that they are used at all. fptype is a FrontPage extension.

It would be interesting (though quite hard) to examine the uses of charset to determine how many of them were wrong or redundant. The aforementioned proposals for HTML5 don't mention the charset attribute currently; if the attribute is used for good reason, though, it may have to be added.

Editors and their custom markup

GoLive's footprints are all over the Web. A scary number of pages use <table gridx="" gridy="" showgridx="" showgridy="">, not to mention the multitude of <csscriptdict>, <csactiondict>, and <csobj> elements.

GoLive is of course far from the only offender. There are more <o:p> elements (from Microsoft Office) on the Web than there are <h6> elements. There are also plenty of <x-claris-window>, <x-claris-tagview>, and <x-sas-window> elements (from Claris Homepage, we presume). Apparently Actinic, a British company that produces e-commerce solutions, has software that is now quite widely deployed, too: <actinic:basehref>, <actinic:section>, <actinic:nowserving>, and <actinic:curraccount> elements litter the Web. Macromedia join in the fun as well, with <mm:endlock> and <mm:beginlock> elements found on a number of pages (the former somewhat more than the latter, oddly). NetObjects Fusion is the source of a startling number of nof="" attributes on many elements (not quite enough to hit any of the "popular attributes" charts, but hiding just below the fold of the table, body, img, td and a elements' tables).

Some of the more obscure cases of non-standard tags we found include a series of tags with the st1: prefix, such as <st1:city>, <st1:placetype>, <st1:country-region>, and <st1:state>, which we are told come from Microsoft Office ("smart tags"). Those four tags are used more often than the ins and del elements from HTML4 (and there are others).

Of interest to the SVG crowd may be the fact that all of the elements mentioned so far are more popular than IE's VML. <v:stroke> is the most popular VML element, followed by <v:shape>, <v:shapetype>, <v:path>, <v:f>, <v:formulas>, <v:imagedata>, and <v:fill>. The last of those is only used about 40% as often as the first one. (There's actually a v:shape attribute that is used on div elements a lot more than the v:foo elements, as well.)
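Because these vendor elements carry a prefix and a colon, they are easy to tally by prefix. The sketch below is hypothetical (names and sample markup are ours) and relies on Python's html.parser keeping the colon as part of the lowercased tag name, so <o:p> arrives as o:p.

```python
from collections import Counter
from html.parser import HTMLParser


class PrefixedElementCounter(HTMLParser):
    """Counts start tags whose names contain a colon, grouped by prefix."""

    def __init__(self):
        super().__init__()
        self.prefixes = Counter()

    def handle_starttag(self, tag, attrs):
        prefix, colon, _local_name = tag.partition(":")
        if colon:  # only prefixed names such as o:p, st1:city or v:shape
            self.prefixes[prefix] += 1


counter = PrefixedElementCounter()
counter.feed('<p>Hello<o:p></o:p></p><st1:city>Oslo</st1:city>'
             '<st1:state>Ohio</st1:state><v:shape></v:shape><h6>x</h6>')
counter.close()
print(dict(counter.prefixes))  # {'o': 1, 'st1': 2, 'v': 1}
```

Unprefixed custom names such as <NYT_COPYRIGHT> would slip past this check, so a real survey would also need a list of known HTML element names.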

Certain individual sites use custom markup that appeared on the radar, too: the New York Times, for instance, with their <NYT_COPYRIGHT> element.

The good thing, if we can be forgiven for trying to remain optimistic in the face of all this non-standard markup, is that at least these elements are all clearly using vendor-specific names. This massively reduces the likelihood that standards bodies will invent elements and attributes that clash with any of them.
