Agent Strings in Popular Browsers

By Gerry Patterson


A Short Summary of the Long Sad History of Agent Strings.

The history of user agent strings is a long and somewhat convoluted saga that may serve as a cautionary tale about laissez-faire development standards (1). In case you missed it, here is a brief summary of the state of user agent strings. In the early days of browsers, one of the most popular was Mosaic. Netscape adopted the Mosaic standards, which came to be known as Mozilla (Mosaic + Godzilla). The user agent string was originally meant to convey information about the browser, and Netscape agent strings have mostly contained the word "Mozilla", meaning Mozilla compliant. This was also because some sites were looking for the word "Mozilla" in the agent string. Strictly speaking, the information contained in the Netscape agent string was incorrect, since the browser was not Mozilla; it was Netscape.

Nor was Netscape a hundred per cent compliant. As it became the most popular browser in the world, the corporation that owned Netscape began to add features which they claimed were urgently required. Such was the urgency, in fact, that they had to preempt any deliberations of standards committees. There was no time for consultation with the user or development communities. In this regard their behaviour was similar to that of many large software development corporations in recent times. In most cases, the pressing necessity for change had less to do with the need to upgrade an ailing system and more to do with the time-honoured practice of locking customers in, and competitors out of, a market by vertical integration.

Large US software companies took advantage of the fact that their legal system had not caught up with the computer industry. In other fields of entrepreneurial activity, large companies that indulged in vertical integration ran the risk of violating US anti-trust legislation. Corporations in the IT sector often employed it with impunity. In the seventies and eighties IBM turned it into an art form. Later still, Microsoft turned it into a science. Netscape Corporation took a leaf from their book.

In any case, the new features were added to Netscape. These were purported to be for the enhancement of the HTML standards and the enlightenment of the Internet community, though a cynical observer may have concluded that many of the features added during the nineties had more to do with consolidating market dominance and fending off challenges from new contenders. As the market began to fragment around the market leaders, certain web sites used the information contained in the user agent string to serve up specific pages for specific browsers, a practice which came to be known as browser sniffing. I will not subject you to a lengthy lamentation or an acrimonious tirade against this practice; there is plenty available elsewhere on the web. The fact is, they did it, and it became widespread. It was a classic case of act in haste, repent at leisure.

Microsoft, one of the new contenders in the browser market, was by this stage an experienced practitioner of vertical integration. As the newcomer, they realised what was required in order for the MSIE browser to get a foothold on the back of the Windows operating system and hence gain access to sites that unwisely employed browser sniffing.

And so the Microsoft browser used an agent string that was similar to the Netscape browser, the clear market leader at the time. In effect MSIE pretended to be Netscape.

Of course, the rest is history. Microsoft is now the market leader, and most new browsers that enter the market (and these days there are quite a few of them) use an agent string similar to MSIE's. Following established tradition, these new contenders often pretend to be Microsoft, who are still pretending to be Netscape, who started the whole thing off by pretending to be Mozilla. Are you confused? Don't worry, you are not alone. Just about everyone is confused about this topic.

Even someone who gives some thought to ongoing maintenance (and most people who adopt browser sniffing don't give any thought to the subject) can see that the maintenance overburden can easily increase by 100 percent, even for a well-planned implementation of this perilous practice. Poorly implemented browser sniffing could add a back-breaking maintenance overburden. Speaking personally, the prospect of cutting unnecessary code leaves me quite unexcited. And unless you are a Javascript programmer trying to ensure job security at someone else's site, the strategy has little to recommend it.

Now, I did promise not to launch into a tirade against the practice of browser sniffing. So I will not devote any more space to discussing why, just like other types of substance sniffing, it is dangerous, addictive and a serious health hazard. The current parlous state of the user agent string should be proof enough. And there is plenty of such discussion already on the web.

Also, I find the positive arguments for simple design more persuasive than the negative arguments criticising browser sniffing, so I should mention why it is good practice to serve content that consists of standard HTML which can be read by any browser. The most compelling reason is, as already mentioned, the amount of time it saves the owner and/or operator of the site. This should lend sufficient weight to the argument for it to stand on its own. Still, if you are not convinced, then consider that users actually prefer sites that use base-level code, and, just as important, search engines prefer such sites. If you are still not convinced, consider that sites which adopt this strategy can devote more effort to content rather than form, which further enhances their appeal. It should not come as a surprise that such sites often rate well. For more about this read KISS Compliant Web Sites.


Agent Identification.

So by now you are probably wondering, why bother trying to identify browsers? Not only would it be poor design to incorporate browser sniffing into a site, but it would also be unreliable since many browsers allow the user to set their own agent string. Well, if you are curious about questions like market share, and trends in the Internet community, examination of agent strings is still the only practical way to get a meaningful and sizeable snapshot of which browsers are being used on the web.

And I should add that people who set their agent string to "None of your business" or some rude four-letter word are not being counted. In other words, if you prefer to use Galeon and you would like the rest of the world (including statisticians, economists, legislators, spin-doctors, advocates etc.) to know that there are a few Galeon users in the world, the best way to have a say, and in effect vote for Galeon, would be to use the standard agent string that shipped with the Galeon distribution.

Curiosity about market share was the main reason that I added the -A option to the perl script that parses the apache logfiles. The logic behind this simple analysis was based on the small sample of agent strings that I had collected and the few browsers that I was using (various versions of lynx, Netscape, MSIE and Konqueror). However I found it difficult to get information regarding agent strings (2).

I based the original logic on strings like the following:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
Mozilla/4.78 [en] (Win98; U)

The logic for this was very simple:

If the first word is 'Mozilla/4.0'
	If the second word is '(compatible;'
		Use the third word as the browser type
	Else
		Treat it as Mozilla/4.0
Else
	Carry out the remaining checks

This is based on the observation that MSIE always claims to be Mozilla/4.0. There was a flaw in this logic, however. It turned out that there were many agent strings like:

Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0) Opera 6.04  [en]

On the face of it, this would seem to be Opera 6.04 pretending to be MSIE 5.0 pretending to be Netscape pretending to be Mozilla/4.0. Now, I am only guessing when I say this. So if someone knows better, please send me an e-mail and set me right.
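The amended ordering of checks can be sketched as a short Perl routine. This is only a simplified stand-in for the real which_browser() in agent_id (the subroutine name and regexes here are my own illustration, not taken from that script): look for an Opera tag first, then the "(compatible; ...)" convention, then fall back to the leading Mozilla/x.y token.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A simplified sketch of the detection order described above,
# not the actual which_browser() from agent_id.
sub guess_browser {
	my ($agent) = @_;
	# Opera appends its own token after the MSIE masquerade,
	# so it must be checked before the "(compatible; ..." test.
	return "Opera $1" if $agent =~ /\bOpera[\/ ](\d+\.\d+)/;
	# MSIE and friends use "Mozilla/x.y (compatible; NAME version; ...)".
	return $1 if $agent =~ /\(compatible;\s*([^;)]+)/;
	# Otherwise take the first token, e.g. "Mozilla/4.78".
	return $1 if $agent =~ /^(\S+)/;
	return "unknown";
}

print guess_browser('Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0) Opera 6.04  [en]'), "\n";   # prints: Opera 6.04
print guess_browser('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)'), "\n";                    # prints: MSIE 6.0
print guess_browser('Mozilla/4.78 [en] (Win98; U)'), "\n";                                          # prints: Mozilla/4.78
```

The point is simply that the checks must run from most specific to least specific; a string that mentions Opera is counted as Opera no matter what else it claims to be.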

Since then I have modified the perl script that parses the apache log files. This means that an agent string such as the one above is no longer counted as MSIE. The new browser and robot detection logic has been placed in a separate script called agent_id. This contains two perl subroutines, which_browser() and which_robot(). If you feed an agent string to these routines they should return the name of the browser or the robot.

In order to test the script you could save agent_id to /MyPath/agent_id, and use a script like the following:

#!/usr/bin/perl
require "/MyPath/agent_id";
while (<>) {
	chomp;
	my $agent = which_browser($_);
	print "$agent\t$_\n";
}

You can now feed browser agent strings to this script and it should print the results as browser and full agent string (separated by a tab).
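If your agent strings are still buried in the raw logs, note that in apache's "combined" log format the agent string is the last quoted field on each line. Assuming that format (a sketch only; adjust the pattern if your LogFormat differs), a small Perl filter can pull the strings out for the test script above:

```perl
#!/usr/bin/perl
# Print the user agent from each line of an apache "combined"
# format log, where it appears as the last quoted field.
use strict;
use warnings;

while (<>) {
	# Capture the contents of the final "..." pair on the line.
	print "$1\n" if /"([^"]*)"\s*$/;
}
```

You could then pipe its output straight into the test script, or through sort and uniq -c to get a rough tally of agent strings.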

Conclusion.

A complete listing of the agent strings that visited my site, and the browsers that I think they represent, can be viewed in the List of User Agent Strings and the List of Robot Agent Strings. These two lists are updated daily, so any new strings will be added to them. If you see a string that is incorrectly identified, please let me know. To find out which agent a certain string belongs to, just search for the agent string with your browser.

The logic which unravels the agent strings can be found in the agent_id script.


BIBLIOGRAPHY:

  1. The Mozilla organisation has released documents on user-agent strings. Since they are closest to the "Original" GUI browser, they probably have more credibility than other organisations. The only question is "Will anyone take heed of any of their recommendations?" Probably not.
  2. I spent a fair amount of time googling for lists of user agent strings. They may be out there somewhere. However finding them amongst the thousands of published server logs is like looking for needles in a haystack. There is a good summary of browsers at Dan's Web Tips. However it is difficult to use Dan's list to work out which browser corresponds to a strange new agent string in your log file. A more useful list for the purposes of identification can be found at Zytrax Browser IDs.