That's SIR Google To You ...

By Gerry Patterson

In recent months I have seen letters and articles complaining about bias in Search Engines. One such article cited a test search (carried out in the USA) for refrigerators, which failed to find all thirteen major US brands of these consumer items. On ABC News Radio the well-known broadcaster, Dr. Swann, commented that a search for "immunization" in Google returned far more (rather hysterical) anti-immunization polemic than factual data on immunization. Letters have also been published in the Australian Press, alleging that there is a lack of competition in search engines. Most of them cite Google as the offending search engine. Some critics also say that Google has dumbed down searches, returning pages that are not necessarily the best, but the most popular.

So, does Google need some competition?

Another Little Truck Stop Open For Business.

My initial experience with the Internet was not all that exciting. This was my (rather despondent) impression at the time:

Languishing forlorn, in a swamp, at the end of a cul-de-sac, behind one of the pylons that support the gigantic information super-highway, the PGTS truck-stop is now open for business.

The tiny neon sign flashes unnoticed, since this cul-de-sac is closed to pedestrian and vehicular traffic. From overhead the rumble of super-trucks on the information super-highway sounds like constant thunder. They don't stop here ... The nearest off-ramp is a few klicks upstream. Rain spatters on the little neon sign ... it sputters and almost goes out. A cloud of Nimda and Code Red insects hovers outside the flyscreen. One of them self-electrocutes in the ultra-violet wash of the bug-zapper. It dies with a juicy splat.

My dog scratches himself.

Another Nimda bug sizzles briefly in the bug-zapper.

Not much is happening at the PGTS truck stop.

The scale of fees being charged for guaranteed listing in the search engines was exorbitant. Or so it seemed to me. One encouraging fact (or discouraging, depending on your point of view) was the clearly discernible and steady decrease in these fees, which has continued to this very day, as the Dot Com silliness has steadily and inexorably evaporated. Predictably, this was accompanied by a decrease in the number of search engines. In a previous article, I wrote about these initial experiences with the Internet and Search Engines, from the point of view of a web site operator rather than a user. And I included some of the assumptions that I would use to construct a roadmap, to help me navigate my way around them (see bibliography).

I had previously reached the conclusion that there was only one search engine worth using and that was Google. My experience of running a website has, if anything, strengthened this conviction. Since Google started to index this site, it would seem that about seventy-eight percent of the referred traffic (i.e. containing a valid referrer string) has come from either Google or a provider that uses Google services. So, having admitted that my site virtually owes its existence to this illustrious search engine, most readers probably don't expect a highly critical assessment from me. Nevertheless, having declared my self-interest, I will try to be objective about some of the criticisms being voiced about Google, in the interest of genuine free enterprise. After all, everyone believes in competition ... You do believe in competition, don't you?

The search engine market was, in fact, one of the few software markets during the nineties that displayed genuine competition. During that decade there were many examples of vendors occupying market niches and locking customers in and competitors out by using vertical integration, employing anti-competitive agreements, pricing policies and/or other practices. Search Engines, however, competed on a level playing field. And they all offered their wares free of charge to the end-users. Last decade there were numerous popular search engines on a more or less equivalent footing. The best opportunity for growth of these entities was to expand their user base. Although not all of them realised this. Those that did pursued their goals aggressively. In so doing they hoped to increase revenue. Since there was no charge for the service to the end-users, and no opportunity to lock them in with agreements, the only path available was to offer a better service. Even though there was no fee charged, it was a remarkably close approximation to an ideal market, one that should have warmed the cold hearts of the most fundamental and dogmatic economic rationalists.

In those giddy boom times, I would switch between a dozen or so major search engines, choosing a different one each time I got onto the information super highway. One of the search engines I did not choose very often was the one called Google. I reasoned that a search engine with such a silly name would not be of any use to a serious computer professional such as myself. Eventually despite my prejudices I started to use Google more often. Despite the name, the results were often impressive. Over time I used the other search engines less frequently. Many of them showed a heavy bias towards their subscribers, often to the exclusion of more relevant websites. Google on the other hand would return large numbers of relevant references from all over the Internet. Now, I use Google exclusively. Not that I have locked it in to my browser. After all, I still like to think that I have a choice.

The rest is history. The fact that this near-ideal market has produced a single dominant player should be of equal interest to economic theorists and computer historians. The rise of the search engine giant has even given us a new verb. The verb to google. By now, of course you all know the meaning of this verb ...

Google ... another 800 lb gorilla?

It seems that Google owe their success to some important strategic decisions:

Rather than try to collect subscriptions from the sites in their index, they decided to rely on sponsored advertisements, and later on re-selling search services.
They use Linux clusters often running on Intel hardware. The cost of this platform is extremely low, if not the lowest. And yet the performance is extremely high.
Google employ a page-ranking algorithm that incorporates the links from various sites. The algorithm considers these links to be a "vote" for a particular page. These votes go into calculating a score used to rank returns to their users' queries.

By making these choices Google have entered into a form of symbiosis with the sites in their index. Although Google has dallied with products like toolbars, to date they have resisted the temptation to indulge in vertical integration and have, rather, continued to earn respect and loyalty by being consistently better than their competitors. In order to maintain their pre-eminent position, Google must continue to trawl the web for content to provide their users. Although on the face of it this is a democratic system, it is not an egalitarian democracy. In the Google page ranking system, a vote from a high-ranked site is worth more than a vote from a lower-ranked site. It is possible to improve one's rating, although for a new site, the prospect might appear daunting. However, the best way to earn a high position in the page-ranking, and to maintain it, is to provide quality content. There is no such thing as a free lunch. And, unless you are an exceptional busker, you will need some sheet music, if you are going to sing for your supper.
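For readers who like to see the "weighted vote" idea in concrete form, here is a minimal sketch in Python. It is emphatically not Google's actual algorithm, which is not described in this article. It is only a simplified illustration of the general principle that a link counts as a vote and that a vote from a high-ranked page carries more weight. The function name pagerank, the damping value of 0.85, the fixed iteration count and the toy site names are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-voting score (illustrative only).

    links maps each page name to the list of pages it links to.
    The damping value is an assumed, conventional choice.
    """
    # Collect every page, including ones that are only linked to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank among the pages it votes for,
                # so a vote from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy web: "endorsed" and "other" each receive exactly one link,
# but "endorsed" is linked from a page that is itself well linked.
toy_links = {
    "fan1": ["popular"],
    "fan2": ["popular"],
    "fan3": ["popular"],
    "popular": ["endorsed"],
    "obscure": ["other"],
    "endorsed": [],
    "other": [],
}

for name, score in sorted(pagerank(toy_links).items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {score:.4f}")
```

In this toy web, "endorsed" and "other" each have a single incoming link, yet "endorsed" ends up with the higher score because its one vote comes from a better-ranked page. That is the non-egalitarian democracy in miniature.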

A number of people charge that Google (and other search engines) are biased. In the case of Google, this criticism is aimed at the pagerank algorithm, which in effect returns pages that are not necessarily the best, but the most popular. Some of this criticism may be sour grapes, but some of it is articulate and passionate. The argument is that ranking some sites higher than others leads to inequality and a lack of fairness ... Ahh but, surely the most popular pages must be the best? That is a significant ideological point. In effect, pagerank imposes a tyranny of the market. A further criticism is leveled at Google along the lines of privacy. Google makes use of cookies and, as the leading search engine, may be gathering a lot of information about a lot of web users. Now that surveillance is high on the government agenda (especially in the USA), this may become an issue for privacy advocates, who see the possibility of Google being seconded by government agencies. If you are interested in reading more about this, see the bibliography.

Whose 800 lb gorilla is it anyway?

So has Google grown too big for their boots? Here I am going to go out on a limb and declare that the problem of competition (or lack of it) amongst search engines is not really such a big deal. Complex indexing systems often tend toward a centralised model because of the efficiency of a central index. Bear in mind that, having earned a position in Google's index, I do have a (declared) self-interest in the status quo. Still, I believe the major problem facing the Internet is the question of ownership. In theory nobody owns the Internet, and as far as we (the users) are concerned, this is a desirable state of affairs. However, the past few years have seen a big push from a few large corporations to take control of The Internet. And resisting this will take considerable effort and require on-going vigilance from Internet users.

I mention this because some notable changes have been made to legal systems around the world during the last hundred years to assist this ongoing seizure of information and software by major corporations in the Information and Entertainment sectors. Copyright law, which was originally intended to give the benefit of a limited monopoly to authors, so that they may be advantaged as a reward for their endeavours, has been progressively extended. In the USA the most recent extension takes it to the lifetime of the author plus seventy years. It is difficult to understand how an author might be a beneficiary seventy years after his/her death. And if anything the limits on copyright seem to be vanishing. Rather than limited monopolies as originally intended, copyrights seem to be becoming perpetual monopolies. Even the rather tenuous argument of benefits to the author's descendants starts to wear a bit thin when we are asked to consider benefits to the descendants of the author's descendants. And in any case, these changes have not been made for the benefit of authors or their great-grandchildren. Despite the pious utterances from legislators, copyright legislation has clearly been modified for the benefit of corporations. In addition to the protection afforded by copyright laws, courts in the USA are continually inventing new ways to lock up information and create barriers to competition. Good examples of this were the "software patent" concept which arose in the eighties and later the "intellectual property" notion, both of which run counter to the original intent of patent and copyright laws. And in any case terms like "software patent" and "intellectual property" are oxymorons, couched in legalese to lend them validity.

This seizure of software and intellectual assets, if successful, will ultimately stifle the exchange of information and ideas that has been the engine of research, innovation and creativity in the sciences and arts ever since The Renaissance. For that reason, the emergence of open source, and a gathering rebellion against this seizure is an obvious and necessary reaction to these restrictions. And the world wide web has become an important battleground in this contest. This conflict is now developing into a battle for ownership of The Internet itself. In the Entertainment sector, large corporations are pressing to include cumbersome and restrictive protection software, firmware and hardware in digital media, rather than meet the challenge of diversifying and finding new distribution channels and revenue streams. And in the OS arena, Microsoft have at last declared their hostility to open source, and seem to be planning a major assault on Open Standards, which will most likely take the form of an attempt to subvert them (see bibliography).

The purpose of the assault on the Internet is to seize control of it and turn it into a mere distribution channel. Internet enthusiasts contend that the world wide web is too large and too complex to be restricted to the role of a distribution channel, and that corporations that use the Internet should learn how to adapt their behaviour to the protocols rather than change the protocols to suit their behaviour. And so the battle lines are drawn.

As for the obviously well-intentioned and articulate criticisms of pagerank, it is a sad fact of life that the market will come to the Internet (or the Internet will go to the market). And this will bring with it all the problems that a market-based system brings. And those who are concerned about privacy can always turn off cookies. Google still works ok without cookies. If Google weren't using a system based on the popularity of pages and sending cookies, someone else would. And to date, Google has flown the open standards colours, by using open source products in their core enterprise and endorsing the free exchange of information and ideas. So, although Google might be an 800 lb gorilla, we can take solace from the fact that it is our Gorilla.

The Internet Swamp.

As for the allegations of bias, I have to say that they are probably true. However, anyone who uses a search engine such as Google needs to be aware of the following:

Pageranking is determined by the popularity of a site. If a bias exists in the Internet, then this bias will be reflected in the search results. There is no guarantee of accuracy. So if, for example, there is more anti-immunization polemic on the web than immunization information, the results will reflect this bias.
Pageranking is also affected by the quality of a site. If a certain manufacturer of refrigerators has a site that features huge graphics files, with some dull repetitive advertising copy that never changes, and has no information content, then they will almost certainly get a very poor place in the pageranking index. They may even be omitted entirely. On the other hand a well organised site that has plain HTML pages filled with constantly updated information will probably get a better ranking, especially if other sites link to it.
The results are dependent on the query. Google go to great lengths to try and make the interface user-friendly, but a poorly constructed query will give you a poor result. This is the old GIGO principle. So for example if you type immunization you may get many references that contain hysterical polemic. If you had typed immunization +facts, the insertion of the second word could lead to results that will at least purport to be factual, even if they are still biased. On the other hand, you might get a better result with immunization +statistics -facts, because people who purport to give facts often don't. If you were really interested in a certain type of immunization, something like "meningeal coccus"+infection+immunization should return more specific results. All you need is a basic understanding of the topic and some knowledge of how computers store information, and you should be able to come up with a useful query. Even so the results may display a bias. This is where the last item in this list is most important.
Use common sense when checking the results. Just because it is on the Internet doesn't mean that it is correct. It could be a load of rubbish. In fact old diehard web critics (like myself) would say it is quite likely to be rubbish. The Internet has been touted as a deep Ocean of information. At times it may seem more like a shallow swamp of rumours, mis-information and propaganda. However a little common sense should enable you to filter the bullshit from the gold. There are the obvious checks such as verifying that a site identifies itself and that they are who they say they are. A reputable publication usually gives references to back up any assertions. And a bit of scepticism never went astray.
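To make the point about query construction a little more concrete, here is a small Python sketch of what the plus and minus prefixes in the example queries are asking for. This is not how Google evaluates a query, which is not documented here. It is a toy filter built on my own assumptions: unsigned and plus-prefixed terms must all appear in the text, minus-prefixed terms must not, quoted phrases are treated as a single term, and matching is a plain case-insensitive substring test. The names TOKEN, parse_query and matches are invented for the illustration.

```python
import re

# An optional +/- sign followed by either a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|([^\s"+]+))')


def parse_query(query):
    """Split a query into (wanted, unwanted) lists of lower-case terms."""
    wanted, unwanted = [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "-":
            unwanted.append(term)
        else:
            # Assumption: unsigned terms are treated like '+' terms.
            wanted.append(term)
    return wanted, unwanted


def matches(query, text):
    """True if text contains every wanted term and no unwanted term."""
    wanted, unwanted = parse_query(query)
    text = text.lower()
    return (all(term in text for term in wanted)
            and not any(term in text for term in unwanted))


print(parse_query('immunization +statistics -facts'))
print(parse_query('"meningeal coccus"+infection+immunization'))
print(matches('immunization +statistics -facts',
              'Immunization statistics by region'))
```

Even a filter this crude shows why the extra terms matter: each one narrows the set of pages that can come back, for better or worse. It does nothing about whether those pages are accurate, which is where the common sense comes in.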

Of course if you are not happy with bias on the Internet, you should write something yourself. Make sure you link to other sites that support your point of view, maybe even link to the ones that disagree with your position. Might as well let the readers make an informed decision. Maybe then the web will be a little less biased ...

As for some more competition? Google does not have a monopoly. Anyone who wants to can challenge them. Although if you are thinking of competing, you would be well-advised to use open source and adopt open standards and get yourself several truck-loads of Linux clusters ... Otherwise Google will be so far ahead of you, you won't even be able to see their tail-lights ... disappearing way up ahead in the gloom of the Internet swamp ...

BIBLIOGRAPHY:


Gerry Patterson	Getting Listed In The Search Engines. The first article that I wrote about Search Engines (esp Google).
Nathan Cochrane	Warning on search engines. Article in The Age weekly IT section (Next), which warns about bias and lack of competition in Search Engines.
Clay Shirky	Power Laws, Weblogs, and Inequality. An article that is critical of the Google pagerank system. There is also an organisation known as google-watch.org, who run a Google-Watch Proxy. The criticisms on these sites are articulate and well-reasoned. However in Google's defence I must say that Google does give them a good pagerank.
Google	Rank Explained. The Google version of how their search engine works. This official explanation seems to match the conclusions I reached (independently) in May 2002. One of my assumptions was incorrect however. It seems it is not possible to purchase a high pagerank. Google claim that sites have to earn their position in the pagerankings.
Richard Forno	DMCA. An article which discusses the US Digital Millennium Copyright Act of 1998, which was passed on behalf of the Entertainment Industry.
Eric Raymond	The Halloween Documents. I swear that I did not know of the existence of these documents when I wrote previous articles about Microsoft. Of course I suspected that something like them must exist. Here, former Microsoft insiders discuss the possibility of deliberately subverting Open Standards. It's real Horror Show. Also, largely because of the Halloween Documents, Eric has written a rather passionate anti-Microsoft rant that compensates with eloquence and wit for what it lacks in diplomacy.