Getting Listed in the Search Engines
By Gerry Patterson
When I set out to build a new web site and LAN, I tried to plan my requirements in advance. I had purchased the networking components, installed hardware and cables, installed the operating system, configured routing tables and set up DNS. I had also budgeted carefully for costs associated with these tasks. I thought I had considered everything. Everything, that is, except getting listed in the search engines. Well, that shouldn't be a problem! How hard can it be to get listed in the search engines? This article explores this rather knotty question.
Start Your Engines Please ...
A portion of my youth was spent in remote areas near the centre and top end of Australia. This was because my father worked as an exploration geologist. One of the highlights of my childhood was to accompany him on field trips. Since these were often to places that could be hundreds of miles from the nearest town, a two-way radio was a vital piece of equipment. Every night, after we had set up our camp, we would unpack the radio, sling an antenna over a tree and call in a check of our position and the expected time and location of our next check. The radio really only worked at night. When the sun went down, the hot red earth would surrender its heat rapidly and as night fell, the cooling layers of air that gathered over us had the dual effect of making the stars seem brighter and somehow closer than they were in any town, and of allowing radio waves to pass easily up to the ionosphere and, by bouncing back towards the planet, enable communication with people way below our horizon. Some of the voices that floated back to us could come from as far as half a world away.
One of the things that would amuse my father was ham radio operators, who would go on air and tell everyone about their ham radio set. For someone exploring remote regions in the fifties and early sixties, a working two-way radio could have been literally a matter of life or death, and so it seemed amusing to my father that someone would build an amateur radio for no other purpose than to tell other people about how to build an amateur radio! He was busy using this important technology to communicate and get on with his job. It is very similar to web enthusiasts who these days build a website so that they can tell other people all about how to build a website. For the average computer user, the computer has insinuated itself into their lives to establish itself as indispensable now that it enables them to do their banking, shopping, accounting, write letters and e-mails etc. However, after they have done whatever it is they wanted to do, most of them want to get on with their lives. That is normal human behaviour. Those who see the computer as a tool are probably amused or even puzzled by the fact that the search engines are full of chatter about search engines. All of which is a common criticism of web culture, which at times is very inward-looking.
Some of the Things I wish I'd Never Said ...
And while I'm reminiscing I might as well admit that in the past I have made several statements, that I wish I hadn't, about my consumer choices, and in particular the choices I would not make. Among them:
- I would not buy a mobile phone ... they never work when you really need them. On the few occasions they are actually working, they ring at the most inconvenient time. And besides, anyone who really wants me, can page me!
- I would not buy a laptop. They are too expensive, too easily lost or damaged, have insufficient RAM and storage, are always running out of power and sometimes it's good not to have a computer.
- I would not pay to connect to an ISP from home. Although The Web may have been an excellent idea in its original concept and although it contains some little gemstones of information, I was not prepared to pay for the privilege of sifting through mountains of bullshit to get those (precious few) gems.
Building A Website About Building A Website About Building A Website ...
The reason for the long preamble is that I have finally decided to cease wielding my Occam's Razor and to put a toe into the warm scented water of marketing. And if I continue to tell myself that the waters are indeed warm and scented perhaps I will believe it. Up till now I have tended to regard the sales and marketing pond as rather cold and greasy, and largely inhabited by bottom feeders. I have been a devout sceptic for most of my life. In fact, I am so devout that I am even sceptical of my own scepticism. This has made it difficult to engage in self-promotion, or for that matter any form of promotion. I gather that several years ago it was quite easy to get listed in the major search engines. These days, however, it seems there are only two ways of getting listed in the search engines. You can either pay quite a lot of money, or you can promote yourself.
When I decided to embark on my home office project, I costed everything carefully. I bought an impressive computer with a dual IDE bus with two 100 GB drives, which run as RAID-1. The computer has 1.5 GB of RAM and a Pentium 4 processor, which has a clock speed of 1.6 GHz. I am not exaggerating too much when I say that most MIS managers would have wet their pants with excitement if you had presented them with vital statistics like that ten years ago, especially with a price tag of about $AU 3000 (that's about $US 1500). I also mapped out everything I needed to do from a technical viewpoint:
- I looked at installing an OS. Ok, I know about such installations. Without being too immodest I can say that what I don't know probably isn't worth knowing.
- Cabling my office. I'm not a sparkie, but I can legally have a go at anything that doesn't have 240 volts running through it.
- Configuring the LAN. I have not spent a lot of time on network configuration, but I understand the principles. I concluded that it wouldn't take me long to work it out. It didn't.
- Configuring DNS and BIND. I'm not a DNS guru, but I know a lot about it. Certainly more than enough to set up a small site like my own.
And I waited for the great and revered Googlebot to visit my humble little site.
And I waited ...
And waited ...
And waited ...
While I was waiting, I thought I might do some research on the topic of search engines. Not surprisingly, there is a lot of information about this topic, most of it highly subjective. The problem is that unless a particular pundit has inside knowledge of the code that Google or Lycos or whoever uses, any pronouncements on search engine behaviour are mostly opinion. True, it is possible to do tests, but an accurate scientific test would be very time consuming. Also, search engine behaviour is by its nature a moving target. As website owners try to keep raising their ratings by using tricks, the search engines keep responding by altering their code. Furthermore, it seems to me to be a fool's errand to try and trick the search engines. You might be found out. And if they (the search engine administrators) get annoyed, they might just decide to ignore you! Which would be a fate, if not worse than death, then the equivalent of it for a startup site. Still, sifting out the scuttlebutt, some of the opinions about Google in particular, and search engines in general, are believable:
- Of all the folklore surrounding search engine behaviour, the one that makes most sense to me, just from a consideration of their function, is the importance of the number of links to your site. To test this I asked a friend to create a link to my site from his web site, whatson, which had been running for five years. The next day, the Googlebot appeared and read my home page. At last! I thought, the Googlebot has arrived! However, my elation was a little premature. The Googlebot did not return. A few days later, however, I finally had an entry in Google's search engine. It was only an indirect entry via the link from whatson.
- Another believable bit of folklore is the assertion that major search engines like Google rely on other search engines and/or directories of sites. Not only is it reasonable along the lines of "Why keep a dog and bark yourself?", but if you think about it, it is really just an extension of the previous principle. After all, a reference to your site from another search engine is like a link, even though the reference may be dynamically generated, and in the case of some indices and directories the content is static.
- Another thing that would make sense to me is the number of requests that a search engine gets for content on a certain site. If the engine is fulfilling its task it should rank sites according to their hit rate. However, for a startup site this is a classic chicken and egg situation.
Search Engine Hits and Myths.
Despite getting an entry in the Google index, my site was still not listed in several major directories, and my ranking in Google searches using common keywords was way below the horizon. So I resumed searching for directories and search engines. I also looked to increase the content on my website, as I was coming to the conclusion that this was the best way to get listed in the search engines. My reasoning for this is:
- As I said previously, it seems a bad idea to attempt to trick the engines because it is possible for them to take counter-measures. If I was designing a search engine, I would be on the lookout for sites that tried to create large numbers of links to other sites with no obvious purpose or policy. I would also be watching for attempts at stuffing large numbers of key phrases into web pages, or including text that was invisible to a human reader, or any other deliberate attempt to spam the engines. I would also be wary of bogus automated search requests designed to increase the hit rate on a site. There are counter-measures that I could imagine that could be included in search agent code for the aforementioned simple spamming techniques. Other more sophisticated techniques might be harder to screen automatically but there are other measures I would take (as outlined below). Having said that, I must admit that one of the unfortunate side effects of software counter-measures might be the screening of genuine content from the engine's indices.
- It might be possible to fool a robot, but an experienced human could differentiate spam fodder from genuine content with little more than a glance. So if I was in charge of a search engine, and due to shrinking revenues and increasing demand on resources, I found myself unable to afford large numbers of knowledgeable human editors, I might try to form an alliance with projects like ODP (Open Directory Project) which have human beings validating the site. Ok, I don't know for sure that the major engines use this as their policy, but a number of major engines do seem to be using ODP.
- Some of the mergers and consolidations that have taken place lately tend to indicate that the number of major search engines is shrinking. There is less money in the Internet economy now that web based businesses have to work for it rather than rely on perpetual motion or the stock market defying gravity. Although the web is still growing in physical terms, it is shrinking in investment. When money is scarce it becomes smarter. And businesses that rely on Open Standards should have an advantage. Otherwise they will need very deep pockets. And if you are chasing stupid money in this environment, you may become an endangered species.
- In any case, for my little site, I want visitors to stay once they visit. It might be possible to get one past the keeper, and spam the engines. Even if I succeeded in this, I would gain nothing if my site was not worth viewing. After all, the purpose of the promotion is to get human beings to read the material on the web site.
- I want visitors to return to my site. Visitors are highly unlikely to return to a site that has tricked them. And if they do, are they a potential customer I want? The best way to encourage them to return to the site is, once again, to have content.
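The first point above imagines software counter-measures against simple tricks such as keyword stuffing. As a rough illustration only, here is a minimal Python sketch of one such check: it measures how much of a page's text is taken up by its single most frequent word. The function names, the minimum word length and the 0.3 threshold are all hypothetical choices made for this example; nothing here is claimed to resemble what any real search engine does.

```python
import re
from collections import Counter
from typing import Tuple


def top_term_share(text: str, min_length: int = 4) -> Tuple[str, float]:
    """Return the most frequent word and the share of counted words it takes up."""
    # Lower-case the text and keep alphabetic words of at least min_length
    # letters, which crudely skips short function words such as "the" and "and".
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_length]
    if not words:
        return ("", 0.0)
    term, count = Counter(words).most_common(1)[0]
    return (term, count / len(words))


def looks_stuffed(text: str, threshold: float = 0.3) -> bool:
    """Flag text where one term exceeds the (hypothetical) threshold share."""
    _, share = top_term_share(text)
    return share > threshold


if __name__ == "__main__":
    genuine = "Configuring DNS and BIND for a small site takes patience and careful reading."
    stuffed = "cheap widgets cheap widgets buy cheap widgets best cheap widgets cheap"
    print(looks_stuffed(genuine))
    print(looks_stuffed(stuffed))
```

A check this crude would also flag short, genuinely focused pages, which is the side effect mentioned above: software counter-measures can screen out genuine content along with the spam.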
The web is flooded with advertisements about how to improve your ratings with the engines. Much of this seems to work on the same principle as snake-oil salesmen. You want to look attractive? Be popular? Just buy this product! That's all you need to do! You don't have to have a healthy diet! You don't need to exercise regularly! You don't need to avoid dangerous and addictive substances! You don't have to be blessed with a good genotype! You just have to buy this product! Some consumers fall for this line. Because they want desperately to believe it. If only I could just take this pill, have this surgical procedure performed, buy this device etc, then my life will be better. Without actually doing anything to maintain my quality of life. If only it were true! It's the same with website promotions, especially the lower priced ones. People want to believe that there is a product that will make their website the most popular on the Internet. Having said that, I have to admit that there is one surefire method to get a good rating with the major engines. You could just pay them to give you a good rating. Many of them will do it. And the major engines do have the power to direct traffic to your site. However, if you haven't also paid someone to put content on the site, you are just wasting your time and money.
There is no magic potion.
Ok, so you know what I'm going to say. It seems a good idea to stick to the three Cs: content, content, content. And once the content is there I might have to consider an advertising budget. Nevertheless I thought I would try some home grown self-promotion. After writing several articles and putting together some scripts which I thought were useful, I placed relevant keywords in the HTML META tags in these articles. I then fed those same lists of key words into Google with the word "POSTED" appended to the end of each list. I reasoned that I might find similar articles listed on a site that accepted submissions posted from internet users. And if I found such sites, I would offer to post my articles to the higher ranked sites, which should improve my visibility. I was very fortunate to find a site almost immediately. It had a high profile and a very close match with the keywords that I had put in my articles, and as a bonus, they also supported Open Standards, which I have concluded will soon be more influential than proprietary standards. The site that I found was Librenix, which is a popular and successful website which publishes original articles and news about Linux, System Administration and related topics. They have an open policy to contributions, and rely on their users as well as in-house staff to rate the submissions. And if you don't already know about Librenix, I recommend that you have a look. After submitting to them, I noticed a substantial increase in the number of hits to my site. I also noticed an increase in the number of robots visiting my site. The robots seemed to be chasing just the articles that were submitted to Librenix. The better an article rated on Librenix the larger the number of robots that followed it. Was it possible that some search engine robots, like the Googlebot, paid attention to the ratings on Librenix? Read on ...
I carried out some experiments with articles as I submitted them to Librenix. Hits usually arrived within minutes of being accepted. Flocks of robots followed the hits, usually within 24 hours. Within 48 hours I could find parts of the article that I submitted using Google Search to look for phrases in the summary that was submitted to Librenix, even though these phrases had not been placed in any META tags (at Librenix or at my site). However, searching for phrases that occurred in documents not listed with Librenix, or key words that occur only in the portion on my site, does not get a result in the top fifty unless I make them highly specific. It seems that high-ranking sites do get preferential treatment from Google, not only in terms of subsequent ranking, but also in the numbers of robots that get driven to a document. This means that a link from a high-ranking site is worth much more than a link from a lower ranked site, whether or not you get hits from the link. Which is more or less what I expected. I cannot say whether the higher ranking translates into more frequent visits from robots or vice-versa. This is another chicken and egg situation.
Next, I decided to add a facility to my site, so that other sites with relevant content could add links. This was a CGI script and it also presented an opportunity to create another article on the topic of Generating a List of Links From Postgres. I used Perl to analyse the log files and give some basic statistics, as outlined in Parsing The Log Files.
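For readers who want to try this kind of log analysis themselves, the following is a minimal sketch in Python rather than Perl, and it is not the script from Parsing The Log Files. It assumes the server writes the Apache "combined" log format, and it treats a request as a robot visit when the user-agent contains one of a few marker strings; both the format and the marker list are assumptions made for this example. It counts robot hits per requested path, which is roughly the statistic needed to see which articles the robots are chasing.

```python
import re
from collections import Counter
from typing import Iterable

# Apache "combined" log format. This is an assumption: the article does not
# say which server or log format the site uses.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical list of user-agent fragments treated as robots.
ROBOT_MARKERS = ("googlebot", "ia_archiver", "bot", "crawler", "spider")


def is_robot(agent: str) -> bool:
    """Return True if the user-agent string contains any robot marker."""
    agent = agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def robot_hits_by_path(lines: Iterable[str]) -> Counter:
    """Count requests made by robots, keyed by the requested path."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        if is_robot(match.group("agent")):
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines; the addresses, paths and agents are illustrative only.
    sample = [
        '192.0.2.1 - - [10/Oct/2003:13:55:36 +1000] "GET /articles/a.html HTTP/1.0" '
        '200 2326 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
        '192.0.2.2 - - [10/Oct/2003:14:01:02 +1000] "GET /articles/a.html HTTP/1.0" '
        '200 2326 "-" "ia_archiver"',
        '192.0.2.3 - - [10/Oct/2003:14:05:40 +1000] "GET /index.html HTTP/1.1" '
        '200 1045 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    ]
    for path, count in robot_hits_by_path(sample).most_common():
        print(path, count)
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.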
As I built up links around the web, my visibility in the search engines improved. It had been a hard grind and I had worn out the lettering on my new keyboard by this time, but at last, I was listed. Furthermore the Googlebot had crawled my site and indexed almost every word in every document. I was able to see some accidental hits that came my way from Google, because a highly specific phrase had matched some content on my site. The query details could, of course, be found in the log files, as outlined in Parsing The Log Files, and could be analysed with Perl, as explained in that article. The visibility of PGTS, my systems administration and DBA web site, has been increasing. Though when you are starting from zero, things can only get better. It remains to be seen whether I can achieve a ranking similar to PGTS, the high-heel foot fetish web site. I'm not sure who had the name first, but they got on the web before I did, and paid handsomely for their number one spot. And they have lots of content, but quite different from this site. And by the way, if you came here looking for some of that other content you should have realised by now that you are at the wrong site. Besides you would be disappointed at the sight of me in stilettos, leather skirt, and fishnet stockings. (Yes, there are some things that I will not do -- even for promotional purposes)
Ok, I still have a long way to go. So if instead of looking for high heels, you came to this site looking for a magic potion to improve your popularity, you may be disappointed to learn that all I have to say on that topic is, there is little substitute for hard work (except perhaps money). Or maybe you already knew that.
The system seems to work as it is supposed to. The best way to improve a site's ranking with the major search engines is to add relevant quality content. It would seem best to use META tags to convey what really are the key phrases in the document. To date, I am yet to find any evidence for significant improvements in visibility due to content in META tags. But then again this might just be one of those things that only matters once you are in the top five percent. And although I don't know for certain, I suspect that attempts to spam the engines with META tags will be met with software counter-measures that could result in your entry being culled.
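One way to keep META keywords honest is to check that each one really does appear in the text of the page. The following is a minimal Python sketch of that idea, using the standard library's html.parser module. The class and function names are hypothetical, and it assumes the keywords are separated by commas, which (as noted in the links below) not every engine agrees on.

```python
from html.parser import HTMLParser
from typing import List


class KeywordAndTextParser(HTMLParser):
    """Collect META keywords and the text found outside script and style."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: List[str] = []
        self.text_parts: List[str] = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Assumes comma-separated keywords.
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def unsupported_keywords(html: str) -> List[str]:
    """Return the META keywords that never occur in the page text."""
    parser = KeywordAndTextParser()
    parser.feed(html)
    parser.close()
    # Normalise whitespace and case before the substring comparison.
    page_text = " ".join(" ".join(parser.text_parts).lower().split())
    return [k for k in parser.keywords if k not in page_text]


if __name__ == "__main__":
    page = """<html><head><title>Parsing log files</title>
    <meta name="keywords" content="log files, perl, free money">
    </head><body><p>Notes on parsing log files with Perl.</p></body></html>"""
    print(unsupported_keywords(page))
```

Any keyword this reports is a candidate for removal, on the reasoning above that tags which do not reflect the document are more likely to attract counter-measures than visitors.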
I might also add that this is all just opinion. I do not have detailed inside knowledge of the workings of the major engines. This document is based mainly on my observations. The few tests I have performed have not been rigorous because I was more concerned with getting the site listed than researching search engine behaviour. And I must reiterate that the topic of search engines is complex. And it would be wise to regard any definitive statements on such a complex topic with a considerable degree of scepticism. But then again I regard just about everything with a degree of scepticism. Furthermore, my own experience would indicate that if you have a product to sell, you've prepared your site, you have swags of quality content and you have more money than time, the easiest course would be to just pay Google to rate your site. This will probably give you the best bang for your buck. If you feel strongly enough to object to any of my opinions and can express your objection in an articulate e-mail, please do. I will publish your objection (unless you object to publication).
I just hope I don't have to eat my words again ... (I'll get some more relish just in case).
Useful Links:
http://www.google.com/addurl.html. This form warns that there is no guarantee that you will be included if you submit this form. And they're not kidding!
robotstxt.org. Information on the robots.txt Robots Exclusion Standard and articles about writing well-behaved Web robots.
SelfPromotion.com. Free tips and articles, submission aids and automatic submission. Pay if you are happy.
Aeiwi Search Engine. Requires meta tags with commas. There seems to be some discrepancy about this issue. Some sites say they require commas; others say they require only spaces.
Alexa. Has an invitation to the ia_archiver robot to crawl your site. Once it starts, this robot will slowly and steadily crawl the entire site.
Web Wombat. An Australian index. Standard submission claimed to be free for .au and .nz. Tardy to action non-paying requests.
http://www.vicnet.net.au/search/. Library of Victoria, Australia. Links available for local and regional sites.
All The Web. Submit your site to AllTheWeb.com.
Add Me! Free automated submission tool to numerous search engines. This one seems to work!
EuroSeek. Navigate to category and add site.
AOL search. How to add your site to AOL (uses Open Directory).
ExactSeek. Free Submission to ExactSeek.
RankPilot. Free search engine ranking service. Pretty graphics. Accuracy not tested.
Lycos. Search options for Lycos.
Northern Light. Index of Text Documents, based in Massachusetts.
http://www.whatuseek.com/addurl-secondary.shtml. WhatUSeek Secondary Index. An index of sites.
INeedHits. A good entry level free submission service.
Search Engines Worldwide. A Collection of search engines sorted by country and region.
www.rtlsoft.com/submitblaster. At the time this article was written this was a free automated submission service. It seemed very good. This link is now broken.
PGTS Links. Add your web site. Categories are: Open Source, Index/Search Engine, Community/Regional (Australia only), Journal, Off Beat, Security