These are the basic components that make up a search engine:
| |
 |
World Wide Web
When I was first sending and receiving email in the 70s, email traveled over interlinking telephone connections late at
night, to save on long-distance phone bills between the universities and high-tech companies. Serving up a web page over
links like those might have taken a few days. Not until email and newsgroup traffic started flowing 24 hours a day, over
faster connections and passed along by many more serving computers, could the internet, and the World Wide Web of websites
running on top of it, seriously take off as a practical application.
Database
Search engine companies build incredibly large databases in which to store some or all of each web page they collect. Even
though Google currently claims to have the largest database, it does not copy every web page out there. Google primarily
updates its pages about once a month, revisiting some sites more often depending on their PageRank rating. Several other
search engines draw their results from Google. Many search engines and directories can take months to get around to an
update.
Robots
Unlike directories, which are edited by humans, search engine databases are maintained by robots that spider the internet,
looking for new sites, updating pages that have changed, and adding or deleting pages from the database as necessary. These
robots find new sites by following links from other websites. Submission forms can be used to let the robots know about your
new site. However, Google does not list sites with no incoming links, so there is no point in submitting your site to Google.
The important thing with Google is to get directories and other sites to link to yours so that the Google robots can find you.
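The way a robot discovers sites by following links can be sketched as a breadth-first search over a link graph. This is a minimal Python illustration, assuming a small hypothetical in-memory graph (`LINKS`, with made-up `.example` domains) in place of real page fetching; it also shows why a site that nothing links to is never discovered, however many links it has going out.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "directory.example/": ["site-a.example/", "site-b.example/"],
    "site-a.example/": ["site-b.example/", "site-a.example/aardvarks"],
    "site-b.example/": ["directory.example/"],
    "site-a.example/aardvarks": [],
    "orphan.example/": ["site-a.example/"],  # links out, but nothing links in
}


def crawl(seeds, links):
    """Breadth-first crawl: visit every page reachable from the seed pages."""
    discovered = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # a real robot would fetch and store the page here
        for target in links.get(page, []):
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
    return order


found = crawl(["directory.example/"], LINKS)
print(found)
```

Starting from the directory, the crawl reaches both sites and the new aardvark page, but `orphan.example/` never shows up because no crawled page points to it.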
Indexing
Google has an index for the word Aardvark that lists all the pages in its database that include the word Aardvark. When a
robot loads your new page about aardvarks into the database, a pointer to that page gets added to this Aardvark list. Your
site is now ready for anybody looking for sites that talk about aardvarks, and let's say that's me. I bring up Google on my
computer and type in "happy aardvark". The index lists for "happy" and for "aardvark" are compared, and only the pages that
appear on both lists qualify. Document retrieval systems work by spending a lot of crunch time building the index structures
up front, so that the retrieval and listing process takes an absolute minimum of time.
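The two-phase idea (expensive index building done ahead of time, cheap lookups at query time) can be sketched with an inverted index. This minimal Python version assumes a handful of hypothetical pages held in memory (`PAGES`) and treats a query as an AND of its words; real engines add many refinements that this sketch leaves out.

```python
import re
from collections import defaultdict

# Hypothetical stored pages: page id -> page text.
PAGES = {
    "p1": "The happy aardvark digs for ants.",
    "p2": "Aardvark habitats across Africa.",
    "p3": "A happy dog and a happy cat.",
    "p4": "Why is the aardvark so happy?",
}


def build_index(pages):
    """Slow step, done ahead of time: map each word to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Fast step: a page qualifies only if it appears on every query word's list."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "happy aardvark")))  # ['p1', 'p4']
```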
Document Retrieval System
Document retrieval systems are a highly competitive business with far-ranging applications, including search engines.
Libraries use a document retrieval system to serve up references to the books you are looking for. The key to high-speed
document retrieval is an indexing system. Let's say your site is established with Google and you add a new page that talks
about aardvarks. Soon enough a robot discovers the page and loads it into the database.
Ranking System
The indexed list for Aardvark has already been sorted according to rank. The ranking system is responsible for determining
the order in which the qualifying pages are listed in the search results. Google claims to use a ranking recipe of over 100
ingredients.
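Ranking can be sketched as combining several per-page signals into one score and storing each index list pre-sorted by that score, so query results come out already in order. The signal names, values and weights below (`SIGNALS`, `WEIGHTS`) are purely hypothetical; Google does not publish its recipe, and this simple weighted sum is only one illustrative way to mix ingredients.

```python
# Hypothetical ranking "ingredients" per page; real engines use far more signals.
SIGNALS = {
    "p1": {"incoming_links": 12, "title_match": 1.0, "freshness": 0.5},
    "p2": {"incoming_links": 40, "title_match": 0.0, "freshness": 0.9},
    "p4": {"incoming_links": 3, "title_match": 1.0, "freshness": 0.2},
}

# Hypothetical weights for combining the ingredients into one score.
WEIGHTS = {"incoming_links": 0.1, "title_match": 2.0, "freshness": 1.0}


def rank_score(page_id):
    """Combine a page's signals into a single number (higher ranks first)."""
    signals = SIGNALS[page_id]
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def sorted_posting_list(page_ids):
    """Store an index list pre-sorted by rank, so queries need no extra sorting."""
    return sorted(page_ids, key=rank_score, reverse=True)


AARDVARK = sorted_posting_list(["p1", "p2", "p4"])
HAPPY = {"p1", "p3", "p4"}

# Walking the pre-sorted list and keeping only pages also on the other list
# yields the qualifying pages already in rank order.
results = [page for page in AARDVARK if page in HAPPY]
print(AARDVARK)  # ['p2', 'p1', 'p4']
print(results)   # ['p1', 'p4']
```

The sorting cost is paid once when the list is built, which matches the idea of spending crunch time on the index rather than at search time.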
More About Search Engines
Meta
Meta search engines are the ones that draw on more than one search engine in order to be even more comprehensive.
Metacrawler and Dogpile run their searches through Google, Yahoo, LookSmart, Teoma, Overture and FindWhat.
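A meta search engine has to merge several ranked result lists into one. How Metacrawler and Dogpile actually merge results is not described here; this minimal Python sketch swaps in reciprocal rank fusion, a common published technique, and assumes hypothetical engine names and result URLs (`ENGINE_RESULTS`) plus the conventional constant `k=60`.

```python
from collections import defaultdict

# Hypothetical ranked results returned by three underlying engines.
ENGINE_RESULTS = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u4"],
    "engine_c": ["u5", "u2", "u1"],
}


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists: each URL earns 1 / (k + rank) from every list it appears in."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = reciprocal_rank_fusion(ENGINE_RESULTS.values())
print(merged)  # ['u2', 'u1', 'u5', 'u4', 'u3']
```

A page that several engines agree on (here `u2`) rises to the top, and duplicates collapse into a single entry, which is what makes the combined list more comprehensive without repeating itself.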
Filtered
Search engines such as MyWay.com offer a filtered subset of Google's results, intended to be a lot cleaner for family
consumption. Ask Jeeves feeds off the Teoma search engine, which it owns.
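A filtered engine can be pictured as taking another engine's results and dropping anything unsuitable while keeping the original order. This is a minimal sketch assuming a hypothetical domain blocklist (`BLOCKED_DOMAINS`) and made-up URLs; how MyWay.com actually filters is not stated, and a real family filter would likely also examine page content.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real family filter would be far larger.
BLOCKED_DOMAINS = {"adult.example", "gambling.example"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def family_filter(results):
    """Keep the upstream engine's order, dropping results from blocked domains."""
    return [url for url in results if not is_blocked(url)]


upstream = [
    "https://zoo.example/aardvarks",
    "https://www.adult.example/page",
    "https://kids.example/happy-aardvark",
    "https://gambling.example/",
]
print(family_filter(upstream))
```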