SEARCH ENGINE OPT. -- Search Engine Basics
These are the basic components that make up a search engine:
World Wide Web
The World Wide Web (also known as "WWW", or simply the "Web") is a massive network of websites on the Internet that can be accessed by means of browser software, using standards now managed by the World Wide Web Consortium (W3C). The Web was originally conceived in 1989 by British engineer and computer scientist Sir Tim Berners-Lee while he was working at CERN, the European Organization for Nuclear Research based in Switzerland. The formal protocol was developed with the help of fellow CERN computer scientist Robert Cailliau and launched in 1990/91. The Web has developed into a network of millions of Internet-connected websites containing billions of webpages and other files that can be accessed with Web browser software such as Mozilla's "Firefox", Microsoft's "Internet Explorer" (IE), Google's "Chrome", and Apple's "Safari". In 2008 Google Search announced that it had discovered over a trillion items (URLs) on the World Wide Web. By 2009 over 100 million domains were in operation and over 25 billion searchable webpages were indexed. By 2010 the number of users of the Web surpassed 2 billion. Search Engine Optimization (SEO) is the task of making sure that as many surfers as possible come to your website when they are looking for information on the topics that you cover.
Database
Search engine companies build huge databases in which to save some, or all, of each webpage they collect. Even though Google currently claims to have the largest database, it does not copy all the webpages that are out there on the Web. Google typically updates a page about once a month, with more frequent updates depending on each site's PageRank rating. Several search engines feed off Google. Yahoo uses the Microsoft Bing engine to produce its search results. Some lesser-known search engines and directories can take months to get around to updating some of the webpages in their databases.
Robots
Unlike traditional directories edited by humans, search engines use robots (also called spiders or crawlers) that crawl the Web looking for new websites and new webpages. The robots update the engine's database to reflect new webpages or webpages that have changed, adding or deleting pages from the database as necessary. These robots find new sites by tracing links from other websites. Submission forms can be used to let some search engines know about your new website. However, Google and some other top engines no longer have submission forms because their robots crawl all over the Web and supposedly discover everything that is out there. Some engines do not list websites that have zero incoming links. So if you launch a new website, it is wise to ask some webmasters to link to you on a link-exchange basis. The most important thing with search engines is to arrange for directories and other websites to link to your site so that the robots can find you and your ranking can improve.
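The link-tracing behavior described above can be illustrated with a minimal Python sketch. It is not how any particular engine's robot works: the `crawl` function, the in-memory `link_graph`, and the `.example` site names are all hypothetical, and a real robot would fetch pages over the network instead of reading a dictionary.

```python
from collections import deque

def crawl(link_graph, seeds):
    """Breadth-first link tracing over a hypothetical in-memory link graph.

    link_graph maps each site to the list of sites it links to.
    Returns the sites in the order the robot discovers them.
    """
    discovered = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        # Follow every outgoing link that has not been seen yet.
        for target in link_graph.get(url, []):
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
    return order

# Hypothetical sites, for illustration only.
link_graph = {
    "directory.example": ["news.example", "shop.example"],
    "news.example": ["shop.example"],
    "shop.example": [],
    "newsite.example": ["directory.example"],  # links out, but nobody links in
}

found = crawl(link_graph, ["directory.example"])
print(found)                       # ['directory.example', 'news.example', 'shop.example']
print("newsite.example" in found)  # False
```

In this sketch "newsite.example" is never discovered because no crawled site links to it, which is why incoming links matter for a new website.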
Indexing
Search engines index pages under keywords. For example, Google Search has an index for the word "aardvark" that lists all the webpages in its database that include the word "aardvark". If you publish a new webpage that includes the word "aardvark", the robots will soon find it, and a pointer to your new webpage will be added to the "aardvark" list in Google's index. Your page is now available to anybody looking for sites that mention "aardvark", which is primarily the name of a mammal native to Africa but could also appear in business names, street names, or other names. Let's say that someone wants information on "Aardvark Store". They bring up Google.com on their computer and type in "Aardvark Store". The index lists for "aardvark" and for "store" are both checked, and only the pages that appear on both lists qualify, so the results will primarily be stores with "Aardvark" in their name. Search engine document retrieval systems spend a lot of processing time up front building the index structures so that retrieval and listing can take an absolute minimum of time.
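The keyword index and the "Aardvark Store" lookup can be sketched in a few lines of Python. This is a simplified illustration, not Google's implementation: the function names, the punctuation handling, and the sample pages are assumptions made for the example.

```python
from collections import defaultdict

PUNCTUATION = '.,!?"()'

def tokenize(text):
    """Split text into lowercase words, stripping surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def build_index(pages):
    """Build the keyword index: each word maps to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return only the pages that appear on the list of every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages, for illustration only.
pages = {
    "zoo.example/aardvark": "The aardvark is a mammal native to Africa.",
    "shop.example/home": "Aardvark Store sells camping gear.",
    "shop.example/hours": "Our store is open daily.",
}

index = build_index(pages)
print(sorted(search(index, "Aardvark Store")))  # ['shop.example/home']
```

The expensive step, `build_index`, runs once ahead of time. Each search is then just a few set lookups and intersections, which is the trade-off the paragraph above describes.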
Document Retrieval System
Document retrieval systems are a highly competitive business with many applications, including search engines. Libraries use a document retrieval system to serve up references to the books you are looking for. The key to high-speed document retrieval is an efficient indexing system. Let's say your site is established with Google and you add a new webpage that talks about aardvarks, the African mammals. Soon the search engine robots will discover the page and load the pertinent portions of it into the engine's database.
Page Ranking System
Search engines constantly revise their indexed lists for primary keywords such as "aardvark". Each list is sorted according to page rank. The ranking system determines the order in which the qualifying webpages are listed in the search results. Obviously, the higher your webpage is listed in the search engine rankings, the more traffic will come to it. The ideal is to get ranked on the first page of listings for a primary keyword or keywords relevant to the core topic of your website. This is not easy. Google claims to use a ranking recipe of over 100 ingredients. The task of achieving better page rankings for your website is known as "Search Engine Optimization" (SEO).
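As a rough illustration of how a ranking system orders the qualifying pages, here is a toy Python sketch. It is not Google's recipe. It assumes just two made-up ingredients (how often the keyword appears and how many incoming links a page has), and the weights, function names, and sample pages are all hypothetical.

```python
def score(page, keyword, w_term=1.0, w_links=2.0):
    """Toy score: a weighted sum of two assumed ranking ingredients."""
    term_count = page["text"].lower().split().count(keyword.lower())
    return w_term * term_count + w_links * page["incoming_links"]

def rank(pages, keyword):
    """Keep only pages containing the keyword, highest score first."""
    qualifying = [
        p for p in pages if keyword.lower() in p["text"].lower().split()
    ]
    return sorted(qualifying, key=lambda p: score(p, keyword), reverse=True)

# Hypothetical pages, for illustration only.
pages = [
    {"url": "a.example", "text": "aardvark aardvark facts", "incoming_links": 0},
    {"url": "b.example", "text": "aardvark store", "incoming_links": 3},
    {"url": "c.example", "text": "camping gear", "incoming_links": 10},
]

print([p["url"] for p in rank(pages, "aardvark")])  # ['b.example', 'a.example']
```

Here "b.example" outranks "a.example" even though it mentions the keyword less often, because its incoming links carry more weight in this made-up formula. "c.example" has the most links but never qualifies, since it does not contain the keyword.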
Meta Search Engines
Meta search engines make use of more than one search engine in order to be more comprehensive. Metacrawler and Dogpile draw their results from Google, Yahoo, LookSmart, Teoma, Overture, FindWhat and others.
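One simple way a meta search engine could combine results is to interleave the ranked lists it receives and drop duplicates. The Python sketch below shows that idea only. It is an assumption for illustration, not the method Metacrawler or Dogpile actually uses, and the function name and sample result lists are hypothetical.

```python
from itertools import zip_longest

def merge_results(*result_lists):
    """Interleave ranked result lists round-robin, skipping duplicates."""
    merged = []
    seen = set()
    # zip_longest yields the 1st result of every engine, then the 2nd, and so on,
    # padding shorter lists with None.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical ranked results from two underlying engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example"]

print(merge_results(engine_a, engine_b))
# ['x.example', 'y.example', 'w.example', 'z.example']
```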
Filtered Engines
Search engines such as MyWay.com offer a filtered subset of Google's results, which is intended to be a lot cleaner for family consumption. AskJeeves feeds off the Teoma search engine, which it owns.