Monday, October 17, 2011

Chapter 8 Search Engine

How do search engines work?

The good news about the Internet and its most visible component, the World Wide Web, is that there are hundreds of millions of pages available, waiting to present information on an amazing variety of topics. The bad news about the Internet is that there are hundreds of millions of pages available, most of them titled according to the whim of their author, almost all of them sitting on servers with cryptic names. When you need to know about a particular subject, how do you know which pages to read? If you're like most people, you visit an Internet search engine.

Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:

  • They search the Internet -- or select pieces of the Internet -- based on important words.
  • They keep an index of the words they find, and where they find them.
  • They allow users to look for words or combinations of words found in that index.

Early search engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand inquiries each day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day. In this article, we'll tell you how these major tasks are performed, and how Internet search engines put the pieces together in order to let you find the information you need on the Web.


They are three types of search engines:

  1. Crawler-based search engines
  2. Human-powered directories or Search engine directory
  3. Meta or hybrid search engines

Crawler-based search engines

Crawler-based search engines, such as Google (http://www.google.com), create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found. If web pages are changed, crawler-based search engines eventually find these changes, and that can affect how those pages are listed. Page titles, body copy and other elements all play a role.

The life span of a typical web query normally lasts less than half a second, yet involves a number of different steps that must be completed before results can be delivered to a person seeking information. The following graphic (Figure 1) illustrates this life span (from http://www.google.com/corporate/tech.html):


3.
The search results are returned to the user in a fraction of a second.



1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book - it tells which pages contain the words that match the query.

2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.


Human-powered directories

A human-powered directory, such as the Open Directory Project (http://www.dmoz.org/about.html) depends on humans for its listings. (Yahoo!, which used to be a directory, now gets its information from the use of crawlers.) A directory gets its information from submissions, which include a short description to the directory for the entire site, or from editors who write one for sites they review. A search looks for matches only in the descriptions submitted. Changing web pages, therefore, has no effect on how they are listed. Techniques that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.

Hybrid search engines

Today, it is extremely common for crawler-type and human-powered results to be combined when conducting a search. Usually, a hybrid search engine will favor one type of listings over another. For example, MSN Search (http://www.imagine-msn.com/search/tour/moreprecise.aspx) is more likely to present human-powered listings from LookSmart (http://search.looksmart.com/). However, it also presents crawler-based results, especially for more obscure queries.


5 Example of Search Engines on the internet:

Reference

http://computer.howstuffworks.com/internet/basics/search-engine2.htm

http://edtech.boisestate.edu/bschroeder/publicizing/three_types.htm

No comments:

Post a Comment