Search Engines


  1. Search Engines

  2. Search Engines • Search engines take massive amounts of text and index it, so you can quickly and easily find key words • Special: for example, an engine that archives USENET posts • Portal: search engines that archive web pages • Local: search engines limited to a single site, e.g. Travelocity.com • Meta: query other search engines
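
The idea of indexing text so that key words can be found quickly can be sketched with an inverted index. This is a minimal illustration, not how any engine named in these slides is implemented; the page names and texts are made up for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
            if word:
                index[word].add(name)
    return index

# Hypothetical documents, used only for illustration.
docs = {
    "a.html": "Search engines index text.",
    "b.html": "Spiders download web pages.",
    "c.html": "Engines follow links between pages.",
}

index = build_index(docs)
print(sorted(index["engines"]))  # ['a.html', 'c.html']
print(sorted(index["pages"]))    # ['b.html', 'c.html']
```

A lookup is then a single dictionary access instead of a scan over every document.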

  3. Web Search Engine Examples • AltaVista (www.altavista.com) • lycos.com • Excite.com • yahoo.com (actually indexed/filed by humans) • google.com • dogpile.com (meta-search engine)

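  4. Search Engines • How they work • [Diagram] Components, from the web to the user: Web pages → Gatherer (spider) → Raw database → Indexer → Indexed database → Searcher → Evaluator → Filtered database → User interface → User browser or meta-search engine • Queries enter through the user interface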

  5. Meta Search Engines • How they work • [Diagram] Components, from the web to the user: Web pages → Search engines (several) → Raw responses → Searcher → Evaluator → Filtered database → User interface → User browser
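
A meta-search engine has to combine the raw responses from several engines into one list. The slides do not say how the evaluator does this; the sketch below swaps in a simple reciprocal-rank sum as one possible approach. The two engine functions and their results are hypothetical stand-ins for real engines.

```python
def engine_a(query):
    # Hypothetical engine: returns URLs ranked best-first.
    return ["x.html", "y.html"]

def engine_b(query):
    # Second hypothetical engine with a different ranking.
    return ["y.html", "z.html"]

def meta_search(query, engines):
    """Merge ranked lists: each URL scores 1/rank per engine that returns it."""
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = meta_search("wine", [engine_a, engine_b])
print(merged)  # ['y.html', 'x.html', 'z.html']
```

Pages returned by more than one engine rise to the top, and duplicates collapse into a single entry.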

  6. Key Word Search • [Diagram] User browser, User interface, Filtered database, Evaluator, Searcher, Indexed database • Search options: match all words (AND); match any words (OR); match whole phrase • Evaluator sort criteria: key word frequency; key word in title; meta tag; number of key words; proximity to start; number of pages that reference this page
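
The three search options and two of the sort criteria (key word frequency and key word in title) can be sketched as follows. The weights in `score` and the sample pages are assumptions chosen for illustration; the slides do not give actual weights.

```python
def tokens(text):
    """Lowercase a text and split it into words on whitespace."""
    return text.lower().split()

def match_all(query, words):
    """AND: every query word appears."""
    return all(q in words for q in query)

def match_any(query, words):
    """OR: at least one query word appears."""
    return any(q in words for q in query)

def match_phrase(query, words):
    """Whole phrase: the query words appear consecutively, in order."""
    n = len(query)
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))

def score(query, title, body):
    # Hypothetical weighting: 1 point per occurrence in the body,
    # 3 points for each query word that also appears in the title.
    body_words, title_words = tokens(body), tokens(title)
    frequency = sum(body_words.count(q) for q in query)
    title_hits = sum(1 for q in query if q in title_words)
    return frequency + 3 * title_hits

# Made-up pages for the example.
pages = {
    "wine.html": {"title": "Wine guide", "body": "red wine and white wine from france"},
    "beer.html": {"title": "Brewing", "body": "beer and a little wine"},
}

query = tokens("red wine")
all_hits = [n for n, p in pages.items() if match_all(query, tokens(p["body"]))]
any_hits = [n for n, p in pages.items() if match_any(query, tokens(p["body"]))]
phrase_hits = [n for n, p in pages.items() if match_phrase(query, tokens(p["body"]))]
ranked = sorted(pages, key=lambda n: -score(["wine"], pages[n]["title"], pages[n]["body"]))

print(all_hits)     # ['wine.html']
print(any_hits)     # ['wine.html', 'beer.html']
print(phrase_hits)  # ['wine.html']
print(ranked)       # ['wine.html', 'beer.html']
```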

  7. Web Search Engines • Search engines usually work via “spiders” • A spider visits a site and downloads a web page. It indexes that page, then looks for any pages that are linked from it, and repeats the process • The <meta> tags (keywords, description) are used in indexing
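
What a spider pulls out of one downloaded page can be sketched with Python's standard `html.parser` module: the `<meta>` keywords and description, plus the links to follow next. The `PageParser` class name and the sample HTML are assumptions for this example; a real spider would also fetch the page over the network.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect <meta> keywords/description and <a href> links from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") in ("keywords", "description"):
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

# Made-up page content for illustration.
html = (
    '<html><head>'
    '<meta name="keywords" content="wine, beer">'
    '<meta name="description" content="Drinks page">'
    '</head><body><a href="linked1.html">one</a></body></html>'
)

parser = PageParser()
parser.feed(html)
print(parser.meta)   # {'keywords': 'wine, beer', 'description': 'Drinks page'}
print(parser.links)  # ['linked1.html']
```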

  8. Web Search Strategies • Each page is downloaded, indexed, and any links followed • [Diagram] A tree of links rooted at myPage.html, with linked1.html through linked7.html numbered 1-7 in visiting order, comparing Depth-First Search with Breadth-First Search
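
The difference between the two strategies is only the order in which waiting pages are taken from the frontier. The sketch below walks an in-memory link table instead of the real web; the link structure is an assumption, since the original diagram's exact layout is not recoverable from the transcript.

```python
from collections import deque

def crawl(start, links, breadth_first=True):
    """Return pages in visiting order for a breadth-first or depth-first crawl."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        # Breadth-first takes the oldest waiting page, depth-first the newest.
        page = frontier.popleft() if breadth_first else frontier.pop()
        order.append(page)
        children = links.get(page, [])
        if not breadth_first:
            children = reversed(children)  # so the first link is explored first
        for child in children:
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return order

# Hypothetical link table: page -> pages it links to.
links = {
    "myPage.html": ["linked1.html", "linked2.html"],
    "linked1.html": ["linked3.html"],
    "linked2.html": ["linked4.html"],
}

bfs = crawl("myPage.html", links, breadth_first=True)
dfs = crawl("myPage.html", links, breadth_first=False)
print(bfs)  # ['myPage.html', 'linked1.html', 'linked2.html', 'linked3.html', 'linked4.html']
print(dfs)  # ['myPage.html', 'linked1.html', 'linked3.html', 'linked2.html', 'linked4.html']
```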

  9. Web Search Engines • If web pages are not linked, you may get independent “trees” that are not indexed • [Diagram] Separate trees of pages: wine.html, laFite.html, beer.html, Oly.html, Ranier.html
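
Why an unlinked tree is never indexed follows from reachability: a spider only finds pages it can reach by following links from its starting point. The sketch below assumes that wine.html links to laFite.html and beer.html links to Oly.html and Ranier.html; the transcript lists these page names but not which page links to which.

```python
def reachable(start, links):
    """Return the set of pages a spider can reach from `start` by following links."""
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(links.get(page, []))
    return seen

# Assumed link structure: two trees with no link between them.
links = {
    "wine.html": ["laFite.html"],
    "beer.html": ["Oly.html", "Ranier.html"],
}

all_pages = set(links) | {p for targets in links.values() for p in targets}
indexed = reachable("wine.html", links)
missed = sorted(all_pages - indexed)
print(sorted(indexed))  # ['laFite.html', 'wine.html']
print(missed)           # ['Oly.html', 'Ranier.html', 'beer.html']
```

A spider that starts at wine.html never sees the beer.html tree unless some page links to it or it is submitted separately.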

  10. Percentage Indexed? • Maybe 10-15% of web pages are indexed • Hidden web: not freely accessible to the public • Some sites disallow spiders/bots because of: • Load issues • Content ownership issues • Site traffic pattern issues
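
One common way for a site to disallow spiders is a robots.txt file, which well-behaved bots check before fetching a page. The slides do not name this mechanism, so treat it as one illustrative option. The sketch uses Python's standard `urllib.robotparser`; the rules, the bot name `ExampleSpider`, and the example.com URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep all bots out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(robots_txt.splitlines())  # parse from text rather than fetching a URL

blocked = rules.can_fetch("ExampleSpider", "http://example.com/private/report.html")
allowed = rules.can_fetch("ExampleSpider", "http://example.com/index.html")
print(blocked)  # False
print(allowed)  # True
```

Note that robots.txt is a request, not an enforcement mechanism: a bot that ignores it can only be stopped by the server itself.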

  11. Getting Indexed • You can submit a page to a search engine to be indexed; see the search engine's instructions, usually a link off its main page • http://www.google.com/intl/en/about.html • You can see the spiders' visits in the logs of the server hosting the web pages • They seem to visit about once a week • You can configure your server to refuse bots • Search engines can be out of date!

  12. Search for "search engine submission" • http://www.wpromote.com/ - $50/mo submission service • www.addme.com/ - free and paid submission • Advertised as: "With this service your site will be optimized, submitted, and monitored to achieve top ranking in the most popular search engines: 20 guaranteed top 20 ranking." • http://www.spider-food.net/

  13. Examples of Special Search Sites • Thomas.loc.gov • Bills, congressional record, reports • uscode.house.gov • www.leginfo.ca.gov • www.fas.org • www.janes.com

  14. Media on the Net • Papers: • www.washingtonpost.com, nytimes.com, mercurynews.com, latimes.com • Networks: • cnn.com, abcnews.com, nbc.com, cbs.com

  15. English & non-English • Just about everything on the web is in English • You can get a (very!) approximate translation with Babel Fish at AltaVista • Sometimes very entertaining

  16. Maps • Driving maps at yahoo.com • Satellite photos at www.terraserver.microsoft.com • USGS & Russian, can go to 1 m resolution; 1 terabyte of data • (They did it because it was big. All transactions on the NYSE in history are 0.5 terabytes)

  17. Summary • Search engines use “spiders” to index text and build local databases • Indexing of your material can be controlled by HTML <META> tags, word placement in the text, and page submission • Meta search engines, search engines, portals, subject matter pages: in order from less to more special • Beware of commercial motives; most of the Web is not indexed • Other services: news, stock, *.NET initiative, map, weather, etc.
