Web indexing

ICE0534 – Web-based Software Development
July 21, 2005
Seonah Lee

Contents
  • News related to Web Indexing
  • Web Indexing?
  • Web Indexing: Styles
  • Web Indexing: Tools
  • Web Indexing in Search Engine
  • Web Indexing in Google
  • Summary
  • References
  • Question
  • News related to Web Indexing
  • “Google tests tool to aid Web indexing,” by Dawn Kawamoto, CNET News.com, Monday, June 6, 2005, 12:00 AM
  • Web Indexing?
  • Creating indexes for:
  • individual web sites
  • intranets
  • collections of HTML documents
  • collections of web sites
  • Purpose:
  • helping users find information through a variety of keywords, and grouping similar information together
  • Web Indexing?
  • Indexes
  • systematically arranged items
  • entry points to go directly to desired information within a larger document or set of documents
  • Indexing
  • an analytic process of determining which concepts are worth indexing, what entry labels to use, and how to arrange the entries.
  • Web Indexing: Styles (1/2)
  • Back-of-the-Book Style Web Indexing
  • Includes “A-Z indexes” to websites or intranets
  • Some web indexes take the form of a list of hierarchical categories arranged in alphabetical order
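As a rough illustration of the back-of-the-book style, the sketch below groups index entries under their first letter and sorts them alphabetically. The function name `build_az_index`, the entry labels, and the URLs are all hypothetical and chosen only for this example; a real A-Z index also involves the human judgment about which concepts to index that the earlier slides describe.

```python
from collections import defaultdict


def build_az_index(entries):
    """Group (label, url) pairs under the first letter of each label.

    Assumes every label is a non-empty string. Labels that do not start
    with a letter are collected under "#".
    """
    groups = defaultdict(list)
    for label, url in entries:
        key = label[0].upper() if label[0].isalpha() else "#"
        groups[key].append((label, url))
    # Sort the letters, and sort the entries under each letter case-insensitively.
    return {
        letter: sorted(items, key=lambda item: item[0].lower())
        for letter, items in sorted(groups.items())
    }


# Hypothetical entries for a small site.
entries = [
    ("PageRank", "/theory/pagerank.html"),
    ("metadata", "/styles/metadata.html"),
    ("HITS", "/theory/hits.html"),
    ("hubs", "/theory/hits.html#hubs"),
]
az_index = build_az_index(entries)
for letter, items in az_index.items():
    print(letter)
    for label, url in items:
        print(f"  {label} -> {url}")
```

Each letter then acts as an entry point that takes the reader directly to the relevant page.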
  • Web Indexing: Styles (2/2)
  • Metadata and Web Indexing
  • assigning keywords or phrases to web pages or web sites within a meta-tag field
  • so that the web page or web site can be retrieved with a search engine that is customized to search the keywords field.
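The metadata approach can be sketched with Python's standard `html.parser` module: read the `keywords` meta tag of each page and map every keyword to the pages that declare it. The class and function names, the sample pages, and the comma-separated keyword format are assumptions made for this illustration, not something the slides specify.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated values of <meta name="keywords" ...> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def extract_keywords(html):
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


def build_keyword_index(pages):
    """Map each keyword to the set of page URLs whose meta tag lists it."""
    index = {}
    for url, html in pages.items():
        for keyword in extract_keywords(html):
            index.setdefault(keyword, set()).add(url)
    return index


# Hypothetical pages, inlined so the sketch runs without network access.
pages = {
    "/a.html": '<html><head><meta name="keywords" content="Indexing, PageRank"></head></html>',
    "/b.html": '<head><meta name="Keywords" content="pagerank, HITS"/></head>',
}
keyword_index = build_keyword_index(pages)
print(sorted(keyword_index["pagerank"]))
```

A search engine customized to the keywords field would then answer a query by looking it up in `keyword_index` rather than scanning the full text.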
  • Web Indexing: Tools
  • Web Indexing: The Most Famous Tool
  • HTML Indexer, by Brown Inc.
  • http://www.html-indexer.com/index.html
  • Web Indexing in Search Engine
  • Phases of a Web search engine's work
  • Document gathering
  • Document indexing
  • Searching in response to a query
  • Visualization of search results
  • [Figure: search engine pipeline. Labels: The Web, Gathering, Parse, Indexing, Query, Rank or Match, Visualization]
  • Web Indexing in Search Engine
  • Almost every Web search engine uses a slightly different technique
  • Parsing discards some HTML markup
  • Some give different weights to terms in different HTML fields
  • Some do not index the full text of the document, but only part of it
  • Some make full use of “metadata”
  • Very few make use of the information provided by links: HITS and PageRank (Google)
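The parsing and field-weighting ideas above can be sketched as a small inverted index. The weights in `FIELD_WEIGHTS`, the tokenizer, and the sample pages are assumptions picked for illustration; actual search engines differ in all of these choices, which is the point the slide makes.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical weights: a term in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "h1": 2.0}
DEFAULT_WEIGHT = 1.0


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldTextParser(HTMLParser):
    """Discards markup and accumulates a weight per term, depending on the enclosing field."""

    def __init__(self):
        super().__init__()
        self.open_fields = []
        self.term_weights = defaultdict(float)

    def handle_starttag(self, tag, attrs):
        if tag in FIELD_WEIGHTS:
            self.open_fields.append(tag)

    def handle_endtag(self, tag):
        if self.open_fields and self.open_fields[-1] == tag:
            self.open_fields.pop()

    def handle_data(self, data):
        weight = FIELD_WEIGHTS[self.open_fields[-1]] if self.open_fields else DEFAULT_WEIGHT
        for term in tokenize(data):
            self.term_weights[term] += weight


def build_inverted_index(pages):
    """Map each term to {url: accumulated weight}."""
    index = defaultdict(dict)
    for url, html in pages.items():
        parser = FieldTextParser()
        parser.feed(html)
        for term, weight in parser.term_weights.items():
            index[term][url] = weight
    return index


def search(index, query):
    """Rank pages by the summed weights of the query terms they contain."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical documents standing in for the gathering phase.
pages = {
    "/pagerank.html": "<html><head><title>PageRank</title></head>"
                      "<body><p>Links decide rank.</p></body></html>",
    "/hits.html": "<html><head><title>HITS</title></head>"
                  "<body><p>Hubs and authorities; compare with PageRank.</p></body></html>",
}
inverted_index = build_inverted_index(pages)
print(search(inverted_index, "pagerank"))
```

Both pages contain the query term, but the page that has it in its title ranks first because of the higher field weight.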
  • Web Indexing in Google
  • PageRank
  • Google assigns a number called the PageRank to every web page that it knows about.
  • Assumption: A page is important if other important web pages link to it
  • Each Page = Node
  • Directed Edge = a link from one to the other
  • [Figure: example link graph with nodes Main Page, Google, This Page, Yahoo]
  • Web Indexing in Google
  • PageRank: Example
  • Assumption: an average page has a PageRank of 1
  • [Figure: three-page link graph with R1: 1.2, R2: 0.6, R3: 1.2]
  • R1 = R3
  • R2 = R1 / 2
  • R3 = R1 / 2 + R2
  • R1 = 2 R2
  • R3 = R1
  • R1 + R2 + R3 = 3 (three pages with an average PageRank of 1)
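The balance equations above can be checked numerically. The sketch below assumes the link structure that those equations imply (page 1 links to pages 2 and 3, page 2 links to page 3, page 3 links to page 1), since the figure itself did not survive. It uses the simplified model of this example, where each page splits its rank evenly among its outgoing links and there is no damping factor; the published PageRank formula adds one. The names `pagerank_no_damping`, `page1`, `page2`, and `page3` are hypothetical.

```python
def pagerank_no_damping(links, iterations=100):
    """Repeatedly redistribute rank along links, starting from rank 1 per page.

    Assumes every page has at least one outgoing link, so the total rank
    stays equal to the number of pages.
    """
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_ranks = {page: 0.0 for page in links}
        for page, targets in links.items():
            share = ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Link structure inferred from the equations R1 = R3, R2 = R1/2, R3 = R1/2 + R2.
links = {
    "page1": ["page2", "page3"],
    "page2": ["page3"],
    "page3": ["page1"],
}
ranks = pagerank_no_damping(links)
print({page: round(rank, 3) for page, rank in ranks.items()})
```

The iteration settles on approximately 1.2, 0.6, and 1.2, which matches solving the equations by hand: substituting R3 = R1 and R2 = R1 / 2 into R1 + R2 + R3 = 3 gives 2.5 R1 = 3, so R1 = 1.2.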
  • Web Indexing in Google
  • HITS (Hyperlink-Induced Topic Search)
  • Divides pages relating to a topic into two groups
  • Authorities: pages with good content about a topic
  • Hubs: pages that link to many authority pages on a topic (directory)
  • Iteratively calculate hub and authority scores for each page in the neighborhood and rank results accordingly
  • A document that many pages point to is a good authority
  • A document that points to many authorities is a good hub; pointing to many good authorities makes for an even better hub
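The mutual reinforcement between hubs and authorities can be sketched as the iteration below. The graph, the page names, the fixed iteration count, and the Euclidean normalization are assumptions for illustration; the step of selecting the topic neighborhood for a query is left out.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length so they do not grow without bound."""
    norm = math.sqrt(sum(value * value for value in scores.values())) or 1.0
    return {page: value / norm for page, value in scores.items()}


def hits(links, iterations=50):
    """Return (hub_scores, authority_scores) for a dict of page -> linked pages."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    hubs = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        authorities = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                authorities[target] += hubs[source]
        authorities = normalize(authorities)
        # Hub score: sum of the authority scores of the pages it points to.
        hubs = normalize(
            {page: sum(authorities[target] for target in links.get(page, [])) for page in pages}
        )
    return hubs, authorities


# Hypothetical neighborhood: two linking pages and three content pages.
links = {
    "directory": ["paper_a", "paper_b", "paper_c"],
    "blog": ["paper_a", "paper_b"],
}
hub_scores, authority_scores = hits(links)
print(max(hub_scores, key=hub_scores.get))
```

In this small graph the directory page comes out as the best hub because it points to every authority, and the pages that both hubs point to score higher as authorities than the page only the directory links to.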
  • Summary
  • Web Indexing
  • Web Indexing Styles
  • Back-of-the-Book Style Web Indexing
  • Metadata and Web Indexing
  • Web Indexing Techniques in Google
  • HITS
  • PageRank
  • References
  • News
  • http://news.com.com/2100-1032_3-5730744.html
  • Definition
  • http://www.marisol.com/websiteindexing.html
  • http://taxonomist.tripod.com/indexing/paperless.html
  • http://en.wikipedia.org/wiki/Web_indexing
  • Tools
  • www.stcsig.org/idx/articles/webindexing.pdf
  • Theory
  • http://amath.colorado.edu/outreach/demos/hshi/2001Sum/pagerank.html
  • http://www.cis.strath.ac.uk/~fabioc/04-mia/lects/11.pdf
  • Question?