Chapter 19
Web Crawler
Chapter Objectives
• Provide a case study example from problem statement through implementation
• Demonstrate how hash tables and graphs can be used to solve a problem
Copyright © 2005 Pearson Addison-Wesley. All rights reserved.
Web Crawler
• A web crawler is a system that searches the web, beginning with a user-designated web page, looking for a designated target string
• A web crawler follows all of the links on each page that it encounters until there are no more pages or until it reaches a designated limit
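To follow links, a crawler first has to find them in the text of a page. The following is a minimal sketch of one way to do that, not the chapter's implementation: it assumes links appear as double-quoted `href` attributes and pulls them out with a regular expression. The class name `LinkExtractorSketch` and the method `extractLinks` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractorSketch {
    // Assumption: links are written as href="..." with double quotes.
    // Single-quoted or unquoted attributes are not handled by this sketch.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href values found in the page text, in order of appearance. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the text between the quotes
        }
        return links;
    }
}
```

A regular expression is enough for a sketch like this; a crawler meant for real pages would more likely rely on an HTML parser, since markup in the wild is rarely this regular.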
Web Crawler
• For this case study, we will create a graphical web crawler with the following requirements:
– Enter a designated starting web page
– Enter a target string for which to search
– Limit the search to 50 pages
– Display the results when done
Web Crawler - Design
• Our web crawler system consists of
three high-level components:
– The driver
– The graphical user interface
– The web crawler implementation
• The implementation makes use of graphs and hash tables
Web Crawler - Design
• The algorithm for the web crawler is as follows:
– Add the starting page to a HashSet of pages to be searched and to our graph
– Remove a page from the set of pages to be searched
– Search the page for the target string
• If the string exists, add the page to the list of results
– Search the page for links
• If links have not already been searched, add them to
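The steps above can be sketched in Java roughly as follows. This is an illustrative sketch under stated assumptions, not the chapter's code: pages come from an in-memory `Map` rather than the network, the graph is kept as an adjacency map, and, because the last step is cut off in the source, the sketch assumes that unsearched links are added both to the set of pages to be searched and to the graph. The names `CrawlSketch`, `Page`, and `crawl` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrawlSketch {
    /** Hypothetical stand-in for a fetched page: its text and outgoing links. */
    static class Page {
        final String text;
        final List<String> links;
        Page(String text, List<String> links) { this.text = text; this.links = links; }
    }

    // Graph as an adjacency map: each page maps to the pages it links to.
    final Map<String, Set<String>> graph = new HashMap<>();
    // Pages whose text contains the target string.
    final List<String> results = new ArrayList<>();

    void crawl(Map<String, Page> web, String start, String target, int limit) {
        Set<String> toSearch = new HashSet<>(); // pages waiting to be searched
        Set<String> seen = new HashSet<>();     // pages ever added to toSearch
        toSearch.add(start);
        seen.add(start);
        graph.put(start, new HashSet<>());
        int searched = 0;

        while (!toSearch.isEmpty() && searched < limit) {
            // Remove an arbitrary page from the set (a HashSet has no defined order).
            Iterator<String> it = toSearch.iterator();
            String url = it.next();
            it.remove();
            searched++;

            Page page = web.get(url);
            if (page == null) continue;          // page could not be loaded
            if (page.text.contains(target)) results.add(url);

            for (String link : page.links) {
                graph.get(url).add(link);        // record the edge url -> link
                if (seen.add(link)) {            // true only the first time a link is seen
                    toSearch.add(link);
                    graph.put(link, new HashSet<>());
                }
            }
        }
    }
}
```

Calling `crawl(web, start, target, 50)` would match the 50-page limit stated in the requirements. Because a `HashSet` iterates in no particular order, the order in which pages are visited, and therefore the order of `results`, is not fixed.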
FIGURE 19.1 User interface design
FIGURE 19.2 UML description