Geographically Focused Collaborative Crawling

  • Weizheng Gao, Genieknows.com, Canada
  • Hyun Chul Lee, University of Toronto, Canada
  • Yingbo Miao, Genieknows.com, Canada

Track: Web Engineering

A collaborative crawler is a group of crawling nodes in which each node is responsible for a specific portion of the web. We study the problem of collecting geographically aware pages using collaborative crawling strategies. We first propose several collaborative crawling strategies for geographically focused crawling, whose goal is to collect web pages about specified geographic locations, by considering features such as the URL of a page, the content of a page, and the extended anchor text of a link, among others. We then propose various evaluation criteria to assess the performance of such crawling strategies. Finally, we study our crawling strategies experimentally by crawling real web data, showing that some of them greatly outperform simple URL-hash based partitioning, in which crawling assignments are determined by computing a hash value over URLs. More precisely, the URL of a page and the extended anchor text of a link are shown to yield the best overall performance for geographically focused crawling.
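To make the contrast in the abstract concrete, the following is a minimal sketch of the two assignment styles it mentions: the URL-hash baseline, and an assignment driven by geographic cues in the URL and anchor text. It is an illustration under stated assumptions, not the authors' method. The node count, the `CITY_TO_NODE` table, the choice of MD5, and the function names `hash_node` and `geo_node` are all hypothetical.

```python
import hashlib
import re
from urllib.parse import urlsplit

NUM_NODES = 4  # assumed number of crawling nodes

# Hypothetical table: which node is responsible for which city.
CITY_TO_NODE = {"toronto": 0, "edinburgh": 1, "boston": 2}


def hash_node(url, num_nodes=NUM_NODES):
    """Baseline: pick a node from a hash of the URL, ignoring geography."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def geo_node(url, anchor_text="", num_nodes=NUM_NODES):
    """Pick a node from city names found in the URL or anchor text.

    Falls back to the hash baseline when no known city name appears.
    """
    parts = urlsplit(url.lower())
    text = f"{parts.netloc} {parts.path} {anchor_text.lower()}"
    tokens = set(re.split(r"[^a-z]+", text))
    for city, node in CITY_TO_NODE.items():
        if city in tokens:
            return node
    return hash_node(url, num_nodes)
```

With this sketch, a URL such as `http://www.toronto.ca/parks` is routed to the node assumed to own Toronto, while a URL with no recognizable city name is routed exactly as the hash baseline would route it. A real system would need a proper gazetteer and disambiguation of place names, which this example leaves out.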

Citation

Gao, W., Lee, H. C., and Miao, Y. 2006. Geographically focused collaborative crawling. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006). WWW '06. ACM Press, New York, NY, 287-296.
DOI= http://doi.acm.org/10.1145/1135777.1135822
