Focused Crawling: Experiences in a Real World Project
In this paper, we describe our experience building a focused web crawler, that is, a web crawler that retrieves only pages about a given topic. We review some of the problems encountered, roughly dividing them into practical or engineering issues (related to the lack of standards and control on the web, and not addressed in most research) and conceptual issues (related to the task at hand, determining whether a certain page is about a given topic, on which considerable research has been done). We then give an overview of the system we designed and built, and provide some preliminary evidence of its performance. We conclude with some observations and suggestions for further research.
Badia, A., Muezzinoglu, T., and Nasraoui, O. 2006. Focused crawling: experiences in a real world project. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006). WWW '06. ACM Press, New York, NY, 1043-1044.