Focused Crawling: Experiences in a Real World Project

  • Antonio Badia, University of Louisville, USA
  • Tulay Muezzinoglu, University of Louisville, USA
  • Olfa Nasraoui, University of Louisville, USA

Track: Posters

In this paper, we describe our experience building a focused web crawler, that is, a web crawler that retrieves only pages about a given topic. We review some of the problems encountered, roughly dividing them into practical or engineering issues (related to the lack of standards and control on the web, and not addressed in most research) and conceptual issues (related to the task at hand, namely determining whether a certain page is about a given topic, on which considerable research has been done). We then give an overview of the system we designed and built, and provide some preliminary evidence of its performance. We conclude with some observations and suggestions for further research.
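
As a rough illustration of what "focused" means here, the sketch below runs a best-first crawl over a small in-memory link graph and keeps only pages that a toy keyword scorer judges on-topic. It is not the system described in the paper: the names `relevance` and `focused_crawl`, the keyword-overlap score, the threshold, and the dictionary-based page store are all assumptions made for this example, and no network access is involved.

```python
import heapq


def relevance(text, topic_terms):
    """Toy scorer (assumption): fraction of topic terms found in the page text."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def focused_crawl(pages, links, seeds, topic_terms, threshold=0.5, limit=10):
    """Best-first crawl over an in-memory graph.

    pages: dict mapping a page id to its text (stands in for fetching).
    links: dict mapping a page id to the ids it links to.
    Returns the ids of on-topic pages in the order they were visited.
    """
    # heapq is a min-heap, so priorities are stored as negated scores.
    frontier = [(-1.0, seed) for seed in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    kept = []
    while frontier and len(kept) < limit:
        _, url = heapq.heappop(frontier)
        score = relevance(pages.get(url, ""), topic_terms)
        if score < threshold:
            continue  # off-topic page: neither kept nor expanded
        kept.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # Outlinks inherit the parent's score as their priority.
                heapq.heappush(frontier, (-score, target))
    return kept
```

In this sketch, a page reachable only through an off-topic page is never visited. That is one possible design choice among several; a real crawler might instead follow a few off-topic hops before giving up on a path.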

Citation

Badia, A., Muezzinoglu, T., and Nasraoui, O. 2006. Focused crawling: experiences in a real world project. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006). WWW '06. ACM Press, New York, NY, 1043-1044.
DOI= http://doi.acm.org/10.1145/1135777.1136006
