Web Crawling Model and Architecture

Vol-4 | Issue-02 | February 2019 | Published Online: 20 February 2019
Author(s)
Dr. Anish Gupta 1; Dr. K. B. Singh 2; Dr. R. K. Singh 3

1Department of Information Technology, B. R. Ambedkar Bihar University, Muzaffarpur

2PG Department of Physics, Samastipur College, Samastipur, LNMU, Darbhanga

3Department of EC and IT, MIT Muzaffarpur, Bihar

Abstract

Web crawlers [1] must concurrently contend with multiple objectives that can even contradict each other. They must re-visit known pages to keep fresh copies while simultaneously discovering new pages [3]. They must make use of limited infrastructure, such as network connectivity, without overwhelming web servers. They ought to fetch many "good" pages, yet they cannot specify in advance exactly which pages are the preferred ones.
This paper explores a paradigm that directly combines crawling with an existing search engine and, using customizable criteria, retrieves potential answers in order to cope with these conflicting objectives [2]. We explain how this model generalises several individual cases, and present a crawler software architecture that implements the model.
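As a purely illustrative sketch (not taken from the paper), the idea of a crawl loop driven by customizable criteria can be expressed as a priority-queue frontier where a caller-supplied `priority` function arbitrates between conflicting objectives such as re-visiting for freshness and discovering new pages. All names here (`crawl`, `fetch`, `priority`) are hypothetical:

```python
import heapq
import time


def crawl(seeds, fetch, priority, max_pages=100):
    """Generic crawl loop: a caller-supplied priority() function decides
    which URL to fetch next, so the same loop can serve different (even
    conflicting) objectives by swapping the criterion.
    This is an illustrative sketch, not the paper's actual architecture."""
    # Max-heap via negated priorities; ties break on the URL string.
    frontier = [(-priority(url), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    store = {}  # url -> (fetch_time, outgoing links)
    while frontier and len(store) < max_pages:
        _, url = heapq.heappop(frontier)
        links = fetch(url)          # caller-supplied fetcher
        store[url] = (time.time(), links)
        for link in links:
            if link not in seen:    # new page discovered: schedule it
                seen.add(link)
                heapq.heappush(frontier, (-priority(link), link))
    return store
```

A freshness-oriented crawler would supply a `priority` that scores known pages by time since last fetch, while a discovery-oriented one would favour never-seen URLs; both reuse the same loop.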

Keywords
web crawler, network, search engine, software architecture.