Interface | Description |
---|---|
MultiPageModel | Extracts an object that spans more than one page, such as a news story or an article. |
SpiderListener | Listener for Spider page-processing events. |
Task | Interface for identifying different tasks. |
Class | Description |
---|---|
Page | Object storing the extracted results and the URLs to fetch. Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results for use in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Request | Object containing the URL to crawl, along with some additional information. |
ResultItems | Object containing the extracted results. It is held by a Page and processed in the pipeline. |
SimpleHttpClient | |
Site | Object containing the settings for a crawler. |
Spider | Entry point of a crawler. A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Each module is a field of Spider. |
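
The way these classes cooperate can be illustrated with a small, self-contained sketch. It does not use the library itself: PageSketch, the three nested interfaces, crawl() and demo() are hypothetical stand-ins written for this example, and only the method names on PageSketch (getUrl, getHtml, putField, getResultItems, addTargetRequest, addTargetRequests) are taken from the table above. The sketch also makes three simplifying assumptions: page content is a plain String, result items are a plain Map, and the scheduler is a FIFO queue that skips URLs it has already seen.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for illustration only; these are NOT the library's classes.
final class CrawlFlowSketch {

    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Stand-in for Page: stores extracted results and the URLs to fetch next. */
    static final class PageSketch {
        private final String url;
        private final String html; // assumption: content simplified to a String
        private final Map<String, Object> resultItems = new LinkedHashMap<>();
        private final List<String> targetRequests = new ArrayList<>();

        PageSketch(String url, String html) { this.url = url; this.html = html; }

        String getUrl() { return url; }
        String getHtml() { return html; }
        void putField(String key, Object value) { resultItems.put(key, value); }
        Map<String, Object> getResultItems() { return resultItems; }
        void addTargetRequest(String target) { targetRequests.add(target); }
        void addTargetRequests(List<String> targets) { targetRequests.addAll(targets); }
        List<String> getTargetRequests() { return targetRequests; }
    }

    // Stand-ins for three of the four modules; the Scheduler is the queue in crawl().
    interface Downloader { PageSketch download(String url); }
    interface PageProcessor { void process(PageSketch page); }
    interface Pipeline { void process(Map<String, Object> resultItems); }

    /** Single-threaded crawl loop: schedule, download, process, then hand results to the pipeline. */
    static void crawl(String startUrl, Downloader downloader,
                      PageProcessor processor, Pipeline pipeline) {
        Queue<String> scheduler = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        scheduler.add(startUrl);
        seen.add(startUrl);
        while (!scheduler.isEmpty()) {
            PageSketch page = downloader.download(scheduler.poll());
            if (page == null) {
                continue; // nothing was downloaded for this URL
            }
            processor.process(page);
            pipeline.process(page.getResultItems());
            for (String next : page.getTargetRequests()) {
                if (seen.add(next)) { // enqueue each URL at most once
                    scheduler.add(next);
                }
            }
        }
    }

    /** Crawls a three-page in-memory site and returns the titles the pipeline received. */
    static List<String> demo() {
        Map<String, String> site = new HashMap<>();
        site.put("http://example.com/", "<title>Home</title>"
                + "<a href=\"http://example.com/a\">A</a><a href=\"http://example.com/b\">B</a>");
        site.put("http://example.com/a", "<title>A</title><a href=\"http://example.com/\">home</a>");
        site.put("http://example.com/b", "<title>B</title>");

        List<String> titles = new ArrayList<>();
        crawl("http://example.com/",
                url -> site.containsKey(url) ? new PageSketch(url, site.get(url)) : null,
                page -> {
                    Matcher title = TITLE.matcher(page.getHtml());
                    if (title.find()) {
                        page.putField("title", title.group(1)); // save an extracted result
                    }
                    List<String> links = new ArrayList<>();
                    Matcher href = HREF.matcher(page.getHtml());
                    while (href.find()) {
                        links.add(href.group(1));
                    }
                    page.addTargetRequests(links); // add URLs to fetch
                },
                items -> titles.add((String) items.get("title")));
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Home, A, B]
    }
}
```

In the sketch, the processor only reads from and writes to the page object, while the loop decides what happens to the results and the queued URLs. That separation is the point of the example; the real modules may differ in their signatures and behavior.
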
Enum | Description |
---|---|
Spider.Status | |
Copyright © 2017. All rights reserved.