| Interface | Description |
|---|---|
| MultiPageModel | Extracts an object that spans more than one page, such as a news story or an article. |
| SpiderListener | Listener notified by Spider during page processing. |
| Task | Interface for identifying different tasks. |
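
MultiPageModel targets content such as an article split across several pages, where each crawled page yields only part of the final object. The sketch below is a hypothetical, self-contained illustration of that idea, not WebMagic's implementation. `ArticlePart` and `MultiPageAssembler` are invented names, and the sketch assumes each page already knows the ids of all pages in its article, in reading order. Parts are buffered under a shared page key and joined once the last page arrives.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one extracted page of a multi-page article.
// These names are illustrative; they are not WebMagic's MultiPageModel API.
class ArticlePart {
    final String pageKey;        // shared by every page of the same article
    final String page;           // identifies this particular page
    final List<String> allPages; // ids of all pages, in reading order (assumed known)
    final String text;           // content extracted from this page

    ArticlePart(String pageKey, String page, List<String> allPages, String text) {
        this.pageKey = pageKey;
        this.page = page;
        this.allPages = allPages;
        this.text = text;
    }
}

// Buffers parts per article and combines them once every page has arrived.
class MultiPageAssembler {
    private final Map<String, Map<String, ArticlePart>> pending = new HashMap<>();

    /** Returns the combined article text when all pages are present, otherwise null. */
    String accept(ArticlePart part) {
        Map<String, ArticlePart> parts =
                pending.computeIfAbsent(part.pageKey, k -> new HashMap<>());
        parts.put(part.page, part);
        if (!parts.keySet().containsAll(part.allPages)) {
            return null; // still waiting for other pages of this article
        }
        pending.remove(part.pageKey);
        StringBuilder combined = new StringBuilder();
        for (String id : part.allPages) {
            combined.append(parts.get(id).text); // join in reading order
        }
        return combined.toString();
    }
}
```

For example, receiving page `p2` before `p1` returns `null` first, then the joined text once `p1` arrives. Keying the buffer by article lets pages from different articles arrive interleaved.
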

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread-safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
| Request | Object containing a URL to crawl, along with some additional information. |
| ResultItems | Object containing extracted results. It is held by a Page and processed in the Pipeline. |
| SimpleHttpClient | |
| Site | Object containing settings for the crawler. |
| Spider | Entry point of a crawler. A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Every module is a field of Spider. |
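
In WebMagic itself, a crawler is usually started with something like `Spider.create(processor).addUrl(url).run()`, where `processor` is your PageProcessor. To show how Page, ResultItems and Spider's four modules cooperate without depending on the library, the following is a hypothetical, single-threaded sketch, not WebMagic's implementation. `MiniPage` and `MiniSpider` are invented names; the downloader is any function from URL to content, the scheduler is a FIFO queue with de-duplication, and requests are plain URL strings (WebMagic's Request can also carry extra information).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for Page: exposes the current URL and content,
// collects extracted fields (the "result items") and new URLs to fetch.
class MiniPage {
    private final String url;
    private final String html;
    private final Map<String, Object> resultItems = new LinkedHashMap<>();
    private final List<String> targetRequests = new ArrayList<>();

    MiniPage(String url, String html) {
        this.url = url;
        this.html = html;
    }

    String getUrl() { return url; }
    String getHtml() { return html; }
    void putField(String key, Object value) { resultItems.put(key, value); }
    Map<String, Object> getResultItems() { return resultItems; }
    void addTargetRequest(String targetUrl) { targetRequests.add(targetUrl); }
    List<String> getTargetRequests() { return targetRequests; }
}

// Hypothetical crawl loop holding the four modules as plain fields.
class MiniSpider {
    private final Function<String, String> downloader;           // URL -> content (null on failure)
    private final Deque<String> scheduler = new ArrayDeque<>();  // URLs waiting to be crawled
    private final Set<String> seen = new HashSet<>();            // de-duplication for the scheduler
    private final Consumer<MiniPage> pageProcessor;              // extracts fields and links
    private final Consumer<Map<String, Object>> pipeline;        // consumes extracted results

    MiniSpider(Function<String, String> downloader,
               Consumer<MiniPage> pageProcessor,
               Consumer<Map<String, Object>> pipeline) {
        this.downloader = downloader;
        this.pageProcessor = pageProcessor;
        this.pipeline = pipeline;
    }

    /** Schedules a URL unless it has been scheduled before. */
    MiniSpider addUrl(String url) {
        if (seen.add(url)) {
            scheduler.add(url);
        }
        return this;
    }

    /** Processes queued URLs in FIFO order until none remain. */
    void run() {
        while (!scheduler.isEmpty()) {
            String url = scheduler.poll();
            String html = downloader.apply(url);
            if (html == null) {
                continue; // treat a missing page as a failed download
            }
            MiniPage page = new MiniPage(url, html);
            pageProcessor.accept(page);
            for (String next : page.getTargetRequests()) {
                addUrl(next); // newly discovered links go back to the scheduler
            }
            if (!page.getResultItems().isEmpty()) {
                pipeline.accept(page.getResultItems());
            }
        }
    }
}
```

For instance, with a stubbed downloader that maps `/index` to a page listing `/a` and `/b`, running from `/index` processes all three URLs and the pipeline receives only the results extracted from `/a` and `/b`. Keeping each module as a separate field makes it easy to swap in a different downloader or pipeline.
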

| Enum | Description |
|---|---|
| Spider.Status | |

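
Spider.Status describes where a spider is in its lifecycle. A common way to keep such a status consistent when start and stop requests may come from different threads is an atomic compare-and-set transition. The sketch below assumes three states (initial, running, stopped); `CrawlStatus` and `CrawlLifecycle` are hypothetical names, not WebMagic's API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lifecycle states illustrating the idea behind Spider.Status.
enum CrawlStatus { INIT, RUNNING, STOPPED }

// Guards state transitions so concurrent start/stop calls cannot both succeed.
class CrawlLifecycle {
    private final AtomicReference<CrawlStatus> status =
            new AtomicReference<>(CrawlStatus.INIT);

    /** Moves INIT or STOPPED to RUNNING; returns false if already running. */
    boolean start() {
        return status.compareAndSet(CrawlStatus.INIT, CrawlStatus.RUNNING)
                || status.compareAndSet(CrawlStatus.STOPPED, CrawlStatus.RUNNING);
    }

    /** Moves RUNNING to STOPPED; returns false if the crawler was not running. */
    boolean stop() {
        return status.compareAndSet(CrawlStatus.RUNNING, CrawlStatus.STOPPED);
    }

    CrawlStatus get() {
        return status.get();
    }
}
```

Using compare-and-set means a second `start()` on a running crawler simply returns `false` instead of launching a duplicate run.
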
Copyright © 2017. All rights reserved.