| Class | Description |
|---|---|
| MultiPageModel | Extracts an object that spans more than one page, such as a news story or an article. |
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Request | Object containing a URL to crawl, along with some additional information. |
| ResultItems | Object containing extracted results. It is held by a `Page` and processed in the `Pipeline`. |
| Site | Object containing the settings for a crawler. |
| Spider | Entry point of a crawler. A spider contains four modules: `Downloader`, `Scheduler`, `PageProcessor`, and `Pipeline`. Each module is a field of `Spider`. |
| Spider.Status | |
| SpiderListener | Listener notified by a `Spider` as it processes pages. |
| Task | Interface for identifying different tasks. |
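
The four-module layout described for `Spider` can be illustrated with a small sketch. Every type below (`MiniSpider`, `MiniDownloader`, `MiniScheduler`, `MiniPageProcessor`, `MiniPipeline`, `MiniPage`, `DedupScheduler`) is a hypothetical stand-in written for this illustration, not WebMagic's actual API. The loop is single-threaded, and the downloader is whatever function you pass in, so it can read from an in-memory map instead of the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-ins for the four modules; NOT WebMagic's real interfaces.
interface MiniDownloader { String download(String url); }
interface MiniScheduler { void push(String url); String poll(); }
interface MiniPageProcessor { void process(MiniPage page); }
interface MiniPipeline { void process(Map<String, Object> results); }

// One page's content, the fields extracted from it, and newly found URLs.
class MiniPage {
    final String url;
    final String html;
    final Map<String, Object> fields = new HashMap<>();
    final List<String> targets = new ArrayList<>();
    MiniPage(String url, String html) { this.url = url; this.html = html; }
}

// FIFO scheduler that ignores URLs it has already queued once.
class DedupScheduler implements MiniScheduler {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    public void push(String url) { if (seen.add(url)) queue.add(url); }
    public String poll() { return queue.poll(); }
}

// Each module is a field; run() wires them into a single-threaded crawl loop.
class MiniSpider {
    final MiniDownloader downloader;
    final MiniScheduler scheduler;
    final MiniPageProcessor processor;
    final MiniPipeline pipeline;

    MiniSpider(MiniDownloader d, MiniScheduler s, MiniPageProcessor p, MiniPipeline pl) {
        downloader = d; scheduler = s; processor = p; pipeline = pl;
    }

    void run(String startUrl) {
        scheduler.push(startUrl);
        String url;
        while ((url = scheduler.poll()) != null) {
            MiniPage page = new MiniPage(url, downloader.download(url));
            processor.process(page);               // extract fields, collect links
            page.targets.forEach(scheduler::push);  // schedule discovered URLs
            pipeline.process(page.fields);          // hand results downstream
        }
    }
}
```

Because each module is only a field, replacing one (for example, a scheduler backed by a persistent queue) leaves the crawl loop itself unchanged.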

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |
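
The main methods listed for `Page` can be mirrored in a simplified, hypothetical class to show how one object carries both extracted results and follow-up URLs. `SketchPage` and its `getTargetRequests()` accessor are assumptions made for this sketch. It uses plain `String` and `Map` types rather than WebMagic's own types, and like the described `Page` it does no locking, so each instance should stay on one thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified page model; method names follow the list above.
// Not thread safe: plain HashMap/ArrayList with no synchronization.
class SketchPage {
    private final String url;
    private final String html;
    private final Map<String, Object> resultItems = new HashMap<>();
    private final List<String> targetRequests = new ArrayList<>();

    SketchPage(String url, String html) { this.url = url; this.html = html; }

    String getUrl() { return url; }    // URL of the current page
    String getHtml() { return html; }  // content of the current page

    // Save one extracted result under a key.
    void putField(String key, Object value) { resultItems.put(key, value); }

    // Read-only view of the results, as a pipeline would receive them.
    Map<String, Object> getResultItems() { return Collections.unmodifiableMap(resultItems); }

    // Queue one or several URLs to fetch next.
    void addTargetRequest(String target) { targetRequests.add(target); }
    void addTargetRequests(List<String> targets) { targetRequests.addAll(targets); }

    // Hypothetical accessor so a scheduler can pick up the queued URLs.
    List<String> getTargetRequests() { return Collections.unmodifiableList(targetRequests); }
}
```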

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Request | Object containing a URL to crawl, along with some additional information. |
| Site | Object containing the settings for a crawler. |
| Task | Interface for identifying different tasks. |
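
The idea of a request that carries "some additional information" besides its URL can be sketched as follows. `SketchRequest`, its extras map, and its priority field are hypothetical choices for this illustration; the table above does not say which extra data the real `Request` holds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical request model: a URL plus optional extra data.
// Field and method names are illustrative only.
class SketchRequest {
    private final String url;
    private final Map<String, Object> extras = new HashMap<>();
    private long priority;

    SketchRequest(String url) { this.url = url; }

    String getUrl() { return url; }

    // Attach arbitrary data (e.g. crawl depth) that travels with the URL.
    SketchRequest putExtra(String key, Object value) { extras.put(key, value); return this; }
    Object getExtra(String key) { return extras.get(key); }

    // A scheduler could use this to decide which request to fetch first.
    SketchRequest setPriority(long p) { priority = p; return this; }
    long getPriority() { return priority; }
}
```

Carrying data such as crawl depth on the request lets the code that processes the fetched page decide, for example, whether to keep following links.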

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Request | Object containing a URL to crawl, along with some additional information. |
| Task | Interface for identifying different tasks. |
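
One reason to identify tasks is so that a component shared by several crawls can keep separate state for each one. In the sketch below, both `SketchTask` (reduced to a single id method) and `PerTaskDedup` are hypothetical. The sketch keeps one "already seen" set per task id, so each task can visit the same URL once.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical task interface: only an identifier is modeled here.
interface SketchTask { String getId(); }

// Shared de-duplication that keeps a separate "seen" set per task id.
class PerTaskDedup {
    private final Map<String, Set<String>> seenByTask = new HashMap<>();

    // Returns true the first time the given task sees this URL.
    boolean firstVisit(SketchTask task, String url) {
        return seenByTask.computeIfAbsent(task.getId(), id -> new HashSet<>()).add(url);
    }
}
```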

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |
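
A settings object such as `Site` is typically read by the component that fetches pages. The sketch below shows that relationship with a hypothetical fluent settings class (`SketchSite`) and a retry loop (`RetryingFetcher`). The setting names and defaults are illustrative assumptions, not values taken from the documented `Site` class. The pause between requests is stored but not applied, so the sketch runs instantly.

```java
import java.util.function.Supplier;

// Hypothetical crawler settings in a fluent style; names and defaults are illustrative.
class SketchSite {
    private String userAgent = "sketch-crawler";
    private int retryTimes = 0;      // extra attempts after a failed first try
    private int sleepTimeMs = 1000;  // intended pause between requests (not applied here)
    private int timeoutMs = 5000;    // intended per-request timeout

    static SketchSite me() { return new SketchSite(); }

    SketchSite setUserAgent(String ua) { userAgent = ua; return this; }
    SketchSite setRetryTimes(int n) { retryTimes = n; return this; }
    SketchSite setSleepTime(int ms) { sleepTimeMs = ms; return this; }
    SketchSite setTimeOut(int ms) { timeoutMs = ms; return this; }

    String getUserAgent() { return userAgent; }
    int getSleepTime() { return sleepTimeMs; }
    int getTimeOut() { return timeoutMs; }

    // Total attempts allowed for one URL under these settings.
    int maxAttempts() { return 1 + retryTimes; }
}

class RetryingFetcher {
    // Calls attempt up to site.maxAttempts() times and returns the first
    // non-null body, or null if every attempt failed.
    static String fetch(SketchSite site, Supplier<String> attempt) {
        for (int i = 0; i < site.maxAttempts(); i++) {
            String body = attempt.get();
            if (body != null) return body;
        }
        return null;
    }
}
```

Keeping these values in one object means the fetching code does not hard-code them, and a single crawl can be re-tuned by changing only its settings.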

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Request | Object containing a URL to crawl, along with some additional information. |
| ResultItems | Object containing extracted results. It is held by a `Page` and processed in the `Pipeline`. |
| Site | Object containing the settings for a crawler. |
| Task | Interface for identifying different tasks. |
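
The hand-off from extracted results to the pipeline can be sketched as below. `SketchResultItems`, `SketchPipeline`, `PipelineRunner`, and the "skip" flag are hypothetical choices for this illustration. The runner passes the same results object to each pipeline in order, unless the processor has marked the page as having nothing worth storing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result holder: extracted fields plus a "skip" flag.
class SketchResultItems {
    private final Map<String, Object> fields = new LinkedHashMap<>();
    private boolean skip;

    SketchResultItems put(String key, Object value) { fields.put(key, value); return this; }
    Map<String, Object> getAll() { return fields; }
    void setSkip(boolean skip) { this.skip = skip; }
    boolean isSkip() { return skip; }
}

// Hypothetical pipeline stage (e.g. print, write to a file, store in a database).
interface SketchPipeline { void process(SketchResultItems items); }

class PipelineRunner {
    // Hands the items to every pipeline in order, unless they were marked as skipped.
    static void dispatch(SketchResultItems items, List<SketchPipeline> pipelines) {
        if (items.isSkip()) return;
        for (SketchPipeline p : pipelines) p.process(items);
    }
}
```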

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |
| Spider | Entry point of a crawler. A spider contains four modules: `Downloader`, `Scheduler`, `PageProcessor`, and `Pipeline`. Each module is a field of `Spider`. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| MultiPageModel | Extracts an object that spans more than one page, such as a news story or an article. |
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
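
Extracting one object from several pages requires buffering the fragments until all of them have arrived. The sketch below shows one way to do this for an article split across numbered pages. `ArticlePart` and `MultiPageAssembler` are hypothetical and are not the documented `MultiPageModel` interface. Fragments are keyed by article and joined in page order once every expected page number is present.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical fragment of an article that is split across several pages.
class ArticlePart {
    final String articleKey;     // identifies the whole article
    final int pageNo;            // page this fragment came from
    final Set<Integer> allPages; // every page number the article spans
    final String text;

    ArticlePart(String articleKey, int pageNo, Set<Integer> allPages, String text) {
        this.articleKey = articleKey;
        this.pageNo = pageNo;
        this.allPages = allPages;
        this.text = text;
    }
}

// Buffers fragments per article and emits the joined text once all pages arrived.
class MultiPageAssembler {
    private final Map<String, TreeMap<Integer, String>> buffers = new HashMap<>();

    // Returns the combined article when complete, otherwise null.
    String accept(ArticlePart part) {
        TreeMap<Integer, String> pages =
                buffers.computeIfAbsent(part.articleKey, k -> new TreeMap<>());
        pages.put(part.pageNo, part.text);
        if (!pages.keySet().containsAll(part.allPages)) return null;
        buffers.remove(part.articleKey);            // free the buffer once complete
        return String.join("\n", pages.values());   // TreeMap keeps page order
    }
}
```

Since pages may be fetched in any order, sorting fragments by page number rather than by arrival time keeps the combined text correct.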

| Class | Description |
|---|---|
| Request | Object containing a URL to crawl, along with some additional information. |
| Spider | Entry point of a crawler. A spider contains four modules: `Downloader`, `Scheduler`, `PageProcessor`, and `Pipeline`. Each module is a field of `Spider`. |
| SpiderListener | Listener notified by a `Spider` as it processes pages. |
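
A page-processing listener can be pictured as a callback invoked once per page, with the outcome. In the sketch below, `SketchSpiderListener` (with `onSuccess`/`onError` callbacks), `CountingListener`, and `ListenerLoop` are hypothetical. A failure on one page is reported to the listeners instead of stopping the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical listener: told the outcome of each processed page.
interface SketchSpiderListener {
    void onSuccess(String url);
    void onError(String url, Exception e);
}

// Counts outcomes, e.g. for a progress report or an error-rate alarm.
class CountingListener implements SketchSpiderListener {
    int successes;
    int errors;
    @Override public void onSuccess(String url) { successes++; }
    @Override public void onError(String url, Exception e) { errors++; }
}

class ListenerLoop {
    // Processes each URL, then notifies every listener of the result.
    static void crawl(List<String> urls, Consumer<String> processPage,
                      List<SketchSpiderListener> listeners) {
        for (String url : urls) {
            RuntimeException failure = null;
            try {
                processPage.accept(url);
            } catch (RuntimeException e) {
                failure = e; // record the error instead of aborting the crawl
            }
            for (SketchSpiderListener listener : listeners) {
                if (failure == null) listener.onSuccess(url);
                else listener.onError(url, failure);
            }
        }
    }
}
```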

| Class | Description |
|---|---|
| ResultItems | Object containing extracted results. It is held by a `Page` and processed in the `Pipeline`. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| ResultItems | Object containing extracted results. It is held by a `Page` and processed in the `Pipeline`. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Request | Object containing a URL to crawl, along with some additional information. |
| Site | Object containing the settings for a crawler. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Request | Object containing a URL to crawl, along with some additional information. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Request | Object containing a URL to crawl, along with some additional information. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Page | Object storing extracted results and URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results to be used in `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| Request | Object containing a URL to crawl, along with some additional information. |

Copyright © 2017. All rights reserved.