Class and Description |
---|
MultiPageModel
Extracts an object that spans more than one page, such as a news story or an article.
|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Request
Object containing the URL to crawl.
It also carries some additional information. |
ResultItems
Object containing the extracted results.
It is held by a Page and processed in a Pipeline. |
Site
Object containing the settings for a crawler.
|
Spider
Entry point of a crawler.
A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Every module is a field of Spider. |
Spider.Status |
SpiderListener
Listener of Spider on page processing.
|
Task
Interface for identifying different tasks.
|
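
The Page methods listed above can be illustrated with a small stand-in. This is a minimal sketch, not the WebMagic `Page` class: `PageSketch` and `getTargetRequests()` as used here are hypothetical, and return types are simplified to plain strings and collections (an assumption made so the example runs without the library). It only shows how a processor might read the page, save a field, and queue further URLs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the method names in the table above.
// It is NOT the real Page class; types are simplified for illustration.
class PageSketch {
    private final String url;
    private final String html;
    private final Map<String, Object> resultItems = new LinkedHashMap<>();
    private final List<String> targetRequests = new ArrayList<>();

    PageSketch(String url, String html) {
        this.url = url;
        this.html = html;
    }

    String getUrl() { return url; }                       // URL of the current page
    String getHtml() { return html; }                     // content of the current page
    void putField(String key, Object value) {             // save an extracted result
        resultItems.put(key, value);
    }
    Map<String, Object> getResultItems() { return resultItems; }
    void addTargetRequest(String requestUrl) {            // queue one URL to fetch
        targetRequests.add(requestUrl);
    }
    void addTargetRequests(List<String> requestUrls) {    // queue several URLs to fetch
        targetRequests.addAll(requestUrls);
    }
    List<String> getTargetRequests() { return targetRequests; }

    public static void main(String[] args) {
        PageSketch page = new PageSketch(
                "http://example.com/news/1",
                "<html><title>Hello</title></html>");

        // Extract the title with plain string operations (no HTML parser assumed).
        String html = page.getHtml();
        int start = html.indexOf("<title>") + "<title>".length();
        int end = html.indexOf("</title>");
        page.putField("title", html.substring(start, end));

        page.addTargetRequest("http://example.com/news/2");
        page.addTargetRequests(Arrays.asList(
                "http://example.com/news/3", "http://example.com/news/4"));

        System.out.println(page.getResultItems());            // {title=Hello}
        System.out.println(page.getTargetRequests().size());  // 3
    }
}
```

Because the object is documented as not thread safe, a sketch like this would create one instance per downloaded page rather than sharing it between threads.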
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Site
Object containing the settings for a crawler.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Request
Object containing the URL to crawl.
It also carries some additional information. |
Site
Object containing the settings for a crawler.
|
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Request
Object containing the URL to crawl.
It also carries some additional information. |
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Site
Object containing the settings for a crawler.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Request
Object containing the URL to crawl.
It also carries some additional information. |
ResultItems
Object containing the extracted results.
It is held by a Page and processed in a Pipeline. |
Site
Object containing the settings for a crawler.
|
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Site
Object containing the settings for a crawler.
|
Spider
Entry point of a crawler.
A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Every module is a field of Spider. |
Task
Interface for identifying different tasks.
|
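
The four-module structure described for Spider can be sketched as follows. This is an illustrative assumption, not the library's implementation: `SpiderSketch`, its nested interfaces, their simplified signatures, `InMemoryScheduler`, and the order of the crawl loop are all hypothetical. The only point taken from the table is that Downloader, Scheduler, PageProcessor and Pipeline are each held as a field.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in: four pluggable modules, each stored as a field.
class SpiderSketch {
    interface Downloader { String download(String url); }
    interface Scheduler { void push(String url); String poll(); }
    interface PageProcessor {
        void process(String url, String html,
                     Map<String, Object> results, List<String> newUrls);
    }
    interface Pipeline { void process(Map<String, Object> results); }

    private final Downloader downloader;
    private final Scheduler scheduler;
    private final PageProcessor pageProcessor;
    private final Pipeline pipeline;

    SpiderSketch(Downloader downloader, Scheduler scheduler,
                 PageProcessor pageProcessor, Pipeline pipeline) {
        this.downloader = downloader;
        this.scheduler = scheduler;
        this.pageProcessor = pageProcessor;
        this.pipeline = pipeline;
    }

    // Assumed loop: poll a URL, download it, extract, queue new URLs, persist.
    void run(String startUrl) {
        scheduler.push(startUrl);
        String url;
        while ((url = scheduler.poll()) != null) {
            String html = downloader.download(url);
            Map<String, Object> results = new LinkedHashMap<>();
            List<String> newUrls = new ArrayList<>();
            pageProcessor.process(url, html, results, newUrls);
            for (String next : newUrls) {
                scheduler.push(next);
            }
            pipeline.process(results);
        }
    }

    // FIFO scheduler that ignores URLs it has already seen.
    static class InMemoryScheduler implements Scheduler {
        private final Queue<String> queue = new ArrayDeque<>();
        private final Set<String> seen = new HashSet<>();
        public void push(String url) { if (seen.add(url)) queue.add(url); }
        public String poll() { return queue.poll(); }
    }

    // Crawls a two-page fake "web" where each page's body is the URL it links to.
    static List<Map<String, Object>> crawlDemo() {
        Map<String, String> fakeWeb = new LinkedHashMap<>();
        fakeWeb.put("a", "b");
        fakeWeb.put("b", "a");
        List<Map<String, Object>> collected = new ArrayList<>();
        SpiderSketch spider = new SpiderSketch(
                url -> fakeWeb.getOrDefault(url, ""),
                new InMemoryScheduler(),
                (url, html, results, newUrls) -> {
                    results.put("url", url);
                    if (!html.isEmpty()) newUrls.add(html);
                },
                collected::add);
        spider.run("a");
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(crawlDemo()); // [{url=a}, {url=b}]
    }
}
```

Keeping each module behind its own interface means any one of them can be swapped (for example, a different scheduler) without touching the loop.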
Class and Description |
---|
MultiPageModel
Extracts an object that spans more than one page, such as a news story or an article.
|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Class and Description |
---|
Request
Object containing the URL to crawl.
It also carries some additional information. |
Spider
Entry point of a crawler.
A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Every module is a field of Spider. |
SpiderListener
Listener of Spider on page processing.
|
Class and Description |
---|
ResultItems
Object containing the extracted results.
It is held by a Page and processed in a Pipeline. |
Task
Interface for identifying different tasks.
|
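
How extracted results might reach a pipeline together with a task identifier can be sketched as below. This is a minimal illustration under stated assumptions: `ResultItemsSketch`, `TaskSketch`, and `render` are hypothetical stand-ins rather than the library's classes, and the "pipeline" here simply formats the fields as a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a container of extracted results.
class ResultItemsSketch {
    // Hypothetical stand-in for the task-identifying interface.
    interface TaskSketch { String getUUID(); }

    private final Map<String, Object> fields = new LinkedHashMap<>();

    ResultItemsSketch put(String key, Object value) {
        fields.put(key, value);
        return this; // allow chained puts
    }

    Object get(String key) { return fields.get(key); }

    Map<String, Object> getAll() { return fields; }

    // A toy "pipeline" step: renders the results, tagged with the task's id.
    static String render(ResultItemsSketch items, TaskSketch task) {
        StringBuilder sb = new StringBuilder("task=" + task.getUUID());
        for (Map.Entry<String, Object> e : items.getAll().entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ResultItemsSketch items = new ResultItemsSketch()
                .put("title", "Hello")
                .put("author", "anon");
        // prints: task=demo title=Hello author=anon
        System.out.println(render(items, () -> "demo"));
    }
}
```

Passing the task alongside the results lets one pipeline instance tell apart output from different crawls, for example by writing each task's results to its own directory.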
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Site
Object containing the settings for a crawler.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Site
Object containing the settings for a crawler.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Site
Object containing the settings for a crawler.
|
Class and Description |
---|
ResultItems
Object containing the extracted results.
It is held by a Page and processed in a Pipeline. |
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Request
Object containing the URL to crawl.
It also carries some additional information. |
Site
Object containing the settings for a crawler.
|
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Request
Object containing the URL to crawl.
It also carries some additional information. |
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Request
Object containing the URL to crawl.
It also carries some additional information. |
Task
Interface for identifying different tasks.
|
Class and Description |
---|
Page
Object storing extracted results and URLs to fetch.
Not thread safe. Main methods: Page.getUrl() gets the URL of the current page; Page.getHtml() gets the content of the current page; Page.putField(String, Object) saves an extracted result; Page.getResultItems() gets the extracted results to be used in a Pipeline; Page.addTargetRequests(java.util.List) and Page.addTargetRequest(String) add URLs to fetch. |
Site
Object containing the settings for a crawler.
|
Class and Description |
---|
Request
Object containing the URL to crawl.
It also carries some additional information. |
Copyright © 2017. All rights reserved.