Package | Description |
---|---|
us.codecraft.webmagic | Main class "Spider" and models. |
us.codecraft.webmagic.configurable | |
us.codecraft.webmagic.example | |
us.codecraft.webmagic.model | Page model and annotations used to customize a crawler. |
us.codecraft.webmagic.monitor | |
us.codecraft.webmagic.pipeline | Pipelines handle the persistence and offline processing stage of a crawler. |
us.codecraft.webmagic.selector | Selectors for page extraction. |

Modifier and Type | Interface and Description |
---|---|
interface | MultiPageModel: Extracts an object that spans more than one page, such as a news story or article. |

Modifier and Type | Method and Description |
---|---|
Request | Request.setPriority(long priority): Sets the priority of the request for sorting. Requires a scheduler that supports priority. |
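
The dependency between a request priority and a priority-aware scheduler can be illustrated with a minimal, standard-library-only sketch. `SimpleRequest` and `SimplePriorityScheduler` are hypothetical stand-ins, not WebMagic classes, and the sketch assumes that larger priority values are served first and that the default priority is 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

public class PriorityDemo {

    // Hypothetical stand-in for a crawl request; not WebMagic's Request class.
    static final class SimpleRequest {
        final String url;
        long priority; // assumed default: 0

        SimpleRequest(String url) {
            this.url = url;
        }

        // Returns this so calls can be chained, mirroring the Request return type listed above.
        SimpleRequest setPriority(long priority) {
            this.priority = priority;
            return this;
        }
    }

    // Hypothetical scheduler that hands out higher-priority requests first (an assumption).
    static final class SimplePriorityScheduler {
        private final PriorityBlockingQueue<SimpleRequest> queue =
                new PriorityBlockingQueue<>(11,
                        Comparator.comparingLong((SimpleRequest r) -> r.priority).reversed());

        void push(SimpleRequest request) {
            queue.add(request);
        }

        SimpleRequest poll() {
            return queue.poll();
        }
    }

    // Drains the scheduler and returns the URLs in the order they would be crawled.
    static List<String> crawlOrder(SimpleRequest... requests) {
        SimplePriorityScheduler scheduler = new SimplePriorityScheduler();
        for (SimpleRequest r : requests) {
            scheduler.push(r);
        }
        List<String> order = new ArrayList<>();
        SimpleRequest next;
        while ((next = scheduler.poll()) != null) {
            order.add(next.url);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(crawlOrder(
                new SimpleRequest("http://example.com/list").setPriority(1),
                new SimpleRequest("http://example.com/detail").setPriority(10),
                new SimpleRequest("http://example.com/about")));
    }
}
```

With these three distinct priorities, the detail URL is returned first, then the list URL, then the about URL. A plain FIFO queue would ignore the priority field entirely, which is why the method description calls for a scheduler that supports priority.
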

Modifier and Type | Class and Description |
---|---|
class | ConfigurablePageProcessor |

Modifier and Type | Class and Description |
---|---|
class | AppStore |

Modifier and Type | Interface and Description |
---|---|
interface | HasKey: Interface to be implemented by a page model. The key can identify the page model or serve as the name of the file that stores the object. |
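
As a rough illustration of how a key can double as a file name, the following sketch uses a local stand-in interface named `Keyed`. The single `key()` method, the `Article` model, the `targetFile` helper, and the `.json` suffix are all assumptions made for this example rather than details taken from the listing above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HasKeyDemo {

    // Local stand-in for the interface described above; the key() method is an assumption.
    interface Keyed {
        String key();
    }

    // Hypothetical page model identified by its article id.
    static final class Article implements Keyed {
        final String id;
        final String title;

        Article(String id, String title) {
            this.id = id;
            this.title = title;
        }

        @Override
        public String key() {
            return "article-" + id;
        }
    }

    // Picks the file a pipeline might write the object to: the model's key when it
    // provides one, otherwise a fallback name supplied by the caller.
    static Path targetFile(Path dir, Object model, String fallbackName) {
        String name = (model instanceof Keyed) ? ((Keyed) model).key() : fallbackName;
        return dir.resolve(name + ".json");
    }

    public static void main(String[] args) {
        Path dir = Paths.get("out");
        System.out.println(targetFile(dir, new Article("42", "Hello"), "unnamed"));
        System.out.println(targetFile(dir, "plain object", "unnamed"));
    }
}
```

Keeping the key on the model means the storage code does not need to know which field identifies each kind of object.
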

Modifier and Type | Class and Description |
---|---|
class | SpiderMonitor |

Modifier and Type | Class and Description |
---|---|
class | MultiPagePipeline: A pipeline that combines the results from more than one page. Used for news stories and articles that span several web pages. |
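
The idea of combining results from several pages can be sketched with the standard library alone. `Fragment` and `Combiner` are hypothetical names, and the sketch assumes every fragment knows its article key, its page number, and the total page count; it is not a description of how MultiPagePipeline itself works.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MultiPageDemo {

    // Hypothetical fragment: one page of a multi-page article.
    static final class Fragment {
        final String articleKey;
        final int pageNumber;
        final int totalPages;
        final String text;

        Fragment(String articleKey, int pageNumber, int totalPages, String text) {
            this.articleKey = articleKey;
            this.pageNumber = pageNumber;
            this.totalPages = totalPages;
            this.text = text;
        }
    }

    // Buffers fragments per article and emits the joined text once every page has arrived.
    static final class Combiner {
        private final Map<String, TreeMap<Integer, String>> pending = new HashMap<>();

        // Returns the combined text when the article is complete, otherwise null.
        String accept(Fragment f) {
            TreeMap<Integer, String> pages =
                    pending.computeIfAbsent(f.articleKey, k -> new TreeMap<>());
            pages.put(f.pageNumber, f.text);
            if (pages.size() < f.totalPages) {
                return null; // still waiting for other pages
            }
            pending.remove(f.articleKey);
            // TreeMap iterates in page-number order, so arrival order does not matter.
            return String.join("\n", pages.values());
        }
    }

    public static void main(String[] args) {
        Combiner combiner = new Combiner();
        System.out.println(combiner.accept(new Fragment("news-1", 2, 2, "second page")));
        System.out.println(combiner.accept(new Fragment("news-1", 1, 2, "first page")));
    }
}
```

Pages of a crawl usually arrive out of order, so the sketch buffers by page number and only emits a result once the set is complete.
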

Modifier and Type | Class and Description |
---|---|
class | SmartContentSelector: Borrowed from https://code.google.com/p/cx-extractor/ |

Copyright © 2017. All rights reserved.