| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.configurable | |
| us.codecraft.webmagic.example | |
| us.codecraft.webmagic.model | Page model and annotations used to customize a crawler. |
| us.codecraft.webmagic.monitor | |
| us.codecraft.webmagic.pipeline | Pipelines handle the persistence and offline processing part of a crawler. |
| us.codecraft.webmagic.selector | Selectors for page extraction. |

| Modifier and Type | Interface and Description |
|---|---|
| interface | MultiPageModel: Extracts an object that spans more than one page, such as news and articles. |

| Modifier and Type | Method and Description |
|---|---|
| Request | Request.setPriority(long priority): Sets the priority of the request for sorting. Requires a scheduler that supports priority. |
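
The note that a priority only matters with a priority-aware scheduler can be illustrated with a small, self-contained sketch. It does not use the library: `SketchRequest` and `PrioritySchedulerSketch` are hypothetical stand-ins, and the rule that a larger value is served first is an assumption of this sketch that should be checked against the scheduler actually in use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical priority-aware scheduler (not a library class).
// Assumption: requests with a larger priority value are polled first.
class PrioritySchedulerSketch {
    private final PriorityQueue<SketchRequest> queue = new PriorityQueue<>(
            Comparator.comparingLong((SketchRequest r) -> r.priority).reversed());

    void push(SketchRequest request) {
        queue.add(request);
    }

    // Returns the next request, or null when the queue is empty.
    SketchRequest poll() {
        return queue.poll();
    }

    // Pushes all requests, then returns their URLs in the order they are polled.
    static List<String> drainOrder(List<SketchRequest> requests) {
        PrioritySchedulerSketch scheduler = new PrioritySchedulerSketch();
        for (SketchRequest r : requests) {
            scheduler.push(r);
        }
        List<String> order = new ArrayList<>();
        SketchRequest next;
        while ((next = scheduler.poll()) != null) {
            order.add(next.url);
        }
        return order;
    }

    public static void main(String[] args) {
        List<SketchRequest> requests = new ArrayList<>();
        requests.add(new SketchRequest("list-page").setPriority(1));
        requests.add(new SketchRequest("detail-page").setPriority(10));
        requests.add(new SketchRequest("footer-link")); // priority stays 0
        System.out.println(drainOrder(requests));
    }
}

// Hypothetical stand-in for a crawl request.
class SketchRequest {
    final String url;
    long priority; // defaults to 0

    SketchRequest(String url) {
        this.url = url;
    }

    // Fluent setter, mirroring the return type listed for Request.setPriority.
    SketchRequest setPriority(long priority) {
        this.priority = priority;
        return this;
    }
}
```

A plain FIFO scheduler would ignore the `priority` field entirely, which is why the method description asks for a scheduler that supports it.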

| Modifier and Type | Class and Description |
|---|---|
| class | ConfigurablePageProcessor |

| Modifier and Type | Class and Description |
|---|---|
| class | AppStore |

| Modifier and Type | Interface and Description |
|---|---|
| interface | HasKey: Interface to be implemented by page models. Can be used to identify a page model, or as the name of the file that stores the object. |
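
As a rough illustration of using a key as a file name, the sketch below defines its own `Keyed` interface rather than importing the library's. The single `key()` method, the `Article` model, the `.json` suffix, and the hash-based fallback are all assumptions made for this example, not documented behavior.

```java
// Self-contained sketch; none of these types come from the library.
class HasKeySketch {

    // Stand-in for a key-providing interface, assumed to have one method.
    interface Keyed {
        String key();
    }

    // Hypothetical page model that identifies itself by an article id.
    static class Article implements Keyed {
        final String id;
        final String title;

        Article(String id, String title) {
            this.id = id;
            this.title = title;
        }

        @Override
        public String key() {
            return "article-" + id;
        }
    }

    // Picks a file name: the model's key when available,
    // otherwise a fallback derived from the object's hash code.
    static String fileNameFor(Object model) {
        if (model instanceof Keyed) {
            return ((Keyed) model).key() + ".json";
        }
        return Integer.toHexString(model.hashCode()) + ".json";
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(new Article("42", "Hello")));
        System.out.println(fileNameFor("a model without a key"));
    }
}
```

A stable key means that re-crawling the same page overwrites the same file instead of creating a new one on each run.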

| Modifier and Type | Class and Description |
|---|---|
| class | SpiderMonitor |

| Modifier and Type | Class and Description |
|---|---|
| class | MultiPagePipeline: A pipeline that combines the results from more than one page. Used for news and articles that span more than one web page. |
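
The idea of combining results from several pages can be sketched without the library as follows. This is not the actual MultiPagePipeline implementation: the `Part` type, its fields, and the rule "emit once every page has arrived" are assumptions chosen to keep the example small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Self-contained sketch of buffering page fragments until an article is complete.
class MultiPageSketch {

    // Hypothetical fragment: one web page of a longer article.
    static class Part {
        final String articleKey; // identifies the article the page belongs to
        final int pageNo;        // 1-based page number
        final int totalPages;    // how many pages the article has
        final String text;

        Part(String articleKey, int pageNo, int totalPages, String text) {
            this.articleKey = articleKey;
            this.pageNo = pageNo;
            this.totalPages = totalPages;
            this.text = text;
        }
    }

    // articleKey -> (pageNo -> text); the TreeMap keeps pages sorted by number.
    private final Map<String, TreeMap<Integer, String>> buffer = new HashMap<>();

    // Stores the fragment. Returns the combined text once all pages of the
    // article have arrived, otherwise null.
    String accept(Part part) {
        TreeMap<Integer, String> pages =
                buffer.computeIfAbsent(part.articleKey, k -> new TreeMap<>());
        pages.put(part.pageNo, part.text);
        if (pages.size() < part.totalPages) {
            return null;
        }
        buffer.remove(part.articleKey);
        return String.join("\n", pages.values());
    }

    public static void main(String[] args) {
        MultiPageSketch pipeline = new MultiPageSketch();
        // Pages may arrive out of order because the crawler fetches them concurrently.
        System.out.println(pipeline.accept(new Part("news-1", 2, 2, "second half")));
        System.out.println(pipeline.accept(new Part("news-1", 1, 2, "first half")));
    }
}
```

Buffering by article key is what lets out-of-order pages be reassembled in page order before anything is written out.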

| Modifier and Type | Class and Description |
|---|---|
| class | SmartContentSelector: Borrowed from https://code.google.com/p/cx-extractor/ |
Copyright © 2017. All rights reserved.