| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.configurable | |
| us.codecraft.webmagic.downloader | Downloader is the part that downloads web pages and stores them in a Page object. |
| us.codecraft.webmagic.selector | Selectors for page extraction. |
| us.codecraft.webmagic.utils | Static utilities of webmagic. |

| Class | Description |
|---|---|
| Html | Selectable HTML. |
| Json | Parses JSON. |
| Selectable | Selectable text. |

| Class | Description |
|---|---|
| Selector | Selector (extractor) for text. |

| Class | Description |
|---|---|
| Html | Selectable HTML. |

| Class | Description |
|---|---|
| AbstractSelectable | |
| AndSelector | All selectors are arranged as a pipeline. |
| BaseElementSelector | |
| CssSelector | CSS selector. |
| ElementSelector | Selector (extractor) for HTML elements. |
| Html | Selectable HTML. |
| HtmlNode | |
| Json | Parses JSON. |
| OrSelector | Each extractor runs separately, and the results of all extractors are combined into the final result. |
| PlainText | Selectable plain text. Cannot be selected by XPath or CSS selector. |
| RegexSelector | Regex-based selector. |
| Selectable | Selectable text. |
| Selector | Selector (extractor) for text. |
| SmartContentSelector | Borrowed from https://code.google.com/p/cx-extractor/ |
| XpathSelector | XPath selector based on Xsoup. |
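
The difference between the pipeline behaviour described for AndSelector and the combine-all behaviour described for OrSelector can be sketched in plain Java. This is a minimal illustration only and does not use the WebMagic classes: `SelectorSketch`, `TextSelector`, `regex`, `and`, and `or` are hypothetical names invented for this example, and the actual library signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-ins for illustration; these are not the WebMagic classes. */
final class SelectorSketch {

    /** Minimal text selector: returns every match found in the input. */
    interface TextSelector {
        List<String> selectList(String text);
    }

    /** Regex selector that returns the given capture group of each match. */
    static TextSelector regex(String pattern, int group) {
        Pattern compiled = Pattern.compile(pattern);
        return text -> {
            List<String> results = new ArrayList<>();
            Matcher matcher = compiled.matcher(text);
            while (matcher.find()) {
                results.add(matcher.group(group));
            }
            return results;
        };
    }

    /** Pipeline ("and"): each selector runs on the output of the previous one. */
    static TextSelector and(TextSelector... selectors) {
        return text -> {
            List<String> current = new ArrayList<>();
            current.add(text);
            for (TextSelector selector : selectors) {
                List<String> next = new ArrayList<>();
                for (String item : current) {
                    next.addAll(selector.selectList(item));
                }
                current = next;
            }
            return current;
        };
    }

    /** Union ("or"): every selector runs on the original text; results are concatenated. */
    static TextSelector or(TextSelector... selectors) {
        return text -> {
            List<String> results = new ArrayList<>();
            for (TextSelector selector : selectors) {
                results.addAll(selector.selectList(text));
            }
            return results;
        };
    }

    public static void main(String[] args) {
        String html = "<a href=\"/post/1\">one</a><a href=\"/tag/x\">x</a>";
        TextSelector hrefs = regex("href=\"([^\"]+)\"", 1);
        TextSelector postIds = regex("/post/(\\d+)", 1);
        TextSelector tags = regex("/tag/(\\w+)", 1);

        // Pipeline: extract hrefs first, then post ids from those hrefs.
        System.out.println(and(hrefs, postIds).selectList(html)); // prints [1]
        // Union: both selectors see the full input.
        System.out.println(or(postIds, tags).selectList(html));   // prints [1, x]
    }
}
```

In the pipeline case the second selector never sees the raw input, so a match that exists only outside the first selector's results is dropped. In the union case every selector works on the same input and the order of results follows the order of the selectors.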

| Class | Description |
|---|---|
| Selector | Selector (extractor) for text. |

Copyright © 2017. All rights reserved.