Package | Description |
---|---|
us.codecraft.webmagic | Main class "Spider" and models. |
us.codecraft.webmagic.selector | Selectors for page extraction. |
Modifier and Type | Method and Description |
---|---|
Selectable | Page.getUrl() - get the URL of the current page |
Modifier and Type | Method and Description |
---|---|
void | Page.setUrl(Selectable url) |
Modifier and Type | Class and Description |
---|---|
class | AbstractSelectable |
class | Html - Selectable HTML. |
class | HtmlNode |
class | Json - Parses JSON. |
class | PlainText - Selectable plain text. Cannot be selected by XPath or CSS selector. |
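
The note on PlainText means that one common interface is shared by implementations that do not all support every kind of selection. The following is a minimal, self-contained sketch of that idea. `TextSelection` and `PlainTextSketch` are hypothetical names that are not part of WebMagic, and the use of `UnsupportedOperationException` to reject the call is an assumption, since this page does not say how the rejection is signalled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical miniature types; names and behaviour are assumptions, not WebMagic code.
interface TextSelection {
    TextSelection regex(String regex);   // text-based selection
    TextSelection xpath(String xpath);   // structure-based selection
    String get();                        // current value, or null if nothing matched
}

final class PlainTextSketch implements TextSelection {
    private final String text;

    PlainTextSketch(String text) {
        this.text = text;
    }

    @Override
    public TextSelection regex(String regex) {
        if (text == null) {
            return this; // nothing left to match against
        }
        Matcher matcher = Pattern.compile(regex).matcher(text);
        // Keep group 1 of the first match; the regex must contain a capturing group.
        return new PlainTextSketch(matcher.find() ? matcher.group(1) : null);
    }

    @Override
    public TextSelection xpath(String xpath) {
        // Assumption: plain text has no element tree, so the call is rejected outright.
        throw new UnsupportedOperationException("XPath cannot be applied to plain text");
    }

    @Override
    public String get() {
        return text;
    }

    public static void main(String[] args) {
        TextSelection url = new PlainTextSketch("https://example.com/post/42");
        System.out.println(url.regex("/post/(\\d+)").get()); // 42
        try {
            url.xpath("//a");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing loudly, rather than returning an empty result, makes a misplaced XPath call on text content visible at the call site. That is the design choice the sketch assumes.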
Modifier and Type | Method and Description |
---|---|
Selectable | Selectable.$(String selector) - select list with CSS selector |
Selectable | PlainText.$(String selector) |
Selectable | HtmlNode.$(String selector) |
Selectable | Selectable.$(String selector, String attrName) - select list with CSS selector, extracting the attribute attrName |
Selectable | PlainText.$(String selector, String attrName) |
Selectable | HtmlNode.$(String selector, String attrName) |
Selectable | Selectable.css(String selector) - select list with CSS selector |
Selectable | AbstractSelectable.css(String selector) |
Selectable | Selectable.css(String selector, String attrName) - select list with CSS selector, extracting the attribute attrName |
Selectable | AbstractSelectable.css(String selector, String attrName) |
Selectable | Selectable.jsonPath(String jsonPath) - extract by JSON Path expression |
Selectable | Json.jsonPath(String jsonPath) |
Selectable | AbstractSelectable.jsonPath(String jsonPath) |
Selectable | Selectable.links() - select all links |
Selectable | PlainText.links() |
Selectable | HtmlNode.links() |
Selectable | Selectable.regex(String regex) - select list with regex; the default group is group 1 |
Selectable | AbstractSelectable.regex(String regex) |
Selectable | Selectable.regex(String regex, int group) - select list with regex, using the given group |
Selectable | AbstractSelectable.regex(String regex, int group) |
Selectable | Selectable.replace(String regex, String replacement) - replace with regex |
Selectable | AbstractSelectable.replace(String regex, String replacement) |
Selectable | Selectable.select(Selector selector) - extract by custom selector |
Selectable | HtmlNode.select(Selector selector) |
Selectable | AbstractSelectable.select(Selector selector) |
protected Selectable | AbstractSelectable.select(Selector selector, List<String> strings) |
protected Selectable | HtmlNode.selectElements(BaseElementSelector elementSelector) - select elements |
Selectable | Selectable.selectList(Selector selector) - extract by custom selector |
Selectable | HtmlNode.selectList(Selector selector) |
Selectable | AbstractSelectable.selectList(Selector selector) |
protected Selectable | AbstractSelectable.selectList(Selector selector, List<String> strings) |
Selectable | Selectable.smartContent() - select smart content with the Readability algorithm |
Selectable | PlainText.smartContent() |
Selectable | HtmlNode.smartContent() |
Selectable | Selectable.xpath(String xpath) - select list with XPath |
Selectable | PlainText.xpath(String xpath) |
Selectable | HtmlNode.xpath(String xpath) |
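
The `regex` and `replace` rows above describe list-in, list-out operations. The sketch below models that behaviour with `java.util.regex` only, to show what "default group is group 1" means in practice. `RegexSelectSketch` and its static methods are hypothetical stand-ins, not WebMagic's implementation, and collecting every match from every input string is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not WebMagic's implementation.
final class RegexSelectSketch {

    // Collects the requested capture group from every match in every input string.
    static List<String> regex(List<String> sources, String regex, int group) {
        Pattern pattern = Pattern.compile(regex);
        List<String> results = new ArrayList<>();
        for (String source : sources) {
            Matcher matcher = pattern.matcher(source);
            while (matcher.find()) {
                results.add(matcher.group(group));
            }
        }
        return results;
    }

    // One-argument form: uses group 1, so the regex needs a capturing group.
    static List<String> regex(List<String> sources, String regex) {
        return regex(sources, regex, 1);
    }

    // Applies a regex replacement to each input string independently.
    static List<String> replace(List<String> sources, String regex, String replacement) {
        List<String> results = new ArrayList<>();
        for (String source : sources) {
            results.add(source.replaceAll(regex, replacement));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
                "<a href=\"/post/1\">one</a> <a href=\"/post/2\">two</a>");
        List<String> paths = regex(page, "href=\"([^\"]+)\"");
        List<String> ids = replace(paths, "^/post/", "");
        System.out.println(paths); // [/post/1, /post/2]
        System.out.println(ids);   // [1, 2]
    }
}
```

Passing `0` as the group in this sketch returns the whole match (`href="/post/1"`) instead of the captured path, which is the practical difference between the one-argument and two-argument forms.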
Modifier and Type | Method and Description |
---|---|
List<Selectable> | Selectable.nodes() - get all nodes |
List<Selectable> | PlainText.nodes() |
List<Selectable> | HtmlNode.nodes() |
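
`nodes()` returns a list of selectables rather than a single one, so each result can be processed on its own. The sketch below is a rough model of that split. `NodesSketch` is a hypothetical class, and its `get()` and `all()` helpers are assumptions added for illustration, since they do not appear on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical miniature of a multi-result selection; not WebMagic code.
final class NodesSketch {
    private final List<String> sourceTexts;

    NodesSketch(List<String> sourceTexts) {
        this.sourceTexts = new ArrayList<>(sourceTexts);
    }

    // All results held by this selection (defensive copy).
    List<String> all() {
        return new ArrayList<>(sourceTexts);
    }

    // First result, or null when the selection is empty.
    String get() {
        return sourceTexts.isEmpty() ? null : sourceTexts.get(0);
    }

    // Splits the selection into one single-result selection per entry.
    List<NodesSketch> nodes() {
        List<NodesSketch> nodes = new ArrayList<>();
        for (String text : sourceTexts) {
            nodes.add(new NodesSketch(Collections.singletonList(text)));
        }
        return nodes;
    }

    public static void main(String[] args) {
        NodesSketch rows = new NodesSketch(Arrays.asList("row-a", "row-b", "row-c"));
        // Each node can be inspected independently of its siblings.
        for (NodesSketch node : rows.nodes()) {
            System.out.println(node.get());
        }
    }
}
```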
Copyright © 2017. All rights reserved.