Package | Description |
---|---|
us.codecraft.webmagic | Main class "Spider" and models. |
us.codecraft.webmagic.configurable | |
us.codecraft.webmagic.downloader | Downloader is the part that downloads web pages and stores them in a Page object. |
us.codecraft.webmagic.example | |
us.codecraft.webmagic.handler | |
us.codecraft.webmagic.model | Page model and annotations used to customize a crawler. |
us.codecraft.webmagic.processor | PageProcessor, the custom part of a crawler for a specific site. |
us.codecraft.webmagic.processor.example | |
us.codecraft.webmagic.samples | |
us.codecraft.webmagic.samples.scheduler | |
us.codecraft.webmagic.scripts | |
Modifier and Type | Field and Description |
---|---|
protected Site | Spider.site |
Modifier and Type | Method and Description |
---|---|
Site | Site.addCookie(String name, String value): Add a cookie with the domain returned by getDomain(). |
Site | Site.addCookie(String domain, String name, String value): Add a cookie with a specific domain. |
Site | Site.addHeader(String key, String value): Put an HTTP header for the downloader. |
Site | Task.getSite(): The site of a task. |
Site | Spider.getSite() |
static Site | Site.me(): Create a new Site. |
Site | Site.setAcceptStatCode(Set<Integer> acceptStatCode): Set acceptStatCode. A response whose HTTP status code is in acceptStatCode will be processed. {200} by default; setting it is optional. |
Site | Site.setCharset(String charset): Set the charset of pages manually. When the charset is not set, or is set to null, it can be auto-detected from the HTTP header. |
Site | Site.setCycleRetryTimes(int cycleRetryTimes): Set the number of cycle retries when a download fails, 0 by default. |
Site | Site.setDisableCookieManagement(boolean disableCookieManagement): The downloader stores response cookies by default; set this to true to disable cookie management. |
Site | Site.setDomain(String domain): Set the domain of the site. |
Site | Site.setRetrySleepTime(int retrySleepTime): Set the sleep time between retries when a download fails, 1000 by default. |
Site | Site.setRetryTimes(int retryTimes): Set the number of retries when a download fails, 0 by default. |
Site | Site.setSleepTime(int sleepTime): Set the interval between the processing of two pages. Time unit is milliseconds. |
Site | Site.setTimeOut(int timeOut): Set the downloader timeout, in milliseconds. |
Site | Site.setUseGzip(boolean useGzip): Set whether to use gzip. |
Site | Site.setUserAgent(String userAgent): Set the user agent. |
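Every Site setter listed above returns Site, which is what allows configuration to be chained from Site.me(). The sketch below is a hypothetical stand-in (SiteSketch, accepts and cookiesFor are illustrative names, not webmagic API) that mirrors the documented defaults, the acceptStatCode rule, and the two addCookie overloads. It assumes cookies added without a domain are matched against getDomain() at lookup time, which may differ from the real class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Site: names and defaults follow the table above,
// but this is NOT webmagic's implementation.
class SiteSketch {
    private String domain;
    private int retryTimes = 0;          // documented default: 0
    private int cycleRetryTimes = 0;     // documented default: 0
    private int retrySleepTime = 1000;   // documented default: 1000
    private Set<Integer> acceptStatCode = new HashSet<>(Collections.singleton(200)); // default {200}
    private final Map<String, String> defaultCookies = new HashMap<>();             // addCookie(name, value)
    private final Map<String, Map<String, String>> domainCookies = new HashMap<>(); // addCookie(domain, ...)

    static SiteSketch me() {
        return new SiteSketch();
    }

    // Every setter returns this, so calls can be chained.
    SiteSketch setDomain(String domain) {
        this.domain = domain;
        return this;
    }

    SiteSketch setRetryTimes(int retryTimes) {
        this.retryTimes = retryTimes;
        return this;
    }

    SiteSketch setCycleRetryTimes(int cycleRetryTimes) {
        this.cycleRetryTimes = cycleRetryTimes;
        return this;
    }

    SiteSketch setRetrySleepTime(int retrySleepTime) {
        this.retrySleepTime = retrySleepTime;
        return this;
    }

    SiteSketch setAcceptStatCode(Set<Integer> acceptStatCode) {
        this.acceptStatCode = new HashSet<>(acceptStatCode); // defensive copy
        return this;
    }

    // No explicit domain: the cookie is tied to whatever getDomain() returns.
    SiteSketch addCookie(String name, String value) {
        defaultCookies.put(name, value);
        return this;
    }

    SiteSketch addCookie(String domain, String name, String value) {
        domainCookies.computeIfAbsent(domain, d -> new HashMap<>()).put(name, value);
        return this;
    }

    String getDomain() { return domain; }
    int getRetryTimes() { return retryTimes; }
    int getCycleRetryTimes() { return cycleRetryTimes; }
    int getRetrySleepTime() { return retrySleepTime; }

    // A response is processed only if its status code is in acceptStatCode.
    boolean accepts(int statusCode) {
        return acceptStatCode.contains(statusCode);
    }

    // Cookies for a host: its explicit cookies, plus the domain-less ones
    // when the host is the site's own domain.
    Map<String, String> cookiesFor(String host) {
        Map<String, String> result = new HashMap<>(domainCookies.getOrDefault(host, Collections.emptyMap()));
        if (host != null && host.equals(getDomain())) {
            result.putAll(defaultCookies);
        }
        return result;
    }
}
```

Because this sketch resolves domain-less cookies when they are looked up, addCookie(name, value) may be called before setDomain in a chain.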
Constructor and Description |
---|
SimpleHttpClient(Site site) |
Modifier and Type | Method and Description |
---|---|
Site | ConfigurablePageProcessor.getSite() |
Constructor and Description |
---|
ConfigurablePageProcessor(Site site, List<ExtractRule> extractRules) |
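ConfigurablePageProcessor takes its extraction logic as data (a List<ExtractRule>) rather than as code. A minimal sketch of that idea, assuming each rule is only a field name plus a regular expression whose first capturing group is the value. RuleSketch and ConfigurableProcessorSketch are hypothetical, the Site argument is omitted, and webmagic's own ExtractRule is not reproduced.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule: a field name and a regex; group(1) of the first match is the value.
final class RuleSketch {
    final String field;
    final Pattern pattern;

    RuleSketch(String field, String regex) {
        this.field = field;
        this.pattern = Pattern.compile(regex); // regex must contain a capturing group
    }
}

// Hypothetical processor configured with rules (data) instead of custom code.
final class ConfigurableProcessorSketch {
    private final List<RuleSketch> rules;

    ConfigurableProcessorSketch(List<RuleSketch> rules) {
        this.rules = List.copyOf(rules);
    }

    // Applies each rule in order; fields whose rule does not match are left out.
    Map<String, String> process(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (RuleSketch rule : rules) {
            Matcher m = rule.pattern.matcher(html);
            if (m.find()) {
                fields.put(rule.field, m.group(1));
            }
        }
        return fields;
    }
}
```

Adding a field then means adding a rule, not subclassing a processor.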
Modifier and Type | Method and Description |
---|---|
HttpClientRequestContext | HttpUriRequestConverter.convert(Request request, Site site, Proxy proxy) |
org.apache.http.impl.client.CloseableHttpClient | HttpClientGenerator.getClient(Site site) |
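HttpUriRequestConverter.convert and HttpClientGenerator.getClient turn Site settings into Apache HttpClient objects. As an illustration only, the sketch below swaps in the JDK 11 java.net.http client to show the same step: per-site settings such as the user agent, the timeout in milliseconds, and extra headers become request properties. RequestConverterSketch is hypothetical, the Proxy argument is left out, and building an HttpRequest does not send anything.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.Map;

// Hypothetical converter: site-level settings -> a JDK HttpRequest (nothing is sent).
final class RequestConverterSketch {
    static HttpRequest convert(String url, String userAgent, int timeOutMillis,
                               Map<String, String> headers) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMillis(timeOutMillis)) // must be positive
                .GET();
        if (userAgent != null) {
            builder.header("User-Agent", userAgent);
        }
        headers.forEach(builder::header); // e.g. headers collected via addHeader
        return builder.build();
    }
}
```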
Modifier and Type | Method and Description |
---|---|
Site | GithubRepoPageMapper.getSite() |
Modifier and Type | Method and Description |
---|---|
Site | CompositePageProcessor.getSite() |
Modifier and Type | Method and Description |
---|---|
CompositePageProcessor | CompositePageProcessor.setSite(Site site) |
Constructor and Description |
---|
CompositePageProcessor(Site site) |
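CompositePageProcessor(Site site) pairs one shared Site with several sub-processors. A hypothetical sketch of that composite arrangement, assuming each sub-processor decides by URL whether it applies and that every matching sub-processor is called in registration order. The interface and class names, and the plain domain string used in place of Site, are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sub-processor: declares which URLs it handles.
interface SubProcessorSketch {
    boolean match(String url);
    String process(String url);
}

// Hypothetical sub-processor that matches on a URL prefix.
final class PrefixProcessor implements SubProcessorSketch {
    private final String prefix;
    private final String label;

    PrefixProcessor(String prefix, String label) {
        this.prefix = prefix;
        this.label = label;
    }

    @Override
    public boolean match(String url) { return url.startsWith(prefix); }

    @Override
    public String process(String url) { return label + ":" + url; }
}

// Hypothetical composite: one shared site setting, many sub-processors.
final class CompositeProcessorSketch {
    private String siteDomain; // stands in for Site
    private final List<SubProcessorSketch> subs = new ArrayList<>();

    CompositeProcessorSketch(String siteDomain) { this.siteDomain = siteDomain; }

    CompositeProcessorSketch setSite(String siteDomain) {
        this.siteDomain = siteDomain;
        return this;
    }

    String getSite() { return siteDomain; }

    CompositeProcessorSketch add(SubProcessorSketch sub) {
        subs.add(sub);
        return this;
    }

    // Calls every sub-processor whose match() accepts the URL, in order.
    List<String> process(String url) {
        List<String> results = new ArrayList<>();
        for (SubProcessorSketch sub : subs) {
            if (sub.match(url)) {
                results.add(sub.process(url));
            }
        }
        return results;
    }
}
```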
Modifier and Type | Method and Description |
---|---|
static OOSpider | OOSpider.create(Site site, Class... pageModels) |
static OOSpider | OOSpider.create(Site site, PageModelPipeline pageModelPipeline, Class... pageModels) |
Constructor and Description |
---|
OOSpider(Site site, PageModelPipeline pageModelPipeline, Class... pageModels): Create a spider. |
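OOSpider.create accepts page-model classes as varargs (Class... pageModels), and the us.codecraft.webmagic.model package describes page models customized through annotations. A minimal reflection sketch of that pattern, assuming a single hypothetical runtime annotation that carries a target-URL regex. TargetUrlSketch, RepoModel and ModelSpiderSketch are illustrative names, the library's own annotations are not reproduced, and the Site and PageModelPipeline arguments are omitted.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation: the URL regex a page-model class applies to.
@Retention(RetentionPolicy.RUNTIME) // must be RUNTIME to be visible via reflection
@Target(ElementType.TYPE)
@interface TargetUrlSketch {
    String value();
}

// Hypothetical page model for repository pages.
@TargetUrlSketch("https://github.com/\\w+/\\w+")
class RepoModel { }

// Hypothetical factory in the spirit of create(..., Class... pageModels).
final class ModelSpiderSketch {
    final List<String> urlPatterns = new ArrayList<>();

    static ModelSpiderSketch create(Class<?>... pageModels) {
        ModelSpiderSketch spider = new ModelSpiderSketch();
        for (Class<?> model : pageModels) {
            TargetUrlSketch target = model.getAnnotation(TargetUrlSketch.class);
            if (target == null) {
                throw new IllegalArgumentException(model.getName() + " has no @TargetUrlSketch");
            }
            spider.urlPatterns.add(target.value());
        }
        return spider;
    }

    // True if any registered model's pattern matches the whole URL.
    boolean isTarget(String url) {
        for (String pattern : urlPatterns) {
            if (url.matches(pattern)) {
                return true;
            }
        }
        return false;
    }
}
```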
Modifier and Type | Method and Description |
---|---|
Site | SimplePageProcessor.getSite() |
Site | PageProcessor.getSite(): Get the site settings. |
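Each PageProcessor supplies its own site settings through getSite(), and every processor in the listings below implements it. The sketch shows that contract with hypothetical stand-ins (Settings for Site, ProcessorSketch for PageProcessor, SpiderSketch for Spider), assuming the spider reads the settings once from its processor when it is constructed.

```java
// Hypothetical stand-in for Site: only two settings are modeled.
final class Settings {
    final String domain;
    final int sleepTimeMillis;

    Settings(String domain, int sleepTimeMillis) {
        this.domain = domain;
        this.sleepTimeMillis = sleepTimeMillis;
    }
}

// Hypothetical stand-in for PageProcessor: parsing logic plus site settings.
interface ProcessorSketch {
    Settings getSite();
    String process(String html);
}

// One Settings instance per processor, returned from getSite().
final class TitleProcessor implements ProcessorSketch {
    private final Settings site = new Settings("example.com", 1000);

    @Override
    public Settings getSite() {
        return site;
    }

    // Returns the text between <title> and </title>, or "" if absent.
    @Override
    public String process(String html) {
        int start = html.indexOf("<title>");
        int end = html.indexOf("</title>");
        if (start < 0 || end < start) {
            return "";
        }
        return html.substring(start + "<title>".length(), end);
    }
}

// Hypothetical spider: its settings come from the processor it is given.
final class SpiderSketch {
    private final ProcessorSketch processor;
    private final Settings site;

    SpiderSketch(ProcessorSketch processor) {
        this.processor = processor;
        this.site = processor.getSite();
    }

    Settings getSite() { return site; }

    String crawl(String html) { return processor.process(html); }
}
```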
Modifier and Type | Method and Description |
---|---|
Site | ZhihuPageProcessor.getSite() |
Site | GithubRepoPageProcessor.getSite() |
Site | BaiduBaikePageProcessor.getSite() |
Modifier and Type | Method and Description |
---|---|
Site | ZhihuPageProcessor.getSite() |
Site | TianyaPageProcesser.getSite() |
Site | SinaBlogProcessor.getSite() |
Site | QzoneBlogProcessor.getSite() |
Site | PhantomJSPageProcessor.getSite() |
Site | NjuBBSProcessor.getSite() |
Site | MeicanProcessor.getSite() |
Site | MamacnPageProcessor.getSite() |
Site | KaichibaProcessor.getSite() |
Site | IteyeBlogProcessor.getSite() |
Site | InfoQMiniBookProcessor.getSite() |
Site | HuxiuProcessor.getSite() |
Site | GithubRepoPageProcessor.getSite() |
Site | F58PageProcesser.getSite() |
Site | DiaoyuwengProcessor.getSite() |
Site | DiandianBlogProcessor.getSite() |
Site | AngularJSProcessor.getSite() |
Site | AmanzonPageProcessor.getSite() |
Site | AlexanderMcqueenGoodsProcessor.getSite() |
Modifier and Type | Method and Description |
---|---|
Site | ZipCodePageProcessor.getSite() |
Modifier and Type | Method and Description |
---|---|
Site | ScriptProcessor.getSite() |
Copyright © 2017. All rights reserved.