
$

$(String) - Method in class us.codecraft.webmagic.selector.HtmlNode
 
$(String, String) - Method in class us.codecraft.webmagic.selector.HtmlNode
 
$(String) - Method in class us.codecraft.webmagic.selector.PlainText
 
$(String, String) - Method in class us.codecraft.webmagic.selector.PlainText
 
$(String) - Method in interface us.codecraft.webmagic.selector.Selectable
select list with css selector
$(String, String) - Method in interface us.codecraft.webmagic.selector.Selectable
select list with css selector
$(String) - Static method in class us.codecraft.webmagic.selector.Selectors
 
$(String, String) - Static method in class us.codecraft.webmagic.selector.Selectors
 

A

AbstractDownloader - Class in us.codecraft.webmagic.downloader
Base class of downloader with some common methods.
AbstractDownloader() - Constructor for class us.codecraft.webmagic.downloader.AbstractDownloader
 
AbstractSelectable - Class in us.codecraft.webmagic.selector
 
AbstractSelectable() - Constructor for class us.codecraft.webmagic.selector.AbstractSelectable
 
addCookie(String, String) - Method in class us.codecraft.webmagic.Request
 
addCookie(String, String) - Method in class us.codecraft.webmagic.Site
Add a cookie with domain Site.getDomain()
addCookie(String, String, String) - Method in class us.codecraft.webmagic.Site
Add a cookie with specific domain.
addHeader(String, String) - Method in class us.codecraft.webmagic.Request
 
addHeader(String, String) - Method in class us.codecraft.webmagic.Site
Put an Http header for downloader.
addPageModel(PageModelPipeline, Class...) - Method in class us.codecraft.webmagic.model.OOSpider
 
addPipeline(Pipeline) - Method in class us.codecraft.webmagic.Spider
add a pipeline for Spider
addRequest(Request...) - Method in class us.codecraft.webmagic.Spider
Add urls with information to crawl.
addSubPageProcessor(SubPageProcessor) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
addSubPipeline(SubPipeline) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
addTargetRequest(String) - Method in class us.codecraft.webmagic.Page
add url to fetch
addTargetRequest(Request) - Method in class us.codecraft.webmagic.Page
add requests to fetch
addTargetRequests(List<String>) - Method in class us.codecraft.webmagic.Page
add urls to fetch
addTargetRequests(List<String>, long) - Method in class us.codecraft.webmagic.Page
add urls to fetch
addUrl(String...) - Method in class us.codecraft.webmagic.Spider
Add urls to crawl.
AfterExtractor - Interface in us.codecraft.webmagic.model
Interface to be implemented by page models that need to do something after fields are extracted.
afterProcess(Page) - Method in interface us.codecraft.webmagic.model.AfterExtractor
 
afterProcess(Page) - Method in class us.codecraft.webmagic.model.samples.DianpingFtlDataScanner
 
afterProcess(Page) - Method in class us.codecraft.webmagic.model.samples.OschinaAnswer
 
AlexanderMcqueenGoodsProcessor - Class in us.codecraft.webmagic.samples
 
AlexanderMcqueenGoodsProcessor() - Constructor for class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
 
all() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
all() - Method in interface us.codecraft.webmagic.selector.Selectable
multi string result
AmanzonPageProcessor - Class in us.codecraft.webmagic.samples
 
AmanzonPageProcessor() - Constructor for class us.codecraft.webmagic.samples.AmanzonPageProcessor
 
and(Selector...) - Static method in class us.codecraft.webmagic.selector.Selectors
 
AndSelector - Class in us.codecraft.webmagic.selector
All selectors will be arranged as a pipeline.
AndSelector(Selector...) - Constructor for class us.codecraft.webmagic.selector.AndSelector
 
AndSelector(List<Selector>) - Constructor for class us.codecraft.webmagic.selector.AndSelector
 
AngularJSProcessor - Class in us.codecraft.webmagic.samples
 
AngularJSProcessor() - Constructor for class us.codecraft.webmagic.samples.AngularJSProcessor
 
AppStore - Class in us.codecraft.webmagic.example
 
AppStore() - Constructor for class us.codecraft.webmagic.example.AppStore
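
The AndSelector entry in this section says its selectors are arranged as a pipeline. A minimal sketch of that idea in plain Java follows; it is not WebMagic's implementation. `TextSelector` and `PipelineSketch` are hypothetical names introduced here, and the assumption is that each stage simply receives the previous stage's output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a selector: maps input text to extracted text,
// or returns null when nothing matches. Not a WebMagic type.
interface TextSelector {
    String select(String text);
}

// Chains selectors so that each one runs on the output of the one before it.
class PipelineSketch implements TextSelector {
    private final List<TextSelector> stages;

    PipelineSketch(TextSelector... stages) {
        this.stages = Arrays.asList(stages);
    }

    @Override
    public String select(String text) {
        String current = text;
        for (TextSelector stage : stages) {
            if (current == null) {
                return null; // an earlier stage found nothing, so stop here
            }
            current = stage.select(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Stage 1 keeps the text after the first colon; stage 2 trims whitespace.
        TextSelector afterColon = s -> s.contains(":") ? s.substring(s.indexOf(':') + 1) : null;
        TextSelector trimmed = String::trim;
        System.out.println(new PipelineSketch(afterColon, trimmed).select("title:  WebMagic  "));
    }
}
```

In this sketch a null result from any stage short-circuits the rest of the chain, which is one reasonable design choice for an "and" combination.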
 

B

BaiduBaike - Class in us.codecraft.webmagic.example
 
BaiduBaike() - Constructor for class us.codecraft.webmagic.example.BaiduBaike
 
BaiduBaikePageProcessor - Class in us.codecraft.webmagic.processor.example
 
BaiduBaikePageProcessor() - Constructor for class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
 
BaiduNews - Class in us.codecraft.webmagic.model.samples
 
BaiduNews() - Constructor for class us.codecraft.webmagic.model.samples.BaiduNews
 
BaseElementSelector - Class in us.codecraft.webmagic.selector
 
BaseElementSelector() - Constructor for class us.codecraft.webmagic.selector.BaseElementSelector
 
BasicTypeFormatter<T> - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
BasicTypeFormatter.BooleanFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.ByteFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.CharactorFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.DoubleFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.FloatFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.IntegerFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.LongFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.ShortFormatter - Class in us.codecraft.webmagic.model.formatter
 
basicTypeFormatters - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
Blog - Interface in us.codecraft.webmagic.model.samples
 
BloomFilterDuplicateRemover - Class in us.codecraft.webmagic.scheduler
BloomFilterDuplicateRemover for huge number of urls.
BloomFilterDuplicateRemover(int) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
BloomFilterDuplicateRemover(int, double) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
BooleanFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
build() - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
build() - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
 
ByteFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
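
The BloomFilterDuplicateRemover entry in this section targets crawls with a huge number of URLs. The sketch below shows the general Bloom filter technique using only the JDK; it is not the library's code. `BloomDedupSketch` is a hypothetical class, the meaning of its two constructor arguments (expected insertions and target false-positive rate) is an assumption, and the hash function is a simple illustrative choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Simplified Bloom filter for URL de-duplication (illustrative only).
class BloomDedupSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomDedupSketch(int expectedInsertions, double falsePositiveRate) {
        // Standard sizing formulas: m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.
        double ln2 = Math.log(2);
        this.size = Math.max(1,
                (int) Math.ceil(-expectedInsertions * Math.log(falsePositiveRate) / (ln2 * ln2)));
        this.hashCount = Math.max(1, (int) Math.round((double) size / expectedInsertions * ln2));
        this.bits = new BitSet(size);
    }

    // Returns true if the URL was possibly seen before; records it either way.
    boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 17);
        int h2 = hash(data, 31);
        boolean seen = true;
        for (int i = 0; i < hashCount; i++) {
            // Double hashing derives the i-th bit position from two base hashes.
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                seen = false;
                bits.set(index);
            }
        }
        return seen;
    }

    // Simple polynomial hash over the bytes, varied by seed.
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h = h * 131 + b;
        }
        return h;
    }
}
```

A Bloom filter can report a new URL as already seen (a false positive) but never the reverse, so a crawler using this approach may skip a small fraction of pages in exchange for bounded memory.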
 

C

canonicalizeUrl(String, String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
canonicalizeUrl
Borrowed from Jsoup.
CharactorFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
CharsetUtils - Class in us.codecraft.webmagic.utils
 
CharsetUtils() - Constructor for class us.codecraft.webmagic.utils.CharsetUtils
 
checkAndMakeParentDirecotry(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
 
checkIfRunning() - Method in class us.codecraft.webmagic.Spider
 
ClassUtils - Class in us.codecraft.webmagic.utils
 
ClassUtils() - Constructor for class us.codecraft.webmagic.utils.ClassUtils
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
clazz() - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
clazz() - Method in class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
 
clearPipeline() - Method in class us.codecraft.webmagic.Spider
clear the pipelines set
close() - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
 
close() - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
close() - Method in class us.codecraft.webmagic.Spider
 
CODE_200 - Static variable in class us.codecraft.webmagic.utils.HttpConstant.StatusCode
 
CollectorPageModelPipeline<T> - Class in us.codecraft.webmagic.pipeline
 
CollectorPageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
CollectorPipeline<T> - Interface in us.codecraft.webmagic.pipeline
Pipeline that can collect and store results.
combine(MultiPageModel) - Method in class us.codecraft.webmagic.model.samples.News163
 
combine(MultiPageModel) - Method in interface us.codecraft.webmagic.MultiPageModel
Combine multiPageModels to a whole object.
ComboExtract - Annotation Type in us.codecraft.webmagic.model.annotation
Combo 'ExtractBy' extractor with and/or operator.
ComboExtract.Op - Enum in us.codecraft.webmagic.model.annotation
 
ComboExtract.Source - Enum in us.codecraft.webmagic.model.annotation
types of source for extracting.
compareLong(long, long) - Static method in class us.codecraft.webmagic.utils.NumberUtils
 
CompositePageProcessor - Class in us.codecraft.webmagic.handler
 
CompositePageProcessor(Site) - Constructor for class us.codecraft.webmagic.handler.CompositePageProcessor
 
CompositePipeline - Class in us.codecraft.webmagic.handler
 
CompositePipeline() - Constructor for class us.codecraft.webmagic.handler.CompositePipeline
 
ConfigurablePageProcessor - Class in us.codecraft.webmagic.configurable
 
ConfigurablePageProcessor(Site, List<ExtractRule>) - Constructor for class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
CONNECT - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
 
ConsolePageModelPipeline - Class in us.codecraft.webmagic.model
Print page model in console.
Usually used in test.
ConsolePageModelPipeline() - Constructor for class us.codecraft.webmagic.model.ConsolePageModelPipeline
 
ConsolePipeline - Class in us.codecraft.webmagic.pipeline
Write results in console.
Usually used in test.
ConsolePipeline() - Constructor for class us.codecraft.webmagic.pipeline.ConsolePipeline
 
ContentType() - Constructor for class us.codecraft.webmagic.model.HttpRequestBody.ContentType
 
convert(Request, Site, Proxy) - Method in class us.codecraft.webmagic.downloader.HttpUriRequestConverter
 
convertHeaders(Header[]) - Static method in class us.codecraft.webmagic.utils.HttpClientUtils
 
convertToRequests(Collection<String>) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
convertToUrls(Collection<Request>) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
CountableThreadPool - Class in us.codecraft.webmagic.thread
Thread pool for workers.

Uses ExecutorService as its internal implementation.
CountableThreadPool(int) - Constructor for class us.codecraft.webmagic.thread.CountableThreadPool
 
CountableThreadPool(int, ExecutorService) - Constructor for class us.codecraft.webmagic.thread.CountableThreadPool
 
create(Site, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
 
create(Site, PageModelPipeline, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
 
create(String) - Static method in class us.codecraft.webmagic.selector.Html
 
create(String) - Static method in class us.codecraft.webmagic.selector.PlainText
 
create(PageProcessor) - Static method in class us.codecraft.webmagic.Spider
create a spider with pageProcessor.
css(String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
css(String, String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
css(String) - Method in interface us.codecraft.webmagic.selector.Selectable
select list with css selector
css(String, String) - Method in interface us.codecraft.webmagic.selector.Selectable
select list with css selector
CssSelector - Class in us.codecraft.webmagic.selector
CSS selector.
CssSelector(String) - Constructor for class us.codecraft.webmagic.selector.CssSelector
 
CssSelector(String, String) - Constructor for class us.codecraft.webmagic.selector.CssSelector
 
custom(byte[], String, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
 
custom() - Static method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
 
CustomRedirectStrategy - Class in us.codecraft.webmagic.downloader
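Redirect strategy implementation that supports 302 redirects for POST requests. HttpClient's default redirect handling, httpClientBuilder.setRedirectStrategy(new LaxRedirectStrategy()), does not pass along the original request's data in the post/redirect/post case, so this class follows the redirect strategy of the SeimiCrawler project. Original code: https://github.com/zhegexiaohuozi/SeimiCrawler/blob/master/project/src/main/java/cn/wanghaomiao/seimi/http/hc/SeimiRedirectStrategy.java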
CustomRedirectStrategy() - Constructor for class us.codecraft.webmagic.downloader.CustomRedirectStrategy
 
CYCLE_TRIED_TIMES - Static variable in class us.codecraft.webmagic.Request
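
The CountableThreadPool entry in this section describes a worker pool built on an ExecutorService. One way such a pool could track its busy workers is sketched below with JDK classes only. `CountingPoolSketch` and its method names are hypothetical, and blocking the caller while all workers are busy is an assumption of this sketch, not a documented guarantee of the library.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Worker pool that wraps an ExecutorService and counts running tasks (illustrative only).
class CountingPoolSketch {
    private final ExecutorService executor;
    private final Semaphore permits;
    private final AtomicInteger alive = new AtomicInteger();

    CountingPoolSketch(int threadNum) {
        this.executor = Executors.newFixedThreadPool(threadNum);
        this.permits = new Semaphore(threadNum);
    }

    // Number of tasks currently submitted and not yet finished.
    int getThreadAlive() {
        return alive.get();
    }

    // Blocks the caller while all workers are busy, then hands the task to the inner executor.
    void execute(Runnable task) {
        permits.acquireUninterruptibly();
        alive.incrementAndGet();
        try {
            executor.execute(() -> {
                try {
                    task.run();
                } finally {
                    alive.decrementAndGet();
                    permits.release();
                }
            });
        } catch (RuntimeException e) {
            // Submission was rejected, so undo the bookkeeping before rethrowing.
            alive.decrementAndGet();
            permits.release();
            throw e;
        }
    }

    // Stops accepting tasks and waits up to ten seconds for running ones to finish.
    void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because `execute` blocks when the pool is full, a producer loop cannot queue work faster than the workers consume it, which keeps the pending backlog bounded.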
 

D

DateFormatter - Class in us.codecraft.webmagic.model.formatter
 
DateFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.DateFormatter
 
DEFAULT_CLAZZ - Static variable in class us.codecraft.webmagic.utils.MultiKeyMapBase
 
DEFAULT_FORMATTER - Static variable in annotation type us.codecraft.webmagic.model.annotation.Formatter
 
DEFAULT_PATTERN - Static variable in class us.codecraft.webmagic.model.formatter.DateFormatter
 
DelayQueueScheduler - Class in us.codecraft.webmagic.samples.scheduler
 
DelayQueueScheduler(long, TimeUnit) - Constructor for class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
 
DELETE - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
 
destroyWhenExit - Variable in class us.codecraft.webmagic.Spider
 
detectBasicClass(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
detectCharset(String, byte[]) - Static method in class us.codecraft.webmagic.utils.CharsetUtils
 
DiandianBlogProcessor - Class in us.codecraft.webmagic.samples
 
DiandianBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.DiandianBlogProcessor
 
DianpingFtlDataScanner - Class in us.codecraft.webmagic.model.samples
 
DianpingFtlDataScanner() - Constructor for class us.codecraft.webmagic.model.samples.DianpingFtlDataScanner
 
DiaoyuwengProcessor - Class in us.codecraft.webmagic.samples
 
DiaoyuwengProcessor() - Constructor for class us.codecraft.webmagic.samples.DiaoyuwengProcessor
 
DISABLE_HTML_ENTITY_ESCAPE - Static variable in class us.codecraft.webmagic.selector.Html
Deprecated. 
DoubleFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
DoubleKeyMap<K1,K2,V> - Class in us.codecraft.webmagic.utils
 
DoubleKeyMap() - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Map<K1, Map<K2, V>>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Map<K1, Map<K2, V>>, Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
init map with protoMapClass
download(String) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
A simple method to download a url.
download(String, String) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
A simple method to download a url.
download(Request, Task) - Method in interface us.codecraft.webmagic.downloader.Downloader
Downloads web pages and stores them in a Page object.
download(Request, Task) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
 
download(Request, Task) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
download(Request, Task) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
 
Downloader - Interface in us.codecraft.webmagic.downloader
Downloader is the component that downloads web pages and stores them in a Page object.
downloader - Variable in class us.codecraft.webmagic.Spider
 
downloader(Downloader) - Method in class us.codecraft.webmagic.Spider
Deprecated. 
DuplicateRemovedScheduler - Class in us.codecraft.webmagic.scheduler
Removes duplicate URLs and pushes only URLs that have not been seen before.

DuplicateRemovedScheduler() - Constructor for class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
DuplicateRemover - Interface in us.codecraft.webmagic.scheduler.component
Remove duplicate requests.
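
The DuplicateRemovedScheduler and DuplicateRemover entries in this section describe pushing only URLs that are not duplicates. A minimal sketch of that behavior with an exact in-memory set follows. `DedupQueueSketch` and its `push`/`poll` methods are hypothetical names for illustration and do not mirror the library's signatures.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// URL queue that silently drops URLs it has already accepted (illustrative only).
class DedupQueueSketch {
    private final Set<String> seen = new HashSet<>();
    private final Queue<String> queue = new ArrayDeque<>();

    // Enqueues the URL only the first time it is pushed; returns whether it was accepted.
    synchronized boolean push(String url) {
        if (!seen.add(url)) {
            return false; // duplicate, not queued again
        }
        queue.add(url);
        return true;
    }

    // Returns the next URL to crawl, or null when the queue is empty.
    synchronized String poll() {
        return queue.poll();
    }
}
```

An exact set like this grows with every URL seen, which is the trade-off that probabilistic structures such as Bloom filters are meant to address for very large crawls.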

E

ElementSelector - Interface in us.codecraft.webmagic.selector
Selector(extractor) for html elements.
encodeIllegalCharacterInUrl(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
Deprecated. 
equals(Object) - Method in class us.codecraft.webmagic.proxy.Proxy
 
equals(Object) - Method in class us.codecraft.webmagic.Request
 
equals(Object) - Method in class us.codecraft.webmagic.Site
 
execute(Runnable) - Method in class us.codecraft.webmagic.thread.CountableThreadPool
 
executorService - Variable in class us.codecraft.webmagic.Spider
 
exitWhenComplete - Variable in class us.codecraft.webmagic.Spider
 
Experimental - Annotation Type in us.codecraft.webmagic.utils
Marks features that are unstable.
ExpressionType - Enum in us.codecraft.webmagic.configurable
 
extractAndAddRequests(Page, boolean) - Method in class us.codecraft.webmagic.Spider
 
ExtractBy - Annotation Type in us.codecraft.webmagic.model.annotation
Define the extractor for field or class.
ExtractBy.Source - Enum in us.codecraft.webmagic.model.annotation
types of source for extracting.
ExtractBy.Type - Enum in us.codecraft.webmagic.model.annotation
types of extractor expressions
ExtractByUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Defines an extractor that extracts data from the URL of the current page.
ExtractorUtils - Class in us.codecraft.webmagic.utils
Tools for annotation converting.
ExtractorUtils() - Constructor for class us.codecraft.webmagic.utils.ExtractorUtils
 
ExtractRule - Class in us.codecraft.webmagic.configurable
 
ExtractRule() - Constructor for class us.codecraft.webmagic.configurable.ExtractRule
 

F

F58PageProcesser - Class in us.codecraft.webmagic.samples
 
F58PageProcesser() - Constructor for class us.codecraft.webmagic.samples.F58PageProcesser
 
fail() - Static method in class us.codecraft.webmagic.Page
 
FileCacheQueueScheduler - Class in us.codecraft.webmagic.scheduler
Stores URLs and a cursor in files so that a Spider can resume its state after a shutdown.
FileCacheQueueScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
FilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
Store results objects (page models) to files in plain format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
FilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
Creates a FilePageModelPipeline with the default path "/data/webmagic/".
FilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
 
FilePersistentBase - Class in us.codecraft.webmagic.utils
Base object of file persistence.
FilePersistentBase() - Constructor for class us.codecraft.webmagic.utils.FilePersistentBase
 
FilePipeline - Class in us.codecraft.webmagic.pipeline
Store results in files.
FilePipeline() - Constructor for class us.codecraft.webmagic.pipeline.FilePipeline
Creates a FilePipeline with the default path "/data/webmagic/".
FilePipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.FilePipeline
 
fixIllegalCharacterInUrl(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
FloatFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
FORM - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
 
form(Map<String, Object>, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
 
format(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
format(String) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
format(String) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
format(String) - Method in class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
 
Formatter - Annotation Type in us.codecraft.webmagic.model.annotation
Defines how the result string is converted to an object for the field.
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
from(Proxy...) - Static method in class us.codecraft.webmagic.proxy.SimpleProxyProvider
 
from(String) - Static method in class us.codecraft.webmagic.utils.RequestUtils
 
fromValue(int) - Static method in enum us.codecraft.webmagic.Spider.Status
 

G

get(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
get(Page) - Method in class us.codecraft.webmagic.model.PageMapper
 
get(String) - Method in class us.codecraft.webmagic.ResultItems
 
get() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
get() - Method in interface us.codecraft.webmagic.selector.Selectable
single string result
get(String, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(Request, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(String) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(Request) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(String) - Method in class us.codecraft.webmagic.Spider
 
get(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
get(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
GET - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
 
getAcceptStatCode() - Method in class us.codecraft.webmagic.Site
get acceptStatCode
getAll(Page) - Method in class us.codecraft.webmagic.model.PageMapper
 
getAll() - Method in class us.codecraft.webmagic.ResultItems
 
getAll(Collection<String>) - Method in class us.codecraft.webmagic.Spider
Downloads URLs synchronously.
getAllCookies() - Method in class us.codecraft.webmagic.Site
get cookies of all domains
getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getAuthor() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
getAuthor() - Method in class us.codecraft.webmagic.samples.GithubRepo
 
getBody() - Method in class us.codecraft.webmagic.model.HttpRequestBody
 
getBytes() - Method in class us.codecraft.webmagic.Page
 
getCharset() - Method in class us.codecraft.webmagic.Page
 
getCharset() - Method in class us.codecraft.webmagic.Request
 
getCharset() - Method in class us.codecraft.webmagic.Site
get charset set manually
getCharset(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
getClient(Site) - Method in class us.codecraft.webmagic.downloader.HttpClientGenerator
 
getCollected() - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
getCollected() - Method in interface us.codecraft.webmagic.pipeline.CollectorPipeline
Get all results collected.
getCollected() - Method in class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
 
getCollectorPipeline() - Method in class us.codecraft.webmagic.model.OOSpider
 
getCollectorPipeline() - Method in class us.codecraft.webmagic.Spider
 
getContent() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getContent() - Method in interface us.codecraft.webmagic.model.samples.Blog
 
getContent() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
 
getContent() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
 
getContent() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
 
getContentType() - Method in class us.codecraft.webmagic.model.HttpRequestBody
 
getCookies() - Method in class us.codecraft.webmagic.Request
 
getCookies() - Method in class us.codecraft.webmagic.Site
get cookies
getCycleRetryTimes() - Method in class us.codecraft.webmagic.Site
When cycleRetryTimes is greater than 0, a failed request is added back to the scheduler and downloaded again.
getDate() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getDefineFile() - Method in enum us.codecraft.webmagic.scripts.Language
 
getDescription() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
getDescription() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
 
getDocument() - Method in class us.codecraft.webmagic.selector.Html
 
getDomain() - Method in class us.codecraft.webmagic.Site
get domain
getDomain(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
getDuplicateRemover() - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
getElements() - Method in class us.codecraft.webmagic.selector.Html
 
getElements() - Method in class us.codecraft.webmagic.selector.HtmlNode
 
getEncoding() - Method in class us.codecraft.webmagic.model.HttpRequestBody
 
getEngine() - Method in class us.codecraft.webmagic.scripts.ScriptEnginePool
 
getEngineName() - Method in enum us.codecraft.webmagic.scripts.Language
 
getErrorCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getErrorPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getErrorPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getErrorPages() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getErrorPages() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getErrorUrls() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getExpressionParams() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getExpressionType() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getExpressionValue() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getExtra(String) - Method in class us.codecraft.webmagic.Request
 
getExtras() - Method in class us.codecraft.webmagic.Request
 
getFieldName() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getFieldsIncludeSuperClass(Class) - Static method in class us.codecraft.webmagic.utils.ClassUtils
 
getFile(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
 
getFirstNoLoopbackIPAddresses() - Static method in class us.codecraft.webmagic.utils.IPUtils
 
getFirstSourceText() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
getFork() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getFork() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getFork() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
getGatherFile() - Method in enum us.codecraft.webmagic.scripts.Language
 
getHeaders() - Method in class us.codecraft.webmagic.Page
 
getHeaders() - Method in class us.codecraft.webmagic.Request
 
getHeaders() - Method in class us.codecraft.webmagic.Site
 
getHost() - Method in class us.codecraft.webmagic.proxy.Proxy
 
getHost(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
getHtml() - Method in class us.codecraft.webmagic.Page
get html content of page
getHttpClientContext() - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
 
getHttpUriRequest() - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
 
getItemKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getJson() - Method in class us.codecraft.webmagic.Page
get json content of page
getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getLanguage() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
getLeftPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getLeftPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
getLeftRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.MonitorableScheduler
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getMethod() - Method in class us.codecraft.webmagic.Request
The http method of the request.
getName() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
getName() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getName() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getName() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
 
getName() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
getName() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getName() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getName() - Method in class us.codecraft.webmagic.samples.GithubRepo
 
getOtherPages() - Method in class us.codecraft.webmagic.model.samples.News163
 
getOtherPages() - Method in interface us.codecraft.webmagic.MultiPageModel
other pages to be extracted.
It is used to judge whether an object contains more than one page, and whether the pages of the object are all extracted.
getPage(Request) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
getPage() - Method in class us.codecraft.webmagic.model.samples.News163
 
getPage() - Method in interface us.codecraft.webmagic.MultiPageModel
page is the identifier of a page in pages for one object.
getPageCount() - Method in class us.codecraft.webmagic.Spider
Get page count downloaded by spider.
getPageKey() - Method in class us.codecraft.webmagic.model.samples.News163
 
getPageKey() - Method in interface us.codecraft.webmagic.MultiPageModel
Page key is the identifier for the object.
getPagePerSecond() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getPagePerSecond() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getPassword() - Method in class us.codecraft.webmagic.proxy.Proxy
 
getPath() - Method in class us.codecraft.webmagic.utils.FilePersistentBase
 
getPort() - Method in class us.codecraft.webmagic.proxy.Proxy
 
getPriority() - Method in class us.codecraft.webmagic.Request
 
getProxy(Task) - Method in interface us.codecraft.webmagic.proxy.ProxyProvider
Get a proxy for task by some strategy.
getProxy(Task) - Method in class us.codecraft.webmagic.proxy.SimpleProxyProvider
 
getQueueKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getRawText() - Method in class us.codecraft.webmagic.Page
 
getReadme() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getReadme() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
getReadme() - Method in class us.codecraft.webmagic.samples.GithubRepo
 
getRedirect(HttpRequest, HttpResponse, HttpContext) - Method in class us.codecraft.webmagic.downloader.CustomRedirectStrategy
 
getRequest() - Method in class us.codecraft.webmagic.Page
get request of current page
getRequest() - Method in class us.codecraft.webmagic.ResultItems
 
getRequestBody() - Method in class us.codecraft.webmagic.Request
 
getResultItems() - Method in class us.codecraft.webmagic.Page
 
getRetryNum() - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
getRetrySleepTime() - Method in class us.codecraft.webmagic.Site
 
getRetryTimes() - Method in class us.codecraft.webmagic.Site
Get the number of immediate retries when a download fails; 0 by default.
getScheduler() - Method in class us.codecraft.webmagic.Spider
 
getSelector() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getSelector(ExtractBy) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
 
getSelectors(ExtractBy[]) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
 
getSetKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getSite() - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
getSite() - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
getSite() - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
getSite() - Method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
 
getSite() - Method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
 
getSite() - Method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
 
getSite() - Method in interface us.codecraft.webmagic.processor.PageProcessor
get the site settings
getSite() - Method in class us.codecraft.webmagic.processor.SimplePageProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.AngularJSProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.DiandianBlogProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.F58PageProcesser
 
getSite() - Method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.HuxiuProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.KaichibaProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.MamacnPageProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.MeicanProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.NjuBBSProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.QzoneBlogProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.SinaBlogProcessor
 
getSite() - Method in class us.codecraft.webmagic.samples.TianyaPageProcesser
 
getSite() - Method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
 
getSite() - Method in class us.codecraft.webmagic.scripts.ScriptProcessor
 
getSite() - Method in class us.codecraft.webmagic.Spider
 
getSite() - Method in interface us.codecraft.webmagic.Task
site of a task
getSleepTime() - Method in class us.codecraft.webmagic.Site
Get the interval between the processing of two pages.
Time unit is milliseconds.
getSourceTexts() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
getSourceTexts() - Method in class us.codecraft.webmagic.selector.HtmlNode
 
getSourceTexts() - Method in class us.codecraft.webmagic.selector.PlainText
 
getSpiderListeners() - Method in class us.codecraft.webmagic.Spider
 
getSpiderStatusMBean(Spider, SpiderMonitor.MonitorSpiderListener) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
getStar() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getStar() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getStar() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
getStartTime() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getStartTime() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getStartTime() - Method in class us.codecraft.webmagic.Spider
 
getStatus() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getStatus() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getStatus() - Method in class us.codecraft.webmagic.Spider
Get the running status of the spider.
getStatusCode() - Method in class us.codecraft.webmagic.Page
 
getSuccessCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getSuccessPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getSuccessPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTags() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getTags() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
 
getTargetRequests() - Method in class us.codecraft.webmagic.Page
 
getText(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
 
getThread() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getThread() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getThreadAlive() - Method in class us.codecraft.webmagic.Spider
Get the number of threads currently running.
getThreadAlive() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
 
getThreadNum() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
 
getTimeOut() - Method in class us.codecraft.webmagic.Site
 
getTitle() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getTitle() - Method in interface us.codecraft.webmagic.model.samples.Blog
 
getTitle() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
 
getTitle() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
 
getTitle() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
 
getTotalPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getTotalPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
getTotalRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.component.DuplicateRemover
Get TotalRequestsCount for monitor.
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
getTotalRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.MonitorableScheduler
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getUrl() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getUrl() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getUrl() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
getUrl() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
 
getUrl() - Method in class us.codecraft.webmagic.Page
get url of current page
getUrl() - Method in class us.codecraft.webmagic.Request
 
getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
 
getUserAgent() - Method in class us.codecraft.webmagic.Site
get user agent
getUsername() - Method in class us.codecraft.webmagic.proxy.Proxy
 
getUUID() - Method in class us.codecraft.webmagic.Spider
 
getUUID() - Method in interface us.codecraft.webmagic.Task
unique id for a task.
GithubRepo - Class in us.codecraft.webmagic.example
 
GithubRepo() - Constructor for class us.codecraft.webmagic.example.GithubRepo
 
GithubRepo - Class in us.codecraft.webmagic.model.samples
 
GithubRepo() - Constructor for class us.codecraft.webmagic.model.samples.GithubRepo
 
GithubRepo - Class in us.codecraft.webmagic.samples
 
GithubRepo() - Constructor for class us.codecraft.webmagic.samples.GithubRepo
 
GithubRepoApi - Class in us.codecraft.webmagic.example
 
GithubRepoApi() - Constructor for class us.codecraft.webmagic.example.GithubRepoApi
 
GithubRepoPageMapper - Class in us.codecraft.webmagic.example
 
GithubRepoPageMapper() - Constructor for class us.codecraft.webmagic.example.GithubRepoPageMapper
 
GithubRepoPageProcessor - Class in us.codecraft.webmagic.processor.example
 
GithubRepoPageProcessor() - Constructor for class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
 
GithubRepoPageProcessor - Class in us.codecraft.webmagic.samples
 
GithubRepoPageProcessor() - Constructor for class us.codecraft.webmagic.samples.GithubRepoPageProcessor
 

H

handleResponse(Request, String, HttpResponse, Task) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
 
hasAttribute() - Method in class us.codecraft.webmagic.selector.BaseElementSelector
 
hasAttribute() - Method in class us.codecraft.webmagic.selector.CssSelector
 
hasAttribute() - Method in class us.codecraft.webmagic.selector.LinksSelector
 
hasAttribute() - Method in class us.codecraft.webmagic.selector.XpathSelector
 
hashCode() - Method in class us.codecraft.webmagic.proxy.Proxy
 
hashCode() - Method in class us.codecraft.webmagic.Request
 
hashCode() - Method in class us.codecraft.webmagic.Site
 
HashSetDuplicateRemover - Class in us.codecraft.webmagic.scheduler.component
 
HashSetDuplicateRemover() - Constructor for class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
 
HasKey - Interface in us.codecraft.webmagic.model
Interface to be implemented by page models.
Can be used to identify a page model, or be used as name of file storing the object.
HEAD - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
 
Header() - Constructor for class us.codecraft.webmagic.utils.HttpConstant.Header
 
HelpUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define the 'help' url patterns for class.
Html - Class in us.codecraft.webmagic.selector
Selectable html.
Html(String, String) - Constructor for class us.codecraft.webmagic.selector.Html
 
Html(String) - Constructor for class us.codecraft.webmagic.selector.Html
 
Html(Document) - Constructor for class us.codecraft.webmagic.selector.Html
 
HtmlNode - Class in us.codecraft.webmagic.selector
 
HtmlNode(List<Element>) - Constructor for class us.codecraft.webmagic.selector.HtmlNode
 
HtmlNode() - Constructor for class us.codecraft.webmagic.selector.HtmlNode
 
HttpClientDownloader - Class in us.codecraft.webmagic.downloader
The http downloader based on HttpClient.
HttpClientDownloader() - Constructor for class us.codecraft.webmagic.downloader.HttpClientDownloader
 
HttpClientGenerator - Class in us.codecraft.webmagic.downloader
 
HttpClientGenerator() - Constructor for class us.codecraft.webmagic.downloader.HttpClientGenerator
 
HttpClientRequestContext - Class in us.codecraft.webmagic.downloader
 
HttpClientRequestContext() - Constructor for class us.codecraft.webmagic.downloader.HttpClientRequestContext
 
HttpClientUtils - Class in us.codecraft.webmagic.utils
 
HttpClientUtils() - Constructor for class us.codecraft.webmagic.utils.HttpClientUtils
 
HttpConstant - Class in us.codecraft.webmagic.utils
Some constants of the HTTP protocol.
HttpConstant() - Constructor for class us.codecraft.webmagic.utils.HttpConstant
 
HttpConstant.Header - Class in us.codecraft.webmagic.utils
 
HttpConstant.Method - Class in us.codecraft.webmagic.utils
 
HttpConstant.StatusCode - Class in us.codecraft.webmagic.utils
 
HttpRequestBody - Class in us.codecraft.webmagic.model
 
HttpRequestBody() - Constructor for class us.codecraft.webmagic.model.HttpRequestBody
 
HttpRequestBody(byte[], String, String) - Constructor for class us.codecraft.webmagic.model.HttpRequestBody
 
HttpRequestBody.ContentType - Class in us.codecraft.webmagic.model
 
HttpUriRequestConverter - Class in us.codecraft.webmagic.downloader
 
HttpUriRequestConverter() - Constructor for class us.codecraft.webmagic.downloader.HttpUriRequestConverter
 
HuxiuProcessor - Class in us.codecraft.webmagic.samples
 
HuxiuProcessor() - Constructor for class us.codecraft.webmagic.samples.HuxiuProcessor
 

I

InfoQMiniBookProcessor - Class in us.codecraft.webmagic.samples
 
InfoQMiniBookProcessor() - Constructor for class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
 
initComponent() - Method in class us.codecraft.webmagic.Spider
 
INITIAL_CAPACITY - Static variable in class us.codecraft.webmagic.scheduler.PriorityScheduler
 
initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
initParam(String[]) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
initParam(String[]) - Method in class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
 
instance() - Static method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
IntegerFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
IPUtils - Class in us.codecraft.webmagic.utils
 
IPUtils() - Constructor for class us.codecraft.webmagic.utils.IPUtils
 
isBinaryContent() - Method in class us.codecraft.webmagic.Request
 
isDisableCookieManagement() - Method in class us.codecraft.webmagic.Site
 
isDownloadSuccess() - Method in class us.codecraft.webmagic.Page
 
isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
isDuplicate(Request, Task) - Method in interface us.codecraft.webmagic.scheduler.component.DuplicateRemover
Check whether the request is duplicate.
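A minimal sketch of the duplicate check described above, assuming requests are identified by their URL string. The class and method names here are hypothetical stand-ins for illustration and are not the library's own types.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a set-based duplicate remover.
final class DuplicateCheckSketch {
    // Thread-safe set of URLs that have already been offered.
    private final Set<String> seenUrls = ConcurrentHashMap.newKeySet();

    // Returns true when the URL was seen before; records it otherwise.
    public boolean isDuplicate(String url) {
        return !seenUrls.add(url);
    }

    // Number of distinct URLs recorded so far, usable as a monitoring counter.
    public int totalRequestsCount() {
        return seenUrls.size();
    }

    public static void main(String[] args) {
        DuplicateCheckSketch remover = new DuplicateCheckSketch();
        System.out.println(remover.isDuplicate("http://example.com/a"));
        System.out.println(remover.isDuplicate("http://example.com/a"));
        System.out.println(remover.totalRequestsCount());
    }
}
```

Relying on the return value of `Set.add` keeps the check and the insert in a single atomic step, so two threads offering the same URL cannot both see it as new.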
isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
 
isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
isExitWhenComplete() - Method in class us.codecraft.webmagic.Spider
 
isMulti() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
isNotNull() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
isShutdown() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
 
isSkip() - Method in class us.codecraft.webmagic.ResultItems
Whether to skip the result.
Result which is skipped will not be processed by Pipeline.
isSpawnUrl() - Method in class us.codecraft.webmagic.Spider
 
isUseGzip() - Method in class us.codecraft.webmagic.Site
 
IteyeBlog - Class in us.codecraft.webmagic.model.samples
 
IteyeBlog() - Constructor for class us.codecraft.webmagic.model.samples.IteyeBlog
 
IteyeBlogProcessor - Class in us.codecraft.webmagic.samples
 
IteyeBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.IteyeBlogProcessor
 

J

JokejiModel - Class in us.codecraft.webmagic.model.samples
 
JokejiModel() - Constructor for class us.codecraft.webmagic.model.samples.JokejiModel
 
JSON - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
 
json(String, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
 
Json - Class in us.codecraft.webmagic.selector
parse json
Json(List<String>) - Constructor for class us.codecraft.webmagic.selector.Json
 
Json(String) - Constructor for class us.codecraft.webmagic.selector.Json
 
JsonFilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
Store results objects (page models) to files in JSON format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
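The file-naming rule above can be sketched as follows. This is an illustration under stated assumptions: the `HasKey` interface is redeclared locally with the `key()` method listed in this index, the digest is taken over the model's `toString()` output, and the `.json` suffix is assumed; the library may hash a different representation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper illustrating "key if available, otherwise SHA-1".
final class FileNameSketch {
    // Local stand-in for the HasKey interface listed in this index.
    interface HasKey {
        String key();
    }

    // Hex-encoded SHA-1 digest of the UTF-8 bytes of the text.
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 digest is not available", e);
        }
    }

    // Use the model's key when it provides one; fall back to a digest (assumption: of toString()).
    static String fileNameFor(Object model) {
        if (model instanceof HasKey) {
            return ((HasKey) model).key() + ".json";
        }
        return sha1Hex(model.toString()) + ".json";
    }

    public static void main(String[] args) {
        HasKey keyed = () -> "webmagic";
        System.out.println(fileNameFor(keyed));
        System.out.println(fileNameFor("abc"));
    }
}
```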
JsonFilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
new JsonFilePageModelPipeline with default path "/data/webmagic/"
JsonFilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
 
JsonFilePipeline - Class in us.codecraft.webmagic.pipeline
Store results to files in JSON format.
JsonFilePipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
new JsonFilePipeline with default path "/data/webmagic/"
JsonFilePipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
 
jsonPath(String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
jsonPath(String) - Method in class us.codecraft.webmagic.selector.Json
 
jsonPath(String) - Method in interface us.codecraft.webmagic.selector.Selectable
extract by JSON Path expression
JsonPathSelector - Class in us.codecraft.webmagic.selector
JsonPath selector.
Used to extract content from JSON.
JsonPathSelector(String) - Constructor for class us.codecraft.webmagic.selector.JsonPathSelector
 

K

KaichibaProcessor - Class in us.codecraft.webmagic.samples
 
KaichibaProcessor() - Constructor for class us.codecraft.webmagic.samples.KaichibaProcessor
 
key() - Method in class us.codecraft.webmagic.example.GithubRepo
 
key() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
key() - Method in interface us.codecraft.webmagic.model.HasKey
 
key() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
 
Kr36NewsModel - Class in us.codecraft.webmagic.model.samples
 
Kr36NewsModel() - Constructor for class us.codecraft.webmagic.model.samples.Kr36NewsModel
 

L

Language - Enum in us.codecraft.webmagic.scripts
 
language(Language) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
 
LevelLimitScheduler - Class in us.codecraft.webmagic.samples.scheduler
 
LevelLimitScheduler(int) - Constructor for class us.codecraft.webmagic.samples.scheduler.LevelLimitScheduler
 
links() - Method in class us.codecraft.webmagic.selector.HtmlNode
 
links() - Method in class us.codecraft.webmagic.selector.PlainText
 
links() - Method in interface us.codecraft.webmagic.selector.Selectable
select all links
LinksSelector - Class in us.codecraft.webmagic.selector
Links selector based on jsoup.
LinksSelector() - Constructor for class us.codecraft.webmagic.selector.LinksSelector
 
logger - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
logger - Variable in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
logger - Variable in class us.codecraft.webmagic.Spider
 
LongFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 

M

main(String[]) - Static method in class us.codecraft.webmagic.example.AppStore
 
main(String[]) - Static method in class us.codecraft.webmagic.example.BaiduBaike
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepo
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoApi
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
main(String[]) - Static method in class us.codecraft.webmagic.example.MonitorExample
 
main(String[]) - Static method in class us.codecraft.webmagic.example.OschinaBlog
 
main(String...) - Static method in class us.codecraft.webmagic.example.PatternProcessorExample
 
main(String[]) - Static method in class us.codecraft.webmagic.main.QuickStarter
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.BaiduNews
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.DianpingFtlDataScanner
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.GithubRepo
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.IteyeBlog
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.JokejiModel
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.News163
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.OschinaAnswer
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.OschinaBlog
 
main(String[]) - Static method in class us.codecraft.webmagic.model.samples.QQMeishi
 
main(String[]) - Static method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.AngularJSProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.F58PageProcesser
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.HuxiuProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.KaichibaProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.MamacnPageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.MeicanProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.NjuBBSProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.SinaBlogProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
 
main(String[]) - Static method in class us.codecraft.webmagic.scripts.ScriptConsole
 
MamacnPageProcessor - Class in us.codecraft.webmagic.samples
 
MamacnPageProcessor() - Constructor for class us.codecraft.webmagic.samples.MamacnPageProcessor
 
match(Request) - Method in class us.codecraft.webmagic.handler.PatternRequestMatcher
 
match(Request) - Method in interface us.codecraft.webmagic.handler.RequestMatcher
Check whether to process the page.

Please DO NOT change page status in this method.
match() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
match() - Method in interface us.codecraft.webmagic.selector.Selectable
whether any result exists for the selection
me() - Static method in class us.codecraft.webmagic.Site
create a new Site
MeicanProcessor - Class in us.codecraft.webmagic.samples
 
MeicanProcessor() - Constructor for class us.codecraft.webmagic.samples.MeicanProcessor
 
Method() - Constructor for class us.codecraft.webmagic.utils.HttpConstant.Method
 
MonitorableScheduler - Interface in us.codecraft.webmagic.scheduler
The scheduler whose requests can be counted for monitor.
MonitorExample - Class in us.codecraft.webmagic.example
 
MonitorExample() - Constructor for class us.codecraft.webmagic.example.MonitorExample
 
MonitorSpiderListener() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
monitorSpiderListener - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
MultiKeyMapBase - Class in us.codecraft.webmagic.utils
multi-key map, some basic objects
MultiKeyMapBase() - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
 
MultiKeyMapBase(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
 
MultiPageModel - Interface in us.codecraft.webmagic
Extract an object that spans more than one page, such as news and articles.
MultiPagePipeline - Class in us.codecraft.webmagic.pipeline
A pipeline that combines the results from more than one page.
Used for news and articles containing more than one web page.
MultiPagePipeline() - Constructor for class us.codecraft.webmagic.pipeline.MultiPagePipeline
 
MULTIPART - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
 

N

newArrayList(T...) - Static method in class us.codecraft.webmagic.utils.WMCollections
 
newHashSet(T...) - Static method in class us.codecraft.webmagic.utils.WMCollections
 
newMap() - Method in class us.codecraft.webmagic.utils.MultiKeyMapBase
 
News163 - Class in us.codecraft.webmagic.model.samples
 
News163() - Constructor for class us.codecraft.webmagic.model.samples.News163
 
NjuBBSProcessor - Class in us.codecraft.webmagic.samples
 
NjuBBSProcessor() - Constructor for class us.codecraft.webmagic.samples.NjuBBSProcessor
 
nodes() - Method in class us.codecraft.webmagic.selector.HtmlNode
 
nodes() - Method in class us.codecraft.webmagic.selector.PlainText
 
nodes() - Method in interface us.codecraft.webmagic.selector.Selectable
get all nodes
noNeedToRemoveDuplicate(Request) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
NumberUtils - Class in us.codecraft.webmagic.utils
 
NumberUtils() - Constructor for class us.codecraft.webmagic.utils.NumberUtils
 

O

ObjectFormatter<T> - Interface in us.codecraft.webmagic.model.formatter
 
ObjectFormatterBuilder - Class in us.codecraft.webmagic.model.formatter
 
ObjectFormatterBuilder() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
ObjectFormatters - Class in us.codecraft.webmagic.model.formatter
 
ObjectFormatters() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
OneFilePipeline - Class in us.codecraft.webmagic.samples.pipeline
 
OneFilePipeline() - Constructor for class us.codecraft.webmagic.samples.pipeline.OneFilePipeline
 
OneFilePipeline(String) - Constructor for class us.codecraft.webmagic.samples.pipeline.OneFilePipeline
 
onError(Request) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
 
onError(Request) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
onError(Request) - Method in class us.codecraft.webmagic.Spider
 
onError(Request) - Method in interface us.codecraft.webmagic.SpiderListener
 
onSuccess(Request) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
 
onSuccess(Request) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
onSuccess(Request) - Method in class us.codecraft.webmagic.Spider
 
onSuccess(Request) - Method in interface us.codecraft.webmagic.SpiderListener
 
OOSpider<T> - Class in us.codecraft.webmagic.model
The spider for page model extractor.
In webmagic, a POJO containing the extraction result is called a "page model".
OOSpider(ModelPageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
 
OOSpider(PageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
 
OOSpider(Site, PageModelPipeline, Class...) - Constructor for class us.codecraft.webmagic.model.OOSpider
create a spider
or(Selector...) - Static method in class us.codecraft.webmagic.selector.Selectors
 
OrSelector - Class in us.codecraft.webmagic.selector
All extractors run separately,
and their results are combined as the final result.
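A minimal sketch of the "or" combination described above, assuming each extractor maps a text to a list of matches. The `Extractor` type and the `regex` and `selectAny` helpers are hypothetical names for illustration and are not the library's `Selector` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of combining independent extractors.
final class OrSelectorSketch {
    // Stand-in extractor type: text in, list of matches out.
    interface Extractor extends Function<String, List<String>> {
    }

    // Builds an extractor that returns every match of the given regular expression.
    static Extractor regex(String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return text -> {
            List<String> found = new ArrayList<>();
            Matcher matcher = compiled.matcher(text);
            while (matcher.find()) {
                found.add(matcher.group());
            }
            return found;
        };
    }

    // Runs each extractor on the same text and concatenates the results in order.
    static List<String> selectAny(String text, List<Extractor> extractors) {
        List<String> combined = new ArrayList<>();
        for (Extractor extractor : extractors) {
            combined.addAll(extractor.apply(text));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Extractor> extractors = Arrays.asList(regex("\\d+"), regex("[a-z]+="));
        System.out.println(selectAny("id=42 name=spider", extractors));
    }
}
```

Each extractor sees the original text, not the output of the previous one, which is what distinguishes this combination from chaining.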
OrSelector(Selector...) - Constructor for class us.codecraft.webmagic.selector.OrSelector
 
OrSelector(List<Selector>) - Constructor for class us.codecraft.webmagic.selector.OrSelector
 
OschinaAnswer - Class in us.codecraft.webmagic.model.samples
 
OschinaAnswer() - Constructor for class us.codecraft.webmagic.model.samples.OschinaAnswer
 
OschinaBlog - Class in us.codecraft.webmagic.example
 
OschinaBlog() - Constructor for class us.codecraft.webmagic.example.OschinaBlog
 
OschinaBlog - Class in us.codecraft.webmagic.model.samples
 
OschinaBlog() - Constructor for class us.codecraft.webmagic.model.samples.OschinaBlog
 

P

Page - Class in us.codecraft.webmagic
Object storing extracted result and urls to fetch.
Not thread safe.
Main methods:
Page.getUrl() get url of current page
Page.getHtml() get content of current page
Page.putField(String, Object) save extracted result
Page.getResultItems() get extract results to be used in Pipeline
Page.addTargetRequests(java.util.List) Page.addTargetRequest(String) add urls to fetch
Page() - Constructor for class us.codecraft.webmagic.Page
 
PageMapper<T> - Class in us.codecraft.webmagic.model
 
PageMapper(Class<T>) - Constructor for class us.codecraft.webmagic.model.PageMapper
 
PageModelPipeline<T> - Interface in us.codecraft.webmagic.pipeline
Implement PageModelPipeline to persist your page model.
PageProcessor - Interface in us.codecraft.webmagic.processor
Interface to be implemented to customize a crawler.

In PageProcessor, you can customize:
start urls and other settings in Site
how the urls to fetch are detected
how the data are extracted and stored
pageProcessor - Variable in class us.codecraft.webmagic.Spider
 
path - Variable in class us.codecraft.webmagic.utils.FilePersistentBase
 
PATH_SEPERATOR - Static variable in class us.codecraft.webmagic.utils.FilePersistentBase
 
pattern - Variable in class us.codecraft.webmagic.handler.PatternRequestMatcher
match pattern.
PatternProcessor - Class in us.codecraft.webmagic.handler
 
PatternProcessor(String) - Constructor for class us.codecraft.webmagic.handler.PatternProcessor
 
PatternProcessorExample - Class in us.codecraft.webmagic.example
Created with IntelliJ IDEA.
PatternProcessorExample() - Constructor for class us.codecraft.webmagic.example.PatternProcessorExample
 
PatternRequestMatcher - Class in us.codecraft.webmagic.handler
Created with IntelliJ IDEA.
PatternRequestMatcher(String) - Constructor for class us.codecraft.webmagic.handler.PatternRequestMatcher
 
PhantomJSDownloader - Class in us.codecraft.webmagic.downloader
This downloader is used to download pages that require JavaScript rendering.
PhantomJSDownloader() - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
PhantomJSDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
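Adds a new constructor that supports a custom phantomjs command. Examples: phantomjs.exe for Windows environments; phantomjs --ignore-ssl-errors=yes to ignore some errors when the crawled address uses https; /usr/local/bin/phantomjs, the absolute path of the command, to avoid an IOException caused by system environment variables.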
PhantomJSDownloader(String, String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
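Adds a constructor that supports a custom crawl.js path, because when another project depends on this jar, runtime.exec() cannot use the crawl.js inside the jar when it runs the phantomjs command.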
PhantomJSPageProcessor - Class in us.codecraft.webmagic.samples
Created by dolphineor on 2014-11-21.
PhantomJSPageProcessor() - Constructor for class us.codecraft.webmagic.samples.PhantomJSPageProcessor
 
Pipeline - Interface in us.codecraft.webmagic.pipeline
Pipeline is the persistence and offline-processing part of the crawler.
Implement the Pipeline interface to customize how results are persisted.
pipeline(Pipeline) - Method in class us.codecraft.webmagic.Spider
Deprecated. 
pipelines - Variable in class us.codecraft.webmagic.Spider
 
PlainText - Class in us.codecraft.webmagic.selector
Selectable plain text.
Cannot be selected by XPath or CSS selector.
PlainText(List<String>) - Constructor for class us.codecraft.webmagic.selector.PlainText
 
PlainText(String) - Constructor for class us.codecraft.webmagic.selector.PlainText
 
poll(Task) - Method in class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
poll(Task) - Method in interface us.codecraft.webmagic.scheduler.Scheduler
get a url to crawl
pool - Variable in class us.codecraft.webmagic.scheduler.RedisScheduler
 
POST - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
 
PriorityScheduler - Class in us.codecraft.webmagic.scheduler
Priority scheduler.
PriorityScheduler() - Constructor for class us.codecraft.webmagic.scheduler.PriorityScheduler
 
process(Page) - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
process(Page) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.model.ConsolePageModelPipeline
 
process(T, Task) - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.ConsolePipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.FilePageModelPipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.FilePipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.MultiPagePipeline
 
process(T, Task) - Method in interface us.codecraft.webmagic.pipeline.PageModelPipeline
 
process(ResultItems, Task) - Method in interface us.codecraft.webmagic.pipeline.Pipeline
Process extracted results.
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
 
process(Page) - Method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
 
process(Page) - Method in interface us.codecraft.webmagic.processor.PageProcessor
process the page, extract urls to fetch, extract the data and store
process(Page) - Method in class us.codecraft.webmagic.processor.SimplePageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.AngularJSProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.DiandianBlogProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.F58PageProcesser
 
process(Page) - Method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.HuxiuProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.KaichibaProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.MamacnPageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.MeicanProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.NjuBBSProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.samples.pipeline.OneFilePipeline
 
process(Page) - Method in class us.codecraft.webmagic.samples.QzoneBlogProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.SinaBlogProcessor
 
process(Page) - Method in class us.codecraft.webmagic.samples.TianyaPageProcesser
 
process(Page) - Method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.scripts.ScriptProcessor
 
processPage(Page) - Method in interface us.codecraft.webmagic.handler.SubPageProcessor
process the page, extract urls to fetch, extract the data and store
processResult(ResultItems, Task) - Method in interface us.codecraft.webmagic.handler.SubPipeline
process the page, extract urls to fetch, extract the data and store
Proxy - Class in us.codecraft.webmagic.proxy
 
Proxy(String, int) - Constructor for class us.codecraft.webmagic.proxy.Proxy
 
Proxy(String, int, String, String) - Constructor for class us.codecraft.webmagic.proxy.Proxy
 
ProxyProvider - Interface in us.codecraft.webmagic.proxy
Proxy provider.
ProxyUtils - Class in us.codecraft.webmagic.utils
Pooled Proxy Object
ProxyUtils() - Constructor for class us.codecraft.webmagic.utils.ProxyUtils
 
push(Request, Task) - Method in class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
 
push(Request, Task) - Method in class us.codecraft.webmagic.samples.scheduler.LevelLimitScheduler
 
push(Request, Task) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
push(Request, Task) - Method in interface us.codecraft.webmagic.scheduler.Scheduler
add a url to fetch
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
put(Class<? extends ObjectFormatter>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
put(String, T) - Method in class us.codecraft.webmagic.ResultItems
 
put(K1, Map<K2, V>) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
put(K1, K2, V) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
PUT - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
 
putExtra(String, Object) - Method in class us.codecraft.webmagic.Request
 
putField(String, Object) - Method in class us.codecraft.webmagic.Page
store extracted results

Q

QQMeishi - Class in us.codecraft.webmagic.model.samples
 
QQMeishi() - Constructor for class us.codecraft.webmagic.model.samples.QQMeishi
 
QueueScheduler - Class in us.codecraft.webmagic.scheduler
Basic Scheduler implementation.
Store urls to fetch in LinkedBlockingQueue and remove duplicate urls by HashMap.
QueueScheduler() - Constructor for class us.codecraft.webmagic.scheduler.QueueScheduler
 
QuickStarter - Class in us.codecraft.webmagic.main
 
QuickStarter() - Constructor for class us.codecraft.webmagic.main.QuickStarter
 
QzoneBlogProcessor - Class in us.codecraft.webmagic.samples
 
QzoneBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.QzoneBlogProcessor
 
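The QueueScheduler entry above describes keeping urls to fetch in a LinkedBlockingQueue and dropping duplicates. The sketch below illustrates that idea using only JDK classes. It is not the WebMagic source: the class name DedupQueueSketch and its push/poll signatures are hypothetical, and a concurrent set stands in for the duplicate check.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not the WebMagic QueueScheduler implementation.
class DedupQueueSketch {
    // Urls waiting to be fetched, in insertion order.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Urls that have already been pushed once.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Enqueue the url only if it has not been pushed before.
    boolean push(String url) {
        if (seen.add(url)) {
            queue.add(url);
            return true;
        }
        return false;
    }

    // Return the next url, or null when nothing is queued.
    String poll() {
        return queue.poll();
    }

    public static void main(String[] args) {
        DedupQueueSketch scheduler = new DedupQueueSketch();
        scheduler.push("http://example.com/a");
        scheduler.push("http://example.com/a"); // duplicate, ignored
        scheduler.push("http://example.com/b");
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll());
    }
}
```

Separating the duplicate check from the queue mirrors the split between DuplicateRemovedScheduler and DuplicateRemover listed in this index, although the real classes operate on Request and Task rather than plain strings.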

R

rebuildBloomFilter() - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
RedisPriorityScheduler - Class in us.codecraft.webmagic.scheduler
the redis scheduler with priority
RedisPriorityScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
RedisPriorityScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
RedisScheduler - Class in us.codecraft.webmagic.scheduler
Use Redis as url scheduler for distributed crawlers.
RedisScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
 
RedisScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
 
REFERER - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Header
 
regex(String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
regex(String, int) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
regex(String) - Method in interface us.codecraft.webmagic.selector.Selectable
select list with regex, default group is group 1
regex(String, int) - Method in interface us.codecraft.webmagic.selector.Selectable
select list with regex
regex(String) - Static method in class us.codecraft.webmagic.selector.Selectors
 
regex(String, int) - Static method in class us.codecraft.webmagic.selector.Selectors
 
RegexSelector - Class in us.codecraft.webmagic.selector
Selector in regex.
RegexSelector(String, int) - Constructor for class us.codecraft.webmagic.selector.RegexSelector
 
RegexSelector(String) - Constructor for class us.codecraft.webmagic.selector.RegexSelector
Create a RegexSelector.
register(Spider...) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
Register spider for monitor.
registerMBean(SpiderStatusMXBean) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
release(ScriptEngine) - Method in class us.codecraft.webmagic.scripts.ScriptEnginePool
 
remove(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
remove(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
removePadding(String) - Method in class us.codecraft.webmagic.selector.Json
remove padding for JSONP
removePort(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
removeProtocol(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
 
replace(String, String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
replace(String, String) - Method in interface us.codecraft.webmagic.selector.Selectable
replace with regex
ReplacePipeline - Class in us.codecraft.webmagic.samples.pipeline
 
ReplacePipeline() - Constructor for class us.codecraft.webmagic.samples.pipeline.ReplacePipeline
 
ReplaceSelector - Class in us.codecraft.webmagic.selector
Replace selector.
ReplaceSelector(String, String) - Constructor for class us.codecraft.webmagic.selector.ReplaceSelector
 
Request - Class in us.codecraft.webmagic
Object containing the url to crawl.
It contains some additional information.
Request() - Constructor for class us.codecraft.webmagic.Request
 
Request(String) - Constructor for class us.codecraft.webmagic.Request
 
RequestMatcher - Interface in us.codecraft.webmagic.handler
 
RequestMatcher.MatchOther - Enum in us.codecraft.webmagic.handler
 
RequestUtils - Class in us.codecraft.webmagic.utils
 
RequestUtils() - Constructor for class us.codecraft.webmagic.utils.RequestUtils
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
resetDuplicateCheck(Task) - Method in interface us.codecraft.webmagic.scheduler.component.DuplicateRemover
Reset duplicate check.
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
ResultItems - Class in us.codecraft.webmagic
Object containing extracted results.
It is contained in Page and will be processed in pipeline.
ResultItems() - Constructor for class us.codecraft.webmagic.ResultItems
 
ResultItemsCollectorPipeline - Class in us.codecraft.webmagic.pipeline
 
ResultItemsCollectorPipeline() - Constructor for class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
 
returnProxy(Proxy, Page, Task) - Method in interface us.codecraft.webmagic.proxy.ProxyProvider
Return the proxy to the provider when a download completes.
returnProxy(Proxy, Page, Task) - Method in class us.codecraft.webmagic.proxy.SimpleProxyProvider
 
run() - Method in class us.codecraft.webmagic.Spider
 
runAsync() - Method in class us.codecraft.webmagic.Spider
 
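The regex(String) and regex(String, int) entries above state that regex selection returns a list and that the default group is group 1. The sketch below shows that convention with java.util.regex only. The class RegexGroupSketch and its selectList methods are hypothetical names chosen for illustration, and the sketch assumes the pattern contains the requested capture group; how WebMagic's RegexSelector treats patterns without a group is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the WebMagic RegexSelector implementation.
class RegexGroupSketch {

    // Collect the given capture group from every match in the text.
    static List<String> selectList(String regex, int group, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            results.add(matcher.group(group));
        }
        return results;
    }

    // Same as above, defaulting to capture group 1.
    static List<String> selectList(String regex, String text) {
        return selectList(regex, 1, text);
    }

    public static void main(String[] args) {
        String text = "/repo/alpha /repo/beta";
        // Group 1 keeps only the part inside the parentheses.
        System.out.println(selectList("/repo/(\\w+)", text));
        // Group 0 keeps the whole match.
        System.out.println(selectList("/repo/(\\w+)", 0, text));
    }
}
```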

S

Scheduler - Interface in us.codecraft.webmagic.scheduler
Scheduler is the part of url management.
You can implement the Scheduler interface to:
manage urls to fetch
remove duplicate urls
scheduler - Variable in class us.codecraft.webmagic.Spider
 
scheduler(Scheduler) - Method in class us.codecraft.webmagic.Spider
Deprecated.
script(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
 
ScriptConsole - Class in us.codecraft.webmagic.scripts
 
ScriptConsole() - Constructor for class us.codecraft.webmagic.scripts.ScriptConsole
 
ScriptEnginePool - Class in us.codecraft.webmagic.scripts
 
ScriptEnginePool(Language, int) - Constructor for class us.codecraft.webmagic.scripts.ScriptEnginePool
 
scriptFromClassPathFile(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
 
scriptFromFile(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
 
ScriptProcessor - Class in us.codecraft.webmagic.scripts
 
ScriptProcessor(Language, String, int) - Constructor for class us.codecraft.webmagic.scripts.ScriptProcessor
 
ScriptProcessorBuilder - Class in us.codecraft.webmagic.scripts
 
select(Selector, List<String>) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
select(Selector) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
select(String) - Method in class us.codecraft.webmagic.selector.AndSelector
 
select(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
 
select(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
 
select(Element) - Method in interface us.codecraft.webmagic.selector.ElementSelector
Extract single result in text.
If there is more than one result, only the first will be chosen.
select(Selector) - Method in class us.codecraft.webmagic.selector.HtmlNode
 
select(String) - Method in class us.codecraft.webmagic.selector.JsonPathSelector
 
select(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
 
select(String) - Method in class us.codecraft.webmagic.selector.OrSelector
 
select(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
 
select(String) - Method in class us.codecraft.webmagic.selector.ReplaceSelector
 
select(Selector) - Method in interface us.codecraft.webmagic.selector.Selectable
extract by custom selector
select(String) - Method in interface us.codecraft.webmagic.selector.Selector
Extract single result in text.
If there is more than one result, only the first will be chosen.
select(String) - Method in class us.codecraft.webmagic.selector.SmartContentSelector
 
select(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
 
select(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
 
Selectable - Interface in us.codecraft.webmagic.selector
Selectable text.
selectDocument(Selector) - Method in class us.codecraft.webmagic.selector.Html
 
selectDocumentForList(Selector) - Method in class us.codecraft.webmagic.selector.Html
 
selectElement(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
 
selectElement(Element) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
 
selectElement(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
 
selectElement(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
 
selectElement(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
 
selectElements(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
 
selectElements(Element) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
 
selectElements(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
 
selectElements(BaseElementSelector) - Method in class us.codecraft.webmagic.selector.HtmlNode
select elements
selectElements(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
 
selectElements(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
 
selectGroup(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
 
selectGroupList(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
 
selectList(Selector, List<String>) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
selectList(Selector) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
selectList(String) - Method in class us.codecraft.webmagic.selector.AndSelector
 
selectList(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
 
selectList(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
 
selectList(Element) - Method in interface us.codecraft.webmagic.selector.ElementSelector
Extract all results in text.
selectList(Selector) - Method in class us.codecraft.webmagic.selector.HtmlNode
 
selectList(String) - Method in class us.codecraft.webmagic.selector.JsonPathSelector
 
selectList(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
 
selectList(String) - Method in class us.codecraft.webmagic.selector.OrSelector
 
selectList(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
 
selectList(String) - Method in class us.codecraft.webmagic.selector.ReplaceSelector
 
selectList(Selector) - Method in interface us.codecraft.webmagic.selector.Selectable
extract by custom selector
selectList(String) - Method in interface us.codecraft.webmagic.selector.Selector
Extract all results in text.
selectList(String) - Method in class us.codecraft.webmagic.selector.SmartContentSelector
 
selectList(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
 
selectList(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
 
Selector - Interface in us.codecraft.webmagic.selector
Selector(extractor) for text.
Selectors - Class in us.codecraft.webmagic.selector
Convenient methods for selectors.
Selectors() - Constructor for class us.codecraft.webmagic.selector.Selectors
 
SeleniumDownloader - Class in us.codecraft.webmagic.downloader.selenium
Uses Selenium to drive a browser for rendering. Currently only Chrome is supported.
Requires downloading the Selenium driver.
SeleniumDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
Create a new instance.
SeleniumDownloader() - Constructor for class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
Constructor without any field.
setAcceptStatCode(Set<Integer>) - Method in class us.codecraft.webmagic.Site
Set acceptStatCode.
When status code of http response is in acceptStatCodes, it will be processed.
{200} by default.
It does not need to be set.
setAuthor(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
 
setBinaryContent(boolean) - Method in class us.codecraft.webmagic.Request
 
setBody(byte[]) - Method in class us.codecraft.webmagic.model.HttpRequestBody
 
setBytes(byte[]) - Method in class us.codecraft.webmagic.Page
 
setCharset(String) - Method in class us.codecraft.webmagic.Page
 
setCharset(String) - Method in class us.codecraft.webmagic.Request
 
setCharset(String) - Method in class us.codecraft.webmagic.Site
Set charset of page manually.
When charset is not set or set to null, it can be auto detected by Http header.
setContentType(String) - Method in class us.codecraft.webmagic.model.HttpRequestBody
 
setCycleRetryTimes(int) - Method in class us.codecraft.webmagic.Site
Set the number of cycle retries when a download fails; 0 by default.
setDisableCookieManagement(boolean) - Method in class us.codecraft.webmagic.Site
Downloader is supposed to store response cookie.
setDomain(String) - Method in class us.codecraft.webmagic.Site
set the domain of site.
setDownloader(Downloader) - Method in class us.codecraft.webmagic.Spider
set the downloader of spider
setDownloadSuccess(boolean) - Method in class us.codecraft.webmagic.Page
 
setDuplicateRemover(DuplicateRemover) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
setEmptySleepTime(int) - Method in class us.codecraft.webmagic.Spider
Set wait time when no url is polled.

setEncoding(String) - Method in class us.codecraft.webmagic.model.HttpRequestBody
 
setExecutorService(ExecutorService) - Method in class us.codecraft.webmagic.Spider
 
setExecutorService(ExecutorService) - Method in class us.codecraft.webmagic.thread.CountableThreadPool
 
setExitWhenComplete(boolean) - Method in class us.codecraft.webmagic.Spider
Exit when complete.
setExpressionParams(String[]) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setExpressionType(ExpressionType) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setExpressionValue(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setExtras(Map<String, Object>) - Method in class us.codecraft.webmagic.Request
 
setField(Field) - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
setFieldName(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setHeaders(Map<String, List<String>>) - Method in class us.codecraft.webmagic.Page
 
setHtml(Html) - Method in class us.codecraft.webmagic.Page
Deprecated.
Since 0.4.0, the html is parsed only on the first call to Page.getHtml(), so use Page.setRawText(String) instead.
setHttpClientContext(HttpClientContext) - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
 
setHttpUriRequest(HttpUriRequest) - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
 
setHttpUriRequestConverter(HttpUriRequestConverter) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
 
setIsExtractLinks(boolean) - Method in class us.codecraft.webmagic.model.OOSpider
 
setMethod(String) - Method in class us.codecraft.webmagic.Request
 
setMulti(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setName(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
 
setNotNull(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setPath(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
 
setPipelines(List<Pipeline>) - Method in class us.codecraft.webmagic.Spider
set pipelines for Spider
setPoolSize(int) - Method in class us.codecraft.webmagic.downloader.HttpClientGenerator
 
setPriority(long) - Method in class us.codecraft.webmagic.Request
Set the priority of request for sorting.
Requires a scheduler that supports priority.
setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
 
setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
setRawText(String) - Method in class us.codecraft.webmagic.Page
 
setReadme(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
 
setRequest(Request) - Method in class us.codecraft.webmagic.Page
 
setRequest(Request) - Method in class us.codecraft.webmagic.ResultItems
 
setRequestBody(HttpRequestBody) - Method in class us.codecraft.webmagic.Request
 
setRetryNum(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
setRetrySleepTime(int) - Method in class us.codecraft.webmagic.Site
Set the sleep time between retries when a download fails; 1000 by default.
setRetryTimes(int) - Method in class us.codecraft.webmagic.Site
Set the number of retries when a download fails; 0 by default.
setScheduler(Scheduler) - Method in class us.codecraft.webmagic.Spider
set scheduler for Spider
setSelector(Selector) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setSite(Site) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
setSkip(boolean) - Method in class us.codecraft.webmagic.Page
 
setSkip(boolean) - Method in class us.codecraft.webmagic.ResultItems
Set whether to skip the result.
Result which is skipped will not be processed by Pipeline.
setSleepTime(int) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
set sleep time to wait until load success
setSleepTime(int) - Method in class us.codecraft.webmagic.Site
Set the interval between the processing of two pages.
Time unit is milliseconds.
setSpawnUrl(boolean) - Method in class us.codecraft.webmagic.Spider
Whether to add extracted urls to the download queue.
When true, extracted urls are added for download; when false, only the seed urls are downloaded.
setSpiderListeners(List<SpiderListener>) - Method in class us.codecraft.webmagic.Spider
 
setStatusCode(int) - Method in class us.codecraft.webmagic.Page
 
setSubPageProcessors(SubPageProcessor...) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
setSubPipeline(SubPipeline...) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
setThread(int) - Method in interface us.codecraft.webmagic.downloader.Downloader
Tell the downloader how many threads the spider used.
setThread(int) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
 
setThread(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
setThread(int) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
 
setTimeOut(int) - Method in class us.codecraft.webmagic.Site
set timeout for downloader in ms
setUrl(Selectable) - Method in class us.codecraft.webmagic.Page
 
setUrl(String) - Method in class us.codecraft.webmagic.Request
 
setUseGzip(boolean) - Method in class us.codecraft.webmagic.Site
Whether to use gzip.
setUserAgent(String) - Method in class us.codecraft.webmagic.Site
set user agent
setUUID(String) - Method in class us.codecraft.webmagic.Spider
Set a uuid for the spider.
Default uuid is domain of site.
ShortFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
shouldReserved(Request) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
 
shutdown() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
 
SimpleHttpClient - Class in us.codecraft.webmagic
 
SimpleHttpClient() - Constructor for class us.codecraft.webmagic.SimpleHttpClient
 
SimpleHttpClient(Site) - Constructor for class us.codecraft.webmagic.SimpleHttpClient
 
SimplePageProcessor - Class in us.codecraft.webmagic.processor
A simple PageProcessor.
SimplePageProcessor(String) - Constructor for class us.codecraft.webmagic.processor.SimplePageProcessor
 
SimpleProxyProvider - Class in us.codecraft.webmagic.proxy
A simple ProxyProvider.
SimpleProxyProvider(List<Proxy>) - Constructor for class us.codecraft.webmagic.proxy.SimpleProxyProvider
 
SinaBlogProcessor - Class in us.codecraft.webmagic.samples
 
SinaBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.SinaBlogProcessor
 
Site - Class in us.codecraft.webmagic
Object containing settings for the crawler.
Site() - Constructor for class us.codecraft.webmagic.Site
 
site - Variable in class us.codecraft.webmagic.Spider
 
sleep(int) - Method in class us.codecraft.webmagic.Spider
 
smartContent() - Method in class us.codecraft.webmagic.selector.HtmlNode
 
smartContent() - Method in class us.codecraft.webmagic.selector.PlainText
 
smartContent() - Method in interface us.codecraft.webmagic.selector.Selectable
select smart content with ReadAbility algorithm
smartContent() - Static method in class us.codecraft.webmagic.selector.Selectors
 
SmartContentSelector - Class in us.codecraft.webmagic.selector
Borrowed from https://code.google.com/p/cx-extractor/
SmartContentSelector() - Constructor for class us.codecraft.webmagic.selector.SmartContentSelector
 
sourceTexts - Variable in class us.codecraft.webmagic.selector.PlainText
 
spawnUrl - Variable in class us.codecraft.webmagic.Spider
 
spider - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
Spider - Class in us.codecraft.webmagic
Entrance of a crawler.
A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline.
Every module is a field of Spider.
Spider(PageProcessor) - Constructor for class us.codecraft.webmagic.Spider
create a spider with pageProcessor.
Spider.Status - Enum in us.codecraft.webmagic
 
SpiderListener - Interface in us.codecraft.webmagic
Listener of Spider on page processing.
SpiderMonitor - Class in us.codecraft.webmagic.monitor
 
SpiderMonitor() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor
 
SpiderMonitor.MonitorSpiderListener - Class in us.codecraft.webmagic.monitor
 
SpiderStatus - Class in us.codecraft.webmagic.monitor
 
SpiderStatus(Spider, SpiderMonitor.MonitorSpiderListener) - Constructor for class us.codecraft.webmagic.monitor.SpiderStatus
 
SpiderStatusMXBean - Interface in us.codecraft.webmagic.monitor
 
start() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
start() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
start() - Method in class us.codecraft.webmagic.Spider
 
startRequest(List<Request>) - Method in class us.codecraft.webmagic.Spider
Set startUrls of Spider.
Takes priority over startUrls of Site.
startRequests - Variable in class us.codecraft.webmagic.Spider
 
startUrls(List<String>) - Method in class us.codecraft.webmagic.Spider
Set startUrls of Spider.
Takes priority over startUrls of Site.
stat - Variable in class us.codecraft.webmagic.Spider
 
STAT_INIT - Static variable in class us.codecraft.webmagic.Spider
 
STAT_RUNNING - Static variable in class us.codecraft.webmagic.Spider
 
STAT_STOPPED - Static variable in class us.codecraft.webmagic.Spider
 
StatusCode() - Constructor for class us.codecraft.webmagic.utils.HttpConstant.StatusCode
 
stop() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
stop() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
stop() - Method in class us.codecraft.webmagic.Spider
 
StringTemplateFormatter - Class in us.codecraft.webmagic.samples.formatter
 
StringTemplateFormatter() - Constructor for class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
 
SubPageProcessor - Interface in us.codecraft.webmagic.handler
 
SubPipeline - Interface in us.codecraft.webmagic.handler
 
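The setPriority(long) entry above notes that request priority only has an effect with a scheduler that supports it. The sketch below shows one way such ordering could work, using a JDK PriorityBlockingQueue. It is not the WebMagic PriorityScheduler: the class PriorityQueueSketch, the nested Item type, and the choice that larger priority values are polled first are all assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical illustration only; not the WebMagic PriorityScheduler implementation.
class PriorityQueueSketch {

    // A url paired with its priority value.
    static final class Item {
        final String url;
        final long priority;

        Item(String url, long priority) {
            this.url = url;
            this.priority = priority;
        }
    }

    // Assumption: larger priority values are served first, hence the reversed comparator.
    private final PriorityBlockingQueue<Item> queue = new PriorityBlockingQueue<>(
            11, Comparator.comparingLong((Item item) -> item.priority).reversed());

    void push(String url, long priority) {
        queue.add(new Item(url, priority));
    }

    // Return the url with the highest priority, or null when nothing is queued.
    String poll() {
        Item item = queue.poll();
        return item == null ? null : item.url;
    }

    public static void main(String[] args) {
        PriorityQueueSketch scheduler = new PriorityQueueSketch();
        scheduler.push("http://example.com/low", 1);
        scheduler.push("http://example.com/high", 5);
        scheduler.push("http://example.com/mid", 3);
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll());
    }
}
```

A plain FIFO queue would ignore the priority value entirely, which is why the index entry points to a priority-aware scheduler.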

T

TargetUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define the url patterns for class.
Task - Interface in us.codecraft.webmagic
Interface for identifying different tasks.
test(String...) - Method in class us.codecraft.webmagic.Spider
Process specific urls without url discovery.
thread(int) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
 
thread(int) - Method in class us.codecraft.webmagic.Spider
start with more than one thread
thread(ExecutorService, int) - Method in class us.codecraft.webmagic.Spider
start with more than one thread
threadNum - Variable in class us.codecraft.webmagic.Spider
 
threadPool - Variable in class us.codecraft.webmagic.Spider
 
TianyaPageProcesser - Class in us.codecraft.webmagic.samples
 
TianyaPageProcesser() - Constructor for class us.codecraft.webmagic.samples.TianyaPageProcesser
 
toList(Class<T>) - Method in class us.codecraft.webmagic.selector.Json
 
toObject(Class<T>) - Method in class us.codecraft.webmagic.selector.Json
 
toString() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
toString() - Method in class us.codecraft.webmagic.example.GithubRepo
 
toString() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
 
toString() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
 
toString() - Method in class us.codecraft.webmagic.model.samples.News163
 
toString() - Method in class us.codecraft.webmagic.Page
 
toString() - Method in class us.codecraft.webmagic.proxy.Proxy
 
toString() - Method in class us.codecraft.webmagic.Request
 
toString() - Method in class us.codecraft.webmagic.ResultItems
 
toString() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
 
toString() - Method in class us.codecraft.webmagic.selector.RegexSelector
 
toString() - Method in class us.codecraft.webmagic.selector.ReplaceSelector
 
toString() - Method in interface us.codecraft.webmagic.selector.Selectable
single string result
toString() - Method in class us.codecraft.webmagic.Site
 
toTask() - Method in class us.codecraft.webmagic.Site
 
TRACE - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
 

U

URL_LIST - Static variable in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
 
URL_LIST - Static variable in class us.codecraft.webmagic.samples.SinaBlogProcessor
 
URL_POST - Static variable in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
 
URL_POST - Static variable in class us.codecraft.webmagic.samples.SinaBlogProcessor
 
UrlUtils - Class in us.codecraft.webmagic.utils
URL and HTML utils.
UrlUtils() - Constructor for class us.codecraft.webmagic.utils.UrlUtils
 
us.codecraft.webmagic - package us.codecraft.webmagic
Main class "Spider" and models.
us.codecraft.webmagic.configurable - package us.codecraft.webmagic.configurable
 
us.codecraft.webmagic.downloader - package us.codecraft.webmagic.downloader
Downloader is the part that downloads web pages and stores them in a Page object.
us.codecraft.webmagic.downloader.selenium - package us.codecraft.webmagic.downloader.selenium
 
us.codecraft.webmagic.example - package us.codecraft.webmagic.example
 
us.codecraft.webmagic.handler - package us.codecraft.webmagic.handler
 
us.codecraft.webmagic.main - package us.codecraft.webmagic.main
 
us.codecraft.webmagic.model - package us.codecraft.webmagic.model
Page model and annotations used to customize a crawler.
us.codecraft.webmagic.model.annotation - package us.codecraft.webmagic.model.annotation
Annotations for defining an extractor.
us.codecraft.webmagic.model.formatter - package us.codecraft.webmagic.model.formatter
 
us.codecraft.webmagic.model.samples - package us.codecraft.webmagic.model.samples
 
us.codecraft.webmagic.monitor - package us.codecraft.webmagic.monitor
 
us.codecraft.webmagic.pipeline - package us.codecraft.webmagic.pipeline
Pipeline is the part of the crawler that handles persistence and offline processing.
us.codecraft.webmagic.processor - package us.codecraft.webmagic.processor
PageProcessor is the customizable part of a crawler for a specific site.
us.codecraft.webmagic.processor.example - package us.codecraft.webmagic.processor.example
 
us.codecraft.webmagic.proxy - package us.codecraft.webmagic.proxy
 
us.codecraft.webmagic.samples - package us.codecraft.webmagic.samples
 
us.codecraft.webmagic.samples.formatter - package us.codecraft.webmagic.samples.formatter
 
us.codecraft.webmagic.samples.pipeline - package us.codecraft.webmagic.samples.pipeline
 
us.codecraft.webmagic.samples.scheduler - package us.codecraft.webmagic.samples.scheduler
 
us.codecraft.webmagic.scheduler - package us.codecraft.webmagic.scheduler
Scheduler is the part that manages URLs.
us.codecraft.webmagic.scheduler.component - package us.codecraft.webmagic.scheduler.component
Component of scheduler.
us.codecraft.webmagic.scripts - package us.codecraft.webmagic.scripts
 
us.codecraft.webmagic.selector - package us.codecraft.webmagic.selector
Selectors for page extraction.
us.codecraft.webmagic.thread - package us.codecraft.webmagic.thread
 
us.codecraft.webmagic.utils - package us.codecraft.webmagic.utils
Static utils of webmagic.
USER_AGENT - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Header
 
uuid - Variable in class us.codecraft.webmagic.Spider
 

V

validateProxy(Proxy) - Static method in class us.codecraft.webmagic.utils.ProxyUtils
 
valueOf(String) - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.scripts.Language
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.Spider.Status
Returns the enum constant of this type with the specified name.
values() - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.scripts.Language
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.Spider.Status
Returns an array containing the constants of this enum type, in the order they are declared.

W

WMCollections - Class in us.codecraft.webmagic.utils
 
WMCollections() - Constructor for class us.codecraft.webmagic.utils.WMCollections
 

X

XML - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
 
xml(String, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
 
xpath(String) - Method in class us.codecraft.webmagic.selector.HtmlNode
 
xpath(String) - Method in class us.codecraft.webmagic.selector.PlainText
 
xpath(String) - Method in interface us.codecraft.webmagic.selector.Selectable
select list with xpath
xpath(String) - Static method in class us.codecraft.webmagic.selector.Selectors
 
Xpath2Selector - Class in us.codecraft.webmagic.selector
Selector that supports XPath 2.0. Wraps HtmlCleaner and Saxon HE.
Xpath2Selector(String) - Constructor for class us.codecraft.webmagic.selector.Xpath2Selector
 
XpathSelector - Class in us.codecraft.webmagic.selector
XPath selector based on Xsoup.
XpathSelector(String) - Constructor for class us.codecraft.webmagic.selector.XpathSelector
 
xsoup(String) - Static method in class us.codecraft.webmagic.selector.Selectors
Deprecated.

Z

ZhihuPageProcessor - Class in us.codecraft.webmagic.processor.example
 
ZhihuPageProcessor() - Constructor for class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
 
ZhihuPageProcessor - Class in us.codecraft.webmagic.samples
 
ZhihuPageProcessor() - Constructor for class us.codecraft.webmagic.samples.ZhihuPageProcessor
 
ZipCodePageProcessor - Class in us.codecraft.webmagic.samples.scheduler
 
ZipCodePageProcessor() - Constructor for class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
 

Copyright © 2017. All rights reserved.