- get(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
-
- get(Page) - Method in class us.codecraft.webmagic.model.PageMapper
-
- get(String) - Method in class us.codecraft.webmagic.ResultItems
-
- get() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
-
- get() - Method in interface us.codecraft.webmagic.selector.Selectable
-
Get a single string result.
- get(String, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
-
- get(Request, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
-
- get(String) - Method in class us.codecraft.webmagic.SimpleHttpClient
-
- get(Request) - Method in class us.codecraft.webmagic.SimpleHttpClient
-
- get(String) - Method in class us.codecraft.webmagic.Spider
-
- get(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
-
- get(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
-
- GET - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
-
- getAcceptStatCode() - Method in class us.codecraft.webmagic.Site
-
Get acceptStatCode, the set of HTTP status codes that are accepted.
- getAll(Page) - Method in class us.codecraft.webmagic.model.PageMapper
-
- getAll() - Method in class us.codecraft.webmagic.ResultItems
-
- getAll(Collection<String>) - Method in class us.codecraft.webmagic.Spider
-
Download URLs synchronously.
- getAllCookies() - Method in class us.codecraft.webmagic.Site
-
Get the cookies of all domains.
- getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepo
-
- getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepoApi
-
- getAuthor() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
-
- getAuthor() - Method in class us.codecraft.webmagic.samples.GithubRepo
-
- getBody() - Method in class us.codecraft.webmagic.model.HttpRequestBody
-
- getBytes() - Method in class us.codecraft.webmagic.Page
-
- getCharset() - Method in class us.codecraft.webmagic.Page
-
- getCharset() - Method in class us.codecraft.webmagic.Request
-
- getCharset() - Method in class us.codecraft.webmagic.Site
-
Get the charset that was set manually.
- getCharset(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
-
- getClient(Site) - Method in class us.codecraft.webmagic.downloader.HttpClientGenerator
-
- getCollected() - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
-
- getCollected() - Method in interface us.codecraft.webmagic.pipeline.CollectorPipeline
-
Get all results collected.
- getCollected() - Method in class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
-
- getCollectorPipeline() - Method in class us.codecraft.webmagic.model.OOSpider
-
- getCollectorPipeline() - Method in class us.codecraft.webmagic.Spider
-
- getContent() - Method in class us.codecraft.webmagic.example.OschinaBlog
-
- getContent() - Method in interface us.codecraft.webmagic.model.samples.Blog
-
- getContent() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
-
- getContent() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
-
- getContent() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
-
- getContentType() - Method in class us.codecraft.webmagic.model.HttpRequestBody
-
- getCookies() - Method in class us.codecraft.webmagic.Request
-
- getCookies() - Method in class us.codecraft.webmagic.Site
-
Get cookies.
- getCycleRetryTimes() - Method in class us.codecraft.webmagic.Site
-
When cycleRetryTimes is greater than 0, a failed request is added back to the scheduler and downloaded again.
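A minimal, self-contained sketch of the cycle-retry rule above, assuming a plain FIFO queue stands in for the scheduler. `RetryRequest`, `RetryingQueue`, and the `download` predicate are hypothetical names, not WebMagic classes, and WebMagic's own bookkeeping may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical request type carrying its own cycle-retry counter.
class RetryRequest {
    final String url;
    int cycleTriedTimes = 0;

    RetryRequest(String url) {
        this.url = url;
    }
}

// Hypothetical queue loop illustrating cycleRetryTimes semantics.
class RetryingQueue {
    private final Deque<RetryRequest> queue = new ArrayDeque<>();
    private final int cycleRetryTimes;
    final List<String> succeeded = new ArrayList<>();
    final List<String> abandoned = new ArrayList<>();

    RetryingQueue(int cycleRetryTimes) {
        this.cycleRetryTimes = cycleRetryTimes;
    }

    void push(RetryRequest request) {
        queue.addLast(request);
    }

    // Drains the queue; `download` returns true when a fetch succeeds.
    void run(Predicate<RetryRequest> download) {
        RetryRequest request;
        while ((request = queue.pollFirst()) != null) {
            if (download.test(request)) {
                succeeded.add(request.url);
            } else if (request.cycleTriedTimes < cycleRetryTimes) {
                // Failed, but retries remain: add it back to the end of the queue.
                request.cycleTriedTimes++;
                queue.addLast(request);
            } else {
                // Retry budget exhausted (immediately so when cycleRetryTimes is 0).
                abandoned.add(request.url);
            }
        }
    }
}
```

Because a failed request goes to the back of the queue rather than being retried on the spot, other URLs are fetched in between attempts, which is what distinguishes cycle retries from immediate retries.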
- getDate() - Method in class us.codecraft.webmagic.example.OschinaBlog
-
- getDefineFile() - Method in enum us.codecraft.webmagic.scripts.Language
-
- getDescription() - Method in class us.codecraft.webmagic.example.BaiduBaike
-
- getDescription() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
-
- getDocument() - Method in class us.codecraft.webmagic.selector.Html
-
- getDomain() - Method in class us.codecraft.webmagic.Site
-
Get the domain.
- getDomain(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
-
- getDuplicateRemover() - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
-
- getElements() - Method in class us.codecraft.webmagic.selector.Html
-
- getElements() - Method in class us.codecraft.webmagic.selector.HtmlNode
-
- getEncoding() - Method in class us.codecraft.webmagic.model.HttpRequestBody
-
- getEngine() - Method in class us.codecraft.webmagic.scripts.ScriptEnginePool
-
- getEngineName() - Method in enum us.codecraft.webmagic.scripts.Language
-
- getErrorCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
-
- getErrorPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getErrorPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getErrorPages() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getErrorPages() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getErrorUrls() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
-
- getExpressionParams() - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- getExpressionType() - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- getExpressionValue() - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- getExtra(String) - Method in class us.codecraft.webmagic.Request
-
- getExtras() - Method in class us.codecraft.webmagic.Request
-
- getFieldName() - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- getFieldsIncludeSuperClass(Class) - Static method in class us.codecraft.webmagic.utils.ClassUtils
-
- getFile(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
-
- getFirstNoLoopbackIPAddresses() - Static method in class us.codecraft.webmagic.utils.IPUtils
-
- getFirstSourceText() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
-
- getFork() - Method in class us.codecraft.webmagic.example.GithubRepo
-
- getFork() - Method in class us.codecraft.webmagic.example.GithubRepoApi
-
- getFork() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
-
- getGatherFile() - Method in enum us.codecraft.webmagic.scripts.Language
-
- getHeaders() - Method in class us.codecraft.webmagic.Page
-
- getHeaders() - Method in class us.codecraft.webmagic.Request
-
- getHeaders() - Method in class us.codecraft.webmagic.Site
-
- getHost() - Method in class us.codecraft.webmagic.proxy.Proxy
-
- getHost(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
-
- getHtml() - Method in class us.codecraft.webmagic.Page
-
Get the HTML content of the page.
- getHttpClientContext() - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
-
- getHttpUriRequest() - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
-
- getItemKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- getJson() - Method in class us.codecraft.webmagic.Page
-
Get the JSON content of the page.
- getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepo
-
- getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepoApi
-
- getLanguage() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
-
- getLeftPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getLeftPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
-
- getLeftRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.MonitorableScheduler
-
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
-
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
-
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- getMethod() - Method in class us.codecraft.webmagic.Request
-
The http method of the request.
- getName() - Method in class us.codecraft.webmagic.example.BaiduBaike
-
- getName() - Method in class us.codecraft.webmagic.example.GithubRepo
-
- getName() - Method in class us.codecraft.webmagic.example.GithubRepoApi
-
- getName() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
-
- getName() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
-
- getName() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getName() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getName() - Method in class us.codecraft.webmagic.samples.GithubRepo
-
- getOtherPages() - Method in class us.codecraft.webmagic.model.samples.News163
-
- getOtherPages() - Method in interface us.codecraft.webmagic.MultiPageModel
-
Other pages to be extracted.
Used to judge whether an object spans more than one page, and whether all pages of the object have been extracted.
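The page-key / page / other-pages bookkeeping described for MultiPageModel can be sketched as a small aggregator that buffers fragments per object until every known page has arrived. `PageFragment` and `MultiPageAggregator` are hypothetical names for illustration only; the merge step here (concatenating content in page-identifier string order) is an assumption, not WebMagic's actual combining logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical fragment: one downloaded page of a multi-page object.
class PageFragment {
    final String pageKey;         // identifies the whole object, e.g. an article id
    final String page;            // identifies this page within the object
    final Set<String> otherPages; // pages of the same object this page refers to
    final String content;

    PageFragment(String pageKey, String page, Set<String> otherPages, String content) {
        this.pageKey = pageKey;
        this.page = page;
        this.otherPages = otherPages;
        this.content = content;
    }
}

// Hypothetical aggregator: emits an object only once all of its known pages are extracted.
class MultiPageAggregator {
    private final Map<String, TreeMap<String, String>> collected = new HashMap<>();
    private final Map<String, Set<String>> expected = new HashMap<>();

    // Returns the merged content when the object is complete, otherwise null.
    String accept(PageFragment fragment) {
        TreeMap<String, String> pages =
                collected.computeIfAbsent(fragment.pageKey, k -> new TreeMap<>());
        Set<String> wanted = expected.computeIfAbsent(fragment.pageKey, k -> new TreeSet<>());
        pages.put(fragment.page, fragment.content);
        wanted.add(fragment.page);
        wanted.addAll(fragment.otherPages);
        if (!pages.keySet().containsAll(wanted)) {
            return null; // some pages of this object have not been extracted yet
        }
        collected.remove(fragment.pageKey);
        expected.remove(fragment.pageKey);
        // TreeMap iterates in page-identifier (string) order.
        return String.join("", pages.values());
    }
}
```

An object whose fragment lists no other pages completes immediately, which corresponds to the single-page case.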
- getPage(Request) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
- getPage() - Method in class us.codecraft.webmagic.model.samples.News163
-
- getPage() - Method in interface us.codecraft.webmagic.MultiPageModel
-
Page is the identifier of one page among the pages of a single object.
- getPageCount() - Method in class us.codecraft.webmagic.Spider
-
Get the number of pages downloaded by the spider.
- getPageKey() - Method in class us.codecraft.webmagic.model.samples.News163
-
- getPageKey() - Method in interface us.codecraft.webmagic.MultiPageModel
-
Page key is the identifier for the object.
- getPagePerSecond() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getPagePerSecond() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getPassword() - Method in class us.codecraft.webmagic.proxy.Proxy
-
- getPath() - Method in class us.codecraft.webmagic.utils.FilePersistentBase
-
- getPort() - Method in class us.codecraft.webmagic.proxy.Proxy
-
- getPriority() - Method in class us.codecraft.webmagic.Request
-
- getProxy(Task) - Method in interface us.codecraft.webmagic.proxy.ProxyProvider
-
Get a proxy for the task according to some strategy.
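One possible strategy is round-robin over a fixed list. The sketch below is self-contained and does not use WebMagic's `Proxy` or `ProxyProvider`; `ProxyEndpoint` and `RoundRobinProxyProvider` are hypothetical, and the `Task` parameter is dropped for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a proxy endpoint (host and port only).
class ProxyEndpoint {
    final String host;
    final int port;

    ProxyEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}

// Hypothetical provider that hands out proxies in round-robin order.
// The atomic counter makes it safe to call from several crawler threads.
class RoundRobinProxyProvider {
    private final List<ProxyEndpoint> proxies;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinProxyProvider(List<ProxyEndpoint> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        this.proxies = Collections.unmodifiableList(new ArrayList<>(proxies));
    }

    ProxyEndpoint getProxy() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }
}
```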
- getProxy(Task) - Method in class us.codecraft.webmagic.proxy.SimpleProxyProvider
-
- getQueueKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- getRawText() - Method in class us.codecraft.webmagic.Page
-
- getReadme() - Method in class us.codecraft.webmagic.example.GithubRepo
-
- getReadme() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
-
- getReadme() - Method in class us.codecraft.webmagic.samples.GithubRepo
-
- getRedirect(HttpRequest, HttpResponse, HttpContext) - Method in class us.codecraft.webmagic.downloader.CustomRedirectStrategy
-
- getRequest() - Method in class us.codecraft.webmagic.Page
-
Get the request of the current page.
- getRequest() - Method in class us.codecraft.webmagic.ResultItems
-
- getRequestBody() - Method in class us.codecraft.webmagic.Request
-
- getResultItems() - Method in class us.codecraft.webmagic.Page
-
- getRetryNum() - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
- getRetrySleepTime() - Method in class us.codecraft.webmagic.Site
-
- getRetryTimes() - Method in class us.codecraft.webmagic.Site
-
Get the number of immediate retries when a download fails; 0 by default.
- getScheduler() - Method in class us.codecraft.webmagic.Spider
-
- getSelector() - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- getSelector(ExtractBy) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
-
- getSelectors(ExtractBy[]) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
-
- getSetKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- getSite() - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
-
- getSite() - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
-
- getSite() - Method in interface us.codecraft.webmagic.processor.PageProcessor
-
Get the site settings.
- getSite() - Method in class us.codecraft.webmagic.processor.SimplePageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.AngularJSProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.DiandianBlogProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.F58PageProcesser
-
- getSite() - Method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.HuxiuProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.KaichibaProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.MamacnPageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.MeicanProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.NjuBBSProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.QzoneBlogProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.SinaBlogProcessor
-
- getSite() - Method in class us.codecraft.webmagic.samples.TianyaPageProcesser
-
- getSite() - Method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
-
- getSite() - Method in class us.codecraft.webmagic.scripts.ScriptProcessor
-
- getSite() - Method in class us.codecraft.webmagic.Spider
-
- getSite() - Method in interface us.codecraft.webmagic.Task
-
Site of a task.
- getSleepTime() - Method in class us.codecraft.webmagic.Site
-
Get the interval between the processing of two pages.
Time unit is milliseconds.
- getSourceTexts() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
-
- getSourceTexts() - Method in class us.codecraft.webmagic.selector.HtmlNode
-
- getSourceTexts() - Method in class us.codecraft.webmagic.selector.PlainText
-
- getSpiderListeners() - Method in class us.codecraft.webmagic.Spider
-
- getSpiderStatusMBean(Spider, SpiderMonitor.MonitorSpiderListener) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
-
- getStar() - Method in class us.codecraft.webmagic.example.GithubRepo
-
- getStar() - Method in class us.codecraft.webmagic.example.GithubRepoApi
-
- getStar() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
-
- getStartTime() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getStartTime() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getStartTime() - Method in class us.codecraft.webmagic.Spider
-
- getStatus() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getStatus() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getStatus() - Method in class us.codecraft.webmagic.Spider
-
Get the running status of the spider.
- getStatusCode() - Method in class us.codecraft.webmagic.Page
-
- getSuccessCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
-
- getSuccessPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getSuccessPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getTags() - Method in class us.codecraft.webmagic.example.OschinaBlog
-
- getTags() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
-
- getTargetRequests() - Method in class us.codecraft.webmagic.Page
-
- getText(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
-
- getThread() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getThread() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getThreadAlive() - Method in class us.codecraft.webmagic.Spider
-
Get the number of running threads.
- getThreadAlive() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
-
- getThreadNum() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
-
- getTimeOut() - Method in class us.codecraft.webmagic.Site
-
- getTitle() - Method in class us.codecraft.webmagic.example.OschinaBlog
-
- getTitle() - Method in interface us.codecraft.webmagic.model.samples.Blog
-
- getTitle() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
-
- getTitle() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
-
- getTitle() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
-
- getTotalPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- getTotalPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
-
- getTotalRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.component.DuplicateRemover
-
Get the total request count, for monitoring.
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
-
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
-
- getTotalRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.MonitorableScheduler
-
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
-
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
-
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- getUrl() - Method in class us.codecraft.webmagic.example.GithubRepo
-
- getUrl() - Method in class us.codecraft.webmagic.example.GithubRepoApi
-
- getUrl() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
-
- getUrl() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
-
- getUrl() - Method in class us.codecraft.webmagic.Page
-
Get the URL of the current page.
- getUrl() - Method in class us.codecraft.webmagic.Request
-
- getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
-
- getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
-
- getUserAgent() - Method in class us.codecraft.webmagic.Site
-
Get the user agent.
- getUsername() - Method in class us.codecraft.webmagic.proxy.Proxy
-
- getUUID() - Method in class us.codecraft.webmagic.Spider
-
- getUUID() - Method in interface us.codecraft.webmagic.Task
-
Unique ID for a task.
- GithubRepo - Class in us.codecraft.webmagic.example
-
- GithubRepo() - Constructor for class us.codecraft.webmagic.example.GithubRepo
-
- GithubRepo - Class in us.codecraft.webmagic.model.samples
-
- GithubRepo() - Constructor for class us.codecraft.webmagic.model.samples.GithubRepo
-
- GithubRepo - Class in us.codecraft.webmagic.samples
-
- GithubRepo() - Constructor for class us.codecraft.webmagic.samples.GithubRepo
-
- GithubRepoApi - Class in us.codecraft.webmagic.example
-
- GithubRepoApi() - Constructor for class us.codecraft.webmagic.example.GithubRepoApi
-
- GithubRepoPageMapper - Class in us.codecraft.webmagic.example
-
- GithubRepoPageMapper() - Constructor for class us.codecraft.webmagic.example.GithubRepoPageMapper
-
- GithubRepoPageProcessor - Class in us.codecraft.webmagic.processor.example
-
- GithubRepoPageProcessor() - Constructor for class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
-
- GithubRepoPageProcessor - Class in us.codecraft.webmagic.samples
-
- GithubRepoPageProcessor() - Constructor for class us.codecraft.webmagic.samples.GithubRepoPageProcessor
-
- Page - Class in us.codecraft.webmagic
-
- Page() - Constructor for class us.codecraft.webmagic.Page
-
- PageMapper<T> - Class in us.codecraft.webmagic.model
-
- PageMapper(Class<T>) - Constructor for class us.codecraft.webmagic.model.PageMapper
-
- PageModelPipeline<T> - Interface in us.codecraft.webmagic.pipeline
-
Implement PageModelPipeline to persist your page model.
- PageProcessor - Interface in us.codecraft.webmagic.processor
-
Interface to be implemented to customize a crawler.
In PageProcessor, you can customize:
start URLs and other settings in Site;
how the URLs to fetch are detected;
how the data are extracted and stored.
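The three customization points of PageProcessor can be illustrated with a self-contained sketch. `MiniPage`, `MiniPageProcessor`, and `RepoProcessor` are hypothetical, simplified stand-ins (not WebMagic's `Page`, `PageProcessor`, or `Site`), and plain regular expressions replace WebMagic's selectors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a downloaded page plus its outputs.
class MiniPage {
    final String url;
    final String html;
    final List<String> targetRequests = new ArrayList<>(); // URLs to fetch next
    final Map<String, Object> fields = new HashMap<>();     // extracted data

    MiniPage(String url, String html) {
        this.url = url;
        this.html = html;
    }
}

// Hypothetical processor interface: one "site setting" plus the processing hook.
interface MiniPageProcessor {
    int sleepTimeMillis();
    void process(MiniPage page);
}

class RepoProcessor implements MiniPageProcessor {
    private static final Pattern LINK =
            Pattern.compile("href=\"(https://github\\.com/[\\w-]+/[\\w-]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    // Site settings: e.g. the delay between two pages.
    public int sleepTimeMillis() {
        return 1000;
    }

    public void process(MiniPage page) {
        // How the URLs to fetch are detected.
        Matcher links = LINK.matcher(page.html);
        while (links.find()) {
            page.targetRequests.add(links.group(1));
        }
        // How the data are extracted and stored for later pipelines.
        Matcher title = TITLE.matcher(page.html);
        if (title.find()) {
            page.fields.put("title", title.group(1));
        }
    }
}
```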
- pageProcessor - Variable in class us.codecraft.webmagic.Spider
-
- path - Variable in class us.codecraft.webmagic.utils.FilePersistentBase
-
- PATH_SEPERATOR - Static variable in class us.codecraft.webmagic.utils.FilePersistentBase
-
- pattern - Variable in class us.codecraft.webmagic.handler.PatternRequestMatcher
-
Match pattern.
- PatternProcessor - Class in us.codecraft.webmagic.handler
-
- PatternProcessor(String) - Constructor for class us.codecraft.webmagic.handler.PatternProcessor
-
- PatternProcessorExample - Class in us.codecraft.webmagic.example
-
- PatternProcessorExample() - Constructor for class us.codecraft.webmagic.example.PatternProcessorExample
-
- PatternRequestMatcher - Class in us.codecraft.webmagic.handler
-
- PatternRequestMatcher(String) - Constructor for class us.codecraft.webmagic.handler.PatternRequestMatcher
-
- PhantomJSDownloader - Class in us.codecraft.webmagic.downloader
-
This downloader is used to download pages that require JavaScript rendering.
- PhantomJSDownloader() - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
- PhantomJSDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
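Adds a new constructor that supports a custom phantomjs command.
Examples:
phantomjs.exe: for Windows environments
phantomjs --ignore-ssl-errors=yes: ignores some errors when the crawled address uses HTTPS
/usr/local/bin/phantomjs: absolute path to the command, avoiding an IOException caused by system environment variables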
- PhantomJSDownloader(String, String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
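Adds a constructor that supports a custom crawl.js path, because when other projects depend on this jar, the phantomjs command executed via runtime.exec() cannot use the crawl.js packaged inside the jar.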
- PhantomJSPageProcessor - Class in us.codecraft.webmagic.samples
-
Created by dolphineor on 2014-11-21.
- PhantomJSPageProcessor() - Constructor for class us.codecraft.webmagic.samples.PhantomJSPageProcessor
-
- Pipeline - Interface in us.codecraft.webmagic.pipeline
-
Pipeline is the persistence and offline-processing part of the crawler.
Implement the Pipeline interface to customize how results are persisted.
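A self-contained sketch of a custom persistence step, assuming results arrive as a map of extracted fields. `MiniPipeline` and `InMemoryPipeline` are hypothetical names, not WebMagic's `Pipeline` or `ResultItems`; a real implementation would write to a file or database instead of a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified pipeline interface: receives the fields extracted from one page.
interface MiniPipeline {
    void process(Map<String, Object> fields);
}

// Hypothetical pipeline that "persists" results in memory.
class InMemoryPipeline implements MiniPipeline {
    // Synchronized because a crawler may call process() from several threads.
    private final List<Map<String, Object>> stored =
            Collections.synchronizedList(new ArrayList<>());

    @Override
    public void process(Map<String, Object> fields) {
        if (fields.isEmpty()) {
            return; // nothing was extracted from this page, so nothing to persist
        }
        stored.add(new LinkedHashMap<>(fields)); // copy so later changes by the caller are harmless
    }

    // Returns a snapshot of everything persisted so far.
    List<Map<String, Object>> getStored() {
        synchronized (stored) {
            return new ArrayList<>(stored);
        }
    }
}
```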
- pipeline(Pipeline) - Method in class us.codecraft.webmagic.Spider
-
Deprecated.
- pipelines - Variable in class us.codecraft.webmagic.Spider
-
- PlainText - Class in us.codecraft.webmagic.selector
-
Selectable plain text.
Cannot be selected by XPath or CSS selectors.
- PlainText(List<String>) - Constructor for class us.codecraft.webmagic.selector.PlainText
-
- PlainText(String) - Constructor for class us.codecraft.webmagic.selector.PlainText
-
- poll(Task) - Method in class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
-
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
-
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
-
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
-
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
-
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- poll(Task) - Method in interface us.codecraft.webmagic.scheduler.Scheduler
-
Get a URL to crawl.
- pool - Variable in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- POST - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
-
- PriorityScheduler - Class in us.codecraft.webmagic.scheduler
-
Priority scheduler.
- PriorityScheduler() - Constructor for class us.codecraft.webmagic.scheduler.PriorityScheduler
-
- process(Page) - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
-
- process(Page) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
-
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.handler.CompositePipeline
-
- process(Object, Task) - Method in class us.codecraft.webmagic.model.ConsolePageModelPipeline
-
- process(T, Task) - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
-
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.ConsolePipeline
-
- process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.FilePageModelPipeline
-
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.FilePipeline
-
- process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
-
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePipeline
-
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.MultiPagePipeline
-
- process(T, Task) - Method in interface us.codecraft.webmagic.pipeline.PageModelPipeline
-
- process(ResultItems, Task) - Method in interface us.codecraft.webmagic.pipeline.Pipeline
-
Process extracted results.
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
-
- process(Page) - Method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
-
- process(Page) - Method in interface us.codecraft.webmagic.processor.PageProcessor
-
Process the page: extract URLs to fetch, and extract and store the data.
- process(Page) - Method in class us.codecraft.webmagic.processor.SimplePageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.AngularJSProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.DiandianBlogProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.F58PageProcesser
-
- process(Page) - Method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.HuxiuProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.KaichibaProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.MamacnPageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.MeicanProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.NjuBBSProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
-
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.samples.pipeline.OneFilePipeline
-
- process(Page) - Method in class us.codecraft.webmagic.samples.QzoneBlogProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.SinaBlogProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.samples.TianyaPageProcesser
-
- process(Page) - Method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
-
- process(Page) - Method in class us.codecraft.webmagic.scripts.ScriptProcessor
-
- processPage(Page) - Method in interface us.codecraft.webmagic.handler.SubPageProcessor
-
Process the page: extract URLs to fetch, and extract and store the data.
- processResult(ResultItems, Task) - Method in interface us.codecraft.webmagic.handler.SubPipeline
-
Process the extracted results.
- Proxy - Class in us.codecraft.webmagic.proxy
-
- Proxy(String, int) - Constructor for class us.codecraft.webmagic.proxy.Proxy
-
- Proxy(String, int, String, String) - Constructor for class us.codecraft.webmagic.proxy.Proxy
-
- ProxyProvider - Interface in us.codecraft.webmagic.proxy
-
Proxy provider.
- ProxyUtils - Class in us.codecraft.webmagic.utils
-
Pooled Proxy Object
- ProxyUtils() - Constructor for class us.codecraft.webmagic.utils.ProxyUtils
-
- push(Request, Task) - Method in class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
-
- push(Request, Task) - Method in class us.codecraft.webmagic.samples.scheduler.LevelLimitScheduler
-
- push(Request, Task) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
-
- push(Request, Task) - Method in interface us.codecraft.webmagic.scheduler.Scheduler
-
Add a URL to fetch.
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
-
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
-
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
-
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
-
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
-
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
-
- put(Class<? extends ObjectFormatter>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
-
- put(String, T) - Method in class us.codecraft.webmagic.ResultItems
-
- put(K1, Map<K2, V>) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
-
- put(K1, K2, V) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
-
- PUT - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
-
- putExtra(String, Object) - Method in class us.codecraft.webmagic.Request
-
- putField(String, Object) - Method in class us.codecraft.webmagic.Page
-
Store extracted results.
- Scheduler - Interface in us.codecraft.webmagic.scheduler
-
Scheduler is the URL-management part of the crawler.
You can implement the Scheduler interface to:
manage URLs to fetch;
remove duplicate URLs.
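The two responsibilities above can be sketched as a FIFO queue paired with a set-based duplicate check. The index lists QueueScheduler, DuplicateRemovedScheduler, and HashSetDuplicateRemover, which suggests a similar split, but `DedupQueueScheduler` below is a hypothetical, self-contained class working on plain URL strings rather than `Request` and `Task`.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical scheduler: a FIFO queue of URLs plus a set remembering every URL ever pushed.
class DedupQueueScheduler {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    // Queues the URL only if it has never been pushed before; returns whether it was queued.
    synchronized boolean push(String url) {
        if (!seen.add(url)) {
            return false; // duplicate: already queued or already crawled
        }
        queue.add(url);
        return true;
    }

    // Returns the next URL to crawl, or null when the queue is empty.
    synchronized String poll() {
        return queue.poll();
    }

    // URLs still waiting to be crawled.
    synchronized int getLeftRequestsCount() {
        return queue.size();
    }

    // Distinct URLs ever accepted, including those already polled.
    synchronized int getTotalRequestsCount() {
        return seen.size();
    }
}
```

Polled URLs stay in the `seen` set, so a page that links back to an already-crawled URL does not re-queue it.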
- scheduler - Variable in class us.codecraft.webmagic.Spider
-
- scheduler(Scheduler) - Method in class us.codecraft.webmagic.Spider
-
Deprecated.
- script(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
-
- ScriptConsole - Class in us.codecraft.webmagic.scripts
-
- ScriptConsole() - Constructor for class us.codecraft.webmagic.scripts.ScriptConsole
-
- ScriptEnginePool - Class in us.codecraft.webmagic.scripts
-
- ScriptEnginePool(Language, int) - Constructor for class us.codecraft.webmagic.scripts.ScriptEnginePool
-
- scriptFromClassPathFile(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
-
- scriptFromFile(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
-
- ScriptProcessor - Class in us.codecraft.webmagic.scripts
-
- ScriptProcessor(Language, String, int) - Constructor for class us.codecraft.webmagic.scripts.ScriptProcessor
-
- ScriptProcessorBuilder - Class in us.codecraft.webmagic.scripts
-
- select(Selector, List<String>) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
-
- select(Selector) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
-
- select(String) - Method in class us.codecraft.webmagic.selector.AndSelector
-
- select(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
-
- select(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
-
- select(Element) - Method in interface us.codecraft.webmagic.selector.ElementSelector
-
Extract a single result as text.
If there is more than one result, only the first is chosen.
- select(Selector) - Method in class us.codecraft.webmagic.selector.HtmlNode
-
- select(String) - Method in class us.codecraft.webmagic.selector.JsonPathSelector
-
- select(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
-
- select(String) - Method in class us.codecraft.webmagic.selector.OrSelector
-
- select(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
-
- select(String) - Method in class us.codecraft.webmagic.selector.ReplaceSelector
-
- select(Selector) - Method in interface us.codecraft.webmagic.selector.Selectable
-
Extract using a custom selector.
- select(String) - Method in interface us.codecraft.webmagic.selector.Selector
-
Extract a single result as text.
If there is more than one result, only the first is chosen.
- select(String) - Method in class us.codecraft.webmagic.selector.SmartContentSelector
-
- select(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
-
- select(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
-
- Selectable - Interface in us.codecraft.webmagic.selector
-
Selectable text.
- selectDocument(Selector) - Method in class us.codecraft.webmagic.selector.Html
-
- selectDocumentForList(Selector) - Method in class us.codecraft.webmagic.selector.Html
-
- selectElement(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
-
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
-
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
-
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
-
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
-
- selectElements(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
-
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
-
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
-
- selectElements(BaseElementSelector) - Method in class us.codecraft.webmagic.selector.HtmlNode
-
Select elements.
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
-
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
-
- selectGroup(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
-
- selectGroupList(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
-
- selectList(Selector, List<String>) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
-
- selectList(Selector) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
-
- selectList(String) - Method in class us.codecraft.webmagic.selector.AndSelector
-
- selectList(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
-
- selectList(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
-
- selectList(Element) - Method in interface us.codecraft.webmagic.selector.ElementSelector
-
Extract all results as text.
- selectList(Selector) - Method in class us.codecraft.webmagic.selector.HtmlNode
-
- selectList(String) - Method in class us.codecraft.webmagic.selector.JsonPathSelector
-
- selectList(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
-
- selectList(String) - Method in class us.codecraft.webmagic.selector.OrSelector
-
- selectList(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
-
- selectList(String) - Method in class us.codecraft.webmagic.selector.ReplaceSelector
-
- selectList(Selector) - Method in interface us.codecraft.webmagic.selector.Selectable
-
Extract using a custom selector.
- selectList(String) - Method in interface us.codecraft.webmagic.selector.Selector
-
Extract all results as text.
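The difference between the `select(String)` and `selectList(String)` contracts of Selector (first result only versus every result) can be shown with a small regex-based sketch. `FirstOrAllRegex` is a hypothetical class, not WebMagic's `RegexSelector`, and it assumes the pattern has exactly one capturing group whose text is the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor illustrating the select/selectList contract.
class FirstOrAllRegex {
    private final Pattern pattern;

    FirstOrAllRegex(String regex) {
        this.pattern = Pattern.compile(regex); // assumed to contain one capturing group
    }

    // Like select(String): only the first match is returned, or null if none.
    String select(String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Like selectList(String): every match, in order of appearance.
    List<String> selectList(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }
}
```

For example, with the pattern `href="([^"]+)"` applied to two links `/x` and `/y`, `select` yields only `/x` while `selectList` yields both.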
- selectList(String) - Method in class us.codecraft.webmagic.selector.SmartContentSelector
-
- selectList(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
-
- selectList(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
-
- Selector - Interface in us.codecraft.webmagic.selector
-
Selector (extractor) for text.
- Selectors - Class in us.codecraft.webmagic.selector
-
Convenient methods for selectors.
- Selectors() - Constructor for class us.codecraft.webmagic.selector.Selectors
-
- SeleniumDownloader - Class in us.codecraft.webmagic.downloader.selenium
-
Uses Selenium to drive a browser for rendering. Currently only Chrome is supported.
The Selenium driver must be downloaded separately.
- SeleniumDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
-
Create a new SeleniumDownloader.
- SeleniumDownloader() - Constructor for class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
-
Constructor without any field.
- setAcceptStatCode(Set<Integer>) - Method in class us.codecraft.webmagic.Site
-
Set acceptStatCode.
When the status code of an HTTP response is in acceptStatCodes, the response will be processed.
{200} by default.
It does not need to be set.
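The acceptStatCode rule above amounts to a set-membership check with a default of {200}. The helper `StatusFilter` below is a hypothetical sketch of that rule, not part of WebMagic; treating a null set as "use the default" is an assumption made for the example.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical helper mirroring the acceptStatCode rule; not part of WebMagic.
class StatusFilter {
    // Documented default for Site.setAcceptStatCode: {200}.
    static final Set<Integer> DEFAULT_ACCEPT = Collections.singleton(200);

    // A response is processed only when its status code is in the accepted set.
    // A null set falls back to the default (an assumption of this sketch).
    static boolean shouldProcess(int statusCode, Set<Integer> acceptStatCodes) {
        Set<Integer> accept = (acceptStatCodes == null) ? DEFAULT_ACCEPT : acceptStatCodes;
        return accept.contains(statusCode);
    }
}
```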
- setAuthor(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
-
- setBinaryContent(boolean) - Method in class us.codecraft.webmagic.Request
-
- setBody(byte[]) - Method in class us.codecraft.webmagic.model.HttpRequestBody
-
- setBytes(byte[]) - Method in class us.codecraft.webmagic.Page
-
- setCharset(String) - Method in class us.codecraft.webmagic.Page
-
- setCharset(String) - Method in class us.codecraft.webmagic.Request
-
- setCharset(String) - Method in class us.codecraft.webmagic.Site
-
Set the charset of the page manually.
When the charset is not set, or is set to null, it can be auto-detected from the HTTP header.
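The fallback described above (a manually set charset wins, otherwise the HTTP header decides) can be sketched as follows. `CharsetResolver` is hypothetical. It only looks at a `charset=` parameter in the `Content-Type` header, while WebMagic's real detection may consult other sources as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: manual charset wins; otherwise fall back to Content-Type.
class CharsetResolver {
    // Matches e.g. "charset=UTF-8" or charset="gbk" inside a Content-Type value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the manual charset if non-null, else the charset named in the
    // Content-Type header, else null (the caller must detect it another way).
    static String resolve(String manualCharset, String contentTypeHeader) {
        if (manualCharset != null) {
            return manualCharset;
        }
        if (contentTypeHeader == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentTypeHeader);
        return m.find() ? m.group(1) : null;
    }
}
```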
- setContentType(String) - Method in class us.codecraft.webmagic.model.HttpRequestBody
-
- setCycleRetryTimes(int) - Method in class us.codecraft.webmagic.Site
-
Set cycleRetryTimes, the number of cycle retries when a download fails, 0 by default.
- setDisableCookieManagement(boolean) - Method in class us.codecraft.webmagic.Site
-
By default, the downloader stores response cookies; set this to true to disable cookie management.
- setDomain(String) - Method in class us.codecraft.webmagic.Site
-
Set the domain of the site.
- setDownloader(Downloader) - Method in class us.codecraft.webmagic.Spider
-
Set the downloader of the spider.
- setDownloadSuccess(boolean) - Method in class us.codecraft.webmagic.Page
-
- setDuplicateRemover(DuplicateRemover) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
-
- setEmptySleepTime(int) - Method in class us.codecraft.webmagic.Spider
-
Set the wait time when no URL can be polled.
- setEncoding(String) - Method in class us.codecraft.webmagic.model.HttpRequestBody
-
- setExecutorService(ExecutorService) - Method in class us.codecraft.webmagic.Spider
-
- setExecutorService(ExecutorService) - Method in class us.codecraft.webmagic.thread.CountableThreadPool
-
- setExitWhenComplete(boolean) - Method in class us.codecraft.webmagic.Spider
-
Exit when complete.
- setExpressionParams(String[]) - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- setExpressionType(ExpressionType) - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- setExpressionValue(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- setExtras(Map<String, Object>) - Method in class us.codecraft.webmagic.Request
-
- setField(Field) - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
-
- setFieldName(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- setHeaders(Map<String, List<String>>) - Method in class us.codecraft.webmagic.Page
-
- setHtml(Html) - Method in class us.codecraft.webmagic.Page
-
- setHttpClientContext(HttpClientContext) - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
-
- setHttpUriRequest(HttpUriRequest) - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
-
- setHttpUriRequestConverter(HttpUriRequestConverter) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
-
- setIsExtractLinks(boolean) - Method in class us.codecraft.webmagic.model.OOSpider
-
- setMethod(String) - Method in class us.codecraft.webmagic.Request
-
- setMulti(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- setName(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
-
- setNotNull(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- setPath(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
-
- setPipelines(List<Pipeline>) - Method in class us.codecraft.webmagic.Spider
-
Set pipelines for the Spider.
- setPoolSize(int) - Method in class us.codecraft.webmagic.downloader.HttpClientGenerator
-
- setPriority(long) - Method in class us.codecraft.webmagic.Request
-
Set the priority of the request for sorting.
Requires a scheduler that supports priority.
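To show what "a scheduler that supports priority" means in practice, here is a sketch built on `java.util.PriorityQueue`. `PrioritizedUrl` and `PriorityUrlQueue` are hypothetical, and serving higher priority values first is an assumption of this sketch rather than a statement about WebMagic's priority schedulers (such as RedisPriorityScheduler, listed above).

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical request type carrying a priority, as set by setPriority(long).
class PrioritizedUrl {
    final String url;
    final long priority;

    PrioritizedUrl(String url, long priority) {
        this.url = url;
        this.priority = priority;
    }
}

// Hypothetical priority-aware queue: higher priority is polled first (assumption).
class PriorityUrlQueue {
    private final PriorityQueue<PrioritizedUrl> queue = new PriorityQueue<>(
            Comparator.comparingLong((PrioritizedUrl r) -> r.priority).reversed());

    void push(PrioritizedUrl request) {
        queue.add(request);
    }

    // Highest-priority request, or null when empty.
    PrioritizedUrl poll() {
        return queue.poll();
    }
}
```

Note that `PriorityQueue` does not preserve insertion order among requests with equal priority; a plain FIFO scheduler would simply ignore the priority field.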
- setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
-
- setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.SimpleHttpClient
-
- setRawText(String) - Method in class us.codecraft.webmagic.Page
-
- setReadme(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
-
- setRequest(Request) - Method in class us.codecraft.webmagic.Page
-
- setRequest(Request) - Method in class us.codecraft.webmagic.ResultItems
-
- setRequestBody(HttpRequestBody) - Method in class us.codecraft.webmagic.Request
-
- setRetryNum(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
- setRetrySleepTime(int) - Method in class us.codecraft.webmagic.Site
-
Set the sleep time between retries when a download fails, 1000 by default.
- setRetryTimes(int) - Method in class us.codecraft.webmagic.Site
-
Set the number of retries when a download fails, 0 by default.
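The retry settings above can be illustrated with an in-place retry loop: one initial attempt, up to `retryTimes` further attempts, and a pause of `retrySleepTime` between them. `Retry.withRetries` is a hypothetical helper, not WebMagic code; it treats a null result as a failed download, assumes the sleep value is in milliseconds, and does not model cycleRetryTimes.

```java
import java.util.function.Supplier;

// Hypothetical helper, not WebMagic code: retryTimes extra attempts after the first.
class Retry {
    // A null result stands for "download failed". Sleeps retrySleepMillis
    // (assumed to be milliseconds) between attempts, never after the last one.
    static <T> T withRetries(Supplier<T> attempt, int retryTimes, long retrySleepMillis) {
        for (int i = 0; i <= retryTimes; i++) {
            T result = attempt.get();
            if (result != null) {
                return result;
            }
            if (i < retryTimes) {
                try {
                    Thread.sleep(retrySleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    return null;
                }
            }
        }
        return null; // every attempt failed
    }
}
```

With the documented default of 0 retries, the attempt runs exactly once.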
- setScheduler(Scheduler) - Method in class us.codecraft.webmagic.Spider
-
Set the scheduler for the Spider.
- setSelector(Selector) - Method in class us.codecraft.webmagic.configurable.ExtractRule
-
- setSite(Site) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
-
- setSkip(boolean) - Method in class us.codecraft.webmagic.Page
-
- setSkip(boolean) - Method in class us.codecraft.webmagic.ResultItems
-
Set whether to skip the result.
A skipped result will not be processed by the Pipeline.
- setSleepTime(int) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
-
Set the sleep time to wait until the page has finished loading.
- setSleepTime(int) - Method in class us.codecraft.webmagic.Site
-
Set the interval between the processing of two pages.
Time unit is milliseconds.
- setSpawnUrl(boolean) - Method in class us.codecraft.webmagic.Spider
-
Whether to add extracted URLs to the download queue.
If true, extracted URLs are downloaded as well; if false, only the seed URLs are downloaded.
- setSpiderListeners(List<SpiderListener>) - Method in class us.codecraft.webmagic.Spider
-
- setStatusCode(int) - Method in class us.codecraft.webmagic.Page
-
- setSubPageProcessors(SubPageProcessor...) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
-
- setSubPipeline(SubPipeline...) - Method in class us.codecraft.webmagic.handler.CompositePipeline
-
- setThread(int) - Method in interface us.codecraft.webmagic.downloader.Downloader
-
Tell the downloader how many threads the spider uses.
- setThread(int) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
-
- setThread(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
- setThread(int) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
-
- setTimeOut(int) - Method in class us.codecraft.webmagic.Site
-
Set the timeout for the downloader, in ms.
- setUrl(Selectable) - Method in class us.codecraft.webmagic.Page
-
- setUrl(String) - Method in class us.codecraft.webmagic.Request
-
- setUseGzip(boolean) - Method in class us.codecraft.webmagic.Site
-
Whether to use gzip.
- setUserAgent(String) - Method in class us.codecraft.webmagic.Site
-
Set the user agent.
- setUUID(String) - Method in class us.codecraft.webmagic.Spider
-
Set a UUID for the spider.
The default UUID is the domain of the site.
- ShortFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
-
- shouldReserved(Request) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
-
- shutdown() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
-
- SimpleHttpClient - Class in us.codecraft.webmagic
-
- SimpleHttpClient() - Constructor for class us.codecraft.webmagic.SimpleHttpClient
-
- SimpleHttpClient(Site) - Constructor for class us.codecraft.webmagic.SimpleHttpClient
-
- SimplePageProcessor - Class in us.codecraft.webmagic.processor
-
A simple PageProcessor.
- SimplePageProcessor(String) - Constructor for class us.codecraft.webmagic.processor.SimplePageProcessor
-
- SimpleProxyProvider - Class in us.codecraft.webmagic.proxy
-
A simple ProxyProvider.
- SimpleProxyProvider(List<Proxy>) - Constructor for class us.codecraft.webmagic.proxy.SimpleProxyProvider
-
- SinaBlogProcessor - Class in us.codecraft.webmagic.samples
-
- SinaBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.SinaBlogProcessor
-
- Site - Class in us.codecraft.webmagic
-
Object containing the settings for a crawler.
- Site() - Constructor for class us.codecraft.webmagic.Site
-
- site - Variable in class us.codecraft.webmagic.Spider
-
- sleep(int) - Method in class us.codecraft.webmagic.Spider
-
- smartContent() - Method in class us.codecraft.webmagic.selector.HtmlNode
-
- smartContent() - Method in class us.codecraft.webmagic.selector.PlainText
-
- smartContent() - Method in interface us.codecraft.webmagic.selector.Selectable
-
Select smart content with the Readability algorithm.
- smartContent() - Static method in class us.codecraft.webmagic.selector.Selectors
-
- SmartContentSelector - Class in us.codecraft.webmagic.selector
-
Borrowed from https://code.google.com/p/cx-extractor/
- SmartContentSelector() - Constructor for class us.codecraft.webmagic.selector.SmartContentSelector
-
- sourceTexts - Variable in class us.codecraft.webmagic.selector.PlainText
-
- spawnUrl - Variable in class us.codecraft.webmagic.Spider
-
- spider - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
-
- Spider - Class in us.codecraft.webmagic
-
Entry point of a crawler.
A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline.
Each module is a field of Spider.
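The four-module split can be made concrete with a deliberately simplified, single-threaded sketch. Every type here (`MiniDownloader`, `MiniPageProcessor`, `MiniPipeline`, `MiniPage`, `MiniSpider`) is a hypothetical stand-in: WebMagic's real interfaces operate on `Request`, `Page`, `ResultItems` and `Task` objects, and its `Spider` can run multiple threads. The scheduler role is played by a deduplicating queue, and the `spawnUrl` flag imitates `setSpawnUrl(boolean)`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical module interfaces, far simpler than WebMagic's real ones.
interface MiniDownloader {
    String download(String url); // page content, or null on failure
}

interface MiniPageProcessor {
    void process(MiniPage page); // fills page.fields and page.targetLinks
}

interface MiniPipeline {
    void process(Map<String, Object> fields); // persists extracted fields
}

// Hypothetical page holder: downloaded content plus what the processor extracted.
class MiniPage {
    final String url;
    final String content;
    final Map<String, Object> fields = new HashMap<>();
    final List<String> targetLinks = new ArrayList<>();

    MiniPage(String url, String content) {
        this.url = url;
        this.content = content;
    }
}

// Hypothetical single-threaded spider wiring the four roles together.
class MiniSpider {
    private final MiniDownloader downloader;
    private final MiniPageProcessor processor;
    private final List<MiniPipeline> pipelines = new ArrayList<>();
    private final Queue<String> pending = new ArrayDeque<>(); // scheduler role: URLs to fetch
    private final Set<String> seen = new HashSet<>();         // scheduler role: deduplication
    private boolean spawnUrl = true;                          // imitates setSpawnUrl(boolean)

    MiniSpider(MiniDownloader downloader, MiniPageProcessor processor) {
        this.downloader = downloader;
        this.processor = processor;
    }

    MiniSpider addPipeline(MiniPipeline pipeline) {
        pipelines.add(pipeline);
        return this;
    }

    MiniSpider setSpawnUrl(boolean spawnUrl) {
        this.spawnUrl = spawnUrl;
        return this;
    }

    private void push(String url) {
        if (seen.add(url)) {
            pending.add(url);
        }
    }

    void run(List<String> seedUrls) {
        seedUrls.forEach(this::push);
        String url;
        while ((url = pending.poll()) != null) {
            String content = downloader.download(url);    // Downloader
            if (content == null) {
                continue;                                  // failed download: skip it
            }
            MiniPage page = new MiniPage(url, content);
            processor.process(page);                       // PageProcessor
            for (MiniPipeline pipeline : pipelines) {
                pipeline.process(page.fields);             // Pipeline
            }
            if (spawnUrl) {
                page.targetLinks.forEach(this::push);      // extracted links go back to the Scheduler
            }
        }
    }
}
```

A production spider adds threads, a delay between pages and retries on top of this loop; the sketch leaves them out to keep the data flow visible.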
- Spider(PageProcessor) - Constructor for class us.codecraft.webmagic.Spider
-
Create a spider with a PageProcessor.
- Spider.Status - Enum in us.codecraft.webmagic
-
- SpiderListener - Interface in us.codecraft.webmagic
-
Listener of Spider events during page processing.
- SpiderMonitor - Class in us.codecraft.webmagic.monitor
-
- SpiderMonitor() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor
-
- SpiderMonitor.MonitorSpiderListener - Class in us.codecraft.webmagic.monitor
-
- SpiderStatus - Class in us.codecraft.webmagic.monitor
-
- SpiderStatus(Spider, SpiderMonitor.MonitorSpiderListener) - Constructor for class us.codecraft.webmagic.monitor.SpiderStatus
-
- SpiderStatusMXBean - Interface in us.codecraft.webmagic.monitor
-
- start() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- start() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- start() - Method in class us.codecraft.webmagic.Spider
-
- startRequest(List<Request>) - Method in class us.codecraft.webmagic.Spider
-
Set the start URLs of the Spider.
These take precedence over the start URLs of Site.
- startRequests - Variable in class us.codecraft.webmagic.Spider
-
- startUrls(List<String>) - Method in class us.codecraft.webmagic.Spider
-
Set the start URLs of the Spider.
These take precedence over the start URLs of Site.
- stat - Variable in class us.codecraft.webmagic.Spider
-
- STAT_INIT - Static variable in class us.codecraft.webmagic.Spider
-
- STAT_RUNNING - Static variable in class us.codecraft.webmagic.Spider
-
- STAT_STOPPED - Static variable in class us.codecraft.webmagic.Spider
-
- StatusCode() - Constructor for class us.codecraft.webmagic.utils.HttpConstant.StatusCode
-
- stop() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
-
- stop() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
-
- stop() - Method in class us.codecraft.webmagic.Spider
-
- StringTemplateFormatter - Class in us.codecraft.webmagic.samples.formatter
-
- StringTemplateFormatter() - Constructor for class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
-
- SubPageProcessor - Interface in us.codecraft.webmagic.handler
-
- SubPipeline - Interface in us.codecraft.webmagic.handler
-