Package | Description |
---|---|
us.codecraft.webmagic | Main class "Spider" and models. |
us.codecraft.webmagic.model | Page model and annotations used to customize a crawler. |
us.codecraft.webmagic.monitor | |

Modifier and Type | Method and Description |
---|---|
Spider | `Spider.addPipeline(Pipeline pipeline)`<br>Add a pipeline to the Spider. |
Spider | `Spider.addRequest(Request... requests)`<br>Add URLs with additional request information to crawl. |
Spider | `Spider.addUrl(String... urls)`<br>Add URLs to crawl. |
Spider | `Spider.clearPipeline()`<br>Clear the configured pipelines. |
static Spider | `Spider.create(PageProcessor pageProcessor)`<br>Create a spider with the given pageProcessor. |
Spider | `Spider.downloader(Downloader downloader)`<br>Deprecated. |
Spider | `Spider.pipeline(Pipeline pipeline)`<br>Deprecated. |
Spider | `Spider.scheduler(Scheduler scheduler)`<br>Deprecated. |
Spider | `Spider.setDownloader(Downloader downloader)`<br>Set the downloader of the spider. |
Spider | `Spider.setExecutorService(ExecutorService executorService)` |
Spider | `Spider.setExitWhenComplete(boolean exitWhenComplete)`<br>Exit when complete. |
Spider | `Spider.setPipelines(List<Pipeline> pipelines)`<br>Set the pipelines for the Spider. |
Spider | `Spider.setScheduler(Scheduler scheduler)`<br>Set the scheduler for the Spider. |
Spider | `Spider.setSpawnUrl(boolean spawnUrl)`<br>Whether to add extracted URLs to the download queue. When true, extracted URLs are added for download; when false, only the seed URLs are downloaded. |
Spider | `Spider.setSpiderListeners(List<SpiderListener> spiderListeners)` |
Spider | `Spider.setUUID(String uuid)`<br>Set a UUID for the spider. The default UUID is the domain of the site. |
Spider | `Spider.startRequest(List<Request> startRequests)`<br>Set the start requests of the Spider. They take priority over the startUrls of Site. |
Spider | `Spider.startUrls(List<String> startUrls)`<br>Set the startUrls of the Spider. They take priority over the startUrls of Site. |
Spider | `Spider.thread(ExecutorService executorService, int threadNum)`<br>Start with more than one thread. |
Spider | `Spider.thread(int threadNum)`<br>Start with more than one thread. |
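
Every method in the table above returns a Spider, which allows calls to be chained, and `setSpawnUrl` decides whether links found on a page are queued. The sketch below models both ideas with a hypothetical `MiniSpider` class over an in-memory link graph. It is a stand-in for illustration under those assumptions, not the WebMagic `Spider` and not its implementation; no network access is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; NOT us.codecraft.webmagic.Spider.
class MiniSpider {
    private final Map<String, List<String>> linkGraph; // url -> links "found" on that page
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private boolean spawnUrl = true;

    private MiniSpider(Map<String, List<String>> linkGraph) {
        this.linkGraph = linkGraph;
    }

    static MiniSpider create(Map<String, List<String>> linkGraph) {
        return new MiniSpider(linkGraph);
    }

    // Configuration methods return this so that calls can be chained.
    MiniSpider addUrl(String... urls) {
        for (String url : urls) {
            if (seen.add(url)) {
                queue.add(url);
            }
        }
        return this;
    }

    MiniSpider setSpawnUrl(boolean spawnUrl) {
        this.spawnUrl = spawnUrl;
        return this;
    }

    // Visits queued URLs in order; extracted links are queued only when spawnUrl is true.
    List<String> run() {
        List<String> visited = new ArrayList<>();
        while (!queue.isEmpty()) {
            String url = queue.poll();
            visited.add(url);
            if (spawnUrl) {
                for (String link : linkGraph.getOrDefault(url, Collections.<String>emptyList())) {
                    if (seen.add(link)) {
                        queue.add(link);
                    }
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("a"));

        // Extracted links are followed: prints [a, b, c]
        System.out.println(MiniSpider.create(graph).addUrl("a").run());
        // Only the seed URL is visited: prints [a]
        System.out.println(MiniSpider.create(graph).addUrl("a").setSpawnUrl(false).run());
    }
}
```

The `seen` set keeps a page that links back to its parent from being queued twice, which is why the first run terminates after three pages.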

Modifier and Type | Class and Description |
---|---|
class | `OOSpider<T>`<br>The spider for page model extraction. In webmagic, a POJO that holds the extracted results is called a "page model". |
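
To make the "page model" idea concrete, here is a minimal sketch of a POJO whose fields are filled from page text according to annotations. The `RegexField` annotation, the `RepoPage` class, and the `extract` helper are all hypothetical names invented for this illustration; they are not part of WebMagic, whose own annotations and extraction rules are not shown on this page.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a "page model"; none of these types belong to WebMagic.
class PageModelSketch {

    // Marks a field with the regex whose first capture group supplies its value.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RegexField {
        String value();
    }

    // The page model: a plain object that holds the extracted results.
    static class RepoPage {
        @RegexField("<title>(.*?)</title>")
        String title;

        @RegexField("stars:\\s*(\\d+)")
        String stars;
    }

    // Creates a model instance and fills every annotated String field that matches.
    static <T> T extract(String html, Class<T> type) {
        try {
            T model = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                RegexField rule = field.getAnnotation(RegexField.class);
                if (rule == null) {
                    continue; // fields without a rule are left untouched
                }
                Matcher matcher = Pattern.compile(rule.value()).matcher(html);
                if (matcher.find()) {
                    field.setAccessible(true);
                    field.set(model, matcher.group(1));
                }
            }
            return model;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot build page model " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        RepoPage page = extract("<title>webmagic</title> stars: 42", RepoPage.class);
        System.out.println(page.title + " / " + page.stars); // prints webmagic / 42
    }
}
```

Keeping the extraction rules on the fields means the crawler code never needs to know the shape of a particular page; it only needs the model class.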

Modifier and Type | Field and Description |
---|---|
protected Spider | `SpiderStatus.spider` |

Modifier and Type | Method and Description |
---|---|
protected SpiderStatusMXBean | `SpiderMonitor.getSpiderStatusMBean(Spider spider, SpiderMonitor.MonitorSpiderListener monitorSpiderListener)` |
SpiderMonitor | `SpiderMonitor.register(Spider... spiders)`<br>Register spiders for monitoring. |
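
The `SpiderStatusMXBean` return type suggests that monitoring is exposed through JMX. As a rough sketch of what registering a status object as an MXBean looks like with the JDK's `javax.management` API alone, consider the following. The names `CrawlStatusMXBean`, `CrawlStatus`, and the `sketch.monitor` domain are assumptions made up for this example; how `SpiderMonitor` actually names and registers its beans is not shown on this page.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical JMX sketch using only the JDK; not WebMagic's SpiderMonitor.
class MonitorSketch {

    // Management interface: JMX requires it to be public, and the "MXBean"
    // suffix is what marks it as an MXBean interface.
    public interface CrawlStatusMXBean {
        String getName();
        int getSuccessPageCount();
    }

    // Status object whose getters become JMX attributes.
    static class CrawlStatus implements CrawlStatusMXBean {
        private final String name;
        private final AtomicInteger success = new AtomicInteger();

        CrawlStatus(String name) {
            this.name = name;
        }

        void recordSuccess() {
            success.incrementAndGet();
        }

        @Override
        public String getName() {
            return name;
        }

        @Override
        public int getSuccessPageCount() {
            return success.get();
        }
    }

    // Registers the status object with the platform MBean server under an assumed domain.
    static ObjectName register(CrawlStatus status) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName("sketch.monitor:name=" + status.getName());
            server.registerMBean(status, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("Cannot register " + status.getName(), e);
        }
    }

    // Reads an attribute back through the MBean server, as a JMX client would.
    static Object readAttribute(ObjectName objectName, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(objectName, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read " + attribute, e);
        }
    }

    public static void main(String[] args) {
        CrawlStatus status = new CrawlStatus("demoSpider");
        ObjectName objectName = register(status);
        status.recordSuccess();
        status.recordSuccess();
        status.recordSuccess();
        System.out.println(readAttribute(objectName, "SuccessPageCount")); // prints 3
    }
}
```

Once registered, the same attributes can be inspected from a JMX client such as JConsole while the process is running.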

Constructor and Description |
---|
`SpiderStatus(Spider spider, SpiderMonitor.MonitorSpiderListener monitorSpiderListener)` |

Copyright © 2017. All rights reserved.