| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.samples.scheduler | |
| us.codecraft.webmagic.scheduler | The Scheduler is the part responsible for URL management. |

| Modifier and Type | Field and Description |
|---|---|
| protected Scheduler | Spider.scheduler |

| Modifier and Type | Method and Description |
|---|---|
| Scheduler | Spider.getScheduler() |

| Modifier and Type | Method and Description |
|---|---|
| Spider | Spider.scheduler(Scheduler scheduler): Deprecated. |
| Spider | Spider.setScheduler(Scheduler scheduler): Sets the scheduler for the Spider. |

| Modifier and Type | Class and Description |
|---|---|
| class | DelayQueueScheduler |
| class | LevelLimitScheduler |

| Modifier and Type | Interface and Description |
|---|---|
| interface | MonitorableScheduler: A scheduler whose requests can be counted for monitoring. |
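
To illustrate what "requests can be counted" might mean in practice, here is a minimal, self-contained sketch in plain Java. It does not use the WebMagic types: the `CountableScheduler` interface, its method names, and the `CountingSchedulerSketch` class are hypothetical stand-ins, and the actual MonitorableScheduler contract may differ.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified contract (an assumption, not the WebMagic interface).
interface CountableScheduler {
    void push(String url);   // offer a URL to the scheduler
    String poll();           // next URL to fetch, or null if none is pending
    int getLeftCount();      // URLs accepted but not yet polled
    int getTotalCount();     // distinct URLs accepted so far
}

class CountingSchedulerSketch implements CountableScheduler {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Override
    public void push(String url) {
        // Only URLs that have not been seen before are queued.
        if (seen.add(url)) {
            pending.add(url);
        }
    }

    @Override
    public String poll() {
        return pending.poll();
    }

    @Override
    public int getLeftCount() {
        return pending.size();
    }

    @Override
    public int getTotalCount() {
        return seen.size();
    }

    public static void main(String[] args) {
        CountingSchedulerSketch scheduler = new CountingSchedulerSketch();
        scheduler.push("http://example.com/a");
        scheduler.push("http://example.com/b");
        scheduler.push("http://example.com/a"); // duplicate, ignored
        scheduler.poll();
        System.out.println(scheduler.getLeftCount());  // prints 1
        System.out.println(scheduler.getTotalCount()); // prints 2
    }
}
```

A monitoring component could read the two counters periodically to report crawl progress without touching the queue itself.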

| Modifier and Type | Class and Description |
|---|---|
| class | DuplicateRemovedScheduler: Removes duplicate URLs and pushes only URLs that have not been seen before. |
| class | FileCacheQueueScheduler: Stores URLs and a cursor in files so that a Spider can resume from its previous state after a shutdown. |
| class | PriorityScheduler: Priority scheduler. |
| class | QueueScheduler: Basic Scheduler implementation. Stores URLs to fetch in a LinkedBlockingQueue and removes duplicate URLs with a HashMap. |
| class | RedisPriorityScheduler: Redis-backed scheduler with priority support. |
| class | RedisScheduler: Uses Redis as the URL scheduler for distributed crawlers. |

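
The QueueScheduler description (a LinkedBlockingQueue for pending URLs plus hash-based duplicate removal) can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the WebMagic implementation: the class name `DedupQueueSketch` and its `push`/`poll` signatures are hypothetical, and a `ConcurrentHashMap`-backed set stands in for the HashMap mentioned in the description.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: FIFO URL queue with duplicate removal.
class DedupQueueSketch {
    // Pending URLs, in the order they were accepted.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Every URL ever accepted; backed by a ConcurrentHashMap for thread safety.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Queues the URL unless it was seen before; returns true if it was accepted. */
    boolean push(String url) {
        if (!seen.add(url)) {
            return false; // duplicate, dropped
        }
        return queue.offer(url); // the queue is unbounded, so offer succeeds
    }

    /** Returns the next URL to fetch, or null if the queue is empty (non-blocking). */
    String poll() {
        return queue.poll();
    }

    public static void main(String[] args) {
        DedupQueueSketch scheduler = new DedupQueueSketch();
        System.out.println(scheduler.push("http://example.com/1")); // prints true
        System.out.println(scheduler.push("http://example.com/2")); // prints true
        System.out.println(scheduler.push("http://example.com/1")); // prints false
        System.out.println(scheduler.poll()); // prints http://example.com/1
        System.out.println(scheduler.poll()); // prints http://example.com/2
        System.out.println(scheduler.poll()); // prints null
    }
}
```

Because the `seen` set is never cleared, a URL that has already been polled is still rejected on a later push, which is what keeps a crawler from revisiting pages.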
Copyright © 2017. All rights reserved.