Interface | Description |
---|---|
MonitorableScheduler | A scheduler whose requests can be counted for monitoring. |
Scheduler | The URL-management component. Implement the Scheduler interface to manage the URLs to fetch and to remove duplicate URLs. |

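
The two Scheduler duties named above (managing the URLs to fetch and removing duplicates) can be illustrated with a minimal in-memory sketch. The names `UrlScheduler` and `InMemoryUrlScheduler` are hypothetical stand-ins that work on plain strings; they are not the library's actual interface or signatures.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical, simplified contract used only for illustration.
interface UrlScheduler {
    // Queue a URL for fetching; duplicates are silently ignored.
    void push(String url);

    // Return the next URL to fetch, or null when nothing is pending.
    String poll();
}

// Assumption: a single-process crawler, so in-memory structures suffice.
final class InMemoryUrlScheduler implements UrlScheduler {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    @Override
    public synchronized void push(String url) {
        // Set.add returns false if the URL was already recorded.
        if (seen.add(url)) {
            pending.add(url);
        }
    }

    @Override
    public synchronized String poll() {
        return pending.poll();
    }
}
```

Pushing the same URL twice leaves a single entry in the queue, and `poll()` hands URLs back in first-in, first-out order.
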
Class | Description |
---|---|
BloomFilterDuplicateRemover | Duplicate remover backed by a Bloom filter, intended for very large numbers of URLs. |
DuplicateRemovedScheduler | Removes duplicate URLs and pushes only URLs that are not duplicates. |
FileCacheQueueScheduler | Stores URLs and the cursor in files so that a Spider can resume from its previous state after a shutdown. |
PriorityScheduler | Priority scheduler. |
QueueScheduler | Basic Scheduler implementation. Stores URLs to fetch in a LinkedBlockingQueue and removes duplicate URLs with a HashMap. |
RedisPriorityScheduler | Redis-backed scheduler with priority support. |
RedisScheduler | Uses Redis as the URL scheduler for distributed crawlers. |
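
To show why a Bloom filter suits a huge number of URLs, here is a rough sketch of the idea using only the JDK. It is not the code behind BloomFilterDuplicateRemover: the class name `BloomDuplicateRemoverSketch`, its constructor parameters, and the hash mixing are assumptions chosen for illustration. Memory stays fixed at `size` bits no matter how many URLs are recorded, at the cost of occasional false positives (a new URL reported as a duplicate).

```java
import java.util.BitSet;

// Hypothetical sketch of Bloom-filter-based duplicate detection.
final class BloomDuplicateRemoverSketch {
    private final BitSet bits;
    private final int size;       // number of bits in the filter
    private final int hashCount;  // probes per URL, assumed >= 1

    BloomDuplicateRemoverSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Records the URL and reports whether it was probably seen before.
    // Never returns false for a URL that was already recorded.
    synchronized boolean isDuplicate(String url) {
        int h1 = url.hashCode();
        int h2 = mix(h1);
        boolean allSet = true;
        for (int i = 0; i < hashCount; i++) {
            // Double hashing: the i-th probe is h1 + i * h2 (int overflow wraps).
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                allSet = false;
                bits.set(index);
            }
        }
        return allSet;
    }

    // Simple bit mixer for the second hash; forced odd so the step is never zero.
    private static int mix(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h | 1;
    }
}
```

A hash-set-based remover gives exact answers but grows with every URL stored, so the Bloom filter trades a small error rate for bounded memory.
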
Copyright © 2017. All rights reserved.