| Interface | Description |
|---|---|
| MonitorableScheduler | A scheduler whose requests can be counted for monitoring. |
| Scheduler | The URL-management component of a crawler. Implement the Scheduler interface to manage the URLs to fetch and to remove duplicate URLs. |
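
The two responsibilities named above (queueing URLs to fetch and dropping duplicates), plus the request counts a monitorable scheduler exposes, can be sketched as follows. This is a minimal, self-contained illustration and not this package's actual API: `UrlScheduler`, `CountingUrlScheduler`, `InMemoryUrlScheduler`, and their method names are hypothetical stand-ins, and plain strings are used in place of request objects.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified stand-in for the Scheduler contract; the real
// interface in this package may use different types and signatures.
interface UrlScheduler {
    void push(String url); // offer a URL for fetching
    String poll();         // next URL to fetch, or null if none is queued
}

// Hypothetical stand-in for a monitorable scheduler: exposes request counts.
interface CountingUrlScheduler extends UrlScheduler {
    int leftCount();  // URLs still waiting in the queue
    int totalCount(); // distinct URLs accepted so far
}

// In-memory sketch: a FIFO queue plus a concurrent set for duplicate removal.
final class InMemoryUrlScheduler implements CountingUrlScheduler {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Override
    public void push(String url) {
        // Set.add returns false for a URL that was already seen,
        // so duplicates never reach the queue.
        if (seen.add(url)) {
            queue.add(url);
        }
    }

    @Override
    public String poll() {
        return queue.poll(); // non-blocking; returns null when the queue is empty
    }

    @Override
    public int leftCount() {
        return queue.size();
    }

    @Override
    public int totalCount() {
        return seen.size();
    }
}

final class UrlSchedulerDemo {
    public static void main(String[] args) {
        CountingUrlScheduler scheduler = new InMemoryUrlScheduler();
        scheduler.push("https://example.com/a");
        scheduler.push("https://example.com/b");
        scheduler.push("https://example.com/a"); // duplicate, ignored
        System.out.println(scheduler.totalCount() + " accepted, next = " + scheduler.poll());
    }
}
```

Keeping the duplicate check inside `push` means callers never have to filter URLs themselves; a different scheduler can swap the set or the queue for another backing store without changing the contract.
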

| Class | Description |
|---|---|
| BloomFilterDuplicateRemover | Duplicate remover backed by a Bloom filter, for very large numbers of URLs. |
| DuplicateRemovedScheduler | Removes duplicate URLs and pushes only URLs that are not duplicates. |
| FileCacheQueueScheduler | Stores URLs and a cursor in files so that a Spider can resume its state after a shutdown. |
| PriorityScheduler | Scheduler that serves requests by priority. |
| QueueScheduler | Basic Scheduler implementation. Stores URLs to fetch in a LinkedBlockingQueue and removes duplicate URLs with a HashMap. |
| RedisPriorityScheduler | Redis-backed scheduler with priority support. |
| RedisScheduler | Uses Redis as the URL scheduler for distributed crawlers. |

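
To illustrate why a Bloom filter suits a huge number of URLs, the sketch below records each URL as a few bits in a fixed-size bit set instead of storing the URL itself. It is an assumption-laden illustration, not the code of BloomFilterDuplicateRemover: the class `BloomUrlFilter`, its constructor parameters, and the FNV-1a-based double hashing are all hypothetical choices made for this example. A Bloom filter can report a false positive (a new URL flagged as a duplicate) but never a false negative.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical Bloom-filter duplicate check; not the library's implementation.
final class BloomUrlFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions probed per URL

    BloomUrlFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Returns true if the URL was possibly seen before.
    // Otherwise records the URL and returns false.
    synchronized boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // forced odd so the step is never zero
        boolean allSet = true;
        for (int i = 0; i < hashCount; i++) {
            // Double hashing: derive the i-th probe position from two base hashes.
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                allSet = false;
                bits.set(index);
            }
        }
        return allSet;
    }

    // 32-bit FNV-1a style hash with a caller-supplied starting value.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}

final class BloomUrlFilterDemo {
    public static void main(String[] args) {
        BloomUrlFilter filter = new BloomUrlFilter(1 << 20, 3);
        System.out.println(filter.isDuplicate("https://example.com/a")); // first sighting
        System.out.println(filter.isDuplicate("https://example.com/a")); // repeat sighting
    }
}
```

Memory use is fixed by `size` regardless of how many URLs are pushed, which is the trade-off against an exact set: the false-positive rate rises as the filter fills, so `size` and `hashCount` should be chosen for the expected URL volume.
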
Copyright © 2017. All rights reserved.