Package us.codecraft.webmagic

Main class "Spider" and models.

Interface Summary
Interface	Description
MultiPageModel	Extracts a single object that spans more than one page, such as a news item or an article.
SpiderListener	Listener for Spider page-processing events.
Task	Interface for identifying different tasks.

Class Summary
Class	Description
Page	Object that stores the extracted results and the URLs to fetch. Not thread-safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results for use in a `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch.
Request	Object that contains the URL to crawl, along with some additional information.
ResultItems	Object that contains the extracted results. It is held by a Page and is processed in the pipeline.
SimpleHttpClient
Site	Object that contains the settings for the crawler.
Spider	Entry point of a crawler. A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Every module is a field of Spider.
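
The division of labour between the four modules, and the role of Page as the object passed between them, can be illustrated with a small dependency-free sketch. This is not WebMagic code: `CrawlLoopSketch`, `SketchPage`, `getContent()` and `crawl(...)` are hypothetical stand-ins, the loop is single-threaded, and results are returned as a plain `Map` rather than a `ResultItems` object. The method names `putField`, `addTargetRequest`, `getUrl` and `getResultItems` are borrowed from the table above only to show where each one would sit in the flow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in types; these are NOT the WebMagic classes.
class CrawlLoopSketch {

    /** Stand-in for Page: holds the URL, the fetched content, extracted fields and new URLs. */
    static final class SketchPage {
        private final String url;
        private final String content;
        private final Map<String, Object> fields = new LinkedHashMap<>();
        private final List<String> targetRequests = new ArrayList<>();

        SketchPage(String url, String content) {
            this.url = url;
            this.content = content;
        }

        String getUrl() { return url; }
        String getContent() { return content; }
        void putField(String key, Object value) { fields.put(key, value); }
        void addTargetRequest(String nextUrl) { targetRequests.add(nextUrl); }
        Map<String, Object> getResultItems() { return fields; }
        List<String> getTargetRequests() { return targetRequests; }
    }

    /**
     * Runs a single-threaded crawl loop and returns the number of pages processed.
     * downloader: URL -> content (null if nothing could be fetched)
     * processor:  fills the page with fields and follow-up URLs
     * pipeline:   consumes the extracted fields of each page
     * The scheduler role is played by a FIFO queue plus a "seen" set.
     */
    static int crawl(String startUrl,
                     Function<String, String> downloader,
                     Consumer<SketchPage> processor,
                     Consumer<Map<String, Object>> pipeline) {
        Queue<String> scheduler = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        scheduler.add(startUrl);
        seen.add(startUrl);
        int processed = 0;
        while (!scheduler.isEmpty()) {
            String url = scheduler.poll();
            String content = downloader.apply(url);
            if (content == null) {
                continue; // nothing fetched for this URL
            }
            SketchPage page = new SketchPage(url, content);
            processor.accept(page);
            pipeline.accept(page.getResultItems());
            for (String next : page.getTargetRequests()) {
                if (seen.add(next)) { // enqueue each URL at most once
                    scheduler.add(next);
                }
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        // Fake "site": an in-memory map from URL to content, with two pages linking to each other.
        Map<String, String> site = new LinkedHashMap<>();
        site.put("http://example.test/a", "next=http://example.test/b");
        site.put("http://example.test/b", "next=http://example.test/a");

        List<Map<String, Object>> collected = new ArrayList<>();
        int n = crawl("http://example.test/a", site::get, page -> {
            page.putField("url", page.getUrl());
            String c = page.getContent();
            if (c.startsWith("next=")) {
                page.addTargetRequest(c.substring("next=".length()));
            }
        }, collected::add);

        System.out.println(n + " pages, " + collected.size() + " results"); // prints "2 pages, 2 results"
    }
}
```

In this sketch the page object is created per URL and used by a single thread only, which is in line with the "not thread-safe" note on Page; a multi-threaded variant would need one page instance per worker.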

Enum Summary
Enum	Description
Spider.Status

Copyright © 2017. All rights reserved.