@ThreadSafe public class PhantomJSDownloader extends AbstractDownloader
Constructor and Description |
---|
PhantomJSDownloader() |
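PhantomJSDownloader(String phantomJsCommand): Adds a constructor that supports a custom phantomjs command. Examples: phantomjs.exe (for Windows environments); phantomjs --ignore-ssl-errors=yes (ignores certain errors when the crawled URL uses https); /usr/local/bin/phantomjs (absolute path to the command, which avoids an IOException caused by system environment variables). |
PhantomJSDownloader(String phantomJsCommand, String crawlJsPath): Adds a constructor that supports a custom crawl.js path. This is needed because, when another project depends on this jar, the phantomjs command run through runtime.exec() cannot use the crawl.js packaged inside the jar. |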
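
The constructor arguments above can be pictured as the leading parts of a command line. The following is a minimal sketch, assuming the command, the crawl.js path, and the target URL are joined with single spaces before being passed to runtime.exec(). The class PhantomCommandSketch and its buildCommand method are hypothetical and are not part of this library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library: illustrates how a command line
// could be composed from the two constructor arguments plus the URL to crawl.
final class PhantomCommandSketch {

    // Assumption: the three parts are joined with single spaces.
    static String buildCommand(String phantomJsCommand, String crawlJsPath, String url) {
        return phantomJsCommand + " " + crawlJsPath + " " + url;
    }

    public static void main(String[] args) {
        // The three example commands listed in the constructor summary.
        List<String> commands = Arrays.asList(
                "phantomjs.exe",
                "phantomjs --ignore-ssl-errors=yes",
                "/usr/local/bin/phantomjs");
        for (String command : commands) {
            System.out.println(buildCommand(command, "/your/path/crawl.js", "https://example.com"));
        }
    }
}
```

Passing an absolute path such as /usr/local/bin/phantomjs means the command does not depend on the PATH seen by the Java process.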
Modifier and Type | Method and Description |
---|---|
Page | download(Request request, Task task): Downloads web pages and stores them in a Page object. |
protected String | getPage(Request request) |
int | getRetryNum() |
PhantomJSDownloader | setRetryNum(int retryNum) |
void | setThread(int threadNum): Tells the downloader how many threads the spider uses. |
Methods inherited from class AbstractDownloader: download, download, onError, onSuccess
public PhantomJSDownloader()
public PhantomJSDownloader(String phantomJsCommand)
phantomJsCommand - phantomJsCommand
public PhantomJSDownloader(String phantomJsCommand, String crawlJsPath)
crawl.js start --
var system = require('system');
var url = system.args[1];
var page = require('webpage').create();
page.settings.loadImages = false;
page.settings.resourceTimeout = 5000;
page.open(url, function (status) {
    if (status != 'success') {
        console.log("HTTP request failed!");
    } else {
        console.log(page.content);
    }
    page.close();
    phantom.exit();
});
-- crawl.js end
In your own project, you can copy the JavaScript above and use it as is.
example: new PhantomJSDownloader("/your/path/phantomjs", "/your/path/crawl.js");
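
One way to obtain a crawl.js path at run time is to write the script above to a temporary file and pass that file's absolute path as crawlJsPath. The following is a minimal sketch of that idea. The class CrawlJsExtractorSketch and its writeCrawlJs method are hypothetical and are not part of this library, and the use of a temporary file is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the library: materializes crawl.js on disk
// so that an external phantomjs process can read it.
final class CrawlJsExtractorSketch {

    // The crawl.js script quoted in the constructor description.
    static final String CRAWL_JS = String.join("\n",
            "var system = require('system');",
            "var url = system.args[1];",
            "var page = require('webpage').create();",
            "page.settings.loadImages = false;",
            "page.settings.resourceTimeout = 5000;",
            "page.open(url, function (status) {",
            "    if (status != 'success') {",
            "        console.log(\"HTTP request failed!\");",
            "    } else {",
            "        console.log(page.content);",
            "    }",
            "    page.close();",
            "    phantom.exit();",
            "});",
            "");

    // Writes the script to a temporary file and returns its path.
    static Path writeCrawlJs() {
        try {
            Path script = Files.createTempFile("crawl", ".js");
            Files.write(script, CRAWL_JS.getBytes(StandardCharsets.UTF_8));
            script.toFile().deleteOnExit(); // clean up when the JVM exits
            return script;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path script = writeCrawlJs();
        // The printed path could then be passed as the crawlJsPath argument.
        System.out.println(script.toAbsolutePath());
    }
}
```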
phantomJsCommand - phantomJsCommand
crawlJsPath - crawlJsPath
public Page download(Request request, Task task)
Specified by: download in interface Downloader
request - request
task - task
public void setThread(int threadNum)
Specified by: setThread in interface Downloader
threadNum - number of threads
public int getRetryNum()
public PhantomJSDownloader setRetryNum(int retryNum)
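
This page does not describe how the retry count is applied. The following is a minimal sketch of one plausible reading, assuming a fetch is repeated up to retryNum additional times while the output still contains the "HTTP request failed" marker printed by crawl.js. The class RetrySketch and its fetchWithRetry method are hypothetical. Only the getRetryNum and setRetryNum signatures, including the fluent return type, mirror the methods listed above.

```java
import java.util.function.Supplier;

// Hypothetical stand-in, not part of the library: models a retry count with a
// fluent setter and a retry loop around a page fetch.
final class RetrySketch {

    // Marker that crawl.js prints when the page could not be opened.
    static final String FAILURE_MARKER = "HTTP request failed";

    private int retryNum;

    int getRetryNum() {
        return retryNum;
    }

    // Returns this so that calls can be chained, like the setter listed above.
    RetrySketch setRetryNum(int retryNum) {
        this.retryNum = retryNum;
        return this;
    }

    // Assumption: one initial attempt, then up to retryNum further attempts.
    // Returns the page content, or null if every attempt failed.
    String fetchWithRetry(Supplier<String> fetcher) {
        String content = fetcher.get();
        for (int i = 0; i < retryNum && content.contains(FAILURE_MARKER); i++) {
            content = fetcher.get();
        }
        return content.contains(FAILURE_MARKER) ? null : content;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated fetcher that fails twice before succeeding.
        Supplier<String> flaky = () -> ++attempts[0] < 3 ? "HTTP request failed!" : "<html></html>";
        String result = new RetrySketch().setRetryNum(3).fetchWithRetry(flaky);
        System.out.println(attempts[0] + " attempts: " + result);
    }
}
```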
Copyright © 2017. All rights reserved.