@ThreadSafe public class PhantomJSDownloader extends AbstractDownloader
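| Constructor and Description |
|---|
| PhantomJSDownloader() |
| PhantomJSDownloader(String phantomJsCommand)<br>Constructor that accepts a custom phantomjs command. Examples: "phantomjs.exe" for Windows environments; "phantomjs --ignore-ssl-errors=yes" to ignore certain errors when the target URL uses https; "/usr/local/bin/phantomjs", the absolute path of the command, which avoids an IOException caused by system environment variables. |
| PhantomJSDownloader(String phantomJsCommand, String crawlJsPath)<br>Constructor that accepts a custom crawl.js path. It is needed because, when another project depends on this jar, the phantomjs command run through Runtime.exec() cannot use the crawl.js packaged inside the jar. |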
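
The constructor arguments above end up as parts of a single phantomjs command line. The following is a minimal sketch of that idea, assuming the command is followed by the crawl.js path and then the target URL (which matches the crawl.js script on this page reading the URL from system.args[1]). PhantomCommandSketch and buildCommand are hypothetical names for illustration and are not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhantomCommandSketch {

    // Hypothetical helper: splits phantomJsCommand on whitespace so that options
    // such as --ignore-ssl-errors=yes become separate arguments, then appends
    // the crawl.js path and the URL to fetch.
    // Assumption: the executable path itself contains no spaces.
    public static List<String> buildCommand(String phantomJsCommand, String crawlJsPath, String url) {
        List<String> command = new ArrayList<>(Arrays.asList(phantomJsCommand.trim().split("\\s+")));
        command.add(crawlJsPath);
        command.add(url);
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
                "phantomjs --ignore-ssl-errors=yes",
                "/your/path/crawl.js",
                "https://example.com");
        // The list could be handed to new ProcessBuilder(command) to start phantomjs.
        System.out.println(command);
    }
}
```

Passing an absolute path such as /usr/local/bin/phantomjs as the first argument avoids relying on the PATH of the process that starts the JVM.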
| Modifier and Type | Method and Description |
|---|---|
| Page | download(Request request, Task task)<br>Downloads web pages and stores them in a Page object. |
| protected String | getPage(Request request) |
| int | getRetryNum() |
| PhantomJSDownloader | setRetryNum(int retryNum) |
| void | setThread(int threadNum)<br>Tells the downloader how many threads the spider uses. |

Methods inherited from class AbstractDownloader: download, download, onError, onSuccess

public PhantomJSDownloader()

public PhantomJSDownloader(String phantomJsCommand)
Parameters:
phantomJsCommand - phantomJsCommand

public PhantomJSDownloader(String phantomJsCommand, String crawlJsPath)
crawl.js start --
var system = require('system');
var url = system.args[1];
var page = require('webpage').create();
page.settings.loadImages = false;
page.settings.resourceTimeout = 5000;
page.open(url, function (status) {
    if (status != 'success') {
        console.log("HTTP request failed!");
    } else {
        console.log(page.content);
    }
    page.close();
    phantom.exit();
});
-- crawl.js end
In your own project, you can copy the JavaScript code above and use it as your crawl.js.
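
One way to make the copied script available on disk is to write it to a temporary file at startup and pass that file's path as crawlJsPath. This is only a sketch under that assumption: CrawlJsFileSketch and writeCrawlJs are hypothetical names, and the script text simply reproduces the crawl.js listing above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CrawlJsFileSketch {

    // Copy of the crawl.js listing shown on this page.
    static final String CRAWL_JS = String.join("\n",
            "var system = require('system');",
            "var url = system.args[1];",
            "var page = require('webpage').create();",
            "page.settings.loadImages = false;",
            "page.settings.resourceTimeout = 5000;",
            "page.open(url, function (status) {",
            "    if (status != 'success') {",
            "        console.log(\"HTTP request failed!\");",
            "    } else {",
            "        console.log(page.content);",
            "    }",
            "    page.close();",
            "    phantom.exit();",
            "});",
            "");

    // Hypothetical helper: writes the script to a temporary file and returns its path.
    public static Path writeCrawlJs() throws IOException {
        Path file = Files.createTempFile("crawl", ".js");
        Files.write(file, CRAWL_JS.getBytes(StandardCharsets.UTF_8));
        file.toFile().deleteOnExit(); // remove the file when the JVM exits
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path crawlJs = writeCrawlJs();
        // The printed path is what would be passed as the crawlJsPath argument.
        System.out.println(crawlJs.toAbsolutePath());
    }
}
```

Writing the script to a real file sidesteps the problem that phantomjs, as an external process, cannot read a resource that exists only inside a jar.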
example:
new PhantomJSDownloader("/your/path/phantomjs", "/your/path/crawl.js");
Parameters:
phantomJsCommand - phantomJsCommand
crawlJsPath - crawlJsPath

public Page download(Request request, Task task)
Specified by: download in interface Downloader
Parameters:
request - request
task - task

public void setThread(int threadNum)
Specified by: setThread in interface Downloader
Parameters:
threadNum - number of threads

public int getRetryNum()

public PhantomJSDownloader setRetryNum(int retryNum)

Copyright © 2017. All rights reserved.