4.3 Save the results
Now that the crawler is written, a natural question is: how do I save the results it grabs? In WebMagic, the component responsible for saving results is called a Pipeline. For example, the console output we have used so far is handled by a built-in Pipeline called ConsolePipeline. To save the results in JSON format instead, I only need to switch the Pipeline implementation to JsonFilePipeline.
public static void main(String[] args) {
    Spider.create(new GithubRepoPageProcessor())
            // Start crawling from "https://github.com/code4craft"
            .addUrl("https://github.com/code4craft")
            // Save the results as JSON files under D:\webmagic\
            .addPipeline(new JsonFilePipeline("D:\\webmagic\\"))
            // Crawl with 5 threads
            .thread(5)
            // Start the crawler
            .run();
}
With this change, the crawled results are saved as files under the webmagic directory on drive D:.
By writing a custom Pipeline, we can save the results to a file, to a database, or process them in other ways. This is covered in Chapter 7, "Handling extraction results".
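To make the idea of swappable result handlers concrete, here is a minimal, self-contained sketch. It does not use the WebMagic API: ResultSink, ConsoleSink, FileSink and dispatch are hypothetical stand-ins invented for illustration, and the field values are sample data. The real Pipeline interface is described in Chapter 7.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PipelineSketch {

    // Hypothetical stand-in for a Pipeline: receives the fields extracted from one page.
    interface ResultSink {
        void process(Map<String, Object> fields);
    }

    // Prints every field to the console, similar in spirit to ConsolePipeline.
    static class ConsoleSink implements ResultSink {
        @Override
        public void process(Map<String, Object> fields) {
            fields.forEach((key, value) -> System.out.println(key + ":\t" + value));
        }
    }

    // Writes the fields of each page to its own text file inside a directory.
    static class FileSink implements ResultSink {
        private final Path dir;
        private int counter = 0;

        FileSink(Path dir) {
            this.dir = dir;
        }

        // synchronized because a crawler may call this from several threads
        @Override
        public synchronized void process(Map<String, Object> fields) {
            try {
                Files.createDirectories(dir);
                Path file = dir.resolve("result-" + (++counter) + ".txt");
                List<String> lines = new ArrayList<>();
                fields.forEach((key, value) -> lines.add(key + "=" + value));
                Files.write(file, lines, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Stand-in for the crawler: hands the same result to every registered sink.
    static void dispatch(Map<String, Object> fields, List<ResultSink> sinks) {
        for (ResultSink sink : sinks) {
            sink.process(fields);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for the fields extracted from one page.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("author", "code4craft");
        fields.put("name", "webmagic");

        Path dir = Files.createTempDirectory("pipeline-sketch");
        dispatch(fields, Arrays.<ResultSink>asList(new ConsoleSink(), new FileSink(dir)));
        System.out.println("Saved to " + dir);
    }
}
```

The point of the sketch is the design: the crawler only knows the handler interface, so changing where results go means swapping the implementation, not changing the crawling code.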
At this point, we have written a basic crawler and have also seen several ways to customize it.