5.5 Type conversion of results
Type conversion (the Formatter mechanism) was added in WebMagic 0.3.2. Extracted content is always a String, but the field we want to fill may be of another type. A Formatter automatically converts the extracted content to certain basic types, so no manual conversion code is needed.
For example:
@ExtractBy("//ul[@class='pagehead-actions']/li[1]//a[@class='social-count js-social-count']/text()")
private int star;
5.5.1 Types supported by automatic conversion
Automatic conversion supports all primitive types and their wrapper types.
Primitive type | Wrapper type |
---|---|
int | Integer |
long | Long |
double | Double |
float | Float |
short | Short |
char | Character |
byte | Byte |
boolean | Boolean |
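
To make the table concrete, here is a minimal plain-JDK sketch of the kind of manual conversion that the automatic mechanism spares you. The class `BasicTypeConversionSketch` and its `convert` method are hypothetical names used only for illustration; this is not WebMagic's actual implementation, and trimming whitespace before parsing is an assumption.

```java
// Hypothetical helper, NOT part of WebMagic: converts an extracted String
// to one of the primitive/wrapper types listed in the table above.
class BasicTypeConversionSketch {

    static Object convert(String raw, Class<?> target) {
        String s = raw.trim(); // assumption: surrounding whitespace is not significant
        if (target == int.class || target == Integer.class) return Integer.parseInt(s);
        if (target == long.class || target == Long.class) return Long.parseLong(s);
        if (target == double.class || target == Double.class) return Double.parseDouble(s);
        if (target == float.class || target == Float.class) return Float.parseFloat(s);
        if (target == short.class || target == Short.class) return Short.parseShort(s);
        if (target == byte.class || target == Byte.class) return Byte.parseByte(s);
        if (target == boolean.class || target == Boolean.class) return Boolean.parseBoolean(s);
        if (target == char.class || target == Character.class) {
            if (s.isEmpty()) throw new IllegalArgumentException("Empty text cannot become a char");
            return s.charAt(0); // takes the first character only
        }
        throw new IllegalArgumentException("Unsupported target type: " + target);
    }

    public static void main(String[] args) {
        Object star = convert(" 1024 ", int.class);
        System.out.println(star + " (" + star.getClass().getSimpleName() + ")");
    }
}
```

Numeric parsing of this kind throws `NumberFormatException` on text such as `1,024`, so the extraction expression should return only the digits.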
In addition, conversion to java.util.Date is supported. For this conversion you need to specify the date format. The format follows the standard JDK definition (the SimpleDateFormat pattern syntax); the detailed rules can be found here: http://java.sun.com/docs/books/tutorial/i18n/format/simpleDateFormat.html
@Formatter("yyyy-MM-dd HH:mm")
@ExtractBy("//div[@class='BlogStat']/regex('\\d+-\\d+-\\d+\\s+\\d+:\\d+')")
private Date date;
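As a rough illustration of what the pattern means, the following plain-JDK sketch parses a string of the shape matched by the regex above with the same `yyyy-MM-dd HH:mm` pattern. The class `DateConversionSketch` and the sample date are hypothetical; how WebMagic handles parse failures internally is not shown here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, NOT part of WebMagic: parses extracted text
// with a SimpleDateFormat pattern such as the one given to @Formatter.
class DateConversionSketch {

    static Date parse(String raw, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject impossible values such as month 13
        try {
            return format.parse(raw.trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Text does not match pattern " + pattern, e);
        }
    }

    public static void main(String[] args) {
        // Sample value invented for the example; parsed in the JVM's default time zone.
        Date date = parse("2013-08-25 16:30", "yyyy-MM-dd HH:mm");
        System.out.println(date);
    }
}
```

The pattern has to match the extracted text exactly: `HH` is the 24-hour clock and `MM` is the month, while lowercase `mm` is minutes.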
5.5.2 Explicitly specifying the conversion type
Normally, the Formatter converts according to the field type, but in special cases the type has to be specified manually. This mainly happens when the field type is List.
@Formatter(value = "",subClazz = Integer.class)
@ExtractBy(value = "//div[@class='id']/text()", multi = true)
private List<Integer> ids;
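Conceptually, the element type given by subClazz decides how each extracted string in the list is converted. The plain-JDK sketch below shows that per-element conversion; `ListConversionSketch` and `convertAll` are hypothetical names, and the sketch makes no claim about how WebMagic implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, NOT part of WebMagic: converts every extracted
// string with the parser that corresponds to the declared element type.
class ListConversionSketch {

    static <T> List<T> convertAll(List<String> raw, Function<String, T> elementParser) {
        List<T> result = new ArrayList<>(raw.size());
        for (String item : raw) {
            result.add(elementParser.apply(item.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values invented for the example.
        List<Integer> ids = convertAll(Arrays.asList("1", " 22 ", "333"), Integer::parseInt);
        System.out.println(ids);
    }
}
```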
5.5.3 Custom Formatter (TODO)
In fact, besides automatic type conversion, a Formatter can also post-process results. For example, suppose the extracted result has to be inserted into a larger string as one of its parts. For this we define a StringTemplateFormatter.
public class StringTemplateFormatter implements ObjectFormatter<String> {

    private String template;

    @Override
    public String format(String raw) throws Exception {
        return String.format(template, raw);
    }

    @Override
    public Class<String> clazz() {
        return String.class;
    }

    @Override
    public void initParam(String[] extra) {
        template = extra[0];
    }
}
With this in place, we can apply simple operations to the result after extraction:
@Formatter(value = "author is %s",formatter = StringTemplateFormatter.class)
@ExtractByUrl("https://github\\.com/(\\w+)/.*")
private String author;
This feature has a bug in version 0.4.3; it will be fixed and made available in 0.5.0.
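To show the intended end result without depending on the library, the following plain-JDK sketch applies the same URL regex and the same template by hand. `AuthorTemplateSketch` is a hypothetical class, the sample URL is invented, and taking capture group 1 of the first match is an assumption about how the URL extraction behaves.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT WebMagic code: reproduces by hand what the
// annotated field is meant to end up containing.
class AuthorTemplateSketch {

    private static final Pattern URL_PATTERN = Pattern.compile("https://github\\.com/(\\w+)/.*");

    static String author(String url) {
        Matcher matcher = URL_PATTERN.matcher(url);
        if (!matcher.find()) {
            return null; // nothing extracted, so nothing to format
        }
        // Assumption: the first capture group is the extracted value.
        return String.format("author is %s", matcher.group(1));
    }

    public static void main(String[] args) {
        System.out.println(author("https://github.com/code4craft/webmagic"));
    }
}
```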