public class BasicIndexingFilter extends Object implements IndexingFilter
indexer.add.domain in nutch-default.xml. title is truncated as per
indexer.max.title.length in nutch-default.xml. (As per NUTCH-1004, a
zero-length title is not added.) content is truncated as per
indexer.max.content.length in nutch-default.xml.

| Constructor and Description |
|---|
| BasicIndexingFilter() |
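
The truncation rules in the class description can be illustrated with a small standalone sketch. It does not use any Nutch classes: the `TruncationSketch` class, its method names, and the use of a `Map` in place of a `NutchDocument` are all hypothetical, and treating a negative limit as "no limit" is an assumption rather than something this page states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Standalone sketch of the title and content truncation rules.
 * Hypothetical helper; the Map stands in for a NutchDocument.
 */
class TruncationSketch {

    /** Cuts value to maxLength characters. Assumption: a negative limit means "no limit". */
    static String truncate(String value, int maxLength) {
        if (maxLength >= 0 && value.length() > maxLength) {
            return value.substring(0, maxLength);
        }
        return value;
    }

    /** Builds the "content" and "title" fields under the given limits. */
    static Map<String, String> buildFields(String title, String content,
                                           int maxTitleLength, int maxContentLength) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("content", truncate(content, maxContentLength));

        String shortTitle = truncate(title, maxTitleLength);
        // As per NUTCH-1004, a zero-length title is not added.
        if (!shortTitle.isEmpty()) {
            fields.put("title", shortTitle);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Title limited to 12 characters, content left unlimited.
        System.out.println(buildFields("Apache Nutch crawler", "Some page text", 12, -1));
    }
}
```

Applying the limit before the empty-title check means a title that truncates to nothing is skipped as well; whether the real filter orders the two steps this way is not confirmed by this page.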
| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) The BasicIndexingFilter filter object which supports a few configuration settings for adding basic searchable fields. |
| Configuration | getConf() Get the Configuration object |
| void | setConf(Configuration conf) Set the Configuration object |
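
The setConf/getConf pair in the method summary follows a common pattern: the configuration is handed over once, and the settings the filter needs are cached at that point. Below is a minimal sketch of that pattern, assuming `java.util.Properties` as a stand-in for the Hadoop `Configuration` class. The class `ConfigurableFilterSketch`, its accessor names, and the fallback values are hypothetical and are not taken from nutch-default.xml.

```java
import java.util.Properties;

/**
 * Hypothetical stand-in for a configurable filter.
 * This is not the Nutch or Hadoop API; Properties replaces Configuration.
 */
class ConfigurableFilterSketch {
    private Properties conf;
    private boolean addDomain;
    private int maxTitleLength;
    private int maxContentLength;

    /** Stores the configuration and caches the three settings named on this page. */
    void setConf(Properties conf) {
        this.conf = conf;
        // Fallback values are illustrative assumptions only.
        this.addDomain = Boolean.parseBoolean(conf.getProperty("indexer.add.domain", "false"));
        this.maxTitleLength = Integer.parseInt(conf.getProperty("indexer.max.title.length", "100"));
        this.maxContentLength = Integer.parseInt(conf.getProperty("indexer.max.content.length", "-1"));
    }

    /** Returns the configuration object passed to setConf. */
    Properties getConf() {
        return conf;
    }

    boolean isAddDomain() { return addDomain; }
    int getMaxTitleLength() { return maxTitleLength; }
    int getMaxContentLength() { return maxContentLength; }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("indexer.add.domain", "true");
        props.setProperty("indexer.max.title.length", "50");

        ConfigurableFilterSketch filter = new ConfigurableFilterSketch();
        filter.setConf(props);
        System.out.println(filter.isAddDomain() + " " + filter.getMaxTitleLength());
    }
}
```

Reading the settings once in setConf keeps the per-document filter call free of repeated property lookups, which is the usual reason for this arrangement.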
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException

The BasicIndexingFilter filter object which supports a few
configuration settings for adding basic searchable fields. See
indexer.add.domain, indexer.max.title.length,
indexer.max.content.length in nutch-default.xml.

Specified by: filter in interface IndexingFilter

Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text

Throws:
IndexingException

public void setConf(Configuration conf)

Set the Configuration object

Specified by: setConf in interface Configurable

public Configuration getConf()

Get the Configuration object

Specified by: getConf in interface Configurable

Copyright © 2021 The Apache Software Foundation