public class AnchorIndexingFilter extends Object implements IndexingFilter
See anchorIndexingFilter.deduplicate in nutch-default.xml.

| Constructor and Description |
|---|
| AnchorIndexingFilter() |

| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) The AnchorIndexingFilter filter object which supports boolean configuration settings for the deduplication of anchors. |
| Configuration | getConf() Get the Configuration object |
| void | setConf(Configuration conf) Set the Configuration object |
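
The effect of the boolean deduplication setting described for filter can be illustrated with a minimal, standard-library-only sketch. It is not the Nutch implementation: the class AnchorDedupSketch and the helper collectAnchors are hypothetical names, and both the case-insensitive comparison and the skipping of empty anchors are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnchorDedupSketch {

    /**
     * Hypothetical helper (not part of Nutch): returns the anchor texts that
     * would be added to a document. When deduplicate is true, an anchor is
     * kept only the first time it is seen, compared case-insensitively
     * (an assumption). Empty anchors are skipped (also an assumption).
     */
    public static List<String> collectAnchors(List<String> inboundAnchors, boolean deduplicate) {
        List<String> indexed = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String anchor : inboundAnchors) {
            if (anchor == null || anchor.isEmpty()) {
                continue; // nothing to index for this inlink
            }
            if (deduplicate) {
                String key = anchor.toLowerCase(Locale.ROOT);
                if (!seen.add(key)) {
                    continue; // an equivalent anchor was already indexed
                }
            }
            indexed.add(anchor);
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> anchors = Arrays.asList("Apache Nutch", "apache nutch", "", "Crawler");
        System.out.println(collectAnchors(anchors, true));  // [Apache Nutch, Crawler]
        System.out.println(collectAnchors(anchors, false)); // [Apache Nutch, apache nutch, Crawler]
    }
}
```

With deduplication on, repeated anchor text is indexed once. With it off, every non-empty inbound anchor is indexed, so repeated anchors appear multiple times in the anchor field.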

public void setConf(Configuration conf)
Set the Configuration object.
Specified by: setConf in interface Configurable

public Configuration getConf()
Get the Configuration object.
Specified by: getConf in interface Configurable

public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The AnchorIndexingFilter filter object which supports boolean configuration settings for the deduplication of anchors. See anchorIndexingFilter.deduplicate in nutch-default.xml.
Specified by: filter in interface IndexingFilter
Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text
Throws:
IndexingException

Copyright © 2021 The Apache Software Foundation