public class MimeTypeIndexingFilter extends Object implements IndexingFilter
An IndexingFilter that filters documents based on the MIME type detected by Tika.

| Modifier and Type | Field and Description |
|---|---|
| static String | MIMEFILTER_REGEX_FILE |

Fields inherited from interface IndexingFilter: X_POINT_ID

| Constructor and Description |
|---|
| MimeTypeIndexingFilter() |
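The class description says documents are filtered by their detected MIME type. Below is a minimal sketch of such a decision. It assumes the rules are regular expressions matched against the full MIME type string, and that a single switch decides whether matching types are kept or dropped. MimeFilterSketch, keepMatches, and the sample patterns are hypothetical. The MIME type is passed in as a plain string; the sketch does not call Tika, which the real plugin uses for detection.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MimeFilterSketch {
    private final List<Pattern> rules;
    // Hypothetical switch: true keeps only matching MIME types (allow-list),
    // false drops matching MIME types (deny-list).
    private final boolean keepMatches;

    public MimeFilterSketch(List<String> regexes, boolean keepMatches) {
        this.rules = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.keepMatches = keepMatches;
    }

    /** Returns true if a document with this MIME type should be indexed. */
    public boolean accept(String mimeType) {
        boolean matched = rules.stream().anyMatch(p -> p.matcher(mimeType).matches());
        return matched == keepMatches;
    }

    public static void main(String[] args) {
        MimeFilterSketch allow = new MimeFilterSketch(List.of("text/html", "application/pdf"), true);
        System.out.println(allow.accept("text/html"));  // true
        System.out.println(allow.accept("image/png"));  // false

        MimeFilterSketch deny = new MimeFilterSketch(List.of("image/.*"), false);
        System.out.println(deny.accept("image/png"));   // false
        System.out.println(deny.accept("text/plain"));  // true
    }
}
```

The sketch uses matches() rather than find(), so each rule must cover the whole MIME type string. A pattern such as "text/.*" is therefore needed to match every text type.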
| Modifier and Type | Method | Description |
|---|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | Adds fields or otherwise modifies the document that will be indexed for a parse. |
| Configuration | getConf() | |
| static void | main(String[] args) | Main method for invoking this tool. |
| void | setConf(Configuration conf) | |
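The filter method receives a document and hands back the document to be indexed. A filter like this one can remove a document from indexing by returning null instead. The sketch below models that contract for a chain of filters. It assumes the chain stops at the first filter that returns null, and it uses a plain Map as a hypothetical stand-in for NutchDocument. FilterChainSketch and runChain are illustrative names, not Nutch APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FilterChainSketch {
    // Hypothetical stand-in for NutchDocument: a simple field map.
    // Each filter returns the (possibly modified) document, or null to discard it.
    static Map<String, String> runChain(Map<String, String> doc,
                                        List<UnaryOperator<Map<String, String>>> filters) {
        for (UnaryOperator<Map<String, String>> filter : filters) {
            doc = filter.apply(doc);
            if (doc == null) {
                return null; // document discarded; remaining filters are skipped
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // A MIME-type check that keeps only text/* documents.
        UnaryOperator<Map<String, String>> mimeFilter =
            d -> d.getOrDefault("type", "").startsWith("text/") ? d : null;
        // A second filter that adds a field to every document it sees.
        UnaryOperator<Map<String, String>> addField = d -> {
            d.put("indexed", "yes");
            return d;
        };

        Map<String, String> html = new HashMap<>(Map.of("type", "text/html"));
        Map<String, String> png = new HashMap<>(Map.of("type", "image/png"));
        System.out.println(runChain(html, List.of(mimeFilter, addField)));
        System.out.println(runChain(png, List.of(mimeFilter, addField)));
    }
}
```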
public static final String MIMEFILTER_REGEX_FILE
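Judging by its name, MIMEFILTER_REGEX_FILE most likely holds the configuration key for a file of MIME type regular expressions. This page gives neither the constant's value nor the file format. The sketch below therefore assumes a hypothetical property key and a simple format: one regex per line, with blank lines and lines starting with "#" ignored. java.util.Properties stands in for Hadoop's Configuration, and MimeRulesLoader, RULES_FILE_KEY, and parseRules are illustrative names only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

public class MimeRulesLoader {
    // Hypothetical property key; the actual value of MIMEFILTER_REGEX_FILE is not shown here.
    static final String RULES_FILE_KEY = "example.mime.rules.file";

    /** Parses one regex per line, skipping blank lines and '#' comments (assumed format). */
    static List<Pattern> parseRules(Reader in) {
        List<Pattern> rules = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                rules.add(Pattern.compile(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rules;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RULES_FILE_KEY, "mime-rules.txt");
        String fileName = conf.getProperty(RULES_FILE_KEY);

        // A real filter would open fileName; its contents are inlined here instead.
        String contents = "# index only HTML and PDF\ntext/html\napplication/pdf\n";
        List<Pattern> rules = parseRules(new StringReader(contents));
        System.out.println(fileName + " -> " + rules.size() + " rules"); // mime-rules.txt -> 2 rules
    }
}
```

The parser takes a Reader rather than a file path, so the rules can come from a file, a classpath resource, or a string in tests.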
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException

Specified by: filter in interface IndexingFilter

Parameters:

- doc - document instance for collecting fields
- parse - parse data instance
- url - page url
- datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
- inlinks - page inlinks

Throws: IndexingException

public void setConf(Configuration conf)

Specified by: setConf in interface Configurable

public Configuration getConf()

Specified by: getConf in interface Configurable

public static void main(String[] args) throws IOException, IndexingException

Throws: IOException, IndexingException

Copyright © 2021 The Apache Software Foundation