| Modifier and Type | Field and Description |
|---|---|
| `static String` | `DEFAULT_PLUGIN`: Wildcard for default plugins. |

| Constructor and Description |
|---|
| `ParserFactory(Configuration conf)` |

| Modifier and Type | Method and Description |
|---|---|
| `protected List<Extension>` | `getExtensions(String contentType)`: Finds the best-suited parse plugin for a given contentType. |
| `Parser` | `getParserById(String id)`: Returns a Parser instance with the specified extension ID (id). |
| `Parser[]` | `getParsers(String contentType, String url)`: Returns an array of Parsers for a given content type. |
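
getParsers, detailed below, keeps the plugin order configured in parse-plugins.xml for a content type but skips any plugin that was not enabled via plugin.includes. The following is a minimal, stdlib-only sketch of just that filtering rule. The class and method names (EnabledParserFilter, filterEnabled) are hypothetical and this is not Nutch's code; the real method goes on to resolve the surviving plugin IDs to extension points and instantiate them as Parsers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the ordering/filtering rule described for getParsers:
 * keep the order from parse-plugins.xml, but drop every plugin that is not
 * enabled via plugin.includes. Not taken from Nutch's source.
 */
public class EnabledParserFilter {

    /** Returns the IDs from {@code mapped} that also appear in {@code enabled}, in mapped order. */
    static List<String> filterEnabled(List<String> mapped, Set<String> enabled) {
        List<String> result = new ArrayList<>();
        for (String pluginId : mapped) {
            if (enabled.contains(pluginId)) {
                result.add(pluginId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Ordered mapping for text/plain, as in the getParsers description.
        List<String> mapped = Arrays.asList("parse-text", "parse-html", "parse-rtf");
        // Only these two plugins are enabled.
        Set<String> enabled = new HashSet<>(Arrays.asList("parse-html", "parse-rtf"));
        System.out.println(filterEnabled(mapped, enabled)); // prints [parse-html, parse-rtf]
    }
}
```

Iterating the ordered list and checking membership in a set preserves the configured priority order while keeping each enabled-check a constant-time lookup.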

public static final String DEFAULT_PLUGIN

Wildcard for default plugins.

public ParserFactory(Configuration conf)

public Parser[] getParsers(String contentType, String url) throws ParserNotFound

Function returns an array of Parsers for a given content type.

The function consults the internal list of parse plugins for the ParserFactory to determine the list of plugin IDs, then gets the appropriate extension points to instantiate as Parsers.

Parameters:
contentType - The contentType to return the Array of Parsers for.
url - The URL for the content, which may allow the type to be determined from the file suffix.

Returns:
An Array of Parsers for the given contentType. If plugins were mapped to a contentType via the parse-plugins.xml file but never enabled via the plugin.includes Nutch configuration property, those plugins are not part of this array, i.e., they are skipped. For example, if the ordered list of parsing plugins for text/plain was [parse-text, parse-html, parse-rtf], and only parse-html and parse-rtf were enabled via plugin.includes, then this ordered Array would consist of two Parser interfaces, [parse-html, parse-rtf].

Throws:
ParserNotFound

public Parser getParserById(String id) throws ParserNotFound

Function returns a Parser instance with the specified extension ID (id). If the Parser instance isn't found, the function throws a ParserNotFound exception. If the function finds the Parser in the internal PARSER_CACHE, it returns the already instantiated Parser. Otherwise, if it has to instantiate the Parser itself, the function caches that Parser in the internal PARSER_CACHE.

Parameters:
id - The string extension ID (e.g., "org.apache.nutch.parse.rss.RSSParser", "org.apache.nutch.parse.rtf.RTFParseFactory") of the Parser implementation to return.

Returns:
A Parser implementation specified by the parameter id.

Throws:
ParserNotFound - If the Parser is not found (i.e., not registered with the extension point), or if there was a PluginRuntimeException instantiating the Parser.

Copyright © 2021 The Apache Software Foundation
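
getParserById reuses an instance from PARSER_CACHE when one exists and otherwise instantiates the Parser and caches it, throwing ParserNotFound when no such extension exists. The sketch below models that cache-or-instantiate behavior with ConcurrentHashMap.computeIfAbsent. Everything in it is an assumption for illustration and is not taken from Nutch's source: SketchParser stands in for Parser, ParserNotFoundException for ParserNotFound, and the instantiate function for the plugin lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch only: ParserCacheSketch, SketchParser and
// ParserNotFoundException are illustrative names, not the Nutch API.
public class ParserCacheSketch {

    /** Stand-in for Nutch's Parser interface. */
    interface SketchParser {}

    /** Stand-in for Nutch's ParserNotFound exception. */
    static class ParserNotFoundException extends Exception {
        ParserNotFoundException(String id) {
            super("Parser not found: " + id);
        }
    }

    // Plays the role of the internal PARSER_CACHE, keyed by extension ID.
    private final Map<String, SketchParser> cache = new ConcurrentHashMap<>();
    // Creates a parser for an extension ID, or returns null if no such extension exists.
    private final Function<String, SketchParser> instantiate;

    ParserCacheSketch(Function<String, SketchParser> instantiate) {
        this.instantiate = instantiate;
    }

    SketchParser getParserById(String id) throws ParserNotFoundException {
        // Reuse a cached instance if present; otherwise instantiate and cache it.
        // computeIfAbsent records no mapping when the function returns null.
        SketchParser parser = cache.computeIfAbsent(id, instantiate);
        if (parser == null) {
            throw new ParserNotFoundException(id);
        }
        return parser;
    }

    public static void main(String[] args) throws Exception {
        int[] created = {0};
        ParserCacheSketch factory = new ParserCacheSketch(id -> {
            if (!id.equals("org.apache.nutch.parse.rss.RSSParser")) {
                return null; // unknown extension ID
            }
            created[0]++;
            return new SketchParser() {};
        });
        SketchParser first = factory.getParserById("org.apache.nutch.parse.rss.RSSParser");
        SketchParser second = factory.getParserById("org.apache.nutch.parse.rss.RSSParser");
        System.out.println(first == second); // true: the second call hits the cache
        System.out.println(created[0]);      // 1
        try {
            factory.getParserById("no.such.Parser");
        } catch (ParserNotFoundException e) {
            System.out.println(e.getMessage()); // Parser not found: no.such.Parser
        }
    }
}
```

computeIfAbsent makes the check-then-instantiate step atomic per key, so concurrent callers asking for the same ID share one instance; returning null from the factory leaves the cache untouched, which maps naturally onto the "not found" case.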