| Modifier and Type | Field and Description |
|---|---|
| `static String` | `X_POINT_ID` The name of the extension point. |
| Modifier and Type | Method and Description |
|---|---|
| `ProtocolOutput` | `getProtocolOutput(Text url, CrawlDatum datum)` Returns the Content for a fetchlist entry. |
| `crawlercommons.robots.BaseRobotRules` | `getRobotRules(Text url, CrawlDatum datum, List<Content> robotsTxtContent)` Retrieve robot rules applicable for this URL. |
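
The `robotsTxtContent` argument of `getRobotRules` behaves like an optional out-parameter: responses received while fetching robots.txt are appended to the passed list, and passing null means nothing is stored. The sketch below illustrates only that calling convention. It does not use the Nutch API: `RobotsTxtCollectorSketch`, `Response`, and `fetchRobotsTxt` are hypothetical stand-ins, and the 404 response is simulated rather than fetched.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these types are Nutch classes. */
public class RobotsTxtCollectorSketch {

    /** Hypothetical stand-in for a fetched response (not Nutch's Content). */
    public static final class Response {
        public final String url;
        public final int status;

        public Response(String url, int status) {
            this.url = url;
            this.status = status;
        }
    }

    /**
     * Hypothetical callee that pretends to fetch robots.txt.
     * The response is appended to sink when sink is non-null;
     * a null sink means nothing is stored.
     *
     * @return true if a robots.txt file was found (status 200)
     */
    public static boolean fetchRobotsTxt(String robotsUrl, List<Response> sink) {
        // Simulated result: the server answered with an error page.
        Response response = new Response(robotsUrl, 404);
        if (sink != null) {
            sink.add(response);
        }
        return response.status == 200;
    }

    public static void main(String[] args) {
        // Caller wants to keep the responses for debugging or archival.
        List<Response> collected = new ArrayList<>();
        fetchRobotsTxt("http://example.com/robots.txt", collected);

        // Caller is not interested in the raw responses.
        fetchRobotsTxt("http://example.com/robots.txt", null);

        System.out.println(collected.size() + " response(s) stored");
    }
}
```

Because the list can hold redirects or error pages as well as an actual robots.txt file, a caller that archives it should inspect each entry rather than assume the first one is the file itself.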
Inherited methods: getConf, setConf

`static final String X_POINT_ID`

The name of the extension point.

`ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum)`

Returns the Content for a fetchlist entry.

`crawlercommons.robots.BaseRobotRules getRobotRules(Text url, CrawlDatum datum, List<Content> robotsTxtContent)`

Retrieve robot rules applicable for this URL.

Parameters:

- url - URL to check
- datum - page datum
- robotsTxtContent - container to store responses when fetching the robots.txt file for debugging or archival purposes. Instead of a robots.txt file, it may include redirects or an error page (404, etc.). Response Content is appended to the passed list. If null is passed, nothing is stored.

Copyright © 2021 The Apache Software Foundation