public class FtpRobotRulesParser extends RobotRulesParser
Extends the RobotRulesParser class and contains the FTP-specific implementation for obtaining the robots file.

Fields inherited from class RobotRulesParser: agentNames, CACHE, conf, EMPTY_RULES, FORBID_ALL_RULES, whiteList

| Constructor and Description |
|---|
| FtpRobotRulesParser(Configuration conf) |
| Modifier and Type | Method and Description |
|---|---|
| crawlercommons.robots.BaseRobotRules | getRobotRulesSet(Protocol ftp, URL url, List<Content> robotsTxtContent) For hosts whose robots rules are not yet cached, sends an FTP request to the host corresponding to the URL passed, gets the robots file, parses the rules, and caches the rules object to avoid re-work in future. |
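
The fetch, parse, and cache flow described for getRobotRulesSet can be illustrated with a minimal, JDK-only sketch. This is not the Nutch implementation: the class RobotsCacheSketch, its nested Rules type, the fetcher function, and the url helper are all hypothetical names introduced for illustration. The parser here only collects Disallow prefixes and ignores User-agent groups, and the cache key (protocol plus lower-cased host) is an assumption about what "per host" means.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative sketch only; names and behavior are assumptions, not Nutch code. */
public class RobotsCacheSketch {

  /** Hypothetical stand-in for a parsed rules object: holds Disallow prefixes only. */
  public static final class Rules {
    final List<String> disallowed = new ArrayList<>();

    public boolean isAllowed(String path) {
      for (String prefix : disallowed) {
        if (path.startsWith(prefix)) {
          return false;
        }
      }
      return true;
    }
  }

  private final Map<String, Rules> cache = new ConcurrentHashMap<>();
  // Hypothetical fetcher: given a URL, returns the robots.txt body for its host.
  private final Function<URL, String> fetcher;
  private int fetchCount = 0;

  public RobotsCacheSketch(Function<URL, String> fetcher) {
    this.fetcher = fetcher;
  }

  public int getFetchCount() {
    return fetchCount;
  }

  /** Returns cached rules for the URL's host, fetching and parsing only on a miss. */
  public Rules getRules(URL url, List<String> robotsTxtContent) {
    String key = url.getProtocol() + ":" + url.getHost().toLowerCase(Locale.ROOT);
    Rules cached = cache.get(key);
    if (cached != null) {
      return cached; // host already handled: no request is sent
    }
    fetchCount++;
    String body = fetcher.apply(url);
    if (robotsTxtContent != null) {
      robotsTxtContent.add(body); // keep the raw response only when a list is passed
    }
    Rules rules = parse(body);
    cache.put(key, rules);
    return rules;
  }

  /** Simplified parser: records every non-empty Disallow value, for all agents. */
  static Rules parse(String body) {
    Rules rules = new Rules();
    for (String line : body.split("\\r?\\n")) {
      String trimmed = line.trim();
      if (trimmed.regionMatches(true, 0, "Disallow:", 0, 9)) {
        String path = trimmed.substring(9).trim();
        if (!path.isEmpty()) {
          rules.disallowed.add(path);
        }
      }
    }
    return rules;
  }

  /** Helper that builds a URL without a checked exception. */
  public static URL url(String spec) {
    try {
      return URI.create(spec).toURL();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    RobotsCacheSketch sketch =
        new RobotsCacheSketch(u -> "User-agent: *\nDisallow: /private\n");
    List<String> responses = new ArrayList<>();
    Rules first = sketch.getRules(url("ftp://example.org/pub/a.txt"), responses);
    sketch.getRules(url("ftp://example.org/pub/b.txt"), null);
    System.out.println(sketch.getFetchCount()); // prints 1: second call hit the cache
    System.out.println(first.isAllowed("/private/x")); // prints false
  }
}
```

The get-then-put sequence in this sketch is not atomic, so two threads could fetch the same host at once; a production cache would need to decide whether that duplicate work is acceptable.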

Methods inherited from class RobotRulesParser: getConf, getRobotRulesSet, isWhiteListed, main, parseRules, run, setConf

public FtpRobotRulesParser(Configuration conf)

public crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol ftp, URL url, List<Content> robotsTxtContent)
For hosts whose robots rules are not yet cached, sends an FTP request to the host corresponding to the URL passed, gets the robots file, parses the rules, and caches the rules object to avoid re-work in future.

Specified by: getRobotRulesSet in class RobotRulesParser

Parameters:
ftp - The Protocol object
url - URL
robotsTxtContent - container to store responses when fetching the robots.txt file for debugging or archival purposes. Instead of a robots.txt file, it may include redirects or an error page (404, etc.). Response Content is appended to the passed list. If null is passed, nothing is stored.

Returns: BaseRobotRules object for the rules

Copyright © 2021 The Apache Software Foundation