public class Ftp extends Object implements Protocol
Creates an FtpResponse object and gets the content of the URL from it.
Configurable parameters are ftp.username, ftp.password,
ftp.content.limit, ftp.timeout, ftp.server.timeout,
ftp.keep.connection and ftp.follow.talk.
For details see the "FTP properties" section in nutch-default.xml.

| Modifier and Type | Field and Description |
|---|---|
| protected static org.slf4j.Logger | LOG |

Fields inherited from interface Protocol: X_POINT_ID

| Constructor and Description |
|---|
| Ftp() |

| Modifier and Type | Method and Description |
|---|---|
| protected void | finalize() |
| int | getBufferSize() |
| Configuration | getConf(): Get the Configuration object |
| ProtocolOutput | getProtocolOutput(Text url, CrawlDatum datum): Creates a FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received |
| crawlercommons.robots.BaseRobotRules | getRobotRules(Text url, CrawlDatum datum, List<Content> robotsTxtContent): Get the robots rules for a given url |
| static void | main(String[] args): For debugging. |
| void | setConf(Configuration conf): Set the Configuration object |
| void | setFollowTalk(boolean followTalk): Set followTalk |
| void | setKeepConnection(boolean keepConnection): Set keepConnection |
| void | setMaxContentLength(int length): Set the point at which content is truncated. |
| void | setTimeout(int to): Set the timeout. |
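
The truncation controlled by setMaxContentLength(int) can be pictured with plain Java streams. The sketch below is not Nutch's code: TruncatingReader and readUpTo are hypothetical names introduced only for illustration, and the sketch assumes a non-negative limit and a positive buffer size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; it is not part of Nutch.
final class TruncatingReader {

  // Reads at most maxContentLength bytes from in, using a buffer of bufferSize bytes.
  // Assumes maxContentLength >= 0 and bufferSize > 0.
  static byte[] readUpTo(InputStream in, int maxContentLength, int bufferSize)
      throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[bufferSize];
    int remaining = maxContentLength;
    while (remaining > 0) {
      // Never request more bytes than the limit still allows.
      int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
      if (n == -1) {
        break; // end of stream reached before the limit
      }
      out.write(buffer, 0, n);
      remaining -= n;
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = new byte[100];
    byte[] kept = readUpTo(new ByteArrayInputStream(payload), 10, 4);
    System.out.println(kept.length); // prints 10
  }
}
```

Content shorter than the limit is returned whole; anything beyond the limit is left unread in the stream.
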
public void setTimeout(int to)
public void setMaxContentLength(int length)
public void setFollowTalk(boolean followTalk)
public void setKeepConnection(boolean keepConnection)
public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum)
Creates a FtpResponse object corresponding to the url and returns a
ProtocolOutput object as per the content received.
Specified by: getProtocolOutput in interface Protocol
Parameters:
url - Text containing the ftp url
datum - The CrawlDatum object corresponding to the url
Returns: ProtocolOutput object for the url

public void setConf(Configuration conf)
Set the Configuration object.
Specified by: setConf in interface Configurable

public Configuration getConf()
Get the Configuration object.
Specified by: getConf in interface Configurable

public crawlercommons.robots.BaseRobotRules getRobotRules(Text url, CrawlDatum datum, List<Content> robotsTxtContent)
Get the robots rules for a given url.
Specified by: getRobotRules in interface Protocol
Parameters:
url - URL to check
datum - page datum
robotsTxtContent - container to store responses when fetching the robots.txt file for
debugging or archival purposes. Instead of a robots.txt file, it
may include redirects or an error page (404, etc.). Response
Content is appended to the passed list. If null is passed
nothing is stored.

public int getBufferSize()
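
The robotsTxtContent contract of getRobotRules (responses are appended to the passed list, and a null list means nothing is stored) follows a common optional-collector pattern. The sketch below only illustrates that pattern: RobotsResponseLog and store are hypothetical names, and plain strings stand in for Nutch's Content objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an optional collector; it is not part of Nutch.
final class RobotsResponseLog {

  // Appends a response summary to sink, or does nothing when sink is null.
  static void store(List<String> sink, String responseSummary) {
    if (sink != null) {
      sink.add(responseSummary);
    }
  }

  public static void main(String[] args) {
    List<String> collected = new ArrayList<>();
    store(collected, "301 redirect");
    store(collected, "404 error page");
    store(null, "ignored"); // a null sink is tolerated and stores nothing
    System.out.println(collected.size()); // prints 2
  }
}
```

A caller that wants the robots.txt responses for debugging or archiving passes a list; a caller that does not care passes null.
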
Copyright © 2021 The Apache Software Foundation