public class Http extends HttpBase
This class is a protocol plugin that configures an HTTP client for the Basic, Digest and NTLM authentication schemes, for web servers as well as proxy servers. It also handles the HTTPS protocol and keeps cookies within a single fetch session.
Documentation can be found on the Nutch HttpAuthenticationSchemes wiki page.
The original description of the motivation to support HttpPostAuthentication is also included on the Nutch wiki. Additionally, the development of HttpPostAuthentication is documented in the NUTCH-827 Jira issue.
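As a rough illustration of what the simplest of these schemes involves on the wire, the sketch below builds an `Authorization` header value for the Basic scheme using only the JDK. It does not use this plugin or any Nutch API; the class name `BasicAuthSketch`, the method `basicHeader`, and the sample credentials are hypothetical names chosen for the example. In a real crawl the plugin reads credentials from the Nutch configuration rather than from code like this.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Nutch: shows how a Basic credential is encoded.
class BasicAuthSketch {

  /** Returns the Authorization header value for the Basic scheme. */
  static String basicHeader(String user, String password) {
    // Basic credentials are "user:password", Base64-encoded (not encrypted).
    String credentials = user + ":" + password;
    String encoded = Base64.getEncoder()
        .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    return "Basic " + encoded;
  }

  public static void main(String[] args) {
    // Sample credentials are placeholders for illustration only.
    System.out.println("Authorization: " + basicHeader("crawler", "secret"));
  }
}
```

Because the encoding is reversible, Basic credentials are only protected when the request travels over HTTPS. Digest and NTLM use challenge-response exchanges and are not covered by this sketch.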
| Modifier and Type | Field and Description |
|---|---|
| protected static org.slf4j.Logger | LOG |

Fields inherited from class HttpBase: accept, acceptCharset, acceptLanguage, BUFFER_SIZE, COOKIE, enableCookieHeader, enableIfModifiedsinceHeader, maxContent, maxCrawlDelay, maxDuration, partialAsTruncated, proxyException, proxyHost, proxyPort, proxyType, RESPONSE_TIME, responseTime, storeHttpHeaders, storeHttpRequest, storeIPAddress, timeout, tlsCheckCertificate, tlsPreferredCipherSuites, tlsPreferredProtocols, useHttp11, useHttp2, useProxy, userAgent

Fields inherited from interface Protocol: X_POINT_ID

| Constructor and Description |
|---|
| Http() - Constructs this plugin. |

| Modifier and Type | Method and Description |
|---|---|
| protected Response | getResponse(URL url, CrawlDatum datum, boolean redirect) - Fetches the url with a configured HTTP client and gets the response. |
| static void | main(String[] args) - Main method. |
| void | setConf(Configuration conf) - Reads the configuration from the Nutch configuration files and sets the configuration. |
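
The `redirect` parameter of `getResponse` controls whether the client follows redirects itself. The sketch below shows the same kind of switch using the JDK's `HttpURLConnection` as a stand-in. It is not the client this plugin uses internally, and the class name `RedirectFlagSketch`, the method `prepare`, and the URL are hypothetical. No network request is made, because the connection object is only created and configured.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical stand-in, not part of Nutch: a per-request redirect switch.
class RedirectFlagSketch {

  /** Creates an unopened connection that follows redirects only if the flag is true. */
  static HttpURLConnection prepare(String url, boolean redirect) {
    try {
      // openConnection() only builds the object; nothing is sent until
      // connect() or a response accessor is called.
      HttpURLConnection conn =
          (HttpURLConnection) URI.create(url).toURL().openConnection();
      conn.setInstanceFollowRedirects(redirect);
      return conn;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    HttpURLConnection conn = prepare("http://example.com/", false);
    System.out.println("follows redirects: " + conn.getInstanceFollowRedirects());
  }
}
```

With redirect following turned off, a 3xx response is returned to the caller as-is, which leaves the decision about the redirect target to the calling code.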

Methods inherited from class HttpBase: getAccept, getAcceptCharset, getAcceptLanguage, getConf, getCookie, getMaxContent, getMaxDuration, getProtocolOutput, getProxyHost, getProxyPort, getRobotRules, getTimeout, getTlsPreferredCipherSuites, getTlsPreferredProtocols, getUseHttp11, getUserAgent, isCookieEnabled, isIfModifiedSinceEnabled, isStoreHttpHeaders, isStoreHttpRequest, isStoreIPAddress, isStorePartialAsTruncated, isTlsCheckCertificates, logConf, main, processDeflateEncoded, processGzipEncoded, useProxy, useProxy, useProxy

public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
Overrides: setConf in class HttpBase
Parameters: conf - Configuration

public static void main(String[] args) throws Exception
Parameters: args - Command line arguments
Throws: Exception

protected Response getResponse(URL url, CrawlDatum datum, boolean redirect) throws ProtocolException, IOException
Fetches the url with a configured HTTP client and gets the response.
Specified by: getResponse in class HttpBase
Parameters:
url - URL to be fetched
datum - Crawl data
redirect - Follow redirects if and only if true
Throws: ProtocolException, IOException

Copyright © 2021 The Apache Software Foundation