| Class | Description |
|---|---|
| AbstractChecker | Scaffolding class for the various Checker implementations. |
| CommandRunner | |
| CrawlCompletionStats | Extracts some simple crawl completion stats from the crawldb. Stats are sorted by host/domain and take the form `1 www.spitzer.caltech.edu FETCHED`, `50 www.spitzer.caltech.edu UNFETCHED`. |
| CrawlCompletionStats.CrawlCompletionStatsCombiner | |
| DeflateUtils | A collection of utility methods for working on deflated data. |
| DomUtil | |
| DumpFileUtil | |
| EncodingDetector | A simple class for detecting character encodings. |
| FSUtils | Utility methods for common filesystem operations. |
| GenericWritableConfigurable | A generic `Writable` wrapper that can inject a `Configuration` into `Configurable`s. |
| GZIPUtils | A collection of utility methods for working on GZIPed data. |
| HadoopFSUtil | |
| JexlUtil | Utility methods for handling JEXL expressions. |
| LockUtil | Utility methods for handling application-level locking. |
| MimeUtil | A facade class that insulates Nutch from its underlying MIME type library, Apache Tika. |
| NodeWalker | A utility class for walking any DOM tree using a stack instead of recursion. |
| NutchConfiguration | Utility to create Hadoop `Configuration`s that include Nutch-specific resources. |
| NutchJob | A Hadoop `Job` for Nutch jobs. |
| NutchTool | |
| ObjectCache | |
| PrefixStringMatcher | A class for efficiently matching `String`s against a set of prefixes. |
| ProtocolStatusStatistics | Extracts protocol status code information from the crawl database. |
| ProtocolStatusStatistics.ProtocolStatusStatisticsCombiner | |
| SegmentReaderUtil | |
| SitemapProcessor | Performs sitemap processing by fetching sitemap links, parsing their content, and merging the URLs from each sitemap (with their metadata) into the existing crawldb. |
| StringUtil | A collection of String processing utility methods. |
| SuffixStringMatcher | A class for efficiently matching `String`s against a set of suffixes. |
| TableUtil | |
| TimingUtil | |
| TrieStringMatcher | A base class for simple tree-based (trie) string matching. |
| URLUtil | Utility class for URL analysis. |
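The tree-based (trie) matching that `TrieStringMatcher` and `PrefixStringMatcher` describe can be illustrated with a minimal, self-contained sketch. This is not Nutch's implementation or API: the class `PrefixTrieSketch` and its methods are hypothetical, and child nodes are kept in a `HashMap` purely for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of trie-based prefix matching.
 * Not the Nutch PrefixStringMatcher/TrieStringMatcher API.
 */
final class PrefixTrieSketch {

  /** One trie node: children keyed by character, plus an end-of-prefix flag. */
  private static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean terminal; // true if a registered prefix ends at this node
  }

  private final Node root = new Node();

  /** Registers each prefix by walking (or creating) one node per character. */
  PrefixTrieSketch(String... prefixes) {
    for (String p : prefixes) {
      Node node = root;
      for (int i = 0; i < p.length(); i++) {
        node = node.children.computeIfAbsent(p.charAt(i), c -> new Node());
      }
      node.terminal = true;
    }
  }

  /**
   * Returns the shortest registered prefix of input, or null if none matches.
   * At most one node is visited per input character.
   */
  String shortestMatch(String input) {
    Node node = root;
    if (node.terminal) {
      return ""; // the empty string was registered as a prefix
    }
    for (int i = 0; i < input.length(); i++) {
      node = node.children.get(input.charAt(i));
      if (node == null) {
        return null; // no registered prefix continues with this character
      }
      if (node.terminal) {
        return input.substring(0, i + 1);
      }
    }
    return null;
  }

  /** True if any registered prefix is a prefix of input. */
  boolean matches(String input) {
    return shortestMatch(input) != null;
  }

  public static void main(String[] args) {
    PrefixTrieSketch m = new PrefixTrieSketch("http://", "https://", "ftp://");
    System.out.println(m.matches("https://nutch.apache.org/")); // true
    System.out.println(m.matches("file:///tmp/x"));             // false
  }
}
```

Because a lookup walks at most one node per input character, the cost of a match does not grow with the number of registered prefixes. A suffix matcher can reuse the same structure by inserting and walking the strings in reverse.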
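`NodeWalker` is described as walking a DOM tree with a stack instead of recursion. The sketch below shows that technique using only the JDK's `org.w3c.dom` and `java.util.ArrayDeque`; `StackDomWalkSketch` and `elementNamesPreOrder` are hypothetical names, and this is not the `NodeWalker` API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: non-recursive pre-order DOM traversal. */
final class StackDomWalkSketch {

  /** Returns element names in document (pre-)order using an explicit stack. */
  static List<String> elementNamesPreOrder(Node root) {
    List<String> names = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node current = stack.pop();
      if (current.getNodeType() == Node.ELEMENT_NODE) {
        names.add(current.getNodeName());
      }
      // Push children in reverse so the first child is popped (visited) first.
      NodeList children = current.getChildNodes();
      for (int i = children.getLength() - 1; i >= 0; i--) {
        stack.push(children.item(i));
      }
    }
    return names;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    System.out.println(elementNamesPreOrder(doc)); // [html, head, title, body, p, p]
  }
}
```

An explicit stack keeps memory use on the heap, so very deeply nested (often malformed) HTML cannot trigger a `StackOverflowError` the way a naive recursive walk can.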
Copyright © 2021 The Apache Software Foundation