| Package | Description |
|---|---|
| org.apache.nutch.indexer | Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index. |
| org.apache.nutch.metadata | A multi-valued metadata container, and a set of constant fields for Nutch metadata. |
| org.apache.nutch.net.protocols | Helper classes related to the Protocol interface; see also org.apache.nutch.protocol. |
| org.apache.nutch.parse | The Parse interface and related classes. |
| org.apache.nutch.protocol | Classes related to the Protocol interface; see also org.apache.nutch.net.protocols. |
| org.apache.nutch.protocol.htmlunit | Protocol plugin that supports retrieving documents via the HTTP protocol. |
| org.apache.nutch.protocol.httpclient | Protocol plugin that supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest, and NTLM authentication schemes for web servers as well as proxy servers. |
| org.apache.nutch.protocol.okhttp | Protocol plugin based on OkHttp; supports HTTP, HTTPS, and HTTP/2. |
| org.apache.nutch.scoring.webgraph | |
| org.apache.nutch.segment | A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links. |
| org.apache.nutch.tools | Miscellaneous tools. |
| org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.indexer

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |
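
Every package listed above uses Metadata, described as a multi-valued metadata container. As a minimal, hypothetical illustration of that idea, the plain-Java sketch below assumes each name maps to an ordered list of string values. The class name MultiValuedMetadata is invented here, and although its method names follow common container conventions, it is not Nutch's implementation; the real org.apache.nutch.metadata.Metadata may behave differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a multi-valued metadata container: each name maps to an
 * ordered list of string values. Not Nutch's actual Metadata class.
 */
public class MultiValuedMetadata {
  // Insertion-ordered map from a metadata name to every value recorded for it.
  private final Map<String, List<String>> values = new LinkedHashMap<>();

  /** Appends a value, keeping any values already stored under the name. */
  public void add(String name, String value) {
    values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  /** Replaces all values stored under the name with a single value. */
  public void set(String name, String value) {
    List<String> list = new ArrayList<>();
    list.add(value);
    values.put(name, list);
  }

  /** Returns the first value for the name, or null if the name is absent. */
  public String get(String name) {
    List<String> list = values.get(name);
    return (list == null || list.isEmpty()) ? null : list.get(0);
  }

  /** Returns a read-only view of all values for the name (empty if absent). */
  public List<String> getValues(String name) {
    List<String> list = values.get(name);
    return list == null ? Collections.<String>emptyList() : Collections.unmodifiableList(list);
  }

  /** True if more than one value is stored under the name. */
  public boolean isMultiValued(String name) {
    return getValues(name).size() > 1;
  }

  /** Names currently present, in insertion order. */
  public Set<String> names() {
    return Collections.unmodifiableSet(values.keySet());
  }

  public static void main(String[] args) {
    MultiValuedMetadata md = new MultiValuedMetadata();
    md.add("Content-Language", "en");
    md.add("Content-Language", "de");
    md.set("Content-Type", "text/html");
    System.out.println(md.get("Content-Language"));       // en
    System.out.println(md.getValues("Content-Language")); // [en, de]
    System.out.println(md.isMultiValued("Content-Type")); // false
  }
}
```

Storing a list per name, rather than a single string, lets repeated values under the same name (for example, an HTTP header sent twice) coexist instead of overwriting each other, while get still offers a convenient single-value lookup.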

Classes in org.apache.nutch.metadata used by org.apache.nutch.metadata

| Class | Description |
|---|---|
| CreativeCommons | A collection of Creative Commons property names. |
| DublinCore | A collection of Dublin Core metadata names. |
| Feed | A collection of feed property names extracted by the ROME library. |
| HttpHeaders | A collection of HTTP header names. |
| Metadata | A multi-valued metadata container. |
| Nutch | A collection of Nutch internal metadata constants. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.net.protocols

| Class | Description |
|---|---|
| HttpHeaders | A collection of HTTP header names. |
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.parse

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.protocol

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.protocol.htmlunit

| Class | Description |
|---|---|
| HttpHeaders | A collection of HTTP header names. |
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.protocol.httpclient

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.protocol.okhttp

| Class | Description |
|---|---|
| HttpHeaders | A collection of HTTP header names. |
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.scoring.webgraph

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.apache.nutch.segment

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |
| MetaWrapper | A simple decorator that adds metadata to any Writable that can be serialized by NutchWritable. |
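
MetaWrapper is described as a decorator that attaches metadata to a Writable without changing what is wrapped. The following is a hypothetical, plain-Java sketch of that decorator idea only: it wraps an arbitrary object and carries a name-to-values map alongside it. It does not use Hadoop's Writable or NutchWritable, so serialization is deliberately left out, and the names MetadataDecorator, addMeta, and getMeta are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the decorator idea behind MetaWrapper: wrap an existing
 * value and carry multi-valued metadata next to it. Plain Java only; no Hadoop
 * Writable, NutchWritable, or serialization is involved.
 */
public class MetadataDecorator<T> {
  // The decorated object; the decorator never modifies it.
  private final T wrapped;
  // Metadata attached to the wrapped object: name -> ordered list of values.
  private final Map<String, List<String>> metadata = new LinkedHashMap<>();

  public MetadataDecorator(T wrapped) {
    this.wrapped = wrapped;
  }

  /** Returns the decorated object unchanged. */
  public T get() {
    return wrapped;
  }

  /** Attaches a metadata value; repeated names accumulate values. */
  public void addMeta(String name, String value) {
    metadata.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  /** Returns the first metadata value for the name, or null if absent. */
  public String getMeta(String name) {
    List<String> list = metadata.get(name);
    return list == null ? null : list.get(0);
  }

  public static void main(String[] args) {
    // Hypothetical example values, chosen only for illustration.
    MetadataDecorator<String> w = new MetadataDecorator<>("raw page content");
    w.addMeta("origin", "segment-a");
    System.out.println(w.get());             // raw page content
    System.out.println(w.getMeta("origin")); // segment-a
  }
}
```

Because the wrapped object is untouched and the metadata travels beside it, code that only needs the original value can call get() and ignore the extra fields.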

Classes in org.apache.nutch.metadata used by org.apache.nutch.tools

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |

Classes in org.apache.nutch.metadata used by org.creativecommons.nutch

| Class | Description |
|---|---|
| Metadata | A multi-valued metadata container. |

Copyright © 2021 The Apache Software Foundation