| Interface | Description |
|---|---|
| IndexingFilter | Extension point for indexing. |
| IndexWriter | |

| Class | Description |
|---|---|
| CleaningJob | The class scans CrawlDB looking for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests to indexers for those documents. |
| CleaningJob.DBFilter | |
| CleaningJob.DeleterReducer | |
| IndexerMapReduce | This class is typically invoked from within IndexingJob and handles all MapReduce functionality required when undertaking indexing. |
| IndexerMapReduce.IndexerMapper | |
| IndexerMapReduce.IndexerReducer | |
| IndexerOutputFormat | |
| IndexingFilters | Creates and caches IndexingFilter implementing plugins. |
| IndexingFiltersChecker | Reads and parses a URL and runs the indexers on it. |
| IndexingJob | Generic indexer which relies on the plugins implementing IndexWriter. |
| IndexWriterConfig | |
| IndexWriterParams | |
| IndexWriters | Creates and caches IndexWriter implementing plugins. |
| NutchDocument | A NutchDocument is the unit of indexing. |
| NutchField | This class represents a multi-valued field with a weight. |
| NutchIndexAction | A NutchIndexAction is the new unit of indexing holding the document and action information. |

| Exception | Description |
|---|---|
| IndexingException | |

Copyright © 2021 The Apache Software Foundation