public static class Injector.InjectReducer extends Reducer<Text,CrawlDatum,Text,CrawlDatum>
Nested classes inherited from class Reducer: Reducer.Context

| Constructor and Description |
|---|
| InjectReducer() |

| Modifier and Type | Method and Description |
|---|---|
| void | reduce(Text key, Iterable<CrawlDatum> values, Reducer.Context context) Merge the input records of one URL according to the rules below. |
| void | setup(Reducer.Context context) |

public void setup(Reducer.Context context)
Overrides: setup in class Reducer<Text,CrawlDatum,Text,CrawlDatum>

public void reduce(Text key, Iterable<CrawlDatum> values, Reducer.Context context) throws IOException, InterruptedException
Merge the input records of one URL according to the following rules:
1. If there is ONLY new injected record ==> emit injected record
2. If there is ONLY old record ==> emit existing record
3. If BOTH new and old records are present:
   (a) If 'overwrite' is true ==> emit injected record
   (b) If 'overwrite' is false:
       (i) If 'update' is false ==> emit existing record
       (ii) If 'update' is true ==> update existing record and emit it
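
The rules above can be sketched as a small decision function. This is a minimal stand-alone illustration, not the Nutch implementation: the class `InjectMergeSketch`, the method `merge`, and the `updater` callback are hypothetical names introduced here. The page does not say what "update existing record" involves, so that step is left to a caller-supplied function. Records are modelled as a generic type `R`, with `null` meaning "no such record was seen for this URL".

```java
import java.util.function.BinaryOperator;

/** Hypothetical sketch of the merge rules; not the Nutch source code. */
final class InjectMergeSketch {

    /**
     * Picks the record to emit for one URL.
     *
     * @param injected  newly injected record, or null if none was seen
     * @param existing  previously stored record, or null if none was seen
     * @param overwrite value of the 'overwrite' option
     * @param update    value of the 'update' option
     * @param updater   assumed callback: (existing, injected) -> updated existing record
     * @return the record to emit (null only if both inputs are null)
     */
    static <R> R merge(R injected, R existing,
                       boolean overwrite, boolean update,
                       BinaryOperator<R> updater) {
        if (existing == null) {
            return injected;                      // rule 1: only a new injected record
        }
        if (injected == null) {
            return existing;                      // rule 2: only an old record
        }
        if (overwrite) {
            return injected;                      // rule 3(a): overwrite wins
        }
        if (!update) {
            return existing;                      // rule 3(b)(i): keep the old record
        }
        return updater.apply(existing, injected); // rule 3(b)(ii): update, then emit
    }

    public static void main(String[] args) {
        // Placeholder "update": concatenate the two labels so the result is visible.
        BinaryOperator<String> updater = (old, inj) -> old + "+" + inj;

        System.out.println(merge("injected", null, false, false, updater));
        System.out.println(merge(null, "existing", true, true, updater));
        System.out.println(merge("injected", "existing", true, false, updater));
        System.out.println(merge("injected", "existing", false, false, updater));
        System.out.println(merge("injected", "existing", false, true, updater));
    }
}
```

Note that 'overwrite' is checked before 'update', so 'update' only matters when 'overwrite' is false, matching the nesting of rule 3.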
For more details, see NUTCH-1405.
Overrides: reduce in class Reducer<Text,CrawlDatum,Text,CrawlDatum>
Throws: IOException, InterruptedException

Copyright © 2021 The Apache Software Foundation