public class ParseSegment extends NutchTool implements Tool
| Modifier and Type | Class and Description |
|---|---|
| static class | ParseSegment.ParseSegmentMapper |
| static class | ParseSegment.ParseSegmentReducer |
| Modifier and Type | Field and Description |
|---|---|
| static String | SKIP_TRUNCATED |

Fields inherited from class org.apache.nutch.util.NutchTool: currentJob, currentJobNum, numJobs, results, status

| Constructor and Description |
|---|
ParseSegment() |
ParseSegment(Configuration conf) |
| Modifier and Type | Method and Description |
|---|---|
| static boolean | isTruncated(Content content): Checks if the page's content is truncated. |
| static void | main(String[] args) |
| void | parse(Path segment) |
| Map<String,Object> | run(Map<String,Object> args, String crawlId): Runs the tool, using a map of arguments. |
| int | run(String[] args) |
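As a rough illustration of the kind of check `isTruncated(Content content)` performs, here is a minimal Java sketch. It assumes a page counts as truncated when its declared `Content-Length` header is larger than the number of bytes actually fetched. The class `TruncationCheckSketch`, the `Map`-based headers and the byte-array body are hypothetical stand-ins for Nutch's `Content` type; this is not Nutch's own implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical sketch of a truncation check. Assumption: truncated means the
 * declared Content-Length exceeds the number of bytes actually received.
 * Headers are a plain Map (exact-case keys) instead of Nutch's Content/Metadata.
 */
class TruncationCheckSketch {

    // Header name assumed for this illustration.
    static final String CONTENT_LENGTH = "Content-Length";

    /**
     * Returns true only when truncation can be shown. Returns false when the
     * body is complete or when the declared length is missing or unparsable.
     */
    static boolean isTruncated(Map<String, String> headers, byte[] body) {
        if (headers == null || body == null) {
            return false; // nothing to compare against
        }
        String declared = headers.get(CONTENT_LENGTH);
        if (declared == null || declared.trim().isEmpty()) {
            return false; // length unknown, so truncation cannot be determined
        }
        long declaredSize;
        try {
            declaredSize = Long.parseLong(declared.trim());
        } catch (NumberFormatException e) {
            return false; // malformed header: treated as "cannot be determined"
        }
        // Fewer bytes on hand than the server announced means the fetch was cut short.
        return declaredSize > body.length;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8); // 5 bytes
        System.out.println(isTruncated(Map.of(CONTENT_LENGTH, "5"), body));   // false
        System.out.println(isTruncated(Map.of(CONTENT_LENGTH, "100"), body)); // true
        System.out.println(isTruncated(Map.of(), body));                       // false
        System.out.println(isTruncated(Map.of(CONTENT_LENGTH, "abc"), body));  // false
    }
}
```

Returning `false` for a missing or malformed header is a deliberate choice in this sketch: a page is only withheld when truncation is positively detected.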
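Methods inherited from class org.apache.nutch.util.NutchTool: getProgress, getStatus, killJob, stopJob
Methods inherited from class org.apache.hadoop.conf.Configured: getConf, setConf
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable: getConf, setConf

public static final String SKIP_TRUNCATED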
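`SKIP_TRUNCATED` names a configuration switch that controls whether truncated pages are parsed. The Java sketch below shows one way such a switch can gate a parse step. It assumes the constant holds the property key `parser.skip.truncated` and that the switch defaults to `true`; verify both against `nutch-default.xml` for your Nutch version. `SkipTruncatedSketch`, `Page` and `pagesToParse` are hypothetical, and `java.util.Properties` stands in for Hadoop's `Configuration`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch: skip truncated pages unless the switch is turned off.
 * Assumed key "parser.skip.truncated", assumed default true.
 */
class SkipTruncatedSketch {

    static final String SKIP_TRUNCATED = "parser.skip.truncated"; // assumed key

    // Hypothetical stand-in for a fetched page plus the result of a truncation check.
    static final class Page {
        final String url;
        final boolean truncated;

        Page(String url, boolean truncated) {
            this.url = url;
            this.truncated = truncated;
        }
    }

    /** Returns the URLs of the pages that would be handed to the parser. */
    static List<String> pagesToParse(Properties conf, List<Page> pages) {
        // Properties has no typed boolean getter, so parse the string with a default of "true".
        boolean skipTruncated =
            Boolean.parseBoolean(conf.getProperty(SKIP_TRUNCATED, "true"));
        List<String> selected = new ArrayList<>();
        for (Page page : pages) {
            if (skipTruncated && page.truncated) {
                continue; // parsing a partial document is skipped
            }
            selected.add(page.url);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
            new Page("http://example.com/a", false),
            new Page("http://example.com/b", true));

        // Default: the truncated page is skipped.
        System.out.println(pagesToParse(new Properties(), pages)); // [http://example.com/a]

        // Switch turned off: every page is parsed.
        Properties parseAll = new Properties();
        parseAll.setProperty(SKIP_TRUNCATED, "false");
        System.out.println(pagesToParse(parseAll, pages)); // [http://example.com/a, http://example.com/b]
    }
}
```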
public ParseSegment()
public ParseSegment(Configuration conf)
public static boolean isTruncated(Content content)
Parameters: content - the page content to check
Returns: true if the page's content is truncated; false when it is not, or when this could not be determined.

public void parse(Path segment) throws IOException, InterruptedException, ClassNotFoundException
Copyright © 2021 The Apache Software Foundation