public class FileDumper extends Object
The tool has a number of immediate uses:
Upon successful completion, the tool prints a JSON snippet listing each mimetype classification and the number of documents that fall into it. An example is as follows:
INFO: File Types:
TOTAL Stats:
[
{"mimeType":"application/xml","count":"19"}
{"mimeType":"image/png","count":"47"}
{"mimeType":"image/jpeg","count":"141"}
{"mimeType":"image/vnd.microsoft.icon","count":"4"}
{"mimeType":"text/plain","count":"89"}
{"mimeType":"video/quicktime","count":"2"}
{"mimeType":"image/gif","count":"63"}
{"mimeType":"application/xhtml+xml","count":"1670"}
{"mimeType":"application/octet-stream","count":"40"}
{"mimeType":"text/html","count":"1863"}
]
FILTER Stats:
[
{"mimeType":"image/png","count":"47"}
{"mimeType":"image/jpeg","count":"141"}
{"mimeType":"image/vnd.microsoft.icon","count":"4"}
{"mimeType":"video/quicktime","count":"2"}
{"mimeType":"image/gif","count":"63"}
]
In the case above, the tool was run with the -mimeType flag and the values image/png image/jpeg image/vnd.microsoft.icon video/quicktime image/gif, so the FILTER Stats list only those types.
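The relationship between the TOTAL and FILTER stats can be sketched in a few lines of Java. This is a minimal illustration, not part of the tool: the class name `MimeStatsSketch` and its methods are hypothetical, and it assumes each stats entry is printed on its own line in exactly the shape shown in the example output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the stats lines shown in the example output. */
public class MimeStatsSketch {

  // Matches one entry such as {"mimeType":"image/png","count":"47"}
  private static final Pattern ENTRY =
      Pattern.compile("\\{\"mimeType\":\"([^\"]+)\",\"count\":\"(\\d+)\"\\}");

  /** Collects every mimeType/count entry found in the text, in order of appearance. */
  public static Map<String, Integer> parse(String statsBlock) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    Matcher m = ENTRY.matcher(statsBlock);
    while (m.find()) {
      counts.put(m.group(1), Integer.parseInt(m.group(2)));
    }
    return counts;
  }

  /** Keeps only the requested MIME types, mirroring what the FILTER stats list. */
  public static Map<String, Integer> filter(Map<String, Integer> total, Set<String> wanted) {
    Map<String, Integer> kept = new LinkedHashMap<>(total);
    kept.keySet().retainAll(wanted);
    return kept;
  }

  /** Adds up the document counts. */
  public static int sum(Map<String, Integer> counts) {
    return counts.values().stream().mapToInt(Integer::intValue).sum();
  }

  public static void main(String[] args) {
    // A few lines copied from the TOTAL stats in the example above.
    String total = String.join("\n",
        "{\"mimeType\":\"image/png\",\"count\":\"47\"}",
        "{\"mimeType\":\"text/plain\",\"count\":\"89\"}",
        "{\"mimeType\":\"image/gif\",\"count\":\"63\"}",
        "{\"mimeType\":\"text/html\",\"count\":\"1863\"}");

    Set<String> wanted = new HashSet<>(Arrays.asList("image/png", "image/gif"));
    Map<String, Integer> filtered = filter(parse(total), wanted);

    System.out.println("Filtered entries: " + filtered);
    System.out.println("Filtered documents: " + sum(filtered));
  }
}
```

The regular expression is deliberately strict, so a change in the log format would yield an empty map rather than silently wrong counts.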
| Constructor and Description |
|---|
| FileDumper() |
| Modifier and Type | Method and Description |
|---|---|
| void | dump(File outputDir, File segmentRootDir, String[] mimeTypes, boolean flatDir, boolean mimeTypeStats, boolean reverseURLDump) Dumps the reverse engineered raw content from the provided segment directories if a parent directory contains more than one segment, otherwise a single segment can be passed as an argument. |
| static void | main(String[] args) Main method for invoking this tool |
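A call to `dump` might look like the following sketch. To keep it runnable without Nutch on the classpath, it uses a hypothetical stand-in interface, `SegmentDumper`, that mirrors the documented signature; the class name `DumpCallSketch`, the helper `dumpImages`, and the directory paths are illustrative assumptions as well.

```java
import java.io.File;

public class DumpCallSketch {

  /** Hypothetical stand-in that mirrors the documented dump(...) signature. */
  @FunctionalInterface
  public interface SegmentDumper {
    void dump(File outputDir, File segmentRootDir, String[] mimeTypes,
        boolean flatDir, boolean mimeTypeStats, boolean reverseURLDump) throws Exception;
  }

  /** Requests a dump of three image types into a flat output directory. */
  public static void dumpImages(SegmentDumper dumper, File outputDir, File segmentRootDir)
      throws Exception {
    String[] mimeTypes = {"image/png", "image/jpeg", "image/gif"};
    // flatDir = true, mimeTypeStats = false, reverseURLDump = false
    dumper.dump(outputDir, segmentRootDir, mimeTypes, true, false, false);
  }

  public static void main(String[] args) throws Exception {
    // With Nutch available, the dumper could instead be: new FileDumper()::dump
    SegmentDumper dumper = (out, seg, types, flat, stats, reverse) ->
        System.out.println("dump to " + out + " from " + seg + ", "
            + types.length + " MIME types, flatDir=" + flat);

    // Placeholder paths, chosen only for illustration.
    dumpImages(dumper, new File("dump-output"), new File("crawl/segments"));
  }
}
```

Since `dump` is declared to throw `Exception`, callers have to either propagate it, as here, or wrap the call in a try/catch block.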
public void dump(File outputDir, File segmentRootDir, String[] mimeTypes, boolean flatDir, boolean mimeTypeStats, boolean reverseURLDump) throws Exception
Parameters:
outputDir - the directory you wish to dump the raw content to. This directory will be created.
segmentRootDir - a directory containing one or more segments.
mimeTypes - an array of mime types we have to dump, all others will be filtered out.
flatDir - a boolean flag specifying whether the output directory should contain only files instead of using nested directories to prevent naming conflicts.
mimeTypeStats - a flag indicating whether mimetype stats should be displayed instead of dumping files.
Throws:
Exception

Copyright © 2021 The Apache Software Foundation