public class TextProfileSignature extends Signature
An implementation of a page signature. It calculates an MD5 hash of a plain
text "profile" of a page. If the page has no text, it falls back to the hash
calculated by MD5Signature.
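The top-level rule (hash the profile when there is text, otherwise fall back) can be sketched in plain Java. This is a minimal sketch, not the Nutch implementation. It assumes that MD5Signature hashes the raw page bytes. The class `SignatureDispatchSketch`, its methods, and the `profileOf` parameter are hypothetical. The profile builder is passed in as a function so that the dispatch logic stands on its own.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.function.Function;

/** Hypothetical sketch: hash the text profile, or fall back to hashing raw content. */
final class SignatureDispatchSketch {

    /** Returns the 16-byte MD5 digest of the given bytes. */
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform must support MD5, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /**
     * text: parsed plain text (may be null); rawContent: original page bytes;
     * profileOf: hypothetical function that builds the text profile.
     */
    static byte[] signature(String text, byte[] rawContent, Function<String, String> profileOf) {
        if (text == null || text.isEmpty()) {
            // No text: hash the raw bytes (assumed to match MD5Signature's behaviour).
            return md5(rawContent);
        }
        return md5(profileOf.apply(text).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] raw = "<html></html>".getBytes(StandardCharsets.UTF_8);
        byte[] a = signature("", raw, t -> t);
        byte[] b = signature(null, raw, t -> t);
        System.out.println(Arrays.equals(a, b)); // prints "true": both fall back to md5(raw)
    }
}
```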
The algorithm to calculate a page "profile" takes the plain text version of a page and performs the following steps:
- Round down the count of each token to the nearest multiple of QUANT
(QUANT = QUANT_RATE * maxFreq, where QUANT_RATE is 0.01f by default and maxFreq
is the maximum token frequency). If maxFreq is higher than 1, then QUANT is at
least 2, which means that tokens with frequency 1 are always discarded.

| Constructor and Description |
|---|
| TextProfileSignature() |
| Modifier and Type | Method and Description |
|---|---|
| byte[] | calculate(Content content, Parse parse) |
| static void | main(String[] args) |
| void | setConf(Configuration conf) |
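The profile algorithm described in the class overview can be sketched in plain Java without Nutch or Hadoop classes. This is a hedged sketch, not the library's code. The quantization step follows the text. Several details are assumptions for illustration:

- the tokenization rules (lowercased runs of letters and digits, keeping only tokens longer than a minimum length of 2);
- rounding QUANT to an integer with `Math.round` and raising it to 2 when maxFreq > 1 (else 1);
- the alphabetical tie-break;
- the one-`"token count"`-per-line profile format;
- the UTF-8 encoding.

The names `TextProfileSketch`, `quant`, `profile` and `signature` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, dependency-free sketch of a text-profile signature. */
final class TextProfileSketch {

    /** QUANT = round(maxFreq * quantRate), raised to 2 if maxFreq > 1, else to 1 (assumption). */
    static int quant(int maxFreq, float quantRate) {
        int q = Math.round(maxFreq * quantRate);
        if (q < 2) {
            q = (maxFreq > 1) ? 2 : 1;
        }
        return q;
    }

    /** Builds the profile: one "token count" line per kept token, by decreasing count. */
    static String profile(String text, int minTokenLen, float quantRate) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder cur = new StringBuilder();
        // The trailing space acts as a sentinel so the last token is flushed.
        String padded = text + " ";
        for (int i = 0; i < padded.length(); i++) {
            char c = padded.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                // Keep only tokens strictly longer than minTokenLen (assumption).
                if (cur.length() > minTokenLen) {
                    counts.merge(cur.toString(), 1, Integer::sum);
                }
                cur.setLength(0);
            }
        }
        int maxFreq = 0;
        for (int n : counts.values()) {
            maxFreq = Math.max(maxFreq, n);
        }
        int q = quant(maxFreq, quantRate);

        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int rounded = (e.getValue() / q) * q; // round down to a multiple of QUANT
            if (rounded >= q) {                  // discard tokens that fell below QUANT
                kept.add(Map.entry(e.getKey(), rounded));
            }
        }
        // Decreasing count; ties broken alphabetically so the profile is deterministic.
        kept.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : kept) {
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(e.getKey()).append(' ').append(e.getValue());
        }
        return out.toString();
    }

    /** MD5 digest of the UTF-8 encoded profile. */
    static byte[] signature(String text, int minTokenLen, float quantRate) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(profile(text, minTokenLen, quantRate).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java SE platform", e);
        }
    }

    public static void main(String[] args) {
        String a = "The cat sat. The cat ran! A dog? The end.";
        String b = "The cat sat. The cat ran! A bird? The end.";
        System.out.println(profile(a, 2, 0.01f)); // prints "cat 2" and "the 2" on separate lines
        System.out.println(Arrays.equals(signature(a, 2, 0.01f), signature(b, 2, 0.01f))); // prints "true"
    }
}
```

In the usage above, the two sample texts differ only in a word that appears once. Quantization discards it, so both pages get the same signature. This is what makes a profile-based signature tolerant of small edits, unlike an MD5 hash of the raw bytes.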
public void setConf(Configuration conf)

Specified by: setConf in interface Configurable
Overrides: setConf in class Signature

Copyright © 2021 The Apache Software Foundation