public class LuceneTokenizer extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | LuceneTokenizer.TokenizerType |
| Constructor and Description |
|---|
| LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, boolean useStopFilter, LuceneAnalyzerUtil.StemFilterType stemFilterType)<br>Creates a tokenizer based on the parameter values. |
| LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, List&lt;String&gt; stopWords, boolean addToDefault, LuceneAnalyzerUtil.StemFilterType stemFilterType)<br>Creates a tokenizer based on the parameter values. |
| LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, LuceneAnalyzerUtil.StemFilterType stemFilterType, int mingram, int maxgram)<br>Creates a tokenizer for the n-gram model based on the parameter values. |
| Modifier and Type | Method and Description |
|---|---|
| TokenStream | getTokenStream()<br>Returns the TokenStream created by the tokenizer. |
public LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, boolean useStopFilter, LuceneAnalyzerUtil.StemFilterType stemFilterType)

Creates a tokenizer based on the parameter values.

Parameters:
- content - The text to tokenize
- tokenizer - The type of tokenizer to use, CLASSIC or DEFAULT
- useStopFilter - If set to true, the token stream will be filtered using the default Lucene stop set
- stemFilterType - Type of stemming to perform

public LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, List&lt;String&gt; stopWords, boolean addToDefault, LuceneAnalyzerUtil.StemFilterType stemFilterType)

Creates a tokenizer based on the parameter values.

Parameters:
- content - The text to tokenize
- tokenizer - The type of tokenizer to use, CLASSIC or DEFAULT
- stopWords - A set of user-defined stop words
- addToDefault - If set to true, the stopWords entries will be added to the default Lucene stop set; if false, only the user-provided words will be used as the stop set
- stemFilterType - Type of stemming to perform

public LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, LuceneAnalyzerUtil.StemFilterType stemFilterType, int mingram, int maxgram)

Creates a tokenizer for the n-gram model based on the parameter values.

Parameters:
- content - The text to tokenize
- tokenizer - The type of tokenizer to use, CLASSIC or DEFAULT
- stemFilterType - Type of stemming to perform
- mingram - Value of mingram for tokenizing
- maxgram - Value of maxgram for tokenizing

Copyright © 2021 The Apache Software Foundation
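A minimal usage sketch of the API above, assuming this class and lucene-core are on the classpath; the package imports for LuceneTokenizer and LuceneAnalyzerUtil are omitted, and the enum constants DEFAULT and NONE are assumptions not confirmed by this page. Consuming the returned TokenStream follows the standard Lucene reset/incrementToken/end/close pattern.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTokenizerDemo {
    public static void main(String[] args) throws Exception {
        // Build a tokenizer with the DEFAULT tokenizer type, the default Lucene
        // stop set enabled, and no stemming (StemFilterType.NONE is assumed).
        LuceneTokenizer tok = new LuceneTokenizer(
                "The quick brown fox jumps over the lazy dog",
                LuceneTokenizer.TokenizerType.DEFAULT,
                true, // filter with the default Lucene stop set
                LuceneAnalyzerUtil.StemFilterType.NONE);

        // Standard Lucene pattern for draining a TokenStream.
        TokenStream stream = tok.getTokenStream();
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<>();
        stream.reset();
        while (stream.incrementToken()) {
            tokens.add(term.toString());
        }
        stream.end();
        stream.close();
        System.out.println(tokens); // prints the stop-filtered tokens
    }
}
```

The same consumption loop works for the stop-word and n-gram constructors; only the constructor arguments change.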