org.apache.uima.examples.tagger.trainAndTest
Class ModelGeneration
java.lang.Object
org.apache.uima.examples.tagger.trainAndTest.ModelGeneration
- All Implemented Interfaces:
- Serializable
public class ModelGeneration
- extends Object
- implements Serializable
Trains an N-gram model for the tagger, iterating over the files from some predefined training directory.
Writes the resulting model to a binary file.
NB: at the moment, both bigram and trigram statistics are saved in a single model file.
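Since the class implements Serializable and the model is written to a binary file, the round trip can be illustrated with standard Java object serialization. This is only a minimal sketch under assumptions: `ToyModel`, `ModelFileSketch`, the string-keyed maps, and the temporary file name are hypothetical stand-ins, and nothing here is confirmed to match how ModelGeneration actually writes its file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a trained model; NOT the real ModelGeneration class.
final class ToyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    // Bigram and trigram statistics kept side by side in one object,
    // so both end up in the same file (keys are illustrative tag sequences).
    final Map<String, Double> bigramProbs = new HashMap<>();
    final Map<String, Double> trigramProbs = new HashMap<>();
}

final class ModelFileSketch {
    // Serialize the whole model object to a binary file.
    static void write(ToyModel model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Read the model object back from the binary file.
    static ToyModel read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ToyModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ToyModel model = new ToyModel();
        model.bigramProbs.put("DT NN", 0.4);        // made-up value
        model.trigramProbs.put("DT JJ NN", 0.25);   // made-up value

        File file = File.createTempFile("toy-model", ".dat");
        file.deleteOnExit();
        write(model, file);

        ToyModel restored = read(file);
        System.out.println(restored.bigramProbs + " " + restored.trigramProbs);
    }
}
```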
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
suffix_tree
public Map suffix_tree
suffix_tree_capitalized
public Map suffix_tree_capitalized
word_probs
public Map<String,Map<String,Double>> word_probs
- Map containing <word,tag> probabilities, that is, the probability of a certain word given a certain tag at time t: P(word_t | tag_t)
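A lookup in a nested map of this shape might look like the sketch below. It assumes the outer key is the word and the inner key is the tag, which the documentation above does not state explicitly. The class `WordProbsSketch`, the helper `emission`, and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class WordProbsSketch {
    // Returns P(word | tag) from a nested map, or 0.0 when the pair was never seen.
    // Assumption: outer key = word, inner key = tag.
    static double emission(Map<String, Map<String, Double>> wordProbs, String word, String tag) {
        Map<String, Double> byTag = wordProbs.get(word);
        if (byTag == null) {
            return 0.0; // unknown word
        }
        return byTag.getOrDefault(tag, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> wordProbs = new HashMap<>();
        // Made-up probabilities for illustration only.
        wordProbs.computeIfAbsent("run", k -> new HashMap<>()).put("VB", 0.5);
        wordProbs.computeIfAbsent("run", k -> new HashMap<>()).put("NN", 0.125);

        System.out.println(emission(wordProbs, "run", "VB"));
        System.out.println(emission(wordProbs, "walk", "VB"));
    }
}
```

Returning 0.0 for unseen pairs is a simplification. A tagger would normally fall back to some unknown-word estimate instead.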
transition_probs
public Map<NGram,Double> transition_probs
- Map containing N-gram probabilities
lambdas2
public double[] lambdas2
lambdas3
public double[] lambdas3
theta
public double theta
ModelGeneration
public ModelGeneration(List<Token> corpus,
String OutputFile)
init
public void init()
capitalized
public static boolean capitalized(String word)
- Check whether the token is capitalized
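One plausible reading of such a check is sketched below. It assumes "capitalized" means that the first character is an uppercase letter. The actual method may use a different rule, and `CapitalizedSketch` is a hypothetical class name.

```java
final class CapitalizedSketch {
    // Assumption: a token counts as capitalized when its first character is uppercase.
    // Null and empty tokens are treated as not capitalized.
    static boolean capitalized(String word) {
        return word != null
                && !word.isEmpty()
                && Character.isUpperCase(word.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(capitalized("Apache"));
        System.out.println(capitalized("tagger"));
    }
}
```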
main
public static void main(String[] args)
Copyright © 2006-2011 The Apache Software Foundation. All Rights Reserved.